Oh, also, your logging (one "Inserting: ..." line per record, redirected to
25m.txt) is going to cause a HUGE performance hit, especially if that machine
is one of the Riak nodes.  Too much disk I/O thrash.
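
Roughly what I mean, as an untested sketch rather than a drop-in
replacement: N workers, each holding one riak-erlang-client connection open
for the whole run, and a progress line every 100,000 records instead of one
per record.  The PB port (8087 is Riak's default), the worker count, the
logging interval, and the helper names shard/2, load/1, put_line/2,
progress/1 and to_json/1 are my assumptions, not something from your setup.
riakc also has to be on the code path.

```erlang
#!/usr/bin/env escript
%% Sketch only. Assumes the riak-erlang-client (riakc) beams are on the
%% code path and that Riak's protocol buffers listener uses port 8087.

host()    -> "10.232.5.169".      % host from the original curl command
pb_port() -> 8087.                % assumption: Riak's default PB port
bucket()  -> <<"CustCalls25m">>.  % bucket from the original curl URL
workers() -> 8.                   % hypothetical pool size, tune it

main([Filename]) ->
    {ok, Data} = file:read_file(Filename),
    %% tl/1 skips the first line, as the original script does.
    Lines = tl(re:split(Data, "\r?\n", [{return, binary}, trim])),
    %% One worker per shard, each with its own long-lived connection.
    Refs = [element(2, spawn_monitor(fun() -> load(S) end))
            || S <- shard(Lines, workers())],
    %% Wait for every worker; crash if any of them did not exit normally.
    [receive {'DOWN', R, process, _, Reason} -> normal = Reason end
     || R <- Refs],
    io:format("Done: ~p records~n", [length(Lines)]).

%% Deal the lines out round-robin into N lists, preserving order.
shard(Lines, N) ->
    Deal = fun(L, {I, T}) ->
                   Slot = (I rem N) + 1,
                   {I + 1, setelement(Slot, T, [L | element(Slot, T)])}
           end,
    {_, Filled} = lists:foldl(Deal, {0, erlang:make_tuple(N, [])}, Lines),
    [lists:reverse(S) || S <- tuple_to_list(Filled)].

%% Open one connection, push the whole shard through it, then close it.
load(Shard) ->
    {ok, Pid} = riakc_pb_socket:start_link(host(), pb_port()),
    lists:foldl(fun(Line, N) -> put_line(Pid, Line), progress(N + 1) end,
                0, Shard),
    riakc_pb_socket:stop(Pid).

put_line(Pid, Line) ->
    Fields = re:split(Line, ",", [{return, binary}]),
    Obj = riakc_obj:new(bucket(), hd(Fields), to_json(Fields),
                        "application/json"),
    ok = riakc_pb_socket:put(Pid, Obj).

%% Same JSON layout as the original script; expects exactly six fields.
to_json(Fields) ->
    iolist_to_binary(io_lib:format(
        "{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,"
        "\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Fields)).

%% Print once per 100,000 records per worker (hypothetical interval).
progress(N) when N rem 100000 =:= 0 ->
    io:format("~p: ~p records inserted~n", [self(), N]),
    N;
progress(N) ->
    N.
```

Run it the same way as before (./load_data25m CustCalls25m.csv).  With 8
workers and that interval you get a few hundred progress lines in total
instead of 25 million.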

-mox

On Tue, Aug 28, 2012 at 10:56 AM, Mike Oxford <moxf...@gmail.com> wrote:

> Use the https://github.com/basho/riak-erlang-client directly, instead of
> calling os:cmd and pushing through curl.
> You can also parallelize it at that time, because right now you're making
> 25 million os:cmd calls, each of which launches its own curl process.  Open
> up a pool of connections (or even just N and round-robin them) and keep
> them open.
>
> With the default N=3, a 2-node cluster will have 1/3 of the set on one
> machine and 2/3 on the other.  You may consider moving to N=2 on the
> bucket, which will put one copy on each machine (i.e., dual-master).
>
> Beyond that, you have not provided enough information as to where the
> bottleneck may be, though I'm sure the Basho crew will have some better
> answers.  :)
>
> -mox
>
>
> On Mon, Aug 27, 2012 at 8:26 PM, <sangeetha.pattabiram...@cognizant.com> wrote:
>
>>  Dear team,
>>
>> I am trying to load a 25 million record dataset (1.3 GB) of sample call
>> data into Riak. The setup is a 2-node Riak cluster (4 x quad-core, 1.5 TB
>> storage), and the load takes real 5671m12.812s, which is far too long.
>> We deal with big data and I need to store and test 165 GB on Riak; at the
>> present rate I guess that load could take years. I have already loaded
>> the 165 GB into MongoDB and got results, for a *comparative performance
>> study of MongoDB and Riak*. Please suggest how to improve the load time.
>>
>> *I am using the following code for loading:*
>>
>> #!/usr/local/bin/escript
>>
>> main([Filename]) ->
>>     {ok, Data} = file:read_file(Filename),
>>     Lines = tl(re:split(Data, "\r?\n", [{return, binary},trim])),
>>     lists:foreach(fun(L) -> LS = re:split(L, ","), format_and_insert(LS) end, Lines).
>>
>> format_and_insert(Line) ->
>>     JSON = io_lib:format("{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Line),
>>     Command = io_lib:format("curl -X PUT http://10.232.5.169:8098/riak/CustCalls25m/~s -d '~s' -H 'content-type: application/json'", [hd(Line),JSON]),
>>     io:format("Inserting: ~s~n", [hd(Line)]),
>>     os:cmd(Command).
>>
>> *[hadoop@CTSINGMRGTO data]$ time ./load_data25m CustCalls25m.csv >> 25m.txt &*
>> [3] 32354
>>
>> [hadoop@CTSINGMRGTO data]$
>> *real    5671m12.812s*
>> user    1725m31.862s
>> sys     3074m42.135s
>> [hadoop@CTSINGMRGTO data]$
>>
>> [hadoop@CTSINGMRGTO data]$ tail -4 25m.txt
>> Inserting: 24999997
>> Inserting: 24999998
>> Inserting: 24999999
>> *Inserting: 25000000*
>> [hadoop@CTSINGMRGTO data]$
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>