Oh, also, your logging (an io:format call on every insert, so 25 million lines written to 25m.txt) is going to cause a HUGE performance hit, especially if that machine is one of the Riak nodes. Too much disk and IO thrash.
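
If you still want progress output, one option is to print once every N records rather than once per record. A minimal sketch; the module name progress, the function maybe_log/2 and the interval are made up for illustration and are not part of your script:

```erlang
-module(progress).
-export([maybe_log/2]).

%% Hypothetical helper: print a progress line only when Count is a
%% multiple of LogEvery, instead of printing once per inserted record.
maybe_log(Count, LogEvery) when Count rem LogEvery =:= 0 ->
    io:format("Inserted ~p records~n", [Count]),
    logged;
maybe_log(_Count, _LogEvery) ->
    skipped.
```

With an interval of 100000, a 25 million record load would write 250 lines instead of 25 million.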
-mox

On Tue, Aug 28, 2012 at 10:56 AM, Mike Oxford <moxf...@gmail.com> wrote:

> Use the https://github.com/basho/riak-erlang-client directly, instead of
> calling os:cmd and pushing through CURL.
> You can also parallelize it at that time, because right now you're doing
> 25million os:cmd calls and making 25million curl calls. Open up a pool of
> connections (or even just N and round-robin them) and keep them open.
>
> A 2-node cluster will have 1/3 of the set on one machine, and 2/3 on the
> other. You may consider moving to N=2 on the bucket, which will put one
> copy on each machine (eg, dual-master.)
>
> Beyond that, you have not provided enough information as to where the
> bottleneck may be, though I'm sure the Basho crew will have some
> better answers. :)
>
> -mox
>
>
> On Mon, Aug 27, 2012 at 8:26 PM, <sangeetha.pattabiram...@cognizant.com> wrote:
>
>> Dear team,
>>
>> I am trying to load 25 million dataset (1.3 Gb) of sample call data
>> onto riak..its a 4-quad core ---1.5 TB storage 2-node raik cluster…takes
>> real 5671m12.812s.please suggest the solutions for the betterment of
>> the same…5671m12.812s is quite huge…we deal with bigdata and I need to
>> store and test 165 GB on the riak..if so I may take years for loading I
>> guess with the present scenario…loaded 165 GB on to mongodb and got the
>> results..for *comparative performance study of mongodb and riak db* …please
>> do assist me with the same .
>>
>> *using the following code for loading :*
>>
>> #!/usr/local/bin/escript
>>
>> main([Filename]) ->
>>     {ok, Data} = file:read_file(Filename),
>>     Lines = tl(re:split(Data, "\r?\n", [{return, binary},trim])),
>>     lists:foreach(fun(L) -> LS = re:split(L, ","), format_and_insert(LS) end, Lines).
>>
>> format_and_insert(Line) ->
>>     JSON = io_lib:format("{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Line),
>>     Command = io_lib:format("curl -X PUT http://10.232.5.169:8098/riak/CustCalls25m/~s -d '~s' -H 'content-type: application/json'", [hd(Line),JSON]),
>>     io:format("Inserting: ~s~n", [hd(Line)]),
>>     os:cmd(Command).
>>
>> *[hadoop@CTSINGMRGTO data]$ time ./load_data25m CustCalls25m.csv >> 25m.txt &*
>> [3] 32354
>>
>> [hadoop@CTSINGMRGTO data]$
>> *real 5671m12.812s*
>> user 1725m31.862s
>> sys 3074m42.135s
>> [hadoop@CTSINGMRGTO data]$
>>
>> [hadoop@CTSINGMRGTO data]$ tail -4 25m.txt
>> Inserting: 24999997
>> Inserting: 24999998
>> Inserting: 24999999
>> *Inserting: 25000000*
>> [hadoop@CTSINGMRGTO data]$
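
To make the riak-erlang-client suggestion quoted above a bit more concrete, here is a rough sketch of "open N connections, keep them open, round-robin them". It assumes the riakc library from that repo is on the code path and that the protocol buffers listener is on the default port 8087; the module name riak_load and the functions open_pool/3, pick/2 and insert/4 are hypothetical names for illustration, not part of the client.

```erlang
-module(riak_load).
-export([open_pool/3, pick/2, insert/4]).

%% Open N protocol-buffers connections to Host:Port and keep them open.
%% Returns a tuple of connection pids so a slot can be looked up by index.
%% Assumption: riakc_pb_socket comes from basho/riak-erlang-client.
open_pool(Host, Port, N) ->
    list_to_tuple([begin
                       {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
                       Pid
                   end || _ <- lists:seq(1, N)]).

%% Round-robin: record number I (1-based) maps to slot ((I - 1) rem N) + 1.
pick(Pool, I) ->
    element(((I - 1) rem tuple_size(Pool)) + 1, Pool).

%% Store one JSON document. Key and Json must be binaries
%% (e.g. Json = iolist_to_binary(io_lib:format(...))).
insert(Pool, I, Key, Json) ->
    Obj = riakc_obj:new(<<"CustCalls25m">>, Key, Json, "application/json"),
    riakc_pb_socket:put(pick(Pool, I), Obj).
```

Usage would be something like Pool = riak_load:open_pool("10.232.5.169", 8087, 8) once at startup, then riak_load:insert(Pool, I, Key, Json) per record. Note that put is synchronous, so round-robin from a single process only removes the per-record curl and connection setup cost; to actually write in parallel you would run one worker process per connection.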
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com