This is great. I was loading data using Python. My code would spawn 10 threads and put data in a queue; all 10 threads would read data from that queue. However, all threads were hitting the same server/load balancer.
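For context, my loader is structured roughly like the sketch below, except that each thread currently picks a node per write. In this sketch each worker is instead pinned to one node for its whole lifetime, reusing a single connection, which is the variant I want to try next. The node IPs, the run_loader name, and the store() callable are placeholders for illustration, not my actual code:

```python
import queue
import threading

NODES = ["10.112.2.185", "10.112.2.186", "10.112.2.187"]  # placeholder node IPs

def run_loader(items, store, nodes=NODES, n_workers=10):
    """Fan items out to n_workers threads; worker i stays pinned to
    nodes[i % len(nodes)] so writes spread across the whole cluster."""
    q = queue.Queue()
    for item in items:
        q.put(item)

    def worker(node):
        # One long-lived connection per thread (e.g. a client built once
        # here, bound to `node`) instead of a connection per write.
        while True:
            try:
                key, value = q.get_nowait()
            except queue.Empty:
                return
            store(node, key, value)

    threads = [threading.Thread(target=worker, args=(nodes[i % len(nodes)],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the Python Riak client, store() would wrap a per-thread client created once inside worker() and bound to that worker's node; the point is that each thread reuses one connection to one node rather than reopening connections or funneling every write through a single host.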
I tried a different setup too, where I spawned processes, each with its own queue. In that case too, all processes were hitting the same server. I have just made a change to my code, so now I have 10 threads, each randomly selecting a node and storing data in it. Again, I am getting only around 50 writes/sec. Could there be something wrong with the way I have written my loader script?

On Wed, Jun 27, 2012 at 5:10 PM, Russell Brown <russell.br...@mac.com> wrote:

> On 27 Jun 2012, at 12:36, Yousuf Fauzan wrote:
>
> So I changed concurrency to 10 and put all the IPs of the nodes in the basho bench config.
> Throughput is now around 1500.
>
>
> I guess you can now try 5 or 15 concurrent workers and see which is optimal for that setup, to get a good feel for the sizing of any connection pools for your application.
>
> You can also see how adding nodes and adding workers affects your results, to help you size the cluster you need for your expected usage.
>
> Cheers
>
> Russell
>
>
> On Wed, Jun 27, 2012 at 4:40 PM, Russell Brown <russell.br...@mac.com> wrote:
>
>> On 27 Jun 2012, at 12:09, Yousuf Fauzan wrote:
>>
>> I used examples/riakc_pb.config
>>
>> {mode, max}.
>> {duration, 10}.
>> {concurrent, 1}.
>>
>>
>> Try upping this. On my local 3-node cluster, with 8GB RAM and an old, cheap quad core per box, I'd set concurrency to 10 workers.
>>
>>
>> {driver, basho_bench_driver_riakc_pb}.
>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>> {value_generator, {fixed_bin, 10000}}.
>> {riakc_pb_ips, [{<IP of one of the nodes>}]}.
>>
>>
>> I add all the IPs here, one entry per node.
>>
>>
>> {riakc_pb_replies, 1}.
>> {operations, [{get, 1}, {update, 1}]}.
>>
>>
>> On Wed, Jun 27, 2012 at 4:37 PM, Russell Brown <russell.br...@mac.com> wrote:
>>
>>> On 27 Jun 2012, at 12:05, Yousuf Fauzan wrote:
>>>
>>> I did use basho bench on my clusters. It showed throughput of around 150.
>>>
>>>
>>> Could you share the config you used, please?
>>>
>>> On Wed, Jun 27, 2012 at 4:24 PM, Russell Brown <russell.br...@mac.com> wrote:
>>>
>>>> On 27 Jun 2012, at 11:50, Yousuf Fauzan wrote:
>>>>
>>>> It's not about the difference in throughput between the two approaches I took. Rather, the issue is that even 200 writes/sec is a bit on the low side.
>>>> I could be doing something wrong with the configuration, because people are reporting throughputs of 2-3k ops/sec.
>>>>
>>>> It would help if anyone here could guide me in setting up a cluster that gives that kind of throughput.
>>>>
>>>>
>>>> To get that kind of throughput I use multiple threads/workers. Have you looked at basho_bench[1]? It is a simple, reliable tool for benchmarking Riak clusters.
>>>>
>>>> Cheers
>>>>
>>>> Russell
>>>>
>>>> [1] Basho Bench - https://github.com/basho/basho_bench and http://wiki.basho.com/Benchmarking.html
>>>>
>>>>
>>>> Thanks,
>>>> Yousuf
>>>>
>>>> On Wed, Jun 27, 2012 at 4:02 PM, Eric Anderson <ander...@copperegg.com> wrote:
>>>>
>>>>> On Jun 27, 2012, at 5:13 AM, Yousuf Fauzan <yousuffau...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I set up a 3-machine Riak SM cluster. Each machine used 4GB RAM and the Riak open-source SmartMachine image.
>>>>>
>>>>> Afterwards I tried loading data using the following two methods:
>>>>>
>>>>> 1. Bash script
>>>>>
>>>>> #!/bin/bash
>>>>> echo $(date)
>>>>> for (( c=1; c<=1000; c++ ))
>>>>> do
>>>>>   curl -s -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/buckets/test/keys
>>>>> done
>>>>> echo $(date)
>>>>>
>>>>> 2. Python Riak client
>>>>>
>>>>> c = riak.RiakClient("10.112.2.185")
>>>>> b = c.bucket("test")
>>>>> for i in xrange(10000):
>>>>>     o = b.new(str(i), str(i)).store()
>>>>>
>>>>> For case 1, throughput was 25 writes/sec.
>>>>> For case 2, throughput was 200 writes/sec.
>>>>>
>>>>> Maybe I am making a fundamental mistake somewhere. I tried the above two scripts on EC2 clusters too and still got the same performance.
>>>>>
>>>>> Please, someone help.
>>>>>
>>>>>
>>>>> The major difference between these two is that the first executes a binary, which has to create everything (connection, payload, etc.) on every pass through the loop. The second does not: it creates the client once, then iterates, keeping the same client and presumably the same connection as well. That makes a huge difference.
>>>>>
>>>>> I would not use curl to do performance testing. What you probably want is something like your Python script, run across many threads/processes at once (or fired up many times).
>>>>>
>>>>>
>>>>> Eric Anderson
>>>>> Co-Founder
>>>>> CopperEgg
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com