This is great.

I was loading data using Python. My code spawned 10 threads and put data in a
queue; all of the threads read data from that queue.
However, all of the threads were hitting the same server/load balancer.

I tried a different setup too, where I spawned processes, each with its own
queue. In this case as well, all of the processes were hitting the same
server.

I just made a change to my code, so now I have 10 threads, each randomly
selecting a node and storing data in it.
Again, I am getting only around 50 writes/sec.
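
For reference, this is roughly the structure of the loader now (a simplified
sketch; the node IPs are placeholders and error handling is omitted):

import random
import threading
import Queue

import riak

NODES = ["10.112.2.185", "10.112.2.186", "10.112.2.187"]  # placeholder node IPs
NUM_THREADS = 10

work = Queue.Queue()

def worker():
    # Each thread opens its own client against a randomly chosen node.
    client = riak.RiakClient(random.choice(NODES))
    bucket = client.bucket("test")
    while True:
        key = work.get()
        if key is None:          # sentinel: no more work
            return
        bucket.new(str(key), str(key)).store()

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

for i in xrange(10000):
    work.put(i)
for _ in threads:                # one sentinel per worker
    work.put(None)
for t in threads:
    t.join()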

Could there be something wrong with the way I have written my loader script?

On Wed, Jun 27, 2012 at 5:10 PM, Russell Brown <russell.br...@mac.com> wrote:

>
> On 27 Jun 2012, at 12:36, Yousuf Fauzan wrote:
>
> So I changed the concurrency to 10 and put the IPs of all the nodes in the
> basho_bench config.
> Throughput is now around 1500.
>
>
> I guess you can now try 5 or 15 concurrent workers and see which is
> optimal for that setup, to get a good feel for the sizing of any connection
> pools for your application.
>
> You can also see how adding nodes and adding workers affects your results,
> to help you size the cluster you need for your expected usage.
>
> Cheers
>
> Russell
>
>
> On Wed, Jun 27, 2012 at 4:40 PM, Russell Brown <russell.br...@mac.com> wrote:
>
>>
>> On 27 Jun 2012, at 12:09, Yousuf Fauzan wrote:
>>
>> I used examples/riakc_pb.config
>>
>> {mode, max}.
>>
>> {duration, 10}.
>>
>> {concurrent, 1}.
>>
>>
>> Try upping this. On my local 3-node cluster, with 8GB of RAM and an old,
>> cheap quad-core per box, I'd set concurrency to 10 workers.
>>
>>
>> {driver, basho_bench_driver_riakc_pb}.
>>
>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>>
>> {value_generator, {fixed_bin, 10000}}.
>>
>> {riakc_pb_ips, [{<IP of one of the nodes>}]}.
>>
>>
>> I add all the IPs here, one entry per node.
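>>
>> For example, with placeholder addresses, that entry would look something
>> like: {riakc_pb_ips, [{10,0,0,1}, {10,0,0,2}, {10,0,0,3}]}.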
>>
>>
>> {riakc_pb_replies, 1}.
>>
>> {operations, [{get, 1}, {update, 1}]}.
>>
>>
>> On Wed, Jun 27, 2012 at 4:37 PM, Russell Brown <russell.br...@mac.com> wrote:
>>
>>>
>>> On 27 Jun 2012, at 12:05, Yousuf Fauzan wrote:
>>>
>>> I did use basho_bench on my clusters. It showed a throughput of around 150.
>>>
>>>
>>> Could you share the config you used, please?
>>>
>>>
>>> On Wed, Jun 27, 2012 at 4:24 PM, Russell Brown <russell.br...@mac.com> wrote:
>>>
>>>>
>>>> On 27 Jun 2012, at 11:50, Yousuf Fauzan wrote:
>>>>
>>>> It's not about the difference in throughput between the two approaches I
>>>> took. Rather, the issue is that even 200 writes/sec is on the low side.
>>>> I could be doing something wrong with the configuration, because people
>>>> are reporting throughputs of 2-3k ops/sec.
>>>>
>>>> Could someone here guide me in setting up a cluster that would give
>>>> that kind of throughput?
>>>>
>>>>
>>>> To get that kind of throughput I use multiple threads/workers. Have
>>>> you looked at basho_bench [1]? It is a simple, reliable tool for
>>>> benchmarking Riak clusters.
>>>>
>>>> Cheers
>>>>
>>>> Russell
>>>>
>>>> [1] Basho Bench - https://github.com/basho/basho_bench and
>>>> http://wiki.basho.com/Benchmarking.html
>>>>
>>>>
>>>> Thanks,
>>>> Yousuf
>>>>
>>>> On Wed, Jun 27, 2012 at 4:02 PM, Eric Anderson <ander...@copperegg.com> wrote:
>>>>
>>>>> On Jun 27, 2012, at 5:13 AM, Yousuf Fauzan <yousuffau...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I set up a 3-machine Riak SmartMachine cluster. Each machine has 4GB of
>>>>> RAM and runs the Riak open-source SmartMachine image.
>>>>>
>>>>> Afterwards I tried loading data using the following two methods:
>>>>> 1. Bash script
>>>>> #!/bin/bash
>>>>> echo $(date)
>>>>> for (( c=1; c<=1000; c++ ))
>>>>> do
>>>>>   curl -s -d 'this is a test' -H "Content-Type: text/plain" \
>>>>>     http://127.0.0.1:8098/buckets/test/keys
>>>>> done
>>>>> echo $(date)
>>>>>
>>>>> 2. Python Riak Client
>>>>> import riak
>>>>>
>>>>> c = riak.RiakClient("10.112.2.185")
>>>>> b = c.bucket("test")
>>>>> for i in xrange(10000): o = b.new(str(i), str(i)).store()
>>>>>
>>>>> For case 1, throughput was 25 writes/sec
>>>>> For case 2, throughput was 200 writes/sec
>>>>>
>>>>> Maybe I am making a fundamental mistake somewhere. I tried the above
>>>>> two scripts on EC2 clusters too and still got the same performance.
>>>>>
>>>>> Please, can someone help?
>>>>>
>>>>>
>>>>>
>>>>> The major difference between these two is that the first is executing a
>>>>> binary, which has to create everything (connection, payload, etc.) every
>>>>> time through the loop.  The second does not - it creates the client once,
>>>>> then iterates over it, keeping the same client and presumably the same
>>>>> connection as well.  That makes a huge difference.
>>>>>
>>>>> I would not use curl to do performance testing.  What you probably want
>>>>> is something like your Python script, running on many threads/processes
>>>>> at once (or fired up many times).
>>>>>
>>>>>
>>>>> Eric Anderson
>>>>> Co-Founder
>>>>> CopperEgg
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
