If you are not averse to building from source, you might like the leveldb 
updates slated for Riak 1.4. The emphasis is on write throughput. The 
best branch to try would be mv-level-work2 in GitHub's basho/leveldb 
repository. I suspect you will see roughly a 50% improvement in data ingest 
over periods of 2 hours or more.
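
For reference, a rough sketch of grabbing and building that branch standalone
(just a sketch; in a real deployment leveldb is normally pulled in as a
dependency of eleveldb when Riak itself is built from source):

git clone https://github.com/basho/leveldb.git
cd leveldb
git checkout mv-level-work2
make    # builds libleveldb with the stock Makefile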

Matthew


On Apr 4, 2013, at 6:22 PM, Matthew MacClary 
<maccl...@lifetime.oregonstate.edu> wrote:

> PBC is certainly something I have on my list of things to explore. 
> Conceptually I am not sure if the speed gains from this protocol will be 
> apparent with large binary payloads. I thought the main speed gains were 
> from 1) more compact binary representation and 2) lower interpretation 
> overhead. In my situation I already have a largish binary payload that does 
> not need to be parsed. I could be wrong and may find that out as I explore 
> this further.
> 
> -Matt
> 
> 
> On Thu, Apr 4, 2013 at 1:45 PM, Shuhao <shu...@shuhaowu.com> wrote:
> Just as a side note, you might want to retry the test with PBC. While I have 
> only done testing with < 10 KB documents, my tests indicate that PBC is twice 
> as fast as HTTP in almost all cases.
> 
> Shuhao
> 
> 
> On 13-04-04 04:14 PM, Matthew MacClary wrote:
> Thanks for the feedback. I made two changes to my test setup and saw better
> throughput:
> 
> 1) Don't write to the same key over and over. Updating a key appears to be
> a lot slower than creating a new key
> 
> 2) I used parallel PUTs
> 
> The throughput I was measuring before was about 26MB/s on localhost. With
> these changes it went to around 200MB/s on a disk that can write at about
> 480MB/s. That is more the type of performance I need for the data store we
> have in mind. I am going to proceed with testing on 8 nodes with RAID0
> drives.
> 
> Here are some details of the testing I did, in case it helps others. I tried
> the test with 1MB, 10MB, and 20MB binary data. I didn't notice any clear sign
> of larger objects slowing things down.
> 
> wget
> http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
> 
> sudo rpm -Uvh riak-1.2.1-1.el5.x86_64.rpm
> /usr/sbin/riak start
> mkdir data-dir && cd data-dir
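> # create 101 files of 10MB each (1280 x 8k blocks of zeros)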
> seq -w 0 100 | parallel dd if=/dev/zero of={}.10meg bs=8k count=1280
> http_proxy=   # don't contact the proxy
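> # POST each 10MB file to Riak under its own key, 8 uploads in parallel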
> time find . -name \*.10meg | parallel -j8 -n1 wget --post-file {} \
>     http://127.0.0.1:8098/riak/test1/{}
> 
> During these tests I saw beam.smp jumping to 350-550% CPU while watching
> top. When I was seeing slower throughput, beam.smp was using much less CPU.
> 
> Kind regards,
> 
> -Matt
> 
> On Wed, Apr 3, 2013 at 7:20 AM, Reid Draper <reiddra...@gmail.com> wrote:
> 
> inline:
> 
> 
> On Apr 2, 2013, at 6:48 PM, Matthew MacClary <
> maccl...@lifetime.oregonstate.edu> wrote:
> 
> Hi all, I am new to this list. Thanks for taking the time to read my
> questions! I just want to know if the data throughput I am seeing is
> expected for the bitcask backend or if it is too low.
> 
> I am doing the preliminary feasibility study to decide if we should
> implement a Riak data store. Our application involves rendering chunks of
> data that range in size from about 1MB to 9MB. This rendering work is 
> CPU intensive so it is spread over a bunch of compute nodes which write the
> output into a data store.
> 
> 
> Riak is not intended to store objects of this size, not at the moment
> anyway. Riak CS [1], on the other hand, can store files up to several TB.
> That being said, Riak CS may or may not have other qualities you desire.
> It's a known issue [2] that the Riak object size limitations should be
> better documented.
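> 
> (If you do look at Riak CS: it speaks the S3 API rather than the /riak HTTP
> interface, so an upload looks roughly like the sketch below, assuming s3cmd
> is already configured to point at the CS endpoint with CS credentials; the
> bucket and file names are illustrative.)
> 
> s3cmd mb s3://renders
> s3cmd put output.bin s3://renders/output.bin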
> 
> 
> After rendering, a second process consumes the data chunks from the data
> store at a rate of about 480MB/s in a streaming configuration, so there is
> more than 480MB/s of new data coming in at the same time the data is being
> read.
> 
> 
> Is this a single-socket, or is there some concurrency here?
> 
> 
> My testing so far involves a one-node cluster on a dev box. What I wanted
> to show is that Riak writes are limited by the hard disk throughput. So
> far I haven't seen writes to localhost come anywhere close to the hard disk
> throughput:
> 
> $ MYFILE=/tmp/output.png
> $ dd if=/dev/zero of=$MYFILE bs=8k count=256k
> 262144+0 records in
> 262144+0 records out
> 2147483648 bytes (2.1 GB) copied, 4.48906 seconds, 478 MB/s
> $ rm $MYFILE
> 
> So the hard disk throughput is around 478MB/s for this simple write test.
> 
> The next test I did was to load a 39MB binary file into my one node
> cluster. I used a script to do 12 POSTs with curl and 12 POSTs with wget.
> 
> curl --tcp-nodelay -XPOST http://${IP}:${PORT}/riak/test/file3 \
>      -H "Content-Type:application/octet-stream" \
>      --data-binary @${UPLOAD_FILE} \
>      --write-out "%{speed_upload}\n"
> 
> wget --post-file ${UPLOAD_FILE} http://127.0.0.1:8098/riak/test/file1
> 
> What I found was that I could get only about 26MB/s with this command line
> testing. Does this seem about right? Should I see an 18x slowdown over the
> write speed of the disk drive?
> 
> 
> Was this running the 24 (12 * 2) uploads in serial or in parallel? With a
> single-threaded workload, you're unlikely to get Riak to saturate a disk.
> Furthermore, there are design decisions in Riak at the moment that make it
> less than optimal for single objects of 39MB. Single-object high throughput
> (measured in MB) is more in the wheelhouse of Riak CS than Riak on its own,
> which is primarily designed for low latency and high throughput (measured
> in ops/sec). One of the ways that Riak CS achieves this on top of Riak is
> by introducing concurrency between the end-user and Riak.
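> 
> To make the concurrency point concrete, here is a rough shell sketch of the
> same general pattern (this is only an illustration, not how Riak CS is
> actually implemented; bucket and file names are made up): chunk one large
> object into blocks and write the blocks in parallel.
> 
> # cut the object into ~1MB pieces, then issue 8 concurrent PUTs
> split -b 1M output.bin block.
> ls block.* | xargs -P 8 -I{} curl -s -XPUT \
>     -H "Content-Type: application/octet-stream" \
>     --data-binary @{} http://127.0.0.1:8098/riak/blocks/{}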
> 
> 
> Thanks for your comments on my application and test approach!
> 
> 
> Hope this helps,
> Reid
> 
> [1] http://docs.basho.com/riakcs/latest/
> [2] https://github.com/basho/basho_docs/issues/256
> 
> 
> 
> -Matt
> 
> -----------------------------------------------
> Dev Environment Details:
> dev box running RHEL 6.2, 12 cores, 48GB, 6Gb/s SAS 15k HD
> Riak 1.2.1 from
> http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
> n_val=1
> r=1
> w=1
> backend=bitcask
> 
> Deploy Environment Details:
>   Node to node bandwidth > 40Gb/s
>   similar config for node servers
>   n_val=3
>   r=1
>   w=1
>   backend=?
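> 
> (A minimal sketch of setting the n_val/r/w values above as bucket properties
> over the HTTP interface; the bucket name is illustrative:)
> 
> curl -XPUT -H "Content-Type: application/json" \
>      -d '{"props":{"n_val":3,"r":1,"w":1}}' \
>      http://127.0.0.1:8098/riak/test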

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
