Re: Tool to execute a benchmark for HBase.

2015-01-30 Thread Nishanth S
You are hitting HBase harder now, which is important for benchmarking. If
there is no data loss, it means your HBase cluster is good enough to handle
the load. You are simply making more use of the cores on the machine from
which you launch the YCSB process. Write your own workload based on your
record sizes and format to see what can be achieved in a particular use case.
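As a rough illustration of "write your own workload": YCSB's CoreWorkload is driven by a property file, and a sketch like the one below could generate one from your own record size. The property names (recordcount, operationcount, fieldcount, fieldlength, readallfields, readproportion, updateproportion, scanproportion, insertproportion, requestdistribution) and the com.yahoo.ycsb.workloads.CoreWorkload class name are assumed from YCSB releases of this period; check them against the version you run. The function name custom_workload is hypothetical.

```python
"""Sketch: generate a YCSB CoreWorkload property file sized to your own records.

Assumption: these property names match the YCSB release in use; newer
releases moved CoreWorkload to a different package.
"""


def custom_workload(record_bytes: int, fields: int, records: int,
                    operations: int, read_fraction: float,
                    distribution: str = "zipfian") -> str:
    """Return workload-file text approximating `record_bytes` per row."""
    if fields <= 0:
        raise ValueError("fields must be positive")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    # CoreWorkload builds each record from `fieldcount` fields of `fieldlength`
    # bytes, so split the target record size evenly across the fields.
    field_length = max(1, record_bytes // fields)
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": records,
        "operationcount": operations,
        "fieldcount": fields,
        "fieldlength": field_length,
        "readallfields": "true",
        "readproportion": read_fraction,
        # Whatever is not a read becomes an update in this sketch.
        "updateproportion": round(1.0 - read_fraction, 6),
        "scanproportion": 0,
        "insertproportion": 0,
        "requestdistribution": distribution,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Example: 2 KB rows split into 4 fields, 10 million rows, read-only.
    print(custom_workload(2048, 4, 10_000_000, 1_000_000, 1.0), end="")
```

The resulting file would be passed to YCSB with -P, the same way the workload files are passed in the commands in this thread.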

-Nishanth

On Fri, Jan 30, 2015 at 5:34 AM, Guillermo Ortiz wrote:
> [...]

Re: Tool to execute a benchmark for HBase.

2015-01-30 Thread Guillermo Ortiz
I have come back to the benchmark. I executed this command:
ycsb run hbase -P workflowA -p columnfamily=cf -p operationcount=10 -threads 32

And I got a throughput of about 2000 ops/sec.
Then I executed ten of those commands in parallel and got about
18000 ops/sec in total. I don't get 2000 ops/sec for each of the
executions, but about 1800 ops/sec each.
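Here is a minimal sketch of how such a parallel run could be scripted, so the per-process figures and their sum are collected the same way each time. It makes a few assumptions:

* ycsb is on the PATH.
* It reuses the arguments from the command above, with -threads spelled as a YCSB option.
* Each process prints its report to stdout with a line like "[OVERALL], Throughput(ops/sec), ...", as in the results quoted in this thread.

The helper names are hypothetical.

```python
"""Sketch: run several YCSB client processes at once and sum their throughput.

Assumptions: `ycsb` is on PATH, and each process prints a report line of the
form "[OVERALL], Throughput(ops/sec), <value>" to stdout.
"""
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

THROUGHPUT_LINE = re.compile(
    r"^\[OVERALL\], Throughput\(ops/sec\), ([0-9.]+)", re.MULTILINE)


def overall_throughput(report: str) -> float:
    """Return the [OVERALL] throughput figure from one YCSB report."""
    match = THROUGHPUT_LINE.search(report)
    if match is None:
        raise ValueError("no [OVERALL] throughput line in report")
    return float(match.group(1))


def run_parallel(argv: Sequence[str], processes: int) -> List[float]:
    """Start `processes` copies of the same command concurrently; one figure per copy."""
    def run_once(_: int) -> str:
        # Each worker thread waits on its own process and drains its own stdout.
        done = subprocess.run(list(argv), capture_output=True, text=True, check=True)
        return done.stdout

    with ThreadPoolExecutor(max_workers=processes) as pool:
        reports = list(pool.map(run_once, range(processes)))
    return [overall_throughput(r) for r in reports]


if __name__ == "__main__":
    command = ["ycsb", "run", "hbase", "-P", "workflowA", "-p", "columnfamily=cf",
               "-p", "operationcount=10", "-threads", "32"]
    per_process = run_parallel(command, processes=10)
    print("per process:", per_process)
    print("aggregate ops/sec:", sum(per_process))
```

Ten processes at about 1800 ops/sec each add up to the roughly 18000 ops/sec reported above.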

I don't know if it's an HBase question, but I don't understand why I
get more throughput by running more commands in parallel when I'm
already running 32 threads.
I looked at "top": with just one process the CPU was at about 20-60%;
when I launch more processes it's about 400-500%.



2015-01-29 18:23 GMT+01:00 Guillermo Ortiz:
> [...]

Re: Tool to execute a benchmark for HBase.

2015-01-29 Thread Guillermo Ortiz
There's an option when you execute YCSB to say how many client
threads you want to use. I tried 1, 8, 16 and 32. Those results are
with 16; the improvement from 1 to 8 is pretty high, but not as much
from 16 to 32. I only use one YCSB instance; could that be so important?

-threads : the number of client threads. By default, the YCSB Client
uses a single worker thread, but additional threads can be specified.
This is often done to increase the amount of load offered against the
database.
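To see why more client threads raise the offered load, here is a toy closed-loop model. Each worker thread keeps one request in flight and issues the next only when the previous one returns, which is how a synchronous benchmark client behaves. The simulated request is just a sleep, an assumption standing in for a Get round trip. It models none of YCSB's or HBase's internals, and the function name is hypothetical.

```python
"""Toy model of what -threads does: N workers, each with one request in flight.

Assumption: every simulated request just sleeps for `latency_s`; this models a
client whose threads wait on the server, not YCSB or HBase themselves.
"""
import threading
import time


def closed_loop_throughput(threads: int, ops_per_thread: int, latency_s: float) -> float:
    """Return completed operations per second across all worker threads."""
    def worker() -> None:
        for _ in range(ops_per_thread):
            time.sleep(latency_s)  # stands in for one synchronous round trip

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    return threads * ops_per_thread / elapsed


if __name__ == "__main__":
    for n in (1, 8, 16, 32):
        print(n, "threads:", round(closed_loop_throughput(n, 50, 0.002)), "ops/sec")
```

In this model, throughput grows roughly in proportion to the thread count because nothing saturates. In a real run, the client machine's CPU, the network or the RegionServers eventually become the limit. That is one plausible reason the gain from 16 to 32 threads is smaller than from 1 to 8.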

2015-01-29 17:27 GMT+01:00 Nishanth S:
> [...]

Re: Tool to execute a benchmark for HBase.

2015-01-29 Thread Nishanth S
How many instances of YCSB do you run, and how many threads do you use
per instance? I guess these ops are per instance, and you should get
similar numbers for each instance if you run more instances. In short,
try running more workload instances.

-Nishanth

On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz wrote:
> [...]

Re: Tool to execute a benchmark for HBase.

2015-01-29 Thread Guillermo Ortiz
Yes, I'm using 40%. I can't access that document either.
I don't know how YCSB executes the reads, whether they are random, and
whether they could take advantage of the cache.

Do you think this is acceptable performance?
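Whether reads benefit from the cache depends heavily on the key distribution. YCSB selects keys according to the workload's requestdistribution property (for example uniform, zipfian or latest; the stock workload files typically use zipfian). The toy model below compares LRU cache hit rates for uniform and skewed key choice. It is an assumption-laden sketch:

* It uses a simple Zipf-like weighting rather than YCSB's own generator.
* It caches whole keys rather than 64 KB HFile blocks.
* The function names are hypothetical.

```python
"""Toy model: how much an LRU cache helps under uniform vs. skewed key choice.

Assumptions: a plain Zipf-like weighting over key ranks (not YCSB's own
generator) and an LRU cache measured in keys, not HFile blocks.
"""
import random
from collections import OrderedDict


def lru_hit_rate(keys: list, capacity: int) -> float:
    """Fraction of requests served from an LRU cache holding `capacity` keys."""
    if not keys:
        return 0.0
    cache: OrderedDict = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(keys)


def sample_keys(n_keys: int, n_requests: int, skew: float, seed: int = 42) -> list:
    """skew=0 gives uniform choice; larger skew favours low-ranked (hot) keys."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) ** skew for rank in range(n_keys)]
    return rng.choices(range(n_keys), weights=weights, k=n_requests)


if __name__ == "__main__":
    n_keys, capacity = 10_000, 1_000  # the cache holds 10% of the keys
    for skew in (0.0, 0.99):
        keys = sample_keys(n_keys, 100_000, skew)
        print(f"skew={skew}: hit rate {lru_hit_rate(keys, capacity):.2f}")
```

With the cache holding 10% of the keys, uniform access hits it only about 10% of the time, while the skewed distribution hits it far more often.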


2015-01-29 16:26 GMT+01:00 Ted Yu:
> [...]

Re: Tool to execute a benchmark for HBase.

2015-01-29 Thread Ted Yu
What's the value of hfile.block.cache.size?

By default it is 40%. You may want to increase it if you're using the
default.

Andrew published some YCSB results:
http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb-0.98.0-vs-0.98.8.pdf

However, I can't access that link right now.

Cheers

On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz wrote:
> [...]
>
> 2015-01-29 5:27 GMT+01:00 Ted Yu:
> > Maybe ask on the Cassandra mailing list which benchmark tool they use?
> >
> > Cheers
> >
> > On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz 
> > wrote:
> >
> >> I was checking that site. Do you know if there's another option?
> >> The last update for Cassandra there was two years ago, and I'd like
> >> to compare both of them with more or less the same tool/code.
> >>
> >> 2015-01-28 22:10 GMT+01:00 Ted Yu:
> >> > Guillermo:
> >> > If you use HBase 0.98.x, please consider Andrew's YCSB repo:
> >> >
> >> > https://github.com/apurtell/ycsb/tree/new_hbase_client
> >> >
> >> > Cheers
> >> >
> >> > On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S
> >> > wrote:
> >> >
> >> >> You can use YCSB for this purpose. See here:
> >> >>
> >> >> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
> >> >> -Nishanth
> >> >>
> >> >> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz <konstt2...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I'd like to do some benchmarks for HBase, but I don't know what
> >> >> > tool I could use. I started to write some code, but I guess there
> >> >> > are easier options.
> >> >> >
> >> >> > I've taken a look at JMeter, but I guess I'd rather hit HBase
> >> >> > directly from Java; JMeter looks great, but I don't know if it fits
> >> >> > well in this scenario. What tool could I use to take measurements
> >> >> > such as the response time of read and write requests, etc.? I'd
> >> >> > like to be able to run the same benchmarks against Cassandra.
> >> >> >
> >> >>
> >>
>


Re: Tool to execute a benchmark for HBase.

2015-01-29 Thread Guillermo Ortiz
Are there any published results for that benchmark to compare against?
I'm running the different workloads and, for example, for 100% reads
on a table with 10 million records I only get about 2000 operations/sec.
I hoped for much better performance, but I could be wrong. I'd like to
know whether this is normal performance or whether I have something
misconfigured.


I have split the table, all the records are balanced, and I used Snappy.
The cluster has a master and 4 RegionServers with 256 GB and 2 CPUs
(32 cores with Hyper-Threading), running 0.98.6-cdh5.3.0.

RegionServer is executed with these parameters:
 /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver
-XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
-XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
-Dhbase.log.dir=/var/log/hbase
-Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out
-Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase
-Dhbase.id.str= -Dhbase.root.logger=INFO,RFA
-Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
-Dhbase.security.logger=INFO,RFAS
org.apache.hadoop.hbase.regionserver.HRegionServer start
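The command line above contains two -Xmx flags, which is worth checking. HotSpot generally applies the last one, so the effective maximum heap would be 640679936 bytes (611 MiB), not 1000 MB.

Assuming HBase sizes the block cache as hfile.block.cache.size times the maximum heap, the default 0.4 would give each RegionServer roughly 244 MiB of block cache, on machines with 256 GB (presumably RAM). Now assume the rows are YCSB's default size of 10 fields of 100 bytes, about 1 KB each. Then 10 million rows is on the order of 10 GB before compression, far more than four such caches can hold.

The sketch below does that arithmetic. The regex and function names are hypothetical helpers, not part of HBase.

```python
"""Sketch: estimate the RegionServer block cache from its JVM command line.

Assumptions: HotSpot honours the last -Xmx it sees, and the block cache is
sized as hfile.block.cache.size (default 0.4) times the maximum heap.
"""
import re

XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)")
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(cmdline: str) -> int:
    """Return the effective -Xmx value in bytes (the last occurrence wins)."""
    matches = XMX.findall(cmdline)
    if not matches:
        raise ValueError("no -Xmx flag found")
    digits, unit = matches[-1]
    return int(digits) * UNITS[unit.lower()]


def block_cache_bytes(cmdline: str, fraction: float = 0.4) -> int:
    """Block cache size if it is `fraction` of the maximum heap."""
    return int(max_heap_bytes(cmdline) * fraction)


if __name__ == "__main__":
    flags = "-Xmx1000m -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936"
    mib = 1024 ** 2
    print("max heap:", max_heap_bytes(flags) / mib, "MiB")        # max heap: 611.0 MiB
    print("block cache:", block_cache_bytes(flags) / mib, "MiB")  # about 244 MiB
```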


The results for 100% reads are
[OVERALL], RunTime(ms), 42734.0
[OVERALL], Throughput(ops/sec), 2340.0570973931763
[UPDATE], Operations, 1.0
[UPDATE], AverageLatency(us), 103170.0
[UPDATE], MinLatency(us), 103168.0
[UPDATE], MaxLatency(us), 103171.0
[UPDATE], 95thPercentileLatency(ms), 103.0
[UPDATE], 99thPercentileLatency(ms), 103.0
[READ], Operations, 10.0
[READ], AverageLatency(us), 412.5534
[READ], AverageLatency(us,corrected), 581.6249026771276
[READ], MinLatency(us), 218.0
[READ], MaxLatency(us), 268383.0
[READ], MaxLatency(us,corrected), 268383.0
[READ], 95thPercentileLatency(ms), 0.0
[READ], 95thPercentileLatency(ms,corrected), 0.0
[READ], 99thPercentileLatency(ms), 0.0
[READ], 99thPercentileLatency(ms,corrected), 0.0
[READ], Return=0, 10
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 103598.0
[CLEANUP], MinLatency(us), 103596.0
[CLEANUP], MaxLatency(us), 103599.0
[CLEANUP], 95thPercentileLatency(ms), 103.0
[CLEANUP], 99thPercentileLatency(ms), 103.0
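A quick sanity check on these numbers (a sketch, not part of the original report): throughput multiplied by runtime gives the implied total number of operations, and by Little's law throughput multiplied by average latency gives the average number of requests in flight. Pairing the overall throughput with the READ average latency is an approximation, since the overall figure also includes the single UPDATE and CLEANUP operations.

```python
def implied_operations(throughput_ops_per_sec, runtime_ms):
    """Total operations implied by the reported throughput and runtime."""
    return throughput_ops_per_sec * runtime_ms / 1000.0


def mean_outstanding_requests(throughput_ops_per_sec, avg_latency_us):
    """Little's law, L = lambda * W: average number of requests in flight."""
    return throughput_ops_per_sec * avg_latency_us / 1e6


throughput = 2340.0570973931763  # [OVERALL], Throughput(ops/sec)
runtime_ms = 42734.0             # [OVERALL], RunTime(ms)
avg_read_latency_us = 412.5534   # [READ], AverageLatency(us)

print(round(implied_operations(throughput, runtime_ms)))                   # 100000
print(round(mean_outstanding_requests(throughput, avg_read_latency_us), 2))  # 0.97
```

A value close to 1 would suggest the client kept roughly one request outstanding at a time, in which case the throughput figure says more about client-side concurrency than about what the cluster can sustain.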

hbase(main):030:0> describe 'username'
DESCRIPTION                                                          ENABLED
 'username', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',           true
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => 'FOREVER',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.0170 seconds

2015-01-29 5:27 GMT+01:00 Ted Yu :
> Maybe ask on the Cassandra mailing list which benchmark tool they use?
>
> Cheers
>
> On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz 
> wrote:
>
>> I was checking that page. Do you know if there's another option? The
>> last update for Cassandra was two years ago, and I'd like to compare
>> both of them with more or less the same tool/code.
>>
>> 2015-01-28 22:10 GMT+01:00 Ted Yu :
>> > Guillermo:
>> > If you use hbase 0.98.x, please consider Andrew's ycsb repo:
>> >
>> > https://github.com/apurtell/ycsb/tree/new_hbase_client
>> >
>> > Cheers
>> >
>> > On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S 
>> > wrote:
>> >
>> >> You can use ycsb for this purpose. See here:
>> >>
>> >> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
>> >> -Nishanth
>> >>
>> >> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz 
>> >> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I'd like to run some benchmarks for HBase, but I don't know what tool I
>> >> > could use. I started writing some code, but I guess there are easier
>> >> > options.
>> >> >
>> >> > I've taken a look at JMeter, but I guess I'd rather hit HBase directly
>> >> > from Java. JMeter looks great, but I don't know if it fits this scenario
>> >> > well. What tool could I use to take measurements such as the response
>> >> > time of read and write requests? I'd also like to be able to run the
>> >> > same benchmarks against Cassandra.
>> >> >
>> >>
>>


Re: Tool to to execute an benchmark for HBase.

2015-01-28 Thread Ted Yu
Maybe ask on the Cassandra mailing list which benchmark tool they use?

Cheers

On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz 
wrote:

> I was checking that page. Do you know if there's another option? The
> last update for Cassandra was two years ago, and I'd like to compare
> both of them with more or less the same tool/code.
>
> 2015-01-28 22:10 GMT+01:00 Ted Yu :
> > Guillermo:
> > If you use hbase 0.98.x, please consider Andrew's ycsb repo:
> >
> > https://github.com/apurtell/ycsb/tree/new_hbase_client
> >
> > Cheers
> >
> > On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S 
> > wrote:
> >
> >> You can use ycsb for this purpose. See here:
> >>
> >> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
> >> -Nishanth
> >>
> >> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I'd like to run some benchmarks for HBase, but I don't know what tool I
> >> > could use. I started writing some code, but I guess there are easier
> >> > options.
> >> >
> >> > I've taken a look at JMeter, but I guess I'd rather hit HBase directly
> >> > from Java. JMeter looks great, but I don't know if it fits this scenario
> >> > well. What tool could I use to take measurements such as the response
> >> > time of read and write requests? I'd also like to be able to run the
> >> > same benchmarks against Cassandra.
> >> >
> >>
>


Re: Tool to to execute an benchmark for HBase.

2015-01-28 Thread Guillermo Ortiz
I was checking that page. Do you know if there's another option? The
last update for Cassandra was two years ago, and I'd like to compare
both of them with more or less the same tool/code.

2015-01-28 22:10 GMT+01:00 Ted Yu :
> Guillermo:
> If you use hbase 0.98.x, please consider Andrew's ycsb repo:
>
> https://github.com/apurtell/ycsb/tree/new_hbase_client
>
> Cheers
>
> On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S 
> wrote:
>
>> You can use ycsb for this purpose. See here:
>>
>> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
>> -Nishanth
>>
>> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz 
>> wrote:
>>
>> > Hi,
>> >
>> > I'd like to run some benchmarks for HBase, but I don't know what tool I
>> > could use. I started writing some code, but I guess there are easier
>> > options.
>> >
>> > I've taken a look at JMeter, but I guess I'd rather hit HBase directly
>> > from Java. JMeter looks great, but I don't know if it fits this scenario
>> > well. What tool could I use to take measurements such as the response
>> > time of read and write requests? I'd also like to be able to run the
>> > same benchmarks against Cassandra.
>> >
>>


Re: Tool to to execute an benchmark for HBase.

2015-01-28 Thread Ted Yu
Guillermo:
If you use hbase 0.98.x, please consider Andrew's ycsb repo:

https://github.com/apurtell/ycsb/tree/new_hbase_client

Cheers

On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S 
wrote:

> You can use ycsb for this purpose. See here:
>
> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
> -Nishanth
>
> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz 
> wrote:
>
> > Hi,
> >
> > I'd like to run some benchmarks for HBase, but I don't know what tool I
> > could use. I started writing some code, but I guess there are easier
> > options.
> >
> > I've taken a look at JMeter, but I guess I'd rather hit HBase directly
> > from Java. JMeter looks great, but I don't know if it fits this scenario
> > well. What tool could I use to take measurements such as the response
> > time of read and write requests? I'd also like to be able to run the
> > same benchmarks against Cassandra.
> >
>


Re: Tool to to execute an benchmark for HBase.

2015-01-28 Thread Nishanth S
You can use ycsb for this purpose. See here:

https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
-Nishanth

On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz 
wrote:

> Hi,
>
> I'd like to run some benchmarks for HBase, but I don't know what tool I
> could use. I started writing some code, but I guess there are easier
> options.
>
> I've taken a look at JMeter, but I guess I'd rather hit HBase directly
> from Java. JMeter looks great, but I don't know if it fits this scenario
> well. What tool could I use to take measurements such as the response
> time of read and write requests? I'd also like to be able to run the
> same benchmarks against Cassandra.
>


Tool to to execute an benchmark for HBase.

2015-01-28 Thread Guillermo Ortiz
Hi,

I'd like to run some benchmarks for HBase, but I don't know what tool I
could use. I started writing some code, but I guess there are easier
options.

I've taken a look at JMeter, but I guess I'd rather hit HBase directly
from Java. JMeter looks great, but I don't know if it fits this scenario
well. What tool could I use to take measurements such as the response
time of read and write requests? I'd also like to be able to run the
same benchmarks against Cassandra.
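For the "write some code" route, here is a minimal sketch of what such a harness measures: time each call, then report throughput, average and tail latencies, similar in spirit to the summary YCSB prints. `run_benchmark`, `fake_read` and the in-memory `store` are hypothetical stand-ins; in a real test the operation would wrap a read or write against HBase or Cassandra. The percentile is a simple nearest-rank approximation, not necessarily how YCSB computes its percentiles.

```python
import statistics
import time


def run_benchmark(operation, n_ops):
    """Call operation(i) n_ops times and summarise per-call latency in microseconds."""
    if n_ops < 1:
        raise ValueError("n_ops must be >= 1")
    latencies_us = []
    start = time.perf_counter()
    for i in range(n_ops):
        t0 = time.perf_counter()
        operation(i)
        latencies_us.append((time.perf_counter() - t0) * 1e6)
    elapsed_s = time.perf_counter() - start
    latencies_us.sort()

    def percentile(p):
        # Nearest-rank style: index into the sorted latencies.
        idx = min(len(latencies_us) - 1, int(p / 100 * len(latencies_us)))
        return latencies_us[idx]

    return {
        "operations": n_ops,
        "throughput_ops_per_sec": n_ops / elapsed_s,
        "avg_latency_us": statistics.mean(latencies_us),
        "p95_latency_us": percentile(95),
        "p99_latency_us": percentile(99),
        "max_latency_us": latencies_us[-1],
    }


# Hypothetical stand-in for a real HBase/Cassandra read.
store = {f"user{i}": b"value" for i in range(10_000)}


def fake_read(i):
    return store[f"user{i % len(store)}"]


if __name__ == "__main__":
    for name, value in run_benchmark(fake_read, 100_000).items():
        print(f"{name}: {value:.2f}")
```

This loop is single-threaded, so its throughput is bounded by roughly 1 / average latency; offering more load means running several such loops concurrently, which is what YCSB's client threads do.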