Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread ramkrishna vasudevan
Hi

>> Since I'm storing
>> historical data (snapshot data) and changes between adjacent value cells
>> are relatively small.

If the values change at all, even slightly, FASTDIFF will rewrite the
whole value part.  It skips the value part only when the value exactly
matches that of the previous cell. JFYI.

Regards
Ram
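
The point above can be illustrated with a toy model. This is only a sketch
and not HBase's actual FAST_DIFF encoder: the class name, the method name,
and the one-byte flag cost are assumptions made for illustration. It counts
the bytes emitted for the value parts of consecutive cells when a value can
be skipped only if it is byte-for-byte identical to the previous one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class SameValueSketch {
    /** Bytes a toy encoder would emit for the value parts of consecutive cells. */
    static int encodedValueBytes(List<String> values) {
        int total = 0;
        byte[] previous = null;
        for (String v : values) {
            byte[] current = v.getBytes(StandardCharsets.UTF_8);
            if (previous != null && Arrays.equals(previous, current)) {
                // Exact match: only a "same value" flag (assumed to cost one byte here).
                total += 1;
            } else {
                // Any difference, however small: flag plus the full value again.
                total += 1 + current.length;
            }
            previous = current;
        }
        return total;
    }

    public static void main(String[] args) {
        String a = "{\"x\":0,\"y\":1}";
        String b = "{\"x\":0,\"y\":0}"; // differs from a by a single character
        System.out.println("identical snapshots: " + encodedValueBytes(Arrays.asList(a, a, a)));
        System.out.println("one-char changes:    " + encodedValueBytes(Arrays.asList(a, b, a)));
    }
}
```

In this model a one-character change costs as much as a completely new
value, which is why small deltas between JSON snapshots would not benefit.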

On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang 
wrote:

> I thought FASTDIFF was only for rowkey and columns, great if it also works
> in value cell.
>
> And thanks for the bjson link!
>
> Jianshi
>
> On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu  wrote:
>
> > There is FASTDIFF data block encoding.
> >
> > See also http://bjson.org/
> >
> > Cheers
> >
> > On Nov 12, 2014, at 9:08 PM, Jianshi Huang 
> > wrote:
> >
> > > Hi,
> > >
> > > I'm currently saving JSON in pure String format in the value cell and
> > > depends on HBase' block compression to reduce the overhead of JSON.
> > >
> > > I'm wondering if there's a more space efficient way to store JSON?
> > > (there're lots of 0s and 1s, JSON String actually is an OK format)
> > >
> > > I want to keep the value as a Map since the schema of source data might
> > > change over time.
> > >
> > > Also is there a DIFF based encoding for values? Since I'm storing
> > > historical data (snapshot data) and changes between adjacent value
> > > cells are relatively small.
> > >
> > >
> > > Thanks,
> > > --
> > > Jianshi Huang
> > >
> > > LinkedIn: jianshi
> > > Twitter: @jshuang
> > > Github & Blog: http://huangjs.github.com/
> >
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>


Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread Jianshi Huang
I thought FASTDIFF was only for rowkey and columns, great if it also works
in value cell.

And thanks for the bjson link!

Jianshi

On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu  wrote:

> There is FASTDIFF data block encoding.
>
> See also http://bjson.org/
>
> Cheers
>
> On Nov 12, 2014, at 9:08 PM, Jianshi Huang 
> wrote:
>
> > Hi,
> >
> > I'm currently saving JSON in pure String format in the value cell and
> > depends on HBase' block compression to reduce the overhead of JSON.
> >
> > I'm wondering if there's a more space efficient way to store JSON?
> > (there're lots of 0s and 1s, JSON String actually is an OK format)
> >
> > I want to keep the value as a Map since the schema of source data might
> > change over time.
> >
> > Also is there a DIFF based encoding for values? Since I'm storing
> > historical data (snapshot data) and changes between adjacent value cells
> > are relatively small.
> >
> >
> > Thanks,
> > --
> > Jianshi Huang
> >
> > LinkedIn: jianshi
> > Twitter: @jshuang
> > Github & Blog: http://huangjs.github.com/
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread Ted Yu
There is FASTDIFF data block encoding. 

See also http://bjson.org/

Cheers

On Nov 12, 2014, at 9:08 PM, Jianshi Huang  wrote:

> Hi,
> 
> I'm currently saving JSON in pure String format in the value cell and
> depends on HBase' block compression to reduce the overhead of JSON.
> 
> I'm wondering if there's a more space efficient way to store JSON?
> (there're lots of 0s and 1s, JSON String actually is an OK format)
> 
> I want to keep the value as a Map since the schema of source data might
> change over time.
> 
> Also is there a DIFF based encoding for values? Since I'm storing
> historical data (snapshot data) and changes between adjacent value cells
> are relatively small.
> 
> 
> Thanks,
> -- 
> Jianshi Huang
> 
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/


Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread Jianshi Huang
Hi,

I'm currently saving JSON as a plain String in the value cell and
depending on HBase's block compression to reduce the overhead of JSON.

Is there a more space-efficient way to store JSON?
(There are lots of 0s and 1s, so a JSON String is actually an OK format.)

I want to keep the value as a Map since the schema of source data might
change over time.

Also, is there a DIFF-based encoding for values? I'm storing
historical data (snapshot data), and changes between adjacent value cells
are relatively small.


Thanks,
-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Re: Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-12 Thread ramkrishna vasudevan
Thanks Andrew.  This is very useful information, along with the
GitHub link.

Regards
Ram

On Thu, Nov 13, 2014 at 9:00 AM, Liu, Ming (HPIT-GADSC) 
wrote:

> Thank you Andrew, this is an excellent answer, I get it now. I will try
> your hbase client for a 'fair' test :-)
>
> Best Regards,
> Ming
>
> -----Original Message-----
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: Thursday, November 13, 2014 2:08 AM
> To: user@hbase.apache.org
> Cc: DeRoo, John
> Subject: Re: Is it possible that HBase update performance is much better
> than read in YCSB test?
>
> Try this HBase YCSB client instead:
> https://github.com/apurtell/ycsb/tree/new_hbase_client
>
> The HBase YCSB driver in the master repo holds on to one HTable instance
> per driver thread. We accumulate writes into a 12MB write buffer before
> flushing them en masse. This is why the behavior you are seeing confounds
> your expectations. It's not correct behavior IMHO. YCSB wants to measure
> the round trip of every op, not the non-cost of local caching. Worse, if we
> have a lot of driver threads accumulating 12MB of edits more or less at the
> same rate, then we will flush these buffers more or less at the same time
> and stampede the cluster, which leads to deep valleys in observed write
> performance of 30-60 seconds or longer.
>
>
>
> On Tue, Nov 11, 2014 at 8:40 PM, Liu, Ming (HPIT-GADSC) 
> wrote:
>
> > Hi, all,
> >
> > I am trying to use YCSB to test on our HBase 0.98.5 instance and got a
> > strange result: update is 6x better than read. It is just an exercise,
> > so the HBase is running in a workstation in standalone mode.
> > I modified the workloada shipped with YCSB into two new workloads:
> > workloadr and workloadu, where workloadr is do 100% read operation and
> > workloadu is do 100% update operation. At the bottom is the workloadr
> > and workloadu config files for your reference.
> >
> > I found out that the read performance is much worse than the update
> > performance, read is about 6000:
> >
> > YCSB Client 0.1
> > Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr
> > -p columnfamily=family -s -t
> > [OVERALL], RunTime(ms), 16565.0
> > [OVERALL], Throughput(ops/sec), 6036.824630244491
> >
> > And the update performance is about 36000, 6x better than read.
> >
> > YCSB Client 0.1
> > Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu
> > -p columnfamily=family -s -t
> > [OVERALL], RunTime(ms), 2767.0
> > [OVERALL], Throughput(ops/sec), 36140.22406938923
> >
> > Is this possible? IMHO, read should be faster than update.
> > Maybe I am wrong in the workload file? Or there is a possibility that
> > update is faster than read? I don't find a YCSB mailing list, if
> > anyone knows, please give me a link, so I can also ask question on
> > that mailing list. But is it possible that put is faster than get in
> > hbase? If not, the result must be wrong and I need to debug the YCSB
> > code to figure out what is going wrong.
> >
> > Workloadr:
> > recordcount=10
> > operationcount=10
> > workload=com.yahoo.ycsb.workloads.CoreWorkload
> > readallfields=true
> > readproportion=1
> > updateproportion=0
> > scanproportion=0
> > insertproportion=0
> > requestdistribution=zipfian
> >
> > workloadu:
> > recordcount=10
> > operationcount=10
> > workload=com.yahoo.ycsb.workloads.CoreWorkload
> > readallfields=true
> > readproportion=0
> > updateproportion=1
> > scanproportion=0
> > insertproportion=0
> > requestdistribution=zipfian
> >
> >
> > Thanks,
> > Ming
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


RE: Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-12 Thread Liu, Ming (HPIT-GADSC)
Thank you Andrew, this is an excellent answer, I get it now. I will try your 
hbase client for a 'fair' test :-)

Best Regards,
Ming

-----Original Message-----
From: Andrew Purtell [mailto:apurt...@apache.org] 
Sent: Thursday, November 13, 2014 2:08 AM
To: user@hbase.apache.org
Cc: DeRoo, John
Subject: Re: Is it possible that HBase update performance is much better than 
read in YCSB test?

Try this HBase YCSB client instead:
https://github.com/apurtell/ycsb/tree/new_hbase_client

The HBase YCSB driver in the master repo holds on to one HTable instance per 
driver thread. We accumulate writes into a 12MB write buffer before flushing 
them en masse. This is why the behavior you are seeing confounds your 
expectations. It's not correct behavior IMHO. YCSB wants to measure the round 
trip of every op, not the non-cost of local caching. Worse, if we have a lot of 
driver threads accumulating 12MB of edits more or less at the same rate, then 
we will flush these buffers more or less at the same time and stampede the 
cluster, which leads to deep valleys in observed write performance of 30-60 
seconds or longer.



On Tue, Nov 11, 2014 at 8:40 PM, Liu, Ming (HPIT-GADSC) 
wrote:

> Hi, all,
>
> I am trying to use YCSB to test on our HBase 0.98.5 instance and got a 
> strange result: update is 6x better than read. It is just an exercise, 
> so the HBase is running in a workstation in standalone mode.
> I modified the workloada shipped with YCSB into two new workloads:
> workloadr and workloadu, where workloadr is do 100% read operation and 
> workloadu is do 100% update operation. At the bottom is the workloadr 
> and workloadu config files for your reference.
>
> I found out that the read performance is much worse than the update 
> performance, read is about 6000:
>
> YCSB Client 0.1
> Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr
> -p columnfamily=family -s -t
> [OVERALL], RunTime(ms), 16565.0
> [OVERALL], Throughput(ops/sec), 6036.824630244491
>
> And the update performance is about 36000, 6x better than read.
>
> YCSB Client 0.1
> Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu
> -p columnfamily=family -s -t
> [OVERALL], RunTime(ms), 2767.0
> [OVERALL], Throughput(ops/sec), 36140.22406938923
>
> Is this possible? IMHO, read should be faster than update.
> Maybe I am wrong in the workload file? Or there is a possibility that 
> update is faster than read? I don't find a YCSB mailing list, if 
> anyone knows, please give me a link, so I can also ask question on 
> that mailing list. But is it possible that put is faster than get in 
> hbase? If not, the result must be wrong and I need to debug the YCSB 
> code to figure out what is going wrong.
>
> Workloadr:
> recordcount=10
> operationcount=10
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=1
> updateproportion=0
> scanproportion=0
> insertproportion=0
> requestdistribution=zipfian
>
> workloadu:
> recordcount=10
> operationcount=10
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=0
> updateproportion=1
> scanproportion=0
> insertproportion=0
> requestdistribution=zipfian
>
>
> Thanks,
> Ming
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-12 Thread Néstor Boscán
Yes, I already applied that.

I just wanted to understand whether, if I have a web application, I'll have
to have the Hadoop distribution installed to use the HBase client.

Regards,

Néstor
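
For readers who hit the same message: on Windows, Hadoop's client-side shell
utilities look for bin\winutils.exe under the directory named by the
hadoop.home.dir system property (or the HADOOP_HOME environment variable).
A commonly used workaround is to set that property before any Hadoop or
HBase class is loaded. The sketch below only shows that step; the class
name, the helper names, and the C:\hadoop path are hypothetical, and whether
a full Hadoop distribution is needed depends on the setup.

```java
import java.io.File;

final class HadoopHomeSketch {
    /** Sets hadoop.home.dir to fallbackDir unless it is already set; returns the effective value. */
    static String ensureHadoopHome(String fallbackDir) {
        String current = System.getProperty("hadoop.home.dir");
        if (current == null || current.isEmpty()) {
            System.setProperty("hadoop.home.dir", fallbackDir);
            current = fallbackDir;
        }
        return current;
    }

    /** Location where winutils.exe is expected, relative to the Hadoop home directory. */
    static File expectedWinutils(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe");
    }

    public static void main(String[] args) {
        // Hypothetical path; call this before creating any HBase/Hadoop objects.
        String home = ensureHadoopHome("C:\\hadoop");
        System.out.println("hadoop.home.dir = " + home);
        System.out.println("winutils present: " + expectedWinutils(home).isFile());
    }
}
```

Only the single winutils.exe binary needs to sit under that bin directory
for this lookup to succeed; the sketch does not download or create it.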

On Wed, Nov 12, 2014 at 7:57 PM, Ted Yu  wrote:

> Cycling bits: http://search-hadoop.com/m/DHED4y3J2B
>
> On Wed, Nov 12, 2014 at 3:27 PM, Néstor Boscán  wrote:
>
> > Hi
> >
> > I'm creating my first HBase application and I'm trying to connect from
> > the Java application in my Java IDE to my HBase server on a Hortonworks
> > 2.1 Virtual Machine. When I run I get:
> >
> > Failed to locate the winutils binary in the hadoop binary path
> >
> > Does this mean that I have to have hadoop installed in my laptop to be
> > able to test connections to HBase?
> >
> > Regards,
> >
> > Néstor
> >
>


Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-12 Thread Ted Yu
Cycling bits: http://search-hadoop.com/m/DHED4y3J2B

On Wed, Nov 12, 2014 at 3:27 PM, Néstor Boscán  wrote:

> Hi
>
> I'm creating my first HBase application and I'm trying to connect from the
> Java application in my Java IDE to my HBase server on a Hortonworks 2.1
> Virtual Machine. When I run I get:
>
> Failed to locate the winutils binary in the hadoop binary path
>
> Does this mean that I have to have hadoop installed in my laptop to be able
> to test connections to HBase?
>
> Regards,
>
> Néstor
>


Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-12 Thread Néstor Boscán
Hi

I'm creating my first HBase application and I'm trying to connect from the
Java application in my Java IDE to my HBase server on a Hortonworks 2.1
Virtual Machine. When I run I get:

Failed to locate the winutils binary in the hadoop binary path

Does this mean that I have to have Hadoop installed on my laptop to be able
to test connections to HBase?

Regards,

Néstor


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Otis Gospodnetic
Hi,

Thanks Gary, I think this is exactly what I was after!
Btw. it might be nice to expose this via JMX, too, for apps that need this
info but are not in-process.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Nov 12, 2014 at 4:44 PM, Gary Helmling  wrote:

> Yes, you can use the org.apache.hadoop.hbase.util.VersionInfo class.
>
> From java code, you can use VersionInfo.getVersion().  From shell
> scripts, you can just run "hbase version" and parse the output.
>
> On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic
>  wrote:
> > Hi,
> >
> > Is there a way to detect which version of HBase one is running?
> > Is there an API for that, or a constant with this value, or maybe an
> > MBean or some other way to get to this info?
> >
> > Thanks,
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
>


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Ted Yu
Java-wise, you can use this API in HBaseAdmin:

  ClusterStatus getClusterStatus() throws IOException;

ClusterStatus provides:

  public String getHBaseVersion();

Cheers

On Wed, Nov 12, 2014 at 2:06 PM, Ted Yu  wrote:

> Otis:
> You can parse the output from "status 'detailed'" command - look for the
> line starting with 'version'
>
> I checked the output from /jmx but didn't find such information there. The
> version would appear in the classpath but that's not easy to parse.
>
> One note about "hbase version" is that it returns the version of HBase
> the client was built with - not the version of the cluster the client is
> talking to.
>
> Cheers
>
> On Wed, Nov 12, 2014 at 1:49 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> Thanks, but I'm looking for something I can grab programmatically (not
>> manually), for example from a Java app.  Maybe there is some API that
>> exposes this information or an MBean?
>>
>> Here's the use case:
>> SPM monitors HBase, but HBase MBeans and metrics have changed over time.
>> How will SPM agent know which MBeans to look for, which metrics to
>> extract, and how to interpret values it extracts without knowing which
>> version of HBase it's monitoring?
>> It could try probing for some known MBeans and deduce HBase version from
>> that, but that feels a little sloppy.
>> Ideally, we'd be able to grab the version from some MBean and based on
>> that extract metrics we know are exposed in that version of HBase.
>>
>> Thanks,
>> Otis
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Wed, Nov 12, 2014 at 4:41 PM, Ted Yu  wrote:
>>
>> > Using hbase shell:
>> >
>> > hbase(main):002:0> status 'detailed'
>> > version 0.98.4.2-hadoop2
>> >
>> > Cheers
>> >
>> > On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic <
>> > otis.gospodne...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > Is there a way to detect which version of HBase one is running?
>> > > Is there an API for that, or a constant with this value, or maybe an
>> > > MBean or some other way to get to this info?
>> > >
>> > > Thanks,
>> > > Otis
>> > > --
>> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> > > Solr & Elasticsearch Support * http://sematext.com/
>> > >
>> >
>>
>
>


Re: Call for Presentations - HBase User group meeting

2014-11-12 Thread Ryan Rawson
Just popping this back to the top, we are still looking for people to
present at the HBase User Group Meetup in 2 weeks:

http://www.meetup.com/hbaseusergroup/events/205219992/

As always, food and beverages are being provided.  Come and hear about
the cool goings on in HBase land, and possibly even present a few of
your own!

-ryan


On Mon, Nov 10, 2014 at 2:58 PM, Ryan Rawson  wrote:
> Hi all,
>
> The next HBase user group meeting is on November the 20th.  We need a
> few more presenters still!
>
> Please send me your proposals - summary and outline of your talk!
>
> Thanks!
> -ryan


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Ted Yu
Otis:
You can parse the output from "status 'detailed'" command - look for the
line starting with 'version'

I checked the output from /jmx but didn't find such information there. The
version would appear in the classpath but that's not easy to parse.

One note about "hbase version" is that it returns the version of HBase
the client was built with - not the version of the cluster the client is
talking to.

Cheers
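
A minimal sketch of the parsing approach suggested above, assuming the
output of "status 'detailed'" has already been captured as a String. The
class and method names are hypothetical, and the numeric split assumes
version strings shaped like the 0.98.4.2-hadoop2 example in this thread.

```java
final class HBaseVersionSketch {
    /** Returns the text after "version " on the first matching line, or null if none. */
    static String parseVersion(String statusOutput) {
        for (String line : statusOutput.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("version ")) {
                return trimmed.substring("version ".length()).trim();
            }
        }
        return null;
    }

    /** Extracts the leading dotted numbers, dropping any "-suffix" such as "-hadoop2". */
    static int[] numericParts(String version) {
        String numeric = version.split("-", 2)[0];
        String[] pieces = numeric.split("\\.");
        int[] parts = new int[pieces.length];
        for (int i = 0; i < pieces.length; i++) {
            // Throws NumberFormatException for shapes this sketch does not expect.
            parts[i] = Integer.parseInt(pieces[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        String output = "hbase(main):002:0> status 'detailed'\nversion 0.98.4.2-hadoop2\n";
        String version = parseVersion(output);
        int[] parts = numericParts(version);
        System.out.println(version + " -> major " + parts[0] + ", minor " + parts[1]);
    }
}
```

A monitoring agent could branch on the major and minor numbers to decide
which MBeans and metrics to look for.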

On Wed, Nov 12, 2014 at 1:49 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Ted,
>
> Thanks, but I'm looking for something I can grab programmatically (not
> manually), for example from a Java app.  Maybe there is some API that
> exposes this information or an MBean?
>
> Here's the use case:
> SPM monitors HBase, but HBase MBeans and metrics have changed over time.
> How will SPM agent know which MBeans to look for, which metrics to extract,
> and how to interpret values it extracts without knowing which version of
> HBase it's monitoring?
> It could try probing for some known MBeans and deduce HBase version from
> that, but that feels a little sloppy.
> Ideally, we'd be able to grab the version from some MBean and based on that
> extract metrics we know are exposed in that version of HBase.
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Wed, Nov 12, 2014 at 4:41 PM, Ted Yu  wrote:
>
> > Using hbase shell:
> >
> > hbase(main):002:0> status 'detailed'
> > version 0.98.4.2-hadoop2
> >
> > Cheers
> >
> > On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there a way to detect which version of HBase one is running?
> > > Is there an API for that, or a constant with this value, or maybe an
> > > MBean or some other way to get to this info?
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> >
>


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Otis Gospodnetic
Hi Ted,

Thanks, but I'm looking for something I can grab programmatically (not
manually), for example from a Java app.  Maybe there is some API that
exposes this information or an MBean?

Here's the use case:
SPM monitors HBase, but HBase MBeans and metrics
have changed over time.
How will the SPM agent know which MBeans to look for, which metrics to
extract, and how to interpret the values it extracts without knowing which
version of HBase it's monitoring?
It could try probing for some known MBeans and deduce the HBase version
from that, but that feels a little sloppy.
Ideally, we'd be able to grab the version from some MBean and based on that
extract metrics we know are exposed in that version of HBase.

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Nov 12, 2014 at 4:41 PM, Ted Yu  wrote:

> Using hbase shell:
>
> hbase(main):002:0> status 'detailed'
> version 0.98.4.2-hadoop2
>
> Cheers
>
> On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi,
> >
> > Is there a way to detect which version of HBase one is running?
> > Is there an API for that, or a constant with this value, or maybe an
> > MBean or some other way to get to this info?
> >
> > Thanks,
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
>


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Gary Helmling
Yes, you can use the org.apache.hadoop.hbase.util.VersionInfo class.

From Java code, you can use VersionInfo.getVersion().  From shell
scripts, you can just run "hbase version" and parse the output.

On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> Is there a way to detect which version of HBase one is running?
> Is there an API for that, or a constant with this value, or maybe an MBean
> or some other way to get to this info?
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Ted Yu
Using hbase shell:

hbase(main):002:0> status 'detailed'
version 0.98.4.2-hadoop2

Cheers

On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Is there a way to detect which version of HBase one is running?
> Is there an API for that, or a constant with this value, or maybe an MBean
> or some other way to get to this info?
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>


Programmatic HBase version detection/extraction

2014-11-12 Thread Otis Gospodnetic
Hi,

Is there a way to detect which version of HBase one is running?
Is there an API for that, or a constant with this value, or maybe an MBean
or some other way to get to this info?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-12 Thread Andrew Purtell
Try this HBase YCSB client instead:
https://github.com/apurtell/ycsb/tree/new_hbase_client

The HBase YCSB driver in the master repo holds on to one HTable instance
per driver thread. We accumulate writes into a 12MB write buffer before
flushing them en masse. This is why the behavior you are seeing confounds
your expectations. It's not correct behavior IMHO. YCSB wants to measure
the round trip of every op, not the non-cost of local caching. Worse, if we
have a lot of driver threads accumulating 12MB of edits more or less at the
same rate, then we will flush these buffers more or less at the same time
and stampede the cluster, which leads to deep valleys in observed write
performance of 30-60 seconds or longer.
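
To see why buffered updates look so fast, here is a toy simulation of the
behavior described above. It is not the YCSB driver or the HBase client:
the class name and the 1 KB edit size are assumptions, and only the 12MB
buffer size comes from this thread. It counts how many puts actually
trigger a flush, i.e. a round trip to the cluster.

```java
final class WriteBufferSketch {
    private final long capacityBytes;
    private long bufferedBytes = 0;
    private int flushes = 0;

    WriteBufferSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Buffers one edit; returns true only if this call triggered a flush. */
    boolean put(int editBytes) {
        bufferedBytes += editBytes;
        if (bufferedBytes >= capacityBytes) {
            flushes++;          // the only point where the cluster would be contacted
            bufferedBytes = 0;
            return true;
        }
        return false;           // "completed" locally, no round trip measured
    }

    int flushes() {
        return flushes;
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(12L * 1024 * 1024);
        int puts = 100_000;
        for (int i = 0; i < puts; i++) {
            buffer.put(1024); // assumed 1 KB per edit
        }
        System.out.println(buffer.flushes() + " flushes for " + puts + " puts");
    }
}
```

With these assumed sizes, almost every put returns without touching the
cluster, so a benchmark timing individual operations mostly measures local
buffering rather than write latency.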



On Tue, Nov 11, 2014 at 8:40 PM, Liu, Ming (HPIT-GADSC) 
wrote:

> Hi, all,
>
> I am trying to use YCSB to test on our HBase 0.98.5 instance and got a
> strange result: update is 6x better than read. It is just an exercise, so
> the HBase is running in a workstation in standalone mode.
> I modified the workloada shipped with YCSB into two new workloads:
> workloadr and workloadu, where workloadr is do 100% read operation and
> workloadu is do 100% update operation. At the bottom is the workloadr and
> workloadu config files for your reference.
>
> I found out that the read performance is much worse than the update
> performance, read is about 6000:
>
> YCSB Client 0.1
> Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr -p
> columnfamily=family -s -t
> [OVERALL], RunTime(ms), 16565.0
> [OVERALL], Throughput(ops/sec), 6036.824630244491
>
> And the update performance is about 36000, 6x better than read.
>
> YCSB Client 0.1
> Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu -p
> columnfamily=family -s -t
> [OVERALL], RunTime(ms), 2767.0
> [OVERALL], Throughput(ops/sec), 36140.22406938923
>
> Is this possible? IMHO, read should be faster than update.
> Maybe I am wrong in the workload file? Or there is a possibility that
> update is faster than read? I don't find a YCSB mailing list, if anyone
> knows, please give me a link, so I can also ask question on that mailing
> list. But is it possible that put is faster than get in hbase? If not, the
> result must be wrong and I need to debug the YCSB code to figure out what
> is going wrong.
>
> Workloadr:
> recordcount=10
> operationcount=10
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=1
> updateproportion=0
> scanproportion=0
> insertproportion=0
> requestdistribution=zipfian
>
> workloadu:
> recordcount=10
> operationcount=10
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=0
> updateproportion=1
> scanproportion=0
> insertproportion=0
> requestdistribution=zipfian
>
>
> Thanks,
> Ming
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Version in HBase

2014-11-12 Thread Anoop John
So you want one version with ts <= the given ts?

Have a look at Scan#setTimeRange(long minStamp, long maxStamp)
If you know the exact ts for cells, you can use Scan#setTimeStamp(long
timestamp)

-Anoop-
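
The range semantics can be sketched without a cluster. Scan#setTimeRange
treats the range as [minStamp, maxStamp), so asking for "ts <= T" means
passing T + 1 as maxStamp, and with a single version requested the newest
cell inside the range is the one returned. The toy below models one row's
versions with a TreeMap; the class name, the method name, and the base
timestamp 1000 are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TimeRangeSketch {
    /** Newest value with minStamp <= ts < maxStamp, or null if no version is in range. */
    static String newestInRange(NavigableMap<Long, String> versions, long minStamp, long maxStamp) {
        Map.Entry<Long, String> newest =
                versions.subMap(minStamp, true, maxStamp, false).lastEntry();
        return newest == null ? null : newest.getValue();
    }

    public static void main(String[] args) {
        long t = 1000L; // hypothetical base timestamp
        NavigableMap<Long, String> row1 = new TreeMap<>();
        row1.put(t, "Val1");
        row1.put(t + 3, "Val2");
        row1.put(t + 5, "Val3");

        // "version <= t + 4" becomes the half-open range [0, t + 4 + 1).
        System.out.println(newestInRange(row1, 0L, t + 4 + 1));
    }
}
```

For the example data in this thread, the query at t + 4 selects the value
written at t + 3.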

On Wed, Nov 12, 2014 at 11:17 AM, Krishna Kalyan 
wrote:

> For Example for table 'test_table', Values inserted are:
>
> Row1 - Val1 => t
> Row1 - Val2 => t + 3
> Row1 - Val3 => t + 5
>
> Row2 - Val1 => t
> Row2 - Val2 => t + 3
> Row2 - Val3 => t + 5
>
> on scan 'test_table' where version = t + 4 should return
> Row1 - Val2 => t + 3
> Row2 - Val2 => t + 3
>
> How do I achieve timestamp-based scans?
>
> Thanks and Regards,
> Krishna
>
>
>
>
> On Wed, Nov 12, 2014 at 10:56 AM, Krishna Kalyan
> wrote:
>
> > Hi,
> > Is it possible to do a
> > select * from  where version = "somedate" ; using HBase APIs?.
> > (Scanning for values where version <= "somedate" )
> > Could you please direct me to appropriate links to achieve this?.
> >
> >
> > Regards,
> > Krishna
> >
> >
> >
>