Retrieve HBase row based on dynamic qualifier filters

2016-11-07 Thread Mukesh Jha
Hello HBase users,

I have an HBase table with timestamp-based dynamic column qualifiers; a row
can look like *[1]* below.

I want to use a qualifier filter that checks for the presence of a
timestamp, say ts1, and if it's present it should return the whole
HBase row.

I looked into using QualifierFilter *[2]*, but it returns only the matched
cells and not the entire HBase row.

What is the best way to deal with this scenario?

Thanks for the help, you guys are super helpful.

*[1] HBase row*
rowKey1 => {
  cf1 => {
    ts1 => val1,
    ts2 => val2,
    ts6 => val6,
    foo => Array[Byte],
    bar => Array[Byte]
  }
}

*[2]*
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html
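One workaround worth trying (a sketch only, assuming an HBase 0.98/1.x-era client API and the cf1/ts1 names from *[1]*): a SingleColumnValueFilter with setFilterIfMissing(true), compared GREATER_OR_EQUAL against an empty byte array, matches any value — so it effectively tests for the qualifier's presence, and matching rows come back whole:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PresenceScan {

    // Build a Scan that returns entire rows, but only for rows
    // where the cell cf1:ts1 exists (any value matches).
    public static Scan buildScan() {
        SingleColumnValueFilter presence = new SingleColumnValueFilter(
                Bytes.toBytes("cf1"), Bytes.toBytes("ts1"),
                CompareOp.GREATER_OR_EQUAL, new byte[0]);
        presence.setFilterIfMissing(true);  // drop rows lacking cf1:ts1
        Scan scan = new Scan();
        scan.setFilter(presence);           // matching rows are returned whole
        return scan;
    }
}
```

One caveat to verify on your version: SingleColumnValueFilter can only see columns that are part of the scan, so don't restrict the scan to a subset of columns that excludes the tested qualifier.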

-- 
Thanks & Regards,

*Mukesh Jha *


HBase filter based on a dynamic column qualifier's value

2016-02-11 Thread Mukesh Jha
Hi

I'm storing an array of attributes, using attribute+index as the qualifier
name; my column family name is 'cf' here.

Each row in my HBase table looks like below.

cf:attribute1 -> 'value apple'
cf:attribute2 -> 'value banana'
cf:attribute{N} -> 'value iphone'
cf:someId1 -> 
cf:someOtherQualifier -> 'value e'


While reading data out of HBase I want to scan my table and use a *ValueFilter*
on the cf:attribute* columns for a value (say "apple").

On a match I want the entire rows to be returned.

Below are the possible solutions I see:

   - Add multiple SingleColumnValueFilters, one per attribute*. But I
   do not know the number of items that will be present in attribute*, and
   the list might go up to 100, so will it affect scan performance?
   - Store the attributes ArrayList as an ArrayWritable [1]; I'm not sure how
   the scan filters will work here. If any of you have experience with this,
   please help.
   - Implement my own filter and deploy it on all my region servers.

*[1]:*
http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.action/0.2.0/org/apache/hadoop/io/ArrayWritable.java
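Since the attribute count is unknown, one way to sidestep building N SingleColumnValueFilters is a two-phase read (a sketch under assumptions: 0.98-era client API, 'apple' and the class/method names here are illustrative): first scan with ColumnPrefixFilter + ValueFilter to collect matching row keys, then re-fetch those rows in full with a multi-Get:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class AttributeMatch {

    // Phase-1 filter: keep only cf:attribute* cells whose value equals `value`.
    public static FilterList buildMatchFilter(byte[] value) {
        return new FilterList(FilterList.Operator.MUST_PASS_ALL,
                new ColumnPrefixFilter(Bytes.toBytes("attribute")),
                new ValueFilter(CompareOp.EQUAL, new BinaryComparator(value)));
    }

    // Phase 1 collects matching row keys; phase 2 re-fetches whole rows.
    public static Result[] fetchFullRows(HTableInterface table, byte[] value)
            throws IOException {
        Scan scan = new Scan();
        scan.setFilter(buildMatchFilter(value));
        List<Get> gets = new ArrayList<Get>();
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                gets.add(new Get(r.getRow()));  // whole row on the second pass
            }
        } finally {
            scanner.close();
        }
        return table.get(gets);  // one multi-Get for all matched rows
    }
}
```

The second round trip costs extra RPCs, but it avoids both the unknown-N filter list and shipping a custom filter jar to every region server; the phase-1 scan only transfers the matching attribute cells, not the full rows.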

-- 
Thanks & Regards,

*Mukesh Jha *


Re: High get/scan rates on HBase table even if no readers are on

2015-11-30 Thread Mukesh Jha
On Mon, Nov 30, 2015 at 10:35 PM, Stack  wrote:

> On Fri, Nov 27, 2015 at 10:11 AM, Mukesh Jha 
> wrote:
>
> > I'm working with cloudera hbase v0.98, my HBase table has ~5k regions.
> >
> >
> How many servers do you have carrying the 5k regions?
>
I've 50 nodes hosting these regions.

>
>
> > From the cloudera UI charts i see a lot of get & scan operations active
> on
> > my table even after i shut down all the reader applications.
> >
> >
> Then, there must be an application still running?

I think Cloudera's total_get_rates are cumulative in nature
(total_read_requests_rate_across_regionservers, but the graph shows the rate
in ops/sec, so I'm still confused here) and hence are showing up in the
graph. When I check the per-table get/scan rates, they come down to 0 on
bringing down all the applications.

SELECT total_scan_next_rate_across_hregions  // shows a rate of ~5k ops/sec
SELECT scan_next_rate                        // shows ~500 ops/sec

>
>
> > I'm suspecting that this is impacting my scan performance.
> >
> > So I'd like to know if there is a way by which i can identify the hosts
> > calling  these get/scan operations? I tried netstat and similar linux
> > commands without much luck.
> >
>
> You can do as Samir suggests. You could also do it on one server only
> temporarily via the RegionServer UI. Look along the top of the webpage for
> Log Level.
>
I'm planning to do that, but that'd need a region server restart. Is there
any other way I can trace the calls?

>
> St.Ack
>



-- 


Thanks & Regards,

*Mukesh Jha *


Re: High get/scan rates on HBase table even if no readers are on

2015-11-30 Thread Mukesh Jha
Any clue guys?

Because of this I am getting a lot of slow scans.

From the HBase RegionServer logs:

hbase5.usdc2.cloud.com 2015-11-30 09:10:53,592 WARN
org.apache.hadoop.ipc.RpcServer: (responseTooSlow):
{"processingtimems":10630,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"
10.193.150.127:37070
","starttimems":1448874642962,"queuetimems":1,"class":"HRegionServer","responsesize":12,"method":"Scan"}


On Fri, Nov 27, 2015 at 11:41 PM, Mukesh Jha 
wrote:

> I'm working with cloudera hbase v0.98, my HBase table has ~5k regions.
>
> From the cloudera UI charts i see a lot of get & scan operations active on
> my table even after i shut down all the reader applications.
>
> I'm suspecting that this is impacting my scan performance.
>
> So I'd like to know if there is a way by which i can identify the hosts
> calling  these get/scan operations? I tried netstat and similar linux
> commands without much luck.
>



-- 


Thanks & Regards,

*Mukesh Jha *


High get/scan rates on HBase table even if no readers are on

2015-11-27 Thread Mukesh Jha
I'm working with Cloudera HBase v0.98; my HBase table has ~5k regions.

From the Cloudera UI charts I see a lot of get & scan operations active on
my table even after I shut down all the reader applications.

I'm suspecting that this is impacting my scan performance.

So I'd like to know if there is a way by which I can identify the hosts
calling these get/scan operations. I tried netstat and similar Linux
commands without much luck.


Re: Spark-HBase connector

2014-12-19 Thread Mukesh Jha
Thanks Stack, looks promising; will give it a try.

On Fri, Dec 19, 2014 at 3:28 AM, Stack  wrote:
>
> On Tue, Dec 16, 2014 at 10:52 AM, Stack  wrote:
> >
> > On Sun, Dec 14, 2014 at 10:49 PM, Mukesh Jha 
> > wrote:
> >>
> >> Hello Experts,
> >>
> >> I've come across multiple posts where users want to read/write to hbase
> >> from Spark/Spark-streaming apps and everyone has to implement the same
> >> logic.
> >>
> >> Does HBase has (or is there any ongoing work for the same) a spark
> >> connector similar to the cassandra connector *[1]*?
> >>
> >> *[1] *https://github.com/datastax/spark-cassandra-connector
> >>
> >>
> > You might try:
> >
> > https://github.com/tmalaska/SparkOnHBase
> >
>
> On the above, Ted Malaska just posted a nice how to blog:
> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
> (Pardon the vendor colors; we'll work on getting how to into the refguide).
>
> St.Ack
>


-- 


Thanks & Regards,

*Mukesh Jha *


Spark-HBase connector

2014-12-14 Thread Mukesh Jha
Hello Experts,

I've come across multiple posts where users want to read/write to HBase
from Spark/Spark Streaming apps, and everyone has to implement the same
logic.

Does HBase have (or is there any ongoing work on) a Spark connector
similar to the Cassandra connector *[1]*?

*[1] *https://github.com/datastax/spark-cassandra-connector

-- 
Thanks & Regards,

*Mukesh Jha *