Re: Understanding scan behaviour

2013-03-30 Thread Ted Yu
The stop row is exclusive, so a row with key 1234556 is not returned. See the javadoc of Scan:

   * @param stopRow row to stop scanner before (exclusive)
   */
  public Scan(byte [] startRow, byte [] stopRow) {
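The half-open range described by that javadoc can be illustrated without a cluster. The following is only a sketch that stands in for a table with a sorted in-memory set; the class name HalfOpenScanSketch and the sample keys are made up for illustration, and it assumes ASCII-only row keys so that Java string order matches byte order.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for a table: a sorted set of ASCII row keys.
final class HalfOpenScanSketch {

    // Returns the rows k with startRow <= k < stopRow (start inclusive, stop exclusive).
    static NavigableSet<String> scan(NavigableSet<String> rows, String startRow, String stopRow) {
        return rows.subSet(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<String>(
                Arrays.asList("1234554", "1234555", "1234555a", "1234556", "1234557"));
        // The stop row itself is excluded from the result.
        System.out.println(scan(rows, "1234555", "1234556")); // prints [1234555, 1234555a]
    }
}
```

Every key that starts with 1234555 sorts before 1234556, so using the incremented prefix as the stop row returns the whole prefix range and nothing else.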



Re: Understanding scan behaviour

2013-03-30 Thread Mohit Anchlia
Thanks, that's a good point about last byte being max :)

When I query 1234555..1234556, do I also get the row for 1234556 if one exists?


Re: Understanding scan behaviour

2013-03-30 Thread Asaf Mesika
Yes.
Watch out for the case where the last byte is already the maximum value (0xFF).
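One way to handle the max-byte case is to drop trailing 0xFF bytes before incrementing. This is a sketch only: the class and method names are hypothetical, and it relies on the HBase convention that an empty stop row means "no upper bound".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes an exclusive stop row for a prefix scan.
final class PrefixStopRowSketch {

    // Returns the smallest key that sorts after every key starting with prefix.
    // Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    // An empty result means there is no upper bound (scan to the end of the table).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // safe: this byte is not 0xFF, so it cannot wrap to 0x00
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("1234555".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // prints 1234556
    }
}
```

For a prefix such as {0x12, 0xFF} the sketch returns {0x13}, which is still greater than every key that begins with the prefix.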



Re: Understanding scan behaviour

2013-03-29 Thread Mohit Anchlia
Thanks everyone, this is really helpful. I'll switch from the prefix filter
to an end row. Is it necessary to increment the last byte? So if I have a
hash of 1234555, should my end key be 1234556?



Re: Understanding scan behaviour

2013-03-29 Thread James Taylor

Mohit,
Are you looking to reduce the amount of data you're scanning and bring
down your query time when:

- you have a multi-part row key made up of a string and a time value, and
- you know the prefix of the string and a range for the time value?

That's possible (but not easy) to do with HBase using a filter's
ability to return a seek hint to jump to the next set of contiguous
rows. If the cardinality of your string value isn't too large, this
approach can make a pretty dramatic performance improvement.
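To make the seek-hint idea concrete, here is a rough in-memory simulation, not HBase or Phoenix code. It assumes a hypothetical key layout of id, a 0x00 separator, then a zero-padded 10-digit timestamp, and it uses a sorted set in place of the table; all names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation of a skip scan over composite (id, timestamp) keys.
final class SkipScanSketch {
    static final char SEP = '\u0000'; // assumed separator between id and timestamp

    // Builds a composite key; assumes 0 <= ts < 10^10 so the padding keeps keys sortable.
    static String key(String id, long ts) {
        return id + SEP + String.format("%010d", ts);
    }

    // Returns keys whose id starts with idPrefix and whose timestamp satisfies
    // tsStart < ts < tsEnd, seeking past rows that cannot match instead of reading them.
    static List<String> skipScan(TreeSet<String> rows, String idPrefix, long tsStart, long tsEnd) {
        List<String> hits = new ArrayList<String>();
        String cursor = rows.ceiling(idPrefix);
        while (cursor != null && cursor.startsWith(idPrefix)) {
            int sep = cursor.indexOf(SEP);
            String id = cursor.substring(0, sep);
            long ts = Long.parseLong(cursor.substring(sep + 1));
            if (ts <= tsStart) {
                // Too early: seek to the first in-range timestamp for this id.
                cursor = rows.ceiling(key(id, tsStart + 1));
            } else if (ts >= tsEnd) {
                // Too late: seek past all remaining rows of this id.
                cursor = rows.ceiling(id + (char) (SEP + 1));
            } else {
                hits.add(cursor);
                cursor = rows.higher(cursor);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<String>(Arrays.asList(
                key("abc1", 100), key("abc1", 200), key("abc1", 300),
                key("abc2", 150), key("abc2", 250),
                key("abd", 200), key("xyz", 200)));
        for (String hit : skipScan(rows, "abc", 120, 260)) {
            System.out.println(hit.replace(SEP, '|'));
        }
    }
}
```

The two ceiling calls play the role of the seek hint: each one jumps over a run of contiguous rows that cannot match. The fewer distinct ids share the prefix, the fewer seeks are needed, which is why the cardinality of the string part matters.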


You should take a look at Phoenix
(https://github.com/forcedotcom/phoenix), a SQL skin on top of HBase.
We just introduced the above optimization. You'd create your table like
this:

CREATE TABLE t1 (id VARCHAR not null, timestamp DATE not null
    CONSTRAINT pk PRIMARY KEY (id, timestamp));

Then your query would look like this:

SELECT id, timestamp FROM t1
    WHERE id LIKE 'abc%' AND timestamp > ? AND timestamp < ?;

and you'd bind the ? parameters using the regular JDBC PreparedStatement APIs.

Regards,
James
@JamesPlusPlus


Re: Understanding scan behaviour

2013-03-28 Thread ramkrishna vasudevan
Mohit,

It is always better to use a start row and an end row if you know what
they are.
Just append one more byte to the actual (inclusive) end row to form the
end key. This will narrow down the search.

Remember that HBase scans compare row keys byte by byte.
Regards
Ram
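A small sketch of the "add one byte" advice, assuming the byte appended is 0x00: the last wanted row followed by a zero byte is its immediate successor in byte order, so using it as the exclusive end key includes that row and nothing after it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: turns an inclusive last row into an exclusive stop row.
final class InclusiveEndRowSketch {

    // Copies the row and pads it with one trailing 0x00 byte.
    static byte[] stopRowAfter(byte[] lastRowInclusive) {
        return Arrays.copyOf(lastRowInclusive, lastRowInclusive.length + 1);
    }

    public static void main(String[] args) {
        byte[] last = "1234555".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(stopRowAfter(last)));
        // prints [49, 50, 51, 52, 53, 53, 53, 0]
    }
}
```

This differs from incrementing the last byte: it ends the scan right after one exact row, whereas the incremented key ends it after every row sharing the prefix.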


RE: Understanding scan behaviour

2013-03-28 Thread Li, Min
Hi, Mohit,

Try using ENDROW. STARTROW and ENDROW together are much faster than PrefixFilter.

"+" ascii code is 43
"," ascii code is 44

scan 'SESSIONID_TIMELINE', {LIMIT => 1,STARTROW => '', ENDROW=>'+++,'}

Min
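The ASCII codes matter because row keys are ordered as unsigned bytes, compared left to right. A minimal sketch of that ordering, with a hypothetical class name and sample keys chosen only for illustration:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of lexicographic, unsigned byte ordering of row keys.
final class RowOrderSketch {

    // Negative if a sorts before b, positive if after, zero if equal.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // mask so bytes above 0x7F are not treated as negative
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter key sorts before its extensions
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(compareRows(ascii("+"), ascii(",")) < 0);   // true: 43 < 44
        System.out.println(compareRows(ascii("++"), ascii("+9")) < 0); // true: 43 < 57
        System.out.println(compareRows(new byte[] {0x7F}, new byte[] {(byte) 0xFC}) < 0); // true
    }
}
```

The last comparison is the one that tends to surprise in Java, where byte is signed: without the 0xFF mask, 0xFC would appear to sort before 0x7F.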


Re: Understanding scan behaviour

2013-03-28 Thread Ted Yu
See javadoc of TimestampsFilter which reveals how you can narrow the scan:

 * Note: Use of this filter overrides any time range/time stamp
 * options specified using {@link org.apache.hadoop.hbase.client.Get#setTimeRange(long, long)},
 * {@link org.apache.hadoop.hbase.client.Scan#setTimeRange(long, long)},
 * {@link org.apache.hadoop.hbase.client.Get#setTimeStamp(long)},
 * or {@link org.apache.hadoop.hbase.client.Scan#setTimeStamp(long)}.


The answer to your second question is yes.

On Thu, Mar 28, 2013 at 10:17 AM, Mohit Anchlia wrote:

> Could the prefix filter lead to full tablescan? In other words is
> PrefixFilter applied after fetching the rows?
>
> Another question I have is say I have row key abc and abd and I search for
> row "abc", is it always guaranteed to be the first key when returned from
> scanned results? If so I can always put a condition in the client app.
>
> On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu wrote:
>
> > Take a look at the following in
> > hbase-server/src/main/ruby/shell/commands/scan.rb
> > (trunk)
> >
> >   hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
> >
> > Cheers
> >
> > On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia wrote:
> >
> > > I see then I misunderstood the behaviour. My keys are id + timestamp so
> > > that I can do a range type search. So what I really want is to return a
> > > row where id matches the prefix. Is there a way to do this without
> > > having to scan large amounts of data?
> > >
> > > On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > > > Hi Mohit,
> > > >
> > > > "+" ascii code is 43
> > > > "9" ascii code is 57.
> > > >
> > > > So "+9" is coming after "++". If you don't have any row with the
> > > > exact key "+", HBase will look for the first one after this one. And in
> > > > your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
> > > >
> > > > JM
> > > >
> > > > 2013/3/28 Mohit Anchlia :
> > > > > My understanding is that the row key would start with + for
> > > > > instance.
> > > > >
> > > > > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
> > > > > jean-m...@spaggiari.org> wrote:
> > > > >
> > > > >> Hi Mohit,
> > > > >>
> > > > >> I see nothing wrong with the results below. What would I have
> > > > >> expected?
> > > > >>
> > > > >> JM
> > > > >>
> > > > >> 2013/3/28 Mohit Anchlia :
> > > > >>  > I am running 92.1 version and this is what happens.
> > > > >> >
> > > > >> >
> > > > > >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'}
> > > > >> > ROW  COLUMN+CELL
> > > > >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
> > > > >> > column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> > > > >> > value=PAGE\x09\x091363056252990\x09\x09/
> > > > >> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> > > > >> >
> > > > >> > 1 row(s) in 0.0450 seconds
> > > > > >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'}
> > > > >> > ROW  COLUMN+CELL
> > > > >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
> > > > >> > column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> > > > >> > value=PAGE\x09239923973\x091363384698919\x09/
> > > > >> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> > > > > >> > 1 row(s) in 0.0500 seconds
> > > > > >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => ''}
> > > > >> > ROW  COLUMN+CELL
> > > > >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
> > > > >> > column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> > > > >> > value=PAGE\x09\x091364404145275\x09 \x09/
> > > > >> >  E\xC2S-\x08\x1F
> > > > >> > 1 row(s) in 0.0640 seconds
> > > > >> > hbase(main):006:0>
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> > > > >> > ramkrishna.s.vasude...@gmail.com> wrote:
> > > > >> >
> > > > >> >> Same question, same time :)
> > > > >> >>
> > > > >> >> Regards
> > > > >> >> Ram
> > > > >> >>
> > > > >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> > > > >> >> ramkrishna.s.vasude...@gmail.com> wrote:
> > > > >> >>
> > > > > >> >> > Could you give us some more insights on this?
> > > > > >> >> > So you mean when you set the row key as 'azzzaaa', though this
> > > > > >> >> > row does not exist, the scanner returns some other row?  Or it is
> > > > > >> >> > giving you a row that does not exist?
> > > > > >> >> >
> > > > > >> >> > Or you mean it is doing a full table scan?
> > > > > >> >> >
> > > > > >> >> > Which version of HBase and what type of filters are you using?
> > > > >> >> > Regards
> > > > >> >> > Ram
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Th

Re: Understanding scan behaviour

2013-03-28 Thread Mohit Anchlia
Could the prefix filter lead to a full table scan? In other words, is
PrefixFilter applied after fetching the rows?

Another question: say I have row keys abc and abd and I search for
row "abc". Is it always guaranteed to be the first key returned in the
scan results? If so, I can always put a condition in the client app.

> > > >> > hbase(main):006:0>
> > > >> >
> > > >> >
> > > >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> > > >> > ramkrishna.s.vasude...@gmail.com> wrote:
> > > >> >
> > > >> >> Same question, same time :)
> > > >> >>
> > > >> >> Regards
> > > >> >> Ram
> > > >> >>
> > > >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> > > >> >> ramkrishna.s.vasude...@gmail.com> wrote:
> > > >> >>
> > > >> >> > Could you give us some more insights on this?
> > > >> >> > So you mean when you set the row key as 'azzzaaa', though this
> > row
> > > >> does
> > > >> >> > not exist, the scanner returns some other row?  Or it is giving
> > > you a
> > > >> row
> > > >> >> > that does not exist?
> > > >> >> >
> > > >> >> > Or you mean it is doing a full table scan?
> > > >> >> >
> > > >> >> > Which version of HBase and what type of filters are you using?
> > > >> >> > Regards
> > > >> >> > Ram
> > > >> >> >
> > > >> >> >
> > > >> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <
> > > >> mohitanch...@gmail.com
> > > >> >> >wrote:
> > > >> >> >
> > > >> >> >> I have key in the form of "hashedid + timestamp" but when I
> run
> > > scan
> > > >> I
> > > >> >> get
> > > >> >> >> rows for almost every value. For instance if I run scan for
> > > 'azzzaaa'
> > > >> >> that
> > > >> >> >> doesn't even exist even then I get the results.
> > > >> >> >>
> > > >> >> >> Could someone help me understand what might be going on here?
> > > >> >> >>
> > > >> >> >
> > > >> >> >
> > > >> >>
> > > >>
> > >
> >
>


Re: Understanding scan behaviour

2013-03-28 Thread Ted Yu
Take a look at the following in
hbase-server/src/main/ruby/shell/commands/scan.rb
(trunk)

  hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
(QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123,
456))"}

Cheers

On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia wrote:

> I see then I misunderstood the behaviour. My keys are id + timestamp so
> that I can do a range type search. So what I really want is to return a row
> where id matches the prefix. Is there a way to do this without having to
> scan large amounts of data?
>
>
>
> On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi Mohit,
> >
> > "+" ascii code is 43
> > "9" ascii code is 57.
> >
> > So "+9" is coming after "++". If you don't have any row with the exact
> > key "+", HBase will look for the first one after this one. And in
> > your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
> >
> > JM
> >
> > 2013/3/28 Mohit Anchlia :
> > > My understanding is that the row key would start with + for
> instance.
> > >
> > > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > >> Hi Mohit,
> > >>
> > >> I see nothing wrong with the results below. What would I have
> expected?
> > >>
> > >> JM
> > >>
> > >> 2013/3/28 Mohit Anchlia :
> > >>  > I am running 92.1 version and this is what happens.
> > >> >
> > >> >
> > >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW
> =>
> > >> > 'sdw0'}
> > >> > ROW  COLUMN+CELL
> > >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
> > >> > column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> > >> > value=PAGE\x09\x091363056252990\x09\x09/
> > >> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> > >> >
> > >> > 1 row(s) in 0.0450 seconds
> > >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW
> =>
> > >> > '--'}
> > >> > ROW  COLUMN+CELL
> > >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
> > >> > column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> > >> > value=PAGE\x09239923973\x091363384698919\x09/
> > >> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> > >> >   row(s) in 0.0500 seconds
> > >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW
> =>
> > >> > ''}
> > >> > ROW  COLUMN+CELL
> > >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
> > >> > column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> > >> > value=PAGE\x09\x091364404145275\x09 \x09/
> > >> >  E\xC2S-\x08\x1F
> > >> > 1 row(s) in 0.0640 seconds
> > >> > hbase(main):006:0>
> > >> >
> > >> >
> > >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> > >> > ramkrishna.s.vasude...@gmail.com> wrote:
> > >> >
> > >> >> Same question, same time :)
> > >> >>
> > >> >> Regards
> > >> >> Ram
> > >> >>
> > >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> > >> >> ramkrishna.s.vasude...@gmail.com> wrote:
> > >> >>
> > >> >> > Could you give us some more insights on this?
> > >> >> > So you mean when you set the row key as 'azzzaaa', though this
> row
> > >> does
> > >> >> > not exist, the scanner returns some other row?  Or it is giving
> > you a
> > >> row
> > >> >> > that does not exist?
> > >> >> >
> > >> >> > Or you mean it is doing a full table scan?
> > >> >> >
> > >> >> > Which version of HBase and what type of filters are you using?
> > >> >> > Regards
> > >> >> > Ram
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <
> > >> mohitanch...@gmail.com
> > >> >> >wrote:
> > >> >> >
> > >> >> >> I have key in the form of "hashedid + timestamp" but when I run
> > scan
> > >> I
> > >> >> get
> > >> >> >> rows for almost every value. For instance if I run scan for
> > 'azzzaaa'
> > >> >> that
> > >> >> >> doesn't even exist even then I get the results.
> > >> >> >>
> > >> >> >> Could someone help me understand what might be going on here?
> > >> >> >>
> > >> >> >
> > >> >> >
> > >> >>
> > >>
> >
>


Re: Understanding scan behaviour

2013-03-28 Thread Mohit Anchlia
I see, so I misunderstood the behaviour. My keys are id + timestamp so
that I can do a range-type search. What I really want is to return a row
where the id matches the prefix. Is there a way to do this without having to
scan large amounts of data?
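
A minimal sketch of the range approach, in plain Python rather than the HBase
client API: derive an exclusive stop row from the id prefix and read only the
keys between the two. The helper names prefix_stop_row and range_scan and the
sample keys are hypothetical. It assumes keys sort as unsigned bytes and that
the stop row is exclusive.

```python
import bisect


def prefix_stop_row(prefix: bytes):
    """Smallest key that sorts after every key starting with `prefix`.

    Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    Returns None when the prefix is empty or all 0xFF (no upper bound exists).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def range_scan(sorted_keys, start_row: bytes, stop_row):
    """Keys in [start_row, stop_row); stop_row=None means read to the end."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = len(sorted_keys) if stop_row is None else bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


# Made-up keys: an id prefix followed by one byte standing in for a timestamp.
keys = sorted([
    b"1234554" + b"\x01",
    b"1234555" + b"\x00",
    b"1234555" + b"\x7f",
    b"1234556",
])

stop = prefix_stop_row(b"1234555")            # b"1234556"
print(range_scan(keys, b"1234555", stop))     # only the two 1234555... keys
```

Because the stop row is exclusive, a row whose key is exactly 1234556 is not
returned. The 0xFF handling covers the case where the last byte of the prefix
is already at its maximum value.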



On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Hi Mohit,
>
> "+" ascii code is 43
> "9" ascii code is 57.
>
> So "+9" is coming after "++". If you don't have any row with the exact
> key "+", HBase will look for the first one after this one. And in
> your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
>
> JM
>
> 2013/3/28 Mohit Anchlia :
> > My understanding is that the row key would start with + for instance.
> >
> > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> >> Hi Mohit,
> >>
> >> I see nothing wrong with the results below. What would I have expected?
> >>
> >> JM
> >>
> >> 2013/3/28 Mohit Anchlia :
> >>  > I am running 92.1 version and this is what happens.
> >> >
> >> >
> >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> >> > 'sdw0'}
> >> > ROW  COLUMN+CELL
> >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
> >> > column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> >> > value=PAGE\x09\x091363056252990\x09\x09/
> >> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> >> >
> >> > 1 row(s) in 0.0450 seconds
> >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> >> > '--'}
> >> > ROW  COLUMN+CELL
> >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
> >> > column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> >> > value=PAGE\x09239923973\x091363384698919\x09/
> >> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> >> >   row(s) in 0.0500 seconds
> >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> >> > ''}
> >> > ROW  COLUMN+CELL
> >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
> >> > column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> >> > value=PAGE\x09\x091364404145275\x09 \x09/
> >> >  E\xC2S-\x08\x1F
> >> > 1 row(s) in 0.0640 seconds
> >> > hbase(main):006:0>
> >> >
> >> >
> >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> >> > ramkrishna.s.vasude...@gmail.com> wrote:
> >> >
> >> >> Same question, same time :)
> >> >>
> >> >> Regards
> >> >> Ram
> >> >>
> >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> >> >> ramkrishna.s.vasude...@gmail.com> wrote:
> >> >>
> >> >> > Could you give us some more insights on this?
> >> >> > So you mean when you set the row key as 'azzzaaa', though this row
> >> does
> >> >> > not exist, the scanner returns some other row?  Or it is giving
> you a
> >> row
> >> >> > that does not exist?
> >> >> >
> >> >> > Or you mean it is doing a full table scan?
> >> >> >
> >> >> > Which version of HBase and what type of filters are you using?
> >> >> > Regards
> >> >> > Ram
> >> >> >
> >> >> >
> >> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <
> >> mohitanch...@gmail.com
> >> >> >wrote:
> >> >> >
> >> >> >> I have key in the form of "hashedid + timestamp" but when I run
> scan
> >> I
> >> >> get
> >> >> >> rows for almost every value. For instance if I run scan for
> 'azzzaaa'
> >> >> that
> >> >> >> doesn't even exist even then I get the results.
> >> >> >>
> >> >> >> Could someone help me understand what might be going on here?
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >>
>


Re: Understanding scan behaviour

2013-03-28 Thread Jean-Marc Spaggiari
Hi Mohit,

"+" ascii code is 43
"9" ascii code is 57.

So "+9" sorts after "++". If you don't have any row with the exact
key "+", HBase returns the first row that sorts after it. In
your case, that is +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
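
The same behaviour can be reproduced with a sorted list, as a rough sketch in
plain Python (not HBase code). ROW_KEYS holds shortened, illustrative versions
of the three row keys from the shell output, and first_row_at_or_after is a
made-up helper name. It assumes keys compare byte by byte as unsigned values.

```python
import bisect

# Shortened, illustrative versions of the three row keys in the shell output.
ROW_KEYS = sorted([b"+9hC\xfc", b"-\xa1\xaf>r", b"s\xc1\xeaR"])


def first_row_at_or_after(sorted_keys, start_row: bytes):
    """Return the first key that is >= start_row, or None if there is none."""
    i = bisect.bisect_left(sorted_keys, start_row)
    return sorted_keys[i] if i < len(sorted_keys) else None


for start in (b"+", b"--", b"sdw0", b"azzzaaa", b"t"):
    print(start, "->", first_row_at_or_after(ROW_KEYS, start))
```

A start row of 'azzzaaa' lands on the key beginning with "s" because "a"
(0x61) sorts after "+" (0x2B) and "-" (0x2D) but before "s" (0x73). The start
row does not have to exist for the scan to return something.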

JM

2013/3/28 Mohit Anchlia :
> My understanding is that the row key would start with + for instance.
>
> On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> Hi Mohit,
>>
>> I see nothing wrong with the results below. What would I have expected?
>>
>> JM
>>
>> 2013/3/28 Mohit Anchlia :
>>  > I am running 92.1 version and this is what happens.
>> >
>> >
>> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
>> > 'sdw0'}
>> > ROW  COLUMN+CELL
>> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
>> > column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
>> > value=PAGE\x09\x091363056252990\x09\x09/
>> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
>> >
>> > 1 row(s) in 0.0450 seconds
>> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
>> > '--'}
>> > ROW  COLUMN+CELL
>> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
>> > column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
>> > value=PAGE\x09239923973\x091363384698919\x09/
>> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
>> >   row(s) in 0.0500 seconds
>> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
>> > ''}
>> > ROW  COLUMN+CELL
>> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
>> > column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
>> > value=PAGE\x09\x091364404145275\x09 \x09/
>> >  E\xC2S-\x08\x1F
>> > 1 row(s) in 0.0640 seconds
>> > hbase(main):006:0>
>> >
>> >
>> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
>> > ramkrishna.s.vasude...@gmail.com> wrote:
>> >
>> >> Same question, same time :)
>> >>
>> >> Regards
>> >> Ram
>> >>
>> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
>> >> ramkrishna.s.vasude...@gmail.com> wrote:
>> >>
>> >> > Could you give us some more insights on this?
>> >> > So you mean when you set the row key as 'azzzaaa', though this row
>> does
>> >> > not exist, the scanner returns some other row?  Or it is giving you a
>> row
>> >> > that does not exist?
>> >> >
>> >> > Or you mean it is doing a full table scan?
>> >> >
>> >> > Which version of HBase and what type of filters are you using?
>> >> > Regards
>> >> > Ram
>> >> >
>> >> >
>> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <
>> mohitanch...@gmail.com
>> >> >wrote:
>> >> >
>> >> >> I have key in the form of "hashedid + timestamp" but when I run scan
>> I
>> >> get
>> >> >> rows for almost every value. For instance if I run scan for 'azzzaaa'
>> >> that
>> >> >> doesn't even exist even then I get the results.
>> >> >>
>> >> >> Could someone help me understand what might be going on here?
>> >> >>
>> >> >
>> >> >
>> >>
>>


Re: Understanding scan behaviour

2013-03-28 Thread Mohit Anchlia
My understanding is that the row key would start with + for instance.

On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Hi Mohit,
>
> I see nothing wrong with the results below. What would I have expected?
>
> JM
>
> 2013/3/28 Mohit Anchlia :
>  > I am running 92.1 version and this is what happens.
> >
> >
> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> > 'sdw0'}
> > ROW  COLUMN+CELL
> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
> > column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> > value=PAGE\x09\x091363056252990\x09\x09/
> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> >
> > 1 row(s) in 0.0450 seconds
> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> > '--'}
> > ROW  COLUMN+CELL
> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
> > column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> > value=PAGE\x09239923973\x091363384698919\x09/
> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> >   row(s) in 0.0500 seconds
> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> > ''}
> > ROW  COLUMN+CELL
> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
> > column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> > value=PAGE\x09\x091364404145275\x09 \x09/
> >  E\xC2S-\x08\x1F
> > 1 row(s) in 0.0640 seconds
> > hbase(main):006:0>
> >
> >
> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> >> Same question, same time :)
> >>
> >> Regards
> >> Ram
> >>
> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> >> ramkrishna.s.vasude...@gmail.com> wrote:
> >>
> >> > Could you give us some more insights on this?
> >> > So you mean when you set the row key as 'azzzaaa', though this row
> does
> >> > not exist, the scanner returns some other row?  Or it is giving you a
> row
> >> > that does not exist?
> >> >
> >> > Or you mean it is doing a full table scan?
> >> >
> >> > Which version of HBase and what type of filters are you using?
> >> > Regards
> >> > Ram
> >> >
> >> >
> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <
> mohitanch...@gmail.com
> >> >wrote:
> >> >
> >> >> I have key in the form of "hashedid + timestamp" but when I run scan
> I
> >> get
> >> >> rows for almost every value. For instance if I run scan for 'azzzaaa'
> >> that
> >> >> doesn't even exist even then I get the results.
> >> >>
> >> >> Could someone help me understand what might be going on here?
> >> >>
> >> >
> >> >
> >>
>


Re: Understanding scan behaviour

2013-03-28 Thread Jean-Marc Spaggiari
Hi Mohit,

I see nothing wrong with the results below. What would you have expected?

JM

2013/3/28 Mohit Anchlia :
> I am running 92.1 version and this is what happens.
>
>
> hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> 'sdw0'}
> ROW  COLUMN+CELL
>  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
> column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> value=PAGE\x09\x091363056252990\x09\x09/
>  7F\xFF\xFE\xC2\xA3\x84Z\x7F
>
> 1 row(s) in 0.0450 seconds
> hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> '--'}
> ROW  COLUMN+CELL
>  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
> column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> value=PAGE\x09239923973\x091363384698919\x09/
>  xFF\xFE\xC2\x8F\xF0\xC1\xBF
>   row(s) in 0.0500 seconds
> hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
> ''}
> ROW  COLUMN+CELL
>  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
> column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> value=PAGE\x09\x091364404145275\x09 \x09/
>  E\xC2S-\x08\x1F
> 1 row(s) in 0.0640 seconds
> hbase(main):006:0>
>
>
> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
>> Same question, same time :)
>>
>> Regards
>> Ram
>>
>> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
>> ramkrishna.s.vasude...@gmail.com> wrote:
>>
>> > Could you give us some more insights on this?
>> > So you mean when you set the row key as 'azzzaaa', though this row does
>> > not exist, the scanner returns some other row?  Or it is giving you a row
>> > that does not exist?
>> >
>> > Or you mean it is doing a full table scan?
>> >
>> > Which version of HBase and what type of filters are you using?
>> > Regards
>> > Ram
>> >
>> >
>> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia > >wrote:
>> >
>> >> I have key in the form of "hashedid + timestamp" but when I run scan I
>> get
>> >> rows for almost every value. For instance if I run scan for 'azzzaaa'
>> that
>> >> doesn't even exist even then I get the results.
>> >>
>> >> Could someone help me understand what might be going on here?
>> >>
>> >
>> >
>>


Re: Understanding scan behaviour

2013-03-28 Thread Mohit Anchlia
I am running version 0.92.1 and this is what happens.


hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
'sdw0'}
ROW  COLUMN+CELL
 s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
value=PAGE\x09\x091363056252990\x09\x09/
 7F\xFF\xFE\xC2\xA3\x84Z\x7F

1 row(s) in 0.0450 seconds
hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
'--'}
ROW  COLUMN+CELL
 -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
value=PAGE\x09239923973\x091363384698919\x09/
 xFF\xFE\xC2\x8F\xF0\xC1\xBF
1 row(s) in 0.0500 seconds
hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW =>
''}
ROW  COLUMN+CELL
 +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
value=PAGE\x09\x091364404145275\x09 \x09/
 E\xC2S-\x08\x1F
1 row(s) in 0.0640 seconds
hbase(main):006:0>
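
To compare keys like these by hand it helps to turn the shell's escaped form
back into raw bytes. The following is a small sketch in plain Python.
decode_shell_key is a hypothetical helper, and it assumes that the two
fragments printed for the first row are one row key wrapped across two lines
and that every unescaped character is printable ASCII.

```python
import re

# Either a \xNN escape (group 1) or any single literal character (group 2).
_TOKEN = re.compile(r"\\x([0-9A-Fa-f]{2})|(.)", re.DOTALL)


def decode_shell_key(text: str) -> bytes:
    """Convert shell-style output such as 's\\xC1\\xEA' into raw bytes."""
    out = bytearray()
    for hex_pair, literal in _TOKEN.findall(text):
        out.append(int(hex_pair, 16) if hex_pair else ord(literal))
    return bytes(out)


# The two fragments shown for the first scan, joined back together.
wrapped = [
    r"s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x",
    r"7F\xFF\xFE\xC2\xA3\x84Z\x7F",
]
key = decode_shell_key("".join(wrapped))

print(len(key))    # 24
print(key.hex())
```

Under that joining assumption the first row key is 24 bytes long. Its first
byte is "s" (0x73), the same as the first byte of the start row 'sdw0'.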


On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Same question, same time :)
>
> Regards
> Ram
>
> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Could you give us some more insights on this?
> > So you mean when you set the row key as 'azzzaaa', though this row does
> > not exist, the scanner returns some other row?  Or it is giving you a row
> > that does not exist?
> >
> > Or you mean it is doing a full table scan?
> >
> > Which version of HBase and what type of filters are you using?
> > Regards
> > Ram
> >
> >
> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia  >wrote:
> >
> >> I have key in the form of "hashedid + timestamp" but when I run scan I
> get
> >> rows for almost every value. For instance if I run scan for 'azzzaaa'
> that
> >> doesn't even exist even then I get the results.
> >>
> >> Could someone help me understand what might be going on here?
> >>
> >
> >
>


Re: Understanding scan behaviour

2013-03-27 Thread ramkrishna vasudevan
Same question, same time :)

Regards
Ram

On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Could you give us some more insights on this?
> So you mean when you set the row key as 'azzzaaa', though this row does
> not exist, the scanner returns some other row?  Or it is giving you a row
> that does not exist?
>
> Or you mean it is doing a full table scan?
>
> Which version of HBase and what type of filters are you using?
> Regards
> Ram
>
>
> On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia wrote:
>
>> I have key in the form of "hashedid + timestamp" but when I run scan I get
>> rows for almost every value. For instance if I run scan for 'azzzaaa' that
>> doesn't even exist even then I get the results.
>>
>> Could someone help me understand what might be going on here?
>>
>
>


Re: Understanding scan behaviour

2013-03-27 Thread ramkrishna vasudevan
Could you give us some more insight into this?
So you mean that when you set the row key to 'azzzaaa', though this row does
not exist, the scanner returns some other row?  Or is it giving you a row that
does not exist?

Or do you mean it is doing a full table scan?

Which version of HBase and what type of filters are you using?
Regards
Ram

On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia wrote:

> I have key in the form of "hashedid + timestamp" but when I run scan I get
> rows for almost every value. For instance if I run scan for 'azzzaaa' that
> doesn't even exist even then I get the results.
>
> Could someone help me understand what might be going on here?
>


Re: Understanding scan behaviour

2013-03-27 Thread Ted Yu
Can you give us some more information?
What version of HBase are you using?

bq. even then I get the results
Can you specify what results? Obviously the result couldn't have been
'azzzaaa'.

Cheers

On Wed, Mar 27, 2013 at 9:15 PM, Mohit Anchlia wrote:

> I have key in the form of "hashedid + timestamp" but when I run scan I get
> rows for almost every value. For instance if I run scan for 'azzzaaa' that
> doesn't even exist even then I get the results.
>
> Could someone help me understand what might be going on here?
>