RE: Hbase Count Aggregate Function

2013-01-01 Thread Dalia Sobhy

Thanks Ram,

The issue is resolved. I forgot to add
scan.setFilter(filterList);

That's why it was not filtering!
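
The symptom described in this thread (a count equal to the total number of rows) is what one would expect when a filter is built but never attached to the scan. The sketch below mimics that in plain Java with no HBase dependency; rowCount and the list of diagnosis strings are stand-ins invented for illustration, not HBase APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ForgottenFilter {
    // Stand-in for a filtered count: a null filter means "nothing attached",
    // so every row is counted.
    static long rowCount(List<String> rows, Predicate<String> filter) {
        return rows.stream()
                   .filter(row -> filter == null || filter.test(row))
                   .count();
    }

    public static void main(String[] args) {
        List<String> diagnoses =
            Arrays.asList("cardiac", "renal", "cardiac", "renal", "cardiac");
        Predicate<String> cardiacOnly = d -> d.contains("cardiac");

        // Filter built but never attached: the total comes back.
        System.out.println(rowCount(diagnoses, null));        // prints 5
        // Filter attached: only matching rows are counted.
        System.out.println(rowCount(diagnoses, cardiacOnly)); // prints 3
    }
}
```

In the real client the equivalent of passing the predicate is attaching the filter to the Scan object before handing the scan to rowCount.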


Re: Hbase Count Aggregate Function

2013-01-01 Thread ramkrishna vasudevan
Oh...Oops..

Regards
Ram


Re: Hbase Count Aggregate Function

2012-12-26 Thread ramkrishna vasudevan
Dalia,

I tried out this example:

{code}
  private static final byte[] TEST_TABLE = Bytes.toBytes("TestTable");
  private static final byte[] TEST_FAMILY = Bytes.toBytes("TestFamily");
  private static final byte[] TEST_QUALIFIER = Bytes.toBytes("TestQualifier");
  private static final byte[] TEST_MULTI_CQ = Bytes.toBytes("TestMultiCQ");

  private static byte[] ROW = Bytes.toBytes("testRow");
  private static final int ROWSIZE = 20;
  private static final int rowSeperator1 = 5;
  private static final int rowSeperator2 = 12;
  private static byte[][] ROWS = makeN(ROW, ROWSIZE);

  // Setup: write ROWSIZE rows; row i stores the long value i under TEST_QUALIFIER.
  for (int i = 0; i < ROWSIZE; i++) {
    Put put = new Put(ROWS[i]);
    put.setWriteToWAL(false);
    long l = i;
    put.add(TEST_FAMILY, TEST_QUALIFIER, Bytes.toBytes(l));
    table.put(put);
    Put p2 = new Put(ROWS[i]);
    p2.setWriteToWAL(false);
    p2.add(TEST_FAMILY, Bytes.add(TEST_MULTI_CQ, Bytes.toBytes(l)),
        Bytes.toBytes(l * 10));
    table.put(p2);
  }

  // Test body: count only the rows whose TEST_QUALIFIER value equals 4.
  AggregationClient aClient = new AggregationClient(conf);
  Scan scan = new Scan();
  scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
  final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
  SingleColumnValueFilter scvf = new SingleColumnValueFilter(TEST_FAMILY,
      TEST_QUALIFIER, CompareOp.EQUAL, Bytes.toBytes(4L));
  scan.setFilter(scvf);
  long rowCount = aClient.rowCount(TEST_TABLE, ci, scan);
  // With the filter applied, rowCount is no longer ROWSIZE, so this fails.
  assertEquals(ROWSIZE, rowCount);
{code}

So this assertion fails, which means the filter is working as expected.  If
you want to try it out, check the test case
TestAggregateProtocol.testRowCountAllTable().
Just modify the test case so that you pass a SingleColumnValueFilter.  It
works fine.

Please check and let me know.  Maybe I am making some mistake.

Regards
Ram
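
To see why the assertion against ROWSIZE fails once the filter is honoured: the test writes the values 0 to 19 as 8-byte big-endian longs, and only one of them is byte-for-byte equal to the encoding of 4. A plain-Java sketch of that arithmetic, with no HBase dependency; toBytes here is a local helper written to produce the same layout as Bytes.toBytes(long), and countMatches is a made-up name:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class LongValueMatch {
    // 8 bytes, big-endian (ByteBuffer's default order), the same layout
    // that Bytes.toBytes(long) produces.
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Counts how many of the values 0..rowSize-1 encode to exactly the
    // same bytes as the target value.
    static int countMatches(int rowSize, long target) {
        byte[] wanted = toBytes(target);
        int matches = 0;
        for (long i = 0; i < rowSize; i++) {
            if (Arrays.equals(toBytes(i), wanted)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(20, 4L)); // prints 1
    }
}
```

So a count of 1 rather than 20 is the result to expect from the filtered test above, assuming the coprocessor applies the SingleColumnValueFilter.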


RE: Hbase Count Aggregate Function

2012-12-25 Thread Dalia Sobhy

Do you mean I should implement a new rowCount method in the AggregationClient class?

I cannot understand; could you illustrate with a code sample, Ram?

  Date: Tue, 25 Dec 2012 00:21:14 +0530
  Subject: Re: Hbase Count Aggregate Function
  From: ramkrishna.s.vasude...@gmail.com
  To: user@hbase.apache.org
  
  Hi
  You could implement a custom filter similar to
  FirstKeyOnlyFilter.
  Implement the filterKeyValue method so that it matches your KeyValue
  (the specific qualifier that you are looking for).
  
  Deploy it in your cluster.  It should work.
  
  Regards
  Ram
  
  On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <dalia.mohso...@hotmail.com> wrote:


   So do you have a suggestion for how to make the filter work?
  
Date: Mon, 24 Dec 2012 22:22:49 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org
   
Okie, seeing the shell script and the code I feel that while you use this
counter, the user's filter is not taken into account.
It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
   
Regards
Ram
   
On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <dalia.mohso...@hotmail.com> wrote:


 Yes, scan gives the correct number of rows, while count returns the total
 number of rows.

 Both are using the same filter. I even tried it using the Java API, with the
 rowCount method:

 rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

 I get the total number of rows, not the number of rows filtered.

 So, any idea?

 Thanks Ram :)

  Date: Mon, 24 Dec 2012 21:57:54 +0530
  Subject: Re: Hbase Count Aggregate Function
  From: ramkrishna.s.vasude...@gmail.com
  To: user@hbase.apache.org
 
  So you find that scan with a filter and count with the same filter are
  giving you different results?
 
  Regards
  Ram
 
  On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <dalia.mohso...@hotmail.com> wrote:


   Dear all,

   I have 50,000 rows with diagnosis qualifier = cardiac, and another 50,000
   rows with renal.

   When I type this in the HBase shell:
  
   import org.apache.hadoop.hbase.filter.CompareFilter
   import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
   import org.apache.hadoop.hbase.filter.SubstringComparator
   import org.apache.hadoop.hbase.util.Bytes
  
   scan 'patient', { COLUMNS => 'info:diagnosis', FILTER =>
   SingleColumnValueFilter.new(Bytes.toBytes('info'),
   Bytes.toBytes('diagnosis'),
   CompareFilter::CompareOp.valueOf('EQUAL'),
   SubstringComparator.new('cardiac'))}

   Output = 50,000 rows
  
   import org.apache.hadoop.hbase.filter.CompareFilter
   import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
   import org.apache.hadoop.hbase.filter.SubstringComparator
   import org.apache.hadoop.hbase.util.Bytes
  
   count 'patient', { COLUMNS => 'info:diagnosis', FILTER =>
   SingleColumnValueFilter.new(Bytes.toBytes('info'),
   Bytes.toBytes('diagnosis'),
   CompareFilter::CompareOp.valueOf('EQUAL'),
   SubstringComparator.new('cardiac'))}

   Output = 100,000 rows
  
   I also tried it using the HBase Java API with an AggregationClient
   instance, and I enabled the aggregation coprocessor for the table:
   rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

   Also, when measuring performance after adding more nodes, the operation
   takes the same time.

   So any advice please?

   I have been going through all this mess for a couple of weeks.
  
   Thanks,


  
  
 
  

Re: Hbase Count Aggregate Function

2012-12-25 Thread yuzhihong
The rowCount method accepts a Scan object, to which you can attach your custom filter.

Cheers
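
Since a code sample was asked for, here is a minimal sketch of attaching a filter to the Scan before calling rowCount. It assumes an HBase 0.94-era client on the classpath, a reachable cluster, the aggregation coprocessor enabled on the table, and the 'patient' table with the info:diagnosis column from the original question. The class name FilteredRowCount is made up, and setFilterIfMissing(true) is an added choice, not something taken from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical class name; table and column names come from the thread.
public class FilteredRowCount {
  public static void main(String[] args) throws Throwable {
    byte[] table = Bytes.toBytes("patient");
    byte[] family = Bytes.toBytes("info");
    byte[] qualifier = Bytes.toBytes("diagnosis");

    // Keep only rows whose info:diagnosis value contains "cardiac".
    SingleColumnValueFilter scvf = new SingleColumnValueFilter(
        family, qualifier, CompareOp.EQUAL, new SubstringComparator("cardiac"));
    scvf.setFilterIfMissing(true); // assumption: skip rows lacking the column

    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filterList.addFilter(scvf);

    Scan scan = new Scan();
    scan.addColumn(family, qualifier);
    scan.setFilter(filterList); // without this call the filter is never applied

    Configuration conf = HBaseConfiguration.create();
    AggregationClient aggregationClient = new AggregationClient(conf);
    long rowCount =
        aggregationClient.rowCount(table, new LongColumnInterpreter(), scan);
    System.out.println("rows matching filter: " + rowCount);
  }
}
```

If the printed count still equals the table total, checking that scan.getFilter() is non-null just before the rowCount call is a quick way to rule out an unattached filter.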




Re: Hbase Count Aggregate Function

2012-12-25 Thread ramkrishna vasudevan
@Dalia

I think the aggregation client should work with what you have passed.  What
I meant in the previous mail was with table.count(); now, with
AggregationClient:
{code}
if (scan.getFilter() == null && qualifier == null)
  scan.setFilter(new FirstKeyOnlyFilter());
{code}

So as you have passed the filter, it should work the way the SCVF should
work.  I can check this out during free time (maybe tomorrow).
If not, you can raise a bug.  If it turns out to be fine then we can close it;
otherwise it's better we fix it.
I can understand your urgency in this.

Regards
Ram







RE: Hbase Count Aggregate Function

2012-12-25 Thread Dalia Sobhy

Thanks Ram,

I have tried it a lot.

I even tried it in the HBase shell, by scanning using filters.

Using scan, it returns the right number. But the AggregationClient rowCount
method still returns the wrong number, as if it cannot see the filter.
Even when I sent it values that match nothing, so that it should return zero,
it returned the total number of rows in the table.

So what do you think?


RE: Hbase Count Aggregate Function

2012-12-25 Thread Dalia Sobhy

Is there a problem with letting the ID (row key) be an int value?
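
On the row key question: HBase treats a row key as an opaque byte array and sorts keys by unsigned byte comparison, so an int ID is permitted; what changes is the ordering, depending on how the int is encoded. A small sketch in plain Java with no HBase dependency; compareBytes, asText and asBinary are illustrative helpers, with asBinary producing the same 4-byte big-endian layout as Bytes.toBytes(int):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeyOrder {
    // Unsigned lexicographic comparison of two byte arrays, the order in
    // which a byte-sorted store arranges its keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // The ID written as decimal text, e.g. 10 -> "10".
    static byte[] asText(int id) {
        return Integer.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    // The ID written as 4 big-endian bytes.
    static byte[] asBinary(int id) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(id).array();
    }

    public static void main(String[] args) {
        // As text, "10" sorts before "2".
        System.out.println(compareBytes(asText(10), asText(2)) < 0);     // prints true
        // As binary, 10 sorts after 2, matching numeric order.
        System.out.println(compareBytes(asBinary(10), asBinary(2)) > 0); // prints true
    }
}
```

Neither encoding should affect whether a SingleColumnValueFilter on a column matches; the difference only matters for key ordering and range scans. Negative ints in the binary form sort after positive ones, because the sign bit is compared as an unsigned byte.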

 Date: Tue, 25 Dec 2012 22:44:00 +0530
 Subject: Re: Hbase Count Aggregate Function
 From: ramkrishna.s.vasude...@gmail.com
 To: user@hbase.apache.org
 
 @Dalia
 
 I think the aggregation client should work with what you have passed.  What
 i meant in the previous mail was with table.count() and now with
 AggregationClient.
 {code}
 if (scan.getFilter() == null  qualifier == null)
   scan.setFilter(new FirstKeyOnlyFilter());
 {code}
 
 So as you have passed the filter then it should work as how the SCVF should
 work.  I can check this out during free time (may be tomorrow).
 If not you can raise a bug.  If it turns to be fine then we can close it
 out otherwise its better we fix it.
 I can understand your urgency in this.
 
 Regards
 Ram
 
 
 
 
 
 On Tue, Dec 25, 2012 at 10:27 PM, yuzhih...@gmail.com wrote:
 
  RowCount method accepts scan object where you can attach your custom
  filter.
 
  Cheers
 
 
 
  On Dec 25, 2012, at 8:42 AM, Dalia Sobhy dalia.mohso...@hotmail.com
  wrote:
 
  
   Do you mean I implement a new rowCount method in Aggregation Client
  Class.
  
   I cannot understand, could u illustrate with a code sample Ram?
  

Re: Hbase Count Aggregate Function

2012-12-24 Thread Jean-Marc Spaggiari
Hi Dalia,

You already sent the same question yesterday ;) Just give people some time
to look at it.

JM

2012/12/24, Dalia Sobhy dalia.mohso...@hotmail.com:

 Dear all,

 I have 50,000 rows with the diagnosis qualifier = cardiac, and another 50,000
 rows with renal.

 When I type this in the HBase shell,

 import org.apache.hadoop.hbase.filter.CompareFilter
 import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
 import org.apache.hadoop.hbase.filter.SubstringComparator
 import org.apache.hadoop.hbase.util.Bytes

 scan 'patient', { COLUMNS => 'info:diagnosis', FILTER =>
 SingleColumnValueFilter.new(Bytes.toBytes('info'),
  Bytes.toBytes('diagnosis'),
  CompareFilter::CompareOp.valueOf('EQUAL'),
  SubstringComparator.new('cardiac'))}

 Output = 50,000 rows

 import org.apache.hadoop.hbase.filter.CompareFilter
 import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
 import org.apache.hadoop.hbase.filter.SubstringComparator
 import org.apache.hadoop.hbase.util.Bytes

 count 'patient', { COLUMNS => 'info:diagnosis', FILTER =>
 SingleColumnValueFilter.new(Bytes.toBytes('info'),
  Bytes.toBytes('diagnosis'),
  CompareFilter::CompareOp.valueOf('EQUAL'),
  SubstringComparator.new('cardiac'))}
 Output = 100,000 rows

 I even tried it using the HBase Java API with an AggregationClient instance,
 and I enabled the coprocessor aggregation for the table.
 rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

 Also, when measuring the performance improvement from adding more nodes,
 the operation takes the same time.

 So any advice, please?

 I have been stuck in this mess for a couple of weeks.

 Thanks,


Re: Hbase Count Aggregate Function

2012-12-24 Thread ramkrishna vasudevan
So you find that a scan with a filter and a count with the same filter are
giving you different results?

Regards
Ram



RE: Hbase Count Aggregate Function

2012-12-24 Thread Dalia Sobhy

Yes, scan gives the correct number of rows, while count returns the total
number of rows.

Both use the same filter. I even tried it using the Java API, with the
rowCount method:

rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

I get the total number of rows, not the number of filtered rows.

So any idea ??

Thanks Ram :)

 Date: Mon, 24 Dec 2012 21:57:54 +0530
 Subject: Re: Hbase Count Aggregate Function
 From: ramkrishna.s.vasude...@gmail.com
 To: user@hbase.apache.org
 
 So you find that scan with a filter and count with the same filter is
 giving you different results?
 
 Regards
 Ram
 

Re: Hbase Count Aggregate Function

2012-12-24 Thread ramkrishna vasudevan
Okay, looking at the shell script and the code, it seems that when you use
this counter the user's filter is not taken into account.
It adds a FirstKeyOnlyFilter and proceeds with the scan. :(

Regards
Ram
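
A minimal sketch of the behavior described above, written as a toy in-memory model in plain Java rather than against the HBase API. The class `CountFilterSketch`, its methods, and the map-based row representation are all assumptions made for illustration. It shows why a counter that substitutes its own first-key-only filter reports every row, while a scan that applies the caller's filter reports only the matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy in-memory model; none of these types are HBase classes. */
final class CountFilterSketch {

    /** A scan applies the caller's filter to every row (qualifier -> value). */
    static long scanCount(List<Map<String, String>> rows,
                          Predicate<Map<String, String>> userFilter) {
        return rows.stream().filter(userFilter).count();
    }

    /**
     * The counter described in the thread swaps in a "first key only" filter,
     * modeled here as "the row has at least one cell". The caller's filter
     * is accepted as an argument but never consulted.
     */
    static long counterCount(List<Map<String, String>> rows,
                             Predicate<Map<String, String>> userFilter) {
        Predicate<Map<String, String>> firstKeyOnly = row -> !row.isEmpty();
        return rows.stream().filter(firstKeyOnly).count();
    }

    /** Builds n rows whose "diagnosis" value alternates cardiac / renal. */
    static List<Map<String, String>> sampleRows(int n) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String value = (i % 2 == 0) ? "cardiac" : "renal";
            rows.add(Collections.singletonMap("diagnosis", value));
        }
        return rows;
    }
}
```

With 10 sample rows and a predicate that matches "cardiac", `scanCount` gives 5 and `counterCount` gives 10, which mirrors the 50,000 versus 100,000 rows reported in the thread.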

On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:


 yeah scan gives the correct number of rows, while count returns the total
 number of rows.

 Both are using the same filter, I even tried it using Java API, using row
 count method.

 rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

 I get the total number of rows not the number of rows filtered.

 So any idea ??

 Thanks Ram :)





RE: Hbase Count Aggregate Function

2012-12-24 Thread Dalia Sobhy

So do you have a suggestion for how to make the filter work?

 Date: Mon, 24 Dec 2012 22:22:49 +0530
 Subject: Re: Hbase Count Aggregate Function
 From: ramkrishna.s.vasude...@gmail.com
 To: user@hbase.apache.org
 
 Okie, seeing the shell script and the code I feel that while you use this
 counter, the user's filter is not taken into account.
 It adds a FirstKeyOnlyFilter and proceeds with the scan. :(.
 
 Regards
 Ram
 

Re: Hbase Count Aggregate Function

2012-12-24 Thread ramkrishna vasudevan
Hi,
You could implement a custom filter similar to FirstKeyOnlyFilter.
Implement its filterKeyValue method so that it matches your KeyValue
(the specific qualifier that you are looking for).

Deploy it in your cluster.  It should work.

Regards
Ram
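
A rough sketch of the idea above, again as a plain-Java toy model and not a real HBase filter. The `ReturnCode` enum, the `Cell` class, and `FirstMatchingCellFilterSketch` are stand-ins invented for illustration; a real filter would extend the HBase filter base class and would have to be deployed to the region servers. The sketch keeps the "stop after the first accepted cell" behavior of a first-key-only filter, but accepts a cell only when its qualifier and value match.

```java
import java.util.List;

/** Toy model of a "first matching cell only" filter; not an HBase class. */
final class FirstMatchingCellFilterSketch {

    /** Stand-in for a filter's per-cell decision. */
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    /** Stand-in for a single cell (qualifier plus value) in one row. */
    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    private final String qualifier;
    private final String substring;
    private boolean found;

    FirstMatchingCellFilterSketch(String qualifier, String substring) {
        this.qualifier = qualifier;
        this.substring = substring;
    }

    /** Called before each new row. */
    void reset() {
        found = false;
    }

    /** Accept the first cell that matches, then skip the rest of the row. */
    ReturnCode filterKeyValue(Cell cell) {
        if (found) {
            return ReturnCode.NEXT_ROW;
        }
        if (qualifier.equals(cell.qualifier) && cell.value.contains(substring)) {
            found = true;
            return ReturnCode.INCLUDE;
        }
        return ReturnCode.SKIP;
    }

    /** Counts the rows for which the filter included at least one cell. */
    static long countRows(List<List<Cell>> rows, FirstMatchingCellFilterSketch filter) {
        long count = 0;
        for (List<Cell> row : rows) {
            filter.reset();
            boolean emitted = false;
            for (Cell cell : row) {
                ReturnCode rc = filter.filterKeyValue(cell);
                if (rc == ReturnCode.NEXT_ROW) {
                    break;
                }
                if (rc == ReturnCode.INCLUDE) {
                    emitted = true;
                }
            }
            if (emitted) {
                count++;
            }
        }
        return count;
    }
}
```

Because at most one cell per row is included, a row counter driven by such a filter would count only the rows that contain a matching cell.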

On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:


 So do you have a suggestion how to enable/work the filter?





RE: Hbase Count Aggregate Function

2012-12-24 Thread Dalia Sobhy

Do you mean I should implement a new rowCount method in the AggregationClient
class?

I don't understand; could you illustrate with a code sample, Ram?

Thanks,

 Date: Tue, 25 Dec 2012 00:21:14 +0530
 Subject: Re: Hbase Count Aggregate Function
 From: ramkrishna.s.vasude...@gmail.com
 To: user@hbase.apache.org
 
 Hi,
 You could implement a custom filter similar to FirstKeyOnlyFilter.
 Implement its filterKeyValue method so that it matches your KeyValue
 (the specific qualifier that you are looking for).
 
 Deploy it in your cluster.  It should work.
 
 Regards
 Ram
 

RE: Hbase Count Aggregate Function

2012-12-24 Thread Dalia Sobhy

This is my function:

public long CountByDiagnosis(String diagnosis) throws IOException {
  // customConf, configuration, aggregationClient, scan, CF, Column and
  // TABLE_NAME are defined elsewhere (not shown in this mail).
  customConf.setStrings("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
  customConf.setLong("hbase.rpc.timeout", 60);
  customConf.setLong("hbase.client.scanner.caching", 1000);
  configuration = HBaseConfiguration.create(customConf);
  aggregationClient = new AggregationClient(configuration);

  scan.addFamily(CF);

  // Filter by a particular diagnosis (exact match on the column value)
  SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
      CF,
      Column,
      CompareOp.EQUAL,
      Bytes.toBytes(diagnosis));
  scan.setFilter(filter1);

  long rowCount = -1;
  // Count the number of patients with the given diagnosis
  try {
    rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
  } catch (Throwable e) {
    e.printStackTrace();
  }
  return rowCount;
}
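
Whether the filter set in a function like the one above is honored depends on the server-side check quoted elsewhere in this thread, which installs a FirstKeyOnlyFilter only when the scan has no filter and no qualifier. The following is a hedged plain-Java model of that check, not HBase code: `RowCountFilterChoiceSketch`, `effectiveFilter`, and the use of a `Predicate<String>` over diagnosis values are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy model of the default-filter check; these are not HBase classes. */
final class RowCountFilterChoiceSketch {

    /** Stand-in for FirstKeyOnlyFilter: every row passes. */
    static final Predicate<String> FIRST_KEY_ONLY = value -> true;

    /** Stand-in for "no filter on the scan": every row passes. */
    static final Predicate<String> NO_FILTER = value -> true;

    /**
     * Mirrors the quoted condition: the default is installed only when the
     * caller supplied neither a filter nor a qualifier.
     */
    static Predicate<String> effectiveFilter(Predicate<String> userFilter, String qualifier) {
        if (userFilter == null && qualifier == null) {
            return FIRST_KEY_ONLY;
        }
        if (userFilter != null) {
            return userFilter;
        }
        return NO_FILTER;
    }

    /** Counts the diagnosis values that pass the effective filter. */
    static long rowCount(List<String> diagnosisValues,
                         Predicate<String> userFilter, String qualifier) {
        return diagnosisValues.stream()
                .filter(effectiveFilter(userFilter, qualifier))
                .count();
    }
}
```

Under this model a scan that carries a filter is counted with that filter, so a total-row result would point to the filter not actually being attached to the scan that reaches the server.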
 


 Date: Tue, 25 Dec 2012 00:21:14 +0530
 Subject: Re: Hbase Count Aggregate Function
 From: ramkrishna.s.vasude...@gmail.com
 To: user@hbase.apache.org
 
 Hi,
 You could implement a custom filter similar to FirstKeyOnlyFilter.
 Implement its filterKeyValue method so that it matches your KeyValue
 (the specific qualifier that you are looking for).
 
 Deploy it in your cluster.  It should work.
 
 Regards
 Ram
 