RE: Hbase Count Aggregate Function
Thanks Ram,

The issue is resolved. I forgot to add scan.setFilter(filterList); that's why it was not filtering!

Date: Wed, 26 Dec 2012 21:11:32 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Dalia,

I tried out this example:

{code}
private static final byte[] TEST_TABLE = Bytes.toBytes("TestTable");
private static final byte[] TEST_FAMILY = Bytes.toBytes("TestFamily");
private static final byte[] TEST_QUALIFIER = Bytes.toBytes("TestQualifier");
private static final byte[] TEST_MULTI_CQ = Bytes.toBytes("TestMultiCQ");
private static byte[] ROW = Bytes.toBytes("testRow");
private static final int ROWSIZE = 20;
private static final int rowSeperator1 = 5;
private static final int rowSeperator2 = 12;
private static byte[][] ROWS = makeN(ROW, ROWSIZE);

// Load ROWSIZE rows; row i stores the long value i in TEST_QUALIFIER.
for (int i = 0; i < ROWSIZE; i++) {
  Put put = new Put(ROWS[i]);
  put.setWriteToWAL(false);
  Long l = new Long(i);
  put.add(TEST_FAMILY, TEST_QUALIFIER, Bytes.toBytes(l));
  table.put(put);
  Put p2 = new Put(ROWS[i]);
  p2.setWriteToWAL(false);
  p2.add(TEST_FAMILY, Bytes.add(TEST_MULTI_CQ, Bytes.toBytes(l)), Bytes.toBytes(l * 10));
  table.put(p2);
}

AggregationClient aClient = new AggregationClient(conf);
Scan scan = new Scan();
scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
// Only rows whose TEST_QUALIFIER value equals 4 should pass this filter.
SingleColumnValueFilter scvf = new SingleColumnValueFilter(TEST_FAMILY,
    TEST_QUALIFIER, CompareOp.EQUAL, Bytes.toBytes(4L));
scan.setFilter(scvf);
long rowCount = aClient.rowCount(TEST_TABLE, ci, scan);
assertEquals(ROWSIZE, rowCount);
{code}

This assertion fails, which means the filter is working as expected: the count returned is no longer the full ROWSIZE. If you want to try it yourself, look at the test case TestAggregateProtocol.testRowCountAllTable() and modify it so that you pass a SingleColumnValueFilter. It works fine for me. Please check and let me know; maybe I am making some mistake.

Regards
Ram

On Tue, Dec 25, 2012 at 11:25 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

Is there a problem with letting the ID (rowkey) be an int value?

Date: Tue, 25 Dec 2012 22:44:00 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

@Dalia

I think the aggregation client should work with what you have passed. What I meant in the previous mail was about table.count(); now we are looking at AggregationClient:

{code}
if (scan.getFilter() == null && qualifier == null)
  scan.setFilter(new FirstKeyOnlyFilter());
{code}

Since you have passed a filter, it should behave the way the SingleColumnValueFilter (SCVF) should. I can check this when I have free time (maybe tomorrow). If it does not work, you can raise a bug. If it turns out to be fine we can close it; otherwise it is better that we fix it. I understand your urgency in this.

Regards
Ram

On Tue, Dec 25, 2012 at 10:27 PM, yuzhih...@gmail.com wrote:

RowCount method accepts scan object where you can attach your custom filter.

Cheers

On Dec 25, 2012, at 8:42 AM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

Do you mean I should implement a new rowCount method in the AggregationClient class? I do not understand; could you illustrate with a code sample, Ram?

Date: Tue, 25 Dec 2012 00:21:14 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Hi,

You could implement a custom filter similar to FirstKeyOnlyFilter. Implement the filterKeyValue method so that it matches your KeyValue (the specific qualifier that you are looking for). Deploy it in your cluster. It should work.

Regards
Ram

On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

So do you have a suggestion for how to make the filter work?

Date: Mon, 24 Dec 2012 22:22:49 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Okay, having looked at the shell script and the code, I think that when you use this counter, the user's filter is not taken into account. It adds a FirstKeyOnlyFilter and proceeds with the scan. :(

Regards
Ram

On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

Yes, scan gives the correct number of rows, while count returns the total number of rows. Both use the same filter. I even tried it with the Java API, using the rowCount method:

rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

I get the total
Re: Hbase Count Aggregate Function
Oh...Oops..

Regards
Ram

On Wed, Jan 2, 2013 at 3:14 AM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

Thanks Ram,

The issue is resolved. I forgot to add scan.setFilter(filterList); that's why it was not filtering!
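The resolution reported in this thread (a filter that was built but never attached to the scan) can be illustrated without a cluster. What follows is a minimal sketch in plain Java, not HBase code: `ToyScan`, `ForgottenFilterDemo`, and their methods are hypothetical names invented for the illustration, and a `Predicate<String>` stands in for a real filter. It only shows that an unattached filter leaves the count at the table total.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy illustration only; none of these classes belong to HBase. */
final class ForgottenFilterDemo {
    static final List<String> DIAGNOSES = List.of("cardiac", "renal", "cardiac", "renal");

    /** The filter is created but never attached, so every row is counted. */
    static long countWithoutAttaching() {
        Predicate<String> cardiacOnly = v -> v.contains("cardiac");
        ToyScan scan = new ToyScan(); // cardiacOnly is never passed to the scan
        return scan.count(DIAGNOSES);
    }

    /** The same filter, attached before counting. */
    static long countWithAttaching() {
        Predicate<String> cardiacOnly = v -> v.contains("cardiac");
        ToyScan scan = new ToyScan();
        scan.setFilter(cardiacOnly);
        return scan.count(DIAGNOSES);
    }

    public static void main(String[] args) {
        System.out.println(countWithoutAttaching()); // prints 4
        System.out.println(countWithAttaching());    // prints 2
    }
}

/** Hypothetical stand-in for a scan that optionally carries a filter. */
final class ToyScan {
    private Predicate<String> filter; // null means no filter is attached

    void setFilter(Predicate<String> filter) {
        this.filter = filter;
    }

    /** Counts values that pass the attached filter, or all values if none is attached. */
    long count(List<String> values) {
        if (filter == null) {
            return values.size();
        }
        return values.stream().filter(filter).count();
    }
}
```

In the real client the equivalent check is simply to confirm that the Scan passed to rowCount has had setFilter called on it before the call.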
Re: Hbase Count Aggregate Function
Dalia,

I tried out this example:

{code}
private static final byte[] TEST_TABLE = Bytes.toBytes("TestTable");
private static final byte[] TEST_FAMILY = Bytes.toBytes("TestFamily");
private static final byte[] TEST_QUALIFIER = Bytes.toBytes("TestQualifier");
private static final byte[] TEST_MULTI_CQ = Bytes.toBytes("TestMultiCQ");
private static byte[] ROW = Bytes.toBytes("testRow");
private static final int ROWSIZE = 20;
private static final int rowSeperator1 = 5;
private static final int rowSeperator2 = 12;
private static byte[][] ROWS = makeN(ROW, ROWSIZE);

// Load ROWSIZE rows; row i stores the long value i in TEST_QUALIFIER.
for (int i = 0; i < ROWSIZE; i++) {
  Put put = new Put(ROWS[i]);
  put.setWriteToWAL(false);
  Long l = new Long(i);
  put.add(TEST_FAMILY, TEST_QUALIFIER, Bytes.toBytes(l));
  table.put(put);
  Put p2 = new Put(ROWS[i]);
  p2.setWriteToWAL(false);
  p2.add(TEST_FAMILY, Bytes.add(TEST_MULTI_CQ, Bytes.toBytes(l)), Bytes.toBytes(l * 10));
  table.put(p2);
}

AggregationClient aClient = new AggregationClient(conf);
Scan scan = new Scan();
scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
// Only rows whose TEST_QUALIFIER value equals 4 should pass this filter.
SingleColumnValueFilter scvf = new SingleColumnValueFilter(TEST_FAMILY,
    TEST_QUALIFIER, CompareOp.EQUAL, Bytes.toBytes(4L));
scan.setFilter(scvf);
long rowCount = aClient.rowCount(TEST_TABLE, ci, scan);
assertEquals(ROWSIZE, rowCount);
{code}

This assertion fails, which means the filter is working as expected: the count returned is no longer the full ROWSIZE. If you want to try it yourself, look at the test case TestAggregateProtocol.testRowCountAllTable() and modify it so that you pass a SingleColumnValueFilter. It works fine for me. Please check and let me know; maybe I am making some mistake.

Regards
Ram
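The EQUAL comparison in Ram's test works on raw bytes, so the filter value has to be encoded the same way as the stored value. As a minimal sketch in plain Java (no HBase classes; `LongEncodingDemo`, `encodeLong`, and `encodeInt` are hypothetical names, and `ByteBuffer` is used only as a stand-in, on the assumption that `Bytes.toBytes(long)` produces the same 8-byte big-endian layout), the following shows that a long 4, an int 4, and the string "4" are three different byte arrays:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustration only: how the same number yields different byte arrays per type. */
final class LongEncodingDemo {
    /** 8-byte big-endian encoding of a long (ByteBuffer's default byte order). */
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** 4-byte big-endian encoding of an int. */
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    public static void main(String[] args) {
        byte[] stored = encodeLong(4L); // what the test writes for row 4

        // Same type on both sides: byte-for-byte equal.
        System.out.println(Arrays.equals(stored, encodeLong(4L))); // prints true
        // An int is only 4 bytes long, so it cannot equal the 8-byte value.
        System.out.println(Arrays.equals(stored, encodeInt(4)));   // prints false
        // The text "4" is a single byte and does not match either.
        System.out.println(Arrays.equals(stored, "4".getBytes(StandardCharsets.UTF_8))); // prints false
    }
}
```

If a filtered count unexpectedly returns zero rows rather than all rows, a mismatch of this kind between the written type and the compared type is one thing worth ruling out.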
RE: Hbase Count Aggregate Function
Do you mean I should implement a new rowCount method in the AggregationClient class? I do not understand; could you illustrate with a code sample, Ram?

Date: Tue, 25 Dec 2012 00:21:14 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Hi,

You could implement a custom filter similar to FirstKeyOnlyFilter. Implement the filterKeyValue method so that it matches your KeyValue (the specific qualifier that you are looking for). Deploy it in your cluster. It should work.

Regards
Ram

On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

So do you have a suggestion for how to make the filter work?

Date: Mon, 24 Dec 2012 22:22:49 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Okay, having looked at the shell script and the code, I think that when you use this counter, the user's filter is not taken into account. It adds a FirstKeyOnlyFilter and proceeds with the scan. :(

Regards
Ram

On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

Yes, scan gives the correct number of rows, while count returns the total number of rows. Both use the same filter. I even tried it with the Java API, using the rowCount method:

rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

I get the total number of rows, not the number of rows filtered. Any idea?

Thanks Ram :)

Date: Mon, 24 Dec 2012 21:57:54 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

So you find that a scan with a filter and a count with the same filter give you different results?

Regards
Ram

On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

Dear all,

I have 50,000 rows with diagnosis qualifier = cardiac, and another 50,000 rows with renal.

When I type this in the HBase shell:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 'patient', { COLUMNS => 'info:diagnosis', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('info'), Bytes.toBytes('diagnosis'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('cardiac'))}

Output = 50,000 rows

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
count 'patient', { COLUMNS => 'info:diagnosis', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('info'), Bytes.toBytes('diagnosis'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('cardiac'))}

Output = 100,000 rows

I even tried it using the HBase Java API with an AggregationClient instance, and I enabled the aggregation coprocessor for the table:

rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

Also, when I measure performance after adding more nodes, the operation takes the same time.

So any advice please? I have been stuck in this mess for a couple of weeks.

Thanks,
Re: Hbase Count Aggregate Function
RowCount method accepts scan object where you can attach your custom filter.

Cheers
Re: Hbase Count Aggregate Function
@Dalia

I think the aggregation client should work with what you have passed. What I meant in the previous mail was about table.count(); now we are looking at AggregationClient:

{code}
if (scan.getFilter() == null && qualifier == null)
  scan.setFilter(new FirstKeyOnlyFilter());
{code}

Since you have passed a filter, it should behave the way the SingleColumnValueFilter (SCVF) should. I can check this when I have free time (maybe tomorrow). If it does not work, you can raise a bug. If it turns out to be fine we can close it; otherwise it is better that we fix it. I understand your urgency in this.

Regards
Ram
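The fallback condition quoted in Ram's reply can be modelled without a cluster. The sketch below is a toy in plain Java, not HBase code: `DefaultFilterDemo` and `effectiveFilter` are hypothetical names, and filters are represented only by their class names as strings. It mirrors nothing more than the quoted `if`: the FirstKeyOnlyFilter default is chosen only when the caller supplied neither a filter nor a qualifier.

```java
/** Toy model of the quoted condition; filters are represented by name only. */
final class DefaultFilterDemo {
    /**
     * Returns the name of the filter that would be in effect.
     * Falls back to "FirstKeyOnlyFilter" only when both arguments are null.
     */
    static String effectiveFilter(String userFilter, String qualifier) {
        if (userFilter == null && qualifier == null) {
            return "FirstKeyOnlyFilter";
        }
        return userFilter; // may be null when only a qualifier was supplied
    }

    public static void main(String[] args) {
        // No filter, no qualifier: the default is applied.
        System.out.println(effectiveFilter(null, null)); // prints FirstKeyOnlyFilter
        // A user filter is present: it is left untouched.
        System.out.println(effectiveFilter("SingleColumnValueFilter", null)); // prints SingleColumnValueFilter
        // Only a qualifier is present: no default is added.
        System.out.println(effectiveFilter(null, "diagnosis")); // prints null
    }
}
```

Under this reading, a scan that reaches the aggregation call with no filter set is indistinguishable from one where the caller never intended to filter, which is consistent with a full-table count being returned.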
RE: Hbase Count Aggregate Function
Thanks Ram,

I have tried it a lot. I even tried it in the HBase shell, scanning with filters. Using scan, it returns the right number. But the AggregationClient rowCount method still returns the wrong number, as if it cannot see the filter. Even when I passed it values that match nothing, so that it should return zero, it returned the total number of rows in the table.

So what do you think?
RE: Hbase Count Aggregate Function
Is there a problem in letting ID (rowkey) int value??

Date: Tue, 25 Dec 2012 22:44:00 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

@Dalia
I think the aggregation client should work with what you have passed. What I meant in the previous mail applied to table.count(); here we are using AggregationClient.
{code}
if (scan.getFilter() == null && qualifier == null)
  scan.setFilter(new FirstKeyOnlyFilter());
{code}
So as you have passed the filter, it should work the way the SCVF should work. I can check this out during free time (maybe tomorrow). If not, you can raise a bug. If it turns out to be fine then we can close it out; otherwise it's better we fix it. I can understand your urgency in this.
Regards
Ram

On Tue, Dec 25, 2012 at 10:27 PM, yuzhih...@gmail.com wrote:
The rowCount method accepts a Scan object to which you can attach your custom filter.
Cheers

On Dec 25, 2012, at 8:42 AM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:
Do you mean I implement a new rowCount method in the AggregationClient class? I cannot understand; could you illustrate with a code sample, Ram?

Date: Tue, 25 Dec 2012 00:21:14 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Hi
You could implement a custom filter similar to FirstKeyOnlyFilter. Implement the filterKeyValue method so that it matches your KeyValue (the specific qualifier that you are looking for). Deploy it in your cluster. It should work.
Regards
Ram

On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:
So do you have a suggestion for how to make the filter work?

Date: Mon, 24 Dec 2012 22:22:49 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

Okie, seeing the shell script and the code, I feel that when you use this counter, the user's filter is not taken into account. It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
Regards
Ram

On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:
Yeah, scan gives the correct number of rows, while count returns the total number of rows. Both are using the same filter. I even tried it using the Java API, using the rowCount method:
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
I get the total number of rows, not the number of rows filtered. So any idea?
Thanks Ram :)

Date: Mon, 24 Dec 2012 21:57:54 +0530
Subject: Re: Hbase Count Aggregate Function
From: ramkrishna.s.vasude...@gmail.com
To: user@hbase.apache.org

So you find that scan with a filter and count with the same filter are giving you different results?
Regards
Ram

On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:
Dear all,
I have 50,000 rows with diagnosis qualifier = cardiac, and another 50,000 rows with renal.
When I type this in the HBase shell:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 'patient', { COLUMNS => 'info:diagnosis', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('info'), Bytes.toBytes('diagnosis'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('cardiac'))}

Output = 50,000 rows

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
count 'patient', { COLUMNS => 'info:diagnosis', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('info'), Bytes.toBytes('diagnosis'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('cardiac'))}

Output = 100,000 rows

I even tried it using the HBase Java API with an AggregationClient instance, and I enabled the aggregation coprocessor for the table:
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
Also, when measuring the performance after adding more nodes, the operation takes the same time.
So any advice please? I have been struggling with this mess for a couple of weeks.
Thanks,
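The guard Ram quotes earlier in this message installs FirstKeyOnlyFilter only when the scan carries neither a filter nor a qualifier. A minimal plain-Java sketch of that decision is given here; strings stand in for filter objects, and `chooseFilter` is a hypothetical helper written for illustration, not an HBase method.

```java
// Toy model of the guard: strings stand in for filter objects.
// chooseFilter is a hypothetical helper, not part of HBase.
final class DefaultFilterGuard {

    static final String FIRST_KEY_ONLY = "FirstKeyOnlyFilter";

    // Returns the filter the scan ends up with. The default is installed
    // only when neither a filter nor a qualifier was supplied by the caller.
    static String chooseFilter(String userFilter, String qualifier) {
        if (userFilter == null && qualifier == null) {
            return FIRST_KEY_ONLY;
        }
        return userFilter; // may be null when only a qualifier was given
    }

    public static void main(String[] args) {
        System.out.println(chooseFilter(null, null));
        System.out.println(chooseFilter("SingleColumnValueFilter", null));
        System.out.println(chooseFilter(null, "diagnosis"));
    }
}
```

Under this model a scan that already has a SingleColumnValueFilter keeps it, which matches Ram's expectation that the filter passed to rowCount should be honoured.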
Re: Hbase Count Aggregate Function
Hi Dalia, You already sent the same question yesterday ;) Just give some time to people to look at it. JM
Re: Hbase Count Aggregate Function
So you find that scan with a filter and count with the same filter is giving you different results? Regards Ram
RE: Hbase Count Aggregate Function
yeah scan gives the correct number of rows, while count returns the total number of rows. Both are using the same filter, I even tried it using Java API, using row count method. rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan); I get the total number of rows not the number of rows filtered. So any idea ?? Thanks Ram :)
Re: Hbase Count Aggregate Function
Okie, seeing the shell script and the code I feel that while you use this counter, the user's filter is not taken into account. It adds a FirstKeyOnlyFilter and proceeds with the scan. :(. Regards Ram
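The behaviour Ram describes in this message (the counter drops the caller's filter and scans with one that accepts every row) would explain a count of 100,000 where the filtered scan returns 50,000. The following is a toy model in plain Java, assuming rows are just a map from row key to diagnosis value; none of the names are HBase APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: a "table" is a map of row key -> info:diagnosis value,
// and a "filter" is a predicate over that value. Not HBase code.
final class CountFilterModel {

    static Map<String, String> sampleTable() {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("1", "cardiac");
        rows.put("2", "renal");
        rows.put("3", "cardiac");
        rows.put("4", "renal");
        return rows;
    }

    // Counts only the rows accepted by the caller's filter.
    static long countHonoringFilter(Map<String, String> rows, Predicate<String> userFilter) {
        return rows.values().stream().filter(userFilter).count();
    }

    // Discards the caller's filter and accepts every row, as a counter
    // that always installs its own first-key-only filter would.
    static long countIgnoringFilter(Map<String, String> rows, Predicate<String> userFilter) {
        Predicate<String> acceptAll = value -> true;
        return rows.values().stream().filter(acceptAll).count();
    }

    public static void main(String[] args) {
        Map<String, String> rows = sampleTable();
        Predicate<String> cardiac = value -> value.contains("cardiac");
        System.out.println(countHonoringFilter(rows, cardiac)); // prints 2
        System.out.println(countIgnoringFilter(rows, cardiac)); // prints 4
    }
}
```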
RE: Hbase Count Aggregate Function
So do you have a suggestion how to enable/work the filter?
Re: Hbase Count Aggregate Function
Hi You could have custom filter implemented which is similar to FirstKeyOnlyfilter. Implement the filterKeyValue method such that it should match your keyvalue (the specific qualifier that you are looking for). Deploy it in your cluster. It should work. Regards Ram
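Ram's suggestion in this message, a filter shaped like FirstKeyOnlyFilter that matches one specific qualifier, can be sketched without the HBase jars. In the sketch below `Cell`, `Decision` and `FirstMatchingCellFilter` are stand-ins invented for illustration; the decision names echo HBase's filter return codes, but a real filter would extend the HBase filter base class and be deployed on the region servers as Ram says.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these nested types are written for this sketch and
// mimic the shape of an HBase filter without being HBase classes.
final class CustomFilterSketch {

    enum Decision { INCLUDE, SKIP, NEXT_ROW }

    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Includes the first cell whose qualifier and value match,
    // then tells the scanner to move on to the next row.
    static final class FirstMatchingCellFilter {
        private final String qualifier;
        private final String value;
        private boolean found;

        FirstMatchingCellFilter(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }

        void reset() { // called at the start of every row
            found = false;
        }

        Decision filterCell(Cell cell) {
            if (found) {
                return Decision.NEXT_ROW;
            }
            if (cell.qualifier.equals(qualifier) && cell.value.equals(value)) {
                found = true;
                return Decision.INCLUDE;
            }
            return Decision.SKIP;
        }
    }

    // A row is counted when the filter includes one of its cells.
    static long countRows(List<List<Cell>> rows, FirstMatchingCellFilter filter) {
        long count = 0;
        for (List<Cell> row : rows) {
            filter.reset();
            for (Cell cell : row) {
                Decision decision = filter.filterCell(cell);
                if (decision == Decision.INCLUDE) {
                    count++;
                } else if (decision == Decision.NEXT_ROW) {
                    break;
                }
            }
        }
        return count;
    }

    static List<List<Cell>> sampleRows() {
        return Arrays.asList(
                Arrays.asList(new Cell("name", "a"), new Cell("diagnosis", "cardiac")),
                Arrays.asList(new Cell("name", "b"), new Cell("diagnosis", "renal")),
                Arrays.asList(new Cell("diagnosis", "cardiac"), new Cell("name", "c")));
    }

    public static void main(String[] args) {
        FirstMatchingCellFilter filter = new FirstMatchingCellFilter("diagnosis", "cardiac");
        System.out.println(countRows(sampleRows(), filter)); // prints 2
    }
}
```

Returning at most one cell per matching row keeps the amount of data moved small, which is the same reason a row counter uses FirstKeyOnlyFilter when no other filter is given.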
RE: Hbase Count Aggregate Function
Do you mean I implement a new rowCount method in Aggregation Client Class. I cannot understand, could u illustrate with a code sample Ram? Thanks,
RE: Hbase Count Aggregate Function
This is my function:

public long CountByDiagnosis(String diagnosis) throws IOException {
    // Client-side settings: ZooKeeper quorum, RPC timeout, and scanner caching.
    customConf.setStrings("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
    customConf.setLong("hbase.rpc.timeout", 60);
    customConf.setLong("hbase.client.scanner.caching", 1000);
    configuration = HBaseConfiguration.create(customConf);
    aggregationClient = new AggregationClient(configuration);
    scan.addFamily(CF);

    // Filter by a particular diagnosis
    SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
            CF, Column, CompareOp.EQUAL, Bytes.toBytes(diagnosis));
    scan.setFilter(filter1);

    long rowCount = -1;
    // Count the number of patients suffering from the given diagnosis
    try {
        rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
    } catch (Throwable e) {
        e.printStackTrace();
    }
    return rowCount;
}
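One difference between this function and the shell commands is worth checking: the shell filter compares with SubstringComparator, while the function passes a byte array with CompareOp.EQUAL, which as far as I know is compared against the whole stored value. The stored values are not shown in the thread, so the following plain-Java sketch only illustrates how the two styles of comparison can disagree; `exactMatch` and `substringMatch` are hypothetical helpers, not HBase comparators.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Toy model of two comparison styles; not HBase code.
final class ComparatorModel {

    // Whole-value comparison: every byte must be equal.
    static boolean exactMatch(String stored, String wanted) {
        return Arrays.equals(
                stored.getBytes(StandardCharsets.UTF_8),
                wanted.getBytes(StandardCharsets.UTF_8));
    }

    // Case-insensitive containment, the assumed behaviour of a substring comparator.
    static boolean substringMatch(String stored, String wanted) {
        return stored.toLowerCase(Locale.ROOT).contains(wanted.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("cardiac", "cardiac"));            // prints true
        System.out.println(exactMatch("Cardiac arrest", "cardiac"));     // prints false
        System.out.println(substringMatch("Cardiac arrest", "cardiac")); // prints true
    }
}
```

If every stored diagnosis is exactly the string passed in, both comparisons agree and this difference does not affect the count.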