Hey Ted:
See bin/HBase.rb and look at the count method. It uses a filter called
FirstKeyOnlyFilter, which returns as soon as it finds the first value on
a row, so you get back one value per row.
Looking at the arguments you can pass to a scan, it looks like you can pass
a FILTER argument, only I see there is a bug: FILTER is not defined. Do
this to fix it:
Index: bin/HBase.rb
===================================================================
--- bin/HBase.rb (revision 889094)
+++ bin/HBase.rb (working copy)
@@ -44,6 +44,7 @@
   METHOD = "METHOD"
   MAXLENGTH = "MAXLENGTH"
   CACHE_BLOCKS = "CACHE_BLOCKS"
+  FILTER = "FILTER"
 
   # Wrapper for org.apache.hadoop.hbase.client.HBaseAdmin
   class Admin
Now, to scan and get unique rows only, you could do the following in the
shell (after making the above change):
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.21.0-dev, r889026, Thu Dec 10 05:27:10 UTC 2009
hbase(main):002:0> f = FirstKeyOnlyFilter.new()
hbase(main):003:0> scan 'TestTable', {FILTER => f}
# i.e. make an instance of this filter and then pass it to the scan
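
To make it concrete what the filter buys you, here is a plain-Ruby sketch
that does not touch HBase at all. It models a table as a Hash of row key to
cells and keeps only the first cell of each row, which is roughly what you
should see back from the scan above. The helper name first_key_only and the
sample data are made up for illustration; this is not how HBase implements
the filter.

```ruby
# Toy model only: a table is a Hash of row key => { column => value }.
# first_key_only is a hypothetical helper, not part of HBase.
def first_key_only(table)
  # Walk rows in sorted key order, the way a scan would.
  table.sort.map do |row, cells|
    # Columns are sorted within a row; keep only the first one.
    column, value = cells.sort.first
    [row, column, value]
  end
end

# Made-up sample data for the illustration.
table = {
  'com.onsoft.www:http/'  => { 'info:title' => 'Home', 'contents:' => '<html>' },
  'com.example.www:http/' => { 'contents:' => '<html>' }
}

# One line per row, no matter how many columns the row has.
first_key_only(table).each do |row, column, _value|
  puts "#{row} column=#{column}"
end
```

Each row shows up exactly once, so the row keys in the output are the
unique rows you were asking about.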
Will this work for you?
St.Ack
On Thu, Dec 17, 2009 at 2:44 PM, Ted Yu <[email protected]> wrote:
> Can you outline how such command can be added ?
>
> Thanks
>
> On Thu, Dec 17, 2009 at 11:06 AM, stack <[email protected]> wrote:
>
>> On Tue, Dec 15, 2009 at 12:59 PM, Ted Yu <[email protected]> wrote:
>>
>> > That works.
>> >
>> > scan command gives values for columns.
>> > Is there a shell command which lists unique row values, such as
>> > 'com.onsoft.www:http/' ?
>> >
>> >
>> If you mean a command to list rows only, there is no such command
>> (Wouldn't be hard to add).
>> St.Ack
>>
>
>