[ 
https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569468#action_12569468
 ] 

stack commented on HBASE-82:
----------------------------

Kevin Beyer> If I recall correctly, BerkeleyDB also allows a comparator to be 
passed.

Yeah. You can set the comparator for the database in bdb.  Default is byte 
comparisons.  In our case, I suppose you'd set the comparator at the column 
family level; a single Comparator would be used across the whole Store.  The 
classname could be an attribute of the column family.  Presuming we added to 
hbase some kind of network classloader, or a classloader that could read from 
an hdfs directory (as Kevin suggests, Filters would also benefit if we had such 
a mechanism in place), each Store could then instantiate its own Comparator 
instance.  The presumption again is that the client is consistent about the 
type of the bytes inserted into the column family; otherwise, sort order will 
be wonky.  Do folks think it OK that there'd be no checks in place to prevent 
clients inserting bytes from different key types?  I suppose there is nothing 
we can do about it if keys are byte arrays.  Would this work for jaql, Kevin?
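
To make the idea concrete, here is a minimal sketch in plain Java, not actual 
hbase code.  The class names (ComparatorSketch, RawBytesComparator, 
LongKeyComparator), the load() helper, and the notion that the classname comes 
from a column family attribute are all assumptions for illustration.  It uses 
the ordinary JVM classloader rather than the network or hdfs classloader 
discussed above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

public class ComparatorSketch {

    /** Default ordering: unsigned, byte-by-byte (lexicographic) comparison. */
    public static class RawBytesComparator implements Comparator<byte[]> {
        @Override
        public int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) {
                    return d;
                }
            }
            return a.length - b.length;
        }
    }

    /** Hypothetical custom ordering: every key is an 8-byte big-endian signed long. */
    public static class LongKeyComparator implements Comparator<byte[]> {
        @Override
        public int compare(byte[] a, byte[] b) {
            return Long.compare(ByteBuffer.wrap(a).getLong(), ByteBuffer.wrap(b).getLong());
        }
    }

    /** Instantiate a comparator from a classname, e.g. one stored as a family attribute. */
    @SuppressWarnings("unchecked")
    public static Comparator<byte[]> load(String className) {
        try {
            Class<?> c = Class.forName(className);
            return (Comparator<byte[]>) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load comparator " + className, e);
        }
    }

    /** Encode a long as an 8-byte big-endian key. */
    public static byte[] longKey(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static void main(String[] args) {
        byte[][] keys = { longKey(5), longKey(-1), longKey(2) };

        // Default byte ordering: -1 encodes as 0xFF..FF, so it sorts last.
        Arrays.sort(keys, load(RawBytesComparator.class.getName()));
        for (byte[] k : keys) {
            System.out.println("raw:  " + ByteBuffer.wrap(k).getLong());
        }

        // Type-aware ordering: numeric order of the decoded longs.
        Arrays.sort(keys, load(LongKeyComparator.class.getName()));
        for (byte[] k : keys) {
            System.out.println("long: " + ByteBuffer.wrap(k).getLong());
        }
    }
}
```

Nothing here checks that a key really is an 8-byte long.  A client that 
inserted keys of some other type would either break the comparator or produce 
the wonky ordering mentioned above.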

I'd prefer this approach of byte keys plus optional comparator to the 
alternative where we set the allowed WritableComparable key type on the Store 
and check each key's type against the advertised type.  (Another downside to 
this alternative: it would require undoing HBaseRPC and putting back the 
generic hadoop RPC, because we could be passing keys of any WritableComparable 
type.)



> [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
> ----------------------------------------------------------------------
>
>                 Key: HBASE-82
>                 URL: https://issues.apache.org/jira/browse/HBASE-82
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: Jim Kellerman
>
> I have heard from several people that row keys in HBase should be less 
> restricted than hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead 
> to the most general case being either hadoop.io.BytesWritable or 
> hbase.io.ImmutableBytesWritable. The primary difference between these two 
> classes is that hadoop.io.BytesWritable by default allocates 100 bytes and if 
> you do not pay attention to the length, (BytesWritable.getSize()), converting 
> a String to a BytesWritable and vice versa can become problematic. 
> hbase.io.ImmutableBytesWritable, in contrast only allocates as many bytes as 
> you pass in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for 
> ImmutableBytesWritable, because it has a fixed size once set, and operations 
> like get, etc do not have to do something like System.arraycopy where you 
> specify the number of bytes to copy.
> Your comments, questions are welcome on this issue. If we receive enough 
> feedback that Text is too restrictive, we are willing to change it, but we 
> need to hear what would be the most useful thing to change it to as well.
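
The padding pitfall described in the issue text above can be sketched in plain 
Java.  This does not use the hadoop or hbase classes: GrowableBytes and 
FixedBytes are hypothetical stand-ins that only mimic the described behaviour 
(a 100-byte default backing array with a separate size, versus an array that is 
exactly as long as the data passed in).

```java
import java.nio.charset.StandardCharsets;

public class PaddedBufferSketch {

    /** Stand-in for a buffer whose backing array can be longer than the valid data. */
    public static class GrowableBytes {
        private byte[] buf = new byte[100]; // default capacity, as described in the issue
        private int size;

        public void set(byte[] src) {
            if (src.length > buf.length) {
                buf = new byte[src.length];
            }
            System.arraycopy(src, 0, buf, 0, src.length);
            size = src.length;
        }

        public byte[] get() { return buf; }   // whole backing array, padding included
        public int getSize() { return size; } // number of valid bytes
    }

    /** Stand-in for a fixed-size buffer: exactly as many bytes as were passed in. */
    public static class FixedBytes {
        private final byte[] bytes;

        public FixedBytes(byte[] src) { this.bytes = src.clone(); }

        public byte[] get() { return bytes; }
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);

        GrowableBytes g = new GrowableBytes();
        g.set(row);

        // Ignoring the size converts the padding too: "row1" followed by NUL characters.
        String padded = new String(g.get(), StandardCharsets.UTF_8);
        // Correct conversion has to pass the valid length explicitly.
        String trimmed = new String(g.get(), 0, g.getSize(), StandardCharsets.UTF_8);
        // With a fixed-size buffer there is no length to track.
        String fixed = new String(new FixedBytes(row).get(), StandardCharsets.UTF_8);

        System.out.println(padded.length() + " " + trimmed.length() + " " + fixed.length());
    }
}
```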

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
