[
https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569401#action_12569401
]
Kevin Beyer commented on HBASE-82:
----------------------------------
JK> BerkeleyDB uses byte arrays as keys and values
If I recall correctly, BerkeleyDB also allows a comparator to be passed. I'm
ok with a byte array as the key with a comparator passed in. Map/reduce
internally uses such an API during its sort, to avoid unpacking keys and then
comparing them. I would want to use the same comparator classes that
map/reduce uses.
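The kind of raw comparison the map/reduce sort does can be sketched in plain Java. This is an illustrative stand-in for Hadoop's RawComparator idea (comparing serialized bytes without deserializing), not actual HBase code:

```java
import java.util.Comparator;

// Sketch: compare serialized keys byte-by-byte without unpacking them,
// in the spirit of Hadoop's RawComparator. Illustrative only.
public class RawByteComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] a, byte[] b) {
        // Unsigned lexicographic comparison, like memcmp(3)
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length; // a proper prefix sorts first
    }
}
```

Note the `& 0xff`: Java bytes are signed, so without masking, keys containing bytes above 0x7f would sort before keys that start with low bytes.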
BD> This is because region servers would have to have the comparator code on
hand in order to produce proper orderings in mapfiles and on gets and puts.
Correct me if I'm wrong, but this would also require code distribution and
restarting of HBase.
It is possible to dynamically add code to a running Java process. It is
done inside DB2 for user-defined functions, for example. It's not terribly
difficult; the distribution is slightly more painful. If dfs were an
installable file-system (as multiple people have prototyped with libFUSE),
then distribution could be handled by dfs itself.
Moreover, the dynamic code-loading problem has to be solved anyway to make
Filters useful, and Filters are much more dynamic than key types/comparators.
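The loading half of that problem is standard Java. A minimal sketch of what a region server might do after fetching a user jar (the jar URL and class name below are illustrative; this is not an HBase API):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: add user code to a running JVM, roughly what a region server
// would do after pulling a comparator jar out of dfs. Illustrative only.
public class DynamicLoader {
    public static Class<?> loadUserClass(URL jarUrl, String className)
            throws Exception {
        // Parent delegation: classes already on the classpath still resolve,
        // so only genuinely new code has to come from the jar.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { jarUrl },
                DynamicLoader.class.getClassLoader());
        Class<?> cls = loader.loadClass(className);
        // A comparator would then be instantiated with
        // cls.getDeclaredConstructor().newInstance() and cached per region.
        return cls;
    }
}
```

The distribution half (getting the jar to every region server) is the part the dfs-as-file-system idea above would simplify.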
BD> Can you give some examples of complex types that would be used as keys in
jaql?
Jaql supports an extended JSON data model: (almost) any JSON value can be
used as a map/reduce key, join key, group-by key, or sort key. For example,
a (string,int) pair like ["astring",17] can be a key. Jaql even allows mixed
types in these comparisons (by imposing an arbitrary ordering on the types),
which is useful for applications that put different types of things in the
same bag (e.g., departments and employees).
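A minimal sketch of such a composite, mixed-type ordering, assuming keys are tuples of already-deserialized values. This is illustrative Java, not Jaql's actual comparator:

```java
import java.util.Comparator;

// Sketch: order composite keys like ["astring",17] field by field, using
// an arbitrary but fixed type order to break ties between mixed types.
// Illustrative only; Jaql's real comparator is more general.
public class CompositeKeyComparator implements Comparator<Object[]> {
    // Arbitrary ordering of types: null < number < string < everything else
    private static int typeRank(Object v) {
        if (v == null) return 0;
        if (v instanceof Number) return 1;
        if (v instanceof String) return 2;
        return 3;
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    @Override
    public int compare(Object[] a, Object[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int t = typeRank(a[i]) - typeRank(b[i]);
            if (t != 0) return t;           // mixed types: order by type
            if (a[i] == null) continue;     // both null: equal, next field
            int c = (a[i] instanceof Number)
                    ? Long.compare(((Number) a[i]).longValue(),
                                   ((Number) b[i]).longValue())
                    : ((Comparable) a[i]).compareTo(b[i]);
            if (c != 0) return c;
        }
        return a.length - b.length;         // shorter tuple sorts first
    }
}
```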
> [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
> ----------------------------------------------------------------------
>
> Key: HBASE-82
> URL: https://issues.apache.org/jira/browse/HBASE-82
> Project: Hadoop HBase
> Issue Type: Wish
> Reporter: Jim Kellerman
>
> I have heard from several people that row keys in HBase should be less
> restricted than hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead
> to the most general case being either hadoop.io.BytesWritable or
> hbase.io.ImmutableBytesWritable. The primary difference between these two
> classes is that hadoop.io.BytesWritable by default allocates 100 bytes, and
> if you do not pay attention to the length (BytesWritable.getSize()),
> converting a String to a BytesWritable and vice versa can become problematic.
> hbase.io.ImmutableBytesWritable, in contrast, only allocates as many bytes as
> you pass in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for
> ImmutableBytesWritable, because it has a fixed size once set, and operations
> like get, etc. do not have to do something like System.arraycopy where you
> specify the number of bytes to copy.
> Your comments and questions are welcome on this issue. If we receive enough
> feedback that Text is too restrictive, we are willing to change it, but we
> need to hear what would be the most useful thing to change it to as well.
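To make the getSize() pitfall in the issue description concrete, here is the same trap in plain Java (no Hadoop classes; the 100-byte buffer stands in for BytesWritable's default allocation):

```java
import java.nio.charset.StandardCharsets;

public class PaddingPitfall {
    public static void main(String[] args) {
        // Stand-in for BytesWritable's default 100-byte backing buffer
        byte[] backing = new byte[100];
        byte[] key = "row1".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(key, 0, backing, 0, key.length);
        int length = key.length; // logical size, cf. BytesWritable.getSize()

        // Wrong: converts the whole buffer, trailing NUL bytes included
        String wrong = new String(backing, StandardCharsets.UTF_8);
        // Right: honor the logical length
        String right = new String(backing, 0, length, StandardCharsets.UTF_8);

        System.out.println(wrong.length()); // 100, not 4
        System.out.println(right);          // row1
    }
}
```

ImmutableBytesWritable avoids this by sizing the buffer to exactly the bytes passed in, so buffer length and logical length never diverge.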
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.