[jira] Commented: (HBASE-82) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Jim Kellerman (JIRA) Fri, 15 Feb 2008 10:36:30 -0800

    [ https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569357#action_12569357 ]


Jim Kellerman commented on HBASE-82:
------------------------------------

> Bryan Duxbury - 15/Feb/08 09:56 AM
> I understand that distributing new code is part of MapReduce, but that
> makes a lot of sense, because MapReduce is a job-oriented, limited-lifetime
> process. When it ends, you push out new code. HBase, on the other hand,
> should be a long-running process, which makes service interruptions to add
> new key types costly, especially if it's being used by multiple
> applications.

I agree. I just wanted to present some pros and cons for generic
WritableComparable. In my opinion, the cons outweigh the pros.

> I agree that byte arrays as keys is acceptable. What's the big difference
> between Text and a byte array as it is? Just additional logic in the Text
> class?

Mostly. Text isn't very fussy about what you put into it, and it just
serializes bytes. But there is kind of an expectation that it contains
UTF-8 text.
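
To make the UTF-8 expectation concrete, here is a minimal sketch in plain Java. It does not use hadoop.io.Text; the class and method names (Utf8RoundTrip, roundTrip, survivesAsText) are hypothetical and only show what happens when arbitrary key bytes are treated as UTF-8 text: a key that is valid UTF-8 survives a decode/encode round trip, while a binary key does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or HBase.
final class Utf8RoundTrip {
    // Decode the bytes as UTF-8, then encode the resulting String again.
    // new String(byte[], Charset) replaces malformed input with U+FFFD,
    // so bytes that are not valid UTF-8 come back changed.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // True when the key can be treated as text without losing information.
    static boolean survivesAsText(byte[] raw) {
        return Arrays.equals(raw, roundTrip(raw));
    }

    public static void main(String[] args) {
        byte[] textKey = "row-0001".getBytes(StandardCharsets.UTF_8);
        byte[] binaryKey = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x01};
        System.out.println(survivesAsText(textKey));   // prints "true"
        System.out.println(survivesAsText(binaryKey)); // prints "false"
    }
}
```

A container that only stores and serializes bytes never hits this problem by itself; it shows up as soon as some code path assumes the contents can be viewed as a String.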

> If we switch to using byte arrays as keys, we should be prepared to offer
> convenience overloaded methods to take String or Text keys which get
> converted before being sent over the wire.

Yes.
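
A rough sketch of what such convenience overloads could look like, in plain Java. SketchTable and its methods are hypothetical and are not the HBase client API; the in-memory TreeMap merely stands in for the table. The String overloads convert to UTF-8 bytes and delegate to the byte[] versions, so only byte arrays reach the storage layer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table keyed by byte[] rows.
final class SketchTable {
    // Unsigned lexicographic order over byte arrays; a shorter array
    // sorts before a longer one that it is a prefix of.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    private final TreeMap<byte[], String> rows = new TreeMap<>(BYTE_ORDER);

    // Primary methods: keys are raw bytes.
    void put(byte[] row, String value) {
        rows.put(row.clone(), value); // copy so later caller edits cannot corrupt the key
    }

    String get(byte[] row) {
        return rows.get(row);
    }

    // Convenience overloads: convert the String key, then delegate.
    void put(String row, String value) {
        put(row.getBytes(StandardCharsets.UTF_8), value);
    }

    String get(String row) {
        return get(row.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        SketchTable table = new SketchTable();
        table.put("row1", "hello");
        // The same row is reachable through either overload.
        System.out.println(table.get("row1".getBytes(StandardCharsets.UTF_8))); // prints "hello"
    }
}
```

Fixing the String encoding to UTF-8 in one place means every caller of the String overloads produces the same bytes for the same key.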

> If we change keys from Text to byte[], will we also change column family
> names and qualifiers in the same way?

No, I don't think so. There are good reasons for keeping the schema as
readable text.


> [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
> ----------------------------------------------------------------------
>
>                 Key: HBASE-82
>                 URL: https://issues.apache.org/jira/browse/HBASE-82
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: Jim Kellerman
>
> I have heard from several people that row keys in HBase should be less 
> restricted than hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead 
> to the most general case being either hadoop.io.BytesWritable or 
> hbase.io.ImmutableBytesWritable. The primary difference between these two 
> classes is that hadoop.io.BytesWritable by default allocates 100 bytes, and 
> if you do not pay attention to the length (BytesWritable.getSize()), 
> converting a String to a BytesWritable and vice versa can become problematic. 
> hbase.io.ImmutableBytesWritable, in contrast, only allocates as many bytes as 
> you pass in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for 
> ImmutableBytesWritable, because it has a fixed size once set, and operations 
> like get, etc. do not have to do something like System.arraycopy where you 
> specify the number of bytes to copy.
> Your comments and questions are welcome on this issue. If we receive enough 
> feedback that Text is too restrictive, we are willing to change it, but we 
> need to hear what would be the most useful thing to change it to as well.
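
The length pitfall described in the issue text can be sketched in plain Java. PaddedBuffer below is a hypothetical stand-in, not hadoop.io.BytesWritable; it only mimics the behavior the description mentions (a 100-byte default allocation with a separately tracked size) to show how ignoring the size corrupts a String conversion.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a growable byte container; not a Hadoop class.
final class PaddedBuffer {
    private byte[] buf = new byte[100]; // default capacity, as in the issue description
    private int size = 0;               // number of valid bytes in buf

    void set(byte[] src) {
        if (src.length > buf.length) {
            buf = new byte[src.length];
        }
        System.arraycopy(src, 0, buf, 0, src.length);
        size = src.length;
    }

    byte[] backingArray() {
        return buf;
    }

    int size() {
        return size;
    }

    // Wrong: decodes the whole backing array, including unused padding.
    static String naive(PaddedBuffer b) {
        return new String(b.backingArray(), StandardCharsets.UTF_8);
    }

    // Right: decodes only the first size() bytes.
    static String careful(PaddedBuffer b) {
        return new String(b.backingArray(), 0, b.size(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PaddedBuffer b = new PaddedBuffer();
        b.set("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println(naive(b).length());   // prints "100"
        System.out.println(careful(b).length()); // prints "4"
    }
}
```

A container that allocates exactly the bytes passed in and never resizes has no gap between capacity and size, so the naive conversion and the careful one coincide.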

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
