[
https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569132#action_12569132
]
Kevin Beyer commented on HBASE-82:
----------------------------------
Using Text is not a very good solution. I don't mind avoiding generics, but
the interface should be using a Writable (or WritableComparable) in place of
Text, and it should allow me to specify a comparator. I don't see why the
hbase key code should not be as general as the MapReduce key, and it is
limiting for our use in jaql.
For jaql, I would like to use any type, and even a complex type as the key. My
WritableComparable would work there.
Others may want to create a table that has a custom collation on a text key.
For example, case insensitivity, or in German, where "ss" and eszett (ß) are
equivalent.
Serializing everything into text is not a great solution. Mapping arbitrary
comparators into some text string is nontrivial and often increases storage,
because the derived key must be stored as well as the original form of the key
(e.g., key = data.toUpperCase(), so data needs to be stored too).
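To make the storage point concrete, here is a minimal sketch in plain Java. It
uses no HBase or Hadoop classes: a java.util.TreeMap stands in for a table
sorted by row key, and the class and method names are hypothetical, chosen
only for illustration.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical illustration only; a TreeMap stands in for a sorted table.
class CollationKeySketch {

    // Workaround when only one fixed ordering is available: sort by a
    // derived key and keep the original form as the value, so the data
    // is effectively stored twice.
    static TreeMap<String, String> withDerivedKey(String... data) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String d : data) {
            table.put(d.toUpperCase(Locale.ROOT), d);
        }
        return table;
    }

    // With a caller-supplied comparator, the original form is the key
    // itself and no derived copy is needed.
    static TreeMap<String, String> withComparator(Comparator<String> cmp,
                                                  String... data) {
        TreeMap<String, String> table = new TreeMap<>(cmp);
        for (String d : data) {
            table.put(d, "");
        }
        return table;
    }
}
```

With String.CASE_INSENSITIVE_ORDER passed as the comparator, a lookup for
"APPLE" finds the row that was stored under "Apple", and only one copy of the
text is kept.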
I would like to see this issue reopened. It's probably not the top priority
for hbase, but I think it should be fixed before the uptake of hbase makes it
a painful change later.
If you are really only going to support one specific type for the key, why not
use byte arrays instead? They are faster than Text, and their comparison
method is easier to understand.
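As a sketch of what an easy-to-understand comparison could look like, the
following compares two byte arrays lexicographically, treating each byte as
unsigned. This is one plausible ordering, written in plain Java with a
hypothetical class name; it is not taken from the hbase code.

```java
import java.util.Comparator;

// Hypothetical illustration: lexicographic order over unsigned bytes.
class ByteKeyComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7f.
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared positions are equal: the shorter array sorts first.
        return a.length - b.length;
    }
}
```

The whole rule fits in one loop: the first differing byte decides, and a key
that is a prefix of another sorts before it.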
> [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
> ----------------------------------------------------------------------
>
> Key: HBASE-82
> URL: https://issues.apache.org/jira/browse/HBASE-82
> Project: Hadoop HBase
> Issue Type: Wish
> Reporter: Jim Kellerman
> Priority: Minor
>
> I have heard from several people that row keys in HBase should be less
> restricted than hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead
> to the most general case being either hadoop.io.BytesWritable or
> hbase.io.ImmutableBytesWritable. The primary difference between these two
> classes is that hadoop.io.BytesWritable by default allocates 100 bytes and if
> you do not pay attention to the length (BytesWritable.getSize()), converting
> a String to a BytesWritable and vice versa can become problematic.
> hbase.io.ImmutableBytesWritable, in contrast, only allocates as many bytes as
> you pass in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for
> ImmutableBytesWritable, because it has a fixed size once set, and operations
> like get, etc. do not have to do something like System.arraycopy where you
> specify the number of bytes to copy.
> Your comments, questions are welcome on this issue. If we receive enough
> feedback that Text is too restrictive, we are willing to change it, but we
> need to hear what would be the most useful thing to change it to as well.
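The length pitfall described in the issue text above can be illustrated
without Hadoop. The sketch below uses a hypothetical PaddedBufferSketch class
that, like the behavior described for BytesWritable, preallocates 100 bytes
and tracks the valid length separately. It is an assumption-laden stand-in,
not the Hadoop class itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a buffer whose capacity exceeds its content.
class PaddedBufferSketch {
    private final byte[] bytes = new byte[100]; // preallocated capacity
    private int size;                           // number of valid bytes

    void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (src.length > bytes.length) {
            throw new IllegalArgumentException("too long for this sketch");
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        size = src.length;
    }

    // Ignores the valid length: the result carries the unused padding.
    String naiveToString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Respects the valid length: only the stored bytes are decoded.
    String carefulToString() {
        return new String(bytes, 0, size, StandardCharsets.UTF_8);
    }
}
```

After set("row1"), carefulToString() returns "row1", while naiveToString()
returns a 100-character string that does not equal "row1". A container sized
exactly to its content avoids this class of mistake.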