[ 
https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894370#action_12894370
 ] 

Uwe Schindler commented on CASSANDRA-767:
-----------------------------------------

About Lucandra: Currently all keys in Lucene are valid UTF-8 encoded bytes, so 
making them Strings in Cassandra is fine. This also holds for numeric terms, as 
Todd Nine said: they use only 7 bits of each byte, so they are valid UTF-8 
(there was still a bug in Cassandra that trimmed keys, but it is now solved).

Lucene trunk has now migrated to pure byte[] terms, so Lucandra will do the 
same. It is therefore no longer guaranteed that terms in a Lucene index are 
representable as Strings. The ordering of keys must also be native unsigned 
byte[] order, not UTF-16 order (String.compareTo()), for several queries in 
Lucene to work correctly.
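
To illustrate the ordering point, here is a minimal sketch, not taken from 
Lucene or Cassandra: the class and method names are hypothetical. It compares 
two terms once with String.compareTo() (UTF-16 code unit order) and once as 
unsigned UTF-8 bytes. For a supplementary character such as U+10000 the two 
orders disagree, because its surrogate pair starts at 0xD800, which sorts below 
high BMP code units such as 0xFF5E.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene or Cassandra.
final class UnsignedKeyOrder {

    /** Lexicographic comparison that treats every byte as unsigned (0..255). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Java bytes are signed, so mask to get the unsigned value.
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // On a common prefix, the shorter array sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";         // U+FF5E: one UTF-16 code unit
        String supp = "\uD800\uDC00";  // U+10000: a surrogate pair

        byte[] bmpBytes = bmp.getBytes(StandardCharsets.UTF_8);   // EF BD 9E
        byte[] suppBytes = supp.getBytes(StandardCharsets.UTF_8); // F0 90 80 80

        // UTF-16 order: 0xFF5E > 0xD800, so U+FF5E sorts after U+10000.
        System.out.println("String.compareTo: " + Integer.signum(bmp.compareTo(supp)));
        // Unsigned byte order: 0xEF < 0xF0, so U+FF5E sorts before U+10000.
        System.out.println("unsigned bytes:   "
                + Integer.signum(compareUnsigned(bmpBytes, suppBytes)));
    }
}
```

A key store that sorts by String.compareTo() would therefore return these two 
terms in the opposite order from the one a byte-ordered term dictionary expects.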

Additionally, the encoding of terms in Lucene trunk (aka 4.0) will change to 
BOCU-1 for better space efficiency with eastern languages, and numeric terms 
will be saved as raw byte[] using the full 8 bits.
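
As a rough sketch of why full 8-bit terms cannot be stored as Strings (the 
class name and the sample byte values are made up for illustration and are not 
Lucene's actual numeric encoding): bytes limited to 7 bits survive a round trip 
through a UTF-8 decoded String, while arbitrary 8-bit bytes do not, because 
malformed sequences are replaced during decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene or Cassandra.
final class KeyRoundTrip {

    /** Returns true if the key is unchanged after byte[] -> String -> byte[] via UTF-8. */
    static boolean survivesUtf8String(byte[] key) {
        // Malformed input is silently replaced with U+FFFD by this constructor.
        String asString = new String(key, StandardCharsets.UTF_8);
        return Arrays.equals(key, asString.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Made-up term whose bytes all fit in 7 bits: valid UTF-8 (plain ASCII range).
        byte[] sevenBit = {0x60, 0x08, 0x00, 0x7F};
        // Made-up term using the full 8 bits: not a valid UTF-8 sequence.
        byte[] eightBit = {(byte) 0x80, (byte) 0xFF};

        System.out.println("7-bit term survives: " + survivesUtf8String(sevenBit));
        System.out.println("8-bit term survives: " + survivesUtf8String(eightBit));
    }
}
```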

> Row keys should be byte[]s, not Strings
> ---------------------------------------
>
>                 Key: CASSANDRA-767
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-767
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.7 beta 1
>
>         Attachments: 0001-Implement-compaction-benchmark.patch, 
> 0002-Implement-a-legacy-sstable-test.patch, 
> 0003-Store-bytes-in-DecoratedKey-and-cleanup-dead-code.patch, 
> 0004-Extract-read-writeName.patch, 
> 0005-Convert-IPartitioner-disk-key-format-to-bytes.patch, 
> 0006-Bump-SSTable-version-to-c-remove-utf16-encoding-from.patch
>
>
> This issue has come up numerous times, and we've dealt with a lot of pain 
> because of it: let's get it knocked out.
> Keys being Java Strings can make it painful to use Cassandra from other 
> languages, encoding binary data like integers as Strings is very inefficient, 
> and there is a disconnect between our column data types and the plain String 
> treatment we give row keys.
> The key design decision that needs discussion is: Should we apply the column 
> AbstractTypes to row keys? If so, how do Partitioners change?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
