[ https://issues.apache.org/jira/browse/PHOENIX-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364227#comment-14364227 ]
James Taylor commented on PHOENIX-1737:
---------------------------------------
bq. I have GZ compression and FAST_DIFF encoding on the HBase table, which
should take care of deduping.
How would this take care of deduping? Maybe the KeyValueSortReducer would,
though.
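GZ block compression and FAST_DIFF data-block encoding are lossless: they change how cells are laid out on disk, not which cells are stored, so on their own they do not remove duplicate cells. A reducer that collects each row's cells into a sorted set would. The sketch below assumes, as HBase's KeyValueSortReducer is understood to do, that cells are gathered into a TreeSet ordered by a key comparator; a TreeSet keeps only one of any elements that compare equal. Cell and KEY_ORDER are hypothetical, simplified stand-ins, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SortedSetDedupSketch {

    // Hypothetical, simplified stand-in for an HBase cell (not org.apache.hadoop.hbase.KeyValue).
    public static final class Cell {
        private final String row, family, qualifier, value;
        private final long ts;

        public Cell(String row, String family, String qualifier, long ts, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
            this.value = value;
        }

        public String row() { return row; }
        public String family() { return family; }
        public String qualifier() { return qualifier; }
        public long ts() { return ts; }
        public String value() { return value; }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "@" + ts + "=" + value;
        }
    }

    // Key order: row, family, qualifier, then newest timestamp first.
    // The value is deliberately not part of the ordering.
    public static final Comparator<Cell> KEY_ORDER =
        Comparator.comparing(Cell::row)
            .thenComparing(Cell::family)
            .thenComparing(Cell::qualifier)
            .thenComparing(Comparator.comparingLong(Cell::ts).reversed());

    // Sorts one row's cells; the TreeSet silently drops any cell whose key
    // compares equal to a cell it already holds.
    public static List<Cell> sortAndDedupe(List<Cell> cells) {
        TreeSet<Cell> sorted = new TreeSet<>(KEY_ORDER);
        sorted.addAll(cells);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        List<Cell> input = Arrays.asList(
            new Cell("r1", "0", "q2", 1L, "b"),
            new Cell("r1", "0", "q1", 1L, "a"),
            new Cell("r1", "0", "q1", 1L, "a"),   // exact duplicate written twice
            new Cell("r1", "0", "q1", 2L, "a2"));
        List<Cell> output = sortAndDedupe(input);
        System.out.println(output.size()); // 3
        output.forEach(System.out::println);
    }
}
```

Because the value is not part of the ordering, two cells with the same key but different values also collapse to one, while cells that differ only by timestamp are both kept.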
Thanks for your help on this one, [~tulasip].
> Provide APIs for creating Phoenix encoded rowkeys
> -------------------------------------------------
>
> Key: PHOENIX-1737
> URL: https://issues.apache.org/jira/browse/PHOENIX-1737
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Tulasi P
>
> Here is the code I used for direct Phoenix encoding of the composite rowkey.
> Bulk-loading data with direct encoding can give up to 4x better performance
> compared to the JDBC path used by the default CSV bulk loader.
> Providing APIs for performing Phoenix encoding would be useful in such
> scenarios.
> {code}
> import org.apache.hadoop.hbase.KeyValue;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.phoenix.schema.SaltingUtil;
>
> // Body of Mapper.map(): NUM_BUCKETS, context and the String values v1..v3
> // come from the enclosing mapper.
> // The rowkey is a 3-column (unsigned, fixed-size) composite key;
> // the non-key columns are q1, q2, q3 in the default column family "0".
> // key1..key3 must be filled with the encoded key column values.
> byte[] key1 = new byte[1];
> byte[] key2 = new byte[4];
> byte[] key3 = new byte[4];
>
> // Byte 0 is left free for the salt byte, which getSaltedKey() computes and fills in.
> byte[] outKeyByteArr = new byte[1 + key1.length + key2.length + key3.length];
> System.arraycopy(key1, 0, outKeyByteArr, 1, key1.length);
> System.arraycopy(key2, 0, outKeyByteArr, 1 + key1.length, key2.length);
> System.arraycopy(key3, 0, outKeyByteArr, 1 + key1.length + key2.length, key3.length);
> byte[] saltedKeyByteArr =
>     SaltingUtil.getSaltedKey(new ImmutableBytesWritable(outKeyByteArr), NUM_BUCKETS);
> ImmutableBytesWritable outputKey = new ImmutableBytesWritable(saltedKeyByteArr);
>
> byte[] family = Bytes.toBytes("0");
> KeyValue kv = new KeyValue(saltedKeyByteArr, family, Bytes.toBytes("q1"), Bytes.toBytes(v1));
> context.write(outputKey, kv);
> kv = new KeyValue(saltedKeyByteArr, family, Bytes.toBytes("q2"), Bytes.toBytes(v2));
> context.write(outputKey, kv);
> kv = new KeyValue(saltedKeyByteArr, family, Bytes.toBytes("q3"), Bytes.toBytes(v3));
> context.write(outputKey, kv);
> // Phoenix's empty key-value column "_0", so the row exists even if all other columns are null.
> kv = new KeyValue(saltedKeyByteArr, family, Bytes.toBytes("_0"));
> context.write(outputKey, kv);
> {code}
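The byte layout the snippet builds by hand, [salt][key1][key2][key3], can be sketched in plain Java without HBase or Phoenix on the classpath, which also hints at what a Phoenix encoding API would need to cover. SaltedRowKeySketch and its methods are hypothetical, not Phoenix APIs. The sketch assumes Phoenix stores UNSIGNED_TINYINT as one byte and UNSIGNED_INT as four big-endian bytes, and its salt function is only modeled on Phoenix's salting, so real code should keep calling SaltingUtil.getSaltedKey as in the snippet.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SaltedRowKeySketch {

    // ASSUMPTION: UNSIGNED_TINYINT (0..127) is stored as a single byte.
    public static byte[] encodeUnsignedTinyint(int v) {
        if (v < 0 || v > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("UNSIGNED_TINYINT out of range: " + v);
        }
        return new byte[] {(byte) v};
    }

    // ASSUMPTION: UNSIGNED_INT (0..Integer.MAX_VALUE) is stored as 4 big-endian bytes,
    // the same layout HBase's Bytes.toBytes(int) produces.
    public static byte[] encodeUnsignedInt(int v) {
        if (v < 0) {
            throw new IllegalArgumentException("UNSIGNED_INT must be non-negative: " + v);
        }
        return ByteBuffer.allocate(4).putInt(v).array(); // ByteBuffer is big-endian by default
    }

    // ASSUMPTION: modeled on Phoenix's salting (a 31-based hash of the unsalted key,
    // taken modulo the bucket count). Real code should call SaltingUtil.getSaltedKey.
    static byte saltByte(byte[] key, int offset, int length, int buckets) {
        int hash = 1;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + key[i];
        }
        return (byte) Math.abs(hash % buckets);
    }

    // Concatenates fixed-width key parts after a reserved byte 0, then fills in the salt.
    public static byte[] saltedKey(int buckets, byte[]... parts) {
        int length = 1;
        for (byte[] part : parts) {
            length += part.length;
        }
        byte[] key = new byte[length];
        int pos = 1;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, key, pos, part.length);
            pos += part.length;
        }
        key[0] = saltByte(key, 1, length - 1, buckets);
        return key;
    }

    public static void main(String[] args) {
        byte[] key = saltedKey(8,
            encodeUnsignedTinyint(3), encodeUnsignedInt(42), encodeUnsignedInt(7));
        System.out.println(key.length); // 10
        System.out.println(Arrays.toString(Arrays.copyOfRange(key, 1, key.length)));
        // [3, 0, 0, 0, 42, 0, 0, 0, 7]
    }
}
```

Because each key part has a fixed width, no separator bytes are needed between key columns; variable-length types such as VARCHAR would need Phoenix's separator handling, which this sketch does not attempt.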