[jira] [Comment Edited] (PHOENIX-2417) Compress memory used by row key byte[] of guideposts

James Taylor (JIRA) Wed, 13 Jan 2016 20:18:55 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096763#comment-15096763
 ]


James Taylor edited comment on PHOENIX-2417 at 1/14/16 4:17 AM:
----------------------------------------------------------------

Thanks for the update. I noticed one issue that may be causing the failures. In 
BaseResultIterators.incrementGuidePostStreamToCurrentKey you need to use the 
Bytes.compareTo() overload that takes an offset and length, because get() 
returns the whole backing array and the underlying ImmutableBytesWritable may 
have a length set that's less than byte[].length.
{code}
+    private int incrementGuidePostStreamToCurrentKey(PrefixByteDecoder decoder, DataInput input, byte[] currentKey) {
+        int count = 0;
+        try {
+            while (Bytes.compareTo(currentKey, CodecUtils.decode(decoder, input).get()) <= 0) {
+                count++;
+            }
+        } catch (EOFException e) {}
+        return count;
+    }
+
{code}
Another, probably better alternative is to define currentKey as an 
ImmutableBytesWritable in BaseResultIterators.getParallelScans(), pass that 
around everywhere (instead of copying the bytes), and then use 
currentKey.compareTo(CodecUtils.decode(decoder, input)) here.

Same for scanRanges.intersectScan() - pass through ImmutableBytesWritable and 
don't copy the bytes unless you have to.
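
To make the offset/length point concrete, here is a minimal, self-contained sketch. It deliberately does not use the HBase classes: Slice is a hypothetical stand-in for a byte[] view with an offset and length, and compareRange is a hand-written lexicographic comparison, both invented for illustration. It only shows why comparing against the whole backing array can give a different answer than comparing against the valid range.

```java
// Hypothetical stand-in for a byte[] view with offset and length; not the HBase class.
final class Slice {
    final byte[] buf;
    final int offset;
    final int length;

    Slice(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }
}

final class RangeCompareSketch {
    // Lexicographic, unsigned comparison of two byte ranges.
    static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    // The pitfall: ignores the slice bounds and compares against the whole backing array.
    static int compareWholeArray(byte[] key, Slice s) {
        return compareRange(key, 0, key.length, s.buf, 0, s.buf.length);
    }

    // Compares only against the valid region of the slice.
    static int compareValidRange(byte[] key, Slice s) {
        return compareRange(key, 0, key.length, s.buf, s.offset, s.length);
    }

    public static void main(String[] args) {
        // A reused buffer: only the first 2 bytes ("ab") are valid; 'z' is stale.
        byte[] backing = {'a', 'b', 'z'};
        Slice decoded = new Slice(backing, 0, 2);
        byte[] currentKey = {'a', 'b', 'c'};

        // Negative: the stale 'z' makes currentKey look smaller than the decoded key.
        System.out.println(compareWholeArray(currentKey, decoded));
        // Positive: "abc" sorts after the valid key "ab".
        System.out.println(compareValidRange(currentKey, decoded));
    }
}
```

In this sketch a loop condition of the form "compare(...) <= 0" would be true with the whole-array comparison and false with the range-aware one, which is the kind of discrepancy described above.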


was (Author: jamestaylor):
Thanks for the update. I noticed one issue that may be causing the failures. In 
BaseResultIterators.incrementGuidePostStreamToCurrentKey you need to use the 
Bytes.compareTo() that takes an offset and length as the underlying 
ImmutableBytesWritable may have a length set that's less than byte[].length.
{code}
+    private int incrementGuidePostStreamToCurrentKey(PrefixByteDecoder decoder, DataInput input, byte[] currentKey) {
+        int count = 0;
+        try {
+            while (Bytes.compareTo(currentKey, CodecUtils.decode(decoder, input).get()) <= 0) {
+                count++;
+            }
+        } catch (EOFException e) {}
+        return count;
+    }
+
{code}


> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, 
> PHOENIX-2417_v2_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any 
> increase in latency for point scans. However, this significantly increases 
> the amount of memory used when caching the guideposts on the client. 
> Guideposts are equidistant row keys in the form of raw byte[] which are 
> likely to have a large percentage of their leading bytes in common (as 
> they're stored in sorted order). We should use a simple compression 
> technique to mitigate this. I noticed that Apache Parquet has a run length 
> encoding - perhaps we can use that.
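
As an illustration of why shared leading bytes compress well, here is a minimal sketch of shared-prefix (front) coding over sorted keys. This is not the codec from the attached patches and not Parquet's run length encoding; the class and method names are hypothetical, and the 2-byte length fields assume every key is shorter than 65536 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each key is stored as (shared prefix length, suffix length, suffix bytes).
final class PrefixCodingSketch {
    static byte[] encode(List<byte[]> sortedKeys) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            // Count the leading bytes this key shares with the previous key.
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.writeShort(shared);
            out.writeShort(key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<byte[]> decode(byte[] encoded) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        while (in.available() > 0) {
            int shared = in.readUnsignedShort();
            int suffixLen = in.readUnsignedShort();
            // Start from the previous key's prefix, then fill in the new suffix.
            byte[] key = Arrays.copyOf(prev, shared + suffixLen);
            in.readFully(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

Decoding is sequential, since each key depends on the one before it, so a lookup has to stream through the encoded keys rather than index into them directly.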



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
