[ https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096763#comment-15096763 ]
James Taylor edited comment on PHOENIX-2417 at 1/14/16 4:17 AM:
----------------------------------------------------------------

Thanks for the update. I noticed one issue that may be causing the failures. In BaseResultIterators.incrementGuidePostStreamToCurrentKey you need to use the Bytes.compareTo() that takes an offset and length, as the underlying ImmutableBytesWritable may have a length set that's less than byte[].length.
{code}
+    private int incrementGuidePostStreamToCurrentKey(PrefixByteDecoder decoder, DataInput input, byte[] currentKey) {
+        int count = 0;
+        try {
+            while (Bytes.compareTo(currentKey, CodecUtils.decode(decoder, input).get()) <= 0) {
+                count++;
+            }
+        } catch (EOFException e) {}
+        return count;
+    }
+
{code}
Another, probably better alternative is to just define currentKey as an ImmutableBytesWritable in BaseResultIterators.getParallelScans(), pass that around everywhere (instead of copying the bytes), and then you can use currentKey.compareTo(CodecUtils.decode(decoder, input)) here. Same for scanRanges.intersectScan(): pass through ImmutableBytesWritable and don't copy the bytes unless you have to.


was (Author: jamestaylor):
Thanks for the update. I noticed one issue that may be causing the failures. In BaseResultIterators.incrementGuidePostStreamToCurrentKey you need to use the Bytes.compareTo() that takes an offset and length, as the underlying ImmutableBytesWritable may have a length set that's less than byte[].length.
{code}
+    private int incrementGuidePostStreamToCurrentKey(PrefixByteDecoder decoder, DataInput input, byte[] currentKey) {
+        int count = 0;
+        try {
+            while (Bytes.compareTo(currentKey, CodecUtils.decode(decoder, input).get()) <= 0) {
+                count++;
+            }
+        } catch (EOFException e) {}
+        return count;
+    }
+
{code}

> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, PHOENIX-2417_v2_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any
> increase in latency for point scans. However, this significantly increases
> the amount of memory used when caching the guideposts on the client.
> Guideposts are equidistant row keys in the form of raw byte[] which are
> likely to have a large percentage of their leading bytes in common (as
> they're stored in sorted order). We should use a simple compression
> technique to mitigate this. I noticed that Apache Parquet has a run length
> encoding - perhaps we can use that.
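
For illustration only, here is a minimal sketch of the two ideas in this thread: storing sorted row keys as a shared-prefix length plus a suffix, and comparing a decoded key by offset and length rather than by the whole backing array. It uses only the JDK. The names PrefixKeySketch, Decoder and countUpTo, the 2-byte length fields, and the reused decode buffer are all assumptions made for this example. They are not Phoenix's PrefixByteEncoder/PrefixByteDecoder or the HBase Bytes API, and countUpTo is not meant to reproduce the exact semantics of incrementGuidePostStreamToCurrentKey.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and record layout are assumptions, not Phoenix code.
class PrefixKeySketch {

    // Encodes sorted keys as records of (sharedPrefixLength, suffixLength, suffixBytes).
    // Assumes every key is shorter than 65536 bytes (lengths are written as 2 bytes).
    static byte[] encode(byte[][] sortedKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] previous = new byte[0];
        try {
            for (byte[] key : sortedKeys) {
                int shared = 0;
                int max = Math.min(previous.length, key.length);
                while (shared < max && previous[shared] == key[shared]) {
                    shared++;
                }
                out.writeShort(shared);
                out.writeShort(key.length - shared);
                out.write(key, shared, key.length - shared);
                previous = key;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decoder that reuses one buffer, so the valid length can be less than buffer.length.
    static final class Decoder {
        final byte[] buffer;
        int length;

        Decoder(int maxKeyLength) {
            buffer = new byte[maxKeyLength];
        }

        // Reads the next key into buffer[0, length); throws EOFException at end of input.
        void next(DataInputStream in) throws IOException {
            int shared = in.readUnsignedShort();
            int suffix = in.readUnsignedShort();
            in.readFully(buffer, shared, suffix); // bytes before 'shared' are kept from the previous key
            length = shared + suffix;
        }
    }

    // Unsigned lexicographic comparison of two byte ranges.
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    // Counts leading keys that sort at or before currentKey. With useDecodedLength=false
    // the whole reused buffer is compared, including stale bytes past the valid length.
    static int countUpTo(byte[] encoded, byte[] currentKey, int maxKeyLength, boolean useDecodedLength) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        Decoder decoder = new Decoder(maxKeyLength);
        int count = 0;
        try {
            while (true) {
                decoder.next(in);
                int len = useDecodedLength ? decoder.length : decoder.buffer.length;
                if (compare(decoder.buffer, 0, len, currentKey, 0, currentKey.length) > 0) {
                    break;
                }
                count++;
            }
        } catch (EOFException e) {
            // End of the encoded stream: every remaining key has been counted.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[][] keys = {
            "org0001-user-000100".getBytes(StandardCharsets.UTF_8),
            "org0001-user-0002".getBytes(StandardCharsets.UTF_8),
            "org0001-user-0003".getBytes(StandardCharsets.UTF_8),
        };
        byte[] encoded = encode(keys);
        System.out.println("encoded size: " + encoded.length);
        System.out.println("offset/length compare: " + countUpTo(encoded, keys[1], 19, true));
        System.out.println("whole-buffer compare:  " + countUpTo(encoded, keys[1], 19, false));
    }
}
```

In this sketch the second key is shorter than the first, so after decoding it the reused buffer still holds trailing bytes from the first key. Comparing the whole buffer treats those stale bytes as part of the key and gives a different count than the offset/length comparison. That is the kind of mismatch the comment above warns about when a length is set to less than byte[].length.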