[ https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102932#comment-15102932 ]
ASF GitHub Bot commented on PHOENIX-2417:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925667

    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
                 }
                 List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
    -            byte[] currentKey = startKey;
    -            int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -            int gpsSize = gps.size();
    +            ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +
    +            int gpsSize = gps.getGuidePostsCount();
                 int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
                 int keyOffset = 0;
    +            ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
                 List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +            ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +            ByteArrayInputStream stream = null;
    +            DataInput input = null;
    +            PrefixByteDecoder decoder = null;
    +            int guideIndex = 0;
    +            if (gpsSize > 0) {
    +                stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +                input = new DataInputStream(stream);
    +                decoder = new PrefixByteDecoder(gps.getMaxLength());
    +                try {
    +                    while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) >= 0
    +                            && currentKey.getLength() != 0) {
    +                        guideIndex++;
    +                    }
    +                } catch (EOFException e) {}
    +            }
    +            byte[] currentKeyBytes = currentKey.copyBytes();
    --- End diff --

    Always use ByteUtil.copyKeyBytesIfNecessary(currentKey) instead of copyBytes() as it's often not necessary to actually copy the bytes.

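
For readers unfamiliar with the idiom behind this review comment, here is a minimal, self-contained sketch of "copy only if necessary". It does not use Phoenix or Hadoop classes: `KeySlice` is a hypothetical stand-in for a (bytes, offset, length) pointer such as ImmutableBytesWritable, and `copyIfNecessary` is an illustrative name, not the actual ByteUtil method. The assumption is that when the slice spans its whole backing array (as currentKey does in the diff, since it wraps startKey from 0 to startKey.length), the array can be returned directly.

```java
import java.util.Arrays;

final class CopyIfNecessarySketch {

    // Hypothetical stand-in for a pointer into a byte array (bytes, offset, length).
    static final class KeySlice {
        final byte[] bytes;
        final int offset;
        final int length;

        KeySlice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    // Returns the backing array itself when the slice covers all of it;
    // otherwise allocates a new array holding just the slice.
    static byte[] copyIfNecessary(KeySlice slice) {
        if (slice.offset == 0 && slice.length == slice.bytes.length) {
            return slice.bytes; // no allocation, no copy
        }
        return Arrays.copyOfRange(slice.bytes, slice.offset, slice.offset + slice.length);
    }

    public static void main(String[] args) {
        byte[] startKey = {1, 2, 3};
        KeySlice whole = new KeySlice(startKey, 0, startKey.length);
        KeySlice tail = new KeySlice(startKey, 1, 2);

        // The whole-array slice is returned as-is; the partial slice is copied.
        System.out.println(copyIfNecessary(whole) == startKey);
        System.out.println(Arrays.toString(copyIfNecessary(tail)));
    }
}
```

The trade-off is that the returned array may be shared with the caller's original, so this only suits keys that are treated as read-only afterwards.
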
> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, PHOENIX-2417_v2_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any
> increase in latency for point scans. However, this significantly increases the
> amount of memory used when caching the guideposts on the client. Guideposts are
> equidistant row keys in the form of raw byte[] which are likely to have a
> large percentage of their leading bytes in common (as they're stored in
> sorted order). We should use a simple compression technique to mitigate this.
> I noticed that Apache Parquet has a run length encoding - perhaps we can use
> that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
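
The observation that sorted row keys share most of their leading bytes suggests prefix (front) coding: store each key as the number of bytes it shares with the previous key, followed by the remaining suffix. The sketch below illustrates that idea only. It is not the format used by the PrefixByteDecoder in the patch, and the class name, method names and the fixed 4-byte length headers are all assumptions chosen for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixCompressionSketch {

    // Encodes keys (assumed sorted) as [sharedPrefixLength][suffixLength][suffix bytes]...
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] previous = new byte[0];
        for (byte[] key : sortedKeys) {
            // Count the leading bytes this key has in common with the previous key.
            int shared = 0;
            int max = Math.min(previous.length, key.length);
            while (shared < max && previous[shared] == key[shared]) {
                shared++;
            }
            writeInt(out, shared);
            writeInt(out, key.length - shared);
            out.write(key, shared, key.length - shared);
            previous = key;
        }
        return out.toByteArray();
    }

    // Rebuilds every key by reusing the shared prefix of the previously decoded key.
    static List<byte[]> decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        List<byte[]> keys = new ArrayList<>();
        byte[] previous = new byte[0];
        while (in.hasRemaining()) {
            int shared = in.getInt();
            int suffixLength = in.getInt();
            // Start from the previous key's bytes, then overwrite everything after the shared prefix.
            byte[] key = Arrays.copyOf(previous, shared + suffixLength);
            in.get(key, shared, suffixLength);
            keys.add(key);
            previous = key;
        }
        return keys;
    }

    // Big-endian 4-byte int, matching ByteBuffer's default byte order.
    private static void writeInt(ByteArrayOutputStream out, int value) {
        out.write(value >>> 24);
        out.write(value >>> 16);
        out.write(value >>> 8);
        out.write(value);
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
                "tenant0001:user000001".getBytes(StandardCharsets.UTF_8),
                "tenant0001:user000250".getBytes(StandardCharsets.UTF_8),
                "tenant0001:user000500".getBytes(StandardCharsets.UTF_8));
        byte[] encoded = encode(keys);
        for (byte[] key : decode(encoded)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

With fixed 4-byte headers this only pays off when keys are long relative to the 8 bytes of overhead per entry; a practical encoder would more likely use variable-length integers for the two lengths.
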