[ https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022602#comment-17022602 ]
Viraj Jasani edited comment on HBASE-23279 at 1/24/20 12:17 AM:
----------------------------------------------------------------
We have this function in ByteBufferUtils:
{code:java}
public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset,
    int destinationOffset, int length) {
  if (in.hasArray()) {
    System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
  } else if (UNSAFE_AVAIL) {
    UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
  } else {
    ByteBuffer inDup = in.duplicate();
    inDup.position(sourceOffset);
    inDup.get(out, destinationOffset, length);
  }
}
{code}
It is used to copy content from the ByteBuff's current position into a byte array, all the way from TestHFileWriterV3 down to ByteBufferUtils, while copying keys and values. Comparing NONE vs ROW_INDEX_V1, the largest "byte[] out" length occurs with ROW_INDEX_V1:

NONE:
{code:java}
out.len: 32768
sourceOffset: 0
destinationOffset: 0
length: 2408
{code}
ROW_INDEX_V1:
{code:java}
out.len: 458752
sourceOffset: 8
destinationOffset: 0
length: 458752
{code}
For ROW_INDEX_V1, the copy throws ArrayIndexOutOfBoundsException with length 458752. It seems the average keyLen for NONE encoding is ~60, whereas for ROW_INDEX_V1 it is 458752. I tried updating some encoding-based conditions in HFileBlock.unpack(), but no luck.
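The ROW_INDEX_V1 numbers above already explain the failure in the arraycopy branch: sourceOffset (8) plus length (458752) runs past the end of a 458752-byte source array. A minimal standalone sketch of that arithmetic (class and method names here are illustrative, not the actual HBase code):

```java
import java.nio.ByteBuffer;

public class CopyOverflowDemo {
    // Mirrors only the hasArray() branch of ByteBufferUtils.copyFromBufferToArray.
    static void copy(byte[] out, ByteBuffer in, int sourceOffset,
                     int destinationOffset, int length) {
        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(),
                         out, destinationOffset, length);
    }

    public static void main(String[] args) {
        // Heap buffer, so hasArray() is true and the arraycopy branch is taken.
        ByteBuffer in = ByteBuffer.allocate(458752);
        byte[] out = new byte[458752];

        // NONE-like call: 0 + 2408 <= 458752, copies fine.
        copy(out, in, 0, 0, 2408);

        // ROW_INDEX_V1-like call: 8 + 458752 > 458752, so arraycopy
        // throws ArrayIndexOutOfBoundsException on the source array.
        try {
            copy(out, in, 8, 0, 458752);
            System.out.println("no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException");
        }
    }
}
```

So either the reported length or the sourceOffset being handed down to copyFromBufferToArray is off by the 8-byte offset for the encoded block.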
Exception stacktrace:
{code:java}
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1151)
	at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:216)
	at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:228)
	at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.writeDataAndReadFromHFile(TestHFileWriterV3.java:255)
	at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.testHFileFormatV3Internals(TestHFileWriterV3.java:109)
	at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.testHFileFormatV3(TestHFileWriterV3.java:102)
{code}
[~stack] [~ram_krish] We have many HFile write tests, right? Is this the only test that directly deals with the ByteBuff interface? This is the only test failure.

> Switch default block encoding to ROW_INDEX_V1
> ---------------------------------------------
>
>                 Key: HBASE-23279
>                 URL: https://issues.apache.org/jira/browse/HBASE-23279
>             Project: HBase
>          Issue Type: Wish
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Lars Hofhansl
>            Assignee: Viraj Jasani
>            Priority: Minor
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HBASE-23279.master.000.patch, HBASE-23279.master.001.patch, HBASE-23279.master.002.patch, HBASE-23279.master.003.patch, HBASE-23279.master.004.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles are slightly larger, about 3% or so). I think that would be a better default than NONE.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)