[ https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022602#comment-17022602 ]
Viraj Jasani edited comment on HBASE-23279 at 1/24/20 12:17 AM:
----------------------------------------------------------------
We have this function in ByteBufferUtils:
{code:java}
public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset,
    int destinationOffset, int length) {
  if (in.hasArray()) {
    System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
  } else if (UNSAFE_AVAIL) {
    UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
  } else {
    ByteBuffer inDup = in.duplicate();
    inDup.position(sourceOffset);
    inDup.get(out, destinationOffset, length);
  }
}
{code}
It is used to copy content from the ByteBuff's current position into a byte array, all the way from TestHFileWriterV3 down to ByteBufferUtils, while copying keys and values. Comparing NONE vs ROW_INDEX_V1, the largest "byte[] out" length occurs with ROW_INDEX_V1:

NONE:
{code:java}
out.len: 32768
sourceOffset: 0
destinationOffset: 0
length: 2408
{code}
ROW_INDEX_V1:
{code:java}
out.len: 458752
sourceOffset: 8
destinationOffset: 0
length: 458752
{code}
For ROW_INDEX_V1, the copy throws ArrayIndexOutOfBoundsException with length 458752. It seems the average keyLen for NONE encoding is ~60, whereas for ROW_INDEX_V1 it is 458752. I tried updating some encoding-based conditions in HFileBlock.unpack(), but no luck.
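The ROW_INDEX_V1 numbers above already explain the failure in the arraycopy branch: sourceOffset (8) plus length (458752) runs past the end of a 458752-byte source array. A minimal standalone sketch of that arithmetic (class and method names here are illustrative, not the actual HBase code):

```java
import java.nio.ByteBuffer;

public class CopyOverflowDemo {
    // Mirrors only the hasArray() branch of ByteBufferUtils.copyFromBufferToArray.
    static void copy(byte[] out, ByteBuffer in, int sourceOffset,
                     int destinationOffset, int length) {
        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(),
                         out, destinationOffset, length);
    }

    public static void main(String[] args) {
        // Heap buffer, so hasArray() is true and the arraycopy branch is taken.
        ByteBuffer in = ByteBuffer.allocate(458752);
        byte[] out = new byte[458752];

        // NONE-like call: 0 + 2408 <= 458752, copies fine.
        copy(out, in, 0, 0, 2408);

        // ROW_INDEX_V1-like call: 8 + 458752 > 458752, so arraycopy
        // throws ArrayIndexOutOfBoundsException on the source array.
        try {
            copy(out, in, 8, 0, 458752);
            System.out.println("no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException");
        }
    }
}
```

So either the reported length or the sourceOffset being handed down to copyFromBufferToArray is off by the 8-byte offset for the encoded block.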
Exception stacktrace:
{code:java}
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1151)
	at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:216)
	at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:228)
	at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.writeDataAndReadFromHFile(TestHFileWriterV3.java:255)
	at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.testHFileFormatV3Internals(TestHFileWriterV3.java:109)
	at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.testHFileFormatV3(TestHFileWriterV3.java:102)
{code}
[~stack] [~ram_krish] We have many HFile write tests, right? Is this the only test that directly deals with the ByteBuff interface? This is the only test failure.

> Switch default block encoding to ROW_INDEX_V1
> ---------------------------------------------
>
>                 Key: HBASE-23279
>                 URL: https://issues.apache.org/jira/browse/HBASE-23279
>             Project: HBase
>          Issue Type: Wish
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Lars Hofhansl
>            Assignee: Viraj Jasani
>            Priority: Minor
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HBASE-23279.master.000.patch, HBASE-23279.master.001.patch, HBASE-23279.master.002.patch, HBASE-23279.master.003.patch, HBASE-23279.master.004.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles are slightly larger, about 3% or so). I think that would be a better default than NONE.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)