[ https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172541#comment-17172541 ]
ramkrishna.s.vasudevan edited comment on HBASE-24754 at 8/6/20, 5:16 PM:
-------------------------------------------------------------------------

I was able to verify this in my local Linux VM, and the significant drop is due to the Comparator. branch-1.3 consistently took about 11 to 12 secs, whereas branch-2 varied widely, from 15 to 22 secs. The stack traces below explain why.

Branch-1.3
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f5ffc010800 nid=0x4b0b runnable [0x00007f6003887000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1897)
        at java.util.TreeMap.put(TreeMap.java:552)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:104)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:157)
{code}

Whereas in the branch-2 code base
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f4a48016000 nid=0x488a runnable [0x00007f4a507bb000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.toShort(Bytes.java:1533)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1127)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1111)
        at org.apache.hadoop.hbase.KeyValue.getRowLength(KeyValue.java:1337)
        at org.apache.hadoop.hbase.KeyValue.getFamilyOffset(KeyValue.java:1353)
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1368)
        at org.apache.hadoop.hbase.KeyValue.getQualifierLength(KeyValue.java:1406)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareQualifiers(CellComparatorImpl.java:169)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareColumns(CellComparatorImpl.java:105)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareWithoutRow(CellComparatorImpl.java:266)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:86)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:67)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:45)
        at java.util.TreeMap.put(TreeMap.java:552)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:191)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:242)
{code}

So we do more work per comparison when we have large rows. I think a similar thing is happening in the other issue, where we try to filter out a large number of rows during a scan (just a guess; I have not spent time on that).


was (Author: ram_krish):

I was able to verify this in my local Linux VM, and the significant drop is due to the Comparator. branch-1.3 consistently took about 11 to 12 secs, whereas branch-2 varied widely, from 15 to 22 secs. The stack traces below explain why.

Branch-1.3
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f5ffc010800 nid=0x4b0b runnable [0x00007f6003887000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1897)
        at java.util.TreeMap.put(TreeMap.java:552)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:104)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:157)
{code}

Where the code there is
{code}
return Bytes.compareTo(left, loffset + lfamilylength, llength - lfamilylength,
    right, roffset + rfamilylength, rlength - rfamilylength);
{code}

Whereas in the branch-2 code base
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f4a48016000 nid=0x488a runnable [0x00007f4a507bb000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.toShort(Bytes.java:1533)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1127)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1111)
        at org.apache.hadoop.hbase.KeyValue.getRowLength(KeyValue.java:1337)
        at org.apache.hadoop.hbase.KeyValue.getFamilyOffset(KeyValue.java:1353)
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1368)
        at org.apache.hadoop.hbase.KeyValue.getQualifierLength(KeyValue.java:1406)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareQualifiers(CellComparatorImpl.java:169)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareColumns(CellComparatorImpl.java:105)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareWithoutRow(CellComparatorImpl.java:266)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:86)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:67)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:45)
        at java.util.TreeMap.put(TreeMap.java:552)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:191)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:242)
{code}

So we do more work per comparison when we have large rows. I think a similar thing is happening in the other issue, where we try to filter out a large number of rows during a scan (just a guess; I have not spent time on that).


> Bulk load performance is degraded in HBase 2
> ---------------------------------------------
>
>                 Key: HBASE-24754
>                 URL: https://issues.apache.org/jira/browse/HBASE-24754
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 2.2.3
>            Reporter: Ajeet Rai
>            Priority: Major
>         Attachments: Branch1.3_putSortReducer_sampleCode.patch, Branch2_putSortReducer_sampleCode.patch
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
> Test Input: > 1: Table with 500 region(300 column family) > 2: data =2 TB > Data Sample > 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111 > 3: Cluster: 7 node(2 master+5 Region Server) > 4: No of Container Launched are same in both case > HBase 2 took 10% more time then HBase 1.3 where test input is same for both > cluster > > |Feature|HBase 2.2.3 > Time(Sec)|HBase 1.3.1 > Time(Sec)|Diff%|Snappy lib: > | > |BulkLoad|21837|19686.16|-10.93|Snappy lib: > HBase 2.2.3: 1.4 > HBase 1.3.1: 1.4| -- This message was sent by Atlassian Jira (v8.3.4#803005)
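
For illustration only, here is a minimal sketch of the difference the two stack traces in the comment point at: comparing the column part of two keys through accessors that re-decode the 2-byte row length on every call, versus decoding the layout once and then comparing plain byte ranges. It assumes a simplified key layout ([2-byte row length][row][1-byte family length][family][qualifier]), which is not the full KeyValue format. All names (ComparatorSketch, key, compareColumnsViaAccessors, compareColumnsFlat, shortDecodes) are hypothetical and are not HBase APIs. The sketch only counts decodes; it does not measure time and makes no claim about the actual HBase code paths.

```java
import java.nio.charset.StandardCharsets;

public class ComparatorSketch {
    // Counts how often the 2-byte row length is decoded (illustration only).
    public static int shortDecodes = 0;

    // Builds a key in the simplified, assumed layout:
    // [2-byte row length][row][1-byte family length][family][qualifier]
    public static byte[] key(String row, String family, String qualifier) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[2 + r.length + 1 + f.length + q.length];
        out[0] = (byte) (r.length >> 8);
        out[1] = (byte) r.length;
        System.arraycopy(r, 0, out, 2, r.length);
        out[2 + r.length] = (byte) f.length;
        System.arraycopy(f, 0, out, 3 + r.length, f.length);
        System.arraycopy(q, 0, out, 3 + r.length + f.length, q.length);
        return out;
    }

    // Accessors that derive every offset from the buffer on each call.
    static int rowLength(byte[] k) {
        shortDecodes++;
        return ((k[0] & 0xff) << 8) | (k[1] & 0xff);
    }
    static int familyOffset(byte[] k) { return 2 + rowLength(k) + 1; }
    static int familyLength(byte[] k) { return k[familyOffset(k) - 1] & 0xff; }
    static int qualifierOffset(byte[] k) { return familyOffset(k) + familyLength(k); }
    static int qualifierLength(byte[] k) { return k.length - qualifierOffset(k); }

    // Unsigned lexicographic comparison of two byte ranges.
    static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
            if (d != 0) return d;
        }
        return aLen - bLen;
    }

    // Accessor style: each helper call walks the layout again.
    public static int compareColumnsViaAccessors(byte[] a, byte[] b) {
        int d = compareRange(a, familyOffset(a), familyLength(a),
                             b, familyOffset(b), familyLength(b));
        if (d != 0) return d;
        return compareRange(a, qualifierOffset(a), qualifierLength(a),
                            b, qualifierOffset(b), qualifierLength(b));
    }

    // Flat style: decode each key's layout once, then compare plain ranges.
    public static int compareColumnsFlat(byte[] a, byte[] b) {
        int aFamOff = familyOffset(a), bFamOff = familyOffset(b);
        int aFamLen = a[aFamOff - 1] & 0xff, bFamLen = b[bFamOff - 1] & 0xff;
        int d = compareRange(a, aFamOff, aFamLen, b, bFamOff, bFamLen);
        if (d != 0) return d;
        int aQOff = aFamOff + aFamLen, bQOff = bFamOff + bFamLen;
        return compareRange(a, aQOff, a.length - aQOff, b, bQOff, b.length - bQOff);
    }

    public static void main(String[] args) {
        byte[] a = key("row1", "cf", "q1");
        byte[] b = key("row1", "cf", "q2");

        shortDecodes = 0;
        int viaAccessors = compareColumnsViaAccessors(a, b);
        int accessorDecodes = shortDecodes;

        shortDecodes = 0;
        int flat = compareColumnsFlat(a, b);
        int flatDecodes = shortDecodes;

        System.out.println("accessor result=" + viaAccessors + " decodes=" + accessorDecodes);
        System.out.println("flat result=" + flat + " decodes=" + flatDecodes);
    }
}
```

Both methods return the same ordering; the only difference the sketch exposes is how many times the row length is decoded per comparison. Inside a TreeSet insert that cost is paid on every comparator call, which is where both stack traces above were captured.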