[ https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250562#comment-17250562 ]

Michael Stack commented on HBASE-24754:
---------------------------------------

Chatting w/ a coworker, he talked of being able to make a call high-up on what types of Cells/KVs are involved and, before we start the task, make a call on the CellComparator to use (even suggested auto-generating the optimal... ). Seems like you can do this when bulk loading. Can look at the file and figure out the Cell type... and then choose a CellComparator to use... one w/ no branching, shaped to fit the Cells it will see. Are we set up to allow inserting a particular CellComparator to use in MR tasks?

Good stuff.

> Bulk load performance is degraded in HBase 2
> ---------------------------------------------
>
>                 Key: HBASE-24754
>                 URL: https://issues.apache.org/jira/browse/HBASE-24754
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 2.2.3
>            Reporter: Ajeet Rai
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>         Attachments: Branc2_withComparator_atKeyValue.patch,
> Branch1.3_putSortReducer_sampleCode.patch,
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg,
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, we observed that bulk load performance is degraded in HBase 2.
> Test Input: > 1: Table with 500 region(300 column family) > 2: data =2 TB > Data Sample > 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111 > 3: Cluster: 7 node(2 master+5 Region Server) > 4: No of Container Launched are same in both case > HBase 2 took 10% more time then HBase 1.3 where test input is same for both > cluster > > |Feature|HBase 2.2.3 > Time(Sec)|HBase 1.3.1 > Time(Sec)|Diff%|Snappy lib: > | > |BulkLoad|21837|19686.16|-10.93|Snappy lib: > HBase 2.2.3: 1.4 > HBase 1.3.1: 1.4| -- This message was sent by Atlassian Jira (v8.3.4#803005)
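
For illustration only, here is a minimal Java sketch of the idea raised in the comment above: inspect the cells once, before the sort starts, and pick a comparator specialized to what was found, so the per-comparison path does no type dispatch. Everything in it is an assumption made for the sketch: `SketchCell`, `ArrayCell`, `BufferCell`, `GENERAL`, `ARRAY_ONLY` and `choose` are hypothetical names, not HBase classes, and it compares row keys only. It does not use the real CellComparator API and does not answer whether MR tasks can accept a custom comparator.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorChoiceSketch {

    /** Hypothetical stand-in for a cell; NOT the HBase Cell interface. */
    interface SketchCell {}

    /** Hypothetical cell whose row key lives in an on-heap byte array. */
    static final class ArrayCell implements SketchCell {
        final byte[] row;
        ArrayCell(byte[] row) { this.row = row; }
    }

    /** Hypothetical cell whose row key lives in a ByteBuffer. */
    static final class BufferCell implements SketchCell {
        final ByteBuffer row;
        BufferCell(ByteBuffer row) { this.row = row; }
    }

    /** Extracts the row key, branching on the concrete cell type. */
    static byte[] rowBytes(SketchCell c) {
        if (c instanceof ArrayCell) {
            return ((ArrayCell) c).row;
        }
        ByteBuffer dup = ((BufferCell) c).row.duplicate();
        byte[] out = new byte[dup.remaining()];
        dup.get(out);
        return out;
    }

    /** General comparator: pays for the instanceof branch on every call. */
    static final Comparator<SketchCell> GENERAL =
        (a, b) -> Arrays.compareUnsigned(rowBytes(a), rowBytes(b));

    /** Specialized comparator: valid only if every cell is an ArrayCell. */
    static final Comparator<SketchCell> ARRAY_ONLY =
        (a, b) -> Arrays.compareUnsigned(((ArrayCell) a).row, ((ArrayCell) b).row);

    /** Decides once, up front, which comparator fits the cells to be sorted. */
    static Comparator<SketchCell> choose(List<? extends SketchCell> cells) {
        for (SketchCell c : cells) {
            if (!(c instanceof ArrayCell)) {
                return GENERAL;
            }
        }
        return ARRAY_ONLY;
    }

    public static void main(String[] args) {
        List<SketchCell> cells = new ArrayList<>();
        cells.add(new ArrayCell(new byte[] {2}));
        cells.add(new ArrayCell(new byte[] {1}));
        Comparator<SketchCell> cmp = choose(cells);
        cells.sort(cmp);
        System.out.println("specialized comparator chosen: " + (cmp == ARRAY_ONLY));
    }
}
```

The sketch scans every cell before deciding. A real bulk load would more likely decide from file-level metadata or a sample, which is a separate design question the comment leaves open.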