[ 
https://issues.apache.org/jira/browse/HBASE-20710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509937#comment-16509937
 ] 

huaxiang sun commented on HBASE-20710:
--------------------------------------

Thanks [~mdrob] for the review; I will address the comments. The main idea is
as follows:

Cellblock:
[family1:qualifier1, v1], [family1:qualifier2, v2], [family1:qualifier3, v3] ....
cell1                     cell2                     cell3

The first family byte array, "family1", is added to the map (familyAdded). For
cell2, the family is read into a scratch array, familyFromCell, which is
allocated once. When the comparison shows that it is the same family,
familyAdded is used to put cell2 into the TreeMap (very fast). For cell3, the
family is read into familyFromCell again (already allocated, so no new
allocation is needed), compared with familyAdded, and familyAdded is reused
for the put into the TreeMap. For cell4 and onward, there is no new allocation
for the family; familyFromCell is simply reused.

With this, there is no need to clone the family for each cell, which saves
heap allocation. Compared with the pre-patch code, the saving is large,
because that code calls cloneFamily() twice for each cell (in the cellblock
case). The same applies to the normal put case.
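
The steps above can be sketched roughly as follows. This is only an
illustration of the idea, not the code in the patch: BufferCell,
copyFamilyInto(), groupByFamily() and the allocation counter are hypothetical
names made up for this example, and the sketch assumes consecutive cells
usually share a family of the same length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class FamilyReuseSketch {
    /** Hypothetical stand-in for a cell whose bytes live in an off-heap buffer. */
    static final class BufferCell {
        final ByteBuffer buf;
        final int familyOffset;
        final int familyLength;

        BufferCell(ByteBuffer buf, int familyOffset, int familyLength) {
            this.buf = buf;
            this.familyOffset = familyOffset;
            this.familyLength = familyLength;
        }

        /** Allocates a new array on every call; this is the cost being avoided. */
        byte[] cloneFamily() {
            byte[] copy = new byte[familyLength];
            copyFamilyInto(copy);
            return copy;
        }

        /** Copies the family into a caller-supplied array without allocating. */
        void copyFamilyInto(byte[] dst) {
            for (int i = 0; i < familyLength; i++) {
                dst[i] = buf.get(familyOffset + i);
            }
        }
    }

    /** Counts family byte[] allocations made by the last groupByFamily() call. */
    static int byteArrayAllocations;

    static TreeMap<byte[], List<BufferCell>> groupByFamily(List<BufferCell> cells) {
        byteArrayAllocations = 0;
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        TreeMap<byte[], List<BufferCell>> familyMap = new TreeMap<>(unsignedOrder);
        byte[] familyAdded = null;    // family array most recently used as a map key
        byte[] familyFromCell = null; // scratch array, reused across cells
        List<BufferCell> current = null;

        for (BufferCell cell : cells) {
            if (familyAdded != null && cell.familyLength == familyAdded.length) {
                if (familyFromCell == null || familyFromCell.length != cell.familyLength) {
                    familyFromCell = new byte[cell.familyLength];
                    byteArrayAllocations++;
                }
                cell.copyFamilyInto(familyFromCell);
                if (Arrays.equals(familyFromCell, familyAdded)) {
                    current.add(cell); // same family: no clone, no map lookup
                    continue;
                }
            }
            // First cell, or the family changed: clone once and look up the map.
            byte[] key = cell.cloneFamily();
            byteArrayAllocations++;
            current = familyMap.computeIfAbsent(key, k -> new ArrayList<>());
            current.add(cell);
            familyAdded = key;
        }
        return familyMap;
    }

    /** Builds a direct-buffer cell holding family bytes followed by qualifier bytes. */
    static BufferCell cell(String family, String qualifier) {
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(f.length + q.length);
        buf.put(f).put(q);
        return new BufferCell(buf, 0, f.length);
    }

    public static void main(String[] args) {
        List<BufferCell> cells = List.of(
            cell("family1", "qualifier1"),
            cell("family1", "qualifier2"),
            cell("family1", "qualifier3"));
        TreeMap<byte[], List<BufferCell>> map = groupByFamily(cells);
        System.out.println(map.size() + " family, "
            + map.firstEntry().getValue().size() + " cells, "
            + byteArrayAllocations + " byte[] allocations");
    }
}
```

For the three-cell cellblock above, the sketch allocates two family arrays in
total (one clone for cell1, one scratch array for cell2), and that number
stays the same no matter how many further cells share the family.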


> extra cloneFamily() in Mutation.add(Cell)
> -----------------------------------------
>
>                 Key: HBASE-20710
>                 URL: https://issues.apache.org/jira/browse/HBASE-20710
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>    Affects Versions: 2.0.1
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>            Priority: Minor
>             Fix For: 2.0.1
>
>         Attachments: HBASE-20710-master-v001.patch
>
>
> CPU profiling shows that during PE randomWrite testing, about 1 percent
> of the time is spent in cloneFamily(). Reviewing the code shows that when a
> cell is a DBB-backed ByteBuffKeyValueCell (which is the default with Netty
> RPC), cell.getFamilyArray() calls cloneFamily(), and there is another
> cloneFamily() call on the following line of the code. Since this is the
> critical write path, it needs to be optimized.
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java#L791
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java#L795



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
