[ 
https://issues.apache.org/jira/browse/HBASE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207793#comment-15207793
 ] 

Anoop Sam John commented on HBASE-15493:
----------------------------------------

Alternatively, how about adding an API to Put that takes a List of Cells?
We already expose APIs in CellUtil to create Cell objects from row, cf, 
qualifier, value, ts, etc., and we already have Put#add(Cell kv).
Maybe we should add something like add(byte[] cf, List<Cell> cells) that puts 
the given list directly into the familyMap without a copy/clone? In that case 
maybe we should name it addImmutable; we already have APIs of this type in Put. 
Yes, a user who calls it knows that HBase will reuse their List and that they 
must not modify it afterwards. When the user creates the Put and the Cells, 
they know the size of the List exactly, so they can size it themselves.
Rather than adding a new size-related API, this might work out? Yes, it may 
mean a few extra LOC in the user app.
Just sharing my thoughts after seeing this JIRA.
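
To make the idea concrete, here is a minimal sketch of what such a method 
could look like. It does not use the real HBase classes: SketchCell and 
SketchPut are hypothetical stand-ins for Cell and Put, and the 
addImmutable(byte[], List) signature is only the proposal from this comment, 
not an existing API. It assumes Java 9+ for Arrays.compareUnsigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for HBase's Cell; carries only what the sketch needs.
final class SketchCell {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;
    final byte[] value;

    SketchCell(byte[] row, byte[] family, byte[] qualifier, long timestamp, byte[] value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Hypothetical stand-in for Put, keeping a family -> cells map like Mutation does.
final class SketchPut {
    private final byte[] row;
    // byte[] has no value-based equals/hashCode, so use a sorted map with an
    // unsigned lexicographic comparator.
    private final Map<byte[], List<SketchCell>> familyMap =
        new TreeMap<byte[], List<SketchCell>>((a, b) -> Arrays.compareUnsigned(a, b));

    SketchPut(byte[] row) {
        this.row = row;
    }

    // Proposed API (assumption): store the caller's list by reference, with no
    // copy and no internally allocated ArrayList. The caller must not modify
    // the list afterwards. A real implementation would probably also check
    // that each cell's row and family match.
    SketchPut addImmutable(byte[] family, List<SketchCell> cells) {
        familyMap.put(family, cells);
        return this;
    }

    List<SketchCell> cellsFor(byte[] family) {
        return familyMap.get(family);
    }
}

final class AddImmutableSketch {
    public static void main(String[] args) {
        byte[] row = "row1".getBytes();
        byte[] cf = "cf".getBytes();

        // The application knows it writes exactly one cell, so it sizes the list itself.
        List<SketchCell> cells = new ArrayList<>(1);
        cells.add(new SketchCell(row, cf, "q".getBytes(), 1L, "v".getBytes()));

        SketchPut put = new SketchPut(row).addImmutable(cf, cells);
        System.out.println("same list instance: " + (put.cellsFor(cf) == cells));
    }
}
```

The point of the design is that the capacity decision moves to the caller, who 
already knows how many cells each family holds, at the cost of a few more 
lines in the application.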



> Default ArrayList size may not be optimal for Mutation
> ------------------------------------------------------
>
>                 Key: HBASE-15493
>                 URL: https://issues.apache.org/jira/browse/HBASE-15493
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, regionserver
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15493-v1.patch, HBASE-15493-v2.patch
>
>
> {code}
>   List<Cell> getCellList(byte[] family) {
>     List<Cell> list = this.familyMap.get(family);
>     if (list == null) {
>       list = new ArrayList<Cell>();
>     }
>     return list;
>   }
> {code}
> This creates a list with capacity 10, which is up to 80 bytes per column 
> family in a mutation object. 
> Suggested:
> {code}
>   List<Cell> getCellList(byte[] family) {
>     List<Cell> list = this.familyMap.get(family);
>     if (list == null) {
>       list = new ArrayList<Cell>(CELL_LIST_INITIAL_CAPACITY);
>     }
>     return list;
>   }
> {code}
> CELL_LIST_INITIAL_CAPACITY = 2 in the patch; this is debatable. For mutations 
> where every CF has 1 cell, this gives a decent reduction (~2%) in memory 
> allocation rate on both client and server during a write workload. Not a big 
> number, but as I said already, memory optimization will include many small 
> steps.
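
For reference, the "up to 80 bytes" figure in the description can be 
reproduced with simple arithmetic. The sketch below is not HBase code; 
CellListFootprint is a hypothetical helper that counts only the reference 
slots of the backing array and ignores object headers and alignment. The 
8-byte and 4-byte reference sizes are assumptions for a 64-bit JVM without 
and with compressed references.

```java
// Hypothetical helper, not part of HBase: estimates the bytes taken by the
// reference slots of an ArrayList backing array of a given capacity.
final class CellListFootprint {
    static long referenceSlotBytes(int capacity, int bytesPerReference) {
        return (long) capacity * bytesPerReference;
    }

    public static void main(String[] args) {
        int defaultCapacity = 10; // default capacity of java.util.ArrayList
        int patchedCapacity = 2;  // CELL_LIST_INITIAL_CAPACITY in the patch

        // Assumed reference sizes: 8 bytes (uncompressed), 4 bytes (compressed).
        for (int refSize : new int[] {8, 4}) {
            long before = referenceSlotBytes(defaultCapacity, refSize);
            long after = referenceSlotBytes(patchedCapacity, refSize);
            System.out.println(refSize + "-byte refs: " + before + " -> " + after
                + " bytes per column family");
        }
    }
}
```

With 8-byte references this is 80 bytes at capacity 10 versus 16 bytes at 
capacity 2; with 4-byte references it is 40 versus 8.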



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
