[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-11 Thread Xiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123559#comment-16123559
 ] 

Xiang Li commented on HBASE-18555:
--

I feel that Append and Delete could be updated to use 
Mutation#getCellList(family) as Increment and Put do in their addxxx() 
function. Those are proposed in another JIRA - HBASE-18573

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-11 Thread Xiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123556#comment-16123556
 ] 

Xiang Li commented on HBASE-18555:
--

Upload patch 001 to address [~chia7712]'s comment (Thanks very much) to update 
addxxx() functions of the follow classes
* sub-classes of Mutation
** Append
** Delete
** Increment
** Put
* sub-classes of Query
** Get
** Scan

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-11 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123394#comment-16123394
 ] 

Mike Drob commented on HBASE-18555:
---

Ah, ok, I see now. With putIfAbsent the code would have an extra allocation and 
gc when the family already exists. That's good enough for me.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-11 Thread Xiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123212#comment-16123212
 ] 

Xiang Li commented on HBASE-18555:
--

Hi Mike,

Regarding
bq. Your modification to getCellList makes it look almost identical to the 
get/null-check/put pattern though. What am I missing?
yes, the patch updates the logic of Mutation#getCellList(family) to be 
get/null-check/put pattern.

Please allow me to summarize the code path in different scenarios 
* Without the patch
** When cell list for a family exists: get list from map -> null check -> 
update list -> {color:#d04437}put list to map (could be removed){color}
** When cell list for a family does not exist: get list from map -> null check 
-> allocate list -> {color:#59afe1}update list -> put list to map{color}
* With the patch
** When cell list for a family exists: {color:#14892c}get list from map -> null 
check -> update list {color}
** When cell list for a family does not exist: get list from map -> null check 
-> allocate list -> {color:#59afe1}put list to map -> update list{color}

The patch targets to remove the redundant Map#put() when cell list for a family 
already exists (remove the red to make it as green)
It also changes the logic when the cell list for a family does not exit, by 
moving Map#put() to be in front of List#add(), in blue, but does not do any 
harm.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122434#comment-16122434
 ] 

Jerry He commented on HBASE-18555:
--

The patch saves the unnecessary last 'put' when the family already exists 
compared to current code.
Compared to putIfAbsent(), it saves a get/null-check since it has been done.  I 
also wonder how to use putIfAbsent() here.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121898#comment-16121898
 ] 

Mike Drob commented on HBASE-18555:
---

Your modification to getCellList makes it look almost identical to the 
get/null-check/put pattern though. What am I missing?

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Xiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121889#comment-16121889
 ] 

Xiang Li commented on HBASE-18555:
--

Hi Mike Drob, 
bq. This seems like the perfect use case for putIfAbsent
Yes, it is, but could be better. Map#putIfAbsent() is like
{code}
 V v = map.get(key);
 if (v == null)
 v = map.put(key, value);

 return v;
{code}
When the key is associated with a value, it performs get(). When the key is not 
associated with a value, it performs get() and then put(). familyMap is 
TreeMap, which is Red-Black tree. The time complexity of get() and put() is 
quite low, but it is still O(log(n)).
So there could be a better way than putIfAbsent(), like the patch does: only 
put (key=family, value=list) right after the list is allocated.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Chia-Ping Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121884#comment-16121884
 ] 

Chia-Ping Tsai commented on HBASE-18555:


There is same issue in Delete, Append, Increment, and Get.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Xiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121858#comment-16121858
 ] 

Xiang Li commented on HBASE-18555:
--

The code has been there since HBASE-1304. 

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121851#comment-16121851
 ] 

Mike Drob commented on HBASE-18555:
---

This seems like the perfect use case for {{putIfAbsent}}

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()

2017-08-10 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121848#comment-16121848
 ] 

Jerry He commented on HBASE-18555:
--

Nice little improvement.  LGTM.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> 
>
> Key: HBASE-18555
> URL: https://issues.apache.org/jira/browse/HBASE-18555
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> list.add(kv);
> familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
> return this;
> {code}
> In addImmutable(), it is like
> {code}
> List list = getCellList(family);
> KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
> list.add(kv);
> familyMap.put(family, list); // <-- here
> return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)