[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123559#comment-16123559 ] Xiang Li commented on HBASE-18555: -- I feel that Append and Delete could be updated to use Mutation#getCellList(family) as Increment and Put do in their addxxx() function. Those are proposed in another JIRA - HBASE-18573 > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123556#comment-16123556 ] Xiang Li commented on HBASE-18555: -- Upload patch 001 to address [~chia7712]'s comment (Thanks very much) to update addxxx() functions of the follow classes * sub-classes of Mutation ** Append ** Delete ** Increment ** Put * sub-classes of Query ** Get ** Scan > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123394#comment-16123394 ] Mike Drob commented on HBASE-18555: --- Ah, ok, I see now. With putIfAbsent the code would have an extra allocation and gc when the family already exists. That's good enough for me. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123212#comment-16123212 ] Xiang Li commented on HBASE-18555: -- Hi Mike, Regarding bq. Your modification to getCellList makes it look almost identical to the get/null-check/put pattern though. What am I missing? yes, the patch updates the logic of Mutation#getCellList(family) to be get/null-check/put pattern. Please allow me to summarize the code path in different scenarios * Without the patch ** When cell list for a family exists: get list from map -> null check -> update list -> {color:#d04437}put list to map (could be removed){color} ** When cell list for a family does not exist: get list from map -> null check -> allocate list -> {color:#59afe1}update list -> put list to map{color} * With the patch ** When cell list for a family exists: {color:#14892c}get list from map -> null check -> update list {color} ** When cell list for a family does not exist: get list from map -> null check -> allocate list -> {color:#59afe1}put list to map -> update list{color} The patch targets to remove the redundant Map#put() when cell list for a family already exists (remove the red to make it as green) It also changes the logic when the cell list for a family does not exit, by moving Map#put() to be in front of List#add(), in blue, but does not do any harm. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122434#comment-16122434 ] Jerry He commented on HBASE-18555: -- The patch saves the unnecessary last 'put' when the family already exists compared to current code. Compared to putIfAbsent(), it saves a get/null-check since it has been done. I also wonder how to use putIfAbsent() here. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121898#comment-16121898 ] Mike Drob commented on HBASE-18555: --- Your modification to getCellList makes it look almost identical to the get/null-check/put pattern though. What am I missing? > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121889#comment-16121889 ] Xiang Li commented on HBASE-18555: -- Hi Mike Drob, bq. This seems like the perfect use case for putIfAbsent Yes, it is, but could be better. Map#putIfAbsent() is like {code} V v = map.get(key); if (v == null) v = map.put(key, value); return v; {code} When the key is associated with a value, it performs get(). When the key is not associated with a value, it performs get() and then put(). familyMap is TreeMap, which is Red-Black tree. The time complexity of get() and put() is quite low, but it is still O(log(n)). So there could be a better way than putIfAbsent(), like the patch does: only put (key=family, value=list) right after the list is allocated. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121884#comment-16121884 ] Chia-Ping Tsai commented on HBASE-18555: There is same issue in Delete, Append, Increment, and Get. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121858#comment-16121858 ] Xiang Li commented on HBASE-18555: -- The code has been there since HBASE-1304. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121851#comment-16121851 ] Mike Drob commented on HBASE-18555: --- This seems like the perfect use case for {{putIfAbsent}} > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
[ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121848#comment-16121848 ] Jerry He commented on HBASE-18555: -- Nice little improvement. LGTM. > Remove redundant familyMap.put() from Put#addColumn() and addImmutable() > > > Key: HBASE-18555 > URL: https://issues.apache.org/jira/browse/HBASE-18555 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Attachments: HBASE-18555.master.000.patch > > > In Put#addColumn() and addImmutable(), after getting the cell list of the > given family and add the cell into the list, the code puts (key=family, > value=list) into familyMap. > In addColumn(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value); > list.add(kv); > familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here > return this; > {code} > In addImmutable(), it is like > {code} > List list = getCellList(family); > KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag); > list.add(kv); > familyMap.put(family, list); // <-- here > return this; > {code} > I think those put() for Map only take effect when getCellList(family) returns > a new allocated ArrayList. When the list for a family already exist, put() > for Map will update the value to the reference of the list, but actually, the > reference of the list is not changed. > Those put() do not do any harm in terms of the correctness when they are > here. But it could be removed to improve the performance. familyMap searches > for key and set the new value and return the old value. Those operation take > some time but actually are not needed. > The put() could be moved into Mutation#getCellList(family) -- This message was sent by Atlassian JIRA (v6.4.14#64029)