[ https://issues.apache.org/jira/browse/ACCUMULO-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153653#comment-15153653 ]
ASF GitHub Bot commented on ACCUMULO-4145: ------------------------------------------ GitHub user ctubbsii opened a pull request: https://github.com/apache/accumulo/pull/72 ACCUMULO-4145 Eliminate Text wrapping of tableIDs * Best attempt to eliminate text wrapping of tableIDs on master branch. * Some ByteBuffer wrapping still occurs due to thrift using bytes to transfer tableId instead of string. * Some Text wrapping is needed for: - preserving existing KeyExtent serialization - putting tableIDs in replication-related table entries - avoiding public API changes to deprecated KeyExtent. All unit tests pass, and -Psunny ITs, as well as checkstyle, findbugs, and modernizer This replaces #70 You can merge this pull request into a Git repository by running: $ git pull https://github.com/ctubbsii/accumulo ACCUMULO-4145 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/accumulo/pull/72.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #72 ---- commit 413244765adb7a60cef8de701de6983d4c99c8e2 Author: Christopher Tubbs <ctubb...@apache.org> Date: 2016-02-18T23:56:54Z ACCUMULO-4145 Eliminate Text wrapping of tableIDs ---- > Eliminate constant unnecessary Text wrapping of immutable tableIDs > ------------------------------------------------------------------ > > Key: ACCUMULO-4145 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4145 > Project: Accumulo > Issue Type: Improvement > Reporter: Christopher Tubbs > Assignee: Christopher Tubbs > Labels: tedious > Fix For: 1.8.0 > > > I started looking at ACCUMULO-4138 to see how KeyExtent is used to do > overlaps, and I noticed a *lot* of {{new KeyExtent(new Text(String), ...)}} > calls, where that first parameter is the Text version of tableId. > It appears, we even do this practice so much, that KeyExtent has a built-in > WeakReference caching of these tableIds, so the Java GC we can dedupe them > and avoid creating too many. > The thing is... a lot of this attempt to optimize appears to be the result of > unnecessarily wrapping all these immutable String tableIDs with a Text > object, yet it doesn't really seem to buy us anything. In most of our API and > a lot of internal utilities, these are already Strings with references > elsewhere in the code... so even if we dedupe the Text objects, we're not > doing anything about these Strings. Worse, we're actually doing some > protective copying here and there as we pass around these Text objects, using > TextUtil, etc. This is completely unnecessary, because they were immutable > before we copied them and wrapped them. > In some cases, we actually call toString on the text objects to pass them > around, and they get flip-flopped a few times between String and Text, > depending on what the internal API accepts. > At best, using the Text object helps us serialize KeyExtent... but that's > questionably beneficial since DataOutput can already serialize with: > {{out.writeUTF(String)}}, so that's not actually that helpful. > The only other benefit I could possibly see is with compareTo for > lexicographical comparison, but I have a hard time believing Text can compare > faster than {{java.lang.String}}, and there shouldn't be any difference in > the results of the comparison between the UTF-8 encoding of Text and the > modified UTF encoding which is native to Java Strings, certainly not for the > base-36 characters and fixed constants (for special tables) we use in > tableIDs. > We should just completely strip out all the Text versions of tableId, > wherever possible. I've already done this as an exercise in the 1.6 branch, > but it might be potentially risky as these are put in Java maps which don't > have strict type checking ({{map.get(Object)}}, for instance) and are > sometimes compared with {{.equals(Object)}}. For what it's worth, the only > public API this really touches is {{o.a.a.core.data.KeyExtent}}, which was > only inadvertently in the public API and has since been moved (IIRC). We can > preserve the old methods for compatibility, if necessary (deprecated, of > course). > Also, getting rid of these Text objects, and just sticking with the immutable > String objects, we'll be able to take advantage of future JVM improvements > for String deduplication, like > http://java-performance.info/java-string-deduplication/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)