[ 
https://issues.apache.org/jira/browse/ACCUMULO-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153653#comment-15153653
 ] 

ASF GitHub Bot commented on ACCUMULO-4145:
------------------------------------------

GitHub user ctubbsii opened a pull request:

    https://github.com/apache/accumulo/pull/72

    ACCUMULO-4145 Eliminate Text wrapping of tableIDs

    * Best attempt to eliminate text wrapping of tableIDs on master branch.
    * Some ByteBuffer wrapping still occurs due to thrift using bytes to 
transfer tableId instead of string.
    * Some Text wrapping is needed for:
     - preserving existing KeyExtent serialization
     - putting tableIDs in replication-related table entries
     - avoiding public API changes to deprecated KeyExtent.
    
    All unit tests pass, and -Psunny ITs, as well as checkstyle, findbugs, and 
modernizer
    This replaces #70

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ctubbsii/accumulo ACCUMULO-4145

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #72
    
----
commit 413244765adb7a60cef8de701de6983d4c99c8e2
Author: Christopher Tubbs <ctubb...@apache.org>
Date:   2016-02-18T23:56:54Z

    ACCUMULO-4145 Eliminate Text wrapping of tableIDs

----


> Eliminate constant unnecessary Text wrapping of immutable tableIDs
> ------------------------------------------------------------------
>
>                 Key: ACCUMULO-4145
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4145
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Christopher Tubbs
>            Assignee: Christopher Tubbs
>              Labels: tedious
>             Fix For: 1.8.0
>
>
> I started looking at ACCUMULO-4138 to see how KeyExtent is used to do 
> overlaps, and I noticed a *lot* of {{new KeyExtent(new Text(String), ...)}} 
> calls, where that first parameter is the Text version of tableId.
> It appears, we even do this practice so much, that KeyExtent has a built-in 
> WeakReference caching of these tableIds, so the Java GC we can dedupe them 
> and avoid creating too many.
> The thing is... a lot of this attempt to optimize appears to be the result of 
> unnecessarily wrapping all these immutable String tableIDs with a Text 
> object, yet it doesn't really seem to buy us anything. In most of our API and 
> a lot of internal utilities, these are already Strings with references 
> elsewhere in the code... so even if we dedupe the Text objects, we're not 
> doing anything about these Strings. Worse, we're actually doing some 
> protective copying here and there as we pass around these Text objects, using 
> TextUtil, etc. This is completely unnecessary, because they were immutable 
> before we copied them and wrapped them.
> In some cases, we actually call toString on the text objects to pass them 
> around, and they get flip-flopped a few times between String and Text, 
> depending on what the internal API accepts.
> At best, using the Text object helps us serialize KeyExtent... but that's 
> questionably beneficial since DataOutput can already serialize with: 
> {{out.writeUTF(String)}}, so that's not actually that helpful.
> The only other benefit I could possibly see is with compareTo for 
> lexicographical comparison, but I have a hard time believing Text can compare 
> faster than {{java.lang.String}}, and there shouldn't be any difference in 
> the results of the comparison between the UTF-8 encoding of Text and the 
> modified UTF encoding which is native to Java Strings, certainly not for the 
> base-36 characters and fixed constants (for special tables) we use in 
> tableIDs.
> We should just completely strip out all the Text versions of tableId, 
> wherever possible. I've already done this as an exercise in the 1.6 branch, 
> but it might be potentially risky as these are put in Java maps which don't 
> have strict type checking ({{map.get(Object)}}, for instance) and are 
> sometimes compared with {{.equals(Object)}}. For what it's worth, the only 
> public API this really touches is {{o.a.a.core.data.KeyExtent}}, which was 
> only inadvertently in the public API and has since been moved (IIRC). We can 
> preserve the old methods for compatibility, if necessary (deprecated, of 
> course).
> Also, getting rid of these Text objects, and just sticking with the immutable 
> String objects, we'll be able to take advantage of future JVM improvements 
> for String deduplication, like 
> http://java-performance.info/java-string-deduplication/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to