[ 
https://issues.apache.org/jira/browse/ACCUMULO-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150758#comment-15150758
 ] 

Keith Turner commented on ACCUMULO-4145:
----------------------------------------

KeyExtent is referenced by methods that are in the public API.   During the 
1.7.0 API cleanup I did the following for KeyExtent.

 * Deprecated data.KeyExtent and all public API methods that use it.
 * Created TabletId class in public API and replaced deprecated methods that 
used KeyExtent with new ones that used TabletId
 * Created a new data.impl.KeyExtent class that all internal code uses.  Only 
deprecated APIs use data.KeyExtent now
 * Made data.KeyExtent wrap data.impl.KeyExtent

I made data.KeyExtent wrap data.impl.KeyExtent instead of extend so that when 
new methods are added to data.impl.KeyExtent, data.KeyExtent does not 
automatically inherit those methods.  This freezes the deprecated KeyExtent API.

Given that in 1.7.0 I tried to be really conservative with API changes and 
minimize the possibility of breaking user code, I would not be in favor of 
these changes for 1.6.  The GeoWave input format uses tablet locator and key 
extent (until 1.8.0 is realesed with tablet location in the API).


> Eliminate constant unnecessary Text wrapping of immutable tableIDs
> ------------------------------------------------------------------
>
>                 Key: ACCUMULO-4145
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4145
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Christopher Tubbs
>            Assignee: Christopher Tubbs
>              Labels: tedious
>             Fix For: 1.8.0
>
>
> I started looking at ACCUMULO-4138 to see how KeyExtent is used to do 
> overlaps, and I noticed a *lot* of {{new KeyExtent(new Text(String), ...)}} 
> calls, where that first parameter is the Text version of tableId.
> It appears, we even do this practice so much, that KeyExtent has a built-in 
> WeakReference caching of these tableIds, so the Java GC we can dedupe them 
> and avoid creating too many.
> The thing is... a lot of this attempt to optimize appears to be the result of 
> unnecessarily wrapping all these immutable String tableIDs with a Text 
> object, yet it doesn't really seem to buy us anything. In most of our API and 
> a lot of internal utilities, these are already Strings with references 
> elsewhere in the code... so even if we dedupe the Text objects, we're not 
> doing anything about these Strings. Worse, we're actually doing some 
> protective copying here and there as we pass around these Text objects, using 
> TextUtil, etc. This is completely unnecessary, because they were immutable 
> before we copied them and wrapped them.
> In some cases, we actually call toString on the text objects to pass them 
> around, and they get flip-flopped a few times between String and Text, 
> depending on what the internal API accepts.
> At best, using the Text object helps us serialize KeyExtent... but that's 
> questionably beneficial since DataOutput can already serialize with: 
> {{out.writeUTF(String)}}, so that's not actually that helpful.
> The only other benefit I could possibly see is with compareTo for 
> lexicographical comparison, but I have a hard time believing Text can compare 
> faster than {{java.lang.String}}, and there shouldn't be any difference in 
> the results of the comparison between the UTF-8 encoding of Text and the 
> modified UTF encoding which is native to Java Strings, certainly not for the 
> base-36 characters and fixed constants (for special tables) we use in 
> tableIDs.
> We should just completely strip out all the Text versions of tableId, 
> wherever possible. I've already done this as an exercise in the 1.6 branch, 
> but it might be potentially risky as these are put in Java maps which don't 
> have strict type checking ({{map.get(Object)}}, for instance) and are 
> sometimes compared with {{.equals(Object)}}. For what it's worth, the only 
> public API this really touches is {{o.a.a.core.data.KeyExtent}}, which was 
> only inadvertently in the public API and has since been moved (IIRC). We can 
> preserve the old methods for compatibility, if necessary (deprecated, of 
> course).
> Also, getting rid of these Text objects, and just sticking with the immutable 
> String objects, we'll be able to take advantage of future JVM improvements 
> for String deduplication, like 
> http://java-performance.info/java-string-deduplication/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to