[ https://issues.apache.org/jira/browse/MNG-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731628#comment-17731628 ]

Guillaume Nodet commented on MNG-7592:
--------------------------------------

That sounds like a good improvement.
Since interned strings can be garbage-collected, a quick experiment would be to wrap the
{{XmlPullParser}} created in the {{ModelReader#read}} method so that it interns the
strings returned by {{XmlPullParser#getName()}} and {{XmlPullParser#getText()}}. I
would think all element names have to be interned, as well as the text of most elements
(groupId, artifactId, version, scope, etc.).
This could give a good estimate of whether it's worth investigating further.
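A rough sketch of such a wrapper, assuming a hypothetical two-method {{NameTextParser}} interface as a stand-in for the real {{XmlPullParser}}. A real wrapper would also have to delegate every other parser method unchanged. {{InterningParser}} and {{fixed}} are illustrative names, not Maven API. Equal texts read from different POMs come back as one shared instance:

```java
import java.util.Objects;

public class InterningDemo {

    /** Hypothetical stand-in for the two XmlPullParser methods named in the comment. */
    interface NameTextParser {
        String getName();

        String getText();
    }

    /** Decorator that interns names and text before the model reader sees them. */
    static final class InterningParser implements NameTextParser {
        private final NameTextParser delegate;

        InterningParser(NameTextParser delegate) {
            this.delegate = Objects.requireNonNull(delegate);
        }

        @Override
        public String getName() {
            return intern(delegate.getName());
        }

        @Override
        public String getText() {
            return intern(delegate.getText());
        }

        // The delegate may return null (e.g. for events without a name), so pass null through.
        private static String intern(String s) {
            return s == null ? null : s.intern();
        }
    }

    /** Hypothetical fixed-value parser used only for this demo. */
    static NameTextParser fixed(String name, String text) {
        return new NameTextParser() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public String getText() {
                return text;
            }
        };
    }

    public static void main(String[] args) {
        // new String(...) forces distinct instances, as parsing two POM files would.
        String a = new InterningParser(fixed("groupId", new String("org.apache.maven.plugins"))).getText();
        String b = new InterningParser(fixed("groupId", new String("org.apache.maven.plugins"))).getText();
        System.out.println(a == b); // true: both refer to the same pooled instance
    }
}
```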

> String deduplication in model building
> --------------------------------------
>
>                 Key: MNG-7592
>                 URL: https://issues.apache.org/jira/browse/MNG-7592
>             Project: Maven
>          Issue Type: Improvement
>            Reporter: Christoph Läubrich
>            Priority: Major
>
> I am currently investigating how to reduce memory consumption in m2eclipse (the Maven IDE
> extension) and noticed that one problem is that the Maven model does not seem to
> deduplicate strings. For large projects (I used Apache Camel as an
> example), there are a lot of duplicate strings hanging around: e.g. I see
> 12,000 instances of "org.apache.maven.plugins" and around 10,000 of
> "org.apache.camel" (note that probably not all of them are related to Maven!).
> If I look at the graph of incoming references, I see, for example, that these
> come from the Model/Artifact groupId.
> I know that string deduplication in general is hard and even controversial,
> but maybe one could consider it at least for the "hotspots", e.g.
> groupId, artifactId, and version, or even managementKeys, which seem like good
> candidates since they are used all over the place.
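The issue suggests deduplicating only a few hotspot fields. As an alternative to {{String#intern()}}, here is a minimal sketch of an application-level pool, assuming a hypothetical {{StringPool}} class backed by a {{ConcurrentHashMap}}. Nothing like it is implied to exist in Maven, and the 12,000 iterations merely mirror the instance count reported above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StringPoolDemo {

    /** Hypothetical pool returning one canonical instance per distinct value. */
    static final class StringPool {
        private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

        String dedup(String s) {
            if (s == null) {
                return null; // ConcurrentHashMap does not accept null keys
            }
            // putIfAbsent returns the existing value, or null if s was just stored.
            String existing = pool.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }

        int size() {
            return pool.size();
        }
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        String[] groupIds = new String[12_000];
        for (int i = 0; i < groupIds.length; i++) {
            // Each iteration mimics reading the same groupId from another POM.
            groupIds[i] = pool.dedup(new String("org.apache.maven.plugins"));
        }
        System.out.println(pool.size());                     // 1
        System.out.println(groupIds[0] == groupIds[11_999]); // true
    }
}
```

Unlike the JVM string table, whose unreferenced entries can be collected on modern JVMs, this pool holds strong references and never shrinks. A long-lived process such as an IDE would need a weak-reference pool or one scoped to a single build.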



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
