[ https://issues.apache.org/jira/browse/MNG-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731628#comment-17731628 ]
Guillaume Nodet commented on MNG-7592:
--------------------------------------

That sounds like a good improvement. Since interned strings are garbage-collected, a simple first step would be to wrap the {{XmlPullParser}} created in the {{ModelReader#read}} method so that it interns the strings returned by {{XmlPullParser#getName()}} and {{XmlPullParser#getText()}}. I would think all element names should be interned, as well as most elements' text (groupId, artifactId, version, scope, etc.). This would give a good estimate of whether it is worth investigating further.

> String deduplication in model building
> --------------------------------------
>
>                 Key: MNG-7592
>                 URL: https://issues.apache.org/jira/browse/MNG-7592
>             Project: Maven
>          Issue Type: Improvement
>            Reporter: Christoph Läubrich
>            Priority: Major
>
> I am currently investigating how to reduce memory consumption in m2eclipse
> (the Maven IDE extension) and noticed that the Maven model does not seem to
> deduplicate strings. For large projects (I used Apache Camel as an example),
> there are a lot of duplicate strings hanging around: for example, I see
> 12,000 instances of "org.apache.maven.plugins" and around 10,000 of
> "org.apache.camel" (note that probably not all of them are related to Maven!).
> Looking at the graph of incoming references, I see for example that many of
> these come from Model/Artifact groupId.
> I know that string deduplication in general is hard and even controversial,
> but maybe it could be considered at least for the "hotspots": e.g. groupId,
> artifactId and version, or even managementKeys, seem good candidates because
> they are used all over the place.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
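As a rough illustration of the wrapping idea from the comment, here is a minimal Java decorator sketch. It does not use the real plexus {{XmlPullParser}}: the {{NameTextSource}} interface, the {{InterningSource}} class and the {{InternDemo}} helper are hypothetical stand-ins that expose only the two accessors mentioned ({{getName()}} and {{getText()}}). A real wrapper would also have to delegate every other parser method unchanged.

```java
/**
 * Hypothetical stand-in for the two XmlPullParser accessors discussed in the
 * comment; the real parser interface has many more methods.
 */
interface NameTextSource {
    String getName();
    String getText();
}

/** Decorator that interns every name and text value returned by the delegate. */
final class InterningSource implements NameTextSource {
    private final NameTextSource delegate;

    InterningSource(NameTextSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getName() {
        return intern(delegate.getName());
    }

    @Override
    public String getText() {
        return intern(delegate.getText());
    }

    // String.intern() returns the canonical instance from the JVM string pool;
    // null (e.g. no current element) is passed through unchanged.
    private static String intern(String s) {
        return s == null ? null : s.intern();
    }
}

public class InternDemo {
    // Builds a source whose strings are fresh heap copies (not pool instances),
    // the way a parser reading two different POM files would produce them.
    static NameTextSource fixed(String name, String text) {
        final String n = new String(name);
        final String t = new String(text);
        return new NameTextSource() {
            public String getName() { return n; }
            public String getText() { return t; }
        };
    }

    public static void main(String[] args) {
        NameTextSource a = new InterningSource(fixed("groupId", "org.apache.camel"));
        NameTextSource b = new InterningSource(fixed("groupId", "org.apache.camel"));

        // Identity comparison on purpose: both calls return the same pooled object.
        System.out.println(a.getText() == b.getText()); // prints "true"
    }
}
```

Because {{String.intern()}} returns the JVM's canonical instance, two POMs declaring the same groupId would share one String object instead of holding two copies. Whether that saves a meaningful amount of memory in practice is exactly what such an experiment would measure.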