Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
On 20/10/11 19:57, Kristen Eisenberg wrote: > On 31/03/10 14:56, Scott A Crosby wrote: > > I noticed that mkgmap does not intern any strings. In particular, this > > This is true. There is a pending patch that deals with excessive > memory use in a slightly different way which is on the style-speed > branch, with a pre-built one available at the bottom of the mkgmap > download page. That branch was merged to trunk quite some time ago. There was also a change to intern the string tag values. So now there are no duplicates in either the tag keys or values... Unless there is a new bug? ..Steve ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
Am 02.04.2010 06:16, schrieb Scott A Crosby: > On Thu, 01 Apr 2010 12:10:22 +0100, Steve Ratcliffe > writes: > >> On 31/03/10 14:56, Scott A Crosby wrote: >>> I noticed that mkgmap does not intern any strings. In particular, this >> >> This is true. There is a pending patch that deals with excessive >> memory use in a slightly different way which is on the style-speed >> branch, with a pre-built one available at the bottom of the mkgmap >> download page. > > I pulled it out of s...@1569. > >> I drops all tags that are not used by the current style and as an >> extra feature it ensures that there is only copy of all the strings >> on the key side. >> >> Could you try it out please. It may be worth also interning the values >> but when I was looking at it there was much less benefit for the maps >> I was looking at (but it might vary with the input). >> I'd be happy to apply a patch that interned the values too if there >> was a decent memory saving without a significant performance drop. > > The style-speed branch builds the tile in question within 1gb of ram > using the default style. I suspect that that style isn't using the > problematic tags. A different style that did use the tags would likely > still blow up. This seems fragile. > > I benchmarked the problematic tile on the style-speed branch with and > without interning all keys and values in the Tag() constructor using > ordinary String.intern(). The 18 character change implementing > interning seems to increase performance by about a second, from 66 to > 65 seconds. > > I'd say to go for it. > > Scott Scott, thanks for posting hard performance values! That's good news because it makes possible to use a simple intern() solution. Before I read your post I compared some german tiles. The single threaded SVN r1625 took 328s with the default style. Using the intern() patch it took 335s. That seems to be an unsignificant change. I am not able to post multithreaded comparisons because my geriatric CPU has one core only... I also used the VisualGC plugin of JVisualVM to view the GC. The permanent space (where all the interned Strings are stored) did not exceed 11mb. So interning all Strings will probably not exceed the permanent space. However we might add a note to the README file that in some special cases it is necessary to increase the permanent size with -XX:MaxPermSize=128m (or more) to avoid OutOfMemoryExceptions. WanMil ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
On Thu, 01 Apr 2010 12:10:22 +0100, Steve Ratcliffe writes: > On 31/03/10 14:56, Scott A Crosby wrote: >> I noticed that mkgmap does not intern any strings. In particular, this > > This is true. There is a pending patch that deals with excessive > memory use in a slightly different way which is on the style-speed > branch, with a pre-built one available at the bottom of the mkgmap > download page. I pulled it out of s...@1569. > I drops all tags that are not used by the current style and as an > extra feature it ensures that there is only copy of all the strings > on the key side. > > Could you try it out please. It may be worth also interning the values > but when I was looking at it there was much less benefit for the maps > I was looking at (but it might vary with the input). > I'd be happy to apply a patch that interned the values too if there > was a decent memory saving without a significant performance drop. The style-speed branch builds the tile in question within 1gb of ram using the default style. I suspect that that style isn't using the problematic tags. A different style that did use the tags would likely still blow up. This seems fragile. I benchmarked the problematic tile on the style-speed branch with and without interning all keys and values in the Tag() constructor using ordinary String.intern(). The 18 character change implementing interning seems to increase performance by about a second, from 66 to 65 seconds. I'd say to go for it. Scott ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
On Wed, 31 Mar 2010 22:12:12 + (UTC), Chris Miller writes: > Note that Java's String.intern() method can be pretty slow, so while you'll > save a fair chunk of memory you'll potentially suffer a noticable performance > hit too if you're calling it a lot. By adding a barrier-free caching layer > in front of the String.intern() calls you can gain a reasonable performance > boost in this situation. As an example of how this can be implemented, take > a look at Lucene's SimpleStringInterner which does exactly this: > > http://github.com/apache/lucene/blob/1c5c409241a2b8b9e64dc8c253791b497a66c369/src/java/org/apache/lucene/util/SimpleStringInterner.java > > It's threadsafe in that it guarantees just enough visibility to never > generate > invalid results, yet also avoids any blocking. Might be worth benchmarking > something like this against the normal String.intern() with mkgmap. We don't have to get rid of every duplicate string; only most of them, so approximation techniques such as a per-thread or per-parser FuzzyIntern will work fine, and require no concurrent access. Now that I know that String. intern() uses weak references, it seems to be the most trivial way to reduce the memory usage of that tile of planet.osm by at least 3x. Scott ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
On Wed, 31 Mar 2010 23:58:45 +0200, WanMil writes: > Am 31.03.2010 22:10, schrieb Scott A Crosby: >> On Wed, 31 Mar 2010 21:13:49 +0200, WanMil writes: >> > my patch interned all keys and additionally the values of a limited > number of keys. Maybe it's not necessary to limit the interning of > values. So I have attached the very simple but hopefully very > effective patch regarding the memory footprint of mkgmap. My opinion is that its simpler and more robust to intern or pseudointern every tag value. The bad tile had a lot of duplicate values, what if those tags were not on your list of the 'only intern value strings for some keys'? > Regarding your patch: I don't understand the function of the > FuzzyIntern class. You build a HashMap from (uninterned) Strings to > the interned String. Then you are looking up new strings in this > HashMap and use the interned variant. Where's the difference to the > (hopefully) very performance optimized intern() method? Note that this code is not actually interning any strings in with String.intern(). Call it psuedointerning. The purpose of FuzzyIntern was when I believed that String.intern() interned forever, which I considered very undesirable. I wanted semantics that would remove (most) duplicate strings in memory without forcing those strings to remain in RAM forever. >> String intern does not intern forever > I didn't know that. Do you have any link where this is specified? A google search 'java intern weak reference' seems to indicate that Java since version 1.2 uses a weak reference table for string intern, which means that they can be removed. That search offers alternative implementation ideas, such as using a real weak reference hashtable. Scott ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
> I think the point that Chris brought up applies mostly in cases where > there is a lot of contended access between threads. If that is the > case then it won't matter much for us as reading the input files is > currently single threaded. It's true, the biggest gains by far in the approach I was pointing out are when you have heavy concurrent access. I just saw Scott mention his code wasn't threadsafe so figured a multithreaded solution was desirable. However String.intern() can still be expensive even in a single-threaded environment. Replacing String.intern() with something simple like if (!hashSet.contains()) hashSet.add(str) should be roughly 2x quicker. Probably negligible in the grand scheme of things :) Chris ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
On 31/03/10 14:56, Scott A Crosby wrote: > I noticed that mkgmap does not intern any strings. In particular, this This is true. There is a pending patch that deals with excessive memory use in a slightly different way which is on the style-speed branch, with a pre-built one available at the bottom of the mkgmap download page. I drops all tags that are not used by the current style and as an extra feature it ensures that there is only copy of all the strings on the key side. Could you try it out please. It may be worth also interning the values but when I was looking at it there was much less benefit for the maps I was looking at (but it might vary with the input). I'd be happy to apply a patch that interned the values too if there was a decent memory saving without a significant performance drop. I think the point that Chris brought up applies mostly in cases where there is a lot of contended access between threads. If that is the case then it won't matter much for us as reading the input files is currently single threaded. I can't recall why reading maps *is* single threaded. I'm not sure there is really a problem and if there is I am sure it can be fixed easily. Does anyone know or can find out? ..Steve ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
Note that Java's String.intern() method can be pretty slow, so while you'll save a fair chunk of memory you'll potentially suffer a noticable performance hit too if you're calling it a lot. By adding a barrier-free caching layer in front of the String.intern() calls you can gain a reasonable performance boost in this situation. As an example of how this can be implemented, take a look at Lucene's SimpleStringInterner which does exactly this: http://github.com/apache/lucene/blob/1c5c409241a2b8b9e64dc8c253791b497a66c369/src/java/org/apache/lucene/util/SimpleStringInterner.java It's threadsafe in that it guarantees just enough visibility to never generate invalid results, yet also avoids any blocking. Might be worth benchmarking something like this against the normal String.intern() with mkgmap. Chris > On Wed, 31 Mar 2010 21:13:49 +0200, WanMil writes: > >>> I noticed that mkgmap does not intern any strings. In particular, >>> this tile, generated by the splitter, fails to build with -Xmx3000m >>> on 64-bit jdk under linux. With my patch, mkgmap generates the tile >>> with -Xmx1000m. >>> >>> >> maxlon='11.513671875'/> >>> >>> This tile has 1m nodes. Among the nodes and ways on this tile, there >>> are 12m tags, yet only 100k distinct tag key/value pairs; on average >>> each value occurs 120 times. >>> >>> I explicitly do not use normal string interning because >>> String.intern() strings are kept forever, and I want these strings >>> to be GC'able after the tile is done. I trade GCability for having >>> the occasional string duplicated in memory by flushing the interning >>> table every 10k unique strings. >>> >>> This code is not presently multithread safe; Ideally there should be >>> one string interning table for each parser/thread. >>> >>> Scott >>> >> Hi Scott! >> >> I think that's a good idea to intern the strings. >> As far as I know the LossyIntern class is not needed. The .intern() >> function of a string does exactly the same. > You are right. String intern does not intern forever at least since > Java 1.2. > >> Some time ago I sent a very similar patch to the mailing list which >> is not yet committed. Could you please test with your use case if it >> performs a similar memory reduction? >> > You can run it if you want, but from the numbers I gave above for this > tile, interning values as in my patch will decrease the number of > strings in RAM from 12M to <100k values. Interning only keys would > reduce the number of Strings in RAM from 24M to 12M. > >> The patch is thread safe and does not intern all strings. In my >> opinion the value of a name tag should not be interned because there >> is a high probability that this tag is used once only. >> > Thats probably true for many or most tiles, but not for the tile I > referenced above, where on average each value occurs 120 times. That > tile is unbuildable with a 3gb heap without my patch and buildable > with 1gb heap with my patch. > > Shall I post an updated patch without FuzzyIntern? > > Scott > ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
Am 31.03.2010 22:10, schrieb Scott A Crosby: On Wed, 31 Mar 2010 21:13:49 +0200, WanMil writes: I noticed that mkgmap does not intern any strings. In particular, this tile, generated by the splitter, fails to build with -Xmx3000m on 64-bit jdk under linux. With my patch, mkgmap generates the tile with -Xmx1000m. This tile has 1m nodes. Among the nodes and ways on this tile, there are 12m tags, yet only 100k distinct tag key/value pairs; on average each value occurs 120 times. I explicitly do not use normal string interning because String.intern() strings are kept forever, and I want these strings to be GC'able after the tile is done. I trade GCability for having the occasional string duplicated in memory by flushing the interning table every 10k unique strings. This code is not presently multithread safe; Ideally there should be one string interning table for each parser/thread. Scott Hi Scott! I think that's a good idea to intern the strings. As far as I know the LossyIntern class is not needed. The .intern() function of a string does exactly the same. You are right. String intern does not intern forever at least since Java 1.2. Some time ago I sent a very similar patch to the mailing list which is not yet committed. Could you please test with your use case if it performs a similar memory reduction? You can run it if you want, but from the numbers I gave above for this tile, interning values as in my patch will decrease the number of strings in RAM from 12M to<100k values. Interning only keys would reduce the number of Strings in RAM from 24M to 12M. The patch is thread safe and does not intern all strings. In my opinion the value of a name tag should not be interned because there is a high probability that this tag is used once only. Thats probably true for many or most tiles, but not for the tile I referenced above, where on average each value occurs 120 times. That tile is unbuildable with a 3gb heap without my patch and buildable with 1gb heap with my patch. Shall I post an updated patch without FuzzyIntern? Scott Scott, my patch interned all keys and additionally the values of a limited number of keys. Maybe it's not necessary to limit the interning of values. So I have attached the very simple but hopefully very effective patch regarding the memory footprint of mkgmap. Regarding your patch: I don't understand the function of the FuzzyIntern class. You build a HashMap from (uninterned) Strings to the interned String. Then you are looking up new strings in this HashMap and use the interned variant. Where's the difference to the (hopefully) very performance optimized intern() method? > String intern does not intern forever I didn't know that. Do you have any link where this is specified? WanMil Index: src/uk/me/parabola/mkgmap/reader/osm/Tags.java === --- src/uk/me/parabola/mkgmap/reader/osm/Tags.java (revision 1624) +++ src/uk/me/parabola/mkgmap/reader/osm/Tags.java (working copy) @@ -65,12 +65,14 @@ Integer ind = keyPos(key); if (ind == null) assert false : "keyPos(" + key + ") returns null - size = " + size + ", capacity = " + capacity; - keys[ind] = key; + // use .intern() to reduce memory footprint + keys[ind] = key.intern(); String old = values[ind]; if (old == null) size++; - values[ind] = value; + + values[ind] = value.intern(); return old; } ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
On Wed, 31 Mar 2010 21:13:49 +0200, WanMil writes: >> I noticed that mkgmap does not intern any strings. In particular, this >> tile, generated by the splitter, fails to build with -Xmx3000m on >> 64-bit jdk under linux. With my patch, mkgmap generates the tile with >> -Xmx1000m. >> >> > maxlon='11.513671875'/> >> >> This tile has 1m nodes. Among the nodes and ways on this tile, there >> are 12m tags, yet only 100k distinct tag key/value pairs; on average >> each value occurs 120 times. >> >> I explicitly do not use normal string interning because >> String.intern() strings are kept forever, and I want these strings to >> be GC'able after the tile is done. I trade GCability for having the >> occasional string duplicated in memory by flushing the interning table >> every 10k unique strings. >> >> This code is not presently multithread safe; Ideally there should be >> one string interning table for each parser/thread. >> >> Scott >> > > Hi Scott! > > I think that's a good idea to intern the strings. > As far as I know the LossyIntern class is not needed. The .intern() > function of a string does exactly the same. You are right. String intern does not intern forever at least since Java 1.2. > Some time ago I sent a very similar patch to the mailing list which > is not yet committed. Could you please test with your use case if it > performs a similar memory reduction? You can run it if you want, but from the numbers I gave above for this tile, interning values as in my patch will decrease the number of strings in RAM from 12M to <100k values. Interning only keys would reduce the number of Strings in RAM from 24M to 12M. > The patch is thread safe and does not intern all strings. In my > opinion the value of a name tag should not be interned because there > is a high probability that this tag is used once only. Thats probably true for many or most tiles, but not for the tile I referenced above, where on average each value occurs 120 times. That tile is unbuildable with a 3gb heap without my patch and buildable with 1gb heap with my patch. Shall I post an updated patch without FuzzyIntern? Scott ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.
I noticed that mkgmap does not intern any strings. In particular, this tile, generated by the splitter, fails to build with -Xmx3000m on 64-bit jdk under linux. With my patch, mkgmap generates the tile with -Xmx1000m. This tile has 1m nodes. Among the nodes and ways on this tile, there are 12m tags, yet only 100k distinct tag key/value pairs; on average each value occurs 120 times. I explicitly do not use normal string interning because String.intern() strings are kept forever, and I want these strings to be GC'able after the tile is done. I trade GCability for having the occasional string duplicated in memory by flushing the interning table every 10k unique strings. This code is not presently multithread safe; Ideally there should be one string interning table for each parser/thread. Scott Hi Scott! I think that's a good idea to intern the strings. As far as I know the LossyIntern class is not needed. The .intern() function of a string does exactly the same. Some time ago I sent a very similar patch to the mailing list which is not yet committed. Could you please test with your use case if it performs a similar memory reduction? The patch is thread safe and does not intern all strings. In my opinion the value of a name tag should not be interned because there is a high probability that this tag is used once only. WanMil Index: src/uk/me/parabola/mkgmap/reader/osm/Tags.java === --- src/uk/me/parabola/mkgmap/reader/osm/Tags.java (revision 1566) +++ src/uk/me/parabola/mkgmap/reader/osm/Tags.java (working copy) @@ -19,6 +19,7 @@ import java.util.AbstractMap; import java.util.Arrays; import java.util.HashMap; +import java.util.HashSet; import java.util.Iterator; import java.util.Map; @@ -45,6 +46,18 @@ private String[] keys; private String[] values; + + /** +* Stores all tags which values should be stored as String intern. The values of +* these tags should have a limited number of different values to get a +* reasonable memory footprint effect. +*/ + private final static HashSet interableValueTags = new HashSet( + Arrays.asList("highway", "building", "addr:housenumber", "access", + "natural", "waterway", "amenity", "oneway", "surface", + "landuse", "lanes", "place", "layer", "tracktype", "maxspeed", + "foot", "bridge", "height", "area", "railway", "admin_level", + "power", "type", "leisure", "barrier")); public Tags() { keys = new String[INIT_SIZE]; @@ -65,11 +78,19 @@ Integer ind = keyPos(key); if (ind == null) assert false : "keyPos(" + key + ") returns null - size = " + size + ", capacity = " + capacity; - keys[ind] = key; + // use .intern() to reduce memory footprint + keys[ind] = key.intern(); String old = values[ind]; if (old == null) size++; + + if (interableValueTags.contains(key)) { + // use .intern() to reduce memory footprint for the most + // common tags with a limited range of values + value = value.intern(); + } + values[ind] = value; return old; ___ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev