Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2011-10-20 Thread Steve Ratcliffe
On 20/10/11 19:57, Kristen Eisenberg wrote:
> On 31/03/10 14:56, Scott A Crosby wrote:
>  > I noticed that mkgmap does not intern any strings. In particular, this
>
> This is true. There is a pending patch that deals with excessive
> memory use in a slightly different way which is on the style-speed
> branch, with a pre-built one available at the bottom of the mkgmap
> download page.

That branch was merged to trunk quite some time ago.  There was also a
change to intern the string tag values.  So now there are no
duplicates in either the tag keys or values... Unless there is a new
bug?


..Steve
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-04-02 Thread WanMil
Am 02.04.2010 06:16, schrieb Scott A Crosby:
> On Thu, 01 Apr 2010 12:10:22 +0100, Steve Ratcliffe  
> writes:
>
>> On 31/03/10 14:56, Scott A Crosby wrote:
>>> I noticed that mkgmap does not intern any strings. In particular, this
>>
>> This is true.  There is a pending patch that deals with excessive
>> memory use in a slightly different way which is on the style-speed
>> branch, with a pre-built one available at the bottom of the mkgmap
>> download page.
>
> I pulled it out of s...@1569.
>
>> I drops all tags that are not used by the current style and as an
>> extra feature it ensures that there is only copy of all the strings
>> on the key side.
>>
>> Could you try it out please. It may be worth also interning the values
>> but when I was looking at it there was much less benefit for the maps
>> I was looking at (but it might vary with the input).
>> I'd be happy to apply a patch that interned the values too if there
>> was a decent memory saving without a significant performance drop.
>
> The style-speed branch builds the tile in question within 1gb of ram
> using the default style. I suspect that that style isn't using the
> problematic tags. A different style that did use the tags would likely
> still blow up. This seems fragile.
>
> I benchmarked the problematic tile on the style-speed branch with and
> without interning all keys and values in the Tag() constructor using
> ordinary String.intern(). The 18 character change implementing
> interning seems to increase performance by about a second, from 66 to
> 65 seconds.
>
> I'd say to go for it.
>
> Scott

Scott,

thanks for posting hard performance values! That's good news because it 
makes possible to use a simple intern() solution.

Before I read your post I compared some german tiles. The single 
threaded SVN r1625 took 328s with the default style. Using the intern() 
patch it took 335s. That seems to be an unsignificant change.
I am not able to post multithreaded comparisons because my geriatric CPU 
has one core only...

I also used the VisualGC plugin of JVisualVM to view the GC. The 
permanent space (where all the interned Strings are stored) did not 
exceed 11mb. So interning all Strings will probably not exceed the 
permanent space. However we might add a note to the README file that in 
some special cases it is necessary to increase the permanent size with 
-XX:MaxPermSize=128m (or more) to avoid OutOfMemoryExceptions.

WanMil

___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-04-01 Thread Scott A Crosby
On Thu, 01 Apr 2010 12:10:22 +0100, Steve Ratcliffe  
writes:

> On 31/03/10 14:56, Scott A Crosby wrote:
>> I noticed that mkgmap does not intern any strings. In particular, this
>
> This is true.  There is a pending patch that deals with excessive
> memory use in a slightly different way which is on the style-speed
> branch, with a pre-built one available at the bottom of the mkgmap
> download page.

I pulled it out of s...@1569.

> I drops all tags that are not used by the current style and as an
> extra feature it ensures that there is only copy of all the strings
> on the key side.
>
> Could you try it out please. It may be worth also interning the values
> but when I was looking at it there was much less benefit for the maps
> I was looking at (but it might vary with the input).
> I'd be happy to apply a patch that interned the values too if there
> was a decent memory saving without a significant performance drop.

The style-speed branch builds the tile in question within 1gb of ram
using the default style. I suspect that that style isn't using the
problematic tags. A different style that did use the tags would likely
still blow up. This seems fragile.

I benchmarked the problematic tile on the style-speed branch with and
without interning all keys and values in the Tag() constructor using
ordinary String.intern(). The 18 character change implementing
interning seems to increase performance by about a second, from 66 to
65 seconds.

I'd say to go for it.

Scott
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-04-01 Thread Scott A Crosby
On Wed, 31 Mar 2010 22:12:12 + (UTC), Chris Miller 
 writes:

> Note that Java's String.intern() method can be pretty slow, so while you'll 
> save a fair chunk of memory you'll potentially suffer a noticable performance 
> hit too if you're calling it a lot. By adding a barrier-free caching layer 
> in front of the String.intern() calls you can gain a reasonable performance 
> boost in this situation. As an example of how this can be implemented, take 
> a look at Lucene's SimpleStringInterner which does exactly this:
>
> http://github.com/apache/lucene/blob/1c5c409241a2b8b9e64dc8c253791b497a66c369/src/java/org/apache/lucene/util/SimpleStringInterner.java
>
> It's threadsafe in that it guarantees just enough visibility to never 
> generate 
> invalid results, yet also avoids any blocking. Might be worth benchmarking 
> something like this against the normal String.intern() with mkgmap.

We don't have to get rid of every duplicate string; only most of them,
so approximation techniques such as a per-thread or per-parser
FuzzyIntern will work fine, and require no concurrent access. Now
that I know that String. intern() uses weak references, it seems to be
the most trivial way to reduce the memory usage of that tile of
planet.osm by at least 3x.

Scott
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-04-01 Thread Scott A Crosby
On Wed, 31 Mar 2010 23:58:45 +0200, WanMil  writes:

> Am 31.03.2010 22:10, schrieb Scott A Crosby:
>> On Wed, 31 Mar 2010 21:13:49 +0200, WanMil  writes:
>>


> my patch interned all keys and additionally the values of a limited
> number of keys. Maybe it's not necessary to limit the interning of
> values. So I have attached the very simple but hopefully very
> effective patch regarding the memory footprint of mkgmap.

My opinion is that its simpler and more robust to intern or
pseudointern every tag value. The bad tile had a lot of duplicate
values, what if those tags were not on your list of the 'only intern
value strings for some keys'?

> Regarding your patch: I don't understand the function of the
> FuzzyIntern class. You build a HashMap from (uninterned) Strings to
> the interned String. Then you are looking up new strings in this
> HashMap and use the interned variant. Where's the difference to the
> (hopefully) very performance optimized intern() method?

Note that this code is not actually interning any strings in with
String.intern(). Call it psuedointerning. The purpose of FuzzyIntern
was when I believed that String.intern() interned forever, which I
considered very undesirable. I wanted semantics that would remove
(most) duplicate strings in memory without forcing those strings to
remain in RAM forever.

>> String intern does not intern forever

> I didn't know that. Do you have any link where this is specified?

A google search 'java intern weak reference' seems to indicate that
Java since version 1.2 uses a weak reference table for string intern,
which means that they can be removed. That search offers alternative
implementation ideas, such as using a real weak reference hashtable.

Scott
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-04-01 Thread Chris Miller
> I think the point that Chris brought up applies mostly in cases where
> there is a lot of contended access between threads.  If that is the
> case then it won't matter much for us as reading the input files is
> currently single threaded.

It's true, the biggest gains by far in the approach I was pointing out are 
when you have heavy concurrent access. I just saw Scott mention his code 
wasn't threadsafe so figured a multithreaded solution was desirable. However 
String.intern() can still be expensive even in a single-threaded environment. 
Replacing String.intern() with something simple like  if (!hashSet.contains()) 
hashSet.add(str)  should be roughly 2x quicker. Probably negligible in the 
grand scheme of things :)

Chris



___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-04-01 Thread Steve Ratcliffe
On 31/03/10 14:56, Scott A Crosby wrote:
> I noticed that mkgmap does not intern any strings. In particular, this

This is true.  There is a pending patch that deals with excessive
memory use in a slightly different way which is on the style-speed
branch, with a pre-built one available at the bottom of the mkgmap
download page.

I drops all tags that are not used by the current style and as an
extra feature it ensures that there is only copy of all the strings
on the key side.

Could you try it out please. It may be worth also interning the values
but when I was looking at it there was much less benefit for the maps
I was looking at (but it might vary with the input).
I'd be happy to apply a patch that interned the values too if there
was a decent memory saving without a significant performance drop.

I think the point that Chris brought up applies mostly in cases where
there is a lot of contended access between threads.  If that is the
case then it won't matter much for us as reading the input files is
currently single threaded.

I can't recall why reading maps *is* single threaded. I'm not sure
there is really a problem and if there is I am sure it can be fixed
easily.  Does anyone know or can find out?

..Steve
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-03-31 Thread Chris Miller
Note that Java's String.intern() method can be pretty slow, so while you'll 
save a fair chunk of memory you'll potentially suffer a noticable performance 
hit too if you're calling it a lot. By adding a barrier-free caching layer 
in front of the String.intern() calls you can gain a reasonable performance 
boost in this situation. As an example of how this can be implemented, take 
a look at Lucene's SimpleStringInterner which does exactly this:

http://github.com/apache/lucene/blob/1c5c409241a2b8b9e64dc8c253791b497a66c369/src/java/org/apache/lucene/util/SimpleStringInterner.java

It's threadsafe in that it guarantees just enough visibility to never generate 
invalid results, yet also avoids any blocking. Might be worth benchmarking 
something like this against the normal String.intern() with mkgmap.

Chris


> On Wed, 31 Mar 2010 21:13:49 +0200, WanMil  writes:
> 
>>> I noticed that mkgmap does not intern any strings. In particular,
>>> this tile, generated by the splitter, fails to build with -Xmx3000m
>>> on 64-bit jdk under linux. With my patch, mkgmap generates the tile
>>> with -Xmx1000m.
>>> 
>>> >> maxlon='11.513671875'/>
>>> 
>>> This tile has 1m nodes. Among the nodes and ways on this tile, there
>>> are 12m tags, yet only 100k distinct tag key/value pairs; on average
>>> each value occurs 120 times.
>>> 
>>> I explicitly do not use normal string interning because
>>> String.intern() strings are kept forever, and I want these strings
>>> to be GC'able after the tile is done. I trade GCability for having
>>> the occasional string duplicated in memory by flushing the interning
>>> table every 10k unique strings.
>>> 
>>> This code is not presently multithread safe; Ideally there should be
>>> one string interning table for each parser/thread.
>>> 
>>> Scott
>>> 
>> Hi Scott!
>> 
>> I think that's a good idea to intern the strings.
>> As far as I know the LossyIntern class is not needed. The .intern()
>> function of a string does exactly the same.
> You are right. String intern does not intern forever at least since
> Java 1.2.
> 
>> Some time ago I sent a very similar patch to the mailing list which
>> is not yet committed. Could you please test with your use case if it
>> performs a similar memory reduction?
>> 
> You can run it if you want, but from the numbers I gave above for this
> tile, interning values as in my patch will decrease the number of
> strings in RAM from 12M to <100k values. Interning only keys would
> reduce the number of Strings in RAM from 24M to 12M.
> 
>> The patch is thread safe and does not intern all strings. In my
>> opinion the value of a name tag should not be interned because there
>> is a high probability that this tag is used once only.
>> 
> Thats probably true for many or most tiles, but not for the tile I
> referenced above, where on average each value occurs 120 times. That
> tile is unbuildable with a 3gb heap without my patch and buildable
> with 1gb heap with my patch.
> 
> Shall I post an updated patch without FuzzyIntern?
> 
> Scott
> 



___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-03-31 Thread WanMil

Am 31.03.2010 22:10, schrieb Scott A Crosby:

On Wed, 31 Mar 2010 21:13:49 +0200, WanMil  writes:


I noticed that mkgmap does not intern any strings. In particular, this
tile, generated by the splitter, fails to build with -Xmx3000m on
64-bit jdk under linux. With my patch, mkgmap generates the tile with
-Xmx1000m.

  

This tile has 1m nodes. Among the nodes and ways on this tile, there
are 12m tags, yet only 100k distinct tag key/value pairs; on average
each value occurs 120 times.

I explicitly do not use normal string interning because
String.intern() strings are kept forever, and I want these strings to
be GC'able after the tile is done. I trade GCability for having the
occasional string duplicated in memory by flushing the interning table
every 10k unique strings.

This code is not presently multithread safe; Ideally there should be
one string interning table for each parser/thread.

Scott



Hi Scott!

I think that's a good idea to intern the strings.
As far as I know the LossyIntern class is not needed. The .intern()
function of a string does exactly the same.


You are right. String intern does not intern forever at least since
Java 1.2.


Some time ago I sent a very similar patch to the mailing list which
is not yet committed. Could you please test with your use case if it
performs a similar memory reduction?


You can run it if you want, but from the numbers I gave above for this
tile, interning values as in my patch will decrease the number of
strings in RAM from 12M to<100k values. Interning only keys would
reduce the number of Strings in RAM from 24M to 12M.



The patch is thread safe and does not intern all strings. In my
opinion the value of a name tag should not be interned because there
is a high probability that this tag is used once only.


Thats probably true for many or most tiles, but not for the tile I
referenced above, where on average each value occurs 120 times. That
tile is unbuildable with a 3gb heap without my patch and buildable
with 1gb heap with my patch.

Shall I post an updated patch without FuzzyIntern?

Scott


Scott,

my patch interned all keys and additionally the values of a limited 
number of keys. Maybe it's not necessary to limit the interning of 
values. So I have attached the very simple but hopefully very effective 
patch regarding the memory footprint of mkgmap.


Regarding your patch: I don't understand the function of the FuzzyIntern 
class. You build a HashMap from (uninterned) Strings to the interned 
String. Then you are looking up new strings in this HashMap and use the 
interned variant. Where's the difference to the (hopefully) very 
performance optimized intern() method?


> String intern does not intern forever
I didn't know that. Do you have any link where this is specified?

WanMil


Index: src/uk/me/parabola/mkgmap/reader/osm/Tags.java
===
--- src/uk/me/parabola/mkgmap/reader/osm/Tags.java  (revision 1624)
+++ src/uk/me/parabola/mkgmap/reader/osm/Tags.java  (working copy)
@@ -65,12 +65,14 @@
Integer ind = keyPos(key);
if (ind == null)
assert false : "keyPos(" + key + ") returns null - size 
= " + size + ", capacity = " + capacity;
-   keys[ind] = key;
+   // use .intern() to reduce memory footprint
+   keys[ind] = key.intern();
 
String old = values[ind];
if (old == null)
size++;
-   values[ind] = value;
+   
+   values[ind] = value.intern();
 
return old;
}
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-03-31 Thread Scott A Crosby
On Wed, 31 Mar 2010 21:13:49 +0200, WanMil  writes:

>> I noticed that mkgmap does not intern any strings. In particular, this
>> tile, generated by the splitter, fails to build with -Xmx3000m on
>> 64-bit jdk under linux. With my patch, mkgmap generates the tile with
>> -Xmx1000m.
>>
>>  >  maxlon='11.513671875'/>
>>
>> This tile has 1m nodes. Among the nodes and ways on this tile, there
>> are 12m tags, yet only 100k distinct tag key/value pairs; on average
>> each value occurs 120 times.
>>
>> I explicitly do not use normal string interning because
>> String.intern() strings are kept forever, and I want these strings to
>> be GC'able after the tile is done. I trade GCability for having the
>> occasional string duplicated in memory by flushing the interning table
>> every 10k unique strings.
>>
>> This code is not presently multithread safe; Ideally there should be
>> one string interning table for each parser/thread.
>>
>> Scott
>>
>
> Hi Scott!
>
> I think that's a good idea to intern the strings.
> As far as I know the LossyIntern class is not needed. The .intern()
> function of a string does exactly the same.

You are right. String intern does not intern forever at least since
Java 1.2.

> Some time ago I sent a very similar patch to the mailing list which
> is not yet committed. Could you please test with your use case if it
> performs a similar memory reduction?

You can run it if you want, but from the numbers I gave above for this
tile, interning values as in my patch will decrease the number of
strings in RAM from 12M to <100k values. Interning only keys would
reduce the number of Strings in RAM from 24M to 12M.


> The patch is thread safe and does not intern all strings. In my
> opinion the value of a name tag should not be interned because there
> is a high probability that this tag is used once only.

Thats probably true for many or most tiles, but not for the tile I
referenced above, where on average each value occurs 120 times. That
tile is unbuildable with a 3gb heap without my patch and buildable
with 1gb heap with my patch.

Shall I post an updated patch without FuzzyIntern?

Scott
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] Patch to reduce memory usage by interning strings.

2010-03-31 Thread WanMil

I noticed that mkgmap does not intern any strings. In particular, this
tile, generated by the splitter, fails to build with -Xmx3000m on
64-bit jdk under linux. With my patch, mkgmap generates the tile with
-Xmx1000m.

 

This tile has 1m nodes. Among the nodes and ways on this tile, there
are 12m tags, yet only 100k distinct tag key/value pairs; on average
each value occurs 120 times.

I explicitly do not use normal string interning because
String.intern() strings are kept forever, and I want these strings to
be GC'able after the tile is done. I trade GCability for having the
occasional string duplicated in memory by flushing the interning table
every 10k unique strings.

This code is not presently multithread safe; Ideally there should be
one string interning table for each parser/thread.

Scott



Hi Scott!

I think that's a good idea to intern the strings.
As far as I know the LossyIntern class is not needed. The .intern() 
function of a string does exactly the same.


Some time ago I sent a very similar patch to the mailing list which is 
not yet committed. Could you please test with your use case if it 
performs a similar memory reduction?


The patch is thread safe and does not intern all strings. In my opinion 
the value of a name tag should not be interned because there is a high 
probability that this tag is used once only.


WanMil
Index: src/uk/me/parabola/mkgmap/reader/osm/Tags.java
===
--- src/uk/me/parabola/mkgmap/reader/osm/Tags.java  (revision 1566)
+++ src/uk/me/parabola/mkgmap/reader/osm/Tags.java  (working copy)
@@ -19,6 +19,7 @@
 import java.util.AbstractMap;
 import java.util.Arrays;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.Iterator;
 import java.util.Map;
 
@@ -45,6 +46,18 @@
 
private String[] keys;
private String[] values;
+   
+   /** 
+* Stores all tags which values should be stored as String intern. The 
values of
+* these tags should have a limited number of different values to get a 
+* reasonable memory footprint effect.
+*/
+   private final static HashSet interableValueTags = new 
HashSet(
+   Arrays.asList("highway", "building", 
"addr:housenumber", "access",
+   "natural", "waterway", "amenity", "oneway", 
"surface",
+   "landuse", "lanes", "place", "layer", 
"tracktype", "maxspeed",
+   "foot", "bridge", "height", "area", "railway", 
"admin_level",
+   "power", "type", "leisure", "barrier"));
 
public Tags() {
keys = new String[INIT_SIZE];
@@ -65,11 +78,19 @@
Integer ind = keyPos(key);
if (ind == null)
assert false : "keyPos(" + key + ") returns null - size 
= " + size + ", capacity = " + capacity;
-   keys[ind] = key;
+   // use .intern() to reduce memory footprint
+   keys[ind] = key.intern();
 
String old = values[ind];
if (old == null)
size++;
+   
+   if (interableValueTags.contains(key)) {
+   // use .intern() to reduce memory footprint for the most
+   // common tags with a limited range of values
+   value = value.intern();
+   }
+   
values[ind] = value;
 
return old;
___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev