Re: [OSM-talk] First drop in planet size ?

2010-03-14 Thread hbogner
Thanks it worked :D

Nic Roets wrote:
 There's a bug in the code that generated this week's planet. You
 should either wait until next week or filter the planet with the
 following command:
 bzcat /osm/planet-10*.osm.bz2 |egrep -v '#[0-9]*;'|...
 
 There has been a long discussion on 'dev', mentioning other remedies.



___
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread hbogner
Nic Roets wrote:
 (since we got rid of the segments)
 
 From 8.2 GB to 8.1 GB:
 http://planet.openstreetmap.org/
Maybe something is wrong with it.
I don't know if anybody else has the same problem, but I can't manage to
complete an extract with osmosis. I'm doing the same thing as every time
and it doesn't work. I tried downloading the planet again, another
version of osmosis, and another version of the .poly file: nothing helped.
At the moment I'm waiting for it to decompress from bz2 so I can try again;
maybe it's just the bz2 ...

error:
SEVERE: Thread for task 1-read-xml failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse 
xml file /dev/stdin.  publicId=(null), systemId=(null), 
lineNumber=529642199, columnNumber=27.

osmosis:
bzip2 -d -c planet-100310.osm.bz2 | osmosis/bin/osmosis --read-xml \
/dev/stdin --bounding-polygon clipIncompleteEntities=true \
file=croatia50km.poly --write-xml file=- | bzip2 -c \
> 20100310-croatia50km.osm.bz2


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread Nic Roets
There's a bug in the code that generated this week's planet. You
should either wait until next week or filter the planet with the
following command:
bzcat /osm/planet-10*.osm.bz2 |egrep -v '#[0-9]*;'|...

There has been a long discussion on 'dev', mentioning other remedies.
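
For anyone unsure what the filter does: egrep -v '#[0-9]*;' drops every line that contains a '#', any number of digits, and then a ';', which is the shape of a numeric character reference such as &#12;. Below is a minimal C++ sketch of the same line filter, not code that anyone in this thread used. The names keepLine and filterStream are made up for illustration.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: true if the line should be kept, i.e. it does NOT
// contain '#', zero or more digits, then ';' (the same pattern as the egrep).
bool keepLine(const std::string &line) {
    static const std::regex bad("#[0-9]*;");
    return !std::regex_search(line, bad);
}

// Copy every kept line from 'in' to 'out', like 'egrep -v' does.
void filterStream(std::istream &in, std::ostream &out) {
    std::string line;
    while (std::getline(in, line)) {
        if (keepLine(line)) out << line << '\n';
    }
}
```

A main() that calls filterStream(std::cin, std::cout) would sit in the pipeline where egrep sits. std::regex is likely to be slower than egrep on a file this size, so treat this as an explanation rather than a replacement.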


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread hbogner
Thx for help, I'll try it.

Now I have to follow 'dev' too :D



Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread John Mitchell
Will this also be a problem if you try to import via osm2pgsql into postgres?

Thanks,

John

-- 
John J. Mitchell


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread Nic Roets
My understanding is that all XML-compliant* parsers will abort at the
file offsets that Frederik mentions.
My advice is to use the egrep filter when in doubt, because you will
lose no more than a dozen lines in a planet file of billions of
lines.

*: (My split program is not compliant and will happily ignore these errors:
http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp)


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread jamesmikedup...@googlemail.com
That is very deep C++ code!
Care to comment on how it works?
I would be very interested to understand its performance! It looks very fast.
mike


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread Nic Roets
Hello James,

I wanted to split the planet into overlapping bboxes like this (click
to see actual size):
http://dev.openstreetmap.de/gosmore/

On talk I described how I was dissatisfied with osmosis's memory
consumption. So I came up with this observation: Most entities will
end up in one or two extracts. And when it's two, it's in a pattern
that is often repeated, say Africa bbox and Middle East bbox. Never
Africa and Canada. So of the 2^168 possible combinations only around
3000 are actually used.

So bboxSplit allocates 16 bits for each entity. Those are then indexes
into the array of 'younions'. If a new node comes along, I check it
against the list of bboxes and it typically matches 1 or 2. So to find out
quickly if I already have that combination of bboxes, I also have an
STL map on the array of younions. A hashtable would have been faster.

Ways and relations also trigger the code that merges younions.

bboxSplit is faster than the corresponding bunzip and any program that
uses libxml, i.e. very fast.
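
The bookkeeping described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual bboxSplit.cpp: the BBox struct, the YounionTable class and its method names are assumptions, and treating the younion of a way or relation as the union of its members' younions is only one reading of the "merge" step.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

struct BBox { double minLat, minLon, maxLat, maxLon; };

// A "younion" here is the set of bbox indexes an entity belongs to.
using Younion = std::set<int>;

class YounionTable {
public:
    // Return the 16-bit index of this combination, adding it if it is new.
    // Assumes fewer than 65536 distinct combinations ever occur.
    std::uint16_t intern(const Younion &y) {
        auto it = index_.find(y);
        if (it != index_.end()) return it->second;
        std::uint16_t id = static_cast<std::uint16_t>(younions_.size());
        younions_.push_back(y);
        index_[y] = id;
        return id;
    }
    // Combination for a node: every bbox that contains the point.
    std::uint16_t forNode(const std::vector<BBox> &boxes, double lat, double lon) {
        Younion y;
        for (std::size_t i = 0; i < boxes.size(); ++i) {
            const BBox &b = boxes[i];
            if (lat >= b.minLat && lat <= b.maxLat &&
                lon >= b.minLon && lon <= b.maxLon) {
                y.insert(static_cast<int>(i));
            }
        }
        return intern(y);
    }
    // Combination for a way or relation: union of the members' combinations.
    std::uint16_t merge(const std::vector<std::uint16_t> &members) {
        Younion y;
        for (std::uint16_t m : members) {
            y.insert(younions_[m].begin(), younions_[m].end());
        }
        return intern(y);
    }
    const Younion &get(std::uint16_t id) const { return younions_[id]; }
    std::size_t size() const { return younions_.size(); }

private:
    std::vector<Younion> younions_;           // the array of younions
    std::map<Younion, std::uint16_t> index_;  // lookup: combination -> index
};
```

Each entity then costs only the 16-bit index, while the few thousand distinct combinations are stored once in the table. Replacing the std::map with std::unordered_map and a hash over the set would be the hashtable variant mentioned above.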

Regards,
Nic


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread jamesmikedup...@googlemail.com
Are you bunzipping the file yourself? Are you scanning the bzip blocks?
You say it is faster than the bunzip, but maybe you just mean that it is very fast.

I have experimented with bzip2recover to extract blocks on their own.
I made a Perl script to extract blocks from a Wikipedia file, so that the
processing of the huge file can be run by many people in parallel.

https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia

It is a tool to extract lat/long coords from the wikipedia articles.

Processing the large files this way would allow us to team up and all help.
We really need to just have an index file of all the blocks so that we can
find the ones that we need. Imagine being able to process the bzip file
directly!
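
As a rough sketch of what such a block index could be built from: bzip2 starts every compressed block with the 48-bit magic number 0x314159265359, and blocks are not byte-aligned, so a scanner has to test every bit offset (the same basic approach bzip2recover takes). The function name findBlockStarts is hypothetical, and a real index would also have to tolerate the magic turning up by chance inside compressed data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the bit offsets (from the start of 'data') at which the 48-bit
// bzip2 block-header magic begins. Bits are read most significant first.
std::vector<std::uint64_t> findBlockStarts(const std::vector<unsigned char> &data) {
    const std::uint64_t kBlockMagic = 0x314159265359ULL;
    const std::uint64_t kMask48 = 0xFFFFFFFFFFFFULL;
    std::vector<std::uint64_t> starts;
    std::uint64_t window = 0;    // the last 48 bits seen
    std::uint64_t bitsSeen = 0;  // total number of bits consumed so far
    for (unsigned char byte : data) {
        for (int bit = 7; bit >= 0; --bit) {
            window = ((window << 1) | ((byte >> bit) & 1u)) & kMask48;
            ++bitsSeen;
            if (bitsSeen >= 48 && window == kBlockMagic) {
                starts.push_back(bitsSeen - 48);
            }
        }
    }
    return starts;
}
```

Writing those offsets to a side file would give the kind of index described above. Each block would still have to be re-wrapped with a stream header before a stock bunzip2 could read it on its own.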

mike


Re: [OSM-talk] First drop in planet size ?

2010-03-13 Thread Nic Roets
No. It runs on the uncompressed planet, like this :
bzcat /osm/planet-10*.osm.bz2 |   /osm/gosmore/bboxSplit \
   -85.05113   73.12500    9.44906  180.0 gzip 0720048510241024.osm.gz \
   -25.48295  120.58594   72.91964  180.0 gzip 0855020310240587.osm.gz \
   -85.05113   98.43750   13.23995  172.61719 gzip 0792047410031024.osm.gz \
...

I'm not too worried about further optimizations: Unlike Wikipedia,
there isn't the same urgency to have up-to-date data, except for disaster
relief.


[OSM-talk] First drop in planet size ?

2010-03-11 Thread Nic Roets
(since we got rid of the segments)

From 8.2 GB to 8.1 GB:
http://planet.openstreetmap.org/


Re: [OSM-talk] First drop in planet size ?

2010-03-11 Thread Grant Slater
On 11 March 2010 15:50, Nic Roets nro...@gmail.com wrote:
 (since we got rid of the segments)

 From 8.2 GB to 8.1 GB:
 http://planet.openstreetmap.org/


Interesting...

There has been a change to the dumping script since the previous week:
http://trac.openstreetmap.org/changeset/20396

But more likely; we have dropped about a million duplicate nodes:
http://matt.dev.openstreetmap.org/dupe_nodes/about.html

/ Grant


Re: [OSM-talk] First drop in planet size ?

2010-03-11 Thread SteveC
lots of dupe node removal?

Yours c.

Steve



Re: [OSM-talk] First drop in planet size ?

2010-03-11 Thread Lars Francke
No.

 From 8.2 GB to 8.1 GB:
 http://planet.openstreetmap.org/

planet-091007.osm.bz2 09-Oct-2009 03:37  7.4G
planet-091014.osm.bz2 14-Oct-2009 20:35  7.2G

And I'm sure it has happened before.

What exactly were you trying to tell us? :)


Re: [OSM-talk] First drop in planet size ?

2010-03-11 Thread Grant Slater
On 11 March 2010 16:03, Lars Francke lars.fran...@gmail.com wrote:

 planet-091007.osm.bz2                     09-Oct-2009 03:37  7.4G
 planet-091014.osm.bz2                     14-Oct-2009 20:35  7.2G


I tweaked the bz2 compression block size around then, which would
account for that size change.

/ Grant
