Re: [OSM-talk] First drop in planet size ?
Thanks, it worked :D

Nic Roets wrote:
> There's a bug in the code that generated this week's planet. You should either wait until next week or filter the planet with the following command:
>     bzcat /osm/planet-10*.osm.bz2 |egrep -v '#[0-9]*;'|...
> There has been a long discussion on 'dev', mentioning other remedies.

___ talk mailing list talk@openstreetmap.org http://lists.openstreetmap.org/listinfo/talk
Re: [OSM-talk] First drop in planet size ?
Nic Roets wrote:
> (since we got rid of the segments) From 8.2 GB to 8.1 GB: http://planet.openstreetmap.org/

Maybe something is wrong with it. I don't know if anybody else has the same problem, but I can't manage to complete an extract with osmosis. I'm doing the same thing as every time and it doesn't work. I tried downloading the planet again, another version of osmosis, and another version of the .poly file; nothing helped. At the moment I'm waiting for it to extract from bz2 so I can try again; maybe it's just bz2 ...

error:

    SEVERE: Thread for task 1-read-xml failed
    org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file /dev/stdin. publicId=(null), systemId=(null), lineNumber=529642199, columnNumber=27.

osmosis:

    bzip2 -d -c planet-100310.osm.bz2 | osmosis/bin/osmosis --read-xml /dev/stdin --bounding-polygon clipIncompleteEntities=true file=croatia50km.poly --write-xml file=- | bzip2 -c > 20100310-croatia50km.osm.bz2
Re: [OSM-talk] First drop in planet size ?
There's a bug in the code that generated this week's planet. You should either wait until next week or filter the planet with the following command:

    bzcat /osm/planet-10*.osm.bz2 |egrep -v '#[0-9]*;'|...

There has been a long discussion on 'dev', mentioning other remedies.

On Sat, Mar 13, 2010 at 6:29 PM, hbogner hbog...@gmail.com wrote:
> Maybe something is wrong with it. I don't know if anybody has the same problem but I can't manage to complete an extract with osmosis. [...]
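For anyone without egrep at hand, the same line filter can be sketched in Python. This is only an illustration of what `egrep -v '#[0-9]*;'` does (drop every line that contains a `#`, optional digits, then `;`); the function name and the sample lines are made up for the example and are not taken from the planet file.

```python
import re

# Same pattern as the egrep command in this thread, applied to raw bytes.
SUSPECT = re.compile(rb"#[0-9]*;")

def drop_suspect_lines(lines):
    """Yield only the lines that do NOT match, mirroring `egrep -v`."""
    for line in lines:
        if not SUSPECT.search(line):
            yield line

# Hypothetical sample input: one harmless line, one with a numeric reference.
sample = [
    b'<node id="1" lat="45.8" lon="16.0"/>\n',
    b'<tag k="name" v="x&#55357;y"/>\n',
]
kept = list(drop_suspect_lines(sample))
print(len(sample), "lines in,", len(kept), "lines out")
```

To use it as a pipe stage, one would iterate over `sys.stdin.buffer` and write the kept lines to `sys.stdout.buffer`. Note that the pattern drops any line with a numeric character reference, valid or not, which is why it costs a few lines of data.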
Re: [OSM-talk] First drop in planet size ?
Thx for help, I'll try it. Now I have to follow 'dev' too :D

Nic Roets wrote:
> There's a bug in the code that generated this week's planet. [...]
Re: [OSM-talk] First drop in planet size ?
Will this also be a problem if you try to import via osm2pgsql into postgres?

Thanks,
John

On 3/13/10, hbogner hbog...@gmail.com wrote:
> Thx for help, I'll try it. Now I have to follow 'dev' too :D

-- John J. Mitchell
Re: [OSM-talk] First drop in planet size ?
My understanding is that all XML-compliant* parsers will abort at the file offsets that Frederik mentions. My advice is to use the egrep filter when in doubt, because you will lose no more than a dozen lines in a planet file of billions of lines.

*: (My split program is not compliant and will happily ignore these errors: http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp)

On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell mitchellj...@gmail.com wrote:
> Will this also be a problem if you try to import via osm2pgsql into postgres?
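A small way to see the "compliant parsers abort" behaviour, assuming the offending lines carry a numeric character reference outside the XML character range (the thread does not spell out the exact content, so the `&#55357;` below is a guess chosen only because it is such a reference). Python's expat-based ElementTree stands in here for any conforming parser; it is not what osmosis or osm2pgsql use.

```python
import xml.etree.ElementTree as ET

def parses_cleanly(xml_text):
    """Return True if a conforming parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical tag lines, not copied from the planet file.
good = '<tag k="name" v="caf&#233;"/>'   # reference to a legal character
bad = '<tag k="name" v="x&#55357;y"/>'   # reference to a lone surrogate

print("good parses:", parses_cleanly(good))
print("bad parses:", parses_cleanly(bad))
```

The parser stops at the first such reference, so everything after that offset in the stream is lost unless the line is filtered out beforehand.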
Re: [OSM-talk] First drop in planet size ?
That is very deep C++ code! Care to comment on how it works? I would be very interested to understand its performance. It looks very fast.

mike

On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets nro...@gmail.com wrote:
> [...] (My split program is not compliant and will happily ignore these errors: http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp)
Re: [OSM-talk] First drop in planet size ?
Hello James,

I wanted to split the planet into overlapping bboxes like this (click to see actual size): http://dev.openstreetmap.de/gosmore/

On talk I described how I was dissatisfied with osmosis's memory consumption. So I came up with this observation: Most entities will end up in one or two extracts. And when it's two, it's in a pattern that is often repeated, say the Africa bbox and the Middle East bbox. Never Africa and Canada. So of the 2^168 possible combinations, only around 3000 are actually used.

So bboxSplit allocates 16 bits for each entity. Those are then indexes into the array of 'younions'. If a new node comes along, I check it against the list of bboxes and it typically matches 1 or 2. So to find out quickly if I already have that combination of bboxes, I also have an STL map on the array of younions. A hashtable would have been faster. Ways and relations also trigger the code that merges younions.

bboxSplit is faster than the corresponding bunzip and any program that uses libxml, i.e. very fast.

Regards,
Nic

On Sat, Mar 13, 2010 at 10:03 PM, jamesmikedup...@googlemail.com wrote:
> That is very deep c++ code! care to comment on how it works? would be very interested to understand its performance ! looks very fast.
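The idea described for bboxSplit (store a 16-bit index per entity that points into a table of distinct bbox combinations) can be sketched roughly as follows. This is a Python illustration written from the description in this thread, not a port of bboxSplit.cpp: the class and function names, the two bboxes, the sample nodes, and the (min_lat, min_lon, max_lat, max_lon) ordering are all assumptions, and a dict (a hash table) replaces the STL map.

```python
from array import array

class ComboTable:
    """Assigns a small integer to each distinct set of bbox indices."""

    def __init__(self):
        self.combos = [frozenset()]        # index 0: entity is in no bbox
        self.index_of = {frozenset(): 0}   # hash lookup: combination -> index

    def intern(self, combo):
        combo = frozenset(combo)
        idx = self.index_of.get(combo)
        if idx is None:
            idx = len(self.combos)
            if idx > 0xFFFF:
                raise OverflowError("more combinations than 16 bits can index")
            self.combos.append(combo)
            self.index_of[combo] = idx
        return idx

def matching_bboxes(lat, lon, bboxes):
    """Return the indices of all bboxes that contain the point."""
    return [i for i, (min_lat, min_lon, max_lat, max_lon) in enumerate(bboxes)
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]

# Hypothetical overlapping bboxes: (min_lat, min_lon, max_lat, max_lon).
bboxes = [(-35.0, -20.0, 38.0, 52.0),   # roughly "Africa"
          (10.0, 25.0, 45.0, 65.0)]     # roughly "Middle East"

table = ComboTable()
node_combo = array("H")                  # one unsigned 16-bit slot per node
nodes = [(0.0, 10.0), (30.0, 31.0), (30.1, 31.1), (60.0, 100.0)]
for lat, lon in nodes:
    node_combo.append(table.intern(matching_bboxes(lat, lon, bboxes)))

def way_combo(node_ids):
    """A way belongs to the merged combination of all its nodes."""
    merged = set()
    for n in node_ids:
        merged |= table.combos[node_combo[n]]
    return table.intern(merged)

print(list(node_combo), len(table.combos), way_combo([0, 1]))
```

With about 3000 combinations in use, the table stays far below the 65,536 entries that a 16-bit index can address, which is what keeps the per-entity cost at two bytes.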
Re: [OSM-talk] First drop in planet size ?
You are bunzipping the code? You are scanning the bzip blocks? You say it is faster than the bunzip, but maybe you just mean that it is very fast.

I have experimented with bzip2recover to extract blocks on their own. I made a perl script to extract blocks from a Wikipedia file, so that the processing of the huge file can be run by many people in parallel.

https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia

It is a tool to extract lat/long coords from the Wikipedia articles. Processing the large files this way would allow us to team up and all help. We really just need an index file of all the blocks so that we can find the ones that we need. Imagine being able to process the bzip file directly!

mike

On Sat, Mar 13, 2010 at 9:31 PM, Nic Roets nro...@gmail.com wrote:
> [...] bboxSplit is faster than the corresponding bunzip and any program that uses libxml, i.e. very fast.
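As a rough illustration of what a block index could look like: each bzip2 block starts with the 48-bit magic number 0x314159265359, and blocks are not byte-aligned, so a scanner has to look at every bit offset. The sketch below is an assumption about how such an index might be built; it is not the perl script mentioned above, the function name is made up, and pure Python would be far too slow for a multi-gigabyte planet.

```python
import bz2

BLOCK_MAGIC = 0x314159265359      # start-of-block marker in the bzip2 format
MASK_48 = (1 << 48) - 1

def find_block_bit_offsets(data):
    """Return the bit offset of every block magic found in `data`."""
    window = 0
    offsets = []
    for byte_index, byte in enumerate(data):
        for bit in range(7, -1, -1):           # most significant bit first
            window = ((window << 1) | ((byte >> bit) & 1)) & MASK_48
            bits_seen = byte_index * 8 + (8 - bit)
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_seen - 48)
    return offsets

# Tiny demonstration on data compressed in memory.
compressed = bz2.compress(b"<osm>hello planet</osm>\n" * 100)
block_offsets = find_block_bit_offsets(compressed)
print(block_offsets)
```

The first block always follows the 4-byte stream header, so its offset is bit 32. A chance match of the magic inside compressed data is possible in principle, so a real index would want to verify each candidate by actually decoding the block.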
Re: [OSM-talk] First drop in planet size ?
No. It runs on the uncompressed planet, like this:

    bzcat /osm/planet-10*.osm.bz2 | /osm/gosmore/bboxSplit \
      -85.05113 73.12500 9.44906 180.0 gzip 0720048510241024.osm.gz \
      -25.48295 120.58594 72.91964 180.0 gzip 0855020310240587.osm.gz \
      -85.05113 98.43750 13.23995 172.61719 gzip 0792047410031024.osm.gz \
      ...

I'm not too worried about further optimizations: unlike Wikipedia, there isn't the same urgency to have up-to-date data. Except for disaster relief.

On Sat, Mar 13, 2010 at 10:42 PM, jamesmikedup...@googlemail.com wrote:
> you are bunziping the code ? you are scanning the bzip blocks? [...]
[OSM-talk] First drop in planet size ?
(since we got rid of the segments) From 8.2 GB to 8.1 GB: http://planet.openstreetmap.org/
Re: [OSM-talk] First drop in planet size ?
On 11 March 2010 15:50, Nic Roets nro...@gmail.com wrote:
> (since we got rid of the segments) From 8.2 GB to 8.1 GB: http://planet.openstreetmap.org/

Interesting... There has been a change to the dumping script since the previous week: http://trac.openstreetmap.org/changeset/20396

But more likely, we have dropped about a million duplicate nodes: http://matt.dev.openstreetmap.org/dupe_nodes/about.html

/ Grant
Re: [OSM-talk] First drop in planet size ?
lots of dupe node removal?

On Mar 11, 2010, at 3:50 PM, Nic Roets wrote:
> (since we got rid of the segments) From 8.2 GB to 8.1 GB: http://planet.openstreetmap.org/

Yours c.
Steve
Re: [OSM-talk] First drop in planet size ?
No.

> From 8.2 GB to 8.1 GB: http://planet.openstreetmap.org/

    planet-091007.osm.bz2    09-Oct-2009 03:37    7.4G
    planet-091014.osm.bz2    14-Oct-2009 20:35    7.2G

And I'm sure it has happened before. What exactly were you trying to tell us? :)
Re: [OSM-talk] First drop in planet size ?
On 11 March 2010 16:03, Lars Francke lars.fran...@gmail.com wrote:
> planet-091007.osm.bz2 09-Oct-2009 03:37 7.4G
> planet-091014.osm.bz2 14-Oct-2009 20:35 7.2G

I tweaked the bz2 compression block size around then, which would account for that size change.

/ Grant
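To get a feel for how much the block size alone can move the output size, here is a small experiment using Python's bz2 module, where compresslevel 1 to 9 selects 100 kB to 900 kB blocks. The synthetic XML-like lines are invented for the test, so the numbers it prints say nothing about the real planet; it only shows the knob that was tweaked.

```python
import bz2

# Hypothetical, repetitive XML-like input (a few MB), not real planet data.
line = b'<node id="%d" lat="45.8" lon="16.0" user="mapper%d"/>\n'
data = b"".join(line % (i, i % 97) for i in range(50000))

sizes = {}
for level in (1, 9):                     # 100 kB blocks vs 900 kB blocks
    sizes[level] = len(bz2.compress(data, compresslevel=level))

for level, size in sizes.items():
    print(f"block size {level}00 kB: {size} bytes compressed")
```

Larger blocks usually give the compressor more context per block, at the price of more memory during compression and decompression.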