On 18/04/2016 15:01, Sylvain Melin wrote:
On 18/04/2016 12:23, Jochen Topf wrote:
On Mo, Apr 18, 2016 at 11:52:24 +0200, Sylvain Melin wrote:
On 18/04/2016 11:00, Jochen Topf wrote:
On Mo, Apr 18, 2016 at 10:10:06 +0200, Sylvain Melin wrote:
My plan is to:
- use a planet-sized PBF file
- cut it into 1° tiles using osmosis
- filter and extract the data from these tiles as shapefiles using
  libosmium
If you are writing your own program anyway to create those shapefiles,
why don't you do the splitting in this step *after* creating the
geometries and before writing them into shapefiles? That is probably
much easier to do than splitting based on the PBF, due to the structure
of the OSM data files.
Jochen
Maybe I'm wrong, but I don't want to parse the full planet.osm.pbf
every time I want to extract a small set of data.
The processing time seems to grow exponentially with the size of the
source file,
The time of which processing step exactly? I don't see anything in what
you are doing that should scale worse than linearly. Of course, if you
don't have enough memory you'll run into problems.
so having an intermediate level of 1°-sized PBF files containing
everything seems very practical to me.
In theory yes, but, as you noticed, you'll have to specially handle all
objects that straddle tile boundaries.
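To make the boundary problem concrete, here is a minimal Python sketch
(not code from this thread) that lists the 1° tiles a feature's bounding
box overlaps. The function names are hypothetical, and the sketch assumes
plain lon/lat degrees and ignores features crossing the antimeridian.

```python
import math


def tiles_touched(min_lon, min_lat, max_lon, max_lat):
    """Return (i, j) for every 1-degree tile the bounding box overlaps.

    i is the tile's western longitude, j its southern latitude.
    Assumption: the box does not cross the antimeridian.
    """
    # floor() rounds toward minus infinity, so -0.5 falls in tile -1
    i0, i1 = math.floor(min_lon), math.floor(max_lon)
    j0, j1 = math.floor(min_lat), math.floor(max_lat)
    return [(i, j) for j in range(j0, j1 + 1) for i in range(i0, i1 + 1)]


def straddles_boundary(min_lon, min_lat, max_lon, max_lat):
    """True if the bounding box overlaps more than one tile."""
    return len(tiles_touched(min_lon, min_lat, max_lon, max_lat)) > 1
```

Any feature for which straddles_boundary() is true appears in several
tile extracts and has to be either duplicated, clipped, or assigned to
one tile by some rule.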
Also, my osmium program loops over the target tiles and parses the
appropriate PBF file for each one:
    for each j in [-90,89]
    {
        for each i in [-180,179]
        {
            create osmium::handler
            parse i_j.pbf with osmium::io::Reader
            extract data to single handler with osmium::apply
        }
    }
Do you think it would be more efficient to have a single big PBF and
extract data to several handlers?
It will probably be most efficient to just do everything in one go, and
only at the moment when you are writing the finished feature out to a
shapefile, decide which shapefile it belongs in. You'll only have one
handler, but 180*360 output shapefiles.
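A minimal Python sketch of that write-time decision (an illustration,
not code from this thread): map one representative point of the finished
feature to its 1° tile. The i_j naming follows the pseudocode above;
using a single point per feature and the function names are assumptions.

```python
import math


def tile_index(lon, lat):
    """Return (i, j) of the 1-degree tile containing the point."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinate out of range")
    # floor() instead of int() so negative coordinates round down;
    # min() puts lon == 180 and lat == 90 into the last column/row
    i = min(math.floor(lon), 179)
    j = min(math.floor(lat), 89)
    return i, j


def tile_name(lon, lat):
    """Base name of the output file for this point, in i_j form."""
    i, j = tile_index(lon, lat)
    return f"{i}_{j}"
```

With this, the single handler would call tile_name() once per finished
feature and append the feature to the matching output.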
Is that even possible without filling up the RAM?
Depends on how much RAM you have. You'll need 32GB RAM for the node
location store. And you'll need some RAM to buffer the output, because
you can't write to 180*360 files at the same time efficiently. Maybe
fewer files would be better? (Also, you'll have not just one shapefile
for each tile, but probably dozens for all the different layers of data,
which makes this problem worse.)
So if you don't have this kind of memory, you have a problem.
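The figures above can be sanity-checked with a rough calculation. This
is a Python sketch under stated assumptions, not a measurement: it
assumes the node location store is an array with one 8-byte slot per
node ID, that node IDs reached roughly 4 billion in 2016, and a
hypothetical 64 KiB buffer per output file. Under those assumptions the
store comes to about 32 GB and the buffers to about 4.2 GB per layer.

```python
# Back-of-the-envelope sizes for the single-pass approach.
# All constants are assumptions for illustration, not measured values.

BYTES_PER_LOCATION = 8             # assumed: lon and lat as two 32-bit integers
MAX_NODE_ID = 3_999_999_999        # assumed: roughly the highest node ID in 2016
TILE_COUNT = 180 * 360             # one output file per 1-degree tile
BUFFER_BYTES_PER_FILE = 64 * 1024  # hypothetical per-file output buffer


def dense_store_bytes(max_node_id, bytes_per_location=BYTES_PER_LOCATION):
    """Size of an array with one location slot per node ID 0..max_node_id."""
    return (max_node_id + 1) * bytes_per_location


def output_buffer_bytes(file_count, buffer_bytes=BUFFER_BYTES_PER_FILE):
    """Memory needed if every output file keeps its own buffer."""
    return file_count * buffer_bytes


if __name__ == "__main__":
    print("tiles:", TILE_COUNT)
    print("node store GB:", dense_store_bytes(MAX_NODE_ID) / 1e9)
    print("buffers GB:", output_buffer_bytes(TILE_COUNT) / 1e9)
```

Multiplying the buffer figure by the number of layers per tile shows why
fewer, larger output files are easier to handle.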
You can also have a look at
https://github.com/joto/osm-history-splitter
which should be more efficient at splitting a planet into smaller files
than Osmosis. But people have reported some issues with this software.
It is on my TODO list to look at this and fix them, but that will take
a while.
Jochen
OK, I got it! Unfortunately, I don't have enough RAM for this method.
I had not thought about it before, but given the small amount of data I
need, I wonder if using XAPI to request data per degree isn't the most
obvious way to get it, unless XAPI has the same kind of problem with
the borders.
I'll also take a look at osm-history-splitter.
Thank you very much!
Sylvain
I finally found a proper method to do this.
I wrote a bash script that uses the Overpass API to request and filter
the data, and converts the resulting osm.xml file to a shapefile with my
osmium program.
The Overpass API does not clip data at the edges of the bounding box.
Also, I only have the data I need on my hard drive, and I'm sure it's
up to date.
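The script itself is not shown here. As an illustration only, this
Python sketch builds an Overpass QL request for one 1° tile. The
"highway" filter, the function names, and the choice of the public
overpass-api.de endpoint are assumptions, not details from the script
described above. The bounding box is given in Overpass order (south,
west, north, east), and the "(._;>;);" step pulls in all nodes of the
matched ways, so ways crossing the tile edge come back whole.

```python
from urllib.parse import urlencode

# Assumption: a public Overpass instance; any other endpoint works the same way
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def overpass_query(lon, lat, tag_key="highway"):
    """Overpass QL for all ways with tag_key in the tile whose SW corner is (lon, lat)."""
    south, west = lat, lon
    north, east = lat + 1, lon + 1
    bbox = f"{south},{west},{north},{east}"  # Overpass order: S, W, N, E
    return (
        "[out:xml];"
        f'way["{tag_key}"]({bbox});'
        "(._;>;);"    # add every node referenced by the matched ways
        "out meta;"
    )


def overpass_url(lon, lat, tag_key="highway"):
    """Full GET URL carrying the query in the 'data' parameter."""
    return OVERPASS_URL + "?" + urlencode({"data": overpass_query(lon, lat, tag_key)})
```

The URL returned by overpass_url(2, 48) could be fetched with any HTTP
client and the response saved as an .osm file for the converter.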
Thank you for your help.
I hope it will help people facing the same issue.
Regards,
Sylvain
_______________________________________________
osmosis-dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/osmosis-dev