On 18/04/2016 12:23, Jochen Topf wrote:
On Mo, Apr 18, 2016 at 11:52:24 +0200, Sylvain Melin wrote:
On 18/04/2016 11:00, Jochen Topf wrote:
On Mo, Apr 18, 2016 at 10:10:06 +0200, Sylvain Melin wrote:
My plan is to:
- start from a planet-sized PBF file
- cut it into 1° tiles using osmosis
- filter and extract the data from these tiles as shapefiles using libosmium
If you are writing your own program anyway to create those shapefiles, why
don't you do the splitting in this step *after* creating the geometries and
before writing them into shapefiles? That is probably much easier to do than
based on the PBF due to the structure of the OSM data files.
Jochen
Maybe I'm wrong, but it's because I don't want to parse the full planet.osm.pbf
every time I want to extract a small set of data.
The processing time seems to grow exponentially with the size of the source file
The time of what processing exactly? I don't see anything in what you are doing
that should scale worse than linearly. Of course if you don't have enough memory
you'll run into problems.
so having an intermediate level with 1° sized pbf containing everything
seems very practical to me.
In theory yes, but, as you noticed, you'll have to handle all objects specially
that straddle tile boundaries.
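
To make the boundary problem concrete, here is a minimal sketch (not taken from libosmium or Osmosis) of how one might list the 1° tiles that an object's bounding box touches. The function names `tiles_for_bbox` and `straddles` are hypothetical, and the sketch ignores objects that cross the antimeridian.

```python
import math

def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat):
    """List the (i, j) indices of all 1-degree tiles touched by a bounding box.

    i is the floor of the longitude, j the floor of the latitude, matching
    the i_j naming used for the per-tile files in this thread.
    Assumption: the box does not cross the antimeridian.
    """
    i0, i1 = math.floor(min_lon), math.floor(max_lon)
    j0, j1 = math.floor(min_lat), math.floor(max_lat)
    return [(i, j) for j in range(j0, j1 + 1) for i in range(i0, i1 + 1)]

def straddles(min_lon, min_lat, max_lon, max_lat):
    """True if the bounding box touches more than one tile."""
    return len(tiles_for_bbox(min_lon, min_lat, max_lon, max_lat)) > 1
```

Any object for which `straddles` returns true has to be either duplicated into every tile it touches or clipped at the tile edges, which is the special handling mentioned above.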
Also, my osmium program loops over the target tiles and parses the appropriate
PBF file for each one:
for each j in [-90,89]
{
  for each i in [-180,179]
  {
    create osmium::handler
    parse i_j.pbf with osmium::io::Reader
    extract data to single handler with osmium::apply
  }
}
Do you think it would be more efficient to have a single big PBF and extract
data to several handlers?
It will probably be most efficient to just do everything in one go. And only at
the moment where you are writing out the finished feature into the shapefile,
decide in which shapefile it should belong. You'll only have one handler, but
180*360 output shapefiles.
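
As a rough sketch of that last step, deciding which output a finished feature belongs to can be as simple as flooring one representative coordinate. The helper names `tile_index` and `tile_name` are hypothetical, and which point represents a feature (first node, bounding-box centre, ...) is an assumption left to the caller.

```python
import math

def tile_index(lon, lat):
    """Return (i, j) for the 1-degree tile containing a point.

    i is clamped to [-180, 179] and j to [-90, 89], so points lying exactly
    on lon = 180 or lat = 90 fall into the last tile instead of a 361st
    column or 181st row.
    """
    i = max(-180, min(math.floor(lon), 179))
    j = max(-90, min(math.floor(lat), 89))
    return i, j

def tile_name(lon, lat):
    """Build an i_j style name, following the i_j.pbf naming used above."""
    i, j = tile_index(lon, lat)
    return f"{i}_{j}"
```

For example, `tile_name(8.5, 47.3)` gives `"8_47"`, so a feature represented by that point would go into the shapefile(s) for tile 8_47.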
Is it even possible without filling the RAM?
Depends on how much RAM you have. You'll need 32GB RAM for the node location
store. And you'll need some RAM to buffer the output, because you can't write
to 180*360 files at the same time efficiently. Maybe fewer files would be
better? (Also you'll have not only one shape file for each tile, but probably
dozens for all the different layers of data, which makes this problem worse.)
So if you don't have this kind of memory, you have a problem.
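
One way to picture the output buffering is sketched below: records are collected in memory per tile and appended to the tile's file in batches, so only one file is open at any time. This is only an illustration. The class name `TileBuffer`, the `max_records` limit and the plain text output (standing in for real shapefile writing) are all assumptions, not anything libosmium provides.

```python
import os
from collections import defaultdict

class TileBuffer:
    """Collect records per tile in memory and append them to files in batches."""

    def __init__(self, out_dir, max_records=100_000):
        self.out_dir = out_dir
        self.max_records = max_records      # flush once this many records are pending
        self.pending = defaultdict(list)    # tile name -> list of text records
        self.count = 0                      # total number of pending records

    def add(self, tile, record):
        """Queue one record for a tile, flushing when the buffer is full."""
        self.pending[tile].append(record)
        self.count += 1
        if self.count >= self.max_records:
            self.flush()

    def flush(self):
        """Append all pending records to their tile files, one file at a time."""
        os.makedirs(self.out_dir, exist_ok=True)
        for tile, records in self.pending.items():
            path = os.path.join(self.out_dir, f"{tile}.txt")
            with open(path, "a", encoding="utf-8") as f:
                f.writelines(r + "\n" for r in records)
        self.pending.clear()
        self.count = 0
```

The memory used is bounded by `max_records`, at the price of reopening files on every flush. A final `flush()` is needed after the last feature has been processed.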
You can also have a look at
https://github.com/joto/osm-history-splitter
which should be more efficient at splitting a planet into smaller files than
Osmosis. But people have reported some issues with this software. It is on my
TODO list to look at this and fix them, but that will take a while.
Jochen
OK, I got it! Unfortunately, I don't have enough RAM for this method.
I did not think about it before, but given the small amount of data I
need, I wonder if using XAPI to request data per degree isn't the most
obvious way to get it, unless XAPI has the same kind of
problem with the borders.
I'll also take a look at osm-history-splitter.
Thank you very much!
Sylvain
_______________________________________________
osmosis-dev mailing list
https://lists.openstreetmap.org/listinfo/osmosis-dev