On Mon, Apr 18, 2016 at 11:52:24 +0200, Sylvain Melin wrote:
> On 18/04/2016 11:00, Jochen Topf wrote:
> >On Mon, Apr 18, 2016 at 10:10:06 +0200, Sylvain Melin wrote:
> >>My plan is to:
> >>- exploit a planet sized pbf file
> >>- cut it into 1° tiles using osmosis
> >>- filter and extract the data from these tiles as shapefiles using libosmium
> >If you are writing your own program anyway to create those shapefiles, why
> >don't you do the splitting in this step *after* creating the geometries and
> >before writing them into shapefiles? That is probably much easier than
> >splitting the PBF, given the structure of the OSM data files.
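A minimal sketch of that per-feature decision follows. It assumes each tile is keyed by the south-west corner of its 1° cell and named "i_j" like the i_j.pbf files in this thread. `TileKey`, `tile_for` and `tile_name` are hypothetical names and are not part of libosmium. The only input needed is a coordinate, for example a point taken from a geometry that has already been assembled:

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical key of a 1-degree tile: its south-west corner.
struct TileKey {
    int lon;  // western edge, -180..179
    int lat;  // southern edge, -90..89
};

// Returns the tile that contains the point (lon, lat).
TileKey tile_for(double lon, double lat) {
    assert(lon >= -180.0 && lon <= 180.0);
    assert(lat >= -90.0 && lat <= 90.0);
    int x = static_cast<int>(std::floor(lon));
    int y = static_cast<int>(std::floor(lat));
    // lon == 180 and lat == 90 would start a tile outside the grid,
    // so clamp them into the last column/row.
    if (x > 179) x = 179;
    if (y > 89) y = 89;
    return TileKey{x, y};
}

// Builds the "i_j" part of a file name, e.g. "13_52".
std::string tile_name(const TileKey& t) {
    return std::to_string(t.lon) + "_" + std::to_string(t.lat);
}
```

`std::floor` rather than a plain cast matters for negative coordinates: -0.5 belongs to tile -1, not tile 0.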
> >
> >Jochen
> Maybe I'm wrong, but it's because I don't want to parse the full
> planet.osm.pbf every time I want to extract a small set of data.
> The processing time seems to grow exponentially with the size of the source file,

The time of what processing exactly? I don't see anything in what you are doing
that should scale worse than linearly. Of course, if you don't have enough
memory, you'll run into problems.

> so having an intermediate level with 1° sized pbf containing everything
> seems very practical to me.

In theory, yes, but, as you noticed, you'll have to give special handling to all
objects that straddle tile boundaries.
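One conservative way to find such objects is to compare the bounding box of the finished geometry against the 1° grid. The sketch below lists every tile the box touches; more than one tile means the object needs special handling. `BBox`, `tiles_for` and `straddles` are hypothetical names, not a libosmium API:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical types: a 1-degree tile keyed by its south-west corner, and the
// bounding box of an already assembled geometry.
struct TileKey { int lon; int lat; };
struct BBox { double min_lon, min_lat, max_lon, max_lat; };

// floor(v), clamped into [lo, hi] so edge coordinates stay inside the grid.
int clamp_floor(double v, int lo, int hi) {
    int x = static_cast<int>(std::floor(v));
    return x < lo ? lo : (x > hi ? hi : x);
}

// Every 1-degree tile that the bounding box touches.
std::vector<TileKey> tiles_for(const BBox& b) {
    // Boxes crossing the antimeridian (min_lon > max_lon) are not handled.
    assert(b.min_lon <= b.max_lon && b.min_lat <= b.max_lat);
    const int x0 = clamp_floor(b.min_lon, -180, 179);
    const int x1 = clamp_floor(b.max_lon, -180, 179);
    const int y0 = clamp_floor(b.min_lat, -90, 89);
    const int y1 = clamp_floor(b.max_lat, -90, 89);
    std::vector<TileKey> out;
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            out.push_back(TileKey{x, y});
        }
    }
    return out;
}

// True if the object may cross a tile boundary and needs special handling.
bool straddles(const BBox& b) {
    return tiles_for(b).size() > 1;
}
```

This over-approximates. A diagonal line whose box spans four tiles may actually cross only three, and a box edge lying exactly on a whole degree also pulls in the neighbouring tile. An exact answer would require clipping the geometry itself.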

> Also, my osmium program loops over the target tiles and parses the appropriate
> pbf:
> 
> for each j in [-90, 89]
> {
>         for each i in [-180, 179]
>         {
>                 create osmium::handler
>                 parse i_j.pbf with osmium::io::Reader
>                 extract data to single handler with osmium::apply
>         }
> }
> 
> Do you think it would be more efficient to have a single big PBF and extract
> data to several handlers?

It will probably be most efficient to just do everything in one go, and only at
the moment when you are writing the finished feature into the shapefile decide
which shapefile it belongs in. You'll only have one handler, but 180*360 output
shapefiles.
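Here is a rough sketch of the single-handler approach. It assumes each finished feature has already been serialized to a string and that a callback does the actual writing, standing in for whatever appends to a shapefile. `TileRouter` and its callback are hypothetical. Features are buffered per tile. Once the total buffered size exceeds a byte budget, the largest tile buffer is handed to the callback, so the program never needs all 64800 files open at once. Compiling it requires C++17 because it uses structured bindings:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tile key: (western edge, southern edge) of a 1-degree tile.
using TileKey = std::pair<int, int>;

// Collects finished features per tile and hands a whole tile's batch to a
// flush callback whenever the total buffered size exceeds a byte budget.
class TileRouter {
public:
    using FlushFn =
        std::function<void(const TileKey&, const std::vector<std::string>&)>;

    TileRouter(std::size_t budget_bytes, FlushFn flush)
        : budget_(budget_bytes), flush_(std::move(flush)) {}

    // The output tile is decided here, when the feature is finished.
    void add(const TileKey& tile, std::string feature) {
        Bucket& bucket = buckets_[tile];
        bucket.bytes += feature.size();
        total_ += feature.size();
        bucket.features.push_back(std::move(feature));
        while (total_ > budget_) {
            flush_largest();
        }
    }

    // Writes out whatever is left, e.g. at the end of the run.
    void flush_all() {
        for (const auto& [tile, bucket] : buckets_) {
            flush_(tile, bucket.features);
        }
        buckets_.clear();
        total_ = 0;
    }

private:
    struct Bucket {
        std::vector<std::string> features;
        std::size_t bytes = 0;
    };

    // Flushing the biggest buffer frees the most memory per file write.
    void flush_largest() {
        auto largest = buckets_.begin();
        for (auto it = buckets_.begin(); it != buckets_.end(); ++it) {
            if (it->second.bytes > largest->second.bytes) largest = it;
        }
        assert(largest != buckets_.end());
        flush_(largest->first, largest->second.features);
        total_ -= largest->second.bytes;
        buckets_.erase(largest);
    }

    std::size_t budget_;
    FlushFn flush_;
    std::size_t total_ = 0;
    std::map<TileKey, Bucket> buckets_;
};
```

The same tile can be flushed several times during a run. Whatever writer the callback uses therefore has to support appending to a file it has already written to.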

> Is it even possible without filling the RAM?

It depends on how much RAM you have. You'll need 32GB of RAM for the node
location store. You'll also need RAM to buffer the output, because you can't
write to 180*360 files at the same time efficiently. Maybe fewer files would be
better? (Also, you won't have just one shapefile for each tile, but probably
dozens for all the different layers of data, which makes this problem worse.)
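The numbers above can be checked with a little arithmetic. This sketch assumes a dense node location store that reserves 8 bytes for every node ID up to the highest one. Those 8 bytes are two 32-bit integers, the size of libosmium's `osmium::Location`. Taking the highest node ID to be roughly 4 billion is an assumption, but it reproduces the 32GB figure. The second function counts output files for a given tile size and number of layers. `dense_store_bytes` and `output_files` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a dense location store keeps one 8-byte location
// (two 32-bit coordinates) per possible node ID, indexed by the ID itself.
constexpr std::uint64_t kBytesPerLocation = 8;

// Bytes needed to store locations for node IDs 0..max_node_id.
std::uint64_t dense_store_bytes(std::uint64_t max_node_id) {
    return (max_node_id + 1) * kBytesPerLocation;
}

// Number of output files when the world is cut into square tiles of
// tile_degrees x tile_degrees, with one file per layer per tile.
std::uint64_t output_files(int tile_degrees, int layers) {
    assert(tile_degrees > 0 && 180 % tile_degrees == 0);
    assert(layers > 0);
    const std::uint64_t rows = 180 / tile_degrees;
    const std::uint64_t cols = 360 / tile_degrees;
    return rows * cols * static_cast<std::uint64_t>(layers);
}
```

With 1° tiles and 20 layers this gives 1,296,000 files. Switching to 5° tiles reduces that to 2592 files per layer.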

So if you don't have this kind of memory, you have a problem.

You can also have a look at
https://github.com/joto/osm-history-splitter

which should be more efficient than Osmosis at splitting a planet into smaller
files. But people have reported some issues with this software. It is on my
TODO list to look into them and fix them, but that will take a while.

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.jochentopf.com/  +49-351-31778688

_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/osmosis-dev
