Re: [OSM-dev] Indexing of PBF files

William Temperley Tue, 16 Oct 2018 14:14:55 -0700

No, Osmium can't do what I described. The reader thread / worker thread
model you describe does not read the data in parallel on multiple machines,
which is what I have been doing, albeit with a preprocessing step to
separate the blocks as they are not currently directly addressable, or even
separable, without a sequential read.


A delimiter would however solve this problem.
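
For illustration, here is a minimal sketch of the sequential pass under
discussion. It assumes the standard PBF framing: a 4-byte big-endian
BlobHeader length, the serialized BlobHeader, then a Blob whose size is given
by the header's datasize field (field 3). The function names are made up for
this example and are not taken from Osmium or any other tool mentioned in
this thread. It walks the file once and records where each blob starts:

```python
import struct


def read_varint(buf, pos):
    """Decode one protobuf varint from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def blob_datasize(header):
    """Extract datasize (field 3, varint) from a serialized BlobHeader."""
    pos = 0
    while pos < len(header):
        key, pos = read_varint(header, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint field
            value, pos = read_varint(header, pos)
            if field == 3:
                return value
        elif wire_type == 2:  # length-delimited field (type, indexdata)
            length, pos = read_varint(header, pos)
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
    raise ValueError("BlobHeader has no datasize field")


def index_blobs(stream):
    """Walk a seekable binary stream once; return [(offset, total_length)]."""
    index = []
    offset = 0
    while True:
        prefix = stream.read(4)
        if not prefix:
            return index
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (header_len,) = struct.unpack(">I", prefix)
        header = stream.read(header_len)
        if len(header) < header_len:
            raise ValueError("truncated BlobHeader")
        datasize = blob_datasize(header)
        # Skip the Blob payload without decoding it. This sketch does not
        # verify that the payload is fully present in the stream.
        stream.seek(datasize, 1)
        total = 4 + header_len + datasize
        index.append((offset, total))
        offset += total
```

The resulting list of offsets is what would let separate machines each seek
straight to their own blobs. The point of the thread is that producing it
still needs one pass over every header, because each blob's start is only
known once the previous blob's length has been read.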

On Tue, 16 Oct 2018 at 22:43, Jochen Topf <[email protected]> wrote:

> On Tue, Oct 16, 2018 at 10:18:08PM +0200, William Temperley wrote:
> > Requiring the sequential read makes using the pbf format difficult in
> > data parallel processing.
> >
> > When files are split into equal sized chunks to be processed in parallel,
> > it is necessary to be able to seek to the beginning of the next block
> > (blob) to begin processing there.
> >
> > This is not currently possible with the pbf format, as the file _must_ be
> > read sequentially to figure out where the blob ends / new one begins.
> > With an index, or even just a simple delimiter it would be possible to
> > figure this out in a parallel processing scenario.
>
> Osmium can do this just fine. It has one thread reading the data
> sequentially, figuring out where the blocks start and end and parceling
> out the block decoding work to other threads. Not as simple and probably
> not quite as fast as with an index pointing to those blocks, but it does
> work.
>
> Indexes have the drawback that you can't streaming-write the data any
> more, you have to go back to write the index. Or you write them at the
> end, then you can't streaming read any more (at least when you want to
> use the index).
>
> Jochen
> --
> Jochen Topf  [email protected]  https://www.jochentopf.com/
> +49-351-31778688
>

_______________________________________________
dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/dev
