No, Osmium can't do what I described. The reader thread / worker thread model you describe does not read the data in parallel on multiple machines, which is what I have been doing, albeit with a preprocessing step to separate the blocks, since they are not currently directly addressable, or even separable, without a sequential read.
A delimiter would however solve this problem.

On Tue, 16 Oct 2018 at 22:43, Jochen Topf <[email protected]> wrote:

> On Tue, Oct 16, 2018 at 10:18:08PM +0200, William Temperley wrote:
> > Requiring the sequential read makes using the pbf format difficult in
> > data parallel processing.
> >
> > When files are split into equal sized chunks to be processed in parallel,
> > it is necessary to be able to seek to the beginning of the next block
> > (blob) to begin processing there.
> >
> > This is not currently possible with the pbf format, as the file _must_ be
> > read sequentially to figure out where the blob ends / new one begins. With
> > an index, or even just a simple delimiter it would be possible to figure
> > this out in a parallel processing scenario.
>
> Osmium can do this just fine. It has one thread reading the data
> sequentially, figuring out where the blocks start and end and parceling
> out the block decoding work to other threads. Not as simple and probably
> not quite as fast as with an index pointing to those blocks, but it does
> work.
>
> Indexes have the drawback that you can't streaming-write the data any
> more, you have to go back to write the index. Or you write them at the
> end, then you can't streaming read any more (at least when you want to
> use the index).
>
> Jochen
> --
> Jochen Topf [email protected] https://www.jochentopf.com/
> +49-351-31778688
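
To make the sequential pass discussed in this thread concrete, here is a minimal sketch of what such a preprocessing step could look like. It is not Osmium's code and not necessarily the preprocessing used in the setup described above. It assumes the documented pbf framing: a 4-byte big-endian length, then a BlobHeader protobuf message (field 1 = type, field 3 = datasize), then datasize bytes of Blob. The function names (read_varint, parse_blob_header, blob_offsets) are made up for illustration.

```python
import io
import struct


def read_varint(buf, pos):
    """Decode one protobuf varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(buf):
    """Extract (type, datasize) from a serialized BlobHeader message.

    Assumes field 1 is the type string and field 3 is datasize;
    any other varint or length-delimited field is skipped.
    """
    pos = 0
    blob_type = None
    datasize = None
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
            if field == 3:
                datasize = value
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            if field == 1:
                blob_type = bytes(buf[pos:pos + length]).decode("ascii")
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
    if datasize is None:
        raise ValueError("BlobHeader has no datasize field")
    return blob_type, datasize


def blob_offsets(f):
    """Yield (offset, type, total_length) for each blob in a seekable file.

    Only the length prefix and the small header are read; the blob
    payload itself is skipped with a seek.
    """
    while True:
        offset = f.tell()
        prefix = f.read(4)
        if not prefix:
            return
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (header_len,) = struct.unpack(">I", prefix)
        blob_type, datasize = parse_blob_header(f.read(header_len))
        yield offset, blob_type, 4 + header_len + datasize
        f.seek(datasize, io.SEEK_CUR)
```

The resulting (offset, length) pairs could then be handed to workers on different machines, each seeking directly to its own blobs. The pass is still sequential, though: every header has to be visited in order, which is the limitation a delimiter or index would remove.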
_______________________________________________
dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/dev

