On Tue, Apr 28, 2015 at 06:56:23 -0600, Martijn van Exel wrote:
> Not sure if this has been discussed recently, but we've been thinking
> about improving osmosis PBF reading performance over at Telenav. My
> colleague Jon (cc) has come up with a suggestion that I want to put
> forward for discussion. I'm posting this to both osmosis-dev as well
> as dev because it affects the PBF format definition.
>
> When reading a large PBF resource from a random access file (as
> opposed to a stream), it might be possible to significantly increase
> throughput by reading data of the same entity type from multiple
> threads simultaneously, making use of an optional directory structure
> to locate valid blocks of nodes, ways and relations for threads to
> consume.
>
> To support parallel access, an optional directory_offset might be
> added to the HeaderBlock:
>
>     message HeaderBlock {
>       …
>       optional int64 directory_offset;
>     }
>
> The directory_offset field would be the seek location in the file of a
> Directory message which is written at the end of the file (since the
> directory is flexible in length and all offsets are only known after
> writing all data to the PBF file). The directory itself is simply a
> list of valid read offsets for each entity type. Threads can read data
> from a given offset in the list to the next offset. The best chunk
> size for blocks in the directory can be determined through
> experimentation, although something around 1MB might be a good first
> guess.
>
>     message Directory {
>       repeated int64 node_block_offsets;
>       repeated int64 way_block_offsets;
>       repeated int64 relation_block_offsets;
>     }
>
> Before we explore this further, I'd like to know if this has been
> attempted before, and what concerns there may be.
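A minimal sketch of how the proposed directory could be written and consumed, in Python. Everything concrete here is an assumption for illustration: the toy layout (a fixed 8-byte big-endian directory_offset at the start, raw block payloads, and the directory serialized as JSON instead of a protobuf Directory message), the hypothetical helper names write_file, block_ranges and read_parallel, and the rule that a block ends where the next block of any type begins (or at the directory, for the last block). It only illustrates handing each thread a (start, end) byte range taken from the directory; it is not the real PBF encoding.

```python
import io
import json
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy layout, NOT the real PBF encoding:
#   [8-byte big-endian directory_offset][block payloads ...][directory as JSON]
# The directory mirrors the proposed Directory message: start offsets of
# node, way and relation blocks.

def write_file(blocks):
    """Write (entity_type, payload) pairs in order, then the directory."""
    buf = io.BytesIO()
    buf.write(struct.pack(">q", 0))  # placeholder; offsets are unknown yet
    directory = {"node": [], "way": [], "relation": []}
    for entity_type, payload in blocks:
        directory[entity_type].append(buf.tell())
        buf.write(payload)
    directory_offset = buf.tell()  # directory goes at the end of the file
    buf.write(json.dumps(directory).encode())
    buf.seek(0)
    buf.write(struct.pack(">q", directory_offset))  # back-patch the header
    return buf.getvalue()

def block_ranges(data, entity_type):
    """Return (start, end) byte ranges of all blocks of one entity type."""
    (directory_offset,) = struct.unpack_from(">q", data, 0)
    directory = json.loads(data[directory_offset:])
    # Assumption: a block ends where the next block of any type starts,
    # or at the directory for the last block in the file.
    starts = sorted(o for offsets in directory.values() for o in offsets)
    ends = dict(zip(starts, starts[1:] + [directory_offset]))
    return [(start, ends[start]) for start in directory[entity_type]]

def read_parallel(data, entity_type, parse, workers=4):
    """Parse every block of one entity type on a pool of threads."""
    ranges = block_ranges(data, entity_type)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in directory order
        return list(pool.map(lambda r: parse(data[r[0]:r[1]]), ranges))
```

With an in-memory buffer, read_parallel(data, "node", bytes.decode) returns the node payloads in directory order. In a real reader each thread would open its own file handle and seek to its start offset. In CPython the GIL keeps pure-Python parsing from running in parallel, so this shows the dispatch structure rather than a measured speedup.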
PBF files already come in blocks with a length header in front of every
block. Osmium reads this length header in one thread and then puts the
data of each block into a work queue to be parsed by as many threads as
you want. This way you already get a nice speedup without any changes to
the file format.

Jochen

-- 
Jochen Topf  joc...@remote.org  http://www.jochentopf.com/  +49-351-31778688

_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/osmosis-dev
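The block-queue approach from the reply can be sketched as follows, in Python and not as Osmium's actual C++ implementation. It relies on the PBF framing as described in the OSM PBF format documentation: each block is a 4-byte big-endian length, a BlobHeader message of that length (field 1 type, field 3 datasize), then datasize bytes of Blob. Only enough protobuf decoding is done to find block boundaries; decompressing and decoding each Blob is left to a caller-supplied parse function. The names read_blocks, parallel_parse and parse are hypothetical.

```python
import io
import queue
import struct
import threading

def read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def parse_blob_header(buf):
    """Extract type (field 1) and datasize (field 3) from a BlobHeader."""
    pos, blob_type, datasize = 0, None, None
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
            if field == 3:
                datasize = value
        elif wire_type == 2:  # length-delimited; indexdata (field 2) is skipped
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
            if field == 1:
                blob_type = value.decode()
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
    return blob_type, datasize

def read_blocks(stream):
    """Yield (type, raw Blob bytes) using only the length framing."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # end of file
        (header_len,) = struct.unpack(">I", prefix)
        blob_type, datasize = parse_blob_header(stream.read(header_len))
        yield blob_type, stream.read(datasize)

def parallel_parse(stream, parse, workers=4):
    """The calling thread frames blocks; worker threads run parse() on them."""
    work = queue.Queue(maxsize=workers * 2)  # bounded: reader can't run far ahead
    results, lock = [], threading.Lock()

    def worker():
        while (item := work.get()) is not None:
            index, blob_type, blob = item
            out = parse(blob_type, blob)
            with lock:
                results.append((index, out))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for index, (blob_type, blob) in enumerate(read_blocks(stream)):
        work.put((index, blob_type, blob))
    for _ in threads:
        work.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return [out for _, out in sorted(results, key=lambda r: r[0])]  # file order
```

For example, parallel_parse(open(path, "rb"), my_parse), with a hypothetical my_parse(blob_type, blob) that decodes one Blob. As in the previous sketch, pure-Python decoding would not run in parallel under CPython's GIL; the point is that no file-format change is needed to find block boundaries.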