Hi all,

I'm not sure whether this has been discussed recently, but at Telenav we've been thinking about improving Osmosis PBF reading performance. My colleague Jon (cc'd) has come up with a suggestion that I'd like to put forward for discussion. I'm posting this to both osmosis-dev and dev because it affects the PBF format definition.

When reading a large PBF resource from a random access file (as opposed to a stream), it might be possible to significantly increase throughput by reading data of the same entity type from multiple threads simultaneously. An optional directory structure would let threads locate valid blocks of nodes, ways and relations to consume.

To support parallel access, an optional directory_offset might be added to the HeaderBlock:

    message HeaderBlock {
      …
      optional int64 directory_offset;
    }

The directory_offset field would be the seek location in the file of a Directory message, which is written at the end of the file (since the directory is flexible in length and all offsets are only known after all data has been written to the PBF file).

The directory itself is simply a list of valid read offsets for each entity type. Threads can read data from a given offset in the list up to the next offset. The best chunk size for blocks in the directory can be determined through experimentation, although something around 1 MB might be a good first guess.

    message Directory {
      repeated int64 node_block_offsets;
      repeated int64 way_block_offsets;
      repeated int64 relation_block_offsets;
    }

Before we explore this further, I'd like to know if this has been attempted before, and what concerns there may be.

Best,
Martijn
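
P.S. To make the read side a bit more concrete, here is a rough Python sketch of how a reader might turn one of the directory's offset lists into byte ranges and fetch them from several threads. This is only an illustration, not Osmosis code: the function names (block_ranges, read_range, read_blocks_parallel, demo) are made up, the chunks are returned as raw bytes without any PBF decoding, and it assumes that all blocks of one entity type are contiguous, so the last range of a type ends at a known offset (the first offset of the next type, or directory_offset).

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_ranges(offsets, end):
    """Pair each offset with the next one; the last range stops at `end`.

    `end` is an assumption of this sketch: the offset where this entity
    type's data stops (next type's first offset, or directory_offset).
    """
    bounds = sorted(offsets) + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def read_range(path, start, stop):
    # Each worker opens its own file handle so seeks do not interfere.
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(stop - start)


def read_blocks_parallel(path, offsets, end, workers=4):
    """Read every [offset, next offset) range of one entity type in parallel."""
    ranges = block_ranges(offsets, end)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input ranges.
        return list(pool.map(lambda r: read_range(path, *r), ranges))


def demo():
    # Stand-in for a PBF file: 8 bytes of "nodes", 6 of "ways", 4 of "relations".
    payload = b"NNNNNNNNWWWWWWRRRR"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        # Hypothetical directory contents for this toy file.
        node_chunks = read_blocks_parallel(path, [0, 4], end=8)
        way_chunks = read_blocks_parallel(path, [8, 11], end=14)
    finally:
        os.remove(path)
    return node_chunks, way_chunks
```

In a real reader each chunk would then be handed to the existing blob decoder. Whether threads actually speed things up would depend on how much of the time goes to decompression and parsing rather than I/O, which is part of what we'd want to measure.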