I think that for exporting the data of a given area, cell-based spatial splitting will outperform any database solution that handles individual geometries, by an order of magnitude. But I think you are right that PBF is too simple, and right about the problems with random access and updatability.
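To make the cell-based extract idea concrete, here is a minimal Python sketch in the three-pass, constant-memory style Andrew describes in the quoted message. It is only an illustration under stated assumptions: a hypothetical fixed latitude/longitude grid with 0.25-degree cells (CELL_DEG), each entity stored in exactly one cell, and illustrative names (cell_of, cells_in_bbox, Cell, export_area) that are not part of vex or any OSM tool.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical grid resolution in degrees (an assumption, not a vex constant).
CELL_DEG = 0.25

CellKey = Tuple[int, int]


@dataclass
class Cell:
    """Entities stored in one grid cell (assumed: each entity lives in exactly one cell)."""
    nodes: List[int] = field(default_factory=list)
    ways: List[int] = field(default_factory=list)
    relations: List[int] = field(default_factory=list)


def cell_of(lat: float, lon: float) -> CellKey:
    """Return the (x, y) index of the grid cell containing a coordinate."""
    return (math.floor((lon + 180.0) / CELL_DEG), math.floor((lat + 90.0) / CELL_DEG))


def cells_in_bbox(min_lat: float, min_lon: float,
                  max_lat: float, max_lon: float) -> Iterator[CellKey]:
    """Yield every cell overlapping the bounding box, row by row."""
    x0, y0 = cell_of(min_lat, min_lon)
    x1, y1 = cell_of(max_lat, max_lon)
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            yield (x, y)


def export_area(index: Dict[CellKey, Cell],
                bbox: Tuple[float, float, float, float]) -> Iterator[Tuple[str, int]]:
    """Stream an extract in node/way/relation order using three passes over the cells.

    The covering cells are regenerated on each pass rather than collected, so
    the exporter keeps no per-entity state while streaming.
    """
    for kind in ("nodes", "ways", "relations"):
        for key in cells_in_bbox(*bbox):
            cell = index.get(key)
            if cell is None:
                continue
            for entity_id in getattr(cell, kind):
                yield kind, entity_id


if __name__ == "__main__":
    index = {
        cell_of(10.1, 20.1): Cell(nodes=[1, 2], ways=[10]),
        cell_of(10.6, 20.6): Cell(nodes=[3], relations=[100]),
        cell_of(50.0, 50.0): Cell(nodes=[99]),  # outside the box below
    }
    for kind, entity_id in export_area(index, (10.0, 20.0, 10.7, 20.7)):
        print(kind, entity_id)
```

A real extract would also have to pull in nodes referenced by ways that cross the box edge and deduplicate entities stored in more than one cell; both are left out of this sketch.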
I still think that clever block handling would work well for both extracts and updates. What do you think about changing vex files to use offset pointers for their data, with a fill factor of, say, 10 to 20 percent? The initial cell would then be about 20% larger on disk, and applying a change set would simply mean unlinking the old offset and appending the new data at the current end offset. When a cell grows too large, a new file would be generated, again with 20% additional space reserved for change sets.

Is any data missing from the vex file format, or is everything from OSM PBF included?

Ben

Sent from my iPad

> On 06.01.2016, at 15:14, Andrew Byrd <and...@fastmail.net> wrote:
>
>> On 06 Jan 2016, at 14:10, Stadin, Benjamin
>> <benjamin.sta...@heidelberg-mobil.com> wrote:
>>
>> And about the cell data: I'm considering just reusing the OSM PBF format,
>> without preserving the sort and size attributes. When exporting the data from
>> individual grid cells, all data items will be streamed to the output ordered
>> by type and ID. A simple in-memory AVL tree should be sufficient (storing ID
>> keys and pointers to items as node data, iterating from lowest to highest ID
>> on output).
>
> We wanted to preserve conventional entity ordering (node, way, relation), but
> maintaining increasing ID order was not important for us. I preferred a
> constant-memory export process (i.e. memory consumption does not grow with
> the geographic size of the extract) that simply iterates over index cells in
> order three times, dumping first nodes, then ways, then relations.
>
> If I understand you correctly, you’d use the PBF format as your internal
> storage format, making one PBF file per spatial index cell (essentially
> splitting planet.pbf into one PBF file per tile). I can see the appeal of
> simplicity here, and I considered this approach myself, but I think PBF would
> be problematic if you intend to perform random access within those tiles to
> apply minutely updates.
> PBF is a data interchange format, to my knowledge
> designed and used primarily for moving or streaming database dumps or
> extracts from one site to another. You’ll end up doing a lot of
> decompress-filter-modify-rewrite operations on entire tiles. It could work,
> but it seems awkward and resource-intensive. I can also imagine running into
> some problems with a 1-to-N geographic PBF splitter. Due to PBF's block-based
> nature, you might have to keep a prohibitively large number of files open
> simultaneously during your planet-to-tile splitter step. If the planet.pbf
> must pass through some intermediate representation to allow splitting
> (essentially a spatially indexed database of some kind), why not keep it in
> that intermediate representation and perform the spatial splitting on demand?
>
> -Andrew

_______________________________________________
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev
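The offset-pointer and fill-factor proposal at the top of this reply can be sketched in Python roughly as follows. This is a hypothetical illustration, not the vex format: SlackCellFile, slack_percent, and the in-memory bytearray standing in for one cell file are all assumptions. Each record is reached through an (offset, length) pointer; an update unlinks the old pointer and appends the new bytes into the reserved slack, and once the slack is used up the cell is regenerated compactly with 20% fresh space.

```python
from typing import Dict, Tuple


class SlackCellFile:
    """Hypothetical cell file: an offset table plus a data region with reserved slack.

    A bytearray stands in for the on-disk file. Records are addressed by ID
    through (offset, length) pointers, so an update never rewrites other records.
    """

    def __init__(self, records: Dict[int, bytes], slack_percent: int = 20) -> None:
        self.slack_percent = slack_percent
        self.rewrites = 0  # how many times the file was regenerated
        self._build(records)

    def _build(self, records: Dict[int, bytes]) -> None:
        """Write all live records contiguously and reserve slack_percent extra space."""
        used = sum(len(payload) for payload in records.values())
        self.data = bytearray(used + used * self.slack_percent // 100)
        self.offsets: Dict[int, Tuple[int, int]] = {}
        self.end = 0   # first free byte in the data region
        self.dead = 0  # bytes held by unlinked (superseded or deleted) versions
        for record_id, payload in records.items():
            self._append(record_id, payload)

    def _append(self, record_id: int, payload: bytes) -> None:
        """Copy payload to the current end offset and point the record at it."""
        start = self.end
        self.data[start:start + len(payload)] = payload
        self.offsets[record_id] = (start, len(payload))
        self.end += len(payload)

    def get(self, record_id: int) -> bytes:
        offset, length = self.offsets[record_id]
        return bytes(self.data[offset:offset + length])

    def delete(self, record_id: int) -> None:
        """Unlink a record; its bytes become dead space until the next rewrite."""
        _, length = self.offsets.pop(record_id)
        self.dead += length

    def put(self, record_id: int, payload: bytes) -> None:
        """Apply one change: unlink the old version and append the new one."""
        if self.end + len(payload) > len(self.data):
            # Slack exhausted: regenerate the file compactly with fresh slack.
            live = {rid: self.get(rid) for rid in self.offsets}
            live[record_id] = payload
            self._build(live)
            self.rewrites += 1
            return
        if record_id in self.offsets:
            self.dead += self.offsets[record_id][1]
        self._append(record_id, payload)
```

In this sketch reads stay cheap (one pointer lookup), writes are append-only until the slack runs out, and dead space is only reclaimed when a cell is regenerated. A real implementation would also need to persist the offset table and choose when accumulated dead space justifies a rewrite; both are omitted here.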