For exporting the data of a given area, I think cell-based spatial splitting 
will outperform any database solution that handles geometries individually by 
an order of magnitude. But I think you are right that PBF is too simple, given 
its random access and updateability issues. 
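
To illustrate why a cell-based extract is cheap, here is a minimal sketch of 
the lookup step. It assumes a fixed grid of square cells in degrees, which is 
only an assumption for illustration; the function name and cell size are 
hypothetical and not taken from vex. 

```python
import math

CELL_DEG = 1.0  # hypothetical cell size in degrees, not a vex constant


def cells_for_bbox(min_lon, min_lat, max_lon, max_lat, cell_deg=CELL_DEG):
    """Return the (x, y) indices of every grid cell intersecting the bbox."""
    x0 = math.floor((min_lon + 180.0) / cell_deg)
    x1 = math.floor((max_lon + 180.0) / cell_deg)
    y0 = math.floor((min_lat + 90.0) / cell_deg)
    y1 = math.floor((max_lat + 90.0) / cell_deg)
    # Row by row, so cells come out in a stable order.
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
```

An extract then only has to read the cells in that list and never touches 
individual geometries outside them. 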

I still think that clever block handling would work well for both extracts 
and updates. 
What do you think about changing vex files to use offset pointers for their 
data, with a fill factor of, say, 10-20 percent? 
The initial cell would then be up to 20% larger on disk. Applying change set 
data would simply mean unlinking the old offset and appending the new data at 
the current end offset. When a cell grows too large, a new file would be 
generated, again with 20% additional space for change sets. 
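
Roughly what I have in mind, as an in-memory sketch only. The class and 
method names are made up for illustration, the byte buffer stands in for the 
cell file, and none of this reflects the actual vex layout. 

```python
class CellBlock:
    """One spatial cell stored as a byte region with reserved slack space."""

    def __init__(self, records, fill_factor=0.2):
        # records: dict mapping entity id -> encoded bytes (hypothetical).
        payload = sum(len(blob) for blob in records.values())
        self.capacity = payload + int(payload * fill_factor)
        self.data = bytearray()
        self.offsets = {}  # entity id -> (offset, length) of the live version
        for entity_id, blob in records.items():
            self._append(entity_id, blob)

    def _append(self, entity_id, blob):
        # Relinking the id to the new offset unlinks the old version.
        self.offsets[entity_id] = (len(self.data), len(blob))
        self.data += blob

    def free(self):
        return self.capacity - len(self.data)

    def update(self, entity_id, blob):
        """Apply one change; return False if the block must be rewritten."""
        if len(blob) > self.free():
            return False
        self._append(entity_id, blob)  # old bytes remain as dead space
        return True

    def delete(self, entity_id):
        self.offsets.pop(entity_id, None)

    def get(self, entity_id):
        offset, length = self.offsets[entity_id]
        return bytes(self.data[offset:offset + length])

    def rewrite(self, fill_factor=0.2):
        """Copy only live records into a fresh block with new slack space."""
        live = {entity_id: self.get(entity_id) for entity_id in self.offsets}
        return CellBlock(live, fill_factor)
```

When update() returns False, the caller would call rewrite() and retry on the 
new block. Dead space only costs disk until that rewrite happens. 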
Is any data missing from the vex file format, or does it include everything 
from OSM PBF? 

Ben

Sent from my iPad

> On 06.01.2016 at 15:14, Andrew Byrd <and...@fastmail.net> wrote:
> 
> 
>> On 06 Jan 2016, at 14:10, Stadin, Benjamin 
>> <benjamin.sta...@heidelberg-mobil.com> wrote:
>> 
>> And about the cell data: I'm considering just reusing the OSM PBF format, 
>> without preserving sort and size attributes. When exporting the data from 
>> individual grid cells, all data items will be streamed to the output ordered 
>> by type and ID. A simple in-memory AVL tree should be sufficient (storing ID 
>> keys and pointers to items as node data, iterating from lowest to highest ID 
>> on output).
> 
> We wanted to preserve conventional entity ordering (node, way, relation) but 
> maintaining increasing ID number was not important for us; I preferred a 
> constant-memory export process (i.e. memory consumption does not grow with 
> the geographic size of the extract) that simply iterates over index cells in 
> order three times, dumping first nodes, then ways, then relations.
> 
> If I understand you correctly you’d use the PBF format as your internal 
> storage format, making one PBF file per spatial index cell (essentially 
> splitting planet.pbf into one PBF file per tile). I can see the appeal of 
> simplicity here, and I considered this approach myself, but I think PBF would 
> be problematic if you intend to perform random access within those tiles to 
> apply minutely updates. PBF is a data interchange format, to my knowledge 
> designed and used primarily for moving or streaming database dumps or 
> extracts from one site to another. You’ll end up doing a lot of 
> decompress-filter-modify-rewrite operations on entire tiles. It could work, 
> but it seems awkward and resource intensive. I can also imagine running into 
> some problems with a 1 to N geographic PBF splitter. Due to PBF's block-based 
> nature you might have to keep a prohibitively large number of files open 
> simultaneously during your planet-to-tile splitter step. If the planet.pbf 
> must pass through some intermediate representation to allow splitting 
> (essentially a spatially indexed database of some kind), why not keep it in 
> that intermediate representation and perform the spatial splitting on demand?
> 
> -Andrew
> 

_______________________________________________
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev
