Re: [OSM-dev] public transport stop_area relations: have you implemented for routing or another purpose?

2019-08-01 Thread Andrew Byrd
Hello,

I work on public transport journey planning software. We use OSM data heavily, 
but primarily for the parts of journeys outside public transportation: the 
first- or last-mile or transfer segments. In my experience we rarely use OSM 
data directly for the transit part of routing. We use data loaded from GTFS or 
Netex feeds, which are provided by the public transportation operators or 
regional authorities. Public transport entities or tags from OSM are generally 
used as a sanity check or backup source of information, for example to 
determine from which road it is easiest to reach a stop, when that stop is 
physically located halfway between two candidate roads.

These data sources do have a concept of stop hierarchies. GTFS groups stops 
into stations, and Netex allows more complex hierarchies. If data of this kind 
are imported into OSM, or if the data in OSM are expected to resemble the 
conceptual model used in exchanging public transport data, then I'd expect 
these groupings to be present. I can confirm that the groupings are meaningful 
and useful in routing applications, both when finding paths and when presenting 
those paths to the end user.

We mostly work in places where operators provide detailed data about their 
services. In places where there is no such official data, or where mappers have 
created a better data set than the official one, someone might want to route on 
the user-generated data in OpenStreetMap. The station groupings would be useful 
in that case. Even for purely visual (non-routing) map display, I can imagine 
the station groupings would assist in layout and labeling.

Regards,
Andrew



> On 1 Aug 2019, at 15:11, Joseph Eisenberg  wrote:
> 
> I'm trying to find out if the type=public_transport +
> public_transport=stop_area relation or *=stop_area_group relation is
> used by any developer or database user.
> 
> These relations are supposed to group together features like all the
> platforms in a bus station or train station. However, it seems like
> these relations may not be necessary or useful for routing
> applications.
> 
> Has anyone looked into them or know of any current use cases?
> 
> Joseph
> 
> ___
> dev mailing list
> dev@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/dev




Re: [OSM-dev] pyosmium: osmium extract

2018-11-10 Thread Andrew Byrd


> On 11 Nov 2018, at 08:52, koji higuchi  wrote:
> Thanks for your information.
> Problem is that osmium is not available for windows, and it seems too 
> difficult to install in linux

The osmium command line tool is available as a pre-compiled package on many 
popular Linux distributions, so it should be quite easy to install.

On Debian and derivatives (e.g. Ubuntu) you can install it with: ‘sudo apt-get 
install osmium-tool’. Similarly on Fedora there is an RPM package called 
osmium-tool. 

On MacOS you can install it with 'brew install osmium-tool'.

It looks like there are no easily available builds of osmium-tool for Windows. 
This is fairly common for open source software. Many FOSS developers have 
little or no experience with Windows or actively avoid it. If you’re going to 
be building data pipelines using a lot of open source tools it’s probably worth 
having a machine (or virtual machine) with a unix-like OS on it.

Here is a support issue where people were setting up a Windows build of osmium. 
It looks like it was eventually abandoned because there was no one to maintain 
it. https://github.com/osmcode/osmium-tool/pull/105 
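For reference, a typical session might look like the following; the region 
name, bounding box, and file names are placeholders:

```shell
# Debian/Ubuntu install (Fedora: sudo dnf install osmium-tool; macOS: brew install osmium-tool)
sudo apt-get install osmium-tool

# Cut a bounding-box extract (minlon,minlat,maxlon,maxlat) from a larger file
osmium extract --bbox 13.0,52.3,13.8,52.8 -o berlin.osm.pbf germany-latest.osm.pbf

# Or cut along a polygon boundary from a GeoJSON file
osmium extract -p boundary.geojson -o extract.osm.pbf germany-latest.osm.pbf
```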


-Andrew


Re: [OSM-dev] Spatial road network data sets and implementation using API

2017-03-28 Thread Andrew Byrd

> On 28 Mar 2017, at 15:16, Debajyoti Ghosh <4u.debajy...@gmail.com> wrote:
> 1] Can you refer a specific site from which I can download data & visualize 
> it in triplet format, with n POIs? In which editor will I be able to see it?

OpenStreetMap data is good for this purpose. Many people provide extracts of 
OpenStreetMap data for specific geographic regions, for example: 
https://mapzen.com/data/metro-extracts/ 


But OSM data is essentially a huge list of points (nodes) and ways formed by 
connecting those nodes in a certain sequence. It needs to be converted to a 
graph if you want to apply graph algorithms, and generally you’ll want to treat 
the nodes that are intersections differently from those that only contribute to 
the shape and length of the edges between the intersections. 
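As a rough sketch of that conversion (toy in-memory way lists standing in for 
parsed OSM data, not a real loader): count how many ways reference each node to 
find the intersections, then split each way into graph edges at those nodes.

```python
from collections import Counter

# Toy ways: each is an ordered list of node IDs, standing in for parsed OSM ways.
ways = [
    [1, 2, 3, 4, 5],   # a street passing through node 3
    [6, 3, 7],         # a cross street sharing node 3
]

# A node is a graph vertex if it is a way endpoint or is shared by several
# ways; all other nodes only shape the geometry of an edge.
use_count = Counter(n for w in ways for n in w)

def is_vertex(way, i):
    return i == 0 or i == len(way) - 1 or use_count[way[i]] > 1

# Split each way at its vertices to produce graph edges.
edges = []
for way in ways:
    start = 0
    for i in range(1, len(way)):
        if is_vertex(way, i):
            edges.append((way[start], way[i], i - start))  # length in hops here;
            start = i                                      # real code sums meters

print(edges)  # [(1, 3, 2), (3, 5, 2), (6, 3, 1), (3, 7, 1)]
```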

I don’t know of any tool that just converts OSM to a routable graph and lists 
the edge tuples, but by using an existing routing system you will have that 
information in in-memory data structures. To cite the system I’m most familiar 
with (OpenTripPlanner): if you load some OSM data, you’ll end up with a Graph 
object:
https://github.com/opentripplanner/OpenTripPlanner/blob/master/src/main/java/org/opentripplanner/routing/graph/Graph.java
 


which then contains a lot of Vertex and Edge objects, for example in the fields 
vertices and edgeById. You could write whatever code you want to dump those out 
as tuples or route on them.

Andrew



Re: [OSM-dev] Spatial road network data sets and implementation using API

2017-03-28 Thread Andrew Byrd

> On 28 Mar 2017, at 14:46, Debajyoti Ghosh <4u.debajy...@gmail.com> wrote:
>  I need spatial data sets in this format <start node, end node, distance> 
> for each POI

Both types of data are available in OSM. If the (start node, end node, 
distance) tuples you want are for individual edges in the graph (rather than 
distances across long sequences of edges), those will not be asymmetric in OSM 
data: the length of a way between two intersections is always identical in 
both directions. However, the vehicle permissions or other characteristics may 
differ between the two directions. If you require asymmetric networks, you 
will need to load the OSM data into a system that will, e.g., allow vehicles 
to move in only one direction on one-way streets.
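A minimal illustration of that asymmetry, using toy data rather than any real 
router's internal representation: the length is the same in both directions, 
but a oneway tag suppresses the reverse edge.

```python
# Toy ways: symmetric length, possibly asymmetric permissions.
ways = [
    {"nodes": ["a", "b"], "length": 100.0, "tags": {}},
    {"nodes": ["b", "c"], "length": 50.0, "tags": {"oneway": "yes"}},
]

directed_edges = []
for w in ways:
    u, v = w["nodes"][0], w["nodes"][-1]
    directed_edges.append((u, v, w["length"]))      # forward edge
    if w["tags"].get("oneway") != "yes":            # simplified: ignores oneway=-1 etc.
        directed_edges.append((v, u, w["length"]))  # reverse edge only if two-way

print(directed_edges)
# [('a', 'b', 100.0), ('b', 'a', 100.0), ('b', 'c', 50.0)]
```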

> I have to implement my algorithm on real road network data to verify my 
> algorithms- i need 2 things - 1] spatial road network data

OSM is great for that.

>   2] API detail where I can implement my algorithm with minimum effort & then 
> run it with 1]   - So, now, my requirement is clear to you?

If you mean API in the sense of programming API (a set of functions and data 
types exposed to the programmer by a library) rather than web API (some URLs 
you can hit to call on remote services) then I’d recommend the open-source 
routing projects I cited in my previous message. You should be able to load 
your OSM data into them, then write your own algorithms to search on their 
existing graph data structures. That would save you re-inventing the wheel and 
writing all the code to load OSM data into a routable graph.

-Andrew



Re: [OSM-dev] Spatial road network data sets and implementation using API

2017-03-27 Thread Andrew Byrd
Hi Debajyoti,

I think you’ll need to provide more detailed information on your goals and the 
problems you are encountering for people to help you. Below I’ll list some of 
the questions that came to my mind in hopes that it will help clarify things, 
as well as some suggestions.

> On 28 Mar 2017, at 13:35, Debajyoti Ghosh <4u.debajy...@gmail.com> wrote:
> Dear all:
> Please suggest how to implement my algorithm on real ASYMMETRIC road network 
> datasets to get results. 

When you say “my algorithm” are you referring to existing published algorithms, 
or new algorithms that you have devised? Can you point to articles detailing 
those algorithms?

When you say the network data set is “real”, do you just mean it’s the road 
network of a real city rather than a synthetic network?

What is “asymmetric” about that network - traffic flow speeds, direction of 
travel? 

> I'm very much confused that I can't interpret data sets as it is given in 
> various uncommon/unknown file format.

Which file formats are you referring to, and where did you get these data in 
the unfamiliar formats? Are you referring to OSM data in XML or PBF formats?

> I believe that I need data sets in the form  pair for n 
> POI on real road network(spatial data)

When you say POI on the road network, do you mean the nodes making up the road 
network itself, or places that happen to be near the road? The latter I suppose.

> In addition we also need directed graph data sets for asymmetric road network 
> where d(a,b) != d(b,a).

I suppose this is what you mean by “asymmetric” above, just that network 
distances between any two nodes are dependent on which is the origin and which 
is the destination node. OSM data does allow different characteristics for the 
opposite lanes/direction of a single way, but you’d need to build a routable 
graph from OSM data that preserves those differences.

> a) data sets collection and interpret/Extracting Spatial Data(say from 
> OpenStreetMap)

“Spatial data” is a very general term - are you specifically talking about a 
routable graph / network, location of points of interest near such a network, 
or both? Do you need to collect data “in the field” or are you just using data 
from OSM?

> b) implement/simulate a prototype library/framework of LBS queries

I am assuming that by LBS you mean “location-based-services” and by “LBS 
queries” you mean, essentially, finding places or objects near a given 
location. There are two main ways to do that: either using straight-line 
distance or network distance. 

For short distances you could do a lot just placing all your points in a 
spatial index and calculating straight line distances for all objects found in 
a spatial index query. This would also be a suitable placeholder technique if 
you’re concentrating on applications or algorithms higher up the stack and just 
need a dummy data source that provides nearby points and distances.
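A sketch of that straight-line approach, using a simple hash-grid rather than a 
real spatial index library; the coordinates and POI names are made up.

```python
import math
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees, roughly 1 km; query radius must stay below this

def cell(lat, lon):
    return (int(lat // CELL), int(lon // CELL))

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters on a spherical Earth.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

pois = {"cafe": (48.8566, 2.3522), "shop": (48.8570, 2.3530), "far": (48.9000, 2.4000)}
index = defaultdict(list)
for name, (lat, lon) in pois.items():
    index[cell(lat, lon)].append(name)

def nearby(lat, lon, radius_m):
    cy, cx = cell(lat, lon)
    hits = []
    for dy in (-1, 0, 1):        # scan the 3x3 block of cells around the query
        for dx in (-1, 0, 1):
            for name in index.get((cy + dy, cx + dx), []):
                plat, plon = pois[name]
                if haversine_m(lat, lon, plat, plon) <= radius_m:
                    hits.append(name)
    return sorted(hits)

print(nearby(48.8566, 2.3522, 200))  # ['cafe', 'shop']
```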

On the other hand, if you want to find nearby objects over longer distances 
(driving or public transport) or in places with fragmented or irregular 
networks (mountains, water, sparse public transit), or if you need accurate 
distances because this is a real service rather than a dummy layer in a 
prototype, you’ll need a network model.

Many pieces of software exist for building routable networks from OSM data. For 
example:
https://github.com/graphhopper/graphhopper 

http://project-osrm.org/ 
https://github.com/valhalla
https://github.com/opentripplanner/OpenTripPlanner 


See also:
http://wiki.openstreetmap.org/wiki/Routing 

http://wiki.openstreetmap.org/wiki/Routing/online_routers 


Regards,
Andrew




Re: [OSM-dev] Simpler binary OSM formats

2016-02-16 Thread Andrew Byrd
Hi Colin, 

> On 08 Feb 2016, at 13:07, Colin Smale  wrote:
> There are discussions going on which may change the underlying data 
> metamodel. I am thinking of support for polygons/areas as primitive types and 
> multi-valued keys. Although the model has been stable since API0.6 it would 
> not be prudent to preclude changes in the future.
> 
Thanks for pointing this out. You are right that we should consider how any 
format can be adapted to such changes in the OSM data model.

Adding a new primitive type is in fact very straightforward, because every vex 
block contains only entities of a single type. We would simply introduce a new 
block type with its own distinctive marker in the block header (probably P or 
A, alongside the N, W, and R currently used for Nodes, Ways, and Relations). 
Existing readers would not recognize this type and would either skip these 
blocks or fail, reporting an unrecognized block type. These area blocks would 
be structured similarly to all the others.
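To make the skip-or-fail behavior concrete, here is a toy reader for a made-up 
block layout (a one-byte type marker plus a four-byte little-endian length; the 
real vex header is richer, so treat this purely as an illustration of forward 
compatibility with unknown block types):

```python
import io, struct

def write_block(buf, marker, body):
    # Hypothetical layout for illustration only: marker byte, length, body.
    buf.write(marker + struct.pack("<I", len(body)) + body)

def read_blocks(buf, known=frozenset(b"NWR")):
    while True:
        marker = buf.read(1)
        if not marker:
            return
        (length,) = struct.unpack("<I", buf.read(4))
        body = buf.read(length)
        if marker[0] in known:
            yield marker, body
        # else: an unknown block type (e.g. b"A" for areas) is silently skipped

buf = io.BytesIO()
write_block(buf, b"N", b"nodes...")
write_block(buf, b"A", b"areas...")   # a future block type
write_block(buf, b"W", b"ways...")
buf.seek(0)
print([m for m, _ in read_blocks(buf)])  # [b'N', b'W']
```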

Multiple tag values are also straightforward to support. Nothing in the current 
vex format specification prevents the same key from appearing more than once, 
or separator characters from appearing in the values, for example. There is no 
practical limit on the length of value strings.

I am confident that the vex format can readily adapt to these anticipated 
changes to the OSM data model. Adapting would create a new, incompatible 
version of vex, but a few simple additions to the specification would make 
readers certain to detect incompatible extensions and fail gracefully.

Andrew




Re: [OSM-dev] Simpler binary OSM formats

2016-02-08 Thread Andrew Byrd

> On 08 Feb 2016, at 10:57, Andrew Byrd  wrote:
> To me, it seems much more advantageous to provide a table of file offsets 
> stating where each entity type begins. I have already considered adding this 
> to vex after the basic format is established (like author metadata and map 
> layers). It seems appropriate to place such a table at the end of the vex 
> data, because this allows the writer to produce output as a stream (no 
> backward seeks) and a reader can only make effective use of this table if 
> it’s buffering the data and able to seek within the file.

On second thought, if the table is to be placed at the end of the file/stream 
the writer would not even necessarily need to store it because the reader can 
easily construct an equivalent table as it receives the data (or the first time 
it scans over the file).
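A sketch of that reader-side table construction, using a made-up 
marker-plus-length block header rather than the actual vex layout: while 
streaming past blocks, the reader records the file offset where each entity 
type first appears, so the writer never needs to emit a table at all.

```python
import io, struct

def build_offset_table(buf):
    # Scan block headers only, seeking past each body, and remember the
    # offset of the first block of each type.
    table, pos = {}, 0
    while True:
        header = buf.read(5)
        if len(header) < 5:
            return table
        marker = header[:1]
        (length,) = struct.unpack("<I", header[1:])
        table.setdefault(marker, pos)
        buf.seek(length, 1)   # skip the block body
        pos += 5 + length

blocks = [(b"N", b"x" * 10), (b"N", b"x" * 4), (b"W", b"y" * 6), (b"R", b"z")]
data = b"".join(m + struct.pack("<I", len(b)) + b for m, b in blocks)
print(build_offset_table(io.BytesIO(data)))  # {b'N': 0, b'W': 24, b'R': 35}
```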

-Andrew


Re: [OSM-dev] Simpler binary OSM formats

2016-02-08 Thread Andrew Byrd

> On 07 Feb 2016, at 20:10, Дмитрий Киселев  wrote:
> 
>> As for fixed sized blocks in vex, I did consider that option but couldn’t 
>> come up with a compelling reason for it. I can see the case for a maximum 
>> block size (so we know what the maximum size of allocation will be), but can 
>> you give a concrete example of how fixed-size blocks would be advantageous in 
>> practice? I would be very hesitant to split any entity across multiple blocks.
> 
> When you need relations-ways-nodes read order, blocks will save you from 
> unnecessary read-through of the whole file (yes, you can skip decompression 
> for nodes/ways, but you still must read the whole file).

Let me rephrase the question: You specifically mentioned blocks of a 
predetermined, fixed size. How would having fixed-size blocks (as opposed to 
the current variable sized blocks) improve your ability to seek to different 
entity types within a file? Maybe you are thinking of doing a binary search 
through the file rather than a linear search for the blocks of interest. But 
that means the vex blocks would need to be a fixed size after compression, not 
before compression. It seems too complex to require writers to target an exact 
post-compression block size.

Also, I think your observation that “you must read the whole file” when seeking 
ahead to another entity type requires some additional nuance. You must only 
read the header of each block, at which point you know how long that block is 
and you can seek ahead to the next block. So indeed, you’d touch at least one 
page or disk block per vex block. Pages are typically 4 kbytes, so if your vex 
blocks are a few Mbytes in size, you would only access on the order of 1/1000 
of the pages while seeking ahead to a particular entity type. 

To me, it seems much more advantageous to provide a table of file offsets 
stating where each entity type begins. I have already considered adding this to 
vex after the basic format is established (like author metadata and map 
layers). It seems appropriate to place such a table at the end of the vex data, 
because this allows the writer to produce output as a stream (no backward 
seeks) and a reader can only make effective use of this table if it’s buffering 
the data and able to seek within the file.

> Second example: find something by id. If you have blocks it's easy to map a 
> whole block into memory and do a binary search for logN block reads instead 
> of seeking through the file all the time.

Unlike o5m I have not included any requirement that IDs be in a particular 
order, which means binary searches are not always possible. I see vex as a data 
interchange format usable in both streaming and disk-backed contexts, not as a 
replacement for an indexed database table. It’s an interesting idea to try to 
serve both purposes at once and be able to quickly seek to an ID within a flat 
data file, but I’m not sure if such capabilities are worth the extra 
complexity. Such a binary search, especially if performed repeatedly for 
different entities, would be touching (and decompressing) a lot of disk blocks 
/ memory pages because the IDs you’re searching through are mixed in with the 
rest of the data rather than in a separate index as they would be in a database.

Andrew



Re: [OSM-dev] Simpler binary OSM formats

2016-02-07 Thread Andrew Byrd
Hi Dmitry,

Yes, there are similarities and I did study the o5m format before I began work 
on vex. The last section of my original article compares the two and gives my 
impressions of o5m: 
http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m 
<http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m>

In summary: o5m uses string tables with a fixed size and an LRU eviction 
policy. Producers and consumers must keep their string tables exactly in sync. 
Strings are then referenced by integers indicating how recently they were used 
(1 to 15000). This adds quite a bit of complexity to o5m implementations, 
especially considering that this eviction strategy can backfire on certain 
inputs leading to files that are actually bigger than a basic gzipped text 
representation of the same data. According to 
http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms 
<http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms> o5m uses 
string tables specifically to avoid relying on general purpose compression. I 
find this unnecessary considering that zlib compression is quite effective, 
resource efficient (with adjustable compression level), and available 
practically everywhere.
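The zlib point is easy to demonstrate: a repetitive run of tag strings 
collapses to almost nothing under general-purpose compression, with no string 
table needed. The numbers below are illustrative only.

```python
import zlib

# A highly repetitive run of tag strings, as occurs in real OSM data.
tags = b"highway=residential\x00" * 1000
packed = zlib.compress(tags, 6)   # level 6: the usual speed/size trade-off

print(len(tags), "->", len(packed))      # 20000 bytes shrink to well under 200
assert zlib.decompress(packed) == tags   # round-trips losslessly
```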

There are a few other unusual design decisions documented and discussed at 
http://wiki.openstreetmap.org/wiki/O5m <http://wiki.openstreetmap.org/wiki/O5m> 
and http://wiki.openstreetmap.org/wiki/Talk:O5m 
<http://wiki.openstreetmap.org/wiki/Talk:O5m>. For example, strings are both 
introduced and terminated by a null byte, and are often stored in pairs (i.e. 
three null bytes per string pair, one of which is at the beginning of the 
string).

Of course I recognize o5m's contribution to the dialogue on binary formats, 
and we can learn from the o5m concept, but my conclusion is that it does not 
have the combination of extreme simplicity and compactness necessary to 
complement the existing formats.

As for fixed sized blocks in vex, I did consider that option but couldn’t come 
up with a compelling reason for it. I can see the case for a maximum block size 
(so we know what the maximum size of allocation will be), but can you give a 
concrete example of how fixed-size blocks would be advantageous in practice? I 
would be very hesitant to split any entity across multiple blocks.

-Andrew

> On 07 Feb 2016, at 09:06, Дмитрий Киселев  wrote:
> 
> Looks pretty similar to o5m, except tags key=value are not round-buffered.
> 
> As a further extension, it would be nice to have the ability to have blocks 
> of fixed size. 
> Just write nodes one by one until the byte buffer is full.
> For extremely big relations (which are larger than one block) it's possible 
> to mark two adjacent blocks as connected, but there should be a few of them.
> 
> It would help to read write and seek over files.
> 
> 2016-02-07 3:47 GMT+05:00 Stadin, Benjamin 
>  <mailto:benjamin.sta...@heidelberg-mobil.com>>:
> Hi Andrew,
> 
> Cap'n Proto (successor of ProtoBuffer from the guy who invented ProtoBuffer) 
> and FlatBuffers (another ProtoBuffer succesor, by Google) have gained a lot 
> of traction since last year. Both eliminate many if the shortcomings of the 
> original ProtoBuffer (allow for random access, streaming,...), and improve on 
> performance also.
> 
> https://github.com/google/flatbuffers <https://github.com/google/flatbuffers>
> 
> Here is a comparison between ProtoBuffer competitors:
> https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html 
> <https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html>
> 
> In my opinion FlatBuffers is the most interesting. It seems to have very good 
> language and platform support, and has quite a high adoption rate already. 
> 
> I think that it's well worth to reconsider creating an own file format and 
> parser for several reasons. Your concept looks well thought, it should be 
> possible to implement a lightweight parser using FlatBuffers for your data 
> scheme. 
> 
> Regards
> Ben 
> 
> Sent from my iPad
> 
> On 06 Feb 2016, at 22:37, Andrew Byrd  <mailto:and...@fastmail.net>> wrote:
> 
>> Hello OSM developers,
>> 
>> Last spring I posted an article discussing some shortcomings of the PBF 
>> format and proposing a simpler binary OSM interchange format called VEX. 
>> There was a generally positive response at the time, including helpful 
>> feedback from other developers. Since then I have revised the VEX 
>> specification as well as our implementation, and Conveyal has been using 
>> this format in our own day-to-day work.
>> 
>> I have written a new article describing the revised format:
>> http://conveyal.com/blog/2016/02/06/vex-format-par

Re: [OSM-dev] Simpler binary OSM formats

2016-02-06 Thread Andrew Byrd
Hello OSM developers,

Last spring I posted an article discussing some shortcomings of the PBF format 
and proposing a simpler binary OSM interchange format called VEX. There was a 
generally positive response at the time, including helpful feedback from other 
developers. Since then I have revised the VEX specification as well as our 
implementation, and Conveyal has been using this format in our own day-to-day 
work.

I have written a new article describing the revised format:
http://conveyal.com/blog/2016/02/06/vex-format-part-two 
<http://conveyal.com/blog/2016/02/06/vex-format-part-two>

The main differences are 1) it is more regular and even simpler to parse; and 
2) file blocks are compressed individually, allowing parallel processing and 
seeking to specific entity types. It is no longer smaller than PBF, but still 
comparable in size.

Again, I would welcome any comments you may have on the revised format and the 
potential for a shift to simpler binary OSM formats.

Regards,
Andrew Byrd


> On 29 Apr 2015, at 01:35, andrew byrd  wrote:
> 
> Hello OSM developers,
>  
> Over the last few years I have worked on several pieces of software that 
> consume and produce the PBF format. I have always appreciated the advantages 
> of PBF over XML for our use cases, but over time it became apparent to me 
> that PBF is significantly more complex than would be necessary to meet its 
> objectives of speed and compactness.
>  
> Based on my observations about the effectiveness of various techniques used 
> in PBF and other formats, I devised an alternative OSM representation that is 
> consistently about 8% smaller than PBF but substantially simpler to encode 
> and decode. This work is presented in an article at 
> http://conveyal.com/blog/2015/04/27/osm-formats/ 
> <http://conveyal.com/blog/2015/04/27/osm-formats/>. I welcome any comments 
> you may have on this article or on the potential for a shift to simpler 
> binary OSM formats.
>  
> Regards,
> Andrew Byrd



Re: [OSM-dev] OSM PBF and spatial characteristics of blocks

2016-01-06 Thread Andrew Byrd

> On 06 Jan 2016, at 14:10, Stadin, Benjamin 
>  wrote:
> 
> And about the cell data: I'm considering to just reuse OSM pbf format, 
> without preserving sort and size attributes. When exporting the data from 
> individual grid cells, all data items will be streamed to the output ordered 
> by type and ID. A simple in memory AVL tree should be sufficient (storing id 
> keys and pointers to items as node data, iterating lowest to highest id on 
> output)

We wanted to preserve conventional entity ordering (node, way, relation) but 
maintaining increasing ID number was not important for us; I preferred a 
constant-memory export process (i.e. memory consumption does not grow with the 
geographic size of the extract) that simply iterates over index cells in order 
three times, dumping first nodes, then ways, then relations.
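In outline, with toy cell contents standing in for the real index (the IDs and 
cell keys here are hypothetical):

```python
# Toy spatial index: each cell holds lists of entity IDs by type.
cells = {
    (0, 0): {"nodes": ["n1", "n2"], "ways": ["w1"], "relations": []},
    (0, 1): {"nodes": ["n3"], "ways": [], "relations": ["r1"]},
}

def export(cells):
    # One full pass over the cells per entity type: memory use stays constant
    # no matter how large the extract is.
    for entity_type in ("nodes", "ways", "relations"):
        for cell_key in sorted(cells):
            yield from cells[cell_key][entity_type]

print(list(export(cells)))  # ['n1', 'n2', 'n3', 'w1', 'r1']
```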

If I understand you correctly you’d use the PBF format as your internal storage 
format, making one PBF file per spatial index cell (essentially splitting 
planet.pbf into one PBF file per tile). I can see the appeal of simplicity 
here, and I considered this approach myself, but I think PBF would be 
problematic if you intend to perform random access within those tiles to apply 
minutely updates. PBF is a data interchange format, to my knowledge designed 
and used primarily for moving or streaming database dumps or extracts from one 
site to another. You’ll end up doing a lot of decompress-filter-modify-rewrite 
operations on entire tiles. It could work, but it seems awkward and resource 
intensive. I can also imagine running into some problems with a 1 to N 
geographic PBF splitter. Due to PBF's block-based nature you might have to keep 
a prohibitively large number of files open simultaneously during your 
planet-to-tile splitter step. If the planet.pbf must pass through some 
intermediate representation to allow splitting (essentially a spatially indexed 
database of some kind), why not keep it in that intermediate representation and 
perform the spatial splitting on demand?

-Andrew




Re: [OSM-dev] OSM PBF and spatial characteristics of blocks

2016-01-06 Thread Andrew Byrd
On 06 Jan 2016, at 13:56, Stadin, Benjamin 
 wrote:
> The problem I have to solve is that I need to be able to export also to other 
> projections. For our indoor navigation and accurate site maps we have to use 
> a proper projection. Thus my idea to index the WGS84 datum, and enable 
> indexing overlaps efficiently. When this is solved it should be possible to load 
> WGS84 data cells that overlap a world area for a given projection (using the 
> bbox from input, loading overlapping MODIS cells) and cut the export at the 
> bounding sites later on.

You can do a similar kind of spatial indexing on raw WGS84 coordinates. It is 
more compact to store WGS84 coordinates as fixed-precision ints than as doubles 
(single-precision floats are not precise enough worldwide), and you can insert 
those int coordinates into an implicit quad tree in the same way map tiles 
work, just by shifting low order bits off.
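A small sketch of both ideas, fixed-precision storage and implicit tiling by 
bit-shifting; the 1e-7 degree scale matches OSM's own node precision, while the 
shift amount is an arbitrary choice for illustration.

```python
SCALE = 10_000_000  # 1e-7 degree fixed-point units, matching OSM node precision

def to_fixed(deg):
    return round(deg * SCALE)

def cell_at(lat_deg, lon_deg, shift=20):
    # Offsetting by 90/180 degrees makes the values non-negative, so the right
    # shift is a clean truncation; 'shift' sets the cell size (arbitrary here).
    y = (to_fixed(lat_deg) + 90 * SCALE) >> shift
    x = (to_fixed(lon_deg) + 180 * SCALE) >> shift
    return (x, y)

print(to_fixed(48.8566))          # 488566000
print(cell_at(48.8566, 2.3522))   # (1739, 1324)
```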

-Andrew


Re: [OSM-dev] OSM PBF and spatial characteristics of blocks

2016-01-06 Thread Andrew Byrd

> On 06 Jan 2016, at 13:13, Andrew Byrd  wrote:
> Obviously in the long term we’ll want to improve the index to handle these 
> cases. Both of these limitations should be straightforward to overcome. To 
> index large polygons as areas you’d either need some kind of multi-level 
> index (rectangle tree or “pyramids”) or just accept rasterizing area polygons 
> into all the index cells they overlap (a polygon’s ID appearing repeatedly, 
> in every tile it overlaps).

I should elaborate: 

Using a predefined grid is practical because you don’t need to store the bounds 
of the index nodes (they’re regular tiles, so their bounding boxes are implicit 
in the grid definition). You can determine which index cell any element is in 
by just chopping low-order bits off its projected coordinates to match the 
index’s constant zoom level.

If you’re rendering map tiles, there’s also the advantage of a 1:1 
correspondence between one spatial index node and one output map tile at or 
above the index’s zoom level, and a simple one-to-many relationship below that 
zoom level. This is also practical for exporting rectangular PBF regions.

If you want to index objects physically larger than a tile in the index, or 
reflect the fact that objects span tiles, you could repeat the identifier of 
that object in every tile it touches (what I referred to as rasterizing the 
geometry), but this approach leads to a lot of repeated identifiers, which 
then have to be de-duplicated when you query the index.
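A toy sketch of that rasterizing approach (class and method names are hypothetical, not from any actual implementation):

```python
from collections import defaultdict

class RasterizedIndex:
    """Flat tile grid: an object's ID is repeated in every cell its
    bounding box overlaps, so queries must de-duplicate results."""

    def __init__(self):
        self.cells = defaultdict(list)  # (x, y) tile -> list of object IDs

    def insert(self, obj_id, xmin, ymin, xmax, ymax):
        # Rasterize: record the ID once per overlapped tile.
        for x in range(xmin, xmax + 1):
            for y in range(ymin, ymax + 1):
                self.cells[(x, y)].append(obj_id)

    def query(self, xmin, ymin, xmax, ymax):
        # Collect into a set: the de-duplication cost mentioned above.
        found = set()
        for x in range(xmin, xmax + 1):
            for y in range(ymin, ymax + 1):
                found.update(self.cells.get((x, y), ()))
        return found
```

A polygon spanning a 3x3 block of tiles stores its ID nine times, but a query touching any of those tiles still returns it exactly once.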

An alternative approach is a hierarchical index, where each index node is 
entirely physically contained within a higher-level node. Conveniently the web 
Mercator map tile system defines a perfectly aligned hierarchy of squares via 
its zoom levels: it is a quad tree (https://en.wikipedia.org/wiki/Quadtree). 
You simply index each object at the highest zoom level that still contains the 
object entirely within one tile, and when you query the index, you can easily 
traverse up the zoom levels without maintaining pointers between them (you can 
find the coordinates of the next higher tile by just chopping off low-order 
bits).
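A sketch of that placement rule, with the object's bounding box expressed in tile coordinates at some maximum zoom (MAX_ZOOM and the function name are my own):

```python
MAX_ZOOM = 14

def home_tile(xmin, ymin, xmax, ymax):
    """Highest zoom level whose single tile contains the whole bounding
    box, where the box is given in MAX_ZOOM tile coordinates."""
    z = MAX_ZOOM
    while z > 0 and (xmin != xmax or ymin != ymax):
        # Move to the parent tile: chop one low-order bit off each axis.
        xmin, ymin, xmax, ymax = xmin >> 1, ymin >> 1, xmax >> 1, ymax >> 1
        z -= 1
    return z, xmin, ymin
```

Traversal at query time is the same shift in reverse: tile (x, y) at zoom z has parent (x >> 1, y >> 1) at zoom z - 1, so no parent pointers need to be stored.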

There are of course many ways to implement spatial indexes, but considering 
that we’re working with data that often gets rendered or exported in tile-sized 
chunks, this approach seemed well-suited to me.

I’d of course welcome any insight or suggestions that might improve our 
implementation.

-Andrew 


Re: [OSM-dev] OSM PBF and spatial characteristics of blocks

2016-01-06 Thread Andrew Byrd

> On 06 Jan 2016, at 03:40, Stadin, Benjamin 
>  wrote:
> Does your Vanilla Extract consider overlapping polygons? Like if you export a 
> small area within a country, does it add the country's polygon that overlaps 
> the area? 
> It looks pretty interesting though. I'm not sure where to start at, yet I 
> think it will be good to combine features from TileMaker and Vanilla Extract. 

Our spatial indexing is rather crude and tile-based. This is intentional to 
keep it small and simple. We have a grid of cells which correspond to the web 
mercator tiles at a single zoom level, and every OSM object is assigned to one 
tile only. This is problematic for objects that span multiple tiles. Also note 
that free-floating nodes (nodes not referenced by any way) are not reachable 
using the current index. For our applications we just haven’t needed to index 
free-floating POI nodes yet, and don’t need large administrative borders or 
huge area polygons.

Obviously in the long term we’ll want to improve the index to handle these 
cases. Both of these limitations should be straightforward to overcome. To 
index large polygons as areas, you’d either need some kind of multi-level 
index (a rectangle tree or “pyramids”) or just accept rasterizing area 
polygons into all the index cells they overlap (a polygon’s ID appearing 
repeatedly, in every tile it overlaps).

So the indexing system would need some work for your application. But I thought 
the two underlying storage systems for OSM data could be useful to you.

-Andrew




Re: [OSM-dev] OSM PBF and spatial characteristics of blocks

2016-01-05 Thread Andrew Byrd
> On 05 Jan 2016, at 19:18, Paul Norman  wrote:
> 
> Both of these seem to be software rather than file format definitions. I had 
> a quick look around the repo, but I couldn't find a format definition except 
> that for standard OSM PBF.

Hi Paul,

They are indeed software, both of which store planet-scale OSM data sets and 
perform on-demand geographic extracts, and the second one applies minutely 
updates. I may have misunderstood something, but I was under the impression 
that the original poster was looking for a way to fetch up to date, 
geographically contiguous chunks of PBF data for use in generating image tiles.


So these two links are not file format definitions, they are essentially 
special purpose database systems, but of course each one has its own internal 
on-disk representation of bulk OSM data, which is distinct from PBF. I would 
not consider PBF to be an optimal bulk storage format if you intend to perform 
a continuous stream of minutely updates (delete, move, change tags) on 
arbitrary OSM entities scattered around the world. Performing random-access 
updates inside PBF files could get awkward. Say, for example, you need to update 
a tag on way 142563. Where is that way located geographically, in which file, 
and what is its position inside that file? Once you locate that file, you’d 
need to decompress the entire file and scan through it to find the way, and if 
the edit makes the file block larger than it was before, you’d need to shift 
and rewrite the rest of the file. 

So I was providing these repos as examples of systems that were intended from 
the beginning to allow minutely updates and arbitrary PBF extracts. They might 
need to be completed or adapted, but could provide a starting point.

The second (Java) version I cited is, however, part of an OSM-handling 
library, which does specify its own file format called VEX. Like PBF, that is a 
data exchange format for passing around OSM regions in files. But neither PBF 
nor VEX format is used to store the bulk, spatially indexed data internally.

Andrew


Re: [OSM-dev] OSM PBF and spatial characteristics of blocks

2016-01-05 Thread Andrew Byrd
Hello,

It’s interesting to see more demand for grabbing up-to-date chunks of PBF 
around the world.

Keep in mind that while OSMPBF is an essential export format (as the most 
efficient widely used OSM data format), it is not the only option for storing 
the OSM entities in your spatially indexed / geographically clustered 
planet-scale container.

We’ve got similar needs (storage of OSM data at planet scale, rapid extraction 
of up-to-date, compact binary OSM data for any location) and created the 
Vanilla Extract project to meet these needs. There’s the original C version 
(which is fast but doesn’t do minutely updates) and a newer MapDB based Java 
version that does minutely updates but seems less performant in practice.

You might be able to extend or borrow code from these projects to meet your 
needs:
https://github.com/conveyal/vanilla-extract
https://github.com/conveyal/osm-lib/blob/master/src/main/java/com/conveyal/osmlib/VanillaExtract.java

Regards,
Andrew Byrd

> On 05 Jan 2016, at 18:40, Stadin, Benjamin 
>  wrote:
> 
> Thank you. This is enough clarification for me. Then I’ll create an 
> independent store (using OSM PBF format but using spatial clustering) and on 
> export the required order for the region will be recreated.
> 
> From: Paul Norman mailto:penor...@mac.com>>
> Date: Tuesday, 5 January 2016 at 18:09
> To: "dev@openstreetmap.org <mailto:dev@openstreetmap.org>" 
> mailto:dev@openstreetmap.org>>
> Subject: Re: [OSM-dev] OSM PBF and spatial characteristics of blocks
> 
> On 1/5/2016 8:32 AM, Stadin, Benjamin wrote:
>> I’m thinking about a design for an efficient storage container for OSM PBF 
>> (planet size data, minutely updates), for the purpose of TileMaker as well 
>> as for an internal application. 
> 
> Good to see Tilemaker (https://github.com/systemed/tilemaker 
> <https://github.com/systemed/tilemaker>) getting some traction.
> 
>> One thing I stumbled on is the usage of the bounding boxes within OSM PBF. 
>> The documentation [1] does not clarify on the spatial characteristics of the 
>> individual FileBlocks. Some questions:
>> Is it correct that there is exactly one HeaderBlock in a .pbf file? If so, 
>> the BBOX defined within the HeaderBlock defines the whole region of the .pbf 
>> export?
>> What are the spatial characteristics of an individual FileBlock within the 
>> FileBlocks sequence? Is a FileBlock generated by any kind of spatial 
>> ordering? For example, is it safe to assume that all content is very dense / 
>> close to a region of the world? Or can this be controlled when creating a 
>> .pbf? If there were a loose spatial relationship, it would allow relating 
>> FileBlocks to map „tile“ regions (a FileBlock may obviously relate to 
>> several „tiles“, but would be fine as long as the blocks relate to a certain 
>> region for most of its content)
>> There is a commented BBOX definition within the PrimitiveBlock. What remains 
>> to be done to enable this proposed BBOX extension? I’d have the same 
>> question about this BBOX as with my second question.
> 
> PBFs are generally ordered by type then ID, so there is no guaranteed spatial 
> clustering. There is a strong correlation between nearby IDs and objects 
> being near each other which makes delta encoding worthwhile.
> 
> A lot of software implicitly depends on ordering. Sorting by type is often a 
> hard requirement - doing anything with ways normally requires having parsed 
> all the nodes for geometries. Sorting by ID may be needed depending on how 
> storage algorithms were implemented - software can become less efficient or 
> break if it's expecting ordered IDs and gets unordered.



Re: [OSM-dev] setting up mirror

2015-11-18 Thread Andrew Byrd
Hello,

I think we should clarify some terms here. The word “mirror” usually refers to 
a secondary copy of some very large or very heavily used resource, generally at 
a different physical location or network link than the primary “official” copy. 
Its purpose is to reduce load on the main server and avoid sending huge amounts 
of data across the world by keeping a recent duplicate of the main resource 
nearer to the final consumers.

When you say “planet osm mirror” people might assume you mean "a copy of 
planet.pbf closer to home”. Creating this kind of mirror is just that simple: 
you’d place a recent copy of planet.pbf on a web or file server of some kind, 
and have people within your organization or geographic area fetch planet.pbf 
from your server instead of the main one (generally to reduce traffic crossing 
out of your local network or traveling long distances). Planet.pbf is updated 
regularly, so you’d probably schedule some scripts to keep your mirrored copy 
in sync by overwriting it once in a while or applying OSM diffs.

I am a bit confused by the comments saying there is only one planet mirror. 
There are eight of them listed at http://wiki.openstreetmap.org/wiki/Planet.osm 
and I’m sure there are many 
more undocumented mirrors around the world within organizations that use OSM 
data.

If you only want to work on one geographic region, it’s also possible to 
download a PBF extract of that region and keep it up to date with tools like 
osmupdate or osmosis. This would be much less bandwidth and resource intensive 
than cloning the whole planet.
http://wiki.openstreetmap.org/wiki/Osmupdate 


There is another notion which could also arguably be called “mirroring” but is 
commonly referred to as “replication”: creating a local database with a similar 
schema to the main OSM database and applying diffs to maintain a completely 
up-to-date copy of that main database. This allows extensive indexing and thus 
searching for items within OSM by location, ID, tags etc. 
http://wiki.openstreetmap.org/wiki/Osmosis/Replication 


Special-purpose OSM replication databases exist, which you could use instead of 
a general-purpose relational database depending on your use case. Overpass is 
one such project. Instead of using the public overpass API server, you can also 
install Overpass locally to mirror the whole main OSM database and keep your 
mirror in sync: http://wiki.openstreetmap.org/wiki/Overpass_API/Installation 


I’m also working on a replication system called Vanilla Extract that is geared 
toward performing large, fast geographic extracts. It’s still under development 
but stable enough for us to use it on a daily basis. Our use case is routing, 
so it currently only indexes ways and the nodes referenced by them. No 
free-floating POI nodes (though we’ll eventually add that), and no author 
metadata.

The original version (in C) loads a PBF at a single point in time and does no 
updates. 
https://github.com/conveyal/vanilla-extract 


The newer version (in Java using MapDB) is capable of replicating planet.osm 
with a one-line command and performs minutely updates, with the option of even 
pulling the OSM data in over the network without a local copy of planet.pbf. 
However, it’s still a bit slow on large extracts. We’re currently migrating it 
to MapDB2 and expect a performance improvement.
https://github.com/conveyal/osm-lib/blob/master/src/main/java/com/conveyal/osmlib/VanillaExtract.java
 


Andrew

> On 18 Nov 2015, at 18:46, Kevin Mcintyre  wrote:
> 
> There's some good information here - 
> http://download.geofabrik.de/technical.html 
> one company's process at least.
> 
> Appreciate the response, it's a tricky search because the results are 
> generally information on mirrors not how to become a mirror.
> 
> 
> 
> On Wed, Nov 18, 2015 at 9:21 AM, Tom Hughes  > wrote:
> On 18/11/15 17:02, Kevin Mcintyre wrote:
> 
> Hello - I'm seeking information on setting up a planet osm mirror.
> 
> I think the reason you're not getting replies is that you're asking for 
> something that doesn't really exist - there isn't any general infrastructure 
> for mirrors in existence.
> 
> We have one mirror that we redirect most downloads to, and that's it.
> 
> Tom
> 
> -- 
> Tom Hughes (t...@compton.nu )
> http://compton.nu/ 
> 


Re: [OSM-dev] new Java OSM loading / storage library

2015-06-30 Thread Andrew Byrd
The README is currently rather limited but the following classes should serve 
as entry points to demonstrate how the library is used:

https://github.com/conveyal/osm-lib/blob/master/src/test/java/com/conveyal/osmlib/RoundTripTest.java
https://github.com/conveyal/osm-lib/blob/master/src/main/java/com/conveyal/osmlib/main/Converter.java

-Andrew

> On 01 Jul 2015, at 00:08, Andrew Byrd  wrote:
> Conveyal has created an open source Java OSM library for use in our own 
> projects (including OpenTripPlanner). It is now in a usable pre-release 
> state, and some of you might be interested in trying it out. I would of 
> course be very interested in any commentary or feedback that could help 
> improve the library and make it more useful to the wider community. 
> 
> https://github.com/conveyal/osm-lib




[OSM-dev] new Java OSM loading / storage library

2015-06-30 Thread Andrew Byrd
Hello OSM developers,

Conveyal has created an open source Java OSM library for use in our own 
projects (including OpenTripPlanner). It is now in a usable pre-release state, 
and some of you might be interested in trying it out. I would of course be very 
interested in any commentary or feedback that could help improve the library 
and make it more useful to the wider community. 

https://github.com/conveyal/osm-lib

This library provides:

0. A set of Java classes modeling OSM entities (Nodes, Ways, and Relations).

1. Disk-backed random access storage of OSM data of any size (up to an entire 
planet dump) without any external database server, using the excellent MapDB 
in-process storage engine.

2. Reading and writing of the PBF format, as well as a still-evolving 
implementation of the VEX format I proposed a few months back [1], and 
conversion between the two.

3. Spatial indexing of OSM data based on web Mercator tiles, which allows 
fetching tiles from anywhere in the world on demand. 

4. A web API built on top of that spatial index which retrieves arbitrary 
rectangles of PBF data on demand.

5. Continuous minutely updates while the API server is running. The OSM data 
can stay in sync only a few seconds behind the OSM replication server.

Points 3 and 4 are essentially a Java port of our Vanilla Extract project [2]. 
This has the potential to simplify workflows that produce many geographic 
extracts from a single source PBF, or that rely on fetching up-to-date 
geographic extracts on demand.

Spatial indexing is currently limited to ways and the nodes referenced by those 
ways, i.e. “loose” POI nodes are not indexed. This is simply because our main 
use case is routing along ways.

Regards,
Andrew Byrd

[1] https://lists.openstreetmap.org/pipermail/dev/2015-April/028546.html
[2] https://github.com/conveyal/vanilla-extract





Re: [OSM-dev] [osmosis-dev] Proposal for a multithreaded PBF reader

2015-06-04 Thread Andrew Byrd
Hello,

Can anyone provide anecdotes of use cases where multi-threaded PBF reading 
significantly speeds up processing? Generally I would expect PBF reading to be 
I/O-bound rather than CPU-bound, but I still need to make more accurate 
measurements. 

Of course actually processing the OSM data once the PBF is decoded can be quite 
CPU intensive, but that would imply buffering decoded data and parallelizing 
geometric operations for example, not the reading.

I’d appreciate any data points and example use cases you might have, as I’m 
currently working on related tooling.

Andrew Byrd

> On 04 Jun 2015, at 05:57, Brett Henderson  wrote:
> 
> On 30 April 2015 at 03:27, Paul Norman  <mailto:penor...@mac.com>> wrote:
> On 4/29/2015 9:55 AM, Martijn van Exel wrote:
> If osmosis is the reference implementation, is there a reason why it
> doesn't seem to leverage this block structure to speed up reading? Or
> does it?
> Osmosis has the --read-pbf-fast task which allows multiple worker threads.
> 
> That's right.  I forget how the PBF structure works off the top of my head, 
> but the file is already split into blocks.  The main --read-pbf-fast thread 
> simply grabs the outer protobuf blocks from file and then distributes them to 
> worker threads who parse out the OSM entities from within the block.  After 
> extraction, the entities within each block are passed to the downstream task 
> in original file order.  I'm not sure I see the need to modify the PBF file 
> format.
>  
> ___
> osmosis-dev mailing list
> osmosis-...@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/osmosis-dev



Re: [OSM-dev] Simpler binary OSM formats

2015-04-29 Thread andrew byrd
On Wed, Apr 29, 2015, at 11:45, François Battail wrote:
> with libprotobuf-c.so, default allocator, without assembly support:
> 799s (bandwidth: 33.9 MiB/s); with libprotobuf-c.so, sw_pool_t
> allocator, little assembly support: 629s (bandwidth: 43.1 MiB/s)

Point taken -- in future comparisons I will focus on throughput as much
or more than size itself. I did notice that protobuf-c allows for
customizing ProtobufCAllocator, and it's interesting to see the results
of using a pool allocator. However my first reaction is that we really
shouldn't need to do much dynamic allocation at all to handle OSM data.
The use of dynamic allocation is a result of the general-purpose nature
of Protobuf. My sense is that Protobuf is not even a particularly good
fit for bulk OSM data transfer, considering the hoops that we must jump
through in dense-nodes to bypass the natural mapping of one Protobuf
message to one OSM entity.

> Most of the time is spent in zlib, libprotobuf-c and memory
> allocations. I've addressed the last point using x86_64 assembly
> language and a pool allocator. I think a rewrite of an optimized
> libprotobuf library would help to gain some speed but the cost is very
> high (at least for my application).

The alternative being to use a format that is not based on Protobuf at
all, but only uses its variable-byte encoding scheme.
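That variable-byte ("varint") scheme is simple to implement standalone; a sketch (unsigned values only; signed values would additionally need zigzag encoding):

```python
def encode_varint(n: int) -> bytes:
    """Protobuf-style unsigned varint: seven payload bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint for a complete varint byte string."""
    n = 0
    for shift, byte in enumerate(data):
        n |= (byte & 0x7F) << (7 * shift)
    return n
```

Small numbers, such as the deltas between consecutive sorted OSM IDs, encode in a single byte, which is where most of the compactness comes from.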

Thanks for your comments! Andrew



Re: [OSM-dev] Simpler binary OSM formats

2015-04-29 Thread andrew byrd
Thanks for your comments Jochen. Responses in-line below.

On Wed, Apr 29, 2015, at 11:18, Jochen Topf wrote:
> On Wed, Apr 29, 2015 at 01:35:29AM +0200, andrew byrd wrote: ... it
> has some nice properties we shouldn't forget. First and foremost that
> is the block structure. This allows generating and parsing in multiple
> threads. I think that's an important optimization going forward.

A very good point. In any further testing I will apply intra-block
compression rather than compressing the entire stream.

>  I also think it is important to have some kind of header for storing
>  file meta data in a flexibly way, PBF has that.

Agreed. Any header metadata would be defined later if there is wider
interest in the format, since it represents a very small constant factor
in performance.

> Looking at your proposal you seem to be very concerned with file size
> but not so much with read/write speed. From my experience reading and
> writing PBF is always CPU bound. Removing complexity could speed this
> up considerably.

My true goal was to reduce complexity while at least maintaining the
performance characteristics of PBF. Given that goal, it is true that
there is too much emphasis on file size in the article. I will need to
do a follow-up to cover speed and memory usage.

Anecdotally, I was seeing about 2x speedup relative to PBF writing for a
specific processing step, with both PBF and VEX writing code written in
C. That certainly needs to be confirmed more methodically.

The PBF parsing code produced by the Protobuf compiler seems to involve
quite a lot of dynamic memory allocations. It is straightforward to read
and write OSM data with no dynamic allocation at all (outside the
compression library), and this is one place where VEX could offer an
advantage.

> Currently you can save quite a lot of CPU time if you do not
> compress the PBF blocks but leave them uncompressed. Of course the
> file size goes up, but if you have the storage space that doesn't
> matter that much.

I have to admit I did not really consider this because I've rarely
encountered uncompressed PBF "in the wild" and have always used PBF with
compression turned on. Indeed, any future comparison should ideally
include PBF with uncompressed blocks.

With storage space and bandwidth as high as it is today, it is true that
throughput should be considered as much or more so than file size.
Again, my true motivation is simplicity, and in performance improvements
to the extent that they can be enabled by simplicity.

I have observed that speed numbers can of course be quite different
depending on whether you handle the compression in a separate thread.

> First, I'd like to see the numbers for the whole planet. A size
> difference between small extracts doesn't really matter all that much,
> because the absolute size is so small. Savings on the whole planet
> file would be much more interesting.

A good suggestion. After I've amended the VEX format taking into account
the commentary I've received, I will perform a test on the whole planet.

> Second: The XML and PBF format usually contain the metadata that you
> removed in your VEX format. Have you accounted for that in your
> numbers? Ie. did you remove the metadata from XML and PBF, too?

Yes, I stripped the metadata from the source PBF before converting it to
all the other formats to ensure a fair comparison. These numbers are of
more interest to me in my daily applications, but since I see that there
is some interest from the wider community I agree that a future
comparison should be done including metadata.

> Incidentally I came up with a similar text format as you did. It is
> documented here:
> http://osmcode.org/libosmium/manual/libosmium-manual.html#opl-object-per-line-format

Yes, it's a very similar idea but of course mine is less complete since
it began life as debug output. One unusual ingredient of my text output
is the inclusion of nodes' complete data inline after the ways that
reference them. It's a trade-off which denormalizes intersection nodes,
but avoids keeping nodes far away from their references, avoids
repeatedly mentioning non-intersection node identifiers, and makes the
output human-readable as a series of complete way descriptions.
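A hypothetical rendering of that idea (the thread does not show the real output format, so the function name and layout here are invented for illustration):

```python
def format_way(way_id, tags, nodes):
    """Render a way as a self-contained text block, with the complete
    data of the nodes it references inline.
    nodes: list of (node_id, lat, lon) tuples in way order."""
    lines = [f"way {way_id} " + " ".join(f"{k}={v}" for k, v in tags.items())]
    for node_id, lat, lon in nodes:
        lines.append(f"  node {node_id} {lat} {lon}")
    return "\n".join(lines)
```

Shared intersection nodes get printed once per way that uses them (the denormalization trade-off), but each way reads as a complete, human-readable description.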

-Andrew


Re: [OSM-dev] Simpler binary OSM formats

2015-04-29 Thread andrew byrd
On Wed, Apr 29, 2015, at 07:03, Paul Norman wrote:
> How does it work with diffs and history files?

These features are not implemented, as this is a prototype format whose
main purpose so far is to seed a discussion. The way OSM diffs are
usually handled (full information is given for modified entities,
overwriting the existing entities) I am confident that the format would
require only minor amendments to handle diffs.

> PBF comes with relatively common support and libraries with a
> reasonable interface for most languages. How is this with your
> proposed format?

Being merely a proposal at this point, there is no support for VEX
beyond a prototype implementation in our OSM loading library. However
you can see from that example that the code to read and write the format
is quite simple. If I see interest in a more polished version of the
format, I would of course port the library to some common languages.

> Parse times for an extract with osm2pgsql are 13s for PBF (334 MB) and
> 18s for o5m (698 MB). I don't have any large o5m files sitting around
> so I can't check for a larger extract.

You are right to emphasize processing speed, especially considering the
huge size of many OSM data sets. I will need to do a follow up
addressing speed and memory usage. However, the real motivation for my
work on the VEX format was to aim for _simplicity_ while maintaining
speed and size at least as good as PBF. Anecdotally, for certain
operations in a C-language utility I was seeing processing speeds about
2x those for PBF. I will need to test that methodically.

-Andrew


[OSM-dev] Simpler binary OSM formats

2015-04-28 Thread andrew byrd
Hello OSM developers,

Over the last few years I have worked on several pieces of software that
consume and produce the PBF format. I have always appreciated the
advantages of PBF over XML for our use cases, but over time it became
apparent to me that PBF is significantly more complex than would be
necessary to meet its objectives of speed and compactness.

Based on my observations about the effectiveness of various techniques
used in PBF and other formats, I devised an alternative OSM
representation that is consistently about 8% smaller than PBF but
substantially simpler to encode and decode. This work is presented in an
article at http://conveyal.com/blog/2015/04/27/osm-formats/. I welcome
any comments you may have on this article or on the potential for a
shift to simpler binary OSM formats.

Regards, Andrew Byrd