Re: [osmosis-dev] Reading OSM History dumps

Brett Henderson Tue, 24 Aug 2010 03:45:33 -0700

On Tue, Aug 24, 2010 at 1:09 AM, Peter Körner <osm-li...@mazdermind.de> wrote:


> On 23.08.2010 13:35, Brett Henderson wrote:
>
>> To create your own store implementation you can build on the Osmosis
>> persistence support.  All classes that are persistable implement the
>> Storeable interface and have a constructor with "StoreReader sr,
>> StoreClassRegister scr" arguments.
>>
>> The existing IndexedObjectStore assumes that the key is a long but
>> provides a good example to start from.  The underlying IndexStore it
>> uses can support any type of key as long as it has a fixed width (ie.
>> always persists to the same number of bytes).
>>
> It would need a 96-bit key (long id + int version). I was not aware of
> any type wider than 64 bits in Java, so I'm not sure how I could build a
> store with a 96-bit index, but I think I have to take a deeper look into
> the IndexStore & company.
>

IndexStore just requires an IndexElement implementation that holds both the
key and the value.  You can define a key implementation class that holds as
many individual long or int values as you like, so long as it persists
through the Storeable interface to a fixed number of bytes.  You also have
to provide the IndexStore with a comparator that knows how to compare the
order of two keys.
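
As a minimal sketch of the two requirements above (fixed-width persistence
and a comparator), a composite key could look roughly like this. The class
NodeVersionKey and its methods are hypothetical names used for illustration;
the sketch deliberately does not implement the Osmosis Storeable or
IndexElement interfaces and uses a plain ByteBuffer instead.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical composite key: node id (64 bits) plus version (32 bits).
// This is NOT the Osmosis Storeable interface; it only illustrates the
// fixed-width and ordering requirements described above.
final class NodeVersionKey {
    // Every key serialises to exactly 12 bytes, whatever values it holds.
    static final int BYTES = Long.BYTES + Integer.BYTES;

    // Order by id first, then by version within the same id.
    static final Comparator<NodeVersionKey> ORDER =
        Comparator.comparingLong((NodeVersionKey k) -> k.id)
                  .thenComparingInt(k -> k.version);

    final long id;
    final int version;

    NodeVersionKey(long id, int version) {
        this.id = id;
        this.version = version;
    }

    // Write the key as a fixed-width byte sequence.
    byte[] toBytes() {
        return ByteBuffer.allocate(BYTES).putLong(id).putInt(version).array();
    }

    // Read a key back from its fixed-width byte sequence.
    static NodeVersionKey fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new NodeVersionKey(buf.getLong(), buf.getInt());
    }
}
```

A real implementation would write the same two fields through whatever
writer the Storeable contract provides; the point is only that the byte
count never varies and that the comparator defines a total order.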


>
> The timestamp is just a 64-bit long value, so the only problem here is doing
> the comparison, but this is the easy part, I think.


>
>> It may
>> be possible to make the existing IndexedObjectStore more generic but I'd
>> need to experiment with it.
>>
> I'll try to keep all the changes local to my project. Once it's finished
> you can take classes over to core as they're needed.
>
>
>> Hmm, but thinking more about your problem it may make more sense to
>> stick with the IndexedObjectStore and store a list of Nodes as each
>> element instead of single Nodes.  I suspect in most cases you won't know
>> the exact version you're looking for when you're loading a Node
>>
> In the first phase when selecting the versions of the nodes used to create
> a version of a way I'll have a lot of timestamp searches (find the oldest
> node that is younger than the timestamp of the way) that need the timestamp
> index.
>
> Later on, when the intermediate versions are calculated, I'll need a lookup
> for all versions of an id.
>
> A direct request for a known id/version will, as far as I can see at this
> early stage, not be used too often (maybe during linestring building).
>
>
>> (you'll only know node ids when looking at a way after all), and will only know
>> a timestamp range.  When looking up a specific node/version/timestamp
>> combination you would have to load all versions of a node from the
>> IndexedObjectStore then linearly search for a match in the (usually
>> fairly limited) list of objects.  You will possibly need to create your
>> own Storeable list type to hold all versions of a particular Node
>> because I don't think one exists.
>>
> The main problem I see is that such a list won't be of fixed size. When I
> write it to the store and later on add another version, it will grow bigger
> and have to be re-allocated in the store file, freeing up space at the
> beginning. Basically a malloc/realloc/free in files.


If you need the ability to write values randomly then it won't work.  But if
you have sorted input (ie. all versions of a node are together on input)
then you can write them all to the store at once.  IndexedObjectStore will
allow you to write variable-length objects to the store, which is already
necessary to hold entities with variable numbers of tags.
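
To illustrate why sorted input removes the realloc problem, here is a rough
sketch of an append-only store that writes all versions of a node as one
variable-length record. Everything in it is an assumption made for
illustration: VersionListStore and NodeVersion are hypothetical names, an
in-memory byte array stands in for the store file, and none of it is the
actual IndexedObjectStore code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only store: one variable-length record per node id.
// An in-memory byte array stands in for the store file.
final class VersionListStore {
    // Minimal stand-in for a node version; not the Osmosis Node class.
    static final class NodeVersion {
        final int version;
        final long timestamp;

        NodeVersion(int version, long timestamp) {
            this.version = version;
            this.timestamp = timestamp;
        }
    }

    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(file);
    private final Map<Long, Integer> offsetById = new HashMap<>();

    // Append every version of one node in a single write. Because input is
    // grouped by id, a record never has to grow after it has been written.
    void add(long id, List<NodeVersion> versions) {
        if (offsetById.containsKey(id)) {
            throw new IllegalStateException(
                "versions of node " + id + " must arrive together");
        }
        try {
            offsetById.put(id, out.size());   // record start offset
            out.writeInt(versions.size());    // record length prefix
            for (NodeVersion v : versions) {
                out.writeInt(v.version);
                out.writeLong(v.timestamp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load all versions of a node; an unknown id yields an empty list.
    List<NodeVersion> get(long id) {
        List<NodeVersion> result = new ArrayList<>();
        Integer offset = offsetById.get(id);
        if (offset == null) {
            return result;
        }
        byte[] data = file.toByteArray();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                result.add(new NodeVersion(in.readInt(), in.readLong()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

The sketch rejects a second write for the same id, which is exactly the
random-write case that would not work.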


>
>
>> Just keep in mind that Osmosis stores aren't particularly fast to query
>> because they're based on very simple data structures.  They tend to
>> result in huge amounts of disk seeks when processing, so there may be
>> libraries out there that perform better.  The main reason they were
>> originally developed was to minimise external library dependencies and I
>> haven't revisited that decision since Osmosis put on weight (ie. it now
>> relies on many third-party jars).
>>
> Thinking about all this I find that we're re-inventing the wheel. I'll try
> to use JavaDB as the backend store. It is written entirely in Java and
> thus cross-platform compatible, supports btree indexes on multiple fields and
> can reside both in memory and on disk. If it turns out to be fast enough,
> it may be a good alternative to a custom binary file/memory store.


I hope it works out because I've been down a similar path here.  After I
gave up on custom stores I tried Berkeley DB Java edition and performance
was horrible.  I finally bit the bullet and went the PostgreSQL path and
created the pgsql tasks.  I hope JavaDB works out though because requiring a
full database server really complicates usage.

Be very careful with btree indexes on multiple fields because they usually
only work well for exact-match lookups on the indexed fields.  If you ever
need to use range queries (eg. timestamp range) involving multiple fields
they tend to fall down.  I suspect you'll be just as well off using a single
index on the id and not worrying about the other columns, because the number
of rows with a single id will be small and the index will be much more
compact that way.
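
The single-index approach can be sketched in plain Java as follows. This is
only an illustration of the access pattern, not JavaDB or Osmosis code: the
class IdIndexedVersions, the Row type and the method names are made up, and
a TreeMap stands in for a btree index on the id column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical table with a single index on id. The timestamp is not
// indexed; it is filtered after the id lookup.
final class IdIndexedVersions {
    // Minimal stand-in for a table row.
    static final class Row {
        final long id;
        final int version;
        final long timestamp;

        Row(long id, int version, long timestamp) {
            this.id = id;
            this.version = version;
            this.timestamp = timestamp;
        }
    }

    // The only index: id -> all rows with that id.
    private final Map<Long, List<Row>> byId = new TreeMap<>();

    void insert(Row row) {
        byId.computeIfAbsent(row.id, k -> new ArrayList<>()).add(row);
    }

    // Index lookup on id, then a linear scan over the few rows that share
    // the id to apply the timestamp range (both bounds inclusive).
    List<Row> findInRange(long id, long from, long to) {
        List<Row> result = new ArrayList<>();
        for (Row row : byId.getOrDefault(id, Collections.<Row>emptyList())) {
            if (row.timestamp >= from && row.timestamp <= to) {
                result.add(row);
            }
        }
        return result;
    }
}
```

The linear scan is cheap as long as each id has only a handful of versions,
which is the assumption behind the advice above.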

Brett

_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
