On Mon, Aug 17, 2009 at 09:27:11PM +0200, Lars Francke wrote:
> I'll need output in the following form:
> tag-key, number of changesets, nodes, relations and ways this key is
> used on, number of distinct values
> tag-value, the tag-key this value belongs to, number of changesets,
> nodes, relations and ways this value is used on
> 
> Additionally the following information would be nice:
> key/key combinations and how often these two keys are used together
> on changesets, nodes, relations and ways

I think the bad news is that this kind of job really needs to be done in RAM.
If you keep the counters on disk, you'll just be paging blocks in and out
all the time, because every tag you read updates an essentially random
counter.

I have a Perl job doing almost exactly what you want to do. It reads CSV
files (which have been generated in an earlier step from the planet XML),
does all the counting and spits out CSV again. I don't know how much
memory it needs at the moment (I should probably check that :-), but
it fits in the 16GB the machine has. It counts the number of nodes, ways
and relations having the same tag key, key-value combo and key-key combo.
It takes about three hours on my machine.
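
For illustration, here is a minimal sketch of that kind of counting job in
Python. It is not the Perl job itself. It assumes the input CSV has already
been parsed into (object type, tags dict) pairs; the function names and the
output column order are made up for this example. Changesets are not
counted, matching the job described above.

```python
import csv
from collections import Counter, defaultdict
from itertools import combinations


def count_tags(objects):
    """Count tag usage per object type ("node", "way" or "relation").

    `objects` is an iterable of (obj_type, tags) pairs, tags being a dict
    of key -> value. Returns three dicts mapping a key, a (key, value) pair
    and a (key1, key2) pair to a Counter of object types.
    """
    key_counts = defaultdict(Counter)
    kv_counts = defaultdict(Counter)
    kk_counts = defaultdict(Counter)
    for obj_type, tags in objects:
        for key, value in tags.items():
            key_counts[key][obj_type] += 1
            kv_counts[(key, value)][obj_type] += 1
        # Sort the keys so that (a, b) and (b, a) land in the same bucket.
        for k1, k2 in combinations(sorted(tags), 2):
            kk_counts[(k1, k2)][obj_type] += 1
    return key_counts, kv_counts, kk_counts


def write_key_csv(key_counts, kv_counts, out):
    """Write one row per key: key, nodes, ways, relations, distinct values."""
    distinct = Counter(key for key, _ in kv_counts)
    writer = csv.writer(out)
    for key in sorted(key_counts):
        c = key_counts[key]  # a missing type reads as 0 from a Counter
        writer.writerow([key, c["node"], c["way"], c["relation"], distinct[key]])
```

Usage, with three hand-made objects:

```python
import io
import sys

objs = [("node", {"highway": "bus_stop", "name": "Main St"}),
        ("way", {"highway": "residential", "name": "Main St"}),
        ("node", {"amenity": "cafe"})]
keys, kvs, kks = count_tags(objs)
write_key_csv(keys, kvs, sys.stdout)
```

All three tables live in RAM at once, which is exactly why the memory
question comes up.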

You could probably save RAM by using some kind of tighter data structure.
Depending on the programming language, you might be wasting huge amounts
of memory. But not everybody wants to write in C.
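
One example of a tighter structure, sketched in Python under the assumption
that only per-key counts are needed: store each key string once, map it to
a small integer id, and keep the counters in flat typed arrays (typically
4 bytes per counter with type code "I") instead of one dict or object per
key. The class name is hypothetical. In Python the key dict itself still
costs memory, so the saving is smaller than the same trick in C.

```python
from array import array


class CompactKeyCounter:
    """Per-key node/way/relation counters stored in flat typed arrays."""

    TYPE_INDEX = {"node": 0, "way": 1, "relation": 2}

    def __init__(self):
        self.ids = {}    # key string -> integer id
        self.keys = []   # integer id -> key string
        # One unsigned-int column per object type, indexed by key id.
        self.counts = [array("I") for _ in range(3)]

    def _id(self, key):
        i = self.ids.get(key)
        if i is None:
            # First time we see this key: assign the next id and
            # grow every column by one zeroed slot.
            i = len(self.keys)
            self.ids[key] = i
            self.keys.append(key)
            for column in self.counts:
                column.append(0)
        return i

    def add(self, obj_type, key):
        self.counts[self.TYPE_INDEX[obj_type]][self._id(key)] += 1

    def get(self, key):
        """Return (nodes, ways, relations) for a key."""
        i = self.ids[key]
        return tuple(column[i] for column in self.counts)
```

The same idea extends to key-value and key-key counters by interning the
pairs (or packing two key ids into one integer) instead of the key strings.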

If you absolutely can't get by with the RAM, you might need some kind of
least-recently-used (LRU) cache where you keep the counters for the most
often used tags in RAM and put the others on disk. Maybe use the results
from the previous run to optimize this. You might also get a performance
increase if you special-case certain tags. For instance, it doesn't
really make much sense to count how often the different values of the
'name' tag appear. There are other keys like this, such as the strange
id tags left over from imports.
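
A minimal sketch of such an LRU cache in Python, assuming counters are
addressed by a string such as "key=value" (a simplification that breaks if
a key contains "=") and that the on-disk part is any dict-like mapping of
str to int, for example a shelve.Shelf. The class, function and
SKIP_VALUE_KEYS names are hypothetical; 'name' is the only skipped key
taken from the text above.

```python
from collections import OrderedDict

# Keys whose individual values are not worth counting. Extend as needed.
SKIP_VALUE_KEYS = {"name"}


class LRUCounter:
    """Keep at most `capacity` counters in RAM (capacity must be >= 1);
    the least recently used ones are spilled to `store`."""

    def __init__(self, capacity, store):
        self.capacity = capacity
        self.store = store          # e.g. shelve.open("counters.db")
        self.hot = OrderedDict()    # in-RAM counters, oldest first

    def increment(self, name, amount=1):
        if name in self.hot:
            self.hot.move_to_end(name)  # mark as most recently used
        else:
            # Bring the counter back from disk (or start at 0) ...
            self.hot[name] = self.store.pop(name, 0)
            # ... and evict the least recently used one if RAM is full.
            if len(self.hot) > self.capacity:
                old_name, old_count = self.hot.popitem(last=False)
                self.store[old_name] = old_count
        self.hot[name] += amount

    def flush(self):
        """Write all in-RAM counters to the store, e.g. at the end of a run."""
        for name, count in self.hot.items():
            self.store[name] = count
        self.hot.clear()


def count_key_values(tags, counter):
    """Count key=value pairs, skipping the values of keys in SKIP_VALUE_KEYS."""
    for key, value in tags.items():
        if key in SKIP_VALUE_KEYS:
            continue
        counter.increment(f"{key}={value}")
```

Using the previous run's results could mean pre-loading the most frequent
counters into `hot` before starting, so the cache starts out warm.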

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298


_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
