Re: [OSM-dev] Update osm2pgsql patch mode

2008-09-01 Thread Brett Henderson
Brett Henderson wrote:
> Matt Amos wrote:
>> Martijn van Oosterhout wrote:
>>  
>>> In any case, the patch applying is fast enough as it is once it's going
>>> so maybe I'll make the modify-instead-of-add the default for the time
>>> being until a better implementation comes along.
>>> 
>>
>> yep. works for me here :-)
>>
>> daily diff with create-as-modify import took 1h10m, which is awesome.
>>
>> the only problem now is trying to import the 28-29 diff because of 
>> the UTF8 truncation issue. is the easiest way to deal with this just 
>> to wait for next week's planet? i imagine the alternative is fiddling 
>> around with the diffs by hand to make the string valid?
>>   
> Give me about half an hour and I'll have a new one ...
>
Okay, it was nearly finished so it took less time than expected.

TomH has fixed the dodgy record in the database and I've re-created the 
problematic daily and hourly changeset files.  The minute changeset had 
already been deleted so if anybody is waiting on one of those they'll 
have to catch up with daily files.

If anybody sees any further problems please let me know.

Cheers,
Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Update osm2pgsql patch mode

2008-09-01 Thread Brett Henderson
Matt Amos wrote:
> Martijn van Oosterhout wrote:
>   
>> In any case, the patch applying is fast enough as it is once it's going
>> so maybe I'll make the modify-instead-of-add the default for the time
>> being until a better implementation comes along.
>> 
>
> yep. works for me here :-)
>
> daily diff with create-as-modify import took 1h10m, which is awesome.
>
> the only problem now is trying to import the 28-29 diff because of the 
> UTF8 truncation issue. is the easiest way to deal with this just to wait 
> for next week's planet? i imagine the alternative is fiddling around 
> with the diffs by hand to make the string valid?
>   
Give me about half an hour and I'll have a new one ...




Re: [OSM-dev] Update osm2pgsql patch mode

2008-09-01 Thread Martijn van Oosterhout
On Mon, Sep 1, 2008 at 10:12 AM, Matt Amos <[EMAIL PROTECTED]> wrote:
> yep. works for me here :-)
>
> daily diff with create-as-modify import took 1h10m, which is awesome.
>
> the only problem now is trying to import the 28-29 diff because of the UTF8
> truncation issue. is the easiest way to deal with this just to wait for next
> week's planet? i imagine the alternative is fiddling around with the diffs
> by hand to make the string valid?

I think Frederik posted a fixed version on the list recently. It's
also available here:
http://hypercube.telascience.org/planet/

Have a nice day,
-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/



Re: [OSM-dev] Update osm2pgsql patch mode

2008-09-01 Thread Matt Amos
Martijn van Oosterhout wrote:
> In any case, the patch applying is fast enough as it is once it's going
> so maybe I'll make the modify-instead-of-add the default for the time
> being until a better implementation comes along.

yep. works for me here :-)

daily diff with create-as-modify import took 1h10m, which is awesome.

the only problem now is trying to import the 28-29 diff because of the 
UTF8 truncation issue. is the easiest way to deal with this just to wait 
for next week's planet? i imagine the alternative is fiddling around 
with the diffs by hand to make the string valid?
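
For anyone who does end up fixing a diff by hand, here is a minimal Python sketch of what "making the string valid" could look like. It assumes the damage is a multi-byte UTF-8 sequence cut off at the end of a value, which the thread does not spell out. The function name and the sample bytes are made up for illustration; nothing here comes from osm2pgsql or osmosis.

```python
def trim_truncated_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, dropping an incomplete multi-byte sequence at the end.

    Hypothetical helper: it only repairs a truncated *trailing* sequence and
    re-raises for invalid bytes anywhere else in the input.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # "unexpected end of data" means the input stopped mid-character.
        if err.end == len(raw) and err.reason == "unexpected end of data":
            return raw[:err.start].decode("utf-8")
        raise


# "é" is the two bytes C3 A9; cutting after C3 leaves an invalid string.
sample = "café".encode("utf-8")[:-1]
print(trim_truncated_utf8(sample))  # caf
```

Dropping the partial character loses data, so this is only a way to get an import past the bad record, not a substitute for a corrected diff.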

cheers,

matt




Re: [OSM-dev] Update osm2pgsql patch mode

2008-09-01 Thread Martijn van Oosterhout
On Mon, Sep 1, 2008 at 1:57 AM, Robert (Jamie) Munro <[EMAIL PROTECTED]> wrote:
>> FWIW, that's exactly what osmosis does.  It treats create and modify
>> identically when applying changes.  I go to a fair bit of trouble when
>> creating a changeset file to get the create versus modify correct
>> (including cases where an entity is deleted and re-created) but it all
>> falls over when applying changes onto a planet file where you have to
>> overlap a time period to get a consistent snapshot.
>
> In theory, you only need to do this once on a planet file. After that,
> everything should be consistent and remain consistent, and it's probably
> worth putting the checks back in, because a subsequent error would imply
> a deeper problem with either the diffs or with the merge routine.

It's also bad performance-wise because for osm2pgsql deleting
something requires anywhere from 3 to 7 queries to delete stuff from
various places.

I've been thinking of adding a --safe mode which will make it ignore
errors, which will be needed for the first patch only. It would default
to off for JOSM format (planet dump) and on for osmChange. Also, a lot
of the time is taken by building indexes and by the whole clustering
bit, which I'm currently doubtful really helps.
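
A rough Python sketch of the behaviour being proposed, using a dict as a stand-in for a database table. The `safe` parameter, the exception and the function name are all hypothetical; the option was only an idea at this point in the thread, so this is not how osm2pgsql works.

```python
class DuplicateEntity(Exception):
    """Raised in strict mode when a create hits an ID that already exists."""


def apply_create(table: dict, entity_id: int, data: dict, safe: bool) -> None:
    """Apply one <create> to an in-memory stand-in for a database table.

    `safe` is a hypothetical flag mirroring the proposed --safe option:
    when set, a conflicting row is replaced instead of aborting the run.
    """
    if entity_id in table and not safe:
        raise DuplicateEntity(f"entity {entity_id} already exists")
    table[entity_id] = data


nodes = {291466881: {"version": 1}}      # already loaded from the planet dump

# First patch after a planet import: overlaps are expected, so be lenient.
apply_create(nodes, 291466881, {"version": 1}, safe=True)

# Later patches: a duplicate now signals a real problem, so fail loudly.
try:
    apply_create(nodes, 291466881, {"version": 1}, safe=False)
except DuplicateEntity as err:
    print("strict mode:", err)
```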

In any case, the patch applying is fast enough as it is once it's going
so maybe I'll make the modify-instead-of-add the default for the time
being until a better implementation comes along.

Have a nice day,
-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/



Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Brett Henderson
On Mon, Sep 1, 2008 at 9:57 AM, Robert (Jamie) Munro <[EMAIL PROTECTED]> wrote:

>
> In theory, you only need to do this once on a planet file. After that,
> everything should be consistent and remain consistent, and it's probably
> worth putting the checks back in, because a subsequent error would imply
> a deeper problem with either the diffs or with the merge routine.


Yep, that was my original thinking and it used to be that way within
osmosis.  To get around this problem and keep things simple I made the
changeset code more lenient.  Now that it's starting to get widely used and
more people understand how it works it might make sense to make the process
stricter.  I guess it all depends on how critical the downstream accuracy
is.

At the moment the production db has a number of quirks that make any process
likely to have some flaws.  UTF-8 encoding issues are one; lack of
transactional integrity between current and history tables is another.  I
think efforts should be focused there first.  Once that is rock solid then
we can look at tightening up replication mechanisms.


Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Robert (Jamie) Munro

Brett Henderson wrote:
> On Mon, Sep 1, 2008 at 6:50 AM, Martijn van Oosterhout
> <[EMAIL PROTECTED] > wrote:
> 
> Umm, yeah. There's that. The way I solved it was with the patch below,
> which is a gross hack but it works. Basically it turns every create
> into a modify so it deletes any conflicting rows before inserting. It
> may be the only way, but I'm still thinking on it...
> 
> 
> FWIW, that's exactly what osmosis does.  It treats create and modify
> identically when applying changes.  I go to a fair bit of trouble when
> creating a changeset file to get the create versus modify correct
> (including cases where an entity is deleted and re-created) but it all
> falls over when applying changes onto a planet file where you have to
> overlap a time period to get a consistent snapshot.

In theory, you only need to do this once on a planet file. After that,
everything should be consistent and remain consistent, and it's probably
worth putting the checks back in, because a subsequent error would imply
a deeper problem with either the diffs or with the merge routine.

Robert (Jamie) Munro



Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Brett Henderson
On Mon, Sep 1, 2008 at 8:57 AM, Karl Newman <[EMAIL PROTECTED]> wrote:

>
> I'm curious why the planet dump doesn't cut off at midnight, so you
> wouldn't get these duplicates. Is it because it's way more expensive to
> compare timestamps? I doubt it, because the minutely diff generation can run
> through the timestamps in the entire database in a few seconds. Am I missing
> something, or is it for hysterical raisins (as TomH likes to say) and nobody
> ever bothered to change it?
>

The current planet dump just dumps the current tables; it doesn't look at the
history tables, which is the only way to avoid this.  It also introduces data
inconsistencies because nodes, ways and relations will be read at different
times.  Osmosis does have a task for producing a consistent snapshot at a
point in time using the history tables, but it isn't used in production.  I
originally wrote it with the idea of replacing the current planet process,
but it isn't very fast and isn't appropriate for running against the main
database.

The current method isn't ideal but it is fast and you can obtain a
consistent snapshot by then applying an osmosis changeset that overlaps the
period when the planet was generated.
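
A toy Python sketch of why the overlap works, assuming changes are replayed leniently (create and modify treated alike, deletes of missing entities ignored). The data and the function name are invented for illustration, and entities are reduced to an id and a version number.

```python
# Toy model: entity id -> version. All data below is invented for illustration.

def apply_changes(snapshot: dict, changes: list) -> dict:
    """Replay (action, id, version) changes leniently onto a snapshot copy."""
    result = dict(snapshot)
    for action, entity_id, version in changes:
        if action == "delete":
            result.pop(entity_id, None)      # already gone is fine
        else:                                # create and modify handled alike
            result[entity_id] = version
    return result


# Changes made while the dump was running (the overlap period), in order.
overlap = [
    ("create", 10, 1),
    ("modify", 10, 2),
    ("create", 11, 1),
    ("delete", 11, 1),
]

# Two dumps that read the tables at different moments see different states...
dump_early = {}                 # read before any of the changes
dump_late = {10: 1, 11: 1}      # read part-way through, mixing old and new rows

# ...but replaying the whole overlap brings both to the same end state.
print(apply_changes(dump_early, overlap))  # {10: 2}
print(apply_changes(dump_late, overlap))   # {10: 2}
```

The result only depends on the changes in the overlap, not on exactly when each table was read, as long as the changeset starts before the dump began and ends after it finished.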


Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Brett Henderson
On Mon, Sep 1, 2008 at 6:50 AM, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:

> Umm, yeah. There's that. The way I solved it was with the patch below,
> which is a gross hack but it works. Basically it turns every create
> into a modify so it deletes any conflicting rows before inserting. It
> may be the only way, but I'm still thinking on it...
>

FWIW, that's exactly what osmosis does.  It treats create and modify
identically when applying changes.  I go to a fair bit of trouble when
creating a changeset file to get the create versus modify correct (including
cases where an entity is deleted and re-created) but it all falls over when
applying changes onto a planet file where you have to overlap a time period
to get a consistent snapshot.


Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Karl Newman
On Sun, Aug 31, 2008 at 1:50 PM, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:

> Umm, yeah. There's that. The way I solved it was with the patch below,
> which is a gross hack but it works. Basically it turns every create
> into a modify so it deletes any conflicting rows before inserting. It
> may be the only way, but I'm still thinking on it...
>
> Have a nice day,
>
> Index: osm2pgsql.c
> ===
> --- osm2pgsql.c (revision 10079)
> +++ osm2pgsql.c (working copy)
> @@ -278,6 +278,7 @@
> xmlFree(xtype);
> } else if (xmlStrEqual(name, BAD_CAST "create")) {
> action = ACTION_CREATE;
> +action = ACTION_MODIFY;
> } else if (xmlStrEqual(name, BAD_CAST "modify")) {
> action = ACTION_MODIFY;
> } else if (xmlStrEqual(name, BAD_CAST "delete")) {
>
>
> On Sun, Aug 31, 2008 at 7:50 PM, Matt Amos <[EMAIL PROTECTED]> wrote:
> > Martijn van Oosterhout wrote:
> >>
> >> On Fri, Aug 29, 2008 at 5:23 PM, Robert (Jamie) Munro <[EMAIL PROTECTED]>
> >> wrote:
> >>>
> >>> Great progress - that's now about 14 times real time.
> >>>
> >>> How much disc space does the whole operation require?
> >>
> >> On this system 42.5GB. Transient may be more, not sure.
> >
> > on this system it took 22.8h for the initial planet import, and it is
> > currently consuming 44Gb. from munin graphs it seems that max usage was
> > roughly 9Gb more.
> >
> > when i try to load the next day's osc i get the following error:
> >
> > Reading in file: /home/osm/planets/20080827-20080828.osc.gz
> > Processing: Node(50k) Way(0k) Relation(0k)insert_node failed: ERROR:
> > duplicate key value violates unique constraint "planet_osm_nodes_pkey"
> > (7)
> > Arguments were: 291466881, 6319926.5639616642, -13695169.0319488477, (null),
> > Error occurred, cleaning up
> >
> > the node with that ID seems to have been created just after midnight and
> > appears in both the planet dump and daily diff. is there any way around
> > this, or should i just import the hourly diffs from 1am instead?
> >
> > cheers,
> >
> > matt
> >
>

I'm curious why the planet dump doesn't cut off at midnight, so you wouldn't
get these duplicates. Is it because it's way more expensive to compare
timestamps? I doubt it, because the minutely diff generation can run through
the timestamps in the entire database in a few seconds. Am I missing
something, or is it for hysterical raisins (as TomH likes to say) and nobody
ever bothered to change it?

Karl


Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Martijn van Oosterhout
Umm, yeah. There's that. The way I solved it was with the patch below,
which is a gross hack but it works. Basically it turns every create
into a modify so it deletes any conflicting rows before inserting. It
may be the only way, but I'm still thinking on it...

Have a nice day,

Index: osm2pgsql.c
===
--- osm2pgsql.c (revision 10079)
+++ osm2pgsql.c (working copy)
@@ -278,6 +278,7 @@
 xmlFree(xtype);
 } else if (xmlStrEqual(name, BAD_CAST "create")) {
 action = ACTION_CREATE;
+action = ACTION_MODIFY;
 } else if (xmlStrEqual(name, BAD_CAST "modify")) {
 action = ACTION_MODIFY;
 } else if (xmlStrEqual(name, BAD_CAST "delete")) {
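
To illustrate what turning a create into a modify means at the SQL level, here is a minimal Python sketch using the standard-library sqlite3 module instead of PostgreSQL. The table layout, the function names and the coordinates are assumptions for illustration, not osm2pgsql's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for a node table with a primary key on id.
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")


def insert_node(node_id: int, lat: float, lon: float) -> None:
    """Plain create: fails if the id is already present."""
    conn.execute("INSERT INTO nodes VALUES (?, ?, ?)", (node_id, lat, lon))


def modify_node(node_id: int, lat: float, lon: float) -> None:
    """Create-as-modify: delete any conflicting row, then insert."""
    conn.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
    insert_node(node_id, lat, lon)


insert_node(291466881, 49.0, -123.0)          # row from the planet dump

try:
    insert_node(291466881, 49.0, -123.0)      # same node again from the diff
except sqlite3.IntegrityError as err:
    print("create failed:", err)

modify_node(291466881, 49.0, -123.0)          # succeeds, one row remains
print(conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0])  # 1
```

The price is an extra DELETE for every create, including the common case where there is nothing to delete.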


On Sun, Aug 31, 2008 at 7:50 PM, Matt Amos <[EMAIL PROTECTED]> wrote:
> Martijn van Oosterhout wrote:
>>
>> On Fri, Aug 29, 2008 at 5:23 PM, Robert (Jamie) Munro <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Great progress - that's now about 14 times real time.
>>>
>>> How much disc space does the whole operation require?
>>
>> On this system 42.5GB. Transient may be more, not sure.
>
> on this system it took 22.8h for the initial planet import, and it is
> currently consuming 44Gb. from munin graphs it seems that max usage was
> roughly 9Gb more.
>
> when i try to load the next day's osc i get the following error:
>
> Reading in file: /home/osm/planets/20080827-20080828.osc.gz
> Processing: Node(50k) Way(0k) Relation(0k)insert_node failed: ERROR:
> duplicate key value violates unique constraint "planet_osm_nodes_pkey"
> (7)
> Arguments were: 291466881, 6319926.5639616642, -13695169.0319488477, (null),
> Error occurred, cleaning up
>
> the node with that ID seems to have been created just after midnight and
> appears in both the planet dump and daily diff. is there any way around
> this, or should i just import the hourly diffs from 1am instead?
>
> cheers,
>
> matt
>



-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/



Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-31 Thread Matt Amos
Martijn van Oosterhout wrote:
> On Fri, Aug 29, 2008 at 5:23 PM, Robert (Jamie) Munro <[EMAIL PROTECTED]> 
> wrote:
>> Great progress - that's now about 14 times real time.
>>
>> How much disc space does the whole operation require?
> 
> On this system 42.5GB. Transient may be more, not sure.

on this system it took 22.8h for the initial planet import, and it is 
currently consuming 44GB. from munin graphs it seems that max usage was 
roughly 9GB more.

when i try to load the next day's osc i get the following error:

Reading in file: /home/osm/planets/20080827-20080828.osc.gz
Processing: Node(50k) Way(0k) Relation(0k)insert_node failed: ERROR: 
duplicate key value violates unique constraint "planet_osm_nodes_pkey"
(7)
Arguments were: 291466881, 6319926.5639616642, -13695169.0319488477, (null),
Error occurred, cleaning up

the node with that ID seems to have been created just after midnight and 
appears in both the planet dump and daily diff. is there any way 
around this, or should i just import the hourly diffs from 1am instead?

cheers,

matt



Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-29 Thread Martijn van Oosterhout
On Fri, Aug 29, 2008 at 5:23 PM, Robert (Jamie) Munro <[EMAIL PROTECTED]> wrote:
> Great progress - that's now about 14 times real time.
>
> How much disc space does the whole operation require?

On this system 42.5GB. Transient may be more, not sure.

Have a nice day,
-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/



Re: [OSM-dev] Update osm2pgsql patch mode

2008-08-29 Thread Robert (Jamie) Munro

Martijn van Oosterhout wrote:
> Due to the generous contribution of a (loan of a) machine from Richard
> Duivenvoorde I had the opportunity to examine the bottlenecks in the
> osm2pgsql patching process. I fixed a number of stupid problems and
> switched to GIN indexes. It now takes on this machine 24 hours to load
> the planet dump (!) (that's with 1.5GB cache) but it can apply a daily
> diff in 100 minutes and a minute diff in seconds.
> 
> I hope to be able to analyze exactly where the time is going over the
> next while to see what improvements can be made (I'm sure there's
> lots). I'll let you know how it goes.

Great progress - that's now about 14 times real time.
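
That figure follows from the numbers quoted above: one day of changes applied in 100 minutes. A quick check in Python (the helper name is just for illustration):

```python
def realtime_factor(covered_minutes: float, processing_minutes: float) -> float:
    """How many times faster than real time the data is processed."""
    return covered_minutes / processing_minutes


minutes_per_day = 24 * 60                           # 1440
print(realtime_factor(minutes_per_day, 100))        # 14.4
```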

How much disc space does the whole operation require?

Robert (Jamie) Munro


