Re: [OSM-talk-be] CRAB Import Tool

2013-10-22 Thread Kurt Roeckx
On Mon, Oct 21, 2013 at 10:45:22PM +0200, Kurt Roeckx wrote:
> On Mon, Oct 21, 2013 at 10:06:03PM +0200, Kurt Roeckx wrote:
> > I really see no good reason not to add those IDs at this point.
> > I don't see the harm in them.  I can only see them being useful.
> 
> I would actually want to propose a different import strategy:
> - Add the CRAB IDs to all existing addresses in Flanders
> - Import the rest or large parts of CRAB in one big import

So after feedback on this, I want to propose that instead of
actually importing this ourselves, we publish the data that this
import tool would generate in such a way that it's easy for people
to take the data and import it themselves, potentially after fixing
things.

This would make it easier to improve the import tool after getting
feedback on what it generates incorrectly.


Kurt


___
Talk-be mailing list
Talk-be@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-be


Re: [OSM-talk-be] CRAB Import Tool

2013-10-22 Thread Jo
FWIW I also see value in adding a backreference to CRAB in the OSM data. It
will make it a lot easier to do automated follow up and comparison in the
long run.

I also see value in the slow but steady way of having contributors
integrate the data. It's slower by an order of magnitude, but indeed
community is more important than content in the long run.

Jo


2013/10/22 Kurt Roeckx 

> On Tue, Oct 22, 2013 at 05:36:28AM +0200, Marc Gemis wrote:
> > So you are going to write an algorithm that matches addresses in OSM
> > with addresses in Crab in order to add an id. Right now there are already
> > addresses in OSM that are not in Crab. The same might happen next year.
> > People might have added POIs with addresses. So you will always need an
> > address based matching algorithm. So there is no reason to add the Crab
> > id in OSM.
>
> I don't follow your reasoning here.  Addresses in OSM but not CRAB
> shouldn't be a problem.  I also don't understand your comment
> about POIs with an address.
>
> Just because you can match a lot of the addresses with an
> algorithm doesn't mean that you can find all of them.  This *will*
> require people to manually fix things and manually add the relation
> between the 2.
>
> > What do you mean by "Fix our data" ? Is Crab suddenly the holy grail ?
>
> I didn't say anything like that.  I'm just saying that our data *does*
> contain errors.  I'm also pretty sure theirs contains errors.  If
> we look at the differences we need to find out which one is correct,
> and then try to get the correct information into both.
>
> > Their DB contains mistakes as well. I'm against a full automatic import.
> > I'm still in favor of the workflow that Ben proposes. Using a website to
> > download a street. Manually merging with existing data, drawing
> > buildings, merging or splitting buildings where needed. Who wrote a few
> > days back that house nodes without buildings are not so good (I'm not
> > saying it was you)? An automatic import cannot prevent that.
>
> I did say that I would prefer that the address information be added
> to the building, but that just having a housenumber and no building
> is better than nothing.
>
> I also don't see myself drawing all the buildings when I'm going
> to import the address information because it will take a lot more
> time.  But I will at some point draw them.  You might have
> different priorities than I.
>
> If I were to write an import tool, I would be careful about when to
> import something and, when in doubt, not import the address.  I
> already have several rules in my head that could be useful.  But it
> looks to me like nobody wants me to do this, so I'm not going to
> put any effort into this.
>
> > It would be nice though to have something like Jo did for the bus stops.
> > Have a table for mismatches between the OSM data and the imported data.
> > Such a list could be generated every year to see which data should be
> > added or updated.
>
> AGIV delivers updated files on a daily basis.  It should not be a
> problem to also compare them daily and to update the list of nodes
> that still need to be imported just as often.
> But I don't see myself putting time into this if there is no
> relation between the 2 databases.
>
>
> Kurt
>
>


Re: [OSM-talk-be] CRAB Import Tool

2013-10-22 Thread Kurt Roeckx
On Tue, Oct 22, 2013 at 05:36:28AM +0200, Marc Gemis wrote:
> So you are going to write an algorithm that matches addresses in OSM with
> addresses in Crab in order to add an id. Right now there are already
> addresses in OSM that are not in Crab. The same might happen next year.
> People might have added POIs with addresses. So you will always need an
> address based matching algorithm. So there is no reason to add the Crab id
> in OSM.

I don't follow your reasoning here.  Addresses in OSM but not CRAB
shouldn't be a problem.  I also don't understand your comment
about POIs with an address.

Just because you can match a lot of the addresses with an
algorithm doesn't mean that you can find all of them.  This *will*
require people to manually fix things and manually add the relation
between the 2.
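
To illustrate why algorithmic matching only gets part of the way, here is a minimal Python sketch. It is not the CRAB import tool's code; the record fields (`osm_id`, `crab_id`, `postcode`, `street`, `housenumber`) and the function names are hypothetical and chosen only for this example. It pairs records whose normalized address occurs exactly once on each side and sets everything else aside for manual review.

```python
from collections import defaultdict


def normalize(rec):
    """Build a comparison key from hypothetical postcode/street/housenumber fields."""
    street = " ".join(rec["street"].casefold().split())  # ignore case and extra spaces
    return (rec["postcode"].strip(), street, rec["housenumber"].strip().upper())


def match_addresses(osm_addresses, crab_addresses):
    """Pair OSM and CRAB records that share a normalized address key.

    Only keys that occur exactly once on each side are matched automatically;
    every other key is returned so a person can look at it.
    """
    osm_by_key = defaultdict(list)
    crab_by_key = defaultdict(list)
    for rec in osm_addresses:
        osm_by_key[normalize(rec)].append(rec)
    for rec in crab_addresses:
        crab_by_key[normalize(rec)].append(rec)

    matched, needs_review = [], []
    for key in sorted(set(osm_by_key) | set(crab_by_key)):
        osm_hits = osm_by_key.get(key, [])
        crab_hits = crab_by_key.get(key, [])
        if len(osm_hits) == 1 and len(crab_hits) == 1:
            matched.append((osm_hits[0]["osm_id"], crab_hits[0]["crab_id"]))
        else:
            needs_review.append(key)  # missing on one side, or ambiguous
    return matched, needs_review
```

Addresses that exist in only one source, or that appear more than once, end up in `needs_review`; that is the part that still needs people.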

> What do you mean by "Fix our data" ? Is Crab suddenly the holy grail ?

I didn't say anything like that.  I'm just saying that our data *does*
contain errors.  I'm also pretty sure theirs contains errors.  If
we look at the differences we need to find out which one is correct,
and then try to get the correct information into both.

> Their DB contains mistakes as well. I'm against a full automatic import.
> I'm still in favor of the workflow that Ben proposes. Using a website to
> download a street. Manually merging with existing data, drawing buildings,
> merging or splitting buildings where needed. Who wrote a few days back that
> house nodes without buildings are not so good (I'm not saying it was you)?
> An automatic import cannot prevent that.

I did say that I would prefer that the address information be added
to the building, but that just having a housenumber and no building
is better than nothing.

I also don't see myself drawing all the buildings when I'm going
to import the address information because it will take a lot more
time.  But I will at some point draw them.  You might have
different priorities than I.

If I were to write an import tool, I would be careful about when to
import something and, when in doubt, not import the address.  I
already have several rules in my head that could be useful.  But it
looks to me like nobody wants me to do this, so I'm not going to
put any effort into this.

> It would be nice though to have something like Jo did for the bus stops.
> Have a table for mismatches between the OSM data and the imported data.
> Such a list could be generated every year to see which data should be added
> or updated.

AGIV delivers updated files on a daily basis.  It should not be a
problem to also compare them daily and to update the list of nodes
that still need to be imported just as often.
But I don't see myself putting time into this if there is no
relation between the 2 databases.
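
As a rough sketch of what such a daily comparison could look like once OSM objects carry a CRAB ID: the function below only compares two sets of IDs. How those IDs are read from the AGIV files or from OSM is left out, and the function and key names are made up for this example.

```python
def daily_report(crab_ids_today, crab_ids_in_osm):
    """Compare today's CRAB IDs with the CRAB IDs already referenced in OSM."""
    today = set(crab_ids_today)
    in_osm = set(crab_ids_in_osm)
    return {
        # present in CRAB, not yet referenced from OSM
        "to_import": sorted(today - in_osm),
        # referenced from OSM, but no longer present in CRAB
        "gone_from_crab": sorted(in_osm - today),
    }
```

Without a stored ID on the OSM side, each daily run would have to repeat the address matching instead of this simple set difference.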


Kurt




Re: [OSM-talk-be] Why I'm against a full automatic import (ook in het Nederlands) (Was CRAB Import Tool)

2013-10-22 Thread Glenn Plas

English only since I've slept minus 3 hours. Sorry.

I can't help but agree fully.  As an analogy: as we speak, I'm trying to
merge a POI list of over 1000 POIs with my own version of
this database, using the customer's foreign key to make this happen (for
an API).  I'm actually pretty much against this action, but there are few
alternatives since it was poorly planned by the customer.  I'm doing a
full merge, so coordinates will come from mine and labels from the
customer.  (They are actually bus stops: public ones that are used by
private transport in Antwerp, by collective transport services by BASF,
etc.)


This is only 1000 POIs and it's merger's hell.  Arrival times differ
between the two versions, and even validity is offset.  Sometimes
important stuff has been put in a comment field.  This reminds me how
hard it is to automate these things.  I've done jobs like that before,
but here I am using an Excel sheet (the customer's dump) and an SQLite
database to construct this single table instead...


I just cannot turn my common sense about how to handle each record
into an algorithm.


The reason why I'm against such an action is that I cannot trust
either source of the data to be consistent.  You might manage to
import 30%, even 60% without problems, but it will be like the 80/20
rule: 20% of the effort and time will be spent importing 80%
of the data and, vice versa, you'll spend 80% of your time on the
remaining 20% of the data.  You might also end up spending weeks using
Overpass to correct OSM before the import.  I've done a lot of those and
I assure you, it's just crazy the kind of mistakes you find in OSM alone.
addr:postcode=Zemst, addr:city=1980 is the first one that pops into my
mind.  I've done lots of corrections like that.  But that is just one of
many idiotic things: honest mistakes and ignorance at work, all
well-meant efforts made with the best intentions.
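
A check for that particular mistake can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and it assumes a Belgian postcode is four digits with a non-zero first digit and that the tags are available as a plain dict.

```python
import re

# Assumption: Belgian postcodes are four digits and do not start with 0.
BELGIAN_POSTCODE = re.compile(r"^[1-9]\d{3}$")


def swapped_postcode_city(tags):
    """Return True if addr:city looks like a postcode and addr:postcode does not."""
    postcode = tags.get("addr:postcode", "").strip()
    city = tags.get("addr:city", "").strip()
    return (
        postcode != ""
        and BELGIAN_POSTCODE.match(city) is not None
        and BELGIAN_POSTCODE.match(postcode) is None
    )


def fix_swapped(tags):
    """Return a copy of the tags with the two values exchanged when they look swapped."""
    if not swapped_postcode_city(tags):
        return dict(tags)
    fixed = dict(tags)
    fixed["addr:postcode"] = tags["addr:city"].strip()
    fixed["addr:city"] = tags["addr:postcode"].strip()
    return fixed
```

For example, `fix_swapped({"addr:postcode": "Zemst", "addr:city": "1980"})` exchanges the two values and leaves correctly tagged objects untouched. Each mistake of a different kind needs its own rule like this, which is where the time goes.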


So in the end, we'll need something like Ben proposed.  It's a lot more 
fun indeed and it gives ownership,  just perfect.


Glenn

On 2013-10-22 07:12, Marc Gemis wrote:

Dutch below

Allow me to explain why I'm against a full automatic import of the 
Crab data, as proposed on this mailing list


I understand that this is the fastest way to get the data into OSM and 
ready for use by everybody.
However, the data will then be owned by 1 or 2 people that did the 
import. They will not be able to cope with the consequences of the 
data they imported. The import software will have some flaws (double 
addresses, missing buildings, bad buildings, problems with 
associatedStreet merging, etc.)

Will you clean up the mess that others made ?

If, on the other hand, you allow people to import their own chunks of 
data (via the tool made by the French), a lot of people "own" the data. 
Every contributor takes some pride in the data s/he added and will be 
glad to make corrections to it. Even during the initial import, 
improvements to the imported & existing data will be made. The more 
people that do this, the better.


It's all about community building. Build a community around this 
import. This community will do other things as well afterwards.


You can hear the same message in all presentations on imports at the 
SOTM US and SOTM conferences. Please take a look at those videos.



- Nederlands (in English)---
Allow me to explain why I am against a fully automated import of the 
Crab data, as was proposed somewhere.


I understand that some people want to get the data into OSM quickly, so 
that it can be used by everyone. The consequence is that the data will 
have been created by 1 or 2 people. They cannot solve all the small 
problems that arise from this import. I am thinking of bugs in the 
software that cause duplicate addresses or problems with the 
associatedStreet relations. Also, nothing is done during the import 
about missing or incorrect buildings. Who is going to tackle those 
problems that were created by others?


If, on the other hand, you allow everyone to import small pieces of 
data and improve them immediately, you get a group of people who 
own/maintain the data. These people will, in a sense, be proud of their 
work and will try to weed out the errors. The more of these people, the 
better.


So it is about building a community. Please build a community around 
this import. In the longer term OSM will benefit from it.


I believe I also hear this message in all the presentations about 
imports given at the SOTM US and SOTM conferences. Just have a look at 
those videos. (They are in English, though.)


regards

m

