Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Jason Reid
James Ewen wrote:
> Okay, 'splain this to me...
>
> Let's just say that we have the whole of Canada imported from Geobase.
> How does an update get processed? ie. Geobase makes a big update to
> its database, and we want to update OSM from that? Will there need to
> be some ID tag on each way so that we don't just create a whole new
> layer under/over the old?
>
> If this is the case, we need to ensure that these ID tags stay with
> the ways. If someone goes in, and joins two ways that make up the
> street in front of his house, because he thinks it really shouldn't be
> two ways between the three nodes, is this going to be a problem?
>
> If we do a mass import, but skip well defined areas like Toronto, does
> this mean that it's going to need to be skipped forever since the
> Geobase tags won't be integrated into the skipped area ever?
>
> Whatever it means, I'm itching to get more data stuffed into Alberta!
> I've burnt a couple thousand dollars worth of fuel just putting some
> of the major highways into the database. I'm hoping there's going to
> be some way to define an area, and then ask for a Geobase import to be
> done on that area, and then I just need to go into that area and make
> sure things imported properly, watching for any errors/duplication.
>
> James
> VE6SRV
>
> ___
> Talk-ca mailing list
> Talk-ca@openstreetmap.org
> http://lists.openstreetmap.org/listinfo/talk-ca
>   
1. Yes. Geobase assigns each node/line in its data set a unique ID for 
this purpose, so we'll just use those.
2. Yes, it's going to be the single issue that may well prevent doing 
much more than updating properties of ways/nodes that have tags from a 
previous import. And even that could be full of problems. There's a huge 
potential to cause problems even without modifying the Geobase data in 
OSM, just by adding things connected to it.
3. Yes and no. The idea is to skip areas with data until someone 
volunteers to hand massage that data into place. That would pretty much 
involve loading both the current OSM and Geobase data for an area into 
the editor, and picking through it to remove the duplicates. Some of 
this could be automated to an extent, but it's really going to be a 
manual process.
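
To make points 1 and 2 concrete, here is a minimal sketch of how an update might be planned if every imported way carried its Geobase ID as a tag. The tag key `geobase:id`, the function name and the data layout are assumptions for illustration only; nothing in this thread fixes them.

```python
# Hypothetical tag key: the thread does not say what the ID tag would be called.
ID_TAG = "geobase:id"


def plan_update(osm_ways, geobase_features):
    """Compare a new Geobase release against ways already in OSM.

    osm_ways: list of dicts such as {"id": 1, "tags": {...}}
    geobase_features: dict mapping Geobase ID -> dict of tags
    """
    # Index the existing OSM ways by the Geobase ID they carry, if any.
    by_gid = {}
    for way in osm_ways:
        gid = way["tags"].get(ID_TAG)
        if gid is not None:
            by_gid.setdefault(gid, []).append(way)

    to_add, to_update, needs_review = [], [], []
    for gid, new_tags in geobase_features.items():
        matches = by_gid.get(gid, [])
        if not matches:
            # Never imported, or the ID was lost when someone edited the way.
            to_add.append(gid)
        elif len(matches) > 1:
            # The way was split in OSM, so one ID now sits on several ways.
            needs_review.append(gid)
        else:
            way = matches[0]
            changed = {k: v for k, v in new_tags.items()
                       if way["tags"].get(k) != v}
            if changed:
                to_update.append((way["id"], changed))

    # IDs still tagged in OSM that the new release no longer contains.
    orphaned = sorted(set(by_gid) - set(geobase_features))
    return to_add, to_update, needs_review, orphaned
```

The weak spot is the one raised in question 2: if two ways are joined and one ID is dropped, that ID lands in `to_add` and would be imported a second time unless someone checks it by hand.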

-Jason Reid

___
Talk-ca mailing list
Talk-ca@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-ca


Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Sam Vekemans
http://wiki.openstreetmap.org/wiki/Geobase_NRN_-_OSM_Map_Feature
I created the wiki page for it, and added the 7 different areas of Geobase
data.
Going through the PDF, wow, that's a lot to sink in. :)

I messaged another user of Geobase data I found by Googling, so perhaps more
help can be found out there.
It looks like the chart would show the direct comparison of each road type
and feature to what OSM does.

But shouldn't be too hard. :)

On Wed, Nov 26, 2008 at 7:42 PM, Steve Singer <[EMAIL PROTECTED]> wrote:

> On Wed, 26 Nov 2008, Michel Gilbert wrote:
>
>> Geobase has a uniform road classification. The matching between OSM and
>> Geobase road classes should be applicable globally. I suspect that local
>> contexts may be necessary in some cases (ramp classification, for
>> example).
>> We can start Geobase NRN - OSM Map Feature.  I can start this.
>>
>
>
> Yes this needs to be started.  The wiki would be a good place to start
> collecting the mapping in. Others can contribute that way as well.
>
> The feature catalogue for the road network, and hydro network  can be found
> at
>
> http://www.geobase.ca/doc/specs/pdf/GeoBase_FeatureCatalogue_SegmentedView_NRN_2_0_EN.pdf
> http://www.geobase.ca/doc/catalogue/GeoBase_NHN_Catalogue_1.0.1_EN.html
>
>
>
>
>>
>> Cheers,
>>
>> Michel
>>
>>
> Steve
>
>


Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread James Ewen
Okay, 'splain this to me...

Let's just say that we have the whole of Canada imported from Geobase.
How does an update get processed? ie. Geobase makes a big update to
its database, and we want to update OSM from that? Will there need to
be some ID tag on each way so that we don't just create a whole new
layer under/over the old?

If this is the case, we need to ensure that these ID tags stay with
the ways. If someone goes in, and joins two ways that make up the
street in front of his house, because he thinks it really shouldn't be
two ways between the three nodes, is this going to be a problem?

If we do a mass import, but skip well defined areas like Toronto, does
this mean that it's going to need to be skipped forever since the
Geobase tags won't be integrated into the skipped area ever?

Whatever it means, I'm itching to get more data stuffed into Alberta!
I've burnt a couple thousand dollars worth of fuel just putting some
of the major highways into the database. I'm hoping there's going to
be some way to define an area, and then ask for a Geobase import to be
done on that area, and then I just need to go into that area and make
sure things imported properly, watching for any errors/duplication.

James
VE6SRV



Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Jason Reid
Steve Singer wrote:
> On Wed, 26 Nov 2008, Michel Gilbert wrote:
>
>   
>> Geobase has a uniform road classification. The matching between OSM and
>> Geobase road classes should be applicable globally. I suspect that local
>> contexts may be necessary in some cases (ramp classification, for example).
>> We can start Geobase NRN - OSM Map Feature.  I can start this.
>> 
>
>
> Yes this needs to be started.  The wiki would be a good place to 
> start collecting the mapping in. Others can contribute that way as well.
>
> The feature catalogue for the road network, and hydro network  can be found 
> at
> http://www.geobase.ca/doc/specs/pdf/GeoBase_FeatureCatalogue_SegmentedView_NRN_2_0_EN.pdf
> http://www.geobase.ca/doc/catalogue/GeoBase_NHN_Catalogue_1.0.1_EN.html
>
>
>
>   
>> Cheers,
>>
>> Michel
>>
>> 
>
> Steve
>
>
If you take a look at my prototype script it includes some of the 
highway related translations:

http://svn.openstreetmap.org/applications/utils/import/geobase2osm/geobase2osm.py

It has at least the basic mappings for most of the provinces from the 
NRN dataset.
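
For readers who have not opened the script, a translation table of this kind could look roughly like the sketch below. It is not taken from geobase2osm.py. The NRN class names should be checked against the feature catalogue, the OSM values are only one plausible choice, and the per-province override (including the Alberta entry) is a hypothetical way to handle the local cases Michel mentions, such as ramps.

```python
# Illustrative mapping only; not the table used by geobase2osm.py.
ROAD_CLASS_TO_HIGHWAY = {
    "Freeway": "motorway",
    "Expressway / Highway": "trunk",
    "Arterial": "secondary",
    "Collector": "tertiary",
    "Local / Street": "residential",
    "Ramp": "motorway_link",
}

# Hypothetical per-province exceptions layered on top of the global table.
PROVINCE_OVERRIDES = {
    "AB": {"Ramp": "trunk_link"},
}


def highway_for(road_class, province=None):
    """Return an OSM highway value for an NRN road class.

    Unknown classes fall back to highway=road so they stand out for review.
    """
    overrides = PROVINCE_OVERRIDES.get(province, {})
    if road_class in overrides:
        return overrides[road_class]
    return ROAD_CLASS_TO_HIGHWAY.get(road_class, "road")
```

Keeping one global table plus small overrides means most of the mapping can be agreed on once on the wiki, with only the exceptions argued province by province.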

-Jason Reid



Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Steve Singer
On Wed, 26 Nov 2008, Michel Gilbert wrote:

> Geobase has a uniform road classification. The matching between OSM and
> Geobase road classes should be applicable globally. I suspect that local
> contexts may be necessary in some cases (ramp classification, for example).
> We can start Geobase NRN - OSM Map Feature.  I can start this.


Yes this needs to be started.  The wiki would be a good place to 
start collecting the mapping in. Others can contribute that way as well.

The feature catalogue for the road network, and hydro network  can be found 
at
http://www.geobase.ca/doc/specs/pdf/GeoBase_FeatureCatalogue_SegmentedView_NRN_2_0_EN.pdf
http://www.geobase.ca/doc/catalogue/GeoBase_NHN_Catalogue_1.0.1_EN.html



>
>
> Cheers,
>
> Michel
>

Steve




Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Sam Vekemans
Thanks, I have updated the wiki to include more points.

Idea: shut down OSM for the upload of Geobase road data? = scrubbed. We (1
person) can load the data tile by tile (using the conversion script;
hopefully Ibycus can help).
Issue with ID duplicates? = the routine would be set to retry and
only load the skipped data ... so there probably is a script that
would work. ... However, if there is a way to load it all at once ...
I'm sure the Foundation would go for it. ... But going at it tile by
tile makes more sense, to keep the accuracy and lower the risk of
error.

Idea: make it all render=no, to avoid having 'ghost lines' in
'mostly complete' areas. = almost scrubbed.
Solution: going at it tile by tile, starting with 001001 (see the Geobase
map), and using the wiki page to show a chart of each tile as it's
done, would help and make the process consistent.
For most of Canada it would be fine NOT to include the render=no
tag, and so, with tile import, we ALL can go at it and look for
duplicates that the script misses. ... Once we're happy that it looks
good, we go on to the next tile (we don't need to physically be
there) and can fix the 'ghost lines' ...

Issue: Duplicate edges
I think Ibycus has fixed that with his script (converting the stuff
to Polish format). Remember that the way the Geobase roads are set up is
that the road is broken up block by block. Most Canadian roads follow a
grid pattern and line up fine. I don't see an issue with it.

Issue: Uploading polygons that cross bounding box tiles.
Ya, again the Ibycus topo had that issue, and solved it some way.
We need to remember that these boxes are .5 x .25 degrees big. ...
Fixing this, as we need to include missing data such as the right
park name and the relation (e.g. BC Parks), would be a manual thing
(on top of the fix), as trails etc. all need to be added. BTW, the newer
version of the map contains combined tile sets, making for a smaller
number of files to load.

Solution: don't know yet.

Issue: OSM file size (i.e. Toronto duplicates)
Well, ya... the Ibycus topo is 3 gigs of IMG files, and OSM files are 4
times larger, so that's 12 gigs of data (including the contour
lines), so I don't know... the roads would probably be a smaller load
than the natural features and POI in the other databases to be loaded.
So we'll need to have a chart showing what databases are being loaded,
and what the status of each is.
So for those particular tiles which cover a large amount of data,
having everyone at it (adjusting ghost lines) around the same time
wouldn't be practical. ... Using a script (Hi road, you have the same
name and the same type, so I won't bother joining OSM) would work.
... The problem with that is, of course, future imports. ... BUT
because the script says (Hi road) when future roads are in place
on OSM, the script wouldn't allow the import of that new Geobase data.
So we need to show and prove it,
i.e. on the test area remove the Geobase data, and add as an OSM user the same
road name, same class... then import the Geobase data again, to see if the
script picked it up and asked that question.

Solution: Start with 001K11, which is not St. John's NFLD ... but
south of it, and try :) ... maybe start with land & water features
first?

Issue: Attempting to map data already done.
As Richard pointed out (about the render=no), there has been an
example where a user set out to map a parking area, and by the
time they got home, someone else had also been mapping it, a few hours before
or after, and had the same idea. ... So all they could do was look a
little deeper and see what other features were missing that they
could possibly add.
The solution to that person's dilemma is this: by contacting all the
mappers in the area, and physically meeting them, you can get an idea
of what kinds of things people would like to map. ... And so it's common
to start with your local area and build outwards. ... So if you are
out mapping, announce it, and see if others want to join in.

If render=no was an option for Toronto, after importing the data
and running the script, with only the 'maybes' being set to render=no,
... the 1st task for the user would be to hold off a little, and wait or
help out, until the import process is done for the area.
If everyone in the tile area knows that importing is going on, the
likelihood of mappers being disappointed is minimal.

The priority is this:
-Make sure the importing script checks whether road names & types are
already in there; if so, it (or that road segment) doesn't get imported.
-Make sure that all data that can be imported does get imported, and
that the script can tell duplicates apart from new data.
-Take our time, going through each tile slowly, to make sure all
duplicates and errors get fixed for the next run, so reverting
back would be minimal.
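
The first priority, the "Hi road, you have the same name and the same type" check, could be sketched as below. The record layout and function names are assumptions for illustration, not part of any existing import script, and a real check would also want a distance test so that two streets with the same name in different towns of one tile are not confused.

```python
def normalise(name):
    """Lower-case and collapse whitespace so 'Main  St' equals 'main st'."""
    return " ".join(name.lower().split())


def split_duplicates(osm_ways, geobase_segments):
    """Separate Geobase segments into ones to import and likely duplicates.

    Both inputs are lists of dicts with optional "name" and a "highway" key.
    Unnamed segments are never treated as duplicates.
    """
    existing = {
        (normalise(way["name"]), way["highway"])
        for way in osm_ways
        if way.get("name")
    }
    to_import, skipped = [], []
    for seg in geobase_segments:
        name = seg.get("name")
        if name and (normalise(name), seg["highway"]) in existing:
            skipped.append(seg)      # same name and type already in OSM
        else:
            to_import.append(seg)
    return to_import, skipped
```

Keeping the `skipped` list around, instead of silently dropping those segments, gives whoever reviews the tile a short list of 'maybes' to look at.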

Maybe that helps?
Cheers,
Sam

P.S. Feel free to add to the wiki :) & ya, this discussion is delayed
by a few days ... but it's worth it, to make it right :)

On 11/26/08, [EMAIL 

Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Michel Gilbert
Hi all,

2008/11/26 Dave Hansen <[EMAIL PROTECTED]>

> On Wed, 2008-11-26 at 01:40 -0800, Sam Vekemans wrote:
> >
> >
> >
> > I do think it was important to have things broken up
> > geographically.  It
> > makes it much easier if something goes bad to find the data,
> > remove it,
> > an retry.
> >
> >
> > Since the data is broken up into Geobase tiles, perhaps importing by
> > tile area to get more specific. The provinces are rather large, so
> > going at it, by 1 degree x 2 degree would be better??
>
> Yes, it probably would be better.  However, there is also the problem of
> stitching things back together in the end.  I never dealt with that
> part.
>
> Also 1x2 degrees probably isn't bad for, say, the Yukon Territories.
> But, I would imagine that Toronto is going to fit almost entirely into
> one of those.  That might pose a few problems.  I think the largest .osm
> file that I uploaded was Kern county in California.  It was 165MB.




A regular tile size looks fine to me but will introduce duplicates along the
edges. How do we handle polygons that cross the bounding box? From my
experience, uploading data with JOSM has a size limit. Beyond that
limit JOSM stops, and therefore leaves a mess in the database. Maybe there
are other ways to upload large amounts of data.
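
One possible way to avoid duplicates along tile edges is to give every feature exactly one owning tile, so a polygon that crosses a boundary is uploaded with one tile and skipped in its neighbours. The sketch below assumes a regular grid; the 0.5 x 0.25 degree tile size and the south-west-corner rule are illustrative choices, not something GeoBase or OSM prescribes.

```python
import math

# Assumed tile size in degrees (longitude width, latitude height).
TILE_W, TILE_H = 0.5, 0.25


def tile_of(lon, lat):
    """Return the (column, row) index of the grid tile containing a point."""
    return (math.floor(lon / TILE_W), math.floor(lat / TILE_H))


def crosses_tile_edge(points):
    """True if the feature has vertices in more than one tile."""
    return len({tile_of(lon, lat) for lon, lat in points}) > 1


def owning_tile(points):
    """Pick one tile per feature: the tile holding the south-west corner
    of its bounding box. Every feature then belongs to exactly one tile."""
    min_lon = min(lon for lon, _ in points)
    min_lat = min(lat for _, lat in points)
    return tile_of(min_lon, min_lat)
```

When a tile is processed, a feature would be uploaded only if `owning_tile` equals that tile, which means border features are sent once no matter how many tile files they appear in.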




>
> > One thing I never considered, but did come back to bite me a
> > few times
> > was concurrency.  I'd upload a node, make a way use it, then
> > come back a
> > few hours later to have another way use the node.  But,
> > somebody got to
> > the node before I did.  There were three or four of these and
> > I fixed
> > them up by hand.  It sucked.  :)
> >
> >
> > Well, remember (last week i think it was) when OpenStreetMap was shut
> > down for maintenance?
> > Well, what about convincing the foundation to shut down the server so
> > then all the data can be uploaded at once?
> > That would fix the problem that you had.  :)
>
> Sure, if you can pull this off, go for it.  Otherwise, it isn't *that*
> difficult of a thing to plan for and fix.
>
> Basically, if you notice that some node that you need is gone, you just
> re-upload a new copy of the original node and make a note of it.  It's
> that simple.
>
> > Keep a record of everything that you do.  Keep good logs and
> > make sure
> > that whatever programs you use to upload the data can be
> > stopped and
> > restarted at any time with no ill effects.  This generally
> > means keeping
> > a table of which objects have been uploaded and their id
> > mappings.
>
> >
> > I think we already discovered that the natural features shapefiles
> > data, shouldnt post any conflect... not a major one that is. ... Every
> > city does have some kind of water feature, and it's probably labeled,
> > but thats about it.
> > and for the other features  ya pausing OpenStreetMap to make the
> > import happen.  would guarentee no point conflicts.
> >
> >
> > I
> > think bulk-upload.pl
> > http://wiki.openstreetmap.org/wiki/Bulk_import.pl
> > does this pretty well, although I did have to
> > customize it a bit.
> > Ya, as as far as i can see, the way that GeoBase keeps the data is a
> > bit different.
> > each province does have a different way of classing roads. .. so when
> > your literally traveling between provinces.. the pavement is
> > identical.. yet the signs on the roads indicate a different road
> > class.  (thats because provincial roads are funded provincially, there
> > is very little discussion between provinces.  So each provincial
> > upload would be different. (talk-ca talked about it back in the
> > summer)
>
> Yeah, one of the first steps is to come up with a conversion scheme to
> convert your features into OSM features.
>
> -- Dave
>

Geobase has a uniform road classification. The matching between OSM and
Geobase road classes should be applicable globally. I suspect that local
contexts may be necessary in some cases (ramp classification, for example).
We can start Geobase NRN - OSM Map Feature.  I can start this.


Cheers,

Michel


Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread richard
> On Wed, 26 Nov 2008, Sam Vekemans wrote:
>
>> Well, remember (last week i think it was) when OpenStreetMap was shut
>> down for maintenance?
>> Well, what about convincing the foundation to shut down the server so
>> then all the data can be uploaded at once?
>> That would fix the problem that you had.  :)
>
> If we want to do a progressive import (small tile by small tile) then
> this won't work; we aren't talking about one server shutdown but many.
> I'm also not so sure the rest of the OSM community is keen on outages
> for data imports. We might be better off writing scripts to detect (and
> maybe fix/revert?) conflicts after the fact.

I think asking OSM to shut down so we can play is unlikely to win us
friends.  And I don't think that it is required.  There was much more data
imported from TIGER than we have from GeoBase, and that was done county by
county I believe.

GeoBase tiles may be a rough equivalent in size to the county uploads from
TIGER.  I've emailed one of the TIGER import folks and asked him to join
us here on talk-ca.

I also think that uploading everything and hiding some / all of it is a
bad idea.  We know that tagging for the renderer is sub-optimal and that
things should be tagged "correctly" so that future renderers and editors
will "get it".

Needless duplication of data (say OSM Toronto, plus Toronto on GeoBase) is
wasteful of our resources in terms of database space and bandwidth to
editors.

I also see potential trouble with making additions and changes to any
"overlaid" Toronto data.  Imagine that you spend an afternoon adding bike
routes and bus routes as relations, but didn't notice that half of the
ways you worked on were "render=no".  Or that you did notice and just
changed them to render=yes because of course you want to see your
relations render.

I'm very excited that we have this wonderful data contribution and that we
have such an enthusiastic and energetic group to participate in the
discussion and import.

I think we should take a measured approach and delicate steps.  TIGER took
months to upload, and had at least one false start.  We don't have a
deadline to include the GeoBase data.  Let's find a way to include it that
makes it super easy to accept updates from GeoBase in future (hello, road
names, I'm talking to you).  And let's avoid three or four uploads of
everything, then rollbacks, then uploads again.  Nobody wants to see
Canada rendered then unrendered like a web site that over-uses the <blink>
tag.

Best regards,
Richard





Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Steve Singer
On Wed, 26 Nov 2008, Sam Vekemans wrote:

> Well, remember (last week i think it was) when OpenStreetMap was shut down
> for maintenance?
> Well, what about convincing the foundation to shut down the server so then
> all the data can be uploaded at once?
> That would fix the problem that you had.  :)

If we want to do a progressive import (small tile by small tile) then this 
won't work; we aren't talking about one server shutdown but many.  I'm also 
not so sure the rest of the OSM community is keen on outages for data 
imports. We might be better off writing scripts to detect (and maybe 
fix/revert?) conflicts after the fact.


>
> So a bounding box over the 'complete' areas... or a blanket 'render=no' over
> the whole thing. So it shows 'ghost lines' where the difference is from the
> import to what the users did.

How widely accepted/implemented is the render=no tag? I haven't been 
able to find a wiki page for it. I'm concerned that bulk importing large 
quantities of 'render=no' ways will confuse some of the other programs that 
use OSM data (e.g. navigation software), particularly if the tag isn't 
typically used for this.

How difficult (in practice) is it to bulk merge data? Something along the 
lines of:

*If OSM already contains a way within x meters of what we are importing that 
moves in the same direction and is of the same 'type' then call them the 
same way.

I saw talk on the TIGER discussions about this sort of thing but it wasn't 
clear what they actually implemented (I probably should look at the code)
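
As a rough sketch of that rule (not what the TIGER import did, which I have not checked), the test could compare type, overall bearing and endpoint distance. The thresholds, the record layout and the function names are all assumptions, and points are (lat, lon) pairs.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def bearing_deg(a, b):
    """Initial bearing from a to b, in degrees from north (0 to 360)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    y = math.sin(lon2 - lon1) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.degrees(math.atan2(y, x)) % 360


def same_way(osm, gb, max_dist_m=15.0, max_angle_deg=20.0):
    """Guess whether an OSM way and a Geobase segment are the same road."""
    if osm["highway"] != gb["highway"]:
        return False
    a0, a1 = osm["points"][0], osm["points"][-1]
    b0, b1 = gb["points"][0], gb["points"][-1]
    # Compare orientation modulo 180 so a way drawn backwards still matches.
    diff = abs(bearing_deg(a0, a1) - bearing_deg(b0, b1)) % 180
    if min(diff, 180 - diff) > max_angle_deg:
        return False
    forward = max(haversine_m(a0, b0), haversine_m(a1, b1))
    backward = max(haversine_m(a0, b1), haversine_m(a1, b0))
    return min(forward, backward) <= max_dist_m
```

This only looks at endpoints, so it breaks down when the two data sets cut the same street at different places; those cases would still need the manual review discussed elsewhere in the thread.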

>
> Hopefully this all makes sense, :)
>
> Cheers,
> Sam Vekemans
> Across Canada Trails
>




Re: [Talk-ca] importing GeoBase Data (learning from TIGER)

2008-11-26 Thread Sam Vekemans
>
>
>
> I do think it was important to have things broken up geographically.  It
> makes it much easier if something goes bad to find the data, remove it,
> and retry.


Since the data is broken up into Geobase tiles, perhaps importing by tile
area would be more specific. The provinces are rather large, so going at it
by 1 degree x 2 degree would be better??


>
>
> One thing I never considered, but did come back to bite me a few times
> was concurrency.  I'd upload a node, make a way use it, then come back a
> few hours later to have another way use the node.  But, somebody got to
> the node before I did.  There were three or four of these and I fixed
> them up by hand.  It sucked.  :)
>

Well, remember (last week i think it was) when OpenStreetMap was shut down
for maintenance?
Well, what about convincing the foundation to shut down the server so then
all the data can be uploaded at once?
That would fix the problem that you had.  :)

>
> Keep a record of everything that you do.  Keep good logs and make sure
> that whatever programs you use to upload the data can be stopped and
> restarted at any time with no ill effects.  This generally means keeping
> a table of which objects have been uploaded and their id mappings.
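
A minimal sketch of such a table, assuming a local SQLite file and a caller-supplied `upload_one` function (hypothetical; it stands in for whatever actually sends an object to the server and returns the ID the server assigned):

```python
import sqlite3


def open_log(path):
    """Open (or create) the table mapping local object IDs to OSM IDs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploaded ("
        "local_id TEXT PRIMARY KEY, osm_id INTEGER NOT NULL)"
    )
    return db


def upload_all(db, objects, upload_one):
    """Upload (local_id, payload) pairs, skipping ones already logged.

    Safe to stop and rerun: anything recorded in the table is not sent again.
    """
    for local_id, payload in objects:
        row = db.execute(
            "SELECT osm_id FROM uploaded WHERE local_id = ?", (local_id,)
        ).fetchone()
        if row is not None:
            continue  # uploaded in an earlier run
        osm_id = upload_one(payload)
        db.execute("INSERT INTO uploaded VALUES (?, ?)", (local_id, osm_id))
        db.commit()  # commit per object so a crash loses at most one record
```

One gap remains in this sketch: if the program dies after the server accepts an object but before the commit, that object would be sent twice on the next run, so the log should still be checked against the server after a crash.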


I think we already discovered that the natural features shapefile data
shouldn't pose any conflict... not a major one, that is. ... Every city does
have some kind of water feature, and it's probably labeled, but that's about
it.
And for the other features, ya, pausing OpenStreetMap to make the import
happen would guarantee no point conflicts.

> I think bulk-upload.pl

http://wiki.openstreetmap.org/wiki/Bulk_import.pl

> does this pretty well, although I did have to
> customize it a bit.

Ya, as far as I can see, the way that GeoBase keeps the data is a bit
different.
Each province does have a different way of classing roads, so when you're
literally traveling between provinces, the pavement is identical, yet the
signs on the roads indicate a different road class. (That's because
provincial roads are funded provincially; there is very little discussion
between provinces.) So each provincial upload would be different. (talk-ca
talked about it back in the summer)


> Try to contact the 'owners' of local areas.  Most of them will be a bit
> cranky,

We haven't heard from some people yet. .. and many of the smaller
contributors don't know it yet.  I did post it on the Diary board.. so more
people will get the word.
I think the number is about 20 larger users? I'm sure we can get an exact
number on that.

> but will grudgingly accept that they generally need to clear out
> their work and just deal with the new data.  But, let them make the
> decisions as much as possible.
>

So a bounding box over the 'complete' areas... or a blanket 'render=no' over
the whole thing. So it shows 'ghost lines' where the difference is from the
import to what the users did.

>
> I'm sure you're also going to do plenty of small-scale experiments.


I think taking some screenshots of what the Ibycus topo looks like in
MapSource vs. what the OSM data looks like in MapSource, as an example
to show people the differences, would really help.


> I had to do several iterations, mostly on my local data set, before it
> started rendering like I needed.
>
> > My thought was adding the render =no tag to it all. and import it all,
> > then manually going in there and cleaning up the roads which don't
> > align the same way.
>
> Oh, you mean for overlapping data?  There's really no good way to handle
> the overlapping stuff.  I'd say just import it all, keep it visible,

Well, we can show screenshots of what the MapSource view of both the Ibycus
topo and the current OSM data will look like, with all its ghost lines
showing?

> and
> let people clean it up later.
>
> -- Dave
>
> By having the data available, in some cases where the town only has 2 lines
running through it, it would be easier to blanket import the data, and
select it all and change it to render=yes. .. and then eithor move the
imported data points so it overlaps the user data.. or move the user data so
it overlaps the imported data.
By having everything set to render=no, this avoids rendering the ghost
lines, and there is no cleanup needed for the 'completed' areas. .. so this
can be done in potlatch easily, as it will show these not rendered roads, as
ghost roads. (They show up lighter on potlatch anyway)
.. so really... if someone is really board.. they can go in there and move
the user data ontop of the imported data if they like... but it would really
make no difference. ... as long as the line that is rendered is right. ..
ya, no difference.

And if you're working on an area, the 1st thing we would ask users to do
is select the areas and get it all renderable in JOSM. ... With Potlatch,
it's visually easier to see what's not rendered, but for a new area.
...
So this way... for when i mapped protection island none of my work would