Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Ian Dees
On Thu, Dec 9, 2010 at 7:31 PM, Serge Wroclawski  wrote:

> >>> 3) Having looked at the data catalogue page, I do have concerns about
> the
> >>> terms of use and think it's best to get SF to explicitly agree to allow
> OSM
> >>> to use the data.
> >>>
> >>> http://gispub02.sfgov.org/website/sfshare/index2.asp
>
> I'd have legal look at this. I'm a little confused by some of the
> wording about derivative works and transferred rights and
> indemnification.
>

It's a fairly standard (if somewhat wordy) indemnification agreement.
The derivative-works language is making sure that no one can hold the San
Francisco government liable for use of the data, whether directly, indirectly,
or as part of a derivative work.

I don't see anything preventing the data from being used or traced by OSM. I
don't even see an attribution requirement.
___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Serge Wroclawski
On Thu, Dec 9, 2010 at 8:14 PM, Katie Filbert  wrote:
> On Thu, Dec 9, 2010 at 8:06 PM, Gregory Arenius  wrote:
>>
>>> A few comments...
>>>
>>> 1) San Francisco explicitly says they do not have building outline data.
>>> :(  So, I suppose we get to add buildings ourselves.  I do see that SF does
>>> have parcels.

If buildings aren't available, that's too bad, but such is life.

I don't think parcels are generally useful.

>>> 2) I don't consider the addresses as noise.  The data is very helpful for
>>> geocoding.  If the renderer does a sloppy job making noise out of addresses,
>>> the renderings should be improved.

Katie's position is certainly valid, especially as it relates to geocoding.

They render ugly, but I'd rather have an ugly rendering and some data than no data.

>>> 3) Having looked at the data catalogue page, I do have concerns about the
>>> terms of use and think it's best to get SF to explicitly agree to allow OSM
>>> to use the data.
>>>
>>> http://gispub02.sfgov.org/website/sfshare/index2.asp

I'd have legal look at this. I'm a little confused by some of the
wording about derivative works and transferred rights and
indemnification.

If SF is open-minded, that's awesome. In an ideal world they'd use an
existing license with well-defined boundaries, like CC0, but barring
that, I'd say don't mention the license at all and simply have them
donate the data to OSM itself. Legal can help with this.


As for a demo of the data, yeah, an OSM file would be perfect. Also,
though, I'd keep the previous dataset ID, in case you need to do a
comparison later.

- Serge



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Katie Filbert
On Thu, Dec 9, 2010 at 8:06 PM, Gregory Arenius  wrote:

>
> A few comments...
>>
>> 1) San Francisco explicitly says they do not have building outline data.
>> :(  So, I suppose we get to add buildings ourselves.  I do see that SF does
>> have parcels.
>>
>> For DC, we are attaching addresses to buildings when there is a one-to-one
>> relation between them.  When there are multiple address nodes for a single
>> building, then we keep them as nodes. In the vast majority of cases, we do not
>> have apartment numbers but in some cases we have things like 1120a, 1120b,
>> 1120c that can be imported.  Obviously, without a buildings dataset, our
>> approach won't quite apply for SF.
>>
>
>
> We mostly only have building shapes drawn in downtown where it's unlikely
> there will be many one-to-one matches.  I wish we did have a building shape
> file though; that would be great.  I have thought about using the parcel
> data but I'm not sure that's as useful.
>

Agree, not sure how useful parcels are for us.



>
>
>> 2) I don't consider the addresses as noise.  The data is very helpful for
>> geocoding.  If the renderer does a sloppy job making noise out of addresses,
>> the renderings should be improved.
>>
>
>> 3) Having looked at the data catalogue page, I do have concerns about the
>> terms of use and think it's best to get SF to explicitly agree to allow OSM
>> to use the data.
>>
>> http://gispub02.sfgov.org/website/sfshare/index2.asp
>>
>
> What terms in particular caused you concern?  I'll need to know if I'm
> going to ask for explicit permission. A while back I posted the previous
> terms to legal-talk and they pointed out problems.  The city changed the
> license when I pointed out that it caused problems for open projects
> (apparently that was in the works anyway).  I thought those problems were
> removed.  Prior to those changes I had a conference call with one of the
> datasf.org people who helps make city datasets available and an assistant
> city attorney, and I was told that unless specifically noted otherwise in
> the dataset, the data was public domain.  I do understand that that isn't
> in writing though.
>
> If there is a problem with the terms though there is still a good chance
> the city would give us explicit permission to use the data;  they seemed
> excited about the prospect of some of it ending up in OSM.
>

I don't know enough to assess it, but I'm concerned about the "click" to
agree.  I'm also concerned about the possibility of switching to ODbL and the
contributor terms, and want to make sure the data would be compatible with
those.  I think it helps to have explicit permission for use in OSM (e.g. an
e-mail agreeing that we can use the data dual-licensed under CC-BY-SA and
ODbL) on file.

>
>
>> 4) If you can get explicit permission, then I suggest breaking up the
>> address nodes into smaller chunks (e.g. by census block group), convert them
>> to osm format with Ian's shp-to-osm tool, and check them for quality and
>> against existing OSM data (e.g. existing pois w/ addresses) in JOSM before
>> importing.  QGIS and/or PostGIS can be useful for chopping up the data into
>> geographic chunks.  This approach gives opportunity to apply due diligence,
>> to check things, and keep chunks small enough that it's reasonably possible
>> to deal with any mistakes or glitches.
>>
>
> I had been planning on using shp-to-osm to break it into chunks by number
> of nodes, but doing it geographically makes more sense.  Do you think census
> block size is best, or maybe by neighborhood, or should I aim for an
> approximate number of nodes in each geographic chunk?
>

With buildings, our data was a bit denser. I did some by census tract and
found some were too big for the OSM API and JOSM, whereas census block group
has worked well. With just nodes, I think you could do somewhat larger
chunks.

Cheers,
Katie



>
> Cheers,
> Gregory Arenius
>



-- 
Katie Filbert
filbe...@gmail.com
@filbertkm


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Apollinaris Schoell
On Thu, Dec 9, 2010 at 4:38 PM, Gregory Arenius  wrote:

>
>  The wiki states that this is how address nodes are done.  They can be
> attached to other objects of course but they can also be independent.  Like
> I stated earlier I did check how they are actually being done elsewhere and
> the ones I've seen entered are done in this manner.
>
> Also, why do you think of them as noise?  They're useful for geocoding and
> door-to-door routing.  The routing in particular is something people clamor
> for when it's lacking.
>
>
Individual address nodes are common, and there is nothing wrong with adding them.


> As for attaching them to buildings that doesn't particularly work well in
> many cases especially in San Francisco.  For instance a building might have
> a number of addresses in it.  A large building taking up a whole block could
> have addresses on multiple streets.  Also, we don't have building outlines
> for most of SF and that shouldn't stop us from having useful routing.
>

Setting an address on a building is good if there are buildings, but in this
case it makes absolute sense to have individual nodes. In the case of multiple
addresses on one building, the address nodes can be used as nodes in the
building outline to mark the individual entrances on large buildings, but
this is really optional.



>
>>
>> > Also, there are a large number of places where there are multiple nodes
>> in
>> > one location if there is more than one address at that location.  One
>> > example would be a house broken into five apartments.  Sometimes they
>> keep
>> > one address and use apartment numbers and sometimes each apartment gets
>> its
>> > own house number.  In the latter cases there will be five nodes with
>> > different addr:housenumber fields but identical addr:street and lat/long
>> > coordinates.
>>
>> > Should I keep the individual nodes or should I combine them?
>>
>>
Don't combine them if they have different house numbers. The reality is that
there are different addresses, so we should map all of them even if the
location is the same.


>
>> I hear this every time imports come up.  I got it.  It's hard.  That's why
> I'm soliciting feedback and willing to take my time and am really trying to
> do it correctly.  I'm not willing to just give up because there have been
> problems with imports in the past.
>

I would say this is one of the easier imports; there is not too much harm it
can create. The only problem is merging it with existing data and making a
decision about which one is better. Since this data is probably authoritative,
it might be OK to replace most of the less accurate data already in OSM.
For this reason I would not drop any of the nodes in case of a conflict, but
rename the tags to something else like sf_addrimport_addr:*.
A survey on the road can check them later, compare with the existing addr
nodes, decide which ones to keep, and rename the import tags to the real
tags.
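The conflict-handling idea described here can be sketched in a few lines: when an imported node would clash with an existing OSM address, keep it but rename its addr:* keys so a later survey can reconcile the two. This is only an illustrative sketch; the sf_addrimport_ prefix comes from the thread, the rest of the tag data is made up.

```python
def rename_conflicting_tags(tags, prefix="sf_addrimport_"):
    """Return a copy of an imported node's tags with addr:* keys prefixed.

    Non-address keys pass through unchanged so the node keeps any other
    metadata from the import.
    """
    return {
        (prefix + k) if k.startswith("addr:") else k: v
        for k, v in tags.items()
    }

imported = {"addr:housenumber": "123", "addr:street": "Market St"}
print(rename_conflicting_tags(imported))
# {'sf_addrimport_addr:housenumber': '123', 'sf_addrimport_addr:street': 'Market St'}
```

A surveyor resolving the conflict later would simply strip the prefix from whichever node wins.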




Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Gregory Arenius
A few comments...
>
> 1) San Francisco explicitly says they do not have building outline data.
> :(  So, I suppose we get to add buildings ourselves.  I do see that SF does
> have parcels.
>
> For DC, we are attaching addresses to buildings when there is a one-to-one
> relation between them.  When there are multiple address nodes for a single
>> building, then we keep them as nodes. In the vast majority of cases, we do not
> have apartment numbers but in some cases we have things like 1120a, 1120b,
> 1120c that can be imported.  Obviously, without a buildings dataset, our
> approach won't quite apply for SF.
>


We mostly only have building shapes drawn in downtown where it's unlikely
there will be many one-to-one matches.  I wish we did have a building shape
file though; that would be great.  I have thought about using the parcel
data but I'm not sure that's as useful.


> 2) I don't consider the addresses as noise.  The data is very helpful for
> geocoding.  If the renderer does a sloppy job making noise out of addresses,
> the renderings should be improved.
>

> 3) Having looked at the data catalogue page, I do have concerns about the
> terms of use and think it's best to get SF to explicitly agree to allow OSM
> to use the data.
>
> http://gispub02.sfgov.org/website/sfshare/index2.asp
>

What terms in particular caused you concern?  I'll need to know if I'm going
to ask for explicit permission. A while back I posted the previous terms to
legal-talk and they pointed out problems.  The city changed the license when
I pointed out that it caused problems for open projects (apparently that was
in the works anyway).  I thought those problems were removed.  Prior to those
changes I had a conference call with one of the datasf.org people who helps
make city datasets available and an assistant city attorney, and I was told
that unless specifically noted otherwise in the dataset, the data was public
domain.  I do understand that that isn't in writing though.

If there is a problem with the terms though there is still a good chance the
city would give us explicit permission to use the data;  they seemed excited
about the prospect of some of it ending up in OSM.


> 4) If you can get explicit permission, then I suggest breaking up the
> address nodes into smaller chunks (e.g. by census block group), convert them
> to osm format with Ian's shp-to-osm tool, and check them for quality and
> against existing OSM data (e.g. existing pois w/ addresses) in JOSM before
> importing.  QGIS and/or PostGIS can be useful for chopping up the data into
> geographic chunks.  This approach gives opportunity to apply due diligence,
> to check things, and keep chunks small enough that it's reasonably possible
> to deal with any mistakes or glitches.
>

I had been planning on using shp-to-osm to break it into chunks by number
of nodes, but doing it geographically makes more sense.  Do you think census
block size is best, or maybe by neighborhood, or should I aim for an
approximate number of nodes in each geographic chunk?

Cheers,
Gregory Arenius


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Gregory Arenius
On Thu, Dec 9, 2010 at 3:20 PM, Serge Wroclawski  wrote:

> On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius 
> wrote:
> > I've been working on an import of San Francisco address node data.  I
> have
> > several thoughts and questions and would appreciate any feedback.
>
> The Wiki page doesn't mention the original dataset url. I have a few
> concerns:


The shapefile is here.

I added it to the wiki.  I'm sorry, it should have been there to start with.



>
> 1) Without seeing the dataset url, it's hard to know anything about
> the dataset (its age, accuracy, etc.)
>
> This is a real problem with imports- knowing the original quality of
> the dataset before it's imported.
>
> The project has had to remove or correct so many bad datasets, it's
> incredibly annoying.
>

I've spot checked a number of blocks by going out and comparing the data and
been impressed with its accuracy.  The data is sourced from the Department
of Building Inspection's Address Verification System, the Assessor-Recorder
Office's Parcel database and the Department of Elections (Voter Registration
Project).  I believe it to be high quality and have been told by another
that has used it that the dataset is "legit."


> > About the data.  It's in shapefile format, containing about 230,000
> > individual nodes.  The data is really high quality and all of the
> addresses
> > I have checked are correct.  It has pretty complete coverage of the
> entire
> > city.
>
> IMHO, individual node addresses are pretty awful. If you can
> import the building outlines, and then attach the addresses to them,
> great (and you'll need to consider what's to be done with any existing
> data), but otherwise, IMHO, this dataset just appears as noise.
>

 The wiki states that this is how address nodes are done.  They can be
attached to other objects of course but they can also be independent.  Like
I stated earlier I did check how they are actually being done elsewhere and
the ones I've seen entered are done in this manner.

Also, why do you think of them as noise?  They're useful for geocoding and
door-to-door routing.  The routing in particular is something people clamor
for when it's lacking.

As for attaching them to buildings, that doesn't work particularly well in
many cases, especially in San Francisco.  For instance a building might have
a number of addresses in it.  A large building taking up a whole block could
have addresses on multiple streets.  Also, we don't have building outlines
for most of SF and that shouldn't stop us from having useful routing.

>
>
> > Also, there are a large number of places where there are multiple nodes
> in
> > one location if there is more than one address at that location.  One
> > example would be a house broken into five apartments.  Sometimes they
> keep
> > one address and use apartment numbers and sometimes each apartment gets
> its
> > own house number.  In the latter cases there will be five nodes with
> > different addr:housenumber fields but identical addr:street and lat/long
> > coordinates.
>
> > Should I keep the individual nodes or should I combine them?
>
> Honestly, I think this is very much cart-before-horse. Please consider
> making a test of your dataset somewhere people can check out, and then
> solicit feedback on the process.
>

As I'm still planning things out I think it's a good time to discuss this
type of issue.  As to a test, what do you recommend?  Tossing the OSM file
up somewhere for people to see, or did you mean testing the upload process
on a dev server?  I'm planning on doing both, but if you have other ideas
that might help, I'm listening.


>
>
> > I haven't yet looked into how I plan to do the actual uploading but I'll
> > take care to make sure it's easily reversible if anything goes wrong and
> > doesn't hammer any servers.
>
> There are people who've spent years with the project and not gotten
> imports right, I think this is a less trivial problem than you might
> expect.
>
>
I hear this every time imports come up.  I got it.  It's hard.  That's why I'm
soliciting feedback and willing to take my time and am really trying to do
it correctly.  I'm not willing to just give up because there have been
problems with imports in the past.


> > I've also made a wiki page for the import.
> >
> > Feedback welcome here or on the wiki page.
>
> This really belongs on the imports list as well, but my feedback would be:
>
> 1) Where's the shapefile? (if for nothing else, then for the license, but
> also for feedback)
>
 I added it to the wiki page.  Again, I'm sorry it wasn't there to begin
with.  The shapefile is here.  As for the license, I believe it's okay, but
I posted that bit to legal-talk because I thought it belonged there.


> 2) Can you attach the addresses to real objects (rather than standalone
> nodes)?

Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Katie Filbert
On Thu, Dec 9, 2010 at 6:20 PM, Serge Wroclawski  wrote:

> On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius 
> wrote:
> > I've been working on an import of San Francisco address node data.  I
> have
> > several thoughts and questions and would appreciate any feedback.
>
> The Wiki page doesn't mention the original dataset url. I have a few
> concerns:
>
> 1) Without seeing the dataset url, it's hard to know anything about
> the dataset (its age, accuracy, etc.)
>

> This is a real problem with imports- knowing the original quality of
> the dataset before it's imported.
>
> The project has had to remove or correct so many bad datasets, it's
> incredibly annoying.
>
> > About the data.  It's in shapefile format, containing about 230,000
> > individual nodes.  The data is really high quality and all of the
> addresses
> > I have checked are correct.  It has pretty complete coverage of the
> entire
> > city.
>
> IMHO, individual node addresses are pretty awful. If you can
> import the building outlines, and then attach the addresses to them,
> great (and you'll need to consider what's to be done with any existing
> data), but otherwise, IMHO, this dataset just appears as noise.
>



> Also, there are a large number of places where there are multiple nodes in
> one location if there is more than one address at that location.  One
> example would be a house broken into five apartments.  Sometimes they keep
> one address and use apartment numbers and sometimes each apartment gets its
> own house number.  In the latter cases there will be five nodes with
> different addr:housenumber fields but identical addr:street and lat/long
> coordinates.

> Should I keep the individual nodes or should I combine them?

> Honestly, I think this is very much cart-before-horse. Please consider
> making a test of your dataset somewhere people can check out, and then
> solicit feedback on the process.
>
>
>
> > I haven't yet looked into how I plan to do the actual uploading but I'll
> > take care to make sure it's easily reversible if anything goes wrong and
> > doesn't hammer any servers.
>
> There are people who've spent years with the project and not gotten
> imports right, I think this is a less trivial problem than you might
> expect.
>
>
> > I've also made a wiki page for the import.
> >
> > Feedback welcome here or on the wiki page.
>
> This really belongs on the imports list as well, but my feedback would be:
>
> 1) Where's the shapefile? (if for nothing else, then for the license, but
> also for feedback)
> 2) Can you attach the addresses to real objects (rather than standalone
> nodes)?
> 3) What metadata will you keep from the other dataset?
> 4) How will you handle internally conflicting data?
> 5) How will you handle conflicts with existing OSM data?
>
> - Serge
>
>
A few comments...

1) San Francisco explicitly says they do not have building outline data. :(
So, I suppose we get to add buildings ourselves.  I do see that SF does have
parcels.

For DC, we are attaching addresses to buildings when there is a one-to-one
relation between them.  When there are multiple address nodes for a single
building, then we keep them as nodes. In the vast majority of cases, we do not
have apartment numbers but in some cases we have things like 1120a, 1120b,
1120c that can be imported.  Obviously, without a buildings dataset, our
approach won't quite apply for SF.

2) I don't consider the addresses as noise.  The data is very helpful for
geocoding.  If the renderer does a sloppy job making noise out of addresses,
the renderings should be improved.

3) Having looked at the data catalogue page, I do have concerns about the
terms of use and think it's best to get SF to explicitly agree to allow OSM
to use the data.

http://gispub02.sfgov.org/website/sfshare/index2.asp

4) If you can get explicit permission, then I suggest breaking up the
address nodes into smaller chunks (e.g. by census block group), convert them
to osm format with Ian's shp-to-osm tool, and check them for quality and
against existing OSM data (e.g. existing pois w/ addresses) in JOSM before
importing.  QGIS and/or PostGIS can be useful for chopping up the data into
geographic chunks.  This approach gives opportunity to apply due diligence,
to check things, and keep chunks small enough that it's reasonably possible
to deal with any mistakes or glitches.
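The chunking step above can be sketched without any GIS stack at all, assuming plain (lon, lat) address nodes. Census block groups need boundary polygons (e.g. via PostGIS, as suggested); as a stand-in, this sketch bins nodes into a fixed grid and caps chunk size so each upload stays well under API limits. The cell size, cap, and sample coordinates are illustrative, not values from the thread.

```python
from collections import defaultdict

def chunk_nodes(nodes, cell_deg=0.01, max_per_chunk=5000):
    """Group (lon, lat, tags) nodes by grid cell, splitting oversized cells."""
    cells = defaultdict(list)
    for lon, lat, tags in nodes:
        # Bin each node into a grid cell keyed by floored coordinates.
        cells[(int(lon // cell_deg), int(lat // cell_deg))].append((lon, lat, tags))
    chunks = []
    for members in cells.values():
        # Split any cell that exceeds the per-chunk cap.
        for i in range(0, len(members), max_per_chunk):
            chunks.append(members[i:i + max_per_chunk])
    return chunks

# 12 nodes in one cell, capped at 10 per chunk -> two chunks.
nodes = [(0.005 + i * 1e-4, 0.005, {"addr:housenumber": str(i)}) for i in range(12)]
print(len(chunk_nodes(nodes, cell_deg=0.01, max_per_chunk=10)))  # 2
```

Each chunk could then be written out as its own .osm file and reviewed in JOSM before upload.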

-Katie








Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Mike N.

MHO is that individual node addresses are pretty awful. If you can
import the building outlines, and then attach the addresses to them,
great (and you'll need to consider what's to be done with any existing
data), but otherwise, IMHO, this dataset just appears as noise.


   Why does the dataset appear as noise when not attached to another object?
Have I been mapping address nodes wrong?






Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Nathan Edgars II
On Thu, Dec 9, 2010 at 6:44 PM, Mike Thompson  wrote:
> Also, there are cases where a
> single building has its own zip code, and these do not show up in the
> census zip code polygons.

Or an entire (company-owned) city: Lake Buena Vista, Florida has been
32830 since 1971, but the TIGER polygons don't recognize this.



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Mike Thompson
On Thu, Dec 9, 2010 at 4:09 PM, Mike N.  wrote:
>>First, I've looked at how address nodes have been input manually.  In some
>> places they are just addr:housenumber and addr:street and nothing else.  In
>> other places they include the city and the country and sometimes another
>> administrative level such as state.  Since the last three pieces of
>> information can be fairly easily derived I was thinking of just doing the
>> house number and the street.   The dataset is fairly large so I don't want
>> to include any extra fields if I don't have to.  Is this level of
>> information sufficient?  Or should I include the city and the state and the
>> country in each node?
>    I would recommend just addr:housenumber and addr:street.   The reason is
> that the city, state, etc can be derived from bounding polygons.   In
> addition, those polygons frequently change.  By not including city, state,
> etc, there is one less step to go through when the boundaries change.

That works for states, but not cities, as the cities used in postal
addresses don't match municipal boundaries in many cases.  It would be
good to include postal codes (zip codes in U.S.) as it would eliminate
the need for a city provided the application doing the routing has a
suitable look up table.  But there are problems with this as well. For
example, the USPS is always making changes to the zip codes and
usually the only authoritative source is licensed data from the USPS
(i.e. there is usually no way to observe zip codes from a field
survey).

Note that the zip code boundaries from the US Census Bureau are not
real zip code boundaries, they are only for statistical purposes and
have been edited to fit that purpose.  Also, there are cases where a
single building has its own zip code, and these do not show up in the
census zip code polygons.



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Serge Wroclawski
On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius  wrote:
> I've been working on an import of San Francisco address node data.  I have
> several thoughts and questions and would appreciate any feedback.

The Wiki page doesn't mention the original dataset url. I have a few concerns:

1) Without seeing the dataset url, it's hard to know anything about
the dataset (its age, accuracy, etc.)

This is a real problem with imports- knowing the original quality of
the dataset before it's imported.

The project has had to remove or correct so many bad datasets, it's
incredibly annoying.

> About the data.  It's in shapefile format, containing about 230,000
> individual nodes.  The data is really high quality and all of the addresses
> I have checked are correct.  It has pretty complete coverage of the entire
> city.

IMHO, individual node addresses are pretty awful. If you can
import the building outlines, and then attach the addresses to them,
great (and you'll need to consider what's to be done with any existing
data), but otherwise, IMHO, this dataset just appears as noise.


> Also, there are a large number of places where there are multiple nodes in
> one location if there is more than one address at that location.  One
> example would be a house broken into five apartments.  Sometimes they keep
> one address and use apartment numbers and sometimes each apartment gets its
> own house number.  In the latter cases there will be five nodes with
> different addr:housenumber fields but identical addr:street and lat/long
> coordinates.

> Should I keep the individual nodes or should I combine them?

Honestly, I think this is very much cart-before-horse. Please consider
making a test of your dataset somewhere people can check out, and then
solicit feedback on the process.


> I haven't yet looked into how I plan to do the actual uploading but I'll
> take care to make sure it's easily reversible if anything goes wrong and
> doesn't hammer any servers.

There are people who've spent years with the project and not gotten
imports right, I think this is a less trivial problem than you might
expect.


> I've also made a wiki page for the import.
>
> Feedback welcome here or on the wiki page.

This really belongs on the imports list as well, but my feedback would be:

1) Where's the shapefile? (if for nothing else, then for the license, but
also for feedback)
2) Can you attach the addresses to real objects (rather than standalone nodes)?
3) What metadata will you keep from the other dataset?
4) How will you handle internally conflicting data?
5) How will you handle conflicts with existing OSM data?

- Serge



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Mike N.
>First, I've looked at how address nodes have been input manually.  In some 
>places they are just addr:housenumber and addr:street and nothing else.  In 
>other places they include the city and the country and sometimes another 
>administrative level such as state.  Since the last three pieces of 
>information can be fairly easily derived I was thinking of just doing the 
>house number and the street.   The dataset is fairly large so I don't want to 
>include any extra fields if I don't have to.  Is this level of information 
>sufficient?  Or should I include the city and the state and the country in 
>each node?

   I would recommend just addr:housenumber and addr:street.  The reason is
that the city, state, etc. can be derived from bounding polygons.  In
addition, those polygons frequently change.  By not including city, state,
etc., there is one less step to go through when the boundaries change.
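Deriving city and state from bounding polygons boils down to a point-in-polygon test against administrative boundaries. A minimal ray-casting sketch follows; real pipelines would use PostGIS or a geometry library, and the square here is a toy stand-in, not real San Francisco geometry.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Count edges that cross the horizontal ray going left from the point.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

# Toy rectangle roughly around San Francisco (illustrative coordinates only).
city = [(-122.52, 37.70), (-122.35, 37.70), (-122.35, 37.84), (-122.52, 37.84)]
print(point_in_polygon(-122.42, 37.77, city))  # True
```

A geocoder doing this lookup at query time is exactly why the tags themselves can omit city and state.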

>Also, there are a large number of places where there are multiple nodes in one 
>location if there is more than one address at that location.  One example 
>would be a house broken into five apartments.  Sometimes they keep one address 
>and use apartment numbers and sometimes each apartment gets its own house 
>number.  In the latter cases there will be five nodes with different 
>addr:housenumber fields but identical addr:street and lat/long coordinates.  
>Should I keep the individual nodes or should I combine them?  For instance, I 
>could do one node and have addr:housenumber=5;6;7;8;9 or have a node for each 
>address.   Combining nodes would cut the number of nodes imported by about 40% 
>but I fear that it might be harder to work with manually and also not 
>recognized by routers and other software.

   I would recommend a node per address; this matches the existing Wiki
convention, and should work with routers and Nominatim.  Editors don't make
it easy to access an individual node out of a stack, but it is not too
difficult for the odd case where it might be necessary.


[Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Gregory Arenius
I've been working on an import of San Francisco address node data.  I have
several thoughts and questions and would appreciate any feedback.

About the data.  It's in shapefile format, containing about 230,000
individual nodes.  The data is really high quality and all of the addresses
I have checked are correct.  It has pretty complete coverage of the entire
city.

First, I've looked at how address nodes have been input manually.  In some
places they are just addr:housenumber and addr:street and nothing else.  In
other places they include the city and the country and sometimes another
administrative level such as state.  Since the last three pieces of
information can be fairly easily derived I was thinking of just doing the
house number and the street.   The dataset is fairly large so I don't want
to include any extra fields if I don't have to.  Is this level of
information sufficient?  Or should I include the city and the state and the
country in each node?

Also, there are a large number of places where there are multiple nodes in
one location if there is more than one address at that location.  One
example would be a house broken into five apartments.  Sometimes they keep
one address and use apartment numbers and sometimes each apartment gets its
own house number.  In the latter cases there will be five nodes with
different addr:housenumber fields but identical addr:street and lat/long
coordinates.  Should I keep the individual nodes or should I combine them?
For instance, I could do one node and have addr:housenumber=5;6;7;8;9 or
have a node for each address.   Combining nodes would cut the number of
nodes imported by about 40%, but I fear that it might be harder to work with
manually and also not be recognized by routers and other software.
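The trade-off between the two tagging options can be shown concretely: a combined node with addr:housenumber=5;6;7;8;9 versus one node per address. Since routers and geocoders generally expect the latter, this sketch expands the semicolon form back into individual nodes (the data values are illustrative, not from the dataset):

```python
def expand_combined(node):
    """Yield one node per housenumber from a semicolon-combined address node."""
    for number in node["tags"]["addr:housenumber"].split(";"):
        # Copy the tags, overriding the housenumber with the single value.
        tags = dict(node["tags"], **{"addr:housenumber": number.strip()})
        yield {"lat": node["lat"], "lon": node["lon"], "tags": tags}

combined = {"lat": 37.77, "lon": -122.42,
            "tags": {"addr:housenumber": "5;6;7;8;9", "addr:street": "Example St"}}
print(len(list(expand_combined(combined))))  # 5
```

The 40% size saving from combining would thus be undone by any consumer that has to split the values back out anyway.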

Before importing the data I will run a comparison against existing OSM data
and not upload nodes that match an existing addr:housenumber/addr:street
combination.  There aren't many plain address nodes in the city at the
moment (a couple hundred, tops) but there are a fair number of businesses
that have had address data added to them and I don't want any duplicate
address nodes as a result of this import.
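The duplicate check described here amounts to skipping any imported node whose (addr:street, addr:housenumber) pair already exists in OSM. A minimal sketch, assuming nodes are represented as tag dictionaries; the normalization (case-folding, whitespace trimming) is my assumption, not something stated in the thread:

```python
def address_key(tags):
    """Normalized (street, housenumber) key for duplicate detection."""
    return (tags.get("addr:street", "").strip().lower(),
            tags.get("addr:housenumber", "").strip().lower())

def filter_new(imported, existing):
    """Return imported tag dicts whose address is not already in OSM."""
    seen = {address_key(t) for t in existing}
    return [t for t in imported if address_key(t) not in seen]

existing = [{"addr:street": "Market St", "addr:housenumber": "1"}]
imported = [{"addr:street": "market st", "addr:housenumber": "1"},
            {"addr:street": "Market St", "addr:housenumber": "3"}]
print(len(filter_new(imported, existing)))  # 1
```

In practice the existing set would come from an Overpass/XAPI extract of all addr-tagged objects in the city, including businesses with address tags.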

There are only a very few address ways in the SF dataset, but they aren't
anywhere near as accurate as the data I will be importing, so I plan on
deleting those.

I haven't yet looked into how I plan to do the actual uploading but I'll
take care to make sure it's easily reversible if anything goes wrong and
doesn't hammer any servers.

I've also made a wiki page for the
import.

Feedback welcome here or on the wiki page.

Cheers,
Gregory Arenius