Re: [Talk-us] Address Node Import for San Francisco

2010-12-13 Thread Gregory Arenius
As I've been out of touch, here is a sort of omnibus reply to the last couple
of days' worth of discussion on SF addresses.  Thanks for all the ideas and
help.


 I would say this is one of the easy imports; there is not too much harm it
 can create. The only problem is merging it with existing data and making a
 decision about which one is better. Since this data is probably authoritative it
 might be OK to replace most of the less accurate data already in OSM.
 For this reason I would drop any of the nodes in case of a conflict but
 rename the tags to something else like sf_addrimport_addr:*.
 A survey on the road can check them later, compare with the existing
 addr nodes, decide which one to keep, and rename the import tags to the
 real tags.


I don't think that the data currently in OSM is less accurate than that of
the import.  The address data currently in OSM in SF is either on a node for
something else, like a restaurant or shop, or it's one of the very few
standalone address nodes that have been entered.

I think that having the data attached to a business is much more valuable
than having it alone, and I don't intend to overwrite any of that.

There are only a couple of little areas, comprising maybe half a dozen blocks,
that have standalone address nodes.  The data that's in there looks like it
has been carefully entered and I don't doubt its accuracy, especially because
I've met one of the mappers that did some of it and she knows what she's
doing.

As such I don't really think it's worth having a fallback
sf_addrimport_addr: tag for conflicting nodes and would rather just drop the
ones from the dataset I'm importing if they conflict.  I definitely will add
one, though, if anything I come across in the data makes me think it would be
worthwhile.

Do spot-check different neighborhoods. In reviewing the San Bernardino
 County assessor's shapefile, I found that housenumbers, ZIP codes, and even
 street names were missing/wrong in some areas I spot-checked. The county's
 response was that this data was of secondary importance to the assessor,
 understandably - as long as they have all the parcels, and the billing
 address for them, the actual postal address of the parcel is not critical
 info.


I will spot check different neighborhoods to make sure that they're of equal
quality to the blocks I've checked which have mostly been ones local to
where I live and work.

I've found no reason to think that any of the data is billing addresses for
the parcel instead of the mailing address of the parcel.  I'll keep an eye
out for it though.

As to ZIP codes, I don't plan on putting any in because I haven't found a
source for them that I feel would work.

As for a demo of the data, yeah, an OSM file would be perfect. Also,
 though, I'd keep the previous dataset ID, in case you need to do a
 comparison later.


I will definitely post an OSM file once I have something a bit closer to
being import ready.

As to the previous dataset ID, in this case the ObjectID, I'm not
particularly opposed to keeping it; I'm just not sure what we'd gain in this
instance, and I know there are people who object to having lots of third-party
IDs in our database.

In this particular instance I think comparisons between OSM and any future
SFAddress files can be done just as well using the
addr:housenumber/addr:street combo, which should, ideally, be unique.  As
we'll have a fair number of nodes that aren't imported because the address
is already taken by a business or otherwise already in OSM, we'd have to
resort to using that sort of matching system anyhow.
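The matching step described above can be sketched in a few lines. This is a hypothetical illustration (sample addresses and field layout are made up), not the actual import script:

```python
# Hypothetical sketch of the matching step: drop import candidates
# whose addr:housenumber/addr:street pair already exists in OSM.
# Sample addresses are made up; this is not the actual import script.

def norm(street):
    """Normalize a street name for comparison (case and whitespace)."""
    return " ".join(street.upper().split())

def dedupe(candidates, existing):
    """Keep only candidates whose (housenumber, street) pair is new."""
    taken = {(hn, norm(st)) for hn, st in existing}
    return [(hn, st) for hn, st in candidates
            if (hn, norm(st)) not in taken]

existing_osm = [("555", "Market Street")]           # already in OSM
imported = [("555", "market  street"), ("557", "Market Street")]
print(dedupe(imported, existing_osm))               # [('557', 'Market Street')]
```

A real run would also want the street-name normalization to handle abbreviations, since the source and OSM spellings rarely agree exactly.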

I don't agree that the other info can be easily, or accurately, derived.
 Addresses near the borders of those polygons are often subject to
 seemingly-arbitrary decisions. The physical location of the centroid of a
 parcel may not be within the same ZIP, city, and/or county polygons as their
 address. I would include the city and ZIP code.


Makes sense.  I will include the city, but as stated above I don't have the
ZIPs.

Just wanna say that addressing in SF would be awesome :-)


The goal is to make it so that SF is fully routable.  As we have good (not
perfect, but really good) street geometry, junctions, classifications (e.g.,
primary, residential, etc.), oneways and names, the main things we're lacking
are addresses and turn restrictions.  Hopefully we'll get addresses from
this import.  As there doesn't seem to be any source for the turn
restriction data, I've put up a page on the wiki
(http://wiki.openstreetmap.org/wiki/San_Francisco_Turn_Restrictions) to
help coordinate efforts to map them, and put a little dent in it myself to
start.  Hopefully some more people join in.  I think getting these two
things done will put OSM at a pretty competitive level with any of the
commercial data providers with respect to SF.


 Hopefully this is helpful, as you'll want to import street names that
 actually match those in OSM's view of San Francisco.


It is helpful, thank you, especially being able to see where many of the

Re: [Talk-us] Address Node Import for San Francisco

2010-12-12 Thread SteveC
Just wanna say that addressing in SF would be awesome :-)

Steve

stevecoast.com

On Dec 10, 2010, at 1:29 AM, Katie Filbert filbe...@gmail.com wrote:

 On Thu, Dec 9, 2010 at 6:20 PM, Serge Wroclawski emac...@gmail.com wrote:
 On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius greg...@arenius.com wrote:
  I've been working on an import of San Francisco address node data.  I have
  several thoughts and questions and would appreciate any feedback.
 
 The Wiki page doesn't mention the original dataset url. I have a few concerns:
 
 1) Without seeing the dataset url, it's hard to know anything about
 the dataset (its age, accuracy, etc.) 
 
 This is a real problem with imports- knowing the original quality of
 the dataset before it's imported.
 
 The project has had to remove or correct so many bad datasets, it's
 incredibly annoying.
 
  About the data.  It's in a shapefile format containing about 230,000
  individual nodes.  The data is really high quality and all of the addresses
  I have checked are correct.  It has pretty complete coverage of the entire
  city.
 
 MHO is that individual node addresses are pretty awful. If you can
 import the building outlines, and then attach the addresses to them,
 great (and you'll need to consider what's to be done with any existing
 data), but otherwise, IMHO, this dataset just appears as noise.
 
  
 
  Also, there are a large number of places where there are multiple nodes in
  one location if there is more than one address at that location.  One
  example would be a house broken into five apartments.  Sometimes they keep
  one address and use apartment numbers and sometimes each apartment gets its
  own house number.  In the latter cases there will be five nodes with
  different addr:housenumber fields but identical addr:street and lat/long
  coordinates.
 
  Should I keep the individual nodes or should I combine them?
 
 Honestly, I think this is a very cart-before-horse. Please consider
 making a test of your dataset somewhere people can check out, and then
 solicit feedback on the process.
 
 
  I haven't yet looked into how I plan to do the actual uploading but I'll
  take care to make sure its easily reversible if anything goes wrong and
  doesn't hammer any servers.
 
 There are people who've spent years with the project and not gotten
 imports right, I think this is a less trivial problem than you might
 expect.
 
 
  I've also made a wiki page for the import.
 
  Feedback welcome here or on the wiki page.
 
 This really belongs on the imports list as well, but my feedback would be:
 
 1) Where's the shapefile? (if for nothing else than the license, but
 also for feedback)
 2) Can you attach the addresses to real objects (rather than standalone 
 nodes)?
 3) What metadata will you keep from the other dataset?
 4) How will you handle internally conflicting data?
 5) How will you handle conflicts with existing OSM data?
 
 - Serge
 
 
 A few comments...
 
 1) San Francisco explicitly says they do not have building outline data. :(  
 So, I suppose we get to add buildings ourselves.  I do see that SF does have 
 parcels.  
 
 For DC, we are attaching addresses to buildings when there is a one-to-one 
 relation between them.  When there are multiple address nodes for a single 
 building, then we keep them as nodes. In vast majority of cases, we do not 
 have apartment numbers but in some cases we have things like 1120a, 1120b, 
 1120c that can be imported.  Obviously, without a buildings dataset, our 
 approach won't quite apply for SF.
 
 2) I don't consider the addresses as noise.  The data is very helpful for 
 geocoding.  If the renderer does a sloppy job making noise out of addresses, 
 the renderings should be improved. 
 
 3) Having looked at the data catalogue page, I do have concerns about the 
 terms of use and think it's best to get SF to explicitly agree to allow OSM 
 to use the data.
 
 http://gispub02.sfgov.org/website/sfshare/index2.asp
 
 4) If you can get explicit permission, then I suggest breaking up the address 
 nodes into smaller chunks (e.g. by census block group), convert them to osm 
 format with Ian's shp-to-osm tool, and check them for quality and against 
 existing OSM data (e.g. existing pois w/ addresses) in JOSM before importing. 
  QGIS and/or PostGIS can be useful for chopping up the data into geographic 
 chunks.  This approach gives opportunity to apply due diligence, to check 
 things, and keep chunks small enough that it's reasonably possible to deal 
 with any mistakes or glitches.
 
 -Katie
 
  
 ___
 Talk-us mailing list
 Talk-us@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/talk-us
 
 
 
 -- 
 Katie Filbert
 filbe...@gmail.com
 @filbertkm

Re: [Talk-us] Address Node Import for San Francisco

2010-12-12 Thread Michal Migurski
On Dec 9, 2010, at 3:00 PM, Gregory Arenius wrote:

 About the data.  It's in a shapefile format containing about 230,000
 individual nodes.  The data is really high quality and all of the addresses I 
 have checked are correct.  It has pretty complete coverage of the entire city.

I've worked with this file before. When I matched it to OSM data two years ago, 
I found that the SF data had numerous errors, so I wrote this mapping script:

http://mike.teczno.com/img/sf-addresses/mapping.py
Usage: mapping.py [osm streets csv] [sf streets csv] [street names csv]

Here are all the street names in the shapefile:
http://mike.teczno.com/img/sf-addresses/sfaddresses.csv

Here are all the street names in OSM at the time I did the comparison (may have 
changed since):
http://mike.teczno.com/img/sf-addresses/osm_streets.csv

And this is the mapping result I got:
http://mike.teczno.com/img/sf-addresses/street_names.csv

Hopefully this is helpful, as you'll want to import street names that actually 
match those in OSM's view of San Francisco.
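As a toy illustration of this kind of street-name comparison (NOT the actual mapping.py, whose logic I haven't reproduced), one might canonicalize abbreviations before matching. The abbreviation table here is a made-up sample, far smaller than a real import would need:

```python
# Toy sketch of matching street names from a source dataset against
# OSM spellings by expanding common abbreviations first. The ABBREV
# table is an illustrative made-up sample.

ABBREV = {"ST": "STREET", "AVE": "AVENUE", "BLVD": "BOULEVARD"}

def canonical(name):
    """Uppercase, strip periods, and expand known abbreviations."""
    words = name.upper().replace(".", "").split()
    return " ".join(ABBREV.get(w, w) for w in words)

def map_names(source_names, osm_names):
    """Return {source name: matching OSM name, or None if unmatched}."""
    by_canon = {canonical(n): n for n in osm_names}
    return {s: by_canon.get(canonical(s)) for s in source_names}

osm = ["Market Street", "Van Ness Avenue"]
sf = ["MARKET ST", "VAN NESS AVE", "FAKE RD"]
print(map_names(sf, osm))
```

Unmatched names (the None entries) are exactly the ones worth reviewing by hand before import.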

I found some other weird burrs in the data as well, in terms of how it arranges 
addresses stacked on top of one another in tall buildings. Nothing that can't 
be dealt with in an import.

I also did a bunch of geometry work to match those address points to nearby 
street segments in order to break up the street grid into address segments, 
but that code is a bit of a rat's nest. The idea was to build up the little 
block numbers you see rendered here:
http://www.flickr.com/photos/mmigurski/5229627985/sizes/l/

Katie's suggestion of breaking the data into smaller chunks is a good one.

-mike.


michal migurski- m...@stamen.com
 415.558.1610






Re: [Talk-us] Address Node Import for San Francisco

2010-12-10 Thread Alan Mintz

At 2010-12-09 17:14, Katie Filbert wrote:

...
With buildings, our data was a bit denser. I did some by census tract and 
found some were too big for the OSM API and JOSM whereas census block 
group has worked well. With just nodes, I think you could do somewhat 
larger chunks.


Were the shapes (needlessly) over-digitized? I saw this with some of the 
CASIL polygons, and with much of Kern County, where huge reductions 
(thousands of nodes per square mile) were possible with little effect on 
rendering, using the JOSM simplify. Hopefully there is something similar 
available in the import tool collection.
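For reference, the JOSM simplify action is essentially Douglas-Peucker style: drop vertices whose deviation from the chord between endpoints is under a tolerance. A rough sketch of that idea (not JOSM's actual implementation):

```python
# Rough Douglas-Peucker sketch of what "simplify" does: recursively
# drop interior vertices that deviate from the endpoint chord by
# less than a tolerance. Not JOSM's actual code.

def line_dist(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def simplify(points, tol):
    if len(points) < 3:
        return points
    # find the interior vertex farthest from the endpoint chord
    dmax, idx = max((line_dist(p, points[0], points[-1]), i)
                    for i, p in enumerate(points[1:-1], 1))
    if dmax <= tol:
        return [points[0], points[-1]]
    return simplify(points[:idx + 1], tol)[:-1] + simplify(points[idx:], tol)

wiggly = [(0, 0), (1, 0.01), (2, -0.01), (3, 0)]
print(simplify(wiggly, 0.1))   # [(0, 0), (3, 0)]
```

The tolerance plays the same role as JOSM's simplify parameter: larger values drop more vertices at the cost of geometric fidelity.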


--
Alan Mintz alan_mintz+...@earthlink.net




Re: [Talk-us] Address Node Import for San Francisco

2010-12-10 Thread Alan Mintz

At 2010-12-09 15:00, Gregory Arenius wrote:
About the data.  It's in a shapefile format containing about 230,000
individual nodes.  The data is really high quality and all of the 
addresses I have checked are correct.  It has pretty complete coverage of 
the entire city.


Do spot-check different neighborhoods. In reviewing the San Bernardino 
County assessor's shapefile, I found that housenumbers, ZIP codes, and even 
street names were missing/wrong in some areas I spot-checked. The county's 
response was that this data was of secondary importance to the assessor, 
understandably - as long as they have all the parcels, and the billing 
address for them, the actual postal address of the parcel is not critical info.



First, I've looked at how address nodes have been input manually.  In some 
places they are just addr:housenumber and addr:street and nothing 
else.  In other places they include the city and the country and sometimes 
another administrative level such as state.  Since the last three pieces 
of information can be fairly easily derived I was thinking of just doing 
the house number and the street.
 The dataset is fairly large so I don't want to include any extra fields 
if I don't have to.  Is this level of information sufficient?  Or should 
I include the city and the state and the country in each node?


I don't agree that the other info can be easily, or accurately, derived. 
Addresses near the borders of those polygons are often subject to 
seemingly-arbitrary decisions. The physical location of the centroid of a 
parcel may not be within the same ZIP, city, and/or county polygons as 
their address. I would include the city and ZIP code.


Note, BTW, that there are lots of ZIP code issues that come up, and I'm not 
always sure how to deal with them. I'll look up an address I know to exist 
using http://zip4.usps.com/zip4/welcome.jsp, but it won't find it - often 
because the USPS uses a different city name. It seems to happen a lot in 
rural areas, but not exclusively, and not always for the reason you might 
think (that it's the city of the post office that serves the address). 
Hopefully, that won't be a problem for your single-city import, though.


--
Alan Mintz alan_mintz+...@earthlink.net




Re: [Talk-us] Address Node Import for San Francisco

2010-12-10 Thread Katie Filbert
On Dec 10, 2010, at 5:10 AM, Alan Mintz alan_mintz+...@earthlink.net  
wrote:



At 2010-12-09 17:14, Katie Filbert wrote:

...
With buildings, our data was a bit denser. I did some by census  
tract and found some were too big for the OSM API and JOSM whereas  
census block group has worked well. With just nodes, I think you  
could do somewhat larger chunks.


Were the shapes (needlessly) over-digitized? I saw this with some of  
the CASIL polygons, and with much of Kern County, where huge  
reductions (thousands of nodes per square mile) were possible with  
little effect on rendering, using the JOSM simplify. Hopefully  
there is something similar available in the import tool collection.


Yes, simplifying the buildings in JOSM is an important step in the
process.  I tweaked the simplify parameter to get it as good as possible,
though I'm not 100% happy with the results, so I do some more manual tweaking
in JOSM.


Katie





--
Alan Mintz alan_mintz+...@earthlink.net






[Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Gregory Arenius
I've been working on an import of San Francisco address node data.  I have
several thoughts and questions and would appreciate any feedback.

About the data.  It's in a shapefile format containing about 230,000
individual nodes.  The data is really high quality and all of the addresses
I have checked are correct.  It has pretty complete coverage of the entire
city.

First, I've looked at how address nodes have been input manually.  In some
places they are just addr:housenumber and addr:street and nothing else.  In
other places they include the city and the country and sometimes another
administrative level such as state.  Since the last three pieces of
information can be fairly easily derived I was thinking of just doing the
house number and the street.   The dataset is fairly large so I don't want
to include any extra fields if I don't have to.  Is this level of
information sufficient?  Or should I include the city and the state and the
country in each node?

Also, there are a large number of places where there are multiple nodes in
one location if there is more than one address at that location.  One
example would be a house broken into five apartments.  Sometimes they keep
one address and use apartment numbers and sometimes each apartment gets its
own house number.  In the latter cases there will be five nodes with
different addr:housenumber fields but identical addr:street and lat/long
coordinates.  Should I keep the individual nodes or should I combine them?
For instance, I could do one node and have addr:housenumber=5;6;7;8;9 or
have a node for each address.  Combining nodes would cut the number of
nodes imported by about 40%, but I fear that it might be harder to work with
manually and also not be recognized by routers and other software.
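To make the two options concrete, here is a small hypothetical sketch (illustrative tag dicts, not real import code) that expands the combined addr:housenumber=5;6;7;8;9 form into one tagged node per address:

```python
# Sketch contrasting the two options discussed: expand a combined
# addr:housenumber="5;6;7;8;9" value into one tag dict per address,
# each ready to become its own node.

def expand(housenumbers, street):
    """One tag dict per semicolon-separated housenumber."""
    return [{"addr:housenumber": hn, "addr:street": street}
            for hn in housenumbers.split(";")]

nodes = expand("5;6;7;8;9", "Example Street")
print(len(nodes))   # 5
print(nodes[0])     # {'addr:housenumber': '5', 'addr:street': 'Example Street'}
```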

Before importing the data I will run a comparison against existing OSM data
and not upload nodes that match an existing addr:housenumber/addr:street
combination.  There aren't many plain address nodes in the city at the
moment (a couple hundred, tops) but there are a fair number of businesses
that have had address data added to them and I don't want any duplicate
address nodes as a result of this import.

There are only a very few address ways in the SF dataset, but they aren't
anywhere near as accurate as the data I will be importing, so I plan on
deleting those.

I haven't yet looked into how I plan to do the actual uploading, but I'll
take care to make sure it's easily reversible if anything goes wrong and
doesn't hammer any servers.

I've also made a wiki page for the import:
http://wiki.openstreetmap.org/wiki/San_Francisco_Address_Import

Feedback welcome here or on the wiki page.

Cheers,
Gregory Arenius


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Mike N.
First, I've looked at how address nodes have been input manually.  In some 
places they are just addr:housenumber and addr:street and nothing else.  In 
other places they include the city and the country and sometimes another 
administrative level such as state.  Since the last three pieces of 
information can be fairly easily derived I was thinking of just doing the 
house number and the street.   The dataset is fairly large so I don't want to 
include any extra fields if I don't have to.  Is this level of information 
sufficient?  Or should I include the city and the state and the country in 
each node?

   I would recommend just addr:housenumber and addr:street.   The reason is 
that the city, state, etc. can be derived from bounding polygons.   In addition, 
those polygons frequently change.  By not including city, state, etc., there is 
one less step to go through when the boundaries change.
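The derive-from-bounding-polygons idea above amounts to a point-in-polygon lookup. A minimal sketch (even-odd ray casting, with a crude made-up rectangle standing in for a real city boundary):

```python
# Minimal point-in-polygon sketch of deriving city/state from bounding
# polygons. The rectangle below is a made-up stand-in for a real
# boundary polygon, not actual SF limits.

def point_in_polygon(x, y, poly):
    """True if (x, y) is inside poly, a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # toggle for each edge crossing the horizontal ray to the right
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

sf_box = [(-122.52, 37.70), (-122.35, 37.70),
          (-122.35, 37.84), (-122.52, 37.84)]        # rough rectangle
print(point_in_polygon(-122.42, 37.77, sf_box))      # True
print(point_in_polygon(-122.27, 37.80, sf_box))      # False
```

As the follow-ups in the thread note, this only works where the relevant boundary polygons actually match the postal reality, which is exactly the catch with ZIP codes.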

Also, there are a large number of places where there are multiple nodes in one 
location if there is more than one address at that location.  One example 
would be a house broken into five apartments.  Sometimes they keep one address 
and use apartment numbers and sometimes each apartment gets its own house 
number.  In the latter cases there will be five nodes with different 
addr:housenumber fields but identical addr:street and lat/long coordinates.  
Should I keep the individual nodes or should I combine them?  For instance, I 
could do one node and have addr:housenumber=5;6;7;8;9 or have a node for each 
address.   Combining nodes would cut the number of nodes imported by about 40% 
but I fear that it might be harder to work with manually and also not 
recognized by routers and other software.

   I would recommend a node per address; this matches the existing wiki 
convention, and should work with routers and Nominatim.  Editors don't make it 
easy to access an individual node out of a stack, but it is not too difficult 
for the odd case where it might be necessary.


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Mike Thompson
On Thu, Dec 9, 2010 at 4:09 PM, Mike N. nice...@att.net wrote:
First, I've looked at how address nodes have been input manually.  In some
 places they are just addr:housenumber and addr:street and nothing else.  In
 other places they include the city and the country and sometimes another
 administrative level such as state.  Since the last three pieces of
 information can be fairly easily derived I was thinking of just doing the
 house number and the street.   The dataset is fairly large so I don't want
 to include any extra fields if I don't have to.  Is this level of
 information sufficient?  Or should I include the city and the state and the
 country in each node?
    I would recommend just addr:housenumber and addr:street.   The reason is
 that the city, state, etc can be derived from bounding polygons.   In
 addition, those polygons frequently change.  By not including city, state,
 etc, there is one less step to go through when the boundaries change.
That works for states, but not cities as the cities used in postal
addresses don't match municipal boundaries in many cases.  It would be
good to include postal codes (zip codes in U.S.) as it would eliminate
the need for a city provided the application doing the routing has a
suitable look up table.  But there are problems with this as well. For
example, the USPS is always making changes to the zip codes and
usually the only authoritative source is licensed data from the USPS
(i.e. there is usually no way to observe zip codes from a field
survey).

Note that the zip code boundaries from the US Census Bureau are not
real zip code boundaries, they are only for statistical purposes and
have been edited to fit that purpose.  Also, there are cases where a
single building has its own zip code, and these do not show up in the
census zip code polygons.



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Nathan Edgars II
On Thu, Dec 9, 2010 at 6:44 PM, Mike Thompson miketh...@gmail.com wrote:
 Also, there are cases where a
 single building has its own zip code, and these do not show up in the
 census zip code polygons.

Or an entire (company-owned) city: Lake Buena Vista, Florida has been
32830 since 1971, but the TIGER polygons don't recognize this.



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Mike N.

MHO is that individual node addresses are pretty awful. If you can
import the building outlines, and then attach the addresses to them,
great (and you'll need to consider what's to be done with any existing
data), but otherwise, IMHO, this dataset just appears as noise.


  Why does the dataset appear as noise when not attached to another object? 
Have I been mapping address nodes wrong?






Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Katie Filbert
On Thu, Dec 9, 2010 at 6:20 PM, Serge Wroclawski emac...@gmail.com wrote:

 On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius greg...@arenius.com
 wrote:
  I've been working on an import of San Francisco address node data.  I
 have
  several thoughts and questions and would appreciate any feedback.

 The Wiki page doesn't mention the original dataset url. I have a few
 concerns:

 1) Without seeing the dataset url, it's hard to know anything about
 the dataset (its age, accuracy, etc.)


 This is a real problem with imports- knowing the original quality of
 the dataset before it's imported.

 The project has had to remove or correct so many bad datasets, it's
 incredibly annoying.

  About the data.  It's in a shapefile format containing about 230,000
  individual nodes.  The data is really high quality and all of the
 addresses
  I have checked are correct.  It has pretty complete coverage of the
 entire
  city.

 MHO is that individual node addresses are pretty awful. If you can
 import the building outlines, and then attach the addresses to them,
 great (and you'll need to consider what's to be done with any existing
 data), but otherwise, IMHO, this dataset just appears as noise.




 Also, there are a large number of places where there are multiple nodes in
 one location if there is more than one address at that location.  One
 example would be a house broken into five apartments.  Sometimes they keep
 one address and use apartment numbers and sometimes each apartment gets
its
 own house number.  In the latter cases there will be five nodes with
 different addr:housenumber fields but identical addr:street and lat/long
 coordinates.

 Should I keep the individual nodes or should I combine them?

 Honestly, I think this is a very cart-before-horse. Please consider
 making a test of your dataset somewhere people can check out, and then
 solicit feedback on the process.


  I haven't yet looked into how I plan to do the actual uploading but I'll
  take care to make sure its easily reversible if anything goes wrong and
  doesn't hammer any servers.

 There are people who've spent years with the project and not gotten
 imports right, I think this is a less trivial problem than you might
 expect.


  I've also made a wiki page for the import.
 
  Feedback welcome here or on the wiki page.

 This really belongs on the imports list as well, but my feedback would be:

 1) Where's the shapefile? (if for nothing else than the license, but
 also for feedback)
 2) Can you attach the addresses to real objects (rather than standalone
 nodes)?
 3) What metadata will you keep from the other dataset?
 4) How will you handle internally conflicting data?
 5) How will you handle conflicts with existing OSM data?

 - Serge


A few comments...

1) San Francisco explicitly says they do not have building outline data. :(
So, I suppose we get to add buildings ourselves.  I do see that SF does have
parcels.

For DC, we are attaching addresses to buildings when there is a one-to-one
relation between them.  When there are multiple address nodes for a single
building, then we keep them as nodes. In the vast majority of cases, we do not
have apartment numbers but in some cases we have things like 1120a, 1120b,
1120c that can be imported.  Obviously, without a buildings dataset, our
approach won't quite apply for SF.

2) I don't consider the addresses as noise.  The data is very helpful for
geocoding.  If the renderer does a sloppy job making noise out of addresses,
the renderings should be improved.

3) Having looked at the data catalogue page, I do have concerns about the
terms of use and think it's best to get SF to explicitly agree to allow OSM
to use the data.

http://gispub02.sfgov.org/website/sfshare/index2.asp

4) If you can get explicit permission, then I suggest breaking up the
address nodes into smaller chunks (e.g. by census block group), convert them
to osm format with Ian's shp-to-osm tool, and check them for quality and
against existing OSM data (e.g. existing pois w/ addresses) in JOSM before
importing.  QGIS and/or PostGIS can be useful for chopping up the data into
geographic chunks.  This approach gives opportunity to apply due diligence,
to check things, and keep chunks small enough that it's reasonably possible
to deal with any mistakes or glitches.
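The chunking step can be sketched even without boundary data by bucketing points into a coarse grid (census block groups, as suggested, would give tidier chunks). Coordinates below are made up:

```python
# Sketch of the "break the data into smaller chunks" idea using a
# plain lat/lon grid: bucket address points so each upload chunk
# stays a manageable size. Sample coordinates are made up.

from collections import defaultdict

def chunk(points, cells_per_degree=10):
    """Group (lon, lat) points into ~0.1-degree grid cells."""
    buckets = defaultdict(list)
    for lon, lat in points:
        key = (round(lon * cells_per_degree), round(lat * cells_per_degree))
        buckets[key].append((lon, lat))
    return buckets

pts = [(-122.420, 37.770), (-122.421, 37.771), (-122.300, 37.800)]
print(len(chunk(pts)))   # 2 (the first two points share a cell)
```

Each bucket can then be converted and reviewed in JOSM separately, which is the due-diligence step being recommended.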

-Katie







-- 
Katie Filbert
filbe...@gmail.com
@filbertkm


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Gregory Arenius
On Thu, Dec 9, 2010 at 3:20 PM, Serge Wroclawski emac...@gmail.com wrote:

 On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius greg...@arenius.com
 wrote:
  I've been working on an import of San Francisco address node data.  I
 have
  several thoughts and questions and would appreciate any feedback.

 The Wiki page doesn't mention the original dataset url. I have a few
 concerns:


The shapefile is here:
http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip

I added it to the wiki.  I'm sorry, it should have been there to start with.




 1) Without seeing the dataset url, it's hard to know anything about
 the dataset (its age, accuracy, etc.)

 This is a real problem with imports- knowing the original quality of
 the dataset before it's imported.

 The project has had to remove or correct so many bad datasets, it's
 incredibly annoying.


I've spot-checked a number of blocks by going out and comparing the data and
been impressed with its accuracy.  The data is sourced from the Department
of Building Inspection's Address Verification System, the Assessor-Recorder
Office's Parcel database and the Department of Elections (Voter Registration
Project).  I believe it to be high quality, and I have been told by someone
else who has used it that the dataset is legit.


  About the data.  It's in a shapefile format containing about 230,000
  individual nodes.  The data is really high quality and all of the
 addresses
  I have checked are correct.  It has pretty complete coverage of the
 entire
  city.

 MHO is that individual node addresses are pretty awful. If you can
 import the building outlines, and then attach the addresses to them,
 great (and you'll need to consider what's to be done with any existing
 data), but otherwise, IMHO, this dataset just appears as noise.


The wiki states that this is how address nodes are done.  They can be
attached to other objects, of course, but they can also be independent.  Like
I stated earlier, I did check how they are actually being done elsewhere, and
the ones I've seen entered are done in this manner.

Also, why do you think of them as noise?  They're useful for geocoding and
door-to-door routing.  The routing in particular is something people clamor
for when it's lacking.

As for attaching them to buildings, that doesn't work particularly well in
many cases, especially in San Francisco.  For instance, a building might have
a number of addresses in it.  A large building taking up a whole block could
have addresses on multiple streets.  Also, we don't have building outlines
for most of SF, and that shouldn't stop us from having useful routing.



  Also, there are a large number of places where there are multiple nodes
  in one location if there is more than one address at that location.  One
  example would be a house broken into five apartments.  Sometimes they
  keep one address and use apartment numbers, and sometimes each apartment
  gets its own house number.  In the latter cases there will be five nodes
  with different addr:housenumber fields but identical addr:street and
  lat/long coordinates.

  Should I keep the individual nodes or should I combine them?

 Honestly, I think this is a very cart-before-horse. Please consider
 making a test of your dataset somewhere people can check out, and then
 solicit feedback on the process.


As I'm still planning things out, I think it's a good time to discuss this
type of issue.  As to a test, what do you recommend?  Tossing the OSM file
up somewhere for people to see, or testing the upload process on a dev
server?  I'm planning on doing both, but if you have other ideas that might
help, I'm listening.




  I haven't yet looked into how I plan to do the actual uploading, but I'll
  take care to make sure it's easily reversible if anything goes wrong and
  doesn't hammer any servers.

 There are people who've spent years with the project and not gotten
 imports right, I think this is a less trivial problem than you might
 expect.


I hear this every time imports come up.  I get it.  It's hard.  That's why
I'm soliciting feedback, willing to take my time, and really trying to do it
correctly.  I'm not willing to give up just because there have been problems
with imports in the past.


  I've also made a wiki page for the import.
 
  Feedback welcome here or on the wiki page.

 This really belongs on the imports list as well, but my feedback would be:

 1) Where's the shapefile? (if for nothing else, then the license, but
 also for feedback)

 I added it to the wiki page.  Again, I'm sorry it wasn't there to begin
with.  The shapefile is here:
http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip
As for the license, I believe it's okay, but I posted that bit to talk-legal
because I thought it belonged there.


 2) Can you attach the addresses to real objects (rather than standalone
 nodes)?


Generally speaking, no.  We don't 

Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Gregory Arenius
A few comments...

 1) San Francisco explicitly says they do not have building outline data.
 :(  So, I suppose we get to add buildings ourselves.  I do see that SF does
 have parcels.

 For DC, we are attaching addresses to buildings when there is a one-to-one
 relation between them.  When there are multiple address nodes for a single
 building, then we keep them as nodes.  In the vast majority of cases, we do not
 have apartment numbers but in some cases we have things like 1120a, 1120b,
 1120c that can be imported.  Obviously, without a buildings dataset, our
 approach won't quite apply for SF.



We mostly only have building shapes drawn in downtown, where it's unlikely
there will be many one-to-one matches.  I wish we did have a building
shapefile, though; that would be great.  I have thought about using the
parcel data, but I'm not sure that's as useful.


 2) I don't consider the addresses as noise.  The data is very helpful for
 geocoding.  If the renderer does a sloppy job making noise out of addresses,
 the renderings should be improved.


 3) Having looked at the data catalogue page, I do have concerns about the
 terms of use and think it's best to get SF to explicitly agree to allow OSM
 to use the data.

 http://gispub02.sfgov.org/website/sfshare/index2.asp


What terms in particular caused you concern?  I'll need to know if I'm going
to ask for explicit permission.  A while back I posted the previous terms to
talk-legal and they pointed out problems.  The city changed the license when
I pointed out that it caused problems for open projects (apparently that was
in the works anyway).  I thought those problems were removed.  I had a
conference call with one of the datasf.org people who helps make city
datasets available and an assistant city attorney prior to those changes,
and I was told that unless specifically noted otherwise in the dataset, the
data was public domain.  I do understand that that isn't in writing, though.

If there is a problem with the terms, though, there is still a good chance
the city would give us explicit permission to use the data; they seemed
excited about the prospect of some of it ending up in OSM.


 4) If you can get explicit permission, then I suggest breaking up the
 address nodes into smaller chunks (e.g. by census block group), convert them
 to osm format with Ian's shp-to-osm tool, and check them for quality and
 against existing OSM data (e.g. existing pois w/ addresses) in JOSM before
 importing.  QGIS and/or PostGIS can be useful for chopping up the data into
 geographic chunks.  This approach gives opportunity to apply due diligence,
 to check things, and keep chunks small enough that it's reasonably possible
 to deal with any mistakes or glitches.


I had been planning on using shp-to-osm to break it into chunks by number of
nodes, but doing it geographically makes more sense.  Do you think census
block group size is best, or chunking by neighborhood, or aiming for an
approximate number of nodes in each geographic chunk?
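The geographic chunking being discussed can be sketched in Python. This is only a rough illustration of the idea, not part of the actual workflow: the grid size and sample points are made up, and a real run would use shp-to-osm, QGIS, or PostGIS against the full shapefile, as suggested above.

```python
import math
from collections import defaultdict

def chunk_points(points, cell_deg=0.01):
    """Bucket (lat, lon, tags) points into grid cells roughly 1 km across."""
    chunks = defaultdict(list)
    for lat, lon, tags in points:
        # Floor division keeps nearby points (including negative longitudes)
        # in the same cell.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        chunks[key].append((lat, lon, tags))
    return chunks

# Made-up sample points; the first two fall in the same cell.
points = [
    (37.7749, -122.4194, {"addr:housenumber": "1", "addr:street": "Market Street"}),
    (37.7750, -122.4195, {"addr:housenumber": "3", "addr:street": "Market Street"}),
    (37.8044, -122.2712, {"addr:housenumber": "5", "addr:street": "Broadway"}),
]
chunks = chunk_points(points)
print({cell: len(pts) for cell, pts in sorted(chunks.items())})
```

Each cell then becomes one reviewable, independently revertible upload; census block groups or neighborhoods would simply replace the grid cells with real polygons.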

Cheers,
Gregory Arenius
___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Apollinaris Schoell
On Thu, Dec 9, 2010 at 4:38 PM, Gregory Arenius greg...@arenius.com wrote:


  The wiki states that this is how address nodes are done.  They can be
 attached to other objects of course but they can also be independent.  Like
 I stated earlier I did check how they are actually being done elsewhere and
 the ones I've seen entered are done in this manner.

 Also, why do you think of them as noise?  They're useful for geocoding and
 door to door routing.  The routing in particular is something people clamor
 for when it's lacking.


Individual address nodes are common, and there is nothing wrong with adding
them.


 As for attaching them to buildings that doesn't particularly work well in
 many cases especially in San Francisco.  For instance a building might have
 a number of addresses in it.  A large building taking up a whole block could
 have addresses on multiple streets.  Also, we don't have building outlines
 for most of SF and that shouldn't stop us from having useful routing.


Setting the address on a building is good if there are buildings, but in
this case it makes absolute sense to have individual nodes.  In the case of
multiple addresses on one building, the address nodes can be used as nodes
in the building outline to mark the individual entrances of large buildings,
but this is really optional.





  Also, there are a large number of places where there are multiple nodes
  in one location if there is more than one address at that location.  One
  example would be a house broken into five apartments.  Sometimes they
  keep one address and use apartment numbers, and sometimes each apartment
  gets its own house number.  In the latter cases there will be five nodes
  with different addr:housenumber fields but identical addr:street and
  lat/long coordinates.

  Should I keep the individual nodes or should I combine them?


Don't combine them if they have different house numbers.  The reality is
that these are different addresses, so we should map all of them even if the
location is the same.



 I hear this every time imports come up.  I get it.  It's hard.  That's why
 I'm soliciting feedback, willing to take my time, and really trying to do
 it correctly.  I'm not willing to give up just because there have been
 problems with imports in the past.


I would say this is one of the easier imports; there is not too much harm it
can create.  The only problem is merging it with existing data and deciding
which is better.  Since this data is probably authoritative, it might be OK
to replace most of the less accurate data already in OSM.
For this reason I would drop any of the nodes in case of a conflict, but
rename the tags to something else like sf_addrimport_addr:*.
A survey on the road can check them later, compare them with the existing
addr nodes, decide which one to keep, and rename the import tags to the
real tags.
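That fallback-tag idea is simple to sketch. Nothing here is settled; the `sf_addrimport_` prefix is just the hypothetical name floated above, and the code only shows the mechanical rename applied to a conflicting node's tags.

```python
def demote_tags(tags, prefix="sf_addrimport_"):
    """Prefix addr:* keys on a conflicting imported node; leave others alone."""
    return {
        (prefix + k if k.startswith("addr:") else k): v
        for k, v in tags.items()
    }

# Made-up imported node that conflicts with an existing OSM address.
imported = {"addr:housenumber": "220", "addr:street": "Valencia Street"}
print(demote_tags(imported))
```

A later survey would then delete whichever version loses and strip the prefix from the winner.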







Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Katie Filbert
On Thu, Dec 9, 2010 at 8:06 PM, Gregory Arenius greg...@arenius.com wrote:


 A few comments...

 1) San Francisco explicitly says they do not have building outline data.
 :(  So, I suppose we get to add buildings ourselves.  I do see that SF does
 have parcels.

 For DC, we are attaching addresses to buildings when there is a one-to-one
 relation between them.  When there are multiple address nodes for a single
  building, then we keep them as nodes.  In the vast majority of cases, we do not
 have apartment numbers but in some cases we have things like 1120a, 1120b,
 1120c that can be imported.  Obviously, without a buildings dataset, our
 approach won't quite apply for SF.



 We mostly only have building shapes drawn in downtown, where it's unlikely
 there will be many one-to-one matches.  I wish we did have a building
 shapefile, though; that would be great.  I have thought about using the
 parcel data, but I'm not sure that's as useful.


Agree, not sure how useful parcels are for us.





 2) I don't consider the addresses as noise.  The data is very helpful for
 geocoding.  If the renderer does a sloppy job making noise out of addresses,
 the renderings should be improved.


 3) Having looked at the data catalogue page, I do have concerns about the
 terms of use and think it's best to get SF to explicitly agree to allow OSM
 to use the data.

 http://gispub02.sfgov.org/website/sfshare/index2.asp


 What terms in particular caused you concern?  I'll need to know if I'm
 going to ask for explicit permission.  A while back I posted the previous
 terms to talk-legal and they pointed out problems.  The city changed the
 license when I pointed out that it caused problems for open projects
 (apparently that was in the works anyway).  I thought those problems were
 removed.  I had a conference call with one of the datasf.org people who
 helps make city datasets available and an assistant city attorney prior to
 those changes, and I was told that unless specifically noted otherwise in
 the dataset, the data was public domain.  I do understand that that isn't
 in writing, though.

 If there is a problem with the terms though there is still a good chance
 the city would give us explicit permission to use the data;  they seemed
 excited about the prospect of some of it ending up in OSM.


I don't know enough to assess, but I'm concerned about the click-to-agree.
I'm also concerned about the possibility of switching to ODbL and the
contributor terms, and I want to make sure the data would be compatible with
those.  I think it helps to have explicit permission for use in OSM (e.g. an
e-mail agreeing that we can use the data dual-licensed under CC-BY-SA and
ODbL) on file.



 4) If you can get explicit permission, then I suggest breaking up the
 address nodes into smaller chunks (e.g. by census block group), convert them
 to osm format with Ian's shp-to-osm tool, and check them for quality and
 against existing OSM data (e.g. existing pois w/ addresses) in JOSM before
 importing.  QGIS and/or PostGIS can be useful for chopping up the data into
 geographic chunks.  This approach gives opportunity to apply due diligence,
 to check things, and keep chunks small enough that it's reasonably possible
 to deal with any mistakes or glitches.


 I had been planning on using shp-to-osm to break it into chunks by number
 of nodes, but doing it geographically makes more sense.  Do you think
 census block group size is best, or chunking by neighborhood, or aiming for
 an approximate number of nodes in each geographic chunk?


With buildings, our data was a bit denser.  I did some chunks by census
tract and found some were too big for the OSM API and JOSM, whereas census
block group has worked well.  With just nodes, I think you could do somewhat
larger chunks.
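Katie's size constraint can also be enforced mechanically when chunking by node count. This is a hedged sketch; the per-chunk cap is a made-up example figure, not a real API limit.

```python
def split_by_size(nodes, max_per_chunk=5000):
    """Split a node list into consecutive chunks of at most max_per_chunk."""
    return [nodes[i:i + max_per_chunk] for i in range(0, len(nodes), max_per_chunk)]

# 12,000 placeholder nodes split into three manageable uploads.
chunks = split_by_size(list(range(12000)), max_per_chunk=5000)
print([len(c) for c in chunks])  # [5000, 5000, 2000]
```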

Cheers,
Katie




 Cheers,
 Gregory Arenius




-- 
Katie Filbert
filbe...@gmail.com
@filbertkm


Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Serge Wroclawski
On Thu, Dec 9, 2010 at 8:14 PM, Katie Filbert filbe...@gmail.com wrote:
 On Thu, Dec 9, 2010 at 8:06 PM, Gregory Arenius greg...@arenius.com wrote:

 A few comments...

 1) San Francisco explicitly says they do not have building outline data.
 :(  So, I suppose we get to add buildings ourselves.  I do see that SF does
 have parcels.

If buildings aren't available, that's too bad, but such is life.

I don't think parcels are generally useful.

 2) I don't consider the addresses as noise.  The data is very helpful for
 geocoding.  If the renderer does a sloppy job making noise out of addresses,
 the renderings should be improved.

Katie's position is certainly valid, especially as it relates to geocoding.

They render ugly, but I'd rather have an ugly render and some data than no
data.

 3) Having looked at the data catalogue page, I do have concerns about the
 terms of use and think it's best to get SF to explicitly agree to allow OSM
 to use the data.

 http://gispub02.sfgov.org/website/sfshare/index2.asp

I'd have legal look at this.  I'm a little confused by some of the wording
about derivative works, transferred rights, and indemnification.

If SF is open-minded, that's awesome.  In an ideal world they'd use an
existing license with well-defined boundaries, like CC0, but barring that,
I'd say don't mention the license at all, but simply have them donate the
data to OSM itself.  Legal can help with this.


As for a demo of the data, yeah, an OSM file would be perfect. Also,
though, I'd keep the previous dataset ID, in case you need to do a
comparison later.

- Serge



Re: [Talk-us] Address Node Import for San Francisco

2010-12-09 Thread Ian Dees
On Thu, Dec 9, 2010 at 7:31 PM, Serge Wroclawski emac...@gmail.com wrote:

  3) Having looked at the data catalogue page, I do have concerns about
  the terms of use and think it's best to get SF to explicitly agree to
  allow OSM to use the data.

  http://gispub02.sfgov.org/website/sfshare/index2.asp

 I'd have legal look at this.  I'm a little confused by some of the
 wording about derivative works, transferred rights, and
 indemnification.


It's a fairly standard (if a little wordy) indemnification agreement.  The
derivative works language makes sure that no one can hold the San Francisco
government liable for using the data directly or indirectly, or when it is
used as part of a derivative work.

I don't see anything preventing the data from being used or traced by OSM. I
don't even see an attribution requirement.