Re: [Talk-us] Address Node Import for San Francisco
As I've been out of touch, here is a sort of omnibus reply to the last couple of days' worth of discussion on SF addresses. Thanks for all the ideas and help.

> I would say this is one of the easy imports; there is not too much harm it can create. The only problem is merging it with existing data and deciding which is better. Since this data is probably authoritative, it might be OK to replace most of the less accurate data already in OSM. For this reason I wouldn't drop any of the nodes in case of a conflict, but rename the tags to something else, like sf_addrimport_addr:*. A survey on the road can check them later, compare them with the existing addr nodes, decide which one to keep, and rename the import tags to the real tags.

I don't think that the data currently in OSM is less accurate than that of the import. The address data currently in OSM in SF is either on a node for something else, like a restaurant or shop, or it's one of the very few standalone address nodes that have been entered. I think that having the data attached to a business is much more valuable than having it alone, and I don't intend to overwrite any of that. There are only a couple of small areas, comprising maybe half a dozen blocks, that have standalone address nodes. The data that's in there looks like it has been carefully entered, and I don't doubt its accuracy, especially because I've met one of the mappers who did some of it, and she knows what she's doing. As such, I don't really think it's worth having a fallback sf_addrimport_addr:* tag for conflicting nodes; I would rather just drop the conflicting ones from the dataset I'm importing. I definitely will reconsider, though, if anything I come across in the data makes me think it would be worthwhile.

> Do spot-check different neighborhoods. In reviewing the San Bernardino County assessor's shapefile, I found that housenumbers, ZIP codes, and even street names were missing or wrong in some areas I spot-checked. The county's response was that this data was of secondary importance to the assessor - understandably, as long as they have all the parcels and the billing address for them, the actual postal address of the parcel is not critical info.

I will spot-check different neighborhoods to make sure they're of equal quality to the blocks I've checked, which have mostly been ones local to where I live and work. I've found no reason to think that any of the data is billing addresses for the parcels instead of the mailing address of the parcel, but I'll keep an eye out for it. As to ZIP codes, I don't plan on putting any in, because I haven't found a source for them that I feel would work.

> As for a demo of the data, yeah, an OSM file would be perfect. Also, though, I'd keep the previous dataset ID, in case you need to do a comparison later.

I will definitely post an OSM file once I have something a bit closer to being import-ready. As to the previous dataset ID (in this case the ObjectID), I'm not particularly opposed to keeping it; I'm just not sure what we'd gain in this instance, and I know there are people who object to having lots of third-party IDs in our database. In this particular instance I think comparisons between OSM and any future SF address files can be done equally well using the addr:housenumber/addr:street combo, which should, ideally, be unique. As we'll have a fair number of nodes that aren't imported, because the address is already taken by a business or otherwise already in OSM, we'd have to resort to that sort of matching system anyhow.

> I don't agree that the other info can be easily, or accurately, derived. Addresses near the borders of those polygons are often subject to seemingly arbitrary decisions. The physical location of the centroid of a parcel may not be within the same ZIP, city, and/or county polygons as its address. I would include the city and ZIP code.

Makes sense. I will include the city, but as stated above I don't have the ZIPs.

> Just wanna say that addressing in SF would be awesome :-)

The goal is to make it so that SF is fully routable. As we have good (not perfect, but really good) street geometry, junctions, classifications (e.g. primary, residential, etc.), oneways, and names, the main things we're lacking are addresses and turn restrictions. Hopefully we'll get addresses from this import. As there doesn't seem to be any source for the turn restriction data, I've put up a page on the wiki (http://wiki.openstreetmap.org/wiki/San_Francisco_Turn_Restrictions) to help coordinate efforts to map them, and put a little dent in it myself to start. Hopefully some more people join in. I think getting these two things done will put OSM at a pretty competitive level with any of the commercial data providers with respect to SF.

> Hopefully this is helpful, as you'll want to import street names that actually match those in OSM's view of San Francisco.

It is helpful, thank you, especially being able to see where many of the
Re: [Talk-us] Address Node Import for San Francisco
Just wanna say that addressing in SF would be awesome :-)

Steve
stevecoast.com

___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us
Re: [Talk-us] Address Node Import for San Francisco
On Dec 9, 2010, at 3:00 PM, Gregory Arenius wrote:

> About the data: it's in shapefile format, containing about 230,000 individual nodes. The data is really high quality and all of the addresses I have checked are correct. It has pretty complete coverage of the entire city.

I've worked with this file before. When I matched it to OSM data two years ago, I found that the SF data had numerous errors, so I wrote this mapping script: http://mike.teczno.com/img/sf-addresses/mapping.py

Usage: mapping.py [osm streets csv] [sf streets csv] [street names csv]

Here are all the street names in the shapefile: http://mike.teczno.com/img/sf-addresses/sfaddresses.csv

Here are all the street names in OSM at the time I did the comparison (they may have changed since): http://mike.teczno.com/img/sf-addresses/osm_streets.csv

And this is the mapping result I got: http://mike.teczno.com/img/sf-addresses/street_names.csv

Hopefully this is helpful, as you'll want to import street names that actually match those in OSM's view of San Francisco. I found some other weird burrs in the data as well, in terms of how it arranges addresses stacked on top of one another in tall buildings - nothing that can't be dealt with in an import.

I also did a bunch of geometry work to match those address points to nearby street segments in order to break up the street grid into address segments, but that code is a bit of a rat's nest. The idea was to build up the little block numbers you see rendered here: http://www.flickr.com/photos/mmigurski/5229627985/sizes/l/

Katie's suggestion of breaking the data into smaller chunks is a good one.

-mike.

michal migurski - m...@stamen.com
415.558.1610
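[Editor's example] The mapping.py script above isn't reproduced here, but the core of such a street-name comparison can be sketched as follows. This is a hypothetical reconstruction, not Michal's actual code; the suffix table and normalization rules (leading-zero stripping, abbreviation expansion) are assumptions about the kind of differences found between the SF shapefile and OSM names:

```python
import re

# Assumed abbreviation table -- the real script's rules are unknown.
SUFFIXES = {"ST": "STREET", "AVE": "AVENUE", "BLVD": "BOULEVARD",
            "DR": "DRIVE", "CT": "COURT", "TER": "TERRACE", "LN": "LANE"}

def normalize(name):
    """Canonical form for matching: uppercase, strip leading zeros from
    numbered streets (03RD -> 3RD), expand common suffix abbreviations."""
    parts = name.strip().upper().split()
    parts = [re.sub(r"^0+(?=\d)", "", p) for p in parts]
    parts = [SUFFIXES.get(p, p) for p in parts]
    return " ".join(parts)

def match_streets(osm_names, sf_names):
    """Map each SF shapefile street name to the OSM name it matches."""
    osm_index = {normalize(n): n for n in osm_names}
    return {n: osm_index[normalize(n)] for n in sf_names
            if normalize(n) in osm_index}
```

SF names left unmatched by this pass would then need manual review, which is roughly the role the street_names.csv output plays above.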
Re: [Talk-us] Address Node Import for San Francisco
At 2010-12-09 17:14, Katie Filbert wrote:

> ... With buildings, our data was a bit denser. I did some by census tract and found some were too big for the OSM API and JOSM, whereas census block group has worked well. With just nodes, I think you could do somewhat larger chunks.

Were the shapes (needlessly) over-digitized? I saw this with some of the CASIL polygons, and with much of Kern County, where huge reductions (thousands of nodes per square mile) were possible with little effect on rendering, using JOSM's simplify. Hopefully there is something similar available in the import tool collection.

--
Alan Mintz
alan_mintz+...@earthlink.net
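[Editor's example] JOSM's simplify-way action drops vertices whose removal moves the line less than a configurable error threshold. A minimal sketch of that style of reduction, using the Ramer-Douglas-Peucker algorithm (JOSM's actual error criterion differs in detail, so treat this as illustrative, with planar distances standing in for geographic ones):

```python
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the segment a-b (planar)."""
    if a == b:
        return math.dist(p, a)
    (ax, ay), (bx, by), (px, py) = a, b, p
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.dist(a, b)

def simplify(points, epsilon):
    """Ramer-Douglas-Peucker: drop vertices closer than epsilon to the chord."""
    if len(points) < 3:
        return points
    # Find the vertex farthest from the chord between the endpoints.
    dmax, idx = max((perp_dist(p, points[0], points[-1]), i)
                    for i, p in enumerate(points[1:-1], 1))
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Keep the far vertex and recurse on both halves.
    left = simplify(points[:idx + 1], epsilon)
    return left[:-1] + simplify(points[idx:], epsilon)
```

A larger epsilon removes more vertices, which is the trade-off being tuned when the simplify parameter is tweaked.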
Re: [Talk-us] Address Node Import for San Francisco
At 2010-12-09 15:00, Gregory Arenius wrote:

> About the data: it's in shapefile format, containing about 230,000 individual nodes. The data is really high quality and all of the addresses I have checked are correct. It has pretty complete coverage of the entire city.

Do spot-check different neighborhoods. In reviewing the San Bernardino County assessor's shapefile, I found that housenumbers, ZIP codes, and even street names were missing or wrong in some areas I spot-checked. The county's response was that this data was of secondary importance to the assessor - understandably, as long as they have all the parcels and the billing address for them, the actual postal address of the parcel is not critical info.

> First, I've looked at how address nodes have been input manually. In some places they are just addr:housenumber and addr:street and nothing else. In other places they include the city and the country, and sometimes another administrative level such as state. Since the last three pieces of information can be fairly easily derived, I was thinking of just doing the house number and the street. The dataset is fairly large, so I don't want to include any extra fields if I don't have to. Is this level of information sufficient? Or should I include the city, the state, and the country in each node?

I don't agree that the other info can be easily, or accurately, derived. Addresses near the borders of those polygons are often subject to seemingly arbitrary decisions. The physical location of the centroid of a parcel may not be within the same ZIP, city, and/or county polygons as its address. I would include the city and ZIP code.

Note, BTW, that there are lots of ZIP code issues that come up, and I'm not always sure how to deal with them. I'll look up an address I know to exist using http://zip4.usps.com/zip4/welcome.jsp, but it won't find it - often because the USPS uses a different city name. It seems to happen a lot in rural areas, but not exclusively, and not always for the reason you might think (that it's the city of the post office that serves the address). Hopefully, that won't be a problem for your single-city import, though.

--
Alan Mintz
alan_mintz+...@earthlink.net
Re: [Talk-us] Address Node Import for San Francisco
On Dec 10, 2010, at 5:10 AM, Alan Mintz alan_mintz+...@earthlink.net wrote:

> At 2010-12-09 17:14, Katie Filbert wrote:
>> ... With buildings, our data was a bit denser. I did some by census tract and found some were too big for the OSM API and JOSM, whereas census block group has worked well. With just nodes, I think you could do somewhat larger chunks.
>
> Were the shapes (needlessly) over-digitized? I saw this with some of the CASIL polygons, and with much of Kern County, where huge reductions (thousands of nodes per square mile) were possible with little effect on rendering, using JOSM's simplify. Hopefully there is something similar available in the import tool collection.

Yes, simplifying the buildings in JOSM is an important step in the process. I tweaked the simplify parameter to get it as good as possible, though I'm not 100% happy with the results, so we do some more manual tweaking in JOSM.

Katie
[Talk-us] Address Node Import for San Francisco
I've been working on an import of San Francisco address node data. I have several thoughts and questions and would appreciate any feedback.

About the data: it's in shapefile format, containing about 230,000 individual nodes. The data is really high quality and all of the addresses I have checked are correct. It has pretty complete coverage of the entire city.

First, I've looked at how address nodes have been input manually. In some places they are just addr:housenumber and addr:street and nothing else. In other places they include the city and the country, and sometimes another administrative level such as state. Since the last three pieces of information can be fairly easily derived, I was thinking of just doing the house number and the street. The dataset is fairly large, so I don't want to include any extra fields if I don't have to. Is this level of information sufficient? Or should I include the city, the state, and the country in each node?

Also, there are a large number of places where there are multiple nodes in one location, if there is more than one address at that location. One example would be a house broken into five apartments. Sometimes they keep one address and use apartment numbers, and sometimes each apartment gets its own house number. In the latter case there will be five nodes with different addr:housenumber fields but identical addr:street and lat/long coordinates. Should I keep the individual nodes or should I combine them? For instance, I could do one node with addr:housenumber=5;6;7;8;9, or have a node for each address. Combining nodes would cut the number of nodes imported by about 40%, but I fear that it might be harder to work with manually and also not be recognized by routers and other software.

Before importing the data I will run a comparison against existing OSM data and not upload nodes that match an existing addr:housenumber/addr:street combination. There aren't many plain address nodes in the city at the moment (a couple hundred, tops), but there are a fair number of businesses that have had address data added to them, and I don't want any duplicate address nodes as a result of this import. There are only a very few address ways in the SF dataset, but they aren't anywhere near as accurate as the data I will be importing, so I plan on deleting those.

I haven't yet looked into how I plan to do the actual uploading, but I'll take care to make sure it's easily reversible if anything goes wrong and doesn't hammer any servers.

I've also made a wiki page for the import: http://wiki.openstreetmap.org/wiki/San_Francisco_Address_Import

Feedback welcome here or on the wiki page.

Cheers,
Gregory Arenius
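[Editor's example] The comparison step described above (skip import nodes whose addr:housenumber/addr:street combination already exists in OSM, whether on a standalone node or on a business) could be sketched like this. The addr:* keys are the real OSM tags; the function names and the case-insensitive normalization are illustrative assumptions:

```python
def addr_key(tags):
    """Case-insensitive matching key built from the addr:* tags."""
    return (tags.get("addr:housenumber", "").strip().lower(),
            tags.get("addr:street", "").strip().lower())

def filter_new(import_nodes, existing_nodes):
    """Keep only import nodes whose address is not already in OSM.

    existing_nodes would include both standalone address nodes and
    businesses carrying addr:* tags, per the plan above.
    """
    existing = {addr_key(t) for t in existing_nodes}
    return [t for t in import_nodes if addr_key(t) not in existing]
```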
Re: [Talk-us] Address Node Import for San Francisco
> First, I've looked at how address nodes have been input manually. In some places they are just addr:housenumber and addr:street and nothing else. In other places they include the city and the country, and sometimes another administrative level such as state. Since the last three pieces of information can be fairly easily derived, I was thinking of just doing the house number and the street. The dataset is fairly large, so I don't want to include any extra fields if I don't have to. Is this level of information sufficient? Or should I include the city, the state, and the country in each node?

I would recommend just addr:housenumber and addr:street. The reason is that the city, state, etc. can be derived from bounding polygons. In addition, those polygons frequently change. By not including city, state, etc., there is one less step to go through when the boundaries change.

> Also, there are a large number of places where there are multiple nodes in one location, if there is more than one address at that location. ... Should I keep the individual nodes or should I combine them? For instance, I could do one node with addr:housenumber=5;6;7;8;9, or have a node for each address. Combining nodes would cut the number of nodes imported by about 40%, but I fear that it might be harder to work with manually and also not be recognized by routers and other software.

I would recommend a node per address; this matches the existing wiki convention, and should work with routers and Nominatim. Editors don't make it easy to access an individual node out of a stack, but it is not too difficult for the odd case where it might be necessary.
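[Editor's example] The one-node-per-address recommendation is easy to apply mechanically if any source rows arrive with semicolon-joined housenumbers like addr:housenumber=5;6;7;8;9. A hypothetical helper (the function name and dict-of-tags representation are assumptions, not part of any tool mentioned in the thread):

```python
def expand_housenumbers(node):
    """Split a semicolon-joined addr:housenumber into one node per address.

    Each output node copies the input's tags (and would share its
    coordinates), differing only in addr:housenumber.
    """
    numbers = [n.strip() for n in node["addr:housenumber"].split(";")]
    return [{**node, "addr:housenumber": n} for n in numbers]
```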
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 4:09 PM, Mike N. nice...@att.net wrote:

> I would recommend just addr:housenumber and addr:street. The reason is that the city, state, etc. can be derived from bounding polygons. In addition, those polygons frequently change. By not including city, state, etc., there is one less step to go through when the boundaries change.

That works for states, but not cities, as the cities used in postal addresses don't match municipal boundaries in many cases. It would be good to include postal codes (ZIP codes in the U.S.), as that would eliminate the need for a city, provided the application doing the routing has a suitable lookup table. But there are problems with this as well. For example, the USPS is always making changes to the ZIP codes, and usually the only authoritative source is licensed data from the USPS (i.e. there is usually no way to observe ZIP codes from a field survey). Note that the ZIP code boundaries from the US Census Bureau are not real ZIP code boundaries; they are only for statistical purposes and have been edited to fit that purpose. Also, there are cases where a single building has its own ZIP code, and these do not show up in the census ZIP code polygons.
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 6:44 PM, Mike Thompson miketh...@gmail.com wrote:

> Also, there are cases where a single building has its own zip code, and these do not show up in the census zip code polygons.

Or an entire (company-owned) city: Lake Buena Vista, Florida has been 32830 since 1971, but the TIGER polygons don't recognize this.
Re: [Talk-us] Address Node Import for San Francisco
> MHO is that individual node addresses are pretty awful. If you can import the building outlines, and then attach the addresses to them, great (and you'll need to consider what's to be done with any existing data), but otherwise, IMHO, this dataset just appears as noise.

Why does the dataset appear as noise when not attached to another object? Have I been mapping address nodes wrong?
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 6:20 PM, Serge Wroclawski emac...@gmail.com wrote:

> On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius greg...@arenius.com wrote:
>> I've been working on an import of San Francisco address node data. I have several thoughts and questions and would appreciate any feedback.
>
> The Wiki page doesn't mention the original dataset URL. I have a few concerns:
>
> 1) Without seeing the dataset URL, it's hard to know anything about the dataset (its age, accuracy, etc.). This is a real problem with imports - knowing the original quality of the dataset before it's imported. The project has had to remove or correct so many bad datasets, it's incredibly annoying.
>
>> About the data: it's in shapefile format, containing about 230,000 individual nodes. The data is really high quality and all of the addresses I have checked are correct. It has pretty complete coverage of the entire city.
>
> MHO is that individual node addresses are pretty awful. If you can import the building outlines, and then attach the addresses to them, great (and you'll need to consider what's to be done with any existing data), but otherwise, IMHO, this dataset just appears as noise.
>
>> Also, there are a large number of places where there are multiple nodes in one location, if there is more than one address at that location. ... Should I keep the individual nodes or should I combine them?
>
> Honestly, I think this is very cart-before-horse. Please consider making a test of your dataset somewhere people can check out, and then solicit feedback on the process.
>
>> I haven't yet looked into how I plan to do the actual uploading, but I'll take care to make sure it's easily reversible if anything goes wrong and doesn't hammer any servers.
>
> There are people who've spent years with the project and not gotten imports right; I think this is a less trivial problem than you might expect.
>
>> I've also made a wiki page for the import. Feedback welcome here or on the wiki page.
>
> This really belongs on the imports list as well, but my feedback would be:
> 1) Where's the shapefile? (if for nothing else, then the license, but also for feedback)
> 2) Can you attach the addresses to real objects (rather than standalone nodes)?
> 3) What metadata will you keep from the other dataset?
> 4) How will you handle internally conflicting data?
> 5) How will you handle conflicts with existing OSM data?
>
> - Serge

A few comments...

1) San Francisco explicitly says they do not have building outline data. :( So, I suppose we get to add buildings ourselves. I do see that SF does have parcels. For DC, we are attaching addresses to buildings when there is a one-to-one relation between them. When there are multiple address nodes for a single building, then we keep them as nodes. In the vast majority of cases we do not have apartment numbers, but in some cases we have things like 1120a, 1120b, 1120c that can be imported. Obviously, without a buildings dataset, our approach won't quite apply for SF.

2) I don't consider the addresses as noise. The data is very helpful for geocoding. If the renderer does a sloppy job making noise out of addresses, the renderings should be improved.

3) Having looked at the data catalogue page, I do have concerns about the terms of use, and think it's best to get SF to explicitly agree to allow OSM to use the data. http://gispub02.sfgov.org/website/sfshare/index2.asp

4) If you can get explicit permission, then I suggest breaking up the address nodes into smaller chunks (e.g. by census block group), converting them to OSM format with Ian's shp-to-osm tool, and checking them for quality and against existing OSM data (e.g. existing POIs with addresses) in JOSM before importing. QGIS and/or PostGIS can be useful for chopping up the data into geographic chunks. This approach gives the opportunity to apply due diligence, to check things, and to keep chunks small enough that it's reasonably possible to deal with any mistakes or glitches.

-Katie
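[Editor's example] Chunking by census block group needs the Census boundary polygons; a simpler stand-in that still keeps each review/upload unit small is bucketing the nodes into a fixed lat/lon grid. This sketch is an assumption about how one might do it, not a tool from the thread:

```python
from collections import defaultdict

def chunk_by_grid(nodes, cell_deg=0.01):
    """Bucket (lat, lon, tags) nodes into grid cells roughly 1 km square.

    Each cell's bucket can then be written out as its own .osm file and
    reviewed in JOSM before upload, keeping changesets small.
    """
    chunks = defaultdict(list)
    for lat, lon, tags in nodes:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        chunks[cell].append((lat, lon, tags))
    return dict(chunks)
```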
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 3:20 PM, Serge Wroclawski emac...@gmail.com wrote: On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius greg...@arenius.com wrote: I've been working on an import of San Francisco address node data. I have several thoughts and questions and would appreciate any feedback. The Wiki page doesn't mention the original dataset url. I have a few concerns:http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip The shapefile is here.http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip I added it to the wiki. I'm sorry, it should have been there to start with. 1) Without seeing the dataset url, it's hard to know anything about the dataset (its age, accuracy, etc.) This is a real problem with imports- knowing the original quality of the dataset before it's imported. The project has had to remove or correct so many bad datasets, it's incredibly annoying. I've spot checked a number of blocks by going out and comparing the data and been impressed with its accuracy. The data is sourced from the Department of Building Inspection's Address Verification System, the Assessor-Recorder Office's Parcel database and the Department of Elections (Voter Registration Project). I believe it to be high quality and have been told by another that has used it that the dataset is legit. About the data. Its in a shapefile format containing about 230,000 individual nodes. The data is really high quality and all of the addresses I have checked are correct. It has pretty complete coverage of the entire city. MHO is that individual node addresses are pretty awful. If you can import the building outlines, and then attach the addresses to them, great (and you'll need to consider what's to be done with any existing data), but otherwise, IMHO, this dataset just appears as noise. The wiki states that this is how address nodes are done. They can be attached to other objects of course but they can also be independent. 
Like I stated earlier I did check how they are actually being done elsewhere and the ones I've seen entered are done in this manner. Also, why do you think of them as noise? They're useful for geocoding and door to door routing. The routing in particular is something people clamor for when its lacking. As for attaching them to buildings that doesn't particularly work well in many cases especially in San Francisco. For instance a building might have a number of addresses in it. A large building taking up a whole block could have addresses on multiple streets. Also, we don't have building outlines for most of SF and that shouldn't stop us from having useful routing. Also, there are a large number of places where there are multiple nodes in one location if there is more than one address at that location. One example would be a house broken into five apartments. Sometimes they keep one address and use apartment numbers and sometimes each apartment gets its own house number. In the latter cases there will be five nodes with different addr:housenumber fields but identical addr:street and lat/long coordinates. Should I keep the individual nodes or should I combine them? Honestly, I think this is a very cart-before-horse. Please consider making a test of your dataset somewhere people can check out, and then solicit feedback on the process. As I'm still planning things out I think its a good time to discuss this type of issue. As to a test, what do you recommend? Tossing the OSM file up somewhere for people to see or did you mean more testing the upload process on a dev server type of thing. I'm planning on doing both but if you have other ideas that might help I'm listening. I haven't yet looked into how I plan to do the actual uploading but I'll take care to make sure its easily reversible if anything goes wrong and doesn't hammer any servers. 
There are people who've spent years with the project and not gotten imports right. I think this is a less trivial problem than you might expect.

I hear this every time imports come up. I got it. It's hard. That's why I'm soliciting feedback, willing to take my time, and really trying to do it correctly. I'm not willing to just give up because there have been problems with imports in the past. I've also made a wiki page for the import. Feedback welcome here or on the wiki page.

This really belongs on the imports list as well, but my feedback would be: 1) Where's the shapefile? (if for nothing else, then for the license, but also for feedback)

I added it to the wiki page. Again, I'm sorry it wasn't there to begin with. The shapefile is here: http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip As for the license, I believe it's okay, but I posted that bit to talk legal because I thought it belonged there.

2) Can you attach the addresses to real objects (rather than standalone nodes)?

Generally speaking, no. We don't
Re: [Talk-us] Address Node Import for San Francisco
A few comments...

1) San Francisco explicitly says they do not have building outline data. :( So, I suppose we get to add buildings ourselves. I do see that SF does have parcels. For DC, we are attaching addresses to buildings when there is a one-to-one relation between them. When there are multiple address nodes for a single building, then we keep them as nodes. In the vast majority of cases we do not have apartment numbers, but in some cases we have things like 1120a, 1120b, 1120c that can be imported. Obviously, without a buildings dataset, our approach won't quite apply for SF.

We mostly only have building shapes drawn in downtown, where it's unlikely there will be many one-to-one matches. I wish we did have a building shapefile though; that would be great. I have thought about using the parcel data but I'm not sure that's as useful.

2) I don't consider the addresses as noise. The data is very helpful for geocoding. If the renderer does a sloppy job making noise out of addresses, the renderings should be improved.

3) Having looked at the data catalogue page, I do have concerns about the terms of use and think it's best to get SF to explicitly agree to allow OSM to use the data. http://gispub02.sfgov.org/website/sfshare/index2.asp

What terms in particular caused you concern? I'll need to know if I'm going to ask for explicit permission. A while back I posted the previous terms to talk legal and they pointed out problems. The city changed the license when I pointed out that it caused problems for open projects (apparently that was in the works anyway). I thought those problems were removed. I had a conference call with one of the datasf.org people that helps make city datasets available and an assistant city attorney prior to those changes, and I was told that unless specifically noted otherwise in the dataset, the data was public domain. I do understand that that isn't in writing though.
If there is a problem with the terms, though, there is still a good chance the city would give us explicit permission to use the data; they seemed excited about the prospect of some of it ending up in OSM.

4) If you can get explicit permission, then I suggest breaking up the address nodes into smaller chunks (e.g. by census block group), converting them to OSM format with Ian's shp-to-osm tool, and checking them for quality and against existing OSM data (e.g. existing POIs with addresses) in JOSM before importing. QGIS and/or PostGIS can be useful for chopping up the data into geographic chunks. This approach gives the opportunity to apply due diligence, to check things, and to keep chunks small enough that it's reasonably possible to deal with any mistakes or glitches.

I had been planning on using shp-to-osm to break it into chunks by number of nodes, but doing it geographically makes more sense. Do you think census block size is best, or maybe by neighborhood, or should I aim for an approximate number of nodes in each geographic chunk?

Cheers, Gregory Arenius

___ Talk-us mailing list Talk-us@openstreetmap.org http://lists.openstreetmap.org/listinfo/talk-us
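The geographic-chunking idea discussed above would normally be done with PostGIS, QGIS, or shp-to-osm as suggested; the grid approach can nonetheless be sketched in plain Python as a quick illustration. This is a hypothetical sketch, not the actual workflow: the cell size and the node representation are assumptions.

```python
# Hypothetical sketch: bucket address nodes into fixed-size grid
# cells (roughly 1 km at 0.01 degrees) so each chunk can be checked
# and uploaded separately. Real chunking by census block group would
# use actual boundary polygons instead of a naive grid.
import math
from collections import defaultdict

def chunk_by_grid(nodes, cell_deg=0.01):
    """Group nodes into grid cells keyed by (lat_index, lon_index)."""
    chunks = defaultdict(list)
    for n in nodes:
        key = (math.floor(n["lat"] / cell_deg), math.floor(n["lon"] / cell_deg))
        chunks[key].append(n)
    return dict(chunks)

nodes = [
    {"lat": 37.7749, "lon": -122.4194},
    {"lat": 37.7750, "lon": -122.4195},  # lands in the same cell
    {"lat": 37.8044, "lon": -122.2712},  # well outside that cell
]
print(len(chunk_by_grid(nodes)))  # 2
```

Using `math.floor` rather than `int()` keeps cells consistent for negative longitudes; with truncation toward zero, two cells on either side of a meridian line would otherwise collapse into one.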
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 4:38 PM, Gregory Arenius greg...@arenius.com wrote: The wiki states that this is how address nodes are done. They can be attached to other objects of course, but they can also be independent. Like I stated earlier, I did check how they are actually being done elsewhere, and the ones I've seen entered are done in this manner. Also, why do you think of them as noise? They're useful for geocoding and door-to-door routing. The routing in particular is something people clamor for when it's lacking.

Individual address nodes are common and there is nothing wrong with adding them.

As for attaching them to buildings, that doesn't work particularly well in many cases, especially in San Francisco. For instance, a building might have a number of addresses in it. A large building taking up a whole block could have addresses on multiple streets. Also, we don't have building outlines for most of SF, and that shouldn't stop us from having useful routing.

Setting the address on a building is good if there are buildings, but in this case it makes absolute sense to have individual nodes. In the case of multiple addresses on one building, the address nodes can be used as nodes in the building outline to mark the individual entrances on large buildings, but this is really optional.

Also, there are a large number of places where there are multiple nodes in one location if there is more than one address at that location. One example would be a house broken into five apartments. Sometimes they keep one address and use apartment numbers, and sometimes each apartment gets its own house number. In the latter case there will be five nodes with different addr:housenumber fields but identical addr:street and lat/long coordinates. Should I keep the individual nodes or should I combine them?

Don't combine them if they have different house numbers. The reality is that there are different addresses, so we should map all of them even if the location is the same.

I hear this every time imports come up. I got it.
It's hard. That's why I'm soliciting feedback, willing to take my time, and really trying to do it correctly. I'm not willing to just give up because there have been problems with imports in the past.

I would say this is one of the easy imports; there is not too much harm it can create. The only problem is merging it with existing data and making a decision about which one is better. Since this data is probably authoritative, it might be OK to replace most of the less accurate data already in OSM. For this reason I would not drop any of the nodes in case of a conflict, but rename the tags to something else like sf_addrimport_addr:*. A survey on the road can check them later, compare them with the existing addr nodes, decide which one to keep, and rename the import tags to the real tags.
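The fallback-tag suggestion above can be sketched as a simple key-renaming step applied to conflicting import nodes. This is hypothetical Python: the sf_addrimport_ prefix comes from the thread, but the tag-dictionary shape and the function itself are assumptions for illustration.

```python
# Hypothetical sketch: on a conflict with existing OSM data, keep the
# imported node but move its addr:* tags under the sf_addrimport_
# prefix suggested in the thread, so a later survey can reconcile the
# two and promote whichever set of tags is correct.

def prefix_conflicting_tags(tags, prefix="sf_addrimport_"):
    """Rename addr:* keys on a conflicting import node; leave others alone."""
    return {
        (prefix + k if k.startswith("addr:") else k): v
        for k, v in tags.items()
    }

node_tags = {
    "addr:housenumber": "100",
    "addr:street": "Valencia St",
    "source": "SF address import",  # non-addr tags pass through untouched
}
print(prefix_conflicting_tags(node_tags))
```

The alternative position taken later in the thread, simply dropping conflicting import nodes, would replace this step with a filter; the prefix approach trades a little tag clutter for the ability to audit conflicts in the field.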
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 8:06 PM, Gregory Arenius greg...@arenius.com wrote: A few comments...

1) San Francisco explicitly says they do not have building outline data. :( So, I suppose we get to add buildings ourselves. I do see that SF does have parcels. For DC, we are attaching addresses to buildings when there is a one-to-one relation between them. When there are multiple address nodes for a single building, then we keep them as nodes. In the vast majority of cases we do not have apartment numbers, but in some cases we have things like 1120a, 1120b, 1120c that can be imported. Obviously, without a buildings dataset, our approach won't quite apply for SF.

We mostly only have building shapes drawn in downtown, where it's unlikely there will be many one-to-one matches. I wish we did have a building shapefile though; that would be great. I have thought about using the parcel data but I'm not sure that's as useful.

Agree, not sure how useful parcels are for us.

2) I don't consider the addresses as noise. The data is very helpful for geocoding. If the renderer does a sloppy job making noise out of addresses, the renderings should be improved.

3) Having looked at the data catalogue page, I do have concerns about the terms of use and think it's best to get SF to explicitly agree to allow OSM to use the data. http://gispub02.sfgov.org/website/sfshare/index2.asp

What terms in particular caused you concern? I'll need to know if I'm going to ask for explicit permission. A while back I posted the previous terms to talk legal and they pointed out problems. The city changed the license when I pointed out that it caused problems for open projects (apparently that was in the works anyway). I thought those problems were removed. I had a conference call with one of the datasf.org people that helps make city datasets available and an assistant city attorney prior to those changes, and I was told that unless specifically noted otherwise in the dataset, the data was public domain.
I do understand that that isn't in writing though. If there is a problem with the terms, though, there is still a good chance the city would give us explicit permission to use the data; they seemed excited about the prospect of some of it ending up in OSM.

I don't know enough to assess, but I'm concerned about the click-to-agree. I'm also concerned about the possibility of switching to ODbL and the contributor terms, and want to make sure the data would be compatible with those. I think it helps to have explicit permission on file for use in OSM (e.g. an e-mail agreeing that we can use the data dual-licensed under CC-BY-SA and ODbL).

4) If you can get explicit permission, then I suggest breaking up the address nodes into smaller chunks (e.g. by census block group), converting them to OSM format with Ian's shp-to-osm tool, and checking them for quality and against existing OSM data (e.g. existing POIs with addresses) in JOSM before importing. QGIS and/or PostGIS can be useful for chopping up the data into geographic chunks. This approach gives the opportunity to apply due diligence, to check things, and to keep chunks small enough that it's reasonably possible to deal with any mistakes or glitches.

I had been planning on using shp-to-osm to break it into chunks by number of nodes, but doing it geographically makes more sense. Do you think census block size is best, or maybe by neighborhood, or should I aim for an approximate number of nodes in each geographic chunk?

With buildings, our data was a bit denser. I did some by census tract and found some were too big for the OSM API and JOSM, whereas census block group has worked well. With just nodes, I think you could do somewhat larger chunks.

Cheers, Katie

Cheers, Gregory Arenius

-- Katie Filbert filbe...@gmail.com @filbertkm
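Katie's point that census tracts proved too big for the OSM API and JOSM suggests a second, size-based split after the geographic one. A minimal sketch, assuming an illustrative per-chunk limit (not the real API changeset limit, which would need to be checked against current API documentation):

```python
# Hypothetical sketch: after geographic chunking, split any chunk
# that still exceeds a per-upload node limit into consecutive parts.
# The 10,000 figure is illustrative only.

def split_oversized(chunk, max_nodes=10000):
    """Return a list of sub-chunks, each at most max_nodes long."""
    return [chunk[i:i + max_nodes] for i in range(0, len(chunk), max_nodes)]

big_chunk = list(range(25000))        # stand-in for 25,000 address nodes
parts = split_oversized(big_chunk)
print([len(p) for p in parts])        # [10000, 10000, 5000]
```

Combined with geographic chunking, this keeps each upload small enough to review in JOSM and to revert cleanly if something goes wrong.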
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 8:14 PM, Katie Filbert filbe...@gmail.com wrote: On Thu, Dec 9, 2010 at 8:06 PM, Gregory Arenius greg...@arenius.com wrote: A few comments...

1) San Francisco explicitly says they do not have building outline data. :( So, I suppose we get to add buildings ourselves. I do see that SF does have parcels.

If buildings aren't available, that's too bad, but such is life. I don't think parcels are generally useful.

2) I don't consider the addresses as noise. The data is very helpful for geocoding. If the renderer does a sloppy job making noise out of addresses, the renderings should be improved.

Katie's position is certainly valid, especially as it relates to geocoding. They render ugly, but I'd rather have an ugly render and some data than no data.

3) Having looked at the data catalogue page, I do have concerns about the terms of use and think it's best to get SF to explicitly agree to allow OSM to use the data. http://gispub02.sfgov.org/website/sfshare/index2.asp

I'd have legal look at this. I'm a little confused by some of the wording about derivative works and transferred rights and indemnification. If SF is open-minded, that's awesome. In an ideal world they'd use an existing license with well-defined boundaries, like CC0, but barring that, I'd say don't mention the license at all, but simply have them donate the data to OSM itself. Legal can help with this.

As for a demo of the data, yeah, an OSM file would be perfect. Also, though, I'd keep the previous dataset ID, in case you need to do a comparison later.

- Serge
Re: [Talk-us] Address Node Import for San Francisco
On Thu, Dec 9, 2010 at 7:31 PM, Serge Wroclawski emac...@gmail.com wrote: 3) Having looked at the data catalogue page, I do have concerns about the terms of use and think it's best to get SF to explicitly agree to allow OSM to use the data. http://gispub02.sfgov.org/website/sfshare/index2.asp

I'd have legal look at this. I'm a little confused by some of the wording about derivative works and transferred rights and indemnification.

It's a fairly standard (if a little more wordy) indemnification agreement. The derivative works language is making sure that no one can hold the San Francisco government liable for using the data directly or indirectly, or when it's used as part of a derivative work. I don't see anything preventing the data from being used or traced by OSM. I don't even see an attribution requirement.