Re: [Talk-us] TIGER road expansion code
On Fri, May 11, 2012 at 11:20 PM, Serge Wroclawski emac...@gmail.com wrote:

> W and W Industrial Rd expands to West and W Industrial Road, since W is the direction_prefix, but the second W is unaccounted for; the script doesn't know if that is supposed to be W or West (and neither do I). The old script would have punted (since it's ambiguous which W should be expanded); the new one expands the first, since W is the direction_prefix.
>
> I think instead of focusing on these odd edge cases, we should focus on the fact that we're now hitting the .0001% of roads that can't be expanded, accept that we're going to have some small error rate, and so, instead of focusing on fixing them, decide how we want to identify them.

What percentage of roads, where the old script would have punted, are now being expanded correctly? What error rate is acceptable?

___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us
Re: [Talk-us] TIGER road expansion code
On Sat, May 12, 2012 at 2:20 AM, Dale Puch dale.p...@gmail.com wrote:

> Nice work. See the list of way names I posted in the other thread for some odd name or tag test cases.

Dale,

What I need at this point is for people to look at the code and give concrete feedback directly on its functionality, or (better yet) to provide patches.

- Serge
Re: [Talk-us] TIGER road expansion code
On Sat, May 12, 2012 at 12:41 PM, Serge Wroclawski emac...@gmail.com wrote:

> On Sat, May 12, 2012 at 7:05 AM, Anthony o...@inbox.org wrote:
>> What percentage of roads, where the old script would have punted, are now being expanded correctly?
>
> The new script is up now. I used Maryland as a test case.
>
> In Maryland the number went from 21 roads it couldn't expand to 1. But that's 20 out of the ~70k it *was* able to expand, so either you could say it handles 99.5% of all previously unexpandable roads, or it handled .0003% of all total roads.

By roads you mean unique names? In any case, if it's only 20 for Maryland, it's probably a low enough number for the country that they can all be manually reviewed.

>> What error rate is acceptable?
>
> As low as possible

So any error rate is acceptable?
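For reference, a quick check of the arithmetic using only the counts quoted in this message: 21 unexpandable names before, 1 after, and roughly 70,000 expanded. The 70,000 total is the approximate figure from the message, so the second result is approximate too, and the variable names are just for illustration.

```python
# Counts taken from the message above; the total is the approximate "~70k".
unexpandable_before = 21
unexpandable_after = 1
expanded_total = 70_000

newly_handled = unexpandable_before - unexpandable_after

# Share of the previously unexpandable names that the new script now handles.
share_of_hard_cases = 100 * newly_handled / unexpandable_before

# The same 20 names as a share of everything the script could expand.
share_of_all = 100 * newly_handled / expanded_total

print(f"{newly_handled} of {unexpandable_before} hard cases: {share_of_hard_cases:.1f}%")
print(f"{newly_handled} of ~{expanded_total}: {share_of_all:.3f}%")
```

On these counts the new script resolves about 95% of the names the old one punted on, which is about 0.03% of the names it could expand overall.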
Re: [Talk-us] TIGER road expansion code
On 5/12/2012 12:41 PM, Serge Wroclawski wrote:

>> What error rate is acceptable?
>
> As low as possible, but I've been generally able to handle the edge cases I've seen, either by doing the right thing, or by punting and doing nothing at all.

It's worth noting that any errors are already there as errors in the TIGER tags. So, had the TIGER import been done properly in the first place, these errors would be in the name tags as well.
Re: [Talk-us] TIGER road expansion code
On Sat, May 12, 2012 at 5:27 PM, Nathan Edgars II nerou...@gmail.com wrote:

> It's worth noting that any errors are already there as errors in the TIGER tags. So, had the TIGER import been done properly in the first place, these errors would be in the name tags as well.

This seems to be the case. The code checks for tiger:name_base, and if it can't find it, then it doesn't make any changes. (Can someone confirm this? http://www.openstreetmap.org/browse/way/16011770 will be untouched, right?)

If so, this is good, but it does mean that road names are going to get out of sync, if, for instance, tiger:name_base was removed from some of the ways and not removed from others. This will complicate later fixes/enhancements.
Re: [Talk-us] TIGER road expansion code
On 5/12/2012 5:54 PM, Anthony wrote:

> If so, this is good, but it does mean that road names are going to get out of sync, if, for instance, tiger:name_base was removed from some of the ways and not removed from others. This will complicate later fixes/enhancements.

This also happens long term as people create roads from scratch and don't know about the abbreviation rule when starting.
Re: [Talk-us] TIGER road expansion code
On Sat, May 12, 2012 at 6:01 PM, Mike N nice...@att.net wrote:

> On 5/12/2012 5:54 PM, Anthony wrote:
>> If so, this is good, but it does mean that road names are going to get out of sync, if, for instance, tiger:name_base was removed from some of the ways and not removed from others. This will complicate later fixes/enhancements.
>
> This also happens long term as people create roads from scratch and don't know about the abbreviation rule when starting.

Yeah, it does, but we're not discussing that right now. If we can avoid adding tens of thousands more of these cases, we should. And it seems we can. Even a simple rule like assuming that two connected ways with the same name are the same road would be a useful addition. Something more complicated, especially something that looks at the history, would be even better.
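The "simple rule" suggested here could be sketched roughly as follows. This is only an illustration, not part of the script under discussion: the `ways` and `expanded` structures and the function name `propagate_expansions` are hypothetical, and "connected" is assumed to mean "shares at least one node".

```python
from collections import defaultdict


def propagate_expansions(ways, expanded):
    """Carry an expansion over to connected ways with the same original name.

    ways:     {way_id: {"name": original_name, "nodes": [node_id, ...]}}
    expanded: {way_id: new_name} for ways the main script could expand.
    Returns {way_id: new_name} for the additional ways reached by the rule.
    """
    # Index ways by node so that neighbours can be looked up quickly.
    ways_at_node = defaultdict(set)
    for way_id, way in ways.items():
        for node in way["nodes"]:
            ways_at_node[node].add(way_id)

    result = dict(expanded)
    extra = {}
    queue = list(expanded)
    while queue:
        way_id = queue.pop()
        for node in ways[way_id]["nodes"]:
            for other in ways_at_node[node]:
                same_name = ways[other]["name"] == ways[way_id]["name"]
                if other not in result and same_name:
                    result[other] = result[way_id]
                    extra[other] = result[way_id]
                    queue.append(other)  # keep walking along the road
    return extra


# Hypothetical data: way 2 lost its TIGER tags but touches way 1 at node 11.
ways = {
    1: {"name": "Main St", "nodes": [10, 11]},
    2: {"name": "Main St", "nodes": [11, 12]},
    3: {"name": "Main St", "nodes": [20, 21]},  # same name, not connected
    4: {"name": "Oak St", "nodes": [12, 13]},   # connected, different name
}
extra = propagate_expansions(ways, {1: "Main Street"})
```

Only way 2 is picked up in this example. Way 3 is left alone because it is a different "Main St" as far as connectivity goes, which is the kind of case that would still need review.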
[Talk-us] TIGER road expansion code
Since the other thread has gotten a bit long, I want to start a new thread to discuss the TIGER road expansion code.

The current version of the code is at: https://gist.github.com/2656735

Taking Dale's test cases, I've made a new version of the code and run it against Maryland. (I didn't put the code up yet; I can if someone asks.)

This time, instead of 21 ambiguous names (expanding 99.9997% of ways), it came up with only 1 ambiguous road (99.% of ways). That one is an interesting case where a user came in and modified the tiger tags, changing the tiger:name_base to the name while leaving the tiger:name_type in place, so Lyon Dr was the name, Dr was the name_type and Lyon Dr was the name_base. This seemed like an odd case and the script did the right thing.

I looked over the other examples where the script would have punted but now expanded, and it looks like it did the right thing, though there may be some issues with the TIGER data. For example:

W and W Industrial Rd expands to West and W Industrial Road, since W is the direction_prefix, but the second W is unaccounted for; the script doesn't know if that is supposed to be W or West (and neither do I). The old script would have punted (since it's ambiguous which W should be expanded); the new one expands the first, since W is the direction_prefix.

I think instead of focusing on these odd edge cases, we should focus on the fact that we're now hitting the .0001% of roads that can't be expanded, accept that we're going to have some small error rate, and so, instead of focusing on fixing them, decide how we want to identify them.

As for the code itself, I'm happy to take feedback, but I'd find it much easier to work with if that feedback came in the form of specific code questions, patches, or specific real-world examples.

- Serge
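To make the two examples above concrete, here is a minimal sketch of the kind of rule being described. It is not the code from the gist: the lookup tables, the function name `expand_name`, and the exact tag key `tiger:name_direction_prefix` are assumptions for illustration. The idea is to expand only the parts that the TIGER tags identify, and to punt (return `None`) when the name is not exactly what the tags describe.

```python
# Hypothetical lookup tables; the real script's tables are not shown in this thread.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
ROAD_TYPES = {"Rd": "Road", "Dr": "Drive", "St": "Street", "Ave": "Avenue"}


def expand_name(tags):
    """Return the expanded name, or None to punt and leave the way untouched."""
    name = tags.get("name")
    base = tags.get("tiger:name_base")
    if not name or not base:
        return None  # no tiger:name_base: make no changes

    prefix = tags.get("tiger:name_direction_prefix", "")  # assumed tag key
    road_type = tags.get("tiger:name_type", "")

    # Punt unless the name is exactly prefix + base + type as tagged.
    if name != " ".join(p for p in (prefix, base, road_type) if p):
        return None

    # Expand only the tagged prefix and type; the base is left as written.
    expanded = (
        DIRECTIONS.get(prefix, prefix),
        base,
        ROAD_TYPES.get(road_type, road_type),
    )
    return " ".join(p for p in expanded if p)


industrial = expand_name({
    "name": "W and W Industrial Rd",
    "tiger:name_direction_prefix": "W",
    "tiger:name_base": "and W Industrial",
    "tiger:name_type": "Rd",
})

lyon = expand_name({
    "name": "Lyon Dr",
    "tiger:name_base": "Lyon Dr",
    "tiger:name_type": "Dr",
})
```

Under these assumptions `industrial` comes out as "West and W Industrial Road", with the second W left alone because it sits inside the base, and `lyon` is `None` because the edited name_base no longer lines up with the name.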