Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Anthony
On Fri, May 11, 2012 at 11:20 PM, Serge Wroclawski emac...@gmail.com wrote:
 W and W Industrial Rd expands to West and W Industrial Road. W is
 the direction_prefix, but the second W is unaccounted for, and the
 script doesn't know whether it is supposed to be W or West (and neither
 do I). The old script would have punted, since it's ambiguous which W
 should be expanded; the new one expands the first, since W is the
 direction_prefix.

 I think that instead of focusing on these odd edge cases, we should
 recognize that we're now down to the .0001% of roads that can't be
 expanded, accept that we're going to have some small error rate, and
 decide how we want to identify those roads rather than how to fix them.

What percentage of roads, where the old script would have punted, are
now being expanded correctly?

What error rate is acceptable?

___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us


Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Serge Wroclawski
On Sat, May 12, 2012 at 2:20 AM, Dale Puch dale.p...@gmail.com wrote:
 Nice work.  See the list of way names I posted in the other thread for some
 odd name or tag test cases.

Dale,

What I need at this point are people to look at the code, and give
concrete feedback directly on the functionality of the code, or
(better yet) to provide patches.

- Serge


Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Anthony
On Sat, May 12, 2012 at 12:41 PM, Serge Wroclawski emac...@gmail.com wrote:
 On Sat, May 12, 2012 at 7:05 AM, Anthony o...@inbox.org wrote:

 What percentage of roads, where the old script would have punted, are
 now being expanded correctly?

 The new script is up now.

 I used Maryland as a test case

 In Maryland the number went from 21 roads it couldn't expand to 1.

 But that's 20 out of the ~70k it *was* able to expand, so you could
 either say it handles about 95% of all previously unexpandable roads, or
 that it handles about .03% of all roads.

By roads you mean unique names?

In any case, if it's only 20 for Maryland, it's probably a low enough
number for the country that they can all be manually reviewed.
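The two framings in the quoted reply can be recomputed from the counts given there: 21 names the old script punted on, 1 the new script still punts on, and roughly 70,000 names expanded. A minimal Python sketch; the 70,000 is the approximate "~70k" from the message, not an exact count:

```python
# Counts quoted in the thread for Maryland (approximate).
before_unexpandable = 21   # names the old script punted on
after_unexpandable = 1     # names the new script still punts on
total_expanded = 70_000    # the "~70k" names the script was able to expand

newly_handled = before_unexpandable - after_unexpandable  # 20

# Framing 1: share of the previously unexpandable names that are now handled.
share_of_hard_cases = newly_handled / before_unexpandable * 100

# Framing 2: share of all expanded names that the improvement touches.
share_of_all = newly_handled / total_expanded * 100

print(f"{share_of_hard_cases:.1f}% of previously unexpandable names")
print(f"{share_of_all:.3f}% of all names")
```

With these counts the first framing comes to about 95% and the second to about 0.03%.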

 What error rate is acceptable?

 As low as possible

So any error rate is acceptable?


Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Nathan Edgars II

On 5/12/2012 12:41 PM, Serge Wroclawski wrote:

 What error rate is acceptable?

 As low as possible, but I've been generally able to handle the edge
 cases I've seen, either by doing the right thing, or by punting and
 doing nothing at all.


It's worth noting that any errors are already there as errors in the 
TIGER tags. So, had the TIGER import been done properly in the first 
place, these errors would be in the name tags as well.



Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Anthony
On Sat, May 12, 2012 at 5:27 PM, Nathan Edgars II nerou...@gmail.com wrote:
 It's worth noting that any errors are already there as errors in the TIGER
 tags. So, had the TIGER import been done properly in the first place, these
 errors would be in the name tags as well.

This seems to be the case.  The code checks for tiger:name_base, and
if it can't find it, then it doesn't make any changes.

(Can someone confirm this?
http://www.openstreetmap.org/browse/way/16011770 will be untouched,
right?)
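A minimal sketch of the behaviour described here, assuming a way's tags arrive as a plain Python dict. The function name expand_name and the abbreviation tables are hypothetical, not taken from the script in the gist. The extra check that name matches what the TIGER parts spell out is also an assumption, added so that hand-edited names are left alone:

```python
from typing import Dict, Optional

# Hypothetical abbreviation tables; the real script's tables may differ.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
TYPES = {"Rd": "Road", "Dr": "Drive", "St": "Street", "Ave": "Avenue"}


def expand_name(tags: Dict[str, str]) -> Optional[str]:
    """Return the expanded name, or None to leave the way untouched."""
    base = tags.get("tiger:name_base")
    name = tags.get("name")
    if base is None or name is None:
        # No tiger:name_base (or no name at all): make no change.
        return None

    prefix = tags.get("tiger:name_direction_prefix", "")
    name_type = tags.get("tiger:name_type", "")
    suffix = tags.get("tiger:name_direction_suffix", "")

    # Assumption: only touch the way if its name is exactly what the TIGER
    # parts spell out; otherwise someone has edited it, so punt.
    short = " ".join(p for p in (prefix, base, name_type, suffix) if p)
    if name != short:
        return None

    long_parts = (
        DIRECTIONS.get(prefix, prefix),
        base,
        TYPES.get(name_type, name_type),
        DIRECTIONS.get(suffix, suffix),
    )
    return " ".join(p for p in long_parts if p)


print(expand_name({"name": "Industrial Rd"}))  # None: no tiger:name_base
```

Because the expansion is rebuilt from the tiger:* tags, any error already present in those tags is carried straight into the new name.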

If so, this is good, but it does mean that road names are going to get
out of sync, if, for instance, tiger:name_base was removed from some
of the ways and not removed from others.  This will complicate later
fixes/enhancements.


Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Mike N

On 5/12/2012 5:54 PM, Anthony wrote:

 If so, this is good, but it does mean that road names are going to get
 out of sync, if, for instance, tiger:name_base was removed from some
 of the ways and not removed from others.  This will complicate later
 fixes/enhancements.


  This also happens long term as people create roads from scratch and 
don't know about the abbreviation rule when starting.



Re: [Talk-us] TIGER road expansion code

2012-05-12 Thread Anthony
On Sat, May 12, 2012 at 6:01 PM, Mike N nice...@att.net wrote:
 On 5/12/2012 5:54 PM, Anthony wrote:

 If so, this is good, but it does mean that road names are going to get
 out of sync, if, for instance, tiger:name_base was removed from some
 of the ways and not removed from others.  This will complicate later
 fixes/enhancements.


  This also happens long term as people create roads from scratch and don't
 know about the abbreviation rule when starting.

Yeah, it does, but we're not discussing that right now.

If we can avoid adding tens of thousands more of these cases, we should.

And it seems we can.  Even a simple rule like assuming that two
connected ways with the same name are the same road would be a useful
addition.  Something more complicated, especially something that looks
at the history, would be even better.
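A minimal sketch of the simple rule, assuming ways are available as Python dicts with id, nodes, and tags keys (a hypothetical in-memory shape, not the script's actual data model). Union-find links ways that share both a node and a name; out_of_sync then reports groups where only some members still carry tiger:name_base, i.e. the groups whose names would drift apart if only the tagged ways were expanded:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def connected_same_name_groups(ways: List[dict]) -> List[List[int]]:
    """Group IDs of named ways that share a name and are linked through shared nodes."""
    parent: Dict[int, int] = {w["id"]: w["id"] for w in ways}

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Any two ways with the same name touching the same node are joined.
    first_way_at: Dict[Tuple[str, int], int] = {}
    for w in ways:
        name = w["tags"].get("name")
        if name is None:
            continue
        for node in w["nodes"]:
            key = (name, node)
            if key in first_way_at:
                union(first_way_at[key], w["id"])
            else:
                first_way_at[key] = w["id"]

    groups: Dict[int, List[int]] = defaultdict(list)
    for w in ways:
        if "name" in w["tags"]:
            groups[find(w["id"])].append(w["id"])
    return list(groups.values())


def out_of_sync(ways: List[dict]) -> List[List[int]]:
    """Return groups in which only some ways still carry tiger:name_base."""
    has_base = {w["id"]: "tiger:name_base" in w["tags"] for w in ways}
    return [g for g in connected_same_name_groups(ways)
            if len({has_base[i] for i in g}) > 1]


# Hypothetical sample data: ways 1 and 2 meet at node 11; way 3 is elsewhere.
ways = [
    {"id": 1, "nodes": [10, 11], "tags": {"name": "Oak St", "tiger:name_base": "Oak"}},
    {"id": 2, "nodes": [11, 12], "tags": {"name": "Oak St"}},
    {"id": 3, "nodes": [50, 51], "tags": {"name": "Oak St", "tiger:name_base": "Oak"}},
]
print(out_of_sync(ways))  # [[1, 2]]
```

Grouping by connectivity rather than by name alone keeps unrelated streets that happen to share a name (way 3 here) out of the same group. A history-based check would need the ways' edit history and is not attempted here.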


[Talk-us] TIGER road expansion code

2012-05-11 Thread Serge Wroclawski
Since the other thread has gotten a bit long, I want to start a new
thread to discuss the TIGER road expansion code.

The current version of the code is at:  https://gist.github.com/2656735

Using Dale's test cases, I've made a new version of the code and run
it against Maryland. (I haven't put the new code up yet, but I can if
someone asks.)

This time, instead of 21 ambiguous names (expanding 99.9997% of ways),
it came up with only 1 ambiguous road (99.% of ways), and that
one is an interesting case: a user came in and modified the TIGER
tags, changing tiger:name_base to the name while leaving
tiger:name_type in place, so Lyon Dr was the name, Dr was the
name_type, and Lyon Dr was the name_base. This seemed like an odd
case, and the script did the right thing.
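One way to spot this kind of hand-edited tag set before trying to expand it is to check whether tiger:name_base already carries the type word. A minimal Python sketch; the function name tiger_tag_problems and its two checks are hypothetical, chosen to catch the Lyon Dr case described above, and are not taken from the actual script:

```python
from typing import Dict, List


def tiger_tag_problems(tags: Dict[str, str]) -> List[str]:
    """Return reasons the TIGER name tags look self-inconsistent (empty if none)."""
    problems = []
    base = tags.get("tiger:name_base", "")
    name_type = tags.get("tiger:name_type", "")
    name = tags.get("name", "")
    if base and name_type and base.split()[-1] == name_type:
        # e.g. name_base "Lyon Dr" with name_type "Dr": the type would be counted twice.
        problems.append("tiger:name_base already ends with tiger:name_type")
    if base and name_type and base == name:
        # The full name was copied into name_base while name_type was left in place.
        problems.append("tiger:name_base equals name while tiger:name_type is set")
    return problems


print(tiger_tag_problems({"name": "Lyon Dr", "tiger:name_base": "Lyon Dr", "tiger:name_type": "Dr"}))
```

A non-empty result is a signal to punt and list the way for manual review rather than guess at an expansion.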

I looked over the other examples where the old script would have punted
but the new one expands, and it looks like it did the right thing, though
there may be some issues with the TIGER data. For example:

W and W Industrial Rd expands to West and W Industrial Road. W is
the direction_prefix, but the second W is unaccounted for, and the
script doesn't know whether it is supposed to be W or West (and neither
do I). The old script would have punted, since it's ambiguous which W
should be expanded; the new one expands the first, since W is the
direction_prefix.
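The old and new behaviour described above can be sketched as two small functions. This is a hypothetical reconstruction from the description in this message, not the script's code: the old rule punts whenever the abbreviation appears more than once, while the new rule expands only the leading token when it equals the direction prefix. Type suffixes such as Rd are left for a separate step:

```python
from typing import Optional

# Hypothetical direction table; the real script's table may differ.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_prefix_old(name: str, prefix: str) -> Optional[str]:
    """Old rule (as described): punt unless the prefix token occurs exactly once."""
    tokens = name.split()
    if tokens.count(prefix) != 1:
        return None  # ambiguous which occurrence is the prefix
    i = tokens.index(prefix)
    tokens[i] = DIRECTIONS.get(prefix, prefix)
    return " ".join(tokens)


def expand_prefix_new(name: str, prefix: str) -> Optional[str]:
    """New rule (as described): the prefix is the first token, so expand only that."""
    tokens = name.split()
    if not tokens or tokens[0] != prefix:
        return None  # name doesn't start with the prefix: leave it alone
    tokens[0] = DIRECTIONS.get(prefix, prefix)
    return " ".join(tokens)


print(expand_prefix_old("W and W Industrial Rd", "W"))  # None
print(expand_prefix_new("W and W Industrial Rd", "W"))  # West and W Industrial Rd
```

The positional rule resolves the ambiguity by trusting the tag layout; it cannot tell whether the second W was meant to be West, which is exactly the residual uncertainty noted above.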

I think that instead of focusing on these odd edge cases, we should
recognize that we're now down to the .0001% of roads that can't be
expanded, accept that we're going to have some small error rate, and
decide how we want to identify those roads rather than how to fix them.

As for the code itself, I'm happy to take feedback, but I'd find it
much easier to work with if that feedback came in the form of specific
code questions, patches, or specific real world examples.

- Serge
