Re: [Talk-us] Fixing TIGER street name abbreviations
Lots of weird ones from Florida. Many should not give you an issue due to how you're processing, but it is best to test them anyhow. It might also be a good reference when looking at other expansions after this runs.
way id=10761946 name v=E 10th Ct E
way id=10763539 name v=E 10th St E
way id=10759486 name v=E 14th Pl E
way id=11018453 name v=E 1st Avenue Pl -- not really a problem, just... odd
way id=10763214 name v=E 40th Pz  E -- note the double space before E
way id=10966845 name v=E Camp N Comfort Ln -- non-directional N
way id=11210989 name v=E Canal St N
way id=10967404 name v=E Dr
way id=10974755 name v=E Dr Martin Luther King Jr Blvd
way id=11278916 name v=E H St E
way id=10965707 name v=E Ln
way id=11242732 name v=E Martin Luther King Jr Dr
way id=11102139 name v=E Pl
way id=10959109 name v=E St Andrews Dr
way id=10827576 name v=E St James Loop -- I guess TIGER did not abbreviate Loop
way id=11272826 name v=E St Johns St
way id=11021472 name v=E St Louis Ave
way id=11065801 name v=E W Reeves Rd
way id=103599461 name v=E. Watson Road -- not a TIGER import
way id=10983188 name v=East North Street -- already expanded TIGER
way id=11270447 name v=East North St -- not expanded
way id=11274418 name v=Edwin St N E
way id=10851149 name v=Egret&apos;s Walk Cir S -- in case the &apos;s causes problems
way id=10808177 name v=Ellesmere E
way id=10951424 name v=Ave del Ctr
way id=11288799 name v=Avenue E N
way id=10939680 name v=Avenue N
way id=11285084 name v=Avenue N NW
way id=11097378 name v=Dr
way id=10812824 name v=Dr Faruqui Dr
way id=11358527 name v=Dr Joe Abal Dr
way id=10919692 name v=Dr Martin L King Jr Dr
way id=11128816 name v=N 14th St Pl
way id=10982651 name v=N 19th Cir SW
way id=39488514 name v=N 22nd St. -- non-TIGER
way id=10885972 name v=N 3rd Street Cir
way id=10993673 name v=N Blvd
way id=10807124 name v=N Cortez Dr Cir C
way id=11371860 name_1 v=N Cswy -- name v=N Causway
way id=11090351 name v=N E 144th Avenue Rd
way id=11080981 name v=N E 238 Ave Rd
way id=11089629 name v=N E 62nd Ct Rd
way id=10927659 name v=N E St
way id=11013343 name v=N F S 595-2
way id=10925619 name v=N N St
way id=11359562 name v=N N Road
way id=10921209 name v=N S St
way id=10880720 name v=N St Andrews St
way id=10765917 name v=N St Clair St
way id=10979914 name v=N St Peter St
way id=11302478 name v=N Swan Ct NE
way id=10243562 name v=N W 34th St R
way id=11092219 name v=N W 51 St Ct
way id=10927760 name v=N W Ave F North
way id=10763701 name v=N de Gama Ave N
way id=26630760 name v=N orth22nd Street -- bad manual edit
way id=27354570 name v=N orthGarcia Avenue -- bad manual edit
way id=10754189 name v=N-Yellow Pine Cir -- name_1 v=Yellow Pine North Cir
way id=119723334 name v=N. Shingle Lane -- non-TIGER
way id=10983026 name v=N19th Ave -- tiger:name_base v=111th, probably due to edits
way id=11058140 name v=NE 40 Ln -- name_1 v=NE 1 St Ave, version 1 TIGER
way id=10806770 name v=NE 16th Ter; NE 17th Ave -- double name, possibly from edits
way id=11079312 name v=NE 172 Ave Rd
way id=11089303 name v=NE 18th Ave; NE 9th St -- double name, possibly from edits
way id=10800930 name v=NE 19th Ter; NE 25th St -- double name, possibly from edits
way id=11100990 name v=NE 196 Ter Rd
way id=11099492 name v=NE 21st Ter W
way id=11088248 name v=NE 220th Ave Rd
way id=11062349 name v=NE 3 Rd Ave
way id=11081124 name v=NE 36th Av Rd
way id=11070763 name v=NE Mt Zion A M E Church Ave
way id=11081908 name v=NE226 Ter
way id=28931406 name v=NE31st Ave -- non-TIGER
way id=10789444 name v=NE
way id=10788734 name v=NW 10th St Access Rd
way id=10788581 name v=NW 126th Ave; NW 126th Way
way id=10242655 name v=NW 141st
way id=10242241 name v=NW 181 St
way id=11128828 name v=NW 181st St
way id=11085308 name v=NW 21st Street
way id=11082282 name v=NW 221st Street Rd
way id=10765627 name v=NW 231 St
way id=11151648 name v=NW 4th Avenue Cir E
way id=10792992 name v=NW 6th Ave; Blanch Ely Ave
way id=10809778 name v=NW 71st Pl; NW 71st St
way id=10928777 name v=NW Avenue G; Avenue G North; NW Avenue G
way id=11273744 name v=NW Dr
way id=107757877 name v=NW NW 125th Avenue -- non-TIGER
way id=10246730 name v=NW30Ln -- name1 has spaces
way id=11065133 name v=National Forest Rd 141A
way id=11060010 name v=Nf Rd 354
way id=11083729 name v=Nfr 75B
way id=11034257 name v=Nfs 572 B
way id=10237516 name v=Nnw 141 St
way id=10803531 name v=Nmw 49th Ave
way id=83737572 name v=North 46th Streeet -- manual expansion typo
way id=10874252 name v=Northern Pacific Dr N
way id=11124503 name v=Northwest 38th Court; NW 38th Ct
way id=11213490 name v=Norwich O -- tiger:name_direction_suffix v=O
way id=57732753 name v=Nw 35th Ave -- name case
way id=9059279 name v=S St
way id=11058256 name v=S W Cr 347
way id=11030290 name v=S and S Ln
way id=34939098 name v=S.W. Sundance Trail -- non-TIGER
way id=10927892 name_1 v=SE Ave E -- name
Re: [Talk-us] Fixing TIGER street name abbreviations
On 11 May 2012 22:17, Dale Puch dale.p...@gmail.com wrote: I understand the script checks for only one instance of the abbreviation. My point was: what if someone manually expanded ONE of the abbreviations, leaving St something Street? Is that checked for? The question also applies to Dr something Dr previously changed to Dr something Drive, and possibly directionals as well. Serge seems to be doing a good job with this, and this is just feedback so there aren't any incorrect expansions. The way the old script deals with those is that it has a list of abbreviations that come as a suffix and those that come as a prefix, taken from the TIGER documentation. It checks suffixes starting from the end, so if you have St something St E or St something St East, it'll only check E or East and then St, and then stop because something is not a known suffix. There are cases where a word can be both a suffix and a prefix, but those cases are known from the TIGER documentation. Note that St something St can be Saint something St, but it can also be State something St. The script uses a list of names that can be Saint and a list of those that can be State (state-owned roads). Cheers ___ Talk-us mailing list Talk-us@openstreetmap.org http://lists.openstreetmap.org/listinfo/talk-us
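The suffix scan described above can be sketched roughly as follows. This is a minimal illustration, not the old script's code: both lookup tables are a tiny hypothetical subset standing in for the full prefix/suffix lists from the TIGER documentation, and the Saint/State lists are left out entirely.

```python
# Hypothetical subsets of the TIGER suffix lists, for illustration only.
SUFFIX_DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West",
                     "NE": "Northeast", "NW": "Northwest",
                     "SE": "Southeast", "SW": "Southwest"}
STREET_TYPES = {"St": "Street", "Ave": "Avenue", "Dr": "Drive",
                "Blvd": "Boulevard", "Ln": "Lane", "Rd": "Road"}
FULL_DIRECTIONS = set(SUFFIX_DIRECTIONS.values())
FULL_TYPES = set(STREET_TYPES.values())


def expand_suffixes(name: str) -> str:
    """Scan words from the end: at most one directional suffix, then at most
    one street type. Everything further left is never touched."""
    words = name.split()
    i = len(words) - 1
    # Optional directional suffix, abbreviated or already expanded.
    if i > 0 and (words[i] in SUFFIX_DIRECTIONS or words[i] in FULL_DIRECTIONS):
        words[i] = SUFFIX_DIRECTIONS.get(words[i], words[i])
        i -= 1
    # Optional street type directly before it; an unknown word ends the scan.
    if i > 0 and (words[i] in STREET_TYPES or words[i] in FULL_TYPES):
        words[i] = STREET_TYPES.get(words[i], words[i])
    return " ".join(words)
```

Because the scan stops at the first word that is not a known suffix, "Calle Ave Maria" is left alone and "St something Street" is not touched again. Without the documented special cases, though, this sketch would wrongly turn a street literally named "Avenue N" into "Avenue North".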
Re: [Talk-us] Fixing TIGER street name abbreviations
On Sat, May 12, 2012 at 4:21 PM, andrzej zaborowski balr...@gmail.com wrote: It checks suffixes starting from the end, so if you have St something St E or St something St East, it'll only check E or East and then St and then stop because something is not a known suffix. So Calle Ave Maria will be expanded to Calle Avenue Maria?
Re: [Talk-us] Fixing TIGER street name abbreviations
On Sat, May 12, 2012 at 4:47 PM, Anthony o...@inbox.org wrote: On Sat, May 12, 2012 at 4:21 PM, andrzej zaborowski balr...@gmail.com wrote: It checks suffixes starting from the end, so if you have St something St E or St something St East, it'll only check E or East and then St and then stop because something is not a known suffix. So Calle Ave Maria will be expanded to Calle Avenue Maria? Never mind. No, it won't, because Maria is not a known suffix, right?
Re: [Talk-us] Fixing TIGER street name abbreviations
The process seems obvious to me: check that the name is still what it originally was (from the tiger:name_base etc. tags), and if so, use those tags to expand abbreviations. (Ignore any with semicolons/colons from joining.) If not, set it aside for semi-manual checking. The only false positives that are not errors in the TIGER data will be caused by someone changing the tiger tags, and if both these and the name were changed consistently, the editor probably knew what they were doing.
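A minimal sketch of that check, assuming the tiger:name_direction_prefix, tiger:name_base, tiger:name_type and tiger:name_direction_suffix tags left by the import, and hypothetical (deliberately short) expansion tables. Returning None stands for "set aside for semi-manual checking".

```python
from typing import Optional

# Hypothetical expansion tables; a real run would need the full TIGER lists.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West",
              "NE": "Northeast", "NW": "Northwest",
              "SE": "Southeast", "SW": "Southwest"}
TYPES = {"St": "Street", "Ave": "Avenue", "Dr": "Drive",
         "Blvd": "Boulevard", "Ln": "Lane", "Rd": "Road"}

TIGER_KEYS = ("tiger:name_direction_prefix", "tiger:name_base",
              "tiger:name_type", "tiger:name_direction_suffix")


def expand_from_tiger(tags: dict) -> Optional[str]:
    """Return the expanded name, or None if the way should be set aside."""
    name = tags.get("name", "")
    prefix, base, street_type, suffix = (tags.get(k, "") for k in TIGER_KEYS)
    # Values joined from several ways ("Ln; Rd") are not trustworthy.
    if any(";" in v or ":" in v for v in (name, prefix, base, street_type, suffix)):
        return None
    rebuilt = " ".join(p for p in (prefix, base, street_type, suffix) if p)
    if not base or rebuilt != name:
        # The name no longer matches the TIGER parts: someone edited it.
        return None
    expanded = (DIRECTIONS.get(prefix, prefix), base,
                TYPES.get(street_type, street_type),
                DIRECTIONS.get(suffix, suffix))
    return " ".join(p for p in expanded if p)
```

Since the base name comes straight from tiger:name_base, the leading "St" in a way tagged as "E St Johns St" is never mistaken for a street type; only the prefix and type fields get expanded.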
Re: [Talk-us] Fixing TIGER street name abbreviations
On Fri, May 11, 2012 at 04:47:37AM -0400, Serge Wroclawski wrote: I've added direction expansion into a new version, and thrown it up as a gist: https://gist.github.com/2656735 I don't treat direction prefixes and suffixes any differently; I haven't seen an example where there is both a prefix and a suffix in the name, and they're the same as the suffix. You might want to check Minneapolis/St Paul. They have some really bizarre directional combinations that could give you heartburn. -- Kristian M Zoerhoff
Re: [Talk-us] Fixing TIGER street name abbreviations
On Thu, May 10, 2012 at 11:45 PM, Dale Puch dale.p...@gmail.com wrote: Clarity! The abbreviations are just that: they mean the full word, and are spoken that way, but written and displayed as the abbreviation. I also disagree; I have never known anyone who said Whatever A V E. They do not spell it out, they say the word the abbreviation stands for. Same for St, Dr, etc. What are you disagreeing with? I've known streets that were called Whatever Ave (rhymes with Whatever Have). Not Whatever Avenue. And certainly not Whatever A V E. It is a LOT easier to abbreviate from the full word than to go the other way. Not really. Is 1515 South West Shore Boulevard, Tampa abbreviated 1515 S West Shore Blvd, Tampa, or is it abbreviated 1515 S W Shore Blvd, Tampa? If you want the answer, ask usps.com. The only way to capture the full information is to have additional tags telling you what the base is. And if you do that, abbreviating or not abbreviating doesn't matter.
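To make the Tampa example concrete: the single full string "South West Shore Boulevard" abbreviates differently depending on which words are taken as the base name. A small sketch with hypothetical tables, where the caller supplies the split:

```python
# Hypothetical abbreviation tables covering just this example.
DIR_ABBR = {"North": "N", "South": "S", "East": "E", "West": "W"}
TYPE_ABBR = {"Boulevard": "Blvd", "Street": "St", "Avenue": "Ave"}


def abbreviate(prefix: str, base: str, street_type: str) -> str:
    """Abbreviate the directional prefix word by word and the street type,
    leaving the base name exactly as given."""
    short_prefix = " ".join(DIR_ABBR.get(w, w) for w in prefix.split())
    parts = (short_prefix, base, TYPE_ABBR.get(street_type, street_type))
    return " ".join(p for p in parts if p)
```

With prefix "South" and base "West Shore" this gives "S West Shore Blvd"; with prefix "South West" and base "Shore" it gives "S W Shore Blvd". Both splits spell out the same full name, so the full string alone cannot decide between them.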
Re: [Talk-us] Fixing TIGER street name abbreviations
On Fri, May 11, 2012 at 9:45 AM, Anthony o...@inbox.org wrote: The only way to capture the full information is to have additional tags telling you what the base is. And if you do that, abbreviating or not abbreviating doesn't matter. And if you want to avoid tremendous redundancy, the way to do that is with some sort of street relations. Each way should contain information about the way, the whole way, and *nothing but the way*. Including base_name information in every instance of the way fails 3NF.
Re: [Talk-us] Fixing TIGER street name abbreviations
On 2012-05-11 6:45 AM, Anthony wrote: Not really. Is 1515 South West Shore Boulevard, Tampa abbreviated 1515 S West Shore Blvd, Tampa, or is it abbreviated 1515 S W Shore Blvd, Tampa? If you want the answer, ask usps.com. The only way to capture the full information is to have additional tags telling you what the base is. And if you do that, abbreviating or not abbreviating doesn't matter. That's similar to how the tiger:* tags are structured, and it's the subject of a proposal on the wiki: http://wiki.openstreetmap.org/wiki/Proposed_features/Directional_Prefix_%26_Suffix_Indication -- Minh Nguyen m...@1ec5.org Jabber: m...@1ec5.org; Blog: http://notes.1ec5.org/
Re: [Talk-us] Fixing TIGER street name abbreviations
At 2012-05-10 19:40, Anthony wrote: On Thu, May 10, 2012 at 10:25 PM, Mike N nice...@att.net wrote: The only question is what to do about those cases where it's only referred to locally as 'Ave', and the postal service would refuse letters addressed to 'Avenue'. The postal service would refuse letters addressed to Avenue in some instances? Unless this quote is out of context, that seems ridiculous (in the US). -- Alan Mintz alan_mintz+...@earthlink.net
Re: [Talk-us] Fixing TIGER street name abbreviations
At 2012-05-10 19:56, Anthony wrote: On Thu, May 10, 2012 at 10:45 PM, Mike N nice...@att.net wrote: But you wouldn't be confused if a stranger came in asking how to get to Whatever Avenue? If not, then there's no problem with the expansion. Okay, so basically we're ignoring the on-the-ground rule in order to map for the renderer. Exactly :) Why that is ok, I don't know :( -- Alan Mintz alan_mintz+...@earthlink.net
Re: [Talk-us] Fixing TIGER street name abbreviations
On Fri, May 11, 2012 at 12:26 PM, Minh Nguyen m...@1ec5.org wrote: On 2012-05-11 6:45 AM, Anthony wrote: The only way to capture the full information is to have additional tags telling you what the base is. And if you do that, abbreviating or not abbreviating doesn't matter. That's similar to how the tiger:* tags are structured, and it's the subject of a proposal on the wiki: http://wiki.openstreetmap.org/wiki/Proposed_features/Directional_Prefix_%26_Suffix_Indication Well, yes, we should find a way to separate out the parts of the name. In addition to facilitating abbreviation, it also facilitates translation. We also should include pronunciation information. But first we need street relations. It turns out some people already did research into ways to structure a database which minimizes the inconsistencies we currently have in the OSM database. It's called database normalization. As it turns out, the current method of putting names on ways already fails to even be in first normal form. Some ways represent more than one road. The solution is to use relations. http://wiki.openstreetmap.org/wiki/Relations/Proposed/Street is somewhat of a good proposal, though I have a little bit of trouble with the wording. We shouldn't include "Any Tag that applies to all parts of the road", but only those tags which apply to the entire road as a whole. In other words, we'd include goods=no if there were a law saying no commercial vehicles are allowed on Whatever Parkway, but not just because all the ways happen to have goods=no tags.
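As a rough data-model sketch (plain Python classes, not the schema from the wiki proposal), a street relation holds the name and the road-wide tags once, and member ways carry none of them. A way shared by two roads then gets two names from two relations instead of one "A; B" string.

```python
from dataclasses import dataclass, field


@dataclass
class StreetRelation:
    """Hypothetical street relation: its tags describe the road as a whole."""
    tags: dict
    member_way_ids: list = field(default_factory=list)


def street_names_for_way(way_id: int, relations: list) -> list:
    """Collect the names of every street relation the way belongs to."""
    return [r.tags["name"] for r in relations
            if way_id in r.member_way_ids and "name" in r.tags]


# Hypothetical data: way 3 is part of both roads.
parkway = StreetRelation({"name": "Whatever Parkway", "goods": "no"}, [1, 2, 3])
other = StreetRelation({"name": "Other Road"}, [3])
```

Here goods=no sits on the relation only because it is assumed to be a road-wide legal rule, which is the distinction drawn above; a tag that merely happens to appear on every member way would stay on the ways.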
Re: [Talk-us] Fixing TIGER street name abbreviations
On Fri, May 11, 2012 at 1:35 PM, Alan Mintz alan_mintz+...@earthlink.net wrote: At 2012-05-10 19:40, Anthony wrote: On Thu, May 10, 2012 at 10:25 PM, Mike N nice...@att.net wrote: The only question is what to do about those cases where it's only referred to locally as 'Ave', and the postal service would refuse letters addressed to 'Avenue'. The postal service would refuse letters addressed to Avenue in some instances? Unless this quote is out of context, that seems ridiculous (in the US). I very well may have misquoted Mike North. I'm not sure what he was trying to say.
Re: [Talk-us] Fixing TIGER street name abbreviations
At 2012-05-11 10:20, David ``Smith'' wrote: Third, I suggest retaining the abbreviated form in a tag like abbr_name. Ideally, this should be the exact abbreviated form used on signs, if that's consistent. Getting this right requires local knowledge, but TIGER's abbreviation might be better than nothing. I'm sure some will disagree with that point. Better yet, since a proper expansion bot has to chop up the name into its components, why not take the opportunity to advance the project by tagging (and re-abbreviating if necessary) those individual components (e.g. street:dir_prefix, street:name, street:type, street:dir_suffix)? That, I could support. One field with the full name for the text-to-speech consumers, and another set of fields to properly identify the street the way others do. Fifth, renderers must take care in abbreviating street names. For example, Mapquest Open turns Lane Avenue into Ln Ave, where only the last word should be abbreviated. To eliminate guesswork, renderers can use the abbr_name tag, if present. Wouldn't happen with street:name=Lane, street:type=Ave (since it would not speak street:name verbatim) -- Alan Mintz alan_mintz+...@earthlink.net
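A sketch of the difference, using the street:dir_prefix / street:name / street:type / street:dir_suffix keys suggested above (a proposal, not an established schema) and a hypothetical abbreviation table. Unlike the street:type=Ave example, it assumes the full type word is stored and abbreviated at render time. The naive labeller merely stands in for whatever word-by-word logic produces "Ln Ave"; it is not MapQuest's actual code.

```python
# Hypothetical abbreviation table for illustration only.
ABBR = {"Avenue": "Ave", "Lane": "Ln", "Street": "St",
        "North": "N", "South": "S", "East": "E", "West": "W"}


def _abbr(word: str) -> str:
    return ABBR.get(word, word)


def naive_label(full_name: str) -> str:
    """Abbreviate every word it recognises, including the base name."""
    return " ".join(_abbr(w) for w in full_name.split())


def component_label(tags: dict) -> str:
    """Abbreviate only the directionals and the street type."""
    parts = (
        _abbr(tags.get("street:dir_prefix", "")),
        tags.get("street:name", ""),  # base name is used verbatim
        _abbr(tags.get("street:type", "")),
        _abbr(tags.get("street:dir_suffix", "")),
    )
    return " ".join(p for p in parts if p)


def spoken_name(tags: dict) -> str:
    """Full, unabbreviated form for text-to-speech consumers."""
    keys = ("street:dir_prefix", "street:name", "street:type", "street:dir_suffix")
    return " ".join(tags[k] for k in keys if tags.get(k))
```

For tags street:name=Lane and street:type=Avenue, the naive labeller applied to "Lane Avenue" gives "Ln Ave", while the component labeller gives "Lane Ave" and the spoken form stays "Lane Avenue".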
Re: [Talk-us] Fixing TIGER street name abbreviations
I understand the script checks for only one instance of the abbreviation. My point was: what if someone manually expanded ONE of the abbreviations, leaving St something Street? Is that checked for? The question also applies to Dr something Dr previously changed to Dr something Drive, and possibly directionals as well. Serge seems to be doing a good job with this, and this is just feedback so there aren't any incorrect expansions. On Fri, May 11, 2012 at 12:27 AM, Toby Murray toby.mur...@gmail.com wrote: On Thu, May 10, 2012 at 10:52 PM, Dale Puch dale.p...@gmail.com wrote: I think I came up with a rare possibility for error. What if the original St something St was manually expanded to St something Street? You're checking for a single St, and there would be one. Or am I missing another check? It checks for one and ONLY one possible abbreviation to expand. If there is more than one, it punts and ignores the way. This is a very conservative approach which is probably good, at least for a first pass. Maybe if the first run goes well we can see how many problems are left and look at refining things for a second pass to catch more difficult ones. Or not... Toby -- Dale Puch
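Toby's "one and ONLY one" rule, and the case Dale is worried about, fit in a few lines. The table is a hypothetical subset, and this sketches the rule as described in the thread rather than reproducing Serge's script:

```python
from typing import Optional

# Hypothetical subset of the abbreviation table.
ABBREVIATIONS = {"St": "Street", "Ave": "Avenue", "Dr": "Drive",
                 "Blvd": "Boulevard", "Ln": "Lane"}


def expand_if_unambiguous(name: str) -> Optional[str]:
    """Expand only when exactly one word is a known abbreviation."""
    words = name.split()
    hits = [i for i, w in enumerate(words) if w in ABBREVIATIONS]
    if len(hits) != 1:
        return None  # zero or several candidates: punt and ignore the way
    i = hits[0]
    words[i] = ABBREVIATIONS[words[i]]
    return " ".join(words)
```

"Main St" becomes "Main Street" and "St Marys St" is punted, but a name already hand-edited to "St something Street" contains exactly one "St", so the rule turns it into "Street something Street". That is the false positive Dale describes.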
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/11/2012 1:36 PM, Alan Mintz wrote: Okay, so basically we're ignoring the on-the-ground rule in order to map for the renderer. Exactly :) Why that is ok, I don't know :( Mapping for the renderer has never been wrong or discouraged. Tagging incorrectly for the renderer is another story...
Re: [Talk-us] Fixing TIGER street name abbreviations
At 2012-05-11 14:11, Mike N wrote: On 5/11/2012 1:36 PM, Alan Mintz wrote: Okay, so basically we're ignoring the on-the-ground rule in order to map for the renderer. Exactly :) Why that is ok, I don't know :( Mapping for the renderer has never been wrong or discouraged. Tagging incorrectly for the renderer is another story... That is not my recollection. -- Alan Mintz alan_mintz+...@earthlink.net
Re: [Talk-us] Fixing TIGER street name abbreviations
On Fri, May 11, 2012 at 4:17 PM, Dale Puch dale.p...@gmail.com wrote: I understand the script checks for only one instance of the abbreviation. My point was: what if someone manually expanded ONE of the abbreviations, leaving St something Street? Is that checked for? I have a number of thoughts here:
1. Real world examples. Many of the examples I've seen are contrived. I'm glad we're testing, but testing needs to be based on actual data seen in the US dataset. That said:
2. There are a couple of ways to handle this:
* One way (the most conservative way) would be to test for untouched TIGER ways, that is, ways that are still at version 1. This would be a real problem, though, since there are lots of examples where someone may have fixed the geometry without touching the tags.
* The other way is a method I'm using in an experimental branch of the code on my machine, which is to try to be a bit more selective about the expansions of road types. If we assume that the road type always appears after the base name, we can handle examples like (real world example) St Marys St. The same would hold true for direction tags, so we'd be able to expand E E St confidently as well. But there's a catch. If someone had edited the name of the above street from the original St Marys St to St. Marys St, then that test would fail and the expansion would never occur, whereas in the current version it would. So:
3. Any method used is going to produce some number of either false positives or false negatives. I contend that the number of errors in either case will be so tiny that it will be lost in the noise, but there's no way to promise it will always be 0. The best we can do is toss out uncertain expansions and have them handled manually (which is something I'm working to make better in the next version of the code as well). But:
4. I don't want us to rely on cleverness.
I'd much rather rely on people testing the code with real world inputs and checking the outputs. I should have a new version of the code either tonight or tomorrow, with the new expansion rules. - Serge
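A rough sketch of the positional idea, under the assumptions that the road type is the last word (optionally followed by a directional suffix) and that a directional prefix, if any, is the first word. The tables are a hypothetical subset, and anything that does not fit the pattern is returned as None for manual handling. This is an illustration, not the code in the experimental branch.

```python
from typing import Optional

# Hypothetical subset of the lookup tables.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
TYPES = {"St": "Street", "Ave": "Avenue", "Dr": "Drive", "Rd": "Road"}


def expand_positional(name: str) -> Optional[str]:
    words = name.split()
    prefix = suffix = None
    # A trailing directional only counts if a type and a base remain.
    if len(words) >= 3 and words[-1] in DIRECTIONS:
        suffix = words.pop()
    # The road type must come right after the base name.
    if len(words) < 2 or words[-1] not in TYPES:
        return None
    street_type = words.pop()
    # A leading directional only counts if a base name remains after it.
    if len(words) >= 2 and words[0] in DIRECTIONS:
        prefix = words.pop(0)
    parts = ([DIRECTIONS[prefix]] if prefix else []) + words + [TYPES[street_type]]
    if suffix:
        parts.append(DIRECTIONS[suffix])
    return " ".join(parts)
```

Only the last "St" in "St Marys St" is treated as a type, giving "St Marys Street", and "E E St" becomes "East E Street" because the second E is left as the base name. "Calle Ave Maria" does not end in a type, so it is punted.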
Re: [Talk-us] Fixing TIGER street name abbreviations
Sorry, I should have been clearer: the results I posted were from my quick test. I just wanted to report the abbreviations I saw as possible additions to the list in Serge's script, and to give an idea of which showed up most, either for scripting or in case someone wanted to handle the less-used ones manually. On Thu, May 10, 2012 at 6:25 PM, Serge Wroclawski emac...@gmail.com wrote: On Thu, May 10, 2012 at 6:08 PM, Mike N nice...@att.net wrote: On 5/10/2012 4:08 PM, Serge Wroclawski wrote: I've been testing a script to do this. Here it is: Thanks for posting it. I don't see where it expands directionals; I don't see the same thing Dale saw: 5141 instances of E changed to East, for example. It doesn't expand directionals, and it only touches ways with tiger tags, so I suspect Dale is looking at the wrong code, or the wrong output, or something else. If it doesn't expand directionals, I believe that it should where the TIGER hint (tiger:name_direction_prefix) is available. Otherwise we'll still end up with endless nagging over "Warning - abbreviation in 'E Pond Scum Street'" when uploading. Okay, let's talk about this then. It was originally outside the scope of our discussion, but I'm happy to add it; it won't be more than another few lines of code and another lookup table. Please do not upload the output! 1. We should all agree on the correct output. 2. We should organize the right account to upload with. 3. We should add tags to the uploaded changesets. - Serge -- Dale Puch
Re: [Talk-us] Fixing TIGER street name abbreviations
On Thu, May 10, 2012 at 3:28 PM, Dale Puch dale.p...@gmail.com wrote: As a quick and dirty test I took Florida and Illinois road data from CloudMade. A simple replace of the top 7 or so suffixes at the end of the name, with a space in front of them, resulted in over 700,000 name changes for those 2 states alone, and that did not include all the names with cardinals (prefix and suffix) that need expanding. It was well over 80% of the names. Anyone arguing against scripting these changes should spend a day or two trying to do that by hand and get back to us on how they feel afterwards. You seem to be assuming all the changes are positive. What happened to the on the ground rule, anyway?
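Dale's quick test amounts to something like the following. The seven suffixes here are a guess at "the top 7 or so", since the actual list was not posted, and the counting function just tallies names that would change:

```python
import re

# Hypothetical choice of "top 7 or so" suffixes.
TOP_SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive",
                "Ln": "Lane", "Blvd": "Boulevard", "Ct": "Court"}

# One of the suffixes as the very last word, preceded by a space.
PATTERN = re.compile(r" (" + "|".join(map(re.escape, TOP_SUFFIXES)) + r")$")


def quick_replace(name: str) -> str:
    return PATTERN.sub(lambda m: " " + TOP_SUFFIXES[m.group(1)], name)


def count_changes(names) -> int:
    """Number of names the simple replace would modify."""
    return sum(1 for n in names if quick_replace(n) != n)
```

A name such as "E 10th St E" is untouched because the type is not the last word, which is why names with cardinal suffixes were not included in that count.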
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/10/2012 9:48 PM, Anthony wrote: You seem to be assuming all the changes are positive. I didn't take it that way - it was just a quick test for orders of magnitude. An actual script takes more review. What happened to the on the ground rule, anyway? That already doesn't directly apply because most street signs are abbreviated to start with. Local and regional knowledge will be helpful though.
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/10/2012 10:19 PM, Anthony wrote: What I'm questioning is why it doesn't apply. If the people call it Whatever Ave, shouldn't the data read Whatever Ave? Most of the US wouldn't call it 'Whatever Ave'; when spoken, it would be 'Avenue'. Having it expanded makes programs with spoken directions much more accurate. The only question is what to do about those cases where it's only referred to locally as 'Ave', and the postal service would refuse letters addressed to 'Avenue'.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Thu, May 10, 2012 at 10:25 PM, Mike N nice...@att.net wrote: On 5/10/2012 10:19 PM, Anthony wrote: What I'm questioning is why it doesn't apply. If the people call it Whatever Ave, shouldn't the data read Whatever Ave? Most of the US wouldn't call it 'Whatever Ave'; when spoken, it would be 'Avenue'. Having it expanded makes programs with spoken directions much more accurate. Depends on what street you're talking about. I've certainly lived in places where the vast majority of the locals called it Whatever Ave, and not Whatever Avenue. Most of the US...wouldn't talk about the street at all. The only question is what to do about those cases where it's only referred to locally as 'Ave', and the postal service would refuse letters addressed to 'Avenue'. The postal service would refuse letters addressed to Avenue in some instances?
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/10/2012 10:40 PM, Anthony wrote: Depends on what street you're talking about. I've certainly lived in places where the vast majority of the locals called it Whatever Ave, and not Whatever Avenue. Most of the US...wouldn't talk about the street at all. But you wouldn't be confused if a stranger came in asking how to get to Whatever Avenue? If not, then there's no problem with the expansion. Presumably, a US-centric renderer would abbreviate names for display, while spoken directions would be no more confusing than this stranger.
Re: [Talk-us] Fixing TIGER street name abbreviations
The issue with abbreviations is very muddy. BUT it has been said many times that we do not want abbreviations where possible. There are several reasons.
- Clarity! The abbreviations are just that: they mean the full word, and are spoken that way, but written and displayed as the abbreviation. I also disagree; I have never known anyone who said Whatever A V E. They do not spell it out, they say the word the abbreviation stands for. Same for St, Dr, etc.
- It is a LOT easier to abbreviate from the full word than to go the other way. Otherwise this scripted expansion would be easy and error-free.
- As mentioned, it makes use of the data easier, especially for searching and text-to-speech.
Yes, there can be errors going from abbreviations to the full words. One reason for doing this, as said, is to do it once with review instead of in every program that uses the data. But those errors are small in comparison to the number of abbreviated way names, and they can be corrected later as found, just like any other tagging error. Most of those errors are going to be on names that are unclear to begin with. People have gotten so gun-shy of any automation or imports that I feel they are actively blocking people trying to do the right thing and a good job. It is almost to the point where I wonder why someone would go through talking it over on the list and getting grief for it, when they could just quietly start doing it. Obviously not what we want. On Thu, May 10, 2012 at 10:56 PM, Anthony o...@inbox.org wrote: On Thu, May 10, 2012 at 10:45 PM, Mike N nice...@att.net wrote: But you wouldn't be confused if a stranger came in asking how to get to Whatever Avenue? If not, then there's no problem with the expansion. Okay, so basically we're ignoring the on-the-ground rule in order to map for the renderer.
-- Dale Puch
Re: [Talk-us] Fixing TIGER street name abbreviations
I think I came up with a rare possibility for error. What if the original St something St was manually expanded to St something Street? You're checking for a single St, and there would be one. Or am I missing another check? I can't think of any other situations like this besides Saint and Street. Possibly check that it is at the end of the name or only followed by N, S, E, etc. Or just save those for a second pass at expanding, either by another script or by hand. On Thu, May 10, 2012 at 4:08 PM, Serge Wroclawski emac...@gmail.com wrote: I've been testing a script to do this. Here it is: http://www.emacsen.net/tiger.py It needs to be fed a file. I've been using the state files from Geofabrik. The resulting files in expansions can then be fed to a script for upload. I welcome feedback on the script and the resulting output. - Serge -- Dale Puch
Re: [Talk-us] Fixing TIGER street name abbreviations
On Wed, May 2, 2012 at 7:28 AM, Mike N nice...@att.net wrote: On 5/1/2012 11:49 PM, Anthony wrote: That assumes that the TIGER tags will always be present to assist with proper automatic expansion. I'm not sure what you mean, because I am not making that assumption at all. You mentioned use of the history to access the TIGER tags. Yes, I said if a bot is smart enough to go through the history tags, then this *does* provide an advantage over data consumers doing it themselves.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Wed, May 2, 2012 at 12:01 PM, Serge Wroclawski emac...@gmail.com wrote: 2) My human error rate estimation of 1/1000 seems entirely reasonable. Think typos, or misreading. I'm sure we see error rates that high now in OSM and we find them acceptable. A computer that's acting conservatively will actually produce far lower error rates! But its error rates are potentially much more annoying. Doctor Martin Luther King Bolevard is one thing. Drive Martin Luther King Boulevard is another. The latter will be much more difficult to find at a later date and flag for review. Maybe you won't make that particular error, but you're going to have to be really careful to avoid making any errors like it. And relying on the TIGER tags may or may not help. I wouldn't be surprised if many of the TIGER tags themselves are screwed up, based on the kinds of mistakes I've seen in TIGER data. Anthony
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 8, 2012 at 11:31 PM, Anthony o...@inbox.org wrote: Doctor Martin Luther King Bolevard is one thing. Drive Martin Luther King Boulevard is another. And if we're going to make so many mistakes (1/1000 means thousands of mistakes), I'd rather it just be left as Dr Martin Luther King Blvd. Yes, we can't stop people from making mistakes. But we can refuse to allow thousands of mistakes to be added, for the sake of removing abbreviations which aren't hurting anyone in the first place.
Re: [Talk-us] Fixing TIGER street name abbreviations
ISTM this might be a good mechanical turk application if there is genuine concern that there will be a substantial error rate (my point-of-view as a social scientist is that a hypothesized 1/1000 error rate is pretty darn low, but I can appreciate that some might have more exacting standards), either implemented on the web or as a JOSM plugin. Anything beats tedious, manual ad-hoc editing whenever there's a slight geometry change with the accompanying JOSM nags. Chris
Re: [Talk-us] Fixing TIGER street name abbreviations
On Wed, May 2, 2012 at 11:08 AM, Chris Lawrence lordsu...@gmail.com wrote: ISTM this might be a good mechanical turk application if there is genuine concern that there will be a substantial error rate (my point-of-view as a social scientist is that a hypothesized 1/1000 error rate is pretty darn low, but I can appreciate that some might have more exacting standards), either implemented on the web or as a JOSM plugin. I'm already working on a revised script, but: 1) We're not talking about a small number of ways; we're talking about over a million ways. If we assume it takes 20 seconds per way to correct (which I think is actually low when you add in factors like upload times), then it will take over five and a half thousand man hours. This would be a very large undertaking. 2) My human error rate estimation of 1/1000 seems entirely reasonable. Think typos, or misreading. I'm sure we see error rates that high now in OSM and we find them acceptable. A computer that's acting conservatively will actually produce far lower error rates! 3) I'm seeing very little resistance to the idea of an expansion script on this list. There's pretty much universal support for expansions, especially since it's half done already. The concern seems to be about the script and error rates. We can (and should) test that; I suspect we'll find very low error rates, and we can correct the errors, either in the script or, if they're one-offs, in a post-script process. - Serge
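Serge's effort figure is easy to check. A minimal sketch using only the two numbers he gives (about 1,000,000 ways and 20 seconds per way); the function name is ours, not part of any existing tool:

```python
def manual_hours(ways: int, seconds_per_way: float) -> float:
    """Person-hours needed to correct `ways` ways by hand."""
    return ways * seconds_per_way / 3600  # 3600 seconds per hour


# Figures from the message: "over a million ways", 20 seconds per way.
hours = manual_hours(1_000_000, 20)
print(f"{hours:,.0f} person-hours")  # prints "5,556 person-hours"
```

That is a lower bound, since the message calls 20 seconds per way optimistic and the way count is "over" a million.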
Re: [Talk-us] Fixing TIGER street name abbreviations
On Wed, May 2, 2012 at 10:08 AM, Chris Lawrence lordsu...@gmail.com wrote: ISTM this might be a good mechanical turk application if there is genuine concern that there will be a substantial error rate (my point-of-view as a social scientist is that a hypothesized 1/1000 error rate is pretty darn low, but I can appreciate that some might have more exacting standards), either implemented on the web or as a JOSM plugin. Anything beats tedious, manual ad-hoc editing whenever there's a slight geometry change with the accompanying JOSM nags. Well, to put a scope on things, Ian and I were playing around with some queries on IRC yesterday. Here is the result. This is the breakdown of the values for the tiger:name_type tag for all version 1 ways. So most of them will have last been touched by the original TIGER upload, although way splitting means there will be a few in here from other users that may already be un-abbreviated. But I expect that to be a pretty small fraction. http://pastebin.com/LWyejSMr Obviously the values at the bottom are all silly, probably the result of merging ways. This isn't going to help anyone and definitely needs manual cleanup: Ln; Rd; Dr; Dr But the top few values are obviously what we would want to focus on with this bot. Toby
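Toby's remark about the silly values at the bottom of the list suggests a simple pre-filter: split tiger:name_type on semicolons and send any way with more than one value to manual cleanup. A sketch, assuming each way's tags are available as a plain Python dict; the function names and sample tags are invented for illustration and are not the pastebin data:

```python
from collections import Counter


def name_type_values(tags: dict) -> list:
    """Split tiger:name_type on ';', the separator left behind when ways are merged."""
    raw = tags.get("tiger:name_type", "")
    return [part.strip() for part in raw.split(";") if part.strip()]


def tally_name_types(ways: list) -> Counter:
    """Count single-valued name types; lump multi-valued ones together for manual cleanup."""
    counts = Counter()
    for tags in ways:
        values = name_type_values(tags)
        if len(values) == 1:
            counts[values[0]] += 1
        elif len(values) > 1:
            counts["<manual cleanup>"] += 1
        # Ways without tiger:name_type are not counted at all.
    return counts


sample = [
    {"name": "Oak St", "tiger:name_type": "St"},
    {"name": "Elm St", "tiger:name_type": "St"},
    {"name": "Pine Ln", "tiger:name_type": "Ln"},
    {"name": "Cedar Dr", "tiger:name_type": "Ln; Rd; Dr; Dr"},
    {"name": "Main Street"},
]
print(tally_name_types(sample).most_common())
# [('St', 2), ('Ln', 1), ('<manual cleanup>', 1)]
```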
Re: [Talk-us] Fixing TIGER street name abbreviations
On Mon, Apr 30, 2012 at 11:28 PM, Serge Wroclawski emac...@gmail.com wrote: On Mon, Apr 30, 2012 at 8:14 PM, Paul Johnson ba...@ursamundi.org wrote: There have been some limited automated expansions, though they can be problematic, because abbreviations can mean many possible things. Expanding abbreviations requires a bit of a human touch. Creating abbreviations in the renderer when so desired, not so much. This is true, but if one is talking about the TIGER data, there are a number of hints that can make this problem virtually nil. There's a tiger:name_type key that contains the value of the expandable name section, e.g. St or Ln or Pky. AFAIK these are always expandable to Street, Lane and Parkway. And of course one must only expand the name_tag value if it's the last component of the name string, e.g. Ln Ln should be Ln Lane. This should be fairly easy to construct in a regex, but one should be careful of it. Those two rules should eliminate a vast majority of expansion issues. If we only expand TIGER data, then it should be a fairly straightforward process. Of course such a script should be peer reviewed and tested, but I'm confident that the error rate will be very low. I guess this would be okay, so long as it gets peer reviewed and tested by a group including you. And for those few exceptions where the expansion is wrong, a human review process will turn this up and make it fairly correctable. In fact, I'd argue that the problems won't be subtle, making them easy to spot and fix. How would the human review process work? Isn't it better to do the review *before* editing the database? In return, we'll save hundreds, maybe thousands of man hours doing expansions. Useless expansions, though.
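The two rules Serge describes (take the abbreviation from tiger:name_type, and expand it only when it is the last word of the name) fit in a few lines of regex. A minimal sketch, not his script: the function name and the three-entry table are assumptions, and anything the rules cannot handle is returned as None for review rather than guessed:

```python
import re
from typing import Optional

# Only the three expansions named in the message; a real bot would need a
# longer, reviewed table.
EXPANSIONS = {"St": "Street", "Ln": "Lane", "Pky": "Parkway"}


def expand_name_type(name: str, name_type: str) -> Optional[str]:
    """Expand the tiger:name_type abbreviation only where it ends the name.

    Returns None when the way should be left for human review instead.
    """
    full = EXPANSIONS.get(name_type)
    if full is None:
        return None  # unknown or multi-valued name_type, e.g. "Ln; Rd"
    # \b stops "St" from matching inside a longer word; $ anchors it to the end.
    pattern = re.compile(r"\b" + re.escape(name_type) + r"$")
    if not pattern.search(name):
        return None  # abbreviation is not the last component, e.g. "E 10th St E"
    return pattern.sub(full, name, count=1)


print(expand_name_type("Ln Ln", "Ln"))          # Ln Lane
print(expand_name_type("N St Clair St", "St"))  # N St Clair Street
print(expand_name_type("E 10th St E", "St"))    # None
```

The second example shows why anchoring to the end matters: the leading "St" (Saint) is left alone. Returning None instead of the unchanged name means the ways the rules cannot handle can be listed and reviewed before anything is uploaded.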
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 1:06 PM, Serge Wroclawski emac...@gmail.com wrote: The other point that's being missed is that we as a community already accept an error rate in our data that's far larger than any potential mistake rate on a well written script. If the script makes one error in 1000 streets, it will be doing a better job than a vast majority of manual mappers, and like manual mappers, they can be corrected. If someone manually expanded 1,000,000 street name abbreviations, and made 1,000 mistakes, it would not be acceptable. If they were doing something more useful than expanding street name abbreviations, fine. But expanding street name abbreviations, according to a very simple heuristic which can easily be done at the preprocessing stage, is not very useful. If this is going to be done, I hope the error rate is much smaller than 1 in 1,000.
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/1/2012 12:59 PM, Anthony wrote: I'm not sure what you're saying. Automatically expanding abbreviations is a terrible idea. If an abbreviation is unambiguous, then it can be expanded during the preprocessing step. If, on the other hand, it is ambiguous, then you are turning ambiguous data into incorrect data, which certainly diminishes the data. What preprocessing step? TIGER data has already been imported. The types of errors I'm referring to are where you go to upload from JOSM, then decide to slavishly submit to the validator's warnings about abbreviated street names. What person manually types 2-3 dozen versions of Street, Avenue, Boulevard, Point, Circle without any typos?
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 1:18 PM, Nathan Edgars II nerou...@gmail.com wrote: On 5/1/2012 12:59 PM, Anthony wrote: Automatically expanding abbreviations is a terrible idea. If an abbreviation is unambiguous, then it can be expanded during the preprocessing step. If, on the other hand, it is ambiguous, then you are turning ambiguous data into incorrect data, which certainly diminishes the data. Not quite. We have various TIGER tags that break the name into pieces, and allow automated expansion where the name field may be ambiguous. (Though occasionally these tags are wrong.) I'm not sure what you're disagreeing with. Either it is unambiguous (due to TIGER tags or whatever), and therefore can be done during the preprocessing step. Or it is ambiguous, and needs human intervention/review.
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/1/2012 1:23 PM, Anthony wrote: On Tue, May 1, 2012 at 1:18 PM, Nathan Edgars II nerou...@gmail.com wrote: On 5/1/2012 12:59 PM, Anthony wrote: Automatically expanding abbreviations is a terrible idea. If an abbreviation is unambiguous, then it can be expanded during the preprocessing step. If, on the other hand, it is ambiguous, then you are turning ambiguous data into incorrect data, which certainly diminishes the data. Not quite. We have various TIGER tags that break the name into pieces, and allow automated expansion where the name field may be ambiguous. (Though occasionally these tags are wrong.) I'm not sure what you're disagreeing with. Either it is unambiguous (due to TIGER tags or whatever), and therefore can be done during the preprocessing step. Or it is ambiguous, and needs human intervention/review. The TIGER tags are not exactly standard OSM tags that belong in the database. Better that we get rid of them at the same time as we expand abbreviations.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 1:26 PM, Nathan Edgars II nerou...@gmail.com wrote: On 5/1/2012 1:23 PM, Anthony wrote: On Tue, May 1, 2012 at 1:18 PM, Nathan Edgars II nerou...@gmail.com wrote: On 5/1/2012 12:59 PM, Anthony wrote: Automatically expanding abbreviations is a terrible idea. If an abbreviation is unambiguous, then it can be expanded during the preprocessing step. If, on the other hand, it is ambiguous, then you are turning ambiguous data into incorrect data, which certainly diminishes the data. Not quite. We have various TIGER tags that break the name into pieces, and allow automated expansion where the name field may be ambiguous. (Though occasionally these tags are wrong.) I'm not sure what you're disagreeing with. Either it is unambiguous (due to TIGER tags or whatever), and therefore can be done during the preprocessing step. Or it is ambiguous, and needs human intervention/review. The TIGER tags are not exactly standard OSM tags that belong in the database. Better that we get rid of them at the same time as we expand abbreviations. On that point, I strongly agree. And actually, if the bot is going to be smart enough to look at the history, to find deleted TIGER tags, then maybe there is some advantage to doing this during the preprocessing step (which would often not have access to history data).
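Anthony's idea of having the bot check a way's history for deleted TIGER tags could look roughly like this. A sketch assuming the history has already been fetched (for instance from the OSM API 0.6 way history call) and reduced to a list of tag dicts, oldest first; the function name and sample data are hypothetical:

```python
from typing import Optional


def recover_tiger_tag(versions: list, key: str = "tiger:name_type") -> Optional[str]:
    """Return `key` from the newest version of the way that still carries it.

    `versions` is a list of tag dicts ordered oldest to newest, so the current
    value wins if present; otherwise an older value deleted by a later edit is used.
    """
    for tags in reversed(versions):
        value = tags.get(key)
        if value:
            return value
    return None


history = [
    {"name": "Oak Ln", "tiger:name_type": "Ln"},  # version 1, TIGER import
    {"name": "Oak Ln"},                            # later edit dropped the tiger:* tags
]
print(recover_tiger_tag(history))  # Ln
```

A recovered value is only a hint: if a later edit also changed the name, the old name_type may no longer apply, so the last-component rule discussed earlier in the thread still has to pass before anything is expanded.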
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 1:31 PM, Anthony o...@inbox.org wrote: And actually, if the bot is going to be smart enough to look at the history, to find deleted TIGER tags, then maybe there is some advantage to doing this during the preprocessing step (which would often not have access to history data). What I mean is that, if the bot is going to look at the history, then there would be an advantage to letting the bot run. But I am assuming this could be done with much less than a 1/1000 error rate. 1/10,000 would maybe be acceptable. 1/100,000 would be okay.
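To put the error rates in this exchange side by side, here is the arithmetic for the 1,000,000-way figure used earlier in the thread. A trivial sketch; the rate is written as "one error in N ways" and the function name is ours:

```python
def expected_errors(ways: int, one_in: int) -> int:
    """Bad names expected at an error rate of one in `one_in` (rounded down)."""
    return ways // one_in


WAYS = 1_000_000  # figure used in the thread
for one_in in (1_000, 10_000, 100_000):
    print(f"1/{one_in:,}: {expected_errors(WAYS, one_in):,} bad names")
# 1/1,000: 1,000 bad names
# 1/10,000: 100 bad names
# 1/100,000: 10 bad names
```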
Re: [Talk-us] Fixing TIGER street name abbreviations
On 5/1/2012 1:21 PM, Anthony wrote: The preprocessing step between downloading the data from OSM and doing something with it. That assumes that the TIGER tags will always be present to assist with proper automatic expansion. And I'd rather have the US data in line with the world-wide OSM data where it makes sense. That way the US can consume OSM US data with tools developed worldwide, without the tool writers needing to implement US-specific rules. After analysis, most of the US opinions fall on the side of no abbreviations.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 12:26 PM, Nathan Edgars II nerou...@gmail.com wrote: The TIGER tags are not exactly standard OSM tags that belong in the database. Better that we get rid of them at the same time as we expand abbreviations. Although the tiger:* keys aren't standard, the information they store is very useful. There are plenty of people that might want to know the different parts of a road name, so we should simply rename these tags instead of completely blowing the data away.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 1:36 PM, Mike N nice...@att.net wrote: On 5/1/2012 1:21 PM, Anthony wrote: The preprocessing step between downloading the data from OSM and doing something with it. That assumes that the TIGER tags will always be present to assist with proper automatic expansion. I'm not sure what you mean, because I am not making that assumption at all. And I'd rather have the US data in line with the world-wide OSM data where it makes sense. That way the US can consume OSM US data with tools developed worldwide, without the tool writers needing to implement US-specific rules. After analysis, most of the US opinions fall on the side of no abbreviations. I don't think anyone in this thread is arguing against expanding abbreviations. The question is whether or not it's okay for a bot to expand abbreviations. And to a large extent that depends on how accurate the bot will be. If the bot is sure to be 100% accurate, then hey, no problem. But I don't believe that is the case.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Tue, May 1, 2012 at 1:41 PM, Ian Dees ian.d...@gmail.com wrote: On Tue, May 1, 2012 at 12:26 PM, Nathan Edgars II nerou...@gmail.com wrote: The TIGER tags are not exactly standard OSM tags that belong in the database. Better that we get rid of them at the same time as we expand abbreviations. Although the tiger:* keys aren't standard, the information they store is very useful. There are plenty of people that might want to know the different parts of a road name, so we should simply rename these tags instead of completely blowing the data away. I guess that's okay too, though personally I get so annoyed by the redundant data (*) that I couldn't be bothered. Why street relations never caught on is beyond me. (*) I.e. adding base_name=Main to the 100 different ways that Main Street is split up into.
Re: [Talk-us] Fixing TIGER street name abbreviations
On Mon, Apr 30, 2012 at 7:14 PM, Paul Johnson ba...@ursamundi.org wrote: On Apr 30, 2012 5:00 PM, David Litke dwli...@comcast.net wrote: I just did a few manual TIGER reviews in JOSM and got a validation warning that words like Street and Avenue were abbreviated as St and Ave. So I wonder if this is considered something that needs to be fixed? Yes. Rule one of abbreviations: Don't do it! If so, shouldn't it be easy to somehow do a batch global update? There have been some limited automated expansions, though they can be problematic, because abbreviations can mean many possible things. Expanding abbreviations requires a bit of a human touch. Creating abbreviations in the renderer when so desired, not so much. If by limited you mean half the country :) Yes, there was a TIGER name expansion bot that ran from the west coast to about the Mississippi. I believe it was stopped after some complaints about it not handling some situations correctly. But I would probably be in favor of trying to complete it. Related: http://ksmapper.blogspot.com/2011/05/main-attraction.html Toby
Re: [Talk-us] Fixing TIGER street name abbreviations
On 4/30/2012 10:24 PM, Toby Murray wrote: I believe it was stopped after some complaints about it not handling some situations correctly. But I would probably be in favor of trying to complete it. I would agree; there's no point in asserting that we have to spend time manually expanding everything, since it's not adding value to the map data. And the bot is probably more accurate than a human, limited only by the accuracy of the base TIGER data. Think of all the possible typos on streeet, avenve, and boulavard.
Re: [Talk-us] Fixing TIGER street name abbreviations
David Litke dwli...@comcast.net wrote: I just did a few manual TIGER reviews in JOSM and got a validation warning that words like Street and Avenue were abbreviated as St and Ave. So I wonder if this is considered something that needs to be fixed? If so, shouldn't it be easy to somehow do a batch global update? This has been discussed, and tried, before. Unfortunately, some abbreviations can stand for more than one thing, and it takes local knowledge to be sure what is the right choice. -- John F. Eldredge -- j...@jfeldredge.com Reserve your right to think, for even to think wrongly is better than not to think at all. -- Hypatia of Alexandria
Re: [Talk-us] Fixing TIGER street name abbreviations
On 4/30/12 10:35 PM, Mike N wrote: On 4/30/2012 10:24 PM, Toby Murray wrote: I believe it was stopped after some complaints about it not handling some situations correctly. But I would probably be in favor of trying to complete it. I would agree; there's no point in asserting that we have to spend time manually expanding everything, since it's not adding value to the map data. And the bot is probably more accurate than a human, limited only by the accuracy of the base TIGER data. Think of all the possible typos on streeet, avenve, and boulavard. from what i gather, there is more than one expansion bot, and they're not all the same. at least, i saw some incorrectly expanded names in North-Central Iowa last year, and everyone involved in the bots that spoke up disclaimed knowledge of that particular example of bad expansions. so there are bots and there are bots, and i'd feel happier about them if i sensed more devotion to quality assurance. after all, if you don't expand automagically, you have unwanted abbreviations that may not get expanded for years; if you do expand, you may introduce errors in place of abbreviations that may go undetected for years. richard