Re: [Talk-us] Tidying up TIGER data
On Thu, Jun 4, 2009 at 1:36 AM, Ted Percival t...@midg3t.net wrote:
> Paul Johnson wrote:
> > 4) Remove abbreviations TIGER imported. Sometimes, I really wonder if TIGER was such a hot dataset to import...
> > http://wiki.openstreetmap.org/wiki/Key:name#Abbreviation_.28don.27t_do_it.29
>
> I wrote a script to expand TIGER abbreviations into full words. It's an addon for the change_tags.py script ( http://svn.openstreetmap.org/applications/utils/change_tags/change_tags.py ) which unfortunately doesn't work with API 0.6 yet - at least last time I tried.
>
> It does a few other things too, particularly for areas that use the grid system.

While most of the time this would work well, I can think of some cases where making these assumptions with TIGER data is a bad idea.

> Its functions are:
> - Strip St suffix from grid-named streets (eg. South 500 West)
> - Collapse multiple spaces into a single space (lots of TIGER)
> - Expand abbreviated directions (eg. S 500 E to South 500 East)
> - Expand abbreviated suffixes (Rd - Road, St - Street, etc)

- Strip St.: is that recommended somewhere? It seems silly to remove data like that...
- Collapse spaces: Ok, that makes sense.
- Expand abbreviated dirs: This is the one that I have the most problems with. In my neighborhood in Minneapolis, the official names for roads actually end in SE. For example, I live on 6th Avenue SE. I've seen several different representations of this, but when I asked several different mail carriers and some GIS folks at the University there, they all said that SE is the official name, not southeast.

___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us
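For readers following along, the four functions listed above could look roughly like the sketch below. This is not the code from the change_tags.py addon, which is not shown in the thread; the lookup tables, the regular expression, and every function name here are hypothetical and only illustrate the described behaviour.

```python
import re

# Hypothetical lookup tables; the real addon's tables are not shown in the thread.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
SUFFIXES = {"Rd": "Road", "St": "Street", "Ave": "Avenue", "Dr": "Drive"}

# Assumed shape of a grid-style name: direction, number, direction, then "St".
_DIR = r"(?:[NSEW]|North|South|East|West)"
GRID_ST = re.compile(r"^(" + _DIR + r" \d+ " + _DIR + r") St\.?$")


def collapse_spaces(name):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(name.split())


def strip_grid_suffix(name):
    """Drop a trailing 'St' from grid-named streets such as 'E 2100 S St'."""
    return GRID_ST.sub(r"\1", name)


def expand_directions(name):
    """Expand single-letter directions that stand alone as a word."""
    return " ".join(DIRECTIONS.get(word, word) for word in name.split(" "))


def expand_suffix(name):
    """Expand an abbreviated suffix, but only when it is the last word."""
    words = name.split(" ")
    last = words[-1].rstrip(".")
    if len(words) > 1 and last in SUFFIXES:
        words[-1] = SUFFIXES[last]
    return " ".join(words)


def tidy(name):
    """Apply the four steps in the order they are listed in the message."""
    name = collapse_spaces(name)
    name = strip_grid_suffix(name)
    name = expand_directions(name)
    return expand_suffix(name)


print(tidy("E 2100 S St"))    # East 2100 South
print(tidy("6th Avenue SE"))  # 6th Avenue SE (two-letter directions are left alone)
```

Note that a sketch this naive turns "N St" into "North Street", which is the kind of wrong guess on lettered streets that comes up in this thread.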
Re: [Talk-us] Tidying up TIGER data
Ian Dees wrote:
> > Its functions are:
> > - Strip St suffix from grid-named streets (eg. South 500 West)
> > - Collapse multiple spaces into a single space (lots of TIGER)
> > - Expand abbreviated directions (eg. S 500 E to South 500 East)
> > - Expand abbreviated suffixes (Rd - Road, St - Street, etc)
>
> - Strip St.: is that recommended somewhere? It seems silly to remove data like that...

Until you go out to pretty much any city out in the desert or originally built by Mormons. In such cities, 90%+ of the streets are not named to begin with; locations are purely Cartesian. The only two streets I know of that have a name in Salt Lake City are State Street and Temple Square, and I'm not sure Temple Square counts (I'd rather not get too close, to be honest). All the other ways are referred to by address: 450 S 700 E, for example, would mean that the address is located four and a half blocks south of the Mormon temple, on the even side of the street, 7 blocks east of the temple. Interestingly enough, if you navigate to cities that lack street names, you'll see stuff like E 2100 S St in TIGER, even though this is wrong!

> - Collapse spaces: Ok, that makes sense.
> - Expand abbreviated dirs: This is the one that I have the most problems with. In my neighborhood in Minneapolis, the official names for roads actually end in SE. For example, I live on 6th Avenue SE. I've seen several different representations of this, but when I asked several different mail carriers and some GIS folks at the University there, they all said that SE is the official name, not southeast.

I could be wrong on this, but I've been making an exception for cardinals myself, using the same logic behind NOT using abbreviations for everything else.

I honestly can't think of any other common abbreviations that would prevent a
Re: [Talk-us] Tidying up TIGER data
Also in Atlanta, there's N St. I got directions from Google and thought I was looking for North St. Man, was that a big mistake.

Cheers,
Adam

On 6/4/09, Paul Johnson ba...@ursamundi.org wrote:
> Ian Dees wrote:
> > > Its functions are:
> > > - Strip St suffix from grid-named streets (eg. South 500 West)
> > > - Collapse multiple spaces into a single space (lots of TIGER)
> > > - Expand abbreviated directions (eg. S 500 E to South 500 East)
> > > - Expand abbreviated suffixes (Rd - Road, St - Street, etc)
> >
> > - Strip St.: is that recommended somewhere? It seems silly to remove data like that...
>
> Until you go out to pretty much any city out in the desert or originally built by Mormons. In such cities, 90%+ of the streets are not named to begin with; locations are purely Cartesian. The only two streets I know of that have a name in Salt Lake City are State Street and Temple Square, and I'm not sure Temple Square counts (I'd rather not get too close, to be honest). All the other ways are referred to by address: 450 S 700 E, for example, would mean that the address is located four and a half blocks south of the Mormon temple, on the even side of the street, 7 blocks east of the temple. Interestingly enough, if you navigate to cities that lack street names, you'll see stuff like E 2100 S St in TIGER, even though this is wrong!
>
> > - Collapse spaces: Ok, that makes sense.
> > - Expand abbreviated dirs: This is the one that I have the most problems with. In my neighborhood in Minneapolis, the official names for roads actually end in SE. For example, I live on 6th Avenue SE. I've seen several different representations of this, but when I asked several different mail carriers and some GIS folks at the University there, they all said that SE is the official name, not southeast.
>
> I could be wrong on this, but I've been making an exception for cardinals myself, using the same logic behind NOT using abbreviations for everything else.
>
> I honestly can't think of any other common abbreviations that would prevent a
Re: [Talk-us] Tidying up TIGER data
On Thu, 2009-06-04 at 00:36 -0600, Ted Percival wrote:
> Its functions are:
> - Strip St suffix from grid-named streets (eg. South 500 West)
> - Collapse multiple spaces into a single space (lots of TIGER)
> - Expand abbreviated directions (eg. S 500 E to South 500 East)
> - Expand abbreviated suffixes (Rd - Road, St - Street, etc)

So, I looked at doing this when I originally converted the TIGER data. The issue is that I'm too dumb to come up with anything that worked universally across the entire country.

This kind of script is useful for small areas that you've looked at manually, but please don't apply it too widely. It does the right thing for sanely-named things, but TIGER is full of goofy stuff. Consider "St. Helens St.". There are also plenty of semi-mistakes and weird abbreviations in TIGER that appear to be mistakes. I wouldn't be surprised to see Saint Street entered somewhere as name: St., type: St. We don't want to make that "Street Street". That makes it even worse. :)

Again, these can work in limited areas where the naming is nice and consistent, but it's really really hard to make it work on a large scale where things are *NOT* consistent.

-- Dave
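The "St. Helens St." problem is easy to reproduce. The sketch below is only an illustration, not anything from the TIGER import or the change_tags.py addon: `naive_expand` is a hypothetical blanket replacement, and `needs_review` is an assumed heuristic for setting aside names that a human should look at instead of letting a script guess.

```python
import re


def naive_expand(name):
    """Replace every standalone 'St' or 'St.' with 'Street', wherever it occurs."""
    return re.sub(r"\bSt\b\.?", "Street", name)


def needs_review(name):
    """Flag names where 'St' is ambiguous (assumed heuristic, not from the thread).

    A name is flagged when 'St' appears more than once, or when it is the
    first word, where it is more likely to mean 'Saint' than 'Street'.
    """
    tokens = [token.rstrip(".") for token in name.split()]
    return tokens.count("St") > 1 or (bool(tokens) and tokens[0] == "St")


print(naive_expand("St. Helens St."))  # Street Helens Street
print(naive_expand("St. St."))         # Street Street
print(needs_review("St. Helens St."))  # True
print(needs_review("Main St"))         # False
```

Flagging does not fix anything by itself; it only keeps the script from making the name worse, which fits the advice to limit automated edits to areas that have been checked manually.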
Re: [Talk-us] Tidying up TIGER data
Paul Johnson wrote:
> Ian Dees wrote:
> > - Collapse spaces: Ok, that makes sense.
> > - Expand abbreviated dirs: This is the one that I have the most problems with. In my neighborhood in Minneapolis, the official names for roads actually end in SE. For example, I live on 6th Avenue SE. I've seen several different representations of this, but when I asked several different mail carriers and some GIS folks at the University there, they all said that SE is the official name, not southeast.

The script only does word-bounded cardinal directions, so SE remains SE. That said, it *does* currently bust a few lettered streets in Salt Lake City (E Street, N Street, etc.). I'll fix that up in the next version by requiring at least three words in the name.

> I could be wrong on this, but I've been making an exception for cardinals myself, using the same logic behind NOT using abbreviations for everything else.

I'm not sure why the logic is inverted. While it is common notation to abbreviate the cardinal directions, the street signs actually say 300 West, and I would prefer voice navigation software to say "Turn on Three Hundred West" rather than "Turn on Three Hundred Double-U", for instance.

I think the usual principles apply: it's easy enough for renderers to abbreviate full words when it's appropriate, and for routing software to understand how users might abbreviate their input. Maintaining unnecessary ambiguity in the database should be avoided.
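The proposed fix of requiring at least three words before expanding directions can be sketched as follows. This is an assumption about how such a guard might look, not the script's actual code; `DIRECTIONS`, `expand_directions`, and the `min_words` parameter are hypothetical names.

```python
# Hypothetical table of the single-letter cardinal directions.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_directions(name, min_words=3):
    """Expand standalone N/S/E/W, but only in names with enough words.

    Short names such as 'E Street' are returned unchanged, so lettered
    streets are not mistaken for directional prefixes.
    """
    words = name.split()
    if len(words) < min_words:
        return name
    return " ".join(DIRECTIONS.get(word, word) for word in words)


print(expand_directions("E Street"))       # E Street
print(expand_directions("S 500 E"))        # South 500 East
print(expand_directions("6th Avenue SE"))  # 6th Avenue SE
```

One consequence of a rule like this is that a two-word grid name such as "300 W" is also left alone, so in this sketch the guard trades some missed expansions for fewer wrong ones.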
Re: [Talk-us] Tidying up TIGER data
Adam Schreiber wrote:
> Also in Atlanta, there's N St. I got directions from Google and thought I was looking for North St. Man, was that a big mistake.

OK, so expand cardinals or not?
Re: [Talk-us] Tidying up TIGER data
Ted Percival wrote:
> I'm not sure why the logic is inverted. While it is common notation to abbreviate the cardinal directions, the street signs actually say 300 West, and I would prefer voice navigation software to say "Turn on Three Hundred West" rather than "Turn on Three Hundred Double-U", for instance.
>
> I think the usual principles apply: it's easy enough for renderers to abbreviate full words when it's appropriate, and for routing software to understand how users might abbreviate their input. Maintaining unnecessary ambiguity in the database should be avoided.

Well, you hit the nail on the head. I figured expanding cardinals is about as trivial as automatically creating abbreviations for particular words as needed. If the general consensus is that we should expand those, I will.
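The point that abbreviations can be generated when needed, while the database keeps full words, can be illustrated with a small display-side helper. This is a hypothetical sketch, not code from any OSM renderer; the `ABBREVIATIONS` table and the function name are assumptions.

```python
# Hypothetical display-side table mapping full words to short forms.
ABBREVIATIONS = {
    "North": "N", "South": "S", "East": "E", "West": "W",
    "Street": "St", "Road": "Rd", "Avenue": "Ave",
}


def abbreviate_for_label(name):
    """Shorten a fully spelled-out name for a map label.

    The stored name is not modified; only the returned label is abbreviated.
    """
    return " ".join(ABBREVIATIONS.get(word, word) for word in name.split())


print(abbreviate_for_label("South 500 East"))  # S 500 E
print(abbreviate_for_label("Main Street"))     # Main St
```

Going from full words to abbreviations is a plain lookup, whereas the reverse direction has to guess what "N" or "St" stands for, which is where the ambiguity discussed in this thread comes from.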