Can anyone comment on the accuracy of the Tiger geocoder vs MapMarker?


> I have worked with the TIGER data for about 10 years now. The recent
> improvements in TIGER are really great to see, but they are not without
> their own set of issues. TIGER has a lot of known limitations based on the
> rules, regulations and requirements of the US Census. The recent work has
> georectified the street data and added lots of new streets based on
> digitizing high-res satellite imagery, but imagery does not let you read
> the street names, so they are added after the fact. There are a lot of
> street segments that do not have names. We can only hope that these will be
> added over time. Because of non-disclosure rules, address ranges can be
> weird also. Many small streets have address ranges 1-100 encoded on them,
> in spite of the fact that the real address ranges only run from 1-20. This
> has the effect of skewing all the locations to the front end of the street.
> Because language is ambiguous and input contains typos and sounds-like
> errors, fuzzy searching is employed. Most geocoders do some form of fuzzy
> searching, so you often run into the Main St vs Main Ln issue, or you find
> W Main St when you are searching for E Main St.
> When a geocoder says "Found it!", you need to be prepared to ask "Found
> what?" or be tolerant of mis-geocodes. I like geocoders that score the
> results and return them in ranked order.
> In general a geocoder can never be better than its data and can in fact be
> much worse than its data. Fuzzy searching lets you find possible candidates
> in the data when either the input address or the data address was not
> encoded correctly, but it leaves you uncertain whether a candidate is the
> location actually wanted.
> You might also want to look at PAGC Geocoder. It is written in C and uses
> some statistical matching techniques which are very good. There are some
> changes in one of the branches that let you load all the TIGER data for the
> US.
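To make the address-range skew described above concrete, here is a minimal sketch of linear interpolation along a street segment. The function name and the 0.0 to 1.0 convention are my own assumptions for illustration and are not taken from any particular geocoder; real geocoders also deal with odd/even sides, non-numeric house numbers and so on.

```python
def interpolate(house_number, from_addr, to_addr):
    """Return the fraction (0.0 = start, 1.0 = end) along a segment where a
    house number falls, using plain linear interpolation over the encoded
    address range. Hypothetical helper, for illustration only."""
    if to_addr == from_addr:
        return 0.5  # single-address range: place it mid-segment
    frac = (house_number - from_addr) / (to_addr - from_addr)
    return min(1.0, max(0.0, frac))  # clamp to the segment


# A street whose real addresses run 1-20:
true_position = interpolate(20, 1, 20)      # last house sits at the far end
# The same house when the segment is encoded with a 1-100 range:
skewed_position = interpolate(20, 1, 100)   # 19/99, roughly 0.19

print(true_position, round(skewed_position, 2))
```

With the padded 1-100 range, the last real house lands less than a fifth of the way down the street, which is the "front end" skew mentioned in the quoted message.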
Kevin Galligan wrote:
>> I actually bought an early access copy of the book.  I work in Linux and
>> have been playing around with different geocoders and the TIGER files.  Most
>> recently with a Ruby geocoder, for no other reason than I'm trying to find
>> one that is fairly complete and functional.
>> Any idea how "production quality" this particular one is?  If it's fairly
>> high, I'll probably put some time in to get it working on Linux.  I have the
>> full 2009 TIGER dataset on an EC2 block drive, waiting to import into a
>> different database.
>> Right now I'm using zip+4 data to get a rough geocode, which is good
>> enough for what we're doing, but it only gets 92% of our non-PO Box data.
>> From my experience with the TIGER data, it only adds a couple percent at
>> most above that, but the geocoders I've used have been pretty hacky, so it's
>> possible that was the issue.  Also, some of them seem to not be concerned
>> with stuff like matching "Main St" when you're looking for "Main Ln", which
>> is pretty terrible.
>> On the plus side, if there is major work going on with this geocoder (or
>> any tiger geocoder), I have a huge national data volume that will help
>> stress test the system.
>> Recently I've been toying with USC's free geocoder project.  In some areas
>> it actually gets about half of the data I previously could not, which is
>> impressive.
>> The really frustrating thing is, in general, the first 90% is cheap/free.
>>  The next 3-4% is marginally expensive.  The rest is really pricey.
>> Is there any idea how complete the tiger data is, and why there is this
>> apparent lack of data in there?  I find it strange.  Some streets are just
>> missing.  Stuff like that.
>> Rambling.  Anyway, will take a look later.  Thoughts on the quality of the
>> geocoder appreciated.
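On the "Main St" vs "Main Ln" problem raised in this thread, a rough sketch of scored, ranked matching might look like the following. Everything here is an assumption for illustration: the toy parser, the small direction and street-type lists, and the 0.2 penalties are invented, and this is not how the TIGER geocoder or PAGC actually score candidates. It uses `difflib.SequenceMatcher` from the Python standard library for the fuzzy name comparison.

```python
from difflib import SequenceMatcher

# Tiny illustrative vocabularies; a real geocoder would use a full lookup table.
DIRECTIONS = {"N", "S", "E", "W"}
TYPES = {"ST", "LN", "AVE", "RD", "DR", "BLVD"}


def parse(street):
    """Split e.g. 'W Main St' into (direction, name, type). Toy parser."""
    tokens = street.upper().split()
    direction = tokens.pop(0) if len(tokens) > 1 and tokens[0] in DIRECTIONS else ""
    street_type = tokens.pop() if len(tokens) > 1 and tokens[-1] in TYPES else ""
    return direction, " ".join(tokens), street_type


def score(query, candidate):
    """Fuzzy similarity of the street names, minus a penalty (assumed 0.2)
    for each of: mismatched street type, mismatched direction."""
    q_dir, q_name, q_type = parse(query)
    c_dir, c_name, c_type = parse(candidate)
    s = SequenceMatcher(None, q_name, c_name).ratio()
    if q_type != c_type:
        s -= 0.2
    if q_dir != c_dir:
        s -= 0.2
    return max(s, 0.0)


def rank(query, candidates):
    """Return (score, candidate) pairs, best match first."""
    return sorted(((score(query, c), c) for c in candidates), reverse=True)


ranked = rank("E Main St", ["W Main St", "E Main Ln", "E Main St", "E Mian St"])
for s, name in ranked:
    print(f"{s:.2f}  {name}")
```

The point of returning scores rather than a single hit is that the caller can set its own threshold and decide whether "E Main Ln" is an acceptable answer for "E Main St" or a mis-geocode.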
>>    David,
>>    As a matter of fact we've been working on that for chapter 10 of our
>>    upcoming book and think we have it all working.  As a part of the example
>>    generation process for our chapter 10, we had to come up with a way to
>>    load the tables that works on both Windows and Linux.  Unfortunately we
>>    haven't had a chance to test the Linux loading approach, but it is pretty
>>    much a parallel of the Windows approach.
>>    To do so we started out with Steve's code, added some additional skeleton
>>    tables and a database function that generates a command line script for
>>    the respective OS.  Hopefully it all makes sense from the readme file we
>>    have packaged.
>>    We also changed one of the functions because there was an error in it and
>>    revised it slightly to work with TIGER 2009 data.  You can download our
>>    slightly hacked version of Steve's code from our chapter 10 page.
>>    Steve -- if you are listening we are hoping to remerge your version with
>>    our loader part and bring it back into the PostGIS distribution as part of
>>    the PostGIS 1.5.1 or 2.0 release.
>>    I'm trying to set up the TIGER geocoder from
>>    which is new and aims to work with the new TIGER shapefiles.  I'm trying
>>    with the 2009 shapefiles from <>.
>>    I'm not sure how to create the roads_local table (derived closely from
>>    completechain in the old version).  A join between edges and addr?
>>    Wondering if anyone can offer any direction.  A relevant ticket is
>>    The out-of-date file which used to create the roads_local table is
>>    tables/roads_local.sql, in the above repository.
>>                                          Table "tiger.edges"
>>       Column   |          Type          |                         Modifiers
>>    ------------+------------------------+------------------------------------------------------------
>>     gid        | integer                | not null default nextval('public.edges_gid_seq'::regclass)
>>     statefp    | character varying(2)   |
>>     countyfp   | character varying(3)   |
>>     tlid       | bigint                 |
>>     tfidl      | bigint                 |
>>     tfidr      | bigint                 |
>>     mtfcc      | character varying(5)   |
>>     fullname   | character varying(100) |
>>     smid       | character varying(22)  |
>>     lfromadd   | character varying(12)  |
>>     ltoadd     | character varying(12)  |
>>     rfromadd   | character varying(12)  |
>>     rtoadd     | character varying(12)  |
>>     zipl       | character varying(5)   |
>>     zipr       | character varying(5)   |
>>     featcat    | character varying(1)   |
>>     hydroflg   | character varying(1)   |
>>     railflg    | character varying(1)   |
>>     roadflg    | character varying(1)   |
>>     olfflg     | character varying(1)   |
>>     passflg    | character varying(1)   |
>>     divroad    | character varying(1)   |
>>     exttyp     | character varying(1)   |
>>     ttyp       | character varying(1)   |
>>     deckedroad | character varying(1)   |
>>     artpath    | character varying(1)   |
>>     persist    | character varying(1)   |
>>     gcseflg    | character varying(1)   |
>>     offsetl    | character varying(1)   |
>>     offsetr    | character varying(1)   |
>>     tnidf      | bigint                 |
>>     tnidt      | bigint                 |
>>     the_geom   | public.geometry        |
>>                                         Table "tiger.addr"
>>      Column   |         Type          |                        Modifiers
>>    -----------+-----------------------+-----------------------------------------------------------
>>     gid       | integer               | not null default nextval('public.addr_gid_seq'::regclass)
>>     tlid      | bigint                |
>>     fromhn    | character varying(12) |
>>     tohn      | character varying(12) |
>>     side      | character varying(1)  |
>>     zip       | character varying(5)  |
>>     plus4     | character varying(4)  |
>>     fromtyp   | character varying(1)  |
>>     totyp     | character varying(1)  |
>>     fromarmid | integer               |
>>     toarmid   | integer               |
>>     arid      | character varying(22) |
>>     mtfcc     | character varying(5)  |
>>     statefp   | character varying(2)  | not null
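
On the "join between edges and addr?" question: both listings above carry a tlid column, so one plausible starting point is a join on tlid. The sketch below only demonstrates the shape of that join using an in-memory SQLite database from Python. It is not the actual roads_local definition, the tables are cut down to a few of the columns listed above, and the sample rows are made up for illustration. The SELECT uses only standard SQL, so something similar may carry over to PostgreSQL, but that is untested here.

```python
import sqlite3

# In-memory stand-ins for tiger.edges and tiger.addr, reduced to a few of the
# columns shown in the listings. All rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (tlid INTEGER, fullname TEXT, zipl TEXT, zipr TEXT);
    CREATE TABLE addr  (tlid INTEGER, fromhn TEXT, tohn TEXT, side TEXT, zip TEXT);

    INSERT INTO edges VALUES (1001, 'Main St', '02134', '02134');
    INSERT INTO edges VALUES (1002, NULL, NULL, NULL);  -- unnamed segment, no ranges

    INSERT INTO addr VALUES (1001, '1', '99',  'L', '02134');
    INSERT INTO addr VALUES (1001, '2', '100', 'R', '02134');
""")

# One output row per (edge, address range): an edge can have several addr rows
# (for example one per side), and edges with no addr rows drop out of the
# inner join.
rows = conn.execute("""
    SELECT e.tlid, e.fullname, a.side, a.fromhn, a.tohn, a.zip
    FROM edges AS e
    JOIN addr AS a ON a.tlid = e.tlid
    ORDER BY e.tlid, a.side
""").fetchall()

for row in rows:
    print(row)
```

Whether roads_local should keep unnamed or range-less edges (which would call for a LEFT JOIN instead) depends on what the geocoder functions expect, and I have not checked that.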
