[postgis-users] [slightly off-topic] Question on build a C address parse for an embedded geocoder

2012-10-20 Thread Stephen Woodbridge

Hi Devs,

I am interested in writing an address standardizer in C that would be
callable from SQL. I understand the basics of doing this from supporting
pgRouting and writing some additional commands. It would be used
something like this:


select * from standardize(address, city, state, country, postcode);
select * from standardize(address_one_line);

and would return a standardized set of fields such as house_num, street,
city, state, country, and postcode. These could then be used to build a
standardized reference table, or be passed to the geocoder, which would
search the standardized reference table.


What I am struggling with is how best to initialize the address
parser/standardizer. The concept I have in mind is to have some tables
that represent the lexicon, gazetteer, parsing rules, etc. This data
could be specific to a country and/or country-state. It could be fairly
small or quite large. For example, there are about 40K unique city names
based on the USPS zipcodes, and about 7K of them have duplicate
standardizations based on state.
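
As a rough illustration of the gazetteer idea above, here is a minimal
sketch of an in-memory table keyed on (state, city), so that the same
city name can standardize differently per state. The GazEntry type and
the gaz_sort/gaz_lookup names are hypothetical and not part of any
existing library; reading "duplicate standardizations based on state"
as "the result depends on the state" is also an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gazetteer entry: a raw city name, scoped by state,
 * mapped to its standardized form. Field names are illustrative. */
typedef struct {
    const char *state;
    const char *city;
    const char *std_city;
} GazEntry;

/* Order entries by state first, then by city. */
static int gaz_cmp(const void *a, const void *b)
{
    const GazEntry *x = a;
    const GazEntry *y = b;
    int c = strcmp(x->state, y->state);
    return c != 0 ? c : strcmp(x->city, y->city);
}

/* Sort once after loading so lookups can use binary search. */
void gaz_sort(GazEntry *tab, size_t n)
{
    qsort(tab, n, sizeof *tab, gaz_cmp);
}

/* Return the standardized city name, or NULL if the pair is unknown. */
const char *gaz_lookup(const GazEntry *tab, size_t n,
                       const char *state, const char *city)
{
    GazEntry key;
    const GazEntry *hit;

    assert(state != NULL && city != NULL);
    key.state = state;
    key.city = city;
    key.std_city = NULL;
    hit = bsearch(&key, tab, n, sizeof *tab, gaz_cmp);
    return hit ? hit->std_city : NULL;
}
```

With around 40K entries a sorted array and bsearch() is probably good
enough; a hash table would be the obvious alternative if lookups turn
out to dominate.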


On the one hand, I can read these tables on every request, build the
internal structures, parse the request, and then throw the internal
structures away.


Basically, once the reference source records have been standardized, you
should not change the above tables, because you want to standardize
future search requests with the same rules that were used to
standardize the reference road segments.


And ideally you do not want to spend the time to rebuild these internal 
structures on every search request.


So is there a mechanism for building some internal data and holding on
to it between requests? I suppose I could store it in a blob, but it
would then need to be de-toasted on every search request.


Maybe this is a non-issue, but it seems to impact the design, depending
on what options I have and how they are implemented and accessed from
the code.


Thoughts?

Thanks for any help or suggestions,
  -Steve W
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] [slightly off-topic] Question on build a C address parse for an embedded geocoder

2012-10-21 Thread Paul Ramsey
You can stuff things into an upper memory context, but I'm not sure
how wise that would be. It does, however, seem to be the only reasonable
approach to getting things to last much longer than a statement.

P.
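
For illustration, the "build once, keep between requests" idea can be
sketched in plain C as a lazily initialized static pointer. This is
only a stand-in under stated assumptions: the Standardizer type and the
function names are hypothetical, and malloc() is used so the sketch
compiles on its own. In a real PostgreSQL extension the allocation
would presumably go into a long-lived memory context instead, for
example via MemoryContextAlloc(TopMemoryContext, ...), since memory
from the per-call context is released when the statement finishes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parsed lexicon/gazetteer/rules. */
typedef struct {
    int n_rules;
} Standardizer;

static Standardizer *cached = NULL;  /* survives across calls */
static int build_count = 0;          /* how often we really built it */

/* Expensive step: in the real thing this would read the tables. */
static Standardizer *build_standardizer(void)
{
    Standardizer *s;

    /* The cache should be built at most once per process. */
    assert(build_count == 0);

    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->n_rules = 0;
    build_count++;
    return s;
}

/* Return the cached structure, building it on first use only. */
Standardizer *get_standardizer(void)
{
    if (cached == NULL)
        cached = build_standardizer();
    return cached;
}

/* Exposed so callers can check that the build ran only once. */
int standardizer_build_count(void)
{
    return build_count;
}
```

Note that a static variable like this lives in a single backend
process, so each database connection would build its own copy on its
first request, and nothing here invalidates the cache if the
underlying tables change.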



Re: [postgis-users] [slightly off-topic] Question on build a C address parse for an embedded geocoder

2012-10-21 Thread Stephen Woodbridge
OK, I can look into doing that if I need to. I guess the appropriate
thing to do at the moment is to get things working by just loading them
from tables first, and then see whether there are performance issues
before I try to optimize it.


It should not be an issue for standardizing the reference data set,
because you only build the internal structures once for the whole
reference table. I'll have to see how much time it takes to build the
lexicon tables relative to querying the reference set to determine how
much overhead there is.
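
A minimal sketch of how that comparison could be measured from C,
assuming CPU time from clock() is a good enough proxy. The helper names
are hypothetical; inside the database, comparing query timings would
likely be the simpler first check.

```c
#include <assert.h>
#include <time.h>

/* Convert a pair of clock() readings to CPU seconds. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Run fn once and return the CPU time it consumed, in seconds. */
double time_call(void (*fn)(void))
{
    clock_t t0;

    assert(fn != NULL);
    t0 = clock();
    fn();
    return elapsed_seconds(t0, clock());
}

/* Fraction of a request spent on setup: build / (build + query). */
double overhead_fraction(double build_s, double query_s)
{
    double total = build_s + query_s;
    return total > 0.0 ? build_s / total : 0.0;
}
```

If the build step turns out to be a small fraction of a typical search
request, rebuilding on every call may be acceptable and the caching
question goes away.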


Thanks,
  -Steve
