On Tue, Jan 3, 2012 at 4:30 PM, Ryan McKinley <ryan...@gmail.com> wrote:
>
> Just brainstorming, it seems like an FST could be a good/efficient way
> to match documents. My plan would be to:
>
> 1. Use an Analyzer to create a TokenStream for each place name. From
> the TokenStream create an FST<docid> -- this would have to pick some
> impossible character for the token separator.
> 2. While indexing, create a TokenStream from the input text. For each
> token, try to follow the Arc to a match. If there is a match, add it
> to the document.
>
> Does this approach seem reasonable?
> Is there some standard way to do this that I am missing?
>
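To make the two steps above concrete, here is a rough sketch of the matching logic on its own. It is not Lucene code: a plain token-level trie stands in for the FST, the input is assumed to be already analyzed into tokens, and the class name, method names, and place ids are all made up for illustration. Because the trie is keyed on whole tokens, no separator character is needed here; a real FST over characters or bytes would need one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Token-level trie standing in for an FST that maps place names to ids. */
public class PlaceNameMatcher {

    /** One trie node; placeId is non-null only where a full place name ends. */
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        Integer placeId;
    }

    private final Node root = new Node();

    /** Step 1: add an already-analyzed place name (one list entry per token). */
    public void add(List<String> nameTokens, int placeId) {
        Node node = root;
        for (String token : nameTokens) {
            node = node.next.computeIfAbsent(token, t -> new Node());
        }
        node.placeId = placeId;
    }

    /** Step 2: scan the analyzed input text and collect ids of matched names. */
    public List<Integer> match(List<String> textTokens) {
        List<Integer> found = new ArrayList<>();
        int start = 0;
        while (start < textTokens.size()) {
            Node node = root;
            Integer longestId = null;
            int longestEnd = start;
            // Follow arcs as far as possible, remembering the longest full match.
            for (int i = start; i < textTokens.size(); i++) {
                node = node.next.get(textTokens.get(i));
                if (node == null) {
                    break;
                }
                if (node.placeId != null) {
                    longestId = node.placeId;
                    longestEnd = i + 1;
                }
            }
            if (longestId != null) {
                found.add(longestId);
                start = longestEnd; // resume after the matched name
            } else {
                start++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        PlaceNameMatcher matcher = new PlaceNameMatcher();
        // Hypothetical place names and ids, for illustration only.
        matcher.add(Arrays.asList("new", "york"), 1);
        matcher.add(Arrays.asList("new", "york", "city"), 2);
        matcher.add(Arrays.asList("paris"), 3);
        List<String> text = Arrays.asList(
                "flights", "from", "paris", "to", "new", "york", "city");
        System.out.println(matcher.match(text)); // prints [3, 2]
    }
}
```

The sketch prefers the longest match at each position, so "new york city" wins over "new york"; whether that is the right policy for geocoding is a design choice, not something the approach dictates.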
I'm not really sure this fits well inside a TokenStream at all. It seems more like something you would do before analysis; at analysis time you are concerned with how the text will be indexed for search, what to do with the location (a separate field or whatever), etc.

Apart from that, as far as whether or not to use an FST, it seems OK to me, especially if the data used for geocoding is pretty static.

If you want to prototype using an FST inside a TokenStream to do this, just convert your geocoding data into a synonyms file (mapping each name to its location), use SynonymFilter, and you are done.

--
lucidimagination.com
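For the synonyms-file prototype, the conversion step could look roughly like the following. This is only a sketch: it assumes the Solr-style `name => replacement` synonyms syntax, and the class name, method name, and location tokens are hypothetical. Names containing commas or `=>` are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Turns geocoding data into lines for a Solr-style synonyms file. */
public class GeoSynonymsWriter {

    /** Builds one "name => locationToken" line per place name. */
    public static List<String> toSynonymLines(Map<String, String> nameToLocation) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : nameToLocation.entrySet()) {
            // Assumes names contain no commas or "=>", which the format treats specially.
            lines.add(entry.getKey() + " => " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical geocoding data: place name -> token to index for the location.
        Map<String, String> places = new LinkedHashMap<>();
        places.put("new york city", "loc_nyc");
        places.put("paris", "loc_paris");
        for (String line : toSynonymLines(places)) {
            System.out.println(line);
        }
    }
}
```

The resulting lines would then be saved to a file and loaded by whatever synonym setup the analyzer chain uses.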