Re: Tagging documents as they are indexed -- Is FST a reasonable approach?

Robert Muir Tue, 03 Jan 2012 13:45:07 -0800

On Tue, Jan 3, 2012 at 4:30 PM, Ryan McKinley <ryan...@gmail.com> wrote:
>
> Just brainstorming, it seems like an FST could be a good/efficient way
> to match documents.  My plan would be to:
>
> 1. Use an Analyzer to create a TokenStream for each place name.  From
> the TokenStream create an FST<docid> -- this would have to pick some
> impossible character for the token separator.
> 2. While indexing, create a TokenStream from the input text.  For each
> token, try to follow the Arc to a match.  If there is a match, add it
> to the document.
>
> Does this approach seem reasonable?
> Is there some standard way to do this that I am missing?
>


I'm not really sure this will fit well inside a TokenStream at all. It
seems more like the kind of thing you would do before analysis. At
analysis time you would be worried about how you are going to index the
text for search, what you are going to do with the location (separate
field or whatever), etc.

Apart from that, as far as whether or not to use an FST goes, it seems
OK to me, especially if the data used for geocoding is pretty static.
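
For illustration only (this is not from the thread), the matching step
in the quoted plan can be sketched with a plain token trie. The trie
stands in for a Lucene FST here, and PlaceTagger, add(), tag() and the
place ids are all hypothetical names. It assumes the place names and the
input text have already been run through the same Analyzer and arrive as
lists of tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: a token-level trie standing in for an FST over token sequences. */
class PlaceTagger {
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        String placeId; // non-null when a place name ends at this node
    }

    private final Node root = new Node();

    /** Registers an already-analyzed place name (its tokens) under a place id. */
    void add(List<String> nameTokens, String placeId) {
        Node node = root;
        for (String token : nameTokens) {
            node = node.next.computeIfAbsent(token, k -> new Node());
        }
        node.placeId = placeId;
    }

    /** Returns ids of places found in the text, taking the longest match at each start position. */
    List<String> tag(List<String> textTokens) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < textTokens.size()) {
            Node node = root;
            String best = null;
            int bestEnd = i;
            // Follow arcs token by token until no arc matches.
            for (int j = i; j < textTokens.size(); j++) {
                node = node.next.get(textTokens.get(j));
                if (node == null) {
                    break;
                }
                if (node.placeId != null) {
                    best = node.placeId;
                    bestEnd = j + 1;
                }
            }
            if (best != null) {
                found.add(best);
                i = bestEnd; // skip past the matched name
            } else {
                i++;
            }
        }
        return found;
    }
}
```

Because the trie is keyed on whole tokens, no "impossible" separator
character is needed in this sketch. A real FST keyed on characters or
bytes would still need one.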

If you want to prototype using an FST inside a TokenStream to do this,
just convert your geocoding data into a synonyms file (mapping to the
location), use SynonymFilter, and you are done.
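
As a rough sketch of such a file, assuming the Solr synonyms format with
explicit "=>" mappings (the place names and LOC_ identifiers below are
made up for illustration):

```
# surface forms => location identifier
new york, new york city, nyc => LOC_NEW_YORK
san francisco, frisco => LOC_SAN_FRANCISCO
```

With an explicit mapping the matched tokens are replaced by the right-hand
side, so to keep the original text searchable you would list it there
too, e.g. "nyc => nyc, LOC_NEW_YORK".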

-- 
lucidimagination.com

