To be more specific, most query parsers will already have separated the terms and will call the analyzer with only one term at a time, so at query time no recombination is possible for those parsed terms.
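
To illustrate the point, here is a minimal plain-Java simulation. It does not use Lucene: `analyze` and `parseQuery` are hypothetical stand-ins for an analyzer with a pair-joining filter and for a query parser that splits on whitespace before analysis. The pair "t2" / "t2a" is borrowed from the example further down the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation; "parser" and "analyzer" are hypothetical stand-ins, not Lucene classes. */
public class QueryTimeDemo {
    /** Stand-in analyzer: splits on whitespace, then joins an adjacent "t2" "t2a" pair. */
    public static List<String> analyze(String text) {
        String[] parts = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            if (i + 1 < parts.length && parts[i].equals("t2") && parts[i + 1].equals("t2a")) {
                out.add(parts[i] + parts[i + 1]);
                i++; // skip the token that was merged in
            } else {
                out.add(parts[i]);
            }
        }
        return out;
    }

    /** Stand-in query parser: separates the terms first, then analyzes each one on its own. */
    public static List<String> parseQuery(String query) {
        List<String> terms = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            terms.addAll(analyze(term)); // the analyzer only ever sees a single term here
        }
        return terms;
    }
}
```

Under these assumptions, analyzing the whole text "t1 t2 t2a t3" joins the pair, while `parseQuery` on the same string leaves "t2" and "t2a" separate, because the joining logic never sees both terms in one call.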

-- Jack Krupansky
-----Original Message-----
From: Erick Erickson
Sent: Friday, December 21, 2012 8:27 AM
To: java-user
Subject: Re: Which token filter can combine 2 terms into 1?

If it's a fixed list and not excessively long, would synonyms work?

But if there's some kind of logic you need to apply, I don't think you're
going to find anything OOB.
The problem is that by the time a token filter gets called, the terms are
already split up, so you'll probably have to write a custom filter that
manages that logic.

Best
Erick


On Fri, Dec 21, 2012 at 4:16 AM, Xi Shen <davidshe...@gmail.com> wrote:

Unfortunately, no... I am not combining every two terms into one. I am
combining a specific pair.

E.g. the Token Stream: t1 t2 t2a t3
should be rewritten into t1 t2t2a t3

But the TS: t1 t2 t3 t2a
should not be rewritten, and it is already correct
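
The rule described above (join the pair only when the two terms are adjacent and in order) can be sketched in plain Java as a one-token lookahead over a list. This is a sketch of the logic only, not a Lucene TokenFilter; the class name `PairMerger` and the constants `FIRST` and `SECOND` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of the pair-merging logic; not a Lucene TokenFilter. */
public class PairMerger {
    // Hypothetical pair to merge, taken from the t2 / t2a example above.
    static final String FIRST = "t2";
    static final String SECOND = "t2a";

    /** Joins FIRST and SECOND into one token only when they are adjacent, in that order. */
    public static List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            if (hasNext && current.equals(FIRST) && tokens.get(i + 1).equals(SECOND)) {
                out.add(current + tokens.get(i + 1)); // emit the combined term
                i += 2;                                // skip the consumed lookahead token
            } else {
                out.add(current);                      // pass the token through unchanged
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList("t1", "t2", "t2a", "t3"))); // [t1, t2t2a, t3]
        System.out.println(merge(Arrays.asList("t1", "t2", "t3", "t2a"))); // [t1, t2, t3, t2a]
    }
}
```

In an actual TokenFilter the same lookahead would probably live in incrementToken(): read one token ahead from the input, and when the pair does not match, hold the lookahead token back (for example with captureState() and restoreState()) so it is emitted on the next call. Offsets and position increments of the merged term would also need to be adjusted, which this sketch does not cover.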


On Fri, Dec 21, 2012 at 5:00 PM, Alan Woodward <alan.woodw...@romseysoftware.co.uk> wrote:

> Have a look at ShingleFilter:
>
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
>
> On 21 Dec 2012, at 08:42, Xi Shen wrote:
>
> > I have to use the white space and word delimiter to process the input
> > first. I tried many combinations, and it seems to me that it is inevitable
> > the term will be split into two :(
> >
> > I think developing my own filter is the only solution...but I just cannot
> > find a guide to help me understand what I need to do to implement a
> > TokenFilter.
> >
> >
> > On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torin...@gmail.com> wrote:
> >
> >> Easiest way would be to pre-process your input and join those 2 tokens
> >> before splitting them by white space.
> >>
> >> But from given context I might miss some details...still worth a shot.
> >>
> >> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshe...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am looking for a token filter that can combine 2 terms into 1. E.g.
> >>>
> >>> the input has been tokenized by white space:
> >>>
> >>> t1 t2 t2a t3
> >>>
> >>> I want a filter that outputs:
> >>>
> >>> t1 t2t2a t3
> >>>
> >>> I know it is a very special case, and I am thinking about developing a
> >>> filter of my own. But I cannot figure out which API I should use to
> >>> look for terms in a Token Stream.
> >>>
> >>> --
> >>> Regards,
> >>> David Shen
> >>>
> >>> http://about.me/davidshen
> >>> https://twitter.com/#!/davidshen84
> >>>
> >>
> >
>
>

