Use a query parser that doesn't break on whitespace as a workaround? Or we can start thinking about how to fix QueryParser (https://issues.apache.org/jira/browse/LUCENE-2605).
The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace. Allowing the tokenizer access to the query string would just mean that your tokenizer hacks around this by trying to be a QueryParser too, making matters even worse!

On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling <[email protected]> wrote:
> OK, I think "query string" is a bit too specific, so more generally,
> what I need is access from inside of a filter to the complete string
> (not only the token) being analyzed.
>
> A very dirty workaround would be a "collector filter" which collects all
> tokens after WhitespaceTokenizer and makes them somehow available to
> the following filters, or not?
> So at least at the last run of incrementToken() I have the original string.
>
> Bernd
>
> On 26.10.2011 10:26, Uwe Schindler wrote:
>>
>> The input from StringReader does not help you:
>> - in the case of QueryParser it is *not* the query string!!!
>> - storing it in an attribute would blow up your heap for real documents
>>
>> Uwe
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: [email protected]
>>
>>> -----Original Message-----
>>> From: Bernd Fehling [mailto:[email protected]]
>>> Sent: Wednesday, October 26, 2011 10:06 AM
>>> To: [email protected]
>>> Subject: Re: accessing the query string from inside TokenFilter
>>>
>>> From what I can see in the debugger, the analyzer chain is implemented as
>>> a stack with the last filter at the bottom and the first filter at the top.
>>>
>>> An analyzer query chain of:
>>>   charFilter: MappingCharFilterFactory
>>>   tokenizer : WhitespaceTokenizerFactory
>>>   filter    : PatternReplaceFilterFactory
>>>   filter    : LowerCaseFilterFactory
>>>   filter    : ShingleFilterFactory
>>>   filter    : SynonymFilterFactory
>>>
>>> has a chain of:
>>>   this.input(SynonymFilter) --> input(ShingleFilter) -->
>>>   input(LowerCaseFilter) --> input(PatternReplaceFilter) -->
>>>   input(WhitespaceTokenizer) --> input(MappingCharFilter) -->
>>>   input(CharReader) --> input(StringReader).str
>>>
>>> So I can always "see" the input of StringReader, but can I access it?
>>>
>>> Bernd
>>>
>>> On 26.10.2011 09:37, Chris Male wrote:
>>>>
>>>> We've also lost the full query string by the time the QP creates its
>>>> TokenStream, right? Because the QP tokenizes on whitespace.
>>>>
>>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <[email protected]> wrote:
>>>>
>>>>> Hi Simon,
>>>>>
>>>>> The problem is the exchanged consumer/producer role. Once the
>>>>> TokenStream calls clearAttributes() the attributes are gone, but the
>>>>> query parser can only set the attribute *before* calling
>>>>> incrementToken(), so you have no chance to get them, as the Tokenizer
>>>>> cleared it before any filter can read it (unless we use an attribute
>>>>> with clear() as a no-op, which would fail lots of tests, as it's a hack).
>>>>>
>>>>> Uwe
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>>
>>>>>> What Uwe says is correct though. What we possibly could do is add
>>>>>> a query attribute that is set in a query parser (you can do that
>>>>>> yourself though).
>>>>>>
>>>>>> Not sure if it is worth it and if we should do it.
>>>>>>
>>>>>> simon
>>>>>>
>>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> QueryParser and TokenStreams are clearly separated; there is no way
>>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>>> be, because QP is a consumer of the TS, which is used not only for
>>>>>>> query parsing). The only chance you have is to use a ThreadLocal
>>>>>>> that you set before the query is parsed and then use it in the
>>>>>>> TokenFilter.
>>>>>>>
>>>>>>> Uwe
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bernd Fehling [mailto:[email protected]]
>>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>>
>>>>>>>> Dear list,
>>>>>>>> while writing some TokenFilter for my analyzer chain I need access to
>>>>>>>> the query string from inside of my TokenFilter for some comparison,
>>>>>>>> but the filters are working with a TokenStream and get separate tokens.
>>>>>>>> Currently I couldn't get any access to the query string.
>>>>>>>>
>>>>>>>> It would be great to have such functionality in Lucene/Solr.
>>>>>>>>
>>>>>>>> Should I write a JIRA issue for it, or is there a wish list somewhere?
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Bernd
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>> For additional commands, e-mail: [email protected]
>>>
>>> --
>>> *************************************************************
>>> Bernd Fehling                Universitätsbibliothek Bielefeld
>>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>>> [email protected]                       33615 Bielefeld
>>>
>>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>>> *************************************************************

--
lucidimagination.com
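
For reference, the ThreadLocal workaround Uwe describes in the thread could look roughly like the sketch below. It is only an illustration under assumptions: QueryStringHolder and FilterStandIn are hypothetical names, not Lucene or Solr classes, and the stand-in method plays the role that incrementToken() would play in a real TokenFilter. It also assumes that parsing and analysis run on the same thread, and it does nothing for index-time analysis, where no query string exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical holder for the raw query string; not part of Lucene or Solr. */
final class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private QueryStringHolder() {}

    /** Returns the raw query for the current thread, or null if none is set. */
    static String get() {
        return CURRENT.get();
    }

    /**
     * Makes rawQuery visible to the current thread while parseStep runs,
     * and always clears it afterwards so pooled threads do not leak it.
     */
    static <T> T withQuery(String rawQuery, Supplier<T> parseStep) {
        CURRENT.set(rawQuery);
        try {
            return parseStep.get();
        } finally {
            CURRENT.remove();
        }
    }
}

/** Plain-Java stand-in for a TokenFilter that needs the complete string. */
final class FilterStandIn {
    private FilterStandIn() {}

    /** Tags each token with its offset in the full query, or -1 if unknown. */
    static List<String> annotate(String[] tokens) {
        String full = QueryStringHolder.get();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            int offset = (full == null) ? -1 : full.indexOf(token);
            out.add(token + "@" + offset);
        }
        return out;
    }
}
```

The caller would wrap its own parse call in QueryStringHolder.withQuery(rawQuery, ...), and the filter would read QueryStringHolder.get() inside incrementToken(). The try/finally matters because servlet containers reuse threads, so a value left behind would show up in an unrelated request.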
