On Wed, Oct 26, 2011 at 2:09 PM, Robert Muir <[email protected]> wrote:
> Use a queryparser that doesn't break on whitespace as a workaround?
> Or, we can start thinking about how to fix QueryParser
> (https://issues.apache.org/jira/browse/LUCENE-2605)

+1
>
> The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace.
> Allowing tokenizer access to the query string would just mean that
> your tokenizer hacks around this by trying to be a QueryParser, too,
> making matters even worse!
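A Lucene-free sketch of the whitespace problem Robert describes. It assumes a toy analyzer, the hypothetical `analyze` method, that lowercases words and adds two-word shingles, loosely like the LowerCaseFilter + ShingleFilter part of Bernd's chain. Token order is simplified and does not match the real ShingleFilter. When the parser splits on whitespace and analyzes each chunk separately, no single analyzer call ever sees two adjacent words:

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceSplitDemo {
    // Hypothetical stand-in for an analyzer chain: whitespace tokenizer,
    // lowercasing, then unigrams followed by adjacent two-word shingles.
    static List<String> analyze(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w.toLowerCase());
        }
        List<String> out = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            out.add(words.get(i) + " " + words.get(i + 1));
        }
        return out;
    }

    // The whole query string reaches the analyzer in one piece.
    static List<String> analyzeWhole(String query) {
        return analyze(query);
    }

    // The parser splits on whitespace first and analyzes each chunk on its own,
    // so shingles (and multi-word synonyms) can never be formed.
    static List<String> analyzeSplit(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "New York city";
        System.out.println(analyzeWhole(q)); // [new, york, city, new york, york city]
        System.out.println(analyzeSplit(q)); // [new, york, city]
    }
}
```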
>
>
> On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling
> <[email protected]> wrote:
>> OK, I think "query string" is a bit too specific. More generally,
>> what I need is access from inside a filter to the complete string
>> (not only the current token) being analyzed.
>>
>> A very dirty workaround would be a "collector filter" that collects all
>> tokens after the WhitespaceTokenizer and somehow makes them available to
>> the following filters, or not?
>> Then, at least on the last call to incrementToken(), I would have the original string.
>>
>> Bernd
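Bernd's "collector filter" idea, modeled without Lucene as a hypothetical `CollectingStage` over a plain `Iterator<String>`: it drains the upstream tokens on the first call and then replays them. In real Lucene code the buffering would more likely be done with `AttributeSource.captureState()`/`restoreState()`. The sketch also makes the two limits raised in the thread visible: the buffer holds the whole stream in memory (Uwe's heap concern), and the re-joined text is not the original string, because the original whitespace and any CharFilter changes are already gone.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, Lucene-free model of the "collector filter": drain the upstream
// tokens on the first call, remember them, then replay them one by one.
class CollectingStage implements Iterator<String> {
    private final Iterator<String> input;
    private List<String> buffer;   // filled lazily on the first call
    private int pos = 0;

    CollectingStage(Iterator<String> input) {
        this.input = input;
    }

    private void fill() {
        if (buffer == null) {
            buffer = new ArrayList<>();
            while (input.hasNext()) buffer.add(input.next()); // whole stream in memory
        }
    }

    // Tokens re-joined with single spaces. This is NOT the original string:
    // original whitespace and anything a CharFilter changed are already lost.
    String collectedText() {
        fill();
        return String.join(" ", buffer);
    }

    @Override
    public boolean hasNext() {
        fill();
        return pos < buffer.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.get(pos++);
    }

    public static void main(String[] args) {
        CollectingStage stage = new CollectingStage(List.of("foo", "bar", "baz").iterator());
        System.out.println(stage.collectedText()); // foo bar baz
        while (stage.hasNext()) System.out.println(stage.next());
    }
}
```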
>>
>> Am 26.10.2011 10:26, schrieb Uwe Schindler:
>>>
>>> The input from StringReader does not help you:
>>> - in the case of QueryParser it is *not* the query string!!!
>>> - storing it in an attribute would blow up your heap for real documents
>>>
>>> Uwe
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: [email protected]
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bernd Fehling [mailto:[email protected]]
>>>> Sent: Wednesday, October 26, 2011 10:06 AM
>>>> To: [email protected]
>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>
>>>> From what I can see in the debugger, the analyzer chain is implemented as
>>>> a stack with the last filter at the bottom and the first filter at the top.
>>>>
>>>> An analyzer query chain of:
>>>> charFilter: MappingCharFilterFactory
>>>> tokenizer : WhitespaceTokenizerFactory
>>>> filter    : PatternReplaceFilterFactory
>>>> filter    : LowerCaseFilterFactory
>>>> filter    : ShingleFilterFactory
>>>> filter    : SynonymFilterFactory
>>>>
>>>> has a chain of:
>>>> this.input(SynonymFilter) -->  input(ShingleFilter) -->
>>>> input(LowerCaseFilter) -->  input(PatternReplaceFilter) -->
>>>> input(WhitespaceTokenizer) -->  input(MappingCharFilter) -->
>>>> input(CharReader) -->  input(StringReader).str
>>>>
>>>> So I can always "see" the input of StringReader, but can I access it?
>>>>
>>>> Bernd
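A Lucene-free model of the wiring Bernd sees in the debugger, using hypothetical `Stage`, `WhitespaceTokenizerModel`, and `LowerCaseFilterModel` classes: each filter holds the stage below it in a field named `input`, and the tokenizer holds the `Reader`. It also suggests why "seeing" the StringReader does not help much: the tokenizer consumes it character by character, so by the time a filter receives a token the reader's content has already been read (and, as Uwe and Chris point out, in the QueryParser case it is not the full query string anyway).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of an analysis chain: every filter wraps the stage below
// it in a field called "input"; the innermost stage wraps a java.io.Reader.
abstract class Stage {
    // Returns the next token, or null when the stream is exhausted.
    abstract String next() throws IOException;
}

class WhitespaceTokenizerModel extends Stage {
    final Reader input;
    WhitespaceTokenizerModel(Reader input) { this.input = input; }

    @Override
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {   // the Reader is consumed here
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) break;  // end of the current token
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

class LowerCaseFilterModel extends Stage {
    final Stage input;
    LowerCaseFilterModel(Stage input) { this.input = input; }

    @Override
    String next() throws IOException {
        String t = input.next();             // a filter can only pull tokens
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

class ChainDemo {
    // Outermost object = last filter in the config; the Reader sits at the bottom.
    static Stage build(String text) {
        return new LowerCaseFilterModel(
            new WhitespaceTokenizerModel(new StringReader(text)));
    }

    // Drains the chain the way a consumer (e.g. a query parser) would.
    static List<String> tokens(String text) {
        Stage chain = build(text);
        List<String> out = new ArrayList<>();
        try {
            for (String t = chain.next(); t != null; t = chain.next()) out.add(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Foo  BAR")); // [foo, bar]
    }
}
```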
>>>>
>>>> Am 26.10.2011 09:37, schrieb Chris Male:
>>>>>
>>>>> We've also lost the full query string by the time the QP creates its
>>>>> TokenStream, right? Because the QP tokenizes on whitespace.
>>>>>
>>>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <[email protected]> wrote:
>>>>>
>>>>>> Hi Simon,
>>>>>>
>>>>>> The problem is the exchanged consumer/producer role. Once the
>>>>>> TokenStream calls clearAttributes(), the attributes are gone, but the
>>>>>> query parser can only set the attribute *before* calling
>>>>>> incrementToken(), so you have no chance to get it: the Tokenizer
>>>>>> clears it before any filter can read it (unless we use an attribute
>>>>>> whose clear() is a no-op, which would fail lots of tests, as it's a hack).
>>>>>>
>>>>>> Uwe
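A Lucene-free model of the problem Uwe describes, with hypothetical classes throughout: a shared map stands in for the AttributeSource, and the tokenizer clears it at the start of every `incrementToken()` call, as Lucene tokenizers conventionally do via `clearAttributes()`. A "query" value the consumer sets before pulling the first token is therefore already gone when the filter looks for it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the attributes shared by all stages of one stream.
class SharedAttributes {
    final Map<String, String> values = new HashMap<>();
    void clearAttributes() { values.clear(); }
}

class TokenizerModel {
    final SharedAttributes atts;
    private final String[] tokens;
    private int pos = 0;

    TokenizerModel(SharedAttributes atts, String... tokens) {
        this.atts = atts;
        this.tokens = tokens;
    }

    boolean incrementToken() {
        atts.clearAttributes();              // wipes anything the consumer set
        if (pos >= tokens.length) return false;
        atts.values.put("term", tokens[pos++]);
        return true;
    }
}

class QueryAwareFilterModel {
    final TokenizerModel input;
    String seenQuery;                        // what the filter managed to read

    QueryAwareFilterModel(TokenizerModel input) { this.input = input; }

    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        seenQuery = input.atts.values.get("query"); // already cleared, so null
        return true;
    }
}

class ClearAttributesDemo {
    // Simulates a query parser that sets a "query" attribute before consuming.
    static String run() {
        SharedAttributes atts = new SharedAttributes();
        QueryAwareFilterModel filter =
            new QueryAwareFilterModel(new TokenizerModel(atts, "foo"));
        atts.values.put("query", "foo bar"); // set *before* the first pull
        filter.incrementToken();
        return filter.seenQuery;
    }

    public static void main(String[] args) {
        System.out.println(run()); // null
    }
}
```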
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>>>
>>>>>>> What Uwe says is correct, though. What we could possibly do is add
>>>>>>> a query attribute that is set by the query parser (you can do that
>>>>>>> yourself, though).
>>>>>>>
>>>>>>> Not sure if it is worth it or if we should do it.
>>>>>>>
>>>>>>> simon
>>>>>>>
>>>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> QueryParser and TokenStreams are clearly separated; there is no way
>>>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>>>> be, because the QP is a consumer of the TS, which is used not only for
>>>>>>>> query parsing). The only option you have is to use a ThreadLocal
>>>>>>>> that you set before the query is parsed and then read in the TokenFilter.
>>>>>>>>
>>>>>>>> Uwe
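A minimal sketch of Uwe's ThreadLocal workaround, assuming a hypothetical `QueryStringHolder` class that the application fills before calling the query parser and clears afterwards. The hypothetical `tokenIsWholeQuery` method stands in for whatever comparison a real TokenFilter would do inside `incrementToken()`. This only works if analysis runs on the same thread that set the value, which should hold for a synchronous parse call.

```java
// Hypothetical holder: set on the current thread before parsing, read inside
// the filter, and always cleared afterwards.
final class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private QueryStringHolder() {}

    static void set(String query) { CURRENT.set(query); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); } // avoid stale values on pooled threads
}

class ThreadLocalDemo {
    // Stand-in for the body of a TokenFilter's incrementToken(): compares the
    // current token against the full query string, if one was provided.
    static boolean tokenIsWholeQuery(String token) {
        String q = QueryStringHolder.get();
        return q != null && q.trim().equalsIgnoreCase(token);
    }

    // Stand-in for the request handler: set, parse/analyze, always clear.
    static boolean handle(String rawQuery, String token) {
        QueryStringHolder.set(rawQuery);
        try {
            // In real code, parser.parse(rawQuery) would run here and the
            // filter would call QueryStringHolder.get() on this same thread.
            return tokenIsWholeQuery(token);
        } finally {
            QueryStringHolder.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("Foo", "foo"));     // true
        System.out.println(tokenIsWholeQuery("foo")); // false: cleared after the request
    }
}
```

The `finally` block matters: servlet containers reuse threads, so a value left in the ThreadLocal would leak into the next request handled by that thread.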
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bernd Fehling [mailto:[email protected]]
>>>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>>>
>>>>>>>>> Dear list,
>>>>>>>>> while writing a TokenFilter for my analyzer chain, I need access to the
>>>>>>>>> query string from inside my TokenFilter for some comparison, but the
>>>>>>>>> filters work on a TokenStream and get separate tokens.
>>>>>>>>> Currently I can't find any way to access the query string.
>>>>>>>>>
>>>>>>>>> It would be great to have such functionality in Lucene/Solr.
>>>>>>>>>
>>>>>>>>> Should I open a JIRA issue for it, or is there a wish list somewhere?
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Bernd
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>> For additional commands, e-mail: [email protected]
>>>> --
>>>> *************************************************************
>>>> Bernd Fehling                Universitätsbibliothek Bielefeld
>>>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>>>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>>>> [email protected]                33615 Bielefeld
>>>>
>>>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>>>> *************************************************************
>
>
>
> --
> lucidimagination.com