Memo: Re: RE: RE: Query parser and minus signs

2004-05-27 Thread alex . bourne




Thanks Erik :)

We are using 1.3, so it looks like we should upgrade as soon as possible.

Whilst hacking around I found an alternative solution. I went back to using
a Keyword field, but instead of including the minus sign (hyphen) in the
query value I just used the wildcard prefix "-language:en*", which has the
desired effect.
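Alex's workaround depends on a Keyword field being indexed as one untokenized term. Because of that, a value such as en-UK can be excluded by its prefix without the hyphen ever appearing in the query. The Java below is a stdlib-only simulation of the intended semantics of `hsbc -language:en*`. It is not Lucene code. The document layout, field values, and the class and method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stdlib-only simulation (NOT Lucene API) of "hsbc -language:en*" when
// "language" is a Keyword field, i.e. stored as one untokenized term.
class PrefixExclusionSketch {

    // Hypothetical data model: each document maps field name -> field value.
    static boolean matches(Map<String, String> doc, String requiredWord, String excludedPrefix) {
        // Positive clause: the word must occur in the whitespace-split contents.
        String contents = doc.getOrDefault("contents", "");
        boolean positive = List.of(contents.split("\\s+")).contains(requiredWord);
        // Prohibited clause: a Keyword value such as "en-UK" is a single term,
        // so a plain prefix test catches it; no hyphen appears in the query.
        boolean prohibited = doc.getOrDefault("language", "").startsWith(excludedPrefix);
        // A prohibited clause only removes hits found by the positive clause.
        return positive && !prohibited;
    }

    static List<String> search(List<Map<String, String>> docs, String word, String prefix) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (matches(doc, word, prefix)) {
                ids.add(doc.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("id", "1", "contents", "hsbc results", "language", "en-UK"),
                Map.of("id", "2", "contents", "hsbc results", "language", "zh-HK"),
                Map.of("id", "3", "contents", "other news", "language", "zh-HK"));
        System.out.println(search(docs, "hsbc", "en")); // prints [2]
    }
}
```

In this sample, doc 1 (en-UK) is removed by the prefix and doc 3 never matches the positive term, so only doc 2 remains. The positive-and-not-prohibited rule also reflects the point made later in the thread: a purely negative query returns nothing on its own.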

Now that I know about the 1.4 upgrade, I'll look at some other
solutions.

Thanks for everyone's suggestions on this problem.

Alex B.




Erik Hatcher <[EMAIL PROTECTED]> on 26 May 2004 17:24

Please respond to "Lucene Users List" <[EMAIL PROTECTED]>

To:"Lucene Users List" <[EMAIL PROTECTED]>
cc:
bcc:

Subject:Re: RE: RE: Query parser and minus signs



On May 26, 2004, at 10:48 AM, [EMAIL PROTECTED] wrote:
> Query: hsbc -language:zh-HK
> Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc
> -language:zh -keywords:hk) (title:hsbc -language:zh -title:hk)
> (language:hsbc
> -language:zh -language:HK)
> Hits: 169
> Not quite what I was expecting from the parsed query - the zh and HK
> are now separated.

I think I can safely say that you are not running the latest version of
Lucene.  This has been corrected in the 1.4 versions.

I've tested this with "Wal-Mart" (without the quotes) and QueryParser,
and it works as expected.


> Query: hsbc -language:zh\-HK
> Parsed query: (contents:hsbc -language:zh\-HK) (keywords:hsbc
> -language:zh\-HK) (title:hsbc -language:zh\-HK) (language:hsbc
> -language:zh\-HK)
> Hits: 206
> I'm guessing here, but I don't think the backslash is escaping; does
> it just become part of the query?

Now that is odd.

QueryParser is an awkward beast at times, and combining it with
MultiFieldQueryParser (which I'd recommend against, as you can see from
the odd queries it built for you) gets even more confusing.
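The odd queries are easier to follow once you see their shape. The whole query string is parsed once per field, with that field as the default, and the per-field results are ORed together, so the language clause appears again in every group. The stdlib-only Java below is a simplified simulation of that shape, inferred from the parsed queries Alex posted. It is not Lucene's implementation and performs no analysis (no lowercasing or token splitting). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only simulation (NOT Lucene code) of the query shape that
// MultiFieldQueryParser produced in this thread. No analysis is applied.
class MultiFieldExpansionSketch {

    // Parse the query once using `defaultField`; terms that already carry
    // their own "field:" prefix keep it (a deliberate simplification).
    static List<String> parseWithDefault(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            String modifier = "";
            String body = token;
            if (body.startsWith("-") || body.startsWith("+")) {
                modifier = body.substring(0, 1);
                body = body.substring(1);
            }
            String clause = body.contains(":") ? body : defaultField + ":" + body;
            clauses.add(modifier + clause);
        }
        return clauses;
    }

    // One copy of the query per field, ORed together; multi-clause copies
    // print in parentheses, single-term copies do not.
    static String expand(String query, String[] fields) {
        List<String> groups = new ArrayList<>();
        for (String field : fields) {
            List<String> clauses = parseWithDefault(query, field);
            String joined = String.join(" ", clauses);
            groups.add(clauses.size() > 1 ? "(" + joined + ")" : joined);
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        String[] fields = {"contents", "keywords", "title", "language"};
        System.out.println(expand("hsbc", fields));
        System.out.println(expand("hsbc -language:zh\\-HK", fields));
    }
}
```

With the four fields from Alex's code, this reproduces the parsed output he reported for `hsbc` and for `hsbc -language:zh\-HK`.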

Hopefully the latest Lucene 1.4 RC release will fix up your situation.

 Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



**
 This message originated from the Internet. Its originator may or
 may not be who they claim to be and the information contained in
 the message and any attachments may or may not be accurate.
**

_

This transmission has been issued by a member of the HSBC Group 
("HSBC") for the information of the addressee only and should not be 
reproduced and / or distributed to any other person. Each page 
attached hereto must be read in conjunction with any disclaimer which 
forms part of it. This transmission is neither an offer nor the solicitation 
of an offer to sell or purchase any investment. Its contents are based 
on information obtained from sources believed to be reliable but HSBC 
makes no representation and accepts no responsibility or liability as to 
its completeness or accuracy.





Re: Memo: Re: RE: RE: Query parser and minus signs

2004-05-26 Thread Erik Hatcher
On May 26, 2004, at 10:48 AM, [EMAIL PROTECTED] wrote:
> Query: hsbc -language:zh-HK
> Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc
> -language:zh -keywords:hk) (title:hsbc -language:zh -title:hk)
> (language:hsbc
> -language:zh -language:HK)
> Hits: 169
> Not quite what I was expecting from the parsed query - the zh and HK
> are now separated.

I think I can safely say that you are not running the latest version of 
Lucene.  This has been corrected in the 1.4 versions.

I've tested this with "Wal-Mart" (without the quotes) and QueryParser,
and it works as expected.


> Query: hsbc -language:zh\-HK
> Parsed query: (contents:hsbc -language:zh\-HK) (keywords:hsbc
> -language:zh\-HK) (title:hsbc -language:zh\-HK) (language:hsbc
> -language:zh\-HK)
> Hits: 206
> I'm guessing here, but I don't think the backslash is escaping; does
> it just become part of the query?

Now that is odd.
QueryParser is an awkward beast at times, and combining it with
MultiFieldQueryParser (which I'd recommend against, as you can see from
the odd queries it built for you) gets even more confusing.

Hopefully the latest Lucene 1.4 RC release will fix up your situation.
Erik


Memo: Re: RE: RE: Query parser and minus signs

2004-05-26 Thread alex . bourne




Being a bit of a newbie, I had tried putting "-language:zh-HK" by itself,
but a query containing only a negative clause seems to return no results
unless it is combined with a positive term. However, when I then tried the
following, it did not seem to build the query I had hoped for:

Query: hsbc
Parsed query: contents:hsbc keywords:hsbc title:hsbc language:hsbc
Hits: 206

Query: hsbc -language:zh-HK
Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc -language:zh 
-keywords:hk) (title:hsbc -language:zh -title:hk) (language:hsbc
-language:zh -language:HK)
Hits: 169
Not quite what I was expecting from the parsed query - the zh and HK are now separated.
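Whether zh-HK survives as a single token also depends on the analyzer. This is why indexing a Keyword value and then searching through a tokenizing analyzer can fail to match. The lowercase `hk` in the contents, keywords and title clauses above presumably comes from the analyzer used for those fields. The stdlib-only Java below approximates two of the analyzers mentioned in this thread. It assumes WhitespaceAnalyzer splits only on whitespace and keeps case, and SimpleAnalyzer splits on anything that is not a letter and lowercases. It is an illustration, not Lucene's code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only approximation (NOT Lucene code) of two analyzers' tokenization.
class TokenizerSketch {

    // Approximates WhitespaceAnalyzer: any non-whitespace char is part of a token.
    static List<String> whitespaceTokens(String text) {
        return tokens(text, false, false);
    }

    // Approximates SimpleAnalyzer: only letters form tokens, which are lowercased.
    static List<String> simpleTokens(String text) {
        return tokens(text, true, true);
    }

    private static List<String> tokens(String text, boolean lettersOnly, boolean lowercase) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean tokenChar = lettersOnly ? Character.isLetter(c) : !Character.isWhitespace(c);
            if (tokenChar) {
                current.append(lowercase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // A non-token char ends the current token.
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("zh-HK")); // prints [zh-HK]
        System.out.println(simpleTokens("zh-HK"));     // prints [zh, hk]
    }
}
```

A Keyword field holds the single term zh-HK. A query analyzed the "simple" way asks for zh and hk instead, so neither query token equals the indexed term.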

Query: hsbc -language:zh\-HK
Parsed query: (contents:hsbc -language:zh\-HK) (keywords:hsbc -language:zh\-HK) 
(title:hsbc -language:zh\-HK) (language:hsbc -language:zh\-HK)
Hits: 206
I'm guessing here, but I don't think the backslash is escaping; does it just
become part of the query?
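QueryParser uses the backslash as its escape character, so zh\-HK is the intended spelling of the query text. In Java source, that same text must be written as the literal "zh\\-HK", which is why the debugger shows the backslash doubled. A minimal stdlib-only escaping helper is sketched below. The class and method names are hypothetical, and the set of special characters is taken from the Lucene query syntax documentation. Later Lucene releases provide a static QueryParser.escape method for the same purpose. Escaping only affects parsing: the escaped term still has to match what the field's analyzer produced at index time.

```java
// Stdlib-only sketch (hypothetical helper, not Lucene's own) that backslash-
// escapes characters with special meaning in the QueryParser syntax.
class QueryEscapeSketch {

    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \   ("&&" and "||" are covered
    // by escaping every & and |).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = escape("zh-HK");
        // Prints language:zh\-HK; as a Java literal the value is "zh\\-HK".
        System.out.println("language:" + escaped);
    }
}
```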






Erik Hatcher <[EMAIL PROTECTED]> on 26 May 2004 15:11

Please respond to "Lucene Users List" <[EMAIL PROTECTED]>

To:"Lucene Users List" <[EMAIL PROTECTED]>
cc:
bcc:

Subject:Re: RE: RE: Query parser and minus signs


What is the value of your "Parsed query:" output?


On May 26, 2004, at 8:39 AM, [EMAIL PROTECTED] wrote:

>
>
>
>
> I switched to indexing using a text field instead of keyword, then I
> tried
> the following based on various pieces of advice:
>
> PerFieldAnalyzerWrapper pfaw =
>     new PerFieldAnalyzerWrapper(new ChineseAnalyzer());
> pfaw.addAnalyzer("language", new WhitespaceAnalyzer());
>
> try
> {
>     query = MultiFieldQueryParser.parse(queryString,
>         new String[]{"contents", "keywords", "title", "language"},
>         (Analyzer) pfaw);
>     System.out.println("Parsed query: " + query.toString());
> }
> catch (ParseException e)
> {
>     error = true;
>     e.printStackTrace();
> }
>
> I have tried both "language:zh-HK" and "language:zh\-HK" (which appears
> in the debugger as "language:zh\\-HK") as the query, and neither returns
> any hits. I've tried stepping through the code to see what is being
> indexed (which looks OK, at least to a relative beginner like myself), and
> also through the search code, but I'm still none the wiser.
>
> Am I doing something wrong, or have I completely missed the point?
>
>
>
> To:Alex BOURNE/IBEU/[EMAIL PROTECTED]
> cc:
> bcc:
>
> Subject:RE: RE: Query parser and minus signs
>
>
> Remember that Luke does not display the indexed tokens but the stored
> field, so you would expect to see en-uk in the field.
>
> doc.add(Field.Keyword("locale","test-uk"));
>
> Are you adding to the document like this?
>
> Also, what analyzer are you using to parse the query?
>
> org.apache.lucene.analysis.WhitespaceAnalyzer : parses as locale:en-uk
> org.apache.lucene.analysis.SimpleAnalyzer : parses as locale:en uk
> org.apache.lucene.analysis.standard.StandardAnalyzer : parses as locale:en uk
>
> Try using WhitespaceAnalyzer in Luke and see how it interprets the
> query. If you are storing the value as a keyword but searching with
> tokens, that may be your problem.
>
>
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: 24 May 2004 09:50
> To: Lucene Users List
> Subject: RE: RE: Query parser and minus signs
>
>
>
>
>
>
> I tried this, but no, it does not work. I'm concerned that escaping the
> minus sign does not appear to work. The field is indexed as a Keyword, so
> it is not tokenized; I've checked the contents using Luke, which confirms
> this.
>
>
>
>
> "David Townsend" <[EMAIL PROTECTED]> on 21 May 2004 17:02
>
> Please respond to "Lucene Users List" <[EMAIL PROTECTED]>
>
> To:"Lucene Users List" <[EMAIL PROTECTED]>
> cc:
> bcc:
>
> Subject:RE: RE: Query parser and minus signs
>
>
> Doesn't "en UK" as a phrase query work?
>
> You're probably indexing it as a text field so it's being tokenised.
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: 21 May 2004 16:47
> To: Lucene Users List
> Subject: Memo: RE: Query parser and minus signs
>
>
>
>
>
>
> Hmm, we may have to if there is no workaround. We're not using Java
> locales, but we were trying to stick to the ISO standard, which uses
> hyphens.
>
>
>
>
> "Ryan Sonnek" <[EMAIL PROTECTED]> on 21 May 2004 16:38
>
> Please respond to "Lucene Users List" <[EMAIL PROTECTED]>
>
> To:"Lucene Users List" <[EMAIL PROTECTED]>
> cc:
> bcc:
>
> Subject:RE: Query parser and minus signs
>
>
> If you're dealing with locales, why not use Java's built-in locale syntax
> (e.g. en_UK, zh_HK)?
>
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>> Sent: Friday, May 21, 2004 10:36 AM
>> To: [EMAIL PROTECTED]
>> Subject: Query parser and minus signs
>>
>>
>>
>>
>>
>>
>> Hi All,
>>
>> I'm using Lucene on a site that has split content with a
>>