Re: Memo: RE: RE: Query parser and minus signs

2004-05-26 Thread Erik Hatcher
What is the value of your "Parsed query:" output?
On May 26, 2004, at 8:39 AM, [EMAIL PROTECTED] wrote:


I switched to indexing using a text field instead of keyword, then I 
tried
the following based on various pieces of advice:

PerFieldAnalyzerWrapper pfaw = new 
PerFieldAnalyzerWrapper(new ChineseAnalyzer());
pfaw.addAnalyzer("language", new WhitespaceAnalyzer());

try
{
query = MultiFieldQueryParser.parse(queryString, new 
String[]{"contents", "keywords", "title", "language"}, (Analyzer) 
pfaw);
System.out.println("Parsed query: " + 
query.toString());
}
catch (ParseException e)
{
error = true;
e.printStackTrace();
}

I have tried both "language:zh-HK" and  "language:zh\-HK" (which 
appears in
the debugger as "language:zh\\-HK") as the query, and neither return 
any
hits. I've tried stepping through the code to see what is being indexed
(which looks OK at least to a relative beginner like myself), and also
through the search code but I'm still none the wiser.

Am I doing something wrong, or have I completely missed the point ??

To:Alex BOURNE/IBEU/[EMAIL PROTECTED]
cc:
bcc:
Subject:RE: RE: Query parser and minus signs
remember luke does not display the indexed tokens but the stored 
field.  So
you would expect to see en-uk in the field.

doc.add(Field.Keyword("locale","test-uk"));
are you adding to the document like this?
Also what analyzer you using to pass the query?
org.apache.lucene.analysis.WhitespaceAnalyzer : parses as locale:en-uk
org.apache.lucene.analysis.SimpleAnalyzer : parses as locale:en uk
org.apache.lucene.analysis.standard.StandardAnalyzer : parses as 
locale:en
uk

Try using whitespace analyzer in Luke and see how it's interpreting the
query.  If you are storing as a keyword but searching with tokens, it 
may
be your problem.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 24 May 2004 09:50
To: Lucene Users List
Subject: RE: RE: Query parser and minus signs


I tried this, but no it does not work. I'm concerned that escaping the
minus symbol does not appear to work. The field is indexed as a 
keyword so
is not tokenized - I've checked the contents using luke which confirms
this.


"David Townsend" <[EMAIL PROTECTED]> on 21 May 2004 17:02
Please respond to "Lucene Users List" <[EMAIL PROTECTED]>
To:"Lucene Users List" <[EMAIL PROTECTED]>
cc:
bcc:
Subject:RE: RE: Query parser and minus signs
Doesn't "en UK" as a phrase query work?
You're probably indexing it as a text field so it's being tokenised.
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 21 May 2004 16:47
To: Lucene Users List
Subject: Memo: RE: Query parser and minus signs


Hmm, we may have to if there is no work around. We're not using java
locales, but were trying to stick to the ISO standard which uses 
hyphens.


"Ryan Sonnek" <[EMAIL PROTECTED]> on 21 May 2004 16:38
Please respond to "Lucene Users List" <[EMAIL PROTECTED]>
To:"Lucene Users List" <[EMAIL PROTECTED]>
cc:
bcc:
Subject:RE: Query parser and minus signs
if you're dealing with locales, why not use java's built in locale 
syntax
(ex: en_UK, zh_HK)?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, May 21, 2004 10:36 AM
To: [EMAIL PROTECTED]
Subject: Query parser and minus signs


Hi All,
I'm using Lucene on a site that has split content with a
branch containing
pages in English and a separate branch in Chinese.  Some of
the chinese
pages include some (untranslatable) English words, so when a search is
carried out in either language you can get pages from the
wrong branch. To
combat this we introduced a language field into the index
which contains
the standard language codes: en-UK and zh-HK.
When you parse a query  e.g. language:"en\-UK" you could
reasonably expect
the search to recover all pages with the language field set
to "en-UK" (the
minus symbol should be escaped by the backslash according to the FAQ).
Unfortunately the parser seems to return "en UK" as the
parsed query and
hence returns no documents.
Has anyone else had this problem, or could suggest a
workaround ?? as I
have
yet to find a solution in the mailing archives or elsewhere.
Many thanks in advance,
Alex Bourne

_
This transmission has been issued by a member of the HSBC Group
("HSBC") for the information of the addressee only and should not be
reproduced and / or distributed to any other person. Each page
attached hereto must be read in conjunction with any disclaimer which
forms part of it. This transmission is neither an offer nor
the solicitation
of an offer to sell or purchase any investment. Its contents
are based
on information obtained from sources believed to be reliable but HSBC
makes no representation and accepts no responsibility or
liab

Memo: RE: RE: Query parser and minus signs

2004-05-26 Thread alex . bourne




I switched to indexing using a text field instead of keyword, then I tried
the following based on various pieces of advice:

PerFieldAnalyzerWrapper pfaw = new PerFieldAnalyzerWrapper(new 
ChineseAnalyzer());
pfaw.addAnalyzer("language", new WhitespaceAnalyzer());

try
{
query = MultiFieldQueryParser.parse(queryString, new 
String[]{"contents", "keywords", "title", "language"}, (Analyzer) pfaw);
System.out.println("Parsed query: " + query.toString());
}
catch (ParseException e)
{
error = true;
e.printStackTrace();
}

I have tried both "language:zh-HK" and  "language:zh\-HK" (which appears in
the debugger as "language:zh\\-HK") as the query, and neither return any
hits. I've tried stepping through the code to see what is being indexed
(which looks OK at least to a relative beginner like myself), and also
through the search code but I'm still none the wiser.

Am I doing something wrong, or have I completely missed the point ??



To:Alex BOURNE/IBEU/[EMAIL PROTECTED]
cc:
bcc:

Subject:RE: RE: Query parser and minus signs


remember luke does not display the indexed tokens but the stored field.  So
you would expect to see en-uk in the field.

doc.add(Field.Keyword("locale","test-uk"));

are you adding to the document like this?

Also what analyzer you using to pass the query?

org.apache.lucene.analysis.WhitespaceAnalyzer : parses as locale:en-uk
org.apache.lucene.analysis.SimpleAnalyzer : parses as locale:en uk
org.apache.lucene.analysis.standard.StandardAnalyzer : parses as locale:en
uk

Try using whitespace analyzer in Luke and see how it's interpreting the
query.  If you are storing as a keyword but searching with tokens, it may
be your problem.



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 24 May 2004 09:50
To: Lucene Users List
Subject: RE: RE: Query parser and minus signs






I tried this, but no it does not work. I'm concerned that escaping the
minus symbol does not appear to work. The field is indexed as a keyword so
is not tokenized - I've checked the contents using luke which confirms
this.




"David Townsend" <[EMAIL PROTECTED]> on 21 May 2004 17:02

Please respond to "Lucene Users List" <[EMAIL PROTECTED]>

To:"Lucene Users List" <[EMAIL PROTECTED]>
cc:
bcc:

Subject:RE: RE: Query parser and minus signs


Doesn't "en UK" as a phrase query work?

You're probably indexing it as a text field so it's being tokenised.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 21 May 2004 16:47
To: Lucene Users List
Subject: Memo: RE: Query parser and minus signs






Hmm, we may have to if there is no work around. We're not using java
locales, but were trying to stick to the ISO standard which uses hyphens.




"Ryan Sonnek" <[EMAIL PROTECTED]> on 21 May 2004 16:38

Please respond to "Lucene Users List" <[EMAIL PROTECTED]>

To:"Lucene Users List" <[EMAIL PROTECTED]>
cc:
bcc:

Subject:RE: Query parser and minus signs


if you're dealing with locales, why not use java's built in locale syntax
(ex: en_UK, zh_HK)?

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Friday, May 21, 2004 10:36 AM
> To: [EMAIL PROTECTED]
> Subject: Query parser and minus signs
>
>
>
>
>
>
> Hi All,
>
> I'm using Lucene on a site that has split content with a
> branch containing
> pages in English and a separate branch in Chinese.  Some of
> the chinese
> pages include some (untranslatable) English words, so when a search is
> carried out in either language you can get pages from the
> wrong branch. To
> combat this we introduced a language field into the index
> which contains
> the standard language codes: en-UK and zh-HK.
>
> When you parse a query  e.g. language:"en\-UK" you could
> reasonably expect
> the search to recover all pages with the language field set
> to "en-UK" (the
> minus symbol should be escaped by the backslash according to the FAQ).
> Unfortunately the parser seems to return "en UK" as the
> parsed query and
> hence returns no documents.
>
> Has anyone else had this problem, or could suggest a
> workaround ?? as I
> have
> yet to find a solution in the mailing archives or elsewhere.
>
> Many thanks in advance,
>
> Alex Bourne
>
>
>
> _
>
> This transmission has been issued by a member of the HSBC Group
> ("HSBC") for the information of the addressee only and should not be
> reproduced and / or distributed to any other person. Each page
> attached hereto must be read in conjunction with any disclaimer which
> forms part of it. This transmission is neither an offer nor
> the solicitation
> of an offer to sell or purchase any investment. Its contents
> are based
> on information obtained from sources believed to be reliable but HSBC
> makes no representation and acce