Re: Query Question

2005-02-18 Thread Luke Shannon
Thanks Erik. Option 2 sounds like the path of least resistance.
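
Option 2 from Erik's reply (replace asterisks in the indexed terms, then do the same on queries) could be sketched roughly like this. It is plain Java rather than Lucene code, and the class name AsteriskMasker, its method names, and the choice of the private-use character U+E000 as the stand-in are all assumptions made for illustration.

```java
/**
 * Sketch of "option 2": swap literal asterisks for a stand-in character
 * before indexing, and mask literal asterisks in user input the same way
 * before building a wildcard pattern. All names here are hypothetical.
 */
final class AsteriskMasker {
    // Assumption: this private-use character never appears in real data.
    static final char STAND_IN = '\uE000';

    /** Apply to a field value before it is indexed. */
    static String maskForIndex(String value) {
        return value.replace('*', STAND_IN);
    }

    /** Apply to a stored value before showing it to the user. */
    static String unmask(String value) {
        return value.replace(STAND_IN, '*');
    }

    /**
     * Build a wildcard pattern meaning "contains this literal text":
     * the literal part is masked first, then real wildcards are added.
     */
    static String containsPattern(String literal) {
        return "*" + maskForIndex(literal) + "*";
    }
}
```

With something like this in place, the name field would be indexed through maskForIndex, and the query term would become new Term("name", AsteriskMasker.containsPattern("home*")), so the only asterisks WildcardQuery sees are real wildcards.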

Luke

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Query Question

2005-02-17 Thread Luke Shannon
Hello;

My manager is now totally stuck about being able to query data with * in it.

Here are two queries.

new TermQuery(new Term("type", "203"));
new WildcardQuery(new Term("name", "*home\\**"));

They are joined in a boolean query. That query gives this result when you
call the toString():

+(type:203) +(name:*home\**)

This looks right to me.

Any theories as to why it would not match:

Document (relevant fields):
Keyword<type:203>
Keyword<name:marcipan + home*>

Is the \ escaping both * characters?

Thanks,

Luke




----- Original Message -----
From: Luke Shannon [EMAIL PROTECTED]
To: Lucene Users List lucene-user@jakarta.apache.org
Sent: Thursday, February 17, 2005 2:44 PM
Subject: Query Question


> Hello;
>
> Why won't this query find the document below?
>
> Query:
> +(type:203) +(name:*home\**)
>
> Document (relevant fields):
> Keyword<type:203>
> Keyword<name:marcipan + home*>
>
> I was hoping that by escaping the * it would be treated as a literal
> character. What am I doing wrong?
>
> Thanks,
>
> Luke






Re: Query Question

2005-02-17 Thread Erik Hatcher
On Feb 17, 2005, at 5:51 PM, Luke Shannon wrote:
> My manager is now totally stuck about being able to query data with *
> in it.

He's gonna have to wait a bit longer; you've got a slightly tricky
situation on your hands.

> WildcardQuery(new Term("name", "*home\\**"));

The \* is the problem.  WildcardQuery doesn't deal with escaping the way
you're trying.  Your query is essentially this now:

home\*

Here the backslash has no special meaning at all... you're literally
looking for all terms that start with home followed by a backslash.
Two asterisks at the end really collapse into a single one logically.

> Any theories as to why it would not match:
>
> Document (relevant fields):
> Keyword<type:203>
> Keyword<name:marcipan + home*>
>
> Is the \ escaping both * characters?

So, again, no escaping is being done here.  You're a bit stuck in this
situation because * (and ?) are special to WildcardQuery, and it does
no escaping.  Two options I can think of:

	- Build your own clone of WildcardQuery that does escaping - or 
perhaps change the wildcard characters to something you do not index 
and use those instead.

	- Replace asterisks in the terms indexed with some other non-wildcard 
character, then replace it on your queries as appropriate.
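
To make the first option more concrete, here is a minimal sketch of wildcard matching that honors backslash escapes. It is plain Java and does not reproduce WildcardQuery's actual implementation; the class name EscapingWildcard and the rule that a backslash makes the next character literal are assumptions for illustration. A real clone would also have to enumerate index terms, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical matcher: like * and ? wildcards, but \* and \? are literal. */
final class EscapingWildcard {
    private static final int STAR = -1;      // matches any run of characters
    private static final int QUESTION = -2;  // matches exactly one character

    /** Turn the pattern into tokens; a backslash escapes the next character. */
    private static int[] parse(String pattern) {
        List<Integer> tokens = new ArrayList<>();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                tokens.add((int) pattern.charAt(++i)); // escaped: literal
            } else if (c == '*') {
                // Adjacent unescaped stars collapse into a single one.
                if (tokens.isEmpty() || tokens.get(tokens.size() - 1) != STAR) {
                    tokens.add(STAR);
                }
            } else if (c == '?') {
                tokens.add(QUESTION);
            } else {
                tokens.add((int) c); // ordinary literal (or trailing backslash)
            }
        }
        int[] out = new int[tokens.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = tokens.get(i);
        }
        return out;
    }

    /** Returns true if the whole text matches the pattern. */
    static boolean matches(String pattern, String text) {
        int[] p = parse(pattern);
        int pi = 0, ti = 0;
        int starAt = -1, resumeAt = 0; // backtracking point for the last star
        while (ti < text.length()) {
            if (pi < p.length && (p[pi] == QUESTION || p[pi] == text.charAt(ti))) {
                pi++;
                ti++;
            } else if (pi < p.length && p[pi] == STAR) {
                starAt = pi++;
                resumeAt = ti;
            } else if (starAt >= 0) {
                pi = starAt + 1;   // let the last star swallow one more char
                ti = ++resumeAt;
            } else {
                return false;
            }
        }
        while (pi < p.length && p[pi] == STAR) {
            pi++;
        }
        return pi == p.length;
    }
}
```

Under these assumed rules, the pattern *home\** matches the term "marcipan + home*" because the escaped asterisk is compared as an ordinary character, while the leading and trailing asterisks remain wildcards.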

Erik


Re: Boolean Phrase Query question

2004-04-03 Thread Erik Hatcher
On Apr 3, 2004, at 12:13 PM, Ankur Goel wrote:
> Hi,
> I have to provide functionality that searches on both the file name
> and the contents of the file.
>
> For indexing I use the following code:
>
> org.apache.lucene.document.Document doc =
>     new org.apache.lucene.document.Document();
> doc.add(Field.Keyword("fileId", "" + document.getFileId()));
> doc.add(Field.Text("fileName", fileName));
> doc.add(Field.Text("contents", new FileReader(new File(fileName))));

I'm not sure what you plan on doing with the fileName field, but you
probably want to use a Keyword field for it.

And you may want to glue the file name and contents together into a
single field to facilitate searches that span both.  (Be sure to put a
space in between if you do this.)

> For searching a text, say "temp", I use the following code to look in
> both the file name and the contents of the file:
>
> BooleanQuery finalQuery = new BooleanQuery();
> Query titleQuery = QueryParser.parse("temp", "fileName", analyzer);
> Query mainQuery = QueryParser.parse("temp", "contents", analyzer);
> finalQuery.add(titleQuery, true, false);
> finalQuery.add(mainQuery, true, false);
> Hits hits = is.search(finalQuery);

By using true on the finalQuery.add calls, you have said that both
fields must have the word "temp" in them.  Is that what you meant?  Or
did you mean an OR type of query?

	Erik





RE: Boolean Phrase Query question

2004-04-03 Thread Ankur Goel
Thanks Erik for the solution. I have to keep the fileName field because I have
to let end users search on the file name as well. That's why I am using Text
for the file name too.

> By using true on the finalQuery.add calls, you have said that both
> fields must have the word "temp" in them.  Is that what you meant?  Or
> did you mean an OR type of query?

I need an OR type of query. I mean the word can be in the file name or in the
contents of the file. But I am not able to do this. Can you tell me how
to do it?

Regards,
Ankur 




Re: Boolean Phrase Query question

2004-04-03 Thread Erik Hatcher
On Apr 3, 2004, at 3:05 PM, Ankur Goel wrote:
>> By using true on the finalQuery.add calls, you have said that both
>> fields must have the word "temp" in them.  Is that what you meant?  Or
>> did you mean an OR type of query?
>
> I need an OR type of query. I mean the word can be in the file name or
> in the contents of the file. But I am not able to do this. Can you tell
> me how to do it?

I did tell you how to do it.  Use false for both the required and
prohibited flags when adding queries to a BooleanQuery.  Check the
javadocs for more details.
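
To illustrate how the two flags combine, here is a toy model in plain Java. It is not Lucene code: the ClauseModel and Clause names are hypothetical, and the acceptance rule encodes the usual reading of the flags (every required clause must match, no prohibited clause may match, and with no required clauses at least one optional clause must match), which should be checked against the javadocs.

```java
import java.util.List;

/** Toy model of required/prohibited clause flags; not the Lucene classes. */
final class ClauseModel {

    /** One clause: whether its sub-query matched, and how it was added. */
    static final class Clause {
        final boolean matched;
        final boolean required;
        final boolean prohibited;

        Clause(boolean matched, boolean required, boolean prohibited) {
            this.matched = matched;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    /** Decide whether a document is accepted given its clause outcomes. */
    static boolean accepts(List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyOptionalMatched = false;
        for (Clause c : clauses) {
            if (c.prohibited) {
                if (c.matched) {
                    return false;          // a prohibited clause matched
                }
            } else if (c.required) {
                anyRequired = true;
                if (!c.matched) {
                    return false;          // a required clause failed
                }
            } else if (c.matched) {
                anyOptionalMatched = true; // an optional clause matched
            }
        }
        return anyRequired || anyOptionalMatched;
    }
}
```

In the code quoted above, the OR behavior corresponds to finalQuery.add(titleQuery, false, false) and finalQuery.add(mainQuery, false, false): a file whose name does not contain the word but whose contents do is then still returned.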

Keep in mind (and see recent, and frequent, discussion on this topic) 
that your analyzer choice is very important.  Look at my intro Lucene 
article for code to allow you to view what is happening with the 
analysis process.

	Erik




RE: Query question

2004-02-06 Thread Justin Woody
Hi Erik,

Here is the IndexWriter with the StandardAnalyzer:
Class variable: IndexWriter writer;


writer = new IndexWriter(indexDirectory, new StandardAnalyzer(), true);

While looping over the ResultSet I call this method:

private void indexDoc(ResultSet rs) throws Exception {
    Document doc = new Document();

    doc.add(Field.UnIndexed("value", rs.getString("value")));
    doc.add(Field.UnIndexed("name", rs.getString("name")));

    doc.add(Field.UnStored("content", rs.getString("indexed")));

    writer.addDocument(doc);
}

The indexed data is a concatenation of the Code and Descriptor(s)
fields that they want to search by. They are concatenated with a space.
Ex. Select col1 as value, col2 as name, col3 || ' ' || col2 || ' ' ||
col5 as indexed from tableName. Since there are many tables that are
similar in structure, I wrote the queries like this so I could
multi-thread the re-indexing process on a frequent basis and use one
generic class.

Here is my test search class:

public IndexSearchTest(String search, String index) throws Exception {
    String indexName = dirLucene + index + "/";
    System.out.println("Index Name " + indexName);

    IndexSearcher searcher = new
        IndexSearcher(IndexReader.open(indexName));

    Query query = QueryParser.parse(search.toUpperCase(), "content",
        new StandardAnalyzer());

    Hits hits = searcher.search(query);
    Document result;
    System.out.println("Begin Search Results");
    for (int i = 0; i < hits.length(); i++) {
        result = hits.doc(i);
        System.out.println("Key :" + result.get("value") + " Desc: "
            + result.get("name"));
    }
    System.out.println("Finished Search: " + hits.length());
}

Thanks in advance,
Justin

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 05, 2004 6:34 PM
To: Lucene Users List
Subject: Re: Query question


On Feb 5, 2004, at 3:27 PM, Justin Woody wrote:
> If I search the index for "building" it comes back fine (2 records), or
> "builder" (1 record), but if I search for "build*" I only receive one
> record, in my example, the second record. The client would like all 3
> records to come back. Is there a way I can make that happen? I've been
> trying different query types and syntax, but haven't been able to
> succeed.

We need more details to know what is going on.  What analyzer are you
using with indexing?

How are you building the query objects?   QueryParser?  Same Analyzer
as with indexer?

(Succinct) code is the best :)

Erik




Re: Query question

2004-02-06 Thread Erik Hatcher
Everything you are doing looks ok to me.  Next step is to run some 
sample text through something like the AnalyzerDemo.analyze method 
shown here:

	http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Be sure to use real world data, although "builder building" would be a
good first pass to ensure all is working well.  If you are really
searching for "build*" using the code you've shown (without the
quotes!) then it should work from my quick look at what you've done.

	Erik



RE: Query question

2004-02-06 Thread Justin Woody
Hi Erik,

The analysis class is parsing the terms as expected. However, no partial
terms will return results. I've tried the following:
build
build*
"build"
"build*"

All return 0 hits unless the entire word (in this case "build") appears.
I've tried this with multiple keywords. Any other ideas?
Thanks,
Justin

Looking forward to your book, there's not enough info out there for
Lucene.




RE: Query question

2004-02-06 Thread Justin Woody
Erik,

I think I found the problem. I thought queries were case sensitive, but
after running your AnalyzerDemo, it seems that it was indexing all of my
information in lower case. Anyway, when I did a toLowerCase() on my
search string, the expected results were returned. Does this sound
right?
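
The effect described here can be reproduced with a toy example in plain Java. It is not Lucene code: PrefixCaseDemo and its methods are hypothetical, and it assumes (as the AnalyzerDemo output suggests) that the index holds lower-cased terms while a prefix such as build* is compared against them exactly as typed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy illustration of why an upper-cased prefix finds nothing. */
final class PrefixCaseDemo {

    /** Return the indexed terms that start with the prefix, compared exactly. */
    static List<String> expand(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Lower-case user input the way the indexing analyzer is assumed to. */
    static String normalize(String userInput) {
        return userInput.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Assumed index contents after analysis: everything lower-cased.
        List<String> terms = Arrays.asList("build", "builder", "building");
        System.out.println(expand(terms, "BUILD"));            // prints []
        System.out.println(expand(terms, normalize("BUILD"))); // prints [build, builder, building]
    }
}
```

This matches the observation above: replacing toUpperCase() with toLowerCase() on the search string makes the prefix line up with what the analyzer stored.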

Thanks
Justin


RE: Newbie Phrase Query question

2004-02-05 Thread Scott Smith
Actually, I found your QueryParser Rules article the most useful.  It
explained a number of things that I had puzzled about.  Query.toString()
helped also.

So, obvious in hindsight, an exact phrase match still goes through the
tokenizer.  If there are stop words, stemming, etc., you need
to tokenize the phrase before trying to get an exact match.  Clearly,
that has implications for what "exact phrase match" means.

The toString() told me that the quotes are handled by the QueryParser.
The WebLucene CJK tokenizer works just fine with it and I didn't make
any changes to it.

The bad news is that after going through all of this, the code just
started to work as expected.  I'm not sure what I did to fix it.

There is a minor issue I found that I think works as documented, but I
wonder why it's that way.  If you enter a search string that's a
hyphenated word such as "fred-bill" (w/o the quotes), the QueryParser
generates a query to find all documents with fred but w/o bill.
I believe this is expected behavior based on the javadocs.  The effect
of this is that a hyphenated word gives unexpected results unless
surrounded by quotes.  Perhaps the syntax should have been "fred -bill"
(space before the hyphen required) to indicate that you didn't want bill
and that it's not a hyphenated word.  Seems a tad more general.  It's an
issue for me because my application deals with hyphenated words a lot
and I don't think my users would ever understand when quotes should be
used and when they should not (most of them won't figure out how to use
the "not" syntax).  I can solve it by requiring the user to enter a
space before the hyphen if they mean "not" and then have the search code
automatically add the quotes for hyphenated words.  It's just a little
painful.  Just a thought for 1.4. ;-)
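
The workaround described in the last paragraph might look something like the sketch below. It is plain Java; the HyphenQuoter name is hypothetical, and it assumes the user's input contains no quotes of its own and that a hyphenated word is a run of letters or digits joined by single hyphens.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processor: quote hyphenated words before query parsing. */
final class HyphenQuoter {
    // Letters/digits joined by hyphens, e.g. fred-bill or a-b-c.
    private static final Pattern HYPHENATED =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)+");

    /**
     * Wrap each hyphenated word in double quotes. A token that starts with
     * a hyphen (the "not" syntax) does not match and is left untouched.
     */
    static String quoteHyphenated(String userQuery) {
        StringBuilder out = new StringBuilder();
        for (String token : userQuery.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (HYPHENATED.matcher(token).matches()) {
                out.append('"').append(token).append('"');
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }
}
```

The result would then be handed to QueryParser in place of the raw input, so that fred-bill is parsed as a phrase while fred -bill keeps its "not" meaning.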


Re: Newbie Phrase Query question

2004-02-05 Thread Erik Hatcher
On Feb 5, 2004, at 8:19 PM, Scott Smith wrote:
> There is a minor issue I found that I think works as documented, but
> wonder why it's that way.  If you enter a search string that's a
> hyphenated word such as "fred-bill" (w/o the quotes), the QueryParser
> generates a search string to find all documents with fred but w/o bill.
> I believe this is expected behavior based on the javadocs.

This is actually a documented bug that needs to be fixed.  If there is
no whitespace, the dash should not be taken as term negation, but
rather the entire unit should be passed to the analyzer.

	Erik



Newbie Phrase Query question

2004-02-03 Thread Scott Smith
I'm having problems searching for an exact match with a phrase.
Essentially, I think my problem is that the tokenizer is tossing the
double quotes around the phrase and tokenizing each word, so I end up
with the document hit I want plus several more I don't (the latter
having some of the words, but not exact matches).  Here are the specifics.


First, I'm using the CJKTokenizer from WebLucene, which I believe is a
modified version of the stopword tokenizer enhanced to handle Asian
characters (that's according to the header; I don't think the Asian
characters have anything to do with my problem).

The documents I need to search, for reasons related to the application,
often end up with hyphenated words in critical places.  For example, the
original text to be indexed might be something like "this is Bill-Fred".

When this is tokenized initially, I end up with two tokens, "bill" and
"fred" (the tokenizer converts to lower case; "this" and "is" are
removed as stop words; the hyphen is removed by the tokenizer).  So far
so good.
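
The tokenization steps just described can be mimicked with a small stand-in written in plain Java. This is not the CJKTokenizer: ToyAnalyzer, its splitting rule, and its four-word stop list are assumptions chosen only to reproduce the behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for an analyzer: split, lower-case, drop stop words. */
final class ToyAnalyzer {
    // Assumption: a tiny stop list for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("this", "is", "a", "the"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit, so hyphens
        // and double quotes both disappear.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Under these assumptions, both the indexed text "this is Bill-Fred" and a quoted search string containing Bill-Fred reduce to the same two tokens, bill and fred. The quotes never survive tokenization, so whether the pair is treated as a phrase has to be decided by whatever sits above the tokenizer.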

I pass the phrase I want an exact match on to a QueryParser in quotes
(so Bill-Fred is the search string; quotes included).  I watched the
output of the tokenizer from the query parser and it is clearly tossing
the double quotes and tokenizing each word separately.  It passes the
words bill and fred as separate entities back to the QueryParser.
Looking at the tokenizer code, I understand why.  Obviously, that's why
I end up with documents that contain the words even if they are not
exact matches.

Here's the question.  I can modify the CJKTokenizer so that when it sees
Fred-Bill it creates a single token that looks like fred bill.
Would this now work?  Is this the right thing to do?  I realize this
means that I'd hit on Fred-Bill and Fred Bill, but I can probably
live with that.  

However, it also seems like I now have a problem if the original text
contains a quotation from someone that happens to be part of the
document (i.e., the original text has double quotes in it).  It seems
like I need to ignore quotes for the initial index, but use them to
build phrases when I'm tokenizing a search string in the QueryParser.
Do I need two tokenizers?

Does any of this make any sense?  I'm not quite sure what the
QueryParser wants to see to properly do a phrase match.  Is QueryParser
the wrong thing to be using here?  Suggestions or comments?

Scott
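
To make the difference between "contains some of the words" and "exact phrase" concrete, here is a small plain-Java sketch. It is only a model of the behaviour described in the message: the toy analyze method stands in for the tokenizer (lower-casing, splitting on the hyphen, dropping "this" and "is"), and it ignores details such as token positions left behind by removed stop words. None of the names are Lucene or WebLucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model, not Lucene API.
class PhraseMatchSketch {
    static final Set<String> STOP = new HashSet<>(Arrays.asList("this", "is"));

    // Lower-case, split on anything that is not a letter or digit, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True when at least one query token occurs anywhere in the document.
    static boolean matchesAnyWord(List<String> doc, List<String> query) {
        for (String q : query) {
            if (doc.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // True only when the query tokens occur consecutively and in order.
    static boolean matchesPhrase(List<String> doc, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = analyze("\"Bill-Fred\"");
        List<String> exact = analyze("this is Bill-Fred");
        List<String> loose = analyze("Fred met Bill");
        System.out.println(matchesPhrase(exact, query));
        System.out.println(matchesAnyWord(loose, query));
        System.out.println(matchesPhrase(loose, query));
    }
}
```

The second document is the kind of unwanted hit described above: it matches when the words are treated as separate entities but not when they are treated as a phrase.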




Re: Newbie Phrase Query question

2004-02-03 Thread Erik Hatcher
The best suggestion I have is to look at the code in my first java.net 
article (Intro Lucene) and borrow the Analyzer utility code to see what 
happens to a sample string as it is analyzed.  Then pass that same 
string to QueryParser (along with the same analyzer) and see what the 
Query.toString(default field name) returns.  This should shed light 
on the issue more clearly.

	Erik

On Feb 3, 2004, at 10:01 PM, Scott Smith wrote:
> I'm having problems searching for an exact match with a phrase.
> [...]


Re: Query question

2004-01-13 Thread Erik Hatcher
On Jan 12, 2004, at 7:49 PM, Scott Smith wrote:
> Does the following do that:
>
> BooleanQuery QA = new BooleanQuery();
> Query qa1 = QueryParser.parse(A1, FieldA, analyzer());
> Query qa2 = QueryParser.parse(A2, FieldA, analyzer());
> QA.add(qa1, false, false);  // this term is not required
> QA.add(qa2, false, false);  // this term is not required
> BooleanQuery QB = new BooleanQuery();
> Query qb1 = QueryParser.parse(B1, FieldB, analyzer());
> Query qb2 = QueryParser.parse(B2, FieldB, analyzer());
> QB.add(qb1, false, false);  // this term is not required
> QB.add(qb2, false, false);  // this term is not required
> BooleanQuery Qfinal = new BooleanQuery();
> Qfinal.add(QA, true, false);  // gotta have at least one from here
> Qfinal.add(QB, true, false);  // gotta have at least one from here
> hits = mySearcher.search(Qfinal);

Your use of QueryParser is unnecessary.  Simply construct TermQuery's 
instead.   Otherwise, what you are doing looks fine.

> I guess I'm assuming that if I add queries to a BooleanQuery and none
> of the items are required, there still needs to be a hit on at least
> one of the items for the Document to make it out of the BooleanQuery.

Right.   A OR B means that either A or B has to be present; if 
neither is present then there is no match.

> Is this the right way to do this?  Is there an easier/faster way to do
> the same thing?

You're asking a pretty general question - are you really just using two 
terms for each field?  What you've shown based on the example (with the 
exception of using QueryParser) is fine.

	Erik
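
For readers who want to check the logic of (A1 OR A2) AND (B1 OR B2) without an index at hand, here is a plain-Java model of it. It is a sketch of the semantics only: documents are modelled as maps from field name to a set of terms, and none of the class or method names are Lucene API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model of the query logic; not Lucene API.
class NestedBooleanSketch {
    // Builds a toy document: field name -> set of terms in that field.
    static Map<String, Set<String>> doc(String[] fieldA, String[] fieldB) {
        Map<String, Set<String>> d = new HashMap<>();
        d.put("FieldA", new HashSet<>(Arrays.asList(fieldA)));
        d.put("FieldB", new HashSet<>(Arrays.asList(fieldB)));
        return d;
    }

    // Inner group of optional clauses: true when at least one term is present.
    static boolean anyOf(Map<String, Set<String>> d, String field, String... terms) {
        Set<String> indexed = d.get(field);
        if (indexed == null) {
            return false;
        }
        for (String t : terms) {
            if (indexed.contains(t)) {
                return true;
            }
        }
        return false;
    }

    // Outer query: both inner groups are required.
    static boolean matches(Map<String, Set<String>> d) {
        return anyOf(d, "FieldA", "A1", "A2") && anyOf(d, "FieldB", "B1", "B2");
    }

    public static void main(String[] args) {
        System.out.println(matches(doc(new String[] {"A1"}, new String[] {"B2"})));
        System.out.println(matches(doc(new String[] {"A1"}, new String[] {})));
    }
}
```

A document with A1 in FieldA and B2 in FieldB matches; a document with a hit in only one of the two fields does not, which is the "neither present, no match" rule applied to each inner group.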



RE: Query question

2004-01-13 Thread Scott Smith
So I can write:

Query q2 = new TermQuery(new Term(FieldA, a1));  // Term takes the field name first, then the text

And similar things for all of the QueryParser's.  This makes sense and I
assume must be more efficient than using the QueryParser for simple terms.

As you have guessed, there may be an arbitrary number of terms (not just 2)
but they are all simple words.  Some of the terms are generated
programmatically and not entered explicitly by the user.  But the code below
(even using TermQuery) seems like it should generalize to an arbitrary
number of terms.

I guess what is confusing me now is that the search code no longer
references an analyzer???!!!  How does it know how to tokenize, stem, etc.
the search terms?

Thanks for the help

Scott




Re: Query question

2004-01-13 Thread Erik Hatcher
On Jan 13, 2004, at 5:21 PM, Scott Smith wrote:
> I guess what is confusing me now is that the search code no longer
> references an analyzer???!!!  How does it know how to tokenize, stem,
> etc. the search terms?

It doesn't.  A TermQuery is exactly as-is.  If you need the analysis 
part, you can use QueryParser or talk to an Analyzer directly and use 
the TokenStream it feeds you back to build TermQuery's by hand.  I 
would not recommend using QueryParser for code-generated queries - 
there are just too many variables in that equation for comfort (to me).

	Erik
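
The "analyze first, then build terms by hand" idea can be sketched without Lucene as follows. The analyze method is a hypothetical stand-in for whatever analyzer was used at index time, and each "field:token" string stands in for one TermQuery; neither is real Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-in model, not Lucene API.
class TermsByHandSketch {
    // Stand-in for an analyzer: lower-cases and splits on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // One "field:token" entry per analyzed token, in place of one TermQuery each.
    static List<String> termsFor(String field, String rawText) {
        List<String> terms = new ArrayList<>();
        for (String token : analyze(rawText)) {
            terms.add(field + ":" + token);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(termsFor("FieldA", "Bill-Fred"));
    }
}
```

The point of the sketch is that the raw text "Bill-Fred" never reaches the term lookup; only the analyzed tokens do, so they line up with what was indexed.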





Query question

2004-01-12 Thread Scott Smith
I have two fields, call them FieldA and FieldB.  I have a set of words I'm
looking for in FieldA, call them A1 and A2.  I have a different set of words
for FieldB, call them B1 and B2.  Now I want a hit list which contains items
that have at least one A item in FieldA and one B item in FieldB.  In
essence, I think I'm saying I want (A1 OR A2) AND (B1 OR B2)

Does the following do that:

BooleanQuery QA = new BooleanQuery();
Query qa1 = QueryParser.parse(A1, FieldA, analyzer());
Query qa2 = QueryParser.parse(A2, FieldA, analyzer());
QA.add(qa1, false, false);  // this term is not required
QA.add(qa2, false, false);  // this term is not required

BooleanQuery QB = new BooleanQuery();
Query qb1 = QueryParser.parse(B1, FieldB, analyzer());
Query qb2 = QueryParser.parse(B2, FieldB, analyzer());
QB.add(qb1, false, false);  // this term is not required
QB.add(qb2, false, false);  // this term is not required

BooleanQuery Qfinal = new BooleanQuery();
Qfinal.add(QA, true, false);// gotta have at least one from here
Qfinal.add(QB, true, false);// gotta have at least one from here

hits = mySearcher.search(Qfinal);

I guess I'm assuming that if I add queries to a BooleanQuery and none of
the items are required, there still needs to be a hit on at least one of the
items for the Document to make it out of the BooleanQuery.

Is this the right way to do this?  Is there an easier/faster way to do the
same thing?

Scott




RE: Query question

2003-09-11 Thread Rob Outar
Otis,

Are you referring to this:

"How do I retrieve all the values of a particular field that exists
within an index, across all documents?"

I need a query to do it; the only way clients access the index is via
queries, so they cannot write the code in the FAQ above.

Thanks,

Rob





RE: Query question

2003-09-11 Thread Otis Gospodnetic
Aha.  You can't do it with a query, unless you add a fixed-value field
to all documents added to your index.
e.g.
field:X

Then you can get all documents by searching for +field:X

Otis
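
A minimal sketch of the fixed-value field trick, using plain Java collections instead of a real index. The field name "field" and value "X" are taken from the message above; the class, its methods, and the sample echelon value are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory index, not Lucene API.
class MarkerFieldSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Adds a document, always attaching the fixed-value marker field:X.
    void add(Map<String, String> fields) {
        Map<String, String> stored = new HashMap<>(fields);
        stored.put("field", "X");
        docs.add(stored);
    }

    // Exact match on one field/value pair, like a single-term query.
    List<Map<String, String>> search(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MarkerFieldSketch index = new MarkerFieldSketch();
        Map<String, String> withEchelon = new HashMap<>();
        withEchelon.put("echelon", "3"); // sample value, made up for the demo
        index.add(withEchelon);
        index.add(new HashMap<String, String>());
        System.out.println(index.search("field", "X").size());
    }
}
```

Because every document carries the marker, searching for it returns the whole collection, whatever other fields each document has.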




Query question

2003-09-10 Thread Rob Outar
Hi all,

I have a field called echelon that is assigned to certain files.  Is
there a query I can write that will give me all files that have this field?

I have tried stuff like echelon:.+*, echelon:*, etc.  Some give a query
parser exception while others return nothing.

Let me know,

Rob






Re: Query question

2003-09-10 Thread Otis Gospodnetic
Go to Lucene FAQ at jGuru.com and search for the word 'all'.

Otis




Re: query question in trouble

2003-06-12 Thread Ulrich Mayring
Aviran Mordo wrote:
> "In" is probably a stop word in your analyzer.

Actually, I think it's not a good idea to apply stop words when the user 
searches with an exact string.

Ulrich





query question in trouble

2003-06-11 Thread Ryan Clifton
Hello,

Upon reviewing the results of some queries recently, I noticed that the query "in 
trouble" always searches for "trouble".

Is 'in' a keyword that I'm not aware of?  I searched the whole query syntax page and 
didn't see it mentioned.  I tried "an trouble" and the query worked fine.  The query 
parser appears to be stripping out 'in', but not doing anything with it.

Here's my log:

**Query: in trouble
2003-06-11 12:08:50,540 DEBUG Searching for: textcontent:trouble (Query.toString())
2003-06-11 12:08:50,569 DEBUG 6582 total matching documents

**Query: an trouble
2003-06-11 12:06:11,275 DEBUG Searching for: textcontent:an trouble (Query.toString())
2003-06-11 12:06:12,342 DEBUG 1 total matching documents

Any ideas?

Thanks.




RE: query question in trouble

2003-06-11 Thread Aviran Mordo
"In" is probably a stop word in your analyzer.




RE: query question in trouble

2003-06-11 Thread Ryan Clifton
Actually, I'm using the StandardAnalyzer.  I'm pretty much using an off-the-shelf 
implementation of Lucene.



RE: query question in trouble

2003-06-11 Thread Ryan Clifton
Ok, well you were right.

public class StandardAnalyzer extends Analyzer {
  private Hashtable stopTable;

  /** An array containing some common English words that are usually not
  useful for searching. */
  public static final String[] STOP_WORDS = {
    "a", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "s", "such",
    "t", "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with"
  };
  // ... (rest of the class omitted)
}

Thanks.
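
As a quick check of why "in trouble" and "an trouble" behave differently, here is a plain-Java sketch that applies the stop-word list quoted above to a query string. It only models the filtering step (lower-casing and whitespace splitting are simplifying assumptions) and is not the StandardAnalyzer implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Models only the stop-word filtering step; not Lucene API.
class StopWordSketch {
    // The stop-word list as quoted from StandardAnalyzer in the message above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Returns the words of the query that survive stop-word removal.
    static List<String> filter(String query) {
        List<String> kept = new ArrayList<>();
        for (String w : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                kept.add(w);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter("in trouble"));
        System.out.println(filter("an trouble"));
    }
}
```

"in" is on the list and is dropped, leaving only "trouble"; "an" is not on the list, so both words survive, which agrees with the two log entries in the original question.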



Re: Boolean Query Question

2003-01-24 Thread Otis Gospodnetic
You should not get any documents that contain 'e' in the search
results.
-e means 'e' is verboten!

Otis

--- alex [EMAIL PROTECTED] wrote:
> Hi all
>
> If I enter a search, say:  +a  +b  +c   -e
> this returns a set of results containing a AND b AND c.  If I find in
> the results there is a term e as well, does this mean the search failed
> or is this correct?  Can someone explain please?
>
> thxs
>
> Alex
 



Re: Boolean Query Question

2003-01-24 Thread alex
Thx for the answer

