Question about extending the query parser to support NumericField on Lucene 2.9.0
Hi, I have a problem getting NumericField support to work in the query parser.
My environment is like this:
Windows XP with
C:\work\> java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)
I am using the lucene 2.9.0 releases.
I wrote my query parser class to support this numeric field; here is a copy of
the overridden methods:
/**
 * Create a new range query for the query parser.
 *
 * If the field is a numeric field, return a NumericRangeQuery;
 * otherwise, let the superclass handle it.
 *
 * @param fieldName The field name
 * @param part1 The lower bound
 * @param part2 The upper bound
 * @throws IllegalArgumentException if the field type is not supported
 * @throws NumberFormatException if the query data does not match the field type
 */
@Override
protected Query newRangeQuery(String fieldName, String part1, String part2,
                              boolean inclusive)
{
    fieldName = fieldName.toLowerCase();
    if (LogUtil.getInstance().isDebugEnabled(DcQueryParser.class))
    {
        LogUtil.getInstance().debug(DcQueryParser.class,
            "Create a new range query for: " + fieldName);
    }
    mFieldNames.add(fieldName);
    IFieldDefinition fieldDef = mIndexDef.getFieldDefinition(fieldName);
    // Trim first, then strip a leading '+', so " +5" is handled correctly
    part1 = part1.trim();
    if (part1.startsWith("+"))
    {
        part1 = part1.substring(1);
    }
    part2 = part2.trim();
    if (part2.startsWith("+"))
    {
        part2 = part2.substring(1);
    }
    if (fieldDef != null && fieldDef.isNumericField())
    {
        if (fieldDef.getFieldType() == IFieldDefinition.FieldType.INT)
        {
            return NumericRangeQuery.newIntRange(fieldDef.getName(),
                Integer.parseInt(part1), Integer.parseInt(part2), inclusive, inclusive);
        }
        else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.LONG)
        {
            return NumericRangeQuery.newLongRange(fieldDef.getName(),
                Long.parseLong(part1), Long.parseLong(part2), inclusive, inclusive);
        }
        else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.FLOAT)
        {
            return NumericRangeQuery.newFloatRange(fieldDef.getName(),
                Float.parseFloat(part1), Float.parseFloat(part2), inclusive, inclusive);
        }
        else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.DOUBLE)
        {
            return NumericRangeQuery.newDoubleRange(fieldDef.getName(),
                Double.parseDouble(part1), Double.parseDouble(part2), inclusive, inclusive);
        }
        else
        {
            throw new IllegalArgumentException("Unsupported numeric field type: "
                + fieldDef.getFieldType().name());
        }
    }
    else
    {
        return super.newRangeQuery(fieldName, part1, part2, inclusive);
    }
}
/**
 * Create a new term query for the query parser.
 * If the field is a numeric field, use the prefix-coded form;
 * otherwise, let the superclass handle it.
 *
 * @param term The term object
 * @return The query object
 * @throws IllegalArgumentException if the field type is not supported
 * @throws NumberFormatException if the query data does not match the field type
 */
@Override
protected Query newTermQuery(Term term)
{
    String fieldName = term.field();
    if (LogUtil.getInstance().isDebugEnabled(DcQueryParser.class))
    {
        LogUtil.getInstance().debug(DcQueryParser.class,
            "Create a new term query for: " + fieldName);
    }
    mFieldNames.add(fieldName);
    IFieldDefinition fieldDef = mIndexDef.getFieldDefinition(fieldName);
    if (fieldDef != null && fieldDef.isNumericField())
    {
        String queryString = term.text().trim();
        if (queryString.startsWith("+"))
        {
            // Note: substring returns a new string; the result must be assigned
            queryString = queryString.substring(1);
        }
        if (fieldDef.getFieldType() == IFieldDefinition.FieldType.INT)
        {
            return new TermQuery(new Term(term.field(),
                NumericUtils.intToPrefixCoded(Integer.parseInt(queryString))));
        }
        else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.LONG)
        {
            return new TermQuery(new Term(term.field(),
                NumericUtils.longToPrefixCoded(Long.parseLong(queryString))));
        }
        else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.FLOAT)
        {
            return new TermQuery(new Term(term.field(),
                NumericUtils.floatToPrefixCoded(Float.parseFloat(queryString))));
        }
        else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.DOUBLE)
        {
            return new TermQuery(new Term(term.field(),
                NumericUtils.doubleToPrefixCoded(Double.parseDouble(queryString))));
        }
        else
        {
            throw new IllegalArgumentException("Unsupported numeric field type: "
                + fieldDef.getFieldType().name());
        }
    }
    return super.newTermQuery(term);
}
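For context on why the prefix-coded form matters at all (a stdlib-only sketch, not Lucene code; NumericUtils actually uses a denser binary encoding that also handles negatives): plain decimal strings do not sort in numeric order, so numeric values must be encoded before they can be compared as index terms.

```java
import java.util.Arrays;

public class NumericTermOrder {
    // A fixed-width, zero-padded rendering restores numeric order for
    // non-negative ints; Lucene's prefix coding solves the same problem
    // with a more compact binary scheme.
    static String pad(int value) {
        return String.format("%010d", value);
    }

    public static void main(String[] args) {
        String[] plain = {"9", "10", "100"};
        Arrays.sort(plain);
        // Lexicographic order: [10, 100, 9] - not numeric order
        System.out.println(Arrays.toString(plain));

        String[] padded = {pad(9), pad(10), pad(100)};
        Arrays.sort(padded);
        // Padded terms now sort 9 < 10 < 100 as intended
        System.out.println(Arrays.toString(padded));
    }
}
```

This is why both the range query and the term query above must go through the same encoding the indexer used, or the terms will never match.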
What is the best way to handle the primary key case during lucene indexing
Hi,

In our application, we allow the user to define a primary key in the document. We are using Lucene 2.9. When we index the data coming from the client, if the metadata defines a primary key, we have to do a search/update for every row based on that primary key. Here are our current problems:

1) If the metadata coming from the client defines a primary key (which can contain one or multiple fields), then for the data supplied by the client we have to make sure that a later row overrides an earlier row when they share the same primary key.
2) To do the above, we first have to loop through the data to check whether any later rows contain the same PK as earlier rows, building a map in memory so the latest row overrides the previous one. This is a very expensive operation.
3) Even then, for every row after the above filter step, we still have to search the current index to see whether any data with the same PK exists, and remove it before we add the new data to the index.

I want to know if anyone has the same PK requirement with Lucene. What is the best way to index data in this case?

First, I am wondering whether it is possible to remove step 2 above. The problem with Lucene is that when we add a document to the index, we canNOT search for it before it is committed, and we only commit once, when the whole data file is finished. So we have to loop through the data once to check whether any rows share the same PK within the data file. Is there a way for the index writer, before it commits anything, to merge on the PK data as new documents are added? That is, if a previously added document already contains the same PK data, just remove it and let the newly added document with that PK take its place. If we can do this, then the whole pre-checking step can be removed.
Second, for step 3 above, if searching the existing index is NOT avoidable, what is the fastest way to search by the PK? Of course we have already indexed all the PK fields. When we add new data, we have to search the existing index for every row by the PK fields to see whether it exists; if it does, we remove it and add the new one. We construct the query from the PK fields at run time, then search row by row. This is also very bad for indexing performance. Here is what I am thinking:

1) Can I use IndexReader.terms(Term)? I heard it is much faster than query searching. Is that right?
2) Currently we search row by row. Should I do it in batches? For example, combine 100 PK searches into one search using a boolean query, so one search gives me back all the indexed data for those 100 PKs; then I can remove them from the index using the result set. In this case I only need 1/100th of the search requests, which should be much faster than row by row in theory.

Please let me know any feedback. If you have ever dealt with PK data support, please share some thoughts and experience.

Thanks for your kind help.

_
Hotmail: Free, trusted and rich email service.
http://clk.atdmt.com/GBL/go/171222984/direct/01/
RE: What is the best way to handle the primary key case during lucene indexing
What I mean is that for one index, the client can define multiple fields in the index as the primary key (a composite key).

> Date: Mon, 16 Nov 2009 12:45:40 -0500
> Subject: Re: What is the best way to handle the primary key case during lucene indexing
> From: [email protected]
> To: [email protected]
>
> What is the form of the unique key? I'm a bit confused here by your comment:
> "which can contain one or multi fields".
>
> But it seems like IndexWriter.deleteDocuments should work here. It's easy
> if your PKs are single terms; there's even a deleteDocuments(Term[]) form.
> But this really *requires* that your PKs are single terms in a field. If
> your PKs are some sort of composite field, perhaps the iw.deleteDocuments(Query[])
> form would help, where each query is enough to uniquely identify your document.
>
> Best
> Erick
>
> On Mon, Nov 16, 2009 at 12:15 PM, java8964 java8964 wrote:
RE: What is the best way to handle the primary key case during lucene indexing
But can IndexWriter.updateDocument(Term, Document) handle the composite key case? If my primary key contains field1 and field2, can I use one Term to include both field1 and field2?

Thanks

> Date: Mon, 16 Nov 2009 09:44:35 -0800
> Subject: Re: What is the best way to handle the primary key case during lucene indexing
> From: [email protected]
> To: [email protected]
>
> The usual way to do this is to use:
>
> IndexWriter.updateDocument(Term, Document)
>
> This method deletes all documents with the given Term in it (this would be
> your primary key), and then adds the Document you want to add. This is the
> traditional way to do updates, and it is fast.
>
> -jake
>
> On Mon, Nov 16, 2009 at 9:15 AM, java8964 java8964 wrote:

_
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/171222986/direct/01/
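One way to make updateDocument(Term, Document) work with a composite key, sketched under assumptions (the `_pk` field name and the escaping scheme below are hypothetical, not something the thread confirms): index one extra NOT_ANALYZED field holding the concatenated, escaped PK values, so a single Term identifies the row.

```java
public class CompositeKey {
    // Join PK field values into one string, escaping the separator so that
    // ("a|b", "c") and ("a", "b|c") cannot collide.
    static String buildPk(String... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append('|');
            sb.append(values[i].replace("\\", "\\\\").replace("|", "\\|"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPk("a|b", "c")); // a\|b|c
        System.out.println(buildPk("a", "b|c")); // a|b\|c
        // With Lucene (hypothetical usage): store this value in a
        // NOT_ANALYZED "_pk" field at index time, then
        //   writer.updateDocument(new Term("_pk", buildPk(f1, f2)), doc);
        // deletes any earlier document with the same composite key and adds
        // the new one in a single call, removing the in-memory map of step 2.
    }
}
```

Whether this fits depends on being able to add the synthetic field at index time; the escaping is what guarantees distinct key tuples never produce the same term.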
During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
I noticed a strange result from the following test case. For wildcard search,
my understanding is that Lucene will NOT apply any analyzer to the query string.
But as the simple code below shows, it looks like Lucene lower-cases the search
query in the wildcard search. Why? If not, why does the following test case show
one hit for the lower-case wildcard search, but none for the upper-case data? My
original data is NOT analyzed, so it should be stored as the original data in
the index segment, right?
Lucene version: 2.9.0
JDK version: JDK 1.6.0_17
public class IndexTest1 {
    public static void main(String[] args) {
        try {
            Directory directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory,
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            doc.add(new Field("title", "BBB CCC", Field.Store.YES,
                Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            doc = new Document();
            doc.add(new Field("title", "ddd eee", Field.Store.YES,
                Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            writer.close();

            IndexSearcher searcher = new IndexSearcher(directory, true);
            PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(
                new StandardAnalyzer(Version.LUCENE_CURRENT));
            wrapper.addAnalyzer("title", new KeywordAnalyzer());
            Query query = new QueryParser("title", wrapper).parse("title:BBB*");
            System.out.println("hits of title = "
                + searcher.search(query, 100).totalHits);
            query = new QueryParser("title", wrapper).parse("title:ddd*");
            System.out.println("hits of title = "
                + searcher.search(query, 100).totalHits);
            searcher.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
The output:
hits of title = 0
hits of title = 1
_
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/201469227/direct/01/
RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
I would like to confirm your reply. You mean that the query parser does the
lower-casing? In fact, it looks like it only does this for the wildcard query,
right?
For the term query it does not, as shown if you change the line to:
Query query = new QueryParser("title", wrapper).parse("title:\"BBB CCC\"");
You will get 1 hit back. So in this case, the query parser class behaves
differently for term queries and wildcard queries.
We have to use the query parser in this case, but we have our own query parser
class extending the Lucene query parser class. Is there anything we can do
about it?
Will Lucene's query parser class be fixed for the above inconsistent
behavior?
Thanks
> From: [email protected]
> To: [email protected]
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the
> search string to lower case?
> Date: Mon, 1 Feb 2010 17:41:08 +0100
>
> Only query parser does the lower casing. For such a special case, I would
> suggest to use a PrefixQuery or WildcardQuery directly and not use query
> parser.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> > -Original Message-
> > From: java8964 java8964 [mailto:[email protected]]
> > Sent: Monday, February 01, 2010 5:27 PM
> > To: [email protected]
> > Subject: During the wild card search, will lucene 2.9.0 to convert the
> > search string to lower case?
RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
This is maybe something I am looking for. We are using the default value, which is true.

Let me examine this method more.

Thanks for your help.

> From: [email protected]
> To: [email protected]
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> Date: Mon, 1 Feb 2010 20:36:29 +0200
>
> Did you try queryParser.SetLowercaseExpandedTerms(false)?
>
> DIGY
>
> -----Original Message-----
> From: java8964 java8964 [mailto:[email protected]]
> Sent: Monday, February 01, 2010 8:11 PM
> To: [email protected]
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
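The lowercaseExpandedTerms behavior under discussion can be sketched without Lucene (assumed behavior: with the default of true, QueryParser lower-cases wildcard and prefix terms before matching, while NOT_ANALYZED index terms keep their original case):

```java
public class WildcardCaseDemo {
    // Simulated NOT_ANALYZED index: terms stored exactly as written.
    static final String[] TERMS = {"BBB CCC", "ddd eee"};

    // Simulated prefix search with lowercaseExpandedTerms=true: the prefix
    // is lower-cased first, so it can no longer match upper-case terms.
    static long prefixHits(String prefix) {
        String p = prefix.toLowerCase();
        long hits = 0;
        for (String t : TERMS) {
            if (t.startsWith(p)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("hits of title = " + prefixHits("BBB")); // 0
        System.out.println("hits of title = " + prefixHits("ddd")); // 1
    }
}
```

This reproduces the 0/1 hit counts from the original test case: "BBB*" becomes "bbb*", which never matches the stored "BBB CCC" term.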
confused by the lucene boolean query with wildcard result
Hi, I have the following test case pointing at the index generated in our
application. The result confuses me and I don't know the reason.
Lucene version: 2.9.0
JDK 1.6.0_18
public class IndexTest1 {
    public static void main(String[] args) {
        try {
            FSDirectory directory = FSDirectory.open(new File("/path_to_index_files"));
            IndexSearcher searcher = new IndexSearcher(directory, true);
            PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(
                new StandardAnalyzer());
            wrapper.addAnalyzer("f1string_sif", new KeywordAnalyzer());
            wrapper.addAnalyzer("f2string_ti",
                new StandardAnalyzer(Version.LUCENE_CURRENT));
            Query query = new QueryParser("f1string_sif",
                new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank*");
            System.out.println("query = " + query);
            System.out.println("hits = " + searcher.search(query, 100).totalHits);
            searcher.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
Output:
query = f2string_ti:subbank*
hits = 6
If I change the line to the following:
Query query = new QueryParser("f1string_sif",
    new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:rdmap*");
Output:
query = f2string_ti:rdmap*
hits = 4
The above results are both correct based on my data.
Now if I change the line to:
Query query = new QueryParser("f1string_sif",
    new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank* OR f2string_ti:rdmap*");
Output:
query = f2string_ti:subbank* f2string_ti:rdmap*
hits = 2
I assume the count in the last result should be at least max(6, 4), but it is
2. Is there any reason for that?
Thanks
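The expectation in the question can be stated abstractly (a stdlib sketch with hypothetical documents, not an explanation of the observed result): an OR of two prefix queries matches the union of the two result sets, so its hit count can never be smaller than the larger of the two.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UnionHits {
    // Return IDs of documents whose term starts with the given prefix.
    static Set<Integer> matches(Map<Integer, String> docs, String prefix) {
        Set<Integer> hit = new HashSet<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            if (e.getValue().startsWith(prefix)) hit.add(e.getKey());
        }
        return hit;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();
        for (int i = 0; i < 6; i++) docs.put(i, "subbank" + i);  // 6 matches
        for (int i = 6; i < 10; i++) docs.put(i, "rdmap" + i);   // 4 matches

        Set<Integer> union = matches(docs, "subbank");
        union.addAll(matches(docs, "rdmap"));
        System.out.println(union.size()); // the union is never smaller than max(6, 4)
    }
}
```

A count of 2 therefore implies the combined query is not matching the same terms the two individual runs matched, which is exactly what the question is probing.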
RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
Is there an analyzer like KeywordAnalyzer, but one that will also lower-case the data? Or do I have to write a custom analyzer myself?

Thanks

> From: [email protected]
> To: [email protected]
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> Date: Mon, 1 Feb 2010 14:24:00 -0500
>
> This is maybe something I am looking for. We are using the default value,
> which is true.
>
> Let me examine this method more.
>
> Thanks for your help.
RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
Thanks for your help. My concern now is that the field could be defined as store. So when the user receive the field data, we want to still show the original data, in upper case in this case. First, I don't think I can use queryParser.SetLowercaseExpandedTerms(false), which will remove the wildcard search case insensitive functionality for tokenized field. To handle this case, if the data is NOT tokenized, but contain upper case data, to be able do the wildcard search with uppercase letter, like 'BB*', I am thinking that I have to analyzer the non tokenized data, using a KeywordTokenizer plus lowercase the data. For your suggestion, will the data change to lower case and stored in the lucene when it being retrieved? Thanks > From: [email protected] > To: [email protected] > Subject: RE: During the wild card search, will lucene 2.9.0 to convert the > search string to lower case? > Date: Wed, 3 Feb 2010 11:17:27 +0100 > > For specific fields using a special TokenStream chain, there is no need to > write a separate analyzer. You can add fields to a document using a > TokenStream as parameter: new Field(name, TokenStream). > > As TokenStream just create a chain from Tokenizer and all Filters like: > > TokenStream ts = new KeywordTokenizer(new StringReader("your text to index")); > ts = new LowercaseFilter(ts); > ... > document.add("fieldname", ts); > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: [email protected] > > > > -Original Message- > > From: Ian Lea [mailto:[email protected]] > > Sent: Wednesday, February 03, 2010 11:06 AM > > To: [email protected] > > Subject: Re: During the wild card search, will lucene 2.9.0 to convert > > the search string to lower case? > > > > I think you'll have to write your own. Or just downcase the text > > yourself first. > > > > > > -- > > Ian. 
> > On Tue, Feb 2, 2010 at 9:30 PM, java8964 java8964 wrote:
> > >
> > > Is there an analyzer like the keyword analyzer, but one that also lowercases the data for Lucene? Or do I have to write a custom analyzer myself?
> > >
> > > Thanks
> > >
> > >> From: [email protected]
> > >> To: [email protected]
> > >> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> > >> Date: Mon, 1 Feb 2010 14:24:00 -0500
> > >>
> > >> This is maybe what I am looking for. We are using the default value, which is true.
> > >>
> > >> Let me examine this method more.
> > >>
> > >> Thanks for your help.
> > >>
> > >> > From: [email protected]
> > >> > To: [email protected]
> > >> > Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> > >> > Date: Mon, 1 Feb 2010 20:36:29 +0200
> > >> >
> > >> > Did you try queryParser.SetLowercaseExpandedTerms(false)?
> > >> >
> > >> > DIGY
> > >> >
> > >> > -----Original Message-----
> > >> > From: java8964 java8964 [mailto:[email protected]]
> > >> > Sent: Monday, February 01, 2010 8:11 PM
> > >> > To: [email protected]
> > >> > Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> > >> >
> > >> > I would like to confirm your reply. You mean that the query parser will do lowercasing. In fact, it looks like it only does this for wildcard queries, right?
> > >> >
> > >> > For the term query, it didn't, as proved by changing the line to:
> > >> >
> > >> > Query query = new QueryParser("title",
> > >> >         wrapper).parse("title:\"BBB CCC\"");
> > >> >
> > >> > You will get 1 hit back. So in this case the query parser class behaves differently for term queries and wildcard queries.
> > >> >
> > >> > We have to use the query parser in this case, but we have our own Query
> > >> > parser class extending the Lucene query parser class. Anything we can do
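The approach suggested in the thread above (KeywordTokenizer plus a lowercase filter) can be modeled without Lucene. This pure-Java sketch shows the two properties that answer the poster's storage question: the indexed term becomes a single lowercased token, while the stored value is kept verbatim, since analysis only affects the inverted index, never what `Field.Store.YES` returns. (Note also that in Lucene 2.9 a `Field` built from a `TokenStream` is index-only, so the original text would go into a separate stored-only field.) The class and method names here are illustrative, not Lucene APIs:

```java
import java.util.Locale;

public class KeywordLowercaseSketch {
    // Stand-in for KeywordTokenizer + LowerCaseFilter: the whole field
    // value becomes one lowercased indexed term.
    public static String indexedTerm(String fieldValue) {
        return fieldValue.toLowerCase(Locale.ROOT);
    }

    // QueryParser lowercases the wildcard term by default, so both sides
    // end up lowercase and the prefix match is case-insensitive.
    public static boolean wildcardHit(String fieldValue, String userPrefix) {
        return indexedTerm(fieldValue).startsWith(
                userPrefix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String stored = "BBB CCC";                      // returned to the user as-is
        System.out.println(wildcardHit(stored, "BB"));  // true: 'BB*' now matches
        System.out.println(stored);                     // original case preserved
    }
}
```

So the user-visible data keeps its original upper case; only the terms used for matching are lowercased.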
RE: confused by the lucene boolean query with wildcard result
Thanks for your help. I upgraded Lucene to 2.9.1 and the problem is gone. It looks like a boolean query bug in Lucene 2.9.0 that was fixed in 2.9.1.

Thanks

> From: [email protected]
> Date: Wed, 3 Feb 2010 10:02:27 +
> Subject: Re: confused by the lucene boolean query with wildcard result
> To: [email protected]
>
> You should probably be using your PerFieldAnalyzerWrapper in your
> calls to QueryParser but apart from that I can't see any obvious
> reason. General advice: use Luke to check what has been indexed and
> read
> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>
> If none of these help, post again showing what you are indexing as
> well as how you are searching - the smallest possible test case or
> self-contained program that shows the problem.
>
> Or maybe someone else will spot the problem.
>
> --
> Ian.
>
> On Tue, Feb 2, 2010 at 8:56 PM, java8964 java8964 wrote:
> >
> > Hi, I have the following test case pointing at the index generated by our
> > application. The result is confusing me and I don't know the reason.
> >
> > Lucene version: 2.9.0
> > JDK 1.6.0_18
> >
> > public class IndexTest1 {
> >     public static void main(String[] args) {
> >         try {
> >             FSDirectory directory = FSDirectory.open(new File("/path_to_index_files"));
> >             IndexSearcher searcher = new IndexSearcher(directory, true);
> >             PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(
> >                     new StandardAnalyzer(Version.LUCENE_CURRENT));
> >             wrapper.addAnalyzer("f1string_sif", new KeywordAnalyzer());
> >             wrapper.addAnalyzer("f2string_ti", new StandardAnalyzer(Version.LUCENE_CURRENT));
> >             Query query = new QueryParser("f1string_sif",
> >                     new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank*");
> >             System.out.println("query = " + query);
> >             System.out.println("hits = " + searcher.search(query, 100).totalHits);
> >             searcher.close();
> >         } catch (Exception e) {
> >             System.out.println(e);
> >         }
> >     }
> > }
> >
> > Output:
> > query = f2string_ti:subbank*
> > hits = 6
> >
> > If I change the line to the following:
> >
> > Query query = new QueryParser("f1string_sif",
> >         new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:rdmap*");
> >
> > Output:
> > query = f2string_ti:rdmap*
> > hits = 4
> >
> > Both of the above results are correct based on my data.
> >
> > Now if I change the line to:
> >
> > Query query = new QueryParser("f1string_sif",
> >         new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank* OR f2string_ti:rdmap*");
> >
> > Output:
> > query = f2string_ti:subbank* f2string_ti:rdmap*
> > hits = 2
> >
> > I assume the count in the last result should be at least max(6, 4), but it is 2. Any reason for that?
> >
> > Thanks
>
> -----
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
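The poster's sanity check in the thread above is sound: an OR of two clauses (SHOULD clauses in a BooleanQuery) scores the union of the per-clause hit sets, so the combined count can never be below the larger individual count. A pure-Java sketch of that invariant, with hit sets sized to mirror the 6- and 4-hit results above (`or` is an illustrative helper, not a Lucene API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BooleanOrSketch {
    // OR semantics: the result is the union of the clause hit sets, so
    // |A OR B| >= max(|A|, |B|). The 6/4/2 numbers in the thread above
    // violate this, which is why upgrading past the 2.9.0 bug fixed it.
    public static Set<Integer> or(Set<Integer> hitsA, Set<Integer> hitsB) {
        Set<Integer> union = new HashSet<Integer>(hitsA);
        union.addAll(hitsB);
        return union;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids: 6 hits for subbank*, 4 hits for rdmap*,
        // with docs 5 and 6 matching both clauses.
        Set<Integer> subbank = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Set<Integer> rdmap = new HashSet<Integer>(Arrays.asList(5, 6, 7, 8));
        System.out.println(or(subbank, rdmap).size()); // 8, never below 6
    }
}
```

Any result below 6 for the combined query, whatever the overlap, indicates a scoring bug rather than a query-construction mistake.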
