Re: Using lucene queries to search StringFields

2015-06-21 Thread Jack Krupansky
Unlike Solr, which customizes the query parser to do field-specific
analysis and analyzes only tokenized fields (not string fields), the Lucene
query parser unconditionally analyzes every query term, for every field,
with the single analyzer you give it. Here that is the WhitespaceAnalyzer,
which splits your quoted term "1 2" at the embedded space into two separate
terms. The parser then generates a PhraseQuery rather than a single
TermQuery, and phrase queries are not supported on non-tokenized fields such
as StringField, which are indexed without position data.
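A minimal sketch that shows what the parser builds from the question's query strings, assuming Lucene 5.x with lucene-core, lucene-analyzers-common and lucene-queryparser on the classpath. The class ParserAnalysisDemo, its parsedType helper and the default field "A" are hypothetical (the question passed null as the default field). It also illustrates a related point: the classic parser splits unquoted input on whitespace before analysis, so B:1 2 becomes a clause B:1 plus a separate clause for 2 on the default field, which would explain why the unquoted query matched the third document (whose B value is "1").

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

class ParserAnalysisDemo {

    // Parses queryText with the classic QueryParser and returns the simple
    // class name of the Query it built. "A" is an assumed default field so
    // that unqualified terms have somewhere to go.
    static String parsedType(Analyzer analyzer, String queryText) {
        try {
            Query q = new QueryParser("A", analyzer).parse(queryText);
            return q.getClass().getSimpleName();
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        Analyzer ws = new WhitespaceAnalyzer();
        // Quoted: the analyzer emits two tokens, "1" and "2", so the parser
        // builds a PhraseQuery, which needs positions that field B lacks.
        System.out.println(parsedType(ws, "B:\"1 2\""));
        // Unquoted: the parser splits on whitespace first, giving a
        // BooleanQuery of B:1 and a separate clause for "2" on field A.
        System.out.println(parsedType(ws, "B:1 2"));
    }
}
```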

Use the KeywordAnalyzer, which will not split a quoted string into multiple
terms:
http://lucene.apache.org/core/5_2_0/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html
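A minimal end-to-end sketch of this suggestion, assuming Lucene 5.x and an in-memory RAMDirectory. It indexes the three documents from the question (leaving out the "Publish Date" field, which does not affect matching) and parses the quoted query with a KeywordAnalyzer-backed QueryParser, so "1 2" stays one token and the parser builds a TermQuery. KeywordParserSearch, buildIndex and search are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class KeywordParserSearch {

    // Indexes the question's documents: A is tokenized, B is a StringField.
    static Directory buildIndex() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(new WhitespaceAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            String[][] rows = { {"1", "1 2 3"}, {"2", "1 2"}, {"3", "1"} };
            for (String[] row : rows) {
                Document doc = new Document();
                doc.add(new TextField("A", row[0], Field.Store.YES));
                doc.add(new StringField("B", row[1], Field.Store.NO));
                writer.addDocument(doc);
            }
        } // closing the writer commits the documents
        return dir;
    }

    // Parses queryText with a KeywordAnalyzer-backed parser and returns the
    // stored "A" values of the matching documents.
    static List<String> search(String queryText) {
        try (Directory dir = buildIndex();
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = new QueryParser("B", new KeywordAnalyzer()).parse(queryText);
            List<String> values = new ArrayList<>();
            for (ScoreDoc hit : searcher.search(q, 100).scoreDocs) {
                values.add(searcher.doc(hit.doc).get("A"));
            }
            return values;
        } catch (IOException | ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(search("B:\"1 2\""));
    }
}
```

With this setup, search("B:\"1 2\"") should return only the second document's A value, and search("B:1") only the third one's.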

You can also simply escape the spaces with a backslash rather than quote
the entire term, but you still need to use the keyword analyzer.
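A small sketch of the escaping variant, again assuming Lucene 5.x. The helper escapeSpaces is hypothetical and only escapes spaces; other query-syntax characters in user input (such as :, *, ? or ") would need escaping as well. With the KeywordAnalyzer, the escaped text B:1\ 2 should parse to a single TermQuery on B whose text is "1 2".

```java
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class EscapedSpaceQuery {

    // Hypothetical helper: puts a backslash before every space so the
    // classic parser keeps the value as one term. Handles spaces only.
    static String escapeSpaces(String value) {
        return value.replace(" ", "\\ ");
    }

    // Parses B:<escaped value> with the KeywordAnalyzer and returns the
    // single term it produced; fails if the result is not a TermQuery.
    static Term parseExact(String value) {
        try {
            Query q = new QueryParser("B", new KeywordAnalyzer())
                    .parse("B:" + escapeSpaces(value));
            if (!(q instanceof TermQuery)) {
                throw new IllegalStateException("expected TermQuery, got " + q);
            }
            return ((TermQuery) q).getTerm();
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Term t = parseExact("1 2");
        System.out.println(t.field() + " -> [" + t.text() + "]");
    }
}
```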


-- Jack Krupansky

On Fri, Jun 19, 2015 at 2:31 AM, Gimantha Bandara giman...@wso2.com wrote:

 Correction:

 The second time, I used the following code to test, and that is when I got
 the IllegalStateException I reported:

 w = new QueryParser(null, new WhitespaceAnalyzer()).parse("B:\"1 2\"");

 not this one:

 w = new QueryParser(null, new WhitespaceAnalyzer()).parse("\"B:1 2\"");

 Can someone point out the correct way to query StringFields?

 Thanks,

 On Thu, Jun 18, 2015 at 2:12 PM, Gimantha Bandara giman...@wso2.com
 wrote:

  Hi all,
 
  I have created Lucene documents like the ones below.

  Document doc = new Document();
  doc.add(new TextField("A", "1", Field.Store.YES));
  doc.add(new StringField("B", "1 2 3", Field.Store.NO));
  doc.add(new TextField("Publish Date", "2010", Field.Store.NO));
  indexWriter.addDocument(doc);

  doc = new Document();
  doc.add(new TextField("A", "2", Field.Store.YES));
  doc.add(new StringField("B", "1 2", Field.Store.NO));
  doc.add(new TextField("Publish Date", "2010", Field.Store.NO));
  indexWriter.addDocument(doc);

  doc = new Document();
  doc.add(new TextField("A", "3", Field.Store.YES));
  doc.add(new StringField("B", "1", Field.Store.NO));
  doc.add(new TextField("Publish Date", "2012", Field.Store.NO));
  indexWriter.addDocument(doc);
 
  Now I am using the following code to test the StringField behavior.
 
  Query w = null;
  try {
      w = new QueryParser(null, new WhitespaceAnalyzer()).parse("B:1 2");
  } catch (ParseException e) {
      e.printStackTrace();
  }
  TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
  searcher.search(w, collector);
  ScoreDoc[] hits = collector.topDocs(0).scoreDocs;
  Document indexDoc;
  for (ScoreDoc doc : hits) {
      indexDoc = searcher.doc(doc.doc);
      System.out.println(indexDoc.get("A"));
  }
 
  The above code should print only the second document's 'A' value, since it
  is the only one where 'B' has the value '1 2'. But it returns the third
  document instead. So I tried putting double quotation marks around the 'B'
  value, as below.
 
  w = new QueryParser(null, new WhitespaceAnalyzer()).parse("\"B:1 2\"");
 
  It gives the following error.
 
  Exception in thread "main" java.lang.IllegalStateException: field "B" was
  indexed without position data; cannot run PhraseQuery (term=1)
      at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
      at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)

  Is my search query wrong? (Note: I am using the whitespace analyzer
  everywhere.)
 
  --
  Gimantha Bandara
  Software Engineer
  WSO2. Inc : http://wso2.com
  Mobile : +94714961919
 






Re: Using lucene queries to search StringFields

2015-06-21 Thread Gimantha Bandara
@Sheng
I am using StandardAnalyzer

@Ahmet
I know that using the query object will simply work. But I have a
requirement where the user inserts the whole string, and I want to return
the doc that exactly matches the given text.
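If the user's text has to go through the QueryParser (for example because users type field:value syntax), one option is sketched below, assuming Lucene 5.x: a PerFieldAnalyzerWrapper (from analyzers-common) that uses the KeywordAnalyzer for the StringField B and the WhitespaceAnalyzer for every other field, so tokenized fields are still split while B values stay whole. This is the per-field analysis Jack describes for Solr, set up by hand. The class and method names, the in-memory index of the question's three documents (without "Publish Date") and the sample queries are assumptions for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class PerFieldParserSearch {

    // Keyword analysis for the StringField "B" (whole value = one token);
    // whitespace analysis for every other field.
    static Analyzer perFieldAnalyzer() {
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("B", new KeywordAnalyzer());
        return new PerFieldAnalyzerWrapper(new WhitespaceAnalyzer(), perField);
    }

    // Indexes the sample documents, parses userQuery with a per-field
    // analyzer and returns the matching "A" values in sorted order.
    static List<String> search(String userQuery) {
        try (Directory dir = new RAMDirectory()) {
            IndexWriterConfig cfg = new IndexWriterConfig(perFieldAnalyzer());
            try (IndexWriter writer = new IndexWriter(dir, cfg)) {
                String[][] rows = { {"1", "1 2 3"}, {"2", "1 2"}, {"3", "1"} };
                for (String[] row : rows) {
                    Document doc = new Document();
                    doc.add(new TextField("A", row[0], Field.Store.YES));
                    doc.add(new StringField("B", row[1], Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // A separate analyzer instance for query time.
                Query q = new QueryParser("A", perFieldAnalyzer()).parse(userQuery);
                List<String> values = new ArrayList<>();
                for (ScoreDoc hit : searcher.search(q, 100).scoreDocs) {
                    values.add(searcher.doc(hit.doc).get("A"));
                }
                Collections.sort(values);
                return values;
            }
        } catch (IOException | ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(search("B:\"1 2\""));
        System.out.println(search("A:3 OR B:\"1 2\""));
    }
}
```

Here A:3 is still analyzed with whitespace rules, while B:"1 2" becomes a single TermQuery, so the combined query should match the second and third documents.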

On Fri, Jun 19, 2015 at 9:23 PM, Sheng sheng...@gmail.com wrote:

 1. Which analyzer are you using for indexing?
 2. You cannot fuzzy-match a field name; that will certainly throw an
 exception.
 3. I would start from a simple, deterministic query object to rule out
 unlikely possibilities first, before relying on the parser to generate it
 for you.


 On Fri, Jun 19, 2015 at 10:45 AM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi,
 
  Why don't you create your query with the API?

  Term term = new Term("B", "1 2");
  Query query = new TermQuery(term);
 
  Ahmet
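Ahmet's suggestion can be sketched end to end as follows, assuming Lucene 5.x and an in-memory index of the three documents from the question (without "Publish Date"). Because the TermQuery is built directly from the raw string, no analyzer or query syntax is involved, so the user's text is compared exactly against the single indexed term of the StringField. TermQueryExactMatch and exactMatch are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class TermQueryExactMatch {

    // Returns the stored "A" values of documents whose StringField "B"
    // equals userInput exactly. No parser or analyzer touches the input.
    static List<String> exactMatch(String userInput) {
        try (Directory dir = new RAMDirectory()) {
            IndexWriterConfig cfg = new IndexWriterConfig(new WhitespaceAnalyzer());
            try (IndexWriter writer = new IndexWriter(dir, cfg)) {
                String[][] rows = { {"1", "1 2 3"}, {"2", "1 2"}, {"3", "1"} };
                for (String[] row : rows) {
                    Document doc = new Document();
                    doc.add(new TextField("A", row[0], Field.Store.YES));
                    doc.add(new StringField("B", row[1], Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new TermQuery(new Term("B", userInput));
                List<String> values = new ArrayList<>();
                for (ScoreDoc hit : searcher.search(query, 100).scoreDocs) {
                    values.add(searcher.doc(hit.doc).get("A"));
                }
                return values;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("1 2"));
    }
}
```

For example, exactMatch("1 2") should return only the second document's A value, while exactMatch("2") should return nothing, because "2" is not the whole B value of any document.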
 
 
 
 
 



