RE: Cannot Escape Special Characters Search with Lucene.Net 2.0

Granroth, Neal V. Fri, 17 Dec 2010 09:54:38 -0800

Another source of confusion in the original question is the purpose of the escape
character, which appears to work fine.


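Here's an example that indexes "test&&test" and finds it in a search.  Note that
the escape character is not needed for this: WhitespaceAnalyzer keeps "test&&test"
as a single token, and QueryParser reads it as one term rather than as the "&&"
(AND) operator.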

It also indexes the odd phrase "yellow^orange" and uses the escape character so
that the circumflex "^" is treated as search text instead of indicating a boost
value, as it normally would.  Without the escape preceding the circumflex, a
parse exception occurs.
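The escaping rule described above can be sketched as a small helper in the same C#. This is a minimal sketch, not Lucene.Net's own code: `QueryEscapeSketch.Escape` is a hypothetical name, and its character set is an assumption modeled on the classic QueryParser syntax characters. Where your Lucene.Net version provides `QueryParser.Escape`, rely on that instead; the exact set varies between releases (later ones also escape characters such as "&" and "|").

```csharp
using System.Text;

// Hypothetical helper (not part of Lucene.Net): prefixes each query-syntax
// character with a backslash so QueryParser treats it as literal text.
// ASSUMPTION: this character set mirrors classic Lucene QueryParser syntax;
// check your version's QueryParser.Escape for the authoritative list.
public static class QueryEscapeSketch
{
    // Characters with special meaning in the query syntax
    // (backslash, + - ! ( ) : ^ [ ] " { } ~ * ?).
    private const string Special = "\\+-!():^[]\"{}~*?";

    public static string Escape(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length * 2);
        foreach (char c in text)
        {
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('\\'); // escape the special character
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `QueryEscapeSketch.Escape("yellow^orange")` returns the text yellow\^orange (the C# literal `"yellow\\^orange"` used in the query below), while "test&&test" passes through unchanged. Because it also escapes operators such as "+" and "-", apply it to user text meant as a literal term, not to a whole query whose syntax should stay active.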

----------------------------

using WhitespaceAnalyzer = Lucene.Net.Analysis.WhitespaceAnalyzer;
using RAMDirectory = Lucene.Net.Store.RAMDirectory;
using IndexModifier = Lucene.Net.Index.IndexModifier;
using LDocument = Lucene.Net.Documents.Document;
using LField = Lucene.Net.Documents.Field;
using IndexSearcher = Lucene.Net.Search.IndexSearcher;
using Query = Lucene.Net.Search.Query;
using QueryParser = Lucene.Net.QueryParsers.QueryParser;
using Hits = Lucene.Net.Search.Hits;
using Hit = Lucene.Net.Search.Hit;

class EscapeExample
{
    static void Main()
    {
        RAMDirectory ixStoreII = new RAMDirectory();

        // true = create a new, empty index in this directory
        IndexModifier ixModifierII =
                new IndexModifier(ixStoreII, new WhitespaceAnalyzer(), true);

        LDocument docTest = null;

        // "name" is stored as a single untokenized keyword;
        // "content" is stored and tokenized by the WhitespaceAnalyzer.
        docTest = new LDocument();
        docTest.Add(new LField("name", "Doc-1",
                LField.Store.YES, LField.Index.UN_TOKENIZED));
        docTest.Add(new LField("content", "cyan magenta yellow^orange",
                LField.Store.YES, LField.Index.TOKENIZED));
        ixModifierII.AddDocument(docTest);

        docTest = new LDocument();
        docTest.Add(new LField("name", "Doc-2",
                LField.Store.YES, LField.Index.UN_TOKENIZED));
        docTest.Add(new LField("content", "red green test&&test blue",
                LField.Store.YES, LField.Index.TOKENIZED));
        ixModifierII.AddDocument(docTest);

        docTest = new LDocument();
        docTest.Add(new LField("name", "Doc-3",
                LField.Store.YES, LField.Index.UN_TOKENIZED));
        docTest.Add(new LField("content", "red green test magenta",
                LField.Store.YES, LField.Index.TOKENIZED));
        ixModifierII.AddDocument(docTest);

        ixModifierII.Close();

        IndexSearcher ixSearcher = new IndexSearcher(ixStoreII);

        // Search with the same analyzer used for indexing.  The C# literal
        // "yellow\\^orange" is the query text yellow\^orange, so "^" is
        // treated as an ordinary character rather than a boost marker.
        Query q = new QueryParser("content", new WhitespaceAnalyzer())
                .Parse("test&&test OR yellow\\^orange");

        Hits hits = ixSearcher.Search(q);
        System.Collections.IEnumerator euHits = hits.Iterator();
        while (euHits.MoveNext())
        {
            Hit htCUR = (Hit)(euHits.Current);
            LDocument ixDoc = htCUR.GetDocument();
            System.Console.WriteLine("Found " + ixDoc.Get("name"));
        }

        ixSearcher.Close();

        ixStoreII.Close();
    }
}

----------------------------

The output is:

Found Doc-1
Found Doc-2


- Neal
 

-----Original Message-----
From: Robert Jordan [mailto:[email protected]] 
Sent: Friday, December 17, 2010 11:12 AM
To: [email protected]
Subject: Re: Cannot Escape Special Characters Search with Lucene.Net 2.0

On 17.12.2010 17:59, Digy wrote:
>> N.G -->  You can see that the "&&" characters were identified as separators
>> and two "test" tokens were emitted, not the single "test&&test" you expected.
>
>> A.R -->  The scenario is that I try to search for the text "Test&&Test"
>
> But the query "Test&&Test" will also be analyzed as "test test" by
> StandardAnalyzer. Since there are two successive "test"s in the index, there
> must be a hit.

Or he doesn't use the same analyzer for indexing and searching.
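To make the analyzer-mismatch point concrete, here is a self-contained C# sketch that needs no Lucene reference. Both tokenizers are hypothetical simplifications, not Lucene.Net classes: `SplitOnWhitespace` stands in for WhitespaceAnalyzer, and `SplitOnNonLetterOrDigit` (which also lowercases) only roughly approximates what StandardAnalyzer does with "&&"; the real StandardAnalyzer grammar and stop-word handling are more involved.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified tokenizers used only to illustrate why the
// analyzer used at index time must match the one used at query time.
public static class AnalyzerMismatchSketch
{
    // Rough stand-in for WhitespaceAnalyzer: split on whitespace only.
    public static List<string> SplitOnWhitespace(string text)
    {
        return new List<string>(text.Split(new char[] { ' ', '\t', '\r', '\n' },
                System.StringSplitOptions.RemoveEmptyEntries));
    }

    // Rough stand-in for StandardAnalyzer's handling of "&&": any character
    // that is not a letter or digit ends the current token; tokens are lowercased.
    public static List<string> SplitOnNonLetterOrDigit(string text)
    {
        List<string> tokens = new List<string>();
        StringBuilder current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Length = 0;
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Crude match test: true if every query token is among the indexed tokens.
    // Token positions are ignored.
    public static bool AllTermsIndexed(List<string> indexed, List<string> query)
    {
        HashSet<string> terms = new HashSet<string>(indexed);
        foreach (string t in query)
        {
            if (!terms.Contains(t))
            {
                return false;
            }
        }
        return true;
    }
}
```

Indexing "red green test&&test blue" with the whitespace splitter yields the single term "test&&test", while analyzing the query "Test&&Test" with the letter/digit splitter yields "test", "test", so `AllTermsIndexed` reports no match. Using the same splitter on both sides does match, which is Digy's point. Positions are ignored here; a real Lucene query built from two tokens would be a phrase query.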

Robert


>
> DIGY
>
>
> -----Original Message-----
> From: Granroth, Neal V. [mailto:[email protected]]
> Sent: Friday, December 17, 2010 6:06 PM
> To: [email protected]
> Subject: RE: Cannot Escape Special Characters Search with Lucene.Net 2.0
>
>
> Robert's correct: the StandardAnalyzer will split the input text at the "&&"
> characters, so your index will not contain them, as this simple example shows:
>
> StandardAnalyzer aa = new StandardAnalyzer();
>
> System.IO.StringReader srs =
>     new System.IO.StringReader("aaa bbb test&&test ccc ddd");
>
> Lucene.Net.Analysis.TokenStream ts = aa.TokenStream("content", srs);
>
> Lucene.Net.Analysis.Token tk;
> while( (tk = ts.Next()) != null )
> {
>     System.Console.WriteLine(System.String.Format("Token: \"{0}\": S:{1}, E:{2}",
>        tk.TermText(),tk.StartOffset(),tk.EndOffset()));
> }
>
> The output looks like this:
> Token: "aaa": S:0, E:3
> Token: "bbb": S:4, E:7
> Token: "test": S:8, E:12
> Token: "test": S:14, E:18
> Token: "ccc": S:19, E:22
> Token: "ddd": S:23, E:26
>
> You can see that the "&&" characters were identified as separators and two
> "test" tokens were emitted, not the single "test&&test" you expected.
>
>
> - Neal
>
> -----Original Message-----
> From: Robert Jordan [mailto:[email protected]]
> Sent: Friday, December 17, 2010 6:25 AM
> To: [email protected]
> Subject: Re: Cannot Escape Special Characters Search with Lucene.Net 2.0
>
> On 17.12.2010 12:29, abhilash ramachandran wrote:
>> q = new global::Lucene.Net.QueryParsers.QueryParser("content", new
>> StandardAnalyzer()).Parse(query);
>
> I believe the issue has nothing to do with your query
> syntax. StandardAnalyzer is skipping chars like "&" during
> the indexing process, so you can't search for them.
>
> Robert
>
>
