Another source of confusion in the original question is the purpose of the escape
character, which appears to work fine.
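Here's an example that indexes "test&&test" and finds it in a search. Note that the
escape character is not needed for this: the example uses WhitespaceAnalyzer for both
indexing and query parsing, so "test&&test" stays a single token, and the query parser
reads it as one term rather than as "test AND test" (otherwise Doc-3, which contains
a plain "test", would also be found).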
It also indexes the odd token "yellow^orange" and escapes the circumflex so that
the query parser treats "^" as search text instead of as a boost operator, its
normal meaning. Without the escape before the circumflex, a parse exception occurs.
In C# source the backslash itself must be doubled ("yellow\\^orange"), so the
parser receives yellow\^orange.
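Rather than hand-writing the backslashes, a query string can be escaped before it is parsed. The helper below is a hypothetical sketch, not part of Lucene.Net: it prefixes a backslash to each character in Lucene's documented query-syntax special set (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \). The exact character set is an assumption based on that documentation; escaping "&" and "|" individually is conservative, since the example shows "test&&test" works unescaped.

```csharp
using System.Text;

// Hypothetical helper (not a Lucene.Net API): backslash-escapes characters
// that Lucene's classic query syntax treats as operators.
public static class QueryEscapeSketch
{
    // Assumed special-character set, taken from Lucene's query syntax docs:
    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    private const string Special = "+-&|!(){}[]^\"~*?:\\";

    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length * 2);
        foreach (char c in text)
        {
            // Prefix each special character with a backslash so the
            // parser reads it as a literal term character.
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, Escape("yellow^orange") returns yellow\^orange, the same string that the C# literal "yellow\\^orange" passes to QueryParser in the example. Java Lucene's QueryParser has a static escape(String) method; Lucene.Net may expose an equivalent QueryParser.Escape, but check the version you are using before relying on it.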
----------------------------
using WhitespaceAnalyzer = Lucene.Net.Analysis.WhitespaceAnalyzer;
using RAMDirectory = Lucene.Net.Store.RAMDirectory;
using IndexModifier = Lucene.Net.Index.IndexModifier;
using LDocument = Lucene.Net.Documents.Document;
using LField = Lucene.Net.Documents.Field;
using IndexSearcher = Lucene.Net.Search.IndexSearcher;
using Query = Lucene.Net.Search.Query;
using QueryParser = Lucene.Net.QueryParsers.QueryParser;
using Hits = Lucene.Net.Search.Hits;
using Hit = Lucene.Net.Search.Hit;

class EscapeDemo
{
    static void Main()
    {
        // In-memory index. WhitespaceAnalyzer splits only on whitespace, so
        // "test&&test" and "yellow^orange" are each indexed as a single token.
        RAMDirectory ixStoreII = new RAMDirectory();
        IndexModifier ixModifierII =
            new IndexModifier(ixStoreII, new WhitespaceAnalyzer(), true); // true = create a new index

        // Keyword fields are stored and indexed without tokenizing;
        // Text fields are stored and tokenized by the analyzer.
        LDocument docTest = new LDocument();
        docTest.Add(LField.Keyword("name", "Doc-1"));
        docTest.Add(LField.Text("content", "cyan magenta yellow^orange"));
        ixModifierII.AddDocument(docTest);

        docTest = new LDocument();
        docTest.Add(LField.Keyword("name", "Doc-2"));
        docTest.Add(LField.Text("content", "red green test&&test blue"));
        ixModifierII.AddDocument(docTest);

        docTest = new LDocument();
        docTest.Add(LField.Keyword("name", "Doc-3"));
        docTest.Add(LField.Text("content", "red green test magenta"));
        ixModifierII.AddDocument(docTest);
        ixModifierII.Close();

        IndexSearcher ixSearcher = new IndexSearcher(ixStoreII);
        // The C# literal "\\^" becomes \^ in the query string, telling the parser
        // to treat ^ as a literal character instead of a boost operator.
        // The query is analyzed with the same analyzer used for indexing.
        Query q = QueryParser.Parse(
            "test&&test OR yellow\\^orange", "content",
            new WhitespaceAnalyzer());
        Hits hits = ixSearcher.Search(q);

        // Print the stored "name" field of every matching document.
        System.Collections.IEnumerator euHits = hits.Iterator();
        while (euHits.MoveNext())
        {
            Hit htCUR = (Hit)(euHits.Current);
            LDocument ixDoc = htCUR.GetDocument();
            System.Console.WriteLine("Found " + ixDoc.Get("name"));
        }
        ixSearcher.Close();
        ixStoreII.Close();
    }
}
----------------------------
The output is:
Found Doc-1
Found Doc-2
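The example works because WhitespaceAnalyzer leaves "test&&test" intact, whereas the StandardAnalyzer discussed in the quoted messages splits at "&&". The sketch below is a rough, self-contained approximation of that difference, not Lucene.Net's real analyzers: the "standard-style" rule (lower-cased runs of letters and digits) is an assumption that only matches StandardTokenizer for simple ASCII text like this, and the class and method names are hypothetical. Each token is reported as term[start,end), the same offset convention as the Token output quoted further down.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins that only approximate WhitespaceAnalyzer and
// StandardAnalyzer for plain ASCII text; the real StandardTokenizer grammar
// also handles e-mail addresses, acronyms, host names, and so on.
public static class TokenizerSketch
{
    // Whitespace-style: a token is any maximal run of non-whitespace characters.
    public static List<string> WhitespaceTokens(string text)
    {
        return Split(text, c => !char.IsWhiteSpace(c), false);
    }

    // Standard-style approximation: maximal runs of letters or digits,
    // lower-cased, so "&&" acts as a separator.
    public static List<string> LetterDigitTokens(string text)
    {
        return Split(text, char.IsLetterOrDigit, true);
    }

    // Emits each token as "term[start,end)", with end exclusive.
    private static List<string> Split(string text, Func<char, bool> isTokenChar, bool lowerCase)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            if (!isTokenChar(text[i])) { i++; continue; }
            int start = i;
            while (i < text.Length && isTokenChar(text[i])) i++;
            string term = text.Substring(start, i - start);
            if (lowerCase) term = term.ToLowerInvariant();
            tokens.Add(term + "[" + start + "," + i + ")");
        }
        return tokens;
    }
}
```

WhitespaceTokens("red green test&&test blue") keeps test&&test[10,20) as one token, while LetterDigitTokens("aaa bbb test&&test ccc ddd") yields aaa[0,3) bbb[4,7) test[8,12) test[14,18) ccc[19,22) ddd[23,26), the same offsets as the StandardAnalyzer output in the quoted thread.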
- Neal
-----Original Message-----
From: Robert Jordan
Sent: Friday, December 17, 2010 11:12 AM
Subject: Re: Cannot Escape Special characters Search with Lucene.Net 2.0
On 17.12.2010 17:59, Digy wrote:
>> N.G --> You can see that the "&&" characters were identified as separators
> and two "test" tokens were emitted, not the single "test&&test" you expected.
>
>> A.R --> The scenario is when I try to search for the text "Test&&Test"
>
> But the query "Test&&Test" will also be parsed as "test test" by
> StandardAnalyzer. Since there are two successive "test" tokens in the index,
> there should be a hit.
Or he doesn't use the same analyzer for indexing and searching.
Robert
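Robert's point, that indexing and searching must use the same analyzer, can be illustrated with a toy term-matching model. This is a hypothetical sketch, not how Lucene scores queries: a document "matches" when every query-side term appears among the index-side terms, ignoring positions (so it behaves like an AND of terms rather than a phrase query), and the two tokenizers are assumed approximations of WhitespaceAnalyzer and StandardAnalyzer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical model: a match means every query term is found among the
// index terms. Positions and scoring are ignored (an assumption).
public static class AnalyzerMismatchSketch
{
    // Whitespace-style terms: maximal runs of non-whitespace characters.
    public static List<string> Whitespace(string text)
    {
        return Collect(Regex.Matches(text, @"\S+"));
    }

    // Rough standard-style terms: lower-cased runs of letters and digits.
    public static List<string> LettersDigits(string text)
    {
        return Collect(Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+"));
    }

    // True when the query produced at least one term and all of them are indexed.
    public static bool Matches(List<string> indexTerms, List<string> queryTerms)
    {
        var index = new HashSet<string>(indexTerms);
        foreach (string term in queryTerms)
        {
            if (!index.Contains(term)) return false;
        }
        return queryTerms.Count > 0;
    }

    private static List<string> Collect(MatchCollection matches)
    {
        var terms = new List<string>();
        foreach (Match m in matches) terms.Add(m.Value);
        return terms;
    }
}
```

With the document "red green test&&test blue" and the query "Test&&Test", using the standard-style tokenizer on both sides matches (both produce "test"), while indexing with one tokenizer and querying with the other does not: the whitespace-indexed terms contain "test&&test" but not "test", and the standard-indexed terms do not contain "Test&&Test".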
>
> DIGY
>
>
> -----Original Message-----
> From: Granroth, Neal V.
> Sent: Friday, December 17, 2010 6:06 PM
> Subject: RE: Cannot Escape Special characters Search with Lucene.Net 2.0
>
>
> Robert's correct: StandardAnalyzer splits the input text at the "&&"
> characters, so your index will not contain them, as this simple example shows:
>
> StandardAnalyzer aa = new StandardAnalyzer();
>
> System.IO.StringReader srs = new System.IO.StringReader("aaa bbb test&&test ccc ddd");
>
> Lucene.Net.Analysis.TokenStream ts = aa.TokenStream(srs);
>
> Lucene.Net.Analysis.Token tk;
> while( (tk = ts.Next()) != null )
> {
>     System.Console.WriteLine(String.Format("Token: \"{0}\": S:{1}, E:{2}",
>         tk.TermText(), tk.StartOffset(), tk.EndOffset()));
> }
>
> The output looks like this:
> Token: "aaa": S:0, E:3
> Token: "bbb": S:4, E:7
> Token: "test": S:8, E:12
> Token: "test": S:14, E:18
> Token: "ccc": S:19, E:22
> Token: "ddd": S:23, E:26
>
> You can see that the "&&" characters were identified as separators and two
> "test" tokens were emitted, not the single "test&&test" you expected.
>
>
> - Neal
>
> -----Original Message-----
> From: Robert Jordan
> Sent: Friday, December 17, 2010 6:25 AM
> Subject: Re: Cannot Escape Special characters Search with Lucene.Net 2.0
>
> On 17.12.2010 12:29, abhilash ramachandran wrote:
>> q = new global::Lucene.Net.QueryParsers.QueryParser("content", new
>> StandardAnalyzer()).Parse(query);
>
> I believe the issue has nothing to do with your query
> syntax. StandardAnalyzer discards characters such as "&" during
> indexing, so you can't search for them.
>
> Robert
>
>