Great info Morus,
 
After making the "escape the dash" change to the QueryParser:
 
      Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
                                      "description",
                                      analyzer);
      Hits hits = searcher.search(query);
      System.out.println("query.ToString = " + query.toString("description"));
      // Note: this assertion passes only with the escape included, so not "as-is".
      assertEquals("HW-NCI_TOPICS kept as-is",
                   "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
      assertEquals("doc found!", 1, hits.length());
 
I'm still getting this output:
 
 domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
 
query.ToString = +category:HW\-NCI_TOPICS +space
 
junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>
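
To rule out a mistake in the escaping itself, here is a minimal sketch of that step in plain Java (no Lucene involved). The class DashEscapeSketch and the helper escapeDashes are hypothetical names, not part of QueryParser, and the helper assumes the input term is not already escaped. Note that "\\-" in Java source is the two characters \- at run time, which matches the query.ToString output above:

```java
// Stand-alone sketch; hypothetical helper, not a Lucene API.
class DashEscapeSketch {

    // Prefixes every '-' in the term with a backslash.
    // Assumes the term contains no escapes yet.
    static String escapeDashes(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '-') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String queryText = "+category:" + escapeDashes("HW-NCI_TOPICS") + " AND SPACE";
        System.out.println(queryText); // prints +category:HW\-NCI_TOPICS AND SPACE
    }
}
```

So the escaped query string itself looks right; the open question is what happens to the backslash after parsing.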
 
It looks like the bug, http://issues.apache.org/bugzilla/show_bug.cgi?id=27491, was fixed today:
 
------- Additional Comments From Otis Gospodnetic 2004-03-24 10:10 -------

Although tft-monitor should not really result in a phrase query "tft monitor", I
agree that this is better than converting it to tft AND NOT monitor (tft -monitor).
Moreover, I have seen query syntax where '-' characters are used for phrase
queries instead or in addition to quotes, so one could use either morus-walter
or "morus walter".

I applied your change, as it doesn't look like it breaks anything, and I hope
nobody relied on ill behaviour where tft-monitor would result in AND NOT query.
-----------
But I assume this fix won't be released for some time.  Is there a way I can get it sooner?
I'm up against a deadline and would very much like this functionality.
 
Going one step further with the KeywordAnalyzer that I wrote, I changed this method to
skip the escape character:
    protected boolean isTokenChar(char c)
    {
        // Treat every character except the backslash (the escape) as part of the token.
        return c != '\\';
    }
The test then returns with a space:
 healthecare.domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
query.ToString = +category:"HW -NCI_TOPICS" +space
junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW\-NCI_TOPICS +space
Actual  :+category:"HW -NCI_TOPICS" +space   <----note space where escape was.
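
My guess at the cause: the query parser hands the term to the analyzer with the backslash still in place, so a tokenizer that rejects '\' splits the term there, and the two pieces come back as a phrase. Below is a stand-alone sketch of that splitting in plain Java, without Lucene. The class and method names are made up for illustration, and the splitting rule (tokens are maximal runs of accepted characters) is an assumption about how a character-based tokenizer treats rejected characters:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch; does not use Lucene. All names are hypothetical.
class EscapeSplitSketch {

    // Same rule as the isTokenChar override above: every character
    // belongs to a token except the backslash.
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    // Splits text into maximal runs of token characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A rejected character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unescaped text, as passed to the analyzer directly.
        System.out.println(tokenize("HW-NCI_TOPICS"));    // prints [HW-NCI_TOPICS]
        // Escaped text, as it appears in the query string.
        System.out.println(tokenize("HW\\-NCI_TOPICS"));  // prints [HW, -NCI_TOPICS]
    }
}
```

If that is what happens, the two tokens would account for the phrase "HW -NCI_TOPICS" in the output, with the space sitting where the escape was.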
thanks,
chad.

        -----Original Message----- 
        From: Morus Walter [mailto:[EMAIL PROTECTED] 
        Sent: Wed 3/24/2004 1:43 AM 
        To: Lucene Users List 
        Cc: 
        Subject: RE: Query syntax on Keyword field question
        
        

        Chad Small writes:
        > Here is my attempt at a KeywordAnalyzer - although is not working?  Excuse
        > the length of the message, but wanted to give actual code.
        > 
        > With this output:
        > 
        > Analzying "HW-NCI_TOPICS"
        >  org.apache.lucene.analysis.WhitespaceAnalyzer:
        >   [HW-NCI_TOPICS]
        >  org.apache.lucene.analysis.SimpleAnalyzer:
        >   [hw] [nci] [topics]
        >  org.apache.lucene.analysis.StopAnalyzer:
        >   [hw] [nci] [topics]
        >  org.apache.lucene.analysis.standard.StandardAnalyzer:
        >   [hw] [nci] [topics]
        >  healthecare.domain.lucenesearch.KeywordAnalyzer:
        >   [HW-NCI_TOPICS]
        > 
        > query.ToString = category:HW -"nci topics" +space
        >
        > junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
        > Expected:+category:HW-NCI_TOPICS +space
        > Actual  :category:HW -"nci topics" +space
        > 
        
        Well query parser does not allow `-' within words currently.
        So before your analyzer is called, query parser reads one word HW, a `-'
        operator, one word NCI_TOPICS.
        The latter is analyzed as "nci topics" because it's not in field category
        anymore, I guess.
        
        I suggested to change this. See
        http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
        
        Either you escape the - using category:HW\-NCI_TOPICS in your query
        (untested. and I don't know where the escape character will be removed)
        or you apply my suggested change.
        
        Another option for using keywords with query parser might be adding a
        keyword syntax to the query parser.
        Something like category:key("HW-NCI_TOPICS") or category="HW-NCI_TOPICS".
        
        HTH
                Morus
        
        ---------------------------------------------------------------------
        To unsubscribe, e-mail: [EMAIL PROTECTED]
        For additional commands, e-mail: [EMAIL PROTECTED]
        
        
