Re: Extracting terms from a query splitting a phrase.

Erick Erickson Tue, 05 Feb 2008 12:20:18 -0800

I don't think WhitespaceAnalyzer is doing what you think it is. From
the Javadoc...


public class WhitespaceTokenizer extends CharTokenizer

A WhitespaceTokenizer is a tokenizer that divides text at
whitespace. Adjacent sequences of non-Whitespace characters form tokens.

 ------------------------------

 CharTokenizer
An abstract base class for simple, character-oriented tokenizers.

So I'm pretty sure the double quotes never reach the tokenizer as
text. QueryParser consumes them as phrase syntax, WhitespaceTokenizer
breaks the quoted text on the space, and extractTerms then reports
each term of the resulting phrase query separately.
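
To make the Javadoc rule concrete, here is a minimal plain-Java sketch of
splitting at whitespace. It is not Lucene's WhitespaceTokenizer, and the
class and method names are made up for illustration. It only shows that
such a rule drops whitespace and nothing else, and leaves case alone.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene's WhitespaceTokenizer.
final class WhitespaceSplitSketch {

    // Collects each maximal run of non-whitespace characters as one token.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if one is open.
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Every other character, punctuation included, is kept as-is.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("General Act"));
        System.out.println(tokens("\"General Act\""));
    }
}
```

On the bare text General Act this gives the two tokens General and Act.
If quote characters were handed to it, they would stay attached to the
neighbouring tokens rather than being removed.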

What is it that you want to have happen? If you're searching for
"General" right next to "Act", you can use a SpanNearQuery with
two SpanTermQuerys and a slop of 0.
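
What a slop of 0 asks for can be sketched without Lucene. The helper
below is hypothetical plain Java, not the SpanNearQuery implementation,
and it assumes the in-order form of the query: it reports whether one
token sits in the position directly after another.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of "in order, slop 0"; this is NOT Lucene code.
final class AdjacentTermsSketch {

    // True if `second` occurs at the position immediately after `first`.
    static boolean adjacentInOrder(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> adjacent = Arrays.asList("the", "General", "Act", "of", "1990");
        List<String> separated = Arrays.asList("General", "Purpose", "Act");
        System.out.println(adjacentInOrder(adjacent, "General", "Act"));
        System.out.println(adjacentInOrder(separated, "General", "Act"));
    }
}
```

The first call matches because no other token sits between the two terms;
the second does not, since one intervening token would need a slop of at
least 1.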

The other thing to be aware of with WhitespaceAnalyzer is that
it doesn't lowercase anything, so whether you get any hits
depends on the analyzer you indexed with and on whether the
case matches exactly.
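
The case problem is easy to see in a small sketch. The code below is
plain Java with a hypothetical stand-in for an index (a set of terms that
were lowercased at index time); it is not Lucene code, and it assumes the
query side keeps the original case, as WhitespaceAnalyzer does.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of a case mismatch; this is NOT Lucene code.
final class CaseMismatchSketch {

    // Hypothetical stand-in for an index built by a lowercasing analyzer.
    static Set<String> lowercasedIndexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> index = lowercasedIndexTerms("General Act");
        // The query term keeps its capital letter, so the exact lookup misses.
        System.out.println(index.contains("General"));
        // A lowercased query term would be found.
        System.out.println(index.contains("general"));
    }
}
```

Using the same analyzer for indexing and for query parsing avoids this
kind of mismatch.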

Best
Erick

On Feb 5, 2008 3:03 PM, Spencer Tickner <[EMAIL PROTECTED]> wrote:

> Hi List,
>
> Thanks in advance for the help. I'm trying to extract terms from a
> query. From the reading I've done a phrase such as "General Act" is
> considered a term.
> http://lucene.apache.org/java/docs/queryparsersyntax.html#Terms .
> However when I'm doing testing to get the extractTerms of my query it
> splits this into General and Act. I'm wondering if I'm missing or not
> understanding something.
>
> My test Java code is:
>
>        private String FIELD_NAME = "rr_root";
>        private Query query;
>        private Hits hits = null;
>
>        public void testSearch() throws Exception
>        {
>                doSearching("\"General Act\"");
>                HashSet terms = new HashSet();
>                query.extractTerms(terms);
>                int i = 0;
>                for (Iterator iter = terms.iterator(); iter.hasNext();)
>                {
>                        i++;
>                        Term term = (Term)iter.next();
>                        System.out.println(i + " " + "term-" + term.text()
>                                        + " field-" + term.field());
>                }
>         }
>
>        public void doSearching(String queryString) throws Exception
>        {
>                QueryParser parser = new QueryParser(FIELD_NAME,
>                                new WhitespaceAnalyzer());
>                query = parser.parse(queryString);
>                doSearching(query);
>        }
>        public void doSearching(Query unReWrittenQuery) throws Exception
>        {
>                // searcher coming from a cached class
>                searcher = aspect.getSearcher();
>                // reader coming from a cached class
>                query = unReWrittenQuery.rewrite(aspect.getReader());
>                System.out.println("Searching for: "
>                                + query.toString(FIELD_NAME));
>                hits = searcher.search(query);
>        }
>
> The current output is:
>
> Searching for: "General Act"
> 1 term-General field-rr_root
> 2 term-Act field-rr_root
>
> The output I expect is:
>
> Searching for: "General Act"
> 1 term-General Act field-rr_root
>
> Thanks for any help.
>
> Spencer
