I still don't see what Bill gains by doing the term analysis himself rather than letting QueryParser do the hard work, in a portable non-lucene-version-specific way.
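
Whichever route is taken, the point that matters is that each term is analyzed exactly once with the index-time analyzer. A toy sketch in plain Java of why a second pass (for example re-parsing the output of Query.toString()) can be risky when the analyzer is not idempotent. Everything here is an assumption made for illustration: `stem` is a made-up suffix stripper, not Snowball, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzeOnceSketch {

    // Hypothetical toy "stemmer": lower-cases and strips one trailing "s".
    // It is deliberately NOT idempotent ("boss" -> "bos" -> "bo").
    static String stem(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.endsWith("s") ? lower.substring(0, lower.length() - 1) : lower;
    }

    // Toy "analyzer": split on whitespace, then stem every word.
    static List<String> analyze(String phrase) {
        List<String> terms = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(stem(word));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Analyzed once: these are the terms that would sit in the index.
        List<String> once = analyze("boss cells");
        // Analyzed again, as happens when already-analyzed text is re-parsed.
        List<String> twice = analyze(String.join(" ", once));

        System.out.println(once);   // [bos, cell]
        System.out.println(twice);  // [bo, cell]
    }
}
```

With this toy stemmer "cells" survives a second pass unchanged but "boss" does not, so a query built from twice-analyzed terms would no longer match what was indexed.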

--
Ian.


On Fri, Aug 3, 2012 at 9:39 PM, Robert Muir <rcm...@gmail.com> wrote:
> you must call reset() before consuming any tokenstream.
>
> On Fri, Aug 3, 2012 at 4:03 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>> Simon gave sample code for analyzing a multi-term string.
>>
>> Here's some pseudo-code (hasn't been compiled to check it) to analyze a
>> single term with Lucene 3.6:
>>
>> public Term analyzeTerm(Analyzer analyzer, String termString){
>>   TokenStream stream = analyzer.tokenStream(field, new StringReader(termString));
>>   if (stream.incrementToken())
>>     return new Term(stream.getAttribute(CharacterTermAttribute.class).toString());
>>   else
>>     return null;
>>   // TODO: Close the StringReader
>>   // TODO: Handle terms that analyze into multiple terms (e.g., embedded punctuation)
>> }
>>
>> And here's the corresponding code for Lucene 4.0:
>>
>> public Term analyzeTerm(Analyzer analyzer, String termString){
>>   TokenStream stream = analyzer.tokenStream(field, new StringReader(termString));
>>   if (stream.incrementToken()){
>>     TermToBytesRefAttribute termAtt = stream.getAttribute(TermToBytesRefAttribute.class);
>>     BytesRef bytes = termAtt.getBytesRef();
>>     return new Term(BytesRef.deepCopyOf(bytes));
>>   } else
>>     return null;
>>   // TODO: Close the StringReader
>>   // TODO: Handle terms that analyze into multiple terms (e.g., embedded punctuation)
>> }
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Bill Chesky
>> Sent: Friday, August 03, 2012 2:55 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Analyzer on query question
>>
>> Ian/Jack,
>>
>> Ok, thanks for the help. I certainly don't want to take a cheap way out,
>> hence my original question about whether this is the right way to do this.
>> Jack, you say the right way is to do Term analysis before creating the Term.
>> If anybody has any information on how to accomplish this I'd greatly
>> appreciate it.
>>
>> regards,
>>
>> Bill
>>
>> -----Original Message-----
>> From: Jack Krupansky [mailto:j...@basetechnology.com]
>> Sent: Friday, August 03, 2012 1:22 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Analyzer on query question
>>
>> Bill, the re-parse of Query.toString will work provided that your query
>> terms are either un-analyzed or their analyzer is "idempotent" (can be
>> applied repeatedly without changing the output terms.) In your case, you are
>> doing the former.
>>
>> The bottom line: 1) if it works for you, great, 2) for other readers, please
>> do not depend on this approach if your input data is filtered in any way -
>> if your index analyzer "filters" terms (e.g, stemming, case changes,
>> term-splitting), your Term/TermQuery should be analyzed/filtered comparably,
>> in which case the extra parse (to cause term analysis such as stemming)
>> becomes unnecessary and risky if you are not very careful or very lucky.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Ian Lea
>> Sent: Friday, August 03, 2012 1:12 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Analyzer on query question
>>
>> Bill
>>
>>
>> You're getting the snowball stemming either way which I guess is good,
>> and if you get same results either way maybe it doesn't matter which
>> technique you use. I'd be a bit worried about parsing the result of
>> query.toString() because you aren't guaranteed to get back, in text,
>> what you put in.
>>
>> My way seems better to me, but then it would. If you prefer your way
>> I won't argue with you.
>>
>>
>> --
>> Ian.
>>
>>
>> On Fri, Aug 3, 2012 at 5:57 PM, Bill Chesky <bill.che...@learninga-z.com>
>> wrote:
>>>
>>> Ian,
>>>
>>> I gave this method a try, at least the way I understood your suggestion.
>>> E.g.
>>> to search for the phrase "cells combine" I built up a string like:
>>>
>>> title:"cells combine" description:"cells combine" text:"cells combine"
>>>
>>> then I passed that to the queryParser.parse() method (where queryParser is
>>> an instance of QueryParser constructed using SnowballAnalyzer) and added
>>> the result as a MUST clause in my final BooleanQuery.
>>>
>>> When I print the resulting query out as a string I get:
>>>
>>> +(title:"cell combin" description:"cell combin" keywords:"cell combin")
>>>
>>> So it looks like the SnowballAnalyzer is doing some stemming for me. But
>>> this is the exact same result I'd get doing it the way I described in my
>>> original email. I just built the unanalyzed string on my own rather than
>>> using the various query classes like PhraseQuery, etc.
>>>
>>> So I don't see the advantage to doing it this way over the original
>>> method. I just don't know if the original way I described is wrong or
>>> will give me bad results.
>>>
>>> thanks for the help,
>>>
>>> Bill
>>>
>>> -----Original Message-----
>>> From: Ian Lea [mailto:ian....@gmail.com]
>>> Sent: Friday, August 03, 2012 9:32 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Analyzer on query question
>>>
>>> You can add parsed queries to a BooleanQuery. Would that help in this
>>> case?
>>>
>>> SnowballAnalyzer sba = whatever();
>>> QueryParser qp = new QueryParser(..., sba);
>>> Query q1 = qp.parse("some snowball string");
>>> Query q2 = qp.parse("some other snowball string");
>>>
>>> BooleanQuery bq = new BooleanQuery();
>>> bq.add(q1, ...);
>>> bq.add(q2, ...);
>>> bq.add(loads of other stuff);
>>>
>>>
>>> --
>>> ian.
>>>
>>>
>>> On Fri, Aug 3, 2012 at 2:19 PM, Bill Chesky <bill.che...@learninga-z.com>
>>> wrote:
>>>>
>>>> Thanks Simon,
>>>>
>>>> Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem
>>>> to have been introduced until 3.1.0. Similarly my version of Lucene does
>>>> not have a BooleanQuery.addClause(BooleanClause) method.
>>>> Maybe you meant BooleanQuery.add(BooleanClause).
>>>>
>>>> In any case, most of what you're doing there, I'm just not familiar with.
>>>> Seems very low level. I've never had to use TokenStreams to build a
>>>> query before and I'm not really sure what is going on there. Also, I
>>>> don't know what PositionIncrementAttribute is or how it would be used to
>>>> create a PhraseQuery. The way I'm currently creating PhraseQuerys is
>>>> very straightforward and intuitive. E.g. to search for the term "foo
>>>> bar" I'd build the query like this:
>>>>
>>>> PhraseQuery phraseQuery = new PhraseQuery();
>>>> phraseQuery.add(new Term("title", "foo"));
>>>> phraseQuery.add(new Term("title", "bar"));
>>>>
>>>> Is there really no easier way to associate the correct analyzer with
>>>> these types of queries?
>>>>
>>>> Bill
>>>>
>>>> -----Original Message-----
>>>> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>>>> Sent: Friday, August 03, 2012 3:43 AM
>>>> To: java-user@lucene.apache.org; Bill Chesky
>>>> Subject: Re: Analyzer on query question
>>>>
>>>> On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
>>>> <bill.che...@learninga-z.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I understand that generally speaking you should use the same analyzer on
>>>>> querying as was used on indexing. In my code I am using the
>>>>> SnowballAnalyzer on index creation. However, on the query side I am
>>>>> building up a complex BooleanQuery from other BooleanQuerys and/or
>>>>> PhraseQuerys on several fields. None of these require specifying an
>>>>> analyzer anywhere. This is causing some odd results, I think, because a
>>>>> different analyzer (or no analyzer?) is being used for the query.
>>>>>
>>>>> Question: how do I build my boolean and phrase queries using the
>>>>> SnowballAnalyzer?
>>>>>
>>>>> One thing I did that seemed to kind of work was to build my complex
>>>>> query normally then build a snowball-analyzed query using a QueryParser
>>>>> instantiated with a SnowballAnalyzer. To do this, I simply pass the
>>>>> string value of the complex query to the QueryParser.parse() method to
>>>>> get the new query. Something like this:
>>>>>
>>>>> // build a complex query from other BooleanQuerys and PhraseQuerys
>>>>> BooleanQuery fullQuery = buildComplexQuery();
>>>>> QueryParser parser = new QueryParser(Version.LUCENE_30, "title",
>>>>>     new SnowballAnalyzer(Version.LUCENE_30, "English"));
>>>>> Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());
>>>>>
>>>>> TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
>>>>> indexSearcher.search(snowballAnalyzedQuery, collector);
>>>>
>>>>
>>>> you can just use the analyzer directly like this:
>>>> Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
>>>>
>>>> TokenStream stream = analyzer.tokenStream("title", new StringReader(fullQuery.toString()):
>>>> CharTermAttribute termAttr = stream.addAttribute(CharTermAttribute.class);
>>>> stream.reset();
>>>> BooleanQuery q = new BooleanQuery();
>>>> while(stream.incrementToken()) {
>>>>   q.addClause(new BooleanClause(Occur.MUST, new Term("title", termAttr.toString())));
>>>> }
>>>>
>>>> you also have access to the token positions if you want to create
>>>> phrase queries etc. just add a PositionIncrementAttribute like this:
>>>> PositionIncrementAttribute posAttr = stream.addAttribute(PositionsIncrementAttribute.class);
>>>>
>>>> pls. doublecheck the code it's straight from the top of my head.
>>>>
>>>> simon
>>>>
>>>>>
>>>>> Like I said, this seems to kind of work but it doesn't feel right. Does
>>>>> this make sense? Is there a better way?
>>>>>
>>>>> thanks in advance,
>>>>>
>>>>> Bill
>
> --
> lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org