Hi Tavi, [email protected] has been deprecated since the Lucene and Solr source trees merged last year. Please use [email protected] instead.
However, your question is about *usage* of Lucene/Solr, rather than *development*, so you should be using [email protected] or [email protected]. Please repost your question to one of these lists.

Steve

> -----Original Message-----
> From: Tavi Nathanson [mailto:[email protected]]
> Sent: Monday, February 07, 2011 12:12 PM
> To: [email protected]
> Subject: Tokenization and Fuzziness: How to Allow Multiple Strategies?
>
>
> Hey everyone,
>
> Tokenization seems inherently fuzzy and imprecise, yet Lucene does not
> appear to provide an easy mechanism to account for this fuzziness.
>
> Let's take an example, where the document I'm indexing is
> "v1.1.0 mr. jones [email protected]"
>
> I may want to tokenize this as follows:
> ["v1.1.0", "mr", "jones", "[email protected]"]
> ...or I may want to tokenize this as follows:
> ["v1", "1.0", "mr", "jones", "david", "gmail.com"]
> ...or I may want to tokenize it another way.
>
> I would think that the best approach would be indexing using multiple
> strategies, such as:
>
> ["v1.1.0", "v1", "1.0", "mr", "jones", "[email protected]", "david",
> "gmail.com"]
>
> However, this would destroy phrase queries. And while Lucene lets you
> index multiple tokens at the same position, I haven't found a way to
> deal with cases where you want to index a set of tokens at one
> position: nor does that even make sense. For instance, I can't index
> ["david", "gmail.com"] in the same position as "[email protected]".
>
> So:
>
> - Any thoughts, in general, about how you all approach this fuzziness?
>   Do you just choose one tokenization strategy and hope for the best?
> - Might there be a way to use multiple strategies and *not* break
>   phrase queries that I'm overlooking?
>
> Thanks!
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenization-and-Fuzziness-How-to-Allow-Multiple-Strategies-tp2444956p2444956.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
