Hi Tavi,

solr-...@lucene.apache.org has been deprecated since the Lucene and Solr source trees merged last year. Please use dev@lucene.apache.org instead.
However, your question is about *usage* of Lucene/Solr, rather than *development*, so you should be using solr-u...@lucene.apache.org or lucene-u...@lucene.apache.org. Please repost your question to one of these lists.

Steve

> -----Original Message-----
> From: Tavi Nathanson [mailto:tavi.nathan...@gmail.com]
> Sent: Monday, February 07, 2011 12:12 PM
> To: solr-...@lucene.apache.org
> Subject: Tokenization and Fuzziness: How to Allow Multiple Strategies?
>
> Hey everyone,
>
> Tokenization seems inherently fuzzy and imprecise, yet Lucene does not
> appear to provide an easy mechanism to account for this fuzziness.
>
> Let's take an example, where the document I'm indexing is "v1.1.0 mr. jones
> da...@gmail.com"
>
> I may want to tokenize this as follows: ["v1.1.0", "mr", "jones",
> "da...@gmail.com"]
> ...or I may want to tokenize this as follows: ["v1", "1.0", "mr", "jones",
> "david", "gmail.com"]
> ...or I may want to tokenize it another way.
>
> I would think that the best approach would be indexing using multiple
> strategies, such as:
>
> ["v1.1.0", "v1", "1.0", "mr", "jones", "da...@gmail.com", "david",
> "gmail.com"]
>
> However, this would destroy phrase queries. And while Lucene lets you index
> multiple tokens at the same position, I haven't found a way to deal with
> cases where you want to index a set of tokens at one position: nor does that
> even make sense. For instance, I can't index ["david", "gmail.com"] in the
> same position as "da...@gmail.com".
>
> So:
>
> - Any thoughts, in general, about how you all approach this fuzziness? Do
>   you just choose one tokenization strategy and hope for the best?
> - Might there be a way to use multiple strategies and *not* break phrase
>   queries that I'm overlooking?
>
> Thanks!
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenization-and-Fuzziness-How-to-Allow-Multiple-Strategies-tp2444956p2444956.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.