: I was trying to apply both
: org.apache.solr.analysis.WordDelimiterFilter and
: org.apache.lucene.analysis.ngram.NGramTokenFilter.
:
: Can I achieve this with Lucene's TokenStream?
Sure ... you just have to pick an ordering and wrap one around the other. Solr does this any time you define an <analyzer> using a <tokenizer> and a list of <filter>s.

: While thinking about TokenFilters, I came to an idea that
: the TokenStream should have a structured representation.

I've thought about that once or twice over the years as well ... it would make things like multi-word synonyms a lot easier to deal with if, instead of a TokenStream, we could have a directed TokenGraph with a single start and a single end (ie: only one node with no incoming links and only one node with no outgoing links).

But even if you had a graph-based API for Analyzers to express the set of tokens found, what would the end product look like? What would the format be of an index that stored Term position information as graph connections (essentially 3-dimensional info) instead of a simple numeric position (1-dimensional)? Could it be searched as quickly?

Most of the time, the things that I think would be easier with a TokenGraph are still feasible through judicious use of positionIncrement, slop, and artificial "marker tokens" ... with Payloads, even more complex things should move into the realm of "practical" (but it's likely I'm putting Payloads on too much of a pedestal ... I've never actually tried using them for anything).

-Hoss
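The "pick an ordering and wrap one around the other" advice is the decorator pattern that Lucene's TokenFilter is built on: each filter holds the stream it wraps and pulls tokens from it. The sketch below illustrates that nesting with hypothetical, heavily simplified classes (SimpleTokenStream, WhitespaceSource, SplitOnDelimiterFilter, FixedNGramFilter). They are NOT Lucene's or Solr's API; the real WordDelimiterFilter and NGramTokenFilter do much more (case-change splitting, min/max gram sizes, positions, offsets). The point is only to show how the wrapping works and why the chosen order changes the output.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL, simplified stand-in for a token stream; this is NOT Lucene's TokenStream API.
interface SimpleTokenStream {
    /** Returns the next token's text, or null once the stream is exhausted. */
    String next();
}

// Source of tokens: splits the input on whitespace (plays the role of a Tokenizer).
final class WhitespaceSource implements SimpleTokenStream {
    private final String[] parts;
    private int pos = 0;

    WhitespaceSource(String text) {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public String next() {
        return pos < parts.length ? parts[pos++] : null;
    }
}

// Simplified word-delimiter step: splits each incoming token on runs of non-alphanumerics.
// (The real WordDelimiterFilter also splits on case changes, letter/digit boundaries, etc.)
final class SplitOnDelimiterFilter implements SimpleTokenStream {
    private final SimpleTokenStream input;   // the stream this filter wraps
    private final ArrayDeque<String> pending = new ArrayDeque<>();

    SplitOnDelimiterFilter(SimpleTokenStream input) {
        this.input = input;
    }

    public String next() {
        while (pending.isEmpty()) {
            String tok = input.next();
            if (tok == null) return null;
            for (String piece : tok.split("[^A-Za-z0-9]+")) {
                if (!piece.isEmpty()) pending.add(piece);   // drop empty pieces, e.g. before "F" in "-F"
            }
        }
        return pending.poll();
    }
}

// Simplified n-gram step: emits every substring of exactly n characters of each incoming token.
final class FixedNGramFilter implements SimpleTokenStream {
    private final SimpleTokenStream input;   // the stream this filter wraps
    private final int n;
    private final ArrayDeque<String> pending = new ArrayDeque<>();

    FixedNGramFilter(SimpleTokenStream input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.input = input;
        this.n = n;
    }

    public String next() {
        while (pending.isEmpty()) {
            String tok = input.next();
            if (tok == null) return null;
            for (int start = 0; start + n <= tok.length(); start++) {
                pending.add(tok.substring(start, start + n));
            }
        }
        return pending.poll();
    }
}

public class FilterChainDemo {
    // Pulls every token out of a stream, the way an indexer would consume it.
    public static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        // Ordering 1: split on delimiters first, then n-gram the pieces.
        SimpleTokenStream delimThenNGram =
            new FixedNGramFilter(new SplitOnDelimiterFilter(new WhitespaceSource("Wi-Fi")), 2);
        System.out.println(drain(delimThenNGram));   // [Wi, Fi]

        // Ordering 2: n-gram first, then split the grams on delimiters.
        SimpleTokenStream nGramThenDelim =
            new SplitOnDelimiterFilter(new FixedNGramFilter(new WhitespaceSource("Wi-Fi"), 2));
        System.out.println(drain(nGramThenDelim));   // [Wi, i, F, Fi]
    }
}
```

With the real classes the shape is the same: each Lucene TokenFilter takes the TokenStream it wraps in its constructor, so the outermost filter is the last one applied. That matches how Solr builds a chain from the <filter>s listed in an <analyzer>: the first listed filter wraps the tokenizer, the next wraps that, and so on.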