Hi Frank,

I have been working on something very similar, and I have reached the point where I don't believe (though I could be totally wrong) that a pure Solr solution can do this. I would look at Mahout and try some of the machine learning algorithms it can run against a Lucene index. I have not gotten any further than experimenting with it, but so far it looks promising.

Adam

On Sun, Jun 12, 2011 at 10:20 AM, Frank A <fsa...@gmail.com> wrote:
> I have a single copyfield that has a number of other fields copied to it.
> I'm trying to "extract" a list of keywords and common terms. I realize it
> may not be a 100% dynamic and I may need to manually filter. Right now I
> tried using a CommonGrams filter. However, what I see is it creates tokens
> for both "hot" "dog" and "hot dog". Is there anyway from within solr
> configuration to treat "hot"'s frequency as "hot when not followed by dog".
> For example, right now I may see a term/frequency of:
>
> hot 8
> dog 6
> hot dog 6
>
> What I really want is:
>
> hot dog 6
> hot 2
>
> Any ideas?
>
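
For what it's worth, the counting described in the question can be sketched outside Solr as a small post-processing step. This is not a Solr or Mahout feature, just an illustration in Python. It assumes you can get the token stream out of the index and that you already have a list of phrases to treat as single terms. The function name `phrase_aware_counts` and the toy data are made up for the example.

```python
from collections import Counter

def phrase_aware_counts(tokens, phrases):
    """Count terms, letting a known two-word phrase consume both of its tokens.

    tokens  -- list of tokens in document order
    phrases -- set of (first, second) tuples to count as one term
    """
    counts = Counter()
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in phrases:
            # "hot dog" is counted once; neither "hot" nor "dog" is counted here.
            counts[" ".join(pair)] += 1
            i += 2
        else:
            counts[tokens[i]] += 1
            i += 1
    return counts

# Toy data: "hot dog" six times, plus "hot" twice in another context.
tokens = ["hot", "dog"] * 6 + ["hot", "day"] * 2
counts = phrase_aware_counts(tokens, {("hot", "dog")})
for term, n in counts.most_common():
    print(term, n)
```

On this toy input the raw counts would be hot 8 and dog 6, while the phrase-aware counts give "hot dog" 6 and "hot" 2, with "dog" dropping out because it never appears on its own. The hard part, deciding which phrases belong in the list, is left open here.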