Re: find all two word phrases that appear in more than one document
I believe one of the admin pages (Solr 4+) shows all the terms and their frequencies. You can use it even with the stock example. Try that first; if it makes sense, you can explore further. As for other examples, there are a couple of books, and I bet Jack's book covers this.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Sep 10, 2013 at 12:09 PM, Ali, Saqib wrote:
> Thanks Alexandre. I looked at the wiki page for the TermsComponent. But I
> am not sure if I follow. Do you have an example or some better document?
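To make the TermsComponent suggestion concrete, here is a minimal Python sketch of the request, not an official client. It assumes the `/terms` request handler from the Solr 4 example `solrconfig.xml` is enabled, the example core name `collection1`, and a hypothetical field `body_shingles` indexed through ShingleFilterFactory (maxShingleSize="2", outputUnigrams="false"). Because `terms.mincount` filters on document frequency, setting it to 2 expresses "appears in more than one document".

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, min_doc_freq=2, limit=-1):
    """Build a TermsComponent request that lists the terms of `field`
    found in at least `min_doc_freq` documents."""
    params = {
        "terms": "true",
        "terms.fl": field,               # the shingled field to enumerate
        "terms.mincount": min_doc_freq,  # 2 = appears in more than one document
        "terms.limit": limit,            # negative = no limit (default is 10)
        "terms.sort": "count",           # most frequent first
        "wt": "json",
    }
    # Assumes a /terms handler is configured under the core URL.
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def parse_terms(response, field):
    """Convert the flat [term, count, term, count, ...] list that Solr's
    JSON writer produces by default (json.nl=flat) into (term, doc_freq) pairs."""
    flat = response["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hypothetical core and field names; adjust to your schema.
    print(build_terms_url("http://localhost:8983/solr/collection1", "body_shingles"))
```

Fetching that URL (for example with `urllib.request.urlopen` and `json.load`) and passing the decoded body to `parse_terms()` should give the phrase list. One caveat: the counts come from the index's document frequencies, so documents that were deleted but not yet merged away may still be counted.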
Re: find all two word phrases that appear in more than one document
Thanks Alexandre. I looked at the wiki page for the TermsComponent, but I am not sure I follow. Do you have an example or a better document? Thanks! :)

On Mon, Sep 9, 2013 at 8:17 PM, Alexandre Rafalovitch wrote:
> You can probably use ShingleFilterFactory to create your shingles (possibly
> with outputUnigrams=false) and then use TermsComponent (
> http://wiki.apache.org/solr/TermsComponent) to list the results.
Re: find all two word phrases that appear in more than one document
The "phrases" are usually called n-grams or shingles.

You can probably use ShingleFilterFactory to create your shingles (possibly with outputUnigrams=false) and then use TermsComponent (http://wiki.apache.org/solr/TermsComponent) to list the results.

Regards,
   Alex.

On Tue, Sep 10, 2013 at 8:22 AM, Ali, Saqib wrote:
> Dear Solr Ninjas,
>
> We would like to run a query that returns two-word phrases that appear in
> more than one document. For example, take the string "Solr Ninja". Since it
> appears in more than one document in our Solr instance, the query should
> return it. The query should find all such phrases across all the documents
> in our Solr instance by looking at every pair of adjacent words (forming a
> phrase). These adjacent-word pairs should come from the documents in the
> Solr index.
>
> Any ideas on how to write this query?
>
> Thanks.
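To make the shingle idea concrete without a running Solr, here is a small pure-Python sketch of what the shingled index ends up answering. It is an illustration, not Solr code: `tokenize()` is a rough stand-in for a tokenizer plus a lowercase filter, `shingles()` mimics ShingleFilterFactory with minShingleSize=maxShingleSize=2 and outputUnigrams=false (adjacent pairs joined by a single space, the default token separator), and `shared_phrases()` counts each phrase once per document, the way a term's document frequency does, keeping those found in at least two documents.

```python
import re
from collections import Counter


def tokenize(text):
    # Rough stand-in for a Solr tokenizer followed by a lowercase filter.
    return re.findall(r"\w+", text.lower())


def shingles(tokens, size=2):
    # Mimics ShingleFilterFactory with min/maxShingleSize=2 and
    # outputUnigrams=false: adjacent word pairs joined by a single space.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def shared_phrases(docs, min_docs=2):
    # Count each phrase at most once per document (like docFreq),
    # then keep phrases seen in at least `min_docs` documents.
    doc_freq = Counter()
    for text in docs:
        doc_freq.update(set(shingles(tokenize(text))))
    return {phrase: n for phrase, n in doc_freq.items() if n >= min_docs}


if __name__ == "__main__":
    docs = [
        "Ask the Solr Ninja about shingles.",
        "Every Solr ninja knows TermsComponent.",
        "Shingles are word n-grams.",
    ]
    print(shared_phrases(docs))  # → {'solr ninja': 2}
```

In Solr the counting step is what TermsComponent with terms.mincount=2 does over the shingled field; the sketch only shows why dropping unigrams leaves exactly the two-word phrases to count.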