Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Alexandre Rafalovitch
I believe one of the admin pages (Solr 4+) shows all the terms and
frequencies. You can use that even with the stock example. Try that. If that
makes sense, you can explore further.
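
For the TermsComponent route mentioned elsewhere in this thread, here is a
rough sketch of the request one might send to get the same terms and
document frequencies outside the admin UI. The host, core name
("collection1"), "/terms" handler path and field name ("text_shingles") are
assumptions for illustration; adjust them to your own setup. The
terms.mincount=2 parameter asks for terms that occur in at least two
documents.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, min_doc_freq=2, limit=100):
    """Build a TermsComponent request URL for one field.

    base_url and field are hypothetical values supplied by the caller;
    the parameter names are the standard TermsComponent ones.
    """
    params = {
        "terms": "true",                 # enable the TermsComponent
        "terms.fl": field,               # field whose indexed terms are listed
        "terms.mincount": min_doc_freq,  # minimum document frequency to return
        "terms.limit": limit,            # maximum number of terms to return
        "wt": "json",                    # response format
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


# Assumed local Solr with a core named "collection1" and a shingled field.
print(terms_url("http://localhost:8983/solr/collection1", "text_shingles"))
```

If the field holds two-word shingles only, each returned term is a phrase
and its count is the number of documents containing it.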

As to other examples, there are a couple of books. I bet Jack's book covers
this.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Sep 10, 2013 at 12:09 PM, Ali, Saqib  wrote:

> Thanks Alexandre. I looked at the wiki page for the TermsComponent, but I
> am not sure I follow. Do you have an example or some better documentation?
> Thanks! :)


Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Ali, Saqib
Thanks Alexandre. I looked at the wiki page for the TermsComponent, but I
am not sure I follow. Do you have an example or some better documentation?
Thanks! :)


On Mon, Sep 9, 2013 at 8:17 PM, Alexandre Rafalovitch wrote:

> The "phrases" are usually called n-grams or shingles.
>
> You can probably use ShingleFilterFactory to create your shingles (possibly
> with outputUnigrams=false) and then use TermsComponent (
> http://wiki.apache.org/solr/TermsComponent) to list the results.


Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Alexandre Rafalovitch
The "phrases" are usually called n-grams or shingles.

You can probably use ShingleFilterFactory to create your shingles (possibly
with outputUnigrams=false) and then use TermsComponent (
http://wiki.apache.org/solr/TermsComponent) to list the results.
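
To make the idea concrete, here is a minimal sketch in plain Python (not
Solr code) of what the shingle-plus-terms approach computes: split each
document into two-word shingles, count how many documents contain each
shingle, and keep those found in at least two. The sample documents are
made up, and the lowercase/whitespace tokenization is a simplifying
assumption rather than what a real Solr analyzer chain does.

```python
from collections import Counter


def shingles(text, size=2):
    """Return the word n-grams (shingles) of the given size, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def repeated_shingles(docs, size=2, min_docs=2):
    """Map each shingle to its document frequency, keeping those in >= min_docs docs."""
    doc_freq = Counter()
    for text in docs:
        # set() so a shingle repeated inside one document counts only once
        doc_freq.update(set(shingles(text, size)))
    return {s: n for s, n in doc_freq.items() if n >= min_docs}


# Hypothetical sample documents for illustration only.
docs = [
    "Dear Solr Ninja please help",
    "every Solr Ninja knows shingles",
    "nothing shared here",
]
print(repeated_shingles(docs))  # {'solr ninja': 2}
```

In Solr the first step would be done at index time by the shingle filter
(outputUnigrams=false keeps single words out of the field), and the
counting and filtering by a terms request with a minimum count of 2.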

Regards,
   Alex.



On Tue, Sep 10, 2013 at 8:22 AM, Ali, Saqib  wrote:

> Dear Solr Ninjas,
>
> We would like to run a query that returns all two-word phrases that appear
> in more than one document. For example, take the string "Solr Ninja": since
> it appears in more than one document in our Solr instance, the query should
> return it. The query should find every such phrase, that is, every pair of
> adjacent words, across all the documents in our Solr index.
>
> Any ideas on how to write this query?
>
> Thanks.
>