Re: Detect term occurrences

2015-09-13 Thread Francisco Andrés Fernández
Thanks again.
For the moment I think it won't be a problem. I have ~500 documents.
Regards,

Francisco

On Fri, Sep 11, 2015 at 6:08 PM, simon wrote:

> +1 on Sujit's recommendation: we have a similar use case (detecting drug
> names / disease entities / MeSH terms) and have been using the
> SolrTextTagger with great success.
>
> We run a separate Solr instance as a tagging service and add the detected
> tags as metadata fields to a document before it is ingested into our main
> Solr collection.
>
> How many documents/product leaflets do you have? The tagger is very fast
> at the Solr level, but I'm seeing quite a bit of HTTP overhead.
>
> best
>
> -Simon


Re: Detect term occurrences

2015-09-11 Thread Sujit Pal
Hi Francisco,

>> I have many drug product leaflets, each corresponding to 1 product. On the
>> other hand, we have a medical dictionary with about 10^5 terms.
>> I want to detect all the occurrences of those terms in any leaflet
>> document.

Take a look at SolrTextTagger for this use case.
https://github.com/OpenSextant/SolrTextTagger

10^5 entries is not that large; I am using it for much larger dictionaries
at the moment with very good results.

It's a project built (at least originally) by David Smiley, who is also
quite active in this group.

-sujit




Re: Detect term occurrences

2015-09-11 Thread Upayavira
It sounds to me like you are wanting to *filter* your document to only
include terms within that medical dictionary. Or to have a keyword field
based upon those of your 100k terms that appear in that doc.

Synonyms are your saviour, if that's the case. Create a synonyms list
for your terms; each entry can be a one-to-one mapping, so:

diabetes => diabetes

is quite okay. Then, in your index time analysis chain, have a
SynonymFilterFactory followed by a TypeTokenFilterFactory configured to
only allow SYNONYM tokens through.

Then, in your index, you will have a field that contains all the terms
from your 100k that are included in that particular document.
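
A minimal sketch of generating such a one-to-one synonyms list, assuming the
dictionary is available as a plain list of strings. The function name and the
sample terms are hypothetical, and lowercasing assumes the analysis chain
lowercases tokens before the synonym filter runs.

```python
def build_identity_synonyms(terms):
    """Return synonyms-file lines that map each dictionary term to itself."""
    lines = []
    seen = set()
    for term in terms:
        # Lowercase and collapse runs of whitespace (assumed normalization).
        normalized = " ".join(term.lower().split())
        # Skip blanks, duplicates, and terms containing the file's separators.
        if not normalized or normalized in seen:
            continue
        if "," in normalized or "=>" in normalized:
            continue
        seen.add(normalized)
        lines.append(f"{normalized} => {normalized}")
    return lines


# Hypothetical sample terms standing in for the 100k-term dictionary.
sample_lines = build_identity_synonyms(["Diabetes", "diabetes", "Heart  Failure", ""])
print("\n".join(sample_lines))
```

The resulting lines would be written to the file the synonym filter is
configured to read; terms containing commas or "=>" are skipped here rather
than escaped, which is a simplification.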

Does that get it?

Upayavira



Re: Detect term occurrences

2015-09-11 Thread simon
+1 on Sujit's recommendation: we have a similar use case (detecting drug
names / disease entities / MeSH terms) and have been using the
SolrTextTagger with great success.

We run a separate Solr instance as a tagging service and add the detected
tags as metadata fields to a document before it is ingested into our main
Solr collection.
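
To illustrate the "tag first, then ingest" flow, here is a rough sketch in
which a naive longest-match lookup stands in for the tagging service. It is
not SolrTextTagger's API or algorithm; the function name, the `tags` field,
and the sample document are all hypothetical.

```python
import re


def tag_text(text, dictionary, max_words=4):
    """Return dictionary terms found in text, preferring the longest match."""
    terms = {" ".join(t.lower().split()) for t in dictionary}
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in terms:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len  # continue after the matched phrase
        else:
            i += 1
    return found


# Hypothetical document; the tags are attached as a metadata field before ingestion.
doc = {"id": "leaflet-1", "text": "Used for heart failure and type 2 diabetes."}
doc["tags"] = tag_text(doc["text"], ["heart failure", "diabetes", "type 2 diabetes"])
print(doc["tags"])
```

With a real tagger the `tag_text` call would be an HTTP request to the
separate tagging instance, which is where the overhead mentioned below comes in.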

How many documents/product leaflets do you have? The tagger is very fast
at the Solr level, but I'm seeing quite a bit of HTTP overhead.

best

-Simon



Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Many thanks, pals.
I will try some of those approaches (and return with new questions) ;)
Best regards,

Francisco



Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Thanks!



Re: Detect term occurrences

2015-09-11 Thread Alexandre Rafalovitch
Assuming the medical dictionary is constant, I would do a copyField of
text into a separate field and have that separate field use:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
with words coming from the dictionary (normalized).
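
As a rough illustration of what that field would end up containing, here is
a sketch that mimics the effect of a keep-word filter in plain Python. The
tokenization, the function name, and the sample text are assumptions, not
Solr's actual analysis chain.

```python
import re


def keep_dictionary_terms(text, dictionary):
    """Keep only the tokens of text that appear in the dictionary."""
    keep = {word.lower() for word in dictionary}
    # Simplistic tokenizer: lowercase, then split into runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token in keep]


# Hypothetical leaflet text and a two-word stand-in for the dictionary.
leaflet = "Metformin is used to treat diabetes. Diabetes patients should ask a doctor."
kept = keep_dictionary_terms(leaflet, ["diabetes", "metformin"])
print(kept)
```

Like a token-level filter, this only matches single tokens, so multi-word
dictionary entries would need a different treatment.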

That way, the new field will ONLY have your dictionary terms from the
text. Then you can facet against that field or do anything else with it.
You could even search it, and it would be a lot more efficient.

The main issue would be a gigantic filter, which may mean speed and/or
memory issues. Solr has some ways to deal with such large set matches
by compiling them into a state machine (used for auto-complete), but I
don't know if that's exposed for your purpose.

But it could make a fun custom filter to build.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/




Re: Detect term occurrences

2015-09-10 Thread Walter Underwood
Doing a query for each term should work well. Solr is fast for queries. Write a 
script.

I assume you only need to do this once. Running all the queries will probably 
take less time than figuring out a different approach.
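
A minimal sketch of such a script, assuming a local Solr with a core named
`leaflets`, a searchable field named `text`, and an `id` field (all three
names are hypothetical). Only the URL builder runs in the example; the
request function is shown for completeness.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr location and core name.
SOLR_SELECT = "http://localhost:8983/solr/leaflets/select"


def build_term_query_url(term, field="text", base_url=SOLR_SELECT):
    """Build a select URL that runs one dictionary term as a phrase query."""
    phrase = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{phrase}"', "fl": "id", "rows": 1000, "wt": "json"}
    return f"{base_url}?{urlencode(params)}"


def matching_ids(term):
    """Return the ids of the documents that contain the term (needs a running Solr)."""
    with urlopen(build_term_query_url(term)) as response:
        body = json.load(response)
    return [doc["id"] for doc in body["response"]["docs"]]


example_url = build_term_query_url("heart failure")
print(example_url)
```

Looping `matching_ids` over the whole dictionary gives, per term, the
leaflets that mention it; `rows` is set to 1000 on the assumption that the
collection is small.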

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)





RE: Detect term occurrences

2015-09-10 Thread Markus Jelsma
If you are interested in just the number of occurrences of an indexed term,
the TermsComponent will give that answer.
Markus
 
-Original message-
> From: Francisco Andrés Fernández
> Sent: Thursday 10th September 2015 15:58
> To: solr-user@lucene.apache.org
> Subject: Detect term occurrences
>
> Hi all, I'm new to Solr.
> I want to detect all occurrences of terms existing in a thesaurus in 1 or
> more documents.
> What's the best strategy to do it?
> Doing a query for each term doesn't seem to be the best way.
> Many thanks,
> 
> Francisco
> 


Re: Detect term occurrences

2015-09-10 Thread Alexandre Rafalovitch
Can you tell us a bit more about the business case? Not the current
technical one. Because it is entirely possible Solr can solve the
higher-level problem out of the box without you doing manual term
comparisons. In which case, your problem scope is not quite right.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 September 2015 at 09:58, Francisco Andrés Fernández
 wrote:
> Hi all, I'm new to Solr.
> I want to detect all occurrences of terms existing in a thesaurus in 1 or
> more documents.
> What's the best strategy to do it?
> Doing a query for each term doesn't seem to be the best way.
> Many thanks,
>
> Francisco


Re: Detect term occurrences

2015-09-10 Thread Erick Erickson
_Assuming_ this isn't a high-throughput case _and_ the leaflet text isn't too big...

Index the thesaurus and fire all the terms of the query in a big OR
clause against the index as a _query_. Perhaps turn highlighting on
and highlight the entire leaflet text.
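
One way to read this is sketched below: a list of terms is turned into a
single OR query with highlighting switched on. Which side supplies the terms,
the field name `text`, and the batch size are assumptions; `hl` and `hl.fl`
are the standard highlighting parameters.

```python
def build_or_query(terms, field="text"):
    """Join terms into one boolean query of OR-ed phrase clauses."""
    clauses = []
    for term in terms:
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"')
    return " OR ".join(clauses)


def batches(items, size):
    """Yield consecutive slices so each query stays under a clause limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical two-term dictionary; real use would batch the full term list.
query = build_or_query(["diabetes", "heart failure"])
params = {"q": query, "hl": "true", "hl.fl": "text"}
print(params["q"])
```

Batching matters because Solr limits the number of boolean clauses per query
(maxBooleanClauses, 1024 by default), so 10^5 terms would not fit in one
request without raising that limit.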

Note, this is just "off the top of my head"; I really haven't thought
it through too far, and a lot depends on how many leaflets you have to
process and how often.

Best,
Erick



Re: Detect term occurrences

2015-09-10 Thread Francisco Andrés Fernández
Yes.
I have many drug product leaflets, each corresponding to 1 product. On the
other hand, we have a medical dictionary with about 10^5 terms.
I want to detect all the occurrences of those terms in any leaflet
document.
Could you give me a clue about the best way to do this?
Perhaps the best way is (as Walter suggests) to run all the queries every
time, as needed.
Regards,

Francisco

On Thu, Sep 10, 2015 at 11:14 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Can you tell us a bit more about the business case? Not the current
> technical one. Because it is entirely possible Solr can solve the
> higher-level problem out of the box without you doing manual term
> comparisons. In which case, your problem scope is not quite right.
>
> Regards,
>    Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/