Total Term Frequency per ResultSet in Solr 4.3 ?
Hi,

I have a lot of crawled data indexed in Solr (4.3.0). Say a user creates a search criterion 'X1' and wants to know how many times a specific term occurs in the result set of 'X1'. Then the user creates another search criterion, 'X2', and wants to know how many times that same term occurs in the result set of 'X2'.

At the moment, termfreq(field,term) gives me the term frequency per document, and totaltermfreq(field,term) gives me the total term frequency in the entire index, not in the result set of my search criteria.

So I need your help to find out how to get the total number of occurrences of a term in a query's result set.

If this is my result set:

<doc>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The Hunger Games</str>
</doc>
<doc>
  <str name="type">Books</str>
  <str name="format">paperback</str>
  <str name="product">The Hunger Book</str>
</doc>

and I am looking for the term 'hunger' in the product field, then I want to get the value 2; if I am searching for the term 'games' in the product field, I want to get the value 1.

Thanks,
Tony
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Sorry, but there is no such feature in Solr at this time - you would have to do it manually, either by retrieving all of the results or by writing a custom value source (function) that does the desired calculation within Solr.

Feel free to file a Jira suggesting such a new feature/improvement.

-- Jack Krupansky

-----Original Message-----
From: Tony Mullins
Sent: Thursday, July 04, 2013 9:45 AM
To: solr-user@lucene.apache.org
Subject: Total Term Frequency per ResultSet in Solr 4.3 ?

So what I need is your help to find how to get total occurrence of a term in query's result set.
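The "retrieve all of the results and do it manually" route can be sketched on the client side. This is only an illustration under stated assumptions: the helper names build_params and total_term_freq, the alias tf, and the hand-written sample response are all hypothetical, and the sketch assumes a wt=json response whose documents carry a termfreq() pseudo-field requested through fl.

```python
from urllib.parse import urlencode


def build_params(query, field, term, rows):
    """Build select parameters asking Solr for a per-document termfreq pseudo-field.

    The alias "tf" is an arbitrary name chosen for this sketch.
    """
    return urlencode({
        "q": query,
        "fl": f"id,tf:termfreq({field},'{term}')",
        "rows": rows,  # must cover the whole result set, otherwise page through it
        "wt": "json",
    })


def total_term_freq(response, alias="tf"):
    """Sum the per-document term frequency over every returned document."""
    return sum(doc.get(alias, 0) for doc in response["response"]["docs"])


# Hand-written stand-in for a parsed JSON response (not real Solr output).
sample = {
    "response": {
        "numFound": 2,
        "docs": [
            {"id": "1", "tf": 1},  # "The Hunger Games"
            {"id": "2", "tf": 1},  # "The Hunger Book"
        ],
    }
}

print(build_params("X1", "product", "hunger", 1000))
print(total_term_freq(sample))
```

The sum is only correct if every matching document is actually returned, so for large result sets the rows value (or paging) becomes the main cost of this approach.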
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
If you just want to retrieve those counts, this seems like simple faceting.

q=something
facet=true
facet.query=product:hunger
facet.query=product:games

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins wrote:
> So what I need is your help to find how to get total occurrence of a term in query's result set.
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Hi Yonik,

Faceting didn't work. Please see the query and the result document below:

http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29&fq=id%3A27&q=spider&fl=*&df=product&wt=xml&indent=true&facet=true&facet.query=product:spider&facet.query=product:amazing&rows=20

<doc>
  <str name="id">27</str>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The amazing spider man is amazing spider the spider</str>
  <int name="popularity">1</int>
  <long name="_version_">1439641369145507840</long>
  <int name="amazing_freq">2</int>
  <int name="spider_freq">3</int>
</doc>
</result>
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="product:spider">1</int>
    <int name="product:amazing">1</int>
  </lst>

As you can see, faceting just returns the number of documents found for those keywords, not the actual frequency. The actual frequency is returned by the fields 'amazing_freq' and 'spider_freq'.

So is there any workaround to get the total term frequency in the result set without modifying the Solr source code?

Thanks,
Tony

On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley wrote:
> If you just want to retrieve those counts, this seems like simple faceting.
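The difference between the two numbers in that response (a facet count of 1 versus a termfreq of 3) can be reproduced outside Solr. The sketch below is a toy illustration only: it assumes plain lowercase whitespace tokenization rather than Solr's configured analyzer, and the function names doc_count and term_count are made up for this example.

```python
def doc_count(docs, field, term):
    """Number of documents containing the term at least once (what facet.query reports)."""
    return sum(1 for d in docs if term in d[field].lower().split())


def term_count(docs, field, term):
    """Total number of occurrences of the term across all documents (what is being asked for)."""
    return sum(d[field].lower().split().count(term) for d in docs)


# The single document from the response above.
docs = [
    {"id": "27", "product": "The amazing spider man is amazing spider the spider"},
]

for term in ("spider", "amazing"):
    print(term, doc_count(docs, "product", term), term_count(docs, "product", term))
```

For this document, both terms give a document count of 1, while the occurrence counts are 3 for 'spider' and 2 for 'amazing', matching spider_freq and amazing_freq in the response.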
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Ah, sorry - I thought you were after docfreq, not termfreq.

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins wrote:
> As you can see facet is actually just returning the no. of docs found against those keywords not the actual frequency.
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
So what is the workaround for this problem? Can it be done without changing any source code?

Thanks,
Tony

On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley wrote:
> Ah, sorry - I thought you were after docfreq, not termfreq.
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
These statistics are used for determining document relevance, or score, for the query itself. As such, they are one of two things: (per field) per document, or for the universe of documents in the collection. That's it, one of the two. You keep referring to ResultSet, but there is no such concept in relevancy or scoring, at least in the Lucene model for relevancy and scoring.

If you want more details on Lucene/Solr scoring, see:
http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Feel free to propose an alternative model for relevancy and scoring, but don't expect an implementation of such a model in the near term.

You might also be able to implement your alternative model for relevance and scoring using a custom Similarity (scoring) plug-in, coupled with custom Value Sources to expose whatever alternative metrics you wish. But before you embark on such a venture, be aware that the performance of such an alternative relevance model might not be as appealing as you might want. You'll have to do a proof of concept to see how well things actually work out.

-- Jack Krupansky

-----Original Message-----
From: Tony Mullins
Sent: Thursday, July 04, 2013 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Total Term Frequency per ResultSet in Solr 4.3 ?

So what is the workaround for this problem? Can it be done without changing any source code?
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Hi Tony,

Have you seen the TermVectorComponent (http://wiki.apache.org/solr/TermVectorComponent)? It will return the term vectors for the documents in your result set. Note that the rows parameter matters if you want results for the whole set; the default is 10. Term vectors must also be stored for each field that you want term frequencies returned for.

Suppose you have the query

http://localhost:8983/solr/collection1/tvrh?q=cable&fl=includes&tv.tf=true

on the example that comes packaged with Solr. Then part of the response is:

<lst name="termVectors">
  <str name="uniqueKeyFieldName">id</str>
  <lst name="IW-02">
    <str name="uniqueKey">IW-02</str>
  </lst>
  <lst name="9885A004">
    <str name="uniqueKey">9885A004</str>
    <lst name="includes">
      <lst name="32mb"><int name="tf">1</int></lst>
      <lst name="av"><int name="tf">1</int></lst>
      <lst name="battery"><int name="tf">1</int></lst>
      <lst name="cable"><int name="tf">2</int></lst>
      <lst name="card"><int name="tf">1</int></lst>
      <lst name="sd"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="3007WFP">
    <str name="uniqueKey">3007WFP</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="MA147LL/A">
    <str name="uniqueKey">MA147LL/A</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="earbud"><int name="tf">1</int></lst>
      <lst name="headphones"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
</lst>

Then you can use an XPath query like

sum(//lst[@name='cable']/int[@name='tf'])

where 'cable' is the term, to calculate the term frequency in the 'includes' field for the whole result set. You could extend this to get the term frequency across all fields for your result set with some alterations to the query and the schema.xml configuration. Alternatively, you could get the response as JSON (wt=json) and use JavaScript to sum.

I know this is not terribly efficient but, if I'm understanding your request correctly, it's possible.

Cheers,
Tricia

On Thu, Jul 4, 2013 at 10:24 AM, Tony Mullins wrote:
> So what is the workaround for this problem? Can it be done without changing any source code?
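The XPath summation described in this message can also be sketched in Python with the standard library's ElementTree, which supports a subset of XPath but not sum(), so the addition happens in Python instead. The SAMPLE string is a hand-trimmed copy of the response quoted above (only 'cable' and 'usb' are kept), the function name total_tf is hypothetical, and the field and term are inserted into the path without any escaping.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the termVectors section shown in the thread, wrapped in a root element.
SAMPLE = """
<response>
  <lst name="termVectors">
    <str name="uniqueKeyFieldName">id</str>
    <lst name="9885A004">
      <str name="uniqueKey">9885A004</str>
      <lst name="includes">
        <lst name="cable"><int name="tf">2</int></lst>
        <lst name="usb"><int name="tf">1</int></lst>
      </lst>
    </lst>
    <lst name="3007WFP">
      <str name="uniqueKey">3007WFP</str>
      <lst name="includes">
        <lst name="cable"><int name="tf">1</int></lst>
        <lst name="usb"><int name="tf">1</int></lst>
      </lst>
    </lst>
    <lst name="MA147LL/A">
      <str name="uniqueKey">MA147LL/A</str>
      <lst name="includes">
        <lst name="cable"><int name="tf">1</int></lst>
        <lst name="usb"><int name="tf">1</int></lst>
      </lst>
    </lst>
  </lst>
</response>
"""


def total_tf(xml_text, field, term):
    """Sum the tf values of one term in one field over all documents in the response."""
    root = ET.fromstring(xml_text.strip())
    path = (
        ".//lst[@name='termVectors']/lst"          # one <lst> per document
        f"/lst[@name='{field}']/lst[@name='{term}']"  # the field, then the term
        "/int[@name='tf']"
    )
    return sum(int(node.text) for node in root.findall(path))


print(total_tf(SAMPLE, "includes", "cable"))  # 2 + 1 + 1 = 4
```

As the message notes, the total only covers the documents actually returned, so rows has to be large enough to include the whole result set.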
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
OK. Thanks Tricia, Jack, and Yonik for your suggestions and time.

Regards,
Tony.

On Fri, Jul 5, 2013 at 1:20 AM, P Williams wrote:
> I know this is not terribly efficient but, if I'm understanding your request correctly, it's possible.