Re: [More Like This] Query building
Hi Alessandro,

It's not uncommon for Solr patches to remain uncommitted for months, even years. In fact, some never get merged. Don't let that discourage you!

k/r,
Scott

On Fri, Mar 11, 2016 at 11:49 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
Re: [More Like This] Query building
I'm starting to feel that it is not that easy to contribute improvements or small fixes to Solr (when they are not of great interest to most users). I think this one could be a good improvement to MLT, but I would love to discuss it with a committer. The patch has been attached for months now... Any feedback would be appreciated: I want to contribute, but I need some second opinions.

Cheers

On 11 February 2016 at 13:48, Alessandro Benedetti wrote:
Re: [More Like This] Query building
Hi guys,
Is it possible to get any feedback? Is there any process to speed up bug resolution or discussions? I just want to understand whether the patch is not good enough, whether I need to improve it, or whether simply no one has taken a look yet...

https://issues.apache.org/jira/browse/LUCENE-6954

Cheers

On 11 January 2016 at 15:25, Alessandro Benedetti wrote:

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
Re: [More Like This] Query building
Hi guys,
The patch seems fine to me. I haven't spent much more time on the code, but I checked the tests and the pre-commit checks, and everything seems fine. Let me know.

Cheers

On 31 December 2015 at 18:40, Alessandro Benedetti wrote:
Re: [More Like This] Query building
https://issues.apache.org/jira/browse/LUCENE-6954

A first draft patch is available; I will check the tests more thoroughly in the new year!

On 29 December 2015 at 13:43, Alessandro Benedetti wrote:
[More Like This] Query building
Hi guys,
While exploring the way we build the More Like This query, I discovered a part I am not convinced of.

Let's see how we build the query.

org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)

1) We extract the terms from the interesting fields and add them to a single map:

Map<String, Int> termFreqMap = new HashMap<>();

(Here we lose the field -> term relation: we no longer know which field each term came from!)

org.apache.lucene.queries.mlt.MoreLikeThis#createQueue

2) We build the queue that will contain the query terms. At this point we connect these terms to a field again, but:

...
> // go through all the fields and find the largest document frequency
> String topField = fieldNames[0];
> int docFreq = 0;
> for (String fieldName : fieldNames) {
>   int freq = ir.docFreq(new Term(fieldName, word));
>   topField = (freq > docFreq) ? fieldName : topField;
>   docFreq = (freq > docFreq) ? freq : docFreq;
> }
> ...

We identify topField as the field in which the term has the highest document frequency. Then we build the term query:

queue.add(new ScoreTerm(word, topField, score, idf, docFreq, tf));

So whichever field a term actually came from in the source document, it is only queried against the field where it is most frequent in the index. In this way we lose a lot of precision, and I am not sure why we do it. I would prefer to keep the relation between terms and fields; I think this could improve the quality of the MLT query a lot. If I run MLT on two fields, *description* and *facilities* for example, I most likely want to find documents with similar terms in the description and similar terms in the facilities, without mixing things up and losing the semantics of the terms.

Let me know your opinion,

Cheers
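To make the precision loss described above concrete, here is a minimal, self-contained Java sketch contrasting the current strategy (one merged term map, field re-chosen by highest docFreq) with the per-field alternative. It is not the Lucene code nor the LUCENE-6954 patch: the class MltFieldSketch and its methods are hypothetical, a plain map of field -> term -> docFreq stands in for IndexReader#docFreq, Integer counts replace Lucene's internal Int holder, and the field contents and document frequencies are made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the two MLT term-selection strategies (not Lucene code). */
public class MltFieldSketch {

  /** Current behaviour: terms from every field go into one map, so the field is forgotten. */
  public static Map<String, Integer> mergedTermFreqs(Map<String, List<String>> docFields) {
    Map<String, Integer> termFreqMap = new HashMap<>();
    for (List<String> terms : docFields.values()) {
      for (String term : terms) {
        termFreqMap.merge(term, 1, Integer::sum); // same term in two fields -> one entry
      }
    }
    return termFreqMap;
  }

  /**
   * Mirrors the quoted createQueue loop: pick the field with the largest docFreq.
   * A map lookup (assumption) stands in for ir.docFreq(new Term(fieldName, word)).
   */
  public static String topField(String word, String[] fieldNames,
                                Map<String, Map<String, Integer>> indexDocFreq) {
    String topField = fieldNames[0];
    int docFreq = 0;
    for (String fieldName : fieldNames) {
      int freq = indexDocFreq.getOrDefault(fieldName, Map.of()).getOrDefault(word, 0);
      topField = (freq > docFreq) ? fieldName : topField;
      docFreq = (freq > docFreq) ? freq : docFreq;
    }
    return topField;
  }

  /** Proposed behaviour: keep one term-frequency map per source field. */
  public static Map<String, Map<String, Integer>> perFieldTermFreqs(
      Map<String, List<String>> docFields) {
    Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> e : docFields.entrySet()) {
      Map<String, Integer> freqs = result.computeIfAbsent(e.getKey(), k -> new HashMap<>());
      for (String term : e.getValue()) {
        freqs.merge(term, 1, Integer::sum);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up source document: "pool" occurs only in facilities.
    Map<String, List<String>> doc = new LinkedHashMap<>();
    doc.put("description", List.of("quiet", "hotel"));
    doc.put("facilities", List.of("pool", "hotel"));

    // Made-up index statistics: "pool" is more common in description overall.
    Map<String, Map<String, Integer>> index = Map.of(
        "description", Map.of("quiet", 40, "hotel", 900, "pool", 120),
        "facilities", Map.of("hotel", 10, "pool", 80));
    String[] fields = {"description", "facilities"};

    // Current strategy: each merged term is re-attached to its highest-docFreq field.
    for (String term : mergedTermFreqs(doc).keySet()) {
      System.out.println("current:  " + topField(term, fields, index) + ":" + term);
    }
    // Proposed strategy: each term stays with the field it was extracted from.
    perFieldTermFreqs(doc).forEach((field, freqs) ->
        freqs.keySet().forEach(term -> System.out.println("proposed: " + field + ":" + term)));
  }
}
```

With these made-up statistics the current strategy sends all three terms to description, including "pool" (docFreq 120 vs 80), even though it only occurs in the source document's facilities field; the per-field map keeps facilities:pool and facilities:hotel alongside description:quiet and description:hotel.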
Re: [More Like This] Query building
Feel free to create a JIRA and put up a patch if you can.

On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti wrote:

--
Anshum Gupta
Re: [More Like This] Query building
Sure, I will proceed tomorrow with the JIRA and a simple patch + tests.
In the meantime, let's try to collect some additional feedback.

Cheers

On 29 December 2015 at 12:43, Anshum Gupta wrote: