I am starting to feel that it is not that easy to contribute improvements or small fixes to Solr (if they are not super interesting to the masses).
I think this one could be a good improvement to the MLT, but I would love to discuss it with some committer.
The patch is attached and has been there for months...
Any feedback would be appreciated. I want to contribute, but I need some second opinions...
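As a rough illustration of the idea in the quoted message below (this is not the attached patch), here is a minimal sketch of the difference between a single flattened term-frequency map and one that keeps the field -> term relation. It uses plain Java collections rather than the Lucene API, and the class name `PerFieldTermFreq`, both method names, and the sample field contents are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: plain collections, no Lucene classes.
class PerFieldTermFreq {

    // Flattened counting: one map shared by all fields, so the field a term
    // came from is lost (the situation described for termFreqMap).
    static Map<String, Integer> flatTermFreq(Map<String, String[]> fieldTokens) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String[] tokens : fieldTokens.values()) {
            for (String token : tokens) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Per-field counting: field -> (term -> frequency), so each term stays
    // attached to the field it was extracted from.
    static Map<String, Map<String, Integer>> perFieldTermFreq(Map<String, String[]> fieldTokens) {
        Map<String, Map<String, Integer>> result = new HashMap<>();
        for (Map.Entry<String, String[]> entry : fieldTokens.entrySet()) {
            Map<String, Integer> freqs =
                result.computeIfAbsent(entry.getKey(), k -> new HashMap<>());
            for (String token : entry.getValue()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample document with already-tokenized field values.
        Map<String, String[]> doc = new HashMap<>();
        doc.put("description", new String[] {"pool", "sea", "pool"});
        doc.put("facilities", new String[] {"pool", "wifi"});

        System.out.println("flat 'pool': " + flatTermFreq(doc).get("pool"));
        Map<String, Map<String, Integer>> perField = perFieldTermFreq(doc);
        System.out.println("description 'pool': " + perField.get("description").get("pool"));
        System.out.println("facilities 'pool': " + perField.get("facilities").get("pool"));
    }
}
```

With the per-field map, a query term for "pool" can be built once per field with that field's own frequency, instead of being assigned to a single top field.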
Cheers

On 11 February 2016 at 13:48, Alessandro Benedetti <abenede...@apache.org> wrote:

> Hi Guys,
> is it possible to have any feedback ?
> Is there any process to speed up bug resolution / discussions ?
> just want to understand if the patch is not good enough, if I need to
> improve it or simply no-one took a look ...
>
> https://issues.apache.org/jira/browse/LUCENE-6954
>
> Cheers
>
> On 11 January 2016 at 15:25, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
>> Hi guys,
>> the patch seems fine to me.
>> I didn't spend much more time on the code but I checked the tests and the
>> pre-commit checks.
>> It seems fine to me.
>> Let me know ,
>>
>> Cheers
>>
>> On 31 December 2015 at 18:40, Alessandro Benedetti <abenede...@apache.org>
>> wrote:
>>
>>> https://issues.apache.org/jira/browse/LUCENE-6954
>>>
>>> First draft patch available, I will check better the tests new year !
>>>
>>> On 29 December 2015 at 13:43, Alessandro Benedetti <
>>> abenede...@apache.org> wrote:
>>>
>>>> Sure, I will proceed tomorrow with the Jira and the simple patch +
>>>> tests.
>>>>
>>>> In the meantime let's try to collect some additional feedback.
>>>>
>>>> Cheers
>>>>
>>>> On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net>
>>>> wrote:
>>>>
>>>>> Feel free to create a JIRA and put up a patch if you can.
>>>>>
>>>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
>>>>> abenede...@apache.org> wrote:
>>>>>
>>>>> > Hi guys,
>>>>> > While I was exploring the way we build the More Like This query, I
>>>>> > discovered a part I am not convinced of.
>>>>> >
>>>>> > Let's see how we build the query :
>>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>>>>> >
>>>>> > 1) we extract the terms from the interesting fields, adding them to
>>>>> > a map :
>>>>> >
>>>>> > Map<String, Int> termFreqMap = new HashMap<>();
>>>>> >
>>>>> > *( we lose the relation field -> term, we no longer know which field
>>>>> > the term came from ! )*
>>>>> >
>>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>>>>> >
>>>>> > 2) we build the queue that will contain the query terms; at this
>>>>> > point we connect these terms to some field again, but :
>>>>> >
>>>>> > ...
>>>>> >> // go through all the fields and find the largest document frequency
>>>>> >> String topField = fieldNames[0];
>>>>> >> int docFreq = 0;
>>>>> >> for (String fieldName : fieldNames) {
>>>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
>>>>> >>   topField = (freq > docFreq) ? fieldName : topField;
>>>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
>>>>> >> }
>>>>> >> ...
>>>>> >
>>>>> > We identify the topField as the field with the highest document
>>>>> > frequency for the term t.
>>>>> > Then we build the termQuery :
>>>>> >
>>>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>>>>> >
>>>>> > In this way we lose a lot of precision.
>>>>> > Not sure why we do that.
>>>>> > I would prefer to keep the relation between terms and fields.
>>>>> > The quality of the MLT query could improve a lot.
>>>>> > If I run the MLT on 2 fields : *description* and *facilities* for
>>>>> > example.
>>>>> > It is likely I want to find documents with similar terms in the
>>>>> > description and similar terms in the facilities, without mixing
>>>>> > things up and losing the semantics of the terms.
>>>>> >
>>>>> > Let me know your opinion,
>>>>> >
>>>>> > Cheers
>>>>> >
>>>>> > --
>>>>> > --------------------------
>>>>> >
>>>>> > Benedetti Alessandro
>>>>> > Visiting card : http://about.me/alessandro_benedetti
>>>>> >
>>>>> > "Tyger, tyger burning bright
>>>>> > In the forests of the night,
>>>>> > What immortal hand or eye
>>>>> > Could frame thy fearful symmetry?"
>>>>> >
>>>>> > William Blake - Songs of Experience -1794 England
>>>>> >
>>>>>
>>>>> --
>>>>> Anshum Gupta
>>>>>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti