Re: How to modify Solr results programmatically
Hello,

I am sorry to post this message again with the same content under a different title. I did so because the title I gave my last post, "how to add a new parameter to solr request", does not really reflect what I want to do, so I am posting it again under the title "How to modify Solr results".

Hello everybody,

I want to modify Solr's behaviour slightly and would like to know whether this is possible. Here is my problem: the documents I give Solr to index have a uniqueKey field based on the URL and the time at which the crawler downloaded the page, so the uniqueKey is a number computed as MyAlgo(Url+Time). The problem occurs at search time: Solr returns results that contain duplicates. For example, the first 10 results may all correspond to the same web page with the same content, because they all come from the same URL.

I want to remove these duplicates by adding a parameter to the Solr request, for example permitdupp, which takes the values true or false. If permitdupp=true, Solr keeps its default behaviour; if permitdupp=false, I want to remove all duplicate documents and keep only the most recently indexed one (my documents contain a date field that identifies the most recent one).

What is the easiest way to do this? Either there are existing Solr parameters I should use (faceting?), or I have to do it programmatically, in which case which classes do I have to modify or inherit from to develop this solution? Any suggestion is welcome.

Thank you in advance.

I just want to add this example to make things clearer. I get this result from Solr:

    <result name="response" numFound="7" start="0" maxScore="0.59129626">
      <doc>
        <str name="id">1</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">01/01/2008</str>
      </doc>
      <doc>
        <str name="id">2</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">31/01/2008</str>
      </doc>
      <doc>
        <str name="id">3</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">15/01/2008</str>
      </doc>
      ...
    </result>

Note that the DocUrl field has the same value (http://www.sarkozy.fr) in the three documents shown above. I want to get something like this instead:

    <result name="response" numFound="7" start="0" maxScore="0.59129626">
      <doc>
        <str name="id">2</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">31/01/2008</str>
      </doc>
      ...
    </result>

That is, only the most recent document is kept. How can I do that?

Thank you in advance.

--
View this message in context: http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17357687.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
how to add a new parameter to solr request
Hello everybody,

I want to modify Solr's behaviour slightly and would like to know whether this is possible. Here is my problem: the documents I give Solr to index have a uniqueKey field based on the URL and the time at which the crawler downloaded the page, so the uniqueKey is a number computed as MyAlgo(Url+Time). The problem occurs at search time: Solr returns results that contain duplicates. For example, the first 10 results may all correspond to the same web page with the same content, because they all come from the same URL.

I want to remove these duplicates by adding a parameter to the Solr request, for example permitdupp, which takes the values true or false. If permitdupp=true, Solr keeps its default behaviour; if permitdupp=false, I want to remove all duplicate documents and keep only the most recently indexed one (my documents contain a date field that identifies the most recent one).

What is the easiest way to do this? Either there are existing Solr parameters I should use (faceting?), or I have to do it programmatically, in which case which classes do I have to modify or inherit from to develop this solution? Any suggestion is welcome.

Thank you in advance.

--
View this message in context: http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17338190.html
Re: how to add a new parameter to solr request
Hello everybody,

I just want to add this example to make things clearer. I get this result from Solr:

    <result name="response" numFound="7" start="0" maxScore="0.59129626">
      <doc>
        <str name="id">1</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">01/01/2008</str>
      </doc>
      <doc>
        <str name="id">2</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">31/01/2008</str>
      </doc>
      <doc>
        <str name="id">3</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">15/01/2008</str>
      </doc>
      ...
    </result>

Note that the DocUrl field has the same value (http://www.sarkozy.fr) in the three documents shown above. I want to get something like this instead:

    <result name="response" numFound="7" start="0" maxScore="0.59129626">
      <doc>
        <str name="id">2</str>
        <str name="DocUrl">http://www.sarkozy.fr</str>
        <str name="date">31/01/2008</str>
      </doc>
      ...
    </result>

That is, only the most recent document is kept. How can I do that?

Thank you in advance.

--
View this message in context: http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17344135.html
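The behaviour asked for above (one document per DocUrl, keeping the newest) can be sketched on the client side, after the response has been parsed. This is only a minimal illustration, not a Solr extension point: the Doc class, the LatestPerUrl name, and the dd/MM/yyyy date format are assumptions made for the example, based on the three fields shown in the sample response.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestPerUrl {
    /** Hypothetical holder for the three fields shown in the sample response. */
    static final class Doc {
        final String id;
        final String docUrl;
        final String date;

        Doc(String id, String docUrl, String date) {
            this.id = id;
            this.docUrl = docUrl;
            this.date = date;
        }

        @Override
        public String toString() {
            return "Doc(id=" + id + ", DocUrl=" + docUrl + ", date=" + date + ")";
        }
    }

    // Assumption: the sample dates (e.g. 31/01/2008) are day/month/year.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    /** Keeps, for each DocUrl, only the document with the latest date. */
    static List<Doc> latestPerUrl(List<Doc> docs) {
        // LinkedHashMap: each URL stays at the position of its first hit.
        Map<String, Doc> newest = new LinkedHashMap<>();
        for (Doc d : docs) {
            newest.merge(d.docUrl, d, (kept, candidate) ->
                    LocalDate.parse(candidate.date, FMT).isAfter(LocalDate.parse(kept.date, FMT))
                            ? candidate
                            : kept);
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("1", "http://www.sarkozy.fr", "01/01/2008"),
                new Doc("2", "http://www.sarkozy.fr", "31/01/2008"),
                new Doc("3", "http://www.sarkozy.fr", "15/01/2008"));
        System.out.println(latestPerUrl(docs));
    }
}
```

Filtering on the client only removes duplicates within the page of results that was fetched, so numFound and paging would still count the duplicates. Doing this on the server is the problem that Solr's field collapsing (result grouping) feature addresses, where the Solr version in use provides it.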
Problem Highlighting
Hello everybody,

Here is my problem: when highlighting is enabled, Solr returns only the best fragment (the most relevant section of the document), like this:

    Nicolas Sarkozy naît le 28 janvier 1955 dans le 17e

I want Solr to return not just the best section but the best several sections, with the number specified by me. At first I thought that hl.snippets=number was the right way to get the best sections of text, but I noticed that this parameter has no effect on the highlighting result, even when I set it per field like this:

    http://localhost:8983/solr/select?indent=on&version=2.2&q=arcDoc%3Asarkozy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=arcDoc&f.arcDoc.hl.snippets=3&hl.fragsize=300

The result I want would contain, for example, the best 3 sections, like this:

    Nicolas Sarkozy naît le 28 janvier 1955 dans le 17e ... Lorsque Paul Sarkozy quitte le domicile conjugal en 1959 et ... Paul Sarkozy se ...

I found the following in the source code of HighlightingUtils.class and GapFragmenter.class:

    // get highlighter, and number of fragments for this field
    Highlighter highlighter = getHighlighter(query, fieldName, req);
    int numFragments = getMaxSnippets(fieldName, req);
    ..
    ..
    frag = highlighter.getBestTextFragments(tstream, docTexts[0], false, numFragments);

But why is numFragments always 1? Is this a known bug, or have I forgotten something in my request or a config parameter?

My other question: why are there similar classes with different names (HighlightingUtils.class and GapFragmenter.class), and which one is actually used?

Thank you in advance.

--
View this message in context: http://www.nabble.com/Problrm-Highlighting-tp16698518p16698518.html
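As archived, the request URL above shows no separators between its parameters, which is most likely an artifact of the archive rather than of the original request. For reference, here is a minimal sketch of assembling such a request so that each parameter is encoded and separated explicitly. The HighlightQueryUrl class and its build method are hypothetical helpers written for this example; the parameter names and values are the ones quoted in the message. It does not by itself explain why numFragments was 1.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class HighlightQueryUrl {
    /** Joins URL-encoded key=value pairs with '&' and appends them to base. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in the order they were added.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "arcDoc:sarkozy");
        params.put("fl", "*,score");
        params.put("hl", "on");
        params.put("hl.fl", "arcDoc");
        params.put("f.arcDoc.hl.snippets", "3"); // per-field snippet count
        params.put("hl.fragsize", "300");
        System.out.println(build("http://localhost:8983/solr/select", params));
    }
}
```

Printing the assembled URL before sending it makes it easy to confirm that f.arcDoc.hl.snippets really reaches Solr as its own parameter.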
Re: Highlighting/getBestFragment
Mike Klaas wrote:
> On 10-Apr-08, at 7:41 AM, khirb7 wrote:
>> I have done a deep search and found that Lucene provides this method, getBestFragments:
>>     highlighter.getBestFragments(tokenStream, text, maxNumFragment, ...);
>> With this method we can tell Lucene to return maxNumFragment fragments (with the highlighted word) of fragsize characters, but there is no maxFragSize parameter in Solr. This would be useful in my case, where I want to highlight not only the first occurrence of a searched word but also further occurrences of the same word in the highlighted text.
>
> I'm not sure I understand exactly what you want the parameter to do.
> See http://wiki.apache.org/solr/HighlightingParameters
> Use hl.fragsize=size to set the desired fragment size, and hl.snippets=number to set the number of returned snippets/fragments.
>
> -Mike

Thank you for your response. I think I was not clear enough in my last post (I had already read http://wiki.apache.org/solr/HighlightingParameters before asking my question). This is what I want to do.

At the moment Solr returns one fragment in the response. I know that hl.fragsize=size sets the desired fragment size and hl.snippets=number sets the number of returned snippets/fragments. But hl.snippets seems to be useful only with a multi-valued field (for instance the feature field in the Solr example schema). In my case each document has a single field, myText, of type text, so hl.snippets=number makes no difference: the highlighted result is the same whether I use it or not.

Lucene provides overloaded getBestFragment() methods to return fragments. I think the Solr classes use this method:

    highlighter.getBestFragment(tokenStream, text)

which returns one fragment containing the first occurrence of the searched word, highlighted. But I do not want only the first occurrence; I also want the Nth ones (2nd, 3rd, ...), so I want to replace the previous method with:

    String result = highlighter.getBestFragments(tokenStream, text, 5, ...);

Here maxNumFragment=5, which gives the five best fragments. So I want to know where I must modify Solr to do that: which class, and how? Or can it be done in solrconfig.xml? I found this difficult; maybe I have to create my own handler. I am waiting for your suggestions on how to deal with this.

--
View this message in context: http://www.nabble.com/Highlighting-getBestFragment-tp16608862p16656982.html
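The difference between asking for one best fragment and asking for the N best can be illustrated without Lucene at all. The sketch below is a deliberately simplified stand-in and not Lucene's Highlighter: the TopFragments class, the fixed-size chunking, and the scoring by number of occurrences are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopFragments {
    /** Counts non-overlapping, case-insensitive occurrences of term in s. */
    static int count(String s, String term) {
        if (term.isEmpty()) {
            return 0;
        }
        String hay = s.toLowerCase();
        String needle = term.toLowerCase();
        int n = 0;
        for (int i = hay.indexOf(needle); i >= 0; i = hay.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    /**
     * Cuts text into chunks of at most fragSize characters and returns up to
     * maxNum chunks that contain the term, highest occurrence count first.
     */
    static List<String> bestFragments(String text, String term, int fragSize, int maxNum) {
        if (fragSize <= 0) {
            throw new IllegalArgumentException("fragSize must be positive");
        }
        List<String> hits = new ArrayList<>();
        for (int start = 0; start < text.length(); start += fragSize) {
            String frag = text.substring(start, Math.min(text.length(), start + fragSize));
            if (count(frag, term) > 0) {
                hits.add(frag);
            }
        }
        // List.sort is stable, so chunks with equal counts stay in document order.
        hits.sort(Comparator.comparingInt((String f) -> count(f, term)).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(Math.max(maxNum, 0), hits.size())));
    }
}
```

With maxNum set to 1 this returns a single chunk, which corresponds to the behaviour described in the message; a larger maxNum returns further chunks that mention the term. A real fragmenter would also avoid cutting words at chunk boundaries.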
Highlighting/getBestFragment
I have done a deep search and found that Lucene provides this method, getBestFragments:

    highlighter.getBestFragments(tokenStream, text, maxNumFragment, ...);

With this method we can tell Lucene to return maxNumFragment fragments (with the highlighted word) of fragsize characters, but there is no maxFragSize parameter in Solr. This would be useful in my case, where I want to highlight not only the first occurrence of a searched word but also further occurrences of the same word in the highlighted text.

So is it possible to add this option to Solr? How and where?

Cheers

--
View this message in context: http://www.nabble.com/Highlighting-getBestFragment-tp16608862p16608862.html