Re: How to modify solr results programmatically

2008-05-21 Thread khirb7

Hello,
I am sorry to post this message again with the same content under a
different title. The title of my last post, "how to add a new parameter to
solr request", did not really reflect what I want to do, so I am posting it
again with the title "How to modify solr results".

Hello everybody,

I want to modify Solr's behaviour slightly and would like to know whether
it is possible. Here is my problem:
I give Solr documents to index whose uniqueKey field is based on the URL
and the time at which the crawler downloaded the page, so the uniqueKey is a
number computed as MyAlgo(Url+Time). The problem occurs at search time:
Solr returns results that contain duplicates. For example, the first 10
results may correspond to the same web page with the same content, because
they in fact share the same URL. I want to remove these duplicates, so I
would like to add a request parameter, for example permitdupp, which takes
the values true or false. If permitdupp=true I keep the default Solr
behaviour, but if permitdupp=false I want to remove all the duplicate
documents and keep only the most recently indexed one (my documents
contain a date field that identifies the most recent one).
So I want to know the easiest way to do this:
maybe there are Solr parameters I can use (faceting?), or
programmatically, in which case which classes do I have to modify or
inherit from to develop this solution?
Any suggestion is welcome. Thank you in advance.


Hello everybody,
I just want to add this example to make things clearer. I have this result
from Solr:

<result name="response" numFound="7" start="0" maxScore="0.59129626">
  <doc>
    <str name="id">1</str>
    <str name="DocUrl">http://www.sarkozy.fr</str>
    <str name="date">01/01/2008</str>
  </doc>
  <doc>
    <str name="id">2</str>
    <str name="DocUrl">http://www.sarkozy.fr</str>
    <str name="date">31/01/2008</str>
  </doc>
  <doc>
    <str name="id">3</str>
    <str name="DocUrl">http://www.sarkozy.fr</str>
    <str name="date">15/01/2008</str>
  </doc>
  .
  .
  .
</result>

Note that the DocUrl field is the same (http://www.sarkozy.fr) for the three
documents shown above. I want to get something like this in the result:

<result name="response" numFound="7" start="0" maxScore="0.59129626">
  <doc>
    <str name="id">2</str>
    <str name="DocUrl">http://www.sarkozy.fr</str>
    <str name="date">31/01/2008</str>
  </doc>
  .
  .
  .
</result>
That is, keep only the most recent one.

How can I deal with that? Thank you in advance.
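
To make the wanted behaviour concrete, here is a minimal client-side sketch in Java. It is not a Solr feature or a Solr API: LatestPerUrl, Hit and dedupe are hypothetical names, the hits are assumed to be already parsed from the response, and the date field is assumed to use day/month/year as in the example. It keeps, for each DocUrl, only the hit with the latest date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: collapses hits that share a DocUrl.
final class LatestPerUrl {

    // Assumption: the date field is formatted as day/month/year (31/01/2008).
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Minimal stand-in for one <doc> element of the response.
    static final class Hit {
        final String id;
        final String docUrl;
        final LocalDate date;

        Hit(String id, String docUrl, String date) {
            this.id = id;
            this.docUrl = docUrl;
            this.date = LocalDate.parse(date, FORMAT);
        }
    }

    // permitDupp=true returns the hits unchanged (the default behaviour);
    // permitDupp=false keeps only the most recent hit for each DocUrl.
    static List<Hit> dedupe(List<Hit> hits, boolean permitDupp) {
        if (permitDupp) {
            return new ArrayList<>(hits);
        }
        // LinkedHashMap keeps each URL at the rank of its first hit.
        Map<String, Hit> latest = new LinkedHashMap<>();
        for (Hit hit : hits) {
            Hit kept = latest.get(hit.docUrl);
            if (kept == null || hit.date.isAfter(kept.date)) {
                latest.put(hit.docUrl, hit);
            }
        }
        return new ArrayList<>(latest.values());
    }
}
```

With the three documents of the example and permitDupp=false, dedupe returns only the document with id 2. Because this runs after the search, numFound and paging still count the duplicates, which is why I would prefer a solution inside Solr.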
-- 
View this message in context: 
http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17357687.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



how to add a new parameter to solr request

2008-05-20 Thread khirb7

-- 
View this message in context: 
http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17338190.html



Re: how to add a new parameter to solr request

2008-05-20 Thread khirb7

-- 
View this message in context: 
http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17344135.html



Problem Highlighting

2008-04-15 Thread khirb7

Hello everybody,

Here is my problem:

When using highlighting, Solr returns only the best fragment (the most
relevant section of the document), like this:
Nicolas Sarkozy naît le 28 janvier 1955 dans le 17e
But I want Solr to return not just the best section but the best N sections
(where I specify N myself).
At first I thought that hl.snippets=number was the right way to generate the
best sections of text, but I noticed that this parameter has no effect on
the highlighting result, even when set per field like this:
http://localhost:8983/solr/select?indent=on&version=2.2&q=arcDoc%3Asarkozy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=arcDoc&f.arcDoc.hl.snippets=3&hl.fragsize=300
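
For reference, a request like this can be assembled from separate parameters so that every name and value is URL-encoded and joined with "&". The sketch below is only an illustration in plain Java: HighlightUrl and build are hypothetical names, not Solr classes, and only the parameter names already used in my request are taken from Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of Solr: joins request parameters into a URL.
final class HighlightUrl {

    // Appends the parameters to baseUrl in iteration order, using '?' before
    // the first one and '&' before each of the others.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // URL-encodes one name or value as UTF-8 (for example ':' becomes %3A).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a LinkedHashMap with q=arcDoc:sarkozy, hl=on, hl.fl=arcDoc, f.arcDoc.hl.snippets=3 and hl.fragsize=300 gives the highlighting part of the request in that order.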

The result I want would contain, for example, the best 3 sections, like
this:
Nicolas Sarkozy naît le 28 janvier 1955 dans le 17e ... Lorsque Paul
Sarkozy quitte le domicile conjugal en 1959 et ... Paul Sarkozy se ...

I found this in the source code of HighlightingUtils.class and
GapFragmenter.class:

// get highlighter, and number of fragments for this field
Highlighter highlighter = getHighlighter(query, fieldName, req);
int numFragments = getMaxSnippets(fieldName, req);

   ..
   ..
frag = highlighter.getBestTextFragments(tstream, docTexts[0], false,
    numFragments);

But why is numFragments always 1? Is this a known bug, or have I forgotten
something in my request or in a config parameter?

My other question is why there are similar classes (HighlightingUtils.class
and GapFragmenter.class) with different names, and which one is actually
used.

Thank you in advance.

-- 
View this message in context: 
http://www.nabble.com/Problrm-Highlighting-tp16698518p16698518.html



Re: Highlighting/getBestFragment

2008-04-13 Thread khirb7



Mike Klaas wrote:
> 
> On 10-Apr-08, at 7:41 AM, khirb7 wrote:
> 
>> I have done deep search and I found that lucene provide this that
>> methode  :
>> getBestFragments
>> highlighter.getBestFragments(tokenStream, text, maxNumFragment, ...);
>> 
>> so with this methode we can precise to lucene to return maxNumFragment
>> fragment (with highligted word)of fragsize characters, but there is no
>> maxFragSize parameter in solr. this would be useful in my case if I
>> want to highlight not only the first occurrence of a searched word but
>> up to 1 occurrence of the same word in the highlighted text.
> 
> I'm not sure I understand exactly what you want the parameter to do.
> 
> see http://wiki.apache.org/solr/HighlightingParameters
> 
> use:
> hl.fragsize=size to set the desired fragment size, and
> hl.snippets=number to set the number of returned snippets/fragments.
> 
> -Mike

Thank you for your response.

I think I was not clear enough in my last post (I had already read
http://wiki.apache.org/solr/HighlightingParameters before asking my
question). This is what I want to do.
Right now Solr gives one fragment in the response. I know that
hl.fragsize=size sets the desired fragment size and
hl.snippets=number sets the number of returned snippets/fragments. But
hl.snippets seems useful only with a multi-valued field (for instance
the feature field in the Solr example schema). In my case each document
has a single field myText of type text, so hl.snippets=number seems to
make no difference: the highlighted result is the same whether I use it or
not.

Here is what I want to do.
Lucene provides overloaded getBestFragment() methods to return fragments.
I think the Solr classes use this method:
highlighter.getBestFragment(tokenStream, text)
which returns one fragment containing the first highlighted occurrence of
the searched word. But I do not want only the first occurrence; I also want
the Nth ones (2nd, 3rd, ...),
so I want to replace the previous method with:

String result =
highlighter.getBestFragments(tokenStream, text, 5, ...);
Here maxNumFragment=5, which gives the five best fragments.
So I want to know where I must modify Solr to do that:
which class, and how?
Or can it be done in solrconfig.xml? I found that difficult; maybe I have
to create my own handler.

I am waiting for your suggestions on how to deal with that.
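
To show the difference I mean between one fragment and the N best fragments, here is a toy sketch in plain Java. It is not Lucene's or Solr's implementation and uses none of their classes: TopFragments, best and count are hypothetical names, and the scoring (number of occurrences of the term in a fixed-size window) is a simplification I assume only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only, not the Lucene Highlighter: picks the best N windows.
final class TopFragments {

    // Cuts text into windows of fragSize characters, scores each window by
    // how often term occurs in it, and returns up to maxNumFragments windows
    // that contain the term, best score first (ties keep document order).
    // A term that straddles two windows is not counted.
    static List<String> best(String text, String term, int fragSize, int maxNumFragments) {
        if (fragSize <= 0) {
            throw new IllegalArgumentException("fragSize must be positive");
        }
        List<String> matching = new ArrayList<>();
        for (int start = 0; start < text.length(); start += fragSize) {
            String fragment = text.substring(start, Math.min(text.length(), start + fragSize));
            if (count(fragment, term) > 0) {
                matching.add(fragment);
            }
        }
        // List.sort is stable, so equal scores stay in document order.
        matching.sort((a, b) -> Integer.compare(count(b, term), count(a, term)));
        int limit = Math.max(0, Math.min(maxNumFragments, matching.size()));
        return new ArrayList<>(matching.subList(0, limit));
    }

    // Counts case-insensitive, non-overlapping occurrences of term.
    static int count(String fragment, String term) {
        String haystack = fragment.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0;
        }
        int occurrences = 0;
        int index = haystack.indexOf(needle);
        while (index >= 0) {
            occurrences++;
            index = haystack.indexOf(needle, index + needle.length());
        }
        return occurrences;
    }
}
```

Calling best with maxNumFragments=1 gives the single-fragment result I get today, while maxNumFragments=5 would give up to five fragments, which is the behaviour I am asking for.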


 

 

-- 
View this message in context: 
http://www.nabble.com/Highlighting-getBestFragment-tp16608862p16656982.html



Highlighting/getBestFragment

2008-04-10 Thread khirb7

I have done a deep search and found that Lucene provides this method,
getBestFragments:
highlighter.getBestFragments(tokenStream, text, maxNumFragment, ...);

With this method we can tell Lucene to return maxNumFragment fragments
(with the highlighted word) of fragsize characters, but there is no
maxFragSize parameter in Solr. This would be useful in my case, as I want to
highlight not only the first occurrence of a searched word but up to 1
occurrence of the same word in the highlighted text.

So is it possible to add this option to Solr? How and where?

cheers


-- 
View this message in context: 
http://www.nabble.com/Highlighting-getBestFragment-tp16608862p16608862.html