Re: Problem generating summaries for redirected URLs

Elena Tue, 02 Dec 2008 00:02:23 -0800

Thank you for your response. I was confused because summaries were only
generated for certain URLs.


Is there any date set for the 1.0 release?

Elena


2008/11/25 Dennis Kubes <[EMAIL PROTECTED]>

>
>
> Elena wrote:
>
>> Hello everyone,
>>
>> I am using Nutch with the Solr plugin, and I am having a problem indexing
>> redirected URLs. While Solr generates its fields just fine, as if they
>> belonged to the redirected URL, Nutch leaves the summary field empty. It
>> seems as if Nutch tries to generate the summary of the original URL and
>> then makes the query to Solr, which then follows the redirect and fills
>> the rest of the fields using the final URL. But I am not quite sure of this.
>>
>
> It depends on what version of Nutch you are using.  This was a problem with
> some older trunk versions.  The problem is that Nutch has the concept of a
> representative URL for redirects.  A redirect has an original URL and a
> redirected-to URL.  Logic dictates which of those is stored as the URL and
> which is displayed on search results pages.  Most of the problems with this
> mismatch have been fixed in recent patches, which should ship in a new 1.0
> release in the next week or so.
>
>
>> I would like to know how Nutch generates summaries and why it leaves
>> them empty when redirecting. Perhaps there is a command to generate one
>> field in particular after the indexing is done.
>>
>
> Summaries are generated, at query time, from the full text of the web
> page stored in ParseText under segments.  The
> org.apache.nutch.searcher.Summarizer plugins are what actually return the
> summary text.  By default Nutch uses the summary-basic plugin.
>
> Dennis
>
>> Thanks!
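
To illustrate the query-time behaviour described in the quoted reply, here is a
minimal sketch in Python. It is not Nutch code and does not use the Nutch API:
the names parse_text, summarize and summary_for are hypothetical, the URLs are
placeholders, and the dictionary only stands in for the page text that Nutch
keeps in ParseText. It assumes a summary is built from words surrounding the
query terms. If the text lookup for a URL comes back empty (one plausible
effect of the URL mismatch discussed above, though the thread does not confirm
the exact mechanism), the summary is empty as well.

```python
import re

# Hypothetical stand-in for stored page text, keyed by the URL it was saved under.
parse_text = {
    "http://example.com/final": (
        "Nutch stores the parsed page text in each segment for later use"
    ),
}


def summarize(text, query, context=2, max_fragments=2):
    """Build a summary from fragments of `text` around the query terms."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    tokens = text.split()
    fragments = []
    last_end = 0  # index just past the previous fragment, to avoid overlap
    for i, token in enumerate(tokens):
        word = re.sub(r"\W+", "", token).lower()
        if word in terms and i >= last_end:
            start = max(last_end, i - context)
            end = min(len(tokens), i + context + 1)
            fragments.append(" ".join(tokens[start:end]))
            last_end = end
            if len(fragments) == max_fragments:
                break
    return " ... ".join(fragments)


def summary_for(url, query):
    """Look up the stored text for `url` at query time and summarize it."""
    return summarize(parse_text.get(url, ""), query)
```

In this sketch, summary_for("http://example.com/final", "parsed text") yields a
fragment of the stored text, while summary_for("http://example.com/original",
"parsed text") yields an empty string, because nothing is stored under that URL.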
