[jira] [Resolved] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

Sebastian Nagel (JIRA) Thu, 06 Apr 2017 02:24:08 -0700

     [ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sebastian Nagel resolved NUTCH-2319.
------------------------------------
    Resolution: Not A Problem

Hi [~zbhatuk], please reopen if the problem persists. It's not a bug but just 
an undesired side effect of web servers sending different content depending on 
the HTTP request. Spiders may see different content than a browser. That's even 
ok in this case because an RSS feed is more efficient to process than an HTML 
page.
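
One way to check whether a server really varies its response is to fetch the 
same URL with different request headers and compare the Content-Type of each 
response. Below is a minimal sketch in Python using only the standard library. 
The helper names (classify, fetch_content_type, compare), the two header 
profiles, and the choice to count generic XML types as a feed are assumptions 
made for illustration; they are not Nutch's actual configuration or code.

```python
import urllib.request

# Media types treated as "feed" here. Counting generic XML types as a feed
# is an assumption of this sketch.
FEED_TYPES = (
    "application/rss+xml",
    "application/atom+xml",
    "application/xml",
    "text/xml",
)
HTML_TYPES = ("text/html", "application/xhtml+xml")


def classify(content_type):
    """Return 'feed', 'html' or 'other' for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in FEED_TYPES:
        return "feed"
    if media_type in HTML_TYPES:
        return "html"
    return "other"


def fetch_content_type(url, headers):
    """Fetch url with the given request headers and return its Content-Type."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


def compare(url):
    """Print how the response is classified for each header profile."""
    # Hypothetical header sets: one browser-like, one crawler-like.
    profiles = {
        "browser": {"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
        "crawler": {"User-Agent": "MyCrawler/0.1", "Accept": "*/*"},
    }
    for name, headers in profiles.items():
        print(name, classify(fetch_content_type(url, headers)))
```

Calling compare("http://rssfeeds.azcentral.com/phoenix/asu") would print one 
line per profile. If the two classifications differ, the server is choosing 
the content based on the request headers, and the crawler never receives the 
HTML page that contains the link element.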

> Link with "rel=alternate" doesn't return in crawl 
> --------------------------------------------------
>
>                 Key: NUTCH-2319
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2319
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Zuber
>
> I am using Nutch 1.4. Nutch does not return the URLs from links with 
> rel="alternate".
> For example, I am trying to crawl the URL 
> http://rssfeeds.azcentral.com/phoenix/asu, which contains the link below, 
> but the link does not appear in the results:
> <link rel="alternate" type="application/atom+xml" 
> href="http://rssfeeds.azcentral.com/phoenix/asu&amp;x=1" title="Phoenix - 
> ASU">
> Could you please help?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
