[ https://issues.apache.org/jira/browse/NUTCH-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2261:
-----------------------------------
    Fix Version/s: 1.14

> ParseSegment job does not pass metadata for content-level redirects
> -------------------------------------------------------------------
>
>                 Key: NUTCH-2261
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2261
>             Project: Nutch
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.11, 1.12, 1.13
>            Reporter: David Astle
>            Priority: Minor
>             Fix For: 1.14
>
>
> When Fetcher runs in parsing mode, CrawlDatum metadata is properly passed to
> a new CrawlDatum for content-level redirects (HTML meta tag "Refresh"). If
> Fetcher runs in non-parsing mode, and ParseSegment is run as a separate step,
> then metadata other than "_repr_" is not passed to the new CrawlDatum.
> This means that any filter relying on metadata, such as DepthScoringFilter
> and URLMetaScoringFilter, will not work.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)