[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476635 ]
Andrzej Bialecki commented on NUTCH-443:
-----------------------------------------
Re: the "fake" CrawlDatum-s: this looks ugly no matter which way we look at it
... :| It appears you were right from the start, FETCH_TIME_KEY seems to be the
lesser evil at the moment.
Re: ParseResult.filter(): indeed - in fact, there is an inconsistency between
what Fetcher does and what ParseSegment does. Fetcher actually stores the
information about failed parsing - I was under the impression that ParseSegment
does this too. IMHO it's a good opportunity to fix this so that it works the
same way in both places. Currently this information is used only in
SegmentReader to report the total numbers of generated, fetched and parsed
URLs. However, other tools may use it to determine the failure rate of a
specific parser ... so I would hate to discard it.
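To illustrate why the failed-parse records are worth keeping, here is a minimal sketch of how a tool might compute a per-parser failure rate from a map of parse outcomes keyed by URL. The class ParseFailureStats, the Outcome type and its fields, the parser names and the example URLs are all hypothetical stand-ins for illustration; they are not the Nutch Parse, ParseStatus or ParseResult classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Nutch.
final class ParseFailureStats {

    /** Outcome of parsing one URL (assumed fields: parser name and a success flag). */
    static final class Outcome {
        final String parserName;
        final boolean success;

        Outcome(String parserName, boolean success) {
            this.parserName = parserName;
            this.success = success;
        }
    }

    /**
     * Returns the fraction of failed outcomes recorded for the given parser,
     * or 0.0 if the map holds no entries for that parser.
     */
    static double failureRate(Map<String, Outcome> outcomesByUrl, String parserName) {
        int total = 0;
        int failed = 0;
        for (Outcome o : outcomesByUrl.values()) {
            if (!o.parserName.equals(parserName)) {
                continue; // entry belongs to a different parser
            }
            total++;
            if (!o.success) {
                failed++;
            }
        }
        return total == 0 ? 0.0 : (double) failed / total;
    }

    public static void main(String[] args) {
        // Failed entries are kept in the map instead of being filtered out,
        // so they remain available for statistics.
        Map<String, Outcome> outcomes = new LinkedHashMap<>();
        outcomes.put("http://example.com/feed#item1", new Outcome("parse-rss", true));
        outcomes.put("http://example.com/feed#item2", new Outcome("parse-rss", false));
        outcomes.put("http://example.com/page.html", new Outcome("parse-html", true));

        System.out.println("parse-rss failure rate: " + failureRate(outcomes, "parse-rss"));
    }
}
```

If the failed entries were dropped before being written out, the denominator above would only ever count successes, and the rate could not be computed at all.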
Re: ParseImpl.isFetched compat issue - I was wrong here. That's a relief - I
hate such complications ...
Thanks!
> allow parsers to return multiple Parse object, this will speed up the rss
> parser
> --------------------------------------------------------------------------------
>
> Key: NUTCH-443
> URL: https://issues.apache.org/jira/browse/NUTCH-443
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Affects Versions: 0.9.0
> Reporter: Renaud Richardet
> Assigned To: Chris A. Mattmann
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch,
> NUTCH-443-draft-v3.patch, NUTCH-443-draft-v4.patch, NUTCH-443-draft-v5.patch,
> NUTCH-443-draft-v6.patch, NUTCH-443-draft-v7.patch,
> NUTCH-443.022507.patch.txt, NUTCH-443.02282007.patch,
> parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff
>
>
> allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser
> can return multiple parse objects, that will all be indexed separately.
> Advantage: no need to fetch all feed-items separately.
> see the discussion at
> http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html