Implement a filter mechanism that allows intercepting every stage of a crawling
process
-------------------------------------------------------------------------------------
Key: DROIDS-58
URL: https://issues.apache.org/jira/browse/DROIDS-58
Project: Droids
Issue Type: New Feature
Affects Versions: 0.01
Reporter: Mingfai Ma
Refer to this mailing list message:
http://mail-archives.apache.org/mod_mbox/incubator-droids-dev/200906.mbox/%[email protected]%3e
Assume the crawling process is:
1. poll a link from queue
2. fetch entity
3. parse entity
4. extract outlinks
We should provide a mechanism to intercept the process at every stage. For
example, a LinkFilter declares a "public T polled(T link);" method, so any
filter may reject or transform a Link polled from the queue. Similar logic
applies to fetching, parsing, and extracting (outlinks).
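
To make the LinkFilter idea above concrete, here is a minimal sketch in Java.
Only the "polled" signature comes from this issue. The LinkFilterChain and
FilterDemo classes, the use of null to mean "rejected", and the two example
filters are assumptions made for illustration and are not existing Droids API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface built around the "public T polled(T link);" signature
// quoted in the issue. Assumption: returning null means the link is rejected.
interface LinkFilter<T> {
    T polled(T link);
}

// Hypothetical helper (not part of Droids): applies filters in registration
// order and stops as soon as one of them rejects the link.
final class LinkFilterChain<T> implements LinkFilter<T> {
    private final List<LinkFilter<T>> filters = new ArrayList<>();

    LinkFilterChain<T> add(LinkFilter<T> filter) {
        filters.add(filter);
        return this;
    }

    @Override
    public T polled(T link) {
        T current = link;
        for (LinkFilter<T> filter : filters) {
            if (current == null) {
                return null; // an earlier filter rejected the link
            }
            current = filter.polled(current); // a filter may transform the link
        }
        return current;
    }
}

final class FilterDemo {
    // Links are plain strings here to keep the sketch self-contained.
    static LinkFilterChain<String> exampleChain() {
        return new LinkFilterChain<String>()
            // Rejecting filter: drop anything that is not an HTTP(S) URL.
            .add(link -> link.startsWith("http") ? link : null)
            // Transforming filter: strip the fragment part of the URL.
            .add(link -> {
                int hash = link.indexOf('#');
                return hash < 0 ? link : link.substring(0, hash);
            });
    }

    public static void main(String[] args) {
        LinkFilterChain<String> chain = exampleChain();
        System.out.println(chain.polled("http://example.org/page#top")); // http://example.org/page
        System.out.println(chain.polled("mailto:someone@example.org"));  // null
    }
}
```

The same shape could presumably be repeated for the other stages (for example
a hypothetical "fetched", "parsed", or "extracted" callback), with the crawler
invoking the matching chain after each step and skipping the task when a
chain returns null.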