Segment merge filtering based on segment content
------------------------------------------------

                 Key: NUTCH-677
                 URL: https://issues.apache.org/jira/browse/NUTCH-677
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 0.9.0
            Reporter: Marcin Okraszewski
             Fix For: 0.9.0


I needed segment filtering based on metadata detected during the parse phase. 
Unfortunately, the current URL-based filtering does not allow this. So I have 
created a new SegmentMergeFilter extension, which receives the segment entry 
being merged and decides whether it should be included. Even though I needed 
only ParseData for my purpose, I made it more general, so the filter receives 
all merged data.
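
For illustration only, a filter of this kind might look roughly like the 
sketch below. It is not the code from the attached patch: the types 
ParseMetaSketch, SegmentMergeFilterSketch and LanguageMergeFilterSketch, the 
"lang" metadata key and the two-argument filter method are all hypothetical 
stand-ins, simplified so that the example compiles without Nutch on the 
classpath. The real extension receives all merged data for the entry, not 
just the parse metadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parse-phase metadata of one segment entry.
final class ParseMetaSketch {
    private final Map<String, String> meta = new HashMap<>();

    // Adds one metadata value and returns this object to allow chaining.
    ParseMetaSketch with(String name, String value) {
        meta.put(name, value);
        return this;
    }

    // Returns the stored value, or null if the name is unknown.
    String get(String name) {
        return meta.get(name);
    }
}

// Hypothetical, simplified filter contract: return true to keep the entry
// in the merged segment, false to drop it.
interface SegmentMergeFilterSketch {
    boolean filter(String url, ParseMetaSketch parseMeta);
}

// Example filter: keep only entries whose parse metadata says "lang" = "en".
final class LanguageMergeFilterSketch implements SegmentMergeFilterSketch {
    @Override
    public boolean filter(String url, ParseMetaSketch parseMeta) {
        if (parseMeta == null) {
            // Assumption: entries without parse data are kept, since there
            // is nothing to base a decision on.
            return true;
        }
        return "en".equals(parseMeta.get("lang"));
    }
}
```

The point of the sketch is that the decision depends on data produced during 
parsing rather than on the URL, which a URL-based filter cannot see.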

The attached patch is for version 0.9, which I use. Unfortunately, I didn't 
have time to check how it fits the trunk version. Sorry :(

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.