[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-677:
--------------------------------

    Fix Version/s: 1.1
                   (was: 0.9.0)

Moving this issue to 1.1.

> Segment merge filtering based on segment content
> ------------------------------------------------
>
>                 Key: NUTCH-677
>                 URL: https://issues.apache.org/jira/browse/NUTCH-677
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Marcin Okraszewski
>             Fix For: 1.1
>
>         Attachments: MergeFilter.patch, SegmentMergeFilter.java, SegmentMergeFilters.java
>
>
> I needed segment filtering based on metadata detected during the parse phase.
> Unfortunately, the current URL-based filtering does not allow for this, so I
> have created a new SegmentMergeFilter extension, which receives the segment
> entry being merged and decides whether it should be included. Even though I
> needed only ParseData for my purpose, I made it a bit more general, so the
> filter receives all of the merged data.
> The attached patch is for version 0.9, which I use. Unfortunately, I didn't
> have time to check how it fits the trunk version. Sorry :(
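The filtering hook described above can be illustrated with a small, self-contained Java sketch. It is not the attached patch and does not use Nutch's real classes: `SegmentEntry`, `MetadataValueFilter`, and the single-argument `filter` method are hypothetical simplifications, and the entry carries only parse metadata, whereas the proposed extension receives all merged data. `SegmentMergeFilters` here simply chains the configured filters and keeps an entry only if every filter accepts it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified view of one merged segment entry (not Nutch's real types). */
final class SegmentEntry {
    final String url;
    final Map<String, String> parseMeta; // metadata detected during the parse phase

    SegmentEntry(String url, Map<String, String> parseMeta) {
        this.url = url;
        this.parseMeta = Collections.unmodifiableMap(new HashMap<>(parseMeta));
    }
}

/** Hypothetical extension point: decides whether an entry is kept in the merged segment. */
interface SegmentMergeFilter {
    boolean filter(SegmentEntry entry);
}

/** Chains all configured filters; an entry is kept only if every filter accepts it. */
final class SegmentMergeFilters {
    private final List<SegmentMergeFilter> filters = new ArrayList<>();

    SegmentMergeFilters add(SegmentMergeFilter f) {
        filters.add(f);
        return this;
    }

    boolean filter(SegmentEntry entry) {
        for (SegmentMergeFilter f : filters) {
            if (!f.filter(entry)) {
                return false; // the first rejection drops the entry
            }
        }
        return true; // no filters configured, or all of them accepted the entry
    }
}

/** Hypothetical example filter: keep entries whose parse metadata has the expected value. */
final class MetadataValueFilter implements SegmentMergeFilter {
    private final String key;
    private final String expected;

    MetadataValueFilter(String key, String expected) {
        this.key = key;
        this.expected = expected;
    }

    @Override
    public boolean filter(SegmentEntry entry) {
        // A missing key yields null, which never equals the expected value.
        return expected.equals(entry.parseMeta.get(key));
    }
}
```

With a chain such as `new SegmentMergeFilters().add(new MetadataValueFilter("lang", "en"))`, an entry whose parse metadata contains `lang=en` is kept, while entries with another value or no `lang` key are dropped. URL-based filtering cannot express this, because the decision depends on data produced during parsing rather than on the URL alone.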