[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated NUTCH-677:

    Fix Version/s: (was: 1.1)

- pushing this out per http://bit.ly/c7tBv9

> Segment merge filering based on segment content
> ---
>
> Key: NUTCH-677
> URL: https://issues.apache.org/jira/browse/NUTCH-677
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Marcin Okraszewski
> Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch, SegmentMergeFilter.java, SegmentMergeFilter.java, SegmentMergeFilters.java, SegmentMergeFilters.java
>
> I needed segment filtering based on metadata detected during the parse phase.
> Unfortunately, the current URL-based filtering does not allow for this. So I have
> created a new SegmentMergeFilter extension, which receives the segment entry that
> is being merged and decides whether it should be included. Even though I
> needed only ParseData for my purpose, I made it more general,
> so the filter receives all merged data.
>
> The attached patch is for version 0.9, which I use. Unfortunately I didn't
> have time to check how it fits the trunk version. Sorry :(

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
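The idea in the description (a filter that sees each merged entry and decides whether to keep it, plus an aggregation that applies several such filters) can be sketched roughly as below. This is only an illustration, not the attached patch: MergedEntry, EntryFilter, EntryFilters and the "lang" metadata key are hypothetical names invented here, and the real SegmentMergeFilter works on Nutch types such as ParseData rather than a plain map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these types are Nutch classes. */
public final class MergeFilterSketch {

  /** Hypothetical stand-in for one merged segment entry: a URL plus parse metadata. */
  static final class MergedEntry {
    final String url;
    final Map<String, String> parseMeta;

    MergedEntry(String url, Map<String, String> parseMeta) {
      this.url = url;
      this.parseMeta = parseMeta;
    }
  }

  /** Hypothetical filter contract: return true to keep the entry in the merged segment. */
  interface EntryFilter {
    boolean accept(MergedEntry entry);
  }

  /** Aggregates several filters; an entry is kept only if every filter accepts it. */
  static final class EntryFilters {
    private final List<EntryFilter> filters;

    EntryFilters(List<EntryFilter> filters) {
      this.filters = new ArrayList<>(filters);
    }

    boolean accept(MergedEntry entry) {
      for (EntryFilter f : filters) {
        if (!f.accept(entry)) {
          return false;
        }
      }
      return true;
    }
  }

  /** Example filter: keep entries whose (assumed) "lang" metadata equals the given value. */
  static EntryFilter languageIs(final String lang) {
    return entry -> lang.equals(entry.parseMeta.get("lang"));
  }

  /** Runs every entry through the aggregated filters and returns the URLs that are kept. */
  static List<String> merge(List<MergedEntry> entries, EntryFilters filters) {
    List<String> kept = new ArrayList<>();
    for (MergedEntry e : entries) {
      if (filters.accept(e)) {
        kept.add(e.url);
      }
    }
    return kept;
  }

  /** Convenience factory for the demo; a null lang means the metadata key is absent. */
  static MergedEntry entry(String url, String lang) {
    Map<String, String> meta = new HashMap<>();
    if (lang != null) {
      meta.put("lang", lang);
    }
    return new MergedEntry(url, meta);
  }

  public static void main(String[] args) {
    List<MergedEntry> entries = Arrays.asList(
        entry("http://a.example/", "en"),
        entry("http://b.example/", "de"),
        entry("http://c.example/", null));
    EntryFilters filters = new EntryFilters(Arrays.asList(languageIs("en")));
    System.out.println(merge(entries, filters));
  }
}
```

Requiring every filter to accept an entry is an assumption made for this sketch; the attached SegmentMergeFilters.java may combine filters differently.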
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:

    Attachment: SegmentMergeFilters.java

Added Apache license header.

> Fix For: 1.1
> Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch, SegmentMergeFilter.java, SegmentMergeFilter.java, SegmentMergeFilters.java, SegmentMergeFilters.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:

    Attachment: SegmentMergeFilter.java

Added Apache License.

> Fix For: 1.1
> Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch, SegmentMergeFilter.java, SegmentMergeFilter.java, SegmentMergeFilters.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:

    Attachment: MergeFilter_for_1.0.patch

The patch ported to Nutch 1.0. The Java files remain unchanged, only patch has changed.

> Fix For: 1.1
> Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch, SegmentMergeFilter.java, SegmentMergeFilters.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-677:

    Fix Version/s: (was: 0.9.0) 1.1

Moving this issue to 1.1.

> Fix For: 1.1
> Attachments: MergeFilter.patch, SegmentMergeFilter.java, SegmentMergeFilters.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:

    Attachment: SegmentMergeFilters.java

Merge filter aggregation which hides the extension point, etc. It is referred to by the patch.

> Fix For: 0.9.0
> Attachments: MergeFilter.patch, SegmentMergeFilter.java, SegmentMergeFilters.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:

    Attachment: SegmentMergeFilter.java

The filter interface (referred to by the patch).

> Fix For: 0.9.0
> Attachments: MergeFilter.patch, SegmentMergeFilter.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:

    Attachment: MergeFilter.patch

The patch for 0.9

> Fix For: 0.9.0
> Attachments: MergeFilter.patch, SegmentMergeFilter.java