[ https://issues.apache.org/jira/browse/NUTCH-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1585.
-----------------------------------------

    Resolution: Fixed

> Ensure duplicate tags do not exist in microformat-reltag tag set.
> -----------------------------------------------------------------
>
>                 Key: NUTCH-1585
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1585
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6, 2.2
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.7, 2.2.1
>
>         Attachments: NUTCH-1585-2.x.patch, NUTCH-1585-trunk.patch
>
>
> A WebPage can have many embedded tags and other such markup.
> Creating huge tag lists containing many duplicates is counterproductive
> to the process of parsing and extracting out such structure.
> We should add a mechanism to only include single tag occurrences for the
> microformats-reltag parser.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
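The "single tag occurrences" idea in the description can be sketched in Java, the language Nutch is written in. This is not the code from the attached patches: the class `RelTagDedup` and method `uniqueTags` are hypothetical, and the sketch assumes the tag strings have already been extracted from `rel="tag"` links. A `LinkedHashSet` silently drops repeats while keeping the order in which each tag was first seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RelTagDedup {

    // Hypothetical helper (not the Nutch API): collapse repeated tag values
    // so each tag appears once, preserving first-seen order.
    public static List<String> uniqueTags(List<String> extractedTags) {
        Set<String> seen = new LinkedHashSet<>();
        for (String tag : extractedTags) {
            if (tag == null) {
                continue; // skip missing values rather than storing null
            }
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed); // Set.add ignores values already present
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("nutch", "crawler", "nutch", " crawler ", "java");
        System.out.println(uniqueTags(raw)); // prints [nutch, crawler, java]
    }
}
```

Whether tags should also be compared case-insensitively is left open here; the sketch treats "Nutch" and "nutch" as distinct.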