[ https://issues.apache.org/jira/browse/NUTCH-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208251#comment-16208251 ]
Sebastian Nagel commented on NUTCH-2443:
----------------------------------------

+1 Good catch. There are actually a few more links missed, esp. in HTML5, cf. [this list of URL-value attributes|https://stackoverflow.com/questions/2725156/complete-list-of-html-tag-attributes-which-have-a-url-value]. Nevertheless +1!

> Extract links from the video tag with the parse-html plugin
> -----------------------------------------------------------
>
>                 Key: NUTCH-2443
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2443
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser, plugin
>    Affects Versions: 1.13
>            Reporter: Jorge Luis Betancourt Gonzalez
>            Assignee: Jorge Luis Betancourt Gonzalez
>            Priority: Minor
>             Fix For: 1.14
>
>
> At the moment the {{parse-html}} extracts links from the tags {{a, area,
> form}} (configurable){{, frame, iframe, script, link, img}}. Since we allow
> extracting links to binary files (images) extracting links also from the
> {{video}} tag should be supported.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)