[jira] Created: (TIKA-463) HtmlParser doesn't extract links from img, map, object, frame, iframe, area, link

Ken Krugler (JIRA) Mon, 12 Jul 2010 13:23:45 -0700

HtmlParser doesn't extract links from img, map, object, frame, iframe, area, 
link
---------------------------------------------------------------------------------


                 Key: TIKA-463
                 URL: https://issues.apache.org/jira/browse/TIKA-463
             Project: Tika
          Issue Type: Bug
            Reporter: Ken Krugler
            Assignee: Ken Krugler


All of the listed HTML elements can carry URLs in their attributes, so we want 
to extract those links where possible.

For elements that aren't valid in XHTML 1.0, it may be unclear what the right 
way to handle them is.

But if XHTML 1.0 means the union of "transitional and frameset" variants, then 
all of the above are valid, and thus should be emitted by the parser.
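
As a rough illustration of the kind of extraction being asked for, here is a 
minimal sketch of a SAX handler that collects URL-bearing attributes from the 
elements above. It is not Tika's implementation: the class name LinkCollector 
and the element-to-attribute table are hypothetical, and it assumes 
namespace-aware SAX events (so localName is populated) and Java 9+ for Map.of. 
The map element is left out of the table on the assumption that its links are 
carried by its nested area elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: records link targets seen in SAX events.
public class LinkCollector extends DefaultHandler {

    // Assumed mapping from element name to the one attribute treated as its link.
    // Other URL-valued attributes (e.g. usemap, longdesc, codebase) are ignored here.
    private static final Map<String, String> URL_ATTRIBUTE = Map.of(
            "img", "src",
            "frame", "src",
            "iframe", "src",
            "object", "data",
            "area", "href",
            "link", "href");

    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Look up which attribute, if any, holds the URL for this element.
        String attribute = URL_ATTRIBUTE.get(localName.toLowerCase(Locale.ROOT));
        if (attribute == null) {
            return;
        }
        // HTML attributes are unqualified, so the namespace URI is empty.
        String value = atts.getValue("", attribute);
        if (value != null && !value.isEmpty()) {
            links.add(value);
        }
    }

    // Returns the collected link targets in document order.
    public List<String> getLinks() {
        return links;
    }
}
```

Such a handler could be fed the SAX events a parser emits. Whether a given 
element reaches the handler at all depends on the parser passing it through, 
which is the point of this issue.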

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
