[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-985:
-----------------------------
    Fix Version/s: (was: 2.0.0)
                   2.0.0-BETA

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Priority: Major
> Fix For: 1.17, 2.0.0-BETA, 2.0.1
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-985:
-----------------------------
    Fix Version/s: (was: 2.0.0)
                   2.0.1

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Priority: Major
> Fix For: 1.17, 2.0.0-BETA, 2.0.1
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.15)
                   1.16

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Fix For: 1.16
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.14)
                   1.15

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Fix For: 1.15
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.13)
                   1.14

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Fix For: 1.14
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.12)
                   1.13

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Fix For: 1.13
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.11)
                   1.12

> Support for HTML5 elements
> --------------------------
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Markus Jelsma
> Fix For: 1.12
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-985:
-----------------------------
    Fix Version/s: (was: 1.10)
                   1.11

* Pushed to 1.11 following 1.10 release

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.11

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch, TIKA-985-1.5.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.7)
                   1.8

- push to 1.8

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.8

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch, TIKA-985-1.5.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.6)
                   1.7

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.7

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch, TIKA-985-1.5.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-985:
-----------------------------
    Fix Version/s: (was: 1.5)
                   1.6

Pushed out to 1.6, preparing for 1.5 RC

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.6

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch, TIKA-985-1.5.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated TIKA-985:
-------------------------------
    Attachment: TIKA-985-1.5.patch

Dirty patch for Tika 1.5. This patch allows headings (h1...h6) to be
embedded inside other elements such as anchors. HTML5 allows this, and some
pages already use it. Without this patch, the SAX events for such headings
are reported out of order.

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.5

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch, TIKA-985-1.5.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
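For readers unfamiliar with what "in order" means here, the following is a minimal sketch of the event sequence a SAX consumer would expect for a heading nested inside an anchor. It uses Python's standard `xml.sax` module on well-formed markup as a stand-in for the Java SAX API; it does not use Tika or TagSoup and does not reproduce the bug or the patch. The names `EventRecorder` and `record_events` are hypothetical, chosen for this illustration only.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Records start/end element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def record_events(markup):
    """Parse well-formed markup and return the recorded element events."""
    handler = EventRecorder()
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return handler.events


# A heading wrapped in an anchor, as HTML5 permits.
events = record_events('<body><a href="/post"><h1>Title</h1></a></body>')
for kind, name in events:
    print(kind, name)
```

In this sketch the `h1` events fall strictly between the start and end of `a`. A handler that tracks nesting depends on that ordering, which is why out-of-order heading events are a problem for custom ContentHandlers.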
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.4)
                   1.5

- push to 1.5, get ready for 1.4 RC #1.

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.5

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------
    Fix Version/s: (was: 1.3)
                   1.4

- push out to 1.4

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.4

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-985:
-----------------------------------

- push out to 1.4

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.4

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated TIKA-985:
-------------------------------
    Attachment: TIKA-985-1.3-3.patch

Here's a new patch. It allows metadata to be read from within the body while
still maintaining metadata in the head.

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.3

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch,
TIKA-985-1.3-3.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated TIKA-985:
-------------------------------
    Attachment: TIKA-985-1.3-1.patch

Here's a preliminary patch for 1.3. It adds some HTML5 elements to TagSoup's
schema in our HtmlParser constructor. This allows for those elements to be
parsed. Support for all HTML5 elements should be added in TagSoup's schema.

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.3

Attachments: TIKA-985-1.3-1.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.
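To make the motivation concrete, here is a minimal sketch of the kind of custom ContentHandler the issue description refers to: one that only collects text found inside `article` or `section` elements. It is written against Python's standard `xml.sax` module as a stand-in for the Java SAX API and does not use Tika or TagSoup; the names `SectionTextCollector` and `collect_text` are hypothetical. If the parser never reports these elements, a handler of this shape collects nothing.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class SectionTextCollector(ContentHandler):
    """Collects character data that appears inside article/section elements."""

    TARGETS = {"article", "section"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many target elements are currently open
        self.chunks = []    # text fragments seen while depth > 0

    def startElement(self, name, attrs):
        if name in self.TARGETS:
            self.depth += 1

    def endElement(self, name):
        if name in self.TARGETS:
            self.depth -= 1

    def characters(self, content):
        if self.depth > 0:
            self.chunks.append(content)


def collect_text(markup):
    """Return the text found inside article/section elements of well-formed markup."""
    handler = SectionTextCollector()
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return "".join(handler.chunks).strip()


page = "<body><nav>Menu</nav><article><section>Body text</section></article></body>"
print(collect_text(page))
```

The handler relies entirely on seeing start and end events for `article` and `section`, which is the dependency the issue describes.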
[jira] [Updated] (TIKA-985) Support for HTML5 elements
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated TIKA-985:
-------------------------------
    Attachment: TIKA-985-1.3-2.patch

Here's a new patch listing all HTML5 elements that are missing in the
html.tssl file.

Support for HTML5 elements
--------------------------

Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Fix For: 1.3

Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch

TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
section). This prevents some custom ContentHandlers from reading expected
elements and/or attributes.