[jira] [Updated] (TIKA-985) Support for HTML5 elements

2021-07-21 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-985:
-
Fix Version/s: (was: 2.0.0)
   2.0.0-BETA

> Support for HTML5 elements
> --
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.2
>Reporter: Markus Jelsma
>Priority: Major
> Fix For: 1.17, 2.0.0-BETA, 2.0.1
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article, 
> section). This prevents some custom ContentHandlers from reading expected 
> elements and/or attributes.
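
To illustrate the kind of custom ContentHandler the report refers to, here is a minimal sketch in Java. It assumes the parser actually emits SAX events for `article` and `section`. To stay self-contained it feeds a well-formed XHTML string to the JDK's own SAX parser rather than going through Tika and TagSoup, and the class name `Html5ElementCounter` is hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how often selected HTML5 elements are seen.
final class Html5ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName is used because the JDK parser is not namespace-aware by default.
        if (qName.equals("article") || qName.equals("section")) {
            counts.merge(qName, 1, Integer::sum);
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }

    // Parses well-formed markup with the JDK SAX parser (an assumption made
    // for this sketch; it does not involve Tika or TagSoup).
    static Map<String, Integer> count(String xhtml) throws Exception {
        Html5ElementCounter handler = new Html5ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.counts();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><article><section>a</section>"
                + "<section>b</section></article></body></html>";
        System.out.println(count(page)); // prints {article=1, section=2}
    }
}
```

A handler like this only sees the elements if the upstream parser passes them through, which is the gap the issue describes.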



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2021-07-21 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-985:
-
Fix Version/s: (was: 2.0.0)
   2.0.1



[jira] [Updated] (TIKA-985) Support for HTML5 elements

2017-05-21 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---
Fix Version/s: (was: 1.15)
   1.16

> Support for HTML5 elements
> --
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.2
>Reporter: Markus Jelsma
> Fix For: 1.16
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2016-10-19 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---
Fix Version/s: (was: 1.14)
   1.15

> Support for HTML5 elements
> --
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.2
>Reporter: Markus Jelsma
> Fix For: 1.15
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2016-04-22 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---
Fix Version/s: (was: 1.13)
   1.14

> Support for HTML5 elements
> --
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.2
>Reporter: Markus Jelsma
> Fix For: 1.14
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2016-01-24 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---
Fix Version/s: (was: 1.12)
   1.13

> Support for HTML5 elements
> --
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.2
>Reporter: Markus Jelsma
> Fix For: 1.13
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2015-10-18 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---
Fix Version/s: (was: 1.11)
   1.12

> Support for HTML5 elements
> --
>
> Key: TIKA-985
> URL: https://issues.apache.org/jira/browse/TIKA-985
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.2
>Reporter: Markus Jelsma
> Fix For: 1.12
>
> Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
> TIKA-985-1.3-3.patch, TIKA-985-1.5.patch


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2015-08-08 Thread Dave Meikle (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Meikle updated TIKA-985:
-
Fix Version/s: (was: 1.10)
   1.11

* Pushed to 1.11 following 1.10 release

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.11

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch, TIKA-985-1.5.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2014-10-24 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---
Fix Version/s: (was: 1.7)
   1.8

- push to 1.8

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.8

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch, TIKA-985-1.5.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2014-06-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---

Fix Version/s: (was: 1.6)
   1.7

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.7

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch, TIKA-985-1.5.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2014-02-04 Thread Dave Meikle (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Meikle updated TIKA-985:
-

Fix Version/s: (was: 1.5)
   1.6

Pushed out to 1.6, preparing for 1.5 RC

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.6

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch, TIKA-985-1.5.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2013-07-25 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated TIKA-985:
---

Attachment: TIKA-985-1.5.patch

Dirty patch for Tika 1.5. This patch allows headings (h1...h6) to be 
embedded inside other elements such as anchors. HTML5 permits this, and some 
pages already use it. Without this patch, the SAX events for such headings 
are reported out of order.
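
As a point of reference for what "in order" means here, the following Java sketch records the SAX event sequence for a heading nested inside an anchor. It is only an illustration under stated assumptions: it uses the JDK's SAX parser on well-formed markup instead of Tika's TagSoup-based HtmlParser, and the class name `EventOrderRecorder` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records element start/end events in arrival order.
final class EventOrderRecorder extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses well-formed markup with the JDK SAX parser (assumption for this
    // sketch) and returns the recorded event sequence.
    static List<String> record(String xhtml) throws Exception {
        EventOrderRecorder handler = new EventOrderRecorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // The heading events are expected to sit between the anchor's events.
        System.out.println(record("<a href=\"#top\"><h1>Title</h1></a>"));
        // prints [start:a, start:h1, end:h1, end:a]
    }
}
```

A recorder of this kind could be used to compare the event order before and after applying a patch.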

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.5

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch, TIKA-985-1.5.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TIKA-985) Support for HTML5 elements

2013-05-27 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---

Fix Version/s: (was: 1.4)
   1.5

- push to 1.5, get ready for 1.4 RC #1.

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.5

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2013-01-17 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-985:
---

Fix Version/s: (was: 1.3)
   1.4

- push out to 1.4

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.4

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2012-10-03 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated TIKA-985:
---

Attachment: TIKA-985-1.3-3.patch

Here's a new patch. It allows metadata to be read from within the body while 
still keeping the metadata from the head.
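
A minimal Java sketch of that idea follows: a handler that collects `meta` elements wherever they occur, in the head or in the body. This is not the patch itself. It assumes metadata is expressed as `name`/`content` attribute pairs, parses well-formed markup with the JDK's SAX parser instead of Tika's HtmlParser, and uses the hypothetical class name `MetaCollector`.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects <meta name="..." content="..."/> pairs
// regardless of whether they appear in the head or the body.
final class MetaCollector extends DefaultHandler {
    private final Map<String, String> meta = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!qName.equals("meta")) {
            return;
        }
        String name = atts.getValue("name");
        String content = atts.getValue("content");
        if (name != null && content != null) {
            meta.put(name, content); // a later value for the same name wins
        }
    }

    // Parses well-formed markup with the JDK SAX parser (assumption for this
    // sketch) and returns the collected metadata in document order.
    static Map<String, String> collect(String xhtml) throws Exception {
        MetaCollector handler = new MetaCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.meta;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><meta name=\"author\" content=\"A\"/></head>"
                + "<body><meta name=\"rating\" content=\"5\"/></body></html>";
        System.out.println(collect(page)); // prints {author=A, rating=5}
    }
}
```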

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.3

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch, 
 TIKA-985-1.3-3.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2012-08-30 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated TIKA-985:
---

Attachment: TIKA-985-1.3-1.patch

Here's a preliminary patch for 1.3. It adds some HTML5 elements to TagSoup's 
schema in our HtmlParser constructor. This allows for those elements to be 
parsed.

Support for all HTML5 elements should be added in TagSoup's schema.

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.3

 Attachments: TIKA-985-1.3-1.patch




[jira] [Updated] (TIKA-985) Support for HTML5 elements

2012-08-30 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated TIKA-985:
---

Attachment: TIKA-985-1.3-2.patch

Here's a new patch listing all HTML5 elements that are missing in the html.tssl 
file.

 Support for HTML5 elements
 --

 Key: TIKA-985
 URL: https://issues.apache.org/jira/browse/TIKA-985
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
 Fix For: 1.3

 Attachments: TIKA-985-1.3-1.patch, TIKA-985-1.3-2.patch

