[ https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463833#comment-16463833 ]
Oliver Lietz commented on SLING-6783:
-------------------------------------

[~jebailey], [~klcodanr], I guess we have to change the API of Commons HTML (used in Rewriter) and get rid of the SAX API in order to use a different parser for HTML5. I tried to plug in [AttoParser|https://www.attoparser.org] and [jsoup|https://jsoup.org], but neither fits properly. WDYT?

> Updates for Commons HTML
> ------------------------
>
>                 Key: SLING-6783
>                 URL: https://issues.apache.org/jira/browse/SLING-6783
>             Project: Sling
>          Issue Type: Improvement
>          Components: Commons
>            Reporter: Jason E Bailey
>            Assignee: Oliver Lietz
>            Priority: Minor
>             Fix For: Commons HTML 1.0.2
>
>         Attachments: sling.patch
>
>
> Following updates:
> Updated tagsoup lib to 1.2.1 which has the following modifications
> * DOCTYPE is now recognized even in lower case.
> * We make sure to buffer the reader, eliminating a long-standing bug that
> would crash on certain inputs, such as & followed by CR+LF.
> * The HTML scanner's table is precompiled at run time for efficiency, causing
> a 4x speedup on large input documents.
> * ]] within a CDATA section no longer causes input to be discarded.
> * Remove bogus newline after printing children of the root element.
> * Allow the noscript element anywhere, the same as the script element.
> * Updated to the 2011 edition of the W3C character entity list.
> Additionally:
> Updated license with new home page for tagsoup
> Updated annotations to OSGi annotations
> Added the ability to specify additional features/properties for the parser
> Documented available settings
> Javadoc fixed
> Prepared for different parsers by renaming HtmlParserImpl and adding
> component properties
> Configuration improved

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)