[ 
https://issues.apache.org/jira/browse/ANY23-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337540#comment-16337540
 ] 

ASF GitHub Bot commented on ANY23-326:
--------------------------------------

GitHub user HansBrende opened a pull request:

    https://github.com/apache/any23/pull/59

    ANY23-326 fixed rdfa issue with unclosed input & meta tags

    This PR should also fix ANY23-317, ANY23-273, ANY23-267, ANY23-271, and
    ANY23-227 (this time, for real).
    
    These all have to do with the RDFa implementation failing to parse HTML.
    
    My previous commit attempted to fix these issues by changing the default
    parser from NekoHTML to Jsoup. But alas, it turns out the RDFa
    implementation uses a completely different HTML parser under the hood, and
    it's the RDFa parser that's too strict, not ours, so switching ours from
    NekoHTML to Jsoup had no effect (although it did come with a nice 20% speed
    increase, so there's that). It seems that, for Rio parsers, the document is
    parsed with Jsoup *only to get the document language* and then parsed
    **again** under the hood with some other, unidentified parser.
    
    Now, I simply check the RDF format to see if we're putting out XHTML. If we
    are, I first XHTML-ify the stream with Jsoup before sending it on to the Rio
    RDF parser.
    
    mvn clean install -> all tests passed.
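
As a rough illustration of the strictness described above, the sketch below feeds the same markup to a strict XML parser twice: once with an unclosed `input` tag, once with the tag self-closed as XHTML requires. It uses the JDK's built-in XML parser purely as a stand-in; the parser the RDFa implementation really uses is not identified in this PR. The class name `StrictXmlDemo` and the method `parseError` are hypothetical and are not part of Any23.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Any23.
public class StrictXmlDemo {

    /**
     * Parses the markup with the JDK's strict XML parser (a stand-in for
     * whatever parser the RDFa implementation uses internally).
     * Returns null if the markup is well-formed, otherwise the error message.
     */
    static String parseError(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(markup)));
            return null; // parsed cleanly
        } catch (SAXException e) {
            return e.getMessage(); // not well-formed XML
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Valid HTML, but not well-formed XML: <input> is never closed.
        String unclosed = "<html><body><input type=\"text\"></body></html>";
        // The same markup in XHTML form, with the void element self-closed.
        String closed = "<html><body><input type=\"text\"/></body></html>";

        System.out.println("unclosed: " + parseError(unclosed));
        System.out.println("closed:   " + parseError(closed));
    }
}
```

The second input is the kind of output the Jsoup XHTML-ification step is meant to produce, which is why normalizing the stream first lets a strict parser accept it.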

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HansBrende/any23 ANY23-326

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/59.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #59
    
----
commit 74b2909b6d91cc4989093d90a38baef1c34c603f
Author: Hans <firedrake93@...>
Date:   2018-01-24T12:26:40Z

    ANY23-326 fixed rdfa issue with unclosed input & meta tags

----


> parsing unclosed meta and input tags fails
> ------------------------------------------
>
>                 Key: ANY23-326
>                 URL: https://issues.apache.org/jira/browse/ANY23-326
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 2.1
>         Environment: ubuntu 17.04
>            Reporter: Ben Roberts
>            Priority: Major
>             Fix For: 2.2
>
>
> Parsing fails as soon as it hits an unclosed input or meta tag. As an
> example, try:
>  ./bin/any23 rover https://ben.thatmustbe.me/note/2017/12/28/1
> [Fatal Error] :170:3: The element type "input" must be terminated by the 
> matching end-tag "</input>".
>  
> It seems the issue might be that this is using a very old version of
> jsoup, at least as best I could tell.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
