[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212495#comment-15212495
 ] 

ASF GitHub Bot commented on ANY23-247:
--------------------------------------

Github user ansell commented on the pull request:

    https://github.com/apache/any23/pull/17#issuecomment-201545776
  
    The system does seem a little too complex for our purposes, and it isn't
    usable because of that.
    
    Removing generics would be the first step IMO, as there are too many
    rawtypes definitions, which indicates that generics are being used badly.
    
    After that, ContentExtractor may be able to be removed completely instead
    of being refitted into the process, and the parser should always be set to
    parse as far as practical for our purposes.
    
    It is a little strange that there isn't a buffered, markable InputStream
    provided for all of the steps to reuse as necessary, rather than pushing a
    raw InputStream or other source into different extractors.
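
To illustrate the buffered, markable InputStream idea from the comment above, here is a minimal sketch using only the JDK. The class and method names (MarkableStreamSketch, readPrefix, readTwice) are hypothetical and are not part of Any23; the two reads simply stand in for two extraction steps that reuse one stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Any23 code: two "steps" reuse one markable stream.
final class MarkableStreamSketch {

    // Stand-in for an extraction step: reads up to `length` bytes as UTF-8 text.
    static String readPrefix(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int total = 0;
        while (total < length) {
            int n = in.read(buffer, total, length - total);
            if (n < 0) {
                break; // end of stream reached before `length` bytes
            }
            total += n;
        }
        return new String(buffer, 0, total, StandardCharsets.UTF_8);
    }

    // Lets two consumers see the same leading bytes by marking and resetting.
    static String[] readTwice(byte[] document, int prefixLength) {
        try (InputStream shared =
                new BufferedInputStream(new ByteArrayInputStream(document))) {
            shared.mark(prefixLength);   // remember the current position
            String first = readPrefix(shared, prefixLength);
            shared.reset();              // rewind to the mark for the next step
            String second = readPrefix(shared, prefixLength);
            return new String[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<html itemscope><head></head></html>"
                .getBytes(StandardCharsets.UTF_8);
        String[] views = readTwice(doc, 16);
        System.out.println(views[0].equals(views[1]));
    }
}
```

The read limit passed to mark() bounds how far a step may read before reset() is no longer guaranteed to work, so a real pipeline would need to pick that limit (or buffer the whole document) deliberately.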


> FIX Attribute name "itemscope" associated with an element type "html" must be 
> followed by the ' = ' character.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ANY23-247
>                 URL: https://issues.apache.org/jira/browse/ANY23-247
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.2
>
>
> In the following markup
> {code}
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
> "http://www.w3.org/TR/html4/loose.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" 
> xmlns:og="http://opengraphprotocol.org/schema/" 
> xmlns:fb="http://www.facebook.com/2008/fbml" version="HTML+RDFa 1.0" 
> xml:lang="en" itemscope itemtype="http://schema.org/Product">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
> <meta http-equiv="X-UA-Compatible" content="IE=edge" />
> <meta name="generator" content="ToolTwist" />
> ...
> {code}
> Due to the absence of any subsequent value for *itemscope*, we get the 
> following error in our web server logs
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element 
> type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply check 
> whether the itemscope value is null and, if so, add *=""*. There is a 
> precedent for us doing something like this before; I just can't find the 
> ticket right now!
> The code we need to add belongs in either 
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java or 
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
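
The fix proposed in the description (give a valueless itemscope an empty value) could be sketched roughly as below. This is only an illustration under assumptions: it pre-processes the markup text with a regular expression rather than touching ItemScope.java or MicrodataParser.java, and the names ItemscopeNormalizerSketch, normalize and isWellFormedXml are hypothetical, not Any23 API. The regex is deliberately naive; it would misfire on a quoted attribute value that itself contains " itemscope " or ">".

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch, not the actual Any23 fix.
final class ItemscopeNormalizerSketch {

    // Matches "itemscope" inside a tag when it ends the attribute name
    // and is not followed by "=" (with or without surrounding whitespace).
    private static final Pattern BARE_ITEMSCOPE = Pattern.compile(
            "(<[^<>]*?\\sitemscope)(?=[\\s/>])(?!\\s*=)",
            Pattern.CASE_INSENSITIVE);

    // Rewrites a bare itemscope attribute to itemscope="".
    static String normalize(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("$1=\"\"");
    }

    // Returns true if a standard JAXP parser accepts the text as XML.
    static boolean isWellFormedXml(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. attribute name not followed by '='
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<html itemscope itemtype=\"http://schema.org/Product\">"
                + "<head></head></html>";
        String fixed = normalize(broken);
        System.out.println(fixed);
        System.out.println(isWellFormedXml(broken));
        System.out.println(isWellFormedXml(fixed));
    }
}
```

Attributes that already carry a value, such as itemscope="" or itemscope = "x", are left unchanged, so applying normalize twice gives the same result as applying it once.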



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
