[ https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755838#comment-13755838 ]

Peter Ansell commented on ANY23-137:
------------------------------------

Lev released 0.6 over the weekend and I updated the RDFa parser factories in 
Any23 to use it (via RDFFormat.RDFA). 

Some unit tests are failing, so I haven't committed the change to the 
master branch yet. Some fail with well-formedness exceptions, which 
may be because Semargl is stricter than our previous tag soup parser. One 
that I am interested in seems to fail because of an error extracting 
CURIEs and mapping them to Sesame:

    
    RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testRDFa11CURIEs:77->AbstractExtractorTestCase.assertContains:244 Assertion failed! Extracted triples:
    <http://dbpedia.org/resource/Albert_Einstein> <http://dbpedia.org/name> "Albert Einstein" ;
        <http://dbpedia.org/knows> <http://dbpedia.org/resource/Franklin_Roosevlet> .

    <db:table/Departments> <db:description> "Tables listing departments" ;
        <http://xmlns.com/foaf/0.1/author> <db:people/Davide_Palmisano> ;
        <http://purl.org/dc/terms/name> "Departments" .
    Cannot find triple (http://database.org/table/Departments http://database.org/description "Tables listing departments")

That error message seems to indicate that the internal Sesame repository did 
not receive the namespace declaration mapping "db:" to "http://database.org/". 
That will need to be tested at the Semargl end of things; however, it may also 
be an error on our end if we are using a custom RDFHandler that doesn't react 
properly to RDFHandler.handleNamespace.
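
To illustrate the second possibility, here is a minimal, self-contained sketch 
of how a wrapping handler that fails to forward namespace declarations could 
produce unexpanded terms such as <db:description>. It does not use the Sesame 
API: SketchHandler, CollectingSink, WrappingHandler and 
NamespaceForwardingSketch are hypothetical names, only the handleNamespace 
callback name mirrors RDFHandler.handleNamespace, and having the sink expand 
prefixed names is an assumption made purely for illustration. It is not a 
diagnosis of the actual Any23 or Semargl code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Entry point: feeds one namespace declaration and one statement through a wrapper.
class NamespaceForwardingSketch {
    static String run(boolean forwardNamespaces) {
        CollectingSink sink = new CollectingSink();
        SketchHandler handler = new WrappingHandler(sink, forwardNamespaces);
        handler.handleNamespace("db", "http://database.org/");
        handler.handleStatement("db:table/Departments", "db:description",
                "\"Tables listing departments\"");
        return sink.triples.get(0);
    }

    public static void main(String[] args) {
        System.out.println("forwarded: " + run(true));
        System.out.println("dropped:   " + run(false));
    }
}

// Hypothetical stand-in for the two handler callbacks that matter here.
interface SketchHandler {
    void handleNamespace(String prefix, String uri);
    void handleStatement(String subject, String predicate, String object);
}

// Terminal sink: remembers namespaces and expands prefixed names on arrival.
// Expansion in the sink is an assumption of this sketch.
class CollectingSink implements SketchHandler {
    private final Map<String, String> namespaces = new HashMap<>();
    final List<String> triples = new ArrayList<>();

    @Override
    public void handleNamespace(String prefix, String uri) {
        namespaces.put(prefix, uri);
    }

    @Override
    public void handleStatement(String subject, String predicate, String object) {
        triples.add(expand(subject) + " " + expand(predicate) + " " + object);
    }

    // Replaces a known prefix with its namespace; unknown prefixes pass through unchanged.
    private String expand(String term) {
        int colon = term.indexOf(':');
        if (colon < 0) {
            return term;
        }
        String namespace = namespaces.get(term.substring(0, colon));
        return namespace == null ? term : namespace + term.substring(colon + 1);
    }
}

// Custom wrapper; forwardNamespaces == false models the suspected bug.
class WrappingHandler implements SketchHandler {
    private final SketchHandler delegate;
    private final boolean forwardNamespaces;

    WrappingHandler(SketchHandler delegate, boolean forwardNamespaces) {
        this.delegate = delegate;
        this.forwardNamespaces = forwardNamespaces;
    }

    @Override
    public void handleNamespace(String prefix, String uri) {
        if (forwardNamespaces) {
            delegate.handleNamespace(prefix, uri);
        }
        // Otherwise the declaration is silently dropped and never reaches the sink.
    }

    @Override
    public void handleStatement(String subject, String predicate, String object) {
        delegate.handleStatement(subject, predicate, object);
    }
}
```

In this toy model, the run with forwarding disabled leaves "db:" unexpanded, 
which resembles the extracted triples in the failing test. If the real cause 
is on our side, checking whether each custom RDFHandler passes 
handleNamespace through to its delegate would be a reasonable first step.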

The branch, named ANY23-137, with the parser factory conversion is available in 
the Apache Git repository and in my GitHub repository if you prefer to fetch it 
from there.
                
> RDFa parser implementation proposal
> -----------------------------------
>
>                 Key: ANY23-137
>                 URL: https://issues.apache.org/jira/browse/ANY23-137
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Lev Khomich
>            Assignee: Peter Ansell
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow up to discussion [1].
> I've implemented another RDFa extractor for Any23 (0.7.1).
> Proposed code depends on semargl project [2]. It isn't published in maven
> central, therefore I didn't change any poms.
> Still not quite sure about class name (because related ones are already taken),
> feel free to rename it. See attachments for patch with extractor and tests.
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
