[ https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755838#comment-13755838 ]
Peter Ansell commented on ANY23-137:
------------------------------------

Lev released 0.6 over the weekend and I updated the RDFa parser factories in Any23 to use it (via RDFFormat.RDFA). Some unit tests are failing, so I haven't committed it to the master branch yet.

Some are failing due to well-formedness exceptions, which may be because Semargl is stricter than our previous tag soup parser. One that I am interested in seems to be failing due to an error extracting CURIEs and mapping them to Sesame:

    RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testRDFa11CURIEs:77->AbstractExtractorTestCase.assertContains:244 Assertion failed!
    Extracted triples:
    <http://dbpedia.org/resource/Albert_Einstein> <http://dbpedia.org/name> "Albert Einstein" ;
        <http://dbpedia.org/knows> <http://dbpedia.org/resource/Franklin_Roosevlet> .
    <db:table/Departments> <db:description> "Tables listing departments" ;
        <http://xmlns.com/foaf/0.1/author> <db:people/Davide_Palmisano> ;
        <http://purl.org/dc/terms/name> "Departments" .
    Cannot find triple (http://database.org/table/Departments http://database.org/description "Tables listing departments")

That error message seems to indicate that the internal Sesame repository did not receive the namespace declaration mapping "db:" to "http://database.org/". That will need to be tested at the Semargl end of things; however, it may also be an error on our end if we are using a custom RDFHandler that doesn't react properly to RDFHandler.handleNamespace.

The branch with the parser factory conversion, named ANY23-137, is available in the Apache Git repository, and in my GitHub repository if you prefer to fetch it from there.
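
To make the RDFHandler.handleNamespace concern above concrete, here is a minimal, self-contained sketch of how a wrapping handler can silently lose namespace declarations. Everything in it is a hypothetical stand-in: the Handler interface only mirrors the names of two RDFHandler callbacks, and none of the classes are Sesame, Semargl, or Any23 code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NamespaceForwardingSketch {

    /** Hypothetical stand-in for the two RDFHandler callbacks discussed above. */
    interface Handler {
        void handleNamespace(String prefix, String uri);
        void handleStatement(String subject, String predicate, String object);
    }

    /** Records whatever reaches it; plays the role of the repository side. */
    static class RecordingHandler implements Handler {
        final Map<String, String> namespaces = new LinkedHashMap<>();
        int statements = 0;

        @Override
        public void handleNamespace(String prefix, String uri) {
            namespaces.put(prefix, uri);
        }

        @Override
        public void handleStatement(String subject, String predicate, String object) {
            statements++;
        }
    }

    /** Wrapper that passes both callbacks through to its delegate. */
    static class ForwardingHandler implements Handler {
        private final Handler delegate;

        ForwardingHandler(Handler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void handleNamespace(String prefix, String uri) {
            delegate.handleNamespace(prefix, uri);
        }

        @Override
        public void handleStatement(String subject, String predicate, String object) {
            delegate.handleStatement(subject, predicate, object);
        }
    }

    /** Wrapper with the suspected bug: statements pass through, namespaces do not. */
    static class DroppingHandler extends ForwardingHandler {
        DroppingHandler(Handler delegate) {
            super(delegate);
        }

        @Override
        public void handleNamespace(String prefix, String uri) {
            // Intentionally empty: the declaration never reaches the delegate.
        }
    }

    public static void main(String[] args) {
        RecordingHandler viaForwarding = new RecordingHandler();
        new ForwardingHandler(viaForwarding).handleNamespace("db", "http://database.org/");

        RecordingHandler viaDropping = new RecordingHandler();
        new DroppingHandler(viaDropping).handleNamespace("db", "http://database.org/");

        System.out.println("forwarding wrapper: " + viaForwarding.namespaces);
        System.out.println("dropping wrapper:   " + viaDropping.namespaces);
    }
}
```

If a custom handler on the Any23 side behaves like DroppingHandler, the "db" mapping would never reach the downstream consumer. Whether that alone explains the unexpanded <db:...> terms in the failing test would still need to be checked against the actual handlers.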

> RDFa parser implementation proposal
> -----------------------------------
>
>                 Key: ANY23-137
>                 URL: https://issues.apache.org/jira/browse/ANY23-137
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Lev Khomich
>            Assignee: Peter Ansell
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow up to discussion [1].
> I've implemented another RDFa extractor for Any23 (0.7.1).
> Proposed code depends on semargl project [2]. It isn't published in maven
> central, therefore I didn't change any poms.
> Still not quite sure about class name (because related ones are already
> taken),
> feel free to rename it. See attachments for patch with extractor and tests.
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira