GitHub user lewismc opened a pull request:

    https://github.com/apache/any23/pull/24

    Initial move towards addressing ANY23-280 Refactor ContentExtractor to improve extraction flexibility

    Hi Folks,
    This is an initial crack at addressing https://issues.apache.org/jira/browse/ANY23-280
    Essentially, the main API difference is the complete removal of ```public interface ContentExtractor extends Extractor<InputStream>``` from the Extractor interface in the api module.
    This patch has a long way to go, with numerous failing tests; however, I wanted to post it for feedback.
    Although Any23 still builds with -DskipTests, without that flag the failing tests are as follows:
    ```
    Results :
    
    Failed tests:
      Any23Test.testDemoCodeSnippet1:201
      Any23Test.testN3Detection1:92->assertDetection:661
      Any23Test.testN3Detection2:97->assertDetection:661
      Any23Test.testTTLDetection:87->assertDetection:661
      RoverTest.testRunMultiURLs:104->runWithMultiSourcesAndVerify:134 Unexpected number of statements.
    
    Tests in error:
      Any23Test.testProgrammaticExtraction:279 » NullPointer
      CSVExtractorTest.testExtractionCommaSeparated:49->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testExtractionEmptyValue:112->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testExtractionSemicolonSeparated:64->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testExtractionTabSeparated:79->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testTypeManagement:94->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
      RDFaExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
    
    Tests run: 403, Failures: 5, Errors: 8, Skipped: 11
    ```
    You will see that some of the failing tests also relate to https://issues.apache.org/jira/browse/ANY23-267.
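
    As a rough illustration of the API change described above (a sketch under stated assumptions, not code from this patch): with the ContentExtractor type removed, a stream-based extractor could implement Extractor<InputStream> directly. The Extractor interface below is a simplified, hypothetical stand-in with a single run(Input) method, not the real Any23 signature, and LineCountingExtractor and countLines are made-up names used only for this example.

    ```java
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;

    public class ExtractorSketch {

        /** Simplified, hypothetical stand-in for the generic Extractor interface. */
        interface Extractor<Input> {
            int run(Input in) throws IOException;
        }

        // Previously a stream-based extractor would have implemented
        //   interface ContentExtractor extends Extractor<InputStream>
        // Assumption: without that type, it binds the type parameter itself.
        static class LineCountingExtractor implements Extractor<InputStream> {
            @Override
            public int run(InputStream in) throws IOException {
                int lines = 0;
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    // Count every line readable from the stream.
                    while (reader.readLine() != null) {
                        lines++;
                    }
                }
                return lines;
            }
        }

        /** Hypothetical helper: runs the extractor over an in-memory string. */
        public static int countLines(String text) {
            Extractor<InputStream> extractor = new LineCountingExtractor();
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            try {
                return extractor.run(new ByteArrayInputStream(bytes));
            } catch (IOException e) {
                // Not expected for an in-memory stream; rethrow unchecked.
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) {
            System.out.println(countLines("a,b\n1,2\n3,4\n"));
        }
    }
    ```

    Callers that previously depended on the ContentExtractor type would, under this assumption, refer to Extractor<InputStream> instead; whether the patch takes exactly this route is for the diff to confirm.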

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/any23 ANY23-280

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #24
    
----
commit 801f2f93967bfd1295700223085eef3f54181517
Author: Lewis John McGibbney <lewis.j.mcgibb...@jpl.nasa.gov>
Date:   2016-04-06T19:44:35Z

    Initial move towards addressing ANY23-280 Refactor ContentExtractor to improve extraction flexibility

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
