[ https://issues.apache.org/jira/browse/SOLR-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291935#comment-16291935 ]

Tim Allison commented on SOLR-11701:
------------------------------------

I merged [~kramachand...@commvault.com]'s mods and made a few updates for Tika 
1.17.

I ran an integration test against 643 files in Apache Tika's unit test docs. 
The number of documents indexed in Solr matched the number of files that 
tika-app.jar parsed without exceptions.

{noformat}
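    import java.io.File;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.metadata.TikaCoreProperties;
    import org.apache.tika.metadata.serialization.JsonMetadataList;

    public static void main(String[] args) throws Exception {
        // Directory of tika-app JSON extracts, one <original file name>.json per test doc.
        Path extracts = Paths.get("C:\\data\\tika_unit_tests_extracts");
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/fileupload_passt/").build();
        for (File f : extracts.toFile().listFiles()) {
            try (Reader r = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
                List<Metadata> metadataList = JsonMetadataList.fromJson(r);
                // A non-null value means tika-app recorded a runtime exception for this file.
                String ex = metadataList.get(0).get(TikaCoreProperties.TIKA_META_EXCEPTION_PREFIX + "runtime");
                if (ex == null) {
                    // A file that parsed cleanly should appear in Solr exactly once.
                    SolrQuery q = new SolrQuery("id: " + f.getName().replace(".json", ""));
                    QueryResponse response = client.query(q);
                    SolrDocumentList results = response.getResults();
                    if (results.getNumFound() != 1) {
                        System.err.println(f.getName() + " " + results.getNumFound());
                    }
                }
            }
        }
    }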
{noformat}
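
One caveat with the check above: the id goes into the query string unquoted, so a file name containing spaces or query-syntax characters may not be parsed as a single term. The following is a minimal sketch of one way to build a quoted id query instead. It assumes the Solr id is the extract file name minus a trailing ".json" and that the id field is an untokenized string field. The class name, method names, and sample file name are hypothetical and not part of the patch.

```java
public final class IdQueries {
    private IdQueries() {}

    /** Strips a trailing ".json" to recover the (assumed) Solr id of an extract file. */
    public static String idFromExtractName(String fileName) {
        String suffix = ".json";
        return fileName.endsWith(suffix)
                ? fileName.substring(0, fileName.length() - suffix.length())
                : fileName;
    }

    /** Wraps the id in double quotes, escaping backslashes and embedded quotes. */
    public static String quotedIdQuery(String id) {
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        return "id:\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical extract name containing a space.
        System.out.println(quotedIdQuery(idFromExtractName("my report.pdf.json")));
        // prints: id:"my report.pdf"
    }
}
```

Unlike `replace(".json", "")`, this strips only the trailing suffix, so a name that happens to contain ".json" in the middle is left intact. The resulting string could be passed to `new SolrQuery(...)` in place of the concatenation used above.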

I did the usual dance:
{noformat}
ant clean-jars jar-checksums
ant precommit
{noformat}

[~erickerickson], this _should_ be good to go.  


> Upgrade to Tika 1.17 when available
> -----------------------------------
>
>                 Key: SOLR-11701
>                 URL: https://issues.apache.org/jira/browse/SOLR-11701
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Tim Allison
>
> Kicking off release process for Tika 1.17 in the next few days.  Please let 
> us know if you have any requests.



