[ 
https://issues.apache.org/jira/browse/TIKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044383#comment-18044383
 ] 

Tilman Hausherr edited comment on TIKA-4545 at 12/11/25 11:23 AM:
------------------------------------------------------------------

The Windows build fails in the tika pipes solr integration tests because of the 
path problem we've had in the past (backslashes in absolute paths). It can be 
fixed in three places in TikaPipesSolrTestBase.getTikaConfig(), like this:
{code:java}
        String res = json.replace("UPDATE_STRATEGY", updateStrategy.toString())
                .replace("ATTACHMENT_STRATEGY", attachmentStrategy.toString())
                .replaceAll("FETCHER_BASE_PATH",
                        Matcher.quoteReplacement(testFileFolder.toAbsolutePath().toString().replace("\\", "/")))
                .replace("PARSE_MODE", parseMode.name())
                .replace("SOLR_URLS", solrUrls)
                .replace("SOLR_ZK_HOSTS", solrZkHosts);

        res = res.replace("TIKA_CONFIG", tikaConfig.toAbsolutePath().toString().replace("\\", "/"));

        Path log4jPropFile = pipesDirectory.resolve("log4j2.xml");
        try (InputStream is = this.getClass().getResourceAsStream("/pipes-fork-server-custom-log4j2.xml")) {
            Files.copy(is, log4jPropFile, java.nio.file.StandardCopyOption.REPLACE_EXISTING);
        }
        res = res.replace("LOG4J_PROPERTIES_FILE", log4jPropFile.toAbsolutePath().toString().replace("\\", "/"));
{code}
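
For illustration only, here is a minimal standalone sketch of the normalization used above. It assumes the placeholder sits inside a quoted JSON string value, where a raw backslash would start an escape sequence (and `\w` or `\t` from a Windows path is either invalid or silently wrong). The class and method names (`PathPlaceholders`, `toJsonSafePath`, `fill`) and the sample path are hypothetical and not part of Tika.

```java
public class PathPlaceholders {

    // Turn a (possibly Windows-style) path string into one that can be pasted
    // inside a JSON string literal without introducing escape sequences.
    public static String toJsonSafePath(String absolutePath) {
        return absolutePath.replace("\\", "/");
    }

    // Substitute one placeholder with a normalized path.
    // String.replace is literal, so unlike replaceAll it needs no
    // Matcher.quoteReplacement around the replacement value.
    public static String fill(String template, String placeholder, String absolutePath) {
        return template.replace(placeholder, toJsonSafePath(absolutePath));
    }

    public static void main(String[] args) {
        String template = "{\"basePath\": \"FETCHER_BASE_PATH\"}";
        String windowsPath = "C:\\work\\tika\\test-files"; // hypothetical sample path
        // prints {"basePath": "C:/work/tika/test-files"}
        System.out.println(fill(template, "FETCHER_BASE_PATH", windowsPath));
    }
}
```

Paths that already use forward slashes pass through unchanged, so the same call should be harmless on Linux and macOS builds.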

Same in S3PipeIntegrationTest.java:
{code:java}
        String pluginsConfig = pluginsTemplate
                .replace("{TIKA_CONFIG}", tikaConfigFile.getAbsolutePath().replace("\\", "/"))
                .replace("{LOG4J_PROPERTIES_FILE}", log4jPropFile.getAbsolutePath().replace("\\", "/"))
{code}
The same applies to TikaPipesKafkaTest.getTikaConfig. I realize now that a 
replace at the end is fine too, as is already done elsewhere.
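
One possible reading of "a replace at the end" is to substitute the raw paths first and normalize the finished string in a single pass. The sketch below shows that variant under a loud assumption: the template itself contains no backslashes (for example no JSON escapes such as `\"` or `\n`), because the final pass would rewrite those too. `ReplaceAtEnd`, `fillThenNormalize`, and the sample paths are hypothetical names, not Tika code.

```java
public class ReplaceAtEnd {

    // Substitute raw paths, then convert every backslash in the result.
    // Assumption: the template has no backslashes of its own.
    public static String fillThenNormalize(String template, String tikaConfigPath, String log4jPath) {
        return template
                .replace("TIKA_CONFIG", tikaConfigPath)
                .replace("LOG4J_PROPERTIES_FILE", log4jPath)
                // Single pass over the whole string, not just the inserted paths.
                .replace("\\", "/");
    }

    public static void main(String[] args) {
        String template = "{\"config\": \"TIKA_CONFIG\", \"log4j\": \"LOG4J_PROPERTIES_FILE\"}";
        String result = fillThenNormalize(template,
                "C:\\tmp\\tika-config.json",   // hypothetical sample path
                "C:\\tmp\\log4j2.xml");        // hypothetical sample path
        // prints {"config": "C:/tmp/tika-config.json", "log4j": "C:/tmp/log4j2.xml"}
        System.out.println(result);
    }
}
```

The per-value replace shown in the snippets above is the more conservative choice when the template might carry escapes of its own.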



> Fully integrate new json based deserializer in 4.x
> --------------------------------------------------
>
>                 Key: TIKA-4545
>                 URL: https://issues.apache.org/jira/browse/TIKA-4545
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Follow on for TIKA-4544.
> Steps:
>  * Add annotations to components (parsers, etc.) and unit tests to confirm 
> they work (finished this today)
>  * Modify components (parsers, etc.), at least a few of them, so that they are 
> actually configurable. We don't have to modify all of them, just the most 
> important ones: PDFParser, tesseract, MSOffice, and others?
>  * Move to tika-config.json in tika-pipes client/server, tika-async-cli, 
> tika-app and tika-server, one by one



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
