Adrian Bird created TIKA-4737:
---------------------------------

             Summary: tika-4.0.0-alpha1 - Batch mode is confusing
                 Key: TIKA-4737
                 URL: https://issues.apache.org/jira/browse/TIKA-4737
             Project: Tika
          Issue Type: Bug
         Environment: Windows 11 with Java 17
            Reporter: Adrian Bird


Looking at the documentation, I found it very confusing how to use what I'll 
call 'standard' mode versus 'batch' mode.
 # [Batch Processing|https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_batch_processing_tika_async_cli]
 says 'For processing large numbers of files, use {{tika-async-cli}}. It 
uses the Tika Pipes architecture with forked JVM processes for fault tolerance.'
The examples use 'tika-async-cli.jar', which doesn't exist, but the example 
runs with 'tika-app.jar'.
 # When using 'tika-app.jar', it is not clear what makes it run in 'batch' or 
'standard' mode. My assumption is that it is the presence of the '-i' and '-o' 
options.
 # The help output in 'batch' mode differs considerably from both the options 
listed on the Batch Processing page above and the 'standard' help output.
 # The Batch Processing page above doesn't say anything about how to use a 
config file, but the help does. 
 # It is confusing to have two different ways of specifying the config file, 
depending on whether you are using 'standard' mode ('--config=file.json') or 
'batch' mode ('-c file.json').
 # It would also be useful if a message were output stating whether Tika is 
running in 'standard' or 'batch' mode.
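
As a rough illustration of the message suggested in the last item, here is a minimal sketch. It is not Tika's code: the function names `detect_mode` and `mode_banner` are hypothetical, and the rule that the presence of both '-i' and '-o' selects 'batch' mode is only the assumption stated in the second item, not confirmed behaviour.

```python
import sys


def detect_mode(args):
    """Guess the mode from the command-line arguments.

    Assumption (unconfirmed): 'batch' mode is selected when both
    '-i' and '-o' are present; anything else is 'standard' mode.
    """
    return "batch" if "-i" in args and "-o" in args else "standard"


def mode_banner(args):
    """Build the one-line message a user would see at startup."""
    return f"Running in '{detect_mode(args)}' mode"


if __name__ == "__main__":
    # Write to stderr so the banner does not mix with extracted content.
    print(mode_banner(sys.argv[1:]), file=sys.stderr)
```

Printing such a line at startup would let a user confirm which option set and which help text apply to the command they just ran.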

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
