Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaBatchUsage" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaBatchUsage?action=diff&rev1=7&rev2=8

  = Usage =
  See TikaBatchOverview for a general design overview of tika-batch.
  
- tika-batch is now available in trunk and will be available in Tika 1.8.
+ tika-batch was added in Tika 1.8 as its own module and was also integrated into tika-app in that release.
  
- == TikaBatch FileSystem (FS) ==
- For expert users who don't want to use tika-app or who might want to do 
custom extensions, there are example driver files and logging config files 
available in 
[[https://github.com/tballison/tika/tree/TIKA-1302/tika-batch/src/main/examples|here]].
  
  == TikaBatch via tika-app-X.Y.jar ==
- There is an initial integration with tika-app on a github 
[[https://github.com/tballison/tika/tree/TIKA-1302|fork]].
  
  You can list the command-line arguments with the usual "-?" or "--help" options. The tika-batch options are listed in a separate section at the end.
  
@@ -25, +22 @@

  
   *Most basic (with output to a directory called "output"):
  
-       java -jar tika-app.X.Y.jar <inputDirectory> <outputDirectory>
+       `java -jar tika-app-X.Y.jar <inputDirectory> <outputDirectory>`
  
   *Specify input and output directories:
  
-       java -jar tika-app.X.Y.jar -i /mydata/src/dir -o /mydata/output/dir
+       `java -jar tika-app-X.Y.jar -i /mydata/src/dir -o /mydata/output/dir`
  
   *Set the number of file consumer threads:
  
-       java -jar tika-app.X.Y.jar -numConsumers 10 -i <inputDirectory> -o <outputDirectory>
+       `java -jar tika-app-X.Y.jar -numConsumers 10 -i <inputDirectory> -o <outputDirectory>`
  
   *Output text instead of xml
  
-       java -jar tika-app.X.Y.jar -t -i <inputDirectory> -o <outputDirectory>
+       `java -jar tika-app-X.Y.jar -t -i <inputDirectory> -o <outputDirectory>`
  
   *Use the !RecursiveParserWrapper and store text for each document:
-       java -jar tika-app.X.Y.jar -J -t -i <inputDirectory> -o <outputDirectory>
+       `java -jar tika-app-X.Y.jar -J -t -i <inputDirectory> -o <outputDirectory>`
  
   *Specify JVM arguments for the child process (prepend a "J" to the regular argument, e.g. -Xmx2g becomes -JXmx2g):
  
-       java -jar tika-app.X.Y.jar -JXmx2g -JDlog4j.configuration={{file:bin/log4j.xml}} -i <inputDirectory> -o <outputDirectory>
+       `java -jar tika-app-X.Y.jar -JXmx2g -JDlog4j.configuration=log4j.xml -i <inputDirectory> -o <outputDirectory>`
  
   *Command line to generate output files for tika-eval, processing only the files listed in pdfs_random_50000.csv:
-       java -Dlog4j.configuration=file:bin/log4j_driver.xml -jar tika-app-X.Y.jar -JXmx6g -JDlog4j.configuration=file:bin/log4j.xml -bc tika-batch-config-basic-test.xml -numConsumers 10 -o <outputDirectory> -i <inputDirectory> -fileList pdfs_random_50000.csv
+       `java -Dlog4j.debug -Dlog4j.configuration=file:log4j_driver.xml -cp "bin/*" org.apache.tika.cli.TikaCLI -JXX:-OmitStackTraceInFastThrow -JXmx5g -JDlog4j.configuration=file:log4j.xml -bc tika-batch-config-basic-test.xml -i <inputDirectory> -o <outputDirectory> -fileList pdfs_random_50000.csv`
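The "prepend a J" rule for child JVM arguments can be pictured as a split of the command line. The Java sketch below is a hypothetical illustration and not tika-app's actual argument parser. The class ChildJvmArgs and its methods are made up. Any argument that starts with -J and has more text after it goes to the child JVM with the J removed. A bare -J, which is the RecursiveParserWrapper flag in the examples above, stays with tika-app.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "prepend a J" convention; NOT tika-app's real argument parser.
public class ChildJvmArgs {

    // A bare "-J" is the RecursiveParserWrapper flag, so only "-J" followed by more text counts.
    static boolean isChildJvmArg(String arg) {
        return arg.startsWith("-J") && arg.length() > 2;
    }

    // Arguments for the child JVM, with the "J" stripped: "-JXmx2g" -> "-Xmx2g".
    static List<String> childJvmArgs(String[] args) {
        List<String> jvmArgs = new ArrayList<>();
        for (String arg : args) {
            if (isChildJvmArg(arg)) {
                jvmArgs.add("-" + arg.substring(2));
            }
        }
        return jvmArgs;
    }

    // Every other argument stays with tika-app itself.
    static List<String> appArgs(String[] args) {
        List<String> rest = new ArrayList<>();
        for (String arg : args) {
            if (!isChildJvmArg(arg)) {
                rest.add(arg);
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        String[] example = {
            "-JXmx2g", "-JDlog4j.configuration=log4j.xml", "-J", "-t",
            "-i", "in", "-o", "out"
        };
        System.out.println(childJvmArgs(example)); // [-Xmx2g, -Dlog4j.configuration=log4j.xml]
        System.out.println(appArgs(example));      // [-J, -t, -i, in, -o, out]
    }
}
```

The length check is the one design point that matters in this sketch. Without it, the -J flag for RecursiveParserWrapper would be misread as an empty JVM argument.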
  
+  
        
  === Some notes ===
  
   *The watchdog process restarts the child process unless the child exits with the "do not restart" value, 254. To stop all processing, kill the parent process first and then the child process, so that a killed child is not simply restarted.
  
- 
+  *Add -JXX:-OmitStackTraceInFastThrow to the child process's command-line arguments. Otherwise the JVM may omit stack traces for exceptions that are thrown repeatedly.
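The restart rule in the first note can be sketched as a small supervision loop. This is a hypothetical Java illustration and not tika-batch's code. WatchdogSketch, supervise, and the maxRuns cap are all made up. Each child run is simulated by an IntSupplier that returns an exit value. In a real watchdog, that value would come from Process.waitFor() on a child started with ProcessBuilder.start().

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of the watchdog restart rule; NOT tika-batch's actual driver code.
public class WatchdogSketch {

    // Exit value with which the child asks the watchdog not to restart it.
    static final int DO_NOT_RESTART = 254;

    static boolean shouldRestart(int exitValue) {
        return exitValue != DO_NOT_RESTART;
    }

    // runChildOnce: starts one child run and returns its exit value
    //   (a real version would call ProcessBuilder.start() and Process.waitFor()).
    // maxRuns: hypothetical cap so this sketch always terminates.
    // Returns how many times the child was run.
    static int supervise(IntSupplier runChildOnce, int maxRuns) {
        int runs = 0;
        while (runs < maxRuns) {
            int exitValue = runChildOnce.getAsInt();
            runs++;
            if (!shouldRestart(exitValue)) {
                break; // the child asked to stay down
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Simulated child: crashes twice (exit value 1), then exits with 254.
        int[] exitValues = {1, 1, DO_NOT_RESTART};
        int[] next = {0};
        int runs = supervise(() -> exitValues[next[0]++], 10);
        System.out.println("child ran " + runs + " times"); // child ran 3 times
    }
}
```

In this simulation the child crashes twice, then exits with 254, so the loop stops after three runs. The sketch also shows why the parent should be killed first. While the parent is alive, any other exit value, including one caused by killing the child, leads to a restart.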
  
  == TikaBatch Server ==
  This module is not yet implemented. Want to contribute?
