Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaBatchUsage" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaBatchUsage?action=diff&rev1=3&rev2=4

  == TikaBatch FileSystem (FS) ==
  For expert users who don't want to use tika-app or who might want to do custom extensions, there are example driver files and logging config files available in [[https://github.com/tballison/tika/tree/TIKA-1302/tika-batch/src/main/examples|here]].
  
- == TikaBatch via tika-app-X.X.jar ==
+ == TikaBatch via tika-app-X.Y.jar ==
  There is an initial integration with tika-app on a github [[https://github.com/tballison/tika/tree/TIKA-1302|fork]].
  
  You can see the commandline arguments via the regular "-?" or "--help" commands.  There is a separate section at the end for tika-batch options.
@@ -38, +38 @@

  
        java -jar tika-app.X.Y.jar -JXmx2g -JDlog4j.configuration={{file:bin/log4j.xml}} <inputDirectory>
  
+  *Commandline to generate output files for tika-eval...only process those files listed in pdfs_random_50000.csv:
+       java -Dlog4j.configuration=file:bin/log4j_driver.xml -jar tika-app-X.Y.jar -JXmx6g -JDlog4j.configuration=file:bin/log4j.xml -bc tika-batch-config-basic-test.xml -numConsumers 10 -targDir <targDir> -srcDir <srcDir> -fileList pdfs_random_50000.csv
+ 
+ 
+       
+ 
  == TikaBatch Server ==
  Module not yet implemented...want to contribute?
+ This would require hardening the server and creating an example client to be used within
+ TikaBatch FS framework.
  
  == TikaBatch Hadoop ==
- Module not yet implemented...want to contribute?
+ Module not yet implemented within Tika project...want to contribute?
+ Some external project links and blogs:
+  *[[https://github.com/DigitalPebble/behemoth|DigitalPebble]]
+  *[[http://openpreservation.org/knowledge/blogs/2014/03/21/tika-ride-characterising-web-content-nanite/|Nanite]]
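
For readers who want to script the tika-eval command line added in this revision, here is a rough Python sketch. It is not part of the wiki page. Every flag in it is copied from the command in the diff; the helper names `write_file_list` and `build_batch_command` are hypothetical. The sketch also assumes that the `-fileList` file holds one path per line, relative to `-srcDir`. The page does not state the format, so check it against your tika-batch version.

```python
import random
from pathlib import Path


def write_file_list(src_dir, list_path, sample_size, seed=0):
    """Write a random sample of PDF paths found under src_dir, one per line.

    ASSUMPTION: the -fileList file is plain text with one path per line,
    relative to -srcDir. The wiki page does not document the format.
    """
    src = Path(src_dir)
    # Collect PDFs as POSIX-style paths relative to the source directory.
    pdfs = sorted(
        p.relative_to(src).as_posix()
        for p in src.rglob("*.pdf")
        if p.is_file()
    )
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(pdfs, min(sample_size, len(pdfs)))
    Path(list_path).write_text(
        "".join(line + "\n" for line in chosen), encoding="utf-8"
    )
    return chosen


def build_batch_command(jar, src_dir, targ_dir, file_list, num_consumers=10):
    """Assemble the argv for the tika-eval example shown in the diff."""
    return [
        "java",
        "-Dlog4j.configuration=file:bin/log4j_driver.xml",
        "-jar", str(jar),
        "-JXmx6g",
        "-JDlog4j.configuration=file:bin/log4j.xml",
        "-bc", "tika-batch-config-basic-test.xml",
        "-numConsumers", str(num_consumers),
        "-targDir", str(targ_dir),
        "-srcDir", str(src_dir),
        "-fileList", str(file_list),
    ]
```

The list returned by `build_batch_command` can be handed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.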
  
