Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaBatchUsage" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaBatchUsage?action=diff&rev1=3&rev2=4

  == TikaBatch FileSystem (FS) ==
  For expert users who don't want to use tika-app or who might want to do custom extensions, there are example driver files and logging config files available in [[https://github.com/tballison/tika/tree/TIKA-1302/tika-batch/src/main/examples|here]].
- == TikaBatch via tika-app-X.X.jar ==
+ == TikaBatch via tika-app-X.Y.jar ==
  There is an initial integration with tika-app on a github [[https://github.com/tballison/tika/tree/TIKA-1302|fork]].
  You can see the commandline arguments via the regular "-?" or "--help" commands. There is a separate section at the end for tika-batch options.
@@ -38, +38 @@
  java -jar tika-app.X.Y.jar -JXmx2g -JDlog4j.configuration={{file:bin/log4j.xml}} <inputDirectory>
+ *Commandline to generate output files for tika-eval...only process those files listed in pdfs_random_50000.csv:
+ java -Dlog4j.configuration=file:bin/log4j_driver.xml -jar tika-app-X.Y.jar -JXmx6g -JDlog4j.configuration=file:bin/log4j.xml -bc tika-batch-config-basic-test.xml -numConsumers 10 -targDir <targDir> -srcDir <srcDir> -fileList pdfs_random_50000.csv
+
+
+
+
  == TikaBatch Server ==
  Module not yet implemented...want to contribute?
+ This would require hardening the server and creating an example client to be used within
+ TikaBatch FS framework.
  == TikaBatch Hadoop ==
- Module not yet implemented...want to contribute?
+ Module not yet implemented within Tika project...want to contribute?
+ Some external project links and blogs:
+ *[[https://github.com/DigitalPebble/behemoth|DigitalPebble]]
+ *[[http://openpreservation.org/knowledge/blogs/2014/03/21/tika-ride-characterising-web-content-nanite/|Nanite]]
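The tika-eval command line in this change uses `-fileList` to process only the files named in pdfs_random_50000.csv. The page does not say how that list was built. The Java class below is a minimal sketch of one way to produce such a list: it takes a random, repeatable sample of PDFs from the source directory. It assumes that `-fileList` expects one path per line, relative to `-srcDir`. That is an assumption, so check it against the tika-batch code on the TIKA-1302 branch before relying on it. The class name `MakePdfFileList`, its methods, the fixed seed, and the sampling approach are all hypothetical and are not part of tika-batch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of tika-batch): writes a random sample of
 * PDF paths, one per line, for use with tika-app's -fileList option.
 * ASSUMPTION: -fileList expects one path per line, relative to -srcDir.
 */
public class MakePdfFileList {

    /** Recursively collects *.pdf files under root, as root-relative paths. */
    public static List<String> collectPdfs(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .map(p -> root.relativize(p).toString())
                    .sorted() // stable order so the seeded shuffle is repeatable
                    .collect(Collectors.toList());
        }
    }

    /** Returns up to n paths chosen uniformly at random; the seed makes runs repeatable. */
    public static List<String> sample(List<String> paths, int n, long seed) {
        List<String> copy = new ArrayList<>(paths);
        Collections.shuffle(copy, new Random(seed));
        return new ArrayList<>(copy.subList(0, Math.min(n, copy.size())));
    }

    public static void main(String[] args) throws IOException {
        // usage: java MakePdfFileList <srcDir> <outFile> <n>
        Path srcDir = Paths.get(args[0]);
        Path outFile = Paths.get(args[1]);
        int n = Integer.parseInt(args[2]);
        List<String> chosen = sample(collectPdfs(srcDir), n, 42L);
        Files.write(outFile, chosen, StandardCharsets.UTF_8);
        System.out.println("wrote " + chosen.size() + " paths to " + outFile);
    }
}
```

For example, `java MakePdfFileList <srcDir> pdfs_random_50000.csv 50000` would write a list that could then be passed to `-fileList` together with the same `<srcDir>`. Fixing the seed keeps the sample identical across tika-eval runs, so two Tika versions can be compared on the same files.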
