The "TikaBatchUsage" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaBatchUsage?action=diff&rev1=6&rev2=7

  = Usage =
  See TikaBatchOverview for a general design overview of tika-batch.
  
- The code is currently available [[https://github.com/tballison/tika/tree/TIKA-1330|here]].  The current goal is to add this into Tika 1.8.
+ tika-batch is now available in trunk and will be available in Tika 1.8.
  
  == TikaBatch FileSystem (FS) ==
  For expert users who don't want to use tika-app or who might want to do custom extensions, there are example driver files and logging config files available in [[https://github.com/tballison/tika/tree/TIKA-1302/tika-batch/src/main/examples|here]].
@@ -39, +39 @@

  
        java -jar tika-app.X.Y.jar -t -i <inputDirectory> -o <outputDirectory>
  
-  *Use the RecursiveParserWrapper and store text for each document:
+  *Use the !RecursiveParserWrapper and store text for each document:
        java -jar tika-app.X.Y.jar -J -t -i <inputDirectory> -o <outputDirectory>
  
   *Specify jvm args to be used by the child process (prepend a "J" to the regular args):
@@ -52, +52 @@

        
  === Some notes ===
  
-  *The watchdog process will restart the child process unless the child process exits with a "do not restart value"=666.  If you want to kill all processing, make sure to kill the parent process and then the child process.
+  *The watchdog process will restart the child process unless the child process exits with a "do not restart value"=254.  If you want to kill all processing, make sure to kill the parent process and then the child process.
  
-  *Because of a feature in javax's xml parser and the way the parser is configured in PDFBox 1.8.8, it is common to see error messages (e.g. Fatal Error :21:9: Element type "pdfx:" must be followed by either attribute specifications, ">" or "/>").  That should go away when Tika migrates to PDFBox 2.0.
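
The restart rule in the watchdog note above can be illustrated with a minimal sketch. This is not tika-batch's actual watchdog code: the class name `WatchdogSketch`, the methods `shouldRestart` and `superviseChild`, and the `maxRuns` cap are all hypothetical names chosen for illustration. The only detail taken from the note is that a child exit value of 254 means "do not restart"; the sketch assumes every other exit value leads to a restart, and it simulates the child's exit values instead of launching a real process.

```java
import java.util.function.IntSupplier;

public class WatchdogSketch {
    // Exit value that the note above calls the "do not restart value".
    static final int DO_NOT_RESTART_EXIT_CODE = 254;

    // Assumption: any exit value other than 254 triggers a restart.
    static boolean shouldRestart(int exitCode) {
        return exitCode != DO_NOT_RESTART_EXIT_CODE;
    }

    // Runs the child until it reports the "do not restart" value or until
    // maxRuns launches have happened (a hypothetical safety cap).
    // Returns the number of times the child was launched.
    static int superviseChild(IntSupplier runChildOnce, int maxRuns) {
        int launches = 0;
        while (launches < maxRuns) {
            launches++;
            int exitCode = runChildOnce.getAsInt();
            if (!shouldRestart(exitCode)) {
                break;
            }
        }
        return launches;
    }

    public static void main(String[] args) {
        // Simulated exit values: a crash, a normal exit, then "do not restart".
        int[] simulated = {1, 0, 254};
        int[] next = {0};
        int launches = superviseChild(() -> simulated[next[0]++], 10);
        System.out.println("child launched " + launches + " times");
    }
}
```

In a real parent process, `runChildOnce` could wrap `new ProcessBuilder(...).start().waitFor()`. Because the parent keeps relaunching the child, killing only the child is not enough, which is why the note says to kill the parent first.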
  
  
  == TikaBatch Server ==
@@ -64, +63 @@

  
  == TikaBatch Hadoop ==
  Module not yet implemented within Tika project...want to contribute?
+ See TikaInHadoop.
- Some external project links and blogs:
-  *[[http://svn.apache.org/repos/asf/oodt/trunk/crawler|Apache OODT Crawler]]
-  *[[https://github.com/DigitalPebble/behemoth|DigitalPebble]]
-  *[[http://openpreservation.org/knowledge/blogs/2014/03/21/tika-ride-characterising-web-content-nanite/|Nanite]]
  
