[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043805#comment-17043805 ]
Tilman Hausherr commented on TIKA-3035:
---------------------------------------

I have no position on this. [~sorend] did not bring any further argument to support his position after reading the commit text so I guess this can be closed.

> Tika-app --extract mode outputs to stderr instead of stdout
> -----------------------------------------------------------
>
>                 Key: TIKA-3035
>                 URL: https://issues.apache.org/jira/browse/TIKA-3035
>             Project: Tika
>          Issue Type: Bug
>          Components: app
>    Affects Versions: 1.23
>            Reporter: Soren Daugaard
>            Priority: Major
>              Labels: app, extract
>         Attachments: testPDF_childAttachments.pdf
>
>
> In version 1.23 of Tika I am noticing a problem using the extract
> functionality. When extracting items from a file the "Extracting ... to ... "
> output goes to {{stderr}} instead of {{stdout}}.
> This problem is observed using the runnable jar `tika-app-1.23.jar`.
> _*Example to re-create problem:*_
> Here we explode {{testPDF_childAttachments.pdf}} and redirect standard error
> to {{/dev/null}}:
> {code:java}
> $ java -jar tika-app-1.23.jar --extract-dir=tika-test/out/ -z
> testPDF_childAttachments.pdf 2> /dev/null
> {code}
> If I do not redirect stderr I see:
> {code:java}
> $ java -jar tika-app-1.23.jar --extract-dir=tika-test/out/ -z
> testPDF_childAttachments.pdf
> INFO As a convenience, TikaCLI has turned on extraction of
> inline images for the PDFParser (TIKA-2374).
> Aside from the -z option, this is not the default behavior
> in Tika generally or in tika-server.
> Jan 31, 2020 8:06:01 PM org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.Jan 31, 2020 8:06:01 PM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> WARNING: Tesseract OCR is installed and will be automatically applied to
> image files unless
> you've excluded the TesseractOCRParser from the default parser.
> Tesseract may dramatically slow down content extraction (TIKA-2359).
> As of Tika 1.15 (and prior versions), Tesseract is automatically called.
> In future versions of Tika, users may need to turn the TesseractOCRParser on
> via TikaConfig.
> Jan 31, 2020 8:06:01 PM org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Extracting 'image0.jpg' (image/jpeg) to
> tika-test/out/3975acae-089c-43ae-a3bc-04e4987a0282-image0.jpg
> Extracting 'image1.tif' (image/tiff) to
> tika-test/out/8d11e4e3-735b-4b0b-9441-3ed4332c2f53-image1.tif
> WARN No Unicode mapping for f_i (31) in font SCZFMD+HelveticaNeueLTStd-Roman
> Extracting 'Press Quality(1).joboptions' (text/plain) to
> tika-test/out/28c3fb48-30ea-403b-8a35-252c8f692305-Press Quality(1).joboptions
> Extracting 'Unit10.doc' (application/msword) to
> tika-test/out/008b9157-75f3-453b-bdfd-d5403c56891c-Unit10.doc
> {code}
> Using 1.22 I correctly see the extracted files in {{stdout}} when redirecting
> {{stderr}}:
> {code:java}
> $ java -jar tika-app-1.22.jar --extract-dir=tika-test/out/ -z
> testPDF_childAttachments.pdf 2> /dev/null
> Extracting 'image0.jpg' (image/jpeg) to
> tika-test/out/4ec61a12-4e5f-4de3-bee8-fa15521c374a-image0.jpg
> Extracting 'image1.tif' (image/tiff) to
> tika-test/out/004fbeb5-4b0e-4d35-8c50-23a420dccc99-image1.tif
> Extracting 'Press Quality(1).joboptions' (text/plain) to
> tika-test/out/8f6174d1-f0c7-4143-990d-a922c2e9513a-Press Quality(1).joboptions
> Extracting 'Unit10.doc' (application/msword) to
> tika-test/out/b2508bee-745d-4051-b927-0f5c31b97c1e-Unit10.doc
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)