Tika Api consumes given stream
I use the Apache Tika bundle dependency in a project to determine the MIME types of files. Due to some issues, we have to detect the type from an InputStream. Detection is supposed to be guaranteed to mark and reset the given InputStream. The Tika bundle includes the core and parser APIs and uses POIFSContainerDetector, ZipContainerDetector, OggDetector, MimeTypes and Magic for detection. I have been debugging for 3 hours, and all of the detectors mark and reset the stream after detection.

This is how I call it:

    TikaInputStream tis = null;
    try {
        TikaConfig config = new TikaConfig();
        tikaDetector = config.getDetector();
        tis = TikaInputStream.get(in);
        MediaType mediaType = tikaDetector.detect(tis, new Metadata());
        if (mediaType != null) {
            String[] types = mediaType.toString().split(",");
            for (int i = 0; i < types.length; i++) {
                mimeTypes.add(new MimeType(types[i]));
            }
        }
    } catch (Exception e) {
        logger.error("Mime Type for given Stream could not be resolved: ", e);
    }

But the stream is consumed. Does anyone know how to determine MIME types without consuming the stream?

--
View this message in context: http://lucene.472066.n3.nabble.com/Tika-Api-consumes-given-stream-tp4168960.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
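One possible explanation for the behaviour described in the question, offered as an assumption rather than a diagnosis: mark and reset are applied to the wrapper returned by TikaInputStream.get(in), not to the original stream, so whatever the wrapper buffered for detection has already been read from `in`. The sketch below reproduces that effect using only java.io, with BufferedInputStream standing in for the wrapper. The names PeekDemo and peek are hypothetical and are not part of Tika.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PeekDemo {

    /** Reads up to n leading bytes from a mark-capable stream, then rewinds it. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] buf = new byte[n];
        int total = 0;
        in.mark(n);
        try {
            int r;
            while (total < n && (r = in.read(buf, total, n - total)) != -1) {
                total += r;
            }
        } finally {
            in.reset(); // rewinds the wrapper only, not any stream it wraps
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example body".getBytes(StandardCharsets.US_ASCII);
        InputStream raw = new ByteArrayInputStream(data);   // plays the role of the caller's stream
        InputStream wrapped = new BufferedInputStream(raw); // plays the role of the detection wrapper

        byte[] head = peek(wrapped, 4);
        System.out.println("peeked: " + new String(head, StandardCharsets.US_ASCII));
        // The wrapper pulled the data into its own buffer, so the raw stream is drained...
        System.out.println("raw bytes left: " + raw.available());
        // ...while the wrapper, after reset, still offers everything from the start.
        System.out.println("wrapped bytes left: " + wrapped.available());
    }
}
```

If this assumption holds for the code in the question, the practical consequence would be to keep reading from `tis` after detection instead of going back to `in`.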
[jira] [Commented] (TIKA-1471) OOM with corrupt PDF file
[ https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208383#comment-14208383 ]

Alan Burlison commented on TIKA-1471:
-------------------------------------

Running a separate indexer JVM would be safer, but up until now I haven't had anything that causes fatal errors. I already have to spawn ps2ascii (Ghostscript) sub-processes for PostScript files, as PDFBox doesn't cope with some of the older ones in the corpus, and the impact on indexing time is significant, so I want to do as much as possible from within the same JVM.

bq. I wonder if PDFBOX-2200/TIKA-1424 is the culprit for the memory leak you mention.

Adding the workaround from TIKA-1424 (calling org.apache.pdfbox.pdmodel.font.PDFont.clearResources) does seem to help a bit, but I'm a bit wary about calling a static method that affects global state when multiple threads are running. I'm therefore just going to call it at the end of each index run - they are normally incremental, so it's only the initial index build that reads the whole corpus. Although memory usage is approximately 4 GB after a full reindex, I can just restart the appserver if necessary.

Thanks for the helpful hints and tips :-)

> OOM with corrupt PDF file
> -------------------------
>
> Key: TIKA-1471
> URL: https://issues.apache.org/jira/browse/TIKA-1471
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 1.6
> Environment: Linux, JVM 1.8.0_25-b17, 64-bit
> Reporter: Alan Burlison
> Priority: Blocker
> Fix For: 1.7
>
> Use of PDFBox 1.8.6 by Tika 1.6 is causing OOM errors with corrupt PDF files,
> due to a bug in PDFBox, see PDFBOX-2493. This makes Tika 1.6 unusable from
> inside a long-running webapp and I've had to revert to Tika 1.5. Although 1.5
> also throws errors with the corrupt file it does not cause OOM errors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TIKA-1471) OOM with corrupt PDF file
[ https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208100#comment-14208100 ]

Tim Allison commented on TIKA-1471:
-----------------------------------

Ah, thank you for sharing this use case. The first step for tika-batch is disk to disk, but if there are other common use cases, we should add those (a more robust tika-server, for example).

I've found that a separate JVM for Tika alone (despite the added storage) is the most robust way to handle large batches of potentially dangerous files; keep Tika in a separate JVM from the indexer or the next step in processing.

Right, I had forgotten to mention memory leaks as one of the things integrators have to deal with. Thank you.

I wonder if PDFBOX-2200/TIKA-1424 is the culprit for the memory leak you mention.
[jira] [Commented] (TIKA-1471) OOM with corrupt PDF file
[ https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208089#comment-14208089 ]

Alan Burlison commented on TIKA-1471:
-------------------------------------

In my case I'm using Tika to extract text from a corpus of around 350,000 documents, many of which are attachments to emails that I'm in turn handling with JavaMail. I therefore don't have an on-disk representation of many of the documents, so doing all the processing inside the same JVM makes life a little easier.

To keep performance reasonable I'm also using a thread pool, with each thread containing a Tika instance which is reused for many (tens of thousands of) documents. During a full re-index memory use creeps inexorably upwards, but as I destroy the thread pool after each indexing run the memory is reclaimed. I'm guessing that one or more of the components that Tika uses is a bit tardy in releasing memory.
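The arrangement described in this comment (a thread pool in which each thread reuses its own extractor, with the pool thrown away after every indexing run) can be sketched roughly as below. This is an illustration under assumptions, not the reporter's code: Extractor is a hypothetical stand-in for a Tika instance, and PerThreadExtractorDemo and indexRun are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadExtractorDemo {

    /** Hypothetical stand-in for an expensive extractor such as a Tika instance. */
    static final class Extractor {
        static final AtomicInteger CREATED = new AtomicInteger();

        Extractor() {
            CREATED.incrementAndGet();
        }

        String extract(String document) {
            return document.trim(); // placeholder for real text extraction
        }
    }

    /** Runs one indexing pass; the pool and its per-thread extractors end with the run. */
    static List<String> indexRun(List<String> documents, int threads) throws Exception {
        // One extractor per worker thread, created lazily and reused for every document.
        ThreadLocal<Extractor> perThread = ThreadLocal.withInitial(Extractor::new);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String document : documents) {
                futures.add(pool.submit(() -> perThread.get().extract(document)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            // Discarding the pool lets the worker threads and their extractors be collected.
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

Creating the ThreadLocal inside indexRun ties the extractors' lifetime to a single run, which matches the observation that memory is reclaimed once the pool is destroyed.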
[jira] [Updated] (TIKA-1469) Upgrade to POI 3.11-beta3 when available
[ https://issues.apache.org/jira/browse/TIKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1469:
------------------------------
    Attachment: Upgrade_to_poi-3_11-beta3v1.patch

This doesn't fix the bundle issues, but this should be a good start. [~gagravarr], do we need to add a dependency on ooxml-security?

Any changes I'd make to bundle would be dangerous given my lack of OSGi knowledge, but if the solution is to make optional anything that it can't find, then these work:
{noformat}
org.apache.jcp.xml.dsig.internal.dom;resolution:=optional,
org.apache.xml.security;resolution:=optional,
org.apache.xml.security.c14n;resolution:=optional,
org.apache.xml.security.utils;resolution:=optional,
org.bouncycastle.cert;resolution:=optional,
org.bouncycastle.cert.jcajce;resolution:=optional,
org.bouncycastle.cert.ocsp;resolution:=optional,
org.bouncycastle.cms.bc;resolution:=optional,
org.bouncycastle.operator;resolution:=optional,
org.bouncycastle.operator.bc;resolution:=optional,
org.bouncycastle.tsp;resolution:=optional,
{noformat}

> Upgrade to POI 3.11-beta3 when available
> ----------------------------------------
>
> Key: TIKA-1469
> URL: https://issues.apache.org/jira/browse/TIKA-1469
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Attachments: Upgrade_to_poi-3_11-beta3v1.patch
[jira] [Comment Edited] (TIKA-1446) CHM parser : wrong decompression of aligned blocks
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208079#comment-14208079 ]

Hong-Thai Nguyen edited comment on TIKA-1446 at 11/12/14 2:38 PM:
------------------------------------------------------------------

Hi [~binhawking],
I've merged your contribution and compared titles before and after on a local corpus of CHM files. Before the merge, only one file failed; after the merge, 10 files fail.
I've pushed the failing CHM files under _test-documents/chm_, along with a test case that checks them, to: https://github.com/thaichat04/tika
I also did some clean-up.

Any chance you could have another look?

was (Author: thaichat04):
Hi [~binhawking],
I've merge your pull request and make title comparison before/after on a local corpus of CHM files. Before merge, we have only one failed file, after merge we have 10 failed files.
I've pushed failed CHM files under _test-documents/chm_ & a checking test case into: https://github.com/thaichat04/tika
I made also some clean-up.

Any chance you have a look again ?

> CHM parser : wrong decompression of aligned blocks
> --------------------------------------------------
>
> Key: TIKA-1446
> URL: https://issues.apache.org/jira/browse/TIKA-1446
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.7
> Reporter: Bin Hawking
> Priority: Critical
> Attachments: chm.zip
>
> If an embedded file contains aligned blocks, the parser outputs garbled text
> or empty text for this file.
> I have fixed it myself by correcting decompressAlignedBlock() and its
> preparation methods. The bug is mostly due to misuse of the main tree, aligned
> tree and length tree. In addition, some of the tree construction is wrong.
[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208079#comment-14208079 ]

Hong-Thai Nguyen commented on TIKA-1446:
----------------------------------------

Hi [~binhawking],
I've merged your pull request and compared titles before and after on a local corpus of CHM files. Before the merge, only one file failed; after the merge, 10 files fail.
I've pushed the failing CHM files under _test-documents/chm_, along with a test case that checks them, to: https://github.com/thaichat04/tika
I also did some clean-up.

Any chance you could have another look?
[jira] [Comment Edited] (TIKA-1471) OOM with corrupt PDF file
[ https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208008#comment-14208008 ]

Tim Allison edited comment on TIKA-1471 at 11/12/14 1:00 PM:
-------------------------------------------------------------

From the discussion on PDFBOX-2493, this looks to be solved by PDFBox 1.8.7, which we're now using in trunk.

Thank you, [~alanbur], for reporting this issue on both Tika and PDFBox. We need to fix these serious errors as they are discovered.

At this point, code that uses Tika needs to be able to handle regular exceptions, OOM errors and permanent hangs. These catastrophic errors are rare, but they do happen. Use of the ForkParser and tika server can help avoid some of these issues, and on TIKA-1330 we're working to develop a robust wrapper around Tika that can handle these types of problems so that every integrator doesn't have to reinvent the wheel.

was (Author: talli...@mitre.org):
From the discussion on PDFBOX-2493, this looks to be solved by PDFBox 1.8.8. I'll leave this open until we upgrade.

Thank you, [~alanbur], for reporting this issue on both Tika and PDFBox. We need to fix these serious errors as they are discovered.

At this point, code that uses Tika needs to be able to handle regular exceptions, OOM errors and permanent hangs...these catastrophic errors will happen...rarely...but they do happen. Use of the ForkParser and tika server can help avoid some of these issues, and on TIKA-1330, we're working to develop a robust wrapper around Tika that can handle these types of problems so that every integrator doesn't have to reinvent the wheel.
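The requirement stated in this comment (callers must survive ordinary exceptions, OOM errors and hangs) can be illustrated with a small in-JVM guard. This is only a sketch under stated assumptions: GuardedParse, Outcome and run are hypothetical names, the parse task is an arbitrary Callable rather than a real Tika call, and an in-process timeout cannot force a stuck thread to stop or guarantee recovery after a real OutOfMemoryError, which is why the thread recommends a separate JVM.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class GuardedParse {

    enum Outcome { OK, FAILED, TIMED_OUT }

    /** Runs a parse task on a daemon worker thread and classifies how it ended. */
    static Outcome run(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "guarded-parse");
            thread.setDaemon(true); // a hung parse must not keep the JVM alive
            return thread;
        });
        Future<String> future = worker.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts; a truly stuck parser may ignore it
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The cause may be an Exception or an Error such as OutOfMemoryError.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would wrap each document's parse in such a guard and log or quarantine the documents that end in FAILED or TIMED_OUT; running the parser in a child process gives stronger isolation than this.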
[jira] [Updated] (TIKA-1471) OOM with corrupt PDF file
[ https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1471:
------------------------------
    Fix Version/s: 1.7
[jira] [Commented] (TIKA-1471) OOM with corrupt PDF file
[ https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208008#comment-14208008 ]

Tim Allison commented on TIKA-1471:
-----------------------------------

From the discussion on PDFBOX-2493, this looks to be solved by PDFBox 1.8.8. I'll leave this open until we upgrade.

Thank you, [~alanbur], for reporting this issue on both Tika and PDFBox. We need to fix these serious errors as they are discovered.

At this point, code that uses Tika needs to be able to handle regular exceptions, OOM errors and permanent hangs. These catastrophic errors are rare, but they do happen. Use of the ForkParser and tika server can help avoid some of these issues, and on TIKA-1330 we're working to develop a robust wrapper around Tika that can handle these types of problems so that every integrator doesn't have to reinvent the wheel.
[jira] [Commented] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207974#comment-14207974 ]

Hudson commented on TIKA-1472:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.6 #288 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/288/])
Fix for TIKA-1472 Warning on Tika Server startup - Failed to load class org.slf4j.impl.StaticLoggerBinder contributed by Konstantin Gribov this closes #22. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1638761)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-server/pom.xml

> Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"
> -----------------------------------------------------------------------------------------
>
> Key: TIKA-1472
> URL: https://issues.apache.org/jira/browse/TIKA-1472
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.6
> Environment: Windows 8, JDK 1.8, Maven 3.2.3
> Reporter: Darya Arbuzova
> Assignee: Chris A. Mattmann
> Priority: Minor
> Fix For: 1.7
> Attachments: 0001-Added-slf4j-jcl-impl-to-tika-server-deps.patch
>
> Hello!
> I want to use Apache Tika in server mode.
> I downloaded {{tika-server-1.6.jar}} from
> http://mirror.vorboss.net/apache/tika/
> When I try to start the server, I get
> {{SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".}}
> So I went to the link you direct me to
> (http://www.slf4j.org/codes.html#StaticLoggerBinder) and downloaded other slf4j
> {{jar}} files, but what next? I can't put them on the "class path", since I
> don't have a project. I can't change dependencies in {{pom.xml}} for the same
> reason. What should I do?
> I tried downloading the whole source code, but couldn't build it using Maven,
> and still haven't figured out why. For the previous discussion, see:
> https://issues.apache.org/jira/browse/TIKA-1470
> Thank you!
> Best regards,
> Darya Arbuzova
[jira] [Commented] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207956#comment-14207956 ]

Hudson commented on TIKA-1472:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #308 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/308/])
Fix for TIKA-1472 Warning on Tika Server startup - Failed to load class org.slf4j.impl.StaticLoggerBinder contributed by Konstantin Gribov this closes #22. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1638761)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-server/pom.xml
[jira] [Resolved] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-1472.
-------------------------------------
    Resolution: Fixed
    Fix Version/s: 1.7

- merged pull request #22 into master in r1638761. Thanks to Konstantin Gribov for the patch!
[jira] [Commented] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207940#comment-14207940 ]

ASF GitHub Bot commented on TIKA-1472:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/tika/pull/22
[GitHub] tika pull request: Added slf4j-jcl impl to tika-server deps.
Github user asfgit closed the pull request at:

    https://github.com/apache/tika/pull/22
[jira] [Assigned] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann reassigned TIKA-1472:
---------------------------------------
    Assignee: Chris A. Mattmann
[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207922#comment-14207922 ]

Chris A. Mattmann commented on TIKA-1446:
-----------------------------------------

Hi guys, what is the status on this? Is this ready to be merged?