[jira] [Created] (TIKA-1709) Tika Server doesn't handle multi-part attachments or form-encoded inputs
Chris A. Mattmann created TIKA-1709:
---
Summary: Tika Server doesn't handle multi-part attachments or form-encoded inputs
Key: TIKA-1709
URL: https://issues.apache.org/jira/browse/TIKA-1709
Project: Tika
Issue Type: Bug
Components: server
Environment: http://github.com/chrismattmann/tika-python/ Windows 7 Ultimate
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 1.11

Downstream in the Tika Python library, I noticed that Tika Server doesn't handle multi-part attachments (e.g., in /rmeta) on Windows 7 Ultimate, such as those encoded using curl -T, for example. Tika Server returns a 415, indicating that it can't properly determine what the MIME type is.

For more info, see:
https://github.com/kennethreitz/requests/issues/2725
https://github.com/chrismattmann/tika-python/issues/58

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Beeker updated TIKA-1707:
-
Attachment: common_sl.diff

OK, I've changed the import settings and executed organize imports. I couldn't validate the tests again, as the new grobid dependency somehow broke my build, but apart from that, the rest is the same.

> Upgrade to Apache POI 3.13 Beta 2
> -
>
> Key: TIKA-1707
> URL: https://issues.apache.org/jira/browse/TIKA-1707
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.9
> Reporter: Andreas Beeker
> Attachments: common_sl.diff
>
> In the not too distant future, POI 3.13 Beta 2 will be available.
> This contains quite a big change to the PowerPoint modules XSLF/HSLF, but
> thankfully TIKA isn't much affected.
> Please try the patch on our trunk and post side-effects.
> As the work on the common_sl api hasn't been finished yet, there might be
> another patch for the next POI beta version.
[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Beeker updated TIKA-1707:
-
Attachment: (was: common_sl.diff)
[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697961#comment-14697961 ]

Uwe Schindler commented on TIKA-1706:
-
If you bring in commons-io, you should also add the corresponding forbidden-apis signatures to the POM. commons-io makes it easy to choose the wrong IOUtils/FileUtils method, and then you are dependent on the default charset again.

https://github.com/policeman-tools/forbidden-apis/wiki/BundledSignatures

> Bring back commons-io to tika-core
> --
>
> Key: TIKA-1706
> URL: https://issues.apache.org/jira/browse/TIKA-1706
> Project: Tika
> Issue Type: Improvement
> Components: core
> Reporter: Yaniv Kunda
> Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1706.patch
>
> TIKA-249 inlined select commons-io classes in order to simplify the
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset
> objects instead of encoding names, being able to use StringBuilder instead of
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes
> with commons-io classes if this is accepted.
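The default-charset concern raised in the comment above can be illustrated with a minimal, JDK-only sketch. It does not use commons-io or forbidden-apis; the class and method names are hypothetical and chosen only for illustration. The same pattern applies to commons-io overloads such as IOUtils.toString(InputStream) versus IOUtils.toString(InputStream, Charset): the variant without a charset argument falls back to the platform default, so its result can differ between machines.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Tika or commons-io.
public class CharsetPitfallSketch {

    /** Decodes with an explicit charset, so the result is the same on every JVM. */
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes with the given charset; used here to simulate a differing platform default. */
    static String decodeWith(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; 5 bytes in UTF-8.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Explicit charset: round-trips correctly everywhere.
        System.out.println(decodeExplicit(utf8));

        // Simulated "wrong default" (ISO-8859-1): the two-byte sequence becomes two characters.
        System.out.println(decodeWith(utf8, StandardCharsets.ISO_8859_1));

        // Actual platform default: output depends on the machine running this.
        System.out.println(decodeWith(utf8, Charset.defaultCharset()));
    }
}
```

A build-time check such as forbidden-apis is aimed at catching the third kind of call, where no charset is stated and the behaviour silently depends on the environment.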
[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697852#comment-14697852 ]

Nick Burch commented on TIKA-1706:
--
Since tika-parsers already depends on Commons IO, would you be able to split your patch into two? We can probably apply the tika-parsers changes / tidy-ups straight away, the first few at least look perfectly sensible to me.

However, having the tika-core related changes independently will help with the review there, as that's the one I think will likely need more oversight and thinking. Especially from [~jukkaz], who made the original inlining changes, and might be best placed to comment on the updated plan now we're a few years later on
[jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1706:
--
Attachment: TIKA-1706.patch

A patch to bring back commons-io to tika-core and replace all formerly inlined classes.
[jira] [Commented] (TIKA-1708) Detectors loaded from configuration files into CompositeDetector fail
[ https://issues.apache.org/jira/browse/TIKA-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697389#comment-14697389 ]

Justin Palmer commented on TIKA-1708:
-
Attached test case fails...

Running org.apache.tika.detect.ConfiguredDetectorTest
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.434 sec <<< FAILURE! - in org.apache.tika.detect.ConfiguredDetectorTest
testLoadedCompositeConfig(org.apache.tika.detect.ConfiguredDetectorTest) Time elapsed: 0.029 sec <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.tika.detect.ConfiguredDetectorTest.parseDocument(ConfiguredDetectorTest.java:55)
    at org.apache.tika.detect.ConfiguredDetectorTest.testLoadedCompositeConfig(ConfiguredDetectorTest.java:77)

> Detectors loaded from configuration files into CompositeDetector fail
> -
>
> Key: TIKA-1708
> URL: https://issues.apache.org/jira/browse/TIKA-1708
> Project: Tika
> Issue Type: Bug
> Components: config, detector
> Affects Versions: 1.10
> Reporter: Justin Palmer
> Attachments: TIKA-1708.zip
>
> Loading individual Detectors from a configuration file, e.g.
>
>
> will cause them to be added to a CompositeDetector which does not detect,
> e.g. PST files.
[jira] [Updated] (TIKA-1708) Detectors loaded from configuration files into CompositeDetector fail
[ https://issues.apache.org/jira/browse/TIKA-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Palmer updated TIKA-1708:
Attachment: TIKA-1708.zip

Test case.
[jira] [Created] (TIKA-1708) Detectors loaded from configuration files into CompositeDetector fail
Justin Palmer created TIKA-1708:
---
Summary: Detectors loaded from configuration files into CompositeDetector fail
Key: TIKA-1708
URL: https://issues.apache.org/jira/browse/TIKA-1708
Project: Tika
Issue Type: Bug
Components: config, detector
Affects Versions: 1.10
Reporter: Justin Palmer

Loading individual Detectors from a configuration file, e.g. will cause them to be added to a CompositeDetector which does not detect, e.g. PST files.
tika-trunk-jdk1.7 - Build # 823 - Still Failing
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #823) Status: Still Failing Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/823/ to view the results.
tika-trunk-jdk1.7 - Build # 822 - Still Failing
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #822) Status: Still Failing Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/822/ to view the results.