[jira] [Created] (TIKA-1709) Tika Server doesn't handle multi-part attachments or form-encoded inputs

2015-08-14 Thread Chris A. Mattmann (JIRA)
Chris A. Mattmann created TIKA-1709:
---

 Summary: Tika Server doesn't handle multi-part attachments or 
form-encoded inputs
 Key: TIKA-1709
 URL: https://issues.apache.org/jira/browse/TIKA-1709
 Project: Tika
  Issue Type: Bug
  Components: server
 Environment: http://github.com/chrismattmann/tika-python/ Windows 7 
Ultimate
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
 Fix For: 1.11


Downstream in the Tika Python library, I noticed that Tika Server doesn't 
handle multi-part attachments (e.g., in /rmeta) on Windows 7 Ultimate, such as 
those encoded using curl -T. Tika Server returns a 415, indicating that it 
can't properly determine the MIME type.

For more info, see:
https://github.com/kennethreitz/requests/issues/2725
https://github.com/chrismattmann/tika-python/issues/58
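
For context, here is a minimal Java sketch of how a multipart/form-data request body differs from a raw upload body. The class name, method name, boundary, field name, and file name are all made up for illustration. This is not Tika Server or tika-python code, and it does not claim to reproduce the exact request from the reports linked above.

```java
final class UploadBodySketch {
    /**
     * Builds a single-part multipart/form-data body around the given text content.
     * All parameter values are hypothetical; a real client picks its own boundary.
     */
    static String multipartBody(String boundary, String fieldName,
                                String fileName, String content) {
        String crlf = "\r\n";
        return "--" + boundary + crlf
            + "Content-Disposition: form-data; name=\"" + fieldName
            + "\"; filename=\"" + fileName + "\"" + crlf
            + "Content-Type: application/octet-stream" + crlf
            + crlf                          // blank line ends the part headers
            + content + crlf
            + "--" + boundary + "--" + crlf; // closing delimiter
    }

    public static void main(String[] args) {
        String rawBody = "hello"; // a raw upload sends only the file content
        String wrapped = multipartBody("XYZ", "file", "hello.txt", rawBody);
        System.out.println("raw body length:       " + rawBody.length());
        System.out.println("multipart body length: " + wrapped.length());
        System.out.println(wrapped);
    }
}
```

In the multipart case the body begins with the boundary line and part headers rather than with the file's own first bytes, so anything that inspects the whole body as if it were the document would not see the file's leading bytes first.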



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-14 Thread Andreas Beeker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Beeker updated TIKA-1707:
-
Attachment: common_sl.diff

OK ... I've changed the import settings and executed organize imports.
I couldn't validate the tests again, as the new grobid dependency somehow broke 
my build ... but apart from that, the rest is the same ...

> Upgrade to Apache POI 3.13 Beta 2
> -
>
> Key: TIKA-1707
> URL: https://issues.apache.org/jira/browse/TIKA-1707
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.9
>Reporter: Andreas Beeker
> Attachments: common_sl.diff
>
>
> In the near future, POI 3.13 Beta 2 will be available.
> This contains quite a big change to the PowerPoint modules XSLF/HSLF, but 
> thankfully Tika isn't much affected.
> Please try the patch on our trunk and report any side effects.
> As the work on the common_sl API hasn't been finished yet, there might be 
> another patch for the next POI beta version.





[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-14 Thread Andreas Beeker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Beeker updated TIKA-1707:
-
Attachment: (was: common_sl.diff)

> Upgrade to Apache POI 3.13 Beta 2
> -
>
> Key: TIKA-1707
> URL: https://issues.apache.org/jira/browse/TIKA-1707
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.9
>Reporter: Andreas Beeker
>
> In the near future, POI 3.13 Beta 2 will be available.
> This contains quite a big change to the PowerPoint modules XSLF/HSLF, but 
> thankfully Tika isn't much affected.
> Please try the patch on our trunk and report any side effects.
> As the work on the common_sl API hasn't been finished yet, there might be 
> another patch for the next POI beta version.





[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core

2015-08-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697961#comment-14697961
 ] 

Uwe Schindler commented on TIKA-1706:
-

If you bring in commons-io, you should also add the corresponding 
forbidden-apis signatures to the POM. commons-io makes it easy to choose the 
wrong IOUtils/FileUtils method, and then you are dependent on the default 
charset again...

https://github.com/policeman-tools/forbidden-apis/wiki/BundledSignatures
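
To illustrate the default-charset concern, here is a minimal sketch using only JDK calls. The class and method names are made up for illustration, and it does not use commons-io or forbidden-apis itself. It shows how the same bytes decode differently depending on the charset, which is why calls that silently use the platform default are risky.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {
    /** Decodes with an explicitly chosen charset, so the result is the same on every platform. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Decodes with the JVM default charset, so the result depends on where the code runs. */
    static String decodeWithPlatformDefault(byte[] bytes) {
        return new String(bytes); // no charset argument: falls back to the default
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8:      " + decode(utf8, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + decode(utf8, StandardCharsets.ISO_8859_1));
        System.out.println("default (" + Charset.defaultCharset() + "): "
            + decodeWithPlatformDefault(utf8));
    }
}
```

The last line of output varies with the JVM's default charset, while the first two do not.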

> Bring back commons-io to tika-core
> --
>
> Key: TIKA-1706
> URL: https://issues.apache.org/jira/browse/TIKA-1706
> Project: Tika
>  Issue Type: Improvement
>  Components: core
>Reporter: Yaniv Kunda
>Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1706.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the 
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is 
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which 
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt 
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset 
> objects instead of encoding names, being able to use StringBuilder instead of 
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes 
> with commons-io classes if this is accepted.





[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core

2015-08-14 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697852#comment-14697852
 ] 

Nick Burch commented on TIKA-1706:
--

Since tika-parsers already depends on Commons IO, would you be able to split 
your patch into two?

We can probably apply the tika-parsers changes / tidy-ups straight away, the 
first few at least look perfectly sensible to me. 

However, having the tika-core related changes separately will help with the 
review there, as that's the part I think will likely need more oversight and 
thought. Especially from [~jukkaz], who made the original inlining changes 
and might be best placed to comment on the updated plan now that we're a few 
years on.

> Bring back commons-io to tika-core
> --
>
> Key: TIKA-1706
> URL: https://issues.apache.org/jira/browse/TIKA-1706
> Project: Tika
>  Issue Type: Improvement
>  Components: core
>Reporter: Yaniv Kunda
>Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1706.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the 
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is 
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which 
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt 
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset 
> objects instead of encoding names, being able to use StringBuilder instead of 
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes 
> with commons-io classes if this is accepted.





[jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core

2015-08-14 Thread Yaniv Kunda (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaniv Kunda updated TIKA-1706:
--
Attachment: TIKA-1706.patch

A patch to bring back commons-io to tika-core and replace all formerly inlined 
classes.

> Bring back commons-io to tika-core
> --
>
> Key: TIKA-1706
> URL: https://issues.apache.org/jira/browse/TIKA-1706
> Project: Tika
>  Issue Type: Improvement
>  Components: core
>Reporter: Yaniv Kunda
>Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1706.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the 
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is 
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which 
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt 
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset 
> objects instead of encoding names, being able to use StringBuilder instead of 
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes 
> with commons-io classes if this is accepted.





[jira] [Commented] (TIKA-1708) Detectors loaded from configuration files into CompositeDetector fail

2015-08-14 Thread Justin Palmer (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697389#comment-14697389
 ] 

Justin Palmer commented on TIKA-1708:
-

Attached test case fails...

Running org.apache.tika.detect.ConfiguredDetectorTest
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.434 sec <<< FAILURE! - in org.apache.tika.detect.ConfiguredDetectorTest
testLoadedCompositeConfig(org.apache.tika.detect.ConfiguredDetectorTest)  Time elapsed: 0.029 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.tika.detect.ConfiguredDetectorTest.parseDocument(ConfiguredDetectorTest.java:55)
    at org.apache.tika.detect.ConfiguredDetectorTest.testLoadedCompositeConfig(ConfiguredDetectorTest.java:77)



> Detectors loaded from configuration files into CompositeDetector fail
> -
>
> Key: TIKA-1708
> URL: https://issues.apache.org/jira/browse/TIKA-1708
> Project: Tika
>  Issue Type: Bug
>  Components: config, detector
>Affects Versions: 1.10
>Reporter: Justin Palmer
> Attachments: TIKA-1708.zip
>
>
> Loading individual Detectors from a configuration file, e.g.
>   
>   
> will cause them to be added to a CompositeDetector which does not detect, 
> e.g. PST files.





[jira] [Updated] (TIKA-1708) Detectors loaded from configuration files into CompositeDetector fail

2015-08-14 Thread Justin Palmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Palmer updated TIKA-1708:

Attachment: TIKA-1708.zip

Test case.

> Detectors loaded from configuration files into CompositeDetector fail
> -
>
> Key: TIKA-1708
> URL: https://issues.apache.org/jira/browse/TIKA-1708
> Project: Tika
>  Issue Type: Bug
>  Components: config, detector
>Affects Versions: 1.10
>Reporter: Justin Palmer
> Attachments: TIKA-1708.zip
>
>
> Loading individual Detectors from a configuration file, e.g.
>   
>   
> will cause them to be added to a CompositeDetector which does not detect, 
> e.g. PST files.





[jira] [Created] (TIKA-1708) Detectors loaded from configuration files into CompositeDetector fail

2015-08-14 Thread Justin Palmer (JIRA)
Justin Palmer created TIKA-1708:
---

 Summary: Detectors loaded from configuration files into 
CompositeDetector fail
 Key: TIKA-1708
 URL: https://issues.apache.org/jira/browse/TIKA-1708
 Project: Tika
  Issue Type: Bug
  Components: config, detector
Affects Versions: 1.10
Reporter: Justin Palmer


Loading individual Detectors from a configuration file, e.g.

  
  

will cause them to be added to a CompositeDetector which does not detect, e.g. 
PST files.





tika-trunk-jdk1.7 - Build # 823 - Still Failing

2015-08-14 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #823)

Status: Still Failing

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/823/ to 
view the results.

tika-trunk-jdk1.7 - Build # 822 - Still Failing

2015-08-14 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #822)

Status: Still Failing

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/822/ to 
view the results.