[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383397#comment-17383397 ] Yaniv Kunda commented on TIKA-1706: --- What a blast from the past... Thanks! > Bring back commons-io to tika-core > -- > > Key: TIKA-1706 > URL: https://issues.apache.org/jira/browse/TIKA-1706 > Project: Tika > Issue Type: Improvement > Components: core > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 2.0.0 > > Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch > > > TIKA-249 inlined select commons-io classes in order to simplify the > dependency tree and save some space. > I believe these arguments are weaker nowadays due to the following concerns: > - Most of the non-core modules already use commons-io, and since tika-core is > usually not used by itself, commons-io is already included with it > - Since some modules use both tika-core and commons-io, it's not clear which > code should be used > - Having the inlined classes causes more maintenance and/or technology debt > (which in turn causes more maintenance) > - Newer commons-io code utilizes newer platform code, e.g. using Charset > objects instead of encoding names, being able to use StringBuilder instead of > StringBuffer, and so on. > I'll be happy to provide a patch to replace usages of the inlined classes > with commons-io classes if this is accepted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
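The last point in the description (Charset objects instead of encoding names, StringBuilder instead of StringBuffer) can be illustrated with plain JDK calls. This is only a minimal sketch: the class and method names are made up for illustration and are not taken from tika-core or commons-io.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: these names are hypothetical, not Tika or commons-io API.
final class CharsetVsEncodingName {

    // Older style: the encoding is a String, so a wrong name only surfaces
    // at runtime and every caller has to handle a checked exception.
    static String decodeByName(byte[] bytes, String encodingName)
            throws UnsupportedEncodingException {
        return new String(bytes, encodingName);
    }

    // Newer style: a Charset object always refers to an existing charset,
    // so no checked exception is declared.
    static String decodeByCharset(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // StringBuilder is the unsynchronized counterpart of StringBuffer.
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeByName(bytes, "UTF-8"));
        System.out.println(decodeByCharset(bytes, StandardCharsets.UTF_8));
        System.out.println(join("commons", "-", "io"));
    }
}
```

With the String-based variant an unknown encoding name fails only when the call is executed, whereas the Charset-based variant moves that failure to the place where the Charset is obtained.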
RE: [jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core
It’s been almost two months since I provided my patches for this – Can a committer please review and submit? *From:* Yaniv Kunda [mailto:yaniv.ku...@answers.com] *Sent:* Monday, October 12, 2015 23:08 *To:* dev@tika.apache.org *Subject:* Re: [jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core Is this solution applicable? I have some improvements waiting for this. On Oct 1, 2015 5:57 PM, "Yaniv Kunda (JIRA)" <j...@apache.org> wrote: [ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1706: -- Attachment: TIKA-1706-2.patch TIKA-1706-1.patch A proposed patch per [~grossws]'s suggestion from the dev mailing list - The first patch contains the following: - creation of the secondary jar using maven-shade-plugin: -- used the *uber* classifier using alternatives: shaded, nodep, all, etc. Which one is best? -- commons-io shaded under {{shaded.commons-io.$\{commons.io.version\}. org.apache.commons.io}} to avoid potential conflicts with other commons-io-shading dependencies e.g. 
as in org.ops4j.pax.url:pax-url-aether:2.3.0 -- automatic removal of unused classes using - deprecated all classes that were copied from commons-io and modified them to extend their new counterparts - deprecated all constructors - removed all identical or functionally identical methods - modified all remaining methods to call alternative existing jdk/commons-io methods, deprecated them and refered to the used alternatives _*Note: this was done only in IOUtils, where many methods that has the same signature as the ones in commons-io were modified along the way to use UTF-8 instead of the platform default._ - all things should remain backward-compatible, except one: org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a ClassCastException if the Object is not Serializable The second patch contains trivial import changes in tika-core from org.apache.tika.io to org.apache.commons.io > Bring back commons-io to tika-core > -- > > Key: TIKA-1706 > URL: https://issues.apache.org/jira/browse/TIKA-1706 > Project: Tika > Issue Type: Improvement > Components: core >Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch > > > TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space. > I believe these arguments are weaker nowadays due to the following concerns: > - Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it > - Since some modules use both tika-core and commons-io, it's not clear which code should be used > - Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance) > - Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on. 
> I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- This email communication (including any attachments) contains information from Answers Corporation or its affiliates that is confidential and may be privileged. The information contained herein is intended only for the use of the addressee(s) named above. If you are not the intended recipient (or the agent responsible to deliver it to the intended recipient), you are hereby notified that any dissemination, distribution, use, or copying of this communication is strictly prohibited. If you have received this email in error, please immediately reply to sender, delete the message and destroy all copies of it. If you have questions, please email le...@answers.com. If you wish to unsubscribe to commercial emails from Answers and its affiliates, please go to the Answers Subscription Center http://campaigns.answers.com/subscriptions to opt out. Thank you.
[jira] [Commented] (TIKA-1672) Integrate tika-java7 component
[ https://issues.apache.org/jira/browse/TIKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962596#comment-14962596 ] Yaniv Kunda commented on TIKA-1672: --- Here are some names I suggested: - tika-java7-spi - tika-java7-filetypedetector - tika-java7-detector-spi > Integrate tika-java7 component > -- > > Key: TIKA-1672 > URL: https://issues.apache.org/jira/browse/TIKA-1672 > Project: Tika > Issue Type: Improvement >Reporter: Tyler Palsulich > Fix For: 1.12 > > > Code requiring Java 7 doesn't need to be in a separate module now that > TIKA-1536 (upgrade to Java 7) is done. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: [jira] [Updated] (TIKA-1745) Add methods accepting java.nio.file.Path to org.apache.tika.Tika and org.apache.tika.parser.ParsingReader
This (and https://issues.apache.org/jira/browse/TIKA-1746 and https://issues.apache.org/jira/browse/TIKA-1751) are part of https://issues.apache.org/jira/browse/TIKA-1726 and already have relatively simple patches ready to be committed. I think they'd be better off committed together with their already-committed siblings, for putting all API additions in 1.11. (I'd also like to see https://issues.apache.org/jira/browse/TIKA-1706 in 1.11, which I have prepared patches for according to [~grossws]'s suggestion, but that's another story...) -Original Message- From: Chris A. Mattmann (JIRA) [mailto:j...@apache.org] Sent: Sunday, October 18, 2015 22:44 To: dev@tika.apache.org Subject: [jira] [Updated] (TIKA-1745) Add methods accepting java.nio.file.Path to org.apache.tika.Tika and org.apache.tika.parser.ParsingReader [ https://issues.apache.org/jira/browse/TIKA-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1745: Fix Version/s: (was: 1.11) 1.12 > Add methods accepting java.nio.file.Path to org.apache.tika.Tika and > org.apache.tika.parser.ParsingReader > - > > Key: TIKA-1745 > URL: https://issues.apache.org/jira/browse/TIKA-1745 > Project: Tika > Issue Type: Sub-task > Components: core > Reporter: Yaniv Kunda >Priority: Minor > Labels: java7 > Fix For: 1.12 > > Attachments: TIKA-1745.patch > > > Add methods accepting java.nio.file.Path to complement those accepting > java.io.File, using the new methods in TikaInputStream or > java.nio.file.Files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core
Is this solution applicable? I have some improvements waiting for this. On Oct 1, 2015 5:57 PM, "Yaniv Kunda (JIRA)" <j...@apache.org> wrote: > > [ > https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > Yaniv Kunda updated TIKA-1706: > -- > Attachment: TIKA-1706-2.patch > TIKA-1706-1.patch
[jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1706:
--
Attachment: TIKA-1706-2.patch
            TIKA-1706-1.patch

A proposed patch per [~grossws]'s suggestion from the dev mailing list -

The first patch contains the following:
- creation of the secondary jar using maven-shade-plugin:
-- used the *uber* classifier using alternatives: shaded, nodep, all, etc. Which one is best?
-- commons-io shaded under {{shaded.commons-io.$\{commons.io.version\}.org.apache.commons.io}} to avoid potential conflicts with other commons-io-shading dependencies, e.g. as in org.ops4j.pax.url:pax-url-aether:2.3.0
-- automatic removal of unused classes using
- deprecated all classes that were copied from commons-io and modified them to extend their new counterparts
- deprecated all constructors
- removed all identical or functionally identical methods
- modified all remaining methods to call alternative existing jdk/commons-io methods, deprecated them and referred to the used alternatives
_*Note: this was done only in IOUtils, where many methods that have the same signature as the ones in commons-io were modified along the way to use UTF-8 instead of the platform default._
- all things should remain backward-compatible, except one: org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a ClassCastException if the Object is not Serializable

The second patch contains trivial import changes in tika-core from org.apache.tika.io to org.apache.commons.io

> Bring back commons-io to tika-core
> --
>
> Key: TIKA-1706
> URL: https://issues.apache.org/jira/browse/TIKA-1706
> Project: Tika
> Issue Type: Improvement
> Components: core
> Reporter: Yaniv Kunda
> Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which code should be used
> - Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
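The deprecation approach described in the first patch (a deprecated class that extends its newer counterpart, with each remaining method delegating to the replacement) might look roughly like the sketch below. Both classes are hypothetical stand-ins: UpstreamIOUtils plays the role of the commons-io class and LegacyIOUtils the role of the inlined copy; neither is the actual code from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the upstream library class (NOT commons-io itself).
class UpstreamIOUtils {
    static String toString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}

/**
 * Hypothetical stand-in for the inlined copy kept for backward compatibility.
 *
 * @deprecated use {@link UpstreamIOUtils} instead
 */
@Deprecated
class LegacyIOUtils extends UpstreamIOUtils {

    /**
     * Kept so existing callers compile; decodes as UTF-8 rather than the
     * platform default, as noted for IOUtils in the patch description.
     *
     * @deprecated use {@link UpstreamIOUtils#toString(InputStream, Charset)}
     */
    @Deprecated
    static String toString(InputStream in) throws IOException {
        return UpstreamIOUtils.toString(in, StandardCharsets.UTF_8);
    }
}
```

Existing callers of the legacy class keep compiling (with a deprecation warning), while the javadoc points them to the replacement call.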
[jira] [Updated] (TIKA-1744) Use java.nio.file.Path in TikaInputStream
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1744: -- Attachment: TIKA-1744-2.patch Additional minor patch: - Corrected javadoc links - Added {{@Deprecated}} annotations to methods where {{@deprecated}} javadoc tags were added > Use java.nio.file.Path in TikaInputStream > - > > Key: TIKA-1744 > URL: https://issues.apache.org/jira/browse/TIKA-1744 > Project: Tika > Issue Type: Sub-task > Components: core > Reporter: Yaniv Kunda >Assignee: Tim Allison >Priority: Minor > Labels: java7 > Fix For: 1.11 > > Attachments: TIKA-1744-2.patch, TIKA-1744.patch > > > This will provide support for the new api for users who need it, and provide > better information in I/O operations, e.g. detailed exception if file cannot > be read. > - used Path and methods in java.nio.file.Files internally > - add getPath() method as the counterpart to getFile() > - modified test to use -- This message was sent by Atlassian JIRA (v6.3.4#6332)
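As a rough illustration of the "better information in I/O operations" point, the JDK-only sketch below opens the same missing file through java.io and through java.nio.file and reports which exception type each API produces. The class and method names are invented for this example and are not part of TikaInputStream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: not Tika code.
final class DetailedIoErrors {

    // Legacy API: most open failures are reported as FileNotFoundException.
    static String describeLegacyFailure(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return "opened";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    // NIO API: failures map to more specific FileSystemException subclasses,
    // e.g. NoSuchFileException for a missing file.
    static String describeNioFailure(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            return "opened";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws IOException {
        Path missing = Files.createTempDirectory("tika-demo").resolve("missing.txt");
        System.out.println(describeLegacyFailure(missing.toFile()));
        System.out.println(describeNioFailure(missing));
    }
}
```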
RE: [jira] [Resolved] (TIKA-1747) Change file->path in tika-batch throughout
Tim - I actually had a shelved changelist with improvements almost identical to what you did for FSBatchTestBase! I also shared the thought that the utility methods - countChildren, readFileToString, deleteDirectory, listPaths - should be elsewhere. Ideally in commons-io, but this will have to wait until it requires Java 7. How about in the meantime I concentrate them in tika-core in a new utility class such as org.apache.tika.io.FileUtils or org.apache.tika.io.Files? This will expose these methods to other Java7-transitioning code (of which I have plenty almost ready to be delivered), reducing redundant boilerplate code. In addition, I think some of these methods could be slightly improved along the way, and if they're going to a first-class utility class (no pun intended), I suggest the following names for clarity and consistency: countChildren -> countEntries (Files.walkFileTree and DirectoryStream refer to these as entries) listPaths -> listEntries (ditto, or use listChildren and leave countChildren as is) deleteDirectory -> deleteRecursively (just because it can be technically used to delete a non-directory file, which is actually convenient) readFileToString -> toString (as in Guava's Files.toString(File, Charset)) -Original Message- From: Tim Allison (JIRA) [mailto:j...@apache.org] Sent: Wednesday, September 30, 2015 19:01 To: dev@tika.apache.org Subject: [jira] [Resolved] (TIKA-1747) Change file->path in tika-batch throughout [ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1747. --- Resolution: Fixed r1706060 > Change file->path in tika-batch throughout > -- > > Key: TIKA-1747 > URL: https://issues.apache.org/jira/browse/TIKA-1747 > Project: Tika > Issue Type: Sub-task > Components: batch >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 1.11 > > > Add Path equivalents for File and deprecate File usage in tika-batch. 
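For discussion, the renamed utility methods proposed above could be sketched with java.nio.file alone, as below. This is an assumption-laden sketch: the class name PathUtilsSketch is hypothetical, and the bodies are not the ones in FSBatchTestBase.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical utility class; names follow the suggestions in the message above.
final class PathUtilsSketch {

    // Counts the direct entries of a directory (non-recursive).
    static int countEntries(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                count++;
            }
        }
        return count;
    }

    // Deletes a file, or a directory together with everything below it.
    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // directory is empty at this point
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Reads a whole file into a String using the given charset.
    static String toString(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }
}
```

Because Files.walkFileTree also visits a plain file passed as the root, deleteRecursively works for non-directories too, which is the convenience the proposed name is meant to reflect.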
[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938914#comment-14938914 ] Yaniv Kunda commented on TIKA-1757: --- Also, regarding the badness of {{URL#getFile()}} - on Windows machines it returns a String starting with a slash - e.g. {{/C:\File.txt}}. For some reason, when this is passed to a {{File}} constructor it is handled in a lenient manner and the preceding slash disappears - unlike {{Paths.get(String)}}, which fails with an {{InvalidPathException}}. > tika-batch tests fail on systems with whitespace or special chars in folder > name > > > Key: TIKA-1757 > URL: https://issues.apache.org/jira/browse/TIKA-1757 > Project: Tika > Issue Type: Bug >Reporter: Uwe Schindler >Assignee: Tim Allison > Attachments: TIKA-1757.patch > > > This is one problem that forbiddenapis does not catch, because the method > affected has valid use cases: {{URL#getFile()}} or {{URL#getPath()}} both > return the URL path, which should never be treated as a file system path (for > file: URLs). This breaks as soon as the path contains special characters > which may not be part of a URL. getFile() and getPath() return the encoded path. > The correct way to transform a file URL to a file is: {{new > File(url.toURI())}}. See also the list of "bad stuff" as listed by the Maven > community for Mojos/Plugins. > In fact the affected test should not use a file at all. Instead it should use > {{Class#getResourceAsStream()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938908#comment-14938908 ] Yaniv Kunda commented on TIKA-1757: --- If one needs a java.nio.file.Path, {{Paths.get(url.toURI())}} can be used instead. > tika-batch tests fail on systems with whitespace or special chars in folder > name > > > Key: TIKA-1757 > URL: https://issues.apache.org/jira/browse/TIKA-1757 > Project: Tika > Issue Type: Bug >Reporter: Uwe Schindler >Assignee: Tim Allison > Attachments: TIKA-1757.patch > > > This is one problem that forbiddenapis does not catch, because the method > affected has valid use cases: {{URL#getFile()}} or {{URL#getPath()}} both > return the URL path, which should never be treated as a file system path (for > file: URLs). This breaks as soon as the path contains special characters > which may not be part of a URL. getFile() and getPath() return the encoded path. > The correct way to transform a file URL to a file is: {{new > File(url.toURI())}}. See also the list of "bad stuff" as listed by the Maven > community for Mojos/Plugins. > In fact the affected test should not use a file at all. Instead it should use > {{Class#getResourceAsStream()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
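The two conversions mentioned in this issue, {{new File(url.toURI())}} and {{Paths.get(url.toURI())}}, can be tried out with a path containing a space. The sketch below uses only JDK classes; the class name and the file name "tika demo.txt" are made up for the example.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: shows why a file: URL should be converted through its URI.
final class UrlToFileDemo {

    // Decodes percent-escapes (e.g. %20) while converting to a File.
    static File toFile(URL url) throws URISyntaxException {
        return new File(url.toURI());
    }

    // Same conversion for code that works with java.nio.file.Path.
    static Path toPath(URL url) throws URISyntaxException {
        return Paths.get(url.toURI());
    }

    public static void main(String[] args) throws Exception {
        File original = new File(System.getProperty("java.io.tmpdir"), "tika demo.txt");
        URL url = original.toURI().toURL();

        // getFile() returns the still-encoded URL path (the space stays %20).
        System.out.println(url.getFile());
        // Going through the URI yields a usable file system path again.
        System.out.println(toFile(url));
        System.out.println(toPath(url));
    }
}
```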
[jira] [Updated] (TIKA-1751) Use java.nio.file.Path in TikaConfig
[ https://issues.apache.org/jira/browse/TIKA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1751: -- Attachment: TIKA-1751.patch Updated patch to latest changes. > Use java.nio.file.Path in TikaConfig > > > Key: TIKA-1751 > URL: https://issues.apache.org/jira/browse/TIKA-1751 > Project: Tika > Issue Type: Sub-task > Components: config > Reporter: Yaniv Kunda >Priority: Minor > Labels: java7 > Fix For: 1.11 > > Attachments: TIKA-1751.patch > > > Provide constructors accepting java.nio.file.Path -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1751) Use java.nio.file.Path in TikaConfig
[ https://issues.apache.org/jira/browse/TIKA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1751: -- Attachment: (was: TIKA-1751.patch) > Use java.nio.file.Path in TikaConfig > > > Key: TIKA-1751 > URL: https://issues.apache.org/jira/browse/TIKA-1751 > Project: Tika > Issue Type: Sub-task > Components: config > Reporter: Yaniv Kunda >Priority: Minor > Labels: java7 > Fix For: 1.11 > > Attachments: TIKA-1751.patch > > > Provide constructors accepting java.nio.file.Path -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938972#comment-14938972 ]

Yaniv Kunda commented on TIKA-1758:
---
Not a hard requirement - can be avoided by converting a Path back to a File (or to a String).

> BatchCommandLineBuilder fails on systems with whitespace in path
>
>
> Key: TIKA-1758
> URL: https://issues.apache.org/jira/browse/TIKA-1758
> Project: Tika
> Issue Type: Bug
> Components: cli
> Reporter: Uwe Schindler
> Attachments: TIKA-1758.patch
>
>
> All tests for CLI module fail with errors like that:
> {noformat}
> Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.048 sec <<< FAILURE! - in org.apache.tika.cli.TikaCLIBatchCommandLineTest
> testTwoDirsNoFlags(org.apache.tika.cli.TikaCLIBatchCommandLineTest) Time elapsed: 0.026 sec <<< ERROR!
> java.nio.file.InvalidPathException: Illegal char <"> at index 0: "C:\Users\Uwe Schindler\Projects\TIKA\svn\tika-app\testInput"
>     at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
>     at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
>     at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
>     at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
>     at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
>     at java.nio.file.Paths.get(Paths.java:84)
>     at org.apache.tika.cli.BatchCommandLineBuilder.translateCommandLine(BatchCommandLineBuilder.java:137)
>     at org.apache.tika.cli.BatchCommandLineBuilder.build(BatchCommandLineBuilder.java:51)
>     at org.apache.tika.cli.TikaCLIBatchCommandLineTest.testTwoDirsNoFlags(TikaCLIBatchCommandLineTest.java:127)
> {noformat}
> The reason is that BatchCommandLineBuilder adds quotes for unknown reasons!?
> If you use ProcessBuilder you don't need that! Not sure what this should do,
> but the problem is: The first argument (the executable) contains quotes after
> the method transformed it and breaks the test.
> I have no idea how to fix this, but the quotes should not be in a String[]
> command line at all.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1758:
--
Attachment: TIKA-1758.patch

A patch containing a fix (and more File->Path migration), requires TIKA-1751.

> BatchCommandLineBuilder fails on systems with whitespace in path
>
>
> Key: TIKA-1758
> URL: https://issues.apache.org/jira/browse/TIKA-1758
> Project: Tika
> Issue Type: Bug
> Components: cli
> Reporter: Uwe Schindler
> Attachments: TIKA-1758.patch
>
>
> All tests for CLI module fail with errors like that:
> {noformat}
> Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.048 sec <<< FAILURE! - in org.apache.tika.cli.TikaCLIBatchCommandLineTest
> testTwoDirsNoFlags(org.apache.tika.cli.TikaCLIBatchCommandLineTest) Time elapsed: 0.026 sec <<< ERROR!
> java.nio.file.InvalidPathException: Illegal char <"> at index 0: "C:\Users\Uwe Schindler\Projects\TIKA\svn\tika-app\testInput"
>     at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
>     at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
>     at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
>     at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
>     at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
>     at java.nio.file.Paths.get(Paths.java:84)
>     at org.apache.tika.cli.BatchCommandLineBuilder.translateCommandLine(BatchCommandLineBuilder.java:137)
>     at org.apache.tika.cli.BatchCommandLineBuilder.build(BatchCommandLineBuilder.java:51)
>     at org.apache.tika.cli.TikaCLIBatchCommandLineTest.testTwoDirsNoFlags(TikaCLIBatchCommandLineTest.java:127)
> {noformat}
> The reason is that BatchCommandLineBuilder adds quotes for unknown reasons!?
> If you use ProcessBuilder you don't need that! Not sure what this should do,
> but the problem is: The first argument (the executable) contains quotes after
> the method transformed it and breaks the test.
> I have no idea how to fix this, but the quotes should not be in a String[]
> command line at all.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
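To illustrate the point that a ProcessBuilder argument list needs no quoting, here is a minimal sketch that builds a command with a path containing whitespace. The helper buildCommand and the {{-inputDir}} flag are assumptions made for the example; this is not the code from BatchCommandLineBuilder or from the attached patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical helper, not BatchCommandLineBuilder.
final class CommandLineSketch {

    // Each argument is one list element, so whitespace inside a path needs
    // no quoting; added quotes would become part of the argument itself.
    static List<String> buildCommand(Path javaExecutable, Path inputDir) {
        List<String> command = new ArrayList<>();
        command.add(javaExecutable.toString());
        command.add("-inputDir"); // hypothetical flag name
        command.add(inputDir.toString());
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(Paths.get("java"), Paths.get("test Input"));
        // The builder is only constructed here, not started.
        ProcessBuilder builder = new ProcessBuilder(command);
        System.out.println(builder.command());
        // The unquoted argument still parses back into the same Path.
        System.out.println(Paths.get(command.get(2)).equals(Paths.get("test Input")));
    }
}
```

Keeping the arguments unquoted also means any of them can later be passed to Paths.get(String) without tripping over a leading quote character, which is the failure shown in the stack trace.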
RE: [jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available
9/29 is two days away - the latest available build is 20150924.

-Original Message-
From: gil cattaneo (JIRA) [mailto:j...@apache.org]
Sent: Saturday, September 26, 2015 15:47
To: dev@tika.apache.org
Subject: [jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909254#comment-14909254 ]

gil cattaneo commented on TIKA-1748:

hi i used poi-3.13-20150929 and consequently build fails

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project tika-parsers: Compilation failure: Compilation failure:
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[22,27] package org.apache.poi.hslf does not exist
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[25,33] cannot find symbol
[ERROR] symbol: class MasterSheet
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[26,33] cannot find symbol
[ERROR] symbol: class Notes
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[28,33] cannot find symbol
[ERROR] symbol: class Picture
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[29,33] cannot find symbol
[ERROR] symbol: class Shape
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[30,33] cannot find symbol
[ERROR] symbol: class Slide
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[31,33] cannot find symbol
[ERROR] symbol: class Table
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[32,33] cannot find symbol
[ERROR] symbol: class TableCell
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[33,33] cannot find symbol
[ERROR] symbol: class TextRun
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[34,33] cannot find symbol
[ERROR] symbol: class TextShape
[ERROR] location: package org.apache.poi.hslf.model
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[35,37] cannot find symbol
[ERROR] symbol: class ObjectData
[ERROR] location: package org.apache.poi.hslf.usermodel
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[36,37] cannot find symbol
[ERROR] symbol: class PictureData
[ERROR] location: package org.apache.poi.hslf.usermodel
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[37,37] cannot find symbol
[ERROR] symbol: class SlideShow
[ERROR] location: package org.apache.poi.hslf.usermodel
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[178,59] cannot find symbol
[ERROR] symbol: class MasterSheet
[ERROR] location: class org.apache.tika.parser.microsoft.HSLFExtractor
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[202,62] cannot find symbol
[ERROR] symbol: class Table
[ERROR] location: class org.apache.tika.parser.microsoft.HSLFExtractor
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[220,60] cannot find symbol
[ERROR] symbol: class TextRun
[ERROR] location: class org.apache.tika.parser.microsoft.HSLFExtractor
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[241,46] cannot find symbol
[ERROR] symbol: class SlideShow
[ERROR] location: class org.apache.tika.parser.microsoft.HSLFExtractor
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:[270,47] cannot find symbol
[ERROR] symbol: class Slide
[ERROR] location: class org.apache.tika.parser.microsoft.HSLFExtractor
[ERROR] /BUILD/tika-1.10/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java:[69,49] incompatible types: java.util.List cannot be converted to org.apache.poi.xslf.usermodel.XSLFSlide[]
[ERROR]
[jira] [Updated] (TIKA-1750) CachedTranslator.isAvailable() throws NPE when underlying translator is null
[ https://issues.apache.org/jira/browse/TIKA-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1750: -- Attachment: TIKA-1750.patch > CachedTranslator.isAvailable() throws NPE when underlying translator is null > > > Key: TIKA-1750 > URL: https://issues.apache.org/jira/browse/TIKA-1750 > Project: Tika > Issue Type: Bug > Components: translation > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1750.patch > > > When initialized with no underlying translator, CachedTranslator throws an NPE > when isAvailable() is called. Although a user should initialize the translator > (as the default constructor's javadoc says), this doesn't always happen; > and since CachedTranslator is registered as a service in > tika-translate\src\main\resources\META-INF\services\org.apache.tika.language.translate.Translator, > it normally doesn't (causing DumpTikaConfigExampleTest to fail). > Since CachedTranslator returns the source text from > translate(String, String, String) when the translator is null, it makes sense > for isAvailable() to return false under the same condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TIKA-1750) CachedTranslator.isAvailable() throws NPE when underlying translator is null
Yaniv Kunda created TIKA-1750: - Summary: CachedTranslator.isAvailable() throws NPE when underlying translator is null Key: TIKA-1750 URL: https://issues.apache.org/jira/browse/TIKA-1750 Project: Tika Issue Type: Bug Components: translation Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 When initialized with no underlying translator, CachedTranslator throws an NPE when isAvailable() is called. Although a user should initialize the translator (as the default constructor's javadoc says), this doesn't always happen; and since CachedTranslator is registered as a service in tika-translate\src\main\resources\META-INF\services\org.apache.tika.language.translate.Translator, it normally doesn't (causing DumpTikaConfigExampleTest to fail). Since CachedTranslator returns the source text from translate(String, String, String) when the translator is null, it makes sense for isAvailable() to return false under the same condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
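The behavior proposed in TIKA-1750 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: SimpleTranslator and CachingTranslatorSketch are hypothetical stand-ins, not Tika's Translator or CachedTranslator.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a translator interface; not Tika's actual API.
interface SimpleTranslator {
    String translate(String text, String sourceLanguage, String targetLanguage);

    boolean isAvailable();
}

// Hypothetical caching wrapper that tolerates a missing underlying translator.
class CachingTranslatorSketch implements SimpleTranslator {
    private final SimpleTranslator delegate; // may be null
    private final Map<String, String> cache = new HashMap<String, String>();

    CachingTranslatorSketch(SimpleTranslator delegate) {
        this.delegate = delegate;
    }

    @Override
    public String translate(String text, String sourceLanguage, String targetLanguage) {
        if (delegate == null) {
            return text; // no translator configured: return the source text unchanged
        }
        String key = sourceLanguage + ':' + targetLanguage + ':' + text;
        String cached = cache.get(key);
        if (cached == null) {
            cached = delegate.translate(text, sourceLanguage, targetLanguage);
            cache.put(key, cached);
        }
        return cached;
    }

    @Override
    public boolean isAvailable() {
        // Guard against the null delegate instead of dereferencing it.
        return delegate != null && delegate.isAvailable();
    }
}
```

With a null delegate, isAvailable() and translate() then agree with each other: the wrapper reports itself unavailable and passes text through untouched.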
RE: [jira] [Created] (TIKA-1749) Upgrade, or shade, guava
Tika no longer uses Guava - it was removed in r1696860, see https://issues.apache.org/jira/browse/TIKA-1710?focusedCommentId=14705823=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14705823 We still have some references in tika-bundle's pom, but no dependencies in any component. -Original Message- From: Alexander Pogrenbyak (JIRA) [mailto:j...@apache.org] Sent: Wednesday, September 23, 2015 22:37 To: dev@tika.apache.org Subject: [jira] [Created] (TIKA-1749) Upgrade, or shade, guava Alexander Pogrenbyak created TIKA-1749: -- Summary: Upgrade, or shade, guava Key: TIKA-1749 URL: https://issues.apache.org/jira/browse/TIKA-1749 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.10 Reporter: Alexander Pogrenbyak I use managed dependencies and have guava managed to version 18.0. The tika-parsers project has guava version 11.0.2 I have a concern that managing up guava 18.0 may break something in Tika code. Besides the fact that 11.0.2 is deprecated a long time ago, if Tika has dependency on a particular version it should shade it for its use. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- This email communication (including any attachments) contains information from Answers Corporation or its affiliates that is confidential and may be privileged. The information contained herein is intended only for the use of the addressee(s) named above. If you are not the intended recipient (or the agent responsible to deliver it to the intended recipient), you are hereby notified that any dissemination, distribution, use, or copying of this communication is strictly prohibited. If you have received this email in error, please immediately reply to sender, delete the message and destroy all copies of it. If you have questions, please email le...@answers.com. 
If you wish to unsubscribe to commercial emails from Answers and its affiliates, please go to the Answers Subscription Center http://campaigns.answers.com/subscriptions to opt out. Thank you.
[jira] [Created] (TIKA-1751) Use java.nio.file.Path in TikaConfig
Yaniv Kunda created TIKA-1751: - Summary: Use java.nio.file.Path in TikaConfig Key: TIKA-1751 URL: https://issues.apache.org/jira/browse/TIKA-1751 Project: Tika Issue Type: Sub-task Components: config Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Provide constructors accepting java.nio.file.Path -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1751) Use java.nio.file.Path in TikaConfig
[ https://issues.apache.org/jira/browse/TIKA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1751: -- Attachment: TIKA-1751.patch > Use java.nio.file.Path in TikaConfig > > > Key: TIKA-1751 > URL: https://issues.apache.org/jira/browse/TIKA-1751 > Project: Tika > Issue Type: Sub-task > Components: config > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1751.patch > > > Provide constructors accepting java.nio.file.Path -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TIKA-1752) Use java.nio.file.Path in org.apache.tika.detect
Yaniv Kunda created TIKA-1752: - Summary: Use java.nio.file.Path in org.apache.tika.detect Key: TIKA-1752 URL: https://issues.apache.org/jira/browse/TIKA-1752 Project: Tika Issue Type: Sub-task Components: detector Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Add constructors and methods accepting java.nio.file.Path to TrainedModelDetector & Son. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1734) Use java.nio.file.Path in TemporaryResources
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1734: -- Labels: java7 (was: ) > Use java.nio.file.Path in TemporaryResources > > > Key: TIKA-1734 > URL: https://issues.apache.org/jira/browse/TIKA-1734 > Project: Tika > Issue Type: Sub-task > Components: core > Reporter: Yaniv Kunda >Priority: Minor > Labels: java7 > Fix For: 1.11 > > Attachments: TIKA-1734.patch > > > This will provide support for the new API for users who need it, and provide > better information in I/O operations, e.g. a detailed exception if temporary > file deletion fails. > - used Path and methods in java.nio.file.Files internally > - add setTemporaryFileDirectory(Path) method > - add createTempFile() method (mimicking Files.createTempFile) > - add unit test for proper deletion of temp files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1752) Use java.nio.file.Path in org.apache.tika.detect
[ https://issues.apache.org/jira/browse/TIKA-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1752: -- Labels: java7 (was: ) > Use java.nio.file.Path in org.apache.tika.detect > > > Key: TIKA-1752 > URL: https://issues.apache.org/jira/browse/TIKA-1752 > Project: Tika > Issue Type: Sub-task > Components: detector > Reporter: Yaniv Kunda >Priority: Minor > Labels: java7 > Fix For: 1.11 > > Attachments: TIKA-1752.patch > > > Add constructors and methods accepting java.nio.file.Path to > TrainedModelDetector & Son. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1746) modify TikaFileTypeDetector to use new detect method accepting java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1746: -- Labels: java7 (was: ) > modify TikaFileTypeDetector to use new detect method accepting > java.nio.file.Path > - > > Key: TIKA-1746 > URL: https://issues.apache.org/jira/browse/TIKA-1746 > Project: Tika > Issue Type: Sub-task > Components: detector > Reporter: Yaniv Kunda >Priority: Minor > Labels: java7 > Fix For: 1.11 > > Attachments: TIKA-1746.patch > > > Utilize the new org.apache.tika.Tika.detect(Path) method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1744) Use java.nio.file.Path in TikaInputStream
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1744: -- Attachment: TIKA-1744.patch > Use java.nio.file.Path in TikaInputStream > - > > Key: TIKA-1744 > URL: https://issues.apache.org/jira/browse/TIKA-1744 > Project: Tika > Issue Type: Sub-task > Components: core > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1744.patch > > > This will provide support for the new api for users who need it, and provide > better information in I/O operations, e.g. detailed exception if file cannot > be read. > - used Path and methods in java.nio.file.Files internally > - add getPath() method as the counterpart to getFile() > - modified test to use -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1745) Add methods accepting java.nio.file.Path to org.apache.tika.Tika and org.apache.tika.parser.ParsingReader
[ https://issues.apache.org/jira/browse/TIKA-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1745: -- Attachment: TIKA-1745.patch > Add methods accepting java.nio.file.Path to org.apache.tika.Tika and > org.apache.tika.parser.ParsingReader > - > > Key: TIKA-1745 > URL: https://issues.apache.org/jira/browse/TIKA-1745 > Project: Tika > Issue Type: Sub-task > Components: core > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1745.patch > > > Add methods accepting java.nio.file.Path to complement those accepting > java.io.File, using the new methods in TikaInputStream or java.nio.file.Files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1746) modify TikaFileTypeDetector to use new detect method accepting java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1746: -- Attachment: TIKA-1746.patch > modify TikaFileTypeDetector to use new detect method accepting > java.nio.file.Path > - > > Key: TIKA-1746 > URL: https://issues.apache.org/jira/browse/TIKA-1746 > Project: Tika > Issue Type: Sub-task > Components: detector > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1746.patch > > > Utilize the new org.apache.tika.Tika.detect(Path) method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
Yes, using getPath() for the getFile() counterpart. I'll prepare patches in a few hours. On Sep 22, 2015 4:35 PM, "Tim Allison (JIRA)" <j...@apache.org> wrote: > > [ > https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902613#comment-14902613 > ] > > Tim Allison commented on TIKA-1726: > --- > > Thank you, [~kkrugler]. [~kunda], is there enough consensus on this to > move forward? > > > Augment public methods that use a java.io.File with methods that use a > java.nio.file.Path > > > - > > > > Key: TIKA-1726 > > URL: https://issues.apache.org/jira/browse/TIKA-1726 > > Project: Tika > > Issue Type: Improvement > > Components: batch, core, gui, parser, translation > >Reporter: Yaniv Kunda > >Priority: Minor > > Fix For: 1.11 > > > > > > In light of Java 7 already EOL, it's high time we add support for the > new java.nio.file.Path class introduced with it, which, together with > support methods in java.nio.file.Files and others, provide a better file > I/O framework than java.io.File. > > In just two cases, we have public methods in tika that only return a > File object, and cannot be overloaded, so a different name for the new > method must be created: > > - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} > > _Suggestions:_ > > -- addTemporaryFile > > -- addTempFile > > -- createTempFile > > -- createTemporaryPath > > - {{org.apache.tika.io.TikaInputStream#getFile()}} > > _Suggestions:_ > > -- asFile > > -- toPath > > -- getPath > > In other cases, the methods accept a File as an argument, and should > remain as tika users might be using them - so an overloaded method that > accepts a Path instead should be added, referencing the new method from the > old one (using the @see tag) until java.io.File itself is deprecated or > otherwise becomes obsolete. 
> > Here is the full list of other methods: > > _tika-app:_ > > - {{org.apache.tika.gui.TikaGUI#openFile(File)}} > > _tika-batch:_ > > - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, > HANDLE_EXISTING, String)}} > > - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} > > - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors > > - > {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} > > - {{org.apache.tika.batch.fs.FSFileResource}} constructor > > - {{org.apache.tika.batch.fs.FSListCrawler}} constructor > > - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor > > - > {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, > File)}} > > - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, > File)}} > > - {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor > > _tika-core:_ > > - {{org.apache.tika.Tika#detect(File)}} > > - {{org.apache.tika.Tika#parse(File)}} > > - {{org.apache.tika.Tika#parseToString(File)}} > > - {{org.apache.tika.config.TikaConfig}} constructors > > - {{org.apache.tika.detect.NNExampleModelDetector}} constructor > > - {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}} > > - > {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}} > > - {{org.apache.tika.io.TikaInputStream#get(File)}} > > - {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}} > > _tika-parsers:_ > > - {{org.apache.tika.parser.ParsingReader}} constructor > > - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}} > > - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}} > > - {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor > > _tika-translate:_ > > - > {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, > String[], File)}} > > Due to lack of evidence, all public methods in public non-test classes > (and not in tika-example) are 
deemed part of a public API - although > there's no formal definition of such. > > If anyone knows of a public method which isn't accessed publicly and can > be defined as package-private, or for another reason, please comment. > > > > -- > This message was sent by Atlassian JIRA > (v6.3.4#6332)
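A minimal sketch of the pattern described in the quoted TIKA-1726 description: the File-based members stay, Path-based counterparts are added, and the old accessor points to the new one with a @see tag. ResourceSketch is a hypothetical class used only for illustration (it is not TikaInputStream); getPath() follows the name mentioned in the reply above.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical class for illustration; it is not part of Tika.
class ResourceSketch {
    private final Path path;

    /** Existing-style constructor, kept for callers that still use java.io.File. */
    ResourceSketch(File file) {
        this(file.toPath());
    }

    /** New overload accepting a java.nio.file.Path. */
    ResourceSketch(Path path) {
        this.path = path;
    }

    /**
     * Existing-style accessor.
     *
     * @see #getPath()
     */
    File getFile() {
        // toFile() throws UnsupportedOperationException for paths that do not
        // belong to the default file system provider.
        return path.toFile();
    }

    /** New counterpart to getFile(); it needs its own name because Java cannot overload on return type. */
    Path getPath() {
        return path;
    }
}
```

Methods that take a File argument can be overloaded directly, as the two constructors show; only methods that return a File need a new name.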
RE: [DISCUSS] Release Tika 1.11?
Thanks for the positive spirit! Regarding FilenameUtils.getName() - I believe that its functionality can be replaced by Path.getFileName() - and in a platform-aware manner, as each JVM distribution comes with a specific provider implementation for the OS it's for. -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, September 21, 2015 14:27 To: dev@tika.apache.org Subject: RE: [DISCUSS] Release Tika 1.11? +1, it would be great to move a bit more into EOL'd Java 7 asap. I'll take TIKA-1734 by tomorrow EDT. As for the other 2, I'm personally ok waiting for 1.12, but I defer to the dev community. Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on TIKA-1726, that might help move things forward. On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's point about duplication of code, bloat within Tika and missing out on updates. Aside from one small bit of code I'd like to keep or perhaps try to move into commons-io (?)[1], I think I'm now +1 to going forward with TIKA-1706 in core...unless there is a -1 from the community. Best, Tim [1] I added some customizations for old Mac OS behavior (treat ":" as file separator) in FilenameUtils.getName() that I don't want to lose. -Original Message- From: Yaniv Kunda [mailto:yaniv.ku...@answers.com] Sent: Sunday, September 20, 2015 7:15 AM To: dev@tika.apache.org Subject: RE: [DISCUSS] Release Tika 1.11? I would really like to push the following: https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io to tika-core This requires a decision to re-include commons-io as a dependency of tika-core. All the pros and cons have been already debated, but no decision has been made. 
https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods that use a java.io.File with methods that use a java.nio.file.Path Since this adds new methods to the public API, I requested the group to make a decision about the new names - but have not received something definite. However, I did create a subtask - https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in TemporaryResources - using [~tallison]'s suggestion, which has not been committed yet. If decisions are made on the above issues, I can quickly create patches for them. -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Saturday, September 19, 2015 08:10 To: dev@tika.apache.org Subject: [DISCUSS] Release Tika 1.11? Hey Guys and Gals, I’d like to roll a 1.11 release. There is TIKA-1716 which in particular allows some neat functionality in tika-python: https://github.com/chrismattmann/tika-python/pull/67 Anything else to try and get into the release? If not, I’ll produce an RC #1 by end of weekend. Cheers, Chris ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
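The suggestion at the top of this message, replacing FilenameUtils.getName() with Path.getFileName(), could look roughly like the sketch below. FileNameSketch and its getName helper are hypothetical names for illustration. Note the assumption built into it: the path is parsed with the separator rules of the platform the JVM runs on, so it is not a drop-in equivalent of commons-io's getName(), which accepts both Unix and Windows separators, and it would not cover the old Mac OS ":" handling mentioned in the quoted reply.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Tika or commons-io.
class FileNameSketch {
    /**
     * Returns the last name element of the given path string, or an empty
     * string if the path has no name elements (for example, a root).
     * Paths.get may throw InvalidPathException for strings the platform rejects.
     */
    static String getName(String rawPath) {
        Path fileName = Paths.get(rawPath).getFileName();
        return fileName == null ? "" : fileName.toString();
    }
}
```

For example, FileNameSketch.getName("dir/sub/report.pdf") yields "report.pdf".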
RE: [DISCUSS] Release Tika 1.11?
I would really like to push the following: https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io to tika-core This requires a decision to re-include commons-io as a dependency of tika-core. All the pros and cons have been already debated, but no decision has been made. https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods that use a java.io.File with methods that use a java.nio.file.Path Since this adds new methods to the public API, I requested the group to make a decision about the new names - but have not received something definite. However, I did create a subtask - https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in TemporaryResources - using [~tallison]'s suggestion, which has not been committed yet. If decisions are made on the above issues, I can quickly create patches for them. -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Saturday, September 19, 2015 08:10 To: dev@tika.apache.org Subject: [DISCUSS] Release Tika 1.11? Hey Guys and Gals, I’d like to roll a 1.11 release. There is TIKA-1716 which in particular allows some neat functionality in tika-python: https://github.com/chrismattmann/tika-python/pull/67 Anything else to try and get into the release? If not, I’ll produce an RC #1 by end of weekend. Cheers, Chris ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
[jira] [Updated] (TIKA-1738) ForkClient does not always delete temporary bootstrap jar
[ https://issues.apache.org/jira/browse/TIKA-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1738: -- Attachment: TIKA-1738.patch This patch makes the bootstrap jar creation static, so it happens only once, during class initialization. Deletion is done using a single shutdown hook, which will *probably* do its job if no handle created by a forked process still references the file - i.e. if enough time has passed between the last forked process being destroyed and the JVM shutting down. It also uses java.nio.file instead of the old java.io package. Added benefit: performance is better since forked processes do not need to create the bootstrap jar all over again. Added drawback: if the temp jar is deleted between forks, future forks would fail. > ForkClient does not always delete temporary bootstrap jar > - > > Key: TIKA-1738 > URL: https://issues.apache.org/jira/browse/TIKA-1738 > Project: Tika > Issue Type: Bug > Components: core > Environment: Windows 10 > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1738.patch > > > ForkClient creates a new temporary bootstrap jar each time it's instantiated, > and tries to delete it in the {{close()}} method, after destroying the > process. > Possibly a Windows-specific behavior: the OS seems to still hold a handle to > the file for a bit after the process is destroyed, causing the delete() method to > do nothing. > This is reproduced by simply running ForkParserTest on my machine. > In a long-running process, this could fill the temp folder with many bootstrap > jars that will never be deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
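The approach described in the comment above (create the temporary jar once in a static initializer, delete it from a single shutdown hook, use java.nio.file) might be sketched like this. It is an illustration under assumptions, not the contents of TIKA-1738.patch: BootstrapJarSketch is a hypothetical class and the step that would fill the jar is omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class for illustration; it is not Tika's ForkClient.
class BootstrapJarSketch {
    // Created once, during class initialization, and shared by every caller.
    private static final Path BOOTSTRAP_JAR = createOnce();

    private static Path createOnce() {
        final Path jar;
        try {
            jar = Files.createTempFile("bootstrap-sketch-", ".jar");
            // A real implementation would write the jar contents here.
        } catch (IOException e) {
            throw new IllegalStateException("Unable to create bootstrap jar", e);
        }
        // A single hook for the single file; deletion is best effort only,
        // since another process may still hold a handle to the file.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                try {
                    Files.deleteIfExists(jar);
                } catch (IOException e) {
                    System.err.println("Could not delete " + jar + ": " + e);
                }
            }
        });
        return jar;
    }

    static Path bootstrapJar() {
        return BOOTSTRAP_JAR;
    }
}
```

Unlike File.delete(), which only returns false, Files.deleteIfExists reports why a deletion failed, which is what makes the failure visible in the hook. The drawback noted in the comment still applies to this sketch: if something removes the file while the JVM is running, later users of bootstrapJar() get a path that no longer exists.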
[jira] [Created] (TIKA-1734) Use java.nio.file.Path in TemporaryResources
Yaniv Kunda created TIKA-1734: - Summary: Use java.nio.file.Path in TemporaryResources Key: TIKA-1734 URL: https://issues.apache.org/jira/browse/TIKA-1734 Project: Tika Issue Type: Sub-task Components: core Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 This will provide support for the new API for users who need it, and provide better information in I/O operations, e.g. a detailed exception if temporary file deletion fails. - used Path and methods in java.nio.file.Files internally - add setTemporaryFileDirectory(Path) method - add createTempFile() method (mimicking Files.createTempFile) - add unit test for proper deletion of temp files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1734) Use java.nio.file.Path in TemporaryResources
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1734: -- Attachment: TIKA-1734.patch > Use java.nio.file.Path in TemporaryResources > > > Key: TIKA-1734 > URL: https://issues.apache.org/jira/browse/TIKA-1734 > Project: Tika > Issue Type: Sub-task > Components: core > Reporter: Yaniv Kunda >Priority: Minor > Fix For: 1.11 > > Attachments: TIKA-1734.patch > > > This will provide support for the new API for users who need it, and provide > better information in I/O operations, e.g. a detailed exception if temporary > file deletion fails. > - used Path and methods in java.nio.file.Files internally > - add setTemporaryFileDirectory(Path) method > - add createTempFile() method (mimicking Files.createTempFile) > - add unit test for proper deletion of temp files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
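The items listed in TIKA-1734 could be sketched along these lines. TempFilesSketch is a hypothetical class, not Tika's TemporaryResources, and the file name prefix and suffix are arbitrary; only the method names setTemporaryFileDirectory(Path) and createTempFile() are taken from the issue text.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; it is not Tika's TemporaryResources.
class TempFilesSketch implements Closeable {
    private final List<Path> created = new ArrayList<Path>();
    private Path tempDir; // null means the system default temporary directory

    void setTemporaryFileDirectory(Path dir) {
        this.tempDir = dir;
    }

    /** Creates a temporary file and remembers it for deletion in close(). */
    Path createTempFile() throws IOException {
        Path file = (tempDir == null)
                ? Files.createTempFile("sketch-", ".tmp")
                : Files.createTempFile(tempDir, "sketch-", ".tmp");
        created.add(file);
        return file;
    }

    /** Deletes all created files; failures surface as exceptions with details. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (Path file : created) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                // Keep going, but remember every failure.
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        created.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

The "better information" point is visible in close(): java.io.File#delete() only returns a boolean, while the java.nio.file.Files deletion methods throw an IOException subclass such as AccessDeniedException that names the file and the reason.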
RE: Adding API support for Java 7's java.nio.file.Path
Can we move this forward? Already decided: Methods using java.io.File will be left as is, with a @see Javadoc tag added to refer to the java.nio.file.Path counterpart. Not decided yet: Names for the methods returning a java.nio.file.Path (especially org.apache.tika.io.TemporaryResources#createTemporaryFile and org.apache.tika.io.TikaInputStream#getFile) We need either more opinions or a decision – this is an addition to the public API so we need a sustainable decision. *From:* Yaniv Kunda [mailto:yaniv.ku...@answers.com] *Sent:* Tuesday, September 1, 2015 15:44 *To:* dev@tika.apache.org *Subject:* RE: Adding API support for Java 7's java.nio.file.Path I’ve formalized this issue here: https://issues.apache.org/jira/browse/TIKA-1726 Please take the time and share your opinion on the new method names, so I can go ahead and provide some patches. *From:* Yaniv Kunda [mailto:yaniv.ku...@answers.com] *Sent:* Monday, August 31, 2015 18:51 *To:* dev@tika.apache.org *Subject:* Re: Adding API support for Java 7's java.nio.file.Path I've already done that, I'm just waiting for the group's opinions on names for the new methods, especially the two that I've added to augment org.apache.tika.io.TemporaryResources#createTemporaryFile And org.apache.tika.io.TikaInputStream#getFile As described below. On Aug 31, 2015 3:26 PM, "Konstantin Gribov" <gros...@gmail.com> wrote: My two cents, we can migrate to Files.copy, Files.newBufferedReader etc in places where it can replace commons-io and Tika's internal copy of it. On Sat, Aug 29, 2015 at 19:48, Ken Krugler <kkrugler_li...@transpac.com> wrote: > > > From: Yaniv Kunda > > Sent: August 29, 2015 2:21:23am PDT > > To: dev@tika.apache.org > > Subject: RE: Adding API support for Java 7's java.nio.file.Path > > > > In addition to the discussion I've raised about the methods returning a > > File, I have another problem: > > Some of the methods that accept a File throw a FileNotFoundException. 
> > This exception is thrown by FIS/FOS/RAF constructors in response to > > anything - from a file that's actually not there to access being denied. > > The NIO API methods usually declare to throw an IOException, which can > be a > > subclass representing a more accurate reason - NoSuchFileException or > > AccessDeniedException. > > > > When adding the overloaded methods accepting a Path, I initially thought > to > > delegate the old methods to the new ones, but the new ones declare an > > IOException while the old declare a FileNotFoundException. > > > > I have three options: > > 1) Leave the old methods with their own code - > > this means essentially duplicate code, but complete backward > > compatibility. > > +1 > > I don't feel strongly, but I think we get max bang for the development > buck by doing the simplest thing here. > > And it doesn't feel like it'll be that long before Tika 2.0, when the old > method code can be removed. > > -- Ken > > > 2) Delegate the old methods to the new ones, but catch the IOException > and > > wrap it in a FileNotFoundException - > > this will remain backward compatible, unless some code catching a > > FileNotFoundException does text analysis on the exception message. > > 3) Delegate the old methods to the new ones, and change the signature > > accordingly to throw an IOException instead of a FileNotFoundException - > > this will break backward compatibility, only in cases a > > FileNotFoundException was caught explicitly. > > > > What do you think? 
> > > > -Original Message- > > From: Yaniv Kunda [mailto:yaniv.ku...@answers.com] > > Sent: Friday, August 28, 2015 03:33 > > To: dev@tika.apache.org > > Subject: RE: Adding API support for Java 7's java.nio.file.Path > > > > Thanks, I just like to move things forward :-) > > > > Regarding my proposed API additions - > > since adding new methods will make them a part of a new API, this is a > > chance to make their names more meaningful/concise/correct: replacing > File > > with Path in the method name might be awkward. > > > > I'd like to gather alternatives for the changes/additions to methods that > > return a File. > > I found a total of 4 methods that return a java.io.File and are public, > in > > public non-test classes and not in tika-example (I assume the rest can be > > changed without breaking anything). > > For each method I will provide my suggestion/s, which will be either "Add > > newName", "Replace with newName" or "Keep": > > > > tika-batch: > > - org.apache.tika.batch.fs.FSUtil#getOutputFile > > + Keep > > - org.apache.t
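To make the exception question in the thread above concrete, here is a sketch of option 2 (the old File method delegates to the new Path method and wraps the IOException). It is shown for comparison only; the reply in the thread favors option 1. OpenSketch and its open methods are hypothetical and do not correspond to any Tika method.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class for illustration; not part of Tika.
class OpenSketch {
    /** New method: may throw NoSuchFileException, AccessDeniedException, etc. */
    static InputStream open(Path path) throws IOException {
        return Files.newInputStream(path);
    }

    /**
     * Old method (option 2): delegates to the Path overload but keeps the
     * narrower FileNotFoundException in its signature.
     *
     * @see #open(Path)
     */
    static InputStream open(File file) throws FileNotFoundException {
        try {
            return open(file.toPath());
        } catch (FileNotFoundException e) {
            throw e; // already the declared type
        } catch (IOException e) {
            // The message now comes from the NIO exception, which is the
            // compatibility risk mentioned for this option.
            FileNotFoundException wrapped = new FileNotFoundException(e.getMessage());
            wrapped.initCause(e);
            throw wrapped;
        }
    }
}
```

Callers that catch FileNotFoundException keep compiling and keep working, and the more precise NIO exception remains reachable through getCause().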
[jira] [Updated] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1726: -- Description: In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provide a better file I/O framework than java.io.File. In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created: - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} _Suggestions:_ -- addTemporaryFile -- addTempFile -- createTempFile - {{org.apache.tika.io.TikaInputStream#getFile()}} _Suggestions:_ -- asFile -- toPath -- getPath In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, referencing the new method from the old one (using the @see tag) and deprecating the old method until an unknown tika major release. 
Here is the full list of other methods: _tika-app:_ - {{org.apache.tika.gui.TikaGUI#openFile(File)}} _tika-batch:_ - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}} - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors - {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} - {{org.apache.tika.batch.fs.FSFileResource}} constructor - {{org.apache.tika.batch.fs.FSListCrawler}} constructor - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}} - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}} - {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor _tika-core:_ - {{org.apache.tika.Tika#detect(File)}} - {{org.apache.tika.Tika#parse(File)}} - {{org.apache.tika.Tika#parseToString(File)}} - {{org.apache.tika.config.TikaConfig}} constructors - {{org.apache.tika.detect.NNExampleModelDetector}} constructor - {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}} - {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}} - {{org.apache.tika.io.TikaInputStream#get(File)}} - {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}} _tika-parsers:_ - {{org.apache.tika.parser.ParsingReader}} constructor - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}} - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}} - {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor _tika-translate:_ - {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}} Due to lack of evidence, all public methods in public non-test classes (and not in tika-example) are deemed part of a public API - although there's no formal definition of such. 
If anyone knows of a public method which isn't accessed publicly and can be defined as package-private, or for another reason, please comment. was: In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provide a better file I/O framework than java.io.File. In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created: - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} _Suggestions:_ -- addTemporaryFile -- addTempFile -- createTempFile - {{org.apache.tika.io.TikaInputStream#getFile()}} _Suggestions:_ -- asFile -- toPath -- getPath In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, deprecating the old method until an unknown tika major release. Here is the full list of other methods: _tika-app:_ - {{org.apache.tika.gui.TikaGUI#openFile(File)}} _tika-batch:_ - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}} - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors - {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} - {{org.apache.tika.batch.fs.FSFileResource}} constructor - {{org.apache.tika.batch.fs.FSListCrawler}} constructor - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}} - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}} - {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor
[jira] [Updated] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1726: -- Description: In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provide a better file I/O framework than java.io.File. In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created: - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} _Suggestions:_ -- addTemporaryFile -- addTempFile -- createTempFile -- createTemporaryPath - {{org.apache.tika.io.TikaInputStream#getFile()}} _Suggestions:_ -- asFile -- toPath -- getPath In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, referencing the new method from the old one (using the @see tag) until java.io.File itself is deprecated or otherwise becomes obsolete. 
Here is the full list of other methods: _tika-app:_ - {{org.apache.tika.gui.TikaGUI#openFile(File)}} _tika-batch:_ - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}} - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors - {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} - {{org.apache.tika.batch.fs.FSFileResource}} constructor - {{org.apache.tika.batch.fs.FSListCrawler}} constructor - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}} - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}} - {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor _tika-core:_ - {{org.apache.tika.Tika#detect(File)}} - {{org.apache.tika.Tika#parse(File)}} - {{org.apache.tika.Tika#parseToString(File)}} - {{org.apache.tika.config.TikaConfig}} constructors - {{org.apache.tika.detect.NNExampleModelDetector}} constructor - {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}} - {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}} - {{org.apache.tika.io.TikaInputStream#get(File)}} - {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}} _tika-parsers:_ - {{org.apache.tika.parser.ParsingReader}} constructor - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}} - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}} - {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor _tika-translate:_ - {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}} Due to lack of evidence, all public methods in public non-test classes (and not in tika-example) are deemed part of a public API - although there's no formal definition of such. 
If anyone knows of a public method which isn't accessed publicly and can be defined as package-private, or for another reason, please comment. was: In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provide a better file I/O framework than java.io.File. In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created: - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} _Suggestions:_ -- addTemporaryFile -- addTempFile -- createTempFile - {{org.apache.tika.io.TikaInputStream#getFile()}} _Suggestions:_ -- asFile -- toPath -- getPath In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, referencing the new method from the old one using (using the @see tag) deprecating the old method until an unknown tika major release. Here is the full list of other methods: _tika-app:_ - {{org.apache.tika.gui.TikaGUI#openFile(File)}} _tika-batch:_ - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}} - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors - {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} - {{org.apache.tika.batch.fs.FSFileResource}} constructor - {{org.apache.tika.batch.fs.FSListCrawler}} constructor - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}} - {{org.apache.tika.batch.fs.FSUtil
RE: OSGi exceptions in trunk w Intellij +Osmorc
Probably JDom:
http://www.jdom.org/pipermail/jdom-interest/2008-November/016226.html
https://developer.atlassian.com/docs/faq/plugin-framework-faq/using-jdom-in-osgi

On Sep 3, 2015 9:05 PM, "Allison, Timothy B." wrote:
> Interesting, thank you.
>
> I wasn't getting any pointers to the offending class before I removed the
> plugin.
>
> Any recommendations on finding the offender?
>
> -----Original Message-----
> From: Bob Paulin [mailto:b...@bobpaulin.com]
> Sent: Thursday, September 03, 2015 11:21 AM
> To: dev@tika.apache.org
> Subject: Re: OSGi exceptions in trunk w Intellij +Osmorc
>
> It's likely one of the embedded dependencies have class files in the
> default package. If these classes are not being used they could just be
> removed as suggested here:
>
> https://techotom.wordpress.com/2014/10/21/fixing-the-default-package-is-not-permitted-by-the-import-package-syntax-with-maven-bundle-plugin/
>
> Do we know which dependency this might be? I agree that it would be
> better if this all worked in Intellij with the Osmorc plugin.
>
> - Bob
>
> On Thu, Sep 3, 2015 at 10:10 AM, Allison, Timothy B. wrote:
>
> > All,
> >
> > I'm able to build via Maven without any problem. However, within
> > Intellij, I'm not able to run any individual unit tests in
> > tika-parsers or tika-xmp because of this error:
> >
> > Error:osgi: [tika-parsers] The default package '.' is not permitted by
> > the Import-Package syntax.
> > This can be caused by compile errors in Eclipse because Eclipse
> > creates valid class files regardless of compile errors.
> > The following package(s) import from the default package null
> >
> > If I remove Osmorc (the OSGi plugin), all is ok, but that seems like a
> > really bad idea. Is this something we should fix, or is this
> > something that I should ignore?
> >
> > Best,
> >
> > Tim

--
This email communication (including any attachments) contains information from Answers Corporation or its affiliates that is confidential and may be privileged.
The information contained herein is intended only for the use of the addressee(s) named above. If you are not the intended recipient (or the agent responsible to deliver it to the intended recipient), you are hereby notified that any dissemination, distribution, use, or copying of this communication is strictly prohibited. If you have received this email in error, please immediately reply to sender, delete the message and destroy all copies of it. If you have questions, please email le...@answers.com. If you wish to unsubscribe to commercial emails from Answers and its affiliates, please go to the Answers Subscription Center http://campaigns.answers.com/subscriptions to opt out. Thank you.
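On the question of finding the offender: one possible approach, sketched here under assumptions, is to list the class files that sit at the root of each embedded jar, since those are the ones in the default package. The class name DefaultPackageScan and its methods are hypothetical helpers, not part of Tika, Osmorc or the maven-bundle-plugin; only java.util.jar from the JDK is used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists class files that live in the default package of a jar. */
final class DefaultPackageScan {

    /** True for a class file stored at the jar root (no directory part in its entry name). */
    static boolean isDefaultPackageClass(String entryName) {
        return entryName.endsWith(".class")
                && entryName.indexOf('/') < 0
                // module descriptors also sit at the root but are not default-package classes
                && !entryName.equals("module-info.class");
    }

    /** Returns the names of all default-package class files found in the given jar. */
    static List<String> scan(String jarPath) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isDefaultPackageClass(name)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }
}
```

Running scan over each jar embedded in the bundle should narrow the search to the dependency (JDom or otherwise) that ships root-level classes.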
[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725769#comment-14725769 ]

Yaniv Kunda commented on TIKA-1726:
---

Funny you proposed those two alternatives - exactly what I started with... But compared to the methods in the Java platform they seem partly incorrect, as those mostly deal with files, located using paths, e.g. Files.createTempFile.

So for createTemporaryFile I think that createTemporaryPath is problematic in the sense that an actual file is created, not just a path. I suggested the add* variants to hint that the file is added to the list of resources to close, as in addResource.

For getFile, getPath is actually pretty ok but I think both are problematic in that they look like a getter - I wanted to signify its write-to-file functionality. How about save/store/persist?

Regarding deprecation, not a problem - I'll drop it and add a @see tag from the old method to the new one (but not the other way round?).

In both questions, my suggestions are only suggestions, and my reservations are only reservations - but if you can make the call either way I'd be happy to accept it and move this forward.

> Augment public methods that use a java.io.File with methods that use a
> java.nio.file.Path
> -
>
> Key: TIKA-1726
> URL: https://issues.apache.org/jira/browse/TIKA-1726
> Project: Tika
> Issue Type: Improvement
> Components: batch, core, gui, parser, translation
> Reporter: Yaniv Kunda
> Priority: Minor
> Fix For: 1.11
>
> In light of Java 7 already EOL, it's high time we add support for the new
> java.nio.file.Path class introduced with it, which, together with support
> methods in java.nio.file.Files and others, provide a better file I/O
> framework than java.io.File.
> In just two cases, we have public methods in tika that only return a File > object, and cannot be overloaded, so a different name for the new method must > be created: > - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} > _Suggestions:_ > -- addTemporaryFile > -- addTempFile > -- createTempFile > - {{org.apache.tika.io.TikaInputStream#getFile()}} > _Suggestions:_ > -- asFile > -- toPath > -- getPath > In other cases, the methods accept a File as an argument, and should remain > as tika users might be using them - so an overloaded method that accepts a > Path instead should be added, deprecating the old method until an unknown > tika major release. > Here is the full list of other methods: > _tika-app:_ > - {{org.apache.tika.gui.TikaGUI#openFile(File)}} > _tika-batch:_ > - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, > HANDLE_EXISTING, String)}} > - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} > - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors > - > {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} > - {{org.apache.tika.batch.fs.FSFileResource}} constructor > - {{org.apache.tika.batch.fs.FSListCrawler}} constructor > - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor > - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, > File)}} > - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}} > - {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor > _tika-core:_ > - {{org.apache.tika.Tika#detect(File)}} > - {{org.apache.tika.Tika#parse(File)}} > - {{org.apache.tika.Tika#parseToString(File)}} > - {{org.apache.tika.config.TikaConfig}} constructors > - {{org.apache.tika.detect.NNExampleModelDetector}} constructor > - {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}} > - {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}} > - 
{{org.apache.tika.io.TikaInputStream#get(File)}} > - {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}} > _tika-parsers:_ > - {{org.apache.tika.parser.ParsingReader}} constructor > - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}} > - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}} > - {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor > _tika-translate:_ > - > {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, > String[], File)}} > Due to lack of evidence, all public methods in public non-test classes (and > not in tika-example) are deemed part of a public API - although there's no > formal definition of such. > If anyone knows of a public method which isn't accessed publicly and can be > defined as package-private, or for another reason, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
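To illustrate why the two File-returning methods need a new name rather than an overload, here is a minimal sketch. Java does not allow two methods that differ only in return type, so the Path variant has to be named differently, and the old method can point to it with a @see tag. BackingFileSketch is a hypothetical class, not Tika's TikaInputStream, and getPath is just one of the candidate names from the issue.

```java
import java.io.File;
import java.nio.file.Path;

/** Hypothetical holder class, used only to show the naming and @see pattern. */
final class BackingFileSketch {
    private final Path path;

    BackingFileSketch(Path path) {
        this.path = path;
    }

    /** New method: it cannot be declared as an overload of getFile(), since only the return type differs. */
    Path getPath() {
        return path;
    }

    /**
     * Old method, kept as-is for existing callers and delegating to the new one.
     *
     * @see #getPath()
     */
    File getFile() {
        return getPath().toFile();
    }
}
```

Methods that take a File as an argument do not have this problem, because a Path overload is distinguished by its parameter type.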
[jira] [Created] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
Yaniv Kunda created TIKA-1726: - Summary: Augment public methods that use a java.io.File with methods that use a java.nio.file.Path Key: TIKA-1726 URL: https://issues.apache.org/jira/browse/TIKA-1726 Project: Tika Issue Type: Improvement Components: batch, core, gui, parser, translation Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provide a better file I/O framework than java.io.File. In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created: - {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}} _Suggestions:_ -- addTemporaryFile -- addTempFile -- createTempFile - {{org.apache.tika.io.TikaInputStream#getFile()}} _Suggestions:_ -- asFile -- toPath -- getPath In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, deprecating the old method until an unknown tika major release. 
Here is the full list of other methods: _tika-app:_ - {{org.apache.tika.gui.TikaGUI#openFile(File)}} _tika-batch:_ - {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}} - {{org.apache.tika.util.PropsUtil#getFile(String, File)}} - {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors - {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}} - {{org.apache.tika.batch.fs.FSFileResource}} constructor - {{org.apache.tika.batch.fs.FSListCrawler}} constructor - {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}} - {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}} - {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor _tika-core:_ - {{org.apache.tika.Tika#detect(File)}} - {{org.apache.tika.Tika#parse(File)}} - {{org.apache.tika.Tika#parseToString(File)}} - {{org.apache.tika.config.TikaConfig}} constructors - {{org.apache.tika.detect.NNExampleModelDetector}} constructor - {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}} - {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}} - {{org.apache.tika.io.TikaInputStream#get(File)}} - {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}} _tika-parsers:_ - {{org.apache.tika.parser.ParsingReader}} constructor - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}} - {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}} - {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor _tika-translate:_ - {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}} Due to lack of evidence, all public methods in public non-test classes (and not in tika-example) are deemed part of a public API - although there's no formal definition of such. 
If anyone knows of a public method which isn't accessed publicly and can be defined as package-private, or for another reason, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: Adding API support for Java 7's java.nio.file.Path
I’ve formalized this issue here: https://issues.apache.org/jira/browse/TIKA-1726

Please take the time to share your opinion on the new method names, so I can go ahead and provide some patches.

*From:* Yaniv Kunda [mailto:yaniv.ku...@answers.com]
*Sent:* Monday, August 31, 2015 18:51
*To:* dev@tika.apache.org
*Subject:* Re: Adding API support for Java 7's java.nio.file.Path

I've already done that, I'm just waiting for the group's opinions on names for the new methods, especially the two that I've added to augment
org.apache.tika.io.TemporaryResources#createTemporaryFile
and
org.apache.tika.io.TikaInputStream#getFile
as described below.

On Aug 31, 2015 3:26 PM, "Konstantin Gribov" <gros...@gmail.com> wrote:

My two cents, we can migrate to Files.copy, Files.newBufferedReader etc in places where it can replace commons-io and Tika's internal copy of it.

Sat, Aug 29, 2015 at 19:48, Ken Krugler <kkrugler_li...@transpac.com>:

>
> > From: Yaniv Kunda
> > Sent: August 29, 2015 2:21:23am PDT
> > To: dev@tika.apache.org
> > Subject: RE: Adding API support for Java 7's java.nio.file.Path
> >
> > In addition to the discussion I've raised about the methods returning a
> > File, I have another problem:
> > Some of the methods that accept a File throw a FileNotFoundException.
> > This exception is thrown by FIS/FOS/RAF constructors in response to
> > anything - from a file that's actually not there to access denied.
> > The NIO API methods usually declare to throw an IOException, which can be a
> > subclass representing a more accurate reason - NoSuchFileException or
> > AccessDeniedException.
> >
> > When adding the overloaded methods accepting a Path, I initially thought to
> > delegate the old methods to the new ones, but the new ones declare an
> > IOException while the old declare a FileNotFoundException.
> >
> > I have three options:
> > 1) Leave the old methods with their own code -
> > this means essentially duplicate code, but complete backward
> > compatibility.
> > +1 > > I don't feel strongly, but I think we get max bang for the development > buck by doing the simplest thing here. > > And it doesn't feel like it'll be that long before Tika 2.0, when the old > method code can be removed. > > -- Ken > > > 2) Delegate the old methods to the new ones, but catch the IOException > and > > wrap it in a FileNotFoundException - > > this will remain backward compatible, unless some catching a > > FileNotFoundException does text analysis on the exception message. > > 3) Delegate the old methods to the new ones, and change the signature > > accordingly to throw an IOException instead of a FileNotFoundException - > > this will break backward compatibility, only in cases a > > FileNotFoundException was caught explicitly. > > > > What do you think? > > > > -Original Message- > > From: Yaniv Kunda [mailto:yaniv.ku...@answers.com] > > Sent: Friday, August 28, 2015 03:33 > > To: dev@tika.apache.org > > Subject: RE: Adding API support for Java 7's java.nio.file.Path > > > > Thanks, I just like to move things forward :-) > > > > Regarding my proposed API additions - > > since adding new methods will make them a part of a new API, this is a > > change to make their names more meaningful/concise/correct: replacing > File > > with Path in the method name might be awkward. > > > > I'd like to gather alternatives for the changes/additions to methods that > > return a File. > > I found a total of 4 methods that return a java.io.File and are public, > in > > public non-test classes and not in tika-example (I assume the rest can be > > changed without breaking anything). 
> > For each method I will provide my suggestion/s, which will be either "Add > > newName", "Replace with newName" or "Keep": > > > > tika-batch: > > - org.apache.tika.batch.fs.FSUtil#getOutputFile > > + Keep > > - org.apache.tika.util.PropsUtil#getFile > > + Keep > > > > tika-core: > > - org.apache.tika.io.TemporaryResources#createTemporaryFile > > + Add addTemporaryFile > > Add addTempFile > > Add createTempFile > > - org.apache.tika.io.TikaInputStream#getFile > > + Add asFile > > Add toPath > > Add getPath > > > > I've added a '+' to the left of my preference - please add yours to your > > preference or add a new suggestion. > > > > Regarding added methods - I really think that the old methods should be > > deprecated. > > IMO a typo or a simple name change is a good e
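For reference, option 2 from the exception discussion quoted above could look roughly like the following. This is only a sketch under assumptions: LegacyOpenSketch, open and failureType are hypothetical names, not Tika methods, and the real patch may handle the wrapping differently.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example of delegating a File-based method to a Path-based one. */
final class LegacyOpenSketch {

    /** New-style method: NIO reports NoSuchFileException, AccessDeniedException, etc. */
    static InputStream open(Path path) throws IOException {
        return Files.newInputStream(path);
    }

    /** Old-style method: keeps its FileNotFoundException signature by wrapping. */
    static InputStream open(File file) throws FileNotFoundException {
        try {
            return open(file.toPath());
        } catch (FileNotFoundException e) {
            throw e;
        } catch (IOException e) {
            // The message now comes from the NIO exception, so callers that
            // parse the old message text could still notice a difference.
            FileNotFoundException wrapped = new FileNotFoundException(e.getMessage());
            wrapped.initCause(e);
            throw wrapped;
        }
    }

    /** Small helper for trying it out: names the exception type seen by old-style callers. */
    static String failureType(File file) {
        try (InputStream in = open(file)) {
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

With this shape, a caller that catches FileNotFoundException keeps working, and the more precise NIO exception remains available through getCause(). Option 1 avoids even the message difference at the price of duplicated code.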
Re: Adding API support for Java 7's java.nio.file.Path
I've already done that, I'm just waiting for the group's opinions on names for the new methods, especially the two that I've added to augment
org.apache.tika.io.TemporaryResources#createTemporaryFile
and
org.apache.tika.io.TikaInputStream#getFile
as described below.

On Aug 31, 2015 3:26 PM, "Konstantin Gribov" <gros...@gmail.com> wrote:
> My two cents, we can migrate to Files.copy, Files.newBufferedReader etc in
> places where it can replace commons-io and Tika's internal copy of it.
>
> Sat, Aug 29, 2015 at 19:48, Ken Krugler <kkrugler_li...@transpac.com>:
>
> >
> > > From: Yaniv Kunda
> > > Sent: August 29, 2015 2:21:23am PDT
> > > To: dev@tika.apache.org
> > > Subject: RE: Adding API support for Java 7's java.nio.file.Path
> > >
> > > In addition to the discussion I've raised about the methods returning a
> > > File, I have another problem:
> > > Some of the methods that accept a File throw a FileNotFoundException.
> > > This exception is thrown by FIS/FOS/RAF constructors in response to
> > > anything - from a file that's actually not there to access denied.
> > > The NIO API methods usually declare to throw an IOException, which can be a
> > > subclass representing a more accurate reason - NoSuchFileException or
> > > AccessDeniedException.
> > >
> > > When adding the overloaded methods accepting a Path, I initially thought to
> > > delegate the old methods to the new ones, but the new ones declare an
> > > IOException while the old declare a FileNotFoundException.
> > >
> > > I have three options:
> > > 1) Leave the old methods with their own code -
> > > this means essentially duplicate code, but complete backward
> > > compatibility.
> >
> > +1
> >
> > I don't feel strongly, but I think we get max bang for the development
> > buck by doing the simplest thing here.
> >
> > And it doesn't feel like it'll be that long before Tika 2.0, when the old
> > method code can be removed.
> > > > -- Ken > > > > > 2) Delegate the old methods to the new ones, but catch the IOException > > and > > > wrap it in a FileNotFoundException - > > > this will remain backward compatible, unless some catching a > > > FileNotFoundException does text analysis on the exception message. > > > 3) Delegate the old methods to the new ones, and change the signature > > > accordingly to throw an IOException instead of a FileNotFoundException > - > > > this will break backward compatibility, only in cases a > > > FileNotFoundException was caught explicitly. > > > > > > What do you think? > > > > > > -Original Message- > > > From: Yaniv Kunda [mailto:yaniv.ku...@answers.com] > > > Sent: Friday, August 28, 2015 03:33 > > > To: dev@tika.apache.org > > > Subject: RE: Adding API support for Java 7's java.nio.file.Path > > > > > > Thanks, I just like to move things forward :-) > > > > > > Regarding my proposed API additions - > > > since adding new methods will make them a part of a new API, this is a > > > change to make their names more meaningful/concise/correct: replacing > > File > > > with Path in the method name might be awkward. > > > > > > I'd like to gather alternatives for the changes/additions to methods > that > > > return a File. > > > I found a total of 4 methods that return a java.io.File and are public, > > in > > > public non-test classes and not in tika-example (I assume the rest can > be > > > changed without breaking anything). 
> > > For each method I will provide my suggestion/s, which will be either > "Add > > > newName", "Replace with newName" or "Keep": > > > > > > tika-batch: > > > - org.apache.tika.batch.fs.FSUtil#getOutputFile > > > + Keep > > > - org.apache.tika.util.PropsUtil#getFile > > > + Keep > > > > > > tika-core: > > > - org.apache.tika.io.TemporaryResources#createTemporaryFile > > > + Add addTemporaryFile > > > Add addTempFile > > > Add createTempFile > > > - org.apache.tika.io.TikaInputStream#getFile > > > + Add asFile > > > Add toPath > > > Add getPath > > > > > > I've added a '+' to the left of my preference - please add yours to > your > > > preference or add a new suggestion. > > > > > > Regarding added methods - I really think
Re: [jira] [Commented] (TIKA-1672) Integrate tika-java7 component
I believe the tika-java7 component must remain optional, as its sole purpose is to serve as a concrete SPI implementation of FileTypeDetector, most commonly reached through Files.probeContentType(Path):
https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#probeContentType-java.nio.file.Path-

I do agree that a name change can help - here are a few suggestions:
tika-java7-spi
tika-java7-filetypedetector
tika-java7-detector-spi

On Aug 31, 2015 7:53 AM, "Tyler Palsulich (JIRA)" wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722705#comment-14722705 ]
>
> Tyler Palsulich commented on TIKA-1672:
> ---
>
> Hmm. Maybe we should rename the module? Right now, it doesn't make sense to have a java7 component when the entire project depends on Java 7.
>
> > Integrate tika-java7 component
> > --
> >
> > Key: TIKA-1672
> > URL: https://issues.apache.org/jira/browse/TIKA-1672
> > Project: Tika
> > Issue Type: Improvement
> > Reporter: Tyler Palsulich
> > Fix For: 1.11
> >
> > Code requiring Java 7 doesn't need to be in a separate module now that TIKA-1536 (upgrade to Java 7) is done.
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
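For readers unfamiliar with the SPI mentioned above, this is roughly what a FileTypeDetector provider looks like. It is a sketch only: ExtensionFileTypeDetector and its guess helper are hypothetical, and the file-name check stands in for whatever detection a real module would delegate to. The JDK discovers such providers through a META-INF/services/java.nio.file.spi.FileTypeDetector file, which is why keeping the module optional matters: adding the jar to the classpath changes what Files.probeContentType returns.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Locale;

/**
 * Hypothetical provider. A detector registered through META-INF/services must be
 * declared public in its own source file; the modifier is omitted here only so the
 * snippet compiles under any file name.
 */
class ExtensionFileTypeDetector extends FileTypeDetector {

    @Override
    public String probeContentType(Path path) throws IOException {
        Path name = path.getFileName();
        return name == null ? null : guess(name.toString());
    }

    /** Stand-in for real detection: maps a couple of extensions to media types. */
    static String guess(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            return "application/pdf";
        }
        if (lower.endsWith(".txt")) {
            return "text/plain";
        }
        // null means "not recognized", which lets Files.probeContentType try other detectors
        return null;
    }
}
```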
try-with-resources
I’ve opened https://issues.apache.org/jira/browse/TIKA-1719 along with a patch that converts applicable code to use the try-with-resources statement.

Although the patch is big and covers 105 files, it’s very shallow and contains only trivial use cases - most of them fixed by IntelliJ’s quick-fix.

I would appreciate it if a committer could review this and push it through - I already have other changes (using Java 7’s java.nio.file.Path) waiting on it, to avoid conflicts.

If this is too much, I can split it into separate patches, per module or by any other discriminator - although the vast majority of the changes are in tika-parsers’ tests.

*Yaniv Kunda*
Technical Lead
yaniv.ku...@answers.com
*p* +972 (3) 7661819
*m* +972 (54) 4644456
www.answers.com/webcollage
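As an illustration of the kind of trivial conversion described above, here is a before/after sketch. TrackedResource and TryWithResourcesSketch are hypothetical names made up for this example and do not come from the patch; the resource simply records whether close() ran.

```java
/** Hypothetical resource that records whether close() was called. */
final class TrackedResource implements AutoCloseable {
    boolean closed;

    String read() {
        return "data";
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class TryWithResourcesSketch {

    /** Before: the resource is closed in an explicit finally block. */
    static String readOld(TrackedResource resource) {
        try {
            return resource.read();
        } finally {
            resource.close();
        }
    }

    /** After: try-with-resources closes the resource when the block exits. */
    static String readNew(TrackedResource resource) {
        try (TrackedResource r = resource) {
            return r.read();
        }
    }
}
```

On the normal path the two forms behave the same. The difference shows up when both the body and close() throw: try-with-resources keeps the body's exception as the primary one and attaches the close() failure as a suppressed exception, whereas the finally form lets the close() failure replace it.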
[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717349#comment-14717349 ]

Yaniv Kunda commented on TIKA-1706:
---

The fact that o.a.tika.io contains public classes is a problem I didn't think about. These files are strictly meant as internal utility/support classes and shouldn't really be used by users. In fact, I would say that although these are public classes, they should not be considered part of the public API of tika-core. And since we don't know which of the commons-io-cloned classes users actually use (probably by accident), letting them go is indeed a problem.

I also think that the no-dependencies principle is more romantic than it is useful, as these days a lot of the Java ecosystem is built on external libraries, unless space is critical, such as in mobile applications (and even these are getting bigger and bigger). As the vast majority of tika-core usages come transitively from tika-parsers, I don't think that is the case here. I haven't crawled the Maven repository deeply enough to find out how many tika-core-only usages have few or no other dependencies, but I suspect that number is not very high. So the absolute worst case here - and remember that this is the extreme case of a library that uses tika-core and no other library - is a 30% footprint increase!

o.a.tika.io is a mess - it contains:
- classes from commons-io-1.4
- partial classes from commons-io-1.4
- modified classes from commons-io-1.4
- classes from commons-io-2.0 (or later unknown version/s)
- tika original classes

It's really hard to go over all the changes - and I've shown just a few examples - but simply making the switch is easier, not so costly even in the worst case, and would bring progress to our doorstep (today and in future changes) by exploration, faster than maintaining copied code.

My suggestion is:
- bring commons-io back to tika-core
- change all usages of the copied classes to commons-io
- deprecate (do not delete) the copied classes, probably until tika-2.0

Bring back commons-io to tika-core
--
Key: TIKA-1706
URL: https://issues.apache.org/jira/browse/TIKA-1706
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11
[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717519#comment-14717519 ] Yaniv Kunda commented on TIKA-1706: --- That's why I suggested to just add commons-io to tika-core, use it internally, and just deprecate the copied classes. Is that ok for 1.x? Bring back commons-io to tika-core -- Key: TIKA-1706 URL: https://issues.apache.org/jira/browse/TIKA-1706 Project: Tika Issue Type: Improvement Components: core Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11
RE: Adding API support for Java 7's java.nio.file.Path
Thanks, I just like to move things forward :-)

Regarding my proposed API additions - since adding new methods will make them a part of a new API, this is a chance to make their names more meaningful/concise/correct: simply replacing File with Path in the method name might be awkward. I'd like to gather alternatives for the changes/additions to methods that return a File.

I found a total of 4 methods that return a java.io.File and are public, in public non-test classes, and not in tika-example (I assume the rest can be changed without breaking anything). For each method I provide my suggestion/s, which are one of Add newName, Replace with newName, or Keep:

tika-batch:
- org.apache.tika.batch.fs.FSUtil#getOutputFile
+ Keep
- org.apache.tika.util.PropsUtil#getFile
+ Keep

tika-core:
- org.apache.tika.io.TemporaryResources#createTemporaryFile
+ Add addTemporaryFile
  Add addTempFile
  Add createTempFile
- org.apache.tika.io.TikaInputStream#getFile
+ Add asFile
  Add toPath
  Add getPath

I've put a '+' to the left of my preference - please add a '+' next to yours, or add a new suggestion.

Regarding added methods - I really think that the old methods should be deprecated. IMO a typo or a simple name change is a good enough reason for deprecating a method, so returning a legacy class makes deprecation even more welcome.

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, August 27, 2015 17:36
To: dev@tika.apache.org
Subject: RE: Adding API support for Java 7's java.nio.file.Path

+1 Thank you, Yaniv, for leading this effort. I have a small preference for getting rid of File entirely eventually (2.0?) as Lucene and Hadoop seem to have done (?).
RE: Adding API support for Java 7's java.nio.file.Path
I can point out several benefits of supporting the new API, in no particular order:

- Exception handling: operations like File.delete() return a boolean, which provides less useful information when the operation fails than the exception thrown by Files.delete() (or a Minion...)
- Performance: the new API delegates more parts of I/O operations to the OS, resulting in better usage of resources. In independent testing I've done (considering big files, cache warmup and randomized order) I've achieved 30% faster reads when using Files.copy() or FileChannel.transferTo()
- Adoption: Java 7, in which the new API appeared, is already EOL. Supporting this API, considering that java.io is considered legacy, is good for keeping up with the times, and even better for our users, as it offers them an incentive to move forward as well.

More can be found here: http://docs.oracle.com/javase/tutorial/essential/io/legacy.html

I believe that the library - user relationship must balance compatibility and progress: if libraries are stuck at compatibility, their users are sometimes stuck without progress... If we can have progress without breaking compatibility - we have a winner.

I propose to add support for, and make the most of, the new (4 y/o) API without breaking compatibility, which means:
- Public methods accepting a File will not be changed; overloaded versions will be added.
- Public methods returning a File will not be changed; methods with different names will be added.
- Non-public methods accepting or returning a File will be changed
- Internal uses of the legacy I/O will be updated to use the new API where easy

Regarding deprecation, I suggest that:
1) Methods accepting a File will not be deprecated - they will probably be used as long as File itself is not deprecated (forever?)
2) Methods returning a File will be deprecated - progressive users can use the new methods easily, less progressive users can use the new methods and add .toFile() to the result, and the rest can still use the deprecated methods (which will most likely call the new methods internally anyway).

To summarize: overloading = convenience, methods with the same operation but different name and return value = confusing.

If this seems like a decent proposal, I can separate this work into several JIRA issues and patches, so that reviewing the changes is easier.

-Original Message-
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Wednesday, August 26, 2015 13:27
To: dev@tika.apache.org
Subject: Re: Adding API support for Java 7's java.nio.file.Path

On Wed, 26 Aug 2015, Yaniv Kunda wrote:
> I would like to propose adding support for Java 7’s java.nio.file.Path as an alternative to those methods in the API that deal with a java.io.File.

Any chance you could briefly summarise what advantages this would give to us and/or our users?

> 1) What can we do with methods returning a File? e.g. TemporaryResources.createTemporaryFile, TikaInputStream.getFile. Should we break compatibility and encourage (=force) users to change their code (Note that since they all use Java 7 now, the change is minimal by adding .toFile() to the result), or create new methods with different names (confusing)?

Breaking compatibility outside of a 2.0 release is a big no-no, sorry. TemporaryResources.createTemporaryPath and TikaInputStream.getPath could work as naming

> 2) Should we deprecate the old methods accepting a File, or delete them?

Deleting would break compatibility, so shouldn't be done. Deprecating could be done, if there's a strong reason to encourage people off them

https://wiki.apache.org/tika/Tika2_0RoadMap is where we're tracking proposed API-breaking changes for 2.0

Nick
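Archive note: the exception-handling point in this thread can be illustrated with a small, self-contained sketch that uses only JDK classes. The class and method names (DeleteComparison, legacyDelete, describeNioDelete) are made up for the illustration; it deletes a file that does not exist through both APIs and reports what each call tells the caller.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: compares the failure information of the two delete APIs.
class DeleteComparison {

    // Legacy API: failure is reported as a bare "false", with no reason attached.
    static boolean legacyDelete(File file) {
        return file.delete();
    }

    // NIO.2 API: failure is reported as a typed exception.
    // Returns null on success, otherwise a short description of the failure.
    static String describeNioDelete(Path path) {
        try {
            Files.delete(path);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        File missing = new File(System.getProperty("java.io.tmpdir"),
                "no-such-file-" + System.nanoTime());
        System.out.println("File.delete():  " + legacyDelete(missing));
        System.out.println("Files.delete(): " + describeNioDelete(missing.toPath()));
    }
}
```

With the legacy call the caller cannot tell a missing file from a permission problem or a non-empty directory, whereas Files.delete distinguishes these cases by exception type (here, NoSuchFileException).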
[jira] [Created] (TIKA-1720) Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed()
Yaniv Kunda created TIKA-1720:
-
Summary: Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed()
Key: TIKA-1720
URL: https://issues.apache.org/jira/browse/TIKA-1720
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

TemporaryResources.close() currently collects the exceptions thrown while trying to close its resources in a list. When the time comes to propagate an exception, information is lost - the thrown exception contains a message with the string descriptions of all exceptions, and the first exception as the cause - there is no stack trace describing what went wrong closing a resource. In addition, the thrown exception is IOExceptionWithCause, copied from commons-io, which has been redundant since Java 6 (IOException itself gained constructors that accept a cause).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
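Archive note: a minimal sketch of the technique the issue title refers to, using only JDK classes. CloseAllExample and its helper failing() are hypothetical names and this is not the attached patch; it only shows how Throwable.addSuppressed() lets the first failure propagate while later failures stay attached to it with their own stack traces.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only; not the TIKA-1720 patch.
class CloseAllExample implements Closeable {
    private final List<Closeable> resources = new ArrayList<>();

    void add(Closeable resource) {
        resources.add(resource);
    }

    @Override
    public void close() throws IOException {
        IOException primary = null;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException e) {
                if (primary == null) {
                    primary = e;              // first failure is the one thrown
                } else {
                    primary.addSuppressed(e); // later failures keep their stack traces
                }
            }
        }
        resources.clear();
        if (primary != null) {
            throw primary;
        }
    }

    // Hypothetical helper: a resource whose close() always fails.
    static Closeable failing(final String message) {
        return new Closeable() {
            @Override
            public void close() throws IOException {
                throw new IOException(message);
            }
        };
    }
}
```

A caller can then inspect every failure through getSuppressed() instead of parsing a concatenated message string.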
[jira] [Created] (TIKA-1721) Replace IOExceptionWithCause in ForkClient
Yaniv Kunda created TIKA-1721: - Summary: Replace IOExceptionWithCause in ForkClient Key: TIKA-1721 URL: https://issues.apache.org/jira/browse/TIKA-1721 Project: Tika Issue Type: Improvement Components: core Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 IOExceptionWithCause (copied from commons-io) is redundant since Java 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1720) Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed()
[ https://issues.apache.org/jira/browse/TIKA-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1720: -- Attachment: TIKA-1720.patch Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed() - Key: TIKA-1720 URL: https://issues.apache.org/jira/browse/TIKA-1720 Project: Tika Issue Type: Improvement Components: core Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Attachments: TIKA-1720.patch
Adding API support for Java 7's java.nio.file.Path
I would like to propose adding support for Java 7’s java.nio.file.Path as an alternative to those methods in the API that deal with a java.io.File.

This is pretty trivial for File as a param, as new overloaded methods/constructors can be added that accept a Path.

A few questions arise:
1) What can we do with methods returning a File? e.g. TemporaryResources.createTemporaryFile, TikaInputStream.getFile. Should we break compatibility and encourage (=force) users to change their code (Note that since they all use Java 7 now, the change is minimal by adding .toFile() to the result), or create new methods with different names (confusing)?
2) Should we deprecate the old methods accepting a File, or delete them?

I’m ready to open an issue and provide patches.

*Yaniv Kunda* Technical Lead yaniv.ku...@answers.com *p* +972 (3) 7661819 *m* +972 (54) 4644456
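Archive note: the two options raised in this message (overloads for File parameters, differently named methods for File return values) can be sketched on a made-up class. PathBackedResource is hypothetical and is not a Tika class; the method names getPath/getFile simply mirror the naming discussed in the thread.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class for illustration; not part of the Tika API.
class PathBackedResource {
    private final Path path;

    // New overload accepting the NIO.2 type.
    PathBackedResource(Path path) {
        this.path = path;
    }

    // Existing File-accepting constructor stays, delegating to the new one.
    PathBackedResource(File file) {
        this(file.toPath());
    }

    // New method with a different name, since Java cannot overload on return type.
    Path getPath() {
        return path;
    }

    // Old method kept for compatibility; it delegates to the new method.
    @Deprecated
    File getFile() {
        return getPath().toFile();
    }

    public static void main(String[] args) {
        PathBackedResource resource = new PathBackedResource(new File("example.txt"));
        System.out.println(resource.getPath().equals(Paths.get("example.txt")));
    }
}
```

Existing callers keep compiling against the File-based members, while new code can stay on Path end to end.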
[jira] [Updated] (TIKA-1722) Tika methods that accept a File needlessly convert it to a URL
[ https://issues.apache.org/jira/browse/TIKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1722:
--
Attachment: TIKA-1722.patch

Tika methods that accept a File needlessly convert it to a URL
--
Key: TIKA-1722
URL: https://issues.apache.org/jira/browse/TIKA-1722
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11
Attachments: TIKA-1722.patch

The following methods:
- Tika.detect(File)
- Tika.parse(File)
- Tika.parseToString(File)
Convert the given File to a URL and use the corresponding overloaded method that accepts a URL.

This seems like a shortcut, but essentially does the following:
# Converts the file to a URI
# Converts the URI to a URL
# Calls TikaInputStream.get(URL, Metadata), which then performs the following special handling:
# Checks if the protocol is file
# Tries to convert the URL (back) to a URI
# Creates a File around the URI
# Checks if file.isFile()
# Calls TikaInputStream.get(File, Metadata)

The special handling in TikaInputStream.get(URL/URI) is a good optimization for in-the-wild file resources, but for internal uses it can be skipped - making Tika call TikaInputStream.get(File, Metadata) directly.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
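Archive note: the round trip described in the issue can be reproduced with JDK classes alone. The sketch below uses a hypothetical class FileUrlRoundTrip and calls no Tika code; comments mark which steps the issue attributes to Tika. It shows that the conversions end where they started, with the same File.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Illustration only: mirrors the conversions listed in TIKA-1722 without calling Tika.
class FileUrlRoundTrip {

    static File viaUrl(File file) {
        try {
            URI uri = file.toURI();              // File -> URI
            URL url = uri.toURL();               // URI -> URL
            // The issue says the URL-based method then checks the protocol...
            if (!"file".equalsIgnoreCase(url.getProtocol())) {
                throw new IllegalArgumentException("not a file URL: " + url);
            }
            // ...converts the URL back to a URI and wraps it in a File again.
            return new File(url.toURI());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        File original = new File("tika-1722-example.txt").getAbsoluteFile();
        System.out.println(viaUrl(original).equals(original));
    }
}
```

Since the result equals the input, passing the File straight through skips several object allocations and two places where a conversion exception would have to be handled.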
[jira] [Created] (TIKA-1722) Tika methods that accept a File needlessly convert it to a URL
Yaniv Kunda created TIKA-1722: - Summary: Tika methods that accept a File needlessly convert it to a URL Key: TIKA-1722 URL: https://issues.apache.org/jira/browse/TIKA-1722 Project: Tika Issue Type: Improvement Components: core Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11
[jira] [Updated] (TIKA-1711) Remove java6-activated profile from tika-bundle and move its plugins to default build
[ https://issues.apache.org/jira/browse/TIKA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1711: -- Attachment: (was: TIKA-1711.patch) Remove java6-activated profile from tika-bundle and move its plugins to default build - Key: TIKA-1711 URL: https://issues.apache.org/jira/browse/TIKA-1711 Project: Tika Issue Type: Bug Components: general Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Since the project now requires Java 7, there's no point in allowing Java 6+ since the build would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1711) Remove java6-activated profile from tika-bundle and move its plugins to default build
[ https://issues.apache.org/jira/browse/TIKA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1711: -- Summary: Remove java6-activated profile from tika-bundle and move its plugins to default build (was: Modify tika-bundle profile activation to require Java 7) Remove java6-activated profile from tika-bundle and move its plugins to default build - Key: TIKA-1711 URL: https://issues.apache.org/jira/browse/TIKA-1711 Project: Tika Issue Type: Bug Components: general Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11
[jira] [Updated] (TIKA-1711) Remove java6-activated profile from tika-bundle and move its plugins to default build
[ https://issues.apache.org/jira/browse/TIKA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1711: -- Attachment: TIKA-1711.patch Revised patch for the revised purpose Remove java6-activated profile from tika-bundle and move its plugins to default build - Key: TIKA-1711 URL: https://issues.apache.org/jira/browse/TIKA-1711 Project: Tika Issue Type: Bug Components: general Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Attachments: TIKA-1711.patch
[jira] [Created] (TIKA-1719) Utilize try-with-resources where it is trivial
Yaniv Kunda created TIKA-1719:
-
Summary: Utilize try-with-resources where it is trivial
Key: TIKA-1719
URL: https://issues.apache.org/jira/browse/TIKA-1719
Project: Tika
Issue Type: Improvement
Components: cli, core, example, gui, packaging, parser, server
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

The following type of resource usages:
{code}
AutoCloseable resource = ...;
try {
    // do something with resource
} finally {
    resource.close();
}
{code}
{code}
AutoCloseable resource = null;
try {
    resource = ...;
    // do something with resource
} finally {
    if (resource != null) {
        resource.close();
    }
}
{code}
and similar constructs can be trivially replaced with Java 7's try-with-resources statement:
{code}
try (AutoCloseable resource = ...) {
    // do something with resource
}
{code}
This brings more concise code with less chance of causing resource leaks.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
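Archive note: the snippets in the issue use "..." placeholders, so here is a compilable sketch of the same idea. TryWithResourcesExample and TrackedResource are hypothetical names; the tracked resource records whether close() ran, which makes it easy to see that the resource is closed even when the body throws.

```java
// Illustration only: shows that try-with-resources closes the resource on failure.
class TryWithResourcesExample {

    // Hypothetical resource that remembers whether it was closed.
    static final class TrackedResource implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Uses the resource in a body that always fails, then returns it for inspection.
    static TrackedResource useAndFail() {
        TrackedResource tracked = new TrackedResource();
        try (TrackedResource resource = tracked) {
            throw new IllegalStateException("simulated failure while using " + resource);
        } catch (IllegalStateException e) {
            // By the time a catch block runs, the resource has already been closed.
        }
        return tracked;
    }

    public static void main(String[] args) {
        System.out.println("closed after failure: " + useAndFail().closed);
    }
}
```

One behavioural difference from the try/finally form is worth keeping in mind when reviewing such conversions: if both the body and close() throw, try-with-resources propagates the body's exception and attaches the close() failure as a suppressed exception, whereas the hand-written finally block lets the close() failure replace the original one.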
[jira] [Commented] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705085#comment-14705085 ]

Yaniv Kunda commented on TIKA-1710:
---

As much as I like Guava (the library, not the fruit), its only use was its com.google.common.base.Charsets class, containing constants for the Charset instances of the standard charsets - same as in Java's StandardCharsets. When I replaced this with the static imports of StandardCharsets, there was no use left.

Regarding TaggedInputStream, I wasn't sure what to do - this wrap/cast method was a modification of the original commons-io code, and it was used only once - in RFC822Parser. I think it's a nice-to-have optimization helper method but nothing more - as it only saves the cost of a new TaggedInputStream when the source InputStream is already a TaggedInputStream: the checked tag will behave the same way in the same wrap-try-catch flow.

The only other usage of TaggedInputStream in tika (besides by TikaInputStream) is in RTFParser, which uses the constructor directly and is actually an empty usage - the TaggedInputStream is constructed and checked in the catch clause, but it is not used in the try block at all: the underlying stream is used instead!

Since almost all of tika uses TikaInputStream (which has an advanced version of this helper, ensuring buffering), my opinion is to refrain from adding a helper method and simply use the constructor directly, for simplicity.

Replace usages of classes in org.apache.tika.io with current alternatives
-
Key: TIKA-1710
URL: https://issues.apache.org/jira/browse/TIKA-1710
Project: Tika
Issue Type: Improvement
Components: batch, cli, core, example, gui, parser, server, translation
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11
Attachments: TIKA-1710.patch

Many of the classes in org.apache.tika.io were inlined from commons-io in TIKA-249, but these days most components use commons-io anyway, so in order to clean the dependencies on org.apache.tika.io in preparation of adding commons-io to tika-core, the following can be done:
- Replace usages of classes in org.apache.tika.io within non-core components with the corresponding classes in commons-io
- Replace usages of org.apache.tika.io.IOUtils.UTF_8 with java.nio.charset.StandardCharsets.UTF_8 (in all components, including tika-core)
- Replace other uses of String encoding names of standard charsets with their corresponding Charset instances from StandardCharsets (this is logically related to IOUtils, as these constants should have been there, as UTF_8 was, before Java 7)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
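Archive note: the last bullet of the issue (Charset instances instead of encoding names) comes down to the difference sketched below. CharsetExample is a hypothetical class that uses only JDK APIs; both methods produce the same bytes, but the name-based variant forces callers to handle a checked exception that cannot occur for a standard charset.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: encoding by charset name versus by Charset constant.
class CharsetExample {

    // Name-based lookup: requires handling a checked exception.
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new AssertionError("UTF-8 must be supported", e);
        }
    }

    // Constant-based: no lookup by name and no checked exception.
    static byte[] encodeByCharset(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Tika \u00e9";
        System.out.println(Arrays.equals(encodeByName(text), encodeByCharset(text)));
    }
}
```

The constant-based form also removes the risk of a typo in the encoding name going unnoticed until runtime.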
[jira] [Updated] (TIKA-1719) Utilize try-with-resources where it is trivial
[ https://issues.apache.org/jira/browse/TIKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1719: -- Attachment: TIKA-1719.patch Utilize try-with-resources where it is trivial -- Key: TIKA-1719 URL: https://issues.apache.org/jira/browse/TIKA-1719 Project: Tika Issue Type: Improvement Components: cli, core, example, gui, packaging, parser, server Reporter: Yaniv Kunda Priority: Minor Labels: easyfix Fix For: 1.11 Attachments: TIKA-1719.patch
[jira] [Updated] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1710: -- Attachment: (was: TIKA-1710.patch) Replace usages of classes in org.apache.tika.io with current alternatives - Key: TIKA-1710 URL: https://issues.apache.org/jira/browse/TIKA-1710 Project: Tika Issue Type: Improvement Components: batch, cli, core, example, gui, parser, server, translation Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Attachments: TIKA-1710.patch
[jira] [Updated] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1710: -- Attachment: TIKA-1710.patch Replace usages of classes in org.apache.tika.io with current alternatives - Key: TIKA-1710 URL: https://issues.apache.org/jira/browse/TIKA-1710 Project: Tika Issue Type: Improvement Components: batch, cli, core, example, gui, parser, server, translation Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Attachments: TIKA-1710.patch
[jira] [Created] (TIKA-1711) Modify tika-bundle profile activation to require Java 7
Yaniv Kunda created TIKA-1711: - Summary: Modify tika-bundle profile activation to require Java 7 Key: TIKA-1711 URL: https://issues.apache.org/jira/browse/TIKA-1711 Project: Tika Issue Type: Bug Components: general Reporter: Yaniv Kunda Priority: Minor Fix For: 1.11 Since the project now requires Java 7, there's no point in allowing Java 6+ since the build would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1710:
------------------------------
Attachment: (was: TIKA-1710.patch)

Replace usages of classes in org.apache.tika.io with current alternatives
-------------------------------------------------------------------------

Key: TIKA-1710
URL: https://issues.apache.org/jira/browse/TIKA-1710
Project: Tika
Issue Type: Improvement
Components: batch, cli, core, example, gui, parser, server, translation
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

Many of the classes in org.apache.tika.io were inlined from commons-io in TIKA-249, but these days most components use commons-io anyway, so in order to clean the dependencies on org.apache.tika.io in preparation of adding commons-io to tika-core, the following can be done:
- Replace usages of classes in org.apache.tika.io within non-core components with the corresponding classes in commons-io
- Replace usages of org.apache.tika.io.IOUtils.UTF_8 with java.nio.charset.StandardCharsets.UTF_8 (in all components, including tika-core)
- Replace other uses of String encoding names of standard charsets with their corresponding Charsets instances from StandardCharsets (this is logically related to IOUtils as these constants should have been there as UTF_8 was before Java 7)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1710:
------------------------------
Attachment: TIKA-1710.patch

Revised patch without StandardCharsets wildcard static imports

Replace usages of classes in org.apache.tika.io with current alternatives
-------------------------------------------------------------------------

Key: TIKA-1710
URL: https://issues.apache.org/jira/browse/TIKA-1710
Project: Tika
Issue Type: Improvement
Components: batch, cli, core, example, gui, parser, server, translation
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11
Attachments: TIKA-1710.patch

Many of the classes in org.apache.tika.io were inlined from commons-io in TIKA-249, but these days most components use commons-io anyway, so in order to clean the dependencies on org.apache.tika.io in preparation of adding commons-io to tika-core, the following can be done:
- Replace usages of classes in org.apache.tika.io within non-core components with the corresponding classes in commons-io
- Replace usages of org.apache.tika.io.IOUtils.UTF_8 with java.nio.charset.StandardCharsets.UTF_8 (in all components, including tika-core)
- Replace other uses of String encoding names of standard charsets with their corresponding Charsets instances from StandardCharsets (this is logically related to IOUtils as these constants should have been there as UTF_8 was before Java 7)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1706:
------------------------------
Comment: was deleted

(was: A patch to bring back commons-io to tika-core and replace all formerly inlined classes.)

Bring back commons-io to tika-core
----------------------------------

Key: TIKA-1706
URL: https://issues.apache.org/jira/browse/TIKA-1706
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
I believe these arguments are weaker nowadays due to the following concerns:
- Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
- Since some modules use both tika-core and commons-io, it's not clear which code should be used
- Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance)
- Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698477#comment-14698477 ]

Yaniv Kunda commented on TIKA-1706:
-----------------------------------

I've separated out all the related changes, other than adding commons-io to tika-core itself, and opened them under TIKA-1710.
In addition, the recently added commons-io-unsafe check has now found a couple more default-encoding usages:
tika-core: src\main\java\org\apache\tika\Tika.java
tika-server: src\test\java\org\apache\tika\server\CXFTestBase.java

Bring back commons-io to tika-core
----------------------------------

Key: TIKA-1706
URL: https://issues.apache.org/jira/browse/TIKA-1706
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
I believe these arguments are weaker nowadays due to the following concerns:
- Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
- Since some modules use both tika-core and commons-io, it's not clear which code should be used
- Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance)
- Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives
Yaniv Kunda created TIKA-1710:
------------------------------

Summary: Replace usages of classes in org.apache.tika.io with current alternatives
Key: TIKA-1710
URL: https://issues.apache.org/jira/browse/TIKA-1710
Project: Tika
Issue Type: Improvement
Components: batch, cli, core, example, gui, parser, server, translation
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

Many of the classes in org.apache.tika.io were inlined from commons-io in TIKA-249, but these days most components use commons-io anyway, so in order to clean the dependencies on org.apache.tika.io in preparation of adding commons-io to tika-core, the following can be done:
- Replace usages of classes in org.apache.tika.io within non-core components with the corresponding classes in commons-io
- Replace usages of org.apache.tika.io.IOUtils.UTF_8 with java.nio.charset.StandardCharsets.UTF_8 (in all components, including tika-core)
- Replace other uses of String encoding names of standard charsets with their corresponding Charsets instances from StandardCharsets (this is logically related to IOUtils as these constants should have been there as UTF_8 was before Java 7)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
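The charset bullets in TIKA-1710 can be illustrated with a minimal sketch that uses only the JDK. It is not taken from the attached patch; the class and method names (CharsetExample, encodeByName, encodeByConstant, decode) are hypothetical and exist only for illustration. It contrasts looking a charset up by its String name with using the java.nio.charset.StandardCharsets constant available since Java 7.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Tika or commons-io.
final class CharsetExample {

    // Older style: the charset is resolved by name at run time, so callers
    // must handle a checked exception even for a charset every JVM supports.
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every JVM", e);
        }
    }

    // Java 7+ style: a Charset constant, no name lookup, no checked exception.
    static byte[] encodeByConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding with the same constant.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "Tika \u00e9";
        byte[] byName = encodeByName(sample);
        byte[] byConstant = encodeByConstant(sample);
        // Both styles produce the same bytes; only the call site differs.
        System.out.println(Arrays.equals(byName, byConstant));
        System.out.println(decode(byConstant).equals(sample));
    }
}
```

The two encode methods are interchangeable in behaviour; the constant-based form simply removes the try/catch and the risk of a misspelled encoding name.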
[jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1706:
------------------------------
Attachment: TIKA-1706.patch

A patch to bring back commons-io to tika-core and replace all formerly inlined classes.

Bring back commons-io to tika-core
----------------------------------

Key: TIKA-1706
URL: https://issues.apache.org/jira/browse/TIKA-1706
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11
Attachments: TIKA-1706.patch

TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
I believe these arguments are weaker nowadays due to the following concerns:
- Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
- Since some modules use both tika-core and commons-io, it's not clear which code should be used
- Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance)
- Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696025#comment-14696025 ]

Yaniv Kunda commented on TIKA-1706:
-----------------------------------

I agree that generally adding an external dependency to a core module might have an impact, but consider that unlike tika-core, commons-io is a true low-level library: it has no compile-time dependencies and is used by 2500 projects in Maven Central alone.
I believe that copying the code of another library, frozen in time (in this case since 2008), hinders innovation and reduces the chance that anyone will utilize new improvements and fixes in newer commons-io, since:
# it is disconnected from Tika and requires manual discovery and research (if commons-io is used as an external dependency, it's easy to find deprecated methods and their replacements using static analysis)
# it requires manual maintenance in the form of copying select classes/code

It's not easy to sum up more than 7 years of changes in commons-io, but here are some beneficial changes I found along the way:
- Use org.apache.commons.io.output.ByteArrayOutputStream instead of java.io.ByteArrayOutputStream (this class is actually not that new, but can benefit many uses and save a lot of byte-copying)
  - this has been further improved by providing an optimized InputStream from an org.apache.commons.io.output.ByteArrayOutputStream (IO-137)
- Allow using Charset instead of String encoding (IO-318)
- Use StringBuilderWriter instead of StringWriter to avoid unnecessary synchronization (IO-140)

Obviously, I did not propose this change just for the sake of disrupting the peace: I have planned and started a series of patches that utilize newer commons-io. They will follow, each in its own issue, if and when commons-io is added as a dependency to tika-core.

Bring back commons-io to tika-core
----------------------------------

Key: TIKA-1706
URL: https://issues.apache.org/jira/browse/TIKA-1706
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
I believe these arguments are weaker nowadays due to the following concerns:
- Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
- Since some modules use both tika-core and commons-io, it's not clear which code should be used
- Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance)
- Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
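The byte-copying remark in the comment on java.io.ByteArrayOutputStream can be made concrete with a small JDK-only sketch. It deliberately does not call commons-io; it only shows the copy that the JDK class makes on every toByteArray() call, which is the cost the comment says the commons-io class can reduce. The class and method names (ByteCopyExample, buffer, copiesOnEachCall, reread) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; uses only JDK types.
final class ByteCopyExample {

    // Fill a JDK ByteArrayOutputStream with the UTF-8 bytes of the text.
    static ByteArrayOutputStream buffer(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out;
    }

    // toByteArray() allocates a new array each time: the two results have
    // equal contents but are distinct objects.
    static boolean copiesOnEachCall(ByteArrayOutputStream out) {
        byte[] first = out.toByteArray();
        byte[] second = out.toByteArray();
        return first != second && Arrays.equals(first, second);
    }

    // Re-reading buffered data with the JDK alone therefore costs one full
    // copy of the buffer before the first byte can be read back.
    static ByteArrayInputStream reread(ByteArrayOutputStream out) {
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = buffer("parsed content");
        System.out.println(copiesOnEachCall(out));
        System.out.println(reread(out).available());
    }
}
```

For large documents buffered in memory, that extra copy doubles the peak memory needed at the moment of re-reading, which is why a stream that can expose its internal buffers as an InputStream is attractive.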
[jira] [Created] (TIKA-1706) Bring back commons-io to tika-core
Yaniv Kunda created TIKA-1706:
------------------------------

Summary: Bring back commons-io to tika-core
Key: TIKA-1706
URL: https://issues.apache.org/jira/browse/TIKA-1706
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Yaniv Kunda
Priority: Minor
Fix For: 1.11

TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
I believe these arguments are weaker nowadays due to the following concerns:
- Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
- Since some modules use both tika-core and commons-io, it's not clear which code should be used
- Having the inlined classes causes more maintenance and/or technology debt (which in turn causes more maintenance)
- Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)