[jira] [Commented] (TIKA-3080) CharsetMatch.getString can get stuck in infinite loop

    [ https://issues.apache.org/jira/browse/TIKA-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073050#comment-17073050 ]

Hudson commented on TIKA-3080:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika-trunk #1799 (See [https://builds.apache.org/job/Tika-trunk/1799/])
TIKA-3080 -- prevent infinite loop in CharsetMatch.getString (tallison: [https://github.com/apache/tika/commit/8e33e28b72b791710a1e9fdf515c2fcd72f82deb])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/txt/CharsetMatch.java

> CharsetMatch.getString can get stuck in infinite loop
> -----------------------------------------------------
>
>                 Key: TIKA-3080
>                 URL: https://issues.apache.org/jira/browse/TIKA-3080
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24
>            Reporter: Vikram Shrowty
>            Priority: Major
>
> In here:
> [https://github.com/apache/tika/blob/fb5a191edac2cef28c0a4ac390c9156acdc9e673/tika-parsers/src/main/java/org/apache/tika/parser/txt/CharsetMatch.java#L147-L150]
> If you specify a maxLength and the stream is long enough, the max variable in
> the loop goes to zero and the loop then gets stuck because you're asking to
> read 0 bytes but not exiting unless the number of bytes read is < 0.
> Looks like the condition ought to be > 0 instead of >= 0.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
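
The failure mode described in TIKA-3080 can be sketched in isolation. The class below is not the Tika source: `BoundedReadSketch` and `readUpTo` are hypothetical names, and the loop shape is only assumed from the issue description. It shows why testing the read count with `> 0` ends the loop once the remaining budget reaches zero, whereas `>= 0` would keep requesting zero characters forever.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration of the loop condition discussed in TIKA-3080;
// this is NOT the code in CharsetMatch.java.
class BoundedReadSketch {

    // Reads at most maxLength characters from the reader.
    static String readUpTo(Reader reader, int maxLength) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int max = maxLength;
        int charsRead;
        // read() returns -1 at end of stream and 0 when asked for 0 chars.
        // Testing "> 0" exits in both cases; testing ">= 0" would spin
        // forever once max has dropped to zero.
        while ((charsRead = reader.read(buffer, 0, Math.min(max, buffer.length))) > 0) {
            sb.append(buffer, 0, charsRead);
            max -= charsRead;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The stream is longer than maxLength, which is the case from the report.
        System.out.println(readUpTo(new StringReader("abcdefghij"), 4));
    }
}
```

An explicit `max > 0` check in the loop condition would work as well; the sketch uses the one-character change suggested in the report.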
[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

    [ https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073020#comment-17073020 ]

Konstantin Gribov commented on TIKA-3082:
-----------------------------------------

Also, we could later add client modules for a couple of popular libraries to give downstream users ready-to-fly libs with dependencies already declared.

> Consider adding an OpenAPI for tika-server
> ------------------------------------------
>
>                 Key: TIKA-3082
>                 URL: https://issues.apache.org/jira/browse/TIKA-3082
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Lewis John McGibbney
>            Priority: Major
>
> On TIKA-2253, [~lewismc] asked:
> bq. I was planning on putting together an OpenAPI specification for Tika. Is
> anyone in favor of this?
> What do people think? How much will it change the current tika-server? What
> are the benefits?
[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

    [ https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073017#comment-17073017 ]

Konstantin Gribov commented on TIKA-3082:
-----------------------------------------

[~lewismc], my gratitude and a big +1 then.

In my experience some OpenAPI/Swagger tools are quite fragile (for example, swagger-codegen can break on a minor version update), but overall I'm very inclined to use it since it gives us better maintainability, documentation generation, and easier API versioning.

Also, I'd like to propose moving the current APIs to a versioned namespace {{/api/v1/*}} and redirecting the existing methods (like {{/meta}}, {{/rmeta}}, etc.) there with HTTP status 301.

BTW, JetBrains IDEA has a bundled OpenAPI plugin (at least 2020.1 RC does).
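
The redirect idea in the comment above can be illustrated with a small sketch. It uses the JDK's built-in `com.sun.net.httpserver` package rather than tika-server's actual JAX-RS stack. The class name, the `versionedLocation` helper and the choice of registered paths are assumptions made for illustration; only the `/api/v1` prefix, the `/meta` and `/rmeta` paths and the 301 status come from the proposal.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical sketch: legacy endpoints answer with 301 and point at the
// versioned namespace. Not taken from tika-server.
class LegacyRedirectSketch {

    static final String VERSION_PREFIX = "/api/v1";

    // Maps a legacy path such as /meta to /api/v1/meta, keeping any query string.
    static String versionedLocation(String legacyPath, String rawQuery) {
        String target = VERSION_PREFIX + legacyPath;
        return rawQuery == null ? target : target + "?" + rawQuery;
    }

    // Sends "301 Moved Permanently" with a Location header and no body.
    static void redirect(HttpExchange exchange) throws IOException {
        URI uri = exchange.getRequestURI();
        exchange.getResponseHeaders()
                .add("Location", versionedLocation(uri.getRawPath(), uri.getRawQuery()));
        exchange.sendResponseHeaders(301, -1); // -1: no response body
        exchange.close();
    }

    // Starts a server that only knows the legacy paths and redirects them.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/meta", LegacyRedirectSketch::redirect);
        server.createContext("/rmeta", LegacyRedirectSketch::redirect);
        server.start();
        return server;
    }
}
```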
[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

    [ https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073000#comment-17073000 ]

Lewis John McGibbney commented on TIKA-3082:
--------------------------------------------

I'm going to start working on an OpenAPI and I will post it here when complete.
[jira] [Assigned] (TIKA-3082) Consider adding an OpenAPI for tika-server

     [ https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney reassigned TIKA-3082:
------------------------------------------

    Assignee: Lewis John McGibbney
[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

    [ https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072998#comment-17072998 ]

Lewis John McGibbney commented on TIKA-3082:
--------------------------------------------

[~grossws] absolutely.
# Firstly, OpenAPI is the industry standard for API-first implementations, meaning that moving forward the goal would be for our REST API to be more user and developer friendly.
# OpenAPI has a wide variety of tooling, meaning that people could generate both server stub implementations and client implementations on the fly at their own convenience whilst we (the dev@tika community) maintain the existing tika-server Java implementation.
# There are several Java server stub generation options for us to use, namely
{code:java}
java-inflector
java-msf4j
java-pkmst
java-play-framework
java-undertow-server
java-vertx
java-vertx-web (beta)
jaxrs-cxf
jaxrs-cxf-cdi
jaxrs-cxf-extended
jaxrs-jersey
jaxrs-resteasy
jaxrs-resteasy-eap
jaxrs-spec
{code}
... I think we would choose *jaxrs-cxf* in an attempt to cause minimal impact on the existing tika-server code. wdyt?
# I tend to use [IBM's OpenAPI linter and validator|https://github.com/IBM/openapi-validator] in an attempt to obtain consistency in the quality of my REST API development. I think that this tool would make it easier for us to ensure we have adequate documentation coverage for all parameters, responses, headers, paths, exceptions, etc.
[jira] [Commented] (TIKA-3081) TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()

    [ https://issues.apache.org/jira/browse/TIKA-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072993#comment-17072993 ]

Hudson commented on TIKA-3081:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1798 (See [https://builds.apache.org/job/Tika-trunk/1798/])
TIKA-3081 -- convert TikaInputStream's skip to the equivalent of (tallison: [https://github.com/apache/tika/commit/aaa9f40e3c8119f1a155e3f1eea5c2ffe7f4f26f])
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNotePtr.java

> TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()
> -------------------------------------------------------------------------
>
>                 Key: TIKA-3081
>                 URL: https://issues.apache.org/jira/browse/TIKA-3081
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Some parsers may assume that skip() returns the number of bytes actually
> skipped. As we've learned, FileInputStream's return value can be completely
> divorced from reality, and it can report that the stream is skipping even
> past the EOF.
> If we convert TikaInputStream's skip() to something that will throw an
> exception if a 3rd party parser tries to skip past the end of a file, we may
> prevent an entire class of bugs.
[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

    [ https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072957#comment-17072957 ]

Konstantin Gribov commented on TIKA-3082:
-----------------------------------------

[~lewismc], could you please clarify how you wish to use the OpenAPI spec? Such a spec could be used to generate client libraries and stubs for JAX-RS, or it could itself be generated from additional annotations on, say, JAX-RS services.

Both solutions are viable, but the choice certainly depends on your goals in introducing OpenAPI. Both have pros and cons, so I hope you'll have some time to expand on your original idea.
[jira] [Commented] (TIKA-2253) Obtain new Miredot license key and upgrade plugin version in tika-server

    [ https://issues.apache.org/jira/browse/TIKA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072945#comment-17072945 ]

Tim Allison commented on TIKA-2253:
-----------------------------------

Opened TIKA-3082 to track this discussion/feature.

> Obtain new Miredot license key and upgrade plugin version in tika-server
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2253
>                 URL: https://issues.apache.org/jira/browse/TIKA-2253
>             Project: Tika
>          Issue Type: Task
>          Components: documentation, server
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.15
>
>
> As per our recent mailing list conversation
> http://www.mail-archive.com/dev%40tika.apache.org/msg20558.html our Miredot
> license has expired.
> The kind folks over at Miredot have provided us with a new key; it is valid
> until January 31st, 2020, after which we are free to request a new key.
> Thanks Miredot!
> PR coming.
[jira] [Created] (TIKA-3082) Consider adding an OpenAPI for tika-server
Tim Allison created TIKA-3082:
---------------------------------

             Summary: Consider adding an OpenAPI for tika-server
                 Key: TIKA-3082
                 URL: https://issues.apache.org/jira/browse/TIKA-3082
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


On TIKA-2253, [~lewismc] asked:
bq. I was planning on putting together an OpenAPI specification for Tika. Is anyone in favor of this?

What do people think? How much will it change the current tika-server? What are the benefits?
[jira] [Created] (TIKA-3081) TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()
Tim Allison created TIKA-3081:
---------------------------------

             Summary: TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()
                 Key: TIKA-3081
                 URL: https://issues.apache.org/jira/browse/TIKA-3081
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


Some parsers may assume that skip() returns the number of bytes actually skipped. As we've learned, FileInputStream's return value can be completely divorced from reality, and it can report that the stream is skipping even past the EOF.

If we convert TikaInputStream's skip() to something that will throw an exception if a 3rd party parser tries to skip past the end of a file, we may prevent an entire class of bugs.
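
As a rough illustration of the behaviour TIKA-3081 asks for, the sketch below skips by actually reading bytes and throws if the stream ends early. It is not the change committed to TikaInputStream: `SkipFullySketch` and its `skipFully` method are hypothetical names, and reading into a scratch buffer is just one way to make the skipped count trustworthy.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a "skip fully or fail" helper; NOT the Tika implementation.
class SkipFullySketch {

    // Skips exactly n bytes, or throws EOFException if the stream ends first.
    static void skipFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[4096];
        long remaining = n;
        while (remaining > 0) {
            // Read instead of calling skip(), so the count reflects bytes
            // that were really consumed from the stream.
            int read = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (read < 0) {
                throw new EOFException("tried to skip " + n + " bytes, but the stream ended with "
                        + remaining + " bytes still to skip");
            }
            remaining -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        skipFully(in, 4);
        System.out.println(in.read()); // next unread byte
    }
}
```

With a helper like this, a parser that asks to skip past the end of the file gets an exception immediately instead of continuing with a wrong offset.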