[jira] [Commented] (TIKA-3080) CharsetMatch.getString can get stuck in infinite loop

2020-04-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073050#comment-17073050
 ] 

Hudson commented on TIKA-3080:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1799 (See 
[https://builds.apache.org/job/Tika-trunk/1799/])
TIKA-3080 -- prevent infinite loop in CharsetMatch.getString (tallison: 
[https://github.com/apache/tika/commit/8e33e28b72b791710a1e9fdf515c2fcd72f82deb])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/txt/CharsetMatch.java


> CharsetMatch.getString can get stuck in infinite loop
> -
>
> Key: TIKA-3080
> URL: https://issues.apache.org/jira/browse/TIKA-3080
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.24
>Reporter: Vikram Shrowty
>Priority: Major
>
> In here:
> [https://github.com/apache/tika/blob/fb5a191edac2cef28c0a4ac390c9156acdc9e673/tika-parsers/src/main/java/org/apache/tika/parser/txt/CharsetMatch.java#L147-L150]
> If you specify a maxLength and the stream is long enough, the max variable in 
> the loop goes to zero and the loop then gets stuck: it keeps asking to read 
> 0 bytes, but it only exits when the number of bytes read is < 0. 
> It looks like the condition ought to be > 0 instead of >= 0.
>  
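The loop described in the report can be sketched as follows. This is a minimal illustration modelled on the description, not Tika's actual CharsetMatch code; the class `BoundedReadSketch`, the helper `readUpTo`, and the "negative maxLength means no limit" convention are assumptions. The key point is that `Reader.read(buf, off, 0)` returns 0 rather than -1, so a `>= 0` loop condition never terminates once the remaining budget reaches zero.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper illustrating the TIKA-3080 loop; not the real CharsetMatch code.
final class BoundedReadSketch {

    // Reads at most maxLength chars; a negative maxLength means "no limit" (assumed convention).
    static String readUpTo(Reader reader, int maxLength) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int max = maxLength < 0 ? Integer.MAX_VALUE : maxLength;
        int charsRead;
        // With ">= 0", once max hits 0 every call is read(buffer, 0, 0), which returns 0
        // (not -1), so the loop spins forever. "> 0" exits both at EOF (-1) and when the
        // character budget is exhausted (0).
        while ((charsRead = reader.read(buffer, 0, Math.min(max, buffer.length))) > 0) {
            sb.append(buffer, 0, charsRead);
            max -= charsRead;
        }
        return sb.toString();
    }
}
```

For example, `readUpTo(new StringReader("hello world"), 5)` stops after "hello" instead of hanging.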



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073020#comment-17073020
 ] 

Konstantin Gribov commented on TIKA-3082:
-

Also, we could later add client modules for a couple of popular libraries to give 
downstream users ready-to-use libraries with their dependencies already declared.
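To make the client-module idea a little more concrete, here is a hand-written sketch of the kind of thin Java client such a module might wrap, using only the JDK's `java.net.http` API (Java 11+). It assumes a tika-server at `http://localhost:9998/` whose `/meta` endpoint accepts a PUT of the raw document and can return JSON; the class `TikaServerClientSketch` and its methods are hypothetical, and a client generated from an OpenAPI spec would look different.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical hand-written client; endpoint and content type are assumptions.
final class TikaServerClientSketch {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI baseUri; // expected to end with "/", e.g. http://localhost:9998/

    TikaServerClientSketch(URI baseUri) {
        this.baseUri = baseUri;
    }

    // Builds (but does not send) a PUT that uploads the file to <base>/meta, asking for JSON.
    HttpRequest metaRequest(Path file) throws FileNotFoundException {
        return HttpRequest.newBuilder(baseUri.resolve("meta"))
                .header("Accept", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    // Sends the request and returns the response body, failing on any non-200 status.
    String fetchMetadata(Path file) throws IOException, InterruptedException {
        HttpResponse<String> response =
                http.send(metaRequest(file), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Separating request construction from sending keeps the sketch testable without a running server.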

> Consider adding an OpenAPI for tika-server
> --
>
> Key: TIKA-3082
> URL: https://issues.apache.org/jira/browse/TIKA-3082
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Lewis John McGibbney
>Priority: Major
>
> On TIKA-2253, [~lewismc] asked:
> bq. I was planning on putting together an OpenAPI specification for Tika. Is 
> anyone in favor of this?
> What do people think?  How much will it change the current tika-server?  What 
> are the benefits?





[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073017#comment-17073017
 ] 

Konstantin Gribov commented on TIKA-3082:
-

[~lewismc], my gratitude, and a big +1 then :)

In my experience, some OpenAPI/Swagger tools are quite fragile (e.g., 
swagger-codegen can break on a minor version update), but overall I'm very 
inclined to use OpenAPI, since it gives us better maintainability, documentation 
generation, and easier API versioning.

Also, I'd like to propose moving the current APIs to a versioned namespace, 
{{/api/v1/*}}, and redirecting the existing endpoints (like {{/meta}}, {{/rmeta}}, 
etc.) there with HTTP status 301.
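The redirect proposal can be sketched as a path mapping plus a handler. This is an illustration only: it uses the JDK's built-in `com.sun.net.httpserver` so that it stays self-contained, whereas tika-server itself would presumably do this in JAX-RS. The class name, the `LEGACY_ROOTS` set (only {{/meta}} and {{/rmeta}} are named in the comment), and the choice to preserve the query string are assumptions.

```java
import com.sun.net.httpserver.HttpHandler;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of redirecting legacy endpoints into /api/v1.
final class LegacyRedirectSketch {
    // Assumed list of legacy top-level endpoints; extend with the real ones.
    static final Set<String> LEGACY_ROOTS = Set.of("meta", "rmeta");
    static final String VERSIONED_PREFIX = "/api/v1";

    // Maps e.g. "/rmeta/text" to "/api/v1/rmeta/text"; returns empty for anything else.
    static Optional<String> redirectTarget(String path) {
        if (path == null || !path.startsWith("/")) {
            return Optional.empty();
        }
        String rest = path.substring(1);
        int slash = rest.indexOf('/');
        String root = slash < 0 ? rest : rest.substring(0, slash);
        return LEGACY_ROOTS.contains(root) ? Optional.of(VERSIONED_PREFIX + path) : Optional.empty();
    }

    // Answers legacy paths with 301 + Location (keeping any query string), others with 404.
    static final HttpHandler REDIRECT_HANDLER = exchange -> {
        Optional<String> target = redirectTarget(exchange.getRequestURI().getRawPath());
        if (target.isPresent()) {
            String query = exchange.getRequestURI().getRawQuery();
            String location = query == null ? target.get() : target.get() + "?" + query;
            exchange.getResponseHeaders().set("Location", location);
            exchange.sendResponseHeaders(301, -1); // -1: no response body
        } else {
            exchange.sendResponseHeaders(404, -1);
        }
        exchange.close();
    };
}
```

One design point worth weighing: tika-server clients typically PUT documents to these endpoints, and HTTP historically allows clients to change the method when following a 301, so 308 Permanent Redirect (RFC 7538), which forbids changing the method, may be the safer status.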

BTW, JetBrains IDEA has a bundled OpenAPI plugin (at least the 2020.1 RC does).






[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073000#comment-17073000
 ] 

Lewis John McGibbney commented on TIKA-3082:


I'm going to start working on an OpenAPI specification, and I will post it here when complete. 

> Consider adding an OpenAPI for tika-server
> --
>
> Key: TIKA-3082
> URL: https://issues.apache.org/jira/browse/TIKA-3082
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> On TIKA-2253, [~lewismc] asked:
> bq. I was planning on putting together an OpenAPI specification for Tika. Is 
> anyone in favor of this?
> What do people think?  How much will it change the current tika-server?  What 
> are the benefits?





[jira] [Assigned] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned TIKA-3082:
--

Assignee: Lewis John McGibbney






[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072998#comment-17072998
 ] 

Lewis John McGibbney commented on TIKA-3082:


[~grossws] absolutely.

# Firstly, OpenAPI is the industry standard for API-first implementations, 
meaning that, moving forward, the goal would be for our REST API to be more user- 
and developer-friendly. 
# OpenAPI has a wide variety of tooling, meaning that people could generate both 
server stub implementations and client implementations on the fly, at their own 
convenience, whilst we (the dev@tika community) maintain the existing 
tika-server Java implementation.
# There are several Java server stub generation options for us to use, namely
{code:java}
java-inflector
java-msf4j
java-pkmst
java-play-framework
java-undertow-server
java-vertx
java-vertx-web (beta)
jaxrs-cxf
jaxrs-cxf-cdi
jaxrs-cxf-extended
jaxrs-jersey
jaxrs-resteasy
jaxrs-resteasy-eap
jaxrs-spec 
{code}
... I think we would choose *jaxrs-cxf* to minimize the impact on the existing 
tika-server code. wdyt?
# I tend to use [IBM's OpenAPI linter and validator|https://github.com/IBM/openapi-validator] 
to keep the quality of my REST API development consistent. I think this tool 
would make it easier for us to ensure adequate documentation coverage for all 
parameters, responses, headers, paths, exceptions, etc.






[jira] [Commented] (TIKA-3081) TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()

2020-04-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072993#comment-17072993
 ] 

Hudson commented on TIKA-3081:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1798 (See 
[https://builds.apache.org/job/Tika-trunk/1798/])
TIKA-3081 -- convert TikaInputStream's skip to the equivalent of (tallison: 
[https://github.com/apache/tika/commit/aaa9f40e3c8119f1a155e3f1eea5c2ffe7f4f26f])
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNotePtr.java


> TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()
> -
>
> Key: TIKA-3081
> URL: https://issues.apache.org/jira/browse/TIKA-3081
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> Some parsers may assume that skip() returns the number of bytes actually 
> skipped.  As we've learned, FileInputStream's return value can be completely 
> divorced from reality, and it can report that the stream is skipping even 
> past the EOF.
> If we convert TikaInputStream's skip() to something that will throw an 
> exception if a 3rd party parser tries to skip past the end of a file, we may 
> prevent an entire class of bugs.
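The "skip fully" behaviour proposed here can be sketched like this. Commons IO's `IOUtils.skipFully(InputStream, long)`, named in the issue title, throws an `EOFException` when fewer bytes than requested could be skipped; the sketch below is a self-contained stand-in with the same contract (the class and method names are hypothetical, not TikaInputStream's actual code). It deliberately consumes bytes with `read()` instead of trusting `InputStream.skip()`, whose return value is exactly what the issue says can be unreliable.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for an IOUtils.skipFully-style skip.
final class SkipFullySketch {

    // Skips exactly n bytes or throws EOFException; never reports a skip past EOF.
    static void skipFully(InputStream in, long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("Bytes to skip must be non-negative: " + n);
        }
        byte[] buffer = new byte[8192];
        long remaining = n;
        while (remaining > 0) {
            // read() with len > 0 blocks until it returns at least 1 byte or -1 at EOF.
            int read = in.read(buffer, 0, (int) Math.min(remaining, buffer.length));
            if (read < 0) {
                throw new EOFException("Requested a skip of " + n + " bytes, but only "
                        + (n - remaining) + " were available");
            }
            remaining -= read;
        }
    }
}
```

A parser that skips past the end of a stream with this in place gets an exception at the point of the bad skip, rather than silently continuing from a position that does not exist.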





[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072957#comment-17072957
 ] 

Konstantin Gribov commented on TIKA-3082:
-

[~lewismc], could you please clarify how you wish to use the OpenAPI spec? 

Such a spec could be used to generate client libraries and JAX-RS stubs, or it 
could itself be generated from additional annotations on, say, JAX-RS services. 
Both approaches are viable, but which one fits depends on your goals in 
introducing OpenAPI, and each has pros and cons, so I hope you'll have some time 
to expand on your original idea.






[jira] [Commented] (TIKA-2253) Obtain new Miredot license key and upgrade plugin version in tika-server

2020-04-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072945#comment-17072945
 ] 

Tim Allison commented on TIKA-2253:
---

Opened TIKA-3082 to track this discussion/feature.

> Obtain new Miredot license key and upgrade plugin version in tika-server
> 
>
> Key: TIKA-2253
> URL: https://issues.apache.org/jira/browse/TIKA-2253
> Project: Tika
>  Issue Type: Task
>  Components: documentation, server
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.15
>
>
> As per our recent mailing list conversation 
> http://www.mail-archive.com/dev%40tika.apache.org/msg20558.html our Miredot 
> license has expired.
> The kind folks over at Miredot have provided us with a new key, which is valid 
> until January 31st, 2020, after which we are free to request a new key.
> Thanks Miredot!
> PR coming.





[jira] [Created] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Tim Allison (Jira)
Tim Allison created TIKA-3082:
-

 Summary: Consider adding an OpenAPI for tika-server
 Key: TIKA-3082
 URL: https://issues.apache.org/jira/browse/TIKA-3082
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


On TIKA-2253, [~lewismc] asked:

bq. I was planning on putting together an OpenAPI specification for Tika. Is 
anyone in favor of this?

What do people think?  How much will it change the current tika-server?  What 
are the benefits?





[jira] [Created] (TIKA-3081) TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()

2020-04-01 Thread Tim Allison (Jira)
Tim Allison created TIKA-3081:
-

 Summary: TikaInputStream's skip() should use the equivalent of IOUtils.skipFully()
 Key: TIKA-3081
 URL: https://issues.apache.org/jira/browse/TIKA-3081
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


Some parsers may assume that skip() returns the number of bytes actually 
skipped.  As we've learned, FileInputStream's return value can be completely 
divorced from reality, and it can report that the stream is skipping even past 
the EOF.

If we convert TikaInputStream's skip() to something that will throw an 
exception if a 3rd party parser tries to skip past the end of a file, we may 
prevent an entire class of bugs.


