[GitHub] [tika-helm] frascu commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub


frascu commented on PR #8:
URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431018451

   Hi @lewismc 
   Could you please review this pull request?
   I need a minor version of tika-helm to fix the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (TIKA-3970) Certain OneNote documents produce duplicate text

2023-02-15 Thread David Avant (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689131#comment-17689131
 ] 

David Avant commented on TIKA-3970:
---

Sadly, I am not aware of any free, non-Microsoft viewers, but I have not 
spent much time searching.

> Certain OneNote documents produce duplicate text
> 
>
> Key: TIKA-3970
> URL: https://issues.apache.org/jira/browse/TIKA-3970
> Project: Tika
>  Issue Type: Bug
>  Components: app
>Affects Versions: 2.7.0
>Reporter: David Avant
>Priority: Minor
> Attachments: lyrics.docx, lyrics.one, lyrics.txt
>
>
> Extracting text from certain OneNote documents produces more text than is 
> actually in the document. In this case, the OneNote document was created 
> by opening a Word document and "printing" it to the OneNote.
> To reproduce the issue, open the attached "lyrics.one" using the Tika App 
> version 2.7.0 and view the plain text. Look for the phrase "Sunday 
> Morning" and observe that there are 14 occurrences.    However in the actual 
> displayed text, it occurs only once.  
> The original text in this document is only about 12K characters, but the 
> extracted text from tika is over 300K.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)
Adam Bialas created TIKA-3973:
-

 Summary: Content of Ogg file with Opus encoded content not 
correctly recognized
 Key: TIKA-3973
 URL: https://issues.apache.org/jira/browse/TIKA-3973
 Project: Tika
  Issue Type: Bug
  Components: detector
Affects Versions: 2.7.0
Reporter: Adam Bialas
 Attachments: speech_output.ogg

We are using tika-core:2.7.0 for file content detection. We have an Ogg file 
that uses the Opus audio codec (see attachment). When we try to detect its content 
with the following metadata:
 
{code:java}
Metadata metadata = new Metadata(); 
metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, 
FilenameUtils.getName(url));
{code}
this file is recognized as audio/vorbis, which is incorrect. Can you please verify?





[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689161#comment-17689161
 ] 

Nick Burch commented on TIKA-3973:
--

For container-based detection (such as the Ogg container format), you really 
need to include the Tika Parsers jars too.

With the Ogg container detector enabled (which comes with the Tika media 
parsers), Tika can correctly detect the type as {{audio/opus}}

We have magic which will detect an Opus file with a single stream if you're 
lucky, but with containers it's very hit-and-miss whether you can tell with magic 
alone. Enabling the Ogg container detector is the best solution, though: that 
should always work no matter what order the streams are in, what streams are 
contained, etc.
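
To illustrate why magic alone is hit-and-miss, here is a minimal sketch that looks only at the first Ogg page. It is not Tika's implementation: the class and method names are made up for this example, and it assumes only the standard Ogg page layout ("OggS" capture pattern, segment count at byte 26) plus the well-known codec identification headers ("OpusHead" for Opus, 0x01 followed by "vorbis" for Vorbis).

```java
import java.nio.charset.StandardCharsets;

/** Illustrative first-page sniffing for Ogg data; hypothetical, not Tika code. */
final class OggFirstPageSniffer {

    private static final byte[] CAPTURE = ascii("OggS");
    private static final byte[] OPUS_HEAD = ascii("OpusHead");
    // Vorbis identification header: packet type 0x01 followed by "vorbis".
    private static final byte[] VORBIS_HEAD = {0x01, 'v', 'o', 'r', 'b', 'i', 's'};

    /** Guesses a media type from the payload of the first Ogg page only. */
    static String sniff(byte[] data) {
        // An Ogg page header is 27 bytes, followed by the segment table.
        if (data.length < 27 || !startsWith(data, 0, CAPTURE)) {
            return "application/octet-stream";
        }
        int segmentCount = data[26] & 0xFF; // byte 26: number of page segments
        int payloadStart = 27 + segmentCount; // payload follows the segment table
        if (startsWith(data, payloadStart, OPUS_HEAD)) {
            return "audio/opus";
        }
        if (startsWith(data, payloadStart, VORBIS_HEAD)) {
            return "audio/vorbis";
        }
        // Some other first stream: the first page alone cannot say more.
        return "application/ogg";
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset + prefix.length > data.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Builds a fake one-segment first page (other header fields zeroed) for demos. */
    static byte[] fakePage(byte[] payload) {
        byte[] page = new byte[28 + payload.length];
        System.arraycopy(CAPTURE, 0, page, 0, CAPTURE.length);
        page[26] = 1; // one segment
        page[27] = (byte) payload.length; // lacing value for that segment
        System.arraycopy(payload, 0, page, 28, payload.length);
        return page;
    }

    public static void main(String[] args) {
        System.out.println(sniff(fakePage(ascii("OpusHead"))));
    }
}
```

Because this only inspects the first page, a file whose first stream is something other than the audio stream of interest would be misjudged, which is the kind of case a container-aware detector that walks all streams is meant to handle.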

> Content of Ogg file with Opus encoded content not correctly recognized
> --
>
> Key: TIKA-3973
> URL: https://issues.apache.org/jira/browse/TIKA-3973
> Project: Tika
>  Issue Type: Bug
>  Components: detector
>Affects Versions: 2.7.0
>Reporter: Adam Bialas
>Priority: Major
> Attachments: speech_output.ogg
>
>
> We are using tika-core:2.7.0 for file content detection. We have an Ogg file 
> that uses the Opus audio codec (see attachment). When we try to detect its content 
> with the following metadata:
>  
> {code:java}
> Metadata metadata = new Metadata(); 
> metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, 
> FilenameUtils.getName(url));
> {code}
> this file is recognized as audio/vorbis, which is incorrect. Can you please 
> verify?





[jira] [Comment Edited] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689161#comment-17689161
 ] 

Nick Burch edited comment on TIKA-3973 at 2/15/23 2:38 PM:
---

For container-based detection (such as the Ogg container format), you really 
need to include the Tika Parsers jars too.

With the Ogg container detector enabled (which comes with the Tika media 
parsers), Tika can correctly detect the type as {{audio/opus}}

We have magic which will detect an Opus file with a single stream if you're 
lucky, but with containers it's very hit-and-miss whether you can tell with magic 
alone. Enabling the Ogg container detector is the best solution, though: that 
should always work no matter what order the streams are in, what streams are 
contained, etc.


was (Author: gagravarr):
For container-based detection (such as the Ogg container format), you really 
need to include the Tika Parsers jars too.

With the Ogg container detector enabled (which comes with the Tika media 
parsers), Tika can correctly detect the type as {{audio/opus}}

We have magic which will detect an opus file with a single stream if you're 
lucky, but with containers it's very hit-and-miss if you can tell with magic 
alone. Enabling the Ogg container detector is the best solution though, that 
should always work no matter what order the streams are in, what streams are 
contained etc



[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689170#comment-17689170
 ] 

Adam Bialas commented on TIKA-3973:
---

Which jar should I include also? 



[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689176#comment-17689176
 ] 

Nick Burch commented on TIKA-3973:
--

For all container formats you want {{tika-parsers}} or {{tika-parsers-standard}}

If you only care about the Ogg formats, then {{vorbis-java-tika}} from 
{{org.gagravarr}} is enough



[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689183#comment-17689183
 ] 

Adam Bialas commented on TIKA-3973:
---

So I need these dependencies:
{code:java}
implementation 'org.apache.tika:tika-core:2.7.0'
implementation 'org.gagravarr:vorbis-java-tika:0.8'
implementation 'org.gagravarr:vorbis-java-core:0.8'
{code}



[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689199#comment-17689199
 ] 

Nick Burch commented on TIKA-3973:
--

If you only care about container-aware detection for Ogg based formats, you 
should be fine right now with just

{code:java}
implementation 'org.apache.tika:tika-core:2.7.0'
implementation 'org.gagravarr:vorbis-java-tika:0.8'
{code}

The Vorbis Tika module should pull in the other things it needs (such as core)



[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689203#comment-17689203
 ] 

Tim Allison commented on TIKA-3973:
---

To emphasize Nick's point... if you need detection of other container formats, 
like OLE2 (.doc, .ppt, .xls) or zip-based (docx, pptx, xlsx), you should 
include the full tika-parsers-standard-package.

If you only care about Ogg, then go with what Nick recommends.



[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689207#comment-17689207
 ] 

Adam Bialas commented on TIKA-3973:
---

You mean this:
{code:java}
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <version>2.7.0</version>
</dependency>
{code}



RE: Adding arguments to configure tika from the rest calls

2023-02-15 Thread Julien Massiera
Hi Tim,

Following up on our mail thread, could you share more documentation on how to 
use headers to configure the PDFParser on the fly?

Thanks,
Julien

-Original Message-
From: Julien Massiera
Sent: Friday, February 3, 2023 13:08
To: dev@tika.apache.org
Subject: RE: Adding arguments to configure tika from the rest calls

Hi Tim,

Configuring NER via headers, like the PDFParserConfig, sounds like an interesting 
approach, but I only discovered that feature thanks to your reply. I tried to find 
documentation about it; unfortunately, the only thing I found was a TBD note on 
this page: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=109454066

Could you tell us more about how to use it, so that we can test it and get a 
better idea of how it works and how useful it would be for NER?

Thanks,
Julien 

-Original Message-
From: Tim Allison
Sent: Tuesday, January 31, 2023 13:19
To: dev@tika.apache.org
Subject: Re: Adding arguments to configure tika from the rest calls

Configuring specific parsers that don't have their own parser config objects is 
a pain.  For example, we currently have an option to set PDFParserConfig and 
TesseractParserConfig options via headers to tika-server...and we have a way to 
extend this functionality to other parsers.  This option is "not pretty"(TM), 
but it has the benefit of correctly differentiating creation-time settings 
(applies to all
files) from runtime-settings (applies to a specific file), and this process 
reuses a single static parser so there's no overhead in rebuilding the parser 
object for every file.
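
As a rough sketch of the per-request header approach described above, the following builds (but does not send) a PUT request to a tika-server `/tika` endpoint with one parser-config header. The class and method names are hypothetical; the URL assumes a local tika-server on its default port 9998, and the header name `X-Tika-PDFextractInlineImages` is taken from the tika-server documentation as I recall it, so please verify it against the wiki page for your version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper showing a per-request parser setting sent as an HTTP header. */
final class TikaServerHeaderExample {

    /** Builds a PUT request carrying the document bytes and one config header. */
    static HttpRequest buildRequest(URI endpoint, byte[] document,
                                    String headerName, String headerValue) {
        return HttpRequest.newBuilder(endpoint)
                .header("Accept", "text/plain")
                // Runtime setting: applies to this request only, not to the server config.
                .header(headerName, headerValue)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                URI.create("http://localhost:9998/tika"), // assumed local server
                new byte[0],                              // placeholder for PDF bytes
                "X-Tika-PDFextractInlineImages",          // assumed header name
                "true");
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) ->
                System.out.println(name + ": " + values));
    }
}
```

Sending the request would be a matter of passing it to `java.net.http.HttpClient#send`; it is left out here so the sketch runs without a server.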

So, we could add an ner parse config along the lines of the PDFParserConfig, 
or...

...I regret I can't tell if this is what you're proposing, but we could specify 
a tika-config.xml file via url parameters?  This would add overhead of loading 
the full parser for each parse where you specify your own custom parser.  Or, I 
guess, we could load x many default parsers and name them?

On Tue, Jan 31, 2023 at 5:34 AM Cedric Ulmer  
wrote:
>
> Hi all,
>
> We are playing with the regex-based detection capabilities of Tika combined 
> with ManifoldCF, and an idea came to mind. First, the problem: for now, a 
> Tika server has only one configuration. Therefore, if we set up regex-based 
> entity extraction, it will be applied to all of the documents (for the given MIME 
> types). So if in ManifoldCF we call the Tika server during a crawling phase, 
> we cannot have different regex rules per crawling job: any job that calls the 
> Tika server will be processed the same way.
>
> So here is the idea: wouldn't it be possible to make the call to a 
> tika server configurable via a REST parameter/arguments, where we 
> could set which config we want to use for the current call ? Something
> like: ?enableNER=true&NERConfig=regex1
>
> Regards,
>
> Cédric
> CEO
> France Labs - Your knowledge, now
> Datafari Enterprise Search
>




[GitHub] [tika-helm] lewismc commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub


lewismc commented on PR #8:
URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431699844

   Thanks for your patience @frascu 





[GitHub] [tika-helm] lewismc merged pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub


lewismc merged PR #8:
URL: https://github.com/apache/tika-helm/pull/8





[GitHub] [tika-helm] lewismc commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub


lewismc commented on PR #8:
URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431700881

   Should we release a new version of the Helm Chart?





[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689288#comment-17689288
 ] 

Tim Allison commented on TIKA-3973:
---

Y.



Re: Adding arguments to configure tika from the rest calls

2023-02-15 Thread Tim Allison
Here's a first attempt at documentation:
https://cwiki.apache.org/confluence/display/TIKA/Configuring+Parsers+At+Parse+Time+in+tika-server

Please let me know if you have any questions or want write access to
improve the documentation!

On Wed, Feb 15, 2023 at 11:07 AM Julien Massiera
 wrote:
>
> Hi Tim,
>
> Following up on our mail thread, could you share more documentation on how 
> to use headers to configure the PDFParser on the fly?
>
> Thanks,
> Julien
>
> -Original Message-
> From: Julien Massiera
> Sent: Friday, February 3, 2023 13:08
> To: dev@tika.apache.org
> Subject: RE: Adding arguments to configure tika from the rest calls
>
> Hi Tim,
>
> Configuring NER via headers, like the PDFParserConfig, sounds like an 
> interesting approach, but I only discovered that feature thanks to your 
> reply. I tried to find documentation about it; unfortunately, the only 
> thing I found was a TBD note on this page: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=109454066
>
> Could you tell us more about how to use it, so that we can test it and get a 
> better idea of how it works and how useful it would be for NER?
>
> Thanks,
> Julien
>
> -Original Message-
> From: Tim Allison
> Sent: Tuesday, January 31, 2023 13:19
> To: dev@tika.apache.org
> Subject: Re: Adding arguments to configure tika from the rest calls
>
> Configuring specific parsers that don't have their own parser config objects 
> is a pain.  For example, we currently have an option to set PDFParserConfig 
> and TesseractParserConfig options via headers to tika-server...and we have a 
> way to extend this functionality to other parsers.  This option is "not 
> pretty"(TM), but it has the benefit of correctly differentiating 
> creation-time settings (applies to all
> files) from runtime-settings (applies to a specific file), and this process 
> reuses a single static parser so there's no overhead in rebuilding the parser 
> object for every file.
>
> So, we could add an ner parse config along the lines of the PDFParserConfig, 
> or...
>
> ...I regret I can't tell if this is what you're proposing, but we could 
> specify a tika-config.xml file via url parameters?  This would add overhead 
> of loading the full parser for each parse where you specify your own custom 
> parser.  Or, I guess, we could load x many default parsers and name them?
>
> On Tue, Jan 31, 2023 at 5:34 AM Cedric Ulmer  
> wrote:
> >
> > Hi all,
> >
> > We are playing with the regex-based detection capabilities of Tika combined 
> > with ManifoldCF, and an idea came to mind. First, the problem: for now, 
> > a Tika server has only one configuration. Therefore, if we set up regex-based 
> > entity extraction, it will be applied to all of the documents (for the 
> > given MIME types). So if in ManifoldCF we call the Tika server during a 
> > crawling phase, we cannot have different regex rules per crawling job: any 
> > job that calls the Tika server will be processed the same way.
> >
> > So here is the idea: wouldn't it be possible to make the call to a
> > tika server configurable via a REST parameter/arguments, where we
> > could set which config we want to use for the current call ? Something
> > like: ?enableNER=true&NERConfig=regex1
> >
> > Regards,
> >
> > Cédric
> > CEO
> > France Labs - Your knowledge, now
> > Datafari Enterprise Search
> >
>
>


[GitHub] [tika-helm] tballison commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub


tballison commented on PR #8:
URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431871189

   +1. Note that we've slightly modified the Docker numbering (e.g. 2.7.0.1 is 
the first Docker release for Tika 2.7.0). We may want to do something similar here?





[GitHub] [tika-helm] lewismc commented on pull request #4: TIKA-3452 java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA tika-docker

2023-02-15 Thread via GitHub


lewismc commented on PR #4:
URL: https://github.com/apache/tika-helm/pull/4#issuecomment-1431890535

   @frascu please try this out and let me know how you get on. Thanks





[jira] [Commented] (TIKA-3452) java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA tika-docker

2023-02-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689313#comment-17689313
 ] 

ASF GitHub Bot commented on TIKA-3452:
--

lewismc commented on PR #4:
URL: https://github.com/apache/tika-helm/pull/4#issuecomment-1431890535

   @frascu please try this out and let me know how you get on. Thanks




> java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA 
> tika-docker
> -
>
> Key: TIKA-3452
> URL: https://issues.apache.org/jira/browse/TIKA-3452
> Project: Tika
>  Issue Type: Bug
>  Components: docker, helm
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.0.0-BETA
>
>
> The following ExecutionException is thrown when I attempt to run [tika-docker 
> 2.0.0-BETA|https://hub.docker.com/layers/apache/tika/2.0.0-BETA-full/images/sha256-2d735f7bdf86e618a5390d92614a310697f9134d11a2b2e4c1c0cfcde1f68b1d?context=explore]
> {code:bash}
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> java.util.concurrent.ExecutionException: java.nio.file.FileSystemException: 
> /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116)
>   at 
> org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88)
>   at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
> Caused by: java.nio.file.FileSystemException: 
> /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
>   at java.base/java.nio.file.Files.newByteChannel(Files.java:375)
>   at java.base/java.nio.file.Files.createFile(Files.java:652)
>   at 
> java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:137)
>   at 
> java.base/java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:160)
>   at java.base/java.nio.file.Files.createTempFile(Files.java:917)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:220)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:210)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:117)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:50)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {code}
> There are differences/improvements in the way the [tika-server child process 
> is 
> spawned|https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-MakingTikaServerRobusttoOOMs,InfiniteLoopsandMemoryLeaks]
>  in the 2.0.0-BETA docker image. I am investigating a fix.
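
The stack trace above bottoms out in `Files.createTempFile`, which fails when the temp directory sits on a read-only file system. A minimal sketch of a startup probe for that condition follows; the class and method names are hypothetical and this is not part of tika-server, just a way to reproduce or pre-check the failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical probe: can a temp file be created in the given directory? */
final class TempDirCheck {

    static boolean canCreateTempFile(Path dir) {
        try {
            // Same JDK call that appears in the stack trace above.
            Path probe = Files.createTempFile(dir, "tmp-probe-", ".tmp");
            Files.deleteIfExists(probe);
            return true;
        } catch (IOException | SecurityException e) {
            // FileSystemException ("Read-only file system") is an IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp + " writable: " + canCreateTempFile(tmp));
    }
}
```

If the probe fails inside a container, pointing `java.io.tmpdir` at a writable volume is one possible workaround to try; whether that matches the eventual fix for this issue is not stated in this thread.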





[jira] [Commented] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689337#comment-17689337
 ] 

Tim Allison commented on TIKA-3972:
---

It looks like our parser requires {{fldrslt}} as a hint that the field has come 
to an end.  {{Dip, Caesar.doc}} doesn't have that, but {{Blackening Spice}} 
does.

Will take a bit to figure out how best to fix this.  Thank you for opening this 
issue and providing an example file!

> Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed 
> XHTML from toString method call
> ---
>
> Key: TIKA-3972
> URL: https://issues.apache.org/jira/browse/TIKA-3972
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.7.0
> Environment: Tested with Java 8 (Temurin Eclipse) and Tika 2.7.0 on 
> Windows 11.
>Reporter: Martin Honnen
>Priority: Major
>  Labels: RTFParser, rtf
> Attachments: hyperlink.rtf
>
>
> I am exploring Tika for RTF to X(HT)ML parsing and have run into a problem 
> with some RTF containing a hyperlink: the result of using a 
> ContentHandler created with ToXMLContentHandler and calling the toString() 
> method on the handler is a malformed X(HT)ML document where the starting 
> `<a>` tag is not properly closed.
> I have attached the relevant RTF sample document. The output I get is
> ```
> http://www.w3.org/1999/xhtml";>
> 
>  />
>  content="org.apache.tika.parser.microsoft.rtf.RTFParser" />
> 
> 
> 
> 
> 
>     10”Flour Tortilla
>     Caesar DIP: Dip, Caesar.doc
>     Ripped Romaine
>     Blackened Salmon julienne
>     Shaved Red Onion
>     Julienne Tomato
>     Grated Parmesan
>     Blackening spice: Blackening Spice.doc
> 
> Method
> Procedure Text 
> 
> 
> 
> ```
> where the part `    Caesar DIP: <a href="..\\..\\SAUCES\\Dips\\Dip, Caesar.doc">Dip, Caesar.doc />` is flawed as the `<a>` is not closed.
>  
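
One way to confirm a report like this is to feed the handler's `toString()` output to a standard XML parser and see whether it is well-formed. The sketch below uses only JDK classes; the class name is hypothetical, and the sample fragments in `main` are made-up stand-ins rather than the actual Tika output.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: checks whether a string parses as well-formed XML. */
final class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations so no external DTD is ever fetched.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed (for example, an unclosed element)
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be set up or read input", e);
        }
    }

    public static void main(String[] args) {
        // Made-up fragments: the second leaves the <a> element open.
        System.out.println(isWellFormed("<p>Caesar DIP: <a href=\"x.doc\">x.doc</a></p>"));
        System.out.println(isWellFormed("<p>Caesar DIP: <a href=\"x.doc\">x.doc<b /></p>"));
    }
}
```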





[jira] [Comment Edited] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689337#comment-17689337
 ] 

Tim Allison edited comment on TIKA-3972 at 2/15/23 8:50 PM:


-It looks like our parser requires {{fldrslt}} as a hint that the field has 
come to an end.  {{Dip, Caesar.doc}} doesn't have that, but {{Blackening 
Spice}} does.-

Sorry.  That was wrong.  Still looking.

Thank you for opening this issue and providing an example file!


was (Author: talli...@mitre.org):
It looks like our parser requires {{fldrslt}} as a hint that the field has come 
to an end.  {{Dip, Caesar.doc}} doesn't have that, but {{Blackening Spice}} 
does.

Will take a bit to figure out how best to fix this.  Thank you for opening this 
issue and providing an example file!



[jira] [Updated] (TIKA-3452) java.nio.file.FileSystemException Read-only file system

2023-02-15 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated TIKA-3452:
---
Summary: java.nio.file.FileSystemException Read-only file system  (was: 
java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA 
tika-docker)

> java.nio.file.FileSystemException Read-only file system
> ---
>
> Key: TIKA-3452
> URL: https://issues.apache.org/jira/browse/TIKA-3452
> Project: Tika
>  Issue Type: Bug
>  Components: docker, helm
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.0.0-BETA
>
>
> The following ExecutionException is thrown when I attempt to run [tika-docker 
> 2.0.0-BETA|https://hub.docker.com/layers/apache/tika/2.0.0-BETA-full/images/sha256-2d735f7bdf86e618a5390d92614a310697f9134d11a2b2e4c1c0cfcde1f68b1d?context=explore]
> {code:bash}
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> java.util.concurrent.ExecutionException: java.nio.file.FileSystemException: 
> /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116)
>   at 
> org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88)
>   at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
> Caused by: java.nio.file.FileSystemException: 
> /tmp/apache-tika-server-forked-tmp-8374629799942405236: Read-only file system
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
>   at java.base/java.nio.file.Files.newByteChannel(Files.java:375)
>   at java.base/java.nio.file.Files.createFile(Files.java:652)
>   at 
> java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:137)
>   at 
> java.base/java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:160)
>   at java.base/java.nio.file.Files.createTempFile(Files.java:917)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:220)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:210)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:117)
>   at 
> org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:50)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {code}
> There are differences/improvements in the way the [tika-server child process 
> is 
> spawned|https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-MakingTikaServerRobusttoOOMs,InfiniteLoopsandMemoryLeaks]
>  in the 2.0.0-BETA docker image. I am investigating a fix.
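
The trace shows `Files.createTempFile` failing because the default temporary directory (`/tmp` in the container) sits on a read-only file system. As a rough illustration of that failure mode, and not the fix that went into tika-docker or tika-helm, the sketch below tries the default temp directory first and falls back to a caller-supplied directory. `TempDirProbe`, `createTempFileWithFallback`, and the `fallback-tmp` directory name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tika-server.
final class TempDirProbe {

    /**
     * Creates a temp file in the default temp directory (java.io.tmpdir) and,
     * if that fails, retries in fallbackDir.
     */
    static Path createTempFileWithFallback(String prefix, Path fallbackDir) throws IOException {
        try {
            return Files.createTempFile(prefix, ".tmp");
        } catch (IOException primaryFailure) {
            // FileSystemException ("Read-only file system") is a subclass of IOException.
            Files.createDirectories(fallbackDir);
            try {
                return Files.createTempFile(fallbackDir, prefix, ".tmp");
            } catch (IOException fallbackFailure) {
                // Keep the original cause visible in the stack trace.
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path fallback = Paths.get(args.length > 0 ? args[0] : "fallback-tmp");
        Path tmp = createTempFileWithFallback("apache-tika-server-forked-tmp-", fallback);
        System.out.println("created " + tmp);
        Files.deleteIfExists(tmp);
    }
}
```

Because `Files.createTempFile` without a directory argument uses the `java.io.tmpdir` system property, pointing that property at a writable mount is another way to get the same effect without code changes.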





[jira] [Commented] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689364#comment-17689364
 ] 

Tim Allison commented on TIKA-3972:
---

It looks like the RTFParser is actually ending the <a> element.  Somewhere in 
the content handler stack, though, it isn't being emitted because the parser is 
incorrectly ending the <b> element (and then starting it again) after the start 
of the <a> element.
{noformat}
start: p
characters: Caesar 
start: b
start: i
characters: DIP
end: i
characters: : 
start: a
characters: Dip, Caesar.doc
end: b
start: b
end: a
end: b
end: p
{noformat}
So literally, we're writing  
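
The event trace above can be checked mechanically. The sketch below replays the same start/end events against a stack of open elements and reports the first end event that does not match the innermost open element. It is a standalone illustration using only the JDK, not code from RTFParser or TextExtractor; `NestingCheck` and `firstMismatch` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not a Tika class.
final class NestingCheck {

    /**
     * Replays "start: x" / "end: x" events and returns a description of the
     * first end event that does not close the innermost open element, or null
     * if every element is properly nested and closed.
     */
    static String firstMismatch(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String event : events) {
            if (event.startsWith("start: ")) {
                open.push(event.substring("start: ".length()));
            } else if (event.startsWith("end: ")) {
                String name = event.substring("end: ".length());
                String innermost = open.peek();
                if (!name.equals(innermost)) {
                    return "end of <" + name + "> while <" + innermost + "> is innermost";
                }
                open.pop();
            }
            // "characters: ..." events do not affect nesting.
        }
        return open.isEmpty() ? null : "unclosed <" + open.peek() + ">";
    }

    public static void main(String[] args) {
        // The events from the trace in this comment.
        List<String> trace = List.of(
                "start: p", "characters: Caesar ", "start: b", "start: i",
                "characters: DIP", "end: i", "characters: : ", "start: a",
                "characters: Dip, Caesar.doc", "end: b", "start: b",
                "end: a", "end: b", "end: p");
        System.out.println(firstMismatch(trace));
    }
}
```

For this trace the check stops at the first `end: b`, which arrives while `a` is still the innermost open element.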



[jira] [Commented] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689400#comment-17689400
 ] 

Hudson commented on TIKA-3972:
--

FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #1024 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/1024/])
TIKA-3972 -- fix closing <a> elements when there are also style elements 
(tallison: 
[https://github.com/apache/tika/commit/4f599dfa3d72c724a846356bf867db45f221170a])
* (edit) CHANGES.txt
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/rtf/RTFParserTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/rtf/TextExtractor.java
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testRTFHyperlinkAndStyles.rtf




[GitHub] [tika] dependabot[bot] opened a new pull request, #966: Bump rome from 1.18.0 to 1.19.0

2023-02-15 Thread via GitHub


dependabot[bot] opened a new pull request, #966:
URL: https://github.com/apache/tika/pull/966

   Bumps [rome](https://github.com/rometools/rome) from 1.18.0 to 1.19.0.
   
   Release notes
   Sourced from [rome's releases](https://github.com/rometools/rome/releases).
   
   1.19.0
   
   What's Changed
   🔨 Dependency Upgrades
   
   - Bump flatten-maven-plugin from 1.2.7 to 1.3.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#565](https://github-redirect.dependabot.com/rometools/rome/pull/565)
   - Bump maven-bundle-plugin from 5.1.5 to 5.1.8 by [@dependabot](https://github.com/dependabot) in [rometools/rome#563](https://github-redirect.dependabot.com/rometools/rome/pull/563)
   - Bump maven-dependency-plugin from 3.3.0 to 3.5.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#602](https://github-redirect.dependabot.com/rometools/rome/pull/602)
   - Bump maven-deploy-plugin from 2.8.2 to 3.1.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#607](https://github-redirect.dependabot.com/rometools/rome/pull/607)
   - Bump maven-jar-plugin from 3.2.2 to 3.3.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#574](https://github-redirect.dependabot.com/rometools/rome/pull/574)
   - Bump maven-javadoc-plugin from 3.3.1 to 3.5.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#609](https://github-redirect.dependabot.com/rometools/rome/pull/609)
   - Bump maven-scm-plugin from 1.12.2 to 1.13.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#554](https://github-redirect.dependabot.com/rometools/rome/pull/554)
   - Bump assertj-core from 3.22.0 to 3.24.2 by [@dependabot](https://github.com/dependabot) in [rometools/rome#603](https://github-redirect.dependabot.com/rometools/rome/pull/603)
   - Bump slf4j-api from 1.7.36 to 2.0.6 by [@dependabot](https://github.com/dependabot) in [rometools/rome#596](https://github-redirect.dependabot.com/rometools/rome/pull/596)
   
   Other Changes
   
   - Bump actions/setup-java from 3.3.0 to 3.10.0 by [@dependabot](https://github.com/dependabot) in [rometools/rome#606](https://github-redirect.dependabot.com/rometools/rome/pull/606)
   - Bump logback-classic from 1.2.10 to 1.3.5 by [@PatrickGotthard](https://github.com/PatrickGotthard) in [rometools/rome#611](https://github-redirect.dependabot.com/rometools/rome/pull/611)
   
   Full Changelog: https://github.com/rometools/rome/compare/1.18.0...1.19.0
   
   Commits
   
   - [de67e7c](https://github.com/rometools/rome/commit/de67e7c97424480de72191d053a475b301dedebf) Bump logback-classic from 1.2.10 to 1.3.5 ([#611](https://github-redirect.dependabot.com/rometools/rome/issues/611))
   - [3e74bcd](https://github.com/rometools/rome/commit/3e74bcd9f47e8a58253c46be559c8a270ee70a94) Bump slf4j-api from 1.7.36 to 2.0.6 ([#596](https://github-redirect.dependabot.com/rometools/rome/issues/596))
   - [a6428fc](https://github.com/rometools/rome/commit/a6428fc8b10c6ca269db361ccce211362960ca78) Bump assertj-core from 3.22.0 to 3.24.2 ([#603](https://github-redirect.dependabot.com/rometools/rome/issues/603))
   - [c89387f](https://github.com/rometools/rome/commit/c89387f4dc1683502a2a8d1419614634ba72b3f7) Bump maven-scm-plugin from 1.12.2 to 1.13.0 ([#554](https://github-redirect.dependabot.com/rometools/rome/issues/554))
   - [6604dc2](https://github.com/rometools/rome/commit/6604dc2a592c7bb66bffd24e91e6dae95b82263c) Bump maven-javadoc-plugin from 3.3.1 to 3.5.0 ([#609](https://github-redirect.dependabot.com/rometools/rome/issues/609))
   - [b8c1f07](https://github.com/rometools/rome/commit/b8c1f07cc08e8bf06781ef8603f7647d6347a451) Bump maven-jar-plugin from 3.2.2 to 3.3.0 ([#574](https://github-redirect.dependabot.com/rometools/rome/issues/574))
   - [0d40c86](https://github.com/rometools/rome/commit/0d40c865929dc363bc3e3a1b245e6b6c703c3ca6) Bump maven-deploy-plugin from 2.8.2 to 3.1.0 ([#607](https://github-redirect.dependabot.com/rometools/rome/issues/607))
   - [d2d202f](https://github.com/rometools/rome/commit/d2d202fd2d13c79b43666d3251964ebefe2b48d2) Bump maven-dependency-plugin from 3.3.0 to 3.5.0 ([#602](https://github-redirect.dependabot.com/rometools/rome/issues/602))
   - [ee79577](https://github.com/rometools/rome/commit/ee79577eb38baeabfc0dfdff70cf3cc5693cc4e8) Bump maven-bundle-plugin from 5.1.5 to 5.1.8 ([#563](https://github-redirect.dependabot.com/rometools/rome/issues/563))
   - [8f7683f](https://github.com/rometools/rome/commit/8f7683fb1196b77825165b9d1e2fea0445caf075) Bump flatten-maven-plugin from 1.2.7 to 1.3.0 ([#565](https://github-redirect.dependabot.com/rometools/rome/issues/565))
   - Additional commits viewable in [compare view](https://github.com/rometools/rome/compare/1.18.0...1.19.0)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.rometools:rome&package-manager=maven&previous-version=1.18.0&new-version=

[GitHub] [tika] dependabot[bot] opened a new pull request, #967: Bump maven-javadoc-plugin from 3.4.1 to 3.5.0

2023-02-15 Thread via GitHub


dependabot[bot] opened a new pull request, #967:
URL: https://github.com/apache/tika/pull/967

   Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) 
from 3.4.1 to 3.5.0.
   
   Release notes
   Sourced from [maven-javadoc-plugin's releases](https://github.com/apache/maven-javadoc-plugin/releases).
   
   3.5.0
   [Release Notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12317529&version=12352256)
   
   - Clean up language and update URLs ([#172](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/172)) [@elharo](https://github.com/elharo)
   - Assorted minor FAQ edits ([#176](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/176)) [@elharo](https://github.com/elharo)
   - [[MJAVADOC-738]](https://issues.apache.org/jira/browse/MJAVADOC-738) - Upgrade commons-text to 1.10.0 ([#174](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/174)) [@cstamas](https://github.com/cstamas)
   - an --> a ([#171](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/171)) [@elharo](https://github.com/elharo)
   - Update ([#167](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/167)) [@elharo](https://github.com/elharo)
   - [[MJAVADOC-685]](https://issues.apache.org/jira/browse/MJAVADOC-685) - no longer document deprecated parameter stylesheet ([#165](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/165)) [@kwin](https://github.com/kwin)
   - MJAVADOC-731 update parent, get rid of legacy ([#164](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/164)) [@kwin](https://github.com/kwin)
   - [[MJAVADOC-685]](https://issues.apache.org/jira/browse/MJAVADOC-685) - Deprecate parameter "stylesheet" ([#162](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/162)) [@kwin](https://github.com/kwin)
   
   📝 Documentation updates
   
   - Typo in AbstractJavadocMojo ([#175](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/175)) [@ebourg](https://github.com/ebourg)
   - licenced --> licensed ([#168](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/168)) [@elharo](https://github.com/elharo)
   
   👻 Maintenance
   
   - [[MJAVADOC-729]](https://issues.apache.org/jira/browse/MJAVADOC-729) - Link to Javadoc references from JDK 17 ([#161](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/161)) [@kwin](https://github.com/kwin)
   - fix link to documentation of link option ([#160](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/160)) [@kwin](https://github.com/kwin)
   - [[MJAVADOC-721]](https://issues.apache.org/jira/browse/MJAVADOC-721) Parse stderr output and suppress informational lines ([#157](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/pull/157)) [@sman-81](https://github.com/sman-81)
   
   Commits
   
   - [e41f4fd](https://github.com/apache/maven-javadoc-plugin/commit/e41f4fda4529a20089d708471909839f123d5988) [maven-release-plugin] prepare release maven-javadoc-plugin-3.5.0
   - [c56ec0a](https://github.com/apache/maven-javadoc-plugin/commit/c56ec0a989b34ee7383b7f5f16e4f8fca69149f8) [MJAVADOC-741] Upgrade plugins and components
   - [d02fd90](https://github.com/apache/maven-javadoc-plugin/commit/d02fd90c799e12825448ac538bf09abf2a059d82) [MJAVADOC-740] Upgrade Parent to 39
   - [4123315](https://github.com/apache/maven-javadoc-plugin/commit/41233152e19b9cf01bc3ac5a072ce36d5e911656) [MJAVADOC-700] Plugin duplicates classes in Java 8 all-classes lists
   - [fabff9c](https://github.com/apache/maven-javadoc-plugin/commit/fabff9c1ed15007f11d1644e445d766ea9e63c5d) Clean up language and update URLs ([#172](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/issues/172))
   - [a654cc6](https://github.com/apache/maven-javadoc-plugin/commit/a654cc647a06d7f36cea8918ffa05d05489bd5a1) Assorted minor FAQ edits ([#176](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/issues/176))
   - [73557e3](https://github.com/apache/maven-javadoc-plugin/commit/73557e337eb556fb628823594cf387bf64ef22f5) Fixed a typo in AbstractJavadocMojo ([#175](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/issues/175))
   - [96df545](https://github.com/apache/maven-javadoc-plugin/commit/96df545acc6a70f64cb2c00b2b7d7544a4f775c4) [MJAVADOC-738] Upgrade commons-text to 1.10.0 ([#174](https://github-redirect.dependabot.com/apache/maven-javadoc-plugin/issues/174))
   - [f21b24c](https://github.com/apache/maven-javadoc-plugin/commit/f21b24c76cbd9724c3c5777b10e696dd1ea3f3e1) update Reproducible Builds badge link
   https://github.com/apache/maven-javadoc-plugin/commit/5b61ee915298b51a273bd88612fb346ac639d

[GitHub] [tika] dependabot[bot] opened a new pull request, #968: Bump aws.version from 1.12.407 to 1.12.408

2023-02-15 Thread via GitHub


dependabot[bot] opened a new pull request, #968:
URL: https://github.com/apache/tika/pull/968

   Bumps `aws.version` from 1.12.407 to 1.12.408.
   Updates `aws-java-sdk-s3` from 1.12.407 to 1.12.408
   
   Changelog
   Sourced from [aws-java-sdk-s3's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).
   
   1.12.408 2023-02-15
   
   AWS CloudTrail
   - Features
     - This release adds an InsufficientEncryptionPolicyException type to the StartImport endpoint
   
   AWS Glue
   - Features
     - Fix DirectJDBCSource not showing up in CLI code gen
   
   AWS Private 5G
   - Features
     - This release introduces a new StartNetworkResourceUpdate API, which enables return/replacement of hardware from a NetworkSite.
   
   AWS WAFV2
   - Features
     - For protected CloudFront distributions, you can now use the AWS WAF Fraud Control account takeover prevention (ATP) managed rule group to block new login attempts from clients that have recently submitted too many failed login attempts.
   
   Amazon Elastic File System
   - Features
     - Documentation update for EFS to support IAM best practices.
   
   Amazon Fraud Detector
   - Features
     - This release introduces Lists feature which allows customers to reference a set of values in Fraud Detector's rules. With Lists, customers can dynamically manage these attributes in real time. Lists can be created/deleted and its contents can be modified using the Fraud Detector API.
   
   Amazon Relational Database Service
   - Features
     - Database Activity Stream support for RDS for SQL Server.
   
   Commits
   
   - [962d0be](https://github.com/aws/aws-sdk-java/commit/962d0be6cd8b08be1403ac842b767392ca97f1a3) AWS SDK for Java 1.12.408
   - [d5a9c44](https://github.com/aws/aws-sdk-java/commit/d5a9c44cd0185a30dfad410c6ec71c3e78168b28) Update GitHub version number to 1.12.408-SNAPSHOT
   - See full diff in [compare view](https://github.com/aws/aws-sdk-java/compare/1.12.407...1.12.408)
   
   
   
   
   Updates `aws-java-sdk-transcribe` from 1.12.407 to 1.12.408
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   

[GitHub] [tika] THausherr merged pull request #968: Bump aws.version from 1.12.407 to 1.12.408

2023-02-15 Thread via GitHub


THausherr merged PR #968:
URL: https://github.com/apache/tika/pull/968


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] THausherr merged pull request #967: Bump maven-javadoc-plugin from 3.4.1 to 3.5.0

2023-02-15 Thread via GitHub


THausherr merged PR #967:
URL: https://github.com/apache/tika/pull/967





[GitHub] [tika] THausherr merged pull request #966: Bump rome from 1.18.0 to 1.19.0

2023-02-15 Thread via GitHub


THausherr merged PR #966:
URL: https://github.com/apache/tika/pull/966

