Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread Sally Khudairi
You're most welcome, Lewis.

Best,
Sally

- - - 
Vice President Marketing & Publicity
Vice President Sponsor Relations
The Apache Software Foundation

Tel +1 617 921 8656 | s...@apache.org

On Wed, May 12, 2021, at 21:51, lewis john mcgibbney wrote:
> Excellent. Thank you Sally for the info so far.
> 
> On Wed, May 12, 2021 at 17:06 Sally Khudairi  wrote:
> 
> > Perfect; thanks for the clarification, Lewis.
> >
> > What we've done in the past is help post recordings to the ASF YouTube
> > channel.
> >
> > I suspect that Swapnil may have suggestions on how to best go about that
> > if the PMC needs help.
> >
> > Best,
> > Sally
> >
> > - - -
> > Vice President Marketing & Publicity
> > Vice President Sponsor Relations
> > The Apache Software Foundation
> >
> > Tel +1 617 921 8656 | s...@apache.org
> >
> >
> > On Wed, May 12, 2021, at 20:00, lewis john mcgibbney wrote:
> >
> > Thanks for the info. I've never live-streamed to YT before. I think that
> > maintaining the recordings outside of YT and merely uploading them there
> > would be preferred.
> > Thank you
> >
> > On Wed, May 12, 2021 at 15:32 Sally Khudairi  wrote:
> >
> >
> > Hi Lewis --popping out of the blindcopy here.
> >
> > If the PMC has the recordings, we'll be able to upload to the ASF YouTube
> > channel.
> >
> > We don't offer recording or editing services, in case that's what you're
> > seeking. There are different methods and products that allow one to record
> > from a YouTube live streaming session, but YouTube doesn't provide the
> > ability to record.
> >
> > I suspect that Swapnil may have some advice as to how to best go about
> > that, as well as the possibility to live stream from the ASF account.
> >
> > Hope this helps,
> > Sally
> >
> > - - -
> > Vice President Marketing & Publicity
> > Vice President Sponsor Relations
> > The Apache Software Foundation
> >
> > Tel +1 617 921 8656 | s...@apache.org
> >
> >
> > On Wed, May 12, 2021, at 18:06, lewis john mcgibbney wrote:
> >
> > Excellent Sally, moving you and press@ to Bcc
> >
> > Kenneth, Swapnil please let me know the logistics here. Our next proposed
> > meeting is this time next month just to give you an idea. Maybe this is too
> > tight for us to arrange?
> >
> > Thanks for your consideration.
> >
> > lewismc
> >
> > On Wed, May 12, 2021 at 13:52 Sally Khudairi  wrote:
> >
> >
> > Hello, Lewis --yes, of course we can help.
> >
> > I'm copying Kenneth Paskett and Swapnil Mane from Central Services who
> > will be able to help you with this when you're ready.
> >
> > Many thanks in advance Kenneth and Swapnil for your help with this!
> >
> > Best,
> > Sally
> >
> > - - -
> > Vice President Marketing & Publicity
> > Vice President Sponsor Relations
> > The Apache Software Foundation
> >
> > Tel +1 617 921 8656 | s...@apache.org
> >
> >
> > On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
> >
> > Hello press@,
> > The Tika community held its first virtual tagup today. With the goal of
> > archiving and making available the meeting content to those not able to
> > attend, we wanted to enquire about the possibility of recording/streaming
> > future virtual meetings to YouTube...
> > Is this something you could give us advice on?
> > Thank you
> > lewismc
> > (On behalf of the Tika PMC)
> >
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
> >
> > --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
> 


Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread lewis john mcgibbney
Excellent. Thank you Sally for the info so far.

On Wed, May 12, 2021 at 17:06 Sally Khudairi  wrote:

> Perfect; thanks for the clarification, Lewis.
>
> What we've done in the past is help post recordings to the ASF YouTube
> channel.
>
> I suspect that Swapnil may have suggestions on how to best go about that
> if the PMC needs help.
>
> Best,
> Sally
>
> - - -
> Vice President Marketing & Publicity
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
>
> On Wed, May 12, 2021, at 20:00, lewis john mcgibbney wrote:
>
> Thanks for the info. I've never live-streamed to YT before. I think that
> maintaining the recordings outside of YT and merely uploading them there
> would be preferred.
> Thank you
>
> On Wed, May 12, 2021 at 15:32 Sally Khudairi  wrote:
>
>
> Hi Lewis --popping out of the blindcopy here.
>
> If the PMC has the recordings, we'll be able to upload to the ASF YouTube
> channel.
>
> We don't offer recording or editing services, in case that's what you're
> seeking. There are different methods and products that allow one to record
> from a YouTube live streaming session, but YouTube doesn't provide the
> ability to record.
>
> I suspect that Swapnil may have some advice as to how to best go about
> that, as well as the possibility to live stream from the ASF account.
>
> Hope this helps,
> Sally
>
> - - -
> Vice President Marketing & Publicity
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
>
> On Wed, May 12, 2021, at 18:06, lewis john mcgibbney wrote:
>
> Excellent Sally, moving you and press@ to Bcc
>
> Kenneth, Swapnil please let me know the logistics here. Our next proposed
> meeting is this time next month just to give you an idea. Maybe this is too
> tight for us to arrange?
>
> Thanks for your consideration.
>
> lewismc
>
> On Wed, May 12, 2021 at 13:52 Sally Khudairi  wrote:
>
>
> Hello, Lewis --yes, of course we can help.
>
> I'm copying Kenneth Paskett and Swapnil Mane from Central Services who
> will be able to help you with this when you're ready.
>
> Many thanks in advance Kenneth and Swapnil for your help with this!
>
> Best,
> Sally
>
> - - -
> Vice President Marketing & Publicity
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
>
> On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
>
> Hello press@,
> The Tika community held its first virtual tagup today. With the goal of
> archiving and making available the meeting content to those not able to
> attend, we wanted to enquire about the possibility of recording/streaming
> future virtual meetings to YouTube...
> Is this something you could give us advice on?
> Thank you
> lewismc
> (On behalf of the Tika PMC)
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
> --
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread Sally Khudairi
Perfect; thanks for the clarification, Lewis.

What we've done in the past is help post recordings to the ASF YouTube channel.

I suspect that Swapnil may have suggestions on how to best go about that if the 
PMC needs help.

Best,
Sally

- - -
Vice President Marketing & Publicity
Vice President Sponsor Relations
The Apache Software Foundation

Tel +1 617 921 8656 | s...@apache.org


On Wed, May 12, 2021, at 20:00, lewis john mcgibbney wrote:
> Thanks for the info. I've never live-streamed to YT before. I think that 
> maintaining the recordings outside of YT and merely uploading them there 
> would be preferred. 
> Thank you 
> 
> On Wed, May 12, 2021 at 15:32 Sally Khudairi  wrote:
>> Hi Lewis --popping out of the blindcopy here.
>> 
>> If the PMC has the recordings, we'll be able to upload to the ASF YouTube 
>> channel. 
>> 
>> We don't offer recording or editing services, in case that's what you're 
>> seeking. There are different methods and products that allow one to record 
>> from a YouTube live streaming session, but YouTube doesn't provide the 
>> ability to record. 
>> 
>> I suspect that Swapnil may have some advice as to how to best go about that, 
>> as well as the possibility to live stream from the ASF account. 
>> 
>> Hope this helps,
>> Sally
>> 
>> - - -
>> Vice President Marketing & Publicity
>> Vice President Sponsor Relations
>> The Apache Software Foundation
>> 
>> Tel +1 617 921 8656 | s...@apache.org
>> 
>> 
>> On Wed, May 12, 2021, at 18:06, lewis john mcgibbney wrote:
>>> Excellent Sally, moving you and press@ to Bcc
>>> 
>>> Kenneth, Swapnil please let me know the logistics here. Our next proposed 
>>> meeting is this time next month just to give you an idea. Maybe this is too 
>>> tight for us to arrange?
>>> 
>>> Thanks for your consideration.
>>> 
>>> lewismc
>>> 
>>> On Wed, May 12, 2021 at 13:52 Sally Khudairi  wrote:
>>>> Hello, Lewis --yes, of course we can help.
>>>> 
>>>> I'm copying Kenneth Paskett and Swapnil Mane from Central Services who 
>>>> will be able to help you with this when you're ready.
>>>> 
>>>> Many thanks in advance Kenneth and Swapnil for your help with this!
>>>> 
>>>> Best,
>>>> Sally
>>>> 
>>>> - - -
>>>> Vice President Marketing & Publicity
>>>> Vice President Sponsor Relations
>>>> The Apache Software Foundation
>>>> 
>>>> Tel +1 617 921 8656 | s...@apache.org
>>>> 
>>>> 
>>>> On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
>>>>> Hello press@,
>>>>> The Tika community held its first virtual tagup today. With the goal of 
>>>>> archiving and making available the meeting content to those not able to 
>>>>> attend, we wanted to enquire about the possibility of recording/streaming 
>>>>> future virtual meetings to YouTube...
>>>>> Is this something you could give us advice on?
>>>>> Thank you
>>>>> lewismc
>>>>> (On behalf of the Tika PMC)
>>>>> 
>>>>> 
>>>>> -- 
>>>>> http://home.apache.org/~lewismc/
>>>>> http://people.apache.org/keys/committer/lewismc
>>>> 
>>> -- 
>>> http://home.apache.org/~lewismc/
>>> http://people.apache.org/keys/committer/lewismc
>> 
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc

Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread lewis john mcgibbney
Thanks for the info. I've never live-streamed to YT before. I think that
maintaining the recordings outside of YT and merely uploading them there
would be preferred.
Thank you

On Wed, May 12, 2021 at 15:32 Sally Khudairi  wrote:

> Hi Lewis --popping out of the blindcopy here.
>
> If the PMC has the recordings, we'll be able to upload to the ASF YouTube
> channel.
>
> We don't offer recording or editing services, in case that's what you're
> seeking. There are different methods and products that allow one to record
> from a YouTube live streaming session, but YouTube doesn't provide the
> ability to record.
>
> I suspect that Swapnil may have some advice as to how to best go about
> that, as well as the possibility to live stream from the ASF account.
>
> Hope this helps,
> Sally
>
> - - -
> Vice President Marketing & Publicity
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
>
> On Wed, May 12, 2021, at 18:06, lewis john mcgibbney wrote:
>
> Excellent Sally, moving you and press@ to Bcc
>
> Kenneth, Swapnil please let me know the logistics here. Our next proposed
> meeting is this time next month just to give you an idea. Maybe this is too
> tight for us to arrange?
>
> Thanks for your consideration.
>
> lewismc
>
> On Wed, May 12, 2021 at 13:52 Sally Khudairi  wrote:
>
>
> Hello, Lewis --yes, of course we can help.
>
> I'm copying Kenneth Paskett and Swapnil Mane from Central Services who
> will be able to help you with this when you're ready.
>
> Many thanks in advance Kenneth and Swapnil for your help with this!
>
> Best,
> Sally
>
> - - -
> Vice President Marketing & Publicity
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
>
> On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
>
> Hello press@,
> The Tika community held its first virtual tagup today. With the goal of
> archiving and making available the meeting content to those not able to
> attend, we wanted to enquire about the possibility of recording/streaming
> future virtual meetings to YouTube...
> Is this something you could give us advice on?
> Thank you
> lewismc
> (On behalf of the Tika PMC)
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
>
> --
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[jira] [Commented] (TIKA-3400) Use equals for Object and String Comparison Instead of ==

2021-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343629#comment-17343629
 ] 

ASF GitHub Bot commented on TIKA-3400:
--

kamaci opened a new pull request #441:
URL: https://github.com/apache/tika/pull/441


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use equals for Object and String Comparison Instead of ==
> -
>
> Key: TIKA-3400
> URL: https://issues.apache.org/jira/browse/TIKA-3400
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> equals() is used for object and string comparison but == compares them by 
> identity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [tika] kamaci opened a new pull request #441: fix for TIKA-3400 contributed by kamaci

2021-05-12 Thread GitBox


kamaci opened a new pull request #441:
URL: https://github.com/apache/tika/pull/441


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (TIKA-3400) Use equals for Object and String Comparison Instead of ==

2021-05-12 Thread Furkan Kamaci (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan Kamaci updated TIKA-3400:

Description: equals() is used for object and string comparison but == 
compares them by identity.  (was: `equals()` is used for object and string 
comparison but `==` compares them by identity.)

> Use equals for Object and String Comparison Instead of ==
> -
>
> Key: TIKA-3400
> URL: https://issues.apache.org/jira/browse/TIKA-3400
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> equals() is used for object and string comparison but == compares them by 
> identity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (TIKA-3400) Use equals for Object and String Comparison Instead of ==

2021-05-12 Thread Furkan Kamaci (Jira)
Furkan Kamaci created TIKA-3400:
---

 Summary: Use equals for Object and String Comparison Instead of ==
 Key: TIKA-3400
 URL: https://issues.apache.org/jira/browse/TIKA-3400
 Project: Tika
  Issue Type: Bug
Affects Versions: 1.26
Reporter: Furkan Kamaci
 Fix For: 1.27


`equals()` is used for object and string comparison but `==` compares them by 
identity.
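
A minimal sketch of the difference, for illustration only. The class and
method names below (EqualsVsIdentity, sameContent, sameReference) are
hypothetical; they are not taken from the Tika code base or from pull
request #441. Only java.util.Objects.equals is a real JDK call.

```java
import java.util.Objects;

// Hypothetical demo class, not part of Tika.
final class EqualsVsIdentity {

    // Content comparison: null-safe wrapper around a.equals(b).
    static boolean sameContent(String a, String b) {
        return Objects.equals(a, b);
    }

    // Identity comparison: true only if both references point to the same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String a = new String("text/plain");
        String b = new String("text/plain");

        System.out.println(sameReference(a, b)); // prints "false"
        System.out.println(sameContent(a, b));   // prints "true"
    }
}
```

Comparing two string literals with == can happen to return true because
literals are interned, which is why this kind of bug often goes unnoticed
until the values come from parsing or user input.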



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread Sally Khudairi
Hi Lewis --popping out of the blindcopy here.

If the PMC has the recordings, we'll be able to upload to the ASF YouTube 
channel. 

We don't offer recording or editing services, in case that's what you're 
seeking. There are different methods and products that allow one to record from 
a YouTube live streaming session, but YouTube doesn't provide the ability to 
record. 

I suspect that Swapnil may have some advice as to how to best go about that, as 
well as the possibility to live stream from the ASF account. 

Hope this helps,
Sally

- - - 
Vice President Marketing & Publicity
Vice President Sponsor Relations
The Apache Software Foundation

Tel +1 617 921 8656 | s...@apache.org


On Wed, May 12, 2021, at 18:06, lewis john mcgibbney wrote:
> Excellent Sally, moving you and press@ to Bcc
> 
> Kenneth, Swapnil please let me know the logistics here. Our next proposed 
> meeting is this time next month just to give you an idea. Maybe this is too 
> tight for us to arrange?
> 
> Thanks for your consideration.
> 
> lewismc
> 
> On Wed, May 12, 2021 at 13:52 Sally Khudairi  wrote:
>> Hello, Lewis --yes, of course we can help.
>> 
>> I'm copying Kenneth Paskett and Swapnil Mane from Central Services who will 
>> be able to help you with this when you're ready.
>> 
>> Many thanks in advance Kenneth and Swapnil for your help with this!
>> 
>> Best,
>> Sally
>> 
>> - - -
>> Vice President Marketing & Publicity
>> Vice President Sponsor Relations
>> The Apache Software Foundation
>> 
>> Tel +1 617 921 8656 | s...@apache.org
>> 
>> 
>> On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
>>> Hello press@,
>>> The Tika community held its first virtual tagup today. With the goal of 
>>> archiving and making available the meeting content to those not able to 
>>> attend, we wanted to enquire about the possibility of recording/streaming 
>>> future virtual meetings to YouTube...
>>> Is this something you could give us advice on?
>>> Thank you
>>> lewismc
>>> (On behalf of the Tika PMC)
>>> 
>>> -- 
>>> http://home.apache.org/~lewismc/
>>> http://people.apache.org/keys/committer/lewismc
>> 
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc


Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread lewis john mcgibbney
Excellent Sally, moving you and press@ to Bcc

Kenneth, Swapnil please let me know the logistics here. Our next proposed
meeting is this time next month just to give you an idea. Maybe this is too
tight for us to arrange?

Thanks for your consideration.

lewismc

On Wed, May 12, 2021 at 13:52 Sally Khudairi  wrote:

> Hello, Lewis --yes, of course we can help.
>
> I'm copying Kenneth Paskett and Swapnil Mane from Central Services who
> will be able to help you with this when you're ready.
>
> Many thanks in advance Kenneth and Swapnil for your help with this!
>
> Best,
> Sally
>
> - - -
> Vice President Marketing & Publicity
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
>
> On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
>
> Hello press@,
> The Tika community held its first virtual tagup today. With the goal of
> archiving and making available the meeting content to those not able to
> attend, we wanted to enquire about the possibility of recording/streaming
> future virtual meetings to YouTube...
> Is this something you could give us advice on?
> Thank you
> lewismc
> (On behalf of the Tika PMC)
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
>
> --
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread Sally Khudairi
Hello, Lewis --yes, of course we can help.

I'm copying Kenneth Paskett and Swapnil Mane from Central Services who will be 
able to help you with this when you're ready.

Many thanks in advance Kenneth and Swapnil for your help with this!

Best,
Sally

- - - 
Vice President Marketing & Publicity
Vice President Sponsor Relations
The Apache Software Foundation

Tel +1 617 921 8656 | s...@apache.org


On Wed, May 12, 2021, at 16:20, lewis john mcgibbney wrote:
> Hello press@,
> The Tika community held its first virtual tagup today. With the goal of 
> archiving and making available the meeting content to those not able to 
> attend, we wanted to enquire about the possibility of recording/streaming 
> future virtual meetings to YouTube...
> Is this something you could give us advice on?
> Thank you
> lewismc
> (On behalf of the Tika PMC)
> 
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc


Recording/Streaming Apache Tika Virtual Meetings to YouTube

2021-05-12 Thread lewis john mcgibbney
Hello press@,
The Tika community held its first virtual tagup today. With the goal of
archiving and making available the meeting content to those not able to
attend, we wanted to enquire about the possibility of recording/streaming
future virtual meetings to YouTube...
Is this something you could give us advice on?
Thank you
lewismc
(On behalf of the Tika PMC)

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[jira] [Commented] (TIKA-3392) Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml dependencies.

2021-05-12 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343551#comment-17343551
 ] 

Tim Allison commented on TIKA-3392:
---

For TikaConfig, we're using the XMLReaderUtils, which also warns on failed 
security but does not crash; same for ExternalParsersConfigReader.  I can't 
remember now why I created a whole separate system for mimetypes... :( Will 
look through some issues.
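
To illustrate the "warn on failed security but do not crash" behaviour
mentioned above, here is a rough sketch using only JDK APIs. It is not the
actual XMLReaderUtils or MimeTypesReader code; the class name
LenientSaxFactory and the method trySetSecureProcessing are made up for this
example.

```java
import java.util.logging.Logger;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper, not part of Tika.
final class LenientSaxFactory {

    private static final Logger LOG =
            Logger.getLogger(LenientSaxFactory.class.getName());

    // Tries to enable secure processing. If the underlying SAX implementation
    // rejects the feature (as in the Android stack trace quoted below), log a
    // warning and carry on instead of propagating the exception.
    static boolean trySetSecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (ParserConfigurationException
                 | SAXNotRecognizedException
                 | SAXNotSupportedException e) {
            LOG.warning("Secure processing not available: " + e);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        boolean secure = trySetSecureProcessing(factory);
        // The factory is still usable whether or not the feature was accepted.
        factory.newSAXParser();
        System.out.println("secure processing enabled: " + secure);
    }
}
```

The trade-off is that a parser created after a failed setFeature call runs
without the secure-processing limits, so the warning should stay visible in
the logs.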

> Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml 
> dependencies.
> --
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26
> Environment: Android 11
>Reporter: Andrei Dobrescu
>Priority: Major
>  Labels: android
> Fix For: 1.27
>
> Attachments: image-2021-05-11-17-53-58-291.png, 
> image-2021-05-11-18-10-40-949.png, image-2021-05-11-18-12-15-300.png
>
>
> I use Apache Tika on Android in order to detect mime type of various files:
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 will crash when trying to detect the mime type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:119)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at 
> org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>  at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>  at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: 
> http://javax.xml.XMLConstants/feature/secure-processing
>  at 
> org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> 

[MINUTES] Apache Tika Community Virtual Tagup

2021-05-12 Thread lewis john mcgibbney
Hi user@ dev@,

A few of us who are over on the ASF #tika Slack channel had our first
virtual tagup today.
The meeting minutes are captured at https://s.apache.org/a5evi for anyone
interested.

We decided to re-purpose this tagup as a GENERAL Apache Tika virtual tagup
(rather than restrict it to container orchestration only). We hope that
this appeals to more of the community.

We are also pursuing the possibility of us streaming these tagups to
YouTube so that they can be archived and accessed by those not able to
attend.

Thank you

lewismc

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[jira] [Commented] (TIKA-3392) Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml dependencies.

2021-05-12 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343485#comment-17343485
 ] 

Nick Burch commented on TIKA-3392:
--

[~tallison] What about the other Tika "own" XML files like Tika Config or 
External Parsers definitions, should we not do the same thing for them?

> Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml 
> dependencies.
> --
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26
> Environment: Android 11
>Reporter: Andrei Dobrescu
>Priority: Major
>  Labels: android
> Fix For: 1.27
>
> Attachments: image-2021-05-11-17-53-58-291.png, 
> image-2021-05-11-18-10-40-949.png, image-2021-05-11-18-12-15-300.png
>
>
> I use Apache Tika on Android in order to detect mime type of various files:
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 will crash when trying to detect the mime type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:119)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at 
> org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>  at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>  at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: 
> http://javax.xml.XMLConstants/feature/secure-processing
>  at 
> org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> 

[jira] [Commented] (TIKA-3392) Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml dependencies.

2021-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343465#comment-17343465
 ] 

Hudson commented on TIKA-3392:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #231 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/231/])
TIKA-3392 -- allow insecure parsing in MimeTypesReader; log a warning 
(tallison: 
[https://github.com/apache/tika/commit/03b6baf1efdb43b299cbe025184cf24cca9a8847])
* (edit) tika-core/src/main/java/org/apache/tika/mime/MimeTypesReader.java


> Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml 
> dependencies.
> --
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26
> Environment: Android 11
>Reporter: Andrei Dobrescu
>Priority: Major
>  Labels: android
> Fix For: 1.27
>
> Attachments: image-2021-05-11-17-53-58-291.png, 
> image-2021-05-11-18-10-40-949.png, image-2021-05-11-18-12-15-300.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 crashes when trying to detect the MIME type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:119)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at 
> org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>  at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>  at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: 
> http://javax.xml.XMLConstants/feature/secure-processing
>  at 
> org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> 

[jira] [Commented] (TIKA-3392) Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml dependencies.

2021-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343444#comment-17343444
 ] 

Hudson commented on TIKA-3392:
--

SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #129 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/129/])
TIKA-3392 -- allow insecure parsing in MimeTypesReader (tallison: 
[https://github.com/apache/tika/commit/06c111f82a14a34492b3302b4d8310645b6f8366])
* (edit) tika-core/src/main/java/org/apache/tika/mime/MimeTypesReader.java


> Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml 
> dependencies.
> --
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26
> Environment: Android 11
>Reporter: Andrei Dobrescu
>Priority: Major
>  Labels: android
> Fix For: 1.27
>
> Attachments: image-2021-05-11-17-53-58-291.png, 
> image-2021-05-11-18-10-40-949.png, image-2021-05-11-18-12-15-300.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 crashes when trying to detect the MIME type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:119)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at 
> org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>  at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>  at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: 
> http://javax.xml.XMLConstants/feature/secure-processing
>  at 
> org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> 

[jira] [Resolved] (TIKA-3399) Fix Non-Atomic Operations on Volatile Fields

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3399.
---
Resolution: Fixed

> Fix Non-Atomic Operations on Volatile Fields
> 
>
> Key: TIKA-3399
> URL: https://issues.apache.org/jira/browse/TIKA-3399
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> A non-atomic operation on a volatile field (for example, an increment) reads 
> the value and then writes it back. The value can change between the read and 
> the write, possibly invalidating the operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TIKA-3399) Fix Non-Atomic Operations on Volatile Fields

2021-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343432#comment-17343432
 ] 

ASF GitHub Bot commented on TIKA-3399:
--

tballison commented on pull request #440:
URL: https://github.com/apache/tika/pull/440#issuecomment-839990196


   Thank you @kamaci!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix Non-Atomic Operations on Volatile Fields
> 
>
> Key: TIKA-3399
> URL: https://issues.apache.org/jira/browse/TIKA-3399
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> A non-atomic operation on a volatile field (for example, an increment) reads 
> the value and then writes it back. The value can change between the read and 
> the write, possibly invalidating the operation.





[jira] [Updated] (TIKA-3399) Fix Non-Atomic Operations on Volatile Fields

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3399:
--
Fix Version/s: (was: 1.27)
   2.0.0

> Fix Non-Atomic Operations on Volatile Fields
> 
>
> Key: TIKA-3399
> URL: https://issues.apache.org/jira/browse/TIKA-3399
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> A non-atomic operation on a volatile field (for example, an increment) reads 
> the value and then writes it back. The value can change between the read and 
> the write, possibly invalidating the operation.





[GitHub] [tika] tballison commented on pull request #440: fix for TIKA-3399 contributed by kamaci

2021-05-12 Thread GitBox


tballison commented on pull request #440:
URL: https://github.com/apache/tika/pull/440#issuecomment-839990196


   Thank you @kamaci!






[jira] [Commented] (TIKA-3399) Fix Non-Atomic Operations on Volatile Fields

2021-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343429#comment-17343429
 ] 

ASF GitHub Bot commented on TIKA-3399:
--

kamaci opened a new pull request #440:
URL: https://github.com/apache/tika/pull/440


   




> Fix Non-Atomic Operations on Volatile Fields
> 
>
> Key: TIKA-3399
> URL: https://issues.apache.org/jira/browse/TIKA-3399
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> A non-atomic operation on a volatile field (for example, an increment) reads 
> the value and then writes it back. The value can change between the read and 
> the write, possibly invalidating the operation.





[GitHub] [tika] kamaci opened a new pull request #440: fix for TIKA-3399 contributed by kamaci

2021-05-12 Thread GitBox


kamaci opened a new pull request #440:
URL: https://github.com/apache/tika/pull/440


   






[jira] [Updated] (TIKA-94) Speech-to-text transcription

2021-05-12 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated TIKA-94:
-
Labels: new-parser tika-transcription  (was: new-parser tika-tr)

> Speech-to-text transcription
> 
>
> Key: TIKA-94
> URL: https://issues.apache.org/jira/browse/TIKA-94
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Reporter: Jukka Zitting
>Assignee: Lewis John McGibbney
>Priority: Minor
>  Labels: new-parser, tika-transcription
> Fix For: 1.27
>
>
> Like OCR for image files (TIKA-93), we could try using speech recognition to 
> extract text content (where available) from audio (and video!) files.
> The CMU Sphinx engine (http://cmusphinx.sourceforge.net/) looks promising and 
> comes with a friendly license.





[jira] [Updated] (TIKA-94) Speech-to-text transcription

2021-05-12 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated TIKA-94:
-
Labels: new-parser tika-tr  (was: new-parser)

> Speech-to-text transcription
> 
>
> Key: TIKA-94
> URL: https://issues.apache.org/jira/browse/TIKA-94
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Reporter: Jukka Zitting
>Assignee: Lewis John McGibbney
>Priority: Minor
>  Labels: new-parser, tika-tr
> Fix For: 1.27
>
>
> Like OCR for image files (TIKA-93), we could try using speech recognition to 
> extract text content (where available) from audio (and video!) files.
> The CMU Sphinx engine (http://cmusphinx.sourceforge.net/) looks promising and 
> comes with a friendly license.





[jira] [Created] (TIKA-3399) Fix Non-Atomic Operations on Volatile Fields

2021-05-12 Thread Furkan Kamaci (Jira)
Furkan Kamaci created TIKA-3399:
---

 Summary: Fix Non-Atomic Operations on Volatile Fields
 Key: TIKA-3399
 URL: https://issues.apache.org/jira/browse/TIKA-3399
 Project: Tika
  Issue Type: Bug
Affects Versions: 1.26
Reporter: Furkan Kamaci
 Fix For: 1.27


A non-atomic operation on a volatile field (for example, an increment) reads 
the value and then writes it back. The value can change between the read and 
the write, possibly invalidating the operation.
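
The race described in this issue can be shown with a small, self-contained
sketch. The class HitCounter and its methods are hypothetical names made up
for illustration; they are not taken from the Tika code base or from the
eventual fix. The sketch only assumes the standard java.util.concurrent
classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical counter for illustration; not taken from the Tika code base. */
class HitCounter {
    // volatile guarantees visibility, but "unsafeHits++" is still three steps:
    // read, add one, write. Another thread can write between the read and the write.
    private volatile int unsafeHits = 0;

    // AtomicInteger performs the read-modify-write as a single atomic operation.
    private final AtomicInteger safeHits = new AtomicInteger();

    void recordUnsafe() { unsafeHits++; }
    void recordSafe() { safeHits.incrementAndGet(); }
    int unsafeHits() { return unsafeHits; }
    int safeHits() { return safeHits.get(); }

    /** Starts the given number of workers; each records perThread hits through both paths. */
    static HitCounter runWorkers(int threads, int perThread) throws InterruptedException {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.recordUnsafe();
                    counter.recordSafe();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        HitCounter counter = runWorkers(4, 100_000);
        // The atomic total is always 400000; the volatile total may be lower
        // because concurrent increments can overwrite each other.
        System.out.println("atomic:   " + counter.safeHits());
        System.out.println("volatile: " + counter.unsafeHits());
    }
}
```

Whether an atomic class, a lock, or a redesign is the right remedy depends on
the field in question; the sketch only shows why volatile alone is not enough
for a read-modify-write.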





[jira] [Commented] (TIKA-3398) Tidy Up Code for Performance Improvements

2021-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343415#comment-17343415
 ] 

Hudson commented on TIKA-3398:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #230 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/230/])
fix for TIKA-3398 contributed by kamaci (#439) (github: 
[https://github.com/apache/tika/commit/c0331e3f74635cda68a345402b2855792a0bc140])
* (edit) tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
* (edit) 
tika-core/src/test/java/org/apache/tika/utils/ServiceLoaderUtilsTest.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesClient.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* (edit) 
tika-eval/tika-eval-app/src/test/java/org/apache/tika/eval/app/db/AbstractBufferTest.java
* (edit) 
tika-fuzzing/src/main/java/org/apache/tika/fuzzing/pdf/EvilCOSWriter.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/IndentUtil.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/DirectoryListingEntry.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/AbstractListManager.java
* (edit) 
tika-eval/tika-eval-app/src/main/java/org/apache/tika/eval/app/tools/CommonTokenOverlapCounter.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-crypto-module/src/main/java/org/apache/tika/parser/crypto/TSDParser.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmPmgiHeader.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-xmp-commons/src/test/java/org/apache/tika/parser/xmp/JempboxExtractorTest.java
* (edit) 
tika-eval/tika-eval-app/src/test/java/org/apache/tika/eval/app/ProfilerBatchTest.java
* (edit) 
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/TikaServerWatchDog.java
* (edit) 
tika-example/src/main/java/org/apache/tika/example/InterruptableParsingExample.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmItspHeader.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNotePtr.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmLzxState.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-news-module/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
* (edit) 
tika-eval/tika-eval-app/src/main/java/org/apache/tika/eval/app/AbstractProfiler.java
* (edit) 
tika-example/src/main/java/org/apache/tika/example/RollbackSoftware.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmLzxcControlData.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ListManager.java
* (edit) 
tika-eval/tika-eval-app/src/main/java/org/apache/tika/eval/app/ExtractProfiler.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmLzxcResetTable.java
* (edit) tika-xmp/src/main/java/org/apache/tika/xmp/convert/RTFConverter.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmDirectoryListingSet.java
* (edit) 
tika-parsers/tika-parsers-advanced/tika-parser-nlp-module/src/main/java/org/apache/tika/parser/ner/mitie/MITIENERecogniser.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-ocr-module/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/DIFContentHandler.java
* (edit) 

[jira] [Resolved] (TIKA-3392) Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml dependencies.

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3392.
---
Fix Version/s: 1.27
   Resolution: Fixed

This should now be fixed.  What I can't remember is why I bothered with the 
pool of saxparsers in the MimeTypesReader.  Shouldn't the mimetypes only be 
read/parsed once per jvm/tika load?  If you're calling tika-app from the 
commandline on every individual file, you'll still just read the mimetypes once 
per file, right?  In a multithreaded server environment, you should also only 
read the mimetypes once.

Onwards...
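
For what it's worth, the "read once per JVM" behaviour described above can be
sketched with the initialization-on-demand holder idiom. This is only an
illustration under assumptions: MimeRegistry, LOADS and typeFor are
hypothetical names, the data is a placeholder, and none of it is how
MimeTypes or MimeTypesReader are actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a registry that is expensive to build. */
class MimeRegistry {
    // Counts how often the expensive load runs; only here to make the behaviour observable.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final Map<String, String> typesByExtension = new HashMap<>();

    private MimeRegistry() {
        LOADS.incrementAndGet();
        // Placeholder data; a real registry would parse its definitions here.
        typesByExtension.put("txt", "text/plain");
        typesByExtension.put("pdf", "application/pdf");
    }

    // The JVM initializes Holder once, on first access, and class initialization
    // is thread-safe, so no explicit locking or parser pooling is needed.
    private static final class Holder {
        static final MimeRegistry INSTANCE = new MimeRegistry();
    }

    static MimeRegistry getDefault() {
        return Holder.INSTANCE;
    }

    String typeFor(String extension) {
        return typesByExtension.getOrDefault(extension, "application/octet-stream");
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> MimeRegistry.getDefault().typeFor("pdf"));
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("loads: " + LOADS.get());
    }
}
```

With a single load guarded this way, one parser instance is enough and a pool
brings no benefit, which is the point of the question above.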

> Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml 
> dependencies.
> --
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26
> Environment: Android 11
>Reporter: Andrei Dobrescu
>Priority: Major
>  Labels: android
> Fix For: 1.27
>
> Attachments: image-2021-05-11-17-53-58-291.png, 
> image-2021-05-11-18-10-40-949.png, image-2021-05-11-18-12-15-300.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 crashes when trying to detect the MIME type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:119)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at 
> org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>  at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>  at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: 
> http://javax.xml.XMLConstants/feature/secure-processing
>  at 
> org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at 

[jira] [Commented] (TIKA-3392) Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml dependencies.

2021-05-12 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343412#comment-17343412
 ] 

Tim Allison commented on TIKA-3392:
---

Thank you [~nick], I split the difference -- logged a warning in 
MimeTypesReader and left the other as it was.
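
The general shape of "try secure processing, warn and carry on if the parser
does not recognise it" can be sketched as follows. This is not the code from
the TIKA-3392 commits: LenientSaxParsers and its methods are hypothetical
names, and java.util.logging is used only to keep the example free of
dependencies. The feature URI in the stack trace below is the value of
XMLConstants.FEATURE_SECURE_PROCESSING.

```java
import java.util.logging.Logger;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper; this is not the code from the TIKA-3392 commits. */
class LenientSaxParsers {
    private static final Logger LOG = Logger.getLogger(LenientSaxParsers.class.getName());

    /** Tries to enable secure processing; logs a warning and continues if it is unsupported. */
    static boolean trySecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (ParserConfigurationException | SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException both extend SAXException.
            LOG.warning("secure-processing feature not supported by "
                    + factory.getClass().getName() + ": " + e);
            return false;
        }
    }

    /** Builds a parser, falling back to a non-secure one instead of failing outright. */
    static SAXParser newSaxParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false);
        trySecureProcessing(factory);
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newSaxParser();
        System.out.println("created " + parser.getClass().getName());
    }
}
```

The trade-off is that on platforms whose parser rejects the feature, the XML
is parsed without it, which is why a warning is worth logging.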

> Apache Tika V1.26 doen't work on Android anymore. Issue with org.xml 
> dependencies.
> --
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26
> Environment: Android 11
>Reporter: Andrei Dobrescu
>Priority: Major
>  Labels: android
> Fix For: 1.27
>
> Attachments: image-2021-05-11-17-53-58-291.png, 
> image-2021-05-11-18-10-40-949.png, image-2021-05-11-18-12-15-300.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 crashes when trying to detect the MIME type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:119)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at 
> org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
> at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>  at org.apache.tika.config.TikaConfig.(TikaConfig.java:257)
>  at 
> org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>  at 
> org.apache.tika.parser.AutoDetectParser.(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: 
> http://javax.xml.XMLConstants/feature/secure-processing
>  at 
> org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>  at 
> org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>  at 
> org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>  at 
> org.apache.tika.mime.MimeTypesReader.(MimeTypesReader.java:117)
>  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>  at 
> org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>  at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>  at 
> 

[jira] [Resolved] (TIKA-3396) Rename parser modules in 2.0

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3396.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

> Rename parser modules in 2.0
> 
>
> Key: TIKA-3396
> URL: https://issues.apache.org/jira/browse/TIKA-3396
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Trivial
> Fix For: 2.0.0
>
>
> On the dev list, I think there's lazy consensus for:
> tika-parsers-standard
> tika-parsers-extended
> tika-parsers-ml (machine learning)





[jira] [Resolved] (TIKA-3398) Tidy Up Code for Performance Improvements

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3398.
---
Resolution: Fixed

Thank you [~kamaci]!

> Tidy Up Code for Performance Improvements
> -
>
> Key: TIKA-3398
> URL: https://issues.apache.org/jira/browse/TIKA-3398
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> The codebase has some performance issues, such as:
>  * Concatenating strings in loops
>  * Redundant calls
>  * Not breaking out of loops when appropriate
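
The three items in the list above can be illustrated with a short sketch. The
class LoopTidyUps and its methods are hypothetical examples written for this
note; they are not taken from pull request #439.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers illustrating the three items; not code from the Tika pull request. */
class LoopTidyUps {
    /** Joins the non-blank values, trimmed, using one StringBuilder for the whole loop. */
    static String joinNonBlank(List<String> values, String separator) {
        // A StringBuilder avoids allocating a new String on every "result += ..." step.
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String value : values) {
            // Call trim() once and reuse the result instead of trimming twice.
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (!first) {
                sb.append(separator);
            }
            sb.append(trimmed);
            first = false;
        }
        return sb.toString();
    }

    /** Returns the index of the first blank value, or -1 if there is none. */
    static int indexOfFirstBlank(List<String> values) {
        int found = -1;
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i).trim().isEmpty()) {
                found = i;
                break; // stop scanning once the answer is known
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(" a ", "", "b");
        System.out.println(joinNonBlank(values, ","));   // prints "a,b"
        System.out.println(indexOfFirstBlank(values));   // prints "1"
    }
}
```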
>  
>  
>  





[jira] [Updated] (TIKA-3398) Tidy Up Code for Performance Improvements

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3398:
--
Fix Version/s: (was: 1.27)
   2.0.0

> Tidy Up Code for Performance Improvements
> -
>
> Key: TIKA-3398
> URL: https://issues.apache.org/jira/browse/TIKA-3398
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> The codebase has some performance issues, such as:
>  * Concatenating strings in loops
>  * Redundant calls
>  * Not breaking out of loops when necessary
>  
>  
>  





[jira] [Commented] (TIKA-3398) Tidy Up Code for Performance Improvements

2021-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343346#comment-17343346
 ] 

ASF GitHub Bot commented on TIKA-3398:
--

tballison merged pull request #439:
URL: https://github.com/apache/tika/pull/439


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tidy Up Code for Performance Improvements
> -
>
> Key: TIKA-3398
> URL: https://issues.apache.org/jira/browse/TIKA-3398
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> The codebase has some performance issues, such as:
>  * Concatenating strings in loops
>  * Redundant calls
>  * Not breaking out of loops when necessary
>  
>  
>  





[GitHub] [tika] tballison merged pull request #439: fix for TIKA-3398 contributed by kamaci

2021-05-12 Thread GitBox


tballison merged pull request #439:
URL: https://github.com/apache/tika/pull/439


   






[jira] [Commented] (TIKA-3398) Tidy Up Code for Performance Improvements

2021-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1734#comment-1734
 ] 

ASF GitHub Bot commented on TIKA-3398:
--

kamaci opened a new pull request #439:
URL: https://github.com/apache/tika/pull/439


   




> Tidy Up Code for Performance Improvements
> -
>
> Key: TIKA-3398
> URL: https://issues.apache.org/jira/browse/TIKA-3398
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> The codebase has some performance issues, such as:
>  * Concatenating strings in loops
>  * Redundant calls
>  * Not breaking out of loops when necessary
>  
>  
>  





[GitHub] [tika] kamaci opened a new pull request #439: fix for TIKA-3398 contributed by kamaci

2021-05-12 Thread GitBox


kamaci opened a new pull request #439:
URL: https://github.com/apache/tika/pull/439


   






[jira] [Created] (TIKA-3398) Tidy Up Code for Performance Improvements

2021-05-12 Thread Furkan Kamaci (Jira)
Furkan Kamaci created TIKA-3398:
---

 Summary: Tidy Up Code for Performance Improvements
 Key: TIKA-3398
 URL: https://issues.apache.org/jira/browse/TIKA-3398
 Project: Tika
  Issue Type: Improvement
Affects Versions: 1.26
Reporter: Furkan Kamaci
 Fix For: 1.27


The codebase has some performance issues, such as:
 * Concatenating strings in loops
 * Redundant calls
 * Not breaking out of loops when necessary
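
The three patterns listed above can be illustrated with a minimal Java sketch. This is not code from Tika or from pull request #439; the class and method names (`LoopTidySketch`, `joinLines`, `containsBlank`) are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from Tika or from PR #439.
class LoopTidySketch {

    // Concatenating with += in a loop copies the growing string on every
    // iteration; a single StringBuilder appends in place instead.
    static String joinLines(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        // Hoist the loop-invariant size() call out of the loop condition
        // so it is not repeated on every iteration.
        int n = lines.size();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(lines.get(i));
        }
        return sb.toString();
    }

    // Stop scanning as soon as the answer is known.
    static boolean containsBlank(List<String> lines) {
        boolean found = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                found = true;
                break;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", " ", "c");
        System.out.println(joinLines(lines));
        System.out.println(containsBlank(lines));
    }
}
```

Whether any of these changes is measurable in practice depends on the size of the input and how hot the loop is, so profiling before and after is advisable.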

 

 

 





[jira] [Commented] (TIKA-3394) Integrate async into tika-app in 2.x

2021-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343289#comment-17343289
 ] 

Hudson commented on TIKA-3394:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #229 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/229/])
TIKA-3394 -- integrate pipes into tika-app (tallison: 
[https://github.com/apache/tika/commit/cdb01c9d26f5074919be92361e66bbfba0e6351e])
* (add) 
tika-core/src/main/java/org/apache/tika/pipes/fetcher/fs/FileSystemFetcher.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java
* (edit) 
tika-core/src/test/resources/org/apache/tika/config/fetchers-nobasepath-config.xml
* (edit) 
tika-core/src/test/resources/org/apache/tika/config/fetchers-duplicate-config.xml
* (add) 
tika-core/src/test/java/org/apache/tika/pipes/fetcher/fs/FileSystemFetcherTest.java
* (edit) 
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaPipesTest.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncConfig.java
* (edit) 
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerPipesIntegrationTest.java
* (edit) tika-app/pom.xml
* (edit) tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* (edit) tika-core/src/test/java/org/apache/tika/config/TikaPipesConfigTest.java
* (edit) 
tika-server/tika-server-client/src/test/resources/tika-config-simple-fs-emitter.xml
* (edit) 
tika-core/src/test/java/org/apache/tika/pipes/async/AsyncProcessorTest.java
* (delete) 
tika-core/src/test/java/org/apache/tika/pipes/fetcher/FileSystemFetcherTest.java
* (delete) 
tika-core/src/main/java/org/apache/tika/pipes/fetcher/FileSystemFetcher.java
* (edit) tika-core/src/test/resources/org/apache/tika/config/fetchers-config.xml
* (edit) 
tika-core/src/test/resources/org/apache/tika/config/fetchers-noname-config.xml
* (edit) 
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerAsyncIntegrationTest.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesClient.java


> Integrate async into tika-app in 2.x
> 
>
> Key: TIKA-3394
> URL: https://issues.apache.org/jira/browse/TIKA-3394
> Project: Tika
>  Issue Type: Task
>  Components: app
>Reporter: Tim Allison
>Priority: Minor
>






[jira] [Commented] (TIKA-3395) Make Inner Classes Static If Possible to Prevent Memory Leaks

2021-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343290#comment-17343290
 ] 

Hudson commented on TIKA-3395:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #229 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/229/])
fix for TIKA-3395 contributed by kamaci (#438) (github: 
[https://github.com/apache/tika/commit/0277620cade3ee4c0ea4ab20c64c2da6d71d5756])
* (edit) 
tika-eval/tika-eval-app/src/main/java/org/apache/tika/eval/app/io/DBWriter.java
* (edit) tika-batch/src/main/java/org/apache/tika/batch/BatchProcess.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-apple-module/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
* (edit) 
tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/tokens/TokenContraster.java
* (edit) tika-core/src/test/java/org/apache/tika/MultiThreadedTikaTest.java
* (edit) tika-core/src/test/java/org/apache/tika/TestRereadableInputStream.java
* (edit) 
tika-core/src/test/java/org/apache/tika/sax/BasicContentHandlerFactoryTest.java
* (edit) tika-batch/src/test/java/org/apache/tika/batch/fs/BatchProcessTest.java
* (edit) 
tika-batch/src/main/java/org/apache/tika/batch/fs/FSDirectoryCrawler.java
* (edit) 
tika-xmp/src/main/java/org/apache/tika/xmp/convert/OpenDocumentConverter.java
* (edit) 
tika-eval/tika-eval-app/src/main/java/org/apache/tika/eval/app/io/XMLLogReader.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFEncodedStringDecoder.java
* (edit) 
tika-batch/src/main/java/org/apache/tika/batch/fs/StreamOutRPWFSConsumer.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
* (edit) 
tika-pipes/tika-httpclient-commons/src/main/java/org/apache/tika/client/HttpClientFactory.java
* (edit) tika-core/src/test/java/org/apache/tika/detect/MagicDetectorTest.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (edit) 
tika-server/tika-server-client/src/main/java/org/apache/tika/server/client/TikaHttpClient.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXSLFPowerPointExtractorDecorator.java
* (edit) 
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/TikaWelcome.java
* (edit) 
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/UnpackerResource.java
* (edit) 
tika-eval/tika-eval-app/src/main/java/org/apache/tika/eval/app/batch/DBConsumersManager.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-package/src/test/java/org/apache/tika/parser/pkg/ZipParserTest.java
* (edit) 
tika-eval/tika-eval-app/src/test/java/org/apache/tika/eval/app/db/AbstractBufferTest.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-apple-module/src/main/java/org/apache/tika/parser/iwork/PagesContentHandler.java
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* (edit) 
tika-batch/src/test/java/org/apache/tika/batch/RecursiveParserWrapperFSConsumerTest.java
* (edit) tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* (edit) 
tika-example/src/main/java/org/apache/tika/example/InterruptableParsingExample.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* (edit) 
tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/textstats/TextSha256Signature.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/AbstractListManager.java
* (edit) 
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParserTest.java
* (edit) tika-core/src/test/java/org/apache/tika/config/MockConfigTest.java
* (edit) tika-core/src/test/java/org/apache/tika/metadata/TestMetadata.java
* (edit) 
tika-core/src/main/java/org/apache/tika/parser/RecursiveParserWrapper.java
* (edit) 

[jira] [Created] (TIKA-3397) Consider removing tika-batch module in 2.x

2021-05-12 Thread Tim Allison (Jira)
Tim Allison created TIKA-3397:
-

 Summary: Consider removing tika-batch module in 2.x
 Key: TIKA-3397
 URL: https://issues.apache.org/jira/browse/TIKA-3397
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


If the new pipes module is as performant as tika-batch, or close enough, we 
should remove tika-batch.  I'm happy doing this after the 2.0.0-BETA release.

We need to carry out an analysis to make sure that tika-pipes is better than, 
or at least not worse than, tika-batch.





[jira] [Updated] (TIKA-3389) Close Open Resources

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3389:
--
Fix Version/s: (was: 1.27)
   2.0.0

> Close Open Resources
> 
>
> Key: TIKA-3389
> URL: https://issues.apache.org/jira/browse/TIKA-3389
> Project: Tika
>  Issue Type: Bug
>  Components: languageidentifier, parser, serialization, translation
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> Connections, streams, files, and other classes that implement the 
> {{Closeable}} interface or its super-interface, {{AutoCloseable}}, need to 
> be closed after use.
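
One common way to guarantee that a `Closeable` or `AutoCloseable` is closed is the try-with-resources statement. The sketch below is a hypothetical illustration, not code from Tika; `CloseResourcesSketch`, `Tracked`, `firstLine`, and `closedAfterFailure` are names invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not code from Tika.
class CloseResourcesSketch {

    // Minimal AutoCloseable that records whether close() was called.
    static final class Tracked implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    // try-with-resources closes the reader when the block exits,
    // whether it returns normally or throws.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The resource is closed even when the body throws; close() runs
    // before the catch block does.
    static boolean closedAfterFailure() {
        Tracked tracked = new Tracked();
        try (Tracked resource = tracked) {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            // Nothing to do: the resource has already been closed.
        }
        return tracked.closed;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("first\nsecond"));
        System.out.println(closedAfterFailure());
    }
}
```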





[jira] [Resolved] (TIKA-3389) Close Open Resources

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3389.
---
Resolution: Fixed

> Close Open Resources
> 
>
> Key: TIKA-3389
> URL: https://issues.apache.org/jira/browse/TIKA-3389
> Project: Tika
>  Issue Type: Bug
>  Components: languageidentifier, parser, serialization, translation
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> Connections, streams, files, and other classes that implement the 
> {{Closeable}} interface or its super-interface, {{AutoCloseable}}, need to 
> be closed after use.





[jira] [Resolved] (TIKA-3390) Migrate Language Level to Java 8

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3390.
---
Resolution: Fixed

> Migrate Language Level to Java 8
> 
>
> Key: TIKA-3390
> URL: https://issues.apache.org/jira/browse/TIKA-3390
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Minor
> Fix For: 2.0.0
>
>
> Apache Tika supports JDK 8. However, the source code does not take advantage 
> of the syntax and improvements added since Java 5. This issue aims to migrate 
> the code to the most recent supported language level for better readability 
> and performance.
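
The issue does not list which constructs were migrated, so the following is only a hypothetical sketch of the kind of change a move from Java 5 style to Java 8 style usually involves (diamond operator, method references, streams). The class `LanguageLevelSketch` and its methods are invented for illustration and are not taken from Tika.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not code from Tika.
class LanguageLevelSketch {

    // Java 5 style: explicit type arguments and an anonymous class.
    static List<String> sortedOld(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
        return copy;
    }

    // Java 8 style: diamond operator plus a method reference.
    static List<String> sortedNew(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(String::compareToIgnoreCase);
        return copy;
    }

    // A Java 8 stream replaces a hand-written filter-and-collect loop.
    static List<String> nonEmpty(List<String> names) {
        return names.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("b", "A", "", "c");
        System.out.println(sortedOld(names));
        System.out.println(sortedNew(names));
        System.out.println(nonEmpty(names));
    }
}
```

Both sorting methods are intended to behave identically; the newer form mainly reduces boilerplate.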





[jira] [Updated] (TIKA-3393) Refactor metadata filters to use new ConfigBase in 2.x

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3393:
--
Fix Version/s: 2.0.0

> Refactor metadata filters to use new ConfigBase in 2.x
> --
>
> Key: TIKA-3393
> URL: https://issues.apache.org/jira/browse/TIKA-3393
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Updated] (TIKA-3390) Migrate Language Level to Java 8

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3390:
--
Fix Version/s: (was: 1.27)
   2.0.0

> Migrate Language Level to Java 8
> 
>
> Key: TIKA-3390
> URL: https://issues.apache.org/jira/browse/TIKA-3390
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Minor
> Fix For: 2.0.0
>
>
> Apache Tika supports JDK 8. However, the source code does not take advantage 
> of the syntax and improvements added since Java 5. This issue aims to migrate 
> the code to the most recent supported language level for better readability 
> and performance.





[jira] [Updated] (TIKA-3395) Make Inner Classes Static If Possible to Prevent Memory Leaks

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3395:
--
Fix Version/s: (was: 1.27)
   2.0.0

> Make Inner Classes Static If Possible to Prevent Memory Leaks
> -
>
> Key: TIKA-3395
> URL: https://issues.apache.org/jira/browse/TIKA-3395
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> A static inner class does not keep an implicit reference to its enclosing 
> instance. This prevents a common cause of memory leaks and uses less memory 
> per instance of the class.
> Details can be found here: 
> [https://www.infoworld.com/article/3526554/avoid-memory-leaks-in-inner-classes.html]
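
The implicit reference described above can be observed in a small sketch. This is a hypothetical example, not code from Tika or from pull request #438; `OuterSketch`, `InnerTask`, `NestedTask`, and `syntheticFieldCount` are names made up for illustration. It relies on the assumption that the compiler stores the enclosing instance in a synthetic field (conventionally `this$0`) when the inner class actually uses it.

```java
import java.lang.reflect.Field;

// Hypothetical illustration only; not code from Tika or from PR #438.
class OuterSketch {
    private final byte[] largeBuffer = new byte[1024 * 1024];

    // Non-static inner class: each instance holds a hidden reference to
    // the OuterSketch that created it, which keeps largeBuffer reachable
    // for as long as the task itself is reachable.
    class InnerTask implements Runnable {
        int observedLength;

        @Override
        public void run() {
            observedLength = largeBuffer.length;
        }
    }

    // Static nested class: no hidden reference to an enclosing instance.
    static class NestedTask implements Runnable {
        @Override
        public void run() {
            // Works without any OuterSketch instance.
        }
    }

    // Counts compiler-generated fields declared directly on the class.
    static int syntheticFieldCount(Class<?> type) {
        int count = 0;
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(syntheticFieldCount(InnerTask.class));
        System.out.println(syntheticFieldCount(NestedTask.class));
    }
}
```

A nested class can only be made static when it does not touch instance members of the enclosing class, which matches the "if possible" qualifier in the issue title.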





[jira] [Resolved] (TIKA-3395) Make Inner Classes Static If Possible to Prevent Memory Leaks

2021-05-12 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3395.
---
Resolution: Fixed

> Make Inner Classes Static If Possible to Prevent Memory Leaks
> -
>
> Key: TIKA-3395
> URL: https://issues.apache.org/jira/browse/TIKA-3395
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 2.0.0
>
>
> A static inner class does not keep an implicit reference to its enclosing 
> instance. This prevents a common cause of memory leaks and uses less memory 
> per instance of the class.
> Details can be found here: 
> [https://www.infoworld.com/article/3526554/avoid-memory-leaks-in-inner-classes.html]





[jira] [Created] (TIKA-3396) Rename parser modules in 2.0

2021-05-12 Thread Tim Allison (Jira)
Tim Allison created TIKA-3396:
-

 Summary: Rename parser modules in 2.0
 Key: TIKA-3396
 URL: https://issues.apache.org/jira/browse/TIKA-3396
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


On the dev list, I think there's lazy consensus for:

tika-parsers-standard
tika-parsers-extended
tika-parsers-ml (machine learning)





[jira] [Commented] (TIKA-3395) Make Inner Classes Static If Possible to Prevent Memory Leaks

2021-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343236#comment-17343236
 ] 

ASF GitHub Bot commented on TIKA-3395:
--

tballison merged pull request #438:
URL: https://github.com/apache/tika/pull/438


   




> Make Inner Classes Static If Possible to Prevent Memory Leaks
> -
>
> Key: TIKA-3395
> URL: https://issues.apache.org/jira/browse/TIKA-3395
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.26
>Reporter: Furkan Kamaci
>Priority: Major
> Fix For: 1.27
>
>
> A static inner class does not keep an implicit reference to its enclosing 
> instance. This prevents a common cause of memory leaks and uses less memory 
> per instance of the class.
> Details can be found here: 
> [https://www.infoworld.com/article/3526554/avoid-memory-leaks-in-inner-classes.html]





[GitHub] [tika] tballison merged pull request #438: fix for TIKA-3395 contributed by kamaci

2021-05-12 Thread GitBox


tballison merged pull request #438:
URL: https://github.com/apache/tika/pull/438


   

