[jira] [Reopened] (TIKA-3614) Trying to upgrade from 1.27 to 2.1.0

2021-12-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened TIKA-3614:
---

> Trying to upgrade from 1.27 to 2.1.0
> 
>
> Key: TIKA-3614
> URL: https://issues.apache.org/jira/browse/TIKA-3614
> Project: Tika
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.1.0
>Reporter: Vamsi Molli
>Priority: Major
>  Labels: gradle
>
> Currently, my application uses Tika 1.27. In the Gradle file, we declare the 
> dependency as shown below to download and use the Tika components.
> api(group: 'org.apache.tika', name: 'tika-parsers', version: '1.27') {
>     // Tika requires a version of jackson that conflicts with other dependencies.
>     exclude group: "com.fasterxml.jackson.core"
> }
> However, when I try to update to 2.1.0, I see that some of the imports below 
> are missing:
> import org.apache.tika.config.TikaConfig;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.exception.TikaException;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.HttpHeaders;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.metadata.TikaMetadataKeys;
> import org.apache.tika.mime.MediaType;
> import org.apache.tika.mime.MimeType;
> import org.apache.tika.mime.MimeTypeException;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> I tried the dependency below, which causes the above imports to go missing.
>  
> api(group: 'org.apache.tika', name: 'tika-parsers-standard-package', version: '2.1.0') {
>     // Tika requires a version of jackson that conflicts with other dependencies.
>     exclude group: "com.fasterxml.jackson.core"
> }
> Please let me know which imports I need to change to fix the above issues.
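
One way to narrow down a report like this is to check which of the imported classes the runtime classpath can actually resolve after the dependency change. The sketch below is a minimal, hypothetical helper (ClasspathCheck is not part of Tika) that uses only the JDK; the class names in main are copied from the report above, and whether any of them resolve depends entirely on the dependencies of the build that runs it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Tika): reports which class names cannot be loaded. */
class ClasspathCheck {

    /** Returns the subset of classNames that the current class loader cannot resolve. */
    static List<String> missing(List<String> classNames) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        List<String> notFound = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only resolve the class, do not run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Class names taken from the imports listed in the report
        List<String> imports = List.of(
                "org.apache.tika.config.TikaConfig",
                "org.apache.tika.detect.Detector",
                "org.apache.tika.exception.TikaException",
                "org.apache.tika.io.TikaInputStream",
                "org.apache.tika.metadata.HttpHeaders",
                "org.apache.tika.metadata.Metadata",
                "org.apache.tika.metadata.TikaMetadataKeys",
                "org.apache.tika.mime.MediaType",
                "org.apache.tika.mime.MimeType",
                "org.apache.tika.mime.MimeTypeException",
                "org.apache.tika.parser.AutoDetectParser",
                "org.apache.tika.parser.ParseContext");
        for (String name : missing(imports)) {
            System.out.println("not on classpath: " + name);
        }
    }
}
```

Run with the same runtime classpath as the application, this prints only the names that fail to load, which separates a class that is absent from the new artifacts from a dependency that was simply not pulled in.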



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (TIKA-3614) Trying to upgrade from 1.27 to 2.1.0

2021-12-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed TIKA-3614.
-
Resolution: Not A Bug

I'm closing these as "not a bug" because we don't want these to appear in a 
ticket as "fixed".



[jira] [Reopened] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened TIKA-3615:
---

> Missing class file while upgrade to TIka 2.1.0
> --
>
> Key: TIKA-3615
> URL: https://issues.apache.org/jira/browse/TIKA-3615
> Project: Tika
>  Issue Type: Test
>  Components: detector, parser
>Affects Versions: 2.1.0
>Reporter: Vamsi Molli
>Priority: Major
>
> A class-not-found exception occurs for the following:
> 





[jira] [Closed] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed TIKA-3615.
-
Resolution: Not A Bug

> Missing class file while upgrade to TIka 2.1.0
> --
>
> Key: TIKA-3615
> URL: https://issues.apache.org/jira/browse/TIKA-3615
> Project: Tika
>  Issue Type: Bug
>  Components: detector, parser
>Affects Versions: 2.1.0
>Reporter: Vamsi Molli
>Priority: Major
>


[jira] [Updated] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated TIKA-3615:
--
Issue Type: Bug  (was: Test)



Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-15 Thread Tamás Cservenák
Howdy,

There were some Maven Central issues in the past few days, hopefully fixed.
https://status.maven.org/#week

Thanks
Tamas


On Mon, Dec 13, 2021 at 11:18 PM Lewis John McGibbney 
wrote:

> I performed another build of the tika-2.2.0-src.zip artifact which failed.
> I've captured the failure output
>
> https://paste.apache.org/o9iju
>
> % mvn -version
> Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
> Maven home: /usr/local/Cellar/maven/3.8.4/libexec
> Java version: 11.0.10, vendor: Oracle Corporation, runtime:
> /Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
>
> Can anyone else reproduce this failure?
>
> lewismc
>
> On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> > Hi Tim,
> >
> > On 2021/12/13 21:37:47 Tim Allison wrote:
> > > A candidate for the Tika 2.2.0 release is available at:
> > > https://dist.apache.org/repos/dist/dev/tika/
> >
> > I downloaded the tika-2.2.0-src.zip artifact
> > >
> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> >
> > .sha512 signature good
> > .asc signature is good
> > pom.xml versions all match
> > good NOTICE.txt
> > good CHANGES.txt
> >
> > >
> > > In addition, a staged maven repository is available here:
> > >
> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> > >
> >
> > I added the following to Any23 master pom.xml and ran our unit test suite
> >
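> > <repositories>
> >   <repository>
> >     <id>apache-repo-snapshots</id>
> >     <url>https://repository.apache.org/content/repositories/snapshots/</url>
> >     <releases>
> >       <enabled>false</enabled>
> >     </releases>
> >     <snapshots>
> >       <enabled>true</enabled>
> >     </snapshots>
> >   </repository>
> > </repositories>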
> >
> > Everything passes successfully.
> >
> > >
> > > [X] +1 Release this package as Apache Tika 2.2.0
> >
> > I did notice that the Tika DL module(s) are pulling in the entire Hadoop
> > dependency chain. I wonder if we can cut down on this... that is, however,
> > a concern outside of this release candidate review.
> >
> > Thanks for the quick turnaround.
> > lewismc
> >
>
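
The ".sha512 signature good" step in the quoted review amounts to hashing the downloaded artifact and comparing the result with the published hex digest. A minimal sketch using only the JDK follows; the class and method names are hypothetical, and it assumes the expected digest has already been read out of the .sha512 file as a plain hex string.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for checking an artifact against its published SHA-512 digest. */
class Sha512Check {

    /** Returns the lowercase hex SHA-512 digest of the given bytes. */
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                // mask to 0..255 so negative bytes format as two hex digits
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 ships with mainstream JDKs; treat its absence as a broken runtime
            throw new IllegalStateException(e);
        }
    }

    /** True if the computed digest equals the expected hex string (case-insensitive, trimmed). */
    static boolean matches(byte[] data, String expectedHex) {
        return sha512Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws java.io.IOException {
        // args[0]: path to the artifact, args[1]: expected hex digest
        byte[] artifact = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println(matches(artifact, args[1]) ? "sha512 OK" : "sha512 MISMATCH");
    }
}
```

This only covers the checksum. The .asc check mentioned alongside it is a separate OpenPGP signature verification and is not shown here.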


Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Tim Allison
It didn't take too long, and as long as the original author of the
metrics stuff in tika-server isn't too concerned about breaking
changes, let's hope for the best. Log4j 1.x is so far beyond its EOL,
it is embarrassing.

I think we should keep the 1.x branch open for security upgrades for a
bit...middle of next year?  I have _not_ been adding new features or
even some bug fixes to 1.x, and I encourage people to migrate to 2.x.

What do others think?

On Tue, Dec 14, 2021 at 8:05 PM Luís Filipe Nassif  wrote:
>
> Sorry about the additional work, Tim. I thought upgrading from log4j 1.x to
> 2.x on Tika 1.x might not be that hard, and I didn't know about the breaking
> changes.
>
> Related to Eric's email, would we support Tika 1.x security updates for
> some time (that was my intent with the proposal above)? Was this already
> discussed?
>
> Best regards,
> Luis Filipe
>
>
>
> Em seg., 13 de dez. de 2021 às 17:23, Tim Allison 
> escreveu:
>
> > Yes.  That was the reasoning behind my -0.  I don't think this will
> > destroy our resources, but yes, please do migrate to 2.x asap.
> >
> >
> > On Mon, Dec 13, 2021 at 3:13 PM Eric Pugh
> >  wrote:
> > >
> > > Isn’t the goal of Tika 2 that we no longer work on Tika 1? Does the
> > > Tika community have enough developer bandwidth to continue to maintain
> > > Tika 1 while also pushing forward on Tika 2?
> > >
> > > I worry that we’ll fall into that situation where people just end up
> > > using Tika 1 forever, especially if there are new updates to it, which
> > > then encourages folks not to move to Tika 2.
> > >
> > >
> > >
> > >
> > > > On Dec 13, 2021, at 2:49 PM, Tim Allison  wrote:
> > > >
> > > > Sounds like 2 +1 to my -0. :D  I'll start working on this now.
> > > >
> > > > On Mon, Dec 13, 2021 at 2:09 PM Nicholas DiPiazza
> > > >  wrote:
> > > >>
> > > >> I prefer upgrade to log4j2
> > > >>
> > > >> On Mon, Dec 13, 2021, 12:05 PM Tim Allison 
> > wrote:
> > > >>
> > > >>> All,
> > > >>>  I'm currently in the process of building the rc1 for Tika 2.x. On
> > > >>> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> > > >>> log4j2 in the 1.x branch.  I think we avoided that because it would
> > > >>> be a breaking change(?).  There are security vulns in log4j and it
> > > >>> hit EOL in August 2015.
> > > >>>  Should we upgrade the Tika 1.x branch for log4j2?
> > > >>>
> > > >>>  Best,
> > > >>>
> > > >>>   Tim
> > > >>>
> > > >>>
> > > >>> [1]
> > > >>>
> > https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457595#comment-17457595
> > > >>>
> > >
> >


[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459917#comment-17459917
 ] 

Tim Allison commented on TIKA-3618:
---

[~subhajitdas298] once you’ve had a chance to review branch_1x, we can roll the 
1.28 release. Thank you!

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-15 Thread Subhajit Das (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459931#comment-17459931
 ] 

Subhajit Das commented on TIKA-3618:


Yes, I will check and push.



[jira] [Commented] (TIKA-3610) Emit errors to a specific emitter

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459998#comment-17459998
 ] 

Tim Allison commented on TIKA-3610:
---

Hi [~dadoonet], at the meetup, I forgot that I already added a PipesReporter.  
See 
https://github.com/tballison/file-observatory/blob/main/tika-addons/tika-pipes-reporter/src/main/java/org/tallison/tika/pipes/TikaPipesReporter.java
 for an example of logging issues to a jdbc endpoint. 

To invoke it, you add this element to the tika-config.xml file:


  

jdbc:postgresql://somewhere/somedb?user=user&password=password
  


> Emit errors to a specific emitter
> -
>
> Key: TIKA-3610
> URL: https://issues.apache.org/jira/browse/TIKA-3610
> Project: Tika
>  Issue Type: New Feature
>  Components: tika-pipes
>Reporter: David Pilato
>Priority: Minor
>
> Instead of emitting errors in the logs, we should emit them using an error 
> emitter instead.
> Then we can have all the flexibility we want to emit our errors to a logger, 
> to a filesystem, to whatever implementation.
> For example:
> {code:java}
> fse {code}
>  
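
The idea in the description, routing errors through a pluggable emitter instead of writing them straight to the logs, can be sketched with a small interface. Everything below is hypothetical and uses only the JDK: ErrorEmitter, its implementations and PipelineSketch are illustrative names, not Tika classes, and the loop stands in for real document processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical interface, not a Tika API: a destination for per-document errors. */
interface ErrorEmitter {
    void emit(String documentId, Throwable error);
}

/** Emits errors to a java.util.logging logger. */
class LoggingErrorEmitter implements ErrorEmitter {
    private static final Logger LOG = Logger.getLogger(LoggingErrorEmitter.class.getName());

    @Override
    public void emit(String documentId, Throwable error) {
        LOG.warning(documentId + ": " + error);
    }
}

/** Collects errors in memory; a filesystem or JDBC emitter would follow the same shape. */
class InMemoryErrorEmitter implements ErrorEmitter {
    private final List<String> records = new ArrayList<>();

    @Override
    public void emit(String documentId, Throwable error) {
        records.add(documentId + "\t" + error.getClass().getName() + "\t" + error.getMessage());
    }

    List<String> records() {
        return records;
    }
}

/** Stand-in for a processing loop that reports failures through the configured emitter. */
class PipelineSketch {
    static void process(List<String> documentIds, ErrorEmitter emitter) {
        for (String id : documentIds) {
            try {
                if (id.isEmpty()) {
                    throw new IllegalArgumentException("empty document id");
                }
                // real parsing would happen here
            } catch (RuntimeException e) {
                // the loop does not know or care where the error ends up
                emitter.emit(id, e);
            }
        }
    }
}
```

Swapping LoggingErrorEmitter for another implementation changes where errors land without touching the processing loop, which is the flexibility the request asks for.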





[jira] [Comment Edited] (TIKA-3610) Emit errors to a specific emitter

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459998#comment-17459998
 ] 

Tim Allison edited comment on TIKA-3610 at 12/15/21, 2:37 PM:
--

Hi [~dadoonet], at the meetup, I forgot that I already added a PipesReporter.  
See 
https://github.com/tballison/file-observatory/blob/main/tika-addons/tika-pipes-reporter/src/main/java/org/tallison/tika/pipes/TikaPipesReporter.java
 for an example of logging issues to a jdbc endpoint. 

To invoke it on the /async handler, you add this element to the tika-config.xml 
file:
{noformat}
   

  

jdbc:postgresql://somewhere/somedb?user=user&password=password
  

   
{noformat}


was (Author: talli...@mitre.org):
Hi [~dadoonet], at the meetup, I forgot that I already added a PipesReporter.  
See 
https://github.com/tballison/file-observatory/blob/main/tika-addons/tika-pipes-reporter/src/main/java/org/tallison/tika/pipes/TikaPipesReporter.java
 for an example of logging to issues to a jdbc endpoint. 

To invoke it, you add this element to the tika-config.xml file:


  

jdbc:postgresql://somewhere/somedb?user=user&password=password
  




[jira] [Commented] (TIKA-3610) Emit errors to a specific emitter

2021-12-15 Thread David Pilato (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1746#comment-1746
 ] 

David Pilato commented on TIKA-3610:


That's very good. So I believe we are all set and we can close this one ;) 

 



Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Nick Burch

On Wed, 15 Dec 2021, Tim Allison wrote:
> I think we should keep the 1.x branch open for security upgrades for a 
> bit...middle of next year?  I have _not_ been adding new features or 
> even some bug fixes to 1.x, and I encourage people to migrate to 2.x.


We've seen quite a few queries from people struggling to upgrade in the 
last few weeks, so I think it's fair to say we must have a decent number 
of 1.x users still. For an example, Alfresco only upgraded a couple of 
weeks ago, and that's only on their main branch, so it'll be a while until 
it's in their releases.


I'm not keen on adding new features to 1.x, as that'll only encourage 
people to stick on the old one, but I wouldn't go as far as -1'ing other 
people's backports if they're still keen, at least for a while!


I'd be minded to say we probably need to keep on top of security stuff 
until something like September 2022, to give people just over a year to 
upgrade to 2.x. I'm minded to say we allow anyone keen to backport 
bugfixes etc until 3 months before that, but effectively discourage it for 
the last part to help encourage hold-outs to move. I think we should post 
something on the site about the planned EOL timeline, linking once more to 
the wonderful migrating resource on the wiki.


Nick
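
For anyone turning the proposal above into concrete dates for the site, the arithmetic is small enough to sketch with java.time. The class name is hypothetical, and the dates are the ones proposed in this message, not a decided schedule.

```java
import java.time.YearMonth;

/** Date arithmetic for the proposed 1.x timeline; the dates are a proposal, not a decision. */
class Tika1xTimeline {
    // Proposed end of security fixes for 1.x, taken from the message above
    static final YearMonth SECURITY_EOL = YearMonth.of(2022, 9);

    /** Bugfix backports would be welcome until three months before the security EOL. */
    static YearMonth backportCutoff() {
        return SECURITY_EOL.minusMonths(3);
    }

    public static void main(String[] args) {
        System.out.println("backports until: " + backportCutoff());
        System.out.println("security fixes until: " + SECURITY_EOL);
    }
}
```

Under these assumptions the backport window would close in June 2022, with security fixes continuing until September 2022.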


[jira] [Resolved] (TIKA-369) Improve accuracy of language detection

2021-12-15 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved TIKA-369.
---
Resolution: Fixed

Cleaning this one up [~kkrugler]

> Improve accuracy of language detection
> --
>
> Key: TIKA-369
> URL: https://issues.apache.org/jira/browse/TIKA-369
> Project: Tika
>  Issue Type: Improvement
>  Components: languageidentifier
>Affects Versions: 0.6
>Reporter: Kenneth William Krugler
>Assignee: Kenneth William Krugler
>Priority: Major
> Attachments: Surprise and Coincidence.pdf, lingdet-mccs.pdf, 
> textcat.pdf
>
>
> Currently the LanguageProfile code uses 3-grams to find the best language 
> profile using Pearson's chi-square test. This has three issues:
> 1. The results aren't very good for short runs of text. Ted Dunning's paper 
> (attached) indicates that a log-likelihood ratio (LLR) test works much 
> better, which would then make language detection faster due to less text 
> needing to be processed.
> 2. The current LanguageIdentifier.isReasonablyCertain() method uses an exact 
> value as a threshold for certainty. This is very sensitive to the amount of 
> text being processed, and thus gives false negative results for short runs of 
> text.
> 3. Certainty should also be based on how much better the result is for 
> language X, compared to the next best language. If two languages both had 
> identical sum-of-squares values, and this value was below the threshold, then 
> the result is still not very certain.
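
Point 3 above can be illustrated with a short sketch: a result counts as certain only if the best score is under the absolute threshold and is also clearly better than the runner-up. This is a hypothetical illustration, not the LanguageIdentifier implementation. It assumes non-negative distance scores where lower means a closer match, and the threshold values in the usage note are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a margin-based certainty check; not Tika's implementation. */
class CertaintySketch {

    /**
     * distances maps a language code to a distance score (lower is a better match).
     * The result is certain only if the best score is at most maxDistance AND it
     * beats the runner-up by at least minRelativeMargin (0.2 means 20% lower).
     */
    static boolean isReasonablyCertain(Map<String, Double> distances,
                                       double maxDistance,
                                       double minRelativeMargin) {
        if (distances.isEmpty()) {
            return false;
        }
        List<Double> sorted = new ArrayList<>(distances.values());
        Collections.sort(sorted);
        double best = sorted.get(0);
        if (best > maxDistance) {
            return false; // fails the absolute threshold
        }
        if (sorted.size() == 1) {
            return true; // no competing language to compare against
        }
        double second = sorted.get(1);
        if (second <= 0.0) {
            return false; // both scores are zero: a tie, so not certain
        }
        return (second - best) / second >= minRelativeMargin;
    }
}
```

With a threshold of 0.022 and a margin of 0.2, two languages that both score 0.010 are rejected even though each is under the threshold, while 0.010 against 0.020 is accepted. A relative margin is used here so the check scales with the magnitude of the scores; an absolute gap would be an equally valid design choice.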





[GitHub] [tika] nddipiazza opened a new pull request #465: TIKA-3446 - 1.x port of TIKA-3446 - Support for parsing OneNote when Alternative Encoding Using the File Synchronization via SOAP Over HTTP P

2021-12-15 Thread GitBox


nddipiazza opened a new pull request #465:
URL: https://github.com/apache/tika/pull/465


   Porting https://github.com/apache/tika/pull/461 to Tika 1.x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460062#comment-17460062
 ] 

ASF GitHub Bot commented on TIKA-3446:
--

nddipiazza opened a new pull request #465:
URL: https://github.com/apache/tika/pull/465


   Porting https://github.com/apache/tika/pull/461 to Tika 1.x




> OneNote - look into adding support for OneNote 365 documents
> 
>
> Key: TIKA-3446
> URL: https://issues.apache.org/jira/browse/TIKA-3446
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 1.27
>Reporter: Nicholas DiPiazza
>Assignee: Nicholas DiPiazza
>Priority: Major
>
> While doing some parsing of OneNote documents, I was investigating a slew of 
> them that did not seem to parse very well. 
> When I did some digging, I found out that these documents were generated from 
> SharePoint Online. 
> I had hoped that OneNote documents generated from SharePoint Online would 
> just be the same as OnPrem OneNote documents from 2016, 2019 etc. 
> But turns out this is NOT the case. 
> I checked out the Microsoft specification MS-ONESTORE and found that the 
> documents do not match the specifications that are published. 
> Opened a community post: [Looking for the MS spec for OneNote 365 version - 
> Microsoft 
> Q&A|https://docs.microsoft.com/en-us/answers/questions/436943/looking-for-the-ms-spec-for-onenote-365-version-1.html]
> And also opened an internal ticket with Microsoft. 
> They will be responding soon with an analysis of my issue and we'll see if 
> there is anything we can do. 





[GitHub] [tika] nddipiazza commented on pull request #465: TIKA-3446 - 1.x port of TIKA-3446 - Support for parsing OneNote when Alternative Encoding Using the File Synchronization via SOAP Over HTTP P

2021-12-15 Thread GitBox


nddipiazza commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-994933278


   @tballison am I OK to use stringutils3 in the 1.x branch? 






[jira] [Resolved] (TIKA-491) Add language identification support for Norwegian Bokmål and Norwegian Nynorsk

2021-12-15 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved TIKA-491.
---
Resolution: Won't Fix

Cleanup [~kkrugler] [~pandermusubi]

> Add language identification support for Norwegian Bokmål and Norwegian Nynorsk
> --
>
> Key: TIKA-491
> URL: https://issues.apache.org/jira/browse/TIKA-491
> Project: Tika
>  Issue Type: New Feature
>  Components: languageidentifier
>Affects Versions: 0.7
>Reporter: Jan Høydahl
>Assignee: Kenneth William Krugler
>Priority: Major
>
> Currently there is one Norwegian language profile in Tika - "no". We need to 
> distinguish between the two official Norwegian languages defined by ISO 639-1 
> codes "nb" and "nn". Those codes are recommended for use instead of the common 
> "no" tag.
> The proposed solution is to remove the current language profile no.ngp and 
> replace it with two new ones for nb and nn.
> We must also add tests for Norwegian.





[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460064#comment-17460064
 ] 

ASF GitHub Bot commented on TIKA-3446:
--

nddipiazza commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-994933278


   @tballison am I OK to use stringutils3 in the 1.x branch? 




[jira] [Commented] (TIKA-3229) mvn clean install failure - tika-1.24 on windows

2021-12-15 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460065#comment-17460065
 ] 

Lewis John McGibbney commented on TIKA-3229:


[~Simmo] are you able to reproduce this? Otherwise I think we should close this 
ticket. 

> mvn clean install failure -  tika-1.24 on windows
> -
>
> Key: TIKA-3229
> URL: https://issues.apache.org/jira/browse/TIKA-3229
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.24
> Environment: windows 10
>Reporter: Simon Opper
>Priority: Major
>
> Getting a build failure on mvn clean install:
>  
> [ERROR] Failed to execute goal 
> org.apache.felix:maven-bundle-plugin:4.1.0:bundle (default-bundle) on project 
> tika-core: Execution default-bundle of goal 
> org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed.: 
> ConcurrentModificationException -> [Help 1]
>  
> the complete verbose error text is below
>  
>  --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ tika-core ---
> [INFO] 
> 
> [INFO] Reactor Summary for Apache Tika 2.0.0-SNAPSHOT:
> [INFO]
> [INFO] Apache Tika parent . SUCCESS [  1.813 s]
> [INFO] Apache Tika core ... FAILURE [  7.528 s]
> [INFO] Apache Tika parser modules . SKIPPED
> [INFO] tika-parser-jdbc-commons ... SKIPPED
> [INFO] tika-parser-digest-commons . SKIPPED
> [INFO] tika-parser-mail-commons ... SKIPPED
> [INFO] tika-parser-xmp-commons  SKIPPED
> [INFO] tika-parser-zip-commons  SKIPPED
> [INFO] tika-parser-image-module ... SKIPPED
> [INFO] tika-parser-ocr-module . SKIPPED
> [INFO] tika-parser-audiovideo-module .. SKIPPED
> [INFO] tika-parser-text-module  SKIPPED
> [INFO] tika-parser-code-module  SKIPPED
> [INFO] tika-parser-html-module  SKIPPED
> [INFO] tika-parser-font-module  SKIPPED
> [INFO] tika-parser-xml-module . SKIPPED
> [INFO] tika-parser-microsoft-module ... SKIPPED
> [INFO] tika-parser-pkg-module . SKIPPED
> [INFO] tika-parser-pdf-module . SKIPPED
> [INFO] tika-parser-apple-module ... SKIPPED
> [INFO] tika-parser-cad-module . SKIPPED
> [INFO] tika-parser-mail-module  SKIPPED
> [INFO] tika-parser-miscoffice-module .. SKIPPED
> [INFO] tika-parser-news-module  SKIPPED
> [INFO] tika-parser-crypto-module .. SKIPPED
> [INFO] tika-parser-integration-tests .. SKIPPED
> [INFO] tika-parsers ... SKIPPED
> [INFO] tika-parsers-extended .. SKIPPED
> [INFO] tika-parser-sqlite3-module . SKIPPED
> [INFO] tika-parser-scientific-module .. SKIPPED
> [INFO] tika-parsers-extended-integration-tests  SKIPPED
> [INFO] Apache Tika XMP  SKIPPED
> [INFO] Apache Tika serialization .. SKIPPED
> [INFO] Apache Tika batch .. SKIPPED
> [INFO] Apache Tika language detection . SKIPPED
> [INFO] tika-langdetect-commons  SKIPPED
> [INFO] tika-langdetect-lingo24  SKIPPED
> [INFO] tika-langdetect-optimaize .. SKIPPED
> [INFO] tika-langdetect-mitll-text . SKIPPED
> [INFO] tika-langdetect-opennlp  SKIPPED
> [INFO] Apache Tika application  SKIPPED
> [INFO] Apache Tika translate .. SKIPPED
> [INFO] Apache Tika server . SKIPPED
> [INFO] Apache Tika fuzzing  SKIPPED
> [INFO] Apache Tika eval ... SKIPPED
> [INFO] Apache Tika examples ... SKIPPED
> [INFO] Apache Tika Java-7 Components .. SKIPPED
> [INFO] tika-parsers-advanced .. SKIPPED
> [INFO] tika-parser-nlp-module . SKIPPED
> [INFO] Apache Tika Natural Language Processing  SKIPPED
> [INFO] tika-parser-advancedmedia-module ... SKIPPED
> [INFO] Apache Tika Deep Learning (powered by DL4J) ...

[jira] [Commented] (TIKA-3241) Clarify parser module structure in 2.0.0

2021-12-15 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460066#comment-17460066
 ] 

Lewis John McGibbney commented on TIKA-3241:


Hi [~tallison] can this ticket be closed?

> Clarify parser module structure in 2.0.0
> 
>
> Key: TIKA-3241
> URL: https://issues.apache.org/jira/browse/TIKA-3241
> Project: Tika
>  Issue Type: Task
>Affects Versions: 2.0.0
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> In 2.0.0, we currently have:
> tika-parser-modules/
> tika-parsers/
> tika-parsers-advanced/
> tika-parsers-extended
> where {{tika-parsers}} is a module that includes all parsers in 
> {{tika-parser-modules}}.
> I think we can make the structure a bit clearer by:
> tika-parsers/
>tika-parsers-classic/ (renamed from tika-parser-modules)
>tika-parsers-advanced/
>tika-parsers-extended
> As before in 2.0.0, tika-app and tika-server would pull from 
> tika-parsers-classic.  If users wanted the heavier parsers in 
> tika-parsers-advanced/tika-parsers-extended, they could pull those in on 
> their own.
>   





[GitHub] [tika] tballison commented on pull request #465: TIKA-3446 - 1.x port of TIKA-3446 - Support for parsing OneNote when Alternative Encoding Using the File Synchronization via SOAP Over HTTP Pr

2021-12-15 Thread GitBox


tballison commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-994938829


   @nddipiazza I think the goal is to keep tika-core as slim as possible, but 
you can put whatever you need in tika-parsers, as long as we don't have any 
conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460068#comment-17460068
 ] 

ASF GitHub Bot commented on TIKA-3446:
--

tballison commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-994938829


   @nddipiazza I think the goal is to keep tika-core as slim as possible, but 
you can put whatever you need in tika-parsers, as long as we don't have any 
conflicts.




> OneNote - look into adding support for OneNote 365 documents
> 
>
> Key: TIKA-3446
> URL: https://issues.apache.org/jira/browse/TIKA-3446
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 1.27
>Reporter: Nicholas DiPiazza
>Assignee: Nicholas DiPiazza
>Priority: Major
>
> While doing some parsing of OneNote documents, I was investigating a slew of 
> them that did not seem to parse very well. 
> When I did some digging, I found out that these documents were generated from 
> SharePoint Online. 
> I had hoped that OneNote documents generated from SharePoint Online would 
> just be the same as OnPrem OneNote documents from 2016, 2019 etc. 
> But it turns out this is NOT the case. 
> I checked out the Microsoft specification MS-ONESTORE and found that the 
> documents do not match the specifications that are published. 
> Opened a community post: [Looking for the MS spec for OneNote 365 version - 
> Microsoft 
> Q&A|https://docs.microsoft.com/en-us/answers/questions/436943/looking-for-the-ms-spec-for-onenote-365-version-1.html]
> And also opened an internal ticket with Microsoft. 
> They will be responding soon with an analysis of my issue and we'll see if 
> there is anything we can do. 





[jira] [Resolved] (TIKA-3241) Clarify parser module structure in 2.0.0

2021-12-15 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3241.
---
Fix Version/s: 2.0.0-ALPHA
   Resolution: Fixed

Y, thanks [~lewismc]



[jira] [Assigned] (TIKA-3620) Language detection documentation needs attention

2021-12-15 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned TIKA-3620:
--

Assignee: Lewis John McGibbney

> Language detection documentation needs attention
> 
>
> Key: TIKA-3620
> URL: https://issues.apache.org/jira/browse/TIKA-3620
> Project: Tika
>  Issue Type: Improvement
>  Components: languageidentifier
>Affects Versions: 2.1.0
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
>
> This language identifier/detection suffers from a few problems:
> # Clarity is needed on identifier/identification vs. detector/detection. Which 
> is it? The source code says identifier whereas the [documentation is nested 
> under 
> detection|https://tika.apache.org/2.1.0/detection.html#Language_Detection].
> # The 
> [org.apache.tika.language.LanguageIdentifier|https://tika.apache.org/2.1.0/api/org/apache/tika/language/LanguageIdentifier.html]
>  returns 404. What is this meant to resolve to?
> # Generally speaking the [documentation is literally 
> non-existent|https://tika.apache.org/2.1.0/detection.html#Language_Detection].
>  I checked the wiki and failed to find anything. I did find some [minor 
> documentation|https://tika.apache.org/2.1.0/examples.html#Language_Identification]
>  but this is also severely lacking. Also note the broken hyperlink.
> Some suggestions for improvement:
> # Fix the broken hyperlinks.
> # Hyperlink to the existing examples, namely 
> [LanguageDetectorExample.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/LanguageDetectorExample.java],
>  
> [LanguageDetectingParser.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/LanguageDetectingParser.java]
>  and 
> [Language.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/Language.java]
> # Hyperlink to the [LanguageDetector 
> Javadoc|https://tika.apache.org/2.1.0/api/index.html?org/apache/tika/language/detect/LanguageDetector.html]
>  and at least mention some of the other implementations.





[jira] [Created] (TIKA-3620) Language detection documentation needs attention

2021-12-15 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-3620:
--

 Summary: Language detection documentation needs attention
 Key: TIKA-3620
 URL: https://issues.apache.org/jira/browse/TIKA-3620
 Project: Tika
  Issue Type: Improvement
  Components: languageidentifier
Affects Versions: 2.1.0
Reporter: Lewis John McGibbney


This language identifier/detection suffers from a few problems:
# Clarity is needed on identifier/identification vs. detector/detection. Which 
is it? The source code says identifier whereas the [documentation is nested 
under 
detection|https://tika.apache.org/2.1.0/detection.html#Language_Detection].
# The 
[org.apache.tika.language.LanguageIdentifier|https://tika.apache.org/2.1.0/api/org/apache/tika/language/LanguageIdentifier.html]
 returns 404. What is this meant to resolve to?
# Generally speaking the [documentation is literally 
non-existent|https://tika.apache.org/2.1.0/detection.html#Language_Detection]. 
I checked the wiki and failed to find anything. I did find some [minor 
documentation|https://tika.apache.org/2.1.0/examples.html#Language_Identification]
 but this is also severely lacking. Also note the broken hyperlink.

Some suggestions for improvement:
# Fix the broken hyperlinks.
# Hyperlink to the existing examples, namely 
[LanguageDetectorExample.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/LanguageDetectorExample.java],
 
[LanguageDetectingParser.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/LanguageDetectingParser.java]
 and 
[Language.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/Language.java]
# Hyperlink to the [LanguageDetector 
Javadoc|https://tika.apache.org/2.1.0/api/index.html?org/apache/tika/language/detect/LanguageDetector.html]
 and at least mention some of the other implementations.





Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Tim Allison
Sounds good, Nick.  Unless there are objections, I'll add an EOL of
September 30, 2022 for the 1.x branch to our GitHub README and maybe
somewhere on our site?

> I'm not keen on adding new features to 1.x, as that'll only encourage
> people to stick on the old one, but I wouldn't go as far as -1'ing other
> people's backports if they're still keen, at least for a while!

Agreed.

Onwards!

Cheers,

  Tim


On Wed, Dec 15, 2021 at 10:01 AM Nick Burch  wrote:
>
> On Wed, 15 Dec 2021, Tim Allison wrote:
> > I think we should keep the 1.x branch open for security upgrades for a
> > bit...middle of next year?  I have _not_ been adding new features or
> > even some bug fixes to 1.x, and I encourage people to migrate to 2.x.
>
> We've seen quite a few queries from people struggling to upgrade in the
> last few weeks, so I think it's fair to say we must have a decent number
> of 1.x users still. For an example, Alfresco only upgraded a couple of
> weeks ago, and that's only on their main branch, so it'll be a while until
> it's in their releases.
>
> I'm not keen on adding new features to 1.x, as that'll only encourage
> people to stick on the old one, but I wouldn't go as far as -1'ing other
> people's backports if they're still keen, at least for a while!
>
> I'd be minded to say we probably need to keep on top of security stuff
> until something like September 2022, to give people just over a year to
> upgrade to 2.x. I'm minded to say we allow anyone keen to backport
> bugfixes etc until 3 months before that, but effectively discourage it for
> the last part to help encourage hold-outs to move. I think we should post
> something on the site about the planned EOL timeline, linking once more to
> the wonderful migrating resource on the wiki.
>
> Nick


[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-15 Thread Dan Switzer (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460140#comment-17460140
 ] 

Dan Switzer commented on TIKA-3616:
---

Is Tika being upgraded to Log4j v2.16, since 2.15 still has a potential DoS 
issue?

Is there still a release planned in the next day or so?

> Upgrade log4j2
> --
>
> Key: TIKA-3616
> URL: https://issues.apache.org/jira/browse/TIKA-3616
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>
> RCE...might be difficult to trigger in Tika, but why ask for a PoC...
> This only affects 2.x.  We were still using the old log4j in 1.x





Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Nick Burch

On Wed, 15 Dec 2021, Tim Allison wrote:
> Sounds good, Nick.  Unless there are objections, I'll add an EOL
> September 30, 2022 for the 1.x branch on our github README and maybe our
> site somewhere?

Maybe just mention it in the news section at the end of any 1.x fix releases?

Nick


[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460150#comment-17460150
 ] 

Tim Allison commented on TIKA-3616:
---

2.15's vulnerability seemed to require extra complexity (non-standard 
configuration) and, so far, no fellow devs have asked for a respin.  I'm not 
against it.  The current plan is to have an update in early January with 2.16 
(or later by then?).

If this is a complete non-starter and you need 2.16, please let us know and 
please help us understand how 2.15 would be problematic.



[GitHub] [tika] nddipiazza commented on pull request #465: TIKA-3446 - 1.x port of TIKA-3446 - Support for parsing OneNote when Alternative Encoding Using the File Synchronization via SOAP Over HTTP P

2021-12-15 Thread GitBox


nddipiazza commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-995047236


   OK @tballison got a moment for a quick review on this one? Is this OK to add 
into the upcoming 1.x release? 






[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460152#comment-17460152
 ] 

ASF GitHub Bot commented on TIKA-3446:
--

nddipiazza commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-995047236


   OK @tballison got a moment for a quick review on this one? Is this OK to add 
into the upcoming 1.x release? 






Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Konstantin Gribov
My +1 to EOL on September 30, 2022 with effective backport submission
freeze 3 months before that.

I think it would be better if we mention the EOL timeline at least in 3
places: in each release announcement, in README and on the site (on the
main page or in release news articles). Different downstream users look at
different sources, so more visibility seems to be a good idea to me. I saw
a lot of projects still using log4j 1.2.x in the wild and have a feeling
that it's partially due to lack of visibility about its EOL.

Also we can send a message to announce@a.o (if it's not discouraged by ASF
policies, I don't recall if somebody did something similar before),
user@tika.a.o and dev@tika.a.o 6 and 3 months before EOL date.

-- 
Best regards,
Konstantin Gribov.


On Wed, Dec 15, 2021 at 9:00 PM Nick Burch  wrote:

> On Wed, 15 Dec 2021, Tim Allison wrote:
> > Sounds good, Nick.  Unless there are objections, I'll add an EOL
> > September 30, 2022 for the 1.x branch on our github README and maybe our
> > site somewhere?
>
> Maybe just mention it in the news section at the end any 1.x fix releases?
>
> Nick
>
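
The dates implied by the proposal above (EOL on September 30, 2022, a backport freeze 3 months earlier, and reminders 6 and 3 months out) can be worked out mechanically. The following is only an illustrative sketch: the class and method names (`EolTimeline`, `backportFreeze`, `reminderDates`) are hypothetical and not part of Tika, and it assumes "months before" means calendar months counted back from the EOL date.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical helper, not part of Tika: derives the dates discussed in this thread.
final class EolTimeline {
    // Proposed end-of-life date for the 1.x branch, as stated in the thread.
    static final LocalDate EOL = LocalDate.of(2022, 9, 30);

    // Assumption: the backport freeze falls exactly 3 calendar months before EOL.
    static LocalDate backportFreeze() {
        return EOL.minusMonths(3);
    }

    // Assumption: reminders go out 6 and 3 calendar months before EOL.
    static List<LocalDate> reminderDates() {
        return List.of(EOL.minusMonths(6), EOL.minusMonths(3));
    }

    public static void main(String[] args) {
        System.out.println("EOL:             " + EOL);
        System.out.println("Backport freeze: " + backportFreeze());
        System.out.println("Reminders:       " + reminderDates());
    }
}
```

Under those assumptions, the backport freeze would land on June 30, 2022, and the reminders on March 30 and June 30, 2022.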


[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-15 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460168#comment-17460168
 ] 

Konstantin Gribov commented on TIKA-3616:
-

I looked a bit at how Tika and its upstream dependencies use 
{{MDC}}/{{ThreadContext}}, which are vulnerable in 2.15. Tika and its deps use 
them quite sparsely (as far as IntelliJ IDEA sees usages). 

{{solrj}} puts the Solr client URL into MDC, Zookeeper puts the node id from its 
config file into MDC, and UIMA puts some ids into it which don't seem to be 
user-generated, at least in Tika. 

Also, {{testcontainers}} uses MDC, but only in {{test}} scope.



[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460180#comment-17460180
 ] 

Tim Allison commented on TIKA-3616:
---

Thank you for looking at this carefully, [~grossws]!



[jira] [Commented] (TIKA-491) Add language identification support for Norwegian Bokmål and Norwegian Nynorsk

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460203#comment-17460203
 ] 

Tim Allison commented on TIKA-491:
--

This module in Tika 2.x does distinguish between those two languages: 
https://github.com/apache/tika/tree/main/tika-langdetect/tika-langdetect-opennlp

> Add language identification support for Norwegian Bokmål and Norwegian Nynorsk
> --
>
> Key: TIKA-491
> URL: https://issues.apache.org/jira/browse/TIKA-491
> Project: Tika
>  Issue Type: New Feature
>  Components: languageidentifier
>Affects Versions: 0.7
>Reporter: Jan Høydahl
>Assignee: Kenneth William Krugler
>Priority: Major
>
> Currently there is one Norwegian language profile in Tika - "no". We need to 
> distinguish between the two official Norwegian languages identified by the 
> ISO 639-1 codes "nb" and "nn". It is recommended to use those codes instead 
> of the common "no" tag.
> Proposed solution: remove the current language profile no.ngp and replace 
> it with two new ones for nb and nn.
> We must also add tests for Norwegian.





[jira] [Comment Edited] (TIKA-491) Add language identification support for Norwegian Bokmål and Norwegian Nynorsk

2021-12-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460203#comment-17460203
 ] 

Tim Allison edited comment on TIKA-491 at 12/15/21, 7:31 PM:
-

This module in Tika 2.x does distinguish between those two languages (nno vs 
nob): 
https://github.com/apache/tika/tree/main/tika-langdetect/tika-langdetect-opennlp


was (Author: talli...@mitre.org):
This module in Tika 2.x does distinguish between those two languages: 
https://github.com/apache/tika/tree/main/tika-langdetect/tika-langdetect-opennlp
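
For readers unfamiliar with the codes: the issue text uses the two-letter ISO 639-1 codes ("nb", "nn", "no"), while the comment above mentions the three-letter codes "nno" and "nob". The sketch below uses only the JDK's `java.util.Locale` to show how the two sets of codes relate. It does not call Tika, and the class name `NorwegianCodes` is made up for illustration.

```java
import java.util.Locale;

// Illustrative only: maps two-letter ISO 639-1 language codes to their
// three-letter equivalents using the JDK's built-in tables (no Tika involved).
final class NorwegianCodes {
    static String toThreeLetter(String twoLetterCode) {
        return Locale.forLanguageTag(twoLetterCode).getISO3Language();
    }

    public static void main(String[] args) {
        // Bokmål, Nynorsk, and the generic Norwegian tag.
        for (String code : new String[] {"nb", "nn", "no"}) {
            System.out.println(code + " -> " + toThreeLetter(code));
        }
    }
}
```

With the JDK's tables, "nb" maps to "nob", "nn" to "nno", and the generic "no" to "nor".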



[GitHub] [tika] tballison commented on pull request #465: TIKA-3446 - 1.x port of TIKA-3446 - Support for parsing OneNote when Alternative Encoding Using the File Synchronization via SOAP Over HTTP Pr

2021-12-15 Thread GitBox


tballison commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-995141124


   Looks good to me.  Y, go for it.






[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460209#comment-17460209
 ] 

ASF GitHub Bot commented on TIKA-3446:
--

tballison commented on pull request #465:
URL: https://github.com/apache/tika/pull/465#issuecomment-995141124


   Looks good to me.  Y, go for it.






[GitHub] [tika] tballison merged pull request #464: TIKA-3619 Augment README with build prerequisites

2021-12-15 Thread GitBox


tballison merged pull request #464:
URL: https://github.com/apache/tika/pull/464


   






[jira] [Commented] (TIKA-3619) Augment README with build prerequisites

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460210#comment-17460210
 ] 

ASF GitHub Bot commented on TIKA-3619:
--

tballison merged pull request #464:
URL: https://github.com/apache/tika/pull/464


   




> Augment README with build prerequisites
> ---
>
> Key: TIKA-3619
> URL: https://issues.apache.org/jira/browse/TIKA-3619
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.2.0
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
>
> When [reviewing the 2.2.0 RC 
> |https://lists.apache.org/thread/pfwm8sn7w3lsrsckd8b9v3b32byj4zms] I became 
> aware that although Docker IS required to build tika-pipes modules, there is 
> no guidance to reflect that.
> I think we could clean up the README to reflect the installation prerequisites.





Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Tim Allison
I've merged Lewis's edits to the README and added the EOL.  Let's do
what both Konstantin and Nick recommend: README, notifications to
user/dev lists x months out and include EOL in all release messages?

Please let me know/edit the README if there are other improvements we
should make.

Thank you, all!

Cheers,

 Tim

On Wed, Dec 15, 2021 at 1:20 PM Konstantin Gribov  wrote:
>
> My +1 to EOL on September 30, 2022 with effective backport submission
> freeze 3 months before that.
>
> I think it would be better if we mention the EOL timeline at least in 3
> places: in each release announcement, in README and on the site (on the
> main page or in release news articles). Different downstream users look at
> different sources, so more visibility seems to be a good idea to me. I saw
> a lot of projects still using log4j 1.2.x in the wild and have a feeling
> that it's partially due to lack of visibility about its EOL.
>
> Also we can send a message to announce@a.o (if it's not discouraged by ASF
> policies, I don't recall if somebody did something similar before),
> user@tika.a.o and dev@tika.a.o 6 and 3 months before EOL date.
>
> --
> Best regards,
> Konstantin Gribov.
>
>
> On Wed, Dec 15, 2021 at 9:00 PM Nick Burch  wrote:
>
> > On Wed, 15 Dec 2021, Tim Allison wrote:
> > > Sounds good, Nick.  Unless there are objections, I'll add an EOL
> > > September 30, 2022 for the 1.x branch on our github README and maybe our
> > > site somewhere?
> >
> > Maybe just mention it in the news section at the end any 1.x fix releases?
> >
> > Nick
> >


[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460222#comment-17460222
 ] 

ASF GitHub Bot commented on TIKA-3446:
--

nddipiazza merged pull request #465:
URL: https://github.com/apache/tika/pull/465


   






[GitHub] [tika] nddipiazza merged pull request #465: TIKA-3446 - 1.x port of TIKA-3446 - Support for parsing OneNote when Alternative Encoding Using the File Synchronization via SOAP Over HTTP Protoco

2021-12-15 Thread GitBox


nddipiazza merged pull request #465:
URL: https://github.com/apache/tika/pull/465


   






[jira] [Commented] (TIKA-3619) Augment README with build prerequisites

2021-12-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460229#comment-17460229
 ] 

Hudson commented on TIKA-3619:
--

FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #390 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/390/])
TIKA-3619 Augment README with build prerequisites (#464) (github: 
[https://github.com/apache/tika/commit/b537253fa035f06c4377b2d4286ebf60f0d8206d])
* (edit) README.md


> Augment README with build prerequisites
> ---
>
> Key: TIKA-3619
> URL: https://issues.apache.org/jira/browse/TIKA-3619
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.2.0
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
>
> When [reviewing the 2.2.0 RC 
> |https://lists.apache.org/thread/pfwm8sn7w3lsrsckd8b9v3b32byj4zms] I became 
> aware that although Docker IS required to build tika-pipes modules, there is 
> no guidance to reflect that.
> I think we could cleanup the README to reflect the installation prerequisites.





[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460262#comment-17460262
 ] 

Hudson commented on TIKA-3446:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #153 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/153/])
port the TIKA-3446 work from the 2.x branch. (#465) (github: 
[https://github.com/apache/tika/commit/b2e442ddcb3f9c870bed1daf9827413423eba219])
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/util/UuidUtils.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/property/IProperty.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNoteDocument.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/EncryptionObject.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/unsigned/UShort.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/IndentUtil.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/JCIDObject.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/property/NoData.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/StreamObjectParseErrorException.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/basic/ZipHeader.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/RevisionStoreObjectGroup.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/StorageManifestSchemaGUID.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/RevisionManifestRootDeclare.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/exception/DataElementParseErrorException.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/basic/AlternativePackaging.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/StorageIndexManifestMapping.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/FileNodePtrBackPush.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNotePtr.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/CellManifestCurrentRevision.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/StreamObjectHeaderStart.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/basic/BinaryItem.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/unsigned/ULong.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/ObjectGroupObjectData.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/PropertySet.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/util/BitConverter.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/DataElementHash.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/StreamObject.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/chunking/SimpleChunking.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/util/Bit.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/ObjectGroupObjectDataBLOBReference.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/space/ObjectSpaceObjectStreamOfOSIDs.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/basic/AdapterHelper.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/basic/DataNodeObjectData.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/basic/JCID.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/chunking/ChunkingFactory.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/PropertySet.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/ObjectGroupMetadataDeclarations.java
* (add) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/onenote/fsshttpb/streamobj/DataElementPackage.java
* (edit) 
tika-p

Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-15 Thread Luís Filipe Nassif
Great, thank you, Tim!

On Wed, Dec 15, 2021 at 4:50 PM Tim Allison 
wrote:

> I've merged Lewis's edits to the README and added the EOL.  Let's do
> what both Konstantin and Nick recommend: README, notifications to
> user/dev lists x months out and include EOL in all release messages?
>
> Please let me know/edit the README if there are other improvements we
> should make.
>
> Thank you, all!
>
> Cheers,
>
>  Tim
>
> On Wed, Dec 15, 2021 at 1:20 PM Konstantin Gribov 
> wrote:
> >
> > My +1 to EOL on September 30, 2022 with effective backport submission
> > freeze 3 months before that.
> >
> > I think it would be better if we mention the EOL timeline at least in 3
> > places: in each release announcement, in README and on the site (on the
> > main page or in release news articles). Different downstream users look at
> > different sources, so more visibility seems to be a good idea to me. I saw
> > a lot of projects still using log4j 1.2.x in the wild and have a feeling
> > that it's partially due to lack of visibility about its EOL.
> >
> > Also we can send a message to announce@a.o (if it's not discouraged by ASF
> > policies, I don't recall if somebody did something similar before),
> > user@tika.a.o and dev@tika.a.o 6 and 3 months before EOL date.
> >
> > --
> > Best regards,
> > Konstantin Gribov.
> >
> >
> > On Wed, Dec 15, 2021 at 9:00 PM Nick Burch  wrote:
> >
> > > On Wed, 15 Dec 2021, Tim Allison wrote:
> > > > Sounds good, Nick.  Unless there are objections, I'll add an EOL
> > > > September 30, 2022 for the 1.x branch on our github README and maybe our
> > > > site somewhere?
> > >
> > > Maybe just mention it in the news section at the end of any 1.x fix releases?
> > >
> > > Nick
> > >
>