[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479715#comment-17479715
 ] 

Hudson commented on TIKA-3164:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #430 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/430/])
TIKA-3164 -- avoid deprecated SAXHelper (tallison: 
[https://github.com/apache/tika/commit/7ed25f2e61994c51e2ba38e11bdd1ec15ed1f625])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
TIKA-3164 -- avoid deprecated SAXHelper -- fix checkstyle (tallison: 
[https://github.com/apache/tika/commit/c8804ad5d0a5c48a7947018b7d319c00980bbbcb])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
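
The commit messages above say only that the deprecated SAXHelper is avoided; they do not say what replaced it. As a rough illustration, here is a minimal sketch that builds a namespace-aware SAX reader with secure processing enabled, using JDK APIs only. The class name SecureSaxSketch and its methods are hypothetical and are not taken from the Tika or POI code.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not the code from the commits referenced above.
final class SecureSaxSketch {

    /** Builds a namespace-aware XMLReader with the JAXP secure-processing feature on. */
    static XMLReader newSecureReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a SAX reader", e);
        }
    }

    /** Counts start-element events in the given XML string. */
    static int countElements(String xml) {
        final int[] count = {0};
        XMLReader reader = newSecureReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w='urn:example'><w:p/><w:p/></w:document>";
        System.out.println("elements: " + countElements(xml));
    }
}
```

An event-based extractor would plug its own ContentHandler into a reader obtained this way instead of the counting handler used here.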


> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 2.2.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479623#comment-17479623
 ] 

Hudson commented on TIKA-3164:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #429 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/429/])
TIKA-3164, take two -- still broken in the bundle (tallison: 
[https://github.com/apache/tika/commit/4933fc4750ef200b19a2839fc1249ecdd23ec67f])
* (edit) tika-bundles/tika-bundle-standard/pom.xml
* (edit) 
tika-bundles/tika-bundle-standard/src/test/java/org/apache/tika/bundle/BundleIT.java
TIKA-3164: better but still broken (tallison: 
[https://github.com/apache/tika/commit/e3ff863058d3bb8036fb618c314966824e295c99])
* (edit) tika-bundles/tika-bundle-standard/pom.xml
TIKA-3164 -- Upgrade to Apache POI 5.2.0.  Many thanks to PJ Fanning for fixing 
the osgi integration. (tallison: 
[https://github.com/apache/tika/commit/b30ef77763d54a742143fa6af139d532b99297d6])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
* (edit) CHANGES.txt
* (edit) tika-bundles/tika-bundle-standard/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSTextExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
* (edit) tika-parent/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xslf/XSLFEventBasedPowerPointExtractor.java
* (edit) tika-eval/tika-eval-app/src/main/resources/log4j2.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java




[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-18 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478095#comment-17478095
 ] 

Tim Allison commented on TIKA-3164:
---

Will give these a try.  Thank you!

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-18 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478079#comment-17478079
 ] 

PJ Fanning commented on TIKA-3164:
--

Again, I know nothing about OSGi, but com.sun.org.apache.xpath.internal.jaxp is 
the package that contains the default XPathFactory, so you probably need to add 
it to 
[https://github.com/apache/tika/blob/TIKA-3164-v2/tika-bundles/tika-bundle-standard/pom.xml|https://github.com/apache/tika/blob/TIKA-3164-v2/tika-bundles/tika-bundle-standard/pom.xml#L209]
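
To see which package a given runtime actually resolves the default XPathFactory from, a small check like the following may help. It is a minimal sketch using only JDK APIs; the class name XPathFactoryInfo is hypothetical, and the package it reports depends on the JDK and on what is visible to the calling class loader.

```java
import javax.xml.xpath.XPathFactory;

// Hypothetical helper for inspecting the XPathFactory the runtime resolves.
final class XPathFactoryInfo {

    /** Fully qualified class name of the default XPathFactory implementation. */
    static String implementationClass() {
        return XPathFactory.newInstance().getClass().getName();
    }

    /** Package of that implementation, i.e. the package a bundle would have to see. */
    static String implementationPackage() {
        String name = implementationClass();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println("XPathFactory implementation: " + implementationClass());
        System.out.println("Package to make visible:     " + implementationPackage());
    }
}
```

Run outside the OSGi container, this reports the implementation that the bundle would need to be able to see from inside it.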
 



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-18 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478068#comment-17478068
 ] 

PJ Fanning commented on TIKA-3164:
--

[~tallison] I don't know much about the Tika build. I notice, though, that 
[https://github.com/apache/tika/blob/main/tika-bundles/tika-bundle-standard/pom.xml#L146]
makes no reference to log4j-api, which is an important dependency for POI.
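
One way to confirm whether the log4j-api classes are reachable from a particular class loader is to probe for them by name. The sketch below assumes nothing beyond the JDK. The class ClassVisibilityProbe is hypothetical, and the two log4j class names are the ones you would expect POI 5.x to need (the second appears in the stack traces in this thread).

```java
// Hypothetical helper, not part of Tika or POI.
final class ClassVisibilityProbe {

    /** Returns true if the loader can load the named class (without initializing it). */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassVisibilityProbe.class.getClassLoader();
        String[] names = {
            "org.apache.logging.log4j.LogManager",
            "org.apache.logging.log4j.spi.LoggerContextFactory"
        };
        for (String name : names) {
            System.out.println(name + " visible: " + isVisible(name, loader));
        }
    }
}
```

Inside an OSGi bundle the interesting loader is the bundle's own, so the same probe would be called with a class loaded from that bundle rather than from the test harness.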



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-18 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477990#comment-17477990
 ] 

Tim Allison commented on TIKA-3164:
---

Thank you [~pj.fanning] !  I'm still not able to figure out how to configure 
our tika-bundle-standard (branch TIKA-3164-v2) to work with POI 5.x.  If you or 
anyone else can help with this, I'd appreciate it.  I'd really like to move to 
POI 5.x.

 
{noformat}

org.ops4j.pax.logging.pax-logging-api[org.ops4j.pax.logging.internal.Activator] 
: Disabling JULI Logger API support.
[ERROR] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[ERROR] SLF4J: Defaulting to no-operation (NOP) logger implementation
[ERROR] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
[ERROR] java.lang.NoClassDefFoundError: 
org/apache/logging/log4j/spi/LoggerContextFactory
[ERROR]     at 
org.apache.poi.openxml4j.util.ZipSecureFile.<clinit>(ZipSecureFile.java:37)
[ERROR]     at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.<clinit>(OOXMLParser.java:103)
[ERROR]     at sun.misc.Unsafe.ensureClassInitialized(Native Method)
[ERROR]     at 
sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
[ERROR]     at 
sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:156)
[ERROR]     at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1088)
[ERROR]     at java.lang.reflect.Field.getFieldAccessor(Field.java:1069)
[ERROR]     at java.lang.reflect.Field.getLong(Field.java:611)
[ERROR]     at 
java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1875)
[ERROR]     at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:79)
[ERROR]     at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:506)
[ERROR]     at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
[ERROR]     at java.security.AccessController.doPrivileged(Native Method)
[ERROR]     at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
[ERROR]     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
[ERROR]     at 
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
[ERROR]     at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
[ERROR]     at 
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
[ERROR]     at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2160)
[ERROR]     at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
[ERROR]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
[ERROR]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
[ERROR]     at java.util.ArrayList.readObject(ArrayList.java:799)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR]     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR]     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR]     at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR]     at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
[ERROR]     at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2296)
[ERROR]     at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
[ERROR]     at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
[ERROR]     at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
[ERROR]     at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
[ERROR]     at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
[ERROR]     at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
[ERROR]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
[ERROR]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
[ERROR]     at 
org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:97)
[ERROR]     at org.apache.tika.fork.ForkServer.readObject(ForkServer.java:293)
[ERROR]     at 
org.apache.tika.fork.ForkServer.initializeParserAndLoader(ForkServer.java:209)
[ERROR]     at 
org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:147)
[ERROR]     at org.apache.tika.fork.ForkServer.main(ForkServer.java:121)
[ERROR] Caused by: java.lang.ClassNotFoundException: Unable to find class 
org.apache.logging.log4j.spi.LoggerContextFactory
[ERROR]     at 
org.apache.tika.fork.ClassLoaderProxy.findClass(ClassLoaderProxy.java:119)
[ERROR]     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[ERROR]     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
[ERROR]     ... 42 more
 {noformat}
and
{noformat}

[ERROR] org.apache.tika.bundle.BundleIT.testPoiTikaBundle  Time elapsed: 2.648 
s  <<< ERROR!
java.lang.RuntimeException: XPathFactory#newInstance() failed to create an 

[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2022-01-14 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476495#comment-17476495
 ] 

PJ Fanning commented on TIKA-3164:
--

POI 5.2.0 is out - it has a fix for 
[https://bz.apache.org/bugzilla/show_bug.cgi?id=65676] - so Tika won't need its 
own forked version of XSSFReader



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458686#comment-17458686
 ] 

Tim Allison commented on TIKA-3164:
---

Oh my goodness, thank you [~bob]!  There's no rush. POI 5.x will go out with an 
upgraded PDFBox early in the new year.  Thank you!

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 2.1.1
>
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-13 Thread Bob Paulin (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458683#comment-17458683
 ] 

Bob Paulin commented on TIKA-3164:
--

Hey [~tallison]. Saw the mention, but I will likely not get to this for a few 
days. I ran a few tests yesterday and was able to reproduce your results on my 
machine, but I don't have any specific recommendations yet.



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458428#comment-17458428
 ] 

Tim Allison commented on TIKA-3164:
---

I'm still not able to get the bundle to work; see the TIKA-3164-v2 branch.

I'm now getting two exceptions, one in the ForkParser test, one in the poi 
bundle test.

ForkParser test

{noformat}
java.lang.NoClassDefFoundError: 
org/apache/logging/log4j/spi/LoggerContextFactory
[ERROR] at 
org.apache.poi.openxml4j.util.ZipSecureFile.<clinit>(ZipSecureFile.java:37)
[ERROR] at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.<clinit>(OOXMLParser.java:103)
[ERROR] at sun.misc.Unsafe.ensureClassInitialized(Native Method)
[ERROR] at 
sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
[ERROR] at 
sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:156)
[ERROR] at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1088)
[ERROR] at java.lang.reflect.Field.getFieldAccessor(Field.java:1069)
[ERROR] at java.lang.reflect.Field.getLong(Field.java:611)
{noformat}

and testPoiTikaBundle
{noformat}
java.lang.RuntimeException: XPathFactory#newInstance() failed to create an 
XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom 
with the XPathFactoryConfigurationException: 
javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory 
implementation found for the object model: http://java.sun.com/jaxp/xpath/dom
at org.apache.tika.bundle.BundleIT.testPoiTikaBundle(BundleIT.java:313)
{noformat}
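
The RuntimeException above comes from the no-argument XPathFactory#newInstance(), which wraps the underlying XPathFactoryConfigurationException. For diagnostics it can be easier to call the overload that takes the object model URI and throws the checked exception directly. This is a minimal JDK-only sketch, and the class name XPathLookupCheck is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper; not taken from BundleIT.
final class XPathLookupCheck {

    /** Looks up a factory for the default (DOM) object model and evaluates an expression. */
    static String evaluate(String expression, String xml) {
        try {
            XPathFactory factory =
                    XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
            XPath xpath = factory.newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathFactoryConfigurationException e) {
            // Same root cause as in the trace above: no implementation is visible.
            throw new IllegalStateException(
                    "No XPathFactory visible for " + XPathFactory.DEFAULT_OBJECT_MODEL_URI, e);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("count(/a/b)", "<a><b/><b/></a>"));
    }
}
```

If the lookup succeeds on a plain JVM but fails inside the container, that points at class visibility in the bundle rather than at the calling code.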




[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457841#comment-17457841
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison opened a new pull request #462:
URL: https://github.com/apache/tika/pull/462


   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457821#comment-17457821
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison merged pull request #462:
URL: https://github.com/apache/tika/pull/462


   






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457688#comment-17457688
 ] 

Hudson commented on TIKA-3164:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #381 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/381/])
TIKA-3164, revert back to poi 4.x in main (tallison: 
[https://github.com/apache/tika/commit/10d925439cd862f74679ec5fa9a9b5863f50ce2c])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xslf/XSLFEventBasedPowerPointExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* (delete) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OPCPackageWrapper.java
* (edit) tika-bundles/tika-bundle-standard/pom.xml
* (edit) CHANGES.txt
* (edit) tika-parent/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
* (delete) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/TikaXSSFSheetXMLHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSTextExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXSLFPowerPointExtractorDecorator.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java




[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-11 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457626#comment-17457626
 ] 

Tim Allison commented on TIKA-3164:
---

I've created a TIKA-3164-v2 branch to use until we can get POI 5.1.0 working in 
the bundle.  I'll revert main back to POI 4.1.2 for now.



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-11 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457604#comment-17457604
 ] 

Tim Allison commented on TIKA-3164:
---

I'm really hoping we don't have to do this: 
https://craftsmen.nl/getting-log4j2-to-work-in-an-osgi-context/



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457495#comment-17457495
 ] 

Hudson commented on TIKA-3164:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #380 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/380/])
TIKA-3164 update POI to 5.1.0 -- try to fix bundle (tallison: 
[https://github.com/apache/tika/commit/c2ee0234700519e95aefe4199d91a4d6b56b5ec6])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/pom.xml
* (edit) tika-parsers/tika-parsers-ml/tika-parser-nlp-module/pom.xml
* (edit) tika-pipes/tika-fetchers/tika-fetcher-http/pom.xml
* (edit) tika-langdetect/tika-langdetect-tika/pom.xml
* (edit) tika-pipes/tika-emitters/tika-emitter-fs/pom.xml
* (edit) tika-pipes/pom.xml
* (edit) tika-pipes/tika-fetchers/tika-fetcher-s3/pom.xml
* (edit) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-solr/pom.xml
* (edit) tika-translate/pom.xml
* (edit) tika-integration-tests/tika-pipes-s3-integration-tests/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-ocr-module/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-audiovideo-module/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-jdbc-commons/pom.xml
* (edit) tika-parsers/tika-parsers-extended/pom.xml
* (edit) tika-parsers/tika-parsers-ml/tika-parser-advancedmedia-module/pom.xml
* (edit) tika-pipes/tika-httpclient-commons/pom.xml
* (edit) tika-server/pom.xml
* (edit) tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/pom.xml
* (edit) pom.xml
* (edit) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-gcs/pom.xml
* (edit) tika-parsers/tika-parsers-ml/tika-transcribe-aws/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/pom.xml
* (edit) tika-integration-tests/tika-pipes-solr-integration-tests/pom.xml
* (edit) tika-parsers/pom.xml
* (edit) tika-serialization/pom.xml
* (edit) tika-fuzzing/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-crypto-module/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/pom.xml
* (edit) tika-server/tika-server-core/pom.xml
* (edit) tika-app/pom.xml
* (edit) tika-langdetect/tika-langdetect-lingo24/pom.xml
* (edit) tika-parsers/tika-parsers-ml/pom.xml
* (edit) tika-eval/tika-eval-core/pom.xml
* (edit) tika-java7/pom.xml
* (edit) tika-example/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-commons/pom.xml
* (edit) tika-integration-tests/pom.xml
* (edit) tika-integration-tests/tika-pipes-opensearch-integration-tests/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-digest-commons/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-module/pom.xml
* (edit) tika-bundles/pom.xml
* (edit) tika-langdetect/tika-langdetect-opennlp/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/pom.xml
* (edit) tika-langdetect/tika-langdetect-optimaize/pom.xml
* (edit) tika-parsers/tika-parsers-extended/tika-parser-sqlite3-module/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/pom.xml
* (edit) tika-parsers/tika-parsers-ml/tika-age-recogniser/pom.xml
* (edit) tika-pipes/tika-emitters/tika-emitter-gcs/pom.xml
* (edit) tika-server/tika-server-standard/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-font-module/pom.xml
* (edit) tika-pipes/tika-emitters/pom.xml
* (edit) tika-parent/pom.xml
* (edit) tika-pipes/tika-emitters/tika-emitter-s3/pom.xml
* (edit) tika-eval/pom.xml
* (edit) CHANGES.txt
* (edit) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-s3/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/pom.xml
* (edit) tika-langdetect/tika-langdetect-test-commons/pom.xml
* (edit) tika-pipes/tika-fetchers/tika-fetcher-gcs/pom.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-news-module/pom.xml
* (edit) tika-parsers/tika-parsers-ml/tika-dl/pom.xml
* (edit) tika-bundles/tika-bundle-standard/pom.xml
* (edit) tika-pipes/tika-emitters/tika-emitter-opensearch/pom.xml
* (edit) 
tika-parsers/tika-parsers-extended/tika-parser-scientific-package/pom.xml
* (edit) tika-langdetect/tika-langdetect-mitll-text/pom.xml
* (edit) tika-pipes/tika-emitters/tika-emitter-solr/pom.xml
* (

[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457482#comment-17457482
 ] 

Tim Allison commented on TIKA-3164:
---

[~bobpaulin], I broke the bundle.  Please help if you can.  POI now requires 
log4j2.  How do we handle that in the bundle? 



[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457427#comment-17457427
 ] 

Hudson commented on TIKA-3164:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #379 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/379/])
TIKA-3164 -- upgrade to POI 5.1.0 (#462) (github: 
[https://github.com/apache/tika/commit/22261ab09b2809847da87f24252dad2dfde81978])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/MetadataExtractor.java
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/TikaXSSFSheetXMLHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xslf/XSLFEventBasedPowerPointExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OPCPackageWrapper.java
* (edit) CHANGES.txt
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/log4j2.xml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSTextExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXSLFPowerPointExtractorDecorator.java
* (edit) tika-parent/pom.xml
TIKA-3164 update POI to 5.1.0 -- fix convergence checks (tallison: 
[https://github.com/apache/tika/commit/dbc680f500b83621b06deb7bb7aa23f9bda39efa])
* (edit) tika-parent/pom.xml


> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457365#comment-17457365
 ] 

Tim Allison commented on TIKA-3164:
---

Many thanks to [~kiwiwings],  [~pj.fanning] and the POI team for their help in 
this upgrade!

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457364#comment-17457364
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison merged pull request #462:
URL: https://github.com/apache/tika/pull/462


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457362#comment-17457362
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison opened a new pull request #462:
URL: https://github.com/apache/tika/pull/462


   
   
   Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! 
Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Tika issue 
tracker](https://issues.apache.org/jira/projects/TIKA) which describes the 
problem or the improvement. We cannot accept pull requests without an issue 
because the change wouldn't be listed in the release notes.
   * the issue ID (`TIKA-`)
 - is referenced in the title of the pull request
 - and placed in front of your commit messages surrounded by square 
brackets (`[TIKA-] Issue or pull request title`)
   * commits are squashed into a single one (or few commits for larger changes)
   * Tika is successfully built and unit tests pass by running `mvn clean test`
   * there should be no conflicts when merging the pull request branch into the 
*recent* `main` branch. If there are conflicts, please try to rebase the pull 
request branch on top of a freshly pulled `main` branch.
   
   We will be able to integrate your pull request faster if these conditions 
are met. If you have any questions about how to fix your problem or about using 
Tika in general, please sign up for the [Tika mailing 
list](http://tika.apache.org/mail-lists.html). Thanks!
   




> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457357#comment-17457357
 ] 

Tim Allison commented on TIKA-3164:
---

https://bz.apache.org/bugzilla/show_bug.cgi?id=65326 ?

Y, I agree that logging for security vulnerabilities is important, which is why, 
in the proposal above, I carved out info-level reporting for XMLHelper.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457355#comment-17457355
 ] 

PJ Fanning commented on TIKA-3164:
--

[~tallison] I added a comment on 
https://bz.apache.org/bugzilla/show_bug.cgi?id=65683

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457352#comment-17457352
 ] 

Tim Allison commented on TIKA-3164:
---

I reran the large scale regression tests after making a local patch for 65676, 
and everything looks to be in good shape.  I also ran a hefty set of 
multithreading tests and found no problems.  I'll merge this into main shortly.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457332#comment-17457332
 ] 

Tim Allison commented on TIKA-3164:
---

I'm really grateful that POI has moved to log4j2 (today's news 
notwithstanding)... The amount of new, effective logging is roughly two orders 
of magnitude larger than with 4.x.  I had 36MB of logs with 4.x on ~400k MSOffice 
files, and my log for 5.x will probably be around 5GB once the run is complete.

I'm wondering if we should configure default logging in tika-app and 
tika-server to turn off POI's logging or if we should add massive warnings in 
the release notes?

Something like this that would allow XMLHelper's warning?  [~pj.fanning] and 
fellow Tika devs, what do you think?
{noformat}


  


  

{noformat}

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456772#comment-17456772
 ] 

PJ Fanning commented on TIKA-3164:
--

[~tallison] we try to set as many settings as possible to prevent the XML 
parser or transformer from being susceptible to XXE issues. If a user's JAXP 
setup loads implementations that are less safe, then they could still be 
susceptible to XXE. From what I have seen, it would be unpopular for us to force 
uptake of particular parser and transformer implementations. These days, Xerces 
is not regularly released, and the forks of Xerces that are built into the Java 
runtime are probably safer. You could say the same for Xalan. On the transformer 
side, you have Saxon as an alternative.
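
For readers following along, the "set as many settings as possible" approach can be sketched roughly as below. This is a minimal illustration using only the JDK's JAXP API, not POI's actual XMLHelper; the class and method names are hypothetical, and the {{disallow-doctype-decl}} feature URI is Xerces-specific, so other implementations may not recognize it. Each setting is applied independently so that an unsupported one is reported rather than aborting the setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; this is not POI's XMLHelper.
class HardenedXmlSketch {

    /** Builds a factory with every XXE-related protection the loaded implementation accepts. */
    static DocumentBuilderFactory newHardenedFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setExpandEntityReferences(false);
        trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-specific feature URI; implementations that do not know it are reported below.
        trySetFeature(factory, "http://apache.org/xml/features/disallow-doctype-decl", true);
        trySetAttribute(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        trySetAttribute(factory, XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    /** Returns false (and reports) instead of failing when a feature is not supported. */
    static boolean trySetFeature(DocumentBuilderFactory factory, String name, boolean value) {
        try {
            factory.setFeature(name, value);
            return true;
        } catch (ParserConfigurationException e) {
            System.err.println("Feature unsupported: " + name);
            return false;
        }
    }

    /** Returns false (and reports) instead of failing when an attribute is not recognized. */
    static boolean trySetAttribute(DocumentBuilderFactory factory, String name, Object value) {
        try {
            factory.setAttribute(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            System.err.println("Attribute unsupported: " + name);
            return false;
        }
    }

    /** Returns true if the hardened parser refuses the document with a SAX error. */
    static boolean isRejected(String xml) {
        try {
            DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain document rejected: " + isRejected("<r>ok</r>"));
        System.out.println("DOCTYPE document rejected: "
                + isRejected("<!DOCTYPE r [<!ENTITY x \"y\">]><r>&x;</r>"));
    }
}
```

Whether a DOCTYPE is actually refused depends on which JAXP implementation is loaded, which is the consistency concern raised in this thread.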

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456760#comment-17456760
 ] 

Tim Allison commented on TIKA-3164:
---

Y. 5.1.0

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456755#comment-17456755
 ] 

Tim Allison commented on TIKA-3164:
---

All sounds good. Thank you, [~pj.fanning]. I’ll open issues and share files 
when I’m back to a keyboard.

On the external schema issue, two questions:
1) Can we force Xerces or, frankly, any specific implementation? We had issues 
before where users had different default XML parsing than I did and debugging 
was a pain, and we couldn’t guarantee consistency across platforms.
2) I see in the comments in the issue that the logging is benign. I wanted to 
confirm that we are not vulnerable to XXE. 

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456741#comment-17456741
 ] 

PJ Fanning commented on TIKA-3164:
--

POI 4 and below have custom logging which took effort to enable, so most users 
would never see POI logging. With POI 5, logging events that no one ever saw are 
now being seen.

I guess we need to start looking at the more annoying log messages and 
downgrading them to info or even debug. For 1 and 2, could you create POI 
issues and attach files that cause the logging?

For accessExternalSchema issue, we have 
[https://bz.apache.org/bugzilla/show_bug.cgi?id=65326]

I've been reluctant to entirely remove this logging, but if you could add the 
full stacktrace there, we can look into it. You are using POI 5.1.0?

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456737#comment-17456737
 ] 

Tim Allison commented on TIKA-3164:
---

Y. Thank you [~pj.fanning]!  That's exactly it.  I can fix it on the Tika side 
for now by copy/pasting XSSFSheetXMLHandler.

Three other points of interest:

1)  I'm getting this on quite a few files in our regression set.  Warnings are 
great, but is something else going on? org.apache.poi.hpsf.CodePageString 
String terminator (\0) for CodePageString property value occurred before the 
end of string. Trimming and hope for the best. 

2) I'm getting a lot of these warnings.  Should we be checking if an entry is a 
directory before adding it to the parts list: 
org.apache.poi.openxml4j.exceptions.InvalidFormatException: A part name shall 
not have a forward slash as the last character [M1.5]: /word/_rels/

3) How can I avoid this and make sure that we are not vulnerable to xxe? 
org.apache.poi.util.XMLHelper SAX Feature unsupported [log suppressed for 5 
minutes]http://javax.xml.XMLConstants/property/accessExternalSchema
java.lang.IllegalArgumentException: Property 
'http://javax.xml.XMLConstants/property/accessExternalSchema' is not recognized.
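
On point 2 above, the directory check could look something like the following. This is a sketch using only {{java.util.zip}} from the JDK, not POI's OPC package code; the class name, method names, and sample entry names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration; not part of POI or Tika.
class PartNameFilterSketch {

    /** Lists candidate part names, skipping directory entries such as "word/_rels/". */
    static List<String> listPartNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // a name ending in '/' can never be a valid part name
                }
                names.add("/" + entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny in-memory zip containing one directory entry and one file entry. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos)) {
            zout.putNextEntry(new ZipEntry("word/_rels/"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("word/_rels/document.xml.rels"));
            zout.write("<Relationships/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(listPartNames(sampleZip()));
    }
}
```

Filtering at this stage would avoid constructing a part name for the directory entry at all, so the [M1.5] rule would never be triggered for it.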

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456722#comment-17456722
 ] 

PJ Fanning commented on TIKA-3164:
--

[~tallison] could [https://bz.apache.org/bugzilla/show_bug.cgi?id=65676] be the 
same issue that you are running into with numbers in last column being merged 
with 1st column on next row?

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456718#comment-17456718
 ] 

PJ Fanning commented on TIKA-3164:
--

[~tallison] I committed this to POI svn just now - 
[https://github.com/apache/poi/commit/077815b37b9d325e2cf576c64ec5dd6a6f77fff4]

poi-ooxml-lite is only a subset of poi-ooxml-full, and most of the stuff that 
is added is based on which classes and xsbs are loaded while we run our tests, 
but we have hacks to add some missing xsbs and classes.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-09 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456688#comment-17456688
 ] 

Tim Allison commented on TIKA-3164:
---

I finally had time to run the regression tests against ~400k files.  The 
reports are here: https://corpora.tika.apache.org/base/share/reports-poi-5.x.tgz

There are ~20 fixed exceptions.

Two files have this new exception:
{noformat}
Could not locate compiled schema resource 
org/apache/poi/schemas/ooxml/system/ooxml/ctcustomxmlblockd3c1type.xsb
{noformat}

There's a very small regression in that in a handful of xlsx files, if there's 
a number in the last column of a row, it is not cleared before the content in 
the first cell of the next row.  So we get:
{noformat}
...1.5
1.5kultur...
{noformat}
instead of:
{noformat}
...1.5
kultur...
{noformat}

I'll open an issue with POI and see if I can patch this at the Tika level for 
now.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-11-08 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440733#comment-17440733
 ] 

Tim Allison commented on TIKA-3164:
---

Thank you [~pj.fanning]!  I've started a new TIKA-3164 branch based on {{main}} 
to give this a try.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-11-03 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438135#comment-17438135
 ] 

PJ Fanning commented on TIKA-3164:
--

POI 5.1.0 is released - maybe Tika will be able to use this version without 
running into the problems with POI 5.0.0.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-06-29 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371382#comment-17371382
 ] 

Tim Allison commented on TIKA-3164:
---

I hand-checked some of the content diffs in {{spreadsheetml}} with a recent 
build of POI.  Tika is now extracting the previously missed content when 
single-threaded.  My guess is that the fix above to 
{{setAllThreadsPreferEventExtractors}} actually fixed the issue, but I'll rerun 
in batch mode after we cut the 1.27 rc1.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-05-07 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340873#comment-17340873
 ] 

Tim Allison commented on TIKA-3164:
---

NPE in wmf: https://bz.apache.org/bugzilla/show_bug.cgi?id=65293 

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-05-07 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340862#comment-17340862
 ] 

Tim Allison commented on TIKA-3164:
---

There was a multithreading, erm, feature in the Tika code that led to all the 
missing attachments... we have to call 
{{POIXMLExtractorFactory.setAllThreadsPreferEventExtractors(true);}} not 
{{POIXMLExtractorFactory.setThreadPrefersEventExtractors(true);}}

Will rerun shortly.
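
To illustrate why the per-thread setter is not enough in a multithreaded run, here is a small sketch. It assumes the per-thread preference behaves like a {{ThreadLocal}} and the all-threads preference like a shared override; the class below is a hypothetical stand-in that only borrows the method names from this comment and is not POI's implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for illustration; POI's internals are assumed, not copied.
class ExtractorPreferenceSketch {

    // Visible only to the thread that set it.
    private static final ThreadLocal<Boolean> THREAD_PREFERS_EVENT =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Visible to every thread; null means "no global override".
    private static volatile Boolean allThreadsPreferEvent = null;

    static void setThreadPrefersEventExtractors(boolean preferred) {
        THREAD_PREFERS_EVENT.set(preferred);
    }

    static void setAllThreadsPreferEventExtractors(Boolean preferred) {
        allThreadsPreferEvent = preferred;
    }

    /** The global override wins; otherwise the calling thread's own setting is used. */
    static boolean prefersEventExtractors() {
        Boolean global = allThreadsPreferEvent;
        return global != null ? global : THREAD_PREFERS_EVENT.get();
    }

    /** Reads the preference from a fresh worker thread, as a parser thread pool would. */
    static boolean seenFromWorkerThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> task = ExtractorPreferenceSketch::prefersEventExtractors;
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        setThreadPrefersEventExtractors(true);
        System.out.println("worker sees per-thread setting: " + seenFromWorkerThread());
        setAllThreadsPreferEventExtractors(true);
        System.out.println("worker sees all-threads setting: " + seenFromWorkerThread());
    }
}
```

Under these assumptions, a setting made on the configuring thread never reaches the worker threads that do the parsing, which would be consistent with the missing attachments seen in the batch run.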

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-05-06 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340249#comment-17340249
 ] 

Tim Allison commented on TIKA-3164:
---

Reports are here: 
https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz

These compare the latest 4.x vs. 5.0.1-snapshot.  There's a new NPE in WMF 
parsing, and it looks like we're missing a bunch of attachments.

I also need to look into why there's less content coming out of 
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ... this 
could be a Tika item, not POI...

Parse times seem to be slower for ooxml than in 4.x, but that could be an 
artifact of the mood of the vm at the time of running...

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-05-05 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339749#comment-17339749
 ] 

Tim Allison commented on TIKA-3164:
---

[~kiwiwings], I got the build to work with the latest.  I'm sorry for my delay. 
 I'm running the regression tests against MSOffice files now... Thank you!

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-04-19 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325401#comment-17325401
 ] 

Tim Allison commented on TIKA-3164:
---

Thank you [~kiwiwings] !

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-04-19 Thread Andreas Beeker (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325386#comment-17325386
 ] 

Andreas Beeker commented on TIKA-3164:
--

Added more .xsbs and classes - the old POI integration test code only processed 
every 2nd hierarchy ... m(

[http://svn.apache.org/viewvc?view=revision&revision=1888985]

[~tallison] Please give it a try when the code has been tested by Jenkins and 
the tests are green

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-24 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307829#comment-17307829
 ] 

Tim Allison commented on TIKA-3164:
---

Two files that cause this complaint: testPDFEmbeddingAndEmbedded.docx and 
test_recursive_embedded.docx.  Both here: 
https://github.com/apache/tika/tree/branch_1x/tika-parsers/src/test/resources/test-documents

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-24 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307826#comment-17307826
 ] 

Tim Allison commented on TIKA-3164:
---

Progress!  New missing xsb:
{noformat}
 XML-BEANS compiled schema: Could not locate compiled schema resource 
org/apache/poi/schemas/ooxml/system/ooxml/stoletype716btype.xsb 
(org.apache.poi.schemas.ooxml.system.ooxml.stoletype716btype)
{noformat}

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Andreas Beeker (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307472#comment-17307472
 ] 

Andreas Beeker commented on TIKA-3164:
--

I'm now recursing through .xlsx and .docx in our integration tests [1].

Please regenerate the lite jar via "ant clean test test-integration 
test-ooxml-lite" in POI and try again in TIKA.

[1] http://svn.apache.org/viewvc?view=revision&revision=1887978

 

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307139#comment-17307139
 ] 

Tim Allison commented on TIKA-3164:
---

Yep, that's exactly what's going on.  I found that if I uncomment {{

}} in the build, the necessary files are included.

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307135#comment-17307135
 ] 

PJ Fanning commented on TIKA-3164:
--

[~tallison] I don't know for definite but we have 2 jars with the XMLBeans 
generated classes (for ooxml schemas) - they were renamed in POI 5 - 
poi-ooxml-lite and poi-ooxml-full - I suspect that some stuff that you might 
need is missing and that it might be worth checking poi-ooxml-full 

If we know what's missing in poi-ooxml-lite, we can see about fixing POI build 
to include the missing bits

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307099#comment-17307099
 ] 

Tim Allison commented on TIKA-3164:
---

[~fanningpj], many thanks for your help on this.  I'm now getting a clean build 
on 5.0.1-SNAPSHOT.

With the Tika integration, though, I'm still getting the following exception on 
several unit tests.

When I look inside the {{ooxml-lite}} jar for both 5.0.0 and 5.0.1-SNAPSHOT 
(even after I add Tika's {{EmbeddedDocument.docx}}), I see 
{{org/apache/poi/schemas/ooxml/system/oleobjelement.xsb}} but not 
{{/oleobjectelement.xsb}}.

Any idea how to fix this?

{noformat}
Caused by: org.apache.xmlbeans.SchemaTypeLoaderException: XML-BEANS compiled 
schema: Could not locate compiled schema resource 
org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb 
(org.apache.poi.schemas.ooxml.system.ooxml.oleobjectelement) - code 0
at 
org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.<init>(SchemaTypeSystemImpl.java:1315)
at 
org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.resolveHandle(SchemaTypeSystemImpl.java:3138)
at 
org.apache.xmlbeans.SchemaComponent$Ref.getComponent(SchemaComponent.java:113)
at 
org.apache.xmlbeans.SchemaGlobalElement$Ref.get(SchemaGlobalElement.java:76)
at 
org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.findElement(SchemaTypeLoaderBase.java:103)
at 
org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:988)
at 
org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:913)
at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1597)
at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2571)
at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2565)
at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:819)
at 
org.apache.xmlbeans.impl.store.Cursor.syncWrapHelper(Cursor.java:2522)
at org.apache.xmlbeans.impl.store.Cursor.syncWrap(Cursor.java:2453)
at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2080)
at 
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractParagraph(XWPFWordExtractorDecorator.java:236)
at 
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractIBodyText(XWPFWordExtractorDecorator.java:161)
at 
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:124)
at 
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:136)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:214)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)

{noformat}
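
As a quick way to confirm whether a given compiled schema resource is visible at runtime, independent of which jar supplies it, a classpath lookup along these lines may help. It is a minimal sketch using only the JDK; the class name is hypothetical, and the resource path is the one named in the exception above.

```java
// Hypothetical diagnostic helper; not part of POI, XMLBeans, or Tika.
class SchemaResourceCheckSketch {

    /** Returns true if the resource can be found by the current class loader chain. */
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        String xsb = "org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb";
        System.out.println(xsb + " present: " + isOnClasspath(xsb));
    }
}
```

Running this with the lite jar and then the full jar on the classpath would show which of the two actually ships the resource.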

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289198#comment-17289198
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784360595


   Please don't waste time on this...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289196#comment-17289196
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784360102


   Y, see my branch.  We have to do a coupla handfuls of stuff on the tika side.





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289086#comment-17289086
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784212148


   I made a mistake while building this. Looks like there are a few issues with 
the upgrade.





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289085#comment-17289085
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning closed pull request #404:
URL: https://github.com/apache/tika/pull/404


   





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289060#comment-17289060
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784174494


   K.  jdk 8 _should_ work, right?  I'll ping the dev list.  Thank you!





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289059#comment-17289059
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784173672


   I'm not getting that issue - I'm using zulu jdk 11.0.7 and ant 1.10.8
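
The two environments in this exchange report their JDKs in different version schemes ("1.8.0_282" versus "11.0.7"), which makes them awkward to compare at a glance. The following is a minimal sketch, not part of either build, that reduces such strings to a major version number. The class name `JavaMajorVersion` and its `major` method are hypothetical names introduced only for this illustration.

```java
/** Illustrative helper (hypothetical, not part of POI or Tika). */
final class JavaMajorVersion {

    /**
     * Reduces a java.version style string to its major version.
     * Legacy strings such as "1.8.0_282" map to 8; modern strings
     * such as "11.0.7" map to 11.
     */
    static int major(String version) {
        // Split on the separators used in version strings: '.', '_' and '-'.
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // In the legacy scheme the leading "1" is a prefix, so the second field is the major version.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println("Running on Java major version " + major(running));
    }
}
```

Running it under each JDK shows which major version a build is actually using, which may differ from what the build tool reports.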





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289053#comment-17289053
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison edited a comment on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784167038


   Clean checkout.
   
   `ant compile` appears to work.
   
   `ant test` fails with:
   
   ```
       [echo] Using Ant: Apache Ant(TM) version 1.10.9 compiled on September 27 2020 from /apache/apache-ant-1.10.9, Ant detected Java 1.8 (may be different than actual Java sometimes...)
       [echo] Using Java: 1.8.0_282/1.8.0_282-b08/25.282-b08/OpenJDK 64-Bit Server VM from AdoptOpenJDK on Linux: 5.8.0-43-generic
       [echo] Building Apache POI version 5.0.1-SNAPSHOT and RC: RC1
   
   
   test-main:
       [javac] Compiling 1 source file to /home/tallison/Intellij/poi-trunk/build/poi-ant-contrib
   
   -test-main-write-testfile:
   
   -test-scratchpad-check:
   
   test-scratchpad-download-resources:
   
   test-scratchpad:
   
   -test-scratchpad-write-testfile:
   
   -test-ooxml-check:
   
   test-ooxml:
   
   -test-ooxml-write-testfile:
   
   compile-ooxml-lite:
       [echo] Create ooxml-lite schemas
   
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   ```





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289052#comment-17289052
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784167038


   Clean checkout.
   
   `ant compile` appears to work.
   
   `ant test` fails with:
   
   ```
   test-main:
       [javac] Compiling 1 source file to /home/tallison/Intellij/poi-trunk/build/poi-ant-contrib
   
   -test-main-write-testfile:
   
   -test-scratchpad-check:
   
   test-scratchpad-download-resources:
   
   test-scratchpad:
   
   -test-scratchpad-write-testfile:
   
   -test-ooxml-check:
   
   test-ooxml:
   
   -test-ooxml-write-testfile:
   
   compile-ooxml-lite:
       [echo] Create ooxml-lite schemas
   
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   ```





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289047#comment-17289047
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784155288


   I'm happy to do so for testing, but I'm hesitant to add even more to tika.  
The point of 2.x is to modularize and make dependencies smaller.  I wouldn't 
rule it out, necessarily...
   
   Any recs on the above build failure?  Thank you!





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289048#comment-17289048
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784156188


   I'd suggest a clean checkout - there could be some stuff hanging around that 
`ant clean` is not removing





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289040#comment-17289040
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784152065


   @tballison would it be worth just using ooxml-schemas-full on tika - tika is 
big so the benefit of ooxml-schemas-lite is lower





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289041#comment-17289041
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning edited a comment on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784152065


   @tballison would it be worth just using ooxml-schemas-full on tika? - tika 
is big so the benefit of ooxml-schemas-lite is lower





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289020#comment-17289020
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784129952


   I'm sure the above is user error.  I've been away from POI for too 
long...argh...





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289019#comment-17289019
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison edited a comment on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784128134


   ```
   openjdk version "1.8.0_282"
   OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
   OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)
   ```
   
   Ubuntu
   
   
   Uninstalled old ant
   Installed new ant
   ```
   ant -f fetch.xml -Ddest=system
   ```
   
   ```
   echo $ANT_HOME
   /apache/apache-ant-1.10.9
   ```
   ```
   ant -v
   Apache Ant(TM) version 1.10.9 compiled on September 27 2020
   ```
   
   ant clean test 
   
   ```
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   
   Total time: 2 minutes 58 seconds
   
   ```





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289018#comment-17289018
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784128134


   Uninstalled old ant
   Installed new ant
   ```
   ant -f fetch.xml -Ddest=system
   ```
   
   ```
   echo $ANT_HOME
   /apache/apache-ant-1.10.9
   ```
   ```
   ant -v
   Apache Ant(TM) version 1.10.9 compiled on September 27 2020
   ```
   
   ant clean test 
   
   ```
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   
   Total time: 2 minutes 58 seconds
   
   ```





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289013#comment-17289013
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784117840


   The Gradle build depends quite a bit on the Ant one - I would suggest 
getting the Ant build working first; the Gradle build will probably start working after that.





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289010#comment-17289010
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784115989


   Fresh checkout...
   
   ./gradlew build
   Results:
   ```
   > Task :ooxml:compileJava
   
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFCell.java:564: error: cannot access DocumentFactory
   f = CTCellFormula.Factory.newInstance();
   ^
     class file for org.apache.xmlbeans.impl.schema.DocumentFactory not found
   
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFColor.java:117: error: recursive constructor invocation
   public XSSFColor(byte[] rgb, IndexedColorMap colorMap) {
   ^
   
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xddf/usermodel/XDDFLineProperties.java:42: error: recursive constructor invocation
   public XDDFLineProperties(XDDFFillProperties fill) {
   ^
   
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xddf/usermodel/text/XDDFHyperlink.java:29: error: recursive constructor invocation
   public XDDFHyperlink(String id) {
   ```
   
   I'm guessing I need to run ant first to pull in the dependencies?
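
The "class file for org.apache.xmlbeans.impl.schema.DocumentFactory not found" error above suggests that the XMLBeans jar visible to the compiler does not provide that class. A minimal sketch for checking this on a given classpath follows, assuming the jars under test are passed via `-cp`. The class name `ClassOriginCheck` and its `describe` method are hypothetical and exist only for this illustration; they are not part of POI, XMLBeans, or Tika.

```java
import java.security.CodeSource;

/** Illustrative helper (hypothetical): reports whether a class is loadable and where it came from. */
final class ClassOriginCheck {

    /** Returns "missing" if the class cannot be loaded, otherwise a short description of its origin. */
    static String describe(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> c = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null
                    ? "present (no code source, likely a platform class)"
                    : "present in " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.xmlbeans.impl.schema.DocumentFactory";
        System.out.println(name + ": " + describe(name));
    }
}
```

If the class is reported as missing, or as coming from an unexpected jar, that points at a dependency-resolution problem rather than at the source files named in the compiler output.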





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289009#comment-17289009
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784114343


   There is the ooxml-schemas-full jar for cases where ooxml-schemas-lite is 
missing classes or schema resources.
   
   I thought all the xsb files were in the ooxml-schemas-lite jar.
   
   It is definitely worth adding a test case to the POI code base.
   
   POI 6.0.0 is probably going to be the next release, and it could be a couple of 
months away (fairly big logging changes were just merged, and there will probably be an uptake of a 
refactored xmlbeans jar).





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289006#comment-17289006
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784110289


   I'm adding "test_recursive_embedded.docx" to a unit test in POI locally to 
see if I can get it to add the oleobjectelement.xsb in schemas-lite.





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289001#comment-17289001
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784102052


   See: https://github.com/apache/tika/tree/TIKA-3164-1.x





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289000#comment-17289000
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784099481


   Aside from including the full schemas jar, is there a solution to this:
   ```
   [ERROR] Tests run: 13, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.548 s <<< FAILURE! - in org.apache.tika.parser.RecursiveParserWrapperTest
   [ERROR] org.apache.tika.parser.RecursiveParserWrapperTest.testMaxEmbedded  Time elapsed: 0.16 s  <<< ERROR!
   org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@626c569b
       at org.apache.tika.parser.RecursiveParserWrapperTest.testMaxEmbedded(RecursiveParserWrapperTest.java:191)
   Caused by: org.apache.xmlbeans.SchemaTypeLoaderException: XML-BEANS compiled schema: Could not locate compiled schema resource org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb (org.apache.poi.schemas.ooxml.system.ooxml.oleobjectelement) - code 0
       at org.apache.tika.parser.RecursiveParserWrapperTest.testMaxEmbedded(RecursiveParserWrapperTest.java:191)
   ```
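
The exception above names a classpath resource that could not be located. A minimal sketch for confirming whether any jar on a given classpath ships that resource is shown below, using only the standard `ClassLoader.getResource` call. The class name `SchemaResourceCheck` and its `locate` method are hypothetical names chosen for this illustration.

```java
import java.net.URL;

/** Illustrative helper (hypothetical): looks up a compiled schema resource on the classpath. */
final class SchemaResourceCheck {

    /** Returns the URL of the resource, or null when nothing on the classpath provides it. */
    static URL locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader when no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath);
    }

    public static void main(String[] args) {
        // Default to the resource path quoted in the exception message.
        String path = args.length > 0
                ? args[0]
                : "org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb";
        URL url = locate(path);
        System.out.println(url == null ? "MISSING: " + path : "FOUND: " + url);
    }
}
```

Running this once with the lite schemas jar and once with the full schemas jar on the classpath would show which of the two, if either, contains the `.xsb` file.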





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288994#comment-17288994
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784055893


   Sorry for my delay!  Not clear on why that test failed for you.  Let me take 
a look. Working on this today.





[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288661#comment-17288661
 ] 

ASF GitHub Bot commented on TIKA-3164:
--

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-783721207


   @tballison I tried this on my laptop - the tika-parser microsoft tests 
passed but job later failed with
   
   ```
   [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.181 s <<< FAILURE! - in org.apache.tika.parser.gdal.TestGDALParser
   [ERROR] org.apache.tika.parser.gdal.TestGDALParser.testParseBasicInfo  Time elapsed: 12.795 s  <<< FAILURE!
   java.lang.AssertionError
       at org.apache.tika.parser.gdal.TestGDALParser.testParseBasicInfo(TestGDALParser.java:82)
   ```
   


