Re: [PR] Add Github CI workflows for multi-arch Docker images [tika-docker]

2023-11-03 Thread via GitHub


fpiesche commented on PR #19:
URL: https://github.com/apache/tika-docker/pull/19#issuecomment-1793274083

   A good opportunity to show how Dependabot PRs work! ;) After enabling 
Dependabot on my fork:
   
![Screenshot_20231103_230717](https://github.com/apache/tika-docker/assets/393620/6178dd6a-355a-4aee-9560-e1afdbcd376d)
   
   https://github.com/fpiesche/tika-docker/pulls?q=is%3Apr+is%3Aclosed
   
   None of these needed any changes before merging. Here is a manually run 
example build after closing the lot, to check that the updates haven't broken 
anything: 
https://github.com/fpiesche/tika-docker/actions/runs/6751065869


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782797#comment-17782797
 ] 

Hudson commented on TIKA-4165:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1369 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1369/])
[TIKA-4165] fix a flaky test by replacing HashMap with LinkedHashMap (#1436) 
(github: 
[https://github.com/apache/tika/commit/81d8bd0e2b112f6a95ce06c7995f47804f1c3d28])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java


> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> ---
>
> Key: TIKA-4165
> URL: https://issues.apache.org/jira/browse/TIKA-4165
> Project: Tika
>  Issue Type: Improvement
>Reporter: Xinbo Lu
>Priority: Minor
> Attachments: diff
>
>
> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> h3. Related Test
> [org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious|https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59]
> h3. Root Cause
>  
> [tika/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java|https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119]
> Line 119 in 
> [d466492|https://github.com/lxb007981/tika/commit/d466492e0a01c8ee28c108bc3022f1a03ff530de]
> ||for (Map.Entry embeddedImage : embeddedImages.entrySet()) 
> {||
>  
> When recursively extracting metadata from an XPS file, a simple {{HashMap}} 
> is used and iterated. However, {{HashMap}} does not guarantee the 
> order of iteration, so the extracted metadata have no guaranteed order in the 
> resulting metadata list. Later, the test 
> {{{}org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious{}}} 
> assumes the order of metadata in the list is the same as the 
> {{HashMap}} insertion order, which renders the test flaky.
> h3. Fix
> We sort the metadata list before doing comparison.
> h3. How to reproduce the test
> {*}Java version{*}: 11.0.20.1
> {*}Maven version{*}: 3.6.3
>  # Build the module
> {{mvn clean install -DskipTests -pl 
> tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
>  -am}}
>  # Test without shuffling
> {{mvn -pl 
> tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
>  test 
> -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed.
>  # Test with shuffling using 
> [NonDex|https://github.com/TestingResearchIllinois/NonDex]
> {{{}mvn -pl 
> tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
>  edu.illinois:nondex-maven-plugin:2.1.1:nondex 
> -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious{}}}
> This test passed with the proposed fix but failed without it.
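
The root cause described in the quoted issue can be illustrated with a minimal Java sketch. It is not Tika code: the class name, the `iterate` helper, and the image names are hypothetical stand-ins. It only shows that `LinkedHashMap` iterates in insertion order, while `HashMap` makes no such promise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderSketch {

    // Hypothetical stand-ins for embedded image paths; not taken from Tika.
    static final List<String> INSERTION_ORDER =
            List.of("image-c.png", "image-a.png", "image-b.png");

    // Inserts the keys in INSERTION_ORDER, then returns them in iteration order.
    static List<String> iterate(Map<String, String> embeddedImages) {
        for (String path : INSERTION_ORDER) {
            embeddedImages.put(path, "metadata for " + path);
        }
        List<String> seen = new ArrayList<>();
        for (Map.Entry<String, String> embeddedImage : embeddedImages.entrySet()) {
            seen.add(embeddedImage.getKey());
        }
        return seen;
    }

    public static void main(String[] args) {
        // HashMap: the iteration order is unspecified and may differ from insertion order.
        System.out.println("HashMap:       " + iterate(new HashMap<>()));
        // LinkedHashMap: iteration follows insertion order.
        System.out.println("LinkedHashMap: " + iterate(new LinkedHashMap<>()));
    }
}
```

A test that asserts on list positions is only safe against the second variant, which is why a tool such as NonDex, which shuffles unspecified orders, can expose the flakiness.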



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] fix dl4j unit tests to handle 403 on model download [tika]

2023-11-03 Thread via GitHub


tballison merged PR #1439:
URL: https://github.com/apache/tika/pull/1439





[PR] fix dl4j unit tests to handle 403 on model download [tika]

2023-11-03 Thread via GitHub


tballison opened a new pull request, #1439:
URL: https://github.com/apache/tika/pull/1439

   
   
   Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! 
Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Tika issue 
tracker](https://issues.apache.org/jira/projects/TIKA) which describes the 
problem or the improvement. We cannot accept pull requests without an issue 
because the change wouldn't be listed in the release notes.
   * the issue ID (`TIKA-`)
 - is referenced in the title of the pull request
 - and placed in front of your commit messages surrounded by square 
brackets (`[TIKA-] Issue or pull request title`)
   * commits are squashed into a single one (or few commits for larger changes)
   * Tika is successfully built and unit tests pass by running `mvn clean test`
   * there should be no conflicts when merging the pull request branch into the 
*recent* `main` branch. If there are conflicts, please try to rebase the pull 
request branch on top of a freshly pulled `main` branch
   * if you add a new module that downstream users will depend upon, add it to 
the relevant group in `tika-bom/pom.xml`.
   
   We will be able to integrate your pull request faster if these conditions 
are met. If you have any questions about how to fix your problem or about using 
Tika in general, please sign up for the [Tika mailing 
list](http://tika.apache.org/mail-lists.html). Thanks!
   





[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782781#comment-17782781
 ] 

ASF GitHub Bot commented on TIKA-4165:
--

tballison merged PR #1436:
URL: https://github.com/apache/tika/pull/1436









Re: [PR] [TIKA-4165] fix a flaky test. [tika]

2023-11-03 Thread via GitHub


tballison merged PR #1436:
URL: https://github.com/apache/tika/pull/1436





[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2023-11-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782743#comment-17782743
 ] 

Hudson commented on TIKA-4166:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1368 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1368/])
TIKA-4166: update jetty, jackrabbit, jaxb, mockito (tilman: 
[https://github.com/apache/tika/commit/baba0088017ed032c4bbf7fa92158b691bb5c374])
* (edit) tika-parent/pom.xml
TIKA-4166: update tyrus (tilman: 
[https://github.com/apache/tika/commit/69b4c447552fd20bd3b8f3097419a33b841a6c55])
* (edit) tika-translate/pom.xml
TIKA-4166: update jackrabbit, kafka and failsafe-plugin (tilman: 
[https://github.com/apache/tika/commit/fc5b0d29b370b3c94ebeecba0c0add09044f39eb])
* (edit) tika-parent/pom.xml


> dependency updates for Tika 3.0
> ---
>
> Key: TIKA-4166
> URL: https://issues.apache.org/jira/browse/TIKA-4166
> Project: Tika
>  Issue Type: Task
>  Components: build
>Reporter: Tilman Hausherr
>Priority: Minor
> Fix For: 3.0.0-BETA
>
>
> Separate ticket for updates for 3.0, especially those not found by dependabot.





Re: [PR] Add Github CI workflows for multi-arch Docker images [tika-docker]

2023-11-03 Thread via GitHub


fpiesche commented on PR #19:
URL: https://github.com/apache/tika-docker/pull/19#issuecomment-1793029617

   With this most recent set of changes:
   
   * The workflow will automatically trigger if a new version tag (of format 
`*.*.*.*`) is created
   * The build will run and locally store the image, then
     * run the newly-built image
     * as per `docker-tool.sh`, check that the service is responding to 
HTTP requests and running as the expected user
   * If tests have passed, the image will be pushed to the remote repositories 
with the following tags:
     * the name of the latest tag
     * the Tika version (so e.g. the image tagged `2.8.0` will always be the 
latest build out of the `2.8.0.x` tags in the git repository)
     * `latest`
   
   And finally, here's an example build [triggered by creating a new GitHub 
release](https://github.com/fpiesche/tika-docker/actions/runs/6749676453)
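
   The tagging rule in the list above can be sketched in Java. This is an illustration only, not the workflow itself (which lives in the PR's GitHub Actions files). The class and method names are hypothetical, and it assumes that the Tika version is simply the first three components of a four-part git tag.

```java
import java.util.List;

class ImageTagSketch {

    // Derives the image tags for a git tag of the form *.*.*.* (for example 2.8.0.1).
    // Assumption: the Tika version is the git tag without its last component.
    static List<String> imageTags(String gitTag) {
        String[] parts = gitTag.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected a tag of format *.*.*.*, got: " + gitTag);
        }
        String tikaVersion = String.join(".", parts[0], parts[1], parts[2]);
        // The git tag itself, the plain Tika version, and the moving "latest" tag.
        return List.of(gitTag, tikaVersion, "latest");
    }

    public static void main(String[] args) {
        System.out.println(imageTags("2.8.0.1"));
    }
}
```

   Under that assumption, each new `2.8.0.x` tag re-points the `2.8.0` image tag at the newest build, which matches the behaviour described for the `2.8.0` image.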





[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2023-11-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782711#comment-17782711
 ] 

Hudson commented on TIKA-4166:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1367 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1367/])
TIKA-4166: update checkstyle, surefire plugins, commons cli, commons io 
(tilman: 
[https://github.com/apache/tika/commit/5392b96e2ea8c6b20de6c86b86171cd74329ef83])
* (edit) tika-parent/pom.xml







[jira] [Created] (TIKA-4166) dependency updates for Tika 3.0

2023-11-03 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4166:
-

 Summary: dependency updates for Tika 3.0
 Key: TIKA-4166
 URL: https://issues.apache.org/jira/browse/TIKA-4166
 Project: Tika
  Issue Type: Task
  Components: build
Reporter: Tilman Hausherr
 Fix For: 3.0.0-BETA


Separate ticket for updates for 3.0, especially those not found by dependabot.





[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread Xinbo Lu (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782626#comment-17782626
 ] 

Xinbo Lu commented on TIKA-4165:


Thank you for the suggestion. The fix now uses `LinkedHashMap`.






[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782622#comment-17782622
 ] 

ASF GitHub Bot commented on TIKA-4165:
--

lxb007981 opened a new pull request, #1436:
URL: https://github.com/apache/tika/pull/1436

   ### Description
   
   - Type of change :
 - [ ] New feature
 - [ ] Bug fix for existing feature
 - [ ] Code quality improvement
 - [X] Addition or Improvement of tests
 - [ ] Addition or Improvement of documentation
   
   Fix a flaky test, caused by nondeterministic iteration order of HashMap
   
   ### Related Test
   
[org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious](https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59)
   
   ### Root Cause
   
https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119
   
   When recursively extracting metadata from an XPS file, a simple `HashMap` is 
used and iterated. However, `HashMap` does not guarantee the order of 
iteration, so the extracted metadata have no guaranteed order in the resulting 
metadata list. Later, the test 
`org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious` 
assumes the order of metadata in the list is the same as the `HashMap` 
insertion order, which renders the test flaky.
   
   ### Fix
   We sort the metadata list before doing comparison.
   
   ### How to reproduce the test
   **Java version**: 11.0.20.1
   **Maven version**: 3.6.3
   
   1. Build the module
   `mvn clean install -DskipTests -pl 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
 -am`
   2. Test without shuffling
   `mvn -pl 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
 test 
-Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   This test passed.
   
   3. Test with shuffling using 
[NonDex](https://github.com/TestingResearchIllinois/NonDex)
   `mvn -pl 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
 edu.illinois:nondex-maven-plugin:2.1.1:nondex 
-Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   
   This test passed with the proposed fix but failed without it.
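
   The fix proposed in this description, sorting before comparing, can be sketched as follows. This is a hedged illustration, not the code from the PR: the class and helper names are hypothetical, and plain strings stand in for the metadata entries that the real test compares.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedComparisonSketch {

    // Returns a sorted copy so that the caller's list is left untouched.
    static List<String> sortedCopy(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // True when both lists hold the same elements, regardless of their order.
    static boolean sameElementsIgnoringOrder(List<String> expected, List<String> actual) {
        return sortedCopy(expected).equals(sortedCopy(actual));
    }

    public static void main(String[] args) {
        List<String> expected = List.of("image-a.png", "image-b.png");
        List<String> actual = List.of("image-b.png", "image-a.png");
        System.out.println(sameElementsIgnoringOrder(expected, actual));
    }
}
```

   This approach makes the test order-insensitive without touching production code. The thread later settled on `LinkedHashMap` instead, which makes the extraction order itself deterministic.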





[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782619#comment-17782619
 ] 

ASF GitHub Bot commented on TIKA-4165:
--

lxb007981 closed pull request #1436: [TIKA-4165] fix a flaky test.
URL: https://github.com/apache/tika/pull/1436









Re: [PR] [TIKA-4165] fix a flaky test. [tika]

2023-11-03 Thread via GitHub


lxb007981 closed pull request #1436: [TIKA-4165] fix a flaky test.
URL: https://github.com/apache/tika/pull/1436





[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782564#comment-17782564
 ] 

Tim Allison commented on TIKA-4165:
---

Let's go with LinkedHashMap.






[jira] [Comment Edited] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782564#comment-17782564
 ] 

Tim Allison edited comment on TIKA-4165 at 11/3/23 12:59 PM:
-

Let's go with LinkedHashMap. Thank you!


was (Author: talli...@mitre.org):
Let's go with LinkedHashMap.

> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> ---
>
> Key: TIKA-4165
> URL: https://issues.apache.org/jira/browse/TIKA-4165
> Project: Tika
>  Issue Type: Improvement
>Reporter: Xinbo Lu
>Priority: Minor
> Attachments: diff
>
>
> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> h3. Related Test
> [org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious|https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59]
> h3. Root Cause
>  
> [tika/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java|https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119]
> Line 119 in 
> [d466492|https://github.com/lxb007981/tika/commit/d466492e0a01c8ee28c108bc3022f1a03ff530de]
> ||for (Map.Entry embeddedImage : embeddedImages.entrySet()) {||
>  
> When recursively extracting metadata from an XPS file, a plain {{HashMap}} 
> is used and iterated. However, {{HashMap}} does not guarantee any iteration 
> order, so the extracted metadata have no guaranteed order in the resulting 
> metadata list. The test 
> {{{}org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious{}}} 
> assumes that the metadata in the list appear in the {{HashMap}} insertion 
> order, which makes the test flaky.
> h3. Fix
> We sort the metadata list before doing the comparison.
> h3. How to reproduce the test
> {*}Java version{*}: 11.0.20.1
> {*}Maven version{*}: 3.6.3
>  # Build the module
> {{mvn clean install -DskipTests -pl 
> tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
>  -am}}
>  # Test without shuffling
> {{mvn -pl 
> tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
>  test 
> -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed.
>  # Test with shuffling using 
> [NonDex|https://github.com/TestingResearchIllinois/NonDex]
> {{{}mvn -pl 
> tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module
>  edu.illinois:nondex-maven-plugin:2.1.1:nondex 
> -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious{}}}
> This test passed with the proposed fix but failed without it.
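
The sort-before-compare idea from the Fix section of the description can be sketched roughly as below. This is only an illustration, not the actual Tika test code: the class name, the helper names, and the sample paths and values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from Tika.
class SortBeforeCompare {

    // Collects the map's values in whatever order the map happens to iterate.
    static List<String> collectValues(Map<String, String> embedded) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : embedded.entrySet()) {
            values.add(entry.getValue());
        }
        return values;
    }

    // Order-insensitive comparison: sort copies of both lists, then compare.
    static boolean sameElements(List<String> expected, List<String> actual) {
        List<String> a = new ArrayList<>(expected);
        List<String> b = new ArrayList<>(actual);
        Collections.sort(a);
        Collections.sort(b);
        return a.equals(b);
    }

    public static void main(String[] args) {
        Map<String, String> embedded = new HashMap<>();
        embedded.put("/Resources/Images/2.jpeg", "image2.jpeg");
        embedded.put("/Resources/Images/1.jpeg", "image1.jpeg");
        embedded.put("/Resources/Images/3.png", "image3.png");

        List<String> expected = Arrays.asList("image1.jpeg", "image2.jpeg", "image3.png");
        // True regardless of the order in which the HashMap yields its entries.
        System.out.println(sameElements(expected, collectValues(embedded)));
    }
}
```

Sorting in the test removes the flakiness but leaves the order of the parser's output unspecified, which is the trade-off discussed in the comments on this issue.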





[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread Xinbo Lu (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782549#comment-17782549
 ] 

Xinbo Lu commented on TIKA-4165:


Thank you for the comment. Yes, I did consider changing HashMap to 
LinkedHashMap, but I am not sure whether the order of extracted metadata is 
intended to be fixed here. If so, using LinkedHashMap would be the better solution.



[jira] [Commented] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap

2023-11-03 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782517#comment-17782517
 ] 

Tim Allison commented on TIKA-4165:
---

Thank you for opening this. What do you think about changing the implementation 
in the parser to use a LinkedHashMap to guarantee insertion order?
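
The difference this suggestion relies on can be sketched as follows. It is a minimal illustration, not the actual XPSExtractorDecorator code: the class name, helper names, and sample keys are hypothetical. `LinkedHashMap` documents that it iterates in insertion order, while `HashMap` makes no guarantee about order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from Tika.
class InsertionOrderDemo {

    // Returns the keys in the order the given map iterates them.
    static List<String> keysInIterationOrder(Map<String, String> map) {
        return new ArrayList<>(map.keySet());
    }

    // Inserts the same three entries, in the same order, into any map.
    static Map<String, String> fill(Map<String, String> map) {
        map.put("zeta.png", "image/png");
        map.put("alpha.jpeg", "image/jpeg");
        map.put("mid.tiff", "image/tiff");
        return map;
    }

    public static void main(String[] args) {
        // LinkedHashMap: keys come back in insertion order.
        System.out.println(keysInIterationOrder(fill(new LinkedHashMap<>())));
        // HashMap: order is unspecified and may differ from insertion order.
        System.out.println(keysInIterationOrder(fill(new HashMap<>())));
    }
}
```

With this approach the parser's output order becomes deterministic, so a test that compares lists positionally no longer depends on hash ordering.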
