[ https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782622#comment-17782622 ]

ASF GitHub Bot commented on TIKA-4165:
--------------------------------------

lxb007981 opened a new pull request, #1436:
URL: https://github.com/apache/tika/pull/1436

   ### Description
   
   - Type of change :
     - [ ] New feature
     - [ ] Bug fix for existing feature
     - [ ] Code quality improvement
     - [X] Addition or Improvement of tests
     - [ ] Addition or Improvement of documentation
   
   Fix a flaky test caused by the nondeterministic iteration order of `HashMap`.
   
   ### Related Test
   
[org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious](https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59)
   
   ### Root Cause
   
https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119
   
   When metadata is extracted recursively from an XPS file, the embedded images are collected in a plain `HashMap` and then iterated. `HashMap` does not guarantee any iteration order, so the extracted metadata entries have no guaranteed order in the resulting metadata list. The test `org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious`, however, assumes that the list follows the order in which the entries were inserted into the `HashMap`, which makes the test flaky.
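The following Java sketch illustrates the root cause outside of Tika. It mimics the pattern described above by building a result list from a map's `entrySet()`; the class name, method names, and image names are hypothetical and not taken from the Tika code. With a `HashMap` the order of the resulting list is unspecified, whereas a `LinkedHashMap` guarantees insertion order. (The PR fixes the test by sorting instead; `LinkedHashMap` appears here only to contrast the two iteration-order contracts.)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of Tika.
public class IterationOrderDemo {

    // Iterate a map of embedded resources and append one entry per resource
    // to a result list. The list order is whatever order entrySet() yields.
    static List<String> collectInIterationOrder(Map<String, String> embedded) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : embedded.entrySet()) {
            result.add(entry.getKey());
        }
        return result;
    }

    // Insert the given names into the supplied map, then collect them back.
    static List<String> collect(Map<String, String> target, String... names) {
        for (String name : names) {
            target.put(name, "metadata for " + name);
        }
        return collectInIterationOrder(target);
    }

    public static void main(String[] args) {
        String[] names = {"image3.png", "image1.png", "image2.png"};
        // HashMap: the order is unspecified and may change across JDK versions
        // or when a tool such as NonDex shuffles iteration order.
        System.out.println("HashMap:       " + collect(new HashMap<>(), names));
        // LinkedHashMap: iteration order is guaranteed to be insertion order,
        // so this prints [image3.png, image1.png, image2.png].
        System.out.println("LinkedHashMap: " + collect(new LinkedHashMap<>(), names));
    }
}
```

A test that indexes into a list built from a `HashMap` is therefore relying on behavior the `HashMap` contract does not promise.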
   
   ### Fix
   We sort the metadata list before comparing it.
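A minimal sketch of an order-insensitive comparison, assuming each metadata record can be identified by a unique key. Plain `Map<String, String>` records stand in for Tika's `Metadata` objects, and the `resourceName` sort key, the class name, and the helper methods are hypothetical; the actual sort criterion used in the PR may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of Tika.
public class SortedComparisonDemo {

    // Hypothetical sort key; the real test may sort on a different field.
    static final String SORT_KEY = "resourceName";

    // Simple stand-in for one extracted metadata record.
    static Map<String, String> record(String resourceName, String contentType) {
        Map<String, String> m = new HashMap<>();
        m.put(SORT_KEY, resourceName);
        m.put("Content-Type", contentType);
        return m;
    }

    // Return a sorted copy so the comparison no longer depends on the order
    // in which the parser emitted the records. The input is not modified.
    static List<Map<String, String>> sortedByKey(List<Map<String, String>> records) {
        List<Map<String, String>> copy = new ArrayList<>(records);
        copy.sort(Comparator.comparing((Map<String, String> m) -> m.getOrDefault(SORT_KEY, "")));
        return copy;
    }

    static boolean sameRecordsIgnoringOrder(List<Map<String, String>> expected,
                                            List<Map<String, String>> actual) {
        return sortedByKey(expected).equals(sortedByKey(actual));
    }

    public static void main(String[] args) {
        List<Map<String, String>> expected = Arrays.asList(
                record("image1.png", "image/png"),
                record("image2.png", "image/png"));
        // Same records, emitted in a different order (e.g. under NonDex).
        List<Map<String, String>> actual = Arrays.asList(
                record("image2.png", "image/png"),
                record("image1.png", "image/png"));
        System.out.println(expected.equals(actual));                   // false
        System.out.println(sameRecordsIgnoringOrder(expected, actual)); // true
    }
}
```

Because `List.sort` is stable, records that share the same sort key keep their original relative order, so the key should be unique per record for the comparison to be fully order-independent.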
   
   ### How to reproduce the test
   **Java version**: 11.0.20.1
   **Maven version**: 3.6.3
   
   1. Build the module
   `mvn clean install -DskipTests -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module -am`
   2. Test without shuffling
   `mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module test -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   This test passed.
   
   3. Test with shuffling using [NonDex](https://github.com/TestingResearchIllinois/NonDex)
   `mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious`
   
   This test passed with the proposed fix but failed without it.




> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> -----------------------------------------------------------------------
>
>                 Key: TIKA-4165
>                 URL: https://issues.apache.org/jira/browse/TIKA-4165
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Xinbo Lu
>            Priority: Minor
>         Attachments: diff
>
>
> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> h3. Related Test
> [org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious|https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59]
> h3. Root Cause
>  
> [tika/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java|https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119]
> Line 119 in [d466492|https://github.com/lxb007981/tika/commit/d466492e0a01c8ee28c108bc3022f1a03ff530de]
> {code:java}
> for (Map.Entry<String, Metadata> embeddedImage : embeddedImages.entrySet()) {
> {code}
>  
> When metadata is extracted recursively from an XPS file, the embedded images are collected in a plain {{HashMap}} and then iterated. {{HashMap}} does not guarantee any iteration order, so the extracted metadata entries have no guaranteed order in the resulting metadata list. The test {{org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious}}, however, assumes that the list follows the order in which the entries were inserted into the {{HashMap}}, which makes the test flaky.
> h3. Fix
> We sort the metadata list before comparing it.
> h3. How to reproduce the test
> *Java version*: 11.0.20.1
> *Maven version*: 3.6.3
>  # Build the module
> {{mvn clean install -DskipTests -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module -am}}
>  # Test without shuffling
> {{mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module test -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed.
>  # Test with shuffling using [NonDex|https://github.com/TestingResearchIllinois/NonDex]
> {{mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed with the proposed fix but failed without it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
