[ https://issues.apache.org/jira/browse/TIKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217212#comment-17217212 ]
Peter Lee commented on TIKA-3209:
---------------------------------

Hi [~nick],

I replaced PicturesSource in Tika with PictureRunMapper from POI in my fork repo, in commit 232643b (see [5]), and got these test failures (see [6]):
{code:java}
Error: Failures:
Error: POIContainerExtractionTest.testEmbeddedImages:90 expected:<1> but was:<0>
Error: POIContainerExtractionTest.testEmbeddedStorageId:137 expected:<{F4754C9B-64F5-4B40-8AF4-679732AC0607}> but was:<null>
Error: OOXMLContainerExtractionTest.testEmbeddedOfficeFiles:170 expected:<24> but was:<22>
Error: SXWPFExtractorTest.testEmbedded:761 expected:<16> but was:<15>
{code}
[5] [https://github.com/PeterAlfredLee/tika/commit/232643b27bdd7798f94b64931b5070d667f8dc29]
[6] [https://github.com/PeterAlfredLee/tika/runs/1278568384?check_suite_focus=true]

> Different between PictureRunMapper in POI and PicturesSource in Tika
> --------------------------------------------------------------------
>
>                 Key: TIKA-3209
>                 URL: https://issues.apache.org/jira/browse/TIKA-3209
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Peter Lee
>            Priority: Minor
>
> 1. According to the POI git log, class PictureRunMapper was copied from class PicturesSource in Tika. See [1].
> 2. A TODO in Tika suggests replacing PicturesSource with PictureRunMapper once POI 3.18 is out. See [2].
> So I tried the replacement but got a test failure.
> I think it may be caused by the difference in the nextUnclaimed method between these two classes. See [3][4].
>
> Can we remove this line in POI? See [4].
>
> [1] [https://github.com/apache/poi/commit/bdb0e8199bb6891b068e97da69d6410870e8066b]
> [2] [https://github.com/apache/tika/blob/172d40322f5662e428850ad7a8fb4113e453a51c/tika-parser-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java#L641]
> [3] [https://github.com/apache/tika/blob/172d40322f5662e428850ad7a8fb4113e453a51c/tika-parser-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java#L709]
> [4] [https://github.com/apache/poi/blob/f509d1deae86866ed531f10f2eba7db17e098473/src/scratchpad/src/org/apache/poi/hwpf/usermodel/PictureRunMapper.java#L130]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)