[ 
https://issues.apache.org/jira/browse/TIKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217212#comment-17217212
 ] 

Peter Lee commented on TIKA-3209:
---------------------------------

Hi [~nick]

I just replaced PicturesSource in Tika with POI's PictureRunMapper in my fork 
repo (commit 232643b); see [5].

I got these test failures (see [6]):
{code:java}
Error:  Failures: 
Error:    POIContainerExtractionTest.testEmbeddedImages:90 expected:<1> but was:<0>
Error:    POIContainerExtractionTest.testEmbeddedStorageId:137 expected:<{F4754C9B-64F5-4B40-8AF4-679732AC0607}> but was:<null>
Error:    OOXMLContainerExtractionTest.testEmbeddedOfficeFiles:170 expected:<24> but was:<22>
Error:    SXWPFExtractorTest.testEmbedded:761 expected:<16> but was:<15>
{code}
 

[5] 
[https://github.com/PeterAlfredLee/tika/commit/232643b27bdd7798f94b64931b5070d667f8dc29]

[6] 
[https://github.com/PeterAlfredLee/tika/runs/1278568384?check_suite_focus=true]

 

> Difference between PictureRunMapper in POI and PicturesSource in Tika
> ---------------------------------------------------------------------
>
>                 Key: TIKA-3209
>                 URL: https://issues.apache.org/jira/browse/TIKA-3209
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Peter Lee
>            Priority: Minor
>
> 1. According to POI's git log, class PictureRunMapper was copied from class 
> PicturesSource in Tika; see [1].
> 2. A TODO in Tika suggests replacing PicturesSource with PictureRunMapper 
> once POI 3.18 is out; see [2].
> So I tried the replacement, but a test failed.
> I think this may be caused by a difference in the nextUnclaimed method 
> between these two classes; see [3][4].
>  
> Can we remove this line in POI? See [4].
>  
> [1] 
> [https://github.com/apache/poi/commit/bdb0e8199bb6891b068e97da69d6410870e8066b]
> [2] 
> [https://github.com/apache/tika/blob/172d40322f5662e428850ad7a8fb4113e453a51c/tika-parser-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java#L641]
> [3]
> [https://github.com/apache/tika/blob/172d40322f5662e428850ad7a8fb4113e453a51c/tika-parser-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java#L709]
>  
> [4] 
> [https://github.com/apache/poi/blob/f509d1deae86866ed531f10f2eba7db17e098473/src/scratchpad/src/org/apache/poi/hwpf/usermodel/PictureRunMapper.java#L130]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
