[
https://issues.apache.org/jira/browse/TIKA-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
PJ Fanning updated TIKA-4405:
-----------------------------
Description:
* https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
* https://bz.apache.org/bugzilla/show_bug.cgi?id=63575
I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a few
of the tests that we have for the POI XWPFWordExtractor. The capitalized text
test failed. The text in the XML is not capitalized but the OOXML has a marker
element that says it should be capitalized.
There are quite a few other POI tests where XWPFEventBasedWordExtractor does
not return the same text as XWPFWordExtractor.
https://github.com/apache/poi/pull/788
was:
* https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
* https://bz.apache.org/bugzilla/show_bug.cgi?id=63575
I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a few
of the tests that we have for the POI XWPFWordExtractor. The capitalized text
test failed. The text in the XML is not capitalized but the OOXML has a marker
element that says it should be capitalized.
> XWPFEventBasedWordExtractor does not support run text that is marked as
> capitalized
> -----------------------------------------------------------------------------------
>
> Key: TIKA-4405
> URL: https://issues.apache.org/jira/browse/TIKA-4405
> Project: Tika
> Issue Type: Bug
> Reporter: PJ Fanning
> Priority: Major
>
> * https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
> * https://bz.apache.org/bugzilla/show_bug.cgi?id=63575
> I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a
> few of the tests that we have for the POI XWPFWordExtractor. The capitalized
> text test failed. The text in the XML is not capitalized but the OOXML has a
> marker element that says it should be capitalized.
> There are quite a few other POI tests where XWPFEventBasedWordExtractor does
> not return the same text as XWPFWordExtractor.
> https://github.com/apache/poi/pull/788
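
To illustrate the behaviour described above: the run text stored in the XML is lower case, and a separate marker element in the run properties says it should be rendered capitalized. Below is a minimal sketch of how an event-based (SAX) extractor could honour that marker. It is not the Tika or POI implementation. The class and method names (CapsAwareRunSketch, RunHandler, extract) are hypothetical, and it assumes the marker is the WordprocessingML w:caps element inside w:rPr, treated as on unless its w:val is "false", "0" or "off". Capitalization inherited from styles is not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: upper-cases the text of runs whose properties contain w:caps.
class CapsAwareRunSketch {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static class RunHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        private boolean inRunProps; // inside <w:rPr>
        private boolean inText;     // inside <w:t>
        private boolean caps;       // current run is marked as capitalized

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (!W_NS.equals(uri)) return;
            switch (local) {
                case "r":   caps = false; break;       // each run starts without the flag
                case "rPr": inRunProps = true; break;
                case "caps":
                    if (inRunProps) {
                        // A missing w:val means "on"; only explicit off values disable it.
                        String val = atts.getValue(W_NS, "val");
                        caps = val == null
                            || !(val.equals("false") || val.equals("0") || val.equals("off"));
                    }
                    break;
                case "t":   inText = true; break;
                default:    break;
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (!W_NS.equals(uri)) return;
            if ("rPr".equals(local)) inRunProps = false;
            else if ("t".equals(local)) inText = false;
            else if ("r".equals(local)) caps = false;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!inText) return;
            String s = new String(ch, start, length);
            out.append(caps ? s.toUpperCase(Locale.ROOT) : s);
        }
    }

    // Parses a WordprocessingML fragment and returns its run text with w:caps applied.
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match elements by namespace + local name
        RunHandler handler = new RunHandler();
        factory.newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.out.toString();
    }
}
```

With this sketch, a paragraph whose first run carries <w:caps/> and the text "abc", followed by a plain run " def", is extracted as "ABC def". Locale.ROOT is used so the result does not depend on the JVM default locale; a real extractor may want different locale handling.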