I ran the regression tests against our .docx files and found no problematic new exceptions. We are now extracting some new text from the phonetic/ruby runs (great!). However, I am seeing some duplicated content within text boxes (and possibly from other sources). I need to figure out whether this is happening at the POI level or the Tika level.
Reports are here: http://162.242.228.174/reports/poi-3.17-rc2-docx.tar.gz

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Wednesday, August 30, 2017 8:05 PM
To: POI Developers List <dev@poi.apache.org>
Subject: RE: [VOTE] Apache POI 3.17 release (RC2)

I’ll run regression tests at least against our .docx tonight to make sure I didn’t wreck anything with 61470.