This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch TIKA-4692-improve-ooxml-sax-parsers
in repository https://gitbox.apache.org/repos/asf/tika.git


    from 04fb507c06 improve sax ooxml - docx and pptx tests - WIP
     add 991a6297b4 TIKA-4327: update aws
     add 1318754622 TIKA-4327: update kotlin
     add 720e083421 TIKA-4327: put kotlin version variable in parent
     add ee96f8834a TIKA-4327: update sqlite
     add 48c9e93734 TIKA-4563 -- on main: cherry-pick updates from branch_3x found during regression tests and the release process (#2699)
     add 5ce15f2b39 Merge branch 'main' into TIKA-4692-improve-ooxml-sax-parsers
     add 81b2f29c13 refactor based on fresh commoncrawl - WIP
     add 1f6b3d04db refactor based on fresh commoncrawl - WIP
     add 88307771f4 refactor based on fresh commoncrawl - WIP
     add 3ebf8fd77b string index out of bounds exception
     add 1d87374184 checkpoint - wip
     add 18cd618ad0 checkpoint - wip
     add 7117454ca1 checkpoint - wip

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt                                        |   4 +
 tika-bom/pom.xml                                   |  61 ++++++
 .../main/java/org/apache/tika/metadata/Office.java |   4 +
 .../org/apache/tika/sax/XHTMLContentHandler.java   |  27 +++
 .../main/java/org/apache/tika/utils/DateUtils.java |   3 +
 .../org/apache/tika/mime/tika-mimetypes.xml        |   6 +-
 .../org/apache/tika/eval/app/ExtractComparer.java  |  57 +++++-
 .../src/main/resources/comparison-reports-tags.xml |  25 +++
 .../src/main/resources/comparison-reports.xml      |  26 +++
 tika-parent/pom.xml                                |  16 +-
 .../parser/microsoft/AbstractPOIFSExtractor.java   |   2 +-
 ...attingTagManager.java => InlineTagManager.java} |  98 ++++++++--
 .../microsoft/ooxml/OOXMLTikaBodyPartHandler.java  | 126 ++++++++++---
 .../ooxml/OOXMLWordAndPowerPointTextHandler.java   |  66 ++++---
 .../microsoft/ooxml/ParagraphProperties.java       |   9 +
 .../ooxml/SXSLFPowerPointExtractorDecorator.java   | 207 ++++++++++++++++-----
 .../ooxml/SXWPFWordExtractorDecorator.java         |  57 ++++--
 .../ooxml/XSSFExcelExtractorDecorator.java         |  23 ++-
 .../microsoft/ooxml/XWPFBodyContentsHandler.java   |  11 ++
 .../parser/microsoft/ooxml/OOXMLDocxSAXTest.java   |   2 +-
 .../parser/microsoft/ooxml/OOXMLPptxSAXTest.java   |   2 +-
 .../resources/test-documents/testWORD_2006ml.docx  | Bin 165566 -> 151733 bytes
 .../java/org/apache/tika/parser/pkg/ZipParser.java |   4 +
 tika-pipes/tika-pipes-config-store-ignite/pom.xml  |   2 +-
 .../tika-pipes-microsoft-graph/pom.xml             |   1 -
 25 files changed, 672 insertions(+), 167 deletions(-)
 rename tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/{FormattingTagManager.java => InlineTagManager.java} (61%)
