This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch branch_1x
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from 2c83f4e  TIKA-2846 -- store number of characters per page and number of characters with bad/missing unicode mapping per page for PDFs.
     new b2928c0  TIKA-2849 -- move to streaming detection of zip files and apply markLimit to POIFSContainerDetector; thank you, Jukka!
     new b019b63  TIKA-2850 add more limits to comparison reports
     new ec9822b  TIKA-2852 -- add reports for missing files/attachments by mime
     new 15ac3da  TIKA-2835 upgrade to PDFBox 2.0.15

The 4378 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt                                        |   7 +
 .../org/apache/tika/io/BoundedInputStream.java     | 118 +++++++
 .../java/org/apache/tika/io/TikaInputStream.java   |  35 ++-
 .../tika/parser/digest/InputStreamDigester.java    | 103 +------
 .../src/test/java/org/apache/tika/TikaTest.java    |   2 +-
 .../src/main/resources/comparison-reports.xml      | 102 ++++++-
 tika-parsers/pom.xml                               |   2 +-
 .../org/apache/tika/parser/epub/EpubParser.java    |  12 +-
 .../tika/parser/iwork/IWorkPackageParser.java      |   2 +-
 .../parser/microsoft/POIFSContainerDetector.java   |  22 +-
 .../microsoft/ooxml/OOXMLExtractorFactory.java     |  42 ++-
 .../parser/pkg/StreamingZipContainerDetector.java  | 222 ++++++++++++++
 .../tika/parser/pkg/ZipContainerDetector.java      | 340 +++++++--------------
 .../tika/parser/pkg/ZipContainerDetectorBase.java  | 162 ++++++++++
 .../org/apache/tika/parser/utils/ZipSalvager.java  |  75 ++---
 .../parser/microsoft/ooxml/TruncatedOOXMLTest.java |   1 +
 .../tika/parser/pkg/ZipContainerDetectorTest.java  | 179 ++++++++++-
 .../pkg/tika-config.xml}                           |  16 +-
 18 files changed, 1031 insertions(+), 411 deletions(-)
 create mode 100644 tika-core/src/main/java/org/apache/tika/io/BoundedInputStream.java
 create mode 100644 tika-parsers/src/main/java/org/apache/tika/parser/pkg/StreamingZipContainerDetector.java
 create mode 100644 tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetectorBase.java
 copy tika-parsers/src/test/resources/org/apache/tika/{config/TIKA-2273-non-detecting-params-bad-charset.xml => parser/pkg/tika-config.xml} (66%)
