This is an automated email from the ASF dual-hosted git repository.
tallison pushed a change to branch TIKA-4671-lang-aware-charset-detection
in repository https://gitbox.apache.org/repos/asf/tika.git
from dd5681316e TIKA-4671 - tweaks
add 4266e9910a TIKA-4671 - tweaks, take 3
No new revisions were added by this update.
Summary of changes:
 .../tika/langdetect/charsoup/CharSoupModel.java  |  18 ++-
 .../charsoup/CharSoupLanguageDetector.java       | 109 ++++++++++++++--
 .../charsoup/CharSoupEncodingDetectorTest.java   |  23 ++++
 .../langdetect/charsoup/TextQualityDiagTest.java | 141 +++++++++++++++++++++
 4 files changed, 277 insertions(+), 14 deletions(-)
 create mode 100644 tika-langdetect/tika-langdetect-charsoup/src/test/java/org/apache/tika/langdetect/charsoup/TextQualityDiagTest.java