[ https://issues.apache.org/jira/browse/ANY23-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469136#comment-17469136 ]
Hans Brende commented on ANY23-554:
-----------------------------------

All that being said, sounds like there is a problem with those tests... I will investigate further once I get a chance.

> Avoid using carriage return to detect windows-1252 charset if content type has been identified from metadata
> ------------------------------------------------------------------------------------------------------------
>
> Key: ANY23-554
> URL: https://issues.apache.org/jira/browse/ANY23-554
> Project: Apache Any23
> Issue Type: Task
> Reporter: Peter Ansell
> Priority: Major
>
> Two encoding detection tests are failing on Windows and Windows Subsystem for Linux due to a condition that overrides a meta tag with a heuristic, which is not likely correct in its current form as carriage returns are present in many different Windows produced documents, which may legitimately follow ISO-8859-1.
> If someone has put a meta tag in with ISO-8859-1, we shouldn't be using the presence of carriage return characters overriding that with an incompatible windows specific codepage, windows-1252.
> The relevant code is:
> https://github.com/apache/any23/blob/any23-2.6/encoding/src/main/java/org/apache/any23/encoding/EncodingUtils.java#L62-L69
> The tests that are failing on Windows and WSL2 are:
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR] TikaEncodingDetectorTest.testISO8859HTML:58->assertEncoding:128 Unexpected encoding expected:<[ISO-8859-1]> but was:<[windows-1252]>
> [ERROR] TikaEncodingDetectorTest.testISO8859XHTML:63->assertEncoding:128 Unexpected encoding expected:<[ISO-8859-1]> but was:<[windows-1252]>
> [INFO]
> [ERROR] Tests run: 12, Failures: 2, Errors: 0, Skipped: 0
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary for Apache Any23 2.6:
> [INFO]
> [INFO] Apache Any23 ....................................... SUCCESS [01:57 min]
> [INFO] Apache Any23 :: Base API ........................... SUCCESS [ 56.016 s]
> [INFO] Apache Any23 :: Test Resources ..................... SUCCESS [ 1.068 s]
> [INFO] Apache Any23 :: CSV Utilities ...................... SUCCESS [ 2.759 s]
> [INFO] Apache Any23 :: Mime Type Detection ................ SUCCESS [01:10 min]
> [INFO] Apache Any23 :: Encoding Detection ................. FAILURE [ 4.160 s]
> [INFO] Apache Any23 :: Core ............................... SKIPPED
> [INFO] Apache Any23 :: CLI ................................ SKIPPED
> [INFO] ------------------------------------------------------------------------

--
This message was sent by Atlassian Jira (v8.20.1#820001)
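
For illustration only, the behaviour requested in the issue description could be sketched roughly as below. This is not the code in EncodingUtils.java: the class name DeclaredCharsetPolicy, both method names, and the fallback used when no charset is declared are assumptions made for this sketch. The only point it tries to show is that a charset declared in metadata (for example an ISO-8859-1 meta tag) is returned unchanged, and the carriage-return heuristic is consulted only when nothing was declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Apache Any23.
final class DeclaredCharsetPolicy {

    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    private DeclaredCharsetPolicy() {
    }

    // True if the raw bytes contain at least one carriage return (0x0D).
    static boolean containsCarriageReturn(byte[] content) {
        for (byte b : content) {
            if (b == '\r') {
                return true;
            }
        }
        return false;
    }

    // declared: charset taken from metadata such as a meta tag, or null if absent.
    static Charset choose(Charset declared, byte[] content) {
        if (declared != null) {
            // An explicit declaration wins; carriage returns are ignored here.
            return declared;
        }
        // Assumed fallback for this sketch: apply the carriage-return
        // heuristic only when no charset was declared.
        return containsCarriageReturn(content)
                ? WINDOWS_1252
                : StandardCharsets.ISO_8859_1;
    }
}
```

Under these assumptions, a document with CRLF line endings and a declared ISO-8859-1 charset stays ISO-8859-1, which is the outcome the two failing tests expect.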