[ https://issues.apache.org/jira/browse/ANY23-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Brende reassigned ANY23-554:
---------------------------------

    Assignee: Hans Brende

> Avoid using carriage return to detect windows-1252 charset if content type 
> has been identified from metadata
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: ANY23-554
>                 URL: https://issues.apache.org/jira/browse/ANY23-554
>             Project: Apache Any23
>          Issue Type: Task
>            Reporter: Peter Ansell
>            Assignee: Hans Brende
>            Priority: Major
>
> Two encoding detection tests are failing on Windows and Windows Subsystem for
> Linux because of a condition that overrides a meta tag with a heuristic. The
> heuristic is likely incorrect in its current form: carriage returns are
> present in many Windows-produced documents, and those documents may
> legitimately be encoded as ISO-8859-1.
> If someone has put an ISO-8859-1 meta tag in a document, we shouldn't let
> the presence of carriage return characters override it with an
> incompatible Windows-specific code page, windows-1252.
> The relevant code is:
> https://github.com/apache/any23/blob/any23-2.6/encoding/src/main/java/org/apache/any23/encoding/EncodingUtils.java#L62-L69
> The tests that are failing on Windows and WSL2 are:
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TikaEncodingDetectorTest.testISO8859HTML:58->assertEncoding:128 Unexpected encoding expected:<[ISO-8859-1]> but was:<[windows-1252]>
> [ERROR]   TikaEncodingDetectorTest.testISO8859XHTML:63->assertEncoding:128 Unexpected encoding expected:<[ISO-8859-1]> but was:<[windows-1252]>
> [INFO]
> [ERROR] Tests run: 12, Failures: 2, Errors: 0, Skipped: 0
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary for Apache Any23 2.6:
> [INFO]
> [INFO] Apache Any23 ....................................... SUCCESS [01:57 min]
> [INFO] Apache Any23 :: Base API ........................... SUCCESS [ 56.016 s]
> [INFO] Apache Any23 :: Test Resources ..................... SUCCESS [  1.068 s]
> [INFO] Apache Any23 :: CSV Utilities ...................... SUCCESS [  2.759 s]
> [INFO] Apache Any23 :: Mime Type Detection ................ SUCCESS [01:10 min]
> [INFO] Apache Any23 :: Encoding Detection ................. FAILURE [  4.160 s]
> [INFO] Apache Any23 :: Core ............................... SKIPPED
> [INFO] Apache Any23 :: CLI ................................ SKIPPED
> [INFO] ------------------------------------------------------------------------
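
A minimal sketch of the guard the issue asks for: only fall back to the carriage-return heuristic when no charset was declared in metadata. It does not reproduce the real `EncodingUtils` code linked above. The class name `CarriageReturnHeuristicSketch`, the `fromMetadata` flag, and the exact heuristic condition (ISO-8859-1 detected plus at least one CR byte) are assumptions for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the Any23 API.
public final class CarriageReturnHeuristicSketch {

    // Returns true if the raw bytes contain at least one carriage return (0x0D).
    public static boolean containsCarriageReturn(byte[] content) {
        for (byte b : content) {
            if (b == '\r') {
                return true;
            }
        }
        return false;
    }

    /**
     * Hypothetical decision function.
     *
     * @param detected     charset reported by the detector
     * @param fromMetadata assumed flag: true if the charset came from a meta tag
     *                     or a Content-Type header
     * @param content      raw document bytes
     */
    public static Charset choose(Charset detected, boolean fromMetadata, byte[] content) {
        // Apply the windows-1252 heuristic only when nothing declared a charset;
        // an explicit ISO-8859-1 declaration is left untouched.
        if (!fromMetadata
                && StandardCharsets.ISO_8859_1.equals(detected)
                && containsCarriageReturn(content)) {
            return Charset.forName("windows-1252");
        }
        return detected;
    }

    public static void main(String[] args) {
        byte[] crlfDoc = "<meta charset=\"ISO-8859-1\">\r\n<p>caf\u00e9</p>\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Declared in a meta tag: the declaration wins.
        System.out.println(choose(StandardCharsets.ISO_8859_1, true, crlfDoc));
        // Nothing declared: the heuristic still applies.
        System.out.println(choose(StandardCharsets.ISO_8859_1, false, crlfDoc));
    }
}
```

Running `main` prints `ISO-8859-1` and then `windows-1252`: the same CRLF-terminated bytes keep their declared charset, and the heuristic only kicks in when detection had no metadata to go on.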



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
