https://bz.apache.org/ooo/show_bug.cgi?id=126863

          Issue ID: 126863
        Issue Type: DEFECT
           Summary: en_AU.dic has UTF-8 errors
           Product: General
           Version: 4.2.0-dev
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P5 (lowest)
         Component: spell checking
          Assignee: issues@openoffice.apache.org
          Reporter: i...@yahoo.com

Regarding the en_AU.dic extension for Australian spelling: a number of
spellings have been corrupted. This appears to have happened because of an
incorrect conversion to or from UTF-8, either while new words were being added
or during editing in 2008, and the errors persist in the current version of
en_AU.dic. I would fix these errors myself, but surely there is a maintainer to
contact about this issue? Has the same thing happened with other dictionaries?

I see two options: delete all entries containing characters that are not used
in Australian English, or replace all of the bad characters with the correct
ones. Note that the character � (the Unicode replacement character, U+FFFD)
signals an error, not a particular letter. In other words, we see variants such
as pi�ata (should be piñata) and clich� (should be cliché).
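Both options can be sketched in a few lines of Python. This assumes each entry
has the usual Hunspell `word/FLAGS` form and that the first line of the file
holds the entry count. `REPAIRS`, `repair_entry` and `repair_dic` are
hypothetical names, and the repair table holds only the two examples from this
report; a real table would have to be filled in by hand, word by word.

```python
from typing import List, Optional

REPLACEMENT = "\ufffd"

# Hypothetical repair table: "\ufffd" does not say which letter was lost,
# so every entry here has to be checked by a person before it is added.
REPAIRS = {
    "clich" + REPLACEMENT: "cliché",
    "pi" + REPLACEMENT + "ata": "piñata",
}


def repair_entry(line: str, drop_unknown: bool = True) -> Optional[str]:
    """Fix one .dic entry of the form "word/FLAGS"; return None to delete it."""
    word, sep, rest = line.partition("/")
    if REPLACEMENT not in word:
        return line  # clean entry, keep unchanged
    if word in REPAIRS:
        return REPAIRS[word] + sep + rest  # option 2: substitute the correct spelling
    return None if drop_unknown else line  # option 1: delete (or keep for manual review)


def repair_dic(lines: List[str], drop_unknown: bool = True) -> List[str]:
    """Repair every entry; line 1 holds the entry count, so it is recomputed."""
    entries = []
    for line in lines[1:]:
        fixed = repair_entry(line, drop_unknown)
        if fixed is not None:
            entries.append(fixed)
    return [str(len(entries))] + entries
```

Setting `drop_unknown=False` keeps unrepaired entries in place, which is useful
for a first pass where the corrupted words still need to be reviewed.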

I traced the problem through various versions of en_AU.dic:
http://extensions.services.openoffice.org/en/project/AustralianDictionary

Below is an analysis, by version and line number, of two words as they changed
over time. The problem is rife in the newest version of en_AU.dic, with at
least 211 occurrences of the ¿ character, which indicates a failed conversion.
The word cliché, for example, has been misrepresented in different ways over
time. Note that many words containing the � character in en_AU.dic never
appeared correctly in any version; cliché, by contrast, was originally correct
and was corrupted over time.
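To reproduce the count and the line numbers for any version, a short scan over
the raw bytes is enough. The sketch below (hypothetical helper names) treats a
line as corrupt if it contains U+FFFD or bytes that are not valid UTF-8. One
possible reason the count above is of ¿ rather than �, offered only as a guess:
U+FFFD is stored in UTF-8 as the bytes EF BF BD, and a viewer that reads the
file as Latin-1 shows those three bytes as "ï¿½".

```python
from typing import List, Tuple

REPLACEMENT = "\ufffd"


def find_corrupt_lines(data: bytes) -> List[Tuple[int, str]]:
    """Return (line number, text) for each line containing U+FFFD or invalid UTF-8."""
    hits = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        # Stored U+FFFD characters survive decoding; invalid bytes become U+FFFD.
        text = raw.decode("utf-8", errors="replace")
        if REPLACEMENT in text:
            hits.append((lineno, text))
    return hits


def count_replacements(data: bytes) -> int:
    """Count replacement characters in the whole file after the same decoding."""
    return data.decode("utf-8", errors="replace").count(REPLACEMENT)
```

Running `find_corrupt_lines(Path("en_AU.dic").read_bytes())` (with
`from pathlib import Path`) against each downloaded version lists the affected
lines in the same form as the per-version line numbers in this report.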

Version: 2016.03.01 (newest)
1700: clich�/SM

Version: 2010.03.16
1700: clich�/SM

Version: 2008.11.25
1700: clich�/SM

Version: 2008.10.3
1702: cliché/MS
1703: clich�/SM

Version: 1.0.0
1523: cliché/MS
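In version 2008.10.3 the corrupted entry sits next to an intact one (cliché/MS
on line 1702, clich�/SM on line 1703), so there the corrupted line is simply a
duplicate. One way to triage the remaining entries, sketched here under the
assumption that each � replaced exactly one letter, is to treat � as a
single-character wildcard and look for a clean entry it could stand for: a
corrupted word with a clean counterpart can likely be deleted, while one
without a counterpart needs a manual repair. `find_clean_counterparts` is a
hypothetical name.

```python
import re
from typing import Dict, List

REPLACEMENT = "\ufffd"


def find_clean_counterparts(entries: List[str]) -> Dict[str, List[str]]:
    """Map each corrupted word to the clean words it could be a copy of."""
    words = [entry.split("/", 1)[0] for entry in entries]  # drop the /FLAGS part
    clean = [w for w in words if REPLACEMENT not in w]
    result = {}
    for w in words:
        if REPLACEMENT in w:
            # Escape the known parts and let each U+FFFD match any single character.
            pattern = re.compile(".".join(re.escape(part) for part in w.split(REPLACEMENT)))
            result[w] = [c for c in clean if pattern.fullmatch(c)]
    return result
```

For the 2008.10.3 entries above, `clich\ufffd` maps to `["cliché"]`, marking it
as a duplicate rather than a word that needs repairing.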

With reference to files at:
http://extensions.services.openoffice.org/en/project/english-dictionaries-apache-openoffice
http://extensions.services.openoffice.org/en/project/AustralianDictionary
