https://bz.apache.org/ooo/show_bug.cgi?id=126863
Issue ID: 126863
Issue Type: DEFECT
Summary: en_AU.dic has UTF-8 errors
Product: General
Version: 4.2.0-dev
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: Normal
Priority: P5 (lowest)
Component: spell checking
Assignee: [email protected]
Reporter: [email protected]
Regarding the en_AU.dic extension for Australian spelling: a number of
spellings have been corrupted. This appears to have happened through incorrect
conversion to or from UTF-8 while new words were being added or the file was
being edited in 2008, and the errors persist in the current version of
en_AU.dic. I would fix these errors myself, but surely there is a maintainer
who should be contacted about this issue? Has the same thing happened with
other dictionaries?
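The damage is attributed above to an incorrect UTF-8 conversion. The Python sketch below reproduces one plausible path, assuming the original file stored accented letters as single ISO-8859-1 bytes that were later decoded as UTF-8 with replacement enabled. The report does not say which tool or editing step actually did this. The sketch also shows why the damage cannot be undone automatically: every lost letter becomes the same U+FFFD.

```python
# Sketch (assumption: the words were ISO-8859-1 bytes wrongly decoded as
# UTF-8). Shows how an accented letter turns into U+FFFD and stays lost.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, displayed as �


def corrupt(word: str) -> str:
    """Encode as ISO-8859-1, then (wrongly) decode as UTF-8.

    é is the single byte 0xE9 in ISO-8859-1. That byte is not valid UTF-8
    on its own, so errors="replace" substitutes U+FFFD for it.
    """
    return word.encode("iso-8859-1").decode("utf-8", errors="replace")


for original in ("cliché", "piñata"):
    damaged = corrupt(original)
    print(f"{original} -> {damaged}")
    # Saving the result as UTF-8 stores U+FFFD as the bytes EF BF BD;
    # nothing in those bytes records which letter was there before.
    print("  stored as:", damaged.encode("utf-8"))
```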
I see two options: delete all entries containing characters that are not used
in Australian English, or replace the bad characters with the correct ones.
Note that the character � signals an error, not a particular character, so the
same symbol stands for different letters in variants such as pi�ata (should be
piñata) and clich� (should be cliché).
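Both options can be sketched in Python, assuming Hunspell-style word/FLAGS entries held as a list of strings (the word-count header on the first line of a .dic file is left out). The names `drop_damaged`, `repair_damaged` and `FIXES` are hypothetical. Because � does not identify the missing letter, the repair table has to be filled in by hand.

```python
# Sketch of the two clean-up options. FIXES is a hypothetical,
# hand-maintained table: U+FFFD does not say which letter was lost,
# so each repair has to be decided by a person.

REPLACEMENT = "\ufffd"

# Hypothetical example repairs; a real table would cover every damaged stem.
FIXES = {
    "clich\ufffd": "cliché",
    "pi\ufffdata": "piñata",
}


def drop_damaged(entries: list[str]) -> list[str]:
    """Option 1: delete every entry that contains U+FFFD."""
    return [e for e in entries if REPLACEMENT not in e]


def repair_damaged(entries: list[str]) -> tuple[list[str], list[str]]:
    """Option 2: replace damaged stems using FIXES.

    Returns (kept or repaired entries, damaged entries with no fix yet).
    Affix flags after the '/' are carried over unchanged.
    """
    repaired, unresolved = [], []
    for entry in entries:
        stem, sep, flags = entry.partition("/")
        if REPLACEMENT not in stem:
            repaired.append(entry)
        elif stem in FIXES:
            repaired.append(FIXES[stem] + sep + flags)
        else:
            unresolved.append(entry)
    return repaired, unresolved
```

Repairing can produce duplicate stems: version 2008.10.3 lists both cliché/MS and clich�/SM, so `repair_damaged` would yield cliché/MS and cliché/SM, and whether to merge their flags is a separate decision.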
I traced this through various versions of en_AU.dic:
http://extensions.services.openoffice.org/en/project/AustralianDictionary
Below is an analysis of the version and line numbers of two words as they
changed over time. This problem is rife in the newest version of en_AU.dic,
with at least 211 occurrences of the ¿ character, which indicates a failed
conversion.
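Counts like this can be reproduced with a small scanner. This Python sketch assumes the damage is stored as the UTF-8 bytes of U+FFFD (EF BF BD) and searches raw bytes, so detection does not depend on the encoding the dictionary declares. If the ¿ count reflects some other byte pattern, `BAD` would need to change. `find_damaged` is a hypothetical helper; it accepts several files so that downloaded versions can be compared.

```python
# Sketch: list every line of one or more .dic files that contains the
# UTF-8 bytes of U+FFFD. Assumption: this is how the damage is stored.
import sys

BAD = "\ufffd".encode("utf-8")  # b"\xef\xbf\xbd"


def find_damaged(data: bytes) -> list[tuple[int, str]]:
    """Return (1-based line number, decoded line) for each damaged line.

    Matching is done on raw bytes, so the file's declared encoding
    (UTF-8 or ISO-8859-1) does not matter for detection.
    """
    hits = []
    for number, raw in enumerate(data.splitlines(), start=1):
        if BAD in raw:
            # Decode for display only; other undecodable bytes also become U+FFFD.
            hits.append((number, raw.decode("utf-8", errors="replace")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:  # e.g. several downloaded versions
        with open(path, "rb") as f:
            damaged = find_damaged(f.read())
        print(f"== {path}: {len(damaged)} damaged line(s)")
        for number, text in damaged:
            print(f"{number}: {text}")
```

Line numbers count the word-count header as line 1, so they refer to positions in the file itself.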
The word cliché, for example, is represented differently across versions.
Many words containing the � character in en_AU.dic never appeared correctly,
but cliché was originally correct and was corrupted in a later version.
Version 2016.03.01 (newest)
  line 1700: clich�/SM
Version 2010.03.16
  line 1700: clich�/SM
Version 2008.11.25
  line 1700: clich�/SM
Version 2008.10.3
  line 1702: cliché/MS
  line 1703: clich�/SM
Version 1.0.0
  line 1523: cliché/MS
With reference to files at:
http://extensions.services.openoffice.org/en/project/english-dictionaries-apache-openoffice
http://extensions.services.openoffice.org/en/project/AustralianDictionary