https://bz.apache.org/ooo/show_bug.cgi?id=126863
Issue ID: 126863
Issue Type: DEFECT
Summary: en_AU.dic has UTF-8 errors
Product: General
Version: 4.2.0-dev
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: Normal
Priority: P5 (lowest)
Component: spell checking
Assignee: issues@openoffice.apache.org
Reporter: i...@yahoo.com

A number of spellings in the en_AU.dic extension for Australian spelling are corrupted. This appears to have been caused by incorrect conversion to or from UTF-8 when new words were added or the file was edited in 2008, and the errors persist in the current version of en_AU.dic.

I would fix these errors myself, but surely there is a maintainer to contact about this issue? Has it occurred with other dictionaries?

I see two options: delete all entries containing characters that are not Australian English, or change all the bad characters to good ones. Note that the character � indicates an error, not a particular character. In other words, we see variants such as pi�ata (should be piñata) and clich� (should be cliché).

I tracked this down through various versions of en_AU.dic:
http://extensions.services.openoffice.org/en/project/AustralianDictionary

Here is some analysis of version and line numbers of 2 words as they changed over time. This problem is rife in the newest version of en_AU.dic, with at least 211 occurrences of the ¿ character, which indicates a failed conversion. The word cliché, for example, is misrepresented in different ways over time. Note that many words with the � character in the en_AU.dic file never appeared correctly, whereas the word cliché was originally correct and was corrupted later.

Version: 2016.03.01 (Newest)
1700: clich�/SM

Version: 2010.03.16
1700: clich�/SM

Version: 2008.11.25
1700: clich�/SM

Version: 2008.10.3
1702: cliché/MS
1703: clich�/SM

Version: 1.0.0
1523: cliché/MS

With reference to files at:
http://extensions.services.openoffice.org/en/project/english-dictionaries-apache-openoffice
http://extensions.services.openoffice.org/en/project/AustralianDictionary
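
To make the failure mode and the per-line analysis above easier to reproduce, here is a minimal Python sketch. It rests on an assumption the report does not confirm: that the corruption came from Latin-1 encoded bytes being read as if they were UTF-8. The function names and the sample entries are hypothetical and exist only for illustration; this is not the dictionary maintainers' tooling.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, displayed as � in the report


def simulate_bad_decode(word: str) -> str:
    """Assumed mechanism: Latin-1 bytes decoded as if they were UTF-8.

    A lone byte such as 0xE9 (é in Latin-1) is not valid UTF-8 here,
    so the decoder substitutes U+FFFD for it.
    """
    return word.encode("latin-1").decode("utf-8", errors="replace")


def find_corrupted(lines):
    """Return (line_number, entry) pairs for entries containing U+FFFD."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if REPLACEMENT in line
    ]


# Hypothetical three-line sample standing in for a .dic file.
sample = [
    "cliché/MS\n",
    simulate_bad_decode("cliché/SM") + "\n",
    "cat/SM\n",
]
corrupted = find_corrupted(sample)
print(corrupted)
```

To scan a real copy of the dictionary, the lines could come from `open(path, encoding="utf-8", errors="replace")`, where `path` is wherever the downloaded en_AU.dic was saved. With that setting, bytes that are invalid on disk are flagged as well as any literal U+FFFD already stored in the file, so the count covers both kinds of damage.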