To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=58767

Issue #:          |58767
Summary:          |encoding flaw in dictionary entries - garbled special
                  |chars in documents
Component:        |api
Version:          |OOo 2.0
Platform:         |PC
URL:              |
OS/Version:       |All
Status:           |UNCONFIRMED
Status whiteboard:|
Keywords:         |
Resolution:       |
Issue type:       |DEFECT
Priority:         |P3
Subcomponent:     |code
Assigned to:      |sw
Reported by:      |ms2

------- Additional comments from [EMAIL PROTECTED] Thu Dec 1 20:00:08 -0800 2005 -------
When dictionary entries are inserted into a writer doc from SBASIC code,
the encoding is not respected: special chars (German umlauts and the
like) are not converted correctly. The problem vanishes if the locale is
set to de_DE.UTF-8 before starting OO.o.

Tested with:

OO.o 1.1.3-de/FreeBSD
OO.o 1.1.5-de/Windows 98
OO.o 1.1.5-de/Windows 2000
OO.o 2.0 RC1-de/Windows 98

Every combination garbles special chars; only on FreeBSD with the locale
set to de_DE.UTF-8 are the chars shown correctly.

A complete description, diagnosis, and test code for reproducing the
problem follow:

From: Stephan Bergmann <[EMAIL PROTECTED]>
Reply-To: dev@api.openoffice.org
To: dev@api.openoffice.org
Subject: Re: [api-dev] encoding flaw in dictionary entries
Date: Wed, 30 Nov 2005 09:57:54 +0100
Newsgroups: openoffice.api.dev

Marc Santhoff wrote:
> On Tuesday, 29.11.2005, 09:56 +0100, Stephan Bergmann wrote:
>
>>Marc Santhoff wrote:
>>
>>>On Monday, 28.11.2005, 10:29 +0100, Stephan Bergmann wrote:
>>>
>>>
>>>>Marc Santhoff wrote:
>>>>
>>>>
>>>>>Hi,
>>>>>
>>>>>I'm using dictionaries from Basic code and noticed a problem. When the
>>>>>search word from a dictionary entry is inserted into a writer doc the
>>>>>encoding is not shown correctly.
>>>>>
>>>>>Try this in a German localized version:
>>>>>
>>>>>sub encError
>>>>>  dls = createUnoService("com.sun.star.linguistic2.DictionaryList")
>>>>>  dic = dls.getDictionaryByName("soffice.dic")
>>>>>  entries = dic.getEntries()
>>>>>  msgbox entries(16).getDictionaryWord()
>>>>>end sub
>>>>>
>>>>>In a German language version of OO.o 1.1.x this should read
>>>>>"Bemaßungslinien" but the char "ß" is not converted correctly. This
>>>>>holds true for the German OO.o2.0-RC1/Windows, too.
>>>>>
>>>>>Is this worth filing an issue or is it a pilot error?
>>>>
>>>>It sure sounds like an error (so please file an issue):
>>>>XDictionaryEntry.getDictionaryWord returns a UNO string, which is
>>>>Unicode, so no excuse to garble an "ß" (and Basic's msgbox command
>>>>should also be fully Unicode...).
>>>
>>>
>>>Thanks for replying.
>>>
>>>I only thought I was missing some conversion function or the like
>>>because all umlauts are garbled too. They are shown as two chars in a
>>>writer doc. And from the GUI everything works as expected ...
>>
>>You mean, adding text to a writer doc via some Basic code (where the
>>text to be added is represented as a literal Basic string) leads to
>>garbled characters? That's strange. Maybe Andreas Bregas knows whether
>>there is some part of Basic or the Basic IDE that works with
>>locale-dependent text encodings instead of Unicode?
>
>
> Yes, that's what I wanted to say.
>
> Another test for the German localized OO.o:
>
> sub encError2
>   BasicLibraries.LoadLibrary("Tools")
>   dls = createUnoService("com.sun.star.linguistic2.DictionaryList")
>   dic = dls.getDictionaryByName("soffice.dic")
>   entries = dic.getEntries()
>   tmpDoc = CreateNewDocument("swriter")
>   csr = tmpDoc.Text.createTextCursor()
>   tmpDoc.Text.string = entries(16).getDictionaryWord() ' "ß"
>   tEnd = tmpDoc.Text.getEnd()
>   tEnd.String = entries(46).getDictionaryWord() ' "ö"
> end sub
>
> This does garble the special chars, too.
>
> Regards,
> Marc

Two things I noticed when trying to reproduce this:

1  You must be using a non-UTF-8 locale (probably ISO 8859-1); check the
environment variable LANG. If you set LANG to something like
"de_DE.UTF-8" the problem should go away.

2  If you modify the Basic script by adding

  tEnd = tmpDoc.Text.getEnd()
  tEnd.String = "äöü"
end sub

to the end, you see that Basic is not the culprit, as the umlauts show
up correctly in the writer doc, regardless of the LANG setting.

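Both observations fit a simple picture: UTF-8 bytes being decoded with a
single-byte, locale-dependent encoding. The following Python sketch only
illustrates that effect and is not OOo code; the names WORD and RAW, and
the choice of ISO 8859-1 and Windows-1252 as the "wrong" encodings, are
assumptions made for the example.

```python
# Illustration only: what happens when UTF-8 bytes are decoded with a
# locale-dependent single-byte encoding instead of UTF-8.
# WORD/RAW and the chosen "wrong" encodings are assumptions for this sketch.

WORD = "Bemaßungslinien"          # the dictionary word from the report
RAW = WORD.encode("utf-8")        # assumed on-disk form: UTF-8 bytes

# Correct: decode the bytes with the encoding they were written in.
as_utf8 = RAW.decode("utf-8")

# Wrong: decode with a single-byte encoding, as a non-UTF-8 locale would.
as_latin1 = RAW.decode("iso-8859-1")   # e.g. LANG=de_DE.ISO8859-1
as_cp1252 = RAW.decode("cp1252")       # e.g. a German Windows code page

# "ß" is two bytes in UTF-8 (0xC3 0x9F), so each wrong decoding turns it
# into two characters and the string grows by one.
print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
print(repr(as_cp1252), len(as_cp1252))

# The same happens to every umlaut: three characters become six.
umlauts_garbled = "äöü".encode("utf-8").decode("iso-8859-1")
print(repr(umlauts_garbled), len(umlauts_garbled))
```

With a UTF-8 locale the "locale encoding" and the data encoding coincide,
which would explain why the problem goes away when LANG is set to
"de_DE.UTF-8".
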
I suspect that the OOo dictionary implementation erroneously uses
osl_getThreadTextEncoding() (which depends on LANG) to translate the
(obviously UTF-8 encoded) strings within the dictionary database to
Unicode. Please update the issue (did you already write one?) accordingly.

-Stephan

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]