[api-issues] [Issue 58767] New - encoding flaw in dictionary entries - garbled special chars in documents

ms2 Thu, 01 Dec 2005 20:00:14 -0800

To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=58767
                  Issue #:|58767
                  Summary:|encoding flaw in dictionary entries - garbled special
                          |chars in documents
                Component:|api
                  Version:|OOo 2.0
                 Platform:|PC
                      URL:|
               OS/Version:|All
                   Status:|UNCONFIRMED
        Status whiteboard:|
                 Keywords:|
               Resolution:|
               Issue type:|DEFECT
                 Priority:|P3
             Subcomponent:|code
              Assigned to:|sw
              Reported by:|ms2






------- Additional comments from [EMAIL PROTECTED] Thu Dec  1 20:00:08 -0800 2005 -------
When dictionary entries are inserted into a Writer doc from SBASIC code,
the encoding is not respected: special chars (German umlauts and the like)
are not converted correctly.

This problem vanishes if the locale is set to de_DE.UTF-8 before starting
OO.o.

Tested with:
OO.o 1.1.3-de/FreeBSD
OO.o 1.1.5-de/Windows 98
OO.o 1.1.5-de/Windows 2000
OO.o 2.0 RC1-de/Windows 98

Every combination garbles the special chars; only on FreeBSD with the
locale set to de_DE.UTF-8 are the chars shown correctly.

A complete description, a diagnosis, and test code for reproducing the
problem follow below:

From:   Stephan Bergmann <[EMAIL PROTECTED]>
Reply-To:       dev@api.openoffice.org
To:     dev@api.openoffice.org
Subject:        Re: [api-dev] encoding flaw in dictionary entries
Date:   Wed, 30 Nov 2005 09:57:54 +0100
Newsgroups:     openoffice.api.dev

Marc Santhoff wrote:
> On Tuesday, 29.11.2005, at 09:56 +0100, Stephan Bergmann wrote:
> 
>>Marc Santhoff wrote:
>>
>>>On Monday, 28.11.2005, at 10:29 +0100, Stephan Bergmann wrote:
>>>
>>>
>>>>Marc Santhoff wrote:
>>>>
>>>>
>>>>>Hi,
>>>>>
>>>>>I'm using dictionaries from Basic code and noticed a problem. When the
>>>>>search word from a dictionary entry is inserted into a Writer doc, its
>>>>>special characters are not shown correctly.
>>>>>
>>>>>Try this in a German localized version:
>>>>>
>>>>>sub encError
>>>>>   dls = createUnoService("com.sun.star.linguistic2.DictionaryList")
>>>>>   dic = dls.getDictionaryByName("soffice.dic")
>>>>>   entries = dic.getEntries()
>>>>>   msgbox entries(16).getDictionaryWord()
>>>>>end sub
>>>>>
>>>>>In a German-language version of OO.o 1.1.x this should read
>>>>>"Bemaßungslinien", but the char "ß" is not converted correctly. This
>>>>>holds true for the German OO.o 2.0-RC1/Windows, too.
>>>>>
>>>>>Is this worth filing an issue, or is it pilot error?
>>>>
>>>>It sure sounds like an error (so please file an issue): 
>>>>XDictionaryEntry.getDictionaryWord returns a UNO string, which is 
>>>>Unicode, so no excuse to garble an "ß" (and Basic's msgbox command 
>>>>should also be fully Unicode...).
>>>
>>>
>>>Thanks for replying.
>>>
>>>I only thought I was missing some conversion function or the like
>>>because all umlauts are garbled too. They are shown as two chars in a
>>>Writer doc. And from the GUI everything works as expected ...
>>
>>You mean, adding text to a writer doc via some Basic code (where the 
>>text to be added is represented as a literal Basic string) leads to 
>>garbled characters?  That's strange.  Maybe Andreas Bregas knows whether 
>>there is some part of Basic or the Basic IDE that works with 
>>locale-dependent text encodings instead of Unicode?
> 
> 
> Yes, that's what I wanted to say.
> 
> Another test for the German localized OO.o:
> 
> sub encError2
>       BasicLibraries.LoadLibrary("Tools")
>       dls = createUnoService("com.sun.star.linguistic2.DictionaryList")
>       dic = dls.getDictionaryByName("soffice.dic")
>       entries = dic.getEntries()
>       tmpDoc = CreateNewDocument("swriter")
>       csr = tmpDoc.Text.createTextCursor()
>       tmpDoc.Text.string = entries(16).getDictionaryWord() ' "ß"
>       tEnd = tmpDoc.Text.getEnd()
>       tEnd.String = entries(46).getDictionaryWord() ' "ö"
> end sub
> 
> This does garble the special chars, too.
> 
> Regards,
> Marc

Two things I noticed when trying to reproduce this:

1  You must be using a non-UTF-8 locale (probably ISO 8859-1); check the 
environment variable LANG.  If you set LANG to something like 
"de_DE.UTF-8" the problem should go away.

2  If you modify the Basic script by adding

     tEnd = tmpDoc.Text.getEnd()
     tEnd.String = "äöü"
   end sub

to the end, you see that Basic is not the culprit, as the umlauts show 
up correctly in the writer doc, regardless of LANG setting.
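
For point 1, a quick way to see which codeset a LANG value selects is to 
parse it.  The following Python sketch is only an illustration: the helper 
names codeset_from_lang and is_utf8_locale are made up here, and it looks 
at LANG alone, although LC_ALL and LC_CTYPE take precedence when they are 
set.

```python
import os


def codeset_from_lang(lang):
    """Return the codeset part of a POSIX locale name, or None if absent.

    Hypothetical helper: "de_DE.UTF-8" -> "UTF-8", "de_DE" -> None.
    """
    if "." not in lang:
        return None
    codeset = lang.split(".", 1)[1]
    # Drop an optional modifier such as "@euro".
    return codeset.split("@", 1)[0]


def is_utf8_locale(lang):
    """True if the locale name explicitly selects UTF-8 (any spelling)."""
    codeset = codeset_from_lang(lang)
    return codeset is not None and codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    lang = os.environ.get("LANG", "")
    print("LANG=%r selects UTF-8: %s" % (lang, is_utf8_locale(lang)))
```

If this reports False for the environment OOo is started from, the garbling 
described in this thread would be expected.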

I suspect that the OOo dictionary implementation erroneously uses 
osl_getThreadTextEncoding() (which depends on LANG) to translate the 
(obviously UTF-8 encoded) strings within the dictionary data base to 
Unicode.  Please update the issue (did you already write one?) accordingly.
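
The suspected mechanism can be imitated outside OOo.  The Python sketch 
below is not the OOo code; it assumes, as suspected above, that the 
dictionary stores its words as UTF-8 bytes, and it uses ISO 8859-1 as a 
stand-in for a LANG-dependent thread encoding.  The name decode_entry is 
made up for the illustration.

```python
def decode_entry(raw, encoding):
    """Decode the raw bytes of a dictionary word with the given encoding.

    Hypothetical stand-in for the byte-to-Unicode conversion that the
    dictionary code is suspected to perform.
    """
    return raw.decode(encoding)


# Assumption: the word is stored as UTF-8 in the dictionary file.
stored = "Bemaßungslinien".encode("utf-8")

# Decoding with the encoding the data was written in restores the word.
good = decode_entry(stored, "utf-8")

# Decoding with a locale-dependent 8-bit encoding turns each of the two
# UTF-8 bytes of "ß" into a separate character.
bad = decode_entry(stored, "iso-8859-1")

if __name__ == "__main__":
    print(len(good), len(bad))
    print(repr(decode_entry("ö".encode("utf-8"), "iso-8859-1")))
```

The wrongly decoded word is one character longer than the original, which 
matches the report that each umlaut shows up as two chars in the writer 
doc.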

-Stephan

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

