Re: [scintilla] Quick question on SCI_ENCODEDFROMUTF8

Neil Hodgson Fri, 15 Jul 2005 19:16:37 -0700

Robert Roessler:

> IIRC the story on SCI_TARGETASUTF8, the "safe" length to allocate for
> the output buf is three times the length of the "target" (plus the
> NULL byte?)...


   Yes.

> Working with its "sibling" function SCI_ENCODEDFROMUTF8, is it "safe"
> (i.e., CAN NOT fail) to allocate the output buf @ the SAME size as the
> input utf8 buf?

   The only encoding I know of that could be a problem is EUC-JP, where
some characters have 3 byte representations, and some of those may be
only 2 bytes in UTF-8. Scintilla prefers ShiftJIS but EUC-JP may be
supported if the locale is set up for it. I am not aware of all the
tricks that can be played with locales in an application, so I have not
tried to tightly define what expansion could appear here. For SciTE
I'm happy with expecting 1:1 and handling fault reports if they occur.
For a reasonable degree of safety you could allocate the same
3 * length + 1 bytes as for SCI_TARGETASUTF8.

http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml
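
   As a rough sketch of that conservative allocation rule (three times
the input length, plus one byte for the terminator): the helper name
below is made up for illustration and is not part of Scintilla, and the
overflow check is an extra precaution rather than anything the API
requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Scintilla: a conservative size for the
   output buffer passed to SCI_TARGETASUTF8 or SCI_ENCODEDFROMUTF8.
   Returns 0 if 3 * input_len + 1 would not fit in a size_t. */
static size_t conservative_buf_size(size_t input_len)
{
    if (input_len > (SIZE_MAX - 1) / 3)
        return 0;                 /* would overflow; caller should give up */
    return 3 * input_len + 1;     /* 3 bytes per input byte, plus terminator */
}
```

   A 10 byte input would get a 31 byte buffer under this rule. It is a
safety margin, not a proven upper bound for every locale.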

> but do not have access to
> any value that has been set by SCI_SETLENGTHFORENCODE

   The intent was that SCI_SETLENGTHFORENCODE followed by
SCI_ENCODEDFROMUTF8 is really a single call: it is split in two only
because the Scintilla interface allows just two parameters per message.

   Neil

_______________________________________________
Scintilla-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
