[chromium-dev] Re: UTF8

Peter Kasting Thu, 15 Jan 2009 13:04:07 -0800

On Thu, Jan 15, 2009 at 10:55 AM, Dean McNamee <de...@chromium.org> wrote:


> I think a good example is chrome_constants.cc, I don't see why any of
> these should have to be wide.  Some of them may make their way into
> filenames, etc, in which case they could be easily converted or be
> handled directly by FilePath methods, etc.
>
> Another good example would be chrome_switches, these will always be ASCII.


In principle I agree that things that will always be ASCII should just be
stored as strings, not wstrings.  The caveat is that if they can be supplied
as arguments to functions which also take wide strings from elsewhere, they
introduce conversions somewhere, which usually doesn't matter from an
efficiency perspective, but makes code uglier and makes it trickier to know
what representation a given API really ought to use.
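To make the conversion cost concrete, here is a minimal sketch of a narrow ASCII constant meeting an API that takes wide strings.  Every name in it (AsciiToWideSketch, AppendSwitchSketch, kExampleSwitch) is hypothetical and chosen for illustration; none is taken from the Chromium tree.

```cpp
#include <cassert>
#include <string>

// Hypothetical constant, stored narrow because it will always be ASCII.
const char kExampleSwitch[] = "example-switch";

// Hypothetical stand-in for a conversion helper. Only valid for 7-bit ASCII
// input, where each byte maps directly to one wide character.
std::wstring AsciiToWideSketch(const std::string& ascii) {
  std::wstring wide;
  wide.reserve(ascii.size());
  for (unsigned char c : ascii) {
    assert(c < 0x80);  // Anything else would need a real UTF-8 decoder.
    wide.push_back(static_cast<wchar_t>(c));
  }
  return wide;
}

// Hypothetical API that also receives wide strings from elsewhere, so its
// parameters are wide and narrow callers must convert at the call site.
std::wstring AppendSwitchSketch(const std::wstring& command_line,
                                const std::wstring& switch_name) {
  return command_line + L" --" + switch_name;
}
```

A caller then writes AppendSwitchSketch(L"chrome.exe", AsciiToWideSketch(kExampleSwitch)).  The conversion is cheap, but it shows up at every call site that mixes the two representations.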

String representations spread in the same way const does, so the consistency
argument is a pretty strong argument IMO, even though I completely agree
with you that it can waste memory.  OTOH, last time we discussed this we
figured most string usage was in the renderer, and the savings in the
browser would be fairly minimal.  For example, changing those constants and
switches you mention above barely saves anything.

> So far I just mentioned constants, but I think this also applies to a
> lot of other parts to our code, and it makes sense to shift to UTF8 in
> a lot of our internal representations.


I think in speed-critical bits we're currently trying to use whatever
results in the fewest conversions, which makes some sense to me.  (Like, I
thought some stuff came out of the renderer as UTF-16.)


> Just wanted to solicit thoughts, and make sure there is some sort of
> agreement and support if we start trying to UTF8 some pieces of
> Chrome.


I'm not opposed so much as not-terribly-convinced we'll get wins out of it.
If we do, the change seems worth it.

My biggest concern after memory usage (where a significant savings would
make me strongly support this) is code readability.  Using different string
types and conversions in the Gecko codebase almost made me kill myself.
We're not nearly so bad, but some of the recent changes I've reviewed to
convert some wstrings to strings and insert UTF8ToWide() calls have made
code harder to read and suggested that larger refactorings of APIs and
members (which lead to even more refactorings, chain-reaction-style) would
be better.

> Also, when this has come up before, I've heard the argument that this
> means strings are no longer directly indexable (ie blah[3] gets you
> the 4th character).  Well, this isn't true for wchar_t on Windows
> either.  Since it is UTF16, it can have surrogate pairs (Unicode is
> current defined for something like 20-21 bits?).


I thought our wstrings were frequently UCS-2 instead of UTF-16, so this
property _does_ hold?

Your point is well-taken, though, and I agree with everything you say
afterwards.
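For reference, a small sketch of the indexing question, assuming well-formed UTF-16 held in a std::u16string (the helper names are made up for illustration).  If a string only ever contains BMP characters, which is all UCS-2 can represent, code units and characters line up and blah[3] really is the 4th character.  A single character outside the BMP breaks that.

```cpp
#include <cstddef>
#include <string>

// True if the code unit is either half of a surrogate pair.
bool IsSurrogateSketch(char16_t unit) {
  return unit >= 0xD800 && unit <= 0xDFFF;
}

// Counts Unicode code points, assuming well-formed UTF-16. A low surrogate
// is always the second half of a pair, so it is skipped rather than counted.
std::size_t CountCodePointsSketch(const std::u16string& s) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    const char16_t unit = s[i];
    if (unit >= 0xDC00 && unit <= 0xDFFF)
      continue;
    ++count;
  }
  return count;
}
```

With std::u16string blah = u"a\U0001D11Eb", blah.size() is 4 but the string holds only 3 characters: U+1D11E is encoded as the pair 0xD834 0xDD1E.  blah[3] is 'b', which is the 3rd character, and blah[1] is half of a surrogate pair rather than a character at all.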

PK

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
