[chromium-dev] Re: using string16

2009-02-03 Thread Linus Upson

An angel loses its wings for each 00 byte in UTF-16. Is 'host'
measured in base-2 or base-10?

Linus


On Tue, Feb 3, 2009 at 6:11 PM, Evan Martin  wrote:
>
> [A bunch of the team met up today to hammer out some decisions.]
>
> In brief: for strings that are known to be Unicode (that is, not
> random byte strings read from a file), we will migrate towards using
> string16.  This means all places we use wstring should be split into
> the appropriate types:
>  - byte strings should be string or vectors of chars
>  - paths should be FilePath
>  - urls should be GURL
>  - UI strings, etc. should be string16.
>
> string16 uses UTF-16 underneath.  It's equivalent to wstring on
> Windows, but wstring involves 4-byte characters on Linux/Mac.
>
> Some important factors were:
> - we don't have too many strings in this category (with the huge
> exception of WebKit), so memory usage isn't much of an issue
> - it's the native string type of Windows, Mac, and WebKit
> - we want it to be explicit (i.e. a compile error) when you
> accidentally use a byte string in a place where we should know the
> encoding (which std::string and UTF-8 don't allow)
> - we still use UTF-8 in some places (like the history full-text
> database) where space is more of a concern

--~--~-~--~~~---~--~~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev
-~--~~~~--~~--~--~---



[chromium-dev] Re: using string16

2009-02-04 Thread Dean McNamee

Hey Evan,

I apologize for missing this discussion; I'm sure I'm not seeing the
entire picture or all the pros of this argument.  I mentioned before
that I'm in support of utf-8 everywhere we can get it.

We are obviously going to have platform specific code for the UI
(win32 / cocoa/objective-c / gtk), and it makes sense to use the
native UI string type there.  However, I think it should be possible
for all "non-platform" common code and interfaces to be in utf-8, and
I feel like this would be a more logical design and equivalent
performance.

I just wanted to point out a few concerns I have with using string16 in general.

- Another string type.
It's already a bit confusing with WebKit strings, StringPiece,
std::string, std::wstring, and string16.  I feel like making the UI be
string16 is going to prevent us from ever really pushing one string
encoding everywhere.

- WebKit strings are not an argument for string16
We don't have to interact with WebKit from the UI, and we have a very
nice interface there forced onto us by the IPC.  So I don't think
WebKit using utf-16 is an argument for our UI code.  WebKit's use of
utf-16 is forced by the JavaScript standard.

- std::wstring == string16, only on Windows
I think this will cause some confusion and likely a few bugs, where
strings are improperly converted/confused between the two.

- You can't have string16 literals on Mac / Linux.
On Windows, L"foo" will be a 16-bit string, making it fine as a
std::wstring or string16.  On Mac and Linux these will be 32-bit,
unless we compile with -fshort-wchar, but I'm not sure that's a good
idea.  This means any string literals will need to be stored in
another encoding (ascii, utf-8, wchar_t), and then converted to
UTF-16.  This isn't so strange until you think of what will happen on
Linux, when we have utf-8 -> utf-16 -> utf-8 -> gtk.
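
To make that chain concrete, a minimal sketch of the Linux case, using
ICU's C conversion functions (u_strFromUTF8 and u_strToUTF8 are real ICU
APIs; the helper and the fixed buffer sizes are just for illustration):

  #include <unicode/ustring.h>  // ICU: UChar is a UTF-16 code unit
  #include <gtk/gtk.h>

  void SetLabelText(GtkLabel* label) {
    const char* literal = "caf\xC3\xA9";  // "café" as a UTF-8 literal

    // utf-8 -> utf-16: the string16 our UI code would carry around.
    UChar utf16[64];
    int32_t utf16_len = 0;
    UErrorCode err = U_ZERO_ERROR;
    u_strFromUTF8(utf16, 64, &utf16_len, literal, -1, &err);

    // ... UI code passes the UTF-16 string along ...

    // utf-16 -> utf-8 again, because GTK only accepts UTF-8.
    char utf8[64];
    u_strToUTF8(utf8, 64, NULL, utf16, utf16_len, &err);
    gtk_label_set_text(label, utf8);
  }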

- We don't have good library functions for string16
We have a lot of great things in string_util, and most operate on
std::string / ascii / utf8 / std::wstring.  We would need to add
string16 versions for all of these (at least it would be really nice
to be able to use them).

- Memory / speed
You pointed out originally this isn't a big deal, and that we don't
have many UI strings.  (This will later be my argument for why paying
a utf-8 -> native conversion isn't a problem.)  utf-8 is a more
concise encoding: in the common ASCII case we save a byte per
character, and for much non-ASCII text it's about the same, since many
characters still encode in two bytes.  This also makes a difference in
performance, since memory is a bottleneck, and you have to deal with
less of it.  Probably not really worth evaluating in this setting, but
I just wanted to point out that I feel like utf-8 is the superior
encoding here.
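
For a concrete sense of the trade-off, some byte counts (CJK text is
the one common case where UTF-8 actually comes out larger):

  "hello"     UTF-8: 5 bytes    UTF-16: 10 bytes
  "café"      UTF-8: 5 bytes    UTF-16:  8 bytes
  "日本語"    UTF-8: 9 bytes    UTF-16:  6 bytes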

I'm definitely looking forward to the other side of the picture, and
why using string16 will make our UI code simpler on Mac and Linux.

On Wed, Feb 4, 2009 at 3:11 AM, Evan Martin  wrote:
> [original proposal quoted in full; snipped]




[chromium-dev] Re: using string16

2009-02-04 Thread Evan Martin

On Wed, Feb 4, 2009 at 6:53 AM, Dean McNamee  wrote:
> I apologize for missing this discussion, I'm sure that I'm not seeing
> the entire picture and the pros of this argument.  I mentioned before
> that I'm in support of utf-8 everywhere we can get it.

I lost this argument, so I will defer this response to someone else.  :)




[chromium-dev] Re: using string16

2009-02-04 Thread Darin Fisher
The proposal was to search-n-replace std::wstring to string16.  We would
have to invent a macro to replace L"" usage.  Most usages of string literals
are in unit tests, so it doesn't seem to matter if there is cost associated
with the macro.
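
A minimal sketch of what such a macro could look like for the
ASCII-only literals that dominate tests (hypothetical names; a real
string16 header would also need custom char_traits on non-Windows):

  #include <string>

  typedef unsigned short char16;               // 16-bit code unit everywhere
  typedef std::basic_string<char16> string16;  // sketch only

  inline string16 ASCIIToUTF16(const char* s) {
    string16 out;
    for (; *s; ++s)
      out.push_back(static_cast<char16>(*s));  // ASCII maps 1:1 into UTF-16
    return out;
  }

  // Hypothetical stand-in for L"" in test code:
  #define UTF16_LITERAL(s) ASCIIToUTF16(s)
  // usage: string16 title = UTF16_LITERAL("New Tab");
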
My belief is that there isn't much fruit to be had by converting everything
to UTF-8.  I fear people passing non-UTF-8 strings around using std::string
and the bugs that ensue from that.  We've had those problems in areas that
deal with UTF-8 and non-UTF-8 byte arrays.

Whenever we have a string16 or a wstring, it means implicitly that we have
unicode that can be displayed to the user.  So, the compiler helps us not
screw up.

If someone can make a compelling performance argument for changing Chrome's
UI over to UTF-8 and also invent a solution that avoids the problem I
described above, then converting to UTF-8 would seem OK to me.  But, right
now... it just looks like cost for not much benefit.

-Darin


On Wed, Feb 4, 2009 at 8:21 AM, Evan Martin  wrote:

> I lost this argument, so I will defer this response to someone else.  :)




[chromium-dev] Re: using string16

2009-02-04 Thread Dean McNamee

On Wed, Feb 4, 2009 at 6:11 PM, Darin Fisher  wrote:
> The proposal was to search-n-replace std::wstring to string16.  We would
> have to invent a macro to replace L"" usage.  Most usages of string literals
> are in unit tests, so it doesn't seem to matter if there is cost associated
> with the macro.
> My belief is that there isn't much fruit to be had by converting everything
> to UTF-8.  I fear people passing non-UTF-8 strings around using std::string
> and the bugs that ensue from that.  We've had those problems in areas that
> deal with UTF-8 and non-UTF-8 byte arrays.
> Whenever we have a string16 or a wstring, it means implicitly that we have
> unicode that can be displayed to the user.  So, the compiler helps us not
> screw up.

This seems to be the only argument you make, that by making string16 a
new type, we know its encoding.  This can be solved in many other ways
while keeping utf8.  We could add a new utf8 string class if you really
wanted, or we can just be diligent and make sure to DCHECK in methods
that expect a specific encoding.  Have we had a lot of these problems?
Do you have some examples?  It would help me figure out solutions for
better checking for utf-8.
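
As a sketch of what that DCHECK could look like (hypothetical helper; a
production validator would also reject overlong forms and surrogates):

  #include <assert.h>
  #include <string>

  #define DCHECK(cond) assert(cond)  // stand-in for a debug-only check

  static bool IsValidUTF8(const std::string& s) {
    for (size_t i = 0; i < s.size(); ) {
      unsigned char c = static_cast<unsigned char>(s[i]);
      size_t extra;
      if (c < 0x80)                extra = 0;
      else if ((c & 0xE0) == 0xC0) extra = 1;
      else if ((c & 0xF0) == 0xE0) extra = 2;
      else if ((c & 0xF8) == 0xF0) extra = 3;
      else return false;                           // bad lead byte
      if (i + 1 + extra > s.size()) return false;  // truncated sequence
      for (size_t j = 1; j <= extra; ++j)
        if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
          return false;                            // bad continuation byte
      i += 1 + extra;
    }
    return true;
  }

  void SetWindowTitle(const std::string& utf8_title) {
    DCHECK(IsValidUTF8(utf8_title));  // catch mis-encoded callers early
    // ...
  }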





[chromium-dev] Re: using string16

2009-02-04 Thread Thomas Van Lenten
Trying to remember what came up during the discussion.

UTF-16 is what Mac/Win use, so there we can avoid a batch of conversions on
those two platforms.  (Mac can take UTF-8, but the system would still be
doing conversions to get things into the form it prefers.)

Didn't someone say ICU needs things in 16-bit also, so every time we call
one of those APIs, we'd be round-tripping conversions if we went with UTF-8?
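
ICU's string APIs do operate on UTF-16 UChar buffers, so a UTF-8 string
type would pay a conversion on each call.  A sketch (u_strFromUTF8,
u_strToUpper, and u_strToUTF8 are real ICU C functions; the wrapper and
buffer sizes are hypothetical):

  #include <unicode/ustring.h>

  // Upper-casing a UTF-8 string via ICU: convert in, operate, convert out.
  void UpperCaseUTF8(const char* utf8_in, char* utf8_out, int32_t out_cap) {
    UChar buf[256], upper[256];
    int32_t len = 0;
    UErrorCode err = U_ZERO_ERROR;

    u_strFromUTF8(buf, 256, &len, utf8_in, -1, &err);    // utf-8 -> utf-16
    int32_t upper_len =
        u_strToUpper(upper, 256, buf, len, NULL, &err);  // the actual work
    u_strToUTF8(utf8_out, out_cap, NULL, upper, upper_len, &err);  // and back
  }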

TVL







[chromium-dev] Re: using string16

2009-02-04 Thread Darin Fisher
On Wed, Feb 4, 2009 at 9:35 AM, Dean McNamee  wrote:

>
> This seems to be the only argument you make, that by making string16 a
> new type, we know its encoding.  This can be solved in many other ways
> while keeping utf8.  We could add a new utf8 string class if you really
> wanted, or we can just be diligent and make sure to DCHECK in methods
> that expect a specific encoding.  Have we had a lot of these problems?
> Do you have some examples?  It would help me figure out solutions for
> better checking for utf-8.


We have had a lot of these problems in the code that interfaces with WinHTTP
and other networking code where std::string is used to relay headers, which
do not necessarily have a known encoding (a sketch of the hazard follows
below).  I've also seen this kind of problem over and over again in the
Mozilla code base.

I think we have much bigger fish to fry... so, I'd need to hear a
convincing argument about why investing time and energy in converting from
UTF-16 to UTF-8 is a good idea.
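
A sketch of the hazard (hypothetical function; the point is that both
values have the same C++ type, so the compiler can't catch the mix-up):

  #include <string>

  void AppendToUTF8Log(const std::string& utf8) { /* assumes UTF-8 input */ }

  void Example() {
    std::string ui_text = "caf\xC3\xA9";  // "café" encoded as UTF-8
    std::string header = "caf\xE9";       // same text as Latin-1, off the wire

    AppendToUTF8Log(ui_text);  // fine
    AppendToUTF8Log(header);   // compiles silently, logs mojibake
  }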

-Darin








[chromium-dev] Re: using string16

2009-02-04 Thread Mike Belshe
The big string area is webkit, of course.  If webkit were 100% UTF-8
already, we might take a different stance on this issue as well.

If it is our goal to get to UTF-8 everywhere, then laying the plumbing for
utf8 strings rather than string16 strings seems like the right thing to do.

Mike






[chromium-dev] Re: using string16

2009-02-04 Thread cpu

+1 to string16

I can't make performance or memory-saving claims with a straight face
for either encoding.  We just don't process enough strings for it to matter.

