Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Johan Tibell
On Tue, Aug 17, 2010 at 1:05 PM, Bulat Ziganshin
wrote:

> Hello Tako,
>
> Tuesday, August 17, 2010, 3:03:20 PM, you wrote:
>
> > Unless a Char in Haskell is 32 bits (or at least more than 16 bits)
> > it can NOT encode all Unicode code points.
>
> it's 32 bit
>

As Bulat said, it's 32 bits. A Char is *defined* as a Unicode code point
number. It has no relation to e.g. char in C.
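
To make the point concrete, here is a small sketch using only Data.Char from
base. The names maxCodePoint, gClef and bitsNeeded are made up for this
example; they are not part of any library.

```haskell
import Data.Char (chr, ord)

-- The largest code point a Char can hold: U+10FFFF.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- A code point outside the Basic Multilingual Plane is still one Char.
gClef :: Char
gClef = chr 0x1D11E  -- MUSICAL SYMBOL G CLEF

-- Number of bits needed to represent every code point.
bitsNeeded :: Int
bitsNeeded = length (takeWhile (> 0) (iterate (`div` 2) maxCodePoint))
```

Here bitsNeeded comes out as 21, so a 16-bit Char would indeed be too small,
while 32 bits leave room to spare.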

-- Johan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Johan Tibell
On Tue, Aug 17, 2010 at 12:39 PM, Bulat Ziganshin  wrote:

> Hello Tom,
>
> Tuesday, August 17, 2010, 2:09:09 PM, you wrote:
>
> > In the first iteration of the Text package, UTF-16 was chosen because
> > it had a nice balance of arithmetic overhead and space.  The
> > arithmetic for UTF-8 started to have serious performance impacts in
> > situations where the entire document was outside ASCII (i.e. a Russian
> > or Arabic document), but UTF-16 was still relatively compact
>
> i don't understand what you mean. do you support all 2^20 codepoints
> in the Data.Text package?
>

Yes, UTF-16 can represent all Unicode code points, using surrogate pairs.
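
As a rough illustration of how surrogate pairs work, here is a minimal sketch
of the UTF-16 encoding of a single Char. The function encodeUtf16 is a
hypothetical name for this example and is not the code Data.Text uses
internally.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode one Char as UTF-16 code units: a single unit for code points
-- below U+10000, otherwise a surrogate pair.
-- Note: a Haskell Char may itself be a surrogate code point
-- (U+D800..U+DFFF); this sketch passes such values through unchanged.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    n' = n - 0x10000                              -- 20-bit offset
    hi = fromIntegral (0xD800 + (n' `shiftR` 10)) -- high (lead) surrogate
    lo = fromIntegral (0xDC00 + (n' .&. 0x3FF))   -- low (trail) surrogate
```

For example, encodeUtf16 '\x1D11E' yields the two units 0xD834 and 0xDD1E,
while encodeUtf16 'a' yields the single unit 0x61.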

-- Johan


Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tom Harper
2010/8/17 Bulat Ziganshin :
> Hello Tom,



> i don't understand what you mean. do you support all 2^20 codepoints
> in the Data.Text package?

Bulat,

Yes, its internal representation is UTF-16, which is capable of
encoding *any* valid Unicode codepoint.

-- Tom


Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Johan Tibell
Hi Bulat,

On Tue, Aug 17, 2010 at 10:34 AM, Bulat Ziganshin  wrote:

> > It's not clear to me that using UTF-16 internally does make
> > Data.Text noticeably slower.
>
> not slower, but it requires 2x more memory. speed is the same since
> Unicode contains 2^20 codepoints
>

Yes, in theory a program could use as much as 2x the memory. That being
said, most programs don't hold that much text data in memory at any given
point so that might be 2x of a small number. One experiment [1] found it
difficult to show any difference in memory usage at all in Trac when
switching Python's internal representation from UCS2 to UCS4.

So it's not clear to me that using UTF-16 makes a real program noticeably
slower or makes it use noticeably more memory.

1. http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python

Cheers,
Johan


Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin wrote:

> Hello Johan,
>
> Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
>
> >  I agree, Data.Text is great.  Unfortunately, its internal use of UTF-16
> >  makes it inefficient for many purposes.
>
> > It's not clear to me that using UTF-16 internally does make
> > Data.Text noticeably slower.
>
> not slower, but it requires 2x more memory. speed is the same since
> Unicode contains 2^20 codepoints
>
>
This is not entirely correct because it all depends on your data.
For Western languages it normally holds true that UTF-16 occupies twice the
memory of UTF-8, but for other languages a code point might take up to 3 bytes
in UTF-8 (and even 4 for code points outside the Basic Multilingual Plane, see
http://en.wikipedia.org/wiki/UTF-8), so the 2x figure does not apply there.
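
A quick way to see how much this depends on the data is to count the bytes
each encoding needs per code point. The sketch below uses only base; the
names utf8Bytes, utf16Bytes and encodedSizes are invented for this example.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8 (1 to 4).
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed to encode one code point in UTF-16 (2 or 4).
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- Total encoded size of a String as (UTF-8 bytes, UTF-16 bytes).
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))
```

With these definitions, encodedSizes "hello" is (5,10), a six-letter Cyrillic
word comes out as (12,12), and a three-character Japanese string such as
"日本語" comes out as (9,6), so UTF-16 can be the smaller of the two.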

That Wikipedia page is a nice read anyway; it mentions some of the
advantages and disadvantages of the different encodings.
(For example, the complexity of the code that determines the length of a UTF
string depends on the encoding.)
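
To illustrate the length point, here is a sketch of counting code points in
already-encoded data. It assumes well-formed input and works on plain lists
of code units rather than any real string type; utf8Length and utf16Length
are hypothetical names for this example.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8, Word16)

-- UTF-8: every byte except a continuation byte (bit pattern 10xxxxxx)
-- starts a new code point.
utf8Length :: [Word8] -> Int
utf8Length = length . filter (\b -> b .&. 0xC0 /= 0x80)

-- UTF-16: every unit except a low (trail) surrogate (0xDC00..0xDFFF)
-- starts a new code point.
utf16Length :: [Word16] -> Int
utf16Length = length . filter (\u -> u < 0xDC00 || u > 0xDFFF)
```

In both cases the whole input has to be scanned, whereas with a fixed-width
encoding such as UTF-32 the length is just the number of units.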

Cheers,
 -Tako