Re: String != [Char]

2012-03-19 Thread Johan Tibell
On Mon, Mar 19, 2012 at 2:55 PM, Daniel Peebles  wrote:
> If the input is specified to be UTF-8, wouldn't it be better to call the
> method unpackUTF8 or something like that?

Sure.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Daniel Peebles
If the input is specified to be UTF-8, wouldn't it be better to call the
method unpackUTF8 or something like that?

On Mon, Mar 19, 2012 at 12:59 PM, Johan Tibell wrote:

> On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes
>  wrote:
> > On 03/19/2012 04:53 PM, Johan Tibell wrote:
> >> I've been thinking about this question as well. How about
> >>
> >> class IsString s where
> >>     unpackCString :: Ptr Word8 -> CSize -> s
> >
> > What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?
>
> Yes.
>
> We could make a distinction between byte and Unicode literals and have:
>
> class IsBytes a where
>     unpackBytes :: Ptr Word8 -> Int -> a
>
> class IsText a where
>     unpackText :: Ptr Word8 -> Int -> a
>
> In the latter the caller guarantees that the passed-in pointer points
> to well-formed UTF-8 data.
>
> -- Johan
>


Re: String != [Char]

2012-03-19 Thread Brandon Allbery
On Mon, Mar 19, 2012 at 15:39, Simon Peyton-Jones wrote:

> Don't forget that with -XOverloadedStrings we already have an IsString
> class.  (That's not a Haskell Prime extension though.)
>

I think that's exactly the point; currently it uses [Char] as the initial
format and converts at runtime, which is rather unfortunate given the
inefficiency of [Char].  If it has to be done at runtime, it would be nice
to at least do it from a more efficient initial format.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms


RE: String != [Char]

2012-03-19 Thread Simon Peyton-Jones
Don't forget that with -XOverloadedStrings we already have an IsString class.
(That's not a Haskell Prime extension though.)

class IsString a where
    fromString :: String -> a
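A minimal sketch of how the existing class is used today, assuming GHC with `OverloadedStrings`: a literal at a non-`String` type is treated as `fromString` applied to an ordinary `[Char]`, so every instance starts from the list representation. `Name` is a hypothetical example type, not part of any library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Hypothetical example type: a name stored as a list of words.
newtype Name = Name [String]
  deriving (Eq, Show)

-- The instance receives the literal as an ordinary [Char] and converts it
-- at runtime; this is the step the thread would like to make cheaper.
instance IsString Name where
  fromString s = Name (words s)

-- With OverloadedStrings, this literal means fromString "Simon Peyton Jones".
author :: Name
author = "Simon Peyton Jones"
```

In GHCi, `author` shows as `Name ["Simon","Peyton","Jones"]`.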

Simon

|  -----Original Message-----
|  From: haskell-prime-boun...@haskell.org [mailto:haskell-prime-
|  boun...@haskell.org] On Behalf Of Johan Tibell
|  Sent: 19 March 2012 15:54
|  To: Thomas Schilling
|  Cc: haskell-prime@haskell.org
|  Subject: Re: String != [Char]
|  
|  On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling
|   wrote:
|  > Regarding the type class for converting to and from that type, there
|  > is a perhaps more complicated question: The current fromString method
|  > uses String as the source type which causes unnecessary overhead. This
|  > is unfortunate since GHC's built-in mechanism actually uses
|  > unpackCString[Utf8]# which constructs the inefficient String
|  > representation from a compact memory representation.  I think it would
|  > be best if the new fromString/fromText class allowed an efficient
|  > mechanism like that.  unpackCString# has type Addr# -> [Char] which is
|  > obviously GHC-specific.
|  
|  I've been thinking about this question as well. How about
|  
|  class IsString s where
|      unpackCString :: Ptr Word8 -> CSize -> s
|  
|  It's morally equivalent to unpackCString#, but uses standard Haskell types.
|  
|  -- Johan
|  





Re: String != [Char]

2012-03-19 Thread Greg Weber
This is the best I can do with Bryan's blog posts, but none of the
graphs (which contain all the information) show up:
http://web.archive.org/web/20100222031602/http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

If someone has some benchmarks that can be run, that would be helpful.

On Mon, Mar 19, 2012 at 7:51 AM, Johan Tibell  wrote:
> Hi Greg,
>
> There are a few blog posts on Bryan's blog. Here are two of them:
>
>    http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
>    http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/
>
> Unfortunately the blog seems partly broken. Images are missing and
> some articles are missing altogether (i.e. the article is there but
> the actual body text is gone.)
>
> -- Johan



Re: String != [Char]

2012-03-19 Thread Johan Tibell
On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes
 wrote:
> On 03/19/2012 04:53 PM, Johan Tibell wrote:
>> I've been thinking about this question as well. How about
>>
>> class IsString s where
>>     unpackCString :: Ptr Word8 -> CSize -> s
>
> What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Yes.

We could make a distinction between byte and Unicode literals and have:

class IsBytes a where
    unpackBytes :: Ptr Word8 -> Int -> a

class IsText a where
    unpackText :: Ptr Word8 -> Int -> a

In the latter the caller guarantees that the passed-in pointer points
to well-formed UTF-8 data.
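A rough sketch of the two classes with one instance each, assuming plain lists as the target types and a hand-written decoder. `decodeUtf8` and `textLiteral` are hypothetical helpers; `textLiteral` only simulates a compiler placing a literal's bytes in memory. Because `IsText` callers promise well-formed UTF-8, the decoder can skip validation, which is one practical benefit of separating the two classes.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import System.IO.Unsafe (unsafePerformIO)

class IsBytes a where
  unpackBytes :: Ptr Word8 -> Int -> a

class IsText a where
  unpackText :: Ptr Word8 -> Int -> a

-- Byte literals: the bytes are copied as-is.
instance IsBytes [Word8] where
  unpackBytes ptr len = unsafePerformIO (peekArray len ptr)

-- Text literals: input is assumed to be well-formed UTF-8.
instance IsText [Char] where
  unpackText ptr len = decodeUtf8 (unsafePerformIO (peekArray len ptr))

-- Minimal UTF-8 decoder for input already known to be well-formed:
-- no checks for bad continuation bytes or overlong encodings.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xE0 = multi 1 (b .&. 0x1F)
  | b < 0xF0 = multi 2 (b .&. 0x0F)
  | otherwise = multi 3 (b .&. 0x07)
  where
    -- Combine the lead byte's payload with n continuation bytes (6 bits each).
    multi :: Int -> Word8 -> String
    multi n lead =
      let (cont, rest) = splitAt n bs
          step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
          code = foldl step (fromIntegral lead) cont
       in chr code : decodeUtf8 rest

-- Hypothetical stand-in for compiler-generated code: put the bytes in a
-- temporary buffer and unpack them; forcing the length decodes everything
-- before the buffer is freed.
textLiteral :: [Word8] -> IO String
textLiteral bytes = withArrayLen bytes $ \n ptr ->
  let s = unpackText ptr n
   in length s `seq` pure s
```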

-- Johan



Re: String != [Char]

2012-03-19 Thread Christian Siefkes
On 03/19/2012 04:53 PM, Johan Tibell wrote:
> I've been thinking about this question as well. How about
> 
> class IsString s where
>     unpackCString :: Ptr Word8 -> CSize -> s

What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Best regards
Christian

-- 
|--- Dr. Christian Siefkes --- christ...@siefkes.net ---
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|Peer Production Everywhere:   http://peerconomy.org/wiki/
|-- OpenPGP Key ID: 0x346452D8 --
A choice of masters is not freedom.
-- Bradley M. Kuhn and Richard M. Stallman, Freedom Or Power?





Re: String != [Char]

2012-03-19 Thread Johan Tibell
On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling
 wrote:
> Regarding the type class for converting to and from that type, there
> is a perhaps more complicated question: The current fromString method
> uses String as the source type which causes unnecessary overhead. This
> is unfortunate since GHC's built-in mechanism actually uses
> unpackCString[Utf8]# which constructs the inefficient String
> representation from a compact memory representation.  I think it would
> be best if the new fromString/fromText class allowed an efficient
> mechanism like that.  unpackCString# has type Addr# -> [Char] which is
> obviously GHC-specific.

I've been thinking about this question as well. How about

class IsString s where
    unpackCString :: Ptr Word8 -> CSize -> s

It's morally equivalent to unpackCString#, but uses standard Haskell types.
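A sketch of the proposed class with a `String` instance, assuming GHC and the standard `Foreign` modules. Like `unpackCString#`, this instance maps each byte to one `Char`, so it is only right for ASCII or Latin-1 bytes. The helper `literal` is hypothetical and stands in for code a compiler would generate.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Char (chr)
import Data.Word (Word8)
import Foreign.C.Types (CSize)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import System.IO.Unsafe (unsafePerformIO)

-- The class as proposed: a pointer to the literal's bytes plus a length.
class IsString s where
  unpackCString :: Ptr Word8 -> CSize -> s

-- Each byte becomes one Char, as with unpackCString#. unsafePerformIO is
-- tolerable here only because a literal's bytes never change.
instance IsString [Char] where
  unpackCString ptr len =
    unsafePerformIO $ map (chr . fromIntegral) <$> peekArray (fromIntegral len) ptr

-- Hypothetical stand-in for compiler output: place the bytes in memory and
-- call unpackCString. ($!) forces the result to WHNF before the buffer is
-- freed, which for the String instance reads every byte.
literal :: IsString s => [Word8] -> IO s
literal bytes = withArrayLen bytes $ \n ptr ->
  pure $! unpackCString ptr (fromIntegral n)
```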

-- Johan



Re: String != [Char]

2012-03-19 Thread Thomas Schilling
On 18 March 2012 19:29, ARJANEN Loïc Jean David  wrote:

> Good point, but rather than specifying in the standard that the new string
> type should be the Text datatype, maybe the new definition should be that
> String is a newtype with suitable operations defined on it, and perhaps a
> typeclass to convert to and from this newtype. The reason for my remark is that
> although most implementations compile to native code, an implementation
> compiling to, for example, JavaScript might wish to use JavaScript's string
> type rather than forcing its users to have a native library installed.

I agree that the language standard should not prescribe the
implementation of a Text datatype.  It should instead require an
abstract data type (which may just be a newtype wrapper for [Char] in
some implementations) and a (minimal) set of operations on it.
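A minimal sketch of such an abstract type, assuming a plain newtype over `[Char]` as the simplest implementation. The names `Text`, `pack`, `unpack`, `append` and `length` are illustrative (they echo `Data.Text`) and are not a proposed standard interface.

```haskell
-- In a real module the export list would omit the Text constructor, e.g.
--   module MyText (Text, pack, unpack, append, length) where
-- so a packed-array or JavaScript-string implementation could replace it.

import Prelude hiding (length)
import qualified Prelude

-- Hypothetical abstract type: here just a newtype over [Char].
newtype Text = Text [Char]
  deriving (Eq, Ord)

instance Show Text where
  show (Text s) = show s

pack :: String -> Text
pack = Text

unpack :: Text -> String
unpack (Text s) = s

append :: Text -> Text -> Text
append (Text a) (Text b) = Text (a ++ b)

-- Number of characters.
length :: Text -> Int
length (Text s) = Prelude.length s
```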

Regarding the type class for converting to and from that type, there
is a perhaps more complicated question: The current fromString method
uses String as the source type which causes unnecessary overhead. This
is unfortunate since GHC's built-in mechanism actually uses
unpackCString[Utf8]# which constructs the inefficient String
representation from a compact memory representation.  I think it would
be best if the new fromString/fromText class allowed an efficient
mechanism like that.  unpackCString# has type Addr# -> [Char] which is
obviously GHC-specific.
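For readers unfamiliar with the mechanism, a small GHC-only sketch: `"..."#` is a primitive literal of type `Addr#` (it requires `MagicHash`), and the `GHC.CString` functions build a `[Char]` from it. This illustrates the point above; it is not how GHC's desugarer itself is written.

```haskell
{-# LANGUAGE MagicHash #-}

-- GHC-specific: "..."# is a pointer to static, NUL-terminated bytes.
import GHC.CString (unpackCString#, unpackCStringUtf8#)

-- Each byte becomes one Char (fine for ASCII literals).
ascii :: String
ascii = unpackCString# "hello"#

-- Bytes decoded as UTF-8: \195\169 encodes U+00E9.
utf8 :: String
utf8 = unpackCStringUtf8# "caf\195\169"#
```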


-- 
Push the envelope. Watch it bend.



Re: String != [Char]

2012-03-19 Thread Duncan Coutts
On 17 March 2012 01:44, Greg Weber  wrote:
> the text library and Text data type have shown the worth in real world
> Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.

I'm pretty sure the majority of people would agree that if we were
making the Haskell standard nowadays we'd make the String type abstract.

Unfortunately I fear making the change now will be quite disruptive,
though I don't think we've collectively put much effort yet into
working out just how disruptive.

In principle I'd support changing to reduce the number of string types
used in interfaces. From painful professional experience, I think that
one of the biggest things where C++ went wrong was not having a single
string type that everyone would use (I once had to write a C++
component integrating code that used 5 different string types). Like
Python 3, we should have two common string types used in interfaces:
string and bytes (with implementations like our current Text and
ByteString).
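A short sketch of the two-type split using the packages named above, `text` and `bytestring` (assumed to be installed). Moving between a string and bytes always names an encoding explicitly.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Text: a sequence of Unicode characters.
word :: T.Text
word = T.pack "caf\233"

-- ByteString: a sequence of bytes; the encoding is chosen at the boundary.
wordUtf8 :: B.ByteString
wordUtf8 = TE.encodeUtf8 word

-- Decoding back; decodeUtf8 throws on invalid UTF-8.
roundTrip :: T.Text
roundTrip = TE.decodeUtf8 wordUtf8
```

Here `T.length word` is 4 characters while `B.length wordUtf8` is 5 bytes, since U+00E9 takes two bytes in UTF-8.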

BTW, I don't think taking it out of the language would be a helpful
step. We actually want to tell people "use *this* string type in
interfaces", not leave everyone to make their own choice. I think
taking it out of the language would tend to encourage everyone to make
their own choice.

Duncan



Re: String != [Char]

2012-03-19 Thread Johan Tibell
Hi Greg,

There are a few blog posts on Bryan's blog. Here are two of them:


http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

Unfortunately the blog seems partly broken. Images are missing and
some articles are missing altogether (i.e. the article is there but
the actual body text is gone.)

-- Johan



Re: String != [Char]

2012-03-19 Thread Greg Weber
I actually was not able to successfully google for Text vs. String
benchmarks. If someone can point one out that would be very helpful.

On Sat, Mar 17, 2012 at 1:52 AM, Christopher Done
 wrote:
> On 17 March 2012 05:30, Tony Morris  wrote:
>> Do you know if there is a good write-up of the benefits of Data.Text
>> over String? I'm aware of the advantages just by my own usage; hoping
>> someone has documented it rather than in our heads.
>
> Good point, it would be good to collate the experience and wisdom of
> this decision with some benchmark results on the HaskellWiki as The
> Place to link to when justifying it.
>



Re: What is a punctuation character?

2012-03-19 Thread Colin Paul Adams


Iavor> report?  My understanding is that the intention is that the
Iavor> alphabet is unicode codepoints (sometimes referred to as
Iavor> unicode characters).

Unicode characters are not the same as Unicode codepoints. What we want
is Unicode characters.

We don't want to be able to write a Unicode codepoint, as that would
permit writing half of a surrogate pair, which is malformed Unicode.
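To make the concern concrete, a small sketch using only `Data.Char`: Haskell's `Char` ranges over all code points, surrogates included, so a lone surrogate half can already be written directly. Unicode calls the code points outside the surrogate range "scalar values"; `isScalarValue` is a hypothetical helper, not a library function.

```haskell
import Data.Char (GeneralCategory (Surrogate), generalCategory)

-- A lone high surrogate is a perfectly valid Haskell Char today.
loneHighSurrogate :: Char
loneHighSurrogate = '\xD800'

-- The Surrogate category (Cs) is exactly U+D800..U+DFFF, so every other
-- code point is a Unicode scalar value.
isScalarValue :: Char -> Bool
isScalarValue c = generalCategory c /= Surrogate
```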
-- 
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments



Re: What is a punctuation character?

2012-03-19 Thread Gabriel Dos Reis
On Mon, Mar 19, 2012 at 5:36 AM, Brandon Allbery  wrote:
> On Mon, Mar 19, 2012 at 05:56, Gabriel Dos Reis
>  wrote:
>>
>> The fact that the Report is silent about encoding used to
>> represent concrete Haskell programs in text files adds
>> a certain level of non-portability (and confusion.)  I found
>
>
> Specifying the encoding can *also* limit portability, if you specify an
> encoding that is not widely supported on some target platform.

That is why I find the pragma suggestion attractive.

-- Gaby



Re: What is a punctuation character?

2012-03-19 Thread Brandon Allbery
On Mon, Mar 19, 2012 at 05:56, Gabriel Dos Reis <
g...@integrable-solutions.net> wrote:

> The fact that the Report is silent about encoding used to
> represent concrete Haskell programs in text files adds
> a certain level of non-portability (and confusion.)  I found
>

Specifying the encoding can *also* limit portability, if you specify an
encoding that is not widely supported on some target platform.  (Please try
to remember that the universe is not composed solely of Windows and Linux.
 The fact that those are the only ones you care about is not relevant to
the standard; nor is the list of platforms that GHC or any other
implementation supports.)

Encoding does not belong in the language standard; it is an aspect of
implementing the language standard on a given platform.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms


Re: What is a punctuation character?

2012-03-19 Thread Gabriel Dos Reis
On Mon, Mar 19, 2012 at 4:34 AM, Simon Marlow  wrote:
>> On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh  wrote:
>> > Hi Gaby,
>> >
>> > On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
>> >>
>> >> OK, thanks!  I guess a takeaway from this discussion is that what counts
>> >> as punctuation is far less well defined than it appears...
>> >
>> > I'm not really sure what you're asking. Haskell's uniSymbol includes
>> > all Unicode characters (should that be codepoints? I'm not a Unicode
>> > expert) in the punctuation category; I'm not sure what the best
>> > reference is, but e.g. table 12 in
>> >    http://www.unicode.org/reports/tr44/tr44-8.html#Property_Values
>> > lists a number of Px categories, and a meta-category P "Punctuation".
>> >
>> >
>> > Thanks
>> > Ian
>> >
>>
>> Hi Ian,
>>
>> I guess what I am asking was partly summarized in Iavor's message.
>>
>> For me, the issue started with bullet number 4 in section 1.1
>>
>>      http://www.haskell.org/onlinereport/intro.html#sect1.1
>>
>> which states that:
>>
>>        The lexical structure captures the concrete representation
>>        of Haskell programs in text files.
>>
>> That, combined with the opening of section 2.1 (e.g. the example of
>> terminal syntax) and the fact that the grammar routinely describes two
>> non-terminals, ascXXX (for ASCII characters) and uniXXX (for Unicode
>> characters), suggested that the concrete syntax of Haskell programs in
>> text files is in the ASCII charset.  Note this does not conflict with the
>> general statement that Haskell programs use the Unicode character set,
>> because the uniXXX could use the ASCII charset to introduce Unicode
>> characters -- this is not uncommon practice for programming languages
>> using Unicode characters; see the link I gave earlier.
>>
>> However, if I understand Malcolm's message correctly, this is not the
>> case.
>> Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
>> representation of Haskell programs in text files.  What it does is to
>> capture the structure of what is obtained from interpreting, *in some
>> unspecified encoding or unspecified alphabet*,  the concrete
>> representation of Haskell programs in text files.  This conclusion is
>> unfortunate, but I believe it is correct.
>> Since the encoding or the alphabet is unspecified, it is no longer
>> necessarily the case that two Haskell implementations would agree on the
>> same lexical interpretation when presented with the same exact text file
>> containing  a Haskell program.
>>
>> In its current form, you are correct that the Report should say
>> "codepoint" instead of characters.
>>
>> I join Iavor's request in clarifying the alphabet used in the grammar.
>
> The report gives meaning to a sequence of codepoints only, it says nothing 
> about how that sequence of codepoints is represented as a string of bytes in 
> a file, nor does it say anything about what those files are called, or even 
> whether there are files at all.

Thanks, Simon.

The fact that the Report is silent about encoding used to
represent concrete Haskell programs in text files adds
a certain level of non-portability (and confusion.)  I found
last night that a proposal has been made to add some
support for encoding specification

http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource

I believe that is a good start.  What are the odds of it being considered
for Haskell 2012?  I suspect the pragma proposal works only if something
is said about the position of that pragma in the source file (e.g. it
must be on the first line, or within the first N bytes of the source
file); otherwise we have an infinite descent.
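A sketch of why a fixed position avoids the problem, assuming a hypothetical pragma spelled `{-# ENCODING name #-}` (not necessarily the syntax of the UnicodeInHaskellSource proposal) that must appear, in ASCII, on the first line. A reader can then inspect raw bytes before committing to any encoding; `encodingPragma` is a hypothetical name.

```haskell
import Data.Char (chr, isSpace)
import Data.List (dropWhileEnd, isSuffixOf, stripPrefix)
import Data.Word (Word8)

-- Look for the hypothetical pragma on the first line of the raw bytes.
-- Returns the encoding name, or Nothing if there is no usable pragma.
encodingPragma :: [Word8] -> Maybe String
encodingPragma bytes
  | any (>= 0x80) firstLine = Nothing -- pragma line must be pure ASCII
  | otherwise = do
      rest <- stripPrefix "{-# ENCODING " (trim line)
      if "#-}" `isSuffixOf` rest
        then
          let name = trim (take (length rest - 3) rest)
           in if null name then Nothing else Just name
        else Nothing
  where
    firstLine = takeWhile (/= 0x0A) bytes -- bytes before the first '\n'
    line = map (chr . fromIntegral) firstLine
    trim = dropWhileEnd isSpace . dropWhile isSpace
```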


>
> Perhaps some clarification is in order in a future revision, and we should 
> use the correct terminology where appropriate.  We should also clarify that 
> "punctuation" means exactly the Punctuation class.

That would be great.  Do you have any comment about the
UnicodeInHaskellSource proposal?
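Taking "punctuation" to mean exactly the Unicode Punctuation categories, as in the quoted reply above, the Report's operator-symbol character class can be sketched with `Data.Char`. This follows the Haskell 2010 `symbol` production (`ascSymbol`, or `uniSymbol` minus `special`, `_`, `"` and `'`); the name `isHaskellSymbolChar` is hypothetical. For ASCII input it accepts exactly the `ascSymbol` characters, since each of those falls in a Unicode Symbol or Punctuation category.

```haskell
import Data.Char (isPunctuation, isSymbol)

-- isPunctuation covers the P* categories (Pc, Pd, Ps, Pe, Pi, Pf, Po);
-- isSymbol covers the S* categories (Sm, Sc, Sk, So).
isHaskellSymbolChar :: Char -> Bool
isHaskellSymbolChar c =
  (isSymbol c || isPunctuation c) && c `notElem` excluded
  where
    -- The `special` characters plus '_', '"' and '\'', which the
    -- Report excludes from symbol.
    excluded = "(),;[]`{}_\"'"
```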

> With regards to normalisation and equivalence, my understanding is that 
> Haskell does not support either: two identifiers are equal if and only if 
> they are represented by the same sequence of codepoints.  Again, we could add 
> a clarifying sentence to the report.
>

Ugh.

Writing a parser for Haskell was an interesting exercise :-)

-- Gaby



RE: What is a punctuation character?

2012-03-19 Thread Simon Marlow
> On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh  wrote:
> > Hi Gaby,
> >
> > On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
> >>
> >> OK, thanks!  I guess a takeaway from this discussion is that what counts
> >> as punctuation is far less well defined than it appears...
> >
> > I'm not really sure what you're asking. Haskell's uniSymbol includes
> > all Unicode characters (should that be codepoints? I'm not a Unicode
> > expert) in the punctuation category; I'm not sure what the best
> > reference is, but e.g. table 12 in
> >    http://www.unicode.org/reports/tr44/tr44-8.html#Property_Values
> > lists a number of Px categories, and a meta-category P "Punctuation".
> >
> >
> > Thanks
> > Ian
> >
> 
> Hi Ian,
> 
> I guess what I am asking was partly summarized in Iavor's message.
> 
> For me, the issue started with bullet number 4 in section 1.1
> 
>  http://www.haskell.org/onlinereport/intro.html#sect1.1
> 
> which states that:
> 
>The lexical structure captures the concrete representation
>of Haskell programs in text files.
> 
> That, combined with the opening of section 2.1 (e.g. the example of
> terminal syntax) and the fact that the grammar routinely describes two
> non-terminals, ascXXX (for ASCII characters) and uniXXX (for Unicode
> characters), suggested that the concrete syntax of Haskell programs in
> text files is in the ASCII charset.  Note this does not conflict with the
> general statement that Haskell programs use the Unicode character set,
> because the uniXXX could use the ASCII charset to introduce Unicode
> characters -- this is not uncommon practice for programming languages
> using Unicode characters; see the link I gave earlier.
> 
> However, if I understand Malcolm's message correctly, this is not the
> case.
> Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
> representation of Haskell programs in text files.  What it does is to
> capture the structure of what is obtained from interpreting, *in some
> unspecified encoding or unspecified alphabet*,  the concrete
> representation of Haskell programs in text files.  This conclusion is
> unfortunate, but I believe it is correct.
> Since the encoding or the alphabet is unspecified, it is no longer
> necessarily the case that two Haskell implementations would agree on the
> same lexical interpretation when presented with the same exact text file
> containing  a Haskell program.
> 
> In its current form, you are correct that the Report should say
> "codepoint" instead of characters.
> 
> I join Iavor's request in clarifying the alphabet used in the grammar.

The report gives meaning to a sequence of codepoints only, it says nothing 
about how that sequence of codepoints is represented as a string of bytes in a 
file, nor does it say anything about what those files are called, or even 
whether there are files at all.

Perhaps some clarification is in order in a future revision, and we should use 
the correct terminology where appropriate.  We should also clarify that 
"punctuation" means exactly the Punctuation class.

With regards to normalisation and equivalence, my understanding is that Haskell 
does not support either: two identifiers are equal if and only if they are 
represented by the same sequence of codepoints.  Again, we could add a 
clarifying sentence to the report.
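A tiny illustration of codepoint-by-codepoint comparison, assuming identifiers are compared like ordinary `String`s as described above: the two spellings of "café" below look the same but are different sequences. `sameIdentifier` is a hypothetical name for that comparison.

```haskell
-- Precomposed U+00E9 versus 'e' followed by combining acute U+0301.
precomposed, decomposed :: String
precomposed = "caf\x00E9"
decomposed = "cafe\x0301"

-- No normalisation: plain equality on the codepoint sequences.
sameIdentifier :: String -> String -> Bool
sameIdentifier = (==)
```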

Cheers,
Simon


