On Mon, Mar 19, 2012 at 4:34 AM, Simon Marlow <[email protected]> wrote:
>> On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh <[email protected]> wrote:
>> > Hi Gaby,
>> >
>> > On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
>> >>
>> >> OK, thanks! I guess a takeaway from this discussion is that what counts
>> >> as punctuation is far less well defined than it appears...
>> >
>> > I'm not really sure what you're asking. Haskell's uniSymbol includes
>> > all Unicode characters (should that be codepoints? I'm not a Unicode
>> > expert) in the punctuation category; I'm not sure what the best
>> > reference is, but e.g. table 12 in
>> > http://www.unicode.org/reports/tr44/tr44-8.html#Property_Values
>> > lists a number of Px categories, and a meta-category P "Punctuation".
>> >
>> >
>> > Thanks
>> > Ian
>> >
>>
>> Hi Ian,
>>
>> I guess what I am asking was partly summarized in Iavor's message.
>>
>> For me, the issue started with bullet number 4 in section 1.1
>>
>> http://www.haskell.org/onlinereport/intro.html#sect1.1
>>
>> which states that:
>>
>> The lexical structure captures the concrete representation
>> of Haskell programs in text files.
>>
>> That, combined with the opening of section 2.1 (e.g. the example of terminal
>> syntax) and the fact that the grammar routinely describes pairs of
>> nonterminals, ascXXX (for ASCII characters) and uniXXX (for Unicode
>> characters), suggested that the concrete syntax of Haskell programs in text
>> files is in the ASCII charset. Note this does not conflict with the general
>> statement that Haskell programs use the Unicode character set, because the
>> uniXXX could use the ASCII charset to introduce Unicode characters -- this is
>> not uncommon practice for programming languages using Unicode characters; see
>> the link I gave earlier.
>>
>> However, if I understand Malcolm's message correctly, this is not the case.
>> Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
>> representation of Haskell programs in text files. What it does is to
>> capture the structure of what is obtained from interpreting, *in some
>> unspecified encoding or unspecified alphabet*, the concrete
>> representation of Haskell programs in text files. This conclusion is
>> unfortunate, but I believe it is correct.
>> Since the encoding or the alphabet is unspecified, it is no longer
>> necessarily the case that two Haskell implementations would agree on the
>> same lexical interpretation when presented with the same exact text file
>> containing a Haskell program.
>>
>> In its current form, you are correct that the Report should say "codepoints"
>> instead of "characters".
>>
>> I join Iavor's request in clarifying the alphabet used in the grammar.
>
> The report gives meaning to a sequence of codepoints only; it says nothing
> about how that sequence of codepoints is represented as a string of bytes in
> a file, nor does it say anything about what those files are called, or even
> whether there are files at all.

Thanks, Simon.

The fact that the Report is silent about the encoding used to
represent concrete Haskell programs in text files adds
a certain level of non-portability (and confusion). I found
last night that a proposal has been made to add some
support for encoding specification:

http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource

I believe that is a good start. What are the odds of it being considered
for Haskell 2012? I suspect the pragma proposal works only if something
is said about the position of that pragma in the source file (e.g. it
must be on the first line, or within the first N bytes of the source file);
otherwise we have an infinite regress: one must already know the encoding
in order to read the pragma that declares it.
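
To make the non-portability concrete, here is a minimal sketch. It is my
own illustration, not something taken from the Report or the proposal. The
names decodeLatin1, decodeUtf8 and sample are made up for the example, and
the UTF-8 decoder is deliberately incomplete: it handles only one- and
two-byte sequences and does not reject overlong forms. The same two bytes
give a single punctuation codepoint under UTF-8, but an uppercase letter
followed by a punctuation codepoint under Latin-1, so two implementations
could presumably tokenize the same file differently.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (GeneralCategory (..), chr, generalCategory)
import Data.Word (Word8)

-- Latin-1: every byte is exactly one codepoint.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Illustration only: decodes 1- and 2-byte UTF-8 sequences and gives up
-- (Nothing) on anything else. Overlong encodings are not rejected.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUtf8 bs
decodeUtf8 (b1 : b2 : bs)
  | b1 .&. 0xE0 == 0xC0, b2 .&. 0xC0 == 0x80 =
      let cp = (fromIntegral (b1 .&. 0x1F) `shiftL` 6)
                 .|. fromIntegral (b2 .&. 0x3F)
      in (chr cp :) <$> decodeUtf8 bs
decodeUtf8 _ = Nothing

-- Hypothetical file contents: the two bytes C2 AB.
sample :: [Word8]
sample = [0xC2, 0xAB]

-- Under UTF-8: one codepoint, U+00AB, general category Pi (InitialQuote).
categoriesUtf8 :: Maybe [GeneralCategory]
categoriesUtf8 = map generalCategory <$> decodeUtf8 sample

-- Under Latin-1: U+00C2 (an uppercase letter) followed by U+00AB.
categoriesLatin1 :: [GeneralCategory]
categoriesLatin1 = map generalCategory (decodeLatin1 sample)
```

The Report's grammar applies equally well to either codepoint sequence; it
just has no way to say which of the two the file "really" contains.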
>
> Perhaps some clarification is in order in a future revision, and we should
> use the correct terminology where appropriate. We should also clarify that
> "punctuation" means exactly the Punctuation class.
That would be great. Do you have any comment about the
UnicodeInHaskellSource proposal?
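
For what it is worth, here is how I would read "exactly the Punctuation
class", as a small sketch against Data.Char from base. The names
punctuationCategories, inPunctuationClass and agreesWithBase are mine, and
the answers depend on whichever Unicode version the implementation's
Data.Char tables were generated from, which is an assumption the Report
does not pin down.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isPunctuation)

-- The seven Px general categories that make up the meta-category P.
punctuationCategories :: [GeneralCategory]
punctuationCategories =
  [ ConnectorPunctuation
  , DashPunctuation
  , OpenPunctuation
  , ClosePunctuation
  , InitialQuote
  , FinalQuote
  , OtherPunctuation
  ]

-- A codepoint is "punctuation" exactly when its general category is one
-- of the Px categories above.
inPunctuationClass :: Char -> Bool
inPunctuationClass c = generalCategory c `elem` punctuationCategories

-- Sanity check against the predicate that base already provides.
agreesWithBase :: Char -> Bool
agreesWithBase c = inPunctuationClass c == isPunctuation c
```

Under this reading '!' and '_' are in the class, while '+' and '$' are not:
they belong to the Symbol categories instead.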
> With regards to normalisation and equivalence, my understanding is that
> Haskell does not support either: two identifiers are equal if and only if
> they are represented by the same sequence of codepoints. Again, we could add
> a clarifying sentence to the report.
>
Ugh.
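
A small sketch of what that rule implies, assuming identifiers are compared
as plain Haskell Strings (which compare codepoint by codepoint). The names
precomposed, decomposed, codepoints and sameIdentifier are invented for the
illustration; the two spellings would normally render identically on
screen.

```haskell
import Data.Char (ord)

-- "caf" followed by U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
precomposed :: String
precomposed = "caf\xE9"

-- "cafe" followed by U+0301 (COMBINING ACUTE ACCENT).
decomposed :: String
decomposed = "cafe\x301"

-- The codepoint sequence behind a spelling.
codepoints :: String -> [Int]
codepoints = map ord

-- No normalisation: equal if and only if the codepoint sequences match.
sameIdentifier :: String -> String -> Bool
sameIdentifier a b = codepoints a == codepoints b
```

Here sameIdentifier precomposed decomposed is False, even though the two
are canonically equivalent under Unicode normalisation.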
Writing a parser for Haskell was an interesting exercise :-)
-- Gaby