Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 11:42 AM, Christian Siefkes
 wrote:

> Also, that example is not really an argument against using list functions on
> strings (which, by any reasonable definition, seem to be "sequences of
> characters" -- whether that sequence is represented as a list, an array, or
> something else, seems more like an implementation detail to me).

The correctness problem isn't that a list is used to represent a sequence.
The problem is that that representational detail (and I agree with you
it is an implementation of sequence) is made part of the API on strings.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 9:21 AM, Greg Weber  wrote:
> Can anyone explain how the tangent discussion of the finer points of
> Unicode and the value of teaching [Char] is relevant to the proposal
> under discussion? We aren't going to completely eliminate String and
> break most existing Haskell code as Simon said. String is just a list
> anyways, and lists are here to stay in Haskell.
>
> I would like to get back to working on the proposal and determining
> how Text can be added to the language.

The discussion started because of the question of whether Text should
support list processing functions at all, and if so how.  That is a
very legitimate question related to the Text proposal, at least if you
are concerned about correct semantics.  Once you are there, the
discussion about Unicode characters is unavoidable, and is very much
within the scope of discussing Text.

I may have missed the proposal to eliminate list from Haskell, though.

-- Gaby



Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 8:35 AM, Gábor Lehel  wrote:
> On Sun, Mar 25, 2012 at 5:19 AM, Greg Weber  wrote:
>> On Sat, Mar 24, 2012 at 7:26 PM, Gabriel Dos Reis
>>  wrote:
>>> On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber  wrote:
>>
>>>> Problem: we want to write beautiful (and possibly inefficient) code
>>>> that is easy to explain. If nothing else, this is pedagogically
>>>> important.
>>>> The goals of this code are to:
>>>>  * use list processing pattern matching and functions on a string type
>>>
>>> I may have missed this question so I will ask it (apologies if it is a
>>> repeat):  Why is it believed that list processing pattern matching is
>>> appropriate or the right tool for text processing?
>>
>> Nobody said it is the right tool for text processing. In fact, I think
>> we all agreed it is the wrong tool for many cases. But it is easy for
>> students to understand since they are already being taught to use
>> lists for everything else.  It would be great if you can talk with
>> teachers of Haskell and figure out a better way to teach text
>> processing.
>>
>
> I think a helpful question might be whether [Char] is mainly used to
> teach about lists, or whether it's mainly used to teach about how to
> do Unicode text processing correctly. If it's mainly used to teach
> about lists, pattern matching, etc., as I suspect, then the fine
> details of Unicode don't matter so much, you could even work with
> ASCII-only strings and it would work equally well for teaching about
> lists.

I agree that if the purpose is to teach list and list pattern matching,
it does not matter much what the element type is as long as it follows
reasonable constraints.  However, as someone observed earlier, the
Haskell Report is not a vehicle to prescribe how Haskell should be
taught or for what reasons Haskell should be taught.  That argument,
while it was made to support String = [Char] for pedagogical purposes,
is in fact a good argument against.

>  How to do Unicode text processing correctly is a topic that
> seems like it would become important much later, when someone's going
> to write code that's meant to be used in a production environment.
> Most students in an introductory university course probably don't get
> close to that point. If you do want to teach about how to do Unicode
> text processing correctly (which, for the record, is an important
> issue irrespective of which programming language you're using) then
> presumably you want to teach about Text, but hopefully your students
> will be more advanced by then and it won't be so much of a problem.

The Haskell Report claims very prominently that it uses the Unicode
character set.  The question is whether it should be using it correctly
at all, and if so, should it even try to pretend that its default string
type uses those characters correctly.

I do not subscribe to the notion that simple correct text processing
is something students would have to learn only in "advanced" classes
dedicated to Unicode.  In the region of this side of the Atlantic Ocean
where I teach, the student population is very diverse and I do not think
it would be responsible to stand in front of students and say:

  You are all welcome; this class is open to all cultures and we are
  committed to diversity and equal opportunity.  However, for the
  purpose of simplicity and pedagogy, we will refrain from looking at
  texts from this and other students.  If you are really interested,
  you should take an advanced class.  I hope you enjoy the class.

Furthermore, I am not convinced it is a good strategy to try hard to
reflect the notion that text processing is hard, either in the language
or in its presentation (e.g. it is an advanced topic; you need to be
advanced before we talk about it).

> I'm not really sure what that recommends in terms of policy. Mainly
> what you need is "it should be possible to work with lists of
> characters" and "it should be possible to work with Text", which we
> more-or-less have already. The important bits seem to be
> OverloadedStrings and ideally some way to avoid a pervasive API bias
> towards the wrong type (the tradeoffs there are probably more tricky).
> (So... basically what Simon M. said.)

-- Gaby



Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 7:29 AM, Christian Siefkes
 wrote:
> On 03/26/2012 01:26 PM, Gabriel Dos Reis wrote:
>> It is not the precision of Char or char that is the issue here.
>> It has been clarified at several points that Char is not a Unicode character,
>> but a Unicode code point.  Not every Unicode code point represents a
>> Unicode character, and not every sequence of Unicode code points
>> represents a character or a sequence of Unicode characters.
>
> What do you mean? Every Unicode character corresponds to one code point,

Yes, but this correspondence is not a bijection -- a great source of
confusion that permeates a lot of discussions about Unicode characters
and texts, including this one (and a previous one regarding the Haskell
Report).  Very much heartbreaking :-(

> and
> every code point in the range 0 to 0x10FFFF (excluding the range 0xD800 to
> 0xDFFF which is reserved for surrogate pairs in UTF-16, and a handful of
> "noncharacters", see
> http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Special_code_points
> ) corresponds to one character.
>
> Maybe your criticism is that Char does not explicitly prevent these special
> code points from being assigned? While true, that seems a relatively minor
> matter. Moreover, a future revision of the Haskell standard could easily
> declare that assigning a "forbidden" character results in an error/bottom
> if that is so desired.

It is not just a matter of clarifying that certain things are forbidden.
I believe it would be a great mistake to qualify it as minor.  How do you
handle normalization if you expose texts as sequences of unrelated code
points that can be freely taken apart and combined?
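
To make the normalization question concrete, here is a minimal sketch using
only the Prelude.  The names precomposed, decomposed, sameAsLists, lengths and
reversedWord are hypothetical, chosen for illustration.  It shows two code
point sequences for the same visible character comparing unequal as [Char],
and a generic list function separating a combining accent from its base
letter.

```haskell
-- Two spellings of the same user-perceived character "é":
-- precomposed U+00E9, versus 'e' followed by U+0301 (combining acute accent).
precomposed :: String
precomposed = "\x00E9"

decomposed :: String
decomposed = "e\x0301"

-- As lists of code points the two spellings are simply different lists.
sameAsLists :: Bool
sameAsLists = precomposed == decomposed              -- False

lengths :: (Int, Int)
lengths = (length precomposed, length decomposed)    -- (1, 2)

-- reverse knows nothing about combining sequences: the accent ends up at
-- the front of the result, detached from the 'e' it belonged to.
reversedWord :: String
reversedWord = reverse ("caf" ++ decomposed)
```

Nothing here is specific to reverse; any function that treats the string as a
list of independent elements (take, splitAt, filter) can cut a combining
sequence in the same way.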

- Gaby



Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 4:57 AM, Simon Marlow  wrote:

> Remember that FilePath is not part of the debate, since neither [Char] nor 
> Text are correct representations of FilePath.

Yes.

> If we want to do an evaluation of the pedagogical value of [Char] vs. Text, I 
> suggest writing something like a regex matcher in both and comparing the two.

> One more thing: historically, performance considerations have been given a 
> fairly low priority in the language design process for Haskell, and rightly 
> so.  That doesn't mean performance has been ignored altogether (for example, 
> seq), but it is almost never the case that a concession in other language 
> design principles (e.g. consistency, simplicity) is made for performance 
> reasons alone.  We should remember, when thinking about changes to Haskell, 
> that Haskell is the way it is because of this uncompromising attitude, and we 
> should be glad that Haskell is not burdened with (many) legacy warts that 
> were invented to work around performance problems that no longer exist.  I'm 
> not saying that this means we should ignore Text as a performance hack, just 
> that performance should not come at the expense of good language design.

For pedagogical purposes (which seems to be the primary argument for
String = [Char]), I am far less concerned about performance than
correctness.

After going through the discussion this morning again, looking at
various arguments, I am not really sure that Haskell isn't burdened
with legacy warts ;-)

-- Gaby



Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 5:08 AM, Christian Siefkes
 wrote:
> On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
>> True, but should the language definition default to a string type
>> that is one of the most unsuited for text processing in the 21st
>> century where global multilingualism abounds?  Even C has qualms
>> about that.
> ...
>> I have no doubt believing that if all texts my students have to
>> process are US ASCII, [Char] is more than sufficient.  So, I have
>> sympathy for your position.  However,  I doubt [Char] would be
>> adequate if I ask them to share texts from their diverse cultures.
>
> Uh, while a C char is (usually) just a byte (8 bits of information, like
> Word8 in Haskell), a Haskell Char is a Unicode character (21 bits of
> information).

It is not the precision of Char or char that is the issue here.
It has been clarified at several points that Char is not a Unicode character,
but a Unicode code point.  Not every Unicode code point represents a
Unicode character, and not every sequence of Unicode code points
represents a character or a sequence of Unicode characters.
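
As an illustration of the code point / character distinction, the following
sketch uses Data.Char from base.  The names loneSurrogate, nonCharacter and
categories are hypothetical.  It assumes GHC's behaviour, where Char covers
every code point from 0 to 0x10FFFF, including those that are not characters.

```haskell
import Data.Char (GeneralCategory (..), chr, generalCategory)

-- U+D800 is a surrogate code point: a legal Char value, but not a character.
loneSurrogate :: Char
loneSurrogate = chr 0xD800

-- U+FFFF is a designated noncharacter, yet it is also a legal Char value.
nonCharacter :: Char
nonCharacter = chr 0xFFFF

-- The general categories reported for the two values above and for 'a'.
categories :: [GeneralCategory]
categories = map generalCategory [loneSurrogate, nonCharacter, 'a']
-- [Surrogate, NotAssigned, LowercaseLetter]
```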

> A single C char cannot contain an arbitrary Unicode character,
> while a Haskell Char can, and does. Hence [Char] is (efficiency issues
> aside) perfectly adequate for dealing with texts written in arbitrary 
> languages.

See above.

-- Gaby



Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 6:54 PM, Henrik Nilsson  wrote:

> In any case, this is hardly the place to to discuss how to best
> teach Haskell or programming in general.

Sure, I haven't seen any disagreement with that.

Note however that the "pedagogical" argument was brought
in as support for the [Char] definition.  It is only natural that it is being
challenged on that ground.

> Nor is the Haskell standard a vehicle to prescribe how Haskell
> should be taught or for what reasons Haskell should be taught:

I have not seen any assertion to that effect.

> that can only be decided by individual educators based on their
> experience and given a specific teaching context.

True, but should the language definition default to a string type
that is one of the most unsuited for text processing in the 21st
century where global multilingualism abounds?  Even C has qualms
about that.

> Given intimate knowledge of our specific teaching context
> here at Nottingham, I can say that removing String = [Char]
> from the language wouldn't be helpful to us.

I have no doubt believing that if all texts my students have to
process are US ASCII, [Char] is more than sufficient.  So, I have
sympathy for your position.  However,  I doubt [Char] would be
adequate if I ask them to share texts from their diverse cultures.
Should the language definition make it much harder to share such
experience in the classroom when the primary argument for [Char]
is pedagogy?

-- Gaby



Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 4:03 PM, Daniel Peebles  wrote:
> On Sun, Mar 25, 2012 at 3:47 PM, Gabriel Dos Reis
>  wrote:
>>
>>
>> We are doing our students no favor, no good, in being condescending to
>> them, pretending that they can't handle teaching material that would
>> actually be close to real world experience.  If we truly believe that
>> they don't have enough time to learn what would really be useful to
>> them, then we are truly wasting their valuable time teaching them
>> things they would have to unlearn before writing good and correct
>> code.  The education would have been a complete failure and waste of
>> resources.
>>
>
> When people teach Haskell, it typically isn't to give them "real world
> experience", but to teach them an interesting programming language and all
> the great computer science it leads to.

Yes, but you have to frame it in the context of interesting problems;
otherwise it reduces to a dry, pointless, uninspiring series of functions
named f, g, h :-)


> Types, laziness, higher-order
> abstractions are the hard bits to learn, not a string-processing API.
>
> If people want to learn how to deal with unicode correctly, I can think of
> several better places to learn about it than a Haskell course.

I don't think anybody suggested that a Haskell course should be a substitute
for a Unicode course.  However, I maintain that it isn't an excuse to purposefully
teach something that the students have to unlearn.

> I don't think
> it's condescending or impractical to focus on the things that make Haskell
> unique, rather than teaching a unicode-correct API that could conceivably be
> written in any other language.

Why would a Unicode-correct API have to be written in any other
language and not Haskell?

> Learning that real human text cannot be
> treated as just an independent list of characters is something that takes
> minutes to hours at most: someone tells you that there are all sorts of
> exceptions to the list-of-chars paradigm, and then you read an article or
> two on the language-specific difficulties, learn to use specialized API
> functions, and then you get on with what you were actually trying to do.

Which brings us back to square zero: Is there any fundamental reason
why the language can't provide a good Unicode-correct API to illustrate
solutions to text processing problems (many of them interesting) that
can be used in introductory classes, instead of having to say the
above?  Students, like most children, learn by imitation.  They will
replicate whatever they are shown (for a long period of time, if not
forever).  If it is true that students don't have time, why should they
have to waste that scarce resource listening to the list-of-char
paradigm in the first place?

> So I think saying that ignoring unicode-correct strings is a complete failure
> and waste of resources is a bit hyperbolic, honestly.

It may be an inconvenient truth, but not hyperbole [ I see you are
trying to be polite :-) ].  Look at the almost permanent damage done by
the culture that equated 'char*' to strings.  It may be inconvenient to
say, but [Char] isn't any better -- in fact, I'll go further and say: it
is spreading the same damage, only with a different syntax.  The damage
is in the semantics; no amount of syntactic clothing will undo it.  I
realize that may appear to be a strong statement, but think about it.

-- Gaby



Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 2:08 PM, Edward Kmett  wrote:

> If anything we delude ourselves by overestimating the ability of kids just
> shortly out of highschool to assimilate an entire new worldview in a couple
> of weeks while they are distracted by other things. Any additional
> distraction that makes this harder is a serious pain point.

We are doing our students no favor, no good, in being condescending to
them, pretending that they can't handle teaching material that would actually
be close to real world experience.  If we truly believe that they don't have
enough time to learn what would really be useful to them, then we are
truly wasting their valuable time teaching them things they would have to
unlearn before writing good and correct code.  The education would have
been a complete failure and waste of resources.

> Consequently, in my experience, most instructors don't even go outside of
> the Prelude,

But is that even a good thing?

> except perhaps to introduce simple custom data types that their
> students define.

I would say that does not do students any good, nor does it do justice to
the language.  If one believes that Haskell is unlike any other "mainstream"
language, then it is an opportunity to show that it can handle beautifully and
flawlessly some real world problems whose solutions are more involved
in other more popular languages.  These days, most of the students are
strolling around with "smart phones" that have a lot of data in the form of
text (email, SMS chats, etc.).  What better real world data could you find at
a cheaper price to get them to experiment with?  Restricting oneself to
purely "academic" exercises, with no practical benefit whatsoever, would only
reinforce students' (mis)perception that the language isn't of any use to
them -- and they would probably indulge more in distractions, and it
would be hard to blame them.

> The goal in that period is to get the students accustomed
> to non-strictness, do some list processing, and hope that an understanding
> of well-founded recursion vs. productive corecursion sticks, because these
> are the things that you can't teach well in another language and which are
> useful to the student no matter what tools they wind up using in the future.

I would say that is even more reason to get them to learn something that
they would not have to unlearn in order to remain harmless :-)

>
> I would rather extra time be spent trying to get the users up to speed on
> the really interesting and novel parts of the language, such as typeclasses
> and monads in particular, than lose at least a quarter of my time fiddling
> about with text processing, a special case API and qualified imports,
> because those couple of weeks are going to shape many of those students'
> opinion of the language forever.

More reason not to show them anything that would reinforce the idea that
the language should not be taken seriously and is a complete waste of time.

You would be surprised to learn how bored students are *because* we obsess
too much on trying to simplify their lives, while they are craving for us
to make it more interesting, more challenging.

-- Gaby



Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 2:08 PM, Edward Kmett  wrote:
> On Sun, Mar 25, 2012 at 11:42 AM, Gabriel Dos Reis
>  wrote:
>>
>> Perhaps we are underestimating their competences  and are
>> complicating their lives unnecessarily...
>
>
> Have you ever actually taught an introductory languages course?

Yes, and Haskell (if you asked); that is part of my daytime job.

-- Gaby



Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 10:19 PM, Greg Weber  wrote:
> On Sat, Mar 24, 2012 at 7:26 PM, Gabriel Dos Reis
>  wrote:
>> On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber  wrote:
>
>>> Problem: we want to write beautiful (and possibly inefficient) code
>>> that is easy to explain. If nothing else, this is pedagogically
>>> important.
>>> The goals of this code are to:
>>>  * use list processing pattern matching and functions on a string type
>>
>> I may have missed this question so I will ask it (apologies if it is a
>> repeat):  Why is it believed that list processing pattern matching is
>> appropriate or the right tool for text processing?
>
> Nobody said it is the right tool for text processing. In fact, I think
> we all agreed it is the wrong tool for many cases.

Hmm, I would have thought that would be reason enough not
to use that method -- "wrong methods" are hard to unlearn
and to get rid of.

> But it is easy for students to understand since they are already being
> taught to use lists for everything else.

Perhaps we are underestimating their competences  and are complicating
their lives unnecessarily...

> It would be great if you can talk with
> teachers of Haskell and figure out a better way to teach text
> processing.

My suspicion is that teachers of Haskell would want designers
of Haskell to make the good datatype for text the default :-) :-)

-- Gaby



Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber  wrote:
> # Switching to Text by default makes us embarrassed!

Text processing /is/ quick to lead to embarrassment :-)

> Problem: we want to write beautiful (and possibly inefficient) code
> that is easy to explain. If nothing else, this is pedagogically
> important.
> The goals of this code are to:
>  * use list processing pattern matching and functions on a string type

I may have missed this question so I will ask it (apologies if it is a
repeat):  Why is it believed that list processing pattern matching is
appropriate or the right tool for text processing?


-- Gaby



Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 8:51 PM, Johan Tibell  wrote:
> On Sat, Mar 24, 2012 at 5:54 PM, Gabriel Dos Reis
>  wrote:
>> I think there is a confusion here.  A Unicode character is an abstract
>> entity.  For it to exist in some concrete form in a program, you need
>> an encoding.  The fact that char16_t is 16 bits wide is irrelevant to
>> whether it can be used in a representation of a Unicode text, just like
>> uint8_t (e.g. 'unsigned char') can be used to encode a Unicode string
>> despite being only 8 bits wide.   You do not need to make the
>> character type exactly equal to the type of the individual element
>> in the text representation.
>
> Well, if you have a >21-bit type you can declare its value to be a
> Unicode code point (which are numbered.)

That is correct.  Because not all Unicode code points represent characters,
and not all Unicode code point sequences represent valid characters,
even if you have that >21-bit type T, the list type [T] would still not be a
good string type.

> Using a char* that you claim
> contains utf-8 encoded data is bad for safety, as there is no guarantee
> that that's indeed the case.

Indeed, and that is why a Text should be an abstract datatype, hiding
the concrete implementation away from the user.
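
A minimal sketch of what "abstract" could mean here, assuming only base.
SafeText, MkSafeText, fromChars and toChars are hypothetical names and are not
the API of the text package.  The invariant checked (no surrogate code points)
is just one example of a property a list type cannot guarantee.

```haskell
import Data.Char (GeneralCategory (Surrogate), generalCategory)

-- Hypothetical abstract text type.  In a real module the export list would
-- name SafeText, fromChars and toChars, but not the MkSafeText constructor,
-- so clients could not build or take apart values directly.
newtype SafeText = MkSafeText [Char]
  deriving (Eq)

-- Smart constructor: the only way in, so the invariant always holds.
fromChars :: [Char] -> Maybe SafeText
fromChars cs
  | any isSurrogate cs = Nothing
  | otherwise          = Just (MkSafeText cs)
  where
    isSurrogate c = generalCategory c == Surrogate

-- Explicit, read-only view of the contents.
toChars :: SafeText -> [Char]
toChars (MkSafeText cs) = cs
```

Because the representation is hidden, the list inside could later be replaced
by a packed array without changing any client code.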

>> Note also that an encoding itself (whether UTF-8, UTF-16, etc.) is
>> insufficient as far as text processing goes; you also need a
>> localization at the minimum.  It is the combination of the two that
>> gives some meaning to text representation and operations.
>
> text does that via ICU. Some operations would be possible without
> using the locale, if it wasn't for those Turkish i:s. :/

Yeah, 7 bits should be enough for every character ;-)

-- Gaby



Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 7:16 PM, Johan Tibell  wrote:
> On Sat, Mar 24, 2012 at 4:42 PM, Gabriel Dos Reis
>  wrote:
>> Hmm, std::u16string, std::u32string, and std::wstring are C++ standard
>> types to process Unicode texts.
>
> Note that at least u16string is too small to encode all of Unicode and
> wstring might be as 16 bits is not enough to encode all of Unicode.
>

I think there is a confusion here.  A Unicode character is an abstract
entity.  For it to exist in some concrete form in a program, you need
an encoding.  The fact that char16_t is 16 bits wide is irrelevant to
whether it can be used in a representation of a Unicode text, just like
uint8_t (e.g. 'unsigned char') can be used to encode a Unicode string
despite being only 8 bits wide.  You do not need to make the
character type exactly equal to the type of the individual element
in the text representation.
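
The point about element width can be sketched in Haskell, with Word16 and
Word8 standing in for char16_t and uint8_t.  The functions utf16 and utf8 are
hypothetical helpers written for this illustration; they follow the standard
UTF-16 and UTF-8 bit layouts but do not validate their input (for instance,
surrogate code points are not rejected).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word16, Word8)

-- Encode one code point as one or two 16-bit UTF-16 code units.
utf16 :: Char -> [Word16]
utf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))   -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]   -- low surrogate
  where
    n = ord c
    m = n - 0x10000

-- Encode one code point as one to four 8-bit UTF-8 code units.
utf8 :: Char -> [Word8]
utf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [ fromIntegral (0xC0 .|. (n `shiftR` 6)), cont n ]
  | n < 0x10000 = [ fromIntegral (0xE0 .|. (n `shiftR` 12))
                  , cont (n `shiftR` 6), cont n ]
  | otherwise   = [ fromIntegral (0xF0 .|. (n `shiftR` 18))
                  , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation unit: bits 10xxxxxx carrying the low six bits of x.
    cont x = fromIntegral (0x80 .|. (x .&. 0x3F))
```

For example, U+1D11E (musical symbol G clef) lies outside the 16-bit range,
yet utf16 represents it as the two units 0xD834 0xDD1E and utf8 as the four
units 0xF0 0x9D 0x84 0x9E.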

Now, if you want to make a one-to-one correspondence between
individual elements in a std::basic_string and a Unicode character,
you would of course go for char32_t, which might be wasteful
depending on the circumstances.  Text processing languages like Perl
have long decided to de-emphasize one-character-at-a-time processing.
For most common cases, it is just inefficient.  But, I also understand
that the efficiency argument may not be strong in the context of Haskell.
However, I believe particular attention must be paid to the correctness
of the semantics.

Note also that an encoding itself (whether UTF-8, UTF-16, etc.) is
insufficient as far as text processing goes; you also need a localization
at the minimum.  It is the combination of the two that gives some meaning
to text representation and operations.

I have been following the discussion, but I don't see anything said
about locales.
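
A small sketch of why locale matters, using Data.Char.toUpper from base.  The
names upperI and upperSharpS are hypothetical.  toUpper has type
Char -> Char, so it can take neither a locale nor return more than one code
point.

```haskell
import Data.Char (toUpper)

-- No locale parameter: the result is 'I' whatever the language of the text.
-- Turkish orthography expects U+0130 (capital I with dot above) instead.
upperI :: Char
upperI = toUpper 'i'

-- German sharp s (U+00DF): full Unicode uppercasing gives the two letters
-- "SS", which a Char -> Char function cannot return, so the character is
-- left unchanged.
upperSharpS :: Char
upperSharpS = toUpper '\x00DF'
```

A string API that is meant to uppercase text correctly therefore needs to
work on whole strings and to know which language rules apply.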

-- Gaby



Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 6:00 PM, Johan Tibell  wrote:

> C++'s char* is morally equivalent of our ByteString, not Text. There's
> no standardized C++ Unicode string type, ICU's UnicodeString is
> perhaps the closest to one.

Hmm, std::u16string, std::u32string, and std::wstring are C++ standard
types to process Unicode texts.

Anyway, my inclination is that having a proper string type in Haskell would
be a Good Thing.  Sometimes it is worth breaking the textbook.

In our local Haskell system for AVR microcontrollers, we explicitly made
String distinct from [Char] -- we cannot afford the memory
inefficiency that [Char] entails, just to represent simple strings.

-- Gaby



Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 5:33 PM, Freddie Manners  wrote:
> To add my tuppence-worth on this, addressed to no-one in particular:
>
> (1) I think getting hung up on UTF-8 correctness is a distraction here.  I
> can't imagine anyone suggesting that the C/C++ standards removed support for
> (char*) because it wasn't UTF-8 correct: sure, you'd recommend people use a
> different type when it matters, but the language standard itself shouldn't
> be driven by technical issues that don't affect most people most of the
> time.  I'm sure it's good engineering practice to worry about these things,
> but the standard isn't there to encourage good engineering practice.

C++ does not consider 'char*' as the type of a string.

It has a standard template std::basic_string that can be instantiated on
char (giving std::string) or on the encoding types (of Unicode characters)
char16_t, char32_t, and wchar_t, giving rise to u16string, u32string, and
wstring.  It has a large number of functions to manipulate a string as a
sequence (Haskell's status quo) or as a text, thanks to an elaborate
localization machinery.

-- Gaby, back to lurking mode



Re: What is a punctuation character?

2012-03-20 Thread Gabriel Dos Reis
On Tue, Mar 20, 2012 at 5:37 PM, Iavor Diatchki
 wrote:
> Hello,
>
> So I looked at what GHC does with Unicode and to me it seems quite
> reasonable:
>
> * The alphabet is Unicode code points, so a valid Haskell program is
> simply a list of those.
> * Combining characters are not allowed in identifiers, so no need for
> complex normalization rules: programs should always use the "short"
> version of a character, or be rejected.
> * Combining characters may appear in string literals, and there they
> are left "as is" without any modification (so some string literals may
> be longer than what's displayed in a text editor.)
>
> Perhaps this is simply what the report already states (I haven't
> checked, for which I apologize) but, if not, perhaps we should clarify
> things.
>
> -Iavor
> PS:  I don't think that there is any need to specify a particular
> representation for the unicode code-points (e.g., utf-8 etc.) in the
> language standard.

Thanks Iavor.

If the Report intended to talk about code points only (and indeed ruling
out normalization suggests that), then the Report needs to be
clarified.  As you know, there is a distinction between a Unicode code
point and a Unicode character:

http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf#G25564

Until I sent my original query, I had been reading the Report as meaning
Unicode characters (as the grammar seemed to suggest), but now it is
clear to me that only code points were intended.  That seemed to be
confirmed by your investigation of the GHC code base.

-- Gaby



Re: What is a punctuation character?

2012-03-19 Thread Gabriel Dos Reis
On Mon, Mar 19, 2012 at 5:36 AM, Brandon Allbery  wrote:
> On Mon, Mar 19, 2012 at 05:56, Gabriel Dos Reis
>  wrote:
>>
>> The fact that the Report is silent about encoding used to
>> represent concrete Haskell programs in text files adds
>> a certain level of non-portability (and confusion.)  I found
>
>
> Specifying the encoding can *also* limit portability, if you specify an
> encoding that is not widely supported on some target platform.

That is why I find the pragma suggestion attractive.

-- Gaby



Re: What is a punctuation character?

2012-03-19 Thread Gabriel Dos Reis
On Mon, Mar 19, 2012 at 4:34 AM, Simon Marlow  wrote:
>> On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh  wrote:
>> > Hi Gaby,
>> >
>> > On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
>> >>
>> >> OK, thanks!  I guess a take away from this discussion is that what is
>> >> a punctuation is far less well defined than it appears...
>> >
>> > I'm not really sure what you're asking. Haskell's uniSymbol includes
>> > all Unicode characters (should that be codepoints? I'm not a Unicode
>> > expert) in the punctuation category; I'm not sure what the best
>> > reference is, but e.g. table 12 in
>> >    http://www.unicode.org/reports/tr44/tr44-8.html#Property_Values
>> > lists a number of Px categories, and a meta-category P "Punctuation".
>> >
>> >
>> > Thanks
>> > Ian
>> >
>>
>> Hi Ian,
>>
>> I guess what I am asking was partly summarized in Iavor's message.
>>
>> For me, the issue started with bullet number 4 in section 1.1
>>
>>      http://www.haskell.org/onlinereport/intro.html#sect1.1
>>
>> which states that:
>>
>>        The lexical structure captures the concrete representation
>>        of Haskell programs in text files.
>>
>> That combined with the opening section 2.1 (e.g. example of terminal
>> syntax) and the fact that the grammar  routinely described two non-
>> terminals ascXXX (for ASCII characters) and uniXXX for (Unicode character)
>> suggested that the concrete syntax of Haskell programs in text files is in
>> ASCII charset.  Note this does not conflict with the general statement
>> that Haskell programs use the Unicode character because the uniXXX could
>> use the ASCII charset to introduce Unicode characters -- this is not
>> uncommon practice for programming languages using Unicode characters; see
>> the link I gave earlier.
>>
>> However, if I understand Malcolm's message correctly, this is not the
>> case.
>> Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
>> representation of Haskell programs in text files.  What it does is to
>> capture the structure of what is obtained from interpreting, *in some
>> unspecified encoding or unspecified alphabet*,  the concrete
>> representation of Haskell programs in text files.  This conclusion is
>> unfortunate, but I believe it is correct.
>> Since the encoding or the alphabet is unspecified, it is no longer
>> necessarily the case that two Haskell implementations would agree on the
>> same lexical interpretation when presented with the same exact text file
>> containing  a Haskell program.
>>
>> In its current form, you are correct that the Report should say
>> "codepoint"
>> instead of characters.
>>
>> I join Iavor's request in clarifying the alphabet used in the grammar.
>
> The report gives meaning to a sequence of codepoints only, it says nothing 
> about how that sequence of codepoints is represented as a string of bytes in 
> a file, nor does it say anything about what those files are called, or even 
> whether there are files at all.

Thanks, Simon.

The fact that the Report is silent about encoding used to
represent concrete Haskell programs in text files adds
a certain level of non-portability (and confusion.)  I found
last night that a proposal has been made to add some
support for encoding specification

http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource

I believe that is a good start.  What are the odds of it being considered
for Haskell 2012?  I suspect the pragma proposal works only if something
is said about the position of that pragma in the source file (e.g. it
must be on the first line, or within the first N bytes of the source file);
otherwise we have an infinite descent: the pragma that names the encoding
must itself be decoded before it can be read.


>
> Perhaps some clarification is in order in a future revision, and we should 
> use the correct terminology where appropriate.  We should also clarify that 
> "punctuation" means exactly the Punctuation class.

That would be great.  Do you have any comment about the
UnicodeInHaskellSource proposal?

> With regards to normalisation and equivalence, my understanding is that 
> Haskell does not support either: two identifiers are equal if and only if 
> they are represented by the same sequence of codepoints.  Again, we could add 
> a clarifying sentence to the report.
>

Ugh.
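
To make the consequence of that rule concrete, here is a minimal sketch
(plain base library only; the names precomposed, decomposed and
sameIdentifier are made up for illustration).  It assumes identifiers are
compared exactly as Simon describes, code point by code point, with no
normalisation step:

```haskell
import Data.Char (ord)

-- "café" spelled with the single precomposed code point U+00E9.
precomposed :: String
precomposed = "caf\xE9"

-- "café" spelled as 'e' followed by U+0301 (combining acute accent).
decomposed :: String
decomposed = "cafe\x301"

-- Hypothetical helper: identifier equality as plain code point equality,
-- i.e. no normalisation and no canonical-equivalence check.
sameIdentifier :: String -> String -> Bool
sameIdentifier a b = a == b

-- Show the underlying code points of a spelling.
codePoints :: String -> [Int]
codePoints = map ord
```

Both spellings typically render identically on screen, yet
sameIdentifier precomposed decomposed is False, so under this rule they
would be two distinct names.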

Writing a parser for Haskell was an interesting exercise :-)

-- Gaby



Re: What is a punctuation character?

2012-03-17 Thread Gabriel Dos Reis
On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh  wrote:
> Hi Gaby,
>
> On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
>>
>> OK, thanks!  I guess a take away from this discussion is that what
>> is a punctuation is far less well defined than it appears...
>
> I'm not really sure what you're asking. Haskell's uniSymbol includes all
> Unicode characters (should that be codepoints? I'm not a Unicode expert)
> in the punctuation category; I'm not sure what the best reference is,
> but e.g. table 12 in
>    http://www.unicode.org/reports/tr44/tr44-8.html#Property_Values
> lists a number of Px categories, and a meta-category P "Punctuation".
>
>
> Thanks
> Ian
>

Hi Ian,

I guess what I am asking was partly summarized in Iavor's message.

For me, the issue started with bullet number 4 in section 1.1

 http://www.haskell.org/onlinereport/intro.html#sect1.1

which states that:

   The lexical structure captures the concrete representation
   of Haskell programs in text files.

That, combined with the opening of section 2.1 (e.g. the example of terminal
syntax) and the fact that the grammar routinely describes two families of
non-terminals, ascXXX (for ASCII characters) and uniXXX (for Unicode
characters), suggested that the concrete syntax of Haskell programs in text
files is in the ASCII charset.  Note this does not conflict with the
general statement that Haskell programs use the Unicode character set,
because the uniXXX could use the ASCII charset to introduce Unicode
characters -- this is not uncommon practice for programming languages
using Unicode characters; see the link I gave earlier.

However, if I understand Malcolm's message correctly, this is not the case.
Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
representation of Haskell programs in text files.  What it does is to capture
the structure of what is obtained from interpreting, *in some unspecified
encoding or unspecified alphabet*,  the concrete representation of Haskell
programs in text files.  This conclusion is unfortunate, but I believe
it is correct.
Since the encoding or the alphabet is unspecified, it is no longer necessarily
the case that two Haskell implementations would agree on the same lexical
interpretation when presented with the exact same text file containing
a Haskell program.
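
As an illustration of that last point, here is a rough sketch (base library
only) that decodes the same two bytes under two different assumed encodings
and looks at the Unicode general category of the result.  The decoders are
toys written for this example, not anything an implementation is known to
use; decodeUtf8Toy handles only 1- and 2-byte sequences and does no
validation:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (GeneralCategory (..), chr, generalCategory)
import Data.Word (Word8)

-- Interpret every byte as one code point (ISO 8859-1 reading).
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Toy UTF-8 reading: combines a 2-byte sequence (110xxxxx 10xxxxxx)
-- into one code point; every other byte is passed through unchanged.
decodeUtf8Toy :: [Word8] -> String
decodeUtf8Toy (b1 : b2 : rest)
  | b1 .&. 0xE0 == 0xC0 =
      chr (((fromIntegral b1 .&. 0x1F) `shiftL` 6) .|. (fromIntegral b2 .&. 0x3F))
        : decodeUtf8Toy rest
decodeUtf8Toy (b : rest) = chr (fromIntegral b) : decodeUtf8Toy rest
decodeUtf8Toy [] = []

-- The same file contents, as raw bytes.
sampleBytes :: [Word8]
sampleBytes = [0xC3, 0xA9]

-- General categories of the decoded code points.
categories :: String -> [GeneralCategory]
categories = map generalCategory
```

Read as UTF-8, the bytes are the single code point U+00E9, a lowercase
letter; read as ISO 8859-1, they are U+00C3 (an uppercase letter) followed
by U+00A9 (a symbol).  A lexer driven by Unicode categories would therefore
see different lexemes depending on which decoding it assumed.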

In its current form, you are correct that the Report should say "codepoint"
instead of characters.

I join Iavor's request in clarifying the alphabet used in the grammar.

Thanks,

-- Gaby



Re: What is a punctuation character?

2012-03-16 Thread Gabriel Dos Reis
On Fri, Mar 16, 2012 at 6:00 PM, Malcolm Wallace  wrote:
>>> no purpose to a completely overlapping category unless it is intended to
>>> relate to an earlier standard (say Haskell 1.4).
>
> I believe all Haskell Reports, even since 1.0, have specified that the 
> language "uses" Unicode.  If it helps to bring perspective to this 
> discussion, it is my impression that the initial designers of Haskell did not 
> know very much about Unicode, but wanted to avoid the trap of being stuck 
> with ASCII-only, and so decided to reference "whatever Unicode does", as the 
> most obvious and unambiguous way of not having to think about (or specify) 
> these lexical issues themselves.
>

OK.

>> One of the underlying questions is: what is the concrete syntax of a
>> Unicode character in a Haskell program?  Note that Chapter 2 goes to a great 
>> pain to
>> specify the ASCII concrete syntax.
>
> In my view, the Haskell Report is deliberately agnostic on concrete syntax 
> for Unicode, believing that to be outside the scope of a programming language 
> standard, whilst entirely within the scope of the Unicode standards body.

The trouble is that the Unicode standards body believes that the concrete
syntax is entirely within the scope of the programming language definition
(or of any client using Unicode characters), whilst largely restricting
itself to talking about code points, which are more abstract.  So, the trick
of referencing the Unicode standard is not satisfactory :-(

> Seeing as there are (in practice) numerous concrete representations of 
> Unicode (UTF-8 and other encodings), it is largely up to individual compiler 
> implementations which encodings they support for (a) source text, and (b) 
> input/output at runtime.

OK, thanks!  I guess a take away from this discussion is that what
is a punctuation is far less well defined than it appears...

A common practice (exemplified by the link I gave earlier) is to restrict the
concrete -syntax- of the input program to the ASCII charset, and use Unicode
escape sequences to include the entire Unicode charset.  It is common to use
\uNN or \UNN to introduce Unicode characters, but I suspect that is
out of the question for Haskell programs because it would clash with
lambda abstraction.
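
A small sketch of the clash, for concreteness (the names addOne and capitalA
are invented for this example).  Since a backslash introduces a lambda and
u0041 is an ordinary variable name, the text \u0041 is already meaningful
Haskell outside of literals:

```haskell
-- \u0041 here is a lambda binding the variable u0041, not an escape
-- sequence denoting the character 'A'.
addOne :: Int -> Int
addOne = \u0041 -> u0041 + 1

-- Inside character and string literals Haskell has its own numeric
-- escapes, e.g. \x41 (hexadecimal) for U+0041.
capitalA :: Char
capitalA = '\x41'
```

A Java-style translation pass that rewrote \u0041 before lexing would
silently change the meaning of the first definition, which is presumably
why such escapes are confined to literals.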

-- Gaby



Re: What is a punctuation character?

2012-03-16 Thread Gabriel Dos Reis
On Fri, Mar 16, 2012 at 3:22 PM, Brandon Allbery  wrote:
> On Fri, Mar 16, 2012 at 15:20, Gabriel Dos Reis
>  wrote:
>>
>> I believe this part has seen very little change from the Revised
>> Haskell 98 Report.
>
>
> I was in fact looking at the Haskell 98 report at the time.
>
>>
>> It is not clear that it is an unintended leftover.  Section 2.1 that
>
>
> Nothing is ever clear.  This useless pedanticism being stipulated, there is

I very much appreciate any clarification you have on the topic.  However, I
believe we do best when we leave phrases like "useless pedanticism"
or "pedantically" out.  They are rarely constructive and add no substance to
an otherwise informative discussion.  At best, they would distract us.

(In matters of programming language definition, "pedanticism" should be the
least of our worries -- and it probably should not come with a modifier
such as "useless"; we should probably wear it as a badge of honor.)

> no purpose to a completely overlapping category unless it is intended to
> relate to an earlier standard (say Haskell 1.4).

which in itself is not an unambiguous interpretation :-)

>>
>> Unicode support is clearly intended.  Also clearly, ASCII support is
>> intended.
>> However, the Report does not say what the concrete syntax of a Unicode
>> character
>> should be. (At least I have been unable to find it from the report.)
>
>
> Maybe what needs to be pedantically specified is that the link to the
> Unicode standard is intended to be inclusion of that standard by reference
> (the [11] in the section I quoted is an endnote referencing the Unicode
> standard) and not merely informational.  Or are you insisting we are not
> precise enough unless we enumerate all the Unicode characters explicitly in
> the Haskell standard?

Giving a link to the Unicode standard does not really help with the
original questions.  I know where to find the Unicode standard; that
wasn't the issue.

One of the underlying questions is: what is the concrete syntax of a
Unicode character in a Haskell program?  Note that Chapter 2 goes to great
pains to specify the ASCII concrete syntax.

To put things in perspective, have a look at this specification of
programs supposed to be written using Unicode characters.

   http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.2

-- Gaby



Re: What is a punctuation character?

2012-03-16 Thread Gabriel Dos Reis
On Fri, Mar 16, 2012 at 1:49 PM, Brandon Allbery  wrote:
> On Fri, Mar 16, 2012 at 14:30, Gabriel Dos Reis
>  wrote:
>>
>> It is not clear what "the language's lexemes are defined in terms of
>> Unicode properties"
>> really means.  Why would you need ascSmall (and similar ASCII
>> character categories) then
>> when you already have uniSmall and associates?
>
>
> I have to assume that is a leftover from an earlier version of the report,
> because it is indeed already included.

I believe this part has seen very little change from the Revised
Haskell 98 Report.  It is not clear that it is an unintended leftover.
Section 2.1, which you quote below, is the same as in the (Revised)
Haskell 98 Report.

> See in section 2.1:
>
> "Haskell uses the Unicode [11] character set. However, source programs are
> currently biased toward the ASCII character set used in earlier versions of
> Haskell."
>
> I understand this to indicate that Unicode character classes are intended,
> and it does indeed hint that references to ASCII are references to older
> versions of the language (and should probably be considered fossils, as
> ASCII itself is; the American Standard Code for Information Interchange was
> obsoleted by ISO 8859, and modern references to "ASCII" usually should be
> taken to mean "ISO 8859/1").

Unicode support is clearly intended.  Also clearly, ASCII support is intended.
However, the Report does not say what the concrete syntax of a Unicode character
should be. (At least I have been unable to find it in the Report.)

>>
>> It is not clear that (b) is all that "not particularly meaningful".
>> Have a look at the production
>> : it excludes double quote(") and apostrophe (') from uniSymbol.
>
>
> The notion of "symbol with certain lexicals that have other meanings *that
> are specified elsewhere in the report*" is not precise enough?  It may be
> difficult to characterize things with your required precision, since every
> general statement will necessarily have to carry part or potentially all of
> the entire Report within it if it is not sufficient to use the statement's
> context (as describing some part of the Report).

Well, I hope nobody is suggesting that it is unreasonable to require precision
of a language definition -- especially of Haskell! :-)

A problem with "use the statement's context" is that the contexts themselves
are not unquestionably unambiguous -- which is part of the reason we are having
this conversation in the first place.

That being said, I am not sure how the passage you quote applies here
or conclusively answers the original questions. Where else is punctuation
defined in the Report?  What is the concrete syntax of a punctuation?  If you
were going to write a lexer and a parser for Haskell, how would you recognize
a character as a punctuation?
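
For reference, under reading (a) one possible answer looks like the sketch
below.  It is only a sketch: it assumes the lexer already works on decoded
code points, it relies on the Unicode tables shipped with the base library
(Data.Char), and the names isUniSymbolCandidate, reserved and
isOperatorSymbolChar are invented here.  The reserved list is my reading of
the characters the grammar sets aside, including the double quote and
apostrophe exclusion mentioned earlier in this thread:

```haskell
import Data.Char (isPunctuation, isSymbol)

-- Reading (a): any code point whose Unicode general category is one of
-- the punctuation (P*) or symbol (S*) categories.
isUniSymbolCandidate :: Char -> Bool
isUniSymbolCandidate c = isPunctuation c || isSymbol c

-- Characters assumed to be reserved for other lexemes: the "special"
-- characters, underscore, double quote and apostrophe.
reserved :: String
reserved = "(),;[]`{}_\"'"

-- A character that may appear in an operator symbol under these
-- assumptions.
isOperatorSymbolChar :: Char -> Bool
isOperatorSymbolChar c = isUniSymbolCandidate c && c `notElem` reserved
```

With this, isOperatorSymbolChar '+' and isOperatorSymbolChar '\x2218'
(RING OPERATOR) are True, while '"' and 'a' are rejected.  None of this
says how the bytes of the source file became code points in the first
place.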

-- Gaby



Re: What is a punctuation character?

2012-03-16 Thread Gabriel Dos Reis
On Fri, Mar 16, 2012 at 1:18 PM, Brandon Allbery  wrote:
> On Fri, Mar 16, 2012 at 14:08, Gabriel Dos Reis
>  wrote:
>>
>> The lexical structure chapter defines the non-terminal uniSymbol as
>>
>>     uniSymbol ::= any Unicode symbol or punctuation
>>
>> There is a slight ambiguity here: is that description supposed to
>> be parsed as:
>>   (a) "Unicode (symbol or punctuation)", or
>>   (b) "(Unicode symbol) or punctuation"?
>
>
> (a) and I thought the report specified that the language's lexemes are
> defined in terms of Unicode properties so (a) is the only meaningful
> interpretation.  (b) is not particularly meaningful, as your own question
> demonstrates.

It is not clear what "the language's lexemes are defined in terms of
Unicode properties" really means.  Why would you need ascSmall (and
similar ASCII character categories) then when you already have uniSmall
and associates?

It is not clear that (b) is all that "not particularly meaningful".
Have a look at the production: it excludes double quote (") and
apostrophe (') from uniSymbol.

-- Gaby



What is a punctuation character?

2012-03-16 Thread Gabriel Dos Reis
Hi,

The lexical structure chapter defines the non-terminal uniSymbol as

 uniSymbol ::= any Unicode symbol or punctuation

There is a slight ambiguity here: is that description supposed to
be parsed as:
   (a) "Unicode (symbol or punctuation)", or
   (b) "(Unicode symbol) or punctuation"?

If (b), then what qualifies as "punctuation"?  As far as I can tell,
that is not defined anywhere in the Report.  Is it "punctuation" in the
basic ASCII charset or in the extended ASCII charset?  Everywhere
else the Report has been careful in listing which ASCII characters
are meant.

Thanks,

-- Gaby



Hierarchical module broken link

2012-03-15 Thread Gabriel Dos Reis
Hi,

The link to

  http://www.haskell.org/hierarchical-modules/

from

  http://hackage.haskell.org/trac/haskell-prime/wiki/HierarchicalModules

is broken (404 error).

Thanks,

-- Gaby



Re: Standard (core) libraries initiative: rationale

2006-12-02 Thread Gabriel Dos Reis
Laurent Deniau <[EMAIL PROTECTED]> writes:

[...]

| About the libraries, I should say that I was a bit disappointed by the
| common use of the terms genericity and polymorphism (even in
| books). For example I have read many times that "length" is
| polymorphic or generic while it only computes the length of a list. 

I suspect that reflects a common view, that the list datatype
is *the* materialization of the notion of sequence -- anything else is
either obscure, marginal, or a perversion :-/

| It probably reflects that standard libraries do not provide enough
| *generic* API through the use of classes and overloading. If you look
| at Java evolution (not my favorite language), a lot of classes have
| been converted into interfaces to improve its flexibility and its
| evolution. In Haskell, most functions should be member of a generic
| API through classes and overloaded for common types like list (the
| paper 'Software Extension and Integration with Type Classes' may help
| here). 

I agree with what you say.  However, the trouble is that a
decent job at categorizing and implementing containers with type
classes will require, at the minimum, multi-parameter type classes,
functional dependencies, associated types or equivalents.  Those
topics seem to be subject of heated debates among seasoned Haskellers.
Personally, I'm looking forward to a standardization of
multi-parameter type classes with functional dependencies -- at least a
blessing of the parts that "work".

Simon PJ has an excellent paper on "bulk types with class" that you
might want to consult for some thorny issues.
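
To give a flavour of what is meant, here is a minimal sketch of a
container class written with a multi-parameter type class and a functional
dependency.  It relies on GHC extensions that are not part of the standard,
and the class name Container and its methods size and elements are
hypothetical, chosen only for this example:

```haskell
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses  #-}

-- A container type c determines its element type e (the "c -> e" part).
class Container c e | c -> e where
  size     :: c -> Int
  elements :: c -> [e]

-- Lists are one instance among others, not the only notion of sequence.
instance Container [a] a where
  size     = length
  elements = id

-- Maybe seen as a container holding zero or one element.
instance Container (Maybe a) a where
  size     = maybe 0 (const 1)
  elements = maybe [] (\x -> [x])
```

With these instances, size works uniformly on "abc" and on Just 'x'.
Without the functional dependency, the type of size would be ambiguous,
which is one of the thorny points alluded to above.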

[...]

| Finally, the rules associated with monads are just there to
| allow common usage, that is the composition of monad (e.g. in do
| notation) and the encapsulation of the 'thing' when entering the
| monad.

And change the name "monad" to something less scary for common
programmers too? :-)

-- Gaby


Re: proposal: standardize interface to Haskell' implementations

2006-02-13 Thread Gabriel Dos Reis
"Claus Reinke" <[EMAIL PROTECTED]> writes:

[...]

| the point is to standardise an api to functionality that all
| haskell implementations will need in some form or other and that all
| haskell tools should be able to depend on.

Something along the lines of Template Haskell?

-- Gaby



Re: proposal: standardize interface to Haskell' implementations

2006-02-13 Thread Gabriel Dos Reis
Neil Mitchell <[EMAIL PROTECTED]> writes:

[...]

| Because of all this, if you make a standard like this, you basically
| dictate a large part of the implementation, and it seems no one wants
| to follow the same implementation path...

Indeed.  I'm not sure ASIS is as successful as it was intended to be.
Previous experience with DIANA (still Ada) was no more successful.

-- Gaby