subject:"Re\: RFC\: Unicode primes and super\/subscript characters in GHC"

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-27 Thread John Meacham

Yeah, I specifically excluded ascii prime (') from special handling in
jhc due to its already overloaded meaning in haskell. I just added the
subscript/superscript ones to the 'trailing' character class.

John

On Wed, Jun 25, 2014 at 12:54 PM, Mikhail Vorozhtsov
 wrote:
> Isn't it weird that you can't write `a₁'`? I was considering proposing
>
> varid -> (small { small | large | digit | ' | primes } { subsup | primes })
> (EXCEPT reservedid)
>
> but felt that it would be odd to allow primes in the middle of an identifier
> but not super/subscripts. I wish we could just abandon things like `a'bc'd`
> altogether...
>
>
> On 06/15/2014 03:58 AM, John Meacham wrote:
>>
>> I have this feature in jhc, where I have a 'trailing' character class
>> that can appear at the end of both symbols and ids.
>>
>> currently it consists of
>>
>>   $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]
>>
>>   John
>>
>> On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
>>  wrote:
>>>
>>> Hello lists,
>>>
>>> As some of you may know, GHC's support for Unicode characters in lexemes
>>> is
>>> rather crude and hence prone to inconsistencies in their handling versus
>>> the
>>> ASCII counterparts. For example, APOSTROPHE is treated differently from
>>> PRIME:
>>>
>>> λ> data a +' b = Plus a b
>>> :3:9:
>>>  Unexpected type ‘b’
>>>  In the data declaration for ‘+’
>>>  A data declaration should have form
>>>data + a b c = ...
>>> λ> data a +′ b = Plus a b
>>>
>>> λ> let a' = 1
>>> λ> let a′ = 1
>>> :10:8: parse error on input ‘=’
>>>
>>> Also some rather bizarre looking things are accepted:
>>>
>>> λ> let ᵤxᵤy = 1
>>>
>>> In the spirit of improving things little by little I would like to
>>> propose:
>>>
>>> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
>>> APOSTROPHE, meaning the following alterations to the lexer:
>>>
>>> primes -> U+2032 | U+2033 | U+2034 | U+2057
>>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
>>> graphic -> small | large | symbol | digit | special | " | ' | primes
>>> varid -> (small { small | large | digit | ' | primes }) (EXCEPT
>>> reservedid)
>>> conid -> large { small | large | digit | ' | primes }
>>>
>>> 2. Introduce a new lexer nonterminal "subsup" that would include the
>>> Unicode
>>> sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")", Latin
>>> and
>>> Greek letters. And allow these characters to be used in names and
>>> operators:
>>>
>>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes |
>>> subsup )
>>> digit -> ascDigit | uniDigit (EXCEPT subsup)
>>> small -> ascSmall | uniSmall (EXCEPT subsup) | _
>>> large -> ascLarge | uniLarge (EXCEPT subsup)
>>> graphic -> small | large | symbol | digit | special | " | ' | primes |
>>> subsup
>>> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT
>>> reservedid)
>>> conid -> large { small | large | digit | ' | primes | subsup }
>>> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop |
>>> dashes)
>>> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
>>>
>>> If this proposal is received favorably, I'll write a patch for GHC based
>>> on
>>> my previous stab at the problem[2].
>>>
>>> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
>>> to
>>> the GHC users list.
>>>
>>> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
>>> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
>>> ___
>>> Glasgow-haskell-users mailing list
>>> Glasgow-haskell-users@haskell.org
>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
>>
>>
>



-- 
John Meacham - http://notanumber.net/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-25 Thread Mikhail Vorozhtsov


Isn't it weird that you can't write `a₁'`? I was considering proposing

varid -> (small { small | large | digit | ' | primes } { subsup | primes 
}) (EXCEPT reservedid)


but felt that it would be odd to allow primes in the middle of an 
identifier but not super/subscripts. I wish we could just abandon things 
like `a'bc'd` altogether...


On 06/15/2014 03:58 AM, John Meacham wrote:

I have this feature in jhc, where I have a 'trailing' character class
that can appear at the end of both symbols and ids.

currently it consists of

  $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]

  John

On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
 wrote:

Hello lists,

As some of you may know, GHC's support for Unicode characters in lexemes is
rather crude and hence prone to inconsistencies in their handling versus the
ASCII counterparts. For example, APOSTROPHE is treated differently from
PRIME:

λ> data a +' b = Plus a b
:3:9:
 Unexpected type ‘b’
 In the data declaration for ‘+’
 A data declaration should have form
   data + a b c = ...
λ> data a +′ b = Plus a b

λ> let a' = 1
λ> let a′ = 1
:10:8: parse error on input ‘=’

Also some rather bizarre looking things are accepted:

λ> let ᵤxᵤy = 1

In the spirit of improving things little by little I would like to propose:

1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
APOSTROPHE, meaning the following alterations to the lexer:

primes -> U+2032 | U+2033 | U+2034 | U+2057
symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
graphic -> small | large | symbol | digit | special | " | ' | primes
varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes }

2. Introduce a new lexer nonterminal "subsup" that would include the Unicode
sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")", Latin and
Greek letters. And allow these characters to be used in names and operators:

symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes |
subsup )
digit -> ascDigit | uniDigit (EXCEPT subsup)
small -> ascSmall | uniSmall (EXCEPT subsup) | _
large -> ascLarge | uniLarge (EXCEPT subsup)
graphic -> small | large | symbol | digit | special | " | ' | primes |
subsup
varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT
reservedid)
conid -> large { small | large | digit | ' | primes | subsup }
varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
consym -> (: {symbol | subsup}) (EXCEPT reservedop)

If this proposal is received favorably, I'll write a patch for GHC based on
my previous stab at the problem[2].

P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion to
the GHC users list.

[1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
[2] https://ghc.haskell.org/trac/ghc/ticket/5108
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users





___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-17 Thread John Meacham

Don't forget that for every line of haskell code on hackage there are
dozens of lines used internally within organizations where
compatibility beyond their target internal tools may not be a concern.
Deciding on a policy of allowing primes or whatnot within an
organization seems quite plausible and doesn't entail CPP concerns.

John

On Sun, Jun 15, 2014 at 5:26 PM, Mateusz Kowalczyk
 wrote:
> On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:
>> Hello lists,
>>
>> As some of you may know, GHC's support for Unicode characters in lexemes
>> is rather crude and hence prone to inconsistencies in their handling
>> versus the ASCII counterparts. For example, APOSTROPHE is treated
>> differently from PRIME:
>>
>> λ> data a +' b = Plus a b
>> :3:9:
>>  Unexpected type ‘b’
>>  In the data declaration for ‘+’
>>  A data declaration should have form
>>data + a b c = ...
>> λ> data a +′ b = Plus a b
>>
>> λ> let a' = 1
>> λ> let a′ = 1
>> :10:8: parse error on input ‘=’
>>
>> Also some rather bizarre looking things are accepted:
>>
>> λ> let ᵤxᵤy = 1
>>
>> In the spirit of improving things little by little I would like to propose:
>>
>> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
>> APOSTROPHE, meaning the following alterations to the lexer:
>>
>> primes -> U+2032 | U+2033 | U+2034 | U+2057
>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
>> graphic -> small | large | symbol | digit | special | " | ' | primes
>> varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
>> conid -> large { small | large | digit | ' | primes }
>>
>> 2. Introduce a new lexer nonterminal "subsup" that would include the
>> Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")",
>> Latin and Greek letters. And allow these characters to be used in names
>> and operators:
>>
>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes |
>> subsup )
>> digit -> ascDigit | uniDigit (EXCEPT subsup)
>> small -> ascSmall | uniSmall (EXCEPT subsup) | _
>> large -> ascLarge | uniLarge (EXCEPT subsup)
>> graphic -> small | large | symbol | digit | special | " | ' | primes |
>> subsup
>> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT
>> reservedid)
>> conid -> large { small | large | digit | ' | primes | subsup }
>> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
>> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
>>
>> If this proposal is received favorably, I'll write a patch for GHC based
>> on my previous stab at the problem[2].
>>
>> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
>> to the GHC users list.
>>
>> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
>> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
>> ___
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users@haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
>
> While personally I like the proposal (wanted prime and sub/sup scripts
> way too many times), I worry what this means for compatibility reasons:
> suddenly we'll have code that fails to build on 7.8 and before because
> someone using 7.9/7.10+ used ′ somewhere. Even using CPP based on
> version of the compiler used is not too great in this scenario because
> it doesn't bring significant practical advantage to justify the CPP
> clutter in code. If the choice is either extra lines due to CPP or using
> ‘'’ instead of ‘′’, I know which I'll go for.
>
> I also worry (although not based on anything particular you said)
> whether this will not change meaning of any existing programs. Does it
> only allow new programs?
>
> Will it be enabled by a pragma?
>
> I simply worry about how practical it will be to use for actual programs
> and libraries that will go out on Hackage and wider world, even if it is
> accepted.
>
> --
> Mateusz K.
> ___
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



-- 
John Meacham - http://notanumber.net/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-17 Thread Mikhail Vorozhtsov


On 06/17/2014 03:13 AM, Tsuyoshi Ito wrote:

Hello,

Mikhail Vorozhtsov  wrote:

I also worry (although not based on anything particular you said)
whether this will not change meaning of any existing programs. Does it
only allow new programs?

As far as I can see, no change in meaning. Some hacky operators and some
hacky identifiers would become illegal. And some nicer ones would become
legal.

I do not have an opinion for or against the proposal, but I just
wanted to point out that the proposal changes the meaning of some
programs, at least in theory.  The following function currently
evaluates to True, but with the proposed change, I think that it will
evaluate to False.

test =
   let a = ()
   b = ()
   a′b = False in
   let c′d = True in
   a′b

Best regards,
Tsuyoshi

Good catch. Indeed, PRIME will no longer be a valid operator.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-16 Thread Tsuyoshi Ito

Hello,

Mikhail Vorozhtsov  wrote:
>> I also worry (although not based on anything particular you said)
>> whether this will not change meaning of any existing programs. Does it
>> only allow new programs?
>
> As far as I can see, no change in meaning. Some hacky operators and some
> hacky identifiers would become illegal. And some nicer ones would become
> legal.

I do not have an opinion for or against the proposal, but I just
wanted to point out that the proposal changes the meaning of some
programs, at least in theory.  The following function currently
evaluates to True, but with the proposed change, I think that it will
evaluate to False.

test =
  let a = ()
  b = ()
  a′b = False in
  let c′d = True in
  a′b

Best regards,
Tsuyoshi
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-16 Thread Mikhail Vorozhtsov


On 06/16/2014 04:26 AM, Mateusz Kowalczyk wrote:

On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:

Hello lists,

As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude and hence prone to inconsistencies in their handling
versus the ASCII counterparts. For example, APOSTROPHE is treated
differently from PRIME:

λ> data a +' b = Plus a b
:3:9:
  Unexpected type ‘b’
  In the data declaration for ‘+’
  A data declaration should have form
data + a b c = ...
λ> data a +′ b = Plus a b

λ> let a' = 1
λ> let a′ = 1
:10:8: parse error on input ‘=’

Also some rather bizarre looking things are accepted:

λ> let ᵤxᵤy = 1

In the spirit of improving things little by little I would like to propose:

1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
APOSTROPHE, meaning the following alterations to the lexer:

primes -> U+2032 | U+2033 | U+2034 | U+2057
symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
graphic -> small | large | symbol | digit | special | " | ' | primes
varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes }

2. Introduce a new lexer nonterminal "subsup" that would include the
Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")",
Latin and Greek letters. And allow these characters to be used in names
and operators:

symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes |
subsup )
digit -> ascDigit | uniDigit (EXCEPT subsup)
small -> ascSmall | uniSmall (EXCEPT subsup) | _
large -> ascLarge | uniLarge (EXCEPT subsup)
graphic -> small | large | symbol | digit | special | " | ' | primes |
subsup
varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT
reservedid)
conid -> large { small | large | digit | ' | primes | subsup }
varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
consym -> (: {symbol | subsup}) (EXCEPT reservedop)

If this proposal is received favorably, I'll write a patch for GHC based
on my previous stab at the problem[2].

P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
to the GHC users list.

[1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
[2] https://ghc.haskell.org/trac/ghc/ticket/5108
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


While personally I like the proposal (wanted prime and sub/sup scripts
way too many times), I worry what this means for compatibility reasons:
suddenly we'll have code that fails to build on 7.8 and before because
someone using 7.9/7.10+ used ′ somewhere. Even using CPP based on
version of the compiler used is not too great in this scenario because
it doesn't bring significant practical advantage to justify the CPP
clutter in code. If the choice is either extra lines due to CPP or using
‘'’ instead of ‘′’, I know which I'll go for.


Currently GHC categorizes Unicode PRIME as a "symbol", which means that 
it is allowed to appear only in operators (varsym and consym). So yes, 
if somebody is using things like "+′" or ":+′" (and they really 
shouldn't), they would be hit by this change. Identifiers like "ᵤx" 
would become illegal too. I'd be surprised to find an actual Hackage 
library that does that though.




I also worry (although not based on anything particular you said)
whether this will not change meaning of any existing programs. Does it
only allow new programs?


As far as I can see, no change in meaning. Some hacky operators and some 
hacky identifiers would become illegal. And some nicer ones would become 
legal.




Will it be enabled by a pragma?


No, GHC accepts Unicode input without any pragmas.



I simply worry about how practical it will be to use for actual programs
and libraries that will go out on Hackage and wider world, even if it is
accepted.


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-16 Thread Herbert Valerio Riedel

On 2014-06-16 at 02:26:49 +0200, Mateusz Kowalczyk wrote:

[...]

> While personally I like the proposal (wanted prime and sub/sup scripts
> way too many times), I worry what this means for compatibility reasons:
> suddenly we'll have code that fails to build on 7.8 and before because
> someone using 7.9/7.10+ used ′ somewhere. 

Fwiw, we already had that situation. The following code

{-# LANGUAGE UnicodeSyntax #-}

module Foo where

x₀ ∷ Double
x₀ = 1.5

compiles with GHC ≥ 7.2, but with GHC 7.0 fails with:

GHCi, version 7.0.4: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Foo  ( foo.hs, interpreted )

foo.hs:5:2: lexical error at character '\8320'
Failed, modules loaded: none.


Cheers,
  hvr
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-15 Thread Mateusz Kowalczyk

On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:
> Hello lists,
> 
> As some of you may know, GHC's support for Unicode characters in lexemes 
> is rather crude and hence prone to inconsistencies in their handling 
> versus the ASCII counterparts. For example, APOSTROPHE is treated 
> differently from PRIME:
> 
> λ> data a +' b = Plus a b
> :3:9:
>  Unexpected type ‘b’
>  In the data declaration for ‘+’
>  A data declaration should have form
>data + a b c = ...
> λ> data a +′ b = Plus a b
> 
> λ> let a' = 1
> λ> let a′ = 1
> :10:8: parse error on input ‘=’
> 
> Also some rather bizarre looking things are accepted:
> 
> λ> let ᵤxᵤy = 1
> 
> In the spirit of improving things little by little I would like to propose:
> 
> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as 
> APOSTROPHE, meaning the following alterations to the lexer:
> 
> primes -> U+2032 | U+2033 | U+2034 | U+2057
> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
> graphic -> small | large | symbol | digit | special | " | ' | primes
> varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
> conid -> large { small | large | digit | ' | primes }
> 
> 2. Introduce a new lexer nonterminal "subsup" that would include the 
> Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")", 
> Latin and Greek letters. And allow these characters to be used in names 
> and operators:
> 
> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | 
> subsup )
> digit -> ascDigit | uniDigit (EXCEPT subsup)
> small -> ascSmall | uniSmall (EXCEPT subsup) | _
> large -> ascLarge | uniLarge (EXCEPT subsup)
> graphic -> small | large | symbol | digit | special | " | ' | primes | 
> subsup
> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT 
> reservedid)
> conid -> large { small | large | digit | ' | primes | subsup }
> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
> 
> If this proposal is received favorably, I'll write a patch for GHC based 
> on my previous stab at the problem[2].
> 
> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion 
> to the GHC users list.
> 
> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
> ___
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> 

While personally I like the proposal (wanted prime and sub/sup scripts
way too many times), I worry what this means for compatibility reasons:
suddenly we'll have code that fails to build on 7.8 and before because
someone using 7.9/7.10+ used ′ somewhere. Even using CPP based on
version of the compiler used is not too great in this scenario because
it doesn't bring significant practical advantage to justify the CPP
clutter in code. If the choice is either extra lines due to CPP or using
‘'’ instead of ‘′’, I know which I'll go for.

I also worry (although not based on anything particular you said)
whether this will not change meaning of any existing programs. Does it
only allow new programs?

Will it be enabled by a pragma?

I simply worry about how practical it will be to use for actual programs
and libraries that will go out on Hackage and wider world, even if it is
accepted.

-- 
Mateusz K.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-14 Thread John Meacham

I have this feature in jhc, where I have a 'trailing' character class
that can appear at the end of both symbols and ids.

currently it consists of

 $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]

 John

On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
 wrote:
> Hello lists,
>
> As some of you may know, GHC's support for Unicode characters in lexemes is
> rather crude and hence prone to inconsistencies in their handling versus the
> ASCII counterparts. For example, APOSTROPHE is treated differently from
> PRIME:
>
> λ> data a +' b = Plus a b
> :3:9:
> Unexpected type ‘b’
> In the data declaration for ‘+’
> A data declaration should have form
>   data + a b c = ...
> λ> data a +′ b = Plus a b
>
> λ> let a' = 1
> λ> let a′ = 1
> :10:8: parse error on input ‘=’
>
> Also some rather bizarre looking things are accepted:
>
> λ> let ᵤxᵤy = 1
>
> In the spirit of improving things little by little I would like to propose:
>
> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
> APOSTROPHE, meaning the following alterations to the lexer:
>
> primes -> U+2032 | U+2033 | U+2034 | U+2057
> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
> graphic -> small | large | symbol | digit | special | " | ' | primes
> varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
> conid -> large { small | large | digit | ' | primes }
>
> 2. Introduce a new lexer nonterminal "subsup" that would include the Unicode
> sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")", Latin and
> Greek letters. And allow these characters to be used in names and operators:
>
> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes |
> subsup )
> digit -> ascDigit | uniDigit (EXCEPT subsup)
> small -> ascSmall | uniSmall (EXCEPT subsup) | _
> large -> ascLarge | uniLarge (EXCEPT subsup)
> graphic -> small | large | symbol | digit | special | " | ' | primes |
> subsup
> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT
> reservedid)
> conid -> large { small | large | digit | ' | primes | subsup }
> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
>
> If this proposal is received favorably, I'll write a patch for GHC based on
> my previous stab at the problem[2].
>
> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion to
> the GHC users list.
>
> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
> ___
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



-- 
John Meacham - http://notanumber.net/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

Re: RFC: Unicode primes and super/subscript characters in GHC

9 matches

Site Navigation

Mail list logo

Footer information