Re: RFC: Unicode primes and super/subscript characters in GHC

Mikhail Vorozhtsov Wed, 25 Jun 2014 12:55:32 -0700

Isn't it weird that you can't write `a₁'`? I was considering proposing

varid -> (small { small | large | digit | ' | primes } { subsup | primes}) (EXCEPT reservedid)

but felt that it would be odd to allow primes in the middle of an identifier but not super/subscripts. I wish we could just abandon things like `a'bc'd` altogether...


On 06/15/2014 03:58 AM, John Meacham wrote:

I have this feature in jhc, where I have a 'trailing' character class
that can appear at the end of both symbols and ids.

currently it consists of

  $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]

  John
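To make the idea of a trailing class concrete, here is a minimal Haskell sketch. It is not jhc's actual implementation: the names `splitTrailing` and `trailingOnlyAtEnd` are hypothetical, and the requirement that the base part be non-empty is an assumption of this sketch. Only the character set is taken from the `$trailing` definition quoted above.

```haskell
-- Characters copied from the $trailing class quoted above.
trailing :: String
trailing = "₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋"

isTrailing :: Char -> Bool
isTrailing c = c `elem` trailing

-- Split a lexeme into its base part and its final run of trailing characters.
splitTrailing :: String -> (String, String)
splitTrailing s = (reverse revBase, reverse revTrail)
  where
    (revTrail, revBase) = span isTrailing (reverse s)

-- Hypothetical check: trailing-class characters may appear only at the end,
-- and (an assumption of this sketch) the base part must be non-empty.
trailingOnlyAtEnd :: String -> Bool
trailingOnlyAtEnd s = not (null base) && not (any isTrailing base)
  where
    (base, _) = splitTrailing s
```

With these definitions, `x₁₂` splits into the base `x` and the trailing run `₁₂`, while `x₁y` is rejected because a trailing-class character appears in the middle.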

On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
<mikhail.vorozht...@gmail.com> wrote:

Hello lists,

As some of you may know, GHC's support for Unicode characters in lexemes is
rather crude and hence prone to inconsistencies in their handling versus the
ASCII counterparts. For example, APOSTROPHE is treated differently from
PRIME:

λ> data a +' b = Plus a b
<interactive>:3:9:
     Unexpected type ‘b’
     In the data declaration for ‘+’
     A data declaration should have form
       data + a b c = ...
λ> data a +′ b = Plus a b

λ> let a' = 1
λ> let a′ = 1
<interactive>:10:8: parse error on input ‘=’

Also some rather bizarre looking things are accepted:

λ> let ᵤxᵤy = 1

In the spirit of improving things little by little I would like to propose:

1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
APOSTROPHE, meaning the following alterations to the lexer:

primes -> U+2032 | U+2033 | U+2034 | U+2057
symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
graphic -> small | large | symbol | digit | special | " | ' | primes
varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes }
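The varid and conid productions above can be sketched as plain Haskell predicates. This is only an illustration, not the GHC lexer (which is generated by Alex): the function names are hypothetical, and `isLower`, `isUpper` and `isDigit` from `Data.Char` are used as rough stand-ins for `small`, `large` and `digit` (in particular, `isDigit` covers ASCII digits only).

```haskell
import Data.Char (isDigit, isLower, isUpper)

-- primes -> U+2032 | U+2033 | U+2034 | U+2057
isPrime :: Char -> Bool
isPrime c = c `elem` "\x2032\x2033\x2034\x2057"

-- small includes the underscore, as in the Haskell 2010 report.
isSmall :: Char -> Bool
isSmall c = isLower c || c == '_'

-- Characters allowed after the first one: small | large | digit | ' | primes
isIdChar :: Char -> Bool
isIdChar c = isSmall c || isUpper c || isDigit c || c == '\'' || isPrime c

-- reservedid list from the Haskell 2010 report.
reservedIds :: [String]
reservedIds =
  [ "case", "class", "data", "default", "deriving", "do", "else", "foreign"
  , "if", "import", "in", "infix", "infixl", "infixr", "instance", "let"
  , "module", "newtype", "of", "then", "type", "where", "_" ]

-- varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
isVarId :: String -> Bool
isVarId [] = False
isVarId s@(c:cs) = isSmall c && all isIdChar cs && s `notElem` reservedIds

-- conid -> large { small | large | digit | ' | primes }
isConId :: String -> Bool
isConId [] = False
isConId (c:cs) = isUpper c && all isIdChar cs
```

Under this sketch `a'` and `a′` are both accepted as variable names, which is the consistency the proposal asks for, while a name starting with a prime is rejected.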

2. Introduce a new lexer nonterminal "subsup" that would include the Unicode
sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")", Latin and
Greek letters. And allow these characters to be used in names and operators:

symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
digit -> ascDigit | uniDigit (EXCEPT subsup)
small -> ascSmall | uniSmall (EXCEPT subsup) | _
large -> ascLarge | uniLarge (EXCEPT subsup)
graphic -> small | large | symbol | digit | special | " | ' | primes | subsup
varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes | subsup }
varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
consym -> (: {symbol | subsup}) (EXCEPT reservedop)
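As a rough illustration of the operator rules above, the following Haskell sketch checks the varsym production. Several things here are assumptions rather than part of the proposal: `isSubSup` covers only the Superscripts and Subscripts block plus ¹, ² and ³ (the proposal's subsup class is wider and also includes Greek letters), `uniSymbol` is approximated with `isSymbol` and `isPunctuation` from `Data.Char`, and all function names are hypothetical.

```haskell
import Data.Char (isAscii, isPunctuation, isSymbol)

-- primes -> U+2032 | U+2033 | U+2034 | U+2057
isPrime :: Char -> Bool
isPrime c = c `elem` "\x2032\x2033\x2034\x2057"

-- Assumed subset of subsup: U+00B2, U+00B3, U+00B9 and the assigned
-- code points of the Superscripts and Subscripts block.
isSubSup :: Char -> Bool
isSubSup c =
  c `elem` "\x00B2\x00B3\x00B9"
    || inR '\x2070' '\x2071'
    || inR '\x2074' '\x208E'
    || inR '\x2090' '\x209C'
  where
    inR lo hi = lo <= c && c <= hi

-- symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
isSymbolChar :: Char -> Bool
isSymbolChar c
  | isPrime c || isSubSup c = False
  | isAscii c = c `elem` "!#$%&*+./<=>?@\\^|-~:"
  | otherwise = isSymbol c || isPunctuation c

-- reservedop list from the Haskell 2010 report.
reservedOps :: [String]
reservedOps = ["..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"]

-- dashes -> -- {-}
isDashes :: String -> Bool
isDashes s = length s >= 2 && all (== '-') s

-- varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
isVarSym :: String -> Bool
isVarSym [] = False
isVarSym s@(c:cs) =
  isSymbolChar c
    && c /= ':'
    && all (\x -> isSymbolChar x || isSubSup x) cs
    && s `notElem` reservedOps
    && not (isDashes s)
```

In this sketch `+₁` is a valid operator, `₁+` is not (a subsup character cannot start an operator), and `+′` is rejected because primes are no longer symbol characters, matching how `+'` is treated today.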

If this proposal is received favorably, I'll write a patch for GHC based on
my previous stab at the problem[2].

P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion to
the GHC users list.

[1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
[2] https://ghc.haskell.org/trac/ghc/ticket/5108
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

