Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  new
    Priority:  normal             |    Milestone:  7.6.1
   Component:  Compiler (Parser)  |      Version:  7.1
  Resolution:                     |     Keywords:  lexer unicode
          Os:  Unknown/Multiple   | Architecture:  Unknown/Multiple
     Failure:  None/Unknown       |   Difficulty:  Unknown
    Testcase:                     |    Blockedby:
    Blocking:                     |      Related:
----------------------------------+----------------------------------

Comment(by mikhail.vorozhtsov):

 Sorry for the late reply. I'll try to revisit the issue and come up with a
 less ad-hoc proposal in a month or two. Right now I'm completely out of
 free time.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

___
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  new
    Priority:  normal             |    Milestone:  7.6.1
   Component:  Compiler (Parser)  |      Version:  7.1
  Resolution:                     |     Keywords:  lexer unicode
          Os:  Unknown/Multiple   | Architecture:  Unknown/Multiple
     Failure:  None/Unknown       |   Difficulty:  Unknown
    Testcase:                     |    Blockedby:
    Blocking:                     |      Related:
----------------------------------+----------------------------------
Changes (by simonpj):

  * status:  patch => new

Comment:

 Mikhail,

 The first issue here is whether we ''want'' sub/superscripts (or indeed
 primes) on operators, and that's a language design question. We tend
 towards no, but if there were a clear consensus from the Unicode-aware
 Haskell community, we'd accept it. The implementation questions are
 probably resolvable.

 Could you start a thread on glasgow-haskell-users to ask them?

 (A possible outcome might be that operators should not allow primes! I.e.
 the current behaviour is inconsistent, as you point out. And it's weird
 that you can use Unicode primes but not ASCII ones!)

 Simon

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:7
Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  patch
    Priority:  normal             |    Milestone:  7.4.1
   Component:  Compiler (Parser)  |      Version:  7.1
    Keywords:  lexer unicode      |           Os:  Unknown/Multiple
Architecture:  Unknown/Multiple   |      Failure:  None/Unknown
  Difficulty:  Unknown            |     Testcase:
   Blockedby:                     |     Blocking:
     Related:                     |
----------------------------------+----------------------------------

Comment(by mikhail.vorozhtsov):

 Replying to [comment:4 simonmar]:
 > I'm not keen on this patch for a few reasons:
 >
 >  * It's inconsistent to allow superscript/subscript on symbols. Haskell
 >    doesn't currently allow primes on symbols, for example.

 In fact, GHC already allows unicode primes on symbols. alexGetByte
 classifies OtherPunctuation characters (including primes) as
 `$unisymbol`.
 {{{
 $ ghci
 GHCi, version 7.2.2: http://www.haskell.org/ghc/  :? for help
 Loading package ghc-prim ... linking ... done.
 Loading package integer-gmp ... linking ... done.
 Loading package base ... linking ... done.
 Loading package ffi-1.0 ... linking ... done.
 λ> let a +′ b = a + b
 }}}
 The patch just makes sure that primes at least do not appear at the start
 of a `@varsym`. We can further restrict sub/sup characters to appear only
 in the suffix of a symbol, i.e. `@varsym = $symbol $symchar* $subsup*`.

 >  * The patch has a bunch of Unicode constants baked into it

 The same can ultimately be said about `generalCategory`, I mean look at
 `u_gencat`. I can move the sup/sub test to a separate inlinable function.

 >  * It adds a bunch of extra tests to the inner loop. I haven't measured
 >    it but I wouldn't be surprised if this slows down the lexer.

 Hm, I don't know if a few extra comparisons on already rare unicode
 characters will outweigh the binary search in `u_gencat`, let alone
 significantly increase the overall lexing time. Is there any way to stop
 GHC right after lexing so I can benchmark?

 > Perhaps it might be better just to allow the category Lm (MODIFIER
 > LETTER) as part of an identifier? That would include all the primes and
 > subscript/superscript things.

 Lm leaves out a bunch of characters (e.g. sub/sup variants of + - = ( )),
 including the primes which, as I mentioned, are Po. Another drawback is
 that identifiers like abcₓdef would be accepted. BTW, we can already
 write something not-so-beautiful like:
 {{{
 λ> let ᵤxᵤy = 1
 }}}
 because ᵤ is in the Ll category.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:5
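The suffix-only shape `@varsym = $symbol $symchar* $subsup*` mentioned in the comment above can be sketched as a plain Haskell predicate. This is only an illustration under stated assumptions, not the patch or GHC's lexer: `isSymbolStart` and `isSymChar` are ASCII-only stand-ins for `$symbol` and `$symchar`, and `isSubSupChar` is a hypothetical, deliberately small approximation of `$subsup` (the four primes plus the range U+2070..U+209C).

```haskell
-- Sketch only. The three character classes below are simplified,
-- hypothetical stand-ins for the Alex macros named in the ticket.

-- | ASCII symbol characters (stand-in for $symbol; Unicode symbols omitted).
isSymbolStart :: Char -> Bool
isSymbolStart c = c `elem` "!#$%&*+./<=>?@\\^|-~"

-- | Stand-in for $symchar: a symbol character or a colon.
isSymChar :: Char -> Bool
isSymChar c = isSymbolStart c || c == ':'

-- | Hypothetical $subsup: the four primes plus U+2070..U+209C.
isSubSupChar :: Char -> Bool
isSubSupChar c =
  c `elem` ['\x2032', '\x2033', '\x2034', '\x2057']
    || (c >= '\x2070' && c <= '\x209C')

-- | Recognise the shape $symbol $symchar* $subsup*: one symbol character,
-- then any run of symbol characters, then only sub/superscript characters.
-- The two classes are disjoint, so dropping the symbol run first is enough.
isVarSym :: String -> Bool
isVarSym []       = False
isVarSym (c : cs) = isSymbolStart c && all isSubSupChar (dropWhile isSymChar cs)
```

Loaded in GHCi, `isVarSym "=\x2081"` accepts the `=₁` operator from the ticket description, while `isVarSym "\x2081="` and `isVarSym "=\x2081="` are rejected because the subscript is not confined to the suffix.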
Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  patch
    Priority:  normal             |    Milestone:  7.4.1
   Component:  Compiler (Parser)  |      Version:  7.1
    Keywords:  lexer unicode      |           Os:  Unknown/Multiple
Architecture:  Unknown/Multiple   |      Failure:  None/Unknown
  Difficulty:  Unknown            |     Testcase:
   Blockedby:                     |     Blocking:
     Related:                     |
----------------------------------+----------------------------------
Changes (by simonmar):

  * difficulty:  => Unknown

Comment:

 I'm not keen on this patch for a few reasons:

  * It's inconsistent to allow superscript/subscript on symbols. Haskell
    doesn't currently allow primes on symbols, for example.

  * The patch has a bunch of Unicode constants baked into it

  * It adds a bunch of extra tests to the inner loop. I haven't measured
    it but I wouldn't be surprised if this slows down the lexer.

 Perhaps it might be better just to allow the category Lm (MODIFIER
 LETTER) as part of an identifier? That would include all the primes and
 subscript/superscript things.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:4
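One way to see what an "allow Lm" rule would cover is to ask `Data.Char.generalCategory` about a few of the characters under discussion. The sketch below assumes a reasonably recent `base`; the helper names `admittedByLm`, `samples` and `report` are made up for this illustration, and the answers depend on the Unicode tables shipped with the compiler.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | Would a rule "category Lm is allowed in identifiers" admit this character?
admittedByLm :: Char -> Bool
admittedByLm c = generalCategory c == ModifierLetter

-- | A few characters from the discussion, labelled with their Unicode names.
samples :: [(String, Char)]
samples =
  [ ("U+2032 PRIME", '\x2032')
  , ("U+2081 SUBSCRIPT ONE", '\x2081')
  , ("U+208A SUBSCRIPT PLUS SIGN", '\x208A')
  , ("U+2090 LATIN SUBSCRIPT SMALL LETTER A", '\x2090')
  ]

-- | One line per sample: name, general category, and whether Lm admits it.
report :: [String]
report =
  [ name ++ ": " ++ show (generalCategory c)
      ++ ", admitted by Lm: " ++ show (admittedByLm c)
  | (name, c) <- samples
  ]
```

Running `mapM_ putStrLn report` in GHCi prints the table. In UnicodeData.txt the prime is Po, SUBSCRIPT ONE is No, SUBSCRIPT PLUS SIGN is Sm, and only LATIN SUBSCRIPT SMALL LETTER A is Lm, so of these four an Lm-only rule would admit just the last.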
Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  patch
    Priority:  normal             |    Milestone:  7.4.1
   Component:  Compiler (Parser)  |      Version:  7.1
    Keywords:  lexer unicode      |     Testcase:
   Blockedby:                     |   Difficulty:
          Os:  Unknown/Multiple   |     Blocking:
Architecture:  Unknown/Multiple   |      Failure:  None/Unknown
----------------------------------+----------------------------------

Comment(by mikhail.vorozhtsov):

 rebased

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:3
Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  patch
    Priority:  normal             |    Milestone:  7.4.1
   Component:  Compiler (Parser)  |      Version:  7.1
    Keywords:  lexer unicode      |     Testcase:
   Blockedby:                     |   Difficulty:
          Os:  Unknown/Multiple   |     Blocking:
Architecture:  Unknown/Multiple   |      Failure:  None/Unknown
----------------------------------+----------------------------------
Changes (by igloo):

  * component:  Compiler => Compiler (Parser)
  * milestone:  => 7.4.1

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:2
[GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
----------------------------------+----------------------------------
    Reporter:  mikhail.vorozhtsov |        Owner:
        Type:  feature request    |       Status:  new
    Priority:  normal             |    Component:  Compiler
     Version:  7.1                |     Keywords:  lexer unicode
    Testcase:                     |    Blockedby:
          Os:  Unknown/Multiple   |     Blocking:
Architecture:  Unknown/Multiple   |      Failure:  None/Unknown
----------------------------------+----------------------------------
 While #4373 permits
 {{{
 Prelude> let v₁ = 1
 }}}
 the following is rejected
 {{{
 Prelude> let m =₁ f = undefined

 <interactive>:0:10: lexical error at character '\8321'
 }}}
 Identifiers with non-numeric subscripts are not accepted either:
 {{{
 Prelude> let vₐ = 1

 <interactive>:0:6: lexical error at character '\8336'
 }}}

 I wrote a small patch that makes such definitions possible.

  1. A new unicode Alex macro, {{{$subsup}}}, is introduced and added to
     {{{$idchar}}}, {{{$symchar}}}, and {{{$graphic}}}
  2. A unicode code point is classified as {{{$subsup}}} by
     {{{alexGetChar}}} iff either of the following holds:
    a. The code point is annotated with sub or super in
       [http://www.unicode.org/Public/UNIDATA/UnicodeData.txt]
    b. It is the [DOUBLE/TRIPLE/QUADRUPLE] PRIME (U+2032, U+2033, U+2034,
       U+2057)

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108
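The classification rule in point 2 can be sketched in Haskell as a function over lines of UnicodeData.txt. This is a rough illustration and not the patch itself (the patch bakes constants into the lexer rather than reading the data file). It assumes the usual semicolon-separated layout in which field 5 holds the decomposition mapping, tagged `<sub>` or `<super>` for the characters in question; the names `splitOn`, `subSupFromLine` and `isSubSup` are hypothetical.

```haskell
import Data.Char (chr)
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)
import Numeric (readHex)

-- | Split a string on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- | Rule (b): the four prime characters listed in the ticket.
primes :: [Char]
primes = map chr [0x2032, 0x2033, 0x2034, 0x2057]

-- | Rule (a): given one UnicodeData.txt line, return its character if the
-- decomposition field (field 5, counting from 0) is tagged <sub> or <super>.
subSupFromLine :: String -> Maybe Char
subSupFromLine line =
  case splitOn ';' line of
    (code : _ : _ : _ : _ : decomp : _)
      | any (`isPrefixOf` decomp) ["<sub>", "<super>"]
      , [(n, "")] <- readHex code -> Just (chr n)
    _ -> Nothing

-- | A character counts as sub/superscript if either rule holds.
-- The first argument is the list of UnicodeData.txt lines to consult.
isSubSup :: [String] -> Char -> Bool
isSubSup unicodeData c =
  c `elem` primes || c `elem` mapMaybe subSupFromLine unicodeData
```

Given the lines of the data file, for example `ls <- lines <$> readFile "UnicodeData.txt"` in GHCi, `isSubSup ls` can be applied to any character. A real lexer would precompute the set (or its code point ranges) instead of scanning the list on every call.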