Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2012-08-16 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
-+--
  Reporter:  mikhail.vorozhtsov  |  Owner:  
  Type:  feature request | Status:  new 
  Priority:  normal  |  Milestone:  7.6.1   
 Component:  Compiler (Parser)   |Version:  7.1 
Resolution:  |   Keywords:  lexer unicode   
Os:  Unknown/Multiple|   Architecture:  Unknown/Multiple
   Failure:  None/Unknown| Difficulty:  Unknown 
  Testcase:  |  Blockedby:  
  Blocking:  |Related:  
-+--

Comment(by mikhail.vorozhtsov):

 Sorry for the late reply. I'll try to revisit the issue and come up with a
 less ad-hoc proposal in a month or two. Right now I'm completely out of
 free time.

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

___
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs


Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2012-07-16 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
-+--
  Reporter:  mikhail.vorozhtsov  |  Owner:  
  Type:  feature request | Status:  new 
  Priority:  normal  |  Milestone:  7.6.1   
 Component:  Compiler (Parser)   |Version:  7.1 
Resolution:  |   Keywords:  lexer unicode   
Os:  Unknown/Multiple|   Architecture:  Unknown/Multiple
   Failure:  None/Unknown| Difficulty:  Unknown 
  Testcase:  |  Blockedby:  
  Blocking:  |Related:  
-+--
Changes (by simonpj):

  * status:  patch => new


Comment:

 Mikhail,

 The first issue here is whether we ''want'' sub/superscripts (or indeed
 primes) on operators, and that's a language design question.  We lean
 towards no, but if there were a clear consensus from the Unicode-aware
 Haskell community, we'd accept it.  The implementation questions are
 probably resolvable.

 Could you start a thread on glasgow-haskell-users to ask them?

 (A possible outcome might be that operators should not allow primes!  I.e.
 the current behaviour is inconsistent, as you point out.  And it's weird
 that you can use Unicode primes but not ASCII ones!)

 Simon

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:7


Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2011-12-16 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
---+
Reporter:  mikhail.vorozhtsov  |   Owner:  
Type:  feature request |  Status:  patch   
Priority:  normal  |   Milestone:  7.4.1   
   Component:  Compiler (Parser)   | Version:  7.1 
Keywords:  lexer unicode   |  Os:  Unknown/Multiple
Architecture:  Unknown/Multiple| Failure:  None/Unknown
  Difficulty:  Unknown |Testcase:  
   Blockedby:  |Blocking:  
 Related:  |  
---+

Comment(by mikhail.vorozhtsov):

 Replying to [comment:4 simonmar]:
 > I'm not keen on this patch for a few reasons:
 >
 >  * It's inconsistent to allow superscript/subscript on symbols.  Haskell
 > doesn't currently allow primes on symbols, for example.

 In fact, GHC already allows Unicode primes on symbols: `alexGetByte`
 classifies OtherPunctuation characters (including primes) as `$unisymbol`.
 {{{
 $ ghci
 GHCi, version 7.2.2: http://www.haskell.org/ghc/  :? for help
 Loading package ghc-prim ... linking ... done.
 Loading package integer-gmp ... linking ... done.
 Loading package base ... linking ... done.
 Loading package ffi-1.0 ... linking ... done.
 λ let a +′ b = a + b
 }}}
 The patch just makes sure that primes at least do not appear at the start
 of a `@varsym`. We can further restrict sub/sup characters to appear only
 in the suffix of a symbol, i.e. `@varsym = $symbol $symchar* $subsup*`.

 >  * The patch has a bunch of Unicode constants baked into it

 The same can ultimately be said about `generalCategory`; just look at
 `u_gencat`. I can move the sup/sub test to a separate inlinable function.

 >  * It adds a bunch of extra tests to the inner loop.  I haven't
 > measured it but I wouldn't be surprised if this slows down the lexer.

 Hm, I don't know whether a few extra comparisons on already rare Unicode
 characters will outweigh the binary search in `u_gencat`, let alone
 significantly increase the overall lexing time. Is there any way to stop
 GHC right after lexing so I can benchmark?

 > Perhaps it might be better just to allow the category Lm (MODIFIER
 > LETTER) as part of an identifier?  That would include all the primes and
 > subscript/superscript things.

 Lm leaves out a bunch of characters (e.g. sub/sup variants of + - =
 ( )), including the primes which, as I mentioned, are Po. Another
 drawback is that identifiers like abcₓdef would be accepted. BTW, we
 already can write something not-so-beautiful like:
 {{{
 λ let ᵤxᵤy = 1
 }}}
 because ᵤ is in the Ll category.
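
The category claims above can be checked against whatever Unicode tables a given `base` ships with. The following is only a sketch: `sampleCategories` and `isModifierLetter` are hypothetical helper names, not part of GHC's lexer, and the reported categories depend on the Unicode version of the installed `base`.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | Pair each sample character with the general category reported by base.
-- (Hypothetical helper, used only to inspect the characters discussed above.)
sampleCategories :: [(Char, GeneralCategory)]
sampleCategories = [ (c, generalCategory c) | c <- samples ]
  where
    samples =
      [ '\x2032'  -- PRIME
      , '\x2081'  -- SUBSCRIPT ONE
      , '\x208A'  -- SUBSCRIPT PLUS SIGN
      , '\x2090'  -- LATIN SUBSCRIPT SMALL LETTER A
      , '\x1D64'  -- LATIN SUBSCRIPT SMALL LETTER U
      ]

-- | True when a character is in category Lm, i.e. what an Lm-only rule
-- would admit into identifiers.
isModifierLetter :: Char -> Bool
isModifierLetter c = generalCategory c == ModifierLetter
```

Evaluating `sampleCategories` in GHCi lists the category for each sample, and `filter (not . isModifierLetter . fst) sampleCategories` shows which of them an Lm-only rule would leave out.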

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:5


Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2011-12-15 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
---+
Reporter:  mikhail.vorozhtsov  |   Owner:  
Type:  feature request |  Status:  patch   
Priority:  normal  |   Milestone:  7.4.1   
   Component:  Compiler (Parser)   | Version:  7.1 
Keywords:  lexer unicode   |  Os:  Unknown/Multiple
Architecture:  Unknown/Multiple| Failure:  None/Unknown
  Difficulty:  Unknown |Testcase:  
   Blockedby:  |Blocking:  
 Related:  |  
---+
Changes (by simonmar):

  * difficulty:  => Unknown


Comment:

 I'm not keen on this patch for a few reasons:

  * It's inconsistent to allow superscript/subscript on symbols.  Haskell
doesn't currently allow primes on symbols, for example.

  * The patch has a bunch of Unicode constants baked into it

  * It adds a bunch of extra tests to the inner loop.  I haven't
measured it but I wouldn't be surprised if this slows down the lexer.

 Perhaps it might be better just to allow the category Lm (MODIFIER LETTER)
 as part of an identifier?  That would include all the primes and
 subscript/superscript things.

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:4


Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2011-07-18 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
---+
Reporter:  mikhail.vorozhtsov  |Owner:  
Type:  feature request |   Status:  patch   
Priority:  normal  |Milestone:  7.4.1   
   Component:  Compiler (Parser)   |  Version:  7.1 
Keywords:  lexer unicode   | Testcase:  
   Blockedby:  |   Difficulty:  
  Os:  Unknown/Multiple| Blocking:  
Architecture:  Unknown/Multiple|  Failure:  None/Unknown
---+

Comment(by mikhail.vorozhtsov):

 rebased

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:3


Re: [GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2011-07-15 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
---+
Reporter:  mikhail.vorozhtsov  |Owner:  
Type:  feature request |   Status:  patch   
Priority:  normal  |Milestone:  7.4.1   
   Component:  Compiler (Parser)   |  Version:  7.1 
Keywords:  lexer unicode   | Testcase:  
   Blockedby:  |   Difficulty:  
  Os:  Unknown/Multiple| Blocking:  
Architecture:  Unknown/Multiple|  Failure:  None/Unknown
---+
Changes (by igloo):

  * component:  Compiler => Compiler (Parser)
  * milestone:  => 7.4.1


-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108#comment:2


[GHC] #5108: Allow unicode sub/superscript symbols in both identifiers and operators

2011-04-12 Thread GHC
#5108: Allow unicode sub/superscript symbols in both identifiers and operators
---+
Reporter:  mikhail.vorozhtsov  |   Owner:   
Type:  feature request |  Status:  new  
Priority:  normal  |   Component:  Compiler 
 Version:  7.1 |Keywords:  lexer unicode
Testcase:  |   Blockedby:   
  Os:  Unknown/Multiple|Blocking:   
Architecture:  Unknown/Multiple| Failure:  None/Unknown 
---+
 While #4373 permits
 {{{
 Prelude> let v₁ = 1
 }}}
 the following is rejected
 {{{
 Prelude> let m =₁ f = undefined

 <interactive>:0:10: lexical error at character '\8321'
 }}}
 Identifiers with non-numeric subscripts are not accepted either:
 {{{
 Prelude> let vₐ = 1

 <interactive>:0:6: lexical error at character '\8336'
 }}}

 I wrote a small patch that makes such definitions possible.
  1. A new Unicode Alex macro, {{{$subsup}}}, is introduced and added to
 {{{$idchar}}}, {{{$symchar}}}, and {{{$graphic}}}.
  2. A Unicode code point is classified as {{{$subsup}}} by
 {{{alexGetChar}}} iff either of the following holds:
   a. The code point is annotated with <sub> or <super> in
 [http://www.unicode.org/Public/UNIDATA/UnicodeData.txt].
   b. It is PRIME, DOUBLE PRIME, TRIPLE PRIME, or QUADRUPLE PRIME (U+2032,
 U+2033, U+2034, U+2057).

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/5108