Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

http://www.reddit.com/r/programming/comments/78rjk/allowing_unicode_operators_in_d_similarly_to/


Andrei

I suggest not. There are problems with adopting Unicode characters as operators:

======

1) My editor supports Unicode, but my keyboard doesn't. So how do I type ∩ and ∪ for a set«T»?

1.1) What if the library writer forgets to provide an alternative, ASCII-only name? [This is also a problem with Unicode identifiers in general.]

1.2) Some have suggested auto-correction in the IDE. Again, what if I use notepad/nano/TextEdit to write code?



I have suggested this once before, but let me put it formally here. If you really want to support Unicode operators in source code:

- First, drop the ability to treat \xxx as '\xxx' when it appears outside quotes (so “char x = \n;” won't compile).
- Then, replace \xxx with the character it represents at the source level, so

     Vector3D«real» τ = r × F;

   can be written as

     Vector3D!(real) \&tau; = r \&times; F;

   - You don't need to introduce a separate trigraph.
- But the suggestion does trigger some people's trigraph-phobia. [Yell no! Now! :) ]
   - It may make the source code harder to parse.
- It will make the source code hard to read; just look at the number of semicolons in the ASCII-encoded version.
   - But at least you can compile your code.

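To make the proposed rewrite concrete, here is a rough sketch of such a source-level pass in Python. It assumes the escapes use D's named character entity form \&name; and borrows Python's HTML 4 entity table (html.entities.name2codepoint) as a stand-in for D's list; the function name expand_entities is made up for illustration. It naively does not skip string literals or comments, which a real lexer would have to handle (WYSIWYG strings in particular).

```python
import re
from html.entities import name2codepoint

# Matches a backslash followed by a named entity, e.g. \&tau; or \&times;
_ENTITY = re.compile(r"\\&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(source: str) -> str:
    """Replace every \\&name; escape with the character it names.

    Unknown names are left untouched so a later lexer stage can report them.
    Naive: does not skip the contents of string literals or comments.
    """
    def repl(m: re.Match) -> str:
        code = name2codepoint.get(m.group(1))
        return chr(code) if code is not None else m.group(0)

    return _ENTITY.sub(repl, source)


if __name__ == "__main__":
    ascii_src = r"Vector3D!(real) \&tau; = r \&times; F;"
    print(expand_entities(ascii_src))
```

Run on the ASCII line, this yields the same text as the Unicode version, which is the point: the ugly form is still compilable.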
======

2) This concerns the rejection of « and » as compiler-supported syntax, even if the Emacs mode becomes official. Of course it turns out it isn't, but let's consider these scenarios:

2.1) OK, suppose it turns out ∩, ∪ and «T» were just .opIntersect(x), .opUnion(x) and !(T) pretty-printed in Emacs, and the compiler won't accept these characters anyway. Sometimes I'll forget and copy a portion of this code to nano/geany/whatever, and then it stops compiling!

2.2) Well, say this copy & paste problem has been solved at the IDE level by reversing the pretty-printing when copying. But now I publish my fantastic, pretty-printed D program on a web page/PDF/whatever, and people complain that the compiler won't accept it!



I still believe that if you're going to transform D code into Unicode visually, the compiler must accept these visual replacements as well.

Let me also take Mathematica as an example. The language itself uses a lot of non-ASCII characters, and the IDE pretty-prints them as nice mathematical formulas, but at the “source code” level they are just escape sequences. So on screen you see

   E^(I π) + 1

but in the source code you'll see

   E^(I \[Pi]) + 1

However, if you type “E^(I π) + 1” into a plain .nb file and open it with the Get[] function (think of it as “import xx.d”), it still correctly displays the result “0”.
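A minimal sketch in Python of the property described above: two inverse mappings (pretty-print and un-pretty-print) plus a reader that accepts either form by normalizing. The three-entry table NAMED and the function names are invented for illustration; only the \[Name] escape style is borrowed from Mathematica.

```python
import re

# Hypothetical table; Mathematica's real list of named characters is far longer.
NAMED = {"Pi": "π", "Union": "∪", "Intersection": "∩"}
_REVERSE = {ch: name for name, ch in NAMED.items()}
_ESCAPE = re.compile(r"\\\[([A-Za-z]+)\]")  # matches \[Name]


def to_display(source: str) -> str:
    """Pretty-print: turn \\[Name] escapes into the characters they stand for."""
    return _ESCAPE.sub(lambda m: NAMED.get(m.group(1), m.group(0)), source)


def to_source(text: str) -> str:
    """Inverse mapping, e.g. when copying or saving as plain ASCII."""
    return "".join(f"\\[{_REVERSE[c]}]" if c in _REVERSE else c for c in text)


def read(text: str) -> str:
    """What the reader/compiler should do: accept either form by normalizing."""
    return to_display(text)
```

Because read normalizes, a snippet pasted from a web page in the pretty form and one saved in escape form end up as the same program.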

======

3) There are over 800 unary or binary operators in Unicode[1]. How are you going to give all of them opXXX names? I assume your blog entry doesn't mean just the simple “!=” ↦ “≠” transformation.


Use the C++/C# approach (operator∩)? But I've heard that's no good.
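One conceivable way to name hundreds of overloads without hand-writing a table is to derive the opXXX name mechanically from the character's Unicode name. This is a hypothetical scheme, not anything D has proposed; the sketch uses Python's unicodedata module, and op_name is an invented helper. It also rejects characters outside the Unicode "Sm" (math symbol) category, which is a simplification.

```python
import re
import unicodedata


def op_name(ch: str) -> str:
    """Derive a D-style overload name from a math symbol's Unicode name.

    Hypothetical scheme: 'op' + CamelCase(Unicode character name).
    """
    if unicodedata.category(ch) != "Sm":  # Sm = "Symbol, math"
        raise ValueError(f"{ch!r} is not a math symbol")
    words = re.split(r"[ -]+", unicodedata.name(ch))
    return "op" + "".join(w.capitalize() for w in words)


if __name__ == "__main__":
    for c in "∩∪≠∧":
        print(c, op_name(c))
```

Note that the scheme bakes a meaning into the name: ∧ comes out as opLogicalAnd, which is awkward for anyone using it as a wedge product. That is exactly the trouble raised in item 4.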

======

4) If you are going to support overloading for all these 800 operators, how do you define:

4.1) [Big problem] Operator precedence? (One person may want ∧ to mean the wedge product, so it binds tighter than + and -, while another wants it to mean logical AND, so it binds looser than + and -.)

4.2) Associativity? How do you determine whether an operator is left-associative, right-associative or both? (∧ as the wedge product is both, while ∧ as a power function pow(a,b) is right-associative.)

4.3) [Minor problem] Commutativity? Or will we need to write opXXX and opXXX_r all the time?


I don't have solutions for D on these. For 4.2 and 4.3, in C# we could introduce some attributes like

  [Associative, Commutative]
  FuzzyBool operator∧ (FuzzyBool x, FuzzyBool y) { return min(x,y); }

  (Not actual C# code.)

but it's not D. :)

Or predefine the meaning, precedence and associativity of each operator, so that e.g. ∧ always means the wedge product and never logical AND, just as ^ now always means XOR and never the power function.

Or just require the programmer to always use parentheses.
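To show why 4.1 and 4.2 matter, here is a small precedence-climbing parser in Python where precedence and associativity come from a per-operator table. The three tables (WEDGE, LOGIC, POWER) and their numbers are invented for illustration; the point is that the same token stream yields different trees depending on which table the compiler assumes.

```python
# Each entry: operator -> (precedence, associativity); higher binds tighter.
WEDGE = {"+": (10, "left"), "∧": (20, "left")}   # ∧ as a (wedge) product
LOGIC = {"∧": (5, "left"), "+": (10, "left")}    # ∧ as logical AND
POWER = {"+": (10, "left"), "∧": (30, "right")}  # ∧ as pow(a, b)


def parse(tokens, table):
    """Parse a flat list like ['a', '+', 'b', '∧', 'c'] into nested tuples."""
    pos = 0

    def expr(min_prec):
        nonlocal pos
        lhs = tokens[pos]
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            prec, assoc = table[op]
            if prec < min_prec:
                break
            pos += 1
            # Left-assoc: the right operand may only contain tighter operators.
            # Right-assoc: it may also contain the same operator again.
            rhs = expr(prec + 1 if assoc == "left" else prec)
            lhs = (op, lhs, rhs)
        return lhs

    return expr(0)
```

With one global table (the "predefine" option) the tree is unambiguous but everybody gets the same reading of ∧; requiring parentheses sidesteps the table altogether.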




Ref: [1] A rough count from http://www.unicode.org/Public/math/revision-11/MathClass-11.txt. The actual number is higher than this.
