Re: Does Java 1.5 support Unicode math alphanumerics as variable names?

2004-02-27 Thread Kazuhiro Kazama
From: Norbert Lindenberg <[EMAIL PROTECTED]>
Subject: Re: Does Java 1.5 support Unicode math alphanumerics as variable names?
Date: Thu, 26 Feb 2004 19:03:29 -0800
Message-ID: <[EMAIL PROTECTED]>
> Sorry for the late reply - under the rules of the Java Community 
> Process it had to wait until the Public Review Draft was published.

The specification of JSR-204 Unicode Supplementary Character Support
is now available for Public Review from

http://jcp.org/en/jsr/stage?listBy=public

today.

All comments to [EMAIL PROTECTED]

Kazuhiro Kazama ([EMAIL PROTECTED]) NTT Network Innovation Laboratories



Re: Does Java 1.5 support Unicode math alphanumerics as variable names?

2004-02-26 Thread Norbert Lindenberg
Murray,

Yes, starting from J2SE 1.5 the Java programming language allows 
supplementary characters in identifiers if they meet the specifications 
of the new methods java.lang.Character.isJavaIdentifierStart(int) and 
java.lang.Character.isJavaIdentifierPart(int).
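
A minimal sketch of that check, assuming a J2SE 1.5 or later runtime. The class name SupplementaryIdentifierCheck and the helper isIdentifier are made up for illustration; the helper only applies the two Character methods named above and does not reject keywords or the boolean and null literals, which the JLS also excludes.

```java
public class SupplementaryIdentifierCheck {
    // U+1D456 MATHEMATICAL ITALIC SMALL I, the character asked about in this thread.
    static final int MATH_ITALIC_SMALL_I = 0x1D456;

    // Hypothetical helper (not part of the JDK): walks the string by code point
    // rather than by char, so a surrogate pair is tested as one character.
    static boolean isIdentifier(String s) {
        if (s.length() == 0) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        int i = Character.charCount(first);
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String name = new String(Character.toChars(MATH_ITALIC_SMALL_I));
        System.out.println("U+1D456 accepted: " + isIdentifier(name));
        System.out.println("\"1x\" accepted: " + isIdentifier("1x"));
    }
}
```

Iterating with codePointAt and charCount is the point of the sketch: a loop over individual char values would see two surrogates instead of one letter.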

Sorry for the late reply - under the rules of the Java Community 
Process it had to wait until the Public Review Draft was published.

Best regards,
Norbert
On Jan 23, 2004, at 17:46, Murray Sargent wrote:

> E.g., math italic i (U+1D456)? With such usage, Java mathematical
> programs could look more like the original math.
> Thanks
> Murray




Re: Does Java 1.5 support Unicode math alphanumerics as variable names?

2004-01-26 Thread Philippe Verdy
From: Arcane Jill

> I would be very surprised if it did, since Java chars are still only
> sixteen bits wide,

Yes, but they include surrogates as valid values for char, so UTF-16 could be
used to represent variable names. The main problem is not with variables,
member variables and methods, but with class and package names which need to
be mappable into filenames to be stored in a local filesystem (at compile
time) or in a zip directory entry (for packaged applications). Not all
characters are usable as valid filenames, due to the way filesystems may
rearrange or normalize or transcode these names.

> and the new math alphanumerics are not in BMP. Still, I'd be very happy to
> be proved wrong on this one.

For now the JLS
(http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html)
defines the language lexical translation as supporting only Unicode 2.1
(everything else must use "Unicode escapes", notably characters outside
the BMP, which need to be represented as a surrogate pair of the form
"\uD8xx\uDCxx" and can then only be used within String constants.)
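
To make the escape form concrete, here is a small sketch for U+1D456, whose surrogate pair works out to "\uD835\uDC56". The class SurrogateEscapeDemo and the helper toSurrogates are illustrative names, not JDK API, and codePointAt assumes a J2SE 1.5 or later runtime.

```java
public class SurrogateEscapeDemo {
    // Hypothetical helper: splits a supplementary code point (U+10000..U+10FFFF)
    // into its UTF-16 high and low surrogates using the standard arithmetic.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // The two escapes in this literal are the surrogate pair for U+1D456.
        String s = "\uD835\uDC56";
        char[] pair = toSurrogates(0x1D456);

        System.out.println(s.length());                             // 2 char units
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 1d456
        System.out.println(Integer.toHexString(pair[0]) + " "
                + Integer.toHexString(pair[1]));                    // d835 dc56
    }
}
```

The String holds two char values but a single character, which is why the char-based identifier tests discussed in this thread cannot see it as a letter.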

In section 3.8 -- Identifiers --, you'll find this:

[quote]
An identifier is an unlimited-length sequence of Java letters and Java
digits, the first of which must be a Java letter. An identifier cannot have
the same spelling (Unicode character sequence) as a keyword (§3.9), boolean
literal (§3.10.3), or the null literal (§3.10.7).

Letters and digits may be drawn from the entire Unicode character set, which
supports most writing scripts in use in the world today, including the large
sets for Chinese, Japanese, and Korean. This allows programmers to use
identifiers in their programs that are written in their native languages.
A "Java letter" is a character for which the method
Character.isJavaIdentifierStart returns true. A "Java letter-or-digit" is a
character for which the method Character.isJavaIdentifierPart returns true.
[/quote]

As the valid characters usable in identifiers need to be mappable into a
char value (which only supports UCS-2 code points) so that
Character.isJavaIdentifierStart() or Character.isJavaIdentifierPart() can
return true, including characters outside the BMP in identifiers would
require that surrogates be included in the set of char values for which
these tests return true.

So let's see the Character class documentation:

[quote]
isJavaIdentifierPart
public static boolean isJavaIdentifierPart(char ch)
  Determines if the specified character may be part of a Java identifier as
other than the first character.
  A character may be part of a Java identifier if any of the following are
true:
  * it is a letter (matches the general categories UPPERCASE_LETTER,
LOWERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER)
  * it is a currency symbol (such as '$')
  * it is a connecting punctuation character (such as '_')
  * it is a digit
  * it is a numeric letter (such as a Roman numeral character)
  * it is a combining mark
  * it is a non-spacing mark
  * isIdentifierIgnorable returns true for the character
Parameters:
  ch - the character to be tested.
Returns:
  true if the character may be part of a Java identifier; false otherwise.
Since:
  1.1
See Also:
  isIdentifierIgnorable(char), isJavaIdentifierStart(char),
isLetterOrDigit(char), isUnicodeIdentifierPart(char)
[/quote]

The requirements above make surrogates unsuitable for identifiers, simply
because surrogates have no suitable general category that matches the above
requirements:
* they are neither letters, nor currency symbols, and so on, because
surrogates have NO general category;
* the Character class just lists them with a SURROGATE general category, see:
int Character.getType(char ch);
* but it returns false for isDefined() as they don't have an entry in the
UCD or a value in a range defined in the UCD;
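
The char-based behaviour described above can be sketched as follows, next to the int-based overloads that, per Norbert Lindenberg's later reply in this thread, J2SE 1.5 adds. The class SurrogateCategoryDemo and its two helper methods are illustrative names only, and the sketch assumes a 1.5 or later runtime.

```java
public class SurrogateCategoryDemo {
    // Hypothetical helper: tests a single 16-bit char, as the pre-1.5 API does.
    static boolean charViewAccepts(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    // Hypothetical helper: combines the pair first, then uses the int overload.
    static boolean codePointViewAccepts(char high, char low) {
        return Character.isJavaIdentifierPart(Character.toCodePoint(high, low));
    }

    public static void main(String[] args) {
        char high = (char) 0xD835;  // high surrogate of the pair encoding U+1D456
        char low = (char) 0xDC56;   // low surrogate of the same pair

        // Seen one char at a time, each half is only a surrogate.
        System.out.println(Character.getType(high) == Character.SURROGATE);
        System.out.println(charViewAccepts(high));
        System.out.println(charViewAccepts(low));

        // Seen as one code point, the pair is a lowercase letter.
        int cp = Character.toCodePoint(high, low);
        System.out.println(Character.getType(cp) == Character.LOWERCASE_LETTER);
        System.out.println(codePointViewAccepts(high, low));
    }
}
```

So the surrogates themselves never become identifier characters; the int overloads sidestep the problem by classifying the combined code point instead.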

I doubt that this can be changed.




RE: Does Java 1.5 support Unicode math alphanumerics as variable names?

2004-01-26 Thread Arcane Jill





I would be very surprised if it did, since Java chars are still only
sixteen bits wide, and the new math alphanumerics are not in BMP.
Still, I'd be very happy to be proved wrong on this one.

Actually, I'd quite like to use these as variable names in other
languages too, like in C++ for example, but I think that may be
forbidden due to some standard or other.

You can get away with using math alphanumerics in variable
names in PHP. This is because PHP stores its code in eight-bit wide
bytes, but defines only ASCII characters. It allows any characters in
the range 0x80 to 0xFF to be part of a variable name, so ... if you use
UTF-8, you can use math alphanumerics. All you need is the right text
editor to manipulate the source code. Unfortunately, this is not an
ideal situation either, because of course it would also be nice
to be able to use the proper math operators as, well, math operators!

Jill



  -----Original Message-----
  From: Murray Sargent [mailto:[EMAIL PROTECTED]]
  Sent: Saturday, January 24, 2004 1:47 AM
  To: [EMAIL PROTECTED]
  Subject: Does Java 1.5 support Unicode math alphanumerics as variable names?

  E.g., math italic i (U+1D456)? With such usage, Java mathematical
  programs could look more like the original math.
  Thanks
  Murray





Does Java 1.5 support Unicode math alphanumerics as variable names?

2004-01-23 Thread Murray Sargent






E.g., math italic i (U+1D456)? With such usage, Java mathematical programs could look more like the original math.


Thanks

Murray