That seems like a reasonable interpretation, but who knows what the spec
writer really meant?! Either way the result is the same: we will continue to
match the reference implementation behavior by returning false.
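
For illustration only, the interpretation quoted below could be sketched
roughly like this. It is not the Harmony code or the reference
implementation. The class name CharsetNameSketch, the helper isMimeCharset
and the KNOWN set are hypothetical, and the syntax check is assumed to
follow the RFC 2278 mime-charset ABNF quoted further down the thread.

```java
import java.nio.charset.IllegalCharsetNameException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CharsetNameSketch {

    // The "cspecials" characters from the RFC 2278 ABNF (backslash included).
    private static final String CSPECIALS = "()<>@,;:\\\"/[]?.=*";

    // Hypothetical stand-in for the charsets a runtime actually provides.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("utf-8", "us-ascii", "iso-8859-1"));

    /** True if name matches 1*<Any CHAR except SPACE, CTLs, and cspecials>. */
    static boolean isMimeCharset(String name) {
        if (name.isEmpty()) {
            return false; // "1*" requires at least one character
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean spaceOrControl = c <= 32 || c == 127;
            boolean nonAscii = c > 127;
            if (spaceOrControl || nonAscii || CSPECIALS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    /** Throws for syntactically illegal names, returns false for unknown ones. */
    static boolean isSupported(String name) {
        if (!isMimeCharset(name)) {
            throw new IllegalCharsetNameException(name);
        }
        // Charset names are not case-sensitive, so compare in lower case.
        return KNOWN.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "-UTF-8", "_US-ASCII", ""};
        for (String name : names) {
            try {
                System.out.println("\"" + name + "\" -> " + isSupported(name));
            } catch (IllegalCharsetNameException e) {
                System.out.println("\"" + name + "\" -> IllegalCharsetNameException");
            }
        }
    }
}
```

Under this sketch "-UTF-8" and "_US-ASCII" are legal but unsupported names,
so they give false, and only the empty string (or a name containing an
excluded character) raises the exception.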

Thanks
Tim

Karan Malhi wrote:
> Hi Tim,
> 
> I would add something to this discussion. My interpretation is that an
> IllegalCharsetNameException should be thrown only if the charset name does
> not comply with RFC 2278. The spec says that a charset name should start
> with a letter or digit, but it does not specifically say that a name which
> doesn't would be treated as illegal (it could also be treated as not
> supported). The one clear example of an illegal charset name is the empty
> String (this rule matches RFC 2278):
> 
> " A charset name must begin with either a letter or a digit. The empty
> string is not a legal charset name."
> 
> I tested the reference implementation for this method, and it looks like the
> reference impl complies with RFC 2278 and simply returns false if the name
> starts with a "-" (this is also because there is no charset name in the IANA
> Charset registry which starts with a "-"). It does not throw an
> IllegalCharsetNameException.
> So, one could also interpret the spec in the following way:
> If the charset name does not comply with RFC 2278 then throw
> IllegalCharsetNameException, otherwise if the charset is not supported,
> return false.
> 
> On 2/18/06, karan malhi <[EMAIL PROTECTED]> wrote:
>> Here is text from the j2se1.4.2 spec
>> A charset name must begin with either a letter or a digit. The empty
>> string is not a legal charset name. Charset names are not
>> case-sensitive; that is, case is always ignored when comparing charset
>> names. Charset names generally follow the conventions documented in
>> /RFC 2278: IANA Charset Registration Procedures/
>> <http://ietf.org/rfc/rfc2278.txt>.
>> According to RFC 2278:
>>
>>    Finally, charsets being registered for use with the "text" media type
>>    MUST have a primary name that conforms to the more restrictive syntax
>>    of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
>>    MIME extended parameter values [RFC-2184]. A combined ABNF definition
>>    for such names is as follows:
>>
>>    mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
>>
>>    cspecials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
>>                   <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"
>>
>>    CHAR         =  <any ASCII character>        ; (  0-177,  0.-127.)
>>    SPACE        =  <ASCII SP, space>            ; (     40,      32.)
>>    CTL          =  <any ASCII control           ; (  0- 37,  0.- 31.)
>>                     character and DEL>          ; (    177,     127.)
>>
>> If I have interpreted the above correctly, then it basically means that
>> the name can start with any ASCII character except ASCII (octal) 40,
>> 0-37, 177.
>> A "-" is 055 and an "_" is 137, neither of which falls under the above
>> exclude list.
>> So if I have a charset named "-UTF-8" or "_UTF-8", it is not
>> an illegal name.
>>
>> So it looks like the spec definition further tightens the charset names
>> accepted by Java, in that the name can only start with a letter or a
>> digit. How do we interpret *must*?
>>
>>
>>
>> So
>>
>> Richard Liang wrote:
>>
>>> Hello Tim,
>>>
>>> I'm wondering why I did not just copy the first sentence. :-)
>>>
>>> "A charset name **must** begin with either a letter or a digit."  Does
>>> this mean that a charset name which begins with neither a letter nor a
>>> digit should be regarded as an illegal charset name?
>>>
>>>
>>> Richard Liang
>>> China Software Development Lab, IBM
>>>
>>>
>>>
>>> Tim Ellison wrote:
>>>
>>>> Richard Liang wrote:
>>>>
>>>>
>>>>> Hello Tim,
>>>>>
>>>>> I think this is caused by different understanding of the java spec:
>>>>>
>>>>> A charset name **must** begin with either a letter or a digit. The
>>>>> empty
>>>>> string is not a legal charset name....
>>>>>
>>>>> What do you think the implication of "must" is here? :-)
>>>>>
>>>>
>>>> But the name isn't empty, it is "-UTF-8" ?  I must be missing
>>>> something...
>>>>
>>>> Regards,
>>>> Tim
>>>>
>>>>
>>>>
>>>>
>>>>> Tim Ellison (JIRA) wrote:
>>>>>
>>>>>
>>>>>>     [ http://issues.apache.org/jira/browse/HARMONY-68?page=comments#action_12366784 ]
>>>>>> Tim Ellison commented on HARMONY-68:
>>>>>> ------------------------------------
>>>>>>
>>>>>> The test looks invalid to me.  You should only expect a
>>>>>> java.nio.charset.IllegalCharsetNameException if the name itself
>>>>>> contains disallowed characters, and both underscore and dash are
>>>>>> permitted.
>>>>>>
>>>>>> The code     Charset.isSupported("-UTF-8")
>>>>>>
>>>>>> should return false, not throw an exception.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> java.nio.charset.Charset.isSupported(String charsetName) does not
>>>>>>> throw IllegalCharsetNameException for spoiled standard charset name
>>>>>>> -------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>          Key: HARMONY-68
>>>>>>>          URL: http://issues.apache.org/jira/browse/HARMONY-68
>>>>>>>      Project: Harmony
>>>>>>>         Type: Bug
>>>>>>>   Components: Classlib
>>>>>>>     Reporter: Svetlana Samoilenko
>>>>>>>  Attachments: charset_patch.txt
>>>>>>>
>>>>>>> According to the j2se 1.4.2 specification for Charset.isSupported(String
>>>>>>> charsetName), the method must throw IllegalCharsetNameException "if
>>>>>>> the given charset name is illegal". A legal charset name "must begin
>>>>>>> with either a letter or a digit". The test listed below shows that
>>>>>>> no exception is thrown if a "-" or "_" symbol is inserted before a
>>>>>>> standard charset name, for example "-UTF-8" or "_US-ASCII".
>>>>>>> Moreover, the method returns "true" in this case.
>>>>>>> BEA also does not throw the exception, but returns "false".
>>>>>>> Code to reproduce:
>>>>>>>
>>>>>>> import java.nio.charset.*;
>>>>>>>
>>>>>>> public class test2 {
>>>>>>>     public static void main (String[] args) {
>>>>>>>         // string starts neither a letter nor a digit
>>>>>>>         boolean sup=false;
>>>>>>>         try{
>>>>>>>              sup=Charset.isSupported("-UTF-8");
>>>>>>>              System.out.println("***BAD. should be exception; sup="+sup);
>>>>>>>              sup=Charset.isSupported("_US-ASCII");
>>>>>>>              System.out.println("***BAD. should be exception; sup="+sup);
>>>>>>>         } catch (IllegalCharsetNameException e) {
>>>>>>>             System.out.println("***OK. Expected IllegalCharsetNameException " + e);
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Steps to Reproduce:
>>>>>>> 1. Build Harmony (check-out on 2006-01-30) j2se subset as
>>>>>>>    described in README.txt.
>>>>>>> 2. Compile test2.java using BEA 1.4 javac
>>>>>>>
>>>>>>>> javac -d . test2.java
>>>>>>> 3. Run java using compatible VM (J9)
>>>>>>>
>>>>>>>> java -showversion test2
>>>>>>> Output:
>>>>>>>
>>>>>>> C:\tmp>C:\jrockit-j2sdk1.4.2_04\bin\java.exe -showversion test2
>>>>>>> java version "1.4.2_04"
>>>>>>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
>>>>>>> BEA WebLogic JRockit(TM) 1.4.2_04 JVM (build
>>>>>>> ari-31788-20040616-1132-win-ia32, Native Threads, GC strategy: parallel)
>>>>>>> ***BAD. should be exception; sup=false
>>>>>>> ***BAD. should be exception; sup=false
>>>>>>>
>>>>>>> C:\tmp>C:\harmony\trunk\deploy\jre\bin\java -showversion test2
>>>>>>> (c) Copyright 1991, 2005 The Apache Software Foundation or its
>>>>>>> licensors, as applicable.
>>>>>>> ***BAD. should be exception; sup=true
>>>>>>> ***BAD. should be exception; sup=true
>>>>>>> Suggested junit test case:
>>>>>>> ------------------------ CharsetTest.java ------------------------
>>>>>>> import java.nio.charset.*;
>>>>>>> import junit.framework.*;
>>>>>>>
>>>>>>> public class CharsetTest extends TestCase {
>>>>>>>     public static void main(String[] args) {
>>>>>>>         junit.textui.TestRunner.run(CharsetTest.class);
>>>>>>>     }
>>>>>>>
>>>>>>>     public void test_isSupported() {
>>>>>>>         boolean sup=false;
>>>>>>>         // string starts neither a letter nor a digit
>>>>>>>         try{
>>>>>>>             sup=Charset.isSupported("-UTF-8");
>>>>>>>             fail("***BAD. should be exception IllegalCharsetNameException");
>>>>>>>         } catch (IllegalCharsetNameException e) {  //expected
>>>>>>>         }
>>>>>>>         // string starts neither a letter nor a digit
>>>>>>>         try{
>>>>>>>             sup=Charset.isSupported("_US-ASCII");
>>>>>>>             fail("***BAD. should be exception IllegalCharsetNameException");
>>>>>>>         } catch (IllegalCharsetNameException e) {  //expected
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>
>>>>
>>>
>> --
>> Karan Singh
>>
>>
> 
> 
> --
> Karan Malhi
> 

-- 

Tim Ellison ([EMAIL PROTECTED])
IBM Java technology centre, UK.
