Adrian, I changed the subject line as shown above. ;) Please see my comments below, at Step 1-0 and Step 1-1.
Hope this is helpful for the future,

Sung-Gu

----- Original Message -----
From: "Sung-Gu" <[EMAIL PROTECTED]>
To: "Commons HttpClient Project" <[EMAIL PROTECTED]>
Sent: Thursday, June 26, 2003 10:39 PM
Subject: Re: [VOTE] Re: 2.0 release - deprecate some methods?

>
> ----- Original Message -----
> From: "Adrian Sutton" <[EMAIL PROTECTED]>
>
> > If you don't know why the code would be useful or what it was
> > implemented based upon, why is it that you still want it in HttpClient?
> > There is nothing that uses those methods anywhere in HttpClient and
> > the presence of an FTP RFC that requires them still wouldn't make them
> > applicable to HttpClient since we aren't dealing with FTP.
>
> It's not confined to FTP only. It's for every Internet 'application layer'
> program.
>
> >
> > String temporary = URIUtil.toUsingCharset(input, "UTF-8", "Big5");
> > String result = URIUtil.toUsingCharset(temporary, "Big5", "UTF-8");
> > assertEquals(input, result);
> >
> > * \u4E01 is a Chinese character. You can substitute \uCBBF for a wide
> > range of Chinese characters and the test will still fail.
> >
> > * Big5 is a very commonly used charset for Chinese characters.
>
> [reminder]
> The first step in the process can be performed by maintaining a mapping
> table that includes the local character set code and the corresponding UCS
> code.

If you're eager to use it with Big5 and EBCDIC (I regard them as legacy
charsets), I think I can give you a hint to try. Please hear me out...

Step 1-0) As basic preparation for the first step, your operating system
should have Unicode language support installed for your local character
set. For example, Big5 together with something like Big5.UTF-16 or
ch.UTF-16? Or EBCDIC together with something like EBCDIC.UTF-16? I don't
know... Perhaps you could use the URI.getDefaultDocumentCharsetByLocale or
URI.getDefaultDocumentCharsetByPlatform methods. I'm not sure, though.
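As a rough illustration of the preparation in Step 1-0, a Java program can ask the runtime whether it has a mapping table for a given charset. This is only a sketch: the class name `CharsetAvailability` and the helper `isUsable` are made up for this example, `IBM037` is just one assumed EBCDIC code page name, and it does not use the `URI.getDefaultDocumentCharsetBy...` methods mentioned above.

```java
import java.nio.charset.Charset;

final class CharsetAvailability {

    // Hypothetical helper: true if this JVM can convert between the named
    // charset and Unicode, false if it cannot or the name is not legal.
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Charset.isSupported rejects syntactically illegal names
            // with IllegalCharsetNameException, a subclass of this.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        // "IBM037" is an assumed example of an EBCDIC code page.
        for (String name : new String[] {"UTF-8", "Big5", "IBM037"}) {
            System.out.println(name + " supported: " + isUsable(name));
        }
    }
}
```

Whether Big5 or an EBCDIC code page is reported as supported depends on the JVM installation, so the output is not shown here.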
If you're using Windows 2000 or XP, then I guess you can find a code page
for Unicode. Don't confuse it with the code pages for legacy languages.
(I don't really know about that... Anyway, you need a mapping table for
Big5 or EBCDIC... Isn't EBCDIC a really old legacy code used only by IBM?)
(As for Java? Well, I don't expect there to be any ISO-8859-45 or
ISO-8859-99 for Chinese or EBCDIC.)

> The next step is to convert the UCS character code to the UTF-8 encoding.
>
> Hmmm.... I don't know about Big5 though...
> As I guess, Big5 is not a UCS.

The second step needs Unicode as its input.

Step 1-1) Please see the previous comment.

> If you want to find a UCS for Big5 automatically, you should insert some
> code into the toUsingCharset method perhaps.

Step 1-?) This does not belong to any step.

> Some might work without UCS transformation though, it must be required I
> guess.

skip...

> > If you read the JavaDoc for the String constructor being used
> > (String(byte[], String)), it says:
> > "Constructs a new String by decoding the specified array of bytes using
> > the specified charset."
> > Note the use of the word "decoding" which means that instead of
> > creating a String backed by the given byte array, it uses the specified
> > charset to convert the bytes into actual characters - conceptually
> > these characters have no particular encoding since they are
> > (conceptually) the actual characters rather than a byte representation
> > of the characters. In reality, the characters are represented in
> > memory by a series of bytes in UTF-8 encoding as required by the JVM
> > specification.
>
> UTF-8 is a transformation charset, not really a display charset.
> I guess it's not always what the String class uses in Java.
>
> > Secondly, the toUsingCharset method cannot work in most situations
> > because it converts the string to bytes using one encoding and then
> > converts those bytes to a String using a different encoding.
> > To highlight why this cannot work, create a text file and save it to
> > disk using ASCII encoding. Then, attempt to read the file back in as
> > EBCDIC encoding (or any double-byte character charset like UTF-16), the
> > text
>
> EBCDIC is also not UCS.
>
> > will have become corrupted because the bytes were mapped to characters
> > using the wrong charset (a charset is simply a mapping between bytes
> > and characters).
> >
> > So, the possible ways for toUsingCharset to fulfill its contract are
> > for it to be changed to:
> >
> > public String toUsingCharset(String target, String fromCharset, String
> > toCharset) {
> >     return target;
> > }
> >
> > OR to:
> >
> > public byte[] toUsingCharset(String target, String toCharset) {
> >     return target.getBytes(toCharset);
> > }
> >
> > OR to:
> >
> > public byte[] toUsingCharset(byte[] target, String fromCharset, String
> > toCharset) {
> >     return new String(target, fromCharset).getBytes(toCharset);
> > }
> >
> > The last one is the only one that makes any sense at all, but I fail to
> > see how it is useful in HttpClient.
>
> Well... it should be a byte transformation,
> from the source charset to the target charset.
>
> Your first two examples look like just a one-way ticket to me.
> Perhaps they might work?
> Or the last one is similar though... I'm not sure...
>
> > So Sung-Gu, please provide some justification for your -1 in terms of
> > why the methods should remain in HttpClient - in particular where in
> > HttpClient the method would be used and for what purpose.
>
> As I mentioned previously... for example, a new method, perhaps called
> 'toAnotherDisplay', could use the toUsingCharset method to change the
> display when you change the encoding directly in your web browser...
>
> >
> > Regards,
> >
> > Adrian Sutton.
>
> Hope to be helpful,
>
> Sung-Gu
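To make the disagreement in this thread concrete, here is a minimal sketch of the two shapes of conversion being discussed. It is not the HttpClient source: `stringToString` is a hypothetical helper modelled on the behaviour described in the thread (encode with one charset, decode the bytes with another), and `recode` follows the byte[]-to-byte[] variant quoted above. UTF-16BE stands in for Big5 here, as an assumption, because only the standard charsets are guaranteed to exist on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {

    // Hypothetical String-to-String conversion as described in the thread:
    // the bytes produced by one charset are decoded with a different one.
    static String stringToString(String target, Charset from, Charset to) {
        return new String(target.getBytes(from), to);
    }

    // byte[]-to-byte[] conversion: decode the bytes into characters first,
    // then encode those characters with the target charset.
    static byte[] recode(byte[] target, Charset from, Charset to) {
        return new String(target, from).getBytes(to);
    }

    public static void main(String[] args) {
        String input = "\u4E01"; // the Chinese character used in the thread
        Charset utf8 = StandardCharsets.UTF_8;
        Charset utf16 = StandardCharsets.UTF_16BE; // stand-in for Big5

        // Same shape as the failing test quoted in the thread.
        String temporary = stringToString(input, utf8, utf16);
        String result = stringToString(temporary, utf16, utf8);
        System.out.println("String round trip preserved input: "
                + input.equals(result));

        // Byte-level transcoding there and back again.
        byte[] original = input.getBytes(utf8);
        byte[] back = recode(recode(original, utf8, utf16), utf16, utf8);
        System.out.println("byte[] round trip preserved input: "
                + Arrays.equals(original, back));
    }
}
```

With these two charsets the String round trip should report false and the byte[] round trip true: the three UTF-8 bytes of \u4E01 do not form valid UTF-16BE, so the decoder substitutes replacement characters and the original bytes cannot be recovered.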