Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread Sung-Gu
-1
My vote...  :D


FYI, (it's from URI javadoc messages,  I think they are related to your
topic.)

[snip]
 * The character set used to store files SHALL remain a local decision and
 * MAY depend on the capability of local operating systems. Prior to the
 * exchange of URIs they SHOULD be converted into a ISO/IEC 10646 format
 * and UTF-8 encoded. This approach, while allowing international exchange
 * of URIs, will still allow backward compatibility with older systems
 * because the code set positions for ASCII characters are identical to the
 * one byte sequence in UTF-8.
[snip]
 * Conversion from the local filesystem character set to UTF-8 will
 * normally involve a two step process. First convert the local character
 * set to the UCS; then convert the UCS to UTF-8.
 * The first step in the process can be performed by maintaining a mapping
 * table that includes the local character set code and the corresponding
 * UCS code.
 * The next step is to convert the UCS character code to the UTF-8 encoding.
[snip]
 * To work globally either requires support of a number of character sets
 * and to be able to convert between them, or the use of a single preferred
 * character set.
 * For support of global compatibility it is STRONGLY RECOMMENDED that
 * clients and servers use UTF-8 encoding when exchanging URIs.
[snip]

I've made sample cases and posted them before (even though they are not
normal JUnit test cases).
And I'm not willing to write a test case for that. I'm not interested in
Unicode values... at all...

To help you guys: you can find the sentences above (which means they come
from an RFC) and a more specific how-to in an RFC describing the FTP
protocol, I guess... - I don't remember exactly... :( ...

Sung-Gu

-----Original Message-----
From: Adrian Sutton [mailto:[EMAIL PROTECTED]
Sent: Thu 6/26/2003 11:10
To: Commons HttpClient Project
Cc:
Subject: [VOTE] Re: 2.0 release
All,
Personally, I believe that this issue has gone on far too long and so I
would like to propose a vote:

I move the motion that the following methods from
org.apache.commons.httpclient.util.URIUtil be deprecated for the 2.0
release and removed in a future release:

toDocumentCharset(String)
toDocumentCharset(String, String)
toProtocolCharset(String)
toProtocolCharset(String, String)
toUsingCharset(String, String, String)

Please cast your votes:

+1 - The methods should be deprecated
0 - Active Abstain (no response being a passive abstain)
-1 - The methods should not be deprecated (veto).  Vetoes must contain
an explanation of why the veto is appropriate.

Under Jakarta's voting guidelines
(http://jakarta.apache.org/site/decisions.html) product changes (such
as this) are subject to lazy consensus. In this case, however, I would
like to achieve consensus on the issue, so the vote will be
considered passed if there are 3 binding +1 votes and no binding vetoes,
and the proposal will be turned down if there are any -1 votes.

I would encourage non-committers to submit non-binding votes as well,
particularly if you can see a use for the methods in question.

Here's my +1.

Regards,

Adrian Sutton.

On Thursday, June 26, 2003, at 06:25  PM, Kalnichevski, Oleg wrote:

 Odi,
 Laura eventually conceded that these methods did not seem to make a
 lot of sense or were too specialized to be of any use for the majority
 of the HttpClient users.

 http://marc.theaimsgroup.com/?l=httpclient-commons-dev&m=104577672115772&w=2

 I do think that releasing HttpClient with stuff that makes no sense
 DOES harm.

 Again, after all, what is the bloody deal with writing a test case?
 Does it really have to take 5 months if these methods indeed make
 sense?

 Oleg


-
To unsubscribe, e-mail:
[EMAIL PROTECTED]
For additional commands, e-mail:
[EMAIL PROTECTED]

Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread Ortwin Glück
Sung-Gu wrote:
I've made sample cases and posted them before (even though they are not
normal JUnit test cases).
And I'm not willing to write a test case for that. I'm not interested in
Unicode values... at all...

Sung-Gu,

If there are no test cases, how can we be sure these methods do what they
are supposed to do? I find it dangerous to have code that nobody
understands (except you) without any test cases.

Refusing to write test cases is not good behaviour for a programmer. Are
you sure that you still want commit rights on the HttpClient repository?

Odi


Re: [VOTE] Re: 2.0 release - deprecate some methods? test case always certainly...?

2003-06-26 Thread Sung-Gu

----- Original Message -----
From: Ortwin Glück [EMAIL PROTECTED]

 are supposed to do? I find it dangerous to have code that nobody

Ok, sorry, then as you wish, I'll change my vote to +1.
I hope you're happy...

 understands (except you) without any test cases.

Actually, I don't understand. You might have noticed, and might wonder
why, there is an encoding change menu in your web browser, and you can
test it yourself.

At least you should know what you will need to know in the near future.
They will be required and introduced anyway someday, I believe...

I just wanted to influence you guys to do this well and correctly...
I even pasted the passages (have you ever read them again, or carefully?)
from the javadoc comments.

 Refusing to write test cases is not good behaviour for a programmer. Are

But, not a duty... it's volunteering... no pay...
even the code contribution here...

 you sure that you still want commit rights on the HttpClient repository?

As you know, I have not been concerned with the HttpClient code for
several years now...
My concern was HttpClient 1.0 previously... and only the URI thingy for the
last year.

Thank you for a chance to explain myself,

Sung-Gu


RE: [VOTE] Re: 2.0 release - deprecate some methods? test case always certainly...?

2003-06-26 Thread Kalnichevski, Oleg
 Actually, I don't understand. You might have noticed, and might wonder
 why, there is an encoding change menu in your web browser, and you can
 test it yourself.

Knock, knock. Hello, anybody home? Half a year ago, when this whole story started, I
tried to explain one simple thing to you: in Java the concept of charset (encoding) is
applicable to String to byte[] or byte[] to String conversions only. I do not speak
English natively either and routinely have to deal with different Cyrillic encodings. I
cannot think of a single case where String to String would make any sense, as Java 
uses Unicode (two byte) for string representation.
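
A minimal sketch of the point above, assuming a modern JDK that provides java.nio.charset.StandardCharsets (it did not exist in 2003). The class and method names are illustrative only and are not part of HttpClient. A charset is involved only when a String is turned into bytes or bytes into a String; the String itself carries no charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBoundaryDemo {
    // Encoding: the charset decides which bytes represent the characters.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decoding: the charset decides which characters the bytes represent.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String zhe = "\u0416"; // Cyrillic capital letter ZHE
        byte[] utf8 = encode(zhe, StandardCharsets.UTF_8);      // bytes D0 96
        byte[] utf16 = encode(zhe, StandardCharsets.UTF_16BE);  // bytes 04 16
        // One String, two different byte sequences. Each sequence decodes
        // back to the same String when read with the charset that wrote it.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(zhe));
        System.out.println(decode(utf16, StandardCharsets.UTF_16BE).equals(zhe));
    }
}
```
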

Laura argued that some Far East alphabets might be inadequately represented in 
Unicode. This is the only plausible explanation that I can think of. THIS IS EXACTLY 
THE USE CASE I HAVE BEEN ASKING YOU TO PRODUCE FOR 5 (FIVE) MONTHS.

 But, not a duty... it's volunteering... no pay...
 even the code contribution here...

Guess what? We do not get paid either. But at least any other HttpClient contributor 
does make an effort to listen to his/her fellow developers. This has nothing to do 
with getting paid or not.

Oleg


Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread Sung-Gu

----- Original Message -----
From: Adrian Sutton [EMAIL PROTECTED]

 If you don't know why the code would be useful or what it was
 implemented based upon, why is it that you still want it in HttpClient?
   There is nothing that uses those methods anywhere in HttpClient  and
 the presence of an FTP RFC that requires them still wouldn't make them
 applicable to HttpClient since we aren't dealing with FTP.

It's not confined to FTP only.   It applies to every Internet 'application
layer' program.


 String temporary = URIUtil.toUsingCharset(input, "UTF-8", "Big5");
 String result = URIUtil.toUsingCharset(temporary, "Big5", "UTF-8");
 assertEquals(input, result);

   * \u4E01 is a Chinese character.  You can substitute \uCBBF for a wide
 range of Chinese characters and the test will still fail.

   * Big5 is a very commonly used charset for Chinese characters.

[reminder]
The first step in the process can be performed by maintaining a mapping
table that includes the local character set code and the corresponding UCS
code.
The next step is to convert the UCS character code to the UTF-8 encoding.

Hmmm, I don't know about Big5 though...
As far as I can guess, Big5 is not a UCS.   It should be Unicode for the
second step.
If you want to find a UCS for Big5 automatically, you should perhaps insert
some code into the toUsingCharset method.
Some might work without the UCS transformation, though I guess it must be
required.

 If you read the JavaDoc for the String constructor being used
 (String(byte[], String)), it says:
 Constructs a new String by decoding the specified array of bytes using
 the specified charset.
 Note the use of the word decoding which means that instead of
 creating a String backed by the given byte array, it uses the specified
 charset to convert the bytes into actual characters - conceptually
 these characters have no particular encoding since they are
 (conceptually) the actual characters rather than a byte representation
 of the characters.  In reality, the characters are represented in
 memory by a series of bytes in UTF-8 encoding as required by the JVM
 specification.

UTF-8 is a transformation charset, not really a display charset.
It's not always used by the String class in Java, I guess.

 Secondly, the toUsingCharset method cannot work in most situations
 because it converts the string to bytes using one encoding and then
 converts those bytes to a String using a different encoding.  To
 highlight why this cannot work, create a text file and save it to disk
 using ASCII encoding.  Then, attempt to read the file back in as EBCDIC
 encoding (or any double-byte character charset like UTF-16), the text

EBCDIC is also not a UCS.

 will have become corrupted because the bytes were mapped to characters
 using the wrong charset (a charset is simply a mapping between bytes
 and characters).

 So, the possible ways for toUsingCharset to fulfill its contract are
 for it to be changed to:

 public String toUsingCharset(String target, String fromCharset,
         String toCharset) {
     return target;
 }

 OR to:

 public byte[] toUsingCharset(String target, String toCharset)
         throws UnsupportedEncodingException {
     return target.getBytes(toCharset);
 }

 OR to:

 public byte[] toUsingCharset(byte[] target, String fromCharset,
         String toCharset) throws UnsupportedEncodingException {
     return new String(target, fromCharset).getBytes(toCharset);
 }

 The last one is the only one that makes any sense at all, but I fail to
 see how it is useful in HttpClient.

Well... it should be a byte transformation,
like from the source charset to the target charset.

Your first two examples look like just a one-way ticket to me.
Probably they might work?
Or the last one is similar, though... I'm not sure...

 So Sung-Gu, please provide some justification for your -1 in terms of
 why the methods should remain in HttpClient - in particular where in
 HttpClient the method would be used and for what purpose.

As I mentioned previously...  for example, a new method, perhaps called
'toAnotherDisplay', built on the toUsingCharset method could be used to
change your display directly when you change the encoding in your web browser...


 Regards,

 Adrian Sutton.

Hope to be helpful,

Sung-Gu


Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread Jandalf
+1

These methods are not used by HttpClient and do not appear useful to
HttpClient users. Combined with the fact that they do not appear correct
and the pending desire to start a Commons-URI project, this indicates
that they should not be in the public interface of HttpClient.




Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread Laura Werner
Adrian Sutton wrote:

The flaw in the toUsingCharset method is two-fold:
Firstly, Strings in Java are *always* stored internally as UTF-8


I agree with the rest of your analysis of this, but I thought I should 
point out that Java Strings and chars are stored in UTF-16 rather than 
UTF-8.  A char is an unsigned, two-byte value that can hold all the 
characters from UCS2.

As far as toUsingCharset goes, I agree that it looks broken.  The code 
basically does:

   return new String(target.getBytes(fromCharset), toCharset);

It's taking target, which is a UTF-16 string, encoding it into a byte 
array in fromCharset, and then decoding those bytes back into UTF-16 
using toCharset.  So it's pretending the bytes in the array have two 
different meanings, one when it writes them and one when it reads them 
immediately afterward.  I can't see how this could be correct.
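
A small sketch of the failure described above, assuming the method body is as paraphrased in this message. It uses Charset constants from java.nio.charset.StandardCharsets (a later JDK addition) instead of charset-name strings, and the class name is made up for illustration. Text containing only ASCII survives by accident; anything else is mangled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToUsingCharsetDemo {
    // Same pattern as the paraphrased method: encode with one charset,
    // then decode the resulting bytes as if they were in another.
    public static String toUsingCharset(String target, Charset from, Charset to) {
        return new String(target.getBytes(from), to);
    }

    public static void main(String[] args) {
        String input = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        String output = toUsingCharset(input,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // The UTF-8 bytes C3 A9 are misread as two ISO-8859-1 characters.
        System.out.println(output.equals(input));           // false
        System.out.println(output.equals("\u00c3\u00a9"));  // true
    }
}
```
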

-- Laura


Re: [VOTE] Re: 2.0 release - deprecate some methods? test case always certainly...?

2003-06-26 Thread Ortwin Glück
Sung-Gu wrote:
But, not a duty... it's volunteering... no pay...
even the code contribution here...

I'll pay you $5 for test cases from my Paypal account. Promise.

:-)


Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread André-John Mas
This doesn't look correct. If you really want to convert
from one charset to another then you would have to do something
such as:

   String myString = new String(bytes, bytesCharset);
   byte[] bytes2 = myString.getBytes(destCharset);

Until you have the bytes, you don't have the final output, since
strings will be affected by the platform's native encoding if
you aren't careful. Otherwise, if your destination is an output stream,
then let the OutputStreamWriter do the work for you:

   String myString = new String(bytes, bytesCharset);
   OutputStreamWriter out = new OutputStreamWriter(outStream, destCharset);
   out.write(myString);

I have just had to write a project that is fully UTF-8 compliant
and it taught me a lot about what Java does. Without any encoding
specified the string conversion defaults to the platform native
format, which is not what you always want. I had to go everywhere
and make sure the right conversions were being performed.
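
The stream-based variant above could be sketched roughly as follows. This assumes a modern JDK (try-with-resources and java.nio.charset.StandardCharsets); the class name and the in-memory ByteArrayOutputStream sink are illustrative stand-ins for whatever output stream is really in use. Both charsets are named explicitly, so the platform default never applies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterTranscodeDemo {
    // Decode the input bytes with their real charset, then let an
    // OutputStreamWriter encode the text in the destination charset.
    public static byte[] transcode(byte[] bytes, Charset bytesCharset,
                                   Charset destCharset) throws IOException {
        String text = new String(bytes, bytesCharset);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, destCharset)) {
            out.write(text);
        } // closing the writer flushes any buffered bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = { (byte) 0xC3, (byte) 0xBC }; // u with diaeresis in UTF-8
        byte[] latin1 = transcode(utf8,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length);                 // 1
        System.out.println(Integer.toHexString(latin1[0] & 0xFF)); // fc
    }
}
```
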
regards

Andre


--
André-John Mas
Software Developer / Développeur Informatique
Newtrade Technologies
63 de Brésoles, Suite 100, Montreal, Quebec, Canada H2Y 1V7
mailto:[EMAIL PROTECTED]
tel +1 514 286-8187 x3017
fax +1 514 221-3287

Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread secoskem
I just got through internationalizing a website... input and output.  I 
ran into the exact same issues, and as Andre states, you pretty much need 
to check everywhere for byte[] -> String and String -> byte[].

Then do the conversions he's given.  I personally liked the more terse:

byte[] outbytes = new String(inbytes, 
inputEncoding).getBytes(outputEncoding);
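
As a hedged illustration of that one-liner, the sketch below wraps it in a method that takes Charset objects (from java.nio.charset.StandardCharsets, a later JDK addition) rather than charset names. The class name is hypothetical. It also shows one caveat worth knowing: when the destination charset cannot represent a character, String.getBytes silently substitutes the charset's replacement byte, which for US-ASCII is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // bytes -> String (decode) -> bytes (encode): the only direction in
    // which "changing the charset" of some data is meaningful.
    public static byte[] transcode(byte[] inbytes, Charset inputEncoding,
                                   Charset outputEncoding) {
        return new String(inbytes, inputEncoding).getBytes(outputEncoding);
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xE9 }; // e with acute accent in ISO-8859-1
        byte[] utf8 = transcode(latin1,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        byte[] ascii = transcode(latin1,
                StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII);
        System.out.println(utf8.length);     // 2 (bytes C3 A9)
        System.out.println((char) ascii[0]); // ? (character not representable)
    }
}
```
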

- Matt Secoske







Re: [VOTE] Re: 2.0 release - deprecate some methods?

2003-06-26 Thread Oleg Kalnichevski
On Thu, 2003-06-26 at 23:40, [EMAIL PROTECTED] wrote:
 I just got through Internationalizing a website... input and output.  I 
 ran into the exact same issues, and as Andre states, you pretty much need 
 to check everywhere for byte[] -> String and String -> byte[].


Matt, we actually do. We do have a specialised class called
HttpConstants that requires a charset to be explicitly specified for all
byte[] to String and String to byte[] conversions. HttpClient's coding
guidelines require the HttpConstants class to be used for ALL byte[] to
String and String to byte[] conversions. And we are VERY strict about
it.

The problem is that we can't convince just one guy to fix just one damn
method. This issue has been dragging on for 6 (six) months already and
has finally blown out of all reasonable proportion.

Oleg

