Re: [classlib] charset decoding

2006-04-28 Thread Richard Liang

Paulex Yang wrote:

Vladimir Strigun wrote:

On 4/26/06, Paulex Yang [EMAIL PROTECTED] wrote:
 

Andrew Zhang wrote:
   

Mikhail Loenko wrote:
 

2006/4/26, Andrew Zhang [EMAIL PROTECTED]:
   

Vladimir Strigun wrote:
 

On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:
I try to find best solution for Harmony-166 and during fix 
preparation

I've found current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest 
failed

and first fix I've created was with in.available() but I agree with
Mikhail that it's possible not the best solution.


why?

Yes, the spec says The available method for class InputStream 
always

returns 0.. But it also says This method should be overridden by
subclasses.. :)

Here's the available description:
Returns the number of bytes that can be read (or skipped over) from
this input stream without blocking by the next caller of a method 
for

this input stream.

If someone writes a subclass of InputStream that available returns 0
even if there are some data available, then he should take the
result of
cheating or contradicting with spec.
  

What if someone just unable to say if the bytes are available?
For example, he reads something from a hardware module and
the only way to know whether there are bytes is try reading?



I'm a little confused.

How does solution 2 handle this problem?
  

I agree with Mikhail that available is not reliable.  You can judge the
stream end has been reached if and only if you got -1 returned from 
read

methods.

I did a quick view of Harmony-166 and patches(seems it's the root of 
why

Harmony-410 is raised), I think there should be some easier way to fix
this bug,  while the CharsetDecoder doesn't need to be modified, pls.
see below for details.

The fillBuf() of InputStreamReader should be looks like:

   private void fillBuf() throws IOException {
   chars.clear();
   int read = 0;
  // if we haven't reached the end of stream, and cannot decode
even one character,
 // go on to read and decode
   while(read != -1  chars.position() == 0){
   try {
   read = in.read(bytes.array());
   } catch (IOException e) {
   chars.limit(0);
   throw e;
   }
   boolean endOfInput = false;
   if(read == -1){
   //if we have reached the end of stream, try the last
time to decode
   bytes.limit(0);
   endOfInput = true;
   }else{
   //if we read some bytes, try to decode them
   bytes.limit(read);
   read = 0;
   }
   decoder.decode(bytes, chars, endOfInput);
   bytes.clear();
   }
   chars.flip();
   }

I've got a pass to run Vladimir's test case for Harmony-166.

If no one objections, I'll attach patch based on this. comments?



Paulex, unfortunately I can't agree with your patch.
InputStreamReaderTest passed, but OutputStreamWriterTest still failed.
Could you please try to run test I have mentioned?
  
I see, the OutputStreamWriterTest does fail, seems there's some 
problem for CharsetDecoder to handling UTF-8 byte stream, but the 
InputStreamReader itself should work because it pass all tests for 
other decoding. I'll go on to study it.


And I also realized there is a bug in my former proposal - it cannot 
support the non-blocking InputStream, so the revisited version is as 
below:

   private void fillBuf() throws IOException {
   chars.clear();
   int read = 0;
   do{
   try {
   read = in.read(bytes.array());
   } catch (IOException e) {
   chars.limit(0);
   throw e;
   }
   boolean endOfInput = false;
   if(read == -1){
   bytes.limit(0);
   endOfInput = true;
   }else{
   bytes.limit(read);
   }
   decoder.decode(bytes, chars, endOfInput);
   bytes.clear();
   }while(read  0  chars.position() == 0);
//the main difference with prior version is to check read0 instead of 
read != -1, so that the InputStreamReader based on non-blocking 
InputStream can return immediatelly

   chars.flip();
   }

Thanks.
Vladimir.

 

Thanks,
Mikhail

   

In current implementation of InputStreamReader endOfInput variable
sets to true if reader can't read less that 8192 bytes. When
InputStreamReader try to read one character from
LimitedByteArrayInputStream (A ByteArrayInputStream that only 
returns
a single byte per read) true as a parameter passed to charset 
decoder,

nevertheless we still have further input from
LimitedByteArrayInputStream.

2 methods from tests.api.java.io.OutputStreamWriterTest also failed
because of read() method implementation of InputStreamReader.

IMO, we have 2 ways to fix Harmony-166 :

1. Don't change the behavior of CharsetDecoder, and use 
in.available()

method for endOfInput variable. If in.available()  

Re: [classlib] charset decoding

2006-04-28 Thread Paulex Yang

Richard Liang wrote:

Paulex Yang wrote:
I see, the OutputStreamWriterTest does fail, seems there's some 
problem for CharsetDecoder to handling UTF-8 byte stream, but the 
InputStreamReader itself should work because it pass all tests for 
other decoding. I'll go on to study it.


And I also realized there is a bug in my former proposal - it cannot 
support the non-blocking InputStream, so the revisited version is as 
below:

   private void fillBuf() throws IOException {
   chars.clear();
   int read = 0;
   do{
   try {
   read = in.read(bytes.array());
   } catch (IOException e) {
   chars.limit(0);
   throw e;
   }
   boolean endOfInput = false;
   if(read == -1){
   bytes.limit(0);
   endOfInput = true;
   }else{
   bytes.limit(read);
   }
   decoder.decode(bytes, chars, endOfInput);
   bytes.clear();
   }while(read  0  chars.position() == 0);
//the main difference with prior version is to check read0 instead 
of read != -1, so that the InputStreamReader based on non-blocking 
InputStream can return immediatelly

   chars.flip();
   }  

Yes. It IS a bug of ICU4JNI. I have submitted a bug [1] for ICU and 
have proposed a fix for it.


[1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5180



Great! Thank you, Richard. Vladimir, how about I attach patch for 
InputStreamReader to Harmony-166 at first? At least, it can make 
InputStreamReaderTest pass.


Richard, have you tried the OutputStreamWriterTest with your patch for 
ICU and my modification to InputStreamReader?


--
Paulex Yang
China Software Development Lab
IBM



-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[classlib] charset decoding (was: [Fwd: [jira] Commented: (HARMONY-410) method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer])

2006-04-26 Thread Tim Ellison
Looks like a conversation best conducted on the dev list...

 Original Message 
Subject: [jira] Commented: (HARMONY-410) method decode(ByteBuffer,
CharBuffer, boolean) should set correct position in ByteBuffer
Date: Wed, 26 Apr 2006 08:25:03 + (GMT+00:00)
From: Vladimir Strigun (JIRA) [EMAIL PROTECTED]
Reply-To: harmony-dev@incubator.apache.org
To: [EMAIL PROTECTED]

[
http://issues.apache.org/jira/browse/HARMONY-410?page=comments#action_12376425
]

Vladimir Strigun commented on HARMONY-410:
--

Hi Richard,

Thanks for the clarification, I agree that streaming decode is good
thing, but I'd like to explain my understanding of specification :)
According to specification of CharsetDecoder decoding operation should
process by the following steps:

2. Invoke the decode method zero or more times, as long as additional
input may be available, passing false for the endOfInput argument and
filling the input buffer and flushing the output buffer between
invocations;

3. Invoke the decode method one final time, passing true for the
endOfInput argument; and then

spec also says:
The buffers' positions will be advanced to reflect the bytes read and
the characters written, but their marks and limits will not be modified

I understand these sentences in the next way:
invoke decode with endOfInput = false if you have additional input, then
fill the buffer (i.e. add to buffer some additional input), invoke
decode again and pass correct endOfInput parameter dependent of
availability of input.

Example you provided is very useful and, of course, 1st option looks
better, but what I'm suggest here is to reflect actual processed bytes
in input. After first invocation of decode, not all bytes processed
actually, i.e. almost all bytes processed, but some stored in decoder to
further operation. Is it possible to store bytes in decoder, support
streaming decoding, and, at the same time sets correct position in input
buffer after each operation?

Thanks.
Vladimir.

 method decode(ByteBuffer, CharBuffer, boolean) should set correct position in 
 ByteBuffer
 

  Key: HARMONY-410
  URL: http://issues.apache.org/jira/browse/HARMONY-410
  Project: Harmony
 Type: Bug

   Components: Classlib
 Reporter: Vladimir Strigun
 Assignee: Mikhail Loenko
 Priority: Minor
  Attachments: Harmony-410_patch.txt, harmony-410_test.txt

 When ByteBuffer contain incomplete sequence of bytes for successful decoding, 
 position in ByteBuffer should be set to latest successful byte. I will attach 
 testcase and patch soon.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



-- 

Tim Ellison ([EMAIL PROTECTED])
IBM Java technology centre, UK.

-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[classlib] charset decoding

2006-04-26 Thread Vladimir Strigun
On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:
 Hi Vladimir,

 I think we both understand the spec :)

 Here I propose two solutions:

 1. Set the ByteBuffer to the limit, and store the remaining ByteBuffer in
 decoder. Invokers doesn't care the position of in. If the result is
 UNDERFLOW and there're furthur input, just pass the new input to the
 decoder. It's a typical streaming decoder.  That's what Harmony does
 currently.

 2. Decoder doesn't store remaining ByteBuffer. Position of in is set to
 indicate the remaining ByteBuffer. Invoker should take care to generate new
 input ByteBuffer for next invocation.  RI acts in this way.

 Both are acceptable.

 However, I think solution 1 is more reasonable.

 Is it possible to store bytes in decoder, support streaming decoding, and,
 at the same time sets correct position in input buffer after each operation?
 

 Yes.  In fact, your patch will make Harmony act as the description above.

 However, I don't think it solve the problem. Maybe invoker is more
 confusable and may think:

 Is the remaining bytebuffer maintained in decoder or in ?  Shall I get the
 remaining buffer from in and pass it for next invocation?

 Anyway, we need a decision on this compatibility issue.
 By the way, Vladimir, does solution one cause any problem on other classlib
 implementation?


I try to find best solution for Harmony-166 and during fix preparation
I've found current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest failed
and first fix I've created was with in.available() but I agree with
Mikhail that it's possible not the best solution.

In current implementation of InputStreamReader endOfInput variable
sets to true if reader can't read less that 8192 bytes. When
InputStreamReader try to read one character from
LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
a single byte per read) true as a parameter passed to charset decoder,
nevertheless we still have further input from
LimitedByteArrayInputStream.

2 methods from tests.api.java.io.OutputStreamWriterTest also failed
because of read() method implementation of InputStreamReader.

IMO, we have 2 ways to fix Harmony-166 :

1. Don't change the behavior of CharsetDecoder, and use in.available()
method for endOfInput variable. If in.available()  0 pass false to
decoder, read additional one byte from input stream and try to decode
again. If decoding operation can decode remaining portion + one
additional byte to character, stop here, else try to read addition
byte again. Function fillBuf will invoke with the next invocation of
read() method and next portion of 8192 bytes will be read from input
stream.

2. Change the behavior of CharsetDecoder and operate with
bytes.remaining() for calculating endOfInput. In this case, we always
pass false with first invocation of decode method. Algorithm further
is almost the same as above, but we stop the cycle if all bytes
decoded successfully (i.e if bytes.hasRemaining() is false).

What do you think about it?


Thanks.
Vladimir.

 Any comment?

 Thanks !


 Vladimir Strigun commented on HARMONY-410:
 --

 Hi Richard,

 Thanks for the clarification, I agree that streaming decode is good thing,
 but I'd like to explain my understanding of specification :)
 According to specification of CharsetDecoder decoding operation should
 process by the following steps:
 
 2. Invoke the decode method zero or more times, as long as additional input
 may be available, passing false for the endOfInput argument and filling the
 input buffer and flushing the output buffer between invocations;

 3. Invoke the decode method one final time, passing true for the endOfInput
 argument; and then
 
 spec also says:
 The buffers' positions will be advanced to reflect the bytes read and the
 characters written, but their marks and limits will not be modified

 I understand these sentences in the next way:
 invoke decode with endOfInput = false if you have additional input, then
 fill the buffer (i.e. add to buffer some additional input), invoke decode
 again and pass correct endOfInput parameter dependent of availability of
 input.

 Example you provided is very useful and, of course, 1st option looks better,
 but what I'm suggest here is to reflect actual processed bytes in input.
 After first invocation of decode, not all bytes processed actually, i.e.
 almost all bytes processed, but some stored in decoder to further operation.
 Is it possible to store bytes in decoder, support streaming decoding, and,
 at the same time sets correct position in input buffer after each operation?

 Thanks.
 Vladimir.

  method decode(ByteBuffer, CharBuffer, boolean) should set correct position
 in ByteBuffer
 
 
 
   Key: HARMONY-410
   URL: http://issues.apache.org/jira/browse/HARMONY-410
   Project: Harmony
  Type: Bug


Re: [classlib] charset decoding (was: [Fwd: [jira] Commented: (HARMONY-410) method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer])

2006-04-26 Thread Geir Magnusson Jr


Tim Ellison wrote:

Looks like a conversation best conducted on the dev list...


I have to admit I actually laughed out loud. Thx :)

geir




-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [classlib] charset decoding

2006-04-26 Thread Andrew Zhang

Vladimir Strigun wrote:

On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:




I try to find best solution for Harmony-166 and during fix preparation
I've found current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest failed
and first fix I've created was with in.available() but I agree with
Mikhail that it's possible not the best solution.

why?

Yes, the spec says The available method for class InputStream always 
returns 0.. But it also says This method should be overridden by 
subclasses.. :)


Here's the available description:
Returns the number of bytes that can be read (or skipped over) from 
this input stream without blocking by the next caller of a method for 
this input stream.


If someone writes a subclass of InputStream that available returns 0 
even if there are some data available, then he should take the result of 
cheating or contradicting with spec.


In current implementation of InputStreamReader endOfInput variable
sets to true if reader can't read less that 8192 bytes. When
InputStreamReader try to read one character from
LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
a single byte per read) true as a parameter passed to charset decoder,
nevertheless we still have further input from
LimitedByteArrayInputStream.

2 methods from tests.api.java.io.OutputStreamWriterTest also failed
because of read() method implementation of InputStreamReader.

IMO, we have 2 ways to fix Harmony-166 :

1. Don't change the behavior of CharsetDecoder, and use in.available()
method for endOfInput variable. If in.available()  0 pass false to
decoder, read additional one byte from input stream and try to decode
again. If decoding operation can decode remaining portion + one
additional byte to character, stop here, else try to read addition
byte again. Function fillBuf will invoke with the next invocation of
read() method and next portion of 8192 bytes will be read from input
stream.


It's reasonable.
After all, streaming decode is useful and helpful to users.
IMO, streaming decode is a highlight compared with some RI :)


2. Change the behavior of CharsetDecoder and operate with
bytes.remaining() for calculating endOfInput. In this case, we always
pass false with first invocation of decode method. Algorithm further
is almost the same as above, but we stop the cycle if all bytes
decoded successfully (i.e if bytes.hasRemaining() is false).

What do you think about it?


Thanks.
Vladimir.


Thanks!

--
Andrew Zhang
China Software Development Lab, IBM




Any comment?

Thanks !


Vladimir Strigun commented on HARMONY-410:
--

Hi Richard,

Thanks for the clarification, I agree that streaming decode is good thing,
but I'd like to explain my understanding of specification :)
According to specification of CharsetDecoder decoding operation should
process by the following steps:

2. Invoke the decode method zero or more times, as long as additional input
may be available, passing false for the endOfInput argument and filling the
input buffer and flushing the output buffer between invocations;

3. Invoke the decode method one final time, passing true for the endOfInput
argument; and then

spec also says:
The buffers' positions will be advanced to reflect the bytes read and the
characters written, but their marks and limits will not be modified

I understand these sentences in the next way:
invoke decode with endOfInput = false if you have additional input, then
fill the buffer (i.e. add to buffer some additional input), invoke decode
again and pass correct endOfInput parameter dependent of availability of
input.

Example you provided is very useful and, of course, 1st option looks better,
but what I'm suggest here is to reflect actual processed bytes in input.
After first invocation of decode, not all bytes processed actually, i.e.
almost all bytes processed, but some stored in decoder to further operation.
Is it possible to store bytes in decoder, support streaming decoding, and,
at the same time sets correct position in input buffer after each operation?

Thanks.
Vladimir.


method decode(ByteBuffer, CharBuffer, boolean) should set correct position

in ByteBuffer


 Key: HARMONY-410
 URL: http://issues.apache.org/jira/browse/HARMONY-410
 Project: Harmony
Type: Bug
  Components: Classlib
Reporter: Vladimir Strigun
Assignee: Mikhail Loenko
Priority: Minor
 Attachments: Harmony-410_patch.txt, harmony-410_test.txt

When ByteBuffer contain incomplete sequence of bytes for successful

decoding, position in ByteBuffer should be set to latest successful byte. I
will attach testcase and patch soon.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
 http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
 

Re: [classlib] charset decoding

2006-04-26 Thread Mikhail Loenko
2006/4/26, Andrew Zhang [EMAIL PROTECTED]:
 Vladimir Strigun wrote:
  On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:

 
  I try to find best solution for Harmony-166 and during fix preparation
  I've found current issue.
 
  Method test_read from tests.api.java.io.InputStreamReaderTest failed
  and first fix I've created was with in.available() but I agree with
  Mikhail that it's possible not the best solution.
 why?

 Yes, the spec says The available method for class InputStream always
 returns 0.. But it also says This method should be overridden by
 subclasses.. :)

 Here's the available description:
 Returns the number of bytes that can be read (or skipped over) from
 this input stream without blocking by the next caller of a method for
 this input stream.

 If someone writes a subclass of InputStream that available returns 0
 even if there are some data available, then he should take the result of
 cheating or contradicting with spec.

What if someone just unable to say if the bytes are available?
For example, he reads something from a hardware module and
the only way to know whether there are bytes is try reading?

Thanks,
Mikhail

 
  In current implementation of InputStreamReader endOfInput variable
  sets to true if reader can't read less that 8192 bytes. When
  InputStreamReader try to read one character from
  LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
  a single byte per read) true as a parameter passed to charset decoder,
  nevertheless we still have further input from
  LimitedByteArrayInputStream.
 
  2 methods from tests.api.java.io.OutputStreamWriterTest also failed
  because of read() method implementation of InputStreamReader.
 
  IMO, we have 2 ways to fix Harmony-166 :
 
  1. Don't change the behavior of CharsetDecoder, and use in.available()
  method for endOfInput variable. If in.available()  0 pass false to
  decoder, read additional one byte from input stream and try to decode
  again. If decoding operation can decode remaining portion + one
  additional byte to character, stop here, else try to read addition
  byte again. Function fillBuf will invoke with the next invocation of
  read() method and next portion of 8192 bytes will be read from input
  stream.
 
 It's reasonable.
 After all, streaming decode is useful and helpful to users.
 IMO, streaming decode is a highlight compared with some RI :)

  2. Change the behavior of CharsetDecoder and operate with
  bytes.remaining() for calculating endOfInput. In this case, we always
  pass false with first invocation of decode method. Algorithm further
  is almost the same as above, but we stop the cycle if all bytes
  decoded successfully (i.e if bytes.hasRemaining() is false).
 
  What do you think about it?
 
 
  Thanks.
  Vladimir.

 Thanks!

 --
 Andrew Zhang
 China Software Development Lab, IBM

 
  Any comment?
 
  Thanks !
 
 
  Vladimir Strigun commented on HARMONY-410:
  --
 
  Hi Richard,
 
  Thanks for the clarification, I agree that streaming decode is good thing,
  but I'd like to explain my understanding of specification :)
  According to specification of CharsetDecoder decoding operation should
  process by the following steps:
  
  2. Invoke the decode method zero or more times, as long as additional input
  may be available, passing false for the endOfInput argument and filling the
  input buffer and flushing the output buffer between invocations;
 
  3. Invoke the decode method one final time, passing true for the endOfInput
  argument; and then
  
  spec also says:
  The buffers' positions will be advanced to reflect the bytes read and the
  characters written, but their marks and limits will not be modified
 
  I understand these sentences in the next way:
  invoke decode with endOfInput = false if you have additional input, then
  fill the buffer (i.e. add to buffer some additional input), invoke decode
  again and pass correct endOfInput parameter dependent of availability of
  input.
 
  Example you provided is very useful and, of course, 1st option looks 
  better,
  but what I'm suggest here is to reflect actual processed bytes in input.
  After first invocation of decode, not all bytes processed actually, i.e.
  almost all bytes processed, but some stored in decoder to further 
  operation.
  Is it possible to store bytes in decoder, support streaming decoding, and,
  at the same time sets correct position in input buffer after each 
  operation?
 
  Thanks.
  Vladimir.
 
  method decode(ByteBuffer, CharBuffer, boolean) should set correct position
  in ByteBuffer
  
   Key: HARMONY-410
   URL: http://issues.apache.org/jira/browse/HARMONY-410
   Project: Harmony
  Type: Bug
Components: Classlib
  Reporter: Vladimir Strigun
  Assignee: Mikhail Loenko
  Priority: Minor
   Attachments: Harmony-410_patch.txt, 

Re: [classlib] charset decoding

2006-04-26 Thread Andrew Zhang


Mikhail Loenko wrote:

2006/4/26, Andrew Zhang [EMAIL PROTECTED]:

Vladimir Strigun wrote:

On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:
I try to find best solution for Harmony-166 and during fix preparation
I've found current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest failed
and first fix I've created was with in.available() but I agree with
Mikhail that it's possible not the best solution.

why?

Yes, the spec says The available method for class InputStream always
returns 0.. But it also says This method should be overridden by
subclasses.. :)

Here's the available description:
Returns the number of bytes that can be read (or skipped over) from
this input stream without blocking by the next caller of a method for
this input stream.

If someone writes a subclass of InputStream that available returns 0
even if there are some data available, then he should take the result of
cheating or contradicting with spec.




What if someone just unable to say if the bytes are available?
For example, he reads something from a hardware module and
the only way to know whether there are bytes is try reading?



I'm a little confused.

How does solution 2 handle this problem?


Thanks,
Mikhail


In current implementation of InputStreamReader endOfInput variable
sets to true if reader can't read less that 8192 bytes. When
InputStreamReader try to read one character from
LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
a single byte per read) true as a parameter passed to charset decoder,
nevertheless we still have further input from
LimitedByteArrayInputStream.

2 methods from tests.api.java.io.OutputStreamWriterTest also failed
because of read() method implementation of InputStreamReader.

IMO, we have 2 ways to fix Harmony-166 :

1. Don't change the behavior of CharsetDecoder, and use in.available()
method for endOfInput variable. If in.available()  0 pass false to
decoder, read additional one byte from input stream and try to decode
again. If decoding operation can decode remaining portion + one
additional byte to character, stop here, else try to read addition
byte again. Function fillBuf will invoke with the next invocation of
read() method and next portion of 8192 bytes will be read from input
stream.


It's reasonable.
After all, streaming decode is useful and helpful to users.
IMO, streaming decode is a highlight compared with some RI :)


2. Change the behavior of CharsetDecoder and operate with
bytes.remaining() for calculating endOfInput. In this case, we always
pass false with first invocation of decode method. Algorithm further
is almost the same as above, but we stop the cycle if all bytes
decoded successfully (i.e if bytes.hasRemaining() is false).


Vladimir, Could you please give the detail code or pseudo code of these 
two solutions?


Thanks!



What do you think about it?


Thanks.
Vladimir.

Thanks!




-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [classlib] charset decoding

2006-04-26 Thread Paulex Yang

Andrew Zhang wrote:


Mikhail Loenko wrote:

2006/4/26, Andrew Zhang [EMAIL PROTECTED]:

Vladimir Strigun wrote:

On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:
I try to find best solution for Harmony-166 and during fix preparation
I've found current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest failed
and first fix I've created was with in.available() but I agree with
Mikhail that it's possible not the best solution.

why?

Yes, the spec says The available method for class InputStream always
returns 0.. But it also says This method should be overridden by
subclasses.. :)

Here's the available description:
Returns the number of bytes that can be read (or skipped over) from
this input stream without blocking by the next caller of a method for
this input stream.

If someone writes a subclass of InputStream that available returns 0
even if there are some data available, then he should take the 
result of

cheating or contradicting with spec.




What if someone just unable to say if the bytes are available?
For example, he reads something from a hardware module and
the only way to know whether there are bytes is try reading?



I'm a little confused.

How does solution 2 handle this problem?
I agree with Mikhail that available is not reliable.  You can judge the 
stream end has been reached if and only if you got -1 returned from read 
methods.


I did a quick view of Harmony-166 and patches(seems it's the root of why 
Harmony-410 is raised), I think there should be some easier way to fix 
this bug,  while the CharsetDecoder doesn't need to be modified, pls. 
see below for details.


The fillBuf() of InputStreamReader should be looks like:

   private void fillBuf() throws IOException {
   chars.clear();
   int read = 0;
  // if we haven't reached the end of stream, and cannot decode 
even one character,

 // go on to read and decode
   while(read != -1  chars.position() == 0){
   try {
   read = in.read(bytes.array());
   } catch (IOException e) {
   chars.limit(0);
   throw e;
   }
   boolean endOfInput = false;
   if(read == -1){
   //if we have reached the end of stream, try the last 
time to decode

   bytes.limit(0);
   endOfInput = true;
   }else{
   //if we read some bytes, try to decode them
   bytes.limit(read);
   read = 0;
   }
   decoder.decode(bytes, chars, endOfInput);
   bytes.clear();
   }
   chars.flip();
   }

I've got a pass to run Vladimir's test case for Harmony-166.

If no one objections, I'll attach patch based on this. comments?



Thanks,
Mikhail


In current implementation of InputStreamReader endOfInput variable
sets to true if reader can't read less that 8192 bytes. When
InputStreamReader try to read one character from
LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
a single byte per read) true as a parameter passed to charset decoder,
nevertheless we still have further input from
LimitedByteArrayInputStream.

2 methods from tests.api.java.io.OutputStreamWriterTest also failed
because of read() method implementation of InputStreamReader.

IMO, we have 2 ways to fix Harmony-166 :

1. Don't change the behavior of CharsetDecoder, and use in.available()
method for endOfInput variable. If in.available()  0 pass false to
decoder, read additional one byte from input stream and try to decode
again. If decoding operation can decode remaining portion + one
additional byte to character, stop here, else try to read addition
byte again. Function fillBuf will invoke with the next invocation of
read() method and next portion of 8192 bytes will be read from input
stream.


It's reasonable.
After all, streaming decode is useful and helpful to users.
IMO, streaming decode is a highlight compared with some RI :)


2. Change the behavior of CharsetDecoder and operate with
bytes.remaining() for calculating endOfInput. In this case, we always
pass false with first invocation of decode method. Algorithm further
is almost the same as above, but we stop the cycle if all bytes
decoded successfully (i.e if bytes.hasRemaining() is false).


Vladimir, Could you please give the detail code or pseudo code of 
these two solutions?


Thanks!



What do you think about it?


Thanks.
Vladimir.

Thanks!




-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--
Paulex Yang
China Software Development Lab
IBM



-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [classlib] charset decoding

2006-04-26 Thread Mikhail Loenko
2006/4/26, Paulex Yang [EMAIL PROTECTED]:
 Andrew Zhang wrote:
 
  Mikhail Loenko wrote:
  2006/4/26, Andrew Zhang [EMAIL PROTECTED]:
  Vladimir Strigun wrote:
  On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:
  I try to find best solution for Harmony-166 and during fix preparation
  I've found current issue.
 
  Method test_read from tests.api.java.io.InputStreamReaderTest failed
  and first fix I've created was with in.available() but I agree with
  Mikhail that it's possible not the best solution.
  why?
 
  Yes, the spec says The available method for class InputStream always
  returns 0.. But it also says This method should be overridden by
  subclasses.. :)
 
  Here's the available description:
  Returns the number of bytes that can be read (or skipped over) from
  this input stream without blocking by the next caller of a method for
  this input stream.
 
  If someone writes a subclass of InputStream that available returns 0
  even if there are some data available, then he should take the
  result of
  cheating or contradicting with spec.
 
 
  What if someone just unable to say if the bytes are available?
  For example, he reads something from a hardware module and
  the only way to know whether there are bytes is try reading?
 
 
  I'm a little confused.
 
  How does solution 2 handle this problem?
 I agree with Mikhail that available is not reliable.  You can judge the
 stream end has been reached if and only if you got -1 returned from read
 methods.

 I did a quick view of Harmony-166 and patches(seems it's the root of why
 Harmony-410 is raised), I think there should be some easier way to fix
 this bug,  while the CharsetDecoder doesn't need to be modified, pls.
 see below for details.

 The fillBuf() of InputStreamReader should be looks like:

private void fillBuf() throws IOException {
chars.clear();
int read = 0;
   // if we haven't reached the end of stream, and cannot decode
 even one character,
  // go on to read and decode
while(read != -1  chars.position() == 0){
try {
read = in.read(bytes.array());
} catch (IOException e) {
chars.limit(0);
throw e;
}
boolean endOfInput = false;
if(read == -1){
//if we have reached the end of stream, try the last
 time to decode
bytes.limit(0);
endOfInput = true;
}else{
//if we read some bytes, try to decode them
bytes.limit(read);
read = 0;
}
decoder.decode(bytes, chars, endOfInput);
bytes.clear();
}
chars.flip();
}

 I've got a pass to run Vladimir's test case for Harmony-166.

 If no one objections, I'll attach patch based on this.

+1 from me



 comments?
 
  Thanks,
  Mikhail
 
  In current implementation of InputStreamReader endOfInput variable
  sets to true if reader can't read less that 8192 bytes. When
  InputStreamReader try to read one character from
  LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
  a single byte per read) true as a parameter passed to charset decoder,
  nevertheless we still have further input from
  LimitedByteArrayInputStream.
 
  2 methods from tests.api.java.io.OutputStreamWriterTest also failed
  because of read() method implementation of InputStreamReader.
 
  IMO, we have 2 ways to fix Harmony-166 :
 
  1. Don't change the behavior of CharsetDecoder, and use in.available()
  method for endOfInput variable. If in.available()  0 pass false to
  decoder, read additional one byte from input stream and try to decode
  again. If decoding operation can decode remaining portion + one
  additional byte to character, stop here, else try to read addition
  byte again. Function fillBuf will invoke with the next invocation of
  read() method and next portion of 8192 bytes will be read from input
  stream.
 
  It's reasonable.
  After all, streaming decode is useful and helpful to users.
  IMO, streaming decode is a highlight compared with some RI :)
 
  2. Change the behavior of CharsetDecoder and operate with
  bytes.remaining() for calculating endOfInput. In this case, we always
  pass false with first invocation of decode method. Algorithm further
  is almost the same as above, but we stop the cycle if all bytes
  decoded successfully (i.e if bytes.hasRemaining() is false).
 
  Vladimir, Could you please give the detail code or pseudo code of
  these two solutions?
 
  Thanks!
 
 
  What do you think about it?
 
 
  Thanks.
  Vladimir.
  Thanks!
 
 
 
  -
  Terms of use : http://incubator.apache.org/harmony/mailing.html
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 
 


 --
 Paulex Yang
 China Software Development Lab
 IBM



 

Re: [classlib] charset decoding

2006-04-26 Thread Paulex Yang

Vladimir Strigun wrote:

On 4/26/06, Paulex Yang [EMAIL PROTECTED] wrote:
  

Andrew Zhang wrote:


Mikhail Loenko wrote:
  

2006/4/26, Andrew Zhang [EMAIL PROTECTED]:


Vladimir Strigun wrote:
  

On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote:
I try to find best solution for Harmony-166 and during fix preparation
I've found current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest failed
and first fix I've created was with in.available() but I agree with
Mikhail that it's possible not the best solution.


why?

Yes, the spec says The available method for class InputStream always
returns 0.. But it also says This method should be overridden by
subclasses.. :)

Here's the available description:
Returns the number of bytes that can be read (or skipped over) from
this input stream without blocking by the next caller of a method for
this input stream.

If someone writes a subclass of InputStream that available returns 0
even if there are some data available, then he should take the
result of
cheating or contradicting with spec.
  

What if someone just unable to say if the bytes are available?
For example, he reads something from a hardware module and
the only way to know whether there are bytes is try reading?



I'm a little confused.

How does solution 2 handle this problem?
  

I agree with Mikhail that available is not reliable.  You can judge the
stream end has been reached if and only if you got -1 returned from read
methods.

I did a quick view of Harmony-166 and patches(seems it's the root of why
Harmony-410 is raised), I think there should be some easier way to fix
this bug,  while the CharsetDecoder doesn't need to be modified, pls.
see below for details.

The fillBuf() of InputStreamReader should be looks like:

   private void fillBuf() throws IOException {
   chars.clear();
   int read = 0;
  // if we haven't reached the end of stream, and cannot decode
even one character,
 // go on to read and decode
   while(read != -1  chars.position() == 0){
   try {
   read = in.read(bytes.array());
   } catch (IOException e) {
   chars.limit(0);
   throw e;
   }
   boolean endOfInput = false;
   if(read == -1){
   //if we have reached the end of stream, try the last
time to decode
   bytes.limit(0);
   endOfInput = true;
   }else{
   //if we read some bytes, try to decode them
   bytes.limit(read);
   read = 0;
   }
   decoder.decode(bytes, chars, endOfInput);
   bytes.clear();
   }
   chars.flip();
   }

I've got a pass to run Vladimir's test case for Harmony-166.

If no one objections, I'll attach patch based on this. comments?



Paulex, unfortunately I can't agree with your patch.
InputStreamReaderTest passed, but OutputStreamWriterTest still failed.
Could you please try to run test I have mentioned?
  
I see, the OutputStreamWriterTest does fail, seems there's some problem 
for CharsetDecoder to handling UTF-8 byte stream, but the 
InputStreamReader itself should work because it pass all tests for other 
decoding. I'll go on to study it.


And I also realized there is a bug in my former proposal - it cannot 
support the non-blocking InputStream, so the revisited version is as below:

   private void fillBuf() throws IOException {
   chars.clear();
   int read = 0;
   do{
   try {
   read = in.read(bytes.array());
   } catch (IOException e) {
   chars.limit(0);
   throw e;
   }
   boolean endOfInput = false;
   if(read == -1){
   bytes.limit(0);
   endOfInput = true;
   }else{
   bytes.limit(read);
   }
   decoder.decode(bytes, chars, endOfInput);
   bytes.clear();
   }while(read  0  chars.position() == 0);
//the main difference with prior version is to check read0 instead of 
read != -1, so that the InputStreamReader based on non-blocking 
InputStream can return immediatelly

   chars.flip();
   }

Thanks.
Vladimir.

  

Thanks,
Mikhail



In current implementation of InputStreamReader endOfInput variable
sets to true if reader can't read less that 8192 bytes. When
InputStreamReader try to read one character from
LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
a single byte per read) true as a parameter passed to charset decoder,
nevertheless we still have further input from
LimitedByteArrayInputStream.

2 methods from tests.api.java.io.OutputStreamWriterTest also failed
because of read() method implementation of InputStreamReader.

IMO, we have 2 ways to fix Harmony-166 :

1. Don't change the behavior of CharsetDecoder, and use in.available()
method for endOfInput variable. If in.available()  0 pass false to
decoder, read