Re: [classlib] charset decoding
Paulex Yang wrote: Vladimir Strigun wrote: On 4/26/06, Paulex Yang [EMAIL PROTECTED] wrote: Andrew Zhang wrote: Mikhail Loenko wrote: 2006/4/26, Andrew Zhang [EMAIL PROTECTED]: Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. What if someone just unable to say if the bytes are available? For example, he reads something from a hardware module and the only way to know whether there are bytes is try reading? I'm a little confused. How does solution 2 handle this problem? I agree with Mikhail that available is not reliable. You can judge the stream end has been reached if and only if you got -1 returned from read methods. I did a quick view of Harmony-166 and patches(seems it's the root of why Harmony-410 is raised), I think there should be some easier way to fix this bug, while the CharsetDecoder doesn't need to be modified, pls. see below for details. The fillBuf() of InputStreamReader should be looks like: private void fillBuf() throws IOException { chars.clear(); int read = 0; // if we haven't reached the end of stream, and cannot decode even one character, // go on to read and decode while(read != -1 chars.position() == 0){ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ //if we have reached the end of stream, try the last time to decode bytes.limit(0); endOfInput = true; }else{ //if we read some bytes, try to decode them bytes.limit(read); read = 0; } decoder.decode(bytes, chars, endOfInput); bytes.clear(); } chars.flip(); } I've got a pass to run Vladimir's test case for Harmony-166. If no one objections, I'll attach patch based on this. comments? Paulex, unfortunately I can't agree with your patch. InputStreamReaderTest passed, but OutputStreamWriterTest still failed. Could you please try to run test I have mentioned? I see, the OutputStreamWriterTest does fail, seems there's some problem for CharsetDecoder to handling UTF-8 byte stream, but the InputStreamReader itself should work because it pass all tests for other decoding. I'll go on to study it. And I also realized there is a bug in my former proposal - it cannot support the non-blocking InputStream, so the revisited version is as below: private void fillBuf() throws IOException { chars.clear(); int read = 0; do{ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ bytes.limit(0); endOfInput = true; }else{ bytes.limit(read); } decoder.decode(bytes, chars, endOfInput); bytes.clear(); }while(read 0 chars.position() == 0); //the main difference with prior version is to check read0 instead of read != -1, so that the InputStreamReader based on non-blocking InputStream can return immediatelly chars.flip(); } Thanks. Vladimir. Thanks, Mikhail In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available()
Re: [classlib] charset decoding
Richard Liang wrote: Paulex Yang wrote: I see, the OutputStreamWriterTest does fail, seems there's some problem for CharsetDecoder to handling UTF-8 byte stream, but the InputStreamReader itself should work because it pass all tests for other decoding. I'll go on to study it. And I also realized there is a bug in my former proposal - it cannot support the non-blocking InputStream, so the revisited version is as below: private void fillBuf() throws IOException { chars.clear(); int read = 0; do{ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ bytes.limit(0); endOfInput = true; }else{ bytes.limit(read); } decoder.decode(bytes, chars, endOfInput); bytes.clear(); }while(read 0 chars.position() == 0); //the main difference with prior version is to check read0 instead of read != -1, so that the InputStreamReader based on non-blocking InputStream can return immediatelly chars.flip(); } Yes. It IS a bug of ICU4JNI. I have submitted a bug [1] for ICU and have proposed a fix for it. [1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5180 Great! Thank you, Richard. Vladimir, how about I attach patch for InputStreamReader to Harmony-166 at first? At least, it can make InputStreamReaderTest pass. Richard, have you tried the OutputStreamWriterTest with your patch for ICU and my modification to InputStreamReader? -- Paulex Yang China Software Development Lab IBM - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[classlib] charset decoding (was: [Fwd: [jira] Commented: (HARMONY-410) method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer])
Looks like a conversation best conducted on the dev list... Original Message Subject: [jira] Commented: (HARMONY-410) method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer Date: Wed, 26 Apr 2006 08:25:03 + (GMT+00:00) From: Vladimir Strigun (JIRA) [EMAIL PROTECTED] Reply-To: harmony-dev@incubator.apache.org To: [EMAIL PROTECTED] [ http://issues.apache.org/jira/browse/HARMONY-410?page=comments#action_12376425 ] Vladimir Strigun commented on HARMONY-410: -- Hi Richard, Thanks for the clarification, I agree that streaming decode is good thing, but I'd like to explain my understanding of specification :) According to specification of CharsetDecoder decoding operation should process by the following steps: 2. Invoke the decode method zero or more times, as long as additional input may be available, passing false for the endOfInput argument and filling the input buffer and flushing the output buffer between invocations; 3. Invoke the decode method one final time, passing true for the endOfInput argument; and then spec also says: The buffers' positions will be advanced to reflect the bytes read and the characters written, but their marks and limits will not be modified I understand these sentences in the next way: invoke decode with endOfInput = false if you have additional input, then fill the buffer (i.e. add to buffer some additional input), invoke decode again and pass correct endOfInput parameter dependent of availability of input. Example you provided is very useful and, of course, 1st option looks better, but what I'm suggest here is to reflect actual processed bytes in input. After first invocation of decode, not all bytes processed actually, i.e. almost all bytes processed, but some stored in decoder to further operation. Is it possible to store bytes in decoder, support streaming decoding, and, at the same time sets correct position in input buffer after each operation? Thanks. Vladimir. method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer Key: HARMONY-410 URL: http://issues.apache.org/jira/browse/HARMONY-410 Project: Harmony Type: Bug Components: Classlib Reporter: Vladimir Strigun Assignee: Mikhail Loenko Priority: Minor Attachments: Harmony-410_patch.txt, harmony-410_test.txt When ByteBuffer contain incomplete sequence of bytes for successful decoding, position in ByteBuffer should be set to latest successful byte. I will attach testcase and patch soon. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira -- Tim Ellison ([EMAIL PROTECTED]) IBM Java technology centre, UK. - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[classlib] charset decoding
On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: Hi Vladimir, I think we both understand the spec :) Here I propose two solutions: 1. Set the ByteBuffer to the limit, and store the remaining ByteBuffer in decoder. Invokers doesn't care the position of in. If the result is UNDERFLOW and there're furthur input, just pass the new input to the decoder. It's a typical streaming decoder. That's what Harmony does currently. 2. Decoder doesn't store remaining ByteBuffer. Position of in is set to indicate the remaining ByteBuffer. Invoker should take care to generate new input ByteBuffer for next invocation. RI acts in this way. Both are acceptable. However, I think solution 1 is more reasonable. Is it possible to store bytes in decoder, support streaming decoding, and, at the same time sets correct position in input buffer after each operation? Yes. In fact, your patch will make Harmony act as the description above. However, I don't think it solve the problem. Maybe invoker is more confusable and may think: Is the remaining bytebuffer maintained in decoder or in ? Shall I get the remaining buffer from in and pass it for next invocation? Anyway, we need a decision on this compatibility issue. By the way, Vladimir, does solution one cause any problem on other classlib implementation? I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read additional one byte from input stream and try to decode again. If decoding operation can decode remaining portion + one additional byte to character, stop here, else try to read addition byte again. Function fillBuf will invoke with the next invocation of read() method and next portion of 8192 bytes will be read from input stream. 2. Change the behavior of CharsetDecoder and operate with bytes.remaining() for calculating endOfInput. In this case, we always pass false with first invocation of decode method. Algorithm further is almost the same as above, but we stop the cycle if all bytes decoded successfully (i.e if bytes.hasRemaining() is false). What do you think about it? Thanks. Vladimir. Any comment? Thanks ! Vladimir Strigun commented on HARMONY-410: -- Hi Richard, Thanks for the clarification, I agree that streaming decode is good thing, but I'd like to explain my understanding of specification :) According to specification of CharsetDecoder decoding operation should process by the following steps: 2. Invoke the decode method zero or more times, as long as additional input may be available, passing false for the endOfInput argument and filling the input buffer and flushing the output buffer between invocations; 3. Invoke the decode method one final time, passing true for the endOfInput argument; and then spec also says: The buffers' positions will be advanced to reflect the bytes read and the characters written, but their marks and limits will not be modified I understand these sentences in the next way: invoke decode with endOfInput = false if you have additional input, then fill the buffer (i.e. add to buffer some additional input), invoke decode again and pass correct endOfInput parameter dependent of availability of input. Example you provided is very useful and, of course, 1st option looks better, but what I'm suggest here is to reflect actual processed bytes in input. After first invocation of decode, not all bytes processed actually, i.e. almost all bytes processed, but some stored in decoder to further operation. Is it possible to store bytes in decoder, support streaming decoding, and, at the same time sets correct position in input buffer after each operation? Thanks. Vladimir. method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer Key: HARMONY-410 URL: http://issues.apache.org/jira/browse/HARMONY-410 Project: Harmony Type: Bug
Re: [classlib] charset decoding (was: [Fwd: [jira] Commented: (HARMONY-410) method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer])
Tim Ellison wrote: Looks like a conversation best conducted on the dev list... I have to admit I actually laughed out loud. Thx :) geir - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [classlib] charset decoding
Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read additional one byte from input stream and try to decode again. If decoding operation can decode remaining portion + one additional byte to character, stop here, else try to read addition byte again. Function fillBuf will invoke with the next invocation of read() method and next portion of 8192 bytes will be read from input stream. It's reasonable. After all, streaming decode is useful and helpful to users. IMO, streaming decode is a highlight compared with some RI :) 2. Change the behavior of CharsetDecoder and operate with bytes.remaining() for calculating endOfInput. In this case, we always pass false with first invocation of decode method. Algorithm further is almost the same as above, but we stop the cycle if all bytes decoded successfully (i.e if bytes.hasRemaining() is false). What do you think about it? Thanks. Vladimir. Thanks! -- Andrew Zhang China Software Development Lab, IBM Any comment? Thanks ! Vladimir Strigun commented on HARMONY-410: -- Hi Richard, Thanks for the clarification, I agree that streaming decode is good thing, but I'd like to explain my understanding of specification :) According to specification of CharsetDecoder decoding operation should process by the following steps: 2. Invoke the decode method zero or more times, as long as additional input may be available, passing false for the endOfInput argument and filling the input buffer and flushing the output buffer between invocations; 3. Invoke the decode method one final time, passing true for the endOfInput argument; and then spec also says: The buffers' positions will be advanced to reflect the bytes read and the characters written, but their marks and limits will not be modified I understand these sentences in the next way: invoke decode with endOfInput = false if you have additional input, then fill the buffer (i.e. add to buffer some additional input), invoke decode again and pass correct endOfInput parameter dependent of availability of input. Example you provided is very useful and, of course, 1st option looks better, but what I'm suggest here is to reflect actual processed bytes in input. After first invocation of decode, not all bytes processed actually, i.e. almost all bytes processed, but some stored in decoder to further operation. Is it possible to store bytes in decoder, support streaming decoding, and, at the same time sets correct position in input buffer after each operation? Thanks. Vladimir. method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer Key: HARMONY-410 URL: http://issues.apache.org/jira/browse/HARMONY-410 Project: Harmony Type: Bug Components: Classlib Reporter: Vladimir Strigun Assignee: Mikhail Loenko Priority: Minor Attachments: Harmony-410_patch.txt, harmony-410_test.txt When ByteBuffer contain incomplete sequence of bytes for successful decoding, position in ByteBuffer should be set to latest successful byte. I will attach testcase and patch soon. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see:
Re: [classlib] charset decoding
2006/4/26, Andrew Zhang [EMAIL PROTECTED]: Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. What if someone just unable to say if the bytes are available? For example, he reads something from a hardware module and the only way to know whether there are bytes is try reading? Thanks, Mikhail In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read additional one byte from input stream and try to decode again. If decoding operation can decode remaining portion + one additional byte to character, stop here, else try to read addition byte again. Function fillBuf will invoke with the next invocation of read() method and next portion of 8192 bytes will be read from input stream. It's reasonable. After all, streaming decode is useful and helpful to users. IMO, streaming decode is a highlight compared with some RI :) 2. Change the behavior of CharsetDecoder and operate with bytes.remaining() for calculating endOfInput. In this case, we always pass false with first invocation of decode method. Algorithm further is almost the same as above, but we stop the cycle if all bytes decoded successfully (i.e if bytes.hasRemaining() is false). What do you think about it? Thanks. Vladimir. Thanks! -- Andrew Zhang China Software Development Lab, IBM Any comment? Thanks ! Vladimir Strigun commented on HARMONY-410: -- Hi Richard, Thanks for the clarification, I agree that streaming decode is good thing, but I'd like to explain my understanding of specification :) According to specification of CharsetDecoder decoding operation should process by the following steps: 2. Invoke the decode method zero or more times, as long as additional input may be available, passing false for the endOfInput argument and filling the input buffer and flushing the output buffer between invocations; 3. Invoke the decode method one final time, passing true for the endOfInput argument; and then spec also says: The buffers' positions will be advanced to reflect the bytes read and the characters written, but their marks and limits will not be modified I understand these sentences in the next way: invoke decode with endOfInput = false if you have additional input, then fill the buffer (i.e. add to buffer some additional input), invoke decode again and pass correct endOfInput parameter dependent of availability of input. Example you provided is very useful and, of course, 1st option looks better, but what I'm suggest here is to reflect actual processed bytes in input. After first invocation of decode, not all bytes processed actually, i.e. almost all bytes processed, but some stored in decoder to further operation. Is it possible to store bytes in decoder, support streaming decoding, and, at the same time sets correct position in input buffer after each operation? Thanks. Vladimir. method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer Key: HARMONY-410 URL: http://issues.apache.org/jira/browse/HARMONY-410 Project: Harmony Type: Bug Components: Classlib Reporter: Vladimir Strigun Assignee: Mikhail Loenko Priority: Minor Attachments: Harmony-410_patch.txt,
Re: [classlib] charset decoding
Mikhail Loenko wrote: 2006/4/26, Andrew Zhang [EMAIL PROTECTED]: Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. What if someone just unable to say if the bytes are available? For example, he reads something from a hardware module and the only way to know whether there are bytes is try reading? I'm a little confused. How does solution 2 handle this problem? Thanks, Mikhail In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read additional one byte from input stream and try to decode again. If decoding operation can decode remaining portion + one additional byte to character, stop here, else try to read addition byte again. Function fillBuf will invoke with the next invocation of read() method and next portion of 8192 bytes will be read from input stream. It's reasonable. After all, streaming decode is useful and helpful to users. IMO, streaming decode is a highlight compared with some RI :) 2. Change the behavior of CharsetDecoder and operate with bytes.remaining() for calculating endOfInput. In this case, we always pass false with first invocation of decode method. Algorithm further is almost the same as above, but we stop the cycle if all bytes decoded successfully (i.e if bytes.hasRemaining() is false). Vladimir, Could you please give the detail code or pseudo code of these two solutions? Thanks! What do you think about it? Thanks. Vladimir. Thanks! - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [classlib] charset decoding
Andrew Zhang wrote: Mikhail Loenko wrote: 2006/4/26, Andrew Zhang [EMAIL PROTECTED]: Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. What if someone just unable to say if the bytes are available? For example, he reads something from a hardware module and the only way to know whether there are bytes is try reading? I'm a little confused. How does solution 2 handle this problem? I agree with Mikhail that available is not reliable. You can judge the stream end has been reached if and only if you got -1 returned from read methods. I did a quick view of Harmony-166 and patches(seems it's the root of why Harmony-410 is raised), I think there should be some easier way to fix this bug, while the CharsetDecoder doesn't need to be modified, pls. see below for details. The fillBuf() of InputStreamReader should be looks like: private void fillBuf() throws IOException { chars.clear(); int read = 0; // if we haven't reached the end of stream, and cannot decode even one character, // go on to read and decode while(read != -1 chars.position() == 0){ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ //if we have reached the end of stream, try the last time to decode bytes.limit(0); endOfInput = true; }else{ //if we read some bytes, try to decode them bytes.limit(read); read = 0; } decoder.decode(bytes, chars, endOfInput); bytes.clear(); } chars.flip(); } I've got a pass to run Vladimir's test case for Harmony-166. If no one objections, I'll attach patch based on this. comments? Thanks, Mikhail In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read additional one byte from input stream and try to decode again. If decoding operation can decode remaining portion + one additional byte to character, stop here, else try to read addition byte again. Function fillBuf will invoke with the next invocation of read() method and next portion of 8192 bytes will be read from input stream. It's reasonable. After all, streaming decode is useful and helpful to users. IMO, streaming decode is a highlight compared with some RI :) 2. Change the behavior of CharsetDecoder and operate with bytes.remaining() for calculating endOfInput. In this case, we always pass false with first invocation of decode method. Algorithm further is almost the same as above, but we stop the cycle if all bytes decoded successfully (i.e if bytes.hasRemaining() is false). Vladimir, Could you please give the detail code or pseudo code of these two solutions? Thanks! What do you think about it? Thanks. Vladimir. Thanks! - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Paulex Yang China Software Development Lab IBM - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [classlib] charset decoding
2006/4/26, Paulex Yang [EMAIL PROTECTED]: Andrew Zhang wrote: Mikhail Loenko wrote: 2006/4/26, Andrew Zhang [EMAIL PROTECTED]: Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. What if someone just unable to say if the bytes are available? For example, he reads something from a hardware module and the only way to know whether there are bytes is try reading? I'm a little confused. How does solution 2 handle this problem? I agree with Mikhail that available is not reliable. You can judge the stream end has been reached if and only if you got -1 returned from read methods. I did a quick view of Harmony-166 and patches(seems it's the root of why Harmony-410 is raised), I think there should be some easier way to fix this bug, while the CharsetDecoder doesn't need to be modified, pls. see below for details. The fillBuf() of InputStreamReader should be looks like: private void fillBuf() throws IOException { chars.clear(); int read = 0; // if we haven't reached the end of stream, and cannot decode even one character, // go on to read and decode while(read != -1 chars.position() == 0){ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ //if we have reached the end of stream, try the last time to decode bytes.limit(0); endOfInput = true; }else{ //if we read some bytes, try to decode them bytes.limit(read); read = 0; } decoder.decode(bytes, chars, endOfInput); bytes.clear(); } chars.flip(); } I've got a pass to run Vladimir's test case for Harmony-166. If no one objections, I'll attach patch based on this. +1 from me comments? Thanks, Mikhail In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read additional one byte from input stream and try to decode again. If decoding operation can decode remaining portion + one additional byte to character, stop here, else try to read addition byte again. Function fillBuf will invoke with the next invocation of read() method and next portion of 8192 bytes will be read from input stream. It's reasonable. After all, streaming decode is useful and helpful to users. IMO, streaming decode is a highlight compared with some RI :) 2. Change the behavior of CharsetDecoder and operate with bytes.remaining() for calculating endOfInput. In this case, we always pass false with first invocation of decode method. Algorithm further is almost the same as above, but we stop the cycle if all bytes decoded successfully (i.e if bytes.hasRemaining() is false). Vladimir, Could you please give the detail code or pseudo code of these two solutions? Thanks! What do you think about it? Thanks. Vladimir. Thanks! - Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Paulex Yang China Software Development Lab IBM
Re: [classlib] charset decoding
Vladimir Strigun wrote: On 4/26/06, Paulex Yang [EMAIL PROTECTED] wrote: Andrew Zhang wrote: Mikhail Loenko wrote: 2006/4/26, Andrew Zhang [EMAIL PROTECTED]: Vladimir Strigun wrote: On 4/26/06, Andrew Zhang [EMAIL PROTECTED] wrote: I try to find best solution for Harmony-166 and during fix preparation I've found current issue. Method test_read from tests.api.java.io.InputStreamReaderTest failed and first fix I've created was with in.available() but I agree with Mikhail that it's possible not the best solution. why? Yes, the spec says The available method for class InputStream always returns 0.. But it also says This method should be overridden by subclasses.. :) Here's the available description: Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. If someone writes a subclass of InputStream that available returns 0 even if there are some data available, then he should take the result of cheating or contradicting with spec. What if someone just unable to say if the bytes are available? For example, he reads something from a hardware module and the only way to know whether there are bytes is try reading? I'm a little confused. How does solution 2 handle this problem? I agree with Mikhail that available is not reliable. You can judge the stream end has been reached if and only if you got -1 returned from read methods. I did a quick view of Harmony-166 and patches(seems it's the root of why Harmony-410 is raised), I think there should be some easier way to fix this bug, while the CharsetDecoder doesn't need to be modified, pls. see below for details. The fillBuf() of InputStreamReader should be looks like: private void fillBuf() throws IOException { chars.clear(); int read = 0; // if we haven't reached the end of stream, and cannot decode even one character, // go on to read and decode while(read != -1 chars.position() == 0){ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ //if we have reached the end of stream, try the last time to decode bytes.limit(0); endOfInput = true; }else{ //if we read some bytes, try to decode them bytes.limit(read); read = 0; } decoder.decode(bytes, chars, endOfInput); bytes.clear(); } chars.flip(); } I've got a pass to run Vladimir's test case for Harmony-166. If no one objections, I'll attach patch based on this. comments? Paulex, unfortunately I can't agree with your patch. InputStreamReaderTest passed, but OutputStreamWriterTest still failed. Could you please try to run test I have mentioned? I see, the OutputStreamWriterTest does fail, seems there's some problem for CharsetDecoder to handling UTF-8 byte stream, but the InputStreamReader itself should work because it pass all tests for other decoding. I'll go on to study it. And I also realized there is a bug in my former proposal - it cannot support the non-blocking InputStream, so the revisited version is as below: private void fillBuf() throws IOException { chars.clear(); int read = 0; do{ try { read = in.read(bytes.array()); } catch (IOException e) { chars.limit(0); throw e; } boolean endOfInput = false; if(read == -1){ bytes.limit(0); endOfInput = true; }else{ bytes.limit(read); } decoder.decode(bytes, chars, endOfInput); bytes.clear(); }while(read 0 chars.position() == 0); //the main difference with prior version is to check read0 instead of read != -1, so that the InputStreamReader based on non-blocking InputStream can return immediatelly chars.flip(); } Thanks. Vladimir. Thanks, Mikhail In current implementation of InputStreamReader endOfInput variable sets to true if reader can't read less that 8192 bytes. When InputStreamReader try to read one character from LimitedByteArrayInputStream (A ByteArrayInputStream that only returns a single byte per read) true as a parameter passed to charset decoder, nevertheless we still have further input from LimitedByteArrayInputStream. 2 methods from tests.api.java.io.OutputStreamWriterTest also failed because of read() method implementation of InputStreamReader. IMO, we have 2 ways to fix Harmony-166 : 1. Don't change the behavior of CharsetDecoder, and use in.available() method for endOfInput variable. If in.available() 0 pass false to decoder, read