Correction.
One needs to add the following line to the outer loop in readLine(),
directly after the read() call that refills the buffer has returned a
byte (that is, before lineStart is adjusted). Sorry.
pos--; // Roll back the byte that was returned by read()
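Here is a small probe that shows why the rollback is needed. When the buffer is exhausted, BufferedInputStream.read() refills it and then consumes the first new byte, so pos ends up one past a byte that readLine() still has to scan. PosProbe is a made-up subclass that only exposes the protected pos field. The 4-byte buffer is chosen to force a refill.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: exposes BufferedInputStream's protected pos field.
class PosProbe extends BufferedInputStream {
    PosProbe(InputStream in, int size) {
        super(in, size);
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        PosProbe probe = new PosProbe(new ByteArrayInputStream(data), 4);
        for (int i = 0; i < 4; i++) {
            probe.read(); // drain the first buffer load: "abcd"
        }
        int b = probe.read(); // buffer exhausted: refill with "efgh", return 'e'
        System.out.println((char) b + " at pos " + probe.position());
    }
}
```

This prints "e at pos 1". The refilled byte 'e' sits at index 0 but pos already points at index 1, so without `pos--` the line scan would skip it.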
-----Original Message-----
From: Igor Lubashev
Sent: Thursday, April 12, 2007 2:00 PM
To: 'HttpClient User Discussion'
Subject: RE: Performance issues in ChunkedInputStream
It looks like attachments are filtered out.
Here is the code inline.
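/**
 * This is not used, but it is a nice little class that I wrote for
 * Apache, and it might be useful some day.
 * NOTE: This has not been tested
 */
package com.foo.util;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

/**
 * A BufferedInputStream that can also return LF- or CRLF-terminated lines,
 * scanning for the terminator directly in the internal buffer.
 *
 * @author Igor Lubashev
 */
public class LineReaderInputStream extends BufferedInputStream {
    private static final int defaultBufferSize = 8192;
    private static final int defaultLineSizeLimit = 8192;

    private final int lineSizeLimit;

    public LineReaderInputStream(InputStream in) {
        this(in, defaultBufferSize, defaultLineSizeLimit);
    }

    public LineReaderInputStream(InputStream in, int size) {
        this(in, size, defaultLineSizeLimit);
    }

    public LineReaderInputStream(InputStream in, int size, int lineSizeLimit) {
        super(in, size);
        this.lineSizeLimit = lineSizeLimit;
    }

    /**
     * Reads one line and strips the trailing LF or CRLF. At end of stream,
     * returns the unterminated last line, or null if no bytes remain.
     * Throws IOException if the line outgrows the mark limit
     * (roughly lineSizeLimit bytes).
     */
    public synchronized String readLine(Charset charset) throws IOException {
        // Set (or extend) a mark so that fill() keeps the current line's
        // bytes in the buffer instead of discarding them.
        int savedMarkPos = markpos;
        int savedMarkLimit = marklimit;
        if (savedMarkPos >= 0) {
            marklimit = (int) Math.min((long) marklimit + lineSizeLimit, Integer.MAX_VALUE);
        } else {
            markpos = pos;
            marklimit = lineSizeLimit;
        }
        try {
            int lineStart = pos;
            while (true) {
                // Scan the buffered bytes for the LF terminator
                while (pos < count) {
                    if (buf[pos++] == '\n') {
                        // Strip a preceding CR only if it belongs to this line
                        int lineEnd = (pos - 2 >= lineStart && buf[pos - 2] == '\r')
                                ? pos - 2 : pos - 1;
                        return new String(buf, lineStart, lineEnd - lineStart, charset);
                    }
                }
                // Buffer exhausted: fill it with more data
                int prevPos = pos;
                int b = read();
                if (b >= 0) {
                    pos--; // Roll back the byte that was returned by read()
                }
                if (markpos < 0) {
                    // fill() dropped the mark, and the line's bytes with it
                    throw new IOException("Line exceeds " + lineSizeLimit + " bytes");
                }
                lineStart -= prevPos - pos; // Adjust for the moved buffer
                if (b < 0) {
                    // End of stream: return any unterminated last line
                    return (pos > lineStart)
                            ? new String(buf, lineStart, pos - lineStart, charset)
                            : null;
                }
            }
        } finally {
            // Restore the caller's mark. If one was already set, fill() has
            // relocated markpos itself, so only clear the mark we set here.
            if (savedMarkPos < 0) {
                markpos = -1;
            }
            marklimit = savedMarkLimit;
        }
    }
}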
-----Original Message-----
From: Igor Lubashev [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 12, 2007 1:41 PM
To: HttpClient User Discussion
Subject: RE: Performance issues in ChunkedInputStream
1. BufferedInputStream is working fine. I've looked at the source, and
it correctly tries to read data only when its internal buffer is
exhausted. Most read calls reference only the internal buffer. When
the data does get read from the underlying stream, it tries to read it
in large chunks. (Of course, if the underlying stream returns very
little data, it is a different problem.)
2. It is hard to believe that reading a byte at a time is a bottleneck,
but I've just quickly written a LineReaderInputStream, which is derived
from BufferedInputStream, so all the searching for CRLF/LF happens very
quickly internally. The source is attached.
Just call the readLine(Charset) method, and you'll get Strings out of the stream.
You can interleave all regular stream operations and readLine() calls.
However, if you wish to use readLine() *after* using the stream's read()
methods, make sure that you do not inadvertently pass this stream to
anything that is buffering the stream's data (or your strings may get
consumed via buffering).
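To illustrate the caveat, here is a sketch that uses a plain BufferedInputStream in place of LineReaderInputStream (OverBufferingDemo is a made-up name). Once a BufferedReader wraps the stream, it reads ahead, so bytes the caller never asked for disappear from the stream itself.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class OverBufferingDemo {
    // Reads one line through a BufferedReader, then asks the raw stream for its next byte.
    static int nextByteAfterOneLine(byte[] data) throws IOException {
        BufferedInputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(stream, StandardCharsets.US_ASCII));
        System.out.println("reader returned: " + reader.readLine());
        // For this small input the reader has already pulled the remaining
        // lines into its own buffers, so the stream reports end of stream.
        return stream.read();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("next byte from stream: " + nextByteAfterOneLine(data));
    }
}
```

The second line printed shows -1, even though "b" and "c" were never handed to the caller through the stream.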
- Igor
>>> I looked at the source for BufferedInputStream and it looks like
>>> it tries to fill the empty space in the buffer each time you read from
>>> it (for a socket connection it will read more than one packet of data)
>>> instead of just doing a single read from the underlying stream.
>>>
>>
>> Ok, then the byte-by-byte reading in CIS when parsing the chunk header
>> might well be the problem. If you want to fix that, you'll have to hack
>> deeply into CIS. Here is what I would do if I had no other choice:
>>
>> - extend CIS by a local byte array as a buffer (needs two extra ints
>> for cursor and fill size)
>> - change the chunk header parsing to read a bunch of bytes into the
>> buffer, then parsing from there
>> - change all read methods to return leftover bytes from the buffer
>> before calling a read on the underlying stream
>>
>> hope that helps,
>> Roland
>>
>Tony and Roland,
>
>I suspect rather strongly it is BufferedInputStream that needs fixing,
>not ChunkedInputStream.
>
>Oleg
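Roland's three steps quoted above might look roughly like this. It is a standalone sketch, not a patch to HttpClient's ChunkedInputStream: the class name ChunkHeaderBuffer, the 2048-byte buffer, and the simplified header grammar (hex size, optional ";extension", CRLF, with no overflow or length checks) are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the suggested CIS change; not HttpClient code.
class ChunkHeaderBuffer {
    private final InputStream in;
    private final byte[] buf = new byte[2048]; // local buffer (size is arbitrary)
    private int cursor = 0; // index of the next unread byte in buf
    private int fill = 0;   // number of valid bytes in buf

    ChunkHeaderBuffer(InputStream in) {
        this.in = in;
    }

    // Returns the next byte, refilling the local buffer with one bulk read when empty.
    private int nextByte() throws IOException {
        if (cursor >= fill) {
            int n = in.read(buf, 0, buf.length);
            if (n <= 0) {
                return -1;
            }
            cursor = 0;
            fill = n;
        }
        return buf[cursor++] & 0xff;
    }

    // Parses "<hex-size>[;extension]CRLF" from the local buffer and returns the size.
    int readChunkSize() throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = nextByte()) != '\n') {
            if (b < 0) {
                throw new EOFException("Unexpected end of stream in chunk header");
            }
            line.append((char) b);
        }
        String header = line.toString();
        int semicolon = header.indexOf(';');
        if (semicolon >= 0) {
            header = header.substring(0, semicolon); // drop chunk extensions
        }
        return Integer.parseInt(header.trim(), 16); // trim() also removes the CR
    }

    // Serves leftover buffered bytes first, then falls through to the underlying stream.
    int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (cursor < fill) {
            int n = Math.min(len, fill - cursor);
            System.arraycopy(buf, cursor, b, off, n);
            cursor += n;
            return n;
        }
        return in.read(b, off, len);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "1a;name=value\r\n".getBytes(StandardCharsets.US_ASCII);
        ChunkHeaderBuffer parser = new ChunkHeaderBuffer(new ByteArrayInputStream(wire));
        System.out.println(parser.readChunkSize()); // prints 26 (0x1a)
    }
}
```

The point of the design is that the header is parsed from bytes obtained by one bulk read, and any chunk data that arrived in the same read is handed out by read() before the underlying stream is touched again.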