Hi Sergey,

Are you taking into account that for many reads the data is not copied into the local buffer.
See the comments in BufferedInputStream.read1: 277:280?

How much is the slowdown, when BufferedInputStreams are chained?

Thanks, Roger

On 4/15/21 7:08 AM, Pavel Rappo wrote:
On 15 Apr 2021, at 08:33, Сергей Цыпанов <sergei.tsypa...@yandex.ru> wrote:

Hello,

buffering with j.i.BufferedInputStream is a handy way to improve performance of 
IO operations.
However in many cases buffering is redundant. Consider this code snippet from 
Spring Framework:

static ClassReader getClassReader(Resource rsc) throws Exception {
try (var is = new BufferedInputStream(rsc.getInputStream())) {
   return new ClassReader(is);
}
}

Interface Resource has lots of implementations some of them return buffered 
InputStream,
others don't, but all of them get wrapped with BufferedInputStream.

Apart from obvious misuses like BufferedInputStream cascade such wrapping is 
not necessary,
e.g. when we read a huge file using FileInputStream.readAllBytes(),
in others cases it's harmful e.g. when we read a small (comparing to the default
size of buffer in BufferedInputStream) file with readAllBytes() or
when we read from ByteArrayInputStream which is kind of buffered one by its 
nature.

I think an instance of InputStream should decide itself whether it requires 
buffering,
so I suggest to add a couple of methods into j.i.InputStream:

// subclasses can override this
protected boolean needsBuffer() {
return true;
}

public InputStream buffered() {
return needsBuffer() ? new BufferedInputStream(this) : this;
}

this allows subclasses of InputStream to override needsBuffer() to declare 
buffering redundancy.
Let's imagine we've overridden needsBuffer() in BufferedInputStream:

public class BufferedInputStream {
@Override
public boolean needsBuffer() {
   return true;
}
}

then the code we've mentioned above should be rewritten as

static ClassReader getClassReader(Resource rsc) throws Exception {
try (var is = rsc.getInputStream().buffered()) {
   return new ClassReader(is);
}
}

preventing cascade of BufferedInputStreams.

When I read this part

There are also cases when the need for buffering depends on the way how we read 
from InputStream:

new FileInputStream(file).buffered().readAllBytes() // buffering is redundant
my knee-jerk reaction was that a better solution likely lies with introducing a 
marker interface and selectively implementing it as opposed to adding two new 
methods to the existing class and selectively overriding them. Let's call this 
interface java.io.Buffered: Bufferred is to InputStream as RandomAccess is to 
List.

Just to be clear: I'm not proposing to actually do this. It's just a thought.

-Pavel

var data = new DataInputStream(new FileInputStream(file).buffered())
for (int i = 0; i < 1000; i++) {
  data.readInt();                          // readInt() does 4 calls to 
InputStream.read() so buffering is needed
}

here if FileInputStream.needsBuffer() is overridden and returns false (assuming 
most of reads from it are bulk)
then we won't have buffering for DataInputStream. To avoid this we can also
add InputStream.buffered(boolean enforceBuffering) to have manual control.

To sum up, my proposal is to add those methods to InputStream:

protected static boolean needsBuffer() {
return true;
}

public InputStream buffered() {
return buffered(needsBuffer());
}

public InputStream buffered(boolean enforceBuffering) {
return enforceBuffering ? new BufferedInputStream(this) : this;
}

What do you think?

With best regards,
Sergey Tsypanov

Reply via email to