wgtmac commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r1020313641
########## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ########## @@ -157,4 +165,80 @@ public void reset() throws IOException { public boolean markSupported() { return delegate.markSupported(); } + + public boolean readBoolean() throws IOException { + return readByte() != 0; + } + + public byte readByte() throws IOException { + return delegate.readByte(); + } + + public int readUnsignedByte() throws IOException { + return delegate.readUnsignedByte(); + } + + public short readShort() throws IOException { Review Comment: I like the idea to provide these read functions to enable larger read. BTW, is there any use case to read a batch of shorts (and other numeric types)? ########## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ########## @@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) { } @Override - public int read(byte[] bytes) { - return read(bytes, 0, bytes.length); + public void readFully(byte[] bytes, int off, int len) throws IOException { + if (len <= 0) { + if (len < 0) { + throw new IndexOutOfBoundsException("Read length must be greater than 0: " + len); + } + + return; + } + + if (current == null || len > length) { + throw new EOFException(); + } + + int bytesRead = 0; + while (bytesRead < len) { Review Comment: > Well, my objective here is to maximize performance. So we have to decide between maintainability and performance. Let's deliberate over this a bit more, and I'll do what you think is best. I deeply understand this is a difficult trade-off. Do you have any evidence on the performance penalty if we wrap `read` and `readFully` methods to share some common logic? If the penalty is acceptable, we should definitely go for maintainability. ########## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ########## @@ -88,6 +136,21 @@ public long skip(long n) { return bytesToSkip; } + @Override + public void skipFully(long n) throws IOException { + if (n < 0 || n > Integer.MAX_VALUE) { + throw new IllegalArgumentException(); + } + + try { + buffer.position(buffer.position() + (int)n); + } catch (IllegalArgumentException e) { Review Comment: > A try/catch is basically free when no exception is thrown, while a check is not. I have tested this, and the try/catch is empirically faster, since the no-exception case is the common case. Putting in the check means we have to wait until the C2 compiler produces a trace without the branch. But even then, there's always the overhead of a check somewhere to be able to fall back to interpretation if the condition is not correct for the trace. I'm avoiding all of that entirely, making this faster in of the most common case. That's interesting. A caveat is that try/catch may prohibit code optimization. But I doubt whether it makes significant difference on the simple buffer operation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org