[GitHub] [parquet-mr] sunchao commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-28 Thread GitBox


sunchao commented on code in PR #960:
URL: https://github.com/apache/parquet-mr/pull/960#discussion_r932674870


##
parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java:
##
@@ -88,6 +136,21 @@ public long skip(long n) {
 return bytesToSkip;
   }
 
+  @Override
+  public void skipFully(long n) throws IOException {
+if (n < 0 || n > Integer.MAX_VALUE) {
+  throw new IllegalArgumentException();

Review Comment:
   nit: can we add some useful message in the exception, something like 
"Invalid input for skipFully: {}" 



##
parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java:
##
@@ -38,6 +39,34 @@ class SingleBufferInputStream extends ByteBufferInputStream {
 // duplicate the buffer because its state will be modified
 this.buffer = buffer.duplicate();
 this.startPosition = buffer.position();
+this.buffer.order(java.nio.ByteOrder.LITTLE_ENDIAN);
+  }
+
+  SingleBufferInputStream(ByteBuffer buffer, int start, int length) {
+// duplicate the buffer because its state will be modified
+this.buffer = buffer.duplicate();
+this.startPosition = start;
+this.buffer.position(start);
+this.buffer.limit(start + length);
+this.buffer.order(java.nio.ByteOrder.LITTLE_ENDIAN);
+  }
+
+  SingleBufferInputStream(byte[] inBuf) {
+this.buffer = ByteBuffer.wrap(inBuf);
+this.buffer.order(java.nio.ByteOrder.LITTLE_ENDIAN);
+this.startPosition = 0;
+  }
+
+  SingleBufferInputStream(byte[] inBuf, int start, int length) {
+this.buffer = ByteBuffer.wrap(inBuf, start, length);
+this.buffer.order(java.nio.ByteOrder.LITTLE_ENDIAN);
+// This seems to be consistent with HeapByteBuffer.wrap(), which leaves
+// the internal "offset" at zero and sets the starting position at start.
+this.startPosition = 0;
+  }
+
+  SingleBufferInputStream(List inBufs) {

Review Comment:
   hmm why we need this constructor?



##
parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java:
##
@@ -70,9 +105,22 @@ public int read(byte[] bytes, int offset, int length) 
throws IOException {
 
 return bytesToRead;
   }
-  
+
+  @Override
+  public void readFully(byte[] bytes, int offset, int length) throws 
IOException {
+try {
+  buffer.get(bytes, offset, length);
+} catch (BufferUnderflowException|IndexOutOfBoundsException e) {

Review Comment:
   nit: space before & after `|`



##
parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java:
##
@@ -379,4 +427,120 @@ public void remove() {
   second.remove();
 }
   }
+
+  @Override
+  public byte readByte() throws IOException {
+return (byte)readUnsignedByte();
+  }
+
+  @Override
+  public int readUnsignedByte() throws IOException {

Review Comment:
   this looks the same as `read`



##
parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java:
##
@@ -88,6 +136,21 @@ public long skip(long n) {
 return bytesToSkip;
   }
 
+  @Override
+  public void skipFully(long n) throws IOException {
+if (n < 0 || n > Integer.MAX_VALUE) {
+  throw new IllegalArgumentException();
+}
+
+try {
+  buffer.position(buffer.position() + (int)n);
+} catch (IllegalArgumentException e) {

Review Comment:
   instead of try and catch, can we check if the new position is greater than 
the `buffer.limit` and throw EOF if so?



##
parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java:
##
@@ -174,4 +254,64 @@ public boolean markSupported() {
   public int available() {
 return buffer.remaining();
   }
+
+  @Override
+  public byte readByte() throws IOException {
+try {
+  return buffer.get();
+} catch (BufferUnderflowException e) {
+  throw new EOFException(e.getMessage());
+}
+  }
+
+  @Override
+  public int readUnsignedByte() throws IOException {
+try {
+  return buffer.get() & 0xFF;
+} catch (BufferUnderflowException e) {
+  throw new EOFException(e.getMessage());
+}
+  }
+
+  @Override
+  public short readShort() throws IOException {
+try {
+  return buffer.getShort();
+} catch (BufferUnderflowException e) {
+  throw new EOFException(e.getMessage());
+}
+  }
+
+  @Override
+  public int readUnsignedShort() throws IOException {
+try {
+  return buffer.getShort() & 0x;
+} catch (BufferUnderflowException e) {
+  throw new EOFException(e.getMessage());
+}
+  }
+
+  /*
+  Use ByteBuffer.getInt(), which takes advantage of platform intrinsics
+  */
+  @Override
+  public int readInt() throws IOException {
+try {
+  return buffer.getInt();
+} catch (BufferUnderflowException e) {
+  throw new EOFException(e.getMessage());
+}
+  }
+
+  /*
+  Use ByteBuffer.getLonmg(), which takes advantage of platform intrinsics

Review Comment:
   nit: typo




[jira] [Resolved] (PARQUET-2040) Uniform encryption

2022-07-28 Thread Gidon Gershinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gidon Gershinsky resolved PARQUET-2040.
---
Resolution: Fixed

> Uniform encryption
> --
>
> Key: PARQUET-2040
> URL: https://issues.apache.org/jira/browse/PARQUET-2040
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.12.0
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: 1.12.3
>
>
> PME low-level spec supports using the same encryption key for all columns, 
> which is useful in a number of scenarios. However, this feature is not 
> exposed yet in the high-level API, because its misuse can break the NIST 
> limit on the number of AES GCM operations with one key. We will develop a 
> limit-enforcing code and provide an API for uniform table encryption.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PARQUET-2136) File writer construction with encryptor

2022-07-28 Thread Gidon Gershinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gidon Gershinsky resolved PARQUET-2136.
---
Resolution: Fixed

> File writer construction with encryptor
> ---
>
> Key: PARQUET-2136
> URL: https://issues.apache.org/jira/browse/PARQUET-2136
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.12.2
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: 1.12.3
>
>
> Currently, a file writer object can be constructed with encryption 
> properties. We need an additional constructor, that can accept an encryptor 
> instead, in order to support lazy materialization of parquet file writers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)