[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #993: PARQUET-2184: Improve the allocation behavior of SnappyCompressor

2022-09-27 Thread GitBox


shangxinli commented on code in PR #993:
URL: https://github.com/apache/parquet-mr/pull/993#discussion_r981781086


##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/SnappyCompressor.java:
##
@@ -96,21 +100,40 @@ public synchronized void setInput(byte[] buffer, int off, int len) {
 "Output buffer should be empty. Caller must call compress()");
 
 if (inputBuffer.capacity() - inputBuffer.position() < len) {
-  ByteBuffer tmp = ByteBuffer.allocateDirect(inputBuffer.position() + len);
-  inputBuffer.rewind();
-  tmp.put(inputBuffer);
-  ByteBuffer oldBuffer = inputBuffer;
-  inputBuffer = tmp;
-  CleanUtil.cleanDirectBuffer(oldBuffer);
-} else {
-  inputBuffer.limit(inputBuffer.position() + len);
+  resizeInputBuffer(inputBuffer.position() + len);
 }
 
+inputBuffer.limit(inputBuffer.position() + len);

Review Comment:
   The original code does not call `limit()` when `inputBuffer.capacity() - inputBuffer.position() < len` is true; with this change it is called on both paths.
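   For context on the behavior being questioned: a freshly allocated `ByteBuffer` has its limit equal to its capacity, so when the replacement buffer is sized to exactly `position + len` (as the removed lines did) an explicit `limit()` call changes nothing. It only matters if the resize allocates more than requested. The sketch below illustrates this under stated assumptions: `grow` is a hypothetical helper, not the PR's `resizeInputBuffer`, and it uses heap buffers instead of direct buffers to stay simple.

```java
import java.nio.ByteBuffer;

class LimitAfterResizeDemo {
  // Hypothetical helper (not from the PR): copy the bytes written so far
  // into a new buffer with the requested capacity.
  static ByteBuffer grow(ByteBuffer old, int newCapacity) {
    ByteBuffer tmp = ByteBuffer.allocate(newCapacity);
    ByteBuffer src = old.duplicate(); // leave the caller's position/limit untouched
    src.flip();                       // limit = bytes written, position = 0
    tmp.put(src);                     // tmp.position() now equals bytes written
    return tmp;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(4);
    buf.put(new byte[] {1, 2, 3});
    int len = 5;

    // Exact-size growth: limit already equals position + len.
    ByteBuffer exact = grow(buf, buf.position() + len);
    System.out.println(exact.limit() == exact.position() + len);

    // Over-allocation: limit equals capacity, so it must be set explicitly.
    ByteBuffer larger = grow(buf, 16);
    System.out.println(larger.limit() == larger.position() + len);
    larger.limit(larger.position() + len);
    System.out.println(larger.limit());
  }
}
```

   If the new resize method over-allocates, the unconditional `limit()` call is what keeps the readable window at `position + len`.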



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #993: PARQUET-2184: Improve the allocation behavior of SnappyCompressor

2022-09-27 Thread GitBox


shangxinli commented on code in PR #993:
URL: https://github.com/apache/parquet-mr/pull/993#discussion_r981762161


##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/SnappyCompressor.java:
##
@@ -32,6 +32,10 @@
  * entire input in setInput and compresses it as one compressed block.
  */
 public class SnappyCompressor implements Compressor {
+  // Double up to an 8 mb write buffer,  then switch to 1MB linear allocation
+  private static final int DOUBLING_ALLOC_THRESH =  8 << 20;

Review Comment:
   Writing this as `1 << 23` would be more meaningful (it is the same value as `8 << 20`).
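   As a quick check of the two spellings, and a rough sketch of the policy the code comment describes (double up to 8 MB, then grow by 1 MB steps): only `DOUBLING_ALLOC_THRESH` comes from the diff. `LINEAR_ALLOC_STEP` and `nextCapacity` are hypothetical names used for illustration, and the PR's actual logic may differ.

```java
class GrowthPolicyDemo {
  // From the diff; 1 << 23 and 8 << 20 are both 8388608 (8 MiB).
  static final int DOUBLING_ALLOC_THRESH = 1 << 23;
  // Hypothetical name for the 1 MB linear step mentioned in the comment.
  static final int LINEAR_ALLOC_STEP = 1 << 20;

  // Hypothetical sketch: grow from `current` until `required` bytes fit,
  // doubling below the threshold and adding 1 MiB steps at or above it.
  static int nextCapacity(int current, int required) {
    long cap = Math.max(current, 1); // long avoids int overflow while growing
    while (cap < required) {
      cap = cap < DOUBLING_ALLOC_THRESH ? cap * 2 : cap + LINEAR_ALLOC_STEP;
    }
    return (int) Math.min(cap, Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println((8 << 20) == (1 << 23));
    System.out.println(nextCapacity(4, 5));
    System.out.println(nextCapacity(1 << 23, (1 << 23) + 1));
  }
}
```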


