[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742894#comment-15742894 ]
Jason Lowe commented on HADOOP-13578:
-------------------------------------

Thanks for updating the patch! Sorry for the delay in re-review, as I was travelling last week. The ASF license warnings appear to be unrelated, as does the unit test failure.

Comments on the patch:

IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE should be IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT.

The ZStandardCompressor(level, bufferSize) constructor should be implemented in terms of ZStandardCompressor(level, inputBufferSize, outputBufferSize) to reduce code redundancy and improve maintainability.

If inflateBytesDirect took a dest position then we wouldn't need to slice the destination buffer, avoiding a temporary object allocation.

I'm not sure about this snippet in ZStandardDirectDecompressor#decompress:
{code}
endOfInput = !src.hasRemaining();
super.finished = !src.hasRemaining();
{code}
With the super.finished change, the endOfInput and finished variables simply mirror each other, which would make endOfInput redundant. I think endOfInput is still necessary, and the finished variable handling is already taken care of by the JNI code. In other words, I don't believe we need the super.finished line that was added. However, maybe I'm missing something here.

I think we need to return after these THROW macros or we risk passing null pointers to the zstd library (a sketch of the fix follows at the end of this comment):
{code}
void * uncompressed_bytes = (*env)->GetDirectBufferAddress(env, uncompressed_direct_buf);
if (!uncompressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for uncompressedDirectBuf");
}

// Get the output direct buffer
void * compressed_bytes = (*env)->GetDirectBufferAddress(env, compressed_direct_buf);
if (!compressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for compressedDirectBuf");
}
{code}
The same comment applies to the similar code on the decompressor JNI side.

Related to this code:
{code}
if (remaining_to_flush) {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_FALSE);
} else {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
}
{code}
What I meant in my previous review was for the JNI layer to clear the finished flag in any case where we didn't set it, including the case where the finish flag isn't asking us to wrap things up. Thinking about this further, though, I'm not sure we need the case where we set it to false. We just need to set it to true in the JNI layer when the end flush indicates there aren't any more bytes to flush. In every other case finished should already be false, and the user will need to call reset() to clear both the finish and finished flags before continuing with more input. So I think we can put this back the way it was. Sorry for the confusion on my part.

If there are still bytes left to consume in the uncompressed buffer then I don't think we want to call ZSTD_endStream. We should only be calling ZSTD_endStream once the uncompressed buffer has been fully consumed, correct? Otherwise we may fail to compress the final chunk of input data when the user sets the finish flag and the zstd library doesn't fully consume the input on the ZSTD_compressStream call. That could result in a "successful" compression that silently drops data. A sketch of the intended end-of-stream flow also follows at the end of this comment.
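To make the THROW point concrete, here is a minimal sketch of the null-check fix. It is not quoted from the patch: I'm assuming the enclosing JNI functions return a jint, so the bare 0 return value is a placeholder for whatever the surrounding function's actual convention is.
{code}
// Sketch only: THROW raises a pending Java exception but does not unwind the
// native frame, so we have to return explicitly before using the pointer.
void * uncompressed_bytes = (*env)->GetDirectBufferAddress(env, uncompressed_direct_buf);
if (!uncompressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for uncompressedDirectBuf");
    return (jint)0;  // never hand a NULL pointer to zstd
}

// Get the output direct buffer
void * compressed_bytes = (*env)->GetDirectBufferAddress(env, compressed_direct_buf);
if (!compressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for compressedDirectBuf");
    return (jint)0;
}
{code}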
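Likewise, a sketch of the end-of-stream flow the last two comments describe, covering both the finished-flag handling and the ZSTD_endStream ordering. Variable names like stream, input, output, and finish are my assumptions about the patch, not quotes from it:
{code}
// Sketch only. input.pos tracks how much of the uncompressed buffer the
// zstd library actually consumed on this call.
size_t ret = ZSTD_compressStream(stream, &output, &input);
if (ZSTD_isError(ret)) {
    THROW(env, "java/lang/InternalError", ZSTD_getErrorName(ret));
    return (jint)0;
}

// Only write the stream epilogue once every uncompressed byte has been
// consumed; calling ZSTD_endStream any earlier can silently drop the
// final chunk of input.
if (finish && input.pos == input.size) {
    size_t remaining_to_flush = ZSTD_endStream(stream, &output);
    if (ZSTD_isError(remaining_to_flush)) {
        THROW(env, "java/lang/InternalError", ZSTD_getErrorName(remaining_to_flush));
        return (jint)0;
    }
    // Set finished only when the end flush fully drained. In every other
    // path finished is already false, and reset() clears finish/finished
    // before the caller continues with more input.
    if (remaining_to_flush == 0) {
        (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
    }
}
{code}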
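For reference, the same ordering against the plain zstd streaming API as a self-contained program (a hypothetical demo, not part of the patch). It shows that "finished" corresponds to ZSTD_endStream returning 0, i.e. nothing left to flush:
{code}
#include <stdio.h>
#include <string.h>
#include <zstd.h>   /* link with -lzstd */

int main(void) {
    const char *src = "an example payload for the zstd streaming API";
    char dst[256];

    ZSTD_CStream *zcs = ZSTD_createCStream();
    if (zcs == NULL) return 1;
    ZSTD_initCStream(zcs, 3);  /* compression level 3 */

    ZSTD_inBuffer in = { src, strlen(src), 0 };
    ZSTD_outBuffer out = { dst, sizeof(dst), 0 };

    /* Step 1: consume ALL of the input before ending the stream. */
    while (in.pos < in.size) {
        size_t ret = ZSTD_compressStream(zcs, &out, &in);
        if (ZSTD_isError(ret)) {
            fprintf(stderr, "compress: %s\n", ZSTD_getErrorName(ret));
            return 1;
        }
    }

    /* Step 2: write the epilogue; 0 means nothing left to flush, i.e. finished. */
    size_t remaining = ZSTD_endStream(zcs, &out);
    if (ZSTD_isError(remaining)) {
        fprintf(stderr, "end: %s\n", ZSTD_getErrorName(remaining));
        return 1;
    }
    printf("finished=%d, compressed %zu bytes\n", remaining == 0, out.pos);

    ZSTD_freeCStream(zcs);
    return 0;
}
{code}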
> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, HADOOP-13578.v5.patch, HADOOP-13578.v6.patch
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6 months by facebook now. v1.0 was recently released. Create a codec for this library.