[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742894#comment-15742894 ]
Jason Lowe commented on HADOOP-13578:
-------------------------------------

Thanks for updating the patch! Sorry for the delay in re-review, as I was travelling last week. The ASF license warnings appear to be unrelated, as does the unit test failure.

Comments on the patch:

IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE should be IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT.

The ZStandardCompressor(level, bufferSize) constructor should be implemented in terms of ZStandardCompressor(level, inputBufferSize, outputBufferSize) to reduce code redundancy and improve maintainability.

If inflateBytesDirect took a dest position then we wouldn't need to slice the destination buffer, avoiding a temporary object allocation.

I'm not sure about this snippet in ZStandardDirectDecompressor#decompress:
{code}
endOfInput = !src.hasRemaining();
super.finished = !src.hasRemaining();
{code}
With the super.finished change, the endOfInput and finished variables simply mirror each other, which would make endOfInput redundant. I think endOfInput is still necessary, and the finished variable handling is already taken care of by the JNI code. In other words, I don't believe we need the super.finished line that was added. However, maybe I'm missing something here.

I think we need to return after these THROW macros or we risk passing null pointers to the zstd library (a sketch of the fix follows at the end of this comment):
{code}
void * uncompressed_bytes = (*env)->GetDirectBufferAddress(env, uncompressed_direct_buf);
if (!uncompressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for uncompressedDirectBuf");
}

// Get the output direct buffer
void * compressed_bytes = (*env)->GetDirectBufferAddress(env, compressed_direct_buf);
if (!compressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for compressedDirectBuf");
}
{code}
The same comment applies to the similar code on the decompressor JNI side.

Related to this code:
{code}
if (remaining_to_flush) {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_FALSE);
} else {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
}
{code}
What I meant in my previous review was for the JNI layer to clear the finished flag in any case where we didn't set it, including the case where the finish flag isn't asking us to wrap things up. Thinking about this further, though, I'm not sure we need the case where we set it to false. We just need to set it to true in the JNI layer when the end flush indicates there aren't any more bytes to flush. In every other case finished should already be false, and the user will need to call reset() to clear both the finish and finished flags before continuing with more input. So I think we can put this back the way it was. Sorry for the confusion on my part.

If there are still bytes left to consume in the uncompressed buffer then I don't think we want to call ZSTD_endStream. We should only be calling ZSTD_endStream once the uncompressed buffer has been fully consumed, correct? Otherwise we may fail to compress the final chunk of input data when the user sets the finish flag and the zstd library doesn't fully consume the input on the ZSTD_compressStream call. That could result in a "successful" compression that silently drops data. A sketch of the intended end-of-stream flow also follows at the end of this comment.
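To make the THROW point concrete, here is a minimal sketch of the null-check fix. It is not quoted from the patch: I'm assuming the enclosing JNI functions return a jint, so the bare 0 return value is a placeholder for whatever the surrounding function's actual convention is.
{code}
// Sketch only: THROW raises a pending Java exception but does not unwind the
// native frame, so we have to return explicitly before using the pointer.
void * uncompressed_bytes = (*env)->GetDirectBufferAddress(env, uncompressed_direct_buf);
if (!uncompressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for uncompressedDirectBuf");
    return (jint)0;  // never hand a NULL pointer to zstd
}

// Get the output direct buffer
void * compressed_bytes = (*env)->GetDirectBufferAddress(env, compressed_direct_buf);
if (!compressed_bytes) {
    THROW(env, "java/lang/InternalError", "Undefined memory address for compressedDirectBuf");
    return (jint)0;
}
{code}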
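Likewise, a sketch of the end-of-stream flow the last two comments describe, covering both the finished-flag handling and the ZSTD_endStream ordering. Variable names like stream, input, output, and finish are my assumptions about the patch, not quotes from it:
{code}
// Sketch only. input.pos tracks how much of the uncompressed buffer the
// zstd library actually consumed on this call.
size_t ret = ZSTD_compressStream(stream, &output, &input);
if (ZSTD_isError(ret)) {
    THROW(env, "java/lang/InternalError", ZSTD_getErrorName(ret));
    return (jint)0;
}

// Only write the stream epilogue once every uncompressed byte has been
// consumed; calling ZSTD_endStream any earlier can silently drop the
// final chunk of input.
if (finish && input.pos == input.size) {
    size_t remaining_to_flush = ZSTD_endStream(stream, &output);
    if (ZSTD_isError(remaining_to_flush)) {
        THROW(env, "java/lang/InternalError", ZSTD_getErrorName(remaining_to_flush));
        return (jint)0;
    }
    // Set finished only when the end flush fully drained. In every other
    // path finished is already false, and reset() clears finish/finished
    // before the caller continues with more input.
    if (remaining_to_flush == 0) {
        (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
    }
}
{code}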
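For reference, the same ordering against the plain zstd streaming API as a self-contained program (a hypothetical demo, not part of the patch). It shows that "finished" corresponds to ZSTD_endStream returning 0, i.e. nothing left to flush:
{code}
#include <stdio.h>
#include <string.h>
#include <zstd.h>   /* link with -lzstd */

int main(void) {
    const char *src = "an example payload for the zstd streaming API";
    char dst[256];

    ZSTD_CStream *zcs = ZSTD_createCStream();
    if (zcs == NULL) return 1;
    ZSTD_initCStream(zcs, 3);  /* compression level 3 */

    ZSTD_inBuffer in = { src, strlen(src), 0 };
    ZSTD_outBuffer out = { dst, sizeof(dst), 0 };

    /* Step 1: consume ALL of the input before ending the stream. */
    while (in.pos < in.size) {
        size_t ret = ZSTD_compressStream(zcs, &out, &in);
        if (ZSTD_isError(ret)) {
            fprintf(stderr, "compress: %s\n", ZSTD_getErrorName(ret));
            return 1;
        }
    }

    /* Step 2: write the epilogue; 0 means nothing left to flush, i.e. finished. */
    size_t remaining = ZSTD_endStream(zcs, &out);
    if (ZSTD_isError(remaining)) {
        fprintf(stderr, "end: %s\n", ZSTD_getErrorName(remaining));
        return 1;
    }
    printf("finished=%d, compressed %zu bytes\n", remaining == 0, out.pos);

    ZSTD_freeCStream(zcs);
    return 0;
}
{code}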
> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, HADOOP-13578.v5.patch, HADOOP-13578.v6.patch
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6 months by facebook now. v1.0 was recently released. Create a codec for this library.