[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size
[ https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692089#comment-16692089 ]

Wes McKinney commented on ARROW-3831:
-------------------------------------

What do you think about adding a second {{Decompress}} virtual method that will return the output length?

https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.h#L106

The default implementation could be NotImplemented for codecs that do not support this.

> [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-3831
>                 URL: https://issues.apache.org/jira/browse/ARROW-3831
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.11.1
>            Reporter: Kouhei Sutou
>            Priority: Major
>
> We can't know decompressed data size when we only have compressed data. The
> current {{arrow::util::Codec::Decompress()}} doesn't return decompressed data
> size. So we can't know which data in {{output_buffer}} can be used.
> FYI: {{arrow::util::Codec::Compress()}} returns compressed data size.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
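The proposal in the comment above could look roughly like the following. This is a standalone sketch, not Arrow's actual code: `Status`, `SketchCodec`, `CopyCodec` and `LegacyCodec` are hypothetical stand-ins invented for illustration, and the parameter names are assumptions rather than a copy of `compression.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status; only what the sketch needs.
struct Status {
  enum class Code { OK, NotImplemented, Invalid };
  Code code;
  std::string msg;
  Status() : code(Code::OK) {}
  Status(Code c, std::string m) : code(c), msg(std::move(m)) {}
  bool ok() const { return code == Code::OK; }
  static Status OK() { return Status(); }
  static Status NotImplemented(std::string m) {
    return Status(Code::NotImplemented, std::move(m));
  }
  static Status Invalid(std::string m) { return Status(Code::Invalid, std::move(m)); }
};

// Hypothetical codec interface with two Decompress overloads.
class SketchCodec {
 public:
  virtual ~SketchCodec() = default;

  // Existing-style method: the caller must already know the output size.
  virtual Status Decompress(int64_t input_len, const uint8_t* input,
                            int64_t output_buffer_len, uint8_t* output_buffer) = 0;

  // Proposed second method: also reports how many bytes were written.
  // Codecs that cannot report the size inherit this default.
  virtual Status Decompress(int64_t /*input_len*/, const uint8_t* /*input*/,
                            int64_t /*output_buffer_len*/, uint8_t* /*output_buffer*/,
                            int64_t* /*output_len*/) {
    return Status::NotImplemented("codec does not report the decompressed size");
  }
};

// Toy codec whose "compressed" form is the raw bytes; it supports both methods.
class CopyCodec : public SketchCodec {
 public:
  Status Decompress(int64_t input_len, const uint8_t* input, int64_t output_buffer_len,
                    uint8_t* output_buffer) override {
    int64_t ignored = 0;
    return Decompress(input_len, input, output_buffer_len, output_buffer, &ignored);
  }

  Status Decompress(int64_t input_len, const uint8_t* input, int64_t output_buffer_len,
                    uint8_t* output_buffer, int64_t* output_len) override {
    assert(output_len != nullptr);
    if (input_len < 0 || input_len > output_buffer_len) {
      return Status::Invalid("output buffer too small");
    }
    std::memcpy(output_buffer, input, static_cast<std::size_t>(input_len));
    *output_len = input_len;
    return Status::OK();
  }
};

// Toy codec that only implements the existing-style method.
class LegacyCodec : public SketchCodec {
 public:
  using SketchCodec::Decompress;  // keep the five-argument default visible
  Status Decompress(int64_t, const uint8_t*, int64_t, uint8_t*) override {
    return Status::OK();
  }
};
```

With this shape, a caller can try the five-argument overload first and fall back to a size recorded elsewhere when the codec answers NotImplemented. The `using` declaration in `LegacyCodec` matters in C++: overriding one overload would otherwise hide the inherited one.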
[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size
[ https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691138#comment-16691138 ]

Kouhei Sutou commented on ARROW-3831:
-------------------------------------

I think that the actual decompressed size is useful even when we know the expected decompressed size. For example, we can validate the decompression result: if the actual decompressed size and the expected one differ, the compressed data is broken (or the decompression logic is broken).

As far as I know, zlib, LZ4 and Zstandard return the actual decompressed size.
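The validation idea in the comment above can be sketched as follows. To stay self-contained, the example does not call zlib, LZ4 or Zstandard; `ToyRleDecompress` is a hypothetical run-length decoder standing in for a real library, and `DecompressAndValidate` is an assumed helper name, not an Arrow function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length decoder (hypothetical stand-in for a real codec).
// Input is a sequence of (count, value) byte pairs. Returns the number of
// bytes written, or -1 if the input is malformed or the output is too small.
int64_t ToyRleDecompress(const std::vector<uint8_t>& input,
                         std::vector<uint8_t>* output) {
  assert(output != nullptr);
  if (input.size() % 2 != 0) return -1;
  const int64_t capacity = static_cast<int64_t>(output->size());
  int64_t written = 0;
  for (std::size_t i = 0; i < input.size(); i += 2) {
    const int64_t count = input[i];
    if (written + count > capacity) return -1;
    for (int64_t j = 0; j < count; ++j) {
      (*output)[static_cast<std::size_t>(written + j)] = input[i + 1];
    }
    written += count;
  }
  return written;
}

// Pre-allocates the output from the recorded (expected) size, then checks
// that the decoder actually produced exactly that many bytes.
bool DecompressAndValidate(const std::vector<uint8_t>& input, int64_t expected_size,
                           std::vector<uint8_t>* output) {
  assert(output != nullptr);
  if (expected_size < 0) return false;
  output->assign(static_cast<std::size_t>(expected_size), uint8_t{0});
  const int64_t actual_size = ToyRleDecompress(input, output);
  // A mismatch means the data (or the decoder) is broken.
  return actual_size == expected_size;
}
```

Here the expected size plays the role of the value recorded alongside the compressed data, and the actual size is what the decoder reports. Without the actual size, a short or corrupted stream that happens to fit in the buffer would go unnoticed.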
[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size
[ https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690956#comment-16690956 ]

Wes McKinney commented on ARROW-3831:
-------------------------------------

Do compression libraries provide the decompressed size consistently? Knowing the decompressed size a priori is useful for pre-allocating memory.
[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size
[ https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690849#comment-16690849 ]

Antoine Pitrou commented on ARROW-3831:
---------------------------------------

The current use case is that you already know the decompressed size, because the information was recorded somewhere (e.g. in a Parquet file).