[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size

2018-11-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692089#comment-16692089
 ] 

Wes McKinney commented on ARROW-3831:
-

What do you think about adding a second {{Decompress}} virtual method that will 
return the output length?

https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.h#L106

The default implementation could return NotImplemented for codecs that do not 
support this.
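
A rough sketch of that idea, for discussion only. The {{Status}} class below is a hypothetical stand-in for {{arrow::Status}}, the method signatures are assumptions loosely modeled on the existing interface rather than copied from it, and {{IdentityCodec}} and {{LegacyCodec}} are made-up codecs used only to exercise both paths:

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status; only what this sketch needs.
class Status {
 public:
  enum class Code { OK, NotImplemented, Invalid };
  static Status OK() { return Status(Code::OK, ""); }
  static Status NotImplemented(std::string msg) {
    return Status(Code::NotImplemented, std::move(msg));
  }
  static Status Invalid(std::string msg) {
    return Status(Code::Invalid, std::move(msg));
  }
  bool ok() const { return code_ == Code::OK; }
  Code code() const { return code_; }
  const std::string& message() const { return msg_; }

 private:
  Status(Code code, std::string msg) : code_(code), msg_(std::move(msg)) {}
  Code code_;
  std::string msg_;
};

// Assumed shape of the codec interface (not the real arrow::util::Codec).
class Codec {
 public:
  virtual ~Codec() = default;

  // Existing-style method: the caller must already know the output size.
  virtual Status Decompress(int64_t input_len, const uint8_t* input,
                            int64_t output_buffer_len,
                            uint8_t* output_buffer) = 0;

  // Proposed second method: additionally reports the bytes written.
  // Codecs that cannot report the size inherit this default.
  virtual Status Decompress(int64_t /*input_len*/, const uint8_t* /*input*/,
                            int64_t /*output_buffer_len*/,
                            uint8_t* /*output_buffer*/,
                            int64_t* /*output_len*/) {
    return Status::NotImplemented("codec does not report decompressed size");
  }
};

// Made-up codec that supports the new method ("decompression" is a copy).
class IdentityCodec : public Codec {
 public:
  Status Decompress(int64_t input_len, const uint8_t* input,
                    int64_t output_buffer_len,
                    uint8_t* output_buffer) override {
    int64_t ignored = 0;
    return Decompress(input_len, input, output_buffer_len, output_buffer,
                      &ignored);
  }

  Status Decompress(int64_t input_len, const uint8_t* input,
                    int64_t output_buffer_len, uint8_t* output_buffer,
                    int64_t* output_len) override {
    if (input_len > output_buffer_len) {
      return Status::Invalid("output buffer too small");
    }
    std::memcpy(output_buffer, input, static_cast<size_t>(input_len));
    *output_len = input_len;
    return Status::OK();
  }
};

// Made-up codec that only implements the existing method.
class LegacyCodec : public Codec {
 public:
  using Codec::Decompress;  // keep the inherited 5-argument overload visible

  Status Decompress(int64_t input_len, const uint8_t* input,
                    int64_t output_buffer_len,
                    uint8_t* output_buffer) override {
    if (input_len > output_buffer_len) {
      return Status::Invalid("output buffer too small");
    }
    std::memcpy(output_buffer, input, static_cast<size_t>(input_len));
    return Status::OK();
  }
};
```

One detail to watch with this design: a subclass that overrides only one overload hides the other unless it adds a {{using Codec::Decompress;}} declaration, as {{LegacyCodec}} does here.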

> [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size
> 
>
> Key: ARROW-3831
> URL: https://issues.apache.org/jira/browse/ARROW-3831
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.11.1
> Reporter: Kouhei Sutou
> Priority: Major
>
> We can't know the decompressed data size when we only have the compressed 
> data. The current {{arrow::util::Codec::Decompress()}} doesn't return the 
> decompressed data size, so we can't know how much of {{output_buffer}} holds 
> valid data.
> FYI: {{arrow::util::Codec::Compress()}} returns the compressed data size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size

2018-11-18 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691138#comment-16691138
 ] 

Kouhei Sutou commented on ARROW-3831:
-

I think that the actual decompressed size is useful even when we know the 
expected decompressed size.
For example, we can validate the decompression result. If the actual 
decompressed size and the expected one are different, either the compressed 
data is broken or the decompression logic is broken.

As far as I know, zlib, LZ4 and Zstandard return the actual decompressed size.
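
A minimal sketch of that validation, under stated assumptions: {{RleDecompress}} is a toy run-length decoder written only for this example (it stands in for a real library call and is not an Arrow or zlib API), and {{DecodeResult}} and {{DecompressChecked}} are hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length decoder: the input is a sequence of (count, value) byte
// pairs. Writes into the pre-sized output and reports the bytes written.
// Returns false on malformed input or if the output does not fit.
bool RleDecompress(const std::vector<uint8_t>& input,
                   std::vector<uint8_t>* output, size_t* actual_size) {
  if (input.size() % 2 != 0) return false;
  size_t written = 0;
  for (size_t i = 0; i < input.size(); i += 2) {
    const size_t run = input[i];
    if (run > output->size() - written) return false;  // would overflow
    std::fill_n(output->begin() + static_cast<std::ptrdiff_t>(written), run,
                input[i + 1]);
    written += run;
  }
  *actual_size = written;
  return true;
}

enum class DecodeResult { kOk, kDecodeError, kSizeMismatch };

// Decompress into a buffer of the expected size, then compare the size the
// decoder actually produced with the size the caller expected.
DecodeResult DecompressChecked(const std::vector<uint8_t>& input,
                               size_t expected_size,
                               std::vector<uint8_t>* output) {
  output->assign(expected_size, 0);
  size_t actual_size = 0;
  if (!RleDecompress(input, output, &actual_size)) {
    return DecodeResult::kDecodeError;
  }
  if (actual_size != expected_size) {
    // Either the recorded size or the compressed data is wrong.
    return DecodeResult::kSizeMismatch;
  }
  return DecodeResult::kOk;
}
```

Without the actual size, the short-output case ({{kSizeMismatch}}) would go unnoticed and the tail of the buffer would be read as if it were valid data.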



[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size

2018-11-18 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690956#comment-16690956
 ] 

Wes McKinney commented on ARROW-3831:
-

Do compression libraries provide the decompressed size consistently? Knowing 
the decompressed size a priori is useful for pre-allocating memory.



[jira] [Commented] (ARROW-3831) [C++] arrow::util::Codec::Decompress() doesn't return decompressed data size

2018-11-18 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690849#comment-16690849
 ] 

Antoine Pitrou commented on ARROW-3831:
---

The current use case is that you already know the decompressed size, because 
the information was recorded somewhere (e.g. in a Parquet file).
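
To illustrate what "recorded somewhere" can look like, here is a small sketch. The frame layout (an 8-byte little-endian size followed by the payload) and the names {{WriteFrame}} and {{ReadFrameHeader}} are invented for this example; this is not Parquet's actual page header encoding:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame layout: 8-byte little-endian uncompressed size,
// followed by the compressed payload.
constexpr size_t kHeaderSize = 8;

std::vector<uint8_t> WriteFrame(uint64_t uncompressed_size,
                                const std::vector<uint8_t>& compressed) {
  std::vector<uint8_t> frame;
  frame.reserve(kHeaderSize + compressed.size());
  for (size_t i = 0; i < kHeaderSize; ++i) {
    frame.push_back(static_cast<uint8_t>((uncompressed_size >> (8 * i)) & 0xFF));
  }
  frame.insert(frame.end(), compressed.begin(), compressed.end());
  return frame;
}

// Reads the recorded size so the caller can pre-allocate the output buffer
// before calling the decompressor. Returns false if the frame is too short.
bool ReadFrameHeader(const std::vector<uint8_t>& frame,
                     uint64_t* uncompressed_size, size_t* payload_offset) {
  if (frame.size() < kHeaderSize) return false;
  uint64_t size = 0;
  for (size_t i = 0; i < kHeaderSize; ++i) {
    size |= static_cast<uint64_t>(frame[i]) << (8 * i);
  }
  *uncompressed_size = size;
  *payload_offset = kHeaderSize;
  return true;
}
```

The reader would size its output buffer from {{uncompressed_size}} and pass the bytes starting at {{payload_offset}} to the codec.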
