Antoine Pitrou created ARROW-4018:
-------------------------------------

             Summary: [C++] RLE decoder may not be big-endian compatible
                 Key: ARROW-4018
                 URL: https://issues.apache.org/jira/browse/ARROW-4018
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 0.11.1
            Reporter: Antoine Pitrou

This issue was found by Coverity. The {{RleDecoder::NextCounts}} method has the following code to fetch the repeated literal in repeated runs:
{code:c++}
bool result =
    bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
                              reinterpret_cast<T*>(&current_value_));
{code}

Coverity says this:
bq. Pointer "&this->current_value_" points to an object whose effective type is "unsigned long long" (64 bits, unsigned) but is dereferenced as a narrower "unsigned int" (32 bits, unsigned). This may lead to unexpected results depending on machine endianness.

bq. In addition, it's not obvious whether {{current_value_}} also needs byte-swapping (presumably, at least in the Parquet file format, it's supposed to be stored in little-endian format in the RLE bitstream).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
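
For illustration only, here is a minimal sketch of one way to avoid both concerns raised above: read the value into a temporary of the narrower type and then widen it into the 64-bit member, instead of casting the member's address. It assumes the bytes in the stream are little-endian, as the report presumes. {{LoadLittleEndian}}, {{RepeatedRun}} and {{ReadRepeatedValue}} are hypothetical names and are not part of Arrow; this is not the fix that was applied to the codebase.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Arrow API): assemble `num_bytes` bytes, assumed
// to be stored in little-endian order, into an unsigned value of type T.
// Using shifts instead of a pointer cast gives the same result on
// little-endian and big-endian hosts. Requires num_bytes <= sizeof(T) <= 8.
template <typename T>
T LoadLittleEndian(const uint8_t* src, int num_bytes) {
  uint64_t v = 0;
  for (int i = 0; i < num_bytes; ++i) {
    v |= static_cast<uint64_t>(src[i]) << (8 * i);
  }
  return static_cast<T>(v);
}

// Hypothetical stand-in for the decoder state: a 64-bit member that holds
// the repeated value of the current run.
struct RepeatedRun {
  uint64_t current_value = 0;
};

// Read a repeated value of `bit_width` bits (rounded up to whole bytes) into
// a temporary T, then widen it into the 64-bit member. The member is never
// accessed through a T*, so no narrower-type dereference takes place.
template <typename T>
T ReadRepeatedValue(const uint8_t* src, int bit_width, RepeatedRun* run) {
  const int num_bytes = (bit_width + 7) / 8;  // same rounding as CeilDiv(bit_width, 8)
  const T value = LoadLittleEndian<T>(src, num_bytes);
  run->current_value = static_cast<uint64_t>(value);
  return value;
}
```

With the two bytes {{0x34, 0x02}} and a bit width of 12, {{ReadRepeatedValue<uint32_t>}} returns {{0x0234}} and stores the same number in all 64 bits of {{current_value}}, regardless of host byte order.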