[ https://issues.apache.org/jira/browse/SPARK-31703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-31703:
----------------------------------
    Priority: Blocker  (was: Critical)

> Changes made by SPARK-26985 break reading parquet files correctly in BigEndian architectures (AIX + LinuxPPC64)
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-31703
>                 URL: https://issues.apache.org/jira/browse/SPARK-31703
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.5, 3.0.0
>         Environment: AIX 7.2
>                      LinuxPPC64 with RedHat.
>            Reporter: Michail Giannakopoulos
>            Priority: Blocker
>              Labels: BigEndian, correctness
>         Attachments: Data_problem_Spark.gif
>
>
> While upgrading to Apache Spark 2.4.5 on our IBM systems (AIX and PowerPC) so that we can read data stored in parquet format, we noticed that values of the DOUBLE and DECIMAL types are parsed incorrectly.
> According to the parquet documentation, values are always stored in little-endian representation:
> [https://github.com/apache/parquet-format/blob/master/Encodings.md]
> {noformat}
> The plain encoding is used whenever a more efficient encoding can not be used. It
> stores the data in the following format:
> BOOLEAN: Bit Packed, LSB first
> INT32: 4 bytes little endian
> INT64: 8 bytes little endian
> INT96: 12 bytes little endian (deprecated)
> FLOAT: 4 bytes IEEE little endian
> DOUBLE: 8 bytes IEEE little endian
> BYTE_ARRAY: length in 4 bytes little endian followed by the bytes contained in the array
> FIXED_LEN_BYTE_ARRAY: the bytes contained in the array
> For native types, this outputs the data as little endian. Floating
> point types are encoded in IEEE.
> For the byte array type, it encodes the length as a 4 byte little
> endian, followed by the bytes.{noformat}
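For illustration only, the standalone Java sketch below (not Spark's actual reader code; the class name and variables are invented for this example) shows the kind of byte-order mismatch described above: a PLAIN-encoded DOUBLE must be decoded as little endian regardless of the host, so decoding it with the platform's native byte order only happens to give the right answer on little-endian machines such as x86.

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Spark.
public class ParquetDoubleDecodeDemo {
    public static void main(String[] args) {
        // Parquet PLAIN encoding always writes a DOUBLE as 8 bytes, little endian.
        // Simulate the on-disk bytes for the value 1.5.
        byte[] onDisk = new byte[8];
        ByteBuffer.wrap(onDisk).order(ByteOrder.LITTLE_ENDIAN).putDouble(1.5);

        // Correct: decode with an explicit little-endian order, as the format requires.
        double littleEndianDecoded = ByteBuffer.wrap(onDisk)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getDouble();

        // Wrong on big-endian hosts: decode with the platform's native order.
        // On AIX / LinuxPPC64 (big endian) this reinterprets the byte sequence
        // and produces a garbage value; on x86 it matches only because the
        // native order is already little endian.
        double nativeDecoded = ByteBuffer.wrap(onDisk)
                .order(ByteOrder.nativeOrder())
                .getDouble();

        System.out.println("native order          = " + ByteOrder.nativeOrder());
        System.out.println("little-endian decoded = " + littleEndianDecoded); // always 1.5
        System.out.println("native-order decoded  = " + nativeDecoded);       // 1.5 only on LE hosts
    }
}
{code}

On AIX 7.2 or LinuxPPC64 the native-order decode above yields a nonsense value, which is consistent with the wrong DOUBLE and DECIMAL values reported in this issue.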