Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/12397#discussion_r59863075 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -31,6 +33,8 @@ private byte[] buffer; private int offset; private int bitOffset; // Only used for booleans. + + private final static boolean bigEndianPlatform = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); --- End diff -- So the first step is to get this functional on big endian without affecting the little endian implementation asap for inclusion in 2.0.0. In VectorizedPlainValuesReader we could wrap the buffer once (assuming it never gets re-allocated) and use the ByteBuffer for the big endian access. We could always use the ByteBuffer rather than the byte[] even in little endian implementation but I do not know what performance impact that would have and as stated above I did not want to mess around with the little endian implementation at this time. Of course the methods in on/offHeapColumnVector that take the byte[] as a parameter could also be changed to take the byte buffer instead. There is also the unnecessary subtracting/adding the Platform.BYTE_ARRAY_OFFSET around a lot of the method calls. eg public final void readIntegers(int total, ColumnVector c, int rowId) { c.putIntsLittleEndian(rowId, total, buffer, offset - Platform.BYTE_ARRAY_OFFSET); offset += 4 * total; } where the first thing that putIntsLittleEndian does is to add that back on to the offset. So I think for now I'm reasonably happy with this big endian support in this patch and we could maybe review the whole code structure later. I've improved the patch and will maybe make the change to VectorizedPlainValuesReader so the code in there only wraps the buffer once.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org