cxzl25 commented on code in PR #36787: URL: https://github.com/apache/spark/pull/36787#discussion_r901783862
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
I tested it locally with JDK 11 and it runs successfully.

```bash
setjdk 1.11
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite test
```

![image](https://user-images.githubusercontent.com/3898450/174632802-10abcf43-d1df-4b1b-a8ac-097f240338d6.png)

The GA failure happened because the write path ran out of memory (OOM), which should be unrelated to JDK 11.

```java
2022-06-16T14:30:19.8285352Z Caused by: java.lang.OutOfMemoryError: Java heap space
2022-06-16T14:30:19.8285963Z 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.allocateBuffer(BytesColumnVector.java:300)
2022-06-16T14:30:19.8286885Z 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureValPreallocated(BytesColumnVector.java:218)
2022-06-16T14:30:19.8287675Z 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182)
2022-06-16T14:30:19.8288377Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setBinaryValue(OrcMapredRecordWriter.java:87)
2022-06-16T14:30:19.8289257Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:235)
2022-06-16T14:30:19.8289956Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setStructValue(OrcMapredRecordWriter.java:133)
2022-06-16T14:30:19.8290654Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:248)
2022-06-16T14:30:19.8291438Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setListValue(OrcMapredRecordWriter.java:162)
2022-06-16T14:30:19.8292127Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:256)
2022-06-16T14:30:19.8292824Z 	at org.apache.orc.mapreduce.OrcMapreduceRecordWriter.write(OrcMapreduceRecordWriter.java:73)
2022-06-16T14:30:19.8293554Z 	at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.write(OrcOutputWriter.scala:56)
2022-06-16T14:30:19.8294523Z 	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
```

Unlike PR #34284, this test does not seem able to reduce its buffer memory: it needs a relatively large amount of memory when writing to ORC in order to keep the test coverage.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org