[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653709#comment-17653709 ]

Apache Spark commented on SPARK-41804:
--------------------------------------

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/39349

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
>
>                 Key: SPARK-41804
>                 URL: https://issues.apache.org/jira/browse/SPARK-41804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension of the values.
> You provided 2 indices and 6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
> # Problematic frame:
> # V [libjvm.dylib+0xc9d30] acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory (native)
>  total in heap [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory (native)
>  total in heap [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> # http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653367#comment-17653367 ]

Bruce Robbins commented on SPARK-41804:
---------------------------------------

I think I have a handle on what's going on here