[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2023-01-02 Thread Apache Spark (Jira)



[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653709#comment-17653709 ]

Apache Spark commented on SPARK-41804:
--

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/39349

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> 
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Bruce Robbins
> Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}
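
For anyone who wants to run the reproduction outside spark-shell, the steps quoted above can be sketched as a small standalone program. This is only a sketch under assumptions: the object name Spark41804Repro, the run helper, and the local[1] master are illustrative and not part of the issue, and the program assumes spark-sql and spark-mllib are on the classpath. Note that Try only captures non-fatal exceptions, so the OutOfMemoryError and SIGBUS variants listed above would still terminate the JVM.

```scala
import org.apache.spark.ml.linalg.{DenseVector, Vector}
import org.apache.spark.sql.{Row, SparkSession}
import scala.util.Try

// Defined at top level so Spark can derive an encoder for it.
case class TestRow(varr: Array[Vector])

// Hypothetical wrapper around the reproduction steps from the issue.
object Spark41804Repro {
  // Returns (result with the session's current codegen settings,
  //          result with whole-stage codegen off and factoryMode=NO_CODEGEN).
  def run(spark: SparkSession, path: String): (Try[Array[Row]], Try[Array[Row]]) = {
    import spark.implicits._

    // Create a file of vector data, as in the issue.
    val dv: Vector = new DenseVector(Array(0.1d, 0.2d, 0.3d))
    Seq(TestRow(Array(dv, dv))).toDS()
      .coalesce(1).write.mode("overwrite").format("parquet").save(path)

    // Remember the current values so the session can be restored afterwards.
    val keys = Seq("spark.sql.codegen.wholeStage", "spark.sql.codegen.factoryMode")
    val saved = keys.map(k => k -> spark.conf.getOption(k))

    // The issue reports that this read works.
    val withCodegen = Try(spark.read.format("parquet").load(path).collect())
    try {
      spark.conf.set("spark.sql.codegen.wholeStage", "false")
      spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")
      // The issue reports that this read fails on affected versions.
      val interpreted = Try(spark.read.format("parquet").load(path).collect())
      (withCodegen, interpreted)
    } finally {
      saved.foreach {
        case (k, Some(v)) => spark.conf.set(k, v)
        case (k, None)    => spark.conf.unset(k)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-41804")
      .getOrCreate()
    val (withCodegen, interpreted) = run(spark, "vector_data")
    println(s"codegen path succeeded: ${withCodegen.isSuccess}")
    println(s"interpreted path succeeded: ${interpreted.isSuccess}")
    spark.stop()
  }
}
```

Restoring the two settings in the finally block keeps the session usable for further experiments after the interpreted read has been attempted.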



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2022-12-31 Thread Bruce Robbins (Jira)



[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653367#comment-17653367 ]

Bruce Robbins commented on SPARK-41804:
---

I think I have a handle on what's going on here.
