[ https://issues.apache.org/jira/browse/SPARK-21277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071929#comment-16071929 ]
Liang-Chi Hsieh commented on SPARK-21277:
-----------------------------------------

The call to {{InternalRow.getArray}} returns an {{ArrayData}}, which can be an {{UnsafeArrayData}}. Even though you don't serialize your object data to {{UnsafeArrayData}}, Spark SQL internally uses {{UnsafeArrayData}} for arrays. We can close this if you have no further questions.

> Spark is invoking an incorrect serializer after UDAF completion
> ---------------------------------------------------------------
>
>                 Key: SPARK-21277
>                 URL: https://issues.apache.org/jira/browse/SPARK-21277
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Erik Erlandson
>
> I'm writing a UDAF that also requires some custom UDT implementations. The
> UDAF (and UDT) logic appears to execute properly up through the final UDAF
> call to the {{evaluate}} method. However, after {{evaluate}} completes, I am
> seeing the UDT {{deserialize}} method called one more time; this time it is
> invoked on data that wasn't produced by my corresponding {{serialize}}
> method, and it crashes.
> The following REPL output shows the execution and completion of
> {{evaluate}}, and then another call to {{deserialize}} that receives an
> {{UnsafeArrayData}} object that my serialization doesn't produce, so the
> method fails:
> {code}
> entering evaluate
> a= [[0.5,10,2,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1813f2c,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@b3587fc7],
> [0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d3065487,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9],
> [0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d01fbbcf,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9]]
> leaving evaluate
> a= org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@27d73513
> java.lang.RuntimeException: Error while decoding:
> java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData.
> createexternalrow(newInstance(class
> org.apache.spark.isarnproject.sketches.udt.TDigestArrayUDT).deserialize,
> StructField(tdigestmlvecudaf(features),TDigestArrayUDT,true))
> {code}
> To reproduce, check out the branch {{first-cut}} of {{isarn-sketches-spark}}:
> https://github.com/erikerlandson/isarn-sketches-spark/tree/first-cut
> Then invoke {{xsbt console}} to get a REPL with a Spark session. In the
> REPL, execute:
> {code}
> Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
> Type in expressions for evaluation. Or try :help.
> scala> val training = spark.createDataFrame(Seq(
>   (1.0, Vectors.dense(0.0, 1.1, 0.1)),
>   (0.0, Vectors.dense(2.0, 1.0, -1.0)),
>   (0.0, Vectors.dense(2.0, 1.3, 1.0)),
>   (1.0, Vectors.dense(0.0, 1.2, -0.5)))).toDF("label", "features")
> training: org.apache.spark.sql.DataFrame = [label: double, features: vector]
>
> scala> val featTD = training.agg(TDigestMLVecUDAF(0.5,10)(training("features")))
> featTD: org.apache.spark.sql.DataFrame = [tdigestmlvecudaf(features): tdigestarray]
>
> scala> featTD.first
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
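The mechanism described in the comment above can be illustrated without Spark: a UDT's {{deserialize}} should code against the generic {{ArrayData}} accessors rather than the concrete class its own {{serialize}} emits, because Spark SQL may hand back an {{UnsafeArrayData}} instead. The sketch below is a minimal, self-contained Scala analogy; the classes are simplified stand-ins named after their catalyst counterparts, not the real Spark API.

```scala
// Minimal sketch (no Spark dependency). ArrayData, GenericArrayData, and
// UnsafeArrayData here are simplified stand-ins for the catalyst classes,
// illustrating why deserialize must accept any ArrayData implementation.

trait ArrayData {
  def numElements(): Int
  def getDouble(i: Int): Double
  // Works for every implementation: walks elements via the shared accessors.
  def toDoubleArray(): Array[Double] =
    Array.tabulate(numElements())(i => getDouble(i))
}

// What a UDT's own serialize() typically produces: an object-backed array.
class GenericArrayData(values: Array[Double]) extends ArrayData {
  def numElements(): Int = values.length
  def getDouble(i: Int): Double = values(i)
  def array: Array[Double] = values // accessor only this representation has
}

// Stand-in for UnsafeArrayData: elements live in a flat byte buffer, and
// object-style accessors such as `array` are simply not available.
class UnsafeArrayData(bytes: Array[Byte]) extends ArrayData {
  private val buf = java.nio.ByteBuffer.wrap(bytes)
  def numElements(): Int = bytes.length / 8
  def getDouble(i: Int): Double = buf.getDouble(i * 8)
}

object UdtSketch {
  // Fragile: assumes the concrete class produced by serialize(), so it
  // fails when the engine hands back the unsafe representation instead.
  def fragileDeserialize(datum: ArrayData): Array[Double] = datum match {
    case g: GenericArrayData => g.array
    case _ =>
      throw new UnsupportedOperationException("Not supported on UnsafeArrayData.")
  }

  // Robust: uses only the generic ArrayData accessors, so the result is the
  // same whichever representation arrives after the UDAF completes.
  def robustDeserialize(datum: ArrayData): Array[Double] = datum.toDoubleArray()
}
```

In a real UDT the equivalent of {{robustDeserialize}} is to read elements inside {{deserialize}} through the {{ArrayData}} interface (e.g. {{toDoubleArray()}} or {{numElements()}}/{{getDouble(i)}}) instead of pattern-matching on the concrete array class returned by your own {{serialize}}.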