[ https://issues.apache.org/jira/browse/SPARK-21277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071929#comment-16071929 ]
Liang-Chi Hsieh commented on SPARK-21277:
-----------------------------------------

The call to {{InternalRow.getArray}} returns an {{ArrayData}}, which can be an {{UnsafeArrayData}}. Even though you don't serialize your object data to {{UnsafeArrayData}}, Spark SQL internally uses {{UnsafeArrayData}} for arrays. We can close this if you have no further questions.

> Spark is invoking an incorrect serializer after UDAF completion
> ---------------------------------------------------------------
>
>                 Key: SPARK-21277
>                 URL: https://issues.apache.org/jira/browse/SPARK-21277
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Erik Erlandson
>
> I'm writing a UDAF that also requires some custom UDT implementations. The
> UDAF (and UDT) logic appears to execute properly up through the final UDAF
> call to the {{evaluate}} method. However, after {{evaluate}} completes, I am
> seeing the UDT {{deserialize}} method called one more time; this time it is
> invoked on data that wasn't produced by my corresponding {{serialize}}
> method, and it crashes.
> The following REPL output shows the execution and completion of
> {{evaluate}}, and then another call to {{deserialize}} that receives an
> {{UnsafeArrayData}} object that my serialization doesn't produce, so the
> method fails:
> {code}
> entering evaluate
> a= [[0.5,10,2,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1813f2c,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@b3587fc7],
> [0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d3065487,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9],
> [0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d01fbbcf,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9]]
> leaving evaluate
> a= org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@27d73513
> java.lang.RuntimeException: Error while decoding:
> java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData.
> createexternalrow(newInstance(class
> org.apache.spark.isarnproject.sketches.udt.TDigestArrayUDT).deserialize,
> StructField(tdigestmlvecudaf(features),TDigestArrayUDT,true))
> {code}
> To reproduce, check out the branch {{first-cut}} of {{isarn-sketches-spark}}:
> https://github.com/erikerlandson/isarn-sketches-spark/tree/first-cut
> Then invoke {{xsbt console}} to get a REPL with a Spark session. In the
> REPL, execute:
> {code}
> Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
> Type in expressions for evaluation. Or try :help.
> scala> val training = spark.createDataFrame(Seq(
>   (1.0, Vectors.dense(0.0, 1.1, 0.1)),
>   (0.0, Vectors.dense(2.0, 1.0, -1.0)),
>   (0.0, Vectors.dense(2.0, 1.3, 1.0)),
>   (1.0, Vectors.dense(0.0, 1.2, -0.5)))).toDF("label", "features")
> training: org.apache.spark.sql.DataFrame = [label: double, features: vector]
>
> scala> val featTD = training.agg(TDigestMLVecUDAF(0.5,10)(training("features")))
> featTD: org.apache.spark.sql.DataFrame = [tdigestmlvecudaf(features): tdigestarray]
>
> scala> featTD.first
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
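The mechanism described in the comment above can be illustrated without Spark: a UDT's {{deserialize}} should code against the generic {{ArrayData}} accessors rather than the concrete class its own {{serialize}} emits, because Spark SQL may hand back an {{UnsafeArrayData}} instead. The sketch below is a minimal, self-contained Scala analogy; the classes are simplified stand-ins named after their catalyst counterparts, not the real Spark API.

```scala
// Minimal sketch (no Spark dependency). ArrayData, GenericArrayData, and
// UnsafeArrayData here are simplified stand-ins for the catalyst classes,
// illustrating why deserialize must accept any ArrayData implementation.

trait ArrayData {
  def numElements(): Int
  def getDouble(i: Int): Double
  // Works for every implementation: walks elements via the shared accessors.
  def toDoubleArray(): Array[Double] =
    Array.tabulate(numElements())(i => getDouble(i))
}

// What a UDT's own serialize() typically produces: an object-backed array.
class GenericArrayData(values: Array[Double]) extends ArrayData {
  def numElements(): Int = values.length
  def getDouble(i: Int): Double = values(i)
  def array: Array[Double] = values // accessor only this representation has
}

// Stand-in for UnsafeArrayData: elements live in a flat byte buffer, and
// object-style accessors such as `array` are simply not available.
class UnsafeArrayData(bytes: Array[Byte]) extends ArrayData {
  private val buf = java.nio.ByteBuffer.wrap(bytes)
  def numElements(): Int = bytes.length / 8
  def getDouble(i: Int): Double = buf.getDouble(i * 8)
}

object UdtSketch {
  // Fragile: assumes the concrete class produced by serialize(), so it
  // fails when the engine hands back the unsafe representation instead.
  def fragileDeserialize(datum: ArrayData): Array[Double] = datum match {
    case g: GenericArrayData => g.array
    case _ =>
      throw new UnsupportedOperationException("Not supported on UnsafeArrayData.")
  }

  // Robust: uses only the generic ArrayData accessors, so the result is the
  // same whichever representation arrives after the UDAF completes.
  def robustDeserialize(datum: ArrayData): Array[Double] = datum.toDoubleArray()
}
```

In a real UDT the equivalent of {{robustDeserialize}} is to read elements inside {{deserialize}} through the {{ArrayData}} interface (e.g. {{toDoubleArray()}} or {{numElements()}}/{{getDouble(i)}}) instead of pattern-matching on the concrete array class returned by your own {{serialize}}.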