[ https://issues.apache.org/jira/browse/SPARK-22003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165727#comment-16165727 ]

Apache Spark commented on SPARK-22003:
--------------------------------------

User 'liufengdb' has created a pull request for this issue:
https://github.com/apache/spark/pull/19230

> vectorized reader does not work with UDF when the column is array
> -----------------------------------------------------------------
>
>                 Key: SPARK-22003
>                 URL: https://issues.apache.org/jira/browse/SPARK-22003
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Feng Liu
>
> The UDF needs to deserialize the UnsafeRow. When the column type is Array,
> the `get` method of the ColumnVector, which is used by the vectorized
> reader, is called, but this method is not implemented.
> Code to reproduce the issue:
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.functions._
>
> // Write a Parquet file with a single array column, "choices".
> val fileName = "testfile"
> val str = """{ "choices": ["key1", "key2", "key3"] }"""
> val rdd = sc.parallelize(Seq(str))
> val df = spark.read.json(rdd)
> df.write.mode("overwrite").parquet(s"file:///tmp/$fileName")
>
> // Register a UDF that takes the array column and always returns None.
> spark.udf.register("acf", (rows: Seq[Row]) => Option[String](null))
>
> // Reading the file back and applying the UDF triggers the failure.
> spark.read.parquet(s"file:///tmp/$fileName").select(expr("""acf(choices)""")).show
> {code}
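
Since the issue ties the failure to the ColumnVector used by the vectorized reader, one possible workaround is to turn that reader off for the session so the Parquet scan uses the row-based reader instead. The issue itself does not suggest this, and the sketch below is untested. It assumes a spark-shell session where `spark` is already defined, as in the reproduction; whether it avoids the failure on 2.2.0 is an assumption, not something the issue confirms.

```scala
// Assumption: run in the same spark-shell session as the reproduction,
// before re-running its final spark.read.parquet(...) line.

// Disable the vectorized Parquet reader for this session only.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

// Confirm the setting took effect before retrying the failing query.
println(spark.conf.get("spark.sql.parquet.enableVectorizedReader"))
```

Setting the same key back to "true" restores the default behaviour once a fixed build is in use.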