Neil McQuarrie created SPARK-21727:
--------------------------------------

             Summary: Operating on an ArrayType in a SparkR DataFrame throws error
                 Key: SPARK-21727
                 URL: https://issues.apache.org/jira/browse/SPARK-21727
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.2.0
            Reporter: Neil McQuarrie

Previously [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] this as a Stack Overflow question, but it seems to be a bug.

If I have an R data.frame where one of the columns is a list column, i.e. each element of the column holds an entire numeric vector, then I can convert the data.frame to a SparkR DataFrame just fine; SparkR treats the column as ArrayType(Double). However, any subsequent operation on this DataFrame appears to throw an error.

Create an example R data.frame:
{code}
indices <- 1:4
myDf <- data.frame(indices)
myDf$data <- list(rep(0, 20))
{code}

Convert it to a SparkR DataFrame:
{code}
library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
sparkR.session(master = "local[*]")
mySparkDf <- as.DataFrame(myDf)
{code}

Examine the DataFrame schema; the list column was successfully converted to ArrayType:
{code}
> schema(mySparkDf)
StructType
|-name = "indices", type = "IntegerType", nullable = TRUE
|-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
{code}

However, operating on the SparkR DataFrame throws an error:
{code}
> collect(mySparkDf)
17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException:
java.lang.Double is not a valid external type for schema of array<double>
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null
else validateexternaltype(getexternalrowfield(assertnotnull(input[0,
org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
... long stack trace ...
{code}

Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.
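
For anyone reproducing this, it may help to confirm what the list column actually holds before it is handed to SparkR. The following is only a minimal base-R sketch (no Spark session needed); the helper names `elementClasses` and `elementLengths` are illustrative and not part of the original report.

```r
# Rebuild the example data.frame from the report.
indices <- 1:4
myDf <- data.frame(indices)
# A length-1 list is recycled across the 4 rows, so every row
# ends up holding the same 20-element numeric vector.
myDf$data <- list(rep(0, 20))

# Inspect the column: it is a plain R list with one entry per row.
elementClasses <- vapply(myDf$data, class, character(1))
elementLengths <- vapply(myDf$data, length, integer(1))

print(class(myDf$data))   # class of the column itself
print(elementClasses)     # class of each row's element
print(elementLengths)     # number of values in each row's element
```

Since rep(0, 20) produces doubles rather than integers, each element is a numeric vector, which is consistent with the ArrayType(DoubleType,true) shown in the schema.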