[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151437#comment-16151437 ]
Felix Cheung commented on SPARK-21727:
--------------------------------------

Hmm... I think that's exactly what the error message is saying:

{code}
java.lang.Double is not a valid external type for schema of array<double>
{code}

It is finding a plain double where the schema expects an array of doubles.

> Operating on an ArrayType in a SparkR DataFrame throws error
> ------------------------------------------------------------
>
>                 Key: SPARK-21727
>                 URL: https://issues.apache.org/jira/browse/SPARK-21727
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Neil McQuarrie
>
> Previously [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] this as a Stack Overflow question, but it appears to be a bug.
>
> If I have an R data.frame where one of the columns is a numeric *list* -- i.e., each element of the column embeds an entire R list of numbers -- then I can convert this data.frame to a SparkR DataFrame just fine: SparkR treats the column as ArrayType(Double).
>
> However, any subsequent operation on this SparkR DataFrame appears to throw an error.
>
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
>
> Examine it to make sure it looks okay:
> {code}
> > str(myDf)
> 'data.frame':   4 obs. of  2 variables:
>  $ indices: int  1 2 3 4
>  $ data   :List of 4
>   ..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>   ..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>   ..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>   ..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>
> > head(myDf)
>   indices                                                        data
> 1       1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> 2       2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> 3       3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> 4       4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
>
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc = paste0(Sys.getenv("SPARK_HOME"), "/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
>
> Examine the SparkR DataFrame schema; notice that the list column was successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
>
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException:
> java.lang.Double is not a valid external type for schema of array<double>
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0,
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
>
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.
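As a stopgap while this is open, a minimal workaround sketch (my own suggestion, not from the reporter or this thread, and not verified on this exact setup): since the failure is on the R-to-JVM serialization of the list column, build the array column on the JVM side with a Spark SQL expression instead. This assumes only the standard SparkR expr() and withColumn() APIs and Spark SQL's array() function:

{code}
library(SparkR, lib.loc = paste0(Sys.getenv("SPARK_HOME"), "/R/lib"))
sparkR.session(master = "local[*]")

# Serialize only the flat integer column; the R list column is what
# trips the R-to-JVM SerDe, so leave it out of as.DataFrame().
sdf <- as.DataFrame(data.frame(indices = 1:4))

# Build the array<double> column with a Spark SQL expression instead,
# so the array is constructed on the JVM side and never crosses the
# R serialization path that produces the error above.
sdf <- withColumn(sdf, "data", expr("array(0.0, 0.0, 0.0)"))

head(collect(sdf))  # should now return each row with a 3-element list
{code}

This only helps when the array contents can be expressed or computed in Spark SQL; the underlying SerDe issue still needs a fix.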