Nic Crane created SPARK-35814:
---------------------------------

             Summary: Mismatched types when creating a new column when using Arrow
                 Key: SPARK-35814
                 URL: https://issues.apache.org/jira/browse/SPARK-35814
             Project: Spark
          Issue Type: Bug
          Components: R
    Affects Versions: 3.1.2
            Reporter: Nic Crane
I was looking into a question on StackOverflow where a user hit an error caused by a mismatch of field types (https://stackoverflow.com/questions/68032084/how-to-fix-error-in-readbin-using-arrow-package-with-sparkr/). I've made a reproducible example below.

{code:java}
library(SparkR)
library(arrow)

SparkR::sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))

readr::write_csv(iris, "iris.csv")

dfSchema <- structType(
  structField("Sepal.Length", "int"),
  structField("Sepal.Width", "double"),
  structField("Petal.Length", "double"),
  structField("Petal.Width", "double"),
  structField("Species", "string")
)

spark_df <- SparkR::read.df(path = "iris.csv", source = "csv", schema = dfSchema)

# Apply an R native function to each partition.
returnSchema <- structType(structField("Sep_add_one", "int"))
df <- SparkR::dapply(spark_df, function(rdf) {
  data.frame(rdf$Sepal.Length + 1)
}, returnSchema)
collect(df)
{code}

It results in this error:

{code:java}
Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
  invalid 'n' argument
{code}

I've never used SparkR before, so I could be wrong, but I *think* it may be due to this line in DataFrame.R:

{code:java}
arrowTable <- arrow::read_ipc_stream(readRaw(conn))
{code}

readRaw assumes that the value is an integer; however, the function passed to dapply returns a double even though returnSchema declares an int (the code runs without error if it's updated to `rdf$Sepal.Length + 1L`; both variants of the fix are sketched at the end of this description).

I wasn't sure whether readTypedObject needs to be used instead, or whether some type checking might be useful - a rough sketch of what I mean is below. Apologies for the partial explanation - I'm not entirely sure of the cause.
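For reference, here is a minimal sketch of the two ways the repro can be made to run. The first is the `1L` change confirmed above; the second (declaring the return column as "double" instead) is my untested assumption of the equivalent fix from the other direction:

{code:java}
# Fix 1 (confirmed above): keep the UDF's output an integer so it matches
# the declared "int" return schema
returnSchema <- structType(structField("Sep_add_one", "int"))
df <- SparkR::dapply(spark_df, function(rdf) {
  data.frame(rdf$Sepal.Length + 1L)  # 1L keeps the column integer
}, returnSchema)
collect(df)

# Fix 2 (untested assumption): declare the column as "double" so the schema
# matches what the UDF actually returns
returnSchema <- structType(structField("Sep_add_one", "double"))
df <- SparkR::dapply(spark_df, function(rdf) {
  data.frame(rdf$Sepal.Length + 1)   # plain 1 promotes the result to double
}, returnSchema)
collect(df)
{code}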
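On the type-checking idea, something along these lines is what I had in mind - purely illustrative base R, not actual SparkR internals; `checkReturnTypes`, the `declaredTypes` argument, and the R-to-Spark type mapping are all assumptions on my part:

{code:java}
# Hypothetical sketch: compare the R column classes of the UDF's result
# against the declared Spark SQL types before serializing, so a mismatch
# fails with a clear message instead of the readBin() error.
checkReturnTypes <- function(rdf, declaredTypes) {
  # assumed mapping from R column classes to Spark SQL type names;
  # classes not listed here are silently skipped in this sketch
  rToSpark <- c(integer = "int", numeric = "double",
                character = "string", logical = "boolean")
  actual <- vapply(rdf, function(col) class(col)[1], character(1))
  actualSpark <- unname(rToSpark[actual])
  bad <- which(actualSpark != declaredTypes)
  if (length(bad) > 0) {
    stop(sprintf("column %d: R result is '%s' but the schema declares '%s'",
                 bad[1], actual[bad[1]], declaredTypes[bad[1]]))
  }
  invisible(TRUE)
}

# With the repro above, this fails fast:
# checkReturnTypes(data.frame(Sep_add_one = c(5.1, 4.9) + 1), "int")
{code}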