Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15421

Thanks @wangmiao1981 - There are two different kinds of serialization that happen in SparkR. One is the RPC-style serialization, where function arguments are serialized using `writeDate`, `writeInt`, etc. The other is batch (bulk) serialization, which we use when converting an R `data.frame` to a Spark RDD; this is the path taken by `createDataFrame` at [1].

The way this is supposed to work is that the bulk data is converted by the calls to `lapply` and `getJRDD` [2] into a row-wise serialized `SparkDataFrame`. To do this, the executor side calls `unserialize` on the bulk data [3] and then `writeRowSerialize` for each row [4]. So the final byte stream to look at is the one produced at [4]. But my guess is that things are going wrong somewhere before this -- i.e. the byte stream at [3], for example, has some different type or something like that. Or to put it another way: are we sure `writeString` was called with `NA`, or was it some other function like `writeBin` because the types were wrong? The other possible reason for such a transient bug is that the channels are not getting flushed somewhere, and this just doesn't show up on some R versions. But yeah, your debugging methods are in line with what I would try.

[1] https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/R/context.R#L140
[2] https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/R/SQLContext.R#L275
[3] https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/inst/worker/worker.R#L159
[4] https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/inst/worker/worker.R#L78
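As a point of comparison for the bulk step, here is a minimal sketch in plain base R (not SparkR's own `writeString`/`writeBin` writers) showing that R's native `serialize()`/`unserialize()` round-trips an `NA` in a character column faithfully. If the bytes arriving at the worker were produced this way, the `NA` itself should survive, which would point the finger at the row-wise writers rather than the bulk step:

```r
# Hypothetical repro, independent of SparkR: a data.frame with an NA string,
# pushed through base R's bulk serialization and back.
df <- data.frame(x = c("hello", NA_character_), stringsAsFactors = FALSE)

con <- rawConnection(raw(0), "wb")   # in-memory binary connection
serialize(df, con)                   # bulk serialization to a byte stream
bytes <- rawConnectionValue(con)
close(con)

restored <- unserialize(bytes)       # what the executor-side worker would see
print(is.na(restored$x[2]))          # the NA survives the round trip
```

If this round trip is clean locally but the executor still sees corruption, that narrows the suspect region to the row-wise re-serialization at [4] or to an unflushed channel in between.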