Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Well, technically yes. But I would say it's a little more than that. The current approach to deserialization of `Seq`s is to copy the data into an array, construct a `WrappedArray` (which extends `Seq`) and optionally copy the data (again) into the new collection. This needs to go through all the elements twice if anything other than a `Seq` (or `WrappedArray` directly) is requested. This PR takes a more straightforward approach and constructs a mutable collection builder (which is defined for every Scala collection), adds the elements to it and retrieves the result. I assumed this would be a performance improvement and am quite surprised that there is no difference. But I think this might be due to the improvement being so small as to drown in the usual operation overhead. Sadly, I do not have the resources to measure the operation on larger amounts of data, having the benchmarks fail on larger collections on my setup. I am also not that familiar with Spark's internals to determine whether @kiszk's suggestion would improve operations in other ways, eg. during operations in the cluster environment. That is why I decided to implement it but keep it separate from my proposed changes for the time being. As to other benefits that this PR would bring other than peformance improvements: * I would like to implement Java `List`s support in a manner similar to what I did with `Map`s in #16986 (I would also ask you to take a look at that when you have the time) by just slightly altering the code. * I didn't know this when I implemented this but Scala 2.13 will introduce a major [collections rework](https://www.scala-lang.org/blog/2017/02/28/collections-rework.html) that will change the way the `to` method (used for conversions in the current solution) works. This will require a rewrite whereas I believe the method used here will remain largely the same.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org