Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118

I just did a performance test based on our 2.1.1 branch and a real table. I tested a full scan of an hour of data with a single data filter. The scan had 13,083 tasks and read 1,084.8 GB. I used identical Spark applications with 100 executors, each with 1 core and 6 GB of memory.

* **With projection to UnsafeRow**: wall time: 12 m, total task time: 19 h, longest task: 51 s.
* **Without projection, using InternalRow**: wall time: 11 m, total task time: 17.8 h, longest task: 26 s.

Clearly, this is not a benchmark. But it shows a roughly 6% reduction in total task time (19 h down to 17.8 h) from not making unnecessary copies. Eliminating copies is a fairly easy way to get better performance, if we can update a few operators to work with both `InternalRow` and `UnsafeRow`.
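For readers outside this thread, here is a minimal sketch of the two code paths being compared, using Catalyst's `UnsafeProjection` API. The method names `projectToUnsafe` and `passThrough` are illustrative, not taken from the PR; real physical plans bind the projection differently.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection, UnsafeRow}

// The extra copy under discussion: projecting every InternalRow produced by a
// scan into an UnsafeRow before handing it to downstream operators. The
// projection writes each field of the input row into unsafe memory, so this
// costs one copy per row. (Note: UnsafeProjection reuses its result buffer
// across calls, so consumers that hold onto rows must copy() them.)
def projectToUnsafe(rows: Iterator[InternalRow], output: Seq[Attribute]): Iterator[UnsafeRow] = {
  val toUnsafe = UnsafeProjection.create(output, output)
  rows.map(row => toUnsafe(row))
}

// Skipping the projection: downstream operators that accept any InternalRow
// can consume the scan's rows directly, with no per-row copy.
def passThrough(rows: Iterator[InternalRow]): Iterator[InternalRow] = rows
```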