GitHub user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22222

@cloud-fan @rdblue I want to leave some comments and thoughts from looking into this again; I hope they help us decide on the next step.

Currently every plan assumes its input is `RDD[InternalRow]`, and the whole framework treats columnar read as a special case. Also, `inputRDDs` is called not only in `WholeStageCodegenExec` but also by every parent physical node, so it is very easy to get into a mess with nested plans while debugging this fix.

So we may have these 3 choices. The first two remove the cast entirely but may need many changes to `CodegenSupport`; the last one limits the changes but still has the cast problem:

1. Erase the type of `inputRDDs`, because we should allow both `RDD[InternalRow]` and `RDD[ColumnarBatch]` to be passed, mainly for the case where a parent physical plan calls its child. This is implemented as the last commit in this PR: https://github.com/apache/spark/pull/22222/files
2. Refactor the framework so that all plans deal with columnar batches.
3. Limit the changes to `ColumnarBatchScan` and don't change `CodegenSupport`, which still leaves the cast problem. This is implemented as the first two commits in this PR: https://github.com/apache/spark/pull/22222/files/7e88599dfc2caf177d12e890d588be68bdd3bc8e

If none of these make sense, I'll just close this. Thanks.
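
To make the trade-off between choices 1 and 3 concrete, here is a minimal standalone sketch. It does not use Spark at all: `Row`, `Batch`, `RowTypedPlan`, `ErasedPlan`, and `consume` are hypothetical stand-ins invented for illustration (for `InternalRow`, `ColumnarBatch`, and a plan exposing its input), and plain `Seq` stands in for `RDD`. It is not the code in this PR.

```scala
object InputTypeSketch {
  // Hypothetical stand-ins; these are NOT Spark's real classes.
  final case class Row(values: Vector[Any])               // plays the role of InternalRow
  final case class Batch(columns: Vector[Vector[Any]]) {  // plays the role of ColumnarBatch
    def numRows: Int = if (columns.isEmpty) 0 else columns.head.length
    // Turn the column-major batch into row-major rows.
    def rows: Iterator[Row] =
      Iterator.range(0, numRows).map(i => Row(columns.map(_(i))))
  }

  private val sample = Batch(Vector(Vector(1, 2), Vector("a", "b")))

  // Choice 3 style: the signature promises rows, so a columnar producer
  // has to hide its batches behind an unchecked cast.
  trait RowTypedPlan { def input: Seq[Row] }
  object CastingScan extends RowTypedPlan {
    // Succeeds at runtime only because the element type is erased;
    // the elements are still Batch instances.
    def input: Seq[Row] = Seq(sample).asInstanceOf[Seq[Row]]
  }

  // Choice 1 style: the element type is erased in the signature itself,
  // and the consumer decides what to do based on the runtime class.
  trait ErasedPlan { def input: Seq[Any] }
  object ErasedScan extends ErasedPlan {
    def input: Seq[Any] = Seq(sample)
  }

  def consume(plan: ErasedPlan): Seq[Row] = plan.input.flatMap {
    case r: Row   => Iterator.single(r)
    case b: Batch => b.rows
    case other    =>
      throw new IllegalStateException(s"unexpected input element: $other")
  }

  def main(args: Array[String]): Unit = {
    consume(ErasedScan).foreach(println)
    // With the cast-based variant, the mismatch only surfaces when a
    // consumer actually treats an element as a Row.
    try { val r: Row = CastingScan.input.head; println(r) }
    catch { case _: ClassCastException => println("cast problem surfaced") }
  }
}
```

In this sketch the cast-based variant compiles and keeps the signature unchanged, but the type it advertises is not what it returns; the erased variant is honest about that at the cost of touching every caller's signature, which is the kind of wider change choice 1 implies for `CodegenSupport`.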