[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

cloud-fan Wed, 02 May 2018 18:52:31 -0700

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21118
  
    A Parquet scan doesn't need `UnsafeRow` because it outputs `ColumnarBatch`. 
Note that `UnsafeRow` is the data format Spark uses to exchange data between 
operators, but whole-stage codegen can merge several operators into one. So in 
your case, a Parquet scan followed by simple operators such as filter and 
project is still compiled as one operator, so you won't see `UnsafeRow`.
    
    I believe if you add an aggregate, you will see `UnsafeRow` in the final 
aggregate's generated code, where the input rows come from the 
`ShuffleExchangeExec` operator.
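
    One way to check this is to dump the generated code for both plans and 
compare them. The snippet below is only a sketch: it assumes Spark SQL is on 
the classpath, and the object name, the local master setting, and the 
`/tmp/codegen_inspection_parquet` path are made up for illustration. It uses 
`explain()` and the `debugCodegen()` helper from 
`org.apache.spark.sql.execution.debug`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

// Hypothetical driver object, for illustration only.
object CodegenInspection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("codegen-inspection")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical scratch location; write a small Parquet dataset to scan.
    val path = "/tmp/codegen_inspection_parquet"
    spark.range(0, 1000)
      .selectExpr("id", "id % 10 AS key")
      .write.mode("overwrite").parquet(path)

    // Case 1: Parquet scan followed by filter and project.
    val scanOnly = spark.read.parquet(path).filter($"id" > 10).select($"key")
    scanOnly.explain()       // physical plan, codegen stages are marked
    scanOnly.debugCodegen()  // prints the generated Java source per stage

    // Case 2: add an aggregate, which introduces a shuffle exchange
    // between the partial and the final aggregate.
    val withAgg = spark.read.parquet(path).groupBy($"key").count()
    withAgg.explain()
    withAgg.debugCodegen()

    spark.stop()
  }
}
```

    If the explanation above holds, searching the printed source for 
`UnsafeRow` should show the difference between the two plans.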


