Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

Xiangrui Meng Mon, 13 May 2019 08:49:40 -0700

My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't feel
strongly about it. I would still suggest doing the following:


1. Link the POC mentioned in Q4 so people can verify the POC result.
2. List the public APIs we plan to expose in Appendix A. I did a quick check.
Besides ColumnarBatch and ColumnVector, we also need to make the following
public. People who are familiar with SQL internals should help assess the
risk.
* ColumnarArray
* ColumnarMap
* unsafe.types.CalendarInterval
* ColumnarRow
* UTF8String
* ArrayData
* ...
3. I still feel that using Pandas UDF as the mid-term success criterion
doesn't match the purpose of this SPIP. It does make some code cleaner, but
I suspect it won't bring much value for ETL use cases.
