Same, I meant to catch up after KubeCon but had some unexpected travel.

On Sat, May 25, 2019 at 10:56 PM Reynold Xin <r...@databricks.com> wrote:
> Can we push this to June 1st? I have been meaning to read it but
> unfortunately keep traveling...
>
> On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> +1
>>
>> Thanks,
>> Dongjoon.
>>
>> On Fri, May 24, 2019 at 17:03 DB Tsai <dbt...@dbtsai.com.invalid> wrote:
>>
>>> +1 on exposing the APIs for columnar processing support.
>>>
>>> I understand that the scope of this SPIP doesn't cover AI / ML
>>> use-cases. But I saw a good performance gain when I converted data
>>> from rows to columns to leverage SIMD architectures in a POC ML
>>> application.
>>>
>>> With the exposed columnar processing support, I can imagine that the
>>> heavy lifting parts of ML applications (such as computing the
>>> objective functions) can be written as columnar expressions that
>>> leverage SIMD architectures to get a good speedup.
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 42E5B25A8F7A82C1
>>>
>>> On Wed, May 15, 2019 at 2:59 PM Bobby Evans <reva...@gmail.com> wrote:
>>> >
>>> > It would allow for the columnar processing to be extended through the
>>> > shuffle. So if I were doing, say, an FPGA-accelerated extension, it could
>>> > replace the ShuffleExchangeExec with one that can take a ColumnarBatch as
>>> > input instead of a Row. The extended version of the ShuffleExchangeExec
>>> > could then do the partitioning on the incoming batch, and instead of
>>> > producing a ShuffledRowRDD for the exchange it could produce something
>>> > like a ShuffleBatchRDD that would let the serializing and deserializing
>>> > happen in a column-based format for a faster exchange, assuming that
>>> > columnar processing is also happening after the exchange. This is just like
>>> > providing a columnar version of any other Catalyst operator, except in this
>>> > case it is a bit more complex of an operator.
>>> >
>>> > On Wed, May 15, 2019 at 12:15 PM Imran Rashid
>>> > <iras...@cloudera.com.invalid> wrote:
>>> >>
>>> >> Sorry I am late to the discussion here -- the JIRA mentions using
>>> >> these extensions for dealing with shuffles; can you explain that part? I
>>> >> don't see how you would use this to change shuffle behavior at all.
>>> >>
>>> >> On Tue, May 14, 2019 at 10:59 AM Thomas graves <tgra...@apache.org>
>>> >> wrote:
>>> >>>
>>> >>> Thanks for replying, I'll extend the vote until May 26th to allow
>>> >>> feedback from you and other people who haven't had time to look at it.
>>> >>>
>>> >>> Tom
>>> >>>
>>> >>> On Mon, May 13, 2019 at 4:43 PM Holden Karau <hol...@pigscanfly.ca>
>>> >>> wrote:
>>> >>> >
>>> >>> > I’d like to ask for this vote period to be extended. I’m interested
>>> >>> > but I don’t have the cycles to review it in detail and make an informed
>>> >>> > vote until the 25th.
>>> >>> >
>>> >>> > On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <m...@databricks.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> My vote is 0. Since the updated SPIP focuses on ETL use cases, I
>>> >>> >> don't feel strongly about it. I would still suggest doing the following:
>>> >>> >>
>>> >>> >> 1. Link the POC mentioned in Q4, so people can verify the POC
>>> >>> >> result.
>>> >>> >> 2. List the public APIs we plan to expose in Appendix A. I did a
>>> >>> >> quick check. Besides ColumnarBatch and ColumnVector, we also need to make
>>> >>> >> the following public. People who are familiar with SQL internals should
>>> >>> >> help assess the risk.
>>> >>> >> * ColumnarArray
>>> >>> >> * ColumnarMap
>>> >>> >> * unsafe.types.CalendarInterval
>>> >>> >> * ColumnarRow
>>> >>> >> * UTF8String
>>> >>> >> * ArrayData
>>> >>> >> * ...
>>> >>> >> 3. I still feel using Pandas UDF as the mid-term success doesn't
>>> >>> >> match the purpose of this SPIP. It does make some code cleaner. But I guess
>>> >>> >> for ETL use cases, it won't bring much value.
>>> >>> >>
>>> >>> > --
>>> >>> > Twitter: https://twitter.com/holdenkarau
>>> >>> > Books (Learning Spark, High Performance Spark, etc.):
>>> >>> > https://amzn.to/2MaRAG9
>>> >>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >>>
>>>

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
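
For anyone skimming the thread: the idea in Bobby Evans's reply, partitioning an incoming batch and keeping the data columnar across the exchange, can be sketched roughly as below. This is only an illustration in plain Python and not Spark code. The `Batch` type, `partition_batch`, and the modulo partitioner are all hypothetical names invented for this sketch; none of them come from the SPIP or from Spark's ColumnarBatch API.

```python
from typing import Dict, List

# Hypothetical stand-in for a columnar batch: column name -> list of values.
# This is NOT Spark's ColumnarBatch; it only mimics the column-wise layout.
Batch = Dict[str, List[int]]


def partition_batch(batch: Batch, key: str, num_partitions: int) -> List[Batch]:
    """Split one columnar batch into one columnar batch per partition."""
    # Step 1: decide the target partition of every row by scanning only the
    # key column (a simple modulo partitioner, assumed for illustration).
    targets = [k % num_partitions for k in batch[key]]

    # Step 2: gather each column into the per-partition outputs, so the data
    # never has to be turned into individual rows along the way.
    out: List[Batch] = [{name: [] for name in batch} for _ in range(num_partitions)]
    for name, column in batch.items():
        for value, target in zip(column, targets):
            out[target][name].append(value)
    return out


batch = {"id": [1, 2, 3, 4, 5], "amount": [10, 20, 30, 40, 50]}
for index, part in enumerate(partition_batch(batch, key="id", num_partitions=2)):
    print(index, part)
```

Each output is still a set of columns, which is what would let a batch-oriented exchange serialize and deserialize column by column instead of row by row, provided the operators after the exchange are columnar as well.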