Can we push this to June 1st? I have been meaning to read it, but unfortunately I keep traveling...
On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> +1
>
> Thanks,
> Dongjoon.
>
> On Fri, May 24, 2019 at 17:03 DB Tsai <dbt...@dbtsai.com.invalid> wrote:
>
>> +1 on exposing the APIs for columnar processing support.
>>
>> I understand that the scope of this SPIP doesn't cover AI/ML use
>> cases, but I saw a good performance gain when I converted data from
>> rows to columns to leverage SIMD architectures in a POC ML
>> application.
>>
>> With the exposed columnar processing support, I can imagine the
>> heavy-lifting parts of ML applications (such as computing the
>> objective functions) being written as columnar expressions that
>> leverage SIMD architectures to get a good speedup.
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Web: https://www.dbtsai.com
>> PGP Key ID: 42E5B25A8F7A82C1
>>
>> On Wed, May 15, 2019 at 2:59 PM Bobby Evans <reva...@gmail.com> wrote:
>>>
>>> It would allow the columnar processing to be extended through the
>>> shuffle. If I were doing, say, an FPGA-accelerated extension, it
>>> could replace ShuffleExchangeExec with one that can take a
>>> ColumnarBatch as input instead of a Row. The extended version of
>>> ShuffleExchangeExec could then do the partitioning on the incoming
>>> batch and, instead of producing a ShuffleRowRDD for the exchange,
>>> produce something like a ShuffleBatchRDD that would let the
>>> serializing and deserializing happen in a column-based format for a
>>> faster exchange, assuming that columnar processing is also happening
>>> after the exchange. This is just like providing a columnar version
>>> of any other Catalyst operator, except that in this case it is a
>>> somewhat more complex operator.
>>>
>>> On Wed, May 15, 2019 at 12:15 PM Imran Rashid
>>> <iras...@cloudera.com.invalid> wrote:
>>>>
>>>> Sorry I am late to the discussion here -- the JIRA mentions using
>>>> these extensions for dealing with shuffles. Can you explain that
>>>> part? I don't see how you would use this to change shuffle behavior
>>>> at all.
>>>>
>>>> On Tue, May 14, 2019 at 10:59 AM Thomas Graves <tgra...@apache.org>
>>>> wrote:
>>>>>
>>>>> Thanks for replying. I'll extend the vote until May 26th to allow
>>>>> for your feedback and that of other people who haven't had time to
>>>>> look at it.
>>>>>
>>>>> Tom
>>>>>
>>>>> On Mon, May 13, 2019 at 4:43 PM Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>>
>>>>>> I'd like to ask for this vote period to be extended. I'm
>>>>>> interested, but I don't have the cycles to review it in detail
>>>>>> and make an informed vote until the 25th.
>>>>>>
>>>>>> On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <m...@databricks.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> My vote is 0. Since the updated SPIP focuses on ETL use cases, I
>>>>>>> don't feel strongly about it. I would still suggest doing the
>>>>>>> following:
>>>>>>>
>>>>>>> 1. Link the POC mentioned in Q4 so people can verify the POC
>>>>>>> result.
>>>>>>> 2. List the public APIs we plan to expose in Appendix A. I did a
>>>>>>> quick check: besides ColumnarBatch and ColumnarVector, we also
>>>>>>> need to make the following public. People who are familiar with
>>>>>>> SQL internals should help assess the risk.
>>>>>>>   * ColumnarArray
>>>>>>>   * ColumnarMap
>>>>>>>   * unsafe.types.CalendarInterval
>>>>>>>   * ColumnarRow
>>>>>>>   * UTF8String
>>>>>>>   * ArrayData
>>>>>>>   * ...
>>>>>>> 3. I still feel that using Pandas UDFs as the mid-term success
>>>>>>> criterion doesn't match the purpose of this SPIP. It does make
>>>>>>> some code cleaner, but I guess for ETL use cases it won't bring
>>>>>>> much value.
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
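For anyone trying to picture Bobby's shuffle example above, here is a
rough Scala sketch of the kind of replacement an extension could make.
ColumnarShuffleExchangeExec and its batch-shuffle body are hypothetical
illustrations, not something the SPIP itself specifies; ShuffleExchangeExec
and ColumnarBatch are existing Spark classes, and the supportsColumnar /
doExecuteColumnar hooks assume the proposed API lands roughly as described,
so treat this as a sketch rather than compilable code.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical exchange operator: partitions each incoming ColumnarBatch
// and shuffles it in a column-based serialization (Bobby's
// "ShuffleBatchRDD") instead of first converting the batch back to rows.
case class ColumnarShuffleExchangeExec(child: SparkPlan) extends SparkPlan {
  override def output: Seq[Attribute] = child.output
  override def children: Seq[SparkPlan] = Seq(child)
  override def supportsColumnar: Boolean = true

  override protected def doExecute(): RDD[InternalRow] =
    throw new UnsupportedOperationException("This operator is columnar-only.")

  override protected def doExecuteColumnar(): RDD[ColumnarBatch] = {
    // Partition child.executeColumnar() by the exchange's partitioning and
    // return a batch-based shuffle RDD; the actual exchange is elided here.
    ???
  }
}

// A physical-plan rewrite an extension could inject: swap the row-based
// exchange for the columnar one whenever the child can produce batches.
object ReplaceExchangeWithColumnar extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
    case e: ShuffleExchangeExec if e.child.supportsColumnar =>
      ColumnarShuffleExchangeExec(e.child) // partitioning wiring elided
  }
}

The point of the sketch is just Bobby's observation: because the exchange
is itself a Catalyst physical operator, an extension can swap it out the
same way it would swap any other operator, keeping data in batches across
the shuffle.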