Ok, I'm going to call this vote and send the result email. We had 9 +1's (4 binding) and 1 +0 and no -1's.

Tom

On Monday, May 27, 2019, 3:25:14 PM CDT, Felix Cheung <felixcheun...@hotmail.com> wrote:

+1

I’d prefer to see more of the end goal and how that could be achieved (such as ETL or SPARK-24579). However given the rounds and months of discussions we have come down to just the public API. If the community thinks a new set of public API is maintainable, I don’t see any problem with that.

From: Tom Graves <tgraves...@yahoo.com.INVALID>
Sent: Sunday, May 26, 2019 8:22:59 AM
To: hol...@pigscanfly.ca; Reynold Xin
Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid; Jason Lowe; Matei Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev
Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

More feedback would be great, this has been open a long time though, let's extend til Wednesday the 29th and see where we are at.

Tom
Sent from Yahoo Mail on Android

On Sat, May 25, 2019 at 6:28 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

Same, I meant to catch up after KubeCon but had some unexpected travels.

On Sat, May 25, 2019 at 10:56 PM Reynold Xin <r...@databricks.com> wrote:

Can we push this to June 1st? I have been meaning to read it but unfortunately keep traveling...

On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

+1

Thanks,
Dongjoon.

On Fri, May 24, 2019 at 17:03 DB Tsai <dbt...@dbtsai.com.invalid> wrote:

+1 on exposing the APIs for columnar processing support.

I understand that the scope of this SPIP doesn't cover AI / ML use-cases. But I saw a good performance gain when I converted data from rows to columns to leverage SIMD architectures in a POC ML application.

With the exposed columnar processing support, I can imagine that the heavy lifting parts of ML applications (such as computing the objective functions) can be written as columnar expressions that leverage SIMD architectures to get a good speedup.

Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

On Wed, May 15, 2019 at 2:59 PM Bobby Evans <reva...@gmail.com> wrote:
>
> It would allow for the columnar processing to be extended through the
> shuffle. So if I were doing say an FPGA accelerated extension it could
> replace the ShuffleExchangeExec with one that can take a ColumnarBatch as
> input instead of a Row. The extended version of the ShuffleExchangeExec could
> then do the partitioning on the incoming batch and instead of producing a
> ShuffledRowRDD for the exchange they could produce something like a
> ShuffleBatchRDD that would let the serializing and deserializing happen in a
> column based format for a faster exchange, assuming that columnar processing
> is also happening after the exchange.
> This is just like providing a columnar version of any other Catalyst
> operator, except in this case it is a bit more complex of an operator.
>
> On Wed, May 15, 2019 at 12:15 PM Imran Rashid <iras...@cloudera.com.invalid>
> wrote:
>>
>> Sorry I am late to the discussion here -- the JIRA mentions using these
>> extensions for dealing with shuffles, can you explain that part? I don't
>> see how you would use this to change shuffle behavior at all.
>>
>> On Tue, May 14, 2019 at 10:59 AM Thomas graves <tgra...@apache.org> wrote:
>>>
>>> Thanks for replying, I'll extend the vote until May 26th to allow feedback
>>> from you and other people who haven't had time to look at it.
>>>
>>> Tom
>>>
>>> On Mon, May 13, 2019 at 4:43 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>> >
>>> > I’d like to ask this vote period to be extended, I’m interested but I
>>> > don’t have the cycles to review it in detail and make an informed vote
>>> > until the 25th.
>>> >
>>> > On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <m...@databricks.com> wrote:
>>> >>
>>> >> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't
>>> >> feel strongly about it. I would still suggest doing the following:
>>> >>
>>> >> 1. Link the POC mentioned in Q4. So people can verify the POC result.
>>> >> 2. List public APIs we plan to expose in Appendix A. I did a quick
>>> >> check. Beside ColumnarBatch and ColumnVector, we also need to make the
>>> >> following public. People who are familiar with SQL internals should help
>>> >> assess the risk.
>>> >> * ColumnarArray
>>> >> * ColumnarMap
>>> >> * unsafe.types.CalendarInterval
>>> >> * ColumnarRow
>>> >> * UTF8String
>>> >> * ArrayData
>>> >> * ...
>>> >> 3. I still feel using Pandas UDF as the mid-term success doesn't match
>>> >> the purpose of this SPIP. It does make some code cleaner. But I guess
>>> >> for ETL use cases, it won't bring much value.
>>> >>
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.):
>>> > https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org