Ok, I'm going to call this vote and send the result email. We had 9 +1's (4 
binding) and 1 +0 and no -1's.
Tom
    On Monday, May 27, 2019, 3:25:14 PM CDT, Felix Cheung 
<felixcheun...@hotmail.com> wrote:  
 
+1
I’d prefer to see more of the end goal and how that could be achieved (such as 
ETL or SPARK-24579). However given the rounds and months of discussions we have 
come down to just the public API.
If the community thinks a new set of public API is maintainable, I don’t see 
any problem with that.
From: Tom Graves <tgraves...@yahoo.com.INVALID>
Sent: Sunday, May 26, 2019 8:22:59 AM
To: hol...@pigscanfly.ca; Reynold Xin
Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid; Jason Lowe; Matei 
Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev
Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar 
Processing Support

More feedback would be great, this has been open a long time though, let's 
extend til Wednesday the 29th and see where we are at.
Tom




On Sat, May 25, 2019 at 6:28 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
Same, I meant to catch up after KubeCon but had some unexpected travels.
On Sat, May 25, 2019 at 10:56 PM Reynold Xin <r...@databricks.com> wrote:

Can we push this to June 1st? I have been meaning to read it but unfortunately 
I keep traveling...
On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

+1
Thanks,
Dongjoon.
On Fri, May 24, 2019 at 17:03 DB Tsai <dbt...@dbtsai.com.invalid> wrote:

+1 on exposing the APIs for columnar processing support.

I understand that the scope of this SPIP doesn't cover AI / ML
use-cases. But I saw a good performance gain when I converted data
from rows to columns to leverage SIMD architectures in a POC ML
application.

With the exposed columnar processing support, I can imagine that the
heavy lifting parts of ML applications (such as computing the
objective functions) can be written as columnar expressions that
leverage SIMD architectures to get a good speedup.

Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

On Wed, May 15, 2019 at 2:59 PM Bobby Evans <reva...@gmail.com> wrote:
>
> It would allow for the columnar processing to be extended through the 
> shuffle. So if I were doing, say, an FPGA-accelerated extension, it could 
> replace the ShuffleExchangeExec with one that can take a ColumnarBatch as 
> input instead of a Row. The extended version of the ShuffleExchangeExec could 
> then do the partitioning on the incoming batch, and instead of producing a 
> ShuffleRowRDD for the exchange it could produce something like a 
> ShuffleBatchRDD that would let the serializing and deserializing happen in a 
> column-based format for a faster exchange, assuming that columnar processing 
> is also happening after the exchange. This is just like providing a columnar 
> version of any other Catalyst operator, except in this case the operator is 
> a bit more complex.
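
As a rough illustration of the batch-level partitioning described above, here
is a toy sketch in plain Python. It does not use Spark or any of the classes
named in the message: the `Batch` type, the `partition_batch` function, and the
modulo partitioning rule are all hypothetical stand-ins chosen for the example.
It only shows the general idea of computing partition targets once per batch
and keeping each partition's output in column layout.

```python
from typing import Dict, List

# Hypothetical stand-in for a columnar batch: column name -> list of values.
# This is NOT Spark's ColumnarBatch, just a toy model of the layout.
Batch = Dict[str, List[int]]


def partition_batch(batch: Batch, key: str, num_partitions: int) -> List[Batch]:
    """Split one columnar batch into one columnar batch per partition."""
    # Work out every row's target partition from the key column alone.
    # Modulo on a non-negative integer key is an assumption for this sketch.
    targets = [k % num_partitions for k in batch[key]]

    # One empty output batch per partition, with the same columns as the input.
    out: List[Batch] = [{name: [] for name in batch} for _ in range(num_partitions)]

    # Scatter column by column, so each output batch stays in column layout.
    for name, values in batch.items():
        for target, value in zip(targets, values):
            out[target][name].append(value)
    return out


batch = {"id": [1, 2, 3, 4, 5], "amount": [10, 20, 30, 40, 50]}
parts = partition_batch(batch, key="id", num_partitions=2)
for index, part in enumerate(parts):
    print(index, part)
```

Each entry of `parts` could then be serialized one column at a time, which is
the part the message suggests could make the exchange faster when the operators
after the shuffle are also columnar.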
>
> On Wed, May 15, 2019 at 12:15 PM Imran Rashid <iras...@cloudera.com.invalid> 
> wrote:
>>
>> Sorry I am late to the discussion here -- the JIRA mentions using these 
>> extensions for dealing with shuffles, can you explain that part?  I don't 
>> see how you would use this to change shuffle behavior at all.
>>
>> On Tue, May 14, 2019 at 10:59 AM Thomas graves <tgra...@apache.org> wrote:
>>>
>>> Thanks for replying, I'll extend the vote til May 26th to allow feedback
>>> from you and other people who haven't had time to look at it.
>>>
>>> Tom
>>>
>>> On Mon, May 13, 2019 at 4:43 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>> >
>>> > I’d like to ask that this vote period be extended. I’m interested but I 
>>> > don’t have the cycles to review it in detail and make an informed vote 
>>> > until the 25th.
>>> >
>>> > On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <m...@databricks.com> wrote:
>>> >>
>>> >> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't 
>>> >> feel strongly about it. I would still suggest doing the following:
>>> >>
>>> >> 1. Link the POC mentioned in Q4. So people can verify the POC result.
>>> >> 2. List public APIs we plan to expose in Appendix A. I did a quick 
>>> >> check. Besides ColumnarBatch and ColumnarVector, we also need to make the 
>>> >> following public. People who are familiar with SQL internals should help 
>>> >> assess the risk.
>>> >> * ColumnarArray
>>> >> * ColumnarMap
>>> >> * unsafe.types.CalendarInterval
>>> >> * ColumnarRow
>>> >> * UTF8String
>>> >> * ArrayData
>>> >> * ...
>>> >> 3. I still feel using Pandas UDF as the mid-term success doesn't match 
>>> >> the purpose of this SPIP. It does make some code cleaner. But I guess 
>>> >> for ETL use cases, it won't bring much value.
>>> >>
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.): 
>>> > https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>




