Re: Thoughts on dataframe cogroup?

2019-04-23 Thread Bryan Cutler
Apologies for not leaving feedback yet. I'm a little swamped this week with the Spark Summit, but this is at the top of my list to get to for next week. Bryan On Thu, Apr 18, 2019 at 4:18 AM Chris Martin wrote: > Yes, totally agreed with Li here. > > For clarity, I'm happy to do the work to

FW: Stage 152 contains a task of very large size (12747 KB). The maximum recommended task size is 100 KB

2019-04-23 Thread Long, Andrew
Hey Friends, Is there an easy way of figuring out what’s being pulled into the task context? I’ve been getting the following message, which I suspect means I’ve unintentionally caught some large objects, but figuring out what those objects are is stumping me. 19/04/23 13:52:13 WARN

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-23 Thread Matei Zaharia
Just as a note here, if the goal is for the format not to change, why not make that explicit in a versioning policy? You can always include a format version number and say that future versions may increment the number, but this specific version will always be readable in some specific way. You could