Re: [DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-12 Thread Bryan Cutler
Thanks all. I created a WIP PR at https://github.com/apache/spark/pull/26496, we can further discuss the details in there.

On Thu, Nov 7, 2019 at 7:01 PM Takuya UESHIN wrote:
> +1
>
> On Thu, Nov 7, 2019 at 6:54 PM Shane Knapp wrote:
>
>> +1
>>
>> On Thu, Nov 7, 2019 at 6:08 PM Hyukjin Kwon

Temporary tables for Spark SQL

2019-11-12 Thread Laurent Bastien Corbeil
Hello, I am new to Spark, so I have a basic question to which I couldn't find an answer online. If I want to run SQL queries on a Spark DataFrame, do I have to create a temporary table first? I know I could use the Spark SQL API, but is there a way of simply reading the data and running SQL queries
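
For readers with the same question, here is a minimal sketch of the usual pattern: register the DataFrame as a temporary view, then query it with `spark.sql`. `createOrReplaceTempView` and `SparkSession.sql` are existing PySpark APIs; the helper name `query_dataframe`, the view name, the input file, and the column names are hypothetical and chosen only for illustration.

```python
def query_dataframe(spark, df, view_name, query):
    """Register df as a session-scoped temporary view, then run a SQL query on it.

    A temporary view is only a name bound to the DataFrame's query plan;
    registering it does not copy or materialize the data.
    """
    df.createOrReplaceTempView(view_name)
    return spark.sql(query)


def demo():
    # Requires a working PySpark installation; this function is not called automatically.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("temp-view-sketch").getOrCreate()
    # Hypothetical input file and columns, for illustration only.
    df = spark.read.json("people.json")
    result = query_dataframe(
        spark, df, "people", "SELECT name FROM people WHERE age > 21"
    )
    result.show()
    spark.stop()
```

The view lives only as long as the session that created it, so it works as a lightweight name for the DataFrame and not as a stored table.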

Re: PySpark Pandas UDF

2019-11-12 Thread Holden Karau
Thanks for sharing that. I think we should maybe add some checks around this so it’s easier to debug. I’m CCing Bryan who might have some thoughts.

On Tue, Nov 12, 2019 at 7:42 AM gal.benshlomo wrote:
> SOLVED!
> thanks for the help - I found the issue. it was the version of pyarrow
> (0.15.1)

RE: PySpark Pandas UDF

2019-11-12 Thread gal.benshlomo
SOLVED! Thanks for the help. I found the issue: it was the version of pyarrow (0.15.1), which apparently isn't currently stable. Downgrading it solved the issue for me.
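
A version check of the kind suggested elsewhere in this thread could look roughly like the sketch below. It is only an illustration: the helper names and the `(0, 15, 0)` threshold are assumptions inferred from the report above (0.15.1 failed, a downgrade worked), and the thread does not say which version was installed after the downgrade.

```python
import warnings

# Assumption: releases from 0.15.0 onward are treated as untested with this setup.
FIRST_UNTESTED = (0, 15, 0)


def parse_version(version):
    """Return the first numeric components of a version string as a 3-tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as "0.15" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def check_pyarrow_version(installed):
    """Warn and return False if the given pyarrow version is at or above the threshold."""
    if parse_version(installed) >= FIRST_UNTESTED:
        warnings.warn(
            "pyarrow %s is at or above %s; Pandas UDFs may fail with this setup"
            % (installed, ".".join(str(n) for n in FIRST_UNTESTED))
        )
        return False
    return True
```

In a real job the argument would typically be `pyarrow.__version__`, checked once at startup so that a mismatch shows up as a clear warning before any Pandas UDF runs.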

RE: How to use spark-on-k8s pod template?

2019-11-12 Thread sora
Hi, I am using Spark 2.4.1 now. I can run Spark on k8s normally, but I want to apply some k8s features (e.g., pod tolerations) to pods via a pod template. Thanks.

--
From: David Mitchell
Sent: Saturday, November 9, 2019, 00:18
To: sora
Cc: user
Subject:

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-12 Thread Jacob Lynn
Thanks for the pointer, Vadim. However, I just tried it with Spark 2.4 and got the same failure. (I was previously testing with 2.2 and/or 2.3.) And I don't see this particular issue referred to there. The ticket that Harel commented on indeed appears to be the most similar one to this issue: