Re: Spark 2.4.1 on Kubernetes - DNS resolution of driver fails

2019-05-02 Thread Li Gao
Hi Olivier, this seems like a GKE-specific issue? Have you tried it on other vendors? Also, on the kubelet nodes, did you notice any pressure on the DNS side? Li

On Mon, Apr 29, 2019, 5:43 AM Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> I have ~300 spark jobs on …

Re: pySpark - pandas UDF and binaryType

2019-05-02 Thread Bryan Cutler
Hi, BinaryType support was not added until Spark 2.4.0, see https://issues.apache.org/jira/browse/SPARK-23555. Also, pyarrow 0.10.0 or greater is required, as you saw in the docs. Bryan

On Thu, May 2, 2019 at 4:26 AM Nicolas Paris wrote:
> Hi all
>
> I am using pySpark 2.3.0 and pyArrow 0.10.0 …
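For reference, a minimal sketch of what a grouped map pandas UDF returning a BinaryType column looks like once you are on Spark 2.4.0+ with pyarrow 0.10.0+; the input DataFrame, grouping column, and pass-through body here are illustrative assumptions, not code from the thread:

    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import StructType, StructField, StringType, BinaryType

    schema = StructType([
        StructField("filename", StringType(), True),
        StructField("contents", BinaryType(), True),
    ])

    # Spark 2.4 grouped map API: the function receives one pandas DataFrame
    # per group and must return a pandas DataFrame matching `schema`.
    @pandas_udf(schema, PandasUDFType.GROUPED_MAP)
    def passthrough(pdf):
        return pdf  # placeholder transformation

    result = df.groupBy("filename").apply(passthrough)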

Spark SQL Teradata load is very slow

2019-05-02 Thread KhajaAsmath Mohammed
Hi, I have a Teradata table that has more than 2.5 billion records, and the data size is around 600 GB. I am not able to pull it efficiently using Spark SQL, and it has been running for more than 11 hours. Here is my code:

val df2 = sparkSession.read.format("jdbc")
  .option("url", …
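A likely cause of the slowness: without partitioning options, Spark's JDBC source reads the whole table through a single connection in one task. A sketch of a partitioned read in PySpark (the thread's code is Scala; the URL, credentials, column name, and bounds below are illustrative placeholders):

    # Assumes a numeric column with a roughly uniform distribution; Spark
    # issues one range-bounded query per partition, in parallel.
    df2 = (spark.read.format("jdbc")
           .option("url", "jdbc:teradata://<host>/DATABASE=<db>")
           .option("dbtable", "<db>.<table>")
           .option("user", "<user>")
           .option("password", "<password>")
           .option("partitionColumn", "id")      # hypothetical numeric key
           .option("lowerBound", "1")
           .option("upperBound", "2500000000")   # roughly the row count
           .option("numPartitions", "64")
           .option("fetchsize", "10000")         # rows fetched per round trip
           .load())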

Re: What is Spark context cleaner in structured streaming

2019-05-02 Thread kanchan tewary
Hi Akshay, you may refer to the following: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html Thanks, Kanchan Data Engineer, IBM

On Thu, May 2, 2019 at 5:21 PM Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com> wrote:
> Hi All,
>
> I am using Spark …

What is Spark context cleaner in structured streaming

2019-05-02 Thread Akshay Bhardwaj
Hi All, I am using Spark Structured Streaming with Spark 2.3 running on a YARN cluster with Hadoop 2.7.3. In the driver logs I see numerous lines like the ones below:

2019-05-02 17:14:24.619 ContextCleaner Spark Context Cleaner [INFO] Cleaned accumulator 81492577
2019-05-02 17:14:24.619 ContextCleaner Spark …
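For context: the ContextCleaner is a driver-side daemon that removes state for accumulators, shuffles, broadcasts, and RDDs once the corresponding driver objects are garbage-collected, so these INFO lines are normal housekeeping. A sketch of the cleaner-related settings, shown with their default values:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cleaner-defaults")
             # Enables the ContextCleaner itself (on by default).
             .config("spark.cleaner.referenceTracking", "true")
             # Whether the cleaning thread blocks on cleanup tasks.
             .config("spark.cleaner.referenceTracking.blocking", "true")
             # How often the driver triggers a GC to surface cleanups.
             .config("spark.cleaner.periodicGC.interval", "30min")
             .getOrCreate())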

pySpark - pandas UDF and binaryType

2019-05-02 Thread Nicolas Paris
Hi all, I am using pySpark 2.3.0 and pyArrow 0.10.0. I want to apply a pandas-udf on a dataframe, but I get the below error:

> Invalid returnType with grouped map Pandas UDFs:
> StructType(List(StructField(filename,StringType,true),StructField(contents,BinaryType,true)))
> is not supported …

Re: spark df.write.partitionBy run very slow

2019-05-02 Thread Shyam P
You can check the "Executors" tab in the Spark UI screen...

On Fri, Mar 15, 2019 at 7:56 AM JF Chen wrote:
> But now I have another question: how to determine which data node the
> spark task is writing to? It's really important for digging into the problem.
>
> Regards,
> Junfeng Chen
>
> On Thu, …
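On the thread's subject itself, a common mitigation for slow partitionBy writes is to repartition by the partition columns first, so each output directory is written by a handful of tasks rather than every task opening a file per partition value; a sketch with hypothetical column and path names:

    # Without the repartition, each task may open one file per partition
    # value it sees, producing many small files and slow commits.
    (df.repartition("dt")
       .write
       .mode("overwrite")
       .partitionBy("dt")
       .parquet("hdfs:///tmp/output"))  # placeholder path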