Hi Olivier,
This seems like a GKE-specific issue; have you tried it on other vendors? Also,
on the kubelet nodes, did you notice any pressure on the DNS side?
Li
On Mon, Apr 29, 2019, 5:43 AM Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> I have ~300 spark job on
Hi,
BinaryType support was not added until Spark 2.4.0, see
https://issues.apache.org/jira/browse/SPARK-23555. Also, pyarrow 0.10.0 or
greater is required, as you saw in the docs.
Bryan
On Thu, May 2, 2019 at 4:26 AM Nicolas Paris
wrote:
> Hi all
>
> I am using pySpark 2.3.0 and pyArrow 0.10.0
Hi,
I have a Teradata table with more than 2.5 billion records and a data size of
around 600 GB. I am not able to pull it efficiently using Spark SQL; the job
has been running for more than 11 hours. Here is my code.
val df2 = sparkSession.read.format("jdbc")
.option("url",
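The snippet above is cut off, but a common cause of this kind of slowness is reading the whole table through a single JDBC connection. Below is a minimal sketch of a partitioned read (shown in PySpark; `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, and `fetchsize` are standard Spark JDBC options, while the URL, table, and column names here are placeholders, not taken from the original mail):

```python
# Hypothetical partitioned JDBC read; url/table/column names are placeholders.
df2 = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://<host>/DATABASE=<db>")  # placeholder URL
    .option("dbtable", "my_table")                          # placeholder table
    .option("user", "<user>")
    .option("password", "<password>")
    # Split the read into parallel tasks over a numeric column:
    .option("partitionColumn", "id")    # placeholder numeric, indexed column
    .option("lowerBound", "1")
    .option("upperBound", "2500000000")
    .option("numPartitions", "100")
    .option("fetchsize", "10000")       # rows fetched per round trip
    .load()
)
```

Without `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions`, Spark issues one query over one connection, which alone could explain an 11-hour run on 600 GB.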
Hi Akshay,
You may refer to the following:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html
Thanks,
Kanchan
Data Engineer, IBM
On Thu, May 2, 2019 at 5:21 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi All,
>
> I am using Spark
Hi All,
I am using Spark Structured Streaming with Spark 2.3 running on a YARN
cluster with Hadoop 2.7.3. In the driver logs I see numerous lines like the ones below.
2019-05-02 17:14:24.619 ContextCleaner Spark Context Cleaner [INFO] Cleaned
accumulator 81492577
2019-05-02 17:14:24.619 ContextCleaner Spark
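Those lines are routine accumulator cleanup by the ContextCleaner, not errors. If the log volume is the concern, one option is raising that logger's level in `log4j.properties` (a config sketch; the cleanup itself still runs, only the INFO messages are suppressed):

```properties
# Silence per-accumulator cleanup messages in the driver logs
log4j.logger.org.apache.spark.ContextCleaner=WARN
```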
Hi all
I am using pySpark 2.3.0 and pyArrow 0.10.0
I want to apply a pandas UDF on a dataframe with
I get the below error:
> Invalid returnType with grouped map Pandas UDFs:
> StructType(List(StructField(filename,StringType,true),StructField(contents,BinaryType,true)))
> is not supported
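For reference, this is the shape of a grouped-map pandas UDF whose return schema includes a `BinaryType` field; per SPARK-23555 it works from Spark 2.4.0 with pyarrow 0.10.0 or greater, and on 2.3.x it fails with exactly the "Invalid returnType" error above. This is a minimal sketch: the `filename`/`contents` columns match the schema in the error message, while `df` and the identity body are illustrative:

```python
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import StructType, StructField, StringType, BinaryType

# Return schema containing BinaryType: rejected by Spark 2.3.x,
# accepted from Spark 2.4.0 (SPARK-23555).
schema = StructType([
    StructField("filename", StringType(), True),
    StructField("contents", BinaryType(), True),
])

@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def process(pdf):
    # pdf is a pandas DataFrame holding one group; return a DataFrame
    # matching `schema`. Identity here, for illustration only.
    return pdf

result = df.groupby("filename").apply(process)  # df assumed to exist
```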
You can check the "Executors" tab in the Spark UI...
On Fri, Mar 15, 2019 at 7:56 AM JF Chen wrote:
> But now I have another question: how to determine which data node the
> Spark task is writing to? It's really important for digging into the problem.
>
> Regards,
> Junfeng Chen
>
>
> On Thu,