Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread karan alang
Hi Mich - Here is the output of the ldf.printSchema() & ldf.show() commands. ldf.printSchema() root |-- applianceName: string (nullable = true) |-- timeslot: long (nullable = true) |-- customer: string (nullable = true) |-- window: struct (nullable = false) ||-- start: timestamp

How to allocate vcores to driver (client mode)

2023-03-10 Thread sam smith
Hi, I am launching a Spark program from code (client mode) to run on Hadoop. Whenever I check the Executors tab of the Spark UI, I always see 0 as the number of vcores for the driver. I tried to change that using *spark.driver.cores*, or also *spark.yarn.am.cores* in the SparkSession configuration

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread Mich Talebzadeh
Just looking at the code in here ldf = ldf.groupBy("applianceName", "timeslot", "customer", window(col("ts"), "15 minutes")) \ .agg({'sentOctets':"sum", 'recvdOctets':"sum"}) \ .withColumnRenamed('sum(sentOctets)', 'sentOctets') \
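
To make the window(col("ts"), "15 minutes") grouping quoted above easier to reason about, here is a minimal plain-Python sketch (not Spark code) of how a timestamp falls into a 15-minute tumbling window. It assumes windows are aligned to the Unix epoch in UTC, which is how Spark places tumbling windows when no start offset is given. The helper name tumbling_window and the sample timestamp are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch (UTC), with no start offset.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(minutes=15)


def tumbling_window(ts, size=WINDOW):
    """Return the half-open [start, end) tumbling window that contains ts.

    Hypothetical helper for illustration; it is not part of the Spark API.
    """
    # Time elapsed since the start of the window that ts belongs to.
    offset = (ts - EPOCH) % size
    start = ts - offset
    return start, start + size


# Hypothetical event time, used only to show the bucketing.
event = datetime(2023, 3, 10, 10, 37, 12, tzinfo=timezone.utc)
start, end = tumbling_window(event)
print(start.isoformat(), end.isoformat())
# prints: 2023-03-10T10:30:00+00:00 2023-03-10T10:45:00+00:00
```

Every row whose ts lies in the same [start, end) interval, and which shares the same applianceName, timeslot and customer, would land in the same group. A timestamp exactly on a boundary (10:45:00 here) belongs to the next window.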

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Mich Talebzadeh
For your Dataproc setup, what type of machines are you using, for example n2-standard-4 with 4 vCPUs and 16 GB, or something else? How many nodes, and is autoscaling turned on? Most likely an executor memory limit? HTH

org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Gary Liu
Hi, I have a job in a GCP Dataproc server Spark session (Spark 3.3.2). It involves multiple joins as well as a complex UDF. I always get the FetchFailedException below, but the job completes and the results look right. Neither of the 2 input datasets is very big (one is 6.5M rows*11