Re: Detecting latecomer events in Spark structured streaming

2021-03-11 Thread Jungtaek Lim
Hi, if I remember correctly, I don't think Spark exposes the watermark value for the current batch through the public API. That said, if you're dealing with "event time" (and I guess that's your case, since you're worried about late events), unless you employ a new logical/physical plan to expose

Re: Single executor processing all tasks in spark structured streaming kafka

2021-03-11 Thread Sachit Murarka
Hi Kapil, Thanks for the suggestion. Yes, it worked. Regards Sachit On Tue, 9 Mar 2021, 00:19 Kapil Garg, wrote: > Hi Sachit, > What do you mean by "spark is running only 1 executor with 1 task" ? > Did you submit the spark application with multiple executors but only 1 is > being used and rest

Re: FlatMapGroupsWithStateFunction is called thrice - Production use case.

2021-03-11 Thread Jungtaek Lim
Hi, Could you please provide the Spark version? Also, it would be very helpful if you could provide a simple reproducer, for example by putting a reproducer that can be built easily (with mvn, gradle, or sbt) in your GitHub repository, along with the input data that triggers the behavior. It is worth knowing

RE: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Ranju Jain
OK! Thanks for all the guidance :-) Regards Ranju From: Mich Talebzadeh Sent: Thursday, March 11, 2021 11:07 PM To: Ranju Jain Cc: user@spark.apache.org Subject: Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS I don't have any specific reference. However, you can do a Google search. best

Re: spark on k8s driver pod exception

2021-03-11 Thread Attila Zsolt Piros
> but the spark-submit log still running
Set the "spark.kubernetes.submission.waitAppCompletion" config to false to change that. As the documentation says: "spark.kubernetes.submission.waitAppCompletion": In cluster mode, whether to wait for the application to finish before exiting the launcher
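For reference, a minimal sketch of passing that setting at submit time, assuming cluster mode on Kubernetes. The API server address, container image, main class, and application JAR URI are placeholders, not values from this thread.

```shell
# Sketch only: every <...> value is a placeholder to replace before running.
# With waitAppCompletion=false, spark-submit returns after submitting the
# driver pod instead of waiting for the application to finish.
spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  --class <main-class> \
  <app-jar-uri>
```

Note that this only changes how long the launcher waits; it does not by itself stop a driver pod that keeps running after the exception.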

Re: spark on k8s driver pod exception

2021-03-11 Thread Attila Zsolt Piros
To get the logs, please read the *Accessing Logs* section of the *Running Spark on Kubernetes* page. For stopping and general management of the Spark application, please read the Spark Application Management
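As a rough illustration of what those sections cover, assuming the submission ID uses the `namespace:driver-pod-name` format. The namespace, driver pod name, and API server address below are placeholders.

```shell
# Placeholders: replace <namespace>, <driver-pod-name> and the API server address.
# Follow the driver pod's logs (where the driver's exception stack trace is printed).
kubectl -n <namespace> logs -f <driver-pod-name>

# Query the application's status by its submission ID (namespace:driver-pod-name).
spark-submit --status <namespace>:<driver-pod-name> --master k8s://https://<api-server-host>:<port>

# Kill the application. Executor pods carry an owner reference to the driver pod,
# so Kubernetes garbage-collects them once the driver pod is deleted.
spark-submit --kill <namespace>:<driver-pod-name> --master k8s://https://<api-server-host>:<port>
```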

Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Mich Talebzadeh
I don't have any specific reference. However, you can do a Google search. It is best to ask the Unix team; they can do all that themselves. HTH LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

RE: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Ranju Jain
Yes, there is a team, but I have not contacted them yet; I am trying to understand it on my end first. I understood the point you mentioned below. Do you have any references or links where I can read about Shared Volumes? Regards Ranju From: Mich Talebzadeh Sent: Thursday, March 11, 2021 5:38 PM Cc:

Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Mich Talebzadeh
Well, your mileage may vary, so to speak. The only way to find out is to set up an NFS mount and test it. The performance will depend on the mounted file system and the amount of cache it has. File cache is important for reads and if you are going to do random writes (as opposed to sequential

spark on k8s driver pod exception

2021-03-11 Thread yxl040840219
When I run the code on k8s, the driver pod throws an AnalysisException, but the spark-submit log shows it is still running. How can I get the exception and stop the pods?
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val df = (0 until 10).toDF("id").selectExpr("id %

RE: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Ranju Jain
Hi Mich, No, it is not Google Cloud. It is simply Kubernetes deployed on a bare-metal platform. I am not clear on the pros and cons of a Shared Volume vs. NFS for ReadWriteMany. Since NFS is a Network File System [remote], I figure a Shared Volume should be preferable, but I don't know

Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Mich Talebzadeh
OK, this is on Google Cloud, correct?

Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

2021-03-11 Thread Ranju Jain
Hi, I need to write the data from all executor pods to a common location that the driver pod can access and read. I was first planning to go with NFS, but I think a Shared Volume is equally good. Please advise: is there any major drawback to using a Shared Volume instead of NFS when many pods