Re: Naming files while saving a Dataframe

2021-08-12 Thread Eric Beabes
This doesn't work as given here (https://stackoverflow.com/questions/36107581/change-output-filename-prefix-for-dataframe-write), but the answer suggests using the FileOutputFormat class. Will try that. Thanks. Regards. On Sun, Jul 18, 2021 at 12:44 AM Jörn Franke wrote: > Spark heavily depends on
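Spark always writes `part-*` files into the target directory, so a common workaround (instead of subclassing FileOutputFormat) is to write to a temporary directory and rename the part file afterwards. Below is a minimal sketch of that rename step for the *local* filesystem only; the function name and paths are illustrative, and on HDFS or S3 you would use the Hadoop FileSystem API instead:

```python
import glob
import os
import shutil

def rename_spark_output(tmp_dir: str, final_path: str) -> str:
    """Rename the single part-* file Spark wrote into tmp_dir to final_path.

    Assumes the DataFrame was written with .coalesce(1), so exactly one
    part file exists. Local filesystem only.
    """
    parts = glob.glob(os.path.join(tmp_dir, "part-*"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], final_path)
    shutil.rmtree(tmp_dir)  # discard _SUCCESS marker and the empty temp dir
    return final_path
```

Usage would be `df.coalesce(1).write.csv(tmp_dir)` followed by `rename_spark_output(tmp_dir, "report.csv")`.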

Replacing BroadcastNestedLoopJoin

2021-08-12 Thread Eric Beabes
We’ve two datasets that look like this: Dataset A: app-specific data that contains (among other fields): ip_address. Dataset B: location data that contains start_ip_address_int, end_ip_address_int, latitude, longitude. We’re (left) joining these two datasets as: A.ip_address >=
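A range-predicate join like this triggers a BroadcastNestedLoopJoin, which is O(|A|·|B|). One common replacement, when the location table is small, is to broadcast it sorted and resolve each IP with a binary search, making each lookup O(log |B|). A sketch of just the lookup logic in plain Python (the Spark wiring, i.e. `sc.broadcast` plus a UDF, is omitted; the sample ranges are made up, column names follow the message):

```python
import bisect

# Sorted, non-overlapping (start_ip_address_int, end_ip_address_int, lat, lon).
# In Spark this list would be collected once and broadcast to executors.
ranges = [
    (16777216, 16777471, 1.1, 103.8),
    (16777472, 16778239, 26.0, 119.3),
    (16778240, 16779263, -33.5, 143.2),
]
starts = [r[0] for r in ranges]  # precomputed keys for bisect

def locate(ip_int):
    """Return (lat, lon) for the range containing ip_int, else None."""
    i = bisect.bisect_right(starts, ip_int) - 1
    if i >= 0 and ranges[i][0] <= ip_int <= ranges[i][1]:
        return ranges[i][2], ranges[i][3]
    return None
```

Wrapping `locate` in a UDF applied to `A.ip_address` turns the join into a map-side lookup, which preserves the left-join semantics (unmatched rows get None).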

Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
OK, Amazon is not much different compared to Google Kubernetes Engine (GKE). When you submit a job, you need a powerful compute server to submit it from. It is another host; you cannot submit from the K8s cluster nodes themselves (I am not aware if one can actually do that). Anyway, you submit something

RE: K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
On EKS... From: Mich Talebzadeh Sent: Thursday, 12 August 2021 15:47 To: Bode, Meikel, NMA-CFD Cc: user@spark.apache.org Subject: Re: K8S submit client vs. cluster Ok As I see it with PySpark even if it is submitted as cluster, it will be converted to client mode anyway Are you running

Re: [EXTERNAL] [Marketing Mail] Reading SPARK 3.1.x generated parquet in SPARK 2.4.x

2021-08-12 Thread Gourav Sengupta
Hi Saurabh, a very big note of thanks from Gourav :) Regards, Gourav Sengupta On Thu, Aug 12, 2021 at 4:16 PM Saurabh Gulati wrote: > We had issues with this migration mainly because of changes in spark date > calendars. See >

Re: [EXTERNAL] [Marketing Mail] Reading SPARK 3.1.x generated parquet in SPARK 2.4.x

2021-08-12 Thread Saurabh Gulati
We had issues with this migration, mainly because of changes in Spark's date calendars. We got this working by setting the below params:
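The message elides which params were used, but the settings usually involved in the Spark 3.x calendar change (proleptic Gregorian vs. the hybrid Julian/Gregorian calendar that Spark 2.4 uses) are the rebase-mode confs; setting them to LEGACY on the Spark 3.1 writer makes dates/timestamps readable by Spark 2.4. A hedged sketch, not necessarily the exact params from the thread:

```
spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY
spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY
spark.sql.legacy.parquet.datetimeRebaseModeInRead=LEGACY
```

LEGACY rebases values into the old hybrid calendar on write; CORRECTED writes them as-is in the new calendar.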

Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
Ok. As I see it, with PySpark, even if it is submitted as cluster it will be converted to client mode anyway. Are you running this on AWS or GCP?

RE: K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
Hi Mich, All PySpark. Best, Meikel From: Mich Talebzadeh Sent: Thursday, 12 August 2021 13:41 To: Bode, Meikel, NMA-CFD Cc: user@spark.apache.org Subject: Re: K8S submit client vs. cluster Is this Spark or PySpark?

Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
Is this Spark or PySpark?

K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
Hi all, if we schedule a Spark job on K8s, how are volume mappings handled? In client mode I would expect that the driver's volumes have to be mapped manually in the pod template, while executor volumes are attached dynamically based on submit parameters. Right...? In cluster mode I would expect that
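For reference, driver and executor volumes can both be declared at submit time via the `spark.kubernetes.{driver,executor}.volumes.[type].[name].*` conf keys; in cluster mode the driver pod then gets its mount from these confs too, while in client mode the driver runs outside Spark's control and its mounts are indeed the user's responsibility. A non-runnable sketch with a hypothetical PVC (`my-pvc`, `/data`, and the API server address are placeholders):

```
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path=/data \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=my-pvc \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=my-pvc \
  ...
```

A pod template (`spark.kubernetes.driver.podTemplateFile`) is the alternative when the mount needs settings the conf keys don't expose.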

Re: How can I config hive.metastore.warehouse.dir

2021-08-12 Thread eab...@163.com
Hi, I think you should set up hive-site.xml before initializing the SparkSession; Spark will then connect to the metastore and log something like this: == 2021-08-12 09:21:21 INFO HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 2021-08-12
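Concretely, `hive.metastore.warehouse.dir` is picked up from a hive-site.xml on Spark's classpath (typically `$SPARK_HOME/conf/`) before the SparkSession is created. A minimal sketch; the host names and path are placeholders, and `hive.metastore.uris` is only needed when pointing at a remote metastore:

```xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://namenode:8020/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```

Note that since Spark 2.0, `spark.sql.warehouse.dir` supersedes `hive.metastore.warehouse.dir` for the session's warehouse location, so setting that on the SparkSession builder is the other route.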