Spark Salesforce connector

2021-11-24 Thread Atlas - Samir Souidi
Dear all, do you know if there is any Spark connector for Salesforce? Thanks, Sam

Re: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Lalwani, Jayesh
One thing to point out is that you never bundle the Spark client with your code. You compile against a Spark version, bundle your code (without the Spark jars) in an uber jar, and deploy that uber jar to Spark. Spark is already bundled with the jars that are required to send jobs to
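For illustration, a minimal sketch of how this looks in a Maven pom.xml (the artifact and version number here are only examples; match them to your cluster). The "provided" scope makes the Spark jars available at compile time but keeps them out of the uber jar:

    <!-- compiled against, but not bundled: "provided" keeps Spark out of the uber jar -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.2.0</version>
      <scope>provided</scope>
    </dependency>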

RE: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Amin Borjian
Thanks again for the reply. Personally, I think the whole cluster should run a single version. What mattered most to me was how much the version of the client that sends jobs to the scheduler matters, and whether we can hope everything works well across small version changes (i.e., anything less than a major version change).

Re: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Martin Wunderlich
Hi Amin, This might be only marginally relevant to your question, but in my project I also noticed the following: The trained and exported Spark models (i.e. pipelines saved to binary files) are also not compatible between versions, at least between major versions. I noticed this when trying
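For context, the export/import API in question is sketched below (the path and pipeline stages are illustrative, and train_df is an assumed DataFrame); a pipeline saved this way under one major version may fail to load under another:

    from pyspark.ml import Pipeline, PipelineModel
    from pyspark.ml.feature import Tokenizer, HashingTF
    from pyspark.ml.classification import LogisticRegression

    # Fit a pipeline and export it to binary files under one Spark version
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="features"),
        LogisticRegression(labelCol="label"),
    ])
    model = pipeline.fit(train_df)
    model.write().overwrite().save("/models/my_pipeline")

    # Loading the same files under a different major version may fail
    reloaded = PipelineModel.load("/models/my_pipeline")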

Re: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Sean Owen
I think/hope that it goes without saying you can't mix Spark versions within a cluster. Forwards compatibility is something you don't generally expect as a default from any piece of software, so I'm not sure there is anything to document explicitly. Backwards compatibility is important, and this is

RE: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Amin Borjian
Thank you very much for the reply. It would be great if these items were mentioned in the Spark documentation (for example, on the download page or somewhere similar). If I understand correctly, it means that we can compile the client (for example Java, etc.) with a newer version (for example

Listening to ExternalCatalogEvent in Spark 3

2021-11-24 Thread Khai Tran
Hello community, Previously, in Spark 2.4, we listened for and captured ExternalCatalogEvent in the onOtherEvent() method of SparkListener, but with Spark 3 we no longer see those events. I just wonder if there is any behavior change in how ExternalCatalogEvent is emitted in Spark 3, and if yes, where should I

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
I am not sure about that. However, with Kubernetes and a Docker image for PySpark, I build the packages into the image itself, as below in the Dockerfile: RUN pip install pyyaml numpy cx_Oracle. That adds those packages so that you can reference them in your Python script: import yaml, import cx_Oracle
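A minimal sketch of such a Dockerfile (the base image name and tag are illustrative; pick one matching your Spark, Java, and Scala versions):

    # illustrative base image; match it to your Spark/Java/Scala versions
    FROM apache/spark-py:v3.1.3
    # bake the Python packages into the image so driver and executors all have them
    RUN pip install pyyaml numpy cx_Oracle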

RE: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Bode, Meikel, NMA-CFD
Can we add Python dependencies the same way we add Maven coordinates, so that something like a pip install or a download from the PyPI index runs for us?

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
The easiest way to set this up is to create a dependencies.zip file. Assuming that you already have a virtual environment set up, with a directory called site-packages, go to that directory and just create a minimal shell script, say package_and_zip_dependencies.sh, to do it for you
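A sketch of what that script might contain (the site-packages path is illustrative):

    #!/bin/bash
    # package_and_zip_dependencies.sh: zip the venv's site-packages for spark-submit
    cd /path/to/venv/lib/python3.7/site-packages || exit 1
    zip -r /tmp/dependencies.zip .

The resulting archive can then be shipped with the job, e.g. spark-submit --py-files /tmp/dependencies.zip your_job.py (note that this approach only works reliably for pure-Python packages).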

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
Hello Owen, Thank you for your prompt reply! We will check it out. Best, Atheer Alabdullatif

Re: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Sean Owen
Can you mix different Spark versions on driver and executor? No. Can you compile against a different version of Spark than you run on? That typically works within a major release, though forwards compatibility may not work (you can't use a feature that doesn't exist in the version on the cluster).

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Sean Owen
That's not how you add a library. From the docs: https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
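Among other options, that page describes packing a whole virtual environment and shipping it with the job; a sketch of that approach (environment and file names are illustrative):

    # build and pack a virtualenv that contains the missing package
    python -m venv pyspark_venv
    source pyspark_venv/bin/activate
    pip install configparser venv-pack
    venv-pack -o pyspark_venv.tar.gz

    # ship the packed environment with the job; it is unpacked as ./environment
    export PYSPARK_PYTHON=./environment/bin/python
    spark-submit --archives pyspark_venv.tar.gz#environment app.py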

[Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Amin Borjian
I have a simple question about using Spark. Although most tools usually answer this question explicitly (in prominent text, such as a specific format or a separate page), I did not find it anywhere. Maybe my search was not thorough enough, but I thought it would be good to ask this question in the

[issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
Dear Spark team, hope my email finds you well. I am using PySpark 3.0 and facing an issue with adding an external library [configparser] while running the job using [spark-submit] & [yarn]. Issue: import configparser ImportError: No module named configparser 21/11/24 08:54:38 INFO

Re: Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-24 Thread Mich Talebzadeh
Just to clarify, it should say "The current Spark Kubernetes model ...". You will also need to build or get the Spark Docker image that you are going to use in the k8s cluster, based on the Spark version, Java version, Scala version, OS, and so forth. Are you going to use Hive as your main storage?
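For reference, the Spark binary distribution ships a helper script for building these images; a sketch of its use (the registry name and tag are illustrative):

    # from the root of a Spark binary distribution: build the JVM image
    # plus the PySpark image, then push both to a registry
    ./bin/docker-image-tool.sh -r registry.example.com/spark -t 3.2.0 \
        -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
    ./bin/docker-image-tool.sh -r registry.example.com/spark -t 3.2.0 push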