Hi Amin,
This might be only marginally relevant to your question, but in my
project I also noticed the following: The trained and exported Spark
models (i.e. pipelines saved to binary files) are also not compatible
between versions, at least between major versions. I noticed this when
trying to load a model built with Spark 2.4.4 after updating to 3.2.0.
This didn't work.
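A minimal sketch of a guard one could run before loading such a model. It assumes that the saved model directory contains a metadata JSON record with a `sparkVersion` field (this is what I see in models written by Spark ML's default writer, but please verify it on your own files). `ModelVersionGuard` and its methods are hypothetical names, not part of Spark, and the regex is a shortcut in place of a real JSON parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark.
final class ModelVersionGuard {

    // Assumption: the metadata JSON has a top-level "sparkVersion" string field.
    private static final Pattern SPARK_VERSION =
            Pattern.compile("\"sparkVersion\"\\s*:\\s*\"([^\"]+)\"");

    private ModelVersionGuard() {}

    // Returns the Spark version recorded in the metadata text, if present.
    static Optional<String> savedWithVersion(String metadataJson) {
        Matcher m = SPARK_VERSION.matcher(metadataJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True when both version strings share the same major number.
    static boolean sameMajor(String a, String b) {
        return major(a) == major(b);
    }

    private static int major(String version) {
        return Integer.parseInt(version.trim().split("\\.")[0]);
    }
}
```

With the versions from my case, `sameMajor("2.4.4", "3.2.0")` is false, so the application could refuse to load the model and ask for a retrain instead of failing somewhere inside the loader.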
Cheers,
Martin
On 24.11.21 at 20:18, Sean Owen wrote:
I think/hope that it goes without saying you can't mix Spark versions
within a cluster.
Forwards compatibility is something you don't generally expect as a
default from any piece of software, so not sure there is something to
document explicitly.
Backwards compatibility is important, and this is documented
extensively where it doesn't hold in the Spark docs and release notes.
On Wed, Nov 24, 2021 at 1:16 PM Amin Borjian
<borjianami...@outlook.com> wrote:
Thank you very much for your reply. It would be great if these
points were mentioned in the Spark documentation (for example, on
the download page).
If I understand correctly, we can compile the client (for example,
in Java) against a newer version (for example, 3.2.0) and run it
against an older server (for example, 3.1.x) within the same major
version, and in most cases we will not see any problems. Am I
right? (Because backward compatibility can be described from either
the server's or the client's point of view, I am restating it to
make sure I got it right.)
But what happens if we update the server to 3.2.x while our client
is still on 3.1.x? Can the client work with the newer cluster
version, since it uses only older features of the server? (Maybe
this is what you meant, and my previous sentence was wrong because
I misunderstood.)
From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, November 24, 2021 5:38 PM
To: Amin Borjian <borjianami...@outlook.com>
Cc: user@spark.apache.org
Subject: Re: [Spark] Does Spark support backward and forward
compatibility?
Can you mix different Spark versions on driver and executor? No.
Can you compile against a different version of Spark than you run
on? That typically works within a major release, though forwards
compatibility may not work (you can't use a feature that doesn't
exist in the version on the cluster). Compiling against 3.2.0 and
running on 3.1.x, for example, should work fine in 99% of cases.
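The rule of thumb above can be sketched as a small check. This is only an illustration: `SparkVersionCheck`, `Verdict`, and `classify` are hypothetical names that are not part of Spark, and the result says what one would expect from the versioning policy described here, not whether a particular job will actually run.

```java
// Hypothetical helper, not part of Spark. It encodes only the rule of thumb
// from this thread; it cannot prove that a given job will actually run.
final class SparkVersionCheck {

    enum Verdict {
        SAME_MINOR_LINE,  // e.g. compiled against 3.2.0, cluster on 3.2.1
        CLIENT_OLDER,     // compiled against an older minor than the cluster
        CLIENT_NEWER,     // compiled against a newer minor than the cluster
        DIFFERENT_MAJOR   // e.g. compiled against 2.4.4, cluster on 3.2.0
    }

    private SparkVersionCheck() {}

    // Extracts {major, minor} from strings such as "3.2.0" or "3.1.x".
    static int[] majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                    "Expected major.minor[.patch]: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // Compares the version the application was compiled against
    // with the version running on the cluster.
    static Verdict classify(String compiledAgainst, String cluster) {
        int[] app = majorMinor(compiledAgainst);
        int[] srv = majorMinor(cluster);
        if (app[0] != srv[0]) {
            return Verdict.DIFFERENT_MAJOR;
        }
        if (app[1] == srv[1]) {
            return Verdict.SAME_MINOR_LINE;
        }
        return app[1] < srv[1] ? Verdict.CLIENT_OLDER : Verdict.CLIENT_NEWER;
    }

    public static void main(String[] args) {
        System.out.println(classify("3.2.0", "3.1.x"));
        System.out.println(classify("3.1.2", "3.2.0"));
        System.out.println(classify("2.4.4", "3.2.0"));
    }
}
```

`CLIENT_NEWER` is the case that usually works as long as the code avoids APIs missing on the cluster, `CLIENT_OLDER` relies on backwards compatibility within the major release, and `DIFFERENT_MAJOR` should not be expected to work.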
On Wed, Nov 24, 2021 at 8:04 AM Amin Borjian
<borjianami...@outlook.com> wrote:
I have a simple question about using Spark. Most tools answer
this question explicitly (in a prominent place, such as a
highlighted note or a separate page), but I could not find it
anywhere for Spark. Maybe I did not search well enough, but I
thought it would be good to ask here in the hope that the
answer will benefit other people as well.
The Spark binary is usually downloaded from the following link
and then installed and configured on the cluster: Download
Apache Spark <https://spark.apache.org/downloads.html>
If, for example, we use the Java language for programming
(although it can be other supported languages), we need the
following dependencies to communicate with Spark:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.2.0</version>
</dependency>
As is clear, both the Spark cluster (the Spark binary) and the
dependencies used on the application side have a specific
version. In my opinion, it is obvious that if the same version
is used on both the application side and the server side,
everything will most likely work without any problems.
But the question is, what if the two versions are not the
same? Is compatibility between the server and the application
possible under certain conditions (such as not changing the
major version)? For example, is it a problem if the client is
always ahead? Or if the server is always ahead?
The motivation is that there may be a library that I did not write
and that is built against an old version, but I want to update my
cluster (the server version). Or it may not be possible for me to
update the server version and the version of all applications at
the same time, so I want to update each one separately. As a
result, the application and server versions differ for some period
of time (short or long). I want to know exactly how Spark behaves
in this situation.