Hi Amin,

This might be only marginally relevant to your question, but in my project I noticed something related: trained and exported Spark models (i.e. pipelines saved to binary files) are also not compatible across versions, at least not across major versions. I noticed this when I tried to load a model built with Spark 2.4.4 after updating to 3.2.0; it didn't work.
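
For reference, the kind of code I mean looks roughly like this (the class name and path are just placeholders, and the versions only illustrate my setup):

    import org.apache.spark.ml.PipelineModel;
    import org.apache.spark.sql.SparkSession;

    public class LoadOldPipeline {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("load-old-pipeline")
                    .getOrCreate();

            // The model was trained and saved earlier with Spark 2.4.4, e.g.:
            //   PipelineModel model = pipeline.fit(trainingData);
            //   model.write().overwrite().save("/models/my-pipeline");

            // After upgrading the application to Spark 3.2.0, this load
            // failed for me ("/models/my-pipeline" is a placeholder path).
            PipelineModel reloaded = PipelineModel.load("/models/my-pipeline");

            spark.stop();
        }
    }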

Cheers,

Martin

On 24.11.21 at 20:18, Sean Owen wrote:
I think/hope it goes without saying that you can't mix Spark versions within a cluster. Forwards compatibility is not something you generally expect as a default from any piece of software, so I'm not sure there is anything to document explicitly. Backwards compatibility is important, and where it doesn't hold this is documented extensively in the Spark docs and release notes.


On Wed, Nov 24, 2021 at 1:16 PM Amin Borjian <borjianami...@outlook.com> wrote:

    Thank you very much for the reply. It would be great if these
    points were mentioned in the Spark documentation (for example, on
    the download page or somewhere similar).

    If I understand correctly, this means we can compile the client
    (for example in Java) against a newer version (for example 3.2.0)
    and run it against an older server (for example 3.1.x), as long as
    we stay within the same major version, and in most cases we will
    not see any problems. Am I right? (Because backward compatibility
    can be described from either the server's or the client's point of
    view, I repeated the sentence to make sure I got it right.)

    But what happens if we update the server to 3.2.x while our client
    is still on version 3.1.x? Can the client work with the newer
    cluster version because it only uses old features of the server?
    (Maybe this is what you meant, and my previous sentence was in
    fact wrong and I misunderstood.)

    *From: *Sean Owen <mailto:sro...@gmail.com>
    *Sent: *Wednesday, November 24, 2021 5:38 PM
    *To: *Amin Borjian <mailto:borjianami...@outlook.com>
    *Cc: *user@spark.apache.org
    *Subject: *Re: [Spark] Does Spark support backward and forward
    compatibility?

    Can you mix different Spark versions on driver and executor? no.

    Can you compile against a different version of Spark than you run
    on? That typically works within a major release, though forwards
    compatibility may not work (you can't use a feature that doesn't
    exist in the version on the cluster). Compiling vs 3.2.0 and
    running on 3.1.x for example should work fine in 99% of cases.
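
    For example, a common setup (just a sketch; the versions here are
    only illustrative) is to compile against the newer client
    libraries while letting the cluster supply Spark at runtime, by
    marking the dependency as "provided" in Maven:

        <!-- Compiled against 3.2.0; the cluster's Spark (e.g. 3.1.x)
             provides the jars at runtime -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.2.0</version>
            <scope>provided</scope>
        </dependency>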

    On Wed, Nov 24, 2021 at 8:04 AM Amin Borjian
    <borjianami...@outlook.com> wrote:

        I have a simple question about using Spark. Although most
        tools answer this kind of question explicitly (in a prominent
        place, such as a dedicated section or a separate page), I
        could not find an answer anywhere. Maybe my search was not
        thorough enough, but I thought it would be good to ask here in
        the hope that the answer may benefit other people as well.

        The Spark binary is usually downloaded from the following link
        and then installed and configured on the cluster: Download
        Apache Spark <https://spark.apache.org/downloads.html>

        If we use, for example, the Java language for programming
        (although it could be any other supported language), we need
        the following dependencies to communicate with Spark:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.2.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.2.0</version>
        </dependency>

        Clearly, both the Spark cluster (the Spark binary) and the
        dependencies used on the application side have a specific
        version. It is obvious that if the same version is used on
        both the application side and the server side, everything will
        most likely work as intended, without any problems.

        But the question is: what if the two versions are not the
        same? Is compatibility between the server and the application
        guaranteed under certain conditions (such as not changing the
        major version)? For example, is it a problem if the client is
        always ahead? Or if the server is always ahead?

    The point is that there may be a library that I did not write and
    that uses an old version, while I want to update my cluster (the
    server version). Or it may not be possible for me to update the
    server version and all the application versions at the same time,
    so I want to update each one separately. As a result, the
    application and server versions will differ for some period of
    time (maybe a short or a long period). I want to know exactly how
    Spark behaves in this situation.
