Could someone help pull the latest Maven download stats for beam-runners-spark 
and beam-runners-spark-3 for the last few Beam releases?

Thanks so much!
/ Moritz

On 01.04.22, 16:54, "Moritz Mack" <mm...@talend.com> wrote:

I just started looking into the Spark runner code a bit, to hopefully help 
support it.
Besides having to maintain (and test!) twice the number of artifacts, supporting 
multiple major versions also has a significant negative impact on developer 
ergonomics / productivity (separate modules to deal with breaking changes and all 
the trouble that comes with that).

Thanks, Alexey, for opening the discussion. Certainly a big +1 from my side.

/ Moritz



From: Alexey Romanenko <aromanenko....@gmail.com>
Date: Thursday, 31. March 2022 at 18:51
To: dev <dev@beam.apache.org>
Subject: Re: [PROPOSAL] Stop Spark2 support in Spark Runner


> On 31 Mar 2022, at 18:02, Robert Bradshaw <rober...@google.com> wrote:
>
> Generally makes sense to me, though I'm curious what the maintenance
> burden is (high or low) in keeping it around.

Well, we need to provide two versions of the Spark runner artifacts, job servers, 
and Docker images, and test them separately (different Jenkins jobs). We also 
have two different code paths for the cases where the API is not compatible 
between Spark 2 and Spark 3.

> We should probably
> deprecate it for a period of time before removing support.

Agreed, and I’d even suggest asking users on user@/Twitter first.


Actually, I see a problem with naming. By default, “Spark runner” has meant the 
runner that works with Spark 2 (for example, the artifacts [1][2]). When Spark 3 
support was added, all of its Beam artifacts and related names included the 
version [3][4]. So it’s not clear how best to deal with this, especially given 
that new Spark versions (4, 5, etc.) will be available sooner or later. Perhaps, 
to avoid confusion in the future, we should follow the same naming pattern.

—
Alexey

[1] https://search.maven.org/artifact/org.apache.beam/beam-runners-spark
[2] https://search.maven.org/artifact/org.apache.beam/beam-runners-spark-job-server
[3] https://search.maven.org/artifact/org.apache.beam/beam-runners-spark-3
[4] https://search.maven.org/artifact/org.apache.beam/beam-runners-spark-3-job-server

>
> On Thu, Mar 31, 2022 at 8:52 AM Alexey Romanenko
> <aromanenko....@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> At the moment, the Beam Spark Runner supports two versions of Spark: 2.x and 
>> 3.x.
>>
>> Taking into account that:
>> - almost all cloud providers have already moved to Spark 3.x as their main 
>> supported version;
>> - the latest Spark 2.x release (Spark 2.4.8, a maintenance release) was made 
>> almost a year ago;
>> - Spark 3 is considered the mainstream Spark version for development and 
>> bug fixing;
>> - it would be better to avoid the burden of maintaining two versions (there 
>> are some incompatibilities between Spark 2 and 3);
>>
>> I’d suggest that we stop supporting Spark 2 in the Spark Runner in one of the 
>> next Beam releases.
>>
>> What are your thoughts on this? Are there any fundamental objections or 
>> reasons for not doing this that I may have missed?
>>
>> —
>> Alexey
>>
>>

As a recipient of an email from Talend, your contact personal data will be on 
our systems. Please see our privacy notice. <https://www.talend.com/privacy/>



