Dongjoon,

I followed the conversation, and in my opinion, your concern is totally
legitimate. It just feels that the discussion is focused solely on
Databricks; as I said earlier, the same issue occurs with other vendors
as well.
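As an aside, the distinction at stake becomes mechanically visible once a
vendor appends a suffix to the reported version string. A minimal sketch of
that idea (plain string handling only; the suffix convention is an
assumption, borrowed from the "3.1.2-amazon" example later in this thread):

```python
# Illustrative sketch: a suffix after the semantic-version core signals a
# vendor build rather than an upstream Apache release. The suffix formats
# shown are examples from this thread, not a documented vendor convention.
def vendor_suffix(version: str) -> str:
    """Return the part after the first '-', or '' for a plain version."""
    base, _, suffix = version.partition("-")
    return suffix

print(vendor_suffix("3.4.0"))         # -> "" (indistinguishable from upstream)
print(vendor_suffix("3.1.2-amazon"))  # -> "amazon"
```

In other words, a suffix makes the provenance checkable by users and tooling,
which is the practical benefit of the naming convention being requested here.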


On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> To Grisha, we are talking about what the right way is and how to comply
> with the ASF legal advice which I shared in this thread from the
> "legal-discuss@" mailing list.
>
> https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4
>  (legal-discuss@)
> https://www.apache.org/foundation/marks/downstream.html#source (ASF
> Website)
>
> Dongjoon
>
>
> On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub <
> grisha.weintr...@gmail.com> wrote:
>
>> Yes, in the Spark UI you have it as "3.1.2-amazon", but when you create a
>> cluster it's just Spark 3.1.2.
>>
>> On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu <zhunanmcg...@gmail.com> wrote:
>>
>>>
>>> For EMR, I think they show "3.1.2-amazon" in the Spark UI, no?
>>>
>>>
>>> On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub <
>>> grisha.weintr...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am not taking sides here, but just for fairness, I think it should be
>>>> noted that AWS EMR does exactly the same thing.
>>>> We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
>>>> version (e.g., 3.1.2).
>>>> The Spark version here is not the original Apache release but the AWS
>>>> Spark distribution.
>>>>
>>>> On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> I disagree with you in several ways.
>>>>>
>>>>> The following is not a *minor* change like the given examples
>>>>> (alterations to the start-up and shutdown scripts, configuration files,
>>>>> file layout etc.).
>>>>>
>>>>> > The change you cite meets the 4th point, minor change, made for
>>>>> integration reasons.
>>>>>
>>>>> The following is also wrong. Apache Spark 3.4.0 was never in that
>>>>> state after the 3.4.0 tag was created; the Apache Spark community did
>>>>> not accept the Scala-reverting patches in either the `master` branch or
>>>>> `branch-3.4`.
>>>>>
>>>>> > There is no known technical objection; this was after all at one
>>>>> point the state of Apache Spark.
>>>>>
>>>>> Is the following your main point? So, you are selling a box "including
>>>>> Harry Potter by J. K. Rowling, whose main character is Barry instead of
>>>>> Harry", but it's okay because you didn't sell the book itself? And, as a
>>>>> cloud vendor, you are lending the box instead of selling it, like a
>>>>> private library?
>>>>>
>>>>> > There is no standalone distribution of Apache Spark anywhere here.
>>>>>
>>>>> We are not asking for a big thing. Why are you so reluctant to say it
>>>>> is not "Apache Spark 3.4.0", by simply saying "Apache Spark
>>>>> 3.4.0-databricks"? What is the marketing reason here?
>>>>>
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> Hi Dongjoon, I think this conversation is not advancing anymore. I
>>>>>> personally consider the matter closed unless you can find other support 
>>>>>> or
>>>>>> respond with more specifics. While this perhaps should be on private@,
>>>>>> I think it's not wrong as an instructive discussion on dev@.
>>>>>>
>>>>>> I don't believe you've made a clear argument about the problem, or
>>>>>> how it relates specifically to policy. Nevertheless I will show you my
>>>>>> logic.
>>>>>>
>>>>>> You are asserting that a vendor cannot call a product Apache Spark
>>>>>> 3.4.0 if it omits a patch updating a Scala maintenance version. This
>>>>>> difference has no known impact on usage, as far as I can tell.
>>>>>>
>>>>>> Let's see what policy requires:
>>>>>>
>>>>>> 1/ All source code changes must meet at least one of the acceptable
>>>>>> changes criteria set out below:
>>>>>> - The change has been accepted by the relevant Apache project
>>>>>> community for inclusion in a future release. Note that the process used
>>>>>> to accept changes and how that acceptance is documented varies between
>>>>>> projects.
>>>>>> - A change is a fix for an undisclosed security issue; and the fix is
>>>>>> not publicly disclosed as a security fix; and the Apache project has
>>>>>> been notified of both the issue and the proposed fix; and the PMC has
>>>>>> rejected neither the vulnerability report nor the proposed fix.
>>>>>> - A change is a fix for a bug; and the Apache project has been
>>>>>> notified of both the bug and the proposed fix; and the PMC has rejected
>>>>>> neither the bug report nor the proposed fix.
>>>>>> - Minor changes (e.g. alterations to the start-up and shutdown
>>>>>> scripts, configuration files, file layout etc.) to integrate with the
>>>>>> target platform providing the Apache project has not objected to those
>>>>>> changes.
>>>>>>
>>>>>> The change you cite meets the 4th point, minor change, made for
>>>>>> integration reasons. There is no known technical objection; this was 
>>>>>> after
>>>>>> all at one point the state of Apache Spark.
>>>>>>
>>>>>>
>>>>>> 2/ A version number must be used that both clearly differentiates it
>>>>>> from an Apache Software Foundation release and clearly identifies the
>>>>>> Apache Software Foundation version on which the software is based.
>>>>>>
>>>>>> Keep in mind the product here is not "Apache Spark", but the
>>>>>> "Databricks Runtime 13.1 (including Apache Spark 3.4.0)". That is, there 
>>>>>> is
>>>>>> far more than a version number differentiating this product from Apache
>>>>>> Spark. There is no standalone distribution of Apache Spark anywhere 
>>>>>> here. I
>>>>>> believe that easily matches the intent.
>>>>>>
>>>>>>
>>>>>> 3/ The documentation must clearly identify the Apache Software
>>>>>> Foundation version on which the software is based.
>>>>>>
>>>>>> Clearly, yes.
>>>>>>
>>>>>>
>>>>>> 4/ The end user expects that the distribution channel will back-port
>>>>>> fixes. It is not necessary to back-port all fixes. Selection of fixes to
>>>>>> back-port must be consistent with the update policy of that distribution
>>>>>> channel.
>>>>>>
>>>>>> I think this is safe to say too. Indeed this explicitly contemplates
>>>>>> not back-porting a change.
>>>>>>
>>>>>>
>>>>>> Backing up, you can see from this document that the spirit of it is:
>>>>>> don't include changes in your own Apache Foo x.y that aren't wanted by 
>>>>>> the
>>>>>> project, and still call it Apache Foo x.y. I don't believe your case
>>>>>> matches this spirit either.
>>>>>>
>>>>>> I do think it's not crazy to suggest: hey vendor, would you call this
>>>>>> "Apache Spark + patches" or ".vendor123"? But that's at best a
>>>>>> suggestion, and I think it does nothing in particular for users. You've
>>>>>> made the suggestion, and I do not see that any police action from the
>>>>>> PMC must follow.
>>>>>>
>>>>>>
>>>>>> I think you're simply objecting to a vendor choice, but that is not
>>>>>> on-topic here unless you can specifically rebut the reasoning above and
>>>>>> show it's connected.
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 7, 2023 at 11:02 AM Dongjoon Hyun <dongj...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Sean, it seems that you are confused here. We are not talking about
>>>>>>> your upper system (the notebook environment). We are talking about the
>>>>>>> submodule, "Apache Spark 3.4.0-databricks". Whatever you call it, both
>>>>>>> of us know "Apache Spark 3.4.0-databricks" is different from "Apache
>>>>>>> Spark 3.4.0". You should not use "3.4.0" for your subsystem.
>>>>>>>
>>>>>>> > This also is aimed at distributions of "Apache Foo", not products
>>>>>>> that
>>>>>>> > "include Apache Foo", which are clearly not Apache Foo.
>>>>>>>
