On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would
the next RC be? I would like to propose including the Kerberos feature
sooner rather than later, as it would increase Spark-on-K8S adoption in
production workloads while bringing greater feature parity with YARN and
Mesos. I would also note that the feature is isolated from Core by the
step-based architecture of the Kubernetes Driver/Executor builders.

Furthermore, Spark users traditionally use HDFS for storage, and in
production use cases these HDFS clusters would be kerberized. At Bloomberg,
for example, all of the HDFS clusters are kerberized, and for this reason
the only thing stopping our internal Data Science Platform from adopting
Spark-on-K8S is this feature.

On Tue, Oct 16, 2018 at 10:21 AM Erik Erlandson <eerla...@redhat.com> wrote:

>
> SPARK-23257 merged more recently than I realized. If that isn't on
> branch-2.4 then the first question is how soon in the release sequence it
> can be adopted.
>
> On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin <r...@databricks.com> wrote:
>
>> We shouldn’t merge new features into release branches anymore.
>>
>> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rve...@dotnetrdf.org> wrote:
>>
>>> Right now the Kerberos support for Spark on K8S is only on master AFAICT
>>> i.e. the feature is not present on branch-2.4
>>>
>>>
>>>
>>> Therefore I don’t see any point in adding the tests into branch-2.4
>>> unless the plan is to also merge the Kerberos support to branch-2.4
>>>
>>>
>>>
>>> Rob
>>>
>>>
>>>
>>> *From: *Erik Erlandson <eerla...@redhat.com>
>>> *Date: *Tuesday, 16 October 2018 at 16:47
>>> *To: *dev <dev@spark.apache.org>
>>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests for
>>> Spark 2.4
>>>
>>>
>>>
>>> I'd like to propose including integration testing for Kerberos on the
>>> Spark 2.4 release:
>>>
>>> https://github.com/apache/spark/pull/22608
>>>
>>>
>>>
>>> Arguments in favor:
>>>
>>> 1) it improves testing coverage on a feature important for integrating
>>> with HDFS deployments
>>>
>>> 2) its intersection with existing code is small - it consists primarily
>>> of new testing code, with a bit of refactoring into 'main' and 'test'
>>> sub-trees. These new tests appear stable.
>>>
>>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>>
>>>
>>>
>>> The argument 'against' that I'm aware of would be the relatively large
>>> size of the PR. I believe this is considered above, but am soliciting
>>> community feedback before committing.
>>>
>>> Cheers,
>>>
>>> Erik
>>>
>>>
>>>
>>
