Hi Chandni,

Thanks for the explanation. I agree that ensuring the security of
the client jar and its distribution falls outside the scope of adding
authentication to Celeborn.

I'm OK with the design doc, thanks! Let's see if other developers
have any further feedback.

Thanks,
Keyong Zhou

On Sun, Sep 17, 2023 at 11:04, Chandni Singh <[email protected]> wrote:

> Hi Keyong,
>
> At present, Spark generates a unique secret for each application (code:
> https://github.com/apache/spark/blob/804f741453fb146b5261084fa3baf26631badb79/core/src/main/scala/org/apache/spark/util/Utils.scala#L2887),
> which is then shared with ESS via Yarn. We were considering sharing this
> secret with Celeborn for the Spark applications. Java's SecureRandom is
> used to generate the random bytes for each application's secret, making it
> highly unlikely for two applications to generate identical secrets. For
> platforms that don't generate secrets, the LifecycleManager could handle
> this task, using similar logic for generation.
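
The generation step described above can be sketched roughly as follows. This is a minimal illustration, not Spark's or Celeborn's actual code: the class name `SecretSketch`, the method `createSecret`, the 256-bit length, and the hex encoding are all assumptions made for the example.

```java
import java.security.SecureRandom;

// Hypothetical sketch: per-application secret generation with SecureRandom.
// Names and the hex encoding are illustrative assumptions, not a real API.
final class SecretSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns `bits` random bits encoded as a lowercase hex string.
    static String createSecret(int bits) {
        byte[] bytes = new byte[bits / Byte.SIZE];
        RANDOM.nextBytes(bytes);
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

With 256 random bits per secret, an accidental collision between two honest applications is negligible, which is why a duplicate secret points to a deliberately misbehaving client rather than to chance.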
>
> Two applications generating the same secret would, in practice, imply a
> malicious client jar, as you pointed out. However, ensuring the security of
> the client jar and its distribution falls outside the scope of adding
> authentication to Celeborn.
>
> Chandni
>
> On Sat, Sep 16, 2023 at 5:37 AM Keyong Zhou <[email protected]> wrote:
>
> > Hi Chandni,
> >
> > Thanks for the detailed explanation. I still have questions about
> > question 3. I do mean that two applications claim to be the same AppX,
> > but only the first app registers itself through steps 5-7; the second
> > app only goes through steps 5.a and 5.b. It seems the shared secret is
> > generated by LifecycleManager, so is it possible that the second app
> > goes through 5.a/b with the same <payload> as the first app and then
> > generates the same shared secret? If so, the second app can skip steps
> > 6-7 and create connections with servers, claiming that it is also AppX.
> >
> > This problem assumes that the client jar is malicious. Maybe we don't
> > need to consider such a situation for now; I'm just thinking about the
> > possibility.
> >
> > Thanks,
> > Keyong Zhou
> >
> > On Sat, Sep 16, 2023 at 14:03, Chandni Singh <[email protected]> wrote:
> >
> > > Hi Keyong,
> > > Thanks for reviewing the proposal.
> > > 1. Should we store the shared secrets in Ratis among masters since the
> > > leader may change at any time?
> > > That's a good point. The secret should be stored in Ratis, as you've
> > > mentioned, because the leader can change at any time. Mridul and I
> > > discussed this but haven't yet included it in the document. We will
> > > need to enable TLS communication between the masters, which I believe
> > > Ratis supports. Ratis also maintains a local log where state
> > > information is persisted. Since we're dealing with secrets, encryption
> > > may be necessary, although that might be beyond the current scope.
> > > Once we add support for encryption at rest, we can then implement
> > > encrypted secret storage within Ratis.
> > >
> > > 2. In case of worker graceful restart, should the worker store the
> > > shared secret in leveldb before it stops, or ask the master after it
> > > restarts? (The latter seems to be necessary)
> > > In the preferred approach, where workers can retrieve the secret from
> > > the master, this method should suffice even after a worker restarts.
> > > Although this does increase the load on the master for sharing the
> > > secret, I don't believe it worsens the situation when employed during
> > > graceful restarts. Storing the information in LevelDB is also an
> > > option; however, since we're storing secrets, encryption would be
> > > advisable. As it stands, even ESS stores unencrypted secrets in
> > > LevelDB, which is unacceptable for applications with strict security
> > > requirements.
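
The "ask the master after restart" option can be sketched as an in-memory cache on the worker that falls back to a master lookup on a miss. This is only an illustration under stated assumptions: `WorkerSecretCacheSketch`, `secretFor`, and the `masterLookup` function are hypothetical stand-ins for whatever RPC the design ends up using, and nothing is persisted to disk.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a worker keeps secrets only in memory and asks the
// master when it has no entry, for example right after a restart.
final class WorkerSecretCacheSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for an RPC to the master; returns empty if the app is unknown.
    private final Function<String, Optional<String>> masterLookup;

    WorkerSecretCacheSketch(Function<String, Optional<String>> masterLookup) {
        this.masterLookup = masterLookup;
    }

    Optional<String> secretFor(String appId) {
        String cached = cache.get(appId);
        if (cached != null) {
            return Optional.of(cached);
        }
        // Cache miss: fetch from the master and remember the result.
        Optional<String> fetched = masterLookup.apply(appId);
        fetched.ifPresent(secret -> cache.putIfAbsent(appId, secret));
        return fetched;
    }
}
```

Because the cache lives only in memory, a restarted worker starts empty and pays one master lookup per application, which is the extra master load mentioned above; in exchange, no unencrypted secret is written to local storage.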
> > >
> > > 3. In 5.a/b, what happens if two applications send the same payload?
> > > Will they get the same shared secret?
> > > Do you mean that two applications claim to be, let's say, AppX? If so,
> > > the application that first registers with the master as AppX will
> > > proceed to communicate with the Celeborn service. Once an application
> > > has registered as AppX, the master will not permit any other
> > > application to register with the same identifier. The master will then
> > > terminate the connection with the second application attempting to
> > > register as AppX. That application will not be able to connect to the
> > > service any longer, as its secret was never registered with the master.
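
The first-registration-wins behaviour described in the answer to question 3 can be sketched as below. This is a hedged illustration, not the proposed master implementation: `AppRegistrySketch`, `register`, and `authenticate` are hypothetical names, and the in-memory map stands in for state that the thread suggests would live in Ratis.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: the master accepts only the first registration for a
// given application id and later checks presented secrets against it.
final class AppRegistrySketch {
    private final ConcurrentMap<String, String> secrets = new ConcurrentHashMap<>();

    // Returns true only for the first application to claim this id.
    boolean register(String appId, String secret) {
        return secrets.putIfAbsent(appId, secret) == null;
    }

    // Returns true only if the presented secret matches the registered one.
    boolean authenticate(String appId, String presentedSecret) {
        String registered = secrets.get(appId);
        if (registered == null) {
            return false;
        }
        // MessageDigest.isEqual compares the byte arrays in constant time.
        return MessageDigest.isEqual(
            registered.getBytes(StandardCharsets.UTF_8),
            presentedSecret.getBytes(StandardCharsets.UTF_8));
    }
}
```

A second caller claiming to be AppX gets `false` from `register`, and its own secret never authenticates because it was never stored. A caller that presents the identical secret would still pass `authenticate`, which is the malicious-client-jar case raised later in the thread and considered out of scope.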
> > >
> > > 4. The doc says TTL is out of scope, is there a plan to support TTL in
> > > the future?
> > > No, we don't have any plans to support it as of now.
> > >
> > > I am going to incorporate some of these points in the doc as well.
> > >
> > > - Chandni
> > >
> > > On Fri, Sep 15, 2023 at 10:21 PM Keyong Zhou <[email protected]> wrote:
> > >
> > > > Hi Chandni & Mridul,
> > > >
> > > > Thanks for proposing this great feature! I've reviewed the design doc
> > > > and it LGTM overall. Still, I have a few questions that are not
> > > > addressed in the proposal (maybe too detailed):
> > > >
> > > > 1. Should we store the shared secrets in Ratis among masters since the
> > > > leader may change at any time?
> > > > 2. In case of worker graceful restart, should the worker store the
> > > > shared secret in leveldb before it stops, or ask the master after it
> > > > restarts? (The latter seems to be necessary)
> > > > 3. In 5.a/b, what happens if two applications send the same payload?
> > > > Will they get the same shared secret?
> > > > 4. The doc says TTL is out of scope, is there a plan to support TTL in
> > > > the future?
> > > >
> > > > Thanks,
> > > > Keyong Zhou
> > > >
> > > > On Fri, Sep 15, 2023 at 06:34, Chandni Singh <[email protected]> wrote:
> > > >
> > > > > Hello Celeborn community,
> > > > >
> > > > > We have a proposal to add authentication to Celeborn:
> > > > >
> > > > >
> > > > > https://docs.google.com/document/d/1D1U2COYhS3ob7l0t2WghRhBk_Fci9RGx-2FBXA3nvXk/edit#heading=h.m97qw1fpl5kv
> > > > >
> > > > > Would really appreciate feedback from the community on this
> > > > > proposal.
> > > > >
> > > > > Please let me know if there is a particular format that the
> > > > > Celeborn community follows for proposals and I will convert it into
> > > > > that format.
> > > > >
> > > > > Thank you
> > > > > Chandni
> > > > >
> > > >
> > >
> >
>
