Hi Celeborn Community,

We had a sync with some of the community folks and excellent points were brought up. I will summarize the points here:

1. Backward compatibility: We will make the changes in a backward compatible manner. One of the reasons for backward incompatibility in our proposal was that we wanted to have separate Netty servers on the Master: one serving client requests and one serving internal components (workers and other masters). However, in the discussion we established that backward compatibility is necessary, so the changes will be flag-guarded by a config and secured communication will use different ports, which will enable seamless rolling upgrades in the production environment.
2. TLS will be optional even when authentication is enabled.
3. Supporting a TTL for the application secret, which can be added in the future, will not make the protocol backward incompatible.
4. Propagation of the application secret to Workers should not overwhelm the Master. Most of the time, the Master will be able to push the secrets to Workers before the Workers receive any request from the client. There will be corner cases when a Worker doesn't get this secret and has to pull it from the Master. This is an exceptional case and its frequency will be quite low, so we don't expect it to overwhelm the Master. We will add metrics to see how frequently this happens.
5. We haven't provided the details of how the secret is propagated from LifecycleManager to executors. In Apache Spark, the secret is already available to all the executors. However, for other platforms it might not be. One option is that the LifecycleManager sends the secret to mappers and reducers (in addition to other metadata) as a response to "RegisterShuffle" and "FileLocationRequest". However, it can also be a separate RPC call. This needs some more investigation. We will add this section to our proposal.
6. In the future, Celeborn can authenticate the user if that is modelled as a SASL mechanism that can be plugged into the SASL framework. We will provide some foundational support for it.

I am going to create an epic JIRA and will soon start contributing changes.

Thank you for all the great feedback,
Chandni

On Mon, Sep 18, 2023 at 7:20 PM Mridul Muralidharan <mri...@gmail.com> wrote:

> To add to what Chandni mentioned, using self-signed certificates and
> trusting them is another (though less secure) practice some deployments
> leverage.
> This ensures encryption over the wire, but does not allow for clients to
> validate identity of the Celeborn server components (so potentially liable
> to DNS spoofing, MITM attacks, etc).
> This might or might not be acceptable to deployments.
>
> Note that the proposal calls securing with TLS as strongly recommended, but
> not mandatory.
>
> Regards,
> Mridul
>
>
> On Mon, Sep 18, 2023 at 11:37 AM Chandni Singh <singh.chan...@gmail.com>
> wrote:
>
> > Hi Zhongqiang,
> > Yes, you are right. TLS implementation relies on digital certificates which
> > are usually obtained from a trusted CA.
> > In my experience, many organizations establish their own internal CAs to
> > issue certificates for their internal networks, thus acting as trusted
> > issuers for various services within the organization.
> >
> > In scenarios where an internal CA infrastructure is not available and we
> > want to avoid a public trusted CA because they are paid, services may
> > resort to using self-signed certificates. To establish trust in these
> > self-signed certificates, clients must be explicitly configured to
> > recognize them, either by installing them into the client's native trust
> > store or by using a custom trust store that includes these certificates.
> > These certificates are securely distributed to all relevant client-hosting
> > machines using an out-of-band method. Once the trust store is properly
> > configured, the client-side TLS settings can be adjusted to reference this
> > trust store, thereby ensuring secure communication
> >
> > Chandni
> >
> > On Mon, Sep 18, 2023 at 5:18 AM Zhongqiang Chen <zhongqiangc...@apache.org>
> > wrote:
> >
> > > Hi Chandni,
> > >
> > > I have a question about how to implement TLS handshake and how to obtain
> > > the certificate?
> > > Based on my understanding, TLS implementation generally relies on digital
> > > certificates which are obtained from a trusted certificate authority (CA).
> > > It requires some money to obtain a CA certificate.
> > >
> > > Thanks,
> > > Zhongqiang Chen
> > >
> > > At 2023-09-15 06:34:02, "Chandni Singh" <singh.chan...@gmail.com> wrote:
> > > >Hello Celeborn community,
> > > >
> > > >We have a proposal to add authentication to Celeborn:
> > > >https://docs.google.com/document/d/1D1U2COYhS3ob7l0t2WghRhBk_Fci9RGx-2FBXA3nvXk/edit#heading=h.m97qw1fpl5kv
> > > >
> > > >Would really appreciate feedback from the community on this proposal.
> > > >
> > > >Please let me know if there is a particular format that the Celeborn
> > > >community follows for proposals and I will convert it into that format.
> > > >
> > > >Thank you
> > > >Chandni
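
Point 4 of the summary at the top of this thread describes secrets being pushed from the Master to Workers, with a rare pull by the Worker when the push has not arrived, plus a metric for how often that happens. A minimal sketch of that idea follows. It is not Celeborn code: the class `WorkerSecretCache`, its methods, and the `pullFromMaster` function (a stand-in for whatever RPC the proposal ends up defining) are all hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical worker-side cache of application secrets (not a Celeborn class). */
final class WorkerSecretCache {
    // Secrets pushed by the Master or pulled by this worker, keyed by application id.
    private final ConcurrentHashMap<String, String> secrets = new ConcurrentHashMap<>();
    // Counts fallback pulls so the frequency of the corner case can be observed.
    private final AtomicLong pullCount = new AtomicLong();
    // Assumed stand-in for an RPC to the Master; returns empty if the app is unknown.
    private final Function<String, Optional<String>> pullFromMaster;

    WorkerSecretCache(Function<String, Optional<String>> pullFromMaster) {
        this.pullFromMaster = pullFromMaster;
    }

    /** Normal path: the Master pushes the secret before any client request arrives. */
    void onPush(String appId, String secret) {
        secrets.put(appId, secret);
    }

    /** Lookup used when authenticating a client; contacts the Master only on a miss. */
    Optional<String> secretFor(String appId) {
        String cached = secrets.get(appId);
        if (cached != null) {
            return Optional.of(cached);
        }
        pullCount.incrementAndGet();
        Optional<String> pulled = pullFromMaster.apply(appId);
        // Remember a successful pull so later requests do not reach the Master again.
        pulled.ifPresent(s -> secrets.putIfAbsent(appId, s));
        return pulled;
    }

    /** Number of times this worker had to fall back to pulling from the Master. */
    long pullCount() {
        return pullCount.get();
    }
}
```

In this sketch a pushed secret is served without contacting the Master, a missing one triggers a single pull that is then cached, and `pullCount()` is the kind of value that could be exported as the metric mentioned in point 4.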