Hi All,

We are trying to enable RPC encryption between driver and executor. Currently 
we're working on Spark 2.4 on Kubernetes.

According to the Apache Spark Security documentation 
(https://spark.apache.org/docs/latest/security.html) and our understanding of 
it, Spark supports AES-based encryption for RPC connections. There is also 
support for SASL-based encryption, although it should be considered deprecated.

Setting spark.network.crypto.enabled to true enables AES-based RPC encryption. 
However, when we enable AES-based encryption between driver and executor, we 
observe very sporadic behaviour in the communication between them, as seen in 
the logs.

Following are the options and the values we used to enable encryption:-

spark.authenticate true
spark.authenticate.secret <some-value>
spark.network.crypto.enabled true
spark.network.crypto.keyLength 256
spark.network.crypto.saslFallback false
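For reference, these settings are passed on the spark-submit command line; a 
sketch of our invocation is below (the API server URL, image, class, and jar 
are placeholders, not our real values):

```shell
# Sketch of the spark-submit invocation with RPC encryption enabled.
# <k8s-apiserver>, <spark-image>, <some-value>, <main-class> and
# <application-jar> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.authenticate=true \
  --conf spark.authenticate.secret=<some-value> \
  --conf spark.network.crypto.enabled=true \
  --conf spark.network.crypto.keyLength=256 \
  --conf spark.network.crypto.saslFallback=false \
  --class <main-class> \
  <application-jar>
```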

A snippet of the executor log is provided below:-
Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: Ask 
timeout before connecting successfully
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 
sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 120 seconds

However, there is no error message, nor any message from the executor at all, 
in the driver log for the same timestamp.

We also tried increasing spark.network.timeout, but no luck.

This issue is seen sporadically, as per the following observations:-
1) Sometimes, enabling AES encryption works completely fine.
2) Sometimes, it works fine for around 10 consecutive spark-submits, but the 
next spark-submit hangs with the above-mentioned error in the executor log.
3) At other times, enabling AES encryption does not work at all: the driver 
keeps spawning executors (more than 50), each of which fails with the 
above-mentioned error.
Even setting spark.network.crypto.saslFallback to true didn't help.

Things work fine when we enable only SASL encryption, that is, when we set 
only the following parameters:-
spark.authenticate true
spark.authenticate.secret <some-value>

I have attached a log file containing the detailed error message. Please let 
us know if any configuration is missing or if anyone has faced the same issue.

Any leads would be highly appreciated!

Kind Regards,
Breeta Sinha

Attachment: rpc_timeout_error.log
