Java 11 support in Spark 2.5

2020-01-01 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi All,

Wanted to know if Java 11 support will be added in Spark 2.5.
If so, what is the expected timeline for the Spark 2.5 release?

Kind Regards,
Breeta Sinha



RE: RPC timeout error for AES based encryption between driver and executor

2019-03-27 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi Vanzin,

"spark.authenticate" is working properly for our environment (Spark 2.4 on 
Kubernetes).
We have made few code changes through which secure communication between driver 
and executor is working fine using shared spark.authenticate.secret.

Even SASL encryption works, but when we set
spark.network.crypto.enabled true
to enable AES-based encryption, we see the RPC timeout error message sporadically.
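To be concrete, here is a minimal sketch of the two configurations we are 
comparing (the shared secret below is a placeholder, not our actual value):

# SASL-based encryption only -- works fine for us
spark.authenticate            true
spark.authenticate.secret     <shared-secret>

# AES-based RPC encryption -- sporadic RPC timeouts
spark.authenticate            true
spark.authenticate.secret     <shared-secret>
spark.network.crypto.enabled  true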

Kind Regards,
Breeta


-Original Message-
From: Marcelo Vanzin  
Sent: Tuesday, March 26, 2019 9:10 PM
To: Sinha, Breeta (Nokia - IN/Bangalore) 
Cc: user@spark.apache.org
Subject: Re: RPC timeout error for AES based encryption between driver and 
executor

I don't think "spark.authenticate" works properly with k8s in 2.4 (which would 
make it impossible to enable encryption since it requires authentication). I'm 
pretty sure I fixed it in master, though.

On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore) 
 wrote:
>
> Hi All,
>
>
>
> We are trying to enable RPC encryption between driver and executor. Currently 
> we're working on Spark 2.4 on Kubernetes.
>
>
>
> According to the Apache Spark security document 
> (https://spark.apache.org/docs/latest/security.html) and our understanding of 
> it, Spark supports AES-based encryption for RPC 
> connections. There is also support for SASL-based encryption, although it 
> should be considered deprecated.
>
>
>
> Setting spark.network.crypto.enabled to true enables AES-based RPC encryption.
>
> However, when we enable AES-based encryption between driver and executor, we 
> observe very sporadic behaviour in the communication between driver and 
> executor in the logs.
>
>
>
> Following are the options and the values we used for 
> enabling encryption:-
>
>
>
> spark.authenticate true
>
> spark.authenticate.secret 
>
> spark.network.crypto.enabled true
>
> spark.network.crypto.keyLength 256
>
> spark.network.crypto.saslFallback false
>
>
>
> A snippet of the executor log is provided below:-
>
> Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: 
> Ask timeout before connecting successfully
>
> Caused by: java.util.concurrent.TimeoutException: Cannot receive any 
> reply from 
> sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 
> 120 seconds
>
>
>
> But there is no error message, or any message from the executor, in the 
> driver log at the same timestamp.
>
>
>
> We also tried increasing spark.network.timeout, but no luck.
>
>
>
> This issue is seen sporadically, as the following observations were 
> noted:-
>
> 1) Sometimes, enabling AES encryption works completely fine.
>
> 2) Sometimes, enabling AES encryption works fine for around 10 consecutive 
> spark-submits, but the next spark-submit hangs with 
> the above-mentioned error in the executor log.
>
> 3) Also, there are times when enabling AES encryption does not work at all, 
> as it keeps spawning more than 50 executors, all of which fail 
> with the above-mentioned error.
>
> Even setting spark.network.crypto.saslFallback to true didn't help.
>
>
>
> Things are working fine when we enable SASL encryption, that is, only 
> setting the following parameters:-
>
> spark.authenticate true
>
> spark.authenticate.secret 
>
>
>
> I have attached the log file containing the detailed error message. Please let us 
> know if any configuration is missing or if anyone has faced the same issue.
>
>
>
> Any leads would be highly appreciated!!
>
>
>
> Kind Regards,
>
> Breeta Sinha
>
>
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org



--
Marcelo


RPC timeout error for AES based encryption between driver and executor

2019-03-26 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi All,

We are trying to enable RPC encryption between driver and executor. Currently 
we're working on Spark 2.4 on Kubernetes.

According to the Apache Spark security document 
(https://spark.apache.org/docs/latest/security.html) and our understanding of 
it, Spark supports AES-based encryption for RPC 
connections. There is also support for SASL-based encryption, although it 
should be considered deprecated.

Setting spark.network.crypto.enabled to true enables AES-based RPC encryption.
However, when we enable AES-based encryption between driver and executor, we 
observe very sporadic behaviour in the communication between driver and 
executor in the logs.

Following are the options and the values we used for enabling 
encryption:-

spark.authenticate true
spark.authenticate.secret 
spark.network.crypto.enabled true
spark.network.crypto.keyLength 256
spark.network.crypto.saslFallback false
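
For reference, a rough sketch of how these options might be passed at 
spark-submit time on Kubernetes (the API server address, shared secret, image, 
class and application jar below are placeholders, not our actual values):

./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:<port> \
  --deploy-mode cluster \
  --conf spark.authenticate=true \
  --conf spark.authenticate.secret=<shared-secret> \
  --conf spark.network.crypto.enabled=true \
  --conf spark.network.crypto.keyLength=256 \
  --conf spark.network.crypto.saslFallback=false \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class <main-class> \
  <application-jar>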

A snippet of the executor log is provided below:-
Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: Ask 
timeout before connecting successfully
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 
sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 120 seconds

But there is no error message, or any message from the executor, in the driver 
log at the same timestamp.

We also tried increasing spark.network.timeout, but no luck.
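For example, an increase might look like:

  --conf spark.network.timeout=600s

(the 600s value here is only an illustration; the default is 120 seconds, which 
matches the timeout seen in the executor log above).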

This issue is seen sporadically, as the following observations were noted:-
1) Sometimes, enabling AES encryption works completely fine.
2) Sometimes, enabling AES encryption works fine for around 10 consecutive 
spark-submits, but the next spark-submit hangs with 
the above-mentioned error in the executor log.
3) Also, there are times when enabling AES encryption does not work at all, 
as it keeps spawning more than 50 executors, all of which fail 
with the above-mentioned error.
Even setting spark.network.crypto.saslFallback to true didn't help.

Things are working fine when we enable SASL encryption, that is, only setting 
the following parameters:-
spark.authenticate true
spark.authenticate.secret 

I have attached the log file containing the detailed error message. Please let us 
know if any configuration is missing or if anyone has faced the same issue.

Any leads would be highly appreciated!!

Kind Regards,
Breeta Sinha



rpc_timeout_error.log
Description: rpc_timeout_error.log

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

RE: Got fatal error when running spark 2.4.0 on k8s

2019-02-13 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi Dawn,

Probably you are providing an incorrect image (it must be a Java image), an 
incorrect master IP, or the wrong service account. Please verify the pod’s 
permissions for the service account (‘spark’ in your case).
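
If RBAC permissions are the issue, a service account with sufficient access can 
be set up along these lines (the ‘spark’ account name and ‘default’ namespace 
are assumptions; adjust them to your cluster):

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

The driver then runs under that account via the 
spark.kubernetes.authenticate.driver.serviceAccountName=spark setting you are 
already passing.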

I have tried executing the same program as below:

./spark-submit --master k8s://https://<k8s-apiserver>:<port> \
  --deploy-mode cluster --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.namespace=<namespace> \
  local:///opt/spark/examples/jars/spark-examples*.jar 5

And, I was able to see “Pi is roughly 3.139774279548559” in the pod’s output 
log.

Hope this will help! 

Regards,
Breeta


From: dawn breaks <2005dawnbre...@gmail.com>
Sent: Wednesday, February 13, 2019 1:52 PM
To: user@spark.apache.org
Subject: Got fatal error when running spark 2.4.0 on k8s

We submit a Spark job to k8s with the following command, and the driver pod gets an 
error and exits. Can anybody help us solve it?

 ./bin/spark-submit \
--master k8s://https://172.21.91.48:6443 \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=xxxRepo/spark:v2.4.0 \
local:///opt/spark/examples/jars/spark-examples*.jar \
5


The detailed error info is as follows:

2019-02-13 07:13:06 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: An error has 
occurred.
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
at 
io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:167)
at 
org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:84)
at 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:64)
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 20 more
Caused by: java.security.cert.CertificateException: Could not parse 
certificate: java.io.IOException: Empty input
at 
sun.security.provider.X509Factory.engineGenerateCertificate(X509Factory.java:110)
at 
java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:339)
at 
io.fabric8.kubernetes.client.internal.CertUtils.createTrustStore(CertUtils.java:93)
at 
io.fabric8.kubernetes.client.internal.CertUtils.createTrustStore(CertUtils.java:71)
at 
io.fabric8.kubernetes.client.internal.SSLUtils.trustManagers(SSLUtils.java:114)
at 
io.fabric8.kubernetes.client.internal.SSLUtils.trustManagers(SSLUtils.java:93)
at 

Local Storage Encryption - Spark ioEncryption

2019-01-22 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi All,

We are trying to enable encryption between spark-shuffle and the local filesystem, 
and we wanted to clarify our understanding of this. Currently we're working on 
Spark 2.4.

According to our understanding of Spark's support for Local Storage Encryption, 
that is, "enabling local disk I/O encryption", it looks like the following 
properties:-
spark.io.encryption.enabled
spark.io.encryption.keySizeBits
spark.io.encryption.keygen.algorithm
spark.io.encryption.commons.config.*
need to be enabled only in Spark and not in spark-shuffle's configuration 
properties.
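
For reference, a minimal sketch of how we pass these properties at spark-submit 
time (the key size, algorithm, class and application jar below are illustrative 
placeholders, not necessarily the values we use):

./bin/spark-submit \
  --conf spark.io.encryption.enabled=true \
  --conf spark.io.encryption.keySizeBits=256 \
  --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
  --class <main-class> \
  <application-jar>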

So, when performing spark-submit with the external shuffle service, we only see 
ioEncryption-related messages in the driver log when the ioEncryption properties 
are enabled in the configuration used for spark-submit. When the ioEncryption 
properties are enabled only in spark-shuffle's configuration, we do not 
see any ioEncryption-related messages in the shuffle logs.

We have followed the below links:-
https://spark.apache.org/docs/latest/security.html
https://dzone.com/articles/securing-apache-spark-shuffle-using-apache-commons
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/configuring-spark/content/configuring_spark_for_wire_encryption.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SerializerManager.html

Can you please confirm whether our understanding is correct that the 
ioEncryption-related properties need to be enabled only in Spark?

Thanks.
Breeta