RE: External shuffle service on K8S

2018-10-27 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi,
There is an unmerged PR that can be used against Spark 2.4 (if you are 
interested) or the master branch (3.0). Spark 2.3 on K8s lacks a lot of features; 
I suggest you upgrade to 2.4, which will be released in a few days.

https://github.com/apache/spark/pull/22722

Regards
Surya

From: Matt Cheah 
Sent: Saturday, October 27, 2018 6:12 AM
To: Li Gao ; vincent.gromakow...@gmail.com
Cc: caolijun1...@gmail.com; user@spark.apache.org
Subject: Re: External shuffle service on K8S

Hi there,

Please see https://issues.apache.org/jira/browse/SPARK-25299 for more 
discussion around this matter.

-Matt Cheah

From: Li Gao <eyesofho...@gmail.com>
Date: Friday, October 26, 2018 at 9:10 AM
To: "vincent.gromakow...@gmail.com" <vincent.gromakow...@gmail.com>
Cc: "caolijun1...@gmail.com" <caolijun1...@gmail.com>, 
"user@spark.apache.org" <user@spark.apache.org>
Subject: Re: External shuffle service on K8S

There is an existing 2.2-based external shuffle service on the fork:
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html

You can modify it to suit your needs.

-Li


On Fri, Oct 26, 2018 at 3:22 AM vincent gromakowski 
<vincent.gromakow...@gmail.com> wrote:
No, it's on the roadmap for after 2.4.

On Fri, Oct 26, 2018 at 11:15 AM, 曹礼俊 
<caolijun1...@gmail.com> wrote:
Hi all:

Does Spark 2.3.2 support the external shuffle service on Kubernetes?

I have looked through the 
documentation (https://spark.apache.org/docs/latest/running-on-kubernetes.html) 
but couldn't find any related guidance.

If it is supported, how can I enable it?

Best Regards

Lijun Cao




RE: Python kubernetes spark 2.4 branch

2018-09-26 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files 
http://10.75.145.25:80/Spark/getNN.py” are downloaded and available in 
the container at 
“/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
I believe we also need to add this path to PYTHONPATH, with the following change 
in entrypoint.sh:


if [ -n "$PYSPARK_FILES" ]; then
PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to

if [ -n "$PYSPARK_FILES" ]; then
PYTHONPATH="$PYTHONPATH:"
fi
Let me know if this approach is fine, and please correct me if my understanding 
is wrong.
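Since the per-application download directory name is generated at runtime, the entrypoint.sh change has to discover it rather than hard-code it. A hedged sketch of what that could look like follows; the /var/data/spark-*/spark-* layout is an assumption inferred from the path quoted above (not taken from the actual Spark source), and the temp-root simulation is only there so the logic can run outside a pod:

```shell
# Sketch (assumptions noted above): append the per-application download
# directory, where --py-files end up, to PYTHONPATH in entrypoint.sh.

# Simulate the /var/data/spark-*/spark-* layout under a temp root:
SPARK_DATA_ROOT="$(mktemp -d)"
mkdir -p "$SPARK_DATA_ROOT/spark-aaaa/spark-bbbb"
PYSPARK_FILES="http://10.75.145.25:80/Spark/getNN.py"   # non-empty, as reported

PYTHONPATH="/opt/spark/python"
if [ -n "$PYSPARK_FILES" ]; then
  # Discover the generated download directory at runtime and append it:
  for d in "$SPARK_DATA_ROOT"/spark-*/spark-*; do
    [ -d "$d" ] && PYTHONPATH="$PYTHONPATH:$d"
  done
  export PYTHONPATH
fi
echo "$PYTHONPATH"
```

In the real image the root would be /var/data and the final echo would be unnecessary; the point is only that the discovered directory, not a bare "$PYTHONPATH:", is what needs to be appended.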

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko ; liyinan...@gmail.com
Cc: Spark dev list ; user@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/ Yinan,
Yes, my test case is similar to the one described in 
https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

Following is the error observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
    from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same behaviour as mentioned in 
https://issues.apache.org/jira/browse/SPARK-24736 (the file is downloaded and 
available in the pod).

The same happens with local files as well:

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py depends on getNN.py.


But the same works on the Spark 2.2 k8s branch.
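The failure mode can be reproduced outside Spark entirely, assuming only that the dependency lands in a directory that is absent from PYTHONPATH. The sketch below stands in for Spark's download step with a plain file write; the file contents and directory names are illustrative, not taken from the actual job:

```shell
# Minimal stand-in for the reported scenario: test.py imports getNN, but
# getNN.py sits in a separate "download" directory, as on the driver pod.
WORKDIR="$(mktemp -d)"
DOWNLOAD_DIR="$WORKDIR/downloads"   # stands in for /var/data/spark-*/spark-*
mkdir -p "$DOWNLOAD_DIR"
printf 'def get_nn():\n    return "nn"\n' > "$DOWNLOAD_DIR/getNN.py"
printf 'from getNN import *\nprint(get_nn())\n' > "$WORKDIR/test.py"

# Without the download dir on PYTHONPATH the import fails (ImportError),
# matching the traceback above:
if python3 "$WORKDIR/test.py" >/dev/null 2>&1; then
  echo "unexpectedly succeeded"
fi

# With the download dir on PYTHONPATH the import resolves:
RESULT="$(PYTHONPATH="$DOWNLOAD_DIR" python3 "$WORKDIR/test.py")"
echo "$RESULT"
```

This matches the observation that the file is downloaded and present in the pod: presence on disk is not enough, the directory must also be on the module search path.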


Regards
Surya

From: Ilan Filonenko <i...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) 
<suryanarayana.garlap...@nokia.com>; Spark dev list <d...@spark.apache.org>; 
user@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li 
<liyinan...@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own 
image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - 
IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:
Hi,
I am trying to run Spark Python test cases on k8s based on the tag spark-2.4-rc1. 
When dependent files are passed through the --py-files option, they are not 
being resolved by the main Python script. Please let me know if this is a known 
issue.

Regards
Surya




Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi,
I am trying to run Spark Python test cases on k8s based on the tag spark-2.4-rc1. 
When dependent files are passed through the --py-files option, they are not 
being resolved by the main Python script. Please let me know if this is a known 
issue.

Regards
Surya



RE: Support STS to run in k8s deployment with spark deployment mode as cluster

2018-09-15 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi,
Following is the bug filed to track this:

https://issues.apache.org/jira/browse/SPARK-25442

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Sunday, September 16, 2018 10:15 AM
To: d...@spark.apache.org; Ilan Filonenko 
Cc: user@spark.apache.org; Imandi, Srinivas (Nokia - IN/Bangalore) 
; Chakradhar, N R (Nokia - IN/Bangalore) 
; Rao, Abhishek (Nokia - IN/Bangalore) 

Subject: Support STS to run in k8s deployment with spark deployment mode as 
cluster

Hi All,
I would like to propose the following changes to support running the STS 
(Spark Thrift Server) in k8s deployments with the Spark deploy mode set to 
cluster.

PR: https://github.com/apache/spark/pull/22433

Can you please review and provide comments?


Regards
Surya





RE: [K8S] Driver and Executor Logging

2018-09-08 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi,
Provide the following options in spark-defaults.conf and make sure the 
log4j.properties file is available in both the driver and executor containers:

spark.driver.extraJavaOptions   -Dlog4j.configuration=file:/log4j.properties
spark.executor.extraJavaOptions -Dlog4j.configuration=file:/log4j.properties
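For reference, a minimal log4j.properties along these lines could look as follows. This is an illustrative sketch based on Spark 2.x's stock log4j 1.x template; the appender and level choices (a console appender at DEBUG, matching the experiment described below) are assumptions, not a prescribed configuration:

```properties
# Illustrative log4j.properties (log4j 1.x, as used by Spark 2.x images)
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Without the -Dlog4j.configuration pointer above, Spark falls back to the log4j.properties bundled on its classpath, which is one reason edits to a file in the image can appear to have no effect.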


Regards
Surya

From: Rohit Menon 
Sent: Friday, September 7, 2018 11:02 PM
To: user@spark.apache.org
Subject: [K8S] Driver and Executor Logging

Hello All,

We are trying to use a custom appender for the Spark driver and executor pods. 
However, changes to the log4j.properties file in the Spark container image are 
not taking effect. We even tried simpler changes, like changing the logging 
level to DEBUG.

Has anyone run into similar issues, or successfully changed logging properties 
for Spark on k8s?

Thanks,
Rohit


RE: Query on Spark Hive with kerberos Enabled on Kubernetes

2018-07-23 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Sandeep,
Any inputs on this?

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Saturday, July 21, 2018 6:50 PM
To: Sandeep Katta 
Cc: d...@spark.apache.org; user@spark.apache.org
Subject: RE: Query on Spark Hive with kerberos Enabled on Kubernetes

Hi Sandeep,
Thanks for the response.
I am using the following commands (the XML files hive-site.xml, core-site.xml, 
and hdfs-site.xml are made available by exporting them through 
HADOOP_CONF_DIR).

For HDFS access, which succeeds:
./spark-submit --deploy-mode cluster --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --kubernetes-namespace default --conf spark.kubernetes.kerberos.enabled=true --conf spark.kubernetes.kerberos.principal=<principal> --conf spark.kubernetes.kerberos.keytab=<keytab> --conf spark.kubernetes.driver.docker.image=<driver-image> --conf spark.kubernetes.executor.docker.image=<executor-image> --conf spark.kubernetes.initcontainer.docker.image=<init-container-image> --conf spark.kubernetes.resourceStagingServer.uri=http://<staging-server-ip>:1 ../examples/src/main/python/wordcount.py hdfs://<namenode>:8020/tmp/wordcount.txt


For Hive access (this is failing):
./spark-submit --deploy-mode cluster --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --kubernetes-namespace default --conf spark.kubernetes.kerberos.enabled=true --files /etc/krb5.conf, ,../examples/src/main/resources/kv1.txt --conf spark.kubernetes.kerberos.principal=<principal> --conf spark.kubernetes.kerberos.keytab=<keytab> --conf spark.kubernetes.driver.docker.image=<driver-image> --conf spark.kubernetes.executor.docker.image=<executor-image> --conf spark.kubernetes.initcontainer.docker.image=<init-container-image> --conf spark.kubernetes.resourceStagingServer.uri=http://<staging-server-ip>:1 ../examples/src/main/python/sql/hive.py

Following is the error:
2018-07-19 04:15:55 INFO  HiveUtils:54 - Initializing HiveMetastoreConnection 
version 1.2.1 using Spark classes.
2018-07-19 04:15:56 INFO  metastore:376 - Trying to connect to metastore with 
URI thrift://vm-10-75-145-54:9083
2018-07-19 04:15:56 ERROR TSaslTransport:315 - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)

If I don't provide krb5.conf in the above spark-submit, I get an error saying 
it is unable to find any default realm.

One workaround I found: if I generate a TGT via kinit and copy it into the 
driver pod at /tmp/krb5cc_0, it works fine. I guess this should not be the way 
to do it; the TGT should be generated automatically so that the Hive metastore 
can be accessed. Please let me know if I am doing something wrong.
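In command form, the workaround described above amounts to something like the following. All principal, keytab, namespace, and pod names are placeholders, and the commands are shown dry-run (echoed rather than executed), since actually running them needs a live cluster and KDC:

```shell
# Hypothetical shape of the manual workaround: obtain a TGT on the
# submission host, then copy the ticket cache into the driver pod's
# default cache location (/tmp/krb5cc_0). All names are placeholders.
PRINCIPAL="user@EXAMPLE.REALM"
KEYTAB="/etc/security/keytabs/user.keytab"
NAMESPACE="default"
DRIVER_POD="spark-hive-driver"   # e.g. taken from 'kubectl get pods'

KINIT_CMD="kinit -kt $KEYTAB $PRINCIPAL"
CP_CMD="kubectl cp /tmp/krb5cc_0 $NAMESPACE/$DRIVER_POD:/tmp/krb5cc_0"

# Dry run: print the commands instead of executing them.
echo "$KINIT_CMD"
echo "$CP_CMD"
```

As the author notes, this is only a stopgap; with spark.kubernetes.kerberos.keytab set, the submission machinery is expected to handle ticket creation itself.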

Regards
Surya

From: Sandeep Katta [mailto:sandeep0102.opensou...@gmail.com]
Sent: Friday, July 20, 2018 9:59 PM
To: Garlapati, Suryanarayana (Nokia - IN/Bangalore) 
<suryanarayana.garlap...@nokia.com>
Cc: d...@spark.apache.org; user@spark.apache.org
Subject: Re: Query on Spark Hive with kerberos Enabled on Kubernetes

Can you please tell us what exception you've got, and share any logs for the same?

On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) 
<suryanarayana.garlap...@nokia.com> wrote:
Hi All,
I am trying to use the Spark 2.2.0 Kubernetes fork 
(https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) 
to run Hive queries on a Kerberos-enabled cluster. Spark-submits fail for the 
Hive queries, but pass when I am accessing HDFS. Is this a known limitation, 
or am I doing something wrong? If this is working for you, can you please give 
an example of running Hive queries?

Thanks.

Regards
Surya


