RE: [VOTE] SPARK 2.4.0 (RC3)

2018-10-10 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
You might need to change the date (Oct 1 has already passed).

>> The vote is open until October 1 PST and passes if a majority +1 PMC votes 
>> are cast, with
>> a minimum of 3 +1 votes.

Regards
Surya

From: Wenchen Fan 
Sent: Wednesday, October 10, 2018 10:20 PM
To: Spark dev list 
Subject: Re: [VOTE] SPARK 2.4.0 (RC3)

I'm adding my own +1, since there are no known blocker issues. The correctness 
issue has been fixed, the streaming Java API problem has been resolved, and we 
have upgraded to Scala 2.12.7.

On Thu, Oct 11, 2018 at 12:46 AM Wenchen Fan <cloud0...@gmail.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.0.

The vote is open until October 1 PST and passes if a majority +1 PMC votes are 
cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc3 (commit 
8e4a99bd201b9204fec52580f19ae70a229ed94e):
https://github.com/apache/spark/tree/v2.4.0-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1289

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the current
RC, and see if anything important breaks. In Java/Scala, you can add the staging
repository to your project's resolvers and test with the RC (make sure to clean
up the artifact cache before/after so you don't end up building with an
out-of-date RC going forward).
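
For example, a minimal PySpark check could look like the following. This is
only a sketch; it assumes the pyspark tarball is published under the RC bin
directory listed above.

python -m venv spark-rc-test
source spark-rc-test/bin/activate
# Install the RC's PySpark distribution (tarball name assumed from the bin directory above).
pip install https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz
# Smoke test: confirm the version imports, then run your existing workload against it.
python -c "import pyspark; print(pyspark.__version__)"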

===
What should happen to JIRA tickets still targeting 2.4.0?
===

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.4.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


RE: Python kubernetes spark 2.4 branch

2018-09-26 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files 
http://10.75.145.25:80/Spark/getNN.py” are downloaded and available in 
the container at 
“/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
I guess we need to export PYTHONPATH with this path as well, with the following 
code change in entrypoint.sh:


if [ -n "$PYSPARK_FILES" ]; then
  PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to

if [ -n "$PYSPARK_FILES" ]; then
  PYTHONPATH="$PYTHONPATH:"
fi
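
Concretely, a rough sketch of what I mean is below. The download directory is
per-application, so it would have to be discovered at runtime; the glob here is
only illustrative and not the exact code I am proposing.

if [ -n "$PYSPARK_FILES" ]; then
  # Illustrative only: add each per-app download directory (observed above
  # under /var/data/spark-*/spark-*) to PYTHONPATH so --py-files resolve.
  for d in /var/data/spark-*/spark-*; do
    [ -d "$d" ] && PYTHONPATH="$PYTHONPATH:$d"
  done
fi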
Let me know if this approach is fine.

Please correct me if my understanding is wrong with this approach.

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko ; liyinan...@gmail.com
Cc: Spark dev list ; u...@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/ Yinan,
Yes my test case is also similar to the one described in 
https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master 
k8s://https://10.75.145.23:8443 --conf 
spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf 
--py-files http://10.75.145.25:80/Spark/getNN.py 
http://10.75.145.25:80/Spark/test.py

Following is the error observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in 

from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same kind of behaviour as mentioned in 
https://issues.apache.org/jira/browse/SPARK-24736 (the file is downloaded and 
available in the pod).
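
For reference, the file's presence in the driver pod can be confirmed with
something along these lines (the pod name is a placeholder; these are generic
kubectl checks, not taken from the JIRA):

# Locate the downloaded dependency inside the driver pod.
kubectl exec <driver-pod> -- find /var/data -name 'getNN.py'
# Check whether its directory is on PYTHONPATH.
kubectl exec <driver-pod> -- env | grep PYTHONPATH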

The same happens with local files as well:

./spark-submit --deploy-mode cluster --master 
k8s://https://10.75.145.23:8443 --conf 
spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf 
--py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py depends on getNN.py.


But the same works in the Spark 2.2 k8s branch.


Regards
Surya

From: Ilan Filonenko <i...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com>; 
Spark dev list <dev@spark.apache.org>; u...@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own 
image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - 
IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:
Hi,
I am trying to run Spark Python test cases on k8s based on tag spark-2.4-rc1. 
When the dependent files are passed through the --py-files option, they are not 
getting resolved by the main Python script. Please let me know if this is a 
known issue.

Regards
Surya



RE: Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Ilan/ Yinan,
Yes my test case is also similar to the one described in 
https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master 
k8s://https://10.75.145.23:8443 --conf 
spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf 
--py-files http://10.75.145.25:80/Spark/getNN.py 
http://10.75.145.25:80/Spark/test.py

Following is the error observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in 

from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same kind of behaviour as mentioned in 
https://issues.apache.org/jira/browse/SPARK-24736 (the file is downloaded and 
available in the pod).

The same happens with local files as well:

./spark-submit --deploy-mode cluster --master 
k8s://https://10.75.145.23:8443 --conf 
spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf 
--py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py depends on getNN.py.


But the same works in the Spark 2.2 k8s branch.


Regards
Surya

From: Ilan Filonenko 
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) 
; Spark dev list ; 
u...@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own 
image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - 
IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:
Hi,
I am trying to run Spark Python test cases on k8s based on tag spark-2.4-rc1. 
When the dependent files are passed through the --py-files option, they are not 
getting resolved by the main Python script. Please let me know if this is a 
known issue.

Regards
Surya



Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi,
I am trying to run Spark Python test cases on k8s based on tag spark-2.4-rc1. 
When the dependent files are passed through the --py-files option, they are not 
getting resolved by the main Python script. Please let me know if this is a 
known issue.

Regards
Surya



RE: Support STS to run in k8s deployment with spark deployment mode as cluster

2018-09-15 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi,
Following is the bug to track the same.

https://issues.apache.org/jira/browse/SPARK-25442

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Sunday, September 16, 2018 10:15 AM
To: dev@spark.apache.org; Ilan Filonenko 
Cc: u...@spark.apache.org; Imandi, Srinivas (Nokia - IN/Bangalore) 
; Chakradhar, N R (Nokia - IN/Bangalore) 
; Rao, Abhishek (Nokia - IN/Bangalore) 

Subject: Support STS to run in k8s deployment with spark deployment mode as 
cluster

Hi All,
I would like to propose the following changes to support STS (Spark Thrift 
Server) running in k8s deployments with the Spark deploy mode set to cluster.

PR: https://github.com/apache/spark/pull/22433

Can you please review and provide comments?
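
For reference, a hedged sketch of how the Thrift server might be launched in
cluster mode on k8s once this is supported (the class name is the standard STS
entry point; the API server address, image, and jar path below are placeholders,
not values from the PR):

# Placeholders: <k8s-apiserver>, <spark-image>, <spark-hive-thriftserver>.jar
./spark-submit --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver>:8443 \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/jars/<spark-hive-thriftserver>.jar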


Regards
Surya



Support STS to run in k8s deployment with spark deployment mode as cluster

2018-09-15 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi All,
I would like to propose the following changes to support STS (Spark Thrift 
Server) running in k8s deployments with the Spark deploy mode set to cluster.

PR: https://github.com/apache/spark/pull/22433

Can you please review and provide comments?


Regards
Surya



RE: Query on Spark Hive with kerberos Enabled on Kubernetes

2018-07-23 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Sandeep,
Any inputs on this?

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Saturday, July 21, 2018 6:50 PM
To: Sandeep Katta 
Cc: dev@spark.apache.org; u...@spark.apache.org
Subject: RE: Query on Spark Hive with kerberos Enabled on Kubernetes

Hi Sandeep,
Thanks for the response.
I am using the following commands (the XML files hive-site.xml, core-site.xml 
and hdfs-site.xml are made available by exporting them via the HADOOP_CONF_DIR 
environment variable).
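
For example (the path is a placeholder for wherever those client configs
actually live):

# Make hive-site.xml, core-site.xml and hdfs-site.xml visible to spark-submit.
export HADOOP_CONF_DIR=/etc/hadoop/conf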

For HDFS access (this succeeds):
./spark-submit --deploy-mode cluster --master 
k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --kubernetes-namespace 
default --conf spark.kubernetes.kerberos.enabled=true --conf 
spark.kubernetes.kerberos.principal= --conf 
spark.kubernetes.kerberos.keytab= --conf 
spark.kubernetes.driver.docker.image= --conf 
spark.kubernetes.executor.docker.image= --conf 
spark.kubernetes.initcontainer.docker.image= --conf 
spark.kubernetes.resourceStagingServer.uri=http://:1 
../examples/src/main/python/wordcount.py hdfs://:8020/tmp/wordcount.txt


For Hive Access (this is failing):
./spark-submit --deploy-mode cluster --master 
k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --kubernetes-namespace 
default --conf spark.kubernetes.kerberos.enabled=true --files /etc/krb5.conf, 
,../examples/src/main/resources/kv1.txt --conf 
spark.kubernetes.kerberos.principal= --conf 
spark.kubernetes.kerberos.keytab= --conf 
spark.kubernetes.driver.docker.image= --conf 
spark.kubernetes.executor.docker.image= --conf 
spark.kubernetes.initcontainer.docker.image= --conf 
spark.kubernetes.resourceStagingServer.uri=http://:1 
../examples/src/main/python/sql/hive.py

Following is the error:
2018-07-19 04:15:55 INFO  HiveUtils:54 - Initializing HiveMetastoreConnection 
version 1.2.1 using Spark classes.
2018-07-19 04:15:56 INFO  metastore:376 - Trying to connect to metastore with 
URI thrift://vm-10-75-145-54:9083
2018-07-19 04:15:56 ERROR TSaslTransport:315 - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)

If I don’t provide krb5.conf in the above spark-submit, I get an error saying 
it is unable to find any default realm.

One workaround I have found: if I generate a TGT via kinit and copy it into the 
driver pod at /tmp/krb5cc_0, it works fine. I guess this should not be the way 
to do it; the TGT should be generated automatically so that the Hive metastore 
can be accessed. Please let me know if I am doing something wrong.
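
For completeness, the manual workaround amounts to something like this
(principal, keytab path, and pod name are placeholders):

# Obtain a TGT locally, then copy the ticket cache into the driver pod.
kinit -kt /etc/security/keytabs/<user>.keytab <user>@<REALM>
kubectl cp /tmp/krb5cc_0 <driver-pod>:/tmp/krb5cc_0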

Regards
Surya

From: Sandeep Katta [mailto:sandeep0102.opensou...@gmail.com]
Sent: Friday, July 20, 2018 9:59 PM
To: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com>
Cc: dev@spark.apache.org; u...@spark.apache.org
Subject: Re: Query on Spark Hive with kerberos Enabled on Kubernetes

Can you please tell us what exception you've got, and any logs for the same?

On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) 
<suryanarayana.garlap...@nokia.com> wrote:
Hi All,
I am trying to use the Spark 2.2.0 Kubernetes 
(https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) 
code to run Hive queries on a Kerberos-enabled cluster. The spark-submits fail 
for the Hive queries, but pass when I access HDFS. Is this a known limitation, 
or am I doing something wrong? Please let me know. If this is working, can you 
please give an example of running Hive queries?

Thanks.

Regards
Surya


Query on Spark Hive with kerberos Enabled on Kubernetes

2018-07-20 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi All,
I am trying to use the Spark 2.2.0 Kubernetes 
(https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) 
code to run Hive queries on a Kerberos-enabled cluster. The spark-submits fail 
for the Hive queries, but pass when I access HDFS. Is this a known limitation, 
or am I doing something wrong? Please let me know. If this is working, can you 
please give an example of running Hive queries?

Thanks.

Regards
Surya