RE: [VOTE] SPARK 2.4.0 (RC3)
It looks like you need to change the date (Oct 1 has already passed).

>> The vote is open until October 1 PST and passes if a majority +1 PMC votes
>> are cast, with a minimum of 3 +1 votes.

Regards
Surya

From: Wenchen Fan
Sent: Wednesday, October 10, 2018 10:20 PM
To: Spark dev list
Subject: Re: [VOTE] SPARK 2.4.0 (RC3)

I'm adding my own +1, since there are no known blocker issues. The correctness issue has been fixed, the streaming Java API problem has been resolved, and we have upgraded to Scala 2.12.7.

On Thu, Oct 11, 2018 at 12:46 AM Wenchen Fan <cloud0...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 2.4.0.

The vote is open until October 1 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc3 (commit 8e4a99bd201b9204fec52580f19ae70a229ed94e):
https://github.com/apache/spark/tree/v2.4.0-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1289

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.

If you're working in PySpark you can set up a virtual env, install the current RC, and see if anything important breaks; in Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).

===========================================
What should happen to JIRA tickets still targeting 2.4.0?
===========================================

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.4.0

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to an appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue.
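As a concrete starting point for the PySpark route, a minimal sketch of the virtual-env flow might look like the following; the pyspark-2.4.0.tar.gz artifact name is an assumption, so check v2.4.0-rc3-bin/ for the actual file name.

# Create an isolated environment for testing the RC (assumes virtualenv is installed).
python -m virtualenv spark-240-rc3-env
source spark-240-rc3-env/bin/activate

# Install the RC's PySpark tarball; the artifact name below is an assumption,
# check https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-bin/ for the real one.
pip install https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz

# Sanity-check the install, then run an existing workload against it.
python -c "import pyspark; print(pyspark.__version__)"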
RE: Python kubernetes spark 2.4 branch
Hi Ilan/Yinan,

My observation is as follows: the dependent files specified with "--py-files http://10.75.145.25:80/Spark/getNN.py" are being downloaded and are available in the container at "/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/". I guess we need to export PYTHONPATH with this path as well, with the following code change in entrypoint.sh, from

if [ -n "$PYSPARK_FILES" ]; then
  PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to

if [ -n "$PYSPARK_FILES" ]; then
  PYTHONPATH="$PYTHONPATH:<directory where the --py-files are downloaded>"
fi

Let me know if this approach is fine. Please correct me if my understanding is wrong.

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko; liyinan...@gmail.com
Cc: Spark dev list; u...@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/Yinan,

Yes, my test case is also similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

Following is the error observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
  File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
    from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same kind of behaviour as mentioned in https://issues.apache.org/jira/browse/SPARK-24736 (the file gets downloaded and is available in the pod).

This is also the same with local files:

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py has dependencies on getNN.py. But the same is working in the spark 2.2 k8s branch.

Regards
Surya

From: Ilan Filonenko <i...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com>; Spark dev list <dev@spark.apache.org>; u...@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to https://issues.apache.org/jira/browse/SPARK-24736 ?
On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:

Can you give more details on how you ran your app? Did you build your own image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:

Hi,

I am trying to run Spark Python test cases on k8s based on the tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main Python script. Please let me know, is this a known issue?

Regards
Surya
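For reference, here is a minimal sketch that reproduces the import pattern discussed in this thread, followed by a quick check of what PYTHONPATH the driver actually sees. The file contents, pod name, and API-server address are hypothetical; only the file names getNN.py and test.py come from the thread.

# Hypothetical file contents; only the names come from the thread.
cat > getNN.py <<'EOF'
def get_nn():
    return "hello from getNN"
EOF

cat > test.py <<'EOF'
from getNN import *
print(get_nn())
EOF

# Ship the dependency via --py-files; <k8s-apiserver> is a placeholder.
./spark-submit --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver>:8443 \
  --py-files ./getNN.py ./test.py

# Check what PYTHONPATH the driver container actually sees, and whether the
# dependency landed in the per-app download directory (<driver-pod> is a placeholder).
kubectl exec <driver-pod> -- printenv PYTHONPATH
kubectl exec <driver-pod> -- sh -c 'ls /var/data/spark-*/spark-*/'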
RE: Support STS to run in k8s deployment with spark deployment mode as cluster
Hi,

Following is the bug to track the same: https://issues.apache.org/jira/browse/SPARK-25442

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Sunday, September 16, 2018 10:15 AM
To: dev@spark.apache.org; Ilan Filonenko
Cc: u...@spark.apache.org; Imandi, Srinivas (Nokia - IN/Bangalore); Chakradhar, N R (Nokia - IN/Bangalore); Rao, Abhishek (Nokia - IN/Bangalore)
Subject: Support STS to run in k8s deployment with spark deployment mode as cluster

Hi All,

I would like to propose the following changes for supporting the STS (Spark Thrift Server) to run in k8s deployments with the Spark deployment mode as cluster.

PR: https://github.com/apache/spark/pull/22433

Can you please review and provide comments?

Regards
Surya
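For anyone trying the PR, here is a rough sketch of the intended usage in cluster mode on k8s. The API-server address and image name are placeholders, and the flags are illustrative rather than the PR's exact interface.

# Illustrative sketch only; <k8s-apiserver> and <spark-image> are placeholders.
# HiveThriftServer2 is the class that sbin/start-thriftserver.sh normally drives
# through spark-submit; "spark-internal" is spark-submit's marker for a built-in class.
./spark-submit \
  --master k8s://https://<k8s-apiserver>:8443 \
  --deploy-mode cluster \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  spark-internal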
RE: Query on Spark Hive with kerberos Enabled on Kubernetes
Hi Sandeep,

Any inputs on this?

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Saturday, July 21, 2018 6:50 PM
To: Sandeep Katta
Cc: dev@spark.apache.org; u...@spark.apache.org
Subject: RE: Query on Spark Hive with kerberos Enabled on Kubernetes

Hi Sandeep,

Thanks for the response. I am using the following commands, with placeholder values shown in angle brackets. (The XML files hive-site.xml, core-site.xml, and hdfs-site.xml are made available by exporting them through the HADOOP_CONF_DIR option.)

For HDFS access, which succeeds:

./spark-submit --deploy-mode cluster --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --kubernetes-namespace default --conf spark.kubernetes.kerberos.enabled=true --conf spark.kubernetes.kerberos.principal=<principal> --conf spark.kubernetes.kerberos.keytab=<keytab> --conf spark.kubernetes.driver.docker.image=<driver-image> --conf spark.kubernetes.executor.docker.image=<executor-image> --conf spark.kubernetes.initcontainer.docker.image=<init-container-image> --conf spark.kubernetes.resourceStagingServer.uri=http://<staging-server>:1 ../examples/src/main/python/wordcount.py hdfs://<namenode>:8020/tmp/wordcount.txt

For Hive access (this is failing):

./spark-submit --deploy-mode cluster --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --kubernetes-namespace default --conf spark.kubernetes.kerberos.enabled=true --files /etc/krb5.conf,<...>,../examples/src/main/resources/kv1.txt --conf spark.kubernetes.kerberos.principal=<principal> --conf spark.kubernetes.kerberos.keytab=<keytab> --conf spark.kubernetes.driver.docker.image=<driver-image> --conf spark.kubernetes.executor.docker.image=<executor-image> --conf spark.kubernetes.initcontainer.docker.image=<init-container-image> --conf spark.kubernetes.resourceStagingServer.uri=http://<staging-server>:1 ../examples/src/main/python/sql/hive.py

Following is the error:

2018-07-19 04:15:55 INFO HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2018-07-19 04:15:56 INFO metastore:376 - Trying to connect to metastore with URI thrift://vm-10-75-145-54:9083
2018-07-19 04:15:56 ERROR TSaslTransport:315 - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)

If I don't provide krb5.conf in the above spark-submit, I get an error saying it is unable to find any default realm.

One workaround I have found: if I generate a TGT by doing kinit and copy it into the driver pod at /tmp/krb5cc_0, it works fine. I guess this should not be the way to do it; the TGT should be generated automatically and used to access the Hive metastore. Please let me know if I am doing something wrong.
Regards
Surya

From: Sandeep Katta <sandeep0102.opensou...@gmail.com>
Sent: Friday, July 20, 2018 9:59 PM
To: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com>
Cc: dev@spark.apache.org; u...@spark.apache.org
Subject: Re: Query on Spark Hive with kerberos Enabled on Kubernetes

Can you please tell us what exception you've got? Any logs for the same?

On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:

Hi All,

I am trying to use the Spark 2.2.0 Kubernetes fork (https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) to run Hive queries on a Kerberos-enabled cluster. Spark-submits fail for the Hive queries, but pass when I am trying to access HDFS. Is this a known limitation, or am I doing something wrong? Please let me know. If this is working, can you please give an example of running Hive queries? Thanks.

Regards
Surya
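For reference, the workaround described above might be scripted roughly as follows; the principal, keytab path, and pod name are placeholders.

# Obtain a TGT on the submitting host; principal and keytab are placeholders.
kinit -kt /etc/security/keytabs/<user>.keytab <principal>

# Copy the resulting ticket cache into the driver pod, as described above.
kubectl cp /tmp/krb5cc_0 <driver-pod>:/tmp/krb5cc_0

# Verify the ticket from inside the pod (assumes klist exists in the image).
kubectl exec <driver-pod> -- klist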