[ https://issues.apache.org/jira/browse/SPARK-24736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578624#comment-16578624 ]
holdenk commented on SPARK-24736:
---------------------------------

cc [~ifilonenko]

> --py-files not functional for non local URLs. It appears to pass non-local URL's into PYTHONPATH directly.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24736
>                 URL: https://issues.apache.org/jira/browse/SPARK-24736
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 2.4.0
>         Environment: Recent 2.4.0 from master branch, submitted on Linux to a KOPS Kubernetes cluster created on AWS.
>            Reporter: Jonathan A Weaver
>            Priority: Minor
>
> My spark-submit
> bin/spark-submit \
>   --master k8s://https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com \
>   --deploy-mode cluster \
>   --name pytest \
>   --conf spark.kubernetes.container.image=412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest \
>   --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
>   --conf spark.kubernetes.authenticate.submission.caCertFile=cluster.ca \
>   --conf spark.kubernetes.authenticate.submission.oauthToken=$TOK \
>   --conf spark.kubernetes.authenticate.driver.oauthToken=$TOK \
>   --py-files "https://s3.amazonaws.com/maxar-ids-fids/screw.zip" \
>   https://s3.amazonaws.com/maxar-ids-fids/it.py
>
> *screw.zip is successfully downloaded and placed in SparkFiles.getRootPath()*
> 2018-07-01 07:33:43 INFO SparkContext:54 - Added file https://s3.amazonaws.com/maxar-ids-fids/screw.zip at https://s3.amazonaws.com/maxar-ids-fids/screw.zip with timestamp 1530430423297
> 2018-07-01 07:33:43 INFO Utils:54 - Fetching https://s3.amazonaws.com/maxar-ids-fids/screw.zip to /var/data/spark-7aba748d-2bba-4015-b388-c2ba9adba81e/spark-0ed5a100-6efa-45ca-ad4c-d1e57af76ffd/userFiles-a053206e-33d9-4245-b587-f8ac26d4c240/fetchFileTemp1549645948768432992.tmp
> *I print out the PYTHONPATH and PYSPARK_FILES environment variables from the driver script:*
> PYTHONPATH /opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-0.10.7-src.zip:/opt/spark/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:*https://s3.amazonaws.com/maxar-ids-fids/screw.zip*
> PYSPARK_FILES https://s3.amazonaws.com/maxar-ids-fids/screw.zip
>
> *I print out sys.path*
> ['/tmp/spark-fec3684b-8b63-4f43-91a4-2f2fa41a1914',
>  u'/var/data/spark-7aba748d-2bba-4015-b388-c2ba9adba81e/spark-0ed5a100-6efa-45ca-ad4c-d1e57af76ffd/userFiles-a053206e-33d9-4245-b587-f8ac26d4c240',
>  '/opt/spark/python/lib/pyspark.zip',
>  '/opt/spark/python/lib/py4j-0.10.7-src.zip',
>  '/opt/spark/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar',
>  '/opt/spark/python/lib/py4j-*.zip',
>  *'/opt/spark/work-dir/https', '//s3.amazonaws.com/maxar-ids-fids/screw.zip',*
>  '/usr/lib/python27.zip',
>  '/usr/lib/python2.7',
>  '/usr/lib/python2.7/plat-linux2',
>  '/usr/lib/python2.7/lib-tk',
>  '/usr/lib/python2.7/lib-old',
>  '/usr/lib/python2.7/lib-dynload',
>  '/usr/lib/python2.7/site-packages']
>
> *URL from PYTHONFILES gets placed in sys.path verbatim with obvious results.*
>
> *Dump of spark config from container.*
> Spark config dumped:
> [(u'spark.master', u'k8s://https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com'),
>  (u'spark.kubernetes.authenticate.submission.oauthToken', u'<present_but_redacted>'),
>  (u'spark.kubernetes.authenticate.driver.oauthToken', u'<present_but_redacted>'),
>  (u'spark.kubernetes.executor.podNamePrefix', u'pytest-1530430411996'),
>  (u'spark.kubernetes.memoryOverheadFactor', u'0.4'),
>  (u'spark.driver.blockManager.port', u'7079'),
>  (u'spark.app.id', u'spark-application-1530430424433'),
>  (u'spark.app.name', u'pytest'),
>  (u'spark.executor.id', u'driver'),
>  (u'spark.driver.host', u'pytest-1530430411996-driver-svc.default.svc'),
>  (u'spark.kubernetes.container.image', u'412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest'),
>  (u'spark.driver.port', u'7078'),
>  (u'spark.kubernetes.python.mainAppResource', u'https://s3.amazonaws.com/maxar-ids-fids/it.py'),
>  (u'spark.kubernetes.authenticate.submission.caCertFile', u'cluster.ca'),
>  (u'spark.rdd.compress', u'True'),
>  (u'spark.driver.bindAddress', u'100.120.0.1'),
>  (u'spark.kubernetes.driver.pod.name', u'spark-pi-driver'),
>  (u'spark.serializer.objectStreamReset', u'100'),
>  (u'spark.files', u'https://s3.amazonaws.com/maxar-ids-fids/it.py,https://s3.amazonaws.com/maxar-ids-fids/screw.zip'),
>  (u'spark.kubernetes.python.pyFiles', u'https://s3.amazonaws.com/maxar-ids-fids/screw.zip'),
>  (u'spark.kubernetes.authenticate.driver.mounted.oauthTokenFile', u'/mnt/secrets/spark-kubernetes-credentials/oauth-token'),
>  (u'spark.submit.deployMode', u'client'),
>  (u'spark.kubernetes.submitInDriver', u'true')]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
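
For anyone trying to reproduce the sys.path symptom without a cluster, here is a minimal Python sketch. It assumes the interpreter simply splits PYTHONPATH on ':' as it does on Linux. The helper name split_pythonpath and the shortened PYTHONPATH value are illustrative only and are not taken from Spark.

```python
# Illustrative only: mimics how a POSIX Python interpreter splits PYTHONPATH.
# The separator is hard-coded (it equals os.pathsep on Linux) so the demo
# behaves the same on every platform.
POSIX_PATHSEP = ":"


def split_pythonpath(value):
    """Split a PYTHONPATH-style string into entries, dropping empty ones."""
    return [entry for entry in value.split(POSIX_PATHSEP) if entry]


# Shortened form of the PYTHONPATH value reported in the issue
# (assumption: the omitted local entries do not affect the result).
pythonpath = (
    "/opt/spark/python/lib/pyspark.zip:"
    "/opt/spark/python/lib/py4j-0.10.7-src.zip:"
    "https://s3.amazonaws.com/maxar-ids-fids/screw.zip"
)

entries = split_pythonpath(pythonpath)
for entry in entries:
    print(repr(entry))
```

The URL's own ':' is treated as a separator, so the last two entries come out as 'https' and '//s3.amazonaws.com/maxar-ids-fids/screw.zip'. The relative 'https' entry is then presumably resolved against the driver's working directory, which would match the '/opt/spark/work-dir/https' entry in the reported sys.path.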