Re: [PySpark] Failed to add file [file:///tmp/app-submodules.zip] specified in 'spark.submit.pyFiles' to Python path:

2023-08-09 Thread lnxpgn
Yes, both 'ls -l /tmp/app-submodules.zip' and 'hdfs dfs -ls /tmp/app-submodules.zip'
show the file.


On 2023/8/9 22:48, Mich Talebzadeh wrote:
If you are running in cluster mode, that zip file should exist on all
the nodes! Is that the case?


HTH


Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


https://en.everybodywiki.com/Mich_Talebzadeh





On Wed, 9 Aug 2023 at 13:41, lnxpgn  wrote:

Hi,

I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node
in pseudo-distributed mode.

spark-submit --master yarn --deploy-mode cluster --py-files
/tmp/app-submodules.zip app.py

The YARN application ran successfully, but there was a warning log message:


/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_01/pyspark.zip/pyspark/context.py:350:

RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip]
specified in 'spark.submit.pyFiles' to Python path:

If I use an HDFS file:

spark-submit --master yarn --deploy-mode cluster --py-files
hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py

the warning message looks like this:


/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_01/pyspark.zip/pyspark/context.py:350:

RuntimeWarning: Failed to add file
[hdfs://hadoop-namenode:9000/app-submodules.zip] specified in
'spark.submit.pyFiles' to Python path:

The relevant code from context.py:

filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
if not os.path.exists(filepath):
    shutil.copyfile(path, filepath)

It looks like the submitted Python files carry 'file:' or 'hdfs:' URI
schemes, and shutil.copyfile treats the scheme as part of the file name.

I searched but didn't find any useful information. Is this a bug, or
did I do something wrong?
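A minimal, self-contained sketch of the failure mode described above
(an illustration using only the Python standard library; it does not
claim to show PySpark's actual code path or fix):

import shutil
import tempfile
from urllib.parse import urlparse

# shutil.copyfile expects a plain filesystem path, so a 'file:' URI is
# treated as a literal (and non-existent) file name.
src = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
src.write(b"payload")
src.close()

uri = "file://" + src.name
try:
    shutil.copyfile(uri, src.name + ".copy")
except FileNotFoundError as err:
    print("copying the raw URI fails:", err)

# For a local 'file:' URI, stripping the scheme first makes the copy
# work; an 'hdfs:' URI would need a Hadoop client rather than shutil.
shutil.copyfile(urlparse(uri).path, src.name + ".copy")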






Spark Connect, Master, and Workers

2023-08-09 Thread Kezhi Xiong
Hi,

I'm currently learning Spark Connect but have some questions about the
connect server's relation to the master and workers: when I'm using the
connect server, I don't have to start a master alongside it to make
clients work. Is the connect server simply using "local[*]" as its
master? And if I want to add workers for my connect server, is that
supported, and what should I do?

Kezhi
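For reference, a hedged sketch of one way this is commonly set up,
assuming the sbin/start-connect-server.sh script shipped with Spark 3.4
forwards standard spark-submit options such as --master (the host name
below is a placeholder):

# Run the Connect server against an existing standalone master so its
# work executes on that cluster's workers instead of local[*]
# (assumption: --master is forwarded to spark-submit as usual).
./sbin/start-connect-server.sh \
  --master spark://master-host:7077 \
  --packages org.apache.spark:spark-connect_2.12:3.4.1

Run with no --master, as in the quick-start docs, the server appears to
fall back to local execution, which matches the "local[*]" guess.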


unsubscribe

2023-08-09 Thread heri wijayanto
unsubscribe


Re: dockerhub does not contain apache/spark-py 3.4.1

2023-08-09 Thread Mich Talebzadeh
Hi Mark,

you can build it yourself, no big deal :)

REPOSITORY         TAG                                              IMAGE ID       CREATED         SIZE
sparkpy/spark-py   3.4.1-scala_2.12-11-jre-slim-buster-Dockerfile   a876102b2206   1 second ago    1.09GB
sparkpy/spark      3.4.1-scala_2.12-11-jre-slim-buster-Dockerfile   6f74f7475e01   3 minutes ago   695MB

Based on this base image and these build settings:

ARG java_image_tag=11-jre-slim  ## java 11
FROM openjdk:${java_image_tag}

BASE_OS="buster"
SPARK_VERSION="3.4.1"
SCALA_VERSION="scala_2.12"
DOCKERFILE="Dockerfile"
DOCKERIMAGETAG="11-jre-slim"

You need to modify the file

$SPARK_HOME/kubernetes/dockerfiles/spark/Dockerfile

and replace

#ARG java_image_tag=17-jre
#FROM eclipse-temurin:${java_image_tag}

With

ARG java_image_tag=11-jre-slim
FROM openjdk:${java_image_tag}

which selects Java 11.
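Then, as a sketch (assuming a standard Spark 3.4.1 distribution layout;
the 'sparkpy' repository name and tag are placeholders), the bundled
docker-image-tool.sh builds both images, with -p adding the PySpark
bindings on top of the base image:

# Run from $SPARK_HOME: builds the sparkpy/spark base image and, via -p,
# the sparkpy/spark-py image with the Python bindings.
./bin/docker-image-tool.sh -r sparkpy -t 3.4.1 \
  -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build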

HTH


Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


https://en.everybodywiki.com/Mich_Talebzadeh







On Wed, 9 Aug 2023 at 16:43, Mark Elliot  wrote:

> Hello,
>
> I noticed that the apache/spark-py image for Spark's 3.4.1 release is not
> available (apache/spark@3.4.1 is available). Would it be possible to get
> the 3.4.1 release build for the apache/spark-py image published?
>
> Thanks,
>
> Mark


dockerhub does not contain apache/spark-py 3.4.1

2023-08-09 Thread Mark Elliot
Hello,

I noticed that the apache/spark-py image for Spark's 3.4.1 release is not
available (apache/spark@3.4.1 is available). Would it be possible to get
the 3.4.1 release build for the apache/spark-py image published?

Thanks,

Mark



Re: [PySpark] Failed to add file [file:///tmp/app-submodules.zip] specified in 'spark.submit.pyFiles' to Python path:

2023-08-09 Thread Mich Talebzadeh
If you are running in cluster mode, that zip file should exist on all
the nodes! Is that the case?

HTH


Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


https://en.everybodywiki.com/Mich_Talebzadeh







On Wed, 9 Aug 2023 at 13:41, lnxpgn  wrote:

> Hi,
>
> I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node
> in pseudo-distributed mode.
>
> spark-submit --master yarn --deploy-mode cluster --py-files
> /tmp/app-submodules.zip app.py
>
> The YARN application ran successfully, but there was a warning log message:
>
> /opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_01/pyspark.zip/pyspark/context.py:350:
>
> RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip]
> specified in 'spark.submit.pyFiles' to Python path:
>
> If I use an HDFS file:
>
> spark-submit --master yarn --deploy-mode cluster --py-files
> hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py
>
> the warning message looks like this:
>
> /opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_01/pyspark.zip/pyspark/context.py:350:
>
> RuntimeWarning: Failed to add file
> [hdfs://hadoop-namenode:9000/app-submodules.zip] specified in
> 'spark.submit.pyFiles' to Python path:
>
> The relevant code from context.py:
>
> filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
> if not os.path.exists(filepath):
>     shutil.copyfile(path, filepath)
>
> It looks like the submitted Python files carry 'file:' or 'hdfs:' URI
> schemes, and shutil.copyfile treats the scheme as part of the file name.
>
> I searched but didn't find any useful information. Is this a bug, or
> did I do something wrong?
>
>
>
>
>


[PySpark] Failed to add file [file:///tmp/app-submodules.zip] specified in 'spark.submit.pyFiles' to Python path:

2023-08-09 Thread lnxpgn

Hi,

I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node
in pseudo-distributed mode.


spark-submit --master yarn --deploy-mode cluster --py-files 
/tmp/app-submodules.zip app.py


The YARN application ran successfully, but there was a warning log message:

/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_01/pyspark.zip/pyspark/context.py:350: 
RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip] 
specified in 'spark.submit.pyFiles' to Python path:


If I use an HDFS file:

spark-submit --master yarn --deploy-mode cluster --py-files 
hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py


the warning message looks like this:

/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_01/pyspark.zip/pyspark/context.py:350: 
RuntimeWarning: Failed to add file 
[hdfs://hadoop-namenode:9000/app-submodules.zip] specified in 
'spark.submit.pyFiles' to Python path:


The relevant code from context.py:

filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
if not os.path.exists(filepath):
    shutil.copyfile(path, filepath)

It looks like the submitted Python files carry 'file:' or 'hdfs:' URI
schemes, and shutil.copyfile treats the scheme as part of the file name.


I searched but didn't find any useful information. Is this a bug, or
did I do something wrong?




