Re: pyspark.ml.recommendation is using the wrong python version

2023-09-04 Thread Mich Talebzadeh
Hi,

Have you set the Python environment variables PYSPARK_PYTHON and
PYSPARK_DRIVER_PYTHON correctly?

You can print the environment variables within your PySpark script to
verify this:

import os
print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON:", os.environ.get("PYSPARK_DRIVER_PYTHON"))
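
Note that this only shows the driver side. To check which interpreter the
executors actually launch, a minimal sketch you can run in the same pyspark
session (sc being the SparkContext the shell already provides):

import sys
print("driver python:", sys.version)
# each task imports sys on the executor and reports that interpreter's version
print("executor python:", sc.range(1).map(lambda _: __import__("sys").version).collect())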

You can set these in your .bashrc or in $SPARK_HOME/conf/spark-env.sh
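
For example, something along these lines in $SPARK_HOME/conf/spark-env.sh
(the interpreter path is only an illustration, adjust it to your nodes):

export PYSPARK_PYTHON=/usr/bin/python3.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.7

On YARN the executors read their environment on the worker nodes, so you may
also need to point them at the same interpreter when launching the shell,
e.g. (again, the path is an assumption):

pyspark --master yarn \
  --conf spark.pyspark.python=/usr/bin/python3.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/bin/python3.7 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/bin/python3.7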

HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


On Tue, 5 Sept 2023 at 06:12, Harry Jamison wrote:

> That did not paste well, let me try again
>
>
> I am using python3.7 and spark 2.4.7
>
> I am trying to figure out why my job is using the wrong python version
>
> This is how it is starting up. The logs confirm that I am using Python 3.7,
> but I later see an error message showing it is trying to use 3.8, and I am
> not sure where it is picking that up.
>
>
> SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
>
> Here is my command
> sudo --preserve-env -u spark pyspark --deploy-mode client --jars
> /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar
> --verbose --py-files pullhttp/base_http_pull.py --master yarn
>
> Python 3.7.17 (default, Jun  6 2023, 20:10:10)
>
> [GCC 9.4.0] on linux
>
>
>
> And when I try to run als.fit on my training data I get this
>
> >>> model = als.fit(training)
> [Stage 0:> (0 + 1) / 1]
> 23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in
> stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException:
> Error from python worker:
>   Traceback (most recent call last):
>     File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
>       mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>     File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
>       __import__(pkg_name)
>     File "<frozen importlib._bootstrap>", line 991, in _find_and_load
>     File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
>     File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
>     File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
>     File "<frozen zipimport>", line 259, in load_module
>     File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/__init__.py", line 51, in <module>
>     ...
>     File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
>     File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
>   TypeError: an integer is required (got type bytes)
> PYTHONPATH was:
>   /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/py4j-0.10.7-src.zip
> org.apache.spark.SparkException: No port number in pyspark.daemon's stdout

Re: pyspark.ml.recommendation is using the wrong python version

2023-09-04 Thread Harry Jamison
 That did not paste well, let me try again

I am using python3.7 and spark 2.4.7
I am trying to figure out why my job is using the wrong python version
This is how it is starting up. The logs confirm that I am using Python 3.7,
but I later see an error message showing it is trying to use 3.8, and I am not
sure where it is picking that up.

SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
Here is my command:
sudo --preserve-env -u spark pyspark --deploy-mode client --jars
/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar
--verbose --py-files pullhttp/base_http_pull.py --master yarn
Python 3.7.17 (default, Jun  6 2023, 20:10:10) 
[GCC 9.4.0] on linux


And when I try to run als.fit on my training data I get this
>>> model = als.fit(training)
[Stage 0:> (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
(TID 0, datanode1, executor 2): org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/__init__.py", line 51, in <module>
    ...
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
  TypeError: an integer is required (got type bytes)
PYTHONPATH was:
  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/py4j-0.10.7-src.zip
org.apache.spark.SparkException: No port number in pyspark.daemon's stdout


pyspark.ml.recommendation is using the wrong python version

2023-09-04 Thread Harry Jamison
I am using python3.7 and spark 2.4.7
I am trying to figure out why my job is using the wrong python version
This is how it is starting up. The logs confirm that I am using Python 3.7,
but I later see an error message showing it is trying to use 3.8, and I am not
sure where it is picking that up.

SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
Here is my command
sudo --preserve-env -u spark pyspark --deploy-mode client --jars 
/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar
  --verbose --py-files pullhttp/base_http_pull.py --master yarn

Python 3.7.17 (default, Jun  6 2023, 20:10:10) 


[GCC 9.4.0] on linux


And when I try to run als.fit on my training data I get this
>>> model = als.fit(training)
[Stage 0:> (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
(TID 0, datanode1, executor 2): org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/__init__.py", line 51, in <module>
    ...
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
  TypeError: an integer is required (got type bytes)
PYTHONPATH was:
  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_03/py4j-0.10.7-src.zip
org.apache.spark.SparkException: No port number in pyspark.daemon's stdout



Re: Running Spark Connect Server in Cluster Mode on Kubernetes

2023-09-04 Thread Nagatomi Yasukazu
Hello Mich,
Thank you for your questions. Here are my responses:

> 1. What investigation have you done to show that it is running in local
mode?

I have verified through the History Server's Environment tab that:
- "spark.master" is set to local[*]
- "spark.app.id" begins with local-xxx
- "spark.submit.deployMode" is set to local


> 2. who has configured this kubernetes cluster? Is it supplied by a cloud
vendor?

Our Kubernetes cluster was set up in an on-prem environment using RKE2
(https://docs.rke2.io/).


> 3. Confirm that you have configured Spark Connect Server correctly for
cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes)
and other relevant Spark configurations in your Spark job submission.

Based on the Spark Connect documentation I've read, there doesn't seem to
be any specific settings for cluster mode related to the Spark Connect
Server.

Configuration - Spark 3.4.1 Documentation
https://spark.apache.org/docs/3.4.1/configuration.html#spark-connect

Quickstart: Spark Connect — PySpark 3.4.1 documentation
https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html

Spark Connect Overview - Spark 3.4.1 Documentation
https://spark.apache.org/docs/latest/spark-connect-overview.html

The documentation only suggests running ./sbin/start-connect-server.sh
--packages org.apache.spark:spark-connect_2.12:3.4.0, leaving me at a loss.
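
As far as I can tell the script forwards its extra arguments to spark-submit,
so spark-submit style options can be appended. A sketch only (the API server
address and image name are placeholders, and I cannot confirm this is the
intended way to obtain cluster-mode behaviour):

./sbin/start-connect-server.sh \
  --master k8s://https://kubernetes.example.com:6443 \
  --conf spark.kubernetes.container.image=example/spark-py:3.4.1 \
  --packages org.apache.spark:spark-connect_2.12:3.4.0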


> 4. Can you provide a full spark submit command

Given the nature of Spark Connect, I don't use the spark-submit command.
Instead, as per the documentation, I can execute workloads using only a
Python script. For the Spark Connect Server, I have a Kubernetes manifest
executing "/opt/spark/sbin/start-connect-server.sh --packages
org.apache.spark:spark-connect_2.12:3.4.0".


> 5. Make sure that the Python client script connecting to Spark Connect
Server specifies the cluster mode explicitly, like using --master or
--deploy-mode flags when creating a SparkSession.

The Spark Connect Server operates as a Driver, so it isn't possible to
specify the --master or --deploy-mode flags in the Python client script. If
I try, I encounter a RuntimeError.

like this:
RuntimeError: Spark master cannot be configured with Spark Connect server;
however, found URL for Spark Connect [sc://.../]
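
For completeness, the client side only carries the Spark Connect URL; roughly
(host and port are placeholders):

from pyspark.sql import SparkSession

# the remote URL is the only cluster-related setting the client accepts;
# master / deploy-mode have to be decided by the server process itself
spark = SparkSession.builder.remote("sc://spark-connect.example.com:15002").getOrCreate()
print(spark.range(5).count())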


> 6. Ensure that you have allocated the necessary resources (CPU, memory
etc) to Spark Connect Server when running it on Kubernetes.

Resources are ample, so that shouldn't be the problem.


> 7. Review the environment variables and configurations you have set,
including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables
are not conflicting with cluster mode settings.

I'm unsure if SPARK_NO_DAEMONIZE=1 conflicts with cluster mode settings.
But without it, the process goes to the background when executing
start-connect-server.sh, causing the Pod to terminate prematurely.


> 8. Are you using the correct spark client version that is fully
compatible with your spark on the server?

Yes, I have verified that without using Spark Connect (e.g., using Spark
Operator), Spark applications run as expected.

> 9. check the kubernetes error logs

The Kubernetes logs don't show any errors, and jobs are running in local
mode.


> 10. Insufficient resources can lead to the application running in local
mode

I wasn't aware that insufficient resources could lead to local mode
execution. Thank you for pointing it out.


Best regards,
Yasukazu

On Tue, 5 Sept 2023 at 01:28, Mich Talebzadeh wrote:

>
> personally I have not used this feature myself. However, some points
>
>
>1. What investigation have you done to show that it is running in
>local mode?
>2. who has configured this kubernetes cluster? Is it supplied by a
>cloud vendor?
>3. Confirm that you have configured Spark Connect Server correctly for
>cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes)
>and other relevant Spark configurations in your Spark job submission.
>4. Can you provide a full spark submit command
>5. Make sure that the Python client script connecting to Spark Connect
>Server specifies the cluster mode explicitly, like using --master or
>--deploy-mode flags when creating a SparkSession.
>6. Ensure that you have allocated the necessary resources (CPU, memory
>etc) to Spark Connect Server when running it on Kubernetes.
>7. Review the environment variables and configurations you have set,
>including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these
>variables are not conflicting with cluster mode settings.
>8. Are you using the correct spark client version that is fully
>compatible with your spark on the server?
>9. check the kubernetes error logs
>10. Insufficient resources can lead to the application running in
>local mode
>
> HTH
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>

Re: Running Spark Connect Server in Cluster Mode on Kubernetes

2023-09-04 Thread Mich Talebzadeh
Personally, I have not used this feature. However, some points:


   1. What investigation have you done to show that it is running in local
   mode?
   2. who has configured this kubernetes cluster? Is it supplied by a cloud
   vendor?
   3. Confirm that you have configured Spark Connect Server correctly for
   cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes)
   and other relevant Spark configurations in your Spark job submission (see
   the sketch after this list).
   4. Can you provide a full spark submit command
   5. Make sure that the Python client script connecting to Spark Connect
   Server specifies the cluster mode explicitly, like using --master or
   --deploy-mode flags when creating a SparkSession.
   6. Ensure that you have allocated the necessary resources (CPU, memory
   etc) to Spark Connect Server when running it on Kubernetes.
   7. Review the environment variables and configurations you have set,
   including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables
   are not conflicting with cluster mode settings.
   8. Are you using the correct spark client version that is fully
   compatible with your spark on the server?
   9. check the kubernetes error logs
   10. Insufficient resources can lead to the application running in local
   mode
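
For points 3 to 5, a plain Spark-on-Kubernetes submission normally names the
cluster manager and deploy mode explicitly. As a rough sketch only (API server
address, container image and script are placeholders, and I have not tried
this with Spark Connect itself):

spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=example/spark-py:3.4.1 \
  --conf spark.executor.instances=2 \
  local:///opt/spark/examples/src/main/python/pi.py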

HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom





 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 4 Sept 2023 at 04:57, Nagatomi Yasukazu wrote:

> Hi Cley,
>
> Thank you for taking the time to respond to my query. Your insights on
> Spark cluster deployment are much appreciated.
>
> However, I'd like to clarify that my specific challenge is related to
> running the Spark Connect Server on Kubernetes in Cluster Mode. While I
> understand the general deployment strategies for Spark on Kubernetes, I am
> seeking guidance particularly on the Spark Connect Server aspect.
>
> cf. Spark Connect Overview - Spark 3.4.1 Documentation
> https://spark.apache.org/docs/latest/spark-connect-overview.html
>
> To reiterate, when I connect from an external Python client and execute
> scripts, the server operates in Local Mode instead of the expected
> Kubernetes Cluster Mode (with master as k8s://... and deploy-mode set to
> cluster).
>
> If I've misunderstood your initial response and it was indeed related to
> Spark Connect, I sincerely apologize for the oversight. In that case, could
> you please expand a bit on the Spark Connect-specific aspects?
>
> Do you, or anyone else in the community, have experience with this
> specific setup or encountered a similar issue with Spark Connect Server on
> Kubernetes? Any targeted advice or guidance would be invaluable.
>
> Thank you again for your time and help.
>
> Best regards,
> Yasukazu
>
On Mon, 4 Sept 2023 at 00:23, Cleyson Barros wrote:
>
>> Hi Nagatomi,
>> Use the Apache images, then run your master node, then start your many
>> workers. You can add a command line in the Dockerfiles to call the master
>> using the Docker container names in your service composition. If you wish
>> to run 2 masters (active and standby), follow the instructions in the
>> Apache docs for that configuration; the recipe is the same except for how
>> you start the masters and how you expect your cluster to behave.
>> I hope it helps.
>> Have a nice day :)
>> Cley
>>
>> Nagatomi Yasukazu wrote on Saturday, 2 Sept 2023 at 15:37:
>>
>>> Hello Apache Spark community,
>>>
>>> I'm currently trying to run Spark Connect Server on Kubernetes in
>>> Cluster Mode and facing some challenges. Any guidance or hints would be
>>> greatly appreciated.
>>>
>>> ## Environment:
>>> Apache Spark version: 3.4.1
>>> Kubernetes version:  1.23
>>> Command executed:
>>>  /opt/spark/sbin/start-connect-server.sh \
>>>--packages
>>> org.apache.spark:spark-connect_2.13:3.4.1,org.apache.iceberg:iceberg-spark-runtime-3.4_2.13:1.3.1...
>>> Note that I'm running it with the environment variable
>>> SPARK_NO_DAEMONIZE=1.
>>>
>>> ## Issue:
>>> When I connect from an external Python client and run scripts, it
>>> operates in Local Mode instead of the expected Cluster Mode.
>>>
>>> ## Expected Behavior:
>>> When connecting from a Python client to the Spark Connect Server, I
>>> expect it to run in Cluster Mode.
>>>
>>> If anyone has any insights, advice, or has faced a similar issue, I'd be
>>> grateful for your feedback.
>>> Thank you in advance.
>>>
>>>
>>>