Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-16 Thread Mich Talebzadeh
You can try this:

gsutil cp src/StructuredStream-on-gke.py gs://codes/

where you create a bucket on GCS called codes.


Then in your spark-submit do:


spark-submit --verbose \
   --master k8s://https://$KUBERNETES_MASTER_IP:443 \
   --deploy-mode cluster \
   --name <app-name> \
   --conf spark.kubernetes.driver.container.image=pyspark-example:0.1 \
   --conf spark.kubernetes.executor.container.image=pyspark-example:0.1 \
   gs://codes/StructuredStream-on-gke.py
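The reason the gs:// path works: spark-submit only stages local application files, and only that staging step needs spark.kubernetes.file.upload.path; a resource that is already a remote URI is handed to the driver as-is. A small Python sketch of that rule (illustrative only, not Spark's actual code; the function name is mine):

```python
from urllib.parse import urlparse

def needs_staging(app_resource: str) -> bool:
    """True if spark-submit would have to upload this resource itself,
    which is exactly when spark.kubernetes.file.upload.path must point
    at a shared (Hadoop-compatible) filesystem. Remote URIs such as
    gs://, s3a:// or hdfs:// are passed through untouched."""
    scheme = urlparse(app_resource).scheme
    # No scheme (a bare path) or file:// means a file local to the
    # machine running spark-submit.
    return scheme in ("", "file")

print(needs_staging("src/StructuredStream-on-gke.py"))         # True
print(needs_staging("gs://codes/StructuredStream-on-gke.py"))  # False
```

So once the script lives in the bucket, the upload path setting is simply not consulted.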



HTH


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread karan alang
Thanks, Mich, let me check this.




Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
It may help to check this article of mine


Spark on Kubernetes, A Practitioner’s Guide



HTH







Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
Your submit command:

spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=/myexample \
  src/StructuredStream-on-gke.py


Pay attention to this setting:


--conf spark.kubernetes.file.upload.path

That refers to a staging location for your Python package on GCS storage, not a path inside the Docker image itself.


From
https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management


"... The app jar file will be uploaded to the S3 and then when the driver
is launched it will be downloaded to the driver pod and will be added to
its classpath. Spark will generate a subdir under the upload path with a
random name to avoid conflicts with spark apps running in parallel. User
could manage the subdirs created according to his needs..."


In your case it is GCS (gs://), not S3.


There is no point putting your Python file in the Docker image itself!
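The "random subdir" behaviour described in the quoted documentation can be sketched as follows (a hedged mock-up: the helper name is mine and the real logic lives inside Spark's KubernetesUtils; only the spark-upload-<uuid> naming matches what the error logs in this thread show):

```python
import uuid
import posixpath

def upload_target(upload_path: str, filename: str) -> str:
    """Mimic how Spark stages a local dependency: a fresh
    spark-upload-<random> subdirectory is created under
    spark.kubernetes.file.upload.path, so parallel submissions
    cannot clobber each other's files."""
    subdir = f"spark-upload-{uuid.uuid4()}"
    return posixpath.join(upload_path, subdir, filename)

# Each call yields a different staging path under the same bucket:
print(upload_target("gs://codes", "StructuredStream-on-gke.py"))
```

This is why the failing run logged a destination like /myexample/spark-upload-12228079-.../StructuredStream-on-gke.py.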


HTH







Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread karan alang
Hi Ye,

This is the error I get when I don't set spark.kubernetes.file.upload.path.

Any ideas on how to fix this?

```

Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```


Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread Ye Xianjin
The configuration of '…file.upload.path' is wrong. It means a distributed fs path to store your archives/resources/jars temporarily, which are then distributed by Spark to drivers/executors. For your case, you don't need to set this configuration.

Sent from my iPhone

On Feb 14, 2023, at 5:43 AM, karan alang wrote:

```
23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...
Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
	... 21 more
Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
	at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
```
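Putting Ye's point in concrete form, here are two hedged variants of the submit command that avoid this error (the bucket name codes, the image tag, and the $KUBERNETES_MASTER_IP variable are assumptions carried over from this thread; option A additionally assumes the client has the GCS connector available so it can write to gs://):

```shell
# Option A: keep a local application file, but point the upload path at a
# shared (Hadoop-compatible) filesystem such as GCS, never at a directory
# inside the container image:
spark-submit --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=gs://codes \
  src/StructuredStream-on-gke.py

# Option B: stage the file yourself and drop the upload path entirely,
# since a gs:// application resource needs no client-side upload:
gsutil cp src/StructuredStream-on-gke.py gs://codes/
spark-submit --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  gs://codes/StructuredStream-on-gke.py
```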

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread Khalid Mammadov
I am not a k8s expert, but I think you have a permission issue. Try 777 as an
experiment to see if it works.
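Permissions are one possibility; note, though, that the Mkdirs in the "Caused by" happens on the machine running spark-submit (via Hadoop's local-filesystem client), because the upload path was set to a plain local path. A minimal sketch (hypothetical paths, not Spark code) of that failure mode:

```python
import os
import tempfile

# With spark.kubernetes.file.upload.path=/myexample, spark-submit asks the
# local Hadoop filesystem to create /myexample/spark-upload-<uuid> on the
# *submitting* machine. If the base directory cannot be created there
# (e.g. a root-level /myexample without write permission), the upload dies
# with the "Mkdirs failed to create ..." IOException seen in karan's trace.
base = os.path.join(tempfile.mkdtemp(), "myexample")  # deliberately never created

try:
    # os.mkdir will not create missing parents; it stands in here for a
    # base path the client cannot create or write to.
    os.mkdir(os.path.join(base, "spark-upload-demo"))
    outcome = "created"
except (FileNotFoundError, PermissionError):
    outcome = "Mkdirs failed"

print(outcome)  # Mkdirs failed
```

So loosening permissions only helps if the base path is local on purpose; pointing the upload path at a shared filesystem sidesteps the problem entirely.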

On Mon, 13 Feb 2023, 21:42 karan alang,  wrote:

> Hello All,
>
> I'm trying to run a simple application on GKE (Kubernetes), and it is
> failing.
> Note: I have Spark (the Bitnami Spark chart) installed on GKE using helm
> install.
>
> Here is what was done:
> 1. Created a Docker image using this Dockerfile:
> ```
>
> FROM python:3.7-slim
>
> RUN apt-get update && \
> apt-get install -y default-jre && \
> apt-get install -y openjdk-11-jre-headless && \
> apt-get clean
>
> ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
>
> RUN pip install pyspark
> RUN mkdir -p /myexample && chmod 755 /myexample
> WORKDIR /myexample
>
> COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py
>
> CMD ["pyspark"]
>
> ```
> Simple pyspark application :
> ```
>
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()
>
> data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
> df = spark.createDataFrame(data, ('id', 'salary'))
>
> df.show(5, False)
>
> ```
>
> Spark-submit command :
> ```
>
> spark-submit --master k8s://https://34.74.22.140:7077 \
>   --deploy-mode cluster \
>   --name pyspark-example \
>   --conf spark.kubernetes.container.image=pyspark-example:0.1 \
>   --conf spark.kubernetes.file.upload.path=/myexample \
>   src/StructuredStream-on-gke.py
> ```
>
> Error I get:
> ```
> 23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...
>
> Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
> 	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
> 	at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
> 	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
> 	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
> 	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
> 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
> 	at scala.collection.immutable.List.foldLeft(List.scala:89)
> 	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
> 	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
> 	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
> 	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
> 	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
> 	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
> 	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
> 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
> 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
> 	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
> 	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
> 	... 21 more
>
> Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
> ```
>
> at
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
>
> at
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
>
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
>
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
>
> at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369

Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-13 Thread karan alang
Hello All,

I'm trying to run a simple application on GKE (Kubernetes), and it is
failing.
Note: I have Spark (Bitnami Spark chart) installed on GKE using helm
install.

Here is what I have done:
1. Created a Docker image using the Dockerfile below.

Dockerfile :
```

# Slim Python base image
FROM python:3.7-slim

# Install a Java runtime (required by PySpark); openjdk-11-jre-headless suffices
RUN apt-get update && \
    apt-get install -y openjdk-11-jre-headless && \
    apt-get clean

ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64

RUN pip install pyspark

# Application directory
RUN mkdir -p /myexample && chmod 755 /myexample
WORKDIR /myexample

COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py

CMD ["pyspark"]

```
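For context, the image would typically be built with the tag used in the spark-submit command and pushed to a registry the GKE cluster can pull from. A sketch of those steps, where the registry path and `PROJECT_ID` are illustrative placeholders, not taken from the original post:

```shell
# Build the image with the tag referenced by spark-submit
docker build -t pyspark-example:0.1 .

# For GKE, the image generally needs to be in a reachable registry,
# e.g. Container/Artifact Registry (PROJECT_ID is a placeholder)
docker tag pyspark-example:0.1 gcr.io/PROJECT_ID/pyspark-example:0.1
docker push gcr.io/PROJECT_ID/pyspark-example:0.1
```

If the image is pushed to a registry, `spark.kubernetes.container.image` should reference the full registry path rather than the bare local tag.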
Simple pyspark application :
```

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()

# Build a small DataFrame and print it
data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
df = spark.createDataFrame(data, ('id', 'salary'))

df.show(5, False)

```

Spark-submit command :
```

spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=/myexample \
  src/StructuredStream-on-gke.py
```

Error i get :
```

23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...

Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
	... 21 more
Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
	at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
	... 22 more
```
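The final "Caused by: java.io.IOException: Mkdirs failed to create /myexample/..." indicates that spark-submit treated the upload path as a directory on the local filesystem of the submitting machine: `spark.kubernetes.file.upload.path` is expected to be a URI on a Hadoop-compatible shared filesystem (e.g. `gs://`, `s3a://`, `hdfs://`), not a path inside the container image. A minimal illustrative check of this distinction, where `is_shared_fs_upload_path` is a hypothetical helper (not a Spark API) and the scheme list is an assumption for illustration:

```python
from urllib.parse import urlparse

# Common Hadoop-compatible shared-filesystem URI schemes (illustrative list)
REMOTE_SCHEMES = {"gs", "s3a", "hdfs", "abfss", "wasbs"}

def is_shared_fs_upload_path(path: str) -> bool:
    # A bare path like "/myexample" parses with an empty scheme, so Spark
    # falls back to the local filesystem on the submission host -- which is
    # what produced the "Mkdirs failed" error above.
    return urlparse(path).scheme in REMOTE_SCHEMES

print(is_shared_fs_upload_path("/myexample"))        # False: local path, fails
print(is_shared_fs_upload_path("gs://codes"))        # True: GCS bucket works
```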

Any ideas on how to fix this and get it to work?
Thanks in advance!
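[Editor's note: in line with the reply earlier in this thread, a likely fix is to stage the script on GCS and point both the upload path and the application resource at the bucket. A sketch, assuming the `gs://codes` bucket from that reply, that `KUBERNETES_MASTER_IP` is set, and that the GCS connector is available to Spark; the Kubernetes API server typically listens on 443, not the Spark standalone port 7077:]

```shell
# Stage the application file in a GCS bucket
gsutil cp src/StructuredStream-on-gke.py gs://codes/

# Submit against the Kubernetes API server, with a shared-filesystem
# upload path instead of a container-local directory
spark-submit --verbose \
  --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=gs://codes \
  gs://codes/StructuredStream-on-gke.py
```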

Please see the Stack Overflow link:

https://stackoverflow.com/questions/75441360/running-spark-application-on-gke-failing-on-spark-submit