[ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:
--------------------------------
    Description: 
I launch my Spark application with the 
[spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
 using the following YAML file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-kafka-streamer-test
  namespace: kafka2hdfs
spec:
  type: Scala
  mode: cluster
  image: <my-repo>/spark:3.0.2-SNAPSHOT-2.12-0.1.0
  imagePullPolicy: Always
  timeToLiveSeconds: 259200
  mainClass: path.to.my.class.KafkaStreamer
  mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
  sparkVersion: 3.0.1
  restartPolicy:
    type: Always
  sparkConf:
    "spark.kafka.consumer.cache.capacity": "8192"
    "spark.kubernetes.memoryOverheadFactor": "0.3"
  deps:
    jars:
      - my
      - jar
      - list
  hadoopConfigMap: hdfs-config
  driver:
    cores: 4
    memory: 12g
    labels:
      version: 3.0.1
    serviceAccount: default
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
  executor:
    instances: 4
    cores: 4
    memory: 16g
    labels:
      version: 3.0.1
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

{code}
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 1574101276 (1574213896)
 at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
{code}

The exception above appears roughly 50 minutes after startup. Once it is thrown, no further logs are produced and the application hangs: the executor-pods watch is closed but the driver keeps running without it.
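For context, "too old resource version: 1574101276 (1574213896)" is the API server's HTTP 410 Gone response: the watch tried to resume from a resourceVersion older than the oldest event the server still retains, so the client cannot simply reconnect from where it left off; the usual recovery is to re-list to obtain a fresh resourceVersion and open a new watch from there. The toy sketch below illustrates that protocol only; it is not Spark or fabric8 code, and all names in it (`WatchCache`, `relist_and_watch`, `GoneError`) are hypothetical:

```python
# Toy model of the Kubernetes watch-resume protocol (illustration only;
# WatchCache, relist_and_watch, and GoneError are made-up names, not real APIs).

class GoneError(Exception):
    """Stands in for HTTP 410 'too old resource version'."""

class WatchCache:
    """Models the API server's bounded window of retained watch events."""
    def __init__(self, oldest, newest):
        self.oldest, self.newest = oldest, newest

    def watch_from(self, resource_version):
        # The server can only replay events it still retains; older
        # resourceVersions get 410 Gone, exactly like the error in the log.
        if resource_version < self.oldest:
            raise GoneError(f"too old resource version: "
                            f"{resource_version} ({self.oldest})")
        return list(range(resource_version + 1, self.newest + 1))

def relist_and_watch(cache, stale_version):
    # Standard client recovery: on 410, re-list to learn the current
    # resourceVersion, then start a new watch from that fresh version.
    try:
        return cache.watch_from(stale_version)
    except GoneError:
        fresh = cache.newest  # a LIST would return the current version
        return cache.watch_from(fresh)

# Versions taken from the exception message in the driver log above.
cache = WatchCache(oldest=1574213896, newest=1574213900)
events = relist_and_watch(cache, stale_version=1574101276)
```

The hang reported here suggests the driver's watch is closed on this 410 without that re-list/re-watch step being taken.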


> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> ------------------------------------------------------------------
>
>                 Key: SPARK-33349
>                 URL: https://issues.apache.org/jira/browse/SPARK-33349
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1, 3.0.2
>            Reporter: Nicola Bova
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
