[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373849#comment-17373849 ] Jim Kleckner commented on SPARK-33349: -- [~redsk] can you confirm that https://issues.apache.org/jira/browse/SPARK-33471 fixes your issue with 4.12.0 ? > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2, 3.1.0 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. 
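For context, the fix being referenced boils down to re-establishing the executor-pod watch when the API server closes it with HTTP 410 ("too old resource version") instead of leaving the application without pod updates. The sketch below is a hypothetical illustration of that idea, not Spark's ExecutorPodsWatchSnapshotSource; it assumes a fabric8 kubernetes-client 4.x on the classpath, and the class name is invented.

{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watch, Watcher}

// Hypothetical sketch: resubscribe on "too old resource version" rather than giving up.
class RestartingExecutorPodWatcher(client: KubernetesClient, appId: String) {
  @volatile private var watch: Watch = _

  def start(): Unit = {
    watch = client.pods()
      .withLabel("spark-app-selector", appId)
      .watch(new Watcher[Pod] {
        override def eventReceived(action: Watcher.Action, pod: Pod): Unit = {
          // Feed the event into whatever snapshot store the application keeps.
          println(s"$action ${pod.getMetadata.getName}")
        }
        override def onClose(cause: KubernetesClientException): Unit = {
          // HTTP 410 Gone means our resourceVersion is stale; restart the watch.
          if (cause != null && cause.getCode == java.net.HttpURLConnection.HTTP_GONE) {
            start()
          }
        }
      })
  }

  def stop(): Unit = if (watch != null) watch.close()
}
{code}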
[jira] [Commented] (SPARK-31168) Upgrade Scala to 2.12.13
[ https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327035#comment-17327035 ] Jim Kleckner commented on SPARK-31168: -- It appears that this fix [1] for 12038 merged into Scala master [2] and has been released in Scala 2.13.5 [3] but not yet released as Scala 2.12.14. [1] [https://github.com/scala/scala/pull/9478] [2] [https://github.com/scala/scala/pull/9495] [3] [https://github.com/scala/scala/releases/tag/v2.13.5] > Upgrade Scala to 2.12.13 > > > Key: SPARK-31168 > URL: https://issues.apache.org/jira/browse/SPARK-31168 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > h2. Highlights > * Performance improvements in the collections library: algorithmic > improvements and changes to avoid unnecessary allocations ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance]) > * Performance improvements in the compiler ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+], > minor [effects in our > benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@]) > * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL > encoding that avoids deadlocks (details on > [#8712|https://github.com/scala/scala/pull/8712]) > * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in > the REPL, which can lead to deteriorating performance in long sessions > ([#8576|https://github.com/scala/scala/pull/8576]) > * Fix some {{toX}} methods that could expose the underlying mutability of a > {{ListBuffer}}-generated collection > ([#8674|https://github.com/scala/scala/pull/8674]) > h3. JDK 9+ support > * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ > ([#8676|https://github.com/scala/scala/pull/8676]) > * {{:javap}} in the REPL now works on JDK 9+ > ([#8400|https://github.com/scala/scala/pull/8400]) > h3. Other changes > * Support new labels for creating durations for consistency: > {{Duration("1m")}}, {{Duration("3 hrs")}} > ([#8325|https://github.com/scala/scala/pull/8325], > [#8450|https://github.com/scala/scala/pull/8450]) > * Fix memory leak in runtime reflection's {{TypeTag}} caches > ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety > issues in runtime reflection > ([#8433|https://github.com/scala/scala/pull/8433]) > * When using compiler plugins, the ordering of compiler phases may change > due to [#8427|https://github.com/scala/scala/pull/8427] > For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11]. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
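As a tiny illustration of the duration-label change called out in the quoted release notes (assumes a Scala 2.12.11+ standard library; nothing Spark-specific here):

{code:scala}
import scala.concurrent.duration.Duration

object DurationLabels extends App {
  // The short labels were added for consistency in 2.12.11 per the notes above.
  println(Duration("1m"))    // 1 minute
  println(Duration("3 hrs")) // 3 hours
}
{code}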
[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226362#comment-17226362 ] Jim Kleckner commented on SPARK-33349: -- This fabric8 kubernetes-client issue/MR looks relevant: Repeated "too old resource version" exception with BaseOperation.waitUntilCondition(). #2414 * [https://github.com/fabric8io/kubernetes-client/issues/2414|https://github.com/fabric8io/kubernetes-client/issues/2414] * [https://github.com/fabric8io/kubernetes-client/pull/2424|https://github.com/fabric8io/kubernetes-client/pull/2424] This is released in [https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0|https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0] > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) 
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212105#comment-17212105 ] Jim Kleckner commented on SPARK-24266: -- I believe that the PR is ready to merge to the 3.0 branch for a target of 3.0.2: https://github.com/apache/spark/pull/29533 > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > Fix For: 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. > Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > 
lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163066#comment-17163066 ] Jim Kleckner commented on SPARK-24266: -- Would it be reasonable to add a target version of 2.4.7? I took the commits from master and made a partial attempt to rebase this onto branch-2.4 [1]. However, the k8s API has evolved quite a bit since 2.4, so the watchOrStop function needs to be backported [2]. You can see the error message in this gitlab build [3]. [1] [https://github.com/jkleckner/spark/tree/SPARK-24266-on-branch2.4|https://github.com/jkleckner/spark/tree/SPARK-24266-on-branch2.4] [2] [https://github.com/jkleckner/spark/blob/SPARK-24266-on-branch2.4/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala#L193] [3] [https://gitlab.com/jkleckner/spark/-/jobs/651515950] > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > Fix For: 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined.
> Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] >
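The watchOrStop backport mentioned above is about keeping the submission client alive until the driver pod actually reaches a terminal phase. As a rough, hypothetical illustration of that behaviour (this is not the Spark implementation; the object and method names are made up, and it assumes a fabric8 kubernetes-client 4.x that provides waitUntilCondition):

{code:scala}
import java.util.concurrent.TimeUnit
import java.util.function.Predicate
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object DriverCompletionWaiter {
  // True once the driver pod has reached a terminal phase.
  private val driverFinished = new Predicate[Pod] {
    override def test(p: Pod): Boolean =
      p != null && p.getStatus != null &&
        Set("Succeeded", "Failed").contains(p.getStatus.getPhase)
  }

  def waitForDriver(namespace: String, driverPodName: String): Unit = {
    val client = new DefaultKubernetesClient()
    try {
      // Block the caller instead of returning while the driver is still running.
      client.pods()
        .inNamespace(namespace)
        .withName(driverPodName)
        .waitUntilCondition(driverFinished, 7, TimeUnit.DAYS)
    } finally {
      client.close()
    }
  }
}
{code}

For the pod in the quoted log, the call would look like DriverCompletionWaiter.waitForDriver("tione-603074457", "spark-64-293-980-1526021412180-driver").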
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140028#comment-17140028 ] Jim Kleckner commented on SPARK-24266: -- +1 for this patch set if it reviews cleanly. > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. > Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), 
name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvj
[jira] [Commented] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
[ https://issues.apache.org/jira/browse/SPARK-30275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022569#comment-17022569 ] Jim Kleckner commented on SPARK-30275: -- * Builds on my Mac use whatever I have installed on my machine whereas having a well-defined remote CI system eliminates variability. * The build process doesn't load my local system. * A push is just a git push rather than an image push which from home can take a long time since my ISP has very wimpy upload speeds. Obviously some CI/CD tooling exists for spark testing and release on the back end, but that isn't available to most people. > Add gitlab-ci.yml file for reproducible builds > -- > > Key: SPARK-30275 > URL: https://issues.apache.org/jira/browse/SPARK-30275 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jim Kleckner >Priority: Minor > > It would be desirable to have public reproducible builds such as provided by > gitlab or others. > > Here is a candidate patch set to build spark using gitlab-ci: > * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml > Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
[ https://issues.apache.org/jira/browse/SPARK-30275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021766#comment-17021766 ] Jim Kleckner commented on SPARK-30275: -- I sent a message to [d...@spark.apache.org|mailto:d...@spark.apache.org] but haven't seen it get approved yet. > Add gitlab-ci.yml file for reproducible builds > -- > > Key: SPARK-30275 > URL: https://issues.apache.org/jira/browse/SPARK-30275 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jim Kleckner >Priority: Minor > > It would be desirable to have public reproducible builds such as provided by > gitlab or others. > > Here is a candidate patch set to build spark using gitlab-ci: > * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml > Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
[ https://issues.apache.org/jira/browse/SPARK-30275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014749#comment-17014749 ] Jim Kleckner commented on SPARK-30275: -- [~hyukjin.kwon] As an open source project, the builds in gitlab.com would produce well-defined artifacts for anyone who pushes a branch to it. I have used this to produce artifacts for branch-2.4 after the release of 2.4.4 to be able to get bug fixes for use with spark-on-k8s-operator for example. People could create their own preview builds of master or any other version at will. See this example for containers built from branch-2.4: * [https://gitlab.com/jkleckner/spark/container_registry] > Add gitlab-ci.yml file for reproducible builds > -- > > Key: SPARK-30275 > URL: https://issues.apache.org/jira/browse/SPARK-30275 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jim Kleckner >Priority: Minor > > It would be desirable to have public reproducible builds such as provided by > gitlab or others. > > Here is a candidate patch set to build spark using gitlab-ci: > * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml > Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
Jim Kleckner created SPARK-30275: Summary: Add gitlab-ci.yml file for reproducible builds Key: SPARK-30275 URL: https://issues.apache.org/jira/browse/SPARK-30275 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.4.4, 3.0.0 Reporter: Jim Kleckner It would be desirable to have public reproducible builds such as provided by gitlab or others. Here is a candidate patch set to build spark using gitlab-ci: * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28938) Move to supported OpenJDK docker image for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-28938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962131#comment-16962131 ] Jim Kleckner commented on SPARK-28938: -- I created a minor one-line patch fix for this here. Let me know if it needs a different story. [https://github.com/apache/spark/pull/26296] > Move to supported OpenJDK docker image for Kubernetes > - > > Key: SPARK-28938 > URL: https://issues.apache.org/jira/browse/SPARK-28938 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.4, 3.0.0 > Environment: Kubernetes >Reporter: Rodney Aaron Stainback >Assignee: L. C. Hsieh >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > Attachments: cve-spark-py.txt, cve-spark-r.txt, cve-spark.txt, > twistlock.txt > > > The current docker image used by Kubernetes > {code:java} > openjdk:8-alpine{code} > is not supported > [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links] > It was removed with this commit > [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099] > Quote from commit "4. no more OpenJDK 8 Alpine images (Alpine/musl is not > officially supported by the OpenJDK project, so this reflects that -- see > "Project Portola" for the Alpine porting efforts which I understand are still > in need of help)" > > Please move to a supported image for Kubernetes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29055) Memory leak in Spark
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942224#comment-16942224 ] Jim Kleckner commented on SPARK-29055: -- [~vanzin] It seems from the previous comment that it is not resolved by PR #25973 as the memory continues to grow. [~Geopap] I recorded an internal tech talk about how to connect/grab heap dump/analyze with jvisualvm if you are interested here: [https://www.youtube.com/channel/UCA81uPFG3aqo2X1YgZRAoEg] > Memory leak in Spark > > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3 >Reporter: George Papa >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.5, 3.0.0 > > Attachments: test_csvs.zip > > > I used Spark 2.1.1 and I upgraded into new versions. After Spark version > 2.3.3, I observed from Spark UI that the driver memory is{color:#ff} > increasing continuously.{color} > In more detail, the driver memory and executors memory have the same used > memory storage and after each iteration the storage memory is increasing. You > can reproduce this behavior by running the following snippet code. The > following example, is very simple, without any dataframe persistence, but the > memory consumption is not stable as it was in former Spark versions > (Specifically until Spark 2.3.2). > Also, I tested with Spark streaming and structured streaming API and I had > the same behavior. I tested with an existing application which reads from > Kafka source and do some aggregations, persist dataframes and then unpersist > them. The persist and unpersist it works correct, I see the dataframes in the > storage tab in Spark UI and after the unpersist, all dataframe have removed. > But, after the unpersist the executors memory is not zero, BUT has the same > value with the driver memory. This behavior also affects the application > performance because the memory of the executors is increasing as the driver > increasing and after a while the persisted dataframes are not fit in the > executors memory and I have spill to disk. > Another error which I had after a long running, was > {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded, but I > don't know if its relevant with the above behavior or not.{color} > > *HOW TO REPRODUCE THIS BEHAVIOR:* > Create a very simple application(streaming count_file.py) in order to > reproduce this behavior. This application reads CSV files from a directory, > count the rows and then remove the processed files. > {code:java} > import time > import os > from pyspark.sql import SparkSession > from pyspark.sql import functions as F > from pyspark.sql import types as T > target_dir = "..." 
> spark=SparkSession.builder.appName("DataframeCount").getOrCreate() > while True: > for f in os.listdir(target_dir): > df = spark.read.load(target_dir + f, format="csv") > print("Number of records: {0}".format(df.count())) > time.sleep(15){code} > Submit code: > {code:java} > spark-submit > --master spark://xxx.xxx.xx.xxx > --deploy-mode client > --executor-memory 4g > --executor-cores 3 > streaming count_file.py > {code} > > *TESTED CASES WITH THE SAME BEHAVIOUR:* > * I tested with default settings (spark-defaults.conf) > * Add spark.cleaner.periodicGC.interval 1min (or less) > * {{Turn spark.cleaner.referenceTracking.blocking}}=false > * Run the application in cluster mode > * Increase/decrease the resources of the executors and driver > * I tested with extraJavaOptions in driver and executor -XX:+UseG1GC > -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12 > > *DEPENDENCIES* > * Operation system: Ubuntu 16.04.3 LTS > * Java: jdk1.8.0_131 (tested also with jdk1.8.0_221) > * Python: Python 2.7.12 > > *NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was > extremely low and after the run of ContextCleaner and BlockManager the memory > was decreasing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29055) Memory leak in Spark
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939787#comment-16939787 ] Jim Kleckner commented on SPARK-29055: -- [~Geopap] How quickly does the memory grow? > Memory leak in Spark > > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3 >Reporter: George Papa >Priority: Major > Attachments: test_csvs.zip > > > I used Spark 2.1.1 and I upgraded into new versions. After Spark version > 2.3.3, I observed from Spark UI that the driver memory is{color:#ff} > increasing continuously.{color} > In more detail, the driver memory and executors memory have the same used > memory storage and after each iteration the storage memory is increasing. You > can reproduce this behavior by running the following snippet code. The > following example, is very simple, without any dataframe persistence, but the > memory consumption is not stable as it was in former Spark versions > (Specifically until Spark 2.3.2). > Also, I tested with Spark streaming and structured streaming API and I had > the same behavior. I tested with an existing application which reads from > Kafka source and do some aggregations, persist dataframes and then unpersist > them. The persist and unpersist it works correct, I see the dataframes in the > storage tab in Spark UI and after the unpersist, all dataframe have removed. > But, after the unpersist the executors memory is not zero, BUT has the same > value with the driver memory. This behavior also affects the application > performance because the memory of the executors is increasing as the driver > increasing and after a while the persisted dataframes are not fit in the > executors memory and I have spill to disk. > Another error which I had after a long running, was > {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded, but I > don't know if its relevant with the above behavior or not.{color} > > *HOW TO REPRODUCE THIS BEHAVIOR:* > Create a very simple application(streaming count_file.py) in order to > reproduce this behavior. This application reads CSV files from a directory, > count the rows and then remove the processed files. > {code:java} > import time > import os > from pyspark.sql import SparkSession > from pyspark.sql import functions as F > from pyspark.sql import types as T > target_dir = "..." 
> spark=SparkSession.builder.appName("DataframeCount").getOrCreate() > while True: > for f in os.listdir(target_dir): > df = spark.read.load(target_dir + f, format="csv") > print("Number of records: {0}".format(df.count())) > time.sleep(15){code} > Submit code: > {code:java} > spark-submit > --master spark://xxx.xxx.xx.xxx > --deploy-mode client > --executor-memory 4g > --executor-cores 3 > streaming count_file.py > {code} > > *TESTED CASES WITH THE SAME BEHAVIOUR:* > * I tested with default settings (spark-defaults.conf) > * Add spark.cleaner.periodicGC.interval 1min (or less) > * {{Turn spark.cleaner.referenceTracking.blocking}}=false > * Run the application in cluster mode > * Increase/decrease the resources of the executors and driver > * I tested with extraJavaOptions in driver and executor -XX:+UseG1GC > -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12 > > *DEPENDENCIES* > * Operation system: Ubuntu 16.04.3 LTS > * Java: jdk1.8.0_131 (tested also with jdk1.8.0_221) > * Python: Python 2.7.12 > > *NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was > extremely low and after the run of ContextCleaner and BlockManager the memory > was decreasing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11141) Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes
[ https://issues.apache.org/jira/browse/SPARK-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902537#comment-15902537 ] Jim Kleckner edited comment on SPARK-11141 at 3/9/17 5:54 AM: -- FYI, this can cause problems when not using S3 during shutdown as described in this AWS posting: https://forums.aws.amazon.com/thread.jspa?threadID=223378 The workaround indicated is to use --conf spark.streaming.driver.writeAheadLog.allowBatching=false with the submit. The exception contains the text: {code} streaming stop ReceivedBlockTracker: Exception thrown while writing record: BatchAllocationEvent {code} was (Author: jkleckner): FYI, this can cause problems when not using S3 during shutdown as described in this AWS posting: https://forums.aws.amazon.com/thread.jspa?threadID=223378 The workaround indicated is to use --conf spark.streaming.driver.writeAheadLog.allowBatching=false with the submit. > Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes > -- > > Key: SPARK-11141 > URL: https://issues.apache.org/jira/browse/SPARK-11141 > Project: Spark > Issue Type: Improvement > Components: DStreams >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 1.6.0 > > > When using S3 as a directory for WALs, the writes take too long. The driver > gets very easily bottlenecked when multiple receivers send AddBlock events to > the ReceiverTracker. This PR adds batching of events in the > ReceivedBlockTracker so that receivers don't get blocked by the driver for > too long. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11141) Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes
[ https://issues.apache.org/jira/browse/SPARK-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902537#comment-15902537 ] Jim Kleckner commented on SPARK-11141: -- FYI, this can cause problems when not using S3 during shutdown as described in this AWS posting: https://forums.aws.amazon.com/thread.jspa?threadID=223378 The workaround indicated is to use --conf spark.streaming.driver.writeAheadLog.allowBatching=false with the submit. > Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes > -- > > Key: SPARK-11141 > URL: https://issues.apache.org/jira/browse/SPARK-11141 > Project: Spark > Issue Type: Improvement > Components: DStreams >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 1.6.0 > > > When using S3 as a directory for WALs, the writes take too long. The driver > gets very easily bottlenecked when multiple receivers send AddBlock events to > the ReceiverTracker. This PR adds batching of events in the > ReceivedBlockTracker so that receivers don't get blocked by the driver for > too long. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
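For reference, the same workaround can also be applied when the streaming context is built rather than on the spark-submit command line. This is only a sketch: the allowBatching key is the one quoted above, while the app name, master, and batch interval are illustrative.

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalWithoutBatching extends App {
  val conf = new SparkConf()
    .setAppName("wal-without-batching")
    .setMaster("local[2]") // illustrative; normally supplied by spark-submit
    // Avoid the shutdown exception described above by disabling batched WAL writes.
    .set("spark.streaming.driver.writeAheadLog.allowBatching", "false")

  val ssc = new StreamingContext(conf, Seconds(10))
  // ... define input streams and output operations, then:
  // ssc.start(); ssc.awaitTermination()
}
{code}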
[jira] [Commented] (SPARK-16333) Excessive Spark history event/json data size (5GB each)
[ https://issues.apache.org/jira/browse/SPARK-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900533#comment-15900533 ] Jim Kleckner commented on SPARK-16333: -- I ended up here when looking into why an upgrade of our streaming computation to 2.1.0 was pegging the network at a gigabit/second. Setting spark.eventLog.enabled to false confirmed that this logging from slave port 50010 was the culprit. How can anyone with seriously large numbers of tasks use the Spark history server with this amount of load? > Excessive Spark history event/json data size (5GB each) > --- > > Key: SPARK-16333 > URL: https://issues.apache.org/jira/browse/SPARK-16333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 > Environment: this is seen on both x86 (Intel(R) Xeon(R), E5-2699 ) > and ppc platform (Habanero, Model: 8348-21C), Red Hat Enterprise Linux Server > release 7.2 (Maipo)., Spark2.0.0-preview (May-24, 2016 build) >Reporter: Peter Liu > Labels: performance, spark2.0.0 > > With Spark2.0.0-preview (May-24 build), the history event data (the json > file), that is generated for each Spark application run (see below), can be > as big as 5GB (instead of 14 MB for exactly the same application run and the > same input data of 1TB under Spark1.6.1) > -rwxrwx--- 1 root root 5.3G Jun 30 09:39 app-20160630091959- > -rwxrwx--- 1 root root 5.3G Jun 30 09:56 app-20160630094213- > -rwxrwx--- 1 root root 5.3G Jun 30 10:13 app-20160630095856- > -rwxrwx--- 1 root root 5.3G Jun 30 10:30 app-20160630101556- > The test is done with Sparkbench V2, SQL RDD (see github: > https://github.com/SparkTC/spark-bench)
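A quick way to check whether event logging is behind that kind of traffic is to rerun the same job once with it switched off. Illustrative snippet only; the application name, master, and query are made up.

{code:scala}
import org.apache.spark.sql.SparkSession

object EventLogOff extends App {
  val spark = SparkSession.builder()
    .appName("event-log-off-test")
    .master("local[2]")                        // illustrative; normally supplied by spark-submit
    .config("spark.eventLog.enabled", "false") // skip writing history event JSON entirely
    .getOrCreate()

  spark.range(1000000L).selectExpr("sum(id)").show()
  spark.stop()
}
{code}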
[jira] [Created] (SPARK-6029) Spark excludes "fastutil" dependencies of "clearspring" quantiles
Jim Kleckner created SPARK-6029: --- Summary: Spark excludes "fastutil" dependencies of "clearspring" quantiles Key: SPARK-6029 URL: https://issues.apache.org/jira/browse/SPARK-6029 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.2.1 Reporter: Jim Kleckner Spark includes the clearspring analytics package but intentionally excludes its fastutil dependency. Spark also includes parquet-column, which bundles fastutil and relocates it under parquet/, but the resulting shaded jar is incomplete: it shades out some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar that parquet-column references. We are using more of the clearspring classes (e.g. QDigest), and those do depend on missing fastutil classes like Long2LongOpenHashMap. Even though I add them to our assembly jar file, the class loader finds the Spark assembly first and we get runtime class loader errors when we try to use them. The [documentation|http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment] and the possibly related issue [SPARK-939|https://issues.apache.org/jira/browse/SPARK-939] suggest arguments that I tried with spark-submit: {code} --conf spark.driver.userClassPathFirst=true \ --conf spark.executor.userClassPathFirst=true {code} but we still get the class-not-found error. Could this be a bug with {{userClassPathFirst=true}}? That is, should it work? In any case, would it be reasonable to not exclude the "fastutil" dependencies? See the email discussion [here|http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-excludes-quot-fastutil-quot-dependencies-we-need-tt21812.html]
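For illustration, this is the sort of QDigest usage that trips over the exclusion. A sketch assuming the com.clearspring.analytics stream-lib artifact is on the classpath; when the fastutil classes have been shaded away, it fails at runtime with a NoClassDefFoundError for Long2LongOpenHashMap.

{code:scala}
import com.clearspring.analytics.stream.quantile.QDigest

object QuantileExample extends App {
  val digest = new QDigest(100.0)               // compression factor
  (1L to 100000L).foreach(v => digest.offer(v)) // QDigest keeps its buckets in a fastutil map internally
  println(s"p50 = ${digest.getQuantile(0.5)}")
  println(s"p99 = ${digest.getQuantile(0.99)}")
}
{code}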