[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373849#comment-17373849 ] Jim Kleckner commented on SPARK-33349: -- [~redsk] can you confirm that https://issues.apache.org/jira/browse/SPARK-33471 fixes your issue with 4.12.0 ? > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2, 3.1.0 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. 
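For context, the fix being referenced boils down to re-establishing the executor-pod watch when the API server closes it with HTTP 410 ("too old resource version") instead of leaving the application without pod updates. The sketch below is a hypothetical illustration of that idea, not Spark's ExecutorPodsWatchSnapshotSource; it assumes a fabric8 kubernetes-client 4.x on the classpath, and the class name is invented.

{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watch, Watcher}

// Hypothetical sketch: resubscribe on "too old resource version" rather than giving up.
class RestartingExecutorPodWatcher(client: KubernetesClient, appId: String) {
  @volatile private var watch: Watch = _

  def start(): Unit = {
    watch = client.pods()
      .withLabel("spark-app-selector", appId)
      .watch(new Watcher[Pod] {
        override def eventReceived(action: Watcher.Action, pod: Pod): Unit = {
          // Feed the event into whatever snapshot store the application keeps.
          println(s"$action ${pod.getMetadata.getName}")
        }
        override def onClose(cause: KubernetesClientException): Unit = {
          // HTTP 410 Gone means our resourceVersion is stale; restart the watch.
          if (cause != null && cause.getCode == java.net.HttpURLConnection.HTTP_GONE) {
            start()
          }
        }
      })
  }

  def stop(): Unit = if (watch != null) watch.close()
}
{code}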
[jira] [Commented] (SPARK-31168) Upgrade Scala to 2.12.13
[ https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327035#comment-17327035 ] Jim Kleckner commented on SPARK-31168: -- It appears that this fix [1] for 12038 merged into Scala master [2] and has been released in Scala 2.13.5 [3] but not yet released as Scala 2.12.14. [1] [https://github.com/scala/scala/pull/9478] [2] [https://github.com/scala/scala/pull/9495] [3] [https://github.com/scala/scala/releases/tag/v2.13.5] > Upgrade Scala to 2.12.13 > > > Key: SPARK-31168 > URL: https://issues.apache.org/jira/browse/SPARK-31168 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > h2. Highlights > * Performance improvements in the collections library: algorithmic > improvements and changes to avoid unnecessary allocations ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance]) > * Performance improvements in the compiler ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+], > minor [effects in our > benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@]) > * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL > encoding that avoids deadlocks (details on > [#8712|https://github.com/scala/scala/pull/8712]) > * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in > the REPL, which can lead to deteriorating performance in long sessions > ([#8576|https://github.com/scala/scala/pull/8576]) > * Fix some {{toX}} methods that could expose the underlying mutability of a > {{ListBuffer}}-generated collection > ([#8674|https://github.com/scala/scala/pull/8674]) > h3. JDK 9+ support > * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ > ([#8676|https://github.com/scala/scala/pull/8676]) > * {{:javap}} in the REPL now works on JDK 9+ > ([#8400|https://github.com/scala/scala/pull/8400]) > h3. Other changes > * Support new labels for creating durations for consistency: > {{Duration("1m")}}, {{Duration("3 hrs")}} > ([#8325|https://github.com/scala/scala/pull/8325], > [#8450|https://github.com/scala/scala/pull/8450]) > * Fix memory leak in runtime reflection's {{TypeTag}} caches > ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety > issues in runtime reflection > ([#8433|https://github.com/scala/scala/pull/8433]) > * When using compiler plugins, the ordering of compiler phases may change > due to [#8427|https://github.com/scala/scala/pull/8427] > For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11]. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
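As a tiny illustration of the duration-label change called out in the quoted release notes (assumes a Scala 2.12.11+ standard library; nothing Spark-specific here):

{code:scala}
import scala.concurrent.duration.Duration

object DurationLabels extends App {
  // The short labels were added for consistency in 2.12.11 per the notes above.
  println(Duration("1m"))    // 1 minute
  println(Duration("3 hrs")) // 3 hours
}
{code}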
[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226362#comment-17226362 ] Jim Kleckner commented on SPARK-33349: -- This fabric8 kubernetes-client issue/MR looks relevant: Repeated "too old resource version" exception with BaseOperation.waitUntilCondition(). #2414 * [https://github.com/fabric8io/kubernetes-client/issues/2414|https://github.com/fabric8io/kubernetes-client/issues/2414] * [https://github.com/fabric8io/kubernetes-client/pull/2424|https://github.com/fabric8io/kubernetes-client/pull/2424] This is released in [https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0|https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0] > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) 
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212105#comment-17212105 ] Jim Kleckner commented on SPARK-24266: -- I believe that the PR is ready to merge to the 3.0 branch for a target of 3.0.2: https://github.com/apache/spark/pull/29533 > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > Fix For: 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. > Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > 
lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163066#comment-17163066 ] Jim Kleckner commented on SPARK-24266: -- Would it be reasonable to add a target version of 2.4.7? I took the commits from master and made a partial attempt to rebase this onto branch-2.4 [1]. However, the k8s API has evolved quite a bit since 2.4, so the watchOrStop function needs to be backported [2]. You can see the error message in this gitlab build [3]. [1] [https://github.com/jkleckner/spark/tree/SPARK-24266-on-branch2.4|https://github.com/jkleckner/spark/tree/SPARK-24266-on-branch2.4] [2] [https://github.com/jkleckner/spark/blob/SPARK-24266-on-branch2.4/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala#L193] [3] [https://gitlab.com/jkleckner/spark/-/jobs/651515950] > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > Fix For: 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined.
> Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] >
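The watchOrStop backport mentioned above is about keeping the submission client alive until the driver pod actually reaches a terminal phase. As a rough, hypothetical illustration of that behaviour (this is not the Spark implementation; the object and method names are made up, and it assumes a fabric8 kubernetes-client 4.x that provides waitUntilCondition):

{code:scala}
import java.util.concurrent.TimeUnit
import java.util.function.Predicate
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object DriverCompletionWaiter {
  // True once the driver pod has reached a terminal phase.
  private val driverFinished = new Predicate[Pod] {
    override def test(p: Pod): Boolean =
      p != null && p.getStatus != null &&
        Set("Succeeded", "Failed").contains(p.getStatus.getPhase)
  }

  def waitForDriver(namespace: String, driverPodName: String): Unit = {
    val client = new DefaultKubernetesClient()
    try {
      // Block the caller instead of returning while the driver is still running.
      client.pods()
        .inNamespace(namespace)
        .withName(driverPodName)
        .waitUntilCondition(driverFinished, 7, TimeUnit.DAYS)
    } finally {
      client.close()
    }
  }
}
{code}

For the pod in the quoted log, the call would look like DriverCompletionWaiter.waitForDriver("tione-603074457", "spark-64-293-980-1526021412180-driver").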
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140028#comment-17140028 ] Jim Kleckner commented on SPARK-24266: -- +1 for this patch set if it reviews cleanly. > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. > Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), 
name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvj
[jira] [Commented] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
[ https://issues.apache.org/jira/browse/SPARK-30275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022569#comment-17022569 ] Jim Kleckner commented on SPARK-30275: -- * Builds on my Mac use whatever I have installed on my machine whereas having a well-defined remote CI system eliminates variability. * The build process doesn't load my local system. * A push is just a git push rather than an image push which from home can take a long time since my ISP has very wimpy upload speeds. Obviously some CI/CD tooling exists for spark testing and release on the back end, but that isn't available to most people. > Add gitlab-ci.yml file for reproducible builds > -- > > Key: SPARK-30275 > URL: https://issues.apache.org/jira/browse/SPARK-30275 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jim Kleckner >Priority: Minor > > It would be desirable to have public reproducible builds such as provided by > gitlab or others. > > Here is a candidate patch set to build spark using gitlab-ci: > * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml > Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
[ https://issues.apache.org/jira/browse/SPARK-30275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021766#comment-17021766 ] Jim Kleckner commented on SPARK-30275: -- I sent a message to [d...@spark.apache.org|mailto:d...@spark.apache.org] but haven't seen it get approved yet. > Add gitlab-ci.yml file for reproducible builds > -- > > Key: SPARK-30275 > URL: https://issues.apache.org/jira/browse/SPARK-30275 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jim Kleckner >Priority: Minor > > It would be desirable to have public reproducible builds such as provided by > gitlab or others. > > Here is a candidate patch set to build spark using gitlab-ci: > * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml > Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
[ https://issues.apache.org/jira/browse/SPARK-30275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014749#comment-17014749 ] Jim Kleckner commented on SPARK-30275: -- [~hyukjin.kwon] As an open source project, the builds in gitlab.com would produce well-defined artifacts for anyone who pushes a branch to it. I have used this to produce artifacts for branch-2.4 after the release of 2.4.4 to be able to get bug fixes for use with spark-on-k8s-operator for example. People could create their own preview builds of master or any other version at will. See this example for containers built from branch-2.4: * [https://gitlab.com/jkleckner/spark/container_registry] > Add gitlab-ci.yml file for reproducible builds > -- > > Key: SPARK-30275 > URL: https://issues.apache.org/jira/browse/SPARK-30275 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jim Kleckner >Priority: Minor > > It would be desirable to have public reproducible builds such as provided by > gitlab or others. > > Here is a candidate patch set to build spark using gitlab-ci: > * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml > Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30275) Add gitlab-ci.yml file for reproducible builds
Jim Kleckner created SPARK-30275: Summary: Add gitlab-ci.yml file for reproducible builds Key: SPARK-30275 URL: https://issues.apache.org/jira/browse/SPARK-30275 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.4.4, 3.0.0 Reporter: Jim Kleckner It would be desirable to have public reproducible builds such as provided by gitlab or others. Here is a candidate patch set to build spark using gitlab-ci: * https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml Let me know if there is interest in a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28938) Move to supported OpenJDK docker image for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-28938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962131#comment-16962131 ] Jim Kleckner commented on SPARK-28938: -- I created a minor one-line patch fix for this here. Let me know if it needs a different story. [https://github.com/apache/spark/pull/26296] > Move to supported OpenJDK docker image for Kubernetes > - > > Key: SPARK-28938 > URL: https://issues.apache.org/jira/browse/SPARK-28938 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.4, 3.0.0 > Environment: Kubernetes >Reporter: Rodney Aaron Stainback >Assignee: L. C. Hsieh >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > Attachments: cve-spark-py.txt, cve-spark-r.txt, cve-spark.txt, > twistlock.txt > > > The current docker image used by Kubernetes > {code:java} > openjdk:8-alpine{code} > is not supported > [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links] > It was removed with this commit > [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099] > Quote from commit "4. no more OpenJDK 8 Alpine images (Alpine/musl is not > officially supported by the OpenJDK project, so this reflects that -- see > "Project Portola" for the Alpine porting efforts which I understand are still > in need of help)" > > Please move to a supported image for Kubernetes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29055) Memory leak in Spark
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942224#comment-16942224 ] Jim Kleckner commented on SPARK-29055: -- [~vanzin] It seems from the previous comment that it is not resolved by PR #25973 as the memory continues to grow. [~Geopap] I recorded an internal tech talk about how to connect/grab heap dump/analyze with jvisualvm if you are interested here: [https://www.youtube.com/channel/UCA81uPFG3aqo2X1YgZRAoEg] > Memory leak in Spark > > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3 >Reporter: George Papa >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.5, 3.0.0 > > Attachments: test_csvs.zip > > > I used Spark 2.1.1 and I upgraded into new versions. After Spark version > 2.3.3, I observed from Spark UI that the driver memory is{color:#ff} > increasing continuously.{color} > In more detail, the driver memory and executors memory have the same used > memory storage and after each iteration the storage memory is increasing. You > can reproduce this behavior by running the following snippet code. The > following example, is very simple, without any dataframe persistence, but the > memory consumption is not stable as it was in former Spark versions > (Specifically until Spark 2.3.2). > Also, I tested with Spark streaming and structured streaming API and I had > the same behavior. I tested with an existing application which reads from > Kafka source and do some aggregations, persist dataframes and then unpersist > them. The persist and unpersist it works correct, I see the dataframes in the > storage tab in Spark UI and after the unpersist, all dataframe have removed. > But, after the unpersist the executors memory is not zero, BUT has the same > value with the driver memory. This behavior also affects the application > performance because the memory of the executors is increasing as the driver > increasing and after a while the persisted dataframes are not fit in the > executors memory and I have spill to disk. > Another error which I had after a long running, was > {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded, but I > don't know if its relevant with the above behavior or not.{color} > > *HOW TO REPRODUCE THIS BEHAVIOR:* > Create a very simple application(streaming count_file.py) in order to > reproduce this behavior. This application reads CSV files from a directory, > count the rows and then remove the processed files. > {code:java} > import time > import os > from pyspark.sql import SparkSession > from pyspark.sql import functions as F > from pyspark.sql import types as T > target_dir = "..." 
> spark=SparkSession.builder.appName("DataframeCount").getOrCreate() > while True: > for f in os.listdir(target_dir): > df = spark.read.load(target_dir + f, format="csv") > print("Number of records: {0}".format(df.count())) > time.sleep(15){code} > Submit code: > {code:java} > spark-submit > --master spark://xxx.xxx.xx.xxx > --deploy-mode client > --executor-memory 4g > --executor-cores 3 > streaming count_file.py > {code} > > *TESTED CASES WITH THE SAME BEHAVIOUR:* > * I tested with default settings (spark-defaults.conf) > * Add spark.cleaner.periodicGC.interval 1min (or less) > * {{Turn spark.cleaner.referenceTracking.blocking}}=false > * Run the application in cluster mode > * Increase/decrease the resources of the executors and driver > * I tested with extraJavaOptions in driver and executor -XX:+UseG1GC > -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12 > > *DEPENDENCIES* > * Operation system: Ubuntu 16.04.3 LTS > * Java: jdk1.8.0_131 (tested also with jdk1.8.0_221) > * Python: Python 2.7.12 > > *NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was > extremely low and after the run of ContextCleaner and BlockManager the memory > was decreasing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29055) Memory leak in Spark
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939787#comment-16939787 ] Jim Kleckner commented on SPARK-29055: -- [~Geopap] How quickly does the memory grow? > Memory leak in Spark > > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3 >Reporter: George Papa >Priority: Major > Attachments: test_csvs.zip > > > I used Spark 2.1.1 and I upgraded into new versions. After Spark version > 2.3.3, I observed from Spark UI that the driver memory is{color:#ff} > increasing continuously.{color} > In more detail, the driver memory and executors memory have the same used > memory storage and after each iteration the storage memory is increasing. You > can reproduce this behavior by running the following snippet code. The > following example, is very simple, without any dataframe persistence, but the > memory consumption is not stable as it was in former Spark versions > (Specifically until Spark 2.3.2). > Also, I tested with Spark streaming and structured streaming API and I had > the same behavior. I tested with an existing application which reads from > Kafka source and do some aggregations, persist dataframes and then unpersist > them. The persist and unpersist it works correct, I see the dataframes in the > storage tab in Spark UI and after the unpersist, all dataframe have removed. > But, after the unpersist the executors memory is not zero, BUT has the same > value with the driver memory. This behavior also affects the application > performance because the memory of the executors is increasing as the driver > increasing and after a while the persisted dataframes are not fit in the > executors memory and I have spill to disk. > Another error which I had after a long running, was > {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded, but I > don't know if its relevant with the above behavior or not.{color} > > *HOW TO REPRODUCE THIS BEHAVIOR:* > Create a very simple application(streaming count_file.py) in order to > reproduce this behavior. This application reads CSV files from a directory, > count the rows and then remove the processed files. > {code:java} > import time > import os > from pyspark.sql import SparkSession > from pyspark.sql import functions as F > from pyspark.sql import types as T > target_dir = "..." 
> spark=SparkSession.builder.appName("DataframeCount").getOrCreate() > while True: > for f in os.listdir(target_dir): > df = spark.read.load(target_dir + f, format="csv") > print("Number of records: {0}".format(df.count())) > time.sleep(15){code} > Submit code: > {code:java} > spark-submit > --master spark://xxx.xxx.xx.xxx > --deploy-mode client > --executor-memory 4g > --executor-cores 3 > streaming count_file.py > {code} > > *TESTED CASES WITH THE SAME BEHAVIOUR:* > * I tested with default settings (spark-defaults.conf) > * Add spark.cleaner.periodicGC.interval 1min (or less) > * {{Turn spark.cleaner.referenceTracking.blocking}}=false > * Run the application in cluster mode > * Increase/decrease the resources of the executors and driver > * I tested with extraJavaOptions in driver and executor -XX:+UseG1GC > -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12 > > *DEPENDENCIES* > * Operation system: Ubuntu 16.04.3 LTS > * Java: jdk1.8.0_131 (tested also with jdk1.8.0_221) > * Python: Python 2.7.12 > > *NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was > extremely low and after the run of ContextCleaner and BlockManager the memory > was decreasing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11141) Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes
[ https://issues.apache.org/jira/browse/SPARK-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902537#comment-15902537 ] Jim Kleckner edited comment on SPARK-11141 at 3/9/17 5:54 AM: -- FYI, this can cause problems when not using S3 during shutdown as described in this AWS posting: https://forums.aws.amazon.com/thread.jspa?threadID=223378 The workaround indicated is to use --conf spark.streaming.driver.writeAheadLog.allowBatching=false with the submit. The exception contains the text: {code} streaming stop ReceivedBlockTracker: Exception thrown while writing record: BatchAllocationEvent {code} was (Author: jkleckner): FYI, this can cause problems when not using S3 during shutdown as described in this AWS posting: https://forums.aws.amazon.com/thread.jspa?threadID=223378 The workaround indicated is to use --conf spark.streaming.driver.writeAheadLog.allowBatching=false with the submit. > Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes > -- > > Key: SPARK-11141 > URL: https://issues.apache.org/jira/browse/SPARK-11141 > Project: Spark > Issue Type: Improvement > Components: DStreams >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 1.6.0 > > > When using S3 as a directory for WALs, the writes take too long. The driver > gets very easily bottlenecked when multiple receivers send AddBlock events to > the ReceiverTracker. This PR adds batching of events in the > ReceivedBlockTracker so that receivers don't get blocked by the driver for > too long. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11141) Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes
[ https://issues.apache.org/jira/browse/SPARK-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902537#comment-15902537 ] Jim Kleckner commented on SPARK-11141: -- FYI, this can cause problems when not using S3 during shutdown as described in this AWS posting: https://forums.aws.amazon.com/thread.jspa?threadID=223378 The workaround indicated is to use --conf spark.streaming.driver.writeAheadLog.allowBatching=false with the submit. > Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes > -- > > Key: SPARK-11141 > URL: https://issues.apache.org/jira/browse/SPARK-11141 > Project: Spark > Issue Type: Improvement > Components: DStreams >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 1.6.0 > > > When using S3 as a directory for WALs, the writes take too long. The driver > gets very easily bottlenecked when multiple receivers send AddBlock events to > the ReceiverTracker. This PR adds batching of events in the > ReceivedBlockTracker so that receivers don't get blocked by the driver for > too long. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
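For reference, the same workaround can also be applied when the streaming context is built rather than on the spark-submit command line. This is only a sketch: the allowBatching key is the one quoted above, while the app name, master, and batch interval are illustrative.

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalWithoutBatching extends App {
  val conf = new SparkConf()
    .setAppName("wal-without-batching")
    .setMaster("local[2]") // illustrative; normally supplied by spark-submit
    // Avoid the shutdown exception described above by disabling batched WAL writes.
    .set("spark.streaming.driver.writeAheadLog.allowBatching", "false")

  val ssc = new StreamingContext(conf, Seconds(10))
  // ... define input streams and output operations, then:
  // ssc.start(); ssc.awaitTermination()
}
{code}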
[jira] [Commented] (SPARK-16333) Excessive Spark history event/json data size (5GB each)
[ https://issues.apache.org/jira/browse/SPARK-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900533#comment-15900533 ] Jim Kleckner commented on SPARK-16333: -- I ended up here when looking into why an upgrade of our streaming computation to 2.1.0 was pegging the network at a gigabit/second. Setting spark.eventLog.enabled to false confirmed that this logging from slave port 50010 was the culprit. How can anyone with seriously large numbers of tasks use the Spark history server with this amount of load? > Excessive Spark history event/json data size (5GB each) > --- > > Key: SPARK-16333 > URL: https://issues.apache.org/jira/browse/SPARK-16333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 > Environment: this is seen on both x86 (Intel(R) Xeon(R), E5-2699 ) > and ppc platform (Habanero, Model: 8348-21C), Red Hat Enterprise Linux Server > release 7.2 (Maipo)., Spark2.0.0-preview (May-24, 2016 build) >Reporter: Peter Liu > Labels: performance, spark2.0.0 > > With Spark2.0.0-preview (May-24 build), the history event data (the json > file), that is generated for each Spark application run (see below), can be > as big as 5GB (instead of 14 MB for exactly the same application run and the > same input data of 1TB under Spark1.6.1) > -rwxrwx--- 1 root root 5.3G Jun 30 09:39 app-20160630091959- > -rwxrwx--- 1 root root 5.3G Jun 30 09:56 app-20160630094213- > -rwxrwx--- 1 root root 5.3G Jun 30 10:13 app-20160630095856- > -rwxrwx--- 1 root root 5.3G Jun 30 10:30 app-20160630101556- > The test is done with Sparkbench V2, SQL RDD (see github: > https://github.com/SparkTC/spark-bench)
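A quick way to check whether event logging is behind that kind of traffic is to rerun the same job once with it switched off. Illustrative snippet only; the application name, master, and query are made up.

{code:scala}
import org.apache.spark.sql.SparkSession

object EventLogOff extends App {
  val spark = SparkSession.builder()
    .appName("event-log-off-test")
    .master("local[2]")                        // illustrative; normally supplied by spark-submit
    .config("spark.eventLog.enabled", "false") // skip writing history event JSON entirely
    .getOrCreate()

  spark.range(1000000L).selectExpr("sum(id)").show()
  spark.stop()
}
{code}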
[jira] [Created] (SPARK-6029) Spark excludes "fastutil" dependencies of "clearspring" quantiles
Jim Kleckner created SPARK-6029: --- Summary: Spark excludes "fastutil" dependencies of "clearspring" quantiles Key: SPARK-6029 URL: https://issues.apache.org/jira/browse/SPARK-6029 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.2.1 Reporter: Jim Kleckner Spark includes the clearspring analytics package but intentionally excludes its fastutil dependency. Spark also includes parquet-column, which bundles fastutil and relocates it under parquet/, but the resulting shaded jar is incomplete: it shades out some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar that parquet-column references. We are using more of the clearspring classes (e.g. QDigest), and those do depend on missing fastutil classes like Long2LongOpenHashMap. Even though I add them to our assembly jar file, the class loader finds the Spark assembly first and we get runtime class loader errors when we try to use them. The [documentation|http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment] and the possibly related issue [SPARK-939|https://issues.apache.org/jira/browse/SPARK-939] suggest arguments that I tried with spark-submit: {code} --conf spark.driver.userClassPathFirst=true \ --conf spark.executor.userClassPathFirst=true {code} but we still get the class-not-found error. Could this be a bug with {{userClassPathFirst=true}}? That is, should it work? In any case, would it be reasonable to not exclude the "fastutil" dependencies? See the email discussion [here|http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-excludes-quot-fastutil-quot-dependencies-we-need-tt21812.html]
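For illustration, this is the sort of QDigest usage that trips over the exclusion. A sketch assuming the com.clearspring.analytics stream-lib artifact is on the classpath; when the fastutil classes have been shaded away, it fails at runtime with a NoClassDefFoundError for Long2LongOpenHashMap.

{code:scala}
import com.clearspring.analytics.stream.quantile.QDigest

object QuantileExample extends App {
  val digest = new QDigest(100.0)               // compression factor
  (1L to 100000L).foreach(v => digest.offer(v)) // QDigest keeps its buckets in a fastutil map internally
  println(s"p50 = ${digest.getQuantile(0.5)}")
  println(s"p99 = ${digest.getQuantile(0.99)}")
}
{code}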