[jira] [Comment Edited] (SPARK-23153) Support application dependencies in submission client's local file system
[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416846#comment-17416846 ]

Stavros Kontopoulos edited comment on SPARK-23153 at 9/17/21, 6:25 PM:
-----------------------------------------------------------------------

[~xuzhoyin] Sorry for the late reply. In the past the local scheme meant local in the container, i.e. it had a different meaning (https://github.com/apache/spark/pull/21378), so this was intentional. I am not sure of the status now. Btw, regarding the S3 prefix, if I remember correctly the idea was not to download files from a remote location locally and then store them again somewhere else, e.g. S3; this was intended for local files only. Feel free to add any other capabilities.

was (Author: skonto):
[~xuzhoyin] Sorry for the late reply. In the past the local scheme meant local in the container, i.e. it had a different meaning (https://github.com/apache/spark/pull/21378), so this was intentional. I am not sure of the status now. Btw, if I remember correctly the idea was not to download files from a remote location locally and then store them again, e.g. to S3.

> Support application dependencies in submission client's local file system
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23153
>                 URL: https://issues.apache.org/jira/browse/SPARK-23153
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Assignee: Stavros Kontopoulos
>            Priority: Major
>             Fix For: 3.0.0
>
> Currently local dependencies are not supported with Spark on K8S, i.e. if the
> user has code or dependencies only on the client where they run
> {{spark-submit}}, then the current implementation has no way to make those
> visible to the Spark application running inside the K8S pods that get
> launched. This limits users to only running applications where the code and
> dependencies are either baked into the Docker images used or where those are
> available via some external and globally accessible file system, e.g. HDFS,
> which are not viable options for many users and environments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247759#comment-17247759 ]

Stavros Kontopoulos commented on SPARK-33737:
---------------------------------------------

In addition, the current implementation has been out for a long time and is stable, so we need to be sure that any updates will not cause issues. I can work on a PR and see how things integrate.

> Use an Informer+Lister API in the ExecutorPodWatcher
> ----------------------------------------------------
>
>                 Key: SPARK-33737
>                 URL: https://issues.apache.org/jira/browse/SPARK-33737
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Stavros Kontopoulos
>            Priority: Major
>
> The Kubernetes backend uses the Fabric8 client and a
> [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42]
> to monitor the K8s API server for pod changes. Every watcher keeps a
> websocket connection open and has no caching mechanism at that point. Caching
> at the Spark K8s resource manager does exist in other areas where we hit the
> API server for Pod CRUD ops, like
> [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49].
> In an environment where many connections are kept open due to large-scale
> jobs, this could be problematic and impose a lot of load on the API server.
> Many long-running jobs, e.g. streaming jobs, should not create enough pod
> changes to justify a continuous watching mechanism.
> Recent Fabric8 client versions have implemented a SharedInformer API plus a
> Lister; an example can be found
> [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37].
> This new API follows the implementation of the official Java K8s client and
> its Go counterpart, and it is backed by a caching mechanism which is re-synced
> after a configurable period to avoid hitting the API server all the time.
> There is also a lister that keeps track of the current status of resources.
> Using such a mechanism is commonplace when implementing a K8s controller.
> The suggestion is to update the client to v4.13.0 (which has all the updates
> with respect to that API) and use the informer+lister API where applicable.
> I think the lister could also replace part of the snapshotting/notification
> mechanism.
> /cc [~dongjoon] [~eje] [~holden] WDYT?
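The informer+lister pattern proposed in the ticket can be sketched roughly as follows. This is a hypothetical illustration against the Fabric8 4.13.x API shape, not code from the Spark repository; the class name, the "spark-namespace" value, and the resync period are made-up placeholders, and running it requires access to a live Kubernetes cluster.

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodList;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;
import io.fabric8.kubernetes.client.informers.SharedInformerFactory;
import io.fabric8.kubernetes.client.informers.cache.Lister;

public class ExecutorPodInformerSketch {
  // Illustrative resync period; the real value would be configurable.
  static final long RESYNC_PERIOD_MS = 30_000L;

  public static void main(String[] args) {
    try (KubernetesClient client = new DefaultKubernetesClient()) {
      SharedInformerFactory factory = client.informers();

      // The informer keeps a local, periodically re-synced cache of Pods,
      // instead of a bare watch holding one websocket per watcher.
      SharedIndexInformer<Pod> informer =
          factory.sharedIndexInformerFor(Pod.class, PodList.class, RESYNC_PERIOD_MS);

      informer.addEventHandler(new ResourceEventHandler<Pod>() {
        @Override public void onAdd(Pod pod) { /* pod created */ }
        @Override public void onUpdate(Pod oldPod, Pod newPod) { /* pod changed */ }
        @Override public void onDelete(Pod pod, boolean unknownFinalState) { /* pod gone */ }
      });
      factory.startAllRegisteredInformers();

      // The lister answers reads from the informer's cache, not the API server,
      // which is what could replace part of the snapshotting mechanism.
      Lister<Pod> podLister = new Lister<>(informer.getIndexer(), "spark-namespace");
      podLister.list().forEach(p -> System.out.println(p.getMetadata().getName()));
    }
  }
}
```

The key load-reduction point is that event handlers fire from the shared cache's update stream, and listing executor pods becomes a local cache read rather than a fresh GET against the API server.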
[jira] [Updated] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-33737: Description: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the Spark K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic and impose a lot of load against the API server. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is re-synced after a configurable period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. Using such a mechanism is common place when implementing a K8s controller. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. 
/cc [~dongjoon] [~eje] [~holden] WDYTH? was: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic and impose a lot of load against the API server. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is re-synced after a configurable period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. Using such a mechanism is common place when implementing a K8s controller. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden] WDYTH? 
> Use an Informer+Lister API in the ExecutorPodWatcher > > > Key: SPARK-33737 > URL: https://issues.apache.org/jira/browse/SPARK-33737 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.2 >Reporter: Stavros Kontopoulos >Priority: Major > > Kubernetes backend uses Fabric8 client and a > [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] > to monitor the K8s Api server for pod changes. Every watcher keeps a > websocket connection open and has no caching mechanism at that part. Caching > at the Spark K8s resource manager exists in other areas where we are hitting > the Api Server for Pod CRUD ops like >
[jira] [Updated] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-33737: Description: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic and impose a lot of load against the API server. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is re-synced after a configurable period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. Using such a mechanism is common place when implementing a K8s controller. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. 
/cc [~dongjoon] [~eje] [~holden] WDYTH? was: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic and impose a lot of load against the API server. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. Using such a mechanism is common place when implementing a K8s controller. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden] WDYTH? 
> Use an Informer+Lister API in the ExecutorPodWatcher > > > Key: SPARK-33737 > URL: https://issues.apache.org/jira/browse/SPARK-33737 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.2 >Reporter: Stavros Kontopoulos >Priority: Major > > Kubernetes backend uses Fabric8 client and a > [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] > to monitor the K8s Api server for pod changes. Every watcher keeps a > websocket connection open and has no caching mechanism at that part. Caching > at the K8s resource manager exists in other areas where we are hitting the > Api Server for Pod CRUD ops like >
[jira] [Updated] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-33737: Description: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic and impose a lot of load against the API server. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. Using such a mechanism is common place when implementing a K8s controller. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. 
/cc [~dongjoon] [~eje] [~holden] WDYTH? was: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden] WDYTH? 
> Use an Informer+Lister API in the ExecutorPodWatcher > > > Key: SPARK-33737 > URL: https://issues.apache.org/jira/browse/SPARK-33737 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.2 >Reporter: Stavros Kontopoulos >Priority: Major > > Kubernetes backend uses Fabric8 client and a > [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] > to monitor the K8s Api server for pod changes. Every watcher keeps a > websocket connection open and has no caching mechanism at that part. Caching > at the K8s resource manager exists in other areas where we are hitting the > Api Server for Pod CRUD ops like >
[jira] [Updated] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-33737: Description: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching at the K8s resource manager exists in other areas where we are hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden] WDYTH? 
was: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching exists in other areas where hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden] WDYTH? 
> Use an Informer+Lister API in the ExecutorPodWatcher > > > Key: SPARK-33737 > URL: https://issues.apache.org/jira/browse/SPARK-33737 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.2 >Reporter: Stavros Kontopoulos >Priority: Major > > Kubernetes backend uses Fabric8 client and a > [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] > to monitor the K8s Api server for pod changes. Every watcher keeps a > websocket connection open and has no caching mechanism at that part. Caching > at the K8s resource manager exists in other areas where we are hitting the > Api Server for Pod CRUD ops like > [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. > In an env where a lot of connections are kept due to large scale jobs this > could be problematic. > A lot of long running jobs should not create pod
[jira] [Updated] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-33737: Description: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching exists in other areas where hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden] WDYTH? 
was: Kubernetes backend uses Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s Api server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that part. Caching exists in other areas where hitting the Api Server for Pod CRUD ops like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an env where a lot of connections are kept due to large scale jobs this could be problematic. A lot of long running jobs should not create pod changes eg. Streaming jobs to justify a continuous watching mechanism. Latest Frabric8 client versions have implemented a SharedInformer API+Lister, an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official java K8s client and the go counterpart and it is backed up by a caching mechanism which is resynced after a configurble period to avoid hitting the API server all the time. There is also a lister that keeps track of current status of resources. The suggestion is to update to v4.13.0 the client (has all updates in wrt that API) and use the informer+lister API where applicable. I think the lister could also replace part of the snapshotting/notification mechanism. /cc [~dongjoon] [~eje] [~holden]] WDYTH? 
> Use an Informer+Lister API in the ExecutorPodWatcher > > > Key: SPARK-33737 > URL: https://issues.apache.org/jira/browse/SPARK-33737 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.2 >Reporter: Stavros Kontopoulos >Priority: Major > > Kubernetes backend uses Fabric8 client and a > [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] > to monitor the K8s Api server for pod changes. Every watcher keeps a > websocket connection open and has no caching mechanism at that part. Caching > exists in other areas where hitting the Api Server for Pod CRUD ops like > [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. > In an env where a lot of connections are kept due to large scale jobs this > could be problematic. > A lot of long running jobs should not create pod changes eg. Streaming jobs > to justify a continuous watching mechanism.
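The informer+lister pattern proposed above can be sketched without the Fabric8 dependency: event callbacks (mirroring an informer event handler's onAdd/onUpdate/onDelete) keep a local cache in sync, and a "lister" serves reads from that cache rather than from the API server. This is an illustrative sketch only; the PodCache class and its method names are hypothetical and not part of the Fabric8 or Spark APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the informer+lister caching pattern.
// A real informer (e.g. Fabric8's SharedInformer) would feed these
// callbacks from a single watch and periodically resync the cache
// against the API server.
class PodCache {
    private final Map<String, String> phaseByPodName = new ConcurrentHashMap<>();

    // Event-handler side: callbacks keep the local cache in sync.
    void onAdd(String podName, String phase)    { phaseByPodName.put(podName, phase); }
    void onUpdate(String podName, String phase) { phaseByPodName.put(podName, phase); }
    void onDelete(String podName)               { phaseByPodName.remove(podName); }

    // Lister side: reads are served from the cache, not the API server.
    List<String> listPodNames() { return new ArrayList<>(phaseByPodName.keySet()); }
    String phaseOf(String podName) { return phaseByPodName.get(podName); }
}
```

The point of the suggestion is visible here: status queries like listPodNames() never touch the API server, so only the informer's single, resynced connection does, instead of one open websocket watch per driver.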
[jira] [Created] (SPARK-33737) Use an Informer+Lister API in the ExecutorPodWatcher
Stavros Kontopoulos created SPARK-33737: --- Summary: Use an Informer+Lister API in the ExecutorPodWatcher Key: SPARK-33737 URL: https://issues.apache.org/jira/browse/SPARK-33737 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.0.2 Reporter: Stavros Kontopoulos The Kubernetes backend uses the Fabric8 client and a [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42] to monitor the K8s API server for pod changes. Every watcher keeps a websocket connection open and has no caching mechanism at that layer. Caching exists in other areas that hit the API server for pod CRUD ops, like [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49]. In an environment where many connections are kept open due to large-scale jobs, this could be problematic. Many long-running jobs (e.g. streaming jobs) should not create enough pod changes to justify a continuous watching mechanism. The latest Fabric8 client versions have implemented a SharedInformer API plus a Lister; an example can be found [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37]. This new API follows the implementation of the official Java K8s client and its Go counterpart, and it is backed by a caching mechanism which is resynced after a configurable period. There is also a lister that keeps track of the current status of resources. The suggestion is to update the client to v4.13.0 (which has all the updates for that API) and use the informer+lister API where applicable. I think the lister could replace part of the snapshotting/notification mechanism. /cc [~erikerlandson] [~dongjoon] WDYT? 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-27936: Comment: was deleted (was: Will create a PR shortly.) > Support local dependency uploading from --py-files > -- > > Key: SPARK-27936 > URL: https://issues.apache.org/jira/browse/SPARK-27936 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Erik Erlandson >Priority: Major > > Support python dependency uploads, as in SPARK-23153
[jira] [Commented] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934227#comment-16934227 ] Stavros Kontopoulos commented on SPARK-27936: - Will create a PR shortly. > Support local dependency uploading from --py-files > -- > > Key: SPARK-27936 > URL: https://issues.apache.org/jira/browse/SPARK-27936 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Erik Erlandson >Priority: Major > > Support python dependency uploads, as in SPARK-23153
[jira] [Comment Edited] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922547#comment-16922547 ] Stavros Kontopoulos edited comment on SPARK-28953 at 9/4/19 3:43 PM: - [~srowen] thanks, I will have a look; I need to test whether it removes the extra text. The question is why the command run via Java differs from the bash one. [~holden.ka...@gmail.com] fyi. was (Author: skonto): [~srowen] thanks I will have a look need to test if ti removes the extra text, the question is why the command via java is different compared to the bash one [~holden.ka...@gmail.com] fyi. > Integration tests fail due to malformed URL > --- > > Key: SPARK-28953 > URL: https://issues.apache.org/jira/browse/SPARK-28953 > Project: Spark > Issue Type: Bug > Components: jenkins, Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Tests failed on Ubuntu, verified on two different machines: > KubernetesSuite: > - Launcher client dependencies *** FAILED *** > java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 > at java.net.URL.(URL.java:600) > at java.net.URL.(URL.java:497) > at java.net.URL.(URL.java:446) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) > Type in expressions to have them evaluated. > Type :help for more information. 
> > scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service > ceph-nano-s3 -n spark --url") > pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> pb.redirectErrorStream(true) > res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> val proc = pb.start() > proc: Process = java.lang.UNIXProcess@5e9650d3 > scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) > r: String = > "* http://172.31.46.91:30706 > " > Although (no asterisk): > $ minikube service ceph-nano-s3 -n spark --url > [http://172.31.46.91:30706|http://172.31.46.91:30706/] > > This is weird because it fails at the java level, where does the asterisk > come from? > $ minikube version > minikube version: v1.3.1 > commit: ca60a424ce69a4d79f502650199ca2b52f29e631 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
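The MalformedURLException quoted above comes from passing the raw captured output ("* http://172.31.46.91:30706") straight to java.net.URL, which rejects the leading asterisk as "no protocol". A defensive fix is to strip anything before the URL scheme before parsing. The UrlSanitizer class below is a hypothetical sketch of that workaround, not code from the test suite:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: drop any prefix before the scheme (e.g. the
// stray "* " that minikube's output carried here) and surrounding
// whitespace, so the cleaned string parses as a URL.
class UrlSanitizer {
    static String sanitizeUrl(String raw) {
        String trimmed = raw.trim();
        int schemeStart = trimmed.indexOf("http");
        return schemeStart >= 0 ? trimmed.substring(schemeStart) : trimmed;
    }

    static URL parseServiceUrl(String raw) throws MalformedURLException {
        // Would throw on the raw "* http://..." string; succeeds once sanitized.
        return new URL(sanitizeUrl(raw));
    }
}
```

This only hides the symptom; the underlying question in the report, why the output captured via ProcessBuilder carries the asterisk while the interactive bash invocation does not, still stands.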
[jira] [Commented] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922292#comment-16922292 ] Stavros Kontopoulos commented on SPARK-28953: - [~shaneknapp] I can attach the build log, but this fails in the internal CI and on two other machines: my local machine and an Ubuntu AWS instance. What version of minikube do we use on the test machines? Btw, this is failing constantly.
[jira] [Comment Edited] (SPARK-28895) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/SPARK-28895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921394#comment-16921394 ] Stavros Kontopoulos edited comment on SPARK-28895 at 9/3/19 1:24 PM: - I changed the version to Spark 3.0.0 as this does not exist in 2.4.3. was (Author: skonto): I changed the version to Spark 3.0.0 as this does not exist in 2.4.3. I havent used spark.kubernetes.hadoop.configMapName before so it is good that you have reported this. We can enhance the feature. I would mark this as a Improvement btw. > Spark client process is unable to upload jars to hdfs while using ConfigMap > not HADOOP_CONF_DIR > --- > > Key: SPARK-28895 > URL: https://issues.apache.org/jira/browse/SPARK-28895 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Kent Yao >Priority: Major > > The *BasicDriverFeatureStep* for Spark on Kubernetes will upload the > files/jars specified by --files/–jars to a hadoop compatible file system > configured by spark.kubernetes.file.upload.path. While using HADOOP_CONF_DIR, > the spark-submit process can recognize the file system, but when using > spark.kubernetes.hadoop.configMapName which only will be mount on the Pods > not applied back to our client process. 
> > ||Heading 1||Heading 2|| > |HADOOP_CONF_DIR=/path/to/etc/hadoop|OK| > |spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir |FAILED| > > {code:java} > Kent@KentsMacBookPro > ~/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3 bin/spark-submit > --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf > --jars > /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar > --conf spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab --conf > spark.kerberos.principal=kyuubi/d...@hadoop.hz.netease.com --conf > spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf --name hehe --deploy-mode > cluster --class org.apache.spark.examples.HdfsTest > local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar > hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl > Listening for transport dt_socket at address: 50014 > # spark.master=k8s://https://10.120.238.100:7443 > 19/08/27 17:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 19/08/27 17:21:07 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > Listening for transport dt_socket at address: 50014 > Exception in thread "main" org.apache.spark.SparkException: Uploading file > /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar > failed... 
> at > org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:287) > at > org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245) > at > org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatur# > spark.master=k8s://https://10.120.238.100:7443 > eStep.scala:165) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163) > at > org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:89) > at > org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58) > at >
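The root cause reported above is that spark.kubernetes.hadoop.configMapName only mounts the Hadoop configuration into the driver and executor pods, while the upload of --jars/--files happens in the spark-submit process itself, which therefore cannot resolve hdfs://. A hypothetical client-side fail-fast check (requireResolvableScheme is not a Spark API, just an illustration of the constraint) would make the failure mode explicit:

```java
import java.net.URI;
import java.util.Set;

public class UploadPathCheck {
    // Hypothetical helper (not a Spark API): fail fast when the submission
    // client cannot resolve the scheme of spark.kubernetes.file.upload.path.
    public static String requireResolvableScheme(String uploadPath, Set<String> clientSchemes) {
        String scheme = URI.create(uploadPath).getScheme();
        if (scheme == null || !clientSchemes.contains(scheme)) {
            throw new IllegalStateException(
                "Scheme '" + scheme + "' of " + uploadPath + " is not configured on the client; "
              + "spark.kubernetes.hadoop.configMapName is only mounted into the pods, "
              + "so the spark-submit process also needs HADOOP_CONF_DIR");
        }
        return scheme;
    }

    public static void main(String[] args) {
        // With HADOOP_CONF_DIR loaded, the client would know about hdfs://.
        System.out.println(requireResolvableScheme(
            "hdfs://hz-cluster10/user/kyuubi/udf", Set.of("hdfs", "s3a", "file")));
    }
}
```

This is why the table in the report shows HADOOP_CONF_DIR succeeding and the config-map-only setup failing: the pods are configured either way, but only HADOOP_CONF_DIR configures the client.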
[jira] [Commented] (SPARK-28896) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/SPARK-28896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921397#comment-16921397 ] Stavros Kontopoulos commented on SPARK-28896: - Changed to Spark 3.0.0. I will review the PR. > Spark client process is unable to upload jars to hdfs while using ConfigMap > not HADOOP_CONF_DIR > --- > > Key: SPARK-28896 > URL: https://issues.apache.org/jira/browse/SPARK-28896 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Kent Yao >Priority: Major
[jira] [Updated] (SPARK-28896) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/SPARK-28896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-28896: Affects Version/s: (was: 2.4.3) 3.0.0 > Spark client process is unable to upload jars to hdfs while using ConfigMap > not HADOOP_CONF_DIR > --- > > Key: SPARK-28896 > URL: https://issues.apache.org/jira/browse/SPARK-28896 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Kent Yao >Priority: Major
[jira] [Updated] (SPARK-28895) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/SPARK-28895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-28895: Affects Version/s: (was: 2.4.3) 3.0.0 > Spark client process is unable to upload jars to hdfs while using ConfigMap > not HADOOP_CONF_DIR > --- > > Key: SPARK-28895 > URL: https://issues.apache.org/jira/browse/SPARK-28895 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Kent Yao >Priority: Major
[jira] [Comment Edited] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921315#comment-16921315 ] Stavros Kontopoulos edited comment on SPARK-28921 at 9/3/19 10:25 AM: -- [~andygrove] could you please clarify what do you mean when you say? "jobs like Spark-Pi that do not launch executors run without a problem" I run a pi job and it creates executors fine: spark-pi-03afbd6cf6a72622-driver 1/1 Running 0 15s spark-pi-03afbd6cf6a72622-exec-1 1/1 Running 0 7s spark-pi-03afbd6cf6a72622-exec-2 1/1 Running 0 7s was (Author: skonto): [~andygrove] could you please clarify what do you mean when you say? "jobs like Spark-Pi that do not launch executors run without a problem" I run a pi job and it creates executors fine. > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. 
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
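The stack trace shows where the failure surfaces: an executor-pod watch is a WebSocket upgrade, and after the CVE-2019-14809 hardening the API server answers 403 Forbidden instead of 101 Switching Protocols for the old client's request. A minimal sketch of the check that raises the error (modeled on, not copied from, okhttp's RealWebSocket.checkResponse):

```java
import java.net.ProtocolException;

public class UpgradeCheck {
    // A Kubernetes watch is a WebSocket upgrade, so any status other than
    // HTTP 101 (here the API server's 403) aborts the connection with the
    // ProtocolException seen in the log above.
    public static void checkUpgradeResponse(int code, String statusLine) throws ProtocolException {
        if (code != 101) {
            throw new ProtocolException("Expected HTTP 101 response but was '" + statusLine + "'");
        }
    }

    public static void main(String[] args) {
        try {
            checkUpgradeResponse(403, "403 Forbidden");
        } catch (ProtocolException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Upgrading kubernetes-client to 4.4.2, as the report suggests, changes what the client sends so the server accepts the upgrade; the check itself is correct behavior.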
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921315#comment-16921315 ] Stavros Kontopoulos commented on SPARK-28921: - [~andygrove] could you please clarify what do you mean when you say? "jobs like Spark-Pi that do not launch executors run without a problem" I run a pi job and it creates executors fine. > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. 
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
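The upgrade suggested above is a one-line dependency change. As a sketch (illustrative placement only; the actual coordinates are the io.fabric8 ones Spark's Kubernetes module already uses, and the version to pin should follow the client's compatibility matrix):

```xml
<!-- resource-managers/kubernetes/core/pom.xml (illustrative placement) -->
<dependency>
  <groupId>io.fabric8</groupId>
  <artifactId>kubernetes-client</artifactId>
  <!-- 4.4.2 is the version cited in the comment above -->
  <version>4.4.2</version>
</dependency>
```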
[jira] [Comment Edited] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921087#comment-16921087 ] Stavros Kontopoulos edited comment on SPARK-28953 at 9/3/19 12:04 AM: -- [~shaneknapp] [~eje] I can fix this since I am working on SPARK-27936, but I'm wondering about the root cause. was (Author: skonto): [~shaneknapp] [~eje] I can fix this since I am working on SPARK-27936, but I'm wondering of the root cause. > Integration tests fail due to malformed URL > --- > > Key: SPARK-28953 > URL: https://issues.apache.org/jira/browse/SPARK-28953 > Project: Spark > Issue Type: Bug > Components: jenkins, Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Tests failed on Ubuntu, verified on two different machines: > KubernetesSuite: > - Launcher client dependencies *** FAILED *** > java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 > at java.net.URL.(URL.java:600) > at java.net.URL.(URL.java:497) > at java.net.URL.(URL.java:446) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) > Type in expressions to have them evaluated. > Type :help for more information. 
> > scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service > ceph-nano-s3 -n spark --url") > pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> pb.redirectErrorStream(true) > res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> val proc = pb.start() > proc: Process = java.lang.UNIXProcess@5e9650d3 > scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) > r: String = > "* http://172.31.46.91:30706 > " > Although (no asterisk): > $ minikube service ceph-nano-s3 -n spark --url > [http://172.31.46.91:30706|http://172.31.46.91:30706/] > > This is weird because it fails at the java level, where does the asterisk > come from? > $ minikube version > minikube version: v1.3.1 > commit: ca60a424ce69a4d79f502650199ca2b52f29e631 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
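One defensive way around the stray asterisk, as a sketch (a hypothetical helper, not the actual DepsTestsSuite change): scan the captured process output and keep the first whitespace-separated token that `java.net.URL` accepts, which tolerates decorations such as the leading `*`.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ServiceUrl {
    // Return the first whitespace-separated token that parses as a URL,
    // or null if none does. Skips decorations such as the leading "*"
    // seen in the minikube output captured above.
    static String firstUrl(String raw) {
        for (String token : raw.trim().split("\\s+")) {
            try {
                new URL(token); // throws "no protocol: *" on the stray marker
                return token;
            } catch (MalformedURLException e) {
                // not a URL; keep scanning
            }
        }
        return null;
    }
}
```

For the captured string `"* http://172.31.46.91:30706\n"` this would yield `http://172.31.46.91:30706`, regardless of where the asterisk comes from.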
[jira] [Created] (SPARK-28953) Integration tests fail due to malformed URL
Stavros Kontopoulos created SPARK-28953: --- Summary: Integration tests fail due to malformed URL Key: SPARK-28953 URL: https://issues.apache.org/jira/browse/SPARK-28953 Project: Spark Issue Type: Bug Components: jenkins, Kubernetes Affects Versions: 3.0.0 Reporter: Stavros Kontopoulos Tests failed on Ubuntu, verified on two different machines: KubernetesSuite: - Launcher client dependencies *** FAILED *** java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 at java.net.URL.(URL.java:600) at java.net.URL.(URL.java:497) at java.net.URL.(URL.java:446) at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT /_/ Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) Type in expressions to have them evaluated. Type :help for more information. scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service ceph-nano-s3 -n spark --url") pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 scala> pb.redirectErrorStream(true) res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 scala> val proc = pb.start() proc: Process = java.lang.UNIXProcess@5e9650d3 scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) r: String = "* http://172.31.46.91:30706 " Although (no asterisk): $ minikube service ceph-nano-s3 -n spark --url [http://172.31.46.91:30706|http://172.31.46.91:30706/] This is weird because it fails at the java level, where does the asterisk come from? 
$ minikube version minikube version: v1.3.1 commit: ca60a424ce69a4d79f502650199ca2b52f29e631 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28025) HDFSBackedStateStoreProvider should not leak .crc files
[ https://issues.apache.org/jira/browse/SPARK-28025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919583#comment-16919583 ] Stavros Kontopoulos commented on SPARK-28025: - Thanks I will have a look :) > HDFSBackedStateStoreProvider should not leak .crc files > > > Key: SPARK-28025 > URL: https://issues.apache.org/jira/browse/SPARK-28025 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 > Environment: Spark 2.4.3 > Kubernetes 1.11(?) (OpenShift) > StateStore storage on a mounted PVC. Viewed as a local filesystem by the > `FileContextBasedCheckpointFileManager` : > {noformat} > scala> glusterfm.isLocal > res17: Boolean = true{noformat} >Reporter: Gerard Maas >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > The HDFSBackedStateStoreProvider when using the default CheckpointFileManager > is leaving '.crc' files behind. There's a .crc file created for each > `atomicFile` operation of the CheckpointFileManager. > Over time, the number of files becomes very large. It makes the state store > file system constantly increase in size and, in our case, deteriorates the > file system performance. > Here's a sample of one of our spark storage volumes after 2 days of execution > (4 stateful streaming jobs, each on a different sub-dir): > # > {noformat} > Total files in PVC (used for checkpoints and state store) > $find . | wc -l > 431796 > # .crc files > $find . -name "*.crc" | wc -l > 418053{noformat} > With each .crc file taking one storage block, the used storage runs into the > GBs of data. > These jobs are running on Kubernetes. 
Our shared storage provider, GlusterFS, > shows serious performance deterioration with this large number of files: > {noformat} > DEBUG HDFSBackedStateStoreProvider: fetchFiles() took 29164ms{noformat} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
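To make the leak concrete: a checksum file in Hadoop's local filesystem is named `.<file>.crc` and sits beside the file it covers, so a leaked entry is simply a `.crc` name whose base file no longer exists. A hypothetical audit helper (not Spark code) that spots such orphans in a directory listing:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CrcAudit {
    // Hadoop's ChecksumFileSystem names a checksum file ".<file>.crc" and
    // places it next to the file it covers. A ".crc" entry whose base file
    // is gone was left behind by a rename/delete and can be reclaimed.
    static List<String> orphanedCrcs(Collection<String> names) {
        Set<String> present = new HashSet<>(names);
        List<String> orphans = new ArrayList<>();
        for (String n : names) {
            if (n.startsWith(".") && n.endsWith(".crc")) {
                // strip the leading "." and trailing ".crc" to get the base name
                String base = n.substring(1, n.length() - 4);
                if (!present.contains(base)) {
                    orphans.add(n);
                }
            }
        }
        return orphans;
    }
}
```

On a listing like `["1.delta", ".1.delta.crc", ".2.delta.crc"]` this flags `.2.delta.crc` as leaked; at the scale reported above (418k `.crc` files out of 431k total), almost every entry would be an orphan.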
[jira] [Comment Edited] (SPARK-28025) HDFSBackedStateStoreProvider should not leak .crc files
[ https://issues.apache.org/jira/browse/SPARK-28025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919577#comment-16919577 ] Stavros Kontopoulos edited comment on SPARK-28025 at 8/30/19 2:15 PM: -- [~kabhwan] cool, I'll have a look. was (Author: skonto): [~kabhwan] which PR? > HDFSBackedStateStoreProvider should not leak .crc files > > > Key: SPARK-28025 > URL: https://issues.apache.org/jira/browse/SPARK-28025 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 > Environment: Spark 2.4.3 > Kubernetes 1.11(?) (OpenShift) > StateStore storage on a mounted PVC. Viewed as a local filesystem by the > `FileContextBasedCheckpointFileManager` : > {noformat} > scala> glusterfm.isLocal > res17: Boolean = true{noformat} >Reporter: Gerard Maas >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > The HDFSBackedStateStoreProvider when using the default CheckpointFileManager > is leaving '.crc' files behind. There's a .crc file created for each > `atomicFile` operation of the CheckpointFileManager. > Over time, the number of files becomes very large. It makes the state store > file system constantly increase in size and, in our case, deteriorates the > file system performance. > Here's a sample of one of our spark storage volumes after 2 days of execution > (4 stateful streaming jobs, each on a different sub-dir): > # > {noformat} > Total files in PVC (used for checkpoints and state store) > $find . | wc -l > 431796 > # .crc files > $find . -name "*.crc" | wc -l > 418053{noformat} > With each .crc file taking one storage block, the used storage runs into the > GBs of data. > These jobs are running on Kubernetes. 
Our shared storage provider, GlusterFS, > shows serious performance deterioration with this large number of files: > {noformat} > DEBUG HDFSBackedStateStoreProvider: fetchFiles() took 29164ms{noformat} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28025) HDFSBackedStateStoreProvider should not leak .crc files
[ https://issues.apache.org/jira/browse/SPARK-28025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919471#comment-16919471 ] Stavros Kontopoulos edited comment on SPARK-28025 at 8/30/19 11:54 AM: --- @[~dongjoon] [~zsxwing] this needs to be re-opened. When using the workaround we recently hit this issue: [https://github.com/broadinstitute/gatk/issues/1389] which can be fixed easily with a derived class like in this PR: [https://github.com/broadinstitute/gatk/pull/1421/files] but this is a bit inconvenient. However, I believe as well that this should be fixed in Spark (fewer surprises) otherwise we need to document it as [~kabhwan] said above. was (Author: skonto): @[~dongjoon] [~zsxwing] this needs to be re-opened. When using the workaround we recently hit this issue: [https://github.com/broadinstitute/gatk/issues/1389] which can be fixed easily with a derived class like in this PR: [https://github.com/broadinstitute/gatk/pull/1421/files] However, I believe as well that this should be fixed in Spark (fewer surprises) otherwise we need to document it as [~kabhwan] said above. > HDFSBackedStateStoreProvider should not leak .crc files > > > Key: SPARK-28025 > URL: https://issues.apache.org/jira/browse/SPARK-28025 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 > Environment: Spark 2.4.3 > Kubernetes 1.11(?) (OpenShift) > StateStore storage on a mounted PVC. Viewed as a local filesystem by the > `FileContextBasedCheckpointFileManager` : > {noformat} > scala> glusterfm.isLocal > res17: Boolean = true{noformat} >Reporter: Gerard Maas >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > The HDFSBackedStateStoreProvider when using the default CheckpointFileManager > is leaving '.crc' files behind. There's a .crc file created for each > `atomicFile` operation of the CheckpointFileManager. > Over time, the number of files becomes very large. 
It makes the state store > file system constantly increase in size and, in our case, deteriorates the > file system performance. > Here's a sample of one of our spark storage volumes after 2 days of execution > (4 stateful streaming jobs, each on a different sub-dir): > # > {noformat} > Total files in PVC (used for checkpoints and state store) > $find . | wc -l > 431796 > # .crc files > $find . -name "*.crc" | wc -l > 418053{noformat} > With each .crc file taking one storage block, the used storage runs into the > GBs of data. > These jobs are running on Kubernetes. Our shared storage provider, GlusterFS, > shows serious performance deterioration with this large number of files: > {noformat} > DEBUG HDFSBackedStateStoreProvider: fetchFiles() took 29164ms{noformat} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900411#comment-16900411 ] Stavros Kontopoulos edited comment on SPARK-27936 at 8/5/19 9:06 PM: - [~eje] I have started working on this; maybe we need another ticket for R as well. I will be off from 7-26, so it will take some time; if someone else wants to do it, let me know, otherwise I will do a PR when I get back. was (Author: skonto): [~eje] I have started working on this; maybe we need another ticket for R as well. I will be off from 7-26, so it will take some time. > Support local dependency uploading from --py-files > -- > > Key: SPARK-27936 > URL: https://issues.apache.org/jira/browse/SPARK-27936 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Erik Erlandson >Priority: Major > > Support python dependency uploads, as in SPARK-23153 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900411#comment-16900411 ] Stavros Kontopoulos commented on SPARK-27936: - [~eje] I have started working on this, maybe we need another ticket for R as well, but will be off from 7-26 so will take some time. > Support local dependency uploading from --py-files > -- > > Key: SPARK-27936 > URL: https://issues.apache.org/jira/browse/SPARK-27936 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Erik Erlandson >Priority: Major > > Support python dependency uploads, as in SPARK-23153 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28465) K8s integration tests fail due to missing ceph-nano image
[ https://issues.apache.org/jira/browse/SPARK-28465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-28465: Summary: K8s integration tests fail due to missing ceph-nano image (was: K8s integration tests fail due to non existent image) > K8s integration tests fail due to missing ceph-nano image > - > > Key: SPARK-28465 > URL: https://issues.apache.org/jira/browse/SPARK-28465 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Image added here: > [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66] > needs to be updated to the latest as it was removed from dockerhub. > {quote}docker pull ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 > Error response from daemon: manifest for > ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 not found > {quote} > Also we need to apply this fix: > [https://github.com/ceph/cn/issues/115#issuecomment-497384369] > I will create a PR shortly. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28465) K8s integration tests fail due to non existent image
[ https://issues.apache.org/jira/browse/SPARK-28465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-28465: Description: Image added here: [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66] needs to be updated to the latest as it was removed from dockerhub. {quote}docker pull ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 Error response from daemon: manifest for ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 not found {quote} Also we need to apply this fix: [https://github.com/ceph/cn/issues/115#issuecomment-497384369] I will create a PR shortly. was: Image added here: [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66] needs to be updated to the latest as it was removed from dockerhub. {quote}docker pull ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 Error response from daemon: manifest for ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 not found {quote} Also we need to apply this fix: [https://github.com/ceph/cn/issues/115#issuecomment-497384369] > K8s integration tests fail due to non existent image > > > Key: SPARK-28465 > URL: https://issues.apache.org/jira/browse/SPARK-28465 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Image added here: > [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66] > needs to be updated to the latest as it was removed from dockerhub. 
> {quote}docker pull ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 > Error response from daemon: manifest for > ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 not found > {quote} > Also we need to apply this fix: > [https://github.com/ceph/cn/issues/115#issuecomment-497384369] > I will create a PR shortly. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28465) K8s integration tests fail due to non existent image
Stavros Kontopoulos created SPARK-28465: --- Summary: K8s integration tests fail due to non existent image Key: SPARK-28465 URL: https://issues.apache.org/jira/browse/SPARK-28465 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.0.0 Reporter: Stavros Kontopoulos Image added here: [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66] needs to be updated to the latest as it was removed from dockerhub. Also we need to apply this fix: https://github.com/ceph/cn/issues/115#issuecomment-497384369 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888970#comment-16888970 ] Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 3:26 PM: -- Right now on master we have 4.1.2 [https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32]. Afaik this is the same version for 2.4.3. Something else is not right. was (Author: skonto): Right now on master we have 4.1.2 [https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32]. Afaik this is the same version for 2.4.3. > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|[https://github.com/fabric8io/kubernetes-client#compatibility-matrix]]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888970#comment-16888970 ] Stavros Kontopoulos commented on SPARK-28444: - Right now on master we have [4.1.2 |[https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32]] > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|[https://github.com/fabric8io/kubernetes-client#compatibility-matrix]]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1671#comment-1671 ] Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 1:46 PM: -- I am not sure this is a k8s client version issue; it seems more like a credentials issue. But let's find out. Have you tried to update the k8s client? Can you verify you can/can't create pods with a simple app (outside Spark) using the fabric8io k8s client in different versions? Does it work with minikube 1.14? was (Author: skonto): I am not sure this is a k8s client version issue; it is more like a credentials issue. Have you tried to update the k8s client? Can you verify you can/can't create pods with a simple app (outside Spark) using the fabric8io k8s client in different versions? Does it work with minikube 1.14? > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1671#comment-1671 ] Stavros Kontopoulos commented on SPARK-28444: - Am I not sure this is a k8s client version issue, it is more like a credentials issue. Have you tried to update the k8s client? > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|[https://github.com/fabric8io/kubernetes-client#compatibility-matrix]]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1639#comment-1639 ] Stavros Kontopoulos commented on SPARK-28444: - Probably you are hitting this one: https://issues.apache.org/jira/browse/SPARK-26833 > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|[https://github.com/fabric8io/kubernetes-client#compatibility-matrix]]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used
[ https://issues.apache.org/jira/browse/SPARK-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-28445: Summary: Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used (was: Inconsistency between Scala and Python/Panda udfs when groupby udef() is used) > Inconsistency between Scala and Python/Panda udfs when groupby with udf() is > used > - > > Key: SPARK-28445 > URL: https://issues.apache.org/jira/browse/SPARK-28445 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Python: > from pyspark.sql.functions import pandas_udf, PandasUDFType > @pandas_udf("int", PandasUDFType.SCALAR) > def noop(x): > return x > spark.udf.register("udf", noop) > sql(""" > CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES > (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, > null) > AS testData(a, b)""") > sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + > 1)""").show() > : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is > neither present in the group by, nor is it an aggregate function. 
Add to > group by or wrap in first() (or first_value) if you don't care which value > you get.;; > Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, > udf(count(b#1)) AS udf(count(b))#12] > +- SubqueryAlias `testdata` > +- Project [a#0, b#1] > +- SubqueryAlias `testData` > +- LocalRelation [a#0, b#1] > Scala: > spark.udf.register("udf", (input: Int) => input) > sql(""" > CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES > (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, > null) > AS testData(a, b)""") > sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + > 1)""").show()
> +------------+-------------+
> |udf((a + 1))|udf(count(b))|
> +------------+-------------+
> |        null|            1|
> |           3|            2|
> |           4|            2|
> |           2|            2|
> +------------+-------------+ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby udef() is used
[ https://issues.apache.org/jira/browse/SPARK-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-28445: Component/s: PySpark > Inconsistency between Scala and Python/Panda udfs when groupby udef() is used > - > > Key: SPARK-28445 > URL: https://issues.apache.org/jira/browse/SPARK-28445 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Python: > from pyspark.sql.functions import pandas_udf, PandasUDFType > @pandas_udf("int", PandasUDFType.SCALAR) > def noop(x): > return x > spark.udf.register("udf", noop) > sql(""" > CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES > (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, > null) > AS testData(a, b)""") > sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + > 1)""").show() > : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is > neither present in the group by, nor is it an aggregate function. 
Add to > group by or wrap in first() (or first_value) if you don't care which value > you get.;; > Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, > udf(count(b#1)) AS udf(count(b))#12] > +- SubqueryAlias `testdata` > +- Project [a#0, b#1] > +- SubqueryAlias `testData` > +- LocalRelation [a#0, b#1] > Scala: > spark.udf.register("udf", (input: Int) => input) > sql(""" > CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES > (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, > null) > AS testData(a, b)""") > sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + > 1)""").show()
> +------------+-------------+
> |udf((a + 1))|udf(count(b))|
> +------------+-------------+
> |        null|            1|
> |           3|            2|
> |           4|            2|
> |           2|            2|
> +------------+-------------+ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby udef() is used
Stavros Kontopoulos created SPARK-28445: --- Summary: Inconsistency between Scala and Python/Panda udfs when groupby udef() is used Key: SPARK-28445 URL: https://issues.apache.org/jira/browse/SPARK-28445 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Stavros Kontopoulos Python: from pyspark.sql.functions import pandas_udf, PandasUDFType @pandas_udf("int", PandasUDFType.SCALAR) def noop(x): return x spark.udf.register("udf", noop) sql(""" CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) AS testData(a, b)""") sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1)""").show() : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;; Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, udf(count(b#1)) AS udf(count(b))#12] +- SubqueryAlias `testdata` +- Project [a#0, b#1] +- SubqueryAlias `testData` +- LocalRelation [a#0, b#1] Scala: spark.udf.register("udf", (input: Int) => input) sql(""" CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) AS testData(a, b)""") sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1)""").show()
+------------+-------------+
|udf((a + 1))|udf(count(b))|
+------------+-------------+
|        null|            1|
|           3|            2|
|           4|            2|
|           2|            2|
+------------+-------------+ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
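The Scala output above can be sanity-checked without Spark at all. The following plain-Python sketch (illustrative only; `group_by_a_plus_1` is a made-up helper, not a Spark API) reproduces the semantics of `SELECT a + 1, COUNT(b) ... GROUP BY a + 1` over the ticket's test data, including NULL propagating through `a + 1` and `COUNT(b)` skipping NULLs:

```python
from collections import defaultdict

# Sample data from the ticket: (a, b) pairs, with None standing in for SQL NULL.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
        (None, 1), (3, None), (None, None)]

def group_by_a_plus_1(rows):
    """Group rows by the value of (a + 1) and count non-null b per group,
    mirroring SELECT a + 1, COUNT(b) FROM testData GROUP BY a + 1."""
    counts = defaultdict(int)
    for a, b in rows:
        key = None if a is None else a + 1  # NULL propagates through a + 1
        if b is not None:                   # COUNT(b) ignores NULL values of b
            counts[key] += 1
        else:
            counts.setdefault(key, 0)       # group still exists, count stays 0+
    return dict(counts)

# Matches the Scala result table: null -> 1, 2 -> 2, 3 -> 2, 4 -> 2.
print(group_by_a_plus_1(rows))
```

The Python/Pandas path fails analysis instead of producing this result, which is the inconsistency the ticket reports: the analyzer does not recognize `udf(a + 1)` in the SELECT list as matching the GROUP BY expression.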
[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1608#comment-1608 ] Stavros Kontopoulos commented on SPARK-28444: - Hi [~patrick-winter-swisscard]. On our CI we are using v1.15 and tests pass; could you add some log output showing why pods are not created? We need to be compliant with the compatibility matrix, but we still don't have a good answer to the problem of catching up with k8s; it moves fast. > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
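The compatibility-matrix constraint discussed in this thread is, at its core, a lookup from client version to the set of server versions it supports. A minimal sketch of that check — the version entries below are placeholders for illustration, not the real fabric8io matrix, which should be read from the project's README:

```python
# Hypothetical excerpt of a client -> supported-k8s-minor-versions table.
# These sets are ASSUMED for the example; consult the fabric8io
# kubernetes-client README for the actual support windows.
SUPPORTED = {
    "4.1.2": {"1.9", "1.10", "1.11", "1.12", "1.13"},
    "4.3.0": {"1.12", "1.13", "1.14", "1.15"},
}

def client_supports(client_version: str, k8s_version: str) -> bool:
    """True if the given client version lists the k8s minor version as supported."""
    return k8s_version in SUPPORTED.get(client_version, set())

# Under these assumed entries, 4.1.2 would not cover k8s 1.14, while 4.3.0 would.
print(client_supports("4.1.2", "1.14"))
print(client_supports("4.3.0", "1.14"))
```

Note that "not listed in the matrix" does not always mean "broken in practice" — as the comment above points out, the CI runs against v1.15 and the tests pass — which is why log output is needed before concluding the client version is the cause.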
[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1608#comment-1608 ] Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 11:44 AM: --- Hi [~patrick-winter-swisscard]. On our CI we are using v1.15 and tests pass; could you add some log output showing why pods are not created? We need to be compliant with the compatibility matrix, but we still don't have a good answer to the problem of catching up with k8s; it moves fast. was (Author: skonto): Hi [~patrick-winter-swisscard]. On our ci we are using v1.15 and tests pass, could you add some log output showing why pods are not created. We need to be compliant with the compatibility matrix but still we dotn have a good answer to the problem of catching up with k8s, it moves fast. > Bump Kubernetes Client Version to 4.3.0 > --- > > Key: SPARK-28444 > URL: https://issues.apache.org/jira/browse/SPARK-28444 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 3.0.0, 2.4.3 >Reporter: Patrick Winter >Priority: Major > > Spark is currently using the Kubernetes client version 4.1.2. This client > does not support the current Kubernetes version 1.14, as can be seen on the > [compatibility > matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix]. > Therefore the Kubernetes client should be bumped up to version 4.3.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885234#comment-16885234 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/15/19 1:43 PM: -- np :). There will be a 2.4.4 release, so there is a chance to fix it there. As for the policy of maintenance releases, you are probably right; I am not sure what falls into that category though. On one hand you are targeting K8s releases that are way ahead, and on the other you use an old client that does not support them (check fabric8io's compatibility matrix). We had a long discussion about which k8s versions to support; it is a project with high velocity that does not match Spark release planning. So, for good or bad, there is a bug introduced and we need a fix; I am not sure if there is a workaround like the one with the ping interval. For the user, the temporary workaround right now is to stop their session. The jackson-core dependency is another important upgrade, but also a hard one. Actually, a customer asked about this because it didn't pass security checks. That means 2.4.x is not acceptable for some people. Personally I was not aware of the daemon thread issue. I hope Spark 3.0.0 will solve these two issues once and for all. was (Author: skonto): np :). There will be a 2.4.4 release, so there is a chance to fix it there. As for the policy of maintenance releases, you are probably right; I am not sure what falls into that category though. On one hand you are targeting K8s releases that are way ahead, and on the other you use an old client that does not support them (check fabric8io's compatibility matrix). We had a long discussion about which k8s versions to support; it is a project with high velocity that does not match Spark release planning. So, for good or bad, there is a bug introduced and we need a fix; I am not sure if there is a workaround like the one with the ping interval. The jackson-core dependency is another important upgrade, but also a hard one. Actually, a customer asked about this because it didn't pass security checks. That means 2.4.x is not acceptable for some people. For the user, the temporary workaround right now is to stop their session. Personally I was not aware of the daemon thread issue. I hope Spark 3.0.0 will solve these two issues once and for all. > driver pod hangs with pyspark 2.4.3 and master on kubernetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pod hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executor are just hanging. We > see the output of this python script.
> {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
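The shutdown-hook hang described in this ticket comes down to thread lifecycle: the JVM only runs shutdown hooks after every non-daemon thread has exited, so a single lingering non-daemon thread (such as a client's keep-alive thread) can keep the driver pod alive indefinitely. The same mechanism can be illustrated in plain Python's `threading` module — an analogy to the JVM behavior, not Spark code:

```python
import threading

def worker(stop_event: threading.Event) -> None:
    # Blocks until told to stop -- stands in for a client ping/keep-alive loop.
    stop_event.wait()

stop = threading.Event()
# daemon=False: like a JVM non-daemon thread, this keeps the process alive
# (and delays exit-time cleanup) until it finishes on its own.
blocker = threading.Thread(target=worker, args=(stop,), daemon=False)
blocker.start()

# The fix mirrors the workaround in the thread above (stopping the session):
# explicitly release the blocking thread so shutdown can proceed.
stop.set()
blocker.join(timeout=5)
```

Marking such housekeeping threads as daemon threads (or shutting the client down explicitly) lets the process exit and the shutdown hooks run.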
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885234#comment-16885234 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/15/19 1:42 PM:
--

np :). There will be a 2.4.4 release, so there is a chance to fix it there. As for the policy of maintenance releases, you are probably right; I am not sure what falls into that category, though. On one hand you are targeting K8s releases that are way ahead, and on the other you use an old client that does not support them (check fabric8io's compatibility matrix). We had a long discussion about which K8s versions to support; it is a project with a high velocity that does not match Spark's release planning. So, for good or bad, there is a bug introduced and we need a fix; I am not sure if there is a workaround like the one with the ping interval. The jackson-core issue is another important one, but also hard to upgrade. Actually, a customer asked for this because it didn't pass security checks. That means 2.4.x is not acceptable for some people. For the user, the temporary workaround right now is to stop his session. Personally, I was not aware of the daemon thread issue. I hope Spark 3.0.0 will solve these two issues once and for all.

was (Author: skonto):
np :). There will be a 2.4.4 release, so there is a chance to fix it there. As for the policy of maintenance releases, you are probably right; I am not sure what falls into that category, though. On one hand you are targeting K8s releases that are way ahead, and on the other you use an old client that does not support them (check fabric8io's compatibility matrix). We had a long discussion about which K8s versions to support; it is a project with a high velocity that does not match Spark's release planning. So, for good or bad, there is a bug introduced and we need a fix; I am not sure if there is a workaround like the one with the ping interval. The jackson-core issue is another important one, but also hard to upgrade. Actually, a customer asked for this because it didn't pass security checks. That means 2.4.x is not acceptable for some people. Personally, I was not aware of the daemon thread issue. I hope Spark 3.0.0 will solve these two issues once and for all.

> driver pod hangs with pyspark 2.4.3 and master on kubenetes
> -----------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 3.0.0, 2.4.3
>         Environment: k8s 1.11.9
>                      spark 2.4.3 and master branch.
>            Reporter: Edwin Biemond
>            Priority: Major
>         Attachments: driver_threads.log, executor_threads.log
>
> When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pod hangs
> and never calls the shutdown hook.
> {code:java}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on kubernetes the driver and executor are just hanging. We
> see the output of this python script.
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world>
> parallelism=2 python version=3.6{noformat}
> What works:
> * a simple python with a print works fine on 2.4.3 and 3.0.0
> * same setup on 2.4.0
> * 2.4.3 spark-submit with the above pyspark

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
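The daemon-thread issue mentioned above can be illustrated outside of Spark. The sketch below is hypothetical demo code (plain Python, not Spark internals): it shows that a process whose main function has returned still cannot exit while a non-daemon background thread is alive, which is analogous to how the k8s client's non-daemon OkHttp threads keep the driver JVM hanging.

```python
# Hypothetical demo (not Spark code): a non-daemon background thread keeps a
# process alive after the main function returns, analogous to the OkHttp
# threads of the k8s client keeping the Spark driver JVM from exiting.
import subprocess
import sys
import textwrap

CHILD = textwrap.dedent("""
    import threading, time
    # Background thread that outlives the main function by far.
    t = threading.Thread(target=time.sleep, args=(60,), daemon={daemon})
    t.start()
    print("main done")
""")

def exits_promptly(daemon: bool, timeout: float = 5.0) -> bool:
    """Run the child script; True if the process exits within `timeout` seconds."""
    try:
        subprocess.run([sys.executable, "-c", CHILD.format(daemon=daemon)],
                       timeout=timeout, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False  # the non-daemon thread kept the process alive

if __name__ == "__main__":
    print(exits_promptly(daemon=True))   # daemon thread: process exits promptly
    print(exits_promptly(daemon=False))  # non-daemon thread: process hangs
```

With daemon=True the child exits as soon as "main done" is printed; with daemon=False the interpreter waits on the sleeping thread, mirroring the hanging driver pod.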
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884791#comment-16884791 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/15/19 12:41 AM:
---

I was able to reproduce it easily with 2.4.3 and this similar code:

from __future__ import print_function
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

I also commented out this part in EventLoop to make it stay in a runnable state (may not be required):

// val event = eventQueue.take()
// try {
//   onReceive(event)
// } catch {
//   case NonFatal(e) =>
//     try {
//       onError(e)
//     } catch {
//       case NonFatal(e) => logError("Unexpected error in " + name, e)
//     }
// }

I used this tool [https://github.com/jglick/jkillthread] to kill the event loop successfully and then tried to kill the other OkHttp thread:

19/07/15 00:12:06 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Killing "OkHttp https://kubernetes.default.svc/..."
Did not find "OkHttp https://kubernetes.default.svc/..."
Killing "dag-scheduler-event-loop"
Killing "OkHttp WebSocket https://kubernetes.default.svc/..."
Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/..." java.lang.IllegalMonitorStateException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Killing "OkHttp WebSocket https://kubernetes.default.svc/..."
Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/..." java.lang.IllegalMonitorStateException

Unfortunately I can't kill the latter, as another one is created. Anyway, that means that this is just another case of https://issues.apache.org/jira/browse/SPARK-27812. spark.stop() obviously stops the k8s client and everything finishes as expected.

was (Author: skonto):
I was able to reproduce it easily with 2.4.3 and this similar code:

from __future__ import print_function
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

I used this tool [https://github.com/jglick/jkillthread] to kill the event loop successfully and then tried to kill the other OkHttp thread:

19/07/15 00:12:06 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Killing "OkHttp https://kubernetes.default.svc/..."
Did not find "OkHttp https://kubernetes.default.svc/..."
Killing "dag-scheduler-event-loop"
Killing "OkHttp WebSocket https://kubernetes.default.svc/..."
Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/..." java.lang.IllegalMonitorStateException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Killing "OkHttp WebSocket https://kubernetes.default.svc/..."
Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/..." java.lang.IllegalMonitorStateException

Unfortunately I can't kill the latter, as another one is created. Anyway, that means that this is just another case of https://issues.apache.org/jira/browse/SPARK-27812. spark.stop() obviously stops the k8s client and everything finishes as expected.

> driver pod hangs with pyspark 2.4.3 and master on kubenetes
> -----------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 3.0.0,
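Since spark.stop() is what finally lets everything shut down, a defensive pattern for users hitting this is to guarantee the stop call even when the job fails. The sketch below is a hedged illustration only: `FakeSession` and `session_scope` are hypothetical names, with `FakeSession` standing in for pyspark's SparkSession; with real pyspark one would pass a factory such as `SparkSession.builder.getOrCreate`.

```python
# Hedged sketch of the spark.stop() workaround: ensure the session (and thus
# the k8s client's OkHttp threads) is always shut down, even on error.
# `FakeSession` is a hypothetical stand-in for pyspark's SparkSession.
from contextlib import contextmanager

class FakeSession:
    """Stand-in for SparkSession: records whether stop() was called."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

@contextmanager
def session_scope(factory):
    """Yield a session and guarantee session.stop() runs on exit."""
    session = factory()
    try:
        yield session
    finally:
        # Without this, non-daemon client threads keep the driver alive.
        session.stop()

if __name__ == "__main__":
    with session_scope(FakeSession) as spark:
        pass  # job logic goes here
```

The finally block runs on both the normal and the error path, so the client's threads cannot keep the driver pod hanging after the job body completes or raises.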
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884791#comment-16884791 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/15/19 12:38 AM: --- I was able to reproduce it easily with 2.4.3 and this similar code: from __future__ import print_function import sys from random import random from operator import add from pyspark.sql import SparkSession if __name__ == "__main__":( """ Usage: pi [partitions] """ spark = SparkSession\ .builder\ .appName("PythonPi")\ .getOrCreate() I used this tool [https://github.com/jglick/jkillthread] to kill eventloop and then the other okhttp thread: 19/07/15 00:12:06 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint Killing "OkHttp [https://kubernetes.default.svc/]...; Did not find "OkHttp [https://kubernetes.default.svc/]...; Killing "dag-scheduler-event-loop" Killing "OkHttp WebSocket [https://kubernetes.default.svc/]...; Exception in thread "OkHttp WebSocket [https://kubernetes.default.svc/]...; java.lang.IllegalMonitorStateException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Killing "OkHttp WebSocket [https://kubernetes.default.svc/]...; Exception in thread "OkHttp WebSocket [https://kubernetes.default.svc/]...; java.lang.IllegalMonitorStateException Unfortunately I cant the kill the latter as another one is created. 
Anyway, that means this is just another case of https://issues.apache.org/jira/browse/SPARK-27812 . spark.stop() obviously stops the k8s client and everything finishes as expected. was (Author: skonto): I was able to reproduce it easily with 2.4.3 and this similar code: from __future__ import print_function import sys from random import random from operator import add from pyspark.sql import SparkSession if __name__ == "__main__": """ Usage: pi [partitions] """ spark = SparkSession\ .builder\ .appName("PythonPi")\ .getOrCreate() I used this tool [https://github.com/jglick/jkillthread] to kill the event-loop thread and then the other OkHttp thread: 19/07/15 00:12:06 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint Killing "OkHttp [https://kubernetes.default.svc/]...; Did not find "OkHttp [https://kubernetes.default.svc/]...; Killing "dag-scheduler-event-loop" Killing "OkHttp WebSocket [https://kubernetes.default.svc/]...; Exception in thread "OkHttp WebSocket [https://kubernetes.default.svc/]...; java.lang.IllegalMonitorStateException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Killing "OkHttp WebSocket [https://kubernetes.default.svc/]...; Exception in thread "OkHttp WebSocket [https://kubernetes.default.svc/]...; java.lang.IllegalMonitorStateException Unfortunately I can't kill the latter, as another one is created. 
Anyway that means that this is just another case of https://issues.apache.org/jira/browse/SPARK-27812 . spark.stop() obviously stops the k8s client and everything finishes as expected. > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python >
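The root cause sketched in the comment above — a lingering non-daemon thread (SPARK-27812) keeps the JVM alive after main returns, while calling spark.stop() releases it — has a direct analogue in plain Python. The sketch below is purely illustrative (not Spark code; the worker threads stand in for Spark's OkHttp/event-loop threads):

```python
import subprocess
import sys
import textwrap

# Child whose main function returns while a NON-daemon thread is still
# running: the interpreter waits for that thread, so the process hangs --
# the analogue of the stuck driver pod.
HANG = textwrap.dedent("""
    import threading, time
    threading.Thread(target=lambda: time.sleep(60)).start()
    print("main done")
""")

# Same program with an explicit shutdown of the worker before main
# returns -- the analogue of calling spark.stop() at the end of the job.
CLEAN = textwrap.dedent("""
    import threading
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)
    worker.start()
    print("main done")
    stop.set()    # release the worker, as spark.stop() releases Spark's threads
    worker.join()
""")

def exits_within(source, seconds):
    """Run `source` in a fresh interpreter; True if it exits in time."""
    try:
        subprocess.run([sys.executable, "-c", source],
                       stdout=subprocess.DEVNULL, timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        return False  # subprocess.run kills the child on timeout

hang_exits = exits_within(HANG, 3)    # lingering non-daemon thread blocks exit
clean_exits = exits_within(CLEAN, 3)  # explicit stop lets the process finish
print(hang_exits, clean_exits)
```

The same rule explains the driver behaviour: the process only terminates once every non-daemon thread has finished, which is exactly what spark.stop() arranges.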
[jira] [Commented] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884791#comment-16884791 ] Stavros Kontopoulos commented on SPARK-27927: - I was able to reproduce it easily with 2.4.3. I used this tool [https://github.com/jglick/jkillthread] to kill the event-loop thread and then the other OkHttp thread: 19/07/15 00:12:06 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint Killing "OkHttp https://kubernetes.default.svc/...; Did not find "OkHttp https://kubernetes.default.svc/...; Killing "dag-scheduler-event-loop" Killing "OkHttp WebSocket https://kubernetes.default.svc/...; Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/...; java.lang.IllegalMonitorStateException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Killing "OkHttp WebSocket https://kubernetes.default.svc/...; Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/...; java.lang.IllegalMonitorStateException Unfortunately I can't kill the latter, as another one is created. 
Anyway that means that this is just another case of https://issues.apache.org/jira/browse/SPARK-27812 > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
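The java.lang.IllegalMonitorStateException in the trace above comes from ConditionObject.signal() being invoked without its monitor held, after jkillthread forcibly broke the worker out of DelayedWorkQueue.take(). As an illustrative analogue (CPython, not JVM code), threading.Condition enforces the same rule and raises RuntimeError:

```python
import threading

cond = threading.Condition()

def notify_without_lock(c):
    """Signal a condition variable without holding its lock.

    Python raises RuntimeError here; the JVM analogue is the
    IllegalMonitorStateException thrown from
    AbstractQueuedSynchronizer$ConditionObject.signal() in the trace above.
    """
    try:
        c.notify()
        return None
    except RuntimeError as exc:
        return type(exc).__name__

result = notify_without_lock(cond)
print(result)  # RuntimeError

# Holding the lock first is the legal pattern:
with cond:
    cond.notify()  # no exception
```

In other words, the exception is a symptom of the thread being killed in an inconsistent lock state, not the underlying hang itself.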
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884675#comment-16884675 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/14/19 1:50 PM: -- That call, among others, creates a SparkContext if it does not exist. The SparkContext will start the DAG scheduler thread, which starts this EventLoop thread. We have the following facts: a) a non-daemon thread is running due to https://issues.apache.org/jira/browse/SPARK-27812 b) a daemon thread is blocked, which could cause issues ([https://meteatamel.wordpress.com/2012/05/22/when-a-daemon-thread-is-not-so-daemon/]) c) no shutdown hook was run although main has exited, as the JVM cannot exit. I would start by commenting out [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47], rebuilding, and re-running the job. Since this is a dummy job with no actions it does not matter. If the JVM still does not exit then the only explanation is that https://issues.apache.org/jira/browse/SPARK-27812 stops us from that. If it exits then it could mean that for some reason in 2.4.0 EventLoop will not have the time to block as things move faster (we can show that by adding logging). was (Author: skonto): That call, among others, creates a SparkContext if it does not exist. The SparkContext will start the DAG scheduler thread, which starts this EventLoop thread. We have the following facts: a) a non-daemon thread is running due to https://issues.apache.org/jira/browse/SPARK-27812 b) a daemon thread is blocked, which could cause issues ([https://meteatamel.wordpress.com/2012/05/22/when-a-daemon-thread-is-not-so-daemon/]) c) no shutdown hook was run although main has exited, as the JVM cannot exit. I would start by commenting out [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47], rebuilding, and re-running the job. Since this is a dummy job with no actions it does not matter. 
If jvm still does not exit then the only explanation is that https://issues.apache.org/jira/browse/SPARK-27812 stops us from that. If it exits then it could mean that for some reason in 2.4.0 EventLoop will not have the time to block as things move faster. > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
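The debugging step proposed above boils down to finding out which non-daemon threads are still alive when main exits — what the attached driver_threads.log thread dump shows for the JVM. A minimal Python analogue of that scan (the thread name below is made up for illustration):

```python
import threading

def lingering_non_daemon_threads():
    """Names of live non-daemon threads other than the main thread.

    In the JVM case, these are the threads a thread dump would show
    keeping the process alive and preventing shutdown hooks from running.
    """
    main = threading.main_thread()
    return [t.name for t in threading.enumerate()
            if t.is_alive() and not t.daemon and t is not main]

# Start a non-daemon worker (named after the stuck Spark thread purely
# for illustration) and it shows up in the scan:
stop = threading.Event()
worker = threading.Thread(name="dag-scheduler-event-loop", target=stop.wait)
worker.start()
found = lingering_non_daemon_threads()
print(found)  # includes 'dag-scheduler-event-loop'

# Once it is released and joined, the worker drops out of the scan:
stop.set()
worker.join()
print(lingering_non_daemon_threads())
```

Running such a scan just before main returns would immediately distinguish the EventLoop hypothesis from the SPARK-27812 OkHttp thread.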
[jira] [Commented] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884675#comment-16884675 ] Stavros Kontopoulos commented on SPARK-27927: - That call, among others, creates a SparkContext if one does not exist. The SparkContext starts the DAG scheduler thread, which starts this EventLoop thread. We have these facts: a) a non-daemon thread is running due to https://issues.apache.org/jira/browse/SPARK-27812, b) a daemon thread is blocked, which could cause issues ([https://meteatamel.wordpress.com/2012/05/22/when-a-daemon-thread-is-not-so-daemon/]), c) no shutdown hook was run although main has exited, because the JVM cannot exit. I would start by commenting out [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47], rebuilding, and re-running the job. Since this is a dummy job with no actions it does not matter. If the JVM still does not exit, the only explanation is that https://issues.apache.org/jira/browse/SPARK-27812 prevents it.
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884675#comment-16884675 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/14/19 1:21 PM: -- That call, among others, creates a SparkContext if one does not exist. The SparkContext starts the DAG scheduler thread, which starts this EventLoop thread. We have the following facts: a) a non-daemon thread is running due to https://issues.apache.org/jira/browse/SPARK-27812, b) a daemon thread is blocked, which could cause issues ([https://meteatamel.wordpress.com/2012/05/22/when-a-daemon-thread-is-not-so-daemon/]), c) no shutdown hook was run although main has exited, because the JVM cannot exit. I would start by commenting out [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47], rebuilding, and re-running the job. Since this is a dummy job with no actions it does not matter. If the JVM still does not exit, the only explanation is that https://issues.apache.org/jira/browse/SPARK-27812 prevents it. was (Author: skonto): That call, among others, creates a SparkContext if one does not exist. The SparkContext starts the DAG scheduler thread, which starts this EventLoop thread. We have these facts: a) a non-daemon thread is running due to https://issues.apache.org/jira/browse/SPARK-27812, b) a daemon thread is blocked, which could cause issues ([https://meteatamel.wordpress.com/2012/05/22/when-a-daemon-thread-is-not-so-daemon/]), c) no shutdown hook was run although main has exited, because the JVM cannot exit. I would start by commenting out [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47], rebuilding, and re-running the job. Since this is a dummy job with no actions it does not matter. If the JVM still does not exit, the only explanation is that https://issues.apache.org/jira/browse/SPARK-27812 prevents it.
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/14/19 1:23 PM: -- Yes, this needs debugging (building Spark with extra log statements is one way to do it), but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. Another question is why there is no PythonRunner thread; has that exited? was (Author: skonto): Yes, this needs debugging (build Spark with extra log statements), but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. Another question is why there is no PythonRunner thread; has that exited?
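The stop path being discussed (EventLoop.scala#L78: set a stopped flag, interrupt the event thread, then join it) can be sketched in Python. Python threads cannot be interrupted, so a sentinel message stands in for Thread.interrupt here; the structure (flag, wake-up, join) is otherwise the same, and it shows why a missing wake-up would leave the joining thread blocked forever. This is an illustrative analogue of the pattern, not the Spark implementation:

```python
import queue
import threading

_SENTINEL = object()  # stands in for Thread.interrupt in the Scala code

class EventLoopSketch:
    """Loose Python analogue of org.apache.spark.util.EventLoop."""

    def __init__(self):
        self._queue = queue.Queue()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stopped.is_set():
            event = self._queue.get()   # blocks, like eventQueue.take()
            if event is _SENTINEL:      # the "interrupt" that lets us exit
                return
            self.on_receive(event)

    def on_receive(self, event):
        pass  # subclasses would handle events here

    def start(self):
        self._thread.start()

    def stop(self):
        # Mirrors EventLoop.stop(): mark stopped, wake the blocked thread,
        # then join it. Without the wake-up, the thread never leaves its
        # blocking get() and join() blocks forever, which is the hang
        # hypothesised in this comment thread.
        self._stopped.set()
        self._queue.put(_SENTINEL)
        self._thread.join(timeout=5)

loop = EventLoopSketch()
loop.start()
loop.stop()
```

Commenting out the `self._queue.put(_SENTINEL)` line reproduces the failure mode: `stop()` then waits on a thread that never wakes up.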
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 2:20 PM: -- Yes, this needs debugging (build Spark with extra log statements), but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. Another question is why there is no PythonRunner thread; has that exited? was (Author: skonto): Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. Another question is why there is no PythonRunner thread; has that exited?
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 2:13 PM: -- Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. Another question is why there is no PythonRunner thread; has that exited? was (Author: skonto): Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case.
By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017.
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:56 PM: -- Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. was (Author: skonto): Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, since DestroyJavaVM appears as a thread in your dump, the shutdown process has started but is blocked.
By the way, are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017.
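The shutdown-hook observation in these comments has a direct analogue with Python's atexit: exit handlers run only once the interpreter actually begins shutting down, and a lingering non-daemon thread postpones that indefinitely. A small illustrative experiment (not Spark code) that runs a child interpreter and inspects its output:

```python
import subprocess
import sys
import textwrap

# Child script: registers an exit hook, then leaves a blocked daemon thread
# behind. Because the thread is a daemon, shutdown proceeds and the hook runs.
# With daemon=False the child would hang and the hook would never run, which
# is the situation described in this issue.
child = textwrap.dedent("""
    import atexit, queue, threading

    atexit.register(lambda: print("shutdown hook ran"))
    threading.Thread(target=queue.Queue().get, daemon=True).start()
    print("main exited")
""")

out = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True, text=True, timeout=30,
).stdout
print(out)
```

The hook message appears after "main exited": main finishing is not enough, the process must actually be able to shut down.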
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:53 PM: -- Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, since DestroyJavaVM appears as a thread in your dump, the shutdown process has started but is blocked. Are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. was (Author: skonto): Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, since DestroyJavaVM appears as a thread in your dump, the shutdown process has started but is blocked. Are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017.
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:52 PM: -- Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, since DestroyJavaVM appears as a thread in your dump, the shutdown process has started but is blocked. Are you using the same JDK? We need to make sure behavior has not changed, as in https://bugs.openjdk.java.net/browse/JDK-8154017. was (Author: skonto): Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, since DestroyJavaVM appears as a thread in your dump, the shutdown process has started but is blocked. Are you using the same JDK?
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:51 PM: -- Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case. By the way, since DestroyJavaVM appears as a thread in your dump, the shutdown process has started but is blocked. Are you using the same JDK? was (Author: skonto): Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen? (Logging would help, but I suspect it never does.) Stop is called there by the shutdown hook when the SparkContext is stopped. So the diff with the working version comes down to why the shutdown happens in the working case.
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:46 PM: -- Yes, this needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt, the EventLoop thread cannot exit. Does this ever happen (logging would help, but I suspect it never does)? Stop is called there by the shutdown hook when the SparkContext is stopped. So the difference from the working version will be why the shutdown actually happens in the working case. > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark job on spark 2.4.3 or 3.0.0, the driver pod hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes, the driver and executor are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works: > * a simple python script with a print works fine on 2.4.3 and 3.0.0 > * the same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
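The interrupt discussed above is what unblocks the event-loop thread from its blocking queue take; the hang is what you get when it never fires. Here is a minimal Python sketch of that pattern (hypothetical names; the real EventLoop.scala blocks on LinkedBlockingDeque.take() and stop() uses Thread.interrupt(), so a stop sentinel stands in for the interrupt here):

```python
import queue
import threading

class EventLoop:
    """Sketch of Spark's EventLoop pattern (illustrative, not Spark's API).

    The worker blocks on get(), like the Scala thread blocks on take().
    stop() enqueues a sentinel, playing the role of the interrupt call:
    without it, the worker would park on get() forever and join() below
    would hang -- the behaviour described in this issue.
    """
    _STOP = object()  # sentinel standing in for Thread.interrupt()

    def __init__(self):
        self._events = queue.Queue()
        self._seen = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            event = self._events.get()  # blocks, like deque.take()
            if event is self._STOP:
                return                  # only exit path while parked
            self._seen.append(event)

    def start(self):
        self._thread.start()

    def post(self, event):
        self._events.put(event)

    def stop(self):
        self._events.put(self._STOP)    # analogous to the interrupt
        self._thread.join(timeout=5)    # would hang without the sentinel

loop = EventLoop()
loop.start()
loop.post("job-1")
loop.post("job-2")
loop.stop()
print(loop._seen)  # ['job-1', 'job-2']
```

Commenting out the `put(self._STOP)` line reproduces the symptom in miniature: the joining thread waits on a worker that has no way to leave its blocking call.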
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884250#comment-16884250 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 12:37 AM: --- It is on 2.4.0: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47] Not sure if it is the k8s client in this case, because if you check my thread dump [https://gist.github.com/skonto/74181e434a727901d4f3323461c1050b] in [https://github.com/apache/spark/pull/24796] (the report is recent only because I didn't file it earlier; this failing pi job has been there for at least a year, but I didn't have time...), these k8s threads still exist, but they were not the root cause in the case with the exception. In any case we need to spot the root cause, because we don't know how we ended up with different results anyway. So my question is why that thread is blocked there, and we should debug the execution sequence in both cases, e.g. by adding logging. If it were the K8s threads, I would expect to see only those threads blocked, but the event loop is blocked as well; my $0.02.
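The thread dump referenced above is the key diagnostic: it shows each thread's name and where it is parked. The same kind of jstack-style snapshot can be taken from Python's standard library; a small sketch (the thread name is borrowed from the linked gist purely for illustration):

```python
import sys
import threading
import time
import traceback

def dump_threads():
    """Return a jstack-like snapshot: a stack trace per live thread, so a
    parked thread (like dag-scheduler-event-loop in the linked gist) can
    be spotted without attaching a debugger."""
    frames = sys._current_frames()
    out = []
    for t in threading.enumerate():
        out.append('"%s" daemon=%s\n' % (t.name, t.daemon))
        frame = frames.get(t.ident)
        if frame is not None:
            out.extend(traceback.format_stack(frame))
    return "".join(out)

# A worker parked on an Event, mimicking a thread blocked on a queue take.
gate = threading.Event()
worker = threading.Thread(target=gate.wait,
                          name="dag-scheduler-event-loop", daemon=True)
worker.start()
time.sleep(0.1)              # give the worker time to park
snapshot = dump_threads()
gate.set()                   # unblock so the worker can exit
print('"dag-scheduler-event-loop"' in snapshot)  # True
```

In a real investigation you would trigger such a dump from a signal handler (or use `faulthandler.dump_traceback`) at the moment the driver appears hung, then compare the parked frames between the working and failing runs, which is exactly the debugging-by-logging approach suggested in the comment.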
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884121#comment-16884121 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/12/19 7:41 PM: -- I think the issue is here: {quote}"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x7f561ceb1000 nid=0xa6 waiting on condition [0x7f5619ee4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492) at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47) {quote} The code is here: [https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]. That thread is blocked there (on the blocking queue), and although it's a daemon thread it cannot move forward. I don't know exactly why it happens, but it looks similar to [https://github.com/apache/spark/pull/24796] (although there is no exception here). [~zsxwing], thoughts?
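One nuance in the dump above: a daemon thread parked on a blocking call does not by itself keep a process alive; the hang needs something non-daemon, or something joining the parked thread without interrupting it (as the EventLoop stop path does). A stdlib sketch of that distinction, run in a subprocess so the blocked thread cannot outlive the demo:

```python
import subprocess
import sys
import textwrap

# A thread parked forever on Queue.get() -- but because it is a daemon,
# interpreter shutdown does not wait for it, and the process still exits.
# To reproduce the hang in this issue, the parked thread would have to be
# non-daemon, or be join()ed without an interrupt.
script = textwrap.dedent("""
    import queue, threading
    q = queue.Queue()
    t = threading.Thread(target=q.get, daemon=True)  # parks forever
    t.start()
    print("main exiting")
""")
result = subprocess.run([sys.executable, "-c", script],
                        capture_output=True, text=True, timeout=30)
print(result.stdout.strip())  # main exiting
```

Changing `daemon=True` to `daemon=False` in the sketch makes the subprocess hang until the timeout, which mirrors the parked `dag-scheduler-event-loop` thread being waited on at shutdown.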
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884121#comment-16884121 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/12/19 7:40 PM: -- I think the issue is here: {quote}"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x7f561ceb1000 nid=0xa6 waiting on condition [0x7f5619ee4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492) at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47) {quote} Code is here :[https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]. That thread blocked there (blocking queue) and although its a daemon thread it cannot move forward. Why it happens I dont know exactly but looks similar to [https://github.com/apache/spark/pull/24796], [~zsxwing] thoughts? 
was (Author: skonto):
I think the issue is here:
{quote}
"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x7f561ceb1000 nid=0xa6 waiting on condition [0x7f5619ee4000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
	at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47)
{quote}
Code is [here|https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]. That thread is blocked there (on the blocking queue) and, although it's a daemon thread, it cannot move forward. I don't know exactly why it happens, but it looks similar to [https://github.com/apache/spark/pull/24796]. [~zsxwing], thoughts?

> driver pod hangs with pyspark 2.4.3 and master on kubenetes
> ---
>
> Key: SPARK-27927
> URL: https://issues.apache.org/jira/browse/SPARK-27927
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark
> Affects Versions: 3.0.0, 2.4.3
> Environment: k8s 1.11.9
> spark 2.4.3 and master branch.
> Reporter: Edwin Biemond
> Priority: Major
> Attachments: driver_threads.log, executor_threads.log
>
> When we run a simple pyspark script on Spark 2.4.3 or 3.0.0, the driver pod hangs
> and never calls the shutdown hook.
> {code:python}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor are just hanging. We
> see the output of this python script.
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world>
> parallelism=2 python version=3.6
> {noformat}
> What works:
> * a simple Python script with a print works fine on 2.4.3 and 3.0.0
> * the same setup on 2.4.0
> * 2.4.3 spark-submit with the above pyspark script
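The attached driver_threads.log and executor_threads.log are JVM thread dumps (the kind jstack produces), which is what reveals the parked dag-scheduler-event-loop thread. Purely as an illustration of what such a dump shows — a live daemon thread parked inside a blocking queue take — here is a Python stdlib analogue; the sleep, temp-file handling, and thread setup are ad hoc for the example and not anything from Spark:

```python
# Dump the stacks of all Python threads, jstack-style, and observe a
# daemon thread parked in a blocking queue.get() on an empty queue.
import faulthandler
import os
import queue
import tempfile
import threading
import time

q = queue.Queue()
t = threading.Thread(target=q.get, daemon=True)  # parks forever: queue stays empty
t.start()
time.sleep(0.5)  # give the thread time to park inside q.get()

# faulthandler writes at the file-descriptor level, so it needs a real file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
with open(path) as f:
    dump = f.read()
os.unlink(path)

print(t.is_alive())     # the thread is parked, not finished
print("queue" in dump)  # its stack shows a frame inside queue.get
```

The analogous JVM dump of the driver is what shows the EventLoop thread WAITING in LinkedBlockingDeque.take, as quoted above.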
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884121#comment-16884121 ]

Stavros Kontopoulos edited comment on SPARK-27927 at 7/12/19 7:40 PM:
--

I think the issue is here:
{quote}
"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x7f561ceb1000 nid=0xa6 waiting on condition [0x7f5619ee4000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
	at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47)
{quote}
The code is here: [https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]. That thread is blocked there (on the blocking queue) and, although it's a daemon thread, it cannot move forward. I don't know exactly why it happens, but it looks similar to [https://github.com/apache/spark/pull/24796]. [~zsxwing], thoughts?
was (Author: skonto):
I think the issue is here:
{quote}
"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x7f561ceb1000 nid=0xa6 waiting on condition [0x7f5619ee4000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
	at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47)
{quote}
The code is here: [https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]. That thread is blocked there (on the blocking queue) and, although it's a daemon thread, it cannot move forward. I don't know exactly why it happens, but it looks similar to [https://github.com/apache/spark/pull/24796]. [~zsxwing], thoughts?
[jira] [Commented] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884121#comment-16884121 ]

Stavros Kontopoulos commented on SPARK-27927:
-

I think the issue is here:
{noformat}
"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x7f561ceb1000 nid=0xa6 waiting on condition [0x7f5619ee4000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
	at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47)
{noformat}
Code is [here|https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]. That thread is blocked there (on the blocking queue) and, although it's a daemon thread, it cannot move forward. I don't know exactly why it happens, but it looks similar to [https://github.com/apache/spark/pull/24796]. [~zsxwing], thoughts?