[jira] [Commented] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156627#comment-17156627 ]

Rob Vesse commented on SPARK-32259:
---

bq. We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod logs for stack trace is not available. we have only pod events given in attachment

You should still be able to use {{kubectl logs}} to retrieve the logs of terminated pods, unless these are executor pods being evicted, since I believe Spark cleans those up automatically. You can add {{spark.kubernetes.executor.deleteOnTermination=false}} to your configuration to disable this behaviour so that you can go and retrieve those logs later.

> tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
> ---
>
> Key: SPARK-32259
> URL: https://issues.apache.org/jira/browse/SPARK-32259
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.0
> Reporter: Prakash Rajendran
> Priority: Blocker
> Attachments: Capture.PNG
>
> In Spark-Submit, I have this config "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", yet Spark is still not pointing its spill data to the SPARK_LOCAL_DIRS path.
> K8s is evicting the pod due to the error "{color:#de350b}*Pod ephemeral local storage usage exceeds the total limit of containers.*{color}"
> We use Spark launcher to do spark-submit in k8s. Since the pod is evicted, its logs with the stack trace are not available; we have only the pod events given in the attachment.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
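The suggestion above can be sketched as follows. This is an illustrative fragment, not a definitive recipe: it assumes a live cluster, the example class/jar are placeholders, and the label selector relies on the {{spark-role}} label that Spark on Kubernetes applies to its pods.

```shell
# Disable automatic clean-up of executor pods so their logs survive termination.
# (Combine with your usual spark-submit arguments; class and jar are examples.)
spark-submit \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar

# Terminated executor pods now remain visible, so their logs can still be pulled:
kubectl get pods -l spark-role=executor
kubectl logs <executor-pod-name>
```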
[jira] [Comment Edited] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156626#comment-17156626 ]

Rob Vesse edited comment on SPARK-32259 at 7/13/20, 10:32 AM:
--

[~prakki79] Ideally you'd also include the following in your report:

* The full {{spark-submit}} command
* The {{spark-defaults.conf}} or whatever configuration file you are using (if any)
* The {{kubectl describe pod}} output for the relevant pod(s)
* The {{kubectl get pod -o=yaml}} output for the relevant pod(s)

bq. I have these config "spark.kubernetes.local.dirs.tmpfs=true", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path.

Nothing you have shown so far suggests that this is true. All that configuration setting does is change how Spark configures the relevant {{emptyDir}} volume used for ephemeral storage (and that's assuming you haven't supplied other configuration that explicitly configures local directories). You can exhaust an in-memory volume in exactly the same way as you exhaust a disk-based volume and get your pod evicted. Note that when using in-memory volumes you may need to adjust the amount of memory allocated to your pod per the documentation - http://spark.apache.org/docs/latest/running-on-kubernetes.html#using-ram-for-local-storage

was (Author: rvesse):

[~prakki79] Ideally you'd also include the following in your report:

* The full {{spark-submit}} command
* The {{kubectl describe pod}} output for the relevant pod(s)
* The {{kubectl get pod -o=yaml}} output for the relevant pod(s)

bq. I have these config "spark.kubernetes.local.dirs.tmpfs=true", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path.
Nothing you have shown so far suggests that this is true. All that configuration setting does is change how Spark configures the relevant {{emptyDir}} volume used for ephemeral storage (and that's assuming you haven't supplied other configuration that explicitly configures local directories). You can exhaust an in-memory volume in exactly the same way as you exhaust a disk-based volume and get your pod evicted. Note that when using in-memory volumes you may need to adjust the amount of memory allocated to your pod per the documentation - http://spark.apache.org/docs/latest/running-on-kubernetes.html#using-ram-for-local-storage
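Per the linked documentation, with {{spark.kubernetes.local.dirs.tmpfs=true}} the local directories become RAM-backed {{emptyDir}} volumes, so spill data counts against the pod's memory. A hedged sketch of a submission sized for that; the overhead value, class, and jar path are made-up examples:

```shell
# tmpfs local dirs: spill/shuffle data lives in RAM, so raise the memory
# overhead to leave room for it (2g here is purely illustrative).
spark-submit \
  --conf spark.kubernetes.local.dirs.tmpfs=true \
  --conf spark.executor.memoryOverhead=2g \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```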
[jira] [Commented] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156626#comment-17156626 ]

Rob Vesse commented on SPARK-32259:
---

[~prakki79] Ideally you'd also include the following in your report:

* The full {{spark-submit}} command
* The {{kubectl describe pod}} output for the relevant pod(s)
* The {{kubectl get pod -o=yaml}} output for the relevant pod(s)

bq. I have these config "spark.kubernetes.local.dirs.tmpfs=true", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path.

Nothing you have shown so far suggests that this is true. All that configuration setting does is change how Spark configures the relevant {{emptyDir}} volume used for ephemeral storage (and that's assuming you haven't supplied other configuration that explicitly configures local directories). You can exhaust an in-memory volume in exactly the same way as you exhaust a disk-based volume and get your pod evicted. Note that when using in-memory volumes you may need to adjust the amount of memory allocated to your pod per the documentation - http://spark.apache.org/docs/latest/running-on-kubernetes.html#using-ram-for-local-storage
[jira] [Created] (SPARK-28649) Git Ignore does not ignore python/.eggs
Rob Vesse created SPARK-28649:
-

Summary: Git Ignore does not ignore python/.eggs
Key: SPARK-28649
URL: https://issues.apache.org/jira/browse/SPARK-28649
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 2.4.3
Reporter: Rob Vesse

Currently the {{python/.eggs}} folder is not in the {{.gitignore}} file. If you are building a Spark distribution from your working copy, with Python distribution enabled as part of that, you'll end up with this folder present and Git will always warn you that there are untracked changes as a result. Since this directory contains transient build artifacts, it should be ignored.
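The fix described is a one-line ignore entry; as a sketch, this appends it idempotently when run from the repository root (the exact entry form, with or without a trailing slash, is a judgment call):

```shell
# Add python/.eggs to .gitignore, but only if it is not already listed.
grep -qxF 'python/.eggs' .gitignore 2>/dev/null || echo 'python/.eggs' >> .gitignore
```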
[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825132#comment-16825132 ]

Rob Vesse commented on SPARK-25262:
---

[~Udbhav Agrawal] Yes, I think an approach like that would be acceptable to the community (and if not then I don't know what will be). If you want to take a stab at doing this please feel free.

> Make Spark local dir volumes configurable with Spark on Kubernetes
> ---
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 2.3.0, 2.3.1
> Reporter: Rob Vesse
> Priority: Major
>
> As discussed during review of the design document for SPARK-24434, while pod templates will provide more in-depth customisation for Spark on Kubernetes, there are some things that cannot be modified because Spark code generates pod specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}}, which is done by {{LocalDirsFeatureStep.scala}}. For each directory specified, or a single default if no explicit specification, it creates a Kubernetes {{emptyDir}} volume. As noted in the Kubernetes documentation this will be backed by the node storage (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir). In some compute environments this may be extremely undesirable. For example, with diskless compute resources the node storage will likely be a non-performant remote mounted disk, often with limited capacity. For such environments it would likely be better to set {{medium: Memory}} on the volume per the K8S documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different volume type to back the local directories and there is no possibility to do that.
> Pod templates will not really solve either of these issues because Spark is always going to attempt to generate a new volume for each local directory and always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} volumes
> * Modify the logic to check if there is a volume already defined with the name and if so skip generating a volume definition for it
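For illustration, the kind of {{emptyDir}} volume the proposal would emit when the tmpfs setting is enabled might look as follows; the volume name and size limit are made-up examples, written out here via a shell heredoc:

```shell
# Hypothetical example of a tmpfs-backed emptyDir volume for a Spark local
# directory, per the Kubernetes emptyDir documentation (medium: Memory).
cat > /tmp/spark-local-dir-volume.yaml <<'EOF'
volumes:
  - name: spark-local-dir-1
    emptyDir:
      medium: Memory      # back the volume with tmpfs (RAM) instead of node disk
      sizeLimit: 1Gi      # optional cap so the volume cannot exhaust pod memory
EOF
```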
[jira] [Commented] (SPARK-27063) Spark on K8S Integration Tests timeouts are too short for some test clusters
[ https://issues.apache.org/jira/browse/SPARK-27063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786051#comment-16786051 ]

Rob Vesse commented on SPARK-27063:
---

[~skonto] Yes, we have experienced the same problem. I think my next PR for this will look to make that overall timeout user-configurable.

> Spark on K8S Integration Tests timeouts are too short for some test clusters
> ---
>
> Key: SPARK-27063
> URL: https://issues.apache.org/jira/browse/SPARK-27063
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 2.4.0
> Reporter: Rob Vesse
> Priority: Minor
>
> As noted during development for SPARK-26729 there are a couple of integration test timeouts that are too short when running on slower clusters, e.g. developers' laptops, small CI clusters, etc.
> [~skonto] confirmed that he has also experienced this behaviour in the discussion on [PR 23846|https://github.com/apache/spark/pull/23846#discussion_r262564938]
> We should up the defaults of these timeouts as an initial step and longer term consider making the timeouts themselves configurable.
[jira] [Created] (SPARK-27063) Spark on K8S Integration Tests timeouts are too short for some test clusters
Rob Vesse created SPARK-27063:
-

Summary: Spark on K8S Integration Tests timeouts are too short for some test clusters
Key: SPARK-27063
URL: https://issues.apache.org/jira/browse/SPARK-27063
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Rob Vesse

As noted during development for SPARK-26729 there are a couple of integration test timeouts that are too short when running on slower clusters, e.g. developers' laptops, small CI clusters, etc.

[~skonto] confirmed that he has also experienced this behaviour in the discussion on [PR 23846|https://github.com/apache/spark/pull/23846#discussion_r262564938]

We should up the defaults of these timeouts as an initial step and longer term consider making the timeouts themselves configurable.
[jira] [Commented] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements
[ https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761667#comment-16761667 ]

Rob Vesse commented on SPARK-26833:
---

Although I'm not sure the latter is doable. With {{kubectl}} you can do {{--as system:serviceaccounts:namespace:account}} but I can't see any obvious way to do that with Fabric 8 unless you have the service account token present locally. We might be able to explicitly obtain the token for the relevant service account and then reconfigure a fresh client based on that, but it would be a significant change to the existing behaviour.

> Kubernetes RBAC documentation is unclear on exact RBAC requirements
> ---
>
> Key: SPARK-26833
> URL: https://issues.apache.org/jira/browse/SPARK-26833
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
> Reporter: Rob Vesse
> Priority: Major
>
> I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack. Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:
> {quote}019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"{quote}
> This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials *NOT* the service account as the user might expect.
> It seems like there are two ways to resolve this issue:
> * Improve the documentation to clarify the current situation
> * Ensure that if a service account is configured we always use it even on the submission client
> The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.
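For context, the service-account setup the Spark documentation describes is along these lines; account, binding, and namespace names are examples, and the commands assume a live cluster. Note that, per the issue, this grants permissions to the driver pod only — the submission client still authenticates with your own credentials:

```shell
# Create a service account for the driver and grant it the "edit" role,
# as per the Spark on Kubernetes RBAC documentation (names are examples).
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# Point the driver at the service account (class and jar are placeholders).
spark-submit \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```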
[jira] [Updated] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements
[ https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated SPARK-26833:
--

Description:

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack. Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.

was:

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack.
Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials **NOT** the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.
[jira] [Updated] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements
[ https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated SPARK-26833:
--

Description:

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack. Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{quote}019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"{quote}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.

was:

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack.
Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.
[jira] [Updated] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements
[ https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated SPARK-26833:
--

Description:

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack. Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.

was:

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack.
Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.
[jira] [Created] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements
Rob Vesse created SPARK-26833:
-

Summary: Kubernetes RBAC documentation is unclear on exact RBAC requirements
Key: SPARK-26833
URL: https://issues.apache.org/jira/browse/SPARK-26833
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0
Reporter: Rob Vesse

I've seen a couple of users get bitten by this in informal discussions on GitHub and Slack. Basically the user sets up the service account and configures Spark to use it as described in the documentation, but then when they try to run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client. The submission client wants to do driver pod monitoring, which it does with the user's submission credentials **NOT** the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the submission client

The former is the easy fix; the latter is more invasive and may have other knock-on effects, so we should start with the former and discuss the feasibility of the latter.
[jira] [Created] (SPARK-26729) Spark on Kubernetes tooling hardcodes default image names
Rob Vesse created SPARK-26729: - Summary: Spark on Kubernetes tooling hardcodes default image names Key: SPARK-26729 URL: https://issues.apache.org/jira/browse/SPARK-26729 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0 Reporter: Rob Vesse Both when creating images with {{bin/docker-image-tool.sh}} and when running the Kubernetes integration tests the image names are hardcoded to {{spark}}, {{spark-py}} and {{spark-r}}. If you are producing custom images in some other way (e.g. a CI/CD process that doesn't use the script) or are required to use a different naming convention due to company policy, e.g. prefixing with a vendor name (e.g. {{apache-spark}}), then you can't directly create/test your images with the desired names. You can of course simply re-tag the images with the desired names, but this might not be possible in some CI/CD pipelines, especially if naming conventions are being enforced at the registry level. It would be nice if the default image names were customisable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Vesse resolved SPARK-26704. --- Resolution: Not A Problem > docker-image-tool.sh should copy custom Dockerfiles into the build context > for inclusion in images > -- > > Key: SPARK-26704 > URL: https://issues.apache.org/jira/browse/SPARK-26704 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Rob Vesse >Priority: Major > > As surfaced in the discussion on the PR for SPARK-26687 > (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles > these are not copied into the build context. Rather the build context > includes the default Dockerfiles from Spark regardless of what Dockerfiles > the end user actually used to build the images. > The suggestion in the PR was that the script should copy in the custom > Dockerfiles over the stock Dockerfiles. This potentially aids in > reproducing the images later because someone with an image can get the exact > Dockerfile used to build that image. > A related issue is that the script allows for and even in some cases > implicitly uses Docker build arguments as part of building the images. In > the case where build arguments are used these should probably also be > captured in the image to aid reproducibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750366#comment-16750366 ] Rob Vesse commented on SPARK-26704: --- Yes, sorry, I was conflating the build context with the image contents. So you're correct, there isn't anything to do here. Will close as Not a Problem. > docker-image-tool.sh should copy custom Dockerfiles into the build context > for inclusion in images > -- > > Key: SPARK-26704 > URL: https://issues.apache.org/jira/browse/SPARK-26704 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Rob Vesse >Priority: Major > > As surfaced in the discussion on the PR for SPARK-26687 > (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles > these are not copied into the build context. Rather the build context > includes the default Dockerfiles from Spark regardless of what Dockerfiles > the end user actually used to build the images. > The suggestion in the PR was that the script should copy in the custom > Dockerfiles over the stock Dockerfiles. This potentially aids in > reproducing the images later because someone with an image can get the exact > Dockerfile used to build that image. > A related issue is that the script allows for and even in some cases > implicitly uses Docker build arguments as part of building the images. In > the case where build arguments are used these should probably also be > captured in the image to aid reproducibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750323#comment-16750323 ] Rob Vesse commented on SPARK-26704: --- For me it's a question of build reproducibility (I've been following an interesting discussion around this on legal-discuss - https://lists.apache.org/thread.html/d578819f1afa6b8fb697ea72083e0fb05e43938a23d6e7bb804069b8@%3Clegal-discuss.apache.org%3E). If I crack open the image and start poking around and find a Dockerfile present, do I have a reasonable expectation that the Dockerfile I find there is the one used to build the image? If Yes, then we should ensure we include the correct Dockerfiles in the build context and thus the image. If No, then we should probably not bother including the Dockerfiles at all. However, since, as you point out, when building from a Spark release distribution they will be present and thus packaged into the image, I suspect we want to continue doing this even for developer builds. > docker-image-tool.sh should copy custom Dockerfiles into the build context > for inclusion in images > -- > > Key: SPARK-26704 > URL: https://issues.apache.org/jira/browse/SPARK-26704 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Rob Vesse >Priority: Major > > As surfaced in the discussion on the PR for SPARK-26687 > (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles > these are not copied into the build context. Rather the build context > includes the default Dockerfiles from Spark regardless of what Dockerfiles > the end user actually used to build the images. > The suggestion in the PR was that the script should copy in the custom > Dockerfiles over the stock Dockerfiles. This potentially aids in > reproducing the images later because someone with an image can get the exact > Dockerfile used to build that image. 
> A related issue is that the script allows for and even in some cases > implicitly uses Docker build arguments as part of building the images. In > the case where build arguments are used these should probably also be > captured in the image to aid reproducibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images
Rob Vesse created SPARK-26704: - Summary: docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images Key: SPARK-26704 URL: https://issues.apache.org/jira/browse/SPARK-26704 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0 Reporter: Rob Vesse As surfaced in the discussion on the PR for SPARK-26687 (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles these are not copied into the build context. Rather the build context includes the default Dockerfiles from Spark regardless of what Dockerfiles the end user actually used to build the images. The suggestion in the PR was that the script should copy in the custom Dockerfiles over the stock Dockerfiles. This potentially aids in reproducing the images later because someone with an image can get the exact Dockerfile used to build that image. A related issue is that the script allows for and even in some cases implicitly uses Docker build arguments as part of building the images. In the case where build arguments are used these should probably also be captured in the image to aid reproducibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26687) Building Spark Images has non-intuitive behaviour with paths to custom Dockerfiles
Rob Vesse created SPARK-26687: - Summary: Building Spark Images has non-intuitive behaviour with paths to custom Dockerfiles Key: SPARK-26687 URL: https://issues.apache.org/jira/browse/SPARK-26687 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0 Reporter: Rob Vesse With the changes from SPARK-26025 (https://github.com/apache/spark/pull/23019) we use a pared down Docker build context which significantly improves build times. However the way this is implemented leads to non-intuitive behaviour when supplying custom Dockerfile paths. This is because of the following code snippet: {code} (cd $(img_ctx_dir base) && docker build $NOCACHEARG "${BUILD_ARGS[@]}" \ -t $(image_ref spark) \ -f "$BASEDOCKERFILE" .) {code} Since the script changes to the temporary build context directory and then runs {{docker build}} there, any path given for the Dockerfile is taken as relative to the temporary build context directory rather than to the directory where the user invoked the script. This produces somewhat unhelpful errors e.g. {noformat} > ./bin/docker-image-tool.sh -r rvesse -t badpath -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build Sending build context to Docker daemon 218.4MB Step 1/15 : FROM openjdk:8-alpine ---> 5801f7d008e5 Step 2/15 : ARG spark_uid=185 ---> Using cache ---> 5fd63df1ca39 ... Successfully tagged rvesse/spark:badpath unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /Users/rvesse/Documents/Work/Code/spark/target/tmp/docker/pyspark/resource-managers: no such file or directory Failed to build PySpark Docker image, please refer to Docker build output for details. {noformat} Here we can see that the relative path that was valid where the user typed the command was not valid inside the build context directory. To resolve this we need to ensure that we are resolving relative paths to Dockerfiles appropriately. 
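One possible shape for the fix can be sketched as follows; this is illustrative only (the helper name is made up, not the actual patch): resolve any relative Dockerfile path against the directory the user invoked the script from, before the script changes into the temporary build context.

```shell
# Sketch only (hypothetical helper, not the actual patch): convert a
# possibly-relative Dockerfile path into an absolute one *before* the
# script cd's into the temporary build context directory.
resolve_dockerfile_path() {
  f="$1"
  case "$f" in
    /*) ;;  # already absolute, leave it unchanged
    *)  f="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" ;;
  esac
  echo "$f"
}
```

With this applied before the {{(cd $(img_ctx_dir base) && docker build ...)}} subshell, the {{-f}} argument would point at the same file regardless of which directory {{docker build}} runs from.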
[jira] [Commented] (SPARK-26685) Building Spark Images with latest Docker does not honour spark_uid build argument
[ https://issues.apache.org/jira/browse/SPARK-26685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748605#comment-16748605 ] Rob Vesse commented on SPARK-26685: --- Opened a PR to fix this > Building Spark Images with latest Docker does not honour spark_uid build > argument > - > > Key: SPARK-26685 > URL: https://issues.apache.org/jira/browse/SPARK-26685 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Rob Vesse >Priority: Major > > Latest Docker releases are stricter in their interpretation of the scope of > build arguments meaning the location of the {{ARG spark_uid}} declaration > puts it out of scope by the time the variable is consumed resulting in the > Python and R images still running as {{root}} regardless of what the user may > have specified as the desired UID. > e.g. Images built with {{-u 456}} provided to {{bin/docker-image-tool.sh}} > {noformat} > > docker run -it --entrypoint /bin/bash rvesse/spark-py:uid456 > bash-4.4# whoami > root > bash-4.4# id -u > 0 > bash-4.4# exit > > docker run -it --entrypoint /bin/bash rvesse/spark:uid456 > bash-4.4$ id -u > 456 > bash-4.4$ exit > {noformat} > Note that for the Python image the build argument was out of scope and > ignored. For the base image the {{ARG}} declaration is in an in-scope > location and so is honoured correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26685) Building Spark Images with latest Docker does not honour spark_uid build argument
Rob Vesse created SPARK-26685: - Summary: Building Spark Images with latest Docker does not honour spark_uid build argument Key: SPARK-26685 URL: https://issues.apache.org/jira/browse/SPARK-26685 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0 Reporter: Rob Vesse Latest Docker releases are stricter in their interpretation of the scope of build arguments meaning the location of the {{ARG spark_uid}} declaration puts it out of scope by the time the variable is consumed resulting in the Python and R images still running as {{root}} regardless of what the user may have specified as the desired UID. e.g. Images built with {{-u 456}} provided to {{bin/docker-image-tool.sh}} {noformat} > docker run -it --entrypoint /bin/bash rvesse/spark-py:uid456 bash-4.4# whoami root bash-4.4# id -u 0 bash-4.4# exit > docker run -it --entrypoint /bin/bash rvesse/spark:uid456 bash-4.4$ id -u 456 bash-4.4$ exit {noformat} Note that for the Python image the build argument was out of scope and ignored. For the base image the {{ARG}} declaration is in an in-scope location and so is honoured correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
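For reference, the scoping rule at play can be sketched as follows (values taken from the example above, but the fragment itself is illustrative, not the project's actual Dockerfiles): an {{ARG}} declared before {{FROM}} is out of scope after the {{FROM}} line unless it is re-declared inside the build stage:

{code}
# An ARG before FROM is only in scope for the FROM line(s) themselves
ARG spark_uid=185
FROM openjdk:8-alpine

# Stricter Docker releases require this re-declaration (which inherits the
# default above); without it the variable is unset at this point and the
# intended UID is never applied
ARG spark_uid
USER ${spark_uid}
{code}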
[jira] [Commented] (SPARK-26028) Design sketch for SPIP: Property Graphs, Cypher Queries, and Algorithms
[ https://issues.apache.org/jira/browse/SPARK-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746134#comment-16746134 ] Rob Vesse commented on SPARK-26028: --- One initial comment - why include the query engine directly in the data representation? i.e. PropertyGraph should not have the cypher related properties and methods. Ideally these should be on a separate trait (e.g. CypherCapablePropertyGraph) so you cleanly separate the data representation from the query engine. Then the design would cleanly allow for other query engines in the future e.g. future versions of Cypher, GQL, GraphQL etc. > Design sketch for SPIP: Property Graphs, Cypher Queries, and Algorithms > --- > > Key: SPARK-26028 > URL: https://issues.apache.org/jira/browse/SPARK-26028 > Project: Spark > Issue Type: New Feature > Components: GraphX >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Martin Junghanns >Priority: Major > > Placeholder for the design discussion of SPARK-25994. The scope here is to > help SPIP vote instead of the final design. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26015) Include a USER directive in project provided Spark Dockerfiles
Rob Vesse created SPARK-26015: - Summary: Include a USER directive in project provided Spark Dockerfiles Key: SPARK-26015 URL: https://issues.apache.org/jira/browse/SPARK-26015 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0 Reporter: Rob Vesse The current Dockerfiles provided by the project for running on Kubernetes do not include a [USER directive|https://docs.docker.com/engine/reference/builder/#user] which means that they default to running as {{root}}. This may lead to unsuspecting users running their Spark jobs with unexpected levels of privilege. The project should follow Docker/K8S best practises by including {{USER}} directives in the Dockerfiles. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
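For illustration, the kind of directive being proposed might look like this sketch (base image, UID and elided build steps are placeholders):

{code}
FROM openjdk:8-alpine

# ... install Spark, create directories, fix ownership/permissions ...

# Default to an unprivileged UID rather than root; 185 is an arbitrary
# example and could be made overridable at build time
ARG spark_uid=185
USER ${spark_uid}
{code}

Users who genuinely need root could still override this in a derived image or at runtime.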
[jira] [Resolved] (SPARK-24833) Allow specifying Kubernetes host name aliases in the pod specs
[ https://issues.apache.org/jira/browse/SPARK-24833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Vesse resolved SPARK-24833. --- Resolution: Won't Fix As discussed in PR the pod template feature provides the ability to do this without needing new configuration properties > Allow specifying Kubernetes host name aliases in the pod specs > -- > > Key: SPARK-24833 > URL: https://issues.apache.org/jira/browse/SPARK-24833 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.1 >Reporter: Rob Vesse >Priority: Major > > For some workloads you would like to allow Spark executors to access external > services using host name aliases. Currently there is no way to specify Host > name aliases > (https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/) > to the pods that Spark generates and pod presets cannot be used to add these > at admission time currently (plus the fact that pod presets are still an > Alpha feature so not guaranteed to be usable on any given cluster). > Since Spark on K8S already allows adding secrets and volumes to mount via > Spark configuration it should be fairly easy to use the same approach to > include host name aliases. > I will look at opening a PR for this in the next couple of days. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25887) Allow specifying Kubernetes context to use
Rob Vesse created SPARK-25887: - Summary: Allow specifying Kubernetes context to use Key: SPARK-25887 URL: https://issues.apache.org/jira/browse/SPARK-25887 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.3.2, 2.3.1, 2.3.0, 2.4.0 Reporter: Rob Vesse In working on SPARK-25809 support was added to the integration testing machinery for Spark on K8S to use an arbitrary context from the user's K8S config file. However this can fail/cause false positives because, regardless of what the integration test harness does, the K8S submission client uses the Fabric 8 client library in such a way that it only ever configures itself from the current context. For users who work with multiple K8S clusters, or who have multiple K8S "users" for interacting with their cluster, being able to support arbitrary contexts without forcing the user to first run {{kubectl config use-context <context>}} is an important improvement. This would be a fairly small fix to {{SparkKubernetesClientFactory}}, with an associated configuration key, likely {{spark.kubernetes.context}}, to go along with it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
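To make the difference concrete, a usage sketch (the property name is tentative per the description above, and the context/cluster names are made up):

{noformat}
# Current workaround: mutate the active context before submitting
kubectl config use-context my-other-cluster
bin/spark-submit --master k8s://https://... ...

# Proposed: select the context at submission time without side effects
bin/spark-submit --conf spark.kubernetes.context=my-other-cluster ...
{noformat}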
[jira] [Commented] (SPARK-25809) Support additional K8S cluster types for integration tests
[ https://issues.apache.org/jira/browse/SPARK-25809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669048#comment-16669048 ] Rob Vesse commented on SPARK-25809: --- Fairly close to this being ready to merge > Support additional K8S cluster types for integration tests > -- > > Key: SPARK-25809 > URL: https://issues.apache.org/jira/browse/SPARK-25809 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.2, 2.4.0 >Reporter: Rob Vesse >Priority: Major > > Currently the Spark on K8S integration tests are hardcoded to use a > {{minikube}} based backend. It would be nice if developers had more > flexibility in the choice of K8S cluster they wish to use for integration > testing. More specifically it would be useful to be able to use the built-in > Kubernetes support in recent Docker releases and to just use a generic K8S > cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25809) Support additional K8S cluster types for integration tests
Rob Vesse created SPARK-25809: - Summary: Support additional K8S cluster types for integration tests Key: SPARK-25809 URL: https://issues.apache.org/jira/browse/SPARK-25809 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.3.2, 2.4.0 Reporter: Rob Vesse Currently the Spark on K8S integration tests are hardcoded to use a {{minikube}} based backend. It would be nice if developers had more flexibility in the choice of K8S cluster they wish to use for integration testing. More specifically it would be useful to be able to use the built-in Kubernetes support in recent Docker releases and to just use a generic K8S cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25745) docker-image-tool.sh ignores errors from Docker
Rob Vesse created SPARK-25745: - Summary: docker-image-tool.sh ignores errors from Docker Key: SPARK-25745 URL: https://issues.apache.org/jira/browse/SPARK-25745 Project: Spark Issue Type: Bug Components: Deploy, Kubernetes Affects Versions: 2.3.2, 2.3.1, 2.3.0 Reporter: Rob Vesse In attempting to use the {{docker-image-tool.sh}} scripts to build some custom Dockerfiles I ran into issues with the script's interaction with Docker. Most notably, if the Docker build/push fails, the script blindly continues, ignoring the errors. This can either result in complete failure to build or lead to subtle bugs where images are built against different base images than expected. Additionally, while the Dockerfiles assume that Spark is first built locally, the scripts fail to validate this, which they could easily do by checking the expected JARs location. This can also lead to failed Docker builds which could easily be avoided. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
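A sketch of the fail-fast behaviour being suggested (the helper name is made up, not from the script):

```shell
# Sketch only (hypothetical helper, not from docker-image-tool.sh): run a
# command and surface a failure immediately instead of blindly carrying on.
run_or_exit() {
  if ! "$@"; then
    echo "ERROR: command failed: $*" >&2
    return 1
  fi
}

# The build path could then fail fast, e.g. (variables illustrative):
#   run_or_exit docker build -t "$IMG" -f "$DOCKERFILE" . || exit 1
# and the script could validate a local Spark build up front by checking
# the expected JARs location before invoking Docker at all.
```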
[jira] [Updated] (SPARK-23153) Support application dependencies in submission client's local file system
[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Vesse updated SPARK-23153: -- Description: Currently local dependencies are not supported with Spark on K8S i.e. if the user has code or dependencies only on the client where they run {{spark-submit}} then the current implementation has no way to make those visible to the Spark application running inside the K8S pods that get launched. This limits users to only running applications where the code and dependencies are either baked into the Docker images used or where those are available via some external and globally accessible file system e.g. HDFS which are not viable options for many users and environments > Support application dependencies in submission client's local file system > - > > Key: SPARK-23153 > URL: https://issues.apache.org/jira/browse/SPARK-23153 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > Currently local dependencies are not supported with Spark on K8S i.e. if the > user has code or dependencies only on the client where they run > {{spark-submit}} then the current implementation has no way to make those > visible to the Spark application running inside the K8S pods that get > launched. This limits users to only running applications where the code and > dependencies are either baked into the Docker images used or where those are > available via some external and globally accessible file system e.g. HDFS > which are not viable options for many users and environments -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system
[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635534#comment-16635534 ] Rob Vesse commented on SPARK-23153: --- [~cloud_fan][~liyinan926][~mcheah][~eje] Has there been any discussion of how to go about addressing this limitation? In the original downstream fork there was the Resource Staging Server, but that got removed to simplify upstreaming and because Spark core folks had objections to that approach. Also, in our usage of it we encountered a number of performance, scalability and security issues that made it a not particularly stable approach. There was a long dev list thread on this - https://lists.apache.org/thread.html/82b4ae9a2eb5ddeb3f7240ebf154f06f19b830f8b3120038e5d687a1@%3Cdev.spark.apache.org%3E - but no real conclusion seemed to be reached. There are a few workarounds open to users that I can think of: * Use the PVC support to mount a pre-created PVC that has somehow been populated with the user code * Use the incoming pod template feature to mount arbitrary volumes that have somehow been populated with the user code * Build custom images All these options put the onus on users to do prep work prior to launch. I think Option 3 is currently the "recommended" workaround. Unfortunately for us that is not a viable option as our customers tend to be very security conscious and often only allow a pre-approved list of images to be run. (Ignoring the obvious fallacy of disallowing custom images while permitting the running of images that allow custom user code to execute...) This is a blocker for me currently and I would like to contribute here, but I don't want to reinvent the wheel or waste effort on approaches that have already been discussed/discounted. 
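For reference, the pod template workaround might look roughly like this fragment (container, volume and claim names are hypothetical):

{code:yaml}
# Illustrative pod template fragment: mount a pre-populated PVC containing
# the user's application code into the driver pod
spec:
  containers:
  - name: spark-kubernetes-driver
    volumeMounts:
    - name: app-code
      mountPath: /opt/app
  volumes:
  - name: app-code
    persistentVolumeClaim:
      claimName: my-app-code-pvc
{code}

The PVC would still need to be created and populated with the user code out of band, which is exactly the prep-work burden described above.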
> Support application dependencies in submission client's local file system > - > > Key: SPARK-23153 > URL: https://issues.apache.org/jira/browse/SPARK-23153 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620809#comment-16620809 ] Rob Vesse commented on SPARK-24434: --- Started a mailing list thread re: the limitations of this as currently implemented - https://lists.apache.org/thread.html/8a0ac1cada800d10ec1fe7f9552257af1dfc6719b404bdc3696b5c1f@%3Cdev.spark.apache.org%3E > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606797#comment-16606797 ] Rob Vesse commented on SPARK-25262: --- [~mcheah] I would like to keep this open so we can have the larger discussion that the original PR implied about how pod templates and feature steps should interact and how best to enable power user customisation. I am busy today at a conference but will try and kick off the discussion of this on the dev list next week. > Make Spark local dir volumes configurable with Spark on Kubernetes > -- > > Key: SPARK-25262 > URL: https://issues.apache.org/jira/browse/SPARK-25262 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1 >Reporter: Rob Vesse >Priority: Major > > As discussed during review of the design document for SPARK-24434 while > providing pod templates will provide more in-depth customisation for Spark on > Kubernetes there are some things that cannot be modified because Spark code > generates pod specs in very specific ways. > The particular issue identified relates to handling on {{spark.local.dirs}} > which is done by {{LocalDirsFeatureStep.scala}}. For each directory > specified, or a single default if no explicit specification, it creates a > Kubernetes {{emptyDir}} volume. As noted in the Kubernetes documentation > this will be backed by the node storage > (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir). In some > compute environments this may be extremely undesirable. For example with > diskless compute resources the node storage will likely be a non-performant > remote mounted disk, often with limited capacity. For such environments it > would likely be better to set {{medium: Memory}} on the volume per the K8S > documentation to use a {{tmpfs}} volume instead. 
> Another closely related issue is that users might want to use a different > volume type to back the local directories and there is no possibility to do > that. > Pod templates will not really solve either of these issues because Spark is > always going to attempt to generate a new volume for each local directory and > always going to set these as {{emptyDir}}. > Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}: > * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} > volumes > * Modify the logic to check if there is a volume already defined with the > name and if so skip generating a volume definition for it -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
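For context, the difference between what {{LocalDirsFeatureStep}} generates today and the proposed opt-in {{tmpfs}} variant can be sketched as follows (volume name illustrative):

{code:yaml}
# Today: emptyDir backed by node storage
volumes:
- name: spark-local-dir-1
  emptyDir: {}

# Proposed: RAM-backed tmpfs volume; note its usage counts against the
# pod's memory, so memory requests/limits may need adjusting
volumes:
- name: spark-local-dir-1
  emptyDir:
    medium: Memory
{code}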
[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598402#comment-16598402 ] Rob Vesse commented on SPARK-24434: --- {quote} I think the miscommunication here was because of the discrepancy between this Jira and k8s-sig-big-data weekly meeting notes {quote} As an Apache member, I find that this comment raises red flags. All Spark development discussions should either be happening on Apache resources (JIRA, mailing lists, GitHub repos) or being captured and posted to Apache resources. If people have to follow external resources, particularly live meetings which naturally exclude portions of the community due to timezone/availability constraints, to participate in an Apache community, then that community is not operating as a proper Apache community. This doesn't mean that such discussions and meetings can't happen, but they should be summarised back on Apache resources so the wider community has the opportunity to participate. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. 
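For illustration, the pod-template approach described in the issue can be thought of as an overlay merge: the user's template supplies the base pod spec and Spark overlays only the fields it has to own. The sketch below is a hypothetical Python model of that idea (the dict shapes mirror the Kubernetes pod API; the merge function and its semantics are assumptions, not the agreed design):

```python
# Hypothetical sketch of pod-template merging: the user template is the
# base, and Spark overlays only the fields it must own (labels, policies).
def merge_pod_template(user_template: dict, spark_overlay: dict) -> dict:
    """Recursively merge spark_overlay into user_template; the overlay
    wins on conflicts (i.e. 'template as base' semantics)."""
    merged = dict(user_template)
    for key, value in spark_overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_template(merged[key], value)
        else:
            merged[key] = value
    return merged

user_template = {
    "metadata": {"labels": {"team": "data-eng"}},
    "spec": {"nodeSelector": {"disktype": "ssd"}},
}
spark_overlay = {
    "metadata": {"labels": {"spark-role": "driver"}},
    "spec": {"restartPolicy": "Never"},
}
pod = merge_pod_template(user_template, spark_overlay)
```

The key property is that user customisations survive wherever they do not collide with something Spark must control.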
[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595178#comment-16595178 ] Rob Vesse commented on SPARK-25262: --- I have changes for this almost ready and plan to open a PR tomorrow > Make Spark local dir volumes configurable with Spark on Kubernetes > -- > > Key: SPARK-25262 > URL: https://issues.apache.org/jira/browse/SPARK-25262 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1 >Reporter: Rob Vesse >Priority: Major > > As discussed during review of the design document for SPARK-24434 while > providing pod templates will provide more in-depth customisation for Spark on > Kubernetes there are some things that cannot be modified because Spark code > generates pod specs in very specific ways. > The particular issue identified relates to handling of {{spark.local.dirs}} > which is done by {{LocalDirsFeatureStep.scala}}. For each directory > specified, or a single default if no explicit specification, it creates a > Kubernetes {{emptyDir}} volume. As noted in the Kubernetes documentation > this will be backed by the node storage > (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir). In some > compute environments this may be extremely undesirable. For example with > diskless compute resources the node storage will likely be a non-performant > remote mounted disk, often with limited capacity. For such environments it > would likely be better to set {{medium: Memory}} on the volume per the K8S > documentation to use a {{tmpfs}} volume instead. > Another closely related issue is that users might want to use a different > volume type to back the local directories and there is no possibility to do > that. > Pod templates will not really solve either of these issues because Spark is > always going to attempt to generate a new volume for each local directory and > always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}: > * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} > volumes > * Modify the logic to check if there is a volume already defined with the > name and if so skip generating a volume definition for it
[jira] [Created] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
Rob Vesse created SPARK-25262: - Summary: Make Spark local dir volumes configurable with Spark on Kubernetes Key: SPARK-25262 URL: https://issues.apache.org/jira/browse/SPARK-25262 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.3.1, 2.3.0 Reporter: Rob Vesse As discussed during review of the design document for SPARK-24434 while providing pod templates will provide more in-depth customisation for Spark on Kubernetes there are some things that cannot be modified because Spark code generates pod specs in very specific ways. The particular issue identified relates to handling of {{spark.local.dirs}} which is done by {{LocalDirsFeatureStep.scala}}. For each directory specified, or a single default if no explicit specification, it creates a Kubernetes {{emptyDir}} volume. As noted in the Kubernetes documentation this will be backed by the node storage (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir). In some compute environments this may be extremely undesirable. For example with diskless compute resources the node storage will likely be a non-performant remote mounted disk, often with limited capacity. For such environments it would likely be better to set {{medium: Memory}} on the volume per the K8S documentation to use a {{tmpfs}} volume instead. Another closely related issue is that users might want to use a different volume type to back the local directories and there is no possibility to do that. Pod templates will not really solve either of these issues because Spark is always going to attempt to generate a new volume for each local directory and always going to set these as {{emptyDir}}.
Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}: * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} volumes * Modify the logic to check if there is a volume already defined with the name and if so skip generating a volume definition for it
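The two proposed changes can be sketched as follows, here in Python rather than the actual Scala of {{LocalDirsFeatureStep}} (the function and flag names are illustrative only): a tmpfs-backed volume just sets {{medium: Memory}} on the {{emptyDir}}, and any volume name the user has already defined is skipped.

```python
# Illustrative sketch of the proposed behaviours: tmpfs-backed emptyDir
# volumes, and skipping volumes the user has already defined themselves.
def build_local_dir_volumes(local_dirs, use_tmpfs, predefined_names):
    volumes = []
    for i, _path in enumerate(local_dirs):
        name = f"spark-local-dir-{i + 1}"
        if name in predefined_names:
            continue  # user supplied this volume, leave it alone
        # medium: Memory asks Kubernetes to back the emptyDir with tmpfs
        empty_dir = {"medium": "Memory"} if use_tmpfs else {}
        volumes.append({"name": name, "emptyDir": empty_dir})
    return volumes

vols = build_local_dir_volumes(["/tmp/spark-a", "/tmp/spark-b"],
                               use_tmpfs=True,
                               predefined_names={"spark-local-dir-1"})
```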
[jira] [Commented] (SPARK-25024) Update mesos documentation to be clear about security supported
[ https://issues.apache.org/jira/browse/SPARK-25024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594933#comment-16594933 ] Rob Vesse commented on SPARK-25024: --- [~tgraves] Attempting to answer your questions: * We never used cluster mode so can't comment * Yes and no ** Similar to YARN it does the login locally in the client and then uses HDFS delegation tokens so it doesn't ship the keytabs AFAIK but it does ship the delegation tokens * We never used Spark Shuffle Service either so can't comment * Yes ** Mesos does authentication at the framework level rather than the user level so it depends on your setup. You might have setups where there is a single principal and secret used by all Spark users or you might have setups where you create a principal and secret for each user. You can optionally do ACLs within Mesos for each framework principal including configuring things like which users a framework is allowed to launch jobs as. * Again, we never used this feature; I think these are similar to K8S secrets in that they are created separately and you are just passing identifiers for these to Spark and Mesos takes care of providing these securely to your jobs. Generally we have dropped use of Spark on Mesos in favour of Spark on K8S because the security story for Mesos was poor and we had to do a lot of extra stuff to provide multi-tenancy whereas with K8S a lot more was available out of the box (even if secure HDFS support has yet to land in mainline Spark) > Update mesos documentation to be clear about security supported > --- > > Key: SPARK-25024 > URL: https://issues.apache.org/jira/browse/SPARK-25024 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Priority: Major > > I was reading through our Mesos deployment docs and security docs and it's not > clear at all what type of security and how to set it up for Mesos.
I think > we should clarify this and have something about exactly what is supported and > what is not.
[jira] [Commented] (SPARK-25222) Spark on Kubernetes Pod Watcher dumps raw container status
[ https://issues.apache.org/jira/browse/SPARK-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591386#comment-16591386 ] Rob Vesse commented on SPARK-25222: --- There is also a similar issue with task failure: {noformat} 2018-08-24 09:11:57 WARN TaskSetManager:66 - Lost task 2.3 in stage 0.0 (TID 13, 10.244.3.199, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: The executor with id 8 exited with exit code 52. The API gave the following brief reason: null The API gave the following message: null The API gave the following container statuses: ContainerStatus(containerID=docker://353f78fd634d312ec8115032c32da56748fb5d8da2c5ae54b1d0a9f112fb4d1d, image=rvesse/spark:latest, imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=executor, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://353f78fd634d312ec8115032c32da56748fb5d8da2c5ae54b1d0a9f112fb4d1d, exitCode=52, finishedAt=Time(time=2018-08-24T09:11:56Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2018-08-24T09:11:48Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={}) {noformat} > Spark on Kubernetes Pod Watcher dumps raw container status > -- > > Key: SPARK-25222 > URL: https://issues.apache.org/jira/browse/SPARK-25222 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1 >Reporter: Rob Vesse >Priority: Minor > > Spark on Kubernetes provides logging of the pod/container status as a monitor > of the job progress. 
However the logger just dumps the raw container status > object leading to fairly unreadable output like so: > {noformat} > 18/08/24 09:03:27 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-groupby-1535101393784-driver >namespace: default >labels: spark-app-selector -> spark-47f7248122b9444b8d5fd3701028a1e8, > spark-role -> driver >pod uid: 88de6467-a77c-11e8-b9da-a4bf0128b75b >creation time: 2018-08-24T09:03:14Z >service account name: spark >volumes: spark-local-dir-1, spark-conf-volume, spark-token-kjxkv >node name: tab-cmp4 >start time: 2018-08-24T09:03:14Z >container images: rvesse/spark:latest >phase: Running >status: > [ContainerStatus(containerID=docker://23ae58571f59505e837dca40455d0347fb90e9b88e2a2b145a38e2919fceb447, > image=rvesse/spark:latest, > imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=true, > restartCount=0, > state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-08-24T09:03:26Z, > additionalProperties={}), additionalProperties={}), terminated=null, > waiting=null, additionalProperties={}), additionalProperties={})] > {noformat} > The {{LoggingPodStatusWatcher}} actually already includes code to nicely > format this information but only invokes it at the end of the job: > {noformat} > 18/08/24 09:04:07 INFO LoggingPodStatusWatcherImpl: Container final statuses: > Container name: spark-kubernetes-driver > Container image: rvesse/spark:latest > Container state: Terminated > Exit code: 0 > {noformat} > It would be nice if we continually used the nice formatting throughout the > logging. > We already have patched this on our internal fork and will upstream a fix > shortly. 
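The readable per-container formatting the issue asks to reuse throughout the watcher might look like the following sketch, written in Python rather than the Scala of {{LoggingPodStatusWatcher}} (the field names mirror the Kubernetes {{ContainerStatus}} object; this is illustrative, not the actual patch):

```python
# Sketch: render one ContainerStatus-shaped dict as the readable,
# multi-line report instead of dumping the raw object.
def format_container_status(status: dict) -> str:
    state = status.get("state", {})
    lines = [
        f"Container name: {status['name']}",
        f"Container image: {status['image']}",
    ]
    if "running" in state:
        lines.append("Container state: Running")
        lines.append(f"Container started at: {state['running']['startedAt']}")
    elif "terminated" in state:
        lines.append("Container state: Terminated")
        lines.append(f"Exit code: {state['terminated']['exitCode']}")
    else:
        lines.append("Container state: Waiting")
    return "\n".join(lines)

report = format_container_status({
    "name": "spark-kubernetes-driver",
    "image": "rvesse/spark:latest",
    "state": {"terminated": {"exitCode": 0}},
})
```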
[jira] [Created] (SPARK-25222) Spark on Kubernetes Pod Watcher dumps raw container status
Rob Vesse created SPARK-25222: - Summary: Spark on Kubernetes Pod Watcher dumps raw container status Key: SPARK-25222 URL: https://issues.apache.org/jira/browse/SPARK-25222 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.3.1, 2.3.0 Reporter: Rob Vesse Spark on Kubernetes provides logging of the pod/container status as a monitor of the job progress. However the logger just dumps the raw container status object leading to fairly unreadable output like so: {noformat} 18/08/24 09:03:27 INFO LoggingPodStatusWatcherImpl: State changed, new state: pod name: spark-groupby-1535101393784-driver namespace: default labels: spark-app-selector -> spark-47f7248122b9444b8d5fd3701028a1e8, spark-role -> driver pod uid: 88de6467-a77c-11e8-b9da-a4bf0128b75b creation time: 2018-08-24T09:03:14Z service account name: spark volumes: spark-local-dir-1, spark-conf-volume, spark-token-kjxkv node name: tab-cmp4 start time: 2018-08-24T09:03:14Z container images: rvesse/spark:latest phase: Running status: [ContainerStatus(containerID=docker://23ae58571f59505e837dca40455d0347fb90e9b88e2a2b145a38e2919fceb447, image=rvesse/spark:latest, imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-08-24T09:03:26Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})] {noformat} The {{LoggingPodStatusWatcher}} actually already includes code to nicely format this information but only invokes it at the end of the job: {noformat} 18/08/24 09:04:07 INFO LoggingPodStatusWatcherImpl: Container final statuses: Container name: spark-kubernetes-driver Container image: rvesse/spark:latest Container state: Terminated Exit 
code: 0 {noformat} It would be nice if we continually used the nice formatting throughout the logging. We already have patched this on our internal fork and will upstream a fix shortly.
[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567017#comment-16567017 ] Rob Vesse commented on SPARK-24434: --- [~skonto] Added a couple more comments based on some issues I've run into during ongoing development > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods.
[jira] [Created] (SPARK-24833) Allow specifying Kubernetes host name aliases in the pod specs
Rob Vesse created SPARK-24833: - Summary: Allow specifying Kubernetes host name aliases in the pod specs Key: SPARK-24833 URL: https://issues.apache.org/jira/browse/SPARK-24833 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.3.1 Reporter: Rob Vesse For some workloads you would like to allow Spark executors to access external services using host name aliases. Currently there is no way to specify host name aliases (https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/) for the pods that Spark generates, and pod presets cannot currently be used to add these at admission time (plus the fact that pod presets are still an Alpha feature so not guaranteed to be usable on any given cluster). Since Spark on K8S already allows adding secrets and volumes to mount via Spark configuration it should be fairly easy to use the same approach to include host name aliases. I will look at opening a PR for this in the next couple of days.
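For reference, the Kubernetes {{hostAliases}} structure is just a list of IP-to-hostnames entries that are written into the pod's {{/etc/hosts}}. A hypothetical sketch of mapping Spark-style configuration keys onto it, in the spirit of the existing secrets/volumes properties (the {{spark.kubernetes.hostAlias.*}} naming scheme below is invented for illustration and is not a real Spark setting):

```python
# Hypothetical mapping from Spark-style configuration entries to the
# Kubernetes pod spec 'hostAliases' field; the property naming scheme
# below is invented for illustration only.
def host_aliases_from_conf(conf: dict) -> list:
    prefix = "spark.kubernetes.hostAlias."
    aliases = []
    for key, hostnames in sorted(conf.items()):
        if key.startswith(prefix):
            ip = key[len(prefix):]
            aliases.append({"ip": ip, "hostnames": hostnames.split(",")})
    return aliases

pod_spec_fragment = {
    "hostAliases": host_aliases_from_conf({
        "spark.kubernetes.hostAlias.10.0.0.5": "metrics.internal,db.internal",
    })
}
```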
[jira] [Commented] (SPARK-23257) Implement Kerberos Support in Kubernetes resource manager
[ https://issues.apache.org/jira/browse/SPARK-23257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496284#comment-16496284 ] Rob Vesse commented on SPARK-23257: --- [~ifilonenko] Any updates on this? We're currently using the fork as Kerberos support is a must-have for our customers and would love to get this into upstream and get ourselves back onto an official Spark release. We can likely help out with testing, review and/or implementation as needed > Implement Kerberos Support in Kubernetes resource manager > - > > Key: SPARK-23257 > URL: https://issues.apache.org/jira/browse/SPARK-23257 > Project: Spark > Issue Type: Wish > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Rob Keevil >Priority: Major > > On the forked k8s branch of Spark at > [https://github.com/apache-spark-on-k8s/spark/pull/540] , Kerberos support > has been added to the Kubernetes resource manager. The Kubernetes code > between these two repositories appears to have diverged, so this commit > cannot be merged in easily. Are there any plans to re-implement this work on > the main Spark repository? > > [ifilonenko|https://github.com/ifilonenko] [~liyinan926] I am happy to help > with the development and testing of this, but I wanted to confirm that this > isn't already in progress - I could not find any discussion about this > specific topic online.
[jira] [Created] (SPARK-23374) Checkstyle/Scalastyle only work from top level build
Rob Vesse created SPARK-23374: - Summary: Checkstyle/Scalastyle only work from top level build Key: SPARK-23374 URL: https://issues.apache.org/jira/browse/SPARK-23374 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.2.1 Reporter: Rob Vesse The current Maven plugin definitions for Checkstyle/Scalastyle use fixed XML configs for the style rule locations that are only valid relative to the top level POM. Therefore if you try and do a {{mvn verify}} in an individual module you get the following error: {noformat} [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:1.0.0:check (default) on project spark-mesos_2.11: Failed during scalastyle execution: Unable to find configuration file at location scalastyle-config.xml {noformat} As the paths are hardcoded in XML and don't use Maven properties you can't override these settings, so you can't style check a single module, which makes doing style checking require a full project {{mvn verify}} which is not ideal. By introducing Maven properties for these two paths it would become possible to run checks on a single module like so: {noformat} mvn verify -Dscalastyle.location=../scalastyle-config.xml {noformat} Obviously the override would need to vary depending on the specific module you are trying to run it against but this would be a relatively simple change that would streamline dev workflows.
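A sketch of the proposed fix in the POM, assuming the {{scalastyle-maven-plugin}}'s {{configLocation}} element (the property name {{scalastyle.location}} is illustrative, as is the exact plugin wiring):

```xml
<!-- Illustrative sketch: default the rule location to the top-level file
     via a Maven property so that an individual module can override it
     with -Dscalastyle.location=../scalastyle-config.xml -->
<properties>
  <scalastyle.location>scalastyle-config.xml</scalastyle.location>
</properties>

<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <configuration>
    <configLocation>${scalastyle.location}</configLocation>
  </configuration>
</plugin>
```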
[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212462#comment-16212462 ] Rob Vesse commented on SPARK-22229: --- [~yuvaldeg], thanks for providing additional clarifications. If a library is considered to be a standard part of the platform software then it should fall under the Foundation's [platform|http://www.apache.org/legal/resolved.html#platform] policy that licensing of the platform generally does not affect the software running upon it. And if there are other Apache projects already depending on this, that provides a precedent that Spark can rely on. > SPIP: RDMA Accelerated Shuffle Engine > - > > Key: SPARK-22229 > URL: https://issues.apache.org/jira/browse/SPARK-22229 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Yuval Degani > Attachments: > SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf > > > An RDMA-accelerated shuffle engine can provide enormous performance benefits > to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin > open-source project ([https://github.com/Mellanox/SparkRDMA]). > Using RDMA for shuffle improves CPU utilization significantly and reduces I/O > processing overhead by bypassing the kernel and networking stack as well as > avoiding memory copies entirely. Those valuable CPU cycles are then consumed > directly by the actual Spark workloads, and help reducing the job runtime > significantly. > This performance gain is demonstrated with both industry standard HiBench > TeraSort (shows 1.5x speedup in sorting) as well as shuffle intensive > customer applications. > SparkRDMA will be presented at Spark Summit 2017 in Dublin > ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]). > Please see attached proposal document for more information.
[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209396#comment-16209396 ] Rob Vesse commented on SPARK-22229: --- I'd be interested to know more about what performance testing you carried out? In our experience of running Spark over various high-performance interconnects you can get a fairly good performance boost by simply setting {{spark.shuffle.compress=false}} and relying on TCP/IP performance over the interconnect. It is pretty difficult to write a Spark job that saturates the available bandwidth of such an interconnect so disabling compression means you don't waste CPU cycles compressing data and instead simply pump it across the network ASAP. I also wonder if you could comment on the choice of the underlying RDMA libraries and their licensing? It looks like {{libdisni}} is ASLv2 which is fine but some of its dependencies appear to be at least in parts GPL (e.g. {{librdmacm}}) which would mean they could not be depended on by an Apache project even as optional dependencies due to foundation level policies around dependency licensing. Due to the nature of licensing of most of the libraries in this space it may be legally impossible for RDMA support to make it into Spark proper. In which case you would likely have to stick with the external plug-in approach as you do currently. > SPIP: RDMA Accelerated Shuffle Engine > - > > Key: SPARK-22229 > URL: https://issues.apache.org/jira/browse/SPARK-22229 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Yuval Degani > Attachments: > SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf > > > An RDMA-accelerated shuffle engine can provide enormous performance benefits > to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin > open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O > processing overhead by bypassing the kernel and networking stack as well as > avoiding memory copies entirely. Those valuable CPU cycles are then consumed > directly by the actual Spark workloads, and help reducing the job runtime > significantly. > This performance gain is demonstrated with both industry standard HiBench > TeraSort (shows 1.5x speedup in sorting) as well as shuffle intensive > customer applications. > SparkRDMA will be presented at Spark Summit 2017 in Dublin > ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]). > Please see attached proposal document for more information.