[ https://issues.apache.org/jira/browse/SPARK-29762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972631#comment-16972631 ]
Imran Rashid commented on SPARK-29762:
--------------------------------------

I don't really understand the complication. I know there would be some special casing for GPUs in the config parsing code (e.g. in {{org.apache.spark.resource.ResourceUtils#parseResourceRequirements}}), but that doesn't seem too bad.

I did think about this more, and realized it gets a bit confusing when you add in task-level resource constraints: you won't schedule optimally for tasks that don't need GPUs, and you won't have GPUs left over for the tasks that do need them. E.g., say you had each executor set up with 4 cores and 2 GPUs. If one task set came in which only needed CPU, you would only run 2 copies. And then if another task set came in which did need the GPUs, you wouldn't be able to schedule it. You can't end up in that situation until you have task-specific resource constraints. But does it get too messy to have sensible defaults in that situation? Maybe the user specifies GPUs as an executor resource up front, for the whole cluster, because they have them available and they know some significant fraction of the workloads need them. They might think that the regular tasks will just ignore the GPUs, and the tasks that do need GPUs would just specify them as task-level constraints.

I guess this might have been a bad suggestion after all, sorry.

> GPU Scheduling - default task resource amount to 1
> --------------------------------------------------
>
>                 Key: SPARK-29762
>                 URL: https://issues.apache.org/jira/browse/SPARK-29762
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> Default the task-level resource configs (for GPU/FPGA, etc.) to 1. So if the user specifies the executor resource, then to make it more user friendly let's have the task resource config default to 1. This is OK right now since we require resources to have an address.
> It also matches what we do for the spark.task.cpus configs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
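The concurrency limit in the 4-core / 2-GPU scenario from the comment can be sketched with a small helper. This is a hypothetical illustration of the slot arithmetic, not Spark's actual scheduler code; the function name and parameters are made up for this sketch.

```python
def task_slots(executor_cores, executor_gpus, task_cpus, task_gpus):
    """Concurrent tasks one executor can run: limited by the scarcest
    resource. (Illustrative only, not Spark source.)"""
    slots = executor_cores // task_cpus
    if task_gpus > 0:
        # Each task also claims task_gpus GPU addresses.
        slots = min(slots, executor_gpus // task_gpus)
    return slots

# With the task GPU amount defaulted to 1, a CPU-only task set on a
# 4-core / 2-GPU executor still claims a GPU per task, so only 2 of
# the 4 cores are used:
print(task_slots(4, 2, 1, 1))  # 2
# If CPU-only tasks could request 0 GPUs, all 4 cores would run tasks,
# leaving the 2 GPUs free for a later GPU-requesting task set:
print(task_slots(4, 2, 1, 0))  # 4
```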