Hello,

On Fri, May 07, 2021 at 06:30:56PM -0400, Alex Deucher wrote:
> Maybe we are speaking past each other. I'm not following. We got
> here because a device specific cgroup didn't make sense. With my
> Linux user hat on, that makes sense. I don't want to write code to a
> bunch of device specific interfaces if I can avoid it. But as for
> temporal vs spatial partitioning of the GPU, the argument seems to be
> a sort of hand-wavy one that both spatial and temporal partitioning
> make sense on CPUs, but only temporal partitioning makes sense on
> GPUs. I'm trying to understand that assertion. There are some GPUs

Spatial partitioning as implemented in cpuset isn't a desirable model.
It's there partly because it has historically been there. It doesn't
really require dynamic hierarchical distribution of anything and is
more of a way to batch-update per-task configuration, which is how
it's actually implemented. It's broken too in that it interferes with
per-task affinity settings. So, not exactly a good example to follow.
In addition, this sort of partitioning requires more hardware
knowledge, and GPUs are worse than CPUs in that the hardware differs
more.

Features like this are trivial to implement from the userland side by
making per-process settings inheritable and restricting who can update
the settings.

> that can more easily be temporally partitioned and some that can be
> more easily spatially partitioned. It doesn't seem any different than
> CPUs.

Right, it doesn't really matter how the resource is distributed. What
matters is how granular and generic the distribution can be. If GPUs
can implement work-conserving proportional distribution, that's
something which is widely useful and inherently requires dynamic
scheduling from the kernel side. If it's about setting per-vendor
affinities, this is way too much cgroup interface for a feature which
can be easily implemented outside cgroup. Just do per-process (or
whatever handles GPUs use) settings and confine their configuration
from the cgroup side in whatever way.

While the specific theme changes a bit, we've basically been having
the same discussion, with the same conclusion, for however many months
now. Hopefully, the point is clear by now.

Thanks.

--
tejun