Re: Performance drop due to alloc_workqueue() misuse and recent change

2023-12-19 Thread Tejun Heo
Hello, again. On Mon, Dec 04, 2023 at 04:03:47PM +, Naohiro Aota wrote: ... > In summary, we misuse max_active, considering it is a global limit. And, > the recent commit introduced a huge performance drop in some cases. We > need to review alloc_workqueue() usage to check if its max_active

Re: [Intel-gfx] Performance drop due to alloc_workqueue() misuse and recent change

2023-12-04 Thread Tejun Heo
Hello, On Mon, Dec 04, 2023 at 04:03:47PM +, Naohiro Aota wrote: > Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use > per-cpu pool_workqueues") changed WQ_UNBOUND workqueue's behavior. It > changed the meaning of alloc_workqueue()'s max_active from an upper limit >

Re: [Intel-gfx] [RFC v6 0/8] DRM scheduling cgroup controller

2023-11-12 Thread Tejun Heo
Hello, From cgroup POV, it generally looks fine to me. As before, I'm really curious whether this is something other non-intel drivers can get behind. Just one nit. On Tue, Oct 24, 2023 at 05:07:19PM +0100, Tvrtko Ursulin wrote: > * Allowing per DRM card configuration and queries is

Re: [Intel-gfx] WQ_UNBOUND warning since recent workqueue refactoring

2023-08-30 Thread Tejun Heo
Hello, (cc'ing i915 folks) On Wed, Aug 30, 2023 at 04:57:42PM +0200, Heiner Kallweit wrote: > Recently I started to see the following warning on linux-next and presumably > this may be related to the refactoring of the workqueue core code. > > [ 56.900223] workqueue: output_poll_execute

Re: [Intel-gfx] [PATCH 16/17] cgroup/drm: Expose memory stats

2023-07-26 Thread Tejun Heo
Hello, On Wed, Jul 26, 2023 at 05:44:28PM +0100, Tvrtko Ursulin wrote: ... > > So, yeah, if you want to add memory controls, we better think through how > > the fd ownership migration should work. > > It would be quite easy to make the implicit migration fail - just the matter > of failing the

Re: [Intel-gfx] [PATCH 16/17] cgroup/drm: Expose memory stats

2023-07-26 Thread Tejun Heo
Hello, On Wed, Jul 26, 2023 at 12:14:24PM +0200, Maarten Lankhorst wrote: > > So, yeah, if you want to add memory controls, we better think through how > > the fd ownership migration should work. > > I've taken a look at the series, since I have been working on cgroup memory > eviction. > > The

Re: [Intel-gfx] [PATCH 15/17] cgroup/drm: Expose GPU utilisation

2023-07-25 Thread Tejun Heo
Hello, On Tue, Jul 25, 2023 at 03:08:40PM +0100, Tvrtko Ursulin wrote: > > Also, shouldn't this be keyed by the drm device? > > It could have that too, or it could come later. Fun with GPUs that it not > only could be keyed by the device, but also by the type of the GPU engine. > (Which are a)

Re: [Intel-gfx] [PATCH 16/17] cgroup/drm: Expose memory stats

2023-07-21 Thread Tejun Heo
On Wed, Jul 12, 2023 at 12:46:04PM +0100, Tvrtko Ursulin wrote: > $ cat drm.memory.stat > card0 region=system total=12898304 shared=0 active=0 resident=12111872 > purgeable=167936 > card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0 > > Data is generated on demand

Re: [Intel-gfx] [PATCH 15/17] cgroup/drm: Expose GPU utilisation

2023-07-21 Thread Tejun Heo
On Fri, Jul 21, 2023 at 12:19:32PM -1000, Tejun Heo wrote: > On Wed, Jul 12, 2023 at 12:46:03PM +0100, Tvrtko Ursulin wrote: > > + drm.active_us > > + GPU time used by the group recursively including all child groups. > > Maybe instead add drm.stat and have "usage_us

Re: [Intel-gfx] [PATCH 15/17] cgroup/drm: Expose GPU utilisation

2023-07-21 Thread Tejun Heo
On Wed, Jul 12, 2023 at 12:46:03PM +0100, Tvrtko Ursulin wrote: > + drm.active_us > + GPU time used by the group recursively including all child groups. Maybe instead add drm.stat and have "usage_usec" inside? That'd be more consistent with cpu side. Thanks. -- tejun

Re: [Intel-gfx] [PATCH 12/17] cgroup/drm: Introduce weight based drm cgroup control

2023-07-21 Thread Tejun Heo
On Wed, Jul 12, 2023 at 12:46:00PM +0100, Tvrtko Ursulin wrote: > +DRM scheduling soft limits > +~~ Please don't say soft limits for this. It means something different for memcg, so it gets really confusing. Call it "weight based CPU time control" and maybe call the

Re: [Intel-gfx] [PATCH 08/17] drm/cgroup: Track DRM clients per cgroup

2023-07-21 Thread Tejun Heo
Hello, On Wed, Jul 12, 2023 at 12:45:56PM +0100, Tvrtko Ursulin wrote: > +void drmcgroup_client_migrate(struct drm_file *file_priv) > +{ > + struct drm_cgroup_state *src, *dst; > + struct cgroup_subsys_state *old; > + > + mutex_lock(_mutex); > + > + old = file_priv->__css; > +

Re: [Intel-gfx] [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-10 Thread Tejun Heo
Hello, On Wed, May 10, 2023 at 04:59:01PM +0200, Maarten Lankhorst wrote: > The misc controller is not granular enough. A single computer may have any > number of > graphics cards, some of them with multiple regions of vram inside a single > card. Extending the misc controller to support

Re: [Intel-gfx] [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-05 Thread Tejun Heo
Hello, On Wed, May 03, 2023 at 10:34:56AM +0200, Maarten Lankhorst wrote: > RFC as I'm looking for comments. > > For long running compute, it can be beneficial to partition the GPU memory > between cgroups, so each cgroup can use its maximum amount of memory without > interfering with other

Re: [Intel-gfx] [RFC v4 00/10] DRM scheduling cgroup controller

2023-03-24 Thread Tejun Heo
Hello, Tvrtko. On Tue, Mar 14, 2023 at 02:18:54PM +, Tvrtko Ursulin wrote: > DRM scheduling soft limits > ~~ > > Because of the heterogenous hardware and driver DRM capabilities, soft limits > are implemented as a loose co-operative (bi-directional) interface between

Re: [Intel-gfx] [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control

2023-02-02 Thread Tejun Heo
Hello, On Thu, Feb 02, 2023 at 02:26:06PM +, Tvrtko Ursulin wrote: > When you say active/inactive - to what you are referring in the cgroup > world? Offline/online? For those my understanding was offline was a > temporary state while css is getting destroyed. Oh, it's just based on activity.

Re: [Intel-gfx] [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control

2023-01-27 Thread Tejun Heo
On Thu, Jan 12, 2023 at 04:56:07PM +, Tvrtko Ursulin wrote: ... > + /* > + * 1st pass - reset working values and update hierarchical weights and > + * GPU utilisation. > + */ > + if (!__start_scanning(root, period_us)) > + goto out_retry; /* > +

Re: [Intel-gfx] [RFC v3 00/12] DRM scheduling cgroup controller

2023-01-26 Thread Tejun Heo
Hello, On Thu, Jan 26, 2023 at 02:00:50PM +0100, Michal Koutný wrote: > On Wed, Jan 25, 2023 at 06:11:35PM +, Tvrtko Ursulin > wrote: > > I don't immediately see how you envisage the half-userspace implementation > > would look like in terms of what functionality/new APIs would be provided

Re: [Intel-gfx] [RFC 11/13] cgroup/drm: Introduce weight based drm cgroup control

2022-11-28 Thread Tejun Heo
Hello, On Thu, Nov 24, 2022 at 02:32:25PM +, Tvrtko Ursulin wrote: > > Soft limits is a bit of misnomer and can be confused with best-effort limits > > such as memory.high. Prolly best to not use the term. > > Are you suggesting "best effort limits" or "best effort "? It > would sounds good

Re: [Intel-gfx] [RFC 11/13] cgroup/drm: Introduce weight based drm cgroup control

2022-11-22 Thread Tejun Heo
On Wed, Nov 09, 2022 at 04:11:39PM +, Tvrtko Ursulin wrote: > +DRM scheduling soft limits > +~~ > + > +Because of the heterogenous hardware and driver DRM capabilities, soft limits > +are implemented as a loose co-operative (bi-directional) interface between > the >

Re: [Intel-gfx] [RFC 00/17] DRM scheduling cgroup controller

2022-10-31 Thread Tejun Heo
Hello, On Thu, Oct 27, 2022 at 03:32:00PM +0100, Tvrtko Ursulin wrote: > Looking at what's available in cgroups world now, I have spotted the > blkio.prio.class control. For my current use case (lower GPU scheduling of > background/unfocused windows) that would also work. Even if starting with >

Re: [Intel-gfx] [RFC 00/17] DRM scheduling cgroup controller

2022-10-19 Thread Tejun Heo
Hello, On Wed, Oct 19, 2022 at 06:32:37PM +0100, Tvrtko Ursulin wrote: ... > DRM static priority interface files > ~~~ > > drm.priority_levels > One of: >1) And integer representing the minimum number of discrete priority > levels for the

Re: [Intel-gfx] [BUG] lockdep splat with kernfs lockdep annotations and slab mutex from drm patch??

2019-06-14 Thread Tejun Heo
Hello, On Fri, Jun 14, 2019 at 04:08:33PM +0100, Chris Wilson wrote: > #ifdef CONFIG_MEMCG > if (slab_state >= FULL && err >= 0 && is_root_cache(s)) { > struct kmem_cache *c; > > mutex_lock(_mutex); > > so it happens to hit the error + FULL case with the

Re: [Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-09 Thread Tejun Heo
Hello, On Tue, May 07, 2019 at 12:50:50PM -0700, Welty, Brian wrote: > There might still be merit in having a 'device mem' cgroup controller. > The resource model at least is then no longer mixed up with host memory. > RDMA community seemed to have some interest in a common controller at > least

Re: [Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-06 Thread Tejun Heo
Hello, On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote: > The patch series enables device drivers to use cgroups to control the > following resources within a GPU (or other accelerator device): > * control allocation of device memory (reuse of memcg) > and with future work, we could

Re: [Intel-gfx] [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-20 Thread Tejun Heo
Hello, On Tue, Nov 20, 2018 at 10:21:14PM +, Ho, Kenny wrote: > By this reply, are you suggesting that vendor specific resources > will never be acceptable to be managed under cgroup? Let say a user I wouldn't say never but whatever which gets included as a cgroup controller should have

Re: [Intel-gfx] [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-20 Thread Tejun Heo
Hello, On Tue, Nov 20, 2018 at 01:58:11PM -0500, Kenny Ho wrote: > Since many parts of the DRM subsystem has vendor-specific > implementations, we introduce mechanisms for vendor to register their > specific resources and control files to the DRM cgroup subsystem. A > vendor will register itself

Re: [Intel-gfx] [PATCH 02/12] blk: use for_each_if

2018-07-11 Thread Tejun Heo
On Wed, Jul 11, 2018 at 01:31:51PM -0600, Jens Axboe wrote: > I don't think there's a git easy way of sending it out outside of > just ensuring that everybody is CC'ed on everything. I don't mind > that at all. I don't subscribe to lkml, and the patches weren't > sent to linux-block. Hence all I

Re: [Intel-gfx] [PATCH 03/12] cgroup: use for_each_if

2018-07-11 Thread Tejun Heo
On Mon, Jul 09, 2018 at 10:36:41AM +0200, Daniel Vetter wrote: > Avoids the need to invert the condition instead of the open-coded > version. > > Signed-off-by: Daniel Vetter > Cc: Tejun Heo > Cc: Li Zefan > Cc: Johannes Weiner > Cc: cgro...@vger.kernel.org Acked-by:

Re: [Intel-gfx] [PATCH 02/12] blk: use for_each_if

2018-07-11 Thread Tejun Heo
On Wed, Jul 11, 2018 at 09:40:58AM -0700, Tejun Heo wrote: > On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote: > > Makes the macros resilient against if {} else {} blocks right > > afterwards. > > > > Signed-off-by: Daniel Vetter > > Cc: Teju

Re: [Intel-gfx] [PATCH 02/12] blk: use for_each_if

2018-07-11 Thread Tejun Heo
On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote: > Makes the macros resilient against if {} else {} blocks right > afterwards. > > Signed-off-by: Daniel Vetter > Cc: Tejun Heo > Cc: Jens Axboe > Cc: Shaohua Li > Cc: Kate Stewart > Cc: Greg Kroah-Har

Re: [Intel-gfx] [PATCH v3 1/6] cgroup: Allow registration and lookup of cgroup private data

2018-03-13 Thread Tejun Heo
Hello, On Tue, Mar 13, 2018 at 02:47:45PM -0700, Alexei Starovoitov wrote: > it has to be zero lookups. If idr lookup is involved, it's cleaner > to add idr as new bpf map type and use cgroup ino as an id. Oh, idr (or rather ida) is just to allocate the key, once the key is there it pretty much

Re: [Intel-gfx] [PATCH v3 1/6] cgroup: Allow registration and lookup of cgroup private data

2018-03-13 Thread Tejun Heo
Hello, Matt. cc'ing Roman and Alexei. On Tue, Mar 06, 2018 at 03:46:55PM -0800, Matt Roper wrote: > There are cases where other parts of the kernel may wish to store data > associated with individual cgroups without building a full cgroup > controller. Let's add interfaces to allow them to

Re: [Intel-gfx] [PATCH v3 3/6] cgroup: Introduce cgroup_permission()

2018-03-13 Thread Tejun Heo
On Tue, Mar 06, 2018 at 03:46:57PM -0800, Matt Roper wrote: > Non-controller kernel subsystems may base access restrictions for > cgroup-related syscalls/ioctls on a process' access to the cgroup. > Let's make it easy for other parts of the kernel to check these cgroup > permissions. I'm not sure

Re: [Intel-gfx] [PATCH v3 2/6] cgroup: Introduce task_get_dfl_cgroup()

2018-03-13 Thread Tejun Heo
(cc'ing Roman) Hello, On Tue, Mar 06, 2018 at 03:46:56PM -0800, Matt Roper wrote: > +static inline struct cgroup * > +task_get_dfl_cgroup(struct task_struct *task) > +{ > + struct cgroup *cgrp; > + > + mutex_lock(_mutex); > + cgrp = task_dfl_cgroup(task); > + cgroup_get(cgrp); >

Re: [Intel-gfx] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

2018-02-12 Thread Tejun Heo
context of the worker. > > Cc: Tejun Heo <t...@kernel.org> > Cc: Lai Jiangshan <jiangshan...@gmail.com> > Cc: Dave Airlie <airl...@redhat.com> > Cc: Ben Skeggs <bske...@redhat.com> > Cc: Alex Deucher <alexander.deuc...@amd.com> > Signed

Re: [Intel-gfx] [PATCH RFC v2 3/7] cgroup: Add interface to allow drivers to lookup process cgroup membership

2018-02-07 Thread Tejun Heo
Hello, On Thu, Feb 01, 2018 at 11:53:11AM -0800, Matt Roper wrote: > +/** > + * cgroup_for_driver_process - return the cgroup for a process > + * @pid: process to lookup cgroup for > + * > + * Returns the cgroup from the v2 hierarchy that a process belongs to. > + * This function is intended to

Re: [Intel-gfx] [PATCH RFC v2 1/7] cgroup: Allow drivers to store data associated with a cgroup

2018-02-07 Thread Tejun Heo
Hello, On Thu, Feb 01, 2018 at 11:53:09AM -0800, Matt Roper wrote: > * Drivers may be built as modules (and unloaded/reloaded) which is not >something cgroup controllers support today. As discussed in the other subthread, this shouldn't be a concern. > * Drivers may wish to provide their

Re: [Intel-gfx] [IGT PATCH RFC] tools: Introduce intel_cgroup tool

2018-02-07 Thread Tejun Heo
Hello, Forgot to respond to one point. On Thu, Feb 01, 2018 at 03:14:38PM -0800, Matt Roper wrote: > * The drivers that want to make use of this functionality may be built >as modules rather than compiled directly into the kernel. This is >important because the cgroups subsystem

Re: [Intel-gfx] [IGT PATCH RFC] tools: Introduce intel_cgroup tool

2018-02-07 Thread Tejun Heo
Hello, Matt, Chris. On Thu, Feb 01, 2018 at 03:14:38PM -0800, Matt Roper wrote: > > Hmm. Could we not avoid drm_ioctl + well known param names and use a > > more generic tool to set cgroup attributes? Just feels wrong that a > > such a generic interface boils down to a driver specific ioctl. So,

Re: [Intel-gfx] [PATCH RFC 6/9] drm: Add cgroup helper library

2018-01-22 Thread Tejun Heo
Hello, Matt. On Fri, Jan 19, 2018 at 05:51:38PM -0800, Matt Roper wrote: > Most DRM drivers will want to handle the CGROUP_SETPARAM ioctl by looking up a > driver-specific per-cgroup data structure (or allocating a new one) and > storing > the supplied parameter value into the data structure

Re: [Intel-gfx] [PATCH V3 04/29] ata: deprecate pci_get_bus_and_slot()

2017-11-27 Thread Tejun Heo
et_bus_and_slot() function in favor of > pci_get_domain_bus_and_slot(). > > Use pci_get_domain_bus_and_slot() and extract the actual domain number > from the pdev passed in. > > Signed-off-by: Sinan Kaya <ok...@codeaurora.org> Acked-by: Tejun Heo <t...@kernel.org> Please f

Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-05 Thread Tejun Heo
Hello, On Tue, Sep 05, 2017 at 02:43:14PM +0100, Chris Wilson wrote: > > Can't you use cancel[_delayed]_work_sync()? > > We then need a loop like: > > do { > if (cancel_delayed_work_sync(wrk)) > do_work(wrk); > else >

Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-05 Thread Tejun Heo
On Mon, Sep 04, 2017 at 10:35:49AM +0200, Daniel Vetter wrote: > On Fri, Sep 01, 2017 at 03:11:23PM +0100, Chris Wilson wrote: > > If a worker requeues itself, it may switch to a different kworker pool, > > which flush_work() considers as complete. To be strict, we then need to > > keep flushing

Re: [Intel-gfx] [PATCH] lib/ida: Document locking requirements a bit better v2

2016-10-27 Thread Tejun Heo
gt; v2: Improve the kerneldoc per Tejun's review. > > Cc: Mel Gorman <mgor...@techsingularity.net> > Cc: Michal Hocko <mho...@suse.com> > Cc: Vlastimil Babka <vba...@suse.cz> > Cc: Tejun Heo <t...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org>

Re: [Intel-gfx] [PATCH] lib/ida: Document locking requirements a bit better

2016-10-26 Thread Tejun Heo
Hello, Daniel. On Wed, Oct 26, 2016 at 09:25:25PM +0200, Daniel Vetter wrote: > > > + * Note that callers must ensure that concurrent access to @ida is not > > > possible. > > > + * When simplicity trumps concurrency needs look at ida_simple_get() > > > instead. > > > > Maybe we can make it a

Re: [Intel-gfx] [PATCH] lib/ida: Document locking requirements a bit better

2016-10-26 Thread Tejun Heo
Hello, Daniel. On Wed, Oct 26, 2016 at 04:27:39PM +0200, Daniel Vetter wrote: > I wanted to wrap a bunch of ida_simple_get calls into their own > locking, until I dug around and read the original commit message. > Stuff like this should imo be added to the kernel doc, let's do that. Generally

Re: [Intel-gfx] [PATCH] kernfs: Move faulting copy_user operations outside of the mutex

2016-03-31 Thread Tejun Heo
..@linux.intel.com> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350 > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk> > Reviewed-by: Joonas Lahtinen <joonas.lahti...@linux.intel.com> > Cc: Ville Syrjälä <ville.syrj...@linux.intel.com> > Cc: Joonas Laht

Re: [Intel-gfx] [PATCH 0/1] async: export current_is_async()

2015-11-19 Thread Tejun Heo
Hello, Lukas. On Thu, Nov 19, 2015 at 04:31:11PM +0100, Lukas Wunner wrote: > Hi Tejun, > > when you introduced current_is_async() with 84b233adcca3, was it a > deliberate decision not to export it? All other non-static functions > in async.c are exported as well. > > I'm asking because I would

[Intel-gfx] [PATCH driver-core-linus] kernfs: add back missing error check in kernfs_fop_mmap()

2014-04-20 Thread Tejun Heo
won't reach the point if the mmap callback isn't implemented, but I mistakenly removed the error return check together with it. This led to Xorg crash on i810 which was reported and bisected to the commit and then to the specific change by Tobias. Signed-off-by: Tejun Heo t...@kernel.org Reported

Re: [Intel-gfx] 3.14 issue with i810 graphic card bisected

2014-04-18 Thread Tejun Heo
Hello, Sorry about the long delay. On Thu, Apr 03, 2014 at 08:37:49AM +0200, Tobias Powalowski wrote: Hi, I bisected a X startup crash due to new 3.14 kernel: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/kernfs/file.c?id=9b2db6e1894577d48f4e290381bac6e573593838

Re: [Intel-gfx] kernfs oops with i915+i2c_core in 3.14 merge window

2014-02-04 Thread Tejun Heo
On Thu, Jan 30, 2014 at 02:03:18PM -0500, Josh Boyer wrote: Hi All, I'm seeing the oops below on my MacBookPro 10,2 machine using i915 graphics. It's after the DRM merge for 3.14 ( v3.13-10094-g9b0cd30) , but we seem to have one report[1] of this happening well before that, in