[PATCH -next] video: fbdev: intelfb: Remove set but not used variable 'val'

2021-05-27 Thread Baokun Li
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/video/fbdev/intelfb/intelfb_i2c.c: In function 'intelfb_gpio_setscl':
drivers/video/fbdev/intelfb/intelfb_i2c.c:58:6: warning:
 variable ‘val’ set but not used [-Wunused-but-set-variable]
drivers/video/fbdev/intelfb/intelfb_i2c.c: In function 'intelfb_gpio_setsda':
drivers/video/fbdev/intelfb/intelfb_i2c.c:69:6: warning:
 variable ‘val’ set but not used [-Wunused-but-set-variable]

It has never been used since its introduction; the INREG() read itself is
kept, presumably so the preceding register write is still flushed.

Signed-off-by: Baokun Li 
---
 drivers/video/fbdev/intelfb/intelfb_i2c.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/video/fbdev/intelfb/intelfb_i2c.c 
b/drivers/video/fbdev/intelfb/intelfb_i2c.c
index 3300bd31d9d7..4df2f1f8a18e 100644
--- a/drivers/video/fbdev/intelfb/intelfb_i2c.c
+++ b/drivers/video/fbdev/intelfb/intelfb_i2c.c
@@ -55,22 +55,20 @@ static void intelfb_gpio_setscl(void *data, int state)
 {
struct intelfb_i2c_chan *chan = data;
struct intelfb_info *dinfo = chan->dinfo;
-   u32 val;
 
OUTREG(chan->reg, (state ? SCL_VAL_OUT : 0) |
   SCL_DIR | SCL_DIR_MASK | SCL_VAL_MASK);
-   val = INREG(chan->reg);
+   INREG(chan->reg);
 }
 
 static void intelfb_gpio_setsda(void *data, int state)
 {
struct intelfb_i2c_chan *chan = data;
struct intelfb_info *dinfo = chan->dinfo;
-   u32 val;
 
OUTREG(chan->reg, (state ? SDA_VAL_OUT : 0) |
   SDA_DIR | SDA_DIR_MASK | SDA_VAL_MASK);
-   val = INREG(chan->reg);
+   INREG(chan->reg);
 }
 
 static int intelfb_gpio_getscl(void *data)
-- 
2.25.4



Re: [Intel-gfx] [PATCH 15/18] drm/i915/guc: Ensure H2G buffer updates visible before tail update

2021-05-27 Thread Michal Wajdeczko



On 28.05.2021 03:13, John Harrison wrote:
> On 5/26/2021 10:58, Matthew Brost wrote:
>> On Wed, May 26, 2021 at 02:36:18PM +0200, Michal Wajdeczko wrote:
>>> On 26.05.2021 08:42, Matthew Brost wrote:
 Ensure H2G buffer updates are visible before descriptor tail updates by
 inserting a barrier between the H2G buffer update and the tail. The
 barrier is a simple wmb() for SMEM and a register write for LMEM. This is
 needed if more than 1 H2G can be in flight at once.

 Signed-off-by: Matthew Brost 
 Cc: Michal Wajdeczko 
 ---
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++
   1 file changed, 18 insertions(+)

 diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
 b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
 index fb875d257536..42063e1c355d 100644
 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
 +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
 @@ -328,6 +328,18 @@ static u32 ct_get_next_fence(struct
 intel_guc_ct *ct)
   return ++ct->requests.last_fence;
   }
 +static void write_barrier(struct intel_guc_ct *ct)
 +{
 +    struct intel_guc *guc = ct_to_guc(ct);
 +    struct intel_gt *gt = guc_to_gt(guc);
 +
 +    if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
 +    GEM_BUG_ON(guc->send_regs.fw_domains);
 +    intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
>>> hmm, as this is one of the GuC scratch registers used for H2G MMIO
>>> communication, writing 0 there might be interpreted by the GuC as new
>>> request with action=0 and might result in extra processing/logging on
>>> GuC side, and, since from here we don't protect access to this register
>>> by send_mutex, we can corrupt other MMIO message being prepared from
>>> different thread, ... can't we use other register ?
>>>
>> Hmm, this code has been internal for a long time and we haven't seen any
>> issues. MMIOs are always attempted to be processed each interrupt and
>> then CTBs are processed next. A value of 0 in scratch0 results in no MMIOs
>> being processed as a value of 0 is a reserved action which translates to
>> a NOP.
>>
>> Also in the current i915 once CTBs are enabled MMIOs are never used.
>> That being said, I think once we transition to the new interface +
>> enable suspend on a VF MMIOs might be used.
>>
>> With that I propose that we merge this as is with a comment saying if we
>> ever mix CTBs and MMIOs we need to find another MMIO register. I don't think
>> changing this now is worth delaying upstreaming this and also any change
>> we make now will make us lose confidence in code that has been
>> thoroughly tested.
>>
>> Matt
> This was discussed in chat while inspecting the GuC firmware code.
> Writing zero to the scratch does indeed not trigger any extra processing
> of spurious MMIO H2Gs. The register is indeed always checked when the
> host triggers a CTB H2G, but zero counts as invalid and thus will be
> skipped.
> 
> So with a comment about not mixing CTB and MMIOs, I think we are good
> for now. It seems unlikely that MMIOs & CTB would be mixed. MMIOs are
> only used for initialisation operations and should not be necessary once
> the CTBs are up and running. If mixing does occur in the future, it
> sounds like something that should be addressed at the GuC architecture
> level!

well, unlikely is not the same as not possible...

especially since on the MMIO path we are protecting access to this register,
so maybe, to try to capture any unexpected scenarios, we should at least
add something like:

GEM_WARN_ON(mutex_is_locked(&guc->send_mutex))

and since you already check for send_regs.fw_domains, shouldn't the actual
register offset be taken from send_regs.base?

alternatively, since I doubt that we have to use this specific send
register, we could define an i915-level function for the purpose of
triggering a write barrier (or maybe we already have one?) that uses a
register that does not conflict with GuC MMIO communication...

note, in case you can't find any other safe register to write, maybe a
better option is SOFT_SCRATCH (0xc180), which is still available on Gen11
but no longer used by the GuC for MMIO communication; on Gen9 we don't
have lmem, so there is no conflict at all, which we can check with:

GEM_BUG_ON(send_regs.base == SOFT_SCRATCH)

and then we should be safe for sure, not just "unlikely"
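
Roughly, combining the above (untested sketch, just to illustrate the idea;
SOFT_SCRATCH(0) here being the legacy 0xc180 scratch):

static void write_barrier(struct intel_guc_ct *ct)
{
	struct intel_guc *guc = ct_to_guc(ct);
	struct intel_gt *gt = guc_to_gt(guc);

	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
		GEM_BUG_ON(guc->send_regs.fw_domains);
		/* must not collide with the registers used for MMIO H2G */
		GEM_BUG_ON(guc->send_regs.base ==
			   i915_mmio_reg_offset(SOFT_SCRATCH(0)));
		intel_uncore_write_fw(gt->uncore, SOFT_SCRATCH(0), 0);
	} else {
		wmb();
	}
}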

Michal

> 
> With the comment added:
> Reviewed-by: John Harrison 
> 
> 
>>  
 +    } else {
 +    wmb();
 +    }
 +}
 +
   /**
    * DOC: CTB Host to GuC request
    *
 @@ -411,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct,
   }
   GEM_BUG_ON(tail > size);
 +    /*
 +     * make sure H2G buffer update and LRC tail update (if this is
 +     * triggering a submission) are visible before updating the
 +     * descriptor tail
 +     */
 +    write_barrier(ct);
 +
   /* now update de

Re: [PATCH 0/4] drm/panfrost: Plumb cycle counters to userspace

2021-05-27 Thread Tomeu Vizoso

Hi Alyssa,

Will this be enough to implement GL_TIMESTAMP and GL_TIME_ELAPSED queries?

Guess the DDK implements these as WRITE_VALUE jobs, and there's also a 
soft job BASE_JD_REQ_SOFT_DUMP_CPU_GPU_TIME that I guess is used for 
glGet*(GL_TIMESTAMP). Other DRM drivers use an ioctl for that instead.


Regards,

Tomeu

On 5/27/21 10:38 PM, alyssa.rosenzw...@collabora.com wrote:

From: Alyssa Rosenzweig 

Mali has hardware cycle counters (and GPU timestamps) available for
profiling. These are exposed in various ways:

- Kernel: As CYCLE_COUNT and TIMESTAMP registers
- Job chain: As WRITE_VALUE descriptors
- Shader (Midgard): As LD_SPECIAL selectors
- Shader (Bifrost): As the LD_GCLK.u64 instruction

These form building blocks for profiling features, for example the
ARB_shader_clock extension which accesses the counters from an
application's shader.

The counters consume power, so it is recommended to disable the counters
when not in use. To do so, we follow the strategy from mali_kbase: add a
counter requirement to the job, start the counters only when required,
and stop them as quickly as possible; a userspace usage sketch follows below.
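
From userspace, requesting the counters is then just a matter of OR-ing the
new flag into the submit requirements (sketch; the flag name is from patch 4,
the rest is the existing panfrost UAPI, and the job/BO variables are
placeholders):

	struct drm_panfrost_submit submit = {
		.jc = job_descriptor_gpu_va,
		.bo_handles = (__u64)(uintptr_t)bo_handles,
		.bo_handle_count = num_bos,
		/* keep the cycle counters running for this job */
		.requirements = PANFROST_JD_REQ_PERMON,
	};

	ioctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);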

The new UABI will be used in Mesa. An implementation of ARB_shader_clock
using this UABI is available as a pending upstream merge request [1].
The implementation passes the relevant piglit test, validating both the
kernel and mesa.

The main outstanding question is the proper name. Performance monitoring
("PERMON") is the name used by kbase, but it's jargon-y and risks
confusion with performance counters, an orthogonal mechanism. Cycle
count is more descriptive and matches the actual hardware name, but
obscures that the same mechanism is required for GPU timestamps. This
bit of bikeshedding aside, I'm pleased with the patches.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/11051

Alyssa Rosenzweig (4):
   drm/panfrost: Add cycle counter job requirement
   drm/panfrost: Add CYCLE_COUNT_START/STOP commands
   drm/panfrost: Add permon acquire/release helpers
   drm/panfrost: Handle PANFROST_JD_REQ_PERMON

  drivers/gpu/drm/panfrost/panfrost_device.h |  3 +++
  drivers/gpu/drm/panfrost/panfrost_drv.c| 10 +++---
  drivers/gpu/drm/panfrost/panfrost_gpu.c| 20 
  drivers/gpu/drm/panfrost/panfrost_gpu.h|  3 +++
  drivers/gpu/drm/panfrost/panfrost_job.c|  6 ++
  drivers/gpu/drm/panfrost/panfrost_regs.h   |  2 ++
  include/uapi/drm/panfrost_drm.h|  3 ++-
  7 files changed, 43 insertions(+), 4 deletions(-)



Re: [PATCH 0/4] Fix the i2c/clk bug of stm32 mcu platform

2021-05-27 Thread Dillon Min
Hi Patrice, Alain,

Could you help to take a look at this patchset, thanks.

This series is rebased onto the newest kernel commit:
88b06399c9c766c283e070b022b5ceafa4f63f19

according to the request from:
https://lore.kernel.org/lkml/ff2bc09d-1a17-50d4-d3ee-16fd3a86d...@foss.st.com/

The clk bug affects kernel bootup on the stm32f469-disco board in case
the display config (CONFIG_DRM_STM, CONFIG_DRM_STM_DSI,
DRM_PANEL_ORISETECH_OTM8009A) is enabled.

If you want to test the clk patch on the stm32f429-disco board,
panel-ilitek-ili9341.c can be used for that purpose (CONFIG_DRM_STM,
DRM_PANEL_ILITEK_ILI9341).

The i2c driver patch is intended to fix a timeout issue where the touch
panel driver gets data through the i2c bus.

Best regards.
Dillon

On Fri, May 14, 2021 at 7:02 PM  wrote:
>
> From: Dillon Min 
>
> This series fixes three i2c/clk bugs for stm32 f4/f7:
> - kernel running in sdram: i2c driver gets a data timeout
> - ltdc clk turned off after the kernel console becomes active
> - kernel hang when setting the ltdc clock rate
>
> clk bug found on stm32f429/f469-disco board
>
> Hi Patrice:
> below is the guide to verify the patch:
>
> setup test env with following files(link at below 'files link'):
> [1] u-boot-dtb.bin
> [2] rootfs zip file (used in kernel initramfs)
> [3] u-boot's mkimage to create itb file
> [4] kernel config file
> [5] my itb with-or-without i2c patch
>
> This patch based on kernel commit:
> 88b06399c9c766c283e070b022b5ceafa4f63f19
>
> Note:
> panel-ilitek-ili9341.c is the driver which was submitted last year but did
> not get accepted. It's used to set up touch screen calibration, then test i2c.
>
> create the itb file (please correct the path of 'data'):
> ./mkimage -f stm32.its stm32.itb
>
> HW setup:
> console:
>PA9, PA10
>usart0
>serial@40011000
>115200 8n1
>
> -- flash u-boot.bin to stm32f429-disco on PC
> $ sudo openocd -f board/stm32f429discovery.cfg -c \
>   '{PATH-TO-YOUR-UBOOT}/u-boot-dtb.bin 0x0800 exit reset'
>
> -- setup kernel load bootargs at u-boot
> U-Boot > setenv bootargs 'console=tty0 console=ttySTM0,115200
> root=/dev/ram rdinit=/linuxrc loglevel=8 fbcon=rotate:2'
> U-Boot > loady;bootm
> (download stm32.dtb or your kernel with itb format, or download zImage, dtb)
>
> -- setup ts_calibrate running env on stm32f429-disco
> / # export TSLIB_CONFFILE=/etc/ts.conf
> / # export TSLIB_TSDEVICE=/dev/input/event0
> / # export TSLIB_CONSOLEDEVICE=none
> / # export TSLIB_FBDEVICE=/dev/fb0
>
> -- clear screen
> / # ./fb
>
> -- run ts_calibrate
> / # ts_calibrate
> (you can calibrate touchscreen now, and get below errors)
>
> [  113.942087] stmpe-i2c 0-0041: failed to read regs 0x52: -110
> [  114.063598] stmpe-i2c 0-0041: failed to read reg 0x4b: -16
> [  114.185629] stmpe-i2c 0-0041: failed to read reg 0x40: -16
> [  114.307257] stmpe-i2c 0-0041: failed to write reg 0xb: -16
>
> ...
> with i2c patch applied, you will find below logs:
>
> RAW-> 3164 908 183 118.110884
> TS_READ_RAW> x = 3164, y =908, pressure = 183
> RAW-> 3166 922 126 118.138946
> TS_READ_RAW> x = 3166, y = 922, pressure = 126
> 
>
> files link:
> https://drive.google.com/drive/folders/1qNbjChcB6UGtKzne2F5x9_WG_sZFyo3o?usp=sharing
>
>
>
>
> Dillon Min (4):
>   drm/panel: Add ilitek ili9341 panel driver
>   i2c: stm32f4: Fix stmpe811 get xyz data timeout issue
>   clk: stm32: Fix stm32f429's ltdc driver hang in set clock rate
>   clk: stm32: Fix ltdc's clock turn off by clk_disable_unused() after
> kernel startup
>
>  drivers/clk/clk-stm32f4.c|   10 +-
>  drivers/gpu/drm/panel/Kconfig|   12 +
>  drivers/gpu/drm/panel/Makefile   |1 +
>  drivers/gpu/drm/panel/panel-ilitek-ili9341.c | 1285 
> ++
>  drivers/i2c/busses/i2c-stm32f4.c |   12 +-
>  5 files changed, 1310 insertions(+), 10 deletions(-)
>  create mode 100755 drivers/gpu/drm/panel/panel-ilitek-ili9341.c
>
> --
> 2.7.4
>


Re: [PATCH] drm: Fix for GEM buffers with write-combine memory

2021-05-27 Thread Tomi Valkeinen

On 28/05/2021 02:03, Paul Cercueil wrote:

The previous commit wrongly assumed that dma_mmap_wc() could be replaced
by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup,
but did not work everywhere.

Use dma_mmap_wc() when the buffer has the write-combine cache attribute,
and dma_mmap_pages() when it has the non-coherent cache attribute.

Signed-off-by: Paul Cercueil 
Reported-by: Tomi Valkeinen 
Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by non-coherent 
memory")
---
  drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++--
  1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index 235c7a63da2b..4c3772651954 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 
 	cma_obj = to_drm_gem_cma_obj(obj);
 
-	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
-	if (!cma_obj->map_noncoherent)
-		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+	if (cma_obj->map_noncoherent) {
+		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+		ret = dma_mmap_pages(cma_obj->base.dev->dev,
+				     vma, vma->vm_end - vma->vm_start,
+				     virt_to_page(cma_obj->vaddr));
+	} else {
+		ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
+				  cma_obj->paddr, vma->vm_end - vma->vm_start);
-	ret = dma_mmap_pages(cma_obj->base.dev->dev,
-			     vma, vma->vm_end - vma->vm_start,
-			     virt_to_page(cma_obj->vaddr));
+	}
 	if (ret)
 		drm_gem_vm_close(vma);
 



Reviewed-by: Tomi Valkeinen 

and

Tested-by: Tomi Valkeinen 

Thanks!

Btw, the kernel-doc for drm_gem_cma_create doesn't quite match, as it 
says wc is always used.


 Tomi


[PATCH 1/1] drm/i915/selftests: Fix error return code in live_parallel_switch()

2021-05-27 Thread Zhen Lei
The error code returned from intel_context_create() should be propagated
instead of 0, as done elsewhere in this function.

Fixes: 50d16d44cce4 ("drm/i915/selftests: Exercise context switching in 
parallel")
Reported-by: Hulk Robot 
Signed-off-by: Zhen Lei 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 5fef592390cb..7db9e31da385 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -338,8 +338,10 @@ static int live_parallel_switch(void *arg)
continue;
 
ce = intel_context_create(data[m].ce[0]->engine);
-   if (IS_ERR(ce))
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
goto out;
+   }
 
err = intel_context_pin(ce);
if (err) {
-- 
2.25.1




Re: [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-27 Thread Alistair Popple
On Thursday, 27 May 2021 11:04:57 PM AEST Peter Xu wrote:
> On Thu, May 27, 2021 at 01:35:39PM +1000, Alistair Popple wrote:
> > > > + *
> > > > + * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device
> > > > will
> > > > no + * longer have exclusive access to the page. May ignore the
> > > > invalidation that's + * part of make_device_exclusive_range() if the
> > > > owner field
> > > > + * matches the value passed to make_device_exclusive_range().
> > > 
> > > Perhaps s/matches/does not match/?
> > 
> > No, "matches" is correct. The MMU_NOTIFY_EXCLUSIVE notifier is to notify a
> > listener that a range is being invalidated for the purpose of making the
> > range available for some device to have exclusive access to. Which does
> > also mean a device getting the notification no longer has exclusive
> > access if it already did.
> > 
> > A unique type is needed because when creating the range a driver needs to
> > form a mmu critical section (with mmu_interval_read_begin()/
> > mmu_interval_read_end()) to ensure the entry remains valid long enough to
> > program the device pte and hasn't been invalidated.
> > 
> > However, without a way of filtering, any invalidation will result in a
> > retry, but make_device_exclusive_range() needs to do an invalidation
> > during installation of the entry. To avoid this causing infinite retries
> > the driver ignores specific invalidation events that it knows don't
> > apply, ie. the invalidations that are a result of that driver asking for
> > device exclusive entries.
> 
> OK I think I get it now.. so the driver checks both EXCLUSIVE and owner, if
> all match it skips the notify, otherwise it's treated like all the rest. 
> Thanks.
> 
> However it's still confusing (as I raised in a previous comment too)
> that we use CLEAR when re-installing the valid pte.  It goes against
> what CLEAR means.

Oh, thanks. I understand where you are coming from now - the pte is already
invalid so it ordinarily wouldn't need clearing.

> How about sending EXCLUSIVE for both mark/restore?  Just that when restore
> we notify with owner==NULL telling that no one is owning it anymore so
> driver needs to drop the ownership.  I assume your driver patch does not
> need change too.  Would that be much cleaner than CLEAR?  I bet it also
> makes commenting the new notify easier.
> 
> What do you think?

That seems like a good idea and avoids adding another type. And as you say the
driver patch shouldn't need changing either (will need to confirm though).
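
For reference, the driver-side filtering discussed above boils down to an
early-out in the interval notifier callback, something like this (sketch
only; driver_owner_token stands in for whatever the driver passed to
make_device_exclusive_range()):

static bool driver_invalidate(struct mmu_interval_notifier *mni,
			      const struct mmu_notifier_range *range,
			      unsigned long cur_seq)
{
	/* ignore the invalidation we caused ourselves when making the
	 * range device-exclusive, otherwise we'd retry forever */
	if (range->event == MMU_NOTIFY_EXCLUSIVE &&
	    range->owner == driver_owner_token)
		return true;

	mmu_interval_set_seq(mni, cur_seq);
	/* ... invalidate the device ptes for this range ... */
	return true;
}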
 
> [...]
> 
> > > > +	vma->vm_mm, address, min(vma->vm_end,
> > > > +	address + page_size(page)), args->owner);
> > > > +	mmu_notifier_invalidate_range_start(&range);
> > > > +
> > > > + while (page_vma_mapped_walk(&pvmw)) {
> > > > + /* Unexpected PMD-mapped THP? */
> > > > + VM_BUG_ON_PAGE(!pvmw.pte, page);
> > > > +
> > > > + if (!pte_present(*pvmw.pte)) {
> > > > + ret = false;
> > > > + page_vma_mapped_walk_done(&pvmw);
> > > > + break;
> > > > + }
> > > > +
> > > > + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> > > 
> > > I see that all pages passed in should be done after FOLL_SPLIT_PMD, so
> > > is
> > > this needed?  Or say, should subpage==page always be true?
> > 
> > Not always, in the case of a thp there are small ptes which will get
> > device exclusive entries.
> 
> FOLL_SPLIT_PMD will first split the huge thp into smaller pages, then do
> follow_page_pte() on them (in follow_pmd_mask):
> 
> if (flags & FOLL_SPLIT_PMD) {
> int ret;
> page = pmd_page(*pmd);
> if (is_huge_zero_page(page)) {
> spin_unlock(ptl);
> ret = 0;
> split_huge_pmd(vma, pmd, address);
> if (pmd_trans_unstable(pmd))
> ret = -EBUSY;
> } else {
> spin_unlock(ptl);
> split_huge_pmd(vma, pmd, address);
> ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> }
> 
> return ret ? ERR_PTR(ret) :
> follow_page_pte(vma, address, pmd, flags,
> &ctx->pgmap); }
> 
> So I thought all pages are small pages?

The page will remain a transparent huge page though (at least as I
understand things). FOLL_SPLIT_PMD turns it into a pte-mapped thp by
splitting the pmd and creating ptes mapping the subpages, but doesn't split
the page itself. For comparison, FOLL_SPLIT (which has been removed in v5.13
due to lack of use) is what would be used to split the page in the above GUP
code, by calling split_huge_page() rather than split_huge_pmd().
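
Concretely, for a 2MB thp made up of 4K base pages, the subpage arithmetic
quoted above just indexes into the compound page (illustrative):

	/* 'page' is the thp head page, covering pfns [P, P + 511];
	 * a pte in the walk maps pfn P + k, so: */
	subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
	/* == page - P + (P + k) == page + k, the k-th 4K subpage */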

This was done to avoid adding code for handling device exclusive entries at 
the pmd level as

Re: [Intel-gfx] [PATCH 15/18] drm/i915/guc: Ensure H2G buffer updates visible before tail update

2021-05-27 Thread John Harrison

On 5/26/2021 10:58, Matthew Brost wrote:

On Wed, May 26, 2021 at 02:36:18PM +0200, Michal Wajdeczko wrote:

On 26.05.2021 08:42, Matthew Brost wrote:

Ensure H2G buffer updates are visible before descriptor tail updates by
inserting a barrier between the H2G buffer update and the tail. The
barrier is a simple wmb() for SMEM and a register write for LMEM. This is
needed if more than 1 H2G can be in flight at once.

Signed-off-by: Matthew Brost 
Cc: Michal Wajdeczko 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index fb875d257536..42063e1c355d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -328,6 +328,18 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
return ++ct->requests.last_fence;
  }
  
+static void write_barrier(struct intel_guc_ct *ct)
+{
+   struct intel_guc *guc = ct_to_guc(ct);
+   struct intel_gt *gt = guc_to_gt(guc);
+
+   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+   GEM_BUG_ON(guc->send_regs.fw_domains);
+   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);

hmm, as this is one of the GuC scratch registers used for H2G MMIO
communication, writing 0 there might be interpreted by the GuC as new
request with action=0 and might result in extra processing/logging on
GuC side, and, since from here we don't protect access to this register
by send_mutex, we can corrupt other MMIO message being prepared from
different thread, ... can't we use other register ?


Hmm, this code has been internal for a long time and we haven't seen any
issues. MMIOs are always attempted to be processed each interrupt and
then CTBs are processed next. A value of 0 in scratch0 results in no MMIOs
being processed as a value of 0 is a reserved action which translates to
a NOP.

Also in the current i915 once CTBs are enabled MMIOs are never used.
That being said, I think once we transition to the new interface +
enable suspend on a VF MMIOs might be used.

With that I propose that we merge this as is with a comment saying if we
ever mix CTBs and MMIOs we need to find another MMIO register. I don't think
changing this now is worth delaying upstreaming this and also any change
we make now will make us lose confidence in code that has been
thoroughly tested.

Matt
This was discussed in chat while inspecting the GuC firmware code. 
Writing zero to the scratch does indeed not trigger any extra processing 
of spurious MMIO H2Gs. The register is indeed always checked when the 
host triggers a CTB H2G, but zero counts as invalid and thus will be 
skipped.


So with a comment about not mixing CTB and MMIOs, I think we are good 
for now. It seems unlikely that MMIOs & CTB would be mixed. MMIOs are 
only used for initialisation operations and should not be necessary once 
the CTBs are up and running. If mixing does occur in the future, it 
sounds like something that should be addressed at the GuC architecture 
level!


With the comment added:
Reviewed-by: John Harrison 


  

+   } else {
+   wmb();
+   }
+}
+
  /**
   * DOC: CTB Host to GuC request
   *
@@ -411,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct,
}
GEM_BUG_ON(tail > size);
  
+	/*
+	 * make sure H2G buffer update and LRC tail update (if this is
+	 * triggering a submission) are visible before updating the
+	 * descriptor tail
+	 */
+   write_barrier(ct);
+
/* now update desc tail (back in bytes) */
desc->tail = tail * 4;
return 0;


___
Intel-gfx mailing list
intel-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[PATCH v4 1/1] drm/i915/dg1: Add HWMON power sensor support

2021-05-27 Thread Dale B Stimson
As part of the System Management Interface (SMI), use the HWMON
subsystem to display power utilization.

The following standard HWMON power sensors are currently supported
(and appropriately scaled):
  /sys/class/drm/card0/device/hwmon/hwmon<i>
- energy1_input
- power1_cap
- power1_max

Some non-standard HWMON power information is also provided, such as
enable bits and intervals.

Signed-off-by: Dale B Stimson 
---
 .../ABI/testing/sysfs-driver-intel-i915-hwmon | 116 +++
 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/i915_drv.c   |   6 +
 drivers/gpu/drm/i915/i915_drv.h   |   3 +
 drivers/gpu/drm/i915/i915_hwmon.c | 757 ++
 drivers/gpu/drm/i915/i915_hwmon.h |  42 +
 drivers/gpu/drm/i915/i915_reg.h   |  52 ++
 8 files changed, 978 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
 create mode 100644 drivers/gpu/drm/i915/i915_hwmon.c
 create mode 100644 drivers/gpu/drm/i915/i915_hwmon.h

diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon 
b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
new file mode 100644
index 0..2ee7c413ca190
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
@@ -0,0 +1,116 @@
+What:   /sys/devices/.../hwmon/hwmon<i>/energy1_input
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+RO. Energy input of device in microjoules.
+
+   The returned textual representation is an unsigned integer
+   number that can be stored in 64-bits.  Warning: The hardware
+   register is 32-bits wide and can overflow by wrapping around.
+   A single wrap-around between calls to read this value can
+   be detected and will be accounted for in the returned value.
+   At a power consumption of 1 watt, the 32-bit hardware register
+   would wrap-around approximately every 3 days.
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power1_max_enable
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+RW.  Sustained power limit is enabled - true or false.
+
+The power controller will throttle the operating frequency
+if the power averaged over a window (typically seconds)
+exceeds this limit.
+
+See power1_max_enable power1_max power1_max_interval
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power1_max
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+RW.  Sustained power limit in milliwatts
+
+The power controller will throttle the operating frequency
+if the power averaged over a window (typically seconds)
+exceeds this limit.
+
+See power1_max_enable power1_max power1_max_interval
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+RW. Sustained power limit interval in milliseconds over
+which sustained power is averaged.
+
+See power1_max_enable power1_max power1_max_interval
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power1_cap_enable
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+   RW.  Power burst limit is enabled - true or false
+
+See power1_cap_enable power1_cap
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power1_cap
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+   RW.  Power burst limit in milliwatts.
+
+See power1_cap_enable power1_cap
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power_default_limit
+Date:   June 2021
+KernelVersion:  5.14
+Contact:dri-devel@lists.freedesktop.org
+Description:
+RO.  Default power limit.
+
+   Only supported for particular Intel i915 graphics platforms.
+
+What:   /sys/devices/.../hwmon/hwmon<i>/power_min_limit
+Date:   June 2021
+KernelVersion:  5.14
+Con

[PATCH v4 0/1] drm/i915/dg1: Add HWMON power sensor support

2021-05-27 Thread Dale B Stimson
drm/i915/dg1: Add HWMON power support

As part of the System Management Interface (SMI), use the HWMON
subsystem to display power utilization.

The following standard HWMON entries are currently supported
(and appropriately scaled):
/sys/class/drm/card0/device/hwmon/hwmon<i>
- energy1_input
- power1_cap
- power1_max

Some non-standard HWMON power information is also provided, such as
enable bits and intervals.
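
For example, a minimal userspace read of the energy counter looks like this
(sketch; the card and hwmon indices depend on the system):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long uj;
		FILE *f = fopen("/sys/class/drm/card0/device/hwmon/hwmon1/energy1_input", "r");

		if (!f)
			return 1;
		if (fscanf(f, "%llu", &uj) != 1) {
			fclose(f);
			return 1;
		}
		fclose(f);
		printf("energy: %llu microjoules\n", uj);
		return 0;
	}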

-

v4  Commit message minor rewording

v4  Move call to i915_hwmon_register() to a more appropriate location,
so that it is done after intel_gt_driver_register().
The call to i915_perf_unregister() is moved correspondingly.

v4  The proper register to read energy status is PCU_PACKAGE_ENERGY_STATUS.

v4  Attribute power1_max_enable is read-only.

v3  Added documentation of these hwmon attributes in file
Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon

v3  Commit message minor rewording

v3  Function name changes:
i915_hwmon_init() -> i915_hwmon_register()
i915_hwmon_fini() -> i915_hwmon_unregister()

v3  i915_hwmon_register and i915_hwmon_unregister now take arg i915.

v3  i915_hwmon_register() now returns void instead of int.

v3  Macro FIELD_SHIFT() added to compute shift value from constant
field mask.

v3  Certain functions no longer require "inline" due to addition of new
parameter field_shift, allowing access to constant expressions for
the field mask at each call site.  These functions now do field
access via shift and masking and no longer use le32*() functions
(as le32*() required a local constant expression for the mask).
  _field_read_and_scale()
  _field_read64_and_scale()
  _field_scale_and_write()

v3  Some comments were modified.

v3  Now using sysfs_emit() instead of scnprintf().

V2  Rename local function parameter field_mask to field_msk in order to avoid
shadowing the name of function field_mask() from include/linux/bitfield.h.

V2  Change a comment introduction from "/**" to "/*", as it is not intended
to match a pattern that triggers documentation.
Reported-by: kernel test robot 

V2  Slight movement of calls:
- i915_hwmon_init slightly later, after call to i915_setup_sysfs()
- i915_hwmon_fini slightly earlier, before i915_teardown_sysfs()

V2  Fixed some strong typing issues with le32 functions.
Detected by sparse in a run by kernel test robot:
Reported-by: kernel test robot 

Dale B Stimson (1):
  drm/i915/dg1: Add HWMON power sensor support

 .../ABI/testing/sysfs-driver-intel-i915-hwmon | 116 +++
 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/i915_drv.c   |   6 +
 drivers/gpu/drm/i915/i915_drv.h   |   3 +
 drivers/gpu/drm/i915/i915_hwmon.c | 757 ++
 drivers/gpu/drm/i915/i915_hwmon.h |  42 +
 drivers/gpu/drm/i915/i915_reg.h   |  52 ++
 8 files changed, 978 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
 create mode 100644 drivers/gpu/drm/i915/i915_hwmon.c
 create mode 100644 drivers/gpu/drm/i915/i915_hwmon.h

Range-diff against v3:
1:  ed34d683a0ef1 ! 1:  bc8bd78b2c006 drm/i915/dg1: Add HWMON power support
@@ Metadata
 Author: Dale B Stimson 
 
  ## Commit message ##
-drm/i915/dg1: Add HWMON power support
+drm/i915/dg1: Add HWMON power sensor support
 
 As part of the System Management Interface (SMI), use the HWMON
 subsystem to display power utilization.
 
-The following standard HWMON entries are currently supported
+The following standard HWMON power sensors are currently supported
 (and appropriately scaled):
    /sys/class/drm/card0/device/hwmon/hwmon<i>
 - energy1_input
@@ drivers/gpu/drm/i915/i915_drv.c
  #include "i915_irq.h"
  #include "i915_memcpy.h"
 @@ drivers/gpu/drm/i915/i915_drv.c: static void 
i915_driver_register(struct drm_i915_private *dev_priv)
-   i915_debugfs_register(dev_priv);
-   i915_setup_sysfs(dev_priv);
+ 
+   intel_gt_driver_register(&dev_priv->gt);
  
 +  i915_hwmon_register(dev_priv);
 +
-   /* Depends on sysfs having been initialized */
-   i915_perf_register(dev_priv);
+   intel_display_driver_register(dev_priv);
  
+   intel_power_domains_enable(dev_priv);
 @@ drivers/gpu/drm/i915/i915_drv.c: static void 
i915_driver_unregister(struct drm_i915_private *dev_priv)
+ 
+   intel_display_driver_unregister(dev_priv);
+ 
++  i915_hwmon_unregister(dev_priv);
++
intel_gt_driver_unregister(&dev_priv->gt);
  
i915_perf_unregister(dev_priv);
-+
-+  i915_hwmon_unregister(dev_priv);
 +
i915_pmu_unregister(dev_priv);
  
@@ drivers/gpu/drm/i915/i915_hwmon.c (new)
 +
 +  with_intel_runtime_pm(unc

Re: [PATCH 11/11] drm/tiny: drm_gem_simple_display_pipe_prepare_fb is the default

2021-05-27 Thread Linus Walleij
On Fri, May 21, 2021 at 11:10 AM Daniel Vetter  wrote:

> Goes through all the drivers and deletes the default hook since it's
> the default now.
>
> Signed-off-by: Daniel Vetter 
> Cc: Joel Stanley 
> Cc: Andrew Jeffery 
> Cc: "Noralf Trønnes" 
> Cc: Linus Walleij 
> Cc: Emma Anholt 
> Cc: David Lechner 
> Cc: Kamlesh Gurudasani 
> Cc: Oleksandr Andrushchenko 
> Cc: Daniel Vetter 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Cc: Sam Ravnborg 
> Cc: Alex Deucher 
> Cc: Andy Shevchenko 
> Cc: linux-asp...@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: xen-de...@lists.xenproject.org

Acked-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH v8 04/11] dt-bindings: drm/aux-bus: Add an example

2021-05-27 Thread Linus Walleij
On Tue, May 25, 2021 at 2:02 AM Douglas Anderson  wrote:

> Now that we have an eDP controller that lists aux-bus, we can safely
> add an example to the aux-bus bindings.
>
> NOTE: this example is just a copy of the one in the 'ti-sn65dsi86'
> one. It feels useful to have the example in both places simply because
> it's important to document the interaction between the two bindings in
> both places.
>
> Signed-off-by: Douglas Anderson 

Looks good.
Reviewed-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH v8 03/11] dt-bindings: drm/bridge: ti-sn65dsi86: Add aux-bus child

2021-05-27 Thread Linus Walleij
On Tue, May 25, 2021 at 2:02 AM Douglas Anderson  wrote:

> The patch ("dt-bindings: drm: Introduce the DP AUX bus") talks about
> how using the DP AUX bus is better than learning how to slice
> bread. Let's add it to the ti-sn65dsi86 bindings.
>
> Signed-off-by: Douglas Anderson 
(...)
>  description: See ../../pwm/pwm.yaml for description of the cell formats.>

Just use the full path:
/schemas/pwm/pwm.yaml

> +  aux-bus:
> +$ref: ../dp-aux-bus.yaml#

Use the full path. (Same method as above)

This removes the need for ../../... 

You do it here:

>ports:
>  $ref: /schemas/graph.yaml#/properties/ports

Other than that I think it looks all right!

Yours,
Linus Walleij


Re: [PATCH 2/2] drm/vc4: hdmi: Convert to gpiod

2021-05-27 Thread Linus Walleij
On Mon, May 24, 2021 at 3:19 PM Maxime Ripard  wrote:

> The new gpiod interface takes care of parsing the GPIO flags and to
> return the logical value when accessing an active-low GPIO, so switching
> to it simplifies a lot the driver.
>
> Signed-off-by: Maxime Ripard 

Thanks for fixing this!
Reviewed-by: Linus Walleij 

Yours,
Linus Walleij


Re: [RFC PATCH 03/13] drm/msm/disp/dpu1: Add support for DSC

2021-05-27 Thread Dmitry Baryshkov

On 21/05/2021 15:49, Vinod Koul wrote:

Display Stream Compression (DSC) is one of the hw blocks in dpu, so add
support by adding hw blocks for DSC

Signed-off-by: Vinod Koul 
---
  drivers/gpu/drm/msm/Makefile  |   1 +
  .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h|  26 +++
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c| 221 ++
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.h|  79 +++
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h   |  13 ++
  5 files changed, 340 insertions(+)
  create mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c
  create mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.h

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 610d630326bb..fd8fc57f1f58 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -61,6 +61,7 @@ msm-y := \
disp/dpu1/dpu_hw_blk.o \
disp/dpu1/dpu_hw_catalog.o \
disp/dpu1/dpu_hw_ctl.o \
+   disp/dpu1/dpu_hw_dsc.o \
disp/dpu1/dpu_hw_interrupts.o \
disp/dpu1/dpu_hw_intf.o \
disp/dpu1/dpu_hw_lm.o \
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index 4dfd8a20ad5c..a699633f7013 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -165,6 +165,7 @@ enum {
   * @DPU_PINGPONG_TE2    Additional tear check block for split pipes
   * @DPU_PINGPONG_SPLIT  PP block supports split fifo
   * @DPU_PINGPONG_SLAVE  PP block is a suitable slave for split fifo
+ * @DPU_PINGPONG_DSC    Display stream compression blocks


PP block supports DSC compression?

Also you don't seem to set it anywhere. Do we have hardware w/o DSC support?


   * @DPU_PINGPONG_DITHER, Dither blocks
   * @DPU_PINGPONG_MAX
   */
@@ -173,10 +174,21 @@ enum {
DPU_PINGPONG_TE2,
DPU_PINGPONG_SPLIT,
DPU_PINGPONG_SLAVE,
+   DPU_PINGPONG_DSC,
DPU_PINGPONG_DITHER,
DPU_PINGPONG_MAX
  };
  
+/**

+ * DSC sub-blocks
+ * @DPU_DSC        DSC sub block
+ * @DPU_DSC_MAX
+ */
+enum {
+   DPU_DSC = 0x1,
+   DPU_DSC_MAX
+};
+


Unused


  /**
   * CTL sub-blocks
   * @DPU_CTL_SPLIT_DISPLAY   CTL supports video mode split display
@@ -413,6 +425,7 @@ struct dpu_dspp_sub_blks {
  struct dpu_pingpong_sub_blks {
struct dpu_pp_blk te;
struct dpu_pp_blk te2;
+   struct dpu_pp_blk dsc;
struct dpu_pp_blk dither;
  };


Unused


  
@@ -547,6 +560,16 @@ struct dpu_merge_3d_cfg  {

const struct dpu_merge_3d_sub_blks *sblk;
  };
  
+/**

+ * struct dpu_dsc_cfg - information of DSC blocks
+ * @id enum identifying this block
+ * @base   register offset of this block
+ * @features   bit mask identifying sub-blocks/features
+ */
+struct dpu_dsc_cfg {
+   DPU_HW_BLK_INFO;
+};
+
  /**
   * struct dpu_intf_cfg - information of timing engine blocks
   * @id enum identifying this block
@@ -748,6 +771,9 @@ struct dpu_mdss_cfg {
u32 merge_3d_count;
const struct dpu_merge_3d_cfg *merge_3d;
  
+	u32 dsc_count;

+   struct dpu_dsc_cfg *dsc;
+
u32 intf_count;
const struct dpu_intf_cfg *intf;
  
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c

new file mode 100644
index ..8b8d0553709d
--- /dev/null
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2020, Linaro Limited
+ */
+
+#include "dpu_kms.h"
+#include "dpu_hw_catalog.h"
+#include "dpu_hwio.h"
+#include "dpu_hw_mdss.h"
+#include "dpu_hw_dsc.h"
+
+#define DSC_COMMON_MODE             0x000
+#define DSC_ENC                     0x004
+#define DSC_PICTURE                 0x008
+#define DSC_SLICE                   0x00C
+#define DSC_CHUNK_SIZE              0x010
+#define DSC_DELAY                   0x014
+#define DSC_SCALE_INITIAL           0x018
+#define DSC_SCALE_DEC_INTERVAL      0x01C
+#define DSC_SCALE_INC_INTERVAL      0x020
+#define DSC_FIRST_LINE_BPG_OFFSET   0x024
+#define DSC_BPG_OFFSET              0x028
+#define DSC_DSC_OFFSET              0x02C
+#define DSC_FLATNESS                0x030
+#define DSC_RC_MODEL_SIZE           0x034
+#define DSC_RC                      0x038
+#define DSC_RC_BUF_THRESH           0x03C
+#define DSC_RANGE_MIN_QP            0x074
+#define DSC_RANGE_MAX_QP            0x0B0
+#define DSC_RANGE_BPG_OFFSET        0x0EC
+
+static void dpu_hw_dsc_disable(struct dpu_hw_dsc *dsc)
+{
+   struct dpu_hw_blk_reg_map *c = &dsc->hw;
+
+   DPU_REG_WRITE(c, DSC_COMMON_MODE, 0);
+}
+
+static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc,
+ struct msm_display_dsc_config *dsc,
+ u32 mode, bool ich_reset_override)
+{
+

Re: [RFC PATCH 03/13] drm/msm/dsi: add support for dsc data

2021-05-27 Thread Dmitry Baryshkov

On 21/05/2021 15:49, Vinod Koul wrote:

DSC needs some configuration from device tree, add support to read and
store these params and add DSC structures in msm_drv

Signed-off-by: Vinod Koul 
---
  drivers/gpu/drm/msm/dsi/dsi_host.c | 170 +
  drivers/gpu/drm/msm/msm_drv.h  |  32 ++
  2 files changed, 202 insertions(+)




[skipped]



DRM_DEV_ERROR(dev, "%s: invalid lane configuration %d\n",
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 2668941df529..26661dd43936 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -30,6 +30,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -70,6 +71,16 @@ enum msm_mdp_plane_property {

  #define MSM_GPU_MAX_RINGS 4
  #define MAX_H_TILES_PER_DISPLAY 2
  
+/**

+ * enum msm_display_compression_type - compression method used for pixel stream
+ * @MSM_DISPLAY_COMPRESSION_NONE:  Pixel data is not compressed
+ * @MSM_DISPLAY_COMPRESSION_DSC:   DSC compression is used
+ */
+enum msm_display_compression_type {
+   MSM_DISPLAY_COMPRESSION_NONE,
+   MSM_DISPLAY_COMPRESSION_DSC,
+};
+


Seems to be unused


  /**
   * enum msm_display_caps - features/capabilities supported by displays
   * @MSM_DISPLAY_CAP_VID_MODE:   Video or "active" mode supported




--
With best wishes
Dmitry


Re: [Freedreno] [RFC PATCH 00/13] drm/msm: Add Display Stream Compression Support

2021-05-27 Thread Rob Clark
On Wed, May 26, 2021 at 8:00 AM Jeffrey Hugo  wrote:
>
> On Tue, May 25, 2021 at 11:46 PM Vinod Koul  wrote:
> >
> > Hello Jeff,
> >
> > On 21-05-21, 08:09, Jeffrey Hugo wrote:
> > > On Fri, May 21, 2021 at 6:50 AM Vinod Koul  wrote:
> > > >
> > > > Display Stream Compression (DSC) compresses the display stream in host 
> > > > which
> > > > is later decoded by panel. This series enables this for Qualcomm msm 
> > > > driver.
> > > > This was tested on Google Pixel3 phone which use LGE SW43408 panel.
> > > >
> > > > The changes include adding DT properties for DSC then hardware blocks 
> > > > support
> > > > required in DPU1 driver and support in encoder. We also add support in 
> > > > DSI
> > > > and introduce required topology changes.
> > > >
> > > > In order for panel to set the DSC parameters we add dsc in drm_panel 
> > > > and set
> > > > it from the msm driver.
> > > >
> > > > Complete changes which enable this for Pixel3 along with panel driver 
> > > > (not
> > > > part of this series) and DT changes can be found at:
> > > > git.linaro.org/people/vinod.koul/kernel.git pixel/dsc_rfc
> > > >
> > > > Comments welcome!
> > >
> > > This feels backwards to me.  I've only skimmed this series, and the DT
> > > changes didn't come through for me, so perhaps I have an incomplete
> > > view.
> >
> > Not sure why, I see it on lore:
> > https://lore.kernel.org/dri-devel/20210521124946.3617862-3-vk...@kernel.org/
> >
> > > DSC is not MSM specific.  There is a standard for it.  Yet it looks
> > > like everything is implemented in a MSM specific way, and then pushed
> > > to the panel.  So, every vendor needs to implement their vendor
> > > specific way to get the DSC info, and then push it to the panel?
> > > Seems wrong, given there is an actual standard for this feature.
> >
> > I have added slice and bpp info in the DT here under the host and then
> > pass the generic struct drm_dsc_config to panel which allows panel to
> > write the pps cmd
> >
> > Nothing above is MSM specific.. It can very well work with non MSM
> > controllers too.
>
> I disagree.
>
> The DT bindings you defined (thanks for the direct link) are MSM
> specific.  I'm not talking (yet) about the properties you defined, but
> purely from the stand point that you defined the binding within the
> scope of the MSM dsi binding.  No other vendor can use those bindings.
> Of course, if we look at the properties themselves, they are prefixed
> with "qcom", which is vendor specific.
>
> So, purely on the face of it, this is MSM specific.
>
> Assuming we want a DT solution for DSC, I think it should be something
> like Documentation/devicetree/bindings/clock/clock-bindings.txt (the
> first example that comes to mind), which is a non-vendor specific
> generic set of properties that each vendor/device specific binding can
> inherit.  Panel has similar things.
>
> Specific to the properties, I don't much like that you duplicate BPP,
> which is already associated with the panel (although perhaps not in
> the scope of DT).  What if the panel and your DSC bindings disagree?
> Also, I guess I need to ask, have you read the DSC spec?  Last I
> looked, there were something like 3 dozen properties that could be
> configured.  You have five in your proposed binding.  To me, this is
> not a generic DSC solution, this is MSM specific (and frankly I don't
> think this supports all the configuration the MSM hardware can do,
> either).
>
> I'm surprised Rob Herring didn't have more to say on this.
>
> > I didn't envision DSC to be a specific thing, most of
> > the patches here are hardware enabling ones for DSC bits for MSM
> > hardware.
> >
> > > Additionally, we define panel properties (resolution, BPP, etc) at the
> > > panel, and have the display drivers pull it from the panel.  However,
> > > for DSC, you do the reverse (define it in the display driver, and push
> > > it to the panel).  If the argument is that DSC properties can be
> > > dynamic, well, so can resolution.  Every panel for MSM MTPs supports
> > > multiple resolutions, yet we define that with the panel in Linux.
> >
> > I don't have an answer for that right now; to start with, yes, the
> > properties are in the host, but I am okay to discuss this and put them
> > wherever we feel is most correct.  I somehow don't like that we should pull
> > from the panel DT and program the host with that. Here, using struct
> > drm_dsc_config allows me to configure the panel based on the resolution passed
>
> I somewhat agree that pulling from the panel and programming the host
> based on that is an odd solution, but we have it currently.  Have a
> look at Documentation/devicetree/bindings/display/panel in particular
> panel-timing.  All of that ends up informing the mdss programing
> anyways (particularly the dsi and its phy).  So my problem is that we
> currently have a solution that seems to just need to be extended, and
> instead you have proposed a completely different solution which is
> arguably contradictory.
>
> However, I'd l

[PATCH 11/11] drm/ingenic: Attach bridge chain to encoders

2021-05-27 Thread Paul Cercueil
Attach a top-level bridge to each encoder, which will be used for
negotiating the bus format and flags.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 98 ++-
 1 file changed, 77 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 01d8490393d1..f0242e917d6e 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -132,6 +133,26 @@ struct ingenic_drm {
struct drm_private_obj private_obj;
 };
 
+struct ingenic_drm_bridge {
+   struct drm_encoder encoder;
+   struct drm_bridge bridge;
+   struct drm_bridge *next_bridge;
+
+   /*
+* FIXME: this should really be in ingenic_drm_private_state, but there
+* doesn't seem to be a way to retrieve a pointer to it from within
+* ingenic_drm_encoder_atomic_mode_set (no drm_atomic_state
+* back-pointers).
+*/
+   struct drm_bus_cfg bus_cfg;
+};
+
+static inline struct ingenic_drm_bridge *
+to_ingenic_drm_bridge(struct drm_encoder *encoder)
+{
+   return container_of(encoder, struct ingenic_drm_bridge, encoder);
+}
+
 static inline struct ingenic_drm_private_state *
 to_ingenic_drm_priv_state(struct drm_private_state *state)
 {
@@ -749,11 +770,10 @@ static void ingenic_drm_encoder_atomic_mode_set(struct 
drm_encoder *encoder,
 {
struct ingenic_drm *priv = drm_device_get_priv(encoder->dev);
struct drm_display_mode *mode = &crtc_state->adjusted_mode;
-   struct drm_connector *conn = conn_state->connector;
-   struct drm_display_info *info = &conn->display_info;
+   struct ingenic_drm_bridge *bridge = to_ingenic_drm_bridge(encoder);
unsigned int cfg, rgbcfg = 0;
 
-   priv->panel_is_sharp = info->bus_flags & DRM_BUS_FLAG_SHARP_SIGNALS;
+   priv->panel_is_sharp = bridge->bus_cfg.flags & 
DRM_BUS_FLAG_SHARP_SIGNALS;
 
if (priv->panel_is_sharp) {
cfg = JZ_LCD_CFG_MODE_SPECIAL_TFT_1 | JZ_LCD_CFG_REV_POLARITY;
@@ -766,19 +786,19 @@ static void ingenic_drm_encoder_atomic_mode_set(struct 
drm_encoder *encoder,
cfg |= JZ_LCD_CFG_HSYNC_ACTIVE_LOW;
if (mode->flags & DRM_MODE_FLAG_NVSYNC)
cfg |= JZ_LCD_CFG_VSYNC_ACTIVE_LOW;
-   if (info->bus_flags & DRM_BUS_FLAG_DE_LOW)
+   if (bridge->bus_cfg.flags & DRM_BUS_FLAG_DE_LOW)
cfg |= JZ_LCD_CFG_DE_ACTIVE_LOW;
-   if (info->bus_flags & DRM_BUS_FLAG_PIXDATA_DRIVE_NEGEDGE)
+   if (bridge->bus_cfg.flags & DRM_BUS_FLAG_PIXDATA_DRIVE_NEGEDGE)
cfg |= JZ_LCD_CFG_PCLK_FALLING_EDGE;
 
if (!priv->panel_is_sharp) {
-   if (conn->connector_type == DRM_MODE_CONNECTOR_TV) {
+   if (conn_state->connector->connector_type == 
DRM_MODE_CONNECTOR_TV) {
if (mode->flags & DRM_MODE_FLAG_INTERLACE)
cfg |= JZ_LCD_CFG_MODE_TV_OUT_I;
else
cfg |= JZ_LCD_CFG_MODE_TV_OUT_P;
} else {
-   switch (*info->bus_formats) {
+   switch (bridge->bus_cfg.format) {
case MEDIA_BUS_FMT_RGB565_1X16:
cfg |= JZ_LCD_CFG_MODE_GENERIC_16BIT;
break;
@@ -804,20 +824,31 @@ static void ingenic_drm_encoder_atomic_mode_set(struct 
drm_encoder *encoder,
regmap_write(priv->map, JZ_REG_LCD_RGBC, rgbcfg);
 }
 
-static int ingenic_drm_encoder_atomic_check(struct drm_encoder *encoder,
-   struct drm_crtc_state *crtc_state,
-   struct drm_connector_state 
*conn_state)
+static int ingenic_drm_bridge_attach(struct drm_bridge *bridge,
+enum drm_bridge_attach_flags flags)
+{
+   struct drm_encoder *encoder = bridge->encoder;
+   struct ingenic_drm_bridge *ingenic_bridge = 
to_ingenic_drm_bridge(encoder);
+
+   return drm_bridge_attach(encoder, ingenic_bridge->next_bridge,
+&ingenic_bridge->bridge, flags);
+}
+
+static int ingenic_drm_bridge_atomic_check(struct drm_bridge *bridge,
+  struct drm_bridge_state 
*bridge_state,
+  struct drm_crtc_state *crtc_state,
+  struct drm_connector_state 
*conn_state)
 {
-   struct drm_display_info *info = &conn_state->connector->display_info;
struct drm_display_mode *mode = &crtc_state->adjusted_mode;
+   struct drm_encoder *encoder = bridge->encoder;
+   struct ingenic_drm_bridge *ingenic_bridge = 
to_ingenic_drm_bridge(encoder);
 
-   if (info->num_bus_formats != 1)
- 

[PATCH 10/11] drm/ingenic: Add doublescan feature

2021-05-27 Thread Paul Cercueil
A lot of devices with an Ingenic SoC have a weird LCD panel attached,
where the pixels are not square. For instance, the AUO A030JTN01 and
Innolux EJ030NA panels have a resolution of 320x480 with a 4:3 aspect
ratio.

All userspace applications are built with the assumption that the
pixels are square. To be able to support these devices without too
much effort, add a doublescan feature, which allows the f0 and f1
planes to be used with only half of the screen's vertical resolution,
where each line of the input is displayed twice.

This is done using a chained list of DMA descriptors, one descriptor
per output line.
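
Roughly, the idea looks like this — a simplified sketch with invented
names, not the exact driver code:

	/*
	 * One hardware descriptor per *output* line: each input line is
	 * linked into the chain twice, so it is scanned out two times.
	 */
	for (i = 0; i < src_height; i++) {
		for (j = 0; j < 2; j++) {
			unsigned int idx = i * 2 + j;

			hwdescs[idx].addr = fb_addr + i * stride;
			hwdescs[idx].cmd = stride / 4; /* one line of words */
			hwdescs[idx].next = hwdescs_phys
					  + (idx + 1) * sizeof(*hwdescs);
		}
	}

	/* Close the chain: the last descriptor points back to the first. */
	hwdescs[src_height * 2 - 1].next = hwdescs_phys;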

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 93 +--
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 2761478b16e8..01d8490393d1 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -66,6 +66,8 @@ struct jz_soc_info {
 
 struct ingenic_gem_object {
struct drm_gem_cma_object base;
+   struct ingenic_dma_hwdesc *hwdescs;
+   dma_addr_t hwdescs_phys;
 };
 
 struct ingenic_drm_private_state {
@@ -73,6 +75,23 @@ struct ingenic_drm_private_state {
 
bool no_vblank;
bool use_palette;
+
+   /*
+* A lot of devices with an Ingenic SoC have a weird LCD panel attached,
+* where the pixels are not square. For instance, the AUO A030JTN01 and
+* Innolux EJ030NA panels have a resolution of 320x480 with a 4:3 aspect
+* ratio.
+*
+* All userspace applications are built with the assumption that the
+* pixels are square. To be able to support these devices without too
+* much effort, add a doublescan feature, which allows the f0 and f1
+* planes to be used with only half of the screen's vertical resolution,
+* where each line of the input is displayed twice.
+*
+* This is done using a chained list of DMA descriptors, one descriptor
+* per output line.
+*/
+   bool doublescan;
 };
 
 struct ingenic_drm {
@@ -465,7 +484,7 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane 
*plane,
return PTR_ERR(priv_state);
 
ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state,
- DRM_PLANE_HELPER_NO_SCALING,
+ 0x8000,
  DRM_PLANE_HELPER_NO_SCALING,
  priv->soc_info->has_osd,
  true);
@@ -482,6 +501,17 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane 
*plane,
 (new_plane_state->src_h >> 16) != new_plane_state->crtc_h))
return -EINVAL;
 
+   /* Enable doublescan if the CRTC_H is twice the SRC_H. */
+   priv_state->doublescan = (new_plane_state->src_h >> 16) * 2 == 
new_plane_state->crtc_h;
+
+   /* Otherwise, fail if CRTC_H != SRC_H */
+   if (!priv_state->doublescan && (new_plane_state->src_h >> 16) != 
new_plane_state->crtc_h)
+   return -EINVAL;
+
+   /* Fail if CRTC_W != SRC_W */
+   if ((new_plane_state->src_w >> 16) != new_plane_state->crtc_w)
+   return -EINVAL;
+
priv_state->use_palette = new_plane_state->fb &&
new_plane_state->fb->format->format == DRM_FORMAT_C8;
 
@@ -647,7 +677,9 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
struct ingenic_drm_private_state *priv_state;
struct drm_crtc_state *crtc_state;
struct ingenic_dma_hwdesc *hwdesc;
-   unsigned int width, height, cpp;
+   unsigned int width, height, cpp, i;
+   struct drm_gem_object *gem_obj;
+   struct ingenic_gem_object *obj;
dma_addr_t addr, next_addr;
bool use_f1;
u32 fourcc;
@@ -664,17 +696,39 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
height = newstate->src_h >> 16;
cpp = newstate->fb->format->cpp[0];
 
+   gem_obj = drm_gem_fb_get_obj(newstate->fb, 0);
+   obj = to_ingenic_gem_obj(gem_obj);
+
priv_state = ingenic_drm_get_new_priv_state(priv, state);
if (priv_state && priv_state->use_palette)
next_addr = dma_hwdesc_pal_addr(priv);
else
next_addr = dma_hwdesc_addr(priv, use_f1);
 
-   hwdesc = &priv->dma_hwdescs->hwdesc[use_f1];
+   if (priv_state->doublescan) {
+   hwdesc = &obj->hwdescs[0];
+   /*
+* Use one DMA descriptor per output line, and display
+* each input line twice.
+*/
+

[PATCH 08/11] drm/ingenic: Support custom GEM object

2021-05-27 Thread Paul Cercueil
Add boilerplate code to support a custom "ingenic_gem_object". It is
empty for now, but it will be useful later when subsequent patches
introduce object-specific driver data.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index ced2109e8f35..1cac369f6293 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -64,6 +64,10 @@ struct jz_soc_info {
unsigned int num_formats_f0, num_formats_f1;
 };
 
+struct ingenic_gem_object {
+   struct drm_gem_cma_object base;
+};
+
 struct ingenic_drm_private_state {
struct drm_private_state base;
 
@@ -179,6 +183,11 @@ static inline struct ingenic_drm *drm_nb_get_priv(struct 
notifier_block *nb)
return container_of(nb, struct ingenic_drm, clock_nb);
 }
 
+static inline struct ingenic_gem_object *to_ingenic_gem_obj(struct 
drm_gem_object *gem_obj)
+{
+   return container_of(gem_obj, struct ingenic_gem_object, base.base);
+}
+
 static inline dma_addr_t dma_hwdesc_addr(const struct ingenic_drm *priv, bool 
use_f1)
 {
u32 offset = offsetof(struct ingenic_dma_hwdescs, hwdesc[use_f1]);
@@ -853,15 +862,15 @@ static struct drm_gem_object *
 ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
 {
struct ingenic_drm *priv = drm_device_get_priv(drm);
-   struct drm_gem_cma_object *obj;
+   struct ingenic_gem_object *obj;
 
obj = kzalloc(sizeof(*obj), GFP_KERNEL);
if (!obj)
return ERR_PTR(-ENOMEM);
 
-   obj->map_noncoherent = priv->soc_info->map_noncoherent;
+   obj->base.map_noncoherent = priv->soc_info->map_noncoherent;
 
-   return &obj->base;
+   return &obj->base.base;
 }
 
 static struct drm_private_state *
-- 
2.30.2



[PATCH 09/11] drm/ingenic: Add ingenic_drm_gem_fb_destroy() function

2021-05-27 Thread Paul Cercueil
Add an ingenic_drm_gem_fb_destroy() function, which currently only calls
drm_gem_fb_destroy(), but will be extended in a subsequent patch.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 1cac369f6293..2761478b16e8 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -846,16 +846,38 @@ static void ingenic_drm_disable_vblank(struct drm_crtc 
*crtc)
regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
 }
 
+static void ingenic_drm_gem_fb_destroy(struct drm_framebuffer *fb)
+{
+   drm_gem_fb_destroy(fb);
+}
+
+static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs = {
+   .destroy= ingenic_drm_gem_fb_destroy,
+   .create_handle  = drm_gem_fb_create_handle,
+};
+
+static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs_dirty = {
+   .destroy= ingenic_drm_gem_fb_destroy,
+   .create_handle  = drm_gem_fb_create_handle,
+   .dirty  = drm_atomic_helper_dirtyfb,
+};
+
 static struct drm_framebuffer *
 ingenic_drm_gem_fb_create(struct drm_device *drm, struct drm_file *file,
  const struct drm_mode_fb_cmd2 *mode_cmd)
 {
struct ingenic_drm *priv = drm_device_get_priv(drm);
+   const struct drm_framebuffer_funcs *fb_funcs;
+   struct drm_framebuffer *fb;
 
if (priv->soc_info->map_noncoherent)
-   return drm_gem_fb_create_with_dirty(drm, file, mode_cmd);
+   fb_funcs = &ingenic_drm_gem_fb_funcs_dirty;
+   else
+   fb_funcs = &ingenic_drm_gem_fb_funcs;
+
+   fb = drm_gem_fb_create_with_funcs(drm, file, mode_cmd, fb_funcs);
 
-   return drm_gem_fb_create(drm, file, mode_cmd);
+   return fb;
 }
 
 static struct drm_gem_object *
-- 
2.30.2



[PATCH 07/11] drm/ingenic: Upload palette before frame

2021-05-27 Thread Paul Cercueil
When using C8 color mode, make sure that the palette is always uploaded
before a frame; otherwise the very first frame will have wrong colors.

Do that by changing the link order of the DMA descriptors.
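
Concretely, when a C8 framebuffer is in use, the CRTC is now started on
the palette descriptor, so the hardware walks the chain as:

	hwdesc_pal -> hwdesc[0] -> hwdesc_pal -> hwdesc[0] -> ...

instead of starting at hwdesc[0] and only picking the palette up one
frame too late.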

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 45 ++-
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 5ba3283da97d..ced2109e8f35 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -68,6 +68,7 @@ struct ingenic_drm_private_state {
struct drm_private_state base;
 
bool no_vblank;
+   bool use_palette;
 };
 
 struct ingenic_drm {
@@ -185,6 +186,13 @@ static inline dma_addr_t dma_hwdesc_addr(const struct 
ingenic_drm *priv, bool us
return priv->dma_hwdescs_phys + offset;
 }
 
+static inline dma_addr_t dma_hwdesc_pal_addr(const struct ingenic_drm *priv)
+{
+   u32 offset = offsetof(struct ingenic_dma_hwdescs, hwdesc_pal);
+
+   return priv->dma_hwdescs_phys + offset;
+}
+
 static int ingenic_drm_update_pixclk(struct notifier_block *nb,
 unsigned long action,
 void *data)
@@ -207,11 +215,19 @@ static void ingenic_drm_crtc_atomic_enable(struct 
drm_crtc *crtc,
   struct drm_atomic_state *state)
 {
struct ingenic_drm *priv = drm_crtc_get_priv(crtc);
+   struct ingenic_drm_private_state *priv_state;
+
+   priv_state = ingenic_drm_get_new_priv_state(priv, state);
+   if (WARN_ON(!priv_state))
+   return;
 
regmap_write(priv->map, JZ_REG_LCD_STATE, 0);
 
/* Set address of our DMA descriptor chain */
-   regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, 0));
+   if (priv_state->use_palette)
+   regmap_write(priv->map, JZ_REG_LCD_DA0, 
dma_hwdesc_pal_addr(priv));
+   else
+   regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, 
0));
regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_addr(priv, 1));
 
regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
@@ -422,6 +438,7 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane 
*plane,
struct drm_plane_state *new_plane_state = 
drm_atomic_get_new_plane_state(state,

 plane);
struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
+   struct ingenic_drm_private_state *priv_state;
struct drm_crtc_state *crtc_state;
struct drm_crtc *crtc = new_plane_state->crtc ?: old_plane_state->crtc;
int ret;
@@ -434,6 +451,10 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane 
*plane,
if (WARN_ON(!crtc_state))
return -EINVAL;
 
+   priv_state = ingenic_drm_get_priv_state(priv, state);
+   if (IS_ERR(priv_state))
+   return PTR_ERR(priv_state);
+
ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state,
  DRM_PLANE_HELPER_NO_SCALING,
  DRM_PLANE_HELPER_NO_SCALING,
@@ -452,6 +473,9 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane 
*plane,
 (new_plane_state->src_h >> 16) != new_plane_state->crtc_h))
return -EINVAL;
 
+   priv_state->use_palette = new_plane_state->fb &&
+   new_plane_state->fb->format->format == DRM_FORMAT_C8;
+
/*
 * Require full modeset if enabling or disabling a plane, or changing
 * its position, size or depth.
@@ -611,10 +635,11 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
struct drm_plane_state *newstate = 
drm_atomic_get_new_plane_state(state, plane);
struct drm_plane_state *oldstate = 
drm_atomic_get_old_plane_state(state, plane);
+   struct ingenic_drm_private_state *priv_state;
struct drm_crtc_state *crtc_state;
struct ingenic_dma_hwdesc *hwdesc;
-   unsigned int width, height, cpp, offset;
-   dma_addr_t addr;
+   unsigned int width, height, cpp;
+   dma_addr_t addr, next_addr;
bool use_f1;
u32 fourcc;
 
@@ -630,23 +655,23 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
height = newstate->src_h >> 16;
cpp = newstate->fb->format->cpp[0];
 
+   priv_state = ingenic_drm_get_new_priv_state(priv, state);
+   if (priv_state && priv_state->use_palette)
+   next_addr = dma_hwdesc_pal_addr(priv);
+   else
+   next_addr = dma_hwdesc_addr(priv, use_f1);
+
hwdesc = &priv->dma_hwdescs->hwdesc[use_f1];
 
   

[PATCH 06/11] drm/ingenic: Set DMA descriptor chain register when starting CRTC

2021-05-27 Thread Paul Cercueil
Setting the DMA descriptor chain register in the probe function has been
fine until now, because we only ever had one descriptor per foreground.

As the driver will soon have real descriptor chains, and the DMA
descriptor chain register updates itself to point to the current
descriptor being processed, this register needs to be reset after a full
modeset to point to the first descriptor of the chain.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 639994329c60..5ba3283da97d 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -210,6 +210,10 @@ static void ingenic_drm_crtc_atomic_enable(struct drm_crtc 
*crtc,
 
regmap_write(priv->map, JZ_REG_LCD_STATE, 0);
 
+   /* Set address of our DMA descriptor chain */
+   regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, 0));
+   regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_addr(priv, 1));
+
regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
   JZ_LCD_CTRL_ENABLE | JZ_LCD_CTRL_DISABLE,
   JZ_LCD_CTRL_ENABLE);
@@ -1218,10 +1222,6 @@ static int ingenic_drm_bind(struct device *dev, bool 
has_components)
}
}
 
-   /* Set address of our DMA descriptor chain */
-   regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_phys_f0);
-   regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_phys_f1);
-
/* Enable OSD if available */
if (soc_info->has_osd)
regmap_write(priv->map, JZ_REG_LCD_OSDC, JZ_LCD_OSDC_OSDEN);
-- 
2.30.2



[PATCH 05/11] drm/ingenic: Move IPU scale settings to private state

2021-05-27 Thread Paul Cercueil
The IPU scaling information is computed in the plane's ".atomic_check"
callback, and used in the ".atomic_update" callback. As such, it is
state-specific, and should be moved to the private state structure.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-ipu.c | 73 ---
 1 file changed, 54 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c 
b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 007cd547b285..b85d9a7f53d3 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -47,6 +47,8 @@ struct soc_info {
 
 struct ingenic_ipu_private_state {
struct drm_private_state base;
+
+   unsigned int num_w, num_h, denom_w, denom_h;
 };
 
 struct ingenic_ipu {
@@ -58,8 +60,6 @@ struct ingenic_ipu {
const struct soc_info *soc_info;
bool clk_enabled;
 
-   unsigned int num_w, num_h, denom_w, denom_h;
-
dma_addr_t addr_y, addr_u, addr_v;
 
struct drm_property *sharpness_prop;
@@ -85,6 +85,30 @@ to_ingenic_ipu_priv_state(struct drm_private_state *state)
return container_of(state, struct ingenic_ipu_private_state, base);
 }
 
+static struct ingenic_ipu_private_state *
+ingenic_ipu_get_priv_state(struct ingenic_ipu *priv, struct drm_atomic_state 
*state)
+{
+   struct drm_private_state *priv_state;
+
+   priv_state = drm_atomic_get_private_obj_state(state, 
&priv->private_obj);
+   if (IS_ERR(priv_state))
+   return ERR_CAST(priv_state);
+
+   return to_ingenic_ipu_priv_state(priv_state);
+}
+
+static struct ingenic_ipu_private_state *
+ingenic_ipu_get_new_priv_state(struct ingenic_ipu *priv, struct 
drm_atomic_state *state)
+{
+   struct drm_private_state *priv_state;
+
+   priv_state = drm_atomic_get_new_private_obj_state(state, 
&priv->private_obj);
+   if (!priv_state)
+   return NULL;
+
+   return to_ingenic_ipu_priv_state(priv_state);
+}
+
 /*
  * Apply conventional cubic convolution kernel. Both parameters
  *  and return value are 15.16 signed fixed-point.
@@ -305,11 +329,16 @@ static void ingenic_ipu_plane_atomic_update(struct 
drm_plane *plane,
const struct drm_format_info *finfo;
u32 ctrl, stride = 0, coef_index = 0, format = 0;
bool needs_modeset, upscaling_w, upscaling_h;
+   struct ingenic_ipu_private_state *ipu_state;
int err;
 
if (!newstate || !newstate->fb)
return;
 
+   ipu_state = ingenic_ipu_get_new_priv_state(ipu, state);
+   if (WARN_ON(!ipu_state))
+   return;
+
finfo = drm_format_info(newstate->fb->format->format);
 
if (!ipu->clk_enabled) {
@@ -482,27 +511,27 @@ static void ingenic_ipu_plane_atomic_update(struct 
drm_plane *plane,
if (ipu->soc_info->has_bicubic)
ctrl |= JZ_IPU_CTRL_ZOOM_SEL;
 
-   upscaling_w = ipu->num_w > ipu->denom_w;
+   upscaling_w = ipu_state->num_w > ipu_state->denom_w;
if (upscaling_w)
ctrl |= JZ_IPU_CTRL_HSCALE;
 
-   if (ipu->num_w != 1 || ipu->denom_w != 1) {
+   if (ipu_state->num_w != 1 || ipu_state->denom_w != 1) {
if (!ipu->soc_info->has_bicubic && !upscaling_w)
-   coef_index |= (ipu->denom_w - 1) << 16;
+   coef_index |= (ipu_state->denom_w - 1) << 16;
else
-   coef_index |= (ipu->num_w - 1) << 16;
+   coef_index |= (ipu_state->num_w - 1) << 16;
ctrl |= JZ_IPU_CTRL_HRSZ_EN;
}
 
-   upscaling_h = ipu->num_h > ipu->denom_h;
+   upscaling_h = ipu_state->num_h > ipu_state->denom_h;
if (upscaling_h)
ctrl |= JZ_IPU_CTRL_VSCALE;
 
-   if (ipu->num_h != 1 || ipu->denom_h != 1) {
+   if (ipu_state->num_h != 1 || ipu_state->denom_h != 1) {
if (!ipu->soc_info->has_bicubic && !upscaling_h)
-   coef_index |= ipu->denom_h - 1;
+   coef_index |= ipu_state->denom_h - 1;
else
-   coef_index |= ipu->num_h - 1;
+   coef_index |= ipu_state->num_h - 1;
ctrl |= JZ_IPU_CTRL_VRSZ_EN;
}
 
@@ -513,13 +542,13 @@ static void ingenic_ipu_plane_atomic_update(struct 
drm_plane *plane,
/* Set the LUT index register */
regmap_write(ipu->map, JZ_REG_IPU_RSZ_COEF_INDEX, coef_index);
 
-   if (ipu->num_w != 1 || ipu->denom_w != 1)
+   if (ipu_state->num_w != 1 || ipu_state->denom_w != 1)
ingenic_ipu_set_coefs(ipu, JZ_REG_IPU_HRSZ_COEF_LUT,
- ipu->num_w, ipu->denom_w);
+ ipu_state->num_w, ipu_state->denom_w);
 
-   if (ipu->num_h != 1 || ipu->denom_h != 1)
+   if (ipu_state->num_h != 1 || ipu_state->denom_h != 1)
ingenic_ipu_set_coefs(ipu, JZ_REG_IPU_VRSZ_COEF_LUT,
-  

[PATCH 04/11] drm/ingenic: Move no_vblank to private state

2021-05-27 Thread Paul Cercueil
This information is carried from the ".atomic_check" to the
".atomic_commit_tail"; as such it is state-specific, and should be moved
to the private state structure.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 41 ---
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index e81084eb3b0e..639994329c60 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -66,6 +66,8 @@ struct jz_soc_info {
 
 struct ingenic_drm_private_state {
struct drm_private_state base;
+
+   bool no_vblank;
 };
 
 struct ingenic_drm {
@@ -87,7 +89,6 @@ struct ingenic_drm {
dma_addr_t dma_hwdescs_phys;
 
bool panel_is_sharp;
-   bool no_vblank;
 
/*
 * clk_mutex is used to synchronize the pixel clock rate update with
@@ -113,6 +114,30 @@ to_ingenic_drm_priv_state(struct drm_private_state *state)
return container_of(state, struct ingenic_drm_private_state, base);
 }
 
+static struct ingenic_drm_private_state *
+ingenic_drm_get_priv_state(struct ingenic_drm *priv, struct drm_atomic_state 
*state)
+{
+   struct drm_private_state *priv_state;
+
+   priv_state = drm_atomic_get_private_obj_state(state, 
&priv->private_obj);
+   if (IS_ERR(priv_state))
+   return ERR_CAST(priv_state);
+
+   return to_ingenic_drm_priv_state(priv_state);
+}
+
+static struct ingenic_drm_private_state *
+ingenic_drm_get_new_priv_state(struct ingenic_drm *priv, struct 
drm_atomic_state *state)
+{
+   struct drm_private_state *priv_state;
+
+   priv_state = drm_atomic_get_new_private_obj_state(state, 
&priv->private_obj);
+   if (!priv_state)
+   return NULL;
+
+   return to_ingenic_drm_priv_state(priv_state);
+}
+
 static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg)
 {
switch (reg) {
@@ -268,6 +293,7 @@ static int ingenic_drm_crtc_atomic_check(struct drm_crtc 
*crtc,
  crtc);
struct ingenic_drm *priv = drm_crtc_get_priv(crtc);
struct drm_plane_state *f1_state, *f0_state, *ipu_state = NULL;
+   struct ingenic_drm_private_state *priv_state;
 
if (crtc_state->gamma_lut &&
drm_color_lut_size(crtc_state->gamma_lut) != 
ARRAY_SIZE(priv->dma_hwdescs->palette)) {
@@ -299,9 +325,13 @@ static int ingenic_drm_crtc_atomic_check(struct drm_crtc 
*crtc,
}
}
 
+   priv_state = ingenic_drm_get_priv_state(priv, state);
+   if (IS_ERR(priv_state))
+   return PTR_ERR(priv_state);
+
/* If all the planes are disabled, we won't get a VBLANK IRQ */
-   priv->no_vblank = !f1_state->fb && !f0_state->fb &&
- !(ipu_state && ipu_state->fb);
+   priv_state->no_vblank = !f1_state->fb && !f0_state->fb &&
+   !(ipu_state && ipu_state->fb);
}
 
return 0;
@@ -727,6 +757,7 @@ static void ingenic_drm_atomic_helper_commit_tail(struct 
drm_atomic_state *old_s
 */
struct drm_device *dev = old_state->dev;
struct ingenic_drm *priv = drm_device_get_priv(dev);
+   struct ingenic_drm_private_state *priv_state;
 
drm_atomic_helper_commit_modeset_disables(dev, old_state);
 
@@ -736,7 +767,9 @@ static void ingenic_drm_atomic_helper_commit_tail(struct 
drm_atomic_state *old_s
 
drm_atomic_helper_commit_hw_done(old_state);
 
-   if (!priv->no_vblank)
+   priv_state = ingenic_drm_get_new_priv_state(priv, old_state);
+
+   if (!priv_state || !priv_state->no_vblank)
drm_atomic_helper_wait_for_vblanks(dev, old_state);
 
drm_atomic_helper_cleanup_planes(dev, old_state);
-- 
2.30.2



[PATCH 03/11] drm/ingenic: Add support for private objects

2021-05-27 Thread Paul Cercueil
Until now, the ingenic-drm as well as the ingenic-ipu drivers used to
put state-specific information in their respective private structure.

Add boilerplate code to support private objects in the two drivers, so
that state-specific information can be put in the state-specific private
structure.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 61 +++
 drivers/gpu/drm/ingenic/ingenic-ipu.c | 54 
 2 files changed, 115 insertions(+)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 4e41bdf2f3fd..e81084eb3b0e 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -64,6 +64,10 @@ struct jz_soc_info {
unsigned int num_formats_f0, num_formats_f1;
 };
 
+struct ingenic_drm_private_state {
+   struct drm_private_state base;
+};
+
 struct ingenic_drm {
struct drm_device drm;
/*
@@ -99,8 +103,16 @@ struct ingenic_drm {
struct mutex clk_mutex;
bool update_clk_rate;
struct notifier_block clock_nb;
+
+   struct drm_private_obj private_obj;
 };
 
+static inline struct ingenic_drm_private_state *
+to_ingenic_drm_priv_state(struct drm_private_state *state)
+{
+   return container_of(state, struct ingenic_drm_private_state, base);
+}
+
 static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg)
 {
switch (reg) {
@@ -790,6 +802,28 @@ ingenic_drm_gem_create_object(struct drm_device *drm, 
size_t size)
return &obj->base;
 }
 
+static struct drm_private_state *
+ingenic_drm_duplicate_state(struct drm_private_obj *obj)
+{
+   struct ingenic_drm_private_state *state = 
to_ingenic_drm_priv_state(obj->state);
+
+   state = kmemdup(state, sizeof(*state), GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   __drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
+
+   return &state->base;
+}
+
+static void ingenic_drm_destroy_state(struct drm_private_obj *obj,
+ struct drm_private_state *state)
+{
+   struct ingenic_drm_private_state *priv_state = 
to_ingenic_drm_priv_state(state);
+
+   kfree(priv_state);
+}
+
 DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
 
 static const struct drm_driver ingenic_drm_driver_data = {
@@ -863,6 +897,11 @@ static struct drm_mode_config_helper_funcs 
ingenic_drm_mode_config_helpers = {
.atomic_commit_tail = ingenic_drm_atomic_helper_commit_tail,
 };
 
+static const struct drm_private_state_funcs ingenic_drm_private_state_funcs = {
+   .atomic_duplicate_state = ingenic_drm_duplicate_state,
+   .atomic_destroy_state = ingenic_drm_destroy_state,
+};
+
 static void ingenic_drm_unbind_all(void *d)
 {
struct ingenic_drm *priv = d;
@@ -875,9 +914,15 @@ static void __maybe_unused ingenic_drm_release_rmem(void 
*d)
of_reserved_mem_device_release(d);
 }
 
+static void ingenic_drm_atomic_private_obj_fini(struct drm_device *drm, void 
*private_obj)
+{
+   drm_atomic_private_obj_fini(private_obj);
+}
+
 static int ingenic_drm_bind(struct device *dev, bool has_components)
 {
struct platform_device *pdev = to_platform_device(dev);
+   struct ingenic_drm_private_state *private_state;
const struct jz_soc_info *soc_info;
struct ingenic_drm *priv;
struct clk *parent_clk;
@@ -1158,6 +1203,20 @@ static int ingenic_drm_bind(struct device *dev, bool 
has_components)
goto err_devclk_disable;
}
 
+   private_state = kzalloc(sizeof(*private_state), GFP_KERNEL);
+   if (!private_state) {
+   ret = -ENOMEM;
+   goto err_clk_notifier_unregister;
+   }
+
+   drm_atomic_private_obj_init(drm, &priv->private_obj, 
&private_state->base,
+   &ingenic_drm_private_state_funcs);
+
+   ret = drmm_add_action_or_reset(drm, ingenic_drm_atomic_private_obj_fini,
+  &priv->private_obj);
+   if (ret)
+   goto err_private_state_free;
+
ret = drm_dev_register(drm, 0);
if (ret) {
dev_err(dev, "Failed to register DRM driver\n");
@@ -1168,6 +1227,8 @@ static int ingenic_drm_bind(struct device *dev, bool 
has_components)
 
return 0;
 
+err_private_state_free:
+   kfree(private_state);
 err_clk_notifier_unregister:
clk_notifier_unregister(parent_clk, &priv->clock_nb);
 err_devclk_disable:
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c 
b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 61b6d9fdbba1..007cd547b285 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -45,6 +45,10 @@ struct soc_info {
  unsigned int weight, unsigned int offset);
 };
 
+struct ingenic_ipu_private_state {
+   struct drm_private_state base;
+};
+
 struct ingenic_ipu {
struct d

[PATCH 02/11] drm/ingenic: Simplify code by using hwdescs array

2021-05-27 Thread Paul Cercueil
Instead of having one 'hwdesc' variable for plane #0 and one for
plane #1, use a 'hwdesc[2]' array, where the DMA hardware descriptors
are indexed by the plane's number.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 38 ---
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 93c099e7464d..4e41bdf2f3fd 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -50,8 +50,7 @@ struct ingenic_dma_hwdesc {
 } __aligned(16);
 
 struct ingenic_dma_hwdescs {
-   struct ingenic_dma_hwdesc hwdesc_f0;
-   struct ingenic_dma_hwdesc hwdesc_f1;
+   struct ingenic_dma_hwdesc hwdesc[2];
struct ingenic_dma_hwdesc hwdesc_pal;
u16 palette[256] __aligned(16);
 };
@@ -142,6 +141,13 @@ static inline struct ingenic_drm *drm_nb_get_priv(struct 
notifier_block *nb)
return container_of(nb, struct ingenic_drm, clock_nb);
 }
 
+static inline dma_addr_t dma_hwdesc_addr(const struct ingenic_drm *priv, bool 
use_f1)
+{
+   u32 offset = offsetof(struct ingenic_dma_hwdescs, hwdesc[use_f1]);
+
+   return priv->dma_hwdescs_phys + offset;
+}
+
 static int ingenic_drm_update_pixclk(struct notifier_block *nb,
 unsigned long action,
 void *data)
@@ -563,6 +569,7 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
struct ingenic_dma_hwdesc *hwdesc;
unsigned int width, height, cpp, offset;
dma_addr_t addr;
+   bool use_f1;
u32 fourcc;
 
if (newstate && newstate->fb) {
@@ -570,16 +577,14 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
drm_fb_cma_sync_non_coherent(&priv->drm, oldstate, 
newstate);
 
crtc_state = newstate->crtc->state;
+   use_f1 = priv->soc_info->has_osd && plane != &priv->f0;
 
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
width = newstate->src_w >> 16;
height = newstate->src_h >> 16;
cpp = newstate->fb->format->cpp[0];
 
-   if (!priv->soc_info->has_osd || plane == &priv->f0)
-   hwdesc = &priv->dma_hwdescs->hwdesc_f0;
-   else
-   hwdesc = &priv->dma_hwdescs->hwdesc_f1;
+   hwdesc = &priv->dma_hwdescs->hwdesc[use_f1];
 
hwdesc->addr = addr;
hwdesc->cmd = JZ_LCD_CMD_EOF_IRQ | (width * height * cpp / 4);
@@ -592,9 +597,9 @@ static void ingenic_drm_plane_atomic_update(struct 
drm_plane *plane,
if (fourcc == DRM_FORMAT_C8)
offset = offsetof(struct ingenic_dma_hwdescs, 
hwdesc_pal);
else
-   offset = offsetof(struct ingenic_dma_hwdescs, 
hwdesc_f0);
+   offset = offsetof(struct ingenic_dma_hwdescs, 
hwdesc[0]);
 
-   priv->dma_hwdescs->hwdesc_f0.next = 
priv->dma_hwdescs_phys + offset;
+   priv->dma_hwdescs->hwdesc[0].next = 
priv->dma_hwdescs_phys + offset;
 
crtc_state->color_mgmt_changed = fourcc == 
DRM_FORMAT_C8;
}
@@ -968,20 +973,17 @@ static int ingenic_drm_bind(struct device *dev, bool 
has_components)
 
 
/* Configure DMA hwdesc for foreground0 plane */
-   dma_hwdesc_phys_f0 = priv->dma_hwdescs_phys
-   + offsetof(struct ingenic_dma_hwdescs, hwdesc_f0);
-   priv->dma_hwdescs->hwdesc_f0.next = dma_hwdesc_phys_f0;
-   priv->dma_hwdescs->hwdesc_f0.id = 0xf0;
+   dma_hwdesc_phys_f0 = dma_hwdesc_addr(priv, 0);
+   priv->dma_hwdescs->hwdesc[0].next = dma_hwdesc_phys_f0;
+   priv->dma_hwdescs->hwdesc[0].id = 0xf0;
 
/* Configure DMA hwdesc for foreground1 plane */
-   dma_hwdesc_phys_f1 = priv->dma_hwdescs_phys
-   + offsetof(struct ingenic_dma_hwdescs, hwdesc_f1);
-   priv->dma_hwdescs->hwdesc_f1.next = dma_hwdesc_phys_f1;
-   priv->dma_hwdescs->hwdesc_f1.id = 0xf1;
+   dma_hwdesc_phys_f1 = dma_hwdesc_addr(priv, 1);
+   priv->dma_hwdescs->hwdesc[1].next = dma_hwdesc_phys_f1;
+   priv->dma_hwdescs->hwdesc[1].id = 0xf1;
 
/* Configure DMA hwdesc for palette */
-   priv->dma_hwdescs->hwdesc_pal.next = priv->dma_hwdescs_phys
-   + offsetof(struct ingenic_dma_hwdescs, hwdesc_f0);
+   priv->dma_hwdescs->hwdesc_pal.next = dma_hwdesc_phys_f0;
priv->dma_hwdescs->hwdesc_pal.id = 0xc0;
priv->dma_hwdescs->hwdesc_pal.addr = priv->dma_hwdescs_phys
+ offsetof(struct ingenic_dma_hwdescs, palette);
-- 
2.30.2



[PATCH 01/11] drm/ingenic: Remove dead code

2021-05-27 Thread Paul Cercueil
The priv->ipu_plane pointer would get a different value further down the
code, without the first assigned value ever being read; so the first
assignment can be dropped.

Signed-off-by: Paul Cercueil 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 5244f4763477..93c099e7464d 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -988,9 +988,6 @@ static int ingenic_drm_bind(struct device *dev, bool 
has_components)
priv->dma_hwdescs->hwdesc_pal.cmd = JZ_LCD_CMD_ENABLE_PAL
| (sizeof(priv->dma_hwdescs->palette) / 4);
 
-   if (soc_info->has_osd)
-   priv->ipu_plane = drm_plane_from_index(drm, 0);
-
primary = priv->soc_info->has_osd ? &priv->f1 : &priv->f0;
 
drm_plane_helper_add(primary, &ingenic_drm_plane_helper_funcs);
-- 
2.30.2



[PATCH 00/11] ingenic-drm cleanups and doublescan feature

2021-05-27 Thread Paul Cercueil
Hi,

Here is a set of 11 patches for the ingenic-drm driver.

Patches 1-7 are mostly generic cleanups, which will smooth the way
for the bigger changes to be introduced.

Patch 3 adds support for a private state structure, which is then used
to store state-specific information, which was previously stored in the
driver's private structure directly.

Patch 10 is the big one; it adds a double-scan feature emulated with DMA
descriptors. This trick makes it possible to support a handful of boards
which have strange panels with non-square pixels (320x480 4:3).

Patch 11 updates the driver to support one top-level bridge per encoder,
as it seems to be the norm now.

Cheers,
-Paul

Paul Cercueil (11):
  drm/ingenic: Remove dead code
  drm/ingenic: Simplify code by using hwdescs array
  drm/ingenic: Add support for private objects
  drm/ingenic: Move no_vblank to private state
  drm/ingenic: Move IPU scale settings to private state
  drm/ingenic: Set DMA descriptor chain register when starting CRTC
  drm/ingenic: Upload palette before frame
  drm/ingenic: Support custom GEM object
  drm/ingenic: Add ingenic_drm_gem_fb_destroy() function
  drm/ingenic: Add doublescan feature
  drm/ingenic: Attach bridge chain to encoders

 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 414 ++
 drivers/gpu/drm/ingenic/ingenic-ipu.c | 127 ++-
 2 files changed, 458 insertions(+), 83 deletions(-)

-- 
2.30.2



Re: [v1] drm/msm/disp/dpu1: avoid perf update in frame done event

2021-05-27 Thread Doug Anderson
Hi,

On Wed, May 26, 2021 at 10:08 PM Krishna Manikandan
 wrote:
>
> Crtc perf update from frame event work can result in
> wrong bandwidth and clock update from dpu if the work
> is scheduled after the swap state has happened.
>
> Avoid such issues by moving perf update to complete
> commit once the frame is accepted by the hardware.
>
> Fixes: a29c8c024165 ("drm/msm/disp/dpu1: fix display underruns during 
> modeset")
> Signed-off-by: Krishna Manikandan 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

I don't know much about this code or any of the theory behind it, but
I can confirm that this fixes the hang I was seeing with the previous
patch. On sc7180-trogdor-lazor:

Tested-by: Douglas Anderson 


[RFC PATCH 5/5] mm: changes to unref pages with Generic type

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

Refcounts of pages in a device mapping are 1-based instead of
0-based: a refcount of 1 means the page can be freed. This logic
was not wired up for the Generic memory type, so its pages were
released like normal pages instead of through the device driver's
release callback.
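
For context, the release path this hooks into looks roughly like the
following (condensed from mm/memremap.c of this era, shown here for
illustration only):

	void put_devmap_managed_page(struct page *page)
	{
		int count = page_ref_dec_return(page);

		/*
		 * Devmap page refcounts are 1-based: a count of 1 means
		 * the page is free, so hand it back to the driver instead
		 * of freeing it as a normal page.
		 */
		if (count == 1)
			free_devmap_managed_page(page);
		else if (!count)
			__put_page(page);
	}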

Signed-off-by: Alex Sierra 
---
 include/linux/mm.h | 1 +
 mm/memremap.c  | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1af7b9b76948..83bd2f3e111b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1130,6 +1130,7 @@ static inline bool page_is_devmap_managed(struct page 
*page)
switch (page->pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
case MEMORY_DEVICE_FS_DAX:
+   case MEMORY_DEVICE_GENERIC:
return true;
default:
break;
diff --git a/mm/memremap.c b/mm/memremap.c
index 16b2fb482da1..d2563fbcf987 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -44,6 +44,7 @@ EXPORT_SYMBOL(devmap_managed_key);
 static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
 {
if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
+   pgmap->type == MEMORY_DEVICE_GENERIC ||
pgmap->type == MEMORY_DEVICE_FS_DAX)
static_branch_dec(&devmap_managed_key);
 }
@@ -51,6 +52,7 @@ static void devmap_managed_enable_put(struct dev_pagemap 
*pgmap)
 static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
+   pgmap->type == MEMORY_DEVICE_GENERIC ||
pgmap->type == MEMORY_DEVICE_FS_DAX)
static_branch_inc(&devmap_managed_key);
 }
@@ -480,7 +482,8 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
 void free_devmap_managed_page(struct page *page)
 {
/* notify page idle for dax */
-   if (!is_device_private_page(page)) {
+   if (!(is_device_private_page(page) ||
+   is_device_generic_page(page))) {
wake_up_var(&page->_refcount);
return;
}
-- 
2.31.1



[RFC PATCH 4/5] mm: add generic type support for device zone page migration

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

This support is only for anonymous memory of generic type.
Zone device pages of generic type require taking an extra reference,
as is done with the device private type.
Also, add support for migrating page metadata for the generic device
type.
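
For reference, the driver-side migrate_vma flow that these hooks sit in
looks roughly like this (a simplified sketch; error handling and the
actual data copy are omitted):

	struct migrate_vma migrate = {
		.vma         = vma,
		.start       = start,
		.end         = end,
		.src         = src_pfns,
		.dst         = dst_pfns,
		.pgmap_owner = pgmap_owner,
		.flags       = MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
	};

	if (migrate_vma_setup(&migrate))
		return -EFAULT;

	/* ... allocate destination pages, fill migrate.dst[], copy data ... */

	migrate_vma_pages(&migrate);	/* this patch applies here */
	migrate_vma_finalize(&migrate);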

Signed-off-by: Alex Sierra 
---
 mm/migrate.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 20ca887ea769..33e573a992e5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -380,7 +380,8 @@ static int expected_page_refs(struct address_space 
*mapping, struct page *page)
 * Device private pages have an extra refcount as they are
 * ZONE_DEVICE pages.
 */
-   expected_count += is_device_private_page(page);
+   expected_count +=
+   (is_device_private_page(page) || 
is_device_generic_page(page));
if (mapping)
expected_count += thp_nr_pages(page) + page_has_private(page);
 
@@ -2607,7 +2608,7 @@ static bool migrate_vma_check_page(struct page *page)
 * FIXME proper solution is to rework migration_entry_wait() so
 * it does not need to take a reference on page.
 */
-   return is_device_private_page(page);
+   return is_device_private_page(page) | 
is_device_generic_page(page);
}
 
/* For file back page */
@@ -3069,10 +3070,12 @@ void migrate_vma_pages(struct migrate_vma *migrate)
mapping = page_mapping(page);
 
if (is_zone_device_page(newpage)) {
-   if (is_device_private_page(newpage)) {
+   if (is_device_private_page(newpage) ||
+   is_device_generic_page(newpage)) {
/*
-* For now only support private anonymous when
-* migrating to un-addressable device memory.
+* For now only support private and 
devdax/generic
+* anonymous when migrating to un-addressable
+* device memory.
 */
if (mapping) {
migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-- 
2.31.1



[RFC PATCH 3/5] include/linux/mm.h: helper to check zone device generic type

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

Helper to check whether a zone device page is of generic type.

Signed-off-by: Alex Sierra 
---
 include/linux/mm.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c9900aedc195..1af7b9b76948 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1158,6 +1158,13 @@ static inline bool is_device_private_page(const struct 
page *page)
page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
+static inline bool is_device_generic_page(const struct page *page)
+{
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   is_zone_device_page(page) &&
+   page->pgmap->type == MEMORY_DEVICE_GENERIC;
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
-- 
2.31.1



[RFC PATCH 2/5] drm/amdkfd: generic type as sys mem on migration to ram

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

On VRAM-to-RAM migration, memory of generic device type has
CPU access similar to system RAM. The migrate flags select the
source pages, which in the generic type case should be selected
as SYSTEM.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index f5939449a99f..7b41006c1164 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -653,8 +653,9 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate.vma = vma;
migrate.start = start;
migrate.end = end;
-   migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
+   migrate.flags = adev->gmc.xgmi.connected_to_cpu ?
+   MIGRATE_VMA_SELECT_SYSTEM : 
MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 
size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
size *= npages;
-- 
2.31.1



[RFC PATCH 1/5] drm/amdkfd: add SPM support for SVM

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

When the CPU is connected through XGMI, it has coherent
access to the VRAM resource. In this case that resource
is taken from a table at the device gmc aperture base.
This resource is used, along with the device type (which can
be DEVICE_PRIVATE or DEVICE_GENERIC), to create the device
page map region.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 -
 kernel/resource.c|  2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index c8ca3252cbc2..f5939449a99f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -895,6 +895,7 @@ int svm_migrate_init(struct amdgpu_device *adev)
struct resource *res;
unsigned long size;
void *r;
+   bool xgmi_connected_to_cpu = adev->gmc.xgmi.connected_to_cpu;
 
/* Page migration works on Vega10 or newer */
if (kfddev->device_info->asic_family < CHIP_VEGA10)
@@ -907,17 +908,22 @@ int svm_migrate_init(struct amdgpu_device *adev)
 * should remove reserved size
 */
size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
-   res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+   if (xgmi_connected_to_cpu)
+   res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
+   else
+   res = devm_request_free_mem_region(adev->dev, &iomem_resource, 
size);
+
if (IS_ERR(res))
return -ENOMEM;
 
-   pgmap->type = MEMORY_DEVICE_PRIVATE;
pgmap->nr_range = 1;
pgmap->range.start = res->start;
pgmap->range.end = res->end;
+   pgmap->type = xgmi_connected_to_cpu ?
+   MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;
pgmap->ops = &svm_migrate_pgmap_ops;
pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
-   pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   pgmap->flags = 0;
r = devm_memremap_pages(adev->dev, pgmap);
if (IS_ERR(r)) {
pr_err("failed to register HMM device memory\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 21f693767a0d..3881a93192ed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -38,7 +38,6 @@
 #define SVM_RANGE_VRAM_DOMAIN (1UL << 0)
 #define SVM_ADEV_PGMAP_OWNER(adev)\
((adev)->hive ? (void *)(adev)->hive : (void *)(adev))
-
 struct svm_range_bo {
struct amdgpu_bo*bo;
struct kref kref;
diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..da137553b83e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -783,7 +783,7 @@ struct resource *lookup_resource(struct resource *root, 
resource_size_t start)
 
return res;
 }
-
+EXPORT_SYMBOL(lookup_resource);
 /*
  * Insert a resource into the resource tree. If successful, return NULL,
  * otherwise return the conflicting resource (compare to __request_resource())
-- 
2.31.1



[RFC PATCH 0/5] Support DEVICE_GENERIC memory in migrate_vma_*

2021-05-27 Thread Felix Kuehling
AMD is building a system architecture for the Frontier supercomputer with
a coherent interconnect between CPUs and GPUs. This hardware architecture
allows the CPUs to coherently access GPU device memory. We have hardware
in our labs and we are working with our partner HPE on the BIOS, firmware
and software for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver
looks it up with lookup_resource and registers it with devmap as
MEMORY_DEVICE_GENERIC using devm_memremap_pages.
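
In code terms, that registration is roughly (condensed from patch 1 of
this series):

	res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
	pgmap->type = MEMORY_DEVICE_GENERIC;
	pgmap->nr_range = 1;
	pgmap->range.start = res->start;
	pgmap->range.end = res->end;
	pgmap->ops = &svm_migrate_pgmap_ops;
	pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
	r = devm_memremap_pages(adev->dev, pgmap);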

Now we're trying to migrate data to and from that memory using the
migrate_vma_* helpers so we can support page-based migration in our
unified memory allocations, while also supporting CPU access to those
pages.

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages
behave correctly in the migrate_vma_* helpers. We are looking for feedback
about this approach. If we're close, what's needed to make our patches
acceptable upstream? If we're not close, any suggestions how else to
achieve what we are trying to do (i.e. page migration and coherent CPU
access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently
upstreamed to Dave Airlie's drm-next branch
[https://cgit.freedesktop.org/drm/drm/log/?h=drm-next]. On top of that we
did some rework of our VRAM management for migrations to remove some
incorrect assumptions, allow partially successful migrations and GPU
memory mappings that mix pages in VRAM and system memory.
[https://patchwork.kernel.org/project/dri-devel/list/?series=489811]

In this RFC, patches 1 and 2 are for context to show how we are looking up
the SPM memory and registering it with devmap.

Patches 3-5 are the changes we are trying to upstream or rework to make
them acceptable upstream.

Alex Sierra (5):
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: generic type as sys mem on migration to ram
  include/linux/mm.h: helper to check zone device generic type
  mm: add generic type support for device zone page migration
  mm: changes to unref pages with Generic type

 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 -
 include/linux/mm.h   |  8 
 kernel/resource.c|  2 +-
 mm/memremap.c|  5 -
 mm/migrate.c | 13 -
 6 files changed, 32 insertions(+), 12 deletions(-)

-- 
2.31.1



[PATCH] drm: Fix for GEM buffers with write-combine memory

2021-05-27 Thread Paul Cercueil
The previous commit wrongly assumed that dma_mmap_wc() could be replaced
by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup,
but did not work everywhere.

Use dma_mmap_wc() when the buffer has the write-combine cache attribute,
and dma_mmap_pages() when it has the non-coherent cache attribute.

Signed-off-by: Paul Cercueil 
Reported-by: Tomi Valkeinen 
Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by non-coherent 
memory")
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index 235c7a63da2b..4c3772651954 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct 
vm_area_struct *vma)
 
cma_obj = to_drm_gem_cma_obj(obj);
 
-   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
-   if (!cma_obj->map_noncoherent)
-   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   if (cma_obj->map_noncoherent) {
+   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+   ret = dma_mmap_pages(cma_obj->base.dev->dev,
+vma, vma->vm_end - vma->vm_start,
+virt_to_page(cma_obj->vaddr));
+   } else {
+   ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
+ cma_obj->paddr, vma->vm_end - vma->vm_start);
 
-   ret = dma_mmap_pages(cma_obj->base.dev->dev,
-vma, vma->vm_end - vma->vm_start,
-virt_to_page(cma_obj->vaddr));
+   }
if (ret)
drm_gem_vm_close(vma);
 
-- 
2.30.2



Re: [Freedreno] [PATCH] drm/msm: fix display snapshotting if DP or DSI is disabled

2021-05-27 Thread abhinavk

On 2021-05-27 15:03, Dmitry Baryshkov wrote:

Fix following warnings generated when either DP or DSI support is
disabled:

drivers/gpu/drm/msm/disp/msm_disp_snapshot_util.c:141:3: error:
implicit declaration of function 'msm_dp_snapshot'; did you mean
'msm_dsi_snapshot'? [-Werror=implicit-function-declaration]

drivers/gpu/drm/msm/msm_kms.h:127:26: warning: 'struct msm_disp_state'
declared inside parameter list will not be visible outside of this
definition or declaration
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c:867:21: error: initialization
of 'void (*)(struct msm_disp_state *, struct msm_kms *)' from
incompatible pointer type 'void (*)(struct msm_disp_state *, struct
msm_kms *)' [-Werror=incompatible-pointer-types]
drivers/gpu/drm/msm/dsi/dsi.h:94:30: warning: 'struct msm_disp_state'
declared inside parameter list will not be visible outside of this
definition or declaration

Reported-by: kernel test robot 
Cc: Abhinav Kumar 
Fixes: 1c3b7ac1a71d ("drm/msm: pass dump state as a function argument")
Signed-off-by: Dmitry Baryshkov 

Reviewed-by: Abhinav Kumar 

---
 drivers/gpu/drm/msm/disp/msm_disp_snapshot.h |  1 -
 drivers/gpu/drm/msm/dsi/dsi.h|  2 --
 drivers/gpu/drm/msm/msm_drv.h| 12 +++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
index c6174a366095..c92a9508c8d3 100644
--- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
+++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include "msm_kms.h"
-#include "dsi.h"

 #define MSM_DISP_SNAPSHOT_MAX_BLKS 10

diff --git a/drivers/gpu/drm/msm/dsi/dsi.h 
b/drivers/gpu/drm/msm/dsi/dsi.h

index cea73f9c4be9..9b8e9b07eced 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.h
+++ b/drivers/gpu/drm/msm/dsi/dsi.h
@@ -91,8 +91,6 @@ static inline bool msm_dsi_device_connected(struct
msm_dsi *msm_dsi)
return msm_dsi->panel || msm_dsi->external_bridge;
 }

-void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct
msm_dsi *msm_dsi);
-
 struct drm_encoder *msm_dsi_get_encoder(struct msm_dsi *msm_dsi);

 /* dsi host */
diff --git a/drivers/gpu/drm/msm/msm_drv.h 
b/drivers/gpu/drm/msm/msm_drv.h

index c33fc1293789..ba60bf6f124c 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -43,6 +43,7 @@ struct msm_gem_submit;
 struct msm_fence_context;
 struct msm_gem_address_space;
 struct msm_gem_vma;
+struct msm_disp_state;

 #define MAX_CRTCS  8
 #define MAX_PLANES 20
@@ -340,6 +341,8 @@ void __init msm_dsi_register(void);
 void __exit msm_dsi_unregister(void);
 int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, struct drm_device 
*dev,

 struct drm_encoder *encoder);
+void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct
msm_dsi *msm_dsi);
+
 #else
 static inline void __init msm_dsi_register(void)
 {
@@ -353,6 +356,10 @@ static inline int msm_dsi_modeset_init(struct
msm_dsi *msm_dsi,
 {
return -EINVAL;
 }
+static inline void msm_dsi_snapshot(struct msm_disp_state
*disp_state, struct msm_dsi *msm_dsi)
+{
+}
+
 #endif

 #ifdef CONFIG_DRM_MSM_DP
@@ -367,7 +374,6 @@ void msm_dp_display_mode_set(struct msm_dp *dp,
struct drm_encoder *encoder,
struct drm_display_mode *mode,
struct drm_display_mode *adjusted_mode);
 void msm_dp_irq_postinstall(struct msm_dp *dp_display);
-struct msm_disp_state;
 void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp
*dp_display);

 void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor 
*minor);

@@ -412,6 +418,10 @@ static inline void msm_dp_irq_postinstall(struct
msm_dp *dp_display)
 {
 }

+static inline void msm_dp_snapshot(struct msm_disp_state *disp_state,
struct msm_dp *dp_display)
+{
+}
+
 static inline void msm_dp_debugfs_init(struct msm_dp *dp_display,
struct drm_minor *minor)
 {


[PATCH] drm/msm: fix display snapshotting if DP or DSI is disabled

2021-05-27 Thread Dmitry Baryshkov
Fix following warnings generated when either DP or DSI support is
disabled:

drivers/gpu/drm/msm/disp/msm_disp_snapshot_util.c:141:3: error: implicit 
declaration of function 'msm_dp_snapshot'; did you mean 'msm_dsi_snapshot'? 
[-Werror=implicit-function-declaration]

drivers/gpu/drm/msm/msm_kms.h:127:26: warning: 'struct msm_disp_state' declared 
inside parameter list will not be visible outside of this definition or 
declaration
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c:867:21: error: initialization of 'void 
(*)(struct msm_disp_state *, struct msm_kms *)' from incompatible pointer type 
'void (*)(struct msm_disp_state *, struct msm_kms *)' 
[-Werror=incompatible-pointer-types]
drivers/gpu/drm/msm/dsi/dsi.h:94:30: warning: 'struct msm_disp_state' declared 
inside parameter list will not be visible outside of this definition or 
declaration

Reported-by: kernel test robot 
Cc: Abhinav Kumar 
Fixes: 1c3b7ac1a71d ("drm/msm: pass dump state as a function argument")
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/msm_disp_snapshot.h |  1 -
 drivers/gpu/drm/msm/dsi/dsi.h|  2 --
 drivers/gpu/drm/msm/msm_drv.h| 12 +++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h 
b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
index c6174a366095..c92a9508c8d3 100644
--- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
+++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include "msm_kms.h"
-#include "dsi.h"
 
 #define MSM_DISP_SNAPSHOT_MAX_BLKS 10
 
diff --git a/drivers/gpu/drm/msm/dsi/dsi.h b/drivers/gpu/drm/msm/dsi/dsi.h
index cea73f9c4be9..9b8e9b07eced 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.h
+++ b/drivers/gpu/drm/msm/dsi/dsi.h
@@ -91,8 +91,6 @@ static inline bool msm_dsi_device_connected(struct msm_dsi 
*msm_dsi)
return msm_dsi->panel || msm_dsi->external_bridge;
 }
 
-void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi 
*msm_dsi);
-
 struct drm_encoder *msm_dsi_get_encoder(struct msm_dsi *msm_dsi);
 
 /* dsi host */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index c33fc1293789..ba60bf6f124c 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -43,6 +43,7 @@ struct msm_gem_submit;
 struct msm_fence_context;
 struct msm_gem_address_space;
 struct msm_gem_vma;
+struct msm_disp_state;
 
 #define MAX_CRTCS  8
 #define MAX_PLANES 20
@@ -340,6 +341,8 @@ void __init msm_dsi_register(void);
 void __exit msm_dsi_unregister(void);
 int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, struct drm_device *dev,
 struct drm_encoder *encoder);
+void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi 
*msm_dsi);
+
 #else
 static inline void __init msm_dsi_register(void)
 {
@@ -353,6 +356,10 @@ static inline int msm_dsi_modeset_init(struct msm_dsi 
*msm_dsi,
 {
return -EINVAL;
 }
+static inline void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct 
msm_dsi *msm_dsi)
+{
+}
+
 #endif
 
 #ifdef CONFIG_DRM_MSM_DP
@@ -367,7 +374,6 @@ void msm_dp_display_mode_set(struct msm_dp *dp, struct 
drm_encoder *encoder,
struct drm_display_mode *mode,
struct drm_display_mode *adjusted_mode);
 void msm_dp_irq_postinstall(struct msm_dp *dp_display);
-struct msm_disp_state;
 void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp 
*dp_display);
 
 void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor);
@@ -412,6 +418,10 @@ static inline void msm_dp_irq_postinstall(struct msm_dp 
*dp_display)
 {
 }
 
+static inline void msm_dp_snapshot(struct msm_disp_state *disp_state, struct 
msm_dp *dp_display)
+{
+}
+
 static inline void msm_dp_debugfs_init(struct msm_dp *dp_display,
struct drm_minor *minor)
 {
-- 
2.30.2



Linux Graphics Next: Userspace submission update

2021-05-27 Thread Marek Olšák
Hi,

Since Christian believes that we can't deadlock the kernel with some
changes there, we just need to make everything nice for userspace too.
Instead of explaining how it will work, I will explain the cases where
future hardware (and its kernel driver) will break existing userspace in
order to protect everybody from deadlocks. Anything that uses implicit sync
will be spared, so X and Wayland will be fine, assuming they don't
import/export fences. Those use cases that do import/export fences might or
might not work, depending on how the fences are used.

One of the necessities is that all fences will become future fences. The
semantics of imported/exported fences will change completely and will have
new restrictions on the usage. The restrictions are:


1) Android sync files will be impossible to support, so won't be supported.
(they don't allow future fences)


2) Implicit sync and explicit sync will be mutually exclusive between
processes. A process can either use one or the other, but not both. This is
meant to prevent a deadlock condition with future fences where any process
can malevolently deadlock execution of any other process, even execution of
a higher-privileged process. The kernel will impose the following
restrictions to protect against the deadlock:

a) a process with an implicitly-sync'd imported/exported buffer can't
import/export a fence from/to another process
b) a process with an imported/exported fence can't import/export an
implicitly-sync'd buffer from/to another process
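
For concreteness, the kernel-side enforcement could look roughly like
this (every name below is made up for illustration; none of it is an
existing API):

    #include <errno.h>
    #include <stdbool.h>

    /* hypothetical per-process sync state; nothing like this exists today */
    struct sync_state {
            bool used_implicit_sync;
            bool used_explicit_fences;
    };

    int import_export_fence(struct sync_state *s)
    {
            if (s->used_implicit_sync)      /* restriction (a) */
                    return -EBUSY;
            s->used_explicit_fences = true;
            return 0;
    }

    int import_export_implicit_buffer(struct sync_state *s)
    {
            if (s->used_explicit_fences)    /* restriction (b) */
                    return -EBUSY;
            s->used_implicit_sync = true;
            return 0;
    }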

Alternative: A higher-privileged process could enforce both restrictions
instead of the kernel to protect itself from the deadlock, but this would
be a can of worms for existing userspace. It would be better if the kernel
just broke unsafe userspace on future hw, just like sync files.

If both implicit and explicit sync are allowed to occur simultaneously,
sending a process a future fence that will never signal will deadlock
that process after it acquires the implicit sync lock. That lock is a
sequence number that the process is required to write to memory and
signal with an interrupt from the GPU within a finite time. This is how
the deadlock can happen:

* The process gets sequence number N from the kernel for an
implicitly-sync'd buffer.
* The process inserts (into the GPU user-mapped queue) a wait for sequence
number N-1.
* The process inserts a wait for a fence, but it doesn't know that it will
never signal ==> deadlock.
...
* The process inserts a command to write sequence number N to a
predetermined memory location. (which will make the buffer idle and send an
interrupt to the kernel)
...
* The kernel will terminate the process because it has never received the
interrupt. (i.e. a less-privileged process just killed a more-privileged
process)

It's the interrupt for implicit sync that never arrived that caused the
termination, and the only way another process can cause it is by sending a
fence that will never signal. Thus, importing/exporting fences from/to
other processes can't be allowed simultaneously with implicit sync.


3) Compositors (and other privileged processes, and display flipping) can't
trust imported/exported fences. They need a timeout recovery mechanism from
the beginning, and the following are some possible solutions to timeouts:

a) use a CPU wait with a small absolute timeout, and display the previous
content on timeout
b) use a GPU wait with a small absolute timeout, and conditional rendering
will choose between the latest content (if signalled) and previous content
(if timed out)

The result would be that the desktop can run close to 60 fps even if an app
runs at 1 fps.
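
As a sketch of option (a), assuming the imported fence is visible to the
compositor as a sync_file fd, which becomes readable once the fence
signals (the buffer type and names are placeholders):

    #include <poll.h>

    struct buffer;  /* placeholder for the compositor's buffer type */

    static struct buffer *pick_frame(int fence_fd, struct buffer *latest,
                                     struct buffer *previous, int timeout_ms)
    {
            struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };

            /* poll() returns 0 on timeout, > 0 once the fence signalled */
            if (poll(&pfd, 1, timeout_ms) > 0)
                    return latest;

            return previous;  /* timed out: keep showing the previous frame */
    }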

*Redefining imported/exported fences and breaking some users/OSs is the
only way to have userspace GPU command submission, and the deadlock example
here is the counterexample proving that there is no other way.*

So, what are the chances this is going to fly with the ecosystem?

Thanks,
Marek


Re: [v4 1/4] drm/panel-simple: Add basic DPCD backlight support

2021-05-27 Thread Doug Anderson
Hi,

On Thu, May 27, 2021 at 5:21 AM  wrote:
>
> >> @@ -171,6 +172,19 @@ struct panel_desc {
> >>
> >> /** @connector_type: LVDS, eDP, DSI, DPI, etc. */
> >> int connector_type;
> >> +
> >> +   /**
> >> +* @uses_dpcd_backlight: Panel supports eDP dpcd backlight
> >> control.
> >> +*
> >> +* Set true, if the panel supports backlight control over eDP
> >> AUX channel
> >> +* using DPCD registers as per VESA's standard.
> >> +*/
> >> +   bool uses_dpcd_backlight;
> >> +};
> >> +
> >> +struct edp_backlight {
> >> +   struct backlight_device *dev;
> >
> > Can you pick a name other than "dev". In my mind "dev" means you've
> > got a "struct device" or a "struct device *".
>
> In the backlight.h "bd" is used for "struct backlight_device". I can use
> "bd"?

That would be OK w/ me since it's not "dev". In theory you could also
call it "base" like panel-simple does with the base class drm_panel,
but I'll leave that up to you. It's mostly that in my brain "dev" is
reserved for "struct device" but otherwise I'm pretty flexible.


> >> +   struct drm_edp_backlight_info info;
> >>  };
> >>
> >>  struct panel_simple {
> >> @@ -194,6 +208,8 @@ struct panel_simple {
> >>
> >> struct edid *edid;
> >>
> >> +   struct edp_backlight *edp_bl;
> >> +
> >
> > I don't think you need to add this pointer. See below for details, but
> > basically the backlight device should be in base.backlight. Any code
> > that needs the containing structure can use the standard
> > "container_of" syntax.
> >
>
> The documentation of the "struct drm_panel -> backlight" mentions
> "backlight is set by drm_panel_of_backlight() and drivers shall not
> assign it."
> That's why I was not sure if I should touch that part. Because of this,
> I added
> backlight enable/disable calls inside panel_simple_disable/enable().

Fair enough. In my opinion (subject to being overridden by the adults
in the room), if you move your backlight code into drm_panel.c and
call it drm_panel_dp_aux_backlight() then it's fair game to use. This
basically means that it's no longer a "driver" assigning it since it's
being done in drm_panel.c. ;-) Obviously you'd want to update the
comment, too...


> >> +   err = drm_panel_of_backlight(&panel->base);
> >> +   if (err)
> >> +   goto disable_pm_runtime;
> >> +   }
> >
> > See above where I'm suggesting some different logic. Specifically:
> > always try the drm_panel_of_backlight() call and then fallback to the
> > AUX backlight if "panel->base.backlight" is NULL and "panel->aux" is
> > not NULL.
>
> What I understood:
> 1. Create a new API drm_panel_dp_aux_backlight() in drm_panel.c
> 1.1. Register DP AUX backlight if "struct drm_dp_aux" is given and
>  drm_edp_backlight_supported()
> 2. Create a call back function for backlight ".update_status()" inside
> drm_panel.c ?
>This function should also handle the backlight enable/disable
> operations.
> 3. Use the suggested rules to call drm_panel_dp_aux_backlight() as a
> fallback, if
> no backlight is specified in the DT.
> 4. Remove the @uses_dpcd_backlight flag from panel_desc as this should
> be auto-detected.

This sounds about right to me.
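
To be concrete, I could imagine something roughly like this living in
drm_panel.c (totally untested, and all of these names are just my guess
at how it could look):

    struct dp_aux_backlight {
            struct backlight_device *base;
            struct drm_dp_aux *aux;
            struct drm_edp_backlight_info info;
    };

    static int dp_aux_backlight_update_status(struct backlight_device *bd)
    {
            struct dp_aux_backlight *bl = bl_get_data(bd);
            u16 brightness = backlight_get_brightness(bd);

            /* enable/disable and level changes all go over the AUX channel */
            if (brightness)
                    return drm_edp_backlight_enable(bl->aux, &bl->info,
                                                    brightness);

            return drm_edp_backlight_disable(bl->aux, &bl->info);
    }

...and then drm_panel_dp_aux_backlight() would just check
drm_edp_backlight_supported(), register the backlight device, and stash
it in panel->backlight.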

As per all of my reviews in the DRM subsystem, this is all just my
opinion and if someone more senior in DRM contradicts me then, of
course, you might have to change directions. Hopefully that doesn't
happen but it's always good to give warning...

-Doug


Re: [PATCH 0/4] drm/panfrost: Plumb cycle counters to userspace

2021-05-27 Thread Alyssa Rosenzweig
> The main outstanding question is the proper name. Performance monitoring
> ("PERMON") is the name used by kbase, but it's jargon-y and risks
> confusion with performance counters, an orthogonal mechanism. Cycle
> count is more descriptive and matches the actual hardware name, but
> obscures that the same mechanism is required for GPU timestamps. This
> bit of bikeshedding aside, I'm pleased with the patches.

PANFROST_JD_REQ_CLOCK might be the clearest.


[PATCH 10/10] drm/amdkfd: protect svm_bo ref in case prange has forked

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

Keep track of all the pages inside pranges that reference the
same svm_bo, by using the ref count inside this object. This makes
sure the object is only freed after the last prange is no longer on
any GPU, including references shared between a parent and child
during a fork.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h     | 10 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index acb9f64577a0..c8ca3252cbc2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -245,7 +245,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, 
unsigned long pfn)
struct page *page;
 
page = pfn_to_page(pfn);
-   page->zone_device_data = prange;
+   page->zone_device_data = prange->svm_bo;
get_page(page);
lock_page(page);
 }
@@ -336,6 +336,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
svm_migrate_get_vram_page(prange, migrate->dst[i]);
migrate->dst[i] = migrate_pfn(migrate->dst[i]);
migrate->dst[i] |= MIGRATE_PFN_LOCKED;
+   svm_range_bo_ref(prange->svm_bo);
}
if (migrate->dst[i] & MIGRATE_PFN_VALID) {
spage = migrate_pfn_to_page(migrate->src[i]);
@@ -540,7 +541,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
 
 static void svm_migrate_page_free(struct page *page)
 {
-   /* Keep this function to avoid warning */
+   struct svm_range_bo *svm_bo = page->zone_device_data;
+
+   if (svm_bo) {
+   pr_debug("svm_bo ref left: %d\n", kref_read(&svm_bo->kref));
+   svm_range_bo_unref(svm_bo);
+   }
 }
 
 static int
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 1e15a6170635..2bc20752ee30 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -309,14 +309,6 @@ static bool svm_bo_ref_unless_zero(struct svm_range_bo 
*svm_bo)
return true;
 }
 
-static struct svm_range_bo *svm_range_bo_ref(struct svm_range_bo *svm_bo)
-{
-   if (svm_bo)
-   kref_get(&svm_bo->kref);
-
-   return svm_bo;
-}
-
 static void svm_range_bo_release(struct kref *kref)
 {
struct svm_range_bo *svm_bo;
@@ -355,7 +347,7 @@ static void svm_range_bo_release(struct kref *kref)
kfree(svm_bo);
 }
 
-static void svm_range_bo_unref(struct svm_range_bo *svm_bo)
+void svm_range_bo_unref(struct svm_range_bo *svm_bo)
 {
if (!svm_bo)
return;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 27fbe1936493..21f693767a0d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -150,6 +150,14 @@ static inline void svm_range_unlock(struct svm_range 
*prange)
mutex_unlock(&prange->lock);
 }
 
+static inline struct svm_range_bo *svm_range_bo_ref(struct svm_range_bo 
*svm_bo)
+{
+   if (svm_bo)
+   kref_get(&svm_bo->kref);
+
+   return svm_bo;
+}
+
 int svm_range_list_init(struct kfd_process *p);
 void svm_range_list_fini(struct kfd_process *p);
 int svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op op, uint64_t start,
@@ -178,7 +186,7 @@ void svm_range_dma_unmap(struct device *dev, dma_addr_t 
*dma_addr,
 void svm_range_free_dma_mappings(struct svm_range *prange);
 void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm,
void *owner);
-
+void svm_range_bo_unref(struct svm_range_bo *svm_bo);
 #else
 
 struct kfd_process;
-- 
2.31.1



[PATCH 08/10] drm/amdkfd: add invalid pages debug at vram migration

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

This is for debug purposes only.
It conditionally generates partial migrations to test mixed
CPU/GPU memory domain pages in a prange easily.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 8a3f21d76915..f71f8d7e2b72 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -404,6 +404,20 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, 
struct svm_range *prange,
}
}
 
+#ifdef DEBUG_FORCE_MIXED_DOMAINS
+   for (i = 0, j = 0; i < npages; i += 4, j++) {
+   if (j & 1)
+   continue;
+   svm_migrate_put_vram_page(adev, dst[i]);
+   migrate->dst[i] = 0;
+   svm_migrate_put_vram_page(adev, dst[i + 1]);
+   migrate->dst[i + 1] = 0;
+   svm_migrate_put_vram_page(adev, dst[i + 2]);
+   migrate->dst[i + 2] = 0;
+   svm_migrate_put_vram_page(adev, dst[i + 3]);
+   migrate->dst[i + 3] = 0;
+   }
+#endif
 out:
return r;
 }
-- 
2.31.1



[PATCH 09/10] drm/amdkfd: partially actual_loc removed

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

actual_loc should not be used anymore, as pranges
could have mixed locations (VRAM & SYSRAM) at the
same time.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 71 ++--
 2 files changed, 29 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index f71f8d7e2b72..acb9f64577a0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -501,12 +501,6 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
struct amdgpu_device *adev;
int r = 0;
 
-   if (prange->actual_loc == best_loc) {
-   pr_debug("svms 0x%p [0x%lx 0x%lx] already on best_loc 0x%x\n",
-prange->svms, prange->start, prange->last, best_loc);
-   return 0;
-   }
-
adev = svm_range_get_adev_by_id(prange, best_loc);
if (!adev) {
pr_debug("failed to get device by id 0x%x\n", best_loc);
@@ -791,11 +785,7 @@ int
 svm_migrate_to_vram(struct svm_range *prange, uint32_t best_loc,
struct mm_struct *mm)
 {
-   if  (!prange->actual_loc)
-   return svm_migrate_ram_to_vram(prange, best_loc, mm);
-   else
-   return svm_migrate_vram_to_vram(prange, best_loc, mm);
-
+   return svm_migrate_ram_to_vram(prange, best_loc, mm);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7b50395ec377..1e15a6170635 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1420,42 +1420,38 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
 
svm_range_reserve_bos(&ctx);
 
-   if (!prange->actual_loc) {
-   p = container_of(prange->svms, struct kfd_process, svms);
-   owner = kfd_svm_page_owner(p, find_first_bit(ctx.bitmap,
-   MAX_GPU_INSTANCE));
-   for_each_set_bit(idx, ctx.bitmap, MAX_GPU_INSTANCE) {
-   if (kfd_svm_page_owner(p, idx) != owner) {
-   owner = NULL;
-   break;
-   }
-   }
-   r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
-  prange->start << PAGE_SHIFT,
-  prange->npages, &hmm_range,
-  false, true, owner);
-   if (r) {
-   pr_debug("failed %d to get svm range pages\n", r);
-   goto unreserve_out;
-   }
-
-   r = svm_range_dma_map(prange, ctx.bitmap,
- hmm_range->hmm_pfns);
-   if (r) {
-   pr_debug("failed %d to dma map range\n", r);
-   goto unreserve_out;
+   p = container_of(prange->svms, struct kfd_process, svms);
+   owner = kfd_svm_page_owner(p, find_first_bit(ctx.bitmap,
+   MAX_GPU_INSTANCE));
+   for_each_set_bit(idx, ctx.bitmap, MAX_GPU_INSTANCE) {
+   if (kfd_svm_page_owner(p, idx) != owner) {
+   owner = NULL;
+   break;
}
+   }
+   r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
+  prange->start << PAGE_SHIFT,
+  prange->npages, &hmm_range,
+  false, true, owner);
+   if (r) {
+   pr_debug("failed %d to get svm range pages\n", r);
+   goto unreserve_out;
+   }
 
-   prange->validated_once = true;
+   r = svm_range_dma_map(prange, ctx.bitmap,
+ hmm_range->hmm_pfns);
+   if (r) {
+   pr_debug("failed %d to dma map range\n", r);
+   goto unreserve_out;
}
 
+   prange->validated_once = true;
+
svm_range_lock(prange);
-   if (!prange->actual_loc) {
-   if (amdgpu_hmm_range_get_pages_done(hmm_range)) {
-   pr_debug("hmm update the range, need validate again\n");
-   r = -EAGAIN;
-   goto unlock_out;
-   }
+   if (amdgpu_hmm_range_get_pages_done(hmm_range)) {
+   pr_debug("hmm update the range, need validate again\n");
+   r = -EAGAIN;
+   goto unlock_out;
}
if (!list_empty(&prange->child_list)) {
pr_debug("range split by unmap in parallel, validate again\n");
@@ -2740,20 +2736,9 @@ svm_range_trigger_migration(struct mm_struct *mm, struct 
svm_range *prange,
*migrated = false;

[PATCH 07/10] drm/amdkfd: skip migration for pages already in VRAM

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

Skip migration for pages that are already in the VRAM
domain. These could be the result of previous partial
migrations to system RAM followed by a prefetch back to VRAM,
e.g. coherent pages in VRAM that were not written/invalidated
after a copy-on-write.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 6fd68528c425..8a3f21d76915 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -329,14 +329,15 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, 
struct svm_range *prange,
for (i = j = 0; i < npages; i++) {
struct page *spage;
 
-   dst[i] = vram_addr + (j << PAGE_SHIFT);
-   migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
-   svm_migrate_get_vram_page(prange, migrate->dst[i]);
-
-   migrate->dst[i] = migrate_pfn(migrate->dst[i]);
-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
-
-   if (migrate->src[i] & MIGRATE_PFN_VALID) {
+   spage = migrate_pfn_to_page(migrate->src[i]);
+   if (spage && !is_zone_device_page(spage)) {
+   dst[i] = vram_addr + (j << PAGE_SHIFT);
+   migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
+   svm_migrate_get_vram_page(prange, migrate->dst[i]);
+   migrate->dst[i] = migrate_pfn(migrate->dst[i]);
+   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
+   }
+   if (migrate->dst[i] & MIGRATE_PFN_VALID) {
spage = migrate_pfn_to_page(migrate->src[i]);
src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
  DMA_TO_DEVICE);
-- 
2.31.1



[PATCH 06/10] drm/amdkfd: skip invalid pages during migrations

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

Invalid pages can be the result of pages that have already been
migrated due to the copy-on-write procedure, or pages that were never
migrated to VRAM in the first place. This is not an issue anymore,
as pranges now support mixed memory domains (CPU/GPU).

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 38 +++-
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index b298aa8dea4d..6fd68528c425 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -419,7 +419,6 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
size_t size;
void *buf;
int r = -ENOMEM;
-   int retry = 0;
 
memset(&migrate, 0, sizeof(migrate));
migrate.vma = vma;
@@ -438,7 +437,6 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate.dst = migrate.src + npages;
scratch = (dma_addr_t *)(migrate.dst + npages);
 
-retry:
r = migrate_vma_setup(&migrate);
if (r) {
pr_debug("failed %d prepare migrate svms 0x%p [0x%lx 0x%lx]\n",
@@ -446,17 +444,9 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
goto out_free;
}
if (migrate.cpages != npages) {
-   pr_debug("collect 0x%lx/0x%llx pages, retry\n", migrate.cpages,
+   pr_debug("Partial migration. 0x%lx/0x%llx pages can be 
migrated\n",
+migrate.cpages,
 npages);
-   migrate_vma_finalize(&migrate);
-   if (retry++ >= 3) {
-   r = -ENOMEM;
-   pr_debug("failed %d migrate svms 0x%p [0x%lx 0x%lx]\n",
-r, prange->svms, prange->start, prange->last);
-   goto out_free;
-   }
-
-   goto retry;
}
 
if (migrate.cpages) {
@@ -547,9 +537,8 @@ static void svm_migrate_page_free(struct page *page)
 static int
 svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
struct migrate_vma *migrate, struct dma_fence **mfence,
-   dma_addr_t *scratch)
+   dma_addr_t *scratch, uint64_t npages)
 {
-   uint64_t npages = migrate->cpages;
struct device *dev = adev->dev;
uint64_t *src;
dma_addr_t *dst;
@@ -566,15 +555,23 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, 
struct svm_range *prange,
src = (uint64_t *)(scratch + npages);
dst = scratch;
 
-   for (i = 0, j = 0; i < npages; i++, j++, addr += PAGE_SIZE) {
+   for (i = 0, j = 0; i < npages; i++, addr += PAGE_SIZE) {
struct page *spage;
 
spage = migrate_pfn_to_page(migrate->src[i]);
-   if (!spage) {
-   pr_debug("failed get spage svms 0x%p [0x%lx 0x%lx]\n",
+   if (!spage || !is_zone_device_page(spage)) {
+   pr_debug("invalid page. Could be in CPU already svms 
0x%p [0x%lx 0x%lx]\n",
 prange->svms, prange->start, prange->last);
-   r = -ENOMEM;
-   goto out_oom;
+   if (j) {
+   r = svm_migrate_copy_memory_gart(adev, dst + i 
- j,
+src + i - j, j,
+
FROM_VRAM_TO_RAM,
+mfence);
+   if (r)
+   goto out_oom;
+   j = 0;
+   }
+   continue;
}
src[i] = svm_migrate_addr(adev, spage);
if (i > 0 && src[i] != src[i - 1] + PAGE_SIZE) {
@@ -607,6 +604,7 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
 
migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
migrate->dst[i] |= MIGRATE_PFN_LOCKED;
+   j++;
}
 
r = svm_migrate_copy_memory_gart(adev, dst + i - j, src + i - j, j,
@@ -664,7 +662,7 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
 
if (migrate.cpages) {
r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
-   scratch);
+   scratch, npages);
migrate_vma_pages(&migrate);
svm_migrate_copy_done(adev, mfence);
migrate_vma_finalize(&migrate);
-- 
2.31.1



[PATCH 05/10] drm/amdkfd: classify and map mixed svm range pages in GPU

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

[Why]
svm ranges can have mixed pages from device or system memory.
A good example: after a prange has been allocated in VRAM, a
copy-on-write triggered by a fork invalidates some pages inside the
prange, ending up with mixed pages.

[How]
Classify each page inside a prange based on its type (device or
system memory) during the dma mapping call. If a page corresponds to
the VRAM domain, a flag is set in its dma_addr entry for each GPU.
Then, at GPU page table mapping time, all groups of contiguous pages
of the same type are mapped with their proper pte flags.

v2:
Instead of using ttm_res to calculate vram pfns in the svm_range,
this is now done by setting the real vram physical address into the
dma_addr array. This makes VRAM management more flexible, plus it
removes the need to have a BO reference in the svm_range.

v3:
Remove mapping member from svm_range

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 72 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 +-
 2 files changed, 46 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 2b4318646a75..7b50395ec377 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -119,11 +119,12 @@ static void svm_range_remove_notifier(struct svm_range 
*prange)
 }
 
 static int
-svm_range_dma_map_dev(struct device *dev, dma_addr_t **dma_addr,
+svm_range_dma_map_dev(struct amdgpu_device *adev, dma_addr_t **dma_addr,
  unsigned long *hmm_pfns, uint64_t npages)
 {
enum dma_data_direction dir = DMA_BIDIRECTIONAL;
dma_addr_t *addr = *dma_addr;
+   struct device *dev = adev->dev;
struct page *page;
int i, r;
 
@@ -141,6 +142,14 @@ svm_range_dma_map_dev(struct device *dev, dma_addr_t 
**dma_addr,
dma_unmap_page(dev, addr[i], PAGE_SIZE, dir);
 
page = hmm_pfn_to_page(hmm_pfns[i]);
+   if (is_zone_device_page(page)) {
+   addr[i] = (hmm_pfns[i] << PAGE_SHIFT) +
+  adev->vm_manager.vram_base_offset -
+  adev->kfd.dev->pgmap.range.start;
+   addr[i] |= SVM_RANGE_VRAM_DOMAIN;
+   pr_debug("vram address detected: 0x%llx\n", addr[i]);
+   continue;
+   }
addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
r = dma_mapping_error(dev, addr[i]);
if (r) {
@@ -175,7 +184,7 @@ svm_range_dma_map(struct svm_range *prange, unsigned long 
*bitmap,
}
adev = (struct amdgpu_device *)pdd->dev->kgd;
 
-   r = svm_range_dma_map_dev(adev->dev, &prange->dma_addr[gpuidx],
+   r = svm_range_dma_map_dev(adev, &prange->dma_addr[gpuidx],
  hmm_pfns, prange->npages);
if (r)
break;
@@ -1003,21 +1012,22 @@ svm_range_split_by_granularity(struct kfd_process *p, 
struct mm_struct *mm,
 }
 
 static uint64_t
-svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange)
+svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange,
+   int domain)
 {
struct amdgpu_device *bo_adev;
uint32_t flags = prange->flags;
uint32_t mapping_flags = 0;
uint64_t pte_flags;
-   bool snoop = !prange->ttm_res;
+   bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN);
bool coherent = flags & KFD_IOCTL_SVM_FLAG_COHERENT;
 
-   if (prange->svm_bo && prange->ttm_res)
+   if (domain == SVM_RANGE_VRAM_DOMAIN)
bo_adev = amdgpu_ttm_adev(prange->svm_bo->bo->tbo.bdev);
 
switch (adev->asic_type) {
case CHIP_ARCTURUS:
-   if (prange->svm_bo && prange->ttm_res) {
+   if (domain == SVM_RANGE_VRAM_DOMAIN) {
if (bo_adev == adev) {
mapping_flags |= coherent ?
AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW;
@@ -1032,7 +1042,7 @@ svm_range_get_pte_flags(struct amdgpu_device *adev, 
struct svm_range *prange)
}
break;
case CHIP_ALDEBARAN:
-   if (prange->svm_bo && prange->ttm_res) {
+   if (domain == SVM_RANGE_VRAM_DOMAIN) {
if (bo_adev == adev) {
mapping_flags |= coherent ?
AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW;
@@ -1061,14 +1071,14 @@ svm_range_get_pte_flags(struct amdgpu_device *adev, 
struct svm_range *prange)
mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
 
pte_flags = AMDGPU_PTE_VALID;
-   pte_flags |= prange->ttm_res ? 0 : AMDGPU_PTE_SYSTEM;
+   pte_flags |= (domain == SVM_RANGE_VRAM_DOMAIN) ? 0 : AMDGPU_PTE_SYSTEM;

[PATCH 03/10] drm/amdkfd: set owner ref to svm range prefault

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

svm_range_prefault is called right before migrations to VRAM, to make
sure pages are resident in system memory before the migration. With
partial migrations, this owner reference is used by hmm range get pages
to avoid migrating pages that are already in the same VRAM domain.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 5 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h     | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 11f7f590c6ec..b298aa8dea4d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -512,7 +512,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
 prange->start, prange->last, best_loc);
 
/* FIXME: workaround for page locking bug with invalid pages */
-   svm_range_prefault(prange, mm);
+   svm_range_prefault(prange, mm, SVM_ADEV_PGMAP_OWNER(adev));
 
start = prange->start << PAGE_SHIFT;
end = (prange->last + 1) << PAGE_SHIFT;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index b939f353ac8c..54f47b09b14a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2646,7 +2646,8 @@ svm_range_best_prefetch_location(struct svm_range *prange)
 /* FIXME: This is a workaround for page locking bug when some pages are
  * invalid during migration to VRAM
  */
-void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm)
+void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm,
+   void *owner)
 {
struct hmm_range *hmm_range;
int r;
@@ -2657,7 +2658,7 @@ void svm_range_prefault(struct svm_range *prange, struct 
mm_struct *mm)
r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
   prange->start << PAGE_SHIFT,
   prange->npages, &hmm_range,
-  false, true, NULL);
+  false, true, owner);
if (!r) {
amdgpu_hmm_range_get_pages_done(hmm_range);
prange->validated_once = true;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 4297250f259d..08542fe39303 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -176,7 +176,8 @@ void schedule_deferred_list_work(struct svm_range_list 
*svms);
 void svm_range_dma_unmap(struct device *dev, dma_addr_t *dma_addr,
 unsigned long offset, unsigned long npages);
 void svm_range_free_dma_mappings(struct svm_range *prange);
-void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm);
+void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm,
+   void *owner);
 
 #else
 
-- 
2.31.1



[PATCH 04/10] drm/amdgpu: get owner ref in validate and map

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

Get the proper owner reference for the amdgpu_hmm_range_get_pages
function. This is useful for partial migrations, to avoid migrating
back to system memory VRAM pages that are accessible by all devices
in the same memory domain, e.g. multiple devices in the same hive.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 54f47b09b14a..2b4318646a75 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1313,6 +1313,17 @@ static void svm_range_unreserve_bos(struct 
svm_validate_context *ctx)
ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
 }
 
+static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx)
+{
+   struct kfd_process_device *pdd;
+   struct amdgpu_device *adev;
+
+   pdd = kfd_process_device_from_gpuidx(p, gpuidx);
+   adev = (struct amdgpu_device *)pdd->dev->kgd;
+
+   return SVM_ADEV_PGMAP_OWNER(adev);
+}
+
 /*
  * Validation+GPU mapping with concurrent invalidation (MMU notifiers)
  *
@@ -1343,6 +1354,9 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
 {
struct svm_validate_context ctx;
struct hmm_range *hmm_range;
+   struct kfd_process *p;
+   void *owner;
+   int32_t idx;
int r = 0;
 
ctx.process = container_of(prange->svms, struct kfd_process, svms);
@@ -1389,10 +1403,19 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
svm_range_reserve_bos(&ctx);
 
if (!prange->actual_loc) {
+   p = container_of(prange->svms, struct kfd_process, svms);
+   owner = kfd_svm_page_owner(p, find_first_bit(ctx.bitmap,
+   MAX_GPU_INSTANCE));
+   for_each_set_bit(idx, ctx.bitmap, MAX_GPU_INSTANCE) {
+   if (kfd_svm_page_owner(p, idx) != owner) {
+   owner = NULL;
+   break;
+   }
+   }
r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
   prange->start << PAGE_SHIFT,
   prange->npages, &hmm_range,
-  false, true, NULL);
+  false, true, owner);
if (r) {
pr_debug("failed %d to get svm range pages\n", r);
goto unreserve_out;
-- 
2.31.1



[PATCH 02/10] drm/amdkfd: add owner ref param to get hmm pages

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

The parameter is used as the dev_private_owner to decide whether
device pages in the range need to be migrated back to system memory,
based on whether or not they are in the same memory domain. In this
case, the reference could come from the same memory domain, with
devices connected to the same hive.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c    | 4 ++--
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 2741c28ff1b5..378c238c2099 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -160,7 +160,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
   struct mm_struct *mm, struct page **pages,
   uint64_t start, uint64_t npages,
   struct hmm_range **phmm_range, bool readonly,
-  bool mmap_locked)
+  bool mmap_locked, void *owner)
 {
struct hmm_range *hmm_range;
unsigned long timeout;
@@ -185,6 +185,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
hmm_range->hmm_pfns = pfns;
hmm_range->start = start;
hmm_range->end = start + npages * PAGE_SIZE;
+   hmm_range->dev_private_owner = owner;
 
/* Assuming 512MB takes maxmium 1 second to fault page address */
timeout = max(npages >> 17, 1ULL) * HMM_RANGE_DEFAULT_TIMEOUT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
index 7f7d37a457c3..14a3c1864085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
@@ -34,7 +34,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
   struct mm_struct *mm, struct page **pages,
   uint64_t start, uint64_t npages,
   struct hmm_range **phmm_range, bool readonly,
-  bool mmap_locked);
+  bool mmap_locked, void *owner);
 int amdgpu_hmm_range_get_pages_done(struct hmm_range *hmm_range);
 
 #if defined(CONFIG_HMM_MIRROR)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 7e7d8330d64b..c13f7fbfc070 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -709,7 +709,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
struct page **pages)
readonly = amdgpu_ttm_tt_is_readonly(ttm);
r = amdgpu_hmm_range_get_pages(&bo->notifier, mm, pages, start,
   ttm->num_pages, >t->range, readonly,
-  false);
+  false, NULL);
 out_putmm:
mmput(mm);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index b665e9ff77e3..b939f353ac8c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1392,7 +1392,7 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
   prange->start << PAGE_SHIFT,
   prange->npages, &hmm_range,
-  false, true);
+  false, true, NULL);
if (r) {
pr_debug("failed %d to get svm range pages\n", r);
goto unreserve_out;
@@ -2657,7 +2657,7 @@ void svm_range_prefault(struct svm_range *prange, struct 
mm_struct *mm)
r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
   prange->start << PAGE_SHIFT,
   prange->npages, &hmm_range,
-  false, true);
+  false, true, NULL);
if (!r) {
amdgpu_hmm_range_get_pages_done(hmm_range);
prange->validated_once = true;
-- 
2.31.1



[PATCH 01/10] drm/amdkfd: device pgmap owner at the svm migrate init

2021-05-27 Thread Felix Kuehling
From: Alex Sierra 

The pgmap owner member set at svm migrate init can reference either
the adev or the hive, depending on device topology.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h     | 3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index fd8f544f0de2..11f7f590c6ec 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -426,7 +426,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate.start = start;
migrate.end = end;
migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
-   migrate.pgmap_owner = adev;
+   migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
 
size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
size *= npages;
@@ -641,7 +641,7 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate.start = start;
migrate.end = end;
migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
-   migrate.pgmap_owner = adev;
+   migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
 
size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
size *= npages;
@@ -907,7 +907,7 @@ int svm_migrate_init(struct amdgpu_device *adev)
pgmap->range.start = res->start;
pgmap->range.end = res->end;
pgmap->ops = &svm_migrate_pgmap_ops;
-   pgmap->owner = adev;
+   pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
r = devm_memremap_pages(adev->dev, pgmap);
if (IS_ERR(r)) {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 573f984b81fe..4297250f259d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -35,6 +35,9 @@
 #include "amdgpu.h"
 #include "kfd_priv.h"
 
+#define SVM_ADEV_PGMAP_OWNER(adev)\
+   ((adev)->hive ? (void *)(adev)->hive : (void *)(adev))
+
 struct svm_range_bo {
struct amdgpu_bo*bo;
struct kref kref;
-- 
2.31.1



[PATCH 4/4] drm/panfrost: Handle PANFROST_JD_REQ_PERMON

2021-05-27 Thread alyssa . rosenzweig
From: Alyssa Rosenzweig 

If a job requires cycle counters or timestamps, we must enable cycle
counting just before issuing the job, and disable as soon as the job
completes.

Since this extends the UABI, we bump the driver minor version and date.
That lets userspace detect cycle counter support, and only advertise
features like ARB_shader_clock on kernels with this commit.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 10 +++---
 drivers/gpu/drm/panfrost/panfrost_job.c |  6 ++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ca07098a6..0f11d2df4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -20,6 +20,10 @@
 #include "panfrost_gpu.h"
 #include "panfrost_perfcnt.h"
 
+#define JOB_REQUIREMENTS \
+   (PANFROST_JD_REQ_FS | \
+PANFROST_JD_REQ_PERMON)
+
 static bool unstable_ioctls;
 module_param_unsafe(unstable_ioctls, bool, 0600);
 
@@ -247,7 +251,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, 
void *data,
if (!args->jc)
return -EINVAL;
 
-   if (args->requirements && args->requirements != PANFROST_JD_REQ_FS)
+   if (args->requirements & ~JOB_REQUIREMENTS)
return -EINVAL;
 
if (args->out_sync > 0) {
@@ -557,9 +561,9 @@ static const struct drm_driver panfrost_drm_driver = {
.fops   = &panfrost_drm_driver_fops,
.name   = "panfrost",
.desc   = "panfrost DRM",
-   .date   = "20180908",
+   .date   = "20210527",
.major  = 1,
-   .minor  = 1,
+   .minor  = 2,
 
.gem_create_object  = panfrost_gem_create_object,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1..b78147e3d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -165,6 +165,9 @@ static void panfrost_job_hw_submit(struct panfrost_job 
*job, int js)
return;
}
 
+   if (job->requirements & PANFROST_JD_REQ_PERMON)
+   panfrost_acquire_permon(job->pfdev);
+
cfg = panfrost_mmu_as_get(pfdev, &job->file_priv->mmu);
 
job_write(pfdev, JS_HEAD_NEXT_LO(js), jc_head & 0x);
@@ -296,6 +299,9 @@ static void panfrost_job_cleanup(struct kref *ref)
kvfree(job->bos);
}
 
+   if (job->requirements & PANFROST_JD_REQ_PERMON)
+   panfrost_release_permon(job->pfdev);
+
kfree(job);
 }
 
-- 
2.30.2



[PATCH 3/4] drm/panfrost: Add permon acquire/release helpers

2021-05-27 Thread alyssa . rosenzweig
From: Alyssa Rosenzweig 

Wrap the underlying CYCLE_COUNT_START/STOP commands in a safe interface
that ensures the commands are only issued where required by guarding
behind an atomic counter. In particular, we need to be careful about
races between multiple in-flight jobs, where only some require cycle
counts.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  3 +++
 drivers/gpu/drm/panfrost/panfrost_gpu.c    | 20 
 drivers/gpu/drm/panfrost/panfrost_gpu.h    |  3 +++
 3 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 597cf1459..8a89aa274 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -117,6 +117,9 @@ struct panfrost_device {
struct shrinker shrinker;
 
struct panfrost_devfreq pfdevfreq;
+
+   /* Number of active jobs requiring performance monitoring */
+   atomic_t permon_pending;
 };
 
 struct panfrost_mmu {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 2aae636f1..acacceb15 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -399,3 +399,23 @@ u32 panfrost_gpu_get_latest_flush_id(struct 
panfrost_device *pfdev)
 
return 0;
 }
+
+void panfrost_acquire_permon(struct panfrost_device *pfdev)
+{
+   /* If another in-flight job enabled permon, we don't have to */
+   if (atomic_inc_return(&pfdev->permon_pending) > 1)
+   return;
+
+   /* Otherwise, we're the first user */
+   gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
+}
+
+void panfrost_release_permon(struct panfrost_device *pfdev)
+{
+   /* If another in-flight job needs permon, keep it active */
+   if (atomic_dec_return(&pfdev->permon_pending) > 0)
+   return;
+
+   /* Otherwise, we're the last user */
+   gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.h 
b/drivers/gpu/drm/panfrost/panfrost_gpu.h
index 468c51e7e..01a91af09 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.h
@@ -18,4 +18,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev);
 
 void panfrost_gpu_amlogic_quirk(struct panfrost_device *pfdev);
 
+void panfrost_acquire_permon(struct panfrost_device *pfdev);
+void panfrost_release_permon(struct panfrost_device *pfdev);
+
 #endif
-- 
2.30.2



[PATCH 2/4] drm/panfrost: Add CYCLE_COUNT_START/STOP commands

2021-05-27 Thread alyssa . rosenzweig
From: Alyssa Rosenzweig 

Add additional values of GPU_COMMAND required to enable and disable the
cycle (and timestamp) counters. Values from mali_kbase.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index eddaa62ad..8ac60de6f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -46,6 +46,8 @@
 #define   GPU_CMD_SOFT_RESET   0x01
 #define   GPU_CMD_PERFCNT_CLEAR0x03
 #define   GPU_CMD_PERFCNT_SAMPLE   0x04
+#define   GPU_CMD_CYCLE_COUNT_START0x05
+#define   GPU_CMD_CYCLE_COUNT_STOP 0x06
 #define   GPU_CMD_CLEAN_CACHES 0x07
 #define   GPU_CMD_CLEAN_INV_CACHES 0x08
 #define GPU_STATUS 0x34
-- 
2.30.2



[PATCH 1/4] drm/panfrost: Add cycle counter job requirement

2021-05-27 Thread alyssa . rosenzweig
From: Alyssa Rosenzweig 

Extend the Panfrost UABI with a new job requirement for cycle counters
(and GPU timestamps, by extension). This requirement is used in
userspace to implement ARB_shader_clock, an OpenGL extension reporting
the GPU cycle count within a shader. The same mechanism will be required
to implement timestamp queries as a "write value - timestamp" job.

We cannot enable cycle counters unconditionally, as enabling them
increases GPU power consumption. They should be left off unless actually
required by the application for profiling purposes.

Signed-off-by: Alyssa Rosenzweig 
---
 include/uapi/drm/panfrost_drm.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index ec19db1ee..27e6cb941 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -39,7 +39,8 @@ extern "C" {
 #define DRM_IOCTL_PANFROST_PERFCNT_ENABLE  DRM_IOW(DRM_COMMAND_BASE + 
DRM_PANFROST_PERFCNT_ENABLE, struct drm_panfrost_perfcnt_enable)
 #define DRM_IOCTL_PANFROST_PERFCNT_DUMP
DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_PERFCNT_DUMP, struct 
drm_panfrost_perfcnt_dump)
 
-#define PANFROST_JD_REQ_FS (1 << 0)
+#define PANFROST_JD_REQ_FS (1 << 0)
+#define PANFROST_JD_REQ_PERMON (1 << 1)
 /**
  * struct drm_panfrost_submit - ioctl argument for submitting commands to the 
3D
  * engine.
-- 
2.30.2



[PATCH 0/4] drm/panfrost: Plumb cycle counters to userspace

2021-05-27 Thread alyssa . rosenzweig
From: Alyssa Rosenzweig 

Mali has hardware cycle counters (and GPU timestamps) available for
profiling. These are exposed in various ways:

- Kernel: As CYCLE_COUNT and TIMESTAMP registers 
- Job chain: As WRITE_VALUE descriptors
- Shader (Midgard): As LD_SPECIAL selectors
- Shader (Bifrost): As the LD_GCLK.u64 instruction

These form building blocks for profiling features, for example the
ARB_shader_clock extension which accesses the counters from an
application's shader.

The counters consume power, so it is recommended to disable the counters
when not in use. To do so, we follow the strategy from mali_kbase: add a
counter requirement to the job, start the counters only when required,
and stop them as quickly as possible.
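
On the userspace side, requesting the counters is then just a matter of
setting the new submit flag, roughly like this (the fd, job chain
address and fallback are placeholders):

    #include <xf86drm.h>
    #include <drm/panfrost_drm.h>

    struct drm_panfrost_submit submit = {
            .jc = job_chain_gpu_va,
            .requirements = PANFROST_JD_REQ_PERMON,
    };

    /* kernels without this series reject the unknown bit with -EINVAL */
    if (drmIoctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit))
            disable_shader_clock();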

The new UABI will be used in Mesa. An implementation of ARB_shader_clock
using this UABI is available as a pending upstream merge request [1].
The implementation passes the relevant piglit test, validating both the
kernel and mesa.

The main outstanding question is the proper name. Performance monitoring
("PERMON") is the name used by kbase, but it's jargon-y and risks
confusion with performance counters, an orthogonal mechanism. Cycle
count is more descriptive and matches the actual hardware name, but
obscures that the same mechanism is required for GPU timestamps. This
bit of bikeshedding aside, I'm pleased with the patches.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/11051

Alyssa Rosenzweig (4):
  drm/panfrost: Add cycle counter job requirement
  drm/panfrost: Add CYCLE_COUNT_START/STOP commands
  drm/panfrost: Add permon acquire/release helpers
  drm/panfrost: Handle PANFROST_JD_REQ_PERMON

 drivers/gpu/drm/panfrost/panfrost_device.h |  3 +++
 drivers/gpu/drm/panfrost/panfrost_drv.c    | 10 +++---
 drivers/gpu/drm/panfrost/panfrost_gpu.c    | 20 
 drivers/gpu/drm/panfrost/panfrost_gpu.h    |  3 +++
 drivers/gpu/drm/panfrost/panfrost_job.c    |  6 ++
 drivers/gpu/drm/panfrost/panfrost_regs.h   |  2 ++
 include/uapi/drm/panfrost_drm.h            |  3 ++-
 7 files changed, 43 insertions(+), 4 deletions(-)

-- 
2.30.2



Re: [PATCH v5 3/3] drm_dp_cec: add MST support

2021-05-27 Thread Lyude Paul
On Tue, 2021-05-25 at 10:59 +1000, Sam McNally wrote:
> With DP v2.0 errata E5, CEC tunneling can be supported through an MST
> topology.
> 
> When tunneling CEC through an MST port, CEC IRQs are delivered via a
> sink event notify message; when a sink event notify message is received,
> trigger CEC IRQ handling - ESI1 is not used for remote CEC IRQs so its
> value is not checked.
> 
> Register and unregister for all MST connectors, ensuring their
> drm_dp_aux_cec struct won't be accessed uninitialized.
> 
> Reviewed-by: Hans Verkuil 
> Signed-off-by: Sam McNally 
> ---
> 
> (no changes since v4)
> 
> Changes in v4:
> - Removed use of work queues
> - Updated checks of aux.transfer to accept aux.is_remote
> 
> Changes in v3:
> - Fixed whitespace in drm_dp_cec_mst_irq_work()
> - Moved drm_dp_cec_mst_set_edid_work() with the other set_edid functions
> 
> Changes in v2:
> - Used aux->is_remote instead of aux->cec.is_mst, removing the need for
>   the previous patch in the series
> - Added a defensive check for null edid in the deferred set_edid work,
>   in case the edid is no longer valid at that point
> 
>  drivers/gpu/drm/drm_dp_cec.c          | 20 
>  drivers/gpu/drm/drm_dp_mst_topology.c | 24 
>  2 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
> index 3ab2609f9ec7..1abd3f4654dc 100644
> --- a/drivers/gpu/drm/drm_dp_cec.c
> +++ b/drivers/gpu/drm/drm_dp_cec.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Unfortunately it turns out that we have a chicken-and-egg situation
> @@ -245,13 +246,22 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
> int ret;
>  
> /* No transfer function was set, so not a DP connector */
> -   if (!aux->transfer)
> +   if (!aux->transfer && !aux->is_remote)
> return;
>  
> mutex_lock(&aux->cec.lock);
> if (!aux->cec.adap)
> goto unlock;
>  
> +   if (aux->is_remote) {
> +   /*
> +    * For remote connectors, CEC IRQ is triggered by an
> explicit
> +    * message so ESI1 is not involved.
> +    */
> +   drm_dp_cec_handle_irq(aux);
> +   goto unlock;
> +   }
> +
> ret = drm_dp_dpcd_readb(aux, DP_DEVICE_SERVICE_IRQ_VECTOR_ESI1,
> &cec_irq);
> if (ret < 0 || !(cec_irq & DP_CEC_IRQ))
> @@ -307,7 +317,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const
> struct edid *edid)
> u8 cap;
>  
> /* No transfer function was set, so not a DP connector */
> -   if (!aux->transfer)
> +   if (!aux->transfer && !aux->is_remote)
> return;
>  
>  #ifndef CONFIG_MEDIA_CEC_RC
> @@ -375,6 +385,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const
> struct edid *edid)
>  unlock:
> mutex_unlock(&aux->cec.lock);
>  }
> +
>  EXPORT_SYMBOL(drm_dp_cec_set_edid);

probably want to get rid of this whitespace

With that fixed, this is:

Reviewed-by: Lyude Paul 

>  
>  /*
> @@ -383,7 +394,7 @@ EXPORT_SYMBOL(drm_dp_cec_set_edid);
>  void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
>  {
> /* No transfer function was set, so not a DP connector */
> -   if (!aux->transfer)
> +   if (!aux->transfer && !aux->is_remote)
> return;
>  
> cancel_delayed_work_sync(&aux->cec.unregister_work);
> @@ -393,6 +404,7 @@ void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
> goto unlock;
>  
> cec_phys_addr_invalidate(aux->cec.adap);
> +
> /*
>  * We're done if we want to keep the CEC device
>  * (drm_dp_cec_unregister_delay is >= NEVER_UNREG_DELAY) or if the
> @@ -428,7 +440,7 @@ void drm_dp_cec_register_connector(struct drm_dp_aux
> *aux,
>    struct drm_connector *connector)
>  {
> WARN_ON(aux->cec.adap);
> -   if (WARN_ON(!aux->transfer))
> +   if (WARN_ON(!aux->transfer && !aux->is_remote))
> return;
> aux->cec.connector = connector;
> INIT_DELAYED_WORK(&aux->cec.unregister_work,
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 29aad3b6b31a..5612caf9fb49 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2359,6 +2359,8 @@ static void build_mst_prop_path(const struct
> drm_dp_mst_branch *mstb,
>  int drm_dp_mst_connector_late_register(struct drm_connector *connector,
>    struct drm_dp_mst_port *port)
>  {
> +   drm_dp_cec_register_connector(&port->aux, connector);
> +
> drm_dbg_kms(port->mgr->dev, "registering %s remote bus for %s\n",
>     port->aux.name, connector->kdev->kobj.name);
>  
> @@ -2382,6 +2384,8 @@ void drm_dp_mst_connector_early_unregister(struct
> drm_connector

Re: [PATCH v5 1/3] drm/dp_mst: Add self-tests for up requests

2021-05-27 Thread Lyude Paul
On Tue, 2021-05-25 at 10:59 +1000, Sam McNally wrote:
> Up requests are decoded by drm_dp_sideband_parse_req(), which operates
> on a drm_dp_sideband_msg_rx, unlike down requests. Expand the existing
> self-test helper sideband_msg_req_encode_decode() to copy the message
> contents and length from a drm_dp_sideband_msg_tx to
> drm_dp_sideband_msg_rx and use the parse function under test in place of
> decode. Add an additional helper for testing clearly-invalid up
> messages, verifying that parse rejects them.
> 
> Add support for currently-supported up requests to
> drm_dp_dump_sideband_msg_req_body(); add support to
> drm_dp_encode_sideband_req() to allow encoding for the self-tests.
> 
> Add self-tests for CONNECTION_STATUS_NOTIFY and RESOURCE_STATUS_NOTIFY.
> 
> Signed-off-by: Sam McNally 
> ---
> 
> Changes in v5:
> - Set mock device name to more clearly attribute error/debug logging to
>   the self-test, in particular for cases where failures are expected
> 
> Changes in v4:
> - New in v4
> 
>  drivers/gpu/drm/drm_dp_mst_topology.c      |  54 ++-
>  .../gpu/drm/drm_dp_mst_topology_internal.h |   4 +
>  .../drm/selftests/test-drm_dp_mst_helper.c | 149 --
>  3 files changed, 192 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 54604633e65c..573f39a3dc16 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -442,6 +442,37 @@ drm_dp_encode_sideband_req(const struct
> drm_dp_sideband_msg_req_body *req,
> idx++;
> }
> break;
> +   case DP_CONNECTION_STATUS_NOTIFY: {
> +   const struct drm_dp_connection_status_notify *msg;
> +
> +   msg = &req->u.conn_stat;
> +   buf[idx] = (msg->port_number & 0xf) << 4;
> +   idx++;
> +   memcpy(&raw->msg[idx], msg->guid, 16);
> +   idx += 16;
> +   raw->msg[idx] = 0;
> +   raw->msg[idx] |= msg->legacy_device_plug_status ? BIT(6) :
> 0;
> +   raw->msg[idx] |= msg->displayport_device_plug_status ?
> BIT(5) : 0;
> +   raw->msg[idx] |= msg->message_capability_status ? BIT(4) :
> 0;
> +   raw->msg[idx] |= msg->input_port ? BIT(3) : 0;
> +   raw->msg[idx] |= FIELD_PREP(GENMASK(2, 0), msg-
> >peer_device_type);
> +   idx++;
> +   break;
> +   }
> +   case DP_RESOURCE_STATUS_NOTIFY: {
> +   const struct drm_dp_resource_status_notify *msg;
> +
> +   msg = &req->u.resource_stat;
> +   buf[idx] = (msg->port_number & 0xf) << 4;
> +   idx++;
> +   memcpy(&raw->msg[idx], msg->guid, 16);
> +   idx += 16;
> +   buf[idx] = (msg->available_pbn & 0xff00) >> 8;
> +   idx++;
> +   buf[idx] = (msg->available_pbn & 0xff);
> +   idx++;
> +   break;
> +   }
> }
> raw->cur_len = idx;
>  }
> @@ -672,6 +703,22 @@ drm_dp_dump_sideband_msg_req_body(const struct
> drm_dp_sideband_msg_req_body *req
>   req->u.enc_status.stream_behavior,
>   req->u.enc_status.valid_stream_behavior);
> break;
> +   case DP_CONNECTION_STATUS_NOTIFY:
> +   P("port=%d guid=%*ph legacy=%d displayport=%d messaging=%d
> input=%d peer_type=%d",
> + req->u.conn_stat.port_number,
> + (int)ARRAY_SIZE(req->u.conn_stat.guid), req-
> >u.conn_stat.guid,
> + req->u.conn_stat.legacy_device_plug_status,
> + req->u.conn_stat.displayport_device_plug_status,
> + req->u.conn_stat.message_capability_status,
> + req->u.conn_stat.input_port,
> + req->u.conn_stat.peer_device_type);
> +   break;
> +   case DP_RESOURCE_STATUS_NOTIFY:
> +   P("port=%d guid=%*ph pbn=%d",
> + req->u.resource_stat.port_number,
> + (int)ARRAY_SIZE(req->u.resource_stat.guid), req-
> >u.resource_stat.guid,
> + req->u.resource_stat.available_pbn);
> +   break;
> default:
> P("???\n");
> break;
> @@ -1116,9 +1163,9 @@ static bool
> drm_dp_sideband_parse_resource_status_notify(const struct drm_dp_mst
> return false;
>  }
>  
> -static bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr
> *mgr,
> - struct drm_dp_sideband_msg_rx *raw,
> - struct drm_dp_sideband_msg_req_body
> *msg)
> +bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr,
> +  struct drm_dp_sideband_msg_rx *raw,
> +  struct drm_dp_sideband_msg_req_body *msg)
>  {
> memset(msg, 0, sizeof(*msg));
> 

Re: [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB

2021-05-27 Thread Matthew Brost
On Thu, May 06, 2021 at 12:13:38PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko 
> 
> Once the CTB descriptor is found in an error state, whether set by GuC
> or by us, there is no need to continue checking the descriptor any more;
> we can rely on our internal flag.
> 
> Signed-off-by: Michal Wajdeczko 
> Signed-off-by: Matthew Brost 

Reviewed-by: Matthew Brost 

> Cc: Piotr Piórkowski 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 13 +++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1afdeac683b5..178f73ab2c96 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -123,6 +123,7 @@ static void guc_ct_buffer_desc_init(struct 
> guc_ct_buffer_desc *desc,
>  
>  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 
> cmds_addr)
>  {
> + ctb->broken = false;
>   guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size);
>  }
>  
> @@ -365,9 +366,12 @@ static int ct_write(struct intel_guc_ct *ct,
>   u32 *cmds = ctb->cmds;
>   unsigned int i;
>  
> - if (unlikely(desc->is_in_error))
> + if (unlikely(ctb->broken))
>   return -EPIPE;
>  
> + if (unlikely(desc->is_in_error))
> + goto corrupted;
> +
>   if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>(tail | head) >= size))
>   goto corrupted;
> @@ -423,6 +427,7 @@ static int ct_write(struct intel_guc_ct *ct,
>   CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
>desc->addr, desc->head, desc->tail, desc->size);
>   desc->is_in_error = 1;
> + ctb->broken = true;
>   return -EPIPE;
>  }
>  
> @@ -608,9 +613,12 @@ static int ct_read(struct intel_guc_ct *ct, struct 
> ct_incoming_msg **msg)
>   unsigned int i;
>   u32 header;
>  
> - if (unlikely(desc->is_in_error))
> + if (unlikely(ctb->broken))
>   return -EPIPE;
>  
> + if (unlikely(desc->is_in_error))
> + goto corrupted;
> +
>   if (unlikely(!IS_ALIGNED(head | tail, 4) ||
>(tail | head) >= size))
>   goto corrupted;
> @@ -674,6 +682,7 @@ static int ct_read(struct intel_guc_ct *ct, struct 
> ct_incoming_msg **msg)
>   CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
>desc->addr, desc->head, desc->tail, desc->size);
>   desc->is_in_error = 1;
> + ctb->broken = true;
>   return -EPIPE;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index cb222f202301..7d3cd375d6a7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -32,12 +32,14 @@ struct intel_guc;
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
>   * @size: size of the commands buffer
> + * @broken: flag to indicate if descriptor data is broken
>   */
>  struct intel_guc_ct_buffer {
>   spinlock_t lock;
>   struct guc_ct_buffer_desc *desc;
>   u32 *cmds;
>   u32 size;
> + bool broken;
>  };
>  
>  
> -- 
> 2.28.0
> 
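
The shape of the change is worth noting: the descriptor lives in memory
shared with the GuC and can be flagged in-error by either side, so once
corruption is seen the driver latches its own private verdict and
short-circuits on that from then on. A minimal sketch of the pattern
(simplified from the diff above, not the actual driver code):

	/* Latch a shared-memory error state into a private flag. */
	static int ctb_check_state(struct intel_guc_ct_buffer *ctb)
	{
		if (unlikely(ctb->broken))		/* our cached verdict */
			return -EPIPE;
		if (unlikely(ctb->desc->is_in_error)) {	/* shared with GuC */
			ctb->broken = true;		/* stop trusting desc */
			return -EPIPE;
		}
		return 0;
	}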


[PATCH] Revert "i915: use io_mapping_map_user"

2021-05-27 Thread Matthew Auld
This reverts commit b739f125e4ebd73d10ed30a856574e13649119ed.

We are unfortunately seeing more issues like we did in 293837b9ac8d
("Revert "i915: fix remap_io_sg to verify the pgprot""), except this is
now for the vm_fault_gtt path, where we are now hitting the same
BUG_ON(!pte_none(*pte)):

[10887.466150] kernel BUG at mm/memory.c:2183!
[10887.466162] invalid opcode:  [#1] PREEMPT SMP PTI
[10887.466168] CPU: 0 PID: 7775 Comm: ffmpeg Tainted: G U
5.13.0-rc3-CI-Nightly #1
[10887.466174] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./J4205-ITX, BIOS P1.40 07/14/2017
[10887.466177] RIP: 0010:remap_pfn_range_notrack+0x30f/0x440
[10887.466188] Code: e8 96 d7 e0 ff 84 c0 0f 84 27 01 00 00 48 ba 00 f0 ff ff 
ff ff 0f 00 4c 89 e0 48 c1 e0 0c 4d 85 ed 75 96 48 21 d0 31 f6 eb a9 <0f> 0b 48 
39 37 0f 85 0e 01 00 00 48 8b 0c 24 48 39 4f 08 0f 85 00
[10887.466193] RSP: 0018:c90006e33c50 EFLAGS: 00010286
[10887.466198] RAX: 802f RBX: 7f5e0180 RCX: 0028
[10887.466201] RDX: 0001 RSI: ea00 RDI: 
[10887.466204] RBP: ea33fea8 R08: 802f R09: 8881072256e0
[10887.466207] R10: c9000b84fff8 R11: 17dab000 R12: 00089f9f
[10887.466210] R13: 802f R14: 7f5e017e4000 R15: 88800cffaf20
[10887.466213] FS:  7f5e04849640() GS:88827800() 
knlGS:
[10887.466216] CS:  0010 DS:  ES:  CR0: 80050033
[10887.466220] CR2: 7fd9b191a2ac CR3: 0001829ac000 CR4: 003506f0
[10887.466223] Call Trace:
[10887.466233]  vm_fault_gtt+0x1ca/0x5d0 [i915]
[10887.466381]  ? ktime_get+0x38/0x90
[10887.466389]  __do_fault+0x37/0x90
[10887.466395]  __handle_mm_fault+0xc46/0x1200
[10887.466402]  handle_mm_fault+0xce/0x2a0
[10887.466407]  do_user_addr_fault+0x1c5/0x660

Reverting this commit is reported to fix the issue.
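
For context, the BUG_ON in question is the sanity check in mm's
remap_pte_range() (the worker behind the remap_pfn_range_notrack() in
the trace above), which insists that remap helpers only ever populate
empty PTEs. Abridged from mm/memory.c around that line:

	/* remap_pte_range(), abridged: refuses to overwrite a live PTE. */
	do {
		BUG_ON(!pte_none(*pte));	/* fires on an already-populated PTE */
		set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
		pfn++;
	} while (pte++, addr += PAGE_SIZE, addr != end);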

Reported-by: Eero Tamminen 
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3519
Fixes: b739f125e4eb ("i915: use io_mapping_map_user")
Cc: Christoph Hellwig 
Cc: Daniel Vetter 
Signed-off-by: Matthew Auld 
---
 drivers/gpu/drm/i915/Kconfig |  1 -
 drivers/gpu/drm/i915/gem/i915_gem_mman.c |  9 ++---
 drivers/gpu/drm/i915/i915_drv.h  |  3 ++
 drivers/gpu/drm/i915/i915_mm.c   | 44 
 4 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 93f4d059fc89..1e1cb245fca7 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -20,7 +20,6 @@ config DRM_I915
select INPUT if ACPI
select ACPI_VIDEO if ACPI
select ACPI_BUTTON if ACPI
-   select IO_MAPPING
select SYNC_FILE
select IOSF_MBI
select CRC32
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index f6fe5cb01438..8598a1c78a4c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -367,10 +367,11 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
goto err_unpin;
 
/* Finally, remap it using the new GTT offset */
-   ret = io_mapping_map_user(&ggtt->iomap, area, area->vm_start +
-   (vma->ggtt_view.partial.offset << PAGE_SHIFT),
-   (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT,
-   min_t(u64, vma->size, area->vm_end - area->vm_start));
+   ret = remap_io_mapping(area,
+  area->vm_start + (vma->ggtt_view.partial.offset 
<< PAGE_SHIFT),
+  (ggtt->gmadr.start + vma->node.start) >> 
PAGE_SHIFT,
+  min_t(u64, vma->size, area->vm_end - 
area->vm_start),
+  &ggtt->iomap);
if (ret)
goto err_fence;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0f6d27da69ac..e926f20c5b82 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1941,6 +1941,9 @@ int i915_reg_read_ioctl(struct drm_device *dev, void 
*data,
struct drm_file *file);
 
 /* i915_mm.c */
+int remap_io_mapping(struct vm_area_struct *vma,
+unsigned long addr, unsigned long pfn, unsigned long size,
+struct io_mapping *iomap);
 int remap_io_sg(struct vm_area_struct *vma,
unsigned long addr, unsigned long size,
struct scatterlist *sgl, resource_size_t iobase);
diff --git a/drivers/gpu/drm/i915/i915_mm.c b/drivers/gpu/drm/i915/i915_mm.c
index 9a777b0ff59b..666808cb3a32 100644
--- a/drivers/gpu/drm/i915/i915_mm.c
+++ b/drivers/gpu/drm/i915/i915_mm.c
@@ -37,6 +37,17 @@ struct remap_pfn {
resource_size_t iobase;
 };
 
+static int remap_pfn(pte_t *pte, unsigned long addr, void *data)
+{
+   struct remap_pfn *r = data;

Re: [PATCH 20/29] drm/i915/gem: Make an alignment check more sensible

2021-05-27 Thread Daniel Vetter
On Thu, May 27, 2021 at 11:26:41AM -0500, Jason Ekstrand wrote:
> What we really want to check is that size of the engines array, i.e.
> args->size - sizeof(*user) is divisible by the element size, i.e.
> sizeof(*user->engines) because that's what's required for computing the
> array length right below the check.  However, we're currently not doing
> this and instead doing a compile-time check that sizeof(*user) is
> divisible by sizeof(*user->engines) and avoiding the subtraction.  As
> far as I can tell, the only reason for the more confusing pair of checks
> is to avoid a single subtraction of a constant.
> 
> The other thing the BUILD_BUG_ON might be trying to implicitly check is
> that offsetof(user->engines) == sizeof(*user) and we don't have any
> weird padding throwing us off.  However, that's not the check it's doing
> and it's not even a reliable way to do that check.
> 
> Signed-off-by: Jason Ekstrand 

Yeah a non-dense compiler should be able to figure this out, plus
set_engines isn't a hotpath.

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 12a148ba421b6..cf7c281977a3e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1758,9 +1758,8 @@ set_engines(struct i915_gem_context *ctx,
>   goto replace;
>   }
>  
> - BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->engines)));
>   if (args->size < sizeof(*user) ||
> - !IS_ALIGNED(args->size, sizeof(*user->engines))) {
> + !IS_ALIGNED(args->size -  sizeof(*user), sizeof(*user->engines))) {
>   drm_dbg(&i915->drm, "Invalid size for engine array: %d\n",
>   args->size);
>   return -EINVAL;
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
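
To make the arithmetic concrete (sizes below are purely illustrative,
not the real struct sizes): if sizeof(*user) were 24 and
sizeof(*user->engines) were 8, the valid sizes are 24, 32, 40, and so
on. The old check IS_ALIGNED(args->size, 8) accepted the same set only
because 24 happens to be a multiple of 8; the new check
IS_ALIGNED(args->size - 24, 8) states the actual requirement and stays
correct even if the header size stops being a multiple of the element
size.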


Re: [PATCH 19/29] drm/i915: Add an i915_gem_vm_lookup helper

2021-05-27 Thread Daniel Vetter
On Thu, May 27, 2021 at 11:26:40AM -0500, Jason Ekstrand wrote:
> This is the VM equivalent of i915_gem_context_lookup.  It's only used
> once in this patch but future patches will need to duplicate this lookup
> code so it's better to have it in a helper.
> 
> Signed-off-by: Jason Ekstrand 

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c |  6 +-
>  drivers/gpu/drm/i915/i915_drv.h | 14 ++
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index d247fb223aac7..12a148ba421b6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1346,11 +1346,7 @@ static int set_ppgtt(struct drm_i915_file_private 
> *file_priv,
>   if (upper_32_bits(args->value))
>   return -ENOENT;
>  
> - rcu_read_lock();
> - vm = xa_load(&file_priv->vm_xa, args->value);
> - if (vm && !kref_get_unless_zero(&vm->ref))
> - vm = NULL;
> - rcu_read_unlock();
> + vm = i915_gem_vm_lookup(file_priv, args->value);
>   if (!vm)
>   return -ENOENT;
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 48316d273af66..fee2342219da1 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1871,6 +1871,20 @@ i915_gem_context_lookup(struct drm_i915_file_private 
> *file_priv, u32 id)
>   return ctx;
>  }
>  
> +static inline struct i915_address_space *
> +i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
> +{
> + struct i915_address_space *vm;
> +
> + rcu_read_lock();
> + vm = xa_load(&file_priv->vm_xa, id);
> + if (vm && !kref_get_unless_zero(&vm->ref))
> + vm = NULL;
> + rcu_read_unlock();
> +
> + return vm;
> +}
> +
>  /* i915_gem_evict.c */
>  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> u64 min_size, u64 alignment,
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
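
Since the helper takes its reference via kref_get_unless_zero(), every
caller is expected to balance it with i915_vm_put() once done. The
call-site idiom then looks like this (a sketch of the pattern, not a
specific call site):

	struct i915_address_space *vm;

	vm = i915_gem_vm_lookup(file_priv, args->value);
	if (!vm)
		return -ENOENT;

	/* ... use vm ... */

	i915_vm_put(vm);	/* drop the reference taken by the lookup */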


Re: [Intel-gfx] [PATCH 06/18] drm/i915/guc: Drop guc->interrupts.enabled

2021-05-27 Thread Matthew Brost
On Thu, May 27, 2021 at 10:17:20AM -0700, John Harrison wrote:
> On 5/25/2021 23:42, Matthew Brost wrote:
> > Drop the variable guc->interrupts.enabled as this variable is just
> > leading to bugs creeping into the code.
> > 
> > e.g. A full GPU reset disables the GuC interrupts but forgets to clear
> > guc->interrupts.enabled; guc->interrupts.enabled being true then
> > suppresses interrupts from getting re-enabled and now we are broken.
> > 
> > It is harmless to enable an interrupt while already enabled, so let's
> > just delete this variable to avoid bugs like this going forward.
> Is it worth leaving the enabled flag in place but only using it to trip a
> WARN to catch such cases in a less catastrophic manner? Or are there valid
> reasons for calling enable when already enabled?
>

I don't think so. As mentioned above, a reset disables these interrupts,
and if we didn't clear this field the WARN_ON would be triggered, making
CI unhappy. Yes, the bug would be less catastrophic, but we'd still have
to waste time and energy chasing it.

Matt 
 
> Either way, it seems like a plausible change and CI is happy with it, so:
> Reviewed-by: John Harrison 
> 
> John.
> 
> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.c | 27 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h |  1 -
> >   2 files changed, 9 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index ab2c8fe8cdfa..18da9ed15728 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -96,12 +96,9 @@ static void gen9_enable_guc_interrupts(struct intel_guc 
> > *guc)
> > assert_rpm_wakelock_held(&gt->i915->runtime_pm);
> > spin_lock_irq(&gt->irq_lock);
> > -   if (!guc->interrupts.enabled) {
> > -   WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) &
> > -gt->pm_guc_events);
> > -   guc->interrupts.enabled = true;
> > -   gen6_gt_pm_enable_irq(gt, gt->pm_guc_events);
> > -   }
> > +   WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) &
> > +gt->pm_guc_events);
> > +   gen6_gt_pm_enable_irq(gt, gt->pm_guc_events);
> > spin_unlock_irq(&gt->irq_lock);
> >   }
> > @@ -112,7 +109,6 @@ static void gen9_disable_guc_interrupts(struct 
> > intel_guc *guc)
> > assert_rpm_wakelock_held(&gt->i915->runtime_pm);
> > spin_lock_irq(&gt->irq_lock);
> > -   guc->interrupts.enabled = false;
> > gen6_gt_pm_disable_irq(gt, gt->pm_guc_events);
> > @@ -134,18 +130,14 @@ static void gen11_reset_guc_interrupts(struct 
> > intel_guc *guc)
> >   static void gen11_enable_guc_interrupts(struct intel_guc *guc)
> >   {
> > struct intel_gt *gt = guc_to_gt(guc);
> > +   u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST);
> > spin_lock_irq(&gt->irq_lock);
> > -   if (!guc->interrupts.enabled) {
> > -   u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST);
> > -
> > -   WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC));
> > -   intel_uncore_write(gt->uncore,
> > -  GEN11_GUC_SG_INTR_ENABLE, events);
> > -   intel_uncore_write(gt->uncore,
> > -  GEN11_GUC_SG_INTR_MASK, ~events);
> > -   guc->interrupts.enabled = true;
> > -   }
> > +   WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC));
> > +   intel_uncore_write(gt->uncore,
> > +  GEN11_GUC_SG_INTR_ENABLE, events);
> > +   intel_uncore_write(gt->uncore,
> > +  GEN11_GUC_SG_INTR_MASK, ~events);
> > spin_unlock_irq(&gt->irq_lock);
> >   }
> > @@ -154,7 +146,6 @@ static void gen11_disable_guc_interrupts(struct 
> > intel_guc *guc)
> > struct intel_gt *gt = guc_to_gt(guc);
> > spin_lock_irq(&gt->irq_lock);
> > -   guc->interrupts.enabled = false;
> > intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_MASK, ~0);
> > intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_ENABLE, 0);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index c20f3839de12..4abc59f6f3cd 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -33,7 +33,6 @@ struct intel_guc {
> > unsigned int msg_enabled_mask;
> > struct {
> > -   bool enabled;
> > void (*reset)(struct intel_guc *guc);
> > void (*enable)(struct intel_guc *guc);
> > void (*disable)(struct intel_guc *guc);
> 
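
The failure mode being deleted is a software flag shadowing hardware
state and going stale. Schematically (simplified pseudo-C, not the
actual driver code):

	guc->interrupts.enabled = true;	/* interrupts on, flag set */
	/* ... full GPU reset: the hardware masks GuC interrupts,
	 *     but nothing clears guc->interrupts.enabled ... */
	guc->interrupts.enable(guc);	/* early-outs: flag is still true */
	/* result: GuC interrupts stay masked and nobody notices */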


[Bug 212957] [radeon] kernel NULL pointer dereference during system boot

2021-05-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212957

Dennis Foster (m...@dennisfoster.us) changed:

           What        |Removed |Added
 ---------------------------------------------------
 Status                |NEW     |RESOLVED
 Resolution            |---     |PATCH_ALREADY_AVAILABLE

--- Comment #5 from Dennis Foster (m...@dennisfoster.us) ---
The issue is now resolved in kernel version 5.12.7

Link to the patch commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.12.y&id=ec1bd01b632ad748dce8a0eeb4c167bead71315f


Re: [Intel-gfx] [PATCH 06/18] drm/i915/guc: Drop guc->interrupts.enabled

2021-05-27 Thread John Harrison

On 5/25/2021 23:42, Matthew Brost wrote:

Drop the variable guc->interrupts.enabled as this variable is just
leading to bugs creeping into the code.

e.g. A full GPU reset disables the GuC interrupts but forgets to clear
guc->interrupts.enabled; guc->interrupts.enabled being true then
suppresses interrupts from getting re-enabled and now we are broken.

It is harmless to enable an interrupt while already enabled, so let's
just delete this variable to avoid bugs like this going forward.
Is it worth leaving the enabled flag in place but only using it to trip 
a WARN to catch such cases in a less catastrophic manner? Or are there 
valid reasons for calling enable when already enabled?


Either way, it seems like a plausible change and CI is happy with it, so:
Reviewed-by: John Harrison 

John.


Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.c | 27 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h |  1 -
  2 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index ab2c8fe8cdfa..18da9ed15728 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -96,12 +96,9 @@ static void gen9_enable_guc_interrupts(struct intel_guc *guc)
assert_rpm_wakelock_held(&gt->i915->runtime_pm);
  
	spin_lock_irq(&gt->irq_lock);

-   if (!guc->interrupts.enabled) {
-   WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) &
-gt->pm_guc_events);
-   guc->interrupts.enabled = true;
-   gen6_gt_pm_enable_irq(gt, gt->pm_guc_events);
-   }
+   WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) &
+gt->pm_guc_events);
+   gen6_gt_pm_enable_irq(gt, gt->pm_guc_events);
spin_unlock_irq(&gt->irq_lock);
  }
  
@@ -112,7 +109,6 @@ static void gen9_disable_guc_interrupts(struct intel_guc *guc)

assert_rpm_wakelock_held(&gt->i915->runtime_pm);
  
	spin_lock_irq(&gt->irq_lock);

-   guc->interrupts.enabled = false;
  
  	gen6_gt_pm_disable_irq(gt, gt->pm_guc_events);
  
@@ -134,18 +130,14 @@ static void gen11_reset_guc_interrupts(struct intel_guc *guc)

  static void gen11_enable_guc_interrupts(struct intel_guc *guc)
  {
struct intel_gt *gt = guc_to_gt(guc);
+   u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST);
  
	spin_lock_irq(&gt->irq_lock);

-   if (!guc->interrupts.enabled) {
-   u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST);
-
-   WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC));
-   intel_uncore_write(gt->uncore,
-  GEN11_GUC_SG_INTR_ENABLE, events);
-   intel_uncore_write(gt->uncore,
-  GEN11_GUC_SG_INTR_MASK, ~events);
-   guc->interrupts.enabled = true;
-   }
+   WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC));
+   intel_uncore_write(gt->uncore,
+  GEN11_GUC_SG_INTR_ENABLE, events);
+   intel_uncore_write(gt->uncore,
+  GEN11_GUC_SG_INTR_MASK, ~events);
	spin_unlock_irq(&gt->irq_lock);
  }
  
@@ -154,7 +146,6 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)

struct intel_gt *gt = guc_to_gt(guc);
  
	spin_lock_irq(&gt->irq_lock);

-   guc->interrupts.enabled = false;
  
  	intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_MASK, ~0);

intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_ENABLE, 0);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index c20f3839de12..4abc59f6f3cd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -33,7 +33,6 @@ struct intel_guc {
unsigned int msg_enabled_mask;
  
  	struct {

-   bool enabled;
void (*reset)(struct intel_guc *guc);
void (*enable)(struct intel_guc *guc);
void (*disable)(struct intel_guc *guc);




Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines

2021-05-27 Thread John Harrison

On 5/27/2021 01:53, Tvrtko Ursulin wrote:

On 26/05/2021 19:45, John Harrison wrote:

On 5/26/2021 01:40, Tvrtko Ursulin wrote:

On 25/05/2021 18:52, Matthew Brost wrote:

On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:


On 06/05/2021 20:14, Matthew Brost wrote:

From: John Harrison 

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heartbeat code
thinks the physical engines are idle due to the serial number not
incrementing.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.

Commit text sounds a bit defeatist. I think instead of making up the
serial counts, which has downsides (could you please document in the
commit what they are), we should think how to design things properly.

IMO, I don't think fixing serial counts is in the scope of this series.
We should focus on getting GuC submission in, not cleaning up all the
crap that is in the i915. Let's make a note of this though so we can
revisit it later.

I will say again - the commit message implies it is introducing an
unspecified downside by not fully fixing an also unspecified issue.
It is completely reasonable, and customary even, to ask for both to
be documented in the commit message.

Not sure what exactly is 'unspecified'. I thought the commit message
described both the problem (heartbeat not running when using virtual
engines) and the result (heartbeat running on more engines than
strictly necessary). But in greater detail...

The serial number tracking is a hack for the heartbeat code to know
whether an engine is busy or idle, and therefore whether it should be
pinged for aliveness. Whenever a submission is made to an engine, the
serial number is incremented. The heartbeat code keeps a copy of the
value. If the value has changed, the engine is busy and needs to be
pinged.

This works fine for execlist mode where virtual engine decomposition
is done inside i915. It fails miserably for GuC mode where the
decomposition is done by the hardware. The reason being that the
heartbeat code only looks at physical engines but the serial count is
only incremented on the virtual engine. Thus, the heartbeat sees
everything as idle and does not ping.

So hangcheck does not work. Or it works because GuC does it anyway.
Either way, that's one thing to explicitly state in the commit message.

This patch decomposes the virtual engines for the sake of
incrementing the serial count on each sub-engine in order to keep the
heartbeat code happy. The downside is that now the heartbeat sees all
sub-engines as busy rather than only the one the submission actually
ends up on. There really isn't much that can be done about that. The
heartbeat code is in i915 not GuC, the scheduler is in GuC not i915.
The only way to improve it is to either move the heartbeat code into
GuC as well and completely disable the i915 side, or add some way for
i915 to interrogate GuC as to which engines are or are not active.
Technically, we do have both. GuC has (or at least had) an option to
force a context switch on every execution quantum pre-emption.
However, that is much, much more heavy weight than the heartbeat.
For the latter, we do (almost) have the engine usage statistics for
PMU and such like. I'm not sure how much effort it would be to wire
that up to the heartbeat code instead of using the serial count.

In short, the serial count is ever so slightly inefficient in that it
causes heartbeat pings on engines which are idle. On the other hand,
it is way more efficient and simpler than the current alternatives.

And the hack to make hangcheck work creates this inefficiency where
heartbeats are sent to idle engines. Which is probably fine, it just
needs to be explained.

Does that answer the questions?

With the two points I re-raise clearly explained, possibly even the
patch title changed, yeah. I just want it to be more easily obvious
to the patch reader what the patch is functionally about - not just
which implementation details have been changed but why as well.

My understanding is that we don't explain every piece of code in minute
detail in every checkin email that touches it. I thought my description
was already pretty verbose. I've certainly seen way less informative
checkins that apparently made it through review without issue.

Regarding the problem statement, I thought this was fairly clear that
the heart
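
For readers following along, the mechanism under discussion is roughly
this (a schematic sketch of both halves; the heartbeat_serial field
name is invented for illustration):

	/* Heartbeat side: an unchanged serial means "idle, skip the ping". */
	if (engine->serial == engine->heartbeat_serial)
		return;
	engine->heartbeat_serial = engine->serial;
	send_heartbeat_ping(engine);

	/* Submission side, with this patch: bump every physical sibling of
	 * the virtual engine, since i915 cannot know which one GuC picks.
	 */
	for_each_engine_masked(engine, gt, rq->execution_mask, tmp)
		engine->serial++;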

Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-27 Thread Tom Lendacky
On 5/27/21 9:41 AM, Tom Lendacky wrote:
> On 5/27/21 8:02 AM, Christoph Hellwig wrote:
>> On Wed, May 19, 2021 at 11:50:07AM -0700, Florian Fainelli wrote:
>>> You convert this call site with swiotlb_init_io_tlb_mem() which did not
>>> do the set_memory_decrypted()+memset(). Is this okay or should
>>> swiotlb_init_io_tlb_mem() add an additional argument to do this
>>> conditionally?
>>
>> The zeroing is useful and was missing before.  I think having a clean
>> state here is the right thing.
>>
>> Not sure about the set_memory_decrypted, swiotlb_update_mem_attributes
>> kinda suggests it is too early to set the memory decrupted.
>>
>> Adding Tom who should now about all this.
> 
> The reason for adding swiotlb_update_mem_attributes() was that having
> the call to set_memory_decrypted() in swiotlb_init_with_tbl() triggered a
> BUG_ON() related to interrupts not being enabled yet during boot. So that
> call had to be delayed until interrupts were enabled.

I pulled down and tested the patch set and booted with SME enabled. The
following was seen during the boot:

[0.134184] BUG: Bad page state in process swapper  pfn:108002
[0.134196] page:(ptrval) refcount:0 mapcount:-128 
mapping: index:0x0 pfn:0x108002
[0.134201] flags: 0x17c000(node=0|zone=2|lastcpupid=0x1f)
[0.134208] raw: 0017c000 88847f355e28 88847f355e28 

[0.134210] raw:  0001 ff7f 

[0.134212] page dumped because: nonzero mapcount
[0.134213] Modules linked in:
[0.134218] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc2-sos-custom #3
[0.134221] Hardware name: ...
[0.134224] Call Trace:
[0.134233]  dump_stack+0x76/0x94
[0.134244]  bad_page+0xa6/0xf0
[0.134252]  __free_pages_ok+0x331/0x360
[0.134256]  memblock_free_all+0x158/0x1c1
[0.134267]  mem_init+0x1f/0x14c
[0.134273]  start_kernel+0x290/0x574
[0.134279]  secondary_startup_64_no_verify+0xb0/0xbb

I see this about 40 times during the boot, each with a different PFN. The
system boots (which seemed odd), but I don't know if there will be side
effects to this (I didn't stress the system).

I modified the code to add a flag to not do the set_memory_decrypted(), as
suggested by Florian, when invoked from swiotlb_init_with_tbl(), and that
eliminated the bad page state BUG.
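
The flag being suggested would look roughly like this (a sketch only;
the "late" parameter name is invented for illustration):

	/* Only decrypt when called late enough for set_memory_decrypted(). */
	static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
					    unsigned long nslabs, bool late)
	{
		void *vaddr = phys_to_virt(start);
		unsigned long bytes = nslabs << IO_TLB_SHIFT;

		if (late)	/* early boot: IRQs still off, decrypting would BUG */
			set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
		memset(vaddr, 0, bytes);
		/* ... initialise the remaining io_tlb_mem fields ... */
	}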

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>


RE: [PATCH v2 2/2] drm/kmb: Do not report 0 (success) in case of error

2021-05-27 Thread Chrisanthus, Anitha
This is already fixed in the patch from Zhen Lei.

> -Original Message-
> From: Christophe JAILLET 
> Sent: Wednesday, May 26, 2021 11:10 PM
> To: Chrisanthus, Anitha ; Dea, Edmund J
> ; airl...@linux.ie; dan...@ffwll.ch;
> s...@ravnborg.org
> Cc: dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; kernel-
> janit...@vger.kernel.org; Christophe JAILLET 
> Subject: [PATCH v2 2/2] drm/kmb: Do not report 0 (success) in case of error
> 
> 'ret' is known to be 0 at this point.
> Reporting the error from the previous 'platform_get_irq()' call is likely
> what was intended, so add the missing assignment.
> 
> Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
> Signed-off-by: Christophe JAILLET 
> ---
> v2: New patch
> ---
>  drivers/gpu/drm/kmb/kmb_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/kmb/kmb_drv.c
> b/drivers/gpu/drm/kmb/kmb_drv.c
> index fa28e42da460..d9e10ac9847c 100644
> --- a/drivers/gpu/drm/kmb/kmb_drv.c
> +++ b/drivers/gpu/drm/kmb/kmb_drv.c
> @@ -138,6 +138,7 @@ static int kmb_hw_init(struct drm_device *drm,
> unsigned long flags)
>   irq_lcd = platform_get_irq(pdev, 0);
>   if (irq_lcd < 0) {
>   drm_err(&kmb->drm, "irq_lcd not found");
> + ret = irq_lcd;
>   goto setup_fail;
>   }
> 
> --
> 2.30.2



[PATCH 29/29] drm/i915/gem: Roll all of context creation together

2021-05-27 Thread Jason Ekstrand
Now that we have the whole engine set and VM at context creation time,
we can just assign those fields instead of creating first and handling
the VM and engines later.  This lets us avoid creating useless VMs and
engine sets and lets us get rid of the complex VM setting code.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 159 ++
 .../gpu/drm/i915/gem/selftests/mock_context.c |  33 ++--
 2 files changed, 64 insertions(+), 128 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e6a6ead477ff4..502a2bd1a043e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1298,56 +1298,6 @@ static int __context_set_persistence(struct 
i915_gem_context *ctx, bool state)
return 0;
 }
 
-static struct i915_gem_context *
-__create_context(struct drm_i915_private *i915,
-const struct i915_gem_proto_context *pc)
-{
-   struct i915_gem_context *ctx;
-   struct i915_gem_engines *e;
-   int err;
-   int i;
-
-   ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
-   if (!ctx)
-   return ERR_PTR(-ENOMEM);
-
-   kref_init(&ctx->ref);
-   ctx->i915 = i915;
-   ctx->sched = pc->sched;
-   mutex_init(&ctx->mutex);
-   INIT_LIST_HEAD(&ctx->link);
-
-   spin_lock_init(&ctx->stale.lock);
-   INIT_LIST_HEAD(&ctx->stale.engines);
-
-   mutex_init(&ctx->engines_mutex);
-   e = default_engines(ctx, pc->legacy_rcs_sseu);
-   if (IS_ERR(e)) {
-   err = PTR_ERR(e);
-   goto err_free;
-   }
-   RCU_INIT_POINTER(ctx->engines, e);
-
-   INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
-   mutex_init(&ctx->lut_mutex);
-
-   /* NB: Mark all slices as needing a remap so that when the context first
-* loads it will restore whatever remap state already exists. If there
-* is no remap info, it will be a NOP. */
-   ctx->remap_slice = ALL_L3_SLICES(i915);
-
-   ctx->user_flags = pc->user_flags;
-
-   for (i = 0; i < ARRAY_SIZE(ctx->hang_timestamp); i++)
-   ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES;
-
-   return ctx;
-
-err_free:
-   kfree(ctx);
-   return ERR_PTR(err);
-}
-
 static inline struct i915_gem_engines *
 __context_engines_await(const struct i915_gem_context *ctx,
bool *user_engines)
@@ -1391,86 +1341,77 @@ context_apply_all(struct i915_gem_context *ctx,
i915_sw_fence_complete(&e->fence);
 }
 
-static void __apply_ppgtt(struct intel_context *ce, void *vm)
-{
-   i915_vm_put(ce->vm);
-   ce->vm = i915_vm_get(vm);
-}
-
-static struct i915_address_space *
-__set_ppgtt(struct i915_gem_context *ctx, struct i915_address_space *vm)
-{
-   struct i915_address_space *old;
-
-   old = rcu_replace_pointer(ctx->vm,
- i915_vm_open(vm),
- lockdep_is_held(&ctx->mutex));
-   GEM_BUG_ON(old && i915_vm_is_4lvl(vm) != i915_vm_is_4lvl(old));
-
-   context_apply_all(ctx, __apply_ppgtt, vm);
-
-   return old;
-}
-
-static void __assign_ppgtt(struct i915_gem_context *ctx,
-  struct i915_address_space *vm)
-{
-   if (vm == rcu_access_pointer(ctx->vm))
-   return;
-
-   vm = __set_ppgtt(ctx, vm);
-   if (vm)
-   i915_vm_close(vm);
-}
-
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *i915,
const struct i915_gem_proto_context *pc)
 {
struct i915_gem_context *ctx;
-   int ret;
+   struct i915_gem_engines *e;
+   int err;
+   int i;
 
-   ctx = __create_context(i915, pc);
-   if (IS_ERR(ctx))
-   return ctx;
+   ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+   if (!ctx)
+   return ERR_PTR(-ENOMEM);
 
-   if (pc->vm) {
-   mutex_lock(&ctx->mutex);
-   __assign_ppgtt(ctx, pc->vm);
-   mutex_unlock(&ctx->mutex);
-   }
+   kref_init(&ctx->ref);
+   ctx->i915 = i915;
+   ctx->sched = pc->sched;
+   mutex_init(&ctx->mutex);
+   INIT_LIST_HEAD(&ctx->link);
 
-   if (pc->num_user_engines >= 0) {
-   struct i915_gem_engines *engines;
+   spin_lock_init(&ctx->stale.lock);
+   INIT_LIST_HEAD(&ctx->stale.engines);
 
-   engines = user_engines(ctx, pc->num_user_engines,
-  pc->user_engines);
-   if (IS_ERR(engines)) {
-   context_close(ctx);
-   return ERR_CAST(engines);
-   }
+   if (pc->vm)
+   RCU_INIT_POINTER(ctx->vm, i915_vm_open(pc->vm));
 
-   mutex_lock(&ctx->engines_mutex);
+   mutex_init(&ctx->engines_mutex);
+   if (pc->num_user_engines >= 0) {
i915

[PATCH 28/29] i915/gem/selftests: Assign the VM at context creation in igt_shared_ctx_exec

2021-05-27 Thread Jason Ekstrand
We want to delete __assign_ppgtt and, generally, stop setting the VM
after context creation.  This is the one place I could find in the
selftests where we set a VM after the fact.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index aee5642818824..01f7615eb3a8a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -813,16 +813,12 @@ static int igt_shared_ctx_exec(void *arg)
struct i915_gem_context *ctx;
struct intel_context *ce;
 
-   ctx = kernel_context(i915, NULL);
+   ctx = kernel_context(i915, ctx_vm(parent));
if (IS_ERR(ctx)) {
err = PTR_ERR(ctx);
goto out_test;
}
 
-   mutex_lock(&ctx->mutex);
-   __assign_ppgtt(ctx, ctx_vm(parent));
-   mutex_unlock(&ctx->mutex);
-
ce = i915_gem_context_get_engine(ctx, 
engine->legacy_idx);
GEM_BUG_ON(IS_ERR(ce));
 
-- 
2.31.1



[PATCH 26/29] drm/i915/gem: Don't allow changing the engine set on running contexts (v2)

2021-05-27 Thread Jason Ekstrand
When the APIs were added to manage the engine set on a GEM context
directly from userspace, the questionable choice was made to allow
changing the engine set on a context at any time.  This is horribly racy
and there's absolutely no reason why any userspace would want to do this
outside of trying to exercise interesting race conditions.  By removing
support for CONTEXT_PARAM_ENGINES from ctx_setparam, we make it
impossible to change the engine set after the context has been fully
created.

This doesn't yet let us delete all the deferred engine clean-up code as
that's still used for handling the case where the client dies or calls
GEM_CONTEXT_DESTROY while work is in flight.  However, moving to an API
where the engine set is effectively immutable gives us more options to
potentially clean that code up a bit going forward.  It also removes a
whole class of ways in which a client can hurt itself or try to get
around kernel context banning.

v2 (Jason Ekstrand):
 - Expand the commit message

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 303 
 1 file changed, 303 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index a528c8f3354a0..e6a6ead477ff4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1819,305 +1819,6 @@ static int set_sseu(struct i915_gem_context *ctx,
return ret;
 }
 
-struct set_engines {
-   struct i915_gem_context *ctx;
-   struct i915_gem_engines *engines;
-};
-
-static int
-set_engines__load_balance(struct i915_user_extension __user *base, void *data)
-{
-   struct i915_context_engines_load_balance __user *ext =
-   container_of_user(base, typeof(*ext), base);
-   const struct set_engines *set = data;
-   struct drm_i915_private *i915 = set->ctx->i915;
-   struct intel_engine_cs *stack[16];
-   struct intel_engine_cs **siblings;
-   struct intel_context *ce;
-   struct intel_sseu null_sseu = {};
-   u16 num_siblings, idx;
-   unsigned int n;
-   int err;
-
-   if (!HAS_EXECLISTS(i915))
-   return -ENODEV;
-
-   if (intel_uc_uses_guc_submission(&i915->gt.uc))
-   return -ENODEV; /* not implement yet */
-
-   if (get_user(idx, &ext->engine_index))
-   return -EFAULT;
-
-   if (idx >= set->engines->num_engines) {
-   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
-   idx, set->engines->num_engines);
-   return -EINVAL;
-   }
-
-   idx = array_index_nospec(idx, set->engines->num_engines);
-   if (set->engines->engines[idx]) {
-   drm_dbg(&i915->drm,
-   "Invalid placement[%d], already occupied\n", idx);
-   return -EEXIST;
-   }
-
-   if (get_user(num_siblings, &ext->num_siblings))
-   return -EFAULT;
-
-   err = check_user_mbz(&ext->flags);
-   if (err)
-   return err;
-
-   err = check_user_mbz(&ext->mbz64);
-   if (err)
-   return err;
-
-   siblings = stack;
-   if (num_siblings > ARRAY_SIZE(stack)) {
-   siblings = kmalloc_array(num_siblings,
-sizeof(*siblings),
-GFP_KERNEL);
-   if (!siblings)
-   return -ENOMEM;
-   }
-
-   for (n = 0; n < num_siblings; n++) {
-   struct i915_engine_class_instance ci;
-
-   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
-   err = -EFAULT;
-   goto out_siblings;
-   }
-
-   siblings[n] = intel_engine_lookup_user(i915,
-  ci.engine_class,
-  ci.engine_instance);
-   if (!siblings[n]) {
-   drm_dbg(&i915->drm,
-   "Invalid sibling[%d]: { class:%d, inst:%d }\n",
-   n, ci.engine_class, ci.engine_instance);
-   err = -EINVAL;
-   goto out_siblings;
-   }
-   }
-
-   ce = intel_execlists_create_virtual(siblings, n);
-   if (IS_ERR(ce)) {
-   err = PTR_ERR(ce);
-   goto out_siblings;
-   }
-
-   intel_context_set_gem(ce, set->ctx, null_sseu);
-
-   if (cmpxchg(&set->engines->engines[idx], NULL, ce)) {
-   intel_context_put(ce);
-   err = -EEXIST;
-   goto out_siblings;
-   }
-
-out_siblings:
-   if (siblings != stack)
-   kfree(siblings);
-
-   return err;
-}
-
-static int
-set_engines__bond(struct i915_user_extension __user *base, void *data)
-{
-   struct i915_context_engines_bond __user *ext =
-   co

[PATCH 22/29] drm/i915/gem: Return an error ptr from context_lookup

2021-05-27 Thread Jason Ekstrand
We're about to start doing lazy context creation, which means contexts
get created in i915_gem_context_lookup and we may start seeing errors
other than -ENOENT.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c| 12 ++--
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h|  2 +-
 drivers/gpu/drm/i915/i915_perf.c   |  4 ++--
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d68c111bc824a..76662175e6980 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -2636,8 +2636,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
int ret = 0;
 
ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-   if (!ctx)
-   return -ENOENT;
+   if (IS_ERR(ctx))
+   return PTR_ERR(ctx);
 
switch (args->param) {
case I915_CONTEXT_PARAM_GTT_SIZE:
@@ -2705,8 +2705,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
*dev, void *data,
int ret;
 
ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-   if (!ctx)
-   return -ENOENT;
+   if (IS_ERR(ctx))
+   return PTR_ERR(ctx);
 
ret = ctx_setparam(file_priv, ctx, args);
 
@@ -2725,8 +2725,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id);
-   if (!ctx)
-   return -ENOENT;
+   if (IS_ERR(ctx))
+   return PTR_ERR(ctx);
 
/*
 * We opt for unserialised reads here. This may result in tearing
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 7024adcd5cf15..de14b26f3b2d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -739,8 +739,8 @@ static int eb_select_context(struct i915_execbuffer *eb)
struct i915_gem_context *ctx;
 
ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->rsvd1);
-   if (unlikely(!ctx))
-   return -ENOENT;
+   if (unlikely(IS_ERR(ctx)))
+   return PTR_ERR(ctx);
 
eb->gem_context = ctx;
if (rcu_access_pointer(ctx->vm))
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fee2342219da1..d7bd732ceacfc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1868,7 +1868,7 @@ i915_gem_context_lookup(struct drm_i915_file_private 
*file_priv, u32 id)
ctx = NULL;
rcu_read_unlock();
 
-   return ctx;
+   return ctx ? ctx : ERR_PTR(-ENOENT);
 }
 
 static inline struct i915_address_space *
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index de8ebc34af0ff..dfc2a5c067c29 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3414,10 +3414,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
struct drm_i915_file_private *file_priv = file->driver_priv;
 
specific_ctx = i915_gem_context_lookup(file_priv, ctx_handle);
-   if (!specific_ctx) {
+   if (IS_ERR(specific_ctx)) {
DRM_DEBUG("Failed to look up context with ID %u for 
opening perf stream\n",
  ctx_handle);
-   ret = -ENOENT;
+   ret = PTR_ERR(specific_ctx);
goto err;
}
}
-- 
2.31.1



[PATCH 27/29] drm/i915/selftests: Take a VM in kernel_context()

2021-05-27 Thread Jason Ekstrand
This better models where we want to go with contexts in general, where
things like the VM and engine set are creation parameters instead of
being set after the fact.

Signed-off-by: Jason Ekstrand 
---
 .../drm/i915/gem/selftests/i915_gem_context.c |  4 ++--
 .../gpu/drm/i915/gem/selftests/mock_context.c |  9 -
 .../gpu/drm/i915/gem/selftests/mock_context.h |  4 +++-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 20 +--
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  2 +-
 5 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 506cd9e9d4b25..aee5642818824 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -680,7 +680,7 @@ static int igt_ctx_exec(void *arg)
struct i915_gem_context *ctx;
struct intel_context *ce;
 
-   ctx = kernel_context(i915);
+   ctx = kernel_context(i915, NULL);
if (IS_ERR(ctx)) {
err = PTR_ERR(ctx);
goto out_file;
@@ -813,7 +813,7 @@ static int igt_shared_ctx_exec(void *arg)
struct i915_gem_context *ctx;
struct intel_context *ce;
 
-   ctx = kernel_context(i915);
+   ctx = kernel_context(i915, NULL);
if (IS_ERR(ctx)) {
err = PTR_ERR(ctx);
goto out_test;
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c 
b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
index 61aaac4a334cf..500ef27ba4771 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
@@ -150,7 +150,8 @@ live_context_for_engine(struct intel_engine_cs *engine, 
struct file *file)
 }
 
 struct i915_gem_context *
-kernel_context(struct drm_i915_private *i915)
+kernel_context(struct drm_i915_private *i915,
+  struct i915_address_space *vm)
 {
struct i915_gem_context *ctx;
struct i915_gem_proto_context *pc;
@@ -159,6 +160,12 @@ kernel_context(struct drm_i915_private *i915)
if (IS_ERR(pc))
return ERR_CAST(pc);
 
+   if (vm) {
+   if (pc->vm)
+   i915_vm_put(pc->vm);
+   pc->vm = i915_vm_get(vm);
+   }
+
ctx = i915_gem_create_context(i915, pc);
proto_context_close(pc);
if (IS_ERR(ctx))
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.h 
b/drivers/gpu/drm/i915/gem/selftests/mock_context.h
index 2a6121d33352d..7a02fd9b5866a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_context.h
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.h
@@ -10,6 +10,7 @@
 struct file;
 struct drm_i915_private;
 struct intel_engine_cs;
+struct i915_address_space;
 
 void mock_init_contexts(struct drm_i915_private *i915);
 
@@ -25,7 +26,8 @@ live_context(struct drm_i915_private *i915, struct file 
*file);
 struct i915_gem_context *
 live_context_for_engine(struct intel_engine_cs *engine, struct file *file);
 
-struct i915_gem_context *kernel_context(struct drm_i915_private *i915);
+struct i915_gem_context *kernel_context(struct drm_i915_private *i915,
+   struct i915_address_space *vm);
 void kernel_context_close(struct i915_gem_context *ctx);
 
 #endif /* !__MOCK_CONTEXT_H */
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index a0e75b71a3374..0989e024f7a03 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -1522,12 +1522,12 @@ static int live_busywait_preempt(void *arg)
 * preempt the busywaits used to synchronise between rings.
 */
 
-   ctx_hi = kernel_context(gt->i915);
+   ctx_hi = kernel_context(gt->i915, NULL);
if (!ctx_hi)
return -ENOMEM;
ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
 
-   ctx_lo = kernel_context(gt->i915);
+   ctx_lo = kernel_context(gt->i915, NULL);
if (!ctx_lo)
goto err_ctx_hi;
ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY;
@@ -1724,12 +1724,12 @@ static int live_preempt(void *arg)
if (igt_spinner_init(&spin_lo, gt))
goto err_spin_hi;
 
-   ctx_hi = kernel_context(gt->i915);
+   ctx_hi = kernel_context(gt->i915, NULL);
if (!ctx_hi)
goto err_spin_lo;
ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
 
-   ctx_lo = kernel_context(gt->i915);
+   ctx_lo = kernel_context(gt->i915, NULL);
if (!ctx_lo)
goto err_ctx_hi;
ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIO

[PATCH 08/29] drm/i915: Drop getparam support for I915_CONTEXT_PARAM_ENGINES

2021-05-27 Thread Jason Ekstrand
This has never been used by any userspace except IGT and provides no
real functionality beyond parroting back parameters userspace passed in
as part of context creation or via setparam.  If the context is in
legacy mode (where you use I915_EXEC_RENDER and friends), it returns
success with zero data so it's not useful for discovering what engines
are in the context.  It's also not a replacement for the recently
removed I915_CONTEXT_CLONE_ENGINES because it doesn't return any of the
balancing or bonding information.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 77 +
 1 file changed, 1 insertion(+), 76 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index aa792c9517e16..fed3538de9241 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1724,78 +1724,6 @@ set_engines(struct i915_gem_context *ctx,
return 0;
 }
 
-static int
-get_engines(struct i915_gem_context *ctx,
-   struct drm_i915_gem_context_param *args)
-{
-   struct i915_context_param_engines __user *user;
-   struct i915_gem_engines *e;
-   size_t n, count, size;
-   bool user_engines;
-   int err = 0;
-
-   e = __context_engines_await(ctx, &user_engines);
-   if (!e)
-   return -ENOENT;
-
-   if (!user_engines) {
-   i915_sw_fence_complete(&e->fence);
-   args->size = 0;
-   return 0;
-   }
-
-   count = e->num_engines;
-
-   /* Be paranoid in case we have an impedance mismatch */
-   if (!check_struct_size(user, engines, count, &size)) {
-   err = -EINVAL;
-   goto err_free;
-   }
-   if (overflows_type(size, args->size)) {
-   err = -EINVAL;
-   goto err_free;
-   }
-
-   if (!args->size) {
-   args->size = size;
-   goto err_free;
-   }
-
-   if (args->size < size) {
-   err = -EINVAL;
-   goto err_free;
-   }
-
-   user = u64_to_user_ptr(args->value);
-   if (put_user(0, &user->extensions)) {
-   err = -EFAULT;
-   goto err_free;
-   }
-
-   for (n = 0; n < count; n++) {
-   struct i915_engine_class_instance ci = {
-   .engine_class = I915_ENGINE_CLASS_INVALID,
-   .engine_instance = I915_ENGINE_CLASS_INVALID_NONE,
-   };
-
-   if (e->engines[n]) {
-   ci.engine_class = e->engines[n]->engine->uabi_class;
-   ci.engine_instance = 
e->engines[n]->engine->uabi_instance;
-   }
-
-   if (copy_to_user(&user->engines[n], &ci, sizeof(ci))) {
-   err = -EFAULT;
-   goto err_free;
-   }
-   }
-
-   args->size = size;
-
-err_free:
-   i915_sw_fence_complete(&e->fence);
-   return err;
-}
-
 static int
 set_persistence(struct i915_gem_context *ctx,
const struct drm_i915_gem_context_param *args)
@@ -2126,10 +2054,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
ret = get_ppgtt(file_priv, ctx, args);
break;
 
-   case I915_CONTEXT_PARAM_ENGINES:
-   ret = get_engines(ctx, args);
-   break;
-
case I915_CONTEXT_PARAM_PERSISTENCE:
args->size = 0;
args->value = i915_gem_context_is_persistent(ctx);
@@ -2137,6 +2061,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
 
case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
+   case I915_CONTEXT_PARAM_ENGINES:
case I915_CONTEXT_PARAM_RINGSIZE:
default:
ret = -EINVAL;
-- 
2.31.1



[PATCH 25/29] drm/i915/gem: Don't allow changing the VM on running contexts (v2)

2021-05-27 Thread Jason Ekstrand
When the APIs were added to manage VMs more directly from userspace, the
questionable choice was made to allow changing out the VM on a context
at any time.  This is horribly racy and there's absolutely no reason why
any userspace would want to do this outside of testing that exact race.
By removing support for CONTEXT_PARAM_VM from ctx_setparam, we make it
impossible to change out the VM after the context has been fully
created.  This lets us delete a bunch of deferred task code as well as a
duplicated (and slightly different) copy of the code which programs the
PPGTT registers.

v2 (Jason Ekstrand):
 - Expand the commit message

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 262 --
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c | 119 
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 4 files changed, 1 insertion(+), 383 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f7c83730ee07f..a528c8f3354a0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1633,120 +1633,6 @@ int i915_gem_vm_destroy_ioctl(struct drm_device *dev, 
void *data,
return 0;
 }
 
-struct context_barrier_task {
-   struct i915_active base;
-   void (*task)(void *data);
-   void *data;
-};
-
-static void cb_retire(struct i915_active *base)
-{
-   struct context_barrier_task *cb = container_of(base, typeof(*cb), base);
-
-   if (cb->task)
-   cb->task(cb->data);
-
-   i915_active_fini(&cb->base);
-   kfree(cb);
-}
-
-I915_SELFTEST_DECLARE(static intel_engine_mask_t context_barrier_inject_fault);
-static int context_barrier_task(struct i915_gem_context *ctx,
-   intel_engine_mask_t engines,
-   bool (*skip)(struct intel_context *ce, void 
*data),
-   int (*pin)(struct intel_context *ce, struct 
i915_gem_ww_ctx *ww, void *data),
-   int (*emit)(struct i915_request *rq, void 
*data),
-   void (*task)(void *data),
-   void *data)
-{
-   struct context_barrier_task *cb;
-   struct i915_gem_engines_iter it;
-   struct i915_gem_engines *e;
-   struct i915_gem_ww_ctx ww;
-   struct intel_context *ce;
-   int err = 0;
-
-   GEM_BUG_ON(!task);
-
-   cb = kmalloc(sizeof(*cb), GFP_KERNEL);
-   if (!cb)
-   return -ENOMEM;
-
-   i915_active_init(&cb->base, NULL, cb_retire, 0);
-   err = i915_active_acquire(&cb->base);
-   if (err) {
-   kfree(cb);
-   return err;
-   }
-
-   e = __context_engines_await(ctx, NULL);
-   if (!e) {
-   i915_active_release(&cb->base);
-   return -ENOENT;
-   }
-
-   for_each_gem_engine(ce, e, it) {
-   struct i915_request *rq;
-
-   if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
-  ce->engine->mask)) {
-   err = -ENXIO;
-   break;
-   }
-
-   if (!(ce->engine->mask & engines))
-   continue;
-
-   if (skip && skip(ce, data))
-   continue;
-
-   i915_gem_ww_ctx_init(&ww, true);
-retry:
-   err = intel_context_pin_ww(ce, &ww);
-   if (err)
-   goto err;
-
-   if (pin)
-   err = pin(ce, &ww, data);
-   if (err)
-   goto err_unpin;
-
-   rq = i915_request_create(ce);
-   if (IS_ERR(rq)) {
-   err = PTR_ERR(rq);
-   goto err_unpin;
-   }
-
-   err = 0;
-   if (emit)
-   err = emit(rq, data);
-   if (err == 0)
-   err = i915_active_add_request(&cb->base, rq);
-
-   i915_request_add(rq);
-err_unpin:
-   intel_context_unpin(ce);
-err:
-   if (err == -EDEADLK) {
-   err = i915_gem_ww_ctx_backoff(&ww);
-   if (!err)
-   goto retry;
-   }
-   i915_gem_ww_ctx_fini(&ww);
-
-   if (err)
-   break;
-   }
-   i915_sw_fence_complete(&e->fence);
-
-   cb->task = err ? NULL : task; /* caller needs to unwind instead */
-   cb->data = data;
-
-   i915_active_release(&cb->base);
-
-   return err;
-}
-
 static int get_ppgtt(struct drm_i915_file_private *file_priv,
 struct i915_gem_context *ctx,
 struct drm_i915_gem_context_param *args)
@@ -1779,150 +1665,6 @@ static int get_ppgtt(struct drm_i

[PATCH 24/29] drm/i915/gem: Delay context creation

2021-05-27 Thread Jason Ekstrand
The current context uAPI allows for two methods of setting context
parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM.  The
former is allowed to be called at any time while the later happens as
part of GEM_CONTEXT_CREATE.  Currently, everything settable via one is
settable via the other.  While some params are fairly simple and setting
them on a live context is harmless, such as the context priority, others are
far trickier such as the VM or the set of engines.  In order to swap out
the VM, for instance, we have to delay until all current in-flight work
is complete, swap in the new VM, and then continue.  This leads to a
plethora of potential race conditions we'd really rather avoid.

In previous patches, we added a i915_gem_proto_context struct which is
capable of storing and tracking all such create parameters.  This commit
delays the creation of the actual context until after the client is done
configuring it with SET_CONTEXT_PARAM.  From the perspective of the
client, it has the same u32 context ID the whole time.  From the
perspective of i915, however, it's an i915_gem_proto_context right up
until the point where we attempt to do something which the proto-context
can't handle at which point the real context gets created.

This is accomplished via a little xarray dance.  When GEM_CONTEXT_CREATE
is called, we create a proto-context, reserve a slot in context_xa but
leave it NULL, and store the proto-context in the corresponding slot in
proto_context_xa.  Then, whenever we go to look up a context, we first
check context_xa.  If it's there, we return the i915_gem_context and
we're done.  If it's not, we look in proto_context_xa and, if we find it
there, we create the actual context and kill the proto-context.

In order for this dance to work properly, everything which ever touches
a proto-context is guarded by drm_i915_file_private::proto_context_lock,
including context creation.  Yes, this means context creation now takes
a giant global lock but it can't really be helped and that should never
be on any driver's fast-path anyway.
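
The lookup dance reads roughly like this (a simplified sketch of the
flow described above; error handling and reference counting are
elided):

	struct i915_gem_context *
	lazy_context_lookup(struct drm_i915_private *i915,
			    struct drm_i915_file_private *fpriv, u32 id)
	{
		struct i915_gem_context *ctx;
		struct i915_gem_proto_context *pc;

		ctx = xa_load(&fpriv->context_xa, id);
		if (ctx)
			return ctx;	/* already a real context */

		mutex_lock(&fpriv->proto_context_lock);
		pc = xa_load(&fpriv->proto_context_xa, id);
		if (pc) {
			/* Finalize: build the real context, retire the proto. */
			ctx = i915_gem_create_context(i915, pc);
			xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
			xa_erase(&fpriv->proto_context_xa, id);
		}
		mutex_unlock(&fpriv->proto_context_lock);

		return ctx;
	}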

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 211 ++
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |   3 +
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  54 +
 .../gpu/drm/i915/gem/selftests/mock_context.c |   5 +-
 drivers/gpu/drm/i915/i915_drv.h   |  24 +-
 5 files changed, 239 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 8288af0d33245..f7c83730ee07f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -298,6 +298,42 @@ proto_context_create(struct drm_i915_private *i915, 
unsigned int flags)
return err;
 }
 
+static int proto_context_register_locked(struct drm_i915_file_private *fpriv,
+struct i915_gem_proto_context *pc,
+u32 *id)
+{
+   int ret;
+   void *old;
+
+   lockdep_assert_held(&fpriv->proto_context_lock);
+
+   ret = xa_alloc(&fpriv->context_xa, id, NULL, xa_limit_32b, GFP_KERNEL);
+   if (ret)
+   return ret;
+
+   old = xa_store(&fpriv->proto_context_xa, *id, pc, GFP_KERNEL);
+   if (xa_is_err(old)) {
+   xa_erase(&fpriv->context_xa, *id);
+   return xa_err(old);
+   }
+   GEM_BUG_ON(old);
+
+   return 0;
+}
+
+static int proto_context_register(struct drm_i915_file_private *fpriv,
+ struct i915_gem_proto_context *pc,
+ u32 *id)
+{
+   int ret;
+
+   mutex_lock(&fpriv->proto_context_lock);
+   ret = proto_context_register_locked(fpriv, pc, id);
+   mutex_unlock(&fpriv->proto_context_lock);
+
+   return ret;
+}
+
 static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
struct i915_gem_proto_context *pc,
const struct drm_i915_gem_context_param *args)
@@ -1448,12 +1484,12 @@ void i915_gem_init__contexts(struct drm_i915_private 
*i915)
init_contexts(&i915->gem.contexts);
 }
 
-static int gem_context_register(struct i915_gem_context *ctx,
-   struct drm_i915_file_private *fpriv,
-   u32 *id)
+static void gem_context_register(struct i915_gem_context *ctx,
+struct drm_i915_file_private *fpriv,
+u32 id)
 {
struct drm_i915_private *i915 = ctx->i915;
-   int ret;
+   void *old;
 
ctx->file_priv = fpriv;
 
@@ -1462,19 +1498,12 @@ static int gem_context_register(struct i915_gem_context 
*ctx,
 current->comm, pid_nr(ctx->pid));
 
/* And finally expose ourselves to userspace via the idr */
-   ret = xa_alloc(&fpriv->context_xa, id, ctx, xa_limit_32b, GFP_KERNEL);
-   if (

[PATCH 23/29] drm/i915/gt: Drop i915_address_space::file (v2)

2021-05-27 Thread Jason Ekstrand
There's a big comment saying how useful it is but no one is using this
for anything anymore.

It was added in 2bfa996e031b ("drm/i915: Store owning file on the
i915_address_space") and used for debugfs at the time as well as telling
the difference between the global GTT and a PPGTT.  In f6e8aa387171
("drm/i915: Report the number of closed vma held by each context in
debugfs") we removed one use of it by switching to a context walk and
comparing with the VM in the context.  Finally, VM stats for debugfs
were entirely nuked in db80a1294c23 ("drm/i915/gem: Remove per-client
stats from debugfs/i915_gem_objects").

v2 (Daniel Vetter):
 - Delete a struct drm_i915_file_private pre-declaration
 - Add a comment to the commit message about history

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  9 -
 drivers/gpu/drm/i915/gt/intel_gtt.h | 11 ---
 drivers/gpu/drm/i915/selftests/mock_gtt.c   |  1 -
 3 files changed, 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 76662175e6980..8288af0d33245 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1453,17 +1453,10 @@ static int gem_context_register(struct i915_gem_context 
*ctx,
u32 *id)
 {
struct drm_i915_private *i915 = ctx->i915;
-   struct i915_address_space *vm;
int ret;
 
ctx->file_priv = fpriv;
 
-   mutex_lock(&ctx->mutex);
-   vm = i915_gem_context_vm(ctx);
-   if (vm)
-   WRITE_ONCE(vm->file, fpriv); /* XXX */
-   mutex_unlock(&ctx->mutex);
-
ctx->pid = get_task_pid(current, PIDTYPE_PID);
snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
 current->comm, pid_nr(ctx->pid));
@@ -1562,8 +1555,6 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void 
*data,
if (IS_ERR(ppgtt))
return PTR_ERR(ppgtt);
 
-   ppgtt->vm.file = file_priv;
-
if (args->extensions) {
err = i915_user_extensions(u64_to_user_ptr(args->extensions),
   NULL, 0,
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
b/drivers/gpu/drm/i915/gt/intel_gtt.h
index ca00b45827b74..cbd89fded6f2a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -140,7 +140,6 @@ typedef u64 gen8_pte_t;
 
 enum i915_cache_level;
 
-struct drm_i915_file_private;
 struct drm_i915_gem_object;
 struct i915_fence_reg;
 struct i915_vma;
@@ -220,16 +219,6 @@ struct i915_address_space {
struct intel_gt *gt;
struct drm_i915_private *i915;
struct device *dma;
-   /*
-* Every address space belongs to a struct file - except for the global
-* GTT that is owned by the driver (and so @file is set to NULL). In
-* principle, no information should leak from one context to another
-* (or between files/processes etc) unless explicitly shared by the
-* owner. Tracking the owner is important in order to free up per-file
-* objects along with the file, to aide resource tracking, and to
-* assign blame.
-*/
-   struct drm_i915_file_private *file;
u64 total;  /* size addr space maps (ex. 2GB for ggtt) */
u64 reserved;   /* size addr space reserved */
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c 
b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index 5c7ae40bba634..cc047ec594f93 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -73,7 +73,6 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, 
const char *name)
ppgtt->vm.gt = &i915->gt;
ppgtt->vm.i915 = i915;
ppgtt->vm.total = round_down(U64_MAX, PAGE_SIZE);
-   ppgtt->vm.file = ERR_PTR(-ENODEV);
ppgtt->vm.dma = i915->drm.dev;
 
i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
-- 
2.31.1



[PATCH 20/29] drm/i915/gem: Make an alignment check more sensible

2021-05-27 Thread Jason Ekstrand
What we really want to check is that the size of the engines array, i.e.
args->size - sizeof(*user), is divisible by the element size, i.e.
sizeof(*user->engines), because that's what's required for computing the
array length right below the check.  However, we're currently not doing
this and instead doing a compile-time check that sizeof(*user) is
divisible by sizeof(*user->engines) and avoiding the subtraction.  As
far as I can tell, the only reason for the more confusing pair of checks
is to avoid a single subtraction of a constant.

The other thing the BUILD_BUG_ON might be trying to implicitly check is
that offsetof(typeof(*user), engines) == sizeof(*user) and we don't have any
weird padding throwing us off.  However, that's not the check it's doing
and it's not even a reliable way to do that check.
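
Spelled out, with "header" for the fixed-size prefix of the struct and
"elem" for one entry of the flexible array (a sketch of the two
predicates, not code from the patch):

	size_t header = sizeof(*user);		/* fixed prefix */
	size_t elem = sizeof(*user->engines);	/* one engine entry */

	/* Old: the total size divides into whole elements (plus a
	 * BUILD_BUG_ON that the header itself is element-aligned). */
	bool old_ok = args->size >= header && IS_ALIGNED(args->size, elem);

	/* New: the array payload alone divides into whole elements,
	 * which is exactly what the length computation divides by. */
	bool new_ok = args->size >= header &&
		      IS_ALIGNED(args->size - header, elem);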

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 12a148ba421b6..cf7c281977a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1758,9 +1758,8 @@ set_engines(struct i915_gem_context *ctx,
goto replace;
}
 
-   BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->engines)));
if (args->size < sizeof(*user) ||
-   !IS_ALIGNED(args->size, sizeof(*user->engines))) {
+   !IS_ALIGNED(args->size - sizeof(*user), sizeof(*user->engines))) {
drm_dbg(&i915->drm, "Invalid size for engine array: %d\n",
args->size);
return -EINVAL;
-- 
2.31.1



[PATCH 21/29] drm/i915/gem: Use the proto-context to handle create parameters (v2)

2021-05-27 Thread Jason Ekstrand
This means that the proto-context needs to grow support for engine
configuration information as well as setparam logic.  Fortunately, we'll
be deleting a lot of setparam logic on the primary context shortly so it
will hopefully balance out.

There's an extra bit of fun here when it comes to setting SSEU and the
way it interacts with PARAM_ENGINES.  Unfortunately, thanks to
SET_CONTEXT_PARAM and not being allowed to pick the order in which we
handle certain parameters, we have to think about those interactions.

v2 (Daniel Vetter):
 - Add a proto_context_free_user_engines helper
 - Comment on SSEU in the commit message
 - Use proto_context_set_persistence in set_proto_ctx_param

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 552 +-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  58 ++
 2 files changed, 588 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index cf7c281977a3e..d68c111bc824a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -191,10 +191,24 @@ static int validate_priority(struct drm_i915_private 
*i915,
return 0;
 }
 
+static void proto_context_free_user_engines(struct i915_gem_proto_context *pc)
+{
+   int i;
+
+   if (pc->user_engines) {
+   for (i = 0; i < pc->num_user_engines; i++)
+   kfree(pc->user_engines[i].siblings);
+   kfree(pc->user_engines);
+   }
+   pc->user_engines = NULL;
+   pc->num_user_engines = -1;
+}
+
 static void proto_context_close(struct i915_gem_proto_context *pc)
 {
if (pc->vm)
i915_vm_put(pc->vm);
+   proto_context_free_user_engines(pc);
kfree(pc);
 }
 
@@ -211,7 +225,7 @@ static int proto_context_set_persistence(struct 
drm_i915_private *i915,
if (!i915->params.enable_hangcheck)
return -EINVAL;
 
-   __set_bit(UCONTEXT_PERSISTENCE, &pc->user_flags);
+   pc->user_flags |= BIT(UCONTEXT_PERSISTENCE);
} else {
/* To cancel a context we use "preempt-to-idle" */
if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
@@ -233,7 +247,7 @@ static int proto_context_set_persistence(struct 
drm_i915_private *i915,
if (!intel_has_reset_engine(&i915->gt))
return -ENODEV;
 
-   __clear_bit(UCONTEXT_PERSISTENCE, &pc->user_flags);
+   pc->user_flags &= ~BIT(UCONTEXT_PERSISTENCE);
}
 
return 0;
@@ -248,6 +262,9 @@ proto_context_create(struct drm_i915_private *i915, 
unsigned int flags)
if (!pc)
return ERR_PTR(-ENOMEM);
 
+   pc->num_user_engines = -1;
+   pc->user_engines = NULL;
+
if (HAS_FULL_PPGTT(i915)) {
struct i915_ppgtt *ppgtt;
 
@@ -261,9 +278,8 @@ proto_context_create(struct drm_i915_private *i915, 
unsigned int flags)
pc->vm = &ppgtt->vm;
}
 
-   pc->user_flags = 0;
-   __set_bit(UCONTEXT_BANNABLE, &pc->user_flags);
-   __set_bit(UCONTEXT_RECOVERABLE, &pc->user_flags);
+   pc->user_flags = BIT(UCONTEXT_BANNABLE) |
+BIT(UCONTEXT_RECOVERABLE);
proto_context_set_persistence(i915, pc, true);
pc->sched.priority = I915_PRIORITY_NORMAL;
 
@@ -282,6 +298,429 @@ proto_context_create(struct drm_i915_private *i915, 
unsigned int flags)
return err;
 }
 
+static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
+   struct i915_gem_proto_context *pc,
+   const struct drm_i915_gem_context_param *args)
+{
+   struct i915_address_space *vm;
+
+   if (args->size)
+   return -EINVAL;
+
+   if (!pc->vm)
+   return -ENODEV;
+
+   if (upper_32_bits(args->value))
+   return -ENOENT;
+
+   vm = i915_gem_vm_lookup(fpriv, args->value);
+   if (!vm)
+   return -ENOENT;
+
+   i915_vm_put(pc->vm);
+   pc->vm = vm;
+
+   return 0;
+}
+
+struct set_proto_ctx_engines {
+   struct drm_i915_private *i915;
+   unsigned num_engines;
+   struct i915_gem_proto_engine *engines;
+};
+
+static int
+set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_load_balance __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   struct intel_engine_cs **siblings;
+   u16 num_siblings, idx;
+   unsigned int n;
+   int err;
+
+   if (!HAS_EXECLISTS(i915))
+   return -ENODEV;
+
+   if (intel_uc_uses_guc_submission(&i915->gt.uc))
+   return -ENODEV; /* not implemented yet */
+
+   

[PATCH 14/29] drm/i915/gem: Add a separate validate_priority helper

2021-05-27 Thread Jason Ekstrand
With the proto-context stuff added later in this series, we end up
having to duplicate set_priority.  This lets us avoid duplicating the
validation logic.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 42 +
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 910d31cb043e9..fc471243aa769 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -169,6 +169,28 @@ lookup_user_engine(struct i915_gem_context *ctx,
return i915_gem_context_get_engine(ctx, idx);
 }
 
+static int validate_priority(struct drm_i915_private *i915,
+const struct drm_i915_gem_context_param *args)
+{
+   s64 priority = args->value;
+
+   if (args->size)
+   return -EINVAL;
+
+   if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
+   return -ENODEV;
+
+   if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
+   priority < I915_CONTEXT_MIN_USER_PRIORITY)
+   return -EINVAL;
+
+   if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
+   !capable(CAP_SYS_NICE))
+   return -EPERM;
+
+   return 0;
+}
+
 static struct i915_address_space *
 context_get_vm_rcu(struct i915_gem_context *ctx)
 {
@@ -1744,23 +1766,13 @@ static void __apply_priority(struct intel_context *ce, 
void *arg)
 static int set_priority(struct i915_gem_context *ctx,
const struct drm_i915_gem_context_param *args)
 {
-   s64 priority = args->value;
-
-   if (args->size)
-   return -EINVAL;
-
-   if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
-   return -ENODEV;
-
-   if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
-   priority < I915_CONTEXT_MIN_USER_PRIORITY)
-   return -EINVAL;
+   int err;
 
-   if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
-   !capable(CAP_SYS_NICE))
-   return -EPERM;
+   err = validate_priority(ctx->i915, args);
+   if (err)
+   return err;
 
-   ctx->sched.priority = priority;
+   ctx->sched.priority = args->value;
context_apply_all(ctx, __apply_priority, ctx);
 
return 0;
-- 
2.31.1



[PATCH 18/29] drm/i915/gem: Optionally set SSEU in intel_context_set_gem

2021-05-27 Thread Jason Ekstrand
For now this is a no-op because everyone passes in a null SSEU but it
lets us get some of the error handling and selftest refactoring plumbed
through.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 41 +++
 .../gpu/drm/i915/gem/selftests/mock_context.c |  6 ++-
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f8f3f514b4265..d247fb223aac7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -320,9 +320,12 @@ context_get_vm_rcu(struct i915_gem_context *ctx)
} while (1);
 }
 
-static void intel_context_set_gem(struct intel_context *ce,
- struct i915_gem_context *ctx)
+static int intel_context_set_gem(struct intel_context *ce,
+struct i915_gem_context *ctx,
+struct intel_sseu sseu)
 {
+   int ret = 0;
+
GEM_BUG_ON(rcu_access_pointer(ce->gem_context));
RCU_INIT_POINTER(ce->gem_context, ctx);
 
@@ -349,6 +352,12 @@ static void intel_context_set_gem(struct intel_context *ce,
 
intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000);
}
+
+   /* A valid SSEU has no zero fields */
+   if (sseu.slice_mask && !WARN_ON(ce->engine->class != RENDER_CLASS))
+   ret = intel_context_reconfigure_sseu(ce, sseu);
+
+   return ret;
 }
 
 static void __free_engines(struct i915_gem_engines *e, unsigned int count)
@@ -416,7 +425,8 @@ static struct i915_gem_engines *alloc_engines(unsigned int 
count)
return e;
 }
 
-static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
+static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx,
+   struct intel_sseu rcs_sseu)
 {
const struct intel_gt *gt = &ctx->i915->gt;
struct intel_engine_cs *engine;
@@ -429,6 +439,8 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx)
 
for_each_engine(engine, gt, id) {
struct intel_context *ce;
+   struct intel_sseu sseu = {};
+   int ret;
 
if (engine->legacy_idx == INVALID_ENGINE)
continue;
@@ -442,10 +454,18 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx)
goto free_engines;
}
 
-   intel_context_set_gem(ce, ctx);
-
e->engines[engine->legacy_idx] = ce;
e->num_engines = max(e->num_engines, engine->legacy_idx + 1);
+
+   if (engine->class == RENDER_CLASS)
+   sseu = rcs_sseu;
+
+   ret = intel_context_set_gem(ce, ctx, sseu);
+   if (ret) {
+   err = ERR_PTR(ret);
+   goto free_engines;
+   }
+
}
 
return e;
@@ -759,6 +779,7 @@ __create_context(struct drm_i915_private *i915,
 {
struct i915_gem_context *ctx;
struct i915_gem_engines *e;
+   struct intel_sseu null_sseu = {};
int err;
int i;
 
@@ -776,7 +797,7 @@ __create_context(struct drm_i915_private *i915,
INIT_LIST_HEAD(&ctx->stale.engines);
 
mutex_init(&ctx->engines_mutex);
-   e = default_engines(ctx);
+   e = default_engines(ctx, null_sseu);
if (IS_ERR(e)) {
err = PTR_ERR(e);
goto err_free;
@@ -1543,6 +1564,7 @@ set_engines__load_balance(struct i915_user_extension 
__user *base, void *data)
struct intel_engine_cs *stack[16];
struct intel_engine_cs **siblings;
struct intel_context *ce;
+   struct intel_sseu null_sseu = {};
u16 num_siblings, idx;
unsigned int n;
int err;
@@ -1615,7 +1637,7 @@ set_engines__load_balance(struct i915_user_extension 
__user *base, void *data)
goto out_siblings;
}
 
-   intel_context_set_gem(ce, set->ctx);
+   intel_context_set_gem(ce, set->ctx, null_sseu);
 
if (cmpxchg(&set->engines->engines[idx], NULL, ce)) {
intel_context_put(ce);
@@ -1723,6 +1745,7 @@ set_engines(struct i915_gem_context *ctx,
struct drm_i915_private *i915 = ctx->i915;
struct i915_context_param_engines __user *user =
u64_to_user_ptr(args->value);
+   struct intel_sseu null_sseu = {};
struct set_engines set = { .ctx = ctx };
unsigned int num_engines, n;
u64 extensions;
@@ -1732,7 +1755,7 @@ set_engines(struct i915_gem_context *ctx,
if (!i915_gem_context_user_engines(ctx))
return 0;
 
-   set.engines = default_engines(ctx);
+   set.engines = default_engines(ctx, null_sseu);
if (IS_ERR(set.engines))
 

[PATCH 09/29] drm/i915/gem: Disallow bonding of virtual engines (v3)

2021-05-27 Thread Jason Ekstrand
This adds a bunch of complexity which the media driver has never
actually used.  The media driver does technically bond a balanced engine
to another engine but the balanced engine only has one engine in the
sibling set.  This doesn't actually result in a virtual engine.

This functionality was originally added to handle cases where we may
have more than two video engines and media might want to load-balance
their bonded submits by, for instance, submitting to a balanced vcs0-1
as the primary and then vcs2-3 as the secondary.  However, no such
hardware has shipped thus far and, if we ever want to enable such
use-cases in the future, we'll use the up-and-coming parallel submit API
which targets GuC submission.

This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We leave the
validation code in place in case we ever decide we want to do something
interesting with the bonding information.

v2 (Jason Ekstrand):
 - Don't delete quite as much code.

v3 (Tvrtko Ursulin):
 - Add some history to the commit message

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  69 --
 .../drm/i915/gt/intel_execlists_submission.h  |   4 -
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 --
 4 files changed, 6 insertions(+), 314 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fed3538de9241..5e159fb526631 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1552,6 +1552,12 @@ set_engines__bond(struct i915_user_extension __user 
*base, void *data)
}
virtual = set->engines->engines[idx]->engine;
 
+   if (intel_engine_is_virtual(virtual)) {
+   drm_dbg(&i915->drm,
+   "Bonding with virtual engines not allowed\n");
+   return -EINVAL;
+   }
+
err = check_user_mbz(&ext->flags);
if (err)
return err;
@@ -1592,18 +1598,6 @@ set_engines__bond(struct i915_user_extension __user 
*base, void *data)
n, ci.engine_class, ci.engine_instance);
return -EINVAL;
}
-
-   /*
-* A non-virtual engine has no siblings to choose between; and
-* a submit fence will always be directed to the one engine.
-*/
-   if (intel_engine_is_virtual(virtual)) {
-   err = intel_virtual_engine_attach_bond(virtual,
-  master,
-  bond);
-   if (err)
-   return err;
-   }
}
 
return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 0e8c320927d15..14378b28169b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -181,18 +181,6 @@ struct virtual_engine {
int prio;
} nodes[I915_NUM_ENGINES];
 
-   /*
-* Keep track of bonded pairs -- restrictions upon on our selection
-* of physical engines any particular request may be submitted to.
-* If we receive a submit-fence from a master engine, we will only
-* use one of sibling_mask physical engines.
-*/
-   struct ve_bond {
-   const struct intel_engine_cs *master;
-   intel_engine_mask_t sibling_mask;
-   } *bonds;
-   unsigned int num_bonds;
-
/* And finally, which physical engines this virtual engine maps onto. */
unsigned int num_siblings;
struct intel_engine_cs *siblings[];
@@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct 
work_struct *wrk)
intel_breadcrumbs_free(ve->base.breadcrumbs);
intel_engine_free_request_pool(&ve->base);
 
-   kfree(ve->bonds);
kfree(ve);
 }
 
@@ -3560,33 +3547,13 @@ static void virtual_submit_request(struct i915_request 
*rq)
spin_unlock_irqrestore(&ve->base.active.lock, flags);
 }
 
-static struct ve_bond *
-virtual_find_bond(struct virtual_engine *ve,
- const struct intel_engine_cs *master)
-{
-   int i;
-
-   for (i = 0; i < ve->num_bonds; i++) {
-   if (ve->bonds[i].master == master)
-   return &ve->bonds[i];
-   }
-
-   return NULL;
-}
-
 static void
 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 {
-   struct virtual_engine *ve = to_virtual_engine(rq->engine);
intel_engine_mask_t allowed, exec;
-   struct ve_bond *bond;
 
allowed = ~to_request(signal)->engine->mask;
 
-   bond = virtual_find_bond(ve, to_request(signal)->en

[PATCH 12/29] drm/i915/gem: Disallow creating contexts with too many engines

2021-05-27 Thread Jason Ekstrand
There's no sense in allowing userspace to create more engines than it
can possibly access via execbuf.
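
For context, execbuf picks the engine with the low bits of the flags
word, so only the first I915_EXEC_RING_MASK + 1 slots of the engine
array are ever reachable; a simplified sketch of the selection:

	/* Simplified engine selection on the execbuf path. */
	unsigned int idx = args->flags & I915_EXEC_RING_MASK;

	/* With a user engine map, idx indexes straight into the array,
	 * so entries beyond I915_EXEC_RING_MASK can never be used. */
	ce = i915_gem_context_get_engine(ctx, idx);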

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5e159fb526631..2b9207b557cc9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1639,11 +1639,11 @@ set_engines(struct i915_gem_context *ctx,
return -EINVAL;
}
 
-   /*
-* Note that I915_EXEC_RING_MASK limits execbuf to only using the
-* first 64 engines defined here.
-*/
num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);
+   /* RING_MASK has no shift so we can use it directly here */
+   if (num_engines > I915_EXEC_RING_MASK + 1)
+   return -EINVAL;
+
set.engines = alloc_engines(num_engines);
if (!set.engines)
return -ENOMEM;
-- 
2.31.1



[PATCH 11/29] drm/i915/request: Remove the hook from await_execution

2021-05-27 Thread Jason Ekstrand
This was only ever used for FENCE_SUBMIT automatic engine selection
which was removed in the previous commit.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  3 +-
 drivers/gpu/drm/i915/i915_request.c   | 42 ---
 drivers/gpu/drm/i915/i915_request.h   |  4 +-
 3 files changed, 9 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index efb2fa3522a42..7024adcd5cf15 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3473,8 +3473,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (in_fence) {
if (args->flags & I915_EXEC_FENCE_SUBMIT)
err = i915_request_await_execution(eb.request,
-  in_fence,
-  NULL);
+  in_fence);
else
err = i915_request_await_dma_fence(eb.request,
   in_fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 970d8f4986bbe..53f23ce40dd63 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -49,7 +49,6 @@
 struct execute_cb {
struct irq_work work;
struct i915_sw_fence *fence;
-   void (*hook)(struct i915_request *rq, struct dma_fence *signal);
struct i915_request *signal;
 };
 
@@ -180,17 +179,6 @@ static void irq_execute_cb(struct irq_work *wrk)
kmem_cache_free(global.slab_execute_cbs, cb);
 }
 
-static void irq_execute_cb_hook(struct irq_work *wrk)
-{
-   struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
-
-   cb->hook(container_of(cb->fence, struct i915_request, submit),
-&cb->signal->fence);
-   i915_request_put(cb->signal);
-
-   irq_execute_cb(wrk);
-}
-
 static __always_inline void
 __notify_execute_cb(struct i915_request *rq, bool (*fn)(struct irq_work *wrk))
 {
@@ -517,17 +505,12 @@ static bool __request_in_flight(const struct i915_request 
*signal)
 static int
 __await_execution(struct i915_request *rq,
  struct i915_request *signal,
- void (*hook)(struct i915_request *rq,
-  struct dma_fence *signal),
  gfp_t gfp)
 {
struct execute_cb *cb;
 
-   if (i915_request_is_active(signal)) {
-   if (hook)
-   hook(rq, &signal->fence);
+   if (i915_request_is_active(signal))
return 0;
-   }
 
cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
if (!cb)
@@ -537,12 +520,6 @@ __await_execution(struct i915_request *rq,
i915_sw_fence_await(cb->fence);
init_irq_work(&cb->work, irq_execute_cb);
 
-   if (hook) {
-   cb->hook = hook;
-   cb->signal = i915_request_get(signal);
-   cb->work.func = irq_execute_cb_hook;
-   }
-
/*
 * Register the callback first, then see if the signaler is already
 * active. This ensures that if we race with the
@@ -1253,7 +1230,7 @@ emit_semaphore_wait(struct i915_request *to,
goto await_fence;
 
/* Only submit our spinner after the signaler is running! */
-   if (__await_execution(to, from, NULL, gfp))
+   if (__await_execution(to, from, gfp))
goto await_fence;
 
if (__emit_semaphore_wait(to, from, from->fence.seqno))
@@ -1284,16 +1261,14 @@ static int intel_timeline_sync_set_start(struct 
intel_timeline *tl,
 
 static int
 __i915_request_await_execution(struct i915_request *to,
-  struct i915_request *from,
-  void (*hook)(struct i915_request *rq,
-   struct dma_fence *signal))
+  struct i915_request *from)
 {
int err;
 
GEM_BUG_ON(intel_context_is_barrier(from->context));
 
/* Submit both requests at the same time */
-   err = __await_execution(to, from, hook, I915_FENCE_GFP);
+   err = __await_execution(to, from, I915_FENCE_GFP);
if (err)
return err;
 
@@ -1406,9 +1381,7 @@ i915_request_await_external(struct i915_request *rq, 
struct dma_fence *fence)
 
 int
 i915_request_await_execution(struct i915_request *rq,
-struct dma_fence *fence,
-void (*hook)(struct i915_request *rq,
- struct dma_fence *signal))
+struct dma_fence *fence)
 {
struct dma_fence **child = &fence;
unsigned int nchild = 1;
@@ -1441,8 +1414,7 @@ i915_request_await_ex

[PATCH 19/29] drm/i915: Add an i915_gem_vm_lookup helper

2021-05-27 Thread Jason Ekstrand
This is the VM equivalent of i915_gem_context_lookup.  It's only used
once in this patch but future patches will need to duplicate this lookup
code so it's better to have it in a helper.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  6 +-
 drivers/gpu/drm/i915/i915_drv.h | 14 ++
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d247fb223aac7..12a148ba421b6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1346,11 +1346,7 @@ static int set_ppgtt(struct drm_i915_file_private 
*file_priv,
if (upper_32_bits(args->value))
return -ENOENT;
 
-   rcu_read_lock();
-   vm = xa_load(&file_priv->vm_xa, args->value);
-   if (vm && !kref_get_unless_zero(&vm->ref))
-   vm = NULL;
-   rcu_read_unlock();
+   vm = i915_gem_vm_lookup(file_priv, args->value);
if (!vm)
return -ENOENT;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 48316d273af66..fee2342219da1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1871,6 +1871,20 @@ i915_gem_context_lookup(struct drm_i915_file_private 
*file_priv, u32 id)
return ctx;
 }
 
+static inline struct i915_address_space *
+i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
+{
+   struct i915_address_space *vm;
+
+   rcu_read_lock();
+   vm = xa_load(&file_priv->vm_xa, id);
+   if (vm && !kref_get_unless_zero(&vm->ref))
+   vm = NULL;
+   rcu_read_unlock();
+
+   return vm;
+}
+
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
  u64 min_size, u64 alignment,
-- 
2.31.1



[PATCH 17/29] drm/i915/gem: Rework error handling in default_engines

2021-05-27 Thread Jason Ekstrand
Since free_engines works for partially constructed engine sets, we can
use the usual goto pattern.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 10bff488444b6..f8f3f514b4265 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -420,7 +420,7 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx)
 {
const struct intel_gt *gt = &ctx->i915->gt;
struct intel_engine_cs *engine;
-   struct i915_gem_engines *e;
+   struct i915_gem_engines *e, *err;
enum intel_engine_id id;
 
e = alloc_engines(I915_NUM_ENGINES);
@@ -438,18 +438,21 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx)
 
ce = intel_context_create(engine);
if (IS_ERR(ce)) {
-   __free_engines(e, e->num_engines + 1);
-   return ERR_CAST(ce);
+   err = ERR_CAST(ce);
+   goto free_engines;
}
 
intel_context_set_gem(ce, ctx);
 
e->engines[engine->legacy_idx] = ce;
-   e->num_engines = max(e->num_engines, engine->legacy_idx);
+   e->num_engines = max(e->num_engines, engine->legacy_idx + 1);
}
-   e->num_engines++;
 
return e;
+
+free_engines:
+   free_engines(e);
+   return err;
 }
 
 void i915_gem_context_release(struct kref *ref)
-- 
2.31.1



[PATCH 13/29] drm/i915: Stop manually RCU banging in reset_stats_ioctl (v2)

2021-05-27 Thread Jason Ekstrand
As far as I can tell, the only real reason for this is to avoid taking a
reference to the i915_gem_context.  The cost of those two atomics
probably pales in comparison to the cost of the ioctl itself so we're
really not buying ourselves anything here.  We're about to make context
lookup a tiny bit more complicated, so let's get rid of the one hand-
rolled case.

Some usermode drivers such as our Vulkan driver call GET_RESET_STATS on
every execbuf so the perf here could theoretically be an issue.  If this
ever does become a performance issue for any such userspace drivers,
they can set CONTEXT_PARAM_RECOVERABLE to false and look for -EIO
coming from execbuf to check for hangs instead.
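
For userspace that goes that route, the sequence is roughly the
following (an untested sketch against the existing uAPI; drmIoctl is
libdrm's ioctl wrapper and context_was_hung() is a hypothetical
recovery path):

	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_RECOVERABLE,
		.value = 0,
	};
	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);

	/* From now on, a hang on this context surfaces as -EIO. */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf) &&
	    errno == EIO)
		context_was_hung();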

v2 (Daniel Vetter):
 - Add a comment in the commit message about recoverable contexts

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 -
 drivers/gpu/drm/i915/i915_drv.h |  8 +---
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 2b9207b557cc9..910d31cb043e9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -2090,16 +2090,13 @@ int i915_gem_context_reset_stats_ioctl(struct 
drm_device *dev,
struct drm_i915_private *i915 = to_i915(dev);
struct drm_i915_reset_stats *args = data;
struct i915_gem_context *ctx;
-   int ret;
 
if (args->flags || args->pad)
return -EINVAL;
 
-   ret = -ENOENT;
-   rcu_read_lock();
-   ctx = __i915_gem_context_lookup_rcu(file->driver_priv, args->ctx_id);
+   ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id);
if (!ctx)
-   goto out;
+   return -ENOENT;
 
/*
 * We opt for unserialised reads here. This may result in tearing
@@ -2116,10 +2113,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device 
*dev,
args->batch_active = atomic_read(&ctx->guilty_count);
args->batch_pending = atomic_read(&ctx->active_count);
 
-   ret = 0;
-out:
-   rcu_read_unlock();
-   return ret;
+   i915_gem_context_put(ctx);
+   return 0;
 }
 
 /* GEM context-engines iterator: for_each_gem_engine() */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 39b5e019c1a5b..48316d273af66 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1857,19 +1857,13 @@ struct drm_gem_object *i915_gem_prime_import(struct 
drm_device *dev,
 
 struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int 
flags);
 
-static inline struct i915_gem_context *
-__i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id)
-{
-   return xa_load(&file_priv->context_xa, id);
-}
-
 static inline struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
 {
struct i915_gem_context *ctx;
 
rcu_read_lock();
-   ctx = __i915_gem_context_lookup_rcu(file_priv, id);
+   ctx = xa_load(&file_priv->context_xa, id);
if (ctx && !kref_get_unless_zero(&ctx->ref))
ctx = NULL;
rcu_read_unlock();
-- 
2.31.1



[PATCH 10/29] drm/i915/gem: Remove engine auto-magic with FENCE_SUBMIT (v2)

2021-05-27 Thread Jason Ekstrand
Even though FENCE_SUBMIT is only documented to wait until the request in
the in-fence starts instead of waiting until it completes, it has a bit
more magic than that.  If FENCE_SUBMIT is used to submit something to a
balanced engine, we would wait to assign engines until the primary
request was ready to start and then attempt to assign it to a different
engine than the primary.  There is an IGT test (the bonded-slice subtest
of gem_exec_balancer) which exercises this by submitting a primary batch
to a specific VCS and then using FENCE_SUBMIT to submit a secondary
which can run on any VCS and have i915 figure out which VCS to run it on
such that they can run in parallel.

However, this functionality has never been used in the real world.  The
media driver (the only user of FENCE_SUBMIT) always picks exactly two
physical engines to bond and never asks us to pick which to use.

v2 (Daniel Vetter):
 - Mention the exact IGT test this breaks

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h|  7 ---
 .../drm/i915/gt/intel_execlists_submission.c| 17 -
 3 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d640bba6ad9ab..efb2fa3522a42 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (args->flags & I915_EXEC_FENCE_SUBMIT)
err = i915_request_await_execution(eb.request,
   in_fence,
-  
eb.engine->bond_execute);
+  NULL);
else
err = i915_request_await_dma_fence(eb.request,
   in_fence);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 883bafc449024..68cfe5080325c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -446,13 +446,6 @@ struct intel_engine_cs {
 */
void(*submit_request)(struct i915_request *rq);
 
-   /*
-* Called on signaling of a SUBMIT_FENCE, passing along the signaling
-* request down to the bonded pairs.
-*/
-   void(*bond_execute)(struct i915_request *rq,
-   struct dma_fence *signal);
-
/*
 * Call when the priority on a request has changed and it and its
 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 14378b28169b7..635d6d2494d26 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3547,22 +3547,6 @@ static void virtual_submit_request(struct i915_request 
*rq)
spin_unlock_irqrestore(&ve->base.active.lock, flags);
 }
 
-static void
-virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
-{
-   intel_engine_mask_t allowed, exec;
-
-   allowed = ~to_request(signal)->engine->mask;
-
-   /* Restrict the bonded request to run on only the available engines */
-   exec = READ_ONCE(rq->execution_mask);
-   while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
-   ;
-
-   /* Prevent the master from being re-run on the bonded engines */
-   to_request(signal)->execution_mask &= ~allowed;
-}
-
 struct intel_context *
 intel_execlists_create_virtual(struct intel_engine_cs **siblings,
   unsigned int count)
@@ -3616,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
 
ve->base.schedule = i915_schedule;
ve->base.submit_request = virtual_submit_request;
-   ve->base.bond_execute = virtual_bond_execute;
 
INIT_LIST_HEAD(virtual_queue(ve));
ve->base.execlists.queue_priority_hint = INT_MIN;
-- 
2.31.1



[PATCH 07/29] drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)

2021-05-27 Thread Jason Ekstrand
This API is entirely unnecessary and I'd love to get rid of it.  If
userspace wants a single timeline across multiple contexts, they can
either use implicit synchronization or a syncobj, both of which existed
at the time this feature landed.  The justification given at the time
was that it would help GL drivers which are inherently single-timeline.
However, neither of our GL drivers actually wanted the feature.  i965
was already in maintenance mode at the time and iris uses syncobj for
everything.

Unfortunately, as much as I'd love to get rid of it, it is used by the
media driver so we can't do that.  We can, however, do the next-best
thing which is to embed a syncobj in the context and do exactly what
we'd expect from userspace internally.  This isn't an entirely identical
implementation because it's no longer atomic if userspace races with
itself by calling execbuffer2 twice simultaneously from different
threads.  It won't crash in that case; it just doesn't guarantee any
ordering between those two submits.  It also means that sync files
exported from different engines on a SINGLE_TIMELINE context will have
different fence contexts.  This is visible to userspace if it looks at
the obj_name field of sync_fence_info.

Moving SINGLE_TIMELINE to a syncobj emulation has a couple of technical
advantages beyond mere annoyance.  One is that intel_timeline is no
longer an api-visible object and can remain entirely an implementation
detail.  This may be advantageous as we make scheduler changes going
forward.  Second is that, together with deleting the CLONE_CONTEXT API,
we should now have a 1:1 mapping between intel_context and
intel_timeline which may help us reduce locking.
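
Internally, the emulation in the execbuffer path amounts to roughly
this (a trimmed sketch; the real patch also handles the error paths):

	if (unlikely(eb.gem_context->syncobj)) {
		struct dma_fence *fence;

		/* Wait for whatever last ran on this context... */
		fence = drm_syncobj_fence_get(eb.gem_context->syncobj);
		err = i915_request_await_dma_fence(eb.request, fence);
		dma_fence_put(fence);
	}

	/* ... build and submit the request ... */

	if (unlikely(eb.gem_context->syncobj))
		/* ...then make this request the new timeline tail. */
		drm_syncobj_replace_fence(eb.gem_context->syncobj,
					  &eb.request->fence);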

v2 (Tvrtko Ursulin):
 - Update the comment on i915_gem_context::syncobj to mention that it's
   an emulation and the possible race if userspace calls execbuffer2
   twice on the same context concurrently.
v2 (Jason Ekstrand):
 - Wrap the checks for eb.gem_context->syncobj in unlikely()
 - Drop the dma_fence reference
 - Improved commit message

v3 (Jason Ekstrand):
 - Move the dma_fence_put() to before the error exit

v4 (Tvrtko Ursulin):
 - Add a comment about fence contexts to the commit message

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 49 +--
 .../gpu/drm/i915/gem/i915_gem_context_types.h | 14 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 16 ++
 3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 97613e529aab3..aa792c9517e16 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -67,6 +67,8 @@
 #include 
 #include 
 
+#include 
+
 #include "gt/gen6_ppgtt.h"
 #include "gt/intel_context.h"
 #include "gt/intel_context_param.h"
@@ -224,10 +226,6 @@ static void intel_context_set_gem(struct intel_context *ce,
ce->vm = vm;
}
 
-   GEM_BUG_ON(ce->timeline);
-   if (ctx->timeline)
-   ce->timeline = intel_timeline_get(ctx->timeline);
-
if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
intel_engine_has_timeslices(ce->engine))
__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
@@ -351,9 +349,6 @@ void i915_gem_context_release(struct kref *ref)
mutex_destroy(&ctx->engines_mutex);
mutex_destroy(&ctx->lut_mutex);
 
-   if (ctx->timeline)
-   intel_timeline_put(ctx->timeline);
-
put_pid(ctx->pid);
mutex_destroy(&ctx->mutex);
 
@@ -570,6 +565,9 @@ static void context_close(struct i915_gem_context *ctx)
if (vm)
i915_vm_close(vm);
 
+   if (ctx->syncobj)
+   drm_syncobj_put(ctx->syncobj);
+
ctx->file_priv = ERR_PTR(-EBADF);
 
/*
@@ -765,33 +763,11 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
i915_vm_close(vm);
 }
 
-static void __set_timeline(struct intel_timeline **dst,
-  struct intel_timeline *src)
-{
-   struct intel_timeline *old = *dst;
-
-   *dst = src ? intel_timeline_get(src) : NULL;
-
-   if (old)
-   intel_timeline_put(old);
-}
-
-static void __apply_timeline(struct intel_context *ce, void *timeline)
-{
-   __set_timeline(&ce->timeline, timeline);
-}
-
-static void __assign_timeline(struct i915_gem_context *ctx,
- struct intel_timeline *timeline)
-{
-   __set_timeline(&ctx->timeline, timeline);
-   context_apply_all(ctx, __apply_timeline, timeline);
-}
-
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
 {
struct i915_gem_context *ctx;
+   int ret;
 
if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE &&
!HAS_EXECLISTS(i915))
@@ -820,16 +796,13 @@ i915_gem_create_context(struct drm_i915_private *

[PATCH 16/29] drm/i915/gem: Add an intermediate proto_context struct

2021-05-27 Thread Jason Ekstrand
The current context uAPI allows for two methods of setting context
parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM.  The
former is allowed to be called at any time while the latter happens as
part of GEM_CONTEXT_CREATE.  Currently, everything settable via one is
settable via the other.  While some params are fairly simple and setting
them on a live context is harmless, such as the context priority, others are
far trickier such as the VM or the set of engines.  In order to swap out
the VM, for instance, we have to delay until all current in-flight work
is complete, swap in the new VM, and then continue.  This leads to a
plethora of potential race conditions we'd really rather avoid.

Unfortunately, both methods of setting the VM and engine set are in
active use today so we can't simply disallow setting the VM or engine
set via SET_CONTEXT_PARAM.  In order to work around this wart, this
commit adds a proto-context struct which contains all the context create
parameters.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 145 ++
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  22 +++
 .../gpu/drm/i915/gem/selftests/mock_context.c |  16 +-
 3 files changed, 153 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fc471243aa769..10bff488444b6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -191,6 +191,97 @@ static int validate_priority(struct drm_i915_private *i915,
return 0;
 }
 
+static void proto_context_close(struct i915_gem_proto_context *pc)
+{
+   if (pc->vm)
+   i915_vm_put(pc->vm);
+   kfree(pc);
+}
+
+static int proto_context_set_persistence(struct drm_i915_private *i915,
+struct i915_gem_proto_context *pc,
+bool persist)
+{
+   if (persist) {
+   /*
+* Only contexts that are short-lived [that will expire or be
+* reset] are allowed to survive past termination. We require
+* hangcheck to ensure that the persistent requests are healthy.
+*/
+   if (!i915->params.enable_hangcheck)
+   return -EINVAL;
+
+   __set_bit(UCONTEXT_PERSISTENCE, &pc->user_flags);
+   } else {
+   /* To cancel a context we use "preempt-to-idle" */
+   if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
+   return -ENODEV;
+
+   /*
+* If the cancel fails, we then need to reset, cleanly!
+*
+* If the per-engine reset fails, all hope is lost! We resort
+* to a full GPU reset in that unlikely case, but realistically
+* if the engine could not reset, the full reset does not fare
+* much better. The damage has been done.
+*
+* However, if we cannot reset an engine by itself, we cannot
+* cleanup a hanging persistent context without causing
+* collateral damage, and we should not pretend we can by
+* exposing the interface.
+*/
+   if (!intel_has_reset_engine(&i915->gt))
+   return -ENODEV;
+
+   __clear_bit(UCONTEXT_PERSISTENCE, &pc->user_flags);
+   }
+
+   return 0;
+}
+
+static struct i915_gem_proto_context *
+proto_context_create(struct drm_i915_private *i915, unsigned int flags)
+{
+   struct i915_gem_proto_context *pc, *err;
+
+   pc = kzalloc(sizeof(*pc), GFP_KERNEL);
+   if (!pc)
+   return ERR_PTR(-ENOMEM);
+
+   if (HAS_FULL_PPGTT(i915)) {
+   struct i915_ppgtt *ppgtt;
+
+   ppgtt = i915_ppgtt_create(&i915->gt);
+   if (IS_ERR(ppgtt)) {
+   drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
+   PTR_ERR(ppgtt));
+   err = ERR_CAST(ppgtt);
+   goto proto_close;
+   }
+   pc->vm = &ppgtt->vm;
+   }
+
+   pc->user_flags = 0;
+   __set_bit(UCONTEXT_BANNABLE, &pc->user_flags);
+   __set_bit(UCONTEXT_RECOVERABLE, &pc->user_flags);
+   proto_context_set_persistence(i915, pc, true);
+   pc->sched.priority = I915_PRIORITY_NORMAL;
+
+   if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) {
+   if (!HAS_EXECLISTS(i915)) {
+   err = ERR_PTR(-EINVAL);
+   goto proto_close;
+   }
+   pc->single_timeline = true;
+   }
+
+   return pc;
+
+proto_close:
+   proto_context_close(pc);
+   return err;
+}
+
 static struct i915_address_space *
 context_get_vm_rcu(struct i915_gem_context *ctx)
 {
@@ -660,7 +751,8 @@ s

[PATCH 15/29] drm/i915: Add gem/i915_gem_context.h to the docs

2021-05-27 Thread Jason Ekstrand
In order to prevent kernel doc warnings, also fill out docs for any
missing fields and fix those that forgot the "@".

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 Documentation/gpu/i915.rst|  2 +
 .../gpu/drm/i915/gem/i915_gem_context_types.h | 43 ---
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 486c720f38907..0529e5183982e 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -422,6 +422,8 @@ Batchbuffer Parsing
 User Batchbuffer Execution
 --
 
+.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+
 .. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
:doc: User command execution
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index df76767f0c41b..5f0673a2129f9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -30,19 +30,39 @@ struct i915_address_space;
 struct intel_timeline;
 struct intel_ring;
 
+/**
+ * struct i915_gem_engines - A set of engines
+ */
 struct i915_gem_engines {
union {
+   /** @link: Link in i915_gem_context::stale::engines */
struct list_head link;
+
+   /** @rcu: RCU to use when freeing */
struct rcu_head rcu;
};
+
+   /** @fence: Fence used for delayed destruction of engines */
struct i915_sw_fence fence;
+
+   /** @ctx: i915_gem_context backpointer */
struct i915_gem_context *ctx;
+
+   /** @num_engines: Number of engines in this set */
unsigned int num_engines;
+
+   /** @engines: Array of engines */
struct intel_context *engines[];
 };
 
+/**
+ * struct i915_gem_engines_iter - Iterator for an i915_gem_engines set
+ */
 struct i915_gem_engines_iter {
+   /** @idx: Index into i915_gem_engines::engines */
unsigned int idx;
+
+   /** @engines: Engine set being iterated */
const struct i915_gem_engines *engines;
 };
 
@@ -53,10 +73,10 @@ struct i915_gem_engines_iter {
  * logical hardware state for a particular client.
  */
 struct i915_gem_context {
-   /** i915: i915 device backpointer */
+   /** @i915: i915 device backpointer */
struct drm_i915_private *i915;
 
-   /** file_priv: owning file descriptor */
+   /** @file_priv: owning file descriptor */
struct drm_i915_file_private *file_priv;
 
/**
@@ -81,7 +101,9 @@ struct i915_gem_context {
 * CONTEXT_USER_ENGINES flag is set).
 */
struct i915_gem_engines __rcu *engines;
-   struct mutex engines_mutex; /* guards writes to engines */
+
+   /** @engines_mutex: guards writes to engines */
+   struct mutex engines_mutex;
 
/**
 * @syncobj: Shared timeline syncobj
@@ -118,7 +140,7 @@ struct i915_gem_context {
 */
struct pid *pid;
 
-   /** link: place with &drm_i915_private.context_list */
+   /** @link: place with &drm_i915_private.context_list */
struct list_head link;
 
/**
@@ -153,11 +175,13 @@ struct i915_gem_context {
 #define CONTEXT_CLOSED 0
 #define CONTEXT_USER_ENGINES   1
 
+   /** @mutex: guards everything that isn't engines or handles_vma */
struct mutex mutex;
 
+   /** @sched: scheduler parameters */
struct i915_sched_attr sched;
 
-   /** guilty_count: How many times this context has caused a GPU hang. */
+   /** @guilty_count: How many times this context has caused a GPU hang. */
atomic_t guilty_count;
/**
 * @active_count: How many times this context was active during a GPU
@@ -171,15 +195,17 @@ struct i915_gem_context {
unsigned long hang_timestamp[2];
 #define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */
 
-   /** remap_slice: Bitmask of cache lines that need remapping */
+   /** @remap_slice: Bitmask of cache lines that need remapping */
u8 remap_slice;
 
/**
-* handles_vma: rbtree to look up our context specific obj/vma for
+* @handles_vma: rbtree to look up our context specific obj/vma for
 * the user handle. (user handles are per fd, but the binding is
 * per vm, which may be one per context or shared with the global GTT)
 */
struct radix_tree_root handles_vma;
+
+   /** @lut_mutex: Locks handles_vma */
struct mutex lut_mutex;
 
/**
@@ -191,8 +217,11 @@ struct i915_gem_context {
 */
char name[TASK_COMM_LEN + 8];
 
+   /** @stale: tracks stale engines to be destroyed */
struct {
+   /** @lock: guards engines */
spinlock_t lock;
+   /** @engines: list of stale engines */
struct list_head engines;
} stal

[PATCH 06/29] drm/i915: Drop the CONTEXT_CLONE API (v2)

2021-05-27 Thread Jason Ekstrand
This API allows one context to grab bits out of another context upon
creation.  It can be used as a short-cut for setparam(getparam()) for
things like I915_CONTEXT_PARAM_VM.  However, it's never been used by any
real userspace.  It's used by a few IGT tests and that's it.  Since it
doesn't add any real value (most of the stuff you can CLONE you can copy
in other ways), drop it.

There is one thing that this API allows you to clone which you cannot
clone via getparam/setparam: timelines.  However, timelines are an
implementation detail of i915 and not really something that needs to be
exposed to userspace.  Also, sharing timelines between contexts isn't
obviously useful and supporting it has the potential to complicate i915
internally.  It also doesn't add any functionality that the client can't
get in other ways.  If a client really wants a shared timeline, they can
use a syncobj and set it as an in and out fence on every submit.
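
Doing that from userspace is one extra struct per submit with the
existing fence-array uAPI; an untested sketch:

	struct drm_i915_gem_exec_fence shared = {
		.handle = timeline_syncobj,	/* one syncobj per timeline */
		.flags = I915_EXEC_FENCE_WAIT | I915_EXEC_FENCE_SIGNAL,
	};

	execbuf.flags |= I915_EXEC_FENCE_ARRAY;
	execbuf.num_cliprects = 1;	/* reused as the fence count */
	execbuf.cliprects_ptr = (uint64_t)(uintptr_t)&shared;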

v2 (Jason Ekstrand):
 - More detailed commit message

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 199 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  28 ---
 .../drm/i915/gt/intel_execlists_submission.h  |   3 -
 include/uapi/drm/i915_drm.h   |  16 +-
 4 files changed, 6 insertions(+), 240 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 6f1e5c2c5b113..97613e529aab3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1957,207 +1957,14 @@ static int create_setparam(struct i915_user_extension 
__user *ext, void *data)
return ctx_setparam(arg->fpriv, arg->ctx, &local.param);
 }
 
-static int clone_engines(struct i915_gem_context *dst,
-struct i915_gem_context *src)
+static int invalid_ext(struct i915_user_extension __user *ext, void *data)
 {
-   struct i915_gem_engines *clone, *e;
-   bool user_engines;
-   unsigned long n;
-
-   e = __context_engines_await(src, &user_engines);
-   if (!e)
-   return -ENOENT;
-
-   clone = alloc_engines(e->num_engines);
-   if (!clone)
-   goto err_unlock;
-
-   for (n = 0; n < e->num_engines; n++) {
-   struct intel_engine_cs *engine;
-
-   if (!e->engines[n]) {
-   clone->engines[n] = NULL;
-   continue;
-   }
-   engine = e->engines[n]->engine;
-
-   /*
-* Virtual engines are singletons; they can only exist
-* inside a single context, because they embed their
-* HW context... As each virtual context implies a single
-* timeline (each engine can only dequeue a single request
-* at any time), it would be surprising for two contexts
-* to use the same engine. So let's create a copy of
-* the virtual engine instead.
-*/
-   if (intel_engine_is_virtual(engine))
-   clone->engines[n] =
-   intel_execlists_clone_virtual(engine);
-   else
-   clone->engines[n] = intel_context_create(engine);
-   if (IS_ERR_OR_NULL(clone->engines[n])) {
-   __free_engines(clone, n);
-   goto err_unlock;
-   }
-
-   intel_context_set_gem(clone->engines[n], dst);
-   }
-   clone->num_engines = n;
-   i915_sw_fence_complete(&e->fence);
-
-   /* Serialised by constructor */
-   engines_idle_release(dst, rcu_replace_pointer(dst->engines, clone, 1));
-   if (user_engines)
-   i915_gem_context_set_user_engines(dst);
-   else
-   i915_gem_context_clear_user_engines(dst);
-   return 0;
-
-err_unlock:
-   i915_sw_fence_complete(&e->fence);
-   return -ENOMEM;
-}
-
-static int clone_flags(struct i915_gem_context *dst,
-  struct i915_gem_context *src)
-{
-   dst->user_flags = src->user_flags;
-   return 0;
-}
-
-static int clone_schedattr(struct i915_gem_context *dst,
-  struct i915_gem_context *src)
-{
-   dst->sched = src->sched;
-   return 0;
-}
-
-static int clone_sseu(struct i915_gem_context *dst,
- struct i915_gem_context *src)
-{
-   struct i915_gem_engines *e = i915_gem_context_lock_engines(src);
-   struct i915_gem_engines *clone;
-   unsigned long n;
-   int err;
-
-   /* no locking required; sole access under constructor*/
-   clone = __context_engines_static(dst);
-   if (e->num_engines != clone->num_engines) {
-   err = -EINVAL;
-   goto unlock;
-   }
-
-   for (n = 0; n < e->num_engines; n++) {
-   struct intel_context *ce = e->engines[n];
-
- 

[PATCH 05/29] drm/i915/gem: Return void from context_apply_all

2021-05-27 Thread Jason Ekstrand
None of the callbacks we use with it return an error code anymore; they
all return 0 unconditionally.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 26 +++--
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 9a8a96e4346e4..6f1e5c2c5b113 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -718,32 +718,25 @@ __context_engines_await(const struct i915_gem_context *ctx,
return engines;
 }
 
-static int
+static void
 context_apply_all(struct i915_gem_context *ctx,
- int (*fn)(struct intel_context *ce, void *data),
+ void (*fn)(struct intel_context *ce, void *data),
  void *data)
 {
struct i915_gem_engines_iter it;
struct i915_gem_engines *e;
struct intel_context *ce;
-   int err = 0;
 
e = __context_engines_await(ctx, NULL);
-   for_each_gem_engine(ce, e, it) {
-   err = fn(ce, data);
-   if (err)
-   break;
-   }
+   for_each_gem_engine(ce, e, it)
+   fn(ce, data);
i915_sw_fence_complete(&e->fence);
-
-   return err;
 }
 
-static int __apply_ppgtt(struct intel_context *ce, void *vm)
+static void __apply_ppgtt(struct intel_context *ce, void *vm)
 {
i915_vm_put(ce->vm);
ce->vm = i915_vm_get(vm);
-   return 0;
 }
 
 static struct i915_address_space *
@@ -783,10 +776,9 @@ static void __set_timeline(struct intel_timeline **dst,
intel_timeline_put(old);
 }
 
-static int __apply_timeline(struct intel_context *ce, void *timeline)
+static void __apply_timeline(struct intel_context *ce, void *timeline)
 {
__set_timeline(&ce->timeline, timeline);
-   return 0;
 }
 
 static void __assign_timeline(struct i915_gem_context *ctx,
@@ -1841,19 +1833,17 @@ set_persistence(struct i915_gem_context *ctx,
return __context_set_persistence(ctx, args->value);
 }
 
-static int __apply_priority(struct intel_context *ce, void *arg)
+static void __apply_priority(struct intel_context *ce, void *arg)
 {
struct i915_gem_context *ctx = arg;
 
if (!intel_engine_has_timeslices(ce->engine))
-   return 0;
+   return;
 
if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
intel_context_set_use_semaphores(ce);
else
intel_context_clear_use_semaphores(ce);
-
-   return 0;
 }
 
 static int set_priority(struct i915_gem_context *ctx,
-- 
2.31.1



[PATCH 04/29] drm/i915/gem: Set the watchdog timeout directly in intel_context_set_gem (v2)

2021-05-27 Thread Jason Ekstrand
Instead of handling it like a context param, unconditionally set it when
intel_contexts are created.  For years we've had the idea of a watchdog
uAPI floating about.  The aim was for media, so that they could set very
tight deadlines for their transcode jobs, and a corrupt bitstream
(especially for decoding) wouldn't hang your desktop too hard.  But it's
been stuck in limbo forever, and this simplifies things a bit in
preparation for the proto-context work.  If we decide to actually make
said uAPI a reality, we can do it through the proto-context easily
enough.

This does mean that we move to reading the request_timeout_ms param
once per engine when engines are created, instead of once at context
creation.  If someone changes request_timeout_ms between creating a
context and setting engines, they will get the new timeout.  If someone
races setting request_timeout_ms against context creation, they can
theoretically end up with different timeouts.  However, since both of
these are fairly harmless and require changing kernel params, we don't
care.
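
A condensed view of the conversion this patch performs (taken from the
diff below; intel_context_set_watchdog_us() stores microseconds, so the
modparam's milliseconds are widened and scaled first):

	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
	    ctx->i915->params.request_timeout_ms) {
		unsigned int timeout_ms = ctx->i915->params.request_timeout_ms;

		/* Widen to u64 before scaling so large values can't overflow. */
		intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000);
	}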

v2 (Tvrtko Ursulin):
 - Add a comment about races with request_timeout_ms

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 44 +++
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  4 --
 drivers/gpu/drm/i915/gt/intel_context_param.h |  3 +-
 3 files changed, 7 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 868c18c08a0b1..9a8a96e4346e4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -232,7 +232,12 @@ static void intel_context_set_gem(struct intel_context *ce,
intel_engine_has_timeslices(ce->engine))
__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
 
-   intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
+   if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
+   ctx->i915->params.request_timeout_ms) {
+   unsigned int timeout_ms = ctx->i915->params.request_timeout_ms;
+
+   intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000);
+   }
 }
 
 static void __free_engines(struct i915_gem_engines *e, unsigned int count)
@@ -791,41 +796,6 @@ static void __assign_timeline(struct i915_gem_context *ctx,
context_apply_all(ctx, __apply_timeline, timeline);
 }
 
-static int __apply_watchdog(struct intel_context *ce, void *timeout_us)
-{
-   return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us);
-}
-
-static int
-__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us)
-{
-   int ret;
-
-   ret = context_apply_all(ctx, __apply_watchdog,
-   (void *)(uintptr_t)timeout_us);
-   if (!ret)
-   ctx->watchdog.timeout_us = timeout_us;
-
-   return ret;
-}
-
-static void __set_default_fence_expiry(struct i915_gem_context *ctx)
-{
-   struct drm_i915_private *i915 = ctx->i915;
-   int ret;
-
-   if (!IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) ||
-   !i915->params.request_timeout_ms)
-   return;
-
-   /* Default expiry for user fences. */
-   ret = __set_watchdog(ctx, i915->params.request_timeout_ms * 1000);
-   if (ret)
-   drm_notice(&i915->drm,
-  "Failed to configure default fence expiry! (%d)",
-  ret);
-}
-
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
 {
@@ -870,8 +840,6 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
intel_timeline_put(timeline);
}
 
-   __set_default_fence_expiry(ctx);
-
trace_i915_context_create(ctx);
 
return ctx;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 5ae71ec936f7c..676592e27e7d2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -153,10 +153,6 @@ struct i915_gem_context {
 */
atomic_t active_count;
 
-   struct {
-   u64 timeout_us;
-   } watchdog;
-
/**
 * @hang_timestamp: The last time(s) this context caused a GPU hang
 */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.h b/drivers/gpu/drm/i915/gt/intel_context_param.h
index dffedd983693d..0c69cb42d075c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_param.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_param.h
@@ -10,11 +10,10 @@
 
 #include "intel_context.h"
 
-static inline int
+static inline void
 intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us)
 {
ce->watchdog.timeout_us = timeout_us;
-   return 0;
 }
 
 #endif /* INTEL_CONTEXT_PARAM_H */
-- 
2.31.1



[PATCH 01/29] drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE

2021-05-27 Thread Jason Ekstrand
This reverts commit 88be76cdafc7 ("drm/i915: Allow userspace to specify
ringsize on construction").  This API was originally added for OpenCL,
but the compute-runtime PR has sat open for a year without action, so we
can still pull it out if we want.  I argue we should drop it for three
reasons:

 1. If the compute-runtime PR has sat open for a year, this clearly
isn't that important.

 2. It's a very leaky API.  Ring size is an implementation detail of the
current execlist scheduler and really only makes sense there.  It
can't apply to the older ring-buffer scheduler on pre-execlist
hardware because that's shared across all contexts and it won't
apply to the GuC scheduler that's in the pipeline.

 3. Having userspace set a ring size in bytes is a bad solution to the
problem of having too small a ring.  There is no way that userspace
has the information to know how to properly set the ring size, so
it's just going to detect the feature and always set it to the
maximum of 512K.  This is what the compute-runtime PR does (see the
sketch below).  The
scheduler in i915, on the other hand, does have the information to
make an informed choice.  It could detect if the ring size is a
problem and grow it itself.  Or, if that's too hard, we could just
increase the default size from 16K to 32K or even 64K instead of
relying on userspace to do it.

Let's drop this API for now and, if someone decides they really care
about solving this problem, they can do it properly.
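
As an illustration of point 3, here is a hedged sketch of the only
sensible way userspace could use the uAPI being dropped (drm_fd and
ctx_id are assumed to come from the usual DRM setup; error handling
elided):

	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param  = I915_CONTEXT_PARAM_RINGSIZE,
		.value  = 512 * 1024,	/* 512K, the largest value set_ringsize() accepts */
	};
	ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);

Userspace has no better information than "as big as possible", which is
exactly why this decision belongs in the kernel-side scheduler.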

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/Makefile |  1 -
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 85 +--
 drivers/gpu/drm/i915/gt/intel_context_param.c | 63 --
 drivers/gpu/drm/i915/gt/intel_context_param.h |  3 -
 include/uapi/drm/i915_drm.h   | 20 +
 5 files changed, 4 insertions(+), 168 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/gt/intel_context_param.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d0d936d9137bc..afa22338fa343 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -88,7 +88,6 @@ gt-y += \
gt/gen8_ppgtt.o \
gt/intel_breadcrumbs.o \
gt/intel_context.o \
-   gt/intel_context_param.o \
gt/intel_context_sseu.o \
gt/intel_engine_cs.o \
gt/intel_engine_heartbeat.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 188dee13e017d..650364a0dae28 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1334,63 +1334,6 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
return err;
 }
 
-static int __apply_ringsize(struct intel_context *ce, void *sz)
-{
-   return intel_context_set_ring_size(ce, (unsigned long)sz);
-}
-
-static int set_ringsize(struct i915_gem_context *ctx,
-   struct drm_i915_gem_context_param *args)
-{
-   if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
-   return -ENODEV;
-
-   if (args->size)
-   return -EINVAL;
-
-   if (!IS_ALIGNED(args->value, I915_GTT_PAGE_SIZE))
-   return -EINVAL;
-
-   if (args->value < I915_GTT_PAGE_SIZE)
-   return -EINVAL;
-
-   if (args->value > 128 * I915_GTT_PAGE_SIZE)
-   return -EINVAL;
-
-   return context_apply_all(ctx,
-__apply_ringsize,
-__intel_context_ring_size(args->value));
-}
-
-static int __get_ringsize(struct intel_context *ce, void *arg)
-{
-   long sz;
-
-   sz = intel_context_get_ring_size(ce);
-   GEM_BUG_ON(sz > INT_MAX);
-
-   return sz; /* stop on first engine */
-}
-
-static int get_ringsize(struct i915_gem_context *ctx,
-   struct drm_i915_gem_context_param *args)
-{
-   int sz;
-
-   if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
-   return -ENODEV;
-
-   if (args->size)
-   return -EINVAL;
-
-   sz = context_apply_all(ctx, __get_ringsize, NULL);
-   if (sz < 0)
-   return sz;
-
-   args->value = sz;
-   return 0;
-}
-
 int
 i915_gem_user_to_context_sseu(struct intel_gt *gt,
  const struct drm_i915_gem_context_param_sseu *user,
@@ -2036,11 +1979,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
ret = set_persistence(ctx, args);
break;
 
-   case I915_CONTEXT_PARAM_RINGSIZE:
-   ret = set_ringsize(ctx, args);
-   break;
-
case I915_CONTEXT_PARAM_BAN_PERIOD:
+   case I915_CONTEXT_PARAM_RINGSIZE:
default:
ret = -EINVAL;
break;
@@ -2068,18 +2008,6 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
return ctx_setparam(arg->fpr

[PATCH 03/29] drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP

2021-05-27 Thread Jason Ekstrand
The idea behind this param was to support OpenCL drivers with relocations,
because OpenCL reserves 0x0 for NULL and, if we placed memory there, it
would confuse CL kernels.  It was originally sent out as part of a patch
series including libdrm [1] and Beignet [2] support.  However, the
libdrm and Beignet patches never landed in their respective upstream
projects so this API has never been used.  It's never been used in Mesa
or any other driver, either.

Dropping this API allows us to delete a small bit of code.

[1]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067030.html
[2]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067031.html
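
For context, the mechanism being deleted was tiny; condensed from the
execbuffer diff below, the param simply biased placement away from
offset 0 for any object userspace hadn't explicitly pinned:

	if (test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags))
		eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;

	...

	if (!(entry->flags & EXEC_OBJECT_PINNED))
		entry->flags |= eb->context_flags;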

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c  | 16 ++--
 .../gpu/drm/i915/gem/i915_gem_context_types.h|  1 -
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c   |  8 
 include/uapi/drm/i915_drm.h  |  4 
 4 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ec999b7ca50f4..868c18c08a0b1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1920,15 +1920,6 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
int ret = 0;
 
switch (args->param) {
-   case I915_CONTEXT_PARAM_NO_ZEROMAP:
-   if (args->size)
-   ret = -EINVAL;
-   else if (args->value)
-   set_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-   else
-   clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-   break;
-
case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
if (args->size)
ret = -EINVAL;
@@ -1978,6 +1969,7 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
ret = set_persistence(ctx, args);
break;
 
+   case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
default:
@@ -2358,11 +2350,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
return -ENOENT;
 
switch (args->param) {
-   case I915_CONTEXT_PARAM_NO_ZEROMAP:
-   args->size = 0;
-   args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-   break;
-
case I915_CONTEXT_PARAM_GTT_SIZE:
args->size = 0;
rcu_read_lock();
@@ -2410,6 +2397,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
args->value = i915_gem_context_is_persistent(ctx);
break;
 
+   case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
default:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 340473aa70de0..5ae71ec936f7c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -129,7 +129,6 @@ struct i915_gem_context {
 * @user_flags: small set of booleans controlled by the user
 */
unsigned long user_flags;
-#define UCONTEXT_NO_ZEROMAP0
 #define UCONTEXT_NO_ERROR_CAPTURE  1
 #define UCONTEXT_BANNABLE  2
 #define UCONTEXT_RECOVERABLE   3
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 297143511f99b..b812f313422a9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -290,7 +290,6 @@ struct i915_execbuffer {
struct intel_context *reloc_context;
 
u64 invalid_flags; /** Set of execobj.flags that are invalid */
-   u32 context_flags; /** Set of execobj.flags to insert from the ctx */
 
u64 batch_len; /** Length of batch within object */
u32 batch_start_offset; /** Location within object of batch */
@@ -541,9 +540,6 @@ eb_validate_vma(struct i915_execbuffer *eb,
		entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP;
}
 
-   if (!(entry->flags & EXEC_OBJECT_PINNED))
-   entry->flags |= eb->context_flags;
-
return 0;
 }
 
@@ -750,10 +746,6 @@ static int eb_select_context(struct i915_execbuffer *eb)
if (rcu_access_pointer(ctx->vm))
eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
 
-   eb->context_flags = 0;
-   if (test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags))
-   eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
-
return 0;
 }
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ad8f1a0f587f6..e527f5f7e0dea 100644
--- a/include/uapi/drm/i915

[PATCH 02/29] drm/i915: Stop storing the ring size in the ring pointer (v2)

2021-05-27 Thread Jason Ekstrand
Previously, we were storing the ring size in the ring pointer before it
was actually allocated.  We would then guard setting the ring size
behind a check for CONTEXT_ALLOC_BIT.  This is error-prone at best and
really only saves us a few bytes on something that already burns at
least 4K.  Instead, this patch adds a new ring_size field and makes
everything use that.
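
A condensed before/after of the trick being removed (as in the diff
below; u64_to_ptr() smuggled the size through the not-yet-allocated
pointer):

	/* Before: the size lives in the pointer until CONTEXT_ALLOC_BIT flips. */
	ce->ring = __intel_context_ring_size(SZ_16K);

	/* After: an explicit field, no casts, no allocation-state guard. */
	ce->ring = NULL;
	ce->ring_size = SZ_16K;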

v2 (Daniel Vetter):
 - Replace 512 * SZ_4K with SZ_2M

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 3 +--
 drivers/gpu/drm/i915/gt/intel_context.c   | 3 ++-
 drivers/gpu/drm/i915/gt/intel_context.h   | 5 -
 drivers/gpu/drm/i915/gt/intel_context_types.h | 1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   | 2 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 2 +-
 drivers/gpu/drm/i915/gt/selftest_mocs.c   | 2 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   | 2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c  | 7 ++-
 9 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 650364a0dae28..ec999b7ca50f4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -211,8 +211,7 @@ static void intel_context_set_gem(struct intel_context *ce,
GEM_BUG_ON(rcu_access_pointer(ce->gem_context));
RCU_INIT_POINTER(ce->gem_context, ctx);
 
-   if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))
-   ce->ring = __intel_context_ring_size(SZ_16K);
+   ce->ring_size = SZ_16K;
 
if (rcu_access_pointer(ctx->vm)) {
struct i915_address_space *vm;
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9f..bd63813c8a802 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -371,7 +371,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
ce->engine = engine;
ce->ops = engine->cops;
ce->sseu = engine->sseu;
-   ce->ring = __intel_context_ring_size(SZ_4K);
+   ce->ring = NULL;
+   ce->ring_size = SZ_4K;
 
ewma_runtime_init(&ce->runtime.avg);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39fc..b10cbe8fee992 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -175,11 +175,6 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
 
-static inline struct intel_ring *__intel_context_ring_size(u64 sz)
-{
-   return u64_to_ptr(struct intel_ring, sz);
-}
-
 static inline bool intel_context_is_barrier(const struct intel_context *ce)
 {
return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346b..90026c1771055 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -82,6 +82,7 @@ struct intel_context {
spinlock_t signal_lock; /* protects signals, the list of requests */
 
struct i915_vma *state;
+   u32 ring_size;
struct intel_ring *ring;
struct intel_timeline *timeline;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index aafe2a4df4960..890b43b296a90 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -845,7 +845,7 @@ int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine)
if (IS_ERR(vma))
return PTR_ERR(vma);
 
-   ring = intel_engine_create_ring(engine, (unsigned long)ce->ring);
+   ring = intel_engine_create_ring(engine, ce->ring_size);
if (IS_ERR(ring)) {
err = PTR_ERR(ring);
goto err_vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1081cd36a2bd3..01d9896dd4844 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -2793,7 +2793,7 @@ static int __live_preempt_ring(struct intel_engine_cs *engine,
goto err_ce;
}
 
-   tmp->ring = __intel_context_ring_size(ring_sz);
+   tmp->ring_size = ring_sz;
 
err = intel_context_pin(tmp);
if (err) {
diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index e55a887d11e2b..f343fa5fd986f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -28,7 +28,7 @@ static struct intel_context *mocs_context_create(struct intel_engine_cs *engine)
return ce;
 
/
