Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-20 Thread Chen Li
On Fri, 18 Dec 2020 16:10:12 +0800,
Christian König wrote:
> 
> Am 18.12.20 um 04:51 schrieb Chen Li:
> > [SNIP]
>  If your ARM base board can't do that for some then you can't use the 
>  hardware
>  with that board.
> >>> Good to know, thanks! BTW, have you ever seen or heard boards like mine 
> >>> which cannot mmap device memory correctly from userspace correctly?
> >> Unfortunately yes. We haven't been able to figure out what exactly goes 
> >> wrong in
> >> those cases.
> > Ok. one more question: only e8860 or all radeon cards have this issue?
> 
> This applies to all hardware with dedicated memory which needs to be mapped to
> userspace.
> 
> That includes all graphics hardware from AMD as well as NVidia and probably a
> whole bunch of other PCIe devices.

Can mmio on these devices work fine in kernel space? I cannot see the 
difference here except user space should use uncacheable mmap to map virtual 
memory to device space(though I don't know how to use uncacheable mmap), while 
kernel use uncache ioremap. 

> 
> >>>   The graphics address remapping table (GART),[1] also known as the 
> >>> graphics aperture remapping table,[2] or graphics translation table 
> >>> (GTT),[3] is an I/O memory management unit (IOMMU) used by Accelerated 
> >>> Graphics Port (AGP) and PCI Express (PCIe) graphics cards.
> >> GART or GTT refers to the translation tables graphics hardware use to 
> >> access
> >> system memory.
> >> 
> >> Something like 15 years ago we used the IOMMU functionality from AGP to
> >> implement that. But modern hardware (PCIe) uses some specialized hardware 
> >> in the
> >> GPU for that.
> >> 
> >> Regards,
> >> Christian.
> >> 
> >> 
> >> 
> > Good to know, thanks! So modern GART/GTT is like tlb, and iommu is forcused 
> > on translating address and not manager the tlb.
> 
> You are getting closer in your understanding, but the TLB is the Translation
> lookaside buffer. Basically a cache of recent VM translations which is present
> is all page table translations (GART, IOMMU, CPU etc...).
> 
> The key difference is where the page table translation happens on modern
> hardware:
> 1. For the GART/GTT it is inside the GPU to translate between GPU internal and
> bus addresses.
> 2. For IOMMU it is inside the root complex of the PCIe to translate between 
> bus
> addresses and physical addresses.
> 3. For CPU page tables it is inside the CPU core to translate between virtual
> addresses and physical addresses.
> 
> Regards,
> Christian.
> 
> 

Awesome explaination! Thanks in a ton!


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Robin Murphy

On 2020-12-18 14:33, Christian König wrote:

Am 18.12.20 um 15:17 schrieb Robin Murphy:

On 2020-12-17 14:02, Christian König wrote:

[SNIP]
Do you have some background why some ARM boards fail with that?

We had a couple of reports that memset/memcpy fail in userspace 
(usually system just spontaneously reboots or becomes unresponsive), 
but so far nobody could tell us why that happens?


Part of it is that Arm doesn't really have an ideal memory type for 
mapping RAM behind PCI (much like we also struggle with the vague 
expectations of what write-combine might mean beyond x86). Device 
memory can be relaxed to allow gathering, reordering and 
write-buffering, but is still a bit too restrictive in other ways - 
aligned, non-speculative, etc. - for something that's really just RAM 
and expected to be usable as such. Thus to map PCI memory as 
"write-combine" we use Normal non-cacheable, which means the CPU MMU 
is going to allow software to do all the things it might expect of 
RAM, but we're now at the mercy of the menagerie of interconnects and 
PCI implementations out there.


I see. As far as I know we already correctly map the RAM from the GPU as 
"write-combine".


Atomic operations, for example, *might* be resolved by the CPU 
coherency mechanism or in the interconnect, such that the PCI host 
bridge only sees regular loads and stores, but more often than not 
they'll just result in an atomic transaction going all the way to the 
host bridge. A super-duper-clever host bridge implementation might 
even support that, but the vast majority are likely to just reject it 
as invalid.


Support for atomics is actually specified by an PCIe extension. As far 
as I know that extension is even necessary for full KFD support on AMD 
and full Cuda support for NVidia GPUs.




Similarly, unaligned accesses, cache line fills/evictions, and such 
will often work, since they're essentially just larger read/write 
bursts, but some host bridges can be picky and might reject access 
sizes they don't like (there's at least one where even 64-bit accesses 
don't work. On a 64-bit system...)


This is breaking our neck here. We need 64bit writes on 64bit systems to 
end up as one 64bit write at the hardware and not two 32bit writes or 
otherwise the doorbells won't work correctly.


Just to clarify, that particular case *is* considered catastrophically 
broken ;)


In general you can assume that on AArch64, any aligned 64-bit load or 
store is atomic (64-bit accesses on 32-bit Arm are less well-defined, 
but hopefully nobody cares by now).


Larger writes are pretty much unproblematic, for P2P our bus interface 
even supports really large multi byte transfers.


If an invalid transaction does reach the host bridge, it's going to 
come back to the CPU as an external abort. If we're really lucky that 
could be taken synchronously, attributable to a specific instruction, 
and just oops/SIGBUS the relevant kernel/userspace thread. Often 
though, (particularly with big out-of-order CPUs) it's likely to be 
asynchronous and no longer attributable, and thus taken as an SError 
event, which in general roughly translates to "part of the SoC has 
fallen off". The only reasonable response we have to that is to panic 
the system.


Yeah, that sounds exactly like what we see on some of the ARM boards out 
there. At least we have an explanation for that behavior now.


Going to talk about this with our hardware engineers. We might be able 
to work around some of that stuff, but that is rather tricky to get 
working under those conditions.


Yeah, unfortunately there's no easy way to judge the quality of any 
given SoC's PCI implementation until you throw your required traffic at 
it and things either break or don't...


Cheers,
Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Robin Murphy

On 2020-12-18 06:14, Chen Li wrote:
[...]

No, not performance. See standards like OpenGL, Vulkan as well as VA-API and
VDPAU require that you can mmap() device memory and execute memset/memcpy on
the memory from userspace.

If your ARM base board can't do that for some then you can't use the hardware
with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based systems
I believe it should be able to mmap() to userspace with the Normal memory type,
where unaligned accesses and such are allowed, as opposed to the Device memory
type intended for MMIO mappings, which has more restrictions but stricter
ordering guarantees.
  
Hi, Robin. I cannot understand it allow unaligned accesses. prefetchable PCI bar should also be mmio, and accesses will end with device memory, so why does this allow unaligned access?


Because even Device-GRE is a bit too restrictive to expose to userspace 
that's likely to expect it to behave as regular memory, so, for better 
or worse, we use MT_NORMAL_MC for pgrprot_writecombine().



Regardless of what happens elsewhere though, if something is mapped *into the
kernel* with ioremap(), then it is fundamentally wrong per the kernel memory
model to reference that mapping directly without using I/O accessors. That is
not specific to any individual architecture, and Sparse should be screaming
about it already. I guess in this case the UVD code needs to pay more attention
to whether radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not.

(I'm assuming the initial fault was memset() with 0 trying to perform "DC ZVA"
on a Device-type mapping from ioremap() - FYI a stacktrace on its own without
the rest of the error dump showing what actually triggered it isn't overly
useful)

Robin.

why it may be 'DC ZVA'? I'm not sure the pc in initial kernel fault memset, but 
I capture the userspace crash pc: stp(128bit) or str with neon(also 128bit) to 
render node(/dev/dri/renderD128).


As I said it was an assumption. I guessed at it being more likely to be 
an MMU fault than an external abort, and given the size and the fact 
that it's a variable initialisation guessed at it being slightly more 
likely to hit the ZVA special-case rather than being unaligned. Looking 
again, I guess starting at an odd-numbered 32-bit element might lead to 
an unaligned store of XZR, but either way it doesn't really matter - 
what it showed is it clearly *could* be an MMU fault because TTM seems 
to be a bit careless with iomem pointers.


That said, if you're also getting external aborts from your host bridge 
not liking 128-bit transactions, then as Christian says you're probably 
going to have a bad time on this platform either way.


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Christian König

Am 18.12.20 um 15:17 schrieb Robin Murphy:

On 2020-12-17 14:02, Christian König wrote:

[SNIP]
Do you have some background why some ARM boards fail with that?

We had a couple of reports that memset/memcpy fail in userspace 
(usually system just spontaneously reboots or becomes unresponsive), 
but so far nobody could tell us why that happens?


Part of it is that Arm doesn't really have an ideal memory type for 
mapping RAM behind PCI (much like we also struggle with the vague 
expectations of what write-combine might mean beyond x86). Device 
memory can be relaxed to allow gathering, reordering and 
write-buffering, but is still a bit too restrictive in other ways - 
aligned, non-speculative, etc. - for something that's really just RAM 
and expected to be usable as such. Thus to map PCI memory as 
"write-combine" we use Normal non-cacheable, which means the CPU MMU 
is going to allow software to do all the things it might expect of 
RAM, but we're now at the mercy of the menagerie of interconnects and 
PCI implementations out there.


I see. As far as I know we already correctly map the RAM from the GPU as 
"write-combine".


Atomic operations, for example, *might* be resolved by the CPU 
coherency mechanism or in the interconnect, such that the PCI host 
bridge only sees regular loads and stores, but more often than not 
they'll just result in an atomic transaction going all the way to the 
host bridge. A super-duper-clever host bridge implementation might 
even support that, but the vast majority are likely to just reject it 
as invalid.


Support for atomics is actually specified by an PCIe extension. As far 
as I know that extension is even necessary for full KFD support on AMD 
and full Cuda support for NVidia GPUs.




Similarly, unaligned accesses, cache line fills/evictions, and such 
will often work, since they're essentially just larger read/write 
bursts, but some host bridges can be picky and might reject access 
sizes they don't like (there's at least one where even 64-bit accesses 
don't work. On a 64-bit system...)


This is breaking our neck here. We need 64bit writes on 64bit systems to 
end up as one 64bit write at the hardware and not two 32bit writes or 
otherwise the doorbells won't work correctly.


Larger writes are pretty much unproblematic, for P2P our bus interface 
even supports really large multi byte transfers.


If an invalid transaction does reach the host bridge, it's going to 
come back to the CPU as an external abort. If we're really lucky that 
could be taken synchronously, attributable to a specific instruction, 
and just oops/SIGBUS the relevant kernel/userspace thread. Often 
though, (particularly with big out-of-order CPUs) it's likely to be 
asynchronous and no longer attributable, and thus taken as an SError 
event, which in general roughly translates to "part of the SoC has 
fallen off". The only reasonable response we have to that is to panic 
the system.


Yeah, that sounds exactly like what we see on some of the ARM boards out 
there. At least we have an explanation for that behavior now.


Going to talk about this with our hardware engineers. We might be able 
to work around some of that stuff, but that is rather tricky to get 
working under those conditions.


Thanks,
Christian.




Robin.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Robin Murphy

On 2020-12-17 14:02, Christian König wrote:

Am 17.12.20 um 14:45 schrieb Robin Murphy:

On 2020-12-17 10:25, Christian König wrote:

Am 17.12.20 um 02:07 schrieb Chen Li:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

Am 16.12.20 um 14:48 schrieb Chen Li:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]
Hi, Christian. I'm not sure why this change is a hack here. I 
cannot see the problem and wll be grateful if you give more 
explainations.
__memset is supposed to work on those addresses, otherwise you 
can't use the

e8860 on your arm64 system.
If __memset is supposed to work on those adresses, why this 
commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C3551ae4972b044bb831608d8a291f81c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438095114292394%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=xns81uCGfN1tjsVn5LBU8QhmUinZRJQlXz8w%2FJ7%2FGTM%3Dreserved=0) 
is needed? (I also notice drm/radeon didn't take this change though) 
just out of curiosity.


We generally accept those patches as cleanup in the kernel with the 
hope that we can find a way to work around the userspace restrictions.


But when you also have this issue in userspace then there isn't much 
we can do for you.



Replacing the the direct write in the kernel with calls to writel() or
memset_io() will fix that temporary, but you have a more general 
problem here.
I cannot see what's the more general problem here :( u mean 
performance?


No, not performance. See standards like OpenGL, Vulkan as well as 
VA-API and VDPAU require that you can mmap() device memory and 
execute memset/memcpy on the memory from userspace.


If your ARM base board can't do that for some then you can't use the 
hardware with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane 
Arm-based systems I believe it should be able to mmap() to userspace 
with the Normal memory type, where unaligned accesses and such are 
allowed, as opposed to the Device memory type intended for MMIO 
mappings, which has more restrictions but stricter ordering guarantees.


Do you have some background why some ARM boards fail with that?

We had a couple of reports that memset/memcpy fail in userspace (usually 
system just spontaneously reboots or becomes unresponsive), but so far 
nobody could tell us why that happens?


Part of it is that Arm doesn't really have an ideal memory type for 
mapping RAM behind PCI (much like we also struggle with the vague 
expectations of what write-combine might mean beyond x86). Device memory 
can be relaxed to allow gathering, reordering and write-buffering, but 
is still a bit too restrictive in other ways - aligned, non-speculative, 
etc. - for something that's really just RAM and expected to be usable as 
such. Thus to map PCI memory as "write-combine" we use Normal 
non-cacheable, which means the CPU MMU is going to allow software to do 
all the things it might expect of RAM, but we're now at the mercy of the 
menagerie of interconnects and PCI implementations out there.


Atomic operations, for example, *might* be resolved by the CPU coherency 
mechanism or in the interconnect, such that the PCI host bridge only 
sees regular loads and stores, but more often than not they'll just 
result in an atomic transaction going all the way to the host bridge. A 
super-duper-clever host bridge implementation might even support that, 
but the vast majority are likely to just reject it as invalid.


Similarly, unaligned accesses, cache line fills/evictions, and such will 
often work, since they're essentially just larger read/write bursts, but 
some host bridges can be picky and might reject access sizes they don't 
like (there's at least one where even 64-bit accesses don't work. On a 
64-bit system...)


If an invalid transaction does reach the host bridge, it's going to come 
back to the CPU as an external abort. If we're really lucky that could 
be taken synchronously, attributable to a specific instruction, and just 
oops/SIGBUS the relevant kernel/userspace thread. Often though, 
(particularly with big out-of-order CPUs) it's likely to be asynchronous 
and no longer attributable, and thus taken as an SError event, which in 
general roughly translates to "part of the SoC has fallen off". The only 
reasonable response we have to that is to panic the system.


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Christian König

Am 18.12.20 um 09:52 schrieb Chen Li:

On Fri, 18 Dec 2020 16:10:12 +0800,
Christian König wrote:

Am 18.12.20 um 04:51 schrieb Chen Li:

[SNIP]

If your ARM base board can't do that for some then you can't use the hardware
with that board.

Good to know, thanks! BTW, have you ever seen or heard boards like mine which 
cannot mmap device memory correctly from userspace correctly?

Unfortunately yes. We haven't been able to figure out what exactly goes wrong in
those cases.

Ok. one more question: only e8860 or all radeon cards have this issue?

This applies to all hardware with dedicated memory which needs to be mapped to
userspace.

That includes all graphics hardware from AMD as well as NVidia and probably a
whole bunch of other PCIe devices.

Can mmio on these devices work fine in kernel space?


The kernel drivers know that this is MMIO and can use special 
instructions/functions like 
writel()/writeq()/memcpy_fromio()/memcpy_toio() etc...



I cannot see the difference here except user space should use uncacheable mmap 
to map virtual memory to device space(though I don't know how to use 
uncacheable mmap), while kernel use uncache ioremap.


The problem with mmap() of MMIO into the userspace is that this can 
easily crash the whole system.


When an application uses memset()/memcpy() on the mapped region and the 
system spontaneous reboots than that's a rather big hardware problem.


Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Chen Li
On Thu, 17 Dec 2020 22:16:59 +0800,
Christian König wrote:
> 
> Am 17.12.20 um 14:37 schrieb Chen Li:
> > On Thu, 17 Dec 2020 18:25:11 +0800,
> > Christian König wrote:
> >> Am 17.12.20 um 02:07 schrieb Chen Li:
> >>> On Wed, 16 Dec 2020 22:19:11 +0800,
> >>> Christian König wrote:
>  Am 16.12.20 um 14:48 schrieb Chen Li:
> > On Wed, 16 Dec 2020 15:59:37 +0800,
> > Christian König wrote:
> >> [SNIP]
> > Hi, Christian. I'm not sure why this change is a hack here. I cannot 
> > see the problem and wll be grateful if you give more explainations.
>  __memset is supposed to work on those addresses, otherwise you can't use 
>  the
>  e8860 on your arm64 system.
> >>> If __memset is supposed to work on those adresses, why this 
> >>> commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7Cfdb4ca3e05ad4ea4882408d8a2914fbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438092297678363%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=88oAUlEhnsVNSqYIfXk%2B811oXYd18XPScVZ4ceAurNk%3Dreserved=0)
> >>>  is needed? (I also notice drm/radeon didn't take this change though) 
> >>> just out of curiosity.
> >> We generally accept those patches as cleanup in the kernel with the hope 
> >> that we
> >> can find a way to work around the userspace restrictions.
> > What's the userspace restriction here? mmap device memory?
> 
> Yes, exactly that.
> 
> >> But when you also have this issue in userspace then there isn't much we 
> >> can do
> >> for you.
> >> 
>  Replacing the the direct write in the kernel with calls to writel() or
>  memset_io() will fix that temporary, but you have a more general problem 
>  here.
> >>>I cannot see what's the more general problem here :( u mean 
> >>> performance?
> >> No, not performance. See standards like OpenGL, Vulkan as well as VA-API 
> >> and
> >> VDPAU require that you can mmap() device memory and execute memset/memcpy 
> >> on the
> >> memory from userspace.
> >> 
> >> If your ARM base board can't do that for some then you can't use the 
> >> hardware
> >> with that board.
> > Good to know, thanks! BTW, have you ever seen or heard boards like mine 
> > which cannot mmap device memory correctly from userspace correctly?
> 
> Unfortunately yes. We haven't been able to figure out what exactly goes wrong 
> in
> those cases.

Ok. one more question: only e8860 or all radeon cards have this issue?
 
> >> For amdgpu I suggest that we allocate the UVD message in GTT instead 
> >> of VRAM
> >> since we don't have the hardware restriction for that on the new 
> >> generations.
> >> 
> > Thanks, I will try to dig into deeper. But what's the "hardware 
> > restriction" meaning here? I'm not familiar with video driver stack and 
> > amd gpu, sorry.
>  On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, 
>  but on
>  modern system GTT (system memory) works as well.
> >>> IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is 
> >>> in amdgpu's table. But I cannot tell whether e8860 has iommu, and I 
> >>> cannot find iommu from lspci, so graphics translation table may not work 
> >>> here?
> >> That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. 
> >> This is
> >> implemented using GTT, e.g. the VM page tables inside the GPU.
> >> 
> >> And yes it should work I will prepare a patch for it.
> > I think you mean mmu :)
> 
> No, I really meant IOMMU.
> 
> > Refer to wikipedia: 
> > https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fen.wikipedia.org%2Fwiki%2FInput%25E2%2580%2593output_memory_management_unit%23:~:text%3DIn%2520computing%252C%2520an%2520input%25E2%2580%2593output%2Cbus%2520to%2520the%2520main%2520memorydata=04%7C01%7Cchristian.koenig%40amd.com%7Cfdb4ca3e05ad4ea4882408d8a2914fbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438092297678363%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=t6NDi8didU7GFzaCSMFvdSTKA%2FmRZ1cgPCpY7lf7UKo%3Dreserved=0.
> > 
> >  In computing, an input–output memory management unit (IOMMU) is a 
> > memory management unit (MMU) that connects a direct-memory-access–capable 
> > (DMA-capable) I/O bus to the main memory. Like a traditional MMU, which 
> > translates CPU-visible virtual addresses to physical addresses, the IOMMU 
> > maps device-visible virtual addresses (also called device addresses or I/O 
> > addresses in this context) to physical addresses. Some units also provide 
> > memory protection from faulty or malicious devices.
> >  An example IOMMU is the graphics address remapping table (GART) used 
> > by AGP and PCI Express graphics cards on Intel Architecture and AMD 
> > computers.
> 
> Maybe somebody should clarify the 

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Chen Li
On Thu, 17 Dec 2020 21:45:06 +0800,
Robin Murphy wrote:
> 
> On 2020-12-17 10:25, Christian König wrote:
> > Am 17.12.20 um 02:07 schrieb Chen Li:
> >> On Wed, 16 Dec 2020 22:19:11 +0800,
> >> Christian König wrote:
> >>> Am 16.12.20 um 14:48 schrieb Chen Li:
>  On Wed, 16 Dec 2020 15:59:37 +0800,
>  Christian König wrote:
> > [SNIP]
>  Hi, Christian. I'm not sure why this change is a hack here. I cannot see
>  the problem and wll be grateful if you give more explainations.
> >>> __memset is supposed to work on those addresses, otherwise you can't use 
> >>> the
> >>> e8860 on your arm64 system.
> >> If __memset is supposed to work on those adresses, why this
> >> commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=HhWxUaLo3WpzoV6hjV%2BG1HICaIOXwsoNpzv5tNMNg8A%3Dreserved=0)
> >> is needed? (I also notice drm/radeon didn't take this change though) just 
> >> out
> >> of curiosity.
> > 
> > We generally accept those patches as cleanup in the kernel with the hope 
> > that
> > we can find a way to work around the userspace restrictions.
> > 
> > But when you also have this issue in userspace then there isn't much we can 
> > do
> > for you.
> > 
> >>> Replacing the the direct write in the kernel with calls to writel() or
> >>> memset_io() will fix that temporary, but you have a more general problem
> >>> here.
> >> I cannot see what's the more general problem here :( u mean performance?
> > 
> > No, not performance. See standards like OpenGL, Vulkan as well as VA-API and
> > VDPAU require that you can mmap() device memory and execute memset/memcpy on
> > the memory from userspace.
> > 
> > If your ARM base board can't do that for some then you can't use the 
> > hardware
> > with that board.
> 
> If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based 
> systems
> I believe it should be able to mmap() to userspace with the Normal memory 
> type,
> where unaligned accesses and such are allowed, as opposed to the Device memory
> type intended for MMIO mappings, which has more restrictions but stricter
> ordering guarantees.
 
Hi, Robin. I cannot understand it allow unaligned accesses. prefetchable PCI 
bar should also be mmio, and accesses will end with device memory, so why does 
this allow unaligned access?
> Regardless of what happens elsewhere though, if something is mapped *into the
> kernel* with ioremap(), then it is fundamentally wrong per the kernel memory
> model to reference that mapping directly without using I/O accessors. That is
> not specific to any individual architecture, and Sparse should be screaming
> about it already. I guess in this case the UVD code needs to pay more 
> attention
> to whether radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not.
> 
> (I'm assuming the initial fault was memset() with 0 trying to perform "DC ZVA"
> on a Device-type mapping from ioremap() - FYI a stacktrace on its own without
> the rest of the error dump showing what actually triggered it isn't overly
> useful)
> 
> Robin.
why it may be 'DC ZVA'? I'm not sure the pc in initial kernel fault memset, but 
I capture the userspace crash pc: stp(128bit) or str with neon(also 128bit) to 
render node(/dev/dri/renderD128).
 
> > For amdgpu I suggest that we allocate the UVD message in GTT instead of
> > VRAM
> > since we don't have the hardware restriction for that on the new
> > generations.
> > 
>  Thanks, I will try to dig into deeper. But what's the "hardware
>  restriction" meaning here? I'm not familiar with video driver stack and 
>  amd
>  gpu, sorry.
> >>> On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, 
> >>> but
> >>> on
> >>> modern system GTT (system memory) works as well.
> >> IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is 
> >> in
> >> amdgpu's table. But I cannot tell whether e8860 has iommu, and I cannot 
> >> find
> >> iommu from lspci, so graphics translation table may not work here?
> > 
> > That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. 
> > This
> > is implemented using GTT, e.g. the VM page tables inside the GPU.
> > 
> > And yes it should work I will prepare a patch for it.
> > 
> > BTW: How does userspace work on arm64 then? The driver stack usually 
> > only
> > works
> > if mmio can be mapped directly.
>  I also post two usespace issue on mesa, and you may be interested with
>  them:
> 
>  

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Chen Li
On Thu, 17 Dec 2020 18:25:11 +0800,
Christian König wrote:
> 
> Am 17.12.20 um 02:07 schrieb Chen Li:
> > On Wed, 16 Dec 2020 22:19:11 +0800,
> > Christian König wrote:
> >> Am 16.12.20 um 14:48 schrieb Chen Li:
> >>> On Wed, 16 Dec 2020 15:59:37 +0800,
> >>> Christian König wrote:
>  [SNIP]
> >>> Hi, Christian. I'm not sure why this change is a hack here. I cannot see 
> >>> the problem and wll be grateful if you give more explainations.
> >> __memset is supposed to work on those addresses, otherwise you can't use 
> >> the
> >> e8860 on your arm64 system.
> > If __memset is supposed to work on those adresses, why this 
> > commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=HhWxUaLo3WpzoV6hjV%2BG1HICaIOXwsoNpzv5tNMNg8A%3Dreserved=0)
> >  is needed? (I also notice drm/radeon didn't take this change though) just 
> > out of curiosity.
> 
> We generally accept those patches as cleanup in the kernel with the hope that 
> we
> can find a way to work around the userspace restrictions.
What's the userspace restriction here? mmap device memory?
> 
> But when you also have this issue in userspace then there isn't much we can do
> for you.
> 
> >> Replacing the the direct write in the kernel with calls to writel() or
> >> memset_io() will fix that temporary, but you have a more general problem 
> >> here.
> >   I cannot see what's the more general problem here :( u mean performance?
> 
> No, not performance. See standards like OpenGL, Vulkan as well as VA-API and
> VDPAU require that you can mmap() device memory and execute memset/memcpy on 
> the
> memory from userspace.
> 
> If your ARM base board can't do that for some then you can't use the hardware
> with that board.

Good to know, thanks! BTW, have you ever seen or heard boards like mine which 
cannot mmap device memory correctly from userspace correctly?
> 
>  For amdgpu I suggest that we allocate the UVD message in GTT instead of 
>  VRAM
>  since we don't have the hardware restriction for that on the new 
>  generations.
>  
> >>> Thanks, I will try to dig into deeper. But what's the "hardware 
> >>> restriction" meaning here? I'm not familiar with video driver stack and 
> >>> amd gpu, sorry.
> >> On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, 
> >> but on
> >> modern system GTT (system memory) works as well.
> > IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is 
> > in amdgpu's table. But I cannot tell whether e8860 has iommu, and I cannot 
> > find iommu from lspci, so graphics translation table may not work here?
> 
> That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. This 
> is
> implemented using GTT, e.g. the VM page tables inside the GPU.
> 
> And yes it should work I will prepare a patch for it.

I think you mean mmu :) Refer to wikipedia: 
https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit#:~:text=In%20computing%2C%20an%20input%E2%80%93output,bus%20to%20the%20main%20memory.

In computing, an input–output memory management unit (IOMMU) is a memory 
management unit (MMU) that connects a direct-memory-access–capable 
(DMA-capable) I/O bus to the main memory. Like a traditional MMU, which 
translates CPU-visible virtual addresses to physical addresses, the IOMMU maps 
device-visible virtual addresses (also called device addresses or I/O addresses 
in this context) to physical addresses. Some units also provide memory 
protection from faulty or malicious devices.
An example IOMMU is the graphics address remapping table (GART) used by AGP 
and PCI Express graphics cards on Intel Architecture and AMD computers.

GART should be antoher abber of 
GTT(https://en.wikipedia.org/wiki/Graphics_address_remapping_table):

The graphics address remapping table (GART),[1] also known as the graphics 
aperture remapping table,[2] or graphics translation table (GTT),[3] is an I/O 
memory management unit (IOMMU) used by Accelerated Graphics Port (AGP) and PCI 
Express (PCIe) graphics cards. 

> 
>  BTW: How does userspace work on arm64 then? The driver stack usually 
>  only works
>  if mmio can be mapped directly.
> >>> I also post two usespace issue on mesa, and you may be interested with 
> >>> them:
> >>>
> >>> 

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Christian König

Am 18.12.20 um 04:51 schrieb Chen Li:

[SNIP]

If your ARM base board can't do that for some then you can't use the hardware
with that board.

Good to know, thanks! BTW, have you ever seen or heard boards like mine which 
cannot mmap device memory correctly from userspace correctly?

Unfortunately yes. We haven't been able to figure out what exactly goes wrong in
those cases.

Ok. one more question: only e8860 or all radeon cards have this issue?


This applies to all hardware with dedicated memory which needs to be 
mapped to userspace.


That includes all graphics hardware from AMD as well as NVidia and 
probably a whole bunch of other PCIe devices.



  The graphics address remapping table (GART),[1] also known as the 
graphics aperture remapping table,[2] or graphics translation table (GTT),[3] 
is an I/O memory management unit (IOMMU) used by Accelerated Graphics Port 
(AGP) and PCI Express (PCIe) graphics cards.

GART or GTT refers to the translation tables graphics hardware use to access
system memory.

Something like 15 years ago we used the IOMMU functionality from AGP to
implement that. But modern hardware (PCIe) uses some specialized hardware in the
GPU for that.

Regards,
Christian.




Good to know, thanks! So modern GART/GTT is like tlb, and iommu is forcused on 
translating address and not manager the tlb.


You are getting closer in your understanding, but the TLB is the 
Translation lookaside buffer. Basically a cache of recent VM 
translations which is present is all page table translations (GART, 
IOMMU, CPU etc...).


The key difference is where the page table translation happens on modern 
hardware:
1. For the GART/GTT it is inside the GPU to translate between GPU 
internal and bus addresses.
2. For IOMMU it is inside the root complex of the PCIe to translate 
between bus addresses and physical addresses.
3. For CPU page tables it is inside the CPU core to translate between 
virtual addresses and physical addresses.


Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Lucas Stach
Am Donnerstag, den 17.12.2020, 15:02 +0100 schrieb Christian König:
> Am 17.12.20 um 14:45 schrieb Robin Murphy:
> > On 2020-12-17 10:25, Christian König wrote:
> > > Am 17.12.20 um 02:07 schrieb Chen Li:
> > > > On Wed, 16 Dec 2020 22:19:11 +0800,
> > > > Christian König wrote:
> > > > > Am 16.12.20 um 14:48 schrieb Chen Li:
> > > > > > On Wed, 16 Dec 2020 15:59:37 +0800,
> > > > > > Christian König wrote:
> > > > > > > [SNIP]
> > > > > > Hi, Christian. I'm not sure why this change is a hack here. I 
> > > > > > cannot see the problem and wll be grateful if you give more 
> > > > > > explainations.
> > > > > __memset is supposed to work on those addresses, otherwise you 
> > > > > can't use the
> > > > > e8860 on your arm64 system.
> > > > If __memset is supposed to work on those adresses, why this 
> > > > commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C3551ae4972b044bb831608d8a291f81c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438095114292394%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=xns81uCGfN1tjsVn5LBU8QhmUinZRJQlXz8w%2FJ7%2FGTM%3Dreserved=0)
> > > >  
> > > > is needed? (I also notice drm/radeon didn't take this change though) 
> > > > just out of curiosity.
> > > 
> > > We generally accept those patches as cleanup in the kernel with the 
> > > hope that we can find a way to work around the userspace restrictions.
> > > 
> > > But when you also have this issue in userspace then there isn't much 
> > > we can do for you.
> > > 
> > > > > Replacing the the direct write in the kernel with calls to writel() or
> > > > > memset_io() will fix that temporary, but you have a more general 
> > > > > problem here.
> > > > I cannot see what's the more general problem here :( u mean 
> > > > performance?
> > > 
> > > No, not performance. See standards like OpenGL, Vulkan as well as 
> > > VA-API and VDPAU require that you can mmap() device memory and 
> > > execute memset/memcpy on the memory from userspace.
> > > 
> > > If your ARM base board can't do that for some then you can't use the 
> > > hardware with that board.
> > 
> > If the VRAM lives in a prefetchable PCI bar then on most sane 
> > Arm-based systems I believe it should be able to mmap() to userspace 
> > with the Normal memory type, where unaligned accesses and such are 
> > allowed, as opposed to the Device memory type intended for MMIO 
> > mappings, which has more restrictions but stricter ordering guarantees.
> 
> Do you have some background why some ARM boards fail with that?
> 
> We had a couple of reports that memset/memcpy fail in userspace (usually 
> system just spontaneously reboots or becomes unresponsive), but so far 
> nobody could tell us why that happens?

Optimized memset/memcpy uses unaligned access in some cases, where
handling unaligned start/end addresses would cause more instructions to
be used otherwise.

If the device memory isn't mapped at least writecombined (bufferable in
ARM speak) into userspace, those unaligned accesses are not allowed and
will cause traps on the hardware level. Normally this should just lead
to the process making the access getting killed with a SIGBUS, but
maybe some systems handle those traps wrong on a firmware level? If the
kernel makes such an unaligned access then the kernel will fault, which
normally means halting the kernel.

Regards,
Lucas


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Christian König

Am 17.12.20 um 14:37 schrieb Chen Li:

On Thu, 17 Dec 2020 18:25:11 +0800,
Christian König wrote:

Am 17.12.20 um 02:07 schrieb Chen Li:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

Am 16.12.20 um 14:48 schrieb Chen Li:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]

Hi, Christian. I'm not sure why this change is a hack here. I cannot see the 
problem and wll be grateful if you give more explainations.

__memset is supposed to work on those addresses, otherwise you can't use the
e8860 on your arm64 system.

If __memset is supposed to work on those adresses, why this 
commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7Cfdb4ca3e05ad4ea4882408d8a2914fbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438092297678363%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=88oAUlEhnsVNSqYIfXk%2B811oXYd18XPScVZ4ceAurNk%3Dreserved=0)
 is needed? (I also notice drm/radeon didn't take this change though) just out of curiosity.

We generally accept those patches as cleanup in the kernel with the hope that we
can find a way to work around the userspace restrictions.

What's the userspace restriction here? mmap device memory?


Yes, exactly that.


But when you also have this issue in userspace then there isn't much we can do
for you.


Replacing the the direct write in the kernel with calls to writel() or
memset_io() will fix that temporary, but you have a more general problem here.

   I cannot see what's the more general problem here :( u mean performance?

No, not performance. See standards like OpenGL, Vulkan as well as VA-API and
VDPAU require that you can mmap() device memory and execute memset/memcpy on the
memory from userspace.

If your ARM base board can't do that for some then you can't use the hardware
with that board.

Good to know, thanks! BTW, have you ever seen or heard boards like mine which 
cannot mmap device memory correctly from userspace correctly?


Unfortunately yes. We haven't been able to figure out what exactly goes 
wrong in those cases.



For amdgpu I suggest that we allocate the UVD message in GTT instead of VRAM
since we don't have the hardware restriction for that on the new generations.


Thanks, I will try to dig into deeper. But what's the "hardware restriction" 
meaning here? I'm not familiar with video driver stack and amd gpu, sorry.

On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, but on
modern system GTT (system memory) works as well.

IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is in 
amdgpu's table. But I cannot tell whether e8860 has iommu, and I cannot find 
iommu from lspci, so graphics translation table may not work here?

That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. This is
implemented using GTT, e.g. the VM page tables inside the GPU.

And yes it should work I will prepare a patch for it.

I think you mean mmu :)


No, I really meant IOMMU.


Refer to wikipedia: 
https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fen.wikipedia.org%2Fwiki%2FInput%25E2%2580%2593output_memory_management_unit%23:~:text%3DIn%2520computing%252C%2520an%2520input%25E2%2580%2593output%2Cbus%2520to%2520the%2520main%2520memorydata=04%7C01%7Cchristian.koenig%40amd.com%7Cfdb4ca3e05ad4ea4882408d8a2914fbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438092297678363%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=t6NDi8didU7GFzaCSMFvdSTKA%2FmRZ1cgPCpY7lf7UKo%3Dreserved=0.

 In computing, an input–output memory management unit (IOMMU) is a memory 
management unit (MMU) that connects a direct-memory-access–capable 
(DMA-capable) I/O bus to the main memory. Like a traditional MMU, which 
translates CPU-visible virtual addresses to physical addresses, the IOMMU maps 
device-visible virtual addresses (also called device addresses or I/O addresses 
in this context) to physical addresses. Some units also provide memory 
protection from faulty or malicious devices.
 An example IOMMU is the graphics address remapping table (GART) used by 
AGP and PCI Express graphics cards on Intel Architecture and AMD computers.


Maybe somebody should clarify the wikipedia article a bit since this is 
to general and misleading.


The key difference is that today IOMMU usually refers to the MMU block 
in the PCIe root complex of the CPU.



GART should be antoher abber of 

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Christian König

Am 17.12.20 um 14:45 schrieb Robin Murphy:

On 2020-12-17 10:25, Christian König wrote:

Am 17.12.20 um 02:07 schrieb Chen Li:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

Am 16.12.20 um 14:48 schrieb Chen Li:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]
Hi, Christian. I'm not sure why this change is a hack here. I 
cannot see the problem and wll be grateful if you give more 
explainations.
__memset is supposed to work on those addresses, otherwise you 
can't use the

e8860 on your arm64 system.
If __memset is supposed to work on those adresses, why this 
commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C3551ae4972b044bb831608d8a291f81c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438095114292394%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=xns81uCGfN1tjsVn5LBU8QhmUinZRJQlXz8w%2FJ7%2FGTM%3Dreserved=0) 
is needed? (I also notice drm/radeon didn't take this change though) 
just out of curiosity.


We generally accept those patches as cleanup in the kernel with the 
hope that we can find a way to work around the userspace restrictions.


But when you also have this issue in userspace then there isn't much 
we can do for you.



Replacing the the direct write in the kernel with calls to writel() or
memset_io() will fix that temporary, but you have a more general 
problem here.
I cannot see what's the more general problem here :( u mean 
performance?


No, not performance. See standards like OpenGL, Vulkan as well as 
VA-API and VDPAU require that you can mmap() device memory and 
execute memset/memcpy on the memory from userspace.


If your ARM base board can't do that for some then you can't use the 
hardware with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane 
Arm-based systems I believe it should be able to mmap() to userspace 
with the Normal memory type, where unaligned accesses and such are 
allowed, as opposed to the Device memory type intended for MMIO 
mappings, which has more restrictions but stricter ordering guarantees.


Do you have some background why some ARM boards fail with that?

We had a couple of reports that memset/memcpy fail in userspace (usually 
system just spontaneously reboots or becomes unresponsive), but so far 
nobody could tell us why that happens?




Regardless of what happens elsewhere though, if something is mapped 
*into the kernel* with ioremap(), then it is fundamentally wrong per 
the kernel memory model to reference that mapping directly without 
using I/O accessors. That is not specific to any individual 
architecture, and Sparse should be screaming about it already. I guess 
in this case the UVD code needs to pay more attention to whether 
radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not.


Yes, exactly. That's why we already have memcpy_fromio()/memcpy_toio() 
to upload the firmware and save the state on suspend/resume.


It's just that in this case here we also have IO memory because some 15+ 
years old AGP based hardware doesn't work when you but it in system 
memory :)


So pointing that out is correct and I'm going to clean that up now.

Regards,
Christian.



(I'm assuming the initial fault was memset() with 0 trying to perform 
"DC ZVA" on a Device-type mapping from ioremap() - FYI a stacktrace on 
its own without the rest of the error dump showing what actually 
triggered it isn't overly useful)


Robin.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Robin Murphy

On 2020-12-17 10:25, Christian König wrote:

Am 17.12.20 um 02:07 schrieb Chen Li:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

Am 16.12.20 um 14:48 schrieb Chen Li:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]
Hi, Christian. I'm not sure why this change is a hack here. I cannot 
see the problem and wll be grateful if you give more explainations.
__memset is supposed to work on those addresses, otherwise you can't 
use the

e8860 on your arm64 system.
If __memset is supposed to work on those adresses, why this 
commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=HhWxUaLo3WpzoV6hjV%2BG1HICaIOXwsoNpzv5tNMNg8A%3Dreserved=0) 
is needed? (I also notice drm/radeon didn't take this change though) 
just out of curiosity.


We generally accept those patches as cleanup in the kernel with the hope 
that we can find a way to work around the userspace restrictions.


But when you also have this issue in userspace then there isn't much we 
can do for you.



Replacing the the direct write in the kernel with calls to writel() or
memset_io() will fix that temporary, but you have a more general 
problem here.

I cannot see what's the more general problem here :( u mean performance?


No, not performance. See standards like OpenGL, Vulkan as well as VA-API 
and VDPAU require that you can mmap() device memory and execute 
memset/memcpy on the memory from userspace.


If your ARM base board can't do that for some then you can't use the 
hardware with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based 
systems I believe it should be able to mmap() to userspace with the 
Normal memory type, where unaligned accesses and such are allowed, as 
opposed to the Device memory type intended for MMIO mappings, which has 
more restrictions but stricter ordering guarantees.


Regardless of what happens elsewhere though, if something is mapped 
*into the kernel* with ioremap(), then it is fundamentally wrong per the 
kernel memory model to reference that mapping directly without using I/O 
accessors. That is not specific to any individual architecture, and 
Sparse should be screaming about it already. I guess in this case the 
UVD code needs to pay more attention to whether radeon_bo_kmap() ends up 
going via ttm_bo_ioremap() or not.


(I'm assuming the initial fault was memset() with 0 trying to perform 
"DC ZVA" on a Device-type mapping from ioremap() - FYI a stacktrace on 
its own without the rest of the error dump showing what actually 
triggered it isn't overly useful)


Robin.

For amdgpu I suggest that we allocate the UVD message in GTT 
instead of VRAM
since we don't have the hardware restriction for that on the new 
generations.


Thanks, I will try to dig into deeper. But what's the "hardware 
restriction" meaning here? I'm not familiar with video driver stack 
and amd gpu, sorry.
On older hardware (AGP days) the buffer had to be in VRAM (MMIO) 
memory, but on

modern system GTT (system memory) works as well.
IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 
is in amdgpu's table. But I cannot tell whether e8860 has iommu, and I 
cannot find iommu from lspci, so graphics translation table may not 
work here?


That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. 
This is implemented using GTT, e.g. the VM page tables inside the GPU.


And yes it should work I will prepare a patch for it.

BTW: How does userspace work on arm64 then? The driver stack 
usually only works

if mmio can be mapped directly.
I also post two usespace issue on mesa, and you may be interested 
with them:
   
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3954data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=ZR7pDS%2BCLUuMjCeKcMAXfHtbczt8WdUwSeLZCuHfCHw%3Dreserved=0 

   
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3951data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274033344%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=jAJo3aG2I1oIDTZXWhNgcKoKbd6tTdiAtc7vE4hJJPY%3Dreserved=0 

I paste some virtual memory map in userspace there. (and the two 
problems do bother me quite a long time.)

I don't really see a solution for those problems.

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Christian König

Am 17.12.20 um 02:07 schrieb Chen Li:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

Am 16.12.20 um 14:48 schrieb Chen Li:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]

Hi, Christian. I'm not sure why this change is a hack here. I cannot see the 
problem and wll be grateful if you give more explainations.

__memset is supposed to work on those addresses, otherwise you can't use the
e8860 on your arm64 system.

If __memset is supposed to work on those adresses, why this 
commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5fdata=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=HhWxUaLo3WpzoV6hjV%2BG1HICaIOXwsoNpzv5tNMNg8A%3Dreserved=0)
 is needed? (I also notice drm/radeon didn't take this change though) just out of curiosity.


We generally accept those patches as cleanup in the kernel with the hope 
that we can find a way to work around the userspace restrictions.


But when you also have this issue in userspace then there isn't much we 
can do for you.



Replacing the the direct write in the kernel with calls to writel() or
memset_io() will fix that temporary, but you have a more general problem here.
  
I cannot see what's the more general problem here :( u mean performance?


No, not performance. See standards like OpenGL, Vulkan as well as VA-API 
and VDPAU require that you can mmap() device memory and execute 
memset/memcpy on the memory from userspace.


If your ARM base board can't do that for some then you can't use the 
hardware with that board.



For amdgpu I suggest that we allocate the UVD message in GTT instead of VRAM
since we don't have the hardware restriction for that on the new generations.


Thanks, I will try to dig into deeper. But what's the "hardware restriction" 
meaning here? I'm not familiar with video driver stack and amd gpu, sorry.

On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, but on
modern system GTT (system memory) works as well.

IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is in 
amdgpu's table. But I cannot tell whether e8860 has iommu, and I cannot find 
iommu from lspci, so graphics translation table may not work here?


That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. 
This is implemented using GTT, e.g. the VM page tables inside the GPU.


And yes it should work I will prepare a patch for it.


BTW: How does userspace work on arm64 then? The driver stack usually only works
if mmio can be mapped directly.

I also post two usespace issue on mesa, and you may be interested with them:
   
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3954data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=ZR7pDS%2BCLUuMjCeKcMAXfHtbczt8WdUwSeLZCuHfCHw%3Dreserved=0
   
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3951data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274033344%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=jAJo3aG2I1oIDTZXWhNgcKoKbd6tTdiAtc7vE4hJJPY%3Dreserved=0
I paste some virtual memory map in userspace there. (and the two problems do 
bother me quite a long time.)

I don't really see a solution for those problems.

See it is perfectly valid for an application to memset/memcpy on mmaped MMIO
space which comes from OpenGL or Vulkan.

So your CPU simply won't work with the hardware. We could work around that with
a couple of hacks, but this is a pretty much general problem.

Regards,
Christian.
  
Thanks! Can you provid some details about these hacks? Should I post another issue on the mail list?


Adjust the kernel and/or user space to never map VRAM to the CPU.

This violates the OpenGL/Vulkan specification in some ways. So not sure 
if that will work or not.


Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Chen Li
On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:
> 
> Am 16.12.20 um 06:41 schrieb Chen Li:
> > When using e8860(gcn1) on arm64, the kernel crashed on drm/radeon:
> > 
> > [   11.240414] pc : __memset+0x4c/0x188
> > [   11.244101] lr : radeon_uvd_get_create_msg+0x114/0x1d0 [radeon]
> > [   11.249995] sp : 0d7eb700
> > [   11.253295] x29: 0d7eb700 x28: 8001f632a868
> > [   11.258585] x27: 0004 x26: 0de0
> > [   11.263875] x25: 0125 x24: 0001
> > [   11.269168] x23:  x22: 0005
> > [   11.274459] x21: 0df24000 x20: 8001f74b4000
> > [   11.279753] x19: 00124000 x18: 0020
> > [   11.285043] x17:  x16: 
> > [   11.290336] x15: 09309000 x14: 
> > [   11.290340] x13: 094b6f88 x12: 094b6bd2
> > [   11.290343] x11: 0d7eb700 x10: 0d7eb700
> > [   11.306246] x9 : 0d7eb700 x8 : 0df2402c
> > [   11.306254] x7 :  x6 : 094b626a
> > [   11.306257] x5 :  x4 : 0004
> > [   11.306262] x3 :  x2 : 0fd4
> > [   11.306265] x1 :  x0 : 0df2402c
> > [   11.306272] Call trace:
> > [   11.306316]  __memset+0x4c/0x188
> > [   11.306638]  uvd_v1_0_ib_test+0x70/0x1c0 [radeon]
> > [   11.306758]  radeon_ib_ring_tests+0x54/0xe0 [radeon]
> > [   11.309961] IPv6: ADDRCONF(NETDEV_UP): enp5s0f0: link is not ready
> > [   11.354628]  radeon_device_init+0x53c/0xbdc [radeon]
> > [   11.354693]  radeon_driver_load_kms+0x6c/0x1b0 [radeon]
> > [   11.364788]  drm_dev_register+0x130/0x1c0
> > [   11.364794]  drm_get_pci_dev+0x8c/0x14c
> > [   11.372704]  radeon_pci_probe+0xb0/0x110 [radeon]
> > [   11.372715]  local_pci_probe+0x3c/0xb0
> > [   11.381129]  pci_device_probe+0x114/0x1b0
> > [   11.385121]  really_probe+0x23c/0x400
> > [   11.388757]  driver_probe_device+0xdc/0x130
> > [   11.392921]  __driver_attach+0x128/0x150
> > [   11.396826]  bus_for_each_dev+0x70/0xbc
> > [   11.400643]  driver_attach+0x20/0x2c
> > [   11.404201]  bus_add_driver+0x160/0x260
> > [   11.408019]  driver_register+0x74/0x120
> > [   11.411837]  __pci_register_driver+0x40/0x50
> > [   11.416149]  radeon_init+0x78/0x1000 [radeon]
> > [   11.420489]  do_one_initcall+0x54/0x154
> > [   11.424310]  do_init_module+0x54/0x260
> > [   11.428041]  load_module+0x1ccc/0x20b0
> > [   11.431773]  __se_sys_finit_module+0xac/0x10c
> > [   11.436109]  __arm64_sys_finit_module+0x18/0x20
> > [   11.440622]  el0_svc_common+0x70/0x164
> > [   11.444353]  el0_svc_handler+0x2c/0x80
> > [   11.448084]  el0_svc+0x8/0xc
> > [   11.450954] Code: d65f03c0 cb0803e4 f2400c84 5480 (a9001d07)
> > 
> > Obviously, the __memset call is generated by gcc(8.3.1). It optimizes
> > this for loop into memset. But this may break, because dest here is
> > cpu_addr mapped to io mem. So, just invoke `memset_io` directly, which
> > do solve the problem here.
> 
> Well interesting problem you stumbled over here, but the solution is quite a
> hack.

Hi, Christian. I'm not sure why this change is a hack here. I cannot see the 
problem and wll be grateful if you give more explainations.
> 
> For amdgpu I suggest that we allocate the UVD message in GTT instead of VRAM
> since we don't have the hardware restriction for that on the new generations.
> 
Thanks, I will try to dig into deeper. But what's the "hardware restriction" 
meaning here? I'm not familiar with video driver stack and amd gpu, sorry.
> For radeon I think the better approach would be to convert the direct memory
> writes into calls to writel().
Ok, so you mean the more proper way is to use writel instead of memset_io?

> BTW: How does userspace work on arm64 then? The driver stack usually only 
> works
> if mmio can be mapped directly.
I also post two usespace issue on mesa, and you may be interested with them: 
 https://gitlab.freedesktop.org/mesa/mesa/-/issues/3954
 https://gitlab.freedesktop.org/mesa/mesa/-/issues/3951
I paste some virtual memory map in userspace there. (and the two problems do 
bother me quite a long time.)
> Regards,
> Christian.
> 
> > 
> > Signed-off-by: chenli 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 6 ++
> >   drivers/gpu/drm/radeon/radeon_uvd.c | 6 ++
> >   2 files changed, 4 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > index 7c5b60e53482..4dccde7a9e83 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > @@ -1187,8 +1187,7 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring 
> > *ring, uint32_t handle,
> > msg[8] = cpu_to_le32(0x0440);
> > msg[9] = cpu_to_le32(0x);
> > msg[10] = cpu_to_le32(0x01b37000);
> > -   for (i = 11; i < 1024; ++i)
> > -   msg[i] = 

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Chen Li
On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:
> 
> Am 16.12.20 um 14:48 schrieb Chen Li:
> > On Wed, 16 Dec 2020 15:59:37 +0800,
> > Christian König wrote:
> >> Am 16.12.20 um 06:41 schrieb Chen Li:
> >>> When using e8860(gcn1) on arm64, the kernel crashed on drm/radeon:
> >>> [SNIP]
> >>> Obviously, the __memset call is generated by gcc(8.3.1). It optimizes
> >>> this for loop into memset. But this may break, because dest here is
> >>> cpu_addr mapped to io mem. So, just invoke `memset_io` directly, which
> >>> do solve the problem here.
> >> Well interesting problem you stumbled over here, but the solution is quite 
> >> a
> >> hack.
> > Hi, Christian. I'm not sure why this change is a hack here. I cannot see 
> > the problem and wll be grateful if you give more explainations.
> 
> __memset is supposed to work on those addresses, otherwise you can't use the
> e8860 on your arm64 system.
If __memset is supposed to work on those adresses, why this 
commit(https://github.com/torvalds/linux/commit/ba0b2275a6781b2f3919d931d63329b5548f6d5f)
 is needed? (I also notice drm/radeon didn't take this change though) just out 
of curiosity.
> 
> Replacing the the direct write in the kernel with calls to writel() or
> memset_io() will fix that temporary, but you have a more general problem here.
 
I cannot see what's the more general problem here :( u mean performance?
> >> For amdgpu I suggest that we allocate the UVD message in GTT instead of 
> >> VRAM
> >> since we don't have the hardware restriction for that on the new 
> >> generations.
> >> 
> > Thanks, I will try to dig into deeper. But what's the "hardware 
> > restriction" meaning here? I'm not familiar with video driver stack and amd 
> > gpu, sorry.
> 
> On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, but 
> on
> modern system GTT (system memory) works as well.
IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is in 
amdgpu's table. But I cannot tell whether e8860 has iommu, and I cannot find 
iommu from lspci, so graphics translation table may not work here?
> 
> >> For radeon I think the better approach would be to convert the direct 
> >> memory
> >> writes into calls to writel().
> > Ok, so you mean the more proper way is to use writel instead of memset_io?
> 
> Well, it is a start. But I'm not sure if you will ever get that hardware 
> working
> with this CPU.
> 
> >> BTW: How does userspace work on arm64 then? The driver stack usually only 
> >> works
> >> if mmio can be mapped directly.
> > I also post two usespace issue on mesa, and you may be interested with them:
> >   
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3954data=04%7C01%7Cchristian.koenig%40amd.com%7Cd6ff52383a454a6dc03108d8a1c94dc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437233268588747%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=RDESyzYBB3ql2GgBigsYf%2Fx2g6zwCq%2Fy8HQ0AAMtX90%3Dreserved=0
> >   
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3951data=04%7C01%7Cchristian.koenig%40amd.com%7Cd6ff52383a454a6dc03108d8a1c94dc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437233268588747%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=Y5f9Ki%2FQ8G4zn3MjLVG7yiLLCxbhTyNelZj36hAuXQY%3Dreserved=0
> > I paste some virtual memory map in userspace there. (and the two problems 
> > do bother me quite a long time.)
> 
> I don't really see a solution for those problems.
> 
> See it is perfectly valid for an application to memset/memcpy on mmaped MMIO
> space which comes from OpenGL or Vulkan.
> 
> So your CPU simply won't work with the hardware. We could work around that 
> with
> a couple of hacks, but this is a pretty much general problem.
> 
> Regards,
> Christian.
 
Thanks! Can you provid some details about these hacks? Should I post another 
issue on the mail list?
 


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-16 Thread Christian König

Am 16.12.20 um 14:48 schrieb Chen Li:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

Am 16.12.20 um 06:41 schrieb Chen Li:

When using e8860(gcn1) on arm64, the kernel crashed on drm/radeon:
[SNIP]
Obviously, the __memset call is generated by gcc(8.3.1). It optimizes
this for loop into memset. But this may break, because dest here is
cpu_addr mapped to io mem. So, just invoke `memset_io` directly, which
do solve the problem here.

Well interesting problem you stumbled over here, but the solution is quite a
hack.

Hi, Christian. I'm not sure why this change is a hack here. I cannot see the 
problem and wll be grateful if you give more explainations.


__memset is supposed to work on those addresses, otherwise you can't use 
the e8860 on your arm64 system.


Replacing the the direct write in the kernel with calls to writel() or 
memset_io() will fix that temporary, but you have a more general problem 
here.



For amdgpu I suggest that we allocate the UVD message in GTT instead of VRAM
since we don't have the hardware restriction for that on the new generations.


Thanks, I will try to dig into deeper. But what's the "hardware restriction" 
meaning here? I'm not familiar with video driver stack and amd gpu, sorry.


On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, 
but on modern system GTT (system memory) works as well.



For radeon I think the better approach would be to convert the direct memory
writes into calls to writel().

Ok, so you mean the more proper way is to use writel instead of memset_io?


Well, it is a start. But I'm not sure if you will ever get that hardware 
working with this CPU.



BTW: How does userspace work on arm64 then? The driver stack usually only works
if mmio can be mapped directly.

I also post two usespace issue on mesa, and you may be interested with them:
  
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3954data=04%7C01%7Cchristian.koenig%40amd.com%7Cd6ff52383a454a6dc03108d8a1c94dc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437233268588747%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=RDESyzYBB3ql2GgBigsYf%2Fx2g6zwCq%2Fy8HQ0AAMtX90%3Dreserved=0
  
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3951data=04%7C01%7Cchristian.koenig%40amd.com%7Cd6ff52383a454a6dc03108d8a1c94dc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437233268588747%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=Y5f9Ki%2FQ8G4zn3MjLVG7yiLLCxbhTyNelZj36hAuXQY%3Dreserved=0
I paste some virtual memory map in userspace there. (and the two problems do 
bother me quite a long time.)


I don't really see a solution for those problems.

See it is perfectly valid for an application to memset/memcpy on mmaped 
MMIO space which comes from OpenGL or Vulkan.


So your CPU simply won't work with the hardware. We could work around 
that with a couple of hacks, but this is a pretty much general problem.


Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-16 Thread kernel test robot
Hi Chen,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.10 next-20201215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Chen-Li/drm-amdgpu-radeon-fix-memset-on-io-mem/20201216-165835
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
d01e7f10dae29eba0f9ada82b65d24e035d5b2f9
config: x86_64-randconfig-a002-20201216 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
71601d2ac9954cb59c443cb3ae442cb106df35d4)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/201257d71c519bef0966e555d95bf820512f5a34
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Chen-Li/drm-amdgpu-radeon-fix-memset-on-io-mem/20201216-165835
git checkout 201257d71c519bef0966e555d95bf820512f5a34
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/radeon/radeon_uvd.c:159:6: warning: format specifies type 
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
version_major, version_minor, family_id);
^
   include/drm/drm_print.h:484:29: note: expanded from macro 'DRM_INFO'
   _DRM_PRINTK(, INFO, fmt, ##__VA_ARGS__)
   ~~~^~~
   include/drm/drm_print.h:481:53: note: expanded from macro '_DRM_PRINTK'
   printk##once(KERN_##level "[" DRM_NAME "] " fmt, ##__VA_ARGS__)
   ~~~^~~
   drivers/gpu/drm/radeon/radeon_uvd.c:159:21: warning: format specifies type 
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
version_major, version_minor, family_id);
   ^
   include/drm/drm_print.h:484:29: note: expanded from macro 'DRM_INFO'
   _DRM_PRINTK(, INFO, fmt, ##__VA_ARGS__)
   ~~~^~~
   include/drm/drm_print.h:481:53: note: expanded from macro '_DRM_PRINTK'
   printk##once(KERN_##level "[" DRM_NAME "] " fmt, ##__VA_ARGS__)
   ~~~^~~
   drivers/gpu/drm/radeon/radeon_uvd.c:159:36: warning: format specifies type 
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
version_major, version_minor, family_id);
  ^
   include/drm/drm_print.h:484:29: note: expanded from macro 'DRM_INFO'
   _DRM_PRINTK(, INFO, fmt, ##__VA_ARGS__)
   ~~~^~~
   include/drm/drm_print.h:481:53: note: expanded from macro '_DRM_PRINTK'
   printk##once(KERN_##level "[" DRM_NAME "] " fmt, ##__VA_ARGS__)
   ~~~^~~
>> drivers/gpu/drm/radeon/radeon_uvd.c:805:17: warning: variable 'i' is 
>> uninitialized when used here [-Wuninitialized]
   memset_io([i], 0x0, 1013 * sizeof(uint32_t));
  ^
   drivers/gpu/drm/radeon/radeon_uvd.c:787:10: note: initialize the variable 
'i' to silence this warning
   int r, i;
   ^
= 0
   drivers/gpu/drm/radeon/radeon_uvd.c:833:17: warning: variable 'i' is 
uninitialized when used here [-Wuninitialized]
   memset_io([i], 0x0, 1020 * sizeof(uint32_t));
  ^
   drivers/gpu/drm/radeon/radeon_uvd.c:822:10: note: initialize the variable 
'i' to silence this warning
   int r, i;
   ^
= 0
   5 warnings generated.


vim +/i +805 drivers/gpu/drm/radeon/radeon_uvd.c

   771  
   772  /*
   773   * multiple fence commands without any stream commands in between can
   774   * crash the vcpu so just try to emmit a dummy create/destroy msg to
   775   * avoid this
   776   */
   777  int radeon_uvd_get_create_msg(struct radeon_device *rdev, int ring,
   778uint32_t handle, struct radeon_fence 
**fence)
   779  {
   780  /* we use the last page of the vcpu bo for the UVD message */
   781

[kbuild] Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-16 Thread Dan Carpenter
Hi Chen,

url:
https://github.com/0day-ci/linux/commits/Chen-Li/drm-amdgpu-radeon-fix-memset-on-io-mem/20201216-165835
 
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  
d01e7f10dae29eba0f9ada82b65d24e035d5b2f9
config: x86_64-randconfig-m001-20201216 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

New smatch warnings:
drivers/gpu/drm/radeon/radeon_uvd.c:805 radeon_uvd_get_create_msg() error: 
uninitialized symbol 'i'.
drivers/gpu/drm/radeon/radeon_uvd.c:833 radeon_uvd_get_destroy_msg() error: 
uninitialized symbol 'i'.

Old smatch warnings:
drivers/gpu/drm/radeon/radeon_uvd.c:568 radeon_uvd_cs_msg() warn: ignoring 
unreachable code.

vim +/i +805 drivers/gpu/drm/radeon/radeon_uvd.c

f2ba57b5eab8817 Christian König 2013-04-08  777  int 
radeon_uvd_get_create_msg(struct radeon_device *rdev, int ring,
f2ba57b5eab8817 Christian König 2013-04-08  778   
uint32_t handle, struct radeon_fence **fence)
f2ba57b5eab8817 Christian König 2013-04-08  779  {
feba9b0bcf492ba Christian König 2014-08-22  780 /* we use the last page 
of the vcpu bo for the UVD message */
feba9b0bcf492ba Christian König 2014-08-22  781 uint64_t offs = 
radeon_bo_size(rdev->uvd.vcpu_bo) -
feba9b0bcf492ba Christian König 2014-08-22  782 
RADEON_GPU_PAGE_SIZE;
f2ba57b5eab8817 Christian König 2013-04-08  783  
feba9b0bcf492ba Christian König 2014-08-22  784 uint32_t *msg = 
rdev->uvd.cpu_addr + offs;
feba9b0bcf492ba Christian König 2014-08-22  785 uint64_t addr = 
rdev->uvd.gpu_addr + offs;
f2ba57b5eab8817 Christian König 2013-04-08  786  
feba9b0bcf492ba Christian König 2014-08-22  787 int r, i;
f2ba57b5eab8817 Christian König 2013-04-08  788  
feba9b0bcf492ba Christian König 2014-08-22  789 r = 
radeon_bo_reserve(rdev->uvd.vcpu_bo, true);
feba9b0bcf492ba Christian König 2014-08-22  790 if (r)
f2ba57b5eab8817 Christian König 2013-04-08  791 return r;
f2ba57b5eab8817 Christian König 2013-04-08  792  
f2ba57b5eab8817 Christian König 2013-04-08  793 /* stitch together an 
UVD create msg */
9b1be4dc02bb6b9 Alex Deucher2013-06-07  794 msg[0] = 
cpu_to_le32(0x0de4);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  795 msg[1] = 
cpu_to_le32(0x);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  796 msg[2] = 
cpu_to_le32(handle);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  797 msg[3] = 
cpu_to_le32(0x);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  798 msg[4] = 
cpu_to_le32(0x);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  799 msg[5] = 
cpu_to_le32(0x);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  800 msg[6] = 
cpu_to_le32(0x);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  801 msg[7] = 
cpu_to_le32(0x0780);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  802 msg[8] = 
cpu_to_le32(0x0440);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  803 msg[9] = 
cpu_to_le32(0x);
9b1be4dc02bb6b9 Alex Deucher2013-06-07  804 msg[10] = 
cpu_to_le32(0x01b37000);
201257d71c519be Chen Li 2020-12-16 @805 memset_io([i], 0x0, 
1013 * sizeof(uint32_t));
  ^^^
The "i" variable is never initialized anywhere.

f2ba57b5eab8817 Christian König 2013-04-08  806  
feba9b0bcf492ba Christian König 2014-08-22  807 r = 
radeon_uvd_send_msg(rdev, ring, addr, fence);
feba9b0bcf492ba Christian König 2014-08-22  808 
radeon_bo_unreserve(rdev->uvd.vcpu_bo);
feba9b0bcf492ba Christian König 2014-08-22  809 return r;
f2ba57b5eab8817 Christian König 2013-04-08  810  }
f2ba57b5eab8817 Christian König 2013-04-08  811  
f2ba57b5eab8817 Christian König 2013-04-08  812  int 
radeon_uvd_get_destroy_msg(struct radeon_device *rdev, int ring,
f2ba57b5eab8817 Christian König 2013-04-08  813
uint32_t handle, struct radeon_fence **fence)
f2ba57b5eab8817 Christian König 2013-04-08  814  {
feba9b0bcf492ba Christian König 2014-08-22  815 /* we use the last page 
of the vcpu bo for the UVD message */
feba9b0bcf492ba Christian König 2014-08-22  816 uint64_t offs = 
radeon_bo_size(rdev->uvd.vcpu_bo) -
feba9b0bcf492ba Christian König 2014-08-22  817 
RADEON_GPU_PAGE_SIZE;
f2ba57b5eab8817 Christian König 2013-04-08  818  
feba9b0bcf492ba Christian König 2014-08-22  819 uint32_t *msg = 
rdev->uvd.cpu_addr + offs;
feba9b0bcf492ba Christian König 2014-08-22  820 uint64_t addr = 
rdev->uvd.gpu_addr + offs;
f2ba57b5eab8817 Christian König 2013-04-08  821  
feba9b0bcf492ba Christian König 2014-08-22  822 int r, i;
f2ba57b5eab8817 Christian König 2013-04-08  823  
feba9b0bcf492ba Christian König 2014-08-22  824

[PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-16 Thread Chen Li


When using e8860(gcn1) on arm64, the kernel crashed on drm/radeon:

[   11.240414] pc : __memset+0x4c/0x188
[   11.244101] lr : radeon_uvd_get_create_msg+0x114/0x1d0 [radeon]
[   11.249995] sp : 0d7eb700
[   11.253295] x29: 0d7eb700 x28: 8001f632a868
[   11.258585] x27: 0004 x26: 0de0
[   11.263875] x25: 0125 x24: 0001
[   11.269168] x23:  x22: 0005
[   11.274459] x21: 0df24000 x20: 8001f74b4000
[   11.279753] x19: 00124000 x18: 0020
[   11.285043] x17:  x16: 
[   11.290336] x15: 09309000 x14: 
[   11.290340] x13: 094b6f88 x12: 094b6bd2
[   11.290343] x11: 0d7eb700 x10: 0d7eb700
[   11.306246] x9 : 0d7eb700 x8 : 0df2402c
[   11.306254] x7 :  x6 : 094b626a
[   11.306257] x5 :  x4 : 0004
[   11.306262] x3 :  x2 : 0fd4
[   11.306265] x1 :  x0 : 0df2402c
[   11.306272] Call trace:
[   11.306316]  __memset+0x4c/0x188
[   11.306638]  uvd_v1_0_ib_test+0x70/0x1c0 [radeon]
[   11.306758]  radeon_ib_ring_tests+0x54/0xe0 [radeon]
[   11.309961] IPv6: ADDRCONF(NETDEV_UP): enp5s0f0: link is not ready
[   11.354628]  radeon_device_init+0x53c/0xbdc [radeon]
[   11.354693]  radeon_driver_load_kms+0x6c/0x1b0 [radeon]
[   11.364788]  drm_dev_register+0x130/0x1c0
[   11.364794]  drm_get_pci_dev+0x8c/0x14c
[   11.372704]  radeon_pci_probe+0xb0/0x110 [radeon]
[   11.372715]  local_pci_probe+0x3c/0xb0
[   11.381129]  pci_device_probe+0x114/0x1b0
[   11.385121]  really_probe+0x23c/0x400
[   11.388757]  driver_probe_device+0xdc/0x130
[   11.392921]  __driver_attach+0x128/0x150
[   11.396826]  bus_for_each_dev+0x70/0xbc
[   11.400643]  driver_attach+0x20/0x2c
[   11.404201]  bus_add_driver+0x160/0x260
[   11.408019]  driver_register+0x74/0x120
[   11.411837]  __pci_register_driver+0x40/0x50
[   11.416149]  radeon_init+0x78/0x1000 [radeon]
[   11.420489]  do_one_initcall+0x54/0x154
[   11.424310]  do_init_module+0x54/0x260
[   11.428041]  load_module+0x1ccc/0x20b0
[   11.431773]  __se_sys_finit_module+0xac/0x10c
[   11.436109]  __arm64_sys_finit_module+0x18/0x20
[   11.440622]  el0_svc_common+0x70/0x164
[   11.444353]  el0_svc_handler+0x2c/0x80
[   11.448084]  el0_svc+0x8/0xc
[   11.450954] Code: d65f03c0 cb0803e4 f2400c84 5480 (a9001d07)

Obviously, the __memset call is generated by gcc(8.3.1). It optimizes
this for loop into memset. But this may break, because dest here is
cpu_addr mapped to io mem. So, just invoke `memset_io` directly, which
do solve the problem here.

Signed-off-by: chenli 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 6 ++
 drivers/gpu/drm/radeon/radeon_uvd.c | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 7c5b60e53482..4dccde7a9e83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1187,8 +1187,7 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, 
uint32_t handle,
msg[8] = cpu_to_le32(0x0440);
msg[9] = cpu_to_le32(0x);
msg[10] = cpu_to_le32(0x01b37000);
-   for (i = 11; i < 1024; ++i)
-   msg[i] = cpu_to_le32(0x0);
+   memset_io([i], 0x0, 1013 * sizeof(uint32_t));
 
return amdgpu_uvd_send_msg(ring, bo, true, fence);
 }
@@ -1212,8 +1211,7 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, 
uint32_t handle,
msg[1] = cpu_to_le32(0x0002);
msg[2] = cpu_to_le32(handle);
msg[3] = cpu_to_le32(0x);
-   for (i = 4; i < 1024; ++i)
-   msg[i] = cpu_to_le32(0x0);
+   memset_io([i], 0x0, 1020 * sizeof(uint32_t));
 
return amdgpu_uvd_send_msg(ring, bo, direct, fence);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c 
b/drivers/gpu/drm/radeon/radeon_uvd.c
index 57fb3eb3a4b4..2e2e737c4706 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -802,8 +802,7 @@ int radeon_uvd_get_create_msg(struct radeon_device *rdev, 
int ring,
msg[8] = cpu_to_le32(0x0440);
msg[9] = cpu_to_le32(0x);
msg[10] = cpu_to_le32(0x01b37000);
-   for (i = 11; i < 1024; ++i)
-   msg[i] = cpu_to_le32(0x0);
+   memset_io([i], 0x0, 1013 * sizeof(uint32_t));
 
r = radeon_uvd_send_msg(rdev, ring, addr, fence);
radeon_bo_unreserve(rdev->uvd.vcpu_bo);
@@ -831,8 +830,7 @@ int radeon_uvd_get_destroy_msg(struct radeon_device *rdev, 
int ring,
msg[1] = cpu_to_le32(0x0002);
msg[2] = cpu_to_le32(handle);
msg[3] = cpu_to_le32(0x);
-   for (i = 4; i < 1024; ++i)
-   msg[i] = cpu_to_le32(0x0);
+   memset_io([i], 0x0, 1020 * sizeof(uint32_t));
 
  

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-15 Thread Christian König

Am 16.12.20 um 06:41 schrieb Chen Li:

When using e8860(gcn1) on arm64, the kernel crashed on drm/radeon:

[   11.240414] pc : __memset+0x4c/0x188
[   11.244101] lr : radeon_uvd_get_create_msg+0x114/0x1d0 [radeon]
[   11.249995] sp : 0d7eb700
[   11.253295] x29: 0d7eb700 x28: 8001f632a868
[   11.258585] x27: 0004 x26: 0de0
[   11.263875] x25: 0125 x24: 0001
[   11.269168] x23:  x22: 0005
[   11.274459] x21: 0df24000 x20: 8001f74b4000
[   11.279753] x19: 00124000 x18: 0020
[   11.285043] x17:  x16: 
[   11.290336] x15: 09309000 x14: 
[   11.290340] x13: 094b6f88 x12: 094b6bd2
[   11.290343] x11: 0d7eb700 x10: 0d7eb700
[   11.306246] x9 : 0d7eb700 x8 : 0df2402c
[   11.306254] x7 :  x6 : 094b626a
[   11.306257] x5 :  x4 : 0004
[   11.306262] x3 :  x2 : 0fd4
[   11.306265] x1 :  x0 : 0df2402c
[   11.306272] Call trace:
[   11.306316]  __memset+0x4c/0x188
[   11.306638]  uvd_v1_0_ib_test+0x70/0x1c0 [radeon]
[   11.306758]  radeon_ib_ring_tests+0x54/0xe0 [radeon]
[   11.309961] IPv6: ADDRCONF(NETDEV_UP): enp5s0f0: link is not ready
[   11.354628]  radeon_device_init+0x53c/0xbdc [radeon]
[   11.354693]  radeon_driver_load_kms+0x6c/0x1b0 [radeon]
[   11.364788]  drm_dev_register+0x130/0x1c0
[   11.364794]  drm_get_pci_dev+0x8c/0x14c
[   11.372704]  radeon_pci_probe+0xb0/0x110 [radeon]
[   11.372715]  local_pci_probe+0x3c/0xb0
[   11.381129]  pci_device_probe+0x114/0x1b0
[   11.385121]  really_probe+0x23c/0x400
[   11.388757]  driver_probe_device+0xdc/0x130
[   11.392921]  __driver_attach+0x128/0x150
[   11.396826]  bus_for_each_dev+0x70/0xbc
[   11.400643]  driver_attach+0x20/0x2c
[   11.404201]  bus_add_driver+0x160/0x260
[   11.408019]  driver_register+0x74/0x120
[   11.411837]  __pci_register_driver+0x40/0x50
[   11.416149]  radeon_init+0x78/0x1000 [radeon]
[   11.420489]  do_one_initcall+0x54/0x154
[   11.424310]  do_init_module+0x54/0x260
[   11.428041]  load_module+0x1ccc/0x20b0
[   11.431773]  __se_sys_finit_module+0xac/0x10c
[   11.436109]  __arm64_sys_finit_module+0x18/0x20
[   11.440622]  el0_svc_common+0x70/0x164
[   11.444353]  el0_svc_handler+0x2c/0x80
[   11.448084]  el0_svc+0x8/0xc
[   11.450954] Code: d65f03c0 cb0803e4 f2400c84 5480 (a9001d07)

Obviously, the __memset call is generated by gcc(8.3.1). It optimizes
this for loop into memset. But this may break, because dest here is
cpu_addr mapped to io mem. So, just invoke `memset_io` directly, which
do solve the problem here.


Well interesting problem you stumbled over here, but the solution is 
quite a hack.


For amdgpu I suggest that we allocate the UVD message in GTT instead of 
VRAM since we don't have the hardware restriction for that on the new 
generations.


For radeon I think the better approach would be to convert the direct 
memory writes into calls to writel().


BTW: How does userspace work on arm64 then? The driver stack usually 
only works if mmio can be mapped directly.


Regards,
Christian.



Signed-off-by: chenli 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 6 ++
  drivers/gpu/drm/radeon/radeon_uvd.c | 6 ++
  2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 7c5b60e53482..4dccde7a9e83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1187,8 +1187,7 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, 
uint32_t handle,
msg[8] = cpu_to_le32(0x0440);
msg[9] = cpu_to_le32(0x);
msg[10] = cpu_to_le32(0x01b37000);
-   for (i = 11; i < 1024; ++i)
-   msg[i] = cpu_to_le32(0x0);
+   memset_io([i], 0x0, 1013 * sizeof(uint32_t));
  
  	return amdgpu_uvd_send_msg(ring, bo, true, fence);

  }
@@ -1212,8 +1211,7 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, 
uint32_t handle,
msg[1] = cpu_to_le32(0x0002);
msg[2] = cpu_to_le32(handle);
msg[3] = cpu_to_le32(0x);
-   for (i = 4; i < 1024; ++i)
-   msg[i] = cpu_to_le32(0x0);
+   memset_io([i], 0x0, 1020 * sizeof(uint32_t));
  
  	return amdgpu_uvd_send_msg(ring, bo, direct, fence);

  }
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c 
b/drivers/gpu/drm/radeon/radeon_uvd.c
index 57fb3eb3a4b4..2e2e737c4706 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -802,8 +802,7 @@ int radeon_uvd_get_create_msg(struct radeon_device *rdev, 
int ring,
msg[8] = cpu_to_le32(0x0440);
msg[9] = cpu_to_le32(0x);
msg[10] = cpu_to_le32(0x01b37000);
-   for (i = 11; i < 1024; ++i)
-   msg[i] =