Nouveau crashes in 4.6-rc on arm64

2016-04-20 Thread Alexandre Courbot
On 04/11/2016 04:22 PM, Alexandre Courbot wrote:
> Hi Robin,
>
> On 04/09/2016 03:46 AM, Robin Murphy wrote:
>> Hi Alex,
>>
>> On 08/04/16 05:47, Alexandre Courbot wrote:
>>> Hi Robin,
>>>
>>> On 04/07/2016 08:50 PM, Robin Murphy wrote:
>>>> Hello,
>>>>
>>>> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
>>>> look of it by dereferencing some offset from NULL inside
>>>> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
>>>> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>>>>
>>>> Attached are a couple of logs from booting arm64 defconfig plus DRM and
>>>> Nouveau enabled - the second also has framebuffer console rotation
>>>> turned on, which interestingly seems to move the point of failure, and
>>>> the display does eventually come up to show the tail end of the panic in
>>>> that case.
>>>>
>>>> I might be able to find time for a full bisection next week if isn't
>>>> something sufficiently obvious to anyone who knows this driver.
>>>
>>> Looking at the log it is not clear to me what could be causing this. I
>>> can boot 4.6-rc2 with a GM206 card without any issue. A bisect would
>>> indeed be useful here.
>>
>> OK, turns out the lure of writing something to remotely drive a Juno and
>> parse kernel bootlogs through an automatic bisection was too great to
>> resist on a Friday afternoon :D
>>
>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>> crash.
>
> Thanks for taking the time to bisect this. And apologies as it seems my
> commit is the reason for your troubles.
>
> The CPU coherency flag is used for two things: explicitly sync buffers
> pages when required, and allocating buffers that are not explicitly
> synced (like fences or pushbuffers) using the DMA API. For this latter
> use, it also accesses the buffer's content using the mapping provided by
> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
> supposed to be written using nouveau_bo_rd32(), and this function
> handles the case of an DMA-API allocated object by detecting that the
> result of ttm_kmap_obj_virtual() is NULL.
>
> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
> order to perform a memcpy and uses its result directly - which means we
> are doing memcpy on a NULL pointer. We never caught this because we
> typically do not use Nouveau's fbcon with an ARM setup.
>
> I don't really like this special access for coherent objects, and
> actually had a patch in my tree to attempt to remove it (attached).
> Although it is not the whole solution (see below), the issue should at
> least not be visible with it applied - could you confirm?

Hi Robin, could you confirm whether the attached patch in my previous 
mail helps with your problem?

Thanks!



Nouveau crashes in 4.6-rc on arm64

2016-04-20 Thread Robin Murphy
On 20/04/16 11:44, Robin Murphy wrote:
> Hi Alex,
>
> On 20/04/16 05:35, Alexandre Courbot wrote:
> [...]
>>>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>>>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>>>> crash.
>>>
>>> Thanks for taking the time to bisect this. And apologies as it seems my
>>> commit is the reason for your troubles.
>>>
>>> The CPU coherency flag is used for two things: explicitly sync buffers
>>> pages when required, and allocating buffers that are not explicitly
>>> synced (like fences or pushbuffers) using the DMA API. For this latter
>>> use, it also accesses the buffer's content using the mapping provided by
>>> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
>>> supposed to be written using nouveau_bo_rd32(), and this function
>>> handles the case of an DMA-API allocated object by detecting that the
>>> result of ttm_kmap_obj_virtual() is NULL.
>>>
>>> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
>>> order to perform a memcpy and uses its result directly - which means we
>>> are doing memcpy on a NULL pointer. We never caught this because we
>>> typically do not use Nouveau's fbcon with an ARM setup.
>>>
>>> I don't really like this special access for coherent objects, and
>>> actually had a patch in my tree to attempt to remove it (attached).
>>> Although it is not the whole solution (see below), the issue should at
>>> least not be visible with it applied - could you confirm?
>>
>> Hi Robin, could you confirm whether the attached patch in my previous
>> mail helps with your problem?
>
> With that patch on top of -rc4, it's conjuring up something that looks
> somewhat more like a real address on top of the offset, as it now
> crashes with "Unable to handle kernel paging request at virtual address
> ff8008f841ac", rather than the previous "Unable to handle kernel
> NULL pointer dereference at virtual address 01ac".
>
> That does of course mean it still crashes in the same place, though :(
>
> Robin.
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy
> the information in any medium. Thank you.

And since I intentionally sent this to the lists, anyone reading that 
_is_ an intended recipient, so it's all good, I promise!

[sorry, SMTP server mixup on my end... *berates self*]

Robin.

>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



Nouveau crashes in 4.6-rc on arm64

2016-04-20 Thread Robin Murphy
Hi Alex,

On 20/04/16 05:35, Alexandre Courbot wrote:
[...]
>>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>>> crash.
>>
>> Thanks for taking the time to bisect this. And apologies as it seems my
>> commit is the reason for your troubles.
>>
>> The CPU coherency flag is used for two things: explicitly sync buffers
>> pages when required, and allocating buffers that are not explicitly
>> synced (like fences or pushbuffers) using the DMA API. For this latter
>> use, it also accesses the buffer's content using the mapping provided by
>> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
>> supposed to be written using nouveau_bo_rd32(), and this function
>> handles the case of an DMA-API allocated object by detecting that the
>> result of ttm_kmap_obj_virtual() is NULL.
>>
>> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
>> order to perform a memcpy and uses its result directly - which means we
>> are doing memcpy on a NULL pointer. We never caught this because we
>> typically do not use Nouveau's fbcon with an ARM setup.
>>
>> I don't really like this special access for coherent objects, and
>> actually had a patch in my tree to attempt to remove it (attached).
>> Although it is not the whole solution (see below), the issue should at
>> least not be visible with it applied - could you confirm?
>
> Hi Robin, could you confirm whether the attached patch in my previous
> mail helps with your problem?

With that patch on top of -rc4, it's conjuring up something that looks
somewhat more like a real address on top of the offset, as it now
crashes with "Unable to handle kernel paging request at virtual address
ff8008f841ac", rather than the previous "Unable to handle kernel
NULL pointer dereference at virtual address 01ac".

That does of course mean it still crashes in the same place, though :(

Robin.



Nouveau crashes in 4.6-rc on arm64

2016-04-11 Thread Alexandre Courbot
On 04/11/2016 04:22 PM, Alexandre Courbot wrote:
> ... or maybe we could just unconditionally sync all buffers and let the
> DMA API abstract this away. My concern is that on coherent architectures
> we would still need to loop over all the pages for nothing, as I don't
> think the loop (see e.g. nouveau_bo_sync_for_cpu in nouveau_bo.c) can be
> optimized away by the compiler.

Looking at the code, it turns out we are already calling the sync
functions on coherent buses anyway, so maybe we have little reason to
keep this at all?


Nouveau crashes in 4.6-rc on arm64

2016-04-11 Thread Alexandre Courbot
Hi Robin,

On 04/09/2016 03:46 AM, Robin Murphy wrote:
> Hi Alex,
>
> On 08/04/16 05:47, Alexandre Courbot wrote:
>> Hi Robin,
>>
>> On 04/07/2016 08:50 PM, Robin Murphy wrote:
>>> Hello,
>>>
>>> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
>>> look of it by dereferencing some offset from NULL inside
>>> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
>>> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>>>
>>> Attached are a couple of logs from booting arm64 defconfig plus DRM and
>>> Nouveau enabled - the second also has framebuffer console rotation
>>> turned on, which interestingly seems to move the point of failure, and
>>> the display does eventually come up to show the tail end of the panic in
>>> that case.
>>>
>>> I might be able to find time for a full bisection next week if isn't
>>> something sufficiently obvious to anyone who knows this driver.
>>
>> Looking at the log it is not clear to me what could be causing this. I
>> can boot 4.6-rc2 with a GM206 card without any issue. A bisect would
>> indeed be useful here.
>
> OK, turns out the lure of writing something to remotely drive a Juno and
> parse kernel bootlogs through an automatic bisection was too great to
> resist on a Friday afternoon :D
>
> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
> crash.

Thanks for taking the time to bisect this. And apologies as it seems my 
commit is the reason for your troubles.

The CPU coherency flag is used for two things: explicitly syncing buffer
pages when required, and allocating buffers that are not explicitly
synced (like fences or pushbuffers) using the DMA API. For this latter
use, it also accesses the buffer's content using the mapping provided by
dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
supposed to be accessed using nouveau_bo_rd32() and friends, and these
accessors handle the case of a DMA-API allocated object by detecting
that the result of ttm_kmap_obj_virtual() is NULL.

But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in 
order to perform a memcpy and uses its result directly - which means we 
are doing memcpy on a NULL pointer. We never caught this because we 
typically do not use Nouveau's fbcon with an ARM setup.
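
For readers following along, the failure mode described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct model_bo, model_bo_ptr() and model_bo_write() are hypothetical names made up for illustration and are not Nouveau or TTM code. The only point is that a copy routed through a NULL-aware accessor still works where a memcpy() to the raw mapping pointer would fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for a buffer object; NOT the real struct nouveau_bo. */
struct model_bo {
	uint8_t *kmap_virtual;  /* contiguous mapping, or NULL for DMA-API objects */
	uint8_t **cpu_address;  /* per-page CPU addresses for DMA-API objects */
};

/* Resolve a byte offset to a CPU pointer, handling the NULL-mapping case. */
uint8_t *model_bo_ptr(const struct model_bo *bo, size_t offset)
{
	if (bo->kmap_virtual)
		return bo->kmap_virtual + offset;

	/* No contiguous mapping: look the address up page by page. */
	assert(bo->cpu_address != NULL);
	return bo->cpu_address[offset / MODEL_PAGE_SIZE] + offset % MODEL_PAGE_SIZE;
}

/*
 * Copy into the object one page-sized chunk at a time, always going through
 * the accessor, instead of calling memcpy() on kmap_virtual + offset directly.
 */
void model_bo_write(const struct model_bo *bo, size_t offset,
		    const void *src, size_t len)
{
	const uint8_t *s = src;

	while (len) {
		size_t room = MODEL_PAGE_SIZE - offset % MODEL_PAGE_SIZE;
		size_t n = len < room ? len : room;

		memcpy(model_bo_ptr(bo, offset), s, n);
		offset += n;
		s += n;
		len -= n;
	}
}
```

With kmap_virtual set to NULL, a direct memcpy(bo->kmap_virtual + offset, ...) is the small-offset-from-NULL dereference seen in the report, while model_bo_write() lands the bytes in the right pages even when the copy straddles a page boundary.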

I don't really like this special access for coherent objects, and 
actually had a patch in my tree to attempt to remove it (attached). 
Although it is not the whole solution (see below), the issue should at 
least not be visible with it applied - could you confirm?

> I have to say, that commit looks pretty bogus anyway - since
> de335bb49269("PCI: Update DMA configuration from DT") in 4.1, PCI
> devices should correctly inherit the coherency property from their host
> controller's DT node and get the appropriate DMA ops assigned. From a
> brief look at the Nouveau code, I guess it could possibly be the
> assumptions the TTM stuff going awry in the presence of coherent DMA
> ops. Regardless of how the code goes wrong, though, it's trivially
> incorrect to have a blanket statement that PCI devices are non-coherent
> on arm64, so whatever the original issue was this isn't the right way to
> fix it.

You are absolutely right and this needs to be fixed. We still need to 
know about the bus coherency to avoid calling the page sync functions 
when they are not needed though. Is there a way for us to query the bus 
at runtime and know whether it is cpu-coherent or not?

... or maybe we could just unconditionally sync all buffers and let the 
DMA API abstract this away. My concern is that on coherent architectures 
we would still need to loop over all the pages for nothing, as I don't 
think the loop (see e.g. nouveau_bo_sync_for_cpu in nouveau_bo.c) can be 
optimized away by the compiler.
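
To make the concern concrete, here is a small userspace model of an unconditional per-page sync. It is a sketch only: struct model_dev, model_sync_page_for_cpu() and model_bo_sync_for_cpu() are hypothetical names, not the DMA API and not the code in nouveau_bo.c. It assumes the per-page callee decides on its own whether any cache maintenance is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; NOT struct device or struct nouveau_drm. */
struct model_dev {
	bool coherent;        /* true when the bus keeps CPU caches coherent */
	unsigned long syncs;  /* counts cache-maintenance operations performed */
};

/* Stand-in for a per-page sync; it decides for itself whether work is needed. */
void model_sync_page_for_cpu(struct model_dev *dev, size_t page)
{
	(void)page;
	assert(dev != NULL);
	if (!dev->coherent)
		dev->syncs++;  /* a real implementation would do cache maintenance here */
}

/* Unconditional variant: walks every page and lets the callee decide. */
size_t model_bo_sync_for_cpu(struct model_dev *dev, size_t num_pages)
{
	size_t visited = 0;

	for (size_t i = 0; i < num_pages; i++) {
		model_sync_page_for_cpu(dev, i);
		visited++;
	}
	return visited;
}
```

In this model a coherent device still pays for one call per page even though each call does nothing, which is the cost being weighed against keeping a separate coherency flag in the driver.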

Thanks,
Alex.

-- next part --
A non-text attachment was scrubbed...
Name: 0001-WIP-no-dma-api-for-coherent-gpuobjs.patch
Type: text/x-patch
Size: 3801 bytes
Desc: not available
URL: 



Nouveau crashes in 4.6-rc on arm64

2016-04-08 Thread Robin Murphy
Hi Alex,

On 08/04/16 05:47, Alexandre Courbot wrote:
> Hi Robin,
>
> On 04/07/2016 08:50 PM, Robin Murphy wrote:
>> Hello,
>>
>> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
>> look of it by dereferencing some offset from NULL inside
>> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
>> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>>
>> Attached are a couple of logs from booting arm64 defconfig plus DRM and
>> Nouveau enabled - the second also has framebuffer console rotation
>> turned on, which interestingly seems to move the point of failure, and
>> the display does eventually come up to show the tail end of the panic in
>> that case.
>>
>> I might be able to find time for a full bisection next week if isn't
>> something sufficiently obvious to anyone who knows this driver.
>
> Looking at the log it is not clear to me what could be causing this. I
> can boot 4.6-rc2 with a GM206 card without any issue. A bisect would
> indeed be useful here.

OK, turns out the lure of writing something to remotely drive a Juno and 
parse kernel bootlogs through an automatic bisection was too great to 
resist on a Friday afternoon :D

Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as 
non-CPU-coherent on ARM64"), and sure enough reverting that removes the 
crash. I have to say, that commit looks pretty bogus anyway - since 
de335bb49269("PCI: Update DMA configuration from DT") in 4.1, PCI 
devices should correctly inherit the coherency property from their host 
controller's DT node and get the appropriate DMA ops assigned. From a 
brief look at the Nouveau code, I guess it could possibly be the
assumptions of the TTM stuff going awry in the presence of coherent DMA
ops. Regardless of how the code goes wrong, though, it's trivially 
incorrect to have a blanket statement that PCI devices are non-coherent 
on arm64, so whatever the original issue was this isn't the right way to 
fix it.

Robin.

>
> Thanks,
> Alex.



Nouveau crashes in 4.6-rc on arm64

2016-04-08 Thread Alexandre Courbot
Hi Robin,

On 04/07/2016 08:50 PM, Robin Murphy wrote:
> Hello,
>
> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
> look of it by dereferencing some offset from NULL inside
> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>
> Attached are a couple of logs from booting arm64 defconfig plus DRM and
> Nouveau enabled - the second also has framebuffer console rotation
> turned on, which interestingly seems to move the point of failure, and
> the display does eventually come up to show the tail end of the panic in
> that case.
>
> I might be able to find time for a full bisection next week if isn't
> something sufficiently obvious to anyone who knows this driver.

Looking at the log it is not clear to me what could be causing this. I 
can boot 4.6-rc2 with a GM206 card without any issue. A bisect would 
indeed be useful here.

Thanks,
Alex.



Nouveau crashes in 4.6-rc on arm64

2016-04-08 Thread Ilia Mirkin
On Fri, Apr 8, 2016 at 12:47 AM, Alexandre Courbot wrote:
> Hi Robin,
>
> On 04/07/2016 08:50 PM, Robin Murphy wrote:
>>
>> Hello,
>>
>> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
>> look of it by dereferencing some offset from NULL inside
>> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
>> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>>
>> Attached are a couple of logs from booting arm64 defconfig plus DRM and
>> Nouveau enabled - the second also has framebuffer console rotation
>> turned on, which interestingly seems to move the point of failure, and
>> the display does eventually come up to show the tail end of the panic in
>> that case.
>>
>> I might be able to find time for a full bisection next week if isn't
>> something sufficiently obvious to anyone who knows this driver.
>
>
> Looking at the log it is not clear to me what could be causing this. I can
> boot 4.6-rc2 with a GM206 card without any issue. A bisect would indeed be
> useful here.

Presumably not on an arm64 board though. This is happening in the
memcpy done somewhere in fbcon, when doing an OUT_RINGp if the
backtrace is to be believed. This means that the fifo is somehow not
writable, or not set, or ... something. Also note that it's a G73 (aka
pre-G80), so very different paths being taken through the driver.

  -ilia


Nouveau crashes in 4.6-rc on arm64

2016-04-07 Thread Robin Murphy
Hello,

With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the 
look of it by dereferencing some offset from NULL inside 
nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged 
into an ARM Juno r1 board, which works fine with 4.5 and earlier.

Attached are a couple of logs from booting arm64 defconfig plus DRM and 
Nouveau enabled - the second also has framebuffer console rotation 
turned on, which interestingly seems to move the point of failure, and 
the display does eventually come up to show the tail end of the panic in 
that case.

I might be able to find time for a full bisection next week if it isn't
something sufficiently obvious to anyone who knows this driver.

Thanks,
Robin.
-- next part --
[===>]   13458 Kb
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[0.00] Booting Linux on physical CPU 0x100
[0.00] Linux version 4.6.0-rc2 (robmur01 at e104324-lin) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-2) ) #400 SMP PREEMPT Thu Apr 7 11:36:48 BST 2016
[0.00] Boot CPU: AArch64 Processor [410fd033]
[0.00] earlycon: pl11 at MMIO 0x7ff8 (options '115200n8')
[0.00] bootconsole [pl11] enabled
[0.00] efi: Getting EFI parameters from FDT:
[0.00] EFI v2.40 by ARM Juno EFI May 20 2015 12:28:09
[0.00] efi:  ACPI=0xfeb8  ACPI 2.0=0xfeb80014 
[0.00] cma: Reserved 16 MiB at 0xfd80
[0.00] On node 0 totalpages: 2092816
[0.00]   DMA zone: 8192 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 519952 pages, LIFO batch:31
[0.00]   Normal zone: 24576 pages used for memmap
[0.00]   Normal zone: 1572864 pages, LIFO batch:31
[0.00] psci: probing for conduit method from DT.
[0.00] psci: PSCIv1.0 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: MIGRATE_INFO_TYPE not supported.
[0.00] percpu: Embedded 20 pages/cpu @ffc97feaa000 s43008 r8192 d30720 u81920
[0.00] pcpu-alloc: s43008 r8192 d30720 u81920 alloc=20*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 
[0.00] Detected VIPT I-cache on CPU0
[0.00] CPU features: enabling workaround for ARM erratum 845719
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2060048
[0.00] Kernel command line: root=/dev/sda2 debug earlycon cpuidle.off=1 consoleblank=0 fbcon=rotate:3
[0.00] log_buf_len individual max cpu contribution: 4096 bytes
[0.00] log_buf_len total cpu_extra contributions: 20480 bytes
[0.00] log_buf_len min size: 16384 bytes
[0.00] log_buf_len: 65536 bytes
[0.00] early log buf free: 14452(88%)
[0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[0.00] software IO TLB [mem 0xf980-0xfd80] (64MB) mapped at [ffc07980-ffc07d7f]
[0.00] Memory: 8129212K/8371264K available (8328K kernel code, 722K rwdata, 3552K rodata, 848K init, 252K bss, 225668K reserved, 16384K cma-reserved)
[0.00] Virtual kernel memory layout:
[0.00] modules : 0xff80 - 0xff800800   (   128 MB)
[0.00] vmalloc : 0xff800800 - 0xffbdbfff   (   246 GB)
[0.00]   .text : 0xff800808 - 0xff80088a   (  8320 KB)
[0.00] .rodata : 0xff80088a - 0xff8008c1c000   (  3568 KB)
[0.00]   .init : 0xff8008c1c000 - 0xff8008cf   (   848 KB)
[0.00]   .data : 0xff8008cf - 0xff8008da4a00   (   723 KB)
[0.00] vmemmap : 0xffbdc000 - 0xffbfc000   ( 8 GB maximum)
[0.00]   0xffbdc000 - 0xffbde600   (   608 MB actual)
[0.00] fixed   : 0xffbffe7fd000 - 0xffbffec0   (  4108 KB)
[0.00] PCI I/O : 0xffbffee0 - 0xffbfffe0   (16 MB)
[0.00] memory  : 0xffc0 - 0xffc98000   ( 38912 MB)
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
[0.00] Preemptible hierarchical RCU implementation.
[0.00]  Build-time adjustment of leaf fanout to 64.
[0.00]  RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=6.
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=6
[0.00] NR_IRQS:64 nr_irqs:64 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] GICv2m: range[mem 0x2c1c-0x2c1c0fff], SPI[224:255]
[0.00] Architected cp15 and mmio timer(s)