Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-07 Thread Markus Trippelsdorf
On 2011.12.07 at 15:32 +0100, Robert Richter wrote:
 On 02.12.11 21:48:20, Markus Trippelsdorf wrote:
  BTW I always see (mostly only on screen, sometimes in the logs):
  
  [Firmware Bug]: cpu 2, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
  [Firmware Bug]: cpu 2, IBS interrupt offset 0 not available (MSRC001103A=0x0100)
  [Firmware Bug]: using offset 1 for IBS interrupts
  [Firmware Bug]: workaround enabled for IBS LVT offset
  perf: AMD IBS detected (0x001f) 
  
  But I hope that it is only a harmless warning. 
  (perf Instruction-Based Sampling)
 
 Yes, the message always appears on AMD family 10h. Nothing to worry
 about.
 
 A patch is on the way to soften the message so that it doesn't scare people:
 
  
 http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=16e5294e5f8303756a179cf218e37dfb9ed34417

Thanks.
It's already in mainline and the message is gone now.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=16e5294e5f8303756a179cf218e37dfb9ed34417

-- 
Markus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-07 Thread Robert Richter
On 02.12.11 21:48:20, Markus Trippelsdorf wrote:
 BTW I always see (mostly only on screen, sometimes in the logs):
 
 [Firmware Bug]: cpu 2, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
 [Firmware Bug]: cpu 2, IBS interrupt offset 0 not available (MSRC001103A=0x0100)
 [Firmware Bug]: using offset 1 for IBS interrupts
 [Firmware Bug]: workaround enabled for IBS LVT offset
 perf: AMD IBS detected (0x001f) 
 
 But I hope that it is only a harmless warning. 
 (perf Instruction-Based Sampling)

Yes, the message always appears on AMD family 10h. Nothing to worry
about.

A patch is on the way to soften the message so that it doesn't scare people:

 
http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=16e5294e5f8303756a179cf218e37dfb9ed34417

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center



RE: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread David Laight
 
  If I had to guess, it looks like 0 is getting written back to some
  random page, maybe by the GPU. It could be that the GPU is in some
  half-set-up state at boot or on a reboot. Does it happen from a cold
  boot, or just a warm boot or kexec?
 
 Only happened with kexec thus far. Cold boot seems to be fine.

Sounds like the GPU is writing to physical memory using the
old mappings.
This can happen with other devices too if they aren't completely
disabled, which may not happen since the kexec case probably
avoids some of the hardware resets that occur during a normal
reboot.

I remember an Ethernet chip writing into its RX ring/buffer
area following a reboot (and reinstall!) when connected
to a quiet LAN.

David




Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Markus Trippelsdorf
On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
 On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
  On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
   On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
   mar...@trippelsdorf.de wrote:
On 2011.12.03 at 12:20 +, Dave Airlie wrote:
 FIX idr_layer_cache: Marking all objects used
   
Yesterday I couldn't reproduce the issue at all. But today I've hit
exactly the same spot again. (CCing the drm list)
   
If I had to guess, it looks like 0 is getting written back to some
random page, maybe by the GPU. It could be that the GPU is in some
half-set-up state at boot or on a reboot. Does it happen from a cold
boot, or just a warm boot or kexec?
   
Only happened with kexec thus far. Cold boot seems to be fine.
   
   
   Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
   you can reproduce it?
  
  No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
  after 700 successful kexec iterations...)
  
 
 Can you try whether the attached patch fixes the issue when you don't
 pass the radeon.no_wb=1 option?

Yes the patch finally fixes the issue for me (tested with 120 kexec
iterations).
Thanks Jerome!

-- 
Markus


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 5, 2011 at 1:15 PM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
 On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
 On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
  On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
   On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
   mar...@trippelsdorf.de wrote:
On 2011.12.03 at 12:20 +, Dave Airlie wrote:
 FIX idr_layer_cache: Marking all objects used
   
Yesterday I couldn't reproduce the issue at all. But today I've hit
exactly the same spot again. (CCing the drm list)
   
If I had to guess, it looks like 0 is getting written back to some
random page, maybe by the GPU. It could be that the GPU is in some
half-set-up state at boot or on a reboot. Does it happen from a cold
boot, or just a warm boot or kexec?
   
Only happened with kexec thus far. Cold boot seems to be fine.
   
  
   Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
   you can reproduce it?
 
  No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
  after 700 successful kexec iterations...)
 

 Can you try whether the attached patch fixes the issue when you don't
 pass the radeon.no_wb=1 option?

 Yes the patch finally fixes the issue for me (tested with 120 kexec
 iterations).
 Thanks Jerome!

 --
 Markus

Will respin with some minor code changes.

Cheers,
Jerome


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 05, 2011 at 07:15:49PM +0100, Markus Trippelsdorf wrote:
 On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
  On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
   On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
 On 2011.12.03 at 12:20 +, Dave Airlie wrote:
  FIX idr_layer_cache: Marking all objects used

 Yesterday I couldn't reproduce the issue at all. But today I've hit
 exactly the same spot again. (CCing the drm list)

 If I had to guess, it looks like 0 is getting written back to some
 random page, maybe by the GPU. It could be that the GPU is in some
 half-set-up state at boot or on a reboot. Does it happen from a cold
 boot, or just a warm boot or kexec?

 Only happened with kexec thus far. Cold boot seems to be fine.


Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
you can reproduce it?
   
   No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
   after 700 successful kexec iterations...)
   
  
  Can you try whether the attached patch fixes the issue when you don't
  pass the radeon.no_wb=1 option?
 
 Yes the patch finally fixes the issue for me (tested with 120 kexec
 iterations).
 Thanks Jerome!
 
 -- 
 Markus

Can you do a quick run on the modified patch?

I believe this patch could go to stable too, as it's low
impact from my point of view.

Cheers,
Jerome
From cccfa6f93faa6b556fd72e318606a01e333e67d3 Mon Sep 17 00:00:00 2001
From: Jerome Glisse <jgli...@redhat.com>
Date: Mon, 5 Dec 2011 12:02:17 -0500
Subject: [PATCH] drm/radeon: disable possible GPU writeback early v2

Given how kexec works, we need to disable any kind of GPU writeback
early in GPU initialization, just in case some of it is still active
from the previous setup.

v2: follow the previous sanity work done on earlier radeon ASICs; also
write registers unconditionally and disable IRQs too.

Signed-off-by: Jerome Glisse <jgli...@redhat.com>
---
 drivers/gpu/drm/radeon/evergreen.c   |2 ++
 drivers/gpu/drm/radeon/ni.c  |   18 ++
 drivers/gpu/drm/radeon/nid.h |   19 +++
 drivers/gpu/drm/radeon/r100.c|   20 ++--
 drivers/gpu/drm/radeon/r520.c|2 +-
 drivers/gpu/drm/radeon/r600.c|   16 
 drivers/gpu/drm/radeon/radeon_asic.h |2 ++
 drivers/gpu/drm/radeon/rs600.c   |   20 +++-
 drivers/gpu/drm/radeon/rs600d.h  |   21 +
 drivers/gpu/drm/radeon/rs690.c   |2 +-
 drivers/gpu/drm/radeon/rv515.c   |2 +-
 drivers/gpu/drm/radeon/rv770.c   |   16 
 drivers/gpu/drm/radeon/rv770d.h  |   20 
 13 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..6109579 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,8 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;
 
+   /* restore some register to sane defaults */
+   rv770_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..f5d7054 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1566,6 +1566,22 @@ int cayman_suspend(struct radeon_device *rdev)
return 0;
 }
 
+/*
+ * Due to how kexec works, it can leave the hw fully initialised when it
+ * boots the new kernel.
+ */
+static void cayman_restore_sanity(struct radeon_device *rdev)
+{
+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+}
+
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -1577,6 +1593,8 @@ int cayman_init(struct radeon_device *rdev)
struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;
 
+   /* restore some register to sane defaults */
+   cayman_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
 #define HDP_MISC_CNTL  0x2F4C
 #define HDP_FLUSH_INVALIDATE_CACHE  (1 << 0)
 
+#define IH_RB_CNTL                      0x3e00
+#   

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Markus Trippelsdorf
On 2011.12.05 at 14:11 -0500, Jerome Glisse wrote:
 On Mon, Dec 05, 2011 at 07:15:49PM +0100, Markus Trippelsdorf wrote:
  On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
   On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
 On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
  On 2011.12.03 at 12:20 +, Dave Airlie wrote:
   FIX idr_layer_cache: Marking all objects used
 
  Yesterday I couldn't reproduce the issue at all. But today I've hit
  exactly the same spot again. (CCing the drm list)
 
  If I had to guess, it looks like 0 is getting written back to some
  random page, maybe by the GPU. It could be that the GPU is in some
  half-set-up state at boot or on a reboot. Does it happen from a cold
  boot, or just a warm boot or kexec?
 
  Only happened with kexec thus far. Cold boot seems to be fine.
 
 
 Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
 you can reproduce it?

No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
after 700 successful kexec iterations...)

   
   Can you try whether the attached patch fixes the issue when you don't
   pass the radeon.no_wb=1 option?
  
  Yes the patch finally fixes the issue for me (tested with 120 kexec
  iterations).
  Thanks Jerome!
  
  -- 
  Markus
 
 Can you do a quick run on the modified patch?

This one is also OK after ~60 iterations.

-- 
Markus


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Pekka Enberg
On Mon, Dec 5, 2011 at 9:27 PM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
  Yes the patch finally fixes the issue for me (tested with 120 kexec
  iterations).
  Thanks Jerome!

 Can you do a quick run on the modified patch?

 This one is also OK after ~60 iterations.

Jerome, could you please include in the changelog a reference to this
LKML thread for context, and credit Markus for reporting the issue and
following up to get it fixed?

  Pekka


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 05, 2011 at 10:10:34PM +0200, Pekka Enberg wrote:
 On Mon, Dec 5, 2011 at 9:27 PM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
   Yes the patch finally fixes the issue for me (tested with 120 kexec
   iterations).
   Thanks Jerome!
 
  Can you do a quick run on the modified patch?
 
  This one is also OK after ~60 iterations.
 
 Jerome, could you please include in the changelog a reference to this
 LKML thread for context, and credit Markus for reporting the issue and
 following up to get it fixed?
 
   Pekka

Attached is the updated patch; only the changelog is different. Thanks,
Markus, for testing this.

Cheers,
Jerome
From cccfa6f93faa6b556fd72e318606a01e333e67d3 Mon Sep 17 00:00:00 2001
From: Jerome Glisse <jgli...@redhat.com>
Date: Mon, 5 Dec 2011 12:02:17 -0500
Subject: [PATCH] drm/radeon: disable possible GPU writeback early v3

Given how kexec works, we need to disable any kind of GPU writeback
early in GPU initialization, just in case some of it is still active
from the previous setup. This patch fixes the issue described in
the LKML thread:

WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

https://lkml.org/lkml/2011/12/5/466

Thanks to Markus Trippelsdorf for testing this.

v2: follow the previous sanity work done on earlier radeon ASICs; also
write registers unconditionally and disable IRQs too.
v3: update changelog

Signed-off-by: Jerome Glisse <jgli...@redhat.com>
Tested-by: Markus Trippelsdorf <mar...@trippelsdorf.de>
---
 drivers/gpu/drm/radeon/evergreen.c   |2 ++
 drivers/gpu/drm/radeon/ni.c  |   18 ++
 drivers/gpu/drm/radeon/nid.h |   19 +++
 drivers/gpu/drm/radeon/r100.c|   20 ++--
 drivers/gpu/drm/radeon/r520.c|2 +-
 drivers/gpu/drm/radeon/r600.c|   16 
 drivers/gpu/drm/radeon/radeon_asic.h |2 ++
 drivers/gpu/drm/radeon/rs600.c   |   20 +++-
 drivers/gpu/drm/radeon/rs600d.h  |   21 +
 drivers/gpu/drm/radeon/rs690.c   |2 +-
 drivers/gpu/drm/radeon/rv515.c   |2 +-
 drivers/gpu/drm/radeon/rv770.c   |   16 
 drivers/gpu/drm/radeon/rv770d.h  |   20 
 13 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..6109579 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,8 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;
 
+   /* restore some register to sane defaults */
+   rv770_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..f5d7054 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1566,6 +1566,22 @@ int cayman_suspend(struct radeon_device *rdev)
return 0;
 }
 
+/*
+ * Due to how kexec works, it can leave the hw fully initialised when it
+ * boots the new kernel.
+ */
+static void cayman_restore_sanity(struct radeon_device *rdev)
+{
+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+}
+
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -1577,6 +1593,8 @@ int cayman_init(struct radeon_device *rdev)
struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;
 
+   /* restore some register to sane defaults */
+   cayman_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
 #define HDP_MISC_CNTL  0x2F4C
 #define HDP_FLUSH_INVALIDATE_CACHE  (1 << 0)
 
+#define IH_RB_CNTL                      0x3e00
+#   define IH_RB_ENABLE                 (1 << 0)
+#   define IH_IB_SIZE(x)                ((x) << 1) /* log2 */
+#   define IH_RB_FULL_DRAIN_ENABLE      (1 << 6)
+#   define IH_WPTR_WRITEBACK_ENABLE     (1 << 8)
+#   define IH_WPTR_WRITEBACK_TIMER(x)   ((x) << 9) /* log2 */
+#   define IH_WPTR_OVERFLOW_ENABLE      (1 << 16)
+#   define IH_WPTR_OVERFLOW_CLEAR       (1 << 31)
+#define IH_CNTL                         0x3e18
+#   define 

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-04 Thread Jerome Glisse
On Sat, Dec 3, 2011 at 8:02 PM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
 On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
 On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
  On 2011.12.03 at 12:20 +, Dave Airlie wrote:
   FIX idr_layer_cache: Marking all objects used
 
  Yesterday I couldn't reproduce the issue at all. But today I've hit
  exactly the same spot again. (CCing the drm list)
 
  If I had to guess, it looks like 0 is getting written back to some
  random page, maybe by the GPU. It could be that the GPU is in some
  half-set-up state at boot or on a reboot. Does it happen from a cold
  boot, or just a warm boot or kexec?
 
  Only happened with kexec thus far. Cold boot seems to be fine.
 
  --
  Markus

 Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
 you can reproduce it?

 No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
 after 700 successful kexec iterations...)


OK, so it's GPU writeback. I will do a patch on Monday.

Cheers,
Jerome


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-03 Thread Dave Airlie
 FIX idr_layer_cache: Marking all objects used
   
Yesterday I couldn't reproduce the issue at all. But today I've hit
exactly the same spot again. (CCing the drm list)

If I had to guess, it looks like 0 is getting written back to some
random page, maybe by the GPU. It could be that the GPU is in some
half-set-up state at boot or on a reboot. Does it happen from a cold
boot, or just a warm boot or kexec?

Jerome, it might be worth checking the ordering of when bus mastering
gets enabled, or whether we turn off the writeback producers before
writeback is enabled.

Dave.


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-03 Thread Markus Trippelsdorf
On 2011.12.03 at 12:20 +, Dave Airlie wrote:
  FIX idr_layer_cache: Marking all objects used

 Yesterday I couldn't reproduce the issue at all. But today I've hit
 exactly the same spot again. (CCing the drm list)
 
 If I had to guess, it looks like 0 is getting written back to some
 random page, maybe by the GPU. It could be that the GPU is in some
 half-set-up state at boot or on a reboot. Does it happen from a cold
 boot, or just a warm boot or kexec?

Only happened with kexec thus far. Cold boot seems to be fine.

-- 
Markus


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-03 Thread Jerome Glisse
On Sat, Dec 3, 2011 at 2:31 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
 On 2011.12.03 at 12:20 +, Dave Airlie wrote:
  FIX idr_layer_cache: Marking all objects used

 Yesterday I couldn't reproduce the issue at all. But today I've hit
 exactly the same spot again. (CCing the drm list)

 If I had to guess, it looks like 0 is getting written back to some
 random page, maybe by the GPU. It could be that the GPU is in some
 half-set-up state at boot or on a reboot. Does it happen from a cold
 boot, or just a warm boot or kexec?

 Only happened with kexec thus far. Cold boot seems to be fine.

 --
 Markus

 Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
 you can reproduce it?

 Cheers,
 Jerome

Also try a cold boot with radeon.no_wb=1 :)

Cheers,
Jerome


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-03 Thread Markus Trippelsdorf
On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
 On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
  On 2011.12.03 at 12:20 +, Dave Airlie wrote:
   FIX idr_layer_cache: Marking all objects used
 
  Yesterday I couldn't reproduce the issue at all. But today I've hit
  exactly the same spot again. (CCing the drm list)
 
  If I had to guess, it looks like 0 is getting written back to some
  random page, maybe by the GPU. It could be that the GPU is in some
  half-set-up state at boot or on a reboot. Does it happen from a cold
  boot, or just a warm boot or kexec?
 
  Only happened with kexec thus far. Cold boot seems to be fine.
 
  --
  Markus
 
 Can you add radeon.no_wb=1 to your kexec kernel parameters and see if
 you can reproduce it?

No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
after 700 successful kexec iterations...)

-- 
Markus


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-02 Thread Jerome Glisse
On Thu, Dec 01, 2011 at 09:44:37AM +0100, Markus Trippelsdorf wrote:
 On 2011.11.24 at 09:50 +0100, Markus Trippelsdorf wrote:
  On 2011.11.23 at 10:06 -0600, Christoph Lameter wrote:
   On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:
   
 FIX idr_layer_cache: Marking all objects used
   
Yesterday I couldn't reproduce the issue at all. But today I've hit
exactly the same spot again. (CCing the drm list)
   
   Well, this looks like a write after free.
   
=
BUG idr_layer_cache: Poison overwritten
-
Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   
   And it's an integer-sized write of 0. If you look at the struct definition
   and look up the offset, you should be able to locate the field that
   was modified.
 
 It also happens with CONFIG_SLAB. 
 (If someone wants to reproduce the issue, just run a kexec boot loop and
 the bug will occur after a few (~10) iterations.)
 

Can you provide the kexec command line you are using and the full kernel
log (I'm mostly interested in the kernel options).

Cheers,
Jerome


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-02 Thread Markus Trippelsdorf
On 2011.12.02 at 14:43 -0500, Jerome Glisse wrote:
 On Thu, Dec 01, 2011 at 09:44:37AM +0100, Markus Trippelsdorf wrote:
  On 2011.11.24 at 09:50 +0100, Markus Trippelsdorf wrote:
   On 2011.11.23 at 10:06 -0600, Christoph Lameter wrote:
On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:

  FIX idr_layer_cache: Marking all objects used

 Yesterday I couldn't reproduce the issue at all. But today I've hit
 exactly the same spot again. (CCing the drm list)

Well, this looks like a write after free.

 =
 BUG idr_layer_cache: Poison overwritten
 -
 Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

And it's an integer-sized write of 0. If you look at the struct definition
and look up the offset, you should be able to locate the field that
was modified.
  
  It also happens with CONFIG_SLAB. 
  (If someone wants to reproduce the issue, just run a kexec boot loop and
  the bug will occur after a few (~10) iterations.)
  
 
 Can you provide the kexec command line you are using and the full kernel
 log (I'm mostly interested in the kernel options).

/usr/sbin/kexec -l /usr/src/linux/arch/x86/boot/bzImage --append=root=PARTUUID=6d6a4009-3a90-40df-806a-e63f48189719 init=/sbin/minit rootflags=logbsize=262144 fbcon=rotate:3 drm_kms_helper.poll=0 quiet
/usr/sbin/kexec -e

(The loop happens after autologin in .zprofile:
sleep 4 && sudo /etc/minit/ctrlaltdel/run
(the last script kills, unmounts and then runs the two kexec commands
above))

Linux version 3.2.0-rc4-00089-g621fc1e-dirty (mar...@x4.trippels.de) (gcc version 4.6.3 20111202 (prerelease) (GCC) ) #134 SMP PREEMPT Fri Dec 2 11:06:20 CET 2011
Command line: root=PARTUUID=6d6a4009-3a90-40df-806a-e63f48189719 init=/sbin/minit rootflags=logbsize=262144 fbcon=rotate:3 drm_kms_helper.poll=0 quiet
KERNEL supported cpus:
  AMD AuthenticAMD
BIOS-provided physical RAM map:
 BIOS-e820: 0100 - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - dfe9 (usable)
 BIOS-e820: dfe9 - dfea8000 (ACPI data)
 BIOS-e820: dfea8000 - dfed (ACPI NVS)
 BIOS-e820: dfed - dff0 (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
 BIOS-e820: 0001 - 00022000 (usable)
NX (Execute Disable) protection: active
DMI present.
DMI: System manufacturer System Product Name/M4A78T-E, BIOS 3406 08/20/2010
e820 update range:  - 0001 (usable) == (reserved)
e820 remove range: 000a - 0010 (usable)
last_pfn = 0x22 max_arch_pfn = 0x4
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-E uncachable
  F-F write-protect
MTRR variable ranges enabled:
  0 base  mask 8000 write-back
  1 base 8000 mask C000 write-back
  2 base C000 mask E000 write-back
  3 base F000 mask F800 write-combining
  4 disabled
  5 disabled
  6 disabled
  7 disabled
TOM2: 00022000 aka 8704M
x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106
last_pfn = 0xdfe90 max_arch_pfn = 0x4
initial memory mapped : 0 - 2000
Base memory trampoline at [8809d000] 9d000 size 8192
Using GB pages for direct mapping
init_memory_mapping: -dfe9
 00 - 00c000 page 1G
 00c000 - 00dfe0 page 2M
 00dfe0 - 00dfe9 page 4k
kernel direct mapping tables up to dfe9 @ 1fffd000-2000
init_memory_mapping: 0001-00022000
 01 - 02 page 1G
 02 - 022000 page 2M
kernel direct mapping tables up to 22000 @ dfe8e000-dfe9
ACPI: RSDP 000fb880 00024 (v02 ACPIAM)
ACPI: XSDT dfe90100 0005C (v01 082010 XSDT1403 20100820 MSFT 0097)
ACPI: FACP dfe90290 000F4 (v03 082010 FACP1403 20100820 MSFT 0097)
ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x/0x1 (20110623/tbfadt-560)
ACPI: DSDT dfe90450 0E6FE (v01  A1152 A1152000  INTL 20060113)
ACPI: FACS dfea8000 00040

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-02 Thread Markus Trippelsdorf
On 2011.12.02 at 21:06 +0100, Markus Trippelsdorf wrote:
 On 2011.12.02 at 14:43 -0500, Jerome Glisse wrote:
  On Thu, Dec 01, 2011 at 09:44:37AM +0100, Markus Trippelsdorf wrote:
   On 2011.11.24 at 09:50 +0100, Markus Trippelsdorf wrote:
On 2011.11.23 at 10:06 -0600, Christoph Lameter wrote:
 On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:
 
   FIX idr_layer_cache: Marking all objects used
 
  Yesterday I couldn't reproduce the issue at all. But today I've hit
  exactly the same spot again. (CCing the drm list)
 
 Well, this looks like a write after free.
 
  =
  BUG idr_layer_cache: Poison overwritten
  -
  Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 
 And it's an integer-sized write of 0. If you look at the struct definition
 and look up the offset, you should be able to locate the field that
 was modified.
   
   It also happens with CONFIG_SLAB. 
   (If someone wants to reproduce the issue, just run a kexec boot loop and
   the bug will occur after a few (~10) iterations.)
   
  
  Can you provide the kexec command line you are using and the full kernel
  log (I'm mostly interested in the kernel options).
 
 /usr/sbin/kexec -l /usr/src/linux/arch/x86/boot/bzImage --append=root=PARTUUID=6d6a4009-3a90-40df-806a-e63f48189719 init=/sbin/minit rootflags=logbsize=262144 fbcon=rotate:3 drm_kms_helper.poll=0 quiet
 /usr/sbin/kexec -e
 
 (The loop happens after autologin in .zprofile:
 sleep 4 && sudo /etc/minit/ctrlaltdel/run
 (the last script kills, unmounts and then runs the two kexec commands
 above))

BTW I always see (mostly only on screen, sometimes in the logs):

[Firmware Bug]: cpu 2, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
[Firmware Bug]: cpu 2, IBS interrupt offset 0 not available (MSRC001103A=0x0100)
[Firmware Bug]: using offset 1 for IBS interrupts
[Firmware Bug]: workaround enabled for IBS LVT offset
perf: AMD IBS detected (0x001f) 

But I hope that it is only a harmless warning. 
(perf Instruction-Based Sampling)

Robert?

-- 
Markus


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-02 Thread Jerome Glisse
On Thu, Nov 24, 2011 at 09:50:40AM +0100, Markus Trippelsdorf wrote:
 On 2011.11.23 at 10:06 -0600, Christoph Lameter wrote:
  On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:
  
FIX idr_layer_cache: Marking all objects used
  
   Yesterday I couldn't reproduce the issue at all. But today I've hit
   exactly the same spot again. (CCing the drm list)
  
  Well, this looks like a write after free.
  
   =
   BUG idr_layer_cache: Poison overwritten
   -
   Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  
  And it's an integer-sized write of 0. If you look at the struct definition
  and look up the offset, you should be able to locate the field that
  was modified.
 
 Here are two more BUGs that seem to point to the same bug:
 
 1)
 ...
 Nov 21 18:30:30 x4 kernel: [drm] radeon: irq initialized.
 Nov 21 18:30:30 x4 kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
 Nov 21 18:30:30 x4 kernel: [drm] Loading RS780 Microcode
 Nov 21 18:30:30 x4 kernel: [drm] PCIE GART of 512M enabled (table at 0xC004).
 Nov 21 18:30:30 x4 kernel: radeon :01:05.0: WB enabled
 Nov 21 18:30:30 x4 kernel: =
 Nov 21 18:30:30 x4 kernel: BUG task_xstate: Not a valid slab page
 Nov 21 18:30:30 x4 kernel: -
 Nov 21 18:30:30 x4 kernel:
 Nov 21 18:30:30 x4 kernel: INFO: Slab 0xea044300 objects=32767 used=65535 fp=0x  (null) flags=0x0401
 Nov 21 18:30:30 x4 kernel: Pid: 9, comm: ksoftirqd/1 Not tainted 3.2.0-rc2-00274-g6fe4c6d-dirty #75
 Nov 21 18:30:30 x4 kernel: Call Trace:
 Nov 21 18:30:30 x4 kernel: [81101c1d] slab_err+0x7d/0x90
 Nov 21 18:30:30 x4 kernel: [8103e29f] ? dump_trace+0x16f/0x2e0
 Nov 21 18:30:30 x4 kernel: [81044764] ? free_thread_xstate+0x24/0x40
 Nov 21 18:30:30 x4 kernel: [81044764] ? free_thread_xstate+0x24/0x40
 Nov 21 18:30:30 x4 kernel: [81102566] check_slab+0x96/0xc0
 Nov 21 18:30:30 x4 kernel: [814c5c29] free_debug_processing+0x34/0x19c
 Nov 21 18:30:30 x4 kernel: [81101d9a] ? set_track+0x5a/0x190
 Nov 21 18:30:30 x4 kernel: [8110cf2b] ? sys_open+0x1b/0x20
 Nov 21 18:30:30 x4 kernel: [814c5e55] __slab_free+0x33/0x2d0
 Nov 21 18:30:30 x4 kernel: [8110cf2b] ? sys_open+0x1b/0x20
 Nov 21 18:30:30 x4 kernel: [81105134] kmem_cache_free+0x104/0x120
 Nov 21 18:30:30 x4 kernel: [81044764] free_thread_xstate+0x24/0x40
 Nov 21 18:30:30 x4 kernel: [81044794] free_thread_info+0x14/0x30
 Nov 21 18:30:30 x4 kernel: [8106a4ff] free_task+0x2f/0x50
 Nov 21 18:30:30 x4 kernel: [8106a5d0] __put_task_struct+0xb0/0x110
 Nov 21 18:30:30 x4 kernel: [8106eb4b] delayed_put_task_struct+0x3b/0xa0
 Nov 21 18:30:30 x4 kernel: [810aa01a] __rcu_process_callbacks+0x12a/0x350
 Nov 21 18:30:30 x4 kernel: [810aa2a2] rcu_process_callbacks+0x62/0x140
 Nov 21 18:30:30 x4 kernel: [81072e18] __do_softirq+0xa8/0x200
 Nov 21 18:30:30 x4 kernel: [81073077] run_ksoftirqd+0x107/0x210
 Nov 21 18:30:30 x4 kernel: [81072f70] ? __do_softirq+0x200/0x200
 Nov 21 18:30:30 x4 kernel: [8108bb87] kthread+0x87/0x90
 Nov 21 18:30:30 x4 kernel: [814cdcf4] kernel_thread_helper+0x4/0x10
 Nov 21 18:30:30 x4 kernel: [8108bb00] ? kthread_flush_work_fn+0x10/0x10
 Nov 21 18:30:30 x4 kernel: [814cdcf0] ? gs_change+0xb/0xb
 Nov 21 18:30:30 x4 kernel: FIX task_xstate: Object at 0x8110cf2b not freed
 Nov 21 18:30:30 x4 kernel: [drm] ring test succeeded in 1 usecs
 Nov 21 18:30:30 x4 kernel: [drm] radeon: ib pool ready.
 Nov 21 18:30:30 x4 kernel: [drm] ib test succeeded in 0 usecs
 Nov 21 18:30:30 x4 kernel: [drm] Radeon Display Connectors
 Nov 21 18:30:30 x4 kernel: [drm] Connector 0
 
 2)
 ...
 Nov 21 17:04:38 x4 kernel: fbcon: radeondrmfb (fb0) is primary device
 Nov 21 17:04:38 x4 kernel: Console: switching to colour frame buffer device 131x105
 Nov 21 17:04:38 x4 kernel: fb0: radeondrmfb frame buffer device
 Nov 21 17:04:38 x4 kernel: drm: registered panic notifier
 Nov 21 17:04:38 x4 kernel: [drm] Initialized radeon 2.11.0 20080528 for :01:05.0 on minor 0
 Nov 21 17:04:38 

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-01 Thread Markus Trippelsdorf
On 2011.11.24 at 09:50 +0100, Markus Trippelsdorf wrote:
 On 2011.11.23 at 10:06 -0600, Christoph Lameter wrote:
  On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:
  
FIX idr_layer_cache: Marking all objects used
  
   Yesterday I couldn't reproduce the issue at all. But today I've hit
   exactly the same spot again. (CCing the drm list)
  
  Well, this looks like a write after free.
  
   =
   BUG idr_layer_cache: Poison overwritten
   -
   Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
   Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  
  And it's an integer-sized write of 0. If you look at the struct definition
  and look up the offset, you should be able to locate the field that
  was modified.

It also happens with CONFIG_SLAB. 
(If someone wants to reproduce the issue, just run a kexec boot loop and
the bug will occur after a few (~10) iterations.)

Dec  1 05:04:52 x4 kernel: [drm] Initialized drm 1.1.0 20060810
Dec  1 05:04:52 x4 kernel: [drm] radeon defaulting to kernel modesetting.
Dec  1 05:04:52 x4 kernel: [drm] radeon kernel modesetting enabled.
Dec  1 05:04:52 x4 kernel: radeon :01:05.0: PCI INT A - GSI 18 (level, low) - IRQ 18
Dec  1 05:04:52 x4 kernel: radeon :01:05.0: setting latency timer to 64
Dec  1 05:04:52 x4 kernel: [drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D).
Dec  1 05:04:52 x4 kernel: [drm] register mmio base: 0xFBEE
Dec  1 05:04:52 x4 kernel: [drm] register mmio size: 65536
Dec  1 05:04:52 x4 kernel: ATOM BIOS: 113
Dec  1 05:04:52 x4 kernel: radeon :01:05.0: VRAM: 128M 0xC000 - 0xC7FF (128M used)
Dec  1 05:04:52 x4 kernel: radeon :01:05.0: GTT: 512M 0xA000 - 0xBFFF
Dec  1 05:04:52 x4 kernel: [drm] Detected VRAM RAM=128M, BAR=128M
Dec  1 05:04:52 x4 kernel: [drm] RAM width 32bits DDR
Dec  1 05:04:52 x4 kernel: [TTM] Zone  kernel: Available graphics memory: 4090750 kiB.
Dec  1 05:04:52 x4 kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 kiB.
Dec  1 05:04:52 x4 kernel: [TTM] Initializing pool allocator.
Dec  1 05:04:52 x4 kernel: [drm] radeon: 128M of VRAM memory ready
Dec  1 05:04:52 x4 kernel: [drm] radeon: 512M of GTT memory ready.
Dec  1 05:04:52 x4 kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Dec  1 05:04:52 x4 kernel: [drm] Driver supports precise vblank timestamp query.
Dec  1 05:04:52 x4 kernel: [drm] radeon: irq initialized.
Dec  1 05:04:52 x4 kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Dec  1 05:04:52 x4 kernel: [drm] Loading RS780 Microcode
Dec  1 05:04:52 x4 kernel: [drm] PCIE GART of 512M enabled (table at 0xC004).
Dec  1 05:04:52 x4 kernel: radeon :01:05.0: WB enabled
Dec  1 05:04:52 x4 kernel: Slab corruption: size-1024 start=880216cbc730, len=1024
Dec  1 05:04:52 x4 kernel: Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Dec  1 05:04:52 x4 kernel: Last user: [  (null)](0x0)
Dec  1 05:04:52 x4 kernel: 0d0: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Dec  1 05:04:52 x4 kernel: Prev obj: start=880216cbc318, len=1024
Dec  1 05:04:52 x4 kernel: Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Dec  1 05:04:52 x4 kernel: Last user: [  (null)](0x0)
Dec  1 05:04:52 x4 kernel: 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Dec  1 05:04:52 x4 kernel: 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Dec  1 05:04:52 x4 kernel: Next obj: start=880216cbcb48, len=1024
Dec  1 05:04:52 x4 kernel: Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Dec  1 05:04:52 x4 kernel: Last user: [81299874](radeon_bo_create+0xb4/0x240)
Dec  1 05:04:52 x4 kernel: 000: 48 cb cb 16 02 88 ff ff 48 cb cb 16 02 88 ff ff  H...H...
Dec  1 05:04:52 x4 kernel: 010: 02 00 27 00 00 00 00 00 00 00 00 00 00 00 00 00  ..'.
Dec  1 05:04:52 x4 kernel: [drm] ring test succeeded in 0 usecs
Dec  1 05:04:52 x4 kernel: [drm] radeon: ib pool ready.
Dec  1 05:04:52 x4 kernel: [drm] ib test succeeded in 0 usecs
Dec  1 05:04:52 x4 kernel: [drm] Radeon Display Connectors
Dec  1 05:04:52 x4 kernel: [drm] Connector 0:
Dec  1 05:04:52 x4 kernel: [drm]   VGA
Dec  1 05:04:52 x4 kernel: [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
Dec  1 05:04:52 x4 kernel: [drm]   Encoders:
Dec  1 

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-11-24 Thread Markus Trippelsdorf
On 2011.11.23 at 10:06 -0600, Christoph Lameter wrote:
 On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:
 
   FIX idr_layer_cache: Marking all objects used
 
  Yesterday I couldn't reproduce the issue at all. But today I've hit
  exactly the same spot again. (CCing the drm list)
 
 Well, this looks like a write after free.
 
  =
  BUG idr_layer_cache: Poison overwritten
  -
  Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 
 And it's an integer-sized write of 0. If you look at the struct definition
 and look up the offset, you should be able to locate the field that
 was modified.

Here are two more BUGs that seem to point to the same bug:

1)
...
Nov 21 18:30:30 x4 kernel: [drm] radeon: irq initialized.
Nov 21 18:30:30 x4 kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Nov 21 18:30:30 x4 kernel: [drm] Loading RS780 Microcode
Nov 21 18:30:30 x4 kernel: [drm] PCIE GART of 512M enabled (table at 0xC004).
Nov 21 18:30:30 x4 kernel: radeon :01:05.0: WB enabled
Nov 21 18:30:30 x4 kernel: =
Nov 21 18:30:30 x4 kernel: BUG task_xstate: Not a valid slab page
Nov 21 18:30:30 x4 kernel: -
Nov 21 18:30:30 x4 kernel:
Nov 21 18:30:30 x4 kernel: INFO: Slab 0xea044300 objects=32767 used=65535 fp=0x  (null) flags=0x0401
Nov 21 18:30:30 x4 kernel: Pid: 9, comm: ksoftirqd/1 Not tainted 3.2.0-rc2-00274-g6fe4c6d-dirty #75
Nov 21 18:30:30 x4 kernel: Call Trace:
Nov 21 18:30:30 x4 kernel: [81101c1d] slab_err+0x7d/0x90
Nov 21 18:30:30 x4 kernel: [8103e29f] ? dump_trace+0x16f/0x2e0
Nov 21 18:30:30 x4 kernel: [81044764] ? free_thread_xstate+0x24/0x40
Nov 21 18:30:30 x4 kernel: [81044764] ? free_thread_xstate+0x24/0x40
Nov 21 18:30:30 x4 kernel: [81102566] check_slab+0x96/0xc0
Nov 21 18:30:30 x4 kernel: [814c5c29] free_debug_processing+0x34/0x19c
Nov 21 18:30:30 x4 kernel: [81101d9a] ? set_track+0x5a/0x190
Nov 21 18:30:30 x4 kernel: [8110cf2b] ? sys_open+0x1b/0x20
Nov 21 18:30:30 x4 kernel: [814c5e55] __slab_free+0x33/0x2d0
Nov 21 18:30:30 x4 kernel: [8110cf2b] ? sys_open+0x1b/0x20
Nov 21 18:30:30 x4 kernel: [81105134] kmem_cache_free+0x104/0x120
Nov 21 18:30:30 x4 kernel: [81044764] free_thread_xstate+0x24/0x40
Nov 21 18:30:30 x4 kernel: [81044794] free_thread_info+0x14/0x30
Nov 21 18:30:30 x4 kernel: [8106a4ff] free_task+0x2f/0x50
Nov 21 18:30:30 x4 kernel: [8106a5d0] __put_task_struct+0xb0/0x110
Nov 21 18:30:30 x4 kernel: [8106eb4b] delayed_put_task_struct+0x3b/0xa0
Nov 21 18:30:30 x4 kernel: [810aa01a] __rcu_process_callbacks+0x12a/0x350
Nov 21 18:30:30 x4 kernel: [810aa2a2] rcu_process_callbacks+0x62/0x140
Nov 21 18:30:30 x4 kernel: [81072e18] __do_softirq+0xa8/0x200
Nov 21 18:30:30 x4 kernel: [81073077] run_ksoftirqd+0x107/0x210
Nov 21 18:30:30 x4 kernel: [81072f70] ? __do_softirq+0x200/0x200
Nov 21 18:30:30 x4 kernel: [8108bb87] kthread+0x87/0x90
Nov 21 18:30:30 x4 kernel: [814cdcf4] kernel_thread_helper+0x4/0x10
Nov 21 18:30:30 x4 kernel: [8108bb00] ? kthread_flush_work_fn+0x10/0x10
Nov 21 18:30:30 x4 kernel: [814cdcf0] ? gs_change+0xb/0xb
Nov 21 18:30:30 x4 kernel: FIX task_xstate: Object at 0x8110cf2b not freed
Nov 21 18:30:30 x4 kernel: [drm] ring test succeeded in 1 usecs
Nov 21 18:30:30 x4 kernel: [drm] radeon: ib pool ready.
Nov 21 18:30:30 x4 kernel: [drm] ib test succeeded in 0 usecs
Nov 21 18:30:30 x4 kernel: [drm] Radeon Display Connectors
Nov 21 18:30:30 x4 kernel: [drm] Connector 0

2)
...
Nov 21 17:04:38 x4 kernel: fbcon: radeondrmfb (fb0) is primary device
Nov 21 17:04:38 x4 kernel: Console: switching to colour frame buffer device 131x105
Nov 21 17:04:38 x4 kernel: fb0: radeondrmfb frame buffer device
Nov 21 17:04:38 x4 kernel: drm: registered panic notifier
Nov 21 17:04:38 x4 kernel: [drm] Initialized radeon 2.11.0 20080528 for :01:05.0 on minor 0
Nov 21 17:04:38 x4 kernel: loop: module loaded
Nov 21 17:04:38 x4 kernel: ahci :00:11.0: version 3.0
Nov 21 17:04:38 x4 kernel: ahci :00:11.0: PCI INT A - GSI 22 (level, low) -

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-11-23 Thread Markus Trippelsdorf
On 2011.11.21 at 16:36 +0100, Markus Trippelsdorf wrote:
 On 2011.11.21 at 15:16 +0100, Eric Dumazet wrote:
  Le lundi 21 novembre 2011 à 14:15 +0100, Markus Trippelsdorf a écrit :
  
   I've enabled CONFIG_SLUB_DEBUG_ON and this is what happened:
   
  
  Thanks
  
  Please continue to provide more samples.
  
  There is something wrong somewhere, but where exactly, it's hard to say.
 
 New sample. This one points to lib/idr.c:
 
 [drm] Initialized drm 1.1.0 20060810
 [drm] radeon defaulting to kernel modesetting.
 [drm] radeon kernel modesetting enabled.
 radeon :01:05.0: PCI INT A - GSI 18 (level, low) - IRQ 18
 radeon :01:05.0: setting latency timer to 64
 [drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D).
 [drm] register mmio base: 0xFBEE
 [drm] register mmio size: 65536
 ATOM BIOS: 113
 radeon :01:05.0: VRAM: 128M 0xC000 - 0xC7FF (128M used)
 radeon :01:05.0: GTT: 512M 0xA000 - 0xBFFF
 [drm] Detected VRAM RAM=128M, BAR=128M
 [drm] RAM width 32bits DDR
 [TTM] Zone  kernel: Available graphics memory: 4083428 kiB.
 [TTM] Zone   dma32: Available graphics memory: 2097152 kiB.
 [TTM] Initializing pool allocator.
 [drm] radeon: 128M of VRAM memory ready
 [drm] radeon: 512M of GTT memory ready.
 [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
 [drm] Driver supports precise vblank timestamp query.
 [drm] radeon: irq initialized.
 [drm] GART: num cpu pages 131072, num gpu pages 131072
 [drm] Loading RS780 Microcode
 [drm] PCIE GART of 512M enabled (table at 0xC004).
 radeon :01:05.0: WB enabled
 [drm] ring test succeeded in 1 usecs
 [drm] radeon: ib pool ready.
 [drm] ib test succeeded in 0 usecs
 =
 BUG idr_layer_cache: Poison overwritten
 -
 
 INFO: 0x880215650800-0x880215650803. First byte 0x0 instead of 0x6b
 INFO: Slab 0xea0008559400 objects=18 used=18 fp=0x  (null) flags=0x40004080
 INFO: Object 0x8802156506d0 @offset=1744 fp=0x880215650a38
 
 Bytes b4 8802156506c0: a4 6f fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  .o..
 Object 8802156506d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156506e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156506f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650700: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650710: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650720: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650730: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650740: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650750: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650770: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650780: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650790: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156507a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156507b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156507c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156507d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156507e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156507f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650820: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650830: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650840: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650850: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650860: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650870: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650880: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215650890: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-11-23 Thread Christoph Lameter
On Wed, 23 Nov 2011, Markus Trippelsdorf wrote:

  FIX idr_layer_cache: Marking all objects used

 Yesterday I couldn't reproduce the issue at all. But today I've hit
 exactly the same spot again. (CCing the drm list)

Well, this looks like a write after free.

 =
 BUG idr_layer_cache: Poison overwritten
 -
 Object 8802156487c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156487d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156487e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 8802156487f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215648800: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Object 880215648810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

And it's an integer-sized write of 0. If you look at the struct definition
and look up the offset, you should be able to locate the field that
was modified.