On Mon, Sep 08, 2014 at 01:56:55PM -0400, Sasha Levin wrote:
> On 09/08/2014 01:18 PM, Mel Gorman wrote:
> > A worse possibility is that somehow the lock is getting corrupted but
> > that's also a tough sell considering that the locks should be allocated
> > from a dedicated cache. I guess I could try breaking that to allocate
> > one page per lock so DEBUG_PAGEALLOC triggers but I'm not very
> > optimistic.
> 
> I did see ptl corruption a couple of days ago:
> 
>       https://lkml.org/lkml/2014/9/4/599
> 
> Could this be related?
> 

Possibly, although the likely explanation then would be that there is
just general corruption coming from somewhere. Even using your config
and applying a patch to make linux-next boot (already in Tejun's tree),
I was unable to reproduce the problem after running for several hours. I
had to run trinity on tmpfs, as ext4 and xfs blew up almost immediately,
so I have a few questions.

1. What filesystem are you using?

2. What compiler are you using, in case it's an experimental one? I ask
   because I think I saw a patch from you adding support so that the
   kernel would build with gcc 5.

3. Does your hardware support TSX or anything similarly funky that would
   potentially affect locking?

4. How many sockets are on your test machine, in case reproducing it
   depends on a machine large enough to open a timing race?

As I'm drawing a blank on what would trigger the bug, I'm hoping I can
reproduce this locally and experiment a bit.

Thanks.

-- 
Mel Gorman
SUSE Labs