Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Linus Torvalds
Hi,

On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner  wrote:
> On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
>> Thanks Dave,
>>
>> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
>> modules loaded (virtio block) so there's something else going on in the
>> current merge window.  I'll keep an eye on it and make sure there's
>> nothing iSCSI needs fixing for.
>
> OK, so before this slips through the cracks.
>
> Linus - your tree as of a few minutes ago still panics immediately
> when starting xfstests on iscsi devices. It appears to be a
> scatterlist corruption and not an iscsi problem, so the iscsi guys
> seem to have bounced it and no-one is looking at it.

Hmm. There's not much to go by.

Can somebody in iscsi-land please try to just bisect it - I'm not
seeing a lot of clues to where this comes from otherwise.

There clearly hasn't been anything done to drivers/scsi/*iscsi* since
4.9, so it's coming from elsewhere, presumably the block layer changes
as you say.

But I think we need iscsi people to pinpoint it a bit more, since
nobody else seems to be complaining.

Looking around a bit, the only even halfway suspicious scatterlist
initialization thing I see is commit f9d03f96b988 ("block: improve
handling of the magic discard payload") which used to have a magic
hack wrt !bio->bi_vcnt, and that got removed. See __blk_bios_map_sg(),
now it does __blk_bvec_map_sg() instead.

Maybe __blk_bvec_map_sg() would want that same

if (!bio->bi_vcnt)
return 0;

adding Christoph explicitly so that he can tell me why I'm a clueless
weenie. Although I suspect he's already on the scsi and block lists
and would have done so anyway.

> I'm disappearing for several months at the end of tomorrow, so I
> thought I better make sure you know about it.  I've also added
> linux-scsi, linux-block to the cc list

Thanks,

Linus

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Linus Torvalds
On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner  wrote:
>
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all.

By "in the dev cycle", do you mean your XFS changes, or have you been
tracking the merge cycle at least for some testing?

> I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
>
> [628867.607417] [ cut here ]
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> shadow_lru_isolate+0x171/0x220

Well, part of the changes during the merge window were the shadow
entry tracking changes that came in through Andrew's tree. Adding
Johannes Weiner to the participants.

> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

Well, I suspect that anything that creates memory pressure will end up
triggering the working set code, so ..

That said, obviously memory corruption could be involved and result in
random issues too, but I wouldn't really expect that in this code.

It would probably be really useful to get more data points - is the
problem reliably in this area, or is it going to be random and all
over the place.

That said:

> And worse, on that last error, the /host/ is now going into meltdown
> (running 4.7.5) with 32 CPUs all burning down in ACPI code:

The obvious question here is how much you trust the environment if the
host ends up also showing problems. Maybe you do end up having hw
issues pop up too.

The primary suspect would presumably be the development kernel you're
testing triggering something, but it has to be asked..

 Linus

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2017-01-04 Thread Linus Torvalds
On Fri, Dec 23, 2016 at 2:00 AM, Christoph Hellwig  wrote:
>
> From: Christoph Hellwig 
> Date: Fri, 23 Dec 2016 10:57:06 +0100
> Subject: virtio_blk: avoid DMA to stack for the sense buffer
>
> Most users of BLOCK_PC requests allocate the sense buffer on the stack,
> so to avoid DMA to the stack copy them to a field in the heap allocated
> virtblk_req structure.  Without that any attempt at SCSI passthrough I/O,
> including the SG_IO ioctl from userspace will crash the kernel.  Note that
> this includes running tools like hdparm even when the host does not have
> SCSI passthrough enabled.

Ugh. This patch is nasty.

I think we should just fix blk_execute_rq() instead.

But from a quick look, we also have at least sg_scsi_ioctl() and
sg_io() doing the same thing.

And the SG_IO thing in bsg_ioctl(). And spi_execute() in scsi_transport_spi.c

And resp_requests() in scsi_debug.c.

So I guess ugly it may need to be, and the rule is that the sense
buffer really can be on the stack and you can't DMA to/from it.
Comments from others?

Linus

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2017-01-07 Thread Linus Torvalds
On Sat, Jan 7, 2017 at 6:02 PM, Johannes Weiner  wrote:
>
> Linus? Andrew?

Looks fine to me. Will apply.

   Linus

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.