Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Marek Vasut
On 01/31/2017 02:17 PM, Greg Kroah-Hartman wrote:
> On Tue, Jan 31, 2017 at 01:58:14PM +0100, Christoph Hellwig wrote:
>> On Tue, Jan 31, 2017 at 11:21:02AM +0100, Greg Kroah-Hartman wrote:
>>>
>>> -next isn't Linus's tree, sometimes stuff sits in there for years :)
>>>
>>> Anyway, if this is a configfs issue, Christoph and Joel can take a look
>>> at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
>>> your friend...)
>>
>> It's really a mismatched assumption.  The configfs binary file
>> code just chunks updates up into a buffer, which only gets flushed
>> at ->release time.  If we'd move that to ->flush the issue Marek reports
>> would be fixed.
>>
>> But I don't think we want that - triggering a filp_open from the update
>> of a _binary_ attribute for a start is wrong.  And second doing this
>> using ->fs of a random calling process is bound to cause problems.
>>
>> I think he is using the wrong kind of interface for the job.
> 
> Ah, that's why no one has seen this before :)
> 
> So, the DT overlay code needs to be fixed...

Well I ran into the issue when I loaded DTO using 'cat' which bound a
driver which required firmware and the firmware loader uses the
filp_open() to load the firmware file from the FS. This crashed my
kernel because the current->fs was NULL. The example I provided is a
stripped down version which checks the current->fs directly to make
things simpler.

I think the issue is in the firmware loader (or filp_open() itself?) .
Shouldn't the filp_open() somehow assure that the current->fs is valid
if it is used within instead of triggering a NULL ptr dereference when
called from ie. the configfs binary attribute write callback ?

-- 
Best regards,
Marek Vasut


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Marek Vasut
On 01/31/2017 02:17 PM, Greg Kroah-Hartman wrote:
> On Tue, Jan 31, 2017 at 01:58:14PM +0100, Christoph Hellwig wrote:
>> On Tue, Jan 31, 2017 at 11:21:02AM +0100, Greg Kroah-Hartman wrote:
>>>
>>> -next isn't Linus's tree, sometimes stuff sits in there for years :)
>>>
>>> Anyway, if this is a configfs issue, Christoph and Joel can take a look
>>> at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
>>> your friend...)
>>
>> It's really a mismatched assumption.  The configfs binary file
>> code just chunks updates up into a buffer, which only gets flushed
>> at ->release time.  If we'd move that to ->flush the issue Marek reports
>> would be fixed.
>>
>> But I don't think we want that - triggering a filp_open from the update
>> of a _binary_ attribute for a start is wrong.  And second doing this
>> using ->fs of a random calling process is bound to cause problems.
>>
>> I think he is using the wrong kind of interface for the job.
> 
> Ah, that's why no one has seen this before :)
> 
> So, the DT overlay code needs to be fixed...

Well I ran into the issue when I loaded DTO using 'cat' which bound a
driver which required firmware and the firmware loader uses the
filp_open() to load the firmware file from the FS. This crashed my
kernel because the current->fs was NULL. The example I provided is a
stripped down version which checks the current->fs directly to make
things simpler.

I think the issue is in the firmware loader (or filp_open() itself?) .
Shouldn't the filp_open() somehow assure that the current->fs is valid
if it is used within instead of triggering a NULL ptr dereference when
called from ie. the configfs binary attribute write callback ?

-- 
Best regards,
Marek Vasut


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 01:58:14PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 31, 2017 at 11:21:02AM +0100, Greg Kroah-Hartman wrote:
> > 
> > -next isn't Linus's tree, sometimes stuff sits in there for years :)
> > 
> > Anyway, if this is a configfs issue, Christoph and Joel can take a look
> > at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
> > your friend...)
> 
> It's really a mismatched assumption.  The configfs binary file
> code just chunks updates up into a buffer, which only gets flushed
> at ->release time.  If we'd move that to ->flush the issue Marek reports
> would be fixed.
> 
> But I don't think we want that - triggering a filp_open from the update
> of a _binary_ attribute for a start is wrong.  And second doing this
> using ->fs of a random calling process is bound to cause problems.
> 
> I think he is using the wrong kind of interface for the job.

Ah, that's why no one has seen this before :)

So, the DT overlay code needs to be fixed...

thanks,

greg k-h


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 01:58:14PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 31, 2017 at 11:21:02AM +0100, Greg Kroah-Hartman wrote:
> > 
> > -next isn't Linus's tree, sometimes stuff sits in there for years :)
> > 
> > Anyway, if this is a configfs issue, Christoph and Joel can take a look
> > at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
> > your friend...)
> 
> It's really a mismatched assumption.  The configfs binary file
> code just chunks updates up into a buffer, which only gets flushed
> at ->release time.  If we'd move that to ->flush the issue Marek reports
> would be fixed.
> 
> But I don't think we want that - triggering a filp_open from the update
> of a _binary_ attribute for a start is wrong.  And second doing this
> using ->fs of a random calling process is bound to cause problems.
> 
> I think he is using the wrong kind of interface for the job.

Ah, that's why no one has seen this before :)

So, the DT overlay code needs to be fixed...

thanks,

greg k-h


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Christoph Hellwig
On Tue, Jan 31, 2017 at 11:21:02AM +0100, Greg Kroah-Hartman wrote:
> 
> -next isn't Linus's tree, sometimes stuff sits in there for years :)
> 
> Anyway, if this is a configfs issue, Christoph and Joel can take a look
> at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
> your friend...)

It's really a mismatched assumption.  The configfs binary file
code just chunks updates up into a buffer, which only gets flushed
at ->release time.  If we'd move that to ->flush the issue Marek reports
would be fixed.

But I don't think we want that - triggering a filp_open from the update
of a _binary_ attribute for a start is wrong.  And second doing this
using ->fs of a random calling process is bound to cause problems.

I think he is using the wrong kind of interface for the job.


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Christoph Hellwig
On Tue, Jan 31, 2017 at 11:21:02AM +0100, Greg Kroah-Hartman wrote:
> 
> -next isn't Linus's tree, sometimes stuff sits in there for years :)
> 
> Anyway, if this is a configfs issue, Christoph and Joel can take a look
> at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
> your friend...)

It's really a mismatched assumption.  The configfs binary file
code just chunks updates up into a buffer, which only gets flushed
at ->release time.  If we'd move that to ->flush the issue Marek reports
would be fixed.

But I don't think we want that - triggering a filp_open from the update
of a _binary_ attribute for a start is wrong.  And second doing this
using ->fs of a random calling process is bound to cause problems.

I think he is using the wrong kind of interface for the job.


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 11:08:05AM +0100, Marek Vasut wrote:
> On 01/31/2017 08:05 AM, Greg Kroah-Hartman wrote:
> > On Tue, Jan 31, 2017 at 06:29:36AM +0100, Marek Vasut wrote:
> >> +CC Greg, LKML as I don't quite know where this should go.
> > 
> > You do know about linux-fsdevel, right?
> 
> No, wasn't aware of it, sorry.
> 
> >> On 01/18/2017 12:16 AM, Marek Vasut wrote:
> >>> I believe there is a possible race condition when configfs attributes
> >>> trigger filp_open() from the kernel. I initially observed the problem
> >>> on Linux 4.4 when loading DT overlay , which in turn loads a driver
> >>> which loads firmware. After some further investigation, I came up with
> >>> the following minimal-ish example patch, which can trigger the same
> >>> behavior on Linux 4.10-rc4 (next 20170117).
> > 
> > What in-kernel code causes this problem?  I didn't think DT overlays
> > were a feature in 4.4, are you running with code that isn't in the
> > normal releases?
> 
> No, it happens in -next as well. I believe if write into configfs binary
> attribute triggers filp_open(), the kernel will crash.

Any specific configfs binary file in-tree that this happens to?

> >>> The core of the demo is in cfs_over_item_dtbo_write(), which just checks
> >>> for valid current->fs . This function is triggered by writing data into
> >>> configfs binary attribute, ie.:
> > 
> > Why are you caring about current->fs?
> 
> Because that is what's NULL and is referenced (in set_root_rcu()) when
> the configfs binary attribute is written and triggers filp_open() .
> 
> >>> $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> >>> $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> >>>
> >>> I believe the 'cat' program exits quickly and thus calls fs_exit()
> >>> before the cfs_over_item_dtbo_write() is called.
> > 
> > How can exit be called before write?
> 
> I believe the exit happens after write, but this function
> cfs_over_item_dtbo_write() is entered only after the fs_exit().
> 
> >>> Any attempts to
> >>> access FS (like ie. loading firmware from FS) from that function will
> >>> therefore fail (by crashing the kernel, NULL pointer dereference in
> >>> set_root_rcu() in fs/namei.c).
> >>>
> >>> On the other hand, replacing 'cat' with 'dd' yields different result:
> >>>
> >>> $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
> >>>
> >>> The kernel does not crash. I believe this is because dd takes slightly
> >>> longer to complete, so the cfs_over_item_dtbo_write() can complete
> >>> before the dd process gets to calling fs_exit() and so the filesystem
> >>> access is still available, thus current->fs is valid.
> > 
> > cat and dd act differently, if you strace them, it should show the
> > differences, perhaps you can narrow it down there?
> 
> I can try.
> 
> >>> Note that when using DT overlays (whose configfs interface is not yet
> >>> mainline),
> > 
> > Ah, we can't do anything about code that is not merged, perhaps it is
> > just buggy? :)
> 
> The configfs stuff is in -next , how is it not merged ? The code below
> is an example that triggers the problem.

-next isn't Linus's tree, sometimes stuff sits in there for years :)

Anyway, if this is a configfs issue, Christoph and Joel can take a look
at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
your friend...)

thanks,

greg k-h


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 11:08:05AM +0100, Marek Vasut wrote:
> On 01/31/2017 08:05 AM, Greg Kroah-Hartman wrote:
> > On Tue, Jan 31, 2017 at 06:29:36AM +0100, Marek Vasut wrote:
> >> +CC Greg, LKML as I don't quite know where this should go.
> > 
> > You do know about linux-fsdevel, right?
> 
> No, wasn't aware of it, sorry.
> 
> >> On 01/18/2017 12:16 AM, Marek Vasut wrote:
> >>> I believe there is a possible race condition when configfs attributes
> >>> trigger filp_open() from the kernel. I initially observed the problem
> >>> on Linux 4.4 when loading DT overlay , which in turn loads a driver
> >>> which loads firmware. After some further investigation, I came up with
> >>> the following minimal-ish example patch, which can trigger the same
> >>> behavior on Linux 4.10-rc4 (next 20170117).
> > 
> > What in-kernel code causes this problem?  I didn't think DT overlays
> > were a feature in 4.4, are you running with code that isn't in the
> > normal releases?
> 
> No, it happens in -next as well. I believe if write into configfs binary
> attribute triggers filp_open(), the kernel will crash.

Any specific configfs binary file in-tree that this happens to?

> >>> The core of the demo is in cfs_over_item_dtbo_write(), which just checks
> >>> for valid current->fs . This function is triggered by writing data into
> >>> configfs binary attribute, ie.:
> > 
> > Why are you caring about current->fs?
> 
> Because that is what's NULL and is referenced (in set_root_rcu()) when
> the configfs binary attribute is written and triggers filp_open() .
> 
> >>> $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> >>> $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> >>>
> >>> I believe the 'cat' program exits quickly and thus calls fs_exit()
> >>> before the cfs_over_item_dtbo_write() is called.
> > 
> > How can exit be called before write?
> 
> I believe the exit happens after write, but this function
> cfs_over_item_dtbo_write() is entered only after the fs_exit().
> 
> >>> Any attempts to
> >>> access FS (like ie. loading firmware from FS) from that function will
> >>> therefore fail (by crashing the kernel, NULL pointer dereference in
> >>> set_root_rcu() in fs/namei.c).
> >>>
> >>> On the other hand, replacing 'cat' with 'dd' yields different result:
> >>>
> >>> $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
> >>>
> >>> The kernel does not crash. I believe this is because dd takes slightly
> >>> longer to complete, so the cfs_over_item_dtbo_write() can complete
> >>> before the dd process gets to calling fs_exit() and so the filesystem
> >>> access is still available, thus current->fs is valid.
> > 
> > cat and dd act differently, if you strace them, it should show the
> > differences, perhaps you can narrow it down there?
> 
> I can try.
> 
> >>> Note that when using DT overlays (whose configfs interface is not yet
> >>> mainline),
> > 
> > Ah, we can't do anything about code that is not merged, perhaps it is
> > just buggy? :)
> 
> The configfs stuff is in -next , how is it not merged ? The code below
> is an example that triggers the problem.

-next isn't Linus's tree, sometimes stuff sits in there for years :)

Anyway, if this is a configfs issue, Christoph and Joel can take a look
at it.  Any reason you didn't cc: Joel as well (the MAINTAINERS file is
your friend...)

thanks,

greg k-h


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Marek Vasut
On 01/31/2017 08:05 AM, Greg Kroah-Hartman wrote:
> On Tue, Jan 31, 2017 at 06:29:36AM +0100, Marek Vasut wrote:
>> +CC Greg, LKML as I don't quite know where this should go.
> 
> You do know about linux-fsdevel, right?

No, wasn't aware of it, sorry.

>> On 01/18/2017 12:16 AM, Marek Vasut wrote:
>>> I believe there is a possible race condition when configfs attributes
>>> trigger filp_open() from the kernel. I initially observed the problem
>>> on Linux 4.4 when loading DT overlay , which in turn loads a driver
>>> which loads firmware. After some further investigation, I came up with
>>> the following minimal-ish example patch, which can trigger the same
>>> behavior on Linux 4.10-rc4 (next 20170117).
> 
> What in-kernel code causes this problem?  I didn't think DT overlays
> were a feature in 4.4, are you running with code that isn't in the
> normal releases?

No, it happens in -next as well. I believe if write into configfs binary
attribute triggers filp_open(), the kernel will crash.

>>> The core of the demo is in cfs_over_item_dtbo_write(), which just checks
>>> for valid current->fs . This function is triggered by writing data into
>>> configfs binary attribute, ie.:
> 
> Why are you caring about current->fs?

Because that is what's NULL and is referenced (in set_root_rcu()) when
the configfs binary attribute is written and triggers filp_open() .

>>> $ mkdir /sys/kernel/config/test/overlays/1/dtbo
>>> $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
>>>
>>> I believe the 'cat' program exits quickly and thus calls fs_exit()
>>> before the cfs_over_item_dtbo_write() is called.
> 
> How can exit be called before write?

I believe the exit happens after write, but this function
cfs_over_item_dtbo_write() is entered only after the fs_exit().

>>> Any attempts to
>>> access FS (like ie. loading firmware from FS) from that function will
>>> therefore fail (by crashing the kernel, NULL pointer dereference in
>>> set_root_rcu() in fs/namei.c).
>>>
>>> On the other hand, replacing 'cat' with 'dd' yields different result:
>>>
>>> $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
>>>
>>> The kernel does not crash. I believe this is because dd takes slightly
>>> longer to complete, so the cfs_over_item_dtbo_write() can complete
>>> before the dd process gets to calling fs_exit() and so the filesystem
>>> access is still available, thus current->fs is valid.
> 
> cat and dd act differently, if you strace them, it should show the
> differences, perhaps you can narrow it down there?

I can try.

>>> Note that when using DT overlays (whose configfs interface is not yet
>>> mainline),
> 
> Ah, we can't do anything about code that is not merged, perhaps it is
> just buggy? :)

The configfs stuff is in -next , how is it not merged ? The code below
is an example that triggers the problem.

>>> there can easily be a device which requires a firmware in
>>> the DT overlay. Such device will invoke firmware load, which uses the
>>> filp_open() and will thus trigger the behavior above. Depending on
>>> whether one uses dd or cat, the kernel will either crash or not.
>>>
>>> Any ideas ?
> 
> I think you need to fix your device tree overlay code...

This is not related to DTO, I only use that to trigger the problem.

-- 
Best regards,
Marek Vasut


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-31 Thread Marek Vasut
On 01/31/2017 08:05 AM, Greg Kroah-Hartman wrote:
> On Tue, Jan 31, 2017 at 06:29:36AM +0100, Marek Vasut wrote:
>> +CC Greg, LKML as I don't quite know where this should go.
> 
> You do know about linux-fsdevel, right?

No, wasn't aware of it, sorry.

>> On 01/18/2017 12:16 AM, Marek Vasut wrote:
>>> I believe there is a possible race condition when configfs attributes
>>> trigger filp_open() from the kernel. I initially observed the problem
>>> on Linux 4.4 when loading DT overlay , which in turn loads a driver
>>> which loads firmware. After some further investigation, I came up with
>>> the following minimal-ish example patch, which can trigger the same
>>> behavior on Linux 4.10-rc4 (next 20170117).
> 
> What in-kernel code causes this problem?  I didn't think DT overlays
> were a feature in 4.4, are you running with code that isn't in the
> normal releases?

No, it happens in -next as well. I believe if write into configfs binary
attribute triggers filp_open(), the kernel will crash.

>>> The core of the demo is in cfs_over_item_dtbo_write(), which just checks
>>> for valid current->fs . This function is triggered by writing data into
>>> configfs binary attribute, ie.:
> 
> Why are you caring about current->fs?

Because that is what's NULL and is referenced (in set_root_rcu()) when
the configfs binary attribute is written and triggers filp_open() .

>>> $ mkdir /sys/kernel/config/test/overlays/1/dtbo
>>> $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
>>>
>>> I believe the 'cat' program exits quickly and thus calls fs_exit()
>>> before the cfs_over_item_dtbo_write() is called.
> 
> How can exit be called before write?

I believe the exit happens after write, but this function
cfs_over_item_dtbo_write() is entered only after the fs_exit().

>>> Any attempts to
>>> access FS (like ie. loading firmware from FS) from that function will
>>> therefore fail (by crashing the kernel, NULL pointer dereference in
>>> set_root_rcu() in fs/namei.c).
>>>
>>> On the other hand, replacing 'cat' with 'dd' yields different result:
>>>
>>> $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
>>>
>>> The kernel does not crash. I believe this is because dd takes slightly
>>> longer to complete, so the cfs_over_item_dtbo_write() can complete
>>> before the dd process gets to calling fs_exit() and so the filesystem
>>> access is still available, thus current->fs is valid.
> 
> cat and dd act differently, if you strace them, it should show the
> differences, perhaps you can narrow it down there?

I can try.

>>> Note that when using DT overlays (whose configfs interface is not yet
>>> mainline),
> 
> Ah, we can't do anything about code that is not merged, perhaps it is
> just buggy? :)

The configfs stuff is in -next , how is it not merged ? The code below
is an example that triggers the problem.

>>> there can easily be a device which requires a firmware in
>>> the DT overlay. Such device will invoke firmware load, which uses the
>>> filp_open() and will thus trigger the behavior above. Depending on
>>> whether one uses dd or cat, the kernel will either crash or not.
>>>
>>> Any ideas ?
> 
> I think you need to fix your device tree overlay code...

This is not related to DTO, I only use that to trigger the problem.

-- 
Best regards,
Marek Vasut


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-30 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 06:29:36AM +0100, Marek Vasut wrote:
> +CC Greg, LKML as I don't quite know where this should go.

You do know about linux-fsdevel, right?

> On 01/18/2017 12:16 AM, Marek Vasut wrote:
> > I believe there is a possible race condition when configfs attributes
> > trigger filp_open() from the kernel. I initially observed the problem
> > on Linux 4.4 when loading DT overlay , which in turn loads a driver
> > which loads firmware. After some further investigation, I came up with
> > the following minimal-ish example patch, which can trigger the same
> > behavior on Linux 4.10-rc4 (next 20170117).

What in-kernel code causes this problem?  I didn't think DT overlays
were a feature in 4.4, are you running with code that isn't in the
normal releases?

> > The core of the demo is in cfs_over_item_dtbo_write(), which just checks
> > for valid current->fs . This function is triggered by writing data into
> > configfs binary attribute, ie.:

Why are you caring about current->fs?

> > $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> > $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> > 
> > I believe the 'cat' program exits quickly and thus calls fs_exit()
> > before the cfs_over_item_dtbo_write() is called.

How can exit be called before write?

> > Any attempts to
> > access FS (like ie. loading firmware from FS) from that function will
> > therefore fail (by crashing the kernel, NULL pointer dereference in
> > set_root_rcu() in fs/namei.c).
> > 
> > On the other hand, replacing 'cat' with 'dd' yields different result:
> > 
> > $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
> > 
> > The kernel does not crash. I believe this is because dd takes slightly
> > longer to complete, so the cfs_over_item_dtbo_write() can complete
> > before the dd process gets to calling fs_exit() and so the filesystem
> > access is still available, thus current->fs is valid.

cat and dd act differently, if you strace them, it should show the
differences, perhaps you can narrow it down there?

> > Note that when using DT overlays (whose configfs interface is not yet
> > mainline),

Ah, we can't do anything about code that is not merged, perhaps it is
just buggy? :)

> > there can easily be a device which requires a firmware in
> > the DT overlay. Such device will invoke firmware load, which uses the
> > filp_open() and will thus trigger the behavior above. Depending on
> > whether one uses dd or cat, the kernel will either crash or not.
> > 
> > Any ideas ?

I think you need to fix your device tree overlay code...

thanks,

greg k-h


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-30 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 06:29:36AM +0100, Marek Vasut wrote:
> +CC Greg, LKML as I don't quite know where this should go.

You do know about linux-fsdevel, right?

> On 01/18/2017 12:16 AM, Marek Vasut wrote:
> > I believe there is a possible race condition when configfs attributes
> > trigger filp_open() from the kernel. I initially observed the problem
> > on Linux 4.4 when loading DT overlay , which in turn loads a driver
> > which loads firmware. After some further investigation, I came up with
> > the following minimal-ish example patch, which can trigger the same
> > behavior on Linux 4.10-rc4 (next 20170117).

What in-kernel code causes this problem?  I didn't think DT overlays
were a feature in 4.4, are you running with code that isn't in the
normal releases?

> > The core of the demo is in cfs_over_item_dtbo_write(), which just checks
> > for valid current->fs . This function is triggered by writing data into
> > configfs binary attribute, ie.:

Why are you caring about current->fs?

> > $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> > $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> > 
> > I believe the 'cat' program exits quickly and thus calls fs_exit()
> > before the cfs_over_item_dtbo_write() is called.

How can exit be called before write?

> > Any attempts to
> > access FS (like ie. loading firmware from FS) from that function will
> > therefore fail (by crashing the kernel, NULL pointer dereference in
> > set_root_rcu() in fs/namei.c).
> > 
> > On the other hand, replacing 'cat' with 'dd' yields different result:
> > 
> > $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
> > 
> > The kernel does not crash. I believe this is because dd takes slightly
> > longer to complete, so the cfs_over_item_dtbo_write() can complete
> > before the dd process gets to calling fs_exit() and so the filesystem
> > access is still available, thus current->fs is valid.

cat and dd act differently, if you strace them, it should show the
differences, perhaps you can narrow it down there?

> > Note that when using DT overlays (whose configfs interface is not yet
> > mainline),

Ah, we can't do anything about code that is not merged, perhaps it is
just buggy? :)

> > there can easily be a device which requires a firmware in
> > the DT overlay. Such device will invoke firmware load, which uses the
> > filp_open() and will thus trigger the behavior above. Depending on
> > whether one uses dd or cat, the kernel will either crash or not.
> > 
> > Any ideas ?

I think you need to fix your device tree overlay code...

thanks,

greg k-h


Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-30 Thread Marek Vasut
+CC Greg, LKML as I don't quite know where this should go.

On 01/18/2017 12:16 AM, Marek Vasut wrote:
> I believe there is a possible race condition when configfs attributes
> trigger filp_open() from the kernel. I initially observed the problem
> on Linux 4.4 when loading DT overlay , which in turn loads a driver
> which loads firmware. After some further investigation, I came up with
> the following minimal-ish example patch, which can trigger the same
> behavior on Linux 4.10-rc4 (next 20170117).
> 
> The core of the demo is in cfs_over_item_dtbo_write(), which just checks
> for valid current->fs . This function is triggered by writing data into
> configfs binary attribute, ie.:
> 
> $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> 
> I believe the 'cat' program exits quickly and thus calls fs_exit()
> before the cfs_over_item_dtbo_write() is called. Any attempts to
> access FS (like ie. loading firmware from FS) from that function will
> therefore fail (by crashing the kernel, NULL pointer dereference in
> set_root_rcu() in fs/namei.c).
> 
> On the other hand, replacing 'cat' with 'dd' yields different result:
> 
> $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
> 
> The kernel does not crash. I believe this is because dd takes slightly
> longer to complete, so the cfs_over_item_dtbo_write() can complete
> before the dd process gets to calling fs_exit() and so the filesystem
> access is still available, thus current->fs is valid.
> 
> Note that when using DT overlays (whose configfs interface is not yet
> mainline), there can easily be a device which requires a firmware in
> the DT overlay. Such device will invoke firmware load, which uses the
> filp_open() and will thus trigger the behavior above. Depending on
> whether one uses dd or cat, the kernel will either crash or not.
> 
> Any ideas ?
> 
> Signed-off-by: Marek Vasut 
> Cc: Pantelis Antoniou 
> ---
> NOTE: I don't usually dig in these areas of the kernel, do we have a
>   generic FS list for this sort of discussion ? Thus far, sending
>   it off-list. Maybe the description could also use some fleshing
>   out. Thanks
> ---
>  drivers/of/Makefile   |   2 +-
>  drivers/of/configfs.c | 149 
> ++
>  2 files changed, 150 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/of/configfs.c
> 
> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> index d7efd9d458aa..a1698bf819b5 100644
> --- a/drivers/of/Makefile
> +++ b/drivers/of/Makefile
> @@ -1,4 +1,4 @@
> -obj-y = base.o device.o platform.o
> +obj-y = base.o device.o platform.o configfs.o
>  obj-$(CONFIG_OF_DYNAMIC) += dynamic.o
>  obj-$(CONFIG_OF_FLATTREE) += fdt.o
>  obj-$(CONFIG_OF_EARLY_FLATTREE) += fdt_address.o
> diff --git a/drivers/of/configfs.c b/drivers/of/configfs.c
> new file mode 100644
> index ..407016e22209
> --- /dev/null
> +++ b/drivers/of/configfs.c
> @@ -0,0 +1,149 @@
> +/*
> + * Possible race in filp_open
> + *
> + * Copyright (C) 2017 Marek Vasut 
> + *
> + * test based on Configfs entries for DTOs
> + *
> + * Copyright (C) 2013 - Pantelis Antoniou 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct configfs_attribute *cfs_overlay_attrs[] = {
> + NULL,
> +};
> +
> +ssize_t cfs_overlay_item_dtbo_read(struct config_item *item, void *buf,
> + size_t max_count)
> +{
> + return 0;
> +}
> +
> +ssize_t cfs_overlay_item_dtbo_write(struct config_item *item, const void 
> *buf,
> + size_t count)
> +{
> + WARN_ON(!current->fs);
> + /*
> +  * If anything here triggers filp_open(), kernel may crash.
> +  * The filp_open() is triggered ie. by firmware loading.
> +  *
> +  * 1) The following example will crash because cat exits too quickly
> +  *and fs_exit() is called before this code is executed, so no FS
> +  *access is available anymore.
> +  * $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> +  * $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> +  *
> +  *Using "dmesg" instead of cat file_17201_bytes_long works too.
> +  *Any file length over 16 kiB will work, 17201 Bytes long file is
> +  *what I used:
> +  *
> +  * $ dmesg > /sys/kernel/config/test/overlays/1/dtbo
> +  *
> +  * 2) The follow example will not crash because dd exits slightly slower
> +  *and thus the fs_exit() is called after this code is executed and
> +  *the FS access 

Re: [PATCH] [RFC] fs: Possible filp_open race experiment

2017-01-30 Thread Marek Vasut
+CC Greg, LKML as I don't quite know where this should go.

On 01/18/2017 12:16 AM, Marek Vasut wrote:
> I believe there is a possible race condition when configfs attributes
> trigger filp_open() from the kernel. I initially observed the problem
> on Linux 4.4 when loading DT overlay , which in turn loads a driver
> which loads firmware. After some further investigation, I came up with
> the following minimal-ish example patch, which can trigger the same
> behavior on Linux 4.10-rc4 (next 20170117).
> 
> The core of the demo is in cfs_over_item_dtbo_write(), which just checks
> for valid current->fs . This function is triggered by writing data into
> configfs binary attribute, ie.:
> 
> $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> 
> I believe the 'cat' program exits quickly and thus calls fs_exit()
> before the cfs_over_item_dtbo_write() is called. Any attempts to
> access FS (like ie. loading firmware from FS) from that function will
> therefore fail (by crashing the kernel, NULL pointer dereference in
> set_root_rcu() in fs/namei.c).
> 
> On the other hand, replacing 'cat' with 'dd' yields different result:
> 
> $ dd if=file_17201_bytes_long of=/sys/kernel/config/test/overlays/1/dtbo
> 
> The kernel does not crash. I believe this is because dd takes slightly
> longer to complete, so the cfs_over_item_dtbo_write() can complete
> before the dd process gets to calling fs_exit() and so the filesystem
> access is still available, thus current->fs is valid.
> 
> Note that when using DT overlays (whose configfs interface is not yet
> mainline), there can easily be a device which requires a firmware in
> the DT overlay. Such device will invoke firmware load, which uses the
> filp_open() and will thus trigger the behavior above. Depending on
> whether one uses dd or cat, the kernel will either crash or not.
> 
> Any ideas ?
> 
> Signed-off-by: Marek Vasut 
> Cc: Pantelis Antoniou 
> ---
> NOTE: I don't usually dig in these areas of the kernel, do we have a
>   generic FS list for this sort of discussion ? Thus far, sending
>   it off-list. Maybe the description could also use some fleshing
>   out. Thanks
> ---
>  drivers/of/Makefile   |   2 +-
>  drivers/of/configfs.c | 149 
> ++
>  2 files changed, 150 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/of/configfs.c
> 
> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> index d7efd9d458aa..a1698bf819b5 100644
> --- a/drivers/of/Makefile
> +++ b/drivers/of/Makefile
> @@ -1,4 +1,4 @@
> -obj-y = base.o device.o platform.o
> +obj-y = base.o device.o platform.o configfs.o
>  obj-$(CONFIG_OF_DYNAMIC) += dynamic.o
>  obj-$(CONFIG_OF_FLATTREE) += fdt.o
>  obj-$(CONFIG_OF_EARLY_FLATTREE) += fdt_address.o
> diff --git a/drivers/of/configfs.c b/drivers/of/configfs.c
> new file mode 100644
> index ..407016e22209
> --- /dev/null
> +++ b/drivers/of/configfs.c
> @@ -0,0 +1,149 @@
> +/*
> + * Possible race in filp_open
> + *
> + * Copyright (C) 2017 Marek Vasut 
> + *
> + * test based on Configfs entries for DTOs
> + *
> + * Copyright (C) 2013 - Pantelis Antoniou 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct configfs_attribute *cfs_overlay_attrs[] = {
> + NULL,
> +};
> +
> +ssize_t cfs_overlay_item_dtbo_read(struct config_item *item, void *buf,
> + size_t max_count)
> +{
> + return 0;
> +}
> +
> +ssize_t cfs_overlay_item_dtbo_write(struct config_item *item, const void 
> *buf,
> + size_t count)
> +{
> + WARN_ON(!current->fs);
> + /*
> +  * If anything here triggers filp_open(), kernel may crash.
> +  * The filp_open() is triggered ie. by firmware loading.
> +  *
> +  * 1) The following example will crash because cat exits too quickly
> +  *and fs_exit() is called before this code is executed, so no FS
> +  *access is available anymore.
> +  * $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> +  * $ cat file_17201_bytes_long > /sys/kernel/config/test/overlays/1/dtbo
> +  *
> +  *Using "dmesg" instead of cat file_17201_bytes_long works too.
> +  *Any file length over 16 kiB will work, 17201 Bytes long file is
> +  *what I used:
> +  *
> +  * $ dmesg > /sys/kernel/config/test/overlays/1/dtbo
> +  *
> +  * 2) The follow example will not crash because dd exits slightly slower
> +  *and thus the fs_exit() is called after this code is executed and
> +  *the FS access is still available.
> +  * $ mkdir /sys/kernel/config/test/overlays/1/dtbo
> +  * $