Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-18 Thread Michael Ellerman
Rob Herring  writes:
> On Tue, Dec 18, 2018 at 2:33 PM Frank Rowand  wrote:
>>
>> On 12/18/18 12:09 PM, Frank Rowand wrote:
>> > On 12/18/18 12:01 PM, Rob Herring wrote:
>> >> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand  
>> >> wrote:
>> >>>
>> >>> On 12/17/18 2:52 AM, Michael Ellerman wrote:
>>  Hi Frank,
>> 
>>  frowand.l...@gmail.com writes:
>> > From: Frank Rowand 
>> >
>> > Non-overlay dynamic devicetree node removal may leave the node in
>> > the phandle cache.  Subsequent calls to of_find_node_by_phandle()
>> > will incorrectly find the stale entry.  Remove the node from the
>> > cache.
>> >
>> > Add paranoia checks in of_find_node_by_phandle() as a second level
>> > of defense (do not return cached node if detached, do not add node
>> > to cache if detached).
>> >
>> > Reported-by: Michael Bringmann 
>> > Signed-off-by: Frank Rowand 
>> > ---
>> 
>>  Similarly here can we add:
>> 
>>  Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
>>  of_find_node_by_phandle()")
>> >>>
>> >>> Yes, thanks.
>> >>>
>> >>>
>>  Cc: sta...@vger.kernel.org # v4.17+
>> >>>
>> >>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug
>> >>> fix).  So the bug will not be in stable.
>> >>
>> >> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct.
>> >> Annotating it with 4.17 only saves Greg from trying and then emailing
>> >> us to backport this patch as it wouldn't apply.
>> >
>> > Thanks for the correction.  I was both under-thinking and over-thinking,
>> > ending up with an incorrect answer.
>> >
>> > Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do
>> > you want me to re-spin?
>>
>> Now that my thinking has been straightened out, a little bit more checking
>> for the other pre-requisite patches show:
>>
>>   v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay 
>> apply and remove")
>>   v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with 
>> no phandles")
>>
>> These can be addressed by changing the "Cc:" to ... # v4.19+
>> because stable v4.17.* and v4.18.* are end of life.
>
> EOL shouldn't factor into it. There's always the possibility someone
> else picks any kernel version.

Yeah, there are other stable branches out there, so the tag should point
to the correct version regardless of whether it's currently EOL on
kernel.org.

>> Or the pre-requisites can be listed:
>>
>># v4.17: b9952b5218ad of: overlay: update phandle cache
>># v4.17: e54192b48da7 of: fix phandle cache creation
>># v4.17
>>
>># v4.18: e54192b48da7 of: fix phandle cache creation
>># v4.18
>>
>># v4.19+
>>
>> Do you have a preference?
>
> I think we just list v4.17 and be done with it.

Yep, anyone who wants to backport it can work it out, or ask us.

cheers


Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-18 Thread Rob Herring
On Tue, Dec 18, 2018 at 2:33 PM Frank Rowand  wrote:
>
> On 12/18/18 12:09 PM, Frank Rowand wrote:
> > On 12/18/18 12:01 PM, Rob Herring wrote:
> >> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand  
> >> wrote:
> >>>
> >>> On 12/17/18 2:52 AM, Michael Ellerman wrote:
>  Hi Frank,
> 
>  frowand.l...@gmail.com writes:
> > From: Frank Rowand 
> >
> > Non-overlay dynamic devicetree node removal may leave the node in
> > the phandle cache.  Subsequent calls to of_find_node_by_phandle()
> > will incorrectly find the stale entry.  Remove the node from the
> > cache.
> >
> > Add paranoia checks in of_find_node_by_phandle() as a second level
> > of defense (do not return cached node if detached, do not add node
> > to cache if detached).
> >
> > Reported-by: Michael Bringmann 
> > Signed-off-by: Frank Rowand 
> > ---
> 
>  Similarly here can we add:
> 
>  Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
>  of_find_node_by_phandle()")
> >>>
> >>> Yes, thanks.
> >>>
> >>>
>  Cc: sta...@vger.kernel.org # v4.17+
> >>>
> >>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug
> >>> fix).  So the bug will not be in stable.
> >>
> >> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct.
> >> Annotating it with 4.17 only saves Greg from trying and then emailing
> >> us to backport this patch as it wouldn't apply.
> >
> > Thanks for the correction.  I was both under-thinking and over-thinking,
> > ending up with an incorrect answer.
> >
> > Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do
> > you want me to re-spin?
>
> Now that my thinking has been straightened out, a little bit more checking
> for the other pre-requisite patches show:
>
>   v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay 
> apply and remove")
>   v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with no 
> phandles")
>
> These can be addressed by changing the "Cc:" to ... # v4.19+
> because stable v4.17.* and v4.18.* are end of life.

EOL shouldn't factor into it. There's always the possibility someone
else picks any kernel version.

> Or the pre-requisites can be listed:
>
># v4.17: b9952b5218ad of: overlay: update phandle cache
># v4.17: e54192b48da7 of: fix phandle cache creation
># v4.17
>
># v4.18: e54192b48da7 of: fix phandle cache creation
># v4.18
>
># v4.19+
>
> Do you have a preference?

I think we just list v4.17 and be done with it.

Rob


Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-18 Thread Frank Rowand
On 12/18/18 12:09 PM, Frank Rowand wrote:
> On 12/18/18 12:01 PM, Rob Herring wrote:
>> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand  wrote:
>>>
>>> On 12/17/18 2:52 AM, Michael Ellerman wrote:
 Hi Frank,

 frowand.l...@gmail.com writes:
> From: Frank Rowand 
>
> Non-overlay dynamic devicetree node removal may leave the node in
> the phandle cache.  Subsequent calls to of_find_node_by_phandle()
> will incorrectly find the stale entry.  Remove the node from the
> cache.
>
> Add paranoia checks in of_find_node_by_phandle() as a second level
> of defense (do not return cached node if detached, do not add node
> to cache if detached).
>
> Reported-by: Michael Bringmann 
> Signed-off-by: Frank Rowand 
> ---

 Similarly here can we add:

 Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
 of_find_node_by_phandle()")
>>>
>>> Yes, thanks.
>>>
>>>
 Cc: sta...@vger.kernel.org # v4.17+
>>>
>>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug
>>> fix).  So the bug will not be in stable.
>>
>> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct.
>> Annotating it with 4.17 only saves Greg from trying and then emailing
>> us to backport this patch as it wouldn't apply.
> 
> Thanks for the correction.  I was both under-thinking and over-thinking,
> ending up with an incorrect answer.
> 
> Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do
> you want me to re-spin?

Now that my thinking has been straightened out, a little bit more checking
for the other pre-requisite patches show:

  v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay 
apply and remove")
  v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with no 
phandles")

These can be addressed by changing the "Cc:" to ... # v4.19+
because stable v4.17.* and v4.18.* are end of life.

Or the pre-requisites can be listed:

   # v4.17: b9952b5218ad of: overlay: update phandle cache
   # v4.17: e54192b48da7 of: fix phandle cache creation
   # v4.17

   # v4.18: e54192b48da7 of: fix phandle cache creation
   # v4.18

   # v4.19+

Do you have a preference?

-Frank
> 
> -Frank
> 
>>
>> Rob
>>
> 
> 



Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-18 Thread Frank Rowand
On 12/18/18 12:01 PM, Rob Herring wrote:
> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand  wrote:
>>
>> On 12/17/18 2:52 AM, Michael Ellerman wrote:
>>> Hi Frank,
>>>
>>> frowand.l...@gmail.com writes:
 From: Frank Rowand 

 Non-overlay dynamic devicetree node removal may leave the node in
 the phandle cache.  Subsequent calls to of_find_node_by_phandle()
 will incorrectly find the stale entry.  Remove the node from the
 cache.

 Add paranoia checks in of_find_node_by_phandle() as a second level
 of defense (do not return cached node if detached, do not add node
 to cache if detached).

 Reported-by: Michael Bringmann 
 Signed-off-by: Frank Rowand 
 ---
>>>
>>> Similarly here can we add:
>>>
>>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
>>> of_find_node_by_phandle()")
>>
>> Yes, thanks.
>>
>>
>>> Cc: sta...@vger.kernel.org # v4.17+
>>
>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug
>> fix).  So the bug will not be in stable.
> 
> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct.
> Annotating it with 4.17 only saves Greg from trying and then emailing
> us to backport this patch as it wouldn't apply.

Thanks for the correction.  I was both under-thinking and over-thinking,
ending up with an incorrect answer.

Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do
you want me to re-spin?

-Frank

> 
> Rob
> 



Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-18 Thread Rob Herring
On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand  wrote:
>
> On 12/17/18 2:52 AM, Michael Ellerman wrote:
> > Hi Frank,
> >
> > frowand.l...@gmail.com writes:
> >> From: Frank Rowand 
> >>
> >> Non-overlay dynamic devicetree node removal may leave the node in
> >> the phandle cache.  Subsequent calls to of_find_node_by_phandle()
> >> will incorrectly find the stale entry.  Remove the node from the
> >> cache.
> >>
> >> Add paranoia checks in of_find_node_by_phandle() as a second level
> >> of defense (do not return cached node if detached, do not add node
> >> to cache if detached).
> >>
> >> Reported-by: Michael Bringmann 
> >> Signed-off-by: Frank Rowand 
> >> ---
> >
> > Similarly here can we add:
> >
> > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
> > of_find_node_by_phandle()")
>
> Yes, thanks.
>
>
> > Cc: sta...@vger.kernel.org # v4.17+
>
> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug
> fix).  So the bug will not be in stable.

0b3ce78e90fc landed in v4.17, so Michael's line above is correct.
Annotating it with 4.17 only saves Greg from trying and then emailing
us to backport this patch as it wouldn't apply.

Rob


Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-18 Thread Frank Rowand
On 12/17/18 2:52 AM, Michael Ellerman wrote:
> Hi Frank,
> 
> frowand.l...@gmail.com writes:
>> From: Frank Rowand 
>>
>> Non-overlay dynamic devicetree node removal may leave the node in
>> the phandle cache.  Subsequent calls to of_find_node_by_phandle()
>> will incorrectly find the stale entry.  Remove the node from the
>> cache.
>>
>> Add paranoia checks in of_find_node_by_phandle() as a second level
>> of defense (do not return cached node if detached, do not add node
>> to cache if detached).
>>
>> Reported-by: Michael Bringmann 
>> Signed-off-by: Frank Rowand 
>> ---
> 
> Similarly here can we add:
> 
> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
> of_find_node_by_phandle()")

Yes, thanks.


> Cc: sta...@vger.kernel.org # v4.17+

Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug
fix).  So the bug will not be in stable.

I've debated with myself over this, because there is a possibility that
0b3ce78e90fc could somehow be put into a stable despite not being a
bug fix.  We can always explicitly request this patch series be added
to stable in that case.


> Thanks for doing this series.
> 
> Some minor comments below.
> 
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index 6c33d63361b8..ad71864cecf5 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -162,6 +162,27 @@ int of_free_phandle_cache(void)
>>  late_initcall_sync(of_free_phandle_cache);
>>  #endif
>>  
>> +/*
>> + * Caller must hold devtree_lock.
>> + */
>> +void __of_free_phandle_cache_entry(phandle handle)
>> +{
>> +phandle masked_handle;
>> +
>> +if (!handle)
>> +return;
> 
> We could fold the phandle_cache check into that if and return early for
> both cases couldn't we?

We could, but that would make the reason for checking phandle_cache
less obvious.  I would rather leave that check
> 
>> +masked_handle = handle & phandle_cache_mask;
>> +
>> +if (phandle_cache) {
> 
> Meaning this wouldn't be necessary.
> 
>> +if (phandle_cache[masked_handle] &&
>> +handle == phandle_cache[masked_handle]->phandle) {
>> +of_node_put(phandle_cache[masked_handle]);
>> +phandle_cache[masked_handle] = NULL;
>> +}
> 
> A temporary would help the readability here I think, eg:
> 
>   struct device_node *np;
> np = phandle_cache[masked_handle];
> 
>   if (np && handle == np->phandle) {
>   of_node_put(np);
>   phandle_cache[masked_handle] = NULL;
>   }

Yes, much cleaner.


>> @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle 
>> handle)
>>  if (phandle_cache[masked_handle] &&
>>  handle == phandle_cache[masked_handle]->phandle)
>>  np = phandle_cache[masked_handle];
>> +if (np && of_node_check_flag(np, OF_DETACHED)) {
>> +WARN_ON(1);
>> +of_node_put(np);
>
> Do we really want to do the put here?
> 
> We're here because something has gone wrong, possibly even memory
> corruption such that np is not even pointing at a device node anymore.
> So it seems like it would be safer to just leave the ref count alone,
> possibly leak a small amount of memory, and NULL out the reference.

I like the concept of the code being a little bit paranoid.

But the bug that this check is likely to cache is the bug that led
to this series -- removing a devicetree node, but failing to remove
it from the cache as part of the removal.  So I think I'll leave
it as is.

> 
> 
> cheers
> 

Thanks for the thoughts and suggestions!

-Frank



Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-17 Thread Michael Ellerman
Hi Frank,

frowand.l...@gmail.com writes:
> From: Frank Rowand 
>
> Non-overlay dynamic devicetree node removal may leave the node in
> the phandle cache.  Subsequent calls to of_find_node_by_phandle()
> will incorrectly find the stale entry.  Remove the node from the
> cache.
>
> Add paranoia checks in of_find_node_by_phandle() as a second level
> of defense (do not return cached node if detached, do not add node
> to cache if detached).
>
> Reported-by: Michael Bringmann 
> Signed-off-by: Frank Rowand 
> ---

Similarly here can we add:

Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of 
of_find_node_by_phandle()")
Cc: sta...@vger.kernel.org # v4.17+


Thanks for doing this series.

Some minor comments below.

> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 6c33d63361b8..ad71864cecf5 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -162,6 +162,27 @@ int of_free_phandle_cache(void)
>  late_initcall_sync(of_free_phandle_cache);
>  #endif
>  
> +/*
> + * Caller must hold devtree_lock.
> + */
> +void __of_free_phandle_cache_entry(phandle handle)
> +{
> + phandle masked_handle;
> +
> + if (!handle)
> + return;

We could fold the phandle_cache check into that if and return early for
both cases couldn't we?

> + masked_handle = handle & phandle_cache_mask;
> +
> + if (phandle_cache) {

Meaning this wouldn't be necessary.

> + if (phandle_cache[masked_handle] &&
> + handle == phandle_cache[masked_handle]->phandle) {
> + of_node_put(phandle_cache[masked_handle]);
> + phandle_cache[masked_handle] = NULL;
> + }

A temporary would help the readability here I think, eg:

struct device_node *np;
np = phandle_cache[masked_handle];

if (np && handle == np->phandle) {
of_node_put(np);
phandle_cache[masked_handle] = NULL;
}

> @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle 
> handle)
>   if (phandle_cache[masked_handle] &&
>   handle == phandle_cache[masked_handle]->phandle)
>   np = phandle_cache[masked_handle];
> + if (np && of_node_check_flag(np, OF_DETACHED)) {
> + WARN_ON(1);
> + of_node_put(np);

Do we really want to do the put here?

We're here because something has gone wrong, possibly even memory
corruption such that np is not even pointing at a device node anymore.
So it seems like it would be safer to just leave the ref count alone,
possibly leak a small amount of memory, and NULL out the reference.


cheers