On Wed, Oct 17, 2018 at 4:30 PM Alan Tull <at...@kernel.org> wrote:
>
> On Mon, Oct 15, 2018 at 9:39 PM <frowand.l...@gmail.com> wrote:
>
> Hi Frank,
>
> >
> > From: Frank Rowand <frank.row...@sony.com>
> >
> > Add checks:
> >   - attempted kfree due to refcount reaching zero before overlay
> >     is removed
> >   - properties linked to an overlay node when the node is removed
> >   - node refcount > one during node removal in a changeset destroy,
> >     if the node was created by the changeset
> >
> > After applying this patch, several validation warnings will be
> > reported from the devicetree unittest during boot due to
> > pre-existing devicetree bugs. The warnings will be similar to:
> >
> >   OF: ERROR: of_node_release() overlay node /testcase-data/overlay-node/test-bus/test-unittest11/test-unittest111 contains unexpected properties
> >   OF: ERROR: memory leak - destroy cset entry: attach overlay node /testcase-data-2/substation@100/hvac-medium-2 expected refcount 1 instead of 2.  of_node_get() / of_node_put() are unbalanced for this node.
> >
> > Signed-off-by: Frank Rowand <frank.row...@sony.com>
> > ---
> > Changes since v3:
> >   - Add expected value of refcount for destroy cset entry error.  Also
> >     explain the cause of the error.
> >
> >  drivers/of/dynamic.c | 29 +++++++++++++++++++++++++++++
> >  drivers/of/overlay.c |  1 +
> >  include/linux/of.h   | 15 ++++++++++-----
> >  3 files changed, 40 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > index f4f8ed9b5454..24c97b7a050f 100644
> > --- a/drivers/of/dynamic.c
> > +++ b/drivers/of/dynamic.c
> > @@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj)
> >         if (!of_node_check_flag(node, OF_DYNAMIC))
> >                 return;
> >
> > +       if (of_node_check_flag(node, OF_OVERLAY)) {
> > +
> > +               if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) {
> > +                       /* premature refcount of zero, do not free memory */
> > +                       pr_err("ERROR: memory leak %s() overlay node %pOF before free overlay changeset\n",
> > +                              __func__, node);
> > +                       return;
> > +               }
> > +
> > +               /*
> > +                * If node->properties non-empty then properties were added
> > +                * to this node either by different overlay that has not
> > +                * yet been removed, or by a non-overlay mechanism.
> > +                */
> > +               if (node->properties)
> > +                       pr_err("ERROR: %s() overlay node %pOF contains unexpected properties\n",
> > +                              __func__, node);
> > +       }
> > +
> >         property_list_free(node->properties);
> >         property_list_free(node->deadprops);
> >
> > @@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct device_node *np,
> >
> >  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
> >  {
> > +       if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > +           of_node_check_flag(ce->np, OF_OVERLAY)) {
> > +               if (kref_read(&ce->np->kobj.kref) > 1) {
> > +                       pr_err("ERROR: memory leak - destroy cset entry: attach overlay node %pOF expected refcount 1 instead of %d.  of_node_get() / of_node_put() are unbalanced for this node.\n",
> > +                              ce->np, kref_read(&ce->np->kobj.kref));
>
> Still testing as much as I have time to do.
>
> I'm hitting this error message once when removing an overlay that adds
> several child nodes.  The only node I get the message for was a node
> that added a fixed-clock (the other nodes didn't trigger the error).
> Then even if I edited all the rest of the overlay DTS and removed all
> other child nodes and all references to the clock from other nodes, I
> still got the error.
>
> Removing dtbo: 1-socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dtb
> [   72.032270] OF: ERROR: memory leak - destroy cset entry: attach overlay node /soc/base_fpga_region/clk_0 expected refcount 1 instead of 2.  of_node_get() / of_node_put() are unbalanced for this node.

Update: with some helpful offline debug patches from Frank, I was able
to find the source of the of_node_get()/of_node_put() imbalance.  The
fixed-rate clock driver calls of_clk_add_provider() when probed but
never calls of_clk_del_provider().

This patchset will quite likely uncover other of_node_get()/of_node_put()
imbalances around the kernel.
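
To illustrate the imbalance described above, here is a minimal userspace
sketch.  It is not kernel code: every mock_* name is hypothetical and only
models the reference count.  It assumes that registering a clock provider
takes a reference on the node and that unregistering drops it, so a driver
that never unregisters leaves the count at 2 when the changeset entry is
destroyed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct device_node: only name and refcount. */
struct mock_node {
	const char *name;
	int refcount;
};

/* Model of of_node_get(): take one reference. */
static struct mock_node *mock_node_get(struct mock_node *np)
{
	np->refcount++;
	return np;
}

/* Model of of_node_put(): drop one reference. */
static void mock_node_put(struct mock_node *np)
{
	assert(np->refcount > 0);
	np->refcount--;
}

/* Assumption: adding a provider holds a reference on the node. */
static void mock_clk_add_provider(struct mock_node *np)
{
	mock_node_get(np);
}

/* Assumption: deleting the provider releases that reference. */
static void mock_clk_del_provider(struct mock_node *np)
{
	mock_node_put(np);
}

/*
 * Simulate probe followed by overlay removal and return the refcount
 * seen at removal time.  driver_calls_del selects whether the driver's
 * remove path unregisters the provider.
 */
static int refcount_at_overlay_removal(int driver_calls_del)
{
	/* The changeset that created the node holds the first reference. */
	struct mock_node clk = { "clk_0", 1 };

	mock_clk_add_provider(&clk);		/* driver probe */

	if (driver_calls_del)
		mock_clk_del_provider(&clk);	/* driver remove */

	/* Same condition as the new check: more than one reference left. */
	if (clk.refcount > 1)
		fprintf(stderr, "%s: expected refcount 1 instead of %d\n",
			clk.name, clk.refcount);

	return clk.refcount;
}
```

Calling refcount_at_overlay_removal(0) models the behaviour I saw (the
provider is never deleted), and refcount_at_overlay_removal(1) models a
driver that deletes the provider before the overlay is removed.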

Alan

>
> Here's the very stripped down overlay:
>
> /dts-v1/;
> /plugin/;
> / {
>         fragment@0 {
>                 target-path = "/soc/base_fpga_region";
>                 #address-cells = <1>;
>                 #size-cells = <1>;
>
>                 __overlay__ {
>                         external-fpga-config;
>
>                         #address-cells = <1>;
>                         #size-cells = <1>;
>
>                         clk_0: clk_0 {
>                                 compatible = "fixed-clock";
>                                 #clock-cells = <0>;
>                                 clock-frequency = <100000000>;  /* 100.00 MHz */
>                                 clock-output-names = "clk_0-clk";
>                         };
>                 };
>         };
> };
