On 11.03.21 19:30, Minchan Kim wrote:
Currently, debugging CMA allocation failures is quite limited.
The most commong source of these failures seems to be page

s/commong/common/

migration which doesn't provide any useful information on the
reason of the failure by itself. alloc_contig_range can report
those failures as it holds a list of migrate-failed pages.

page refcount, mapcount with page flags on dump_page are
helpful information to deduce the culprit. Furthermore,
dump_page_owner was super helpful to find long term pinner
who initiated the page allocation.

Maybe simply "The information logged by dump_page() has already proven helpful for debugging allocation issues, like identifying long-term pinnings on ZONE_MOVABLE or MIGRATE_CMA."


The reason it approach with dynamic debug is the debug message
could emit lots of noises as alloc_contig_range calls more
frequently since it's a best effort allocator.

"Let's use the dynamic debugging infrastructure, such that we avoid
flooding the logs and creating a lot of noise on frequent alloc_contig_range() calls. This information is helpful for debugging only."

>>>


There are two ifdefery conditions to support common dyndbg options:

- CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
It aims for supporting the feature with only specific file
with adding ccflags.

- CONFIG_DYNAMIC_DEBUG
It aims for supporting the feature with system wide globally.

A simple example to enable the feature:

Admin could enable the dump like this(by default, disabled)

        echo "func alloc_contig_dump_pages +p" > control

Admin could disable it.

        echo "func alloc_contig_dump_pages =_" > control

Detail goes Documentation/admin-guide/dynamic-debug-howto.rst

<<< I'd drop that completely and only mention:

"For details on dynamic debugging, see Documentation/admin-guide/dynamic-debug-howto.rst."

As you have usage in the code itself, I think you don't have to be repetitive. The ifdeffery seems to be common (e.g., include/linux/netdevice.) for dynamic debugging users, so I don't see the need to describe that in detail.

>>>


A concern is utility functions in dump_page uses inconsistent
loglevels.

__dump_page: KERN_WARNING
__dump_page_owner: KERN_ALERT
         stack_trace_print: KERN_DEFAULT

There are bunch of places to use the inconsistent loglevel
utility functions(e.g., just grep dump_page/strace_trace_print).
It's unfortunate but here we are. It could be addressed
different patchset.

<<< I'd drop that completely and mention

"In the future, we might want to make the loglevels used inside dump_page() consistent and eventually rework the way we log the information here. See [1]"

Where [1] is a link to the discussion.


Signed-off-by: Minchan Kim <minc...@kernel.org>
---
* from v3 - 
https://lore.kernel.org/linux-mm/20210310180104.517886-1-minc...@kernel.org
   * add dyndgb usage comment - akpm
   * use dumpstack instead of warn_on - david

* from v2 - 
https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minc...@kernel.org/
   * remove ratelimit - mhocko

* from v1 - 
https://lore.kernel.org/linux-mm/20210217163603.429062-1-minc...@kernel.org/
   * use dynamic debugging with system wide instead of per-call site - mhocko

  mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
  1 file changed, 31 insertions(+)


Minor nits:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..76fc202cb105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
                                pageblock_nr_pages));
  }
+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+       (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+/*
+ * usage)

"usage)" looks wrong here. Did you mean "Usage:"

+ *     dyndbg_dir="/sys/kernel/debug/dynamic_debug"
+ * To enable:
+ *     echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
+ * To disable:
+ *     echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
+ * For detail, read dynamic-debug-howto.rst

Maybe simply

"See admin-guide/dynamic-debug-howto.rst"

+ */
+static void alloc_contig_dump_pages(struct list_head *page_list)
+{
+       DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+                       "migrate failure");

You can fit that into a single line.

+
+       if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
+               struct page *page;
+
+               dump_stack();
+               list_for_each_entry(page, page_list, lru)
+                       dump_page(page, "migration failure");
+       }
+}
+#else
+static inline void alloc_contig_dump_pages(struct list_head *page_list)
+{
+}
+#endif
+
  /* [start, end) must belong to a single zone. */
  static int __alloc_contig_migrate_range(struct compact_control *cc,
                                        unsigned long start, unsigned long end)
@@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
                                NULL, (unsigned long)&mtc, cc->mode, 
MR_CONTIG_RANGE);
        }
        if (ret < 0) {
+               alloc_contig_dump_pages(&cc->migratepages);
                putback_movable_pages(&cc->migratepages);
                return ret;
        }


As I said, for my taste good enough for now. I would certainly preferred what Michal suggested (e.g., doing it via debug loglevels), but this gets the job done and is not too ugly.

Reviewed-by: David Hildenbrand <da...@redhat.com>

--
Thanks,

David / dhildenb

Reply via email to