Re: [External] Re: [PATCH v11 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-17 Thread Ho-Ren (Jack) Chuang
On Wed, Apr 10, 2024 at 9:51 AM Jonathan Cameron
 wrote:
>
> On Tue, 9 Apr 2024 12:02:31 -0700
> "Ho-Ren (Jack) Chuang"  wrote:
>
> > Hi Jonathan,
> >
> > On Tue, Apr 9, 2024 at 9:12 AM Jonathan Cameron
> >  wrote:
> > >
> > > On Fri, 5 Apr 2024 15:43:47 -0700
> > > "Ho-Ren (Jack) Chuang"  wrote:
> > >
> > > > On Fri, Apr 5, 2024 at 7:03 AM Jonathan Cameron
> > > >  wrote:
> > > > >
> > > > > On Fri,  5 Apr 2024 00:07:06 +
> > > > > "Ho-Ren (Jack) Chuang"  wrote:
> > > > >
> > > > > > The current implementation treats emulated memory devices, such as
> > > > > > CXL1.1 type3 memory, as normal DRAM when they are emulated as 
> > > > > > normal memory
> > > > > > (E820_TYPE_RAM). However, these emulated devices have different
> > > > > > characteristics than traditional DRAM, making it important to
> > > > > > distinguish them. Thus, we modify the tiered memory initialization 
> > > > > > process
> > > > > > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > > > > > ensures that the memory tier initialization for these nodes is 
> > > > > > deferred
> > > > > > until HMAT information is obtained during the boot process. Finally,
> > > > > > demotion tables are recalculated at the end.
> > > > > >
> > > > > > * late_initcall(memory_tier_late_init);
> > > > > > Some device drivers may have initialized memory tiers between
> > > > > > `memory_tier_init()` and `memory_tier_late_init()`, potentially 
> > > > > > bringing
> > > > > > online memory nodes and configuring memory tiers. They should be 
> > > > > > excluded
> > > > > > in the late init.
> > > > > >
> > > > > > * Handle cases where there is no HMAT when creating memory tiers
> > > > > > There is a scenario where a CPUless node does not provide HMAT 
> > > > > > information.
> > > > > > If no HMAT is specified, it falls back to using the default DRAM 
> > > > > > tier.
> > > > > >
> > > > > > * Introduce another new lock `default_dram_perf_lock` for adist 
> > > > > > calculation
> > > > > > > In the current implementation, iterating through CPUless nodes 
> > > > > > requires
> > > > > > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will 
> > > > > > end up
> > > > > > trying to acquire the same lock, leading to a potential deadlock.
> > > > > > Therefore, we propose introducing a standalone 
> > > > > > `default_dram_perf_lock` to
> > > > > > protect `default_dram_perf_*`. This approach not only avoids 
> > > > > > deadlock
> > > > > > but also prevents holding a large lock simultaneously.
> > > > > >
> > > > > > * Upgrade `set_node_memory_tier` to support additional cases, 
> > > > > > including
> > > > > >   default DRAM, late CPUless, and hot-plugged initializations.
> > > > > > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > > > > > `mt_find_alloc_memory_type()` are moved into 
> > > > > > `set_node_memory_tier()` to
> > > > > > handle cases where memtype is not initialized and where HMAT 
> > > > > > information is
> > > > > > available.
> > > > > >
> > > > > > * Introduce `default_memory_types` for those memory types that are 
> > > > > > not
> > > > > >   initialized by device drivers.
> > > > > > Because late initialized memory and default DRAM memory need to be 
> > > > > > managed,
> > > > > > a default memory type is created for storing all memory types that 
> > > > > > are
> > > > > > not initialized by device drivers and as a fallback.
> > > > > >
> > > > > > Signed-off-by: Ho-Ren (Jack) Chuang 
> > > > > > Signed-off-by: Hao Xiang 
> > > > > > Reviewed-by: "Huang, Ying" 
> > > > >
> > > > > Hi - one remaining question. Why can't we delay init for all nodes
>

Re: [External] Re: [PATCH v11 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-09 Thread Ho-Ren (Jack) Chuang
On Tue, Apr 9, 2024 at 7:33 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > On Fri, Apr 5, 2024 at 7:03 AM Jonathan Cameron
> >  wrote:
> >>
> >> On Fri,  5 Apr 2024 00:07:06 +
> >> "Ho-Ren (Jack) Chuang"  wrote:
> >>
> >> > The current implementation treats emulated memory devices, such as
> >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal 
> >> > memory
> >> > (E820_TYPE_RAM). However, these emulated devices have different
> >> > characteristics than traditional DRAM, making it important to
> >> > distinguish them. Thus, we modify the tiered memory initialization 
> >> > process
> >> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> >> > ensures that the memory tier initialization for these nodes is deferred
> >> > until HMAT information is obtained during the boot process. Finally,
> >> > demotion tables are recalculated at the end.
> >> >
> >> > * late_initcall(memory_tier_late_init);
> >> > Some device drivers may have initialized memory tiers between
> >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> >> > online memory nodes and configuring memory tiers. They should be excluded
> >> > in the late init.
> >> >
> >> > * Handle cases where there is no HMAT when creating memory tiers
> >> > There is a scenario where a CPUless node does not provide HMAT 
> >> > information.
> >> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >> >
> >> > * Introduce another new lock `default_dram_perf_lock` for adist 
> >> > calculation
> >> > In the current implementation, iterating through CPUless nodes requires
> >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end 
> >> > up
> >> > trying to acquire the same lock, leading to a potential deadlock.
> >> > Therefore, we propose introducing a standalone `default_dram_perf_lock` 
> >> > to
> >> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> >> > but also prevents holding a large lock simultaneously.
> >> >
> >> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >> >   default DRAM, late CPUless, and hot-plugged initializations.
> >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> >> > handle cases where memtype is not initialized and where HMAT information 
> >> > is
> >> > available.
> >> >
> >> > * Introduce `default_memory_types` for those memory types that are not
> >> >   initialized by device drivers.
> >> > Because late initialized memory and default DRAM memory need to be 
> >> > managed,
> >> > a default memory type is created for storing all memory types that are
> >> > not initialized by device drivers and as a fallback.
> >> >
> >> > Signed-off-by: Ho-Ren (Jack) Chuang 
> >> > Signed-off-by: Hao Xiang 
> >> > Reviewed-by: "Huang, Ying" 
> >>
> >> Hi - one remaining question. Why can't we delay init for all nodes
> >> to either drivers or your fallback late_initcall code.
> >> It would be nice to reduce possible code paths.
> >
> > I try not to change too much of the existing code structure in
> > this patchset.
> >
> > To me, postponing/moving all memory tier registrations to
> > late_initcall() is another possible action item for the next patchset.
> >
> > After tier_mem(), hmat_init() is called, which requires registering
> > `default_dram_type` info. This is when `default_dram_type` is needed.
> > However, it is indeed possible to postpone the latter part,
> > set_node_memory_tier(), to `late_init(). So, memory_tier_init() can
> > indeed be split into two parts, and the latter part can be moved to
> > late_initcall() to be processed together.
>
> I don't think that it's good to move all memory_tier initialization in
> drivers to late_initcall().  It's natural to keep them in
> device_initcall() level.
>
> If so, we can allocate default_dram_type in memory_tier_init(), and call
> set_node_memory_tier() only in memory_tier_lateinit().  We can call
> memory_tier_lateinit() in device_initcall() level too.
>

It makes sense to me

Re: [External] Re: [PATCH v11 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-09 Thread Ho-Ren (Jack) Chuang
On Tue, Apr 9, 2024 at 2:50 PM Andrew Morton  wrote:
>
> On Tue, 9 Apr 2024 12:00:06 -0700 "Ho-Ren (Jack) Chuang" 
>  wrote:
>
> > Hi Jonathan,
> >
> > On Fri, Apr 5, 2024 at 6:56 AM Jonathan Cameron
> >  wrote:
> > >
> > > On Fri,  5 Apr 2024 00:07:05 +
> > > "Ho-Ren (Jack) Chuang"  wrote:
> > >
> > > > Since different memory devices require finding, allocating, and putting
> > > > memory types, these common steps are abstracted in this patch,
> > > > enhancing the scalability and conciseness of the code.
> > > >
> > > > Signed-off-by: Ho-Ren (Jack) Chuang 
> > > > Reviewed-by: "Huang, Ying" 
> > > Reviewed-by: Jonathan Cameron 
> > >
> > Thank you for reviewing and for adding your "Reviewed-by"!
> > I was wondering if I need to send a v12 and manually add
> > this to the commit description, or if this is sufficient.
>
> I had added Jonathan's r-b to the mm.git copy of this patch.

Got it~ Thank you Andrew!

-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任



Re: [External] Re: [PATCH v11 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-09 Thread Ho-Ren (Jack) Chuang
Hi Jonathan,

On Tue, Apr 9, 2024 at 9:12 AM Jonathan Cameron
 wrote:
>
> On Fri, 5 Apr 2024 15:43:47 -0700
> "Ho-Ren (Jack) Chuang"  wrote:
>
> > On Fri, Apr 5, 2024 at 7:03 AM Jonathan Cameron
> >  wrote:
> > >
> > > On Fri,  5 Apr 2024 00:07:06 +
> > > "Ho-Ren (Jack) Chuang"  wrote:
> > >
> > > > The current implementation treats emulated memory devices, such as
> > > > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal 
> > > > memory
> > > > (E820_TYPE_RAM). However, these emulated devices have different
> > > > characteristics than traditional DRAM, making it important to
> > > > distinguish them. Thus, we modify the tiered memory initialization 
> > > > process
> > > > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > > > ensures that the memory tier initialization for these nodes is deferred
> > > > until HMAT information is obtained during the boot process. Finally,
> > > > demotion tables are recalculated at the end.
> > > >
> > > > * late_initcall(memory_tier_late_init);
> > > > Some device drivers may have initialized memory tiers between
> > > > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > > > online memory nodes and configuring memory tiers. They should be 
> > > > excluded
> > > > in the late init.
> > > >
> > > > * Handle cases where there is no HMAT when creating memory tiers
> > > > There is a scenario where a CPUless node does not provide HMAT 
> > > > information.
> > > > If no HMAT is specified, it falls back to using the default DRAM tier.
> > > >
> > > > * Introduce another new lock `default_dram_perf_lock` for adist 
> > > > calculation
> > > > In the current implementation, iterating through CPUless nodes requires
> > > > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end 
> > > > up
> > > > trying to acquire the same lock, leading to a potential deadlock.
> > > > Therefore, we propose introducing a standalone `default_dram_perf_lock` 
> > > > to
> > > > protect `default_dram_perf_*`. This approach not only avoids deadlock
> > > > but also prevents holding a large lock simultaneously.
> > > >
> > > > * Upgrade `set_node_memory_tier` to support additional cases, including
> > > >   default DRAM, late CPUless, and hot-plugged initializations.
> > > > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > > > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > > > handle cases where memtype is not initialized and where HMAT 
> > > > information is
> > > > available.
> > > >
> > > > * Introduce `default_memory_types` for those memory types that are not
> > > >   initialized by device drivers.
> > > > Because late initialized memory and default DRAM memory need to be 
> > > > managed,
> > > > a default memory type is created for storing all memory types that are
> > > > not initialized by device drivers and as a fallback.
> > > >
> > > > Signed-off-by: Ho-Ren (Jack) Chuang 
> > > > Signed-off-by: Hao Xiang 
> > > > Reviewed-by: "Huang, Ying" 
> > >
> > > Hi - one remaining question. Why can't we delay init for all nodes
> > > to either drivers or your fallback late_initcall code.
> > > It would be nice to reduce possible code paths.
> >
> > I try not to change too much of the existing code structure in
> > this patchset.
> >
> > To me, postponing/moving all memory tier registrations to
> > late_initcall() is another possible action item for the next patchset.
> >
> > After tier_mem(), hmat_init() is called, which requires registering
> > `default_dram_type` info. This is when `default_dram_type` is needed.
> > However, it is indeed possible to postpone the latter part,
> > set_node_memory_tier(), to `late_init(). So, memory_tier_init() can
> > indeed be split into two parts, and the latter part can be moved to
> > late_initcall() to be processed together.
> >
> > Doing this all memory-type drivers have to call late_initcall() to
> > register a memory tier. I’m not sure how many they are?
> >
> > What do you guys think?
>
> Gut feeling - if you are goi

Re: [External] Re: [PATCH v11 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-09 Thread Ho-Ren (Jack) Chuang
Hi Jonathan,

On Fri, Apr 5, 2024 at 6:56 AM Jonathan Cameron
 wrote:
>
> On Fri,  5 Apr 2024 00:07:05 +
> "Ho-Ren (Jack) Chuang"  wrote:
>
> > Since different memory devices require finding, allocating, and putting
> > memory types, these common steps are abstracted in this patch,
> > enhancing the scalability and conciseness of the code.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Reviewed-by: "Huang, Ying" 
> Reviewed-by: Jonathan Cameron 
>
Thank you for reviewing and for adding your "Reviewed-by"!
I was wondering if I need to send a v12 and manually add
this to the commit description, or if this is sufficient.

-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任



Re: [PATCH v11 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-05 Thread Ho-Ren (Jack) Chuang
On Fri, Apr 5, 2024 at 7:03 AM Jonathan Cameron
 wrote:
>
> On Fri,  5 Apr 2024 00:07:06 +
> "Ho-Ren (Jack) Chuang"  wrote:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> > but also prevents holding a large lock simultaneously.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > Reviewed-by: "Huang, Ying" 
>
> Hi - one remaining question. Why can't we delay init for all nodes
> to either drivers or your fallback late_initcall code.
> It would be nice to reduce possible code paths.

I am trying not to change too much of the existing code structure in
this patchset.

To me, postponing/moving all memory tier registrations to
late_initcall() is another possible action item for the next patchset.

After tier_mem(), hmat_init() is called, which requires registering
`default_dram_type` info. This is when `default_dram_type` is needed.
However, it is indeed possible to postpone the latter part,
set_node_memory_tier(), to `late_initcall()`. So, memory_tier_init() can
indeed be split into two parts, and the latter part can be moved to
late_initcall() to be processed together.

With this approach, all memory-type drivers would have to call
late_initcall() to register a memory tier. I’m not sure how many of
them there are.

What do you guys think?

>
> Jonathan
>
>
> > ---
> >  mm/memory-tiers.c | 94 +++
> >  1 file changed, 70 insertions(+), 24 deletions(-)
> >
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 516b144fd45a..6632102bd5c9 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
>
>
>
> > @@ -855,7 +892,8 @@ static int __init memory_tier_init(void)
> >* For now we can have 4 faster memory tiers with smaller adistance
> >* than default DRAM tier.
> >*/
> > - default_dram_type = alloc_memory_type(MEMTIER_ADISTANCE_DRAM);
> > + default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
> > +                                               &default_memory_types);
> >   if (IS_ERR(default_dram_type))
> >   panic("%s() failed to allocate default DRAM tier\n", __func__);
> >
> > @@ -865,6 +903,14 @@ static int __init memory_tier_init(void)
> >* types assigned.
> >*

[PATCH v11 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-04 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids the deadlock
but also avoids holding one large lock for longer than necessary.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default list, `default_memory_types`, is created to store all memory types
that are not initialized by device drivers and to serve as a fallback.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
Reviewed-by: "Huang, Ying" 
---
 mm/memory-tiers.c | 94 +++
 1 file changed, 70 insertions(+), 24 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 516b144fd45a..6632102bd5c9 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+/* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
struct memory_tier *memtier;
-   struct memory_dev_type *memtype;
+   struct memory_dev_type *memtype = default_dram_type;
+   int adist = MEMTIER_ADISTANCE_DRAM;
pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,7 +522,16 @@ static struct memory_tier *set_node_memory_tier(int node)
if (!node_state(node, N_MEMORY))
return ERR_PTR(-EINVAL);
 
-   __init_node_memory_type(node, default_dram_type);
+	mt_calc_adistance(node, &adist);
+	if (!node_memory_types[node].memtype) {
+		memtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+		if (IS_ERR(memtype)) {
+			memtype = default_dram_type;
+			pr_info("Failed to allocate a memory type. Fall back.\n");
+   }
+   }
+
+   __init_node_memory_type(node, memtype);
 
memtype = node_memory_types[node].memtype;
node_set(node, memtype->nodes);
@@ -652,6 +669,35 @@ void mt_put_memory_types(struct list_head *memory_types)
 }
 EXPORT_SYMBOL_GPL(mt_put_memory_types);
 
+/*
+ * This is invoked via `late_initcall()` to initialize memory tiers for
+ * CPU-less memory nodes after driver initialization, which is
+ * expected to provide `adistance` algorithms.
+ */
+static int __init memory_tier_late_init(void)
+{
+   int nid;
+
+   guard(mutex)(&memory_tier_lock);
+   for_each_n

[PATCH v11 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-04 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.

Signed-off-by: Ho-Ren (Jack) Chuang 
Reviewed-by: "Huang, Ying" 
---
 drivers/dax/kmem.c   | 30 --
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 29 +
 3 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..4fe9d040e375 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,36 +55,14 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
-	struct memory_dev_type *mtype;
-
-	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
-	mutex_unlock(&kmem_memory_type_lock);
-
-	return mtype;
+	guard(mutex)(&kmem_memory_type_lock);
+	return mt_find_alloc_memory_type(adist, &kmem_memory_types);
 }
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
-	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
-	mutex_unlock(&kmem_memory_type_lock);
+	guard(mutex)(&kmem_memory_type_lock);
+	mt_put_memory_types(&kmem_memory_types);
 }
 
 static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..0d70788558f4 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+						  struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
return -EIO;
 }
+
+static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+								struct list_head *memory_types)
+{
+   return NULL;
+}
+
+static inline void mt_put_memory_types(struct list_head *memory_types)
+{
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..516b144fd45a 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,35 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+   struct memory_dev_type *mtype;
+
+   list_for_each_entry(mtype, memory_types, list)
+   if (mtype->adistance == adist)
+   return mtype;
+
+   mtype = alloc_memory_type(adist);
+   if (IS_ERR(mtype))
+   return mtype;
+
+   list_add(&mtype->list, memory_types);
+
+   return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+   struct memory_dev_type *mtype, *mtn;
+
+   list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+   list_del(&mtype->list);
+   put_memory_type(mtype);
+   }
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
 {
    pr_info(
-- 
Ho-Ren (Jack) Chuang
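
For readers following the thread without a kernel tree at hand, the find-or-allocate and put pattern that patch 1/2 factors out can be sketched in plain userspace C. This is only an illustration: `struct mem_type`, `find_alloc_mem_type()` and `put_mem_types()` are hypothetical stand-ins, a hand-rolled singly linked list replaces the kernel's `list_head`, and allocation failure is reported as NULL rather than through ERR_PTR. As in the patch, locking is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct memory_dev_type. */
struct mem_type {
	int adistance;
	struct mem_type *next;
};

/*
 * Return the entry whose adistance matches, or allocate and link a new
 * one when none exists. Returns NULL if the allocation fails.
 * The caller is expected to hold whatever lock protects the list.
 */
struct mem_type *find_alloc_mem_type(struct mem_type **head, int adist)
{
	struct mem_type *t;

	assert(head);
	for (t = *head; t; t = t->next)
		if (t->adistance == adist)
			return t;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->adistance = adist;
	t->next = *head;
	*head = t;
	return t;
}

/* Release every entry and leave the list empty. */
void put_mem_types(struct mem_type **head)
{
	assert(head);
	while (*head) {
		struct mem_type *t = *head;

		*head = t->next;
		free(t);
	}
}
```

Two lookups with the same adistance return the same entry, which is the property that lets several nodes with equal performance share one memory type.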




[PATCH v11 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-04-04 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable from
normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes to
the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, and pages
cannot be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset resolves the issue automatically. It delays the
initialization of memory tiers for CPUless NUMA nodes until HMAT
information has been obtained and all devices have been initialized at
boot time, eliminating the need for user intervention. If no HMAT is
specified, it falls back to using `default_dram_type`.
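
The two-pass flow described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: `struct node_info`, `early_init()` and `late_init()` are hypothetical names, and `ADISTANCE_DRAM` is a placeholder value standing in for `MEMTIER_ADISTANCE_DRAM`.

```c
#include <assert.h>
#include <stdbool.h>

#define ADISTANCE_DRAM 576	/* placeholder for MEMTIER_ADISTANCE_DRAM */

/* Hypothetical per-node state, not the kernel's data structures. */
struct node_info {
	bool has_cpu;		/* node has CPUs, so it is set up early as DRAM */
	bool has_perf_data;	/* HMAT-like performance data is available */
	int perf_adistance;	/* abstract distance derived from that data */
	bool tier_set;		/* a tier was already assigned, e.g. by a driver */
	int adistance;		/* resulting abstract distance */
};

/* Early pass: only nodes with CPUs receive the default DRAM distance. */
void early_init(struct node_info *nodes, int n)
{
	assert(nodes);
	for (int i = 0; i < n; i++) {
		if (!nodes[i].has_cpu)
			continue;	/* defer CPUless nodes to the late pass */
		nodes[i].adistance = ADISTANCE_DRAM;
		nodes[i].tier_set = true;
	}
}

/*
 * Late pass: skip nodes that already have a tier, and fall back to the
 * DRAM distance when no performance data is available.
 */
void late_init(struct node_info *nodes, int n)
{
	assert(nodes);
	for (int i = 0; i < n; i++) {
		if (nodes[i].tier_set)
			continue;
		nodes[i].adistance = nodes[i].has_perf_data ?
				     nodes[i].perf_adistance : ADISTANCE_DRAM;
		nodes[i].tier_set = true;
	}
}
```

In this model a CPUless node with performance data ends up with its own distance, while a CPUless node without such data lands in the same tier as DRAM, mirroring the fallback described in the cover letter.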

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance
attributes through QEMU, and the guest now sees memory nodes with
performance attributes in HMAT. With this change, we enable the
guest kernel to construct the correct memory tiering for the memory nodes.

- v11:
 Thanks to comments from Jonathan,
 * Replace `mutex_lock()` with `guard(mutex)()`
 * Reorder some modifications within the patchset
 * Rewrite the code for improved readability and fix alignment issues
 * Pass all strict rules in checkpatch.pl
- v10:
 Thanks to Andrew's and SeongJae's comments,
 * Address kunit compilation errors
 * Resolve the bug of not returning the correct error code in
   `mt_perf_to_adistance`
 * https://lore.kernel.org/lkml/20240402001739.2521623-1-horenchu...@bytedance.com/T/#u
- v9:
 * Address corner cases in `memory_tier_late_init`. Thanks to Ying's comments.
 * https://lore.kernel.org/lkml/20240329053353.309557-1-horenchu...@bytedance.com/T/#u
- v8:
 * Fix email format
 * https://lore.kernel.org/lkml/20240329004815.195476-1-horenchu...@bytedance.com/T/#u
- v7:
 * Add Reviewed-by: "Huang, Ying" 
- v6:
 Thanks to Ying's comments,
 * Move `default_dram_perf_lock` to the function's beginning for clarity
 * Fix double unlocking in v5
 * https://lore.kernel.org/lkml/20240327072729.3381685-1-horenchu...@bytedance.com/T/#u
- v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
 * https://lore.kernel.org/lkml/20240327041646.3258110-1-horenchu...@bytedance.com/T/#u
- v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
- v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
- v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
- v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  30 ++---
 include/linux/memory-tiers.h |  13 
 mm/memory-tiers.c| 123 ---
 3 files changed, 116 insertions(+), 50 deletions(-)

-- 
Ho-Ren (Jack) Chuang




Re: [External] Re: [PATCH v10 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-04 Thread Ho-Ren (Jack) Chuang
Hi Jonathan,

Thank you! I will fix them and send a V11 soon.

On Thu, Apr 4, 2024 at 6:37 AM Jonathan Cameron
 wrote:
>
> 
>
> > > > @@ -858,7 +910,8 @@ static int __init memory_tier_init(void)
> > > >* For now we can have 4 faster memory tiers with smaller 
> > > > adistance
> > > >* than default DRAM tier.
> > > >*/
> > > > - default_dram_type = alloc_memory_type(MEMTIER_ADISTANCE_DRAM);
> > > > + default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
> > > > +     &default_memory_types);
> > >
> > > Unusual indenting.  Align with just after (
> > >
> >
> > Aligning with "(" will exceed 100 columns. Would that be acceptable?
> I think we are talking at cross purposes.
>
> default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
>                                               &default_memory_types);
>
> Is what I was suggesting.
>

Oh, now I see. Thanks!

> >
> > > >   if (IS_ERR(default_dram_type))
> > > >   panic("%s() failed to allocate default DRAM tier\n", 
> > > > __func__);
> > > >
> > > > @@ -868,6 +921,14 @@ static int __init memory_tier_init(void)
> > > >* types assigned.
> > > >*/
> > > >   for_each_node_state(node, N_MEMORY) {
> > > > + if (!node_state(node, N_CPU))
> > > > + /*
> > > > +  * Defer memory tier initialization on CPUless 
> > > > numa nodes.
> > > > +  * These will be initialized after firmware and 
> > > > devices are
> > >
> > > > I think this wraps at just over 80 chars.  Seems silly to wrap so tightly
> > > > and not quite fit under 80. (This is about 83 chars.)
> > >
> >
> > I can fix this.
> > I have a question. In my patch, this line is <80 chars. However,
> > in the email, it is >80 chars. Does that mean we need to
> > count the characters as they appear in the email, not in the patch?
> > Or did I miss something, such as a vim setting?
>
> 3 tabs + 1 space + the text from * (58)
> = 24 + 1 + 58 = 83
>
> Advantage of using claws email for kernel stuff is it has a nice per character
> ruler at the top of the window.
>
> I wonder if you have a different tab indent size?  The kernel uses 8
> characters.  It might explain the few other odd indents if perhaps
> you have it at 4 in your editor?
>
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html
>

Got it. I was using tab=4. I will change to 8. Thanks!

> Jonathan
>
> >
> > > > +  * initialized.
> > > > +  */
> > > > + continue;
> > > > +
> > > >   memtier = set_node_memory_tier(node);
> > > >   if (IS_ERR(memtier))
> > > >   /*
> > >
> >
> >
>


-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任
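
The column count discussed in this message (3 tabs + 1 space + 58
characters = 83) can be checked mechanically. Below is a minimal sketch,
assuming 8-column tab stops as the kernel coding style specifies.
`visual_width()` is a hypothetical helper written for illustration, not
part of checkpatch or any kernel tool, and it counts bytes, so it is only
accurate for ASCII text.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the display width of a line when each tab
 * advances to the next multiple of tab_size columns. Counts bytes, so the
 * result is only meaningful for ASCII input.
 */
static size_t visual_width(const char *line, size_t tab_size)
{
	size_t col = 0;

	for (; *line && *line != '\n'; line++) {
		if (*line == '\t')
			col += tab_size - (col % tab_size);
		else
			col++;
	}
	return col;
}
```

With tab_size set to 4, the same comment line measures 12 + 1 + 58 = 71
columns, which is consistent with it looking shorter than 80 in an editor
configured for 4-column tabs.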



Re: [PATCH v10 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-03 Thread Ho-Ren (Jack) Chuang
can have 4 faster memory tiers with smaller adistance
> >* than default DRAM tier.
> >*/
> > - default_dram_type = alloc_memory_type(MEMTIER_ADISTANCE_DRAM);
> > + default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
> > +     &default_memory_types);
>
> Unusual indenting.  Align with just after (
>

Aligning with "(" will exceed 100 columns. Would that be acceptable?

> >   if (IS_ERR(default_dram_type))
> >   panic("%s() failed to allocate default DRAM tier\n", 
> > __func__);
> >
> > @@ -868,6 +921,14 @@ static int __init memory_tier_init(void)
> >* types assigned.
> >*/
> >   for_each_node_state(node, N_MEMORY) {
> > + if (!node_state(node, N_CPU))
> > + /*
> > +  * Defer memory tier initialization on CPUless numa 
> > nodes.
> > +  * These will be initialized after firmware and 
> > devices are
>
> I think this wraps at just over 80 chars.  Seems silly to wrap so tightly
> and not quite fit under 80. (This is about 83 chars.)
>

I can fix this.
I have a question. In my patch, this line is <80 chars. However,
in the email, it is >80 chars. Does that mean we need to
count the characters as they appear in the email, not in the patch?
Or did I miss something, such as a vim setting?

> > +  * initialized.
> > +  */
> > + continue;
> > +
> >   memtier = set_node_memory_tier(node);
> >   if (IS_ERR(memtier))
> >   /*
>


-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任



Re: [PATCH v10 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-03 Thread Ho-Ren (Jack) Chuang
Hi Jonathan,

Thanks for your feedback. I will fix them (inlined) in the next V11.
No worries, it's never too late!

On Wed, Apr 3, 2024 at 9:52 AM Jonathan Cameron
 wrote:
>
> On Tue,  2 Apr 2024 00:17:37 +
> "Ho-Ren (Jack) Chuang"  wrote:
>
> > Since different memory devices require finding, allocating, and putting
> > memory types, these common steps are abstracted in this patch,
> > enhancing the scalability and conciseness of the code.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Reviewed-by: "Huang, Ying" 
>
> Hi,
>
> I know this is a late entry to the discussion but a few comments inline.
> (sorry I didn't look earlier!)
>
> All opportunities to improve code complexity and readability as a result
> of your factoring out.
>
> Jonathan
>
>
> > ---
> >  drivers/dax/kmem.c   | 20 ++--
> >  include/linux/memory-tiers.h | 13 +
> >  mm/memory-tiers.c| 32 
> >  3 files changed, 47 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 42ee360cf4e3..01399e5b53b2 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
> >
> >  static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
> >  {
> > - bool found = false;
> >   struct memory_dev_type *mtype;
> >
> >   mutex_lock(&kmem_memory_type_lock);
> could use
>
> guard(mutex)(&kmem_memory_type_lock);
> return mt_find_alloc_memory_type(adist, &kmem_memory_types);
>

I will change it accordingly.

> I'm fine if you ignore this comment though as may be other functions in
> here that could take advantage of the cleanup.h stuff in a future patch.
>
> > - list_for_each_entry(mtype, &kmem_memory_types, list) {
> > - if (mtype->adistance == adist) {
> > - found = true;
> > - break;
> > - }
> > - }
> > - if (!found) {
> > - mtype = alloc_memory_type(adist);
> > - if (!IS_ERR(mtype))
> > - list_add(&mtype->list, &kmem_memory_types);
> > - }
> > + mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
> >   mutex_unlock(&kmem_memory_type_lock);
> >
> >   return mtype;
>
> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> > index 69e781900082..a44c03c2ba3a 100644
> > --- a/include/linux/memory-tiers.h
> > +++ b/include/linux/memory-tiers.h
> > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
> >  int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> >const char *source);
> >  int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
> > +struct memory_dev_type *mt_find_alloc_memory_type(int adist,
> > + struct list_head 
> > *memory_types);
>
> That indent looks unusual.  Align the start of struct with start of int.
>

I can make this aligned but it will show another warning:
"WARNING: line length of 131 exceeds 100 columns"
Is this ok?

> > +void mt_put_memory_types(struct list_head *memory_types);
> >  #ifdef CONFIG_MIGRATION
> >  int next_demotion_node(int node);
> >  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> > @@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct 
> > access_coordinate *perf, int *adis
> >  {
> >   return -EIO;
> >  }
> > +
> > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct 
> > list_head *memory_types)
> > +{
> > + return NULL;
> > +}
> > +
> > +void mt_put_memory_types(struct list_head *memory_types)
> > +{
> > +
> No blank line needed here.

Will fix.

> > +}
> >  #endif   /* CONFIG_NUMA */
> >  #endif  /* _LINUX_MEMORY_TIERS_H */
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 0537664620e5..974af10cfdd8 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
> > @@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct 
> > memory_dev_type *memtype)
> >  }
> >  EXPORT_SYMBOL_GPL(clear_node_memory_type);
> >
> > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct 
> > list_head *memory_types)
>
> Breaking this out as a separate function provides opportunity to improve it.
> Maybe a follow up p

[PATCH v10 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-04-01 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids deadlock
but also prevents holding a large lock simultaneously.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default memory type is created for storing all memory types that are
not initialized by device drivers and as a fallback.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
Reviewed-by: "Huang, Ying" 
---
 include/linux/memory-tiers.h |  5 +-
 mm/memory-tiers.c| 95 +---
 2 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index a44c03c2ba3a..16769552a338 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -140,12 +140,13 @@ static inline int mt_perf_to_adistance(struct 
access_coordinate *perf, int *adis
return -EIO;
 }
 
-struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head 
*memory_types)
+static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+   struct list_head *memory_types)
 {
return NULL;
 }
 
-void mt_put_memory_types(struct list_head *memory_types)
+static inline void mt_put_memory_types(struct list_head *memory_types)
 {
 
 }
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 974af10cfdd8..44fa10980d37 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+/* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct 
memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
struct memory_tier *memtier;
-   struct memory_dev_type *memtype;
+   struct memory_dev_type *mtype = default_dram_type;
+   int adist = MEMTIER_ADISTANCE_DRAM;
pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,11 +522,20 @@ static struct memory_tier *set_node_memory_tier(int node)
if (!node_state(node, N_MEMORY))
return ERR_PTR(-EINVAL);
 
-   __init_node_memory_type(node, default_dram_type);
+   mt_calc_adistance(node, &adist);
+   if (node_memory_types[node].memtype == NULL) {
+   mtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+   if (IS_ERR(mtype)) {
+   mtype = default

[PATCH v10 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-01 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.

Signed-off-by: Ho-Ren (Jack) Chuang 
Reviewed-by: "Huang, Ying" 
---
 drivers/dax/kmem.c   | 20 ++--
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 32 
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-   bool found = false;
struct memory_dev_type *mtype;
 
mutex_lock(&kmem_memory_type_lock);
-   list_for_each_entry(mtype, &kmem_memory_types, list) {
-   if (mtype->adistance == adist) {
-   found = true;
-   break;
-   }
-   }
-   if (!found) {
-   mtype = alloc_memory_type(adist);
-   if (!IS_ERR(mtype))
-   list_add(&mtype->list, &kmem_memory_types);
-   }
+   mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
mutex_unlock(&kmem_memory_type_lock);
 
return mtype;
@@ -77,13 +66,8 @@ static struct memory_dev_type 
*kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-   struct memory_dev_type *mtype, *mtn;
-
mutex_lock(&kmem_memory_type_lock);
-   list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-   list_del(&mtype->list);
-   put_memory_type(mtype);
-   }
+   mt_put_memory_types(&kmem_memory_types);
mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+   struct list_head 
*memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct 
access_coordinate *perf, int *adis
 {
return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head 
*memory_types)
+{
+   return NULL;
+}
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..974af10cfdd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct 
memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head 
*memory_types)
+{
+   bool found = false;
+   struct memory_dev_type *mtype;
+
+   list_for_each_entry(mtype, memory_types, list) {
+   if (mtype->adistance == adist) {
+   found = true;
+   break;
+   }
+   }
+   if (!found) {
+   mtype = alloc_memory_type(adist);
+   if (!IS_ERR(mtype))
+   list_add(&mtype->list, memory_types);
+   }
+
+   return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+   struct memory_dev_type *mtype, *mtn;
+
+   list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+   list_del(&mtype->list);
+   put_memory_type(mtype);
+   }
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char 
*prefix)
 {
    pr_info(
-- 
Ho-Ren (Jack) Chuang
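
The helper factored out in this patch follows a find-or-allocate pattern:
walk the list for an entry with a matching adistance, and allocate and
link a new one only if none exists. Here is a self-contained userspace
sketch of that pattern, assuming a plain singly linked list in place of the
kernel's `list_head` and `malloc()` in place of `alloc_memory_type()`. The
struct and function names are illustrative only. As in the patch, the
caller is assumed to hold whatever lock protects the list.

```c
#include <stdlib.h>

/* Illustrative stand-in for struct memory_dev_type. */
struct dev_type {
	int adistance;
	struct dev_type *next;
};

/* Return the entry with a matching adistance, allocating one if absent. */
static struct dev_type *find_alloc_type(int adist, struct dev_type **head)
{
	struct dev_type *t;

	for (t = *head; t; t = t->next)
		if (t->adistance == adist)
			return t;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;	/* the patch checks IS_ERR() here instead */
	t->adistance = adist;
	t->next = *head;
	*head = t;
	return t;
}

/* Release every entry on the list, leaving it empty. */
static void put_types(struct dev_type **head)
{
	while (*head) {
		struct dev_type *t = *head;

		*head = t->next;
		free(t);
	}
}
```

Passing the list head as a parameter is what lets one helper serve both
the dax/kmem list and any other list of memory types.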




[PATCH v10 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-04-01 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable from
normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes to
the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, and pages
cannot be migrated between the different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the
initialization of memory tiers for CPUless NUMA nodes until they obtain
HMAT information and after all devices are initialized at boot time,
eliminating the need for user intervention. If no HMAT is specified,
it falls back to using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance
attributes through QEMU, and the guest now sees memory nodes with
performance attributes in HMAT. With this change, we enable the
guest kernel to construct the correct memory tiering for the memory nodes.

- v10:
 Thanks to Andrew's and SeongJae's comments,
 * Address kunit compilation errors
 * Resolve the bug of not returning the correct error code in
   `mt_perf_to_adistance`
-v9:
 * Address corner cases in `memory_tier_late_init`. Thanks to Ying's comments.
 * 
https://lore.kernel.org/lkml/20240329053353.309557-1-horenchu...@bytedance.com/T/#u
-v8:
 * Fix email format
 * 
https://lore.kernel.org/lkml/20240329004815.195476-1-horenchu...@bytedance.com/T/#u
-v7:
 * Add Reviewed-by: "Huang, Ying" 
-v6:
 Thanks to Ying's comments,
 * Move `default_dram_perf_lock` to the function's beginning for clarity
 * Fix double unlocking in v5
 * 
https://lore.kernel.org/lkml/20240327072729.3381685-1-horenchu...@bytedance.com/T/#u
-v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
 * 
https://lore.kernel.org/lkml/20240327041646.3258110-1-horenchu...@bytedance.com/T/#u
-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * 
https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * 
https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * 
https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * 
https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +-
 include/linux/memory-tiers.h |  14 
 mm/memory-tiers.c| 127 ++-
 3 files changed, 126 insertions(+), 35 deletions(-)

-- 
Ho-Ren (Jack) Chuang
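
The two-phase flow described in this cover letter can be sketched in
userspace as follows. This is a simplified model under stated assumptions,
not the patch's code: `struct node_info`, `tier_init_early()`,
`tier_init_late()`, and the `ADIST_*` constants are hypothetical, and
`reported_adist` stands in for whatever abstract distance the kernel would
derive from HMAT performance data.

```c
#include <stdbool.h>

#define NR_NODES	4
#define ADIST_DRAM	576	/* placeholder value for this sketch */
#define ADIST_UNSET	(-1)

struct node_info {
	bool has_memory;
	bool has_cpu;
	int reported_adist;	/* from firmware tables, or ADIST_UNSET */
	int tier_adist;		/* assigned tier, or ADIST_UNSET */
};

/* Early pass: only nodes with CPUs are placed, always in the DRAM tier. */
static void tier_init_early(struct node_info *nodes)
{
	for (int i = 0; i < NR_NODES; i++) {
		if (!nodes[i].has_memory || !nodes[i].has_cpu)
			continue;	/* CPUless nodes are deferred */
		nodes[i].tier_adist = ADIST_DRAM;
	}
}

/* Late pass: place whatever is still unassigned, e.g. CPUless nodes. */
static void tier_init_late(struct node_info *nodes)
{
	for (int i = 0; i < NR_NODES; i++) {
		if (!nodes[i].has_memory)
			continue;
		if (nodes[i].tier_adist != ADIST_UNSET)
			continue;	/* already set up, e.g. by a driver */
		/* No performance data: fall back to the default DRAM tier. */
		nodes[i].tier_adist = nodes[i].reported_adist != ADIST_UNSET ?
				      nodes[i].reported_adist : ADIST_DRAM;
	}
}
```

The skip in the late pass mirrors the cover letter's point that nodes
already configured between the two init stages should be left alone.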




Re: [External] Re: [PATCH v9 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-01 Thread Ho-Ren (Jack) Chuang
Hi SeongJae,

On Mon, Apr 1, 2024 at 11:27 AM Ho-Ren (Jack) Chuang
 wrote:
>
> Hi SeongJae,
>
> On Sun, Mar 31, 2024 at 12:09 PM SeongJae Park  wrote:
> >
> > Hi Ho-Ren,
> >
> > > On Fri, 29 Mar 2024 05:33:52 +0000 "Ho-Ren (Jack) Chuang" 
> >  wrote:
> >
> > > Since different memory devices require finding, allocating, and putting
> > > memory types, these common steps are abstracted in this patch,
> > > enhancing the scalability and conciseness of the code.
> > >
> > > Signed-off-by: Ho-Ren (Jack) Chuang 
> > > Reviewed-by: "Huang, Ying" 
> > > ---
> > >  drivers/dax/kmem.c   | 20 ++--
> > >  include/linux/memory-tiers.h | 13 +
> > >  mm/memory-tiers.c| 32 
> > >  3 files changed, 47 insertions(+), 18 deletions(-)
> > >
> > [...]
> > > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> > > index 69e781900082..a44c03c2ba3a 100644
> > > --- a/include/linux/memory-tiers.h
> > > +++ b/include/linux/memory-tiers.h
> > > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
> > >  int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> > >const char *source);
> > >  int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
> > > +struct memory_dev_type *mt_find_alloc_memory_type(int adist,
> > > + struct list_head 
> > > *memory_types);
> > > +void mt_put_memory_types(struct list_head *memory_types);
> > >  #ifdef CONFIG_MIGRATION
> > >  int next_demotion_node(int node);
> > >  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> > > @@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct 
> > > access_coordinate *perf, int *adis
> > >  {
> > >   return -EIO;
> > >  }
> > > +
> > > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct 
> > > list_head *memory_types)
> > > +{
> > > + return NULL;
> > > +}
> > > +
> > > +void mt_put_memory_types(struct list_head *memory_types)
> > > +{
> > > +
> > > +}
> >
> > I found latest mm-unstable tree is failing kunit as below, and 'git bisect'
> > says it happens from this patch.
> >
> > $ ./tools/testing/kunit/kunit.py run --build_dir ../kunit.out/
> > [11:56:40] Configuring KUnit Kernel ...
> > [11:56:40] Building KUnit Kernel ...
> > Populating config with:
> > $ make ARCH=um O=../kunit.out/ olddefconfig
> > Building with:
> > $ make ARCH=um O=../kunit.out/ --jobs=36
> > ERROR:root:In file included from .../mm/memory.c:71:
> > .../include/linux/memory-tiers.h:143:25: warning: no previous prototype 
> > for ‘mt_find_alloc_memory_type’ [-Wmissing-prototypes]
> >   143 | struct memory_dev_type *mt_find_alloc_memory_type(int adist, 
> > struct list_head *memory_types)
> >   | ^
> > .../include/linux/memory-tiers.h:148:6: warning: no previous prototype 
> > for ‘mt_put_memory_types’ [-Wmissing-prototypes]
> >   148 | void mt_put_memory_types(struct list_head *memory_types)
> >   |  ^~~
> > [...]
> >
> > Maybe we should set these as 'static inline', like below?  I confirmed this
> > fixes the kunit error.  May I ask your opinion?
> >
>
> Thanks for catching this. I'm trying to figure out this problem. Will get 
> back.
>

These kunit compilation errors can be solved by adding `static inline`
to the two complaining functions, the same solution you mentioned
earlier.

I've also tested on my end and I will send out a V10 soon. Thank you again!

> >
> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> > index a44c03c2ba3a..ee6e53144156 100644
> > --- a/include/linux/memory-tiers.h
> > +++ b/include/linux/memory-tiers.h
> > @@ -140,12 +140,12 @@ static inline int mt_perf_to_adistance(struct 
> > access_coordinate *perf, int *adis
> > return -EIO;
> >  }
> >
> > -struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct 
> > list_head *memory_types)
> > +static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist, 
> > struct list_head *memory_types)
> >  {
> > return NULL;
> >  }
> >
> > -void mt_put_memory_types(struct list_head *memory_types)
> > +static inline void mt_put_memory_types(struct list_head *memory_types)
> >  {
> >
> >  }
> >
> >
> > Thanks,
> > SJ
>
>
>
> --
> Best regards,
> Ho-Ren (Jack) Chuang
> 莊賀任



-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任
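
The warnings discussed in this message come from defining ordinary
(externally visible) functions in a header with no earlier declaration,
which is what -Wmissing-prototypes reports. Marking the stubs `static
inline` gives each including file its own private copy. A minimal sketch of
the header pattern follows, with `HAVE_TIERS` standing in for the real
config option and all type and function names hypothetical.

```c
#include <stddef.h>

struct list_node;		/* opaque stand-in for struct list_head */
struct dev_type;		/* opaque stand-in for struct memory_dev_type */

#ifdef HAVE_TIERS
/* Real implementations live in a .c file; the header only declares them. */
struct dev_type *find_alloc_type(int adist, struct list_node *types);
void put_types(struct list_node *types);
#else
/*
 * Stubs for builds without the feature. `static inline` keeps them local
 * to each translation unit, so -Wmissing-prototypes stays quiet and the
 * linker never sees duplicate definitions.
 */
static inline struct dev_type *find_alloc_type(int adist,
					       struct list_node *types)
{
	(void)adist;
	(void)types;
	return NULL;
}

static inline void put_types(struct list_node *types)
{
	(void)types;
}
#endif
```

Without `static`, every .c file that includes such a header would also
emit its own external definition, so a build with more than one includer
could fail at link time as well.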



Re: [External] Re: [PATCH v9 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-04-01 Thread Ho-Ren (Jack) Chuang
Hi SeongJae,

On Sun, Mar 31, 2024 at 12:09 PM SeongJae Park  wrote:
>
> Hi Ho-Ren,
>
> On Fri, 29 Mar 2024 05:33:52 +0000 "Ho-Ren (Jack) Chuang" 
>  wrote:
>
> > Since different memory devices require finding, allocating, and putting
> > memory types, these common steps are abstracted in this patch,
> > enhancing the scalability and conciseness of the code.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Reviewed-by: "Huang, Ying" 
> > ---
> >  drivers/dax/kmem.c   | 20 ++--
> >  include/linux/memory-tiers.h | 13 +
> >  mm/memory-tiers.c| 32 
> >  3 files changed, 47 insertions(+), 18 deletions(-)
> >
> [...]
> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> > index 69e781900082..a44c03c2ba3a 100644
> > --- a/include/linux/memory-tiers.h
> > +++ b/include/linux/memory-tiers.h
> > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
> >  int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> >const char *source);
> >  int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
> > +struct memory_dev_type *mt_find_alloc_memory_type(int adist,
> > + struct list_head 
> > *memory_types);
> > +void mt_put_memory_types(struct list_head *memory_types);
> >  #ifdef CONFIG_MIGRATION
> >  int next_demotion_node(int node);
> >  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> > @@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct 
> > access_coordinate *perf, int *adis
> >  {
> >   return -EIO;
> >  }
> > +
> > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct 
> > list_head *memory_types)
> > +{
> > + return NULL;
> > +}
> > +
> > +void mt_put_memory_types(struct list_head *memory_types)
> > +{
> > +
> > +}
>
> I found latest mm-unstable tree is failing kunit as below, and 'git bisect'
> says it happens from this patch.
>
> $ ./tools/testing/kunit/kunit.py run --build_dir ../kunit.out/
> [11:56:40] Configuring KUnit Kernel ...
> [11:56:40] Building KUnit Kernel ...
> Populating config with:
> $ make ARCH=um O=../kunit.out/ olddefconfig
> Building with:
> $ make ARCH=um O=../kunit.out/ --jobs=36
> ERROR:root:In file included from .../mm/memory.c:71:
> .../include/linux/memory-tiers.h:143:25: warning: no previous prototype 
> for ‘mt_find_alloc_memory_type’ [-Wmissing-prototypes]
>   143 | struct memory_dev_type *mt_find_alloc_memory_type(int adist, 
> struct list_head *memory_types)
>   | ^
> .../include/linux/memory-tiers.h:148:6: warning: no previous prototype 
> for ‘mt_put_memory_types’ [-Wmissing-prototypes]
>   148 | void mt_put_memory_types(struct list_head *memory_types)
>   |  ^~~
> [...]
>
> Maybe we should set these as 'static inline', like below?  I confirmed this
> fixes the kunit error.  May I ask your opinion?
>

Thanks for catching this. I'm trying to figure out this problem. Will get back.

>
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index a44c03c2ba3a..ee6e53144156 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -140,12 +140,12 @@ static inline int mt_perf_to_adistance(struct 
> access_coordinate *perf, int *adis
> return -EIO;
>  }
>
> -struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct 
> list_head *memory_types)
> +static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist, 
> struct list_head *memory_types)
>  {
> return NULL;
>  }
>
> -void mt_put_memory_types(struct list_head *memory_types)
> +static inline void mt_put_memory_types(struct list_head *memory_types)
>  {
>
>  }
>
>
> Thanks,
> SJ



-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任



[PATCH v9 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-28 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids deadlock
but also prevents holding a large lock simultaneously.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default memory type is created for storing all memory types that are
not initialized by device drivers and as a fallback.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
Reviewed-by: "Huang, Ying" 
---
 mm/memory-tiers.c | 93 +++
 1 file changed, 77 insertions(+), 16 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 974af10cfdd8..9f8ae99e8e6e 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+/* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
 	struct memory_tier *memtier;
-	struct memory_dev_type *memtype;
+	struct memory_dev_type *mtype = default_dram_type;
+	int adist = MEMTIER_ADISTANCE_DRAM;
 	pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,11 +522,20 @@ static struct memory_tier *set_node_memory_tier(int node)
 	if (!node_state(node, N_MEMORY))
 		return ERR_PTR(-EINVAL);
 
-	__init_node_memory_type(node, default_dram_type);
+	mt_calc_adistance(node, &adist);
+	if (node_memory_types[node].memtype == NULL) {
+		mtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+		if (IS_ERR(mtype)) {
+			mtype = default_dram_type;
+			pr_info("Failed to allocate a memory type. Fall back.\n");
+		}
+	}
+
+	__init_node_memory_type(node, mtype);
 
-	memtype = node_memory_types[node].memtype;
-	node_set(node, memtype->nodes);
-	memtier = find_create_memory_tier(memtype);
+	mtype = node_memory_types[node].memtype;
+	node_set(node, mtype->nodes);
+	memtier = find_create_memory_tier(mtype);
 	if (!IS_ERR(memtier))
 		rcu_assign_pointer(pgdat->memtier, memtier);
 	return memtier;
@@ -655,6 +672,33 @@ void mt_put_memory_types(struct list_head *memory_types)
 }
 EXPORT_SYMBOL_GPL(mt_put_memory_types);
 
+/*
+ * This is invoked via `late_initcall(

[PATCH v9 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-03-28 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.

Signed-off-by: Ho-Ren (Jack) Chuang 
Reviewed-by: "Huang, Ying" 
---
 drivers/dax/kmem.c   | 20 ++--
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 32 
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
 	struct memory_dev_type *mtype;
 
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
+	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 
 	return mtype;
@@ -77,13 +66,8 @@ static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
+	mt_put_memory_types(&kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 			     const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+							struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
 	return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	return NULL;
+}
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..974af10cfdd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	bool found = false;
+	struct memory_dev_type *mtype;
+
+	list_for_each_entry(mtype, memory_types, list) {
+		if (mtype->adistance == adist) {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		mtype = alloc_memory_type(adist);
+		if (!IS_ERR(mtype))
+			list_add(&mtype->list, memory_types);
+	}
+
+	return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+	struct memory_dev_type *mtype, *mtn;
+
+	list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+		list_del(&mtype->list);
+		put_memory_type(mtype);
+	}
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
 {
 	pr_info(
-- 
Ho-Ren (Jack) Chuang
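
For readers who want to try the find-or-allocate and put-all pattern that
this patch factors out of dax/kmem, here is a minimal userspace sketch. It is
not the kernel code. `struct mem_type`, `find_alloc_mem_type()` and
`put_mem_types()` are hypothetical names, a plain singly linked list stands
in for `struct list_head`, `free()` stands in for `put_memory_type()`, and
allocation failure is reported as NULL rather than an ERR_PTR. As in the
patch, locking is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct memory_dev_type. */
struct mem_type {
	int adistance;
	struct mem_type *next;
};

/*
 * Return the entry whose adistance matches; if there is none, allocate
 * one and link it at the head of the list. Returns NULL when the
 * allocation fails (the kernel helper returns an ERR_PTR instead).
 */
static struct mem_type *find_alloc_mem_type(struct mem_type **head, int adist)
{
	struct mem_type *t;

	for (t = *head; t; t = t->next)
		if (t->adistance == adist)
			return t;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->adistance = adist;
	t->next = *head;
	*head = t;
	return t;
}

/* Unlink and release every entry; returns how many were released. */
static int put_mem_types(struct mem_type **head)
{
	int released = 0;

	while (*head) {
		struct mem_type *t = *head;

		*head = t->next;
		free(t);
		released++;
	}
	return released;
}
```

Two lookups with the same adistance return the same entry, so each caller's
list holds at most one type per adistance value. That is the property the
kmem code relied on before the helper was shared.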




[PATCH v9 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-28 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, and pages cannot
be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v9:
 * Address corner cases in `memory_tier_late_init`. Thanks to Ying's comments.
-v8:
 * Fix email format
 * https://lore.kernel.org/lkml/20240329004815.195476-1-horenchu...@bytedance.com/T/#u
-v7:
 * Add Reviewed-by: "Huang, Ying"
-v6:
 Thanks to Ying's comments,
 * Move `default_dram_perf_lock` to the function's beginning for clarity
 * Fix double unlocking in v5
 * https://lore.kernel.org/lkml/20240327072729.3381685-1-horenchu...@bytedance.com/T/#u
-v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
 * https://lore.kernel.org/lkml/20240327041646.3258110-1-horenchu...@bytedance.com/T/#u
-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +-
 include/linux/memory-tiers.h |  13 
 mm/memory-tiers.c| 125 ++-
 3 files changed, 124 insertions(+), 34 deletions(-)

-- 
Ho-Ren (Jack) Chuang
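
The fallback described in this cover letter (use HMAT-derived performance
data when it exists, otherwise stay on the default DRAM type) can be sketched
in userspace roughly as follows. This is only an illustration under stated
assumptions. Every `toy_*` name is hypothetical, `TOY_ADIST_DRAM` is an
arbitrary stand-in for `MEMTIER_ADISTANCE_DRAM`, a negative `hmat_adist`
models "no HMAT data for this node", and a small fixed pool replaces the
kernel's type list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ADIST_DRAM 100	/* arbitrary stand-in for MEMTIER_ADISTANCE_DRAM */
#define TOY_MAX_TYPES  4

struct toy_type {
	int adistance;
};

static struct toy_type toy_default_dram = { TOY_ADIST_DRAM };
static struct toy_type toy_pool[TOY_MAX_TYPES];
static int toy_pool_used;

/*
 * Hypothetical stand-in for mt_calc_adistance(): when no performance
 * data was reported (hmat_adist < 0), *adist keeps its DRAM default.
 */
static void toy_calc_adistance(int hmat_adist, int *adist)
{
	if (hmat_adist >= 0)
		*adist = hmat_adist;
}

/* Find-or-allocate from a fixed pool; NULL once the pool is exhausted. */
static struct toy_type *toy_find_alloc(int adist)
{
	for (int i = 0; i < toy_pool_used; i++)
		if (toy_pool[i].adistance == adist)
			return &toy_pool[i];
	if (toy_pool_used == TOY_MAX_TYPES)
		return NULL;
	toy_pool[toy_pool_used].adistance = adist;
	return &toy_pool[toy_pool_used++];
}

/*
 * Pick the type for a node: keep a type a driver already assigned,
 * otherwise derive one from the computed adistance, and fall back to
 * the default DRAM type if that allocation fails.
 */
static struct toy_type *toy_pick_type(struct toy_type *driver_type,
				      int hmat_adist)
{
	int adist = TOY_ADIST_DRAM;
	struct toy_type *t;

	if (driver_type)
		return driver_type;
	toy_calc_adistance(hmat_adist, &adist);
	t = toy_find_alloc(adist);
	return t ? t : &toy_default_dram;
}
```

In this model a node without performance data ends up with the DRAM
adistance, and a node with data gets a type keyed by that data. Nodes that
report the same value share one type.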




Re: [External] Re: [PATCH v8 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-28 Thread Ho-Ren (Jack) Chuang
On Thu, Mar 28, 2024 at 5:59 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> > but also prevents holding a large lock simultaneously.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > Reviewed-by: "Huang, Ying" 
> > ---
> >  mm/memory-tiers.c | 94 +++
> >  1 file changed, 78 insertions(+), 16 deletions(-)
> >
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 974af10cfdd8..e24fc3bebae4 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
> >
> >  static DEFINE_MUTEX(memory_tier_lock);
> >  static LIST_HEAD(memory_tiers);
> > +/*
> > + * The list is used to store all memory types that are not created
> > + * by a device driver.
> > + */
> > +static LIST_HEAD(default_memory_types);
> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >  struct memory_dev_type *default_dram_type;
> >
> > @@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion 
> > __read_mostly;
> >
> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> >
> > +/* The lock is used to protect `default_dram_perf*` info and nid. */
> > +static DEFINE_MUTEX(default_dram_perf_lock);
> >  static bool default_dram_perf_error;
> >  static struct access_coordinate default_dram_perf;
> >  static int default_dram_perf_ref_nid = NUMA_NO_NODE;
> > @@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, 
> > struct memory_dev_type *mem
> >  static struct memory_tier *set_node_memory_tier(int node)
> >  {
> >   struct memory_tier *memtier;
> > - struct memory_dev_type *memtype;
> > + struct memory_dev_type *mtype = default_dram_type;
> > + int adist = MEMTIER_ADISTANCE_DRAM;
> >   pg_data_t *pgdat = NODE_DATA(node);
> >
> >
> > @@ -514,11 +522,20 @@ static struct memory_tier *set_node_memory_tier(int 
> > node)
> >   if (!node_state(node, N_MEMORY))
> >   return ERR_PTR(-EINVAL);
> >
> > - __init_node_memory_type(node, default_dram_type);
> > + m

[PATCH v8 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-03-28 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.

Signed-off-by: Ho-Ren (Jack) Chuang 
Reviewed-by: "Huang, Ying" 
---
 drivers/dax/kmem.c   | 20 ++--
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 32 
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
 	struct memory_dev_type *mtype;
 
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
+	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 
 	return mtype;
@@ -77,13 +66,8 @@ static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
+	mt_put_memory_types(&kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 			     const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+							struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
 	return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	return NULL;
+}
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..974af10cfdd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	bool found = false;
+	struct memory_dev_type *mtype;
+
+	list_for_each_entry(mtype, memory_types, list) {
+		if (mtype->adistance == adist) {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		mtype = alloc_memory_type(adist);
+		if (!IS_ERR(mtype))
+			list_add(&mtype->list, memory_types);
+	}
+
+	return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+	struct memory_dev_type *mtype, *mtn;
+
+	list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+		list_del(&mtype->list);
+		put_memory_type(mtype);
+	}
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
 {
 	pr_info(
-- 
Ho-Ren (Jack) Chuang




[PATCH v8 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-28 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids deadlock
but also prevents holding a large lock simultaneously.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a `default_memory_types` list is created to store all memory types that are
not initialized by device drivers, and to serve as a fallback.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
Reviewed-by: "Huang, Ying" 
---
 mm/memory-tiers.c | 94 +++
 1 file changed, 78 insertions(+), 16 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 974af10cfdd8..e24fc3bebae4 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+/* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
 	struct memory_tier *memtier;
-	struct memory_dev_type *memtype;
+	struct memory_dev_type *mtype = default_dram_type;
+	int adist = MEMTIER_ADISTANCE_DRAM;
 	pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,11 +522,20 @@ static struct memory_tier *set_node_memory_tier(int node)
 	if (!node_state(node, N_MEMORY))
 		return ERR_PTR(-EINVAL);
 
-	__init_node_memory_type(node, default_dram_type);
+	mt_calc_adistance(node, &adist);
+	if (node_memory_types[node].memtype == NULL) {
+		mtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+		if (IS_ERR(mtype)) {
+			mtype = default_dram_type;
+			pr_info("Failed to allocate a memory type. Fall back.\n");
+		}
+	}
+
+	__init_node_memory_type(node, mtype);
 
-	memtype = node_memory_types[node].memtype;
-	node_set(node, memtype->nodes);
-	memtier = find_create_memory_tier(memtype);
+	mtype = node_memory_types[node].memtype;
+	node_set(node, mtype->nodes);
+	memtier = find_create_memory_tier(mtype);
 	if (!IS_ERR(memtier))
 		rcu_assign_pointer(pgdat->memtier, memtier);
 	return memtier;
@@ -655,6 +672,34 @@ void mt_put_memory_types(struct list_head *memory_types)
 }
 EXPORT_SYMBOL_GPL(mt_put_memory_types);
 
+/*
+ * This is invoked via `late_initcall(

[PATCH v8 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-28 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, and pages cannot
be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v8:
 * Fix email format
-v7:
 * Add Reviewed-by: Huang, Ying
-v6:
 Thanks to Ying's comments,
 * Move `default_dram_perf_lock` to the function's beginning for clarity
 * Fix double unlocking in v5
 * https://lore.kernel.org/lkml/20240327072729.3381685-1-horenchu...@bytedance.com/T/#u
-v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
 * https://lore.kernel.org/lkml/20240327041646.3258110-1-horenchu...@bytedance.com/T/#u
-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u


Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +-
 include/linux/memory-tiers.h |  13 
 mm/memory-tiers.c| 126 ++-
 3 files changed, 125 insertions(+), 34 deletions(-)

-- 
Ho-Ren (Jack) Chuang




[PATCH v7 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-28 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, and pages cannot
be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v7:
 * Add Reviewed-by: Huang, Ying
-v6:
 Thanks to Ying's comments,
 * Move `default_dram_perf_lock` to the function's beginning for clarity
 * Fix double unlocking in v5
 * https://lore.kernel.org/lkml/20240327072729.3381685-1-horenchu...@bytedance.com/T/#u
-v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
 * https://lore.kernel.org/lkml/20240327041646.3258110-1-horenchu...@bytedance.com/T/#u
-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +-
 include/linux/memory-tiers.h |  13 
 mm/memory-tiers.c| 126 ++-
 3 files changed, 125 insertions(+), 34 deletions(-)

-- 
Ho-Ren (Jack) Chuang
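
The v5 and v6 notes above concern locking around the default DRAM
performance data. A toy userspace model may help show why a separate lock
matters: a code path that already holds one non-recursive lock cannot take
that same lock again. This is a sketch under stated assumptions, not the
kernel's locking. `struct toy_lock` only records whether it is held and
reports a second acquisition instead of blocking. All `toy_*` names are
hypothetical, and the two lock variables merely stand in for
`memory_tier_lock` and `default_dram_perf_lock`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquisition is reported, not blocked on. */
struct toy_lock {
	bool held;
};

static int toy_lock(struct toy_lock *l)
{
	if (l->held)
		return -1;	/* a real mutex would self-deadlock here */
	l->held = true;
	return 0;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock tier_lock;	/* stands in for memory_tier_lock */
static struct toy_lock dram_perf_lock;	/* stands in for default_dram_perf_lock */
static int dram_perf = 100;		/* arbitrary stand-in for default_dram_perf */

/* Read the perf data under perf_lock; -1 if that lock is already held. */
static int toy_calc_adistance(struct toy_lock *perf_lock, int *adist)
{
	if (toy_lock(perf_lock))
		return -1;
	*adist = dram_perf;
	toy_unlock(perf_lock);
	return 0;
}

/*
 * Walk nr_nodes nodes while holding tier_lock and compute an adistance
 * for each one, guarding the perf data with perf_lock. Returns how many
 * computations failed because perf_lock was already held.
 */
static int toy_late_init(struct toy_lock *perf_lock, int nr_nodes)
{
	int failed = 0;
	int adist;

	toy_lock(&tier_lock);
	for (int nid = 0; nid < nr_nodes; nid++)
		if (toy_calc_adistance(perf_lock, &adist))
			failed++;
	toy_unlock(&tier_lock);
	return failed;
}
```

Passing `&tier_lock` as the perf lock models the single-lock design, where
every computation inside the walk fails. Passing `&dram_perf_lock` models the
split introduced in this series, where none do.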




Re: [External] Re: [PATCH v6 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-28 Thread Ho-Ren (Jack) Chuang
On Wed, Mar 27, 2024 at 6:37 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> [snip]
>
> > @@ -655,6 +672,34 @@ void mt_put_memory_types(struct list_head *memory_types)
> >  }
> >  EXPORT_SYMBOL_GPL(mt_put_memory_types);
> >
> > +/*
> > + * This is invoked via `late_initcall()` to initialize memory tiers for
> > + * CPU-less memory nodes after driver initialization, which is
> > + * expected to provide `adistance` algorithms.
> > + */
> > +static int __init memory_tier_late_init(void)
> > +{
> > + int nid;
> > +
> > + mutex_lock(&memory_tier_lock);
> > + for_each_node_state(nid, N_MEMORY)
> > + if (!node_state(nid, N_CPU) &&
> > + node_memory_types[nid].memtype == NULL)
>
> Think about this again.  It seems that it is better to check
> "node_memory_types[nid].memtype == NULL" only here.  Because for all
> node with N_CPU in memory_tier_init(), "node_memory_types[nid].memtype"
> will be !NULL.  And it's possible (in theory) that some nodes becomes
> "node_state(nid, N_CPU) == true" between memory_tier_init() and
> memory_tier_late_init().
>
> Otherwise, Looks good to me.  Feel free to add
>
> Reviewed-by: "Huang, Ying" 
>
> in the future version.
>

Thank you Huang, Ying for your endorsement and
the feedback you've been giving!

> > + /*
> > +  * Some device drivers may have initialized memory tiers
> > +  * between `memory_tier_init()` and `memory_tier_late_init()`,
> > +  * potentially bringing online memory nodes and
> > +  * configuring memory tiers. Exclude them here.
> > +  */
> > + set_node_memory_tier(nid);
> > +
> > + establish_demotion_targets();
> > + mutex_unlock(&memory_tier_lock);
> > +
> > + return 0;
> > +}
> > +late_initcall(memory_tier_late_init);
> > +
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying



-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任
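
Ying's suggestion in this message is that late init should decide purely on
"memtype is still NULL" rather than also testing for a CPU. It can be checked
with a small userspace model. This is a sketch under assumptions:
`struct toy_node` and `toy_late_init_count()` are hypothetical, and the
boolean fields stand in for `node_state(nid, N_MEMORY)`,
`node_state(nid, N_CPU)` and `node_memory_types[nid].memtype`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node snapshot taken at late-init time. */
struct toy_node {
	bool has_memory;	/* stands in for node_state(nid, N_MEMORY) */
	bool has_cpu;		/* stands in for node_state(nid, N_CPU) */
	const void *memtype;	/* stands in for node_memory_types[nid].memtype */
};

/*
 * Count the nodes that late init would set up when the only test is
 * "memtype is still NULL". has_cpu is deliberately not consulted, so a
 * node that gained a CPU after early init is still picked up.
 */
static int toy_late_init_count(const struct toy_node *nodes, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].has_memory)
			continue;
		if (nodes[i].memtype == NULL)
			count++;
	}
	return count;
}
```

With a CPU test added, a node that became CPU-bearing between the two init
stages while its memtype was still NULL would be skipped. Dropping the test
closes that gap.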



[PATCH v6 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-27 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids deadlock
but also prevents holding a large lock simultaneously.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default memory type is created for storing all memory types that are
not initialized by device drivers and as a fallback.
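
The locking concern described in the `default_dram_perf_lock` item above can be illustrated with a single-threaded toy model. Everything here is an assumption for illustration: `struct toy_lock`, `read_default_perf()`, and `walk_nodes()` are hypothetical names, and the toy lock reports contention instead of blocking so that the self-deadlock case is observable as an error return.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire fails instead of blocking. */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock tier_lock; /* models memory_tier_lock */
static struct toy_lock perf_lock; /* models default_dram_perf_lock */
static int default_perf = 100;    /* placeholder value */

/* Read the default perf data under `lock`; -1 means "would deadlock". */
static int read_default_perf(struct toy_lock *lock, int *out)
{
	if (!toy_trylock(lock))
		return -1;
	*out = default_perf;
	toy_unlock(lock);
	return 0;
}

/* Walk nodes under tier_lock, calling the helper with the given inner lock. */
static int walk_nodes(struct toy_lock *inner)
{
	int perf = 0;
	int ret;

	if (!toy_trylock(&tier_lock))
		return -1;
	ret = read_default_perf(inner, &perf);
	toy_unlock(&tier_lock);
	return ret;
}
```

Passing `&tier_lock` as the inner lock models the problematic nesting; passing `&perf_lock` models the split proposed in the patch description.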

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
---
 mm/memory-tiers.c | 94 +++
 1 file changed, 78 insertions(+), 16 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 974af10cfdd8..e24fc3bebae4 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+/* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
 	struct memory_tier *memtier;
-	struct memory_dev_type *memtype;
+	struct memory_dev_type *mtype = default_dram_type;
+	int adist = MEMTIER_ADISTANCE_DRAM;
 	pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,11 +522,20 @@ static struct memory_tier *set_node_memory_tier(int node)
 	if (!node_state(node, N_MEMORY))
 		return ERR_PTR(-EINVAL);
 
-	__init_node_memory_type(node, default_dram_type);
+	mt_calc_adistance(node, &adist);
+	if (node_memory_types[node].memtype == NULL) {
+		mtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+		if (IS_ERR(mtype)) {
+			mtype = default_dram_type;
+			pr_info("Failed to allocate a memory type. Fall back.\n");
+		}
+	}
+
+	__init_node_memory_type(node, mtype);
 
-	memtype = node_memory_types[node].memtype;
-	node_set(node, memtype->nodes);
-	memtier = find_create_memory_tier(memtype);
+	mtype = node_memory_types[node].memtype;
+	node_set(node, mtype->nodes);
+	memtier = find_create_memory_tier(mtype);
 	if (!IS_ERR(memtier))
 		rcu_assign_pointer(pgdat->memtier, memtier);
 	return memtier;
@@ -655,6 +672,34 @@ void mt_put_memory_types(struct list_head *memory_types)
 }
 EXPORT_SYMBOL_GPL(mt_put_memory_types);
 
+/*
+ * This is invoked via `late_initcall()` to initialize memory tiers for
+ * CP

[PATCH v6 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-03-27 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.

Signed-off-by: Ho-Ren (Jack) Chuang 
---
 drivers/dax/kmem.c   | 20 ++--
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 32 
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
 	struct memory_dev_type *mtype;
 
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
+	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 
 	return mtype;
@@ -77,13 +66,8 @@ static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
+	mt_put_memory_types(&kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 			     const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+							struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
 	return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	return NULL;
+}
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..974af10cfdd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	bool found = false;
+	struct memory_dev_type *mtype;
+
+	list_for_each_entry(mtype, memory_types, list) {
+		if (mtype->adistance == adist) {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		mtype = alloc_memory_type(adist);
+		if (!IS_ERR(mtype))
+			list_add(&mtype->list, memory_types);
+	}
+
+	return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+	struct memory_dev_type *mtype, *mtn;
+
+	list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+		list_del(&mtype->list);
+		put_memory_type(mtype);
+	}
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
 {
 	pr_info(
-- 
Ho-Ren (Jack) Chuang
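
For readers without a kernel tree at hand, the find-or-allocate and put-all pattern that the patch above factors out can be approximated in plain C. This is a sketch under assumptions, not the kernel code: `struct mem_type`, `find_alloc_type()`, and `put_types()` are hypothetical names, a singly linked list replaces `struct list_head`, allocation failure is reported as `NULL` rather than `ERR_PTR()`, and `free()` stands in for the reference-counted `put_memory_type()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct memory_dev_type. */
struct mem_type {
	int adistance;
	struct mem_type *next;
};

/*
 * Return the list entry whose adistance matches `adist`. If none exists,
 * allocate one, link it at the head of the list, and return it.
 * Returns NULL if the allocation fails.
 */
static struct mem_type *find_alloc_type(struct mem_type **head, int adist)
{
	struct mem_type *t;

	for (t = *head; t; t = t->next)
		if (t->adistance == adist)
			return t;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->adistance = adist;
	t->next = *head;
	*head = t;
	return t;
}

/* Unlink and free every entry; return how many entries were released. */
static int put_types(struct mem_type **head)
{
	int released = 0;

	while (*head) {
		struct mem_type *t = *head;

		*head = t->next;
		free(t);
		released++;
	}
	return released;
}
```

As in the patch, the helpers take the list as a parameter and do no locking themselves, so each caller keeps its own list and serializes access with its own lock.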




[PATCH v6 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-27 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with
different attributes cannot be assigned to the correct memory tier,
so pages cannot be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.
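
The fallback described above amounts to starting from the DRAM default and overriding it only when performance data is available. A minimal sketch, assuming a hypothetical `struct node_perf` in place of HMAT-derived data and a placeholder value for the DRAM abstract distance (the kernel's `MEMTIER_ADISTANCE_DRAM` is not reproduced here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder only; not the kernel's MEMTIER_ADISTANCE_DRAM definition. */
#define ADIST_DRAM 576

/* Hypothetical summary of what firmware reported for one node. */
struct node_perf {
	bool valid;    /* false models "no HMAT information" */
	int adistance; /* abstract distance derived from the perf data */
};

/*
 * Start from the DRAM default and override it only when valid
 * performance data exists for the node.
 */
static int node_adistance(const struct node_perf *perf)
{
	int adist = ADIST_DRAM;

	if (perf != NULL && perf->valid)
		adist = perf->adistance;
	return adist;
}
```

A node with no data, or with data marked invalid, therefore lands in the same tier as default DRAM.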

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v6:
 Thanks to Ying's comments,
 * Move `default_dram_perf_lock` to the function's beginning for clarity
 * Fix double unlocking in v5
-v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
 * https://lore.kernel.org/lkml/20240327041646.3258110-1-horenchu...@bytedance.com/T/#u
-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +-
 include/linux/memory-tiers.h |  13 
 mm/memory-tiers.c| 126 ++-
 3 files changed, 125 insertions(+), 34 deletions(-)

-- 
Ho-Ren (Jack) Chuang




Re: [External] Re: [PATCH v5 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-27 Thread Ho-Ren (Jack) Chuang
On Tue, Mar 26, 2024 at 10:52 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> > but also prevents holding a large lock simultaneously. Besides, this patch
> > slightly shortens the time holding the lock by putting the lock closer to
> > what it protects as well.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > * Fix a deadlock bug in `mt_perf_to_adistance`
> > Because an error path was not handled properly in `mt_perf_to_adistance`,
> > unlock before returning the error.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > ---
> >  mm/memory-tiers.c | 85 +++
> >  1 file changed, 72 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 974af10cfdd8..610db9581ba4 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
> >
> >  static DEFINE_MUTEX(memory_tier_lock);
> >  static LIST_HEAD(memory_tiers);
> > +/*
> > + * The list is used to store all memory types that are not created
> > + * by a device driver.
> > + */
> > +static LIST_HEAD(default_memory_types);
> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >  struct memory_dev_type *default_dram_type;
> >
> > @@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
> >
> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> >
> > +/* The lock is used to protect `default_dram_perf*` info and nid. */
> > +static DEFINE_MUTEX(default_dram_perf_lock);
> >  static bool default_dram_perf_error;
> >  static struct access_coordinate default_dram_perf;
> >  static int default_dram_perf_ref_nid = NUMA_NO_NODE;
> > @@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
> >  static struct memory_tier *set_node_memory_tier(int node)
> >  {
> >   struct memory_tier *memtier;
> > - struct memory_dev_type *memtype;
> > + struct memory_dev_type *mtype = default_dram_type;
> > + int adist = MEMTIER_ADISTANCE_DRAM;
> >   pg_data_t *pgdat = NODE_DATA(node);
> >
> >

[PATCH v5 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-26 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids deadlock
but also prevents holding a large lock simultaneously. Besides, this patch
slightly shortens the time holding the lock by putting the lock closer to
what it protects as well.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default memory type is created for storing all memory types that are
not initialized by device drivers and as a fallback.

* Fix a deadlock bug in `mt_perf_to_adistance`
Because an error path was not handled properly in `mt_perf_to_adistance`,
unlock before returning the error.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
---
 mm/memory-tiers.c | 85 +++
 1 file changed, 72 insertions(+), 13 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 974af10cfdd8..610db9581ba4 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,8 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+/* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +512,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
 	struct memory_tier *memtier;
-	struct memory_dev_type *memtype;
+	struct memory_dev_type *mtype = default_dram_type;
+	int adist = MEMTIER_ADISTANCE_DRAM;
 	pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,11 +522,20 @@ static struct memory_tier *set_node_memory_tier(int node)
 	if (!node_state(node, N_MEMORY))
 		return ERR_PTR(-EINVAL);
 
-	__init_node_memory_type(node, default_dram_type);
+	mt_calc_adistance(node, &adist);
+	if (node_memory_types[node].memtype == NULL) {
+		mtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+		if (IS_ERR(mtype)) {
+			mtype = default_dram_type;
+			pr_info("Failed to allocate a memory type. Fall back.\n");
+		}
+	}
 
-	memtype = node_memory_types[node].memtype;
-	node_set(node, memtype->nodes);
-	memtier = find_create_memory_tier(memtype);
+	__init_node_memory_type(node, mtype);
+
+	mtype = node_memory_types[node].memtype;
+	node_set(node, mtype->nodes);
+	memtier = find_create_memory_tier(mtype);
 	if (!IS_ERR(memtier))
rc

[PATCH v5 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-03-26 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.

Signed-off-by: Ho-Ren (Jack) Chuang 
---
 drivers/dax/kmem.c   | 20 ++--
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 32 
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
 	struct memory_dev_type *mtype;
 
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
+	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 
 	return mtype;
@@ -77,13 +66,8 @@ static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
+	mt_put_memory_types(&kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 			     const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+							struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
 	return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	return NULL;
+}
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..974af10cfdd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	bool found = false;
+	struct memory_dev_type *mtype;
+
+	list_for_each_entry(mtype, memory_types, list) {
+		if (mtype->adistance == adist) {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		mtype = alloc_memory_type(adist);
+		if (!IS_ERR(mtype))
+			list_add(&mtype->list, memory_types);
+	}
+
+	return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+	struct memory_dev_type *mtype, *mtn;
+
+	list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+		list_del(&mtype->list);
+		put_memory_type(mtype);
+	}
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
 {
 	pr_info(
-- 
Ho-Ren (Jack) Chuang




[PATCH v5 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-26 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with
different attributes cannot be assigned to the correct memory tier,
so pages cannot be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v5:
 Thanks to Ying's comments,
 * Add comments about what is protected by `default_dram_perf_lock`
 * Fix an uninitialized pointer mtype
 * Slightly shorten the time holding `default_dram_perf_lock`
 * Fix a deadlock bug in `mt_perf_to_adistance`
-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
 * https://lore.kernel.org/lkml/20240322070356.315922-1-horenchu...@bytedance.com/T/#u
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u


Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +-
 include/linux/memory-tiers.h |  13 
 mm/memory-tiers.c| 117 +++
 3 files changed, 119 insertions(+), 31 deletions(-)

-- 
Ho-Ren (Jack) Chuang




Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-26 Thread Ho-Ren (Jack) Chuang
On Mon, Mar 25, 2024 at 8:08 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying  wrote:
> >>
> >> "Ho-Ren (Jack) Chuang"  writes:
> >>
> >> > The current implementation treats emulated memory devices, such as
> >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> >> > (E820_TYPE_RAM). However, these emulated devices have different
> >> > characteristics than traditional DRAM, making it important to
> >> > distinguish them. Thus, we modify the tiered memory initialization process
> >> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> >> > ensures that the memory tier initialization for these nodes is deferred
> >> > until HMAT information is obtained during the boot process. Finally,
> >> > demotion tables are recalculated at the end.
> >> >
> >> > * late_initcall(memory_tier_late_init);
> >> > Some device drivers may have initialized memory tiers between
> >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> >> > online memory nodes and configuring memory tiers. They should be excluded
> >> > in the late init.
> >> >
> >> > * Handle cases where there is no HMAT when creating memory tiers
> >> > There is a scenario where a CPUless node does not provide HMAT information.
> >> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >> >
> >> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
> >> > In the current implementation, iterating through CPUless nodes requires
> >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> >> > trying to acquire the same lock, leading to a potential deadlock.
> >> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
> >> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> >> > but also prevents holding a large lock simultaneously.
> >> >
> >> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >> >   default DRAM, late CPUless, and hot-plugged initializations.
> >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> >> > handle cases where memtype is not initialized and where HMAT information is
> >> > available.
> >> >
> >> > * Introduce `default_memory_types` for those memory types that are not
> >> >   initialized by device drivers.
> >> > Because late initialized memory and default DRAM memory need to be managed,
> >> > a default memory type is created for storing all memory types that are
> >> > not initialized by device drivers and as a fallback.
> >> >
> >> > Signed-off-by: Ho-Ren (Jack) Chuang 
> >> > Signed-off-by: Hao Xiang 
> >> > ---
> >> >  mm/memory-tiers.c | 73 ---
> >> >  1 file changed, 63 insertions(+), 10 deletions(-)
> >> >
> >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> >> > index 974af10cfdd8..9396330fa162 100644
> >> > --- a/mm/memory-tiers.c
> >> > +++ b/mm/memory-tiers.c
> >> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
> >> >
> >> >  static DEFINE_MUTEX(memory_tier_lock);
> >> >  static LIST_HEAD(memory_tiers);
> >> > +/*
> >> > + * The list is used to store all memory types that are not created
> >> > + * by a device driver.
> >> > + */
> >> > +static LIST_HEAD(default_memory_types);
> >> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >> >  struct memory_dev_type *default_dram_type;
> >> >
> >> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion __read_mostly;
> >> >
> >> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> >> >
> >> > +static DEFINE_MUTEX(default_dram_perf_lock);
> >>
> >> Better to add comments about what is protected by this lock.
> >>
> >
> > Thank you. I will add 

Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-22 Thread Ho-Ren (Jack) Chuang
On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> > but also prevents holding a large lock simultaneously.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > ---
> >  mm/memory-tiers.c | 73 ---
> >  1 file changed, 63 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 974af10cfdd8..9396330fa162 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
> >
> >  static DEFINE_MUTEX(memory_tier_lock);
> >  static LIST_HEAD(memory_tiers);
> > +/*
> > + * The list is used to store all memory types that are not created
> > + * by a device driver.
> > + */
> > +static LIST_HEAD(default_memory_types);
> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >  struct memory_dev_type *default_dram_type;
> >
> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion __read_mostly;
> >
> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> >
> > +static DEFINE_MUTEX(default_dram_perf_lock);
>
> Better to add comments about what is protected by this lock.
>

Thank you. I will add a comment like this:
+ /* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);

I also found an error path that was not handled, and
the lock could be placed closer to the data it protects.
I will fix both in V5.

> >  static bool default_dram_perf_error;
> >  static struct access_coordinate default_dram_perf;
> >  static int default_dram_perf_ref_nid = NUMA_NO_NODE;
> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
> >  static struct memory_tier *set_node_memory_tier(int node)
> >  {
> >   struct memory_tier *memtier;
> > - struct memory_dev_type *memtype;
> > + struct memory_dev_type *mtype;
>
> mtype may be referenced without initialization now below.
>

Good catch! Thank you.

Please check below.
I may have found a potential NULL pointer dereference.

[PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-22 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.
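
As a rough illustration of the exclusion rule above (a userspace sketch, not
the code in this patch): the late pass only touches CPU-less nodes whose
memory type is still unset, so anything a driver configured earlier is left
alone. `struct mem_type`, `struct node_state`, and `late_init_cpuless()` are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct memory_dev_type. */
struct mem_type { int adistance; };

/* Per-node state; a non-NULL memtype means the node is already set up. */
struct node_state {
	int has_cpu;              /* 1 if the node has CPUs */
	struct mem_type *memtype; /* NULL until a memory type is assigned */
};

/*
 * Late pass: assign `fallback` to every CPU-less node that still has no
 * memory type. Returns the number of nodes initialized by this pass.
 */
static int late_init_cpuless(struct node_state *nodes, int n,
			     struct mem_type *fallback)
{
	int done = 0;

	assert(nodes != NULL);
	for (int i = 0; i < n; i++) {
		if (nodes[i].has_cpu)
			continue; /* handled by the early init */
		if (nodes[i].memtype != NULL)
			continue; /* a driver got there first: leave it alone */
		nodes[i].memtype = fallback;
		done++;
	}
	return done;
}
```

In the real code the type assigned in the late pass would come from the
adistance calculation rather than a single fixed fallback; the sketch only
shows which nodes are skipped.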

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Introduce another new lock `default_dram_perf_lock` for adist calculation
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `default_dram_perf_lock` to
protect `default_dram_perf_*`. This approach not only avoids deadlock
but also prevents holding a large lock simultaneously.
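
To make the locking argument concrete, here is a toy userspace model (an
assumption-laden sketch, not kernel code). `struct toy_lock` is a
hypothetical non-recursive lock that reports a second acquisition on the same
path instead of blocking, so the self-deadlock shows up as a `false` return.
`walk_nodes()` plays the role of the node iteration and `calc_adistance()`
the role of `mt_calc_adistance()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire fails instead of hanging. */
struct toy_lock { bool held; };

static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false; /* a real mutex would block forever here */
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for the adistance calculation, which needs `perf_lock`. */
static bool calc_adistance(struct toy_lock *perf_lock, int *adist)
{
	if (!toy_lock_acquire(perf_lock))
		return false;
	*adist = 100; /* arbitrary illustrative value */
	toy_lock_release(perf_lock);
	return true;
}

/*
 * Walk nodes under `tier_lock` and compute the adistance under `perf_lock`.
 * Returns false if the nested acquisition would deadlock.
 */
static bool walk_nodes(struct toy_lock *tier_lock, struct toy_lock *perf_lock)
{
	int adist = 0;
	bool ok;

	if (!toy_lock_acquire(tier_lock))
		return false;
	ok = calc_adistance(perf_lock, &adist);
	toy_lock_release(tier_lock);
	return ok;
}
```

Passing the same lock for both roles models the old scheme and fails; passing
two distinct locks models the standalone `default_dram_perf_lock` and
succeeds.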

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.
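
The selection order described above can be sketched in userspace as follows.
This is only a model under stated assumptions: `struct mem_type` and
`pick_mem_type()` are hypothetical, and a failed lookup is represented by
NULL here, whereas the kernel helpers return an `ERR_PTR`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct memory_dev_type. */
struct mem_type { int adistance; };

/*
 * Pick the memory type for a node.
 *  - `preset`: type already set for the node (e.g. by a driver), or NULL.
 *  - `found`:  result of a find-or-allocate lookup, or NULL if it failed.
 *  - `dram`:   default DRAM type used as the last-resort fallback.
 */
static struct mem_type *pick_mem_type(struct mem_type *preset,
				      struct mem_type *found,
				      struct mem_type *dram)
{
	assert(dram != NULL);
	if (preset)
		return preset; /* keep what was configured earlier */
	if (found)
		return found;  /* type derived from the computed adistance */
	return dram;           /* lookup failed: fall back to default DRAM */
}
```

Every branch returns a defined pointer, which is the property a caller needs
before using the result to initialize the node.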

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default memory type is created for storing all memory types that are
not initialized by device drivers and as a fallback.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
---
 mm/memory-tiers.c | 73 ---
 1 file changed, 63 insertions(+), 10 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 974af10cfdd8..9396330fa162 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -36,6 +36,11 @@ struct node_memory_type_map {
 
 static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
+/*
+ * The list is used to store all memory types that are not created
+ * by a device driver.
+ */
+static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion __read_mostly;
 
 static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
 
+static DEFINE_MUTEX(default_dram_perf_lock);
 static bool default_dram_perf_error;
 static struct access_coordinate default_dram_perf;
 static int default_dram_perf_ref_nid = NUMA_NO_NODE;
@@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem
 static struct memory_tier *set_node_memory_tier(int node)
 {
 	struct memory_tier *memtier;
-	struct memory_dev_type *memtype;
+	struct memory_dev_type *mtype;
+	int adist = MEMTIER_ADISTANCE_DRAM;
 	pg_data_t *pgdat = NODE_DATA(node);
 
 
@@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tier(int node)
 	if (!node_state(node, N_MEMORY))
 		return ERR_PTR(-EINVAL);
 
-	__init_node_memory_type(node, default_dram_type);
+	mt_calc_adistance(node, &adist);
+	if (node_memory_types[node].memtype == NULL) {
+		mtype = mt_find_alloc_memory_type(adist, &default_memory_types);
+		if (IS_ERR(mtype)) {
+			mtype = default_dram_type;
+			pr_info("Failed to allocate a memory type. Fall back.\n");
+		}
+	}
 
-	memtype = node_memory_types[node].memtype;
-	node_set(node, memtype->nodes);
-	memtier = find_create_memory_tier(memtype);
+	__init_node_memory_type(node, mtype);
+
+	mtype = node_memory_types[node].memtype;
+	node_set(node, mtype->nodes);
+	memtier = find_create_memory_tier(mtype);
 	if (!IS_ERR(memtier))
 		rcu_assign_pointer(pgdat->memtier, memtier);
 	return memtier;
@@ -655,6 +671,34 @@ void mt_put_memory_types(struct list_head *memory_types)
 }
 EXPORT_SYMBOL_GPL(mt_put_memory_types);
 
+/*
+ * This is invoked via `late_initcall()` to initialize memory tiers for
+ * CPU-less memory nodes after driver initialization, which is
+ * expected to provide `ad

[PATCH v4 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types

2024-03-22 Thread Ho-Ren (Jack) Chuang
Since different memory devices require finding, allocating, and putting
memory types, these common steps are abstracted in this patch,
enhancing the scalability and conciseness of the code.
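
For readers unfamiliar with the pattern, a minimal userspace sketch of
"find or allocate, then put" is shown here. It is not the kernel code: it
uses a plain singly linked list instead of `struct list_head`, returns NULL
on allocation failure instead of an `ERR_PTR`, and the names
`find_alloc_mem_type()` and `put_mem_types()` are made up for the example.
As in the patch, the caller owns the list and is responsible for any locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct memory_dev_type. */
struct mem_type {
	int adistance;
	struct mem_type *next;
};

/*
 * Return the type with the given adistance from the caller's list,
 * allocating and linking a new one if none exists. Returns NULL when
 * the allocation fails.
 */
static struct mem_type *find_alloc_mem_type(int adist, struct mem_type **head)
{
	struct mem_type *t;

	assert(head != NULL);
	for (t = *head; t; t = t->next)
		if (t->adistance == adist)
			return t; /* reuse the existing type */

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->adistance = adist;
	t->next = *head; /* push onto the caller's list */
	*head = t;
	return t;
}

/* Release every type on the list and leave the list empty. */
static void put_mem_types(struct mem_type **head)
{
	while (*head) {
		/* Save the successor before freeing the current entry. */
		struct mem_type *next = (*head)->next;

		free(*head);
		*head = next;
	}
}
```

Taking the list as a parameter is what lets several users (dax/kmem and the
memory-tier core in this series) share one helper while keeping separate
lists.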

Signed-off-by: Ho-Ren (Jack) Chuang 
---
 drivers/dax/kmem.c   | 20 ++--
 include/linux/memory-tiers.h | 13 +
 mm/memory-tiers.c| 32 
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
 	struct memory_dev_type *mtype;
 
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
+	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 
 	return mtype;
@@ -77,13 +66,8 @@ static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
+	mt_put_memory_types(&kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 			     const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+							struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
 	return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	return NULL;
+}
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..974af10cfdd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -623,6 +623,38 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	bool found = false;
+	struct memory_dev_type *mtype;
+
+	list_for_each_entry(mtype, memory_types, list) {
+		if (mtype->adistance == adist) {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		mtype = alloc_memory_type(adist);
+		if (!IS_ERR(mtype))
+			list_add(&mtype->list, memory_types);
+	}
+
+	return mtype;
+}
+EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+	struct memory_dev_type *mtype, *mtn;
+
+	list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+		list_del(&mtype->list);
+		put_memory_type(mtype);
+	}
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
 {
 	pr_info(
-- 
Ho-Ren (Jack) Chuang




[PATCH v4 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-22 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, so pages cannot
be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v4:
 Thanks to Ying's comments,
 * Remove redundant code
 * Reorganize patches accordingly
-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
 * https://lore.kernel.org/lkml/20240320061041.3246828-1-horenchu...@bytedance.com/T/#u
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: introduce an abstract layer for finding,
allocating, and putting memory types
  memory tier: create CPUless memory tiers after obtaining HMAT info

 drivers/dax/kmem.c   |  20 +--
 include/linux/memory-tiers.h |  13 +
 mm/memory-tiers.c| 105 +++
 3 files changed, 110 insertions(+), 28 deletions(-)

-- 
Ho-Ren (Jack) Chuang




Re: [External] Re: [PATCH v3 1/2] memory tier: dax/kmem: create CPUless memory tiers after obtaining HMAT info

2024-03-20 Thread Ho-Ren (Jack) Chuang
On Wed, Mar 20, 2024 at 12:15 AM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > More details:
>
> You have done several things in one patch.  So you need "more details".
> You may separate them into multiple patches.  One for each "*" below.
> But I have no strong opinion on that.
>
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Abstract common functions into `mt_find_alloc_memory_type()`
> > Since different memory devices require finding or allocating a memory type,
> > these common steps are abstracted into a single function,
> > `mt_find_alloc_memory_type()`, enhancing code scalability and conciseness.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Change adist calculation code to use another new lock, `mt_perf_lock`.
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `mt_perf_lock` to protect
> > `default_dram_perf`. This approach not only avoids deadlock but also
> > prevents holding a large lock simultaneously.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > ---
> >  drivers/dax/kmem.c   | 13 +
> >  include/linux/memory-tiers.h |  7 +++
> >  mm/memory-tiers.c| 94 +---
> >  3 files changed, 95 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 42ee360cf4e3..de1333aa7b3e 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
> >
> >  static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
> >  {
> > - bool found = false;
> >   struct memory_dev_type *mtype;
> >
> >   mutex_lock(&kmem_memory_type_lock);
> > - list_for_each_entry(mtype, &kmem_memory_types, list) {
> > - if (mtype->adistance == adist) {
> > - found = true;
> > - break;
> > - }
> > - }
> > - if (!found) {
> > - mtype = alloc_memory_type(adist);
> > - if (!IS_ERR(mtype))
> > - list_add(&mtype->list, &kmem_memory_types);
> > - }
> > + mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
> >   mutex_unlock(&kmem_memory_type_lock);
> >
> >   return mtype;
>
> It seems that there's some miscommunication about my previous comments
> about this.  What I suggested is to create one separate patch, which
> moves mt_find_alloc_memory_t

[PATCH v3 2/2] memory tier: dax/kmem: abstract memory types put

2024-03-20 Thread Ho-Ren (Jack) Chuang
Abstract `kmem_put_memory_types()` into `mt_put_memory_types()` to
accommodate various memory types and enhance flexibility,
similar to `mt_find_alloc_memory_type()`.

Signed-off-by: Ho-Ren (Jack) Chuang 
---
 drivers/dax/kmem.c   |  7 +--
 include/linux/memory-tiers.h |  6 ++
 mm/memory-tiers.c| 12 
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index de1333aa7b3e..01399e5b53b2 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -66,13 +66,8 @@ static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 
 static void kmem_put_memory_types(void)
 {
-	struct memory_dev_type *mtype, *mtn;
-
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
-		list_del(&mtype->list);
-		put_memory_type(mtype);
-	}
+	mt_put_memory_types(&kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 }
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index b2135334ac18..a44c03c2ba3a 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -50,6 +50,7 @@ int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
 struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 							struct list_head *memory_types);
+void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -143,5 +144,10 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *m
 {
 	return NULL;
 }
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index d9b96b21b65a..6246c28546ba 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -662,6 +662,18 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *m
 }
 EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type);
 
+
+void mt_put_memory_types(struct list_head *memory_types)
+{
+	struct memory_dev_type *mtype, *mtn;
+
+	list_for_each_entry_safe(mtype, mtn, memory_types, list) {
+		list_del(&mtype->list);
+		put_memory_type(mtype);
+	}
+}
+EXPORT_SYMBOL_GPL(mt_put_memory_types);
+
 /*
  * This is invoked via late_initcall() to create
  * CPUless memory tiers after HMAT info is ready or
-- 
Ho-Ren (Jack) Chuang




[PATCH v3 1/2] memory tier: dax/kmem: create CPUless memory tiers after obtaining HMAT info

2024-03-20 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

More details:
* late_initcall(memory_tier_late_init);
Some device drivers may have initialized memory tiers between
`memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
online memory nodes and configuring memory tiers. They should be excluded
in the late init.

* Abstract common functions into `mt_find_alloc_memory_type()`
Since different memory devices require finding or allocating a memory type,
these common steps are abstracted into a single function,
`mt_find_alloc_memory_type()`, enhancing code scalability and conciseness.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Change adist calculation code to use another new lock, `mt_perf_lock`.
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `mt_perf_lock` to protect
`default_dram_perf`. This approach not only avoids deadlock but also
prevents holding a large lock simultaneously.

* Upgrade `set_node_memory_tier` to support additional cases, including
  default DRAM, late CPUless, and hot-plugged initializations.
To cover hot-plugged memory nodes, `mt_calc_adistance()` and
`mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
handle cases where memtype is not initialized and where HMAT information is
available.

* Introduce `default_memory_types` for those memory types that are not
  initialized by device drivers.
Because late initialized memory and default DRAM memory need to be managed,
a default memory type is created for storing all memory types that are
not initialized by device drivers and as a fallback.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
---
 drivers/dax/kmem.c   | 13 +
 include/linux/memory-tiers.h |  7 +++
 mm/memory-tiers.c| 94 +---
 3 files changed, 95 insertions(+), 19 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..de1333aa7b3e 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-	bool found = false;
 	struct memory_dev_type *mtype;
 
 	mutex_lock(&kmem_memory_type_lock);
-	list_for_each_entry(mtype, &kmem_memory_types, list) {
-		if (mtype->adistance == adist) {
-			found = true;
-			break;
-		}
-	}
-	if (!found) {
-		mtype = alloc_memory_type(adist);
-		if (!IS_ERR(mtype))
-			list_add(&mtype->list, &kmem_memory_types);
-	}
+	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
 	mutex_unlock(&kmem_memory_type_lock);
 
 	return mtype;
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..b2135334ac18 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,8 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 			     const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+struct memory_dev_type *mt_find_alloc_memory_type(int adist,
+							struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +138,10 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
 {
 	return -EIO;
 }
+
+struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
+{
+	return NULL;
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..d9b96b21b65a 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -36,6 +37,11 @@ struct node_memory_type

[PATCH v3 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-20 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with different
attributes cannot be assigned to the correct memory tier, so pages cannot
be migrated between different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
and after all devices are initialized at boot time, eliminating the need
for user intervention. If no HMAT is specified, it falls back to
using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type
-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchu...@bytedance.com/T/#u
-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchu...@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: create CPUless memory tiers after obtaining
HMAT info
  memory tier: dax/kmem: abstract memory types put

 drivers/dax/kmem.c   |  20 +--
 include/linux/memory-tiers.h |  13 +
 mm/memory-tiers.c| 106 ---
 3 files changed, 114 insertions(+), 25 deletions(-)

-- 
Ho-Ren (Jack) Chuang




Re: [External] Re: [PATCH v2 1/1] memory tier: acpi/hmat: create CPUless memory tiers after obtaining HMAT info

2024-03-18 Thread Ho-Ren (Jack) Chuang
I'm working on V3. Thanks for Ying's feedback.

cc: sthanne...@micron.com


On Thu, Mar 14, 2024 at 12:54 AM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > On Tue, Mar 12, 2024 at 2:21 AM Huang, Ying  wrote:
> >>
> >> "Ho-Ren (Jack) Chuang"  writes:
> >>
> >> > The current implementation treats emulated memory devices, such as
> >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> >> > (E820_TYPE_RAM). However, these emulated devices have different
> >> > characteristics than traditional DRAM, making it important to
> >> > distinguish them. Thus, we modify the tiered memory initialization process
> >> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> >> > ensures that the memory tier initialization for these nodes is deferred
> >> > until HMAT information is obtained during the boot process. Finally,
> >> > demotion tables are recalculated at the end.
> >> >
> >> > * Abstract common functions into `find_alloc_memory_type()`
> >>
> >> We should move kmem_put_memory_types() (renamed to
> >> mt_put_memory_types()?) too.  This can be put in a separate patch.
> >>
> >
> > Will do! Thanks,
> >
> >
> >>
> >> > Since different memory devices require finding or allocating a memory type,
> >> > these common steps are abstracted into a single function,
> >> > `find_alloc_memory_type()`, enhancing code scalability and conciseness.
> >> >
> >> > * Handle cases where there is no HMAT when creating memory tiers
> >> > There is a scenario where a CPUless node does not provide HMAT information.
> >> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >> >
> >> > * Change adist calculation code to use another new lock, mt_perf_lock.
> >> > In the current implementation, iterating through CPUless nodes requires
> >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> >> > trying to acquire the same lock, leading to a potential deadlock.
> >> > Therefore, we propose introducing a standalone `mt_perf_lock` to protect
> >> > `default_dram_perf`. This approach not only avoids deadlock but also
> >> > prevents holding a large lock simultaneously.
> >> >
> >> > Signed-off-by: Ho-Ren (Jack) Chuang 
> >> > Signed-off-by: Hao Xiang 
> >> > ---
> >> >  drivers/acpi/numa/hmat.c | 11 ++
> >> >  drivers/dax/kmem.c   | 13 +--
> >> >  include/linux/acpi.h |  6 
> >> >  include/linux/memory-tiers.h |  8 +
> >> >  mm/memory-tiers.c| 70 +---
> >> >  5 files changed, 92 insertions(+), 16 deletions(-)
> >> >
> >> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> >> > index d6b85f0f6082..28812ec2c793 100644
> >> > --- a/drivers/acpi/numa/hmat.c
> >> > +++ b/drivers/acpi/numa/hmat.c
> >> > @@ -38,6 +38,8 @@ static LIST_HEAD(targets);
> >> >  static LIST_HEAD(initiators);
> >> >  static LIST_HEAD(localities);
> >> >
> >> > +static LIST_HEAD(hmat_memory_types);
> >> > +
> >>
> >> HMAT isn't a device driver for some memory devices.  So I don't think we
> >> should manage memory types in HMAT.
> >
> > I can put it back in memory-tier.c. How about the list? Do we still
> > need to keep a separate list for storing late_inited memory nodes?
> > And how about the list name if we need to remove the prefix "hmat_"?
>
> I don't think we need a separate list for memory-less nodes.  Just
> iterate all memory-less nodes.
>

Ok. There is no need to keep a list of memory-less nodes. I will
only keep a memory_type list to manage created memory types.
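
To make the "memory_type list" idea concrete, here is a rough userspace sketch of the find-or-allocate step that `find_alloc_memory_type()` is meant to factor out. It is not the kernel code: `struct mem_type` and `find_alloc_mem_type()` are hypothetical stand-ins, a plain singly linked list replaces `list_head`, and the caller is assumed to hold whatever lock protects the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct memory_dev_type. */
struct mem_type {
	int adistance;
	struct mem_type *next;
};

/*
 * Return the type on *head whose adistance matches adist. If none exists,
 * allocate one, link it at the front of the list and return it.
 * Returns NULL on allocation failure. Locking is left to the caller.
 */
struct mem_type *find_alloc_mem_type(int adist, struct mem_type **head)
{
	struct mem_type *mt;

	assert(head != NULL);

	for (mt = *head; mt; mt = mt->next)
		if (mt->adistance == adist)
			return mt;	/* reuse the existing type */

	mt = malloc(sizeof(*mt));
	if (!mt)
		return NULL;
	mt->adistance = adist;
	mt->next = *head;
	*head = mt;
	return mt;
}
```

With one list per owner (kmem, memory-tiers), two nodes that report the same abstract distance end up sharing a single type object, which is what allows them to land in the same tier.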


> >
> >> Instead, if the memory_type of a
> >> node isn't set by the driver, we should manage it in memory-tier.c as
> >> fallback.
> >>
> >
> > Do you mean some device drivers may init memory tiers between
> > memory_tier_init() and late_initcall(memory_tier_late_init);?
> > And this is the reason why you mention to exclude
> > "node_memory_types[nid].memtype != NULL" in memory_tier_late_init().
> > Is my understanding correct?
>
> Yes.
>

Thanks.

Re: [External] Re: [PATCH v2 1/1] memory tier: acpi/hmat: create CPUless memory tiers after obtaining HMAT info

2024-03-13 Thread Ho-Ren (Jack) Chuang
On Tue, Mar 12, 2024 at 2:21 AM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > * Abstract common functions into `find_alloc_memory_type()`
>
> We should move kmem_put_memory_types() (renamed to
> mt_put_memory_types()?) too.  This can be put in a separate patch.
>

Will do! Thanks,


>
> > Since different memory devices require finding or allocating a memory type,
> > these common steps are abstracted into a single function,
> > `find_alloc_memory_type()`, enhancing code scalability and conciseness.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Change adist calculation code to use another new lock, mt_perf_lock.
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `mt_perf_lock` to protect
> > `default_dram_perf`. This approach not only avoids deadlock but also
> > prevents holding a large lock simultaneously.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > ---
> >  drivers/acpi/numa/hmat.c | 11 ++
> >  drivers/dax/kmem.c   | 13 +--
> >  include/linux/acpi.h |  6 
> >  include/linux/memory-tiers.h |  8 +
> >  mm/memory-tiers.c| 70 +---
> >  5 files changed, 92 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> > index d6b85f0f6082..28812ec2c793 100644
> > --- a/drivers/acpi/numa/hmat.c
> > +++ b/drivers/acpi/numa/hmat.c
> > @@ -38,6 +38,8 @@ static LIST_HEAD(targets);
> >  static LIST_HEAD(initiators);
> >  static LIST_HEAD(localities);
> >
> > +static LIST_HEAD(hmat_memory_types);
> > +
>
> HMAT isn't a device driver for some memory devices.  So I don't think we
> should manage memory types in HMAT.

I can put it back in memory-tier.c. How about the list? Do we still
need to keep a separate list for storing late_inited memory nodes?
And how about the list name if we need to remove the prefix "hmat_"?


> Instead, if the memory_type of a
> node isn't set by the driver, we should manage it in memory-tier.c as
> fallback.
>

Do you mean some device drivers may init memory tiers between
memory_tier_init() and late_initcall(memory_tier_late_init);?
Is that why you suggested skipping nodes with
"node_memory_types[nid].memtype != NULL" in memory_tier_late_init()?
Is my understanding correct?
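
For reference, a rough userspace sketch of the late-init pass as I understand it from this discussion. All names here (`struct node_info`, `late_init_cpuless_nodes()`) are hypothetical and the real code in mm/memory-tiers.c will differ; the sketch only models the rules discussed above: skip CPU nodes, skip nodes whose memtype a driver already set, use HMAT-derived data when present, and otherwise fall back to the default DRAM type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mem_type {
	int adistance;
};

struct node_info {
	bool has_cpu;			/* CPU nodes are set up by the early init */
	bool has_perf;			/* HMAT performance data is available */
	struct mem_type perf_type;	/* type derived from that data */
	struct mem_type *memtype;	/* NULL until a type is assigned */
};

/* Assign a memory type to every CPUless node that does not have one yet. */
int late_init_cpuless_nodes(struct node_info *nodes, int n,
			    struct mem_type *default_dram)
{
	int i, done = 0;

	assert(nodes != NULL && default_dram != NULL);

	for (i = 0; i < n; i++) {
		struct node_info *node = &nodes[i];

		if (node->has_cpu)
			continue;	/* handled before HMAT was parsed */
		if (node->memtype)
			continue;	/* a driver set the type in between */

		/* No HMAT data: fall back to the default DRAM type. */
		node->memtype = node->has_perf ? &node->perf_type : default_dram;
		done++;
	}
	return done;
}
```

The return value is only there to make the sketch easy to check; the point is the two `continue` conditions and the fallback.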

> >  static DEFINE_MUTEX(target_lock);
> >
> >  /*
> > @@ -149,6 +151,12 @@ int acpi_get_genport_coordinates(u32 uid,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(acpi_get_genport_coordinates, CXL);
> >
> > +struct memory_dev_type *hmat_find_alloc_memory_type(int adist)
> > +{
> > + return find_alloc_memory_type(adist, &hmat_memory_types);
> > +}
> > +EXPORT_SYMBOL_GPL(hmat_find_alloc_memory_type);
> > +
> >  static __init void alloc_memory_initiator(unsigned int cpu_pxm)
> >  {
> >   struct memory_initiator *initiator;
> > @@ -1038,6 +1046,9 @@ static __init int hmat_init(void)
> >   if (!hmat_set_default_dram_perf())
> >   register_mt_adistance_algorithm(&hmat_adist_nb);
> >
> > + /* Post-create CPUless memory tiers after getting HMAT info */
> > + memory_tier_late_init();
> > +
>
> This should be called in memory-tier.c via
>
> late_initcall(memory_tier_late_init);
>
> Then, we don't need hmat to call it.
>

Thanks. Learned!


> >   return 0;
> >  out_put:
> >   hmat_free_structures();
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c

[PATCH v2 1/1] memory tier: acpi/hmat: create CPUless memory tiers after obtaining HMAT info

2024-03-12 Thread Ho-Ren (Jack) Chuang
The current implementation treats emulated memory devices, such as
CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
(E820_TYPE_RAM). However, these emulated devices have different
characteristics than traditional DRAM, making it important to
distinguish them. Thus, we modify the tiered memory initialization process
to introduce a delay specifically for CPUless NUMA nodes. This delay
ensures that the memory tier initialization for these nodes is deferred
until HMAT information is obtained during the boot process. Finally,
demotion tables are recalculated at the end.

* Abstract common functions into `find_alloc_memory_type()`
Since different memory devices require finding or allocating a memory type,
these common steps are abstracted into a single function,
`find_alloc_memory_type()`, enhancing code scalability and conciseness.

* Handle cases where there is no HMAT when creating memory tiers
There is a scenario where a CPUless node does not provide HMAT information.
If no HMAT is specified, it falls back to using the default DRAM tier.

* Change adist calculation code to use another new lock, mt_perf_lock.
In the current implementation, iterating through CPUless nodes requires
holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
trying to acquire the same lock, leading to a potential deadlock.
Therefore, we propose introducing a standalone `mt_perf_lock` to protect
`default_dram_perf`. This approach not only avoids deadlock but also
prevents holding a large lock simultaneously.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
---
 drivers/acpi/numa/hmat.c | 11 ++
 drivers/dax/kmem.c   | 13 +--
 include/linux/acpi.h |  6 
 include/linux/memory-tiers.h |  8 +
 mm/memory-tiers.c| 70 +---
 5 files changed, 92 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index d6b85f0f6082..28812ec2c793 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -38,6 +38,8 @@ static LIST_HEAD(targets);
 static LIST_HEAD(initiators);
 static LIST_HEAD(localities);
 
+static LIST_HEAD(hmat_memory_types);
+
 static DEFINE_MUTEX(target_lock);
 
 /*
@@ -149,6 +151,12 @@ int acpi_get_genport_coordinates(u32 uid,
 }
 EXPORT_SYMBOL_NS_GPL(acpi_get_genport_coordinates, CXL);
 
+struct memory_dev_type *hmat_find_alloc_memory_type(int adist)
+{
+   return find_alloc_memory_type(adist, &hmat_memory_types);
+}
+EXPORT_SYMBOL_GPL(hmat_find_alloc_memory_type);
+
 static __init void alloc_memory_initiator(unsigned int cpu_pxm)
 {
struct memory_initiator *initiator;
@@ -1038,6 +1046,9 @@ static __init int hmat_init(void)
if (!hmat_set_default_dram_perf())
register_mt_adistance_algorithm(&hmat_adist_nb);
 
+   /* Post-create CPUless memory tiers after getting HMAT info */
+   memory_tier_late_init();
+
return 0;
 out_put:
hmat_free_structures();
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 42ee360cf4e3..aee17ab59f4f 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
 
 static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
 {
-   bool found = false;
struct memory_dev_type *mtype;
 
mutex_lock(&kmem_memory_type_lock);
-   list_for_each_entry(mtype, &kmem_memory_types, list) {
-   if (mtype->adistance == adist) {
-   found = true;
-   break;
-   }
-   }
-   if (!found) {
-   mtype = alloc_memory_type(adist);
-   if (!IS_ERR(mtype))
-   list_add(&mtype->list, &kmem_memory_types);
-   }
+   mtype = find_alloc_memory_type(adist, &kmem_memory_types);
mutex_unlock(&kmem_memory_type_lock);
 
return mtype;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index b7165e52b3c6..3f927ff01f02 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -434,12 +434,18 @@ int thermal_acpi_critical_trip_temp(struct acpi_device 
*adev, int *ret_temp);
 
 #ifdef CONFIG_ACPI_HMAT
 int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
+struct memory_dev_type *hmat_find_alloc_memory_type(int adist);
 #else
 static inline int acpi_get_genport_coordinates(u32 uid,
   struct access_coordinate *coord)
 {
return -EOPNOTSUPP;
 }
+
+static inline struct memory_dev_type *hmat_find_alloc_memory_type(int adist)
+{
+   return NULL;
+}
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..4bc2596c5774 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,

[PATCH v2 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-12 Thread Ho-Ren (Jack) Chuang
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. As a result, normal memory devices with
different attributes cannot be assigned to the correct memory tier,
and pages cannot be migrated between the different types of memory.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
at boot time, eliminating the need for user intervention.
If no HMAT is specified, it falls back to using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
-v1:
 * https://lore.kernel.org/linux-mm/20240301082248.3456086-1-horenchu...@bytedance.com/T/


Ho-Ren (Jack) Chuang (1):
  memory tier: acpi/hmat: create CPUless memory tiers after obtaining
HMAT info

 drivers/acpi/numa/hmat.c | 11 ++
 drivers/dax/kmem.c   | 13 +--
 include/linux/acpi.h |  6 
 include/linux/memory-tiers.h |  8 +
 mm/memory-tiers.c| 70 +---
 5 files changed, 92 insertions(+), 16 deletions(-)

-- 
Ho-Ren (Jack) Chuang




[PATCH] dt-bindings: sound: add address-cells and size-cells information

2021-03-31 Thread Jack Yu
Add address-cells and size-cells information to fix warnings
for rt1019.yaml.

Signed-off-by: Jack Yu 
---
 Documentation/devicetree/bindings/sound/rt1019.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/sound/rt1019.yaml 
b/Documentation/devicetree/bindings/sound/rt1019.yaml
index c24c29eafa54..3d5a91a942f4 100644
--- a/Documentation/devicetree/bindings/sound/rt1019.yaml
+++ b/Documentation/devicetree/bindings/sound/rt1019.yaml
@@ -26,6 +26,8 @@ additionalProperties: false
 examples:
   - |
 i2c {
+#address-cells = <1>;
+#size-cells = <0>;
 rt1019: codec@28 {
 compatible = "realtek,rt1019";
 reg = <0x28>;
-- 
2.29.0



Re: [PATCH] usb: gadget: Stall OS descriptor request for unsupported functions

2021-03-23 Thread Jack Pham
Hi Wesley,

On Mon, Mar 22, 2021 at 06:50:17PM -0700, Wesley Cheng wrote:
> From: Chandana Kishori Chiluveru 
> 
> Hosts which request "OS descriptors" from gadgets do so during
> the enumeration phase and before the configuration is set with
> SET_CONFIGURATION. The composite driver supports OS descriptor
> handling in the composite_setup function. This requires passing the
> signature field, vendor code, compatibleID and subCompatibleID
> from user space.
> 
> For USB compositions that contain functions which don't implement OS
> descriptors, Windows sends vendor-specific requests for OS
> descriptors and the composite driver handles these requests with invalid
> data. With this invalid data, the host resets the bus and never
> selects the configuration, leading to an enumeration failure.
> 
> Fix this by bailing out from the OS descriptor setup request
> handling if the functions do not have an OS descriptor compatibleID.
> 
> Signed-off-by: Chandana Kishori Chiluveru 
> Signed-off-by: Wesley Cheng 
> ---
>  drivers/usb/gadget/composite.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
> index 72a9797..473edda6 100644
> --- a/drivers/usb/gadget/composite.c
> +++ b/drivers/usb/gadget/composite.c
> @@ -1945,6 +1945,12 @@ composite_setup(struct usb_gadget *gadget, const 
> struct usb_ctrlrequest *ctrl)
>   buf[6] = w_index;
>   /* Number of ext compat interfaces */
>   count = count_ext_compat(os_desc_cfg);
> + /*
> +  * Bailout if device does not
> +  * have ext_compat interfaces.
> +  */
> + if (count == 0)
> + break;
>   buf[8] = count;
>   count *= 24; /* 24 B/ext compat desc */
>   count += 16; /* header */

Do we still need this fix? IIRC we had this change in our downstream
kernel to fix the case when dynamically re-configuring ConfigFS, i.e.
changing the composition of functions wherein none of the interfaces
support OS Descriptors, so this causes count_ext_compat() to return
0 and results in the issue described in $SUBJECT.

But I think this is more of a problem of an improperly configured
ConfigFS gadget. If userspace instead removes the config from the
gadget's os_desc subdirectory that should cause cdev->os_desc_config to
be set to NULL and hence composite_setup() should never enter this
handling at all, right?
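
To spell out the two cases, here is a small userspace model (not the composite.c code). `struct os_desc_state` and `ext_compat_response_len()` are made-up names; only the 16-byte header and 24 bytes per ext compat descriptor come from the hunk quoted above. A NULL table stands for cdev->os_desc_config being NULL, and returning -1 stands for leaving the request unhandled so that it is stalled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a gadget's OS descriptor state. */
struct os_desc_state {
	/* Ext compat entries per interface, or NULL if no config is linked. */
	const int *ext_compat_counts;
	size_t num_interfaces;
};

/*
 * Return the length of the extended compat ID response, or -1 if the
 * request should not be answered (modelled here as a stall).
 */
int ext_compat_response_len(const struct os_desc_state *st)
{
	size_t i;
	int count = 0;

	assert(st != NULL);

	if (!st->ext_compat_counts)
		return -1;	/* no os_desc config: handler is never entered */

	for (i = 0; i < st->num_interfaces; i++)
		count += st->ext_compat_counts[i];

	if (count == 0)
		return -1;	/* the bail-out added by the patch */

	return 16 + 24 * count;	/* header + 24 B per ext compat desc */
}
```

If userspace always unlinks the config from os_desc, only the first check is ever hit; the second one just covers a gadget that is left linked with no ext compat interfaces.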

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


RE: [PATCH v3] ASoC: Intel: sof_rt5682: Add ALC1015Q-VB speaker amp support

2021-03-17 Thread Jack Yu
> > The code looks fine, but Jack Yu added a separate patch that makes
> > RTL1019 equivalent to RTL1015, so should this patch also handle the
> > RTL1019 case?
> 
> The topology used by this machine driver is using 48k, 64fs I2S format.
> RT1019 needs to support this configuration. Not sure if RT1019 could support
> that.
> 

Yes, RT1019 supports 48k, 64fs I2S format.


RE: [PATCH v3] ASoC: Intel: sof_rt5682: Add ALC1015Q-VB speaker amp support

2021-03-17 Thread Jack Yu
> > This patch adds jsl_rt5682_rt1015p which supports the RT5682 headset
> > codec and ALC1015Q-VB speaker amplifier combination on JasperLake
> > platform.
> >
> > This driver also supports ALC1015Q-CG if running in auto-mode.
> > Following table shows the audio interface support of the two
> > amplifiers.
> >
> >           | ALC1015Q-CG | ALC1015Q-VB
> > =====================================
> > I2C       | Yes         | No
> > Auto-mode | 48K, 64fs   | 16k, 32fs
> >                         | 48k, 32fs
> >                         | 48k, 64fs
> >
> > Signed-off-by: Brent Lu 
> 
> The code looks fine, but Jack Yu added a separate patch that makes
> RTL1019 equivalent to RTL1015, so should this patch also handle the
> RTL1019 case?
> 
For rt1019 non-i2c mode (auto mode), it uses the sdb pin to enable the amp,
the same as rt1015 non-i2c mode. Therefore we propose that rt1019 (auto mode)
use rt1015p instead of adding a new driver for it.


Re: [PATCH] usb: dwc3: gadget: Prevent EP queuing while stopping transfers

2021-03-10 Thread Jack Pham
Hi Wesley,

On Wed, Mar 10, 2021 at 03:02:10AM -0800, Wesley Cheng wrote:
> In the situations where the DWC3 gadget stops active transfers, once
> calling the dwc3_gadget_giveback(), there is a chance where a function
> driver can queue a new USB request in between the time where the dwc3
> lock has been released and re-acquired.  This occurs after we've already
> issued an ENDXFER command.  When the stop active transfers continues
> to remove USB requests from all dep lists, the newly added request will
> also be removed, while controller still has an active TRB for it.
> This can lead to the controller accessing an unmapped memory address.
> 
> Fix this by ensuring parameters to prevent EP queuing are set before
> calling the stop active transfers API.

Is it correct to say this Fixes: ae7e86108b12 ("usb: dwc3: Stop active
transfers before halting the controller") ?

Jack

> Signed-off-by: Wesley Cheng 
> ---
>  drivers/usb/dwc3/gadget.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 4780983..4d98fbf 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -783,8 +783,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep)
>  
>   trace_dwc3_gadget_ep_disable(dep);
>  
> - dwc3_remove_requests(dwc, dep);
> -
>   /* make sure HW endpoint isn't stalled */
>   if (dep->flags & DWC3_EP_STALL)
>   __dwc3_gadget_ep_set_halt(dep, 0, false);
> @@ -803,6 +801,8 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep)
>   dep->endpoint.desc = NULL;
>   }
>  
> + dwc3_remove_requests(dwc, dep);
> +
>   return 0;
>  }
>  
> @@ -1617,7 +1617,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, 
> struct dwc3_request *req)
>  {
>   struct dwc3 *dwc = dep->dwc;
>  
> - if (!dep->endpoint.desc || !dwc->pullups_connected) {
> + if (!dep->endpoint.desc || !dwc->pullups_connected || !dwc->connected) {
>   dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n",
>   dep->name);
>   return -ESHUTDOWN;
> @@ -2247,6 +2247,7 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int 
> is_on)
>   if (!is_on) {
>   u32 count;
>  
> + dwc->connected = false;
>   /*
>* In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
>* Section 4.1.8 Table 4-7, it states that for a 
> device-initiated
> @@ -3329,8 +3330,6 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 
> *dwc)
>  {
>   u32 reg;
>  
> - dwc->connected = true;
> -
>   /*
>* WORKAROUND: DWC3 revisions <1.88a have an issue which
>* would cause a missing Disconnect Event if there's a
> @@ -3370,6 +3369,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 
> *dwc)
>* transfers."
>*/
>   dwc3_stop_active_transfers(dwc);
> + dwc->connected = true;
>  
>   reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>   reg &= ~DWC3_DCTL_TSTCTRL_MASK;
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/6] arm64: dts: qcom: sm8350: add USB and PHY device nodes

2021-02-04 Thread Jack Pham
Hi Vinod,

On Thu, Feb 04, 2021 at 10:39:03PM +0530, Vinod Koul wrote:
> From: Jack Pham 
> 
> Add device nodes for the two instances each of USB3 controllers,
> QMP SS PHYs and SNPS HS PHYs.
> 
> Signed-off-by: Jack Pham 
> Message-Id: <20210116013802.1609-2-ja...@codeaurora.org>
> Signed-off-by: Vinod Koul 
> ---
>  arch/arm64/boot/dts/qcom/sm8350.dtsi | 179 +++
>  1 file changed, 179 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi 
> b/arch/arm64/boot/dts/qcom/sm8350.dtsi
> index e3597e2a22ab..e51d9ca0210c 100644
> --- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
> @@ -592,6 +592,185 @@ rpmhcc: clock-controller {
>   };
>  
>   };
> +
> + usb_1_hsphy: phy@88e3000 {
> + compatible = "qcom,sm8350-usb-hs-phy",
> +  "qcom,usb-snps-hs-7nm-phy";
> + reg = <0 0x088e3000 0 0x400>;
> + status = "disabled";
> + #phy-cells = <0>;
> +
> + clocks = <&rpmhcc RPMH_CXO_CLK>;
> + clock-names = "ref";
> +
> + resets = <&gcc 20>;

Shouldn't this (and all the other gcc phandles below) use the
dt-bindings macros from here?
https://patchwork.kernel.org/project/linux-arm-msm/patch/20210118044321.2571775-5-vk...@kernel.org/

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 3/4] phy: qcom-qmp: Add UFS v4 registers found in SM8350

2021-01-25 Thread Jack Pham
Hi Vinod,

On Mon, Jan 25, 2021 at 03:39:05PM +0530, Vinod Koul wrote:
> Add a few new registers found in SM8350. Also the UFS
> phy used in SM8350 seems to have different offsets than V4 phy, although
> it claims it is v4 phy, so add the new offsets with SM8350 tag instead
> of V4 tag.

Actually I believe SM8350 UFS PHY is on V5, as the internal IP revision
shows 5.0.0. So IMO some of the below definitions should just be using
the V5 macros for consistency with the ones I recently added for USB3.

And like USB3 QMP, it seems we have mixed usage of V4/V5 macros in the
sequence tables, mainly wherever the offsets are identical between IP
revisions. Hope this doesn't turn out to be a maintenance nightmare...

> ---
>  drivers/phy/qualcomm/phy-qcom-qmp.h | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.h 
> b/drivers/phy/qualcomm/phy-qcom-qmp.h
> index dff7be5a1cc1..bba1d5e3eb73 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.h
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.h
> @@ -451,6 +451,7 @@
>  #define QSERDES_V4_TX_RES_CODE_LANE_OFFSET_RX    0x40
>  #define QSERDES_V4_TX_LANE_MODE_1                0x84
>  #define QSERDES_V4_TX_LANE_MODE_2                0x88
> +#define QSERDES_V4_TX_LANE_MODE_3                0x8C
>  #define QSERDES_V4_TX_RCV_DETECT_LVL_2   0x9c
>  #define QSERDES_V4_TX_PWM_GEAR_1_DIVIDER_BAND0_1 0xd8
>  #define QSERDES_V4_TX_PWM_GEAR_2_DIVIDER_BAND0_1 0xdC
> @@ -459,6 +460,13 @@
>  #define QSERDES_V4_TX_TRAN_DRVR_EMP_EN   0xb8
>  #define QSERDES_V4_TX_PI_QEC_CTRL   0x104
>  
> +/* Only for SM8350 QMP V4 Phy TX offsets different from V4 */
> +#define QSERDES_SM8350_TX_PWM_GEAR_1_DIVIDER_BAND0_1 0x178
> +#define QSERDES_SM8350_TX_PWM_GEAR_2_DIVIDER_BAND0_1 0x17c
> +#define QSERDES_SM8350_TX_PWM_GEAR_3_DIVIDER_BAND0_1 0x180
> +#define QSERDES_SM8350_TX_PWM_GEAR_4_DIVIDER_BAND0_1 0x184
> +#define QSERDES_SM8350_TX_TRAN_DRVR_EMP_EN   0xc0

These could be added to the V5 TX definitions? Although they are not
present for USB, so not sure if you want to add "V5_UFS" to the prefix.
But since they are at offsets past the end of the USB TX bank it should
also be ok to share in the QSERDES_V5_TX namespace.

> +
>  /* Only for QMP V4 PHY - RX registers */
>  #define QSERDES_V4_RX_UCDR_FO_GAIN   0x008
>  #define QSERDES_V4_RX_UCDR_SO_GAIN   0x014
> @@ -514,6 +522,24 @@
>  #define QSERDES_V4_RX_DCC_CTRL1  0x1bc
>  #define QSERDES_V4_RX_VTH_CODE   0x1c4
>  
> +/* Only for SM8350 QMP V4 Phy RX offsets different from V4 */
> +#define QSERDES_SM8350_RX_RX_MODE_00_LOW 0x15c
> +#define QSERDES_SM8350_RX_RX_MODE_00_HIGH    0x160
> +#define QSERDES_SM8350_RX_RX_MODE_00_HIGH2   0x164
> +#define QSERDES_SM8350_RX_RX_MODE_00_HIGH3   0x168
> +#define QSERDES_SM8350_RX_RX_MODE_00_HIGH4   0x16c
> +#define QSERDES_SM8350_RX_RX_MODE_01_LOW 0x170
> +#define QSERDES_SM8350_RX_RX_MODE_01_HIGH    0x174
> +#define QSERDES_SM8350_RX_RX_MODE_01_HIGH2   0x178
> +#define QSERDES_SM8350_RX_RX_MODE_01_HIGH3   0x17c
> +#define QSERDES_SM8350_RX_RX_MODE_01_HIGH4   0x180
> +#define QSERDES_SM8350_RX_RX_MODE_10_LOW 0x184
> +#define QSERDES_SM8350_RX_RX_MODE_10_HIGH    0x188
> +#define QSERDES_SM8350_RX_RX_MODE_10_HIGH2   0x18c
> +#define QSERDES_SM8350_RX_RX_MODE_10_HIGH3   0x190
> +#define QSERDES_SM8350_RX_RX_MODE_10_HIGH4   0x194
> +#define QSERDES_SM8350_RX_DCC_CTRL1  0x1a8

These are identical to the "V5" offsets I had added for SM8350 USB.

Thanks,
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v6 1/4] usb: gadget: udc: core: Introduce check_config to verify USB configuration

2021-01-21 Thread Jack Pham
Hi Wesley,

On Thu, Jan 21, 2021 at 08:01:37PM -0800, Wesley Cheng wrote:
> Some UDCs may have constraints on how many high bandwidth endpoints they can
> support in a certain configuration.  This API allows for the composite
> driver to pass down the total number of endpoints to the UDC so it can verify
> it has the required resources to support the configuration.
> 
> Signed-off-by: Wesley Cheng 
> ---
>  drivers/usb/gadget/udc/core.c | 9 +
>  include/linux/usb/gadget.h| 2 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
> index 4173acd..469962f 100644
> --- a/drivers/usb/gadget/udc/core.c
> +++ b/drivers/usb/gadget/udc/core.c
> @@ -1003,6 +1003,15 @@ int usb_gadget_ep_match_desc(struct usb_gadget *gadget,
>  }
>  EXPORT_SYMBOL_GPL(usb_gadget_ep_match_desc);
>  
> +int usb_gadget_check_config(struct usb_gadget *gadget, unsigned long ep_map)

You should probably add a kernel-doc for this function.
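
Something along these lines, perhaps. This is only a sketch: the function body is not visible in the quoted hunk, so the "call the op if present, else return 0" body, the meaning given to @ep_map, and the trimmed-down structs are all assumptions on my side. `demo_udc_check_config()` is a made-up UDC callback with an arbitrary limit of four endpoints, there only to show how a driver might reject a configuration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the gadget structures. */
struct usb_gadget;

struct usb_gadget_ops {
	int (*check_config)(struct usb_gadget *gadget, unsigned long ep_map);
};

struct usb_gadget {
	const struct usb_gadget_ops *ops;
};

/**
 * usb_gadget_check_config - check whether the UDC can support a configuration
 * @gadget: controller to check the USB configuration against
 * @ep_map: bitmap of the endpoints the configuration uses (assumed meaning)
 *
 * Return: 0 if the UDC has no check_config op or accepts the configuration,
 * otherwise the negative errno returned by the UDC driver.
 */
int usb_gadget_check_config(struct usb_gadget *gadget, unsigned long ep_map)
{
	assert(gadget != NULL && gadget->ops != NULL);

	if (gadget->ops->check_config)
		return gadget->ops->check_config(gadget, ep_map);
	return 0;
}

/* Made-up UDC callback: accept at most four endpoints in the map. */
int demo_udc_check_config(struct usb_gadget *gadget, unsigned long ep_map)
{
	int used = 0;

	(void)gadget;
	for (; ep_map; ep_map >>= 1)
		used += ep_map & 1;

	return used > 4 ? -EINVAL : 0;
}
```

The main thing the kernel-doc should pin down is what @ep_map encodes and what a non-zero return means for the caller in composite.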

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH v3] dt-bindings: usb: qcom,dwc3: Add bindings for SM8150, SM8250, SM8350

2021-01-19 Thread Jack Pham
Add compatible strings for the USB DWC3 controller on QCOM SM8150,
SM8250 and SM8350 SoCs.

Note the SM8150 & SM8250 compatibles are already being used in the
dts but were missing from the documentation.

Acked-by: Felipe Balbi 
Signed-off-by: Jack Pham 
---
v3: Resend of #4/4 of 
https://lore.kernel.org/linux-usb/20210115174723.7424-1-ja...@codeaurora.org
added Felipe's Ack & rebased on gregkh/usb-testing

 Documentation/devicetree/bindings/usb/qcom,dwc3.yaml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml 
b/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
index dd1d8bcd9254..c3cbd1fa9944 100644
--- a/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
+++ b/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
@@ -18,6 +18,9 @@ properties:
   - qcom,sc7180-dwc3
   - qcom,sdm845-dwc3
   - qcom,sdx55-dwc3
+  - qcom,sm8150-dwc3
+  - qcom,sm8250-dwc3
+  - qcom,sm8350-dwc3
   - const: qcom,dwc3
 
   reg:
-- 
2.24.0



[PATCH v2 3/4] dt-bindings: phy: qcom,usb-snps-femto-v2: Add SM8250 and SM8350 bindings

2021-01-15 Thread Jack Pham
Add the compatible strings for the USB2 PHYs found on QCOM
SM8250 & SM8350 SoCs.

Note that the SM8250 compatible is already in use in the dts and
driver implementation but was missing from the documentation.

Signed-off-by: Jack Pham 
---
 .../devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml 
b/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml
index 4949a2851532..ee77c6458326 100644
--- a/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml
+++ b/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml
@@ -17,6 +17,8 @@ properties:
 enum:
   - qcom,usb-snps-hs-7nm-phy
   - qcom,sm8150-usb-hs-phy
+  - qcom,sm8250-usb-hs-phy
+  - qcom,sm8350-usb-hs-phy
   - qcom,usb-snps-femto-v2-phy
 
   reg:
-- 
2.24.0



[PATCH v2 2/4] phy: qcom-qmp: Add SM8350 USB QMP PHYs

2021-01-15 Thread Jack Pham
Add support for the USB DP & UNI PHYs found on SM8350. These use
version 5.0.0 of the QMP PHY IP and thus require new "V5"
definitions of the register offset macros for the QSERDES RX
and TX blocks. The QSERDES common and QPHY PCS blocks' register
offsets are largely unchanged from V4 so some of the existing
macros can be reused.

Signed-off-by: Jack Pham 
---
v2: Added missing sm8350_usb3_uniphy entry to qcom_qmp_phy_of_match_table

 drivers/phy/qualcomm/phy-qcom-qmp.c | 212 
 drivers/phy/qualcomm/phy-qcom-qmp.h | 100 +
 2 files changed, 312 insertions(+)

diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c 
b/drivers/phy/qualcomm/phy-qcom-qmp.c
index f103b14f983e..d7f0d47f774f 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -216,6 +216,15 @@ static const unsigned int 
qmp_v4_usb3_uniphy_regs_layout[QPHY_LAYOUT_SIZE] = {
[QPHY_PCS_LFPS_RXTERM_IRQ_CLEAR]  = 0x614,
 };
 
+static const unsigned int sm8350_usb3_uniphy_regs_layout[QPHY_LAYOUT_SIZE] = {
+   [QPHY_SW_RESET] = 0x00,
+   [QPHY_START_CTRL]   = 0x44,
+   [QPHY_PCS_STATUS]   = 0x14,
+   [QPHY_PCS_POWER_DOWN_CONTROL]   = 0x40,
+   [QPHY_PCS_AUTONOMOUS_MODE_CTRL] = 0x1008,
+   [QPHY_PCS_LFPS_RXTERM_IRQ_CLEAR]  = 0x1014,
+};
+
 static const unsigned int sdm845_ufsphy_regs_layout[QPHY_LAYOUT_SIZE] = {
[QPHY_START_CTRL]   = 0x00,
[QPHY_PCS_READY_STATUS] = 0x160,
@@ -2025,6 +2034,144 @@ static const struct qmp_phy_init_tbl 
sdx55_usb3_uniphy_rx_tbl[] = {
QMP_PHY_INIT_CFG(QSERDES_V4_RX_GM_CAL, 0x1f),
 };
 
+static const struct qmp_phy_init_tbl sm8350_usb3_tx_tbl[] = {
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_TX, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_RX, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_OFFSET_TX, 0x16),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_OFFSET_RX, 0x0e),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_1, 0x35),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_3, 0x3f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_4, 0x7f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_5, 0x3f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RCV_DETECT_LVL_2, 0x12),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_PI_QEC_CTRL, 0x21),
+};
+
+static const struct qmp_phy_init_tbl sm8350_usb3_rx_tbl[] = {
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FO_GAIN, 0x0a),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SO_GAIN, 0x05),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FASTLOCK_FO_GAIN, 0x2f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SO_SATURATION_AND_ENABLE, 0x7f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FASTLOCK_COUNT_LOW, 0xff),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FASTLOCK_COUNT_HIGH, 0x0f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_PI_CONTROLS, 0x99),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_THRESH1, 0x08),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_THRESH2, 0x08),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_GAIN1, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_GAIN2, 0x04),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_VGA_CAL_CNTRL1, 0x54),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_VGA_CAL_CNTRL2, 0x0f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQU_ADAPTOR_CNTRL2, 0x0f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQU_ADAPTOR_CNTRL3, 0x4a),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQU_ADAPTOR_CNTRL4, 0x0a),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_IDAC_TSETTLE_LOW, 0xc0),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_IDAC_TSETTLE_HIGH, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQ_OFFSET_ADAPTOR_CNTRL1, 0x47),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_SIGDET_CNTRL, 0x04),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_SIGDET_DEGLITCH_CNTRL, 0x0e),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_LOW, 0xbb),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_HIGH, 0x7b),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_HIGH2, 0xbb),
+   QMP_PHY_INIT_CFG_LANE(QSERDES_V5_RX_RX_MODE_00_HIGH3, 0x3d, 1),
+   QMP_PHY_INIT_CFG_LANE(QSERDES_V5_RX_RX_MODE_00_HIGH3, 0x3c, 2),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_HIGH4, 0xdb),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_LOW, 0x64),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH, 0x24),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH2, 0xd2),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH3, 0x13),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH4, 0xa9),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_DFE_EN_TIMER, 0x04),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_DFE_CTLE_POST_CAL_OFFSET, 0x38),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_AUX_DATA_TCOARSE_TFINE, 0xa0),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_DCC_CTRL1, 0x0c),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_GM_CAL, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_VTH_CODE, 0x10),
+};
+
+static const struct qmp_phy_init_tbl sm835

[PATCH v2 4/4] dt-bindings: usb: qcom,dwc3: Add bindings for SM8150, SM8250, SM8350

2021-01-15 Thread Jack Pham
Add compatible strings for the USB DWC3 controller on QCOM SM8150,
SM8250 and SM8350 SoCs.

Note the SM8150 & SM8250 compatibles are already being used in the
dts but were missing from the documentation.

Signed-off-by: Jack Pham 
---
 Documentation/devicetree/bindings/usb/qcom,dwc3.yaml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml 
b/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
index 2cf525d21e05..da47f43d6b04 100644
--- a/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
+++ b/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
@@ -17,6 +17,9 @@ properties:
   - qcom,msm8998-dwc3
   - qcom,sc7180-dwc3
   - qcom,sdm845-dwc3
+  - qcom,sm8150-dwc3
+  - qcom,sm8250-dwc3
+  - qcom,sm8350-dwc3
   - const: qcom,dwc3
 
   reg:
-- 
2.24.0



[PATCH v2 1/4] dt-bindings: phy: qcom,qmp: Add SM8150, SM8250 and SM8350 USB PHY bindings

2021-01-15 Thread Jack Pham
Add the compatible strings for the USB3 PHYs found on SM8150, SM8250
and SM8350 SoCs. These require separate subschemas due to the different
required clock entries.

Note the SM8150 and SM8250 compatibles have already been in place in
the dts as well as the driver implementation but were missing from
the documentation.

Signed-off-by: Jack Pham 
---
 .../devicetree/bindings/phy/qcom,qmp-phy.yaml | 67 +++
 1 file changed, 67 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml 
b/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml
index 390df23b82e7..841c72863b4f 100644
--- a/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml
+++ b/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml
@@ -30,15 +30,24 @@ properties:
   - qcom,sdm845-qmp-ufs-phy
   - qcom,sdm845-qmp-usb3-uni-phy
   - qcom,sm8150-qmp-ufs-phy
+  - qcom,sm8150-qmp-usb3-phy
+  - qcom,sm8150-qmp-usb3-uni-phy
   - qcom,sm8250-qmp-ufs-phy
   - qcom,sm8250-qmp-gen3x1-pcie-phy
   - qcom,sm8250-qmp-gen3x2-pcie-phy
   - qcom,sm8250-qmp-modem-pcie-phy
+  - qcom,sm8250-qmp-usb3-phy
+  - qcom,sm8250-qmp-usb3-uni-phy
+  - qcom,sm8350-qmp-usb3-phy
+  - qcom,sm8350-qmp-usb3-uni-phy
   - qcom,sdx55-qmp-usb3-uni-phy
 
   reg:
+minItems: 1
+maxItems: 2
 items:
   - description: Address and length of PHY's common serdes block.
+  - description: Address and length of PHY's DP_COM control block.
 
   "#clock-cells":
 enum: [ 1, 2 ]
@@ -287,6 +296,64 @@ allOf:
 reset-names:
   items:
 - const: phy
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - qcom,sm8150-qmp-usb3-phy
+  - qcom,sm8150-qmp-usb3-uni-phy
+  - qcom,sm8250-qmp-usb3-uni-phy
+  - qcom,sm8350-qmp-usb3-uni-phy
+then:
+  properties:
+clocks:
+  items:
+- description: Phy aux clock.
+- description: 19.2 MHz ref clk source.
+- description: 19.2 MHz ref clk.
+- description: Phy common block aux clock.
+clock-names:
+  items:
+- const: aux
+- const: ref_clk_src
+- const: ref
+- const: com_aux
+resets:
+  items:
+- description: reset of phy block.
+- description: phy common block reset.
+reset-names:
+  items:
+- const: phy
+- const: common
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - qcom,sm8250-qmp-usb3-phy
+  - qcom,sm8350-qmp-usb3-phy
+then:
+  properties:
+clocks:
+  items:
+- description: Phy aux clock.
+- description: 19.2 MHz ref clk.
+- description: Phy common block aux clock.
+clock-names:
+  items:
+- const: aux
+- const: ref_clk_src
+- const: com_aux
+resets:
+  items:
+- description: reset of phy block.
+- description: phy common block reset.
+reset-names:
+  items:
+- const: phy
+- const: common
 
 examples:
   - |
-- 
2.24.0



[PATCH v2 0/4] SM8350 USB and dt-bindings updates

2021-01-15 Thread Jack Pham
This series adds support for the SM8350 USB PHY to the QMP PHY driver
as well as adds the documentation for the QMP, SNPS PHY and DWC3
controller bindings. This also adds the bindings for SM8150 and SM8250
to the same docs which had not been added previously even though they
are in use now.

v2: Reordered Patches 1 & 2; added missing entry to match_table

Jack Pham (4):
  dt-bindings: phy: qcom,qmp: Add SM8150, SM8250 and SM8350 USB PHY
bindings
  phy: qcom-qmp: Add SM8350 USB QMP PHYs
  dt-bindings: phy: qcom,usb-snps-femto-v2: Add SM8250 and SM8350
bindings
  dt-bindings: usb: qcom,dwc3: Add bindings for SM8150, SM8250, SM8350

 .../devicetree/bindings/phy/qcom,qmp-phy.yaml |  67 ++
 .../bindings/phy/qcom,usb-snps-femto-v2.yaml  |   2 +
 .../devicetree/bindings/usb/qcom,dwc3.yaml|   3 +
 drivers/phy/qualcomm/phy-qcom-qmp.c   | 209 ++
 drivers/phy/qualcomm/phy-qcom-qmp.h   | 100 +
 5 files changed, 381 insertions(+)

-- 
2.24.0



Re: [PATCH 1/4] phy: qcom-qmp: Add SM8350 USB QMP PHYs

2021-01-15 Thread Jack Pham
Hi Vinod,

On Fri, Jan 15, 2021 at 06:17:36PM +0530, Vinod Koul wrote:
> On 15-01-21, 12:54, Konrad Dybcio wrote:
> > I might be wrong but it looks as if you forgot to add a compatible
> > for the "sm8350_usb3_uniphy_cfg" configuration.

I believe Konrad was referring to the driver in which I had neglected to
add the compatible to the qcom_qmp_phy_of_match_table. My mistake.

> It seems to be documented in patch 2, ideally we should have the
> bindings patches first and this as patch 3...

Ok. I think driver change would be patch 2 rather, with the bindings in
patch 1? Patch 3 and 4 are dt-bindings updates to the SNPS Femto PHY and
DWC3 QCOM docs respectively.

Will send v2, thanks.

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 1/4] phy: qcom-qmp: Add SM8350 USB QMP PHYs

2021-01-15 Thread Jack Pham
On Fri, Jan 15, 2021 at 12:54:26PM +0100, Konrad Dybcio wrote:
> I might be wrong but it looks as if you forgot to add a compatible for
> the "sm8350_usb3_uniphy_cfg" configuration.

Not wrong at all! My mistake, will add it in v2.

Thanks,
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH 2/4] dt-bindings: phy: qcom,qmp: Add SM8150, SM8250 and SM8350 USB PHY bindings

2021-01-15 Thread Jack Pham
Add the compatible strings for the USB3 PHYs found on SM8150, SM8250
and SM8350 SoCs.

Note the SM8150 and SM8250 compatibles have already been in place in
the dts as well as the driver implementation but were missing from
the documentation.

Signed-off-by: Jack Pham 
---
 .../devicetree/bindings/phy/qcom,qmp-phy.yaml | 67 +++
 1 file changed, 67 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml 
b/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml
index 390df23b82e7..841c72863b4f 100644
--- a/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml
+++ b/Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml
@@ -30,15 +30,24 @@ properties:
   - qcom,sdm845-qmp-ufs-phy
   - qcom,sdm845-qmp-usb3-uni-phy
   - qcom,sm8150-qmp-ufs-phy
+  - qcom,sm8150-qmp-usb3-phy
+  - qcom,sm8150-qmp-usb3-uni-phy
   - qcom,sm8250-qmp-ufs-phy
   - qcom,sm8250-qmp-gen3x1-pcie-phy
   - qcom,sm8250-qmp-gen3x2-pcie-phy
   - qcom,sm8250-qmp-modem-pcie-phy
+  - qcom,sm8250-qmp-usb3-phy
+  - qcom,sm8250-qmp-usb3-uni-phy
+  - qcom,sm8350-qmp-usb3-phy
+  - qcom,sm8350-qmp-usb3-uni-phy
   - qcom,sdx55-qmp-usb3-uni-phy
 
   reg:
+minItems: 1
+maxItems: 2
 items:
   - description: Address and length of PHY's common serdes block.
+  - description: Address and length of PHY's DP_COM control block.
 
   "#clock-cells":
 enum: [ 1, 2 ]
@@ -287,6 +296,64 @@ allOf:
 reset-names:
   items:
 - const: phy
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - qcom,sm8150-qmp-usb3-phy
+  - qcom,sm8150-qmp-usb3-uni-phy
+  - qcom,sm8250-qmp-usb3-uni-phy
+  - qcom,sm8350-qmp-usb3-uni-phy
+then:
+  properties:
+clocks:
+  items:
+- description: Phy aux clock.
+- description: 19.2 MHz ref clk source.
+- description: 19.2 MHz ref clk.
+- description: Phy common block aux clock.
+clock-names:
+  items:
+- const: aux
+- const: ref_clk_src
+- const: ref
+- const: com_aux
+resets:
+  items:
+- description: reset of phy block.
+- description: phy common block reset.
+reset-names:
+  items:
+- const: phy
+- const: common
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - qcom,sm8250-qmp-usb3-phy
+  - qcom,sm8350-qmp-usb3-phy
+then:
+  properties:
+clocks:
+  items:
+- description: Phy aux clock.
+- description: 19.2 MHz ref clk.
+- description: Phy common block aux clock.
+clock-names:
+  items:
+- const: aux
+- const: ref_clk_src
+- const: com_aux
+resets:
+  items:
+- description: reset of phy block.
+- description: phy common block reset.
+reset-names:
+  items:
+- const: phy
+- const: common
 
 examples:
   - |
-- 
2.24.0



[PATCH 3/4] dt-bindings: phy: qcom,usb-snps-femto-v2: Add SM8250 and SM8350 bindings

2021-01-15 Thread Jack Pham
Add the compatible strings for the USB2 PHYs found on QCOM
SM8250 & SM8350 SoCs.

Note that the SM8250 compatible is already in use in the dts and
driver implementation but was missing from the documentation.

Signed-off-by: Jack Pham 
---
 .../devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml 
b/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml
index 4949a2851532..ee77c6458326 100644
--- a/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml
+++ b/Documentation/devicetree/bindings/phy/qcom,usb-snps-femto-v2.yaml
@@ -17,6 +17,8 @@ properties:
 enum:
   - qcom,usb-snps-hs-7nm-phy
   - qcom,sm8150-usb-hs-phy
+  - qcom,sm8250-usb-hs-phy
+  - qcom,sm8350-usb-hs-phy
   - qcom,usb-snps-femto-v2-phy
 
   reg:
-- 
2.24.0



[PATCH 0/4] SM8350 USB and dt-bindings updates

2021-01-15 Thread Jack Pham
This series adds support for the SM8350 USB PHY to the QMP PHY driver
as well as adds the documentation for the QMP, SNPS PHY and DWC3
controller bindings. This also adds the bindings for SM8150 and SM8250
to the same docs which had not been added previously even though they
are in use now.

Jack Pham (4):
  phy: qcom-qmp: Add SM8350 USB QMP PHYs
  dt-bindings: phy: qcom,qmp: Add SM8150, SM8250 and SM8350 USB PHY
bindings
  dt-bindings: phy: qcom,usb-snps-femto-v2: Add SM8250 and SM8350
bindings
  dt-bindings: usb: qcom,dwc3: Add bindings for SM8150, SM8250, SM8350

 .../devicetree/bindings/phy/qcom,qmp-phy.yaml |  67 ++
 .../bindings/phy/qcom,usb-snps-femto-v2.yaml  |   2 +
 .../devicetree/bindings/usb/qcom,dwc3.yaml|   3 +
 drivers/phy/qualcomm/phy-qcom-qmp.c   | 209 ++
 drivers/phy/qualcomm/phy-qcom-qmp.h   | 100 +
 5 files changed, 381 insertions(+)

-- 
2.24.0



[PATCH 4/4] dt-bindings: usb: qcom,dwc3: Add bindings for SM8150, SM8250, SM8350

2021-01-15 Thread Jack Pham
Add compatible strings for the USB DWC3 controller on QCOM SM8150,
SM8250 and SM8350 SoCs.

Note the SM8150 & SM8250 compatibles are already being used in the
dts and driver implementation but were missing from the documentation.

Signed-off-by: Jack Pham 
---
 Documentation/devicetree/bindings/usb/qcom,dwc3.yaml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml 
b/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
index 2cf525d21e05..da47f43d6b04 100644
--- a/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
+++ b/Documentation/devicetree/bindings/usb/qcom,dwc3.yaml
@@ -17,6 +17,9 @@ properties:
   - qcom,msm8998-dwc3
   - qcom,sc7180-dwc3
   - qcom,sdm845-dwc3
+  - qcom,sm8150-dwc3
+  - qcom,sm8250-dwc3
+  - qcom,sm8350-dwc3
   - const: qcom,dwc3
 
   reg:
-- 
2.24.0



[PATCH 1/4] phy: qcom-qmp: Add SM8350 USB QMP PHYs

2021-01-15 Thread Jack Pham
Add support for the USB DP & UNI PHYs found on SM8350. These use
version 5.0.0 of the QMP PHY IP and thus require new "V5"
definitions of the register offset macros for the QSERDES RX
and TX blocks. The QSERDES common and QPHY PCS blocks' register
offsets are largely unchanged from V4 so some of the existing
macros can be reused.

Signed-off-by: Jack Pham 
---
 drivers/phy/qualcomm/phy-qcom-qmp.c | 209 
 drivers/phy/qualcomm/phy-qcom-qmp.h | 100 +
 2 files changed, 309 insertions(+)

diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c 
b/drivers/phy/qualcomm/phy-qcom-qmp.c
index f103b14f983e..b2f90bf5c212 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -216,6 +216,15 @@ static const unsigned int 
qmp_v4_usb3_uniphy_regs_layout[QPHY_LAYOUT_SIZE] = {
[QPHY_PCS_LFPS_RXTERM_IRQ_CLEAR]  = 0x614,
 };
 
+static const unsigned int sm8350_usb3_uniphy_regs_layout[QPHY_LAYOUT_SIZE] = {
+   [QPHY_SW_RESET] = 0x00,
+   [QPHY_START_CTRL]   = 0x44,
+   [QPHY_PCS_STATUS]   = 0x14,
+   [QPHY_PCS_POWER_DOWN_CONTROL]   = 0x40,
+   [QPHY_PCS_AUTONOMOUS_MODE_CTRL] = 0x1008,
+   [QPHY_PCS_LFPS_RXTERM_IRQ_CLEAR]  = 0x1014,
+};
+
 static const unsigned int sdm845_ufsphy_regs_layout[QPHY_LAYOUT_SIZE] = {
[QPHY_START_CTRL]   = 0x00,
[QPHY_PCS_READY_STATUS] = 0x160,
@@ -2025,6 +2034,144 @@ static const struct qmp_phy_init_tbl 
sdx55_usb3_uniphy_rx_tbl[] = {
QMP_PHY_INIT_CFG(QSERDES_V4_RX_GM_CAL, 0x1f),
 };
 
+static const struct qmp_phy_init_tbl sm8350_usb3_tx_tbl[] = {
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_TX, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_RX, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_OFFSET_TX, 0x16),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RES_CODE_LANE_OFFSET_RX, 0x0e),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_1, 0x35),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_3, 0x3f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_4, 0x7f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_LANE_MODE_5, 0x3f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_RCV_DETECT_LVL_2, 0x12),
+   QMP_PHY_INIT_CFG(QSERDES_V5_TX_PI_QEC_CTRL, 0x21),
+};
+
+static const struct qmp_phy_init_tbl sm8350_usb3_rx_tbl[] = {
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FO_GAIN, 0x0a),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SO_GAIN, 0x05),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FASTLOCK_FO_GAIN, 0x2f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SO_SATURATION_AND_ENABLE, 0x7f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FASTLOCK_COUNT_LOW, 0xff),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_FASTLOCK_COUNT_HIGH, 0x0f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_PI_CONTROLS, 0x99),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_THRESH1, 0x08),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_THRESH2, 0x08),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_GAIN1, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_UCDR_SB2_GAIN2, 0x04),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_VGA_CAL_CNTRL1, 0x54),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_VGA_CAL_CNTRL2, 0x0f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQU_ADAPTOR_CNTRL2, 0x0f),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQU_ADAPTOR_CNTRL3, 0x4a),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQU_ADAPTOR_CNTRL4, 0x0a),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_IDAC_TSETTLE_LOW, 0xc0),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_IDAC_TSETTLE_HIGH, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_EQ_OFFSET_ADAPTOR_CNTRL1, 0x47),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_SIGDET_CNTRL, 0x04),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_SIGDET_DEGLITCH_CNTRL, 0x0e),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_LOW, 0xbb),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_HIGH, 0x7b),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_HIGH2, 0xbb),
+   QMP_PHY_INIT_CFG_LANE(QSERDES_V5_RX_RX_MODE_00_HIGH3, 0x3d, 1),
+   QMP_PHY_INIT_CFG_LANE(QSERDES_V5_RX_RX_MODE_00_HIGH3, 0x3c, 2),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_00_HIGH4, 0xdb),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_LOW, 0x64),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH, 0x24),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH2, 0xd2),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH3, 0x13),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_RX_MODE_01_HIGH4, 0xa9),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_DFE_EN_TIMER, 0x04),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_DFE_CTLE_POST_CAL_OFFSET, 0x38),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_AUX_DATA_TCOARSE_TFINE, 0xa0),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_DCC_CTRL1, 0x0c),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_GM_CAL, 0x00),
+   QMP_PHY_INIT_CFG(QSERDES_V5_RX_VTH_CODE, 0x10),
+};
+
+static const struct qmp_phy_init_tbl sm8350_usb3_pcs_tbl[] = {
+   QMP_PHY_INIT_CFG(QPHY_V5_PCS_USB3_RCVR_DTCT_

Re: [PATCH 2/3] usb: gadget: composite: Split composite reset and disconnect

2021-01-08 Thread Jack Pham
Hi Thinh,

On Fri, Jan 08, 2021 at 02:19:30AM +, Thinh Nguyen wrote:
> Hi Wesley,
> 
> Felipe Balbi wrote:
> > Hi,
> >
> > Wesley Cheng  writes:
> >> +void composite_reset(struct usb_gadget *gadget)
> >> +{
> >> +  /*
> >> +   * Section 1.4.13 Standard Downstream Port of the USB battery charging
> >> +   * specification v1.2 states that a device connected on a SDP shall only
> >> +   * draw at max 100mA while in a connected, but unconfigured state.
> > The requirements are different, though. I think OTG spec has some extra
> > requirements where only 8mA can be drawn max. You need to check for the
> > otg flag. Moreover, USB3+ spec has units of 150mA meaning the device
> > can't draw 100mA (IIRC).
> >
> 
> We see issue with this patch series. For our device running at SSP, the
> device couldn't recover from a port reset and remained in eSS.Inactive
> state.
> 
> This patch series is already in Greg's usb-testing. Please review and
> help fix it.
> 
> We can see the failure once the patch "usb: gadget: configfs: Add a
> specific configFS reset callback" is introduced.

Hmm. Does your device use a legacy USB2 PHY (not generic PHY) driver,
and does it implement the usb_phy_set_power() callback? Because
otherwise this new configfs_composite_reset() callback would not have
changed from the original behavior since the newly introduced (in patch
1/3) dwc3_gadget_vbus_draw() callback would simply be a no-op if
dwc->usb2_phy is not present.

If it does turn out to be something with your PHY driver's set_power(),
it's still puzzling since it's directed to only the usb2_phy, so I'm
curious how telling it to draw 100mA could affect SSP operation at all.

Thanks,
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 3/4] usb: gadget: u_audio: remove struct uac_req

2020-12-29 Thread Jack Pham
Hi Greg and Jerome,

On Mon, Dec 28, 2020 at 04:01:46PM +0100, Greg Kroah-Hartman wrote:
> On Mon, Dec 21, 2020 at 06:35:30PM +0100, Jerome Brunet wrote:
> > 'struct uac_req' purpose is to link 'struct usb_request' to the
> > corresponding 'struct uac_rtd_params'. However member req is never
> > used. Using the context of the usb request, we can keep track of the
> > corresponding 'struct uac_rtd_params' just as well, without allocating
> > extra memory.
> > 
> > Signed-off-by: Jerome Brunet 
> > ---
> >  drivers/usb/gadget/function/u_audio.c | 58 ---
> >  1 file changed, 26 insertions(+), 32 deletions(-)
> 
> This patch doesn't apply, so I can't apply patches 3 or 4 of this series
> :(
> 
> Can you rebase against my usb-testing branch and resend?

From the cover letter:

On Mon, Dec 21, 2020 at 06:35:27PM +0100, Jerome Brunet wrote:
> The series depends on this fix [0] by Jack Pham to apply cleanly
> 
> [0]: 
> https://lore.kernel.org/linux-usb/20201029175949.6052-1-ja...@codeaurora.org/

My patch hadn't been picked up by Felipe, so it's not in your tree
either, Greg. Should I just resend it to you first?  Or shall I invite
Jerome to just include it in v2 of this series? 

Thanks,
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v2 4/5] USB: gadget: f_fs: add SuperSpeed Plus support

2020-11-30 Thread Jack Pham
On Fri, Nov 27, 2020 at 03:05:58PM +0100, Greg Kroah-Hartman wrote:
> From: "taehyun.cho" 
> 
> Setup the descriptors for SuperSpeed Plus for f_fs. This allows the
> gadget to work properly without crashing at SuperSpeed rates.
> 
> Cc: Felipe Balbi 
> Cc: stable 
> Signed-off-by: taehyun.cho 
> Signed-off-by: Will McVicker 
> Reviewed-by: Peter Chen 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  drivers/usb/gadget/function/f_fs.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/usb/gadget/function/f_fs.c 
> b/drivers/usb/gadget/function/f_fs.c
> index 046f770a76da..a34a7c96a1ab 100644
> --- a/drivers/usb/gadget/function/f_fs.c
> +++ b/drivers/usb/gadget/function/f_fs.c
> @@ -1327,6 +1327,7 @@ static long ffs_epfile_ioctl(struct file *file, 
> unsigned code,
>   struct usb_endpoint_descriptor *desc;
>  
>   switch (epfile->ffs->gadget->speed) {
> + case USB_SPEED_SUPER_PLUS:
>   case USB_SPEED_SUPER:
>   desc_idx = 2;
>   break;
> @@ -3222,6 +3223,10 @@ static int _ffs_func_bind(struct usb_configuration *c,
>   func->function.os_desc_n =
>   c->cdev->use_os_string ? ffs->interfaces_count : 0;
>  
> + if (likely(super)) {
> + func->function.ssp_descriptors =
> + usb_copy_descriptors(func->function.ss_descriptors);
> + }
>   /* And we're done */
>   ffs_event_add(ffs, FUNCTIONFS_BIND);
>   return 0;
> -- 

Hi Greg,

FWIW I had sent a very similar patch[1] a while back (twice in fact)
but got no response about it. Looks like Taehyun's patch already went
through Google for this, I assume it must be working on their Android
kernels so I've no problem with you or Felipe taking this instead.

Only one difference with my patch though is mine additionally clears the
func->function.ssp_descriptors member to NULL upon ffs_func_unbind() as
otherwise it could lead to a dangling reference in case the function is
re-bound and userspace does not issue SS descriptors the next time.
Realistically I don't think that's possible, except maybe when fuzzing?
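
The stale-pointer concern above can be sketched outside the kernel as
follows. This is only an illustration under stated assumptions: struct
fn_descs, fn_bind() and fn_unbind() are hypothetical stand-ins, not the
f_fs code, and malloc()/free() stand in for the kernel's descriptor
allocation. The point is that when a later bind skips the SuperSpeed
branch, only the reset in unbind keeps the SSP pointer from referring
to freed memory.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the descriptor pointers of a USB function. */
struct fn_descs {
	int *ss_descriptors;
	int *ssp_descriptors;
};

/* Bind: descriptors are only (re)assigned when SS descriptors exist. */
static int fn_bind(struct fn_descs *f, int have_ss)
{
	if (!have_ss)
		return 0;	/* nothing assigned; old values would survive */

	f->ss_descriptors = malloc(sizeof(int));
	f->ssp_descriptors = malloc(sizeof(int));
	if (!f->ss_descriptors || !f->ssp_descriptors) {
		free(f->ss_descriptors);
		free(f->ssp_descriptors);
		f->ss_descriptors = NULL;
		f->ssp_descriptors = NULL;
		return -1;
	}
	return 0;
}

/* Unbind: free and reset, so a later bind without SS sees NULL. */
static void fn_unbind(struct fn_descs *f)
{
	free(f->ss_descriptors);
	free(f->ssp_descriptors);
	f->ss_descriptors = NULL;
	f->ssp_descriptors = NULL;
}
```

Without the two NULL assignments in fn_unbind(), a bind/unbind/bind
sequence in which the second bind has no SS descriptors would leave
ssp_descriptors pointing at freed memory.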

[1] 
https://patchwork.kernel.org/project/linux-usb/patch/20201027230731.9073-1-ja...@codeaurora.org/

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH] This is a driver for the HXi series of Corsair ATX power supplies.

2020-11-29 Thread Jack Doan
It is based on Marius Zachmann's corsair-cpro
driver, since it follows a similar communication pattern, just with a
different protocol.

The devices:
Corsair's HXi line of "smart" power supplies that provide monitoring
data via a proprietary USB-HID interface. The protocol strongly
resembles PMBus, and gives access to voltage, current, and power
measurements for each of the PSU's rails.

The driver:
Again, I based this heavily on the corsair-cpro driver as the
requirements are very similar (weird USB-HID communication, maintaining
compatibility with existing userspace tools, providing hardware
monitoring data). I've tested the driver pretty thoroughly on my HX850i,
and existing userspace tools show that the protocol is common to all HXi
series PSUs. I'm looking to acquire some more Corsair PSU variants (RMi,
and AXi), to see if support can be expanded to them as well.

I do not work for Corsair and I intend to keep this driver maintained
as long as I feasibly can.
Signed-off-by: Jack Doan 
---
 Documentation/hwmon/corsair-hxi-psu.rst |  50 +++
 Documentation/hwmon/index.rst   |   1 +
 MAINTAINERS |   6 +
 drivers/hwmon/Kconfig   |  10 +
 drivers/hwmon/Makefile  |   1 +
 drivers/hwmon/corsair-hxi-psu.c | 504 
 6 files changed, 572 insertions(+)
 create mode 100644 Documentation/hwmon/corsair-hxi-psu.rst
 create mode 100644 drivers/hwmon/corsair-hxi-psu.c

diff --git a/Documentation/hwmon/corsair-hxi-psu.rst 
b/Documentation/hwmon/corsair-hxi-psu.rst
new file mode 100644
index ..91d6beb36f88
--- /dev/null
+++ b/Documentation/hwmon/corsair-hxi-psu.rst
@@ -0,0 +1,50 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+Kernel driver corsair-hxi-psu
+=============================
+
+Supported devices:
+
+  * Corsair HXi ATX power supplies (HX750i, HX850i, HX1000i and HX1200i)
+
+Author: Jack Doan
+
+Description
+-----------
+
+This driver provides a sysfs interface for the Corsair HXi series of ATX
+power supplies.
+
+Usage Notes
+-----------
+
+Since it is a USB device, hot-swapping is possible. The device is 
auto-detected.
+
+Sysfs entries
+-------------
+
+* in0_input / in0_label        Voltage on ATX_12V
+* in1_input / in1_label        Voltage on ATX_5V
+* in2_input / in2_label        Voltage on ATX_3V
+* in3_input / in3_label        Input AC voltage
+
+* curr0_input / curr0_label    Current on ATX_12V
+* curr1_input / curr1_label    Current on ATX_5V
+* curr2_input / curr2_label    Current on ATX_3V
+
+* power1_input / power1_label  Power on ATX_12V
+* power2_input / power2_label  Power on ATX_5V
+* power3_input / power3_label  Power on ATX_3V
+* power4_input / power4_label  Total AC Power
+
+* temp1_input   Temperature before PSU fan
+* temp2_input   Temperature after PSU fan
+
+Future work
+-----------
+
+* Adding support for monitoring and control of the fan
+* Getting and setting the overcurrent-protection mode
+* Testing on other lines of Corsair PSUs (RMi, AXi)
+* Broadening support to other "smart" ATX PSUs (NZXT, Seasonic)
+* Potentially pulling this into the PMBus code
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index b797db738225..afa598fee1e6 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -49,6 +49,7 @@ Hardware Monitoring Kernel Drivers
bt1-pvt
coretemp
corsair-cpro
+   corsair-hxi-psu
da9052
da9055
dell-smm-hwmon
diff --git a/MAINTAINERS b/MAINTAINERS
index 2daa6ee673f7..2c84a0820ef5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4485,6 +4485,12 @@ L:   linux-hw...@vger.kernel.org
 S: Maintained
 F: drivers/hwmon/corsair-cpro.c
 
+CORSAIR-HXI-PSU HARDWARE MONITOR DRIVER
+M: Jack Doan 
+L: linux-hw...@vger.kernel.org
+S: Maintained
+F: drivers/hwmon/corsair-hxi-psu.c
+
 COSA/SRP SYNC SERIAL DRIVER
 M: Jan "Yenya" Kasprzak 
 S: Maintained
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index a850e4f0e0bd..fd28bb859199 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -449,6 +449,16 @@ config SENSORS_CORSAIR_CPRO
  This driver can also be built as a module. If so, the module
  will be called corsair-cpro.
 
+config SENSORS_CORSAIR_HXI_PSU
+   tristate "Corsair HXi power supplies"
+   depends on HID
+   help
+ If you say yes here you get monitoring support for the Corsair HXi 
series
+ of ATX power supplies
+
+ This driver can also be built as a module. If so, the module
+ will be called corsair-hxi-psu.
+
 config SENSORS_DRIVETEMP
tristate "Hard disk drives with temperature sensors"
depends on SCSI && ATA
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 9db2903b61e5..8924270c60b7 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_SENSORS_AXI_FAN_C

Re: [Linux-kernel-mentees] [PATCH net] rds: Prevent kernel-infoleak in rds_notify_queue_get()

2020-08-08 Thread Jack Leadford

Hello!

Thanks to Jason for getting this conversation back on track.

Yes: in general, {} or a partial initializer /will/ zero padding bits.

However, there is a bug in some versions of GCC where {} will /not/ zero
padding bits; actually, Jason's test program in this mail 
https://lore.kernel.org/lkml/20200731143604.gf24...@ziepe.ca/

has the right ingredients to trigger the bug, but the GCC
versions used are outside of the bug window. :)

For more details on these cases and more (including said GCC bug), see 
my paper at:


https://www.nccgroup.com/us/about-us/newsroom-and-events/blog/2019/october/padding-the-struct-how-a-compiler-optimization-can-disclose-stack-memory/

Hopefully this paper can serve as a helpful reference when these cases 
are encountered in the kernel.


Thank you.

Jack Leadford

On 8/3/20 4:06 PM, Jason Gunthorpe wrote:

On Sun, Aug 02, 2020 at 03:45:40PM -0700, Joe Perches wrote:

On Sun, 2020-08-02 at 19:28 -0300, Jason Gunthorpe wrote:

On Sun, Aug 02, 2020 at 03:23:58PM -0700, Joe Perches wrote:

On Sun, 2020-08-02 at 19:10 -0300, Jason Gunthorpe wrote:

On Sat, Aug 01, 2020 at 08:38:33AM +0300, Leon Romanovsky wrote:


I'm using {} instead of {0} because of this GCC bug.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119


This is why the {} extension exists..


There is no guarantee that the gcc struct initialization {}
extension also zeros padding.


We just went over this. Yes there is, C11 requires it.


c11 is not c90.  The kernel uses c90.


The kernel already relies on a lot of C11/C99 features and
behaviors. For instance Linus just bumped the minimum compiler version
so that C11's _Generic is usable.

Why do you think this particular part of C11 shouldn't be relied on?

Jason




[PATCH] drivers: power: axp20x-battery: support setting charge_full_design

2020-07-29 Thread Jack Mitchell
Signed-off-by: Jack Mitchell 
---
 drivers/power/supply/axp20x_battery.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/power/supply/axp20x_battery.c 
b/drivers/power/supply/axp20x_battery.c
index fe96f77bffa7..8ce4ebe7ccd5 100644
--- a/drivers/power/supply/axp20x_battery.c
+++ b/drivers/power/supply/axp20x_battery.c
@@ -60,6 +60,7 @@
 
 #define AXP20X_V_OFF_MASK  GENMASK(2, 0)
 
+#define AXP20X_BAT_MAX_CAP_VALID   BIT(7)
 
 struct axp20x_batt_ps;
 
@@ -86,6 +87,7 @@ struct axp20x_batt_ps {
struct axp20x_thermal_sensor sensor;
/* Maximum constant charge current */
unsigned int max_ccc;
+   unsigned int charge_full_design;
const struct axp_data   *data;
 };
 
@@ -260,6 +262,10 @@ static int axp20x_battery_get_prop(struct power_supply 
*psy,
val->intval = POWER_SUPPLY_HEALTH_GOOD;
break;
 
+   case POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN:
+   val->intval = axp20x_batt->charge_full_design;
+   break;
+
case POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT:
ret = axp20x_get_constant_charge_current(axp20x_batt,
                                                         &val->intval);
@@ -401,6 +407,30 @@ static int axp20x_battery_set_max_voltage(struct 
axp20x_batt_ps *axp20x_batt,
  AXP20X_CHRG_CTRL1_TGT_VOLT, val);
 }
 
+static int axp20x_set_charge_full_design(struct axp20x_batt_ps *axp_batt,
+ int charge_full_uah)
+{
+   /* (Unit: 1.456mAh) */
+   int max_capacity_units = charge_full_uah / 1456;
+   int ret;
+
+   u8 max_capacity_msb = (max_capacity_units & 0x7F00) >> 8;
+   u8 max_capacity_lsb = (max_capacity_units & 0xFF);
+
+   axp_batt->charge_full_design = max_capacity_units * 1456;
+
+   max_capacity_msb |= AXP20X_BAT_MAX_CAP_VALID;
+
+   ret = regmap_write(axp_batt->regmap, AXP288_FG_DES_CAP0_REG,
+  max_capacity_lsb);
+
+   if (ret)
+   return ret;
+
+   return regmap_write(axp_batt->regmap, AXP288_FG_DES_CAP1_REG,
+   max_capacity_msb);
+}
+
 static int axp20x_set_constant_charge_current(struct axp20x_batt_ps *axp_batt,
  int charge_current)
 {
@@ -492,6 +522,7 @@ static enum power_supply_property axp20x_battery_props[] = {
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_VOLTAGE_NOW,
POWER_SUPPLY_PROP_CURRENT_NOW,
+   POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN,
POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT,
POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT_MAX,
POWER_SUPPLY_PROP_HEALTH,
@@ -675,6 +706,7 @@ static int axp20x_power_probe(struct platform_device *pdev)
if (!power_supply_get_battery_info(axp20x_batt->batt, &info)) {
int vmin = info.voltage_min_design_uv;
int ccc = info.constant_charge_current_max_ua;
+   int cfd = info.charge_full_design_uah;
 
if (vmin > 0 && axp20x_set_voltage_min_design(axp20x_batt,
  vmin))
@@ -692,6 +724,13 @@ static int axp20x_power_probe(struct platform_device *pdev)
axp20x_batt->max_ccc = ccc;
axp20x_set_constant_charge_current(axp20x_batt, ccc);
}
+
+   if (cfd > 0 && axp20x_set_charge_full_design(axp20x_batt,
+  cfd)) {
+   dev_err(&pdev->dev,
+   "couldn't set charge_full_design\n");
+   axp20x_batt->charge_full_design = 0;
+   }
}
 
error = axp20x_thermal_register_sensor(pdev, axp20x_batt);
-- 
2.28.0
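
The unit conversion in axp20x_set_charge_full_design() above can be tried
out in userspace. This is a sketch only: the struct and function names are
hypothetical, and the constants simply mirror what the patch shows (one
register unit is 1456 uAh, and bit 7 of the high byte marks the value as
valid).

```c
#include <stdint.h>

#define CAP_UNIT_UAH	1456	/* one register unit, per the patch comment */
#define CAP_VALID	0x80	/* bit 7 of the MSB register, as in the patch */

/* Hypothetical container for the two register bytes. */
struct cap_regs {
	uint8_t msb;
	uint8_t lsb;
	int rounded_uah;	/* capacity after rounding down to whole units */
};

static struct cap_regs encode_design_capacity(int charge_full_uah)
{
	int units = charge_full_uah / CAP_UNIT_UAH;
	struct cap_regs r;

	/* Upper 7 bits of the 15-bit value, plus the valid flag. */
	r.msb = (uint8_t)(((units & 0x7F00) >> 8) | CAP_VALID);
	/* Lower 8 bits. */
	r.lsb = (uint8_t)(units & 0xFF);
	r.rounded_uah = units * CAP_UNIT_UAH;
	return r;
}
```

For a 3000000 uAh battery this gives 2060 units, i.e. msb 0x88, lsb 0x0c
and a reported design capacity of 2999360 uAh. Like the patch, the sketch
does not range-check: anything above 0x7fff units would be truncated in
the register bytes but not in the rounded value.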



Re: Re: PROBLEM: cgroup cost too much memory when transfer small files to tmpfs

2020-07-27 Thread Fangxiuning (Jack, EulerOS)
@Xiuning Would you please take a look and give some suggestions?
I don't recommend this as a long-term solution, since it fixes the issue by
skipping the call to pam-systemd.so. When sftp sends files it calls
pam-systemd.so to create a session, which manages resources more reasonably;
this is the direction systemd upstream is evolving in. systemd does not have
a better solution, and the kernel cgroup code may be able to offer a better
one for this issue.



[Question]many kernel error "neighbour: ndisc_cache: neighbor table overflow!"

2020-06-24 Thread Jack Wang
Hi Folks,

In one of our big clusters, more servers were added to increase
capacity, and we saw many pservers reporting the error message
below:
 "neighbour: ndisc_cache: neighbor table overflow!"

We tested increasing the gc_thresh values in sysctl.conf; after a
reboot, the errors are gone.

+# Threshold when garbage collector becomes more aggressive about
+# purging entries. Entries older than 5 seconds will be cleared
+# when over this number.  Default: 512
+net.ipv4.neigh.default.gc_thresh2 = 4096
+net.ipv6.neigh.default.gc_thresh2 = 4096
+
+# Maximum number of non-PERMANENT neighbor entries allowed.  Increase
+# this when using large numbers of interfaces and when communicating
+# with large numbers of directly-connected peers.  Default: 1024
+net.ipv4.neigh.default.gc_thresh3 = 8192
+net.ipv6.neigh.default.gc_thresh3 = 8192

But we still have many systems running in production, so my question
is: is it safe to apply the settings on the fly while servers are
running with busy traffic? Or do we have to apply the settings only
through sysctl during boot?

Most of our servers with default settings are running kernels
4.14.137 to 4.14.154.

Thanks in advance!

Best regards!

Jack Wang
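
For reference, the sysctl.conf keys above correspond to files under
/proc/sys, which is the interface `sysctl -w` writes to at runtime. The
sketch below only shows that key-to-path mapping; the helper name is
hypothetical, it does not handle keys whose components themselves contain
dots, and it says nothing about whether changing the thresholds under
heavy traffic is safe, which is the open question.

```c
#include <stdio.h>

/*
 * Map a dotted sysctl key such as "net.ipv6.neigh.default.gc_thresh3"
 * to its /proc/sys path. Returns 0 on success, -1 if buf is too small.
 */
static int sysctl_key_to_path(const char *key, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/sys/%s", key);
	char *p;

	if (n < 0 || (size_t)n >= len)
		return -1;

	/* Replace the dots of the key part with path separators. */
	for (p = buf + sizeof("/proc/sys/") - 1; *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}
```

Writing the new value to the resulting path as root changes the running
kernel's setting without a reboot.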


Re: [PATCH v3 6/6] arm64: boot: dts: qcom: pm8150b: Add DTS node for PMIC VBUS booster

2020-06-17 Thread Jack Pham
Hey Wesley,

On Wed, Jun 17, 2020 at 11:02:09AM -0700, Wesley Cheng wrote:
> Add the required DTS node for the USB VBUS output regulator, which is
> available on PM8150B.  This will provide the VBUS source to connected
> peripherals.
> 
> Signed-off-by: Wesley Cheng 
> ---
>  arch/arm64/boot/dts/qcom/pm8150b.dtsi   | 6 ++
>  arch/arm64/boot/dts/qcom/sm8150-mtp.dts | 7 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/pm8150b.dtsi 
> b/arch/arm64/boot/dts/qcom/pm8150b.dtsi
> index ec44a8bc2f84..b7274d9d7341 100644
> --- a/arch/arm64/boot/dts/qcom/pm8150b.dtsi
> +++ b/arch/arm64/boot/dts/qcom/pm8150b.dtsi
> @@ -22,6 +22,12 @@ power-on@800 {
>   status = "disabled";
>   };
>  
> + qcom,dcdc@1100 {
> + compatible = "qcom,pm8150b-vbus-reg";
> + status = "disabled";
> + reg = <0x1100>;
> + };
> +
>   qcom,typec@1500 {
>   compatible = "qcom,pm8150b-usb-typec";
>   status = "disabled";

Don't you also need a "usb_vbus-supply" property here under the Type-C
node pointing to the phandle of the vbus reg?

Jack

> diff --git a/arch/arm64/boot/dts/qcom/sm8150-mtp.dts 
> b/arch/arm64/boot/dts/qcom/sm8150-mtp.dts
> index 6c6325c3af59..3845d19893eb 100644
> --- a/arch/arm64/boot/dts/qcom/sm8150-mtp.dts
> +++ b/arch/arm64/boot/dts/qcom/sm8150-mtp.dts
> @@ -426,6 +426,13 @@ _1 {
>   status = "okay";
>  };
>  
> +_bus {
> + pmic@2 {
> + qcom,dcdc@1100 {
> + status = "okay";
> + };
> +};
> +
>  _1_dwc3 {
>   dr_mode = "peripheral";
>  };
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


Re: [PATCH 1/3] usb: typec: Add QCOM PMIC typec detection driver

2020-06-09 Thread Jack Pham
_vbus_rdesc;
> + struct regulator_dev *usb_vbus_reg;

These aren't used...leftovers from earlier revisions?

> +};
> +
> +static int qcom_pmic_typec_vbus_enable(struct qcom_pmic_typec *qcom_usb)
> +{
> + int rc;
> +
> + rc = regmap_update_bits(qcom_usb->regmap, CMD_OTG, OTG_EN, OTG_EN);
> + if (rc)
> + dev_err(qcom_usb->dev, "failed to update OTG_CTL\n");
> +
> + return rc;
> +}
> +
> +static int qcom_pmic_typec_vbus_disable(struct qcom_pmic_typec *qcom_usb)
> +{
> + int rc;
> +
> + rc = regmap_update_bits(qcom_usb->regmap, CMD_OTG, OTG_EN, 0);
> + if (rc)
> + dev_err(qcom_usb->dev, "failed to update OTG_CTL\n");
> +
> + return rc;
> +}
> +
> +void qcom_pmic_typec_bh_work(struct work_struct *w)
> +{
> + struct qcom_pmic_typec *qcom_usb = container_of(w,
> + struct qcom_pmic_typec,
> + bh_work);
> + enum typec_orientation orientation;
> + enum usb_role role;
> + unsigned int stat;
> +
> + regmap_read(qcom_usb->regmap, TYPEC_MISC_STATUS, &stat);
> +
> + if (stat & CC_ATTACHED) {
> + orientation = ((stat & CC_ORIENTATION) >> 1) ?
> + TYPEC_ORIENTATION_REVERSE :
> + TYPEC_ORIENTATION_NORMAL;
> + typec_set_orientation(qcom_usb->port, orientation);
> +
> + role = (stat & SNK_SRC_MODE) ? USB_ROLE_HOST : USB_ROLE_DEVICE;
> + if (role == USB_ROLE_HOST)
> + qcom_pmic_typec_vbus_enable(qcom_usb);
> + else
> + qcom_pmic_typec_vbus_disable(qcom_usb);
> +
> + usb_role_switch_set_role(qcom_usb->role_sw, role);
> + } else {
> + usb_role_switch_set_role(qcom_usb->role_sw, USB_ROLE_NONE);
> + qcom_pmic_typec_vbus_disable(qcom_usb);
> + }
> +}
> +
> +irqreturn_t qcom_pmic_typec_interrupt(int irq, void *_qcom_usb)
> +{
> + struct qcom_pmic_typec *qcom_usb = _qcom_usb;
> +
> + queue_work(system_power_efficient_wq, &qcom_usb->bh_work);

Do we really need a workqueue function here or would a threaded IRQ
handler work?

> +
> + return IRQ_HANDLED;
> +}
> +
> +static void qcom_pmic_typec_typec_hw_init(struct qcom_pmic_typec *qcom_usb)
> +{
> + u8 mode;
> +
> + regmap_update_bits(qcom_usb->regmap, TYPE_C_CFG_REG, BC12_START_ON_CC,
> +0);
> + regmap_update_bits(qcom_usb->regmap, TYPEC_INTR_EN_CFG_1,
> +TYPEC_INTR_EN_CFG_1_MASK, 0);
> +
> + if (qcom_usb->cap->type != TYPEC_PORT_DRP)
> + mode = (qcom_usb->cap->type == TYPEC_PORT_SNK) ?
> + EN_SNK_ONLY : EN_SRC_ONLY;
> + else
> + mode = EN_TRY_SNK;
> + regmap_update_bits(qcom_usb->regmap, TYPEC_MODE_CFG,
> +EN_SNK_ONLY | EN_TRY_SNK | EN_SRC_ONLY, mode);

HW also has an EN_TRY_SRC bit too. Would these EN_TRY_{SRC,SNK} bits
happen to be compatible with the struct typec's try_role() at all?

> +
> + regmap_update_bits(qcom_usb->regmap, TYPEC_VCONN_CONTROL,
> +VCONN_EN_SRC | VCONN_EN_VAL, VCONN_EN_SRC);
> + regmap_update_bits(qcom_usb->regmap, TYPEC_VCONN_CONTROL,
> +VCONN_EN_SRC | VCONN_EN_VAL, VCONN_EN_SRC);

duplicated?

> + regmap_update_bits(qcom_usb->regmap, TYPEC_EXIT_STATE_CFG,
> +SEL_SRC_UPPER_REF, SEL_SRC_UPPER_REF);
> + regmap_update_bits(qcom_usb->regmap, OTG_CFG, OTG_EN_SRC_CFG,
> +OTG_EN_SRC_CFG);

I thought setting this OTG_EN_SRC_CFG bit makes the VBUS boost
controlled by HW, thereby negating the need for
qcom_pmic_typec_vbus_enable/disable ?

Thanks,
Jack

> +}
> +
> +static int qcom_pmic_typec_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct qcom_pmic_typec *qcom_usb;
> + struct typec_capability *cap;
> + const char *buf;
> + int ret, irq, role;
> +
> + qcom_usb = kzalloc(sizeof(*qcom_usb), GFP_KERNEL);
> + if (!qcom_usb)
> + return -ENOMEM;
> +
> + qcom_usb->dev = dev;
> +
> + qcom_usb->regmap = dev_get_regmap(dev->parent, NULL);
> + if (!qcom_usb->regmap) {
> + dev_err(dev, "Failed to get regmap\n");
> + return -EINVAL;
> + }
> +
> + irq = platform_get_irq(pdev, 0);
> + if (irq < 0) {
> + dev_err(dev, "Failed to get C


Re: [RFC v3 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements

2020-05-29 Thread Jack Pham
->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +
> + /* Check fifo size allocation doesn't exceed available RAM size. */
> + if (dwc->last_fifo_depth >= ram1_depth) {
> + dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
> + (dwc->last_fifo_depth * mdwidth), ram1_depth,
> + dep->endpoint.name, fifo_size);

Use dev_WARN() here and eliminate the WARN_ON(1) below?

> + if (dwc3_is_usb31(dwc))
> + fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> + else
> + fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> + dwc->last_fifo_depth -= fifo_size;
> + dep->fifo_depth = 0;
> + WARN_ON(1);
> + return -ENOMEM;
> + }
> +
> + dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
> + dwc->num_ep_resized++;
> + return 0;
> +}
> +
>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int 
> action)
>  {
>   const struct usb_ss_ep_comp_descriptor *comp_desc;
> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, 
> unsigned int action)
>   int ret;
>  
>   if (!(dep->flags & DWC3_EP_ENABLED)) {
> + ret = dwc3_gadget_resize_tx_fifos(dwc, dep);
> + if (ret)
> + return ret;
> +
>   ret = dwc3_gadget_start_config(dep);
>   if (ret)
>   return ret;

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH] usb: dwc3: gadget: Correct the logic for finding last SG entry

2019-10-23 Thread Jack Pham
Hi Anurag,

On Fri, Jun 07, 2019 at 09:49:59AM +0300, Felipe Balbi wrote:
> Anurag Kumar Vulisha  writes:
> >>> The dma_map_sg() merges sg1 & sg2 memory regions into sg1-
> >>>dma_address.
> >>> Similarly sg3 & sg4 into sg2->dma_address, sg5 & sg6 into the
> >>> sg3->dma_address and sg6 & sg8 into sg4->dma_address. Here the
> >>memory
> >>> regions are merged but the page_link properties like SG_END are not
> >>> retained into the merged sgs.
> >>
> >>isn't this a bug in the scatterlist mapping code? Why doesn't it keep
> >>SG_END?
> >>
> >
> > Thanks for providing your comment.
> >
> > I don't think it is a bug, instead I feel some enhancement needs to be done 
> > in
> > dma-mapping code.
> >
> > SG_END represents the last sg entry in the sglist and it is correctly 
> > getting
> > set to the last sg entry.
> >
> > The issue happens only when 2 or more sg entry pages are merged into
> > contiguous dma-able address and sg_is_last() is used to find the last sg 
> > entry
> > with valid dma address.
> 
> Right, and that's something that's bound to happen. I'm arguing that, perhaps,
> dma API should move SG_END in case entries are merged.
> 
> > I think that along with sg_is_last() a new flag (SG_DMA_END) and function
> > (something like sg_dma_is_last() ) needs to be added into dma-mapping code 
> > for
> > identifying the last valid sg entry with valid dma address. So that we can
> > make use of that function instead of sg_is_last().
> 
> Sure, propose a patch to DMA API.

I'm curious if this was ever resolved. I just ran into this exact issue
with Android ADB which uses 16KB buffers, along with f_fs supporting
S/G since 5.0, combined with our IOMMU which performs this merging
behavior, so it resulted in a single TRB getting queued with CHN=1 and
LST=0 and thus the transfer never completes. Your initial patch resolves
the issue for me, but upon revisiting this discussion I couldn't tell if
you had attempted to patch DMA API instead as per Felipe's suggestion.

Thanks,
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
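
The failure mode described above can be reproduced with a toy model. This
is a sketch under stated assumptions: `struct seg`, `toy_map()` and
`last_by_flag()` are hypothetical stand-ins, not the kernel's scatterlist
or DMA API. The mapping step merges contiguous entries into the front of
the list and leaves the end marker on the original last entry, so a
consumer that walks only the mapped entries looking for that marker never
finds it.

```c
#include <stddef.h>

/* Simplified stand-in for a scatterlist entry (not the kernel struct). */
struct seg {
	unsigned long phys;	/* CPU-side address of the buffer piece */
	unsigned int len;
	unsigned long dma_addr;	/* filled in by the mapping step */
	unsigned int dma_len;
	int is_last;		/* plays the role of SG_END */
};

/*
 * Toy mapping step: merge physically contiguous entries into the front
 * of the list and return how many DMA entries are valid afterwards.
 * The is_last marker is deliberately left where it was.
 */
static size_t toy_map(struct seg *sg, size_t nents)
{
	size_t i, out = 0;

	if (nents == 0)
		return 0;

	sg[0].dma_addr = sg[0].phys;
	sg[0].dma_len = sg[0].len;
	for (i = 1; i < nents; i++) {
		if (sg[out].dma_addr + sg[out].dma_len == sg[i].phys) {
			sg[out].dma_len += sg[i].len;
		} else {
			out++;
			sg[out].dma_addr = sg[i].phys;
			sg[out].dma_len = sg[i].len;
		}
	}
	return out + 1;
}

/*
 * Look for the end marker among the mapped entries only. Returns the
 * index of the marked entry, or -1 if none of them carries the marker.
 */
static int last_by_flag(const struct seg *sg, size_t mapped)
{
	size_t i;

	for (i = 0; i < mapped; i++)
		if (sg[i].is_last)
			return (int)i;
	return -1;
}
```

With four contiguous 4 KB pieces (a 16 KB buffer) and the marker on the
fourth entry, toy_map() returns 1 and last_by_flag() returns -1. In this
model the last valid entry is simply index mapped - 1, which is the
information a consumer would need instead of the marker.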


Re: [PATCH v2 3/3] phy: qcom-qmp: Add SM8150 QMP UFS PHY support

2019-10-08 Thread Jack Pham
Hi Vinod,

On Fri, Sep 06, 2019 at 10:40:17AM +0530, Vinod Koul wrote:
> SM8150 UFS PHY is v4 of QMP phy. Add support for V4 QMP phy register
> defines and support for SM8150 QMP UFS PHY.
> 
> Signed-off-by: Vinod Koul 
> Reviewed-by: Bjorn Andersson 
> ---
>  drivers/phy/qualcomm/phy-qcom-qmp.c | 125 
>  drivers/phy/qualcomm/phy-qcom-qmp.h |  96 +
>  2 files changed, 221 insertions(+)
> 
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c 
> b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index 34ff6434da8f..92d3048f2b36 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -164,6 +164,11 @@ static const unsigned int sdm845_ufsphy_regs_layout[] = {
>   [QPHY_PCS_READY_STATUS] = 0x160,
>  };
>  
> +static const unsigned int sm8150_ufsphy_regs_layout[] = {
> + [QPHY_START_CTRL]   = 0x00,
> + [QPHY_PCS_READY_STATUS] = 0x180,
> +};
> +
>  static const struct qmp_phy_init_tbl msm8996_pcie_serdes_tbl[] = {
>   QMP_PHY_INIT_CFG(QSERDES_COM_BIAS_EN_CLKBUFLR_EN, 0x1c),
>   QMP_PHY_INIT_CFG(QSERDES_COM_CLK_ENABLE1, 0x10),
> @@ -878,6 +883,93 @@ static const struct qmp_phy_init_tbl 
> msm8998_usb3_pcs_tbl[] = {
>   QMP_PHY_INIT_CFG(QPHY_V3_PCS_RXEQTRAINING_RUN_TIME, 0x13),
>  };
>  
> +static const struct qmp_phy_init_tbl sm8150_ufsphy_serdes_tbl[] = {
> + QMP_PHY_INIT_CFG(QPHY_POWER_DOWN_CONTROL, 0x01),
> + QMP_PHY_INIT_CFG(QSERDES_COM_V4_SYSCLK_EN_SEL, 0xd9),

QSERDES_V4_COM? See below.



> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.h 
> b/drivers/phy/qualcomm/phy-qcom-qmp.h
> index 335ea5d7ef40..0eefd8621669 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.h
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.h
> @@ -313,4 +313,100 @@
>  #define QPHY_V3_PCS_MISC_OSC_DTCT_MODE2_CONFIG4  0x5c
>  #define QPHY_V3_PCS_MISC_OSC_DTCT_MODE2_CONFIG5  0x60
>  
> +/* Only for QMP V4 PHY - QSERDES COM registers */
> +#define QSERDES_COM_V4_SYSCLK_EN_SEL 0x094

Should these rather be prefixed as QSERDES_V4_COM? There are already
QSERDES_V3_COM_* in this header so the convention appears to be
Q{SERDES,PHY}_VX_{COM,TX,RX,PCS}.

> +#define QSERDES_COM_V4_HSCLK_SEL 0x158
> +#define QSERDES_COM_V4_HSCLK_HS_SWITCH_SEL   0x15C
> +#define QSERDES_COM_V4_LOCK_CMP_EN   0x0A4
> +#define QSERDES_COM_V4_VCO_TUNE_MAP  0x10C

Nit: sort in ascending offset order, and make the hex values lowercase?



> +/* Only for QMP V4 PHY - PCS registers */
> +#define QPHY_V4_PHY_START0x000
> +#define QPHY_V4_POWER_DOWN_CONTROL   0x004
> +#define QPHY_V4_SW_RESET 0x008
> +#define QPHY_V4_PCS_READY_STATUS 0x180
> +#define QPHY_V4_LINECFG_DISABLE  0x148
> +#define QPHY_V4_MULTI_LANE_CTRL1 0x1E0
> +#define QPHY_V4_RX_SIGDET_CTRL2  0x158
> +#define QPHY_V4_TX_LARGE_AMP_DRV_LVL 0x030
> +#define QPHY_V4_TX_SMALL_AMP_DRV_LVL 0x038
> +#define QPHY_V4_TX_MID_TERM_CTRL10x1D8
> +#define QPHY_V4_DEBUG_BUS_CLKSEL 0x124
> +#define QPHY_V4_PLL_CNTL 0x02C
> +#define QPHY_V4_TIMER_20US_CORECLK_STEPS_MSB 0x00C
> +#define QPHY_V4_TIMER_20US_CORECLK_STEPS_LSB 0x010
> +#define QPHY_V4_TX_PWM_GEAR_BAND 0x160
> +#define QPHY_V4_TX_HS_GEAR_BAND  0x168
> +#define QPHY_V4_TX_HSGEAR_CAPABILITY 0x074
> +#define QPHY_V4_RX_HSGEAR_CAPABILITY 0x0B4
> +#define QPHY_V4_RX_MIN_HIBERN8_TIME  0x150
> +#define QPHY_V4_BIST_FIXED_PAT_CTRL  0x060

Interesting. These offsets appear to be valid only for the UFS instance
of the QMP PHY. For PCIe and USB the PCS layout is completely different.
Wonder if we need to add _UFS_ to  the prefix to differentiate them? Or
can this be deferred to when PCIe/USB PHY driver support for SM8150 gets
added?

I was thinking of taking a stab at USB if I get time, not sure if that's
already on your or somebody's (Bjorn?) radar.

Thanks
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [RFC][PATCH v2 2/5] usb: dwc3: Execute GCTL Core Soft Reset while switch mdoe for Hisilicon Kirin Soc

2019-10-07 Thread Jack Pham
Hi John, Yu, Felipe,

On Mon, Oct 07, 2019 at 05:55:50PM +, John Stultz wrote:
> From: Yu Chen 
> 
> A GCTL soft reset should be executed when switch mode for dwc3 core
> of Hisilicon Kirin Soc.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Felipe Balbi 
> Cc: Andy Shevchenko 
> Cc: Rob Herring 
> Cc: Mark Rutland 
> Cc: Yu Chen 
> Cc: Matthias Brugger 
> Cc: Chunfeng Yun 
> Cc: linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Yu Chen 
> Signed-off-by: John Stultz 
> ---
>  drivers/usb/dwc3/core.c | 20 
>  drivers/usb/dwc3/core.h |  3 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 999ce5e84d3c..440261432421 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -112,6 +112,19 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>   dwc->current_dr_role = mode;
>  }
>  
> +static void dwc3_gctl_core_soft_reset(struct dwc3 *dwc)
> +{
> + u32 reg;
> +
> + reg = dwc3_readl(dwc->regs, DWC3_GCTL);
> + reg |= DWC3_GCTL_CORESOFTRESET;
> + dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +
> + reg = dwc3_readl(dwc->regs, DWC3_GCTL);
> + reg &= ~DWC3_GCTL_CORESOFTRESET;
> + dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +}
> +
>  static void __dwc3_set_mode(struct work_struct *work)
>  {
>   struct dwc3 *dwc = work_to_dwc(work);
> @@ -156,6 +169,10 @@ static void __dwc3_set_mode(struct work_struct *work)
>  
>   dwc3_set_prtcap(dwc, dwc->desired_dr_role);
>  
> + /* Execute a GCTL Core Soft Reset when switch mode */
> + if (dwc->gctl_reset_quirk)
> + dwc3_gctl_core_soft_reset(dwc);
> +

In fact it is mentioned in the Synopsys databook to perform a GCTL
CoreSoftReset when changing the PrtCapDir between device & host modes.
So I think this should apply generally without a quirk. Further, it
states to do this *prior* to writing PrtCapDir, so should it go before
dwc3_set_prtcap() instead?

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
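
The ordering suggested above (soft reset first, then program PrtCapDir)
can be sketched against a fake register. Everything here is an assumption
for illustration: the bit positions, the `fake_gctl` variable standing in
for the memory-mapped register, and the helper names are not taken from
the patch or the databook.

```c
#include <stdint.h>

/* Bit layout assumed for this sketch only. */
#define GCTL_CORESOFTRESET	(1u << 11)
#define GCTL_PRTCAPDIR_MASK	(3u << 12)
#define GCTL_PRTCAPDIR(n)	(((uint32_t)(n) & 3u) << 12)

/* Stands in for the memory-mapped GCTL register. */
static uint32_t fake_gctl;

static uint32_t reg_read(void)
{
	return fake_gctl;
}

static void reg_write(uint32_t v)
{
	fake_gctl = v;
}

/* Assert and then deassert the soft-reset bit, as the patch does. */
static void core_soft_reset(void)
{
	uint32_t reg;

	reg = reg_read();
	reg |= GCTL_CORESOFTRESET;
	reg_write(reg);

	reg = reg_read();
	reg &= ~GCTL_CORESOFTRESET;
	reg_write(reg);
}

/* Read-modify-write of the port capability direction field. */
static void set_prtcap(unsigned int mode)
{
	uint32_t reg = reg_read();

	reg &= ~GCTL_PRTCAPDIR_MASK;
	reg |= GCTL_PRTCAPDIR(mode);
	reg_write(reg);
}

/* Ordering proposed in the review: reset before writing PrtCapDir. */
static void switch_mode(unsigned int mode)
{
	core_soft_reset();
	set_prtcap(mode);
}
```

A fake register cannot show the hardware effect of the ordering; the
sketch only makes the proposed sequence of register accesses concrete.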


Re: [PATCH 4.19 00/57] 4.19.72-stable review

2019-09-10 Thread Jack Wang
On Mon, Sep 9, 2019 at 12:19 PM Greg Kroah-Hartman wrote:
>
> This is the start of the stable review cycle for the 4.19.72 release.
> There are 57 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.72-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Merged, booted, and tested on my test machine; no regressions found.

Thanks,
Jack Wang


Re: [PATCH v4 3/4] dt-bindings: Add Qualcomm USB SuperSpeed PHY bindings

2019-09-05 Thread Jack Pham
Hi Jorge, Bjorn,

On Thu, Sep 05, 2019 at 09:18:57AM +0200, Jorge Ramirez wrote:
> On 9/4/19 01:34, Bjorn Andersson wrote:
> > On Tue 03 Sep 14:45 PDT 2019, Stephen Boyd wrote:
> > 
> >> Quoting Jack Pham (2019-09-03 10:39:24)
> >>> On Mon, Sep 02, 2019 at 08:23:04AM +0200, Jorge Ramirez wrote:
> >>>> On 8/30/19 20:28, Stephen Boyd wrote:
> >>>>> Quoting Bjorn Andersson (2019-08-30 09:45:20)
> >>>>>> On Fri 30 Aug 09:01 PDT 2019, Stephen Boyd wrote:
> >>>>>>
> >>>>>>>>>
> >>>>>>>>> The USB-C connector is attached both to the HS and SS PHYs, so I 
> >>>>>>>>> think
> >>>>>>>>> you should represent this external to this node and use of_graph to
> >>>>>>>>> query it.
> >>>>>>>>
> >>>>>>>> but AFAICS we wont be able to retrieve the vbux-supply from an 
> >>>>>>>> external
> >>>>>>>> node (that interface does not exist).
> >>>>>>>>
> >>>>>>>> rob, do you have a suggestion?
> >>>>>>>
> >>>>>>> Shouldn't the vbus supply be in the phy? Or is this a situation where
> >>>>>>> the phy itself doesn't have the vbus supply going to it because the 
> >>>>>>> PMIC
> >>>>>>> gets in the way and handles the vbus for the connector by having the 
> >>>>>>> SoC
> >>>>>>> communicate with the PMIC about when to turn the vbus on and off, etc?
> >>>>>>>
> >>>>>>
> >>>>>> That's correct, the VBUS comes out of the PMIC and goes directly to the
> >>>>>> connector.
> >>>>>>
> >>>>>> The additional complicating factor here is that the connector is wired
> >>>>>> to a USB2 phy as well, so we need to wire up detection and vbus control
> >>>>>> to both of them - but I think this will be fine, if we can only figure
> >>>>>> out a sane way of getting hold of the vbus-supply.
> >>>>>>
> >>>>>
> >>>>> Does it really matter to describe this situation though? Maybe it's
> >>>>> simpler to throw the vbus supply into the phy and control it from the
> >>>>> phy driver, even if it never really goes there. Or put it into the
> >>>>> toplevel usb controller?
> >>>>>
> >>>> that would work for me - the connector definition seemed a better way to
> >>>> explain the connectivity but since we cant retrieve the supply from the
> >>>> external node is not of much functional use.
> >>>>
> >>>> but please let me know how to proceed. shall I add the supply back to
> >>>> the phy?
> >>
> >> So does the vbus actually go to the phy? I thought it never went there
> >> and the power for the phy was different (and possibly lower in voltage).
> >>
> > 
> > No, the PHYs use different - lower voltage - supplies to operate. VBUS
> > is coming from a 5V supply straight to the connector and plug-detect
> > logic (which is passive in this design).
> > 
> >>>
> >>> Putting it in the toplevel usb node makes sense to me, since that's
> >>> usually the driver that knows when it's switching into host mode and
> >>> needs to turn on VBUS. The dwc3-qcom driver & bindings currently don't 
> >>> do this but there's precedent in a couple of the other dwc3 "glues"--see
> >>> Documentation/devicetree/bindings/usb/{amlogic\,dwc3,omap-usb}.txt
> >>>
> >>> One exception is if the PMIC is also USB-PD capable and can do power
> >>> role swap, in which case the VBUS control needs to be done by the TCPM,
> >>> so that'd be a case where having vbus-supply in the connector node might
> >>> make more sense.
> >>>
> >>
> >> The other way is to implement the code to get the vbus supply out of a
> >> connector. Then any driver can do the work if it knows it needs to and
> >> we don't have to care that the vbus isn't going somewhere. I suppose
> >> that would need an of_regulator_get() sort of API that can get the
> >> regulator out of there? Or to make the connector into a struct device
> >> that can get the regulator out per s

Re: [PATCH v4 3/4] dt-bindings: Add Qualcomm USB SuperSpeed PHY bindings

2019-09-03 Thread Jack Pham
On Mon, Sep 02, 2019 at 08:23:04AM +0200, Jorge Ramirez wrote:
> On 8/30/19 20:28, Stephen Boyd wrote:
> > Quoting Bjorn Andersson (2019-08-30 09:45:20)
> >> On Fri 30 Aug 09:01 PDT 2019, Stephen Boyd wrote:
> >>
> >>> Quoting Jorge Ramirez (2019-08-29 00:03:48)
> >>>> On 2/23/19 17:52, Bjorn Andersson wrote:
> >>>>> On Thu 07 Feb 03:17 PST 2019, Jorge Ramirez-Ortiz wrote:
> >>>>>> +
> >>>>>> +Required child nodes:
> >>>>>> +
> >>>>>> +- usb connector node as defined in 
> >>>>>> bindings/connector/usb-connector.txt
> >>>>>> +  containing the property vbus-supply.
> >>>>>> +
> >>>>>> +Example:
> >>>>>> +
> >>>>>> +usb3_phy: usb3-phy@78000 {
> >>>>>> +compatible = "qcom,snps-usb-ssphy";
> >>>>>> +reg = <0x78000 0x400>;
> >>>>>> +#phy-cells = <0>;
> >>>>>> +clocks = < RPM_SMD_LN_BB_CLK>,
> >>>>>> + < GCC_USB_HS_PHY_CFG_AHB_CLK>,
> >>>>>> + < GCC_USB3_PHY_PIPE_CLK>;
> >>>>>> +clock-names = "ref", "phy", "pipe";
> >>>>>> +resets = < GCC_USB3_PHY_BCR>,
> >>>>>> + < GCC_USB3PHY_PHY_BCR>;
> >>>>>> +reset-names = "com", "phy";
> >>>>>> +vdd-supply = <_l3_1p05>;
> >>>>>> +vdda1p8-supply = <_l5_1p8>;
> >>>>>> +usb3_c_connector: usb3-c-connector {
> >>>
> >>> Node name should be 'connector', not usb3-c-connector.
> >>>
> >>
> >> It probably has to be usb-c-connector, because we have a
> >> micro-usb-connector on the same board.
> > 
> > Ok. Or connector@1 and connector@2? Our toplevel node container story is
> > sort of sad because we have to play tricks with node names. But in the
> > example, just connector I presume? 
> > 
> >>
> >>>>>
> >>>>> The USB-C connector is attached both to the HS and SS PHYs, so I think
> >>>>> you should represent this external to this node and use of_graph to
> >>>>> query it.
> >>>>
> >>>> but AFAICS we wont be able to retrieve the vbux-supply from an external
> >>>> node (that interface does not exist).
> >>>>
> >>>> rob, do you have a suggestion?
> >>>
> >>> Shouldn't the vbus supply be in the phy? Or is this a situation where
> >>> the phy itself doesn't have the vbus supply going to it because the PMIC
> >>> gets in the way and handles the vbus for the connector by having the SoC
> >>> communicate with the PMIC about when to turn the vbus on and off, etc?
> >>>
> >>
> >> That's correct, the VBUS comes out of the PMIC and goes directly to the
> >> connector.
> >>
> >> The additional complicating factor here is that the connector is wired
> >> to a USB2 phy as well, so we need to wire up detection and vbus control
> >> to both of them - but I think this will be fine, if we can only figure
> >> out a sane way of getting hold of the vbus-supply.
> >>
> > 
> > Does it really matter to describe this situation though? Maybe it's
> > simpler to throw the vbus supply into the phy and control it from the
> > phy driver, even if it never really goes there. Or put it into the
> > toplevel usb controller?
> > 
> that would work for me - the connector definition seemed a better way to
> explain the connectivity but since we cant retrieve the supply from the
> external node is not of much functional use.
> 
> but please let me know how to proceed. shall I add the supply back to
> the phy?

Putting it in the toplevel usb node makes sense to me, since that's
usually the driver that knows when it's switching into host mode and
needs to turn on VBUS. The dwc3-qcom driver & bindings currently don't 
do this but there's precedent in a couple of the other dwc3 "glues"--see
Documentation/devicetree/bindings/usb/{amlogic\,dwc3,omap-usb}.txt

One exception is if the PMIC is also USB-PD capable and can do power
role swap, in which case the VBUS control needs to be done by the TCPM,
so that'd be a case where having vbus-supply in the connector node might
make more sense.

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH AUTOSEL 4.4 04/14] perf header: Fix divide by zero error if f_header.attr_size==0

2019-08-19 Thread Jack Wang
On Tue, Aug 6, 2019 at 11:39 PM Sasha Levin wrote:
>
> From: Vince Weaver 
>
> [ Upstream commit 7622236ceb167aa3857395f9bdaf871442aa467e ]
>
> So I have been having lots of trouble with hand-crafted perf.data files
> causing segfaults and the like, so I have started fuzzing the perf tool.
>
> First issue found:
>
> If f_header.attr_size is 0 in the perf.data file, then perf will crash
> with a divide-by-zero error.
>
> Committer note:
>
> Added a pr_err() to tell the user why the command failed.
>
> Signed-off-by: Vince Weaver 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: Peter Zijlstra 
> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907231100440.14532@macbook-air
> Signed-off-by: Arnaldo Carvalho de Melo 
> Signed-off-by: Sasha Levin 
> ---
>  tools/perf/util/header.c | 7 +++
>  1 file changed, 7 insertions(+)
>
Hi all,

This one causes a build failure when I rebased onto 4.14.140-rc1 in the stable-rc tree.

util/header.c: In function 'perf_session__read_header':
util/header.c:2907:10: error: 'data' undeclared (first use in this function); did you mean 'dots'?
          data->file.path);

Should be fixed by:
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2904,7 +2904,7 @@ int perf_session__read_header(struct perf_session *session)
 	if (f_header.attr_size == 0) {
 		pr_err("ERROR: The %s file's attr size field is 0 which is unexpected.\n"
 		       "Was the 'perf record' command properly terminated?\n",
-		       data->file.path);
+		       file->path);
 		return -EINVAL;

Regards,
Jack Wang

> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 304f5d7101436..0102dd46fb6da 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2591,6 +2591,13 @@ int perf_session__read_header(struct perf_session *session)
>file->path);
> }
>
> +   if (f_header.attr_size == 0) {
> +   pr_err("ERROR: The %s file's attr size field is 0 which is unexpected.\n"
> +  "Was the 'perf record' command properly terminated?\n",
> +  data->file.path);
> +   return -EINVAL;
> +   }
> +
> nr_attrs = f_header.attrs.size / f_header.attr_size;
> lseek(fd, f_header.attrs.offset, SEEK_SET);
>
> --
> 2.20.1
>
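
The guard added by the patch above can be illustrated outside of perf. This is only a minimal sketch: `struct demo_file_header` and `demo_count_attrs()` are hypothetical names invented for illustration, not the real perf structures, and the point is simply that the divisor is checked before the division happens.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the two header fields involved; the real
 * perf.data header has more members and a different layout.
 */
struct demo_file_header {
	uint64_t attr_size;   /* size of one attr entry, in bytes */
	uint64_t attrs_size;  /* total size of the attrs section, in bytes */
};

/*
 * Return the number of attr entries, or -EINVAL when attr_size is 0.
 * Testing the divisor first is what avoids the divide-by-zero crash.
 */
long demo_count_attrs(const struct demo_file_header *hdr, const char *path)
{
	if (hdr->attr_size == 0) {
		fprintf(stderr,
			"ERROR: The %s file's attr size field is 0 which is unexpected.\n",
			path);
		return -EINVAL;
	}

	return (long)(hdr->attrs_size / hdr->attr_size);
}
```

A caller would pass the parsed header plus the file path used in the error message, and treat a negative return value as a malformed file.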


Re: [PATCH 4.14 00/53] 4.14.137-stable review

2019-08-06 Thread Jack Wang
On Mon, Aug 5, 2019 at 3:14 PM Greg Kroah-Hartman wrote:
>
> This is the start of the stable review cycle for the 4.14.137 release.
> There are 53 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 07 Aug 2019 12:47:58 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.137-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Merged and regression tested on my test machines; all looks good!

Thanks,
Jack Wang


Module compression & loadpin

2019-07-31 Thread Jack Rosenthal
Has anyone looked into what it may take to support both module
compression and loadpin (ensures modules come from trusted filesystem)?

From my understanding, this is not supported because kmod currently does the
decompression of modules, while loadpin prefers finit_module as it can
tell where the module came from. (https://crbug.com/777204)

In a nutshell, I think supporting this scenario would require the
module decompression to happen on the kernel side. I am wondering if anyone
has looked into this before I go and build a solution...

Thanks,

Jack


Re: [PATCH stable-4.19 1/2] KVM: nVMX: do not use dangling shadow VMCS after guest reset

2019-07-29 Thread Jack Wang
On Mon, Jul 29, 2019 at 11:10 AM Paolo Bonzini wrote:
>
> On 29/07/19 10:58, Jack Wang wrote:
> > On Thu, Jul 25, 2019 at 3:29 PM Vitaly Kuznetsov wrote:
> >>
> >> From: Paolo Bonzini 
> >>
> >> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
> >>
> >> If a KVM guest is reset while running a nested guest, free_nested will
> >> disable the shadow VMCS execution control in the vmcs01.  However,
> >> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
> >> the VMCS12 to the shadow VMCS which has since been freed.
> >>
> >> This causes a vmptrld of a NULL pointer on my machine, but Jan reports
> >> the host to hang altogether.  Let's see how much this trivial patch fixes.
> >>
> >> Reported-by: Jan Kiszka 
> >> Cc: Liran Alon 
> >> Cc: sta...@vger.kernel.org
> >> Signed-off-by: Paolo Bonzini 
> >
> > Hi all,
> >
> > Do we need to backport the fix also to stable 4.14?  It applies
> > cleanly and compiles fine.
>
> The reproducer required newer kernels that support KVM_GET_NESTED_STATE
> and KVM_SET_NESTED_STATE, so it would be hard to test it.  However, the
> patch itself should be safe.
>
> Paolo

Thanks Paolo for the confirmation. I'm asking because we had one incident
in production with a 4.14.129 kernel. The system has a Skylake Gold CPU; we
first saw KVM errors, and the host hung afterwards:

kernel: [1186161.091160] kvm: vmptrld   (null)/6bfc failed
kernel: [1186161.091537] kvm: vmclear fail:   (null)/6bfc
kernel: [1186186.490300] watchdog: BUG: soft lockup - CPU#54 stuck for 23s! [qemu:16639]

Hi Sasha, hi Greg,

It would be great if you could also pick this patch for the 4.14 kernel.

Best regards,
Jack Wang


Re: [PATCH stable-4.19 1/2] KVM: nVMX: do not use dangling shadow VMCS after guest reset

2019-07-29 Thread Jack Wang
On Thu, Jul 25, 2019 at 3:29 PM Vitaly Kuznetsov wrote:
>
> From: Paolo Bonzini 
>
> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
>
> If a KVM guest is reset while running a nested guest, free_nested will
> disable the shadow VMCS execution control in the vmcs01.  However,
> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
> the VMCS12 to the shadow VMCS which has since been freed.
>
> This causes a vmptrld of a NULL pointer on my machine, but Jan reports
> the host to hang altogether.  Let's see how much this trivial patch fixes.
>
> Reported-by: Jan Kiszka 
> Cc: Liran Alon 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini 

Hi all,

Do we need to backport the fix also to stable 4.14?  It applies
cleanly and compiles fine.

Regards,
Jack Wang


Re: [PATCH 4.14 00/56] 4.14.133-stable review

2019-07-09 Thread Jack Wang
>
> This is the start of the stable review cycle for the 4.14.133 release.
> There are 56 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 10 Jul 2019 03:03:52 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.133-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Merged and tested with my x86_64 systems, no regression found.

Regards,
Jack Wang @ 1 & 1 IONOS Cloud GmbH


Re: next/master boot bisection: next-20190514 on rk3288-veyron-jaq

2019-05-17 Thread Jack Mitchell
On 16/05/2019 22:38, Doug Anderson wrote:
> Hi,
> 
> From: kernelci.org bot 
> Date: Tue, May 14, 2019 at 9:06 AM
> To: , ,
> , ,
> , ,
> , Elaine Zhang, Eduardo Valentin, Daniel
> Lezcano
> Cc: Heiko Stuebner, ,
> , ,
> Zhang Rui, 
> 
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has      *
>> * found.  No manual investigation has been done to verify it,   *
>> * and the root cause of the problem may be somewhere else.      *
>> * Hope this helps!                                              *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> next/master boot bisection: next-20190514 on rk3288-veyron-jaq
>>
>> Summary:
>>   Start:      0a13f187b16a Add linux-next specific files for 20190514
>>   Details:    https://kernelci.org/boot/id/5cda7f2259b514876d7a3628
>>   Plain log:  https://storage.kernelci.org//next/master/next-20190514/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.txt
>>   HTML log:   https://storage.kernelci.org//next/master/next-20190514/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.html
>>   Result:     691d4947face thermal: rockchip: fix up the tsadc pinctrl setting error
>>
>> Checks:
>>   revert:     PASS
>>   verify:     PASS
>>
>> Parameters:
>>   Tree:       next
>>   URL:        git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>>   Branch:     master
>>   Target:     rk3288-veyron-jaq
>>   CPU arch:   arm
>>   Lab:        lab-collabora
>>   Compiler:   gcc-8
>>   Config:     multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y
>>   Test suite: boot
>>
>> Breaking commit found:
>>
>> ---
>> commit 691d4947faceb8bd841900049e07c81c95ca4b0d
>> Author: Elaine Zhang 
>> Date:   Tue Apr 30 18:09:44 2019 +0800
>>
>> thermal: rockchip: fix up the tsadc pinctrl setting error
>>
>> Explicitly use the pinctrl to set/unset the right mode
>> instead of relying on the pinctrl init mode.
>> And it requires setting the tshut polarity before select pinctrl.
>>
>> When the temperature sensor mode is set to 0, it will automatically
>> reset the board via the Clock-Reset-Unit (CRU) if the over temperature
>> threshold is reached. However, when the pinctrl initializes, it does a
>> transition to "otp_out" which may lead the SoC restart all the time.
>>
>> "otp_out" IO may be connected to the RESET circuit on the hardware.
>> If the IO is in the wrong state, it will trigger RESET.
>> (similar to the effect of pressing the RESET button)
>> which will cause the soc to restart all the time.
>>
>> Signed-off-by: Elaine Zhang 
>> Reviewed-by: Daniel Lezcano 
>> Signed-off-by: Eduardo Valentin 
> 
> I can confirm that the above commit breaks my jerry, though I haven't
> dug into the details.  :(  Is anyone fixing?  For now I'm just booting
> with the revert.
> 
> 
> -Doug

I can also confirm that this breaks boot on our custom board, which is
very similar to the rk3288-Firefly. In my case the processor just
seems to "hang"; no reset occurs, if that helps with debugging.

Regards,
Jack.

> 
> ___
> Linux-rockchip mailing list
> linux-rockc...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip
> 


Hello Beautiful

2019-04-08 Thread Jack
Hi Dear, my name is Jack and i am seeking for a relationship in which i will 
feel loved after a series of failed relationships. 

I am hoping that you would be interested and we could possibly get to know each 
other more if you do not mind. I am open to answering questions from you as i 
think my approach is a little inappropriate to begin with. Hope to hear back 
from you.

Jack.


[PATCH] signal: always allocate siginfo for SI_TKILL

2019-02-02 Thread Jack Andersen
The patch titled
`signal: Never allocate siginfo for SIGKILL or SIGSTOP`
created a regression for users of PTRACE_GETSIGINFO needing to
discern signals that were raised via the tgkill syscall.

A notable user of this tgkill+ptrace combination is lldb while
debugging a multithreaded program. Without the ability to detect a
SIGSTOP originating from tgkill, lldb does not have a way to
synchronize on a per-thread basis and falls back to SIGSTOP-ing the
entire process.

This patch allocates the siginfo as it did previously whenever the
SI_TKILL code is present.

Signed-off-by: Jack Andersen 
---
 kernel/signal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9a32bc2088c9..7a810aefb5df 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1058,9 +1058,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
 	result = TRACE_SIGNAL_DELIVERED;
 	/*
 	 * Skip useless siginfo allocation for SIGKILL SIGSTOP,
-	 * and kernel threads.
+	 * and kernel threads. SI_TKILL is an exception to allow
+	 * processes to discern signals originating from tgkill.
 	 */
-	if (sig_kernel_only(sig) || (t->flags & PF_KTHREAD))
+	if ((sig_kernel_only(sig) && info->si_code != SI_TKILL) ||
+	    (t->flags & PF_KTHREAD))
 		goto out_set;
 
 	/*
-- 
2.20.1



[PATCH v2] usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup

2019-01-10 Thread Jack Pham
OUT endpoint requests may sometimes have this flag set when
preparing to be submitted to HW indicating that there is an
additional TRB chained to the request for alignment purposes.
If that request is removed before the controller can execute the
transfer (e.g. ep_dequeue/ep_disable), the request will not go
through the dwc3_gadget_ep_cleanup_completed_request() handler
and will not have its needs_extra_trb flag cleared when
dwc3_gadget_giveback() is called.  This same request could be
later requeued for a new transfer that does not require an
extra TRB and if it is successfully completed, the cleanup
and TRB reclamation will incorrectly process the additional TRB
which belongs to the next request, and incorrectly advances the
TRB dequeue pointer, thereby messing up calculation of the next
request's actual/remaining count when it completes.

The right thing to do here is to ensure that the flag is cleared
before it is given back to the function driver.  A good place
to do that is in dwc3_gadget_del_and_unmap_request().

Fixes: c6267a51639b ("usb: dwc3: gadget: align transfers to wMaxPacketSize")
Cc: sta...@vger.kernel.org
Signed-off-by: Jack Pham 
---
v2: Added Fixes tag and Cc: stable

Felipe, as I mentioned in the cover for v1, for stable (from 4.11 where
c6267a51639b first landed through 4.20), the fix needs to be modified to
assign to the separate req->unaligned and req->zero flags in lieu of
needs_extra_trb which appeared in 5.0-rc1 in:

commit 1a22ec643580626f439c8583edafdcc73798f2fb
Author: Felipe Balbi 
Date:   Wed Aug 1 13:15:05 2018 +0300

usb: dwc3: gadget: combine unaligned and zero flags

Do I need to send a separate patch for <= 4.20 or will you handle it?
It's straightforward really, the code change should instead be

+   req->unaligned = false;
+   req->zero = false;

Thanks,
Jack

 drivers/usb/dwc3/gadget.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 2ecde30ad0b7..e97b14f444c8 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -177,6 +177,7 @@ static void dwc3_gadget_del_and_unmap_request(struct dwc3_ep *dep,
 	req->started = false;
 	list_del(&req->list);
 	req->remaining = 0;
+	req->needs_extra_trb = false;
 
 	if (req->request.status == -EINPROGRESS)
 		req->request.status = status;
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
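
The stale-flag problem described in the commit message can be modelled in a few lines. This is a toy sketch under stated assumptions: `struct demo_request`, `demo_trbs_to_reclaim()` and `demo_giveback()` are hypothetical simplifications, not the dwc3 driver's types or functions, and the "two TRBs versus one" accounting is a stand-in for the real reclamation logic.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a dwc3 request. */
struct demo_request {
	bool started;
	bool needs_extra_trb;	/* an extra TRB was chained for alignment */
	unsigned int remaining;
};

/* How many TRBs the completion path would reclaim for this request. */
unsigned int demo_trbs_to_reclaim(const struct demo_request *req)
{
	return req->needs_extra_trb ? 2 : 1;
}

/* Model of handing a request back to the function driver, fix applied. */
void demo_giveback(struct demo_request *req)
{
	req->started = false;
	req->remaining = 0;
	/*
	 * Without this reset, a request dequeued before completion would
	 * carry the flag into its next use and reclaim one TRB too many.
	 */
	req->needs_extra_trb = false;
}
```

In this model, a request that is given back early and then requeued for an aligned transfer reclaims exactly one TRB, so the dequeue pointer no longer runs into the next request's TRB.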


Re: [PATCH 1/2] dt-bindings: Add Qualcomm USB Super-Speed PHY bindings

2019-01-07 Thread Jack Pham
Hi Jorge,

Sorry for the late reply as I was out during the holiday break.

On Fri, Dec 28, 2018 at 01:38:59PM +0100, Jorge Ramirez wrote:
> On 12/20/18 18:37, Jack Pham wrote:
> >Hi Rob, Jorge,
> >
> >On Thu, Dec 20, 2018 at 11:05:31AM -0600, Rob Herring wrote:
> >>On Fri, Dec 07, 2018 at 10:55:57AM +0100, Jorge Ramirez-Ortiz wrote:
> >>>Binding description for Qualcomm's Synopsys 1.0.0 super-speed PHY
> >>>controller embedded in QCS404.
> >>>
> >>>Based on Sriharsha Allenki's  original
> >>>definitions.
> >>>
> >>>Signed-off-by: Jorge Ramirez-Ortiz 
> >>>Reviewed-by: Vinod Koul 
> >>>---
> >>>  .../devicetree/bindings/usb/qcom,usb-ssphy.txt | 78 ++
> >>>  1 file changed, 78 insertions(+)
> >>>  create mode 100644 Documentation/devicetree/bindings/usb/qcom,usb-ssphy.txt
> >>>
> >>>diff --git a/Documentation/devicetree/bindings/usb/qcom,usb-ssphy.txt b/Documentation/devicetree/bindings/usb/qcom,usb-ssphy.txt
> >>>new file mode 100644
> >>>index 000..fcf4e01
> >>>--- /dev/null
> >>>+++ b/Documentation/devicetree/bindings/usb/qcom,usb-ssphy.txt
> >>>@@ -0,0 +1,78 @@
> >>>+Qualcomm Synopsys 1.0.0 SS phy controller
> >>>+===
> >>>+
> >>>+Synopsys 1.0.0 ss phy controller supports SS usb connectivity on Qualcomm
> >>>+chipsets
> >>>+
> >>>+Required properties:
> >>>+
> >>>+- compatible:
> >>>+Value type: 
> >>>+Definition: Should contain "qcom,usb-ssphy".
> >>
> >>What is "qcom,dwc3-ss-usb-phy" which already exists then?
> >
> >Uh, apparently only the bindings doc is there but the driver never
> >landed. I guess it fell through the cracks nearly 4 years ago.
> >
> >https://lore.kernel.org/patchwork/patch/499502/
> >
> >Jorge, does Andy's version of this driver at all resemble what can be
> >used for QCS404?
> 
> On close inspection I can't see any similarities between the drivers.
> Unfortunately I don't have access to documentation yet but the
> control register offsets and the control bits in the drivers do not
> match.
> 
> because of the above I'd like to go ahead with our separate drivers
> -already tested and validated- for HS (Shawn's) and SS (mine).
> 
> if that is acceptable, should we reuse the upstream bindings for
> our implementation? or perhaps Shawn Guo will do for his HS version
> of the driver and I go ahead and create a new one? what would you
> suggest?

I'm not really sure. My understanding is that the driver Andy submitted
was for some of the older MSM and IPQ SoCs that implemented the PHY
controls as part of the DWC3 controller's "QScratch" registers, which is
why "dwc3" appears in both the compatible string and the bindings doc's
filename. Is the SNPS PHY on QCS404
architected similarly in this regard? Either way, the existing bindings
doc for the non-existent driver looks incomplete for QCS404, so you'd
have to update it anyway. My feeling is that there should just be one
document describing all variants of SNPS PHYs on Qualcomm chips.

Maybe we should also just delete the "qcom,dwc3-ss-usb-phy" binding
unless there is a plan to resurrect Andy's driver.

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v1 3/6] phy: qcom-qusb2: Add QUSB2 PHY support for msm8998

2019-01-07 Thread Jack Pham
Hi Jeff,

Spotted a typo below:

On Fri, Jan 04, 2019 at 09:50:29AM -0700, Jeffrey Hugo wrote:
> MSM8998 contains one QUSB2 PHY which is very similar to the existing
> sdm845 support.
> 
> Signed-off-by: Jeffrey Hugo 
> ---
>  .../devicetree/bindings/phy/qcom-qusb2-phy.txt |  1 +
>  drivers/phy/qualcomm/phy-qcom-qusb2.c  | 41 ++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
> index 03025d9..3976847 100644
> --- a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
> +++ b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
> @@ -6,6 +6,7 @@ QUSB2 controller supports LS/FS/HS usb connectivity on Qualcomm chipsets.
>  Required properties:
>   - compatible: compatible list, contains
>  "qcom,msm8996-qusb2-phy" for 14nm PHY on msm8996,
> +"qcom,msm8998-qusb2-phy" for 10nm PHY on msm8996,

The description on this line should say msm8998, not msm8996.

Thanks,
Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH v2] Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB

2019-01-03 Thread Jack Stocker
To match the Corsair Strafe RGB, the Corsair K70 RGB also requires
USB_QUIRK_DELAY_CTRL_MSG to completely resolve boot connection issues
discussed here: https://github.com/ckb-next/ckb-next/issues/42.
Otherwise, on roughly 1 in 10 boots the keyboard fails to be detected.

Patch that applied delay control quirk for Corsair Strafe RGB:
cb88a0588717 ("usb: quirks: add control message delay for 1b1c:1b20")

Previous K70 RGB patch to add delay-init quirk:
7a1646d92257 ("Add delay-init quirk for Corsair K70 RGB keyboards")

Signed-off-by: Jack Stocker 
---
Changes in v2:
  - Added references to previous patches that affect this change.

 drivers/usb/core/quirks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 7909262..4a9267d 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -377,7 +377,8 @@ static const struct usb_device_id usb_quirk_list[] = {
 	  USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL },
 
 	/* Corsair K70 RGB */
-	{ USB_DEVICE(0x1b1c, 0x1b13), .driver_info = USB_QUIRK_DELAY_INIT },
+	{ USB_DEVICE(0x1b1c, 0x1b13), .driver_info = USB_QUIRK_DELAY_INIT |
+	  USB_QUIRK_DELAY_CTRL_MSG },
 
 	/* Corsair Strafe RGB */
 	{ USB_DEVICE(0x1b1c, 0x1b20), .driver_info = USB_QUIRK_DELAY_INIT |
-- 
2.7.4
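
The table change above works because quirks are independent bit flags combined with bitwise OR. The sketch below models that lookup in plain C. Everything prefixed `demo_`/`DEMO_` is hypothetical, and the bit positions are assumptions chosen for illustration; the authoritative values live in the kernel's USB quirks header.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only; assumed, not taken from the kernel. */
#define DEMO_QUIRK_DELAY_INIT		(1u << 6)
#define DEMO_QUIRK_DELAY_CTRL_MSG	(1u << 13)

/* Hypothetical stand-in for one entry of the quirk table. */
struct demo_quirk_entry {
	uint16_t vendor;
	uint16_t product;
	uint32_t quirks;	/* OR of DEMO_QUIRK_* flags */
};

static const struct demo_quirk_entry demo_quirk_list[] = {
	/* Corsair K70 RGB: both delays, as in the patch */
	{ 0x1b1c, 0x1b13, DEMO_QUIRK_DELAY_INIT | DEMO_QUIRK_DELAY_CTRL_MSG },
	/* Corsair Strafe RGB */
	{ 0x1b1c, 0x1b20, DEMO_QUIRK_DELAY_INIT | DEMO_QUIRK_DELAY_CTRL_MSG },
};

/* Return the quirk mask for a device, or 0 if it is not listed. */
uint32_t demo_lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t n = sizeof(demo_quirk_list) / sizeof(demo_quirk_list[0]);

	for (size_t i = 0; i < n; i++) {
		if (demo_quirk_list[i].vendor == vendor &&
		    demo_quirk_list[i].product == product)
			return demo_quirk_list[i].quirks;
	}
	return 0;
}
```

A caller tests one behaviour at a time with a mask, for example `demo_lookup_quirks(0x1b1c, 0x1b13) & DEMO_QUIRK_DELAY_CTRL_MSG`, so adding a second flag to an entry does not disturb the first.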


