Re: [PATCH] mm: vmalloc: Prevent use after free in _vm_unmap_aliases

2021-03-23 Thread Vijayanand Jitta



On 3/18/2021 10:29 PM, Uladzislau Rezki wrote:
> On Thu, Mar 18, 2021 at 03:38:25PM +0530, vji...@codeaurora.org wrote:
>> From: Vijayanand Jitta 
>>
>> A potential use after free can occur in _vm_unmap_aliases
>> where an already freed vmap_area could be accessed, Consider
>> the following scenario:
>>
>> Process 1                                     Process 2
>>
>> __vm_unmap_aliases                            __vm_unmap_aliases
>>   purge_fragmented_blocks_allcpus               rcu_read_lock()
>>     rcu_read_lock()
>>       list_del_rcu(&vb->free_list)
>>                                                 list_for_each_entry_rcu(vb .. )
>>   __purge_vmap_area_lazy
>>     kmem_cache_free(va)
>>                                                 va_start = vb->va->va_start
> Or maybe we should switch to kfree_rcu() instead of kmem_cache_free()?
> 
> --
> Vlad Rezki
> 

Thanks for the suggestion.

I see that free_vmap_area_lock (a spinlock) is taken in __purge_vmap_area_lazy
while it loops through the list and calls kmem_cache_free() on the va's. So it
looks like we can't replace it with kfree_rcu(), as that might cause scheduling
within atomic context.
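
For reference, a minimal sketch of the pattern I am referring to (simplified,
with placeholder names like purge_list; this is not the exact mainline code):

	/*
	 * Simplified sketch only: vmap_areas are released while
	 * free_vmap_area_lock (a spinlock) is held, so the free path
	 * must not sleep.
	 */
	spin_lock(&free_vmap_area_lock);
	list_for_each_entry_safe(va, n_va, &purge_list, list) {
		/* ... merge the range back into the free tree ... */
		kmem_cache_free(vmap_area_cachep, va);	/* atomic context */
	}
	spin_unlock(&free_vmap_area_lock);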

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: linux-next: build warning after merge of the akpm tree

2021-01-31 Thread Vijayanand Jitta



On 1/28/2021 2:16 PM, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the akpm tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> lib/stackdepot.c: In function 'is_stack_depot_disabled':
> lib/stackdepot.c:154:2: warning: ignoring return value of 'kstrtobool' 
> declared with attribute 'warn_unused_result' [-Wunused-result]
>   154 |  kstrtobool(str, &stack_depot_disable);
>   |  ^
> 
> Introduced by commit
> 
>   b9779abb09a8 ("lib: stackdepot: add support to disable stack depot")
> 
> Interestingly, we have 2 declarations of kstrtobool - one in
> linux/kernel.h (which has __must_check) and one in linux/strings.h
> (which doesn't).
> 

I have sent out the fix to LKML; it is copied below for your reference.
Please feel free to apply it or squash it into the original commit.

Thanks,
Vijay

--
From: Vijayanand Jitta 

Fix the below 'ignoring return value' warning for kstrtobool() in the
is_stack_depot_disabled() function.

lib/stackdepot.c: In function 'is_stack_depot_disabled':
lib/stackdepot.c:154:2: warning: ignoring return value of 'kstrtobool'
declared with attribute 'warn_unused_result' [-Wunused-result]

Fixes: b9779abb09a8 ("lib: stackdepot: add support to disable stack depot")
Signed-off-by: Vijayanand Jitta 
---
 lib/stackdepot.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index cc21116..49f67a0 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -151,8 +151,10 @@ static struct stack_record **stack_table;

 static int __init is_stack_depot_disabled(char *str)
 {
-   kstrtobool(str, &stack_depot_disable);
-   if (stack_depot_disable) {
+   int ret;
+
+   ret = kstrtobool(str, &stack_depot_disable);
+   if (!ret && stack_depot_disable) {
pr_info("Stack Depot is disabled\n");
stack_table = NULL;
}
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v5 2/2] lib: stackdepot: Add support to disable stack depot

2021-01-21 Thread Vijayanand Jitta



On 1/22/2021 5:55 AM, Minchan Kim wrote:
> On Mon, Jan 18, 2021 at 03:26:42PM +0530, vji...@codeaurora.org wrote:
>> From: Vijayanand Jitta 
>>
>> Add a kernel parameter stack_depot_disable to disable
>> stack depot. So that stack hash table doesn't consume
>> any memory when stack depot is disabled.
> 
> The usecase is CONFIG_PAGE_OWNER without page_owner=on.
> Without this patch, stackdepot will consume the memory
> for the hashtable. By default, it's 8M which is never trivial.
> 
> With this option, in CONFIG_PAGE_OWNER configured system,
> page_owner=off, stack_depot_disable in kernel command line,
> we could save the wasted memory for the hashtable.
> 

Sure, will update the commit text with above details.
>>
>> Signed-off-by: Vinayak Menon 
>> Signed-off-by: Vijayanand Jitta 
> 
> Please also update kernel-parameters.txt.
> 

Sure.
>> ---
>>  include/linux/stackdepot.h |  1 +
>>  init/main.c|  2 ++
>>  lib/stackdepot.c   | 33 +
>>  3 files changed, 32 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
>> index 24d49c7..eafd9aa 100644
>> --- a/include/linux/stackdepot.h
>> +++ b/include/linux/stackdepot.h
>> @@ -21,4 +21,5 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>>  
>>  unsigned int filter_irq_stacks(unsigned long *entries, unsigned int 
>> nr_entries);
>>  
>> +int stack_depot_init(void);
>>  #endif
>> diff --git a/init/main.c b/init/main.c
>> index 32b2a8a..8fcf9bb 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -98,6 +98,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <linux/stackdepot.h>
>>  
>>  #include 
>>  #include 
>> @@ -827,6 +828,7 @@ static void __init mm_init(void)
>>  page_ext_init_flatmem();
>>  init_debug_pagealloc();
>>  report_meminit();
>> +stack_depot_init();
>>  mem_init();
>>  kmem_cache_init();
>>  kmemleak_init();
>> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
>> index dff8521..d20e6fd 100644
>> --- a/lib/stackdepot.c
>> +++ b/lib/stackdepot.c
>> @@ -31,6 +31,8 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <linux/vmalloc.h>
> 
> Why do we need vmalloc?
> 

It's not needed; I will remove it.

> Otherwise, looks good to me.
> Thank you!
> 

Thanks for the review.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v5 1/2] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2021-01-21 Thread Vijayanand Jitta



On 1/22/2021 5:49 AM, Minchan Kim wrote:
> On Mon, Jan 18, 2021 at 03:26:41PM +0530, vji...@codeaurora.org wrote:
>> From: Yogesh Lal 
>>
>> Use CONFIG_STACK_HASH_ORDER to configure STACK_HASH_SIZE.
>>
>> Aim is to have configurable value for  STACK_HASH_SIZE,
>> so depend on use case one can configure it.
>>
>> One example is of Page Owner, default value of
>> STACK_HASH_SIZE lead stack depot to consume 8MB of static memory.
>> Making it configurable and use lower value helps to enable features like
>> CONFIG_PAGE_OWNER without any significant overhead.
> 
> The description could be improved to prevent confusing.
> CONFIG_PAGE_OWNER works only if page_owner=on via kernel parameter
> on CONFIG_PAGE_OWNER configured system.
> Thus, unless admin enable it via command line option, the stackdepot
> will just waste 8M memory without any customer.
> 

Sure, will update the commit text as suggested.

Thanks,
Vijay
>>
>> Signed-off-by: Yogesh Lal 
>> Signed-off-by: Vinayak Menon 
>> Signed-off-by: Vijayanand Jitta 
> Reviewed-by: Minchan Kim 
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v5 1/2] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2021-01-18 Thread Vijayanand Jitta



On 1/19/2021 4:23 AM, Randy Dunlap wrote:
> On 1/18/21 1:56 AM, vji...@codeaurora.org wrote:
>> From: Yogesh Lal 
>>
>> Use CONFIG_STACK_HASH_ORDER to configure STACK_HASH_SIZE.
>>
>> Aim is to have configurable value for  STACK_HASH_SIZE,
>> so depend on use case one can configure it.
>>
>> One example is of Page Owner, default value of
>> STACK_HASH_SIZE lead stack depot to consume 8MB of static memory.
>> Making it configurable and use lower value helps to enable features like
>> CONFIG_PAGE_OWNER without any significant overhead.
>>
>> Signed-off-by: Yogesh Lal 
>> Signed-off-by: Vinayak Menon 
>> Signed-off-by: Vijayanand Jitta 
> 
> Hi,
> 
> Did you see
> https://lore.kernel.org/lkml/202101050729.cwtd47yw-...@intel.com/
> 
> It seems that arch/arc/ does not have:
>arc-elf-ld: lib/stackdepot.o: in function `filter_irq_stacks':
>(.text+0x6): undefined reference to `__irqentry_text_start'
>>> arc-elf-ld: (.text+0x6): undefined reference to `__irqentry_text_start'
>>> arc-elf-ld: (.text+0x26): undefined reference to `__irqentry_text_end'
>>> arc-elf-ld: (.text+0x26): undefined reference to `__irqentry_text_end'
>>> arc-elf-ld: (.text+0x34): undefined reference to `__softirqentry_text_start'
>>> arc-elf-ld: (.text+0x34): undefined reference to `__softirqentry_text_start'
>>> arc-elf-ld: (.text+0x3c): undefined reference to `__softirqentry_text_end'
>>> arc-elf-ld: (.text+0x3c): undefined reference to `__softirqentry_text_end'
> 
> 
> 
> 

The above issue seems to be caused by a different patch, namely this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=505a0ef15f96c6c43ec719c9fc1833d98957bb39

I didn't quite follow why you referred to it here.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v4 1/2] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2021-01-18 Thread Vijayanand Jitta



On 1/5/2021 2:54 PM, Vijayanand Jitta wrote:
> 
> 
> On 1/5/2021 4:42 AM, Andrew Morton wrote:
>> On Wed, 30 Dec 2020 18:15:30 +0530 vji...@codeaurora.org wrote:
>>
>>> Use STACK_HASH_ORDER_SHIFT to configure STACK_HASH_SIZE.
>>>
>>> Aim is to have configurable value for  STACK_HASH_SIZE,
>>> so depend on use case one can configure it.
>>>
>>> One example is of Page Owner, default value of
>>> STACK_HASH_SIZE lead stack depot to consume 8MB of static memory.
>>> Making it configurable and use lower value helps to enable features like
>>> CONFIG_PAGE_OWNER without any significant overhead.
>>
>> Questions regarding the stackdepot code.
>>
>> - stack_table_tmp[] is __initdata.  So after initmem is released,
>>   that "consume 8MB of static memory" should no longer be true.  But
>>   iirc, not all architectures actually release __initdata memory.  Does
>>   your architecture do this?
>>
> Thanks for review comments, I wasn't aware that __initdata is
> architecture dependent, I was assuming that __initdata always frees
> memory and yes the architecture which i am using (arm64) does free
> __inidata.
> 
>> - Stackdepot copies stack_table_tmp[] into vmalloced memory during
>>   initcalls.  Why?  Why not simply make stack_table_tmp[] no longer
>>   __initdata and use that memory for all time?
>>
>>   Presumably because in the stack_depot_disable==true case, we
>>   release stack_table_tmp[] memory, don't vmalloc for a copy of it, and
>>   save a bunch of memory?  If so, this assumes that the __initdata
>>   memory is freed.
>>
> 
> Yes, that correct. assumption here is __initidata will free memory if
> stack_depot_disable=true is set.
> 
>> - Why is that hash table so large?  Is it appropriately sized?
>>
> 
> I think the large size of hash table is justified since the users of
> stack depot like kasan, page owner etc store a very large number of  stacks.
> 
>> - SMP is up and running during init_stackdepot(), I think?  If so, is
>>   that huge memcpy smp-safe?  Can other CPUs be modifying
>>   stack_table_tmp[] while the memcpy is in flight?
>>
>>
>>
> Yes, parallel access could be possible. I will add a locking mechanism
> inplace.
> 
> Thanks,
> Vijay
> 

I have updated the patch as per the suggestions, avoiding __initdata and the
copy from the temporary table; can you please review v5?

https://lore.kernel.org/patchwork/patch/1367306/

Thanks,
Vijay

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v4 1/2] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2021-01-18 Thread Vijayanand Jitta



On 1/7/2021 3:14 PM, Alexander Potapenko wrote:
> On Wed, Dec 30, 2020 at 1:46 PM  wrote:
>>
>> From: Yogesh Lal 
>>
>> Use STACK_HASH_ORDER_SHIFT to configure STACK_HASH_SIZE.
> I think "ORDER_SHIFT" is somewhat redundant, as "SMTH_ORDER" already
> means this is a power of two we'll be using for shifting.
> Leaving this up to you.
> 
> Alex
> 

Right, updated this to STACK_HASH_ORDER in v5.
https://lkml.org/lkml/2021/1/18/255

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v4 1/2] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2021-01-05 Thread Vijayanand Jitta



On 1/5/2021 4:42 AM, Andrew Morton wrote:
> On Wed, 30 Dec 2020 18:15:30 +0530 vji...@codeaurora.org wrote:
> 
>> Use STACK_HASH_ORDER_SHIFT to configure STACK_HASH_SIZE.
>>
>> Aim is to have configurable value for  STACK_HASH_SIZE,
>> so depend on use case one can configure it.
>>
>> One example is of Page Owner, default value of
>> STACK_HASH_SIZE lead stack depot to consume 8MB of static memory.
>> Making it configurable and use lower value helps to enable features like
>> CONFIG_PAGE_OWNER without any significant overhead.
> 
> Questions regarding the stackdepot code.
> 
> - stack_table_tmp[] is __initdata.  So after initmem is released,
>   that "consume 8MB of static memory" should no longer be true.  But
>   iirc, not all architectures actually release __initdata memory.  Does
>   your architecture do this?
> 
Thanks for the review comments. I wasn't aware that freeing __initdata is
architecture dependent; I was assuming that __initdata memory is always freed,
and yes, the architecture I am using (arm64) does free __initdata.

> - Stackdepot copies stack_table_tmp[] into vmalloced memory during
>   initcalls.  Why?  Why not simply make stack_table_tmp[] no longer
>   __initdata and use that memory for all time?
> 
>   Presumably because in the stack_depot_disable==true case, we
>   release stack_table_tmp[] memory, don't vmalloc for a copy of it, and
>   save a bunch of memory?  If so, this assumes that the __initdata
>   memory is freed.
> 

Yes, that's correct. The assumption here is that __initdata memory will be
freed when stack_depot_disable=true is set.

> - Why is that hash table so large?  Is it appropriately sized?
> 

I think the large hash table size is justified, since stack depot users such
as KASAN and page owner store a very large number of stacks.

> - SMP is up and running during init_stackdepot(), I think?  If so, is
>   that huge memcpy smp-safe?  Can other CPUs be modifying
>   stack_table_tmp[] while the memcpy is in flight?
> 
> 
> 
Yes, parallel access is possible. I will put a locking mechanism in place.
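
For illustration, here is a rough sketch of the kind of protection I have in
mind (hypothetical: the lock name and details are made up, and this is not the
posted patch):

	/*
	 * Hypothetical sketch only: serialize the one-time copy in
	 * init_stackdepot() against concurrent inserts into the hash
	 * table. The same lock would also have to be taken in the
	 * stack_depot_save() insert path for this to be effective.
	 */
	static DEFINE_SPINLOCK(stack_table_lock);

	static int __init init_stackdepot(void)
	{
		unsigned long flags;
		size_t size = STACK_HASH_SIZE * sizeof(struct stack_record *);
		struct stack_record **table = vmalloc(size);

		if (!table)
			return -ENOMEM;

		spin_lock_irqsave(&stack_table_lock, flags);
		/* stack_table still points at the temporary __initdata array */
		memcpy(table, stack_table, size);
		stack_table = table;
		spin_unlock_irqrestore(&stack_table_lock, flags);

		return 0;
	}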

Thanks,
Vijay

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-27 Thread Vijayanand Jitta



On 12/23/2020 8:10 PM, Alexander Potapenko wrote:
>>
>> Michan, We would still need config option so that we can reduce the
>> memory consumption on low ram devices using config.
>>
>> Alex, On this,
>> "I also suppose device vendors may prefer setting a fixed (maybe
>> non-default) hash size for low-memory devices rather than letting the
>> admins increase it."
>> I see kernel param swiotlb does similar thing i.e; '0' to disable and
>> set a value to configure size.
>>
>> I am fine with either of the approaches,
>>
>> 1. I can split this patch into two
>>i)  A bool variable to enable/disable stack depot.
>>ii) A config for the size.
> 
> I still believe this is a more appropriate solution.
> 
> Thanks in advance!
> 

Thanks, I will work on a patch with the above approach.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-21 Thread Vijayanand Jitta



On 12/22/2020 1:59 AM, Minchan Kim wrote:
> On Mon, Dec 21, 2020 at 04:04:09PM +0100, Alexander Potapenko wrote:
>> On Mon, Dec 21, 2020 at 12:15 PM Vijayanand Jitta  
>> wrote:
>>>
>>>
>>>
>>> On 12/18/2020 2:10 PM, Vijayanand Jitta wrote:
>>>>
>>>>
>>>> On 12/17/2020 4:24 PM, Alexander Potapenko wrote:
>>>>>>> Can you provide an example of a use case in which the user wants to
>>>>>>> use the stack depot of a smaller size without disabling it completely,
>>>>>>> and that size cannot be configured statically?
>>>>>>> As far as I understand, for the page owner example you gave it's
>>>>>>> sufficient to provide a switch that can disable the stack depot if
>>>>>>> page_owner=off.
>>>>>>>
>>>>>> There are two use cases here,
>>>>>>
>>>>>> 1. We don't want to consume memory when page_owner=off ,boolean flag
>>>>>> would work here.
>>>>>>
>>>>>> 2. We would want to enable page_owner on low ram devices but we don't
>>>>>> want stack depot to consume 8 MB of memory, so for this case we would
>>>>>> need a configurable stack_hash_size so that we can still use page_owner
>>>>>> with lower memory consumption.
>>>>>>
>>>>>> So, a configurable stack_hash_size would work for both these use cases,
>>>>>> we can set it to '0' for first case and set the required size for the
>>>>>> second case.
>>>>>
>>>>> Will a combined solution with a boolean boot-time flag and a static
>>>>> CONFIG_STACKDEPOT_HASH_SIZE work for these cases?
>>>>> I suppose low-memory devices have a separate kernel config anyway?
>>>>>
>>>>
>>>> Yes, the combined solution will also work but i think having a single
>>>> run time config is simpler instead of having two things to configure.
>>>>
>>>
>>> To add to it we started of with a CONFIG first, after the comments from
>>> Minchan (https://lkml.org/lkml/2020/11/3/2121) we decided to switch to
>>> run time param.
>>>
>>> Quoting Minchan's comments below:
>>>
>>> "
>>> 1. When we don't use page_owner, we don't want to waste any memory for
>>> stackdepot hash array.
>>> 2. When we use page_owner, we want to have reasonable stackdeport hash array
>>>
>>> With this configuration, it couldn't meet since we always need to
>>> reserve a reasonable size for the array.
>>> Can't we make the hash size as a kernel parameter?
>>> With it, we could use it like this.
>>>
>>> 1. page_owner=off, stackdepot_stack_hash=0 -> no more wasted memory
>>> when we don't use page_owner
>>> 2. page_owner=on, stackdepot_stack_hash=8M -> reasonable hash size
>>> when we use page_owner.
>>> "
>>
>> Minchan, what do you think about making the hash size itself a static
>> parameter, while letting the user disable stackdepot completely at
>> runtime?
>> As noted before, I am concerned that moving a low-level configuration
>> bit (which essentially means "save 8Mb - (1 << stackdepot_stack_hash)
>> of static memory") to the boot parameters will be unused by most
>> admins and may actually trick them into thinking they reduce the
>> overall stackdepot memory consumption noticeably.
>> I also suppose device vendors may prefer setting a fixed (maybe
>> non-default) hash size for low-memory devices rather than letting the
>> admins increase it.
> 
> I am totally fine if we could save the static memory alloation when
> the page_owner is not used.
> 
> IOW, page_owner=disable, stackdepot=disable will not consume the 8M
> memory.
> When we want to use page_owner, we could just do like this
> 
>   page_owner=enable, stackdepot=enable
> 
> (Maybe we need something to make warning if stackdepot is disabled
> but someone want to use it, for example, KASAN?)
> 
> Vijayanand, If we could work this this, should we still need the
> config option, then? 
> 

Minchan, we would still need the config option so that we can reduce the
memory consumption on low-RAM devices via the config.

Alex, on this:
"I also suppose device vendors may prefer setting a fixed (maybe
non-default) hash size for low-memory devices rather than letting the
admins increase it."
I see that the kernel parameter swiotlb does a similar thing, i.e., '0' to
disable and a value to configure the size.

I am fine with either of the approaches:

1. I can split this patch into two:
   i)  A bool variable to enable/disable stack depot.
   ii) A config for the size.

(or)

2. A runtime param: '0' to disable, and a valid size to enable.

Let me know your comments.
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-21 Thread Vijayanand Jitta



On 12/18/2020 2:10 PM, Vijayanand Jitta wrote:
> 
> 
> On 12/17/2020 4:24 PM, Alexander Potapenko wrote:
>>>> Can you provide an example of a use case in which the user wants to
>>>> use the stack depot of a smaller size without disabling it completely,
>>>> and that size cannot be configured statically?
>>>> As far as I understand, for the page owner example you gave it's
>>>> sufficient to provide a switch that can disable the stack depot if
>>>> page_owner=off.
>>>>
>>> There are two use cases here,
>>>
>>> 1. We don't want to consume memory when page_owner=off ,boolean flag
>>> would work here.
>>>
>>> 2. We would want to enable page_owner on low ram devices but we don't
>>> want stack depot to consume 8 MB of memory, so for this case we would
>>> need a configurable stack_hash_size so that we can still use page_owner
>>> with lower memory consumption.
>>>
>>> So, a configurable stack_hash_size would work for both these use cases,
>>> we can set it to '0' for first case and set the required size for the
>>> second case.
>>
>> Will a combined solution with a boolean boot-time flag and a static
>> CONFIG_STACKDEPOT_HASH_SIZE work for these cases?
>> I suppose low-memory devices have a separate kernel config anyway?
>>
> 
> Yes, the combined solution will also work but i think having a single
> run time config is simpler instead of having two things to configure.
> 

To add to that, we started off with a CONFIG first; after the comments from
Minchan (https://lkml.org/lkml/2020/11/3/2121) we decided to switch to a
runtime param.

Quoting Minchan's comments below:

"
1. When we don't use page_owner, we don't want to waste any memory for
stackdepot hash array.
2. When we use page_owner, we want to have reasonable stackdeport hash array

With this configuration, it couldn't meet since we always need to
reserve a reasonable size for the array.
Can't we make the hash size as a kernel parameter?
With it, we could use it like this.

1. page_owner=off, stackdepot_stack_hash=0 -> no more wasted memory
when we don't use page_owner
2. page_owner=on, stackdepot_stack_hash=8M -> reasonable hash size
when we use page_owner.
"

Thanks,
Vijay
>> My concern is that exposing yet another knob to users won't really
>> solve their problems, because the hash size alone doesn't give enough
>> control over stackdepot memory footprint (we also have stack_slabs,
>> which may get way bigger than 8Mb).
>>
> 
> True, stack_slabs can consume more memory but they consume most only
> when stack depot is used as they are allocated in stack_depot_save path.
> when stack depot is not used they consume 8192 * sizeof(void) bytes at
> max. So nothing much we can do here since static allocation is not much
> and memory consumption depends up on stack depot usage, unlike
> stack_hash_table where 8mb is preallocated.
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-18 Thread Vijayanand Jitta



On 12/17/2020 4:24 PM, Alexander Potapenko wrote:
>>> Can you provide an example of a use case in which the user wants to
>>> use the stack depot of a smaller size without disabling it completely,
>>> and that size cannot be configured statically?
>>> As far as I understand, for the page owner example you gave it's
>>> sufficient to provide a switch that can disable the stack depot if
>>> page_owner=off.
>>>
>> There are two use cases here,
>>
>> 1. We don't want to consume memory when page_owner=off ,boolean flag
>> would work here.
>>
>> 2. We would want to enable page_owner on low ram devices but we don't
>> want stack depot to consume 8 MB of memory, so for this case we would
>> need a configurable stack_hash_size so that we can still use page_owner
>> with lower memory consumption.
>>
>> So, a configurable stack_hash_size would work for both these use cases,
>> we can set it to '0' for first case and set the required size for the
>> second case.
> 
> Will a combined solution with a boolean boot-time flag and a static
> CONFIG_STACKDEPOT_HASH_SIZE work for these cases?
> I suppose low-memory devices have a separate kernel config anyway?
> 

Yes, the combined solution would also work, but I think having a single
runtime config is simpler than having two things to configure.

> My concern is that exposing yet another knob to users won't really
> solve their problems, because the hash size alone doesn't give enough
> control over stackdepot memory footprint (we also have stack_slabs,
> which may get way bigger than 8Mb).
> 

True, stack_slabs can consume more memory, but they consume it mostly only
when the stack depot is actually used, since they are allocated in the
stack_depot_save() path. When the stack depot is not used they consume at most
8192 * sizeof(void *) bytes. So there is not much we can do here: the static
allocation is small and the memory consumption depends on stack depot usage,
unlike stack_hash_table, where 8 MB is preallocated.
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-16 Thread Vijayanand Jitta



On 12/16/2020 7:04 PM, Alexander Potapenko wrote:
>>> To reiterate, I think you don't need a tunable stack_hash_order
>>> parameter if the only use case is to disable the stack depot.
>>> Maybe it is enough to just add a boolean flag?
>>
>> There are multiple users of stackdepot they might still want to use
>> stack depot but with a lower memory footprint instead of MAX_SIZE
>> so, a configurable size might help here ?
> 
> Can you provide an example of a use case in which the user wants to
> use the stack depot of a smaller size without disabling it completely,
> and that size cannot be configured statically?
> As far as I understand, for the page owner example you gave it's
> sufficient to provide a switch that can disable the stack depot if
> page_owner=off.
> 
There are two use cases here:

1. We don't want to consume memory when page_owner=off; a boolean flag
would work here.

2. We want to enable page_owner on low-RAM devices, but we don't want
stack depot to consume 8 MB of memory; for this case we would need a
configurable stack_hash_size so that we can still use page_owner with
lower memory consumption.

So, a configurable stack_hash_size would work for both of these use cases:
we can set it to '0' for the first case and set the required size for the
second case.
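
For concreteness, the two cases would look something like this on the kernel
command line (using the stack_hash_order parameter proposed in this series;
the order value below is only illustrative, and sizes assume 64-bit pointers):

	# Case 1: page_owner off, no stack depot hash memory at all
	page_owner=off stack_hash_order=0

	# Case 2: page_owner on a low-RAM device, smaller hash table
	# (order 16 -> 64K buckets -> 512 KB of pointers instead of 8 MB)
	page_owner=on stack_hash_order=16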

>>> Or even go further and disable the stack depot in the same place that
>>> disables page owner, as the user probably doesn't want to set two
>>> flags instead of one?
>>>
>>
Since page owner is not the only user of stack depot, we can't decide to
disable stack depot just because page owner is disabled.
> 
> Agreed, but if multiple subsystems want to use stackdepot together, it
> is even harder to estimate the total memory consumption.
> How likely is it that none of them will need MAX_SIZE?
> 
Minchan, this should be fine, right? Do you see any issue with disabling
stack depot completely?

 Thanks,
 Vijay


-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-16 Thread Vijayanand Jitta



On 12/16/2020 6:41 PM, Alexander Potapenko wrote:
> On Wed, Dec 16, 2020 at 2:06 PM Vijayanand Jitta  
> wrote:
>>
>>
>>
>> On 12/16/2020 1:56 PM, Alexander Potapenko wrote:
>>> On Wed, Dec 16, 2020 at 4:43 AM Vijayanand Jitta  
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 12/14/2020 4:02 PM, Vijayanand Jitta wrote:
>>>>>
>>>>>
>>>>> On 12/14/2020 3:04 PM, Alexander Potapenko wrote:
>>>>>> On Mon, Dec 14, 2020 at 5:02 AM Vijayanand Jitta  
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/11/2020 6:55 PM, Alexander Potapenko wrote:
>>>>>>>> On Fri, Dec 11, 2020 at 1:45 PM Vijayanand Jitta 
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/11/2020 2:06 PM, Alexander Potapenko wrote:
>>>>>>>>>> On Thu, Dec 10, 2020 at 6:01 AM  wrote:
>>>>>>>>>>>
>>>>>>>>>>> From: Yogesh Lal 
>>>>>>>>>>>
>>>>>>>>>>> Add a kernel parameter stack_hash_order to configure 
>>>>>>>>>>> STACK_HASH_SIZE.
>>>>>>>>>>>
>>>>>>>>>>> Aim is to have configurable value for STACK_HASH_SIZE, so that one
>>>>>>>>>>> can configure it depending on usecase there by reducing the static
>>>>>>>>>>> memory overhead.
>>>>>>>>>>>
>>>>>>>>>>> One example is of Page Owner, default value of STACK_HASH_SIZE lead
>>>>>>>>>>> stack depot to consume 8MB of static memory. Making it configurable
>>>>>>>>>>> and use lower value helps to enable features like CONFIG_PAGE_OWNER
>>>>>>>>>>> without any significant overhead.
>>>>>>>>>>
>>>>>>>>>> Can we go with a static CONFIG_ parameter instead?
>>>>>>>>>> Guess most users won't bother changing the default anyway, and for
>>>>>>>>>> CONFIG_PAGE_OWNER users changing the size at boot time is not 
>>>>>>>>>> strictly
>>>>>>>>>> needed.
>>>>>>>>>>
>>>>>>>>> Thanks for review.
>>>>>>>>>
>>>>>>>>> One advantage of having run time parameter is we can simply set it to 
>>>>>>>>> a
>>>>>>>>> lower value at runtime if page_owner=off thereby reducing the memory
>>>>>>>>> usage or use default value if we want to use page owner so, we have 
>>>>>>>>> some
>>>>>>>>> some flexibility here. This is not possible with static parameter as 
>>>>>>>>> we
>>>>>>>>> have to have some predefined value.
>>>>>>>>
>>>>>>>> If we are talking about a configuration in which page_owner is the
>>>>>>>> only stackdepot user in the system, then for page_owner=off it
>>>>>>>> probably makes more sense to disable stackdepot completely instead of
>>>>>>>> setting it to a lower value. This is a lot easier to do in terms of
>>>>>>>> correctness.
>>>>>>>> But if there are other users (e.g. KASAN), their stackdepot usage may
>>>>>>>> actually dominate that of page_owner.
>>>>>>>>
>>>>>>>>>>> -static struct stack_record *stack_table[STACK_HASH_SIZE] = {
>>>>>>>>>>> -   [0 ...  STACK_HASH_SIZE - 1] = NULL
>>>>>>>>>>> +static unsigned int stack_hash_order = 20;
>>>>>>>>>>
>>>>>>>>>> Please initialize with MAX_STACK_HASH_ORDER instead.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sure, will update this.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +static int __init init_stackdepot(void)

Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-16 Thread Vijayanand Jitta



On 12/16/2020 1:56 PM, Alexander Potapenko wrote:
> On Wed, Dec 16, 2020 at 4:43 AM Vijayanand Jitta  
> wrote:
>>
>>
>>
>> On 12/14/2020 4:02 PM, Vijayanand Jitta wrote:
>>>
>>>
>>> On 12/14/2020 3:04 PM, Alexander Potapenko wrote:
>>>> On Mon, Dec 14, 2020 at 5:02 AM Vijayanand Jitta  
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 12/11/2020 6:55 PM, Alexander Potapenko wrote:
>>>>>> On Fri, Dec 11, 2020 at 1:45 PM Vijayanand Jitta  
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/11/2020 2:06 PM, Alexander Potapenko wrote:
>>>>>>>> On Thu, Dec 10, 2020 at 6:01 AM  wrote:
>>>>>>>>>
>>>>>>>>> From: Yogesh Lal 
>>>>>>>>>
>>>>>>>>> Add a kernel parameter stack_hash_order to configure STACK_HASH_SIZE.
>>>>>>>>>
>>>>>>>>> Aim is to have configurable value for STACK_HASH_SIZE, so that one
>>>>>>>>> can configure it depending on usecase there by reducing the static
>>>>>>>>> memory overhead.
>>>>>>>>>
>>>>>>>>> One example is of Page Owner, default value of STACK_HASH_SIZE lead
>>>>>>>>> stack depot to consume 8MB of static memory. Making it configurable
>>>>>>>>> and use lower value helps to enable features like CONFIG_PAGE_OWNER
>>>>>>>>> without any significant overhead.
>>>>>>>>
>>>>>>>> Can we go with a static CONFIG_ parameter instead?
>>>>>>>> Guess most users won't bother changing the default anyway, and for
>>>>>>>> CONFIG_PAGE_OWNER users changing the size at boot time is not strictly
>>>>>>>> needed.
>>>>>>>>
>>>>>>> Thanks for review.
>>>>>>>
>>>>>>> One advantage of having run time parameter is we can simply set it to a
>>>>>>> lower value at runtime if page_owner=off thereby reducing the memory
>>>>>>> usage or use default value if we want to use page owner so, we have some
>>>>>>> some flexibility here. This is not possible with static parameter as we
>>>>>>> have to have some predefined value.
>>>>>>
>>>>>> If we are talking about a configuration in which page_owner is the
>>>>>> only stackdepot user in the system, then for page_owner=off it
>>>>>> probably makes more sense to disable stackdepot completely instead of
>>>>>> setting it to a lower value. This is a lot easier to do in terms of
>>>>>> correctness.
>>>>>> But if there are other users (e.g. KASAN), their stackdepot usage may
>>>>>> actually dominate that of page_owner.
>>>>>>
>>>>>>>>> -static struct stack_record *stack_table[STACK_HASH_SIZE] = {
>>>>>>>>> -   [0 ...  STACK_HASH_SIZE - 1] = NULL
>>>>>>>>> +static unsigned int stack_hash_order = 20;
>>>>>>>>
>>>>>>>> Please initialize with MAX_STACK_HASH_ORDER instead.
>>>>>>>>
>>>>>>>
>>>>>>> Sure, will update this.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>>>> +
>>>>>>>>> +static int __init init_stackdepot(void)
>>>>>>>>> +{
>>>>>>>>> +   size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record 
>>>>>>>>> *));
>>>>>>>>> +
>>>>>>>>> +   stack_table = vmalloc(size);
>>>>>>>>> +   memcpy(stack_table, stack_table_def, size);
>>>>>>>>
>>>>>>>> Looks like you are assuming stack_table_def already contains some data
>>>>>>>> by this point.
>>>>>>>> But if STACK_HASH_SIZE shrinks this memcpy() above will just copy some
>>>>>>>> part of the table, whereas the rest will be lost.
>>>>>>>> We'll need to:
>>>>>>>> - either explicitly decide we can afford losing this data (no idea how

Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-15 Thread Vijayanand Jitta



On 12/14/2020 4:02 PM, Vijayanand Jitta wrote:
> 
> 
> On 12/14/2020 3:04 PM, Alexander Potapenko wrote:
>> On Mon, Dec 14, 2020 at 5:02 AM Vijayanand Jitta  
>> wrote:
>>>
>>>
>>>
>>> On 12/11/2020 6:55 PM, Alexander Potapenko wrote:
>>>> On Fri, Dec 11, 2020 at 1:45 PM Vijayanand Jitta  
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 12/11/2020 2:06 PM, Alexander Potapenko wrote:
>>>>>> On Thu, Dec 10, 2020 at 6:01 AM  wrote:
>>>>>>>
>>>>>>> From: Yogesh Lal 
>>>>>>>
>>>>>>> Add a kernel parameter stack_hash_order to configure STACK_HASH_SIZE.
>>>>>>>
>>>>>>> Aim is to have configurable value for STACK_HASH_SIZE, so that one
>>>>>>> can configure it depending on usecase there by reducing the static
>>>>>>> memory overhead.
>>>>>>>
>>>>>>> One example is of Page Owner, default value of STACK_HASH_SIZE lead
>>>>>>> stack depot to consume 8MB of static memory. Making it configurable
>>>>>>> and use lower value helps to enable features like CONFIG_PAGE_OWNER
>>>>>>> without any significant overhead.
>>>>>>
>>>>>> Can we go with a static CONFIG_ parameter instead?
>>>>>> Guess most users won't bother changing the default anyway, and for
>>>>>> CONFIG_PAGE_OWNER users changing the size at boot time is not strictly
>>>>>> needed.
>>>>>>
>>>>> Thanks for review.
>>>>>
>>>>> One advantage of having run time parameter is we can simply set it to a
>>>>> lower value at runtime if page_owner=off thereby reducing the memory
>>>>> usage or use default value if we want to use page owner so, we have some
>>>>> some flexibility here. This is not possible with static parameter as we
>>>>> have to have some predefined value.
>>>>
>>>> If we are talking about a configuration in which page_owner is the
>>>> only stackdepot user in the system, then for page_owner=off it
>>>> probably makes more sense to disable stackdepot completely instead of
>>>> setting it to a lower value. This is a lot easier to do in terms of
>>>> correctness.
>>>> But if there are other users (e.g. KASAN), their stackdepot usage may
>>>> actually dominate that of page_owner.
>>>>
>>>>>>> -static struct stack_record *stack_table[STACK_HASH_SIZE] = {
>>>>>>> -   [0 ...  STACK_HASH_SIZE - 1] = NULL
>>>>>>> +static unsigned int stack_hash_order = 20;
>>>>>>
>>>>>> Please initialize with MAX_STACK_HASH_ORDER instead.
>>>>>>
>>>>>
>>>>> Sure, will update this.
>>>>>
>>>>
>>>>
>>>>>>> +
>>>>>>> +static int __init init_stackdepot(void)
>>>>>>> +{
>>>>>>> +   size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
>>>>>>> +
>>>>>>> +   stack_table = vmalloc(size);
>>>>>>> +   memcpy(stack_table, stack_table_def, size);
>>>>>>
>>>>>> Looks like you are assuming stack_table_def already contains some data
>>>>>> by this point.
>>>>>> But if STACK_HASH_SIZE shrinks this memcpy() above will just copy some
>>>>>> part of the table, whereas the rest will be lost.
>>>>>> We'll need to:
>>>>>> - either explicitly decide we can afford losing this data (no idea how
>>>>>> bad this can potentially be),
>>>>>> - or disallow storing anything prior to full stackdepot initialization
>>>>>> (then we don't need stack_table_def),
>>>>>> - or carefully move all entries to the first part of the table.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>
>>>>> The hash for stack_table_def is computed using the run time parameter
>>>>> stack_hash_order, though stack_table_def is a bigger array it will only
>>>>> use the entries that are with in the run time configured STACK_HASH_SIZE
>>>>> range. so, there will be no data loss during copy.
>>>>
>>>> Do we expect any d

Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-14 Thread Vijayanand Jitta



On 12/14/2020 3:04 PM, Alexander Potapenko wrote:
> On Mon, Dec 14, 2020 at 5:02 AM Vijayanand Jitta  
> wrote:
>>
>>
>>
>> On 12/11/2020 6:55 PM, Alexander Potapenko wrote:
>>> On Fri, Dec 11, 2020 at 1:45 PM Vijayanand Jitta  
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 12/11/2020 2:06 PM, Alexander Potapenko wrote:
>>>>> On Thu, Dec 10, 2020 at 6:01 AM  wrote:
>>>>>>
>>>>>> From: Yogesh Lal 
>>>>>>
>>>>>> Add a kernel parameter stack_hash_order to configure STACK_HASH_SIZE.
>>>>>>
>>>>>> Aim is to have configurable value for STACK_HASH_SIZE, so that one
>>>>>> can configure it depending on usecase there by reducing the static
>>>>>> memory overhead.
>>>>>>
>>>>>> One example is of Page Owner, default value of STACK_HASH_SIZE lead
>>>>>> stack depot to consume 8MB of static memory. Making it configurable
>>>>>> and use lower value helps to enable features like CONFIG_PAGE_OWNER
>>>>>> without any significant overhead.
>>>>>
>>>>> Can we go with a static CONFIG_ parameter instead?
>>>>> Guess most users won't bother changing the default anyway, and for
>>>>> CONFIG_PAGE_OWNER users changing the size at boot time is not strictly
>>>>> needed.
>>>>>
>>>> Thanks for review.
>>>>
>>>> One advantage of having run time parameter is we can simply set it to a
>>>> lower value at runtime if page_owner=off thereby reducing the memory
>>>> usage or use default value if we want to use page owner so, we have some
>>>> some flexibility here. This is not possible with static parameter as we
>>>> have to have some predefined value.
>>>
>>> If we are talking about a configuration in which page_owner is the
>>> only stackdepot user in the system, then for page_owner=off it
>>> probably makes more sense to disable stackdepot completely instead of
>>> setting it to a lower value. This is a lot easier to do in terms of
>>> correctness.
>>> But if there are other users (e.g. KASAN), their stackdepot usage may
>>> actually dominate that of page_owner.
>>>
>>>>>> -static struct stack_record *stack_table[STACK_HASH_SIZE] = {
>>>>>> -   [0 ...  STACK_HASH_SIZE - 1] = NULL
>>>>>> +static unsigned int stack_hash_order = 20;
>>>>>
>>>>> Please initialize with MAX_STACK_HASH_ORDER instead.
>>>>>
>>>>
>>>> Sure, will update this.
>>>>
>>>
>>>
>>>>>> +
>>>>>> +static int __init init_stackdepot(void)
>>>>>> +{
>>>>>> +   size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
>>>>>> +
>>>>>> +   stack_table = vmalloc(size);
>>>>>> +   memcpy(stack_table, stack_table_def, size);
>>>>>
>>>>> Looks like you are assuming stack_table_def already contains some data
>>>>> by this point.
>>>>> But if STACK_HASH_SIZE shrinks this memcpy() above will just copy some
>>>>> part of the table, whereas the rest will be lost.
>>>>> We'll need to:
>>>>> - either explicitly decide we can afford losing this data (no idea how
>>>>> bad this can potentially be),
>>>>> - or disallow storing anything prior to full stackdepot initialization
>>>>> (then we don't need stack_table_def),
>>>>> - or carefully move all entries to the first part of the table.
>>>>>
>>>>> Alex
>>>>>
>>>>
>>>> The hash for stack_table_def is computed using the run time parameter
>>>> stack_hash_order, though stack_table_def is a bigger array it will only
>>>> use the entries that are with in the run time configured STACK_HASH_SIZE
>>>> range. so, there will be no data loss during copy.
>>>
>>> Do we expect any data to be stored into stack_table_def before
>>> setup_stack_hash_order() is called?
>>> If the answer is no, then we could probably drop stack_table_def and
>>> allocate the table right in setup_stack_hash_order()?
>>>
>>
>> Yes, we do see an allocation from stack depot even before init is called
>> from kasan, thats the reason for having stack_table_def.
>> This is the issue reported due to that on v2, so i added stack_table_def.
>> https://lkml.org/lkml/2020/12/3/839
> 
> But at that point STACK_HASH_SIZE is still equal to 1L <<
> MAX_STACK_HASH_ORDER, isn't it?
> Then we still need to take care of the records that fit into the
> bigger array, but not the smaller one.
> 

At this point early_param has already been called, which sets stack_hash_order,
so STACK_HASH_SIZE will be set to this updated value and not
MAX_STACK_HASH_SIZE. So there are no additional entries in the bigger array.

Thanks,
Vijay

>> Thanks,
>> Vijay
>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>> --
>>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>>>> member of Code Aurora Forum, hosted by The Linux Foundation
>>>
>>>
>>>
>>
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>> member of Code Aurora Forum, hosted by The Linux Foundation
> 
> 
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-13 Thread Vijayanand Jitta



On 12/11/2020 6:55 PM, Alexander Potapenko wrote:
> On Fri, Dec 11, 2020 at 1:45 PM Vijayanand Jitta  
> wrote:
>>
>>
>>
>> On 12/11/2020 2:06 PM, Alexander Potapenko wrote:
>>> On Thu, Dec 10, 2020 at 6:01 AM  wrote:
>>>>
>>>> From: Yogesh Lal 
>>>>
>>>> Add a kernel parameter stack_hash_order to configure STACK_HASH_SIZE.
>>>>
>>>> Aim is to have configurable value for STACK_HASH_SIZE, so that one
>>>> can configure it depending on usecase there by reducing the static
>>>> memory overhead.
>>>>
>>>> One example is of Page Owner, default value of STACK_HASH_SIZE lead
>>>> stack depot to consume 8MB of static memory. Making it configurable
>>>> and use lower value helps to enable features like CONFIG_PAGE_OWNER
>>>> without any significant overhead.
>>>
>>> Can we go with a static CONFIG_ parameter instead?
>>> Guess most users won't bother changing the default anyway, and for
>>> CONFIG_PAGE_OWNER users changing the size at boot time is not strictly
>>> needed.
>>>
>> Thanks for review.
>>
>> One advantage of having run time parameter is we can simply set it to a
>> lower value at runtime if page_owner=off thereby reducing the memory
>> usage or use default value if we want to use page owner so, we have some
>> some flexibility here. This is not possible with static parameter as we
>> have to have some predefined value.
> 
> If we are talking about a configuration in which page_owner is the
> only stackdepot user in the system, then for page_owner=off it
> probably makes more sense to disable stackdepot completely instead of
> setting it to a lower value. This is a lot easier to do in terms of
> correctness.
> But if there are other users (e.g. KASAN), their stackdepot usage may
> actually dominate that of page_owner.
> 
>>>> -static struct stack_record *stack_table[STACK_HASH_SIZE] = {
>>>> -   [0 ...  STACK_HASH_SIZE - 1] = NULL
>>>> +static unsigned int stack_hash_order = 20;
>>>
>>> Please initialize with MAX_STACK_HASH_ORDER instead.
>>>
>>
>> Sure, will update this.
>>
> 
> 
>>>> +
>>>> +static int __init init_stackdepot(void)
>>>> +{
>>>> +   size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
>>>> +
>>>> +   stack_table = vmalloc(size);
>>>> +   memcpy(stack_table, stack_table_def, size);
>>>
>>> Looks like you are assuming stack_table_def already contains some data
>>> by this point.
>>> But if STACK_HASH_SIZE shrinks this memcpy() above will just copy some
>>> part of the table, whereas the rest will be lost.
>>> We'll need to:
>>> - either explicitly decide we can afford losing this data (no idea how
>>> bad this can potentially be),
>>> - or disallow storing anything prior to full stackdepot initialization
>>> (then we don't need stack_table_def),
>>> - or carefully move all entries to the first part of the table.
>>>
>>> Alex
>>>
>>
>> The hash for stack_table_def is computed using the run time parameter
>> stack_hash_order, though stack_table_def is a bigger array it will only
>> use the entries that are with in the run time configured STACK_HASH_SIZE
>> range. so, there will be no data loss during copy.
> 
> Do we expect any data to be stored into stack_table_def before
> setup_stack_hash_order() is called?
> If the answer is no, then we could probably drop stack_table_def and
> allocate the table right in setup_stack_hash_order()?
> 

Yes, we do see an allocation from stack depot, coming from KASAN, even before
init is called; that's the reason for having stack_table_def. This is the
issue that was reported on v2 because of that, so I added stack_table_def:
https://lkml.org/lkml/2020/12/3/839

Thanks,
Vijay

>> Thanks,
>> Vijay
>>
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>> member of Code Aurora Forum, hosted by The Linux Foundation
> 
> 
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v3] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-12-11 Thread Vijayanand Jitta



On 12/11/2020 2:06 PM, Alexander Potapenko wrote:
> On Thu, Dec 10, 2020 at 6:01 AM  wrote:
>>
>> From: Yogesh Lal 
>>
>> Add a kernel parameter stack_hash_order to configure STACK_HASH_SIZE.
>>
>> Aim is to have configurable value for STACK_HASH_SIZE, so that one
>> can configure it depending on usecase there by reducing the static
>> memory overhead.
>>
>> One example is of Page Owner, default value of STACK_HASH_SIZE lead
>> stack depot to consume 8MB of static memory. Making it configurable
>> and use lower value helps to enable features like CONFIG_PAGE_OWNER
>> without any significant overhead.
> 
> Can we go with a static CONFIG_ parameter instead?
> Guess most users won't bother changing the default anyway, and for
> CONFIG_PAGE_OWNER users changing the size at boot time is not strictly
> needed.
> 
Thanks for the review.

One advantage of having a runtime parameter is that we can simply set it to a
lower value at runtime if page_owner=off, thereby reducing the memory usage,
or use the default value if we want to use page owner, so we have some
flexibility here. This is not possible with a static parameter, as we have to
pick some predefined value.

>> -static struct stack_record *stack_table[STACK_HASH_SIZE] = {
>> -   [0 ...  STACK_HASH_SIZE - 1] = NULL
>> +static unsigned int stack_hash_order = 20;
> 
> Please initialize with MAX_STACK_HASH_ORDER instead.
> 

Sure, will update this.

>> +static struct stack_record *stack_table_def[MAX_STACK_HASH_SIZE] __initdata 
>> = {
>> +   [0 ...  MAX_STACK_HASH_SIZE - 1] = NULL
>>  };
>> +static struct stack_record **stack_table __refdata = stack_table_def;
>> +
>> +static int __init setup_stack_hash_order(char *str)
>> +{
>> +   kstrtouint(str, 0, &stack_hash_order);
>> +   if (stack_hash_order > MAX_STACK_HASH_ORDER)
>> +   stack_hash_order = MAX_STACK_HASH_ORDER;
>> +   return 0;
>> +}
>> +early_param("stack_hash_order", setup_stack_hash_order);
>> +
>> +static int __init init_stackdepot(void)
>> +{
>> +   size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
>> +
>> +   stack_table = vmalloc(size);
>> +   memcpy(stack_table, stack_table_def, size);
> 
> Looks like you are assuming stack_table_def already contains some data
> by this point.
> But if STACK_HASH_SIZE shrinks this memcpy() above will just copy some
> part of the table, whereas the rest will be lost.
> We'll need to:
> - either explicitly decide we can afford losing this data (no idea how
> bad this can potentially be),
> - or disallow storing anything prior to full stackdepot initialization
> (then we don't need stack_table_def),
> - or carefully move all entries to the first part of the table.
> 
> Alex
> 

The hash for stack_table_def is computed using the runtime parameter
stack_hash_order; though stack_table_def is a bigger array, it will only use
the entries that are within the runtime-configured STACK_HASH_SIZE range, so
there will be no data loss during the copy.
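
To make that concrete, here is the indexing pattern I am describing (a
simplified excerpt in the spirit of lib/stackdepot.c, with the surrounding
code omitted):

	/*
	 * The bucket index is masked with STACK_HASH_MASK, which is derived
	 * from the runtime stack_hash_order. So even though stack_table_def[]
	 * is sized for MAX_STACK_HASH_SIZE, only the first STACK_HASH_SIZE
	 * slots are ever written or read.
	 */
	u32 hash = hash_stack(entries, nr_entries);
	struct stack_record **bucket = &stack_table[hash & STACK_HASH_MASK];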

Thanks,
Vijay

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-11-12 Thread Vijayanand Jitta



On 11/4/2020 3:59 PM, Vijayanand Jitta wrote:
> 
> 
> On 11/4/2020 4:57 AM, Minchan Kim wrote:
>> Sorry if this mail corrupts the mail thread or had heavy mangling
>> since I lost this mail from my mailbox so I am sending this mail by
>> web gmail.
>>
>> On Thu, Oct 22, 2020 at 10:18 AM  wrote:
>>>
>>> From: Yogesh Lal 
>>>
>>> Use STACK_HASH_ORDER_SHIFT to configure STACK_HASH_SIZE.
>>>
>>> Aim is to have configurable value for  STACK_HASH_SIZE,
>>> so depend on use case one can configure it.
>>>
>>> One example is of Page Owner, default value of
>>> STACK_HASH_SIZE lead stack depot to consume 8MB of static memory.
>>> Making it configurable and use lower value helps to enable features like
>>> CONFIG_PAGE_OWNER without any significant overhead.
>>>
>>> Signed-off-by: Yogesh Lal 
>>> Signed-off-by: Vinayak Menon 
>>> Signed-off-by: Vijayanand Jitta 
>>> ---
>>>  lib/Kconfig  | 9 +
>>>  lib/stackdepot.c | 3 +--
>>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/Kconfig b/lib/Kconfig
>>> index 18d76b6..b3f8259 100644
>>> --- a/lib/Kconfig
>>> +++ b/lib/Kconfig
>>> @@ -651,6 +651,15 @@ config STACKDEPOT
>>> bool
>>> select STACKTRACE
>>>
>>> +config STACK_HASH_ORDER_SHIFT
>>> +   int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
>>> +   range 12 20
>>> +   default 20
>>> +   depends on STACKDEPOT
>>> +   help
>>> +Select the hash size as a power of 2 for the stackdepot hash table.
>>> +Choose a lower value to reduce the memory impact.
>>> +
>>>  config SBITMAP
>>> bool
>>>
>>> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
>>> index 2caffc6..413c20b 100644
>>> --- a/lib/stackdepot.c
>>> +++ b/lib/stackdepot.c
>>> @@ -142,8 +142,7 @@ static struct stack_record *depot_alloc_stack(unsigned 
>>> long *entries, int size,
>>> return stack;
>>>  }
>>>
>>> -#define STACK_HASH_ORDER 20
>>> -#define STACK_HASH_SIZE (1L << STACK_HASH_ORDER)
>>> +#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER_SHIFT)
>>>  #define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
>>>  #define STACK_HASH_SEED 0x9747b28c
>>>
>>> --
>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
>>> of Code Aurora Forum, hosted by The Linux Foundation
>>> 2.7.4
>>>
>>
>> 1. When we don't use page_owner, we don't want to waste any memory for
>> stackdepot hash array.
>> 2. When we use page_owner, we want to have reasonable stackdeport hash array
>>
>> With this configuration, it couldn't meet since we always need to
>> reserve a reasonable size for the array.
>> Can't we make the hash size as a kernel parameter?
>> With it, we could use it like this.
>>
>> 1. page_owner=off, stackdepot_stack_hash=0 -> no more wasted memory
>> when we don't use page_owner
>> 2. page_owner=on, stackdepot_stack_hash=8M -> reasonable hash size
>> when we use page_owner.
>>
>>
> 
> This idea looks fine to me. Andrew and others would like to hear your
> comments as well on this before implementing.
> 
> Thanks,
> Vijay
> 

Awaiting comments from Andrew and others.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] lib: stackdepot: Add support to configure STACK_HASH_SIZE

2020-11-04 Thread Vijayanand Jitta



On 11/4/2020 4:57 AM, Minchan Kim wrote:
> Sorry if this mail corrupts the mail thread or had heavy mangling
> since I lost this mail from my mailbox so I am sending this mail by
> web gmail.
> 
> On Thu, Oct 22, 2020 at 10:18 AM  wrote:
>>
>> From: Yogesh Lal 
>>
>> Use STACK_HASH_ORDER_SHIFT to configure STACK_HASH_SIZE.
>>
>> Aim is to have configurable value for  STACK_HASH_SIZE,
>> so depend on use case one can configure it.
>>
>> One example is of Page Owner, default value of
>> STACK_HASH_SIZE lead stack depot to consume 8MB of static memory.
>> Making it configurable and use lower value helps to enable features like
>> CONFIG_PAGE_OWNER without any significant overhead.
>>
>> Signed-off-by: Yogesh Lal 
>> Signed-off-by: Vinayak Menon 
>> Signed-off-by: Vijayanand Jitta 
>> ---
>>  lib/Kconfig  | 9 +
>>  lib/stackdepot.c | 3 +--
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index 18d76b6..b3f8259 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -651,6 +651,15 @@ config STACKDEPOT
>> bool
>> select STACKTRACE
>>
>> +config STACK_HASH_ORDER_SHIFT
>> +   int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
>> +   range 12 20
>> +   default 20
>> +   depends on STACKDEPOT
>> +   help
>> +Select the hash size as a power of 2 for the stackdepot hash table.
>> +Choose a lower value to reduce the memory impact.
>> +
>>  config SBITMAP
>> bool
>>
>> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
>> index 2caffc6..413c20b 100644
>> --- a/lib/stackdepot.c
>> +++ b/lib/stackdepot.c
>> @@ -142,8 +142,7 @@ static struct stack_record *depot_alloc_stack(unsigned 
>> long *entries, int size,
>> return stack;
>>  }
>>
>> -#define STACK_HASH_ORDER 20
>> -#define STACK_HASH_SIZE (1L << STACK_HASH_ORDER)
>> +#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER_SHIFT)
>>  #define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
>>  #define STACK_HASH_SEED 0x9747b28c
>>
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
>> Code Aurora Forum, hosted by The Linux Foundation
>> 2.7.4
>>
> 
> 1. When we don't use page_owner, we don't want to waste any memory for
> stackdepot hash array.
> 2. When we use page_owner, we want to have reasonable stackdeport hash array
> 
> With this configuration, it couldn't meet since we always need to
> reserve a reasonable size for the array.
> Can't we make the hash size as a kernel parameter?
> With it, we could use it like this.
> 
> 1. page_owner=off, stackdepot_stack_hash=0 -> no more wasted memory
> when we don't use page_owner
> 2. page_owner=on, stackdepot_stack_hash=8M -> reasonable hash size
> when we use page_owner.
> 
> 

This idea looks fine to me. Andrew and others, I would like to hear your
comments on this as well before implementing it.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation
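
For reference, a minimal sketch of the boot-parameter approach suggested
above might look like the following; the parameter name and handler are
assumptions taken from the discussion, not from a posted patch:

/* Illustrative sketch only, following the "stackdepot_stack_hash=" idea
 * above; 0 would mean no hash table at all (the page_owner=off case). */
static unsigned long stack_hash_bytes = 8UL * 1024 * 1024;	/* 8MB default */

static int __init setup_stackdepot_stack_hash(char *str)
{
	stack_hash_bytes = memparse(str, NULL);
	return 0;
}
early_param("stackdepot_stack_hash", setup_stackdepot_stack_hash);

/* At init time the table would then be sized from the parameter instead of
 * a compile-time STACK_HASH_SIZE, e.g. (entries rounded down to a power of
 * two so the existing hash masking still works):
 *
 *	stack_hash_entries = stack_hash_bytes / sizeof(struct stack_record *);
 *	if (stack_hash_entries)
 *		stack_table = memblock_alloc(stack_hash_bytes, PAGE_SIZE);
 */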


Re: [PATCH v5 2/2] iommu/iova: Free global iova rcache on iova alloc failure

2020-10-20 Thread Vijayanand Jitta



On 9/30/2020 1:14 PM, vji...@codeaurora.org wrote:
> From: Vijayanand Jitta 
> 
> When ever an iova alloc request fails we free the iova
> ranges present in the percpu iova rcaches and then retry
> but the global iova rcache is not freed as a result we could
> still see iova alloc failure even after retry as global
> rcache is holding the iova's which can cause fragmentation.
> So, free the global iova rcache as well and then go for the
> retry.
> 
> Signed-off-by: Vijayanand Jitta 
> ---
>  drivers/iommu/iova.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index c3a1a8e..faf9b13 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -25,6 +25,7 @@ static void init_iova_rcaches(struct iova_domain *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  static void fq_destroy_all_entries(struct iova_domain *iovad);
>  static void fq_flush_timeout(struct timer_list *t);
> +static void free_global_cached_iovas(struct iova_domain *iovad);
>  
>  void
>  init_iova_domain(struct iova_domain *iovad, unsigned long granule,
> @@ -442,6 +443,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
> size,
>   flush_rcache = false;
>   for_each_online_cpu(cpu)
>   free_cpu_cached_iovas(cpu, iovad);
> + free_global_cached_iovas(iovad);
>   goto retry;
>   }
>  
> @@ -1057,5 +1059,26 @@ void free_cpu_cached_iovas(unsigned int cpu, struct 
> iova_domain *iovad)
>   }
>  }
>  
> +/*
> + * free all the IOVA ranges of global cache
> + */
> +static void free_global_cached_iovas(struct iova_domain *iovad)
> +{
> + struct iova_rcache *rcache;
> + unsigned long flags;
> + int i, j;
> +
> + for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
> + rcache = &iovad->rcaches[i];
> + spin_lock_irqsave(&rcache->lock, flags);
> + for (j = 0; j < rcache->depot_size; ++j) {
> + iova_magazine_free_pfns(rcache->depot[j], iovad);
> + iova_magazine_free(rcache->depot[j]);
> + rcache->depot[j] = NULL;
> + }
> + rcache->depot_size = 0;
> + spin_unlock_irqrestore(&rcache->lock, flags);
> + }
> +}
>  MODULE_AUTHOR("Anil S Keshavamurthy ");
>  MODULE_LICENSE("GPL");
> 

Gentle ping.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation
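
Taken together with the existing per-CPU cache flush, the failure path in
alloc_iova_fast() after this patch looks roughly like the sketch below
(paraphrased from the hunk quoted above, not a verbatim copy of the kernel
source):

retry:
	new_iova = alloc_iova(iovad, size, limit_pfn, true);
	if (!new_iova) {
		unsigned int cpu;

		/* Flush the rcaches and retry only once. */
		if (!flush_rcache)
			return 0;

		flush_rcache = false;
		for_each_online_cpu(cpu)
			free_cpu_cached_iovas(cpu, iovad);
		/* Added by this patch: also drain the global depot. */
		free_global_cached_iovas(iovad);
		goto retry;
	}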


Re: [PATCH v5 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-10-20 Thread Vijayanand Jitta



On 9/30/2020 1:14 PM, vji...@codeaurora.org wrote:
> From: Vijayanand Jitta 
> 
> When ever a new iova alloc request comes iova is always searched
> from the cached node and the nodes which are previous to cached
> node. So, even if there is free iova space available in the nodes
> which are next to the cached node iova allocation can still fail
> because of this approach.
> 
> Consider the following sequence of iova alloc and frees on
> 1GB of iova space
> 
> 1) alloc - 500MB
> 2) alloc - 12MB
> 3) alloc - 499MB
> 4) free -  12MB which was allocated in step 2
> 5) alloc - 13MB
> 
> After the above sequence we will have 12MB of free iova space and
> cached node will be pointing to the iova pfn of last alloc of 13MB
> which will be the lowest iova pfn of that iova space. Now if we get an
> alloc request of 2MB we just search from cached node and then look
> for lower iova pfn's for free iova and as they aren't any, iova alloc
> fails though there is 12MB of free iova space.
> 
> To avoid such iova search failures do a retry from the last rb tree node
> when iova search fails, this will search the entire tree and get an iova
> if its available.
> 
> Signed-off-by: Vijayanand Jitta 
> Reviewed-by: Robin Murphy 
> ---
>  drivers/iommu/iova.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 30d969a..c3a1a8e 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>   struct rb_node *curr, *prev;
>   struct iova *curr_iova;
>   unsigned long flags;
> - unsigned long new_pfn;
> + unsigned long new_pfn, retry_pfn;
>   unsigned long align_mask = ~0UL;
> + unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
>  
>   if (size_aligned)
>   align_mask <<= fls_long(size - 1);
> @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>  
>   curr = __get_cached_rbnode(iovad, limit_pfn);
>   curr_iova = rb_entry(curr, struct iova, node);
> + retry_pfn = curr_iova->pfn_hi + 1;
> +
> +retry:
>   do {
> - limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
> - new_pfn = (limit_pfn - size) & align_mask;
> + high_pfn = min(high_pfn, curr_iova->pfn_lo);
> + new_pfn = (high_pfn - size) & align_mask;
>   prev = curr;
>   curr = rb_prev(curr);
>   curr_iova = rb_entry(curr, struct iova, node);
> - } while (curr && new_pfn <= curr_iova->pfn_hi);
> -
> - if (limit_pfn < size || new_pfn < iovad->start_pfn) {
> + } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
> +
> + if (high_pfn < size || new_pfn < low_pfn) {
> + if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
> + high_pfn = limit_pfn;
> + low_pfn = retry_pfn;
> + curr = &iovad->anchor.node;
> + curr_iova = rb_entry(curr, struct iova, node);
> + goto retry;
> + }
>   iovad->max32_alloc_size = size;
>   goto iova32_full;
>   }
> 

Gentle ping.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation
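
The failure mode described in the commit message can be reproduced with a
sequence along these lines (illustrative only: a..e are unsigned long pfns
returned by alloc_iova_fast(), iovad and limit_pfn are as in the surrounding
code, sizes are converted to pages, and error handling is omitted):

	/* 1GB of IOVA space; allocations are made top-down from limit_pfn. */
	a = alloc_iova_fast(iovad, (500 * SZ_1M) >> PAGE_SHIFT, limit_pfn, true);
	b = alloc_iova_fast(iovad, (12 * SZ_1M) >> PAGE_SHIFT, limit_pfn, true);
	c = alloc_iova_fast(iovad, (499 * SZ_1M) >> PAGE_SHIFT, limit_pfn, true);

	free_iova_fast(iovad, b, (12 * SZ_1M) >> PAGE_SHIFT);	/* 12MB hole */

	d = alloc_iova_fast(iovad, (13 * SZ_1M) >> PAGE_SHIFT, limit_pfn, true);

	/*
	 * The cached node now points at 'd', the lowest allocated range, so
	 * without the retry a 2MB request only searches below 'd' and fails
	 * even though the 12MB hole above it is free.
	 */
	e = alloc_iova_fast(iovad, (2 * SZ_1M) >> PAGE_SHIFT, limit_pfn, true);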


Re: [PATCH v2 2/2] iommu/iova: Free global iova rcache on iova alloc failure

2020-09-29 Thread Vijayanand Jitta



On 9/28/2020 6:11 PM, Vijayanand Jitta wrote:
> 
> 
> On 9/18/2020 8:11 PM, Robin Murphy wrote:
>> On 2020-08-20 13:49, vji...@codeaurora.org wrote:
>>> From: Vijayanand Jitta 
>>>
>>> When ever an iova alloc request fails we free the iova
>>> ranges present in the percpu iova rcaches and then retry
>>> but the global iova rcache is not freed as a result we could
>>> still see iova alloc failure even after retry as global
>>> rcache is holding the iova's which can cause fragmentation.
>>> So, free the global iova rcache as well and then go for the
>>> retry.
>>>
>>> Signed-off-by: Vijayanand Jitta 
>>> ---
>>>   drivers/iommu/iova.c | 23 +++
>>>   include/linux/iova.h |  6 ++
>>>   2 files changed, 29 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 4e77116..5836c87 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -442,6 +442,7 @@ struct iova *find_iova(struct iova_domain *iovad,
>>> unsigned long pfn)
>>>   flush_rcache = false;
>>>   for_each_online_cpu(cpu)
>>>   free_cpu_cached_iovas(cpu, iovad);
>>> +    free_global_cached_iovas(iovad);
>>>   goto retry;
>>>   }
>>>   @@ -1055,5 +1056,27 @@ void free_cpu_cached_iovas(unsigned int cpu,
>>> struct iova_domain *iovad)
>>>   }
>>>   }
>>>   +/*
>>> + * free all the IOVA ranges of global cache
>>> + */
>>> +void free_global_cached_iovas(struct iova_domain *iovad)
>>
>> As John pointed out last time, this should be static and the header
>> changes dropped.
>>
>> (TBH we should probably register our own hotplug notifier instance for a
>> flush queue, so that external code has no need to poke at the per-CPU
>> caches either)
>>
>> Robin.
>>
> 
> Right, I have made it static and dropped header changes in v3.
> can you please review that.
> 
> Thanks,
> Vijay

Please review v4 instead of v3; I have updated the other patch as well in v4.

Thanks,
Vijay
>>> +{
>>> +    struct iova_rcache *rcache;
>>> +    unsigned long flags;
>>> +    int i, j;
>>> +
>>> +    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    rcache = &iovad->rcaches[i];
>>> +    spin_lock_irqsave(&rcache->lock, flags);
>>> +    for (j = 0; j < rcache->depot_size; ++j) {
>>> +    iova_magazine_free_pfns(rcache->depot[j], iovad);
>>> +    iova_magazine_free(rcache->depot[j]);
>>> +    rcache->depot[j] = NULL;
>>> +    }
>>> +    rcache->depot_size = 0;
>>> +    spin_unlock_irqrestore(&rcache->lock, flags);
>>> +    }
>>> +}
>>> +
>>>   MODULE_AUTHOR("Anil S Keshavamurthy ");
>>>   MODULE_LICENSE("GPL");
>>> diff --git a/include/linux/iova.h b/include/linux/iova.h
>>> index a0637ab..a905726 100644
>>> --- a/include/linux/iova.h
>>> +++ b/include/linux/iova.h
>>> @@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad,
>>>   struct iova *split_and_remove_iova(struct iova_domain *iovad,
>>>   struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
>>>   void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain
>>> *iovad);
>>> +void free_global_cached_iovas(struct iova_domain *iovad);
>>>   #else
>>>   static inline int iova_cache_get(void)
>>>   {
>>> @@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned
>>> int cpu,
>>>    struct iova_domain *iovad)
>>>   {
>>>   }
>>> +
>>> +static inline void free_global_cached_iovas(struct iova_domain *iovad)
>>> +{
>>> +}
>>> +
>>>   #endif
>>>     #endif
>>>
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-09-29 Thread Vijayanand Jitta



On 9/18/2020 7:48 PM, Robin Murphy wrote:
> On 2020-08-20 13:49, vji...@codeaurora.org wrote:
>> From: Vijayanand Jitta 
>>
>> When ever a new iova alloc request comes iova is always searched
>> from the cached node and the nodes which are previous to cached
>> node. So, even if there is free iova space available in the nodes
>> which are next to the cached node iova allocation can still fail
>> because of this approach.
>>
>> Consider the following sequence of iova alloc and frees on
>> 1GB of iova space
>>
>> 1) alloc - 500MB
>> 2) alloc - 12MB
>> 3) alloc - 499MB
>> 4) free -  12MB which was allocated in step 2
>> 5) alloc - 13MB
>>
>> After the above sequence we will have 12MB of free iova space and
>> cached node will be pointing to the iova pfn of last alloc of 13MB
>> which will be the lowest iova pfn of that iova space. Now if we get an
>> alloc request of 2MB we just search from cached node and then look
>> for lower iova pfn's for free iova and as they aren't any, iova alloc
>> fails though there is 12MB of free iova space.
>>
>> To avoid such iova search failures do a retry from the last rb tree node
>> when iova search fails, this will search the entire tree and get an iova
>> if its available.
>>
>> Signed-off-by: Vijayanand Jitta 
>> ---
>>   drivers/iommu/iova.c | 23 +--
>>   1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 49fc01f..4e77116 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>   struct rb_node *curr, *prev;
>>   struct iova *curr_iova;
>>   unsigned long flags;
>> -    unsigned long new_pfn;
>> +    unsigned long new_pfn, low_pfn_new;
>>   unsigned long align_mask = ~0UL;
>> +    unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
>>     if (size_aligned)
>>   align_mask <<= fls_long(size - 1);
>> @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>     curr = __get_cached_rbnode(iovad, limit_pfn);
>>   curr_iova = rb_entry(curr, struct iova, node);
>> +    low_pfn_new = curr_iova->pfn_hi + 1;
> 
> Could we call "low_pfn_new" something like "retry_pfn" instead? This
> code already has unavoidable readability struggles with so many
> different "pfn"s in play, so having two different meanings of "new"
> really doesn't help.
> 
> Other than that, I think this looks OK (IIRC it's basically what I
> originally suggested), so with the naming tweaked,
> 
> Reviewed-by: Robin Murphy 
> 

Thanks for the review; I have renamed it to retry_pfn in v4.

Thanks,
Vijay
>> +
>> +retry:
>>   do {
>> -    limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>> -    new_pfn = (limit_pfn - size) & align_mask;
>> +    high_pfn = min(high_pfn, curr_iova->pfn_lo);
>> +    new_pfn = (high_pfn - size) & align_mask;
>>   prev = curr;
>>   curr = rb_prev(curr);
>>   curr_iova = rb_entry(curr, struct iova, node);
>> -    } while (curr && new_pfn <= curr_iova->pfn_hi);
>> -
>> -    if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>> +    } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >=
>> low_pfn);
>> +
>> +    if (high_pfn < size || new_pfn < low_pfn) {
>> +    if (low_pfn == iovad->start_pfn && low_pfn_new < limit_pfn) {
>> +    high_pfn = limit_pfn;
>> +    low_pfn = low_pfn_new;
>> +    curr = &iovad->anchor.node;
>> +    curr_iova = rb_entry(curr, struct iova, node);
>> +    goto retry;
>> +    }
>>   iovad->max32_alloc_size = size;
>>   goto iova32_full;
>>   }
>>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 2/2] iommu/iova: Free global iova rcache on iova alloc failure

2020-09-28 Thread Vijayanand Jitta



On 9/18/2020 8:11 PM, Robin Murphy wrote:
> On 2020-08-20 13:49, vji...@codeaurora.org wrote:
>> From: Vijayanand Jitta 
>>
>> When ever an iova alloc request fails we free the iova
>> ranges present in the percpu iova rcaches and then retry
>> but the global iova rcache is not freed as a result we could
>> still see iova alloc failure even after retry as global
>> rcache is holding the iova's which can cause fragmentation.
>> So, free the global iova rcache as well and then go for the
>> retry.
>>
>> Signed-off-by: Vijayanand Jitta 
>> ---
>>   drivers/iommu/iova.c | 23 +++
>>   include/linux/iova.h |  6 ++
>>   2 files changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 4e77116..5836c87 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -442,6 +442,7 @@ struct iova *find_iova(struct iova_domain *iovad,
>> unsigned long pfn)
>>   flush_rcache = false;
>>   for_each_online_cpu(cpu)
>>   free_cpu_cached_iovas(cpu, iovad);
>> +    free_global_cached_iovas(iovad);
>>   goto retry;
>>   }
>>   @@ -1055,5 +1056,27 @@ void free_cpu_cached_iovas(unsigned int cpu,
>> struct iova_domain *iovad)
>>   }
>>   }
>>   +/*
>> + * free all the IOVA ranges of global cache
>> + */
>> +void free_global_cached_iovas(struct iova_domain *iovad)
> 
> As John pointed out last time, this should be static and the header
> changes dropped.
> 
> (TBH we should probably register our own hotplug notifier instance for a
> flush queue, so that external code has no need to poke at the per-CPU
> caches either)
> 
> Robin.
> 

Right, I have made it static and dropped the header changes in v3.
Can you please review that?

Thanks,
Vijay
>> +{
>> +    struct iova_rcache *rcache;
>> +    unsigned long flags;
>> +    int i, j;
>> +
>> +    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>> +    rcache = &iovad->rcaches[i];
>> +    spin_lock_irqsave(&rcache->lock, flags);
>> +    for (j = 0; j < rcache->depot_size; ++j) {
>> +    iova_magazine_free_pfns(rcache->depot[j], iovad);
>> +    iova_magazine_free(rcache->depot[j]);
>> +    rcache->depot[j] = NULL;
>> +    }
>> +    rcache->depot_size = 0;
>> +    spin_unlock_irqrestore(&rcache->lock, flags);
>> +    }
>> +}
>> +
>>   MODULE_AUTHOR("Anil S Keshavamurthy ");
>>   MODULE_LICENSE("GPL");
>> diff --git a/include/linux/iova.h b/include/linux/iova.h
>> index a0637ab..a905726 100644
>> --- a/include/linux/iova.h
>> +++ b/include/linux/iova.h
>> @@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad,
>>   struct iova *split_and_remove_iova(struct iova_domain *iovad,
>>   struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
>>   void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain
>> *iovad);
>> +void free_global_cached_iovas(struct iova_domain *iovad);
>>   #else
>>   static inline int iova_cache_get(void)
>>   {
>> @@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned
>> int cpu,
>>    struct iova_domain *iovad)
>>   {
>>   }
>> +
>> +static inline void free_global_cached_iovas(struct iova_domain *iovad)
>> +{
>> +}
>> +
>>   #endif
>>     #endif
>>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation
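
Robin's parenthetical above, about letting the IOVA code own its per-CPU
cache flushing through a hotplug notifier, could be sketched along these
lines; the hotplug state name and the cpuhp_dead node in struct iova_domain
are assumptions here, not existing code in this series:

/* Sketch: per-CPU rcache flush driven by a CPU hotplug callback, so that
 * external code never has to poke at the caches directly. */
static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
{
	struct iova_domain *iovad;

	/* Assumes a 'cpuhp_dead' hlist_node embedded in struct iova_domain. */
	iovad = hlist_entry_safe(node, struct iova_domain, cpuhp_dead);
	free_cpu_cached_iovas(cpu, iovad);
	return 0;
}

/*
 * Registered once at init time, e.g.:
 *	cpuhp_setup_state_multi(CPUHP_IOMMU_IOVA_DEAD, "iommu/iova:dead",
 *				NULL, iova_cpuhp_dead);
 * with each domain added via cpuhp_state_add_instance_nocalls().
 */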


Re: [PATCH v2 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-09-13 Thread Vijayanand Jitta



On 8/28/2020 1:01 PM, Vijayanand Jitta wrote:
> 
> 
> On 8/20/2020 6:19 PM, vji...@codeaurora.org wrote:
>> From: Vijayanand Jitta 
>>
>> When ever a new iova alloc request comes iova is always searched
>> from the cached node and the nodes which are previous to cached
>> node. So, even if there is free iova space available in the nodes
>> which are next to the cached node iova allocation can still fail
>> because of this approach.
>>
>> Consider the following sequence of iova alloc and frees on
>> 1GB of iova space
>>
>> 1) alloc - 500MB
>> 2) alloc - 12MB
>> 3) alloc - 499MB
>> 4) free -  12MB which was allocated in step 2
>> 5) alloc - 13MB
>>
>> After the above sequence we will have 12MB of free iova space and
>> cached node will be pointing to the iova pfn of last alloc of 13MB
>> which will be the lowest iova pfn of that iova space. Now if we get an
>> alloc request of 2MB we just search from cached node and then look
>> for lower iova pfn's for free iova and as they aren't any, iova alloc
>> fails though there is 12MB of free iova space.
>>
>> To avoid such iova search failures do a retry from the last rb tree node
>> when iova search fails, this will search the entire tree and get an iova
>> if its available.
>>
>> Signed-off-by: Vijayanand Jitta 
>> ---
>>  drivers/iommu/iova.c | 23 +--
>>  1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 49fc01f..4e77116 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct 
>> iova_domain *iovad,
>>  struct rb_node *curr, *prev;
>>  struct iova *curr_iova;
>>  unsigned long flags;
>> -unsigned long new_pfn;
>> +unsigned long new_pfn, low_pfn_new;
>>  unsigned long align_mask = ~0UL;
>> +unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
>>  
>>  if (size_aligned)
>>  align_mask <<= fls_long(size - 1);
>> @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct 
>> iova_domain *iovad,
>>  
>>  curr = __get_cached_rbnode(iovad, limit_pfn);
>>  curr_iova = rb_entry(curr, struct iova, node);
>> +low_pfn_new = curr_iova->pfn_hi + 1;
>> +
>> +retry:
>>  do {
>> -limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>> -new_pfn = (limit_pfn - size) & align_mask;
>> +high_pfn = min(high_pfn, curr_iova->pfn_lo);
>> +new_pfn = (high_pfn - size) & align_mask;
>>  prev = curr;
>>  curr = rb_prev(curr);
>>  curr_iova = rb_entry(curr, struct iova, node);
>> -} while (curr && new_pfn <= curr_iova->pfn_hi);
>> -
>> -if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>> +} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
>> +
>> +if (high_pfn < size || new_pfn < low_pfn) {
>> +if (low_pfn == iovad->start_pfn && low_pfn_new < limit_pfn) {
>> +high_pfn = limit_pfn;
>> +low_pfn = low_pfn_new;
>> +curr = &iovad->anchor.node;
>> +curr_iova = rb_entry(curr, struct iova, node);
>> +goto retry;
>> +}
>>  iovad->max32_alloc_size = size;
>>  goto iova32_full;
>>  }
>>
> 
> ping ?
> 
> Thanks,
> Vijay
> 

ping ?

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-08-28 Thread Vijayanand Jitta



On 8/20/2020 6:19 PM, vji...@codeaurora.org wrote:
> From: Vijayanand Jitta 
> 
> When ever a new iova alloc request comes iova is always searched
> from the cached node and the nodes which are previous to cached
> node. So, even if there is free iova space available in the nodes
> which are next to the cached node iova allocation can still fail
> because of this approach.
> 
> Consider the following sequence of iova alloc and frees on
> 1GB of iova space
> 
> 1) alloc - 500MB
> 2) alloc - 12MB
> 3) alloc - 499MB
> 4) free -  12MB which was allocated in step 2
> 5) alloc - 13MB
> 
> After the above sequence we will have 12MB of free iova space and
> cached node will be pointing to the iova pfn of last alloc of 13MB
> which will be the lowest iova pfn of that iova space. Now if we get an
> alloc request of 2MB we just search from cached node and then look
> for lower iova pfn's for free iova and as they aren't any, iova alloc
> fails though there is 12MB of free iova space.
> 
> To avoid such iova search failures do a retry from the last rb tree node
> when iova search fails, this will search the entire tree and get an iova
> if its available.
> 
> Signed-off-by: Vijayanand Jitta 
> ---
>  drivers/iommu/iova.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 49fc01f..4e77116 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>   struct rb_node *curr, *prev;
>   struct iova *curr_iova;
>   unsigned long flags;
> - unsigned long new_pfn;
> + unsigned long new_pfn, low_pfn_new;
>   unsigned long align_mask = ~0UL;
> + unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
>  
>   if (size_aligned)
>   align_mask <<= fls_long(size - 1);
> @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>  
>   curr = __get_cached_rbnode(iovad, limit_pfn);
>   curr_iova = rb_entry(curr, struct iova, node);
> + low_pfn_new = curr_iova->pfn_hi + 1;
> +
> +retry:
>   do {
> - limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
> - new_pfn = (limit_pfn - size) & align_mask;
> + high_pfn = min(high_pfn, curr_iova->pfn_lo);
> + new_pfn = (high_pfn - size) & align_mask;
>   prev = curr;
>   curr = rb_prev(curr);
>   curr_iova = rb_entry(curr, struct iova, node);
> - } while (curr && new_pfn <= curr_iova->pfn_hi);
> -
> - if (limit_pfn < size || new_pfn < iovad->start_pfn) {
> + } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
> +
> + if (high_pfn < size || new_pfn < low_pfn) {
> + if (low_pfn == iovad->start_pfn && low_pfn_new < limit_pfn) {
> + high_pfn = limit_pfn;
> + low_pfn = low_pfn_new;
> + curr = &iovad->anchor.node;
> + curr_iova = rb_entry(curr, struct iova, node);
> + goto retry;
> + }
>   iovad->max32_alloc_size = size;
>   goto iova32_full;
>   }
> 

ping ?

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 2/2] iommu/iova: Free global iova rcache on iova alloc failure

2020-08-12 Thread Vijayanand Jitta



On 8/12/2020 8:46 PM, Joerg Roedel wrote:
> On Mon, Aug 03, 2020 at 03:30:48PM +0530, Vijayanand Jitta wrote:
>> ping?
> 
> Please repost when v5.9-rc1 is released and add
> 
>   Robin Murphy 
> 
> on your Cc list.
> 
> Thanks,
> 
>   Joerg
> 

Sure, will do.

Thanks,
Vijay
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 2/2] iommu/iova: Free global iova rcache on iova alloc failure

2020-08-03 Thread Vijayanand Jitta



On 7/3/2020 7:47 PM, vji...@codeaurora.org wrote:
> From: Vijayanand Jitta 
> 
> When ever an iova alloc request fails we free the iova
> ranges present in the percpu iova rcaches and then retry
> but the global iova rcache is not freed as a result we could
> still see iova alloc failure even after retry as global
> rcache is holding the iova's which can cause fragmentation.
> So, free the global iova rcache as well and then go for the
> retry.
> 
> Change-Id: Ib8236dc88ba5516b73d4fbf6bf8e68bbf09bbad2
> Signed-off-by: Vijayanand Jitta 
> ---
>  drivers/iommu/iova.c | 23 +++
>  include/linux/iova.h |  6 ++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 4e77116..5836c87 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -442,6 +442,7 @@ struct iova *find_iova(struct iova_domain *iovad, 
> unsigned long pfn)
>   flush_rcache = false;
>   for_each_online_cpu(cpu)
>   free_cpu_cached_iovas(cpu, iovad);
> + free_global_cached_iovas(iovad);
>   goto retry;
>   }
>  
> @@ -1055,5 +1056,27 @@ void free_cpu_cached_iovas(unsigned int cpu, struct 
> iova_domain *iovad)
>   }
>  }
>  
> +/*
> + * free all the IOVA ranges of global cache
> + */
> +void free_global_cached_iovas(struct iova_domain *iovad)
> +{
> + struct iova_rcache *rcache;
> + unsigned long flags;
> + int i, j;
> +
> + for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
> + rcache = &iovad->rcaches[i];
> + spin_lock_irqsave(&rcache->lock, flags);
> + for (j = 0; j < rcache->depot_size; ++j) {
> + iova_magazine_free_pfns(rcache->depot[j], iovad);
> + iova_magazine_free(rcache->depot[j]);
> + rcache->depot[j] = NULL;
> + }
> + rcache->depot_size = 0;
> + spin_unlock_irqrestore(&rcache->lock, flags);
> + }
> +}
> +
>  MODULE_AUTHOR("Anil S Keshavamurthy ");
>  MODULE_LICENSE("GPL");
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index a0637ab..a905726 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad,
>  struct iova *split_and_remove_iova(struct iova_domain *iovad,
>   struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
>  void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
> +void free_global_cached_iovas(struct iova_domain *iovad);
>  #else
>  static inline int iova_cache_get(void)
>  {
> @@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned int 
> cpu,
>struct iova_domain *iovad)
>  {
>  }
> +
> +static inline void free_global_cached_iovas(struct iova_domain *iovad)
> +{
> +}
> +
>  #endif
>  
>  #endif
> 

ping?
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-08-03 Thread Vijayanand Jitta



On 7/3/2020 7:47 PM, vji...@codeaurora.org wrote:
> From: Vijayanand Jitta 
> 
> When ever a new iova alloc request comes iova is always searched
> from the cached node and the nodes which are previous to cached
> node. So, even if there is free iova space available in the nodes
> which are next to the cached node iova allocation can still fail
> because of this approach.
> 
> Consider the following sequence of iova alloc and frees on
> 1GB of iova space
> 
> 1) alloc - 500MB
> 2) alloc - 12MB
> 3) alloc - 499MB
> 4) free -  12MB which was allocated in step 2
> 5) alloc - 13MB
> 
> After the above sequence we will have 12MB of free iova space and
> cached node will be pointing to the iova pfn of last alloc of 13MB
> which will be the lowest iova pfn of that iova space. Now if we get an
> alloc request of 2MB we just search from cached node and then look
> for lower iova pfn's for free iova and as they aren't any, iova alloc
> fails though there is 12MB of free iova space.
> 
> To avoid such iova search failures do a retry from the last rb tree node
> when iova search fails, this will search the entire tree and get an iova
> if its available.
> 
> Signed-off-by: Vijayanand Jitta 
> ---
>  drivers/iommu/iova.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 49fc01f..4e77116 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>   struct rb_node *curr, *prev;
>   struct iova *curr_iova;
>   unsigned long flags;
> - unsigned long new_pfn;
> + unsigned long new_pfn, low_pfn_new;
>   unsigned long align_mask = ~0UL;
> + unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
>  
>   if (size_aligned)
>   align_mask <<= fls_long(size - 1);
> @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>  
>   curr = __get_cached_rbnode(iovad, limit_pfn);
>   curr_iova = rb_entry(curr, struct iova, node);
> + low_pfn_new = curr_iova->pfn_hi + 1;
> +
> +retry:
>   do {
> - limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
> - new_pfn = (limit_pfn - size) & align_mask;
> + high_pfn = min(high_pfn, curr_iova->pfn_lo);
> + new_pfn = (high_pfn - size) & align_mask;
>   prev = curr;
>   curr = rb_prev(curr);
>   curr_iova = rb_entry(curr, struct iova, node);
> - } while (curr && new_pfn <= curr_iova->pfn_hi);
> -
> - if (limit_pfn < size || new_pfn < iovad->start_pfn) {
> + } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
> +
> + if (high_pfn < size || new_pfn < low_pfn) {
> + if (low_pfn == iovad->start_pfn && low_pfn_new < limit_pfn) {
> + high_pfn = limit_pfn;
> + low_pfn = low_pfn_new;
> + curr = &iovad->anchor.node;
> + curr_iova = rb_entry(curr, struct iova, node);
> + goto retry;
> + }
>   iovad->max32_alloc_size = size;
>   goto iova32_full;
>   }
> 

ping?
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2] iommu/iova: Retry from last rb tree node if iova search fails

2020-05-25 Thread Vijayanand Jitta



On 5/11/2020 4:34 PM, vji...@codeaurora.org wrote:
> From: Vijayanand Jitta 
> 
> When ever a new iova alloc request comes iova is always searched
> from the cached node and the nodes which are previous to cached
> node. So, even if there is free iova space available in the nodes
> which are next to the cached node iova allocation can still fail
> because of this approach.
> 
> Consider the following sequence of iova alloc and frees on
> 1GB of iova space
> 
> 1) alloc - 500MB
> 2) alloc - 12MB
> 3) alloc - 499MB
> 4) free -  12MB which was allocated in step 2
> 5) alloc - 13MB
> 
> After the above sequence we will have 12MB of free iova space and
> cached node will be pointing to the iova pfn of last alloc of 13MB
> which will be the lowest iova pfn of that iova space. Now if we get an
> alloc request of 2MB we just search from cached node and then look
> for lower iova pfn's for free iova and as they aren't any, iova alloc
> fails though there is 12MB of free iova space.
> 
> To avoid such iova search failures do a retry from the last rb tree node
> when iova search fails, this will search the entire tree and get an iova
> if its available
> 
> Signed-off-by: Vijayanand Jitta 
> ---
>  drivers/iommu/iova.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 0e6a953..7d82afc 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>   struct rb_node *curr, *prev;
>   struct iova *curr_iova;
>   unsigned long flags;
> - unsigned long new_pfn;
> + unsigned long new_pfn, alloc_lo_new;
>   unsigned long align_mask = ~0UL;
> + unsigned long alloc_hi = limit_pfn, alloc_lo = iovad->start_pfn;
>  
>   if (size_aligned)
>   align_mask <<= fls_long(size - 1);
> @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>  
>   curr = __get_cached_rbnode(iovad, limit_pfn);
>   curr_iova = rb_entry(curr, struct iova, node);
> + alloc_lo_new = curr_iova->pfn_hi;
> +
> +retry:
>   do {
> - limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
> - new_pfn = (limit_pfn - size) & align_mask;
> + alloc_hi = min(alloc_hi, curr_iova->pfn_lo);
> + new_pfn = (alloc_hi - size) & align_mask;
>   prev = curr;
>   curr = rb_prev(curr);
>   curr_iova = rb_entry(curr, struct iova, node);
>   } while (curr && new_pfn <= curr_iova->pfn_hi);
>  
> - if (limit_pfn < size || new_pfn < iovad->start_pfn) {
> + if (alloc_hi < size || new_pfn < alloc_lo) {
> + if (alloc_lo == iovad->start_pfn && alloc_lo_new < limit_pfn) {
> + alloc_hi = limit_pfn;
> + alloc_lo = alloc_lo_new;
> + curr = &iovad->anchor.node;
> + curr_iova = rb_entry(curr, struct iova, node);
> + goto retry;
> + }
>   iovad->max32_alloc_size = size;
>   goto iova32_full;
>   }
> 

ping?
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] iommu/iova: Retry from last rb tree node if iova search fails

2020-05-11 Thread Vijayanand Jitta



On 5/9/2020 12:25 AM, Vijayanand Jitta wrote:
> 
> 
> On 5/7/2020 6:54 PM, Robin Murphy wrote:
>> On 2020-05-06 9:01 pm, vji...@codeaurora.org wrote:
>>> From: Vijayanand Jitta 
>>>
>>> When ever a new iova alloc request comes iova is always searched
>>> from the cached node and the nodes which are previous to cached
>>> node. So, even if there is free iova space available in the nodes
>>> which are next to the cached node iova allocation can still fail
>>> because of this approach.
>>>
>>> Consider the following sequence of iova alloc and frees on
>>> 1GB of iova space
>>>
>>> 1) alloc - 500MB
>>> 2) alloc - 12MB
>>> 3) alloc - 499MB
>>> 4) free -  12MB which was allocated in step 2
>>> 5) alloc - 13MB
>>>
>>> After the above sequence we will have 12MB of free iova space and
>>> cached node will be pointing to the iova pfn of last alloc of 13MB
>>> which will be the lowest iova pfn of that iova space. Now if we get an
>>> alloc request of 2MB we just search from cached node and then look
>>> for lower iova pfn's for free iova and as they aren't any, iova alloc
>>> fails though there is 12MB of free iova space.
>>
>> Yup, this could definitely do with improving. Unfortunately I think this
>> particular implementation is slightly flawed...
>>
>>> To avoid such iova search failures do a retry from the last rb tree node
>>> when iova search fails, this will search the entire tree and get an iova
>>> if its available
>>>
>>> Signed-off-by: Vijayanand Jitta 
>>> ---
>>>   drivers/iommu/iova.c | 11 +++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 0e6a953..2985222 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct
>>> iova_domain *iovad,
>>>   unsigned long flags;
>>>   unsigned long new_pfn;
>>>   unsigned long align_mask = ~0UL;
>>> +    bool retry = false;
>>>     if (size_aligned)
>>>   align_mask <<= fls_long(size - 1);
>>> @@ -198,6 +199,8 @@ static int __alloc_and_insert_iova_range(struct
>>> iova_domain *iovad,
>>>     curr = __get_cached_rbnode(iovad, limit_pfn);
>>>   curr_iova = rb_entry(curr, struct iova, node);
>>> +
>>> +retry_search:
>>>   do {
>>>   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>>>   new_pfn = (limit_pfn - size) & align_mask;
>>> @@ -207,6 +210,14 @@ static int __alloc_and_insert_iova_range(struct
>>> iova_domain *iovad,
>>>   } while (curr && new_pfn <= curr_iova->pfn_hi);
>>>     if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>> +    if (!retry) {
>>> +    curr = rb_last(&iovad->rbroot);
>>
>> Why walk when there's an anchor node there already? However...
>>
>>> +    curr_iova = rb_entry(curr, struct iova, node);
>>> +    limit_pfn = curr_iova->pfn_lo;
>>
>> ...this doesn't look right, as by now we've lost the original limit_pfn
>> supplied by the caller, so are highly likely to allocate beyond the
>> range our caller asked for. In fact AFAICS we'd start allocating from
>> directly directly below the anchor node, beyond the end of the entire
>> address space.
>>
>> The logic I was imagining we want here was something like the rapidly
>> hacked up (and untested) diff below.
>>
>> Thanks,
>> Robin.
>>
> 
> Thanks for your comments ,I have gone through below logic and I see some
> issue with retry check as there could be case where alloc_lo is set to
> some pfn other than start_pfn in that case we don't retry and there can
> still be iova available. I understand its a hacked up version, I can
> work on this.
> 
> But how about we just store limit_pfn and get the node using that and
> retry for once from that node, it would be similar to my patch just
> correcting the curr node and limit_pfn update in retry check. do you see
> any issue with this approach ?
> 
> 
> Thanks,
> Vijay.

I found one issue with my earlier approach, where we search twice from the
cached node down to start_pfn; this can be avoided if we store the pfn_hi
of the cached node and make it the alloc_lo when we retry. I see the diff
below also does the same, so I have posted a v2 version of the patch.

Re: [PATCH] iommu/iova: Retry from last rb tree node if iova search fails

2020-05-08 Thread Vijayanand Jitta



On 5/7/2020 6:54 PM, Robin Murphy wrote:
> On 2020-05-06 9:01 pm, vji...@codeaurora.org wrote:
>> From: Vijayanand Jitta 
>>
>> When ever a new iova alloc request comes iova is always searched
>> from the cached node and the nodes which are previous to cached
>> node. So, even if there is free iova space available in the nodes
>> which are next to the cached node iova allocation can still fail
>> because of this approach.
>>
>> Consider the following sequence of iova alloc and frees on
>> 1GB of iova space
>>
>> 1) alloc - 500MB
>> 2) alloc - 12MB
>> 3) alloc - 499MB
>> 4) free -  12MB which was allocated in step 2
>> 5) alloc - 13MB
>>
>> After the above sequence we will have 12MB of free iova space and
>> cached node will be pointing to the iova pfn of last alloc of 13MB
>> which will be the lowest iova pfn of that iova space. Now if we get an
>> alloc request of 2MB we just search from cached node and then look
>> for lower iova pfn's for free iova and as they aren't any, iova alloc
>> fails though there is 12MB of free iova space.
> 
> Yup, this could definitely do with improving. Unfortunately I think this
> particular implementation is slightly flawed...
> 
>> To avoid such iova search failures do a retry from the last rb tree node
>> when iova search fails, this will search the entire tree and get an iova
>> if its available
>>
>> Signed-off-by: Vijayanand Jitta 
>> ---
>>   drivers/iommu/iova.c | 11 +++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 0e6a953..2985222 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>   unsigned long flags;
>>   unsigned long new_pfn;
>>   unsigned long align_mask = ~0UL;
>> +    bool retry = false;
>>     if (size_aligned)
>>   align_mask <<= fls_long(size - 1);
>> @@ -198,6 +199,8 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>     curr = __get_cached_rbnode(iovad, limit_pfn);
>>   curr_iova = rb_entry(curr, struct iova, node);
>> +
>> +retry_search:
>>   do {
>>   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>>   new_pfn = (limit_pfn - size) & align_mask;
>> @@ -207,6 +210,14 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>   } while (curr && new_pfn <= curr_iova->pfn_hi);
>>     if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>> +    if (!retry) {
>> +    curr = rb_last(&iovad->rbroot);
> 
> Why walk when there's an anchor node there already? However...
> 
>> +    curr_iova = rb_entry(curr, struct iova, node);
>> +    limit_pfn = curr_iova->pfn_lo;
> 
> ...this doesn't look right, as by now we've lost the original limit_pfn
> supplied by the caller, so are highly likely to allocate beyond the
> range our caller asked for. In fact AFAICS we'd start allocating from
> directly directly below the anchor node, beyond the end of the entire
> address space.
> 
> The logic I was imagining we want here was something like the rapidly
> hacked up (and untested) diff below.
> 
> Thanks,
> Robin.
> 

Thanks for your comments. I have gone through the logic below and I see an
issue with the retry check: there could be a case where alloc_lo is set to
some pfn other than start_pfn, in which case we don't retry even though
there can still be iova available. I understand it's a hacked-up version; I
can work on this.

But how about we just store limit_pfn, get the node using that, and retry
once from that node? It would be similar to my patch, just correcting the
curr node and the limit_pfn update in the retry check. Do you see any issue
with this approach?


Thanks,
Vijay.
> ->8-
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 0e6a9536eca6..3574c19272d6 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct
> iova_domain *iovad,
>     unsigned long flags;
>     unsigned long new_pfn;
>     unsigned long align_mask = ~0UL;
> +   unsigned long alloc_hi, alloc_lo;
> 
>     if (size_aligned)
>     align_mask <<= fls_long(size - 1);
> @@ -196,17 +197,27 @@ static int __alloc_and_insert_iova_range(struct
> iova_domain *iovad,
>     size >= iovad->max32_alloc_size)
>     goto iova

Re: [PATCH 1/3] mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES

2018-04-24 Thread Vijayanand Jitta


On 4/13/2018 5:43 PM, vinayak menon wrote:
> On Thu, Apr 12, 2018 at 8:27 PM, Roman Gushchin  wrote:
>> On Thu, Apr 12, 2018 at 08:52:52AM +0200, Vlastimil Babka wrote:
>>> On 04/11/2018 03:56 PM, Roman Gushchin wrote:
 On Wed, Apr 11, 2018 at 03:16:08PM +0200, Vlastimil Babka wrote:
> [+CC linux-api]
>
> On 03/05/2018 02:37 PM, Roman Gushchin wrote:
>> This patch introduces a concept of indirectly reclaimable memory
>> and adds the corresponding memory counter and /proc/vmstat item.
>>
>> Indirectly reclaimable memory is any sort of memory, used by
>> the kernel (except of reclaimable slabs), which is actually
>> reclaimable, i.e. will be released under memory pressure.
>>
>> The counter is in bytes, as it's not always possible to
>> count such objects in pages. The name contains BYTES
>> by analogy to NR_KERNEL_STACK_KB.
>>
>> Signed-off-by: Roman Gushchin 
>> Cc: Andrew Morton 
>> Cc: Alexander Viro 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: linux-fsde...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux...@kvack.org
>> Cc: kernel-t...@fb.com
>
> Hmm, looks like I'm late and this user-visible API change was just
> merged. But it's for rc1, so we can still change it, hopefully?
>
> One problem I see with the counter is that it's in bytes, but among
> counters that use pages, and the name doesn't indicate it.

 Here I just followed "nr_kernel_stack" path, which is measured in kB,
 but this is not mentioned in the field name.
>>>
>>> Oh, didn't know. Bad example to follow :P
>>>
> Then, I don't
> see why users should care about the "indirectly" part, as that's just an
> implementation detail. It is reclaimable and that's what matters, right?
> (I also wanted to complain about lack of Documentation/... update, but
> looks like there's no general file about vmstat, ugh)

 I agree, that it's a bit weird, and it's probably better to not expose
 it at all; but this is how all vm counters work. We do expose them all
 in /proc/vmstat. A good number of them is useless until you are not a
 mm developer, so it's arguable more "debug info" rather than "api".
>>>
>>> Yeah the problem is that once tools start rely on them, they fall under
>>> the "do not break userspace" rule, however we call them. So being
>>> cautious and conservative can't hurt.
>>>
 It's definitely not a reason to make them messy.
 Does "nr_indirectly_reclaimable_bytes" look better to you?
>>>
>>> It still has has the "indirecly" part and feels arbitrary :/
>>>
>
> I also kind of liked the idea from v1 rfc posting that there would be a
> separate set of reclaimable kmalloc-X caches for these kind of
> allocations. Besides accounting, it should also help reduce memory
> fragmentation. The right variant of cache would be detected via
> __GFP_RECLAIMABLE.

 Well, the downside is that we have to introduce X new caches
 just for this particular problem. I'm not strictly against the idea,
 but not convinced that it's much better.
>>>
>>> Maybe we can find more cases that would benefit from it. Heck, even slab
>>> itself allocates some management structures from the generic kmalloc
>>> caches, and if they are used for reclaimable caches, they could be
>>> tracked as reclaimable as well.
>>
>> This is a good catch!
>>
>>>
>
> With that in mind, can we at least for now put the (manually maintained)
> byte counter in a variable that's not directly exposed via /proc/vmstat,
> and then when printing nr_slab_reclaimable, simply add the value
> (divided by PAGE_SIZE), and when printing nr_slab_unreclaimable,
> subtract the same value. This way we would be simply making the existing
> counters more precise, in line with their semantics.

 Idk, I don't like the idea of adding a counter outside of the vm counters
 infrastructure, and I definitely wouldn't touch the exposed
 nr_slab_reclaimable and nr_slab_unreclaimable fields.
>>>
>>> We would be just making the reported values more precise wrt reality.
>>
>> It depends on if we believe that only slab memory can be reclaimable
>> or not. If yes, this is true, otherwise not.
>>
>> My guess is that some drivers (e.g. networking) might have buffers,
>> which are reclaimable under mempressure, and are allocated using
>> the page allocator. But I have to look closer...
>>
> 
> One such case I have encountered is that of the ION page pool. The page pool
> registers a shrinker. When not in any memory pressure page pool can go high
> and thus cause an mmap to fail when OVERCOMMIT_GUESS is set. I can send
> a patch to account ION page pool pages in NR_INDIRECTLY_RECLAIMABLE_BYTES.
> 
> Thanks,
> Vinayak
> 

As Vinayak mentioned, NR_INDIRECTLY_RECLAIMABLE_BYTES can be used to solve
the issue with the ION page pool when OVERCOMMIT_GUESS is set.
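
A hedged sketch of the kind of accounting hook being discussed for the ION
page pool is shown below; the function and field names follow the staging
ION driver of that period and are assumptions here, not a posted patch:

/*
 * Sketch only: account pooled ION pages as indirectly reclaimable so that
 * the overcommit logic (OVERCOMMIT_GUESS) treats them as available memory.
 */
static void ion_page_pool_account(struct ion_page_pool *pool,
				  struct page *page, long nr)
{
	mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
			    nr * (long)(PAGE_SIZE << pool->order));
}

/* Call with nr = 1 when a page enters the pool and nr = -1 when it leaves. */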