Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
Abhijit Bhopatkar wrote:

> In my mind I find it fundamentally wrong to separate anon pages from
> page cache. It should rather be a lot more dependent on which task
> accessed them last. Although it seems that, due to some twisted
> relationship between anon pages and interactive tasks, separating
> them improves it. Am I missing something here?

The IO cost for anonymous (and other swap backed) pages is completely
different from the IO cost of file system backed pages.

On file systems, data is typically grouped together on disk by related
content. Programs often access data linearly, meaning that with
readahead we can load a lot of pages into memory with only a few disk
seeks.

Anonymous memory does not have this benefit. For one, memory tends to
get written to swap in LRU order, not by related content. To make
things worse, repeated malloc/free cycles can cause memory regions that
are adjacent to each other inside a process to be completely unrelated,
making virtual address based swap clustering less useful.

The goal of page replacement is to minimize the total time spent
waiting on page faults. This is not exactly the same as minimizing the
total number of page faults.

> Can you send me those patches please or point me to where I can find
> those?

You can get the latest one here:

http://surriel.com/patches/2.6/vm-split/linux-2.6-vm-split.patch

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
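The difference between "fewest faults" and "least time waiting" can be
made concrete with a toy cost model. The sketch below is not from the
patch or the kernel: struct io_cost, fault_wait_ms() and any figures
plugged into it are hypothetical, and it only assumes that each disk
seek is followed by some number of contiguous pages.

```c
#include <stddef.h>

/* Hypothetical per-device costs; illustrative only, not measured. */
struct io_cost {
	double seek_ms;      /* cost of one disk seek */
	double transfer_ms;  /* cost of transferring one page */
};

/*
 * Time spent waiting to bring in `pages` pages when every seek is
 * followed by up to `pages_per_seek` contiguous pages. Readahead on
 * file backed data corresponds to a large pages_per_seek; scattered
 * swap slots correspond to pages_per_seek close to 1.
 */
double fault_wait_ms(const struct io_cost *c, size_t pages,
		     size_t pages_per_seek)
{
	size_t seeks;

	if (pages == 0 || pages_per_seek == 0)
		return 0.0;

	/* Round up: a partial cluster still costs a full seek. */
	seeks = (pages + pages_per_seek - 1) / pages_per_seek;

	return (double)seeks * c->seek_ms + (double)pages * c->transfer_ms;
}
```

With an assumed 8 ms seek and 0.05 ms per page, 64 file pages read 32
per seek cost less in this model than 16 swap pages read one per seek,
even though four times as many pages are brought in.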
Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
>> I just wanted to know whether it's worth going forward or we have
>> better reasons to discount any such direction?
>
> The reason that the wrong pages get swapped out sometimes could be
> due to a side effect of the way the swappiness policy is implemented.
>
> While the VM only reclaims page cache pages, it will still rotate
> through the anonymous pages on the LRU list, which effectively
> randomizes the order of those pages on the list.

In my mind I find it fundamentally wrong to separate anon pages from
page cache. It should rather be a lot more dependent on which task
accessed them last. Although it seems that, due to some twisted
relationship between anon pages and interactive tasks, separating them
improves it. Am I missing something here?

> I need to get back to benchmarking my patch to split the lists -
> anonymous and other swap backed pages on one set of pageout lists,
> filesystem backed pages on another list.
>
> Unfortunately my main desktop system at home depends on Xen, so it's
> not as easy to use that patch there :(

Can you send me those patches please or point me to where I can find
those?

Abhijit
Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
Abhijit Bhopatkar wrote:

> I just wanted to know whether it's worth going forward or we have
> better reasons to discount any such direction?

The reason that the wrong pages get swapped out sometimes could be due
to a side effect of the way the swappiness policy is implemented.

While the VM only reclaims page cache pages, it will still rotate
through the anonymous pages on the LRU list, which effectively
randomizes the order of those pages on the list.

I need to get back to benchmarking my patch to split the lists -
anonymous and other swap backed pages on one set of pageout lists,
filesystem backed pages on another list.

One report I got was that the system is more interactive under very
heavy load, and my desktop system at the office seems to behave better
than it used to when I get back to it after a few days.

Unfortunately my main desktop system at home depends on Xen, so it's
not as easy to use that patch there :(
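The rotation effect described above can be shown with a toy model of a
single mixed list. This is only a sketch under simplifying assumptions
(one array-backed list, no active/inactive split, no referenced bits);
struct toy_page and toy_reclaim_file_only() are hypothetical names and
do not correspond to kernel code.

```c
#include <stddef.h>
#include <string.h>

enum page_kind { PAGE_FILE, PAGE_ANON };

struct toy_page {
	enum page_kind kind;
	int last_used;	/* larger value = used more recently */
};

/*
 * list[0] is the head (most recently used end), list[n - 1] the tail.
 * Scan from the tail until `want` file pages have been reclaimed or
 * every page has been looked at once. Anonymous pages are never
 * reclaimed here; they are rotated back to the head instead.
 * Returns the new length of the list.
 */
size_t toy_reclaim_file_only(struct toy_page *list, size_t n, size_t want)
{
	size_t total = n;
	size_t scanned = 0;

	while (want > 0 && n > 0 && scanned < total) {
		struct toy_page tail = list[n - 1];

		scanned++;
		if (tail.kind == PAGE_FILE) {
			/* Reclaimed: drop the page from the list. */
			n--;
			want--;
		} else {
			/* Rotated: shift the rest down, reinsert at head. */
			memmove(&list[1], &list[0],
				(n - 1) * sizeof(list[0]));
			list[0] = tail;
		}
	}
	return n;
}
```

Starting from a list whose tail is the oldest anonymous page, asking
for one file page moves that old anonymous page to the head, ahead of
anonymous pages that were used far more recently. In this toy model a
partial scan is enough to disturb the ordering that a later swapout
decision would rely on.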
Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
अभिजित भोपटकर (Abhijit Bhopatkar) wrote:

> The mm structures of interactive tasks are marked and the pages
> belonging to them are never shifted to the inactive list in the LRU
> algorithm, thus keeping interactive tasks in memory as long as
> possible. The interactivity is already determined by the scheduler,
> so we reuse that knowledge to mark the mm structures.

Aside from the obvious question of whether the idea is good, there are
some practical problems with your patch:

1) the mm->interactive flag is never cleared, even if the task stops
   being interactive

2) what if the interactive tasks use up more memory than the system
   has? Will you OOM kill instead of swapping out part of an
   interactive task?

3) the scheduler can change its idea about which task is interactive
   and which task isn't very rapidly, while disk IO is very slow - the
   scheduler's classification may not be useful on swap timescales

4) a currently completely idle task can still be marked interactive in
   the scheduler, even if it has been idle for days. Such a task is an
   obvious good candidate for swapout, isn't it?
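Points 1 and 4 both come down to the flag being sticky. One conceivable
way around that, which nobody in this thread proposed, is to record
when the task was last seen as interactive and let the hint expire.
The userspace sketch below only illustrates that idea: struct toy_mm,
both helper functions and the max_age parameter are hypothetical and
are not part of the patch or of the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the patch keeps in mm_struct. */
struct toy_mm {
	unsigned long last_interactive;	/* tick of last observation */
	bool ever_interactive;		/* false until first observation */
};

/* Called where the patch sets mm->interactive = 1. */
void toy_note_interactive(struct toy_mm *mm, unsigned long now)
{
	mm->last_interactive = now;
	mm->ever_interactive = true;
}

/*
 * Called where the patch tests mm->interactive. The hint is honoured
 * only if it was refreshed within the last max_age ticks, so a task
 * that went idle or stopped being interactive loses it over time.
 * Unsigned subtraction keeps the comparison valid across wraparound.
 */
bool toy_mm_is_interactive(const struct toy_mm *mm, unsigned long now,
			   unsigned long max_age)
{
	return mm->ever_interactive &&
	       (now - mm->last_interactive) <= max_age;
}
```

This does nothing for points 2 and 3: a hint that expires still has no
notion of how much memory the interactive tasks use, and max_age would
have to be chosen on swap timescales rather than scheduler timescales.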
Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
अभिजित भोपटकर (Abhijit Bhopatkar) wrote:

> The mm structures of interactive tasks are marked and the pages
> belonging to them are never shifted to the inactive list in the LRU
> algorithm, thus keeping interactive tasks in memory as long as
> possible. The interactivity is already determined by the scheduler,
> so we reuse that knowledge to mark the mm structures.
>
> Signed-off-by: Abhijit Bhopatkar <[EMAIL PROTECTED]>
> ---

Lying to the VM doesn't seem like the best way to handle this. A lot of
tasks, including interactive ones, have some/many pages that they touch
once during startup and don't touch again for a very long time, if
ever. We want these pages swapped out long before the box swaps out the
working set of our non-interactive processes.

I like the general idea of swap priority influenced by scheduler
priority, but if we're going to do that, we should do it in a general
way that's independent of scheduler implementation, so it'll be useful
to soft real-time users and still relevant if (when?) we replace the
current scheduler with something else lacking a special "interactive"
flag.

-- Chris
[PATCH] [RFC] Throttle swappiness for interactive tasks
The mm structures of interactive tasks are marked and the pages
belonging to them are never shifted to the inactive list in the LRU
algorithm, thus keeping interactive tasks in memory as long as
possible. The interactivity is already determined by the scheduler, so
we reuse that knowledge to mark the mm structures.

Signed-off-by: Abhijit Bhopatkar <[EMAIL PROTECTED]>
---
 include/linux/init_task.h |    1 +
 include/linux/sched.h     |    4 ++++
 kernel/sched.c            |    4 ++++
 mm/rmap.c                 |    5 +++++
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index a2d95ff..12ba3c1 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -54,6 +54,7 @@
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(name.page_table_lock),	\
 	.mmlist		= LIST_HEAD_INIT(name.mmlist),		\
 	.cpu_vm_mask	= CPU_MASK_ALL,				\
+	.interactive	= 0,					\
 }
 
 #define INIT_SIGNALS(sig) {	\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 49fe299..1f233e7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -373,6 +373,10 @@ struct mm_struct {
 	/* aio bits */
 	rwlock_t		ioctx_list_lock;
 	struct kioctx		*ioctx_list;
+
+	/* interactivity flag */
+	unsigned char		interactive;
+
 };
 
 struct sighand_struct {
diff --git a/kernel/sched.c b/kernel/sched.c
index b9a6837..72caf94 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -746,6 +746,10 @@ static inline int __normal_prio(struct task_struct *p)
 		prio = MAX_RT_PRIO;
 	if (prio > MAX_PRIO-1)
 		prio = MAX_PRIO-1;
+
+	/* Update interactivity flag in mm if interactive. */
+	if(p->mm && TASK_INTERACTIVE(p))
+		p->mm->interactive = 1;
 	return prio;
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index b82146e..0735168 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -317,6 +317,11 @@ static int page_referenced_one(struct page *page,
 			rwsem_is_locked(&mm->mmap_sem))
 		referenced++;
 
+	/* Pretend the page is referenced if the task is
+	   interactive.  */
+	if (mm != current->mm && mm->interactive)
+		referenced++;
+
 	(*mapcount)--;
 	pte_unmap_unlock(pte, ptl);
 out:
-- 
1.4.4.2