Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2007-01-09 Thread yunfeng zhang

Sorry, I can't be online regularly, so I can't track the latest Linux source
tree and only work against a fixed kernel version. Documentation/vm_pps.txt is
not only a patch overview but also a changelog.



Great!

Do you have a patch against 2.6.19?


Thanks!

--
Al



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2007-01-09 Thread yunfeng zhang

Perhaps there should be a dedicated memory maintainer in the Linux kernel group.

Here is some content from my patch (Documentation/vm_pps.txt). In brief, it
proposes a fundamental change to the Linux swap subsystem: the SwapDaemon
should scan and reclaim pages by walking each process's VMA list
(UserSpace::vmalist) rather than the per-zone active/inactive lists. The change
will noticeably improve swap subsystem performance because

1) The SwapDaemon can gather per-process page-access statistics and use them to
  unmap ptes in batches. SMP benefits especially, because flush_tlb_range can
  unmap a whole run of ptes at once instead of raising a TLB-shootdown IPI for
  every single page, as the current Linux swap subsystem does. In fact, in some
  cases we can even flush the TLB without sending any IPI at all.
2) Page faults can issue better readahead requests, since the history data
  shows that related pages have a conglomerating (clustering) affinity. In
  contrast, Linux reads ahead the pages adjacent to the faulting page's
  position in the SwapSpace.
3) It fits naturally with the POSIX madvise API family.
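Point 1 can be made concrete with a small userland model (plain C, not kernel
code; `count_range_flushes` is a name I made up for illustration): once the
scan has collected the virtual page numbers to unmap, each contiguous run can
be covered by one flush_tlb_range-style call, instead of one IPI per page.

```c
#include <assert.h>
#include <stddef.h>

/* Count contiguous runs in a sorted list of virtual page numbers.
 * Each run needs only one range flush (one IPI), whereas per-page
 * shootdown would cost one IPI per entry. Userland sketch only. */
static size_t count_range_flushes(const unsigned long *vpn, size_t n)
{
    size_t runs = 0;
    for (size_t i = 0; i < n; i++)
        if (i == 0 || vpn[i] != vpn[i - 1] + 1)
            runs++;                /* a gap starts a new flush range */
    return runs;
}
```

With the six pages {10,11,12,40,41,100}, per-page shootdown costs six IPIs,
while range batching needs only three flushes.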


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2007-01-05 Thread zyf.zeroos
Test mail with my signature. The mail content is based on the second quilt
patch (Linux 2.6.16.29); only two key files are re-sent: 1)
Documentation/vm_pps.txt and 2) mm/vmscan.c.

Index: test.signature/Documentation/vm_pps.txt
===
--- /dev/null 1970-01-01 00:00:00.0 +
+++ test.signature/Documentation/vm_pps.txt 2007-01-06 07:00:18.146480584 +0800
@@ -0,0 +1,214 @@
+ Pure Private Page System (pps)
+ Copyright by Yunfeng Zhang on GFDL 1.2
+  [EMAIL PROTECTED]
+  December 24-26, 2006
+
+// Purpose <([{
+This file documents the idea first published at
+http://www.ussg.iu.edu/hypermail/linux/kernel/0607.2/0451.html, as a part of my
+OS -- main page http://blog.chinaunix.net/u/21764/index.php. In brief, the
+patch accompanying this document enhances the performance of the Linux swap
+subsystem. You can find an overview of the idea in section <How to Reclaim
+Pages more Efficiently> and how I patch it into Linux 2.6.16.29 in section
+<Pure Private Page System -- pps>.
+// }])>
+
+// How to Reclaim Pages more Efficiently <([{
+Good ideas originate from overall design and management insight: when you look
+down from a manager's view, you free yourself from the disordered code and
+spot problems immediately.
+
+OK. In a modern OS, the memory subsystem can be divided into three layers
+1) Space layer (InodeSpace, UserSpace and CoreSpace).
+2) VMA layer (PrivateVMA and SharedVMA, the architecture-independent layer).
+3) PTE, zone/memory inode layer (architecture-dependent).
+4) It may seem that Page should be placed on the 3rd layer, but here it is
+   placed on the 2nd layer, since the page is the basic unit of a VMA.
+
+Since the 2nd layer gathers most of the page-access statistics, it is natural
+that the swap subsystem should be deployed and implemented on the 2nd layer.
+
+Undoubtedly, this approach has several virtues
+1) The SwapDaemon can gather per-process page-access statistics and use them
+   to unmap ptes in batches. SMP benefits especially, because flush_tlb_range
+   can unmap a whole run of ptes at once instead of raising a TLB-shootdown
+   IPI for every single page, as the current Linux swap subsystem does.
+2) Page faults can issue better readahead requests, since the history data
+   shows that related pages have a conglomerating (clustering) affinity. In
+   contrast, Linux reads ahead the pages adjacent to the faulting page's
+   position in the SwapSpace.
+3) It fits naturally with the POSIX madvise API family.
+
+Unfortunately, the Linux 2.6.16.29 swap subsystem is based on the 3rd layer --
+a system built on zone::active_list/inactive_list.
+
+I've finished a patch, see section <Pure Private Page System -- pps>. Note, it
+ISN'T perfect.
+// }])>
+
+// Pure Private Page System -- pps  <([{
+As I noted in the previous section, applying my idea perfectly would require
+uprooting the page-centered swap subsystem and migrating it onto the VMA
+layer, but a huge gap has defeated me -- active_list and inactive_list. In
+fact, you can find lru_add_active code almost anywhere ... It's IMPOSSIBLE for
+me to complete this alone. It's also the difference between my design and
+Linux: in my OS, a page is wholly in the charge of its new owner; in Linux,
+the page management system still traces it by the PG_active flag.
+
+So I conceived another solution:) That is, set up an independent page-recycle
+system rooted in the Linux legacy page system -- pps: intercept all private
+pages belonging to PrivateVMA into pps, then let pps recycle them. The whole
+job consists of two parts; here is the first, the PrivateVMA-oriented part
+(PPS); the other, SharedVMA-oriented part (which should be called SPS) is
+scheduled for the future. Of course, once both are done, the Linux legacy
+page system will be emptied.
+
+In fact, pps is centered on how to better collect and unmap process private
+pages in the SwapDaemon (mm/vmscan.c:shrink_private_vma); the whole process is
+divided into six stages -- see section <Stage Definition>. The other sections
+cover the remaining aspects of pps
+1) basic data definitions;
+2) synchronization;
+3) how private pages enter and leave pps;
+4) which VMAs belong to pps.
+
+PPS uses the init_mm.mm_list list to enumerate all swappable UserSpaces
+(shrink_private_vma).
+
+A new kernel thread -- kppsd -- is introduced in mm/vmscan.c; its task is to
+execute the stages of pps periodically. Note that an appropriate timeout is
+necessary so applications get a chance to re-map their PrivatePages back from
+UnmappedPTE to PTE, that is, to show their conglomeration affinity. The
+scan_control::pps_cmd field controls the behavior of kppsd; setting it to 1
+accelerates scanning and page reclaim, which balance_pgdat uses.
+
+PPS statistic data is appended to the /proc/meminfo entry; its prototype is in
+include/linux/mm.h.
+
+I'm also glad to highlight another new idea of mine -- dftlb -- described in
+section <Delay to Flush TLB (dftlb)>.
+// }])>
+
+// Delay to Flush TLB (dftlb) <([{
+Delay to flush TLB is introduced by me to enhance flushing

Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2007-01-04 Thread yunfeng zhang

I've made a new patch, based on the previous quilt patch (2.6.16.29). Here is
the changelog

--

NEW

New kernel thread kppsd is added to execute background scanning task
periodically (mm/vmscan.c).

PPS statistic is added into /proc/meminfo, its prototype is in
include/linux/mm.h.

Documentation/vm_pps.txt is also updated to describe the above two new
features; some sections are rewritten for better comprehension.


BUG

New loop code is introduced in shrink_private_vma (mm/vmscan.c) and
pps_swapoff (mm/swapfile.c); in contrast with the old code, it is safe even if
lhtemp is freed during the loop.

A bug is caught in mm/memory.c:zap_pte_range -- if a PrivatePage is
being written back, it is now migrated back to the Linux legacy page system.

A mistake I made in the previous patch is remedied in stage 5, so stage 5 now
works.


MISCELLANEOUS

UP code has been separated from SMP code in dftlb.
--
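The kppsd behavior in the changelog (periodic background scanning, with
scan_control::pps_cmd requesting acceleration, as balance_pgdat does under
memory pressure) can be sketched as a userland simulation. The struct, the
tick function, and the 4x factor below are all illustrative inventions of
mine, not the patch's actual code.

```c
#include <assert.h>

/* Userland sketch of the kppsd idea (names hypothetical). */
struct kppsd_sim {
    int pps_cmd;      /* 0 = normal pace, 1 = accelerate scanning */
    int scans_done;   /* total scan passes performed */
};

static void kppsd_tick(struct kppsd_sim *d)
{
    /* an accelerated tick does the work of several normal ones
     * (the factor 4 is arbitrary, purely for illustration) */
    d->scans_done += d->pps_cmd ? 4 : 1;
    d->pps_cmd = 0;   /* the acceleration request is consumed once served */
}
```

One design point this captures: the daemon stays idle-paced by default so
applications have time to touch their pages again (re-mapping UnmappedPTE back
to PTE), and only sprints when asked.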

Index: linux-2.6.16.29/Documentation/vm_pps.txt
===
--- linux-2.6.16.29.orig/Documentation/vm_pps.txt   2007-01-04
14:47:35.0 +0800
+++ linux-2.6.16.29/Documentation/vm_pps.txt2007-01-04 14:49:36.0 
+0800
@@ -6,11 +6,11 @@
// Purpose <([{
The file is used to document the idea which is published firstly at
http://www.ussg.iu.edu/hypermail/linux/kernel/0607.2/0451.html, as a part of my
-OS -- main page http://blog.chinaunix.net/u/21764/index.php. In brief, a patch
-of the document to enchance the performance of Linux swap subsystem. You can
-find the overview of the idea in section  and how I patch it into Linux 2.6.16.29 in section .
+OS -- main page http://blog.chinaunix.net/u/21764/index.php. In brief, the
+patch of the document is for enchancing the performance of Linux swap
+subsystem. You can find the overview of the idea in section  and how I patch it into Linux 2.6.16.29 in section
+.
// }])>

// How to Reclaim Pages more Efficiently <([{
@@ -21,7 +21,9 @@
OK! to modern OS, its memory subsystem can be divided into three layers
1) Space layer (InodeSpace, UserSpace and CoreSpace).
2) VMA layer (PrivateVMA and SharedVMA, memory architecture-independent layer).
-3) PTE and page layer (architecture-dependent).
+3) PTE, zone/memory inode layer (architecture-dependent).
+4) Maybe it makes you sense that Page should be placed on the 3rd layer, but
+   here, it's placed on the 2nd layer since it's the basic unit of VMA.

Since the 2nd layer assembles the much statistic of page-acess information, so
it's nature that swap subsystem should be deployed and implemented on the 2nd
@@ -41,7 +43,8 @@
Unfortunately, Linux 2.6.16.29 swap subsystem is based on the 3rd layer -- a
system on zone::active_list/inactive_list.

-I've finished a patch, see section .
Note, it ISN'T perfect.
+I've finished a patch, see section . Note, it
+ISN'T perfect.
// }])>

// Pure Private Page System -- pps  <([{
@@ -70,7 +73,18 @@
3)  -- how private pages enter in/go off pps.
4)  which VMA is belonging to pps.

-PPS uses init_mm.mm_list list to enumerate all swappable UserSpace.
+PPS uses init_mm.mm_list list to enumerate all swappable UserSpace
+(shrink_private_vma).
+
+A new kernel thread -- kppsd is introduced in mm/vmscan.c, its task is to
+execute the stages of pps periodically, note an appropriate timeout ticks is
+necessary so we can give application a chance to re-map back its PrivatePage
+from UnmappedPTE to PTE, that is, show their conglomeration affinity.
+scan_control::pps_cmd field is used to control the behavior of kppsd, = 1 for
+accelerating scanning process and reclaiming pages, it's used in balance_pgdat.
+
+PPS statistic data is appended to /proc/meminfo entry, its prototype is in
+include/linux/mm.h.

I'm also glad to highlight my a new idea -- dftlb which is described in
section .
@@ -97,15 +111,19 @@
   gone when a CPU starts to execute the task in timer interrupt, so don't use
   dftlb.
combine stage 1 with stage 2, and send IPI immediately in fill_in_tlb_tasks.
+
+dftlb increases mm_struct::mm_users to prevent the mm from being freed when
+other CPU works on it.
// }])>

// Stage Definition <([{
The whole process of private page page-out is divided into six stages, as
-showed in shrink_pvma_scan_ptes of mm/vmscan.c
+showed in shrink_pvma_scan_ptes of mm/vmscan.c, the code groups the similar
+pages to a series.
1) PTE to untouched PTE (access bit is cleared), append flushing
tasks to dftlb.
2) Convert untouched PTE to UnmappedPTE.
3) Link SwapEntry to every UnmappedPTE.
-4) Synchronize the page of a UnmappedPTE with its physical swap page.
+4) Flush PrivatePage of UnmappedPTE to its disk SwapPage.
5) Reclaimed the page and shift UnmappedPTE to SwappedPTE.
6) SwappedPTE stage.
// }])>
@@ -114,7 +132,15 @@
New VMA flag (VM_PURE_PRIVATE) is appended into VMA in include/linux/mm.h.

New PTE type (UnmappedPTE) is appended into PTE system in
-include/asm-i386/pgtable.h.
+include/asm-i386/pgtable.h. Its prototyp

Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2007-01-04 Thread yunfeng zhang

No, it's a new idea to rewrite the swap subsystem entirely. In fact, that is
an impossible task for me alone, so I provide a compromise solution -- pps
(pure private page system).

2006/12/30, Zhou Yingchao <[EMAIL PROTECTED]>:

2006/12/27, yunfeng zhang <[EMAIL PROTECTED]>:
> To multiple address space, multiple memory inode architecture, we can
> introduce a new core object -- section which has several features
Do you mean "in-memory inode" or "memory node (pglist_data)" by "memory inode"?
> The idea issued by me is whether swap subsystem should be deployed on layer 2
> or layer 3 which is described in Documentation/vm_pps.txt of my patch. To
> multiple memory inode architecture, the special memory model should be
> encapsulated on layer 3 (architecture-dependent), I think.
I guess you want to remove arch-dependent code from the swap subsystem,
just like the pud introduced in the page-table related code. Is that right?
However, you should verify that your changes do not degrade system
performance. Also, you will need to maintain the patch for a long time as
the mainline kernel evolves before it is accepted.

Best regards
--
Yingchao Zhou
***
 Institute Of Computing Technology
 Chinese Academy of Sciences
***




Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-29 Thread Zhou Yingchao

2006/12/27, yunfeng zhang <[EMAIL PROTECTED]>:

To multiple address space, multiple memory inode architecture, we can introduce
a new core object -- section which has several features

Do you mean "in-memory inode"  or "memory node(pglist_data)" by "memory inode" ?

The question I raise is whether the swap subsystem should be deployed on layer
2 or layer 3, as described in Documentation/vm_pps.txt of my patch. For a
multiple-memory-inode architecture, the special memory model should be
encapsulated on layer 3 (architecture-dependent), I think.

I guess you want to remove arch-dependent code from the swap subsystem,
just like the pud introduced in the page-table related code. Is that right?
However, you should verify that your changes do not degrade system
performance. Also, you will need to maintain the patch for a long time as
the mainline kernel evolves before it is accepted.

Best regards
--
Yingchao Zhou
***
Institute Of Computing Technology
Chinese Academy of Sciences
***


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-29 Thread Randy Dunlap
On Fri, 29 Dec 2006 10:15:51 +0100 Pavel Machek wrote:

> On Fri 2006-12-29 14:45:33, yunfeng zhang wrote:
> > I've re-published my work on quilt, sorry.
> 
> Your patch is still wordwrapped.
> 
> Do not cc linus on non-final version of the patch.
> 
> Patch should be against latest kernel.
> 
> Patch should have changelog and signed off by.
> 
> Why the change? Do you gain 5% on kernel compile on 20MB box?

+ Don't leave the entire email inline if you are not going to
  comment on it inline.

---
~Randy


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-29 Thread Pavel Machek
On Fri 2006-12-29 14:45:33, yunfeng zhang wrote:
> I've re-published my work on quilt, sorry.

Your patch is still wordwrapped.

Do not cc linus on non-final version of the patch.

Patch should be against latest kernel.

Patch should have changelog and signed off by.

Why the change? Do you gain 5% on kernel compile on 20MB box?

Pavel


> 
> Index: linux-2.6.16.29/Documentation/vm_pps.txt
> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-2.6.16.29/Documentation/vm_pps.txt  2006-12-29 
> 14:36:36.507332384 +0800
> @@ -0,0 +1,192 @@
> + Pure Private Page System (pps)
> + Copyright by Yunfeng Zhang on GFDL 1.2
> +  [EMAIL PROTECTED]
> +  December 24-26, 2006
> +
> +// Purpose <([{
> +The file is used to document the idea which is published firstly at
> +http://www.ussg.iu.edu/hypermail/linux/kernel/0607.2/0451.html, as a part 
> of my
> +OS -- main page http://blog.chinaunix.net/u/21764/index.php. In brief, a 
> patch
> +of the document to enchance the performance of Linux swap subsystem. You 
> can
> +find the overview of the idea in section  +Efficiently> and how I patch it into Linux 2.6.16.29 in section  Private
> +Page System -- pps>.
> +// }])>
> +
> +// How to Reclaim Pages more Efficiently <([{
> +Good idea originates from overall design and management ability, when you 
> look
> +down from a manager view, you will relief yourself from disordered code and
> +find some problem immediately.
> +
> +OK! to modern OS, its memory subsystem can be divided into three layers
> +1) Space layer (InodeSpace, UserSpace and CoreSpace).
> +2) VMA layer (PrivateVMA and SharedVMA, memory architecture-independent 
> layer).
> +3) PTE and page layer (architecture-dependent).
> +
> +Since the 2nd layer assembles the much statistic of page-acess 
> information, so
> +it's nature that swap subsystem should be deployed and implemented on the 
> 2nd
> +layer.
> +
> +Undoubtedly, there are some virtues about it
> +1) SwapDaemon can collect the statistic of process acessing pages and by it
> +   unmaps ptes, SMP specially benefits from it for we can use 
> flush_tlb_range
> +   to unmap ptes batchly rather than frequently TLB IPI interrupt per a 
> page in
> +   current Linux legacy swap subsystem.
> +2) Page-fault can issue better readahead requests since history data shows 
> all
> +   related pages have conglomerating affinity. In contrast, Linux 
> page-fault
> +   readaheads the pages relative to the SwapSpace position of current
> +   page-fault page.
> +3) It's conformable to POSIX madvise API family.
> +
> +Unfortunately, Linux 2.6.16.29 swap subsystem is based on the 3rd layer -- 
> a
> +system on zone::active_list/inactive_list.
> +
> +I've finished a patch, see section .
> Note, it ISN'T perfect.
> +// }])>
> +
> +// Pure Private Page System -- pps  <([{
> +As I've referred in previous section, perfectly applying my idea need to 
> unroot
> +page-surrounging swap subsystem to migrate it on VMA, but a huge gap has
> +defeated me -- active_list and inactive_list. In fact, you can find
> +lru_add_active code anywhere ... It's IMPOSSIBLE to me to complete it only 
> by
> +myself. It's also the difference between my design and Linux, in my OS, 
> page is
> +the charge of its new owner totally, however, to Linux, page management 
> system
> +is still tracing it by PG_active flag.
> +
> +So I conceive another solution:) That is, set up an independent 
> page-recycle
> +system rooted on Linux legacy page system -- pps, intercept all private 
> pages
> +belonging to PrivateVMA to pps, then use my pps to cycle them.  By the 
> way, the
> +whole job should be consist of two parts, here is the first --
> +PrivateVMA-oriented (PPS), other is SharedVMA-oriented (should be called 
> SPS)
> +scheduled in future. Of course, if all are done, it will empty Linux legacy
> +page system.
> +
> +In fact, pps is centered on how to better collect and unmap process private
> +pages in SwapDaemon mm/vmscan.c:shrink_private_vma, the whole process is
> +divided into six stages -- . Other sections show the 
> remain
> +aspects of pps
> +1)  is basic data definition.
> +2)  is focused on synchronization.
> +3)  -- how private pages enter in/go off 
> pps.
> +4)  which VMA is belonging to pps.
> +
> +PPS uses init_mm.mm_list list to enumerate all swappable UserSpace.
> +
> +I'm also glad to highlight my a new idea -- dftlb which is described in
> +section .
> +// }])>
> +
> +// Delay to Flush TLB (dftlb) <([{
> +Delay to flush TLB is instroduced by me to enhance flushing TLB 
> efficiency, in
> +brief, when we want to unmap a page from the page table of a process, why 
> we
> +send TLB IPI to other CPUs immediately, since every CPU has timer 
> interrupt, we
> +can insert flushing tasks into timer inter

Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-28 Thread yunfeng zhang

I've re-published my work on quilt, sorry.


Index: linux-2.6.16.29/Documentation/vm_pps.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.16.29/Documentation/vm_pps.txt2006-12-29 14:36:36.507332384 
+0800
@@ -0,0 +1,192 @@
+ Pure Private Page System (pps)
+ Copyright by Yunfeng Zhang on GFDL 1.2
+  [EMAIL PROTECTED]
+  December 24-26, 2006
+
+// Purpose <([{
+The file is used to document the idea which is published firstly at
+http://www.ussg.iu.edu/hypermail/linux/kernel/0607.2/0451.html, as a part of my
+OS -- main page http://blog.chinaunix.net/u/21764/index.php. In brief, this is
+a patch to enhance the performance of the Linux swap subsystem. You can find
+the overview of the idea in section <How to Reclaim Pages more Efficiently>
+and how I patch it into Linux 2.6.16.29 in section <Pure Private Page System
+-- pps>.
+// }])>
+
+// How to Reclaim Pages more Efficiently <([{
+Good idea originates from overall design and management ability, when you look
+down from a manager view, you will relief yourself from disordered code and
+find some problem immediately.
+
+OK! to modern OS, its memory subsystem can be divided into three layers
+1) Space layer (InodeSpace, UserSpace and CoreSpace).
+2) VMA layer (PrivateVMA and SharedVMA, memory architecture-independent layer).
+3) PTE and page layer (architecture-dependent).
+
+Since the 2nd layer gathers most of the page-access statistics, it is natural
+that the swap subsystem should be deployed and implemented on the 2nd layer.
+
+Undoubtedly, there are some virtues about it
+1) SwapDaemon can collect the statistic of process acessing pages and by it
+   unmaps ptes, SMP specially benefits from it for we can use flush_tlb_range
+   to unmap ptes batchly rather than frequently TLB IPI interrupt per a page in
+   current Linux legacy swap subsystem.
+2) Page-fault can issue better readahead requests since history data shows all
+   related pages have conglomerating affinity. In contrast, Linux page-fault
+   readaheads the pages relative to the SwapSpace position of current
+   page-fault page.
+3) It's conformable to POSIX madvise API family.
+
+Unfortunately, Linux 2.6.16.29 swap subsystem is based on the 3rd layer -- a
+system on zone::active_list/inactive_list.
+
+I've finished a patch, see section <Pure Private Page System -- pps>. Note,
+it ISN'T perfect.
+// }])>
+
+// Pure Private Page System -- pps  <([{
+As I've referred in previous section, perfectly applying my idea need to unroot
+page-surrounging swap subsystem to migrate it on VMA, but a huge gap has
+defeated me -- active_list and inactive_list. In fact, you can find
+lru_add_active code anywhere ... It's IMPOSSIBLE to me to complete it only by
+myself. It's also the difference between my design and Linux, in my OS, page is
+the charge of its new owner totally, however, to Linux, page management system
+is still tracing it by PG_active flag.
+
+So I conceive another solution:) That is, set up an independent page-recycle
+system rooted on Linux legacy page system -- pps, intercept all private pages
+belonging to PrivateVMA to pps, then use my pps to cycle them.  By the way, the
+whole job should be consist of two parts, here is the first --
+PrivateVMA-oriented (PPS), other is SharedVMA-oriented (should be called SPS)
+scheduled in future. Of course, if all are done, it will empty Linux legacy
+page system.
+
+In fact, pps is centered on how to better collect and unmap process private
+pages in SwapDaemon mm/vmscan.c:shrink_private_vma, the whole process is
+divided into six stages -- see section <Stage Definition>. Other sections
+show the remaining aspects of pps
+1)  is basic data definition.
+2)  is focused on synchronization.
+3)  -- how private pages enter in/go off pps.
+4)  which VMA is belonging to pps.
+
+PPS uses init_mm.mm_list list to enumerate all swappable UserSpace.
+
+I'm also glad to highlight another new idea of mine -- dftlb -- described in
+section <Delay to Flush TLB (dftlb)>.
+// }])>
+
+// Delay to Flush TLB (dftlb) <([{
+Delay to flush TLB is introduced by me to enhance TLB flushing efficiency. In
+brief, when we want to unmap a page from the page table of a process, why
+send a TLB IPI to the other CPUs immediately? Since every CPU has a timer
+interrupt, we can insert flushing tasks into the timer interrupt routine to
+implement TLB flushing free of charge.
+
+The trick is implemented in
+1) TLB flushing tasks are added in fill_in_tlb_task of mm/vmscan.c.
+2) timer_flush_tlb_tasks of kernel/timer.c is used by other CPUs to execute
+   flushing tasks.
+3) all data are defined in include/linux/mm.h.
+
+The restrictions of dftlb. The following conditions must be met
+1) an atomic cmpxchg instruction.
+2) the access bit is set atomically when a pte is first touched.
+3) on some architectures the vma parameter of flush_tlb_range may be
+   important; if so, since the vma of a TLB flushing task may have
+   gone when a CPU starts to execute 

Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-28 Thread Pavel Machek
On Tue 26-12-06 16:18:32, yunfeng zhang wrote:
> In the patch, I introduce a new page system -- pps which can improve
> Linux swap subsystem performance, you can find a new document in
> Documentation/vm_pps.txt. In brief, swap subsystem should scan/reclaim
> pages on VMA instead of zone::active list ...

Is it April Fools' Day?

Read Doc*/SubmittingPatches.
Pavel

> 
> --- patch-linux/fs/exec.c 2006-12-26 
> 15:20:02.683546016 +0800
> +++ linux-2.6.16.29/fs/exec.c 2006-09-13 
> 02:02:10.0 +0800
> @@ -323,0 +324 @@
> + lru_cache_add_active(page);
> @@ -438 +438,0 @@
> - enter_pps(mm, mpnt);

Pavel
-- 
Thanks for all the (sleeping) penguins.


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-26 Thread yunfeng zhang

The job listed in Documentation/vm_pps.txt of my patch is too heavy for me
alone, so I would appreciate it if the Linux kernel group could arrange a
schedule to help me.


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-26 Thread yunfeng zhang

For a multiple-address-space, multiple-memory-inode architecture, we can
introduce a new core object -- the section -- which has several features
1) A section is the atomic unit containing the pages of a VMA that reside in
  the memory inode of the section.
2) When pages migrate among different memory inodes, new sections should be
  set up to trace them.
3) A section can be scanned directly by the SwapDaemon of its memory inode.
4) All sections of a VMA are mutually exclusive, never overlapping.
5) A VMA is made up entirely of sections, but its section objects are
  scattered across memory inodes.
So on such an architecture, we can deploy the swap subsystem on an
architecture-independent layer via sections and scan pages in batches.
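The invariants in points 4 and 5 (sections never overlap and together tile the
whole VMA) can be checked mechanically. Here is a userland sketch; the struct
layout and function name are my own invention, not from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section object: covers pages [start, end) of one VMA,
 * with all of those pages resident on one memory inode (node). */
struct section {
    unsigned long start, end;   /* page range inside the VMA */
    int node;                   /* memory inode holding these pages */
};

/* Check that sections, in address order, are non-empty, non-overlapping,
 * and exactly cover [vma_start, vma_end) -- points 4 and 5 above. */
static int sections_tile_vma(const struct section *s, size_t n,
                             unsigned long vma_start, unsigned long vma_end)
{
    unsigned long pos = vma_start;
    for (size_t i = 0; i < n; i++) {
        if (s[i].start != pos || s[i].end <= s[i].start)
            return 0;           /* gap, overlap, or empty section */
        pos = s[i].end;
    }
    return pos == vma_end;      /* full coverage, no trailing gap */
}
```

Note the node field is free to differ between adjacent sections, which is
exactly point 5: the VMA is contiguous in address space while its sections
scatter across memory inodes.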

The question I raise is whether the swap subsystem should be deployed on layer
2 or layer 3, as described in Documentation/vm_pps.txt of my patch. For a
multiple-memory-inode architecture, the special memory model should be
encapsulated on layer 3 (architecture-dependent), I think.


Re: [PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-26 Thread Zhou Yingchao

2006/12/26, yunfeng zhang <[EMAIL PROTECTED]>:

In the patch, I introduce a new page system -- pps which can improve
Linux swap subsystem performance, you can find a new document in
Documentation/vm_pps.txt. In brief, swap subsystem should scan/reclaim
pages on VMA instead of zone::active list ...

  The early swap subsystem really did scan/reclaim based on mm/vma, but it
has since changed to scanning pages on the active/inactive lists. Perhaps
you are not following the right direction.
--
Yingchao Zhou
***
Institute Of Computing Technology
Chinese Academy of Sciences
***


[PATCH 2.6.16.29 1/1] memory: enhance Linux swap subsystem

2006-12-26 Thread yunfeng zhang

In the patch, I introduce a new page system -- pps which can improve
Linux swap subsystem performance, you can find a new document in
Documentation/vm_pps.txt. In brief, swap subsystem should scan/reclaim
pages on VMA instead of zone::active list ...
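The proposed scan order can be modeled in a few lines of userland C (all
names here are hypothetical stand-ins, not the patch's structures): the swap
daemon walks an mm's VMA list and visits only private VMAs, rather than
pulling pages off a global zone LRU.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for mm_struct / vm_area_struct (not kernel types). */
struct vma_sim { size_t npages; int is_private; };
struct mm_sim  { const struct vma_sim *vmas; size_t nvmas; };

/* Visit pages VMA by VMA, skipping shared VMAs -- the first (PPS)
 * half of the job only handles PrivateVMA. Returns pages visited. */
static size_t scan_private_vmas(const struct mm_sim *mm)
{
    size_t visited = 0;
    for (size_t i = 0; i < mm->nvmas; i++)
        if (mm->vmas[i].is_private)
            visited += mm->vmas[i].npages;
    return visited;
}
```

Because pages are visited in address order within one VMA, runs of adjacent
ptes fall out naturally, which is what enables the batched unmapping the
document argues for.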

--- patch-linux/fs/exec.c   2006-12-26 15:20:02.683546016 +0800
+++ linux-2.6.16.29/fs/exec.c   2006-09-13 02:02:10.0 +0800
@@ -323,0 +324 @@
+   lru_cache_add_active(page);
@@ -438 +438,0 @@
-   enter_pps(mm, mpnt);
--- patch-linux/mm/swap_state.c 2006-12-26 15:20:02.689545104 +0800
+++ linux-2.6.16.29/mm/swap_state.c 2006-09-13 02:02:10.0 +0800
@@ -357,2 +357 @@
-   if (vma == NULL || !(vma->vm_flags & VM_PURE_PRIVATE))
-   lru_cache_add_active(new_page);
+   lru_cache_add_active(new_page);
--- patch-linux/mm/mmap.c   2006-12-26 15:20:02.691544800 +0800
+++ linux-2.6.16.29/mm/mmap.c   2006-09-13 02:02:10.0 +0800
@@ -209 +208,0 @@
-   leave_pps(vma, 0);
@@ -597 +595,0 @@
-   leave_pps(next, 0);
@@ -1096,2 +1093,0 @@
-   enter_pps(mm, vma);
-
@@ -1120 +1115,0 @@
-   leave_pps(vma, 0);
@@ -1148 +1142,0 @@
-   leave_pps(vma, 0);
@@ -1726,4 +1719,0 @@
-   if (new->vm_flags & VM_PURE_PRIVATE) {
-   new->vm_flags &= ~VM_PURE_PRIVATE;
-   enter_pps(mm, new);
-   }
@@ -1930 +1919,0 @@
-   enter_pps(mm, vma);
@@ -2054,4 +2042,0 @@
-   if (new_vma->vm_flags & VM_PURE_PRIVATE) {
-   new_vma->vm_flags &= ~VM_PURE_PRIVATE;
-   enter_pps(mm, new_vma);
-   }
--- patch-linux/mm/fremap.c 2006-12-26 15:20:02.695544192 +0800
+++ linux-2.6.16.29/mm/fremap.c 2006-09-13 02:02:10.0 +0800
@@ -40 +40 @@
-   if (pte_swapped(pte))
+   if (!pte_file(pte))
--- patch-linux/mm/rmap.c   2006-12-26 15:20:02.696544040 +0800
+++ linux-2.6.16.29/mm/rmap.c   2006-09-13 02:02:10.0 +0800
@@ -636 +636 @@
-   BUG_ON(!pte_swapped(*pte));
+   BUG_ON(pte_file(*pte));
--- patch-linux/mm/vmscan.c 2006-12-26 15:20:02.697543888 +0800
+++ linux-2.6.16.29/mm/vmscan.c 2006-09-13 02:02:10.0 +0800
@@ -1517,392 +1516,0 @@
-struct series_t {
-   pte_t orig_ptes[MAX_SERIES_LENGTH];
-   pte_t* ptes[MAX_SERIES_LENGTH];
-   struct page* pages[MAX_SERIES_LENGTH];
-   int series_length;
-   int series_stage;
-} series;
-
-static int get_series_stage(pte_t* pte, int index)
-{
-   series.orig_ptes[index] = *pte;
-   series.ptes[index] = pte;
-   if (pte_present(series.orig_ptes[index])) {
-   struct page* page = pfn_to_page(pte_pfn(series.orig_ptes[index]));
-   series.pages[index] = page;
-   if (page == ZERO_PAGE(addr)) // reserved page is exclusive from us.
-   return 7;
-   if (pte_young(series.orig_ptes[index])) {
-   return 1;
-   } else
-   return 2;
-   } else if (pte_unmapped(series.orig_ptes[index])) {
-   struct page* page = pfn_to_page(pte_pfn(series.orig_ptes[index]));
-   series.pages[index] = page;
-   if (!PageSwapCache(page))
-   return 3;
-   else {
-   if (PageWriteback(page) || PageDirty(page))
-   return 4;
-   else
-   return 5;
-   }
-   } else // pte_swapped -- SwappedPTE
-   return 6;
-}
-
-static void find_series(pte_t** start, unsigned long* addr, unsigned long end)
-{
-   int i;
-   int series_stage = get_series_stage((*start)++, 0);
-   *addr += PAGE_SIZE;
-
-   for (i = 1; i < MAX_SERIES_LENGTH && *addr < end; i++, (*start)++, *addr += PAGE_SIZE) {
-   if (series_stage != get_series_stage(*start, i))
-   break;
-   }
-   series.series_stage = series_stage;
-   series.series_length = i;
-}
-
-struct delay_tlb_task_t delay_tlb_tasks[32] = { [0 ... 31] = {0} };
-
-void timer_flush_tlb_tasks(void* data)
-{
-   // To x86, if we found there were some flushing tasks, we should do it all together, that is, flush it once.
-   int i;
-#ifdef CONFIG_X86
-   int flag = 0;
-#endif
-   for (i = 0; i < 32; i++) {
-   if (delay_tlb_tasks[i].mm != NULL &&
-   cpu_isset(smp_processor_id(), delay_tlb_tasks[i].mm->cpu_vm_mask) &&
-   cpu_isset(smp_processor_id(), delay_tlb_tasks[i].cpu_mask)) {
-#ifdef CONFIG_X86
-   flag = 1;
-#elif
-   // smp::local_flush_tlb_range(delay_tlb_tasks[i]);
-#endif
-   cpu_clear(smp_processor_id(), delay_tlb_tasks[i].cpu_mask);
-   }
-   }
-#ifdef CON