Re: [PATCH] lightnvm: calculate device values correctly

2015-11-22 Thread Wenwei Tao
OK, I see. Thanks for the explanation.

2015-11-23 2:34 GMT+08:00 Matias <m...@lightnvm.io>:
> On 11/22/2015 02:51 PM, Wenwei Tao wrote:
>>
>> In the original calculation, the relationships among
>> block, plane and lun were confusing; refine them on the
>> basis of the Open-Channel SSD Interface Specification.
>>
>> Signed-off-by: Wenwei Tao <ww.tao0...@gmail.com>
>> ---
>>   drivers/lightnvm/core.c | 9 ++++-----
>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index f659e60..1864b94 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -174,8 +174,8 @@ static int nvm_core_init(struct nvm_dev *dev)
>> dev->nr_chnls = grp->num_ch;
>> dev->luns_per_chnl = grp->num_lun;
>> dev->pgs_per_blk = grp->num_pg;
>> -   dev->blks_per_lun = grp->num_blk;
>> dev->nr_planes = grp->num_pln;
>> +   dev->blks_per_lun = grp->num_blk * grp->num_pln;
>> dev->sec_size = grp->csecs;
>> dev->oob_size = grp->sos;
>> dev->sec_per_pg = grp->fpg_sz / grp->csecs;
>> @@ -191,13 +191,12 @@ static int nvm_core_init(struct nvm_dev *dev)
>> dev->plane_mode = NVM_PLANE_QUAD;
>>
>> /* calculated values */
>> -   dev->sec_per_pl = dev->sec_per_pg * dev->nr_planes;
>> -   dev->sec_per_blk = dev->sec_per_pl * dev->pgs_per_blk;
>> +   dev->sec_per_blk = dev->sec_per_pg * dev->pgs_per_blk;
>> +   dev->sec_per_pl = dev->sec_per_blk * grp->num_blk;
>> dev->sec_per_lun = dev->sec_per_blk * dev->blks_per_lun;
>> dev->nr_luns = dev->luns_per_chnl * dev->nr_chnls;
>>
>> -   dev->total_blocks = dev->nr_planes *
>> -   dev->blks_per_lun *
>> +   dev->total_blocks = dev->blks_per_lun *
>> dev->luns_per_chnl *
>> dev->nr_chnls;
>> dev->total_pages = dev->total_blocks * dev->pgs_per_blk;
>>
>
> The reason I had it as before was that I wanted to drive the device in
> either single/double/quad read/write/erase plane mode. That way we could get
> away with only managing 1/4 or 1/2 of the block metadata.
>
> What is your use case? It could make sense, but it will require a little
> more work to build up the framework for having the various modes, i.e.
> detect the supported plane mode, make an explicit decision about which
> plane mode to always use (or decide dynamically), and afterwards allocate
> the appropriate data structures.
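
A rough sketch of what that framework could look like (the helper names and
the clamping policy below are illustrative, not the driver's API): pick a
plane mode no larger than what the device reports, then size the block
metadata for only the units actually driven.

enum plane_mode { PM_SINGLE = 1, PM_DOUBLE = 2, PM_QUAD = 4 };

/* clamp the requested mode to what the hardware reports (1, 2 or 4) */
static enum plane_mode choose_plane_mode(int dev_planes, enum plane_mode wanted)
{
	return wanted <= dev_planes ? wanted : (enum plane_mode)dev_planes;
}

/* in double/quad mode, plane-blocks are striped and tracked as one unit,
 * so the block metadata shrinks by the same factor */
static int managed_blocks(int num_blk, int num_pln, enum plane_mode mode)
{
	return num_blk * num_pln / (int)mode;
}

With num_blk = 1024 and num_pln = 4, quad mode leaves 1024 units to track,
double mode 2048, and single mode all 4096 plane-blocks, which is the 1/4
and 1/2 metadata saving described above.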


Re: [PATCH] lightnvm: calculate device values correctly

2015-11-22 Thread Wenwei Tao
I'm new to flash; if I'm wrong about this patch, please correct me.
Thanks.

2015-11-22 21:51 GMT+08:00 Wenwei Tao <ww.tao0...@gmail.com>:
> In the original calculation, the relationships among
> block, plane and lun were confusing; refine them on the
> basis of the Open-Channel SSD Interface Specification.
>
> Signed-off-by: Wenwei Tao <ww.tao0...@gmail.com>
> ---
>  drivers/lightnvm/core.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index f659e60..1864b94 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -174,8 +174,8 @@ static int nvm_core_init(struct nvm_dev *dev)
> dev->nr_chnls = grp->num_ch;
> dev->luns_per_chnl = grp->num_lun;
> dev->pgs_per_blk = grp->num_pg;
> -   dev->blks_per_lun = grp->num_blk;
> dev->nr_planes = grp->num_pln;
> +   dev->blks_per_lun = grp->num_blk * grp->num_pln;
> dev->sec_size = grp->csecs;
> dev->oob_size = grp->sos;
> dev->sec_per_pg = grp->fpg_sz / grp->csecs;
> @@ -191,13 +191,12 @@ static int nvm_core_init(struct nvm_dev *dev)
> dev->plane_mode = NVM_PLANE_QUAD;
>
> /* calculated values */
> -   dev->sec_per_pl = dev->sec_per_pg * dev->nr_planes;
> -   dev->sec_per_blk = dev->sec_per_pl * dev->pgs_per_blk;
> +   dev->sec_per_blk = dev->sec_per_pg * dev->pgs_per_blk;
> +   dev->sec_per_pl = dev->sec_per_blk * grp->num_blk;
> dev->sec_per_lun = dev->sec_per_blk * dev->blks_per_lun;
> dev->nr_luns = dev->luns_per_chnl * dev->nr_chnls;
>
> -   dev->total_blocks = dev->nr_planes *
> -   dev->blks_per_lun *
> +   dev->total_blocks = dev->blks_per_lun *
> dev->luns_per_chnl *
> dev->nr_chnls;
> dev->total_pages = dev->total_blocks * dev->pgs_per_blk;
> --
> 1.9.1
>


[PATCH] lightnvm: calculate device values correctly

2015-11-22 Thread Wenwei Tao
In the original calculation, the relationships among
block, plane and lun were confusing; refine them on the
basis of the Open-Channel SSD Interface Specification.

Signed-off-by: Wenwei Tao <ww.tao0...@gmail.com>
---
 drivers/lightnvm/core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index f659e60..1864b94 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -174,8 +174,8 @@ static int nvm_core_init(struct nvm_dev *dev)
dev->nr_chnls = grp->num_ch;
dev->luns_per_chnl = grp->num_lun;
dev->pgs_per_blk = grp->num_pg;
-   dev->blks_per_lun = grp->num_blk;
dev->nr_planes = grp->num_pln;
+   dev->blks_per_lun = grp->num_blk * grp->num_pln;
dev->sec_size = grp->csecs;
dev->oob_size = grp->sos;
dev->sec_per_pg = grp->fpg_sz / grp->csecs;
@@ -191,13 +191,12 @@ static int nvm_core_init(struct nvm_dev *dev)
dev->plane_mode = NVM_PLANE_QUAD;
 
/* calculated values */
-   dev->sec_per_pl = dev->sec_per_pg * dev->nr_planes;
-   dev->sec_per_blk = dev->sec_per_pl * dev->pgs_per_blk;
+   dev->sec_per_blk = dev->sec_per_pg * dev->pgs_per_blk;
+   dev->sec_per_pl = dev->sec_per_blk * grp->num_blk;
dev->sec_per_lun = dev->sec_per_blk * dev->blks_per_lun;
dev->nr_luns = dev->luns_per_chnl * dev->nr_chnls;
 
-   dev->total_blocks = dev->nr_planes *
-   dev->blks_per_lun *
+   dev->total_blocks = dev->blks_per_lun *
dev->luns_per_chnl *
dev->nr_chnls;
dev->total_pages = dev->total_blocks * dev->pgs_per_blk;
-- 
1.9.1
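
As a quick sanity check on the arithmetic this patch changes, here is a
standalone example with a hypothetical geometry (2 channels, 4 LUNs per
channel, 4 planes, 1024 blocks, 256 pages per block, 16 KB flash pages of
4 KB sectors); none of the numbers come from real hardware:

#include <stdio.h>

int main(void)
{
	int num_ch = 2, num_lun = 4, num_pln = 4, num_blk = 1024;
	int num_pg = 256, sec_per_pg = 4;	/* fpg_sz / csecs = 16384 / 4096 */

	/* before the patch: planes multiply the per-page sector count */
	int old_blks_per_lun = num_blk;				/* 1024 */
	int old_sec_per_blk  = (sec_per_pg * num_pln) * num_pg;	/* 4096 */
	int old_total_blocks = num_pln * old_blks_per_lun * num_lun * num_ch;

	/* after the patch: planes multiply the per-LUN block count */
	int new_blks_per_lun = num_blk * num_pln;		/* 4096 */
	int new_sec_per_blk  = sec_per_pg * num_pg;		/* 1024 */
	int new_total_blocks = new_blks_per_lun * num_lun * num_ch;

	/* per-LUN sectors and device totals agree either way */
	printf("sec_per_lun:  old=%d new=%d\n",
	       old_sec_per_blk * old_blks_per_lun,
	       new_sec_per_blk * new_blks_per_lun);	/* 4194304 both */
	printf("total_blocks: old=%d new=%d\n",
	       old_total_blocks, new_total_blocks);	/* 32768 both */
	return 0;
}

The totals match; what the patch changes is which intermediate value carries
the plane multiplier, so that a block counted in blks_per_lun means one
plane-block rather than a plane-wide stripe.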



[PATCH] lightnvm: do device max sectors boundary check first

2015-11-22 Thread Wenwei Tao
Do the device max_phys_sect boundary check first; otherwise
we will allocate dma_pools for devices whose max sectors are
beyond what lightnvm supports, and still register them.

Signed-off-by: Wenwei Tao <ww.tao0...@gmail.com>
---
 drivers/lightnvm/core.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index f659e60..7ecf848 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -312,6 +312,11 @@ int nvm_register(struct request_queue *q, char *disk_name,
list_add(&dev->devices, &nvm_devices);
up_write(&nvm_lock);
 
+   if (dev->ops->max_phys_sect > 256) {
+   pr_info("nvm: max sectors supported is 256.\n");
+   return -EINVAL;
+   }
+
if (dev->ops->max_phys_sect > 1) {
dev->ppalist_pool = dev->ops->create_dma_pool(dev->q,
"ppalist");
@@ -319,9 +324,6 @@ int nvm_register(struct request_queue *q, char *disk_name,
pr_err("nvm: could not create ppa pool\n");
return -ENOMEM;
}
-   } else if (dev->ops->max_phys_sect > 256) {
-   pr_info("nvm: max sectors supported is 256.\n");
-   return -EINVAL;
}
 
return 0;
-- 
1.9.1
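
The bug is the if/else-if chain: any max_phys_sect above 256 is also above
1, so the old code took the allocation branch first and the rejection was
dead code. A small standalone demo of the reordering (the values and stubs
below are made up for illustration):

#include <stdio.h>

/* old ordering: 512 matches "> 1" first and the "> 256" check is dead */
static int register_old(int max_phys_sect)
{
	if (max_phys_sect > 1)
		printf("old: created dma pool for %d sectors\n", max_phys_sect);
	else if (max_phys_sect > 256)
		return -1;	/* unreachable: > 256 implies > 1 */
	return 0;
}

/* new ordering: reject before any allocation happens */
static int register_new(int max_phys_sect)
{
	if (max_phys_sect > 256)
		return -1;
	if (max_phys_sect > 1)
		printf("new: created dma pool for %d sectors\n", max_phys_sect);
	return 0;
}

int main(void)
{
	printf("old(512) = %d\n", register_old(512));	/* 0: wrongly accepted */
	printf("new(512) = %d\n", register_new(512));	/* -1: rejected */
	return 0;
}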



[PATCH] nvme: lightnvm: use nvme admin queue

2015-11-19 Thread Wenwei Tao
According to the Open-Channel SSD Interface Specification 0.1,
NVMe-NVM admin commands use vendor-specific admin opcodes
of NVMe, so use the NVMe admin queue to dispatch these
commands.

Signed-off-by: Wenwei Tao <ww.tao0...@gmail.com>
---
 drivers/nvme/host/lightnvm.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e0b7b95..7d1981d 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -244,6 +244,7 @@ static int init_grps(struct nvm_id *nvm_id, struct nvme_nvm_id *nvme_nvm_id)
 static int nvme_nvm_identity(struct request_queue *q, struct nvm_id *nvm_id)
 {
struct nvme_ns *ns = q->queuedata;
+   struct nvme_dev *dev = ns->dev;
struct nvme_nvm_id *nvme_nvm_id;
struct nvme_nvm_command c = {};
int ret;
@@ -256,8 +257,8 @@ static int nvme_nvm_identity(struct request_queue *q, struct nvm_id *nvm_id)
if (!nvme_nvm_id)
return -ENOMEM;
 
-   ret = nvme_submit_sync_cmd(q, (struct nvme_command *)&c, nvme_nvm_id,
-   sizeof(struct nvme_nvm_id));
+   ret = nvme_submit_sync_cmd(dev->admin_q, (struct nvme_command *)&c,
+   nvme_nvm_id, sizeof(struct nvme_nvm_id));
if (ret) {
ret = -EIO;
goto out;
@@ -299,8 +300,8 @@ static int nvme_nvm_get_l2p_tbl(struct request_queue *q, u64 slba, u32 nlb,
c.l2p.slba = cpu_to_le64(cmd_slba);
c.l2p.nlb = cpu_to_le32(cmd_nlb);
 
-   ret = nvme_submit_sync_cmd(q, (struct nvme_command *)&c,
-   entries, len);
+   ret = nvme_submit_sync_cmd(dev->admin_q,
+   (struct nvme_command *)&c, entries, len);
if (ret) {
dev_err(dev->dev, "L2P table transfer failed (%d)\n",
ret);
@@ -343,8 +344,8 @@ static int nvme_nvm_get_bb_tbl(struct request_queue *q, int lunid,
 
bitmap_zero(bb_bitmap, nr_blocks);
 
-   ret = nvme_submit_sync_cmd(q, (struct nvme_command *)&c, bb_bitmap,
-   bb_bitmap_size);
+   ret = nvme_submit_sync_cmd(dev->admin_q, (struct nvme_command *)&c,
+   bb_bitmap, bb_bitmap_size);
if (ret) {
dev_err(dev->dev, "get bad block table failed (%d)\n", ret);
ret = -EIO;
-- 
1.8.3.1
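
Reduced to a sketch, the rule the patch applies is: commands built from
vendor-specific admin opcodes go to the controller-wide admin queue, while
normal I/O stays on the namespace queue. The pared-down types below are
illustrative, not the driver's real definitions:

struct request_queue;				/* opaque here */

struct nvme_dev { struct request_queue *admin_q; };
struct nvme_ns  { struct nvme_dev *dev; struct request_queue *queue; };

/* pick the dispatch queue by command class */
static struct request_queue *queue_for_cmd(struct nvme_ns *ns, int is_admin)
{
	return is_admin ? ns->dev->admin_q : ns->queue;
}

All three converted callers (identity, get_l2p_tbl, get_bb_tbl) fall on the
admin side of this split.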



[PATCH] lightnvm: fix wrong return value and remove nvm_free(dev) in nvm_init()

2015-11-19 Thread Wenwei Tao
The return value should be non-zero under error conditions.
Remove nvm_free(dev) to avoid freeing dev more than once.

Signed-off-by: Wenwei Tao <ww.tao0...@gmail.com>
---
 drivers/lightnvm/core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index f659e60..1942752 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -220,14 +220,13 @@ static void nvm_free(struct nvm_dev *dev)
 static int nvm_init(struct nvm_dev *dev)
 {
struct nvmm_type *mt;
-   int ret = 0;
+   int ret = -EINVAL;
 
if (!dev->q || !dev->ops)
-   return -EINVAL;
+   return ret;
 
if (dev->ops->identity(dev->q, &dev->identity)) {
pr_err("nvm: device could not be identified\n");
-   ret = -EINVAL;
goto err;
}
 
@@ -273,7 +272,6 @@ static int nvm_init(struct nvm_dev *dev)
dev->nr_chnls);
return 0;
 err:
-   nvm_free(dev);
pr_err("nvm: failed to initialize nvm\n");
return ret;
 }
-- 
1.9.1
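
To see the double free this patch avoids, here is a minimal standalone
sketch of the ownership rule; the real nvm_register() already frees dev on
its error path, and the stub bodies below are illustrative:

#include <stdio.h>
#include <stdlib.h>

struct nvm_dev { int dummy; };

static void nvm_free(struct nvm_dev *dev) { free(dev); }

/* after the patch: report failure with a non-zero code, free nothing */
static int nvm_init(struct nvm_dev *dev)
{
	int ret = -22;			/* -EINVAL */
	if (!dev)
		return ret;
	/* ... identify the device, set up geometry ... */
	return 0;
}

/* the caller is the single owner of dev on the error path */
static int nvm_register_sketch(void)
{
	struct nvm_dev *dev = calloc(1, sizeof(*dev));
	int ret;
	if (!dev)
		return -12;		/* -ENOMEM */
	ret = nvm_init(dev);
	if (ret)
		nvm_free(dev);		/* freed exactly once, here */
	return ret;
}

int main(void) { return nvm_register_sketch() ? 1 : 0; }

Had nvm_init() also called nvm_free() before returning, dev would be freed
twice.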



Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

2015-07-07 Thread wenwei tao
Hi Scott

I understand what you said.

I will use the function 'is_vm_hugetlb_page()' to hide the bit
combinations according to your comments in the next version of patch
set.

But for a situation like the one below, there isn't an obvious 'vma'
structure, and using 'is_vm_hugetlb_page()' may be costly or even impossible.
void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned long vmflag)
{
...

if (end == TLB_FLUSH_ALL || tlb_flushall_shift == -1
|| vmflag & VM_HUGETLB) {
local_flush_tlb();
goto flush_all;
}
...
}


Thank you
Wenwei

2015-07-07 5:34 GMT+08:00 Scott Wood <scottw...@freescale.com>:
> On Fri, 2015-07-03 at 16:47 +0800, wenwei tao wrote:
>> Hi Scott
>>
>> Thank you for your comments.
>>
>> The kernel already has that function, is_vm_hugetlb_page(), but the
>> original code didn't use it; to keep the coding style of the original
>> code, I didn't use it either.
>>
>> For an expression like "vma->vm_flags & VM_HUGETLB", hiding it behind
>> 'is_vm_hugetlb_page()' is fine, but for an expression like "vma->vm_flags &
>> (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)", which appears in patch 2/6, is it
>> better to hide the bit combination behind is_vm_hugetlb_page()? In my
>> patch I just replaced it with "vma->vm_flags & (VM_LOCKED|VM_PFNMAP) ||
>> (vma->vm_flags & (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB".
>
> If you're going to do non-obvious things with the flags, it should be done in
> one place rather than throughout the code.  Why would you do the above and
> not "vma->vm_flags & (VM_LOCKED | VM_PFNMAP) || is_vm_hugetlb_page(vma)"?
>
> -Scott
>
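
One way to keep the flag trick in one place, as Scott suggests, while still
covering flag-only paths like flush_tlb_mm_range(), is a helper that takes
just the flags word. The name vm_flags_is_hugetlb() is illustrative, not
something from the posted series:

/* a real hugetlb mapping has VM_HUGETLB set and VM_MERGEABLE clear,
 * because VM_HUGETLB doubles as the new-mergeable marker in this series */
static inline bool vm_flags_is_hugetlb(unsigned long vm_flags)
{
	return (vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB;
}

static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
{
	return vm_flags_is_hugetlb(vma->vm_flags);
}

flush_tlb_mm_range() could then test vm_flags_is_hugetlb(vmflag) without
needing a vma at all.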


Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

2015-07-03 Thread wenwei tao
Hi Scott

Thank you for your comments.

The kernel already has that function, is_vm_hugetlb_page(), but the
original code didn't use it; to keep the coding style of the original
code, I didn't use it either.

For an expression like "vma->vm_flags & VM_HUGETLB", hiding it behind
'is_vm_hugetlb_page()' is fine, but for an expression like "vma->vm_flags &
(VM_LOCKED|VM_HUGETLB|VM_PFNMAP)", which appears in patch 2/6, is it
better to hide the bit combination behind is_vm_hugetlb_page()? In my
patch I just replaced it with "vma->vm_flags & (VM_LOCKED|VM_PFNMAP) ||
(vma->vm_flags & (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB".

I am a newbie to the Linux kernel; do you have any good suggestions
for this situation?

Thank you
Wenwei

2015-07-03 5:49 GMT+08:00 Scott Wood <scottw...@freescale.com>:
> On Wed, 2015-06-10 at 14:27 +0800, Wenwei Tao wrote:
>> Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB
>> and VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate
>> new mergeable VMAs. Because of that, a VMA which has VM_HUGETLB set is a
>> hugetlb VMA only if it does not also have VM_MERGEABLE set.
>
> Eww.
>
> If you must overload such bit combinations, please hide it behind a
> vm_is_hugetlb() function.
>
> -Scott
>


[RFC PATCH 3/6] perf: change the condition of identifying hugetlb vm

2015-06-10 Thread Wenwei Tao
Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA which has VM_HUGETLB set is a hugetlb
VMA only if it does not also have VM_MERGEABLE set.

Signed-off-by: Wenwei Tao <wenweitaowen...@gmail.com>
---
 kernel/events/core.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f04daab..6313bdd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5624,7 +5624,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
flags |= MAP_EXECUTABLE;
if (vma->vm_flags & VM_LOCKED)
flags |= MAP_LOCKED;
-   if (vma->vm_flags & VM_HUGETLB)
+   if ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
flags |= MAP_HUGETLB;
 
goto got_name;
-- 
1.7.9.5



[RFC PATCH 2/6] mm: change the condition of identifying hugetlb vm

2015-06-10 Thread Wenwei Tao
Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA which has VM_HUGETLB set is a hugetlb
VMA only if it does not also have VM_MERGEABLE set.

Signed-off-by: Wenwei Tao <wenweitaowen...@gmail.com>
---
 include/linux/hugetlb_inline.h |2 +-
 include/linux/mempolicy.h  |2 +-
 mm/gup.c   |6 ++++--
 mm/huge_memory.c   |   17 ++++++++++++-----
 mm/madvise.c   |6 ++++--
 mm/memory.c|5 +++--
 mm/mprotect.c  |6 ++++--
 7 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index 2bb681f..08dff6f 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -7,7 +7,7 @@
 
 static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 {
-   return !!(vma->vm_flags & VM_HUGETLB);
+   return !!((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB);
 }
 
 #else
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 3d385c8..40ad136 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -178,7 +178,7 @@ static inline int vma_migratable(struct vm_area_struct *vma)
return 0;
 
 #ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
-   if (vma->vm_flags & VM_HUGETLB)
+   if ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return 0;
 #endif
 
diff --git a/mm/gup.c b/mm/gup.c
index a6e24e2..5803dab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -166,7 +166,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
pud = pud_offset(pgd, address);
if (pud_none(*pud))
return no_page_table(vma, flags);
-   if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
+   if (pud_huge(*pud) &&
+   (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) {
page = follow_huge_pud(mm, address, pud, flags);
if (page)
return page;
@@ -178,7 +179,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
pmd = pmd_offset(pud, address);
if (pmd_none(*pmd))
return no_page_table(vma, flags);
-   if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
+   if (pmd_huge(*pmd) &&
+   (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) {
page = follow_huge_pmd(mm, address, pmd, flags);
if (page)
return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc00c8c..5a9de7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1910,7 +1910,6 @@ out:
return ret;
 }
 
-#define VM_NO_THP (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)
 
 int hugepage_madvise(struct vm_area_struct *vma,
 unsigned long *vm_flags, int advice)
@@ -1929,7 +1928,9 @@ int hugepage_madvise(struct vm_area_struct *vma,
/*
 * Be somewhat over-protective like KSM for now!
 */
-   if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
+   if (*vm_flags & (VM_HUGEPAGE | VM_SPECIAL |
+   VM_SHARED | VM_MAYSHARE) ||
+   (*vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return -EINVAL;
*vm_flags &= ~VM_NOHUGEPAGE;
*vm_flags |= VM_HUGEPAGE;
@@ -1945,7 +1946,9 @@ int hugepage_madvise(struct vm_area_struct *vma,
/*
 * Be somewhat over-protective like KSM for now!
 */
-   if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
+   if (*vm_flags & (VM_NOHUGEPAGE | VM_SPECIAL |
+   VM_SHARED | VM_MAYSHARE) ||
+   (*vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return -EINVAL;
*vm_flags &= ~VM_HUGEPAGE;
*vm_flags |= VM_NOHUGEPAGE;
@@ -2052,7 +2055,8 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
if (vma->vm_ops)
/* khugepaged not yet working on file or special mappings */
return 0;
-   VM_BUG_ON_VMA(vm_flags & VM_NO_THP, vma);
+   VM_BUG_ON_VMA(vm_flags & (VM_SPECIAL | VM_SHARED | VM_MAYSHARE) ||
+   (vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB, vma);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
if (hstart < hend)
@@ -2396,7 +2400,10 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
return false;
if (is_vma_temporary_stack(vma))
return false;
-   VM_BUG_ON_VMA(vma->vm_flags & VM_NO_THP, vma);
+   VM_BUG_ON_VMA(vma->vm_flags

[RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

2015-06-10 Thread Wenwei Tao
Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA which has VM_HUGETLB set is a hugetlb
VMA only if it does not also have VM_MERGEABLE set.

Signed-off-by: Wenwei Tao <wenweitaowen...@gmail.com>
---
 arch/powerpc/kvm/e500_mmu_host.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4..d76f518 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -423,7 +423,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
break;
}
} else if (vma && hva >= vma->vm_start &&
-  (vma->vm_flags & VM_HUGETLB)) {
+   ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) ==
+   VM_HUGETLB)) {
unsigned long psize = vma_kernel_pagesize(vma);
 
tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >>
-- 
1.7.9.5



[RFC PATCH 5/6] x86/mm: change the condition of identifying hugetlb vm

2015-06-10 Thread Wenwei Tao
Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA which has VM_HUGETLB set is a hugetlb
VMA only if it does not also have VM_MERGEABLE set.

Signed-off-by: Wenwei Tao <wenweitaowen...@gmail.com>
---
 arch/x86/mm/tlb.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3250f23..0247916 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -195,7 +195,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
goto out;
}
 
-   if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
+   if ((end != TLB_FLUSH_ALL) &&
+   !((vmflag & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB))
base_pages_to_flush = (end - start) >> PAGE_SHIFT;
 
if (base_pages_to_flush > tlb_single_page_flush_ceiling) {
-- 
1.7.9.5



[RFC PATCH 4/6] fs/binfmt_elf.c: change the condition of identifying hugetlb vm

2015-06-10 Thread Wenwei Tao
Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA which has VM_HUGETLB set is a hugetlb
VMA only if it does not also have VM_MERGEABLE set.

Signed-off-by: Wenwei Tao <wenweitaowen...@gmail.com>
---
 fs/binfmt_elf.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 995986b..f529c8e 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1242,7 +1242,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
return 0;
 
/* Hugetlb memory check */
-   if (vma->vm_flags & VM_HUGETLB) {
+   if ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) {
if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
-- 
1.7.9.5



[RFC PATCH 1/6] mm: add defer mechanism to ksm to make it more suitable

2015-06-10 Thread Wenwei Tao
I observe that it is unlikely for KSM to merge new pages from an area
that has already been scanned twice on Android mobile devices, so it is
a waste of power to keep scanning these areas at high frequency.
In this patch a defer mechanism, borrowed from page compaction, is
introduced to KSM. It automatically lowers the scan frequency in the
above case.

Signed-off-by: Wenwei Tao <wenweitaowen...@gmail.com>
---
 mm/ksm.c |  230 ++
 1 file changed, 203 insertions(+), 27 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 4162dce..54ffcb2 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -104,6 +104,7 @@ struct mm_slot {
struct list_head mm_list;
struct rmap_item *rmap_list;
struct mm_struct *mm;
+   unsigned long seqnr;
 };
 
 /**
@@ -117,9 +118,12 @@ struct mm_slot {
  */
 struct ksm_scan {
struct mm_slot *mm_slot;
+   struct mm_slot *active_slot;
unsigned long address;
struct rmap_item **rmap_list;
unsigned long seqnr;
+   unsigned long ksm_considered;
+   unsigned int ksm_defer_shift;
 };
 
 /**
@@ -182,6 +186,11 @@ struct rmap_item {
 #define UNSTABLE_FLAG  0x100   /* is a node of the unstable tree */
 #define STABLE_FLAG0x200   /* is listed from the stable tree */
 
+#define ACTIVE_SLOT_FLAG   0x100
+#define ACTIVE_SLOT_SEQNR  0x200
+#define KSM_MAX_DEFER_SHIFT6
+
+
 /* The stable and unstable tree heads */
 static struct rb_root one_stable_tree[1] = { RB_ROOT };
 static struct rb_root one_unstable_tree[1] = { RB_ROOT };
@@ -197,14 +206,22 @@ static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 static struct mm_slot ksm_mm_head = {
.mm_list = LIST_HEAD_INIT(ksm_mm_head.mm_list),
 };
+
+static struct mm_slot ksm_mm_active = {
+   .mm_list = LIST_HEAD_INIT(ksm_mm_active.mm_list),
+};
+
 static struct ksm_scan ksm_scan = {
.mm_slot = &ksm_mm_head,
+   .active_slot = &ksm_mm_active,
 };
 
 static struct kmem_cache *rmap_item_cache;
 static struct kmem_cache *stable_node_cache;
 static struct kmem_cache *mm_slot_cache;
 
+static bool ksm_merged_or_unstable;
+
 /* The number of nodes in the stable tree */
 static unsigned long ksm_pages_shared;
 
@@ -336,6 +353,23 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
hash_add(mm_slots_hash, &mm_slot->link, (unsigned long)mm);
 }
 
+static void move_to_active_list(struct mm_slot *mm_slot)
+{
+   if (mm_slot && !(mm_slot->seqnr & ACTIVE_SLOT_FLAG)) {
+   if (ksm_run & KSM_RUN_UNMERGE && mm_slot->rmap_list)
+   return;
+   if (mm_slot == ksm_scan.mm_slot) {
+   if (ksm_scan.active_slot == &ksm_mm_active)
+   return;
+   ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
+   struct mm_slot, mm_list);
+   }
+   list_move_tail(&mm_slot->mm_list,
+   &ksm_scan.active_slot->mm_list);
+   mm_slot->seqnr |= (ACTIVE_SLOT_FLAG | ACTIVE_SLOT_SEQNR);
+   }
+}
+
 /*
  * ksmd, and unmerge_and_remove_all_rmap_items(), must not touch an mm's
  * page tables after it has passed through ksm_exit() - which, if necessary,
@@ -772,6 +806,15 @@ static int unmerge_and_remove_all_rmap_items(void)
int err = 0;
 
spin_lock(&ksm_mmlist_lock);
+   mm_slot = list_entry(ksm_mm_active.mm_list.next,
+   struct mm_slot, mm_list);
+   while (mm_slot != &ksm_mm_active) {
+   list_move_tail(&mm_slot->mm_list, &ksm_mm_head.mm_list);
+   mm_slot->seqnr &= ~(ACTIVE_SLOT_FLAG | ACTIVE_SLOT_SEQNR);
+   mm_slot = list_entry(ksm_mm_active.mm_list.next,
+   struct mm_slot, mm_list);
+   }
+   ksm_scan.active_slot = &ksm_mm_active;
ksm_scan.mm_slot = list_entry(ksm_mm_head.mm_list.next,
struct mm_slot, mm_list);
spin_unlock(&ksm_mmlist_lock);
@@ -790,8 +833,8 @@ static int unmerge_and_remove_all_rmap_items(void)
if (err)
goto error;
}
-
remove_trailing_rmap_items(mm_slot, &mm_slot->rmap_list);
+   mm_slot->seqnr = 0;
 
spin_lock(&ksm_mmlist_lock);
ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
@@ -806,6 +849,7 @@ static int unmerge_and_remove_all_rmap_items(void)
up_read(&mm->mmap_sem);
mmdrop(mm);
} else {
+   move_to_active_list(mm_slot);
spin_unlock(&ksm_mmlist_lock);
up_read(&mm->mmap_sem);
}
@@ -1401,6 +1445,9 @@ static void stable_tree_append(struct rmap_item *rmap_item,
ksm_pages_sharing++;
else
ksm_pages_shared++;

[RFC PATCH 0/6] add defer mechanism to ksm to make it more suitable for Android devices

2015-06-10 Thread Wenwei Tao
I observe that it is unlikely for KSM to merge new pages from an area
that has already been scanned twice on Android mobile devices, so it's
a waste of power to scan these areas at high frequency. In this patchset,
a defer mechanism borrowed from page compaction is introduced to KSM.

A new slot list called active_slot is added to ksm_scan. MMs which have
a VMA marked for merging via madvise are added (if the MM is new to KSM)
or moved (if the MM is in the ksm_scan.mm_slot list) to active_slot. In
"scan_get_next_rmap_item()", the active_slot list is scanned first unless
it is empty, then the mm_slot list. MMs in the active_slot list are
scanned twice; after that they are moved to the mm_slot list. Once the
mm_slot list is being scanned, the defer mechanism is activated:

a) if KSM scans "ksm_thread_pages_to_scan" pages and none of them get
merged or become unstable, increase ksm_defer_shift (a new member of
ksm_scan) by one (no more than 6 for now). For the next
"1UL << ksm_scan.ksm_defer_shift" times KSM is scheduled or woken up,
it will not do the actual scan, compare, and merge job; it just
schedules out.

b) if KSM scans "ksm_thread_pages_to_scan" pages and more than zero of
them get merged or become unstable, reset ksm_defer_shift and
ksm_considered to zero.

Some applications may keep producing new mergeable VMAs for KSM; in order
to avoid scanning VMAs of these applications that have already been
scanned twice, we use VM_HUGETLB to indicate new mergeable VMAs, since
hugetlb VMAs are not supported by KSM.
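
A minimal sketch of the defer policy in a) and b): KSM_MAX_DEFER_SHIFT and
the field names mirror the patch, but the two helpers below are illustrative
rather than the patch's exact code:

#define KSM_MAX_DEFER_SHIFT 6

struct ksm_defer {
	unsigned long considered;	/* wakeups skipped so far */
	unsigned int defer_shift;	/* skip 1UL << defer_shift wakeups */
};

/* called once per ksmd wakeup; true means this wakeup should scan */
static bool ksm_should_scan(struct ksm_defer *d)
{
	if (d->defer_shift == 0)
		return true;
	if (++d->considered < (1UL << d->defer_shift))
		return false;	/* still deferring: just schedule out */
	d->considered = 0;
	return true;
}

/* called after each full "ksm_thread_pages_to_scan" batch */
static void ksm_update_defer(struct ksm_defer *d, bool merged_or_unstable)
{
	if (merged_or_unstable) {
		/* case b): progress was made, back to full frequency */
		d->defer_shift = 0;
		d->considered = 0;
	} else if (d->defer_shift < KSM_MAX_DEFER_SHIFT) {
		/* case a): no progress, back off exponentially */
		d->defer_shift++;
	}
}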

Wenwei Tao (6):
  mm: add defer mechanism to ksm to make it more suitable
  mm: change the condition of identifying hugetlb vm
  perf: change the condition of identifying hugetlb vm
  fs/binfmt_elf.c: change the condition of identifying hugetlb vm
  x86/mm: change the condition of identifying hugetlb vm
  powerpc/kvm: change the condition of identifying hugetlb vm

 arch/powerpc/kvm/e500_mmu_host.c |3 +-
 arch/x86/mm/tlb.c|3 +-
 fs/binfmt_elf.c  |2 +-
 include/linux/hugetlb_inline.h   |2 +-
 include/linux/mempolicy.h|2 +-
 kernel/events/core.c |2 +-
 mm/gup.c |6 +-
 mm/huge_memory.c |   17 ++-
 mm/ksm.c |  230 +-
 mm/madvise.c |6 +-
 mm/memory.c  |5 +-
 mm/mprotect.c|6 +-
 12 files changed, 238 insertions(+), 46 deletions(-)

-- 
1.7.9.5
