[jira] [Updated] (IGNITE-5884) Change default pageSize of page memory to 4KB

Denis Magda (JIRA) Fri, 18 Aug 2017 15:51:18 -0700

     [ 
https://issues.apache.org/jira/browse/IGNITE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Denis Magda updated IGNITE-5884:
--------------------------------
    Description: 
Checkpoint write speed is suboptimal with the default 2K page size in most 
UNIX-like environments with SSD disks. There are several reasons for this:
1) The Linux page cache uses 4K pages by default on most kernels (you can 
check yours with the "getconf PAGE_SIZE" command). With 2K random writes, the 
vm.dirty_ratio threshold is reached twice as fast as with 4K random writes, 
because each 2K write still dirties a whole 4K page in the page cache.
2) Most SSD manufacturers don't expose the actual disk page size, but they 
recommend writing at least 4K at once. Also, 4K blocks are the standard unit 
for benchmarking SSD random writes. 
Related question: 
https://superuser.com/questions/1168014/nvme-ssd-why-is-4k-writing-faster-than-reading
Article by Emmanuel Goossaert describing why writing less than a page is 
counterproductive: 
http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
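
The dirty-page arithmetic behind point 1 can be sketched as follows. This is a rough model, not Ignite code, and the class and method names are hypothetical. It assumes every random page write lands in a distinct, aligned OS page, and it treats vm.dirty_ratio as a percentage of total RAM (the kernel actually applies it to available memory, so real thresholds are usually lower than this estimate).

```java
final class DirtyRatioEstimate {
    /** OS page-cache bytes dirtied by one aligned write of pageSize bytes into untouched OS pages. */
    static long osBytesDirtiedPerWrite(long pageSize, long osPageSize) {
        long osPages = (pageSize + osPageSize - 1) / osPageSize; // round up to whole OS pages
        return osPages * osPageSize;
    }

    /** Checkpoint payload (bytes) that can be written before the dirty threshold is reached. */
    static long payloadUntilThreshold(long ramBytes, int dirtyRatioPercent, long pageSize, long osPageSize) {
        long thresholdBytes = ramBytes / 100 * dirtyRatioPercent;            // approximation, see lead-in
        long writes = thresholdBytes / osBytesDirtiedPerWrite(pageSize, osPageSize);
        return writes * pageSize;                                             // useful data actually written
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long ram = 100 * gb;  // 100 GB RAM, as in the benchmark setup
        int dirtyRatio = 10;  // vm.dirty_ratio=10
        long osPage = 4096;   // typical "getconf PAGE_SIZE" result on Linux

        for (long pageSize : new long[] {2048, 4096}) {
            double payloadGb = payloadUntilThreshold(ram, dirtyRatio, pageSize, osPage) / (double) gb;
            System.out.printf("pageSize=%d -> %.1f GB of pages before vm.dirty_ratio is reached%n",
                pageSize, payloadGb);
        }
    }
}
```

Under these assumptions the 100 GB / vm.dirty_ratio=10 setup has about 10 GB of dirty page-cache headroom: 4K pages fill it with 10 GB of checkpoint data, while 2K pages fill it after only 5 GB, since every 2K write costs a full 4K OS page.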
I've prepared a checkpoint emulation benchmark (code and results attached). A 
run on production-level hardware (CentOS, 100 GB RAM, total LFS size of 
100 GB, vm.dirty_ratio=10) showed that checkpointing with 4K pages is much 
more efficient than with 2K.
*Important: backward compatibility must be ensured for LFS files created with 
the old 2K default page size.*
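
One way to meet this requirement is to record the page size in each store file and let it override the new default whenever an existing file is opened. The sketch below assumes a hypothetical 8-byte header (a 4-byte magic value followed by the 4-byte page size) and treats any headerless file as a legacy 2K file; this is not Ignite's actual on-disk format, and the class, constants, and methods are illustrative only.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PageSizeResolver {
    static final int MAGIC = 0x4C465331;           // hypothetical header marker ("LFS1")
    static final int HEADER_SIZE = 8;              // magic (4 bytes) + page size (4 bytes)
    static final int NEW_DEFAULT_PAGE_SIZE = 4096; // default proposed in this issue
    static final int OLD_DEFAULT_PAGE_SIZE = 2048; // default the existing files were created with

    /** Page size for `file`: new default for new or empty files, otherwise the stored or legacy value. */
    static int resolve(Path file) throws IOException {
        if (!Files.exists(file) || Files.size(file) < HEADER_SIZE)
            return NEW_DEFAULT_PAGE_SIZE; // no existing data to stay compatible with
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            if (in.readInt() != MAGIC)
                return OLD_DEFAULT_PAGE_SIZE; // headerless file: assume it predates the 4K default
            return in.readInt();              // page size recorded when the file was created
        }
    }

    /** Creates (or overwrites) `file` with a header that records `pageSize`. */
    static void writeHeader(Path file, int pageSize) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(MAGIC);
            out.writeInt(pageSize);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lfs-sketch");
        Path legacy = dir.resolve("legacy.bin");
        Files.write(legacy, new byte[4096]); // stands in for a file written with the old 2K default
        Path current = dir.resolve("current.bin");
        writeHeader(current, NEW_DEFAULT_PAGE_SIZE);

        System.out.println(resolve(dir.resolve("missing.bin"))); // 4096
        System.out.println(resolve(legacy));                     // 2048
        System.out.println(resolve(current));                    // 4096
    }
}
```

The key design choice is that the stored value always wins over the configured default, so changing the default only affects files created after the upgrade.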

  was:
Checkpoint write speed is suboptimal with the default 2K page size in most 
UNIX-like environments with SSD disks. There are several reasons for this:
1) The Linux page cache uses 4K pages by default on most kernels (you can 
check yours with the "getconf PAGE_SIZE" command). With 2K random writes, the 
vm.dirty_ratio threshold is reached twice as fast as with 4K random writes.
2) Most SSD manufacturers don't reveal the actual disk page size, but they 
recommend writing at least 4K at once. Also, 4K blocks are the standard unit 
for benchmarking SSD random writes. 
Related question: 
https://superuser.com/questions/1168014/nvme-ssd-why-is-4k-writing-faster-than-reading
Article by Emmanuel Goossaert describing why writing less than a page is 
counterproductive: 
http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
I've prepared a checkpoint emulation benchmark (code and results attached). A 
run on production-level hardware (CentOS, 100 GB RAM, total LFS size of 
100 GB, vm.dirty_ratio=10) showed that checkpointing with 4K pages is much 
more efficient than with 2K.
*Important: backward compatibility must be ensured for LFS files created with 
the old 2K default page size.*


> Change default pageSize of page memory to 4KB
> ---------------------------------------------
>
>                 Key: IGNITE-5884
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5884
>             Project: Ignite
>          Issue Type: Improvement
>          Components: persistence
>            Reporter: Ivan Rakov
>            Assignee: Ivan Rakov
>            Priority: Critical
>              Labels: usability
>             Fix For: 2.2
>
>         Attachments: CpBenchmark.java, iostat.log, ssdlab.log



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
