[ https://issues.apache.org/jira/browse/IGNITE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Magda updated IGNITE-5884:
--------------------------------
Description:
Checkpoint write speed is suboptimal with the default 2K page size in most Unix-like environments with SSD disks. There are several reasons for this:
1) The Linux page cache uses 4K pages by default on most kernels (you can check yours with the "getconf PAGE_SIZE" command). With 2K random writes, the vm.dirty_ratio threshold is reached twice as fast as with 4K random writes.
2) Most SSD manufacturers don't expose the actual disk page size, but they recommend writing at least 4K at once. Also, 4K blocks are used when benchmarking SSD random writes.
Related question: https://superuser.com/questions/1168014/nvme-ssd-why-is-4k-writing-faster-than-reading
Article by Emmanuel Goossaert describing why writing less than a page is counterproductive: http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
I've prepared a checkpoint emulation benchmark (code and results attached). Runs on production-level hardware (CentOS, 100 GB RAM, total LFS size 100 GB, vm.dirty_ratio=10) showed that checkpointing with 4K pages is much more efficient than with 2K pages.
*Important: backwards compatibility must be ensured with LFS files created with the old 2K default page size.*

was:
Checkpoint write speed is suboptimal with the default 2K page size in most Unix-like environments with SSD disks. There are several reasons for this:
1) The Linux page cache uses 4K pages by default on most kernels (you can check yours with the "getconf PAGE_SIZE" command). With 2K random writes, the vm.dirty_ratio threshold is reached twice as fast as with 4K random writes.
2) Most SSD manufacturers don't reveal the actual disk page size, but they recommend writing at least 4K at once. Also, 4K blocks are used when benchmarking SSD random writes.
Related question: https://superuser.com/questions/1168014/nvme-ssd-why-is-4k-writing-faster-than-reading
Article by Emmanuel Goossaert describing why writing less than a page is counterproductive: http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
I've prepared a checkpoint emulation benchmark (code and results attached). Runs on production-level hardware (CentOS, 100 GB RAM, total LFS size 100 GB, vm.dirty_ratio=10) showed that checkpointing with 4K pages is much more efficient than with 2K pages.
*Important: backwards compatibility must be ensured with LFS files created with the old 2K default page size.*

> Change default pageSize of page memory to 4KB
> ---------------------------------------------
>
>                 Key: IGNITE-5884
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5884
>             Project: Ignite
>          Issue Type: Improvement
>          Components: persistence
>            Reporter: Ivan Rakov
>            Assignee: Ivan Rakov
>            Priority: Critical
>              Labels: usability
>             Fix For: 2.2
>
>         Attachments: CpBenchmark.java, iostat.log, ssdlab.log

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
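Point 1 of the description rests on how the kernel accounts for dirty memory: dirtiness is tracked per page-cache page, so a random 2K write that lands in a 4K page no other write touches still marks the whole 4K page dirty. The Java sketch below works through that arithmetic for the hardware quoted in the description. It is a simplified model, not Ignite code: the class and method names are illustrative, it assumes every random write touches OS pages no other write shares, and it approximates the vm.dirty_ratio limit as a percentage of total RAM (the kernel actually applies the ratio to available memory, so the real limit is somewhat lower).

```java
/**
 * Simplified model (illustrative, not Ignite code) of how checkpoint page size
 * interacts with the Linux page cache. Assumption: each random write lands in
 * OS pages that no other write touches, and writes are aligned to their own size.
 */
public class DirtyPageModel {
    /** Number of OS pages dirtied by one size-aligned write of writeSize bytes. */
    public static long osPagesTouched(int writeSize, int osPageSize) {
        // A write smaller than an OS page still dirties one whole OS page.
        return (writeSize + osPageSize - 1) / osPageSize;
    }

    /** Dirty bytes the kernel accounts for per byte actually written. */
    public static double dirtyAmplification(int writeSize, int osPageSize) {
        return (double) (osPagesTouched(writeSize, osPageSize) * osPageSize) / writeSize;
    }

    /** Checkpoint bytes that can be written before the dirty limit is reached. */
    public static long bytesUntilDirtyLimit(long dirtyLimitBytes, int writeSize, int osPageSize) {
        long dirtyPerWrite = osPagesTouched(writeSize, osPageSize) * osPageSize;
        long writes = dirtyLimitBytes / dirtyPerWrite;
        return writes * writeSize;
    }

    public static void main(String[] args) {
        int osPageSize = 4096;                 // typical "getconf PAGE_SIZE" result
        long ram = 100L * 1024 * 1024 * 1024;  // 100 GB RAM, as in the benchmark setup
        long dirtyLimit = ram * 10 / 100;      // vm.dirty_ratio=10, approximated on total RAM

        for (int pageSize : new int[] {2048, 4096}) {
            System.out.printf("pageSize=%d amplification=%.1f bytesBeforeDirtyLimit=%d%n",
                pageSize,
                dirtyAmplification(pageSize, osPageSize),
                bytesUntilDirtyLimit(dirtyLimit, pageSize, osPageSize));
        }
    }
}
```

Under this model the amplification is 2.0 for 2K pages and 1.0 for 4K pages, so a checkpoint made of 2K pages exhausts the same dirty-memory budget after writing half as much data, which is the "twice as fast" effect described in point 1.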
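The attached CpBenchmark.java is not reproduced in this message. As a rough illustration of what a checkpoint emulation can look like, the hypothetical sketch below writes a fixed amount of data as pages of a given size at random page-aligned offsets in a file through FileChannel, then calls force() so the data is durable before the emulated checkpoint ends. The class name, file size, and write volume are assumptions chosen so the sketch runs quickly; its numbers are not comparable to the attached results.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

/**
 * Hypothetical checkpoint emulation (not the attached CpBenchmark.java):
 * random page-aligned writes followed by an fsync, timed end to end.
 */
public class CheckpointEmulation {
    /**
     * Writes totalBytes as pageSize-byte pages at random page-aligned offsets inside a
     * file of fileSize bytes, then forces the data to disk. Returns elapsed nanoseconds.
     */
    public static long run(Path file, long fileSize, int pageSize, long totalBytes, long seed)
            throws IOException {
        long pagesInFile = fileSize / pageSize;
        long pagesToWrite = totalBytes / pageSize;
        Random rnd = new Random(seed);
        ByteBuffer page = ByteBuffer.allocateDirect(pageSize);

        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Extend the file (it may stay sparse) so every random offset lies inside it.
            ch.write(ByteBuffer.allocate(1), fileSize - 1);

            long start = System.nanoTime();
            for (long i = 0; i < pagesToWrite; i++) {
                long pos = Math.floorMod(rnd.nextLong(), pagesInFile) * pageSize;
                page.clear();
                page.putLong(0, i); // vary the page contents a little
                while (page.hasRemaining()) {
                    pos += ch.write(page, pos);
                }
            }
            ch.force(false); // like a checkpoint: pages must be durable before it completes
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        long fileSize = 256L << 20; // 256 MB test file (assumption; a real LFS is far larger)
        long toWrite = 64L << 20;   // 64 MB of "dirty pages" per emulated checkpoint
        for (int pageSize : new int[] {2048, 4096}) {
            Path f = Files.createTempFile("cp-emulation", ".bin");
            try {
                long ns = run(f, fileSize, pageSize, toWrite, 42L);
                double mbPerSec = (toWrite / (1024.0 * 1024.0)) / (ns / 1e9);
                System.out.printf("pageSize=%d: %.1f MB/s%n", pageSize, mbPerSec);
            } finally {
                Files.deleteIfExists(f);
            }
        }
    }
}
```

With these small defaults everything fits in the page cache, so the vm.dirty_ratio effect from point 1 will not show. To observe it, the written volume has to exceed the dirty limit, which on the 100 GB RAM machine from the description (vm.dirty_ratio=10) is roughly 10 GB per run.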
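One way to honor the backwards-compatibility requirement is to record the page size in each LFS file header and consult it when a node starts, so that files written with the old 2K default are never read as 4K pages. The sketch below is hypothetical: the header layout (magic number, version, page size), the magic value, and the class and method names are illustrative assumptions, not Ignite's actual file format or API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical store-file header recording the page size a file was created with. */
public class PageSizeHeader {
    static final long MAGIC = 0x4C46535F50414745L; // arbitrary illustrative value ("LFS_PAGE")
    static final int VERSION = 1;
    static final int HEADER_SIZE = Long.BYTES + Integer.BYTES + Integer.BYTES;

    /** Writes the header at offset 0 of a newly created store file. */
    public static void write(Path file, int pageSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.putLong(MAGIC).putInt(VERSION).putInt(pageSize).flip();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            while (buf.hasRemaining()) {
                ch.write(buf, buf.position()); // header starts at file offset 0
            }
        }
    }

    /** Reads the page size recorded in an existing store file. */
    public static int readPageSize(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                if (ch.read(buf, buf.position()) < 0) {
                    throw new IOException("Truncated header: " + file);
                }
            }
        }
        buf.flip();
        if (buf.getLong() != MAGIC) throw new IOException("Not a store file: " + file);
        int version = buf.getInt();
        if (version != VERSION) throw new IOException("Unsupported header version: " + version);
        return buf.getInt();
    }

    /**
     * Existing files win over the configured default, so data created with the
     * old 2K default keeps being read with 2K pages; new files use the configured size.
     */
    public static int resolvePageSize(Path file, int configuredPageSize) throws IOException {
        return Files.exists(file) ? readPageSize(file) : configuredPageSize;
    }
}
```

Letting existing files win keeps old data readable after an upgrade. A stricter alternative is to fail fast when the recorded and configured sizes differ and ask the user to set the old page size explicitly, which avoids running a node whose files disagree with its page memory configuration.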