[Let's add some more people and linux-mm to the CC list]

On Wed 20-09-17 11:23:50, XaviLi wrote:
> PageONE (Page Object Non-duplicate Engine) is a multithreaded kernel page
> deduplication engine. It is based on a lock-less tree algorithm we currently
> call the SD (Static and Dynamic) Tree. Normal operations on this tree, such
> as insert/query/delete, are non-blocking. Adding more CPU cores boosts speed
> linearly as far as we have tested. Multithreading not only gives the
> opportunity to work faster; it also allows any CPU to donate spare time to
> the job. Therefore, it reveals a way to use CPUs more efficiently. PPR is
> from an open source solution named Dynamic VM:
> https://github.com/baibantech/dynamic_vm.git
>
> The patch can be found here:
> https://github.com/baibantech/dynamic_vm/tree/master/dynamic_vm_0.5
>
> One work thread of PageONE can match the speed of the KSM daemon. Adding
> more CPUs increases speed linearly. Here is a brief test:
>
> Test environment
> DELL R730
> Intel® Xeon® E5-2650 v4 (2.20 GHz, 12 cores, 24 threads)
> 256GB RAM
> Host OS: Ubuntu server 14.04 Host kernel: 4.4.1
> Qemu: 2.9.0
> Guest OS: Ubuntu server 16.04 Guest kernel: 4.4.76
>
> We ran 12 VMs together. Each creates 16GB of data in memory. After all data
> is ready, we start the dedup engine and watch how the host-side used memory
> changes.
>
> KSM:
> Configuration: sleep_millisecs = 0, pages_to_scan = 1000000
> Starting used memory: 216.8G
> Result: KSM started merging pages immediately after being turned on. The KSM
> daemon took 100% of one CPU for 13:16 until used memory was reduced to
> 79.0GB.
>
> PageONE:
> Configuration: merge_period(secs) = 20, work threads = 12
> Starting used memory: 207.3G
> (This means PageONE scans the full physical memory in a 20 sec period. Pages
> are merged if they have not changed for 2 merge_periods.)
> Result: In the first two periods PageONE only observes and identifies
> unchanged pages. Little CPU was used during this time. When the third period
> began, all 12 threads started using 100% CPU to do the real merge job. 00:58
> later, used memory was reduced to 70.5GB.
>
> We ran the above test using data that is quite easy for KSM's red-black
> tree: every difference can be detected by comparing the first 8 bytes. Then
> we ran another test, for comparison, in which each data item began with a
> random number of zero bytes. The average size of the zero data was 128
> bytes. The result is shown below:
>
> KSM:
> Configuration: sleep_millisecs = 0, pages_to_scan = 1000000
> Starting used memory: 216.8G
> Result: 19:49 minutes until used memory was reduced to 78.7GB.
>
> PageONE:
> Configuration: merge_period(secs) = 20, work threads = 12
> Starting used memory: 210.3G
> Result: First 2 periods same as above. 1:09 after the merge job started,
> memory was reduced to 72GB.
>
> PageONE shows little difference between the two tests because an SD tree
> search compares each key bit just once in most cases.
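
A rough cross-check of the quoted figures, as a minimal Python sketch. It
assumes the "G"/"GB" values are all gigabytes, that the reported times cover
only the active merge phase (for PageONE, excluding the two 20-second
observation periods), and that KSM ran on a single thread in both tests, as
the report states for the first one. The names RUNS, seconds and rate are
illustrative only and do not come from the patch.

```python
# Figures copied from the quoted test report.
# Each entry: (start_gb, end_gb, "mm:ss" of active merging, threads).
# ASSUMPTION: KSM is single-threaded in both tests; PageONE uses 12 threads.
RUNS = {
    "test1 (easy keys)": {
        "KSM": (216.8, 79.0, "13:16", 1),
        "PageONE": (207.3, 70.5, "00:58", 12),
    },
    "test2 (zero-prefixed keys)": {
        "KSM": (216.8, 78.7, "19:49", 1),
        "PageONE": (210.3, 72.0, "1:09", 12),
    },
}


def seconds(mmss):
    """Convert a "mm:ss" string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


def rate(start_gb, end_gb, mmss, threads):
    """Return (freed GB, GB/s overall, GB/s per thread) for one run."""
    freed = start_gb - end_gb
    overall = freed / seconds(mmss)
    return freed, overall, overall / threads


if __name__ == "__main__":
    for test, engines in RUNS.items():
        for engine, figures in engines.items():
            freed, overall, per_thread = rate(*figures)
            print(f"{test:28s} {engine:8s} freed={freed:6.1f} GB "
                  f"overall={overall:.3f} GB/s per-thread={per_thread:.3f} GB/s")
```

Under those assumptions, the first test works out to roughly 0.17 GB/s
reclaimed by KSM against roughly 0.20 GB/s per PageONE thread, and the second
to roughly 0.12 GB/s against roughly 0.17 GB/s per thread, which is in line
with the claim that one PageONE work thread can match the KSM daemon.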
-- 
Michal Hocko
SUSE Labs