Plus, unlike your testbed, my Pixel device seems to see much more contention in the vmap() operation. Without that contention, I agree that there might not be a big difference between vmap() and vm_map_ram().
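As a rough cross-check of the dd figures quoted further down in this thread, the following minimal Python sketch recomputes throughput from the byte count and elapsed time in each line and compares the before/after averages. The helper name throughput_mib_s and the reading of dd's "M/s" as MiB/s are assumptions made for illustration; the sample lines are copied from the quoted commit message.

```python
import re
from statistics import mean

# Matches the byte count and elapsed seconds in a dd summary line.
DD_LINE = re.compile(r"(\d+) bytes .*? copied, ([\d.]+) s")


def throughput_mib_s(line: str) -> float:
    """Return throughput in MiB/s for one dd summary line (hypothetical helper)."""
    match = DD_LINE.search(line)
    if match is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    nbytes = int(match.group(1))
    seconds = float(match.group(2))
    # Assumption: dd's "M/s" here means 1024 * 1024 bytes per second.
    return nbytes / seconds / (1024 * 1024)


# Lines copied from the "before patch" and "after patch" results in the thread.
BEFORE = [
    "1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s",
    "1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s",
    "1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s",
]
AFTER = [
    "1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s",
    "1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s",
    "1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s",
]

before_avg = mean(throughput_mib_s(line) for line in BEFORE)
after_avg = mean(throughput_mib_s(line) for line in AFTER)
speedup = after_avg / before_avg

print(f"before: {before_avg:.1f} MiB/s")
print(f"after:  {after_avg:.1f} MiB/s")
print(f"ratio:  {speedup:.2f}x")
```

This only restates the arithmetic behind the posted numbers. It says nothing about whether the gap comes from vmap() itself or from scheduling and CPU frequency, which is the open question in this thread.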

On Tue, Aug 11, 2020 at 8:29 PM, Gao Xiang <hsiang...@redhat.com> wrote:
>
> On Tue, Aug 11, 2020 at 08:21:23PM +0900, Daeho Jeong wrote:
> > Sure, I'll update the test condition as you said in the commit message.
> > FYI, the test is done with 16kb chunk and Pixel 3 (arm64) device.
>
> Yeah, anyway, it'd better to lock the freq and offline the little
> cores in your test as well (it'd make more sense). e.g. if 16k cluster
> is applied, even all data is zeroed, the count of vmap/vm_map_ram
> isn't hugeous (and as you said, "sometimes, it has a very long delay",
> it's much like another scheduling concern as well).
>
> Anyway, I'm not against your commit but the commit message is a bit
> of unclear. At least, if you think that is really the case, I'm ok
> with that.
>
> Thanks,
> Gao Xiang
>
> >
> > Thanks,
> >
> > On Tue, Aug 11, 2020 at 7:18 PM, Gao Xiang <hsiang...@redhat.com> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 06:33:26PM +0900, Daeho Jeong wrote:
> > > > Plus, when we use vmap(), vmap() normally executes in a short time
> > > > like vm_map_ram().
> > > > But, sometimes, it has a very long delay.
> > > >
> > > > On Tue, Aug 11, 2020 at 6:28 PM, Daeho Jeong <daeh...@gmail.com> wrote:
> > > > >
> > > > > Actually, as you can see, I use the whole zero data blocks in the
> > > > > test file.
> > > > > It can maximize the effect of changing virtual mapping.
> > > > > When I use normal files which can be compressed about 70% from the
> > > > > original file,
> > > > > The vm_map_ram() version is about 2x faster than vmap() version.
> > >
> > > What f2fs does is much similar to btrfs compression. Even if these
> > > blocks are all zeroed. In principle, the maximum compression ratio
> > > is determined (cluster sized blocks into one compressed block, e.g
> > > 16k cluster into one compressed block).
> > >
> > > So it'd be better to describe your configured cluster size (16k or
> > > 128k) and your hardware information in the commit message as well.
> > >
> > > Actually, I also tried with this patch as well on my x86 laptop just
> > > now with FIO (I didn't use zeroed block though), and I didn't notice
> > > much difference with turbo boost off and maxfreq.
> > >
> > > I'm not arguing this commit, just a note about this commit message.
> > >
> > > > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> > >
> > > IMHO, the above number is much like decompressing in the arm64 little
> > > cores.
> > >
> > > Thanks,
> > > Gao Xiang
> > >
> > > > >
> > > > > On Tue, Aug 11, 2020 at 4:55 PM, Chao Yu <yuch...@huawei.com> wrote:
> > > > > >
> > > > > > On 2020/8/11 15:15, Gao Xiang wrote:
> > > > > > > On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> > > > > > >> From: Daeho Jeong <daehoje...@google.com>
> > > > > > >>
> > > > > > >> By profiling f2fs compression works, I've found vmap() callings
> > > > > > >> are bottlenecks of f2fs decompression path. Changing these with
> > > > > > >> vm_map_ram(), we can enhance f2fs decompression speed pretty
> > > > > > >> much.
> > > > > > >>
> > > > > > >> [Verification]
> > > > > > >> dd if=/dev/zero of=dummy bs=1m count=1000
> > > > > > >> echo 3 > /proc/sys/vm/drop_caches
> > > > > > >> dd if=dummy of=/dev/zero bs=512k
> > > > > > >>
> > > > > > >> - w/o compression -
> > > > > > >> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
> > > > > > >>
> > > > > > >> - before patch -
> > > > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> > > > > > >>
> > > > > > >> - after patch -
> > > > > > >> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> > > > > > >> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s
> > > > > > >
> > > > > > > Indeed, vmap() approach has some impact on the whole
> > > > > > > workflow. But I don't think the gap is such significant,
> > > > > > > maybe it relates to unlocked cpufreq (and big little
> > > > > > > core difference if it's on some arm64 board).
> > > > > >
> > > > > > Agreed,
> > > > > >
> > > > > > I guess there should be other reason causing the large performance
> > > > > > gap, scheduling, frequency, or something else.
> > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Linux-f2fs-devel mailing list
> > > > > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel