Hi Jaegeuk,

01.09.2016, 23:07, "Jaegeuk Kim" <[email protected]>:
> On Thu, Sep 01, 2016 at 08:04:31PM +0300, Alexander Gordeev wrote:
>> Hi Jaegeuk,
>>
>> 29.08.2016, 21:24, "Jaegeuk Kim" <[email protected]>:
>> > What I've found from your trace are:
>> > - there are two files (ino=17690, ino=17691) which shared the data log.
>> > - ino=17690 writes data sequentially, and ino=17691 writes small data
>> >   randomly.
>> > - ino=17690 writes misaligned 4KB blocks at every around 296KB, which
>> >   produces dirty segments.
>> >
>> > Could you check that all the writes and truncations in your app are
>> > aligned to 4KB? And, if ino=17691 is sqlite, it needs to check whether
>> > it is really using the other data log.
>>
>> I collected more logs from both kernel tracing and strace and tried to get
>> a better understanding of this. I think I get what's wrong now.
>>
>> ino=17690 is a video file. ino=17691 is not SQLite, it is an index file.
>> It is written 24 bytes per frame. Here is a small piece of the strace log
>> for writing a single frame:
>>
>> write(19, "...", 4) = 4
>> write(19, "...", 4) = 4
>> write(19, "...", 2432) = 2432
>> write(20, "...", 24) = 24
>>
>> The first three writes go to the video file (a 4-byte stream id, then a
>> 4-byte length, then the actual frame); the fourth one writes to an index
>> file. Yes, I know, this looks ugly. :)
>> None of the writes are aligned to 4096 bytes, but there are no
>> truncations, only appending.
>>
>> Then, I think, I see the f2fs worker thread wake up about every two
>> seconds to write dirty pages. Unfortunately it seems to write everything
>> collected so far, even the most recent pages, which are not fully filled
>> yet. I'd say it cannot be expected that every app will write data aligned
>> to 4096 bytes. So this means more overhead and overwrites even in the
>> more general case. Is it different in mode=adaptive?
>
> No, the flushing time is controlled by vm, and you can tune that through proc.
> And, IMO, even if those are append-only, it'd be worth splitting the index
> and media files into different logs; it seems using the cold log for the
> media file only would be recommendable.
>
>> The 296KB size probably comes from my bitrate, which is about 142KB/s,
>> times 2 seconds. It is roughly the right size.
>> My video FPS is about 30, so the amount of data written to the index is
>> about 1440 bytes in two seconds. This is why it looks like random writes,
>> I think.
>>
>> Also I see from my new traces that f2fs_submit_write_bio for other inodes
>> is writing to completely different sectors. Looks like the "cold" data
>> feature is working well.
>>
>> To conclude:
>> 1. I think I can leave everything as is, because (1) there is a small
>>    number of rewrites and (2) I start rotating the archive at 95%
>>    utilization, so given the tiny amount of data in the index and sqlite
>>    files, this should be ok, I hope.
>
> If both the index and media files are deleted before suffering from
> cleaning, IMO, it'd be fine. You can check the cleaning information in the
> status file.
>
>> 2. But I'd better write both the video and index files at a 4096-byte
>>    boundary.
>> 3. Or this should be fixed in f2fs. I think there should be a configurable
>>    amount of time to wait for a dirty page to expire. It should be written
>>    only after expiration, unless a user calls fsync(), of course. Is there
>>    such a tunable?
>>
>> Does this make sense?
>
> Yeah, I think you can tune the flushing timing through proc entries.
> (e.g., /proc/sys/vm/dirty_writeback_centisecs)
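For reference, the two knobs mentioned above can be inspected and tuned from a shell. This is only a sketch: the values written below are illustrative, not recommendations for this workload, and the f2fs status file path assumes debugfs is mounted at the usual /sys/kernel/debug location.

```shell
#!/bin/sh
# Inspect the current VM writeback timing (values are in centiseconds).
cat /proc/sys/vm/dirty_writeback_centisecs  # how often the flusher wakes up
cat /proc/sys/vm/dirty_expire_centisecs     # how old dirty data must be to be written back

# Illustrative example only: wake the flusher every 10 seconds and expire
# dirty data after 30 seconds. Writing these files requires root.
if [ "$(id -u)" -eq 0 ]; then
    echo 1000 > /proc/sys/vm/dirty_writeback_centisecs
    echo 3000 > /proc/sys/vm/dirty_expire_centisecs
fi

# The cleaning (GC) information Jaegeuk mentions is in the f2fs debugfs
# status file, assuming debugfs is mounted at /sys/kernel/debug.
STATUS=/sys/kernel/debug/f2fs/status
if [ -r "$STATUS" ]; then
    grep -i 'gc' "$STATUS"
fi
```

Note that these sysctls are system-wide, so raising the expiry affects writeback for every filesystem, not just the f2fs volume.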
After searching for more information about what the /proc/sys/vm/dirty_*
options do, I found this email: https://lkml.org/lkml/2013/9/10/603

Now I understand why the flush thread writes even very recent pages. I was
under the wrong impression that it checks timestamps on a per-page basis,
rather than per inode. So I thought f2fs did it differently. :) Sorry.

Well, it looks like this case is now completely clear to everyone. Probably
I should write an article about tuning f2fs for this type of workload. :)

Thank you very much for all the help!

--
Alexander

------------------------------------------------------------------------------
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
