Hi Jaegeuk,

01.09.2016, 23:07, "Jaegeuk Kim" <[email protected]>:
> On Thu, Sep 01, 2016 at 08:04:31PM +0300, Alexander Gordeev wrote:
>>  Hi Jaegeuk,
>>
>>  29.08.2016, 21:24, "Jaegeuk Kim" <[email protected]>:
>>  > What I've found from your trace are:
>>  > - there are two files (ino=17690, ino=17691) which shared the data log.
>>  > - ino=17690 writes data sequentially, and ino=17691 writes small data
>>  > randomly.
>>  > - ino=17690 writes misaligned 4KB blocks about every 296KB, which
>>  >   produces dirty segments.
>>  >
>>  > Could you check that all the writes and truncations in your app are
>>  > aligned to 4KB?
>>  > And, if ino=17691 is sqlite, we need to check whether it is really
>>  > using another data log.
>>
>>  I collected more logs from both kernel tracing and strace and tried to get
>>  a better understanding of this. I think I get what's wrong now.
>>
>>  ino=17690 is a video file. ino=17691 is not SQLite, it is an index file,
>>  to which 24 bytes are written per frame. Here is a small piece of the
>>  strace log for writing a single frame:
>>
>>  write(19, "...", 4) = 4
>>  write(19, "...", 4) = 4
>>  write(19, "...", 2432) = 2432
>>  write(20, "...", 24) = 24
>>
>>  The first three writes go to the video file (4-byte stream id, then 4-byte
>>  length, then the actual frame); the fourth one writes to an index file.
>>  Yes, I know, this looks ugly. :)
>>  None of the writes are aligned to 4096, but there are no truncations, only
>>  appending.
>>
>>  Then, I think, I see the f2fs worker thread wake up about every two
>>  seconds to write dirty pages. Unfortunately it seems to write everything
>>  collected so far, even the most recent pages, which are not fully filled
>>  yet. It cannot be expected that every app will write data aligned to 4096
>>  bytes, so this means more overhead and overwrites even in the more general
>>  case. Is it different in mode=adaptive?
>
> No, the flushing time is controlled by vm, and you can tune that through proc.
> And, IMO, even if those are append-only, it'd be worth splitting the index
> and media files into different logs; using the cold log for the media file
> only seems recommendable.
>
>>  The 296KB size probably comes from my bitrate, which is about 142KB/s,
>>  times 2 seconds. It is roughly the right size.
>>  My video FPS is about 30, so about 1440 bytes are written to the index in
>>  two seconds. This is why it looks like random writes, I think.
>>
>>  Also I see from my new traces that f2fs_submit_write_bio for other inodes
>>  goes to completely different sectors. Looks like the "cold" data feature
>>  is working well.
>>
>>  To conclude:
>>  1. I think I can leave everything as is, because (1) there is a small
>>  number of rewrites and (2) I start rotating the archive at 95% utilization,
>>  so given the tiny amount of data in the index and sqlite files, this should
>>  be ok, I hope.
>
> If both the index and media files are deleted before suffering from cleaning,
> IMO, it'd be fine. You can check the cleaning information in the status file.
>
>>  2. But I'd better write both the video and index files at a 4096 boundary.
>>  3. Or this should be fixed in f2fs. I think there should be a configurable
>>  amount of time to wait for a dirty page to expire; a page should be written
>>  only after expiration, unless a user calls fsync(), of course. Is there
>>  such a tunable?
>>
>>  Does this make sense?
>
> Yeah, I think you can tune flushing timing through proc entries.
> (e.g., /proc/sys/vm/dirty_writeback_centisecs)

After searching for more information about what the /proc/sys/vm/dirty_*
options do, I found this email: https://lkml.org/lkml/2013/9/10/603
Now I understand why the flush thread writes even very recent pages.
I was under the wrong impression that it checks timestamps on a per-page
basis, not per-inode. So I thought that f2fs does it differently. :) Sorry.
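For the record, these are the knobs I mean (a sketch for this workload; the
values are examples, not recommendations, and writing to /proc needs root):

```shell
# Show the current writeback settings (expiry is tracked per inode,
# as the lkml mail above explains)
cat /proc/sys/vm/dirty_writeback_centisecs   # how often the flusher wakes up
cat /proc/sys/vm/dirty_expire_centisecs      # how old dirty data must be to be written

# Example: wake the flusher every 5 s, and only write data dirtied >30 s ago
echo 500  > /proc/sys/vm/dirty_writeback_centisecs
echo 3000 > /proc/sys/vm/dirty_expire_centisecs
```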

Well, it looks like this case is now completely clear to everyone. Probably
I should write an article about tuning f2fs for this type of workload. :)

Thank you very much for all the help!

-- 
 Alexander

------------------------------------------------------------------------------
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
