> >>>>>>>>>>>>>>> From: Hyunchul Lee <cheol....@lge.com>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Using write hints[1], applications can indicate the lifetime of the
> >>>>>>>>>>>>>>> data written to devices, and [2] reported that the write hints patch
> >>>>>>>>>>>>>>> decreased writes in NAND by 25%.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> These hints help F2FS determine the following:
> >>>>>>>>>>>>>>>   1) the segment types where the data will be written.
> >>>>>>>>>>>>>>>   2) the hints that will be passed down to devices with the
> >>>>>>>>>>>>>>>      data of segments.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This patch set implements the first mapping, from write hints
> >>>>>>>>>>>>>>> to segment types, as shown below.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   hints                     segment type
> >>>>>>>>>>>>>>>   -----                     ------------
> >>>>>>>>>>>>>>>   WRITE_LIFE_SHORT          CURSEG_COLD_DATA
> >>>>>>>>>>>>>>>   WRITE_LIFE_EXTREME        CURSEG_HOT_DATA
> >>>>>>>>>>>>>>>   others                    CURSEG_WARM_DATA
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The F2FS policy for hot/cold separation has precedence over
> >>>>>>>>>>>>>>> these hints, and hints are not applied to in-place updates.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Could we change this to disable IPU if a file/inode write hint
> >>>>>>>>>>>>>> exists?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am afraid that this has side effects. For example, it could cause
> >>>>>>>>>>>>> out-of-place updates even when there are not enough free segments.
> >>>>>>>>>>>>> I can write a patch that handles these situations, but I wonder
> >>>>>>>>>>>>> whether this is required, and I am not sure which IPU policies can
> >>>>>>>>>>>>> be disabled.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Oh, as I replied in another thread, I think IPU only affects the
> >>>>>>>>>>>> filesystem's hot/cold separation rather than this feature, so I
> >>>>>>>>>>>> think it will be okay not to consider it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Until the second mapping is implemented, write hints are not
> >>>>>>>>>>>>>>> passed down to devices, because it is better for all the data
> >>>>>>>>>>>>>>> of a segment to have the same hint.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
> >>>>>>>>>>>>>>> [2]: https://lwn.net/Articles/726477/
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Could you write a patch to support passing the write hint to the
> >>>>>>>>>>>>>> block layer for buffered writes, as in the commit below:
> >>>>>>>>>>>>>> 0127251c45ae ("ext4: add support for passing in write hints for buffered writes")
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sure I will. I wrote it already ;)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cool, ;)
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I think that data from the same segment should be passed down
> >>>>>>>>>>>>> with the same hint, and that the following mapping is reasonable.
> >>>>>>>>>>>>> I wonder what your opinion about it is.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>   segment type               hints
> >>>>>>>>>>>>>   ------------               -----
> >>>>>>>>>>>>>   CURSEG_COLD_DATA           WRITE_LIFE_EXTREME
> >>>>>>>>>>>>>   CURSEG_HOT_DATA            WRITE_LIFE_SHORT
> >>>>>>>>>>>>>   CURSEG_COLD_NODE           WRITE_LIFE_NORMAL
> >>>>>>>>>>>>
> >>>>>>>>>>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in 
> >>>>>>>>>>>> fs.h?
> >>>>>>>>>>>>
> >>>>>>>>>>>>>   CURSEG_HOT_NODE            WRITE_LIFE_MEDIUM
> >>>>>>>>>>>>
> >>>>>>>>>>>> As far as I know, in the cell phone scenario, data of meta_inode
> >>>>>>>>>>>> is hottest, then hot data, then warm node, and cold node should be
> >>>>>>>>>>>> coldest. So I suggest we define the mapping as below:
> >>>>>>>>>>>>
> >>>>>>>>>>>> META_DATA                        WRITE_LIFE_SHORT
> >>>>>>>>>>>> HOT_DATA & WARM_NODE             WRITE_LIFE_MEDIUM
> >>>>>>>>>>>> HOT_NODE & WARM_DATA             WRITE_LIFE_LONG
> >>>>>>>>>>>> COLD_NODE & COLD_DATA            WRITE_LIFE_EXTREME
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I agree, but I am not sure that assigning the same hint to a node
> >>>>>>>>>>> segment and a data segment is good, because NVMe is likely to write
> >>>>>>>>>>> them to the same erase block if they have the same hint.
> >>>>>>>>>>
> >>>>>>>>>> If we do not give the hint, they can still be written to the same 
> >>>>>>>>>> erase block,
> >>>>>>>>
> >>>>>>>> I mean it's possible to write them to the same erase block. :)
> >>>>>>>>
> >>>>>>>>>> right? it will not be worse?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If the hint is not given, I think that they could be written to 
> >>>>>>>>> the same erase block, or not. But if we give the same hint, they 
> >>>>>>>>> are written
> >>>>>>>>> to the same block.
> >>>>>>>>
> >>>>>>>> IMO, we can separate them only if the underlying device supports more
> >>>>>>>> hint types or open channels, and the actual temperatures of the data
> >>>>>>>> segment and the node segment are quite different.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Okay, If Jaegeuk Kim agrees with this, I will submit the patch that 
> >>>>>>> implements your proposed mapping.
> >>>>>>
> >>>>>> How about this? We'd better split data and node blocks as much as
> >>>>>> possible.
> >>>>>>
> >>>>>> segment type                    hints
> >>>>>> ------------                    -----
> >>>>>> COLD_NODE & COLD_DATA          WRITE_LIFE_NONE
> >>>>>
> >>>>> WRITE_LIFE_NONE means there are no hints about write lifetime.
> >>>>>
> >>>>> Shouldn't we define COLD_NODE & COLD_DATA as WRITE_LIFE_EXTREME?
> >>>>
> >>>> The assumption would be to split different types of blocks by flash 
> >>>> firmware,
> >>>> so I think we can use WRITE_LIFE_NONE as a type as well.
> >>>>
> >>>
> >>> WRITE_LIFE_NONE means that no stream id is specified. It equals 
> >>> WRITE_LIFE_NOT_SET.
> >>
> >> Right, I just saw the nvme implementation:
> >>
> >> nvme_assign_write_stream
> >>
> >>    enum rw_hint streamid = req->write_hint;
> >>
> >>    if (streamid == WRITE_LIFE_NOT_SET || streamid == WRITE_LIFE_NONE)
> >>            streamid = 0;
> >>    else {
> >>            streamid--;
> >> ...
> >>
> >>> So I think that we can define WARM_DATA as WRITE_LIFE_NONE, and
> >>> COLD_NODE & COLD_DATA as WRITE_LIFE_EXTREME.
> > 
> > What's the point?
> > 
> > segment type                 hints                streamid
> > -------------                -----                -------
> > COLD_NODE & COLD_DATA        WRITE_LIFE_NONE      0
> > WARM_DATA                    WRITE_LIFE_EXTREME   4
> > HOT_NODE & WARM_NODE         WRITE_LIFE_LONG      3
> > HOT_DATA                     WRITE_LIFE_MEDIUM    2
> > META_DATA                    WRITE_LIFE_SHORT     1
> > 
> > So, I don't think anything is wrong. Again, I don't care about the hotness
> > implied by the naming, but I do care about how to split different types of
> > blocks across different stream ids. The exceptions would be _SHORT and
> > _MEDIUM, which are likely to be latency-critical, since I guess firmware may
> > be able to store them in an SLC buffer.
> > 
> > Am I missing that _NONE has another meaning?
> > 
> 
> What I am worried about is that data with no hint has WRITE_LIFE_NOT_SET
> (id 0). If a block device also holds swap partitions and other file systems,
> cold data could be mixed with data from those. Does this seem like too much?

That seems to be a question of how to distinguish write hints across multiple
partitions?

> And I think that stream id 0 means disabling stream directives,
> because NVME_RW_DTYPE_STREAMS is clear.

Then, I guess SSD FW will just handle 5 stream IDs including disabled 0.
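
To make the numbering concrete, here is a minimal userspace sketch of the table quoted above. The enum values mirror the kernel's enum rw_hint and the stream id arithmetic follows the quoted nvme_assign_write_stream excerpt. The blk_class enum and both function names are hypothetical and exist only for this illustration; they are not f2fs code.

```c
#include <assert.h>

/* Values mirror the kernel's enum rw_hint; copied so the sketch stands alone. */
enum rw_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE    = 1,
	WRITE_LIFE_SHORT   = 2,
	WRITE_LIFE_MEDIUM  = 3,
	WRITE_LIFE_LONG    = 4,
	WRITE_LIFE_EXTREME = 5,
};

/* Hypothetical block classes for illustration only, not f2fs's real enums. */
enum blk_class {
	BLK_META_DATA,
	BLK_HOT_DATA,
	BLK_WARM_DATA,
	BLK_COLD_DATA,
	BLK_HOT_NODE,
	BLK_WARM_NODE,
	BLK_COLD_NODE,
	BLK_CLASS_COUNT,
};

/* Mapping from the table proposed in this thread (a proposal, not merged code). */
enum rw_hint class_to_hint(enum blk_class c)
{
	assert((int)c >= 0 && (int)c < BLK_CLASS_COUNT);
	switch (c) {
	case BLK_META_DATA:
		return WRITE_LIFE_SHORT;
	case BLK_HOT_DATA:
		return WRITE_LIFE_MEDIUM;
	case BLK_HOT_NODE:
	case BLK_WARM_NODE:
		return WRITE_LIFE_LONG;
	case BLK_WARM_DATA:
		return WRITE_LIFE_EXTREME;
	default:
		/* COLD_NODE and COLD_DATA */
		return WRITE_LIFE_NONE;
	}
}

/* Same arithmetic as the quoted nvme excerpt: NOT_SET and NONE collapse to 0. */
unsigned int hint_to_streamid(enum rw_hint hint)
{
	if (hint == WRITE_LIFE_NOT_SET || hint == WRITE_LIFE_NONE)
		return 0;
	return (unsigned int)hint - 1;
}
```

With this mapping, cold blocks and writes that carry no hint at all both end up with stream id 0, which is the overlap being discussed here.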

Thanks,
