On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> 2012-10-16 (Tue), 16:14 +0000, Arnd Bergmann:
> > On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > For the lower bound, being able to support as little as 2 logs for
> > cheap hardware would be nice, but 4 logs is the important one.
> >
> > 5 logs is probably not all that important, as long as you have the
> > choice between 4 and 6. If you implement three different ways, I
> > would prefer have the choice of 2/4/6 over 4/5/6 logs.
>
> Ok, I'll try, but in the case of 2 logs, it may need to change recovery
> routines.
Ok, I see. If it needs any changes that require a lot of extra code or
if it would make the common (six logs) case less efficient, then you
should probably not do it.

> > I fear that this might not be good enough for a lot of cases when
> > the page sizes grow and there is no sufficient amount of nonvolatile
> > write cache in the device. I wonder whether there is something that can
> > be done to ensure we always write with a minimum alignment, and pad
> > out the data with zeroes if necessary in order to avoid getting into
> > garbage collection on devices that can't handle sub-page writes.
>
> You're very familiar with flash. :)
> Yes, as the page size grows, the sub-page write issue is one of the
> most critical problems.
> I also thought this before, but I have not made a conclusion until now.
> Because, I don't know how to deal with this in other companies, but,
> I've seen that so many firmware developers in samsung have tried to
> reduce this overhead by adapting many schemes.
> I guess very cautiously that other companies also handle this well.
> Therefore, I keep a question whether file system should care about
> this perfectly or not.

My guess is that most devices would be able to handle this well enough
as long as the writes are only in the log areas, but some would fail
when there are cached sub-page writes by the time you update the
metadata at the beginning of the drive.

Besides the extreme case of getting into garbage collection when the
device runs out of nonvolatile cache to keep sub-pages, there is also
the other problem that it is always more efficient not to need the
NV cache than having to use it to do sub-page writes. This is
especially true if the NV cache is implemented as a log on a regular
flash block. In those cases, it would be better to pad the current
write with zeroes to the next page boundary and rely on garbage
collection to do the compaction later.
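
To make the zero-padding idea concrete, here is a minimal sketch in
Python rather than kernel C. The function name pad_to_page and the
16 KB page size in the usage note are assumptions for illustration
only; nothing here is taken from the f2fs code.

```python
def pad_to_page(data: bytes, page_size: int) -> bytes:
    """Pad data with zero bytes up to the next multiple of page_size.

    Hypothetical helper: page_size stands for the flash page size of
    the device, which the file system would have to learn somehow.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    remainder = len(data) % page_size
    if remainder == 0:
        # Already ends on a page boundary, nothing to pad.
        return data
    # bytes(n) yields n zero bytes.
    return data + bytes(page_size - remainder)
```

With 4 KB blocks and an assumed 16 KB flash page, a write of five
blocks (20480 bytes) would grow to 32768 bytes, so the device never
sees a partial page. The cost is the wasted space, which garbage
collection has to reclaim later.
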
As I mentioned before, my design avoided the problem by using larger
clusters to start with and then mitigating the space overhead from
this by allowing multiple inodes to be put into a single cluster. The
tradeoffs from this are very different than what you have with a fixed
4KB block size, and it's probably not worth redesigning f2fs to handle
this on such a global scale.

One thing that you can do though is pad each flash page with data from
garbage collection: There should basically always be data that needs to
be GC'd, and as soon as you have decided that you want to write a block
to a new location and the hardware requires that it writes a block of
data to pad the page, you might just as well send down that block.

In the opposite case where you have a full page worth of actual data
that needs to be written (e.g. for a sync()) and half a page worth of
data from garbage collection, you can decide not to send the GC data in
order to stay on a page boundary.

Doing this systematically would allow using the eMMC-4.5 "large-unit"
context for all of the logs, which can be a significant performance
improvement, depending on the underlying implementation.

	Arnd
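
A rough sketch of the GC-padding idea above, again in Python for
brevity. The names fill_page, gc_queue and blocks_per_page are
hypothetical, and falling back to zero blocks when no GC data is
pending is an assumption on top of what is described above.

```python
BLOCK_SIZE = 4096  # the fixed 4KB block size discussed above


def fill_page(data_blocks, gc_queue, blocks_per_page):
    """Return (blocks_to_write, remaining_gc_queue).

    GC blocks are only used to fill the gap up to the next flash page
    boundary. If the data already ends on a boundary, no GC block is
    sent, so the write never spills into a partial page.
    """
    # Number of blocks missing to reach the next page boundary (0 if aligned).
    gap = -len(data_blocks) % blocks_per_page
    take = min(gap, len(gc_queue))
    filler = gc_queue[:take]
    # Assumption: pad with zero blocks if there is not enough GC data.
    zero_pad = [bytes(BLOCK_SIZE)] * (gap - take)
    return data_blocks + filler + zero_pad, gc_queue[take:]
```

For example, with four blocks per flash page, three data blocks and two
pending GC blocks give one full page of three data blocks plus one GC
block, and one GC block stays queued. Four data blocks with the same
queue are written as they are and both GC blocks wait for a later
write.
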