Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-24 Thread Arnd Bergmann
On Wednesday 24 October 2012, Dave Chinner wrote:
> On Wed, Oct 17, 2012 at 12:50:11PM +, Arnd Bergmann wrote:
> > On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > > > IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
> > > > internal storage before spilling to an external block is probably
> > > > the best approach to take...
> > > 
> > > Yes, indeed this is the best approach to f2fs's xattr.
> > > Apart from giving fs hints, it is worth enough to optimize later.
> > 
> > I've thought a bit more about how this could be represented efficiently
> > in 4KB nodes. This would require a significant change of the way you
> > represent inodes, but can improve a number of things at the same time.
> > 
> > The idea is to replace the fixed area in the inode that contains block
> > pointers with an extensible TLV (type/length/value) list that can contain
> > multiple variable-length fields, like this.
> 
> You've just re-invented inode forks... ;)

Ah, good to know the name for it. I didn't really expect that it was a
new idea.

> The main issue with supporting an arbitrary number of forks is space
> management of the inode literal area.  e.g. one fork is in inline
> format (e.g.  direct file contents) and then we add an attribute.
> The attribute won't fit inline, nor will an extent form fork header,
> so the inline data fork has to be converted to extent format before
> the xattr can be added. Now scale that problem up to an arbitrary
> number of forks

Right. Obviously this is a solveable problem, but I agree that solving
it is nontrivial and requires some code complexity that would be nice
to avoid.

> > As a variation of this, it would also be nice to turn around the order
> > in which the pointers are walked, to optimize for space and for growing
> > files, rather than for reading the beginning of a file. With this, you
> > can represent a 9 KB file using a list of two block pointers, and 1KB
> > of direct data, all in the inode. When the user adds another byte, you
> > only need to rewrite the inode. Similarly, a 5 MB file would have a
> > single indirect node (covering block pointers for 4 MB), plus 256
> > separate block pointers (covering the last megabyte), and a 5 GB file
> > can be represented using 1 double-indirect node and 256 indirect nodes,
> > and each of them can still be followed by direct "tail" data and
> > extended attributes.
> 
> I'm not sure that the resultant code complexity is worth saving an
> extra block here and there.

The space overhead may be noticeable for lots of small files but the
part that worries me more is the overhead for writing (and cleaning up)
data in multiple locations. Any write to file data or extended attributes
requires an update of the inode (mtime, ctime, size, ...) and one or
more other blocks (data, pointers, xattr). In order for the garbage
collection to work best, we want to split those writes into separate logs,
which later have to be cleaned up again. In particular for the inode
but also for the block pointers, we create a lot of garbage from
copy-on-write. Storing as much as possible in the inode itself therefore
saves us from writing the data multiple times rather than just the
actual update.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-24 Thread Arnd Bergmann
On Wednesday 24 October 2012, Dave Chinner wrote:
 On Wed, Oct 17, 2012 at 12:50:11PM +, Arnd Bergmann wrote:
  On Tuesday 16 October 2012, Jaegeuk Kim wrote:
IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
internal storage before spilling to an external block is probably
the best approach to take...
   
   Yes, indeed this is the best approach to f2fs's xattr.
   Apart from giving fs hints, it is worth enough to optimize later.
  
  I've thought a bit more about how this could be represented efficiently
  in 4KB nodes. This would require a significant change of the way you
  represent inodes, but can improve a number of things at the same time.
  
  The idea is to replace the fixed area in the inode that contains block
  pointers with an extensible TLV (type/length/value) list that can contain
  multiple variable-length fields, like this.
 
 You've just re-invented inode forks... ;)

Ah, good to know the name for it. I didn't really expect that it was a
new idea.

 The main issue with supporting an arbitrary number of forks is space
 management of the inode literal area.  e.g. one fork is in inline
 format (e.g.  direct file contents) and then we add an attribute.
 The attribute won't fit inline, nor will an extent form fork header,
 so the inline data fork has to be converted to extent format before
 the xattr can be added. Now scale that problem up to an arbitrary
 number of forks

Right. Obviously this is a solveable problem, but I agree that solving
it is nontrivial and requires some code complexity that would be nice
to avoid.

  As a variation of this, it would also be nice to turn around the order
  in which the pointers are walked, to optimize for space and for growing
  files, rather than for reading the beginning of a file. With this, you
  can represent a 9 KB file using a list of two block pointers, and 1KB
  of direct data, all in the inode. When the user adds another byte, you
  only need to rewrite the inode. Similarly, a 5 MB file would have a
  single indirect node (covering block pointers for 4 MB), plus 256
  separate block pointers (covering the last megabyte), and a 5 GB file
  can be represented using 1 double-indirect node and 256 indirect nodes,
  and each of them can still be followed by direct tail data and
  extended attributes.
 
 I'm not sure that the resultant code complexity is worth saving an
 extra block here and there.

The space overhead may be noticeable for lots of small files but the
part that worries me more is the overhead for writing (and cleaning up)
data in multiple locations. Any write to file data or extended attributes
requires an update of the inode (mtime, ctime, size, ...) and one or
more other blocks (data, pointers, xattr). In order for the garbage
collection to work best, we want to split those writes into separate logs,
which later have to be cleaned up again. In particular for the inode
but also for the block pointers, we create a lot of garbage from
copy-on-write. Storing as much as possible in the inode itself therefore
saves us from writing the data multiple times rather than just the
actual update.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-23 Thread Dave Chinner
On Wed, Oct 17, 2012 at 12:50:11PM +, Arnd Bergmann wrote:
> On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > > IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
> > > internal storage before spilling to an external block is probably
> > > the best approach to take...
> > 
> > Yes, indeed this is the best approach to f2fs's xattr.
> > Apart from giving fs hints, it is worth enough to optimize later.
> 
> I've thought a bit more about how this could be represented efficiently
> in 4KB nodes. This would require a significant change of the way you
> represent inodes, but can improve a number of things at the same time.
> 
> The idea is to replace the fixed area in the inode that contains block
> pointers with an extensible TLV (type/length/value) list that can contain
> multiple variable-length fields, like this.

You've just re-invented inode forks... ;)

> All TLVs together with the
> fixed-length inode data can fill a 4KB block.
> 
> The obvious types would be:
> 
> * Direct file contents if the file is less than a block
> * List of block pointers, as before, minimum 1, maximum until the end
>   of the block
> * List of indirect pointers, now also a variable length, similar to the
>   list of block pointers
> * List of double-indirect block pointers
> * direct xattr: zero-terminated attribute name followed by contents
> * indirect xattr: zero-terminated attribute name followed by up to
>   16 block pointers to store a maximum of 64KB sized xattrs
> 
> This could be extended later to cover additional types, e.g. a list
> of erase block pointers, triple-indirect blocks or extents.

An inode fork doesn't care about the data in it - it's just an
independent block mapping index. i.e. inline, direct,
indirect, double indirect. The data in the fork is managed
externally to the format of the fork. e.g. XFS has two forks - one
for storing data (file data, directory contents, etc) and the other
for storing attributes.

The main issue with supporting an arbitrary number of forks is space
management of the inode literal area.  e.g. one fork is in inline
format (e.g.  direct file contents) and then we add an attribute.
The attribute won't fit inline, nor will an extent form fork header,
so the inline data fork has to be converted to extent format before
the xattr can be added. Now scale that problem up to an arbitrary
number of forks

> As a variation of this, it would also be nice to turn around the order
> in which the pointers are walked, to optimize for space and for growing
> files, rather than for reading the beginning of a file. With this, you
> can represent a 9 KB file using a list of two block pointers, and 1KB
> of direct data, all in the inode. When the user adds another byte, you
> only need to rewrite the inode. Similarly, a 5 MB file would have a
> single indirect node (covering block pointers for 4 MB), plus 256
> separate block pointers (covering the last megabyte), and a 5 GB file
> can be represented using 1 double-indirect node and 256 indirect nodes,
> and each of them can still be followed by direct "tail" data and
> extended attributes.

I'm not sure that the resultant code complexity is worth saving an
extra block here and there.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-23 Thread Dave Chinner
On Wed, Oct 17, 2012 at 12:50:11PM +, Arnd Bergmann wrote:
 On Tuesday 16 October 2012, Jaegeuk Kim wrote:
   IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
   internal storage before spilling to an external block is probably
   the best approach to take...
  
  Yes, indeed this is the best approach to f2fs's xattr.
  Apart from giving fs hints, it is worth enough to optimize later.
 
 I've thought a bit more about how this could be represented efficiently
 in 4KB nodes. This would require a significant change of the way you
 represent inodes, but can improve a number of things at the same time.
 
 The idea is to replace the fixed area in the inode that contains block
 pointers with an extensible TLV (type/length/value) list that can contain
 multiple variable-length fields, like this.

You've just re-invented inode forks... ;)

 All TLVs together with the
 fixed-length inode data can fill a 4KB block.
 
 The obvious types would be:
 
 * Direct file contents if the file is less than a block
 * List of block pointers, as before, minimum 1, maximum until the end
   of the block
 * List of indirect pointers, now also a variable length, similar to the
   list of block pointers
 * List of double-indirect block pointers
 * direct xattr: zero-terminated attribute name followed by contents
 * indirect xattr: zero-terminated attribute name followed by up to
   16 block pointers to store a maximum of 64KB sized xattrs
 
 This could be extended later to cover additional types, e.g. a list
 of erase block pointers, triple-indirect blocks or extents.

An inode fork doesn't care about the data in it - it's just an
independent block mapping index. i.e. inline, direct,
indirect, double indirect. The data in the fork is managed
externally to the format of the fork. e.g. XFS has two forks - one
for storing data (file data, directory contents, etc) and the other
for storing attributes.

The main issue with supporting an arbitrary number of forks is space
management of the inode literal area.  e.g. one fork is in inline
format (e.g.  direct file contents) and then we add an attribute.
The attribute won't fit inline, nor will an extent form fork header,
so the inline data fork has to be converted to extent format before
the xattr can be added. Now scale that problem up to an arbitrary
number of forks

 As a variation of this, it would also be nice to turn around the order
 in which the pointers are walked, to optimize for space and for growing
 files, rather than for reading the beginning of a file. With this, you
 can represent a 9 KB file using a list of two block pointers, and 1KB
 of direct data, all in the inode. When the user adds another byte, you
 only need to rewrite the inode. Similarly, a 5 MB file would have a
 single indirect node (covering block pointers for 4 MB), plus 256
 separate block pointers (covering the last megabyte), and a 5 GB file
 can be represented using 1 double-indirect node and 256 indirect nodes,
 and each of them can still be followed by direct tail data and
 extended attributes.

I'm not sure that the resultant code complexity is worth saving an
extra block here and there.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Jaegeuk Kim
> On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > >
> > > > > An xattr on the root inode that holds a list like this is something
> > > > > that could be set at mkfs time, but then also updated easily by new
> > > > > software packages that are installed...
> > >
> > > Yes, good idea.
> >
> > Likewise many file systems, f2fs also supports xattr as a configurable
> > Kconfig option.
> > If user disables the xattr feature, how can we do this?
> 
> I can see three options here:
> 
> * make the extension list feature dependent on xattr, and treat all files
>   the same if it's disabled.
> 
> * put the list into the superblock instead.
> 
> * fall back on a hardcoded list of extensions when the extended attribute
>   is not present or the feature is disabled.
> 

IMHO, we don't need to disable the extension list among the cases.
So, as I described before, I propose the following options.

* By default, mkfs stores an extension list in superblock, and f2fs simply
   uses it.
* If users try to handle cold files by themselves, they can give a hint
  via the xattr interface.
* Whenever they want not to use the default extension list, they can
  easily disable it by a mount option.

>   Arnd

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
> > internal storage before spilling to an external block is probably
> > the best approach to take...
> 
> Yes, indeed this is the best approach to f2fs's xattr.
> Apart from giving fs hints, it is worth enough to optimize later.

I've thought a bit more about how this could be represented efficiently
in 4KB nodes. This would require a significant change of the way you
represent inodes, but can improve a number of things at the same time.

The idea is to replace the fixed area in the inode that contains block
pointers with an extensible TLV (type/length/value) list that can contain
multiple variable-length fields, like this. All TLVs together with the
fixed-length inode data can fill a 4KB block.

The obvious types would be:

* Direct file contents if the file is less than a block
* List of block pointers, as before, minimum 1, maximum until the end
  of the block
* List of indirect pointers, now also a variable length, similar to the
  list of block pointers
* List of double-indirect block pointers
* direct xattr: zero-terminated attribute name followed by contents
* indirect xattr: zero-terminated attribute name followed by up to
  16 block pointers to store a maximum of 64KB sized xattrs

This could be extended later to cover additional types, e.g. a list
of erase block pointers, triple-indirect blocks or extents.

As a variation of this, it would also be nice to turn around the order
in which the pointers are walked, to optimize for space and for growing
files, rather than for reading the beginning of a file. With this, you
can represent a 9 KB file using a list of two block pointers, and 1KB
of direct data, all in the inode. When the user adds another byte, you
only need to rewrite the inode. Similarly, a 5 MB file would have a
single indirect node (covering block pointers for 4 MB), plus 256
separate block pointers (covering the last megabyte), and a 5 GB file
can be represented using 1 double-indirect node and 256 indirect nodes,
and each of them can still be followed by direct "tail" data and
extended attributes.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > 
> > > > An xattr on the root inode that holds a list like this is something
> > > > that could be set at mkfs time, but then also updated easily by new
> > > > software packages that are installed...
> > 
> > Yes, good idea.
> 
> Likewise many file systems, f2fs also supports xattr as a configurable
> Kconfig option.
> If user disables the xattr feature, how can we do this?

I can see three options here:

* make the extension list feature dependent on xattr, and treat all files
  the same if it's disabled.

* put the list into the superblock instead. 

* fall back on a hardcoded list of extensions when the extended attribute
  is not present or the feature is disabled.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Wednesday 17 October 2012, Jaegeuk Kim wrote:
> As discussed with Dave, I propose the following items.
> 
> [In v2]
> - Extension list
>: Mkfs supports configuring extensions by user, and that information
>  will be stored in the superblock.
>  I'll add a mount option to enable/disable using the extension list.
>  Instead, f2fs supports xattr to give a hint to any files.
>  After supporting this by VFS, it'll be removed.
> - The number of active logs
>   : For compatibility, on-disk layout supports max 16 logs.
> Instead, f2fs supports configuring the number of active logs that
> will be used by a mount option.
> The option supports 2, 4, and 6 logs.
> - Section size
>   : Mkfs supports multiples of segments for a section, not power-of-two.
> 
> [Future optimization]
> - Data separation
>   : file access pattern
>   : Investigate the option to make large files erase block indirect rather 
> than
>part of the normal logs
>   : sub-page write avoidance

Ok, sounds good! I'll comment separately on the xattr optimization,
since I had some ideas on how to combine this with other optimizations.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> 2012-10-16 (화), 16:14 +, Arnd Bergmann:
> > On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > For the lower bound, being able to support as little as 2 logs for
> > cheap hardware would be nice, but 4 logs is the important one.
> > 
> > 5 logs is probably not all that important, as long as you have the
> > choice between 4 and 6. If you implement three different ways, I
> > would prefer have the choice of 2/4/6 over 4/5/6 logs.
> 
> Ok, I'll try, but in the case of 2 logs, it may need to change recovery
> routines.

Ok, I see. If it needs any changes that require a lot of extra code or
if it would make the common (six logs) case less efficient, then
you should probably not do it.

> > I fear that this might not be good enough for a lot of cases when
> > the page sizes grow and there is no sufficient amount of nonvolatile
> > write cache in the device. I wonder whether there is something that can
> > be done to ensure we always write with a minimum alignment, and pad
> > out the data with zeroes if necessary in order to avoid getting into
> > garbage collection on devices that can't handle sub-page writes.
> 
> You're very familiar with flash. :)
> Yes, as the page size grows, the sub-page write issue is one of the
> most critical problems.
> I also thought this before, but I have not made a conclusion until now.
> Because, I don't know how to deal with this in other companies, but,
> I've seen that so many firmware developers in samsung have tried to
> reduce this overhead by adapting many schemes.
> I guess very cautiously that other companies also handle this well.
> Therefore, I keep a question whether file system should care about
> this perfectly or not.

My guess is that most devices would be able to handle this well enough
as long as the writes are only in the log areas, but some would fail
when there are cached sub-page writes by the time you update the metadata
in the beginning of the drive.

Besides the extreme case of getting into garbage collect when the device
runs out of nonvolatile cache to keep sub-pages, there is also the other
problem that it is always more efficient not to need the NV cache than
having to use it to do sub-page writes. This is especially true if the
NV cache is implemented as a log on a regular flash block. In those cases,
it would be better to pad the current write with zeroes to the next
page boundary and rely on garbage collection to do the compaction later.

As I mentioned before, my design avoided the problem by using larger
clusters to start with and then mitigating the space overhead from this
by allowing to put multiple inodes into a single cluster. The tradeoffs
from this are very different than what you have with a fixed 4KB block
size, and it's probably not worth redesigning f2fs to handle this on
such a global scale.

One thing that you can do though is pad each flash page with data from
garbage collection: There should basically always be data that needs
to be GC'd, and as soon as you have decided that you want to write
a block to a new location and the hardware requires that it writes
a block of data to pad the page, you might just as well send down that
block. In the opposite case where you have a full page worth of actual
data that needs to be written (e.g. for a sync()) and half a page
worth of data from garbage collection, you can decide not send the GC
data in order to stay inside on a page boundary.

Doing this systematically would allow using the eMMC-4.5 "large-unit"
context for all of the logs, which can be a significant performance
improvement, depending on the underlying implementation.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
 2012-10-16 (화), 16:14 +, Arnd Bergmann:
  On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  For the lower bound, being able to support as little as 2 logs for
  cheap hardware would be nice, but 4 logs is the important one.
  
  5 logs is probably not all that important, as long as you have the
  choice between 4 and 6. If you implement three different ways, I
  would prefer have the choice of 2/4/6 over 4/5/6 logs.
 
 Ok, I'll try, but in the case of 2 logs, it may need to change recovery
 routines.

Ok, I see. If it needs any changes that require a lot of extra code or
if it would make the common (six logs) case less efficient, then
you should probably not do it.

  I fear that this might not be good enough for a lot of cases when
  the page sizes grow and there is no sufficient amount of nonvolatile
  write cache in the device. I wonder whether there is something that can
  be done to ensure we always write with a minimum alignment, and pad
  out the data with zeroes if necessary in order to avoid getting into
  garbage collection on devices that can't handle sub-page writes.
 
 You're very familiar with flash. :)
 Yes, as the page size grows, the sub-page write issue is one of the
 most critical problems.
 I also thought this before, but I have not made a conclusion until now.
 Because, I don't know how to deal with this in other companies, but,
 I've seen that so many firmware developers in samsung have tried to
 reduce this overhead by adapting many schemes.
 I guess very cautiously that other companies also handle this well.
 Therefore, I keep a question whether file system should care about
 this perfectly or not.

My guess is that most devices would be able to handle this well enough
as long as the writes are only in the log areas, but some would fail
when there are cached sub-page writes by the time you update the metadata
in the beginning of the drive.

Besides the extreme case of getting into garbage collect when the device
runs out of nonvolatile cache to keep sub-pages, there is also the other
problem that it is always more efficient not to need the NV cache than
having to use it to do sub-page writes. This is especially true if the
NV cache is implemented as a log on a regular flash block. In those cases,
it would be better to pad the current write with zeroes to the next
page boundary and rely on garbage collection to do the compaction later.

As I mentioned before, my design avoided the problem by using larger
clusters to start with and then mitigating the space overhead from this
by allowing to put multiple inodes into a single cluster. The tradeoffs
from this are very different than what you have with a fixed 4KB block
size, and it's probably not worth redesigning f2fs to handle this on
such a global scale.

One thing that you can do though is pad each flash page with data from
garbage collection: There should basically always be data that needs
to be GC'd, and as soon as you have decided that you want to write
a block to a new location and the hardware requires that it writes
a block of data to pad the page, you might just as well send down that
block. In the opposite case where you have a full page worth of actual
data that needs to be written (e.g. for a sync()) and half a page
worth of data from garbage collection, you can decide not send the GC
data in order to stay inside on a page boundary.

Doing this systematically would allow using the eMMC-4.5 large-unit
context for all of the logs, which can be a significant performance
improvement, depending on the underlying implementation.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Wednesday 17 October 2012, Jaegeuk Kim wrote:
 As discussed with Dave, I propose the following items.
 
 [In v2]
 - Extension list
: Mkfs supports configuring extensions by user, and that information
  will be stored in the superblock.
  I'll add a mount option to enable/disable using the extension list.
  Instead, f2fs supports xattr to give a hint to any files.
  After supporting this by VFS, it'll be removed.
 - The number of active logs
   : For compatibility, on-disk layout supports max 16 logs.
 Instead, f2fs supports configuring the number of active logs that
 will be used by a mount option.
 The option supports 2, 4, and 6 logs.
 - Section size
   : Mkfs supports multiples of segments for a section, not power-of-two.
 
 [Future optimization]
 - Data separation
   : file access pattern
   : Investigate the option to make large files erase block indirect rather 
 than
part of the normal logs
   : sub-page write avoidance

Ok, sounds good! I'll comment separately on the xattr optimization,
since I had some ideas on how to combine this with other optimizations.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  
An xattr on the root inode that holds a list like this is something
that could be set at mkfs time, but then also updated easily by new
software packages that are installed...
  
  Yes, good idea.
 
 Likewise many file systems, f2fs also supports xattr as a configurable
 Kconfig option.
 If user disables the xattr feature, how can we do this?

I can see three options here:

* make the extension list feature dependent on xattr, and treat all files
  the same if it's disabled.

* put the list into the superblock instead. 

* fall back on a hardcoded list of extensions when the extended attribute
  is not present or the feature is disabled.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
  internal storage before spilling to an external block is probably
  the best approach to take...
 
 Yes, indeed this is the best approach to f2fs's xattr.
 Apart from giving fs hints, it is worth enough to optimize later.

I've thought a bit more about how this could be represented efficiently
in 4KB nodes. This would require a significant change of the way you
represent inodes, but can improve a number of things at the same time.

The idea is to replace the fixed area in the inode that contains block
pointers with an extensible TLV (type/length/value) list that can contain
multiple variable-length fields, like this. All TLVs together with the
fixed-length inode data can fill a 4KB block.

The obvious types would be:

* Direct file contents if the file is less than a block
* List of block pointers, as before, minimum 1, maximum until the end
  of the block
* List of indirect pointers, now also a variable length, similar to the
  list of block pointers
* List of double-indirect block pointers
* direct xattr: zero-terminated attribute name followed by contents
* indirect xattr: zero-terminated attribute name followed by up to
  16 block pointers to store a maximum of 64KB sized xattrs

This could be extended later to cover additional types, e.g. a list
of erase block pointers, triple-indirect blocks or extents.

As a variation of this, it would also be nice to turn around the order
in which the pointers are walked, to optimize for space and for growing
files, rather than for reading the beginning of a file. With this, you
can represent a 9 KB file using a list of two block pointers, and 1KB
of direct data, all in the inode. When the user adds another byte, you
only need to rewrite the inode. Similarly, a 5 MB file would have a
single indirect node (covering block pointers for 4 MB), plus 256
separate block pointers (covering the last megabyte), and a 5 GB file
can be represented using 1 double-indirect node and 256 indirect nodes,
and each of them can still be followed by direct tail data and
extended attributes.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-17 Thread Jaegeuk Kim
 On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  
 An xattr on the root inode that holds a list like this is something
 that could be set at mkfs time, but then also updated easily by new
 software packages that are installed...
  
   Yes, good idea.
 
  Likewise many file systems, f2fs also supports xattr as a configurable
  Kconfig option.
  If user disables the xattr feature, how can we do this?
 
 I can see three options here:
 
 * make the extension list feature dependent on xattr, and treat all files
   the same if it's disabled.
 
 * put the list into the superblock instead.
 
 * fall back on a hardcoded list of extensions when the extended attribute
   is not present or the feature is disabled.
 

IMHO, we don't need to disable the extension list among the cases.
So, as I described before, I propose the following options.

* By default, mkfs stores an extension list in superblock, and f2fs simply
   uses it.
* If users try to handle cold files by themselves, they can give a hint
  via the xattr interface.
* Whenever they want not to use the default extension list, they can
  easily disable it by a mount option.

   Arnd

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
> 2012-10-16 (화), 16:14 +, Arnd Bergmann:
> > On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > > Thank you for a lot of points to be addressed. :)
> > > Maybe it's time to summarize them.
> > > Please let me know what I misunderstood.
> > >
> > > [In v2]
> > > - Extension list
> > >   : Mkfs supports configuring extensions by user, and that information
> > > will be stored in the superblock. In order to reduce the cleaning 
> > > overhead,
> > > f2fs supports an additional interface, ioctl, likewise ext4.
> >
> > That is what I suggested but actually Dave Chinner is the person that you
> > need to listen to rather than me in this regard. Using an extended attribute
> > in the root node would be more appropriate to configure this than an ioctl.
> >
> > > - The number of active logs
> > >   : No change will be done in on-disk layout (i.e., max 6 logs).
> > > Instead, f2fs supports changing the number with a mount option.
> > > Currently, I think 4, 5, and 6 would be enough.
> >
> > Right, that would be the minimum that I would ask for. If it is relatively
> > easy to support more than six logs in the file format without actually
> > implementing them in the code, you might want to support up to 16, just
> > to be future-proof.
> 
> Ok, got it.
> 
> >
> > For the lower bound, being able to support as little as 2 logs for
> > cheap hardware would be nice, but 4 logs is the important one.
> >
> > 5 logs is probably not all that important, as long as you have the
> > choice between 4 and 6. If you implement three different ways, I
> > would prefer have the choice of 2/4/6 over 4/5/6 logs.
> 
> Ok, I'll try, but in the case of 2 logs, it may need to change recovery
> routines.
> 
> >
> > > - Section size
> > >   : Mkfs supports multiples of segments for a section, not power-of-two.
> >
> > Right.
> >
> > > [Future optimization]
> > > - Data separation
> > >   : file access pattern, and else?
> >
> >  : Investigate the option to make large files erase block indirect rather 
> > than
> >part of the normal logs
> >
> > There is one more more point that I have not mentioned before, which is the
> > alignment of write requests. As far as I can tell, you try to group writes
> > as much as possible, but the alignment and the minimum size is still just
> > 4 KB.
> 
> Yes.
> 
> > I fear that this might not be good enough for a lot of cases when
> > the page sizes grow and there is no sufficient amount of nonvolatile
> > write cache in the device. I wonder whether there is something that can
> > be done to ensure we always write with a minimum alignment, and pad
> > out the data with zeroes if necessary in order to avoid getting into
> > garbage collection on devices that can't handle sub-page writes.
> 
> You're very familiar with flash. :)
> Yes, as the page size grows, the sub-page write issue is one of the
> most critical problems.
> I also thought this before, but I have not made a conclusion until now.
> Because, I don't know how to deal with this in other companies, but,
> I've seen that so many firmware developers in samsung have tried to
> reduce this overhead by adapting many schemes.
> I guess very cautiously that other companies also handle this well.
> Therefore, I keep a question whether file system should care about
> this perfectly or not.
> 
> Thanks,
> 
> >
> > Arnd
> 

As discussed with Dave, I propose the following items.

[In v2]
- Extension list
   : Mkfs supports configuring extensions by user, and that information
 will be stored in the superblock.
 I'll add a mount option to enable/disable using the extension list.
 Instead, f2fs supports xattr to give a hint to any files.
 After supporting this by VFS, it'll be removed.
- The number of active logs
  : For compatibility, on-disk layout supports max 16 logs.
Instead, f2fs supports configuring the number of active logs that
will be used by a mount option.
The option supports 2, 4, and 6 logs.
- Section size
  : Mkfs supports multiples of segments for a section, not power-of-two.

[Future optimization]
- Data separation
  : file access pattern
  : Investigate the option to make large files erase block indirect rather than
   part of the normal logs
  : sub-page write avoidance

Thanks,

> --
> Jaegeuk Kim
> Samsung

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
> On Wed, Oct 17, 2012 at 07:30:21AM +0900, Jaegeuk Kim wrote:
> > > > > OTOH, I think xattr itself is for users, not for communicating
> > > > > between file system and users.
> > > >
> > > > No, you are mistaken in that point, as Dave explained.
> > >
> > > e.g. selinux, IMA, ACLs, capabilities, etc all communicate
> > > information that the kernel uses for access control. That's why
> > > xattrs have different namespaces like "system", "security" and
> > > "user". Only user attributes are truly for user data
> > > - the rest are for communicating information to the kernel
> > >
> >
> > I agree that "system" is used by kernel.
> > How about the file system view?
> 
> Not sure what you mean - the filesystem woul dsimply read the xattrs
> in the system namespace as it needs, just like the other subsystems
> like selinux or IMA do.
> 
> > Would you explain what file systems retrieve xattrs and use
> > them with their own purpose?
> 
> I think cachefs users a "CacheFiles.cache" namespace for storing
> information it needs in xattrs. ecryptfs stores crypto metadata in
> xattrs in the lower filesytem. NFSv4 servers store junction mount
> information in xattrs.
> 
> So there are examples where filesystems use xattrs for special
> information. However, in most cases filesystems don't need xattrs
> for their own metadata primarily because that gets added to their
> own on-disk formats. IThe above are all "overlay" style filesystems
> that don't have their own on-disk formats, so need to use xattrs to
> store their per-inode metadata.
> 
> The case of access hints and allocation policies are not somethign
> that are native to any filesystem on-disk format. They are abstract
> concepts that really only the software generating/using that
> information knows about. Given we want the software that uses this
> information to be in VFS, it is separate from every filesystem and
> this is exactly the use case that system xattrs were intended for.
> :)

I understand. Thank you very much. :)

> 
> > Sorry, I'm not familiar with xattrs in depth.
> >
> > Unfortunately, "system" is not implemented in f2fs yet. :(
> 
> If you've already implemented the user.* namespace, then it's
> trivial to support the other namespaces - it's just prefixing the
> xattrs with the appropriate string instead of "user"
> 

Ok, I'll do right now.
Thanks, again.

> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Dave Chinner
On Wed, Oct 17, 2012 at 07:30:21AM +0900, Jaegeuk Kim wrote:
> > > > OTOH, I think xattr itself is for users, not for communicating
> > > > between file system and users.
> > > 
> > > No, you are mistaken in that point, as Dave explained.
> > 
> > e.g. selinux, IMA, ACLs, capabilities, etc all communicate
> > information that the kernel uses for access control. That's why
> > xattrs have different namespaces like "system", "security" and
> > "user". Only user attributes are truly for user data
> > - the rest are for communicating information to the kernel
> > 
> 
> I agree that "system" is used by kernel.
> How about the file system view?

Not sure what you mean - the filesystem woul dsimply read the xattrs
in the system namespace as it needs, just like the other subsystems
like selinux or IMA do.

> Would you explain what file systems retrieve xattrs and use
> them with their own purpose?

I think cachefs users a "CacheFiles.cache" namespace for storing
information it needs in xattrs. ecryptfs stores crypto metadata in
xattrs in the lower filesytem. NFSv4 servers store junction mount
information in xattrs.

So there are examples where filesystems use xattrs for special
information. However, in most cases filesystems don't need xattrs
for their own metadata primarily because that gets added to their
own on-disk formats. IThe above are all "overlay" style filesystems
that don't have their own on-disk formats, so need to use xattrs to
store their per-inode metadata.

The case of access hints and allocation policies are not somethign
that are native to any filesystem on-disk format. They are abstract
concepts that really only the software generating/using that
information knows about. Given we want the software that uses this
information to be in VFS, it is separate from every filesystem and
this is exactly the use case that system xattrs were intended for.
:)

> Sorry, I'm not familiar with xattrs in depth.
> 
> Unfortunately, "system" is not implemented in f2fs yet. :(

If you've already implemented the user.* namespace, then it's
trivial to support the other namespaces - it's just prefixing the
xattrs with the appropriate string instead of "user"

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
> > > > > The main reason I can see against extended attributes is that they 
> > > > > are not stored
> > > > > very efficiently in f2fs, unless a lot of work is put into coming up 
> > > > > with a good
> > > > > implementation. A single flags bit can trivially be added to the 
> > > > > inode in
> > > > > comparison (if it's not there already).
> > > > 
> > > > That's a deficiency that should be corrected, then, because xattrs
> > > > are very common these days.
> > > 
> > > IMO, most file systems including f2fs have some inefficiency to store
> > > and retrieve xattrs, since they have to allocate an additional block.
> > > The only distinct problem in f2fs is that there is a cleaning overhead.
> > > So, that's the why xattr is not an efficient way in f2fs.
> > 
> > I would hope that there is a better way to encode extented attributes
> > if the current method is not efficient enough. Maybe Dave or someone
> > else who is experienced with this can make suggestions.
> > 
> > What is the "expected" size of the attributes normally?
> 
> Most attributes are small. Even "large" user attributes don't
> generally get to be more than a couple of hundred bytes, though the
> maximum size for a single xattr is 64K.
> 
> > Does it
> > make sense to put attributes for multiple files into a single block?
> 
> There are two main ways of dealing with attributes. The first is a
> tree-like structure to index and store unique xattrs, and have the
> inode siimply keep pointers to the main xattr tree. This is great
> for keeping space down when there are lots of identical xattrs, but
> is a serialisation point for modification an modification can be
> complex (i.e. shared entries in the tree need COW semantics.) This
> is the approach ZFS takes, IIRC, and is the most space efficient way
> of dealing with xattrs. It's not the most performance efficient way,
> however, and the reference counting means frequent tree rewrites.
> 
> The second is the XFS/ext4 approach, where xattrs are stored in a
> per-inode tree, with no sharing. The attribute tree holds the
> attributes in it's leaves, and the tree grows and shrinks as you add
> or remove xattrs. There are optimisations on top of this - e.g. for
> XFS if the xattrs fit in the spare space in the inode, they are
> packed into the inode ("shortform") and don't require an external
> block. IIUC, there are patches to implement this optimisation for
> ext4 floating around at the moment. This is a worthwhile
> optimisation, because with a 512 byte inode size on XFS there is
> enough spare space (roughly 380 bytes) for most systems to store all
> their xattrs in the inode itself. XFS also has "remote attr storage"
> for large xattrs (i.e. larger than a leaf block), where the tree
> just keeps a pointer to an external extent that holds the xattr.

Thank you for great explanation. :)

> 
> IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
> internal storage before spilling to an external block is probably
> the best approach to take...

Yes, indeed this is the best approach to f2fs's xattr.
Apart from giving fs hints, it is worth enough to optimize later.

> 
> > > OTOH, I think xattr itself is for users, not for communicating
> > > between file system and users.
> > 
> > No, you are mistaken in that point, as Dave explained.
> 
> e.g. selinux, IMA, ACLs, capabilities, etc all communicate
> information that the kernel uses for access control. That's why
> xattrs have different namespaces like "system", "security" and
> "user". Only user attributes are truly for user data
> - the rest are for communicating information to the kernel
> 

I agree that "system" is used by kernel.
How about the file system view?
Would you explain what file systems retrieve xattrs and use
them with their own purpose?
Sorry, I'm not familiar with xattrs in depth.

Unfortunately, "system" is not implemented in f2fs yet. :(

> A file usage policy xattr would definitely exist under the "system"
> namespace - it's not a user xattr at all.
> 
> Cheers,
> 
> Dave.

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
2012-10-16 (화), 16:14 +, Arnd Bergmann:
> On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > Thank you for a lot of points to be addressed. :)
> > Maybe it's time to summarize them.
> > Please let me know what I misunderstood.
> > 
> > [In v2]
> > - Extension list
> >   : Mkfs supports configuring extensions by user, and that information
> > will be stored in the superblock. In order to reduce the cleaning 
> > overhead,
> > f2fs supports an additional interface, ioctl, likewise ext4.
> 
> That is what I suggested but actually Dave Chinner is the person that you
> need to listen to rather than me in this regard. Using an extended attribute
> in the root node would be more appropriate to configure this than an ioctl.
> 
> > - The number of active logs
> >   : No change will be done in on-disk layout (i.e., max 6 logs).
> > Instead, f2fs supports changing the number with a mount option.
> > Currently, I think 4, 5, and 6 would be enough.
> 
> Right, that would be the minimum that I would ask for. If it is relatively
> easy to support more than six logs in the file format without actually
> implementing them in the code, you might want to support up to 16, just
> to be future-proof.

Ok, got it.

> 
> For the lower bound, being able to support as little as 2 logs for
> cheap hardware would be nice, but 4 logs is the important one.
> 
> 5 logs is probably not all that important, as long as you have the
> choice between 4 and 6. If you implement three different ways, I
> would prefer have the choice of 2/4/6 over 4/5/6 logs.

Ok, I'll try, but in the case of 2 logs, it may need to change recovery
routines.

> 
> > - Section size
> >   : Mkfs supports multiples of segments for a section, not power-of-two.
> 
> Right.
> 
> > [Future optimization]
> > - Data separation
> >   : file access pattern, and else?
> 
>  : Investigate the option to make large files erase block indirect rather than
>part of the normal logs
> 
> There is one more more point that I have not mentioned before, which is the
> alignment of write requests. As far as I can tell, you try to group writes
> as much as possible, but the alignment and the minimum size is still just
> 4 KB.

Yes.

> I fear that this might not be good enough for a lot of cases when
> the page sizes grow and there is no sufficient amount of nonvolatile
> write cache in the device. I wonder whether there is something that can
> be done to ensure we always write with a minimum alignment, and pad
> out the data with zeroes if necessary in order to avoid getting into
> garbage collection on devices that can't handle sub-page writes.

You're very familiar with flash. :)
Yes, as the page size grows, the sub-page write issue is one of the
most critical problems.
I also thought this before, but I have not made a conclusion until now.
Because, I don't know how to deal with this in other companies, but,
I've seen that so many firmware developers in samsung have tried to
reduce this overhead by adapting many schemes.
I guess very cautiously that other companies also handle this well.
Therefore, I keep a question whether file system should care about
this perfectly or not.

Thanks,

> 
>   Arnd

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Dave Chinner
On Tue, Oct 16, 2012 at 11:38:35AM +, Arnd Bergmann wrote:
> On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > On Monday 15 October 2012, Dave Chinner wrote:
> > > On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
> > > > On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
> > > > > On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
> > > > > > 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> > > 
> > > > The main reason I can see against extended attributes is that they are 
> > > > not stored
> > > > very efficiently in f2fs, unless a lot of work is put into coming up 
> > > > with a good
> > > > implementation. A single flags bit can trivially be added to the inode 
> > > > in
> > > > comparison (if it's not there already).
> > > 
> > > That's a deficiency that should be corrected, then, because xattrs
> > > are very common these days.
> > 
> > IMO, most file systems including f2fs have some inefficiency to store
> > and retrieve xattrs, since they have to allocate an additional block.
> > The only distinct problem in f2fs is that there is a cleaning overhead.
> > So, that's the why xattr is not an efficient way in f2fs.
> 
> I would hope that there is a better way to encode extented attributes
> if the current method is not efficient enough. Maybe Dave or someone
> else who is experienced with this can make suggestions.
> 
> What is the "expected" size of the attributes normally?

Most attributes are small. Even "large" user attributes don't
generally get to be more than a couple of hundred bytes, though the
maximum size for a single xattr is 64K.

> Does it
> make sense to put attributes for multiple files into a single block?

There are two main ways of dealing with attributes. The first is a
tree-like structure to index and store unique xattrs, and have the
inode siimply keep pointers to the main xattr tree. This is great
for keeping space down when there are lots of identical xattrs, but
is a serialisation point for modification an modification can be
complex (i.e. shared entries in the tree need COW semantics.) This
is the approach ZFS takes, IIRC, and is the most space efficient way
of dealing with xattrs. It's not the most performance efficient way,
however, and the reference counting means frequent tree rewrites.

The second is the XFS/ext4 approach, where xattrs are stored in a
per-inode tree, with no sharing. The attribute tree holds the
attributes in it's leaves, and the tree grows and shrinks as you add
or remove xattrs. There are optimisations on top of this - e.g. for
XFS if the xattrs fit in the spare space in the inode, they are
packed into the inode ("shortform") and don't require an external
block. IIUC, there are patches to implement this optimisation for
ext4 floating around at the moment. This is a worthwhile
optimisation, because with a 512 byte inode size on XFS there is
enough spare space (roughly 380 bytes) for most systems to store all
their xattrs in the inode itself. XFS also has "remote attr storage"
for large xattrs (i.e. larger than a leaf block), where the tree
just keeps a pointer to an external extent that holds the xattr.

IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
internal storage before spilling to an external block is probably
the best approach to take...

> > OTOH, I think xattr itself is for users, not for communicating
> > between file system and users.
> 
> No, you are mistaken in that point, as Dave explained.

e.g. selinux, IMA, ACLs, capabilities, etc all communicate
information that the kernel uses for access control. That's why
xattrs have different namespaces like "system", "security" and
"user". Only user attributes are truly for user data
- the rest are for communicating information to the kernel

A file usage policy xattr would definitely exist under the "system"
namespace - it's not a user xattr at all.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
2012-10-16 (화), 11:38 +, Arnd Bergmann:
> On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > On Monday 15 October 2012, Dave Chinner wrote:
> > > On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
> > > > On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
> > > > > On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
> > > > > > 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> > > 
> > > > The main reason I can see against extended attributes is that they are 
> > > > not stored
> > > > very efficiently in f2fs, unless a lot of work is put into coming up 
> > > > with a good
> > > > implementation. A single flags bit can trivially be added to the inode 
> > > > in
> > > > comparison (if it's not there already).
> > > 
> > > That's a deficiency that should be corrected, then, because xattrs
> > > are very common these days.
> > 
> > IMO, most file systems including f2fs have some inefficiency to store
> > and retrieve xattrs, since they have to allocate an additional block.
> > The only distinct problem in f2fs is that there is a cleaning overhead.
> > So, that's the why xattr is not an efficient way in f2fs.
> 
> I would hope that there is a better way to encode extented attributes
> if the current method is not efficient enough. Maybe Dave or someone
> else who is experienced with this can make suggestions.
> 
> What is the "expected" size of the attributes normally? Does it
> make sense to put attributes for multiple files into a single block?
>  

IMHO, this is an issue on global vs. local management approaches.
Yes, it is possible to manage xattrs globally, but my concern is how
to efficiently store, retrieve, and search xattrs without lock
contention, when users create and remove so many xattrs dynamically.
It may need to adopt an additional index structure like b+tree.

Well, the important thing that I focus is that xattrs need additional
blocks whatever they are compacted or not, resulting in cleaning
overhead in f2fs.

> > OTOH, I think xattr itself is for users, not for communicating
> > between file system and users.
> 
> No, you are mistaken in that point, as Dave explained.

I meant the original use-cases of xattr like below.
"Typical uses can be storing the author of a document, the character
encoding of a plain-text document, a checksum, cryptographic hash or
digital signature."
from http://en.wikipedia.org/wiki/Extended_file_attributes
Would you please give me some existing use-cases for the
communicating purpose?

> > Moreover, I'm not sure in the current android, but I saw ICS android
> > did not call any xattr operations, even if mount option was enabled.
> 
> I realize that Android is currently your main target, but to get
> the file system merged into Linux, it needs to fit in with the
> overall strategy for file systems, which includes more than just
> Android.

Definitely, yes.
I just wanted to say that xattr is not common on all the platforms yet.

> 
> > > And given that stuff like access frequency tracking is being
> > > implemented at the VFS level, access policy hints should also be VFS
> > > functionality. A bad filesystem implementation should not dictate
> > > the interface for generically useful functionality
> 
> Agreed.
> 
> > > An xattr on the root inode that holds a list like this is something
> > > that could be set at mkfs time, but then also updated easily by new
> > > software packages that are installed...
> 
> Yes, good idea.

Likewise many file systems, f2fs also supports xattr as a configurable
Kconfig option.
If user disables the xattr feature, how can we do this?

> 
> > > > We should also take the kinds of access we have seen on a file into 
> > > > account.
> > > 
> > > Yes, but it should be done at the VFS level, not in the filesystem
> > > itself. Integrated into the current hot inode/range tracking that is
> > > being worked on right now, I'd suggest.
> > > 
> > > IOWs, these access policy issues are not unique to F2FS or it's use
> > > case. Anything to do with access hints, policy, tracking, file
> > > classification, etc that can influence data locality, reclaim,
> > > migration, etc need to be dealt with at the VFS, independently of a
> > > specific filesystem. Filesystems can make use of that information
> > > how they please (whether in the kernel or via userspace tools), but
> > > having filesystem specific interfaces and implementations of the
> > > same functionality is extremely wasteful. Let's do it once, and do
> > > it right the first time. ;)
> > 
> > I agree that VFS should support something, but before then, it needs
> > to do something by the file system first.
> > Because, we have to figure out what kind of information are really useful.
> 
> As mentioned before, such tuning can still be done after the file system
> is merged, as long as the on-disk structure is flexible enough.
> 
> As you said yourself, imlpementing the hints by detecting access patterns
> from the file system itself as I suggested would be a lot of 

Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> Thank you for a lot of points to be addressed. :)
> Maybe it's time to summarize them.
> Please let me know what I misunderstood.
> 
> [In v2]
> - Extension list
>   : Mkfs supports configuring extensions by user, and that information
> will be stored in the superblock. In order to reduce the cleaning 
> overhead,
> f2fs supports an additional interface, ioctl, likewise ext4.

That is what I suggested but actually Dave Chinner is the person that you
need to listen to rather than me in this regard. Using an extended attribute
in the root node would be more appropriate to configure this than an ioctl.

> - The number of active logs
>   : No change will be done in on-disk layout (i.e., max 6 logs).
> Instead, f2fs supports changing the number with a mount option.
> Currently, I think 4, 5, and 6 would be enough.

Right, that would be the minimum that I would ask for. If it is relatively
easy to support more than six logs in the file format without actually
implementing them in the code, you might want to support up to 16, just
to be future-proof.

For the lower bound, being able to support as little as 2 logs for
cheap hardware would be nice, but 4 logs is the important one.

5 logs is probably not all that important, as long as you have the
choice between 4 and 6. If you implement three different ways, I
would prefer have the choice of 2/4/6 over 4/5/6 logs.

> - Section size
>   : Mkfs supports multiples of segments for a section, not power-of-two.

Right.

> [Future optimization]
> - Data separation
>   : file access pattern, and else?

 : Investigate the option to make large files erase block indirect rather than
   part of the normal logs

There is one more more point that I have not mentioned before, which is the
alignment of write requests. As far as I can tell, you try to group writes
as much as possible, but the alignment and the minimum size is still just
4 KB. I fear that this might not be good enough for a lot of cases when
the page sizes grow and there is no sufficient amount of nonvolatile
write cache in the device. I wonder whether there is something that can
be done to ensure we always write with a minimum alignment, and pad
out the data with zeroes if necessary in order to avoid getting into
garbage collection on devices that can't handle sub-page writes.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> On Monday 15 October 2012, Dave Chinner wrote:
> > On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
> > > On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
> > > > On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
> > > > > 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> > 
> > > The main reason I can see against extended attributes is that they are 
> > > not stored
> > > very efficiently in f2fs, unless a lot of work is put into coming up with 
> > > a good
> > > implementation. A single flags bit can trivially be added to the inode in
> > > comparison (if it's not there already).
> > 
> > That's a deficiency that should be corrected, then, because xattrs
> > are very common these days.
> 
> IMO, most file systems including f2fs have some inefficiency to store
> and retrieve xattrs, since they have to allocate an additional block.
> The only distinct problem in f2fs is that there is a cleaning overhead.
> So, that's the why xattr is not an efficient way in f2fs.

I would hope that there is a better way to encode extented attributes
if the current method is not efficient enough. Maybe Dave or someone
else who is experienced with this can make suggestions.

What is the "expected" size of the attributes normally? Does it
make sense to put attributes for multiple files into a single block?
 
> OTOH, I think xattr itself is for users, not for communicating
> between file system and users.

No, you are mistaken in that point, as Dave explained.

> Moreover, I'm not sure in the current android, but I saw ICS android
> did not call any xattr operations, even if mount option was enabled.

I realize that Android is currently your main target, but to get
the file system merged into Linux, it needs to fit in with the
overall strategy for file systems, which includes more than just
Android.

> > And given that stuff like access frequency tracking is being
> > implemented at the VFS level, access policy hints should also be VFS
> > functionality. A bad filesystem implementation should not dictate
> > the interface for generically useful functionality

Agreed.

> > An xattr on the root inode that holds a list like this is something
> > that could be set at mkfs time, but then also updated easily by new
> > software packages that are installed...

Yes, good idea.

> > > We should also take the kinds of access we have seen on a file into 
> > > account.
> > 
> > Yes, but it should be done at the VFS level, not in the filesystem
> > itself. Integrated into the current hot inode/range tracking that is
> > being worked on right now, I'd suggest.
> > 
> > IOWs, these access policy issues are not unique to F2FS or it's use
> > case. Anything to do with access hints, policy, tracking, file
> > classification, etc that can influence data locality, reclaim,
> > migration, etc need to be dealt with at the VFS, independently of a
> > specific filesystem. Filesystems can make use of that information
> > how they please (whether in the kernel or via userspace tools), but
> > having filesystem specific interfaces and implementations of the
> > same functionality is extremely wasteful. Let's do it once, and do
> > it right the first time. ;)
> 
> I agree that VFS should support something, but before then, it needs
> to do something by the file system first.
> Because, we have to figure out what kind of information are really useful.

As mentioned before, such tuning can still be done after the file system
is merged, as long as the on-disk structure is flexible enough.

As you said yourself, imlpementing the hints by detecting access patterns
from the file system itself as I suggested would be a lot of work, and
if the VFS can give us that information for free in the future, we can
wait for that to happen (or help out on the implementation if necessary)
and then implement it based on those APIs.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
 On Monday 15 October 2012, Dave Chinner wrote:
  On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
   On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
  
   The main reason I can see against extended attributes is that they are 
   not stored
   very efficiently in f2fs, unless a lot of work is put into coming up with 
   a good
   implementation. A single flags bit can trivially be added to the inode in
   comparison (if it's not there already).
  
  That's a deficiency that should be corrected, then, because xattrs
  are very common these days.
 
 IMO, most file systems including f2fs have some inefficiency to store
 and retrieve xattrs, since they have to allocate an additional block.
 The only distinct problem in f2fs is that there is a cleaning overhead.
 So, that's the why xattr is not an efficient way in f2fs.

I would hope that there is a better way to encode extented attributes
if the current method is not efficient enough. Maybe Dave or someone
else who is experienced with this can make suggestions.

What is the expected size of the attributes normally? Does it
make sense to put attributes for multiple files into a single block?
 
 OTOH, I think xattr itself is for users, not for communicating
 between file system and users.

No, you are mistaken in that point, as Dave explained.

 Moreover, I'm not sure in the current android, but I saw ICS android
 did not call any xattr operations, even if mount option was enabled.

I realize that Android is currently your main target, but to get
the file system merged into Linux, it needs to fit in with the
overall strategy for file systems, which includes more than just
Android.

  And given that stuff like access frequency tracking is being
  implemented at the VFS level, access policy hints should also be VFS
  functionality. A bad filesystem implementation should not dictate
  the interface for generically useful functionality

Agreed.

  An xattr on the root inode that holds a list like this is something
  that could be set at mkfs time, but then also updated easily by new
  software packages that are installed...

Yes, good idea.

   We should also take the kinds of access we have seen on a file into 
   account.
  
  Yes, but it should be done at the VFS level, not in the filesystem
  itself. Integrated into the current hot inode/range tracking that is
  being worked on right now, I'd suggest.
  
  IOWs, these access policy issues are not unique to F2FS or it's use
  case. Anything to do with access hints, policy, tracking, file
  classification, etc that can influence data locality, reclaim,
  migration, etc need to be dealt with at the VFS, independently of a
  specific filesystem. Filesystems can make use of that information
  how they please (whether in the kernel or via userspace tools), but
  having filesystem specific interfaces and implementations of the
  same functionality is extremely wasteful. Let's do it once, and do
  it right the first time. ;)
 
 I agree that VFS should support something, but before then, it needs
 to do something by the file system first.
 Because, we have to figure out what kind of information are really useful.

As mentioned before, such tuning can still be done after the file system
is merged, as long as the on-disk structure is flexible enough.

As you said yourself, imlpementing the hints by detecting access patterns
from the file system itself as I suggested would be a lot of work, and
if the VFS can give us that information for free in the future, we can
wait for that to happen (or help out on the implementation if necessary)
and then implement it based on those APIs.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Arnd Bergmann
On Tuesday 16 October 2012, Jaegeuk Kim wrote:
 Thank you for a lot of points to be addressed. :)
 Maybe it's time to summarize them.
 Please let me know what I misunderstood.
 
 [In v2]
 - Extension list
   : Mkfs supports configuring extensions by user, and that information
 will be stored in the superblock. In order to reduce the cleaning 
 overhead,
 f2fs supports an additional interface, ioctl, likewise ext4.

That is what I suggested but actually Dave Chinner is the person that you
need to listen to rather than me in this regard. Using an extended attribute
in the root node would be more appropriate to configure this than an ioctl.

 - The number of active logs
   : No change will be done in on-disk layout (i.e., max 6 logs).
 Instead, f2fs supports changing the number with a mount option.
 Currently, I think 4, 5, and 6 would be enough.

Right, that would be the minimum that I would ask for. If it is relatively
easy to support more than six logs in the file format without actually
implementing them in the code, you might want to support up to 16, just
to be future-proof.

For the lower bound, being able to support as little as 2 logs for
cheap hardware would be nice, but 4 logs is the important one.

5 logs is probably not all that important, as long as you have the
choice between 4 and 6. If you implement three different ways, I
would prefer have the choice of 2/4/6 over 4/5/6 logs.

 - Section size
   : Mkfs supports multiples of segments for a section, not power-of-two.

Right.

 [Future optimization]
 - Data separation
   : file access pattern, and else?

 : Investigate the option to make large files erase block indirect rather than
   part of the normal logs

There is one more more point that I have not mentioned before, which is the
alignment of write requests. As far as I can tell, you try to group writes
as much as possible, but the alignment and the minimum size is still just
4 KB. I fear that this might not be good enough for a lot of cases when
the page sizes grow and there is no sufficient amount of nonvolatile
write cache in the device. I wonder whether there is something that can
be done to ensure we always write with a minimum alignment, and pad
out the data with zeroes if necessary in order to avoid getting into
garbage collection on devices that can't handle sub-page writes.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
2012-10-16 (화), 11:38 +, Arnd Bergmann:
 On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  On Monday 15 October 2012, Dave Chinner wrote:
   On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
 On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
  2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
   
The main reason I can see against extended attributes is that they are 
not stored
very efficiently in f2fs, unless a lot of work is put into coming up 
with a good
implementation. A single flags bit can trivially be added to the inode 
in
comparison (if it's not there already).
   
   That's a deficiency that should be corrected, then, because xattrs
   are very common these days.
  
  IMO, most file systems including f2fs have some inefficiency to store
  and retrieve xattrs, since they have to allocate an additional block.
  The only distinct problem in f2fs is that there is a cleaning overhead.
  So, that's the why xattr is not an efficient way in f2fs.
 
 I would hope that there is a better way to encode extented attributes
 if the current method is not efficient enough. Maybe Dave or someone
 else who is experienced with this can make suggestions.
 
 What is the expected size of the attributes normally? Does it
 make sense to put attributes for multiple files into a single block?
  

IMHO, this is an issue on global vs. local management approaches.
Yes, it is possible to manage xattrs globally, but my concern is how
to efficiently store, retrieve, and search xattrs without lock
contention, when users create and remove so many xattrs dynamically.
It may need to adopt an additional index structure like b+tree.

Well, the important thing that I focus is that xattrs need additional
blocks whatever they are compacted or not, resulting in cleaning
overhead in f2fs.

  OTOH, I think xattr itself is for users, not for communicating
  between file system and users.
 
 No, you are mistaken in that point, as Dave explained.

I meant the original use-cases of xattr like below.
Typical uses can be storing the author of a document, the character
encoding of a plain-text document, a checksum, cryptographic hash or
digital signature.
from http://en.wikipedia.org/wiki/Extended_file_attributes
Would you please give me some existing use-cases for the
communicating purpose?

  Moreover, I'm not sure in the current android, but I saw ICS android
  did not call any xattr operations, even if mount option was enabled.
 
 I realize that Android is currently your main target, but to get
 the file system merged into Linux, it needs to fit in with the
 overall strategy for file systems, which includes more than just
 Android.

Definitely, yes.
I just wanted to say that xattr is not common on all the platforms yet.

 
   And given that stuff like access frequency tracking is being
   implemented at the VFS level, access policy hints should also be VFS
   functionality. A bad filesystem implementation should not dictate
   the interface for generically useful functionality
 
 Agreed.
 
   An xattr on the root inode that holds a list like this is something
   that could be set at mkfs time, but then also updated easily by new
   software packages that are installed...
 
 Yes, good idea.

Likewise many file systems, f2fs also supports xattr as a configurable
Kconfig option.
If user disables the xattr feature, how can we do this?

 
We should also take the kinds of access we have seen on a file into 
account.
   
   Yes, but it should be done at the VFS level, not in the filesystem
   itself. Integrated into the current hot inode/range tracking that is
   being worked on right now, I'd suggest.
   
   IOWs, these access policy issues are not unique to F2FS or it's use
   case. Anything to do with access hints, policy, tracking, file
   classification, etc that can influence data locality, reclaim,
   migration, etc need to be dealt with at the VFS, independently of a
   specific filesystem. Filesystems can make use of that information
   how they please (whether in the kernel or via userspace tools), but
   having filesystem specific interfaces and implementations of the
   same functionality is extremely wasteful. Let's do it once, and do
   it right the first time. ;)
  
  I agree that VFS should support something, but before then, it needs
  to do something by the file system first.
  Because, we have to figure out what kind of information are really useful.
 
 As mentioned before, such tuning can still be done after the file system
 is merged, as long as the on-disk structure is flexible enough.
 
 As you said yourself, imlpementing the hints by detecting access patterns
 from the file system itself as I suggested would be a lot of work, and
 if the VFS can give us that information for free in the future, we can
 wait for that to happen (or help out on the implementation if necessary)
 and then implement it based on 

Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Dave Chinner
On Tue, Oct 16, 2012 at 11:38:35AM +, Arnd Bergmann wrote:
 On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  On Monday 15 October 2012, Dave Chinner wrote:
   On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
 On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
  2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
   
The main reason I can see against extended attributes is that they are 
not stored
very efficiently in f2fs, unless a lot of work is put into coming up 
with a good
implementation. A single flags bit can trivially be added to the inode 
in
comparison (if it's not there already).
   
   That's a deficiency that should be corrected, then, because xattrs
   are very common these days.
  
  IMO, most file systems including f2fs have some inefficiency to store
  and retrieve xattrs, since they have to allocate an additional block.
  The only distinct problem in f2fs is that there is a cleaning overhead.
  So, that's the why xattr is not an efficient way in f2fs.
 
 I would hope that there is a better way to encode extented attributes
 if the current method is not efficient enough. Maybe Dave or someone
 else who is experienced with this can make suggestions.
 
 What is the expected size of the attributes normally?

Most attributes are small. Even large user attributes don't
generally get to be more than a couple of hundred bytes, though the
maximum size for a single xattr is 64K.

 Does it
 make sense to put attributes for multiple files into a single block?

There are two main ways of dealing with attributes. The first is a
tree-like structure to index and store unique xattrs, and have the
inode siimply keep pointers to the main xattr tree. This is great
for keeping space down when there are lots of identical xattrs, but
is a serialisation point for modification an modification can be
complex (i.e. shared entries in the tree need COW semantics.) This
is the approach ZFS takes, IIRC, and is the most space efficient way
of dealing with xattrs. It's not the most performance efficient way,
however, and the reference counting means frequent tree rewrites.

The second is the XFS/ext4 approach, where xattrs are stored in a
per-inode tree, with no sharing. The attribute tree holds the
attributes in it's leaves, and the tree grows and shrinks as you add
or remove xattrs. There are optimisations on top of this - e.g. for
XFS if the xattrs fit in the spare space in the inode, they are
packed into the inode (shortform) and don't require an external
block. IIUC, there are patches to implement this optimisation for
ext4 floating around at the moment. This is a worthwhile
optimisation, because with a 512 byte inode size on XFS there is
enough spare space (roughly 380 bytes) for most systems to store all
their xattrs in the inode itself. XFS also has remote attr storage
for large xattrs (i.e. larger than a leaf block), where the tree
just keeps a pointer to an external extent that holds the xattr.

IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
internal storage before spilling to an external block is probably
the best approach to take...

  OTOH, I think xattr itself is for users, not for communicating
  between file system and users.
 
 No, you are mistaken in that point, as Dave explained.

e.g. selinux, IMA, ACLs, capabilities, etc all communicate
information that the kernel uses for access control. That's why
xattrs have different namespaces like system, security and
user. Only user attributes are truly for user data
- the rest are for communicating information to the kernel

A file usage policy xattr would definitely exist under the system
namespace - it's not a user xattr at all.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
2012-10-16 (화), 16:14 +, Arnd Bergmann:
 On Tuesday 16 October 2012, Jaegeuk Kim wrote:
  Thank you for a lot of points to be addressed. :)
  Maybe it's time to summarize them.
  Please let me know what I misunderstood.
  
  [In v2]
  - Extension list
: Mkfs supports configuring extensions by user, and that information
  will be stored in the superblock. In order to reduce the cleaning 
  overhead,
  f2fs supports an additional interface, ioctl, likewise ext4.
 
 That is what I suggested but actually Dave Chinner is the person that you
 need to listen to rather than me in this regard. Using an extended attribute
 in the root node would be more appropriate to configure this than an ioctl.
 
  - The number of active logs
: No change will be done in on-disk layout (i.e., max 6 logs).
  Instead, f2fs supports changing the number with a mount option.
  Currently, I think 4, 5, and 6 would be enough.
 
 Right, that would be the minimum that I would ask for. If it is relatively
 easy to support more than six logs in the file format without actually
 implementing them in the code, you might want to support up to 16, just
 to be future-proof.

Ok, got it.

 
 For the lower bound, being able to support as little as 2 logs for
 cheap hardware would be nice, but 4 logs is the important one.
 
 5 logs is probably not all that important, as long as you have the
 choice between 4 and 6. If you implement three different ways, I
 would prefer have the choice of 2/4/6 over 4/5/6 logs.

Ok, I'll try, but in the case of 2 logs, it may need to change recovery
routines.

 
  - Section size
: Mkfs supports multiples of segments for a section, not power-of-two.
 
 Right.
 
  [Future optimization]
  - Data separation
: file access pattern, and else?
 
  : Investigate the option to make large files erase block indirect rather than
part of the normal logs
 
 There is one more more point that I have not mentioned before, which is the
 alignment of write requests. As far as I can tell, you try to group writes
 as much as possible, but the alignment and the minimum size is still just
 4 KB.

Yes.

 I fear that this might not be good enough for a lot of cases when
 the page sizes grow and there is no sufficient amount of nonvolatile
 write cache in the device. I wonder whether there is something that can
 be done to ensure we always write with a minimum alignment, and pad
 out the data with zeroes if necessary in order to avoid getting into
 garbage collection on devices that can't handle sub-page writes.

You're very familiar with flash. :)
Yes, as the page size grows, the sub-page write issue is one of the
most critical problems.
I also thought this before, but I have not made a conclusion until now.
Because, I don't know how to deal with this in other companies, but,
I've seen that so many firmware developers in samsung have tried to
reduce this overhead by adapting many schemes.
I guess very cautiously that other companies also handle this well.
Therefore, I keep a question whether file system should care about
this perfectly or not.

Thanks,

 
   Arnd

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
 The main reason I can see against extended attributes is that they 
 are not stored
 very efficiently in f2fs, unless a lot of work is put into coming up 
 with a good
 implementation. A single flags bit can trivially be added to the 
 inode in
 comparison (if it's not there already).

That's a deficiency that should be corrected, then, because xattrs
are very common these days.
   
   IMO, most file systems including f2fs have some inefficiency to store
   and retrieve xattrs, since they have to allocate an additional block.
   The only distinct problem in f2fs is that there is a cleaning overhead.
   So, that's the why xattr is not an efficient way in f2fs.
  
  I would hope that there is a better way to encode extented attributes
  if the current method is not efficient enough. Maybe Dave or someone
  else who is experienced with this can make suggestions.
  
  What is the expected size of the attributes normally?
 
 Most attributes are small. Even large user attributes don't
 generally get to be more than a couple of hundred bytes, though the
 maximum size for a single xattr is 64K.
 
  Does it
  make sense to put attributes for multiple files into a single block?
 
 There are two main ways of dealing with attributes. The first is a
 tree-like structure to index and store unique xattrs, and have the
 inode siimply keep pointers to the main xattr tree. This is great
 for keeping space down when there are lots of identical xattrs, but
 is a serialisation point for modification an modification can be
 complex (i.e. shared entries in the tree need COW semantics.) This
 is the approach ZFS takes, IIRC, and is the most space efficient way
 of dealing with xattrs. It's not the most performance efficient way,
 however, and the reference counting means frequent tree rewrites.
 
 The second is the XFS/ext4 approach, where xattrs are stored in a
 per-inode tree, with no sharing. The attribute tree holds the
 attributes in it's leaves, and the tree grows and shrinks as you add
 or remove xattrs. There are optimisations on top of this - e.g. for
 XFS if the xattrs fit in the spare space in the inode, they are
 packed into the inode (shortform) and don't require an external
 block. IIUC, there are patches to implement this optimisation for
 ext4 floating around at the moment. This is a worthwhile
 optimisation, because with a 512 byte inode size on XFS there is
 enough spare space (roughly 380 bytes) for most systems to store all
 their xattrs in the inode itself. XFS also has remote attr storage
 for large xattrs (i.e. larger than a leaf block), where the tree
 just keeps a pointer to an external extent that holds the xattr.

Thank you for great explanation. :)

 
 IIRC, fs2fs uses 4k inodes, so IMO per-inode xattr tress with
 internal storage before spilling to an external block is probably
 the best approach to take...

Yes, indeed this is the best approach to f2fs's xattr.
Apart from giving fs hints, it is worth enough to optimize later.

 
   OTOH, I think xattr itself is for users, not for communicating
   between file system and users.
  
  No, you are mistaken in that point, as Dave explained.
 
 e.g. selinux, IMA, ACLs, capabilities, etc all communicate
 information that the kernel uses for access control. That's why
 xattrs have different namespaces like system, security and
 user. Only user attributes are truly for user data
 - the rest are for communicating information to the kernel
 

I agree that system is used by kernel.
How about the file system view?
Would you explain what file systems retrieve xattrs and use
them with their own purpose?
Sorry, I'm not familiar with xattrs in depth.

Unfortunately, system is not implemented in f2fs yet. :(

 A file usage policy xattr would definitely exist under the system
 namespace - it's not a user xattr at all.
 
 Cheers,
 
 Dave.

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Dave Chinner
On Wed, Oct 17, 2012 at 07:30:21AM +0900, Jaegeuk Kim wrote:
OTOH, I think xattr itself is for users, not for communicating
between file system and users.
   
   No, you are mistaken in that point, as Dave explained.
  
  e.g. selinux, IMA, ACLs, capabilities, etc all communicate
  information that the kernel uses for access control. That's why
  xattrs have different namespaces like system, security and
  user. Only user attributes are truly for user data
  - the rest are for communicating information to the kernel
  
 
 I agree that system is used by kernel.
 How about the file system view?

Not sure what you mean - the filesystem woul dsimply read the xattrs
in the system namespace as it needs, just like the other subsystems
like selinux or IMA do.

 Would you explain what file systems retrieve xattrs and use
 them with their own purpose?

I think cachefs users a CacheFiles.cache namespace for storing
information it needs in xattrs. ecryptfs stores crypto metadata in
xattrs in the lower filesytem. NFSv4 servers store junction mount
information in xattrs.

So there are examples where filesystems use xattrs for special
information. However, in most cases filesystems don't need xattrs
for their own metadata primarily because that gets added to their
own on-disk formats. IThe above are all overlay style filesystems
that don't have their own on-disk formats, so need to use xattrs to
store their per-inode metadata.

The case of access hints and allocation policies are not somethign
that are native to any filesystem on-disk format. They are abstract
concepts that really only the software generating/using that
information knows about. Given we want the software that uses this
information to be in VFS, it is separate from every filesystem and
this is exactly the use case that system xattrs were intended for.
:)

 Sorry, I'm not familiar with xattrs in depth.
 
 Unfortunately, system is not implemented in f2fs yet. :(

If you've already implemented the user.* namespace, then it's
trivial to support the other namespaces - it's just prefixing the
xattrs with the appropriate string instead of user

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
 On Wed, Oct 17, 2012 at 07:30:21AM +0900, Jaegeuk Kim wrote:
 OTOH, I think xattr itself is for users, not for communicating
 between file system and users.
   
No, you are mistaken in that point, as Dave explained.
  
   e.g. selinux, IMA, ACLs, capabilities, etc all communicate
   information that the kernel uses for access control. That's why
   xattrs have different namespaces like system, security and
   user. Only user attributes are truly for user data
   - the rest are for communicating information to the kernel
  
 
  I agree that system is used by kernel.
  How about the file system view?
 
 Not sure what you mean - the filesystem woul dsimply read the xattrs
 in the system namespace as it needs, just like the other subsystems
 like selinux or IMA do.
 
  Would you explain what file systems retrieve xattrs and use
  them with their own purpose?
 
 I think cachefs users a CacheFiles.cache namespace for storing
 information it needs in xattrs. ecryptfs stores crypto metadata in
 xattrs in the lower filesytem. NFSv4 servers store junction mount
 information in xattrs.
 
 So there are examples where filesystems use xattrs for special
 information. However, in most cases filesystems don't need xattrs
 for their own metadata primarily because that gets added to their
 own on-disk formats. IThe above are all overlay style filesystems
 that don't have their own on-disk formats, so need to use xattrs to
 store their per-inode metadata.
 
 The case of access hints and allocation policies are not somethign
 that are native to any filesystem on-disk format. They are abstract
 concepts that really only the software generating/using that
 information knows about. Given we want the software that uses this
 information to be in VFS, it is separate from every filesystem and
 this is exactly the use case that system xattrs were intended for.
 :)

I understand. Thank you very much. :)

 
  Sorry, I'm not familiar with xattrs in depth.
 
  Unfortunately, system is not implemented in f2fs yet. :(
 
 If you've already implemented the user.* namespace, then it's
 trivial to support the other namespaces - it's just prefixing the
 xattrs with the appropriate string instead of user
 

Ok, I'll do right now.
Thanks, again.

 Cheers,
 
 Dave.
 --
 Dave Chinner
 da...@fromorbit.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-16 Thread Jaegeuk Kim
 2012-10-16 (화), 16:14 +, Arnd Bergmann:
  On Tuesday 16 October 2012, Jaegeuk Kim wrote:
   Thank you for a lot of points to be addressed. :)
   Maybe it's time to summarize them.
   Please let me know what I misunderstood.
  
   [In v2]
   - Extension list
 : Mkfs supports configuring extensions by user, and that information
   will be stored in the superblock. In order to reduce the cleaning 
   overhead,
   f2fs supports an additional interface, ioctl, likewise ext4.
 
  That is what I suggested but actually Dave Chinner is the person that you
  need to listen to rather than me in this regard. Using an extended attribute
  in the root node would be more appropriate to configure this than an ioctl.
 
   - The number of active logs
 : No change will be done in on-disk layout (i.e., max 6 logs).
   Instead, f2fs supports changing the number with a mount option.
   Currently, I think 4, 5, and 6 would be enough.
 
  Right, that would be the minimum that I would ask for. If it is relatively
  easy to support more than six logs in the file format without actually
  implementing them in the code, you might want to support up to 16, just
  to be future-proof.
 
 Ok, got it.
 
 
  For the lower bound, being able to support as little as 2 logs for
  cheap hardware would be nice, but 4 logs is the important one.
 
  5 logs is probably not all that important, as long as you have the
  choice between 4 and 6. If you implement three different ways, I
  would prefer have the choice of 2/4/6 over 4/5/6 logs.
 
 Ok, I'll try, but in the case of 2 logs, it may need to change recovery
 routines.
 
 
   - Section size
 : Mkfs supports multiples of segments for a section, not power-of-two.
 
  Right.
 
   [Future optimization]
   - Data separation
 : file access pattern, and else?
 
   : Investigate the option to make large files erase block indirect rather 
  than
 part of the normal logs
 
  There is one more more point that I have not mentioned before, which is the
  alignment of write requests. As far as I can tell, you try to group writes
  as much as possible, but the alignment and the minimum size is still just
  4 KB.
 
 Yes.
 
  I fear that this might not be good enough for a lot of cases when
  the page sizes grow and there is no sufficient amount of nonvolatile
  write cache in the device. I wonder whether there is something that can
  be done to ensure we always write with a minimum alignment, and pad
  out the data with zeroes if necessary in order to avoid getting into
  garbage collection on devices that can't handle sub-page writes.
 
 You're very familiar with flash. :)
 Yes, as the page size grows, the sub-page write issue is one of the
 most critical problems.
 I also thought this before, but I have not made a conclusion until now.
 Because, I don't know how to deal with this in other companies, but,
 I've seen that so many firmware developers in samsung have tried to
 reduce this overhead by adapting many schemes.
 I guess very cautiously that other companies also handle this well.
 Therefore, I keep a question whether file system should care about
 this perfectly or not.
 
 Thanks,
 
 
  Arnd
 

As discussed with Dave, I propose the following items.

[In v2]
- Extension list
   : Mkfs supports configuring extensions by user, and that information
 will be stored in the superblock.
 I'll add a mount option to enable/disable using the extension list.
 Instead, f2fs supports xattr to give a hint to any files.
 After supporting this by VFS, it'll be removed.
- The number of active logs
  : For compatibility, on-disk layout supports max 16 logs.
Instead, f2fs supports configuring the number of active logs that
will be used by a mount option.
The option supports 2, 4, and 6 logs.
- Section size
  : Mkfs supports multiples of segments for a section, not power-of-two.

[Future optimization]
- Data separation
  : file access pattern
  : Investigate the option to make large files erase block indirect rather than
   part of the normal logs
  : sub-page write avoidance

Thanks,

 --
 Jaegeuk Kim
 Samsung

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Jaegeuk Kim
> On Monday 15 October 2012, Changman Lee wrote:
> > 2012년 10월 15일 월요일에 Arnd Bergmann님이 작성:
> > > It is only a performance hint though, so it is not a correctness issue the
> > > file system gets it wrong. In order to do efficient garbage collection, a 
> > > log
> > > structured file system should take all the information it can get about 
> > > the
> > > expected life of data it writes. I agree that the list, even in the form 
> > > of
> > > mkfs time settings, is not a clean abstraction, but in the place of an 
> > > Android
> > > phone manufacturer I would still enable it if it promises a significant
> > > performance advantage over not using it. I guess it would be nice if this
> > > could be overridden in some form, e.g. using an ioctl on the file as ext4 
> > > does.
> > >
> > Right. This is related with HOT/COLD separation policy of f2fs. If we know
> > that data is COLD, we can manage gc effectively.
> > I think that ext lists are placed in sb is better like your advice because
> > it's difficult to fix user app. Although it's nasty way.
> 
> Ok. I think you should adapt the terminology though. Right now, the 
> optimization
> is to mark the data as COLD because we expect it to be written less often than
> other kinds of data. However, the hot/cold terms are usually only applied to
> data that we assume is going to be written soon or not based on how often
> the same data has been accessed in the past.
> 
> Anything you detect from the file name is not really a hint on hot/cold
> files, but rather on the expected access pattern: These files are going
> to be written once, and will be read-only after that, they are probably
> multiple megabytes in size, and if you have a lot of them, they are likely
> to live for the same time.
> 
> It may well be possible that we later decide to use the hint in a different
> way, e.g. to put these files into yet another separate log, aside from
> other hot or cold files.
> 
> > > We should also take the kinds of access we have seen on a file into 
> > > account.
> > > E.g. if someone opens a file O_RDWR and performs seek or pwrite on it, we 
> > > can
> > > assume that it's not in the category of typical media files, and a file 
> > > that
> > > gets written to disk linearly in multiple megabytes might belong into the
> > > category even if it is named otherwise.
> > >
> > This is more general but it's hard to adapt now.
> 
> I think it's important to leave the option open for a future optimization.
> Right now, what we have to get agreement on is the on-disk format, because
> we absolutely don't want to make incompatible changes to that once f2fs
> has been merged into the kernel and is getting used on real systems.
> 
> This is independent of how the code is implemented at the moment, and
> any tuning regarding how to group different kinds of data into the six
> logs is completely up to how things work out in practice. But you should
> definitely ensure that those changes don't require changing the format
> if we decide to use a different number of logs in the future, or to
> use the logs differently.
> 
> The split between logs for nodes on the one hand and data on the other
> is something that can well be hardcoded, and it's ok to have a hard
> upper bound on the number of logs in the file system, possibly higher
> than 6.
> 

Thank you for a lot of points to be addressed. :)
Maybe it's time to summarize them.
Please let me know what I misunderstood.

[In v2]
- Extension list
  : Mkfs supports configuring extensions by user, and that information
will be stored in the superblock. In order to reduce the cleaning overhead,
f2fs supports an additional interface, ioctl, likewise ext4.

- The number of active logs
  : No change will be done in on-disk layout (i.e., max 6 logs).
Instead, f2fs supports changing the number with a mount option.
Currently, I think 4, 5, and 6 would be enough.

- Section size
  : Mkfs supports multiples of segments for a section, not power-of-two.

[Future optimization]
- Data separation
  : file access pattern, and else?

>   Arnd

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Jaegeuk Kim
> On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
> > On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
> > > On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
> > > > 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> > > Extended attributes are more flexible way, from my point of view. The 
> > > xattr gives
> > > possibility to make hint to filesystem at any time and without any 
> > > dependencies with
> > > application's functional opportunities. Documented way of using such 
> > > extended attributes
> > > gives to user flexible way of manipulation of filesystem behavior (but I 
> > > remember that
> > > you don't believe in an user :-)).
> > >
> > > So, I think that fadvise() and extended attributes can be complementary 
> > > solutions.
> >
> > Right. Another option is to have ext4 style attributes, see
> > http://linux.die.net/man/1/chattr
> 
> Xattrs are much prefered to more "ext4 style" flags because xattrs
> are filesystem independent. Indeed, some filesystems can't store any
> new "ext4 style" flags without a change of disk format or
> internally mapping them to an xattr. So really, xattrs are the best
> way forward for such hints.
> 
> > Unlike extended attributes, there is a limited number of those,
> > and they can only be boolean flags, but that might be enough for
> > this particular use case.
> 
> A boolean is not sufficient for access policy hints. An extensible
> xattr format is probably the best approach to take here, so that we
> can easily introduce new access policy hints as functionality is
> required. Indeed, an extensible xattr could start with just a
> hot/cold boolean, and grow from there
> 
> > The main reason I can see against extended attributes is that they are not 
> > stored
> > very efficiently in f2fs, unless a lot of work is put into coming up with a 
> > good
> > implementation. A single flags bit can trivially be added to the inode in
> > comparison (if it's not there already).
> 
> That's a deficiency that should be corrected, then, because xattrs
> are very common these days.

IMO, most file systems including f2fs have some inefficiency to store
and retrieve xattrs, since they have to allocate an additional block.
The only distinct problem in f2fs is that there is a cleaning overhead.
So, that's the why xattr is not an efficient way in f2fs.

OTOH, I think xattr itself is for users, not for communicating
between file system and users.
Moreover, I'm not sure in the current android, but I saw ICS android
did not call any xattr operations, even if mount option was enabled.

> 
> And given that stuff like access frequency tracking is being
> implemented at the VFS level, access policy hints should also be VFS
> functionality. A bad filesystem implementation should not dictate
> the interface for generically useful functionality
> 
> > > Anyway, hardcoding or saving in filesystem list of file extensions is a 
> > > nasty way. It
> > > can be not safe or hardly understandable by users the way of 
> > > reconfiguration filesystem
> > > by means of tunefs or debugfs with the purpose of file extensions 
> > > addition in such
> > > "black-box" as TV or smartphones, from my point of view.
> >
> > It is only a performance hint though, so it is not a correctness issue the
> > file system gets it wrong. In order to do efficient garbage collection, a 
> > log
> > structured file system should take all the information it can get about the
> > expected life of data it writes. I agree that the list, even in the form of
> > mkfs time settings, is not a clean abstraction, but in the place of an 
> > Android
> > phone manufacturer I would still enable it if it promises a significant
> > performance advantage over not using it. I guess it would be nice if this
> > could be overridden in some form, e.g. using an ioctl on the file as ext4 
> > does.
> 
> An xattr on the root inode that holds a list like this is something
> that could be set at mkfs time, but then also updated easily by new
> software packages that are installed...
> 
> > We should also take the kinds of access we have seen on a file into account.
> 
> Yes, but it should be done at the VFS level, not in the filesystem
> itself. Integrated into the current hot inode/range tracking that is
> being worked on right now, I'd suggest.
> 
> IOWs, these access policy issues are not unique to F2FS or it's use
> case. Anything to do with access hints, policy, tracking, file
> classification, etc that can influence data locality, reclaim,
> migration, etc need to be dealt with at the VFS, independently of a
> specific filesystem. Filesystems can make use of that information
> how they please (whether in the kernel or via userspace tools), but
> having filesystem specific interfaces and implementations of the
> same functionality is extremely wasteful. Let's do it once, and do
> it right the first time. ;)

I agree that VFS should support something, but before then, it needs
to do something by the file 

Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Dave Chinner
On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
> On Sunday 14 October 2012, Vyacheslav Dubeyko wrote: 
> > On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
> > > 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> > Extended attributes are more flexible way, from my point of view. The xattr 
> > gives
> > possibility to make hint to filesystem at any time and without any 
> > dependencies with
> > application's functional opportunities. Documented way of using such 
> > extended attributes
> > gives to user flexible way of manipulation of filesystem behavior (but I 
> > remember that
> > you don't believe in an user :-)).
> > 
> > So, I think that fadvise() and extended attributes can be complementary 
> > solutions.
> 
> Right. Another option is to have ext4 style attributes, see 
> http://linux.die.net/man/1/chattr

Xattrs are much prefered to more "ext4 style" flags because xattrs
are filesystem independent. Indeed, some filesystems can't store any
new "ext4 style" flags without a change of disk format or
internally mapping them to an xattr. So really, xattrs are the best
way forward for such hints.

> Unlike extended attributes, there is a limited number of those,
> and they can only be boolean flags, but that might be enough for
> this particular use case.

A boolean is not sufficient for access policy hints. An extensible
xattr format is probably the best approach to take here, so that we
can easily introduce new access policy hints as functionality is
required. Indeed, an extensible xattr could start with just a
hot/cold boolean, and grow from there

> The main reason I can see against extended attributes is that they are not 
> stored
> very efficiently in f2fs, unless a lot of work is put into coming up with a 
> good
> implementation. A single flags bit can trivially be added to the inode in
> comparison (if it's not there already).

That's a deficiency that should be corrected, then, because xattrs
are very common these days.

And given that stuff like access frequency tracking is being
implemented at the VFS level, access policy hints should also be VFS
functionality. A bad filesystem implementation should not dictate
the interface for generically useful functionality

> > Anyway, hardcoding or saving in filesystem list of file extensions is a 
> > nasty way. It
> > can be not safe or hardly understandable by users the way of 
> > reconfiguration filesystem 
> > by means of tunefs or debugfs with the purpose of file extensions addition 
> > in such 
> > "black-box" as TV or smartphones, from my point of view.
> 
> It is only a performance hint though, so it is not a correctness issue the
> file system gets it wrong. In order to do efficient garbage collection, a log
> structured file system should take all the information it can get about the
> expected life of data it writes. I agree that the list, even in the form of
> mkfs time settings, is not a clean abstraction, but in the place of an Android
> phone manufacturer I would still enable it if it promises a significant
> performance advantage over not using it. I guess it would be nice if this
> could be overridden in some form, e.g. using an ioctl on the file as ext4 
> does.

An xattr on the root inode that holds a list like this is something
that could be set at mkfs time, but then also updated easily by new
software packages that are installed...

> We should also take the kinds of access we have seen on a file into account.

Yes, but it should be done at the VFS level, not in the filesystem
itself. Integrated into the current hot inode/range tracking that is
being worked on right now, I'd suggest.

IOWs, these access policy issues are not unique to F2FS or it's use
case. Anything to do with access hints, policy, tracking, file
classification, etc that can influence data locality, reclaim,
migration, etc need to be dealt with at the VFS, independently of a
specific filesystem. Filesystems can make use of that information
how they please (whether in the kernel or via userspace tools), but
having filesystem specific interfaces and implementations of the
same functionality is extremely wasteful. Let's do it once, and do
it right the first time. ;)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Arnd Bergmann
On Monday 15 October 2012, Changman Lee wrote:
> 2012년 10월 15일 월요일에 Arnd Bergmann님이 작성:
> > It is only a performance hint though, so it is not a correctness issue the
> > file system gets it wrong. In order to do efficient garbage collection, a 
> > log
> > structured file system should take all the information it can get about the
> > expected life of data it writes. I agree that the list, even in the form of
> > mkfs time settings, is not a clean abstraction, but in the place of an 
> > Android
> > phone manufacturer I would still enable it if it promises a significant
> > performance advantage over not using it. I guess it would be nice if this
> > could be overridden in some form, e.g. using an ioctl on the file as ext4 
> > does.
> >
> Right. This is related with HOT/COLD separation policy of f2fs. If we know
> that data is COLD, we can manage gc effectively.
> I think that ext lists are placed in sb is better like your advice because
> it's difficult to fix user app. Although it's nasty way.

Ok. I think you should adapt the terminology though. Right now, the optimization
is to mark the data as COLD because we expect it to be written less often than
other kinds of data. However, the hot/cold terms are usually only applied to
data that we assume is going to be written soon or not based on how often
the same data has been accessed in the past.

Anything you detect from the file name is not really a hint on hot/cold
files, but rather on the expected access pattern: These files are going
to be written once, and will be read-only after that, they are probably
multiple megabytes in size, and if you have a lot of them, they are likely
to live for the same time.

It may well be possible that we later decide to use the hint in a different
way, e.g. to put these files into yet another separate log, aside from
other hot or cold files.

> > We should also take the kinds of access we have seen on a file into account.
> > E.g. if someone opens a file O_RDWR and performs seek or pwrite on it, we 
> > can
> > assume that it's not in the category of typical media files, and a file that
> > gets written to disk linearly in multiple megabytes might belong into the
> > category even if it is named otherwise.
> >
> This is more general but it's hard to adapt now.

I think it's important to leave the option open for a future optimization.
Right now, what we have to get agreement on is the on-disk format, because
we absolutely don't want to make incompatible changes to that once f2fs
has been merged into the kernel and is getting used on real systems.

This is independent of how the code is implemented at the moment, and
any tuning regarding how to group different kinds of data into the six
logs is completely up to how things work out in practice. But you should
definitely ensure that those changes don't require changing the format
if we decide to use a different number of logs in the future, or to
use the logs differently.

The split between logs for nodes on the one hand and data on the other
is something that can well be hardcoded, and it's ok to have a hard
upper bound on the number of logs in the file system, possibly higher
than 6.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Arnd Bergmann
On Monday 15 October 2012, Changman Lee wrote:
 2012년 10월 15일 월요일에 Arnd Bergmanna...@arndb.de님이 작성:
  It is only a performance hint though, so it is not a correctness issue the
  file system gets it wrong. In order to do efficient garbage collection, a 
  log
  structured file system should take all the information it can get about the
  expected life of data it writes. I agree that the list, even in the form of
  mkfs time settings, is not a clean abstraction, but in the place of an 
  Android
  phone manufacturer I would still enable it if it promises a significant
  performance advantage over not using it. I guess it would be nice if this
  could be overridden in some form, e.g. using an ioctl on the file as ext4 
  does.
 
 Right. This is related with HOT/COLD separation policy of f2fs. If we know
 that data is COLD, we can manage gc effectively.
 I think that ext lists are placed in sb is better like your advice because
 it's difficult to fix user app. Although it's nasty way.

Ok. I think you should adapt the terminology though. Right now, the optimization
is to mark the data as COLD because we expect it to be written less often than
other kinds of data. However, the hot/cold terms are usually only applied to
data that we assume is going to be written soon or not based on how often
the same data has been accessed in the past.

Anything you detect from the file name is not really a hint on hot/cold
files, but rather on the expected access pattern: These files are going
to be written once, and will be read-only after that, they are probably
multiple megabytes in size, and if you have a lot of them, they are likely
to live for the same time.

It may well be possible that we later decide to use the hint in a different
way, e.g. to put these files into yet another separate log, aside from
other hot or cold files.

  We should also take the kinds of access we have seen on a file into account.
  E.g. if someone opens a file O_RDWR and performs seek or pwrite on it, we 
  can
  assume that it's not in the category of typical media files, and a file that
  gets written to disk linearly in multiple megabytes might belong into the
  category even if it is named otherwise.
 
 This is more general but it's hard to adapt now.

I think it's important to leave the option open for a future optimization.
Right now, what we have to get agreement on is the on-disk format, because
we absolutely don't want to make incompatible changes to that once f2fs
has been merged into the kernel and is getting used on real systems.

This is independent of how the code is implemented at the moment, and
any tuning regarding how to group different kinds of data into the six
logs is completely up to how things work out in practice. But you should
definitely ensure that those changes don't require changing the format
if we decide to use a different number of logs in the future, or to
use the logs differently.

The split between logs for nodes on the one hand and data on the other
is something that can well be hardcoded, and it's ok to have a hard
upper bound on the number of logs in the file system, possibly higher
than 6.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Dave Chinner
On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
 On Sunday 14 October 2012, Vyacheslav Dubeyko wrote: 
  On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
   2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
  Extended attributes are more flexible way, from my point of view. The xattr 
  gives
  possibility to make hint to filesystem at any time and without any 
  dependencies with
  application's functional opportunities. Documented way of using such 
  extended attributes
  gives to user flexible way of manipulation of filesystem behavior (but I 
  remember that
  you don't believe in an user :-)).
  
  So, I think that fadvise() and extended attributes can be complementary 
  solutions.
 
 Right. Another option is to have ext4 style attributes, see 
 http://linux.die.net/man/1/chattr

Xattrs are much prefered to more ext4 style flags because xattrs
are filesystem independent. Indeed, some filesystems can't store any
new ext4 style flags without a change of disk format or
internally mapping them to an xattr. So really, xattrs are the best
way forward for such hints.

 Unlike extended attributes, there is a limited number of those,
 and they can only be boolean flags, but that might be enough for
 this particular use case.

A boolean is not sufficient for access policy hints. An extensible
xattr format is probably the best approach to take here, so that we
can easily introduce new access policy hints as functionality is
required. Indeed, an extensible xattr could start with just a
hot/cold boolean, and grow from there

 The main reason I can see against extended attributes is that they are not 
 stored
 very efficiently in f2fs, unless a lot of work is put into coming up with a 
 good
 implementation. A single flags bit can trivially be added to the inode in
 comparison (if it's not there already).

That's a deficiency that should be corrected, then, because xattrs
are very common these days.

And given that stuff like access frequency tracking is being
implemented at the VFS level, access policy hints should also be VFS
functionality. A bad filesystem implementation should not dictate
the interface for generically useful functionality

  Anyway, hardcoding or saving in filesystem list of file extensions is a 
  nasty way. It
  can be not safe or hardly understandable by users the way of 
  reconfiguration filesystem 
  by means of tunefs or debugfs with the purpose of file extensions addition 
  in such 
  black-box as TV or smartphones, from my point of view.
 
 It is only a performance hint though, so it is not a correctness issue the
 file system gets it wrong. In order to do efficient garbage collection, a log
 structured file system should take all the information it can get about the
 expected life of data it writes. I agree that the list, even in the form of
 mkfs time settings, is not a clean abstraction, but in the place of an Android
 phone manufacturer I would still enable it if it promises a significant
 performance advantage over not using it. I guess it would be nice if this
 could be overridden in some form, e.g. using an ioctl on the file as ext4 
 does.

An xattr on the root inode that holds a list like this is something
that could be set at mkfs time, but then also updated easily by new
software packages that are installed...

 We should also take the kinds of access we have seen on a file into account.

Yes, but it should be done at the VFS level, not in the filesystem
itself. Integrated into the current hot inode/range tracking that is
being worked on right now, I'd suggest.

IOWs, these access policy issues are not unique to F2FS or it's use
case. Anything to do with access hints, policy, tracking, file
classification, etc that can influence data locality, reclaim,
migration, etc need to be dealt with at the VFS, independently of a
specific filesystem. Filesystems can make use of that information
how they please (whether in the kernel or via userspace tools), but
having filesystem specific interfaces and implementations of the
same functionality is extremely wasteful. Let's do it once, and do
it right the first time. ;)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Jaegeuk Kim
 On Sun, Oct 14, 2012 at 03:19:37PM +, Arnd Bergmann wrote:
  On Sunday 14 October 2012, Vyacheslav Dubeyko wrote:
   On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
   Extended attributes are more flexible way, from my point of view. The 
   xattr gives
   possibility to make hint to filesystem at any time and without any 
   dependencies with
   application's functional opportunities. Documented way of using such 
   extended attributes
   gives to user flexible way of manipulation of filesystem behavior (but I 
   remember that
   you don't believe in an user :-)).
  
   So, I think that fadvise() and extended attributes can be complementary 
   solutions.
 
  Right. Another option is to have ext4 style attributes, see
  http://linux.die.net/man/1/chattr
 
 Xattrs are much prefered to more ext4 style flags because xattrs
 are filesystem independent. Indeed, some filesystems can't store any
 new ext4 style flags without a change of disk format or
 internally mapping them to an xattr. So really, xattrs are the best
 way forward for such hints.
 
  Unlike extended attributes, there is a limited number of those,
  and they can only be boolean flags, but that might be enough for
  this particular use case.
 
 A boolean is not sufficient for access policy hints. An extensible
 xattr format is probably the best approach to take here, so that we
 can easily introduce new access policy hints as functionality is
 required. Indeed, an extensible xattr could start with just a
 hot/cold boolean, and grow from there
 
  The main reason I can see against extended attributes is that they are not 
  stored
  very efficiently in f2fs, unless a lot of work is put into coming up with a 
  good
  implementation. A single flags bit can trivially be added to the inode in
  comparison (if it's not there already).
 
 That's a deficiency that should be corrected, then, because xattrs
 are very common these days.

IMO, most file systems including f2fs have some inefficiency to store
and retrieve xattrs, since they have to allocate an additional block.
The only distinct problem in f2fs is that there is a cleaning overhead.
So, that's the why xattr is not an efficient way in f2fs.

OTOH, I think xattr itself is for users, not for communicating
between file system and users.
Moreover, I'm not sure in the current android, but I saw ICS android
did not call any xattr operations, even if mount option was enabled.

 
 And given that stuff like access frequency tracking is being
 implemented at the VFS level, access policy hints should also be VFS
 functionality. A bad filesystem implementation should not dictate
 the interface for generically useful functionality
 
   Anyway, hardcoding or saving in filesystem list of file extensions is a 
   nasty way. It
   can be not safe or hardly understandable by users the way of 
   reconfiguration filesystem
   by means of tunefs or debugfs with the purpose of file extensions 
   addition in such
   black-box as TV or smartphones, from my point of view.
 
  It is only a performance hint though, so it is not a correctness issue the
  file system gets it wrong. In order to do efficient garbage collection, a 
  log
  structured file system should take all the information it can get about the
  expected life of data it writes. I agree that the list, even in the form of
  mkfs time settings, is not a clean abstraction, but in the place of an 
  Android
  phone manufacturer I would still enable it if it promises a significant
  performance advantage over not using it. I guess it would be nice if this
  could be overridden in some form, e.g. using an ioctl on the file as ext4 
  does.
 
 An xattr on the root inode that holds a list like this is something
 that could be set at mkfs time, but then also updated easily by new
 software packages that are installed...
 
  We should also take the kinds of access we have seen on a file into account.
 
 Yes, but it should be done at the VFS level, not in the filesystem
 itself. Integrated into the current hot inode/range tracking that is
 being worked on right now, I'd suggest.
 
 IOWs, these access policy issues are not unique to F2FS or it's use
 case. Anything to do with access hints, policy, tracking, file
 classification, etc that can influence data locality, reclaim,
 migration, etc need to be dealt with at the VFS, independently of a
 specific filesystem. Filesystems can make use of that information
 how they please (whether in the kernel or via userspace tools), but
 having filesystem specific interfaces and implementations of the
 same functionality is extremely wasteful. Let's do it once, and do
 it right the first time. ;)

I agree that VFS should support something, but before then, it needs
to do something by the file system first.
Because, we have to figure out what kind of information are really useful.
Thanks,

 
 Cheers,
 
 Dave.
 --
 Dave Chinner
 da...@fromorbit.com

--
To 

RE: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-15 Thread Jaegeuk Kim
 On Monday 15 October 2012, Changman Lee wrote:
  2012년 10월 15일 월요일에 Arnd Bergmanna...@arndb.de님이 작성:
   It is only a performance hint though, so it is not a correctness issue the
   file system gets it wrong. In order to do efficient garbage collection, a 
   log
   structured file system should take all the information it can get about 
   the
   expected life of data it writes. I agree that the list, even in the form 
   of
   mkfs time settings, is not a clean abstraction, but in the place of an 
   Android
   phone manufacturer I would still enable it if it promises a significant
   performance advantage over not using it. I guess it would be nice if this
   could be overridden in some form, e.g. using an ioctl on the file as ext4 
   does.
  
  Right. This is related with HOT/COLD separation policy of f2fs. If we know
  that data is COLD, we can manage gc effectively.
  I think that ext lists are placed in sb is better like your advice because
  it's difficult to fix user app. Although it's nasty way.
 
 Ok. I think you should adapt the terminology though. Right now, the 
 optimization
 is to mark the data as COLD because we expect it to be written less often than
 other kinds of data. However, the hot/cold terms are usually only applied to
 data that we assume is going to be written soon or not based on how often
 the same data has been accessed in the past.
 
 Anything you detect from the file name is not really a hint on hot/cold
 files, but rather on the expected access pattern: These files are going
 to be written once, and will be read-only after that, they are probably
 multiple megabytes in size, and if you have a lot of them, they are likely
 to live for the same time.
 
 It may well be possible that we later decide to use the hint in a different
 way, e.g. to put these files into yet another separate log, aside from
 other hot or cold files.
 
   We should also take the kinds of access we have seen on a file into 
   account.
   E.g. if someone opens a file O_RDWR and performs seek or pwrite on it, we 
   can
   assume that it's not in the category of typical media files, and a file 
   that
   gets written to disk linearly in multiple megabytes might belong into the
   category even if it is named otherwise.
  
  This is more general but it's hard to adapt now.
 
 I think it's important to leave the option open for a future optimization.
 Right now, what we have to get agreement on is the on-disk format, because
 we absolutely don't want to make incompatible changes to that once f2fs
 has been merged into the kernel and is getting used on real systems.
 
 This is independent of how the code is implemented at the moment, and
 any tuning regarding how to group different kinds of data into the six
 logs is completely up to how things work out in practice. But you should
 definitely ensure that those changes don't require changing the format
 if we decide to use a different number of logs in the future, or to
 use the logs differently.
 
 The split between logs for nodes on the one hand and data on the other
 is something that can well be hardcoded, and it's ok to have a hard
 upper bound on the number of logs in the file system, possibly higher
 than 6.
 

Thank you for a lot of points to be addressed. :)
Maybe it's time to summarize them.
Please let me know what I misunderstood.

[In v2]
- Extension list
  : Mkfs supports configuring extensions by user, and that information
will be stored in the superblock. In order to reduce the cleaning overhead,
f2fs supports an additional interface, ioctl, likewise ext4.

- The number of active logs
  : No change will be done in on-disk layout (i.e., max 6 logs).
Instead, f2fs supports changing the number with a mount option.
Currently, I think 4, 5, and 6 would be enough.

- Section size
  : Mkfs supports multiples of segments for a section, not power-of-two.

[Future optimization]
- Data separation
  : file access pattern, and else?

   Arnd

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Arnd Bergmann
On Sunday 14 October 2012, Vyacheslav Dubeyko wrote: 
> On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
> > 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> >> 
> >> By the way, how about extended attributes? It is possible to save in 
> >> extended attribute
> >> a hint about file's content nature during creation operation or later. 
> >> Moreover, extended
'> >> attribute gives possibility to change hint after renaming operation, for 
example.
> >> 
> > 
> > I think xattr is not a proper way to communicate between file system and
> > users.
> 
> I don't understand why you think that xattr is not proper way. Extended 
> attributes are the
> way of adding some additional properties to filesystem object, from my point 
> of view.

There are different kinds of extended attributes, as described by
http://linux.die.net/man/5/attr

I would think that the system namespace can hold an attribute that can be used 
for this.

> > How about fadvise()?
> > 
> 
> The fadvise() is a good suggestion. But, as I can understand, such solution 
> requires
> using fadvise() during application implementation. So, from one point of 
> view, it exists
> many applications that doesn't use fadvise() and, from another point of view, 
> developers
> change style of coding not so easy.

Most importantly, fadvise is about accessing an open file, and I would expect 
anything
passed in there to be forgotten after the file is closed, while an attribute is 
associated
with the inode and should persist across open/close as well as mount/umount 
cycles.

> Extended attributes are more flexible way, from my point of view. The xattr 
> gives
> possibility to make hint to filesystem at any time and without any 
> dependencies with
> application's functional opportunities. Documented way of using such extended 
> attributes
> gives to user flexible way of manipulation of filesystem behavior (but I 
> remember that
> you don't believe in an user :-)).
> 
> So, I think that fadvise() and extended attributes can be complementary 
> solutions.

Right. Another option is to have ext4 style attributes, see 
http://linux.die.net/man/1/chattr

Unlike extended attributes, there is a limited number of those, and they can
only be boolean flags, but that might be enough for this particular use case.

The main reason I can see against extended attributes is that they are not 
stored
very efficiently in f2fs, unless a lot of work is put into coming up with a good
implementation. A single flags bit can trivially be added to the inode in
comparison (if it's not there already).

> Anyway, hardcoding or saving in filesystem list of file extensions is a nasty 
> way. It
> can be not safe or hardly understandable by users the way of reconfiguration 
> filesystem 
> by means of tunefs or debugfs with the purpose of file extensions addition in 
> such 
> "black-box" as TV or smartphones, from my point of view.

It is only a performance hint though, so it is not a correctness issue the
file system gets it wrong. In order to do efficient garbage collection, a log
structured file system should take all the information it can get about the
expected life of data it writes. I agree that the list, even in the form of
mkfs time settings, is not a clean abstraction, but in the place of an Android
phone manufacturer I would still enable it if it promises a significant
performance advantage over not using it. I guess it would be nice if this
could be overridden in some form, e.g. using an ioctl on the file as ext4 does.

We should also take the kinds of access we have seen on a file into account.
E.g. if someone opens a file O_RDWR and performs seek or pwrite on it, we can
assume that it's not in the category of typical media files, and a file that
gets written to disk linearly in multiple megabytes might belong into the
category even if it is named otherwise.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Vyacheslav Dubeyko

On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:

> 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
>> On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:
>> 
>>> On Friday 05 October 2012, 김재극 wrote:
 +const char *media_ext_lists[] = {
 +   "jpg",
 +   "gif",
 +   "png",
 +   "avi",
 +   "divx",
 +   "mp4",
 +   "mp3",
 ...
>>> 
 + * Set multimedia files as cold files for hot/cold data separation
 + */
 +static inline void set_cold_file(struct inode *inode, const unsigned char 
 *name)
 +{
 +   const char **extlist = media_ext_lists;
 +
 +   while (*extlist) {
 +   if (!is_multimedia_file(name, *extlist)) {
 +   F2FS_I(inode)->is_cold = 1;
 +   break;
 +   }
 +   extlist++;
 +   }
 +}
>>> 
>>> This is a very clever way of categorizing files by their name, but I wonder 
>>> if hardcoding
>>> the list of file name extensions at in the kernel source is the best 
>>> strategy. Generally
>>> I would consider this to be a policy that should be configurable by the 
>>> user.
>>> 
>>> Unfortunately I can't think of a good interface to configure this, but 
>>> maybe someone
>>> else has a useful idea. Maybe the list can be stored in the superblock and 
>>> get written
>>> at mkfs time from the same defaults, but with the option of overriding it 
>>> using
>>> a debugfs tool.
>>> 
>> 
>> By the way, how about extended attributes? It is possible to save in 
>> extended attribute a hint about file's content nature during creation 
>> operation or later. Moreover, extended attribute gives possibility to change 
>> hint after renaming operation, for example.
>> 
> 
> I think xattr is not a proper way to communicate between file system and
> users.

I don't understand why you think that xattr is not proper way. Extended 
attributes are the way of adding some additional properties to filesystem 
object, from my point of view.

> How about fadvise()?
> 

The fadvise() is a good suggestion. But, as I can understand, such solution 
requires using fadvise() during application implementation. So, from one point 
of view, it exists many applications that doesn't use fadvise() and, from 
another point of view, developers change style of coding not so easy.

Extended attributes are more flexible way, from my point of view. The xattr 
gives possibility to make hint to filesystem at any time and without any 
dependencies with application's functional opportunities. Documented way of 
using such extended attributes gives to user flexible way of manipulation of 
filesystem behavior (but I remember that you don't believe in an user :-)).

So, I think that fadvise() and extended attributes can be complementary 
solutions.

Anyway, hardcoding or saving in filesystem list of file extensions is a nasty 
way. It can be not safe or hardly understandable by users the way of 
reconfiguration filesystem by means of tunefs or debugfs with the purpose of 
file extensions addition in such "black-box" as TV or smartphones, from my 
point of view.

With the best regards,
Vyacheslav Dubeyko.

>> With the best regards,
>> Vyacheslav Dubeyko.
>> 
>>> Arnd
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Jaegeuk Kim
> Samsung
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Jaegeuk Kim
2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
> On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:
> 
> > On Friday 05 October 2012, 김재극 wrote:
> >> +const char *media_ext_lists[] = {
> >> +   "jpg",
> >> +   "gif",
> >> +   "png",
> >> +   "avi",
> >> +   "divx",
> >> +   "mp4",
> >> +   "mp3",
> >> ...
> > 
> >> + * Set multimedia files as cold files for hot/cold data separation
> >> + */
> >> +static inline void set_cold_file(struct inode *inode, const unsigned char 
> >> *name)
> >> +{
> >> +   const char **extlist = media_ext_lists;
> >> +
> >> +   while (*extlist) {
> >> +   if (!is_multimedia_file(name, *extlist)) {
> >> +   F2FS_I(inode)->is_cold = 1;
> >> +   break;
> >> +   }
> >> +   extlist++;
> >> +   }
> >> +}
> > 
> > This is a very clever way of categorizing files by their name, but I wonder 
> > if hardcoding
> > the list of file name extensions at in the kernel source is the best 
> > strategy. Generally
> > I would consider this to be a policy that should be configurable by the 
> > user.
> > 
> > Unfortunately I can't think of a good interface to configure this, but 
> > maybe someone
> > else has a useful idea. Maybe the list can be stored in the superblock and 
> > get written
> > at mkfs time from the same defaults, but with the option of overriding it 
> > using
> > a debugfs tool.
> > 
> 
> By the way, how about extended attributes? It is possible to save in extended 
> attribute a hint about file's content nature during creation operation or 
> later. Moreover, extended attribute gives possibility to change hint after 
> renaming operation, for example.
> 

I think xattr is not a proper way to communicate between file system and
users.
How about fadvise()?

> With the best regards,
> Vyacheslav Dubeyko.
> 
> > Arnd
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Jaegeuk Kim
2012-10-13 (토), 20:52 +, Arnd Bergmann:
> On Friday 05 October 2012, 김재극 wrote:
> > +const char *media_ext_lists[] = {
> > +   "jpg",
> > +   "gif",
> > +   "png",
> > +   "avi",
> > +   "divx",
> > +   "mp4",
> > +   "mp3",
> > ...
> 
> > + * Set multimedia files as cold files for hot/cold data separation
> > + */
> > +static inline void set_cold_file(struct inode *inode, const unsigned char 
> > *name)
> > +{
> > +   const char **extlist = media_ext_lists;
> > +
> > +   while (*extlist) {
> > +   if (!is_multimedia_file(name, *extlist)) {
> > +   F2FS_I(inode)->is_cold = 1;
> > +   break;
> > +   }
> > +   extlist++;
> > +   }
> > +}
> 
> This is a very clever way of categorizing files by their name, but I wonder 
> if hardcoding
> the list of file name extensions at in the kernel source is the best 
> strategy. Generally
> I would consider this to be a policy that should be configurable by the user.
> 
> Unfortunately I can't think of a good interface to configure this, but maybe 
> someone
> else has a useful idea. Maybe the list can be stored in the superblock and 
> get written
> at mkfs time from the same defaults, but with the option of overriding it 
> using
> a debugfs tool.
> 

Good point!
I'll think about a user-made list.
Thanks,

>   Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Jaegeuk Kim
2012-10-13 (토), 20:52 +, Arnd Bergmann:
 On Friday 05 October 2012, 김재극 wrote:
  +const char *media_ext_lists[] = {
  +   jpg,
  +   gif,
  +   png,
  +   avi,
  +   divx,
  +   mp4,
  +   mp3,
  ...
 
  + * Set multimedia files as cold files for hot/cold data separation
  + */
  +static inline void set_cold_file(struct inode *inode, const unsigned char 
  *name)
  +{
  +   const char **extlist = media_ext_lists;
  +
  +   while (*extlist) {
  +   if (!is_multimedia_file(name, *extlist)) {
  +   F2FS_I(inode)-is_cold = 1;
  +   break;
  +   }
  +   extlist++;
  +   }
  +}
 
 This is a very clever way of categorizing files by their name, but I wonder 
 if hardcoding
 the list of file name extensions at in the kernel source is the best 
 strategy. Generally
 I would consider this to be a policy that should be configurable by the user.
 
 Unfortunately I can't think of a good interface to configure this, but maybe 
 someone
 else has a useful idea. Maybe the list can be stored in the superblock and 
 get written
 at mkfs time from the same defaults, but with the option of overriding it 
 using
 a debugfs tool.
 

Good point!
I'll think about a user-made list.
Thanks,

   Arnd
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Jaegeuk Kim
2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
 On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:
 
  On Friday 05 October 2012, 김재극 wrote:
  +const char *media_ext_lists[] = {
  +   jpg,
  +   gif,
  +   png,
  +   avi,
  +   divx,
  +   mp4,
  +   mp3,
  ...
  
  + * Set multimedia files as cold files for hot/cold data separation
  + */
  +static inline void set_cold_file(struct inode *inode, const unsigned char 
  *name)
  +{
  +   const char **extlist = media_ext_lists;
  +
  +   while (*extlist) {
  +   if (!is_multimedia_file(name, *extlist)) {
  +   F2FS_I(inode)-is_cold = 1;
  +   break;
  +   }
  +   extlist++;
  +   }
  +}
  
  This is a very clever way of categorizing files by their name, but I wonder 
  if hardcoding
  the list of file name extensions at in the kernel source is the best 
  strategy. Generally
  I would consider this to be a policy that should be configurable by the 
  user.
  
  Unfortunately I can't think of a good interface to configure this, but 
  maybe someone
  else has a useful idea. Maybe the list can be stored in the superblock and 
  get written
  at mkfs time from the same defaults, but with the option of overriding it 
  using
  a debugfs tool.
  
 
 By the way, how about extended attributes? It is possible to save in extended 
 attribute a hint about file's content nature during creation operation or 
 later. Moreover, extended attribute gives possibility to change hint after 
 renaming operation, for example.
 

I think xattr is not a proper way to communicate between file system and
users.
How about fadvise()?

 With the best regards,
 Vyacheslav Dubeyko.
 
  Arnd
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
Jaegeuk Kim
Samsung

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Vyacheslav Dubeyko

On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:

 2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
 On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:
 
 On Friday 05 October 2012, 김재극 wrote:
 +const char *media_ext_lists[] = {
 +   jpg,
 +   gif,
 +   png,
 +   avi,
 +   divx,
 +   mp4,
 +   mp3,
 ...
 
 + * Set multimedia files as cold files for hot/cold data separation
 + */
 +static inline void set_cold_file(struct inode *inode, const unsigned char 
 *name)
 +{
 +   const char **extlist = media_ext_lists;
 +
 +   while (*extlist) {
 +   if (!is_multimedia_file(name, *extlist)) {
 +   F2FS_I(inode)-is_cold = 1;
 +   break;
 +   }
 +   extlist++;
 +   }
 +}
 
 This is a very clever way of categorizing files by their name, but I wonder 
 if hardcoding
 the list of file name extensions at in the kernel source is the best 
 strategy. Generally
 I would consider this to be a policy that should be configurable by the 
 user.
 
 Unfortunately I can't think of a good interface to configure this, but 
 maybe someone
 else has a useful idea. Maybe the list can be stored in the superblock and 
 get written
 at mkfs time from the same defaults, but with the option of overriding it 
 using
 a debugfs tool.
 
 
 By the way, how about extended attributes? It is possible to save in 
 extended attribute a hint about file's content nature during creation 
 operation or later. Moreover, extended attribute gives possibility to change 
 hint after renaming operation, for example.
 
 
 I think xattr is not a proper way to communicate between file system and
 users.

I don't understand why you think that xattr is not proper way. Extended 
attributes are the way of adding some additional properties to filesystem 
object, from my point of view.

 How about fadvise()?
 

The fadvise() is a good suggestion. But, as I can understand, such solution 
requires using fadvise() during application implementation. So, from one point 
of view, it exists many applications that doesn't use fadvise() and, from 
another point of view, developers change style of coding not so easy.

Extended attributes are more flexible way, from my point of view. The xattr 
gives possibility to make hint to filesystem at any time and without any 
dependencies with application's functional opportunities. Documented way of 
using such extended attributes gives to user flexible way of manipulation of 
filesystem behavior (but I remember that you don't believe in an user :-)).

So, I think that fadvise() and extended attributes can be complementary 
solutions.

Anyway, hardcoding or saving in filesystem list of file extensions is a nasty 
way. It can be not safe or hardly understandable by users the way of 
reconfiguration filesystem by means of tunefs or debugfs with the purpose of 
file extensions addition in such black-box as TV or smartphones, from my 
point of view.

With the best regards,
Vyacheslav Dubeyko.

 With the best regards,
 Vyacheslav Dubeyko.
 
 Arnd
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 
 -- 
 Jaegeuk Kim
 Samsung
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-14 Thread Arnd Bergmann
On Sunday 14 October 2012, Vyacheslav Dubeyko wrote: 
 On Oct 14, 2012, at 11:09 AM, Jaegeuk Kim wrote:
  2012-10-14 (일), 02:21 +0400, Vyacheslav Dubeyko:
  
  By the way, how about extended attributes? It is possible to save in 
  extended attribute
  a hint about file's content nature during creation operation or later. 
  Moreover, extended
'  attribute gives possibility to change hint after renaming operation, for 
example.
  
  
  I think xattr is not a proper way to communicate between file system and
  users.
 
 I don't understand why you think that xattr is not proper way. Extended 
 attributes are the
 way of adding some additional properties to filesystem object, from my point 
 of view.

There are different kinds of extended attributes, as described by
http://linux.die.net/man/5/attr

I would think that the system namespace can hold an attribute that can be used 
for this.

  How about fadvise()?
  
 
 The fadvise() is a good suggestion. But, as I can understand, such solution 
 requires
 using fadvise() during application implementation. So, from one point of 
 view, it exists
 many applications that doesn't use fadvise() and, from another point of view, 
 developers
 change style of coding not so easy.

Most importantly, fadvise is about accessing an open file, and I would expect 
anything
passed in there to be forgotten after the file is closed, while an attribute is 
associated
with the inode and should persist across open/close as well as mount/umount 
cycles.

 Extended attributes are more flexible way, from my point of view. The xattr 
 gives
 possibility to make hint to filesystem at any time and without any 
 dependencies with
 application's functional opportunities. Documented way of using such extended 
 attributes
 gives to user flexible way of manipulation of filesystem behavior (but I 
 remember that
 you don't believe in an user :-)).
 
 So, I think that fadvise() and extended attributes can be complementary 
 solutions.

Right. Another option is to have ext4 style attributes, see 
http://linux.die.net/man/1/chattr

Unlike extended attributes, there is a limited number of those, and they can
only be boolean flags, but that might be enough for this particular use case.

The main reason I can see against extended attributes is that they are not 
stored
very efficiently in f2fs, unless a lot of work is put into coming up with a good
implementation. A single flags bit can trivially be added to the inode in
comparison (if it's not there already).

 Anyway, hardcoding or saving in filesystem list of file extensions is a nasty 
 way. It
 can be not safe or hardly understandable by users the way of reconfiguration 
 filesystem 
 by means of tunefs or debugfs with the purpose of file extensions addition in 
 such 
 black-box as TV or smartphones, from my point of view.

It is only a performance hint though, so it is not a correctness issue the
file system gets it wrong. In order to do efficient garbage collection, a log
structured file system should take all the information it can get about the
expected life of data it writes. I agree that the list, even in the form of
mkfs time settings, is not a clean abstraction, but in the place of an Android
phone manufacturer I would still enable it if it promises a significant
performance advantage over not using it. I guess it would be nice if this
could be overridden in some form, e.g. using an ioctl on the file as ext4 does.

We should also take the kinds of access we have seen on a file into account.
E.g. if someone opens a file O_RDWR and performs seek or pwrite on it, we can
assume that it's not in the category of typical media files, and a file that
gets written to disk linearly in multiple megabytes might belong into the
category even if it is named otherwise.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-13 Thread Vyacheslav Dubeyko

On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:

> On Friday 05 October 2012, 김재극 wrote:
>> +const char *media_ext_lists[] = {
>> +   "jpg",
>> +   "gif",
>> +   "png",
>> +   "avi",
>> +   "divx",
>> +   "mp4",
>> +   "mp3",
>> ...
> 
>> + * Set multimedia files as cold files for hot/cold data separation
>> + */
>> +static inline void set_cold_file(struct inode *inode, const unsigned char 
>> *name)
>> +{
>> +   const char **extlist = media_ext_lists;
>> +
>> +   while (*extlist) {
>> +   if (!is_multimedia_file(name, *extlist)) {
>> +   F2FS_I(inode)->is_cold = 1;
>> +   break;
>> +   }
>> +   extlist++;
>> +   }
>> +}
> 
> This is a very clever way of categorizing files by their name, but I wonder 
> if hardcoding
> the list of file name extensions at in the kernel source is the best 
> strategy. Generally
> I would consider this to be a policy that should be configurable by the user.
> 
> Unfortunately I can't think of a good interface to configure this, but maybe 
> someone
> else has a useful idea. Maybe the list can be stored in the superblock and 
> get written
> at mkfs time from the same defaults, but with the option of overriding it 
> using
> a debugfs tool.
> 

By the way, how about extended attributes? It is possible to save in extended 
attribute a hint about file's content nature during creation operation or 
later. Moreover, extended attribute gives possibility to change hint after 
renaming operation, for example.

With the best regards,
Vyacheslav Dubeyko.

>   Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-13 Thread Vyacheslav Dubeyko

On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:

> On Friday 05 October 2012, 김재극 wrote:
>> +const char *media_ext_lists[] = {
>> +   "jpg",
>> +   "gif",
>> +   "png",
>> +   "avi",
>> +   "divx",
>> +   "mp4",
>> +   "mp3",
>> ...
> 
>> + * Set multimedia files as cold files for hot/cold data separation
>> + */
>> +static inline void set_cold_file(struct inode *inode, const unsigned char 
>> *name)
>> +{
>> +   const char **extlist = media_ext_lists;
>> +
>> +   while (*extlist) {
>> +   if (!is_multimedia_file(name, *extlist)) {
>> +   F2FS_I(inode)->is_cold = 1;
>> +   break;
>> +   }
>> +   extlist++;
>> +   }
>> +}
> 
> This is a very clever way of categorizing files by their name, but I wonder 
> if hardcoding
> the list of file name extensions at in the kernel source is the best 
> strategy. Generally
> I would consider this to be a policy that should be configurable by the user.
> 

I think that file extensions can't be a steady basis for categorization. It is 
possible that user can use any extension as you want during file naming (for 
example, save text file with png extension). Or it is possible to use file 
without any extension. Only magics in a file structure can be a steady basis. 
But analyzing of file structure on the file system driver level is a breaking 
of some fundamentals, from my point of view.

With the best regards,
Vyacheslav Dubeyko.

> Unfortunately I can't think of a good interface to configure this, but maybe 
> someone
> else has a useful idea. Maybe the list can be stored in the superblock and 
> get written
> at mkfs time from the same defaults, but with the option of overriding it 
> using
> a debugfs tool.
> 
>   Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-13 Thread Arnd Bergmann
On Friday 05 October 2012, 김재극 wrote:
> +const char *media_ext_lists[] = {
> +   "jpg",
> +   "gif",
> +   "png",
> +   "avi",
> +   "divx",
> +   "mp4",
> +   "mp3",
> ...

> + * Set multimedia files as cold files for hot/cold data separation
> + */
> +static inline void set_cold_file(struct inode *inode, const unsigned char 
> *name)
> +{
> +   const char **extlist = media_ext_lists;
> +
> +   while (*extlist) {
> +   if (!is_multimedia_file(name, *extlist)) {
> +   F2FS_I(inode)->is_cold = 1;
> +   break;
> +   }
> +   extlist++;
> +   }
> +}

This is a very clever way of categorizing files by their name, but I wonder if 
hardcoding
the list of file name extensions at in the kernel source is the best strategy. 
Generally
I would consider this to be a policy that should be configurable by the user.

Unfortunately I can't think of a good interface to configure this, but maybe 
someone
else has a useful idea. Maybe the list can be stored in the superblock and get 
written
at mkfs time from the same defaults, but with the option of overriding it using
a debugfs tool.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-13 Thread Arnd Bergmann
On Friday 05 October 2012, 김재극 wrote:
 +const char *media_ext_lists[] = {
 +   jpg,
 +   gif,
 +   png,
 +   avi,
 +   divx,
 +   mp4,
 +   mp3,
 ...

 + * Set multimedia files as cold files for hot/cold data separation
 + */
 +static inline void set_cold_file(struct inode *inode, const unsigned char 
 *name)
 +{
 +   const char **extlist = media_ext_lists;
 +
 +   while (*extlist) {
 +   if (!is_multimedia_file(name, *extlist)) {
 +   F2FS_I(inode)-is_cold = 1;
 +   break;
 +   }
 +   extlist++;
 +   }
 +}

This is a very clever way of categorizing files by their name, but I wonder if 
hardcoding
the list of file name extensions at in the kernel source is the best strategy. 
Generally
I would consider this to be a policy that should be configurable by the user.

Unfortunately I can't think of a good interface to configure this, but maybe 
someone
else has a useful idea. Maybe the list can be stored in the superblock and get 
written
at mkfs time from the same defaults, but with the option of overriding it using
a debugfs tool.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-13 Thread Vyacheslav Dubeyko

On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:

 On Friday 05 October 2012, 김재극 wrote:
 +const char *media_ext_lists[] = {
 +   jpg,
 +   gif,
 +   png,
 +   avi,
 +   divx,
 +   mp4,
 +   mp3,
 ...
 
 + * Set multimedia files as cold files for hot/cold data separation
 + */
 +static inline void set_cold_file(struct inode *inode, const unsigned char 
 *name)
 +{
 +   const char **extlist = media_ext_lists;
 +
 +   while (*extlist) {
 +   if (!is_multimedia_file(name, *extlist)) {
 +   F2FS_I(inode)-is_cold = 1;
 +   break;
 +   }
 +   extlist++;
 +   }
 +}
 
 This is a very clever way of categorizing files by their name, but I wonder 
 if hardcoding
 the list of file name extensions at in the kernel source is the best 
 strategy. Generally
 I would consider this to be a policy that should be configurable by the user.
 

I think that file extensions can't be a steady basis for categorization. It is 
possible that user can use any extension as you want during file naming (for 
example, save text file with png extension). Or it is possible to use file 
without any extension. Only magics in a file structure can be a steady basis. 
But analyzing of file structure on the file system driver level is a breaking 
of some fundamentals, from my point of view.

With the best regards,
Vyacheslav Dubeyko.

 Unfortunately I can't think of a good interface to configure this, but maybe 
 someone
 else has a useful idea. Maybe the list can be stored in the superblock and 
 get written
 at mkfs time from the same defaults, but with the option of overriding it 
 using
 a debugfs tool.
 
   Arnd
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-13 Thread Vyacheslav Dubeyko

On Oct 14, 2012, at 12:52 AM, Arnd Bergmann wrote:

 On Friday 05 October 2012, 김재극 wrote:
 +const char *media_ext_lists[] = {
 +   jpg,
 +   gif,
 +   png,
 +   avi,
 +   divx,
 +   mp4,
 +   mp3,
 ...
 
 + * Set multimedia files as cold files for hot/cold data separation
 + */
 +static inline void set_cold_file(struct inode *inode, const unsigned char 
 *name)
 +{
 +   const char **extlist = media_ext_lists;
 +
 +   while (*extlist) {
 +   if (!is_multimedia_file(name, *extlist)) {
 +   F2FS_I(inode)-is_cold = 1;
 +   break;
 +   }
 +   extlist++;
 +   }
 +}
 
 This is a very clever way of categorizing files by their name, but I wonder 
 if hardcoding
 the list of file name extensions at in the kernel source is the best 
 strategy. Generally
 I would consider this to be a policy that should be configurable by the user.
 
 Unfortunately I can't think of a good interface to configure this, but maybe 
 someone
 else has a useful idea. Maybe the list can be stored in the superblock and 
 get written
 at mkfs time from the same defaults, but with the option of overriding it 
 using
 a debugfs tool.
 

By the way, how about extended attributes? It is possible to save in extended 
attribute a hint about file's content nature during creation operation or 
later. Moreover, extended attribute gives possibility to change hint after 
renaming operation, for example.

With the best regards,
Vyacheslav Dubeyko.

   Arnd
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-06 Thread Al Viro
On Fri, Oct 05, 2012 at 09:03:09PM +0900, ? wrote:
> +static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t 
> mode,
> + bool excl)
> +{
> + struct super_block *sb = dir->i_sb;
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> + struct inode *inode;
> + nid_t ino = 0;
> + int err;
> +
> + if (dentry->d_name.len > F2FS_MAX_NAME_LEN)
> + return -ENAMETOOLONG;

Pointless - failing those on ->lookup() with ENAMETOOLONG is enough.
The same goes for all entry creation methods.

> + if (inode->i_nlink >= F2FS_LINK_MAX)
> + return -EMLINK;

Just set ->s_max_links and be done with that.

> + if (dir->i_nlink >= F2FS_LINK_MAX)
> + return err;

Ditto.

> + if (old_dir_entry) {
> + err = -EMLINK;
> + if (new_dir->i_nlink >= F2FS_LINK_MAX)
> + goto out_dir;

... and here as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/16] f2fs: add inode operations for special inodes

2012-10-06 Thread Al Viro
On Fri, Oct 05, 2012 at 09:03:09PM +0900, ? wrote:
 +static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t 
 mode,
 + bool excl)
 +{
 + struct super_block *sb = dir-i_sb;
 + struct f2fs_sb_info *sbi = F2FS_SB(sb);
 + struct inode *inode;
 + nid_t ino = 0;
 + int err;
 +
 + if (dentry-d_name.len  F2FS_MAX_NAME_LEN)
 + return -ENAMETOOLONG;

Pointless - failing those on -lookup() with ENAMETOOLONG is enough.
The same goes for all entry creation methods.

 + if (inode-i_nlink = F2FS_LINK_MAX)
 + return -EMLINK;

Just set -s_max_links and be done with that.

 + if (dir-i_nlink = F2FS_LINK_MAX)
 + return err;

Ditto.

 + if (old_dir_entry) {
 + err = -EMLINK;
 + if (new_dir-i_nlink = F2FS_LINK_MAX)
 + goto out_dir;

... and here as well.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/