Re: [PATCH] fat/vfat: optionally ignore system timezone offset when reading/writing timestamps

2007-03-25 Thread Hiroyuki Machida
Hi OGAWA-san and Paul-san,

Sorry for the late response.

I'm not familiar with the recent fat code, but the code itself looks good for
just turning time adjustment on and off. On the other hand, I feel we need more
consideration of use cases and requirements. I feel that turning off
time adjustment is just an ad-hoc solution to issues like the one Paul-san
brought up.

Thanks,
Hiroyuki
 

OGAWA Hirofumi <[EMAIL PROTECTED]> wrote on 2007/03/19 23:31:06:
> Paul Collins <[EMAIL PROTECTED]> writes:
> 
> > Hello,
> 
> Hello,
> 
> > Here is a patch that adds a mount option named "posixtime" that, when
> > enabled, causes the fat/vfat code to not adjust timestamps as they are
> > read/written to/from disk.  The intent of the adjustment as performed
> > by the existing code appears to be to present correct timestamps to
> > Windows and friends, which treat the timestamp values as local time.
> >
> > However, the systems that I use to update my FAT32 filesystems are all
> > running Linux, and as such will process POSIX timestamps correctly.
> > (The filesystems are on disks in digital audio players, which do not
> > process timestamps except to check if they have changed.)  Due to the
> > aforementioned adjustment on read/write, after a Daylight Savings
> > transition I must update the file timestamps on the filesystems so
> > that rsync does not needlessly re-copy files.
> >
> > Please review and consider applying this patch.
> 
> Thanks. Looks good to me as start. IIRC, Machida-san did this few
> years ago. Machida-san, do you have any comment?
> 
> > diff --git a/fs/fat/dir.c b/fs/fat/dir.c
> > index c16af24..d6c0a7f 100644
> > --- a/fs/fat/dir.c
> > +++ b/fs/fat/dir.c
> > @@ -1064,7 +1064,7 @@ int fat_alloc_new_dir(struct inode *dir, struct timespec *ts)
> >goto error_free;
> >   }
> >
> > - fat_date_unix2dos(ts->tv_sec, &time, &date);
> > + fat_date_unix2dos(ts->tv_sec, &time, &date, sbi->options.adjust);
> >
> >   de = (struct msdos_dir_entry *)bhs[0]->b_data;
> >   /* filling the new directory slots ("." and ".." entries) */
> > diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> > index a9e4688..02d9225 100644
> > --- a/fs/fat/inode.c
> > +++ b/fs/fat/inode.c
> > @@ -374,17 +374,20 @@ static int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de)
> >   inode->i_blocks = ((inode->i_size + (sbi->cluster_size - 1))
> >& ~((loff_t)sbi->cluster_size - 1)) >> 9;
> >   inode->i_mtime.tv_sec =
> > -  date_dos2unix(le16_to_cpu(de->time), le16_to_cpu(de->date));
> > +  date_dos2unix(le16_to_cpu(de->time), le16_to_cpu(de->date),
> > + sbi->options.adjust);
> >   inode->i_mtime.tv_nsec = 0;
> >   if (sbi->options.isvfat) {
> >int secs = de->ctime_cs / 100;
> >int csecs = de->ctime_cs % 100;
> >inode->i_ctime.tv_sec  =
> > date_dos2unix(le16_to_cpu(de->ctime),
> > -  le16_to_cpu(de->cdate)) + secs;
> > +  le16_to_cpu(de->cdate),
> > +  sbi->options.adjust) + secs;
> >inode->i_ctime.tv_nsec = csecs * 1000;
> >inode->i_atime.tv_sec =
> > -   date_dos2unix(0, le16_to_cpu(de->adate));
> > +   date_dos2unix(0, le16_to_cpu(de->adate),
> > +  sbi->options.adjust);
> >inode->i_atime.tv_nsec = 0;
> >   } else
> >inode->i_ctime = inode->i_atime = inode->i_mtime;
> > @@ -592,11 +595,14 @@ retry:
> >   raw_entry->attr = fat_attr(inode);
> >   raw_entry->start = cpu_to_le16(MSDOS_I(inode)->i_logstart);
> >   raw_entry->starthi = cpu_to_le16(MSDOS_I(inode)->i_logstart >> 16);
> > - fat_date_unix2dos(inode->i_mtime.tv_sec, &raw_entry->time, &raw_entry->date);
> > + fat_date_unix2dos(inode->i_mtime.tv_sec, &raw_entry->time, &raw_entry->date,
> > + sbi->options.adjust);
> >   if (sbi->options.isvfat) {
> >__le16 atime;
> > -  fat_date_unix2dos(inode->i_ctime.tv_sec,&raw_entry->ctime,&raw_entry->cdate);
> > -  fat_date_unix2dos(inode->i_atime.tv_sec,&atime,&raw_entry->adate);
> > +  fat_date_unix2dos(inode->i_ctime.tv_sec,&raw_entry->ctime,&raw_entry->cdate,
> > +  sbi->options.adjust);
> > +  fat_date_unix2dos(inode->i_atime.tv_sec,&atime,&raw_entry->adate,
> > +  sbi->options.adjust);
> >raw_entry->ctime_cs = (inode->i_ctime.tv_sec & 1) * 100 +
> > inode->i_ctime.tv_nsec / 1000;
> >   }
> > @@ -854,7 +860,7 @@ enum {
> >   Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
> >   Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
> >   Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
> > - Opt_obsolate, Opt_flush, Opt_err,
> > + Opt_obsolate, Opt_flush, Opt_posixtime, Opt_err,
> >  };
> >
> >  static match_table_t fat_tokens = {
> > @@ -887,6 +893,7 @@ static match_table_t fat_tokens = {
> >   {Opt_obsolate, "cvf_options=%100s"},
> >   {Opt_obsolate, "posix"},
> >   {Opt_flush, "flush"},
> > + {Opt_posixtime, "posixtime"},
> >   {Opt_err, NULL},
> >  };
> >  static match_table_t msdos_tokens = {
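> > @@ -950,6 +957,7 @@ static int parse_options(char *options, int is_vfat, int silent, int *debug,
> >   opts->utf8 = opts->unicode_xlate = 0;
> >   opts->numtail = 1;
> >   opts->nocase = 0;
> > + opts->adjust = 1;
> >   *debug = 0;
> >
> >   if (!options)
> > @@ -1032,6 +1040,10 @@ static int parse_options(char *options, int is_vfat, int silent, int *debug,
> >    opts->flush = 1;
> >    break;
> >
> > +  case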
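
As a rough illustration of the adjustment this thread discusses, the sketch below converts an on-disk FAT time/date pair to seconds since the Unix epoch in user space. It is not the kernel's date_dos2unix(); the function name dos2unix_sketch and the tz_minuteswest parameter are hypothetical stand-ins, and the adjust flag only mirrors the role of sbi->options.adjust in the quoted patch. The bit layout used is the standard FAT directory-entry date/time encoding.

```c
#include <stdint.h>

/* Cumulative days before each month in a non-leap year. */
static const int days_before_month[12] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Number of leap years in the range [1, y]. */
static int leaps_through(int y)
{
    return y / 4 - y / 100 + y / 400;
}

/*
 * Hypothetical user-space sketch (not the kernel function).
 * time/date:       raw 16-bit FAT directory entry fields.
 * tz_minuteswest:  zone offset in minutes west of UTC (assumed input).
 * adjust:          nonzero = treat the on-disk value as local time and
 *                  shift it to UTC; zero = take it as-is ("posixtime").
 */
int64_t dos2unix_sketch(uint16_t time, uint16_t date,
                        int tz_minuteswest, int adjust)
{
    int year  = 1980 + ((date >> 9) & 0x7f);
    int month = (date >> 5) & 0x0f;
    int day   = date & 0x1f;
    int hour  = (time >> 11) & 0x1f;
    int min   = (time >> 5) & 0x3f;
    int sec   = (time & 0x1f) * 2;   /* stored with 2-second granularity */
    int64_t days, secs;

    /* Clamp out-of-range fields rather than indexing out of bounds. */
    if (month < 1)
        month = 1;
    if (month > 12)
        month = 12;
    if (day < 1)
        day = 1;

    days = (int64_t)(year - 1970) * 365
         + (leaps_through(year - 1) - leaps_through(1969))
         + days_before_month[month - 1]
         + ((month > 2 && is_leap(year)) ? 1 : 0)
         + (day - 1);
    secs = days * 86400 + hour * 3600 + min * 60 + sec;

    if (adjust)
        secs += (int64_t)tz_minuteswest * 60;
    return secs;
}
```

With adjust set, the same on-disk value maps to a different number of seconds whenever the zone offset changes (for example across a DST transition), which is the rsync symptom Paul describes. With adjust cleared, the result does not depend on the offset.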

Re: [PATCH] Posix file attribute support on VFAT (take #2)

2005-08-18 Thread Hiroyuki Machida

I'm trying to explain the background.

Christoph Hellwig wrote:
> On Wed, Aug 17, 2005 at 04:07:03AM +0900, Machida, Hiroyuki wrote:
> > This is a take 2 of posix file attribute support on VFAT.
>
> Sorry, but this is far too scary.  Please just use one of the sane
> filesystems linux supports.

I would say that the purpose of the feature is to make it possible to
build a root fs for small embedded devices, not to support full posix
attributes on top of VFAT. I think the situation is like uClinux,
which has no MMU support and many restrictions, yet is still
very helpful for small embedded devices.

To reduce resource consumption, developers of embedded systems
would like to use a single file system type, if possible.
And in most cases, FAT is required for data exchange,
e.g. with memory/storage cards or as a USB client.

So if adding a small feature to FAT makes it possible to build a root fs,
that is very helpful. It's not required to support full attributes.

What do you think?

Thanks,
Hiroyuki Machida


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Posix file attribute support on VFAT

2005-08-08 Thread Hiroyuki Machida
Ogawa-san and FAT developers,

Here is a patch to enable mapping of posix file attributes to VFAT attributes,
with restrictions.

The main purpose of this patch is to build a root file system on VFAT for small
embedded devices. FAT is widely used on embedded devices to exchange data,
and small embedded devices also have resource limitations. So it's very handy
for VFAT to be capable of holding a root fs, even with restrictions.

Details are described within the patch. I think this feature still needs
improvements, but it is very helpful for most embedded developers.

Thanks,
Hiroyuki Machida

---

* vfat-posix_attr.patch:

 fs/fat/file.c|   10 +
 fs/fat/inode.c   |   27 +++
 fs/vfat/namei.c  |  291 ++
 include/linux/msdos_fs.h |  320 +++
 4 files changed, 646 insertions(+), 2 deletions(-)

Signed-off-by: Hiroyuki Machida <[EMAIL PROTECTED]>

This patch enables the "posix_attr" option, described as follows:


"posix_attr" mapping support on VFAT.

- Descriptions

The option "posix_attr" enables attribute mapping from POSIX
special files and permission modes/attributes to VFAT attributes
and creation time fields.

The memory-resident inode holds full posix attributes,
but on the media side VFAT doesn't have enough room to store all
attributes. So the attribute mapping performed by this option
is designed to be minimal.

For newly created and/or modified files/dirs, the system can utilize
full posix attributes, because the memory-resident inode can
hold them. After a umount-mount cycle, the system may lose some
attributes in order to preserve the VFAT format.

This mapping method has many restrictions, but it's
very handy for building a root file system on FAT for small embedded
devices where inter-operation with a PC is needed.

- Features

The following attributes/modes are supported by this option
on the VFAT media side. The memory-resident inode, however,
holds full posix attributes.  That means that, for newly created and/or
modified files/dirs, the system can utilize full posix attributes.
After a umount-mount cycle, the system keeps only the following
attributes/modes.

  - FileType
        File type is held in the 3 MSB bits of ctime_cs.
        This enables support for the following special files:
                symbolic link,
                block device node,
                char device node,
                fifo,
                socket
        Regular files may also have POSIX attributes.

  - DeviceFile
        Major and minor numbers are held in ctime,
        and both values are limited to 255.

  - Owner's User ID/Group ID:   2nd LSB bit in ctime_cs
        This can only be used to distinguish root from others,
        because it is just one bit wide.
        The UID/GID value for a non-root user is taken from the uid/gid
        mount options. If nothing is specified, the system uses -1 as a
        last resort.

  - Permission for Group/Other (rwx):   3rd-5th LSB bit in ctime_cs
        These modes are kept in ctime_cs.
        Permission modes for "others" are the
        same as for "group", due to lack of fields.

  - Permission for Owner (rwx)
        These modes are mapped to FAT attributes,
        the same as the mapping under VFAT.

  - Others
        sticky, setgid and setuid bits are not supported.

- Algorithm for attribute mapping decision

  - Regular file/dir
        To identify regular files/dirs, first check that the fat dir
        entry doesn't have ATTR_SYS. If it doesn't have
        ATTR_SYS, then check whether the TYPE field (3 MSB bits) in ctime_cs
        is equal to 7. If so, this regular file/dir was created and/or
        modified under VFAT with "posix_attr", and posix attribute
        mapping can take place. Otherwise, conventional VFAT
        attribute mapping is used.

  - Special file
        To identify special files, first check whether the fat dir entry
        has ATTR_SYS. If it has ATTR_SYS, then check the
        1st LSB bit in ctime_cs, referred to as the "special file flag".
        If set, this file was created under VFAT with "posix_attr".
        Look up the TYPE field to decide the special file type.
        This special file detection method has a flaw that can cause
        confusion: e.g. a system file created under
        dos/win may be treated as a special file.  However, in most cases
        users don't create system files under dos/win.
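
The decision procedure above can be sketched roughly as follows. This is only an illustration written from the description, not code taken from the patch: the enum and function names are hypothetical, and treating an ATTR_SYS entry whose flag bit is clear as a conventional entry is an assumption, since the text does not spell that case out. ATTR_SYS is the usual FAT "system" attribute bit (0x04).

```c
#include <stdint.h>

#define ATTR_SYS 0x04   /* FAT "system" attribute bit */

/* Hypothetical names, for illustration only. */
enum attr_mapping {
    MAP_CONVENTIONAL,   /* plain VFAT attribute mapping */
    MAP_POSIX_REGULAR,  /* regular file/dir written with posix_attr */
    MAP_POSIX_SPECIAL   /* special file written with posix_attr */
};

enum attr_mapping classify_entry(uint8_t attr, uint8_t ctime_cs)
{
    unsigned type = (ctime_cs >> 5) & 0x7;   /* TYPE: 3 MSB bits */

    if (!(attr & ATTR_SYS))
        return type == 7 ? MAP_POSIX_REGULAR : MAP_CONVENTIONAL;

    /* ATTR_SYS set: bit 0 of ctime_cs is the "special file flag". */
    if (ctime_cs & 0x01)
        return MAP_POSIX_SPECIAL;

    /* Assumption: a system file without the flag is left alone. */
    return MAP_CONVENTIONAL;
}
```

The flaw mentioned above shows up here directly: a system file created under dos/win whose ctime_cs happens to have bit 0 set would be classified as MAP_POSIX_SPECIAL.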


- FAT DIR entry fields description

  - ctime_cs

        8bit byte
        7 6 5 4 3 2 1 0
        |===| | | | | |
        TYPE  | | | | +- special file flag (valid if ATTR_SYS)
              | | | +--- User/Group ID(owner)
              | | +----- !group X
              | +------- !group W
              +--------- !group R



        special file flag
                Indicates that this entry has posix attribute mapping.
                This field is valid for fat dir entries which
                have ATTR_SYS
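
For reference, the ctime_cs layout drawn above can be unpacked as in the sketch below. The struct and function names are hypothetical and not taken from the patch. Reading the "!" prefix as "bit set means permission denied" is an assumption based on the diagram, and the meaning of the owner bit (which value denotes root) is not stated in the text, so it is exposed as a raw bit.

```c
#include <stdint.h>

/* Hypothetical view of the ctime_cs byte, following the diagram. */
struct ctime_cs_fields {
    unsigned type;          /* bits 7..5: TYPE */
    unsigned group_r_off;   /* bit 4: "!group R" */
    unsigned group_w_off;   /* bit 3: "!group W" */
    unsigned group_x_off;   /* bit 2: "!group X" */
    unsigned owner_bit;     /* bit 1: User/Group ID (owner), raw value */
    unsigned special_flag;  /* bit 0: special file flag */
};

struct ctime_cs_fields ctime_cs_unpack(uint8_t v)
{
    struct ctime_cs_fields f;

    f.type         = (v >> 5) & 0x7;
    f.group_r_off  = (v >> 4) & 0x1;
    f.group_w_off  = (v >> 3) & 0x1;
    f.group_x_off  = (v >> 2) & 0x1;
    f.owner_bit    = (v >> 1) & 0x1;
    f.special_flag = v & 0x1;
    return f;
}

uint8_t ctime_cs_pack(struct ctime_cs_fields f)
{
    return (uint8_t)(((f.type & 0x7) << 5) |
                     ((f.group_r_off & 0x1) << 4) |
                     ((f.group_w_off & 0x1) << 3) |
                     ((f.group_x_off & 0x1) << 2) |
                     ((f.owner_bit & 0x1) << 1) |
                     (f.special_flag & 0x1));
}

/*
 * Group permissions as a 3-bit rwx value (r=4, w=2, x=1).
 * Assumption: a set "!group" bit means the permission is denied.
 */
unsigned ctime_cs_group_mode(uint8_t v)
{
    struct ctime_cs_fields f = ctime_cs_unpack(v);

    return (f.group_r_off ? 0u : 4u) |
           (f.group_w_off ? 0u : 2u) |
           (f.group_x_off ? 0u : 1u);
}
```

Since "others" share the group permissions in this scheme, the same 3-bit value would be used for both when building the inode mode.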

Re: [RFD] FAT robustness

2005-07-21 Thread Hiroyuki Machida

Hi,

OGAWA Hirofumi wrote:
> Hiroyuki Machida <[EMAIL PROTECTED]> writes:
>
> > We currently plan to add following features to address FAT corruption.
> >
> >    - Utilize standard 2.6 features as much as possible
> >    - Implement as options of fat, vfat and uvfat
>
> What is the uvfat? typo (xvfat)?  Why is this an option (does it have
> the big demerit)?

uvfat is another variant of vfat, like umsdos.
Xvfat for 2.4 has the following directory and file organization:
most files are located in fs/xvfat,
and most of them are copied from fs/fat and fs/vfat and renamed
to have a prefix like 'xvfat_'.
For 2.6, I feel that the above organization needs to be changed.
Also, xvfat for 2.4 had some performance degradation. So I guess an 'option'
is better.

> >    - Utilize noop elevator to cancel unexpected operation reordering
>
> Why don't you use the barrier?

Do you mean that using requests with the barrier flag is enough and there is
no reason to specify an I/O scheduler?

It is better to preserve the order of data updates in some circumstances,
such as appending data.

Xvfat for 2.4 had its own elevator function, to preserve EraseBlock-unit
ordering for memory card devices.

To begin consideration for 2.6, I'd like to keep it simple. But later
we need to address this issue. So I thought of using "noop" at first,
and later switching to a special elevator function to handle devices better.

> >    - Coordinate order of operations so that update data first, meta
> >      data later with transaction control
>
> Is this meaning the SoftUpdates? What does this guarantee? How does
> this handle the rename(), and cyclic dependency of updates?

In <[EMAIL PROTECTED]>, I mentioned this.

> >    - With O_SYNC, close() make flush all related data and
> >      meta-data, then wait completion of I/O
>
> What is this meaning? Why does O_SYNC only flush at close()?

From the application's point of view, the application wants to believe that a
close()ed file has been written correctly, without any corruption.

At least close() needs to guarantee this. It's OK if every write()
flushes meta-data and data and waits for completion of I/O.

At least with fat on 2.4.20, the VFS syncs the inode on write() with O_SYNC,
but it doesn't take care of the super block. The FAT side
doesn't care about O_SYNC. That's a problem.
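
To make the application-side expectation concrete, here is a minimal user-space sketch of a program that does not rely on O_SYNC being honoured and instead calls fsync() explicitly before close(). The helper name write_file_durably is hypothetical. Whether fsync() really gets data and meta-data onto the media depends on the filesystem driver and the device, which is the point under discussion, and the sketch does not retry on EINTR.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a whole buffer to path, then ask the
 * kernel to flush it before closing. Returns 0 on success, -1 on error.
 */
int write_file_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush file data and meta-data before reporting success. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```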




> Almost things in your email is needing the detail.
>
> I'm thinking the SoftUpdates is best solution for now. Could you tell
> the detail of your solution?

In <[EMAIL PROTECTED]>, I mentioned this.


Re: [RFD] FAT robustness

2005-07-21 Thread Hiroyuki Machida

Hi,

I need to explain the background in more detail. My descriptions tend
to depend on some knowledge of the current xvfat for the 2.4 kernel.

I'm not an author of xvfat for the 2.4 kernel, but I can explain a little more.

The current xvfat for 2.4 is designed for a specific flash memory card
controller which can guarantee atomicity of operations on an ERASE-BLOCK size
unit. Xvfat for 2.4 tries to merge operations on the same ERASE-BLOCK under
some ordering constraints.

To accomplish this goal with minimal changes to the existing
FAT implementation, xvfat for 2.4 uses its own version of transaction control
using in-core memory, not a storage device like an HDD or flash RAM.
This transaction control also keeps FAT operations
coming from different threads from getting mixed up, which could otherwise
cause operation ordering problems.

We'll start with HDDs, but later we'll cover memory devices.
For memory devices we may prepare other elevator functions,
depending on the properties of the devices or the lower layer. E.g. NAND/AND
flash has different operation units for read/write and erase,
and has some translation layer.

Paulo Marques wrote:
> Hiroyuki Machida wrote:
> > [...]
> >  Q3 : I'm not sure JBD can be used for FAT improvements.   Do you
> > have any comments ?
>
> I might not be the best person to answer this, but this just seems so
> obvious:

Any comments are welcome.

> If you plan to let a recently hot-unplugged device to be used in another
> OS that doesn't understand your journaling extensions, your disk will be
> corrupted.
>
> If this is supposed to work only on OS's that understand your journaling
> extensions, then there are much better filesystems out there with
> journaling already.

I agree. This situation can occur even with non-removable media.
Consider a device like an audio player which acts as a USB client and provides
a USB Mass Storage class target. Its embedded storage may be accessed by the
USB host side, like a Windows PC or Mac.

> You might be able to reduce the size of the time window where hot
> removing the media will cause problems, like writting all the data first
> and update the metadata in as few operations as possible. But that just
> reduces the probability of data corruption. It doesn't eliminate it at all.

As other messages said, some developers suggest using "SoftUpdate".
I need to consider the situation where memory devices are used, not HDDs.


Re: [PATCH] Preserve hibernate-system-image on startup

2005-07-20 Thread Hiroyuki Machida
Hi,


With this function, the system needs to mount read-write file systems on
every boot cycle, to avoid inconsistency between the FS and memory.
How did you address this problem? Does the kernel check whether RW file
systems remained mounted at boot-up or hibernate time?


I think I need to discuss with you at San Jose at the beginning of 
this year.


Regards,
Hiroyuki Machida

Nigel Cunningham wrote:
> Hi.
> 
> We've had this feature in Suspend2 for a couple of years and I can
> confirm that the approach works, provided that the on-disk filesystem
> remains unchanged throughout this. (Useful mainly for kiosks etc).
> 
> This is not to say that I've reviewed the code below for correctness.
> 
> Regards,



[RFD] FAT robustness

2005-07-17 Thread Hiroyuki Machida


Folks,

I'd like to have a discussion about FAT robustness.
Please share your thoughts, comments and related issues.

A few years ago, we added some features to FAT, called xvfat,
so that the system and FAT are robust against unexpected media hot
unplug, and so that applications can be correctly made aware of the event.

Just for your reference, I put a patch against the 2.4.20 kernel at
http://www.celinuxforum.org/CelfPubWiki/XvFatDiscussion?action=AttachFile&do=get&target=20050715-xvfat-2.4.20.patch
It includes the following features:

- Handle media removed during “mount”
- Notification of media removal to application
- Cancellation of I/O Elevator for Block device
- Block system calls until a completion of writing
- Control order of meta-data updates, using transaction
  control implemented in fs/xvfat/fwrq.c
- File syscall return “error”, except umount
- Japanese file name support
    possible 1-N mapping issues SJIS <-> UNICODE
- Dirty Flag support
- TIME ZONE support

On moving to 2.6, we are considering and categorizing the issues again.
We are planning to start an open source project to add these features
to the 2.6 kernel.  I'd like to open a discussion about these features
and how to implement them on the 2.6 kernel.

1. Issues to be addressed

- Issues around FAT with CE devices
  - Hot unplug issues
    - File system corruption on unplugging a media/storage device
        Almost the same as power down without umount

    - Notification of the event
        Applications need to know about the event precisely
        Needs more investigation

    - System stability after unplug
        Almost the same as the I/O error recovery issues discussed
        on LKML
        http://developer.osdl.jp/projects/doubt/fs-consistency-and-coherency/index.html
        http://groups.google.co.jp/group/linux.kernel/browse_thread/thread/b9c11bccd59e0513/4a4dd84b411c6d32?q=[RFD]+FS+behavior+(I%2FO+failure)+in+kernel+summit++lkml&rnum=1&hl=ja#4a4dd84b411c6d32

  - Other issues
    - Time stamp issues
        always uses local time
        time resolution is in 2-second units

    - Issues around mapping between UNICODE and local character codes
        1-N mapping SJIS <-> UNICODE
        Potential directory cache problem due to 1-N mapping
        Possible inconsistency problems on the application side

    - Support file size over 2GB

    - Support dirty flag

 Q1 : The first issue for discussion is "Do you have any other issues
      about this?" and "Do you have any other ideas for categorizing
      the issues?"
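
To make the time stamp issue above concrete, the following user-space sketch decodes and encodes the 16-bit date and time words of a FAT directory entry. The bit layout is the standard on-disk FAT format; the struct and function names (fat_stamp, fat_decode, fat_encode) are made up for this illustration and are not the kernel's conversion routines. It shows why odd seconds cannot be stored and why no time zone travels with the value.

```c
#include <assert.h>
#include <stdint.h>

/* Broken-down FAT time stamp: plain local time, no time zone is stored. */
struct fat_stamp {
	int year, month, day;     /* year is the full year, e.g. 2005 */
	int hour, minute, second; /* second is always even after a round trip */
};

/* Decode the on-disk date and time words of a directory entry. */
static struct fat_stamp fat_decode(uint16_t date, uint16_t time)
{
	struct fat_stamp s;

	s.year   = 1980 + (date >> 9);    /* bits 15-9: years since 1980 */
	s.month  = (date >> 5) & 0x0f;    /* bits 8-5 */
	s.day    = date & 0x1f;           /* bits 4-0 */
	s.hour   = time >> 11;            /* bits 15-11 */
	s.minute = (time >> 5) & 0x3f;    /* bits 10-5 */
	s.second = (time & 0x1f) * 2;     /* bits 4-0: 2-second units */
	return s;
}

/* Encode; only 5 bits hold the seconds, so odd seconds round down. */
static void fat_encode(const struct fat_stamp *s, uint16_t *date, uint16_t *time)
{
	assert(s->year >= 1980 && s->year <= 2107);
	*date = (uint16_t)(((s->year - 1980) << 9) | (s->month << 5) | s->day);
	*time = (uint16_t)((s->hour << 11) | (s->minute << 5) | (s->second / 2));
}
```

Encoding 13:45:31 and decoding it again yields 13:45:30, which is the 2-second resolution mentioned in the list.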


2. FAT corruption on unplugging a media/storage device

On starting the open source project, we will focus on the following issue
first.
- File system corruption on unplugging a media/storage device
    Almost the same as power down without umount

We are planning to focus on HDD devices and to treat system power down
instead of media unplug, because

 A. The damage and its countermeasures may depend on the properties of
    the lower layer
    E.g.
      - Memory Card
          Some controllers can guarantee atomicity of certain
          operations
      - Flash Memory (NAND, NOR)
          I/O operations may be constrained by Block Size
          (e.g. 128KB) or Page Size (e.g. 2KB)
      - HDD
          - Cache memory may be resident inside the drive
          - A sector that is being written at power down
            may be corrupted (can't be read anymore)

 B. It may make the problem easier
      - Sector size is 512 bytes
      - Many developers can check with a PC

 Q2 : Do you know of any other storage devices and their properties that
      should be addressed later?
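
As a small worked example of the flash constraint listed under A, the sketch below computes which erase block a 512-byte sector falls into and how many erase blocks a write touches. The 128KB and 2KB sizes are the example values from the list; the macro and function names are hypothetical and real devices may use different geometries.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE      512u            /* sector size used in case B */
#define PAGE_SIZE_BYTES  (2u * 1024u)    /* example flash page size */
#define ERASE_BLOCK_SIZE (128u * 1024u)  /* example flash erase block size */

/* Index of the erase block that contains the given 512-byte sector. */
static uint32_t erase_block_of(uint32_t sector)
{
	return (uint32_t)(((uint64_t)sector * SECTOR_SIZE) / ERASE_BLOCK_SIZE);
}

/* Number of erase blocks touched by writing nsect sectors from sector. */
static uint32_t erase_blocks_touched(uint32_t sector, uint32_t nsect)
{
	assert(nsect > 0);
	return erase_block_of(sector + nsect - 1) - erase_block_of(sector) + 1;
}
```

With these sizes one erase block holds 256 sectors (64 pages), so a two-sector write starting at sector 255 straddles two erase blocks, while the same write on an HDD only involves two independent 512-byte sectors.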


3. Features to be developed for FAT corruption

We currently plan to add the following features to address FAT corruption.

   - Utilize standard 2.6 features as much as possible
       - Implement as options of fat, vfat and uvfat
       - Utilize the existing journaling block device (JBD) for transaction control
       - Utilize the noop elevator to cancel unexpected operation
         reordering
   - Coordinate the order of operations so that data is updated first and
     meta-data later, with transaction control
   - With O_SYNC, close() flushes all related data and
     meta-data, then waits for completion of I/O


 Q3 : I'm not sure JBD can be used for FAT improvements.
      Do you have any comments?



Thanks,
Hiroyuki Machida





[PATCH] Preserve hibernate-system-image on startup

2005-07-17 Thread Hiroyuki Machida

We are now investigating fast startup/shutdown using
2.6 kernel PM functions.

The attached patch enables the kernel to preserve the system image
on startup, to implement "Snapshot boot".
Conventionally, the system image is destroyed after startup.

Snapshot boot starts up by un-hibernating from a permanent system image.
Shutdown is a conventional shutdown that does not
save a system image.

We'll explain the concept and initial work at OLS, so if you are
interested, we can talk with you in Ottawa.

Thanks,
Hiroyuki Machida

---

This patch enables preserving the swsusp system image across boot cycles.
It is against 2.6.12.

Signed-off-by: Hiroyuki Machida <[EMAIL PROTECTED]> for CELF

-
Index: alp-linux--dev-2-6-12--1.7/kernel/power/Kconfig
===
--- alp-linux--dev-2-6-12--1.7.orig/kernel/power/Kconfig	2005-07-15 14:59:20.0 -0400
+++ alp-linux--dev-2-6-12--1.7/kernel/power/Kconfig	2005-07-16 00:43:31.42000 -0400
@@ -84,6 +84,20 @@
 	  suspended image to. It will simply pick the first available swap
 	  device.
 
+config PRESERVE_SWSUSP_IMAGE
+	bool "Preserve swsuspend image"
+	depends on SOFTWARE_SUSPEND
+	default n
+	---help---
+	  Usually, booting with swsusp destroys the swsusp image.
+	  This option preserves the swsusp image across boot cycles.
+	  Default behavior is unchanged even with this option enabled.
+
+	  To preserve the swsusp image, pass the following option on the command line:
+
+	    prsv-img
+
+
 config DEFERRED_RESUME
 	bool "Deferred resume"
 	depends on PM
Index: alp-linux--dev-2-6-12--1.7/kernel/power/disk.c
===
--- alp-linux--dev-2-6-12--1.7.orig/kernel/power/disk.c	2005-07-16 00:43:02.99000 -0400
+++ alp-linux--dev-2-6-12--1.7/kernel/power/disk.c	2005-07-16 01:01:42.22000 -0400
@@ -29,10 +29,29 @@
 extern void swsusp_close(void);
 extern int swsusp_resume(void);
 extern int swsusp_free(void);
+extern void dump_pagedir_nosave(void);
 #ifdef CONFIG_SAFE_SUSPEND
 extern int suspend_remount(void);
 extern int resume_remount(void);
 #endif
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+extern int preserve_swsusp_image;
+extern dev_t swsusp_resume_device_nosave __nosavedata;
+extern int swsusp_swap_rdonly(dev_t);
+extern int swsusp_swap_off(dev_t);
+#else
+#define preserve_swsusp_image 0
+#define swsusp_resume_device_nosave 0
+static inline int swsusp_swap_rdonly(dev_t dev)
+{
+	return 0;
+}
+static inline int swsusp_swap_off(dev_t dev)
+{
+	return 0;
+}
+#endif
+
 
 
 static int noresume = 0;
@@ -135,6 +154,26 @@
 	pm_restore_console();
 }
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+void finish_in_resume(void)
+{
+	device_resume();
+	platform_finish();
+	enable_nonboot_cpus();
+	thaw_processes();
+	if (preserve_swsusp_image) {
+		swsusp_swap_off(swsusp_resume_device_nosave);
+	}
+	pm_restore_console();
+}
+#else
+void finish_in_resume(void)
+{
+	finish();
+}
+#endif
+
+
 extern atomic_t on_suspend;   /* See refrigerator() */
 
 static int prepare_processes(void)
@@ -234,8 +273,15 @@
 		error = swsusp_write();
 		if (!error)
 			power_down(pm_disk_mode);
-	} else
+	} else  {
 		pr_debug("PM: Image restored successfully.\n");
+		if (preserve_swsusp_image) {
+			swsusp_swap_rdonly(swsusp_resume_device_nosave);
+		}
+		swsusp_free();
+		finish_in_resume();
+		return 0;
+	}
 	swsusp_free();
  Done:
 	finish();
Index: alp-linux--dev-2-6-12--1.7/kernel/power/swsusp.c
===
--- alp-linux--dev-2-6-12--1.7.orig/kernel/power/swsusp.c	2005-07-16 00:43:03.0 -0400
+++ alp-linux--dev-2-6-12--1.7/kernel/power/swsusp.c	2005-07-16 00:56:22.17000 -0400
@@ -128,6 +128,11 @@
 
 static struct swsusp_info swsusp_info;
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+dev_t swsusp_resume_device_nosave __nosavedata;
+struct swsusp_header swsusp_header_nosave __nosavedata ;
+#endif
+
 /*
  * XXX: We try to keep some more pages free so that I/O operations succeed
  * without paging. Might this be more?
@@ -139,6 +144,24 @@
 #define PAGES_FOR_IO   512
 #endif
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+int preserve_swsusp_image=0;
+static int __init preserve_swsusp_image_setup(char *str)
+{
+	if (*str)
+		return 0;
+	preserve_swsusp_image = 1;
+	return 1;
+}
+#else
+static int __init preserve_swsusp_image_setup(char *str)
+{
+	return 0;
+}
+#endif
+
+__setup("prsv-img", preserve_swsusp_image_setup);
+
 /*
  * Saving part...
  */
@@ -1250,6 +1273,53 @@
 	return error;
 }
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
