Re: [PATCH] fat/vfat: optionally ignore system timezone offset when reading/writing timestamps
Hi OGAWA-san and Paul-san,

Sorry for the late response. I'm not familiar with the recent fat code, but the code itself looks good as a simple switch to turn time adjustment on and off.

On the other hand, I feel we need more consideration of the use cases and requirements. Turning off time adjustment seems to me just an ad-hoc solution to issues like the one Paul-san brought up.

Thanks,
Hiroyuki

OGAWA Hirofumi <[EMAIL PROTECTED]> wrote on 2007/03/19 23:31:06:

> Paul Collins <[EMAIL PROTECTED]> writes:
>
> > Hello,
>
> Hello,
>
> > Here is a patch that adds a mount option named "posixtime" that, when
> > enabled, causes the fat/vfat code to not adjust timestamps as they are
> > read/written to/from disk. The intent of the adjustment as performed
> > by the existing code appears to be to present correct timestamps to
> > Windows and friends, which treat the timestamp values as local time.
> >
> > However, the systems that I use to update my FAT32 filesystems are all
> > running Linux, and as such will process POSIX timestamps correctly.
> > (The filesystems are on disks in digital audio players, which do not
> > process timestamps except to check if they have changed.) Due to the
> > aforementioned adjustment on read/write, after a Daylight Savings
> > transition I must update the file timestamps on the filesystems so
> > that rsync does not needlessly re-copy files.
> >
> > Please review and consider applying this patch.
>
> Thanks. Looks good to me as start. IIRC, Machida-san did this few
> years ago. Machida-san, do you have any comment?
>
> > diff --git a/fs/fat/dir.c b/fs/fat/dir.c
> > index c16af24..d6c0a7f 100644
> > --- a/fs/fat/dir.c
> > +++ b/fs/fat/dir.c
> > @@ -1064,7 +1064,7 @@ int fat_alloc_new_dir(struct inode *dir, struct timespec *ts)
> >             goto error_free;
> >     }
> >
> > -   fat_date_unix2dos(ts->tv_sec, &time, &date);
> > +   fat_date_unix2dos(ts->tv_sec, &time, &date, sbi->options.adjust);
> >
> >     de = (struct msdos_dir_entry *)bhs[0]->b_data;
> >     /* filling the new directory slots ("." and ".." entries) */
> > diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> > index a9e4688..02d9225 100644
> > --- a/fs/fat/inode.c
> > +++ b/fs/fat/inode.c
> > @@ -374,17 +374,20 @@ static int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de)
> >     inode->i_blocks = ((inode->i_size + (sbi->cluster_size - 1))
> >                        & ~((loff_t)sbi->cluster_size - 1)) >> 9;
> >     inode->i_mtime.tv_sec =
> > -           date_dos2unix(le16_to_cpu(de->time), le16_to_cpu(de->date));
> > +           date_dos2unix(le16_to_cpu(de->time), le16_to_cpu(de->date),
> > +                         sbi->options.adjust);
> >     inode->i_mtime.tv_nsec = 0;
> >     if (sbi->options.isvfat) {
> >             int secs = de->ctime_cs / 100;
> >             int csecs = de->ctime_cs % 100;
> >             inode->i_ctime.tv_sec =
> >                     date_dos2unix(le16_to_cpu(de->ctime),
> > -                                 le16_to_cpu(de->cdate)) + secs;
> > +                                 le16_to_cpu(de->cdate),
> > +                                 sbi->options.adjust) + secs;
> >             inode->i_ctime.tv_nsec = csecs * 1000;
> >             inode->i_atime.tv_sec =
> > -                   date_dos2unix(0, le16_to_cpu(de->adate));
> > +                   date_dos2unix(0, le16_to_cpu(de->adate),
> > +                                 sbi->options.adjust);
> >             inode->i_atime.tv_nsec = 0;
> >     } else
> >             inode->i_ctime = inode->i_atime = inode->i_mtime;
> > @@ -592,11 +595,14 @@ retry:
> >     raw_entry->attr = fat_attr(inode);
> >     raw_entry->start = cpu_to_le16(MSDOS_I(inode)->i_logstart);
> >     raw_entry->starthi = cpu_to_le16(MSDOS_I(inode)->i_logstart >> 16);
> > -   fat_date_unix2dos(inode->i_mtime.tv_sec, &raw_entry->time, &raw_entry->date);
> > +   fat_date_unix2dos(inode->i_mtime.tv_sec, &raw_entry->time, &raw_entry->date,
> > +                     sbi->options.adjust);
> >     if (sbi->options.isvfat) {
> >             __le16 atime;
> > -           fat_date_unix2dos(inode->i_ctime.tv_sec,&raw_entry->ctime,&raw_entry->cdate);
> > -           fat_date_unix2dos(inode->i_atime.tv_sec,&atime,&raw_entry->adate);
> > +           fat_date_unix2dos(inode->i_ctime.tv_sec,&raw_entry->ctime,&raw_entry->cdate,
> > +                             sbi->options.adjust);
> > +           fat_date_unix2dos(inode->i_atime.tv_sec,&atime,&raw_entry->adate,
> > +                             sbi->options.adjust);
> >             raw_entry->ctime_cs = (inode->i_ctime.tv_sec & 1) * 100 +
> >                     inode->i_ctime.tv_nsec / 1000;
> >     }
> > @@ -854,7 +860,7 @@ enum {
> >     Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
> >     Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
> >     Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
> > -   Opt_obsolate, Opt_flush, Opt_err,
> > +   Opt_obsolate, Opt_flush, Opt_posixtime, Opt_err,
> >  };
> >
> >  static match_table_t fat_tokens = {
> > @@ -887,6 +893,7 @@ static match_table_t fat_tokens = {
> >     {Opt_obsolate, "cvf_options=%100s"},
> >     {Opt_obsolate, "posix"},
> >     {Opt_flush, "flush"},
> > +   {Opt_posixtime, "posixtime"},
> >     {Opt_err, NULL},
> >  };
> >  static match_table_t msdos_tokens = {
> > @@ -950,6 +957,7 @@ static int parse_options(char *options, int is_vfat, int silent, i
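For readers following the timezone question, here is a minimal userspace sketch of the conversion the adjust flag controls. It assumes the standard FAT on-disk encoding (date: 7-bit year since 1980, 4-bit month, 5-bit day; time: 5-bit hour, 6-bit minute, 5-bit count of 2-second units) and assumes, as the existing code appears to do, that "adjusting" means adding the local offset (tz_minuteswest, in the style of the kernel's sys_tz) to turn stored local time into UTC. The function names are hypothetical; this is not the kernel's date_dos2unix().

```c
#include <stdbool.h>
#include <stdint.h>

/* Days from 1970-01-01 to the given civil date (proleptic Gregorian). */
static int64_t days_from_civil(int y, int m, int d)
{
    y -= m <= 2;                                   /* year starts in March */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;                   /* year of era, [0, 399] */
    int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/*
 * Decode a FAT time/date pair into seconds since the Unix epoch.
 * adjust == true: the stored value is local time; add tz_minuteswest
 * (minutes west of UTC, e.g. -540 for JST, 480 for US PST) to get UTC.
 * adjust == false: the stored value is taken as UTC (the "posixtime" idea).
 */
static int64_t fat_to_unix(uint16_t time, uint16_t date, bool adjust,
                           int tz_minuteswest)
{
    int year  = 1980 + (date >> 9);
    int month = (date >> 5) & 0x0f;
    int day   = date & 0x1f;
    int hour  = time >> 11;
    int min   = (time >> 5) & 0x3f;
    int sec   = (time & 0x1f) * 2;                 /* 2-second resolution */

    int64_t t = days_from_civil(year, month, day) * 86400
              + hour * 3600 + min * 60 + sec;
    if (adjust)
        t += (int64_t)tz_minuteswest * 60;
    return t;
}
```

With adjust set, one and the same on-disk entry decodes to values an hour apart under PST (tz_minuteswest = 480) and PDT (420), assuming the system updates its kernel timezone at the transition; that is the rsync symptom described above. With adjust cleared, the result depends only on the bytes on disk.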
Re: [PATCH] Posix file attribute support on VFAT (take #2)
I'll try to explain the background.

Christoph Hellwig wrote:
> On Wed, Aug 17, 2005 at 04:07:03AM +0900, Machida, Hiroyuki wrote:
>> This is a take 2 of posix file attribute support on VFAT.
>
> Sorry, but this is far too scary. Please just use one of the sane
> filesystems linux supports.

I would say that the purpose of the feature is to make it possible to build a root fs for small embedded devices, not to support full POSIX attributes on top of VFAT. I think the situation is like uClinux, which has no MMU support and many restrictions, yet is still very helpful for small embedded devices.

To reduce resource consumption, embedded system developers would like to use only one file system type, if possible. And in most cases FAT is required for data exchange, e.g. memory/storage cards or USB client mode. So a small addition that lets FAT serve as a root fs is very helpful. Full attribute support is not required.

What do you think?

Thanks,
Hiroyuki Machida

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Posix file attribute support on VFAT
Ogawa-san and FAT developers,

Here is a patch that maps POSIX file attributes onto VFAT attributes, with restrictions. The main purpose of this patch is to build a root file system on VFAT for small embedded devices. FAT is widely used on embedded devices to exchange data, and small embedded devices also have limited resources. So it's very handy if VFAT can serve as a root fs, even with restrictions.

Details are described within the patch. I think this feature still needs improvement, but it is very helpful for most embedded developers.

Thanks,
Hiroyuki Machida
---
* vfat-posix_attr.patch:
 fs/fat/file.c            |  10 +
 fs/fat/inode.c           |  27 +++
 fs/vfat/namei.c          | 291 ++
 include/linux/msdos_fs.h | 320 +++
 4 files changed, 646 insertions(+), 2 deletions(-)

Signed-off-by: Hiroyuki Machida <[EMAIL PROTECTED]>

This patch enables the "posix_attr" option described below.

"posix_attr" mapping support on VFAT

- Descriptions
The option "posix_attr" enables mapping of POSIX special files and permission modes/attributes to VFAT attributes and creation time fields.

The in-memory inode holds the full POSIX attributes, but on the media side VFAT doesn't have enough room to store all of them, so the mapping used by this option is designed to be minimal. Newly created and/or modified files/dirs can use the full POSIX attributes, because the in-memory inode holds them. After an umount-mount cycle, some attributes may be lost in order to preserve the VFAT format.

This mapping method has many restrictions, but it's very handy for building a root file system on FAT for small embedded devices where interoperation with PCs is needed.

- Features
The following attributes/modes are supported by this option on the VFAT media side. The in-memory inode still holds the full POSIX attributes, which means newly created and/or modified files/dirs can use all of them.
After an umount-mount cycle, only the following attributes/modes are kept.

 - FileType
   The file type is held in the 3 MSBs of ctime_cs (bits 7-5). This enables support for the following special files: symbolic link, block device node, char device node, FIFO, socket. Regular files may also have POSIX attributes.
 - DeviceFile
   The major and minor numbers are held in ctime; both values are limited to 255.
 - Owner's User ID/Group ID: 2nd LSB of ctime_cs (bit 1)
   This bit can only distinguish root from others, because it is just one bit wide. The UID/GID value for a non-root owner is taken from the uid/gid mount options. If nothing is specified, the system uses -1 as a last resort.
 - Permission for Group/Other (rwx): 3rd-5th LSBs of ctime_cs (bits 2-4)
   These modes are kept in ctime_cs. Permission modes for "others" are the same as for "group", due to lack of space.
 - Permission for Owner (rwx)
   These modes are mapped to FAT attributes, just as in the existing VFAT mapping.
 - Others
   The sticky, setgid and setuid bits are not supported.

- Algorithm for attribute mapping decision
 - Regular file/dir
   To identify regular files/dirs, first check that the FAT dir entry doesn't have ATTR_SYS. If it doesn't, check whether the TYPE field (3 MSBs) of ctime_cs equals 7. If so, this regular file/dir was created and/or modified under VFAT with "posix_attr", and the POSIX attribute mapping applies. Otherwise, the conventional VFAT attribute mapping is used.
 - Special file
   To identify special files, first check that the FAT dir entry has ATTR_SYS. If it does, check the 1st LSB of ctime_cs (bit 0), referred to as the "special file flag". If it is set, the file was created under VFAT with "posix_attr"; look up the TYPE field to decide the special file type.
   This special file detection method has a flaw that can cause confusion: e.g. some system files created under DOS/Windows may be treated as special files. However, in most cases users don't create system files under DOS/Windows.
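The two detection rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not code from the patch: ATTR_SYS is the standard FAT system-attribute bit (0x04), the bit positions follow the ctime_cs layout given in the patch, and the enum and function names are hypothetical. One plausible reason for choosing TYPE 7: ctime_cs normally holds a 0-199 count of 10 ms units, and any byte with TYPE 7 is at least 0xE0 (224), so it cannot be an ordinary creation-time value.

```c
#include <stdint.h>

#define ATTR_SYS 0x04u  /* standard FAT "system" attribute bit */

/* Hypothetical result of the decision described in the patch. */
enum fat_mapping {
    MAP_CONVENTIONAL,   /* ordinary VFAT attribute mapping */
    MAP_POSIX_REGULAR,  /* regular file/dir written with posix_attr */
    MAP_POSIX_SPECIAL,  /* special file written with posix_attr */
};

static enum fat_mapping classify_entry(uint8_t attr, uint8_t ctime_cs)
{
    unsigned type = ctime_cs >> 5;           /* TYPE: the 3 MSBs */

    if (!(attr & ATTR_SYS))
        return type == 7 ? MAP_POSIX_REGULAR : MAP_CONVENTIONAL;

    /* ATTR_SYS is set: bit 0 is the "special file flag". */
    if (ctime_cs & 0x01u)
        return MAP_POSIX_SPECIAL;            /* caller then decodes TYPE */
    return MAP_CONVENTIONAL;
}
```

The flaw mentioned above shows up directly: a DOS/Windows system file whose creation time happens to carry an odd 10 ms count (for example ctime_cs = 37) is classified as a posix_attr special file.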
- FAT DIR entry fields description
 - ctime_cs (8-bit byte)

    7 6 5 4 3 2 1 0
    |===| | | | | |
    TYPE  | | | | +- special file flag (valid if ATTR_SYS)
          | | | +--- User/Group ID (owner)
          | | +----- !group X
          | +------- !group W
          +--------- !group R

  special file flag
    Indicate
Re: [RFD] FAT robustness
Hi,

OGAWA Hirofumi wrote:
> Hiroyuki Machida <[EMAIL PROTECTED]> writes:
>
>> We currently plan to add following features to address FAT corruption.
>>  - Utilize standard 2.6 features as much as possible
>>  - Implement as options of fat, vfat and uvfat
>
> What is the uvfat? typo (xvfat)? Why is this an option (does it have the big demerit)?

uvfat is another variant of vfat, like umsdos.

Xvfat for 2.4 has the following directory and file organization: most files are located in fs/xvfat, and most of them were copied from fs/fat and fs/vfat and renamed with a prefix like 'xvfat_'. For 2.6, I feel this organization needs to change. Also, xvfat for 2.4 had some performance degradation. So I think an 'option' is better.

>>  - Utilize noop elevator to cancel unexpected operation reordering
>
> Why don't you use the barrier?

You mean that using requests with the barrier flag is enough, and there is no reason to specify an I/O scheduler?

It is better to preserve the order of data updates in some circumstances, such as appending data. Xvfat for 2.4 had its own elevator function to preserve erase-block ordering for memory card devices. To begin the work for 2.6, I'd like to keep it simple, but we need to address this issue later. So my idea was to use "noop" at first, and later switch to a special elevator function that handles the device better.

>>  - Coordinate order of operations so that update data first, meta data later with transaction control
>
> Is this meaning the SoftUpdates? What does this guarantee? How does this handle the rename(), and cyclic dependency of updates?

I mentioned this in <[EMAIL PROTECTED]>.

>>  - With O_SYNC, close() make flush all related data and meta-data, then wait completion of I/O
>
> What is this meaning? Why does O_SYNC only flush at close()?

From the application's point of view, the application wants to believe that a close()d file was written correctly, without any corruption. At least close() needs to guarantee this. It's also OK if every write() flushes data and metadata and waits for the I/O to complete.
At least with fat on 2.4.20, the VFS syncs the inode on write() with O_SYNC, but it doesn't take care of the super block, and the FAT side doesn't handle O_SYNC at all. That's the problem.

> Almost things in your email is needing the detail. I'm thinking the
> SoftUpdates is best solution for now. Could you tell the detail of
> your solution?

I mentioned this in <[EMAIL PROTECTED]>.
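On the application side, POSIX does not require close() to flush anything to the media. The usual way to get the guarantee described above on a current kernel is to fsync() the file before closing it, then fsync() the directory that holds the new entry. A minimal sketch for Linux follows; write_file_durably is a hypothetical helper name, fsync() on a directory opened read-only is Linux behavior that other systems may not share, and whether the data really reaches stable media still depends on the filesystem and device honoring the flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to dir/name and return 0 only after the
 * data, the file's metadata and the new directory entry have been flushed.
 */
static int write_file_durably(const char *dir, const char *name,
                              const void *buf, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }

    /* close() by itself promises nothing about durability; fsync() does. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Flush the directory so the new entry survives a power cut (Linux). */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

This puts the cost on the application instead of on every close(), which is the trade-off the thread is debating.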
Re: [RFD] FAT robustness
Hi,

I need to explain the background in more detail. My descriptions tend to depend on some knowledge of the current xvfat for the 2.4 kernel. I'm not the author of xvfat for 2.4, but I can explain a little more.

The current xvfat for 2.4 is designed for a specific flash memory card controller that can guarantee atomicity of operations in ERASE-BLOCK units. Xvfat for 2.4 tries to merge operations on the same ERASE-BLOCK under some ordering constraints. To accomplish this with minimal changes to the existing FAT implementation, it uses its own transaction control kept in in-core memory, not on a storage device such as an HDD or flash. This transaction control also keeps FAT operations coming from different threads from being mixed up, which could otherwise cause operation ordering problems.

We'll start with HDDs, but later we'll cover memory devices. For memory devices we may prepare other elevator functions, depending on the properties of the devices or lower layers. E.g. NAND/AND flash have different operation units for read/write and for erase, and have a translation layer.

Paulo Marques wrote:
> Hiroyuki Machida wrote:
>> [...]
>> Q3 : I'm not sure JBD can be used for FAT improvements. Do you have any comments ?
>
> I might not be the best person to answer this, but this just seems so obvious:

Any comments are welcome.

> If you plan to let a recently hot-unplugged device to be used in another OS that doesn't understand your journaling extensions, your disk will be corrupted.
>
> If this is supposed to work only on OS's that understand your journaling extensions, then there are much better filesystems out there with journaling already.

I agree. This situation can occur even with non-removable media. Consider a device like an audio player that acts as a USB client and provides the USB Mass Storage class as a target. Its embedded storage may be accessed from the USB host side, such as a Windows PC or a Mac.
> You might be able to reduce the size of the time window where hot removing the media will cause problems, like writting all the data first and update the metadata in as few operations as possible. But that just reduces the probability of data corruption. It doesn't eliminate it at all.

As other messages said, some developers suggest using "SoftUpdates". I need to consider the situation where memory devices are used, not HDDs.
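The data-first ordering under discussion can be made concrete with a toy crash model. This is an illustration only, not how xvfat works: it replays a list of block writes, "crashes" after every prefix, and checks that the directory entry never claims blocks that were not yet written. It assumes writes reach the media atomically and in issue order, which is exactly what drive caches, elevators and flash translation layers may break (hence the barrier and noop discussion), and it ignores the FAT table itself, so data written before a crash is leaked rather than corrupted. All names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum { NBLK = 4 };

/* Toy "disk": data blocks plus one directory entry. */
struct disk {
    char data[NBLK];   /* one byte stands in for a whole block; 0 = unwritten */
    int  dirent_size;  /* number of blocks the directory entry claims */
};

/* One pending write: either a data block or the directory entry. */
struct op {
    bool is_meta;
    int  index;        /* block index for data writes */
    int  value;        /* byte for data writes, new size for the metadata write */
};

static void apply(struct disk *d, const struct op *o)
{
    if (o->is_meta)
        d->dirent_size = o->value;
    else
        d->data[o->index] = (char)o->value;
}

/* Consistent if every block the directory entry claims has been written. */
static bool consistent(const struct disk *d)
{
    for (int i = 0; i < d->dirent_size; i++)
        if (d->data[i] == 0)
            return false;
    return true;
}

/* Crash after each prefix of ops; true if every crash point is consistent. */
static bool survives_all_crash_points(const struct op *ops, int n)
{
    for (int k = 0; k <= n; k++) {
        struct disk d;
        memset(&d, 0, sizeof d);
        for (int i = 0; i < k; i++)
            apply(&d, &ops[i]);
        if (!consistent(&d))
            return false;
    }
    return true;
}

/* Append two blocks: data first, then the directory entry. */
static const struct op data_first[] = {
    {false, 0, 'A'}, {false, 1, 'B'}, {true, 0, 2},
};

/* Same update with the directory entry written first. */
static const struct op meta_first[] = {
    {true, 0, 2}, {false, 0, 'A'}, {false, 1, 'B'},
};
```

Under these assumptions survives_all_crash_points(data_first, 3) holds while the meta_first order fails right after its first write, which matches Paulo's point: ordering narrows what a crash can expose, but only if the device preserves that order.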
Re: [PATCH] Preserve hibernate-system-image on startup
Hi,

With this function, the system needs to mount read-write file systems on every boot cycle, to avoid inconsistency between the file systems and memory. How did you address this problem? Did the kernel check whether an RW file system remained mounted at boot-up or at hibernate time?

I think I need to discuss this with you at San Jose at the beginning of this year.

Regards,
Hiroyuki Machida

Nigel Cunningham wrote:
> Hi.
>
> We've had this feature in Suspend2 for a couple of years and I can
> confirm that the approach works, provided that the on-disk filesystem
> remains unchanged throughout this. (Useful mainly for kiosks etc).
>
> This is not to say that I've reviewed the code below for correctness.
>
> Regards,
[RFD] FAT robustness
Folks,

I'd like to have a discussion about FAT robustness. Please share your thoughts, comments and related issues.

A few years ago, we added some features to FAT, called xvfat, so that the system and FAT are robust against unexpected media hot unplug, and so that applications can be correctly made aware of the event. For your reference, I put a patch against the 2.4.20 kernel at
http://www.celinuxforum.org/CelfPubWiki/XvFatDiscussion?action=AttachFile&do=get&target=20050715-xvfat-2.4.20.patch

This includes the following features:
 - Handle media removed during "mount"
 - Notification of media removal to applications
 - Cancellation of I/O
 - Elevator for block devices
 - Block system calls until writing completes
 - Control the order of meta-data updates, using transaction control implemented in fs/xvfat/fwrq.c
 - File syscalls return "error", except umount
 - Japanese file name support
   possible 1-N mapping issues SJIS <-> UNICODE
 - Dirty flag support
 - TIME ZONE support

On moving to 2.6, we considered and categorized the issues again, and we are planning an open source project to add these features to the 2.6 kernel. I'd like to open a discussion about these features and how to implement them on 2.6.

1. Issues to be addressed
 - Issues around FAT with CE devices
   - Hot unplug issues
     - File system corruption on unplugging media/storage device
       Almost the same as power down without umount
     - Notification of the event
       Applications need to know about the event precisely
       Needs more investigation
     - System stability after unplug
       Almost the same as the I/O error recovery issues discussed on LKML
       http://developer.osdl.jp/projects/doubt/fs-consistency-and-coherency/index.html
       http://groups.google.co.jp/group/linux.kernel/browse_thread/thread/b9c11bccd59e0513/4a4dd84b411c6d32?q=[RFD]+FS+behavior+(I%2FO+failure)+in+kernel+summit++lkml&rnum=1&hl=ja#4a4dd84b411c6d32
   - Other issues
     - Time stamp issues
       always using local time
       time resolution is 2 seconds
     - Issues around mapping between UNICODE and local char codes
       1-N mapping SJIS <-> UNICODE
       Potential directory cache problem due to 1-N mapping
       Possible inconsistency problems on the application side
     - Support file sizes over 2GB
     - Support dirty flag

Q1 : The first questions for discussion are "Do you have any other issues about this?" and "Do you have any other ideas for categorizing the issues?"

2. FAT corruption on unplugging media/storage device

When starting the open source project, we will focus on the following issue first.
 - File system corruption on unplugging media/storage device
   Almost the same as power down without umount

We are planning to focus on HDD devices and to treat system power down instead of media unplug, because:

A. Damage and its countermeasures may depend on the properties of the lower layer, e.g.
 - Memory card
   Some controllers can guarantee atomicity of certain operations
 - Flash memory (NAND, NOR)
   I/O operations may be constrained by block size (e.g. 128KB) or page size (e.g. 2KB)
 - HDD
   - Cache memory may reside inside the drive
   - A sector being written at power down may be corrupted (can't be read anymore)
B. It may make the problem easier
 - Sector size is 512 bytes
 - Many developers can check with a PC

Q2 : Do you know of any other storage devices and their properties that should be addressed later?

3. Features to be developed for FAT corruption

We currently plan to add the following features to address FAT corruption.
 - Utilize standard 2.6 features as much as possible
 - Implement as options of fat, vfat and uvfat
 - Utilize the existing journaling block device (JBD) for transaction control
 - Utilize the noop elevator to cancel unexpected operation reordering
 - Coordinate the order of operations so that data is updated first and metadata later, with transaction control
 - With O_SYNC, make close() flush all related data and metadata, then wait for I/O completion

Q3 : I'm not sure JBD can be used for FAT improvements. Do you have any comments?

Thanks,
Hiroyuki Machida
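The "dirty flag" item refers to the volume-state bits that Microsoft's FAT specification defines in the FAT[1] entry: on FAT32, bit 27 (0x08000000) is the clean-shutdown bit and bit 26 (0x04000000) the hard-error bit; FAT16 uses 0x8000 and 0x4000. A minimal sketch of how a driver might use them, assuming the FAT[1] entry has already been read into memory; the helper names are hypothetical and this is not the Linux fat driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* FAT32 FAT[1] volume-state bits (FAT16 equivalents: 0x8000, 0x4000). */
#define FAT32_CLN_SHUT 0x08000000u  /* 1 = volume was cleanly unmounted */
#define FAT32_HRD_ERR  0x04000000u  /* 1 = no disk I/O errors last mount */

/* At mount time: did the previous session end with a clean unmount? */
static bool fat32_was_clean(uint32_t fat1)
{
    return (fat1 & FAT32_CLN_SHUT) != 0;
}

/* Did the previous session record a disk I/O error? */
static bool fat32_had_io_error(uint32_t fat1)
{
    return (fat1 & FAT32_HRD_ERR) == 0;
}

/* Clear the clean bit before the first write of a read-write mount. */
static uint32_t fat32_mark_dirty(uint32_t fat1)
{
    return fat1 & ~FAT32_CLN_SHUT;
}

/* Set it again once everything is flushed at umount. */
static uint32_t fat32_mark_clean(uint32_t fat1)
{
    return fat1 | FAT32_CLN_SHUT;
}
```

After a hot unplug or power loss the clean bit stays cleared, so the next mount (on Linux or on a PC) can tell that the volume should be checked; the flag only reports the problem and does not prevent corruption by itself.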
[PATCH] Preserve hibernate-system-image on startup
We are now investigating fast startup/shutdown using the 2.6 kernel PM functions. The attached patch enables the kernel to preserve the system image on startup, to implement "Snapshot boot".

Conventionally, the system image is destroyed after startup. Snapshot boot resumes from a permanent system image at startup, and at shutdown does a conventional shutdown without saving a system image.

We'll explain the concept and initial work at OLS, so if you're interested, we can talk with you in Ottawa.

Thanks,
Hiroyuki Machida
---
This patch enables preserving the swsusp system image over boot cycles, against 2.6.12.

Signed-off-by: Hiroyui Machida <[EMAIL PROTECTED]> for CELF

Index: alp-linux--dev-2-6-12--1.7/kernel/power/Kconfig
===
--- alp-linux--dev-2-6-12--1.7.orig/kernel/power/Kconfig 2005-07-15 14:59:20.0 -0400
+++ alp-linux--dev-2-6-12--1.7/kernel/power/Kconfig 2005-07-16 00:43:31.42000 -0400
@@ -84,6 +84,20 @@
 	  suspended image to. It will simply pick the first available swap
 	  device.
+config PRESERVE_SWSUSP_IMAGE
+	bool "Preserve swsuspend image"
+	depends on SOFTWARE_SUSPEND
+	default n
+	---help---
+	  Usually, booting with swsusp destroys the swsusp image.
+	  This function enables preserving the swsusp image over boot cycles.
+	  The default behavior is not changed even if this option is turned on.
+
+	  To preserve the swsusp image, specify the following option on the command line:
+
+	  prsv-img
+
+
 config DEFERRED_RESUME
 	bool "Deferred resume"
 	depends on PM
Index: alp-linux--dev-2-6-12--1.7/kernel/power/disk.c
===
--- alp-linux--dev-2-6-12--1.7.orig/kernel/power/disk.c 2005-07-16 00:43:02.99000 -0400
+++ alp-linux--dev-2-6-12--1.7/kernel/power/disk.c 2005-07-16 01:01:42.22000 -0400
@@ -29,10 +29,29 @@
 extern void swsusp_close(void);
 extern int swsusp_resume(void);
 extern int swsusp_free(void);
+extern void dump_pagedir_nosave(void);
 #ifdef CONFIG_SAFE_SUSPEND
 extern int suspend_remount(void);
 extern int resume_remount(void);
 #endif
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+extern int preserve_swsusp_image;
+extern dev_t swsusp_resume_device_nosave __nosavedata;
+extern int swsusp_swap_rdonly(dev_t);
+extern int swsusp_swap_off(dev_t);
+#else
+#define preserve_swsusp_image 0
+#define swsusp_resume_device_nosave 0
+static inline int swsusp_swap_rdonly(dev_t dev)
+{
+	return 0;
+}
+static inline int swsusp_swap_off(dev_t dev)
+{
+	return 0;
+}
+#endif
+
 
 static int noresume = 0;
@@ -135,6 +154,26 @@
 	pm_restore_console();
 }
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+void finish_in_resume(void)
+{
+	device_resume();
+	platform_finish();
+	enable_nonboot_cpus();
+	thaw_processes();
+	if (preserve_swsusp_image) {
+		swsusp_swap_off(swsusp_resume_device_nosave);
+	}
+	pm_restore_console();
+}
+#else
+void finish_in_resume(void)
+{
+	finish();
+}
+#endif
+
+
 extern atomic_t on_suspend;		/* See refrigerator() */
 
 static int prepare_processes(void)
@@ -234,8 +273,15 @@
 		error = swsusp_write();
 		if (!error)
 			power_down(pm_disk_mode);
-	} else
+	} else {
 		pr_debug("PM: Image restored successfully.\n");
+		if (preserve_swsusp_image) {
+			swsusp_swap_rdonly(swsusp_resume_device_nosave);
+		}
+		swsusp_free();
+		finish_in_resume();
+		return 0;
+	}
 	swsusp_free();
  Done:
 	finish();
Index: alp-linux--dev-2-6-12--1.7/kernel/power/swsusp.c
===
--- alp-linux--dev-2-6-12--1.7.orig/kernel/power/swsusp.c 2005-07-16 00:43:03.0 -0400
+++ alp-linux--dev-2-6-12--1.7/kernel/power/swsusp.c 2005-07-16 00:56:22.17000 -0400
@@ -128,6 +128,11 @@
 static struct swsusp_info swsusp_info;
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+dev_t swsusp_resume_device_nosave __nosavedata;
+struct swsusp_header swsusp_header_nosave __nosavedata ;
+#endif
+
 /*
  * XXX: We try to keep some more pages free so that I/O operations succeed
  * without paging. Might this be more?
@@ -139,6 +144,24 @@
 #define PAGES_FOR_IO	512
 #endif
 
+#ifdef CONFIG_PRESERVE_SWSUSP_IMAGE
+int preserve_swsusp_image=0;
+static int __init preserve_swsusp_image_setup(char *str)
+{
+	if (*str)
+		return 0;
+	preserve_swsusp_image = 1;
+	return 1;
+}
+#else
+static int __init preserve_swsusp_image_setup(char *str)
+{
+	return 0;
+}
+#endif
+
+__setup("prsv-img", preserve_swsusp_image_setup);
+
 /*
  * Saving part...
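The patch enables the feature with a __setup handler that rejects any text following "prsv-img". Because the kernel passes the handler whatever follows the matched string, only a bare "prsv-img" turns the feature on, while "prsv-img=1" is left unhandled. Below is a simplified userspace model of that behavior; the real kernel command-line parser differs in details, and parse_cmdline, match_setup and preserve_image are hypothetical stand-ins.

```c
#include <string.h>

/* Stand-in for the patch's preserve_swsusp_image flag (hypothetical name). */
static int preserve_image;

/* Same logic as preserve_swsusp_image_setup(): str is the text after "prsv-img". */
static int prsv_img_setup(const char *str)
{
    if (*str)
        return 0;            /* e.g. "prsv-img=1": not handled */
    preserve_image = 1;
    return 1;
}

/* Simplified model: call fn with the remainder if word starts with prefix. */
static int match_setup(const char *word, const char *prefix,
                       int (*fn)(const char *))
{
    size_t n = strlen(prefix);

    if (strncmp(word, prefix, n) != 0)
        return 0;
    return fn(word + n);
}

/* Scan a space-separated command line; returns 1 if any word was handled. */
static int parse_cmdline(const char *cmdline)
{
    char buf[256];
    int handled = 0;

    strncpy(buf, cmdline, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *w = strtok(buf, " "); w != NULL; w = strtok(NULL, " "))
        handled |= match_setup(w, "prsv-img", prsv_img_setup);
    return handled;
}
```

With "root=/dev/hda1 prsv-img=1" nothing is enabled; with "root=/dev/hda1 prsv-img quiet" the flag is set.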