Re: Review status (Re: [PATCH] LogFS take three)
On Wed, May 23, 2007 at 05:14:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > > I'm just a German.  Forgive me if I drink lesser beverages.
> >
> > You should definitely change that.
>
> Change being German?  Not a bad idea, actually.

You cook up really tasty schnapps, in small quantities it is good for
health in infinite volumes.

> > Btw, what about this piece:
> >
> > int logfs_erase_segment(struct super_block *sb, u32 index)
> > {
> > 	struct logfs_super *super = LOGFS_SUPER(sb);
> >
> > 	super->s_gec++;
> >
> > 	return mtderase(sb, index << super->s_segshift, super->s_segsize);
> > }
> >
> > index << super->s_segshift might overflow, mtderase expects loff_t
> > there, since index can be arbitrary segment number, is it possible, that
> > overflow really occurs?
>
> Indeed it is.  You just earned your second beer^Wvodka.

Actually this code looks less encrypted than ext2, which is definitely a
good sign from a reviewer's point of view.

> Jörn

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Review status (Re: [PATCH] LogFS take three)
On Wed, 23 May 2007 19:07:32 +0400, Evgeniy Polyakov wrote:
> On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
>
> And what if it is 33 bits? Or it is not allowed?

Not allowed.  Both number and size of segments may never exceed 32bit.

> > > segsize is long, but should be u64 I think.
> >
> > It could be s32 as well.
>
> It is a matter of definition - if segment size is allowed to be more
> than 32 bits, then below transformation is not correct, otherwise
> segment size should not use additional 32bits on 64bit platform, since
> it is long.

I guess I could save 4 Bytes there.

> > I'm just a German.  Forgive me if I drink lesser beverages.
>
> You should definitely change that.

Change being German?  Not a bad idea, actually.

> Btw, what about this piece:
>
> int logfs_erase_segment(struct super_block *sb, u32 index)
> {
> 	struct logfs_super *super = LOGFS_SUPER(sb);
>
> 	super->s_gec++;
>
> 	return mtderase(sb, index << super->s_segshift, super->s_segsize);
> }
>
> index << super->s_segshift might overflow, mtderase expects loff_t
> there, since index can be arbitrary segment number, is it possible, that
> overflow really occurs?

Indeed it is.  You just earned your second beer^Wvodka.

Jörn

--
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown
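The overflow confirmed above can be reproduced in userspace. The following is only a sketch, not the fix that went into LogFS: the helper names are hypothetical, and `uint32_t`/`uint64_t` stand in for the kernel's `u32` and `loff_t`. It contrasts a shift evaluated in 32 bits with one that widens the segment index first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors "index << super->s_segshift" with a u32 index: the shift is
 * evaluated in 32 bits, so the high bits are already gone by the time
 * the result is widened for a 64-bit offset parameter.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t segment_offset_narrow(uint32_t index, unsigned segshift)
{
	assert(segshift < 32);
	return (uint32_t)(index << segshift);
}

/*
 * Widening the index before shifting keeps every bit of the byte offset.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t segment_offset_wide(uint32_t index, unsigned segshift)
{
	assert(segshift < 32);
	return (uint64_t)index << segshift;
}
```

With 1 MiB segments (a shift of 20), segment 4096 starts at byte 2^32: the narrow helper returns 0 while the widened one returns 4294967296. In kernel terms the equivalent change would presumably be a cast such as `(loff_t)index << super->s_segshift`, but that is an assumption, not something stated in the thread.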
Re: Review status (Re: [PATCH] LogFS take three)
On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> >
> > In that case segment size must be more than 32 bits, or below
> > transformation will not be correct?
>
> Must it?  If segment size is just 20bit then the filesystem may only be
> 52bit.  Or 51bit when using signed values.

And what if it is 33 bits? Or it is not allowed?

> > segsize is long, but should be u64 I think.
>
> It could be s32 as well.

It is a matter of definition - if segment size is allowed to be more
than 32 bits, then below transformation is not correct, otherwise
segment size should not use additional 32bits on 64bit platform, since
it is long.

> > static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
> > *area, void *read, u64 ofs, size_t readlen)
> >
> > u32 read_start = ofs & (super->s_segsize - 1);
> > u32 read_end = read_start + readlen;
> >
> > And this can overflow, since readlen is size_t.
>
> Theoretically yes.  Practically readlen is bounded to sb->blocksize plus
> one header.  I'll start worrying about that when blocksize approaches
> 32bit limit.
>
> > > If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> > > beverage of choice. :)
> >
> > Stop killing your kidneys, your health and promoting such antisocial
> > style of life, start drinking vodka instead.
>
> I'm just a German.  Forgive me if I drink lesser beverages.

You should definitely change that.

Btw, what about this piece:

int logfs_erase_segment(struct super_block *sb, u32 index)
{
	struct logfs_super *super = LOGFS_SUPER(sb);

	super->s_gec++;

	return mtderase(sb, index << super->s_segshift, super->s_segsize);
}

index << super->s_segshift might overflow, mtderase expects loff_t
there, since index can be arbitrary segment number, is it possible, that
overflow really occurs?

> Jörn
>
> --
> Eighty percent of success is showing up.
> -- Woody Allen

--
	Evgeniy Polyakov
Re: Review status (Re: [PATCH] LogFS take three)
On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
>
> In that case segment size must be more than 32 bits, or below
> transformation will not be correct?

Must it?  If segment size is just 20bit then the filesystem may only be
52bit.  Or 51bit when using signed values.

> segsize is long, but should be u64 I think.

It could be s32 as well.

> static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
> *area, void *read, u64 ofs, size_t readlen)
>
> u32 read_start = ofs & (super->s_segsize - 1);
> u32 read_end = read_start + readlen;
>
> And this can overflow, since readlen is size_t.

Theoretically yes.  Practically readlen is bounded to sb->blocksize plus
one header.  I'll start worrying about that when blocksize approaches
32bit limit.

> > If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> > beverage of choice. :)
>
> Stop killing your kidneys, your health and promoting such antisocial
> style of life, start drinking vodka instead.

I'm just a German.  Forgive me if I drink lesser beverages.

Jörn

--
Eighty percent of success is showing up.
-- Woody Allen
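The "20bit segment size gives a 52bit filesystem" arithmetic above follows from combining a 32-bit segment number with the bits needed to address a byte inside one segment. A small sketch of that calculation, assuming power-of-two segment sizes; both function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte offset reachable with a 32-bit segment number and
 * segments of 2^segshift bytes. (Hypothetical helper.)
 */
static uint64_t max_device_offset(unsigned segshift)
{
	assert(segshift <= 32);
	uint64_t last_segment = UINT32_MAX;
	uint64_t last_byte = ((uint64_t)1 << segshift) - 1;

	return (last_segment << segshift) | last_byte;
}

/* Number of bits needed to represent value (0 for value == 0). */
static unsigned bits_needed(uint64_t value)
{
	unsigned bits = 0;

	while (value) {
		bits++;
		value >>= 1;
	}
	return bits;
}
```

For a shift of 20 the largest offset is 2^52 - 1, so 52 bits suffice; only when segments themselves reach 2^32 bytes does the combined offset use all 64 bits.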
Re: Review status (Re: [PATCH] LogFS take three)
On Thu, May 17, 2007 at 07:10:19PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
> >
> > Is logfs 32bit fs or 64bit, since although you use 64bit values for
> > offsets, area management and strange conversions like described below
> > from offset into segment number are performed in 32bit?
> > Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> > could you please explain logic behind area management?
>
> Ignoring bugs and signed return values for error handling, it is either
> 64bit or 32+32bit.
>
> Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
> In a couple of places, offsets are also 32+32bit.  Basically the high
> bits contain the segment number, the lower bits the offset within a
> segment.

In that case segment size must be more than 32 bits, or below
transformation will not be correct?
segsize is long, but should be u64 I think.

static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
*area, void *read, u64 ofs, size_t readlen)

u32 read_start = ofs & (super->s_segsize - 1);
u32 read_end = read_start + readlen;

And this can overflow, since readlen is size_t.

It is wbuf fixup, but I saw that somewhere else.
Although, according to your description, it should be 32bit, the sum
can be more than 32 bits.

> If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> beverage of choice. :)

Stop killing your kidneys, your health and promoting such antisocial
style of life, start drinking vodka instead.

> Jörn

--
	Evgeniy Polyakov
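The "32+32bit" layout described above (segment number in the high bits, offset within the segment in the low bits) and the `read_start + readlen` concern can both be sketched in plain C. This is an illustration under stated assumptions, not LogFS code: the struct and function names are invented, segment sizes are assumed to be powers of two below 2^32, and the segment number is assumed to fit in 32 bits, as stated elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair: segment number plus byte offset inside the segment. */
struct seg_pos {
	uint32_t segno;
	uint32_t in_seg;
};

/* Split a 64-bit device offset, assuming segsize == 1 << segshift. */
static struct seg_pos split_offset(uint64_t ofs, unsigned segshift)
{
	assert(segshift >= 1 && segshift < 32);
	uint64_t segsize = (uint64_t)1 << segshift;
	struct seg_pos pos = {
		/* Assumes the segment number fits in 32 bits. */
		.segno = (uint32_t)(ofs >> segshift),
		/* Same masking as "ofs & (super->s_segsize - 1)". */
		.in_seg = (uint32_t)(ofs & (segsize - 1)),
	};

	return pos;
}

/*
 * Compute read_start + readlen without letting it wrap in 32 bits.
 * Returns the end offset, or -1 if it would not fit in a u32.
 */
static int64_t read_end_checked(uint32_t read_start, size_t readlen)
{
	if (readlen > (size_t)(UINT32_MAX - read_start))
		return -1;
	return (int64_t)read_start + (int64_t)readlen;
}
```

As noted in the replies, `readlen` is bounded by the block size plus one header in practice, so a check like this would be defensive rather than a fix for an observed failure.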
Re: [PATCH] LogFS take three
On Saturday 19 May 2007 5:24 am, Jan Engelhardt wrote:
>
> On May 19 2007 02:15, Rob Landley wrote:
> >> > +
> >> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> >> > +{
> >> > +	return container_of(inode, struct logfs_inode, vfs_inode);
> >> > +}
> >>
> >> Do these need to be uppercase?
> >
> >I'm trying to keep it clear in my head...
> >
> >When do you need to say __always_inline and when can you get away with
> >just saying "static inline"?
>
> When using "static inline", the compiler may ignore the inline keyword
> (it's just a hint), and leave the function as a standalone function.
>
> When CONFIG_FORCED_INLINING is active, and it is by default, inline is
> always substituted by __always_inline, to be on the safe side. Some code
> needs to be always inline; but not all code has been checked whether it
> is safe to go from __always_inline to inline.

I've seen patches go by using __always_inline directly.  Is there some
janitorial effort to examine each instance of the inline keyword and
either replace it with "__always_inline" or remove it?

Right now "inline" seems to be about as useful as the "register" keyword.
You don't feed hints to a compiler like gcc, you hit it with a two-by-four
and thumbscrews if you want to get its attention.

Rob
Re: [PATCH] LogFS take three
On Sat, May 19, 2007 at 05:17:32PM +0100, Jamie Lokier ([EMAIL PROTECTED]) wrote:
> > So, log2fs...  Sounds great to me.
>
> Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
> filesystem use 2-ary trees or anything else which gives particular
> meaning to 2?

Sizes used in on-disk format are rounded to the nearest power-of-two.

> -- Jamie

--
	Evgeniy Polyakov
Re: [PATCH] LogFS take three
David Weinehall wrote:
> > It is also the filesystem that tries to scale logarithmically, as Arnd
> > has noted.  Maybe I should call it Log2 to emphasize this point.  Log1
> > would be horrible scalability.
>
> So, log2fs...  Sounds great to me.

Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
filesystem use 2-ary trees or anything else which gives particular
meaning to 2?

-- Jamie
Re: [PATCH] LogFS take three
Dongjun Shin wrote:

There are so many flash-based storage devices, and some disposable ones,
as you pointed out, have poor quality. I think it's mainly because these
are not designed for good quality, but for lowering the price.

The reliability seems to be appropriate to the common use. I'm dubious
that computer storage was a big design factor until the last few years.
A good argument for buying large sizes: they are more likely to be of
recent design.

These kinds of devices are not ready for things like power failure because
their use case is far from that. For example, removing a flash card while
taking pictures with a digital camera is not a common use case.
(there should be a written notice that this kind of action is against
the warranty)

They do well in such use, if you equate battery death to pulling the
card (it may not be). I have tested that feature and not had a failure
of any but the last item. Clearly not recommended, but sometimes
unplanned needs arise.

- In contrast to the embedded environment where CPU and flash are directly
connected, the I/O path between CPU and flash in the PC environment is
longer. The latency for SW handshaking between CPU and flash will also be
longer, which would make the performance optimization harder.

As I mentioned, some techniques like a log-structured filesystem could
perform generally better on any kind of flash-based storage with FTL.
Although there are many kinds of FTL, it is commonly true that it performs
well under workloads where sequential write is dominant.

I also expect that FTL for the PC environment will have a better quality
spec than the disposable storage.

The recent technology announcements from Intel are encouraging in that
respect.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote:
> On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
> > Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> >
> > Jörn> How many of you have worked for IBM before?  Vowels are not
> > evil. ;)
> >
> > Nope, they're not.  I just think that LogFS isn't descriptive enough,
> > or more accurately, is the *wrong* description of this filesystem.
>
> That was the whole point.  JFFS2, the journaling flash filesystem, is a
> strictly log-structured filesystem.  LogFS has a journal.
>
> It is also the filesystem that tries to scale logarithmically, as Arnd
> has noted.  Maybe I should call it Log2 to emphasize this point.  Log1
> would be horrible scalability.

So, log2fs...  Sounds great to me.

[snip]

Regards: David
--
 /) David Weinehall <[EMAIL PROTECTED]> /) Northern lights wander      (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/    (/   Full colour fire           (/
Re: [PATCH] LogFS take three
Kevin Bowling wrote:
On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote:
On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
>
> My experience is that no matter which name I pick, people will complain
> anyway.  Previous suggestions included:
> jffs3
> jefs
> engelfs
> poofs
> crapfs
> sweetfs
> cutefs
> dynamic journaling fs - djofs
> tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)

However if Jörn is accused of murder, it will have little chance of
being merged :-).

WRT that, it seems that Nina had a lover who is a confessed serial killer.
I'm surprised the case hasn't been adapted for 'Boston Legal' and 'Law
and Order' like other high profile crimes.

I see nothing wrong with jörnfs, and there's room for numbers at the end...

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
Re: [PATCH] LogFS take three
On May 19 2007 02:15, Rob Landley wrote:
>> > +
>> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
>> > +{
>> > +	return container_of(inode, struct logfs_inode, vfs_inode);
>> > +}
>>
>> Do these need to be uppercase?
>
>I'm trying to keep it clear in my head...
>
>When do you need to say __always_inline and when can you get away with
>just saying "static inline"?

When using "static inline", the compiler may ignore the inline keyword
(it's just a hint), and leave the function as a standalone function.

When CONFIG_FORCED_INLINING is active, and it is by default, inline is
always substituted by __always_inline, to be on the safe side. Some code
needs to be always inline; but not all code has been checked whether it
is safe to go from __always_inline to inline.

	Jan
--
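The difference between the hint and the forced form can be shown outside the kernel with the GCC/Clang `always_inline` attribute, which is what the kernel's `__always_inline` macro is understood to wrap. The macro name `force_inline` and both functions below are made up for this sketch, and the fallback branch for other compilers is an assumption.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's __always_inline. On compilers without the
 * GNU attribute this falls back to a plain hint (assumption).
 */
#if defined(__GNUC__)
#define force_inline inline __attribute__((always_inline))
#else
#define force_inline inline
#endif

/* Plain hint: the compiler may still emit a standalone function. */
static inline int add_hint(int a, int b)
{
	return a + b;
}

/* Forced: GCC and Clang inline this at every call site, or report an error. */
static force_inline int add_forced(int a, int b)
{
	return a + b;
}
```

Both functions behave identically at run time; the only difference is whether the compiler is allowed to leave an out-of-line copy, which can be inspected with `objdump -d` on an unoptimized build.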
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007 4:37 pm, Andrew Morton wrote:
> > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> > +{
> > +	return sb->s_fs_info;
> > +}
> > +
> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > +{
> > +	return container_of(inode, struct logfs_inode, vfs_inode);
> > +}
>
> Do these need to be uppercase?

I'm trying to keep it clear in my head...

When do you need to say __always_inline and when can you get away with just
saying "static inline"?

(I'm attempting to write documentation on a topic I don't understand.  Best
way to learn it, I've found...)

> > +	buf = kmap(page);
> > +	ret = logfs_write_buf(inode, index, buf);
> > +	kunmap(page);
>
> kmap() is lame.  The preferred approach would be to pass the page* down to
> the lower layers and to use kmap_atomic() at the lowest possible point.

Um, would I read about this in DMA-mapping.txt or cachetlb.txt?  (I don't
think it's fujitsu/frv/mmu-layout.txt)

Rob
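For readers unfamiliar with the `container_of()` pattern used by `LOGFS_INODE()` above, here is a userspace sketch. The macro is a simplified version without the kernel's type checking, and the struct names, fields, and `demo_roundtrip()` are all hypothetical stand-ins rather than LogFS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the generic VFS inode. */
struct inode_stub {
	unsigned long i_ino;
};

/* Stand-in for a filesystem-private inode that embeds the generic one. */
struct fs_inode_stub {
	int private_flags;
	struct inode_stub vfs_inode;
};

/* Same shape as LOGFS_INODE(): generic inode in, private inode out. */
static inline struct fs_inode_stub *fs_inode(struct inode_stub *inode)
{
	return container_of(inode, struct fs_inode_stub, vfs_inode);
}

/* Returns 1 if the embedded pointer maps back to its container. */
static int demo_roundtrip(void)
{
	struct fs_inode_stub fi = { .private_flags = 7 };
	struct fs_inode_stub *back = fs_inode(&fi.vfs_inode);

	return back == &fi && back->private_flags == 7;
}
```

The accessor works because the generic inode is embedded in the private structure rather than pointed to, so the private data is always at a fixed negative offset from the pointer the VFS hands back.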
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007 4:37 pm, Andrew Morton wrote: +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb) +{ + return sb-s_fs_info; +} + +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) +{ + return container_of(inode, struct logfs_inode, vfs_inode); +} Do these need to be uppercase? I'm trying to keep it clear in my head... When do you need to say __always_inline and when can you get away with just saying static inline? (I'm attempting to write documentation on a topic I don't understand. Best way to learn it, I've found...) + buf = kmap(page); + ret = logfs_write_buf(inode, index, buf); + kunmap(page); kmap() is lame. The preferred approach would be to pass the page* down to the lower layers and to use kmap_atomic() at the lowest possible point. Um, would I read about this in DMA-mapping.txt or cachetlb.txt? (I don't think it's fujitsu/frv/mmu-layout.txt) Rob - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On May 19 2007 02:15, Rob Landley wrote: + +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) +{ + return container_of(inode, struct logfs_inode, vfs_inode); +} Do these need to be uppercase? I'm trying to keep it clear in my head... When do you need to say __always_inline and when can you get away with just saying static inline? When using static inline, the compiler may ignore the inline keyword (it's just a hint), and leave the function as a standalone function. When CONFIG_FORCED_INLINING is active, and it is by default, inline is always substituted by __always_inline, to be on the safe side. Some code needs to be always inline; but not all code has been checked whether it is safe to go from __always_inline to inline. Jan -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
Kevin Bowling wrote: On 5/16/07, David Woodhouse [EMAIL PROTECTED] wrote: On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote: My experience is that no matter which name I pick, people will complain anyway. Previous suggestions included: jffs3 jefs engelfs poofs crapfs sweetfs cutefs dynamic journaling fs - djofs tfsfkal - the file system formerly known as logfs Can we call it jörnfs? :) However if Jörn is accused of murder, it will have little chance of being merged :-). WRT that, seems that Nina had a lover who is a confessed serial killer. I'm surprised the case hasn't been adapter for 'Boston legal' and 'Law and Order' like other high profile crimes. I see nothing wrong with jörnfs, and there's room for numbers at the end... -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote: On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote: Jörn On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: Jörn How many of you have worked for IBM before? Vowels are not evil. ;) Nope, they're not. I just think that LogFS isn't descriptive enough, or more accurately, is the *wrong* description of this filesystem. That was the whole point. JFFS2, the journaling flash filesystem, is a strictly log-structured filesystem. LogFS has a journal. It is also the filesystem that tries to scale logarithmically, as Arnd has noted. Maybe I should call it Log2 to emphesize this point. Log1 would be horrible scalability. So, log2fs... Sounds great to me. [snip] Regards: David -- /) David Weinehall [EMAIL PROTECTED] /) Northern lights wander (\ // Maintainer of the v2.0 kernel // Dance across the winter sky // \) http://www.acc.umu.se/~tao/(/ Full colour fire (/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
Dongjun Shin wrote:
> There are so many flash-based storage and some disposable storages,
> as you pointed out, have poor quality. I think it's mainly because these
> are not designed for good quality, but for lowering the price.

The reliability seems to be appropriate to the common use. I'm dubious that computer storage was a big design factor until the last few years. A good argument for buying large sizes, they are more likely to be recent design.

> These kind of devices are not ready for things like power failure because
> their use case is far from that. For example, removing flash card
> while taking pictures using digital camera is not a common use case.
> (there should be a written notice that this kind of action is against
> the warranty)

They do well in such use, if you equate battery death to pulling the card (it may not be). I have tested that feature and not had a failure of any but the last item. Clearly not recommended, but sometimes unplanned needs arise.

> - In contrast to the embedded environment where CPU and flash is directly
> connected, the I/O path between CPU and flash in PC environment is longer.
> The latency for SW handshaking between CPU and flash will also be longer,
> which would make the performance optimization harder.
>
> As I mentioned, some techniques like log-structured filesystem could
> perform generally better on any kind of flash-based storage with FTL.
> Although there are many kinds of FTL, it is commonly true that
> it performs well under workload where sequential write is dominant.
>
> I also expect that FTL for PC environment will have better quality spec
> than the disposable storage.

The recent technology announcements from Intel are encouraging in that respect.

-- 
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked. - from Slashdot
Re: [PATCH] LogFS take three
David Weinehall wrote:
> > It is also the filesystem that tries to scale logarithmically, as Arnd
> > has noted. Maybe I should call it Log2 to emphasize this point. Log1
> > would be horrible scalability.
>
> So, log2fs... Sounds great to me.

Why Log2? Logarithmic scaling is just logarithmic scaling. Does the filesystem use 2-ary trees or anything else which gives particular meaning to 2?

-- Jamie
Re: [PATCH] LogFS take three
On Sat, May 19, 2007 at 05:17:32PM +0100, Jamie Lokier ([EMAIL PROTECTED]) wrote:
> > So, log2fs... Sounds great to me.
>
> Why Log2? Logarithmic scaling is just logarithmic scaling. Does the
> filesystem use 2-ary trees or anything else which gives particular
> meaning to 2?

Sizes used in the on-disk format are rounded to the nearest power-of-two.

> -- Jamie

-- 
	Evgeniy Polyakov
Re: [PATCH] LogFS take three
On Saturday 19 May 2007 5:24 am, Jan Engelhardt wrote:
> On May 19 2007 02:15, Rob Landley wrote:
> > > +
> > > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > > +{
> > > +	return container_of(inode, struct logfs_inode, vfs_inode);
> > > +}
> > >
> > > Do these need to be uppercase?
> >
> > I'm trying to keep it clear in my head... When do you need to say
> > __always_inline and when can you get away with just saying static inline?
>
> When using static inline, the compiler may ignore the inline keyword
> (it's just a hint), and leave the function as a standalone function.
>
> When CONFIG_FORCED_INLINING is active, and it is by default, inline is
> always substituted by __always_inline, to be on the safe side. Some code
> needs to be always inline; but not all code has been checked whether it
> is safe to go from __always_inline to inline.

I've seen patches go by using __always_inline directly. Is there some janitorial effort to examine each instance of the inline keyword and either replace it with __always_inline or remove it?

Right now inline seems to be about as useful as the register keyword. You don't feed hints to a compiler like gcc, you hit it with a two-by-four and thumbscrews if you want to get its attention.

Rob
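For readers following the quoted LOGFS_INODE() helper, here is a minimal userspace sketch of what container_of does and how a forced inline is spelled with GCC. Everything in it is an assumption made for illustration: the two structs are stand-ins rather than the real kernel or LogFS definitions, the li_flags field and the force_inline macro are hypothetical names, and the container_of macro is a simplified version without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member pointer to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical macro: GCC's always_inline attribute turns the inline
 * hint into a requirement. Other compilers fall back to the plain hint.
 */
#if defined(__GNUC__)
#define force_inline inline __attribute__((always_inline))
#else
#define force_inline inline
#endif

/* Stand-in for the VFS inode. */
struct inode {
	unsigned long i_ino;
};

/* Stand-in for the filesystem-private inode that embeds the VFS inode. */
struct logfs_inode {
	unsigned long long li_flags;	/* illustrative field only */
	struct inode vfs_inode;		/* embedded, not a pointer */
};

/* Map a VFS inode back to the structure that contains it. */
static force_inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct logfs_inode, vfs_inode);
}
```

The conversion is pure pointer arithmetic on a compile-time offset, which is why such helpers are usually tiny inline functions rather than out-of-line calls.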
Re: [PATCH] LogFS take three
On Fri, 2007-05-18 at 08:17 +0200, Jan Engelhardt wrote:
> > AFAIK, the camera stops writing to the flash card and automatically
> > turns off when it's low on battery (before empty).
>
> But then, one should also consider the case where a cam is connected to
> AC and someone inadvertently trips on the power cord.

So you stick a bloody great cap on board to give you enough time to shut it down cleanly. I've known people do this -- and it helps, but the devices still manage to crap themselves occasionally even then.

They're _disposable_. As are your data :)

-- 
dwmw2
Re: [PATCH] LogFS take three
On May 18 2007 09:01, Dongjun Shin wrote:
> On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
>>
>> Hmm.. so operating your camera on batteries should be against the
>> warranty, since batteries commonly run empty while storing pictures?
>
> AFAIK, the camera stops writing to the flash card and automatically
> turns off when it's low on battery (before empty).

But then, one should also consider the case where a cam is connected to AC and someone inadvertently trips on the power cord.

	Jan
-- 
Re: [PATCH] LogFS take three
On May 17 2007 21:00, Kyle Moffett wrote:
>> > > Opinions?
>> >
>> > Why would we need another btree, when there is lib/rbtree.c? Or does
>> > yours do something fundamentally different?
>>
>> It is not red-black tree, it is b+ tree.
>
> It might be better to use the prefix "bptree" to help prevent confusion. A
> quick google search on "bp-tree" reveals only the perl B+-tree module
> "Tree::BPTree", a U-Maryland Java CS project on B+-trees, and a news article
> about a "BP tree-top protest".

BP heh.. How about "struct bplustree"?

	Jan
-- 
Re: [PATCH] LogFS take three
On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
> On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote:
> > > My plan was to move this code to lib/ sooner or later. If you consider
> > > it useful in its current state, I can do it immediately. And if someone
> > > else merged a superior btree library I'd happily remove mine and use the
> > > new one instead.
> > >
> > > Opinions?
> >
> > Why would we need another btree, when there is lib/rbtree.c? Or does
> > yours do something fundamentally different?
>
> It is not red-black tree, it is b+ tree.

It might be better to use the prefix "bptree" to help prevent confusion. A quick google search on "bp-tree" reveals only the perl B+-tree module "Tree::BPTree", a U-Maryland Java CS project on B+-trees, and a news article about a "BP tree-top protest".

Cheers,
Kyle Moffett
Re: [PATCH] LogFS take three
Hi,

On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> Hi!
>
> Hmm.. so operating your camera on batteries should be against the
> warranty, since batteries commonly run empty while storing pictures?

AFAIK, the camera stops writing to the flash card and automatically turns off when it's low on battery (before empty).
Re: [PATCH] LogFS take three
Jörn Engel wrote:
> > Almost all your static functions start with logfs_, why not this one?
>
> Because after a while I discovered how silly it is to start every
> function with logfs_. That prefix doesn't add much unless the function
> has global scope. What I didn't do was remove the prefix from older
> functions.

It's handy when debugging or showing detailed backtraces. Not that I'm advocating it (or not), just something I've noticed in other programs.

-- Jamie
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 23:36:13 +0200, Arnd Bergmann wrote:
> On Thursday 17 May 2007, Pekka Enberg wrote:
> >
> > So any sane way to enable compression is on per-inode basis which makes
> > me still wonder why you need per-object compression.
>
> 1. it doesn't require user interaction, the file system will do the right
> thing most of the time.
>
> 2. enlarging data is a very bad thing because it makes the behaviour
> of the fs unpredictable. With uncompressed objects, you have a guaranteed
> upper bound on the size.

Correct. The compression decision is always per-object. Per-inode is a hint from userspace that a compression attempt would be futile.

A compression algorithm that compresses any data is provably impossible. Some data will always cause expansion instead of compression. Some algorithms have a well-known upper bound on the expansion, others don't. So LogFS instead creates its own upper bound by reserving one byte in the header for the compression type.

And while one bit would suffice as a compressed/uncompressed flag, having a byte allows supporting more than one compression algorithm. LZO looks promising and is on its way into the kernel. Others may come in the future.

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra
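The per-object decision described in the message above can be sketched as follows. This is an illustration under stated assumptions, not LogFS code: the type codes, the function names and the toy run-length encoder (standing in for a real compressor such as zlib or LZO) are all hypothetical. The point it shows is the bound: a stored object is never larger than its input plus the one header byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes; the thread does not give the on-medium values. */
enum { COMPR_NONE = 0, COMPR_RLE = 1 };

/*
 * Toy run-length encoder emitting (count, byte) pairs. Returns the
 * encoded length, or 0 if the output would not fit into 'cap' bytes.
 */
static size_t rle_compress(const uint8_t *in, size_t len,
			   uint8_t *out, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < len) {
		uint8_t byte = in[i];
		size_t run = 1;

		while (i + run < len && in[i + run] == byte && run < 255)
			run++;
		if (o + 2 > cap)
			return 0;
		out[o++] = (uint8_t)run;
		out[o++] = byte;
		i += run;
	}
	return o;
}

/*
 * Store one object as a 1-byte type header followed by the payload.
 * 'out' must hold at least len + 1 bytes and must not overlap 'in'.
 * Returns the stored size, which never exceeds len + 1.
 */
static size_t store_object(const uint8_t *in, size_t len, uint8_t *out)
{
	size_t clen;

	assert(in != NULL && out != NULL);

	/* Accept compressed output only if it is strictly smaller. */
	clen = len ? rle_compress(in, len, out + 1, len - 1) : 0;
	if (clen) {
		out[0] = COMPR_RLE;
		return clen + 1;
	}

	/* Compression failed or did not pay off: store verbatim. */
	out[0] = COMPR_NONE;
	memcpy(out + 1, in, len);
	return len + 1;
}
```

Because the fallback path is always available, the worst case is known in advance regardless of how badly the compressor behaves on a given input.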
Re: [PATCH] LogFS take three
On Thursday 17 May 2007, Pekka Enberg wrote:
>
> Jörn Engel wrote:
> > Compressing random data will actually enlarge it. If that happens I
> > simply store the verbatim uncompressed data instead and mark it as such.
> >
> > There is also demand for a user-controlled bit in the inode to disable
> > compression completely. All those .jpg, .mpg, .mp3, etc. just waste
> > time by trying and failing to compress them.
>
> So any sane way to enable compression is on per-inode basis which makes
> me still wonder why you need per-object compression.

1. it doesn't require user interaction, the file system will do the right thing most of the time.

2. enlarging data is a very bad thing because it makes the behaviour of the fs unpredictable. With uncompressed objects, you have a guaranteed upper bound on the size.

	Arnd <><
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 23:00:20 +0200, Arnd Bergmann wrote:
>
> Just using nanoseconds probably doesn't gain you much after all
> then. You could however just have separate 32 bit fields in the
> inode for seconds and nanoseconds, that will result in the exact
> same layout that you have right now, but won't require a conversion
> function.

I could also have a 30bit and a 34bit field. 30bit is enough for nanoseconds. So many options.

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
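The 30bit/34bit split mentioned above works because nanoseconds never exceed 999999999, which is below 2^30 = 1073741824. A minimal sketch of such a packing follows; the function and macro names are assumptions made for illustration, and the byte-order conversion that an on-medium format would also need is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_BITS	30
#define NSEC_MASK	((UINT64_C(1) << NSEC_BITS) - 1)
#define NSEC_PER_SEC	UINT64_C(1000000000)

/* Seconds go into the high 34 bits, nanoseconds into the low 30 bits. */
static uint64_t pack_time(uint64_t sec, uint32_t nsec)
{
	assert(nsec < NSEC_PER_SEC);		/* always fits into 30 bits */
	assert(sec < (UINT64_C(1) << 34));	/* larger values would be cut off */
	return (sec << NSEC_BITS) | nsec;
}

/* Recover the seconds with a shift; no division is involved. */
static uint64_t unpack_sec(uint64_t packed)
{
	return packed >> NSEC_BITS;
}

/* Recover the nanoseconds with a mask. */
static uint32_t unpack_nsec(uint64_t packed)
{
	return (uint32_t)(packed & NSEC_MASK);
}
```

Compared with a 32+32 split, the two bits taken from the nanosecond field quadruple the range of representable seconds.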
Re: [PATCH] LogFS take three
On Thursday 17 May 2007, Jörn Engel wrote:
>
> > Why not just store 64 bit nanoseconds? that would avoid the problem
> > with ns overflow and the year-2038 bug. OTOH, that would require
> > a 64 bit integer division when reading the data, so it gets you
> > a runtime overhead.
>
> I like the idea. Do conversion functions exist both ways?
>
> What I don't get is the year-2038 bug. Isn't that the 31bit limit,
> while 32bit would last to 2106?

You're right, you don't hit the 2038 bug here, because you use an unsigned variable. The bug exists elsewhere because time_t tv_sec is signed.

Just using nanoseconds probably doesn't gain you much after all then. You could however just have separate 32 bit fields in the inode for seconds and nanoseconds, that will result in the exact same layout that you have right now, but won't require a conversion function.

	Arnd <><
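The claim that two separate 32 bit fields give the exact same layout as the combined 64 bit value can be checked with a small sketch. It assumes big-endian storage as in the quoted timespec_to_be64(); the put_be32/put_be64 helpers are portable userspace stand-ins for cpu_to_be32/cpu_to_be64, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value big-endian, byte by byte. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Store a 64-bit value big-endian, most significant byte first. */
static void put_be64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Combined form: seconds shifted into the high half of one 64-bit value. */
static void store_combined(uint8_t out[8], uint32_t sec, uint32_t nsec)
{
	assert(nsec <= 999999999);
	put_be64(out, ((uint64_t)sec << 32) | nsec);
}

/* Separate form: two independent 32-bit fields, no shifting or masking. */
static void store_separate(uint8_t out[8], uint32_t sec, uint32_t nsec)
{
	assert(nsec <= 999999999);
	put_be32(out, sec);
	put_be32(out + 4, nsec);
}

/* Returns 1 if both forms produce the same eight bytes. */
static int layouts_match(uint32_t sec, uint32_t nsec)
{
	uint8_t a[8], b[8];

	store_combined(a, sec, nsec);
	store_separate(b, sec, nsec);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

With the seconds field unsigned, the full 32 bits are usable, which is the 2106 rather than 2038 limit discussed in the message.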
Re: [PATCH] LogFS take three
Jörn Engel wrote:
> Compressing random data will actually enlarge it. If that happens I
> simply store the verbatim uncompressed data instead and mark it as such.
>
> There is also demand for a user-controlled bit in the inode to disable
> compression completely. All those .jpg, .mpg, .mp3, etc. just waste
> time by trying and failing to compress them.

So any sane way to enable compression is on per-inode basis which makes me still wonder why you need per-object compression.
Re: [PATCH] LogFS take three
Hi!

> > Yes. These things are almost always implemented _very_ badly by the same
> > kind of crack-smoking hobo they drag in off the streets to write BIOSen.
> >
> > It's bog-roll technology; if you fancy a laugh try doing some real
> > reliability tests on them some time. Powerfail testing is a good one.
> >
> > This kind of thing is OK for disposable storage such as in digital
> > cameras, where it doesn't matter that it's no more reliable than a
> > floppy disc, but for real long-term storage it's really a bad idea.
>
> There are so many flash-based storage and some disposable storages,
> as you pointed out, have poor quality. I think it's mainly because these
> are not designed for good quality, but for lowering the price.
>
> These kind of devices are not ready for things like power failure because
> their use case is far from that. For example, removing flash card
> while taking pictures using digital camera is not a common use case.
> (there should be a written notice that this kind of action is against
> the warranty)

Hmm.. so operating your camera on batteries should be against the warranty, since batteries commonly run empty while storing pictures?

	Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 17:08:51 +0200, Arnd Bergmann wrote:
> On Tuesday 15 May 2007, Jörn Engel wrote:
> > Add LogFS, a scalable flash filesystem.
>
> Sorry for not commenting earlier, there were so many discussions on version
> two that I wanted to wait for the fallout of that instead of duplicating
> all the comments.

You are the last person that has to be sorry. ;)

> Here are a few things I notice while going through the third version:
>
> > +/*
> > + * Private errno for accessed beyond end-of-file. Only used internally to
> > + * logfs. If this ever gets exposed to userspace or even other parts of the
> > + * kernel, it is a bug. 256 was chosen as a number sufficiently above all
> > + * used errno #defines.
> > + *
> > + * It can be argued that this is a hack and should be replaced with something
> > + * else. My last attempt to do this failed spectacularly and there are more
> > + * urgent problems that users actually care about. This will remain for the
> > + * moment. Patches are wellcome, of course.
> > + */
> > +#define EOF 256
>
> It should at least be in the kernel-only errno range between 512 and 4095,
> that way it can eventually be added to include/linux/errno.h.

Fair enough. 512 it is.

> > + * Target rename works in three atomic steps:
> > + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> > + * 2. Remove old dentry (still remember the new inode)
> > + * 3. Remove new inode
> > + *
> > + * Here we remember both an inode an a dentry. If we get interrupted
> > + * between steps 1 and 2, we delete both the dentry and the inode. If
> > + * we get interrupted between steps 2 and 3, we delete just the inode.
> > + * In either case, the remaining objects are deleted on next mount. From
> > + * a users point of view, the operation succeeded.
>
> This description had me confused for a while: why would you remove the
> new inode. Maybe change the text to say 'target inode' or 'victim inode'?

'Victim inode' sounds good.
Will do.

> > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> > +{
> > +	struct inode *inode;
> > +
> > +	if (dir->i_nlink >= LOGFS_LINK_MAX)
> > +		return -EMLINK;
>
> Why is i_nlink limited? Don't you run out of space for inodes before
> overflowing?

I don't know. With the current limit of 2^31, a sufficiently large device can reach the limit. And it is imaginable that overflowing the s32 number space can expose security holes. Not that I actually know, the check is pure paranoia.

> > + * In principle, this function should loop forever, looking for GC candidates
> > + * and moving data. LogFS is designed in such a way that this loop is
> > + * guaranteed to terminate.
> > + *
> > + * Limiting the loop to four iterations serves purely to catch cases when
> > + * these guarantees have failed. An actual endless loop is an obvious bug
> > + * and should be reported as such.
> > + *
> > + * But there is another nasty twist to this. As I have described in my LCA
> > + * presentation, Garbage collection would have to limit itself to higher
> > + * levels if the number of available free segments goes down. This code
> > + * doesn't and should fail spectacularly. Yet - hard as I tried I haven't
> > + * been able to make it fail (short of a bug elsewhere).
> > + *
> > + * So in a way this code is intentionally wrong as a desperate cry for a
> > + * better testcase. And I do expect to get blamed for it one day. :(
> > + */
>
> Could you bug the code to reserve fewer segments for GC than you really
> need, in order to stress test GC?

I could. Wear leveling will cause changes in the area, so I'll have a closer look when implementing that.
> > +static struct inode *logfs_alloc_inode(struct super_block *sb)
> > +{
> > +	struct logfs_inode *li;
> > +
> > +	li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> > +	if (!li)
> > +		return NULL;
> > +	logfs_init_inode(&li->vfs_inode);
> > +	return &li->vfs_inode;
> > +}
> > +
> > +
> > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> > +{
> > +	struct inode *inode;
> > +
> > +	inode = logfs_alloc_inode(sb);
> > +	if (!inode)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	logfs_init_inode(inode);
>
> logfs_alloc_inode() returns an initialized inode, so no need to call
> logfs_init_inode() again, right?

Right. Will change.

> > +static __be64 timespec_to_be64(struct timespec tsp)
> > +{
> > +	u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0xffffffff);
> > +
> > +	WARN_ON(tsp.tv_nsec > 999999999);
> > +	return cpu_to_be64(time);
> > +}
>
> Why not just store 64 bit nanoseconds? that would avoid the problem
> with ns overflow and the year-2038 bug. OTOH, that would require
> a 64 bit integer division when reading the data, so it gets you
> a runtime overhead.

I like the idea. Do conversion functions exist both ways?

What I don't get is the year-2038 bug. Isn't
Re: [PATCH] LogFS take three
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote:
> >My plan was to move this code to lib/ sooner or later. If you consider
> >it useful in its current state, I can do it immediately. And if someone
> >else merged a superior btree library I'd happily remove mine and use the
> >new one instead.
> >
> >Opinions?
>
> Why would we need another btree, when there is lib/rbtree.c?
> Or does yours do something fundamentally different?

It is not red-black tree, it is b+ tree.

> Jan

-- 
	Evgeniy Polyakov
Re: [PATCH] LogFS take three
On May 16 2007 02:06, Jörn Engel wrote:
>
>> > +/* memtree.c */
>> > +void btree_init(struct btree_head *head);
>> > +void *btree_lookup(struct btree_head *head, long val);
>> > +int btree_insert(struct btree_head *head, long val, void *ptr);
>> > +int btree_remove(struct btree_head *head, long val);
>>
>> These names are too generic. If we later add a btree library: blam.
>
>My plan was to move this code to lib/ sooner or later. If you consider
>it useful in its current state, I can do it immediately. And if someone
>else merged a superior btree library I'd happily remove mine and use the
>new one instead.
>
>Opinions?

Why would we need another btree, when there is lib/rbtree.c? Or does yours do something fundamentally different?

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 15:53, Jörn Engel wrote:
>
>My experience is that no matter which name I pick, people will complain
>anyway. Previous suggestions included:
[...]
>
>Plus today:
>FFFS
>flashfs
>fredfs
>bob
>shizzle
>
>Imo they all suck. LogFS also sucks, but it allows me to make a stupid
>joke and keep my logfs.org domain.

Try woodfs! (log - wood - get it?)

But finding names can be so tiresome, just give it a Borg-style designation - "filesystem 125" or so. fs2007q1, being this quartal's new filesystem.

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 22:06, CaT wrote:
>On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel wrote:
>> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> >
>> > But if akpm can't pronounce it, how about FFFS for faster flash
>> > filesystem ;-)
>>
>> How many of you have worked for IBM before? Vowels are not evil. ;)
>>
>> Grouping four or more consonants to name anything will cause similar
>> expressions on people's faces. Numbers don't help much either.
>>
>> Ext2 is a great name, because "ext" actually is a pronouncable syllable.
>> MinixFS, ChunkFS, TileFS are great too. XFS and JFS are ok, at least
>> they only have three consonants. But FFS exists, so I'd rather go for a
>> syllable.
>
>FlashFS?

Or just try once dropping all those redundant 'fs' suffixes. bdev, proc, cpuset, devpts, mqueue, fuse(blk|ctl), vfat, iso9660, etc. Then there's much more space for innovative names.

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 14:55, Jörn Engel wrote:
>On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
>> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
>> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> > >
>> > > But if akpm can't pronounce it, how about FFFS for faster flash
>> > > filesystem ;-)
>> >
>> > How many of you have worked for IBM before? Vowels are not evil. ;)
>>
>> Do you think 'eieio' is a good set? IBM's work too...

C'mon, UIO does not cut IIO either ;-)

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 13:09, Jörn Engel wrote:
>On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
>>
>> Personally I'd just go for 'JFFS3'. After all, it has a better claim to
>> the name than either of its predecessors :)
>
>Did you ever see akpm's facial expression when he tried to pronounce
>"JFFS2"? ;)

Is there something special with [dʒeɪ ɛf ɛf ɛs tuː]?

	Jan
-- 
Re: Review status (Re: [PATCH] LogFS take three)
On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
>
> Is logfs 32bit fs or 64bit, since although you use 64bit values for
> offsets, area management and strange conversions like described below
> from offset into segment number are performed in 32bit?
> Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> could you please explain logic behind area management?

Ignoring bugs and signed return values for error handling, it is either 64bit or 32+32bit.

Inode numbers and file positions are 64bit. Offsets are 64bit as well. In a couple of places, offsets are also 32+32bit. Basically the high bits contain the segment number, the lower bits the offset within a segment.

Side note: It would be nicer if the high 32bit were segment number. Instead the number of bits depends on segment size. Guess I should change that while the format isn't fixed yet.

An "area" is a segment that is currently being written. Data is appended to this segment as it comes in, until the segment is full. Any functions dealing with areas only need a 32bit offset, which is the offset within the area, not the absolute device offset.

Writes within an area are also buffered. New data first goes into the write buffer (wbuf) and only when this is full is it flushed to the device. NAND flash and some NOR flashes require such buffering.

When writing to the device, the 32bit segno and the 32bit in-segment offset need to get converted back to a 64bit device offset.

> I've found that you store segment numbers as 32bit values (for example
> in prepare_write()), and convert requested 64bit offset into segment
> number via superblock's s_segshift.

Yes, as described above.

> This conversion seems confusing to me in case of real 64bit offsets.
> For example this one obtained via prepare_write:
>
> 7 1 logfs_prepare_write 78 fs/logfs/file.c
> 8 2 logfs_readpage_nolock 20 fs/logfs/file.c
> 9 1 logfs_read_block 351 fs/logfs/readwrite.c
> 10 1 logfs_read_loop 139 fs/logfs/readwrite.c
> 11 2 logfs_segment_read 108 fs/logfs/readwrite.c
> 12 1 wbuf_read 289
>
> u32 segno = ofs >> super->s_segshift;
>
> ofs is originally obtained from inode's li_data array, which is filled
> with raw segment numbers which can be 64bit (here is another issue,
> since logfs_segment_write() returns signed, so essentially logfs is
> 63bit filesystem).

The filesystem format is 64bit. The current code can only deal with 63bit. Eric Sandeen just fixed ext2 to actually deal with 32bit numbers and the same is possible for logfs. If anyone ever cares...

> But here I've come to area management in logfs, and found that it is
> 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes()
> returns signed 32 bit value (so it is reduced to 31 bit), which is then
> placed into li_data as 64bit value. The latter
> (__logfs_get_free_bytes()) truncates 64bit data value obtained via
> dev_ofs() into signed 32 bit value.

That indeed is a bug. __logfs_get_free_bytes() should return s64 instead of s32. Will fix immediately.

If anyone can find similar bugs, the bounty is a beer or non-alcoholic beverage of choice. :)

Jörn

-- 
To announce that there must be no criticism of the President, or that we
are to stand by the President, right or wrong, is not only unpatriotic
and servile, but is morally treasonable to the American public.
-- Theodore Roosevelt, Kansas City Star, 1918
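The 32+32bit scheme from the reply above (segment number in the high bits, in-segment offset in the low bits, split at a shift that depends on the segment size) can be sketched in userspace as follows. This is a sketch under assumptions, not the LogFS implementation: struct sb_info and all signatures are hypothetical, and only the names segshift and dev_ofs echo the thread. The last function reproduces the kind of truncation discussed, where the shift is carried out in 32 bits before the value is widened.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mini-superblock; only the shift is modelled. */
struct sb_info {
	unsigned segshift;	/* log2 of the segment size in bytes */
};

/* Segment number: the high bits of the device offset. */
static uint32_t ofs_to_segno(const struct sb_info *sb, uint64_t ofs)
{
	return (uint32_t)(ofs >> sb->segshift);
}

/* Offset within the segment: the low 'segshift' bits. */
static uint32_t ofs_in_segment(const struct sb_info *sb, uint64_t ofs)
{
	return (uint32_t)(ofs & ((UINT64_C(1) << sb->segshift) - 1));
}

/*
 * Back to a 64-bit device offset. The widening cast has to happen
 * before the shift, otherwise the high bits are lost.
 */
static uint64_t dev_ofs(const struct sb_info *sb, uint32_t segno, uint32_t ofs)
{
	assert(sb->segshift < 32);
	assert(ofs < (UINT32_C(1) << sb->segshift));
	return ((uint64_t)segno << sb->segshift) | ofs;
}

/* Buggy variant for comparison: the shift is done in 32 bits and wraps. */
static uint64_t dev_ofs_truncating(const struct sb_info *sb,
				   uint32_t segno, uint32_t ofs)
{
	return (uint32_t)(segno << sb->segshift) | ofs;
}
```

With 128 KiB segments (segshift 17), segment 65536 starts at byte 2^33, so the truncating variant silently maps it back onto the start of the device.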
Re: Review status (Re: [PATCH] LogFS take three)
Hi Jörn.

Is logfs 32bit fs or 64bit, since although you use 64bit values for offsets, area management and strange conversions like described below from offset into segment number are performed in 32bit? Is it enough for SSD for example to be 32bit only? Or if it is 64bit, could you please explain logic behind area management?

I've found that you store segment numbers as 32bit values (for example in prepare_write()), and convert requested 64bit offset into segment number via superblock's s_segshift. This conversion seems confusing to me in case of real 64bit offsets. For example this one obtained via prepare_write:

7 1 logfs_prepare_write 78 fs/logfs/file.c
8 2 logfs_readpage_nolock 20 fs/logfs/file.c
9 1 logfs_read_block 351 fs/logfs/readwrite.c
10 1 logfs_read_loop 139 fs/logfs/readwrite.c
11 2 logfs_segment_read 108 fs/logfs/readwrite.c
12 1 wbuf_read 289

u32 segno = ofs >> super->s_segshift;

ofs is originally obtained from inode's li_data array, which is filled with raw segment numbers which can be 64bit (here is another issue, since logfs_segment_write() returns signed, so essentially logfs is 63bit filesystem).

But here I've come to area management in logfs, and found that it is 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() returns signed 32 bit value (so it is reduced to 31 bit), which is then placed into li_data as 64bit value. The latter (__logfs_get_free_bytes()) truncates 64bit data value obtained via dev_ofs() into signed 32 bit value.

-- 
	Evgeniy Polyakov
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007, Jörn Engel wrote:
> Add LogFS, a scalable flash filesystem.

Hi Jörn,

Sorry for not commenting earlier, there were so many discussions on version two that I wanted to wait for the fallout of that instead of duplicating all the comments. Here are a few things I notice while going through the third version:

> +/*
> + * Private errno for accessed beyond end-of-file. Only used internally to
> + * logfs. If this ever gets exposed to userspace or even other parts of the
> + * kernel, it is a bug. 256 was chosen as a number sufficiently above all
> + * used errno #defines.
> + *
> + * It can be argued that this is a hack and should be replaced with something
> + * else. My last attempt to do this failed spectacularly and there are more
> + * urgent problems that users actually care about. This will remain for the
> + * moment. Patches are wellcome, of course.
> + */
> +#define EOF 256

It should at least be in the kernel-only errno range between 512 and 4095, that way it can eventually be added to include/linux/errno.h.

> + * Target rename works in three atomic steps:
> + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> + * 2. Remove old dentry (still remember the new inode)
> + * 3. Remove new inode
> + *
> + * Here we remember both an inode an a dentry. If we get interrupted
> + * between steps 1 and 2, we delete both the dentry and the inode. If
> + * we get interrupted between steps 2 and 3, we delete just the inode.
> + * In either case, the remaining objects are deleted on next mount. From
> + * a users point of view, the operation succeeded.

This description had me confused for a while: why would you remove the new inode? Maybe change the text to say 'target inode' or 'victim inode'?

> +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> + struct inode *inode;
> +
> + if (dir->i_nlink >= LOGFS_LINK_MAX)
> + return -EMLINK;

Why is i_nlink limited? Don't you run out of space for inodes before overflowing?

> + * In principle, this function should loop forever, looking for GC candidates
> + * and moving data. LogFS is designed in such a way that this loop is
> + * guaranteed to terminate.
> + *
> + * Limiting the loop to four iterations serves purely to catch cases when
> + * these guarantees have failed. An actual endless loop is an obvious bug
> + * and should be reported as such.
> + *
> + * But there is another nasty twist to this. As I have described in my LCA
> + * presentation, Garbage collection would have to limit itself to higher
> + * levels if the number of available free segments goes down. This code
> + * doesn't and should fail spectacularly. Yet - hard as I tried I haven't
> + * been able to make it fail (short of a bug elsewhere).
> + *
> + * So in a way this code is intentionally wrong as a desperate cry for a
> + * better testcase. And I do expect to get blamed for it one day. :(
> + */

Could you bug the code to reserve fewer segments for GC than you really need, in order to stress test GC?

> +static struct inode *logfs_alloc_inode(struct super_block *sb)
> +{
> + struct logfs_inode *li;
> +
> + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> + if (!li)
> + return NULL;
> + logfs_init_inode(&li->vfs_inode);
> + return &li->vfs_inode;
> +}
> +
> +
> +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> +{
> + struct inode *inode;
> +
> + inode = logfs_alloc_inode(sb);
> + if (!inode)
> + return ERR_PTR(-ENOMEM);
> +
> + logfs_init_inode(inode);

logfs_alloc_inode() returns an initialized inode, so no need to call logfs_init_inode() again, right?

> +static __be64 timespec_to_be64(struct timespec tsp)
> +{
> + u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x);
> +
> + WARN_ON(tsp.tv_nsec > 9);
> + return cpu_to_be64(time);
> +}

Why not just store 64 bit nanoseconds? That would avoid the problem with ns overflow and the year-2038 bug. OTOH, that would require a 64 bit integer division when reading the data, so it gets you a runtime overhead.

> +static void logfs_read_inode(struct inode *inode)
> +{
> + int ret;
> +
> + BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> + ret = __logfs_read_inode(inode);
> +
> + /* What else can we do here? */
> + BUG_ON(ret);
> +}

ext2 returns make_bad_inode(inode) in this case, which seems to be a better solution than crashing.

> +int __logfs_write_inode(struct inode *inode)
> +{
> + /*
> + * FIXME: Those two inodes are 512 bytes in total. Not good to
> + * have on the stack. Possibly the best solution would be to bite
> + * the bullet and do another format change before release and
> + * shrink the inodes.
> + */
> + struct logfs_disk_inode old, new;
> +
> + BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> + /* read and compare the inode first. If it hasn't changed, don't
> + * bother writing it. */
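The trade-off raised for the timestamp format can be made concrete with a small user-space sketch. It is an illustration under assumptions, not the LogFS on-media format: all helper names are hypothetical, the seconds field is taken as an unsigned 32-bit value for simplicity, and the big-endian conversion done by `cpu_to_be64()` in the patch is left out.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Split layout: seconds in the upper 32 bits, nanoseconds in the lower 32. */
static uint64_t pack_split(uint32_t sec, uint32_t nsec)
{
	return ((uint64_t)sec << 32) + nsec;
}

/* Reading the split layout back needs only a shift and a truncation. */
static uint32_t split_sec(uint64_t t)  { return (uint32_t)(t >> 32); }
static uint32_t split_nsec(uint64_t t) { return (uint32_t)t; }

/* Flat layout: a single 64-bit count of nanoseconds. */
static uint64_t pack_flat(uint64_t sec, uint32_t nsec)
{
	return sec * NSEC_PER_SEC + nsec;
}

/* Reading the flat layout back costs a 64-bit division and modulo. */
static uint64_t flat_sec(uint64_t t)  { return t / NSEC_PER_SEC; }
static uint32_t flat_nsec(uint64_t t) { return (uint32_t)(t % NSEC_PER_SEC); }
```

The flat layout has no separate nanosecond field that could hold an out-of-range value, and it covers second counts well beyond 32 bits. The price is the division in `flat_sec()` and `flat_nsec()`, which is the runtime overhead mentioned in the review, most noticeable on 32-bit CPUs.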
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 16:43:59 +0800, David Woodhouse wrote:
>
> > As I mentioned, some techniques like log-structured filesystem could
> > perform generally better on any kind of flash-based storage with FTL.
> > Although there are many kinds of FTL, it is commonly true that
> > it performs well under workload where sequential write is dominant.
>
> Yes, it's certainly possible that we _could_ write a file system which
> is specifically targeted at FTL -- I was just wondering why anyone would
> _bother_ :)

Haven't you done that already? JFFS2 write behaviour is the best-case scenario for any FTL. When the delta cache is finished, LogFS will be pretty close to that as well.

Not sure if anyone would specifically target FTL. Being well-suited for those beasts is just a side-effect.

The FTL is still a net loss. Without that FAT enabling layer a real flash filesystem would be more efficient.

Jörn

--
Prosperity makes friends, adversity tries them.
-- Publilius Syrus
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007, Jörn Engel wrote:
>
> > I've been semi watching this, and the only comment I really can give
> > is that I hate the name. To me, logfs implies a filesystem for
> > logging purposes, not for Flash hardware with wear leveling issues to
> > be taken into account.
>
> Yeah, well, ...
>
> Two years ago when I started all this, I was looking for a good name.
> All I could come up with sounded stupid, so I picked "LogFS" as a code
> name. As soon as I find a better name, the code name should get
> replaced.
>

When doing a google search on logfs, there are less than five results among the first 100 that don't refer to your work. The other two listed in there are also log-structured file systems: the inferno flash file system (http://inferno-os.googlecode.com/svn/trunk/liblogfs/) and the (discontinued) file system named lfs from the 2005 google summer of code.

I'd say the name should stay, changing it now can only add more confusion.

Arnd <><
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 09:12 +, Pavel Machek wrote:
> Nah, it would lead to Jorn disappearing mysteriously and _Pavel_
> accused of murder ;-).

Are you suggesting that you would murder Jörn (you misspelled his name) merely for the heinous crime of using his own name?

Your Luddism was already quite excessive, but now you really _are_ taking it to extremes. :)

--
dwmw2
Re: [PATCH] LogFS take three
> >> My experience is that no matter which name I pick, people will
> >> complain anyway. Previous suggestions included:
> >> jffs3
> >> jefs
> >> engelfs
> >> poofs
> >> crapfs
> >> sweetfs
> >> cutefs
> >> dynamic journaling fs - djofs
> >> tfsfkal - the file system formerly known as logfs
> >
> >Can we call it jörnfs? :)
>
> However if Jörn is accused of murder, it will have little chance of
> being merged :-).

Nah, it would lead to Jorn disappearing mysteriously and _Pavel_ accused of murder ;-).

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 17:20 +0900, Dongjun Shin wrote:
> There are, of course, cases where direct access are better.
> However, as the demand for capacity, reliability and high performance
> for the flash storage increases, the use of FTL with embedded controller
> would be inevitable.
>
> - The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to
> the use of multiple flash in parallel and the use of high degree ECC
> algorithm.
> There are other things like bad block handling and wear-leveling issues
> which has been recently touched by UBI with added SW complexity.

You don't get rid of that complexity by offloading it to a µcontroller. The only thing you achieve that way is making sure it's quite likely to be done badly, and making it impossible to debug.

> - In contrast to the embedded environment where CPU and flash is directly
> connected, the I/O path between CPU and flash in PC environment is longer.
> The latency for SW handshaking between CPU and flash will also be longer,
> which would make the performance optimization harder.

Do it the naïve way with a single byte push/pull and waggling the control lines separately, and what you say is true -- but you can have flash controllers which assist with data transfer but still give you essentially 'raw' access to the chip.

With the CAFÉ controller designed for the OLPC machine, we can spew data across the PCI bus just as fast as we can suck it off the flash chip.

> As I mentioned, some techniques like log-structured filesystem could
> perform generally better on any kind of flash-based storage with FTL.
> Although there are many kinds of FTL, it is commonly true that
> it performs well under workload where sequential write is dominant.

Yes, it's certainly possible that we _could_ write a file system which is specifically targeted at FTL -- I was just wondering why anyone would _bother_ :)

I've seen an interesting file system which does have a kind of FTL internally as its lowest layer, and which builds on that using 'virtual' sectors for file extents. Now that _does_ have its advantages -- but it doesn't go as far as pretending to be a 'normal' block device; it's its own special thing for internal use within that file system.

> I also expect that FTL for PC environment will have better quality spec
> than the disposable storage.

There really is no reason why FTL has to be done badly; just as there's no _reason_ why hardware vendors have to give us crappy bsVendorCode. Nevertheless, that's the way the world tends to be. So good luck shipping with that :)

--
dwmw2
Re: [PATCH] LogFS take three
On 5/17/07, David Woodhouse <[EMAIL PROTECTED]> wrote:
> Yes. These things are almost always implemented _very_ badly by the same
> kind of crack-smoking hobo they drag in off the streets to write BIOSen.
>
> It's bog-roll technology; if you fancy a laugh try doing some real
> reliability tests on them some time. Powerfail testing is a good one.
>
> This kind of thing is OK for disposable storage such as in digital
> cameras, where it doesn't matter that it's no more reliable than a
> floppy disc, but for real long-term storage it's really a bad idea.

There are so many flash-based storage devices, and some disposable ones, as you pointed out, have poor quality. I think it's mainly because these are not designed for good quality, but for lowering the price.

These kinds of devices are not ready for things like power failure because their use case is far from that. For example, removing the flash card while taking pictures with a digital camera is not a common use case. (There should be a written notice that this kind of action is against the warranty.)

> There's little point in optimising a file system _specifically_ for
> devices which often aren't reliable enough to keep your data anyway.
> You might as well use ramfs.
>
> It's unfortunate really -- there's no _fundamental_ reason why FTL has
> to be done so badly; it's just that it almost always is. Direct access
> to the flash from Linux is _always_ going to be better in practice --
> and that way you avoid the problems with dual journalling, along with
> the problems with the underlying FTL continuing to keep (and copy around
> during GC) sectors which the top-level filesystem has actually
> deallocated, etc.

There are, of course, cases where direct access is better. However, as the demand for capacity, reliability and high performance for flash storage increases, the use of FTL with an embedded controller would be inevitable.

- The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to the use of multiple flash chips in parallel and the use of high-degree ECC algorithms. There are other things like bad block handling and wear-leveling issues, which have recently been touched by UBI with added SW complexity.

- In contrast to the embedded environment, where CPU and flash are directly connected, the I/O path between CPU and flash in the PC environment is longer. The latency for SW handshaking between CPU and flash will also be longer, which would make performance optimization harder.

As I mentioned, some techniques like a log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under workloads where sequential write is dominant.

I also expect that FTL for the PC environment will have a better quality spec than the disposable storage.

Dongjun
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 15:12 +0900, Dongjun Shin wrote:
> The current trend of flash-based device is to hide the flash-specific details
> from the host OS. The flash memory is encapsulated in a package
> which contains a dedicated controller where a small piece of software (F/W or
> FTL) runs and makes the storage shown as a block device to the host.

Yes. These things are almost always implemented _very_ badly by the same kind of crack-smoking hobo they drag in off the streets to write BIOSen.

It's bog-roll technology; if you fancy a laugh try doing some real reliability tests on them some time. Powerfail testing is a good one.

This kind of thing is OK for disposable storage such as in digital cameras, where it doesn't matter that it's no more reliable than a floppy disc, but for real long-term storage it's really a bad idea.

> IMHO, for a flash-optimized filesystem to be useful and widely-used, it would
> be better to run on a block device and to be designed to run efficiently on
> top of the FTL.
> (ex. log-structured filesystem on general block device)

There's little point in optimising a file system _specifically_ for devices which often aren't reliable enough to keep your data anyway. You might as well use ramfs.

It's unfortunate really -- there's no _fundamental_ reason why FTL has to be done so badly; it's just that it almost always is. Direct access to the flash from Linux is _always_ going to be better in practice -- and that way you avoid the problems with dual journalling, along with the problems with the underlying FTL continuing to keep (and copy around during GC) sectors which the top-level filesystem has actually deallocated, etc.

--
dwmw2
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 15:12 +0900, Dongjun Shin wrote: The current trend of flash-based device is to hide the flash-specific details from the host OS. The flash memory is encapsulated in a package which contains a dedicated controller where a small piece of software (F/W or FTL) runs and makes the storage shown as a block device to the host. Yes. These things are almost always implemented _very_ badly by the same kind of crack-smoking hobo they drag in off the streets to write BIOSen. It's bog-roll technology; if you fancy a laugh try doing some real reliability tests on them time some. Powerfail testing is a good one. This kind of thing is OK for disposable storage such as in digital cameras, where it doesn't matter that it's no more reliable than a floppy disc, but for real long-term storage it's really a bad idea. IMHO, for a flash-optimized filesystem to be useful and widely-used, it would be better to run on a block device and to be designed to run efficiently on top of the FTL. (ex. log-structured filesystem on general block device) There's little point in optimising a file system _specifically_ for devices which in often aren't reliable enough to keep your data anyway. You might as well use ramfs. It's unfortunate really -- there's no _fundamental_ reason why FTL has to be done so badly; it's just that it almost always is. Direct access to the flash from Linux is _always_ going to be better in practice -- and that way you avoid the problems with dual journalling, along with the problems with the underlying FTL continuing to keep (and copy around during GC) sectors which the top-level filesystem has actually deallocated, etc. -- dwmw2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On 5/17/07, David Woodhouse [EMAIL PROTECTED] wrote: Yes. These things are almost always implemented _very_ badly by the same kind of crack-smoking hobo they drag in off the streets to write BIOSen. It's bog-roll technology; if you fancy a laugh try doing some real reliability tests on them time some. Powerfail testing is a good one. This kind of thing is OK for disposable storage such as in digital cameras, where it doesn't matter that it's no more reliable than a floppy disc, but for real long-term storage it's really a bad idea. There are so many flash-based storage and some disposable storages, as you pointed out, have poor quality. I think it's mainly because these are not designed for good quality, but for lowering the price. These kind of devices are not ready for things like power failure because their use case is far from that. For example, removing flash card while taking pictures using digital camera is not a common use case. (there should be a written notice that this kind of action is against the warranty) There's little point in optimising a file system _specifically_ for devices which in often aren't reliable enough to keep your data anyway. You might as well use ramfs. It's unfortunate really -- there's no _fundamental_ reason why FTL has to be done so badly; it's just that it almost always is. Direct access to the flash from Linux is _always_ going to be better in practice -- and that way you avoid the problems with dual journalling, along with the problems with the underlying FTL continuing to keep (and copy around during GC) sectors which the top-level filesystem has actually deallocated, etc. There are, of course, cases where direct access are better. However, as the demand for capacity, reliability and high performance for the flash storage increases, the use of FTL with embedded controller would be inevitable. 
- The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to the use of multiple flash in parallel and the use of high degree ECC algorithm. There are other things like bad block handling and wear-leveling issues which has been recently touched by UBI with added SW complexity. - In contrast to the embedded environment where CPU and flash is directly connected, the I/O path between CPU and flash in PC environment is longer. The latency for SW handshaking between CPU and flash will also be longer, which would make the performance optimization harder. As I mentioned, some techniques like log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under workload where sequential write is dominant. I also expect that FTL for PC environment will have better quality spec than the disposable storage. Dongjun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 17:20 +0900, Dongjun Shin wrote: There are, of course, cases where direct access are better. However, as the demand for capacity, reliability and high performance for the flash storage increases, the use of FTL with embedded controller would be inevitable. - The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to the use of multiple flash in parallel and the use of high degree ECC algorithm. There are other things like bad block handling and wear-leveling issues which has been recently touched by UBI with added SW complexity. You don't get rid of that complexity by offloading it to a µcontroller. The only thing you achieve that way is making sure it's quite likely to be done badly, and making it impossible to debug. - In contrast to the embedded environment where CPU and flash is directly connected, the I/O path between CPU and flash in PC environment is longer. The latency for SW handshaking between CPU and flash will also be longer, which would make the performance optimization harder. Do it the naïve way with a single byte push/pull and waggling the control lines separately, and what you say is true -- but you can have flash controllers which assist with data transfer but still give you essentially 'raw' access to the chip. With the CAFÉ controller designed for the OLPC machine, we can spew data across the PCI bus just as fast as we can suck it off the flash chip. As I mentioned, some techniques like log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under workload where sequential write is dominant. 
Yes, it's certainly possible that we _could_ write a file system which is specifically targeted at FTL -- I was just wondering why anyone would _bother_ :) I've seen an interesting file system which does have a kind of FTL internally as its lowest layer, and which build on that using 'virtual' sectors for file extents. Now that _does_ have its advantages -- but it doesn't go as far as pretending to be a 'normal' block device; it's its own special thing for internal use within that file system. I also expect that FTL for PC environment will have better quality spec than the disposable storage. There really is no reason why FTL has to be done badly; just as there's no _reason_ why hardware vendors have to give us crappy bsVendorCode. Nevertheless, that's the way the world tends to be. So good luck shipping with that :) -- dwmw2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
My experience is that no matter which name I pick, people will complain anyway. Previous suggestions included: jffs3 jefs engelfs poofs crapfs sweetfs cutefs dynamic journaling fs - djofs tfsfkal - the file system formerly known as logfs Can we call it jörnfs? :) However if Jörn is accused of murder, it will have little chance of being merged :-). Nah, it would lead to Jorn disappearing misteriously and _Pavel_ accused of murder ;-). -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 09:12 +, Pavel Machek wrote: Nah, it would lead to Jorn disappearing misteriously and _Pavel_ accused of murder ;-). Are you suggesting that you would murder Jörn (you misspelled his name) merely for the heinous crime of using his own name? Your Luddism was already quite excessive, but now you really _are_ taking it to extremes. :) -- dwmw2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007, Jörn Engel wrote: I've been semi watching this, and the only comment I really can give is that I hate the name. To me, logfs implies a filesystem for logging purposes, not for Flash hardware with wear leveling issues to be taken into account. Yeah, well, ... Two years ago when I started all this, I was looking for a good name. All I could come up with sounded stupid, so I picked LogFS as a code name. As soon as I find a better name, the code name should get replaced. When doing a google search on logfs, there are less than five results among the first 100 that don't refer to your work. The other two listed in there are also log-structured file systems: The inferno flash file system (http://inferno-os.googlecode.com/svn/trunk/liblogfs/) and the (discontinued) file system named lfs from the 2005 google summer of code. I'd say the name should stay, changing it now can only add more confusion. Arnd - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 16:43:59 +0800, David Woodhouse wrote: As I mentioned, some techniques like log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under workload where sequential write is dominant. Yes, it's certainly possible that we _could_ write a file system which is specifically targeted at FTL -- I was just wondering why anyone would _bother_ :) Haven't you done that already? JFFS2 write behaviour is the best-case scenario for any FTL. When the delta cache is finished, LogFS will be pretty close to that as well. Not sure if anyone would specifically target FTL. Being well-suited for those beasts is just a side-effect. The FTL is still a net loss. Without that FAT enabling layer a real flash filesystem would be more efficient. Jörn -- Prosperity makes friends, adversity tries them. -- Publilius Syrus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007, Jörn Engel wrote: Add LogFS, a scalable flash filesystem. Hi Jörn, Sorry for not commenting earlier, there were so many discussions on version two that I wanted to wait for the fallout of that instead of duplicating all the comments. Here are a few things I notice while going through the third version: +/* + * Private errno for accessed beyond end-of-file. Only used internally to + * logfs. If this ever gets exposed to userspace or even other parts of the + * kernel, it is a bug. 256 was chosen as a number sufficiently above all + * used errno #defines. + * + * It can be argued that this is a hack and should be replaced with something + * else. My last attempt to do this failed spectacularly and there are more + * urgent problems that users actually care about. This will remain for the + * moment. Patches are wellcome, of course. + */ +#define EOF 256 It should at least be in the kernel-only errno range between 512 and 4095, that way it can eventually be added to include/linux/errno.h. + * Target rename works in three atomic steps: + * 1. Attach old inode to new dentry (remember old dentry and new inode) + * 2. Remove old dentry (still remember the new inode) + * 3. Remove new inode + * + * Here we remember both an inode an a dentry. If we get interrupted + * between steps 1 and 2, we delete both the dentry and the inode. If + * we get interrupted between steps 2 and 3, we delete just the inode. + * In either case, the remaining objects are deleted on next mount. From + * a users point of view, the operation succeeded. This description had me confused for a while: why would you remove the new inode. Maybe change the text to say 'target inode' or 'victim inode'? +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +{ + struct inode *inode; + + if (dir-i_nlink = LOGFS_LINK_MAX) + return -EMLINK; Why is i_nlink limited? Don't you run out of space for inodes before overflowing? 
+ * In principle, this function should loop forever, looking for GC candidates + * and moving data. LogFS is designed in such a way that this loop is + * guaranteed to terminate. + * + * Limiting the loop to four iterations serves purely to catch cases when + * these guarantees have failed. An actual endless loop is an obvious bug + * and should be reported as such. + * + * But there is another nasty twist to this. As I have described in my LCA + * presentation, Garbage collection would have to limit itself to higher + * levels if the number of available free segments goes down. This code + * doesn't and should fail spectacularly. Yet - hard as I tried I haven't + * been able to make it fail (short of a bug elsewhere). + * + * So in a way this code is intentionally wrong as a desperate cry for a + * better testcase. And I do expect to get blamed for it one day. :( + */ Could you bug the code to reserve fewer segments for GC than you really need, in order to stress test GC? +static struct inode *logfs_alloc_inode(struct super_block *sb) +{ + struct logfs_inode *li; + + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL); + if (!li) + return NULL; + logfs_init_inode(li-vfs_inode); + return li-vfs_inode; +} + + +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino) +{ + struct inode *inode; + + inode = logfs_alloc_inode(sb); + if (!inode) + return ERR_PTR(-ENOMEM); + + logfs_init_inode(inode); logfs_alloc_inode() returns an initialized inode, so no need to call logfs_init_inode() again, right? +static __be64 timespec_to_be64(struct timespec tsp) +{ + u64 time = ((u64)tsp.tv_sec 32) + (tsp.tv_nsec 0x); + + WARN_ON(tsp.tv_nsec 9); + return cpu_to_be64(time); +} Why not just store 64 bit nanoseconds? that would avoid the problem with ns overflow and the year-2038 bug. OTOH, that would require a 64 bit integer division when reading the data, so it gets you a runtime overhead. 
> +static void logfs_read_inode(struct inode *inode)
> +{
> +	int ret;
> +
> +	BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> +	ret = __logfs_read_inode(inode);
> +
> +	/* What else can we do here? */
> +	BUG_ON(ret);
> +}

ext2 calls make_bad_inode(inode) in this case, which seems to be
a better solution than crashing.

> +int __logfs_write_inode(struct inode *inode)
> +{
> +	/*
> +	 * FIXME: Those two inodes are 512 bytes in total.  Not good to
> +	 * have on the stack.  Possibly the best solution would be to bite
> +	 * the bullet and do another format change before release and
> +	 * shrink the inodes.
> +	 */
> +	struct logfs_disk_inode old, new;
> +
> +	BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> +	/* read and compare the inode first.  If it hasn't changed, don't
> +	 * bother writing it. */
> +	logfs_inode_to_disk(inode, &new);
> +	if
Re: Review status (Re: [PATCH] LogFS take three)
Hi Jörn.

Is logfs 32bit fs or 64bit, since although you use 64bit values for
offsets, area management and strange conversions like described below
from offset into segment number are performed in 32bit?

Is it enough for SSD for example to be 32bit only?
Or if it is 64bit, could you please explain logic behind area
management?

I've found that you store segment numbers as 32bit values (for example
in prepare_write()), and convert requested 64bit offset into segment
number via superblock's s_segshift.
This conversion seems confusing to me in case of real 64bit offsets.
For example this one obtained via prepare_write:

   7  1  logfs_prepare_write      78  fs/logfs/file.c
   8  2  logfs_readpage_nolock    20  fs/logfs/file.c
   9  1  logfs_read_block        351  fs/logfs/readwrite.c
  10  1  logfs_read_loop         139  fs/logfs/readwrite.c
  11  2  logfs_segment_read      108  fs/logfs/readwrite.c
  12  1  wbuf_read               289

	u32 segno = ofs >> super->s_segshift;

ofs is originally obtained from inode's li_data array, which is filled
with raw segment numbers which can be 64bit (here is another issue,
since logfs_segment_write() returns signed, so essentially logfs is
63bit filesystem).

But here I've come to area management in logfs, and found that it is
32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes()
returns signed 32 bit value (so it is reduced to 31 bit), which is then
placed into li_data as 64bit value. The latter
(__logfs_get_free_bytes()) truncates 64bit data value obtained via
dev_ofs() into signed 32 bit value.

-- 
	Evgeniy Polyakov
Re: Review status (Re: [PATCH] LogFS take three)
On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
>
> Is logfs 32bit fs or 64bit, since although you use 64bit values for
> offsets, area management and strange conversions like described below
> from offset into segment number are performed in 32bit?
>
> Is it enough for SSD for example to be 32bit only?
> Or if it is 64bit, could you please explain logic behind area
> management?

Ignoring bugs and signed return values for error handling, it is either
64bit or 32+32bit.

Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
In a couple of places, offsets are also 32+32bit.  Basically the high
bits contain the segment number, the lower bits the offset within a
segment.

Side note: It would be nicer if the high 32bit were segment number.
Instead the number of bits depends on segment size.  Guess I should
change that while the format isn't fixed yet.

An area is a segment that is currently being written.  Data is appended
to this segment as it comes in, until the segment is full.  Any
functions dealing with areas only need a 32bit offset, which is the
offset within the area, not the absolute device offset.

Writes within an area are also buffered.  New data first goes into the
write buffer (wbuf) and only when this is full is it flushed to the
device.  NAND flash and some NOR flashes require such buffering.

When writing to the device, the 32bit segno and the 32bit in-segment
offset need to get converted back to a 64bit device offset.

> I've found that you store segment numbers as 32bit values (for example
> in prepare_write()), and convert requested 64bit offset into segment
> number via superblock's s_segshift.

Yes, as described above.

> This conversion seems confusing to me in case of real 64bit offsets.
> For example this one obtained via prepare_write:
>
>    7  1  logfs_prepare_write      78  fs/logfs/file.c
>    8  2  logfs_readpage_nolock    20  fs/logfs/file.c
>    9  1  logfs_read_block        351  fs/logfs/readwrite.c
>   10  1  logfs_read_loop         139  fs/logfs/readwrite.c
>   11  2  logfs_segment_read      108  fs/logfs/readwrite.c
>   12  1  wbuf_read               289
>
> 	u32 segno = ofs >> super->s_segshift;
>
> ofs is originally obtained from inode's li_data array, which is filled
> with raw segment numbers which can be 64bit (here is another issue,
> since logfs_segment_write() returns signed, so essentially logfs is
> 63bit filesystem).

The filesystem format is 64bit.  The current code can only deal with
63bit.  Eric Sandeen just fixed ext2 to actually deal with 32bit numbers
and the same is possible for logfs.  If anyone ever cares...

> But here I've come to area management in logfs, and found that it is
> 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes()
> returns signed 32 bit value (so it is reduced to 31 bit), which is then
> placed into li_data as 64bit value. The latter
> (__logfs_get_free_bytes()) truncates 64bit data value obtained via
> dev_ofs() into signed 32 bit value.

That indeed is a bug.  __logfs_get_free_bytes() should return s64
instead of s32.  Will fix immediately.

If anyone can find similar bugs, the bounty is a beer or non-alcoholic
beverage of choice. :)

Jörn

-- 
To announce that there must be no criticism of the President, or that we
are to stand by the President, right or wrong, is not only unpatriotic
and servile, but is morally treasonable to the American public.
-- Theodore Roosevelt, Kansas City Star, 1918
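The 32+32bit scheme described in this exchange, and the kind of overflow it invites, can be illustrated with a small userspace sketch. This is not LogFS code: `struct sb_geom` and all four helper names are hypothetical stand-ins, and the only thing taken from the thread is the idea that a shift by `s_segshift` separates the segment number from the in-segment offset. The sketch assumes the shift is smaller than 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry discussed above. */
struct sb_geom {
	unsigned segshift;	/* log2 of the segment size in bytes */
};

/* High bits of a 64-bit device offset: the segment number. */
static uint32_t ofs_to_segno(const struct sb_geom *g, uint64_t ofs)
{
	return (uint32_t)(ofs >> g->segshift);
}

/* Low bits of a 64-bit device offset: the offset within the segment. */
static uint32_t ofs_in_seg(const struct sb_geom *g, uint64_t ofs)
{
	return (uint32_t)(ofs & ((UINT64_C(1) << g->segshift) - 1));
}

/* Join both halves.  The segment number is widened to 64 bits BEFORE
 * the shift, so no high bits are lost. */
static uint64_t dev_ofs64(const struct sb_geom *g, uint32_t segno, uint32_t ofs)
{
	assert(g->segshift < 32);
	assert(ofs < (UINT32_C(1) << g->segshift));
	return ((uint64_t)segno << g->segshift) | ofs;
}

/* Same join, but the shift happens in 32-bit arithmetic.  Any bits of
 * segno that move past bit 31 are silently dropped. */
static uint64_t dev_ofs_narrow(const struct sb_geom *g, uint32_t segno, uint32_t ofs)
{
	return (uint32_t)(segno << g->segshift) | ofs;
}
```

With 128 KiB segments (`segshift` of 17), segment 0x10000 starts at device offset 0x200000000. `dev_ofs64()` returns that value plus the in-segment offset, while `dev_ofs_narrow()` wraps around to the start of the device. Storing a 64-bit offset in an s32 return value loses information in a similar way.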
Re: [PATCH] LogFS take three
On May 16 2007 13:09, Jörn Engel wrote:
> On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
> >
> > Personally I'd just go for 'JFFS3'. After all, it has a better claim to
> > the name than either of its predecessors :)
>
> Did you ever see akpm's facial expression when he tried to pronounce
> JFFS2? ;)

Is there something special with [dʒeɪ ɛf ɛf ɛs tuː]?

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 14:55, Jörn Engel wrote:
> On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
> > On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > > >
> > > > But if akpm can't pronounce it, how about FFFS for faster flash
> > > > filesystem ;-)
> > >
> > > How many of you have worked for IBM before?  Vowels are not evil. ;)
> >
> > Do you think 'eieio' is a good set? IBM's work too...

C'mon, UIO does not cut IIO either ;-)

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 22:06, CaT wrote:
> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel wrote:
> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > >
> > > But if akpm can't pronounce it, how about FFFS for faster flash
> > > filesystem ;-)
> >
> > How many of you have worked for IBM before?  Vowels are not evil. ;)
> >
> > Grouping four or more consonants to name anything will cause similar
> > expressions on people's faces.  Numbers don't help much either.
> >
> > Ext2 is a great name, because ext actually is a pronouncable syllable.
> > MinixFS, ChunkFS, TileFS are great too.  XFS and JFS are ok, at least
> > they only have three consonants.  But FFS exists, so I'd rather go for a
> > syllable.
>
> FlashFS?

Or just try once dropping all those redundant 'fs' suffixes.
bdev, proc, cpuset, devpts, mqueue, fuse(blk|ctl), vfat, iso9660, etc.
Then there's much more space for innovative names.

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 15:53, Jörn Engel wrote:
>
> My experience is that no matter which name I pick, people will complain
> anyway.  Previous suggestions included:
> [...]
>
> Plus today:
> FFFS
> flashfs
> fredfs
> bob
> shizzle
>
> Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
> joke and keep my logfs.org domain.

Try woodfs! (log - wood - get it?)
But finding names can be so tiresome, just give it a Borg-style
designation - filesystem 125 or so.

fs2007q1, being this quarter's new filesystem.

	Jan
-- 
Re: [PATCH] LogFS take three
On May 16 2007 02:06, Jörn Engel wrote:
> > > +/* memtree.c */
> > > +void btree_init(struct btree_head *head);
> > > +void *btree_lookup(struct btree_head *head, long val);
> > > +int btree_insert(struct btree_head *head, long val, void *ptr);
> > > +int btree_remove(struct btree_head *head, long val);
> >
> > These names are too generic.  If we later add a btree library: blam.
>
> My plan was to move this code to lib/ sooner or later.  If you consider
> it useful in its current state, I can do it immediately.  And if someone
> else merged a superior btree library I'd happily remove mine and use the
> new one instead.
>
> Opinions?

Why would we need another btree, when there is lib/rbtree.c?
Or does yours do something fundamentally different?

	Jan
-- 
Re: [PATCH] LogFS take three
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote:
> > My plan was to move this code to lib/ sooner or later.  If you consider
> > it useful in its current state, I can do it immediately.  And if someone
> > else merged a superior btree library I'd happily remove mine and use the
> > new one instead.
> >
> > Opinions?
>
> Why would we need another btree, when there is lib/rbtree.c?
> Or does yours do something fundamentally different?

It is not a red-black tree, it is a b+ tree.

> 	Jan

-- 
	Evgeniy Polyakov
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 17:08:51 +0200, Arnd Bergmann wrote:
> On Tuesday 15 May 2007, Jörn Engel wrote:
> > Add LogFS, a scalable flash filesystem.
>
> Sorry for not commenting earlier, there were so many discussions on
> version two that I wanted to wait for the fallout of that instead of
> duplicating all the comments.

You are the last person that has to be sorry. ;)

> Here are a few things I notice while going through the third version:
>
> > +/*
> > + * Private errno for accessed beyond end-of-file.  Only used internally to
> > + * logfs.  If this ever gets exposed to userspace or even other parts of the
> > + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> > + * used errno #defines.
> > + *
> > + * It can be argued that this is a hack and should be replaced with something
> > + * else.  My last attempt to do this failed spectacularly and there are more
> > + * urgent problems that users actually care about.  This will remain for the
> > + * moment.  Patches are wellcome, of course.
> > + */
> > +#define EOF	256
>
> It should at least be in the kernel-only errno range between 512 and
> 4095, that way it can eventually be added to include/linux/errno.h.

Fair enough.  512 it is.

> > + * Target rename works in three atomic steps:
> > + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> > + * 2. Remove old dentry (still remember the new inode)
> > + * 3. Remove new inode
> > + *
> > + * Here we remember both an inode an a dentry.  If we get interrupted
> > + * between steps 1 and 2, we delete both the dentry and the inode.  If
> > + * we get interrupted between steps 2 and 3, we delete just the inode.
> > + * In either case, the remaining objects are deleted on next mount.  From
> > + * a users point of view, the operation succeeded.
>
> This description had me confused for a while: why would you remove the
> new inode.  Maybe change the text to say 'target inode' or 'victim inode'?

'Victim inode' sounds good.  Will do.
> > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> > +{
> > +	struct inode *inode;
> > +
> > +	if (dir->i_nlink >= LOGFS_LINK_MAX)
> > +		return -EMLINK;
>
> Why is i_nlink limited? Don't you run out of space for inodes before
> overflowing?

I don't know.  With the current limit of 2^31, a sufficiently large
device can reach the limit.  And it is imaginable that overflowing the
s32 number space can expose security holes.  Not that I actually know,
the check is pure paranoia.

> > + * In principle, this function should loop forever, looking for GC candidates
> > + * and moving data.  LogFS is designed in such a way that this loop is
> > + * guaranteed to terminate.
> > + *
> > + * Limiting the loop to four iterations serves purely to catch cases when
> > + * these guarantees have failed.  An actual endless loop is an obvious bug
> > + * and should be reported as such.
> > + *
> > + * But there is another nasty twist to this.  As I have described in my LCA
> > + * presentation, Garbage collection would have to limit itself to higher
> > + * levels if the number of available free segments goes down.  This code
> > + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> > + * been able to make it fail (short of a bug elsewhere).
> > + *
> > + * So in a way this code is intentionally wrong as a desperate cry for a
> > + * better testcase.  And I do expect to get blamed for it one day. :(
> > + */
>
> Could you bug the code to reserve fewer segments for GC than you really
> need, in order to stress test GC?

I could.  Wear leveling will cause changes in the area, so I'll have a
closer look when implementing that.
> > +static struct inode *logfs_alloc_inode(struct super_block *sb)
> > +{
> > +	struct logfs_inode *li;
> > +
> > +	li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> > +	if (!li)
> > +		return NULL;
> > +	logfs_init_inode(&li->vfs_inode);
> > +	return &li->vfs_inode;
> > +}
> > +
> > +
> > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> > +{
> > +	struct inode *inode;
> > +
> > +	inode = logfs_alloc_inode(sb);
> > +	if (!inode)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	logfs_init_inode(inode);
>
> logfs_alloc_inode() returns an initialized inode, so no need to call
> logfs_init_inode() again, right?

Right.  Will change.

> > +static __be64 timespec_to_be64(struct timespec tsp)
> > +{
> > +	u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec 0x);
> > +
> > +	WARN_ON(tsp.tv_nsec 9);
> > +	return cpu_to_be64(time);
> > +}
>
> Why not just store 64 bit nanoseconds? That would avoid the problem
> with ns overflow and the year-2038 bug. OTOH, that would require
> a 64 bit integer division when reading the data, so it gets you
> a runtime overhead.

I like the idea.  Do conversion functions exist both ways?

What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
while 32bit would last to 2106?

> > +static void logfs_read_inode(struct inode *inode)
> > +{
> > +	int ret;
> > +
> > +	BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> > +
> > +
Re: [PATCH] LogFS take three
Hi!

> > Yes. These things are almost always implemented _very_ badly by the
> > same kind of crack-smoking hobo they drag in off the streets to write
> > BIOSen.
> >
> > It's bog-roll technology; if you fancy a laugh try doing some real
> > reliability tests on them some time. Powerfail testing is a good one.
> >
> > This kind of thing is OK for disposable storage such as in digital
> > cameras, where it doesn't matter that it's no more reliable than a
> > floppy disc, but for real long-term storage it's really a bad idea.
>
> There are so many flash-based storage devices, and some disposable
> ones, as you pointed out, have poor quality. I think it's mainly
> because these are not designed for good quality, but for lowering
> the price.
>
> These kinds of devices are not ready for things like power failure
> because their use case is far from that. For example, removing the
> flash card while taking pictures using a digital camera is not a
> common use case. (there should be a written notice that this kind of
> action is against the warranty)

Hmm.. so operating your camera on batteries should be against the
warranty, since batteries commonly run empty while storing pictures?

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] LogFS take three
Jörn Engel wrote:
> Compressing random data will actually enlarge it.  If that happens I
> simply store the verbatim uncompressed data instead and mark it as such.
>
> There is also demand for a user-controlled bit in the inode to disable
> compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> time by trying and failing to compress them.

So any sane way to enable compression is on per-inode basis which makes
me still wonder why you need per-object compression.
Re: [PATCH] LogFS take three
On Thursday 17 May 2007, Jörn Engel wrote:
> >
> > Why not just store 64 bit nanoseconds? That would avoid the problem
> > with ns overflow and the year-2038 bug. OTOH, that would require
> > a 64 bit integer division when reading the data, so it gets you
> > a runtime overhead.
>
> I like the idea.  Do conversion functions exist both ways?
>
> What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
> while 32bit would last to 2106?

You're right, you don't hit the 2038 bug here, because you use an
unsigned variable. The bug exists elsewhere because time_t tv_sec
is signed.

Just using nanoseconds probably doesn't gain you much after all
then. You could however just have separate 32 bit fields in the
inode for seconds and nanoseconds, that will result in the exact
same layout that you have right now, but won't require a conversion
function.

	Arnd
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 23:00:20 +0200, Arnd Bergmann wrote:
>
> Just using nanoseconds probably doesn't gain you much after all
> then. You could however just have separate 32 bit fields in the
> inode for seconds and nanoseconds, that will result in the exact
> same layout that you have right now, but won't require a conversion
> function.

I could also have a 30bit and a 34bit field.  30bit is enough for
nanoseconds.  So many options.

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
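The 34bit/30bit split mentioned above works because the largest valid nanosecond value, 999999999, is below 2^30. A minimal userspace sketch of such a packing follows. It is only an illustration of the idea floated in the thread, not the LogFS on-disk format: `struct ts`, `ts_pack()` and `ts_unpack()` are hypothetical names, and the byte-order conversion that the patch does with `cpu_to_be64()` is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_BITS	30
#define NSEC_MASK	((UINT64_C(1) << NSEC_BITS) - 1)
#define SEC_BITS	34

/* Hypothetical userspace stand-in for struct timespec. */
struct ts {
	uint64_t sec;	/* seconds since the epoch, unsigned */
	uint32_t nsec;	/* 0 .. 999999999 */
};

/* Pack seconds into the upper 34 bits and nanoseconds into the lower
 * 30 bits of a single 64-bit value. */
static uint64_t ts_pack(struct ts t)
{
	assert(t.nsec <= 999999999u);
	assert(t.sec < (UINT64_C(1) << SEC_BITS));
	return (t.sec << NSEC_BITS) | t.nsec;
}

/* Reverse of ts_pack(): a shift and a mask, no division needed. */
static struct ts ts_unpack(uint64_t v)
{
	struct ts t;

	t.sec = v >> NSEC_BITS;
	t.nsec = (uint32_t)(v & NSEC_MASK);
	return t;
}
```

Compared with a single 64-bit nanosecond counter, unpacking here needs no 64-bit division. An unsigned 34-bit seconds field covers roughly 544 years from the epoch, against about 136 years for an unsigned 32-bit field.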
Re: [PATCH] LogFS take three
On Thursday 17 May 2007, Pekka Enberg wrote:
>
> Jörn Engel wrote:
> > Compressing random data will actually enlarge it.  If that happens I
> > simply store the verbatim uncompressed data instead and mark it as such.
> >
> > There is also demand for a user-controlled bit in the inode to disable
> > compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> > time by trying and failing to compress them.
>
> So any sane way to enable compression is on per-inode basis which makes
> me still wonder why you need per-object compression.

1. it doesn't require user interaction, the file system will do the right
thing most of the time.

2. enlarging data is a very bad thing because it makes the behaviour
of the fs unpredictable. With uncompressed objects, you have a guaranteed
upper bound on the size.

	Arnd
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 23:36:13 +0200, Arnd Bergmann wrote:
> On Thursday 17 May 2007, Pekka Enberg wrote:
> >
> > So any sane way to enable compression is on per-inode basis which makes
> > me still wonder why you need per-object compression.
>
> 1. it doesn't require user interaction, the file system will do the right
> thing most of the time.
>
> 2. enlarging data is a very bad thing because it makes the behaviour
> of the fs unpredictable. With uncompressed objects, you have a guaranteed
> upper bound on the size.

Correct.  The compression decision is always per-object.  Per-inode is a
hint from userspace that a compression attempt would be futile.

A compression algorithm that compresses any data is provably impossible.
Some data will always cause expansion instead of compression.  Some
algorithms have a well-known upper bound on the expansion, others don't.
So LogFS instead creates its own upper bound by reserving one byte in
the header for the compression type.

And while one bit would suffice as a compressed/uncompressed flag,
having a byte allows to support more than one compression algorithm.
LZO looks promising and is on its way into the kernel.  Others may come
in the future.

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra
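The per-object decision described above (try to compress, fall back to the verbatim data, record the choice in one header byte) can be sketched in userspace as follows. Everything here is an assumption made for illustration: the type codes, `store_object()`, and the toy run-length encoder that stands in for a real algorithm such as LZO. None of it is taken from the LogFS patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compression type codes for the one-byte header. */
enum { COMPR_NONE = 0, COMPR_RLE = 1 };

/* Toy run-length encoder emitting (count, byte) pairs.  Returns the
 * encoded length, or 0 if the result would not fit into cap bytes. */
static size_t rle_encode(const uint8_t *in, size_t len, uint8_t *out, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < len) {
		size_t run = 1;

		while (i + run < len && run < 255 && in[i + run] == in[i])
			run++;
		if (o + 2 > cap)
			return 0;
		out[o++] = (uint8_t)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/* Store one object as a type byte followed by the payload.  out must
 * have room for len + 1 bytes and must not overlap in.  The result is
 * never larger than len + 1, whatever the input looks like. */
static size_t store_object(const uint8_t *in, size_t len, uint8_t *out)
{
	size_t clen = rle_encode(in, len, out + 1, len);

	if (clen > 0 && clen < len) {
		out[0] = COMPR_RLE;
		return clen + 1;
	}
	/* Compression did not help: keep the data verbatim. */
	out[0] = COMPR_NONE;
	memcpy(out + 1, in, len);
	return len + 1;
}
```

Sixteen identical bytes come out as three bytes (type, count, value), while a run-free input such as "abcdefgh" is stored verbatim at nine bytes. The worst case is therefore one byte of overhead per object, which is the upper bound the header byte is meant to provide.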
Re: [PATCH] LogFS take three
Jörn Engel wrote:
> > Almost all your static functions start with logfs_, why not this one?
>
> Because after a while I discovered how silly it is to start every
> function with logfs_.  That prefix doesn't add much unless the function
> has global scope.  What I didn't do was remove the prefix from older
> functions.

It's handy when debugging or showing detailed backtraces.  Not that
I'm advocating it (or not), just something I've noticed in other
programs.

-- Jamie
Re: [PATCH] LogFS take three
Hi,

On 5/18/07, Pavel Machek [EMAIL PROTECTED] wrote:
> Hi!
>
> Hmm.. so operating your camera on batteries should be against the
> warranty, since batteries commonly run empty while storing pictures?

AFAIK, the camera stops writing to the flash card and automatically
turns off when it's low on battery (before empty).
Re: [PATCH] LogFS take three
On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
> On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote:
> > > My plan was to move this code to lib/ sooner or later.  If you
> > > consider it useful in its current state, I can do it immediately.
> > > And if someone else merged a superior btree library I'd happily
> > > remove mine and use the new one instead.
> > >
> > > Opinions?
> >
> > Why would we need another btree, when there is lib/rbtree.c?
> > Or does yours do something fundamentally different?
>
> It is not red-black tree, it is b+ tree.

It might be better to use the prefix bptree to help prevent
confusion.  A quick google search on bp-tree reveals only the perl
B+-tree module Tree::BPTree, a U-Maryland Java CS project on
B+-trees, and a news article about a BP tree-top protest.

Cheers,
Kyle Moffett
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 19:17:18 +, Pavel Machek wrote:
> >
> > In kernel fsck
>
> > --- /dev/null	2007-04-18 05:32:26.652341749 +0200
> > +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c	2007-05-15 00:54:22.0 +0200
> > @@ -0,0 +1,332 @@
> > +/*
> > + * fs/logfs/prog/fsck.c	- filesystem check
> > + *
> > + * As should be obvious for Linux kernel code, license is GPLv2
> > + *
> > + * Copyright (c) 2005-2007 Joern Engel
> > + *
> > + * In principle this could get moved to userspace.  However it might still
> > + * make some sense to keep it in the kernel.  It is a pure checker and will
> > + * only report problems, not attempt to repair them.
> > + */
>
> Is there version that repairs?

No.

> BUG is not right thing to do for media error.

I know.  Top 3 items of my todo list are:
- Handle system crashes
- Add second journal
- Error handling

> > +
> > +#if 0
> > +/* rootdir */
>
> Please just delete it, not comment it out like this.

That will get resurrected, even before the move to userspace.  I had to
change the filesystem format for compression support and this is an
artifact of the transition phase.

Jörn

-- 
Ninety percent of everything is crap.
-- Sturgeon's Law
Re: [PATCH] LogFS take three
Hi!

> In kernel fsck

> --- /dev/null	2007-04-18 05:32:26.652341749 +0200
> +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c	2007-05-15 00:54:22.0 +0200
> @@ -0,0 +1,332 @@
> +/*
> + * fs/logfs/prog/fsck.c	- filesystem check
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * In principle this could get moved to userspace.  However it might still
> + * make some sense to keep it in the kernel.  It is a pure checker and will
> + * only report problems, not attempt to repair them.
> + */

Is there version that repairs?

> +	/* Some segments are reserved.  Just pretend they were all valid */
> +	reserved = btree_lookup(>s_reserved_segments, segno);
> +	if (reserved)
> +		return 0;
> +
> +	err = wbuf_read(sb, dev_ofs(sb, segno, 0), sizeof(sh), );
> +	BUG_ON(err);

BUG is not right thing to do for media error.

> +/*
> + * fs/logfs/prog/mkfs.c	- filesystem generation
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * Should get moved to userspace.
> + */

Indeed.

> +
> +#if 0
> +/* rootdir */

Please just delete it, not comment it out like this.

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 23:49:55 +0800, David Woodhouse wrote:
>
> Utility is a factor of the underlying design -- a filesystem designed
> for flash really isn't suited to block devices.

I can think of at least three examples where LogFS would indeed make
sense on block devices.

Jörn

-- 
Happiness isn't having what you want, it's wanting what you have.
-- unknown
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 08:34 -0700, Andrew Morton wrote:
> Reduced testability, mainly.  Also potentially reduced usefulness.

CONFIG_MTD has never been a barrier to testability. JFFS2 depends on MTD
and had _most_ of its early testing and development done on the 'fake'
mtdram device.

Utility is a factor of the underlying design -- a filesystem designed
for flash really isn't suited to block devices.

-- 
dwmw2
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 20:07:18 +0800 David Woodhouse <[EMAIL PROTECTED]> wrote:
> > It's strange and a bit regrettable that an fs would have dependency on MTD,
> > really.
>
> Why? Other file systems has dependencies on BLOCK or on NET. It seems
> entirely normal to me.

Reduced testability, mainly.  Also potentially reduced usefulness.
Re: [PATCH] LogFS take three
On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> >
> > My experience is that no matter which name I pick, people will
> > complain anyway.  Previous suggestions included:
> > jffs3
> > jefs
> > engelfs
> > poofs
> > crapfs
> > sweetfs
> > cutefs
> > dynamic journaling fs - djofs
> > tfsfkal - the file system formerly known as logfs
>
> Can we call it jörnfs? :)

However if Jörn is accused of murder, it will have little chance of
being merged :-).
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote:
> Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
> joke and keep my logfs.org domain.

Well if stupid jokes are a goer there's always gordonfs. :)

*hides*

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
	- High Court Judge Michael Kirby
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 22:04 +0800, David Woodhouse wrote:
> On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> >
> > My experience is that no matter which name I pick, people will
> > complain anyway.  Previous suggestions included:
> > jffs3
> > jefs
> > engelfs
> > poofs
> > crapfs
> > sweetfs
> > cutefs
> > dynamic journaling fs - djofs
> > tfsfkal - the file system formerly known as logfs
>
> Can we call it jörnfs? :)

And it is essential to preserve "ö" and let Pavel enjoy :-)

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
>
> My experience is that no matter which name I pick, people will complain
> anyway. Previous suggestions included:
> jffs3
> jefs
> engelfs
> poofs
> crapfs
> sweetfs
> cutefs
> dynamic journaling fs - djofs
> tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)

--
dwmw2
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
> Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>
> Jörn> How many of you have worked for IBM before? Vowels are not evil. ;)
>
> Nope, they're not. I just think that LogFS isn't descriptive enough,
> or more accurately, is the *wrong* description of this filesystem.

That was the whole point. JFFS2, the journaling flash filesystem, is a strictly log-structured filesystem. LogFS has a journal.

It is also the filesystem that tries to scale logarithmically, as Arnd has noted. Maybe I should call it Log2 to emphasize this point. Log1 would be horrible scalability.

> flashfs works for me. It's longer, but hey, that's ok. Even flshfs
> might work. Oh wait, flesh? flash? flush? Too confusing... :-)

Maybe. FFS or flash filesystem already exists. And YAFFS, yet another flash filesystem, would be older than flashfs.

My experience is that no matter which name I pick, people will complain anyway. Previous suggestions included:
jffs3
jefs
engelfs
poofs
crapfs
sweetfs
cutefs
dynamic journaling fs - djofs
tfsfkal - the file system formerly known as logfs

Plus today:
FFFS
flashfs
fredfs
bob
shizzle

Imo they all suck. LogFS also sucks, but it allows me to make a stupid joke and keep my logfs.org domain.

Jörn

--
There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 15:36:44 +0300, Pekka Enberg wrote:
> On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote:
> >
> >More trouble?
>
> Forgot to add (see below). Seems logfs_segment_read would be simpler
> too if you fixed this.

Would it? I think that code would still be needed, although possibly in a different function.

There are two minor drawbacks to using the page cache, btw:
- Indirect blocks need some mapping too. So either I need to steal a bit from the inode space or from the fpos space.
- OOM handling is a bit more complicated. I would need a mempool for that.

> >[ Objects are the units that get compressed. Segments can contain both
> >compressed and uncompressed objects. ]
> >
> >It is a trade-off. Each object has a 24 Byte header plus X Bytes of
> >data. Whether the data is compressed or not is indicated in the header.
>
> Was my point really. Why do segments contain both compressed and
> uncompressed objects?

Compressing random data will actually enlarge it. If that happens I simply store the verbatim uncompressed data instead and mark it as such.

There is also demand for a user-controlled bit in the inode to disable compression completely. All those .jpg, .mpg, .mp3, etc. just waste time by trying and failing to compress them.

Jörn

--
Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
-- Doug McIlroy
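[Editor's note] The store-verbatim fallback Jörn describes in the message above can be sketched roughly as follows. This is a minimal illustration, not LogFS code: the names `obj_compr`, `rle_compress` and `pack_object` are hypothetical, the toy run-length encoder merely stands in for whatever compressor the filesystem really uses, and the 24 Byte object header is reduced to a single flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; the real header layout is not shown in this thread. */
enum obj_compr { OBJ_RAW = 0, OBJ_COMPRESSED = 1 };

/*
 * Toy run-length encoder standing in for the real compressor.
 * Emits (count, byte) pairs. Returns the encoded length, or 0 if the
 * output would not fit into out_cap bytes.
 */
size_t rle_compress(const uint8_t *in, size_t len, uint8_t *out, size_t out_cap)
{
	size_t i = 0, o = 0;

	while (i < len) {
		uint8_t byte = in[i];
		size_t run = 1;

		while (i + run < len && in[i + run] == byte && run < 255)
			run++;
		if (o + 2 > out_cap)
			return 0;
		out[o++] = (uint8_t)run;
		out[o++] = byte;
		i += run;
	}
	return o;
}

/*
 * Fill dst (at least len bytes, not overlapping data) with the payload
 * for one object. Compression is only accepted if it saves at least one
 * byte; otherwise the data is copied verbatim and flagged as raw.
 * Returns the payload length and sets *flag accordingly.
 */
size_t pack_object(const uint8_t *data, size_t len, uint8_t *dst,
		   enum obj_compr *flag)
{
	size_t clen;

	assert(data && dst && flag);
	/* A capacity of len - 1 makes the encoder fail unless it saves space. */
	clen = len ? rle_compress(data, len, dst, len - 1) : 0;
	if (clen) {
		*flag = OBJ_COMPRESSED;
		return clen;
	}
	memcpy(dst, data, len);
	*flag = OBJ_RAW;
	return len;
}
```

Because the decision is made per object, a segment naturally ends up holding a mix of compressed and raw objects, which is the trade-off discussed above. A per-inode "never compress" bit would simply skip the `rle_compress` call.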
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 15:08:15 +0300, Pekka Enberg wrote:
> On 5/16/07, Jamie Lokier <[EMAIL PROTECTED]> wrote:
> >Given that the filesystem is still 'experimental', I'd concentrate on
> >getting it stable before worrying about immutable and xattrs unless
> >they are easy.
>
> We will run into trouble if the on-disk format is not flexible enough
> to accommodate xattrs (think reiser3 here). So I'd worry about it
> before merging to mainline.

Adding xattrs would be fairly simple. Inodes just need one extra pointer for that. Luckily inodes no longer need to be padded to 128 or 256 bytes. They are individually compressed, so their size is not limited to powers of two.

Jörn

--
To recognize individual spam features you have to try to get into the mind of the spammer, and frankly I want to spend as little time inside the minds of spammers as possible.
-- Paul Graham
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > >
> > > But if akpm can't pronounce it, how about FFFS for faster flash
> > > filesystem ;-)
> >
> > How many of you have worked for IBM before? Vowels are not evil. ;)
>
> Do you think 'eieio' is a good set? IBM's work too...

I will let someone else comment on that one.
http://www.uwsg.iu.edu/hypermail/linux/kernel/0110.1/1294.html

Jörn

--
There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 13:25:48 +0100, Jamie Lokier wrote:
>
> Is LogFS really slower than JFFS2 in practice?

Not sure. Before adding compression support, I ran a benchmark in QEMU with a lightning-fast device. So the results should differ quite a bit from practice.
http://logfs.org/~joern/logfs/benchmark/benchmark_overview

LogFS was actually faster than JFFS2. So for that particular unrealistic benchmark, updating the LogFS tree was less expensive than trying (and failing) to compress and calculating the CRC was for JFFS2. With compression finished, I would expect LogFS numbers to degrade. If file data had checksums (not done yet, should be optional for users to decide) even more so.

> I would have guessed reads to be a similar speed, tree updates to be a
> similar speed to journal updates for sustained non-fsyncing writes,
> and the difference unimportant for tiny individual commits whose index
> updates are not merged with any other. I've not thought about it much
> though.

LogFS isn't that good yet. Right now, writing 10 adjacent blocks to a file requires 10 tree updates instead of 1. Not full updates though, just up to the inode.

Quite surprisingly, read speed in the benchmark was significantly better for LogFS, even after subtracting mount time. I don't know if all of that can be explained with CRC checks or there is more to it.

Jörn

--
I can say that I spend most of my time fixing bugs even if I have lots of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000
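[Editor's note] The "10 tree updates instead of 1" remark above can be made concrete with a toy cost model. This is only a sketch under stated assumptions: it counts rewritten indirect blocks between the data blocks and the inode, and the tree depth (`levels`), the `fanout` and both function names are hypothetical rather than taken from the LogFS patch.

```c
#include <stdint.h>

/*
 * Unbatched: every data block write walks up to the inode on its own,
 * rewriting one indirect block per tree level.
 */
uint64_t updates_unbatched(uint64_t nblocks, unsigned levels)
{
	return nblocks * levels;
}

/*
 * Batched: the blocks [first, first + nblocks) are written together, so
 * each indirect block covering part of the range is rewritten only once
 * per level.
 */
uint64_t updates_batched(uint64_t first, uint64_t nblocks, unsigned levels,
			 uint64_t fanout)
{
	uint64_t total = 0, lo = first, hi;
	unsigned l;

	if (!nblocks)
		return 0;
	hi = first + nblocks - 1;
	for (l = 0; l < levels; l++) {
		/* Index of the covering indirect block, one level further up. */
		lo /= fanout;
		hi /= fanout;
		total += hi - lo + 1;
	}
	return total;
}
```

With an assumed fanout of 512 and two indirect levels, ten adjacent blocks starting at block 0 cost 20 indirect-block rewrites when handled one by one, but only 2 when merged. A range that straddles an indirect-block boundary (say, starting at block 510) costs 3 in the batched case.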
Re: [PATCH] LogFS take three
On 5/16/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:
> Forgot to add (see below). Seems logfs_segment_read would be simpler
> too if you fixed this.

Blah. Just to be clear: I forgot to add a "(see below)" text in the original review comment.
Re: [PATCH] LogFS take three
On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote:
> > > +/* FIXME: all this mess should get replaced by using the page cache */
> > > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area *area,
> > > +		void *read, u64 ofs, size_t readlen)
> > > +{
> >
> > Indeed. And I think you're getting some more trouble because of this...
>
> More trouble?

Forgot to add (see below). Seems logfs_segment_read would be simpler too if you fixed this.

> [ Objects are the units that get compressed. Segments can contain both
> compressed and uncompressed objects. ]
>
> It is a trade-off. Each object has a 24 Byte header plus X Bytes of
> data. Whether the data is compressed or not is indicated in the header.

Was my point really. Why do segments contain both compressed and uncompressed objects?