Re: Review status (Re: [PATCH] LogFS take three)
On Wed, May 23, 2007 at 05:14:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote: > > > I'm just a German. Forgive me if I drink lesser beverages. > > > > You should definitely change that. > > Change being German? Not a bad idea, actually. You cook up really tasty shnaps, in small quantities it is good for health in infinite volumes. > > Btw, what about this piece: > > > > int logfs_erase_segment(struct super_block *sb, u32 index) > > { > > struct logfs_super *super = LOGFS_SUPER(sb); > > > > super->s_gec++; > > > > return mtderase(sb, index << super->s_segshift, super->s_segsize); > > } > > > > index << super->s_segshift might overflow, mtderase expects loff_t > > there, since index can be arbitrary segment number, is it possible, that > > overflow really occurs? > > Indeed it is. You just earned your second beer^Wvodka. Actually this code looks less encrypted than ext2 for, which definitely a good sign from reviewer's point of view. > Jörn -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review status (Re: [PATCH] LogFS take three)
On Wed, 23 May 2007 19:07:32 +0400, Evgeniy Polyakov wrote: > On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) > wrote: > > On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote: > > And what if it is 33 bits? Or it is not allowed? Not allowed. Both number and size of segments may never exceed 32bit. > > > segsize is long, but should be u64 I think. > > > > It could be s32 as well. > > It is a matter of definition - if segment size is allowed to be more > than 32 bits, then below transformation is not correct, otherwise > segment size should not use additional 32bits on 64bit platform, since > it is long. I guess I could save 4 Bytes there. > > I'm just a German. Forgive me if I drink lesser beverages. > > You should definitely change that. Change being German? Not a bad idea, actually. > Btw, what about this piece: > > int logfs_erase_segment(struct super_block *sb, u32 index) > { > struct logfs_super *super = LOGFS_SUPER(sb); > > super->s_gec++; > > return mtderase(sb, index << super->s_segshift, super->s_segsize); > } > > index << super->s_segshift might overflow, mtderase expects loff_t > there, since index can be arbitrary segment number, is it possible, that > overflow really occurs? Indeed it is. You just earned your second beer^Wvodka. Jörn -- The wise man seeks everything in himself; the ignorant man tries to get everything from somebody else. -- unknown - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review status (Re: [PATCH] LogFS take three)
On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote: > On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote: > > > > In that case segment size must be more than 32 bits, or below > > transformation will not be correct? > > Must it? If segment size is just 20bit then the filesystem may only be > 52bit. Or 51bit when using signed values. And what if it is 33 bits? Or it is not allowed? > > segsize is long, but should be u64 I think. > > It could be s32 as well. It is a matter of definition - if segment size is allowed to be more than 32 bits, then below transformation is not correct, otherwise segment size should not use additional 32bits on 64bit platform, since it is long. > > static void fixup_from_wbuf(struct super_block *sb, struct logfs_area > > *area, void *read, u64 ofs, size_t readlen) > > > > u32 read_start = ofs & (super->s_segsize - 1); > > u32 read_end = read_start + readlen; > > > > And this can overflow, since readlen is size_t. > > Theoretically yes. Practically readlen is bounded to sb->blocksize plus > one header. I'll start worrying about that when blocksize approaches > 32bit limit. > > > > If anyone can find similar bugs, the bounty is a beer or non-alcoholic > > > beverage of choice. :) > > > > Stop kiling your kidneys, your health and promote such antisocial style > > of life, start drinking vodka instead. > > I'm just a German. Forgive me if I drink lesser beverages. You should definitely change that. Btw, what about this piece: int logfs_erase_segment(struct super_block *sb, u32 index) { struct logfs_super *super = LOGFS_SUPER(sb); super->s_gec++; return mtderase(sb, index << super->s_segshift, super->s_segsize); } index << super->s_segshift might overflow, mtderase expects loff_t there, since index can be arbitrary segment number, is it possible, that overflow really occurs? > Jörn > > -- > Eighty percent of success is showing up. > -- Woody Allen -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review status (Re: [PATCH] LogFS take three)
On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote: > > In that case segment size must be more than 32 bits, or below > transformation will not be correct? Must it? If segment size is just 20bit then the filesystem may only be 52bit. Or 51bit when using signed values. > segsize is long, but should be u64 I think. It could be s32 as well. > static void fixup_from_wbuf(struct super_block *sb, struct logfs_area > *area, void *read, u64 ofs, size_t readlen) > > u32 read_start = ofs & (super->s_segsize - 1); > u32 read_end = read_start + readlen; > > And this can overflow, since readlen is size_t. Theoretically yes. Practically readlen is bounded to sb->blocksize plus one header. I'll start worrying about that when blocksize approaches 32bit limit. > > If anyone can find similar bugs, the bounty is a beer or non-alcoholic > > beverage of choice. :) > > Stop kiling your kidneys, your health and promote such antisocial style > of life, start drinking vodka instead. I'm just a German. Forgive me if I drink lesser beverages. Jörn -- Eighty percent of success is showing up. -- Woody Allen - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review status (Re: [PATCH] LogFS take three)
On Thu, May 17, 2007 at 07:10:19PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote: > On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote: > > > > Is logfs 32bit fs or 674bit, since although you use 64bit values for > > offsets, area management and strange converstions like described below > > from offset into segment number are performed in 32bit? > > Is it enough for SSD for example to be 32bit only? Or if it is 64bit, > > could you please explain logic behind area management? > > Ignoring bugs and signed return values for error handling, it is either > 64bit or 32+32bit. > > Inode numbers and file positions are 64bit. Offsets are 64bit as well. > In a couple of places, offsets are also 32+32bit. Basically the high > bits contain the segment number, the lower bits the offset within a > segment. In that case segment size must be more than 32 bits, or below transformation will not be correct? segsize is long, but should be u64 I think. static void fixup_from_wbuf(struct super_block *sb, struct logfs_area *area, void *read, u64 ofs, size_t readlen) u32 read_start = ofs & (super->s_segsize - 1); u32 read_end = read_start + readlen; And this can overflow, since readlen is size_t. It is wbuf fixup, but I saw that somewhere else. Although, according to your description, it should be 32bit, sum can be more than 32 bit. > If anyone can find similar bugs, the bounty is a beer or non-alcoholic > beverage of choice. :) Stop kiling your kidneys, your health and promote such antisocial style of life, start drinking vodka instead. > Jörn -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Saturday 19 May 2007 5:24 am, Jan Engelhardt wrote: > > On May 19 2007 02:15, Rob Landley wrote: > >> > + > >> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) > >> > +{ > >> > +return container_of(inode, struct logfs_inode, vfs_inode); > >> > +} > >> > >> Do these need to be uppercase? > > > >I'm trying to keep it clear in my head... > > > >When do you need to say __always_inline and when can you get away with > >just saying "static inline"? > > When using "static inline", the compiler may ignore the inline keyword > (it's just a hint), and leave the function as a standalone function. > > When CONFIG_FORCED_INLINING is active, and it is by default, inline is > always substituted by __always_inline, to be on the safe side. Some code > needs to be always inline; but not all code has been checked whether it > is safe to go from __always_inline to inline. I've seen patches go by using __always_inline directly. Is there some janitorial effort to examine each instance of the the inline keyword and either replace it with "__always_inline" or remove it? Right now "inline" seems to be about as useful as the "register" keyword. You don't feed hints to a compiler like gcc, you hit it with a two-by-four and thumbscrews if you want to get its' attention. Rob - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Sat, May 19, 2007 at 05:17:32PM +0100, Jamie Lokier ([EMAIL PROTECTED]) wrote: > > So, log2fs... Sounds great to me. > > Why Log2? Logarithmic scaling is just logarithmic scaling. Does the > filesystem use 2-ary trees or anything else which gives particular > meaning to 2? Sizes used in on-disk format are rounded to the nearest power-of-two. > -- Jamie -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
David Weinehall wrote: > > It is also the filesystem that tries to scale logarithmically, as Arnd > > has noted. Maybe I should call it Log2 to emphesize this point. Log1 > > would be horrible scalability. > > So, log2fs... Sounds great to me. Why Log2? Logarithmic scaling is just logarithmic scaling. Does the filesystem use 2-ary trees or anything else which gives particular meaning to 2? -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Dongjun Shin wrote: There are so many flash-based storage and some disposable storages, as you pointed out, have poor quality. I think it's mainly because these are not designed for good quality, but for lowering the price. The reliability seems to be appropriate to the common use. I'm doubious that computer storage was a big design factor until the last few years. A good argument for buying large sizes, they are more likely to be recent design. These kind of devices are not ready for things like power failure because their use case is far from that. For example, removing flash card while taking pictures using digital camera is not a common use case. (there should be a written notice that this kind of action is against the warranty) They do well in such use, if you equate battery death to pulling the card (it may not be). I have tested that feature and not had a failure of any but the last item. Clearly not recommended, but sometimes unplanned needs arise. - In contrast to the embedded environment where CPU and flash is directly connected, the I/O path between CPU and flash in PC environment is longer. The latency for SW handshaking between CPU and flash will also be longer, which would make the performance optimization harder. As I mentioned, some techniques like log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under workload where sequential write is dominant. I also expect that FTL for PC environment will have better quality spec than the disposable storage. The recent technology announcements from Intel are encouraging in that respect. -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote: > On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote: > > Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: > > > > Jörn> How many of you have worked for IBM before? Vowels are not > > evil. ;) > > > > Nope, they're not. I just think that LogFS isn't descriptive enough, > > or more accurately, is the *wrong* description of this filesystem. > > That was the whole point. JFFS2, the journaling flash filesystem, is a > strictly log-structured filesystem. LogFS has a journal. > > It is also the filesystem that tries to scale logarithmically, as Arnd > has noted. Maybe I should call it Log2 to emphesize this point. Log1 > would be horrible scalability. So, log2fs... Sounds great to me. [snip] Regards: David -- /) David Weinehall <[EMAIL PROTECTED]> /) Northern lights wander (\ // Maintainer of the v2.0 kernel // Dance across the winter sky // \) http://www.acc.umu.se/~tao/(/ Full colour fire (/ - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Kevin Bowling wrote: On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote: On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote: > > My experience is that no matter which name I pick, people will > complain > anyway. Previous suggestions included: > jffs3 > jefs > engelfs > poofs > crapfs > sweetfs > cutefs > dynamic journaling fs - djofs > tfsfkal - the file system formerly known as logfs Can we call it jörnfs? :) However if Jörn is accused of murder, it will have little chance of being merged :-). WRT that, seems that Nina had a lover who is a confessed serial killer. I'm surprised the case hasn't been adapter for 'Boston legal' and 'Law and Order' like other high profile crimes. I see nothing wrong with jörnfs, and there's room for numbers at the end... -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 19 2007 02:15, Rob Landley wrote: >> > + >> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) >> > +{ >> > + return container_of(inode, struct logfs_inode, vfs_inode); >> > +} >> >> Do these need to be uppercase? > >I'm trying to keep it clear in my head... > >When do you need to say __always_inline and when can you get away with >just saying "static inline"? When using "static inline", the compiler may ignore the inline keyword (it's just a hint), and leave the function as a standalone function. When CONFIG_FORCED_INLINING is active, and it is by default, inline is always substituted by __always_inline, to be on the safe side. Some code needs to be always inline; but not all code has been checked whether it is safe to go from __always_inline to inline. Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007 4:37 pm, Andrew Morton wrote: > > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb) > > +{ > > + return sb->s_fs_info; > > +} > > + > > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) > > +{ > > + return container_of(inode, struct logfs_inode, vfs_inode); > > +} > > Do these need to be uppercase? I'm trying to keep it clear in my head... When do you need to say __always_inline and when can you get away with just saying "static inline"? (I'm attempting to write documentation on a topic I don't understand. Best way to learn it, I've found...) > > + buf = kmap(page); > > + ret = logfs_write_buf(inode, index, buf); > > + kunmap(page); > > kmap() is lame. The preferred approach would be to pass the page* down to > the lower layers and to use kmap_atomic() at the lowest possible point. Um, would I read about this in DMA-mapping.txt or cachetlb.txt? (I don't think it's fujitsu/frv/mmu-layout.txt) Rob - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Fri, 2007-05-18 at 08:17 +0200, Jan Engelhardt wrote: > > AFAIK, the camera stops writing to the flash card and automatically > > turns off when it's low on battery (before empty). > > But then, one should also consider the case where a cam is connected to > AC and someone inadvertently trips on the power cord. So you stick a bloody great cap on board to give you enough time to shut it down cleanly. I've known people do this -- and it helps, but the devices still manage to crap themselves occasionally even then. They're _disposable_. As are your data :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 18 2007 09:01, Dongjun Shin wrote: > On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote: >> >> Hmm.. so operating your camera on batteries should be against the >> warranty, since batteries commonly run empty while storing pictures? > > AFAIK, the camera stops writing to the flash card and automatically > turns off when it's low on battery (before empty). But then, one should also consider the case where a cam is connected to AC and someone inadvertently trips on the power cord. Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 17 2007 21:00, Kyle Moffett wrote: >> > > Opinions? >> > >> > Why would we need another btree, when there is lib/rbtree.c? Or does >> > yours do something fundamentally different? >> >> It is not red-black tree, it is b+ tree. > > It might be better to use the prefix "bptree" to help prevent confusion. A > quick google search on "bp-tree" reveals only the perl B+-tree module > "Tree::BPTree", a U-Maryland Java CS project on B+-trees, and a news article > about a "BP tree-top protest". BP heh.. How about "struct bplustree"? Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote: On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote: My plan was to move this code to lib/ sooner or later. If you consider it useful in its current state, I can do it immediatly. And if someone else merged a superior btree library I'd happily remove mine and use the new one instead. Opinions? Why would we need another btree, when there is lib/rbtree.c? Or does yours do something fundamentally different? It is not red-black tree, it is b+ tree. It might be better to use the prefix "bptree" to help prevent confusion. A quick google search on "bp-tree" reveals only the perl B +-tree module "Tree::BPTree", a U-Maryland Java CS project on B+- trees, and a news article about a "BP tree-top protest". Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Hi, On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote: Hi! Hmm.. so operating your camera on batteries should be against the warranty, since batteries commonly run empty while storing pictures? AFAIK, the camera stops writing to the flash card and automatically turns off when it's low on battery (before empty). - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Jörn Engel wrote: > > Almost all your static functions start with logfs_, why not this one? > > Because after a while I discovered how silly it is to start every > function with logfs_. That prefix doesn't add much unless the function > has global scope. What I didn't do was remove the prefix from older > functions. It's handy when debugging or showing detailed backtraces. Not that I'm advocating it (or not), just something I've noticed in other programs. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 23:36:13 +0200, Arnd Bergmann wrote: > On Thursday 17 May 2007, Pekka Enberg wrote: > > > > So any sane way to enable compression is on per-inode basis which makes > > me still wonder why you need per-object compression. > > 1. it doesn't require user interaction, the file system will do the right > thing most of the time. > > 2. enlarging data is a very bad thing because it makes the behaviour > of the fs unpredictable. With uncompressed objects, you have a guaranteed > upper bound on the size. Correct. The compression decision is always per-object. Per-inode is a hint from userspace that a compression attempt would be futile. A compression algorithm that compresses any data is provably impossible. Some data will always cause expansion instead of compression. Some algorithms have a well-known upper bound on the expansion, others don't. So LogFS instead creates its own upper bound by reserving one byte in the header for the compression type. And while one bit would suffice as a compressed/uncompressed flag, having a byte allows to support more than one compression algorithm. LZO looks promising and is on its way into the kernel. Others may come in the future. Jörn -- My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. -- Edsger W. Dijkstra - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 23:00:20 +0200, Arnd Bergmann wrote: > > Just using nanoseconds probably doesn't gain you much after all > then. You could however just have separate 32 bit fields in the > inode for seconds and nanoseconds, that will result in the exact > same layout that you have right now, but won't require a conversion > function. I could also have a 30bit and a 34bit field. 30bit is enough for nanoseconds. So many options. Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thursday 17 May 2007, Pekka Enberg wrote: > > Jörn Engel wrote: > > Compressing random data will actually enlarge it. If that happens I > > simply store the verbatim uncompressed data instead and mark it as such. > > > > There is also demand for a user-controlled bit in the inode to disable > > compression completely. All those .jpg, .mpg, .mp3, etc. just waste > > time by trying and failing to compress them. > > So any sane way to enable compression is on per-inode basis which makes > me still wonder why you need per-object compression. 1. it doesn't require user interaction, the file system will do the right thing most of the time. 2. enlarging data is a very bad thing because it makes the behaviour of the fs unpredictable. With uncompressed objects, you have a guaranteed upper bound on the size. Arnd <>< - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thursday 17 May 2007, Jörn Engel wrote: > > > Why not just store 64 bit nanoseconds? that would avoid the problem > > with ns overflow and the year-2038 bug. OTOH, that would require > > a 64 bit integer division when reading the data, so it gets you > > a runtime overhead. > > I like the idea. Do conversion function exist both way? > > What I don't get is the year-2038 bug. Isn't that the 31bit limit, > while 32bit would last to 2106? You're right, you don't hit the 2038 bug here, because you use an unsigned variable. The bug exists elsewhere because time_t tv_sec is signed. Just using nanoseconds probably doesn't gain you much after all then. You could however just have separate 32 bit fields in the inode for seconds and nanoseconds, that will result in the exact same layout that you have right now, but won't require a conversion function. Arnd <>< - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Jörn Engel wrote: Compressing random data will actually enlarge it. If that happens I simply store the verbatim uncompressed data instead and mark it as such. There is also demand for a user-controlled bit in the inode to disable compression completely. All those .jpg, .mpg, .mp3, etc. just waste time by trying and failing to compress them. So any sane way to enable compression is on per-inode basis which makes me still wonder why you need per-object compression. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Hi! > >Yes. These things are almost always implemented _very_ > >badly by the same > >kind of crack-smoking hobo they drag in off the streets > >to write BIOSen. > > > >It's bog-roll technology; if you fancy a laugh try > >doing some real > >reliability tests on them time some. Powerfail testing > >is a good one. > > > >This kind of thing is OK for disposable storage such as > >in digital > >cameras, where it doesn't matter that it's no more > >reliable than a > >floppy disc, but for real long-term storage it's really > >a bad idea. > > > > There are so many flash-based storage and some > disposable storages, > as you pointed out, have poor quality. I think it's > mainly because these > are not designed for good quality, but for lowering the > price. > > These kind of devices are not ready for things like > power failure because > their use case is far from that. For example, removing > flash card > while taking pictures using digital camera is not a > common use case. > (there should be a written notice that this kind of > action is against > the warranty) Hmm.. so operating your camera on batteries should be against the warranty, since batteries commonly run empty while storing pictures? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 17:08:51 +0200, Arnd Bergmann wrote: > On Tuesday 15 May 2007, Jörn Engel wrote: > > Add LogFS, a scalable flash filesystem. > > Sorry for not commenting earlier, there were so many discussions on version > two that I wanted to wait for the fallout of that instead of duplicating > all the comments. You are the last person that has to be sorry. ;) > Here are a few things I notice while going through the third version: > > > +/* > > + * Private errno for accessed beyond end-of-file. Only used internally to > > + * logfs. If this ever gets exposed to userspace or even other parts of > > the > > + * kernel, it is a bug. 256 was chosen as a number sufficiently above all > > + * used errno #defines. > > + * > > + * It can be argued that this is a hack and should be replaced with > > something > > + * else. My last attempt to do this failed spectacularly and there are > > more > > + * urgent problems that users actually care about. This will remain for > > the > > + * moment. Patches are wellcome, of course. > > + */ > > +#define EOF256 > > It should at least be in the kernel-only errno range between 512 and 4095, > that way it can eventually be added to include/linux/errno.h. Fair enough. 512 it is. > > + * Target rename works in three atomic steps: > > + * 1. Attach old inode to new dentry (remember old dentry and new inode) > > + * 2. Remove old dentry (still remember the new inode) > > + * 3. Remove new inode > > + * > > + * Here we remember both an inode an a dentry. If we get interrupted > > + * between steps 1 and 2, we delete both the dentry and the inode. If > > + * we get interrupted between steps 2 and 3, we delete just the inode. > > + * In either case, the remaining objects are deleted on next mount. From > > + * a users point of view, the operation succeeded. > > This description had me confused for a while: why would you remove the > new inode. Maybe change the text to say 'target inode' or 'victim inode'? 'Victim inode' sounds good. Will do. > > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) > > +{ > > + struct inode *inode; > > + > > + if (dir->i_nlink >= LOGFS_LINK_MAX) > > + return -EMLINK; > > Why is i_nlink limited? Don't you run out of space for inodes before > overflowing? I don't know. With the current limit of 2^31, a sufficiently large device can reach the limit. And it is imaginable that overflowing the s32 number space can expose security holes. Not that I actually know, the check is pure paranoia. > > + * In principle, this function should loop forever, looking for GC > > candidates > > + * and moving data. LogFS is designed in such a way that this loop is > > + * guaranteed to terminate. > > + * > > + * Limiting the loop to four iterations serves purely to catch cases when > > + * these guarantees have failed. An actual endless loop is an obvious bug > > + * and should be reported as such. > > + * > > + * But there is another nasty twist to this. As I have described in my LCA > > + * presentation, Garbage collection would have to limit itself to higher > > + * levels if the number of available free segments goes down. This code > > + * doesn't and should fail spectacularly. Yet - hard as I tried I haven't > > + * been able to make it fail (short of a bug elsewhere). > > + * > > + * So in a way this code is intentionally wrong as a desperate cry for a > > + * better testcase. And I do expect to get blamed for it one day. :( > > + */ > > Could you bug the code to reserve fewer segments for GC than you really > need, in order to stress test GC? I could. Wear leveling will cause changes in the area, so I'll have a closer look when implementing that. > > +static struct inode *logfs_alloc_inode(struct super_block *sb) > > +{ > > + struct logfs_inode *li; > > + > > + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL); > > + if (!li) > > + return NULL; > > + logfs_init_inode(&li->vfs_inode); > > + return &li->vfs_inode; > > +} > > + > > + > > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino) > > +{ > > + struct inode *inode; > > + > > + inode = logfs_alloc_inode(sb); > > + if (!inode) > > + return ERR_PTR(-ENOMEM); > > + > > + logfs_init_inode(inode); > > logfs_alloc_inode() returns an initialized inode, so no need to call > logfs_init_inode() again, right? Right. Will change. > > +static __be64 timespec_to_be64(struct timespec tsp) > > +{ > > + u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x); > > + > > + WARN_ON(tsp.tv_nsec > 9); > > + return cpu_to_be64(time); > > +} > > Why not just store 64 bit nanoseconds? that would avoid the problem > with ns overflow and the year-2038 bug. OTOH, that would require > a 64 bit integer division when reading the data, so it gets you > a runtime overhead. I like the idea. Do conversion function exist both way? What I don't get is the year-2038 bug.
Re: [PATCH] LogFS take three
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote: > >My plan was to move this code to lib/ sooner or later. If you consider > >it useful in its current state, I can do it immediatly. And if someone > >else merged a superior btree library I'd happily remove mine and use the > >new one instead. > > > >Opinions? > > Why would we need another btree, when there is lib/rbtree.c? > Or does yours do something fundamentally different? It is not red-black tree, it is b+ tree. > Jan -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 16 2007 02:06, Jörn Engel wrote: > >> > +/* memtree.c */ >> > +void btree_init(struct btree_head *head); >> > +void *btree_lookup(struct btree_head *head, long val); >> > +int btree_insert(struct btree_head *head, long val, void *ptr); >> > +int btree_remove(struct btree_head *head, long val); >> >> These names are too generic. If we later add a btree library: blam. > >My plan was to move this code to lib/ sooner or later. If you consider >it useful in its current state, I can do it immediatly. And if someone >else merged a superior btree library I'd happily remove mine and use the >new one instead. > >Opinions? Why would we need another btree, when there is lib/rbtree.c? Or does yours do something fundamentally different? Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 16 2007 15:53, Jörn Engel wrote: > >My experience is that no matter which name I pick, people will complain >anyway. Previous suggestions included: [...] > >Plus today: >FFFS >flashfs >fredfs >bob >shizzle > >Imo they all suck. LogFS also sucks, but it allows me to make a stupid >joke and keep my logfs.org domain. Try woodfs! (log - wood - get it?) But finding names can be so tiresome, just give it a Borg-style designation - "filesystem 125" or so. fs2007q1, being this quartal's new filesystem. Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 16 2007 22:06, CaT wrote: >On Wed, May 16, 2007 at 01:50:03PM +0200, J??rn Engel wrote: >> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: >> > >> > But if akpm can't pronounce it, how about FFFS for faster flash >> > filesystem ;-) >> >> How many of you have worked for IBM before? Vowels are not evil. ;) >> >> Grouping four or more consonants to name anything will cause similar >> expressions on people's faces. Numbers don't help much either. >> >> Ext2 is a great name, because "ext" actually is a pronouncable syllable. >> MinixFS, ChunkFS, TileFS are great too. XFS and JFS are ok, at least >> they only have three consonants. But FFS exists, so I'd rather go for a >> syllable. > >FlashFS? Or just try once dropping all those redundant 'fs' suffixes. bdev, proc, cpuset, devpts, mqueue, fuse(blk|ctl), vfat, iso9660, etc. Then there's much more space for innovative names. Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 16 2007 14:55, Jörn Engel wrote: >On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote: >> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) >> wrote: >> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: >> > > >> > > But if akpm can't pronounce it, how about FFFS for faster flash >> > > filesystem ;-) >> > >> > How many of you have worked for IBM before? Vowels are not evil. ;) >> >> Do you think 'eieio' is a good set? IBM's work too... C'mon, UIO does not cut IIO either ;-) Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On May 16 2007 13:09, Jörn Engel wrote: >On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote: >> >> Personally I'd just go for 'JFFS3'. After all, it has a better claim to >> the name than either of its predecessors :) > >Did you ever see akpm's facial expression when he tried to pronounce >"JFFS2"? ;) Is there something special with [dʒeɪ ɛf ɛf ɛs tuː]? Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review status (Re: [PATCH] LogFS take three)
On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote: > > Is logfs 32bit fs or 674bit, since although you use 64bit values for > offsets, area management and strange converstions like described below > from offset into segment number are performed in 32bit? > Is it enough for SSD for example to be 32bit only? Or if it is 64bit, > could you please explain logic behind area management? Ignoring bugs and signed return values for error handling, it is either 64bit or 32+32bit. Inode numbers and file positions are 64bit. Offsets are 64bit as well. In a couple of places, offsets are also 32+32bit. Basically the high bits contain the segment number, the lower bits the offset within a segment. Side note: It would be nicer if the high 32bit were segment number. Instead the number of bits depends on segment size. Guess I should change that while the format isn't fixed yet. An "area" is a segment that is currently being written. Data is appended to this segment as it comes in, until the segment is full. Any functions dealing with areas only need a 32bit offset, which is the offset within the area, not the absolute device offset. Writes within an area are also buffered. New data first goes into the write buffer (wbuf) and only when this is full is it flushed to the device. NAND flash and some NOR flashes require such buffering. When writing to the device, the 32bit segno and the 32bit in-segment offset need to get converted back to a 64bit device offset. > I've found that you store segment numbers as 32bit values (for example > in prepare_write()), and convert requested 64bit offset into segment > number via superblock's s_segshift. Yes, as described above. > This conversation seems confusing to me in case of real 64bit offsets. > For example this one obtained via prepare_write: > > 7 1 logfs_prepare_write78 fs/logfs/file.c > 8 2 logfs_readpage_nolock20 fs/logfs/file.c > 9 1 logfs_read_block 351 fs/logfs/readwrite.c > 10 1 logfs_read_loop 139 fs/logfs/readwrite.c > 11 2 logfs_segment_read 108 fs/logfs/readwrite.c > 12 1 wbuf_read 289 > > u32 segno = ofs >> super->s_segshift; > > ofs is originally obtained from inode's li_data array, which is filled > with raw segment numbers which can be 64bit (here is another issue, > since logfs_segment_write() returns signed, so essentially logfs is > 63bit filesystem). The filesystem format is 64bit. The current code can only deal with 63bit. Eric Sandeen just fixed ext2 to actually deal with 32bit numbers and the same is possible for logfs. If anyone ever cares... > But here I've came to area management in logfs, and found that it is > 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() > returns signed 32 bit value (so it is reduced to 31 bit), which is then > placed into li_data as 64bit value. The latter > (__logfs_get_free_bytes()) truncates 64bit data value obtained via > dev_ofs() into signed 32 bit value. That indeed is a bug. __logfs_get_free_bytes() should return s64 instead of s32. Will fix immediatly. If anyone can find similar bugs, the bounty is a beer or non-alcoholic beverage of choice. :) Jörn -- To announce that there must be no criticism of the President, or that we are to stand by the President, right or wrong, is not only unpatriotic and servile, but is morally treasonable to the American public. -- Theodore Roosevelt, Kansas City Star, 1918 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review status (Re: [PATCH] LogFS take three)
Hi Jörn. Is logfs 32bit fs or 674bit, since although you use 64bit values for offsets, area management and strange converstions like described below from offset into segment number are performed in 32bit? Is it enough for SSD for example to be 32bit only? Or if it is 64bit, could you please explain logic behind area management? I've found that you store segment numbers as 32bit values (for example in prepare_write()), and convert requested 64bit offset into segment number via superblock's s_segshift. This conversation seems confusing to me in case of real 64bit offsets. For example this one obtained via prepare_write: 7 1 logfs_prepare_write78 fs/logfs/file.c 8 2 logfs_readpage_nolock20 fs/logfs/file.c 9 1 logfs_read_block 351 fs/logfs/readwrite.c 10 1 logfs_read_loop 139 fs/logfs/readwrite.c 11 2 logfs_segment_read 108 fs/logfs/readwrite.c 12 1 wbuf_read 289 u32 segno = ofs >> super->s_segshift; ofs is originally obtained from inode's li_data array, which is filled with raw segment numbers which can be 64bit (here is another issue, since logfs_segment_write() returns signed, so essentially logfs is 63bit filesystem). But here I've came to area management in logfs, and found that it is 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() returns signed 32 bit value (so it is reduced to 31 bit), which is then placed into li_data as 64bit value. The latter (__logfs_get_free_bytes()) truncates 64bit data value obtained via dev_ofs() into signed 32 bit value. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007, Jörn Engel wrote: > Add LogFS, a scalable flash filesystem. Hi Jörn, Sorry for not commenting earlier, there were so many discussions on version two that I wanted to wait for the fallout of that instead of duplicating all the comments. Here are a few things I notice while going through the third version: > +/* > + * Private errno for accessed beyond end-of-file. Only used internally to > + * logfs. If this ever gets exposed to userspace or even other parts of the > + * kernel, it is a bug. 256 was chosen as a number sufficiently above all > + * used errno #defines. > + * > + * It can be argued that this is a hack and should be replaced with something > + * else. My last attempt to do this failed spectacularly and there are more > + * urgent problems that users actually care about. This will remain for the > + * moment. Patches are wellcome, of course. > + */ > +#define EOF 256 It should at least be in the kernel-only errno range between 512 and 4095, that way it can eventually be added to include/linux/errno.h. > + * Target rename works in three atomic steps: > + * 1. Attach old inode to new dentry (remember old dentry and new inode) > + * 2. Remove old dentry (still remember the new inode) > + * 3. Remove new inode > + * > + * Here we remember both an inode an a dentry. If we get interrupted > + * between steps 1 and 2, we delete both the dentry and the inode. If > + * we get interrupted between steps 2 and 3, we delete just the inode. > + * In either case, the remaining objects are deleted on next mount. From > + * a users point of view, the operation succeeded. This description had me confused for a while: why would you remove the new inode. Maybe change the text to say 'target inode' or 'victim inode'? > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) > +{ > + struct inode *inode; > + > + if (dir->i_nlink >= LOGFS_LINK_MAX) > + return -EMLINK; Why is i_nlink limited? Don't you run out of space for inodes before overflowing? > + * In principle, this function should loop forever, looking for GC candidates > + * and moving data. LogFS is designed in such a way that this loop is > + * guaranteed to terminate. > + * > + * Limiting the loop to four iterations serves purely to catch cases when > + * these guarantees have failed. An actual endless loop is an obvious bug > + * and should be reported as such. > + * > + * But there is another nasty twist to this. As I have described in my LCA > + * presentation, Garbage collection would have to limit itself to higher > + * levels if the number of available free segments goes down. This code > + * doesn't and should fail spectacularly. Yet - hard as I tried I haven't > + * been able to make it fail (short of a bug elsewhere). > + * > + * So in a way this code is intentionally wrong as a desperate cry for a > + * better testcase. And I do expect to get blamed for it one day. :( > + */ Could you bug the code to reserve fewer segments for GC than you really need, in order to stress test GC? > +static struct inode *logfs_alloc_inode(struct super_block *sb) > +{ > + struct logfs_inode *li; > + > + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL); > + if (!li) > + return NULL; > + logfs_init_inode(&li->vfs_inode); > + return &li->vfs_inode; > +} > + > + > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino) > +{ > + struct inode *inode; > + > + inode = logfs_alloc_inode(sb); > + if (!inode) > + return ERR_PTR(-ENOMEM); > + > + logfs_init_inode(inode); logfs_alloc_inode() returns an initialized inode, so no need to call logfs_init_inode() again, right? > +static __be64 timespec_to_be64(struct timespec tsp) > +{ > + u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x); > + > + WARN_ON(tsp.tv_nsec > 9); > + return cpu_to_be64(time); > +} Why not just store 64 bit nanoseconds? that would avoid the problem with ns overflow and the year-2038 bug. OTOH, that would require a 64 bit integer division when reading the data, so it gets you a runtime overhead. > +static void logfs_read_inode(struct inode *inode) > +{ > + int ret; > + > + BUG_ON(inode->i_ino == LOGFS_INO_MASTER); > + > + ret = __logfs_read_inode(inode); > + > + /* What else can we do here? */ > + BUG_ON(ret); > +} ext2 returns make_bad_inode(inode) in this case, which seems to be a better solution than crashing. > +int __logfs_write_inode(struct inode *inode) > +{ > + /* > + * FIXME: Those two inodes are 512 bytes in total. Not good to > + * have on the stack. Possibly the best solution would be to bite > + * the bullet and do another format change before release and > + * shrink the inodes. > + */ > + struct logfs_disk_inode old, new; > + > + BUG_ON(inode->i_ino == LOGFS_INO_MASTER); > + > + /* read and compare the inode first. If it hasn't
Re: [PATCH] LogFS take three
On Thu, 17 May 2007 16:43:59 +0800, David Woodhouse wrote: > > > As I mentioned, some techniques like log-structured filesystem could > > perform generally better on any kind of flash-based storage with FTL. > > Although there are many kinds of FTL, it is commonly true that > > it performs well under workload where sequential write is dominant. > > Yes, it's certainly possible that we _could_ write a file system which > is specifically targeted at FTL -- I was just wondering why anyone would > _bother_ :) Haven't you done that already? JFFS2 write behaviour is the best-case scenario for any FTL. When the delta cache is finished, LogFS will be pretty close to that as well. Not sure if anyone would specifically target FTL. Being well-suited for those beasts is just a side-effect. The FTL is still a net loss. Without that FAT enabling layer a real flash filesystem would be more efficient. Jörn -- Prosperity makes friends, adversity tries them. -- Publilius Syrus - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tuesday 15 May 2007, Jörn Engel wrote: > > > I've been semi watching this, and the only comment I really can give > > is that I hate the name. To me, logfs implies a filesystem for > > logging purposes, not for Flash hardware with wear leveling issues to > > be taken into account. > > Yeah, well, ... > > Two years ago when I started all this, I was looking for a good name. > All I could come up with sounded stupid, so I picked "LogFS" as a code > name. As soon as I find a better name, the code name should get > replaced. > When doing a google search on logfs, there are less than five results among the first 100 that don't refer to your work. The other two listed in there are also log-structured file systems: The inferno flash file system (http://inferno-os.googlecode.com/svn/trunk/liblogfs/) and the (discontinued) file system named lfs from the 2005 google summer of code. I'd say the name should stay, changing it now can only add more confusion. Arnd <>< - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 09:12 +, Pavel Machek wrote: > Nah, it would lead to Jorn disappearing misteriously and _Pavel_ > accused of murder ;-). Are you suggesting that you would murder Jörn (you misspelled his name) merely for the heinous crime of using his own name? Your Luddism was already quite excessive, but now you really _are_ taking it to extremes. :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
> >> My experience is that no matter which name I pick, > >people will > >> complain > >> anyway. Previous suggestions included: > >> jffs3 > >> jefs > >> engelfs > >> poofs > >> crapfs > >> sweetfs > >> cutefs > >> dynamic journaling fs - djofs > >> tfsfkal - the file system formerly known as logfs > > > >Can we call it jörnfs? :) > > However if Jörn is accused of murder, it will have > little chance of > being merged :-). Nah, it would lead to Jorn disappearing misteriously and _Pavel_ accused of murder ;-). -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 17:20 +0900, Dongjun Shin wrote: > There are, of course, cases where direct access are better. > However, as the demand for capacity, reliability and high performance > for the flash storage increases, the use of FTL with embedded controller > would be inevitable. > > - The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to > the use of multiple flash in parallel and the use of high degree ECC > algorithm. > There are other things like bad block handling and wear-leveling issues > which has been recently touched by UBI with added SW complexity. You don't get rid of that complexity by offloading it to a µcontroller. The only thing you achieve that way is making sure it's quite likely to be done badly, and making it impossible to debug. > - In contrast to the embedded environment where CPU and flash is directly > connected, the I/O path between CPU and flash in PC environment is longer. > The latency for SW handshaking between CPU and flash will also be longer, > which would make the performance optimization harder. Do it the naïve way with a single byte push/pull and waggling the control lines separately, and what you say is true -- but you can have flash controllers which assist with data transfer but still give you essentially 'raw' access to the chip. With the CAFÉ controller designed for the OLPC machine, we can spew data across the PCI bus just as fast as we can suck it off the flash chip. > As I mentioned, some techniques like log-structured filesystem could > perform generally better on any kind of flash-based storage with FTL. > Although there are many kinds of FTL, it is commonly true that > it performs well under workload where sequential write is dominant. Yes, it's certainly possible that we _could_ write a file system which is specifically targeted at FTL -- I was just wondering why anyone would _bother_ :) I've seen an interesting file system which does have a kind of FTL internally as its lowest layer, and which build on that using 'virtual' sectors for file extents. Now that _does_ have its advantages -- but it doesn't go as far as pretending to be a 'normal' block device; it's its own special thing for internal use within that file system. > I also expect that FTL for PC environment will have better quality spec > than the disposable storage. There really is no reason why FTL has to be done badly; just as there's no _reason_ why hardware vendors have to give us crappy bsVendorCode. Nevertheless, that's the way the world tends to be. So good luck shipping with that :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On 5/17/07, David Woodhouse <[EMAIL PROTECTED]> wrote: Yes. These things are almost always implemented _very_ badly by the same kind of crack-smoking hobo they drag in off the streets to write BIOSen. It's bog-roll technology; if you fancy a laugh try doing some real reliability tests on them time some. Powerfail testing is a good one. This kind of thing is OK for disposable storage such as in digital cameras, where it doesn't matter that it's no more reliable than a floppy disc, but for real long-term storage it's really a bad idea. There are so many flash-based storage and some disposable storages, as you pointed out, have poor quality. I think it's mainly because these are not designed for good quality, but for lowering the price. These kind of devices are not ready for things like power failure because their use case is far from that. For example, removing flash card while taking pictures using digital camera is not a common use case. (there should be a written notice that this kind of action is against the warranty) There's little point in optimising a file system _specifically_ for devices which in often aren't reliable enough to keep your data anyway. You might as well use ramfs. It's unfortunate really -- there's no _fundamental_ reason why FTL has to be done so badly; it's just that it almost always is. Direct access to the flash from Linux is _always_ going to be better in practice -- and that way you avoid the problems with dual journalling, along with the problems with the underlying FTL continuing to keep (and copy around during GC) sectors which the top-level filesystem has actually deallocated, etc. There are, of course, cases where direct access are better. However, as the demand for capacity, reliability and high performance for the flash storage increases, the use of FTL with embedded controller would be inevitable. - The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to the use of multiple flash in parallel and the use of high degree ECC algorithm. There are other things like bad block handling and wear-leveling issues which has been recently touched by UBI with added SW complexity. - In contrast to the embedded environment where CPU and flash is directly connected, the I/O path between CPU and flash in PC environment is longer. The latency for SW handshaking between CPU and flash will also be longer, which would make the performance optimization harder. As I mentioned, some techniques like log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under workload where sequential write is dominant. I also expect that FTL for PC environment will have better quality spec than the disposable storage. Dongjun - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Thu, 2007-05-17 at 15:12 +0900, Dongjun Shin wrote: > The current trend of flash-based device is to hide the flash-specific details > from the host OS. The flash memory is encapsulated in a package > which contains a dedicated controller where a small piece of software (F/W or > FTL) > runs and makes the storage shown as a block device to the host. Yes. These things are almost always implemented _very_ badly by the same kind of crack-smoking hobo they drag in off the streets to write BIOSen. It's bog-roll technology; if you fancy a laugh try doing some real reliability tests on them time some. Powerfail testing is a good one. This kind of thing is OK for disposable storage such as in digital cameras, where it doesn't matter that it's no more reliable than a floppy disc, but for real long-term storage it's really a bad idea. > IMHO, for a flash-optimized filesystem to be useful and widely-used, it would > be better > to run on a block device and to be designed to run efficiently on top of the > FTL. > (ex. log-structured filesystem on general block device) There's little point in optimising a file system _specifically_ for devices which in often aren't reliable enough to keep your data anyway. You might as well use ramfs. It's unfortunate really -- there's no _fundamental_ reason why FTL has to be done so badly; it's just that it almost always is. Direct access to the flash from Linux is _always_ going to be better in practice -- and that way you avoid the problems with dual journalling, along with the problems with the underlying FTL continuing to keep (and copy around during GC) sectors which the top-level filesystem has actually deallocated, etc. -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 19:17:18 +, Pavel Machek wrote: > > In kernel fsck > > > --- /dev/null 2007-04-18 05:32:26.652341749 +0200 > > +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 > > +0200 > > @@ -0,0 +1,332 @@ > > +/* > > + * fs/logfs/prog/fsck.c- filesystem check > > + * > > + * As should be obvious for Linux kernel code, license is GPLv2 > > + * > > + * Copyright (c) 2005-2007 Joern Engel > > + * > > + * In principle this could get moved to userspace. However it might still > > + * make some sense to keep it in the kernel. It is a pure checker and will > > + * only report problems, not attempt to repair them. > > + */ > > Is there version that repairs? No. > BUG is not right thing to do for media error. I know. Top 3 items of my todo list are: - Handle system crashes - Add second journal - Error handling > > + > > +#if 0 > > +/* rootdir */ > > Please just delete it, not comment it out like this. That will get resurrected, even before the move to userspace. I had to change the filesystem format for compression support and this is an artifact of the transition phase. Jörn -- Ninety percent of everything is crap. -- Sturgeon's Law - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Hi! In kernel fsck > --- /dev/null 2007-04-18 05:32:26.652341749 +0200 > +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 > +0200 > @@ -0,0 +1,332 @@ > +/* > + * fs/logfs/prog/fsck.c - filesystem check > + * > + * As should be obvious for Linux kernel code, license is GPLv2 > + * > + * Copyright (c) 2005-2007 Joern Engel > + * > + * In principle this could get moved to userspace. However it might still > + * make some sense to keep it in the kernel. It is a pure checker and will > + * only report problems, not attempt to repair them. > + */ Is there version that repairs? > + /* Some segments are reserved. Just pretend they were all valid */ > + reserved = btree_lookup(&super->s_reserved_segments, segno); > + if (reserved) > + return 0; > + > + err = wbuf_read(sb, dev_ofs(sb, segno, 0), sizeof(sh), &sh); > + BUG_ON(err); BUG is not right thing to do for media error. > +/* > + * fs/logfs/prog/mkfs.c - filesystem generation > + * > + * As should be obvious for Linux kernel code, license is GPLv2 > + * > + * Copyright (c) 2005-2007 Joern Engel > + * > + * Should get moved to userspace. > + */ Indeed. > + > +#if 0 > +/* rootdir */ Please just delete it, not comment it out like this. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 23:49:55 +0800, David Woodhouse wrote: > > Utility is a factor of the underlying design -- a filesystem designed > for flash really isn't suited to block devices. I can think of at least three examples where LogFS would indeed make sense on block devices. Jörn -- Happiness isn't having what you want, it's wanting what you have. -- unknown - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 08:34 -0700, Andrew Morton wrote: > Reduced testability, mainly. Also potentially reduced usefulness. CONFIG_MTD has never been a barrier to testability. JFFS2 depends on MTD and had _most_ of its early testing and development done on the 'fake' mtdram device. Utility is a factor of the underlying design -- a filesystem designed for flash really isn't suited to block devices. -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 20:07:18 +0800 David Woodhouse <[EMAIL PROTECTED]> wrote: > > It's strange and a bit regrettable that an fs would have dependency on MTD, > > really. > > Why? Other file systems has dependencies on BLOCK or on NET. It seems > entirely normal to me. Reduced testability, mainly. Also potentially reduced usefulness. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 22:04 +0800, David Woodhouse wrote: > On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote: > > > > My experience is that no matter which name I pick, people will > > complain > > anyway. Previous suggestions included: > > jffs3 > > jefs > > engelfs > > poofs > > crapfs > > sweetfs > > cutefs > > dynamic journaling fs - djofs > > tfsfkal - the file system formerly known as logfs > > Can we call it jörnfs? :) And it is essential to preserve "ö" and let Pavel enjoy :-) -- Best regards, Artem Bityutskiy (Битюцкий Артём) - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 03:53:19PM +0200, J??rn Engel wrote: > Imo they all suck. LogFS also sucks, but it allows me to make a stupid > joke and keep my logfs.org domain. Well if stupid jokes are a goer there's always gordonfs. :) *hides* -- "To the extent that we overreact, we proffer the terrorists the greatest tribute." - High Court Judge Michael Kirby - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote: On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote: > > My experience is that no matter which name I pick, people will > complain > anyway. Previous suggestions included: > jffs3 > jefs > engelfs > poofs > crapfs > sweetfs > cutefs > dynamic journaling fs - djofs > tfsfkal - the file system formerly known as logfs Can we call it jörnfs? :) However if Jörn is accused of murder, it will have little chance of being merged :-). - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote: > > My experience is that no matter which name I pick, people will > complain > anyway. Previous suggestions included: > jffs3 > jefs > engelfs > poofs > crapfs > sweetfs > cutefs > dynamic journaling fs - djofs > tfsfkal - the file system formerly known as logfs Can we call it jörnfs? :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote: > Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: > > Jörn> How many of you have worked for IBM before? Vowels are not > evil. ;) > > Nope, they're not. I just think that LogFS isn't descriptive enough, > or more accurately, is the *wrong* description of this filesystem. That was the whole point. JFFS2, the journaling flash filesystem, is a strictly log-structured filesystem. LogFS has a journal. It is also the filesystem that tries to scale logarithmically, as Arnd has noted. Maybe I should call it Log2 to emphesize this point. Log1 would be horrible scalability. > flashfs works for me. It's longer, but hey, that's ok. Even flshfs > might work. Oh wait, flesh? flash? flush? Too confusing... :-) Maybe. FFS or flash filesystem already exists. And YAFFS, yet another flash filesystem, would be older than flashfs. My experience is that no matter which name I pick, people will complain anyway. Previous suggestions included: jffs3 jefs engelfs poofs crapfs sweetfs cutefs dynamic journaling fs - djofs tfsfkal - the file system formerly known as logfs Plus today: FFFS flashfs fredfs bob shizzle Imo they all suck. LogFS also sucks, but it allows me to make a stupid joke and keep my logfs.org domain. Jörn -- There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 15:36:44 +0300, Pekka Enberg wrote: > On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote: > > > >More trouble? > > Forgot to add (see below). Seems logfs_segment_read would be simpler > too if you fixed this. Would it? I think that code would still be needed, although possibly in a different function. There are two minor drawbacks to using the page cache, btw: - Indirect blocks need some mapping too. So either I need to steal a bit from the inode space or from the fpos space. - OOM handling is a bit more complicated. I would need a mempool for that. > >[ Objects are the units that get compressed. Segments can contain both > >compressed and uncompressed objects. ] > > > >It is a trade-off. Each object has a 24 Byte header plus X Bytes of > >data. Whether the data is compressed or not is indicated in the header. > > Was my point really. Why do segments contain both compressed and > uncompressed objects? Compressing random data will actually enlarge it. If that happens I simply store the verbatim uncompressed data instead and mark it as such. There is also demand for a user-controlled bit in the inode to disable compression completely. All those .jpg, .mpg, .mp3, etc. just waste time by trying and failing to compress them. Jörn -- Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. -- Doug MacIlroy - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 15:08:15 +0300, Pekka Enberg wrote: > On 5/16/07, Jamie Lokier <[EMAIL PROTECTED]> wrote: > >Given that the filesystem is still 'experimental', I'd concentrate on > >getting it stable before worrying about immutable and xattrs unless > >they are easy. > > We will run into trouble if the on-disk format is not flexible enough > to accommodate xattrs (think reiser3 here). So I'd worry about it > before merging to mainline. Adding xattrs would be fairly simple. Inodes just need one extra pointer for that. Luckily inodes no longer need to be padded to 128 or 256 bytes. They are individually compressed, so their size is not limited to powers of two. Jörn -- To recognize individual spam features you have to try to get into the mind of the spammer, and frankly I want to spend as little time inside the minds of spammers as possible. -- Paul Graham - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote: > On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) > wrote: > > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: > > > > > > But if akpm can't pronounce it, how about FFFS for faster flash > > > filesystem ;-) > > > > How many of you have worked for IBM before? Vowels are not evil. ;) > > Do you think 'eieio' is a good set? IBM's work too... I will let someone else comment on that one. http://www.uwsg.iu.edu/hypermail/linux/kernel/0110.1/1294.html Jörn -- There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 13:25:48 +0100, Jamie Lokier wrote: > > Is LogFS really slower than JFFS2 in practice? Not sure. I ran a benchmark before adding compression support in QEMU with a lightning-fast device. So the results should differ quite a bit from practice. http://logfs.org/~joern/logfs/benchmark/benchmark_overview LogFS was actually faster than JFFS2. So for that particular unrealistic benchmark, updating the LogFS tree was less expensive than trying (and failing) to compress and calculating the CRC was for JFFS2. With compression finished, I would expect LogFS numbers to degrade. If file data had checksums (not done yet, should be optional for users to decide) even more so. > I would have guessed reads to be a similar speed, tree updates to be a > similar speed to journal updates for sustained non-fsyncing writes, > and the difference unimportant for tiny individual commits whose index > updates are not merged with any other. I've not thought about it much > though. LogFS isn't that good yet. Right now, writing 10 adjacent blocks to a file requires 10 tree updates instead of 1. Not full updates though, just up to the inode. Quite surprisingly, read speed in the benchmark was significantly better for LogFS, even after substracting mount time. I don't know if all of that can be explained with CRC checks or there is more to it. Jörn -- I can say that I spend most of my time fixing bugs even if I have lots of new features to implement in mind, but I give bugs more priority. -- Andrea Arcangeli, 2000 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 01:50:03PM +0200, J??rn Engel wrote: > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: > > > > But if akpm can't pronounce it, how about FFFS for faster flash > > filesystem ;-) > > How many of you have worked for IBM before? Vowels are not evil. ;) > > Grouping four or more consonants to name anything will cause similar > expressions on people's faces. Numbers don't help much either. > > Ext2 is a great name, because "ext" actually is a pronouncable syllable. > MinixFS, ChunkFS, TileFS are great too. XFS and JFS are ok, at least > they only have three consonants. But FFS exists, so I'd rather go for a > syllable. FlashFS? -- "To the extent that we overreact, we proffer the terrorists the greatest tribute." - High Court Judge Michael Kirby - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On 5/16/07, Pekka Enberg <[EMAIL PROTECTED]> wrote: Forgot to add (see below). Seems logfs_segment_read would be simpler too if you fixed this. Blah. Just to be clear: I forgot to add a "(see below)" text in the original review comment. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote: > > +/* FIXME: all this mess should get replaced by using the page cache */ > > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area > *area, > > + void *read, u64 ofs, size_t readlen) > > +{ > > Indeed. And I think you're getting some more trouble because of this... More trouble? Forgot to add (see below). Seems logfs_segment_read would be simpler too if you fixed this. [ Objects are the units that get compressed. Segments can contain both compressed and uncompressed objects. ] It is a trade-off. Each object has a 24 Byte header plus X Bytes of data. Whether the data is compressed or not is indicated in the header. Was my point really. Why do segments contain both compressed and uncompressed objects? - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote: > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: > > > > But if akpm can't pronounce it, how about FFFS for faster flash > > filesystem ;-) > > How many of you have worked for IBM before? Vowels are not evil. ;) Do you think 'eieio' is a good set? IBM's work too... We do not get new filesystem each couple of month, so stealing so good name as 'logfs' is not that bad I think. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 13:21:11 +0300, Pekka J Enberg wrote: > > > +#define LOGFS_BUG(sb) do { \ > > + struct super_block *__sb = sb; \ > > + logfs_crash_dump(__sb); \ > > + BUG(); \ > > +} while(0) > > Note that BUG() can be a no-op so dumping something on disk might not make > sense there. This seems useful, but you probably need to make this bit > more generic so that using BUG() proper in your filesystem code does the > right thing. Inventing your own wrapper should be avoided. Hmm. I am not sure how this could be generic and still make sense. LogFS has some unused write-once space in the superblock segment. Embedded people always have problems debugging and were suggesting using this to store debug information. That allows me to ask for a filesystem image and get both the offending image plus a crash dump. It also allows me to abort mounting if I ever see an existing crash dump (not implemented yet). "First failure data capture" was an old IBM slogal and the "first" really makes a difference. So how should I deal with BUG() as a no-op? I _think_ it should not make a difference. People choosing that option should know that their kernels will continue running with errors and can potentially do any kind of damage. My wrapper doesn't seem to change that one way or another. What I _should_ do is make the filesystem read-only after this point. That would actually be quite simple. Onto my list. > > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb) > > +{ > > + return sb->s_fs_info; > > +} > > + > > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) > > +{ > > + return container_of(inode, struct logfs_inode, vfs_inode); > > +} > > No need for upper case in function names. Will change. > > +int logfs_memcpy(void *in, void *out, size_t inlen, size_t outlen) > > +{ > > + if (outlen < inlen) > > + return -EIO; > > + memcpy(out, in, inlen); > > + return inlen; > > +} > > Please drop this wrapper function. It's better to open-code the error > handling in the callers (there are total of three of them). Is it? The advantage of having the wrapper is that the compression case and the non-compressed case look identical and the code just repeats the same pattern twice. I'll try to open-code it and see how it looks. > > +/* FIXME: combine with per-sb journal variant */ > > +static unsigned char compressor_buf[LOGFS_MAX_OBJECTSIZE]; > > +static DEFINE_MUTEX(compr_mutex); > > This looks fishy. All reads and writes are serialized by compr_mutex > because they share a scratch buffer for compression and uncompression? s/fishy/lame/ Will move to superblock. > > +/* FIXME: all this mess should get replaced by using the page cache */ > > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area > *area, > > + void *read, u64 ofs, size_t readlen) > > +{ > > Indeed. And I think you're getting some more trouble because of this... More trouble? Sadly squeezing bugs out of that beast was easier than wrapping my brain around the proper solution. And now it works and has moved to a less urgent position in my todo list. > > +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs) > > +{ > > + struct logfs_object_header *h; > > + u16 len; > > + int err, bs = sb->s_blocksize; > > + > > + mutex_lock(&compr_mutex); > > + err = wbuf_read(sb, ofs, bs + LOGFS_HEADERSIZE, compressor_buf); > > + if (err) > > + goto out; > > + h = (void*)compressor_buf; > > + len = be16_to_cpu(h->len); > > + > > + switch (h->compr) { > > + case COMPR_NONE: > > + logfs_memcpy(compressor_buf + LOGFS_HEADERSIZE, buf, bs, > bs); > > + break; > > Seems wasteful to first read the data in a scratch buffer and then > memcpy() it immediately for the COMPR_NONE case. Any reason why we can't > read a full struct page, for example, and simply use that if we don't need > to uncompress anything? No technical reason. Just a case of "make it work now and optimize later". > > + case COMPR_ZLIB: > > + err = logfs_uncompress(compressor_buf + LOGFS_HEADERSIZE, > buf, > > + len, bs); > > + BUG_ON(err); > > + break; > > Not claiming to undestand your on-disk format, but wouldn't it make more > sense if we knew whether a given segment is compressed or not _before_ we > actually read it? [ Objects are the units that get compressed. Segments can contain both compressed and uncompressed objects. ] It is a trade-off. Each object has a 24 Byte header plus X Bytes of data. Whether the data is compressed or not is indicated in the header. So we have to do one of two things. Either: - read the complete object, then either uncompress or memcpy, or - read just the header, then either read uncompressed data or read compressed data and uncompre
Re: [PATCH] LogFS take three
Artem Bityutskiy wrote: > On Wed, 2007-05-16 at 12:34 +0100, Jamie Lokier wrote: > > Jörn Engel wrote: > > > On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote: > > > > Personally I'd just go for 'JFFS3'. After all, it has a better claim to > > > > the name than either of its predecessors :) > > > > > > Did you ever see akpm's facial expression when he tried to pronounce > > > "JFFS2"? ;) > > > > JFFS3 is a good, meaningful name to anyone familiar with JFFS2. > > > > But if akpm can't pronounce it, how about FFFS for faster flash > > filesystem ;-) > > The problem is that JFFS2 will always be faster in terms of I/O speed > anyway, just because it does not have to maintain on-flash indexing > data structures. But yes, it is slow in mount and in building big > inodes, so the "faster" is confusing. Is LogFS really slower than JFFS2 in practice? I would have guessed reads to be a similar speed, tree updates to be a similar speed to journal updates for sustained non-fsyncing writes, and the difference unimportant for tiny individual commits whose index updates are not merged with any other. I've not thought about it much though. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 2007-05-15 at 13:37 -0700, Andrew Morton wrote: > But it includes an MTD header file. > > Can this code be tested by people who don't have MTD hardware? We used to > ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some > disrepair. Maybe it got repaired. Or removed. I can't immediately locate > it... There's block2mtd, as Jörn pointed out. There's also 'nandsim' and 'mtdram', which will vmalloc some space and use that to emulate NAND or NOR flash, respectively. > It's strange and a bit regrettable that an fs would have dependency on MTD, > really. Why? Other file systems has dependencies on BLOCK or on NET. It seems entirely normal to me. -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On 5/16/07, Jamie Lokier <[EMAIL PROTECTED]> wrote: Given that the filesystem is still 'experimental', I'd concentrate on getting it stable before worrying about immutable and xattrs unless they are easy. We will run into trouble if the on-disk format is not flexible enough to accommodate xattrs (think reiser3 here). So I'd worry about it before merging to mainline. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote: > > But if akpm can't pronounce it, how about FFFS for faster flash > filesystem ;-) How many of you have worked for IBM before? Vowels are not evil. ;) Grouping four or more consonants to name anything will cause similar expressions on people's faces. Numbers don't help much either. Ext2 is a great name, because "ext" actually is a pronouncable syllable. MinixFS, ChunkFS, TileFS are great too. XFS and JFS are ok, at least they only have three consonants. But FFS exists, so I'd rather go for a syllable. Jörn -- Joern's library part 5: http://www.faqs.org/faqs/compression-faq/part2/section-9.html - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Albert Cahalan wrote: > Please don't forget the immutable bit. ("man lsattr") > Having both, BSD-style, would be even better. > The immutable bit is important for working around > software bugs and "features" that damage files. > > I also can't find xattr support. Imho, Given that the filesystem is still 'experimental', I'd concentrate on getting it stable before worrying about immutable and xattrs unless they are easy. (Immutable is easy). In 13 years of using Linux in all sorts of ways I've never yet to use either feature. They would be good to have, but stability in a filesystem is much more useful. I'm biased of course: if LogFS were stable and well tested, I'd be using it right now in my embedded thingy - and that doesn't even bother with uids :-) -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 2007-05-16 at 12:34 +0100, Jamie Lokier wrote: > Jörn Engel wrote: > > On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote: > > > Personally I'd just go for 'JFFS3'. After all, it has a better claim to > > > the name than either of its predecessors :) > > > > Did you ever see akpm's facial expression when he tried to pronounce > > "JFFS2"? ;) > > JFFS3 is a good, meaningful name to anyone familiar with JFFS2. > > But if akpm can't pronounce it, how about FFFS for faster flash > filesystem ;-) The problem is that JFFS2 will always be faster in terms of I/O speed anyway, just because it does not have to maintain on-flash indexing data structures. But yes, it is slow in mount and in building big inodes, so the "faster" is confusing. -- Best regards, Artem Bityutskiy (Битюцкий Артём) - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 15 May 2007 19:37:36 -0700, Roland Dreier wrote: > > There are rather a lot of of FIXME comments, including scary stuff like > > > + /* > > + * FIXME: this cannot be right but it does "fix" a bug of i_count > > + * dropping too low. Needs more thought. > > + */ > > + atomic_inc(&old_dentry->d_inode->i_count); Any thoughts on this would be appreciated. > and > > > +int __logfs_write_inode(struct inode *inode) > > +{ > > + /* > > + * FIXME: Those two inodes are 512 bytes in total. Not good to > > + * have on the stack. Possibly the best solution would be to bite > > + * the bullet and do another format change before release and > > + * shrink the inodes. > > + */ > > + struct logfs_disk_inode old, new; > > are you going to change the format? or fix this some other way? I would love to put my inodes on a diet. It is just a matter of time and priorities. To me the 512 bytes on the stack are unfortunate, but not a show stopper. Crash behaviour is, so that has priority. > I think a sweep through the code searching for FIXME and at least > rewriting all such comments to look like stuff that can be deferred > would be warranted ;) Are you asking me to hide known problems under a rug? ;) Will see if I can easily fix some of these. In particular the eat-your-data FIXME that can cause LogFS to not survive a crash. Jörn -- Correctness comes second. Features come third. Performance comes last. Maintainability is easily forgotten. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Jörn Engel wrote: > On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote: > > Personally I'd just go for 'JFFS3'. After all, it has a better claim to > > the name than either of its predecessors :) > > Did you ever see akpm's facial expression when he tried to pronounce > "JFFS2"? ;) JFFS3 is a good, meaningful name to anyone familiar with JFFS2. But if akpm can't pronounce it, how about FFFS for faster flash filesystem ;-) -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote: > > Personally I'd just go for 'JFFS3'. After all, it has a better claim to > the name than either of its predecessors :) Did you ever see akpm's facial expression when he tried to pronounce "JFFS2"? ;) Jörn -- Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. -- Rob Pike - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 07:22:54 +0200, Willy Tarreau wrote: > > On hard disks, yes, but as you suggested, there are lots of other flash > devices interfaced as block devices. CompactFlash comes to mind, USB > keys too. And on these ones, the most important is to reduce the number > of writes and to support large sizes. I already see LogFS as an interesting > alternative to JFFS2 on such devices, eventhough it does not (yet?) support > compression. It does support compression now. That part was finished, erm, about two weeks ago. Logfs on Smartmedia format (I assume most CompactFlash, USB, SD and MMC media use that) will currently cause a nasty interaction. Unlike JFFS2, logfs will alternately write to several eraseblocks. On real flash, alternating between half-finished blocks is free. Smartmedia standard requires to finish the last block before moving to the next. So the logfs write pattern will cause it to add an erase/write overhead of - depending on erase size and compression - 4x to 512x. Pretty bad. If Smartmedia was really stupid, there would be no overhead at all. Any manucaturers listening should seriously think about allowing raw flash access to their devices. ;) Jörn -- To recognize individual spam features you have to try to get into the mind of the spammer, and frankly I want to spend as little time inside the minds of spammers as possible. -- Paul Graham - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
Hi Joern, > +#define LOGFS_BUG(sb) do { \ > + struct super_block *__sb = sb; \ > + logfs_crash_dump(__sb); \ > + BUG(); \ > +} while(0) Note that BUG() can be a no-op so dumping something on disk might not make sense there. This seems useful, but you probably need to make this bit more generic so that using BUG() proper in your filesystem code does the right thing. Inventing your own wrapper should be avoided. > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb) > +{ > + return sb->s_fs_info; > +} > + > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) > +{ > + return container_of(inode, struct logfs_inode, vfs_inode); > +} No need for upper case in function names. > +int logfs_memcpy(void *in, void *out, size_t inlen, size_t outlen) > +{ > + if (outlen < inlen) > + return -EIO; > + memcpy(out, in, inlen); > + return inlen; > +} Please drop this wrapper function. It's better to open-code the error handling in the callers (there are total of three of them). > +/* FIXME: combine with per-sb journal variant */ > +static unsigned char compressor_buf[LOGFS_MAX_OBJECTSIZE]; > +static DEFINE_MUTEX(compr_mutex); This looks fishy. All reads and writes are serialized by compr_mutex because they share a scratch buffer for compression and uncompression? > +/* FIXME: all this mess should get replaced by using the page cache */ > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area *area, > + void *read, u64 ofs, size_t readlen) > +{ Indeed. And I think you're getting some more trouble because of this... > +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs) > +{ > + struct logfs_object_header *h; > + u16 len; > + int err, bs = sb->s_blocksize; > + > + mutex_lock(&compr_mutex); > + err = wbuf_read(sb, ofs, bs + LOGFS_HEADERSIZE, compressor_buf); > + if (err) > + goto out; > + h = (void*)compressor_buf; > + len = be16_to_cpu(h->len); > + > + switch (h->compr) { > + case COMPR_NONE: > + logfs_memcpy(compressor_buf + LOGFS_HEADERSIZE, buf, bs, bs); > + break; Seems wasteful to first read the data in a scratch buffer and then memcpy() it immediately for the COMPR_NONE case. Any reason why we can't read a full struct page, for example, and simply use that if we don't need to uncompress anything? > + case COMPR_ZLIB: > + err = logfs_uncompress(compressor_buf + LOGFS_HEADERSIZE, buf, > + len, bs); > + BUG_ON(err); > + break; Not claiming to undestand your on-disk format, but wouldn't it make more sense if we knew whether a given segment is compressed or not _before_ we actually read it? - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, May 16, 2007 at 02:06:31AM +0200, Jörn Engel wrote: > On Tue, 15 May 2007 13:37:59 -0700, Andrew Morton wrote: > > It's strange and a bit regrettable that an fs would have dependency on MTD, > > really. > > It is and changing this wouldn't be too hard. All device access goes > through five functions (read, write, erase, is_bad and mark_bad). As > soon as someone seriously cares I will add a struct logfs_device_ops and > have a second set of these functions for block devices. > > On hard disks it shouldn't make too much sense. The filesystem will > fragment like a splinter bomb and be just as popular. On hard disks, yes, but as you suggested, there are lots of other flash devices interfaced as block devices. CompactFlash comes to mind, USB keys too. And on these ones, the most important is to reduce the number of writes and to support large sizes. I already see LogFS as an interesting alternative to JFFS2 on such devices, eventhough it does not (yet?) support compression. Regards, Willy - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 2007-05-15 at 21:19 +0200, Jörn Engel wrote: > On Tue, 15 May 2007 15:07:05 -0400, John Stoffel wrote: > > > > I've been semi watching this, and the only comment I really can give > > is that I hate the name. To me, logfs implies a filesystem for > > logging purposes, not for Flash hardware with wear leveling issues to > > be taken into account. > > Yeah, well, ... > > Two years ago when I started all this, I was looking for a good name. > All I could come up with sounded stupid, so I picked "LogFS" as a code > name. As soon as I find a better name, the code name should get > replaced. > > By now I still don't have anything better. All alternatives that were > proposed are just as bad - with the added disadvantage of being new and > not established yet. My hope of ever finding a better name is nearly > zero. Personally I'd just go for 'JFFS3'. After all, it has a better claim to the name than either of its predecessors :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
There are rather a lot of of FIXME comments, including scary stuff like > +/* > + * FIXME: this cannot be right but it does "fix" a bug of i_count > + * dropping too low. Needs more thought. > + */ > +atomic_inc(&old_dentry->d_inode->i_count); and > +int __logfs_write_inode(struct inode *inode) > +{ > +/* > + * FIXME: Those two inodes are 512 bytes in total. Not good to > + * have on the stack. Possibly the best solution would be to bite > + * the bullet and do another format change before release and > + * shrink the inodes. > + */ > +struct logfs_disk_inode old, new; are you going to change the format? or fix this some other way? I think a sweep through the code searching for FIXME and at least rewriting all such comments to look like stuff that can be deferred would be warranted ;) - R. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Wed, 16 May 2007 02:06:31 +0200, Jörn Engel wrote: > > > > + > > > + if (dest) { > > > + /* symlink */ > > > + ret = logfs_inode_write(inode, dest, destlen, 0); > > > + } else { > > > + /* creat/mkdir/mknod */ > > > + ret = __logfs_write_inode(inode); > > > + } > > > + super->s_victim_ino = 0; > > > + if (ret) { > > > + if (!dest) > > > + li->li_flags |= LOGFS_IF_STILLBORN; > > > + /* FIXME: truncate symlink */ > > > + inode->i_nlink--; > > > + iput(inode); > > > + goto out; > > > + } > > > + > > > + if (inode->i_mode & S_IFDIR) > > > + dir->i_nlink++; > > > > You have helper functions for i_nlink++, which remember to do > > mark_inode_dirty()? > > I do. Looks like I should use them here and in at least one other > place. Will recheck them all. Actually, this code was correct. New inodes get written immediatly to ensure catching -ENOSPC at a place where errors can get returned. After leaving this function, the inode is not dirty. Either it has been written or it is marked LOGFS_IF_STILLBORN never will be. Your intuition was still good. Link() did need the change you indicated. Jörn -- Joern's library part 14: http://www.sandpile.org/ - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 15 May 2007 19:26:17 -0400, Albert Cahalan wrote: > > Please don't forget the immutable bit. ("man lsattr") > Having both, BSD-style, would be even better. > The immutable bit is important for working around > software bugs and "features" that damage files. > > I also can't find xattr support. Not forgotten, just on a different list. Bugs beat features. :) Jörn -- Data expands to fill the space available for storage. -- Parkinson's Law - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 15 May 2007 13:37:59 -0700, Andrew Morton wrote: > > + > > +config LOGFS_FSCK > > + bool "Run LogFS fsck at mount time" > > + depends on LOGFS > > + help > > + Run a full filesystem check on every mount. If any errors are > > + found, mounting the filesystem will fail. This is a debug option > > + for developers. > > + > > + If unsure, say N. > > + > > No dependency on MTD, Ack. > Can this code be tested by people who don't have MTD hardware? We used to > ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some > disrepair. Maybe it got repaired. Or removed. I can't immediately locate > it... Got removed. Block2mtd is the new one and works with logfs. At least my version of it does and I pushed my patches to David - not sure if they made it to mainline already. Mtdram should work as well. > It's strange and a bit regrettable that an fs would have dependency on MTD, > really. It is and changing this wouldn't be too hard. All device access goes through five functions (read, write, erase, is_bad and mark_bad). As soon as someone seriously cares I will add a struct logfs_device_ops and have a second set of these functions for block devices. On hard disks it shouldn't make too much sense. The filesystem will fragment like a splinter bomb and be just as popular. > ooh, documentation. Quick, merge it! :) > > +/* memtree.c */ > > +void btree_init(struct btree_head *head); > > +void *btree_lookup(struct btree_head *head, long val); > > +int btree_insert(struct btree_head *head, long val, void *ptr); > > +int btree_remove(struct btree_head *head, long val); > > These names are too generic. If we later add a btree library: blam. My plan was to move this code to lib/ sooner or later. If you consider it useful in its current state, I can do it immediatly. And if someone else merged a superior btree library I'd happily remove mine and use the new one instead. Opinions? > > +/* super.c */ > > +int mtdread(struct super_block *sb, loff_t ofs, size_t len, void *buf); > > +int mtdwrite(struct super_block *sb, loff_t ofs, size_t len, void *buf); > > +int mtderase(struct super_block *sb, loff_t ofs, size_t len); > > +void logfs_crash_dump(struct super_block *sb); > > +int all_ff(void *buf, size_t len); > > +int logfs_statfs(struct dentry *dentry, struct kstatfs *stats); > > Have you checked that all of this needs global scope? Not within the last month. Will recheck. > > + > > +/* progs/fsck.c */ > > +#ifdef CONFIG_LOGFS_FSCK > > +int logfs_fsck(struct super_block *sb); > > +#else > > +#define logfs_fsck(sb) ({ 0; }) > > static inline int logfs_fsck(struct super_block *sb) > { > return 0; > } > > is better: nicer to look at, has typechecking. That's what I tried first. But the brick I carry on my neck decided to compile and link the read logfs_fsck() unconditionally and so this collided. Will change the Makefile and this. > > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb) > > +{ > > + return sb->s_fs_info; > > +} > > + > > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode) > > +{ > > + return container_of(inode, struct logfs_inode, vfs_inode); > > +} > > Do these need to be uppercase? No more than EXT2_SB() or EXT2_I(). Since you are the second person to ask, I sense a majority vote to change the case. > > +static inline pgoff_t logfs_index(u64 pos) > > +{ > > + return pos / LOGFS_BLOCKSIZE; > > +} > > If the compiler goofs up here we'll end up trying to do a 64/32 divide and > it won't link on 32-bit machines. It would be safer to do > > return pos >> LOGFS_BLOCKSHIFT; Will change. > > +int logfs_uncompress(void *in, void *out, size_t inlen, size_t outlen) > > +{ > > + int err, ret; > > + > > + ret = -EIO; > > + mutex_lock(&compr_mutex); > > A per-superblock lock and stream would be nicer. Yes and no. The zlib workspaces weigh about 300k. On any decent SMP machine, that hardly matters and getting the scalability is nice. For low-end embedded, that amount of memory per filesystem can matter. I have no clue how many people would suffer one way or another. > > + > > + > > +static inline loff_t file_end(struct inode *inode) > > +{ > > + return (i_size_read(inode) + inode->i_sb->s_blocksize - 1) > > + >> inode->i_sb->s_blocksize_bits; > > +} > > +static void logfs_set_name(struct logfs_disk_dentry *dd, struct qstr *name) > > +{ > > The code has a strange mix of two-blank-lines-between-functions and > no-blank-lines-between-functions. One blank line is usual. Will change. > > + > > + if (dest) { > > + /* symlink */ > > + ret = logfs_inode_write(inode, dest, destlen, 0); > > + } else { > > + /* creat/mkdir/mknod */ > > + ret = __logfs_write_inode(inode); > > + } > > + super->s_victim_ino = 0; > > + if (ret) { > > + if (!dest) > > + li->li_flags |= LOGFS_IF_STILLBORN; > > +
Re: [PATCH] LogFS take three
Please don't forget the immutable bit. ("man lsattr") Having both, BSD-style, would be even better. The immutable bit is important for working around software bugs and "features" that damage files. I also can't find xattr support. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 15 May 2007 17:19:20 +0200 J__rn Engel <[EMAIL PROTECTED]> wrote: > Add LogFS, a scalable flash filesystem. > > ... > > > +config LOGFS > + tristate "Log Filesystem (EXPERIMENTAL)" > + depends on EXPERIMENTAL > + select ZLIB_INFLATE > + select ZLIB_DEFLATE > + help > + Flash filesystem aimed to scale efficiently to large devices. > + In comparison to JFFS2 it offers significantly faster mount > + times and potentially less RAM usage, although the latter has > + not been measured yet. > + > + In its current state it is still very experimental and should > + not be used for other than testing purposes. > + > + If unsure, say N. > + > +config LOGFS_FSCK > + bool "Run LogFS fsck at mount time" > + depends on LOGFS > + help > + Run a full filesystem check on every mount. If any errors are > + found, mounting the filesystem will fail. This is a debug option > + for developers. > + > + If unsure, say N. > + No dependency on MTD, > @@ -0,0 +1,373 @@ > +/* > + * fs/logfs/logfs.h > + * > + * As should be obvious for Linux kernel code, license is GPLv2 > + * > + * Copyright (c) 2005-2007 Joern Engel > + * > + * Private header for logfs. > + */ > +#ifndef fs_logfs_logfs_h > +#define fs_logfs_logfs_h > + > +#define __CHECK_ENDIAN__ > + > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include But it includes an MTD header file. Can this code be tested by people who don't have MTD hardware? We used to ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some disrepair. Maybe it got repaired. Or removed. I can't immediately locate it... It's strange and a bit regrettable that an fs would have dependency on MTD, really. > + > +/** > + * struct logfs_area - area management information > + * > + * @a_sb:the superblock this area belongs to > + * @a_is_open: 1 if the area is currently open, else 0 > + * @a_segno: segment number of area > + * @a_used_objects: number of used objects (XXX: should get removed) > + * @a_used_bytes:number of used bytes > + * @a_ops: area operations (either journal or ostore) > + * @a_wbuf: write buffer > + * @a_erase_count: erase count > + * @a_level: GC level > + */ ooh, documentation. Quick, merge it! > +/* memtree.c */ > +void btree_init(struct btree_head *head); > +void *btree_lookup(struct btree_head *head, long val); > +int btree_insert(struct btree_head *head, long val, void *ptr); > +int btree_remove(struct btree_head *head, long val); These names are too generic. If we later add a btree library: blam. > + > +/* readwrite.c */ > +int logfs_inode_read(struct inode *inode, void *buf, size_t n, loff_t _pos); > +int logfs_inode_write(struct inode *inode, const void *buf, size_t n, > + loff_t pos); It's a bit rude stealing the logfs* namespace, but I guess you got there first ;) > +int logfs_readpage_nolock(struct page *page); > +int logfs_write_buf(struct inode *inode, pgoff_t index, void *buf); > +int logfs_delete(struct inode *inode, pgoff_t index); > +int logfs_rewrite_block(struct inode *inode, pgoff_t index, u64 ofs, int > level); > +int logfs_is_valid_block(struct super_block *sb, u64 ofs, u64 ino, u64 pos); > +void logfs_truncate(struct inode *inode); > +u64 logfs_seek_data(struct inode *inode, u64 pos); > + > +int logfs_init_rw(struct logfs_super *super); > +void logfs_cleanup_rw(struct logfs_super *super); > + > +/* segment.c */ > +int logfs_erase_segment(struct super_block *sb, u32 ofs); > +int wbuf_read(struct super_block *sb, u64 ofs, size_t len, void *buf); > +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs); > +s64 logfs_segment_write(struct inode *inode, void *buf, u64 pos, int level, > + int alloc); > +int logfs_segment_delete(struct inode *inode, u64 ofs, u64 pos, int level); > +void logfs_set_blocks(struct inode *inode, u64 no); > +void __logfs_set_blocks(struct inode *inode); > +/* area handling */ > +int logfs_init_areas(struct super_block *sb); > +void logfs_cleanup_areas(struct logfs_super *super); > +int logfs_open_area(struct logfs_area *area); > +void logfs_close_area(struct logfs_area *area); > + > +/* super.c */ > +int mtdread(struct super_block *sb, loff_t ofs, size_t len, void *buf); > +int mtdwrite(struct super_block *sb, loff_t ofs, size_t len, void *buf); > +int mtderase(struct super_block *sb, loff_t ofs, size_t len); > +void logfs_crash_dump(struct super_block *sb); > +int all_ff(void *buf, size_t len); > +int logfs_statfs(struct dentry *dentry, struct kstatfs *stats); Have you checked that all of this needs global scope? > + > +/* progs/fsck.c */ > +#ifdef CONFIG_LOGFS_FSCK > +int logfs_fsck(struct super_block *sb); > +#else > +#define logfs_fsck(sb) ({ 0; }) static inline int logfs_fsck(struct
Re: [PATCH] LogFS take three
On Tue, 15 May 2007 15:07:05 -0400, John Stoffel wrote: > > I've been semi watching this, and the only comment I really can give > is that I hate the name. To me, logfs implies a filesystem for > logging purposes, not for Flash hardware with wear leveling issues to > be taken into account. Yeah, well, ... Two years ago when I started all this, I was looking for a good name. All I could come up with sounded stupid, so I picked "LogFS" as a code name. As soon as I find a better name, the code name should get replaced. By now I still don't have anything better. All alternatives that were proposed are just as bad - with the added disadvantage of being new and not established yet. My hope of ever finding a better name is nearly zero. > Also, having scanned through the code, I find the name "cookie" using > in logfs_inode(), logfs_iput(), logfs_iget() to be badly named. It > should really be something like *cached_inode, which would seem to > give more natural semantics of > > if (cached_inode) > do_cached_inode_ops(...) > else > do_inode_ops(...) Half-agreed. For callers, the name "cookie" makes sense. It is a transparent thing they should not tough and hand back unchanged. For logfs_iget() and logfs_iput() something like "is_cached" would be better. Will change. Jörn -- Linux [...] existed just for discussion between people who wanted to show off how geeky they were. -- Rob Enderle - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, 15 May 2007 20:37:25 +0200, Sam Ravnborg wrote: > On Tue, May 15, 2007 at 05:19:20PM +0200, Jörn Engel wrote: > > [ I have put everyone that gave comments to the last patch on Cc:. Hope > > that doesn't offend anyone. ] > > > > > > Add LogFS, a scalable flash filesystem. > > Have you run sparse on this code? Several thousand times. :) > I do not recall if you have written something about it. > I do not see any obvious things sparse would catch (just browsing quickly) > but it's always a good thing to do. Absolutely! I added this line for sparse: +#define __CHECK_ENDIAN__ Not sure how much of the kernel is endian-clean. Logfs should be. > +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 > +0200 > @@ -0,0 +1,332 @@ > +/* > + * fs/logfs/prog/fsck.c- filesystem check > + * > + * As should be obvious for Linux kernel code, license is GPLv2 > + * > + * Copyright (c) 2005-2007 Joern Engel > + * > + * In principle this could get moved to userspace. However it might still > + * make some sense to keep it in the kernel. It is a pure checker and will > + * only report problems, not attempt to repair them. > + */ > +#include "../logfs.h" > + > If potential userspace tools needs to include ../logfs.h then there is > something that ought to be moved to include/linux/logfs.h instead. There shouldn't be anything left in ../logfs.h that is needed for userspace. But the kernel fsck is lame and calls into functions like iget(), which it pulls in from ../logfs.h. Jörn -- The only real mistake is the one from which we learn nothing. -- John Powell - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] LogFS take three
On Tue, May 15, 2007 at 05:19:20PM +0200, Jörn Engel wrote: > [ I have put everyone that gave comments to the last patch on Cc:. Hope > that doesn't offend anyone. ] > > > Add LogFS, a scalable flash filesystem. Have you run sparse on this code? I do not recall if you have written something about it. I do not see any obvious things sparse would catch (just browsing quickly) but it's always a good thing to do. +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 +0200 @@ -0,0 +1,332 @@ +/* + * fs/logfs/prog/fsck.c- filesystem check + * + * As should be obvious for Linux kernel code, license is GPLv2 + * + * Copyright (c) 2005-2007 Joern Engel + * + * In principle this could get moved to userspace. However it might still + * make some sense to keep it in the kernel. It is a pure checker and will + * only report problems, not attempt to repair them. + */ +#include "../logfs.h" + If potential userspace tools needs to include ../logfs.h then there is something that ought to be moved to include/linux/logfs.h instead. Sam - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Review status (Re: [PATCH] LogFS take three)
Most of my homework is done. There are six items left plus another five I believe should not get changed. Changed: o Kconfig description updated o Spaces converted to tabs in Makefile o Sorted/seperated #includes o structures are __packed instead of packed, #define removed o removed TRACE() o 32bit magic is u instead of ull now o #defines aligned o Removed obsolete #defines, documented rest o LOGFS_SUPER and LOGFS_INODE are inline functions now o removed alloc_disk_sum and free_disk_sum o moved dir_callback up o removed pointless cast from max() parameter o moved LOGFS_LINK_MAX to header o license headers o removed logfs_writepage() o introduced and documented LOGFS_SEGMENT_RESERVE o documented completion dance in mtderase() o removed isbad() check from mtderase() o removed logfs_device_getpage() and logfs_cached_read() o resurrected code to pick correct device o removed beXX typedefs o removed safe_read() o removed wrapper in logfs_cleanup_gc() o device_read() returns error o renamed logfs_alloc_blocks() to match current implementation o kerneldoc comments for structures o removed logfs_compress_none and logfs_uncompress_none o Split header o move fsck behind debug option Changed many times (and I sincerely hope not to have missed any): o removed // comments, usually dead code o added newline between variable declarations and code o moved tail comments o minor code cleanups o changed comment style o added loglevels to printk Unchanged: o enum for OBJ_TOP_JOURNAL and friends Afaics all the potential enums end up being converted through cpu_to_64XX, I'll have to see if that causes more problems than the enums can hope to solve. o generic function for logfs_type() o generic helper for logfs_prepare_write() o removed EOF o error handling o move rootdir generation back to mkfs Won't happen (unless I get convinced to do otherwise): o lowercase LOGFS_SUPER and LOGFS_INODE These simply match the common pattern used is many Linux filesystems. o remove "extern" For structures, this keyword actually serves a purpose and makes sense. On the other hand, it is completely bogus for functions and I won't be bullied into adding bogus crud either. o Move fs/logfs/Locking to Documentation/ At least JFFS2 has such documentation in its own directory as well. I can see good arguments for either side and no strong consensus. While I ultimately just don't case, doing nothing is a good option in cases of great uncertainty. o change (void*) to "proper" cast Neither variant is much better than the other. I'm open for suggestions to completely remove the casts, though. o Change LOGFS_BUG() and LOGFS_BUG_ON() to inline functions These are macros for very much the same reasons BUG() and BUG_ON() are. Jörn -- To my face you have the audacity to advise me to become a thief - the worst kind of thief that is conceivable, a thief of spiritual things, a thief of ideas! It is insufferable, intolerable! -- M. Binet in Scarabouche - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html