Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Evgeniy Polyakov
On Wed, May 23, 2007 at 05:14:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > > I'm just a German.  Forgive me if I drink lesser beverages.
> > 
> > You should definitely change that.
> 
> Change being German?  Not a bad idea, actually.

You cook up really tasty shnaps, in small quantities it is good for
health in infinite volumes.

> > Btw, what about this piece:
> > 
> > int logfs_erase_segment(struct super_block *sb, u32 index)
> > {
> > struct logfs_super *super = LOGFS_SUPER(sb);
> > 
> > super->s_gec++;
> > 
> > return mtderase(sb, index << super->s_segshift, super->s_segsize);
> > }
> > 
> > index << super->s_segshift might overflow, mtderase expects loff_t
> > there, since index can be arbitrary segment number, is it possible, that
> > overflow really occurs?
> 
> Indeed it is.  You just earned your second beer^Wvodka.

Actually this code looks less encrypted than ext2 for, which definitely
a good sign from reviewer's point of view.

> Jörn

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Jörn Engel
On Wed, 23 May 2007 19:07:32 +0400, Evgeniy Polyakov wrote:
> On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) 
> wrote:
> > On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> 
> And what if it is 33 bits? Or it is not allowed?

Not allowed.  Both number and size of segments may never exceed 32bit.

> > > segsize is long, but should be u64 I think.
> > 
> > It could be s32 as well.
> 
> It is a matter of definition - if segment size is allowed to be more
> than 32 bits, then below transformation is not correct, otherwise
> segment size should not use additional 32bits on 64bit platform, since
> it is long.

I guess I could save 4 Bytes there.

> > I'm just a German.  Forgive me if I drink lesser beverages.
> 
> You should definitely change that.

Change being German?  Not a bad idea, actually.

> Btw, what about this piece:
> 
> int logfs_erase_segment(struct super_block *sb, u32 index)
> {
>   struct logfs_super *super = LOGFS_SUPER(sb);
> 
>   super->s_gec++;
> 
>   return mtderase(sb, index << super->s_segshift, super->s_segsize);
> }
> 
> index << super->s_segshift might overflow, mtderase expects loff_t
> there, since index can be arbitrary segment number, is it possible, that
> overflow really occurs?

Indeed it is.  You just earned your second beer^Wvodka.

Jörn

-- 
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Evgeniy Polyakov
On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> > 
> > In that case segment size must be more than 32 bits, or below
> > transformation will not be correct?
> 
> Must it?  If segment size is just 20bit then the filesystem may only be
> 52bit.  Or 51bit when using signed values.

And what if it is 33 bits? Or it is not allowed?

> > segsize is long, but should be u64 I think.
> 
> It could be s32 as well.

It is a matter of definition - if segment size is allowed to be more
than 32 bits, then below transformation is not correct, otherwise
segment size should not use additional 32bits on 64bit platform, since
it is long.

> > static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
> > *area, void *read, u64 ofs, size_t readlen)
> > 
> > u32 read_start = ofs & (super->s_segsize - 1);
> > u32 read_end = read_start + readlen;
> > 
> > And this can overflow, since readlen is size_t.
> 
> Theoretically yes.  Practically readlen is bounded to sb->blocksize plus
> one header.  I'll start worrying about that when blocksize approaches
> 32bit limit.
> 
> > > If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> > > beverage of choice. :)
> > 
> > Stop kiling your kidneys, your health and promote such antisocial style
> > of life, start drinking vodka instead.
> 
> I'm just a German.  Forgive me if I drink lesser beverages.

You should definitely change that.


Btw, what about this piece:

int logfs_erase_segment(struct super_block *sb, u32 index)
{
struct logfs_super *super = LOGFS_SUPER(sb);

super->s_gec++;

return mtderase(sb, index << super->s_segshift, super->s_segsize);
}

index << super->s_segshift might overflow, mtderase expects loff_t
there, since index can be arbitrary segment number, is it possible, that
overflow really occurs?

> Jörn
> 
> -- 
> Eighty percent of success is showing up.
> -- Woody Allen

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Jörn Engel
On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> 
> In that case segment size must be more than 32 bits, or below
> transformation will not be correct?

Must it?  If segment size is just 20bit then the filesystem may only be
52bit.  Or 51bit when using signed values.

> segsize is long, but should be u64 I think.

It could be s32 as well.

> static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
>   *area, void *read, u64 ofs, size_t readlen)
> 
> u32 read_start = ofs & (super->s_segsize - 1);
> u32 read_end = read_start + readlen;
> 
> And this can overflow, since readlen is size_t.

Theoretically yes.  Practically readlen is bounded to sb->blocksize plus
one header.  I'll start worrying about that when blocksize approaches
32bit limit.

> > If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> > beverage of choice. :)
> 
> Stop kiling your kidneys, your health and promote such antisocial style
> of life, start drinking vodka instead.

I'm just a German.  Forgive me if I drink lesser beverages.

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-20 Thread Evgeniy Polyakov
On Thu, May 17, 2007 at 07:10:19PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
> > 
> > Is logfs 32bit fs or 674bit, since although you use 64bit values for
> > offsets, area management and strange converstions like described below 
> > from offset into segment number are performed in 32bit?
> > Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> > could you please explain logic behind area management?
> 
> Ignoring bugs and signed return values for error handling, it is either
> 64bit or 32+32bit.
> 
> Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
> In a couple of places, offsets are also 32+32bit.  Basically the high
> bits contain the segment number, the lower bits the offset within a
> segment.

In that case segment size must be more than 32 bits, or below
transformation will not be correct? segsize is long, but should be u64 I
think.

static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
*area, void *read, u64 ofs, size_t readlen)

u32 read_start = ofs & (super->s_segsize - 1);
u32 read_end = read_start + readlen;

And this can overflow, since readlen is size_t.
It is wbuf fixup, but I saw that somewhere else.
Although, according to your description, it should be 32bit, sum can be
more than 32 bit.

> If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> beverage of choice. :)

Stop kiling your kidneys, your health and promote such antisocial style
of life, start drinking vodka instead.

> Jörn

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Rob Landley
On Saturday 19 May 2007 5:24 am, Jan Engelhardt wrote:
> 
> On May 19 2007 02:15, Rob Landley wrote:
> >> > +
> >> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> >> > +{
> >> > +return container_of(inode, struct logfs_inode, vfs_inode);
> >> > +}
> >> 
> >> Do these need to be uppercase?
> >
> >I'm trying to keep it clear in my head...
> >
> >When do you need to say __always_inline and when can you get away with 
> >just saying "static inline"?
> 
> When using "static inline", the compiler may ignore the inline keyword 
> (it's just a hint), and leave the function as a standalone function.
> 
> When CONFIG_FORCED_INLINING is active, and it is by default, inline is 
> always substituted by __always_inline, to be on the safe side. Some code 
> needs to be always inline; but not all code has been checked whether it 
> is safe to go from __always_inline to inline.

I've seen patches go by using __always_inline directly.  Is there some 
janitorial effort to examine each instance of the the inline keyword and 
either replace it with "__always_inline" or remove it?

Right now "inline" seems to be about as useful as the "register" keyword.  You 
don't feed hints to a compiler like gcc, you hit it with a two-by-four and 
thumbscrews if you want to get its' attention.

Rob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Evgeniy Polyakov
On Sat, May 19, 2007 at 05:17:32PM +0100, Jamie Lokier ([EMAIL PROTECTED]) 
wrote:
> > So, log2fs...  Sounds great to me.
> 
> Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
> filesystem use 2-ary trees or anything else which gives particular
> meaning to 2?

Sizes used in on-disk format are rounded to the nearest power-of-two.

> -- Jamie

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Jamie Lokier
David Weinehall wrote:
> > It is also the filesystem that tries to scale logarithmically, as Arnd
> > has noted.  Maybe I should call it Log2 to emphesize this point.  Log1
> > would be horrible scalability.
> 
> So, log2fs...  Sounds great to me.

Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
filesystem use 2-ary trees or anything else which gives particular
meaning to 2?

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Dongjun Shin wrote:


There are so many flash-based storage and some disposable storages,
as you pointed out, have poor quality. I think it's mainly because these
are not designed for good quality, but for lowering the price.

The reliability seems to be appropriate to the common use. I'm doubious 
that computer storage was a big design factor until the last few years. 
A good argument for buying large sizes, they are more likely to be 
recent design.



These kind of devices are not ready for things like power failure because
their use case is far from that. For example, removing flash card
while taking pictures using digital camera is not a common use case.
(there should be a written notice that this kind of action is against
the warranty)

They do well in such use, if you equate battery death to pulling the 
card (it may not be). I have tested that feature and not had a failure 
of any but the last item. Clearly not recommended, but sometimes 
unplanned needs arise.




- In contrast to the embedded environment where CPU and flash is directly
connected, the I/O path between CPU and flash in PC environment is longer.
The latency for SW handshaking between CPU and flash will also be longer,
which would make the performance optimization harder.

As I mentioned, some techniques like log-structured filesystem could
perform generally better on any kind of flash-based storage with FTL.
Although there are many kinds of FTL, it is commonly true that
it performs well under workload where sequential write is dominant.

I also expect that FTL for PC environment will have better quality spec
than the disposable storage.


The recent technology announcements from Intel are encouraging in that 
respect.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread David Weinehall
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote:
> On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
> > Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > 
> > Jörn> How many of you have worked for IBM before?  Vowels are not
> > evil. ;)
> > 
> > Nope, they're not.  I just think that LogFS isn't descriptive enough,
> > or more accurately, is the *wrong* description of this filesystem.  
> 
> That was the whole point.  JFFS2, the journaling flash filesystem, is a
> strictly log-structured filesystem.  LogFS has a journal.
> 
> It is also the filesystem that tries to scale logarithmically, as Arnd
> has noted.  Maybe I should call it Log2 to emphesize this point.  Log1
> would be horrible scalability.

So, log2fs...  Sounds great to me.

[snip]


Regards: David
-- 
 /) David Weinehall <[EMAIL PROTECTED]> /) Northern lights wander  (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/(/   Full colour fire   (/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Kevin Bowling wrote:

On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote:

On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
>
> My experience is that no matter which name I pick, people will
> complain
> anyway.  Previous suggestions included:
> jffs3
> jefs
> engelfs
> poofs
> crapfs
> sweetfs
> cutefs
> dynamic journaling fs - djofs
> tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)


However if Jörn is accused of murder, it will have little chance of
being merged :-).


WRT that, seems that Nina had a lover who is a confessed serial killer. 
I'm surprised the case hasn't been adapter for 'Boston legal' and 'Law 
and Order' like other high profile crimes.


I see nothing wrong with jörnfs, and there's room for numbers at the end...

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Jan Engelhardt

On May 19 2007 02:15, Rob Landley wrote:
>> > +
>> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
>> > +{
>> > +  return container_of(inode, struct logfs_inode, vfs_inode);
>> > +}
>> 
>> Do these need to be uppercase?
>
>I'm trying to keep it clear in my head...
>
>When do you need to say __always_inline and when can you get away with 
>just saying "static inline"?

When using "static inline", the compiler may ignore the inline keyword 
(it's just a hint), and leave the function as a standalone function.

When CONFIG_FORCED_INLINING is active, and it is by default, inline is 
always substituted by __always_inline, to be on the safe side. Some code 
needs to be always inline; but not all code has been checked whether it 
is safe to go from __always_inline to inline.


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-18 Thread Rob Landley
On Tuesday 15 May 2007 4:37 pm, Andrew Morton wrote:

> > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> > +{
> > +   return sb->s_fs_info;
> > +}
> > +
> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > +{
> > +   return container_of(inode, struct logfs_inode, vfs_inode);
> > +}
> 
> Do these need to be uppercase?

I'm trying to keep it clear in my head...

When do you need to say __always_inline and when can you get away with just 
saying "static inline"?

(I'm attempting to write documentation on a topic I don't understand.  Best 
way to learn it, I've found...)

> > + buf = kmap(page);
> > + ret = logfs_write_buf(inode, index, buf);
> > + kunmap(page);
>
> kmap() is lame.  The preferred approach would be to pass the page* down to
> the lower layers and to use kmap_atomic() at the lowest possible point.

Um, would I read about this in DMA-mapping.txt or cachetlb.txt?  (I don't 
think it's fujitsu/frv/mmu-layout.txt)

Rob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread David Woodhouse
On Fri, 2007-05-18 at 08:17 +0200, Jan Engelhardt wrote:
> > AFAIK, the camera stops writing to the flash card and automatically
> > turns off when it's low on battery (before empty).
> 
> But then, one should also consider the case where a cam is connected to
> AC and someone inadvertently trips on the power cord. 

So you stick a bloody great cap on board to give you enough time to shut
it down cleanly. I've known people do this -- and it helps, but the
devices still manage to crap themselves occasionally even then.

They're _disposable_. As are your data :)

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 18 2007 09:01, Dongjun Shin wrote:
> On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
>> 
>> Hmm.. so operating your camera on batteries should be against the
>> warranty, since batteries commonly run empty while storing pictures?
>
> AFAIK, the camera stops writing to the flash card and automatically
> turns off when it's low on battery (before empty).

But then, one should also consider the case where a cam is connected to
AC and someone inadvertently trips on the power cord.


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 17 2007 21:00, Kyle Moffett wrote:
>> > > Opinions?
>> > 
>> > Why would we need another btree, when there is lib/rbtree.c?  Or does
>> > yours do something fundamentally different?
>> 
>> It is not red-black tree, it is b+ tree.
>
> It might be better to use the prefix "bptree" to help prevent confusion.  A
> quick google search on "bp-tree" reveals only the perl B+-tree module
> "Tree::BPTree", a U-Maryland Java CS project on B+-trees, and a news article
> about a "BP tree-top protest".

BP heh.. How about "struct bplustree"?


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Kyle Moffett

On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt  
([EMAIL PROTECTED]) wrote:
My plan was to move this code to lib/ sooner or later.  If you  
consider it useful in its current state, I can do it immediatly.   
And if someone else merged a superior btree library I'd happily  
remove mine and use the new one instead.


Opinions?


Why would we need another btree, when there is lib/rbtree.c?  Or  
does yours do something fundamentally different?


It is not red-black tree, it is b+ tree.


It might be better to use the prefix "bptree" to help prevent  
confusion.  A quick google search on "bp-tree" reveals only the perl B 
+-tree module "Tree::BPTree", a U-Maryland Java CS project on B+- 
trees, and a news article about a "BP tree-top protest".


Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Dongjun Shin

Hi,

On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote:

Hi!


Hmm.. so operating your camera on batteries should be against the
warranty, since batteries commonly run empty while storing pictures?




AFAIK, the camera stops writing to the flash card and automatically
turns off when it's low on battery (before empty).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jamie Lokier
Jörn Engel wrote:
> > Almost all your static functions start with logfs_, why not this one?
> 
> Because after a while I discovered how silly it is to start every
> function with logfs_.  That prefix doesn't add much unless the function
> has global scope.  What I didn't do was remove the prefix from older
> functions.

It's handy when debugging or showing detailed backtraces.  Not that
I'm advocating it (or not), just something I've noticed in other
programs.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 23:36:13 +0200, Arnd Bergmann wrote:
> On Thursday 17 May 2007, Pekka Enberg wrote:
> > 
> > So any sane way to enable compression is on per-inode basis which makes 
> > me still wonder why you need per-object compression.
> 
> 1. it doesn't require user interaction, the file system will do the right
> thing most of the time.
> 
> 2. enlarging data is a very bad thing because it makes the behaviour
> of the fs unpredictable. With uncompressed objects, you have a guaranteed
> upper bound on the size.

Correct.  The compression decision is always per-object.  Per-inode is a
hint from userspace that a compression attempt would be futile.

A compression algorithm that compresses any data is provably impossible.
Some data will always cause expansion instead of compression.  Some
algorithms have a well-known upper bound on the expansion, others don't.
So LogFS instead creates its own upper bound by reserving one byte in
the header for the compression type.

And while one bit would suffice as a compressed/uncompressed flag,
having a byte allows to support more than one compression algorithm.
LZO looks promising and is on its way into the kernel.  Others may come
in the future.

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Pekka Enberg wrote:
> 
> Jörn Engel wrote:
> > Compressing random data will actually enlarge it.  If that happens I
> > simply store the verbatim uncompressed data instead and mark it as such.
> > 
> > There is also demand for a user-controlled bit in the inode to disable
> > compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> > time by trying and failing to compress them.
> 
> So any sane way to enable compression is on per-inode basis which makes 
> me still wonder why you need per-object compression.

1. it doesn't require user interaction, the file system will do the right
thing most of the time.

2. enlarging data is a very bad thing because it makes the behaviour
of the fs unpredictable. With uncompressed objects, you have a guaranteed
upper bound on the size.

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 23:00:20 +0200, Arnd Bergmann wrote:
> 
> Just using nanoseconds probably doesn't gain you much after all
> then. You could however just have separate 32 bit fields in the
> inode for seconds and nanoseconds, that will result in the exact
> same layout that you have right now, but won't require a conversion
> function.

I could also have a 30bit and a 34bit field.  30bit is enough for
nanoseconds.  So many options.

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Jörn Engel wrote:
> 
> > Why not just store 64 bit nanoseconds? that would avoid the problem
> > with ns overflow and the year-2038 bug. OTOH, that would require
> > a 64 bit integer division when reading the data, so it gets you
> > a runtime overhead.
> 
> I like the idea.  Do conversion function exist both way?
> 
> What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
> while 32bit would last to 2106?

You're right, you don't hit the 2038 bug here, because you use an
unsigned variable. The bug exists elsewhere because time_t tv_sec
is signed.

Just using nanoseconds probably doesn't gain you much after all
then. You could however just have separate 32 bit fields in the
inode for seconds and nanoseconds, that will result in the exact
same layout that you have right now, but won't require a conversion
function.

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Pekka Enberg

Jörn Engel wrote:

Compressing random data will actually enlarge it.  If that happens I
simply store the verbatim uncompressed data instead and mark it as such.

There is also demand for a user-controlled bit in the inode to disable
compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
time by trying and failing to compress them.


So any sane way to enable compression is on per-inode basis which makes 
me still wonder why you need per-object compression.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Pavel Machek
Hi!

> >Yes. These things are almost always implemented _very_ 
> >badly by the same
> >kind of crack-smoking hobo they drag in off the streets 
> >to write BIOSen.
> >
> >It's bog-roll technology; if you fancy a laugh try 
> >doing some real
> >reliability tests on them time some. Powerfail testing 
> >is a good one.
> >
> >This kind of thing is OK for disposable storage such as 
> >in digital
> >cameras, where it doesn't matter that it's no more 
> >reliable than a
> >floppy disc, but for real long-term storage it's really 
> >a bad idea.
> >
> 
> There are so many flash-based storage and some 
> disposable storages,
> as you pointed out, have poor quality. I think it's 
> mainly because these
> are not designed for good quality, but for lowering the 
> price.
> 
> These kind of devices are not ready for things like 
> power failure because
> their use case is far from that. For example, removing 
> flash card
> while taking pictures using digital camera is not a 
> common use case.
> (there should be a written notice that this kind of 
> action is against
> the warranty)

Hmm.. so operating your camera on batteries should be against the
warranty, since batteries commonly run empty while storing pictures?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 17:08:51 +0200, Arnd Bergmann wrote:
> On Tuesday 15 May 2007, Jörn Engel wrote:
> > Add LogFS, a scalable flash filesystem.
> 
> Sorry for not commenting earlier, there were so many discussions on version
> two that I wanted to wait for the fallout of that instead of duplicating
> all the comments.

You are the last person that has to be sorry. ;)

> Here are a few things I notice while going through the third version:
> 
> > +/*
> > + * Private errno for accessed beyond end-of-file.  Only used internally to
> > + * logfs.  If this ever gets exposed to userspace or even other parts of 
> > the
> > + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> > + * used errno #defines.
> > + *
> > + * It can be argued that this is a hack and should be replaced with 
> > something
> > + * else.  My last attempt to do this failed spectacularly and there are 
> > more
> > + * urgent problems that users actually care about.  This will remain for 
> > the
> > + * moment.  Patches are wellcome, of course.
> > + */
> > +#define EOF256
> 
> It should at least be in the kernel-only errno range between 512 and 4095,
> that way it can eventually be added to include/linux/errno.h.

Fair enough.  512 it is.

> > + * Target rename works in three atomic steps:
> > + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> > + * 2. Remove old dentry (still remember the new inode)
> > + * 3. Remove new inode
> > + *
> > + * Here we remember both an inode an a dentry.  If we get interrupted
> > + * between steps 1 and 2, we delete both the dentry and the inode.  If
> > + * we get interrupted between steps 2 and 3, we delete just the inode.
> > + * In either case, the remaining objects are deleted on next mount.  From
> > + * a users point of view, the operation succeeded.
> 
> This description had me confused for a while: why would you remove the
> new inode. Maybe change the text to say 'target inode' or 'victim inode'?

'Victim inode' sounds good.  Will do.

> > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> > +{
> > +   struct inode *inode;
> > +
> > +   if (dir->i_nlink >= LOGFS_LINK_MAX)
> > +   return -EMLINK;
> 
> Why is i_nlink limited? Don't you run out of space for inodes before
> overflowing?

I don't know.  With the current limit of 2^31, a sufficiently large
device can reach the limit.  And it is imaginable that overflowing the
s32 number space can expose security holes.  Not that I actually know,
the check is pure paranoia.

> > + * In principle, this function should loop forever, looking for GC 
> > candidates
> > + * and moving data.  LogFS is designed in such a way that this loop is
> > + * guaranteed to terminate.
> > + *
> > + * Limiting the loop to four iterations serves purely to catch cases when
> > + * these guarantees have failed.  An actual endless loop is an obvious bug
> > + * and should be reported as such.
> > + *
> > + * But there is another nasty twist to this.  As I have described in my LCA
> > + * presentation, Garbage collection would have to limit itself to higher
> > + * levels if the number of available free segments goes down.  This code
> > + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> > + * been able to make it fail (short of a bug elsewhere).
> > + *
> > + * So in a way this code is intentionally wrong as a desperate cry for a
> > + * better testcase.  And I do expect to get blamed for it one day. :(
> > + */
> 
> Could you bug the code to reserve fewer segments for GC than you really
> need, in order to stress test GC?

I could.  Wear leveling will cause changes in the area, so I'll have a
closer look when implementing that.

> > +static struct inode *logfs_alloc_inode(struct super_block *sb)
> > +{
> > +   struct logfs_inode *li;
> > +
> > +   li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> > +   if (!li)
> > +   return NULL;
> > +   logfs_init_inode(&li->vfs_inode);
> > +   return &li->vfs_inode;
> > +}
> > +
> > +
> > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> > +{
> > +   struct inode *inode;
> > +
> > +   inode = logfs_alloc_inode(sb);
> > +   if (!inode)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   logfs_init_inode(inode);
> 
> logfs_alloc_inode() returns an initialized inode, so no need to call
> logfs_init_inode() again, right?

Right.  Will change.

> > +static __be64 timespec_to_be64(struct timespec tsp)
> > +{
> > +   u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x);
> > +
> > +   WARN_ON(tsp.tv_nsec > 9);
> > +   return cpu_to_be64(time);
> > +}
> 
> Why not just store 64 bit nanoseconds? that would avoid the problem
> with ns overflow and the year-2038 bug. OTOH, that would require
> a 64 bit integer division when reading the data, so it gets you
> a runtime overhead.

I like the idea.  Do conversion function exist both way?

What I don't get is the year-2038 bug. 

Re: [PATCH] LogFS take three

2007-05-17 Thread Evgeniy Polyakov
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) 
wrote:
> >My plan was to move this code to lib/ sooner or later.  If you consider
> >it useful in its current state, I can do it immediatly.  And if someone
> >else merged a superior btree library I'd happily remove mine and use the
> >new one instead.
> >
> >Opinions?
> 
> Why would we need another btree, when there is lib/rbtree.c?
> Or does yours do something fundamentally different?

It is not red-black tree, it is b+ tree.

>   Jan

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 02:06, Jörn Engel wrote:
>
>> > +/* memtree.c */
>> > +void btree_init(struct btree_head *head);
>> > +void *btree_lookup(struct btree_head *head, long val);
>> > +int btree_insert(struct btree_head *head, long val, void *ptr);
>> > +int btree_remove(struct btree_head *head, long val);
>> 
>> These names are too generic.  If we later add a btree library: blam.
>
>My plan was to move this code to lib/ sooner or later.  If you consider
>it useful in its current state, I can do it immediatly.  And if someone
>else merged a superior btree library I'd happily remove mine and use the
>new one instead.
>
>Opinions?

Why would we need another btree, when there is lib/rbtree.c?
Or does yours do something fundamentally different?


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 15:53, Jörn Engel wrote:
>
>My experience is that no matter which name I pick, people will complain
>anyway.  Previous suggestions included:
[...]
>
>Plus today:
>FFFS
>flashfs
>fredfs
>bob
>shizzle
>
>Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
>joke and keep my logfs.org domain.

Try woodfs! (log - wood - get it?)
But finding names can be so tiresome, just give it a Borg-style
designation - "filesystem 125" or so. fs2007q1, being this
quartal's new filesystem.


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 22:06, CaT wrote:
>On Wed, May 16, 2007 at 01:50:03PM +0200, J??rn Engel wrote:
>> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> > 
>> > But if akpm can't pronounce it, how about FFFS for faster flash
>> > filesystem ;-)
>> 
>> How many of you have worked for IBM before?  Vowels are not evil. ;)
>> 
>> Grouping four or more consonants to name anything will cause similar
>> expressions on people's faces.  Numbers don't help much either.
>> 
>> Ext2 is a great name, because "ext" actually is a pronouncable syllable.
>> MinixFS, ChunkFS, TileFS are great too.  XFS and JFS are ok, at least
>> they only have three consonants.  But FFS exists, so I'd rather go for a
>> syllable.
>
>FlashFS?

Or just try once dropping all those redundant 'fs' suffixes.
bdev, proc, cpuset, devpts, mqueue, fuse(blk|ctl), vfat, iso9660, etc.
Then there's much more space for innovative names.


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 14:55, Jörn Engel wrote:
>On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
>> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) 
>> wrote:
>> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> > > 
>> > > But if akpm can't pronounce it, how about FFFS for faster flash
>> > > filesystem ;-)
>> > 
>> > How many of you have worked for IBM before?  Vowels are not evil. ;)
>> 
>> Do you think 'eieio' is a good set? IBM's work too...

C'mon, UIO does not cut IIO either ;-)


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 13:09, Jörn Engel wrote:
>On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
>> 
>> Personally I'd just go for 'JFFS3'. After all, it has a better claim to
>> the name than either of its predecessors :)
>
>Did you ever see akpm's facial expression when he tried to pronounce
>"JFFS2"?  ;)

Is there something special with [dʒeɪ ɛf ɛf ɛs tuː]?


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
> 
> Is logfs 32bit fs or 674bit, since although you use 64bit values for
> offsets, area management and strange converstions like described below 
> from offset into segment number are performed in 32bit?
> Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> could you please explain logic behind area management?

Ignoring bugs and signed return values for error handling, it is either
64bit or 32+32bit.

Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
In a couple of places, offsets are also 32+32bit.  Basically the high
bits contain the segment number, the lower bits the offset within a
segment.

Side note: It would be nicer if the high 32bit were segment number.
Instead the number of bits depends on segment size.  Guess I should
change that while the format isn't fixed yet.

An "area" is a segment that is currently being written.  Data is
appended to this segment as it comes in, until the segment is full.  Any
functions dealing with areas only need a 32bit offset, which is the
offset within the area, not the absolute device offset.

Writes within an area are also buffered.  New data first goes into the
write buffer (wbuf) and only when this is full is it flushed to the
device.  NAND flash and some NOR flashes require such buffering.  When
writing to the device, the 32bit segno and the 32bit in-segment offset
need to get converted back to a 64bit device offset.

> I've found that you store segment numbers as 32bit values (for example
> in prepare_write()), and convert requested 64bit offset into segment
> number via superblock's s_segshift.

Yes, as described above.

> This conversation seems confusing to me in case of real 64bit offsets.
> For example this one obtained via prepare_write:
> 
> 7  1 logfs_prepare_write78  fs/logfs/file.c
> 8  2 logfs_readpage_nolock20  fs/logfs/file.c
> 9  1 logfs_read_block   351  fs/logfs/readwrite.c
> 10  1 logfs_read_loop   139  fs/logfs/readwrite.c
> 11  2 logfs_segment_read   108  fs/logfs/readwrite.c
> 12  1 wbuf_read 289 
> 
> u32 segno = ofs >> super->s_segshift;
> 
> ofs is originally obtained from inode's li_data array, which is filled
> with raw segment numbers which can be 64bit (here is another issue,
> since logfs_segment_write() returns signed, so essentially logfs is
> 63bit filesystem).

The filesystem format is 64bit.  The current code can only deal with
63bit.  Eric Sandeen just fixed ext2 to actually deal with 32bit
numbers and the same is possible for logfs.  If anyone ever cares...

> But here I've came to area management in logfs, and found that it is
> 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() 
> returns signed 32 bit value (so it is reduced to 31 bit), which is then 
> placed into li_data as 64bit value. The latter
> (__logfs_get_free_bytes()) truncates 64bit data value obtained via
> dev_ofs() into signed 32 bit value.

That indeed is a bug.  __logfs_get_free_bytes() should return s64
instead of s32.  Will fix immediatly.

If anyone can find similar bugs, the bounty is a beer or non-alcoholic
beverage of choice. :)

Jörn

-- 
To announce that there must be no criticism of the President, or that we
are to stand by the President, right or wrong, is not only unpatriotic
and servile, but is morally treasonable to the American public.
-- Theodore Roosevelt, Kansas City Star, 1918
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-17 Thread Evgeniy Polyakov

Hi Jörn.

Is logfs 32bit fs or 674bit, since although you use 64bit values for
offsets, area management and strange converstions like described below 
from offset into segment number are performed in 32bit?
Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
could you please explain logic behind area management?

I've found that you store segment numbers as 32bit values (for example
in prepare_write()), and convert requested 64bit offset into segment
number via superblock's s_segshift.
This conversation seems confusing to me in case of real 64bit offsets.
For example this one obtained via prepare_write:

7  1 logfs_prepare_write78  fs/logfs/file.c
8  2 logfs_readpage_nolock20  fs/logfs/file.c
9  1 logfs_read_block   351  fs/logfs/readwrite.c
10  1 logfs_read_loop   139  fs/logfs/readwrite.c
11  2 logfs_segment_read   108  fs/logfs/readwrite.c
12  1 wbuf_read 289 

u32 segno = ofs >> super->s_segshift;

ofs is originally obtained from inode's li_data array, which is filled
with raw segment numbers which can be 64bit (here is another issue,
since logfs_segment_write() returns signed, so essentially logfs is
63bit filesystem).

But here I've came to area management in logfs, and found that it is
32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() 
returns signed 32 bit value (so it is reduced to 31 bit), which is then 
placed into li_data as 64bit value. The latter
(__logfs_get_free_bytes()) truncates 64bit data value obtained via
dev_ofs() into signed 32 bit value.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
> Add LogFS, a scalable flash filesystem.

Hi Jörn,

Sorry for not commenting earlier, there were so many discussions on version
two that I wanted to wait for the fallout of that instead of duplicating
all the comments.

Here are a few things I notice while going through the third version:

> +/*
> + * Private errno for accessed beyond end-of-file.  Only used internally to
> + * logfs.  If this ever gets exposed to userspace or even other parts of the
> + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> + * used errno #defines.
> + *
> + * It can be argued that this is a hack and should be replaced with something
> + * else.  My last attempt to do this failed spectacularly and there are more
> + * urgent problems that users actually care about.  This will remain for the
> + * moment.  Patches are wellcome, of course.
> + */
> +#define EOF  256

It should at least be in the kernel-only errno range between 512 and 4095,
that way it can eventually be added to include/linux/errno.h.

> + * Target rename works in three atomic steps:
> + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> + * 2. Remove old dentry (still remember the new inode)
> + * 3. Remove new inode
> + *
> + * Here we remember both an inode an a dentry.  If we get interrupted
> + * between steps 1 and 2, we delete both the dentry and the inode.  If
> + * we get interrupted between steps 2 and 3, we delete just the inode.
> + * In either case, the remaining objects are deleted on next mount.  From
> + * a users point of view, the operation succeeded.

This description had me confused for a while: why would you remove the
new inode. Maybe change the text to say 'target inode' or 'victim inode'?

> +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> + struct inode *inode;
> +
> + if (dir->i_nlink >= LOGFS_LINK_MAX)
> + return -EMLINK;

Why is i_nlink limited? Don't you run out of space for inodes before
overflowing?

> + * In principle, this function should loop forever, looking for GC candidates
> + * and moving data.  LogFS is designed in such a way that this loop is
> + * guaranteed to terminate.
> + *
> + * Limiting the loop to four iterations serves purely to catch cases when
> + * these guarantees have failed.  An actual endless loop is an obvious bug
> + * and should be reported as such.
> + *
> + * But there is another nasty twist to this.  As I have described in my LCA
> + * presentation, Garbage collection would have to limit itself to higher
> + * levels if the number of available free segments goes down.  This code
> + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> + * been able to make it fail (short of a bug elsewhere).
> + *
> + * So in a way this code is intentionally wrong as a desperate cry for a
> + * better testcase.  And I do expect to get blamed for it one day. :(
> + */

Could you bug the code to reserve fewer segments for GC than you really
need, in order to stress test GC?

> +static struct inode *logfs_alloc_inode(struct super_block *sb)
> +{
> + struct logfs_inode *li;
> +
> + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> + if (!li)
> + return NULL;
> + logfs_init_inode(&li->vfs_inode);
> + return &li->vfs_inode;
> +}
> +
> +
> +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> +{
> + struct inode *inode;
> +
> + inode = logfs_alloc_inode(sb);
> + if (!inode)
> + return ERR_PTR(-ENOMEM);
> +
> + logfs_init_inode(inode);

logfs_alloc_inode() returns an initialized inode, so no need to call
logfs_init_inode() again, right?

> +static __be64 timespec_to_be64(struct timespec tsp)
> +{
> + u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x);
> +
> + WARN_ON(tsp.tv_nsec > 9);
> + return cpu_to_be64(time);
> +}

Why not just store 64 bit nanoseconds? that would avoid the problem
with ns overflow and the year-2038 bug. OTOH, that would require
a 64 bit integer division when reading the data, so it gets you
a runtime overhead.

> +static void logfs_read_inode(struct inode *inode)
> +{
> + int ret;
> +
> + BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> + ret = __logfs_read_inode(inode);
> +
> + /* What else can we do here? */
> + BUG_ON(ret);
> +}

ext2 returns make_bad_inode(inode) in this case, which seems to be
a better solution than crashing.

> +int __logfs_write_inode(struct inode *inode)
> +{
> + /*
> +  * FIXME: Those two inodes are 512 bytes in total.  Not good to
> +  * have on the stack.  Possibly the best solution would be to bite
> +  * the bullet and do another format change before release and
> +  * shrink the inodes.
> +  */
> + struct logfs_disk_inode old, new;
> +
> + BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> + /* read and compare the inode first.  If it hasn't

Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 16:43:59 +0800, David Woodhouse wrote:
> 
> > As I mentioned, some techniques like log-structured filesystem could
> > perform generally better on any kind of flash-based storage with FTL.
> > Although there are many kinds of FTL, it is commonly true that
> > it performs well under workload where sequential write is dominant.
> 
> Yes, it's certainly possible that we _could_ write a file system which
> is specifically targeted at FTL -- I was just wondering why anyone would
> _bother_ :)

Haven't you done that already?  JFFS2 write behaviour is the best-case
scenario for any FTL.  When the delta cache is finished, LogFS will be
pretty close to that as well.

Not sure if anyone would specifically target FTL.  Being well-suited for
those beasts is just a side-effect.

The FTL is still a net loss.  Without that FAT enabling layer a real
flash filesystem would be more efficient.

Jörn

-- 
Prosperity makes friends, adversity tries them.
-- Publilius Syrus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
> 
> > I've been semi watching this, and the only comment I really can give
> > is that I hate the name.  To me, logfs implies a filesystem for
> > logging purposes, not for Flash hardware with wear leveling issues to
> > be taken into account.
> 
> Yeah, well, ...
> 
> Two years ago when I started all this, I was looking for a good name.
> All I could come up with sounded stupid, so I picked "LogFS" as a code
> name.  As soon as I find a better name, the code name should get
> replaced.
> 

When doing a google search on logfs, there are less than five results
among the first 100 that don't refer to your work. The other two listed
in there are also log-structured file systems: The inferno flash file
system (http://inferno-os.googlecode.com/svn/trunk/liblogfs/) and the
(discontinued) file system named lfs from the 2005 google summer of
code.

I'd say the name should stay, changing it now can only add more confusion.

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread David Woodhouse
On Thu, 2007-05-17 at 09:12 +, Pavel Machek wrote:
> Nah, it would lead to Jorn disappearing misteriously and _Pavel_
> accused of murder ;-).

Are you suggesting that you would murder Jörn (you misspelled his name)
merely for the heinous crime of using his own name?

Your Luddism was already quite excessive, but now you really _are_
taking it to extremes. :)

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Pavel Machek

> >> My experience is that no matter which name I pick, 
> >people will
> >> complain
> >> anyway.  Previous suggestions included:
> >> jffs3
> >> jefs
> >> engelfs
> >> poofs
> >> crapfs
> >> sweetfs
> >> cutefs
> >> dynamic journaling fs - djofs
> >> tfsfkal - the file system formerly known as logfs
> >
> >Can we call it jörnfs? :)
> 
> However if Jörn is accused of murder, it will have 
> little chance of
> being merged :-).

Nah, it would lead to Jorn disappearing misteriously and _Pavel_
accused of murder ;-).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread David Woodhouse
On Thu, 2007-05-17 at 17:20 +0900, Dongjun Shin wrote:
> There are, of course, cases where direct access are better.
> However, as the demand for capacity, reliability and high performance
> for the flash storage increases, the use of FTL with embedded controller
> would be inevitable.
> 
> - The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to
> the use of multiple flash in parallel and the use of high degree ECC 
> algorithm.
> There are other things like bad block handling and wear-leveling issues
> which has been recently touched by UBI with added SW complexity.

You don't get rid of that complexity by offloading it to a µcontroller. 

The only thing you achieve that way is making sure it's quite likely to
be done badly, and making it impossible to debug.

> - In contrast to the embedded environment where CPU and flash is directly
> connected, the I/O path between CPU and flash in PC environment is longer.
> The latency for SW handshaking between CPU and flash will also be longer,
> which would make the performance optimization harder.

Do it the naïve way with a single byte push/pull and waggling the
control lines separately, and what you say is true -- but you can have
flash controllers which assist with data transfer but still give you
essentially 'raw' access to the chip.

With the CAFÉ controller designed for the OLPC machine, we can spew data
across the PCI bus just as fast as we can suck it off the flash chip.

> As I mentioned, some techniques like log-structured filesystem could
> perform generally better on any kind of flash-based storage with FTL.
> Although there are many kinds of FTL, it is commonly true that
> it performs well under workload where sequential write is dominant.

Yes, it's certainly possible that we _could_ write a file system which
is specifically targeted at FTL -- I was just wondering why anyone would
_bother_ :)

I've seen an interesting file system which does have a kind of FTL
internally as its lowest layer, and which build on that using 'virtual'
sectors for file extents. Now that _does_ have its advantages -- but it
doesn't go as far as pretending to be a 'normal' block device; it's its
own special thing for internal use within that file system.
 
> I also expect that FTL for PC environment will have better quality spec
> than the disposable storage.

There really is no reason why FTL has to be done badly; just as there's
no _reason_ why hardware vendors have to give us crappy bsVendorCode.
Nevertheless, that's the way the world tends to be. So good luck
shipping with that :)

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-17 Thread Dongjun Shin

On 5/17/07, David Woodhouse <[EMAIL PROTECTED]> wrote:


Yes. These things are almost always implemented _very_ badly by the same
kind of crack-smoking hobo they drag in off the streets to write BIOSen.

It's bog-roll technology; if you fancy a laugh try doing some real
reliability tests on them time some. Powerfail testing is a good one.

This kind of thing is OK for disposable storage such as in digital
cameras, where it doesn't matter that it's no more reliable than a
floppy disc, but for real long-term storage it's really a bad idea.



There are so many flash-based storage and some disposable storages,
as you pointed out, have poor quality. I think it's mainly because these
are not designed for good quality, but for lowering the price.

These kind of devices are not ready for things like power failure because
their use case is far from that. For example, removing flash card
while taking pictures using digital camera is not a common use case.
(there should be a written notice that this kind of action is against
the warranty)



There's little point in optimising a file system _specifically_ for
devices which in often aren't reliable enough to keep your data anyway.
You might as well use ramfs.

It's unfortunate really -- there's no _fundamental_ reason why FTL has
to be done so badly; it's just that it almost always is. Direct access
to the flash from Linux is _always_ going to be better in practice --
and that way you avoid the problems with dual journalling, along with
the problems with the underlying FTL continuing to keep (and copy around
during GC) sectors which the top-level filesystem has actually
deallocated, etc.



There are, of course, cases where direct access are better.
However, as the demand for capacity, reliability and high performance
for the flash storage increases, the use of FTL with embedded controller
would be inevitable.

- The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to
the use of multiple flash in parallel and the use of high degree ECC algorithm.
There are other things like bad block handling and wear-leveling issues
which has been recently touched by UBI with added SW complexity.

- In contrast to the embedded environment where CPU and flash is directly
connected, the I/O path between CPU and flash in PC environment is longer.
The latency for SW handshaking between CPU and flash will also be longer,
which would make the performance optimization harder.

As I mentioned, some techniques like log-structured filesystem could
perform generally better on any kind of flash-based storage with FTL.
Although there are many kinds of FTL, it is commonly true that
it performs well under workload where sequential write is dominant.

I also expect that FTL for PC environment will have better quality spec
than the disposable storage.

Dongjun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread David Woodhouse
On Thu, 2007-05-17 at 15:12 +0900, Dongjun Shin wrote:
> The current trend of flash-based device is to hide the flash-specific details
> from the host OS. The flash memory is encapsulated in a package
> which contains a dedicated controller where a small piece of software (F/W or 
> FTL)
> runs and makes the storage shown as a block device to the host.

Yes. These things are almost always implemented _very_ badly by the same
kind of crack-smoking hobo they drag in off the streets to write BIOSen.

It's bog-roll technology; if you fancy a laugh try doing some real
reliability tests on them time some. Powerfail testing is a good one.
 
This kind of thing is OK for disposable storage such as in digital
cameras, where it doesn't matter that it's no more reliable than a
floppy disc, but for real long-term storage it's really a bad idea.

> IMHO, for a flash-optimized filesystem to be useful and widely-used, it would 
> be better
> to run on a block device and to be designed to run efficiently on top of the 
> FTL.
> (ex. log-structured filesystem on general block device)

There's little point in optimising a file system _specifically_ for
devices which in often aren't reliable enough to keep your data anyway.
You might as well use ramfs.

It's unfortunate really -- there's no _fundamental_ reason why FTL has
to be done so badly; it's just that it almost always is. Direct access
to the flash from Linux is _always_ going to be better in practice --
and that way you avoid the problems with dual journalling, along with
the problems with the underlying FTL continuing to keep (and copy around
during GC) sectors which the top-level filesystem has actually
deallocated, etc.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 19:17:18 +, Pavel Machek wrote:
> 
> In kernel fsck
> 
> > --- /dev/null   2007-04-18 05:32:26.652341749 +0200
> > +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 
> > +0200
> > @@ -0,0 +1,332 @@
> > +/*
> > + * fs/logfs/prog/fsck.c- filesystem check
> > + *
> > + * As should be obvious for Linux kernel code, license is GPLv2
> > + *
> > + * Copyright (c) 2005-2007 Joern Engel
> > + *
> > + * In principle this could get moved to userspace.  However it might still
> > + * make some sense to keep it in the kernel.  It is a pure checker and will
> > + * only report problems, not attempt to repair them.
> > + */
> 
> Is there version that repairs?

No.

> BUG is not right thing to do for media error.

I know.  Top 3 items of my todo list are:
- Handle system crashes
- Add second journal
- Error handling

> > +
> > +#if 0
> > +/* rootdir */
> 
> Please just delete it, not comment it out like this.

That will get resurrected, even before the move to userspace.  I had to
change the filesystem format for compression support and this is an
artifact of the transition phase.

Jörn

-- 
Ninety percent of everything is crap.
-- Sturgeon's Law
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Pavel Machek
Hi!

In kernel fsck

> --- /dev/null 2007-04-18 05:32:26.652341749 +0200
> +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c   2007-05-15 00:54:22.0 
> +0200
> @@ -0,0 +1,332 @@
> +/*
> + * fs/logfs/prog/fsck.c  - filesystem check
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * In principle this could get moved to userspace.  However it might still
> + * make some sense to keep it in the kernel.  It is a pure checker and will
> + * only report problems, not attempt to repair them.
> + */

Is there version that repairs?

> + /* Some segments are reserved.  Just pretend they were all valid */
> + reserved = btree_lookup(&super->s_reserved_segments, segno);
> + if (reserved)
> + return 0;
> +
> + err = wbuf_read(sb, dev_ofs(sb, segno, 0), sizeof(sh), &sh);
> + BUG_ON(err);

BUG is not right thing to do for media error.

> +/*
> + * fs/logfs/prog/mkfs.c  - filesystem generation
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * Should get moved to userspace.
> + */

Indeed. 

> +
> +#if 0
> +/* rootdir */

Please just delete it, not comment it out like this.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 23:49:55 +0800, David Woodhouse wrote:
> 
> Utility is a factor of the underlying design -- a filesystem designed
> for flash really isn't suited to block devices.

I can think of at least three examples where LogFS would indeed make
sense on block devices.

Jörn

-- 
Happiness isn't having what you want, it's wanting what you have.
-- unknown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread David Woodhouse
On Wed, 2007-05-16 at 08:34 -0700, Andrew Morton wrote:
> Reduced testability, mainly. Also potentially reduced usefulness. 

CONFIG_MTD has never been a barrier to testability. JFFS2 depends on MTD
and had _most_ of its early testing and development done on the 'fake'
mtdram device.

Utility is a factor of the underlying design -- a filesystem designed
for flash really isn't suited to block devices.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Andrew Morton
On Wed, 16 May 2007 20:07:18 +0800 David Woodhouse <[EMAIL PROTECTED]> wrote:

> > It's strange and a bit regrettable that an fs would have dependency on MTD,
> > really.
> 
> Why? Other file systems has dependencies on BLOCK or on NET. It seems
> entirely normal to me.

Reduced testability, mainly. Also potentially reduced usefulness.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Kevin Bowling

On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote:

On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
>
> My experience is that no matter which name I pick, people will
> complain
> anyway.  Previous suggestions included:
> jffs3
> jefs
> engelfs
> poofs
> crapfs
> sweetfs
> cutefs
> dynamic journaling fs - djofs
> tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)


However if Jörn is accused of murder, it will have little chance of
being merged :-).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread CaT
On Wed, May 16, 2007 at 03:53:19PM +0200, J??rn Engel wrote:
> Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
> joke and keep my logfs.org domain.

Well if stupid jokes are a goer there's always gordonfs. :)

*hides*

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Artem Bityutskiy
On Wed, 2007-05-16 at 22:04 +0800, David Woodhouse wrote:
> On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> > 
> > My experience is that no matter which name I pick, people will
> > complain
> > anyway.  Previous suggestions included:
> > jffs3
> > jefs
> > engelfs
> > poofs
> > crapfs
> > sweetfs
> > cutefs
> > dynamic journaling fs - djofs
> > tfsfkal - the file system formerly known as logfs
> 
> Can we call it jörnfs? :)

And it is essential to preserve "ö" and let Pavel enjoy :-)

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread David Woodhouse
On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> 
> My experience is that no matter which name I pick, people will
> complain
> anyway.  Previous suggestions included:
> jffs3
> jefs
> engelfs
> poofs
> crapfs
> sweetfs
> cutefs
> dynamic journaling fs - djofs
> tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
> Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> 
> Jörn> How many of you have worked for IBM before?  Vowels are not
> evil. ;)
> 
> Nope, they're not.  I just think that LogFS isn't descriptive enough,
> or more accurately, is the *wrong* description of this filesystem.  

That was the whole point.  JFFS2, the journaling flash filesystem, is a
strictly log-structured filesystem.  LogFS has a journal.

It is also the filesystem that tries to scale logarithmically, as Arnd
has noted.  Maybe I should call it Log2 to emphesize this point.  Log1
would be horrible scalability.

> flashfs works for me.  It's longer, but hey, that's ok.  Even flshfs
> might work.  Oh wait, flesh?  flash?  flush?  Too confusing... :-)   

Maybe.  FFS or flash filesystem already exists.  And YAFFS, yet another
flash filesystem, would be older than flashfs.

My experience is that no matter which name I pick, people will complain
anyway.  Previous suggestions included:
jffs3
jefs
engelfs
poofs
crapfs
sweetfs
cutefs
dynamic journaling fs - djofs
tfsfkal - the file system formerly known as logfs

Plus today:
FFFS
flashfs
fredfs
bob
shizzle

Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
joke and keep my logfs.org domain.

Jörn

-- 
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other is
to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 15:36:44 +0300, Pekka Enberg wrote:
> On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote:
> >
> >More trouble?
> 
> Forgot to add (see below). Seems logfs_segment_read would be simpler
> too if you fixed this.

Would it?  I think that code would still be needed, although possibly in
a different function.

There are two minor drawbacks to using the page cache, btw:
- Indirect blocks need some mapping too.  So either I need to steal a
  bit from the inode space or from the fpos space.
- OOM handling is a bit more complicated.  I would need a mempool for
  that.

> >[ Objects are the units that get compressed.  Segments can contain both
> >compressed and uncompressed objects. ]
> >
> >It is a trade-off.  Each object has a 24 Byte header plus X Bytes of
> >data.  Whether the data is compressed or not is indicated in the header.
> 
> Was my point really. Why do segments contain both compressed and
> uncompressed objects?

Compressing random data will actually enlarge it.  If that happens I
simply store the verbatim uncompressed data instead and mark it as such.

There is also demand for a user-controlled bit in the inode to disable
compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
time by trying and failing to compress them.

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 15:08:15 +0300, Pekka Enberg wrote:
> On 5/16/07, Jamie Lokier <[EMAIL PROTECTED]> wrote:
> >Given that the filesystem is still 'experimental', I'd concentrate on
> >getting it stable before worrying about immutable and xattrs unless
> >they are easy.
> 
> We will run into trouble if the on-disk format is not flexible enough
> to accommodate xattrs (think reiser3 here). So I'd worry about it
> before merging to mainline.

Adding xattrs would be fairly simple.  Inodes just need one extra
pointer for that.

Luckily inodes no longer need to be padded to 128 or 256 bytes.  They
are individually compressed, so their size is not limited to powers of
two.

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) 
> wrote:
> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > > 
> > > But if akpm can't pronounce it, how about FFFS for faster flash
> > > filesystem ;-)
> > 
> > How many of you have worked for IBM before?  Vowels are not evil. ;)
> 
> Do you think 'eieio' is a good set? IBM's work too...

I will let someone else comment on that one.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0110.1/1294.html

Jörn

-- 
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other is
to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 13:25:48 +0100, Jamie Lokier wrote:
> 
> Is LogFS really slower than JFFS2 in practice?

Not sure.  I ran a benchmark before adding compression support in QEMU
with a lightning-fast device.  So the results should differ quite a bit
from practice.

http://logfs.org/~joern/logfs/benchmark/benchmark_overview

LogFS was actually faster than JFFS2.  So for that particular
unrealistic benchmark, updating the LogFS tree was less expensive than
trying (and failing) to compress and calculating the CRC was for JFFS2.

With compression finished, I would expect LogFS numbers to degrade.  If
file data had checksums (not done yet, should be optional for users to
decide) even more so.

> I would have guessed reads to be a similar speed, tree updates to be a
> similar speed  to journal  updates for sustained  non-fsyncing writes,
> and the difference unimportant for tiny individual commits whose index
> updates are not merged with any other.  I've not thought about it much
> though.

LogFS isn't that good yet.  Right now, writing 10 adjacent blocks to a
file requires 10 tree updates instead of 1.  Not full updates though,
just up to the inode.

Quite surprisingly, read speed in the benchmark was significantly better
for LogFS, even after substracting mount time.  I don't know if all of
that can be explained with CRC checks or there is more to it.

Jörn

-- 
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka Enberg

On 5/16/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

Forgot to add (see below). Seems logfs_segment_read would be simpler
too if you fixed this.


Blah. Just to be clear: I forgot to add a "(see below)" text in the
original review comment.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka Enberg

On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote:

> > +/* FIXME: all this mess should get replaced by using the page cache */
> > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
> *area,
> > + void *read, u64 ofs, size_t readlen)
> > +{
>
> Indeed. And I think you're getting some more trouble because of this...

More trouble?


Forgot to add (see below). Seems logfs_segment_read would be simpler
too if you fixed this.


[ Objects are the units that get compressed.  Segments can contain both
compressed and uncompressed objects. ]

It is a trade-off.  Each object has a 24 Byte header plus X Bytes of
data.  Whether the data is compressed or not is indicated in the header.


Was my point really. Why do segments contain both compressed and
uncompressed objects?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Evgeniy Polyakov
On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > 
> > But if akpm can't pronounce it, how about FFFS for faster flash
> > filesystem ;-)
> 
> How many of you have worked for IBM before?  Vowels are not evil. ;)

Do you think 'eieio' is a good set? IBM's work too...

We do not get new filesystem each couple of month, so stealing so good
name as 'logfs' is not that bad I think.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 13:21:11 +0300, Pekka J Enberg wrote:
> 
> > +#define LOGFS_BUG(sb) do {   \
> > + struct super_block *__sb = sb;  \
> > + logfs_crash_dump(__sb); \
> > + BUG();  \
> > +} while(0)
> 
> Note that BUG() can be a no-op so dumping something on disk might not make 
> sense there. This seems useful, but you probably need to make this bit 
> more generic so that using BUG() proper in your filesystem code does the 
> right thing. Inventing your own wrapper should be avoided.

Hmm.  I am not sure how this could be generic and still make sense.

LogFS has some unused write-once space in the superblock segment.
Embedded people always have problems debugging and were suggesting using
this to store debug information.  That allows me to ask for a filesystem
image and get both the offending image plus a crash dump.  It also
allows me to abort mounting if I ever see an existing crash dump (not
implemented yet).  "First failure data capture" was an old IBM slogal
and the "first" really makes a difference.

So how should I deal with BUG() as a no-op?  I _think_ it should not
make a difference.  People choosing that option should know that their
kernels will continue running with errors and can potentially do any
kind of damage.  My wrapper doesn't seem to change that one way or
another.

What I _should_ do is make the filesystem read-only after this point.
That would actually be quite simple.  Onto my list.

> > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> > +{
> > + return sb->s_fs_info;
> > +}
> > +
> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > +{
> > + return container_of(inode, struct logfs_inode, vfs_inode);
> > +}
> 
> No need for upper case in function names.

Will change.

> > +int logfs_memcpy(void *in, void *out, size_t inlen, size_t outlen)
> > +{
> > + if (outlen < inlen)
> > + return -EIO;
> > + memcpy(out, in, inlen);
> > + return inlen;
> > +}
> 
> Please drop this wrapper function. It's better to open-code the error
> handling in the callers (there are total of three of them).

Is it?  The advantage of having the wrapper is that the compression case
and the non-compressed case look identical and the code just repeats the
same pattern twice.

I'll try to open-code it and see how it looks.

> > +/* FIXME: combine with per-sb journal variant */
> > +static unsigned char compressor_buf[LOGFS_MAX_OBJECTSIZE];
> > +static DEFINE_MUTEX(compr_mutex);
> 
> This looks fishy. All reads and writes are serialized by compr_mutex 
> because they share a scratch buffer for compression and uncompression?

s/fishy/lame/

Will move to superblock.

> > +/* FIXME: all this mess should get replaced by using the page cache */
> > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area 
> *area,
> > + void *read, u64 ofs, size_t readlen)
> > +{
> 
> Indeed. And I think you're getting some more trouble because of this... 

More trouble?

Sadly squeezing bugs out of that beast was easier than wrapping my brain
around the proper solution.  And now it works and has moved to a less
urgent position in my todo list.

> > +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs)
> > +{
> > + struct logfs_object_header *h;
> > + u16 len;
> > + int err, bs = sb->s_blocksize;
> > +
> > + mutex_lock(&compr_mutex);
> > + err = wbuf_read(sb, ofs, bs + LOGFS_HEADERSIZE, compressor_buf);
> > + if (err)
> > + goto out;
> > + h = (void*)compressor_buf;
> > + len = be16_to_cpu(h->len);
> > +
> > + switch (h->compr) {
> > + case COMPR_NONE:
> > + logfs_memcpy(compressor_buf + LOGFS_HEADERSIZE, buf, bs, 
> bs);
> > + break;
> 
> Seems wasteful to first read the data in a scratch buffer and then 
> memcpy() it immediately for the COMPR_NONE case. Any reason why we can't 
> read a full struct page, for example, and simply use that if we don't need 
> to uncompress anything?

No technical reason.  Just a case of "make it work now and optimize
later".

> > + case COMPR_ZLIB:
> > + err = logfs_uncompress(compressor_buf + LOGFS_HEADERSIZE, 
> buf,
> > + len, bs);
> > + BUG_ON(err);
> > + break;
> 
> Not claiming to undestand your on-disk format, but wouldn't it make more 
> sense if we knew whether a given segment is compressed or not _before_ we 
> actually read it?

[ Objects are the units that get compressed.  Segments can contain both
compressed and uncompressed objects. ]

It is a trade-off.  Each object has a 24 Byte header plus X Bytes of
data.  Whether the data is compressed or not is indicated in the header.
So we have to do one of two things.  Either:
- read the complete object, then either uncompress or memcpy, or
- read just the header, then either read uncompressed data or read
  compressed data and uncompre

Re: [PATCH] LogFS take three

2007-05-16 Thread Jamie Lokier
Artem Bityutskiy wrote:
> On Wed, 2007-05-16 at 12:34 +0100, Jamie Lokier wrote:
> > Jörn Engel wrote:
> > > On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
> > > > Personally I'd just go for 'JFFS3'. After all, it has a better claim to
> > > > the name than either of its predecessors :)
> > > 
> > > Did you ever see akpm's facial expression when he tried to pronounce
> > > "JFFS2"?  ;)
> > 
> > JFFS3 is a good, meaningful name to anyone familiar with JFFS2.
> > 
> > But if akpm can't pronounce it, how about FFFS for faster flash
> > filesystem ;-)
> 
> The problem is that JFFS2 will always be faster in terms of I/O speed
> anyway, just because it does not have to maintain on-flash indexing
> data structures. But yes, it is slow in mount and in building big
> inodes, so the "faster" is confusing.

Is LogFS really slower than JFFS2 in practice?

I would have guessed reads to be a similar speed, tree updates to be a
similar speed  to journal  updates for sustained  non-fsyncing writes,
and the difference unimportant for tiny individual commits whose index
updates are not merged with any other.  I've not thought about it much
though.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread CaT
On Wed, May 16, 2007 at 01:50:03PM +0200, J??rn Engel wrote:
> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > 
> > But if akpm can't pronounce it, how about FFFS for faster flash
> > filesystem ;-)
> 
> How many of you have worked for IBM before?  Vowels are not evil. ;)
> 
> Grouping four or more consonants to name anything will cause similar
> expressions on people's faces.  Numbers don't help much either.
> 
> Ext2 is a great name, because "ext" actually is a pronouncable syllable.
> MinixFS, ChunkFS, TileFS are great too.  XFS and JFS are ok, at least
> they only have three consonants.  But FFS exists, so I'd rather go for a
> syllable.

FlashFS?

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka Enberg

On 5/16/07, Jamie Lokier <[EMAIL PROTECTED]> wrote:

Given that the filesystem is still 'experimental', I'd concentrate on
getting it stable before worrying about immutable and xattrs unless
they are easy.


We will run into trouble if the on-disk format is not flexible enough
to accommodate xattrs (think reiser3 here). So I'd worry about it
before merging to mainline.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread David Woodhouse
On Tue, 2007-05-15 at 13:37 -0700, Andrew Morton wrote:
> But it includes an MTD header file.
> 
> Can this code be tested by people who don't have MTD hardware?  We used to
> ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some
> disrepair.  Maybe it got repaired.  Or removed.  I can't immediately locate
> it...

There's block2mtd, as Jörn pointed out. There's also 'nandsim' and
'mtdram', which will vmalloc some space and use that to emulate NAND or
NOR flash, respectively.

> It's strange and a bit regrettable that an fs would have dependency on MTD,
> really.

Why? Other file systems has dependencies on BLOCK or on NET. It seems
entirely normal to me.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> 
> But if akpm can't pronounce it, how about FFFS for faster flash
> filesystem ;-)

How many of you have worked for IBM before?  Vowels are not evil. ;)

Grouping four or more consonants to name anything will cause similar
expressions on people's faces.  Numbers don't help much either.

Ext2 is a great name, because "ext" actually is a pronouncable syllable.
MinixFS, ChunkFS, TileFS are great too.  XFS and JFS are ok, at least
they only have three consonants.  But FFS exists, so I'd rather go for a
syllable.

Jörn

-- 
Joern's library part 5:
http://www.faqs.org/faqs/compression-faq/part2/section-9.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jamie Lokier
Albert Cahalan wrote:
> Please don't forget the immutable bit. ("man lsattr")
> Having both, BSD-style, would be even better.
> The immutable bit is important for working around
> software bugs and "features" that damage files.
> 
> I also can't find xattr support.

Imho,

Given that the filesystem is still 'experimental', I'd concentrate on
getting it stable before worrying about immutable and xattrs unless
they are easy.  (Immutable is easy).

In 13 years of using Linux in all sorts of ways I've never yet to use
either feature.  They would be good to have, but stability in a
filesystem is much more useful.

I'm biased of course: if LogFS were stable and well tested, I'd be
using it right now in my embedded thingy - and that doesn't even
bother with uids :-)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Artem Bityutskiy
On Wed, 2007-05-16 at 12:34 +0100, Jamie Lokier wrote:
> Jörn Engel wrote:
> > On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
> > > Personally I'd just go for 'JFFS3'. After all, it has a better claim to
> > > the name than either of its predecessors :)
> > 
> > Did you ever see akpm's facial expression when he tried to pronounce
> > "JFFS2"?  ;)
> 
> JFFS3 is a good, meaningful name to anyone familiar with JFFS2.
> 
> But if akpm can't pronounce it, how about FFFS for faster flash
> filesystem ;-)

The problem is that JFFS2 will always be faster in terms of I/O speed
anyway, just because it does not have to maintain on-flash indexing
data structures. But yes, it is slow in mount and in building big
inodes, so the "faster" is confusing.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Tue, 15 May 2007 19:37:36 -0700, Roland Dreier wrote:
> 
> There are rather a lot of of FIXME comments, including scary stuff like
> 
>  > +  /*
>  > +   * FIXME: this cannot be right but it does "fix" a bug of i_count
>  > +   * dropping too low.  Needs more thought.
>  > +   */
>  > +  atomic_inc(&old_dentry->d_inode->i_count);

Any thoughts on this would be appreciated.

> and
> 
>  > +int __logfs_write_inode(struct inode *inode)
>  > +{
>  > +  /*
>  > +   * FIXME: Those two inodes are 512 bytes in total.  Not good to
>  > +   * have on the stack.  Possibly the best solution would be to bite
>  > +   * the bullet and do another format change before release and
>  > +   * shrink the inodes.
>  > +   */
>  > +  struct logfs_disk_inode old, new;
> 
> are you going to change the format?  or fix this some other way?

I would love to put my inodes on a diet.  It is just a matter of time
and priorities.  To me the 512 bytes on the stack are unfortunate, but
not a show stopper.  Crash behaviour is, so that has priority.

> I think a sweep through the code searching for FIXME and at least
> rewriting all such comments to look like stuff that can be deferred
> would be warranted ;)

Are you asking me to hide known problems under a rug? ;)

Will see if I can easily fix some of these.  In particular the
eat-your-data FIXME that can cause LogFS to not survive a crash.

Jörn

-- 
Correctness comes second.
Features come third.
Performance comes last.
Maintainability is easily forgotten.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jamie Lokier
Jörn Engel wrote:
> On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
> > Personally I'd just go for 'JFFS3'. After all, it has a better claim to
> > the name than either of its predecessors :)
> 
> Did you ever see akpm's facial expression when he tried to pronounce
> "JFFS2"?  ;)

JFFS3 is a good, meaningful name to anyone familiar with JFFS2.

But if akpm can't pronounce it, how about FFFS for faster flash
filesystem ;-)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
> 
> Personally I'd just go for 'JFFS3'. After all, it has a better claim to
> the name than either of its predecessors :)

Did you ever see akpm's facial expression when he tried to pronounce
"JFFS2"?  ;)

Jörn

-- 
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 07:22:54 +0200, Willy Tarreau wrote:
> 
> On hard disks, yes, but as you suggested, there are lots of other flash
> devices interfaced as block devices. CompactFlash comes to mind, USB
> keys too. And on these ones, the most important is to reduce the number
> of writes and to support large sizes. I already see LogFS as an interesting
> alternative to JFFS2 on such devices, eventhough it does not (yet?) support
> compression.

It does support compression now.  That part was finished, erm, about two
weeks ago.

Logfs on Smartmedia format (I assume most CompactFlash, USB, SD and MMC
media use that) will currently cause a nasty interaction.  Unlike JFFS2,
logfs will alternately write to several eraseblocks.  On real flash,
alternating between half-finished blocks is free.

Smartmedia standard requires to finish the last block before moving to
the next.  So the logfs write pattern will cause it to add an
erase/write overhead of - depending on erase size and compression - 4x
to 512x.  Pretty bad.

If Smartmedia was really stupid, there would be no overhead at all.  Any
manucaturers listening should seriously think about allowing raw flash
access to their devices. ;)

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka J Enberg
Hi Joern,

> +#define LOGFS_BUG(sb) do {   \
> + struct super_block *__sb = sb;  \
> + logfs_crash_dump(__sb); \
> + BUG();  \
> +} while(0)

Note that BUG() can be a no-op so dumping something on disk might not make 
sense there. This seems useful, but you probably need to make this bit 
more generic so that using BUG() proper in your filesystem code does the 
right thing. Inventing your own wrapper should be avoided.

> +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> +{
> + return sb->s_fs_info;
> +}
> +
> +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> +{
> + return container_of(inode, struct logfs_inode, vfs_inode);
> +}

No need for upper case in function names.

> +int logfs_memcpy(void *in, void *out, size_t inlen, size_t outlen)
> +{
> + if (outlen < inlen)
> + return -EIO;
> + memcpy(out, in, inlen);
> + return inlen;
> +}

Please drop this wrapper function. It's better to open-code the error
handling in the callers (there are total of three of them).

> +/* FIXME: combine with per-sb journal variant */
> +static unsigned char compressor_buf[LOGFS_MAX_OBJECTSIZE];
> +static DEFINE_MUTEX(compr_mutex);

This looks fishy. All reads and writes are serialized by compr_mutex 
because they share a scratch buffer for compression and uncompression?

> +/* FIXME: all this mess should get replaced by using the page cache */
> +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area 
*area,
> + void *read, u64 ofs, size_t readlen)
> +{

Indeed. And I think you're getting some more trouble because of this... 

> +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs)
> +{
> + struct logfs_object_header *h;
> + u16 len;
> + int err, bs = sb->s_blocksize;
> +
> + mutex_lock(&compr_mutex);
> + err = wbuf_read(sb, ofs, bs + LOGFS_HEADERSIZE, compressor_buf);
> + if (err)
> + goto out;
> + h = (void*)compressor_buf;
> + len = be16_to_cpu(h->len);
> +
> + switch (h->compr) {
> + case COMPR_NONE:
> + logfs_memcpy(compressor_buf + LOGFS_HEADERSIZE, buf, bs, 
bs);
> + break;

Seems wasteful to first read the data in a scratch buffer and then 
memcpy() it immediately for the COMPR_NONE case. Any reason why we can't 
read a full struct page, for example, and simply use that if we don't need 
to uncompress anything?

> + case COMPR_ZLIB:
> + err = logfs_uncompress(compressor_buf + LOGFS_HEADERSIZE, 
buf,
> + len, bs);
> + BUG_ON(err);
> + break;

Not claiming to undestand your on-disk format, but wouldn't it make more 
sense if we knew whether a given segment is compressed or not _before_ we 
actually read it?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Willy Tarreau
On Wed, May 16, 2007 at 02:06:31AM +0200, Jörn Engel wrote:
> On Tue, 15 May 2007 13:37:59 -0700, Andrew Morton wrote:
> > It's strange and a bit regrettable that an fs would have dependency on MTD,
> > really.
> 
> It is and changing this wouldn't be too hard.  All device access goes
> through five functions (read, write, erase, is_bad and mark_bad).  As
> soon as someone seriously cares I will add a struct logfs_device_ops and
> have a second set of these functions for block devices.
> 
> On hard disks it shouldn't make too much sense.  The filesystem will
> fragment like a splinter bomb and be just as popular.

On hard disks, yes, but as you suggested, there are lots of other flash
devices interfaced as block devices. CompactFlash comes to mind, USB
keys too. And on these ones, the most important is to reduce the number
of writes and to support large sizes. I already see LogFS as an interesting
alternative to JFFS2 on such devices, eventhough it does not (yet?) support
compression.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread David Woodhouse
On Tue, 2007-05-15 at 21:19 +0200, Jörn Engel wrote:
> On Tue, 15 May 2007 15:07:05 -0400, John Stoffel wrote:
> > 
> > I've been semi watching this, and the only comment I really can give
> > is that I hate the name.  To me, logfs implies a filesystem for
> > logging purposes, not for Flash hardware with wear leveling issues to
> > be taken into account.
> 
> Yeah, well, ...
> 
> Two years ago when I started all this, I was looking for a good name.
> All I could come up with sounded stupid, so I picked "LogFS" as a code
> name.  As soon as I find a better name, the code name should get
> replaced.
> 
> By now I still don't have anything better.  All alternatives that were
> proposed are just as bad - with the added disadvantage of being new and
> not established yet.  My hope of ever finding a better name is nearly
> zero. 

Personally I'd just go for 'JFFS3'. After all, it has a better claim to
the name than either of its predecessors :)

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Roland Dreier
There are rather a lot of of FIXME comments, including scary stuff like

 > +/*
 > + * FIXME: this cannot be right but it does "fix" a bug of i_count
 > + * dropping too low.  Needs more thought.
 > + */
 > +atomic_inc(&old_dentry->d_inode->i_count);

and

 > +int __logfs_write_inode(struct inode *inode)
 > +{
 > +/*
 > + * FIXME: Those two inodes are 512 bytes in total.  Not good to
 > + * have on the stack.  Possibly the best solution would be to bite
 > + * the bullet and do another format change before release and
 > + * shrink the inodes.
 > + */
 > +struct logfs_disk_inode old, new;

are you going to change the format?  or fix this some other way?

I think a sweep through the code searching for FIXME and at least
rewriting all such comments to look like stuff that can be deferred
would be warranted ;)

 - R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Jörn Engel
On Wed, 16 May 2007 02:06:31 +0200, Jörn Engel wrote:
>
> > > +
> > > + if (dest) {
> > > + /* symlink */
> > > + ret = logfs_inode_write(inode, dest, destlen, 0);
> > > + } else {
> > > + /* creat/mkdir/mknod */
> > > + ret = __logfs_write_inode(inode);
> > > + }
> > > + super->s_victim_ino = 0;
> > > + if (ret) {
> > > + if (!dest)
> > > + li->li_flags |= LOGFS_IF_STILLBORN;
> > > + /* FIXME: truncate symlink */
> > > + inode->i_nlink--;
> > > + iput(inode);
> > > + goto out;
> > > + }
> > > +
> > > + if (inode->i_mode & S_IFDIR)
> > > + dir->i_nlink++;
> > 
> > You have helper functions for i_nlink++, which remember to do
> > mark_inode_dirty()?
> 
> I do.  Looks like I should use them here and in at least one other
> place.  Will recheck them all.

Actually, this code was correct.  New inodes get written immediatly to
ensure catching -ENOSPC at a place where errors can get returned.  After
leaving this function, the inode is not dirty.  Either it has been
written or it is marked LOGFS_IF_STILLBORN never will be.

Your intuition was still good.  Link() did need the change you
indicated.

Jörn

-- 
Joern's library part 14:
http://www.sandpile.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Jörn Engel
On Tue, 15 May 2007 19:26:17 -0400, Albert Cahalan wrote:
> 
> Please don't forget the immutable bit. ("man lsattr")
> Having both, BSD-style, would be even better.
> The immutable bit is important for working around
> software bugs and "features" that damage files.
> 
> I also can't find xattr support.

Not forgotten, just on a different list.  Bugs beat features. :)

Jörn

-- 
Data expands to fill the space available for storage.
-- Parkinson's Law
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Jörn Engel
On Tue, 15 May 2007 13:37:59 -0700, Andrew Morton wrote:
> > +
> > +config LOGFS_FSCK
> > +   bool "Run LogFS fsck at mount time"
> > +   depends on LOGFS
> > +   help
> > + Run a full filesystem check on every mount.  If any errors are
> > + found, mounting the filesystem will fail.  This is a debug option
> > + for developers.
> > +
> > + If unsure, say N.
> > +
> 
> No dependency on MTD,

Ack.

> Can this code be tested by people who don't have MTD hardware?  We used to
> ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some
> disrepair.  Maybe it got repaired.  Or removed.  I can't immediately locate
> it...

Got removed.  Block2mtd is the new one and works with logfs.  At least
my version of it does and I pushed my patches to David - not sure if
they made it to mainline already.  Mtdram should work as well.

> It's strange and a bit regrettable that an fs would have dependency on MTD,
> really.

It is and changing this wouldn't be too hard.  All device access goes
through five functions (read, write, erase, is_bad and mark_bad).  As
soon as someone seriously cares I will add a struct logfs_device_ops and
have a second set of these functions for block devices.

On hard disks it shouldn't make too much sense.  The filesystem will
fragment like a splinter bomb and be just as popular.

> ooh, documentation.  Quick, merge it!

:)

> > +/* memtree.c */
> > +void btree_init(struct btree_head *head);
> > +void *btree_lookup(struct btree_head *head, long val);
> > +int btree_insert(struct btree_head *head, long val, void *ptr);
> > +int btree_remove(struct btree_head *head, long val);
> 
> These names are too generic.  If we later add a btree library: blam.

My plan was to move this code to lib/ sooner or later.  If you consider
it useful in its current state, I can do it immediatly.  And if someone
else merged a superior btree library I'd happily remove mine and use the
new one instead.

Opinions?

> > +/* super.c */
> > +int mtdread(struct super_block *sb, loff_t ofs, size_t len, void *buf);
> > +int mtdwrite(struct super_block *sb, loff_t ofs, size_t len, void *buf);
> > +int mtderase(struct super_block *sb, loff_t ofs, size_t len);
> > +void logfs_crash_dump(struct super_block *sb);
> > +int all_ff(void *buf, size_t len);
> > +int logfs_statfs(struct dentry *dentry, struct kstatfs *stats);
> 
> Have you checked that all of this needs global scope?

Not within the last month.  Will recheck.

> > +
> > +/* progs/fsck.c */
> > +#ifdef CONFIG_LOGFS_FSCK
> > +int logfs_fsck(struct super_block *sb);
> > +#else
> > +#define logfs_fsck(sb) ({ 0; })
> 
> static inline int logfs_fsck(struct super_block *sb)
> {
>   return 0;
> }
> 
> is better: nicer to look at, has typechecking.

That's what I tried first.  But the brick I carry on my neck decided to
compile and link the read logfs_fsck() unconditionally and so this
collided.

Will change the Makefile and this.

> > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> > +{
> > +   return sb->s_fs_info;
> > +}
> > +
> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > +{
> > +   return container_of(inode, struct logfs_inode, vfs_inode);
> > +}
> 
> Do these need to be uppercase?

No more than EXT2_SB() or EXT2_I().

Since you are the second person to ask, I sense a majority vote to
change the case.

> > +static inline pgoff_t logfs_index(u64 pos)
> > +{
> > +   return pos / LOGFS_BLOCKSIZE;
> > +}
> 
> If the compiler goofs up here we'll end up trying to do a 64/32 divide and
> it won't link on 32-bit machines.  It would be safer to do
> 
>   return pos >> LOGFS_BLOCKSHIFT;

Will change.

> > +int logfs_uncompress(void *in, void *out, size_t inlen, size_t outlen)
> > +{
> > +   int err, ret;
> > +
> > +   ret = -EIO;
> > +   mutex_lock(&compr_mutex);
> 
> A per-superblock lock and stream would be nicer.

Yes and no.  The zlib workspaces weigh about 300k.  On any decent SMP
machine, that hardly matters and getting the scalability is nice.  For
low-end embedded, that amount of memory per filesystem can matter.

I have no clue how many people would suffer one way or another.

> > +
> > +
> > +static inline loff_t file_end(struct inode *inode)
> > +{
> > +   return (i_size_read(inode) + inode->i_sb->s_blocksize - 1)
> > +   >> inode->i_sb->s_blocksize_bits;
> > +}
> > +static void logfs_set_name(struct logfs_disk_dentry *dd, struct qstr *name)
> > +{
> 
> The code has a strange mix of two-blank-lines-between-functions and
> no-blank-lines-between-functions.  One blank line is usual.

Will change.

> > +
> > +   if (dest) {
> > +   /* symlink */
> > +   ret = logfs_inode_write(inode, dest, destlen, 0);
> > +   } else {
> > +   /* creat/mkdir/mknod */
> > +   ret = __logfs_write_inode(inode);
> > +   }
> > +   super->s_victim_ino = 0;
> > +   if (ret) {
> > +   if (!dest)
> > +   li->li_flags |= LOGFS_IF_STILLBORN;
> > +   

Re: [PATCH] LogFS take three

2007-05-15 Thread Albert Cahalan

Please don't forget the immutable bit. ("man lsattr")
Having both, BSD-style, would be even better.
The immutable bit is important for working around
software bugs and "features" that damage files.

I also can't find xattr support.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Andrew Morton
On Tue, 15 May 2007 17:19:20 +0200
J__rn Engel <[EMAIL PROTECTED]> wrote:

> Add LogFS, a scalable flash filesystem.
>
> ...
>
>  
> +config LOGFS
> + tristate "Log Filesystem (EXPERIMENTAL)"
> + depends on EXPERIMENTAL
> + select ZLIB_INFLATE
> + select ZLIB_DEFLATE
> + help
> +   Flash filesystem aimed to scale efficiently to large devices.
> +   In comparison to JFFS2 it offers significantly faster mount
> +   times and potentially less RAM usage, although the latter has
> +   not been measured yet.
> +
> +   In its current state it is still very experimental and should
> +   not be used for other than testing purposes.
> +
> +   If unsure, say N.
> +
> +config LOGFS_FSCK
> + bool "Run LogFS fsck at mount time"
> + depends on LOGFS
> + help
> +   Run a full filesystem check on every mount.  If any errors are
> +   found, mounting the filesystem will fail.  This is a debug option
> +   for developers.
> +
> +   If unsure, say N.
> +

No dependency on MTD,

> @@ -0,0 +1,373 @@
> +/*
> + * fs/logfs/logfs.h
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * Private header for logfs.
> + */
> +#ifndef fs_logfs_logfs_h
> +#define fs_logfs_logfs_h
> +
> +#define __CHECK_ENDIAN__
> +
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

But it includes an MTD header file.

Can this code be tested by people who don't have MTD hardware?  We used to
ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some
disrepair.  Maybe it got repaired.  Or removed.  I can't immediately locate
it...

It's strange and a bit regrettable that an fs would have dependency on MTD,
really.

> +
> +/**
> + * struct logfs_area - area management information
> + *
> + * @a_sb:the superblock this area belongs to
> + * @a_is_open:   1 if the area is currently open, else 0
> + * @a_segno: segment number of area
> + * @a_used_objects:  number of used objects (XXX: should get removed)
> + * @a_used_bytes:number of used bytes
> + * @a_ops:   area operations (either journal or ostore)
> + * @a_wbuf:  write buffer
> + * @a_erase_count:   erase count
> + * @a_level: GC level
> + */

ooh, documentation.  Quick, merge it!

> +/* memtree.c */
> +void btree_init(struct btree_head *head);
> +void *btree_lookup(struct btree_head *head, long val);
> +int btree_insert(struct btree_head *head, long val, void *ptr);
> +int btree_remove(struct btree_head *head, long val);

These names are too generic.  If we later add a btree library: blam.

> +
> +/* readwrite.c */
> +int logfs_inode_read(struct inode *inode, void *buf, size_t n, loff_t _pos);
> +int logfs_inode_write(struct inode *inode, const void *buf, size_t n,
> + loff_t pos);

It's a bit rude stealing the logfs* namespace, but I guess you got there
first ;)

> +int logfs_readpage_nolock(struct page *page);
> +int logfs_write_buf(struct inode *inode, pgoff_t index, void *buf);
> +int logfs_delete(struct inode *inode, pgoff_t index);
> +int logfs_rewrite_block(struct inode *inode, pgoff_t index, u64 ofs, int 
> level);
> +int logfs_is_valid_block(struct super_block *sb, u64 ofs, u64 ino, u64 pos);
> +void logfs_truncate(struct inode *inode);
> +u64 logfs_seek_data(struct inode *inode, u64 pos);
> +
> +int logfs_init_rw(struct logfs_super *super);
> +void logfs_cleanup_rw(struct logfs_super *super);
> +
> +/* segment.c */
> +int logfs_erase_segment(struct super_block *sb, u32 ofs);
> +int wbuf_read(struct super_block *sb, u64 ofs, size_t len, void *buf);
> +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs);
> +s64 logfs_segment_write(struct inode *inode, void *buf, u64 pos, int level,
> + int alloc);
> +int logfs_segment_delete(struct inode *inode, u64 ofs, u64 pos, int level);
> +void logfs_set_blocks(struct inode *inode, u64 no);
> +void __logfs_set_blocks(struct inode *inode);
> +/* area handling */
> +int logfs_init_areas(struct super_block *sb);
> +void logfs_cleanup_areas(struct logfs_super *super);
> +int logfs_open_area(struct logfs_area *area);
> +void logfs_close_area(struct logfs_area *area);
> +
> +/* super.c */
> +int mtdread(struct super_block *sb, loff_t ofs, size_t len, void *buf);
> +int mtdwrite(struct super_block *sb, loff_t ofs, size_t len, void *buf);
> +int mtderase(struct super_block *sb, loff_t ofs, size_t len);
> +void logfs_crash_dump(struct super_block *sb);
> +int all_ff(void *buf, size_t len);
> +int logfs_statfs(struct dentry *dentry, struct kstatfs *stats);

Have you checked that all of this needs global scope?

> +
> +/* progs/fsck.c */
> +#ifdef CONFIG_LOGFS_FSCK
> +int logfs_fsck(struct super_block *sb);
> +#else
> +#define logfs_fsck(sb) ({ 0; })

static inline int logfs_fsck(struct

Re: [PATCH] LogFS take three

2007-05-15 Thread Jörn Engel
On Tue, 15 May 2007 15:07:05 -0400, John Stoffel wrote:
> 
> I've been semi watching this, and the only comment I really can give
> is that I hate the name.  To me, logfs implies a filesystem for
> logging purposes, not for Flash hardware with wear leveling issues to
> be taken into account.

Yeah, well, ...

Two years ago when I started all this, I was looking for a good name.
All I could come up with sounded stupid, so I picked "LogFS" as a code
name.  As soon as I find a better name, the code name should get
replaced.

By now I still don't have anything better.  All alternatives that were
proposed are just as bad - with the added disadvantage of being new and
not established yet.  My hope of ever finding a better name is nearly
zero.

> Also, having scanned through the code, I find the name "cookie" using
> in logfs_inode(), logfs_iput(), logfs_iget() to be badly named.  It
> should really be something like *cached_inode, which would seem to
> give more natural semantics of
> 
>   if (cached_inode)
>   do_cached_inode_ops(...)
>   else
>   do_inode_ops(...)

Half-agreed.  For callers, the name "cookie" makes sense.  It is a
transparent thing they should not tough and hand back unchanged.  For
logfs_iget() and logfs_iput() something like "is_cached" would be
better.

Will change.

Jörn

-- 
Linux [...] existed just for discussion between people who wanted
to show off how geeky they were.
-- Rob Enderle
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Jörn Engel
On Tue, 15 May 2007 20:37:25 +0200, Sam Ravnborg wrote:
> On Tue, May 15, 2007 at 05:19:20PM +0200, Jörn Engel wrote:
> > [ I have put everyone that gave comments to the last patch on Cc:.  Hope
> > that doesn't offend anyone. ]
> > 
> > 
> > Add LogFS, a scalable flash filesystem.
> 
> Have you run sparse on this code?

Several thousand times. :)

> I do not recall if you have written something about it.
> I do not see any obvious things sparse would catch (just browsing quickly)
> but it's always a good thing to do.

Absolutely! I added this line for sparse:
+#define __CHECK_ENDIAN__ 

Not sure how much of the kernel is endian-clean.  Logfs should be.

> +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 
> +0200
> @@ -0,0 +1,332 @@
> +/*
> + * fs/logfs/prog/fsck.c- filesystem check
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * In principle this could get moved to userspace.  However it might still
> + * make some sense to keep it in the kernel.  It is a pure checker and will
> + * only report problems, not attempt to repair them.
> + */
> +#include "../logfs.h"
> +
> If potential userspace tools needs to include ../logfs.h then there is
> something that ought to be moved to include/linux/logfs.h instead.

There shouldn't be anything left in ../logfs.h that is needed for
userspace.  But the kernel fsck is lame and calls into functions like
iget(), which it pulls in from ../logfs.h.

Jörn

-- 
The only real mistake is the one from which we learn nothing.
-- John Powell
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-15 Thread Sam Ravnborg
On Tue, May 15, 2007 at 05:19:20PM +0200, Jörn Engel wrote:
> [ I have put everyone that gave comments to the last patch on Cc:.  Hope
> that doesn't offend anyone. ]
> 
> 
> Add LogFS, a scalable flash filesystem.

Have you run sparse on this code?
I do not recall if you have written something about it.
I do not see any obvious things sparse would catch (just browsing quickly)
but it's always a good thing to do.

 
+++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 
+0200
@@ -0,0 +1,332 @@
+/*
+ * fs/logfs/prog/fsck.c- filesystem check
+ *
+ * As should be obvious for Linux kernel code, license is GPLv2
+ *
+ * Copyright (c) 2005-2007 Joern Engel
+ *
+ * In principle this could get moved to userspace.  However it might still
+ * make some sense to keep it in the kernel.  It is a pure checker and will
+ * only report problems, not attempt to repair them.
+ */
+#include "../logfs.h"
+
If potential userspace tools needs to include ../logfs.h then there is
something that ought to be moved to include/linux/logfs.h instead.

Sam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Review status (Re: [PATCH] LogFS take three)

2007-05-15 Thread Jörn Engel
Most of my homework is done.  There are six items left plus another five
I believe should not get changed.


Changed:
o Kconfig description updated
o Spaces converted to tabs in Makefile
o Sorted/seperated #includes
o structures are __packed instead of packed, #define removed
o removed TRACE()
o 32bit magic is u instead of ull now
o #defines aligned
o Removed obsolete #defines, documented rest
o LOGFS_SUPER and LOGFS_INODE are inline functions now
o removed alloc_disk_sum and free_disk_sum
o moved dir_callback up
o removed pointless cast from max() parameter
o moved LOGFS_LINK_MAX to header
o license headers
o removed logfs_writepage()
o introduced and documented LOGFS_SEGMENT_RESERVE
o documented completion dance in mtderase()
o removed isbad() check from mtderase()
o removed logfs_device_getpage() and logfs_cached_read()
o resurrected code to pick correct device
o removed beXX typedefs
o removed safe_read()
o removed wrapper in logfs_cleanup_gc()
o device_read() returns error
o renamed logfs_alloc_blocks() to match current implementation
o kerneldoc comments for structures
o removed logfs_compress_none and logfs_uncompress_none
o Split header
o move fsck behind debug option


Changed many times (and I sincerely hope not to have missed any):
o removed // comments, usually dead code
o added newline between variable declarations and code
o moved tail comments
o minor code cleanups
o changed comment style
o added loglevels to printk


Unchanged:
o enum for OBJ_TOP_JOURNAL and friends
  Afaics all the potential enums end up being converted through cpu_to_64XX,
  I'll have to see if that causes more problems than the enums can hope to
  solve.
o generic function for logfs_type()
o generic helper for logfs_prepare_write()
o removed EOF
o error handling
o move rootdir generation back to mkfs


Won't happen (unless I get convinced to do otherwise):
o lowercase LOGFS_SUPER and LOGFS_INODE
  These simply match the common pattern used is many Linux filesystems.
o remove "extern"
  For structures, this keyword actually serves a purpose and makes sense.
  On the other hand, it is completely bogus for functions and I won't be
  bullied into adding bogus crud either.
o Move fs/logfs/Locking to Documentation/
  At least JFFS2 has such documentation in its own directory as well.
  I can see good arguments for either side and no strong consensus.  While
  I ultimately just don't case, doing nothing is a good option in cases
  of great uncertainty.
o change (void*) to "proper" cast
  Neither variant is much better than the other.  I'm open for suggestions
  to completely remove the casts, though.
o Change LOGFS_BUG() and LOGFS_BUG_ON() to inline functions
  These are macros for very much the same reasons BUG() and BUG_ON() are.


Jörn

-- 
To my face you have the audacity to advise me to become a thief - the worst
kind of thief that is conceivable, a thief of spiritual things, a thief of
ideas! It is insufferable, intolerable!
-- M. Binet in Scarabouche
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/