Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Evgeniy Polyakov
On Wed, May 23, 2007 at 05:14:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > > I'm just a German.  Forgive me if I drink lesser beverages.
> > 
> > You should definitely change that.
> 
> Change being German?  Not a bad idea, actually.

You cook up really tasty schnapps; in small quantities it is good for
health in infinite volumes.

> > Btw, what about this piece:
> > 
> > int logfs_erase_segment(struct super_block *sb, u32 index)
> > {
> > 	struct logfs_super *super = LOGFS_SUPER(sb);
> > 
> > 	super->s_gec++;
> > 
> > 	return mtderase(sb, index << super->s_segshift, super->s_segsize);
> > }
> > 
> > index << super->s_segshift might overflow: index is u32, but mtderase
> > expects loff_t there. Since index can be an arbitrary segment number, is
> > it possible that the overflow really occurs?
> 
> Indeed it is.  You just earned your second beer^Wvodka.

Actually this code looks less encrypted than ext2, for example, which is
definitely a good sign from a reviewer's point of view.

> Jörn

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Jörn Engel
On Wed, 23 May 2007 19:07:32 +0400, Evgeniy Polyakov wrote:
> On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) 
> wrote:
> > On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> 
> And what if it is 33 bits? Or is that not allowed?

Not allowed.  Both number and size of segments may never exceed 32bit.

> > > segsize is long, but should be u64 I think.
> > 
> > It could be s32 as well.
> 
> It is a matter of definition: if the segment size is allowed to be more
> than 32 bits, then the transformation below is not correct; otherwise the
> segment size should not take an additional 32 bits on 64-bit platforms,
> which it does as a long.

I guess I could save 4 Bytes there.

> > I'm just a German.  Forgive me if I drink lesser beverages.
> 
> You should definitely change that.

Change being German?  Not a bad idea, actually.

> Btw, what about this piece:
> 
> int logfs_erase_segment(struct super_block *sb, u32 index)
> {
>   struct logfs_super *super = LOGFS_SUPER(sb);
> 
>   super->s_gec++;
> 
>   return mtderase(sb, index << super->s_segshift, super->s_segsize);
> }
> 
> index << super->s_segshift might overflow: index is u32, but mtderase
> expects loff_t there. Since index can be an arbitrary segment number, is
> it possible that the overflow really occurs?

Indeed it is.  You just earned your second beer^Wvodka.

Jörn

-- 
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Evgeniy Polyakov
On Wed, May 23, 2007 at 02:58:41PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> > 
> > In that case the segment size must be more than 32 bits, or the
> > transformation below will not be correct?
> 
> Must it?  If segment size is just 20bit then the filesystem may only be
> 52bit.  Or 51bit when using signed values.

And what if it is 33 bits? Or is that not allowed?

> > segsize is long, but should be u64 I think.
> 
> It could be s32 as well.

It is a matter of definition: if the segment size is allowed to be more
than 32 bits, then the transformation below is not correct; otherwise the
segment size should not take an additional 32 bits on 64-bit platforms,
which it does as a long.

> > static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
> > *area, void *read, u64 ofs, size_t readlen)
> > 
> > u32 read_start = ofs & (super->s_segsize - 1);
> > u32 read_end = read_start + readlen;
> > 
> > And this can overflow, since readlen is size_t.
> 
> Theoretically yes.  Practically readlen is bounded to sb->blocksize plus
> one header.  I'll start worrying about that when blocksize approaches
> 32bit limit.
> 
> > > If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> > > beverage of choice. :)
> > 
> > > Stop killing your kidneys and your health and promoting such an
> > > antisocial lifestyle; start drinking vodka instead.
> 
> I'm just a German.  Forgive me if I drink lesser beverages.

You should definitely change that.


Btw, what about this piece:

int logfs_erase_segment(struct super_block *sb, u32 index)
{
	struct logfs_super *super = LOGFS_SUPER(sb);

	super->s_gec++;

	return mtderase(sb, index << super->s_segshift, super->s_segsize);
}

index << super->s_segshift might overflow: index is u32, but mtderase
expects loff_t there. Since index can be an arbitrary segment number, is
it possible that the overflow really occurs?
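
To make the concern concrete, here is a user-space sketch (not LogFS code).
The u32/u64 typedefs and both helper names are stand-ins invented for the
example, and u64 takes the place of loff_t:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types used in the thread. */
typedef uint32_t u32;
typedef uint64_t u64;

/*
 * Same shape as the quoted expression: the shift is evaluated in 32 bits,
 * and only the already-truncated result is widened to 64 bits.
 * Assumes segshift < 32.
 */
static u64 seg_ofs_truncating(u32 index, unsigned int segshift)
{
	return index << segshift;
}

/* Widening the index before the shift keeps the high bits. */
static u64 seg_ofs_widened(u32 index, unsigned int segshift)
{
	return (u64)index << segshift;
}
```

With a 20-bit segment shift, segment 4096 already wraps around to offset 0
in the truncating variant, while the widened variant yields 2^32.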

> Jörn
> 
> -- 
> Eighty percent of success is showing up.
> -- Woody Allen

-- 
Evgeniy Polyakov


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-23 Thread Jörn Engel
On Sun, 20 May 2007 21:30:52 +0400, Evgeniy Polyakov wrote:
> 
> In that case the segment size must be more than 32 bits, or the
> transformation below will not be correct?

Must it?  If segment size is just 20bit then the filesystem may only be
52bit.  Or 51bit when using signed values.
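
As a sketch of the 32+32bit split (an illustration only; segshift and the
helper names below are assumptions, not the LogFS API): the high bits carry
the segment number, the low bits the offset within the segment.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types used in the thread. */
typedef uint32_t u32;
typedef uint64_t u64;

/* Segment number: the bits above the in-segment offset. */
static u32 ofs_to_segno(u64 ofs, unsigned int segshift)
{
	return ofs >> segshift;
}

/* Offset within the segment: the low segshift bits. */
static u32 ofs_in_segment(u64 ofs, unsigned int segshift)
{
	return ofs & (((u64)1 << segshift) - 1);
}

/* Recombine both halves; segofs must be smaller than 1 << segshift. */
static u64 make_ofs(u32 segno, u32 segofs, unsigned int segshift)
{
	return ((u64)segno << segshift) | segofs;
}
```

With a 20-bit segment size and a 32-bit segment number, the largest offset
this can express is 2^52 - 1, which matches the 52bit figure above.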

> segsize is long, but should be u64 I think.

It could be s32 as well.

> static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
>   *area, void *read, u64 ofs, size_t readlen)
> 
> u32 read_start = ofs & (super->s_segsize - 1);
> u32 read_end = read_start + readlen;
> 
> And this can overflow, since readlen is size_t.

Theoretically yes.  Practically readlen is bounded to sb->blocksize plus
one header.  I'll start worrying about that when blocksize approaches
32bit limit.
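
A user-space sketch of that truncation, assuming a u32 segment size that is
a power of two (s_segsize in the patch is a long); the helper names are made
up for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types used in the thread. */
typedef uint32_t u32;
typedef uint64_t u64;

/*
 * Same arithmetic as the quoted lines: whatever width the addition is
 * done in, the result is reduced modulo 2^32 when stored into a u32.
 */
static u32 read_end_u32(u64 ofs, u32 segsize, size_t readlen)
{
	u32 read_start = ofs & (segsize - 1);

	return read_start + readlen;
}

/* Returns 1 if read_start + readlen still fits into 32 bits, else 0. */
static int read_end_fits(u64 ofs, u32 segsize, size_t readlen)
{
	u64 read_start = ofs & (segsize - 1);

	return (u64)readlen <= UINT32_MAX - read_start;
}
```

As long as readlen stays near the block size, read_end_fits() holds for any
realistic segment size; only a readlen close to 2^32 makes the sum wrap.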

> > If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> > beverage of choice. :)
> 
> Stop killing your kidneys and your health and promoting such an
> antisocial lifestyle; start drinking vodka instead.

I'm just a German.  Forgive me if I drink lesser beverages.

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-20 Thread Evgeniy Polyakov
On Thu, May 17, 2007 at 07:10:19PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
> > 
> > Is logfs 32bit fs or 64bit, since although you use 64bit values for
> > offsets, area management and strange conversions like described below
> > from offset into segment number are performed in 32bit?
> > Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> > could you please explain logic behind area management?
> 
> Ignoring bugs and signed return values for error handling, it is either
> 64bit or 32+32bit.
> 
> Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
> In a couple of places, offsets are also 32+32bit.  Basically the high
> bits contain the segment number, the lower bits the offset within a
> segment.

In that case the segment size must be more than 32 bits, or the
transformation below will not be correct? segsize is long, but should be
u64 I think.

static void fixup_from_wbuf(struct super_block *sb, struct logfs_area
*area, void *read, u64 ofs, size_t readlen)

u32 read_start = ofs & (super->s_segsize - 1);
u32 read_end = read_start + readlen;

And this can overflow, since readlen is size_t.
This is the wbuf fixup, but I saw the same pattern somewhere else.
Although, according to your description, it should be 32bit, the sum can be
more than 32 bits.

> If anyone can find similar bugs, the bounty is a beer or non-alcoholic
> beverage of choice. :)

Stop killing your kidneys and your health and promoting such an
antisocial lifestyle; start drinking vodka instead.

> Jörn

-- 
Evgeniy Polyakov


Re: [PATCH] LogFS take three

2007-05-19 Thread Rob Landley
On Saturday 19 May 2007 5:24 am, Jan Engelhardt wrote:
> 
> On May 19 2007 02:15, Rob Landley wrote:
> >> > +
> >> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> >> > +{
> >> > +	return container_of(inode, struct logfs_inode, vfs_inode);
> >> > +}
> >> 
> >> Do these need to be uppercase?
> >
> >I'm trying to keep it clear in my head...
> >
> >When do you need to say __always_inline and when can you get away with 
> >just saying "static inline"?
> 
> When using "static inline", the compiler may ignore the inline keyword 
> (it's just a hint), and leave the function as a standalone function.
> 
> When CONFIG_FORCED_INLINING is active, and it is by default, inline is 
> always substituted by __always_inline, to be on the safe side. Some code 
> needs to be always inline; but not all code has been checked whether it 
> is safe to go from __always_inline to inline.

I've seen patches go by using __always_inline directly.  Is there some
janitorial effort to examine each instance of the inline keyword and
either replace it with "__always_inline" or remove it?

Right now "inline" seems to be about as useful as the "register" keyword.  You
don't feed hints to a compiler like gcc, you hit it with a two-by-four and
thumbscrews if you want to get its attention.

Rob


Re: [PATCH] LogFS take three

2007-05-19 Thread Evgeniy Polyakov
On Sat, May 19, 2007 at 05:17:32PM +0100, Jamie Lokier ([EMAIL PROTECTED]) 
wrote:
> > So, log2fs...  Sounds great to me.
> 
> Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
> filesystem use 2-ary trees or anything else which gives particular
> meaning to 2?

Sizes used in on-disk format are rounded to the nearest power-of-two.
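
One consequence, sketched in user-space C: a power-of-two size can be
described by its exponent alone. This only illustrates the idea; how LogFS
actually encodes its sizes is not shown in this thread, and both helper
names are invented:

```c
#include <stdint.h>

/* Returns 1 if x is a power of two, i.e. exactly one bit is set. */
static int is_pow2(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Exponent of a power-of-two size, e.g. 4096 -> 12. */
static unsigned int size_to_shift(uint32_t size)
{
	unsigned int shift = 0;

	/* Count how often the size can be halved before reaching 1. */
	while (size > 1) {
		size >>= 1;
		shift++;
	}
	return shift;
}
```

For such sizes, 1 << size_to_shift(size) gives back the original value, so
the exponent is all that needs to be stored.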

> -- Jamie

-- 
Evgeniy Polyakov


Re: [PATCH] LogFS take three

2007-05-19 Thread Jamie Lokier
David Weinehall wrote:
> > It is also the filesystem that tries to scale logarithmically, as Arnd
> > has noted.  Maybe I should call it Log2 to emphasize this point.  Log1
> > would be horrible scalability.
> 
> So, log2fs...  Sounds great to me.

Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
filesystem use 2-ary trees or anything else which gives particular
meaning to 2?

-- Jamie


Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Dongjun Shin wrote:

> There are so many flash-based storage devices, and some disposable ones,
> as you pointed out, have poor quality. I think it's mainly because these
> are not designed for good quality, but for lowering the price.

The reliability seems to be appropriate to the common use. I'm dubious
that computer storage was a big design factor until the last few years.
A good argument for buying large sizes, they are more likely to be
recent design.

> These kinds of devices are not ready for things like power failure because
> their use case is far from that. For example, removing a flash card
> while taking pictures with a digital camera is not a common use case.
> (there should be a written notice that this kind of action is against
> the warranty)

They do well in such use, if you equate battery death to pulling the
card (it may not be). I have tested that feature and not had a failure
of any but the last item. Clearly not recommended, but sometimes
unplanned needs arise.

> - In contrast to the embedded environment where CPU and flash are directly
> connected, the I/O path between CPU and flash in a PC environment is longer.
> The latency for SW handshaking between CPU and flash will also be longer,
> which would make performance optimization harder.
>
> As I mentioned, some techniques like a log-structured filesystem could
> perform generally better on any kind of flash-based storage with FTL.
> Although there are many kinds of FTL, it is commonly true that
> it performs well under workloads where sequential writes are dominant.
>
> I also expect that FTL for the PC environment will have a better quality
> spec than the disposable storage.

The recent technology announcements from Intel are encouraging in that
respect.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH] LogFS take three

2007-05-19 Thread David Weinehall
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote:
> On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
> > Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > 
> > Jörn> How many of you have worked for IBM before?  Vowels are not
> > evil. ;)
> > 
> > Nope, they're not.  I just think that LogFS isn't descriptive enough,
> > or more accurately, is the *wrong* description of this filesystem.  
> 
> That was the whole point.  JFFS2, the journaling flash filesystem, is a
> strictly log-structured filesystem.  LogFS has a journal.
> 
> It is also the filesystem that tries to scale logarithmically, as Arnd
> has noted.  Maybe I should call it Log2 to emphasize this point.  Log1
> would be horrible scalability.

So, log2fs...  Sounds great to me.

[snip]


Regards: David
-- 
 /) David Weinehall <[EMAIL PROTECTED]> /) Northern lights wander  (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/(/   Full colour fire   (/


Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Kevin Bowling wrote:
> On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> > >
> > > My experience is that no matter which name I pick, people will complain
> > > anyway.  Previous suggestions included:
> > > jffs3
> > > jefs
> > > engelfs
> > > poofs
> > > crapfs
> > > sweetfs
> > > cutefs
> > > dynamic journaling fs - djofs
> > > tfsfkal - the file system formerly known as logfs
> >
> > Can we call it jörnfs? :)
>
> However if Jörn is accused of murder, it will have little chance of
> being merged :-).


WRT that, seems that Nina had a lover who is a confessed serial killer.
I'm surprised the case hasn't been adapted for 'Boston Legal' and 'Law
and Order' like other high profile crimes.


I see nothing wrong with jörnfs, and there's room for numbers at the end...

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



Re: [PATCH] LogFS take three

2007-05-19 Thread Jan Engelhardt

On May 19 2007 02:15, Rob Landley wrote:
>> > +
>> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
>> > +{
>> > +  return container_of(inode, struct logfs_inode, vfs_inode);
>> > +}
>> 
>> Do these need to be uppercase?
>
>I'm trying to keep it clear in my head...
>
>When do you need to say __always_inline and when can you get away with 
>just saying "static inline"?

When using "static inline", the compiler may ignore the inline keyword 
(it's just a hint), and leave the function as a standalone function.

When CONFIG_FORCED_INLINING is active, and it is by default, inline is 
always substituted by __always_inline, to be on the safe side. Some code 
needs to be always inline; but not all code has been checked whether it 
is safe to go from __always_inline to inline.
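
A small sketch of the two spellings, assuming GCC or Clang. As far as I
know the kernel's __always_inline is a macro around the always_inline
function attribute, which is what is spelled out here; the function names
are invented for the example:

```c
/* Plain "inline" is a request; the compiler may still emit a real call. */
static inline int add_hint(int a, int b)
{
	return a + b;
}

/*
 * With the always_inline attribute the compiler has to inline the
 * function into its callers or report an error if it cannot.
 */
static inline __attribute__((always_inline)) int add_forced(int a, int b)
{
	return a + b;
}
```

Both variants compute the same result; the difference only shows up in the
generated code, for example when comparing the object files with and
without optimization.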


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-19 Thread Rob Landley
On Tuesday 15 May 2007 4:37 pm, Andrew Morton wrote:

> > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> > +{
> > +   return sb->s_fs_info;
> > +}
> > +
> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > +{
> > +   return container_of(inode, struct logfs_inode, vfs_inode);
> > +}
> 
> Do these need to be uppercase?

I'm trying to keep it clear in my head...

When do you need to say __always_inline and when can you get away with just 
saying "static inline"?

(I'm attempting to write documentation on a topic I don't understand.  Best 
way to learn it, I've found...)

> > + buf = kmap(page);
> > + ret = logfs_write_buf(inode, index, buf);
> > + kunmap(page);
>
> kmap() is lame.  The preferred approach would be to pass the page* down to
> the lower layers and to use kmap_atomic() at the lowest possible point.

Um, would I read about this in DMA-mapping.txt or cachetlb.txt?  (I don't 
think it's fujitsu/frv/mmu-layout.txt)

Rob


Re: [PATCH] LogFS take three

2007-05-19 Thread Rob Landley
On Tuesday 15 May 2007 4:37 pm, Andrew Morton wrote:

  +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
  +{
  +   return sb-s_fs_info;
  +}
  +
  +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
  +{
  +   return container_of(inode, struct logfs_inode, vfs_inode);
  +}
 
 Do these need to be uppercase?

I'm trying to keep it clear in my head...

When do you need to say __always_inline and when can you get away with just 
saying static inline?

(I'm attempting to write documentation on a topic I don't understand.  Best 
way to learn it, I've found...)

  + buf = kmap(page);
  + ret = logfs_write_buf(inode, index, buf);
  + kunmap(page);

 kmap() is lame.  The preferred approach would be to pass the page* down to
 the lower layers and to use kmap_atomic() at the lowest possible point.

Um, would I read about this in DMA-mapping.txt or cachetlb.txt?  (I don't 
think it's fujitsu/frv/mmu-layout.txt)

Rob
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Jan Engelhardt

On May 19 2007 02:15, Rob Landley wrote:
  +
  +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
  +{
  +  return container_of(inode, struct logfs_inode, vfs_inode);
  +}
 
 Do these need to be uppercase?

I'm trying to keep it clear in my head...

When do you need to say __always_inline and when can you get away with 
just saying static inline?

When using static inline, the compiler may ignore the inline keyword 
(it's just a hint), and leave the function as a standalone function.

When CONFIG_FORCED_INLINING is active, and it is by default, inline is 
always substituted by __always_inline, to be on the safe side. Some code 
needs to be always inline; but not all code has been checked whether it 
is safe to go from __always_inline to inline.


Jan
-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Kevin Bowling wrote:

On 5/16/07, David Woodhouse [EMAIL PROTECTED] wrote:

On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:

 My experience is that no matter which name I pick, people will
 complain
 anyway.  Previous suggestions included:
 jffs3
 jefs
 engelfs
 poofs
 crapfs
 sweetfs
 cutefs
 dynamic journaling fs - djofs
 tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)


However if Jörn is accused of murder, it will have little chance of
being merged :-).


WRT that, seems that Nina had a lover who is a confessed serial killer. 
I'm surprised the case hasn't been adapter for 'Boston legal' and 'Law 
and Order' like other high profile crimes.


I see nothing wrong with jörnfs, and there's room for numbers at the end...

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LogFS take three

2007-05-19 Thread David Weinehall
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote:
 On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
  Jörn On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
  
  Jörn How many of you have worked for IBM before?  Vowels are not
  evil. ;)
  
  Nope, they're not.  I just think that LogFS isn't descriptive enough,
  or more accurately, is the *wrong* description of this filesystem.  
 
 That was the whole point.  JFFS2, the journaling flash filesystem, is a
 strictly log-structured filesystem.  LogFS has a journal.
 
 It is also the filesystem that tries to scale logarithmically, as Arnd
 has noted.  Maybe I should call it Log2 to emphesize this point.  Log1
 would be horrible scalability.

So, log2fs...  Sounds great to me.

[snip]


Regards: David
-- 
 /) David Weinehall [EMAIL PROTECTED] /) Northern lights wander  (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/(/   Full colour fire   (/


Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Dongjun Shin wrote:

> There are so many flash-based storage and some disposable storages,
> as you pointed out, have poor quality. I think it's mainly because these
> are not designed for good quality, but for lowering the price.

The reliability seems to be appropriate to the common use. I'm dubious 
that computer storage was a big design factor until the last few years. 
A good argument for buying large sizes, they are more likely to be 
recent design.

> These kind of devices are not ready for things like power failure because
> their use case is far from that. For example, removing flash card
> while taking pictures using digital camera is not a common use case.
> (there should be a written notice that this kind of action is against
> the warranty)

They do well in such use, if you equate battery death to pulling the 
card (it may not be). I have tested that feature and not had a failure 
of any but the last item. Clearly not recommended, but sometimes 
unplanned needs arise.

> - In contrast to the embedded environment where CPU and flash is directly
> connected, the I/O path between CPU and flash in PC environment is longer.
> The latency for SW handshaking between CPU and flash will also be longer,
> which would make the performance optimization harder.
>
> As I mentioned, some techniques like log-structured filesystem could
> perform generally better on any kind of flash-based storage with FTL.
> Although there are many kinds of FTL, it is commonly true that
> it performs well under workload where sequential write is dominant.
>
> I also expect that FTL for PC environment will have better quality spec
> than the disposable storage.

The recent technology announcements from Intel are encouraging in that 
respect.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [PATCH] LogFS take three

2007-05-19 Thread Jamie Lokier
David Weinehall wrote:
> > It is also the filesystem that tries to scale logarithmically, as Arnd
> > has noted.  Maybe I should call it Log2 to emphasize this point.  Log1
> > would be horrible scalability.
> 
> So, log2fs...  Sounds great to me.

Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
filesystem use 2-ary trees or anything else which gives particular
meaning to 2?

-- Jamie


Re: [PATCH] LogFS take three

2007-05-19 Thread Evgeniy Polyakov
On Sat, May 19, 2007 at 05:17:32PM +0100, Jamie Lokier ([EMAIL PROTECTED]) 
wrote:
> > So, log2fs...  Sounds great to me.
> 
> Why Log2?  Logarithmic scaling is just logarithmic scaling.  Does the
> filesystem use 2-ary trees or anything else which gives particular
> meaning to 2?

Sizes used in on-disk format are rounded to the nearest power-of-two.
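
As a rough illustration of keeping a size as a power-of-two exponent,
here is a small C sketch.  It assumes sizes are rounded up and stored as
a shift count, in the spirit of s_segshift; the helper name
size_to_shift is made up for this example and is not taken from the
patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest shift such that (1 << shift) >= size.
 * size must be nonzero.  Rounding up is an assumption of this sketch.
 */
unsigned size_to_shift(uint64_t size)
{
	unsigned shift = 0;

	while (shift < 63 && (UINT64_C(1) << shift) < size)
		shift++;
	return shift;
}
```

With this, a 128KiB segment is stored as shift 17, and anything between
128KiB + 1 and 256KiB becomes shift 18.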

> -- Jamie

-- 
Evgeniy Polyakov


Re: [PATCH] LogFS take three

2007-05-19 Thread Rob Landley
On Saturday 19 May 2007 5:24 am, Jan Engelhardt wrote:
> 
> On May 19 2007 02:15, Rob Landley wrote:
> >> > +
> >> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> >> > +{
> >> > +	return container_of(inode, struct logfs_inode, vfs_inode);
> >> > +}
> >> 
> >> Do these need to be uppercase?
> >
> >I'm trying to keep it clear in my head...
> >
> >When do you need to say __always_inline and when can you get away with 
> >just saying static inline?
> 
> When using static inline, the compiler may ignore the inline keyword 
> (it's just a hint), and leave the function as a standalone function.
> 
> When CONFIG_FORCED_INLINING is active, and it is by default, inline is 
> always substituted by __always_inline, to be on the safe side. Some code 
> needs to be always inline; but not all code has been checked whether it 
> is safe to go from __always_inline to inline.

I've seen patches go by using __always_inline directly.  Is there some 
janitorial effort to examine each instance of the inline keyword and 
either replace it with __always_inline or remove it?

Right now inline seems to be about as useful as the register keyword.  You 
don't feed hints to a compiler like gcc, you hit it with a two-by-four and 
thumbscrews if you want to get its attention.
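
To make the hint-versus-force distinction concrete, a minimal userspace
sketch, assuming GCC or Clang.  The macro name my_always_inline and the
function names are made up for this example; the attribute itself is the
GCC always_inline function attribute.

```c
/* Hypothetical wrapper macro for this sketch (GCC/Clang attribute syntax). */
#define my_always_inline inline __attribute__((always_inline))

/* Plain "static inline": only a hint, the compiler may emit a real call. */
static inline int add_hint(int a, int b)
{
	return a + b;
}

/* always_inline: the compiler has to inline this, or report an error. */
static my_always_inline int add_forced(int a, int b)
{
	return a + b;
}

/* Caller using both variants; the results are identical either way. */
int sum_both(int a, int b)
{
	return add_hint(a, b) + add_forced(a, b);
}
```

The generated code for sum_both() may well be the same in both cases;
the difference is only whether the compiler is allowed to decide.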

Rob


Re: [PATCH] LogFS take three

2007-05-18 Thread David Woodhouse
On Fri, 2007-05-18 at 08:17 +0200, Jan Engelhardt wrote:
> > AFAIK, the camera stops writing to the flash card and automatically
> > turns off when it's low on battery (before empty).
> 
> But then, one should also consider the case where a cam is connected to
> AC and someone inadvertently trips on the power cord. 

So you stick a bloody great cap on board to give you enough time to shut
it down cleanly. I've known people do this -- and it helps, but the
devices still manage to crap themselves occasionally even then.

They're _disposable_. As are your data :)

-- 
dwmw2



Re: [PATCH] LogFS take three

2007-05-18 Thread Jan Engelhardt

On May 18 2007 09:01, Dongjun Shin wrote:
> On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
>> 
>> Hmm.. so operating your camera on batteries should be against the
>> warranty, since batteries commonly run empty while storing pictures?
>
> AFAIK, the camera stops writing to the flash card and automatically
> turns off when it's low on battery (before empty).

But then, one should also consider the case where a cam is connected to
AC and someone inadvertently trips on the power cord.


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-18 Thread Jan Engelhardt

On May 17 2007 21:00, Kyle Moffett wrote:
>> > > Opinions?
>> > 
>> > Why would we need another btree, when there is lib/rbtree.c?  Or does
>> > yours do something fundamentally different?
>> 
>> It is not red-black tree, it is b+ tree.
>
> It might be better to use the prefix "bptree" to help prevent confusion.  A
> quick google search on "bp-tree" reveals only the perl B+-tree module
> "Tree::BPTree", a U-Maryland Java CS project on B+-trees, and a news article
> about a "BP tree-top protest".

BP heh.. How about "struct bplustree"?


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Kyle Moffett

On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
> On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt
> ([EMAIL PROTECTED]) wrote:
> > > My plan was to move this code to lib/ sooner or later.  If you
> > > consider it useful in its current state, I can do it immediately.
> > > And if someone else merged a superior btree library I'd happily
> > > remove mine and use the new one instead.
> > >
> > > Opinions?
> >
> > Why would we need another btree, when there is lib/rbtree.c?  Or
> > does yours do something fundamentally different?
>
> It is not red-black tree, it is b+ tree.

It might be better to use the prefix "bptree" to help prevent
confusion.  A quick google search on "bp-tree" reveals only the perl
B+-tree module "Tree::BPTree", a U-Maryland Java CS project on
B+-trees, and a news article about a "BP tree-top protest".


Cheers,
Kyle Moffett



Re: [PATCH] LogFS take three

2007-05-17 Thread Dongjun Shin

Hi,

On 5/18/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> Hi!
>
> Hmm.. so operating your camera on batteries should be against the
> warranty, since batteries commonly run empty while storing pictures?

AFAIK, the camera stops writing to the flash card and automatically
turns off when it's low on battery (before empty).


Re: [PATCH] LogFS take three

2007-05-17 Thread Jamie Lokier
Jörn Engel wrote:
> > Almost all your static functions start with logfs_, why not this one?
> 
> Because after a while I discovered how silly it is to start every
> function with logfs_.  That prefix doesn't add much unless the function
> has global scope.  What I didn't do was remove the prefix from older
> functions.

It's handy when debugging or showing detailed backtraces.  Not that
I'm advocating it (or not), just something I've noticed in other
programs.

-- Jamie


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 23:36:13 +0200, Arnd Bergmann wrote:
> On Thursday 17 May 2007, Pekka Enberg wrote:
> > 
> > So any sane way to enable compression is on per-inode basis which makes 
> > me still wonder why you need per-object compression.
> 
> 1. it doesn't require user interaction, the file system will do the right
> thing most of the time.
> 
> 2. enlarging data is a very bad thing because it makes the behaviour
> of the fs unpredictable. With uncompressed objects, you have a guaranteed
> upper bound on the size.

Correct.  The compression decision is always per-object.  Per-inode is a
hint from userspace that a compression attempt would be futile.

A compression algorithm that compresses any data is provably impossible.
Some data will always cause expansion instead of compression.  Some
algorithms have a well-known upper bound on the expansion, others don't.
So LogFS instead creates its own upper bound by reserving one byte in
the header for the compression type.

And while one bit would suffice as a compressed/uncompressed flag,
having a byte makes it possible to support more than one compression
algorithm.
LZO looks promising and is on its way into the kernel.  Others may come
in the future.
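
A minimal sketch of the idea in C: try to compress, and fall back to a
verbatim copy whenever the result would not be smaller, so an object
never grows by more than the one type byte.  Everything here is an
assumption for illustration: the type codes, the function names and the
toy run-length encoder, which merely stands in for a real algorithm.
None of it is the actual LogFS code or on-medium format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes; the real values are not given in this thread. */
enum { COMPR_NONE = 0, COMPR_RLE = 1 };

/*
 * Toy run-length encoder standing in for a real compressor.
 * Emits (count, byte) pairs.  Returns the output length, or 0 if the
 * output would not fit into cap bytes.
 */
static size_t rle_compress(const uint8_t *in, size_t len,
			   uint8_t *out, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < len) {
		size_t run = 1;

		while (i + run < len && in[i + run] == in[i] && run < 255)
			run++;
		if (o + 2 > cap)
			return 0;
		out[o++] = (uint8_t)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/*
 * Store one object as a type byte followed by the payload.
 * out must hold at least len + 1 bytes.  Returns the bytes used,
 * which is never more than len + 1.
 */
size_t store_object(const uint8_t *in, size_t len, uint8_t *out)
{
	/* Accept compressed output only if it is strictly smaller. */
	size_t clen = len ? rle_compress(in, len, out + 1, len - 1) : 0;

	if (clen) {
		out[0] = COMPR_RLE;
		return clen + 1;
	}
	out[0] = COMPR_NONE;
	memcpy(out + 1, in, len);
	return len + 1;
}
```

Sixteen identical bytes end up as three bytes (type, count, value),
while four distinct bytes are stored verbatim as five.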

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Pekka Enberg wrote:
> 
> Jörn Engel wrote:
> > Compressing random data will actually enlarge it.  If that happens I
> > simply store the verbatim uncompressed data instead and mark it as such.
> > 
> > There is also demand for a user-controlled bit in the inode to disable
> > compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> > time by trying and failing to compress them.
> 
> So any sane way to enable compression is on per-inode basis which makes 
> me still wonder why you need per-object compression.

1. it doesn't require user interaction, the file system will do the right
thing most of the time.

2. enlarging data is a very bad thing because it makes the behaviour
of the fs unpredictable. With uncompressed objects, you have a guaranteed
upper bound on the size.

Arnd <><


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 23:00:20 +0200, Arnd Bergmann wrote:
> 
> Just using nanoseconds probably doesn't gain you much after all
> then. You could however just have separate 32 bit fields in the
> inode for seconds and nanoseconds, that will result in the exact
> same layout that you have right now, but won't require a conversion
> function.

I could also have a 30bit and a 34bit field.  30bit is enough for
nanoseconds.  So many options.
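
For the 30+34 variant, a short sketch in C.  The layout (seconds in the
upper 34 bits, nanoseconds in the lower 30) and the function names are
assumptions made for this example only.

```c
#include <stdint.h>

#define NSEC_BITS 30
#define NSEC_MASK ((UINT64_C(1) << NSEC_BITS) - 1)

/*
 * Pack seconds into the upper 34 bits and nanoseconds into the lower 30.
 * 2^30 = 1073741824 > 999999999, so every valid nanosecond value fits.
 */
uint64_t pack_time(uint64_t sec, uint32_t nsec)
{
	return (sec << NSEC_BITS) | (nsec & NSEC_MASK);
}

uint64_t unpack_sec(uint64_t t)
{
	return t >> NSEC_BITS;
}

uint32_t unpack_nsec(uint64_t t)
{
	return (uint32_t)(t & NSEC_MASK);
}
```

34 bits of unsigned seconds cover roughly 544 years from 1970, at the
price of a shift and a mask on every access.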

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Jörn Engel wrote:
> 
> > Why not just store 64 bit nanoseconds? that would avoid the problem
> > with ns overflow and the year-2038 bug. OTOH, that would require
> > a 64 bit integer division when reading the data, so it gets you
> > a runtime overhead.
> 
> I like the idea.  Do conversion functions exist both ways?
> 
> What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
> while 32bit would last to 2106?

You're right, you don't hit the 2038 bug here, because you use an
unsigned variable. The bug exists elsewhere because time_t tv_sec
is signed.
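
The two limits can be checked with a few lines of C.  This is only a
sketch; epoch_year is a helper written for this example that walks the
Gregorian calendar from 1970 and ignores leap seconds.

```c
#include <stdint.h>

static int is_leap(int y)
{
	return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Calendar year (UTC) containing the given number of seconds since 1970. */
int epoch_year(uint64_t secs)
{
	uint64_t days = secs / 86400;
	int year = 1970;

	for (;;) {
		uint64_t ylen = is_leap(year) ? 366 : 365;

		if (days < ylen)
			return year;
		days -= ylen;
		year++;
	}
}
```

epoch_year(INT32_MAX) gives 2038, the signed limit, and
epoch_year(UINT32_MAX) gives 2106, the unsigned one.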

Just using nanoseconds probably doesn't gain you much after all
then. You could however just have separate 32 bit fields in the
inode for seconds and nanoseconds, that will result in the exact
same layout that you have right now, but won't require a conversion
function.
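
A small sketch of why the layout is the same: a big-endian 64 bit value
built as (sec << 32) + nsec has the same bytes as two consecutive
big-endian 32 bit fields.  The helper names are made up for this
example.

```c
#include <stdint.h>
#include <string.h>

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/* Returns 1 if the packed and the split encoding produce identical bytes. */
int same_layout(uint32_t sec, uint32_t nsec)
{
	uint8_t packed[8], split[8];

	put_be64(packed, ((uint64_t)sec << 32) + nsec);
	put_be32(split, sec);
	put_be32(split + 4, nsec);
	return memcmp(packed, split, 8) == 0;
}
```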

Arnd <><


Re: [PATCH] LogFS take three

2007-05-17 Thread Pekka Enberg

Jörn Engel wrote:
> Compressing random data will actually enlarge it.  If that happens I
> simply store the verbatim uncompressed data instead and mark it as such.
>
> There is also demand for a user-controlled bit in the inode to disable
> compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> time by trying and failing to compress them.

So any sane way to enable compression is on per-inode basis which makes 
me still wonder why you need per-object compression.



Re: [PATCH] LogFS take three

2007-05-17 Thread Pavel Machek
Hi!

> >Yes. These things are almost always implemented _very_ badly by the same
> >kind of crack-smoking hobo they drag in off the streets to write BIOSen.
> >
> >It's bog-roll technology; if you fancy a laugh try doing some real
> >reliability tests on them some time. Powerfail testing is a good one.
> >
> >This kind of thing is OK for disposable storage such as in digital
> >cameras, where it doesn't matter that it's no more reliable than a
> >floppy disc, but for real long-term storage it's really a bad idea.
> >
> 
> There are so many flash-based storage and some disposable storages,
> as you pointed out, have poor quality. I think it's mainly because these
> are not designed for good quality, but for lowering the price.
> 
> These kind of devices are not ready for things like power failure because
> their use case is far from that. For example, removing flash card
> while taking pictures using digital camera is not a common use case.
> (there should be a written notice that this kind of action is against
> the warranty)

Hmm.. so operating your camera on batteries should be against the
warranty, since batteries commonly run empty while storing pictures?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 17:08:51 +0200, Arnd Bergmann wrote:
> On Tuesday 15 May 2007, Jörn Engel wrote:
> > Add LogFS, a scalable flash filesystem.
> 
> Sorry for not commenting earlier, there were so many discussions on version
> two that I wanted to wait for the fallout of that instead of duplicating
> all the comments.

You are the last person that has to be sorry. ;)

> Here are a few things I notice while going through the third version:
> 
> > +/*
> > + * Private errno for accessed beyond end-of-file.  Only used internally to
> > + * logfs.  If this ever gets exposed to userspace or even other parts of the
> > + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> > + * used errno #defines.
> > + *
> > + * It can be argued that this is a hack and should be replaced with something
> > + * else.  My last attempt to do this failed spectacularly and there are more
> > + * urgent problems that users actually care about.  This will remain for the
> > + * moment.  Patches are welcome, of course.
> > + */
> > +#define EOF 256
> 
> It should at least be in the kernel-only errno range between 512 and 4095,
> that way it can eventually be added to include/linux/errno.h.

Fair enough.  512 it is.

> > + * Target rename works in three atomic steps:
> > + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> > + * 2. Remove old dentry (still remember the new inode)
> > + * 3. Remove new inode
> > + *
> > + * Here we remember both an inode and a dentry.  If we get interrupted
> > + * between steps 1 and 2, we delete both the dentry and the inode.  If
> > + * we get interrupted between steps 2 and 3, we delete just the inode.
> > + * In either case, the remaining objects are deleted on next mount.  From
> > + * a user's point of view, the operation succeeded.
> 
> This description had me confused for a while: why would you remove the
> new inode. Maybe change the text to say 'target inode' or 'victim inode'?

'Victim inode' sounds good.  Will do.
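
The three steps and the replay rule can be modelled in a few lines of C.
This is a sketch of the logic in the quoted comment only; the struct and
function names are hypothetical and do not come from the patch.

```c
#include <stdbool.h>

/* What is remembered across a crash (hypothetical layout). */
struct rename_journal {
	bool have_old_dentry;	/* set by step 1, cleared by step 2 */
	bool have_victim_inode;	/* set by step 1, cleared by step 3 */
};

struct replay_actions {
	bool delete_old_dentry;
	bool delete_victim_inode;
};

/* Apply one of the three atomic steps of a target rename. */
void rename_step(struct rename_journal *j, int step)
{
	switch (step) {
	case 1:	/* attach old inode to new dentry, remember both leftovers */
		j->have_old_dentry = true;
		j->have_victim_inode = true;
		break;
	case 2:	/* old dentry removed */
		j->have_old_dentry = false;
		break;
	case 3:	/* victim inode removed */
		j->have_victim_inode = false;
		break;
	}
}

/* On the next mount, delete whatever is still remembered. */
struct replay_actions replay_on_mount(struct rename_journal j)
{
	struct replay_actions a = { j.have_old_dentry, j.have_victim_inode };

	return a;
}
```

A crash after step 1 leaves both flags set, so both objects are deleted
on mount; a crash after step 2 leaves only the victim inode to clean up.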

> > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> > +{
> > +   struct inode *inode;
> > +
> > +   if (dir->i_nlink >= LOGFS_LINK_MAX)
> > +   return -EMLINK;
> 
> Why is i_nlink limited? Don't you run out of space for inodes before
> overflowing?

I don't know.  With the current limit of 2^31, a sufficiently large
device can reach the limit.  And it is imaginable that overflowing the
s32 number space can expose security holes.  Not that I actually know,
the check is pure paranoia.

> > + * In principle, this function should loop forever, looking for GC candidates
> > + * and moving data.  LogFS is designed in such a way that this loop is
> > + * guaranteed to terminate.
> > + *
> > + * Limiting the loop to four iterations serves purely to catch cases when
> > + * these guarantees have failed.  An actual endless loop is an obvious bug
> > + * and should be reported as such.
> > + *
> > + * But there is another nasty twist to this.  As I have described in my LCA
> > + * presentation, Garbage collection would have to limit itself to higher
> > + * levels if the number of available free segments goes down.  This code
> > + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> > + * been able to make it fail (short of a bug elsewhere).
> > + *
> > + * So in a way this code is intentionally wrong as a desperate cry for a
> > + * better testcase.  And I do expect to get blamed for it one day. :(
> > + */
> 
> Could you bug the code to reserve fewer segments for GC than you really
> need, in order to stress test GC?

I could.  Wear leveling will cause changes in the area, so I'll have a
closer look when implementing that.

> > +static struct inode *logfs_alloc_inode(struct super_block *sb)
> > +{
> > +   struct logfs_inode *li;
> > +
> > +   li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> > +   if (!li)
> > +   return NULL;
> > +   logfs_init_inode(&li->vfs_inode);
> > +   return &li->vfs_inode;
> > +}
> > +
> > +
> > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> > +{
> > +   struct inode *inode;
> > +
> > +   inode = logfs_alloc_inode(sb);
> > +   if (!inode)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   logfs_init_inode(inode);
> 
> logfs_alloc_inode() returns an initialized inode, so no need to call
> logfs_init_inode() again, right?

Right.  Will change.

> > +static __be64 timespec_to_be64(struct timespec tsp)
> > +{
> > +   u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0xffffffff);
> > +
> > +   WARN_ON(tsp.tv_nsec > 999999999);
> > +   return cpu_to_be64(time);
> > +}
> 
> Why not just store 64 bit nanoseconds? that would avoid the problem
> with ns overflow and the year-2038 bug. OTOH, that would require
> a 64 bit integer division when reading the data, so it gets you
> a runtime overhead.

I like the idea.  Do conversion functions exist both ways?
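
For reference, this is roughly what such a pair of conversions looks
like.  A userspace sketch with made-up names (struct ts, ts_to_ns,
ns_to_ts), not the kernel helpers; it shows where the 64 bit division
mentioned above comes in.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Stand-in for struct timespec, unsigned for simplicity. */
struct ts {
	uint64_t sec;
	uint32_t nsec;
};

/* Writing: one multiplication and one addition. */
uint64_t ts_to_ns(struct ts t)
{
	return t.sec * NSEC_PER_SEC + t.nsec;
}

/* Reading: a 64 bit division and remainder. */
struct ts ns_to_ts(uint64_t ns)
{
	struct ts t = { ns / NSEC_PER_SEC, (uint32_t)(ns % NSEC_PER_SEC) };

	return t;
}
```

An unsigned 64 bit nanosecond counter wraps after about 584 years.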

What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
while 32bit would last to 2106?

Re: [PATCH] LogFS take three

2007-05-17 Thread Evgeniy Polyakov
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) 
wrote:
> >My plan was to move this code to lib/ sooner or later.  If you consider
> >it useful in its current state, I can do it immediately.  And if someone
> >else merged a superior btree library I'd happily remove mine and use the
> >new one instead.
> >
> >Opinions?
> 
> Why would we need another btree, when there is lib/rbtree.c?
> Or does yours do something fundamentally different?

It is not red-black tree, it is b+ tree.

>   Jan

-- 
Evgeniy Polyakov


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 02:06, Jörn Engel wrote:
>
>> > +/* memtree.c */
>> > +void btree_init(struct btree_head *head);
>> > +void *btree_lookup(struct btree_head *head, long val);
>> > +int btree_insert(struct btree_head *head, long val, void *ptr);
>> > +int btree_remove(struct btree_head *head, long val);
>> 
>> These names are too generic.  If we later add a btree library: blam.
>
>My plan was to move this code to lib/ sooner or later.  If you consider
>it useful in its current state, I can do it immediately.  And if someone
>else merged a superior btree library I'd happily remove mine and use the
>new one instead.
>
>Opinions?

Why would we need another btree, when there is lib/rbtree.c?
Or does yours do something fundamentally different?


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 15:53, Jörn Engel wrote:
>
>My experience is that no matter which name I pick, people will complain
>anyway.  Previous suggestions included:
[...]
>
>Plus today:
>FFFS
>flashfs
>fredfs
>bob
>shizzle
>
>Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
>joke and keep my logfs.org domain.

Try woodfs! (log - wood - get it?)
But finding names can be so tiresome, just give it a Borg-style
designation - "filesystem 125" or so. fs2007q1, being this
quarter's new filesystem.


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 22:06, CaT wrote:
>On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel wrote:
>> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> > 
>> > But if akpm can't pronounce it, how about FFFS for faster flash
>> > filesystem ;-)
>> 
>> How many of you have worked for IBM before?  Vowels are not evil. ;)
>> 
>> Grouping four or more consonants to name anything will cause similar
>> expressions on people's faces.  Numbers don't help much either.
>> 
>> Ext2 is a great name, because "ext" actually is a pronounceable syllable.
>> MinixFS, ChunkFS, TileFS are great too.  XFS and JFS are ok, at least
>> they only have three consonants.  But FFS exists, so I'd rather go for a
>> syllable.
>
>FlashFS?

Or just try once dropping all those redundant 'fs' suffixes.
bdev, proc, cpuset, devpts, mqueue, fuse(blk|ctl), vfat, iso9660, etc.
Then there's much more space for innovative names.


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 14:55, Jörn Engel wrote:
>On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
>> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) 
>> wrote:
>> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> > > 
>> > > But if akpm can't pronounce it, how about FFFS for faster flash
>> > > filesystem ;-)
>> > 
>> > How many of you have worked for IBM before?  Vowels are not evil. ;)
>> 
>> Do you think 'eieio' is a good set? IBM's work too...

C'mon, UIO does not cut IIO either ;-)


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 13:09, Jörn Engel wrote:
>On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
>> 
>> Personally I'd just go for 'JFFS3'. After all, it has a better claim to
>> the name than either of its predecessors :)
>
>Did you ever see akpm's facial expression when he tried to pronounce
>"JFFS2"?  ;)

Is there something special with [dʒeɪ ɛf ɛf ɛs tuː]?


Jan
-- 


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
> 
> Is logfs 32bit fs or 64bit, since although you use 64bit values for
> offsets, area management and strange conversions like described below 
> from offset into segment number are performed in 32bit?
> Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> could you please explain logic behind area management?

Ignoring bugs and signed return values for error handling, it is either
64bit or 32+32bit.

Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
In a couple of places, offsets are also 32+32bit.  Basically the high
bits contain the segment number, the lower bits the offset within a
segment.

Side note: It would be nicer if the high 32bit were segment number.
Instead the number of bits depends on segment size.  Guess I should
change that while the format isn't fixed yet.
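
The split described above can be sketched in C as follows.  struct geom
and the function names are hypothetical stand-ins for this example; only
the shift by the segment size exponent mirrors what s_segshift does in
the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the superblock: segment size = 1 << segshift. */
struct geom {
	unsigned segshift;
};

/* High bits of a device offset: the segment number. */
uint32_t ofs_to_segno(const struct geom *g, uint64_t ofs)
{
	return (uint32_t)(ofs >> g->segshift);
}

/* Low bits of a device offset: the position inside the segment. */
uint32_t ofs_in_seg(const struct geom *g, uint64_t ofs)
{
	return (uint32_t)(ofs & ((UINT64_C(1) << g->segshift) - 1));
}

/* Widen before shifting so the result cannot wrap at 32 bits. */
uint64_t dev_offset(const struct geom *g, uint32_t segno, uint32_t seg_ofs)
{
	return ((uint64_t)segno << g->segshift) | seg_ofs;
}
```

With 128KiB segments (segshift 17), segment 40000 starts at byte
5242880000, which no longer fits in 32 bits.  Hence the cast in
dev_offset().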

An "area" is a segment that is currently being written.  Data is
appended to this segment as it comes in, until the segment is full.  Any
functions dealing with areas only need a 32bit offset, which is the
offset within the area, not the absolute device offset.

Writes within an area are also buffered.  New data first goes into the
write buffer (wbuf) and only when this is full is it flushed to the
device.  NAND flash and some NOR flashes require such buffering.  When
writing to the device, the 32bit segno and the 32bit in-segment offset
need to get converted back to a 64bit device offset.
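
The split and the conversion back can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct sketch_super and all sketch_* helpers are hypothetical names, not the functions from the patch, and the only field borrowed from the discussion is s_segshift.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one superblock field used here. */
struct sketch_super {
	unsigned int s_segshift;	/* log2 of the segment size in bytes */
};

/* High bits of a device offset: the segment number. */
static uint32_t sketch_segno(const struct sketch_super *super, uint64_t ofs)
{
	return (uint32_t)(ofs >> super->s_segshift);
}

/* Low bits of a device offset: the position inside the segment. */
static uint32_t sketch_seg_ofs(const struct sketch_super *super, uint64_t ofs)
{
	return (uint32_t)(ofs & ((UINT64_C(1) << super->s_segshift) - 1));
}

/* Widen segno to 64bit before shifting, or the high bits are lost. */
static uint64_t sketch_dev_ofs(const struct sketch_super *super,
			       uint32_t segno, uint32_t seg_ofs)
{
	return ((uint64_t)segno << super->s_segshift) | seg_ofs;
}

/* The same shift done in 32bit arithmetic, shown only for contrast. */
static uint64_t sketch_dev_ofs_narrow(const struct sketch_super *super,
				      uint32_t segno, uint32_t seg_ofs)
{
	return (uint32_t)(segno << super->s_segshift) | seg_ofs;
}
```

With an assumed s_segshift of 17 (128KiB segments), segment 40000 starts beyond 4GiB. sketch_dev_ofs() reproduces the original offset, while sketch_dev_ofs_narrow() wraps around, which is the kind of overflow a 32bit shift invites.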

> I've found that you store segment numbers as 32bit values (for example
> in prepare_write()), and convert requested 64bit offset into segment
> number via superblock's s_segshift.

Yes, as described above.

> This conversion seems confusing to me in case of real 64bit offsets.
> For example this one obtained via prepare_write:
> 
> 7   1 logfs_prepare_write      78  fs/logfs/file.c
> 8   2 logfs_readpage_nolock    20  fs/logfs/file.c
> 9   1 logfs_read_block        351  fs/logfs/readwrite.c
> 10  1 logfs_read_loop         139  fs/logfs/readwrite.c
> 11  2 logfs_segment_read      108  fs/logfs/readwrite.c
> 12  1 wbuf_read               289 
> 
> u32 segno = ofs >> super->s_segshift;
> 
> ofs is originally obtained from inode's li_data array, which is filled
> with raw segment numbers which can be 64bit (here is another issue,
> since logfs_segment_write() returns signed, so essentially logfs is
> 63bit filesystem).

The filesystem format is 64bit.  The current code can only deal with
63bit.  Eric Sandeen just fixed ext2 to actually deal with 32bit
numbers and the same is possible for logfs.  If anyone ever cares...

> But here I've come to area management in logfs, and found that it is
> 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() 
> returns signed 32 bit value (so it is reduced to 31 bit), which is then 
> placed into li_data as 64bit value. The latter
> (__logfs_get_free_bytes()) truncates 64bit data value obtained via
> dev_ofs() into signed 32 bit value.

That indeed is a bug.  __logfs_get_free_bytes() should return s64
instead of s32.  Will fix immediately.
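
A small userspace sketch of why the return type matters. The function names are hypothetical and only mimic the shape of the problem; they are not the code from the patch.

```c
#include <stdint.h>

/*
 * Buggy shape: a 64bit device offset squeezed through a signed 32bit
 * return value.  The conversion is implementation-defined in C and on
 * common platforms simply drops the upper 32 bits.
 */
static int32_t sketch_free_bytes_s32(uint64_t dev_ofs)
{
	return (int32_t)dev_ofs;
}

/*
 * Fixed shape: s64 keeps the full offset for anything below 2^63 and
 * still leaves the negative range free for -errno style returns.
 */
static int64_t sketch_free_bytes_s64(uint64_t dev_ofs)
{
	return (int64_t)dev_ofs;
}
```

Any offset at or above 2GiB no longer survives the s32 variant, and some of those offsets would presumably come back negative and be mistaken for an error code.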

If anyone can find similar bugs, the bounty is a beer or non-alcoholic
beverage of choice. :)

Jörn

-- 
To announce that there must be no criticism of the President, or that we
are to stand by the President, right or wrong, is not only unpatriotic
and servile, but is morally treasonable to the American public.
-- Theodore Roosevelt, Kansas City Star, 1918


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-17 Thread Evgeniy Polyakov

Hi Jörn.

Is logfs 32bit fs or 64bit, since although you use 64bit values for
offsets, area management and strange conversions like described below 
from offset into segment number are performed in 32bit?
Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
could you please explain logic behind area management?

I've found that you store segment numbers as 32bit values (for example
in prepare_write()), and convert requested 64bit offset into segment
number via superblock's s_segshift.
This conversion seems confusing to me in case of real 64bit offsets.
For example this one obtained via prepare_write:

7   1 logfs_prepare_write      78  fs/logfs/file.c
8   2 logfs_readpage_nolock    20  fs/logfs/file.c
9   1 logfs_read_block        351  fs/logfs/readwrite.c
10  1 logfs_read_loop         139  fs/logfs/readwrite.c
11  2 logfs_segment_read      108  fs/logfs/readwrite.c
12  1 wbuf_read               289 

u32 segno = ofs >> super->s_segshift;

ofs is originally obtained from inode's li_data array, which is filled
with raw segment numbers which can be 64bit (here is another issue,
since logfs_segment_write() returns signed, so essentially logfs is
63bit filesystem).

But here I've come to area management in logfs, and found that it is
32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() 
returns signed 32 bit value (so it is reduced to 31 bit), which is then 
placed into li_data as 64bit value. The latter
(__logfs_get_free_bytes()) truncates 64bit data value obtained via
dev_ofs() into signed 32 bit value.

-- 
Evgeniy Polyakov


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
> Add LogFS, a scalable flash filesystem.

Hi Jörn,

Sorry for not commenting earlier, there were so many discussions on version
two that I wanted to wait for the fallout of that instead of duplicating
all the comments.

Here are a few things I notice while going through the third version:

> +/*
> + * Private errno for accessed beyond end-of-file.  Only used internally to
> + * logfs.  If this ever gets exposed to userspace or even other parts of the
> + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> + * used errno #defines.
> + *
> + * It can be argued that this is a hack and should be replaced with something
> + * else.  My last attempt to do this failed spectacularly and there are more
> + * urgent problems that users actually care about.  This will remain for the
> + * moment.  Patches are welcome, of course.
> + */
> +#define EOF  256

It should at least be in the kernel-only errno range between 512 and 4095,
that way it can eventually be added to include/linux/errno.h.
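
To illustrate the range, here is a userspace sketch modelled on the kernel's ERR_PTR()/IS_ERR() convention. The constants 4095 and 512 match MAX_ERRNO in include/linux/err.h and the first kernel-internal errno in include/linux/errno.h; everything named sketch_* or SKETCH_* is hypothetical, and the value 4000 is an arbitrary placeholder, not a proposed errno.

```c
#include <stdint.h>

#define SKETCH_MAX_ERRNO	4095	/* same value as the kernel's MAX_ERRNO */
#define SKETCH_PRIVATE_MIN	512	/* kernel-internal errno values start here */
#define SKETCH_LOGFS_EOF	4000	/* hypothetical placeholder value */

/* Is err inside the kernel-only errno range? */
static int sketch_is_private_errno(long err)
{
	return err >= SKETCH_PRIVATE_MIN && err <= SKETCH_MAX_ERRNO;
}

/* Encode a negative errno in a pointer, as ERR_PTR() does. */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Pointers in the top SKETCH_MAX_ERRNO addresses are treated as errors. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno from an error pointer, as PTR_ERR() does. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

A value of 256 is outside the private range and shares its number space with the userspace-visible errnos, whereas anything between 512 and 4095 stays kernel-only and still round-trips through an error pointer.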

> + * Target rename works in three atomic steps:
> + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> + * 2. Remove old dentry (still remember the new inode)
> + * 3. Remove new inode
> + *
> + * Here we remember both an inode and a dentry.  If we get interrupted
> + * between steps 1 and 2, we delete both the dentry and the inode.  If
> + * we get interrupted between steps 2 and 3, we delete just the inode.
> + * In either case, the remaining objects are deleted on next mount.  From
> + * a user's point of view, the operation succeeded.

This description had me confused for a while: why would you remove the
new inode? Maybe change the text to say 'target inode' or 'victim inode'?

> +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> + struct inode *inode;
> +
> + if (dir->i_nlink >= LOGFS_LINK_MAX)
> + return -EMLINK;

Why is i_nlink limited? Don't you run out of space for inodes before
overflowing?

> + * In principle, this function should loop forever, looking for GC candidates
> + * and moving data.  LogFS is designed in such a way that this loop is
> + * guaranteed to terminate.
> + *
> + * Limiting the loop to four iterations serves purely to catch cases when
> + * these guarantees have failed.  An actual endless loop is an obvious bug
> + * and should be reported as such.
> + *
> + * But there is another nasty twist to this.  As I have described in my LCA
> + * presentation, Garbage collection would have to limit itself to higher
> + * levels if the number of available free segments goes down.  This code
> + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> + * been able to make it fail (short of a bug elsewhere).
> + *
> + * So in a way this code is intentionally wrong as a desperate cry for a
> + * better testcase.  And I do expect to get blamed for it one day. :(
> + */

Could you bug the code to reserve fewer segments for GC than you really
need, in order to stress test GC?

> +static struct inode *logfs_alloc_inode(struct super_block *sb)
> +{
> + struct logfs_inode *li;
> +
> + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> + if (!li)
> + return NULL;
> + logfs_init_inode(&li->vfs_inode);
> + return &li->vfs_inode;
> +}
> +
> +
> +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> +{
> + struct inode *inode;
> +
> + inode = logfs_alloc_inode(sb);
> + if (!inode)
> + return ERR_PTR(-ENOMEM);
> +
> + logfs_init_inode(inode);

logfs_alloc_inode() returns an initialized inode, so no need to call
logfs_init_inode() again, right?

> +static __be64 timespec_to_be64(struct timespec tsp)
> +{
> + u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x);
> +
> + WARN_ON(tsp.tv_nsec > 9);
> + return cpu_to_be64(time);
> +}

Why not just store 64 bit nanoseconds? That would avoid the problem
with ns overflow and the year-2038 bug. OTOH, that would require
a 64 bit integer division when reading the data, which adds
runtime overhead.
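
The two encodings can be compared in a userspace sketch. All names are hypothetical, the 32bit mask in the split variant is an assumption about what the patch intends, and the endianness conversion is left out.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC	UINT64_C(1000000000)

/* Hypothetical stand-in for struct timespec; non-negative times only. */
struct sketch_timespec {
	int64_t tv_sec;
	long tv_nsec;
};

/* Split encoding: seconds in the high 32 bits, nanoseconds in the low 32. */
static uint64_t sketch_pack_split(struct sketch_timespec ts)
{
	return ((uint64_t)ts.tv_sec << 32) +
	       ((uint64_t)ts.tv_nsec & UINT64_C(0xffffffff));
}

/* Reading the split encoding back needs only a shift and a mask. */
static struct sketch_timespec sketch_unpack_split(uint64_t v)
{
	struct sketch_timespec ts;

	ts.tv_sec = (int64_t)(v >> 32);
	ts.tv_nsec = (long)(v & UINT64_C(0xffffffff));
	return ts;
}

/* Alternative: a single 64bit count of nanoseconds. */
static uint64_t sketch_pack_ns(struct sketch_timespec ts)
{
	return (uint64_t)ts.tv_sec * SKETCH_NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/* Reading the count back costs a 64bit division and a modulo. */
static struct sketch_timespec sketch_unpack_ns(uint64_t ns)
{
	struct sketch_timespec ts;

	ts.tv_sec = (int64_t)(ns / SKETCH_NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % SKETCH_NSEC_PER_SEC);
	return ts;
}
```

The split form unpacks with a shift and a mask but limits seconds to 32 bits. The nanosecond count covers roughly 584 years from the epoch as an unsigned value and cannot hold an out-of-range tv_nsec, at the price of the division in sketch_unpack_ns().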

> +static void logfs_read_inode(struct inode *inode)
> +{
> + int ret;
> +
> + BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> + ret = __logfs_read_inode(inode);
> +
> + /* What else can we do here? */
> + BUG_ON(ret);
> +}

ext2 calls make_bad_inode(inode) in this case, which seems to be
a better solution than crashing.

> +int __logfs_write_inode(struct inode *inode)
> +{
> + /*
> +  * FIXME: Those two inodes are 512 bytes in total.  Not good to
> +  * have on the stack.  Possibly the best solution would be to bite
> +  * the bullet and do another format change before release and
> +  * shrink the inodes.
> +  */
> + struct logfs_disk_inode old, new;
> +
> + BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> + /* read and compare the inode first.  If it hasn't 

Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 16:43:59 +0800, David Woodhouse wrote:
> 
> > As I mentioned, some techniques like log-structured filesystem could
> > perform generally better on any kind of flash-based storage with FTL.
> > Although there are many kinds of FTL, it is commonly true that
> > it performs well under workload where sequential write is dominant.
> 
> Yes, it's certainly possible that we _could_ write a file system which
> is specifically targeted at FTL -- I was just wondering why anyone would
> _bother_ :)

Haven't you done that already?  JFFS2 write behaviour is the best-case
scenario for any FTL.  When the delta cache is finished, LogFS will be
pretty close to that as well.

Not sure if anyone would specifically target FTL.  Being well-suited for
those beasts is just a side-effect.

The FTL is still a net loss.  Without that FAT enabling layer a real
flash filesystem would be more efficient.

Jörn

-- 
Prosperity makes friends, adversity tries them.
-- Publilius Syrus


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
> 
> > I've been semi watching this, and the only comment I really can give
> > is that I hate the name.  To me, logfs implies a filesystem for
> > logging purposes, not for Flash hardware with wear leveling issues to
> > be taken into account.
> 
> Yeah, well, ...
> 
> Two years ago when I started all this, I was looking for a good name.
> All I could come up with sounded stupid, so I picked "LogFS" as a code
> name.  As soon as I find a better name, the code name should get
> replaced.
> 

When doing a Google search on logfs, there are fewer than five results
among the first 100 that don't refer to your work. The other two listed
in there are also log-structured file systems: the Inferno flash file
system (http://inferno-os.googlecode.com/svn/trunk/liblogfs/) and the
(discontinued) file system named lfs from the 2005 Google Summer of
Code.

I'd say the name should stay, changing it now can only add more confusion.

Arnd <><


Re: [PATCH] LogFS take three

2007-05-17 Thread David Woodhouse
On Thu, 2007-05-17 at 09:12 +, Pavel Machek wrote:
> Nah, it would lead to Jorn disappearing mysteriously and _Pavel_
> accused of murder ;-).

Are you suggesting that you would murder Jörn (you misspelled his name)
merely for the heinous crime of using his own name?

Your Luddism was already quite excessive, but now you really _are_
taking it to extremes. :)

-- 
dwmw2



Re: [PATCH] LogFS take three

2007-05-17 Thread Pavel Machek

> >> My experience is that no matter which name I pick, 
> >people will
> >> complain
> >> anyway.  Previous suggestions included:
> >> jffs3
> >> jefs
> >> engelfs
> >> poofs
> >> crapfs
> >> sweetfs
> >> cutefs
> >> dynamic journaling fs - djofs
> >> tfsfkal - the file system formerly known as logfs
> >
> >Can we call it jörnfs? :)
> 
> However if Jörn is accused of murder, it will have 
> little chance of
> being merged :-).

Nah, it would lead to Jorn disappearing mysteriously and _Pavel_
accused of murder ;-).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] LogFS take three

2007-05-17 Thread David Woodhouse
On Thu, 2007-05-17 at 17:20 +0900, Dongjun Shin wrote:
> There are, of course, cases where direct access are better.
> However, as the demand for capacity, reliability and high performance
> for the flash storage increases, the use of FTL with embedded controller
> would be inevitable.
> 
> - The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to
> the use of multiple flash in parallel and the use of high degree ECC 
> algorithm.
> There are other things like bad block handling and wear-leveling issues
> which has been recently touched by UBI with added SW complexity.

You don't get rid of that complexity by offloading it to a µcontroller. 

The only thing you achieve that way is making sure it's quite likely to
be done badly, and making it impossible to debug.

> - In contrast to the embedded environment where CPU and flash is directly
> connected, the I/O path between CPU and flash in PC environment is longer.
> The latency for SW handshaking between CPU and flash will also be longer,
> which would make the performance optimization harder.

Do it the naïve way with a single byte push/pull and waggling the
control lines separately, and what you say is true -- but you can have
flash controllers which assist with data transfer but still give you
essentially 'raw' access to the chip.

With the CAFÉ controller designed for the OLPC machine, we can spew data
across the PCI bus just as fast as we can suck it off the flash chip.

> As I mentioned, some techniques like log-structured filesystem could
> perform generally better on any kind of flash-based storage with FTL.
> Although there are many kinds of FTL, it is commonly true that
> it performs well under workload where sequential write is dominant.

Yes, it's certainly possible that we _could_ write a file system which
is specifically targeted at FTL -- I was just wondering why anyone would
_bother_ :)

I've seen an interesting file system which does have a kind of FTL
internally as its lowest layer, and which builds on that using 'virtual'
sectors for file extents. Now that _does_ have its advantages -- but it
doesn't go as far as pretending to be a 'normal' block device; it's its
own special thing for internal use within that file system.
 
> I also expect that FTL for PC environment will have better quality spec
> than the disposable storage.

There really is no reason why FTL has to be done badly; just as there's
no _reason_ why hardware vendors have to give us crappy bsVendorCode.
Nevertheless, that's the way the world tends to be. So good luck
shipping with that :)

-- 
dwmw2



Re: [PATCH] LogFS take three

2007-05-17 Thread Dongjun Shin

On 5/17/07, David Woodhouse <[EMAIL PROTECTED]> wrote:
>
> Yes. These things are almost always implemented _very_ badly by the same
> kind of crack-smoking hobo they drag in off the streets to write BIOSen.
>
> It's bog-roll technology; if you fancy a laugh try doing some real
> reliability tests on them some time. Powerfail testing is a good one.
>
> This kind of thing is OK for disposable storage such as in digital
> cameras, where it doesn't matter that it's no more reliable than a
> floppy disc, but for real long-term storage it's really a bad idea.



There are many flash-based storage devices, and some of the disposable
ones, as you pointed out, are of poor quality. I think that's mainly because
they are designed for low price rather than for quality.

These kinds of devices are not prepared for things like power failure because
their use case is far from that. For example, removing a flash card
while taking pictures with a digital camera is not a common use case.
(There should be a written notice that this kind of action violates
the warranty.)



> There's little point in optimising a file system _specifically_ for
> devices which often aren't reliable enough to keep your data anyway.
> You might as well use ramfs.
>
> It's unfortunate really -- there's no _fundamental_ reason why FTL has
> to be done so badly; it's just that it almost always is. Direct access
> to the flash from Linux is _always_ going to be better in practice --
> and that way you avoid the problems with dual journalling, along with
> the problems with the underlying FTL continuing to keep (and copy around
> during GC) sectors which the top-level filesystem has actually
> deallocated, etc.



There are, of course, cases where direct access is better.
However, as the demand for capacity, reliability and high performance
in flash storage increases, the use of an FTL with an embedded controller
will become inevitable.

- The complexity/cost of host-side SW (like JFFS2/MTD) will increase due to
the use of multiple flash chips in parallel and of high-degree ECC algorithms.
There are other things like bad block handling and wear-leveling issues,
which UBI has recently addressed at the cost of added SW complexity.

- In contrast to the embedded environment, where CPU and flash are directly
connected, the I/O path between CPU and flash in a PC environment is longer.
The latency of SW handshaking between CPU and flash will also be longer,
which makes performance optimization harder.

As I mentioned, some techniques like a log-structured filesystem could
generally perform better on any kind of flash-based storage with an FTL.
Although there are many kinds of FTL, it is commonly true that
they perform well under workloads where sequential writes dominate.

I also expect that FTLs for the PC environment will have a better quality spec
than disposable storage.

Dongjun


Re: [PATCH] LogFS take three

2007-05-17 Thread David Woodhouse
On Thu, 2007-05-17 at 15:12 +0900, Dongjun Shin wrote:
> The current trend of flash-based device is to hide the flash-specific details
> from the host OS. The flash memory is encapsulated in a package
> which contains a dedicated controller where a small piece of software (F/W or 
> FTL)
> runs and makes the storage shown as a block device to the host.

Yes. These things are almost always implemented _very_ badly by the same
kind of crack-smoking hobo they drag in off the streets to write BIOSen.

It's bog-roll technology; if you fancy a laugh try doing some real
reliability tests on them some time. Powerfail testing is a good one.
 
This kind of thing is OK for disposable storage such as in digital
cameras, where it doesn't matter that it's no more reliable than a
floppy disc, but for real long-term storage it's really a bad idea.

> IMHO, for a flash-optimized filesystem to be useful and widely-used, it would 
> be better
> to run on a block device and to be designed to run efficiently on top of the 
> FTL.
> (ex. log-structured filesystem on general block device)

There's little point in optimising a file system _specifically_ for
devices which often aren't reliable enough to keep your data anyway.
You might as well use ramfs.

It's unfortunate really -- there's no _fundamental_ reason why FTL has
to be done so badly; it's just that it almost always is. Direct access
to the flash from Linux is _always_ going to be better in practice --
and that way you avoid the problems with dual journalling, along with
the problems with the underlying FTL continuing to keep (and copy around
during GC) sectors which the top-level filesystem has actually
deallocated, etc.

-- 
dwmw2



Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 16:43:59 +0800, David Woodhouse wrote:
 
> > As I mentioned, some techniques like log-structured filesystem could
> > perform generally better on any kind of flash-based storage with FTL.
> > Although there are many kinds of FTL, it is commonly true that
> > it performs well under workload where sequential write is dominant.
>
> Yes, it's certainly possible that we _could_ write a file system which
> is specifically targeted at FTL -- I was just wondering why anyone would
> _bother_ :)

Haven't you done that already?  JFFS2 write behaviour is the best-case
scenario for any FTL.  When the delta cache is finished, LogFS will be
pretty close to that as well.

Not sure if anyone would specifically target FTL.  Being well-suited for
those beasts is just a side-effect.

The FTL is still a net loss.  Without that FAT enabling layer a real
flash filesystem would be more efficient.

Jörn

-- 
Prosperity makes friends, adversity tries them.
-- Publilius Syrus


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
> Add LogFS, a scalable flash filesystem.

Hi Jörn,

Sorry for not commenting earlier, there were so many discussions on version
two that I wanted to wait for the fallout of that instead of duplicating
all the comments.

Here are a few things I notice while going through the third version:

> +/*
> + * Private errno for accessed beyond end-of-file.  Only used internally to
> + * logfs.  If this ever gets exposed to userspace or even other parts of the
> + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> + * used errno #defines.
> + *
> + * It can be argued that this is a hack and should be replaced with something
> + * else.  My last attempt to do this failed spectacularly and there are more
> + * urgent problems that users actually care about.  This will remain for the
> + * moment.  Patches are welcome, of course.
> + */
> +#define EOF  256

It should at least be in the kernel-only errno range between 512 and 4095,
that way it can eventually be added to include/linux/errno.h.

> + * Target rename works in three atomic steps:
> + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> + * 2. Remove old dentry (still remember the new inode)
> + * 3. Remove new inode
> + *
> + * Here we remember both an inode and a dentry.  If we get interrupted
> + * between steps 1 and 2, we delete both the dentry and the inode.  If
> + * we get interrupted between steps 2 and 3, we delete just the inode.
> + * In either case, the remaining objects are deleted on next mount.  From
> + * a user's point of view, the operation succeeded.

This description had me confused for a while: why would you remove the
new inode. Maybe change the text to say 'target inode' or 'victim inode'?

> +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> +	struct inode *inode;
> +
> +	if (dir->i_nlink >= LOGFS_LINK_MAX)
> +		return -EMLINK;

Why is i_nlink limited? Don't you run out of space for inodes before
overflowing?

> + * In principle, this function should loop forever, looking for GC candidates
> + * and moving data.  LogFS is designed in such a way that this loop is
> + * guaranteed to terminate.
> + *
> + * Limiting the loop to four iterations serves purely to catch cases when
> + * these guarantees have failed.  An actual endless loop is an obvious bug
> + * and should be reported as such.
> + *
> + * But there is another nasty twist to this.  As I have described in my LCA
> + * presentation, Garbage collection would have to limit itself to higher
> + * levels if the number of available free segments goes down.  This code
> + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> + * been able to make it fail (short of a bug elsewhere).
> + *
> + * So in a way this code is intentionally wrong as a desperate cry for a
> + * better testcase.  And I do expect to get blamed for it one day. :(
> + */

Could you bug the code to reserve fewer segments for GC than you really
need, in order to stress test GC?

> +static struct inode *logfs_alloc_inode(struct super_block *sb)
> +{
> +	struct logfs_inode *li;
> +
> +	li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> +	if (!li)
> +		return NULL;
> +	logfs_init_inode(&li->vfs_inode);
> +	return &li->vfs_inode;
> +}
> +
> +
> +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> +{
> +	struct inode *inode;
> +
> +	inode = logfs_alloc_inode(sb);
> +	if (!inode)
> +		return ERR_PTR(-ENOMEM);
> +
> +	logfs_init_inode(inode);

logfs_alloc_inode() returns an initialized inode, so no need to call
logfs_init_inode() again, right?

> +static __be64 timespec_to_be64(struct timespec tsp)
> +{
> +	u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x);
> +
> +	WARN_ON(tsp.tv_nsec > 9);
> +	return cpu_to_be64(time);
> +}

Why not just store 64 bit nanoseconds? that would avoid the problem
with ns overflow and the year-2038 bug. OTOH, that would require
a 64 bit integer division when reading the data, so it gets you
a runtime overhead.
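
A minimal sketch of the two layouts being compared, written as plain
userspace C.  The helper names (pack_sec_nsec, pack_ns and friends) are
made up for illustration and are not part of the patch; the point is only
that the split layout unpacks with shifts while the single nanosecond
counter needs a 64-bit division on the read side.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Split layout: seconds in the high 32 bits, nanoseconds in the low 32. */
static uint64_t pack_sec_nsec(uint32_t sec, uint32_t nsec)
{
	return ((uint64_t)sec << 32) | nsec;
}

static void unpack_sec_nsec(uint64_t v, uint32_t *sec, uint32_t *nsec)
{
	*sec = (uint32_t)(v >> 32);	/* shift only, no division */
	*nsec = (uint32_t)v;
}

/* Alternative layout: one 64-bit count of nanoseconds since the epoch. */
static uint64_t pack_ns(uint64_t sec, uint32_t nsec)
{
	return sec * NSEC_PER_SEC + nsec;
}

static void unpack_ns(uint64_t ns, uint64_t *sec, uint32_t *nsec)
{
	*sec = ns / NSEC_PER_SEC;		/* the 64-bit division */
	*nsec = (uint32_t)(ns % NSEC_PER_SEC);
}
```

On 32-bit machines the division in unpack_ns() is the runtime cost
mentioned above; the split layout avoids it entirely.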

> +static void logfs_read_inode(struct inode *inode)
> +{
> +	int ret;
> +
> +	BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> +	ret = __logfs_read_inode(inode);
> +
> +	/* What else can we do here? */
> +	BUG_ON(ret);
> +}

ext2 returns make_bad_inode(inode) in this case, which seems to be
a better solution than crashing.

> +int __logfs_write_inode(struct inode *inode)
> +{
> +	/*
> +	 * FIXME: Those two inodes are 512 bytes in total.  Not good to
> +	 * have on the stack.  Possibly the best solution would be to bite
> +	 * the bullet and do another format change before release and
> +	 * shrink the inodes.
> +	 */
> +	struct logfs_disk_inode old, new;
> +
> +	BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
> +
> +	/* read and compare the inode first.  If it hasn't changed, don't
> +	 * bother writing it. */
> +	logfs_inode_to_disk(inode, &new);
 + if 

Re: Review status (Re: [PATCH] LogFS take three)

2007-05-17 Thread Evgeniy Polyakov

Hi Jörn.

Is logfs a 32bit fs or a 64bit one, since although you use 64bit values for
offsets, area management and strange conversions like described below
from offset into segment number are performed in 32bit?
Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
could you please explain logic behind area management?

I've found that you store segment numbers as 32bit values (for example
in prepare_write()), and convert requested 64bit offset into segment
number via superblock's s_segshift.
This conversion seems confusing to me in case of real 64bit offsets.
For example this one obtained via prepare_write:

 7  1 logfs_prepare_write     78  fs/logfs/file.c
 8  2 logfs_readpage_nolock   20  fs/logfs/file.c
 9  1 logfs_read_block       351  fs/logfs/readwrite.c
10  1 logfs_read_loop        139  fs/logfs/readwrite.c
11  2 logfs_segment_read     108  fs/logfs/readwrite.c
12  1 wbuf_read              289

u32 segno = ofs >> super->s_segshift;

ofs is originally obtained from inode's li_data array, which is filled
with raw segment numbers which can be 64bit (here is another issue,
since logfs_segment_write() returns signed, so essentially logfs is
63bit filesystem).

But here I've come to area management in logfs, and found that it is
32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes() 
returns signed 32 bit value (so it is reduced to 31 bit), which is then 
placed into li_data as 64bit value. The latter
(__logfs_get_free_bytes()) truncates 64bit data value obtained via
dev_ofs() into signed 32 bit value.
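
To make the concern concrete, here is a small userspace sketch of what
happens when a 64-bit device offset is passed through a signed 32-bit
return type.  Both function names are hypothetical stand-ins and are not
taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper that hands a device offset back through s32.
 * The upper 32 bits are dropped.  If bit 31 of the remainder were set,
 * the conversion to int32_t would be implementation-defined and on
 * common compilers would yield a negative number.
 */
static int32_t offset_as_s32(uint64_t dev_offset)
{
	return (int32_t)(uint32_t)dev_offset;
}

/* The same value returned through s64 survives (up to 63 bits). */
static int64_t offset_as_s64(uint64_t dev_offset)
{
	return (int64_t)dev_offset;
}
```

An offset of 0x500001000 (a little over 20 GiB) comes back as 0x1000
from the 32-bit variant, which silently points into the first segment of
the device.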

-- 
Evgeniy Polyakov


Re: Review status (Re: [PATCH] LogFS take three)

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 20:03:11 +0400, Evgeniy Polyakov wrote:
 
> Is logfs a 32bit fs or a 64bit one, since although you use 64bit values for
> offsets, area management and strange conversions like described below
> from offset into segment number are performed in 32bit?
> Is it enough for SSD for example to be 32bit only? Or if it is 64bit,
> could you please explain logic behind area management?

Ignoring bugs and signed return values for error handling, it is either
64bit or 32+32bit.

Inode numbers and file positions are 64bit.  Offsets are 64bit as well.
In a couple of places, offsets are also 32+32bit.  Basically the high
bits contain the segment number, the lower bits the offset within a
segment.
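
The 32+32bit split can be sketched in userspace C as follows.  The
struct and the helper signatures are assumptions made for illustration
(only the names s_segshift and dev_ofs() appear in this thread); the
segment shift of 17 used in the usage note is an arbitrary example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant superblock field. */
struct geometry {
	unsigned int segshift;	/* log2 of the segment size in bytes */
};

/* High bits of a device offset: the segment number. */
static uint32_t ofs_to_segno(const struct geometry *g, uint64_t ofs)
{
	return (uint32_t)(ofs >> g->segshift);
}

/* Low bits of a device offset: the position inside the segment. */
static uint32_t ofs_in_segment(const struct geometry *g, uint64_t ofs)
{
	return (uint32_t)(ofs & ((1ULL << g->segshift) - 1));
}

/* Recombine.  The cast widens segno before the shift so no bits are lost. */
static uint64_t dev_ofs(const struct geometry *g, uint32_t segno,
			uint32_t segofs)
{
	return ((uint64_t)segno << g->segshift) | segofs;
}
```

With segshift = 17 (128 KiB segments), segment 0x80000 starts at device
offset 0x1000000000.  Without the (uint64_t) cast the shift would be
done in 32 bits and that result would wrap to 0.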

Side note: It would be nicer if the high 32bit were segment number.
Instead the number of bits depends on segment size.  Guess I should
change that while the format isn't fixed yet.

An area is a segment that is currently being written.  Data is
appended to this segment as it comes in, until the segment is full.  Any
functions dealing with areas only need a 32bit offset, which is the
offset within the area, not the absolute device offset.

Writes within an area are also buffered.  New data first goes into the
write buffer (wbuf) and only when this is full is it flushed to the
device.  NAND flash and some NOR flashes require such buffering.  When
writing to the device, the 32bit segno and the 32bit in-segment offset
need to get converted back to a 64bit device offset.
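
The buffering described here can be sketched like this.  It is a toy
model under stated assumptions: the buffer size of 8 bytes, the struct
layout and the function names are all invented for the example, and the
"device write" is just a counter.

```c
#include <stdint.h>
#include <string.h>

#define WBUF_SIZE 8	/* tiny on purpose; real sizes depend on the flash */

struct wbuf {
	uint8_t data[WBUF_SIZE];
	size_t used;	/* bytes currently buffered */
	size_t flushes;	/* number of simulated device writes */
};

/* Stand-in for the device write; real code would program the flash here. */
static void wbuf_flush(struct wbuf *w)
{
	w->flushes++;
	w->used = 0;
}

/* Append data; the device is only touched when the buffer is full. */
static void wbuf_write(struct wbuf *w, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len) {
		size_t n = WBUF_SIZE - w->used;

		if (n > len)
			n = len;
		memcpy(w->data + w->used, p, n);
		w->used += n;
		p += n;
		len -= n;
		if (w->used == WBUF_SIZE)
			wbuf_flush(w);
	}
}
```

Writing 10 bytes into an empty buffer causes one flush and leaves 2
bytes pending; the device only ever sees full-buffer writes.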

> I've found that you store segment numbers as 32bit values (for example
> in prepare_write()), and convert requested 64bit offset into segment
> number via superblock's s_segshift.

Yes, as described above.

> This conversion seems confusing to me in case of real 64bit offsets.
> For example this one obtained via prepare_write:
>
>  7  1 logfs_prepare_write     78  fs/logfs/file.c
>  8  2 logfs_readpage_nolock   20  fs/logfs/file.c
>  9  1 logfs_read_block       351  fs/logfs/readwrite.c
> 10  1 logfs_read_loop        139  fs/logfs/readwrite.c
> 11  2 logfs_segment_read     108  fs/logfs/readwrite.c
> 12  1 wbuf_read              289
>
> u32 segno = ofs >> super->s_segshift;
>
> ofs is originally obtained from inode's li_data array, which is filled
> with raw segment numbers which can be 64bit (here is another issue,
> since logfs_segment_write() returns signed, so essentially logfs is
> 63bit filesystem).

The filesystem format is 64bit.  The current code can only deal with
63bit.  Eric Sandeen just fixed ext2 to actually deal with 32bit
numbers and the same is possible for logfs.  If anyone ever cares...
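
Why a signed return value costs one bit can be shown with a small
userspace sketch.  write_at() is a made-up function, and the choice of
EFBIG for the rejected case is an assumption for the example only.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical writer: returns the offset written on success or a
 * negative errno on failure.  Any offset with bit 63 set would look
 * negative and could not be told apart from an error, so only 63 bits
 * are usable.
 */
static int64_t write_at(uint64_t ofs)
{
	if (ofs > (uint64_t)INT64_MAX)
		return -EFBIG;
	return (int64_t)ofs;
}

/* Callers distinguish the two cases purely by sign. */
static int is_error(int64_t ret)
{
	return ret < 0;
}
```

Getting the full 64 bits back would mean returning the status and the
offset separately, for example through an output parameter.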

> But here I've come to area management in logfs, and found that it is
> 32bit only, for example __logfs_segment_write()/__logfs_get_free_bytes()
> returns signed 32 bit value (so it is reduced to 31 bit), which is then
> placed into li_data as 64bit value. The latter
> (__logfs_get_free_bytes()) truncates 64bit data value obtained via
> dev_ofs() into signed 32 bit value.

That indeed is a bug.  __logfs_get_free_bytes() should return s64
instead of s32.  Will fix immediately.

If anyone can find similar bugs, the bounty is a beer or non-alcoholic
beverage of choice. :)

Jörn

-- 
To announce that there must be no criticism of the President, or that we
are to stand by the President, right or wrong, is not only unpatriotic
and servile, but is morally treasonable to the American public.
-- Theodore Roosevelt, Kansas City Star, 1918


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 13:09, Jörn Engel wrote:
>On Wed, 16 May 2007 12:54:14 +0800, David Woodhouse wrote:
>>
>> Personally I'd just go for 'JFFS3'. After all, it has a better claim to
>> the name than either of its predecessors :)
>
>Did you ever see akpm's facial expression when he tried to pronounce
>JFFS2?  ;)

Is there something special with [dʒeɪ ɛf ɛf ɛs tuː]?


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 14:55, Jörn Engel wrote:
>On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
>> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
>> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> > >
>> > > But if akpm can't pronounce it, how about FFFS for faster flash
>> > > filesystem ;-)
>> >
>> > How many of you have worked for IBM before?  Vowels are not evil. ;)
>>
>> Do you think 'eieio' is a good set? IBM's work too...

C'mon, UIO does not cut IIO either ;-)


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 22:06, CaT wrote:
>On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel wrote:
>> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
>> >
>> > But if akpm can't pronounce it, how about FFFS for faster flash
>> > filesystem ;-)
>>
>> How many of you have worked for IBM before?  Vowels are not evil. ;)
>>
>> Grouping four or more consonants to name anything will cause similar
>> expressions on people's faces.  Numbers don't help much either.
>>
>> Ext2 is a great name, because ext actually is a pronounceable syllable.
>> MinixFS, ChunkFS, TileFS are great too.  XFS and JFS are ok, at least
>> they only have three consonants.  But FFS exists, so I'd rather go for a
>> syllable.
>
>FlashFS?

Or just try once dropping all those redundant 'fs' suffixes.
bdev, proc, cpuset, devpts, mqueue, fuse(blk|ctl), vfat, iso9660, etc.
Then there's much more space for innovative names.


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 15:53, Jörn Engel wrote:

>My experience is that no matter which name I pick, people will complain
>anyway.  Previous suggestions included:
>[...]
>
>Plus today:
>FFFS
>flashfs
>fredfs
>bob
>shizzle
>
>Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
>joke and keep my logfs.org domain.

Try woodfs! (log - wood - get it?)
But finding names can be so tiresome, just give it a Borg-style
designation - filesystem 125 or so. fs2007q1, being this
quarter's new filesystem.


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jan Engelhardt

On May 16 2007 02:06, Jörn Engel wrote:

>> > +/* memtree.c */
>> > +void btree_init(struct btree_head *head);
>> > +void *btree_lookup(struct btree_head *head, long val);
>> > +int btree_insert(struct btree_head *head, long val, void *ptr);
>> > +int btree_remove(struct btree_head *head, long val);
>>
>> These names are too generic.  If we later add a btree library: blam.
>
>My plan was to move this code to lib/ sooner or later.  If you consider
>it useful in its current state, I can do it immediately.  And if someone
>else merged a superior btree library I'd happily remove mine and use the
>new one instead.
>
>Opinions?

Why would we need another btree, when there is lib/rbtree.c?
Or does yours do something fundamentally different?


Jan
-- 


Re: [PATCH] LogFS take three

2007-05-17 Thread Evgeniy Polyakov
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) 
wrote:
> >My plan was to move this code to lib/ sooner or later.  If you consider
> >it useful in its current state, I can do it immediately.  And if someone
> >else merged a superior btree library I'd happily remove mine and use the
> >new one instead.
> >
> >Opinions?
>
> Why would we need another btree, when there is lib/rbtree.c?
> Or does yours do something fundamentally different?

It is not a red-black tree, it is a b+ tree.

> 	Jan

-- 
Evgeniy Polyakov


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 17:08:51 +0200, Arnd Bergmann wrote:
> On Tuesday 15 May 2007, Jörn Engel wrote:
> > Add LogFS, a scalable flash filesystem.
>
> Sorry for not commenting earlier, there were so many discussions on version
> two that I wanted to wait for the fallout of that instead of duplicating
> all the comments.

You are the last person that has to be sorry. ;)

> Here are a few things I notice while going through the third version:
>
> > +/*
> > + * Private errno for accessed beyond end-of-file.  Only used internally to
> > + * logfs.  If this ever gets exposed to userspace or even other parts of the
> > + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
> > + * used errno #defines.
> > + *
> > + * It can be argued that this is a hack and should be replaced with something
> > + * else.  My last attempt to do this failed spectacularly and there are more
> > + * urgent problems that users actually care about.  This will remain for the
> > + * moment.  Patches are welcome, of course.
> > + */
> > +#define EOF  256
>
> It should at least be in the kernel-only errno range between 512 and 4095,
> that way it can eventually be added to include/linux/errno.h.

Fair enough.  512 it is.

> > + * Target rename works in three atomic steps:
> > + * 1. Attach old inode to new dentry (remember old dentry and new inode)
> > + * 2. Remove old dentry (still remember the new inode)
> > + * 3. Remove new inode
> > + *
> > + * Here we remember both an inode and a dentry.  If we get interrupted
> > + * between steps 1 and 2, we delete both the dentry and the inode.  If
> > + * we get interrupted between steps 2 and 3, we delete just the inode.
> > + * In either case, the remaining objects are deleted on next mount.  From
> > + * a user's point of view, the operation succeeded.
>
> This description had me confused for a while: why would you remove the
> new inode. Maybe change the text to say 'target inode' or 'victim inode'?

'Victim inode' sounds good.  Will do.

> > +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> > +{
> > +	struct inode *inode;
> > +
> > +	if (dir->i_nlink >= LOGFS_LINK_MAX)
> > +		return -EMLINK;
>
> Why is i_nlink limited? Don't you run out of space for inodes before
> overflowing?

I don't know.  With the current limit of 2^31, a sufficiently large
device can reach the limit.  And it is imaginable that overflowing the
s32 number space can expose security holes.  Not that I actually know,
the check is pure paranoia.

> > + * In principle, this function should loop forever, looking for GC candidates
> > + * and moving data.  LogFS is designed in such a way that this loop is
> > + * guaranteed to terminate.
> > + *
> > + * Limiting the loop to four iterations serves purely to catch cases when
> > + * these guarantees have failed.  An actual endless loop is an obvious bug
> > + * and should be reported as such.
> > + *
> > + * But there is another nasty twist to this.  As I have described in my LCA
> > + * presentation, Garbage collection would have to limit itself to higher
> > + * levels if the number of available free segments goes down.  This code
> > + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
> > + * been able to make it fail (short of a bug elsewhere).
> > + *
> > + * So in a way this code is intentionally wrong as a desperate cry for a
> > + * better testcase.  And I do expect to get blamed for it one day. :(
> > + */
>
> Could you bug the code to reserve fewer segments for GC than you really
> need, in order to stress test GC?

I could.  Wear leveling will cause changes in the area, so I'll have a
closer look when implementing that.

> > +static struct inode *logfs_alloc_inode(struct super_block *sb)
> > +{
> > +	struct logfs_inode *li;
> > +
> > +	li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
> > +	if (!li)
> > +		return NULL;
> > +	logfs_init_inode(&li->vfs_inode);
> > +	return &li->vfs_inode;
> > +}
> > +
> > +
> > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> > +{
> > +	struct inode *inode;
> > +
> > +	inode = logfs_alloc_inode(sb);
> > +	if (!inode)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	logfs_init_inode(inode);
>
> logfs_alloc_inode() returns an initialized inode, so no need to call
> logfs_init_inode() again, right?

Right.  Will change.

> > +static __be64 timespec_to_be64(struct timespec tsp)
> > +{
> > +	u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0x);
> > +
> > +	WARN_ON(tsp.tv_nsec > 9);
> > +	return cpu_to_be64(time);
> > +}
>
> Why not just store 64 bit nanoseconds? that would avoid the problem
> with ns overflow and the year-2038 bug. OTOH, that would require
> a 64 bit integer division when reading the data, so it gets you
> a runtime overhead.

I like the idea.  Do conversion functions exist both ways?

What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
while 32bit would last to 2106?

> > +static void logfs_read_inode(struct inode *inode)
> > +{
> > +	int ret;
> > +
> > +	BUG_ON(inode->i_ino == LOGFS_INO_MASTER);
  +
  +  

Re: [PATCH] LogFS take three

2007-05-17 Thread Pavel Machek
Hi!

> Yes. These things are almost always implemented _very_ badly by the same
> kind of crack-smoking hobo they drag in off the streets to write BIOSen.
>
> It's bog-roll technology; if you fancy a laugh try doing some real
> reliability tests on them some time. Powerfail testing is a good one.
>
> This kind of thing is OK for disposable storage such as in digital
> cameras, where it doesn't matter that it's no more reliable than a
> floppy disc, but for real long-term storage it's really a bad idea.
>
> There are so many flash-based storage and some disposable storages,
> as you pointed out, have poor quality. I think it's mainly because these
> are not designed for good quality, but for lowering the price.
>
> These kind of devices are not ready for things like power failure because
> their use case is far from that. For example, removing flash card
> while taking pictures using digital camera is not a common use case.
> (there should be a written notice that this kind of action is against
> the warranty)

Hmm.. so operating your camera on batteries should be against the
warranty, since batteries commonly run empty while storing pictures?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] LogFS take three

2007-05-17 Thread Pekka Enberg

Jörn Engel wrote:

> Compressing random data will actually enlarge it.  If that happens I
> simply store the verbatim uncompressed data instead and mark it as such.
>
> There is also demand for a user-controlled bit in the inode to disable
> compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> time by trying and failing to compress them.


So any sane way to enable compression is on per-inode basis which makes 
me still wonder why you need per-object compression.



Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Jörn Engel wrote:
 
> > Why not just store 64 bit nanoseconds? that would avoid the problem
> > with ns overflow and the year-2038 bug. OTOH, that would require
> > a 64 bit integer division when reading the data, so it gets you
> > a runtime overhead.
>
> I like the idea.  Do conversion functions exist both ways?
>
> What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
> while 32bit would last to 2106?

You're right, you don't hit the 2038 bug here, because you use an
unsigned variable. The bug exists elsewhere because time_t tv_sec
is signed.
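
The two limits can be checked with a few lines of userspace C.  This is
only an illustration: year_of() is a made-up helper, and it assumes the
host has a 64-bit time_t so that both values are representable.

```c
#include <stdint.h>
#include <time.h>

/*
 * UTC year in which a seconds-since-1970 counter reaches `secs`.
 * Assumes a 64-bit time_t; returns -1 if gmtime() fails.
 */
static int year_of(int64_t secs)
{
	time_t t = (time_t)secs;
	struct tm *tm = gmtime(&t);

	return tm ? tm->tm_year + 1900 : -1;
}
```

year_of(INT32_MAX) gives the last year a signed 32-bit seconds counter
can express, and year_of(UINT32_MAX) the last year for an unsigned one.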

Just using nanoseconds probably doesn't gain you much after all
then. You could however just have separate 32 bit fields in the
inode for seconds and nanoseconds, that will result in the exact
same layout that you have right now, but won't require a conversion
function.

Arnd 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 23:00:20 +0200, Arnd Bergmann wrote:
 
> Just using nanoseconds probably doesn't gain you much after all
> then. You could however just have separate 32 bit fields in the
> inode for seconds and nanoseconds, that will result in the exact
> same layout that you have right now, but won't require a conversion
> function.

I could also have a 30bit and a 34bit field.  30bit is enough for
nanoseconds.  So many options.
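
A sketch of the 34+30 bit option, in userspace C.  Nothing here comes
from the patch; the macro and function names are invented for the
example.  30 bits hold values up to 1073741823, which covers the largest
valid nanosecond value of 999999999, and 34 bits of seconds cover
roughly 544 years from the epoch.

```c
#include <stdint.h>

#define NSEC_BITS 30
#define NSEC_MASK ((1ULL << NSEC_BITS) - 1)

/* Seconds in the upper 34 bits, nanoseconds in the lower 30 bits. */
static uint64_t pack_34_30(uint64_t sec, uint32_t nsec)
{
	return (sec << NSEC_BITS) | (nsec & NSEC_MASK);
}

static uint64_t unpack_sec(uint64_t v)
{
	return v >> NSEC_BITS;
}

static uint32_t unpack_nsec(uint64_t v)
{
	return (uint32_t)(v & NSEC_MASK);
}
```

Like the 32+32 layout, this unpacks with a shift and a mask, so no
division is needed.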

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Pekka Enberg wrote:
 
> Jörn Engel wrote:
> > Compressing random data will actually enlarge it.  If that happens I
> > simply store the verbatim uncompressed data instead and mark it as such.
> >
> > There is also demand for a user-controlled bit in the inode to disable
> > compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
> > time by trying and failing to compress them.
>
> So any sane way to enable compression is on per-inode basis which makes
> me still wonder why you need per-object compression.

1. it doesn't require user interaction, the file system will do the right
thing most of the time.

2. enlarging data is a very bad thing because it makes the behaviour
of the fs unpredictable. With uncompressed objects, you have a guaranteed
upper bound on the size.

Arnd 


Re: [PATCH] LogFS take three

2007-05-17 Thread Jörn Engel
On Thu, 17 May 2007 23:36:13 +0200, Arnd Bergmann wrote:
> On Thursday 17 May 2007, Pekka Enberg wrote:
> >
> > So any sane way to enable compression is on per-inode basis which makes
> > me still wonder why you need per-object compression.
>
> 1. it doesn't require user interaction, the file system will do the right
> thing most of the time.
>
> 2. enlarging data is a very bad thing because it makes the behaviour
> of the fs unpredictable. With uncompressed objects, you have a guaranteed
> upper bound on the size.

Correct.  The compression decision is always per-object.  Per-inode is a
hint from userspace that a compression attempt would be futile.

A compression algorithm that compresses any data is provably impossible.
Some data will always cause expansion instead of compression.  Some
algorithms have a well-known upper bound on the expansion, others don't.
So LogFS instead creates its own upper bound by reserving one byte in
the header for the compression type.

And while one bit would suffice as a compressed/uncompressed flag,
having a byte makes it possible to support more than one compression algorithm.
LZO looks promising and is on its way into the kernel.  Others may come
in the future.
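
The per-object decision can be sketched as follows.  This is a toy
model, not LogFS code: the type codes, the function names and the
run-length encoder standing in for a real compressor are all assumptions
made for the example.  What it shows is the bound: an object never grows
by more than the one header byte.

```c
#include <stdint.h>
#include <string.h>

enum { COMPR_NONE = 0, COMPR_RLE = 1 };	/* hypothetical type codes */

/*
 * Toy run-length encoder emitting (count, byte) pairs.
 * Returns the output length, or 0 if the result would not fit in cap.
 */
static size_t rle_compress(const uint8_t *in, size_t len,
			   uint8_t *out, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < len) {
		size_t run = 1;

		while (i + run < len && in[i + run] == in[i] && run < 255)
			run++;
		if (o + 2 > cap)
			return 0;
		out[o++] = (uint8_t)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/*
 * Store one object as a type byte followed by the payload.
 * out must have room for len + 1 bytes.  Returns the stored size.
 */
static size_t store_object(const uint8_t *in, size_t len, uint8_t *out)
{
	size_t clen = rle_compress(in, len, out + 1, len);

	if (clen && clen < len) {
		out[0] = COMPR_RLE;
		return clen + 1;
	}
	/* Compression did not help: store verbatim and mark it as such. */
	out[0] = COMPR_NONE;
	memcpy(out + 1, in, len);
	return len + 1;
}
```

64 identical bytes are stored in 3 bytes; four distinct bytes would
double under this encoder, so they are stored verbatim in 5 bytes.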

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra


Re: [PATCH] LogFS take three

2007-05-17 Thread Jamie Lokier
Jörn Engel wrote:
> > Almost all your static functions start with logfs_, why not this one?
>
> Because after a while I discovered how silly it is to start every
> function with logfs_.  That prefix doesn't add much unless the function
> has global scope.  What I didn't do was remove the prefix from older
> functions.

It's handy when debugging or showing detailed backtraces.  Not that
I'm advocating it (or not), just something I've noticed in other
programs.

-- Jamie


Re: [PATCH] LogFS take three

2007-05-17 Thread Dongjun Shin

Hi,

On 5/18/07, Pavel Machek [EMAIL PROTECTED] wrote:

> Hi!
>
> Hmm.. so operating your camera on batteries should be against the
> warranty, since batteries commonly run empty while storing pictures?

AFAIK, the camera stops writing to the flash card and automatically
turns off when it's low on battery (before empty).


Re: [PATCH] LogFS take three

2007-05-17 Thread Kyle Moffett

On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
> On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt
> ([EMAIL PROTECTED]) wrote:
>>> My plan was to move this code to lib/ sooner or later.  If you
>>> consider it useful in its current state, I can do it immediately.
>>> And if someone else merged a superior btree library I'd happily
>>> remove mine and use the new one instead.
>>>
>>> Opinions?
>>
>> Why would we need another btree, when there is lib/rbtree.c?  Or
>> does yours do something fundamentally different?
>
> It is not red-black tree, it is b+ tree.

It might be better to use the prefix bptree to help prevent
confusion.  A quick google search on bp-tree reveals only the perl
B+-tree module Tree::BPTree, a U-Maryland Java CS project on B+-trees,
and a news article about a BP tree-top protest.


Cheers,
Kyle Moffett



Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 19:17:18 +, Pavel Machek wrote:
> 
> In kernel fsck
> 
> > --- /dev/null   2007-04-18 05:32:26.652341749 +0200
> > +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c 2007-05-15 00:54:22.0 +0200
> > @@ -0,0 +1,332 @@
> > +/*
> > + * fs/logfs/prog/fsck.c - filesystem check
> > + *
> > + * As should be obvious for Linux kernel code, license is GPLv2
> > + *
> > + * Copyright (c) 2005-2007 Joern Engel
> > + *
> > + * In principle this could get moved to userspace.  However it might still
> > + * make some sense to keep it in the kernel.  It is a pure checker and will
> > + * only report problems, not attempt to repair them.
> > + */
> 
> Is there a version that repairs?

No.

> BUG is not the right thing to do for a media error.

I know.  Top 3 items of my todo list are:
- Handle system crashes
- Add second journal
- Error handling

> > +
> > +#if 0
> > +/* rootdir */
> 
> Please just delete it, not comment it out like this.

That will get resurrected, even before the move to userspace.  I had to
change the filesystem format for compression support and this is an
artifact of the transition phase.

Jörn

-- 
Ninety percent of everything is crap.
-- Sturgeon's Law


Re: [PATCH] LogFS take three

2007-05-16 Thread Pavel Machek
Hi!

In kernel fsck

> --- /dev/null 2007-04-18 05:32:26.652341749 +0200
> +++ linux-2.6.21logfs/fs/logfs/progs/fsck.c   2007-05-15 00:54:22.0 +0200
> @@ -0,0 +1,332 @@
> +/*
> + * fs/logfs/prog/fsck.c  - filesystem check
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * In principle this could get moved to userspace.  However it might still
> + * make some sense to keep it in the kernel.  It is a pure checker and will
> + * only report problems, not attempt to repair them.
> + */

Is there a version that repairs?

> + /* Some segments are reserved.  Just pretend they were all valid */
> + reserved = btree_lookup(&super->s_reserved_segments, segno);
> + if (reserved)
> + return 0;
> +
> + err = wbuf_read(sb, dev_ofs(sb, segno, 0), sizeof(sh), &sh);
> + BUG_ON(err);

BUG is not the right thing to do for a media error.
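
A minimal userspace sketch of the alternative, for illustration only. It assumes a hypothetical read_segment_header() helper standing in for wbuf_read(); neither it nor check_segment() comes from the patch. The idea is that a read failure is reported and returned to the caller, while assertions stay reserved for programming errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for wbuf_read(): fails with -EIO for segment 3. */
static int read_segment_header(unsigned int segno, unsigned char *buf,
			       size_t len)
{
	size_t i;

	assert(buf != NULL);	/* a NULL buffer is a caller bug, not a media error */
	if (segno == 3)
		return -EIO;	/* simulated media error */
	for (i = 0; i < len; i++)
		buf[i] = 0;
	return 0;
}

/* Returns 0 if the header could be read, a negative errno otherwise. */
static int check_segment(unsigned int segno)
{
	unsigned char sh[16];
	int err;

	err = read_segment_header(segno, sh, sizeof(sh));
	if (err) {
		/* Report and propagate instead of BUG_ON(err). */
		fprintf(stderr, "fsck: cannot read segment %u: error %d\n",
			segno, err);
		return err;
	}
	return 0;
}
```

The caller can then decide whether to skip the segment, abort the check, or mark the filesystem as damaged.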

> +/*
> + * fs/logfs/prog/mkfs.c  - filesystem generation
> + *
> + * As should be obvious for Linux kernel code, license is GPLv2
> + *
> + * Copyright (c) 2005-2007 Joern Engel
> + *
> + * Should get moved to userspace.
> + */

Indeed. 

> +
> +#if 0
> +/* rootdir */

Please just delete it, not comment it out like this.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 23:49:55 +0800, David Woodhouse wrote:
> 
> Utility is a factor of the underlying design -- a filesystem designed
> for flash really isn't suited to block devices.

I can think of at least three examples where LogFS would indeed make
sense on block devices.

Jörn

-- 
Happiness isn't having what you want, it's wanting what you have.
-- unknown


Re: [PATCH] LogFS take three

2007-05-16 Thread David Woodhouse
On Wed, 2007-05-16 at 08:34 -0700, Andrew Morton wrote:
> Reduced testability, mainly. Also potentially reduced usefulness. 

CONFIG_MTD has never been a barrier to testability. JFFS2 depends on MTD
and had _most_ of its early testing and development done on the 'fake'
mtdram device.

Utility is a factor of the underlying design -- a filesystem designed
for flash really isn't suited to block devices.

-- 
dwmw2



Re: [PATCH] LogFS take three

2007-05-16 Thread Andrew Morton
On Wed, 16 May 2007 20:07:18 +0800 David Woodhouse <[EMAIL PROTECTED]> wrote:

> > It's strange and a bit regrettable that an fs would have dependency on MTD,
> > really.
> 
> Why? Other file systems have dependencies on BLOCK or on NET. It seems
> entirely normal to me.

Reduced testability, mainly. Also potentially reduced usefulness.


Re: [PATCH] LogFS take three

2007-05-16 Thread Kevin Bowling

On 5/16/07, David Woodhouse <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> >
> > My experience is that no matter which name I pick, people will
> > complain
> > anyway.  Previous suggestions included:
> > jffs3
> > jefs
> > engelfs
> > poofs
> > crapfs
> > sweetfs
> > cutefs
> > dynamic journaling fs - djofs
> > tfsfkal - the file system formerly known as logfs
>
> Can we call it jörnfs? :)

However if Jörn is accused of murder, it will have little chance of
being merged :-).


Re: [PATCH] LogFS take three

2007-05-16 Thread CaT
On Wed, May 16, 2007 at 03:53:19PM +0200, Jörn Engel wrote:
> Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
> joke and keep my logfs.org domain.

Well if stupid jokes are a goer there's always gordonfs. :)

*hides*

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby


Re: [PATCH] LogFS take three

2007-05-16 Thread Artem Bityutskiy
On Wed, 2007-05-16 at 22:04 +0800, David Woodhouse wrote:
> On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> > 
> > My experience is that no matter which name I pick, people will
> > complain
> > anyway.  Previous suggestions included:
> > jffs3
> > jefs
> > engelfs
> > poofs
> > crapfs
> > sweetfs
> > cutefs
> > dynamic journaling fs - djofs
> > tfsfkal - the file system formerly known as logfs
> 
> Can we call it jörnfs? :)

And it is essential to preserve "ö" and let Pavel enjoy :-)

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)



Re: [PATCH] LogFS take three

2007-05-16 Thread David Woodhouse
On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:
> 
> My experience is that no matter which name I pick, people will
> complain
> anyway.  Previous suggestions included:
> jffs3
> jefs
> engelfs
> poofs
> crapfs
> sweetfs
> cutefs
> dynamic journaling fs - djofs
> tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)

-- 
dwmw2



Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 09:41:10 -0400, John Stoffel wrote:
> Jörn> On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> 
> Jörn> How many of you have worked for IBM before?  Vowels are not
> Jörn> evil. ;)
> 
> Nope, they're not.  I just think that LogFS isn't descriptive enough,
> or more accurately, is the *wrong* description of this filesystem.  

That was the whole point.  JFFS2, the journaling flash filesystem, is a
strictly log-structured filesystem.  LogFS has a journal.

It is also the filesystem that tries to scale logarithmically, as Arnd
has noted.  Maybe I should call it Log2 to emphasize this point.  Log1
would be horrible scalability.

> flashfs works for me.  It's longer, but hey, that's ok.  Even flshfs
> might work.  Oh wait, flesh?  flash?  flush?  Too confusing... :-)   

Maybe.  FFS or flash filesystem already exists.  And YAFFS, yet another
flash filesystem, would be older than flashfs.

My experience is that no matter which name I pick, people will complain
anyway.  Previous suggestions included:
jffs3
jefs
engelfs
poofs
crapfs
sweetfs
cutefs
dynamic journaling fs - djofs
tfsfkal - the file system formerly known as logfs

Plus today:
FFFS
flashfs
fredfs
bob
shizzle

Imo they all suck.  LogFS also sucks, but it allows me to make a stupid
joke and keep my logfs.org domain.

Jörn

-- 
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other is
to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 15:36:44 +0300, Pekka Enberg wrote:
> On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote:
> >
> >More trouble?
> 
> Forgot to add (see below). Seems logfs_segment_read would be simpler
> too if you fixed this.

Would it?  I think that code would still be needed, although possibly in
a different function.

There are two minor drawbacks to using the page cache, btw:
- Indirect blocks need some mapping too.  So either I need to steal a
  bit from the inode space or from the fpos space.
- OOM handling is a bit more complicated.  I would need a mempool for
  that.
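
The first drawback can be pictured with a small sketch. It assumes a purely hypothetical encoding in which the top bit of a 64-bit index marks an indirect block; the bit position and the helper names are illustrative and are not taken from LogFS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the top bit of the 64-bit index marks an indirect block. */
#define INDIRECT_BIT (UINT64_C(1) << 63)

/* Build an index from a block number and an "is indirect" flag. */
static uint64_t make_index(uint64_t bix, int indirect)
{
	assert((bix & INDIRECT_BIT) == 0);	/* bix must fit in 63 bits */
	return indirect ? (bix | INDIRECT_BIT) : bix;
}

/* Nonzero if the index refers to an indirect block. */
static int index_is_indirect(uint64_t index)
{
	return (index & INDIRECT_BIT) != 0;
}

/* Strip the flag and recover the plain block number. */
static uint64_t index_to_bix(uint64_t index)
{
	return index & ~INDIRECT_BIT;
}
```

The cost of such a scheme is that the stolen bit halves the usable range of whichever space it is taken from.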

> >[ Objects are the units that get compressed.  Segments can contain both
> >compressed and uncompressed objects. ]
> >
> >It is a trade-off.  Each object has a 24 Byte header plus X Bytes of
> >data.  Whether the data is compressed or not is indicated in the header.
> 
> Was my point really. Why do segments contain both compressed and
> uncompressed objects?

Compressing random data will actually enlarge it.  If that happens I
simply store the verbatim uncompressed data instead and mark it as such.
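
A rough userspace sketch of that fallback, under loud assumptions: a toy run-length encoder stands in for the real compressor, and the OBJ_RAW/OBJ_COMPRESSED values and the store_object() helper are made up for illustration rather than taken from the LogFS object header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type flag that would live in the object header. */
enum { OBJ_RAW = 0, OBJ_COMPRESSED = 1 };

/*
 * Toy run-length encoder standing in for the real compressor.
 * Emits (count, byte) pairs.  Returns the output length, or 0 if the
 * output would not fit into cap bytes.
 */
static size_t rle_compress(const unsigned char *in, size_t len,
			   unsigned char *out, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < len) {
		unsigned char b = in[i];
		size_t run = 1;

		while (i + run < len && in[i + run] == b && run < 255)
			run++;
		if (o + 2 > cap)
			return 0;
		out[o++] = (unsigned char)run;
		out[o++] = b;
		i += run;
	}
	return o;
}

/*
 * Store the data compressed only if that makes it smaller; otherwise
 * copy it verbatim.  *type records which form was stored.  Returns the
 * stored length.  out must hold at least len bytes.
 */
static size_t store_object(const unsigned char *in, size_t len,
			   unsigned char *out, int *type)
{
	size_t clen;

	assert(len > 0);
	clen = rle_compress(in, len, out, len);
	if (clen > 0 && clen < len) {
		*type = OBJ_COMPRESSED;
		return clen;
	}
	memcpy(out, in, len);	/* overwrites any partial compressor output */
	*type = OBJ_RAW;
	return len;
}
```

Because the output buffer is capped at the input length, an object never grows by more than its header, whatever the data looks like.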

There is also demand for a user-controlled bit in the inode to disable
compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
time by trying and failing to compress them.

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 15:08:15 +0300, Pekka Enberg wrote:
> On 5/16/07, Jamie Lokier <[EMAIL PROTECTED]> wrote:
> >Given that the filesystem is still 'experimental', I'd concentrate on
> >getting it stable before worrying about immutable and xattrs unless
> >they are easy.
> 
> We will run into trouble if the on-disk format is not flexible enough
> to accommodate xattrs (think reiser3 here). So I'd worry about it
> before merging to mainline.

Adding xattrs would be fairly simple.  Inodes just need one extra
pointer for that.

Luckily inodes no longer need to be padded to 128 or 256 bytes.  They
are individually compressed, so their size is not limited to powers of
two.

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 16:29:22 +0400, Evgeniy Polyakov wrote:
> On Wed, May 16, 2007 at 01:50:03PM +0200, Jörn Engel ([EMAIL PROTECTED]) 
> wrote:
> > On Wed, 16 May 2007 12:34:34 +0100, Jamie Lokier wrote:
> > > 
> > > But if akpm can't pronounce it, how about FFFS for faster flash
> > > filesystem ;-)
> > 
> > How many of you have worked for IBM before?  Vowels are not evil. ;)
> 
> Do you think 'eieio' is a good set? IBM's work too...

I will let someone else comment on that one.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0110.1/1294.html

Jörn

-- 
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other is
to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare


Re: [PATCH] LogFS take three

2007-05-16 Thread Jörn Engel
On Wed, 16 May 2007 13:25:48 +0100, Jamie Lokier wrote:
> 
> Is LogFS really slower than JFFS2 in practice?

Not sure.  I ran a benchmark before adding compression support in QEMU
with a lightning-fast device.  So the results should differ quite a bit
from practice.

http://logfs.org/~joern/logfs/benchmark/benchmark_overview

LogFS was actually faster than JFFS2.  So for that particular
unrealistic benchmark, updating the LogFS tree was less expensive than
trying (and failing) to compress and calculating the CRC was for JFFS2.

With compression finished, I would expect LogFS numbers to degrade.  If
file data had checksums (not done yet, should be optional for users to
decide) even more so.

> I would have guessed reads to be a similar speed, tree updates to be a
> similar speed  to journal  updates for sustained  non-fsyncing writes,
> and the difference unimportant for tiny individual commits whose index
> updates are not merged with any other.  I've not thought about it much
> though.

LogFS isn't that good yet.  Right now, writing 10 adjacent blocks to a
file requires 10 tree updates instead of 1.  Not full updates though,
just up to the inode.

Quite surprisingly, read speed in the benchmark was significantly better
for LogFS, even after subtracting mount time.  I don't know if all of
that can be explained with CRC checks or there is more to it.

Jörn

-- 
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka Enberg

On 5/16/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:
> Forgot to add (see below). Seems logfs_segment_read would be simpler
> too if you fixed this.

Blah. Just to be clear: I forgot to add a "(see below)" text in the
original review comment.


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka Enberg

On 5/16/07, Jörn Engel <[EMAIL PROTECTED]> wrote:
> > > +/* FIXME: all this mess should get replaced by using the page cache */
> > > +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area *area,
> > > + void *read, u64 ofs, size_t readlen)
> > > +{
> >
> > Indeed. And I think you're getting some more trouble because of this...
>
> More trouble?

Forgot to add (see below). Seems logfs_segment_read would be simpler
too if you fixed this.

> [ Objects are the units that get compressed.  Segments can contain both
> compressed and uncompressed objects. ]
>
> It is a trade-off.  Each object has a 24 Byte header plus X Bytes of
> data.  Whether the data is compressed or not is indicated in the header.

Was my point really. Why do segments contain both compressed and
uncompressed objects?

