Re: rfs (new filesystem for ELKS)

1999-07-26 Thread Beau Kuiper

On Sun, 25 Jul 1999, Robert de Bath wrote:
> Ok, problem.
> 
> Defragmentation is _very_ difficult: if you move a file, you'll have to
> renumber the inode, which is nasty for files with multiple links.
> 
> If you use aligned allocation (easy, as you've got a free blocks bitmap),
> you should be able to avoid the need for defragmentation.

My filesystem breaks a disk into several zones (up to 20 at the moment). Here is
what I am going to do: whenever a file expands, the filesystem tries to allocate
the new block from the same zone as the file's inode. Whenever a new inode is
needed, it will be allocated in the first zone that has less than 70% block
usage. If all zones are more than 70% full, the emptiest zone is used. Also,
defragmentation will be reasonably difficult, but not impossible. The best way
to do it is to remember where inodes are moved to and, at the end, update all
the inode numbers.
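A minimal C sketch of the inode placement rule above, assuming a hypothetical in-memory array of per-zone usage counts (struct rfs_zone_usage and rfs_pick_inode_zone are illustrative names, not from the thread). It returns the first zone under 70% block usage and otherwise falls back to the zone with the lowest usage ratio.

```c
#include <stdint.h>

/* Threshold quoted in the thread: a zone is "busy" at 70% block usage. */
#define RFS_ZONE_FULL_PCT 70u

/* Hypothetical in-memory summary of one zone's block usage. */
struct rfs_zone_usage {
    uint32_t used_blocks;
    uint32_t total_blocks;
};

/*
 * Choose the zone for a new inode: the first zone with less than 70%
 * block usage, otherwise the zone with the lowest used/total ratio.
 * Returns -1 if no zone is usable.
 */
int rfs_pick_inode_zone(const struct rfs_zone_usage *zones, int nzones)
{
    int emptiest = -1;

    for (int z = 0; z < nzones; z++) {
        uint64_t used  = zones[z].used_blocks;
        uint64_t total = zones[z].total_blocks;

        if (total == 0)
            continue;                      /* skip degenerate zones */

        /* used / total < 70 / 100, compared without floating point */
        if (used * 100u < total * RFS_ZONE_FULL_PCT)
            return z;

        /* used/total < best_used/best_total  <=>  used*best_total < best_used*total */
        if (emptiest < 0 ||
            used * zones[emptiest].total_blocks <
                (uint64_t)zones[emptiest].used_blocks * total)
            emptiest = z;
    }
    return emptiest;
}
```

Under the same policy, a growing file would first search the free-block bitmap of its inode's own zone; the thread does not say where the block comes from when that zone is full.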

> Next a suggestion, I think it'd be a good idea to put a magic number at 
> the start of an inode, that way 'lost clusters' can be checked for file
> inodes easily and reattached to lost+found if sane.

Good idea, I will do that :)

> I notice that you've not mentioned how the end of the filesystem is
> marked, you do need this as the filesystem image may be copied to a new
> larger device.  In fact it'd be nice if you recorded only the highest
> block that has actually been written to, that way if the filesystem is
> copied to a larger device it will _automatically_ extend to fill the
> device. (really neat with /dev/md devices :-) ) Unfortunately, to shrink
> a filesystem you'd need to solve the defragmentation problem.

Oh, do I have to mark the end of the filesystem? What happens is that all zones
are of equal size (except the last one), and the superblock remembers how large
the filesystem really is. To make the filesystem larger, it is simply a case of
enlarging the last zone, adding more zones as needed, and then telling the
superblock how big the new fs is.
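The resize scheme works because every zone except the last has a fixed size. A sketch, assuming the 8-block (8K) header and the 65536-block zone maximum from the original proposal; rfs_geometry and rfs_compute_geometry are hypothetical names.

```c
#include <stdint.h>

#define RFS_HEADER_BLOCKS 8u      /* first 8K: boot sector, superblock, boot manager */
#define RFS_ZONE_BLOCKS   65536u  /* maximum number of 1K blocks per zone */

struct rfs_geometry {
    uint32_t nzones;            /* number of zones, including the last one */
    uint32_t last_zone_blocks;  /* size of the (possibly short) last zone */
};

/* Derive the zone layout from the total filesystem size kept in the superblock. */
struct rfs_geometry rfs_compute_geometry(uint32_t fs_blocks)
{
    struct rfs_geometry g = { 0, 0 };

    if (fs_blocks <= RFS_HEADER_BLOCKS)
        return g;                               /* no room for any zone */

    uint32_t zone_area = fs_blocks - RFS_HEADER_BLOCKS;
    /* round up without risking overflow near 2^32 */
    g.nzones = zone_area / RFS_ZONE_BLOCKS + (zone_area % RFS_ZONE_BLOCKS != 0);
    g.last_zone_blocks = zone_area - (g.nzones - 1) * RFS_ZONE_BLOCKS;
    return g;
}
```

Growing the filesystem then amounts to calling rfs_compute_geometry() with the new size: full zones keep their positions, the old last zone grows towards 65536 blocks, and new zones are appended after it. One detail to settle: a last zone of 8 blocks or fewer would consist entirely of its free-block bitmap.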

> I know 64Mb (actually I'd estimate about 60Mb after the inode info is
> allocated) is more than enough, but you should _specify_ triple indirect
> and possibly larger. Who knows you may want to copy a CDROM one day. :-)

Well, it is ELKS. However, I imagine that eventually I could write an algorithm
that expands to any depth of indirection (up to MAXLONGINT, i.e. 4 gig).

> Compression/encoding: specify hooks and data space for things like
> compression, forward error correction and encryption. (eg encoding
> method field and lseek translation table pointer)

Errk. Sounds like Version 2 stuff :)

> The final things you might want to consider are journaling and other
> "new features people will want for V2"

Journaling sounds like fun, and complicated :)
 
Thanks for the ideas Rob.

Beau Kuiper
[EMAIL PROTECTED]



Re: rfs (new filesystem for ELKS)

1999-07-25 Thread Robert de Bath

Ok, problem.

Defragmentation is _very_ difficult: if you move a file, you'll have to
renumber the inode, which is nasty for files with multiple links.

If you use aligned allocation (easy, as you've got a free blocks bitmap),
you should be able to avoid the need for defragmentation.

Next a suggestion, I think it'd be a good idea to put a magic number at 
the start of an inode, that way 'lost clusters' can be checked for file
inodes easily and reattached to lost+found if sane.
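One way to lay out this suggestion, assuming the magic occupies the first four bytes of each 1024-byte inode block in little-endian order (the 8086 byte order). The value 0x49534652, which reads "RFSI" on disk, and both function names are made up for illustration.

```c
#include <stdint.h>

#define RFS_BLOCK_SIZE  1024u
/* Hypothetical magic value ("RFSI" on disk); the thread does not pick one. */
#define RFS_INODE_MAGIC 0x49534652u

/* Write the magic into the first 4 bytes of an inode block, little-endian. */
void rfs_stamp_inode(uint8_t block[RFS_BLOCK_SIZE])
{
    for (unsigned i = 0; i < 4; i++)
        block[i] = (uint8_t)(RFS_INODE_MAGIC >> (8 * i));
}

/* A fsck-style scan can use this to spot blocks that look like inodes. */
int rfs_looks_like_inode(const uint8_t block[RFS_BLOCK_SIZE])
{
    uint32_t magic = 0;
    for (unsigned i = 0; i < 4; i++)
        magic |= (uint32_t)block[i] << (8 * i);
    return magic == RFS_INODE_MAGIC;
}
```

A matching magic is only a hint, since a data block can contain the same four bytes by chance, so a checker would still apply further sanity tests before reattaching anything to lost+found.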

I notice that you've not mentioned how the end of the filesystem is
marked, you do need this as the filesystem image may be copied to a new
larger device.  In fact it'd be nice if you recorded only the highest
block that has actually been written to, that way if the filesystem is
copied to a larger device it will _automatically_ extend to fill the
device. (really neat with /dev/md devices :-) ) Unfortunately, to shrink
a filesystem you'd need to solve the defragmentation problem.
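The high-water-mark idea, sketched with a hypothetical superblock field highest_written: the filesystem records the highest block ever written and, at mount time, takes its size from the device rather than from the superblock. Struct and function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical superblock field illustrating the suggestion. */
struct rfs_hwm {
    uint32_t highest_written;  /* highest block number ever written */
};

/* Called on every block write. */
void rfs_note_write(struct rfs_hwm *sb, uint32_t block)
{
    if (block > sb->highest_written)
        sb->highest_written = block;
}

/*
 * At mount time the usable size comes from the device. A device that is
 * too small to hold the blocks already written would need a real shrink
 * (i.e. defragmentation), so refuse it.
 */
int rfs_usable_blocks(const struct rfs_hwm *sb, uint32_t device_blocks,
                      uint32_t *out)
{
    if (device_blocks <= sb->highest_written)
        return -1;
    *out = device_blocks;
    return 0;
}
```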

I know 64Mb (actually I'd estimate about 60Mb after the inode info is
allocated) is more than enough, but you should _specify_ triple indirect
and possibly larger. Who knows you may want to copy a CDROM one day. :-)
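The size limits in the proposal follow from a 1000-byte pointer area in each inode and 1024-byte blocks. The 4-byte block pointer size below is inferred from the 256000-byte figure (1000 / 4 = 250 pointers), not stated in the thread, and rfs_max_file_size is a hypothetical name. Each further level of indirection multiplies the limit by 256.

```c
#include <stdint.h>

#define RFS_BLOCK_SIZE  1024u
#define RFS_INODE_DATA  1000u  /* bytes after the inode header */
#define RFS_PTR_SIZE    4u     /* inferred from the 256000-byte limit */

/*
 * levels = 0: data stored inside the inode itself;
 * levels = 1: the inode holds pointers to data blocks;
 * levels = 2: the inode holds pointers to blocks of pointers; and so on.
 */
uint64_t rfs_max_file_size(unsigned levels)
{
    if (levels == 0)
        return RFS_INODE_DATA;

    uint64_t blocks = RFS_INODE_DATA / RFS_PTR_SIZE;   /* 250 pointers in the inode */
    for (unsigned i = 1; i < levels; i++)
        blocks *= RFS_BLOCK_SIZE / RFS_PTR_SIZE;        /* 256 pointers per block */
    return blocks * RFS_BLOCK_SIZE;
}
```

Level 2 gives the proposal's 65536000-byte limit (62.5 MiB). Level 3 gives 16777216000 bytes, which already exceeds the roughly 4 GiB range of a 32-bit file size, so going deeper also needs a wider size field.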

Compression/encoding: specify hooks and data space for things like
compression, forward error correction and encryption. (eg encoding
method field and lseek translation table pointer)

The final things you might want to consider are journaling and other
"new features people will want for V2"

-- 
Rob. (Robert de Bath)
 

On Fri, 16 Jul 1999, Beau Kuiper wrote:

> Hi again,
> 
> I am working with the elks source and I am creating a new filesystem and would
> love some feedback on its design. It has the following properties:
> 
> 1) All blocks are 1024 bytes in size.
> 
> 2) the first 8K is managed as follows
> 
> 1st 512 byte sector is solely used for a boot sector to load a 7K boot manager
> of choice. This is good because the sys program doesn't have to worry about
> trashing the superblock as it installs the boot manager of choice :)
> 
> 2nd 512 byte sector is the superblock. The data in the superblock can be
> recreated if needed, so copies of it aren't required.
> 
> the next 7 blocks (7K) are the boot manager.
> 
> 3) The rest of the space is divided up into zones, each containing a maximum of
> 65536 blocks.
> 
> 4) The first 8K of each zone is a bitmap of free blocks in the zone, 1 for
> used, 0 for free.
> 
> 5) The rest of the zone is used for storing inodes/files.
> 
> 6) Inodes are stored in exactly the same areas of file data and are 1024 bytes
> in size.
> 
> 7) For files < 1000 bytes in size (often dirs, symbolic links, many small
> files), the data is directly stored after the inode (great performance boost
> IMHO)
> 
> 8) For files > 1000 bytes, that space in the inode is used to point to blocks
> the data is really stored in. This allows files up to 256000 bytes to be
> stored, and offers decent performance.
> 
> 9) For files > 256000 bytes, that space is used to store pointers to blocks
> containing further pointers to blocks that store the data (indirect). This
> allows files up to the size of 65536000 bytes (64M) which should be more than
> enough :).
> 
> 10) Inode numbers are really a combination of the zone and block. (32 bit
> number. High 16 bit is zone, low 16 bit is block). This allows file references
> to be picked up quickly.
> 
> 11) Directories work like they do under ext2. This allows long file names :)
> 
> 12) The first inode will be a bad block inode that stores bad blocks. The
> second will be the root inode.
> 
> That is about it for now. Please give me comments about the robustness,
> performance, memory usage, etc.
> 
> Beau Kuiper
> [EMAIL PROTECTED]
> 
> 
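The fixed part of the layout in points 2 to 4 reduces to block arithmetic. A sketch, assuming 1024-byte blocks numbered from the start of the partition, with zone 0 starting right after the 8K header; the macro and function names are hypothetical. In a later message Beau proposes a second 8K map at the end of each zone, which this sketch leaves out.

```c
#include <stdint.h>

#define RFS_SECTOR_SIZE     512u
#define RFS_BLOCK_SIZE      1024u
#define RFS_BOOT_OFFSET     0u                /* 1st sector: boot sector */
#define RFS_SUPER_OFFSET    RFS_SECTOR_SIZE   /* 2nd sector: superblock */
#define RFS_BOOTMGR_OFFSET  RFS_BLOCK_SIZE    /* next 7 blocks: boot manager */
#define RFS_HEADER_BLOCKS   8u                /* the whole 8K header */
#define RFS_ZONE_BLOCKS     65536u
/* One bit per block: 65536 / 8 = 8192 bytes = 8 blocks of bitmap per zone. */
#define RFS_BITMAP_BLOCKS   (RFS_ZONE_BLOCKS / 8u / RFS_BLOCK_SIZE)

/* Absolute block number of block `blk` within zone `zone`. */
uint32_t rfs_abs_block(uint32_t zone, uint32_t blk)
{
    return RFS_HEADER_BLOCKS + zone * RFS_ZONE_BLOCKS + blk;
}

/* First block of a zone that can hold inodes or file data. */
uint32_t rfs_first_data_block(uint32_t zone)
{
    return rfs_abs_block(zone, RFS_BITMAP_BLOCKS);
}
```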



RE: rfs (new filesystem for ELKS)

1999-07-19 Thread Beau Kuiper

On Tue, 20 Jul 1999, Greg Haerr wrote:
> On Monday, July 19, 1999 11:08 AM, David Murn [SMTP:[EMAIL PROTECTED]] wrote:
> : On Mon, 19 Jul 1999, Greg Haerr wrote:
> : 
> : > > Why has elks chosen a 16 bit inode number for stat when the rest of 
> : > > the world has 32 bit inode numbers? It probably is a good idea to use
> : > > 32bit inode numbers. 
> : 
> : I'd say the simple answer is that because ELKS is targeted at 16bit (or
> : less, ie. not 32bit) machines, that 32bit inode numbers aren't a good
> : idea.  Same reason that under Linux/x86, we use 32bit instead of 64bit,
> : simply because we've got a 32bit CPU (or 16bit in the case of ELKS).
> : 
> 
>   Actually, the inode width should be dependent on the size
> of disks attached, not the processor specs...

Yes, this is true: the inode numbers for RFS don't actually exceed 16 bits for
partitions less than 64 meg in size. However, I cannot see the problem with using
32-bit inode numbers: very little math is ever done on them, and they use a
whole 2 bytes more in memory. (I know, it is a bad attitude :), but 32-bit
inode numbers are used throughout the kernel already.)
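Encoding and decoding the zone/block inode numbers from the original proposal (high 16 bits zone, low 16 bits block). The names are hypothetical. It also shows why small partitions stay within 16 bits: every inode in zone 0 has a zero high half.

```c
#include <stdint.h>

typedef uint32_t rfs_ino_t;

/* Pack a zone number and a block-within-zone into one inode number. */
static inline rfs_ino_t rfs_make_ino(uint16_t zone, uint16_t block)
{
    return ((rfs_ino_t)zone << 16) | block;
}

/* High 16 bits: zone. */
static inline uint16_t rfs_ino_zone(rfs_ino_t ino)
{
    return (uint16_t)(ino >> 16);
}

/* Low 16 bits: block within the zone. */
static inline uint16_t rfs_ino_block(rfs_ino_t ino)
{
    return (uint16_t)(ino & 0xFFFFu);
}
```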

Beau Kuiper
[EMAIL PROTECTED]



RE: rfs (new filesystem for ELKS)

1999-07-19 Thread Greg Haerr

On Monday, July 19, 1999 11:08 AM, David Murn [SMTP:[EMAIL PROTECTED]] wrote:
: On Mon, 19 Jul 1999, Greg Haerr wrote:
: 
: > > Why has elks chosen a 16 bit inode number for stat when the rest of 
: > > the world has 32 bit inode numbers? It probably is a good idea to use
: > > 32bit inode numbers. 
: 
: I'd say the simple answer is that because ELKS is targeted at 16bit (or
: less, ie. not 32bit) machines, that 32bit inode numbers aren't a good
: idea.  Same reason that under Linux/x86, we use 32bit instead of 64bit,
: simply because we've got a 32bit CPU (or 16bit in the case of ELKS).
: 

Actually, the inode width should be dependent on the size
of disks attached, not the processor specs...



RE: rfs (new filesystem for ELKS)

1999-07-19 Thread David Murn

On Mon, 19 Jul 1999, Greg Haerr wrote:

> > Why has elks chosen a 16 bit inode number for stat when the rest of 
> > the world has 32 bit inode numbers? It probably is a good idea to use
> > 32bit inode numbers. 

I'd say the simple answer is that because ELKS is targeted at 16bit (or
less, ie. not 32bit) machines, that 32bit inode numbers aren't a good
idea.  Same reason that under Linux/x86, we use 32bit instead of 64bit,
simply because we've got a 32bit CPU (or 16bit in the case of ELKS).

Davey



RE: rfs (new filesystem for ELKS)

1999-07-19 Thread Greg Haerr

On Sunday, July 18, 1999 6:01 AM, Alan Cox [SMTP:[EMAIL PROTECTED]] wrote:
: > Why has elks chosen a 16 bit inode number for stat when the rest of the world
: > has 32 bit inode numbers? It probably is a good idea to use 32bit inode numbers.
: 
: Small computers, small problems. 
: 
Alan's response is an accurate overall encapsulation of the problem.
A bit of history: ELKS is based on Linux, ext2 is based on the Linux minix fs,
and that is based on Minix, whose executable file format was copied and which
used 16-bit inode numbers... (remember stat?)




Re: rfs (new filesystem for ELKS)

1999-07-18 Thread Beau Kuiper

On Sun, 18 Jul 1999, Alan Cox wrote:
> > Why has elks chosen a 16 bit inode number for stat when the rest of the world
> > has 32 bit inode numbers? It probably is a good idea to use 32bit inode numbers.
> 
> Small computers, small problems.

Actually, inside the ELKS kernel, inode numbers are 32 bits in size; it is
only in stat.h and the minix fs that they are 16 bits. That is all I really
need, since userland programs don't access files using inode numbers :)

Beau Kuiper
[EMAIL PROTECTED]



Re: rfs (new filesystem for ELKS)

1999-07-18 Thread Alan Cox

> Why has elks chosen a 16 bit inode number for stat when the rest of the world
> has 32 bit inode numbers? It probably is a good idea to use 32bit inode numbers.

Small computers, small problems. 



RE: rfs (new filesystem for ELKS)

1999-07-16 Thread Beau Kuiper

On Sat, 17 Jul 1999, Greg Haerr wrote:
> : Actually, I have only studied the extent file systems, which are good for some
> : things (like performance on really big files and smaller files). For small
> : systems, these are not really suitable because they require too much caching,
> : esp for really small files. Under an OS like elks, this wastes a lot of space. 
> : I probably should read about NTFS one day, but it sounds really complex (they
> : use B+ trees to store directory names). Such complexity is hard to do under
> : ELKS.
> 
>   The NTFS filesystem is documented in a small thin book available
> at many bookstores in the US.  I don't know about down under.
>
I may read up on it sometime :)
> : 
> : I chose the 1024 byte block because:
> : 
> : 1) that is the size elks reads AFAIK when you read a buffer.
> : 2) you have a reasonable area to store small files. For small files, this
> : removes an expensive seek on the hard drive. Most directories are small files,
> : and will save much time over separated inode/data fs's.
> : 3) If I chose a smaller block size, there will be more overhead with the free
> : block bitmap, and i would probably need double indirect to store really large
> : files.
> 
>   I think the 1k block size is fine.
> 
> : Oh, BTW, the inode numbers I chose for root/bad blocks are arbitrary and
> : unimportant to the outside world. (I think (I am still learning))
> : 
>   The inode number size *is* important, or the sys/stat.h structure
> has to change, causing all sorts of portability problems.  Others may want
> to comment on this.

Why has elks chosen a 16 bit inode number for stat when the rest of the world
has 32 bit inode numbers? It probably is a good idea to use 32bit inode numbers.

Beau Kuiper
[EMAIL PROTECTED]



RE: rfs (new filesystem for ELKS)

1999-07-16 Thread Greg Haerr

: Actually, I have only studied the extent file systems, which are good for some
: things (like performance on really big files and smaller files). For small
: systems, these are not really suitable because they require too much caching,
: esp for really small files. Under an OS like elks, this wastes a lot of space. 
: I probably should read about NTFS one day, but it sounds really complex (they
: use B+ trees to store directory names). Such complexity is hard to do under
: ELKS.

The NTFS filesystem is documented in a small thin book available
at many bookstores in the US.  I don't know about down under.




: 
: I chose the 1024 byte block because:
: 
: 1) that is the size elks reads AFAIK when you read a buffer.
: 2) you have a reasonable area to store small files. For small files, this
: removes an expensive seek on the hard drive. Most directories are small files,
: and will save much time over separated inode/data fs's.
: 3) If I chose a smaller block size, there will be more overhead with the free
: block bitmap, and i would probably need double indirect to store really large
: files.

I think the 1k block size is fine.

: Oh, BTW, the inode numbers I chose for root/bad blocks are arbitrary and
: unimportant to the outside world. (I think (I am still learning))
: 
The inode number size *is* important, or the sys/stat.h structure
has to change, causing all sorts of portability problems.  Others may want
to comment on this.
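The portability point can be made concrete with the proposed zone/block numbering. Programs such as du and tar treat two paths with the same (st_dev, st_ino) pair as hard links to one file, so squeezing a 32-bit RFS inode number into a 16-bit st_ino lets inodes in different zones collide. The helper names are hypothetical.

```c
#include <stdint.h>

/* Simulates storing a 32-bit RFS inode number in a 16-bit st_ino field. */
uint16_t rfs_st_ino16(uint32_t ino)
{
    return (uint16_t)(ino & 0xFFFFu);   /* the zone half is silently dropped */
}

/* What a hard-link check sees on one device when st_ino is only 16 bits. */
int rfs_same_file_16(uint32_t a, uint32_t b)
{
    return rfs_st_ino16(a) == rfs_st_ino16(b);
}
```

Zone 1, block 42 and zone 0, block 42 would look like the same file to such a program, which is why the width of st_ino matters even though nothing opens files by inode number.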

Greg



RE: rfs (new filesystem for ELKS)

1999-07-16 Thread Beau Kuiper

On Sat, 17 Jul 1999, Greg Haerr wrote:

Actually, I have only studied the extent file systems, which are good for some
things (like performance on really big files and smaller files). For small
systems, these are not really suitable because they require too much caching,
esp for really small files. Under an OS like elks, this wastes a lot of space. 
I probably should read about NTFS one day, but it sounds really complex (they
use B+ trees to store directory names). Such complexity is hard to do under
ELKS.

I chose the 1024 byte block because:

1) that is the size elks reads AFAIK when you read a buffer.
2) you have a reasonable area to store small files. For small files, this
removes an expensive seek on the hard drive. Most directories are small files,
and will save much time over separated inode/data fs's.
3) If I chose a smaller block size, there will be more overhead with the free
block bitmap, and i would probably need double indirect to store really large
files.

Later on, I will probably add an option to change the block size, but ATM I am
still learning how to write the FS in elks :)
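Point 3 above is simple arithmetic: the free-block bitmap costs one bit per block, and an indirect block holds block_size / 4 pointers, assuming 4-byte block numbers as the size limits in the proposal imply. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes of free-block bitmap (one bit per block) for a disk of disk_bytes. */
uint32_t rfs_bitmap_bytes(uint32_t disk_bytes, uint32_t block_size)
{
    uint32_t blocks = disk_bytes / block_size;
    return blocks / 8u + (blocks % 8u != 0);   /* round up to whole bytes */
}

/* Block pointers that fit in one block, assuming 4-byte block numbers. */
uint32_t rfs_ptrs_per_block(uint32_t block_size)
{
    return block_size / 4u;
}
```

For a 64MB disk, 1K blocks need an 8K bitmap and give 256 pointers per indirect block; 512-byte blocks double the bitmap to 16K and halve the pointers to 128, so the same maximum file size needs more levels of indirection.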

Oh, BTW, the inode numbers I chose for root/bad blocks are arbitrary and
unimportant to the outside world. (I think (I am still learning))

> : 3) The rest of the space is divided up into zones, each containing a maximum of
> : 65536 blocks.
> : 
> : 4) The first 8K of each zone is a bitmap of free blocks in the zone, 1 for
> : used, 0 for free.
> : 
> : 5) The rest of the zone is used for storing inodes/files.
> : 
> 
>   This is basically the BSD filesystem idea, right?
> 
> : 6) Inodes are stored in exactly the same areas of file data and are 1024 bytes
> : in size.
> : 
> : 7) For files < 1000 bytes in size (often dirs, symbolic links, many small
> : files), the data is directly stored after the inode (great performace boost
> : IMHO)
> 
>   1K inodes are great for big operating systems, but may not
> be a good idea for small ones.  Nonetheless, this is interesting, and is the
> same thing that NTFS uses.  I presume you've read all the lit on NTFS, 
> because it has some really neat ideas for a new fs design, which is what
> you're attempting.
> 
> 
> : 10) Inode numbers are really a combination of the zone and block. (32 bit
> : number. High 16 bit is zone, low 16 bit is block). This allows file references
> : to be picked up quickly.
> : 
> : 11) Directories work like they do under ext2. This allows long file names :)
> : 
> : 12) The first inode will be a bad block inode that stores bad blocks. The
> : second will be the root inode.
> 
>   Why not swap the two and have the first inode be the root inode,
> just like now?
> 
>   It might be nice to have the 1k inode size configurable.  It also
> would be nice to write a utility that would show what the average file size
> is on a minix or ext2 filesystem.
> 
> Greg



RE: rfs (new filesystem for ELKS)

1999-07-16 Thread Greg Haerr

: 3) The rest of the space is divided up into zones, each containing a maximum of
: 65536 blocks.
: 
: 4) The first 8K of each zone is a bitmap of free blocks in the zone, 1 for
: used, 0 for free.
: 
: 5) The rest of the zone is used for storing inodes/files.
: 

This is basically the BSD filesystem idea, right?
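The per-zone bitmap from point 4 (1 for used, 0 for free) needs only a few bit operations. A sketch, assuming least-significant-bit-first order within each byte, which the proposal does not specify; the function names are hypothetical.

```c
#include <stdint.h>

#define RFS_ZONE_BLOCKS  65536u
#define RFS_BITMAP_BYTES (RFS_ZONE_BLOCKS / 8u)   /* 8192 bytes: the 8K bitmap */

/* Nonzero if block blk of the zone is marked used (bit = 1). */
int rfs_block_used(const uint8_t *bm, uint32_t blk)
{
    return (bm[blk >> 3] >> (blk & 7u)) & 1u;
}

void rfs_mark_used(uint8_t *bm, uint32_t blk)
{
    bm[blk >> 3] |= (uint8_t)(1u << (blk & 7u));
}

void rfs_mark_free(uint8_t *bm, uint32_t blk)
{
    bm[blk >> 3] &= (uint8_t)~(1u << (blk & 7u));
}

/* First free block at or after `start`, or -1 if none is left. */
int32_t rfs_find_free(const uint8_t *bm, uint32_t nblocks, uint32_t start)
{
    for (uint32_t b = start; b < nblocks; b++) {
        if ((b & 7u) == 0 && bm[b >> 3] == 0xFF) {
            b += 7;                 /* whole byte used: skip 8 blocks at once */
            continue;
        }
        if (!rfs_block_used(bm, b))
            return (int32_t)b;
    }
    return -1;
}
```

Since the bitmap lives in the first 8 blocks of its own zone, mkfs would mark those blocks as used so the allocator never hands them out.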

: 6) Inodes are stored in exactly the same areas of file data and are 1024 bytes
: in size.
: 
: 7) For files < 1000 bytes in size (often dirs, symbolic links, many small
: files), the data is directly stored after the inode (great performace boost
: IMHO)

1K inodes are great for big operating systems, but may not
be a good idea for small ones.  Nonetheless, this is interesting, and is the
same thing that NTFS uses.  I presume you've read all the lit on NTFS, 
because it has some really neat ideas for a new fs design, which is what
you're attempting.


: 10) Inode numbers are really a combination of the zone and block. (32 bit
: number. High 16 bit is zone, low 16 bit is block). This allows file references
: to be picked up quickly.
: 
: 11) Directories work like they do under ext2. This allows long file names :)
: 
: 12) The first inode will be a bad block inode that stores bad blocks. The
: second will be the root inode.

Why not swap the two and have the first inode be the root inode,
just like now?

It might be nice to have the 1k inode size configurable.  It also
would be nice to write a utility that would show what the average file size
is on a minix or ext2 filesystem.
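A rough version of the survey utility suggested here, built on the POSIX nftw() call. It counts regular files under a directory, their total size, and how many are under 1000 bytes (the proposal's cutoff for storing data inside the inode). The names are hypothetical and this is only a sketch.

```c
#define _XOPEN_SOURCE 700   /* expose nftw() and the FTW_* flags */
#include <ftw.h>
#include <stdint.h>
#include <sys/stat.h>

/* Running totals filled in by the nftw() callback. */
static uint64_t survey_files, survey_bytes, survey_small;

static int survey_visit(const char *path, const struct stat *sb,
                        int type, struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    if (type == FTW_F && S_ISREG(sb->st_mode)) {
        survey_files++;
        survey_bytes += (uint64_t)sb->st_size;
        if (sb->st_size < 1000)          /* would fit inside a 1K RFS inode */
            survey_small++;
    }
    return 0;                            /* keep walking */
}

/* Walk one filesystem below root; returns 0 on success, -1 on error. */
int rfs_survey(const char *root, uint64_t *files, uint64_t *bytes,
               uint64_t *small)
{
    survey_files = survey_bytes = survey_small = 0;
    /* FTW_PHYS: don't follow symlinks; FTW_MOUNT: stay on one filesystem */
    if (nftw(root, survey_visit, 16, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    *files = survey_files;
    *bytes = survey_bytes;
    *small = survey_small;
    return 0;
}
```

Called on the mount point of a minix or ext2 filesystem, bytes / files gives the average file size, and small / files the share of files that the inline-data scheme would cover.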

Greg



Re: rfs (new filesystem for ELKS)

1999-07-16 Thread Beau Kuiper

On Fri, 16 Jul 1999, you wrote:
> On Fri, 16 Jul 1999, Beau Kuiper wrote:
> 
> > 12) The first inode will be a bad block inode that stores bad blocks. The
> > second will be the root inode.
> 
> But what if the first or second inode gets trashed?
> Is there any way to rescue any data then?

The bad block inode can be recreated by doing a bad block scan.
But for the root inode, thinking about it now, I will have to change the layout
slightly.

I will reserve another 8K at the end of each zone as a map of whether each
block holds file data or an inode. That way, if a dir is trashed (along with
the free page table), all lost inodes can still be found. Thanks for pointing
that out :). I could allow it to be disabled for non-vital filesystems that need
more performance.
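A sketch of the recovery pass this map would enable, assuming a set bit marks an inode block (the message does not fix the polarity) and the same bit layout as the free-block bitmap. Because it reads only the inode map, it still finds inodes when directories and the free-block bitmap are damaged. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical polarity: bit set => the block holds an inode. */
static int rfs_map_bit(const uint8_t *map, uint32_t b)
{
    return (map[b >> 3] >> (b & 7u)) & 1u;
}

/*
 * Collect the block numbers of all inodes in one zone from the inode map
 * alone. Writes at most max_out entries to out and returns how many it found.
 */
size_t rfs_scan_inode_map(const uint8_t *inode_map, uint32_t nblocks,
                          uint32_t *out, size_t max_out)
{
    size_t n = 0;
    for (uint32_t b = 0; b < nblocks && n < max_out; b++)
        if (rfs_map_bit(inode_map, b))
            out[n++] = b;
    return n;
}
```

Each candidate could additionally be checked against an inode magic number, as Rob suggests elsewhere in the thread, before being reattached to lost+found.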

Beau Kuiper
[EMAIL PROTECTED]