Re: Article on LWN about recent discussions on reiser4 and inclusion

2006-08-04 Thread Hans Reiser
Jorgen Hermanrud Fjeld wrote:

The recent discussions regarding reiser4 and possible inclusion have
also caught the eye(s) of LWN.
I have made the article available for you, non-lwn-subscribers, so that you may
have a look at it here 
http://lwn.net/SubscriberLink/193663/9d2ac03195c775bc/.

  

Jorgen, are you with lwn?  Thanks Jorgen.

It was a remarkably positive article, and the posters were also quite
positive.

Hans


Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-04 Thread Toby Thain


On 4-Aug-06, at 3:25 AM, Russell Leighton wrote:



If the software (filesystem like ZFS or database like Berkeley DB)   
finds a mismatch for a checksum on a block read, then what?


Is there a recovery mechanism, or are you just happy knowing
there is a problem (and go to backup)?


ZFS will correct from a good mirror (http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data).

--T



Thx

Matthias Andree wrote:


Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).
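A minimal Python sketch of the idea Matthias describes: each page carries a digest of its own contents, so a write that was cut off partway leaves a page whose digest no longer matches. The page layout below (a 20-byte SHA-1 digest at the front of a 4 KiB page) and the helpers `seal_page`, `verify_page` and `torn_write` are hypothetical illustrations, not Berkeley DB's actual on-disk format or API.

```python
import hashlib

PAGE_SIZE = 4096                  # hypothetical page size
DIGEST_LEN = 20                   # SHA-1 digest length in bytes
PAYLOAD_LEN = PAGE_SIZE - DIGEST_LEN


def seal_page(payload: bytes) -> bytes:
    """Build a page: SHA-1 of the zero-padded payload, then the payload."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload too large for one page")
    body = payload.ljust(PAYLOAD_LEN, b"\0")
    return hashlib.sha1(body).digest() + body


def verify_page(page: bytes) -> bool:
    """Return True if the stored digest matches the page body."""
    stored, body = page[:DIGEST_LEN], page[DIGEST_LEN:]
    return len(page) == PAGE_SIZE and hashlib.sha1(body).digest() == stored


def torn_write(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only `sectors_written` 512-byte sectors landed."""
    cut = sectors_written * 512
    return new_page[:cut] + old_page[cut:]


if __name__ == "__main__":
    old = seal_page(b"balance=100")
    new = seal_page(b"balance=250" + b"x" * 3000)
    print(verify_page(old), verify_page(new))    # both intact: True True
    print(verify_page(torn_write(old, new, 4)))  # half-written page: False
```

The checksum only detects the torn page; recovering the intended contents still needs another copy, such as a log or a mirror.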







Re: Article on LWN about recent discussions on reiser4 and inclusion

2006-08-04 Thread Jorgen Hermanrud Fjeld
Hi,

On 2006-08-03 23:44:55, Hans Reiser wrote:
 Jorgen Hermanrud Fjeld wrote:
 
 The recent discussions regarding reiser4 and possible inclusion have
 also caught the eye(s) of LWN.
 I have made the article available for you, non-lwn-subscribers, so that you may
 have a look at it here 
 http://lwn.net/SubscriberLink/193663/9d2ac03195c775bc/.
 
 Jorgen, are you with lwn?  Thanks Jorgen.
 
You are welcome. I'm just a subscriber of LWN, which lets me share
direct links to articles before they become publicly available the
following week.
I just thought you should be aware of the press coverage and have the
chance to make your own remarks, if need be.

 It was a remarkably positive article, and the posters were also quite
 positive.
 
Yes, the article was nice, and if I reveal that I used the name Armagh as a
game alias when I was younger, you can also tell which of the posts is mine.

Now that I have your personal attention, I would like to thank you for
your good work. I think your ideas on the future of file systems, as I have
read about them on namesys and on the mailing list, are profoundly important.

I have been using reiser3 for a long time, and would just like to express
my support and gratitude.

-- 
Sincerely | Homepage:
Jørgen| http://www.hex.no/jhf
  | Public GPG key:
  | http://www.hex.no/jhf/key.txt

The solution of problems is the most characteristic and peculiar sort
of voluntary thinking.
-- William James


Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-04 Thread Russell Leighton


That was exactly the summary I was looking for.

I would encourage folks to read the referenced link Toby sent:

   http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data

...also the linked RAID-Z summary from this article was very 
interesting, since something like this is needed for recovery from 
checksum failures:


Which brings us to the coolest thing about RAID-Z: self-healing data. 
In addition to handling whole-disk failure, RAID-Z can also detect and 
correct silent data corruption. Whenever you read a RAID-Z block, ZFS 
compares it against its checksum. If the data disks didn't return the 
right answer, ZFS reads the parity and then does combinatorial 
reconstruction to figure out which disk returned bad data. It then 
repairs the damaged disk and returns good data to the application. ZFS 
also reports the incident through Solaris FMA so that the system 
administrator knows that one of the disks is silently failing.


Finally, note that *RAID-Z doesn't require any special hardware.* It 
doesn't need NVRAM for correctness, and it doesn't need write 
buffering for good performance. With RAID-Z, ZFS makes good on the 
original RAID promise: it provides fast, reliable storage using cheap, 
commodity disks.




   http://blogs.sun.com/roller/page/bonwick?entry=raid_z
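The "combinatorial reconstruction" step in the quoted RAID-Z summary can be sketched in Python under simplifying assumptions: a single XOR parity column (the single-parity case only), equal-sized data chunks, and a SHA-256 block checksum stored apart from the block, much as ZFS keeps checksums in the parent block pointer. The function names are hypothetical, and this is not ZFS's implementation, which also handles variable stripe widths and uses its own checksum algorithms.

```python
import hashlib
from functools import reduce


def xor_bytes(chunks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def checksum(chunks) -> bytes:
    """Checksum of the logical block (data chunks concatenated)."""
    return hashlib.sha256(b"".join(chunks)).digest()


def self_healing_read(data, parity, expected):
    """Return (good_chunks, index_of_bad_disk_or_None).

    data:     chunks as returned by the data disks
    parity:   XOR parity chunk as returned by the parity disk
    expected: checksum stored separately from the block itself
    """
    if checksum(data) == expected:
        return list(data), None              # fast path: data disks were right
    # Assume each disk in turn returned bad data, rebuild that chunk from
    # parity plus the other chunks, and keep the guess that checksums.
    for bad in range(len(data)):
        others = [c for i, c in enumerate(data) if i != bad]
        candidate = list(data)
        candidate[bad] = xor_bytes(others + [parity])
        if checksum(candidate) == expected:
            return candidate, bad            # caller should rewrite disk `bad`
    raise OSError("unrecoverable: more damage than single parity can fix")


if __name__ == "__main__":
    disks = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_bytes(disks)
    expected = checksum(disks)
    corrupted = [b"AAAA", b"BxBB", b"CCCC"]  # disk 1 silently lies
    good, bad = self_healing_read(corrupted, parity, expected)
    print(bad, good == disks)                # 1 True
```

A corrupted parity chunk goes unnoticed on this read path as long as the data checksums correctly; it would only surface during a later reconstruction or a scrub.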




Toby Thain wrote:



On 4-Aug-06, at 3:25 AM, Russell Leighton wrote:



If the software (filesystem like ZFS or database like Berkeley DB)   
finds a mismatch for a checksum on a block read, then what?


Is there a recovery mechanism, or are you just happy knowing
there is a problem (and go to backup)?



ZFS will correct from a good mirror 
(http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data).

--T



Thx

Matthias Andree wrote:


Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).









Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-04 Thread Edward Shishkin

Hans Reiser wrote:

Edward Shishkin wrote:



Matthias Andree wrote:



On Tue, 01 Aug 2006, Hans Reiser wrote:




You will want to try our compression plugin, it has an ecc for every
64k




What kind of forward error correction would that be,




Actually we use checksums, not ECC. If the checksum is wrong, run
fsck; it will remove the whole disk cluster, which represents 64K of
data.



How about we switch to ecc, which would help with bit rot, not sector loss?


Interesting aspect.

Yes, we can implement ECC as a special crypto transform that inflates
data. As I mentioned earlier, it is possible via translation of key
offsets with a scale factor > 1.

Of course, it is better than nothing, but anyway metadata remains
ECC-unprotected, and hence robustness is not increased.

Edward.
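Edward's idea of ECC as a transform that inflates data, with key offsets translated by a scale factor > 1, can be illustrated with the simplest error-correcting code there is: a triple-repetition code decoded by bitwise majority vote (scale factor 3). This toy code is swapped in for brevity and is not what a reiser4 plugin would necessarily use; a practical one would presumably pick a code with far less overhead. `logical_to_physical` is a hypothetical name for the offset translation.

```python
SCALE = 3  # each logical byte occupies SCALE physical bytes


def ecc_encode(data: bytes) -> bytes:
    """Inflate data by storing every byte three times, side by side."""
    return bytes(b for b in data for _ in range(SCALE))


def ecc_decode(stored: bytes) -> bytes:
    """Recover each byte by a bitwise majority vote over its three copies.

    Damage is corrected as long as, in every bit position, at most one
    of the three copies is wrong.
    """
    if len(stored) % SCALE:
        raise ValueError("length is not a multiple of the scale factor")
    out = bytearray()
    for i in range(0, len(stored), SCALE):
        a, b, c = stored[i], stored[i + 1], stored[i + 2]
        out.append((a & b) | (a & c) | (b & c))
    return bytes(out)


def logical_to_physical(offset: int) -> int:
    """Hypothetical key-offset translation: logical offset -> stored offset."""
    return offset * SCALE


if __name__ == "__main__":
    stored = bytearray(ecc_encode(b"reiser4"))
    stored[logical_to_physical(2) + 1] ^= 0xFF   # rot one copy of "i"
    print(ecc_decode(bytes(stored)))             # b'reiser4'
```

Unlike a checksum, this repairs bit rot in file data, but as Edward points out, a transform applied to file bodies leaves the tree's own nodes unprotected.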





and how much and



what failure patterns can it correct? URL suffices.



The checksum is checked before unsafe decompression (trying to
decompress incorrect data can have fatal consequences). It can fail
for many reasons. The main one is tree corruption
(for example, when a disk cluster becomes incomplete; ECC cannot
help here). Perhaps such checksumming is also useful for other
things; I didn't classify the patterns.

Edward.
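The scheme Edward describes (checksum each compressed 64K disk cluster and verify it before handing the bytes to the decompressor) can be sketched in Python with the standard `zlib` module. The CRC-32 checksum, the `(crc, compressed)` record layout, and the function names are assumptions for illustration, not the reiser4 cryptcompress format. In Python a bad stream merely raises `zlib.error`; the pre-check matters most in kernel C code, where decompressing garbage can be fatal.

```python
import zlib

CLUSTER = 64 * 1024  # logical bytes per disk cluster


class ClusterCorrupt(Exception):
    """A cluster failed its checksum; its whole 64K is treated as lost."""


def write_clusters(data: bytes):
    """Split data into 64K clusters, compress each, and record a CRC-32."""
    records = []
    for off in range(0, len(data), CLUSTER):
        compressed = zlib.compress(data[off:off + CLUSTER])
        records.append((zlib.crc32(compressed), compressed))
    return records


def read_cluster(records, index: int) -> bytes:
    """Verify the checksum first; only then run the decompressor."""
    crc, compressed = records[index]
    if zlib.crc32(compressed) != crc:
        # Never feed unverified bytes to the decompressor.
        raise ClusterCorrupt(f"cluster {index} failed checksum")
    return zlib.decompress(compressed)


if __name__ == "__main__":
    data = bytes(range(256)) * 1024          # 256 KiB -> four clusters
    records = write_clusters(data)
    crc, comp = records[2]
    records[2] = (crc, comp[:10] + bytes([comp[10] ^ 1]) + comp[11:])
    print(read_cluster(records, 0) == data[:CLUSTER])
    try:
        read_cluster(records, 2)
    except ClusterCorrupt as err:
        print("dropped:", err)
```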











Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-04 Thread Antonio Vargas

On 8/4/06, Edward Shishkin [EMAIL PROTECTED] wrote:

Hans Reiser wrote:
 Edward Shishkin wrote:


Matthias Andree wrote:


On Tue, 01 Aug 2006, Hans Reiser wrote:



You will want to try our compression plugin, it has an ecc for every
64k



What kind of forward error correction would that be,



Actually we use checksums, not ECC. If the checksum is wrong, run
fsck; it will remove the whole disk cluster, which represents 64K of
data.


 How about we switch to ecc, which would help with bit rot, not sector loss?

Interesting aspect.

Yes, we can implement ECC as a special crypto transform that inflates
data. As I mentioned earlier, it is possible via translation of key
offsets with a scale factor > 1.

Of course, it is better than nothing, but anyway metadata remains
ECC-unprotected, and hence robustness is not increased.

Edward.



 and how much and


what failure patterns can it correct? URL suffices.


The checksum is checked before unsafe decompression (trying to
decompress incorrect data can have fatal consequences). It can fail
for many reasons. The main one is tree corruption
(for example, when a disk cluster becomes incomplete; ECC cannot
help here). Perhaps such checksumming is also useful for other
things; I didn't classify the patterns.

Edward.




Would the storage + plugin subsystem support storing more than one copy of the
metadata tree?


--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/
[EMAIL PROTECTED]
[EMAIL PROTECTED]

Every day, every year
you have to work
you have to study
you have to scene.


Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-04 Thread David Masover

Russell Leighton wrote:

Is there a recovery mechanism, or are you just happy knowing there is
a problem (and go to backup)?


You probably go to backup anyway.  The recovery mechanism just means you 
get to choose the downtime to restore from backup (if there is 
downtime), versus being suddenly down until you can restore.


Re: reiser4: maybe just fix bugs?

2006-08-04 Thread David Masover

Theodore Tso wrote:

On Tue, Aug 01, 2006 at 11:55:57AM -0500, David Masover wrote:
If I understand it right, the original Reiser4 model of file metadata is
the file-as-directory stuff that caused such a furor during the last big push
for inclusion (search for "Silent semantic changes in Reiser4"):


The furor was caused by concerns Al Viro expressed about
locking/deadlock issues that reiser4 introduced.  


Which, I believe, was about file-as-dir, which also had problems with
things like directory loops.  That's sort of a disk-space memory leak.



The bigger issue with xattr support is two-fold.  First of all, there
are the programs that expect the existing extended attribute
interface,


Yeah...


More importantly are the system-level extended attributes, such as
those used by SELINUX, which by definition are not supposed to be
visible to the user at all,


I don't see why either of these is an issue.  The SELINUX stuff can be a
plugin which doesn't necessarily have a user-level interface.
Cryptocompress, for instance, exists independently of its user-level
interface (probably the file-as-dir stuff), and will probably be
implemented in some sort of stable form as a system-wide default for new
files.


So, certainly metadata (xattrs) as a plugin could be implemented with no 
UI at all, or any given UI.


... Anyway, I still see no reason why these cannot be implemented in
Reiser4, other than the fact that if the implementation uses plugins, I
guarantee that at least one or two people will hate it for that reason
alone.



Not supporting xattrs means that those distros that use SELINUX by
default (i.e., RHEL, Fedora, etc.) won't want to use reiser4, because
SELINUX won't work on reiser4 filesystems.


Right.  So they will be implemented, eventually.


Whether or not Hans cares about this is up to him


He does, or he should.  Reiser4 needs every bit of acceptance it can get
right now, as long as it can get it without compromising its goals or
philosophy.  Extended attributes compromise these only in that they
provide less incentive to learn any other metadata interface that
Reiser4 provides.  But that's irrelevant: if Reiser4 doesn't gain enough
acceptance due to lack of xattr support, anything it has will be
irrelevant anyway.


So just as we provide the standard interface to Unix permissions (even 
though we intend to implement things like acls and views, and even 
though there was a file/.pseudo/rwx interface), we should provide the 
standard xattr interface, and the standard direct IO interface, and 
anything else that's practical.  Be a good, standard filesystem first, 
and an innovative filesystem second.
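For reference, "the standard xattr interface" is the getxattr/setxattr/listxattr family of Linux system calls, which Python exposes as `os.getxattr`, `os.setxattr` and `os.listxattr`. The sketch below sets and reads back a `user.` attribute, and reads the SELinux label, which lives in the `security.` namespace under `security.selinux` and is governed by kernel policy rather than ordinary users. The attribute name `user.comment` and both helper names are arbitrary examples; a filesystem without xattr support fails with `ENOTSUP`, which the sketch reports as `None` rather than as an error.

```python
import errno
import os
import tempfile

# errno values meaning "no xattr support here" (equal to each other on Linux).
_UNSUPPORTED = {errno.ENOTSUP, errno.EOPNOTSUPP}


def roundtrip_user_xattr(path, name="user.comment", value=b"hello"):
    """Set a user-namespace xattr on `path` and read it back.

    Returns the stored bytes, or None if xattrs are unsupported here.
    """
    if not hasattr(os, "setxattr"):   # os.*xattr functions are Linux-only
        return None
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError as e:
        if e.errno in _UNSUPPORTED:
            return None
        raise


def selinux_label(path):
    """Return the SELinux label (security.selinux) of `path`, or None."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, "security.selinux")
    except OSError as e:
        if e.errno in _UNSUPPORTED | {errno.ENODATA}:   # absent or unsupported
            return None
        raise


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(roundtrip_user_xattr(f.name))
        print(selinux_label(f.name))
```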


symlink issues with reiser4

2006-08-04 Thread Gurganus, Brant L

Before I investigate whether it is a problem with the test, the tested program, or something else: are there known issues with symbolic links and reiser4? See http://forums.gentoo.org/viewtopic-t-485689-highlight-reiser4+symbolic.html for details on what I am seeing.

Brant Gurganus
http://www.rose-hulman.edu/~gurganbl







Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-04 Thread David Masover

Horst H. von Brand wrote:

Vladimir V. Saveliev [EMAIL PROTECTED] wrote:

On Tue, 2006-08-01 at 17:32 +0200, Łukasz Mierzwa wrote:



What fancy (beside cryptocompress) does reiser4 do now?

it is supposed to provide an ability to easy modify filesystem behaviour
in various aspects without breaking compatibility.


If it just modifies /behaviour/ it can't really do much. And what can be
done here is more the job of the scheduler, not of the filesystem. Keep your
hands off it!


Say wha?

There's a lot you can do with the _representation_ of the on-disk format 
without changing the _physical_ on-disk format.  As a very simple 
example, a plugin could add a sysfs-like folder with information about 
that particular filesystem.  Yes, I know there are better ways to do 
things, but there are things you can change about behavior without (I 
think) touching the scheduler.


Or am I wrong about the scope of the scheduler?


If it somehow modifies /on disk format/, it (by *definition*) isn't
compatible. Ditto.


Cryptocompress is compatible with kernels that have a working 
cryptocompress plugin.  Other kernels will notice that they are meant to 
be read by cryptocompress, and (I hope) refuse to read files they won't 
be able to.


Same would be true of any plugin that changes the disk format.

But, the above comments about behavior still hold.  There's a lot you 
can do with plugins without changing the on-disk format.  If you want a 
working example, look to your own favorite filesystems that support 
quotas, xattrs, and acls -- is an on-disk FS format with those enabled 
compatible with a kernel that doesn't support them (has them turned 
off)?  How about ext3, with its journaling -- is the journaling all in 
the scheduler?  But isn't the ext3 disk format compatible with ext2?
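The ext2/ext3 answer to the compatibility question is the superblock's three feature bitmasks: an unknown "compat" feature is harmless, an unknown "ro_compat" feature allows only a read-only mount, and an unknown "incompat" feature forbids mounting. That is why an ext3 filesystem, whose journal is a compat feature, still mounts as ext2 after a clean unmount, but not while it is flagged as needing journal recovery. A plugin-based filesystem could apply the same rule per file, refusing files whose plugin it lacks, as David hopes kernels without cryptocompress will do. The Python sketch below shows only the decision logic; the flag names echo ext3's, but the bit values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mount(Enum):
    READ_WRITE = "rw"
    READ_ONLY = "ro"
    REFUSE = "refuse"


@dataclass(frozen=True)
class Features:
    compat: int = 0      # unknown bits: safe to ignore
    ro_compat: int = 0   # unknown bits: safe to read, not to write
    incompat: int = 0    # unknown bits: unsafe even to read


def mount_decision(on_disk: Features, supported: Features) -> Mount:
    """Decide how a driver knowing `supported` may mount `on_disk`."""
    if on_disk.incompat & ~supported.incompat:
        return Mount.REFUSE
    if on_disk.ro_compat & ~supported.ro_compat:
        return Mount.READ_ONLY
    return Mount.READ_WRITE      # unknown compat bits are ignored by design


# Hypothetical bit assignments, for illustration only.
COMPAT_HAS_JOURNAL = 0x1
INCOMPAT_NEEDS_RECOVERY = 0x1

ext2_driver = Features()         # knows none of the bits above
ext3_driver = Features(compat=COMPAT_HAS_JOURNAL,
                       incompat=INCOMPAT_NEEDS_RECOVERY)

clean_ext3 = Features(compat=COMPAT_HAS_JOURNAL)
dirty_ext3 = Features(compat=COMPAT_HAS_JOURNAL,
                      incompat=INCOMPAT_NEEDS_RECOVERY)

if __name__ == "__main__":
    print(mount_decision(clean_ext3, ext2_driver))   # Mount.READ_WRITE
    print(mount_decision(dirty_ext3, ext2_driver))   # Mount.REFUSE
    print(mount_decision(dirty_ext3, ext3_driver))   # Mount.READ_WRITE
```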



quota support
xattrs and acls


Without those, it is next to useless anyway.


What is?  The FS?  I use neither on desktop machines, though I'd 
appreciate xattrs for Beagle.


Or are you talking about the plugins?  See above, then.



Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-04 Thread Hans Reiser
Edward Shishkin wrote:




 How about we switch to ecc, which would help with bit rot, not sector
 loss?


 Interesting aspect.

 Yes, we can implement ECC as a special crypto transform that inflates
 data. As I mentioned earlier, it is possible via translation of key
 offsets with a scale factor > 1.

 Of course, it is better than nothing, but anyway metadata remains
 ECC-unprotected, and hence robustness is not increased.

 Edward.

Would you prefer to do it as a node layout plugin instead, so as to cover
the metadata too?

Hans


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-04 Thread Hans Reiser
Antonio Vargas wrote:

 On 8/4/06, Edward Shishkin [EMAIL PROTECTED] wrote:

 Hans Reiser wrote:
  Edward Shishkin wrote:
 
 
 Matthias Andree wrote:
 
 
 On Tue, 01 Aug 2006, Hans Reiser wrote:
 
 
 
 You will want to try our compression plugin, it has an ecc for every
 64k
 
 
 
 What kind of forward error correction would that be,
 
 
 
 Actually we use checksums, not ECC. If the checksum is wrong, run
 fsck; it will remove the whole disk cluster, which represents 64K of
 data.
 
 
  How about we switch to ecc, which would help with bit rot, not
 sector loss?

 Interesting aspect.

 Yes, we can implement ECC as a special crypto transform that inflates
 data. As I mentioned earlier, it is possible via translation of key
 offsets with a scale factor > 1.

 Of course, it is better than nothing, but anyway metadata remains
 ECC-unprotected, and hence robustness is not increased.

 Edward.

 
 
  and how much and
 
 
 what failure patterns can it correct? URL suffices.
 
 
 The checksum is checked before unsafe decompression (trying to
 decompress incorrect data can have fatal consequences). It can fail
 for many reasons. The main one is tree corruption
 (for example, when a disk cluster becomes incomplete; ECC cannot
 help here). Perhaps such checksumming is also useful for other
 things; I didn't classify the patterns.
 
 Edward.
 
 


 Would the storage + plugin subsystem support storing more than one copy of the
 metadata tree?


I suppose

What would be nice would be to have a plugin that when a node fails its
checksum/ecc it knows to get it from another mirror, and which generally
handles faults with a graceful understanding of its ability to get
copies from a mirror (or RAID parity calculation).

I would happily accept such a patch (subject to usual reservation of
right to complain about implementation details).
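The fault-handling plugin Hans proposes can be outlined in Python: read a node from one mirror, verify its checksum, fall back to the next mirror on failure, and rewrite the bad copies from the good one. This is, in outline, also what Toby describes ZFS doing with a good mirror. The `Mirror` class, the in-memory dictionary standing in for a device, SHA-256 as the node checksum, and the assumption that the expected checksum is kept elsewhere (for example in the parent node) are all hypothetical choices; none of this is an existing reiser4 interface.

```python
import hashlib


class Mirror:
    """Hypothetical stand-in for one copy of the device: blocknr -> bytes."""

    def __init__(self):
        self.blocks = {}

    def read(self, blocknr: int) -> bytes:
        return self.blocks.get(blocknr, b"")

    def write(self, blocknr: int, data: bytes) -> None:
        self.blocks[blocknr] = data


def node_checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def write_node(mirrors, blocknr: int, data: bytes) -> bytes:
    """Write the node to every mirror; return the checksum to store elsewhere."""
    for m in mirrors:
        m.write(blocknr, data)
    return node_checksum(data)


def read_node(mirrors, blocknr: int, expected: bytes) -> bytes:
    """Return the first copy whose checksum matches, repairing bad copies."""
    bad = []
    for m in mirrors:
        data = m.read(blocknr)
        if node_checksum(data) == expected:
            for b in bad:                    # self-heal the copies tried so far
                b.write(blocknr, data)
            return data
        bad.append(m)
    raise OSError(f"node {blocknr}: no mirror holds a valid copy")


if __name__ == "__main__":
    a, b = Mirror(), Mirror()
    csum = write_node([a, b], 7, b"node-7")
    a.write(7, b"node-?")                    # silent corruption on mirror a
    print(read_node([a, b], 7, csum), a.read(7))
```

Only the copies consulted before a good one are repaired on this read path; checking every copy is the job of a separate scrub pass. A RAID-parity variant would replace the loop over mirrors with a reconstruction step.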