Re: Article on LWN about recent discussions on reiser4 and inclusion
Jorgen Hermanrud Fjeld wrote: The recent discussions regarding reiser4 and possible inclusion have also caught the eye(s) of LWN. I have made the article available for you, non-lwn-subscribers, so that you may have a look at it here: http://lwn.net/SubscriberLink/193663/9d2ac03195c775bc/ Jorgen, are you with lwn? Thanks Jorgen. It was a remarkably positive article, and the posters were also quite positive. Hans
Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
On 4-Aug-06, at 3:25 AM, Russell Leighton wrote: If the software (filesystem like ZFS or database like Berkeley DB) finds a mismatch for a checksum on a block read, then what? Is there a recovery mechanism, or do you just be happy you know there is a problem (and go to backup)? ZFS will correct from a good mirror (http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data). --T Thx Matthias Andree wrote: Berkeley DB can, since version 4.1 (IIRC), write checksums (newer versions document this as SHA1) on its database pages, to detect corruptions and writes that were supposed to be atomic but failed (because you cannot write 4K or 16K atomically on a disk drive).
Re: Article on LWN about recent discussions on reiser4 and inclusion
Hi, On 2006-08-03 23:44:55, Hans Reiser wrote: Jorgen Hermanrud Fjeld wrote: The recent discussions regarding reiser4 and possible inclusion have also caught the eye(s) of LWN. I have made the article available for you, non-lwn-subscribers, so that you may have a look at it here: http://lwn.net/SubscriberLink/193663/9d2ac03195c775bc/ Jorgen, are you with lwn? Thanks Jorgen. You are welcome. I'm just a subscriber of LWN, which gives me the possibility of getting direct links to articles before they are publicly available next week. I just thought you should be aware of the press, and have the possibility of making your own remarks, if need be. It was a remarkably positive article, and the posters were also quite positive. Yes, the article was nice, and when I reveal that I used the name Armagh as a game alias when I was younger, my post is also evident. While I have your personal attention, I would like to thank you for your good work. I think your ideas on the future of file systems, as I have read about them on namesys and in the mailing-list, are profoundly important. I have been using reiser3 for a long time, and would just like to express my support and gratitude. -- Sincerely, Jørgen | Homepage: http://www.hex.no/jhf | Public GPG key: http://www.hex.no/jhf/key.txt The solution of problems is the most characteristic and peculiar sort of voluntary thinking. -- William James
Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
That was exactly the summary I was looking for. I would encourage folks to read the referenced link Toby sent: http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data ...also the linked RAID-Z summary from this article was very interesting, since something like this is needed for recovery from checksum failures: Which brings us to the coolest thing about RAID-Z: self-healing data. In addition to handling whole-disk failure, RAID-Z can also detect and correct silent data corruption. Whenever you read a RAID-Z block, ZFS compares it against its checksum. If the data disks didn't return the right answer, ZFS reads the parity and then does combinatorial reconstruction to figure out which disk returned bad data. It then repairs the damaged disk and returns good data to the application. ZFS also reports the incident through Solaris FMA so that the system administrator knows that one of the disks is silently failing. Finally, note that *RAID-Z doesn't require any special hardware.* It doesn't need NVRAM for correctness, and it doesn't need write buffering for good performance. With RAID-Z, ZFS makes good on the original RAID promise: it provides fast, reliable storage using cheap, commodity disks. http://blogs.sun.com/roller/page/bonwick?entry=raid_z Toby Thain wrote: On 4-Aug-06, at 3:25 AM, Russell Leighton wrote: If the software (filesystem like ZFS or database like Berkeley DB) finds a mismatch for a checksum on a block read, then what? Is there a recovery mechanism, or do you just be happy you know there is a problem (and go to backup)? ZFS will correct from a good mirror (http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data). --T Thx Matthias Andree wrote: Berkeley DB can, since version 4.1 (IIRC), write checksums (newer versions document this as SHA1) on its database pages, to detect corruptions and writes that were supposed to be atomic but failed (because you cannot write 4K or 16K atomically on a disk drive).
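The "combinatorial reconstruction" described in the quoted RAID-Z summary can be illustrated with a toy model. The sketch below is not ZFS code: it assumes a single XOR parity block per stripe and a SHA-256 checksum over the whole stripe, and the names xor_blocks, stripe_checksum and read_stripe are hypothetical. It only shows the idea of suspecting each disk in turn, rebuilding its block from parity, and keeping the candidate whose checksum matches.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_checksum(blocks):
    """Checksum over the concatenated data blocks (SHA-256 is an assumption)."""
    return hashlib.sha256(b"".join(blocks)).digest()


def read_stripe(data, parity, expected):
    """Return (good_blocks, bad_index); bad_index is None if nothing was wrong.

    If the checksum does not match, suspect each data block in turn,
    rebuild it from the other blocks plus parity, and accept the first
    candidate stripe whose checksum matches.
    """
    if stripe_checksum(data) == expected:
        return list(data), None
    for i in range(len(data)):
        others = [block for j, block in enumerate(data) if j != i]
        trial = list(data)
        trial[i] = xor_blocks(others + [parity])
        if stripe_checksum(trial) == expected:
            return trial, i  # caller can now rewrite block i on the bad disk
    raise IOError("stripe is unrecoverable with single parity")


if __name__ == "__main__":
    blocks = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(blocks)
    digest = stripe_checksum(blocks)
    damaged = [blocks[0], b"eXgh", blocks[2]]  # silent corruption on "disk" 1
    print(read_stripe(damaged, parity, digest))
```

With single parity this can only repair one silently bad block per stripe; if two blocks are wrong, no candidate matches and the read fails, which is the "go to backup" case discussed in the thread.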
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Hans Reiser wrote: Edward Shishkin wrote: Matthias Andree wrote: On Tue, 01 Aug 2006, Hans Reiser wrote: You will want to try our compression plugin, it has an ecc for every 64k What kind of forward error correction would that be, Actually we use checksums, not ECC. If the checksum is wrong, then run fsck - it will remove the whole disk cluster, which represents 64K of data. How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor > 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. and how much and what failure patterns can it correct? URL suffices. The checksum is checked before unsafe decompression (trying to decompress incorrect data can lead to fatal things). It can be broken for many reasons. The main one is tree corruption (for example, when a disk cluster becomes incomplete - ECC cannot help here). Perhaps such checksumming is also useful for other things; I didn't classify the patterns. Edward.
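Edward's point that the checksum is verified before the unsafe decompression step can be shown with a small sketch. This is not the reiser4 cryptocompress code: the on-disk layout (a 4-byte CRC32 in front of a zlib payload) and the names pack_cluster and unpack_cluster are assumptions chosen for illustration. The only thing it demonstrates is the ordering: never hand unverified bytes to the decompressor.

```python
import zlib


def pack_cluster(data):
    """Compress a cluster and prepend a CRC32 of the compressed payload."""
    payload = zlib.compress(data)
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack_cluster(stored):
    """Verify the checksum first; decompress only if it matches."""
    expected = int.from_bytes(stored[:4], "big")
    payload = stored[4:]
    if zlib.crc32(payload) != expected:
        # Detection only: a plain checksum cannot say which bits are wrong,
        # so the caller has to discard the cluster or fetch another copy.
        raise ValueError("cluster checksum mismatch, refusing to decompress")
    return zlib.decompress(payload)


if __name__ == "__main__":
    stored = pack_cluster(b"x" * 65536)  # one 64K cluster
    assert unpack_cluster(stored) == b"x" * 65536
    damaged = stored[:5] + bytes([stored[5] ^ 0xFF]) + stored[6:]
    try:
        unpack_cluster(damaged)
    except ValueError as err:
        print(err)
```

A checksum like this detects damage but cannot repair it, which is the distinction between checksums and ECC that the thread turns on.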
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On 8/4/06, Edward Shishkin wrote: Hans Reiser wrote: Edward Shishkin wrote: Matthias Andree wrote: On Tue, 01 Aug 2006, Hans Reiser wrote: You will want to try our compression plugin, it has an ecc for every 64k What kind of forward error correction would that be, Actually we use checksums, not ECC. If the checksum is wrong, then run fsck - it will remove the whole disk cluster, which represents 64K of data. How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor > 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. and how much and what failure patterns can it correct? URL suffices. The checksum is checked before unsafe decompression (trying to decompress incorrect data can lead to fatal things). It can be broken for many reasons. The main one is tree corruption (for example, when a disk cluster becomes incomplete - ECC cannot help here). Perhaps such checksumming is also useful for other things; I didn't classify the patterns. Edward. Would the storage + plugin subsystem support storing >1 copies of the metadata tree? -- Greetz, Antonio Vargas aka winden of network http://network.amigascne.org/ Every day, every year you have to work you have to study you have to scene.
Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Russell Leighton wrote: Is there a recovery mechanism, or do you just be happy you know there is a problem (and go to backup)? You probably go to backup anyway. The recovery mechanism just means you get to choose the downtime to restore from backup (if there is downtime), versus being suddenly down until you can restore.
Re: reiser4: maybe just fix bugs?
Theodore Tso wrote: On Tue, Aug 01, 2006 at 11:55:57AM -0500, David Masover wrote: If I understand it right, the original Reiser4 model of file metadata is the file-as-directory stuff that caused such a furor the last big push for inclusion (search for "Silent semantic changes in Reiser4"): The furor was caused by concerns Al Viro expressed about locking/deadlock issues that reiser4 introduced. Which, I believe, was about file-as-dir. Which also had problems with things like directory loops. That's sort of a disk space memory leak. The bigger issue with xattr support is two-fold. First of all, there are the programs that are expecting the existing extended attribute interface, Yeah... More importantly are the system-level extended attributes, such as those used by SELINUX, which by definition are not supposed to be visible to the user at all, I don't see why either of these are issues. The SELINUX stuff can be a plugin which doesn't necessarily have a user-level interface. Cryptocompress, for instance, exists independent of its user-level interface (probably the file-as-dir stuff), and will probably be implemented in some sort of stable form as a system-wide default for new files. So, certainly metadata (xattrs) as a plugin could be implemented with no UI at all, or any given UI. ... Anyway, I still see no reason why these cannot be implemented in Reiser4, other than the possibility that if it uses plugins, I guarantee that at least one or two people will hate the implementation for that reason alone. Not supporting xattrs means that those distros that use SELINUX by default (i.e., RHEL, Fedora, etc.) won't want to use reiser4, because SELINUX won't work on reiser4 filesystems. Right. So they will be implemented, eventually. Whether or not Hans cares about this is up to him He does, or he should. Reiser4 needs every bit of acceptance it can get right now, as long as it can get them without compromising its goals or philosophy. 
Extended attributes only compromise these because they provide less incentive to learn any other metadata interface that Reiser4 provides. But that's irrelevant: if Reiser4 doesn't gain enough acceptance due to lack of xattr support, anything it has will be irrelevant anyway. So just as we provide the standard interface to Unix permissions (even though we intend to implement things like acls and views, and even though there was a file/.pseudo/rwx interface), we should provide the standard xattr interface, and the standard direct IO interface, and anything else that's practical. Be a good, standard filesystem first, and an innovative filesystem second.
symlink issues with reiser4
Before I investigate whether it is a problem with the test or tested program or something else, are there known issues with symbolic links and reiser4? See http://forums.gentoo.org/viewtopic-t-485689-highlight-reiser4+symbolic.html for details on what I am seeing. Brant Gurganus http://www.rose-hulman.edu/~gurganbl
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Horst H. von Brand wrote: Vladimir V. Saveliev wrote: On Tue, 2006-08-01 at 17:32 +0200, Łukasz Mierzwa wrote: What fancy (beside cryptocompress) does reiser4 do now? it is supposed to provide an ability to easily modify filesystem behaviour in various aspects without breaking compatibility. If it just modifies /behaviour/ it can't really do much. And what can be done here is more the job of the scheduler, not of the filesystem. Keep your hands off it! Say wha? There's a lot you can do with the _representation_ of the on-disk format without changing the _physical_ on-disk format. As a very simple example, a plugin could add a sysfs-like folder with information about that particular filesystem. Yes, I know there are better ways to do things, but there are things you can change about behavior without (I think) touching the scheduler. Or am I wrong about the scope of the scheduler? If it somehow modifies /on disk format/, it (by *definition*) isn't compatible. Ditto. Cryptocompress is compatible with kernels that have a working cryptocompress plugin. Other kernels will notice that the files are meant to be read by cryptocompress, and (I hope) refuse to read files they won't be able to. Same would be true of any plugin that changes the disk format. But, the above comments about behavior still hold. There's a lot you can do with plugins without changing the on-disk format. If you want a working example, look to your own favorite filesystems that support quotas, xattrs, and acls -- is an on-disk FS format with those enabled compatible with a kernel that doesn't support them (has them turned off)? How about ext3, with its journaling -- is the journaling all in the scheduler? But isn't the ext3 disk format compatible with ext2? quota support xattrs and acls Without those, it is next to useless anyway. What is? The FS? I use neither on desktop machines, though I'd appreciate xattrs for Beagle. Or are you talking about the plugins? See above, then.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Edward Shishkin wrote: How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor > 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. Would you prefer to do it as a node layout plugin instead, so as to get the metadata? Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Antonio Vargas wrote: On 8/4/06, Edward Shishkin wrote: Hans Reiser wrote: Edward Shishkin wrote: Matthias Andree wrote: On Tue, 01 Aug 2006, Hans Reiser wrote: You will want to try our compression plugin, it has an ecc for every 64k What kind of forward error correction would that be, Actually we use checksums, not ECC. If the checksum is wrong, then run fsck - it will remove the whole disk cluster, which represents 64K of data. How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor > 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. and how much and what failure patterns can it correct? URL suffices. The checksum is checked before unsafe decompression (trying to decompress incorrect data can lead to fatal things). It can be broken for many reasons. The main one is tree corruption (for example, when a disk cluster becomes incomplete - ECC cannot help here). Perhaps such checksumming is also useful for other things; I didn't classify the patterns. Edward. Would the storage + plugin subsystem support storing >1 copies of the metadata tree? I suppose. What would be nice would be to have a plugin that, when a node fails its checksum/ecc, knows to get it from another mirror, and which generally handles faults with a graceful understanding of its ability to get copies from a mirror (or RAID parity calculation). I would happily accept such a patch (subject to the usual reservation of the right to complain about implementation details).
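The behaviour Hans asks for, falling back to another mirror when a node fails its checksum, can be sketched in user space as follows. This is purely illustrative and not a reiser4 plugin interface: the mirrors are modelled as dictionaries mapping a block number to bytes, the checksum is assumed to be a SHA-256 digest stored elsewhere, and read_with_mirrors is a hypothetical name.

```python
import hashlib


def read_with_mirrors(mirrors, block_no, expected_digest):
    """Return the first copy of a block whose digest matches.

    Copies that failed verification on the way are overwritten with the
    good data, so one silent corruption is repaired as a side effect of
    the read. Raises IOError if no mirror holds a valid copy.
    """
    failed = []
    for index, mirror in enumerate(mirrors):
        block = mirror.get(block_no)
        if block is not None and hashlib.sha256(block).digest() == expected_digest:
            for bad_index in failed:
                mirrors[bad_index][block_no] = block  # heal the bad copies
            return block
        failed.append(index)
    raise IOError("block %d: no mirror has a valid copy" % block_no)


if __name__ == "__main__":
    good = b"metadata node"
    digest = hashlib.sha256(good).digest()
    mirrors = [{7: b"metadata nodf"}, {7: good}]  # first copy is corrupted
    assert read_with_mirrors(mirrors, 7, digest) == good
    assert mirrors[0][7] == good  # the damaged mirror was repaired
```

The sketch assumes the expected digest is kept somewhere other than next to the data it protects; where that would live in the reiser4 tree is not settled in this thread.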