Re: [PATCH] reiser4: fix readv
btw, please be aware that the post-2.6.17 deadlock fixes which went into generic_file_buffered_write() caused a few problems: a) Significant NFS overwrite performance regression (due to running prepare_write/commit_write) against each writev segment. b) Smaller but significant performance regression for all writev() usages (I expect - haven't measured this yet). c) It's still theoretically deadlockable. I am (slowly) working on addressing these issues - more info later. Basically, we will again need to be able to traverse multiple iovec segments within a single prepare_write/commit_write instance. This will impact your batch_write patch, because that patch somewhat codifies in the function layout the concept that we do one-segment-pre-prepare_write(). So that patch will need significant rework, I suspect. What I'm hoping to be able to do is to simply revert the two deadlock-fix patches (so we go back to the 2.6.17 code) and to remove fault_in_pages_readable() and to simply unlock the page after ->prepare_write() and lock it again prior to ->commit_write(). But I haven't tried that yet - there's quite a bit of preparatory work needed.
Re: [PATCH] reiser4: fix readv
On Sun, 17 Sep 2006 22:39:07 +0400 "Vladimir V. Saveliev" <[EMAIL PROTECTED]> wrote: > Hello > > On Friday 15 September 2006 06:24, Andrew Morton wrote: > > On Wed, 13 Sep 2006 22:12:54 +0400 > > "Vladimir V. Saveliev" <[EMAIL PROTECTED]> wrote: > > > > > Hello, Andrew > > > > > > reiser4 in 2.6.18-rc6-mm2 has a bug. It can not do readv. > > > > > > The attached patch fixes it by implementing reiser4' aio_read file > > > operation. > > > Unfortunately, it appeared to get a loop which is very similar to the one > > > of > > > fs/read_write.c:do_loop_readv_writev(). > > > Alternatively, if do_loop_readv_writev were EXPORT_SYMBOL-ed > > > reiser4' aio_read could use it instead. But, there is a problem with > > > do_loop_readv_writev EXPORT_SYMBOL-ing: > > > one if its arguments is io_fn_t, which is declared in fs/read_write.h. > > > If it is ok to move io_fn_t and do_loop_readv_writev declarations to > > > include/linux/fs.h and to EXPORT_SYMBOL > > > do_loop_readv_writev the fix will be smaller. Please, let me know what > > > would you prefer. > > > > > > > Yes, I'd say that do_loop_readv_writev() is suitable for exporting to > > filesystems, and that doing so is preferable to duplicating it. > > > > That'd be two patches, please: one to do the export and one to use it in > > reiser4. > > > > I assume there's a good reason why reiser4 cannot use > > generic_file_aio_read() or vfs_readv(). Please capture that discussion in > > the changelog for the first patch, thanks. > > > > It seems the problem can be fixed a bit simpler. Currently, there is a > difference between read and readv: read calls f_op->read if it is defined, > but readv calls f_op->read if f_op->aio_read is not defined. The latest is a > bit unlogical imho: > wouldn't it be more consistent if readv worked via f_op->read if it is > defined? > If we fixed readv (do_readv_writev) that way - reiser4 would not need > anything from its aio_read but generic_file_aio_read. > > > From: Vladimir Saveliev <[EMAIL PROTECTED]> > > There is some asymmetry between read and readv: > read (vfs_read, namely) calls f_op->read when it is defined, > while readv (do_readv_writev) calls f_op->read when f_op->aio_read is not > defined. > This patch makes do_readv_writev to call do_loop_readv_writev > (which calls f_op->read for each segment of i/o vector) if f_op->read is > defined. > > Signed-off-by: Vladimir Saveliev <[EMAIL PROTECTED]> > > > > > diff -puN fs/read_write.c~fix-do_readv_writev fs/read_write.c > --- linux-2.6.18-rc6-mm2/fs/read_write.c~fix-do_readv_writev 2006-09-15 > 17:46:03.0 +0400 > +++ linux-2.6.18-rc6-mm2-vs/fs/read_write.c 2006-09-17 23:01:17.0 > +0400 > @@ -608,7 +608,6 @@ static ssize_t do_readv_writev(int type, > if (ret) > goto out; > > - fnv = NULL; > if (type == READ) { > fn = file->f_op->read; > fnv = file->f_op->aio_read; > @@ -617,11 +616,11 @@ static ssize_t do_readv_writev(int type, > fnv = file->f_op->aio_write; > } > > - if (fnv) > + if (fn) > + ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn); > + else > ret = do_sync_readv_writev(file, iov, nr_segs, tot_len, > pos, fnv); > - else > - ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn); > > out: > if (iov != iovstack) > > _ OK, thanks. I'll wait to hear from Badari and Christoph on that.
Re: [PATCH] reiser4: fix readv
On Wed, 13 Sep 2006 22:12:54 +0400 "Vladimir V. Saveliev" <[EMAIL PROTECTED]> wrote: > Hello, Andrew > > reiser4 in 2.6.18-rc6-mm2 has a bug. It can not do readv. > > The attached patch fixes it by implementing reiser4' aio_read file operation. > Unfortunately, it appeared to get a loop which is very similar to the one of > fs/read_write.c:do_loop_readv_writev(). > Alternatively, if do_loop_readv_writev were EXPORT_SYMBOL-ed > reiser4' aio_read could use it instead. But, there is a problem with > do_loop_readv_writev EXPORT_SYMBOL-ing: > one if its arguments is io_fn_t, which is declared in fs/read_write.h. > If it is ok to move io_fn_t and do_loop_readv_writev declarations to > include/linux/fs.h and to EXPORT_SYMBOL > do_loop_readv_writev the fix will be smaller. Please, let me know what would > you prefer. > Yes, I'd say that do_loop_readv_writev() is suitable for exporting to filesystems, and that doing so is preferable to duplicating it. That'd be two patches, please: one to do the export and one to use it in reiser4. I assume there's a good reason why reiser4 cannot use generic_file_aio_read() or vfs_readv(). Please capture that discussion in the changelog for the first patch, thanks. > From: Vladimir Saveliev <[EMAIL PROTECTED]> > > This patch adds implementation of aio_read file operation for reiser4. > It is needed because in reiser4 there are files which can not be dealt > with via generic page cache routines. > In case of readv, reiser4 has no meaning to find out file type and to choose > proper > way to read it. As result generic page cache read gets called for files which > can not be > read that way. Reiser4' aio_read method is to fix that problem. > > Signed-off-by: Vladimir Saveliev <[EMAIL PROTECTED]> > > > > > diff -puN fs/reiser4/plugin/object.c~reiser4-add-aio_read > fs/reiser4/plugin/object.c > --- linux-2.6.18-rc6-mm2/fs/reiser4/plugin/object.c~reiser4-add-aio_read > 2006-09-13 20:18:23.0 +0400 > +++ linux-2.6.18-rc6-mm2-vs/fs/reiser4/plugin/object.c2006-09-13 > 20:18:23.0 +0400 > @@ -101,7 +101,7 @@ file_plugin file_plugins[LAST_FILE_PLUGI > .llseek = generic_file_llseek, > .read = read_unix_file, > .write = do_sync_write, > - .aio_read = generic_file_aio_read, > + .aio_read = aio_read_unix_file, > .aio_write = generic_file_aio_write, > .ioctl = ioctl_unix_file, > .mmap = mmap_unix_file, > diff -puN fs/reiser4/plugin/file/file.c~reiser4-add-aio_read > fs/reiser4/plugin/file/file.c > --- linux-2.6.18-rc6-mm2/fs/reiser4/plugin/file/file.c~reiser4-add-aio_read > 2006-09-13 20:18:23.0 +0400 > +++ linux-2.6.18-rc6-mm2-vs/fs/reiser4/plugin/file/file.c 2006-09-13 > 20:52:30.0 +0400 > @@ -2011,6 +2011,54 @@ out: > return result; > } > > +/** > + * aio_read_unix_file - aio_read of struct file_operations > + * @iocb: i/o control block > + * @iov: i/o vector > + * @nr_segs: number of segments in the i/o vector > + * @pos: file position to read from > + * > + * When it is called within reiser4 context (this happens when sys_read is > + * reading a file built of extents) - just call generic_file_aio_read to > + * perform read into page cache. When it is called without reiser4 context > + * (sys_readv) - call read_unix_file for each segments of i/o vector, so that > + * read_unix_file will be able to choose whether the file is to be read into > + * page cache or the file is built of tail items and page cache read is not > + * suitable for it. > + */ > +ssize_t aio_read_unix_file(struct kiocb *iocb, const struct iovec *iov, > +unsigned long nr_segs, loff_t pos) > +{ > + ssize_t ret = 0; > + > + if (is_in_reiser4_context()) > + return generic_file_aio_read(iocb, iov, nr_segs, pos); > + > + while (nr_segs > 0) { > + void __user *base; > + size_t len; > + ssize_t nr; > + > + base = iov->iov_base; > + len = iov->iov_len; > + iov++; > + nr_segs--; > + > + nr = read_unix_file(iocb->ki_filp, base, len, &iocb->ki_pos); > + if (nr < 0) { > + if (!ret) > + ret = nr; > + break; > + } > + ret += nr; > + if (nr != len) > + break; > + } > + > + return ret; > + > +} > + > static ssize_t read_unix_file_container_tails( > struct file *file, char __user *buf, size_t read_amount, loff_t *off) > { > diff -puN fs/reiser4/plugin/file/file.h~reiser4-add-aio_read > fs/reiser4/plugin/file/file.h > --- linux-2.6.18-rc6-mm2/fs/reiser4/plugin/file/file.h~reiser4-add-aio_read > 2006-09-13 20:18:23.0 +0400 > +++ linux-2.6.18-rc6-mm2-vs/fs/reiser4/plugin/file/file.h
Re: Reiser4 und LZO compression
On Sun, 27 Aug 2006 04:34:26 +0400 Alexey Dobriyan <[EMAIL PROTECTED]> wrote: > The patch below is so-called reiser4 LZO compression plugin as extracted > from 2.6.18-rc4-mm3. > > I think it is an unauditable piece of shit and thus should not enter > mainline. Like lib/inflate.c (and this new code should arguably be in lib/). The problem is that if we clean this up, we've diverged very much from the upstream implementation. So taking in fixes and features from upstream becomes harder and more error-prone. I'd suspect that the maturity of these utilities is such that we could afford to turn them into kernel code in the expectation that any future changes will be small. But it's not a completely simple call. (iirc the inflate code had a buffer overrun a while back, which was found and fixed in the upstream version).
Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching
On Tue, 22 Aug 2006 19:46:34 -0700 Clay Barnes <[EMAIL PROTECTED]> wrote: > Perhaps I mis-recall, but shouldn't delayed writes (or something along > those lines) cause a case where two files are being incrementally > written rare? If we did delayed allocation, yes. But we generally don't. (Exceptions: XFS, reiser4, ext4, ext2 prototype circa 2001).
Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching
I can see that the bigalloc code as-is is pretty sad, but this is a scary patch. It has the potential to cause people significant performance problems way, way ahead in the future. For example, suppose userspace is growing two files concurrently. It could be that the bigalloc code would cause one file's allocation cursor to repeatedly jump far away from the second. ie: a beneficial side-effect. Without bigalloc that side-effect is lost and the two files blocks end up all intermingled. I don't know if that scenario is realistic, but I bet there are similar accidental oddities which can occur as a result of this change. But what are they?
Re: reiser4: maybe just fix bugs?
On Tue, 01 Aug 2006 15:24:37 +0400 "Vladimir V. Saveliev" <[EMAIL PROTECTED]> wrote: > > >The writeout code is ugly, although that's largely due to a mismatch > > >between > > >what reiser4 wants to do and what the VFS/MM expects it to do. > > Yes. reiser4 writeouts atoms. Most of pages get into atoms via > sys_write. But pages dirtied via shared mapping do not. They get into > atoms in reiser4's writepages address space operation. It think you mean ->writepage - reiser4 desn't implement ->writepages(). I assume you considered hooking into ->set_page_dirty() to do the add-to-atom thing earlier on? We'll merge mm-tracking-shared-dirty-pages.patch into 2.6.19-rc1, which would make that approach considerably more successful, I expect. ->set_page_dirty() is a bit awkward because it can be called under spinlock. Maybe comething could also be gained from the new vm_operations_struct.page_mkwrite(), although that's less obvious... > That is why > reiser4_sync_inodes has two steps: on first one it calls > generic_sync_sb_inodes to call writepages for dirty inodes to capture > pages dirtied via shared mapping into atoms. Second step flushes atoms. > > > > > > I agree --- both with it being ugly, and that being part of why. > > > > > If it > > >works, we can live with it, although perhaps the VFS could be made smarter. > > > > > > > > I would be curious regarding any ideas on that. Next time I read > > through that code, I will keep in mind that you are open to making VFS > > changes if it improves things, and I will try to get clever somehow and > > send it by you. Our squalloc code though is I must say the most > > complicated and ugliest piece of code I ever worked on for which every > > cumulative ugliness had a substantive performance advantage requiring us > > to keep it. If you spare yourself from reading that, it is > > understandable to do so. > > > > >I'd say that resier4's major problem is the lack of xattrs, acls and > > >direct-io. That's likely to significantly limit its vendor uptake. > > xattrs is really a problem. That's not good. The ability to properly support SELinux is likely to be important.
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
On Wed, 12 Jul 2006 20:51:47 -0700 Hans Reiser <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > >On Wed, 12 Jul 2006 12:42:28 -0400 > >Jeff Mahoney <[EMAIL PROTECTED]> wrote: > > > > > > > >> On systems with block devices containing slashes (virtual dasd, cciss, > >> etc), reiserfs will fail to initialize /proc/fs/reiserfs/ due to > >> it being interpreted as a subdirectory. The generic block device code > >> changes the / to ! for use in the sysfs tree. This patch uses that > >> convention. > >> > >> > > > >Isn't it a bit dumb of us to be putting slashes in the device names anyway? > > It would be better, if poss, to alter dasd/cciss/etc and stop all these > >s@/@[EMAIL PROTECTED] games. > > > > > > > > > Isn't better to ask why there is a problem with the /'s? It would be > bad for Linux as a design to prevent passing arbitrary tail ends of > filenames off to arbitrary plugins of some kind. In general, in > namespace design, you want to allow delegating the job of > resolving/interpreting the tail end of a file that the front end has > identified as something that can interpret it. > > Forgive me, I probably understand something wongly about procfs and this > issue > It's a question of being *practical*. Your observations are in the realm of the theoretical. Software sucks, and we get along better by not provoking it. So don't put spaces, let alone slashes into strings which we offer to userspace.
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
On Wed, 12 Jul 2006 12:42:28 -0400 Jeff Mahoney <[EMAIL PROTECTED]> wrote: > On systems with block devices containing slashes (virtual dasd, cciss, > etc), reiserfs will fail to initialize /proc/fs/reiserfs/ due to > it being interpreted as a subdirectory. The generic block device code > changes the / to ! for use in the sysfs tree. This patch uses that > convention. Isn't it a bit dumb of us to be putting slashes in the device names anyway? It would be better, if poss, to alter dasd/cciss/etc and stop all these s@/@[EMAIL PROTECTED] games.
Re: [PATCH] reiserfs:fix journaling issue regarding fsync()
On Mon, 03 Jul 2006 10:31:49 +0900 Hisashi Hifumi <[EMAIL PROTECTED]> wrote: > When write() extends a file(i_size is increased) and fsync() is called, > change of inode must be written to journaling area through fsync(). > But,currently the i_trans_id is not correctly updated when i_size > is increased. So fsync() does not kick the journal writer. > > Reiserfs_file_write() already updates the transaction when blocks are > allocated, > but the case when i_size increases and new blocks are not added is not > correctly treated. How can i_size be increased without adding blocks? Are you referring to a write which remains wholly within the final block of the file? And/or to an expanding lseek? And what are the user-visible consequences of this bug? Thanks.
Re: [PATCH 03/04] reiserfs: reorganize bitmap loading functions
Jeff Mahoney <[EMAIL PROTECTED]> wrote: > > + for (i = 0; i < SB_BMAP_NR(sb); i++) > + bitmap[i].bh = reiserfs_read_bitmap_block(sb, i); Can return NULL. > + /* make sure we have them all */ > + for (i = 0; i < SB_BMAP_NR(sb); i++) { > + wait_on_buffer(bitmap[i].bh); Will go oops.
Re: [PATCH] Fix reiserfs deadlock
Jan Kara <[EMAIL PROTECTED]> wrote: > > reiserfs_cache_default_acl() should return whether we successfully found the > acl or not. We have to return correct value even if reiserfs_get_acl() returns > error code and not just 0. Otherwise callers such as reiserfs_mkdir() can > unnecessarily lock the xattrs and later functions such as reiserfs_new_inode() > fail to notice that we have already taken the lock and try to take it again > with obvious consequences. > > Signed-off-by: Jan Kara <[EMAIL PROTECTED]> > > diff -rupX /home/jack/.kerndiffexclude > linux-2.6.5-SLES9_SP3_BRANCH/fs/reiserfs/xattr_acl.c > linux-2.6.5-SLES9_SP3_BRANCH-1-reiser_xattr_fix/fs/reiserfs/xattr_acl.c > --- linux-2.6.5-SLES9_SP3_BRANCH/fs/reiserfs/xattr_acl.c 2006-01-21 > 03:02:06.0 +0100 > +++ linux-2.6.5-SLES9_SP3_BRANCH-1-reiser_xattr_fix/fs/reiserfs/xattr_acl.c > 2006-01-21 09:09:04.0 +0100 > @@ -410,8 +410,10 @@ reiserfs_cache_default_acl (struct inode > acl = reiserfs_get_acl (inode, ACL_TYPE_DEFAULT); > reiserfs_read_unlock_xattrs (inode->i_sb); > reiserfs_read_unlock_xattr_i (inode); > -ret = acl ? 1 : 0; > + if (!acl || IS_ERR(acl)) > + return 0; > posix_acl_release (acl); > + ret = 1; > } > > return ret; Your email client did something ghastly to this patch. Please fix it. I typed it in again. While I was there I structured it a bit more nicely. --- 25/fs/reiserfs/xattr_acl.c~fix-reiserfs-deadlockFri Apr 21 15:15:10 2006 +++ 25-akpm/fs/reiserfs/xattr_acl.c Fri Apr 21 15:17:40 2006 @@ -408,8 +408,9 @@ int reiserfs_cache_default_acl(struct in acl = reiserfs_get_acl(inode, ACL_TYPE_DEFAULT); reiserfs_read_unlock_xattrs(inode->i_sb); reiserfs_read_unlock_xattr_i(inode); - ret = acl ? 1 : 0; - posix_acl_release(acl); + ret = (acl && !IS_ERR(acl)); + if (ret) + posix_acl_release(acl); } return ret; _
Re: [PATCH] reiserfs: reiserfs_file_write will lose error code when a 0-length write occurs w/ O_SYNC
Boy, lots of reiserfs things happening lately. We presently have: reiserfs-do-not-check-if-unsigned-0.patch [ merged today ] reiserfs-fix-transaction-overflowing.patch reiserfs-handle-trans_id-overflow.patch reiserfs-reiserfs_file_write-will-lose-error-code-when-a-0-length-write-occurs-w-o_sync.patch reiserfs-cleanups.patch reiserfs-use-balance_dirty_pages_ratelimited_nr-in-reiserfs_file_write.patch reiserfs-fix-unaligned-bitmap-usage.patch The question is, which of these are sufficiently serious-and-safe for 2.6.16? I haven't seen any resierfs bug reports for quite some time (except for the usual dribble of it-goes-oops-in-prints.c-when-something-went-wrong reports). So I'm inclined to hold off on all the above?
Re: [Fwd: Re: [PATCH] reiserfs: use balance_dirty_pages_ratelimited_nr in reiserfs_file_write]
Hans Reiser <[EMAIL PROTECTED]> wrote: > > I suspect that when someone did the search and replace when creating > balance_dirty_pages_ratelimited_nr they failed to read the code and > realize this code path was already effectively ratelimited. The result > is they made it excessively infrequent (every 1MB if ratelimit is 8) in > its calling balance_dirty_pages. ?? There's been no change to balance_dirty_pages_ratelimited(). I merely widened the interface a bit: introduced the new balance_dirty_pages_ratelimited_nr() and did static inline void balance_dirty_pages_ratelimited(struct address_space *mapping) { balance_dirty_pages_ratelimited_nr(mapping, 1); } That being said, if reiserfs has `number of pages' in hand then yes, it makes sense to migrate over to balance_dirty_pages_ratelimited_nr(). > If anyone has ever seen this as an actual problem on a real system, I > would be curious to hear of it. No, I wouldn't expect it to make much difference. All that gunk is there just to avoid calling the expensive get_writeback_state() once per set_page_dirty(). Inaccuracy here will introduce the possibility that we'll transiently dirty more memory than dirty_ratio permits, but it'll only be a little bit (eight times the amount of memory which is dirtied per balance_dirty_pages_ratelimited() call). That's a small amount of memory. But if you have 1000 filesystems mounted and they all do the same thing at the same time, things could get a bit sticky. Your patch will (greatly) reduce the possibility of even that scenario.
Re: latest patches degrade reiser4 performance substantially
Hans Reiser <[EMAIL PROTECTED]> wrote: > > At the this time we have no idea which patch is responsible, probably in > a day or two we'll have a patch to fix it. > OK. I assume this performance change is demonstrable in just 2.6.14-rc2+reiser4? Beware that there are other changes in the -mm lineup which might cause regressions. Notably mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch and per-task-predictive-write-throttling-1.patch per-task-predictive-write-throttling-1-tweaks.patch
Re: + reiser4-big-update-div64-fix.patch added to -mm tree
"Vladimir V. Saveliev" <[EMAIL PROTECTED]> wrote: > > Hello > > [EMAIL PROTECTED] wrote: > > The patch titled > > > > reiser4-big-update-div64-fix > > > > has been added to the -mm tree. Its filename is > > > > reiser4-big-update-div64-fix.patch > > > > > > From: Andrew Morton <[EMAIL PROTECTED]> > > > > There's no implementation of __div64_32(). Nor should there be. > > > > hmm, there is one in linux/lib/div64.c. > It's only for supporting compiler-generated calls on 32-bit platforms: /* Not needed on 64bit architectures */ #if BITS_PER_LONG == 32 uint32_t __div64_32(uint64_t *n, uint32_t base)
Re: List of things requested by lkml for reiser4 inclusion (to review)
Hans Reiser <[EMAIL PROTECTED]> wrote: > > ... > > Do I remember right that the submission > deadline is a week from Monday for 2.6.14 inclusion? Next week, supposedly. But something like a brand new filesystem can go in pretty much any time, as long as it compiles. Because it can't break anyone's current setup. Brand new filesystems which also require monkeying with the core VFS aren't quite that straightforward though. > This is supposed to be a bullet list, so I don't list here the line by > line minor code improvements sent to us, most of which were > incorporated, but let me take a moment to thanks those who donated them. Guys, you should know by now not to send 1021-column emails :( > 1. pseudo files or "" files > > disabled. It remains a point of (extraordinary) contention as to > whether it can be fixed, we want to keep the code around until we can > devote proper resources into proving it can be (or until we fail to prove > it can be and remove it). We don't want to delay the rest of the code for > that proof, but we still think it can be done (by several different ways of > which we need to select one and make it work.) Let us postpone contention > on this until the existence of a patch that cannot crash makes contention > purposeful, shall we? I'd prefer that unused code simply not be present in the tree, sorry. > > > 2. dependency on 4k stack turned off > >removed as requested So it all runs OK with 4k stacks now? > 3. remove conditional variable code, use wait queues instead. > > not done. There are times when reduced functionality aids debugging. > kcond is (literally) textbook code. We don't care enough to fight much for > it, but akpm, what is your opinion? Will remove if akpm asks us to. kcond is only used in a couple of places. One looks like it could use complete() and the other is a standard wait-for-something-to-do kernel thread loop, which we open-code without any fuss in lots of places (kjournald, loop, pdflush, etc). So yes, I'd be inclined to remove kcond please. Also, it would be better to use the kthread API rather than open-coding kernel_thread() calls. If you think that reiser4 needs additional ways of controlling kernel threads then feel free to enhance the kthread API. > 6. remove type safe lists and type safe hash queues. > > not done, it is not clear that the person asking for this represents a > unified consensus of lkml. Other persons instead asked that it just be > moved out of reiser4 code into the generic kernel code, which implies they > did not object to it. There are many who like being type safe. Akpm, what > do you yourself think? The type-unsafety of existing list_heads gives me conniptions too. Yes, it'd be nice to have a type-safe version available. That being said, I don't see why such a thing cannot be a wrapper around the existing list_head functions. Yes, there will be some ghastly C-templates-via-CPP stuff, best avoided by not looking at the file ;) We should aim for a complete 1:1 relationship between list_heads and type-safe lists. So people know what they're called, know how they work, etc. We shouldn't go adding things called rx_event_list_pop_back() when everyone has learned the existing list API. Of course, it would have been better to do this work as a completely separate kernel feature rather than bundling it with a filesystem. If this isn't a thing your team wants to take on now then yes, I'd be inclined to switch reiser4 to list_heads. > > 7. remove fs/reiser4/lib.h:/div64_32. > >is being replaced by the linux one. > > 8. Remove all assertions because they clutter the code and make it hard to > read > > > We think this person was not an experienced security specialist, We think this person didn't submit a patch to remove the 124 assertions from ext3 ;) Keep the assertions.
Re: reiser4 merge status
Hans Reiser <[EMAIL PROTECTED]> wrote: > > Andrew, we have been working on a big patch that resolves the VFS > layering issues. This patch should be ready to send in any day now, we > are debugging it. We may not have been sending you a lot of emails, but > we have been working away at it. That's good news. > I don't understand why you think the > review feedback was lost. Because I never saw the condensed list of open issues. > I will have Zam review the email history, develop the bullet list for > you, and send it. OK, thanks, it's important. Lots of people spent time looking over and discussing the code. They don't want to have to a) go back and re-read everything to remember what their concerns were and b) scratch through the new code to find out if their concerns were addressed, and how they were addressed. So I'd view Zam's list as pretty important.
Re: reiser4 merging action list
Hans Reiser <[EMAIL PROTECTED]> wrote: > > > Andrew asked me to put together a list of things that need to be done > before merging: Thanks. As I said to Hans, if we can get a list of bullet-point actions nailed down and agreed to then we have an uncontroversial path to happiness and a merge. Let's get down and concentrate on technical specifics. Hans, please maintain this list and republish it as we work through things. > * VFS will dispatch directly to the method of the plugin for the > *_operations methods. This requires duplicating to all plugin methods > the common code currently used by all reiser4 plugins for a given > method. It has the desirable side effect of making the methods more > fully self-contained, which is somethng I had wanted two years ago and > was a little sad to not get, and the cost of duplicating some code. > Since not all plugin methods are *_operations, it means we have two > structures with duplicated data, and duplicate data that must be in sync > at all times is classical badness in programming technique (see Codd and > normalization). vs owns this task > > * review all sparse complaints, and revise as appropriate. > > * panic and code beauty: everyone agrees that having function, file, > and line added to reiser4_panic output hurts nothing (I hope). Everyone > agrees that restarting the machine without an error message seems like a > useless option to allow. Much else was argued, not sure if anything > was a consensus view. Various detail improvements were suggested by > Pecca, and I agreed with half of them. > > >* metafiles should be disabled until we can present code that works > right. Half the list thinks we cannot solve the cycles problem ever. > Disable metafiles and postpone problem until working code, or the > failure to produce it, makes it possible to do more than rant at each > other. This is currently already done in the -mm patches, but is > mentioned lest someone think it forgotten. > >* update the locking documentation > There's also the custom list, hash and debug code. We should either a) remove them or b) generify them and submit as standalone works or c) justify them as custom-to-reiser4 and leave them as-is.
Re: -mm -> 2.6.13 merge status
Jeff Mahoney <[EMAIL PROTECTED]> wrote: > > >>+ assert("nikita-955", pool != NULL); > > > > These assertion codes are meaningless to the rest of us so please drop > > them. > > As someone who spends time debugging reiser3 issues, I've grown > accustomed to the named assertions. They make discussing a particular > assertion much more natural in conversation than file:line. __FUNCTION__?
Re: reiser4 plugins
Hans Reiser <[EMAIL PROTECTED]> wrote: > > What is wrong with having an encryption plugin implemented in this > manner? What is wrong with being able to have some files implemented > using a compression plugin, and others in the same filesystem not. > > What is wrong with having one file in the FS use a write only plugin, in > which the encrypion key is changed with every append in a forward but > not backward computable manner, and in order to read a file you must > either have a key that is stored on another computer or be reading what > was written after the moment of cracking root? > > What is wrong with having a set of critical data files use a CRC > checking file plugin? I think the concern here is that this is implemented at the wrong level. In Linux, a filesystem is some dumb thing which implements address_space_operations, filesystem_operations, etc. Advanced features such as those which you describe are implemented on top of the filesystem, not within it. reiser4 turns it all upside down. Now, some of the features which you envision are not amenable to above-the-fs implementations. But some will be, and that's where we should implement those.
Re: reiser4 read performance loss of 17% in latest -mm kernel
Hans Reiser <[EMAIL PROTECTED]> wrote: > > Just an FYI, we are still figuring it out. andrew, you might note it in > release notes, especially if we don't fix it before your next release. I'd try reverting simplified-readahead*.patch.
Re: [Fwd: [PATCH] reiser4: fix a use after free bug in reiser4_parse_options]
Laurent Riffard <[EMAIL PROTECTED]> wrote: > > This patch fix a "use after kfree" bug in reiser4_parse_options. I'll add that to -mm. In future, please don't allow your email client to wordwrap patches. Thanks.
Re: [PATCH] Fix of quota deadlock on pagelock
Jan Kara <[EMAIL PROTECTED]> wrote: > > > > > What prevents vfs_quota_off() from racing with unmount? > > > > If you look at, say, sync_filesystems() you'll see that we take ->s_umount > > and then test ->s_root to check that we didn't race with an unmount > > attempt. > I thought that the sync_fs and stuff is guarded by the fact that quota > code is holding references to inodes on the filesystem. And invalidate_bdev() > should not care about the filesystem. Or do I miss something? Anyway > adding a comment about this is really a good idea... Holding a ref on an inode will not cause umount to block. It'll just cause nasty "Busy inodes after unmount" warnings.
Re: [PATCH] Fix of quota deadlock on pagelock
Jan Kara <[EMAIL PROTECTED]> wrote: > > +static ssize_t ext2_quota_write(struct super_block *sb, int type, > + const char *data, size_t len, loff_t off) > +{ > + struct inode *inode = sb_dqopt(sb)->files[type]; > + unsigned long blk = off >> EXT2_BLOCK_SIZE_BITS(sb); > + int err = 0, offset = off & (sb->s_blocksize - 1), tocopy; > + size_t towrite = len; > + struct buffer_head tmp_bh, *bh; > + > + down(&inode->i_sem); > + while (towrite > 0) { > + tocopy = sb->s_blocksize - offset < towrite ? > + sb->s_blocksize - offset : towrite; > + > + tmp_bh.b_state = 0; > + err = ext2_get_block(inode, blk, &tmp_bh, 1); > + if (err) > + goto out; > + if (offset || tocopy != EXT2_BLOCK_SIZE(sb)) > + bh = sb_bread(sb, tmp_bh.b_blocknr); > + else > + bh = sb_getblk(sb, tmp_bh.b_blocknr); > + if (!bh) { > + err = -EIO; > + goto out; > + } > + memcpy(bh->b_data+offset, data, tocopy); It is possible to mmap block devices, so we should do a flush_dcache_page() after the memset here. Also, we should lock the buffer. Because a concurrent read of the blockdev would cause the disk DMA transfer to stomp all over the data whihc you just wrote. Say, someone is doing `cp /dev/hda1 /dev/null' at the same time. So my accumulated patch against this one is: diff -puN fs/ext2/super.c~fix-of-quota-deadlock-on-pagelock-ext2-tweaks fs/ext2/super.c --- 25/fs/ext2/super.c~fix-of-quota-deadlock-on-pagelock-ext2-tweaksFri Nov 19 14:54:47 2004 +++ 25-akpm/fs/ext2/super.c Fri Nov 19 15:00:59 2004 @@ -1020,10 +1020,13 @@ static ssize_t ext2_quota_read(struct su size_t len, loff_t off) { struct inode *inode = sb_dqopt(sb)->files[type]; - unsigned long blk = off >> EXT2_BLOCK_SIZE_BITS(sb); - int err = 0, offset = off & (sb->s_blocksize - 1), tocopy; + sector_t blk = off >> EXT2_BLOCK_SIZE_BITS(sb); + int err = 0; + int offset = off & (sb->s_blocksize - 1); + int tocopy; size_t toread; - struct buffer_head tmp_bh, *bh; + struct buffer_head tmp_bh; + struct buffer_head *bh; loff_t i_size = i_size_read(inode); if (off > i_size) @@ -1061,10 +1064,13 @@ static ssize_t ext2_quota_write(struct s const char *data, size_t len, loff_t off) { struct inode *inode = sb_dqopt(sb)->files[type]; - unsigned long blk = off >> EXT2_BLOCK_SIZE_BITS(sb); - int err = 0, offset = off & (sb->s_blocksize - 1), tocopy; + sector_t blk = off >> EXT2_BLOCK_SIZE_BITS(sb); + int err = 0; + int offset = off & (sb->s_blocksize - 1); + int tocopy; size_t towrite = len; - struct buffer_head tmp_bh, *bh; + struct buffer_head tmp_bh; + struct buffer_head *bh; down(&inode->i_sem); while (towrite > 0) { @@ -1083,9 +1089,12 @@ static ssize_t ext2_quota_write(struct s err = -EIO; goto out; } + lock_buffer(bh); memcpy(bh->b_data+offset, data, tocopy); + flush_dcache_page(&bh->b_page); set_buffer_uptodate(bh); mark_buffer_dirty(bh); + unlock_buffer(bh); brelse(bh); offset = 0; towrite -= tocopy; _ I suspect things in there are still a bit racy against a concurrent `blockdev --flushbufs' though.
Re: [PATCH] Fix of quota deadlock on pagelock
Jan Kara <[EMAIL PROTECTED]> wrote: > > +static ssize_t ext2_quota_read(struct super_block *sb, int type, char *data, > +size_t len, loff_t off) > +{ > + struct inode *inode = sb_dqopt(sb)->files[type]; > + unsigned long blk = off >> EXT2_BLOCK_SIZE_BITS(sb); `blk' should have sector_t type. > + int err = 0, offset = off & (sb->s_blocksize - 1), tocopy; Please do int err = 0; int offset = off & (sb->s_blocksize - 1); int tocopy; I'll make the above changes.
Re: [PATCH] Fix of quota deadlock on pagelock
What prevents vfs_quota_off() from racing with unmount? If you look at, say, sync_filesystems() you'll see that we take ->s_umount and then test ->s_root to check that we didn't race with an unmount attempt.
Re: 2.6.9-mm1
"Hilzinger Marcel" <[EMAIL PROTECTED]> wrote: > > SuSE Linux 9.2 will contain reiser4 hm. Nobody ever tells me anything. Does that mean that SuSE are using 8k stacks?
Re: silent semantic changes with reiser4
Spam <[EMAIL PROTECTED]> wrote: > >Secondly, do you expect file managers like Nautilus and Konqueror to >support every piece of file format on the planet so they could read >information directly from the documents? Sure. You're proposing that we implement a single, golden multi-stream file format in the kernel. We could just as well do that in libmultistreamfileformat.so. But I'll grant that one cannot go adding new metadata to, say, C files this way. I don't know how useful such a thing is though. Remember that my main point is that there's a lack of coordination in userspace. Hell, there's none. Putting it in-kernel forces that coordination, and may be the way to go, but it's pretty sad.
Re: I oppose Chris and Jeff's patch to add an unnecessary additional namespace to ReiserFS
Hey, I was reading that! Please do *not* go making modifications to Cc: lists. Just do reply-to-all and be happy, thanks.
Re: reiserfs logging patches udpated
Chris Mason <[EMAIL PROTECTED]> wrote: > > On Wed, 2004-03-24 at 17:35, Andrew Morton wrote: > > > > > For data=ordered? The only docs are to mount -o data=writeback if you > > > don't want data=ordered (which is the new default). No tool upgrades > > > are required. > > > > OK, thanks. Switching the default on day one sounds radical doesn't it? > > > It does, but the code has been in testing -suse and on the reiserfs > list. This dooesn't mean data=ordered is perfect, but it's not quite > day one either. I can switch the default back, but I'd rather have a > trial by fire ;-) OK. > > > reiserfs needs a Documentation/filesystems/reiserfs.txt, but it needs > > > that in general. I'll write one up and have the namesys guys review. > > > > OK. And these guys need boring old changelogs please: > > > The top of each patch has a boring old changelog. I can reformat them > if needed. Oh, I didn't notice that. Anything other than one-patch-per-email with changelog in the body is a bit of a pain. I'll go fetch the patches again :(
Re: reiserfs logging patches udpated
Chris Mason <[EMAIL PROTECTED]> wrote: > > On Wed, 2004-03-24 at 17:18, Andrew Morton wrote: > > Chris Mason <[EMAIL PROTECTED]> wrote: > > > > > > ftp.suse.com/pub/people/mason/patches/data-logging/experimental/2.6.5-rc2-mm2 > > > > > > Has a new set of reiserfs patches. > > > > -ENODOCCO. If people are going to test this stuff we will need setup and > > usage instructions, links to userspace tool upgrades, etc, etc. > > For data=ordered? The only docs are to mount -o data=writeback if you > don't want data=ordered (which is the new default). No tool upgrades > are required. OK, thanks. Switching the default on day one sounds radical doesn't it? > reiserfs needs a Documentation/filesystems/reiserfs.txt, but it needs > that in general. I'll write one up and have the namesys guys review. OK. And these guys need boring old changelogs please: reiserfs-nesting-02.patch reiserfs-journal-writer.patch reiserfs-logging.patch reiserfs-jh-2.patch reiserfs-prealloc.patch reiserfs-tail-jh.patch reiserfs-writepage-ordered-race.patch reiserfs-file_write_hole_sd.diff.patch reiserfs-laptop-mode.patch reiserfs-truncate-leak.patch reiserfs-ordered-lat.patch reiserfs-dirty-warning.patch
Re: reiserfs logging patches udpated
Chris Mason <[EMAIL PROTECTED]> wrote: > > ftp.suse.com/pub/people/mason/patches/data-logging/experimental/2.6.5-rc2-mm2 > > Has a new set of reiserfs patches. -ENODOCCO. If people are going to test this stuff we will need setup and usage instructions, links to userspace tool upgrades, etc, etc.
Re: Filesystem Tests
Mike Fedyk <[EMAIL PROTECTED]> wrote: > > On Wed, Aug 06, 2003 at 06:34:10PM +0200, Diego Calleja Garc?a wrote: > > El Wed, 06 Aug 2003 18:06:37 +0400 Hans Reiser <[EMAIL PROTECTED]> escribi?: > > > > > I don't think ext2 is a serious option for servers of the sort that > > > Linux specializes in, which is probably why he didn't measure it. > > > > Why? > > Because if you have a power outage, or a crash, you have to run the > filesystem check tools on it or risk damaging it further. > > Journaled filesystems have a much smaller chance of having problems after a > crash. Journalled filesytems have a runtime cost, and you're paying that all the time. If you're going 200 days between crashes on a disk-intensive box then using a journalling fs to save 30 minutes at reboot time just doesn't stack up: you've lost much, much more time than that across the 200 days. It all depends on what the machine is doing and what your max downtime requirements are.
Re: Filesystem Tests
Grant Miner <[EMAIL PROTECTED]> wrote: > > I tested the performace of various filesystems with a mozilla build tree > of 295MB, with primarily writing and copying operations. The test > system is Linux 2.6.0-test2, 512MB memory, 11531.85MB partition for > tests. Sync is run a few times throughout the test (for full script see > bottom of this email). I ran mkfs on the partition before every test. > Running the tests again tends to produces similar times, about +/- 3 > seconds. > > The first item number is time, in seconds, to complete the test (lower > is better). The second number is CPU use percentage (lower is better). > > reiser4 171.28s, 30%CPU (1.x time; 1.0x CPU) > reiserfs 302.53s, 16%CPU (1.7663x time; 0.53x CPU) > ext3 319.71s, 11%CPU (1.8666x time; 0.36x CPU) > xfs 429.79s, 13%CPU (2.5093x time; 0.43x CPU) > jfs 470.88s, 6%CPU (2.7492x time 0.02x CPU) But different filesystems will leave different amounts of dirty, unwritten data in memory at the end of the test. On your machine, up to 200MB of dirty data could be sitting there in memory at the end of the timing interval. You need to decide how to account for that unwritten data in the measurement. Simply ignoring it as you have done is certainly valid, but is only realistic in a couple of scenarios: a) the files are about the be deleted again b) the application which your benchmark simulates is about to spend more than 30 seconds not touching the disk. This discrepancy is especially significant with ext3 which, in ordered data mode, will commit all that data every five seconds. If the test takes more than five seconds then ext3 can appear to take a _lot_ longer. But it is somewhat artificial: that data has to be written out sometime. Solutions to this inaccuracy are to make the test so long-running (ten minutes or more) that the difference is minor, or to include the `sync' in the time measurement. And when benching things, please include ext2. It is the reference filesystem, as it were. It tends to be the fastest, too.
[reiserfs-list] Re: [patch] truncate_inode_pages
Dieter Nützel wrote: > > Hello Andrew, > > your patch is in 2.4.6-pre3 but do not appear in Alan's 2.4.5-ac14. He'll pick it up next time he syncs with Linus. > Is it not needed any longer since the latest VM fixes? It's needed. It doesn't fix any bugs - it just speeds up some uncommon cases by few thousand times.