Re: per bdi dirty balancing (was Re: kupdate weirdness)
> > My plan is to extract the minimal set of features from your patchset,
> > that solves the dirty balancing deadlocks and submit them as quickly
> > as possible.
>
> I had hoped to post a new version yesterday, but let's hope for today.

Would be cool.

> > After that we can look at trying to solve the more ambitious problem
> > of the slow vs. fast devices in a way that not only you can understand ;)
>
> Drat, and here I thought all that documentation in the proportions lib
> would have solved that :-(

Well, I didn't get that far, and only had a glimpse of the proportions
lib.  But my hunch is that there's still lots of room for simplification.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: per bdi dirty balancing (was Re: kupdate weirdness)
On Fri, 2007-08-03 at 08:43 +0200, Miklos Szeredi wrote:
> (cc restored)
>
> > > > There were heaps of problems in there and it is surprising how few people
> > > > were hitting them.  Ordered-mode journalling filesystems will fix it all up
> > > > behind the scenes, of course.
> > > >
> > > > I just have a bad feeling about that code - list_heads are the wrong data
> > > > structure and it all needs to be ripped and redone using some indexable
> > > > data structure.  There has been desultory discussion, but nothing's
> > > > happening and nothing will happen in the medium term, so we need to keep
> > > > on whapping bandaids on it.
> > >
> > > The reason why I'm looking at that code is because of those
> > > balance_dirty_pages() deadlocks.  I'm not perfectly happy with the
> > > per-bdi-per-cpu counters Peter's patch is introducing.
> >
> > What is your biggest concern regarding them?
>
> Complexity.  I've started to review the patches, and they are just too
> damn complex.
>
> For example introducing the backing_dev_info initializer and
> destructor adds potential bugs if we forget to add them somewhere.

yeah, that was/is a pain.

> Now maybe this is unavoidable.  I'm just trying to look for a solution
> involving fewer uncertainties and complexities.
>
> My plan is to extract the minimal set of features from your patchset,
> that solves the dirty balancing deadlocks and submit them as quickly
> as possible.

I had hoped to post a new version yesterday, but let's hope for today.

> After that we can look at trying to solve the more ambitious problem
> of the slow vs. fast devices in a way that not only you can understand ;)

Drat, and here I thought all that documentation in the proportions lib
would have solved that :-(
per bdi dirty balancing (was Re: kupdate weirdness)
(cc restored)

> > > There were heaps of problems in there and it is surprising how few people
> > > were hitting them.  Ordered-mode journalling filesystems will fix it all up
> > > behind the scenes, of course.
> > >
> > > I just have a bad feeling about that code - list_heads are the wrong data
> > > structure and it all needs to be ripped and redone using some indexable
> > > data structure.  There has been desultory discussion, but nothing's
> > > happening and nothing will happen in the medium term, so we need to keep
> > > on whapping bandaids on it.
> >
> > The reason why I'm looking at that code is because of those
> > balance_dirty_pages() deadlocks.  I'm not perfectly happy with the
> > per-bdi-per-cpu counters Peter's patch is introducing.
>
> What is your biggest concern regarding them?

Complexity.  I've started to review the patches, and they are just too
damn complex.

For example introducing the backing_dev_info initializer and
destructor adds potential bugs if we forget to add them somewhere.

Now maybe this is unavoidable.  I'm just trying to look for a solution
involving fewer uncertainties and complexities.

My plan is to extract the minimal set of features from your patchset,
that solves the dirty balancing deadlocks and submit them as quickly
as possible.

After that we can look at trying to solve the more ambitious problem
of the slow vs. fast devices in a way that not only you can understand ;)

How's that?

Miklos
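To make the "per-bdi-per-cpu counters" and the initializer/destructor concern a little more concrete, here is a toy Python model of a batched per-CPU counter. It assumes the counters in Peter's patch work along the lines of the kernel's percpu_counter: each CPU keeps a private delta and folds it into a shared total only once the delta reaches a batch size. The class and method names are hypothetical. In C the per-CPU part is a separate allocation, which is one reason each counter, and therefore each backing_dev_info that embeds one, needs an explicit init and a matching destroy.

```python
class PerCpuCounter:
    """Toy batched per-CPU counter; all names here are hypothetical."""

    def __init__(self, n_cpus, batch):
        # Setup step: in C this would allocate the per-CPU array, and a
        # matching destroy call would have to free it again.
        self.total = 0                # shared, cheap-to-read value
        self.batch = batch
        self.local = [0] * n_cpus     # one private pending delta per CPU

    def add(self, cpu, delta):
        """Update this CPU's delta; fold it into the total once it reaches batch."""
        self.local[cpu] += delta
        if abs(self.local[cpu]) >= self.batch:
            self.total += self.local[cpu]
            self.local[cpu] = 0

    def read_approx(self):
        """Fast read: may be off by up to n_cpus * (batch - 1)."""
        return self.total

    def read_exact(self):
        """Slow read: add in every CPU's pending delta."""
        return self.total + sum(self.local)


c = PerCpuCounter(n_cpus=2, batch=4)
for _ in range(3):
    c.add(0, 1)
c.add(1, 2)
print(c.read_approx(), c.read_exact())  # 0 5
c.add(0, 1)                             # CPU 0 reaches the batch and folds
print(c.read_approx(), c.read_exact())  # 4 6
```

The trade-off shown here: add() touches only CPU-local state most of the time, but the cheap read lags behind, and an exact value means visiting every CPU.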
Re: kupdate weirdness
> There were heaps of problems in there and it is surprising how few people
> were hitting them.  Ordered-mode journalling filesystems will fix it all up
> behind the scenes, of course.
>
> I just have a bad feeling about that code - list_heads are the wrong data
> structure and it all needs to be ripped and redone using some indexable
> data structure.  There has been desultory discussion, but nothing's
> happening and nothing will happen in the medium term, so we need to keep
> on whapping bandaids on it.

The reason why I'm looking at that code is because of those
balance_dirty_pages() deadlocks.  I'm not perfectly happy with the
per-bdi-per-cpu counters Peter's patch is introducing.

I was wondering if we can count the number of writeback pages through
the radix tree, just like we do for dirty pages?  All that would be
needed is to keep the under-writeback inodes on some list as well.

But I realize that this introduces its own problems as well...

Miklos
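The suggestion of counting writeback pages through the radix tree can be sketched as a toy Python model. It assumes a per-page tag set stands in for the page-cache radix tree's dirty and writeback tags, and that the backing device keeps a list of the mappings that currently have pages under writeback (the extra list mentioned above). Mapping, Bdi, start_writeback and end_writeback are hypothetical names, not kernel interfaces.

```python
from dataclasses import dataclass, field

DIRTY, WRITEBACK = "dirty", "writeback"


@dataclass(eq=False)  # compare mappings by identity, like pointers
class Mapping:
    """Toy page cache: page index -> set of tags (stand-in for radix-tree tags)."""
    tags: dict = field(default_factory=dict)

    def set_tag(self, index, tag):
        self.tags.setdefault(index, set()).add(tag)

    def clear_tag(self, index, tag):
        self.tags.get(index, set()).discard(tag)

    def count_tagged(self, tag):
        return sum(1 for page_tags in self.tags.values() if tag in page_tags)


@dataclass(eq=False)
class Bdi:
    """Hypothetical backing device that lists mappings with pages under writeback."""
    writeback_mappings: list = field(default_factory=list)

    def nr_writeback(self):
        # No per-CPU counters, but every call walks every listed mapping.
        return sum(m.count_tagged(WRITEBACK) for m in self.writeback_mappings)


def start_writeback(bdi, mapping, index):
    mapping.clear_tag(index, DIRTY)
    mapping.set_tag(index, WRITEBACK)
    if mapping not in bdi.writeback_mappings:
        bdi.writeback_mappings.append(mapping)


def end_writeback(bdi, mapping, index):
    mapping.clear_tag(index, WRITEBACK)
    # Drop the mapping from the list once its last writeback page completes.
    if mapping.count_tagged(WRITEBACK) == 0 and mapping in bdi.writeback_mappings:
        bdi.writeback_mappings.remove(mapping)


bdi, a = Bdi(), Mapping()
for i in range(4):
    a.set_tag(i, DIRTY)
start_writeback(bdi, a, 0)
start_writeback(bdi, a, 1)
print(bdi.nr_writeback(), a.count_tagged(DIRTY))  # 2 2
```

The count needs no per-CPU state, but every read walks the listed mappings and the list has to be maintained at every writeback start and end, which is one obvious cost of this approach.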
Re: kupdate weirdness
On Thu, 02 Aug 2007 17:52:39 +0200
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> > > The following strange behavior can be observed:
> > >
> > > 1. large file is written
> > > 2. after 30 seconds, nr_dirty goes down by 1024
> > > 3. then for some time (< 30 sec) nothing happens (disk idle)
> > > 4. then nr_dirty again goes down by 1024
> > > 5. repeat from 3. until whole file is written
> > >
> > > So basically a 4Mbyte chunk of the file is written every 30 seconds.
> > > I'm quite sure this is not the intended behavior.
> > >
> > > The reason seems to be that __sync_single_inode() will move the
> > > partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> > > will not splice it back onto s_io until the rest of the inodes on s_io
> > > have been processed.
> >
> > It does all sorts of weird crap.
> >
> > > Since there will probably be a recently dirtied inode on s_io, this
> > > will take some time, but always less than 30 sec.
> > >
> > > I don't know what's the easiest solution.
> > >
> > > Any ideas?
> >
> > Try 2.6.23-rc1-mm2.
>
> Much better, but still not perfect.

I've kinda lost track of the status of all these patches.  I _think_ Ken
has identified a remaining problem even after his
writeback-fix-periodic-superblock-dirty-inode-flushing.patch, but maybe I
misremember.

Ken, can you remind us of the status there, please?

> Now it writes out 1024 pages after 30 seconds and then the rest after
> another 30s.

Bah.

> If my analysis is correct, this is because when it first gets onto
> s_io other inodes will get there too (with up to 30s later dirtying
> time), and the contents of s_more_io won't be recycled until the
> current contents of s_io are processed.
>
> Maybe this is OK, the previous weird stuff didn't seem to bother a lot
> of people either.

There were heaps of problems in there and it is surprising how few people
were hitting them.  Ordered-mode journalling filesystems will fix it all up
behind the scenes, of course.

I just have a bad feeling about that code - list_heads are the wrong data
structure and it all needs to be ripped and redone using some indexable
data structure.  There has been desultory discussion, but nothing's
happening and nothing will happen in the medium term, so we need to keep
on whapping bandaids on it.
Re: kupdate weirdness
> > The following strange behavior can be observed:
> >
> > 1. large file is written
> > 2. after 30 seconds, nr_dirty goes down by 1024
> > 3. then for some time (< 30 sec) nothing happens (disk idle)
> > 4. then nr_dirty again goes down by 1024
> > 5. repeat from 3. until whole file is written
> >
> > So basically a 4Mbyte chunk of the file is written every 30 seconds.
> > I'm quite sure this is not the intended behavior.
> >
> > The reason seems to be that __sync_single_inode() will move the
> > partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> > will not splice it back onto s_io until the rest of the inodes on s_io
> > have been processed.
>
> It does all sorts of weird crap.
>
> > Since there will probably be a recently dirtied inode on s_io, this
> > will take some time, but always less than 30 sec.
> >
> > I don't know what's the easiest solution.
> >
> > Any ideas?
>
> Try 2.6.23-rc1-mm2.

Much better, but still not perfect.

Now it writes out 1024 pages after 30 seconds and then the rest after
another 30s.

If my analysis is correct, this is because when it first gets onto
s_io other inodes will get there too (with up to 30s later dirtying
time), and the contents of s_more_io won't be recycled until the
current contents of s_io are processed.

Maybe this is OK, the previous weird stuff didn't seem to bother a lot
of people either.

Miklos
Re: kupdate weirdness
On Wed, Aug 01, 2007 at 10:45:16PM +0200, Miklos Szeredi wrote:
> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
>
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> will not splice it back onto s_io until the rest of the inodes on s_io
> have been processed.

It's been doing this for a long time.

http://marc.info/?l=linux-kernel&m=113919849421679&w=2

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: kupdate weirdness
On Wed, 01 Aug 2007 22:45:16 +0200
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
>
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> will not splice it back onto s_io until the rest of the inodes on s_io
> have been processed.

It does all sorts of weird crap.

> Since there will probably be a recently dirtied inode on s_io, this
> will take some time, but always less than 30 sec.
>
> I don't know what's the easiest solution.
>
> Any ideas?

Try 2.6.23-rc1-mm2.
kupdate weirdness
The following strange behavior can be observed:

1. large file is written
2. after 30 seconds, nr_dirty goes down by 1024
3. then for some time (< 30 sec) nothing happens (disk idle)
4. then nr_dirty again goes down by 1024
5. repeat from 3. until whole file is written

So basically a 4Mbyte chunk of the file is written every 30 seconds.
I'm quite sure this is not the intended behavior.

The reason seems to be that __sync_single_inode() will move the
partially written inode from s_io onto s_dirty, and sync_sb_inodes()
will not splice it back onto s_io until the rest of the inodes on s_io
have been processed.

Since there will probably be a recently dirtied inode on s_io, this
will take some time, but always less than 30 sec.

I don't know what's the easiest solution.

Any ideas?

Miklos
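The mechanism described above can be reproduced with a toy Python model. The timings are made up for illustration and are not read from any kernel: kupdate runs every 5 seconds, writes at most 1024 pages from an inode per pass, only touches inodes dirtied at least 30 seconds ago, and stops at the first inode that is still too young. A small file that something keeps re-dirtying plays the "recently dirtied inode on s_io". The names simulate, Inode, EXPIRE, INTERVAL and CHUNK are hypothetical; only s_io and s_dirty come from the report.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy parameters, not kernel defaults read from anywhere.
EXPIRE = 30     # seconds before a dirty inode is old enough for kupdate
INTERVAL = 5    # seconds between kupdate passes
CHUNK = 1024    # pages written from one inode per pass (4 MB of 4 KB pages)


@dataclass(eq=False)
class Inode:
    name: str
    dirtied_when: int
    pages: int


def simulate(big_pages, end=150):
    """Return (time, pages) for every chunk of the big file that gets written."""
    big = Inode("big", 0, big_pages)
    noise = Inode("noise", 0, 0)   # a small file that keeps getting re-dirtied
    s_dirty, s_io = deque([big]), deque()
    written = []
    for t in range(0, end + 1, INTERVAL):
        if noise.pages == 0:       # re-dirtied as soon as it has been cleaned
            noise.pages, noise.dirtied_when = 1, t
            s_dirty.append(noise)
        if not s_io:               # s_dirty is spliced onto s_io only when s_io is empty
            s_io.extend(s_dirty)
            s_dirty.clear()
        # Write expired inodes, oldest first; stop at the first young one.
        while s_io and s_io[0].dirtied_when <= t - EXPIRE:
            inode = s_io.popleft()
            n = min(CHUNK, inode.pages)
            inode.pages -= n
            if inode is big:
                written.append((t, n))
            if inode.pages:        # partially written: moved back onto s_dirty
                s_dirty.append(inode)
    return written


print(simulate(4096))  # [(30, 1024), (35, 1024), (70, 1024), (105, 1024)]
```

With these numbers, after the first two passes each 1024-page chunk sits on s_dirty while the young inode left on s_io waits to expire, so the file advances one chunk at a time with long idle gaps. The exact gaps depend entirely on the assumed timings.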