Andrew Deason <[email protected]> writes:

> If you want my opinion on what the _reason_ is, it's just that your high
> rate of pag generation and high rate of writes is more than the
> fileserver can handle, which is why I only ever see this stuff come from
> you (at least, to this degree).

But, of course, it's not only me. There are at least three sites that I know of that are seriously impacted by these sorts of reliability issues under load, and it's worth remembering that we only hear from a small percentage of sites.

> or, something in that area. I've mentioned a few times a few different
> changes that I think can alleviate some of this, but... I never heard
> anything back about them, so it didn't seem like you were interested or
> that it wasn't that important.

That's an unfortunate interpretation of delays in implementing configuration changes, and I'm sorry you got that impression. A better conclusion to draw is that it takes quite a bit of time to implement file server configuration changes in a large environment with a zero scheduled downtime requirement. We're currently still in the process of implementing the last round of suggestions. Each time you give us something new to try, it takes us several weeks to implement it, and we can't tell whether the problem has improved until after we do that and then observe behavior for several more weeks. That's one of the problems with trying to resolve these sorts of site-wide reliability issues.

Part of the problem here is that I'm not really supposed to be spending my time on trying to shepherd these problems through to resolution, because I'm not supposed to be primary on our production AFS cell. Stanford wants me to be doing other things and delegating that to other people, but they may not know what they need to be communicating to ensure that you understand what the situation is on our end.

> Yes, I can understand that, and I can understand why that is what you're
> advocating. But when I see you talk on this list, I see you as a
> gatekeeper, and so when I see objection to runtime options, I see that
> as something that will become "openafs.org policy" or something if I
> don't object.

And indeed you're not wrong about the crossover between these opinions and the obligation I feel as a gatekeeper to advocate for design principles that I believe will make OpenAFS better. I believe that reliability and robustness are more important than configurable flexibility, and that ensuring reliability is the top (but not exclusive) priority for OpenAFS. I think this is a key property of a file system; nothing else you do matters if it's not reliable.

As long as I'm a gatekeeper, I feel like part of that job is to advocate for a direction and a set of guiding principles that continues the general improvement in the overall robustness of the OpenAFS code and prioritizes that appropriately against other types of changes. (idledead is, of course, particularly challenging since it's a real problem in all directions, and was originally added to address a *different* robustness problem.)

However, because AFS is seen as less and less strategic at Stanford, in part because of ongoing reliability issues but more because the usage patterns of file systems have changed and OpenAFS is not currently keeping up, the amount of time that I have available to be a gatekeeper has diminished considerably. If my remaining contribution in terms of trying to advocate for the sort of project I think OpenAFS should be is out of step with the community, then I can resign.

> If you want to try to say that we shouldn't add anything more until
> these problems are solved, then... well, I don't think you're trying to
> advocate that everyone should stop what they're doing just to help you
> :) but that at least puts a kind of limit on things.

There is, of course, an inherent conflict of interest in any gatekeeper position in that one is not going to care enough about AFS to become a gatekeeper unless one is actually using it, and the problems that one is personally running into are going to, shall we say, come readily to mind.

Part of my job as gatekeeper (and, more to the point, elder) is, somewhat inherently, to advocate for Stanford's issues and concerns and hope that those issues and concerns are at least somewhat representative of a class of users of OpenAFS. I don't believe that the situation I'm describing is unique to Stanford, and I have done a reality check with others and think they would have told me if it was.

But, again, if I'm out of step with the priorities of the community, I can certainly find something else to do with the small amount of time I'm currently able to spend on OpenAFS.

-- 
Russ Allbery ([email protected]) <http://www.eyrie.org/~eagle/>
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
