On 02.09.2005, at 17:25, Niclas Hedhman wrote:

On Friday 02 September 2005 20:23, Erik Abele wrote:

I honestly don't feel like "fueling" this thread, so please don't hesitate to say I am outright stupid and don't know what I am talking about, and I'll
shut up as a good citizen... My intention is not to "whine".

Your last msg was quite fueling, no? ;-)

Why isn't that working for others?

Yes. Why?
My take is that only one in 300 committers has what it takes to "get thru". I was not one of them... Can that ratio be improved? If it takes 10 people to keep the ship afloat, (as a manager) I would plan for at least one person leaving every quarter, and that would then set the minimum recruitment pace.

Uhm, 'recruitment' (in the managerial sense) doesn't really work in a volunteer-driven organization with collaborative and meritocratic development processes... you can of course encourage people but it seems that that unfortunately doesn't really work with the unsexy jobs of [EMAIL PROTECTED]

The infra repo isn't the almighty tool everyone needs. Most material
in there (if not all by now) isn't instantly useful if you are not on
top of the different setups.

Ok. So I don't need to bother about the docs, since they may confuse me even
more? Good start.

I didn't say that; all I said is that the infra repo is not our primary documentation place for beginners. It's just the place where we keep configuration files like crontabs, DNS zone files, the httpd config and so on. There is also a limited set of documentation, but I wonder why an infra newcomer has to know things like how to access the terminal server or how the network in the colo is set up.

Furthermore you can find nearly
everything on the machines themselves, mostly world-readable;

I noticed a plural of "machines". AFAIK, only minotaur is "world
accessible".

Yep and that should be enough to show that you know what you are doing and that what you are doing is goodness...

a) overload is self-inflicted
Uh oh, just consider the following example: account requests.

How long can it possibly take?
Let me make a guess: ~1 minute, perhaps 2. Let's say I spend half an hour a day; that makes it 15 a day, and several thousand per year. Apparently, this
can't be a bottleneck.

You snipped the most important failures, so trust me that it isn't done within 1 minute. But that just shows the ignorance (not necessarily willful) we are facing, see below.

- the PMC votes in a new committer,
- makes him send in a CLA;
- the PMC chair watches for the receipt of the CLA and if it gets recorded,
  he sends a single email to root@ (cc'ing the PMC) in a pre-defined
  format and waits till the account is created.

Great. So it is not a problem, anymore?

If everybody followed this scheme it would be nice, but as I said, nobody is doing so. Well, to be correct and fair, nobody *was* doing so; it is really better now with respect to account requests. But that was just one of a bunch of examples to make my point clear: even if we have a process and documentation in place, we are still facing a lot of people not following the process, ignoring the documentation and whining and pestering to get their work done :(

b) being disorganized
Maybe, but keep in mind that we are all volunteers and that not only
the ASF is growing tremendously, our hardware/infrastructure needs
are doing so too. Old systems and services have to be kept running
for projects that still want to use them, new systems and services have
to be put in place (and administered) because projects are begging
for it. The complexity is growing daily.

I recognize that. And I happen to be of the opinion that it is self-inflicted. Leo wrote a humorous mail about it two months ago, "Why we say no.". And just like projects don't have a choice of CVS, such a policy could be introduced for Jira/Bugzilla/Scarab (any others?) as well, if it is seen as taking up
precious time.

Yes, infra could say 'no' more often or could simply shut down the services they don't want to administer. To be honest, I'd be fine with this (and its consequences (people/projects leaving, flamewars, what-have-you)) but since I don't want to discuss this in hundreds of emails, I'll simply leave and let others take over. No harm done.

c) non-transparent
Hmm, IMO infra is *not* non-transparent; it's just that the bar is
pretty high (knowledge-wise and confidence-wise (in the sense of
trust)). Please give me an example of what is so non-transparent; I'm
willing to help you here.

Example 1. You said it yourself -> docs are "shaky", but I could live with that. The problem is "everyone knows they are not good" and it has been hinted that a lot of material is outright wrong. That makes it even worse.

Okay, but that is not 'non-transparent' - the bar is just higher. I agree that it'd be nice to have more docs but OTOH the people have to also read them, see my example with the documented account request processes; nobody was following it, even after several pmc-wide emails :)

Example 2. Most requests come in as either a mail or a Jira issue. Some time later, someone like yourself marks it as "done". If I was overworked and wanted others to get involved, I would spend more time explaining what
I did to make it "done" than I did to "do it". *In detail*.
Over "my time", that rarely happened, and I took it as "they don't want help
with that".

I agree and I always tried (*) to explain what I did to a) show the other infra guys that I know what I was doing and b) to educate others. I learnt a lot by just reading infra mails. Ah, and some time ago, Leo even started a tool to help with this, but I have to admit that I'm not aware of the specifics right now.

*: note the past tense, I'm not doing it anymore and I know that it's bad but here you'll have to bear with me, sorry, I got lazy over time

Example 3. I think that most resources are turned off by default, and only after long consideration made accessible (read and/or write) to a wider
audience. That is natural security awareness kicking in, but little
discussion is going on about how to make more info available. Can other people watch this configuration? I have always been of the opinion that the ASF
is more secretive than the situation calls for.
The fact that many services live on machines that are not accessible makes it difficult to peek around to get an idea of how things are set up without "bothering" the peeps who do the work, since it is likely I won't be able to
help "in that particular area" right now.

IMO this is not the case. There are certainly parts which are only accessible to members, but that has also legal reasons. Everything else is more or less open, at least to a degree where you can show that it warrants more karma for you...

I'm still missing any concrete examples of issues which can't be solved because of too restrictive access.

d) "put out fire by hand"
Well, that's the occasional hdd failure or worm attack or svn wedge
or ... . It's pretty hard to come up with automated solutions to
every problem so administering a system always means to baby-sit it
in some way. If it would be solvable by a click on a fancy button,
the managers could do it and we wouldn't need any sysadmins anymore :)

I get the impression by your response that there are no problems, or overload
at the infra@ team.

Huh? 'occasional' in the terms of 10 machines and millions of users may mean 'every second day'.

Catastrophic events can't be automated, but they happen
rarely.

Catastrophic events may happen rarely but that's not all. See the list below.

All the 'bulk' is already streamlined, and shouldn't take much time.

Is it? No, it is not, unfortunately...

So what is it? Full time staff is needed, so there must be something.

How about the following list:
- creating, tweaking, moving and deleting different project resources like mailing lists, svn repositories, user accounts etc.
- recovering from crashes of different sorts, ranging from hw (hdd, network, ...) to sw (rsyncs, repositories, ...)
- bearing with occasional events like virus attacks or malfunctioning mass downloaders etc.
- tweaking all sorts of things due to user requests due to other users' failures ('can someone chmod these files please')
- answering questions
- thinking about and discussing improvements, changes, etc.
- keeping systems and services up to date, testing updates/changes
- putting new systems/resources in place
- caring about backups, hw orders and other boring things like security
- reading a lot of emails (heh, just the nightly cronjobs are 15 emails alone plus numerous other alert/info mails)
- there are numerous more events which I don't want to bore you with...

And remember: this is not why we (the infra team) originally came to the ASF. The reason was developing software like HTTPD, Tomcat, Ant or Foo, so all of this is subtracting from the time we can spend on that. It's not that we are paid to do it or have the greatest fun doing it; it's a necessity!

I guess for most people it was just *fun* getting involved with OS software development (remember: some people don't even have IT as their day job!) and now we are stuck with keeping the ship afloat in the hope of getting it to a state where we have enough time to get back to the things for which we came here... hah, what a mess :)

Thanks, I'm nearly outta here too - it's far easier to support my
own systems, which only have to take care of a couple hundred users
per second and not millions, and, ah, to make a living out of it
instead of just fighting with a huge number of whining people,
materializing in hundreds of emails :|

Erik, in case no one has expressed it before; A Big Thank You!!!

Oh, don't thank me (although I appreciate it) - I'm only a very small lightbulb in the flashing universe of infrastructure :)

You, Noel,
Leo, Justin and everyone else are providing a wonderful service. That is the
external interface, and I think you manage that well.

Thanks :)

And thanks for getting me to think about this another time. I was not really actively involved in infrastructure issues for the last two months (except for an occasional helping hand on IRC) and with that distance I now realize how much time and effort it took for me, oh well...

If mails are a problem, disable the mailing list and require Jira to be used
as the medium to communicate with the infra@ team.

I think that would be the wrong way, but OTOH, emails are sometimes actually an issue for me. Like many others, I'm not a native English speaker, so writing an email probably takes twice (if not more) the amount of time... especially these long ones. Sorry, I'll shut up now since I think _in the end_ we are basically on the same page :)

Cheers,
Erik
