On Thu, Sep 5, 2013 at 2:56 PM, janI <[email protected]> wrote:
> On 5 September 2013 22:14, Kay Schenk <[email protected]> wrote:
>
> > On Wed, Sep 4, 2013 at 9:36 AM, janI <[email protected]> wrote:
> >
> > > Hi.
> > >
> > > We have had some longer discussions on different ML/IRC about how a
> > > vm-admin should behave and which level of service we expect for our
> > > servers.
> > >
> > > We need new admins, so this is also a request for anyone interested to
> > > chip in.
> > >
> > > We have had some unfortunate incidents of different natures on all 3 vms,
> > > which made me question whether we as a community:
> > > a) want servers that are cared for professionally or by chance.
> > > b) want to (and are capable of) maintaining the servers ourselves.
> > > c) are prepared to support a change that makes a) and b) possible.
> > >
> > > I have formulated some thoughts on how admins could work, but in general I
> > > believe we should convince infra to take over the vm responsibility and
> > > keep our well-functioning forum/wiki admins.
> > >
> > > We have a vm-team in place that was created with the purpose of not having
> > > a single person as admin. In my opinion the team has not lived up to that
> > > purpose, but I am still thankful for the help I have received.
> > >
> > > Remark: the ideas below are my personal thoughts, which I have used during
> > > the time when I maintained the servers:
> > >
> > > ===========
> > > The servers should at all times be maintained with the following priority:
> > > 1) security (the downside of being popular is attracting the attention of
> > > people who want to gain merit by breaking into our servers)
> > > 2) stability (we have limited cpu/ram/disk, so we must optimize)
> > > 3) user wishes (we already have stable systems; 1 and 2 are far more
> > > important than enhancing the systems).
> > >
> > > Being an admin on a vm is a job that does not take much time, but it
> > > requires a lot of monitoring and communication (especially with infra).
> > >
> > > A good setup would be 3 types of admin:
> > > Each server will have an appointed "owner" (anchor-admin)
> > > A number of persons have full sudo on a server (admin)
> > > A number of persons can reboot/restart/work on po files (help-admin)
> > >
> > > === Anchor-admin responsibilities ===
> > > Anchor-admin is the "owner" of the vm and the prime contact to infra.
> > >
> > > Anchor-admin has the overall responsibility of the vm.
> > > 1) help when receiving alerts
> > > 2) stay informed about available patches, especially security-related patches
> > > 3) create/keep a maintenance plan
> > > 4) coordinate changes external to the vm (like dns) with infra
> > > 5) participate in infra discussions relevant for the vm (e.g. certificates)
> > > 6) monitor the vm regularly for resource usage
> > > 7) ensure that application changes are implemented with relevant consensus
> > > 8) discuss work with the admins, with the goal that they should be able to
> > > take over one day.
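Item 6 (monitor the vm regularly for resource usage) could be backed by a small script. The following is a minimal sketch only, not part of the ASF setup described in this thread: `resource_snapshot` and `warnings_for` are hypothetical helpers, and the thresholds (load above 2.0 per CPU, disk more than 90% full) are assumptions to tune per vm. It uses only the Python standard library; `os.getloadavg` is Unix-only.

```python
import os
import shutil


def resource_snapshot(path: str = "/") -> dict:
    """Collect basic resource readings (Unix only, because of os.getloadavg)."""
    load1, load5, load15 = os.getloadavg()  # 1, 5 and 15 minute load averages
    disk = shutil.disk_usage(path)          # named tuple: total, used, free (bytes)
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "cpus": os.cpu_count() or 1,
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }


def warnings_for(snap: dict, max_load_per_cpu: float = 2.0,
                 max_disk_pct: float = 90.0) -> list[str]:
    """Flag readings above HYPOTHETICAL thresholds; tune them per vm."""
    found = []
    # Compare the 5-minute load against the assumed per-CPU limit.
    if snap["load5"] > max_load_per_cpu * snap["cpus"]:
        found.append(f"high load: {snap['load5']:.2f} on {snap['cpus']} cpu(s)")
    # Warn when the filesystem is close to full.
    if snap["disk_used_pct"] > max_disk_pct:
        found.append(f"disk {snap['disk_used_pct']:.1f}% full")
    return found


if __name__ == "__main__":
    snap = resource_snapshot()
    print(snap)
    for warning in warnings_for(snap):
        print("WARNING:", warning)
```

Run from cron, such a check would only complement the Nagios/Circonus alerting, not replace it.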
> > >
> > > These activities are expected to take 3-4 hours per week, more in the
> > > beginning and less later. The hours needed depend highly on the number
> > > and level of admins.
> > >
> > > === Admin responsibilities ===
> > > Admins help the anchor-admin with ongoing maintenance and have full sudo.
> > >
> > > All changes must be discussed and agreed with the anchor-admin; no change
> > > is so important that it cannot wait until it has been discussed!
> > >
> > > Admins are expected to:
> > > 1) help when receiving alerts
> > > 2) stay familiar with the vm configuration,
> > > including but not limited to:
> > > - where each configuration is done and stored (svn/backup)
> > > - how the apps are configured
> > > - read the runbook, and update it if something is unclear
> > > 3) participate in the regular maintenance
> > > 4) coordinate all non-scheduled work with the anchor-admin
> > >
> > > These activities are expected to take 1-2 hours per week, more in the
> > > beginning and less later.
> > >
> > > Admins do not need to be specialists, we all learn, but it is important
> > > that admins have the motivation and time to learn.
> > >
> > >
> > > === Help-admin responsibilities ===
> > > Help-admins are located in different timezones, so we have 24/7 coverage,
> > > and have limited sudo (only restart/reboot/handle po files).
> > >
> > > When a help-admin receives an alert mail, these actions should be taken:
> > > 1) if the vm is reachable via ssh, log in; otherwise escalate to admin/infra
> > > 2) check whether the vm is overloaded, or whether apache/mysql is not running
> > > 3) restart the needed processes
> > > 4) mail at least the anchor-admin with observations and what was done.
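The help-admin checklist above can be sketched as a decision function. This is a minimal Python sketch under stated assumptions, not an existing tool: `Observation`, `Triage` and `triage` are hypothetical names, "overloaded" is assumed to mean a load average above 2 per CPU, and restarting both services on overload is an assumption, since the mail only says "restart the needed processes". The function only decides what to do; it does not touch any server.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """What a help-admin observes after an alert mail (hypothetical fields)."""
    ssh_reachable: bool
    load_average: float = 0.0  # e.g. the 1-minute load average from `uptime`
    cpu_count: int = 1
    apache_running: bool = True
    mysql_running: bool = True


@dataclass
class Triage:
    """Outcome of the checklist: actions to take and whether to escalate."""
    actions: list[str] = field(default_factory=list)
    escalate: bool = False


def triage(obs: Observation, overload_factor: float = 2.0) -> Triage:
    result = Triage()
    # Step 1: without ssh access a help-admin cannot act; escalate instead.
    if not obs.ssh_reachable:
        result.escalate = True
        result.actions.append("escalate to admin/infra")
    else:
        # Step 2: ASSUMPTION: "overloaded" means load above overload_factor per CPU.
        overloaded = obs.load_average > overload_factor * obs.cpu_count
        # Step 3: restart the needed processes. ASSUMPTION: on overload,
        # restart both services even if they are still running.
        if overloaded or not obs.apache_running:
            result.actions.append("restart apache")
        if overloaded or not obs.mysql_running:
            result.actions.append("restart mysql")
    # Step 4: always report observations and actions to the anchor-admin.
    result.actions.append("mail anchor-admin")
    return result
```

For example, `triage(Observation(ssh_reachable=True, mysql_running=False)).actions` returns `["restart mysql", "mail anchor-admin"]`. Keeping the escalation path separate mirrors the limited sudo of help-admins: anything beyond a restart goes to an admin or infra.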
> > >
> > >
> > > ===
> > > Remark: the above are just my thoughts; there are a lot of other
> > > possibilities.
> > >
> > > Let's hear your opinions!
> > >
> > > rgds
> > > jan I.
> > >
> >
> > I would like to discuss this topic further, much further as a matter of
> > fact, but right now I don't really have enough information.
> >
> > Can you provide details on the following (or point to a document that
> > describes this):
> >
> > * to aid our memories, who is on the current vm-team
> >
> jürgen, andrea, imacat, arist and myself.
>
>
> > * what are the three servers now under the vm-team
> >
> ooo-wiki-vm2.a.o (wiki.openoffice.org), ooo-forums-vm.a.o
> (forums.openoffice.org), translate-vm2.a.o
>
>
> Our servers also depend on erebus.a.o, which is the proxy server for HTTPS.
>
> > * what vm-OS does each use
> >
> Ubuntu 12.04 (I have standardized that part).
>
>
> > * for each server, what are the specific applications a vm-sysadmin would
> > need to know/become familiar with to be an effective sysadmin
> >
> for all 3 systems:
> - ubuntu, especially apt-get, apparmor
> - httpd, local installation as defined in ASF
> - php, generic installation
> - puppet, config as defined in ASF
> - sshd, config as defined in ASF
> - svn, usage depends on the single server, but in general all static changes
> are defined here
> - apbackup, as used by ASF
> - memcached
> - mysql
> - /root/bin, helper scripts
> - security applications, as defined in ASF (details are deliberately not
> given on a public list).
>
> For ooo-wiki:
> - MediaWiki
> - ATS
>
> For ooo-forums
> - phpBB (note: multi-forum setup with links)
>
> For translate
> - pootle
> - django
>
>
> * how are alerts on system failure currently handled
> >
> Nagios and Circonus standard setup. Detected alerts go to #asfinfra, the
> infra-team and the vm-team.
>
>
>
> > * what resources would a vm-admin need to respond to a system failure
> >
> ??? I am not sure I understand what you mean.
>
> help-admin, would restart/reboot system
> admin, would locate problem, try to fix it
>
OK, on this... basically ssh access to the server vm for starters.
On this note, ssh with password or keys?
>
> >
> >
> > Your role outline is good, but I think before we discuss future strategy,
> > we need a better idea about what's involved.
> >
> Or maybe we need someone who is interested before discussing
> theoretically. The items I listed above should all be obvious to people
> with SA experience.
>
> We can dream about strategies, but if we don't have volunteers, or the
> people who volunteered don't do the job, it seems to be the wrong way.
Well this is certainly true. That is why I thought outlining skills like
this would be helpful.
> The vm-team was defined for that purpose, but I don't think the vm-team,
> apart from me, has responded to a single alert.
Well, that is disappointing, to say the least. A statement like the one you
just made really plays into the long-term plans for these VMs, in my mind.
> Arist helped me a lot with changing mysql on the forum, but that's about all
> the help I have received.
>
> Our current problem is much more that the current admins do what they like
> to do, instead of following an organized plan. We do not need more people;
> we need people who care and do something as a team.
>
> rgds
> jan I.
>
OK, thanks for this response. Let's see how this analysis/discussion goes.
>
> > --
> >
> >
> > -------------------------------------------------------------------------------------------------
> > MzK
> >
> > "Truth is stranger than fiction, but it is because Fiction is obliged
> > to stick to possibilities. Truth isn't."
> > -- "Following the Equator", Mark Twain
> >
>