On Sun, Jan 29, 2012 at 2:15 PM, Stephen Gran <sg...@debian.org> wrote:
> Hello,
>
> Unfortunately, one of the pair of machines providing the alioth service
> (vasks.debian.org) won't power on. We are working on it, and apologize
> for any inconvenience caused.
>
> Cheers,

Hello,

First, thanks from a plain user to everyone working to provide infrastructure for the Debian project.

Second, a suggestion, or rather a brain dump of ideas on how to improve the communication of issues.

Imagine a DD trying to work from anywhere in the world. Nowadays there are many things to check when a service is not working: is it my last upgrade? Is it my last config change? Is it my ISP? Is it some intermediate ISP? Or is the service really down? Of course, email notifications like this one are just fine for announcing a known issue.

So I ask myself: is the reason for not running any public monitoring system that it would greatly increase the sysadmins' workload?

There are different approaches:

approach one) Run a public nagios, monit, or whatever, configured with templates to notify this list on defined events (e.g. down for more than 10 minutes? Is it the service, the DNS, the whole machine, the whole network? Has the service recovered?).

approach two) Search the available open source monitoring systems for one that can run some "status.debian.org", so that instead of relying on emails, users having an issue can look up such a dashboard and see present and past status or issues.

approach three) Write a fast and furious bash/perl/python script (it would be cool to use only packages of priority >= standard, or as few dependencies as possible) that takes a debian.org/infrastructure.yaml file (or .json or .txt or xml or ...) defining Debian machines and services. The CLI client runs against that file (so it first establishes that the network connection to d.o is OK) and prints a report of unreachable services. (One run, one check, so not much extra load unless a lot of users synchronize a DoS, which can be done with or without this tool.)

approach four) Find or write a distributed monitoring service that provides approach one or two, but from different geographic locations, so that after detecting that a service/machine is down "from here", it communicates with the monitoring systems on other continents and compares results before "validating" the issue.

approach five) ... surely people more clever than me can propose better solutions to automate issue notification and tracking. Please do!

This is neither a big nor an important "improvement front" for Debian. These are just suggestions for improving the process from my personal viewpoint, which of course may be plainly wrong, coming from outside the project.

I can help with details on the ideas, with code if needed, and contribute my home ADSL to distributed monitoring if it's needed, though I think my home connection fails more often than Debian machines do.

Again, thanks to everyone doing the work.

--
Iñigo
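
P.S. To make approach three a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption on my side: the inventory layout (a "services" list with "name", "host" and "port"), the example entries and their ports, and the function names are all hypothetical, and I use JSON only because it needs no extra dependency. It merely tries a TCP connection to each listed service and reports the ones that do not answer; it says nothing about why they fail.

```python
#!/usr/bin/env python3
"""Sketch of approach three: one run, one TCP check per listed service."""
import json
import socket
import sys

# Hypothetical inventory layout. No such file is published by Debian;
# the entries and ports below are illustrative only.
EXAMPLE_INVENTORY = """
{
  "services": [
    {"name": "alioth-web", "host": "alioth.debian.org", "port": 443},
    {"name": "lists-web", "host": "lists.debian.org", "port": 443}
  ]
}
"""


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


def unreachable_services(inventory, check=is_reachable):
    """Return the names of services for which check(host, port) is false."""
    return [
        svc["name"]
        for svc in inventory["services"]
        if not check(svc["host"], svc["port"])
    ]


def format_report(down):
    """Build the one-line report shown to the user."""
    if not down:
        return "All listed services are reachable from here."
    return "Unreachable from here: " + ", ".join(down)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_services.py INVENTORY.json")
    else:
        with open(sys.argv[1]) as fh:
            print(format_report(unreachable_services(json.load(fh))))
```

The `check` parameter is there so the logic can be exercised without touching the network, and so a different probe (HTTP, DNS, ...) could be swapped in later. A real tool would first check a well-known host to rule out a local connectivity problem, as described in approach three.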