Owww! If they were all a name brand like HP or something, then there are tools from the manufacturer that you could buy to manage them.
Otherwise, at most, if they all have the same CPU, then depending on the hardware you might be able to use something from Intel. Failing that, all you've got is in-band management. SNMP does exist for the OS and you can get some stats such as disk space, etc. from it.

Ted

-----Original Message-----
From: PLUG <plug-boun...@lists.pdxlinux.org> On Behalf Of Ben Koenig
Sent: Saturday, March 2, 2024 2:30 PM
To: Portland Linux/Unix Group <plug@lists.pdxlinux.org>
Subject: Re: [PLUG] Linux Software for Data Center Monitoring

On Saturday, March 2nd, 2024 at 10:50 AM, Ted Mittelstaedt <t...@portlandia-it.com> wrote:

> Are these 800 servers virtual or physical?

Physical.

> Are the physical servers home-built or commercial from a major brand (HP
> Proliant, etc.)

Home-built... but often with parts from major brands. Or copycat brands.

> Are the servers all the same brand and model or are they a mishmash of pieces
> from different makers?

Uhh.. Ever seen a graphics card with a Gigabyte logo and EVGA silkscreened onto the PCB?

> Are the servers yours or owned by customers? That is, if they are virtual
> servers owned by remote customers do you have any responsibility to monitor
> them?

We own them. And the racks, cabinets, PDUs.

> For "emergency notifications" the go-to for FOSS is "Big Sister"
> https://bigsister.ch/ Set that up to ping the server interface and if it
> trips a breaker and goes offline then have Big Sister email a text-to-SMS
> gateway for your cell phone number
>
> For monitoring power consumption you have to configure the PDUs for that.
> I've yet to see one of these that supports current monitoring but does not
> support SNMP, so once you get that going you can monitor power consumption
> with mrtg or, if you want to get fancy, https://www.cacti.net/ Cacti is based
> on RRDtool which is the successor to MRTG https://oss.oetiker.ch/rrdtool/
>

The PDUs have SNMP so I may have to take a look at those. I've used RT in the past and it's a bit on the excessive side.
IIRC it uses Perl and I know next to nothing about Perl.

As of right now, it is basically a one-man show: I am the only one regularly on site for the physical hardware. That said, they want to hire a second person, which is where these tools will start to come in handy.

Creating a custom tool to manage all this stuff is not outside the realm of possibility, but that might end up meaning that I spend all my time maintaining said tool. My instinct is to start setting up some sort of relational database and build it up piece by piece, simply because there is literally NOTHING used to manage this stuff, especially since the servers are already installed and running.

But like anything else, the first step is to list all the options and make my list of pros and cons. ;)
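
For what it's worth, here is a minimal sketch of how the "relational database, built up piece by piece" idea might start, assuming SQLite through Python's standard sqlite3 module. Every table, column, and function name below is hypothetical and only for illustration; nothing here reflects a schema that actually exists at the site.

```python
import sqlite3

# Hypothetical starter schema: racks hold servers, servers hold components.
# All names are illustrative assumptions, not an existing inventory layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rack (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS server (
    id       INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL UNIQUE,
    rack_id  INTEGER NOT NULL REFERENCES rack(id),
    notes    TEXT
);
CREATE TABLE IF NOT EXISTS component (
    id        INTEGER PRIMARY KEY,
    server_id INTEGER NOT NULL REFERENCES server(id),
    kind      TEXT NOT NULL,   -- e.g. 'gpu', 'psu', 'motherboard'
    vendor    TEXT,            -- nullable: mixed-brand parts may be unknown
    model     TEXT
);
"""


def open_inventory(path=":memory:"):
    """Open (or create) the inventory database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_rack(conn, label):
    """Insert a rack and return its row id."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(
            "INSERT INTO rack (label) VALUES (?)", (label,)
        ).lastrowid


def add_server(conn, hostname, rack_id, notes=None):
    """Insert a server into an existing rack and return its row id."""
    with conn:
        return conn.execute(
            "INSERT INTO server (hostname, rack_id, notes) VALUES (?, ?, ?)",
            (hostname, rack_id, notes),
        ).lastrowid


def add_component(conn, server_id, kind, vendor=None, model=None):
    """Record one hardware component belonging to a server."""
    with conn:
        return conn.execute(
            "INSERT INTO component (server_id, kind, vendor, model) "
            "VALUES (?, ?, ?, ?)",
            (server_id, kind, vendor, model),
        ).lastrowid


def servers_in_rack(conn, label):
    """Return the hostnames in the rack with this label, sorted by name."""
    rows = conn.execute(
        "SELECT s.hostname FROM server s "
        "JOIN rack r ON r.id = s.rack_id "
        "WHERE r.label = ? ORDER BY s.hostname",
        (label,),
    )
    return [row[0] for row in rows]
```

Starting with three small tables keeps the maintenance burden low for a one-person operation; more tables (PDUs, outlets, SNMP addresses) could be added later without reworking what is already recorded. Passing a file path to open_inventory() instead of the default ":memory:" would make the data persistent.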