On 08/25/14 19:48, Phil Pennock wrote:
> On 2014-08-25 at 18:02 -0500, Lawrence K. Chen, P.Eng. wrote:
>> I don't know much about pagerduty, except one group on campus that
>> shares our Nagios server is using it.
>>
>> So, the perl script to tie into nagios hasn't left a good impression
>> on me.
>
> On the bright side, it's Perl, and there's always _some_ sucker around
> who can read Perl.  (Although, here, it's ... me.  Damn.)  This means
> that you can understand what is going to be running on your monitoring
> system, which is worth a lot to me.
>
True.... though, compared to that, figuring out why DateTime::TimeZone
after 1.69 didn't work in irssi was harder...
>> And, a proxy is needed because the server is in private IP space
>> (eventually our entire datacenter will be.... though it sounds like
>> it'll all be behind our F5, but it's been WIP for almost a couple of
>> years now.)
>
> That is not a sufficient reason to need a proxy.  You can use NAT to
> solve that problem.  What makes more sense is that you might not _want_
> all server instances to have unconstrained network access: if you can
> lock down which machines can talk where, then you can have more
> confidence about the communication patterns involved and what an
> attacker might or might not be able to do.  Running a SOCKS proxy and
> logging connections made will get you an event stream which can in
> theory be inspected for anomalies.
>
NAT (or Secure NAT) would solve a lot of problems.... except that none of
the options I have access to were available for this vlan.  Partly that's
because it's a vlan for servers that will have full access to all other
vlans in the datacenter, regardless of data classification.

There is also a move toward someday going default-deny outbound on every
host.... which I have dabbled with on some hosts in the past, with varied
success.  What won't stop are the people who fix a problem by sticking in
an allow-all rule (sometimes with a comment that it's temporary.... though
some that I've come across pre-date me.)

For instance, servers with PII (including our SSNs) definitely should not
be accessible from off campus.... but a former co-worker had messed up
the firewalls on such servers just before lunch, and his quick fix had
been to allow any to any.  (And these servers sat in the old network that
is outside any firewalls... when I had started, we were supposed to be
working on moving everything off of that network.... but 8 years later,
we still have many things in it.
Previous managers had said that they might just request a new vlan that
is all or part of that IP space and push all the servers into it.... but
it hasn't happened, and I don't know enough to know why or why not.)

So, sometime after the co-worker had left (for a hosting company that is
all about security; apparently, if they don't think you're serious enough
about security, you can't be a customer of theirs), I discovered this and
immediately fixed the firewalls correctly.  Which broke at least one
essential service that was running in the cloud, but shouldn't have been.
I was then told that while what I did was the correct action, I was in
trouble because it broke that service.... and, in the climate shortly
after Virginia Tech, it got us in a lot of hot water, especially since I
rejected the tickets to reopen the servers to the world.  We did
eventually open it temporarily to a specific IP, and I then pulled it as
scheduled without waiting to see whether they had actually migrated the
service completely back into our datacenter....

Though recently there have been numerous cases where I'll find that the
ticket got reassigned to somebody who overrode the restriction, still
leaving me to fix the mess that comes from it.  (Though I didn't put in
any overtime to eventually fix the visible part of the problem.... forget
where I left the other part.)

> CPAN on FreeBSD should be integrating into Ports, registering as an
> installed port.  I haven't looked at how CPAN/FreeBSD ties into package
> creation, instead of just registering something as having been
> installed.
>
BSDPAN hasn't been updated for pkgng, and I'm not sure it will be:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187111

I had thought about looking under its covers to see what's actually
involved, but I keep getting distracted with other things to fix.  Also,
I'm probably not going to get in any patches to save the ports I'm using
that haven't been staged.
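For what it's worth, the default-deny-outbound idea above is easy enough
to sketch in pf on FreeBSD; the interface name, relay address, and chosen
services below are made up for illustration, not from any real ruleset of
ours:

```
# /etc/pf.conf sketch: default deny outbound, with explicit allows.
# em0 and 10.1.2.25 are hypothetical placeholders.
ext_if = "em0"
smtp_relay = "10.1.2.25"

block out log on $ext_if all                           # default deny
pass out on $ext_if proto { tcp, udp } to any port 53  # DNS lookups
pass out on $ext_if proto tcp to $smtp_relay port 25   # mail via relay only

# The failure mode described above looks like this, and tends to stay:
# pass out quick on $ext_if all   # TEMPORARY, remove after lunch
```

The logging on the block rule is what makes the policy maintainable: you
can watch pflog for legitimate traffic you forgot, instead of someone
reaching for the allow-all hammer.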
> I really strongly recommend taking a look at Poudriere, the new Ports
> building framework being used by the FreeBSD folks.  It uses jails,
> dynamically creating jails for builds, to fully isolate the Ports.
> Create an overlay ports tree to contain the things which you want
> packaged, and you're in good shape.  The only caveat is that you can't
> use bind mounts or symlinks to hold the overlay in place, since a
> read-only bind mount is used to expose the Ports tree into the build
> jails.  This is just good motivation to keep the overlay in git and
> auto-sync in whatever setup tool you use to manage the tree updates.
>
I've been working on and off for the last couple of months trying to get
Poudriere going (along with Portshaker, to merge my bits and other
people's bits into a ports tree).

The main hang-up has been to have the poudriere jails use most of the
same options I've set across the various make.conf's, and something
similar with the various port options.  (That, and a disk failing and the
200+ hours to resilver... it was originally estimating 400+ hours, but
things picked up along the way.  That was followed by replacing the other
disk; looks like I'll need to reboot to see the increased space.  And
ignoring the failing disk in a raidz pool.... not sure how it came to be
that, even though all my hard drives have 512-byte sectors, I apparently
had given enough thought to make my mirrored zpool 4K-aligned, but not my
raidz/raidz2 pools.  It's getting hard to find drives to replace disks in
the raidz pool, though the remaining 'good' disk from the mirrored pool
is the same size as the disks in the raidz pool, so I could use that
now.)

Decided that 4 jails for each of the 4 servers wasn't what I wanted....
settled on 2 jails.  One for my two headless servers (which I had
intended to both be the same.... though there has been some drift, which
I'm currently working on resolving).
And one for my two workstation servers.... though that might not be
possible, since there's quite a lot of difference between the two, since
they do quite different things in the background.  Like, my home system
is my policy server for CFEngine 3, and it's my SVN server (though
there's a project to convert to git and host that from my headless
servers, in some HA failover scheme using the HAST volume.)  Though I
might just do something else to get things into git (so tired of doing
'svn commit' and having more than what I intended get committed.... need
to stop doing so much task switching.... perhaps I need to stop using
-m "<life story>" with my commits, though it was hard enough to break the
bad habit I had picked up at work of just doing 'svn commit -m ""').

Plus there are other differences and desires for the two workstation
servers....  I suppose I'm a bit more of a risk taker with ports on my
system at home (even though it's actually more important to my work than
my work system is, since I pop all my work email to it.... so that I can
better decide what's spam and what's not, and use more than 64k of
message filters to sort it....  And then connect to it from work to keep
up with my work email...)  Though having my work system stay up all the
time is important, for all the screen sessions I have running on there.
Though one obvious difference is that the work system has mysql=5.6p...

> Monitor process count on the monitoring box, alert when it goes too
> high.  You might not reach Pagerduty, but that's the sort of event, on
> an isolated box under your control, where you make sure that the
> contact for that service goes to multiple places, not just pagerduty.
> Start screaming blue bloody murder.
>
In the past, I had gotten comments asking why there have been nagios
alerts that nagios is down....  But, as I mentioned, it's a shared
server, and one of the groups using it is the one using pagerduty.  I
have no control over how they choose to get alerted or not.
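The multi-path alerting Phil describes can be expressed directly in the
Nagios object config; every name below (contacts, group, commands, host)
is a hypothetical placeholder, and the notification commands are assumed
to be defined elsewhere:

```
# One contact per delivery channel; the *_commands are assumed to exist.
define contact {
    contact_name                  oncall-pager
    host_notification_period      24x7
    service_notification_period   24x7
    host_notification_options     d,u,r
    service_notification_options  w,u,c,r
    host_notification_commands    notify-host-by-pagerduty
    service_notification_commands notify-service-by-pagerduty
}

define contactgroup {
    contactgroup_name  nagios-selfcheck
    alias              Every channel, for when Nagios itself is sick
    members            oncall-pager,oncall-email,oncall-sms
}

# Attach the group to the self-check service so one dead integration
# does not silence the alert entirely.  check_nagios arguments here are
# illustrative only.
define service {
    use                 generic-service
    host_name           nagios-host
    service_description nagios-procs
    check_command       check_nagios!5!/var/log/nagios/nagios.log!nagios
    contact_groups      nagios-selfcheck
}
```

The point is that the self-check service's contact_groups fans out to
every channel at once, rather than whichever single path a group happens
to prefer.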
And, nothing to do with the fact that the sysadmin is my former $boss.

> Why did they switch in one move, instead of using _both_ pagerduty
> _and_ email?
>
Probably one of the great mysteries that will never be solved....

>> Hadn't really thought about our notifications from this Nagios server
>> now being dependent on our smtp server.... our old server had been in
>> the datacenter range that is completely open to the world.... so it
>> did its own mail delivery (especially important when it used to
>> largely inform us of problems with campus email...)  Though it's
>> getting hard for me to handle notifications timely/safely....
>
> See above: once you've debugged your integration and checked the
> failure modes, the active link and retries outside the
> store-and-forward result in more reliability.  I still have the emails
> too, but they're mostly just something to purge from my mail folders
> when they finally straggle in.
>
Well, the pagerduty setup is specific to one group.... the rest of us
will still be dependent on email, and, worse, on the unpredictable
email-to-SMS.  At least it has gotten considerably less noisy (something
my new manager praised about our configuration... which came out of a
best-practices session from my first LISA ;)

And I'm probably not concerned enough about it.... though we're supposed
to eventually have a capable NOC to deal with most notifications, or to
escalate appropriately from there.  Wonder when the purge of always-down
or irrelevant monitoring will take place?  Still don't know why we're
monitoring the inter-library loan printer... or who's supposed to act if
it goes down.  (Probably something left over from when LNS merged with
CNS, well before I started here.)  Also, when I started here, our nagios
was running on a long-ago former sysadmin's desktop, along with constant
grumbling that it needed to get upgraded and moved....
(Something that I decided to just tackle on my own, following my first
contract renewal.... and perhaps why I had gone to such sessions at my
first LISA...)

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
