Re: Is Scientific Linux Still Active as a Distribution?
I find it extremely puzzling that you can express so much fear about IBM and then even consider moving into the arms of Oracle! I used Scientific Linux for about 10 years and it was fantastic. Close to 4 years ago I moved to a new place which uses CentOS instead. It also works just fine; other than the support cycle being a bit different, there is almost zero pain or re-learning involved. If that wasn't enough, FNAL and CERN have also clearly put their faith in CentOS for the future.

On 2/23/2020 12:24 AM, Yasha Karant wrote:

From below: Will look forward to move to another distribution. End excerpt.

The question is: which distro? My first hope was Oracle EL 8 -- given that Oracle has to compete with IBM and thus, unlike CentOS, which may or may not fit into the profit/business long-term plan of IBM (long term -- less than a decade, but more than three or four years -- at least through the EL 9 first production release), would provide a "working and usable" product, just as SL was. After reading comments on this list, I am more tempted to give up on EL and move to Ubuntu LTS. But -- I have not made a decision.

For those who require a reliable, production, stable, but reasonably "current" Linux environment ("current" meaning that when I need an application, I will not find that there are no ports of recent releases of the application to the Linux I am using because the major libraries -- .so files -- are too "obsolete"), what choices are available? In so far as possible, I want the same distro to work on servers (with CUDA support for compute servers with Nvidia GPU compute boards, as well as MPI) and on my laptop "workstation".
Re: RAID 6 array and failing harddrives
On 4/4/2017 6:59 PM, Konstantin Olchanski wrote:

Moving to ZFS... ZFS is also scary... Heh - another soon to be victim of ZFS on linux :)

No kidding. Former victim of XLV+XFS (remember XLV?), former victim of LV+EFS, former victim of ext2, ext3, reiserfs, former victim of LVM, current victim of mdadm/raid5/6/ext4/xfs.

You'll quickly realise that the majority of major features you'd expect to work - don't.

I am not big on "features". For me the main features are open()/read()/write()/close() and mkdir()/rmdir()/readdir(), and those seem to work on all filesystems. The next features are: a) non-scary raid rebuild after a crash or disk failure, b) "online" fsck.

You can't grow a ZFS 'raid'. You're stuck with the number of disks you first start with. (I know this is 2 quotes back)

That's kind of unfair, since there you're talking about a feature that was never offered; it's just an incorrect assumption. It took me a while to understand what ZFS does and does not offer as well - I missed many things from (ancient history) Digital's advfs - but ZFS does lots of things quite well. There really is no such thing as "a ZFS raid"; that's probably most analogous to a zfs pool made of a single raidz vdev, but that's a very simple case. What other system lets you make large reliable storage pools from hundreds of drives on a single server? I built some with 200+ 4TB drives some years back.

We only have a few hardware configurations, all with a fixed number of disks, so not a problem:
a) single 120GB ssd for OS (/home on NFS)
b) single SSD for OS, dual 4-6-8 TB HDD for data, RAID1 configuration to protect against single disk failure
c) dual SSD for OS and /home, dual HDD for data, both RAID1 configuration to protect against single disk failure
d) single SSD for OS, multiple (usually 8) 6-8 TB HDDs for data, mdadm raid6+xfs and now raidz2 ZFS (protection against single disk failure + failure of a second disk during raid rebuild).
For case (b), for your data storage, you can expand a ZFS mirror reasonably easily. For case (c), I don't know how hard it is to use ZFS for the OS drive on linux; I only used it on BSD. But mdadm on linux is ok for that role. For case (d), it is true that you cannot expand a ZFS RAIDZ(2) vdev, but that's ok if you know that going in.

BTRFS is billed as an "open source replacement for ZFS", but after testing it, my impression is that it is only used by a couple of enthusiasts in single-disk laptop configurations. In a single-disk system, it is not clear how btrfs/zfs is better than plain old ext4/xfs.

I've never seen any good success stories for btrfs, but to be fair I have not followed it closely. zfs can still give you some useful things on a single-drive system: you get the data checksums, useful snapshots (as opposed to LVM), volume manager features, etc. By default the checksums would only warn you of problems on a single drive, but you can tell zfs to keep multiple copies of your data ("zfs set copies=n"), so that might well also let it recover from bad blocks.

G.

--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
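The two operations mentioned above - growing a mirror and keeping extra data copies - can be sketched roughly as follows (the pool name "tank" and the device paths are hypothetical; substitute your own):

```shell
# Turn a single-disk vdev into a two-way mirror by attaching a second
# device alongside the existing one (this is how a case-(b) data mirror
# can be expanded or re-silvered onto a bigger drive):
zpool attach tank /dev/disk/by-id/ata-OLDDISK /dev/disk/by-id/ata-NEWDISK

# On a single-drive pool, ask ZFS to store two copies of every block in
# a dataset, so a checksum error can often be repaired from the duplicate:
zfs set copies=2 tank/data
zfs get copies tank/data
```

Note that `copies=n` only applies to data written after the property is set, and it is no substitute for a real mirror: a whole-drive failure still loses the pool.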
Re: Need help Debuging boot
I think that's a linux message. I've seen this before when output starts getting redirected to the serial port (or ipmi/iDRAC/iLO virtual serial port). Maybe check if there is some type of console redirection set up in the BIOS? It seems to me that when redirection is set up, the BIOS itself and GRUB can output to both serial and regular console; once the kernel boots, messages only go to one or the other. The "hang" (or rather change in output) doesn't have anything to do with the EDD message itself - it just happens that this is the last message printed during that particular phase of booting.

G.

On 2/17/2017 10:33 AM, Konstantin Olchanski wrote:

On Thu, Feb 16, 2017 at 10:07:55PM -0600, ~Stack~ wrote: I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot and it hangs at: "Probing EDD (edd=off to disable)...ok"

That's a BIOS message, not a linux or grub message, yes? (I do not see any EDD messages in the linux log files) Anyhow, *which* SuperMicro servers? (so I do not buy the same)

However, if I let it sit long enough it will boot (once one sat for an hour before it continued on; most of the time it is closer to 30-40 minutes).

The SuperMicro mobo/bios is notorious for slow booting, takes a good few minutes from powerup to grub menu. But 30 min is extreme, yes.

If I boot into the rescue kernel, it instantly boots. Every time. This is so puzzling to me.

How do you mean? The "EDD" message is before the grub menu or after the grub menu?

--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
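If you do want kernel messages to keep going to both consoles once the kernel takes over, the usual EL7 arrangement is to list both on the kernel command line (a sketch only - the serial unit and baud rate below are assumptions and must match the BIOS redirection settings):

```shell
# /etc/default/grub -- have GRUB and the kernel use both the VGA console
# and the serial port; the *last* console= listed becomes /dev/console
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200"

# then regenerate the grub configuration:
# grub2-mkconfig -o /boot/grub2/grub.cfg
```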
Re: firefox 45.2 issues - actually kernel issues.
It looks to me like it's somehow related to NFSv4, as the /var/log/messages files on the affected clients contain repeated lines like:

nfsidmap[7508]: Failed to add child keyring: Operation not permitted

along with call traces from the hung firefox processes:

kernel: Call Trace:
kernel: [] ? nfs_permission+0xb2/0x1e0 [nfs]
kernel: [] ? security_inode_permission+0x1f/0x30
kernel: [] __mutex_lock_slowpath+0x96/0x210
kernel: [] mutex_lock+0x2b/0x50
kernel: [] do_filp_open+0x2d6/0xd20
kernel: [] ? nfs_attribute_cache_expired+0x1b/0x70 [nfs]
kernel: [] ? cp_new_stat+0xe4/0x100
kernel: [] ? strncpy_from_user+0x4a/0x90
kernel: [] ? alloc_fd+0x92/0x160
kernel: [] do_sys_open+0x67/0x130
kernel: [] sys_open+0x20/0x30
kernel: [] system_call_fastpath+0x16/0x1b

The nfs-utils package hasn't been updated in a while, so nfsidmap itself shouldn't have changed, but I see that it uses an in-kernel keyring to store the mappings, so it could certainly be affected by a change in the kernel.

G.

On 6/21/2016 9:40 AM, Jesse Bren wrote:

Looks like it's not the firefox update at fault but the new kernel, which was also released about the same time, not playing nicely with our NFS file servers. Hopefully this can be resolved soon; currently I am having users boot into the previous kernel version (2.6.32-573.26.1.el6.x86_64 as opposed to 2.6.32-642.el6.x86_64). Sorry for the, somewhat, false alarm about firefox.

Jesse
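The "boot the previous kernel" workaround can be made the default rather than relying on users to pick it at the menu every time - on EL6 that is a one-line change in the grub config (the stanza order below is an assumption; count the "title" entries in your own file, starting from 0):

```shell
# /boot/grub/grub.conf on EL6 -- "default" is a 0-based index into the
# "title" stanzas; if the problem 2.6.32-642 kernel is entry 0 and the
# known-good 2.6.32-573.26.1 kernel is entry 1, boot the latter:
default=1
timeout=5

title Scientific Linux (2.6.32-642.el6.x86_64)
        ...
title Scientific Linux (2.6.32-573.26.1.el6.x86_64)
        ...
```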
Software Collections 2.0?
I know it hasn't been out for very long, but I was wondering if it's planned to have a Scientific Linux rebuild of SCL 2.0? Or, indeed, if the preferred strategy these days is to use the repos at softwarecollections.org? My impression from dipping into CentOS mailing list archives is that softwarecollections.org isn't necessarily a direct equivalent to the RHEL or Scientific Linux SCL builds, but perhaps that's outdated information.

Graham

--
Graham Allan
School of Physics and Astronomy - University of Minnesota
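Whichever repo the collections end up coming from, they are consumed the same way once installed - roughly like this (the collection name `rh-python34` is just an illustration from the SCL 2.0 era; use whatever the repo actually ships):

```shell
# install the SCL tooling and a sample collection
yum install scl-utils
yum install rh-python34

# run one command, or a whole shell, with the collection's paths enabled
scl enable rh-python34 'python --version'
scl enable rh-python34 bash
```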
Re: What happened to adobe repository ?
On 1/15/2014 4:20 AM, Urs Beyerle wrote:

Adobe discontinued Adobe Reader 9 for Linux in June 2013 and has not fixed, and will not fix, any further security issues in it. Therefore it makes total sense to remove it from their repo.

I'm not disagreeing with you, but it's still a breathtakingly crappy way of handling it. Acroread for linux is still available as a regular web download, so it's not remotely obvious that it's desupported unless you follow the news independently. For example, it might have been worth a final mention on Adobe's acrobat for unix blog http://blogs.adobe.com/acroread/ rather than leaving that abandoned since 2010!

Graham
Re: Large filesystem recommendation
It's not so bad if you build the system taking these things into account (much easier if you wait long enough to read about others' experiences :-). We built our BSD ZFS systems using inexpensive Intel 313 SSDs for the log devices. I can't say that they're the best possible choice - opinions vary all over the map - but the box is currently happily accepting 2Gbps continuous NFS writes, which seems pretty decent.

Graham

On 7/24/2013 5:36 PM, Paul Robert Marino wrote:

ZFS is a performance nightmare if you plan to export it via NFS, because of a core design conflict between how NFS handles locking and the ZIL journal in ZFS. It's not just a linux issue; it affects Solaris and BSD as well. My only experience with ZFS was on a Solaris NFS server, and we had to get a dedicated flash-backed RAM drive for the ZIL to fix our performance issues - and let me tell you, Sun charged us a small fortune for the card.

Aside from that, most of the cool features are available in XFS if you dive deep enough into the documentation, though most of them, like multi-disk spanning, can be handled now by LVM or MD, but are (at least in my opinion) handled better by hardware raid. Though I will admit that being able to move your journal to a separate, faster volume to increase performance is very cool, and that's a feature I've only seen in XFS and ZFS.
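Adding a dedicated (and mirrored) log device to an existing pool is a one-liner; the pool name and device ids below are hypothetical:

```shell
# attach a mirrored pair of SSDs as a separate intent log (SLOG) for
# pool "tank"; synchronous NFS writes then land on the SSDs instead of
# the spinning data vdevs
zpool add tank log mirror \
    /dev/disk/by-id/ata-INTEL_SSD_ONE \
    /dev/disk/by-id/ata-INTEL_SSD_TWO

# the devices show up under a separate "logs" section here:
zpool status tank
```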
Re: Large filesystem recommendation
I'm not sure if anyone really knows what the reliability will be, but the hope is obviously that these SLC-type drives should be longer-lasting (and they are in a mirror). Losing the ZIL used to be a fairly fatal event, but that was a long time ago (ZFS v19 or something). I think with current ZFS versions you just lose the performance boost if the dedicated ZIL device fails or goes away. There's a good explanation here: http://www.nexentastor.org/boards/2/topics/6890

Graham

On Thu, Jul 25, 2013 at 10:41:50AM -0700, Yasha Karant wrote:

How reliable are the SSDs, including actual non-corrected BER, and what is the failure rate / interval? If a ZFS log on a SSD fails, what happens? Is the log automagically recreated on a secondary SSD? Are the drives (spinning and/or SSD) mirrored? Are primary (non-log) data lost?

Yasha Karant
Re: advice on using latest firefox from mozilla
Latest firefox is available pre-packaged for SL at Remi's repo, http://rpms.famillecollet.com/ - we used it for a while when TUV was still supplying a desperately old version, though more recently we switched back to the supplied ESR release.

Graham

On Thu, Jun 06, 2013 at 12:19:23PM -0500, Ken Teh wrote:

I'd appreciate some more details on how you implement the Mozilla update protocol in a (not quite) enterprise environment. IOW, not hundreds or thousands of machines, but enough to make manual updates unfeasible. Isn't mozilla's update user-based? When a user launches firefox, the browser checks for updates and, if there is a newer version, asks the user to download and install it. What do you do if the user has no privileges to install software? Thanks!

On 06/05/2013 08:40 PM, Yasha Karant wrote:

On 06/05/2013 01:57 PM, Ken Teh wrote: I'd like to hear some pros and cons of using the latest firefox from mozilla instead of the ESR version that comes with the stock distro. I am deploying a web app that fails to render properly. It is a bug in firefox which has been fixed since version 18. Naturally, the ESR version is 17. Sigh...

As we were deploying new Nvidia-equipped stereoscopic 3D scientific visualisation workstations using X86-64 SL6x, we had to make a decision as to whether to use the SL distribution Firefox (ESR) or the latest production release. After considering the pros and cons (including that the machines are behind a network firewall), we selected the current production version. Thus far, we have had no issues, and have done several updates using the Mozilla Firefox update technique, not the SL6x update, keeping Firefox at the current production release. Part of the reason for the decision was the observation you also have made: certain defects that were corrected in the production release did not have the corrections backported to the earlier-release ESR SL version.

Yasha Karant

--
Graham Allan
School of Physics and Astronomy - University of Minnesota
SL 6.1 installation anaconda failure - issue at ftp.scientificlinux.org?
We just had to install a couple of SL 6.1 machines (due to a collaboration dependency on this version) and strangely found that the installation would bomb every time with an anaconda error when trying to parse /tmp/.treeinfo. The file contains the following contents:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>300 Multiple Choices</title>
</head><body>
<h1>Multiple Choices</h1>
The document name you requested (<code>/linux/scientific/6.1/x86_64/updates/security//.treeinfo</code>) could not be found on this server. However, we found documents with names similar to the one you requested.<p>Available documents:
<ul>
<li><a href="/linux/scientific/6.0/x86_64/updates/security//.">/linux/scientific/6.1/x86_64/updates/security//.</a> (common basename)
<li><a href="/linux/scientific/6.0/x86_64/updates/security//..">/linux/scientific/6.1/x86_64/updates/security//..</a> (common basename)
</ul>
</body></html>

This is curious because we are installing from a local copy of the 6.1 repo, our kickstart file hasn't changed in a long time, it worked ok when last used earlier this year, etc. To cut a long story short, I eventually found this at scientificlinuxforum.org: http://scientificlinuxforum.org/index.php?showtopic=2302 - it was suggested there that someone post this to the SL lists, but I don't see any sign that happened.

Their workaround of a dummy hosts entry for ftp.scientificlinux.org does let the install succeed, but I imagine that either putting the file back into place, or disabling the helpful response from the server in favor of a plain 404 response, should help? Although I'm curious why anaconda is attempting to download a file from ftp.scientificlinux.org when all the declared sources are local - some oversight in the installer build? The same problem afflicts SL 6.0, btw.

Thanks,

Graham

--
Graham Allan
School of Physics and Astronomy - University of Minnesota
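The dummy hosts entry amounts to short-circuiting DNS for the hostname anaconda tries to reach, so the request fails fast instead of fetching the bogus "300 Multiple Choices" page (a sketch; loopback is used here, but presumably any unreachable address would do):

```shell
# in the installer environment (e.g. from a %pre section or a shell on
# tty2), point ftp.scientificlinux.org at a dead end:
echo "127.0.0.1 ftp.scientificlinux.org" >> /etc/hosts
```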
Re: [SCIENTIFIC-LINUX-USERS] sata0 is not sda
On Wed, Feb 20, 2013 at 07:57:22AM -0600, Pat Riehecky wrote:

On 02/19/2013 10:14 AM, Graham Allan wrote:

On Mon, Feb 18, 2013 at 07:33:37PM -0700, Orion Poplawski wrote:

On 02/18/2013 02:01 PM, Ken Teh wrote: During a kickstart install, how are drives mapped? I notice that sata0 is not always sda. This is especially true when there are very large drives in the mix.

The sd* letters are simply handed out in order of enumeration and, as you noted, this is not deterministic. If you need that, use the /dev/disk/by-{id,label,path,uuid} labels.

For this reason we use a script during the kickstart %pre section which attempts to examine the available drives and determine which is the appropriate one to install on (it also waits for confirmation if an unexpected partition table is found).

G.

Any chance you can share that script? It sounds interesting!

Well, I hope I haven't oversold this - it's not a miracle cure, but it works for our environment. Back in the SL3/4/5 days we made some basic assumptions in kickstart %pre: if there was an hda device, that would be the OS drive; otherwise use sda. If a workstation didn't match that model, we'd generally beat on it until it did :-) IIRC, SL6 changed so that ATA drives, and any connected usb drives, appeared as sd* devices, and you couldn't be certain about the ordering.

There will be lots of deeply unfashionable things in this script, such as using regular drive partitions instead of LVM, using a separate /var, etc... Also please remember it's been hacked together as edge cases were discovered, so it's not pretty, or commented as well as it might be. It tries to examine all candidate drives (after eliminating USB devices) and will select a drive for installation if it either contains no partition table, or contains a /boot directory (actually that hardly seems foolproof; there might be better choices). Otherwise, if it finds an unfamiliar partition table, it prints it and asks for confirmation.
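The selection rule just described (no partition table: install there; partition table with /boot: that was the old system disk, reuse it; anything else: move on and eventually ask) can be sketched as a tiny standalone simulation, separate from the real %pre script. The function name and the "name:table|none:boot|-" encoding of each candidate disk are invented purely for illustration:

```shell
# Simulated disk picker: each argument describes one candidate disk as
# "name:table|none:boot|-", e.g. "sdb:table:boot" means sdb has a
# partition table and a /boot directory on its first partition.
pick_install_disk() {
    for spec in "$@"; do
        name=${spec%%:*}
        rest=${spec#*:}
        table=${rest%%:*}
        boot=${rest#*:}
        if [ "$table" = "none" ]; then
            # no partition table at all: safe to install here
            echo "$name"
            return 0
        elif [ "$boot" = "boot" ]; then
            # found an existing /boot: treat as the old system disk
            echo "$name"
            return 0
        fi
        # unfamiliar partition table: skip (the real script asks the user)
    done
    return 1
}

pick_install_disk "sda:table:-" "sdb:table:boot"   # prints: sdb
```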
The rest of the script is concerned with stashing away ssh keys and suchlike, for restoration after a reinstall. Here goes - hope vim didn't mangle it overmuch during pasting, and don't judge it too harshly!

%pre
##
# Figure out where to install to

# Make sure USB storage devices are GONE
modprobe -r usb_storage

mkdir /mnt/tmp

for i in `ls /dev/sd?`; do
    CANDIDATE_DISK=$i
    echo "UMPHYS: Checking $CANDIDATE_DISK for partition table" | tee -a /tmp/ks.log > /dev/tty3
    if parted -s $CANDIDATE_DISK print > /dev/null; then
        echo "parted found a partition table" | tee -a /tmp/ks.log > /dev/tty3
        if mount ${CANDIDATE_DISK}1 /mnt/tmp; then
            if [ -d /mnt/tmp/boot ]; then
                INSTALL_DISK=$CANDIDATE_DISK
                echo "Found /boot on ${CANDIDATE_DISK}1, using $CANDIDATE_DISK as system disk" | tee -a /tmp/ks.log > /dev/tty3
                umount /mnt/tmp
                break
            else
                echo "Couldn't find /boot on ${CANDIDATE_DISK}1, moving on to next disk..." | tee -a /tmp/ks.log > /dev/tty3
                umount /mnt/tmp
            fi
        else
            echo "Failed to mount ${CANDIDATE_DISK}1, moving on to next disk..." | tee -a /tmp/ks.log > /dev/tty3
        fi
    else
        echo "parted found no partition table, using $CANDIDATE_DISK as system disk" | tee -a /tmp/ks.log > /dev/tty3
        INSTALL_DISK=$CANDIDATE_DISK
        break
    fi
done

if [ "${INSTALL_DISK}x" = "x" ]; then
    echo > /dev/tty1
    echo "Initial check failed to find a suitable system disk!" > /dev/tty1
    echo > /dev/tty1
    for i in `ls /dev/sd?`; do
        CANDIDATE_DISK=$i
        if mount ${CANDIDATE_DISK}1 /mnt/tmp; then
            if ! [ -d /mnt/tmp/boot ]; then
                echo -e "\n\nCouldn't find /boot on ${CANDIDATE_DISK}1" > /dev/tty1
                echo -n "Partition table for ${CANDIDATE_DISK}:" > /dev/tty1
                fdisk -l ${CANDIDATE_DISK} > /dev/tty1
                doit=default
                while ! echo $doit | grep -P "([Yy]es|[Nn]o)" > /dev/null 2>&1; do
                    echo -n "Install linux on ${CANDIDATE_DISK}? [Yes/No] " > /dev/tty1
                    read doit
                done
                if echo $doit | grep -P "[Yy]es" > /dev/null 2>&1; then
                    INSTALL_DISK=$CANDIDATE_DISK
                    umount /mnt/tmp
                    break
                else
                    echo "moving on to next disk..." > /dev/tty1
                fi
                umount /mnt/tmp
            fi
        else
            echo -ne "\n\nPartition table for ${CANDIDATE_DISK}:" > /dev/tty1
            fdisk -l ${CANDIDATE_DISK} > /dev/tty1
            doit=default
            while ! echo $doit | grep -P "([Yy]es|[Nn]o)" > /dev/null 2>&1; do
                echo -n "Install linux on ${CANDIDATE_DISK}? [Yes/No] " > /dev/tty1
Re: puppet
On Fri, Feb 22, 2013 at 11:39:58AM -0500, Paul Robert Marino wrote:

The only problem I ever had with cfengine is that the documentation was never all that great, but it is stable and scales well. That being said, puppet is not perfect: many of the stock recipes for it you find on the web don't scale well, and to get it to scale you really need to be a ruby programmer. My other issue with puppet is that it doesn't provide you with a great amount of control over the timing of the deployment of changes unless you go to significant lengths. Essentially it's good for an Agile development model environment, which is popular with many web companies; however, it's a nightmare for mission-critical 24x7x365 environments which require changes to be scheduled in advance.

At the risk of continuing off-topic for the list... but it's a really interesting discussion...

We ended up building a bunch of infrastructure around cfengine to help with this kind of thing. The first step was getting the cfengine config into version control (svn, then git), which seems basic now, but I certainly wasn't doing that 10 years ago! Then one of our smart student sysadmins devised a way we could make development branches of the config, and tell specific machines which branch cfengine should use. That's been very useful for figuring out more complicated actions like setting up OSG nodes. Of course, once you build this kind of infrastructure and have it working, you're reluctant to abandon it. Maybe there are tools which do all this for you now; that's why this is such a good thread.

These days I'm using Spacewalk for most of what I would have used cfengine or puppet for in the past. The only thing that it doesn't do out of the box is make sure that particular services are running or not running at boot, but there are a myriad of other simple ways to do that which require very little work, and if I really wanted to I could get spacewalk to do that as well via the soap APIs.
Yes, spacewalk seems like it can do a lot (some of our neighbors here use it). We still have a somewhat multiplatform environment - SL, FreeBSD, a few lingering legacy systems still hanging on (tru64) - so more generic tools are still important to us.

Graham

--
Graham Allan - I.T. Manager
School of Physics and Astronomy - University of Minnesota
Re: Firefox 17 causing X server crashes
On Fri, Feb 22, 2013 at 10:30:50PM +, Phil Perry wrote:

I had X crash yesterday when using the new 17.0.3 Firefox too on EL5 (32-bit FF on 64-bit OS). I'm an NVIDIA user - I mention it as I really can't remember the last time X crashed for me, and I've been running this box since 5.0. X has crashed maybe 5 times at most in 5 years, and the box runs permanently, only rebooting for new kernels. The reason I mention this is that I'm more inclined to point the finger at the new FF than the graphics driver, at least in my case, as that's the component that changed right before the crash. I have not been able to reproduce it, so I have no evidence, just a gut feeling. For what it's worth.

We also had someone experience this on an SL5 machine in the last few days - also using nvidia video, with the nvidia driver from ELrepo. Someone fixed it, but the only comment I saw was "Getting GLX working correctly seems to have solved the problem." I remember hearing something about the nvidia libglx.so failing to load in the xorg logs, but I'll ask what the fix really was. It may simply have been reinstalling the elrepo nvidia driver.

G.

--
Graham Allan - I.T. Manager
School of Physics and Astronomy - University of Minnesota
Re: sata0 is not sda
On Mon, Feb 18, 2013 at 07:33:37PM -0700, Orion Poplawski wrote:

On 02/18/2013 02:01 PM, Ken Teh wrote: During a kickstart install, how are drives mapped? I notice that sata0 is not always sda. This is especially true when there are very large drives in the mix.

The sd* letters are simply handed out in order of enumeration and, as you noted, this is not deterministic. If you need that, use the /dev/disk/by-{id,label,path,uuid} labels.

For this reason we use a script during the kickstart %pre section which attempts to examine the available drives and determine which is the appropriate one to install on (it also waits for confirmation if an unexpected partition table is found).

G.

--
Graham Allan - I.T. Manager
School of Physics and Astronomy - University of Minnesota
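A kickstart can also be pinned to a stable name directly, which sidesteps the enumeration order entirely (the device id below is a made-up example; check the real ones on the target box first):

```shell
# the by-id symlinks track the same physical disk regardless of which
# sdX letter it was handed on this particular boot:
ls -l /dev/disk/by-id/

# kickstart usage (in ks.cfg, not %pre) -- restrict anaconda to one disk:
#   ignoredisk --only-use=disk/by-id/ata-ST31000528AS_EXAMPLE1234
```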