Re: Is Scientific Linux Still Active as a Distribution?

2020-02-23 Thread Graham Allan
I find it extremely puzzling that you can express so much fear about IBM 
and then even consider moving into the arms of Oracle!


I used Scientific Linux for about 10 years and it was fantastic. Close 
to 4 years ago I moved to a new place which uses CentOS instead. It also 
works just fine; other than the support cycle being a bit different, 
there is almost zero pain or re-learning involved. If that wasn't 
enough, FNAL and CERN have also clearly put their faith in CentOS for 
the future.


On 2/23/2020 12:24 AM, Yasha Karant wrote:

 From below:

Will look forward to move to another distribution.

End excerpt.

The question is: which distro? My first hope was Oracle EL 8, given 
that Oracle has to compete with IBM and thus, unlike CentOS (which may 
or may not fit into IBM's long-term profit/business plan -- long term 
meaning less than a decade, but more than three or four years, at least 
through the first production release of EL 9), has an incentive to 
provide a "working and usable" product, just as SL was. After reading 
comments on this list, I am more tempted to give up on EL and move to 
Ubuntu LTS. But I have not made a decision. For those who require a 
reliable, production, stable, but reasonably "current" Linux environment 
("current" meaning that when I need an application, I will not find that 
there are no ports of its recent releases to the Linux I am using 
because the major libraries -- .so files -- are too "obsolete"), what 
choices are available? In so far as possible, I want the same distro to 
work on servers (with CUDA support for compute servers with Nvidia GPU 
compute boards, as well as MPI) and on my laptop "workstation".


Re: RAID 6 array and failing harddrives

2017-04-04 Thread Graham Allan

On 4/4/2017 6:59 PM, Konstantin Olchanski wrote:

Moving to ZFS...
ZFS is also scary...


Heh - another soon to be victim of ZFS on linux :)



No kidding. Former victim of XLV+XFS (remember XLV?), former
victim of LV+EFS, former victim of ext2, ext3, reiserfs, former
victim of LVM, current victim of mdadm/raid5/6/ext4/xfs.



You'll quickly realise that the majority of major features you'd expect
to work - don't.


I am not big on "features". For me the main features are 
open()/read()/write()/close() and
mkdir()/rmdir()/readdir(), and those seem to work on all filesystems. The next 
features are:
a) non-scary raid rebuild after a crash or disk failure,
b) "online" fsck



You can't grow a ZFS 'raid'. You're stuck with the number of disks you first 
start with.


(I know this is 2 quotes back.) That's kind of unfair, since here you're 
talking about a feature that was never offered; it's just an incorrect 
assumption. It took me a while to understand what ZFS does and does not 
offer as well - I missed many things from (ancient history) Digital's 
advfs - but ZFS does lots of things quite well. There really is no such 
thing as "a ZFS raid"; that's probably most analogous to a zfs pool made 
of a single raidz vdev, but that's a very simple case. What other system 
lets you make large reliable storage pools from hundreds of drives on a 
single server? I built some with 200+ 4TB drives some years back.
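
For scale, a pool like that is assembled from several raidz2 vdevs in one
command (a sketch only; the pool name and sd* device names are made up, and a
real build would use /dev/disk/by-id paths instead):

```shell
# Build a pool from groups of drives; each raidz2 vdev survives 2 failures
zpool create tank \
  raidz2 sdb sdc sdd sde sdf sdg sdh sdi \
  raidz2 sdj sdk sdl sdm sdn sdo sdp sdq

# Repeat with more raidz2 groups to reach hundreds of drives on one server
zpool status tank
```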



We only have a few hardware configurations, all with a fixed number of disks, so 
not a problem:

a) single 120GB ssd for OS (/home on NFS)
b) single SSD for OS, dual 4-6-8 TB HDD for data, RAID1 configuration to 
protect against single disk failure
c) dual SSD for OS and /home, dual HDD for data, both RAID1 configuration to 
protect against single disk failure
d) single SSD for OS, multiple (usually 8) 6-8 TB HDDs for data, mdadm 
raid6+xfs and now raidz2 ZFS (protection against single disk failure + failure 
of second disk during raid rebuild).
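
Case (d) in mdadm terms looks roughly like this (a sketch; device names are
made up, and the chunk/layout defaults are left alone):

```shell
# 8-drive RAID6: usable capacity of 6 drives, survives any two disk failures
mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]

# Put xfs on top, as in the configuration described above
mkfs.xfs /dev/md0
```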


For case (b) for your data storage, you can expand a ZFS mirror 
reasonably easily.
For case (c) I don't know how hard it is to use ZFS for the OS drive on 
linux; I only used it on BSD. But mdadm on linux is ok for that role.
For case (d) it is true that you cannot expand a ZFS RAIDZ(2) vdev, but 
that's ok if you know that going in.
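
For case (b), the usual way to grow a ZFS mirror (a sketch; "tank" and the
device names are made up) is to replace each half with a larger disk and let
the pool auto-expand:

```shell
# Allow the pool to grow once all devices in the vdev are larger
zpool set autoexpand=on tank

# Replace each side of the mirror in turn, waiting for resilver to finish
zpool replace tank sda sdc
zpool status tank          # wait until resilvering completes before the next
zpool replace tank sdb sdd

# With autoexpand=on the extra capacity appears after the last replace
zpool list tank
```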



BTRFS is billed as "the open source replacement for ZFS", but after testing it,
my impression is that it is only used by a couple of enthusiasts
in single-disk laptop configurations. In a single-disk system, it is not
clear how btrfs/zfs is better than plain old ext4/xfs.


I've never seen any good success stories for btrfs but to be fair I have 
not followed it closely.


zfs can still give you some useful things on a single drive system: you 
get the data checksums, useful snapshots (as opposed to LVM), volume 
manager features, etc.


By default the checksums would only warn you of problems with a single 
drive, but you can tell zfs to keep multiple copies of your data ("zfs 
set copies=n") so that might well also let it recover from bad blocks.
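
Setting that on a dataset is a one-liner (sketch; "tank/data" is a made-up
dataset name), though note it only applies to blocks written after the change:

```shell
# Keep two copies of every block in this dataset (new writes only)
zfs set copies=2 tank/data
zfs get copies tank/data

# Any checksum errors the extra copy repaired show up in "zpool status -v"
```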


G.
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


Re: Need help Debuging boot

2017-02-17 Thread Graham Allan
I think that's a linux message. I've seen this before when output starts 
getting redirected to the serial port (or ipmi/iDRAC/iLO virtual serial 
port). Maybe check if there is some type of console redirection set up 
in the BIOS? It seems to me that when redirection is set up, the BIOS 
itself and GRUB can output to both serial and regular console; once the 
kernel boots, messages only go to one or the other.


The "hang" (or rather change in output) doesn't have anything to do with 
the EDD message itself - it just happens that this is the last message 
printed during that particular phase of booting.
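
If you want kernel messages on both consoles once redirection is in play, the
usual approach (a sketch; the serial port, speed, and file location are
assumptions that depend on your BIOS redirection settings and EL version) is to
list both consoles on the kernel command line; messages go to all listed
consoles, and the last one listed becomes /dev/console:

```shell
# /etc/default/grub on el7 -- on el6, append to the kernel line in grub.conf
GRUB_CMDLINE_LINUX="... console=tty0 console=ttyS0,115200n8"
```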


G.

On 2/17/2017 10:33 AM, Konstantin Olchanski wrote:

On Thu, Feb 16, 2017 at 10:07:55PM -0600, ~Stack~ wrote:

I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot and it 
hangs at:
"Probing EDD (edd=off to disable)...ok"


That's a BIOS message, not a linux or grub message, yes?

(I do not see any EDD messages in the linux log files)

Anyhow, *which* SuperMicro servers? (so I do not buy the same)


However, if I let it sit long enough it will boot (once one sat for an
hour before it continued on, most of the time it is closer to 30-40
minutes).


The SuperMicro mobo/bios is notorious for slow booting, takes a good few minutes
from powerup to grub menu. But 30 min is extreme, yes.


If I boot into rescue kernel, it instantly boots. Every time. This is so
puzzling to me.


How do you mean? The "EDD" message is before grub menu or after grub menu?


--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


Re: firefox 45.2 issues - actually kernel issues.

2016-06-22 Thread Graham Allan
It looks to me like it's somehow related to NFSv4, as the 
/var/log/messages files on the affected clients contain repeated lines like:


nfsidmap[7508]: Failed to add child keyring: Operation not permitted

along with call traces from the hung firefox processes


kernel: Call Trace:
kernel: [] ? nfs_permission+0xb2/0x1e0 [nfs]
kernel: [] ? security_inode_permission+0x1f/0x30
kernel: [] __mutex_lock_slowpath+0x96/0x210
kernel: [] mutex_lock+0x2b/0x50
kernel: [] do_filp_open+0x2d6/0xd20
kernel: [] ? nfs_attribute_cache_expired+0x1b/0x70 [nfs]
kernel: [] ? cp_new_stat+0xe4/0x100
kernel: [] ? strncpy_from_user+0x4a/0x90
kernel: [] ? alloc_fd+0x92/0x160
kernel: [] do_sys_open+0x67/0x130
kernel: [] sys_open+0x20/0x30
kernel: [] system_call_fastpath+0x16/0x1b


The nfs-utils package hasn't been updated in a while, so nfsidmap itself 
shouldn't have changed, but I see that it uses an in-kernel keyring to 
store the mappings, so it could certainly be affected by a change in the 
kernel.
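
For anyone wanting to poke at this, the idmapper keyring can be inspected and
cleared from userspace (a sketch; keyctl needs the keyutils package, and
"nfsidmap -c" needs a reasonably recent nfs-utils):

```shell
# List kernel keys; the NFSv4 idmapper entries show type "id_resolver"
grep id_resolver /proc/keys

# Show the session keyring contents, where the failing add was attempted
keyctl show @s

# Clear the idmapper keyring so mappings get re-fetched
nfsidmap -c
```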


G.

On 6/21/2016 9:40 AM, Jesse Bren wrote:

Looks like it's not the firefox update at fault but the new kernel,
which was also released about the same time, not playing nicely with
our NFS file servers. Hopefully this can be resolved soon; currently I
am having users boot into the previous kernel version
(2.6.32-573.26.1.el6.x86_64 as opposed to 2.6.32-642.el6.x86_64).

Sorry for the (somewhat) false alarm about firefox.

Jesse



Software Collections 2.0?

2015-08-17 Thread Graham Allan
I know it hasn't been out for very long, but I was wondering if it's 
planned to have a scientific linux rebuild of SCL 2.0? Or, indeed, if 
the preferred strategy these days is to use the repos at 
softwarecollections.org?


My impression from dipping into CentOS mailing list archives is that 
softwarecollections.org isn't necessarily a direct equivalent to the 
RHEL or Scientific Linux SCL builds, but perhaps that's outdated 
information.


Graham
--
-
Graham Allan
School of Physics and Astronomy - University of Minnesota
-


Re: What happened to adobe repository ?

2014-01-15 Thread Graham Allan

On 1/15/2014 4:20 AM, Urs Beyerle wrote:


Adobe discontinued Adobe Reader 9 for Linux in June 2013 and has not
fixed, and will not fix, any further security issues in it. Therefore it
makes total sense to remove it from their repo.


I'm not disagreeing with you but it's still a breathtakingly crappy way 
of handling it. Acroread for linux is still available as a regular web 
download, so it's not remotely obvious that it's desupported unless you 
follow the news independently. For example it might have been worth a 
final mention on Adobe's acrobat for unix blog 
http://blogs.adobe.com/acroread/ rather than leaving that abandoned 
since 2010!


Graham


Re: Large filesystem recommendation

2013-07-25 Thread Graham Allan
It's not so bad if you build the system taking these things into account 
(much easier if you wait long enough to read about others' experiences 
:-). We built our BSD ZFS systems using inexpensive Intel 313 SSDs for 
the log devices. I can't say that they're the best possible choice, 
opinions vary all over the map, but the box is currently happily 
accepting 2Gbps continuous NFS writes, which seems pretty decent.
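
For reference, adding a mirrored SSD log device to an existing pool is
straightforward (a sketch; the pool and device names are made up):

```shell
# Attach a mirrored pair of SSDs as a dedicated ZIL (SLOG) device
zpool add tank log mirror /dev/ada4 /dev/ada5

# Verify: the log vdev appears as a separate section in the pool layout
zpool status tank
```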


Graham

On 7/24/2013 5:36 PM, Paul Robert Marino wrote:

ZFS is a performance nightmare if you plan to export it via NFS, because
of a core design conflict between NFS locking and the ZIL journal in
ZFS. It's not just a Linux issue; it affects Solaris and BSD as well. My
only experience with ZFS was on a Solaris NFS server, and we had to get a
dedicated flash-backed RAM drive for the ZIL to fix our performance
issues -- and let me tell you, Sun charged us a small fortune for the card.
Aside from that, most of the cool features are available in XFS if you
dive deep enough into the documentation, though most of them, like multi-disk
spanning, can be handled now by LVM or MD but are, at least in my
opinion, handled better by hardware RAID. Though I will admit that being
able to move your journal to a separate, faster volume to increase
performance is very cool, and that's a feature I've only seen in XFS and ZFS.


Re: Large filesystem recommendation

2013-07-25 Thread Graham Allan
I'm not sure if anyone really knows what the reliability will be, but
the hope is obviously that these SLC-type drives should be
longer-lasting (and they are in a mirror).

Losing the ZIL used to be a fairly fatal event, but that was a long time
ago (ZFS v19 or something). I think with current ZFS versions you just
lose the performance boost if the dedicated ZIL device fails or goes away.
There's a good explanation here:
  http://www.nexentastor.org/boards/2/topics/6890

Graham

On Thu, Jul 25, 2013 at 10:41:50AM -0700, Yasha Karant wrote:
 How reliable are the SSDs, including actual non-corrected BER, and
 what is the failure rate / interval ?
 
 If a ZFS log on a SSD fails, what happens?  Is the log automagically
 recreated on a secondary SSD?  Are the drives (spinning and/or SSD)
 mirrored? Are primary (non-log) data lost?
 
 Yasha Karant


Re: advice on using latest firefox from mozilla

2013-06-06 Thread Graham Allan
Latest firefox is available pre-packaged for SL at Remi's repo,
http://rpms.famillecollet.com/

we used it for a while when TUV was still supplying a desperately old
version, though more recently switched back to the supplied ESR release.

Graham

On Thu, Jun 06, 2013 at 12:19:23PM -0500, Ken Teh wrote:
 I'd appreciate some more details on how you implement the Mozilla update
 protocol in a (not quite) enterprise environment.  IOW, not hundreds or
 thousands of machines, but enough to make manual updates infeasible.
 
 Isn't mozilla's update user-based?  When a user launches firefox, the browser
 checks for updates, and if there is a newer version, asks the user to download
 and install it.  What do you do if the user has no privileges to install
 software?
 
 Thanks!
 
 
 On 06/05/2013 08:40 PM, Yasha Karant wrote:
 On 06/05/2013 01:57 PM, Ken Teh wrote:
 I'd like to hear some pros and cons with using the latest firefox from
 mozilla instead of using the ESR version that comes with the stock
 distro.  I am deploying a web app that fails to render properly. It is a
 bug in firefox which has been fixed since version 18.
 
 Naturally, the ESR version is 17. Sigh...
 
 As we were deploying new Nvidia-equipped stereoscopic 3D scientific
 visualisation workstations using X86-64 SL6x, we had to make a decision as
 to whether to use the SL distribution Firefox (ESR) or the latest production
 release.  After considering the pros and cons (including that the machines
 are behind a network firewall), we selected the current production version.
 Thus far, we have had no issues, and have done several updates using the
 Mozilla Firefox update technique, not the SL6x update, keeping Firefox up to
 the current production release.  Part of the reason for the decision was the
 observation you also have made: certain defects that were corrected in the
 production release did not have the corrections backported to the earlier
 ESR release shipped with SL.
 
 Yasha Karant

-- 
-
Graham Allan
School of Physics and Astronomy - University of Minnesota
-


SL 6.1 installation anaconda failure - issue at ftp.scientificlinux.org?

2013-06-03 Thread Graham Allan
We just had to install a couple of SL 6.1 machines (due to collaboration
dependency on this version) and strangely found that the installation
would bomb every time with an anaconda error when trying to parse
/tmp/.treeinfo. The file contains the following contents:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>300 Multiple Choices</title>
</head><body>
<h1>Multiple Choices</h1>
The document name you requested
(<code>/linux/scientific/6.1/x86_64/updates/security//.treeinfo</code>) could
not be found on this server.
However, we found documents with names similar to the one you
requested.<p>Available documents:
<ul>
<li><a href="/linux/scientific/6.0/x86_64/updates/security//.">/linux/scientific/6.1/x86_64/updates/security//.</a> (common basename)
<li><a href="/linux/scientific/6.0/x86_64/updates/security//..">/linux/scientific/6.1/x86_64/updates/security//..</a> (common basename)
</ul>
</body></html>

This is curious because we are installing from a local copy of the 6.1 repo, our
kickstart file hasn't changed in a long time, worked ok when last used earlier
this year, etc.

To cut a long story short I eventually found this at
scientificlinuxforum.org:
http://scientificlinuxforum.org/index.php?showtopic=2302

it was suggested there that someone post this to the SL lists but I
don't see any sign that happened. Their workaround of a dummy hosts
entry for ftp.scientificlinux.org does let the install succeed, but I
imagine that either putting the file back into place, or disabling the
helpful response from the server in favor of a plain 404 response,
should help?
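
For completeness, the workaround from that forum thread amounts to pointing
the hostname somewhere harmless before anaconda tries to fetch the file (a
sketch; in a kickstart this would go in %pre):

```shell
# Black-hole ftp.scientificlinux.org so anaconda's stray fetch fails fast
echo "127.0.0.1 ftp.scientificlinux.org" >> /etc/hosts
```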

Although I'm curious why anaconda is attempting to download a file from
ftp.scientificlinux.org when all the declared sources are local - some
oversight in the installer build?

The same problem afflicts SL 6.0, btw.

Thanks,

Graham
-- 
-
Graham Allan
School of Physics and Astronomy - University of Minnesota
-


Re: [SCIENTIFIC-LINUX-USERS] sata0 is not sda

2013-02-22 Thread Graham Allan
On Wed, Feb 20, 2013 at 07:57:22AM -0600, Pat Riehecky wrote:
 On 02/19/2013 10:14 AM, Graham Allan wrote:
 On Mon, Feb 18, 2013 at 07:33:37PM -0700, Orion Poplawski wrote:
 On 02/18/2013 02:01 PM, Ken Teh wrote:
 During a kickstart install, how are drives mapped?  I notice that sata0
 is not always sda.  This is especially true when there are very large
 drives in the mix.
 The sd* letters are simply handed out in order of enumeration and,
 as you noted, is not deterministic.  If you need that, use the
 /dev/disk/by-{id,label,path,uuid} labels.
 For this reason we use a script during the kickstart %pre section which
 attempts to examine the available drives and determine which is the
 appropriate one to install on (also waits for confirmation if an
 unexpected partition table is found).
 
 G.
 
 Any chance you can share that script?  It sounds interesting!

Well I hope I haven't oversold this - it's not a miracle cure but it
works for our environment.

Back in the SL3/4/5 days we made some basic assumptions in kickstart
%pre that if there was an hda device, that would be the OS drive,
otherwise use sda. If a workstation didn't match that model, we'd
generally beat on it until it did :-) IIRC SL6 changed so that ATA drives,
and any connected usb drives, appeared as sd* devices, and you couldn't be
certain about the ordering.

There will be lots of deeply unfashionable things in this script such as
using regular drive partitions instead of LVM, using separate /var,
etc... Also please remember it's been hacked together as edge cases were
discovered so it's not pretty, or commented as well as it might be.

It tries to examine all candidate drives (after eliminating USB devices)
and will select a drive for installation if it either contains no
partition table, or finds a /boot directory (actually that hardly seems
foolproof, there might be better choices). Otherwise if it finds an
unfamiliar partition table, it prints it and asks for confirmation.

The rest of the script is concerned with stashing away ssh keys and
suchlike, for restoration after a reinstall.

Here goes, hope vim didn't mangle it overmuch during pasting, and don't
judge it too harshly!

%pre

##
# Figure out where to install to

# Make sure USB storage devices are GONE
modprobe -r usb_storage

mkdir /mnt/tmp

for i in `ls /dev/sd?`; do
    CANDIDATE_DISK=$i

    echo "UMPHYS: Checking $CANDIDATE_DISK for partition table" | tee -a /tmp/ks.log > /dev/tty3

    if parted -s $CANDIDATE_DISK print > /dev/null; then
        echo "parted found a partition table" | tee -a /tmp/ks.log > /dev/tty3
        if mount ${CANDIDATE_DISK}1 /mnt/tmp; then
            if [ -d /mnt/tmp/boot ]; then
                INSTALL_DISK=$CANDIDATE_DISK
                echo "Found /boot on ${CANDIDATE_DISK}1, using $CANDIDATE_DISK as system disk" | tee -a /tmp/ks.log > /dev/tty3
                umount /mnt/tmp
                break
            else
                echo "Couldn't find /boot on ${CANDIDATE_DISK}1, moving on to next disk..." | tee -a /tmp/ks.log > /dev/tty3
                umount /mnt/tmp
            fi
        else
            echo "Failed to mount ${CANDIDATE_DISK}1, moving on to next disk..." | tee -a /tmp/ks.log > /dev/tty3
        fi
    else
        echo "parted found no partition table, using $CANDIDATE_DISK as system disk" | tee -a /tmp/ks.log > /dev/tty3
        INSTALL_DISK=$CANDIDATE_DISK
        break
    fi
done

if [ "${INSTALL_DISK}x" = "x" ]; then
    echo "" > /dev/tty1
    echo "Initial check failed to find a suitable system disk!" > /dev/tty1
    echo "" > /dev/tty1
    for i in `ls /dev/sd?`; do
        CANDIDATE_DISK=$i
        if mount ${CANDIDATE_DISK}1 /mnt/tmp; then
            if ! [ -d /mnt/tmp/boot ]; then
                echo -e "\n\nCouldn't find /boot on ${CANDIDATE_DISK}1" > /dev/tty1
                echo -n "Partition table for ${CANDIDATE_DISK}:" > /dev/tty1
                fdisk -l ${CANDIDATE_DISK} > /dev/tty1
                doit=default
                while ! echo $doit | grep -P "([Y|y]es|[N|n]o)" > /dev/null 2>&1; do
                    echo -n "Install linux on ${CANDIDATE_DISK}? [Yes/No] " > /dev/tty1
                    read doit
                done
                if echo $doit | grep -P "[Y|y]es" > /dev/null 2>&1; then
                    INSTALL_DISK=$CANDIDATE_DISK
                    umount /mnt/tmp
                    break
                else
                    echo "moving on to next disk..." > /dev/tty1
                fi
                umount /mnt/tmp
            fi
        else
            echo -ne "\n\nPartition table for ${CANDIDATE_DISK}:" > /dev/tty1
            fdisk -l ${CANDIDATE_DISK} > /dev/tty1
            doit=default
            while ! echo $doit | grep -P "([Y|y]es|[N|n]o)" > /dev/null 2>&1; do
                echo -n "Install linux on ${CANDIDATE_DISK}? [Yes/No] " > /dev/tty1

Re: puppet

2013-02-22 Thread Graham Allan
On Fri, Feb 22, 2013 at 11:39:58AM -0500, Paul Robert Marino wrote:
 The only problem I ever had with cfengine is that the documentation was
 never all that great, but it is stable and scales well.
 That being said, puppet is not perfect: many of the stock recipes for it
 you find on the web don't scale well, and to get it to scale you really
 need to be a Ruby programmer. My other issue with puppet is it doesn't
 provide you with a great amount of control over the timing of the
 deployment of changes unless you go to significant lengths.
 Essentially it's good for an Agile development model environment, which
 is popular with many web companies; however it's a nightmare for
 mission critical 24x7x365 environments which require changes to be
 scheduled in advance.

At the risk of continuing off-topic for the list... but it's a really
interesting discussion... We ended up building a bunch of infrastructure
around cfengine to help with this kind of thing. First step was getting
the cfengine config into version control (svn, then git), which seems
basic now, but I certainly wasn't doing that 10 years ago! Then one of
our smart student sysadmins devised a way we could make development
branches of the config, and tell specific machines which branch cfengine
should use. That's been very useful for figuring out more complicated
actions like setting up OSG nodes.

Of course once you build this kind of infrastructure and have it
working, you're reluctant to abandon it. Maybe there are tools which
do all this for you now, that's why this is such a good thread.

 These days I'm using Spacewalk for most of what I would have used
 cfengine or puppet for in the past. The only thing that it doesn't do
 out of the box is make sure that particular services are running or
 not running at boot, but there are a myriad of other simple ways to do
 that which require very little work, and if I really wanted to I could
 get Spacewalk to do that as well via the SOAP APIs.

Yes, spacewalk seems like it can do a lot (some of our neighbors here
use it). We still have a somewhat multiplatform environment - SL,
FreeBSD, a few lingering legacy systems still hanging on (tru64), so
more generic tools are still important to us.

Graham
-- 
-
Graham Allan - I.T. Manager
School of Physics and Astronomy - University of Minnesota
-


Re: Firefox 17 causing X server crashes

2013-02-22 Thread Graham Allan
On Fri, Feb 22, 2013 at 10:30:50PM +, Phil Perry wrote:
 
 I had X crash yesterday when using the new 17.0.3 Firefox too on EL5
 (32-bit FF on 64-bit OS).
 
 I'm an NVIDIA user - I mention it as I really can't remember the
 last time X crashed for me, and I've been running this box since
 5.0. X has crashed maybe 5 times at most in 5 years and the box runs
 permanently only rebooting for new kernels. The reason I mention
 this is I'm more inclined to point the finger at the new FF than the
 graphics driver, at least in my case, as that's the component that
 changed right before the crash. I have not been able to reproduce it
 so have no evidence, just a gut feeling. For what it's worth.

We also had someone experience this on an SL5 machine in the last few
days - also using nvidia video, using the nvidia driver from ELrepo.

Someone fixed it but the only comment I saw was "Getting GLX working
correctly seems to have solved the problem." I remember hearing
something about the nvidia libglx.so failing to load in the xorg logs,
but I'll ask what the fix really was. It may simply have been
reinstalling the elrepo nvidia driver.

G.
-- 
-
Graham Allan - I.T. Manager
School of Physics and Astronomy - University of Minnesota
-


Re: sata0 is not sda

2013-02-19 Thread Graham Allan
On Mon, Feb 18, 2013 at 07:33:37PM -0700, Orion Poplawski wrote:
 On 02/18/2013 02:01 PM, Ken Teh wrote:
 During a kickstart install, how are drives mapped?  I notice that sata0
 is not always sda.  This is especially true when there are very large
 drives in the mix.
 
 The sd* letters are simply handed out in order of enumeration and,
 as you noted, is not deterministic.  If you need that, use the
 /dev/disk/by-{id,label,path,uuid} labels.

For this reason we use a script during the kickstart %pre section which
attempts to examine the available drives and determine which is the
appropriate one to install on (also waits for confirmation if an
unexpected partition table is found).
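
Where the target drive is known in advance, a stable-name alternative (a
sketch; the device id is made up, and by-id specs need a sufficiently recent
anaconda) is to pin the kickstart to /dev/disk/by-id paths rather than sd*
letters:

```shell
# Kickstart fragment: only consider the named drive, addressed by stable id
ignoredisk --only-use=disk/by-id/ata-ST4000DM000-EXAMPLE
bootloader --boot-drive=disk/by-id/ata-ST4000DM000-EXAMPLE
```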

G.
-- 
-
Graham Allan - I.T. Manager
School of Physics and Astronomy - University of Minnesota
-