Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-28 Thread Ben Hutchings
On Mon, 2016-06-27 at 11:58 -0400, Jeffrey Mark Siskind wrote:
>    The non-determinism in which identifiers are shown might be a bug in the
>    installer, or it might be caused by failure of ID commands to the
>    drives.
> 
>    I think most of the problems you're still having must be caused by a
>    bug in the RAID driver, mpt2sas (or its firmware, if that's not
>    embedded in the BIOS).
> 
> Thanks. Please let me know how I can report the potential bug(s) and what I
> can do to help track them down.

Please test a more recent kernel version, like Linux 4.6
(available as linux-image-4.6.0-0.bpo.1-amd64 in jessie-backports).

Then use 'reportbug' to submit a bug report against that package if the
bug is still present, or the jessie package if it's fixed there, giving
a summary of the problems you've described.
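A minimal sketch for confirming that the backports kernel is really the one running before reproducing the problem and filing the report. It assumes GNU `sort -V` is available. `kernel_at_least` is a hypothetical helper name, not part of any Debian tool.

```sh
#!/bin/sh
# Sketch: confirm the running kernel is at least 4.6 (for example
# linux-image-4.6.0-0.bpo.1-amd64 from jessie-backports) before
# reproducing the problem and filing a report with reportbug.

# kernel_at_least WANT RELEASE (hypothetical helper)
# Succeeds if RELEASE, in `uname -r` form, is version WANT or newer.
kernel_at_least() {
    want=$1
    have=${2%%-*}    # "4.6.0-0.bpo.1-amd64" -> "4.6.0"
    # sort -V orders version strings; the smaller one comes out first.
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

release=$(uname -r)
if kernel_at_least 4.6 "$release"; then
    # Debian kernel image packages are named linux-image-<uname -r>.
    echo "Running $release; if the bug persists: reportbug linux-image-$release"
else
    echo "Running $release; boot the 4.6 backports kernel first"
fi
```

Installing the kernel in the first place would typically be `apt-get -t jessie-backports install linux-image-4.6.0-0.bpo.1-amd64`, assuming jessie-backports is already listed in /etc/apt/sources.list.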

Ben.

-- 

Ben Hutchings
If at first you don't succeed, you're doing about average.




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-28 Thread Ben Hutchings
On Tue, 2016-06-28 at 00:49 +0200, deloptes wrote:
> Jeffrey Mark Siskind wrote:
> 
> >    I had big issues with mptsas and 3.16 in jessie, so I am still using
> >    3.2.0-4-rt-amd64
> > 
> > Will jessie run with 3.2.0-4-rt-amd64? If so, where do I get it and how do
> > I install it on a fresh jessie install that wasn't dist-upgraded from
> > wheezy?
> > 
> > Jeff (http://engineering.purdue.edu/~qobi)
> 
> Yes I run it with that kernel since wheezy. You can get it from wheezy
> https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64

Any particular reason why you use the -rt variant?

> https://packages.debian.org/wheezy/linux-headers-3.2.0-4-rt-amd64
> https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64-dbg

The proper way is to add the wheezy-security suite to
/etc/apt/sources.list.  (All updates to wheezy now go to the wheezy-
security suite.)
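A hedged sketch of adding that suite idempotently. It assumes the standard Debian security mirror, where wheezy's updates are published under the suite name wheezy/updates; `ensure_line` is a hypothetical helper.

```sh
#!/bin/sh
# ensure_line FILE LINE (hypothetical helper): append LINE to FILE unless an
# identical line is already present, so rerunning it is harmless.
ensure_line() {
    # -x: match whole line, -F: fixed string, -q: quiet.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Assumption: standard security.debian.org mirror and suite wheezy/updates.
# Usage (as root):
#   ensure_line /etc/apt/sources.list 'deb http://security.debian.org/ wheezy/updates main'
#   apt-get update && apt-get install linux-image-3.2.0-4-rt-amd64
```

After `apt-get update`, the wheezy -rt kernel package should then be installable on the jessie system alongside the jessie kernel.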

> Here is what I have and a bit of background
> 
> # uname -a
> Linux lisa 3.2.0-4-rt-amd64 #1 SMP PREEMPT RT Debian 3.2.68-1+deb7u4 x86_64 GNU/Linux
> 
> # cat /etc/debian_version
> 8.5
> 
> - Disk controller is mptsas here, not mpt2sas as you posted. No idea what
> the difference is.
[...]

So far as I can see, mptsas is for SAS 1.0 (3 Gbps) controllers and
mpt2sas is for SAS 2.0 (6 Gbps) controllers.  They are two entirely
separate drivers, probably with different sets of bugs.
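To check which of the two drivers a given machine is actually using, `lspci -k` prints a "Kernel driver in use:" line under each PCI device. The helper below is a hypothetical sketch that filters that output down to SAS controllers; it assumes the usual `lspci -k` layout (a device line starting with the PCI address, followed by indented detail lines).

```sh
#!/bin/sh
# sas_drivers (hypothetical helper): read `lspci -k` output on stdin and print
# the kernel driver bound to each "Serial Attached SCSI controller" entry,
# e.g. mptsas (SAS 1.0) or mpt2sas (SAS 2.0).
sas_drivers() {
    awk '
        # A device line starts with its PCI address (hex digits) in column 1.
        /^[0-9a-f]/ { in_sas = ($0 ~ /Serial Attached SCSI controller/) }
        # Indented detail lines under a SAS device: keep only the driver name.
        in_sas && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# Usage: lspci -k | sas_drivers
```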

Ben.

-- 

Ben Hutchings
If at first you don't succeed, you're doing about average.




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-27 Thread Jeffrey Mark Siskind
   I had big issues with mptsas and 3.16 in jessie, so I am still using
   3.2.0-4-rt-amd64

Will jessie run with 3.2.0-4-rt-amd64? If so, where do I get it and how do I
install it on a fresh jessie install that wasn't dist-upgraded from wheezy?

Jeff (http://engineering.purdue.edu/~qobi)



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-27 Thread Jeffrey Mark Siskind
   The non-determinism in which identifiers are shown might be a bug in the
   installer, or it might be caused by failure of ID commands to the
   drives.

   I think most of the problems you're still having must be caused by a
   bug in the RAID driver, mpt2sas (or its firmware, if that's not
   embedded in the BIOS).

Thanks. Please let me know how I can report the potential bug(s) and what I
can do to help track them down.

Jeff (http://engineering.purdue.edu/~qobi)



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-27 Thread Ben Hutchings
On Mon, 2016-06-27 at 08:07 -0400, Jeffrey Mark Siskind wrote:
[...]
> Whenever I observe any of the behavior reported in this email, it is
> almost always associated with dmesg reporting the same error on the same
> sector 2056 (sometimes 2058 or 2062). Given the dozens of attempted
> reinstalls and reboots, at this point, I have seen this on almost all, if
> not all, of the six disks on each of the four machines. I don't believe
> that 24 disks all have the same bad sectors.

The first partition probably starts at an offset of 1MB, which is 2048
sectors.  So these errors are presumably occurring while reading a
filesystem label near the start of that partition, which is pretty much
the first thing that will happen after the array is assembled.
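The arithmetic, assuming 512-byte logical sectors: 1 MiB is 2048 sectors, so sector 2056 is 8 sectors (4 KiB) into the first partition. As a further observation, not made in the thread itself: the arrays here use version-1.2 metadata ("super 1.2" in the /proc/mdstat output quoted in this thread), and mdadm places a 1.2 superblock 4 KiB from the start of each member, which may fit the "md: super_written gets error=-5" messages in the logs. `offset_in_partition` is a hypothetical helper; a partition's real start sector can be read from /sys/block/<disk>/<partition>/start.

```sh
#!/bin/sh
# offset_in_partition ABS_SECTOR PART_START (hypothetical helper)
# Print how far an absolute sector from a dmesg I/O error lies inside a
# partition, in 512-byte sectors and in bytes.
offset_in_partition() {
    off=$(( $1 - $2 ))
    echo "$off sectors = $(( off * 512 )) bytes"
}

# A partition aligned at 1 MiB starts at sector 1 MiB / 512 B = 2048.
start=$(( 1024 * 1024 / 512 ))
offset_in_partition 2056 "$start"    # prints: 8 sectors = 4096 bytes
# On a live system: offset_in_partition 2056 "$(cat /sys/block/sdb/sdb1/start)"
```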

[...]
>  D. In step (4), there appears to be nondeterminism in the serial numbers of
> the disks that get reported in the menu of options of where to install
> grub. Sometimes, the disks get reported as ata-*, sometimes as scsi-*.
> Note that all of my disks are SATA so the ones reported as scsi-* are
> clearly in error. If I do fresh installs multiple times on the same
> machine, each time it reports different serial numbers for the disks.

Linux uses an ATA/SCSI translation layer (libata), so that each ATA
drive is also seen as a SCSI drive and has two such identifiers.  The
non-determinism in which identifiers are shown might be a bug in the
installer, or it might be caused by failure of ID commands to the
drives.
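One way to see both identifiers for each drive is to list the symlinks under /dev/disk/by-id grouped by the device node they resolve to; on a libata system a SATA drive typically appears under both an ata-* and a scsi-* name. A sketch, with `list_ids` as a hypothetical helper:

```sh
#!/bin/sh
# list_ids DIR (hypothetical helper): for every symlink in DIR print
# "<resolved target> <link name>", sorted, so that all names pointing at the
# same physical disk (for example ata-* and scsi-*) end up on adjacent lines.
list_ids() {
    for link in "$1"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        printf '%s %s\n' "$(readlink -f "$link")" "${link##*/}"
    done | sort
}

# Usage: list_ids /dev/disk/by-id | grep -v -- '-part[0-9]'
```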

[...]
> Note that there is a lot of nondeterministic behavior (all cases above where I
> say "sometimes"). In all cases, I do exactly the same thing over and over to
> the same machine and get different behavior.

This is an unfortunate effect of doing multiple things in parallel,
which is really the only way to make them go fast.

I think most of the problems you're still having must be caused by a
bug in the RAID driver, mpt2sas (or its firmware, if that's not
embedded in the BIOS).

Ben.

-- 

Ben Hutchings
Humour is the best antidote to reality.




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-27 Thread Jeffrey Mark Siskind
I'd like to thank everyone for helping out.

Here is an update on installing jessie on R815s.

I succeeded in installing on three of my four R815s. But I am holding off on
the last because it is my file server and there are still issues. Please read
on. I don't believe that the problem is solved and there may be a bug lurking
that can lead to data loss.

Here is what I did.

 1. Before the install, while still running wheezy, I upgraded the BIOS.
  R815_BIOS_JF8YH_LN_3.2.2.BIN
This seemed to alleviate the problem of the jessie installer failing to
find the ISO. More on this later.

 2. Before the install, while still running wheezy, I reduced the number of
components of md0 from 6 to 4. This was in response to Steve's suggestion.
  mdadm /dev/md0 --fail /dev/sdf1
  mdadm /dev/md0 --fail /dev/sde1
  mdadm /dev/md0 --remove /dev/sdf1
  mdadm /dev/md0 --remove /dev/sde1

 3. I did a fresh USB install of jessie. More on this later.

 4. When it asked about which devices to install grub, I answered "manual" and
then typed /dev/sdb. More on this later.

 5. After the fresh install, I rebooted, and in grub, I added rootdelay=20.
This was in response to Don's suggestion.

 6. After the reboot, I ran my standard post-install script. Among other
things, this installs numerous packages, makes a small number of mods to
/etc, and does a dpkg-reconfigure grub-pc. When it did that, I specified
only the 4 drives with active components of md0 and added rootdelay=20.

 7. I rebooted. More on this later.
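Rather than typing rootdelay=20 at the grub prompt on every boot, the usual Debian route is to add it to GRUB_CMDLINE_LINUX in /etc/default/grub and rerun update-grub (the "Linux command line" question in dpkg-reconfigure grub-pc edits the same setting). The helper below is a hypothetical, idempotent sketch of that edit; it assumes the line exists in the standard double-quoted form and uses a simple substring check for "already present".

```sh
#!/bin/sh
# add_kernel_param FILE PARAM (hypothetical helper)
# Append PARAM to GRUB_CMDLINE_LINUX="..." in FILE unless it is already there.
# Assumes PARAM contains no sed-special characters such as / or &.
add_kernel_param() {
    file=$1
    param=$2
    if grep -q "^GRUB_CMDLINE_LINUX=\".*$param" "$file"; then
        return 0                            # already present: nothing to do
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 $param\"/" "$file"
    # Drop the leading space left behind when the value was empty.
    sed -i 's/^GRUB_CMDLINE_LINUX=" /GRUB_CMDLINE_LINUX="/' "$file"
}

# Usage (as root):
#   add_kernel_param /etc/default/grub rootdelay=20 && update-grub
```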

Now for the issues.

 A. Even after the BIOS upgrade, when it no longer fails to find the ISO,
during the installer phase where it searches for an ISO, I notice
nondeterministic behavior. Sometimes it searches sdb{1,2,3}, sdc{1,2,3},
sdd{1,2,3}, sde{1,2,3}, sdf{1,2,3}, sdg{1,2,3}, sd{a,b,c,d,e,f,g} and
eventually finds an ISO (sda is the USB dongle). Sometimes it finds the
ISO right away without any searching. This doesn't cause problems but I
believe that it is symptomatic of other problems.

 B. I'm not sure that reducing the number of components of md0 to 4 and/or
adding rootdelay=20 really solved the problem. I think it just reduced the
likelihood of occurrence. On one of the machines (arivu), during the
reboot in step (7), at an early phase of the boot, the machine first
reported that it found all 4 components of md0 and all 6 components of md1.
Then, at a later phase, it reported that there were errors on 3 of the 4
components. After the machine came up, md0 had only one component. Three
of the four components were in the failed (F) state. I ran mdadm --remove on
them and then mdadm --add. This doesn't happen all of the time. But
it happens some of the time.


  qobi@upplysingaoflun>all-n-3g dmesg --level=err
  upplysingaoflun:
  verstand:
  arivu:
  [   28.012558] mpt2sas0: fault_state(0x265d)!
  [   29.231355] end_request: I/O error, dev sdb, sector 2056
  [   29.231600] end_request: I/O error, dev sdc, sector 2056
  [   29.231773] end_request: I/O error, dev sde, sector 2056
  [   29.232020] end_request: I/O error, dev sda, sector 2056
  perisikan:
  [   13.035132] mpt2sas0: fault_state(0x265d)!
  [   28.600099] mpt2sas0: fault_state(0x265d)!
  qobi@upplysingaoflun>

  qobi@upplysingaoflun>all-n-3g "dmesg --level=warn|fgrep -i error|fgrep -v ACPI"
  upplysingaoflun:
  verstand:
  arivu:
  [   29.231430] md: super_written gets error=-5, uptodate=0
  [   29.231670] md: super_written gets error=-5, uptodate=0
  [   29.231869] md: super_written gets error=-5, uptodate=0
  [   29.232117] md: super_written gets error=-5, uptodate=0
  perisikan:
  qobi@upplysingaoflun>

(These are my four R815s. upplysingaoflun is the file server that has not
been updated. The other three have.) Note that one machine reports no
"mpt2sas0: fault_state(0x265d)" errors, one machine reports one, and one
machine reports two. Note that the machine that dropped three components
of md0 during boot reported I/O errors on all 4 disks with the 4
components of md0. I don't believe that there really are faulty disks.
Whenever I observe any of the behavior reported in this email, it is
almost always associated with dmesg reporting the same error on the same
sector 2056 (sometimes 2058 or 2062). Given the dozens of attempted
reinstalls and reboots, at this point, I have seen this on almost all, if
not all, of the six disks on each of the four machines. I don't believe
that 24 disks all have the same bad sectors.

 C. In step (3), sometimes, but not always, during the install, I get a screen
that says that some partition failed. It offers a menu of two options. I
select "retry". Sometimes, but not always, this causes md0 to drop
components in the installer, which I fix by going to ctrl-alt-f2 
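A quick way to back up the "same sector on every disk" observation across many boots is to tally the I/O error lines by sector. The sketch below assumes the "I/O error, dev sdX, sector N" wording shown in the dmesg output above (other kernel versions may prefix it differently); `error_sectors` is a hypothetical helper.

```sh
#!/bin/sh
# error_sectors (hypothetical helper): read dmesg output on stdin and print
# "<count> <sector> <devices>" for every sector named in an
# "I/O error, dev sdX, sector N" message, most frequent first.
error_sectors() {
    awk '
        /I\/O error, dev [a-z0-9]+, sector [0-9]+/ {
            for (i = 1; i < NF; i++) {
                if ($i == "dev")    { dev = $(i + 1); sub(/,$/, "", dev) }
                if ($i == "sector") { sec = $(i + 1); sub(/,$/, "", sec) }
            }
            count[sec]++
            devs[sec] = devs[sec] " " dev   # devices in order of appearance
        }
        END { for (s in count) print count[s], s devs[s] }
    ' | sort -rn
}

# Usage: dmesg | error_sectors
```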

Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-24 Thread Steve McIntyre
On Fri, Jun 24, 2016 at 06:22:37PM -0400, Jeffrey Mark Siskind wrote:
>Please note that booting with rootdelay=20 does not solve the problem. It only
>masks it.
>
> 1. If I attempt a fresh USB install of jessie, when md0 is correctly built
>before the install, the process of doing the fresh install breaks
>md0. When it gets to grub install, components of md0 are missing (even
>though all six components were present before the install). And
>grub-install fails. At this point it is impossible to complete the install
>and produce a bootable system.
>
> 2. If I do a fresh minimal USB install of wheezy, rebuilding md0 in the
>process, and then do a dist-upgrade to jessie, I can manually add
>rootdelay=20 in grub and boot into jessie with all six components of md0
>present. But if I do so, then after boot, if I do dpkg-reconfigure grub-pc,
>doing that gives errors, drops components of md0, precludes me from adding
>them back, fails to install grub, and leaves the machine in an unbootable
>state.
>
>I fear that there is a problem writing to disk. Even if I boot with
>rootdelay=20, unless the kind of writes that dpkg-reconfigure grub-pc does are
>different, doing ordinary writes to disk may also corrupt the disk.
>
>Please let me know what new information you would like me to gather.

Ummm. Checking back up-thread, I can see that you're using md0 across
more than 4 disks and you're trying to boot off it with
grub-pc. You're hitting BIOS limitations here - the BIOS is only
capable of accessing 4 disks. I'm *guessing* that maybe the newer grub
in jessie is just being pickier about checking BIOS access to those
disks. Try just using 4 of the disks for md0, and I'd expect it to
work.
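Note that for a RAID1 array like md0, failing and removing members leaves it running degraded with its original slot count (e.g. [6/4] in /proc/mdstat). A hedged sketch of turning it into a genuine 4-device mirror instead, assuming the array is otherwise healthy and that sde1 and sdf1 are the members to drop. `run` and `shrink_raid1` are hypothetical names, and by default the sketch only prints the mdadm commands (set DRY_RUN=0 to execute them).

```sh
#!/bin/sh
# run (hypothetical wrapper): print a command; execute it only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# shrink_raid1 ARRAY NEWCOUNT DEV... (hypothetical helper)
# Drop each DEV from the RAID1 ARRAY, reduce the member count to NEWCOUNT so
# the array is no longer degraded, then wipe the md metadata on the dropped DEVs.
shrink_raid1() {
    md=$1; count=$2; shift 2
    for dev in "$@"; do
        run mdadm "$md" --fail "$dev" --remove "$dev"
    done
    run mdadm --grow "$md" --raid-devices="$count"
    run mdadm --zero-superblock "$@"
}

# Usage: shrink_raid1 /dev/md0 4 /dev/sde1 /dev/sdf1
```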

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
"Arguing that you don't care about the right to privacy because you have
 nothing to hide is no different than saying you don't care about free
 speech because you have nothing to say."
   -- Edward Snowden



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-24 Thread Jeffrey Mark Siskind
Please note that booting with rootdelay=20 does not solve the problem. It only
masks it.

 1. If I attempt a fresh USB install of jessie, when md0 is correctly built
before the install, the process of doing the fresh install breaks
md0. When it gets to grub install, components of md0 are missing (even
though all six components were present before the install). And
grub-install fails. At this point it is impossible to complete the install
and produce a bootable system.

 2. If I do a fresh minimal USB install of wheezy, rebuilding md0 in the
process, and then do a dist-upgrade to jessie, I can manually add
rootdelay=20 in grub and boot into jessie with all six components of md0
present. But if I do so, then after boot, if I do dpkg-reconfigure grub-pc,
doing that gives errors, drops components of md0, precludes me from adding
them back, fails to install grub, and leaves the machine in an unbootable
state.

I fear that there is a problem writing to disk. Even if I boot with
rootdelay=20, unless the kind of writes that dpkg-reconfigure grub-pc does are
different, doing ordinary writes to disk may also corrupt the disk.

Please let me know what new information you would like me to gather.

Jeff (http://engineering.purdue.edu/~qobi)



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-23 Thread deloptes
deloptes wrote:

> Don Armstrong wrote:
> 
>> The error would be useful to know. Most likely one or more of them
>> dropped out of the array for some reason and you're booting off of one
>> which has a lower event count and it won't assemble.
>> 
>> But it could be any number of things.
>> 
>> The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
>> assemble would also be useful.
> 
> In my case it is a Dell OptiPlex 7xx. I have had it under my desk for 2 years
> now, but it looks like it is 5 years old.
> When I looked into the drives they were detected, but the md disks seemed to
> be messed up and not easily recoverable.
> 
> What I observed was that only raid0 was loaded, not raid1. After removing
> raid0 and loading raid1 I was able to see at least the partitions of the
> drives but I did not have time to go further, so as I had to do a lot in
> the office and @home I just shut it down. I hope I'll have some time next
> week to play with that. Good that I do not need a remote machine at the
> moment.
> 
> I hope this helps
> 
> regards

Just FYI

I managed to solve this on the OptiPlex today - it took me 3h and several
reboots.

The thing is that the BIOS is locked by the admins and the built-in RAID
controller cannot be deactivated. Apparently, when upgrading (or maybe even
before that, I don't know), the array info from the built-in array was dumped
into mdadm.conf, which led to a broken initrd.

When in the initramfs shell, I had to unload all related modules because I
had only sda and sdb (no partitions). After loading ata_piix I got all the
partitions, and loading raid1 + ext4 was enough to start the RAID and mount
root:

mdadm -A -R /dev/md0   (and likewise for the other arrays)

then

mount /dev/md2 /root
mount -o bind /proc /root/proc
mount -o bind /sys /root/sys
mount -o bind /run /root/run
mount -o bind /dev /root/dev
cd /root

exec /sbin/chroot . /bin/sh <<- EOF >dev/console 2>&1
exec /sbin/init ${CMDLINE}
EOF

Then install the 3.16 kernel and run update-initramfs, but first remove the
wrong ARRAY info from mdadm.conf.

DONE.

messy

regards
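Don's suggestion quoted above, running `mdadm --examine` on the members when md0 fails to assemble, largely comes down to comparing the Events counter each member reports; a member whose count lags the others is the stale one. A hypothetical sketch that extracts just those two fields, assuming the "Events : N" line format mdadm prints for version-1.x metadata:

```sh
#!/bin/sh
# member_events (hypothetical helper): read `mdadm --examine` output on stdin
# and print "<device> <events>" for each member, lowest event count first.
member_events() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }    # "/dev/sda1:" header line
        $1 == "Events" && $2 == ":" { print dev, $3 }
    ' | sort -k2,2n
}

# Usage: mdadm --examine /dev/sd[abcdef]1 | member_events
```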



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-23 Thread deloptes
Lennart Sorensen wrote:

> Unless you are sharing the drive with windows I would highly recommend
> avoiding that, and doing the software raid purely in linux.  It is much
> simpler and much better supported.

Hi,
this is not a critical machine. I let debian configure the raid at
installation time. Usually I do it myself and I use software raid - as you
said it is in many ways better.

What should I do to revive the raid and be able to boot? You said you have
done this.

thanks



RE: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Jared_Dominguez
Dell Customer Communication
>>Are you certain that there isn't a PERC H700 in this machine? [Sort of
>>odd that mpt2sas is triggering a state error in your screenshot if there
>>actually isn't one.]
>>
>> There could be one. But I probably don't use it. I use software RAID. Dell
>> wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
>> may have needed the PERC H700. But I never even booted RHEL. The first
>> thing I did was a fresh install of squeeze, or maybe wheezy.
> 
>We definitely sell PowerEdge systems without an OS and have for quite a
>while. However, we do limit configuration for higher end systems to include
>hardware RAID.
> 
> My apologies. I may misremember. I purchased the machines (twelve
> T5500s, four R815s, and four C6145s) about 5 years ago and don't remember
> precisely the arrangements. I'd have to check archived email to know for
> sure.
> 
> The machines were purchased through ECN (Purdue's Engineering IT
> services). I'm a lowly professor. But I software-maintain my own machines. I
> definitely didn't spec out a hardware RAID controller. The mechanisms by
> which one was included are unclear at this point.

It looks like Stuart is out of office, but I'll try to remember to ping him 
when he's back.

>There's definitely a PERC controller in there based on
> 
>"05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008
> PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)"
> 
>I'm not seeing the subvendor/subsystem ID's there but it's presumably the
>PERC 6/i. If you're really not using it at all, you might be able to pull
>it out if the driver for it is causing problems. However, I suspect you
>need it to connect to the drive backplane. Stuart (CCed) may be able to
>offer some more insight into driver issues you might see.
> 
>The SATA controller should only really be in use by the optical drive if
>present. Some of the mid-tier systems of that generation support SATA
>drives connected directly to a controller on the motherboard, but support
>for that under Linux was spotty from my recollection.
> 
> My T5500s have optical drives. But neither my R815s nor my C6145s have
> optical drives. All my machines have SATA drives. The R815s in question each
> have six ST9500530NS drives. They have been running squeeze and then
> wheezy with software RAID for 5 years since purchase.
> 
> Now that I have someone from Dell on the line who appears to be Debian-
> friendly, it would be nice if you made firmware upgrades Debian-friendly. I
> have been able to apply

If you have an iDRAC, it's possible to apply those updates out-of-band through
the Lifecycle Controller, using either WS-MAN or racadm.

I'm told (I work on client platforms (laptops/desktops/etc) now so haven't 
checked) that DUPs (the .BIN files) built after December 2014 should work on 
Debian and Ubuntu now, though only PowerEdge 12G/13G were tested. Also, not all 
types of DUPs have been tested, but the BIOS and iDRAC DUPs should work pretty 
well. More obscure stuff like Qlogic DUPs may not work. I'm not working in that 
area so am just relaying information and don't know much more than that.

>   R815_BIOS_JF8YH_LN_3.2.2.BIN
> 
> but have not been able to apply
> 
>   ESM_Firmware_7N76T_LN32_1.07_A00.BIN
>   ESM_Firmware_J7YYK_LN32_2.85_A00.BIN
>   SATA_FRMW_LX_R300994.BIN
> 
> (I don't even know if either of the ESM upgrades are for my hardware. But
> the shell scripts don't run.)

ESM = Embedded Server Management. "ESM" updates are for updating the iDRAC.

> Jeff (http://engineering.purdue.edu/~qobi)



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Jeffrey Mark Siskind
   >Are you certain that there isn't a PERC H700 in this machine? [Sort of
   >odd that mpt2sas is triggering a state error in your screenshot if there
   >actually isn't one.]
   > 
   > There could be one. But I probably don't use it. I use software RAID. Dell
   > wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
   > may have needed the PERC H700. But I never even booted RHEL. The first
   > thing I did was a fresh install of squeeze, or maybe wheezy.

   We definitely sell PowerEdge systems without an OS and have for quite a
   while. However, we do limit configuration for higher end systems to include
   hardware RAID.

My apologies. I may misremember. I purchased the machines (twelve T5500s,
four R815s, and four C6145s) about 5 years ago and don't remember precisely
the arrangements. I'd have to check archived email to know for sure.

The machines were purchased through ECN (Purdue's Engineering IT services). I'm
a lowly professor. But I software-maintain my own machines. I definitely
didn't spec out a hardware RAID controller. The mechanisms by which one was
included are unclear at this point.

   There's definitely a PERC controller in there based on 

   "05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 
PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)"

   I'm not seeing the subvendor/subsystem ID's there but it's presumably the
   PERC 6/i. If you're really not using it at all, you might be able to pull
   it out if the driver for it is causing problems. However, I suspect you
   need it to connect to the drive backplane. Stuart (CCed) may be able to
   offer some more insight into driver issues you might see.

   The SATA controller should only really be in use by the optical drive if
   present. Some of the mid-tier systems of that generation support SATA
   drives connected directly to a controller on the motherboard, but support
   for that under Linux was spotty from my recollection.

My T5500s have optical drives. But neither my R815s nor my C6145s have optical
drives. All my machines have SATA drives. The R815s in question each have
six ST9500530NS drives. They have been running squeeze and then wheezy with
software RAID for 5 years since purchase.

Now that I have someone from Dell on the line who appears to be
Debian-friendly, it would be nice if you made firmware upgrades
Debian-friendly. I have been able to apply

  R815_BIOS_JF8YH_LN_3.2.2.BIN

but have not been able to apply

  ESM_Firmware_7N76T_LN32_1.07_A00.BIN
  ESM_Firmware_J7YYK_LN32_2.85_A00.BIN
  SATA_FRMW_LX_R300994.BIN

(I don't even know if either of the ESM upgrades are for my hardware. But the
shell scripts don't run.)

Jeff (http://engineering.purdue.edu/~qobi)



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Jeffrey Mark Siskind
I conjecture that there may be two to five separate issues.

 1. Setting up md0 upon boot takes a long time. rootdelay=20 fixes this.
 2. There is a problem writing to disk. Perhaps just writing to certain blocks.
Because even when the machine boots with rootdelay=20, and md0 has all 6
components, grub-install fails and causes md0 to drop some/most of its
components.

Both of these are observed after a dist-upgrade to jessie from a fresh USB
install of wheezy. Separate from this, there are two other errors observed
with a direct fresh USB install of jessie.

 3. Can't find the ISO.
 4. grub-install
This may be the same as (2) above.

This is yet distinct from the fact that

 5. a fresh direct USB install of jessie on the Dell Poweredge C6145s takes a
really long time (an hour) for each hardware probe (three times, once
before finding the ISO, once before partitioning, and once before grub
install).

Jeff (http://engineering.purdue.edu/~qobi)



RE: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Jared_Dominguez
>Are you certain that there isn't a PERC H700 in this machine? [Sort of
>odd that mpt2sas is triggering a state error in your screenshot if there
>actually isn't one.]
> 
> There could be one. But I probably don't use it. I use software RAID. Dell
> wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
> may have needed the PERC H700. But I never even booted RHEL. The first
> thing I did was a fresh install of squeeze, or maybe wheezy.

We definitely sell PowerEdge systems without an OS and have for quite a while. 
However, we do limit configuration for higher end systems to include hardware 
RAID.

There's definitely a PERC controller in there based on 

"05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 
PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)"

I'm not seeing the subvendor/subsystem ID's there but it's presumably the PERC 
6/i. If you're really not using it at all, you might be able to pull it out if 
the driver for it is causing problems. However, I suspect you need it to 
connect to the drive backplane. Stuart (CCed) may be able to offer some more 
insight into driver issues you might see.

The SATA controller should only really be in use by the optical drive if 
present. Some of the mid-tier systems of that generation support SATA drives 
connected directly to a controller on the motherboard, but support for that 
under Linux was spotty from my recollection.

>OK. This:
> 
>> 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
> 
>makes me think that the SATA controller is in IDE/Legacy mode instead of
>AHCI. In theory, this shouldn't matter, but it's possible that this is
>also a problem. I'd try switching it in the bios and see what happens.
> 
> I'll do that in a bit. Before I got your current post, I tried some things in
> response to your previous post. I'll report on that here and then go back and
> try the new things.
> 
> Here is what I did.
> 
> I had a fresh minimal USB install of wheezy running. That install was done
> with debian-wheezy-DI-b1-amd64-netinst.iso from Jul 15  2012. I also put
> the non-free firmware on the USB. When I did that, I unchecked all of the
> boxes during the install for any extra packages. The only thing that I
> installed after that was
> 
>apt-get install less
> 
> I then did
> 
>nano /etc/apt/sources.list
>(change all wheezy to jessie)
>apt-get update
>apt-get dist-upgrade
> 
> I answered all of the defaults.
> 
> (default) all
> (default) no
> (default) cron
> 
> I captured this with
> 
>script -t 2>upgrade-jessie1 time -a ~/upgrade-jessie1.script
> 
> (My mistake. I forgot a period between upgrade-jessie1 and time.)
> 
>http://upplysingaoflun.ecn.purdue.edu/~qobi/time
>http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie1
> 
> You can see that it all worked.
> 
> You can see that at the end I did
> 
>apt-get install firmware-linux
> 
>dpkg-reconfigure grub-pc
># default
># default
># check all /dev/sd?
> 
> and it all worked.
> 
> You can also see that at the end I did
> 
>cat /proc/mdstat
> 
> and all 6 components of both md0 and md1 were there.
> 
> Then I did
> 
>/sbin/reboot
> 
> The first reboot failed. It gave a screen similar to the one that you
> already saw.
> 
> Then I did a second reboot, with delay=20. That did the same.
> 
> Then I did a third reboot, with rootdelay=20. That worked. I got a login
> prompt, logged in, and got a root shell.
> 
> At that point, I did a
> 
>cat /proc/mdstat
> 
> and all 6 components of both md0 and md1 were there.
> 
> Then I did a
> 
>dpkg-reconfigure grub-pc
> 
> My intent was to add rootdelay=20 to the command line. But I got lots of
> errors while doing so. I realized that I should have done this under script.
> So I did
> 
>script -t 2>upgrade-jessie2.time -a ~/upgrade-jessie2.script
> 
> (this time with the period) and redid
> 
>dpkg-reconfigure grub-pc
> 
> and also did
> 
>cat /proc/mdstat
> 
> and attempted
> 
>mdadm /dev/md0 --add /dev/sda1
>mdadm /dev/md0 --add /dev/sdb1
>mdadm /dev/md0 --add /dev/sdc1
>mdadm /dev/md0 --add /dev/sdd1
>mdadm /dev/md0 --add /dev/sde1
>mdadm /dev/md0 --add /dev/sdf1
> 
> but these all failed.
> 
>http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.script
>http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.time
> 
> The machine is now in the state left at the end of the above script. If you
> want me to do some more things in this state, let me know. Or I can do a
> fresh USB install of wheezy and rebuild md0.
> 
>>What does the kernel output while it is detecting the disks and
>>partitions?
> 
>Remove the quiet option from the kernel command line by editing it in grub.
> 
> I will do this next time.
> 
>> 
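The sources.list edit in the quoted upgrade steps (changing every wheezy entry to jessie) can also be scripted. A sketch, assuming a plain one-entry-per-line /etc/apt/sources.list and that keeping a .bak copy is acceptable; `switch_suite` is a hypothetical name.

```sh
#!/bin/sh
# switch_suite FILE OLD NEW (hypothetical helper)
# Replace the release name OLD with NEW on every "deb"/"deb-src" line of FILE
# (commented-out lines are left alone), keeping the original as FILE.bak.
switch_suite() {
    cp "$1" "$1.bak"
    sed -i "/^deb/s/$2/$3/g" "$1"
}

# Usage (as root):
#   switch_suite /etc/apt/sources.list wheezy jessie
#   apt-get update && apt-get dist-upgrade
```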

Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Jeffrey Mark Siskind
   > and attempted
   > 
   >mdadm /dev/md0 --add /dev/sda1
   >mdadm /dev/md0 --add /dev/sdb1
   >mdadm /dev/md0 --add /dev/sdc1
   >mdadm /dev/md0 --add /dev/sdd1
   >mdadm /dev/md0 --add /dev/sde1
   >mdadm /dev/md0 --add /dev/sdf1
   > 
   > but these all failed.

   This is the wrong command; it should be mdadm --assemble /dev/md0
   /dev/sd[abcdef]1;

   And that should only be done if the md0 device doesn't show up in the
   initrd when you cat /proc/mdstat.

   What's happened is that the raid1 device now has 12 drives instead of 6,
   which basically isn't going to work at all.

You can see from the transcript that md0 is there and has only 6 drives. Just
that 5 of the six are marked as failed. And you can see that it refused to do
the mdadm --add.

   http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.script

   root@verstand:~# cat /proc/mdstat
   Personalities : [raid1] [raid6] [raid5] [raid4]
   md1 : active raid5 sda2[0] sdf2[5] sdd2[4] sdc2[3] sde2[2] sdb2[1]
         1953118720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]

   md0 : active raid1 sda1[6](F) sdd1[8](F) sdb1[7](F) sde1[9](F) sdc1[10] sdf1[11](F)
         39157688 blocks super 1.2 [6/1] [__U___]

   unused devices: <none>
   root@verstand:~# mdadm /dev/md0 --add /dev/sda1
   mdadm: Cannot open /dev/sda1: Device or resource busy
   root@verstand:~# mdadm /dev/md0 --add /dev/sdb1
   mdadm: Cannot open /dev/sdb1: Device or resource busy
   root@verstand:~# mdadm /dev/md0 --add /dev/sdd1
   mdadm: Cannot open /dev/sdd1: Device or resource busy
   root@verstand:~# mdadm /dev/md0 --add /dev/sde1
   mdadm: Cannot open /dev/sde1: Device or resource busy
   root@verstand:~# mdadm /dev/md0 --add /dev/sdf1
   mdadm: Cannot open /dev/sdf1: Device or resource busy
   root@verstand:~# mdadm /dev/md0 --add /dev/sdc1
   mdadm: Cannot open /dev/sdc1: Device or resource busy
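In that transcript each partition is still listed in md0, marked (F), which would explain "Device or resource busy": md keeps a failed member claimed by the array until it is removed. Jeff's 2016-06-27 update in this thread describes `mdadm --remove` followed by `mdadm --add` working. A hypothetical sketch that reads /proc/mdstat and prints (without running) that remove/add pair for each failed member:

```sh
#!/bin/sh
# failed_members ARRAY [MDSTAT] (hypothetical helper)
# Print the member partitions that /proc/mdstat (or the file MDSTAT) marks as
# failed "(F)" for ARRAY, e.g. md0.
failed_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\[[0-9]+\]\(F\)$/) {
                    sub(/\[.*/, "", $i)          # "sda1[6](F)" -> "sda1"
                    print $i
                }
        }
    ' "${2:-/proc/mdstat}"
}

# readd_commands ARRAY [MDSTAT] (hypothetical helper)
# Print, but do not run, a remove/add pair for each failed member of ARRAY.
readd_commands() {
    for part in $(failed_members "$1" "$2"); do
        echo "mdadm /dev/$1 --remove /dev/$part"
        echo "mdadm /dev/$1 --add /dev/$part"
    done
}

# Usage: readd_commands md0
```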

   You should be able to just directly reinstall jessie on this machine;

In earlier posts I explained how this fails. If I do a direct install from
USB, I observe two kinds of errors.

 1. Sometimes, but not every time, (it is nondeterministic) after the first 3
questions, the installer complains that it can't find the ISO.
 2. Whenever it does find the ISO, the install progresses without error all
the way to the grub install and then complains that it can't install grub.
I've tried several different things. Sometimes, I just answer sda to the
grub install question. (Actually sometimes sdb, because if I plug the USB
into the front port, the USB gets sdg and the drives get sd[a-f] but if I
plug the USB into the back port, the USB gets sda and the drives get
sd[b-g].) But this always fails. Sometimes, I go into ctrl-alt-f2 and do
  chroot target
  grub-install /dev/sda
  ...
  grub-install /dev/sdf
  (or b-g as appropriate)
but this also fails. At that point, I have no way to install grub. (If I
abort the install, the machine is unbootable.) Whenever I'm in this state
I do cat /proc/mdstat and it shows that some components of md0 are failed
or missing. Some are present. This is nondeterministic. Which components
are present and which are missing changes each time I attempt this. If I
attempt to do mdadm --add I get errors. If I reinstall fresh wheezy from
USB and then in wheezy do mdadm --add, it works and rebuilds the
array. When it is done it has all 6 components. And then I immediately do
a fresh install of jessie from USB and the same problem happens.
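A minimal sketch of one thing that might be worth trying in that state, assuming a POSIX shell and awk; readd_plan is a hypothetical helper, not part of mdadm. A member that /proc/mdstat still flags as (F) stays attached to md0 until it is removed, which may be why a plain --add reports "Device or resource busy". The helper reads a copy of /proc/mdstat and only prints a remove-then-add command pair for each such member, so the commands can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not an mdadm feature): print, without running, an
# mdadm remove/add pair for every member of ARRAY that mdstat flags "(F)".
# Usage: readd_plan md0 [saved-copy-of-/proc/mdstat]
readd_plan() {
    array=$1                    # e.g. md0
    mdstat=${2:-/proc/mdstat}   # live file or a saved snapshot
    awk -v a="$array" '
        $1 == a {
            # Member fields look like "sda1[6](F)"; skip everything else.
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\][(]F[)]$/) continue
                dev = $i
                sub(/\[.*$/, "", dev)   # "sda1[6](F)" -> "sda1"
                printf "mdadm /dev/%s --remove /dev/%s\n", a, dev
                printf "mdadm /dev/%s --add /dev/%s\n", a, dev
            }
        }' "$mdstat"
}
```

For example, saving cat /proc/mdstat >mdstat.1 after a failed attempt and running readd_plan md0 mdstat.1 would list sda1, sdd1, sdb1, sde1 and sdf1 for the snapshot quoted above; keeping one snapshot per attempt would also record which members dropped out each time.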

   I'd also zero out the superblocks on the devices in /dev/md0,

What command?

Jeff (http://engineering.purdue.edu/~qobi)



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Don Armstrong
On Wed, 22 Jun 2016, Jeffrey Mark Siskind wrote:
> and attempted
> 
>mdadm /dev/md0 --add /dev/sda1
>mdadm /dev/md0 --add /dev/sdb1
>mdadm /dev/md0 --add /dev/sdc1
>mdadm /dev/md0 --add /dev/sdd1
>mdadm /dev/md0 --add /dev/sde1
>mdadm /dev/md0 --add /dev/sdf1
> 
> but these all failed.

This is the wrong command; it should be mdadm --assemble /dev/md0
/dev/sd[abcdef]1;

And that should only be done if the md0 device doesn't show up in the
initrd when you cat /proc/mdstat.
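That condition can be written as a check. A small sketch, assuming the grep available in the initramfs; md_active is a hypothetical helper name, and the example assembly line simply reuses the member list from this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if ARRAY (e.g. md0) is listed as active
# in mdstat. The optional second argument names a saved copy of the file.
md_active() {
    grep -q "^$1 : active" "${2:-/proc/mdstat}"
}

# Example for the initramfs shell (not run here). If md0 is listed but
# "inactive", it would probably need "mdadm --stop /dev/md0" first.
#   md_active md0 || mdadm --assemble /dev/md0 /dev/sd[abcdef]1
```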

What's happened is that the raid1 device now has 12 drives instead of 6,
which basically isn't going to work at all.

You should be able to just directly reinstall jessie on this machine;
I'd also zero out the superblocks on the devices in /dev/md0, and then
assuming that the syncing has proceeded enough, you should be able to
install grub with an appropriate rootdelay and get it to boot. (Again,
in theory.)
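For the question of which command zeroes the superblocks, the relevant mdadm option is --zero-superblock. A hedged dry-run sketch, assuming the md0 members are the sd?1 partitions from this thread; zero_plan is a hypothetical helper that only prints the commands, so they can be checked before anything destructive runs. It must not be given the sd?2 partitions, which hold md1 (/aux).

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the commands that stop
# ARRAY and wipe the md superblock on each listed member partition.
# After --zero-superblock those partitions are no longer recognised as
# array members, so list only members of the array being rebuilt.
zero_plan() {
    array=$1
    shift
    echo "mdadm --stop /dev/$array"
    for dev in "$@"; do
        echo "mdadm --zero-superblock $dev"
    done
}
```

zero_plan md0 /dev/sd[abcdef]1 prints mdadm --stop /dev/md0 followed by one --zero-superblock line per partition; after reviewing, the output could be piped to sh.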

-- 
Don Armstrong  https://www.donarmstrong.com

The computer allows you to make mistakes faster than any other
invention, with the possible exception of handguns and tequila
 -- Mitch Ratcliffe



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Lennart Sorensen
On Wed, Jun 22, 2016 at 09:24:51PM +0200, deloptes wrote:
> Sorry previous went out incomplete, because of some shortcut I pressed
> wrongly
> 
> Here is what I found
> 
> lrwxrwxrwx 1 root root   9 Jun 22 14:16
> ata-WDC_WD800GD-75FLC3_WD-WMAKE1962410 -> ../../sda
> lrwxrwxrwx 1 root root   9 Jun 22 14:16
> ata-WDC_WD800JD-75JNC0_WD-WCAM97914701 -> ../../sdb
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-name-isw_dgebjhdbhb_Volume0 -> ../../dm-0
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-name-isw_dgebjhdbhb_Volume0p1 -> ../../dm-1
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-name-isw_dgebjhdbhb_Volume0p2 -> ../../dm-2
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-name-isw_dgebjhdbhb_Volume0p3 -> ../../dm-3
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-uuid-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-0
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-uuid-part1-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-1
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-uuid-part2-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-2
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> dm-uuid-part3-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-3
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> raid-isw_dgebjhdbhb_Volume0-part1 -> ../../dm-1
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> raid-isw_dgebjhdbhb_Volume0-part2 -> ../../dm-2
> lrwxrwxrwx 1 root root  10 Jun 22 14:16
> raid-isw_dgebjhdbhb_Volume0-part3 -> ../../dm-3
> 
> mdadm --examine --scan
> ARRAY metadata=imsm UUID=4613f991:8bbd4593:72c1388f:0b91f6a7
> ARRAY /dev/md/Volume0 container=4613f991:8bbd4593:72c1388f:0b91f6a7 member=0
> UUID=546fe9f1:4a141c96:5d18debe:ee4cb184
> ARRAY metadata=imsm UUID=4613f991:8bbd4593:72c1388f:0b91f6a7
> ARRAY /dev/md/Volume0 container=4613f991:8bbd4593:72c1388f:0b91f6a7 member=0
> UUID=546fe9f1:4a141c96:5d18debe:ee4cb184

Oh dear, that means you are using intel fake raid.  I had no end of
trouble when I tried to do that, and often had to manually start the
raid in the initramfs before the boot would continue.

Unless you are sharing the drive with windows I would highly recommend
avoiding that, and doing the software raid purely in linux.  It is much
simpler and much better supported.

-- 
Len Sorensen



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Jeffrey Mark Siskind
   Are you certain that there isn't a PERC H700 in this machine? [Sort of
   odd that mpt2sas is triggering a state error in your screenshot if there
   actually isn't one.]

There could be one. But I probably don't use it. I use software RAID. Dell
wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
may have needed the PERC H700. But I never even booted RHEL. The first thing I
did was a fresh install of squeeze, or maybe wheezy.

   OK. This:

   > 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]

   makes me think that the SATA controller is in IDE/Legacy mode instead of
   AHCI. In theory, this shouldn't matter, but it's possible that this is
   also a problem. I'd try switching it in the bios and see what happens.

I'll do that in a bit. Before I got your current post, I tried some things in
response to your previous post. I'll report on that here and then go back and
try the new things.

Here is what I did.

I had a fresh minimal USB install of wheezy running. That install was done
with debian-wheezy-DI-b1-amd64-netinst.iso from Jul 15  2012. I also put the
non-free firmware on the USB. When I did that, I unchecked all of the boxes
during the install for any extra packages. The only thing that I installed
after that was

   apt-get install less

I then did

   nano /etc/apt/sources.list
   (change all wheezy to jessie)
   apt-get update
   apt-get dist-upgrade

I answered all of the defaults.

(default) all
(default) no
(default) cron

I captured this with

   script -t 2>upgrade-jessie1 time -a ~/upgrade-jessie1.script

(My mistake. I forgot a period between upgrade-jessie1 and time.)

   http://upplysingaoflun.ecn.purdue.edu/~qobi/time
   http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie1

You can see that it all worked.

You can see that at the end I did

   apt-get install firmware-linux

   dpkg-reconfigure grub-pc
   # default
   # default
   # check all /dev/sd?

and it all worked.

You can also see that at the end I did

   cat /proc/mdstat

and all 6 components of both md0 and md1 were there.

Then I did

   /sbin/reboot

The first reboot failed. It gave a screen similar to the one that you
already saw.

Then I did a second reboot, with delay=20. That did the same.

Then I did a third reboot, with rootdelay=20. That worked. I got a login
prompt, logged in, and got a root shell.

At that point, I did a 

   cat /proc/mdstat

and all 6 components of both md0 and md1 were there.

Then I did a

   dpkg-reconfigure grub-pc

My intent was to add rootdelay=20 to the command line. But I got lots of
errors while doing so. I realized that I should have done this under script.
So I did

   script -t 2>upgrade-jessie2.time -a ~/upgrade-jessie2.script

(this time with the period) and redid

   dpkg-reconfigure grub-pc

and also did

   cat /proc/mdstat

and attempted

   mdadm /dev/md0 --add /dev/sda1
   mdadm /dev/md0 --add /dev/sdb1
   mdadm /dev/md0 --add /dev/sdc1
   mdadm /dev/md0 --add /dev/sdd1
   mdadm /dev/md0 --add /dev/sde1
   mdadm /dev/md0 --add /dev/sdf1

but these all failed.

   http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.script
   http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.time

The machine is now in the state left at the end of the above script. If you
want me to do some more things in this state, let me know. Or I can do a fresh
USB install of wheezy and rebuild md0.

   >What does the kernel output while it is detecting the disks and
   >partitions?

   Remove the quiet option from the kernel command line by editing it in grub.

I will do this next time.

   > Do all of the drives show up properly?

   echo /dev/sd*; should give you an idea of what is there in the initramfs.

I will do this next time.

   >When the boot fails, can you read from the underlying block
   >devices?

   more /dev/sda; should work, I believe.

I will do this next time.

   > I don't know what one can do at the initramfs command prompt. If you give
   > me some commands, I will try them out and post the output.
   > 
   >Does specifying delay=20 or similar result in a successful boot?

   > I will try this.

   This should actually be rootdelay=20; sorry.

Done. See above.

   > I will try to get this info. It will require me to redo the exercise
   > of a fresh jessie install from USB. I'll have to take and post screen
   > pictures because I have no way to capture the console output.

   I believe the R815 still has a serial port; you can just plug in a
   serial cable and append an appropriate serial tty option to the kernel
   command line to get output as text.

I figured out how to use script. That will work for most situations.

   What I'm trying to do is get enough information so that the error is
   obvious.

Thanks. Let me know what you want me to try next. Do you still wish me to do
the following?

   >What does the kernel output while it 

Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-22 Thread Don Armstrong
On Tue, 21 Jun 2016, Jeffrey Mark Siskind wrote:
> http://upplysingaoflun.ecn.purdue.edu/~qobi/20160619_140357.jpg

Are you certain that there isn't a PERC H700 in this machine? [Sort of
odd that mpt2sas is triggering a state error in your screenshot if there
actually isn't one.]

> I don't believe that I have any add-in cards. The machine was
> purchased straight from Dell. It has six SATA disks and 4 gigabit
> ethernet ports. It has four 12-core AMD CPUs and 128GB RAM. The output
> of lspci on an identical machine purchased at the same time that is
> still running wheezy is enclosed below.

OK. This:

> 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI 
> SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]

makes me think that the SATA controller is in IDE/Legacy mode instead of
AHCI. In theory, this shouldn't matter, but it's possible that this is
also a problem. I'd try switching it in the bios and see what happens.

>What does the kernel output while it is detecting the disks and
>partitions?

Remove the quiet option from the kernel command line by editing it in grub.

> Do all of the drives show up properly?

echo /dev/sd*; should give you an idea of what is there in the initramfs.

>When the boot fails, can you read from the underlying block
>devices?

more /dev/sda; should work, I believe.
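The block-device check can also be done for all disks at once. A sketch for the initramfs shell; check_readable is a hypothetical helper, and it swaps more for dd (assumed to be available from busybox in the initramfs) so that it reads just the first 512-byte sector of each device and reports success or failure instead of paging binary data.

```shell
#!/bin/sh
# Hypothetical helper: try to read the first sector of each device given
# and report whether that worked. Nothing is written to the devices.
check_readable() {
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=512 count=1 2>/dev/null; then
            echo "$dev: readable"
        else
            echo "$dev: NOT readable"
        fi
    done
}
```

check_readable /dev/sd? would cover every disk the kernel detected; if no disk was detected at all, the unexpanded pattern itself is reported as not readable.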

> I don't know what one can do at the initramfs command prompt. If you give
> me some commands, I will try them out and post the output.
> 
>Does specifying delay=20 or similar result in a successful boot?

> I will try this.

This should actually be rootdelay=20; sorry.

> I will try to get this info. It will require me to redo the exercise
> of a fresh jessie install from USB. I'll have to take and post screen
> pictures because I have no way to capture the console output.

I believe the R815 still has a serial port; you can just plug in a
serial cable and append an appropriate serial tty option to the kernel
command line to get output as text.
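Both suggestions end up as kernel command-line options. Editing the entry in GRUB only affects one boot; on Debian they can be made persistent in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub. A sketch, assuming that file layout; add_kernel_param is a hypothetical helper, and ttyS0 at 115200 baud is an assumption, since the serial port the R815 actually exposes should be checked in its BIOS settings.

```shell
#!/bin/sh
# Hypothetical helper: append one kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT, doing nothing if it is already present.
# Run update-grub afterwards to regenerate /boot/grub/grub.cfg.
add_kernel_param() {
    file=$1    # normally /etc/default/grub
    param=$2   # e.g. rootdelay=20 or console=ttyS0,115200n8
    # Already there (delimited by a quote or a space)? Leave the file alone.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the parameter just before the closing double quote of the value.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file" >"$tmp" &&
        cat "$tmp" >"$file"
    rm -f "$tmp"
}
```

For example: add_kernel_param /etc/default/grub rootdelay=20, then console=tty0 and console=ttyS0,115200n8, then update-grub. With two console= options the kernel writes to both, and the last one becomes /dev/console.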

> But again note, that I do not believe that there are any disk hardware
> errors. And I do not believe that there are any data errors in the
> layout of the ext3 file system, the layout of the md0 raid array, or
> the partition tables. The reason is that after the failed jessie
> install, I reinstall a fresh wheezy from USB. I don't repartition.
> And I don't rebuild md1 and don't rebuild /aux. But I do rebuild md0
> and / as part of the fresh install. And it works.

Yes; it's possible that a change in one of the drivers between the
wheezy and jessie kernels is exposing a firmware bug (or there's a bug
in the kernel itself) which is causing this issue.

What I'm trying to do is get enough information so that the error is
obvious.





Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-21 Thread Jeffrey Mark Siskind
Thanks for your help.

   > Here is a screen picture.

   Could you upload this to an image paste site or send it along (or use a
   serial console to get it as text?)

http://upplysingaoflun.ecn.purdue.edu/~qobi/20160619_140357.jpg

(The other screen picture of a machine (not an R815) that does boot but that
takes a really long time to bring up the network is at

http://upplysingaoflun.ecn.purdue.edu/~qobi/IMG-20160609-WA.jpeg

)

   > I conjecture that the jessie kernel has difficulty accessing the MD
   > array on disk. The same problem occurs when I attempt a direct fresh
   > install of jessie with the installer.

   Which add-in card are you using on the R815s?

I don't believe that I have any add-in cards. The machine was purchased
straight from Dell. It has six SATA disks and 4 gigabit ethernet ports. It has
four 12-core AMD CPUs and 128GB RAM. The output of lspci on an identical
machine purchased at the same time that is still running wheezy is enclosed
below.

   What does the kernel
   output while it is detecting the disks and partitions? Do all of the
   drives show up properly? Are the blocksizes correct for the partitions?

I don't know how to get this info when in the initramfs after boot. If you
tell me what commands I should give I will redo this exercise. Right now, I
have a fresh minimal wheezy reinstalled. But after the reinstall of wheezy,
everything works. I did not repartition either during the (re)install of
jessie or during the (re)install of wheezy. I go back and forth. The
(re)install of wheezy works and the (re)install of jessie does not.

   When the boot fails, can you read from the underlying block devices? Do
   the block devices get detected after the boot fails?

I don't know what one can do at the initramfs command prompt. If you give
me some commands, I will try them out and post the output.

   Does specifying delay=20 or similar result in a successful boot?

I will try this.

   > I made the dongle
   > as follows:
   > 
   ># cd /tmp
   ># wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
   ># wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
   ># zcat boot.img.gz >/dev/sdf
   ># mount /dev/sdf /mnt
   ># cp firmware-8.5.0-amd64-netinst.iso /mnt/.

   You can actually just cat firmware-8.5.0-amd64-netinst.iso > /dev/sdf;

Please see my other post to debian-user

subject: how to make bootable live wheezy USB that doesn't use isohybrid

One of the exercises I tried was when the machine failed to boot after a fresh
USB-install of jessie, I tried to boot a live wheezy from USB by using a USB
dongle that I made by catting the isohybrid live wheezy ISO to the USB. But
the BIOS failed to detect the USB as bootable. I haven't tried to do that with
the netinst ISO but I suspect that it also won't be detected as bootable. But
when I build the USB dongle as per above it is detected by the BIOS as bootable.

   > Every time so far, md1 has all 6 components. But md0 has only some of
   > the components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And
   > every time it is a different set of components. Even though, just a
   > few minutes earlier, I was running wheezy and md0 had all 6
   > components. I do
   > 
   > mdadm /dev/md0 --add 
   > 
   > but it refuses. I forget the error.

   The error would be useful to know. Most likely one or more of them
   dropped out of the array for some reason and you're booting off of one
   which has a lower event count and it won't assemble.

   But it could be any number of things.

   The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
   assemble would also be useful.

I will try to get this info. It will require me to redo the exercise of a
fresh jessie install from USB. I'll have to take and post screen pictures
because I have no way to capture the console output. (I guess that I could use
iDRAC but I don't know how to and would have to learn.) If you let me know all
of the info you would like me to collect, I will try to collect it all in the
same retry of the fresh install.

But again note, that I do not believe that there are any disk hardware
errors. And I do not believe that there are any data errors in the layout of
the ext3 file system, the layout of the md0 raid array, or the partition
tables. The reason is that after the failed jessie install, I reinstall a
fresh wheezy from USB. I don't repartition. And I don't rebuild md1 and don't
rebuild /aux. But I do rebuild md0 and / as part of the fresh install. And it
works. I have done this over and over, switching between wheezy and jessie,
about a half dozen times. Each time, the jessie install leaves a different
collection of md0 components out. And each time, as part of the wheezy
install, I add them back in.

Thanks for your help.
Jeff (http://engineering.purdue.edu/~qobi)

Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-21 Thread deloptes
Don Armstrong wrote:

> The error would be useful to know. Most likely one or more of them
> dropped out of the array for some reason and you're booting off of one
> which has a lower event count and it won't assemble.
> 
> But it could be any number of things.
> 
> The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
> assemble would also be useful.

In my case it is a Dell OptiPlex 7xx. I have had it under my desk for two
years now, but it looks like it is five years old.
When I looked at the drives, they were detected, but the md arrays seemed to
be messed up and not easily recoverable.

What I observed was that only raid0 was loaded, not raid1. After removing
raid0 and loading raid1, I was able to see at least the partitions of the
drives, but I did not have time to go further. I had a lot to do in the
office and at home, so I just shut it down. I hope I'll have some time next
week to play with that. Good thing I do not need a remote machine at the
moment.

I hope this helps

regards



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-21 Thread Don Armstrong
On Tue, 21 Jun 2016, Jeffrey Mark Siskind wrote:
> Please note that all of the above systems have / as md0 RAID1. The fresh
> install of jessie was successful on all but the R815s.
> 
>> Then it fails to reboot and goes into the initramfs. I have a picture of
>> the screen if anybody wishes.
> 
>Yes please.  Also please use the 'rescue' boot option which enables
>more verbose logging to the screen.
> 
> Thanks for your help.
> 
> Here is a screen picture.

Could you upload this to an image paste site or send it along (or use a
serial console to get it as text?)

> I conjecture that the jessie kernel has difficulty accessing the MD
> array on disk. The same problem occurs when I attempt a direct fresh
> install of jessie with the installer.

Which add-in card are you using on the R815s? What does the kernel
output while it is detecting the disks and partitions? Do all of the
drives show up properly? Are the blocksizes correct for the partitions?

When the boot fails, can you read from the underlying block devices? Do
the block devices get detected after the boot fails? Does specifying
delay=20 or similar result in a successful boot?

> Here is what happens that is strange. When I do a fresh install of jessie, one
> of the first things that the installer does is probe for hardware to try to
> find the ISO. I have done this about 10 times. Sometimes (about 3 or 4) it
> succeeds in finding the ISO. Sometimes (the rest) it comes up with a red
> screen and claims that it can't find the ISO. In all cases, I am booting the
> installer from the same USB dongle with the same data on it. I made the dongle
> as follows:
> 
># cd /tmp
># wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
># wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
># zcat boot.img.gz >/dev/sdf
># mount /dev/sdf /mnt
># cp firmware-8.5.0-amd64-netinst.iso /mnt/.

You can actually just cat firmware-8.5.0-amd64-netinst.iso > /dev/sdf;

> Every time so far, md1 has all 6 components. But md0 has only some of
> the components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And
> every time it is a different set of components. Even though, just a
> few minutes earlier, I was running wheezy and md0 had all 6
> components. I do
> 
> mdadm /dev/md0 --add 
> 
> but it refuses. I forget the error.

The error would be useful to know. Most likely one or more of them
dropped out of the array for some reason and you're booting off of one
which has a lower event count and it won't assemble.

But it could be any number of things.

The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
assemble would also be useful.
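With six members the --examine output is long. A sketch, assuming the usual mdadm --examine layout (a "/dev/sdX1:" header per device followed by an indented "Events : N" field); events_summary is a hypothetical helper that prints one line per member so a lagging event count stands out.

```shell
#!/bin/sh
# Hypothetical helper: condense "mdadm --examine" output read on stdin
# into one "device events" line per member.
events_summary() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # "/dev/sda1:" -> "/dev/sda1"
        $1 == "Events" && $2 == ":" { print dev, $3 }
    '
}
```

mdadm --examine /dev/sd[abcdef]1 | events_summary | sort -k2,2n would list the lowest event count first.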




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-21 Thread Jeffrey Mark Siskind
My posting has not appeared on debian-{boot,kernel,user}. I think it is
because of the attachments. I have removed them. I'll send the screen images
to people individually if they request them.
---
I am cross posting this to debian-{boot,kernel,user}. I had replied to a reply
to my original post on debian-{boot,kernel}, with a to: to the replier and a
cc: to debian-{boot,kernel}, but apparently it didn't get posted. So I am reposting
this there. And I am posting this on debian-user to provide more information
to all of the responders to my post there. My original post was short, just to
raise the issue. This post is longer, to provide all of the details that I
have.

Thanks to everyone for your help.

Some background. I have 23 machines.

 11 Dell T5500              each has 4 disks
  4 HP DL165                each has 3 disks
  4 Dell Poweredge R815     each has 6 disks
  4 Dell Poweredge C6145    each has 4 disks

All were purchased around 2011. All have been running wheezy reliably for
years and running squeeze reliably for years before that. The initial install
about 5 years ago was squeeze, with the squeeze installer. And then a
dist-upgrade to wheezy a few years later.

All machines within a class have the same hardware and have their disks
partitioned identically. The disks were partitioned at the time of the initial
install of squeeze about five years ago by the squeeze installer. All the
machines have SATA disks but different classes of machines have different
numbers of disks of different sizes. The disks on the T5500s and C6145s are
the same.

Dell T5500
  sd[a-d]1 md0 RAID1 ext4 /
  sd[a-d]2 md1 RAID5 ext4 /aux
  sd[a-d]3 swap
DL165
  sd[a-c]1 md0 RAID1 ext3 /
  sd[a-c]2 md1 RAID5 ext3 /aux
  sd[a-c]3 swap
R815
  sd[a-f]1 md0 RAID1 ext3 /
  sd[a-f]2 md1 RAID5 ext3 /aux
  sd[a-f]3 swap
C6145
  sd[a-d]1 md0 RAID1 ext3 /
  sd[a-d]2 md1 RAID5 ext3 /aux
  sd[a-d]3 swap

The reason that the T5500s have ext4 and the others do not is that the
machines were purchased at slightly different times, and ext4 became available
in between.

I first tried to do a dist-upgrade from wheezy to jessie on one machine of
each class. But the dist-upgrade hung on 3 of the 4 machine types. I didn't
save the details from that. But what I decided to do was a fresh install on
one machine of each class.  That fresh install succeeded on the T5500, the
DL165, and the C6145. So I upgraded all of the T5500s, all of the DL165s, and
all of the C6145s with a fresh install of jessie. That was successful. There
was (and still is) a minor issue with the C6145s. I will discuss that
later. But the attempted fresh install to one R815 has not been successful.

For the fresh installs, I am using the jessie installer on USB, built as
described below. I attempt to preserve the existing disk partitioning. I also
attempt to preserve the existing md1 /aux. These are my long-term data storage
and collectively have about 100 terabytes of data. I reformat md0 /, keeping
it as ext3 on the DL165s, R815s, and C6145s and keeping it as ext4 on the
T5500s.

On the R815, I first tried to do a fresh install from USB. (That was after the
unsuccessful attempt at a dist-upgrade from a wheezy installation that had
been running for years.) I tried that about 8 times, all unsuccessful. But it
fails in slightly different ways each time. That nondeterministic behavior,
described below, leads me to believe that there is a bug. After that, I tried
unsuccessfully to boot from a live wheezy. (See my other posts to
debian-user.) After that, I was successful in doing a fresh install of wheezy.
That install was a minimal install. I did nothing but the fresh install from
USB and I deselected all of the options for additional software to install.
After that minimal install of wheezy, all I did was:

  nano /etc/apt/sources.list
  (change all wheezy to jessie)
  apt-get update
  apt-get dist-upgrade
  (answer default to all questions)
  /sbin/reboot
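The sources.list edit can also be scripted. A minimal sketch, assuming a plain textual replacement is what is wanted (exactly "change all wheezy to jessie", nothing smarter); switch_suite is a hypothetical helper that keeps a backup copy. The jessie release notes remain the reference for the upgrade procedure itself.

```shell
#!/bin/sh
# Hypothetical helper: replace every occurrence of one suite name with
# another in an APT sources file, keeping FILE.FROM.bak as a backup.
switch_suite() {
    file=$1   # normally /etc/apt/sources.list
    from=$2   # e.g. wheezy
    to=$3     # e.g. jessie
    cp -p "$file" "$file.$from.bak" || return 1
    sed "s/$from/$to/g" "$file.$from.bak" >"$file"
}
```

For example, switch_suite /etc/apt/sources.list wheezy jessie, followed by apt-get update. Files under /etc/apt/sources.list.d/ would need the same treatment.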

The dist-upgrade did not complain and did not give any errors. But upon
reboot, it entered the initramfs. A screen picture is enclosed below.

I am only posting the part below because it has not previously been posted. To
the readers of debian-user, there have been posts to debian-{boot,kernel}
that may answer some of your questions and provide more information. I am not
reposting those. Likewise, to the readers of debian-{boot,kernel}, there have
been posts to debian-user that may answer some of your questions and provide
more information. I am not reposting those.

   From: deloptes 
   I failed today to upgrade wheezy to jessie on raided system as well.

Please note that all of the above systems have / as md0 RAID1. The fresh
install of jessie was successful on all but the R815s.

   > Then it fails to reboot and goes into the initramfs. I have a picture of
   > 

Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Jochen Spieker
deloptes:
> 
> Upgrade usually is done by
> 
> apt-get update
> apt-get upgrade
> apt-get dist-upgrade

No. You upgrade to a new stable release by reading and following the
release notes.

J.
-- 
I am heading for the loony bin.
[Agree]   [Disagree]
 




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Brian
On Mon 20 Jun 2016 at 20:53:55 +0200, deloptes wrote:

> Brian wrote:
> 
> > On Mon 20 Jun 2016 at 13:06:30 +0200, Michael Lange wrote:
> > 
> >> On Mon, 20 Jun 2016 10:43:35 +0200
> >> Sven Hartge  wrote:
> >> 
> >> > deloptes  wrote:
> >> > > Jeffrey Mark Siskind wrote:
> >> > 
> >> > >> I am attempting to install jessie on a Dell Poweredge R815. It has
> >> > >> been running wheezy reliably for years. And running squeeze reliably
> >> > >> for years before that. But no matter what I try it won't install or
> >> > >> boot.
> >> > 
> >> > > why is an upgrade not an option?
> >> > 
> >> > Upgrade to what? He wants to install Jessie, you can't get a newer
> >> > stable Debian than that.
> >> 
> >> I guess he meant a dist-upgrade from an installed wheezy to jessie, if
> >> jessie won't do a fresh install.
> > 
> > I think the OP's attempt at a dist-upgrade was described in item 2 of
> > his first mail.
> 
> The problem is the kernel and some other changes that cause troubles.

Really?

You can deduce that from the sparse information provided by the OP?

I like "some other changes". Push something completely unspecified into
a discussion and we all nod our heads at the wisdom of the statement.



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread deloptes
Brian wrote:

> On Mon 20 Jun 2016 at 13:06:30 +0200, Michael Lange wrote:
> 
>> On Mon, 20 Jun 2016 10:43:35 +0200
>> Sven Hartge  wrote:
>> 
>> > deloptes  wrote:
>> > > Jeffrey Mark Siskind wrote:
>> > 
>> > >> I am attempting to install jessie on a Dell Poweredge R815. It has
>> > >> been running wheezy reliably for years. And running squeeze reliably
>> > >> for years before that. But no matter what I try it won't install or
>> > >> boot.
>> > 
>> > > why is an upgrade not an option?
>> > 
>> > Upgrade to what? He wants to install Jessie, you can't get a newer
>> > stable Debian than that.
>> 
>> I guess he meant a dist-upgrade from an installed wheezy to jessie, if
>> jessie won't do a fresh install.
> 
> I think the OP's attempt at a dist-upgrade was described in item 2 of
> his first mail.

The problem is the kernel and some other changes that cause troubles.

An upgrade is usually done by

apt-get update
apt-get upgrade
apt-get dist-upgrade

I failed today to upgrade wheezy to jessie on a RAID system as well.

The kernel/initramfs is the key to this, and perhaps systemd should be taken
out of the picture for the first boot after the upgrade.

In the initramfs shell I usually
1. check whether the disks are found (/dev/[hs]d* might be missing),
2. mount the root partition (in this example on a directory called new), and
3. run

cd /new
exec /usr/sbin/chroot . /bin/sh <<- EOF >dev/console 2>&1
exec /sbin/init ${CMDLINE}
EOF

4. when the system is up, update the initramfs:
update-initramfs -u

This magic has always worked.

It is a bit more complicated if you use RAID, LVM and LUKS, but in the end
it still comes down to the same magic.

I hope this helps

regards



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Don Armstrong
On Sun, 19 Jun 2016, Jeffrey Mark Siskind wrote:
>  2. I do a fresh install of wheezy from a USB dongle. It boots wheezy just 
> fine.
> I do nothing but
> 
>   nano /etc/apt/sources.list
>   (change all instances of wheezy to jessie, save, and exit)
>   apt-get update
>   apt-get dist-upgrade
>   (It upgrades without error. I answer the default to all questions.)
>   /sbin/reboot
> 
> Then it fails to reboot and goes into the initramfs. I have a picture of
> the screen if anybody wishes.

It would be useful to see that screen (or better, the console output
as text directly from the DRAC in an e-mail.)

I'm guessing this is a "cannot find root filesystem" issue; it's also
possible that you're missing the appropriate driver for however the
disks are attached to that R815.




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Michael Lange
On Mon, 20 Jun 2016 12:39:34 +0100
Brian  wrote:

> > > > why is an upgrade not an option?
> > > 
> > > Upgrade to what? He wants to install Jessie, you can't get a newer
> > > stable Debian than that.
> > 
> > I guess he meant a dist-upgrade from an installed wheezy to jessie, if
> > jessie won't do a fresh install. 
> 
> I think the OP's attempt at a dist-upgrade was described in item 2 of
> his first mail.

Oh, yes, sure (^.^);

Ok, then a few questions to the OP that come to mind, since we still don't
seem to know why the jessie boot actually fails:
Did you try to boot different kernel versions? Does the old kernel from
wheezy also fail to boot (just to rule out a problem of jessie's kernel
with that particular machine)?
What are the contents of your sources.list file? I have sometimes
experienced problems with a dist-upgrade when repositories like backports or
multimedia were active during that process. (OK, since a fresh install
seems to fail also, that's probably not the issue here.)
Maybe you could post the exact error messages that show up during the
failed boot?

Regards

Michael

.-.. .. ...- .   .-.. --- -. --.   .- -. -..   .--. .-. --- ... .--. . .-.

Death.  Destruction.  Disease.  Horror.  That's what war is all about.
That's what makes it a thing to be avoided.
-- Kirk, "A Taste of Armageddon", stardate 3193.0



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Brian
On Mon 20 Jun 2016 at 13:06:30 +0200, Michael Lange wrote:

> On Mon, 20 Jun 2016 10:43:35 +0200
> Sven Hartge  wrote:
> 
> > deloptes  wrote:
> > > Jeffrey Mark Siskind wrote:
> > 
> > >> I am attempting to install jessie on a Dell Poweredge R815. It has
> > >> been running wheezy reliably for years. And running squeeze reliably
> > >> for years before that. But no matter what I try it won't install or
> > >> boot.
> > 
> > > why is an upgrade not an option?
> > 
> > Upgrade to what? He wants to install Jessie, you can't get a newer
> > stable Debian than that.
> 
> I guess he meant a dist-upgrade from an installed wheezy to jessie, if
> jessie won't do a fresh install. 

I think the OP's attempt at a dist-upgrade was described in item 2 of
his first mail.



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Michael Lange
On Mon, 20 Jun 2016 10:43:35 +0200
Sven Hartge  wrote:

> deloptes  wrote:
> > Jeffrey Mark Siskind wrote:
> 
> >> I am attempting to install jessie on a Dell Poweredge R815. It has
> >> been running wheezy reliably for years. And running squeeze reliably
> >> for years before that. But no matter what I try it won't install or
> >> boot.
> 
> > why is an upgrade not an option?
> 
> Upgrade to what? He wants to install Jessie, you can't get a newer
> stable Debian than that.

I guess he meant a dist-upgrade from an installed wheezy to jessie, if
jessie won't do a fresh install. 

Regards 

Michael


.-.. .. ...- .   .-.. --- -. --.   .- -. -..   .--. .-. --- ... .--. . .-.

He's dead, Jim.
-- McCoy, "The Devil in the Dark", stardate 3196.1



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread Sven Hartge
deloptes  wrote:
> Jeffrey Mark Siskind wrote:

>> I am attempting to install jessie on a Dell Poweredge R815. It has
>> been running wheezy reliably for years. And running squeeze reliably
>> for years before that. But no matter what I try it won't install or
>> boot.

> why is an upgrade not an option?

Upgrade to what? He wants to install Jessie, you can't get a newer
stable Debian than that.

Grüße,
Sven.

-- 
Sigmentation fault. Core dumped.



Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-20 Thread deloptes
Jeffrey Mark Siskind wrote:

> I am attempting to install jessie on a Dell Poweredge R815. It has been
> running wheezy reliably for years. And running squeeze reliably for years
> before that. But no matter what I try it won't install or boot.
> 

why is an upgrade not an option?




Re: jessie won't install/boot on a Dell Poweredge R815

2016-06-19 Thread Jan Bakuwel
Hi Jeffrey,

On 20/06/16 06:49, Jeffrey Mark Siskind wrote:
> I am attempting to install jessie on a Dell Poweredge R815. It has been
> running wheezy reliably for years. And running squeeze reliably for years
> before that. But no matter what I try it won't install or boot.
>
> I have tried two ways.
>
>  1. I attempt a fresh install from a USB dongle. It gets all the way to
> installing grub and then fails.
>
>  2. I do a fresh install of wheezy from a USB dongle. It boots wheezy just
> fine. I do nothing but
>
>   nano /etc/apt/sources.list
>   (change all instances of wheezy to jessie, save, and exit)
>   apt-get update
>   apt-get dist-upgrade
>   (It upgrades without error. I answer the default to all questions.)
>   /sbin/reboot
>
> Then it fails to reboot and goes into the initramfs. I have a picture of
> the screen if anybody wishes.
>
> I can reliably install and run wheezy over and over. I have not been able to
> install or boot jessie despite numerous attempts.
>
> Any suggestions?
>
> Jeff (http://engineering.purdue.edu/~qobi)


Two things come to mind, one being potential lack of disc space. I think
Jessie needs more space than Wheezy if you selected the "standard utilities"
or whatever it's called (bottom line) when you're asked what to install.
I use a "rescue/boot manager" partition for many of my systems, whose
only function is to chainload one of a few other operating systems. That
way I don't have to throw away my old boots before I try the new.
Installing Jessie on that 1G partition is only possible if the only
thing I select during install is the SSH server.
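For the dist-upgrade route, it may be worth confirming there is headroom on / (and /boot, if it is separate) before running apt-get dist-upgrade. A minimal sketch: `has_free_mib` is a hypothetical helper, and the 1024 MiB threshold only echoes the 1G partition above; it is not an official jessie requirement.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding PATH has at least
# MIN_MIB mebibytes available. Usage: has_free_mib PATH MIN_MIB
has_free_mib() {
    # -P keeps each filesystem on one line; -m reports sizes in MiB.
    # Column 4 of the data line is the space available to ordinary users.
    avail=$(df -Pm "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# The 1024 MiB threshold is only an illustration, not an official requirement.
for fs in / /boot; do
    if has_free_mib "$fs" 1024; then
        echo "$fs: at least 1024 MiB free"
    else
        echo "$fs: less than 1024 MiB free"
    fi
done
```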

The other thing you may want to have a look at is the output on tty4
(Alt F4), perhaps that reveals why grub is not able to finish.
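The messages shown on tty4 are also written to /var/log/syslog inside the installer, which can be searched from the shell on tty2 (Alt-F2). A minimal sketch, assuming the default installer log location; `grub_log_tail` is a hypothetical helper, and the `grub`/`error` patterns are just a starting point.

```shell
#!/bin/sh
# Hypothetical helper: show the last LINES grub-related or error lines from
# the installer log. Usage: grub_log_tail [LOGFILE] [LINES]
grub_log_tail() {
    grep -i -e grub -e error "${1:-/var/log/syslog}" | tail -n "${2:-40}"
}

grub_log_tail
```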

cheers,
Jan