Re: Need help Debuging boot

2017-02-18 Thread ~Stack~
On 02/17/2017 09:11 PM, Konstantin Olchanski wrote:
> Hi, there, we told you many things, some hopefully helpful, but you never
> told us what machine you have, and we did ask (twice).
> 
> Please, tell us what motherboard you have (from dmidecode) and
> what CPU you have (from /proc/cpuinfo).

Sorry. You're right. I forgot to tell the motherboard and I typo'd the
processor! Ugh. It was a long day. :-)

The processor is Intel Xeon E5-2637v4 (not the 2630) and it is in a
SuperMicro X10SRW-F board.

~Stack~








Re: Need help Debuging boot

2017-02-17 Thread Konstantin Olchanski
On Fri, Feb 17, 2017 at 05:43:50PM -0600, ~Stack~ wrote:
> 
> Thank you all very much for your help.
>


Hi, there, we told you many things, some hopefully helpful, but you never
told us what machine you have, and we did ask (twice).

Please, tell us what motherboard you have (from dmidecode) and
what CPU you have (from /proc/cpuinfo).

Please, please, please.
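
For anyone following along, here is a minimal sketch of one way to collect
the two facts asked for above. The helper names cpu_model and board_info
are hypothetical, chosen only for illustration. dmidecode normally has to
be run as root, and the three "dmidecode -s" keywords used are simply the
ones most useful for identifying a board and its BIOS.

```shell
# Print the CPU model from a cpuinfo-style file (defaults to /proc/cpuinfo).
# cpu_model is a hypothetical helper name, not a standard tool.
cpu_model() {
    awk -F': ' '/^model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Print motherboard vendor, board model and BIOS version.
# dmidecode reads the DMI tables and normally needs root.
board_info() {
    for key in baseboard-manufacturer baseboard-product-name bios-version; do
        printf '%s: %s\n' "$key" "$(dmidecode -s "$key")"
    done
}

# Usage (as root):
#   cpu_model
#   board_info
```

Pasting the output of both into a reply usually answers the "what machine
is it" question in one go.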


K.O.



>
> I wasn't in a position to reply
> earlier, but I was watching the updates.
> 
> The EDD, I think, was a red herring. Turning it off just meant it locked
> the screen without anything being printed at all. I installed to a
> different hard drive and got the same results. Even when I disabled the
> on board SATA in the BIOS and installed to an external disk, same thing.
> 
> I will spare all of the gruesome details of all the many things I tried
> that didn't work. Here is what finally did work.
> 
> Install 7.2 then update everything but microcode_ctl.
> 
> Done. :-)
> 
> Such a simple statement for such a CRAZY few days of complex debugging.
> The short version.
> 
> 7.3 is fairly new, but many of the servers I have that are nearly the
> exact same hardware config have been running for a while and only
> recently updated to 7.3. So why not try 7.2? Works no problem. Well that
> is odd. Update to 7.3, same problem.
> 
> Huh, none of my other boxes have this problem. I wonder what could be
> the difference? Same parts. The only things that are different are minor
> revision updates (e.g., BIOS is 2 versions newer, etc.). Then I noticed that
> my old boxes are "Intel Xeon E5-2630v3" and the new boxes are "Intel
> Xeon E5-2630v4". Well that shouldn't matter...unless there is something
> in the microcode or the linux-firmware...
> 
> So I started investigating and narrowed it down to the microcode. I'm
> throwing this back up to my vendor to chase down w/ Red Hat as it is
> reproducible on Red Hat Enterprise Linux 7.2/7.3 and I gotta get these
> things online (one benefit of paying for support is being able to say
> "It's broke! Fix it for me!" :-D ). Besides, looking at the Kernel hex
> code tracing things out today has given me a headache. :-D
> 
> I will find out next week if there is any strange fallout from this,
> but for today they seem to be working just fine. I am hoping they
> continue to do so until a patch/kernel update rolls down the line.
> 
> Thanks again! I really do appreciate the help. The ideas got me thinking
> on the right track and helped eliminate variables.
> 
> ~Stack~
> 




-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada


Re: Need help Debuging boot

2017-02-17 Thread ~Stack~
Greetings,

Thank you all very much for your help. I wasn't in a position to reply
earlier, but I was watching the updates.

The EDD, I think, was a red herring. Turning it off just meant it locked
the screen without anything being printed at all. I installed to a
different hard drive and got the same results. Even when I disabled the
on board SATA in the BIOS and installed to an external disk, same thing.

I will spare all of the gruesome details of all the many things I tried
that didn't work. Here is what finally did work.

Install 7.2 then update everything but microcode_ctl.

Done. :-)
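
The "update everything but microcode_ctl" step can be sketched with yum's
--exclude option. This is a sketch under assumptions, not necessarily the
exact command used here: update_except is a hypothetical helper name, and
the YUM variable exists only so the command can be previewed with
YUM=echo before running it for real.

```shell
# Update all packages except the ones named as arguments.
# update_except and the YUM override are illustrative, not standard tools.
update_except() {
    excludes=""
    for pkg in "$@"; do
        excludes="$excludes --exclude=$pkg"
    done
    # $excludes is left unquoted on purpose so each option is a separate word.
    ${YUM:-yum} update $excludes
}

# Usage (as root):
#   update_except microcode_ctl
# Preview only:
#   YUM=echo; update_except microcode_ctl
```

To keep the package held back across later updates, an
"exclude=microcode_ctl" line in the [main] section of /etc/yum.conf has
the same effect until it is removed again.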

Such a simple statement for such a CRAZY few days of complex debugging.
The short version.

7.3 is fairly new, but many of the servers I have that are nearly the
exact same hardware config have been running for a while and only
recently updated to 7.3. So why not try 7.2? Works no problem. Well that
is odd. Update to 7.3, same problem.

Huh, none of my other boxes have this problem. I wonder what could be
the difference? Same parts. The only things that are different are minor
revision updates (e.g., BIOS is 2 versions newer, etc.). Then I noticed that
my old boxes are "Intel Xeon E5-2630v3" and the new boxes are "Intel
Xeon E5-2630v4". Well that shouldn't matter...unless there is something
in the microcode or the linux-firmware...

So I started investigating and narrowed it down to the microcode. I'm
throwing this back up to my vendor to chase down w/ Red Hat as it is
reproducible on Red Hat Enterprise Linux 7.2/7.3 and I gotta get these
things online (one benefit of paying for support is being able to say
"It's broke! Fix it for me!" :-D ). Besides, looking at the Kernel hex
code tracing things out today has given me a headache. :-D
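
A quick way to compare a working box with a hanging one is to look at the
microcode revision the kernel reports and at the package versions
involved. The sketch below is only one possible approach; microcode_rev
is a hypothetical helper name, and it assumes /proc/cpuinfo carries a
"microcode" line, as it does on x86 with the kernels discussed here.

```shell
# Print the microcode revision from a cpuinfo-style file
# (defaults to /proc/cpuinfo). microcode_rev is a hypothetical helper name.
microcode_rev() {
    awk -F': ' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Usage, run on both the old and the new box and compare:
#   microcode_rev
#   rpm -q microcode_ctl linux-firmware kernel
#   dmesg | grep -i microcode
```

If the revisions differ only after microcode_ctl is updated, that points
at the package rather than at the BIOS.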

I will find out next week if there is any strange fallout from this,
but for today they seem to be working just fine. I am hoping they
continue to do so until a patch/kernel update rolls down the line.

Thanks again! I really do appreciate the help. The ideas got me thinking
on the right track and helped eliminate variables.

~Stack~





Re: Need help Debuging boot

2017-02-17 Thread Konstantin Olchanski
On Fri, Feb 17, 2017 at 10:49:17AM -0600, Graham Allan wrote:
>
> I think that's a linux message.
>


Confirmed.

It comes from the Linux kernel source file arch/x86/boot/edd.c

After printing "Probing EDD", it issues BIOS int13 calls
to get some kind of disk information. A good place to get stuck
with funky BIOSes.

You can turn it all off with the kernel command line option "edd=off"; the
"Probing EDD" message should then go away, leaving you with whatever message
came before it while the kernel is stuck somewhere else (or not). Worth a try anyway.
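
For a one-off test the option can simply be typed at the end of the
"linux" line in the grub editor. To make it stick, one way is to append
it to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the grub
configuration. The sketch below assumes GNU sed and a double-quoted
GRUB_CMDLINE_LINUX line; add_kernel_arg is a hypothetical helper name,
and it is only safe for simple arguments without "/" or "&" in them.

```shell
# Append one kernel argument to GRUB_CMDLINE_LINUX in a grub defaults file.
# add_kernel_arg is a hypothetical helper; it relies on GNU sed (-i).
add_kernel_arg() {
    arg=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the argument is already on the line.
    grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]${arg}[\" ]" "$file" && return 0
    # Insert the argument just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 ${arg}\"/" "$file"
}

# Usage (as root):
#   add_kernel_arg edd=off
#   grub2-mkconfig -o /boot/grub2/grub.cfg
# The grub.cfg path above is the usual one for BIOS boot; UEFI installs
# keep grub.cfg elsewhere, so check before overwriting.
```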


K.O.


> I've seen this before when output
> starts getting redirected to the serial port (or ipmi/iDRAC/iLO
> virtual serial port). Maybe check if there is some type of console
> redirection set up in the BIOS? It seems to me that when redirection
> is set up, the BIOS itself and GRUB can output to both serial and
> regular console; once the kernel boots, messages only go to one or
> the other.
> 
> The "hang" (or rather change in output) doesn't have anything to do
> with the EDD message itself - it just happens that this is the last
> message printed during that particular phase of booting.
> 
> G.
> 
> On 2/17/2017 10:33 AM, Konstantin Olchanski wrote:
> >On Thu, Feb 16, 2017 at 10:07:55PM -0600, ~Stack~ wrote:
> >>I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot and 
> >>it hangs at:
> >>"Probing EDD (edd=off to disable)...ok"
> >
> >That's a BIOS message, not a linux or grub message, yes?
> >
> >(I do not see any EDD messages in the linux log files)
> >
> >Anyhow, *which* SuperMicro servers? (so I do not buy the same)
> >
> >>However, if I let it sit long enough it will boot (once one sat for an
> >>hour before it continued on, most of the time it is closer to 30-40
> >>minutes).
> >
> >The SuperMicro mobo/bios is notorious for slow booting, takes a good few
> >minutes from powerup to grub menu. But 30 min is extreme, yes.
> >
> >>If I boot into rescue kernel, it instantly boots. Every time. This is so
> >>puzzling to me.
> >
> >How do you mean? The "EDD" message is before grub menu or after grub menu?
> 
> -- 
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu



Re: Need help Debuging boot

2017-02-17 Thread Graham Allan
I think that's a linux message. I've seen this before when output starts 
getting redirected to the serial port (or ipmi/iDRAC/iLO virtual serial 
port). Maybe check if there is some type of console redirection set up 
in the BIOS? It seems to me that when redirection is set up, the BIOS 
itself and GRUB can output to both serial and regular console; once the 
kernel boots, messages only go to one or the other.


The "hang" (or rather change in output) doesn't have anything to do with 
the EDD message itself - it just happens that this is the last message 
printed during that particular phase of booting.


G.

On 2/17/2017 10:33 AM, Konstantin Olchanski wrote:
> On Thu, Feb 16, 2017 at 10:07:55PM -0600, ~Stack~ wrote:
>> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot and it
>> hangs at:
>> "Probing EDD (edd=off to disable)...ok"
>
> That's a BIOS message, not a linux or grub message, yes?
>
> (I do not see any EDD messages in the linux log files)
>
> Anyhow, *which* SuperMicro servers? (so I do not buy the same)
>
>> However, if I let it sit long enough it will boot (once one sat for an
>> hour before it continued on, most of the time it is closer to 30-40
>> minutes).
>
> The SuperMicro mobo/bios is notorious for slow booting, takes a good few minutes
> from powerup to grub menu. But 30 min is extreme, yes.
>
>> If I boot into rescue kernel, it instantly boots. Every time. This is so
>> puzzling to me.
>
> How do you mean? The "EDD" message is before grub menu or after grub menu?


--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


Re: Need help Debuging boot

2017-02-17 Thread Konstantin Olchanski
On Thu, Feb 16, 2017 at 10:07:55PM -0600, ~Stack~ wrote:
> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot and it 
> hangs at:
> "Probing EDD (edd=off to disable)...ok"

That's a BIOS message, not a linux or grub message, yes?

(I do not see any EDD messages in the linux log files)

Anyhow, *which* SuperMicro servers? (so I do not buy the same)

> However, if I let it sit long enough it will boot (once one sat for an
> hour before it continued on, most of the time it is closer to 30-40
> minutes).

The SuperMicro mobo/bios is notorious for slow booting, takes a good few minutes
from powerup to grub menu. But 30 min is extreme, yes.

> If I boot into rescue kernel, it instantly boots. Every time. This is so
> puzzling to me.

How do you mean? The "EDD" message is before grub menu or after grub menu?



Re: Need help Debuging boot

2017-02-17 Thread Mark Stodola

On 02/16/2017 10:07 PM, ~Stack~ wrote:
> Greetings,
>
> I'm going to keep this "short" because I've just had 6hrs of things I've
> tried that didn't work. It would take too long to list them all. :-)
>
> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot
> and it hangs at:
> "Probing EDD (edd=off to disable)...ok"
>
> And by hangs, I mean there is NO response out of anything. No Caps Lock
> light on keyboard, nothing.
>
> However, if I let it sit long enough it will boot (once one sat for an
> hour before it continued on, most of the time it is closer to 30-40
> minutes).
>
> If I boot into rescue kernel, it instantly boots. Every time. This is so
> puzzling to me.
>
> If I wait and let it boot, then check 'systemd-analyze' it says my boot
> time is sub 6 seconds (fancy new SSDs too!) and blame tells me that the
> longest section to boot was 3 seconds on the networking. Well, that is
> worthless because it just SAT THERE FOR THIRTY MINUTES!!! It obviously
> starts recording time after the hang.
>
> No matter the amount of logging I do or what debug mode I put it in, it
> prints "Probing EDD (edd=off to disable)...ok" then hangs, and
> EVERYTHING after that has nothing to do whatsoever with the reason for
> the hang.
>
> I have disabled just about everything I can think of from various online
> suggestions. I removed the quiet flag (duh) and I've turned off
> intel_pthread's and power states and ACPI and nomodeset and loglevel=7
> and blah blah blah blah. Seriously, my string of crap tacked on to the
> grub prompt is getting rather absurd. (I boot into recovery, modify
> /etc/default/grub and run grub2-mkconfig to set the grub prompt; I
> checked and this is working to set the grub parameters).
>
> Still same result. Recovery kernel boots, the other kernel hangs.
>
> Fine. I will install a kernel from ELRepo! I'll get a fancy new 4.x kernel!
>
> Yeah. That doesn't do squat either.
>
> Want to know the thing most infuriating? A single box in the whole
> batch, shows this problem once every 10 boots or so. I can't tell that
> there is a stinking thing different. BIOS is exactly the same, configs,
> install, packages, everything. *shrug*
>
> Are there *any* suggestions at all as to how I can figure out what it is
> hanging on? Is there a list of things after EDD that I can just start
> disabling till I get a different result?
>
> Thoughts?
>
> Thanks!
> ~Stack~



Can you tell us what motherboard and drives you have?  I've used 
several models from SuperMicro (presently using X11SAE) without problems.


I am careful about my BIOS options.  For example, I ensure the disk 
access mode is set to AHCI and not RAID.  It might be worth poking around 
there to see if anything makes a difference.


-Mark


Re: Need help Debuging boot

2017-02-17 Thread David Sommerseth
On 17/02/17 12:36, ~Stack~ wrote:
> Greetings,
> On 02/17/2017 04:37 AM, Bluejay Adametz wrote:
>>> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot
>>> and it hangs at:
>>> "Probing EDD (edd=off to disable)...ok"
>>
>> Did you get rid of the "quiet" option on the kernel line? If not, do
>> so, so you're sure to get all the kernel messages. It might not be
>> hanging where you think it is.
>>
> 
> I did remove 'quiet' as well as 'rhgb'.

There are a lot of nice debug possibilities which can be enabled via the
kernel command line.  I would look closely at debug, earlyprintk= and edd=



I've never had any EDD issues, so I don't know how early that appears.
But edd=off at the command line would be valuable to test as well.

The earlyprintk option is quite useful if issues appear very early in the
boot process, and it is most helpful if you can hook the machine up to an
external system (serial port, etc.).
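
As a rough illustration of the options mentioned above, the sketch below
takes an existing kernel command line, drops "quiet" and "rhgb", and adds
"debug" plus an earlyprintk setting for a serial port. The helper name
debug_cmdline is hypothetical, and ttyS0 at 115200 baud is an assumption;
an IPMI serial-over-LAN console may sit on a different port.

```shell
# Turn a normal kernel command line into a more talkative one.
# debug_cmdline is a hypothetical helper; ttyS0/115200 are assumptions.
debug_cmdline() {
    out=""
    for word in $1; do
        case $word in
            quiet|rhgb) ;;              # drop options that hide boot messages
            *) out="$out $word" ;;      # keep everything else unchanged
        esac
    done
    # "debug" raises kernel verbosity; earlyprintk sends early messages
    # to the serial port before the normal console is up.
    printf '%s\n' "${out# } debug earlyprintk=serial,ttyS0,115200"
}

# Usage:
#   debug_cmdline "$(cat /proc/cmdline)"
```

The printed line can then be pasted into the grub editor for a single
test boot.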

Another thing to check is /proc/cmdline if you manage to boot the system
using a SL install medium in rescue mode.  Also check the kernel
versions installed vs the one on the install medium.  Using the install
medium in rescue mode, it should be possible to mount the root file
system in a chroot, where you can downgrade/upgrade the kernel if that
is different from the install medium.
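
A small sketch of the /proc/cmdline check described above: it reports
whether a given option really reached the running kernel, which is worth
confirming after editing the grub configuration. The helper name
has_kernel_arg is hypothetical.

```shell
# Succeed if the exact argument appears on the kernel command line.
# $1 = argument, $2 = cmdline file (defaults to /proc/cmdline).
# has_kernel_arg is a hypothetical helper name.
has_kernel_arg() {
    for word in $(cat "${2:-/proc/cmdline}"); do
        [ "$word" = "$1" ] && return 0
    done
    return 1
}

# Usage:
#   has_kernel_arg edd=off && echo "edd=off is active"
#   uname -r          # kernel currently running
#   rpm -q kernel     # kernels installed on the system
```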

Just my few cents of ideas ...


-- 
kind regards,

David Sommerseth


Re: Need help Debuging boot

2017-02-17 Thread ~Stack~
Greetings,
On 02/17/2017 04:37 AM, Bluejay Adametz wrote:
>> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot
>> and it hangs at:
>> "Probing EDD (edd=off to disable)...ok"
> 
> Did you get rid of the "quiet" option on the kernel line? If not, do
> so, so you're sure to get all the kernel messages. It might not be
> hanging where you think it is.
> 

I did remove 'quiet' as well as 'rhgb'.

Thanks!
~Stack~





Re: Need help Debuging boot

2017-02-17 Thread Bluejay Adametz
> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot
> and it hangs at:
> "Probing EDD (edd=off to disable)...ok"

Did you get rid of the "quiet" option on the kernel line? If not, do
so, so you're sure to get all the kernel messages. It might not be
hanging where you think it is.

 - Bluejay Adametz, CFII, A, AA-5B, http://wildcorvid.org

Your focus determines your reality.  - Qui-gon Jinn



Re: Need help Debuging boot

2017-02-16 Thread Paul Myers
Stack

Not sure this will help, but here's my 10 pence worth.

EDD is related to disk drives, which makes me wonder whether the correct
firmware is being used, although it did somehow get around the issue
eventually. Any way you can try, say, a different hard disk?

Can you boot a live disk on the machine with, dare I say it, another
distro ISO on USB or CD? I would try a full CentOS or Red Hat first, then
Debian flavours, and if those fail the ever last resort, Knoppix. If one
boots, maybe there are drivers or settings you can try; that would
alleviate the issue if it's the hard disk firmware causing the problem!

If the CD works, try a different hard disk maybe.

Paul



On Thu, 2017-02-16 at 22:07 -0600, ~Stack~ wrote:

> Greetings,
> 
> I'm going to keep this "short" because I've just had 6hrs of things I've
> tried that didn't work. It would take too long to list them all. :-)
> 
> I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot
> and it hangs at:
> "Probing EDD (edd=off to disable)...ok"
> 
> And by hangs, I mean there is NO response out of anything. No Caps Lock
> light on keyboard, nothing.
> 
> However, if I let it sit long enough it will boot (once one sat for an
> hour before it continued on, most of the time it is closer to 30-40
> minutes).
> 
> If I boot into rescue kernel, it instantly boots. Every time. This is so
> puzzling to me.
> 
> If I wait and let it boot, then check 'systemd-analyze' it says my boot
> time is sub 6 seconds (fancy new SSDs too!) and blame tells me that the
> longest section to boot was 3 seconds on the networking. Well, that is
> worthless because it just SAT THERE FOR THIRTY MINUTES!!! It obviously
> starts recording time after the hang.
> 
> No matter the amount of logging I do or what debug mode I put it in, it
> prints "Probing EDD (edd=off to disable)...ok" then hangs, and
> EVERYTHING after that has nothing to do whatsoever with the reason for
> the hang.
> 
> I have disabled just about everything I can think of from various online
> suggestions. I removed the quiet flag (duh) and I've turned off
> intel_pthread's and power states and ACPI and nomodeset and loglevel=7
> and blah blah blah blah. Seriously, my string of crap tacked on to the
> grub prompt is getting rather absurd. (I boot into recovery, modify
> /etc/default/grub and run grub2-mkconfig to set the grub prompt; I
> checked and this is working to set the grub parameters).
> 
> Still same result. Recovery kernel boots, the other kernel hangs.
> 
> Fine. I will install a kernel from ELRepo! I'll get a fancy new 4.x kernel!
> 
> Yeah. That doesn't do squat either.
> 
> Want to know the thing most infuriating? A single box in the whole
> batch, shows this problem once every 10 boots or so. I can't tell that
> there is a stinking thing different. BIOS is exactly the same, configs,
> install, packages, everything. *shrug*
> 
> Are there *any* suggestions at all as to how I can figure out what it is
> hanging on? Is there a list of things after EDD that I can just start
> disabling till I get a different result?
> 
> Thoughts?
> 
> Thanks!
> ~Stack~
> 


Need help Debuging boot

2017-02-16 Thread ~Stack~
Greetings,

I'm going to keep this "short" because I've just had 6hrs of things I've
tried that didn't work. It would take too long to list them all. :-)

I have a bunch of new SuperMicro servers. Installed 7.3 on it. Reboot
and it hangs at:
"Probing EDD (edd=off to disable)...ok"

And by hangs, I mean there is NO response out of anything. No Caps Lock
light on keyboard, nothing.

However, if I let it sit long enough it will boot (once one sat for an
hour before it continued on, most of the time it is closer to 30-40
minutes).

If I boot into rescue kernel, it instantly boots. Every time. This is so
puzzling to me.

If I wait and let it boot, then check 'systemd-analyze' it says my boot
time is sub 6 seconds (fancy new SSDs too!) and blame tells me that the
longest section to boot was 3 seconds on the networking. Well, that is
worthless because it just SAT THERE FOR THIRTY MINUTES!!! It obviously
starts recording time after the hang.

No matter the amount of logging I do or what debug mode I put it in, it
prints "Probing EDD (edd=off to disable)...ok" then hangs, and
EVERYTHING after that has nothing to do whatsoever with the reason for
the hang.

I have disabled just about everything I can think of from various online
suggestions. I removed the quiet flag (duh) and I've turned off
intel_pthread's and power states and ACPI and nomodeset and loglevel=7
and blah blah blah blah. Seriously, my string of crap tacked on to the
grub prompt is getting rather absurd. (I boot into recovery, modify
/etc/default/grub and run grub2-mkconfig to set the grub prompt; I
checked and this is working to set the grub parameters).

Still same result. Recovery kernel boots, the other kernel hangs.

Fine. I will install a kernel from ELRepo! I'll get a fancy new 4.x kernel!

Yeah. That doesn't do squat either.

Want to know the thing most infuriating? A single box in the whole
batch, shows this problem once every 10 boots or so. I can't tell that
there is a stinking thing different. BIOS is exactly the same, configs,
install, packages, everything. *shrug*

Are there *any* suggestions at all as to how I can figure out what it is
hanging on? Is there a list of things after EDD that I can just start
disabling till I get a different result?

Thoughts?

Thanks!
~Stack~


