Re: [Ipmitool-devel] Is the BMC robust to recover from system hangs? ipmitool unresponsive

2010-08-30 Thread Rahul Nabar
On Mon, Aug 30, 2010 at 7:52 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

 Your BMC simply isn't responding to any traffic. BMCs are supposed to be
 completely resilient to OS failures when done properly (not much apart from
 things like power failures in non-redundant systems should be capable of
 knocking out a quality IPMI implementation). You need to look to your
 system vendor's support for an explanation and/or resolution, since
 implementations vary greatly from one vendor to the next. Sometimes a vendor
 is not competent to make it work, sometimes a vendor is too cheap to make it
 easy, and sometimes a vendor simply hasn't covered your particular NIC
 driver/OS combination and the NIC vendor flubbed some register handling or
 some such to make the NIC shoot itself when the kernel panics.


Thanks for the tips, Jarrod! I will look into the nodes. These are
Dell R410 servers with the on-board Broadcom NIC. The first thing for this
Monday morning is for me to trudge down to the dark depths of the cluster
room, log in manually, and see what exactly happened to these nodes.

I'll post on the list if I find anything interesting.

-- 
Rahul
___
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq

Re: [Ipmitool-devel] Is the BMC robust to recover from system hangs? ipmitool unresponsive

2010-08-30 Thread Rahul Nabar
On Mon, Aug 30, 2010 at 7:54 AM, Andy Cress andy.cr...@us.kontron.com wrote:

Thanks very much Andy for taking the time for such a detailed
response! That sure helps!

 Yes, that is a key function that all IPMI BMCs are supposed to provide.
 The BMC is generally not affected by what the OS does, unless there are
 IPMI-aware applications running in the OS, specifically talking to the
 BMC.

Nothing that I am aware of other than ipmitool. None of the
vendor-specific GUIs etc.

 1) IPMI LAN configuration.  Make sure that the IPMI LAN was properly
 configured.  It sounds like you may have tested this beforehand.  Even
 something like the ARP configuration could cause the port to no longer
 be visible to the router.

Yes, I had tested extensively prior to the failure. This is an HPC cluster
with about 300 identical servers, and other servers in the group are
still responding perfectly. All are on their own dedicated IP subnet,
although the IPMI physical network is the same as the normal 1 GigE
Ethernet network, i.e. IPMI traffic is piggybacking on the same Ethernet
adapter port.


 3) Some OS-resident (custom?) IPMI-aware application that may be causing
 trouble/stress/configuration problems with the BMC.

Nothing that I can imagine. I'm using CentOS and fairly standard Linux tools.

 In a healthy
 system, the 'ps -ef' output on the target should show any ipmi-related
 processes that are running.

I don't see any suspicious processes (on a sister node that hasn't
crashed). But here's the ps -ef output in case anything looks out of
place to you.

[r...@eu001 ~]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Aug29 ?        00:00:01 init [3]
root         2     1  0 Aug29 ?        00:00:00 [migration/0]
root         3     1  0 Aug29 ?        00:00:00 [ksoftirqd/0]
root         4     1  0 Aug29 ?        00:00:00 [watchdog/0]
root         5     1  0 Aug29 ?        00:00:00 [migration/1]
root         6     1  0 Aug29 ?        00:00:00 [ksoftirqd/1]
root         7     1  0 Aug29 ?        00:00:00 [watchdog/1]
root         8     1  0 Aug29 ?        00:00:00 [migration/2]
root         9     1  0 Aug29 ?        00:00:00 [ksoftirqd/2]
root        10     1  0 Aug29 ?        00:00:00 [watchdog/2]
root        11     1  0 Aug29 ?        00:00:00 [migration/3]
root        12     1  0 Aug29 ?        00:00:00 [ksoftirqd/3]
root        13     1  0 Aug29 ?        00:00:00 [watchdog/3]
root        14     1  0 Aug29 ?        00:00:00 [migration/4]
root        15     1  0 Aug29 ?        00:00:00 [ksoftirqd/4]
root        16     1  0 Aug29 ?        00:00:00 [watchdog/4]
root        17     1  0 Aug29 ?        00:00:00 [migration/5]
root        18     1  0 Aug29 ?        00:00:00 [ksoftirqd/5]
root        19     1  0 Aug29 ?        00:00:00 [watchdog/5]
root        20     1  0 Aug29 ?        00:00:00 [migration/6]
root        21     1  0 Aug29 ?        00:00:00 [ksoftirqd/6]
root        22     1  0 Aug29 ?        00:00:00 [watchdog/6]
root        23     1  0 Aug29 ?        00:00:00 [migration/7]
root        24     1  0 Aug29 ?        00:00:00 [ksoftirqd/7]
root        25     1  0 Aug29 ?        00:00:00 [watchdog/7]
root        26     1  0 Aug29 ?        00:00:00 [events/0]
root        27     1  0 Aug29 ?        00:00:00 [events/1]
root        28     1  0 Aug29 ?        00:00:00 [events/2]
root        29     1  0 Aug29 ?        00:00:00 [events/3]
root        30     1  0 Aug29 ?        00:00:00 [events/4]
root        31     1  0 Aug29 ?        00:00:00 [events/5]
root        32     1  0 Aug29 ?        00:00:00 [events/6]
root        33     1  0 Aug29 ?        00:00:00 [events/7]
root        34     1  0 Aug29 ?        00:00:00 [khelper]
root       169     1  0 Aug29 ?        00:00:00 [kthread]
root       181   169  0 Aug29 ?        00:00:00 [kblockd/0]
root       182   169  0 Aug29 ?        00:00:00 [kblockd/1]
root       183   169  0 Aug29 ?        00:00:00 [kblockd/2]
root       184   169  0 Aug29 ?        00:00:00 [kblockd/3]
root       185   169  0 Aug29 ?        00:00:00 [kblockd/4]
root       186   169  0 Aug29 ?        00:00:00 [kblockd/5]
root       187   169  0 Aug29 ?        00:00:00 [kblockd/6]
root       188   169  0 Aug29 ?        00:00:00 [kblockd/7]
root       189   169  0 Aug29 ?        00:00:00 [kacpid]
root       302   169  0 Aug29 ?        00:00:00 [cqueue/0]
root       303   169  0 Aug29 ?        00:00:00 [cqueue/1]
root       304   169  0 Aug29 ?        00:00:00 [cqueue/2]
root       305   169  0 Aug29 ?        00:00:00 [cqueue/3]
root       306   169  0 Aug29 ?        00:00:00 [cqueue/4]
root       307   169  0 Aug29 ?        00:00:00 [cqueue/5]
root       308   169  0 Aug29 ?        00:00:00 [cqueue/6]
root       309   169  0 Aug29 ?        00:00:00 [cqueue/7]
root       312   169  0 Aug29 ?        00:00:00 [khubd]
root       314   169  0 Aug29 ?        00:00:00 [kseriod]
root       437   169  0 Aug29 ?        00:00:00 [pdflush]
root       438   169  0 Aug29 ?        00:00:30 [pdflush]
root       439

Re: [Ipmitool-devel] Is the BMC robust to recover from system hangs? ipmitool unresponsive

2010-08-30 Thread Rahul Nabar
On Mon, Aug 30, 2010 at 10:23 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

  Don't know much about Dell specifically, however I'll offer some
 guidance.

Very much appreciate your helping out! I haven't any


  If the Broadcom part has the tg3 driver, you may be out of luck depending
 on the failure state. For example, BCM5704 chips fundamentally cannot
 provide BMC access while executing PXE. On the other hand, bnx2 managed
 chips tend to fare better, there generally is at least one way to make it
 work correctly, though drivers and nic firmware matter *greatly* still. Not
 as resilient as I would like, but with precautions in how you manage
 firmware and drivers, it's workable.

Luckily no tg3 for this server. It does have the bnx2 driver. Do you have
any specific driver / firmware version comments about which combinations do
work and which don't?


  You'll want to check your tg3/bnx2/whatever driver version and NIC
 firmware version, depending on your investigation.

I have version 1.9.3 of bnx2. Not sure how to get the NIC firmware version
on a running system.
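
On the firmware question: ethtool -i normally reports the driver name, the
driver version and the NIC firmware version for an interface on a running
system. A minimal sketch, assuming ethtool is installed and the interface is
called eth0 (both assumptions; the helper name nic_versions is made up for
illustration):

```shell
#!/bin/sh
# Hypothetical helper: condense `ethtool -i` output (read from stdin)
# into a single line with driver, driver version and firmware version.
nic_versions() {
    awk -F': ' '
        $1 == "driver"           { drv = $2 }
        $1 == "version"          { ver = $2 }
        $1 == "firmware-version" { fw  = $2 }
        END { printf "driver=%s version=%s firmware=%s\n", drv, ver, fw }
    '
}

# Usage on a live node (eth0 is an assumed interface name):
#   ethtool -i eth0 | nic_versions
```

Running this across the cluster and comparing the lines from healthy and
failed nodes is one way to spot a driver/firmware mismatch.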


 Shared nics can work great, but some implementations can be picky about
 what drivers and firmware are in place.

This being an HPC cluster, shared NICs were the more feasible option. The
cost of a dedicated out-of-band network and switches was deemed too
expensive and messy. As such no one server is critical, but the utility of
the BMC+IPMI is the ability to debug crashes without having to walk to the
server room each time. Or so I thought! :)


 Also, newer is not always better, sometimes a developer without caring
 about the IPMI access provided by some nics will unwittingly break it
 somehow in the driver, and it won't get fixed until some server vendor or
 other industrious administrator stumbles across it.

Absolutely. Agreed. Unfortunately there's no easy way of knowing what works
and what doesn't other than posting on a list like this and hoping someone
else has been burnt before! :)

Thanks again!

-- 
Rahul

Re: IPMI

2010-08-05 Thread Rahul Nabar
On Wed, Aug 4, 2010 at 2:33 PM, Alexander Dupuy alex.du...@mac.com wrote:
 My experience with Dell server systems that have BMC or iDRAC cards standard
 (9th/10th/11th gen, at least) is that lm_sensors doesn't have any usable
 sensors to monitor, as Dell have wired them all up to the BMC instead (AMD
 CPU temperature sensors builtin to the CPU itself perhaps being one
 exception - but all my Dell servers are Intel).

I'm not sure about the hardware details, but is it difficult to allow
both lm_sensors and the BMC to access the sensors? Why does Dell block
lm_sensors access? Is there a reason? Maybe someone at Dell can
elaborate?

-- 
Rahul



Recommendations for JBOD to add storage to my R410

2010-08-04 Thread Rahul Nabar
I've used MD1000s in the past but that involves buying a PERCe and
doing hardware RAID. I am tending towards doing a software RAID config
this time using mdadm, and just need a minimalistic JBOD to put
SATA drives into. The R410 will only take 4 internal drives (and I am
already using one drive for the OS) whereas I need 8 drives. So I do
need some external storage. I was planning on using eight 1-terabyte
SATA drives.

Any recommendations for a suitable JBOD model? I couldn't find
anything suitable on the website.

This is going to be dedicated storage for this particular server, so
SAN, iSCSI, Fibre Channel etc. are overkill. Performance is not very
important since this will be a backup box, so my 100 Mbps network link
is probably the bottleneck.

-- 
Rahul



Re: IPMI

2010-08-04 Thread Rahul Nabar
On Wed, Aug 4, 2010 at 8:20 AM, James Bensley jwbens...@gmail.com wrote:
 IPMI is great, but if every server has it it's useless unless you can
 monitor all your servers from a central place so how are people doing
 this?


What if you monitor in-band using something like pings, heartbeat,
ganglia etc.? Then use ipmi (via ipmitool) only when stuff goes wrong
and a machine is hung or crashed.
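
A rough sketch of that idea: ping in-band first and only ask the BMC when the
host is silent. The helper name check_node, the root user name and the use
of -E (password taken from the IPMI_PASSWORD environment variable) are
assumptions for illustration, not a tested setup:

```shell
#!/bin/sh
# Commands are overridable so the logic can be dry-run without hardware.
# "root" is an assumed BMC user; -E reads the password from IPMI_PASSWORD.
PING_CMD=${PING_CMD:-"ping -c 1 -W 2"}
IPMI_CMD=${IPMI_CMD:-"ipmitool -I lan -U root -E"}

# check_node HOST BMC_IP
# Prints one status line: in-band result first, BMC only as a fallback.
check_node() {
    host=$1
    bmc=$2
    if $PING_CMD "$host" >/dev/null 2>&1; then
        echo "$host: up (in-band)"
    else
        # Host is silent: ask the BMC for the chassis power state.
        status=$($IPMI_CMD -H "$bmc" chassis power status 2>/dev/null) \
            || status="BMC unreachable"
        echo "$host: no ping; $status"
    fi
}

# Example (addresses are placeholders):
#   check_node 10.0.0.3 10.0.4.3
```

The out-of-band path is only exercised for nodes that already look dead, so
the BMCs see almost no traffic in normal operation.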

-- 
Rahul



Re: IPMI

2010-08-04 Thread Rahul Nabar
On Wed, Aug 4, 2010 at 12:07 PM, James Bensley jwbens...@gmail.com wrote:
 On 4 August 2010 17:52, Rahul Nabar rpna...@gmail.com wrote:
 What if you monitor in-band using something like pings, heartbeat,
 ganglia etc.? Then use ipmi (via ipmitool) only when stuff goes wrong
 and a machine is hung or crashed.

 Not a bad idea but with IPMI waiting till something goes POP is silly,
 I could have already been using it to see the temperature on my CPUs
 rising, or the 2nd power supply flapping etc etc...

I monitor temperatures via lm_sensors. Again in-band. I try to keep my
monitoring in-band unless there is a compelling reason to use ipmi.
Maybe some sensors are not available to lm_sensors.

Of course, there has to be some aggregation tool whenever you have
many servers. But that's a different issue from whether to use in-band
(lm_sensors) or out-of-band (ipmi). Personally, both nagios and
ganglia have worked well for the aggregation and display.

-- 
Rahul



Re: IPMI

2010-08-04 Thread Rahul Nabar
On Wed, Aug 4, 2010 at 12:19 PM, Rahul Nabar rpna...@gmail.com wrote:
 What if you monitor in-band using something like pings, heartbeat,
 ganglia etc.? Then use ipmi (via ipmitool) only when stuff goes wrong
 and a machine is hung or crashed.

I assumed you were using Linux / *nix. Are you? Not sure how good
these tools are in a Win environment.

-- 
Rahul



Re: OSMA does not work on PowerEdge 1435SC

2010-07-26 Thread Rahul Nabar
On Mon, Jul 26, 2010 at 8:27 AM, J. Epperson
d...@epperson.homelinux.net wrote:
 Correct.  This has been discussed more than once on the list, check the
 archives.  It's unfortunate that there's Dell doc that says the 1435SC is
 supported for systems management, implying that OMSA should work.

I bought a whole cluster of SC1435s two years ago under this same false
impression.
The SC1435 seems to be a black hole in the hierarchy. I often tried
out a feature that didn't work, only to be eventually told by support
that the SC1435 didn't do it, e.g. the large MTUs required for jumbo
frames on the Ethernet cards. I never did get that to work on the SC1435
either.

-- 
Rahul



Re: Serial over LAN

2010-07-15 Thread Rahul Nabar
On Thu, May 20, 2010 at 1:01 PM, Jefferson Ogata powere...@antibozo.net wrote:
 2. In BMC setup (control-E during POST), enable serial over LAN, set IP
 and password.

 ** First thing you should do in BMC setup is reset to default. The BMCs
 often ship with a weird non-default setting that will cause lots of
 serial port feedback if you try to run a getty on the serial console.

Is it possible to do this step in an automated fashion, i.e. without
using the manual Ctrl+E?

Would ipmitool, syscfg, or another equivalent tool have a way of
setting the BMC SOL? I already set the BMC's IP and password using
ipmitool.

I have ~300 machines so doing this step manually via Ctrl+E is tedious.
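
For what it's worth, ipmitool does have a "sol set" subcommand, so the enable
step at least can be scripted. A sketch only: the channel number 1, the root
user, the -E password handling and the file name bmc_hosts.txt are all
assumptions, and whether this covers everything the Ctrl+E screen does on
these BMCs is not something I can confirm:

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the command that would be run.
DRY_RUN=${DRY_RUN:-1}

# enable_sol BMC_IP
# Enables Serial-over-LAN on LAN channel 1 of the given BMC.
enable_sol() {
    bmc=$1
    # -E takes the password from the IPMI_PASSWORD environment variable.
    set -- ipmitool -I lanplus -H "$bmc" -U root -E sol set enabled true 1
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bmc_hosts.txt is a hypothetical file with one BMC IP per line:
#   while read -r bmc; do enable_sol "$bmc"; done < bmc_hosts.txt
```

Afterwards "ipmitool sol info 1" against one BMC shows the resulting SOL
settings, which is worth checking on a single node before looping over 300.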

-- 
Rahul



Re: syscfg bug? Does not allow me to set console redirection

2010-07-15 Thread Rahul Nabar
On Mon, Jul 12, 2010 at 7:13 AM,  vibha_g...@dell.com wrote:
 Rahul,

 Syscfg --conred is applicable up to 8th generation of Power Edge
 servers.
 On R300 and R410 servers, use syscfg --serialcomm along with --extserial
 option.

Thanks! That works. Is the fact that this option was removed
documented somewhere? I don't see it in the Dell manuals for syscfg.

-- 
Rahul



Re: various FAN on PowerEdge R710

2010-07-11 Thread Rahul Nabar
On Sun, Jul 11, 2010 at 5:51 PM, Zhichao Li lzcmich...@gmail.com wrote:
 Hello all:

 I am using ipmitool to get some useful information about the system. The
 outputs for FAN really
 confuse me as follow:

These servers have more than one fan for better cooling.
The output shows the RPM of each fan.
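
If the raw listing is hard to read, it can be boiled down to one line per
fan. A sketch, assuming the usual pipe-separated layout printed by
"ipmitool sdr type Fan" with the reading in the fifth column (the helper
name fan_rpms is made up):

```shell
#!/bin/sh
# Hypothetical helper: read `ipmitool sdr type Fan` output on stdin and
# print "<sensor name>: <rpm>" for every row whose reading is in RPM.
fan_rpms() {
    awk -F'|' '
        $5 ~ /RPM/ {
            name = $1
            rpm = $5
            gsub(/^ +| +$/, "", name)   # trim padding around the sensor name
            sub(/ *RPM.*/, "", rpm)     # drop the unit
            gsub(/ /, "", rpm)          # drop remaining padding
            print name ": " rpm
        }
    '
}

# Usage:
#   ipmitool sdr type Fan | fan_rpms
```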

-- 
Rahul



syscfg bug? Does not allow me to set console redirection options from command line

2010-07-10 Thread Rahul Nabar
In order to set the correct BIOS settings for Serial-over-LAN I was
using the Dell syscfg tool (from the Deployment Toolkit). I was hoping
to then automate this process on all 300 R410 servers that we have.

I tried:

 /opt/dell/toolkit/bin/syscfg --conred
The option 'conred' is not available or cannot be configured
through software.

I am sure my system has this option, since if I physically log in to
the BIOS using a console I can set the redirection. Also the manual
(*) for syscfg lists this option, so clearly it has to be software
addressable. Is this a bug in syscfg?

Can someone at Dell verify please? I tried the excellent Dell helpdesk
but the guys there had never heard of syscfg nor Serial-over-LAN so it
is a losing battle.

I am using this version of syscfg:
syscfg Version 3.1. ser01 (Linux - Sep  1 2009, 00:30:05)
Copyright (c) 2002-2009 Dell Inc.

-- 
Rahul

* http://support.dell.com/support/edocs/software/dtk/2.4/CLI/pdf/dtk24cli.pdf



R410: Is IPMI-BMC always connected to NIC1?

2010-07-09 Thread Rahul Nabar
Is the IPMI / BMC IP always connected to the Gig1 port?

Normally I use just a single physical Ethernet connection but assign a
different management IP to the same port so that hung machines etc.
can be remotely rebooted.
It works mostly OK, but I have a few machines where I am using the
NIC2 interface. There it doesn't seem to work.

Is the IPMI somehow tied to the Gig1 interface? Any way to change it
so that Gig2 also works?

[I'm using Gig2 on some machines because a few of my servers came with
the tab on the Gig1 port damaged, so that the connection remains loose.
At that time I didn't think this was a big deal, but it might shut out
my IPMI options.]

-- 
Rahul



set BMC to be on a different subnet on a shared ethernet card (R410)

2010-06-30 Thread Rahul Nabar
On my R410s I have Ethernet adapters that have twin MAC addresses: one for
regular use and the other for a Baseboard Management Controller (IPMI) that
can be used to reboot hung machines etc.

All my normal eth cards are assigned to a 10.0.x.x network by DHCP. So far
so good.

Now if I set the second IP address to something in the 10.0.x.x range, say
10.0.4.3, then things work. I can ping the same physical server (and the
same physical card) on both IP addresses from any remote machine.

arp from the remote pinging machine shows:
Address      HWaddress           Flags Mask   Iface
10.0.0.3     00:26:B9:58:E6:46   C            eth0
10.0.4.3     00:26:B9:58:E6:48   C            eth0

10.0.0.3 is the normal ethernet IP assigned via DHCP and 10.0.4.3 is for
the BMC.

But let's say I wanted the BMC to respond on the 172.16.x.x subnet. I can
set this address fine, say 172.16.0.3. But if I try to ping 172.16.0.3 then
there is no response.

Surprisingly, arp still shows both addresses but the remote ping does not
work.

Address      HWaddress           Flags Mask   Iface
10.0.0.3     00:26:B9:58:E6:46   C            eth0
172.16.0.3   00:26:B9:58:E6:48   C            eth0

Do I have to do something else to make this work? How do I have two subnets
on the same adapter?

Commands I used:
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 172.16.0.3
ipmitool lan set 1 netmask  255.255.0.0


-- 
Rahul



Re: Dell Power Edge R710 with CentOS

2010-05-28 Thread Rahul Nabar
On Tue, May 18, 2010 at 6:46 AM, Stephan van Hienen
stephan.van.hie...@thevalley.nl wrote:
 Just install centos 5.5 and add  'options bnx2 disable_msi=1' to 
 /etc/modprobe.conf after the first boot.

If I do modinfo bnx2 I already see:

parm:   disable_msi:Disable Message Signaled Interrupt (MSI) (int)

Do I still have to add the line to modprobe.conf? I tried adding the line
and then modprobing bnx2, but the modinfo output doesn't seem to change. Is
there a way of knowing whether or not the fix was successfully applied?
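
A note on checking this: modinfo only lists the parameters a module accepts,
not their current values, and options in /etc/modprobe.conf are read when the
module is loaded, so the driver has to be unloaded and reloaded (or the box
rebooted) before they take effect. One way to see the result is the interrupt
type shown for the interface in /proc/interrupts. A sketch, assuming the
interface name is the last field on its line (the helper name irq_mode is
made up):

```shell
#!/bin/sh
# irq_mode IFACE
# Reads /proc/interrupts-style text on stdin and reports whether the
# line for IFACE mentions MSI.
irq_mode() {
    awk -v ifc="$1" '
        $NF == ifc {
            found = 1
            if ($0 ~ /MSI/)
                print ifc ": MSI"
            else
                print ifc ": legacy (non-MSI) interrupt"
        }
        END { if (!found) print ifc ": no interrupt line found" }
    '
}

# Usage:
#   irq_mode eth0 < /proc/interrupts
```

If the driver exposes the parameter in sysfs, the file
/sys/module/bnx2/parameters/disable_msi would also show the value, but that
depends on the driver version.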

-- 
Rahul



Re: Dell Power Edge R710 with CentOS

2010-05-24 Thread Rahul Nabar
On Fri, May 21, 2010 at 7:46 PM, Matt Domsch matt_dom...@dell.com wrote:


  (eg. in solaris) Dell has been aware of the issue for months without
  real fix, just workarounds.

 I'm not sure what latest means, but we did manage to find the root
 cause of the failure where the MSI bit would get stuck - which also
 explains why disabling MSI-X worked around it.  The right solution is
 to use code already in the driver to manage the timeout on that bit
 automatically, which is what we are testing with 5.5+ and expect in
 newer RHEL kernels ASAP.

I'm really glad I follow this mailing list and hence came to know of this
problem. If Dell has been aware of this issue, isn't there some way to notify
users? I have ~300 R410 systems here and not a word about this. Dell, how do
you expect users to find out!?

-- 
Rahul

Re: Dell Power Edge R710 with CentOS

2010-05-21 Thread Rahul Nabar
On Tue, May 18, 2010 at 6:23 AM, Tom Boland t...@t0mb.net wrote:
 I haven't upgraded any of our boxes from 5.4 yet, but I have to say I've had
 two big problems with the stock bnx2 NIC drivers from 5.4 (for both R610s
 and R710s).  The first problem was that the NICs would just die randomly
 when under heavy load, which required a network restart.  This in particular
 can be resolved by adding options bnx2 disable_msi=1 to /etc/modprobe.conf.

I have R410's running CentOS 5.4 too. Do you know if they could be
having the same issue / fix?
I am seeing some suspicious network problems too.

-- 
Rahul



mixing SAS and SATA drives in a Dell SC1435

2010-02-28 Thread Rahul Nabar
I had a PowerEdge SC1435 with a small SAS drive connected to a PERC-i
card. I wanted more storage and have bought a large SATA drive. I
removed the SAS drive and connected the SATA drive to the PERC-i
controller. It recognizes the drive, but the disk freezes intermittently.

I can't figure out what the problem is. On the other hand, the
motherboard seems to have two SATA risers on it. Would that be a
better place to connect the drive?

Also, I read that Dell does not recommend mixing SAS and SATA drives
in the SC1435. Does that apply to drives connected to the PERC-i
alone? Or can I have a SAS-drive connected to the PERC-i and a SATA
connected directly to the motherboard riser?

-- 
Rahul



Re: Third-party drives not permitted on Gen 11 servers

2010-02-16 Thread Rahul Nabar
On Tue, Feb 16, 2010 at 3:35 AM, Tino Schwarze
linux-poweredge.li...@tisc.de wrote:
 Hi there,

 On Thu, Feb 11, 2010 at 11:52:04AM +0100, Tino Schwarze wrote:

 I mailed my sales rep yesterday explaining my concerns and got a reply
 today that he'll ask the marketing department for an official statement.

 I got an answer today (German, English translation below):

If only these were home-grade devices and not enterprise. We'd have
to wait on the order of 2 days for some smart kid to come along and
tweak a few lines in the firmware that do the check. :)

-- 
Rahul



Re: Third-party drives not permitted on Gen 11 servers

2010-02-11 Thread Rahul Nabar
On Thu, Feb 11, 2010 at 3:50 PM, Andy Krantz an...@digitalcyclone.com wrote:
 I agree with what everyone else is saying on this subject.

 I contacted my Dell account manager and they suggested that I post on
 http://www.ideastorm.com/

 I didn't find an existing thread so I started one:

 http://dellideas.force.com/ideaView?id=0877dwTAAQ

+1 for contacting my Sales Rep. I've also posted on the Beowulf list
where a lot of us use Dell hardware.

-- 
Rahul



Re: Third-party drives not permitted on Gen 11 servers

2010-02-11 Thread Rahul Nabar
On Tue, Feb 9, 2010 at 4:17 PM,  howard_sho...@dell.com wrote:


 There are a number of benefits for using Dell qualified drives in particular 
 ensuring a positive experience and protecting our data.

 While SAS and SATA are industry standards there are differences which occur 
 in implementation.  An analogy is that English is spoken in the UK, US and 
 Australia. While the language is generally the same, there are subtle 
 differences in word usage which can lead to confusion.

Sure. But I don't refuse to speak to a person from the UK or Australia,
do I? Why not learn to work with all the dialects?

Besides, what's next? Dell-certified power cords?!? Or mouse pads?

-- 
Rahul



R410 stops during boot waiting for a keypress of F1 or F2

2010-02-07 Thread Rahul Nabar
On some of my R410 servers the BIOS stops at a message:

Press F1 to proceed or F2 to enter System Setup

These are clustered machines, so such a message just causes the whole
automated boot process to halt, since they don't have a keyboard or
console attached where one could press this key. Is there a way to
automate changing this via syscfg?

I found no clue in the Dell Deployment Guide. In the BIOS though there
is a setting that chooses whether or not the system halts on the
errors.

-- 
Rahul



Re: Dell Toolkit conflicts with Open Manage

2010-02-04 Thread Rahul Nabar
On Thu, Feb 4, 2010 at 3:40 PM, Trond Hasle Amundsen
t.h.amund...@usit.uio.no wrote:
 either. The folder /opt/dell

 Try running 'rpm -qp --scripts dell-toolkit.rpm', and investigate the
 rpm pre-install scriptlet that complains.

Thanks Trond! --scripts is a nice rpm option; I didn't know that one.

It is the file /etc/omreg.cfg. I guess it is left behind by the
uninstaller. I'm going to try deleting it; I doubt anything but
OpenManage needs this file.
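
For anyone retracing this, the pre-install part can be cut out of the
--scripts output so the check that complains is easier to find. A sketch,
assuming the section headers have the usual "preinstall scriptlet (using
/bin/sh):" form (the helper name preinstall_script is made up):

```shell
#!/bin/sh
# Hypothetical helper: read `rpm -qp --scripts PKG` output on stdin and
# print only the body of the preinstall scriptlet.
preinstall_script() {
    awk '
        # A header line starts a new section; remember whether it is
        # the preinstall one and skip the header itself.
        /^[a-z]+ scriptlet \(using / { in_pre = ($1 == "preinstall"); next }
        in_pre { print }
    '
}

# Usage:
#   rpm -qp --scripts dell-toolkit.rpm | preinstall_script | grep -n omreg
```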

-- 
Rahul



Re: cannot find correct attribute for omconfig BIOS mod

2010-02-02 Thread Rahul Nabar
On Mon, Feb 1, 2010 at 9:26 PM,  mahavee...@dell.com wrote:
 Check if the services are running - srvadmin-services.sh status. The
 dsm_sa_datamgrd needs to be running.

Thanks Mahaveer. Everything seems to be running. I even rebooted the
machine. What else do I need to check?

 ./opt/dell/srvadmin/sbin/srvadmin-services.sh status
dell_rbu (module) is running
ipmi driver is running
dsm_sa_datamgrd (pid 4267) is running
dsm_sa_eventmgrd (pid 4949) is running
dsm_om_shrsvcd (pid 3902) is running
dsm_om_connsvcd (pid 4975 4974) is running



Re: R410's shipped out with BIOS showing 4 cores instead of 8

2010-02-01 Thread Rahul Nabar
On Mon, Feb 1, 2010 at 3:23 AM, Tim Small t...@buttersideup.com wrote:

 You can use dumpCmos from the smbios utilities to dump all of the BIOS
 settings from the Linux commandline.  Do a diff of the two sets of dumps
 with the dual-core/quad-core setting having been manually changed...
 You can then use activateCmosToken to change the same setting on all
 your servers - which should be trivial to automate.

Thanks Tim! What's the exact command again? I see the following on my system:

smbios-get-ut-data      smbios-rbu-bios-update  smbios-sys-info-lite
smbios-wakeup-ctl       smbios-lcd-brightness   smbios-state-byte-ctl
smbios-token-ctl        smbios-wireless-ctl     smbios-passwd
smbios-sys-info         smbios-upflag-ctl

Am I missing a Dell repo that needs to be installed?

I tried dmidecode but cannot figure out the exact setting line I need.
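
The diff step Tim describes could look roughly like this, whichever tool ends
up producing the dumps. A sketch only: before.txt and after.txt are assumed
to be two text dumps of the BIOS settings, saved before and after changing
the cores setting by hand on one node, and the helper name changed_settings
is made up:

```shell
#!/bin/sh
# changed_settings BEFORE AFTER
# Prints only the lines that differ between two settings dumps
# ("<" = only in BEFORE, ">" = only in AFTER).
changed_settings() {
    # grep exits non-zero when nothing differs; that is not an error here.
    diff "$1" "$2" | grep '^[<>]' || true
}

# Usage:
#   changed_settings before.txt after.txt
```

Whatever shows up in that output is the candidate setting to apply on the
remaining servers.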

 I'd also push for financial compensation, I think!

I wish! :) Is there Dell management lurking on the list?

-- 
Rahul



cannot find correct attribute for omconfig BIOS mod

2010-02-01 Thread Rahul Nabar
I was told by Dell support that I could use omconfig system biossetup
attribute=cpucore setting=all to change a BIOS setting that restricts
my cores to half.

The problem is I haven't gotten this to work; I get an error message each time:
Error! Invalid command: biossetup

If I try omconfig chassis biossetup attribute=cpucore setting=all it
still throws:
Error! This BIOS setup feature is not present on this system.


Unfortunately I cannot find the attribute cpucore anywhere in the
documentation for OpenManage. Is this attribute really present and
documented somewhere, or am I barking up the wrong tree?

-- 
Rahul



Re: cannot find correct attribute for omconfig BIOS mod

2010-02-01 Thread Rahul Nabar
On Mon, Feb 1, 2010 at 11:39 AM, Vanush Misha mi...@cs.nuim.ie wrote:
useful for you?
 another one to try would be omreport chassis biossetup and see
 what's showing up there. I've used omconfig chassis biossetup to
 enable CPU Virtualization Technology (cpuvt) and AC power recovery
 mode (acpwrrecovery) on PE2900 in the past.


omconfig chassis biossetup -?
Error! BIOS setup feature unavailable on this system.

I just don't get it!

-- 
Rahul



Re: R410's shipped out with BIOS showing 4 cores instead of 8

2010-01-31 Thread Rahul Nabar
On Sun, Jan 31, 2010 at 1:43 PM, Big Wave Dave bigwaved...@gmail.com wrote:

 --
 Rahul

 We had 16 x R410s show up in mid December with the same issue.

 Dave


Thanks Dave! I'm feeling a little better that I'm not the only one.

-- 
Rahul



Re: R410's shipped out with BIOS showing 4 cores instead of 8

2010-01-31 Thread Rahul Nabar
On Sun, Jan 31, 2010 at 1:43 PM, Big Wave Dave bigwaved...@gmail.com wrote:
 On Sun, Jan 31, 2010 at 8:05 AM, Rahul Nabar rpna...@gmail.com wrote:
 Has anyone seen this problem before? I have dual socket Nehalems with
 twin quad core chips. When I booted the OS it showed only 4 cores. I
 went to the BIOS and found under Processor Settings the entry
 Cores-per-processor set to Dual

 If I change it to All everything is ok again.

Thanks guys for helping out! I got the official answer from Dell just
a few minutes ago. Apparently the Dell servers are doing this strange
BIOS setting by design. I quote my Technical Account Manager below.

##
[snip]
My apologies we were quite busy today, and I was just about to email
you with the results. I have been able to check our tools, and with
several technicians. The bios setting by default is set to dual. This
is the same for the R710, R410, and R510 servers. As of right now it
looks like the only way to change that is by either physically going
to each server, and booting into the bios and changing the processor
settings.
[snip]
###


R410s, R710s, and R510s are all doing this currently (says my TAM).

I can't say why, but I was told it's a feature, not a bug. Very
non-intuitive to me. I can't see why an 8-core server defaulting to
4 cores makes sense.

-- 
Rahul



Re: Network stability problems on R710 servers and BCM5709

2009-10-24 Thread Rahul Nabar
On Sat, Oct 24, 2009 at 5:14 PM, Steve Thompson s...@vgersoft.com wrote:
 On Sun, 27 Sep 2009, Ryan Pugatch wrote:

 Evangelos Souglakos wrote:
 We are experiencing network problems on i/o and network loaded R710
 Poweredge servers. Network connectivity dies after some time.
 The systems needs to be powered down to bring the NICs back to life.

 Supposedly, loading the bnx2 module with disable_msi=1 resolves this
 problem.

 There is a version of the netxtreme driver available that is newer than
 the one provided by RHEL/CentOS.  You can get this from Dell's site.  In
 my experience, using this driver resolves the problem.

 I have just taken delivery of a T710 w/BCM5709's, and can confirm that the
 disable_msi=1 trick does *not* resolve this problem; I have had the system
 hang while idle, 5 minutes after a cold start. Installing the netxtreme
 driver *does* fix the issue. Not very clever, guys.


I have a R410 here. I wonder if they have the same driver. I thought
the eth NICs on this one were Broadcoms.

-- 
Rahul
