Re: 2850 very slow cp compared to identical hardware

2010-12-10 Thread Jefferson Ogata
On 2010-12-10 20:40, Brian A. Seklecki wrote:
 5A2D BIOS H433 and the fast one has fw 516A and BIOS H418.  RAID
 adapter and container settings match across systems, as do tune2fs -l
 
 Just to confirm; same cache settings for each volume?

And same RAID battery state?
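
A quick way to compare both, assuming OMSA (or megactl) is installed on each
box -- the tool choice here is an assumption, not something from the thread:

omreport storage vdisk     # per-volume read/write/disk cache policy
omreport storage battery   # controller battery state
megactl -H                 # also reports batt: state on the PERC 4e/Di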



Re: [BULK] RE: R910/Linux CPU Heat Problems?

2010-12-08 Thread Jefferson Ogata
On 2010-12-08 23:31, Bond Masuda wrote:
 yeah, looks like the R910 has 4 PSUs definitely something off with one
 of them. I'd consider taking a physical look at it who knows? maybe one
 PSU is failing and generating a lot of heat? 

Or perhaps the high fan speed in one PSU is part of a scaled response to
the high CPU temp. Maybe at higher CPU temps the other PSU fans will
spin up to high speed.



Re: iDRAC6 firmware download links broken (and where is the source code?)

2010-12-02 Thread Jefferson Ogata
On 2010-12-03 00:30, Adam Nielsen wrote:
 I would very much like to see a full build and reflash environment, 
 there are a number of improvements I would like to make to the DRAC - 
 and of course Dell would be free to include them in future releases if 
 they wanted to, so I still don't really understand why they are so 
 opposed to having people work on improving their products for free...

What Adam said.



Re: Issues with syncing mirror/fetching rpms

2010-12-01 Thread Jefferson Ogata
On 2010-12-01 21:00, Matt Domsch wrote:
 On Wed, Dec 01, 2010 at 10:23:45AM -0500, Bryan wrote:
  Can the person who runs linux.dell.com take a look at this and fix
it?  This occurs from multiple different networks/clients on this same
file.
 
 I can reproduce the failure too, though it's not clear why it's
 failing.  apache logs show the connection and report the whole file
 was sent...  The file is very much readable on the server itself.
 Gremlins...

Is it being served from an NFS filesystem? If so, maybe you need to turn 
EnableSendfile off...
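
A minimal example, assuming a stock RHEL-style Apache config:

# /etc/httpd/conf/httpd.conf (or a snippet in conf.d/)
EnableSendfile Off
# EnableMMAP Off is sometimes needed as well for NFS-backed content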



OT: mailing list issue

2010-11-01 Thread Jefferson Ogata
Got a message from the linux-poweredge@dell.com mailman interface last 
night claiming:

Your membership in the mailing list Linux-PowerEdge has been disabled
due to excessive bounces. The last bounce received from you was dated
31-Oct-2010.  You will not get any more messages from this list until
you re-enable your membership.  You will receive 3 more reminders like
this before your membership in the list is deleted.

with a link to re-enable the membership. Checked my mail server logs and 
there have been no bounces on my end. Something's funky.



Re: perc6i alignment?

2010-10-12 Thread Jefferson Ogata
On 2010-10-12 14:52, Tino Schwarze wrote:
 I suppose(!) alignment doesn't matter that much (or at all) for RAID10
 (which is the right choice for DB loads with only few disks).
 
 But that's just my gut feeling.

My gut thinks your gut is wrong about that. :^) Why would RAID10 be
exempt? The PERC is still going to bunch up disk addressing into RAID
chunks. If your filesystem blocks aren't aligned with the chunk
boundaries, you're going to need two disks to seek to satisfy some read
requests, and four disks for some write requests.
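
A minimal sketch of aligning the first partition, assuming a 64 kB chunk
size (128 512-byte sectors); the device name is just an example:

parted -s /dev/sdb mklabel msdos
parted -s /dev/sdb mkpart primary 128s 100%
# 128 * 512 = 65536 bytes, so the partition starts on a chunk boundary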



Re: Dumping the BMC/DRAC event log without installing a bunch of software on R410 Ubuntu 10

2010-09-20 Thread Jefferson Ogata
On 2010-09-20 18:47, Drew Weaver wrote:
 Is there a way to view the event log in the BMC/DRAC without installing 
 OMSA on an R410 with Ubuntu 10?

apt-get install ipmitool
man ipmitool
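
For the event log specifically, a minimal sketch using standard ipmitool
subcommands:

ipmitool sel list    # dump the System Event Log
ipmitool sel elist   # extended/decoded listing, where supported
ipmitool sel clear   # clear it once you've reviewed it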



Re: PE2850 RAID1 upgrade drives

2010-09-04 Thread Jefferson Ogata
On 2010-09-04 12:15, Raymond Kolbe wrote:
 Over the past couple of months I have been looking into upgrading the 
 drives in one of our servers, a PE2850 running CentOS 4.8. Currently it 
 has 3x146GB 10K drives, two of which are RAID1 and the third being a hot 
 spare.
 
 I would like to upgrade the drives to 3x300GB 15K drives but I do not 
 want to reinstall the OS. I have found many articles on the web related 
 to upgrading RAID1 configurations and it seems like everyone says the 
 following:
 
 1) Create a Ghost image of OS/data, etc. for backup.
 2) Break the array (degrade it).
 3) Pull one of the drives (drive 1) and replace it with the newer 300GB 
 drive.
 4) Let the array rebuild to the bigger drive.
 5) Pull drive 0 and replace it with the newer 300GB drive.
 6) Let the array rebuild.
 7) Use gParted or another partition resizing program to increase my 
 partitions.
 
 or
 
 1) Create a Ghost image of OS/data, etc. for backup and restore.
 2) Turn off the server and replace both drives with the newer 300GB drives.
 3) Turn on the server and create a new RAID1 array.
 4) Restore the Ghost image from step 1.
 5) Use gParted or another partition resizing program to increase my 
 partitions.
 
 However, no one has confirmed that these methods worked for them.
 
 Now, both ways sound like they would work, but I am extremely nervous 
 about this because I have also found forum postings and articles about 
 having to manually copy over partition information, and that disk block 
 sizes matter, etc. (not exactly sure about the technical issues here), 
 etc. This is also a mission critical production server so uptime is key.
 
 So my question is, are either of the two methods above realistic, and/or 
 has anyone actually upgraded RAID1 in a PE2850 or PE server before 
 without having to reinstall their OS?

Method 1 may not give you a larger RAID1.

Method 2 may not preserve your boot record (MBR) and partition table,
which are stored in the first track of the RAID volume, i.e. blocks
0-62. Ghost may or may not be able to image these.

In both cases, there may be limits to how you can resize the partitions
because of the actual layout on disk. Also, depending on utilization,
copying full images of your filesystems, rather than using dump/restore,
may waste a lot of time on unallocated blocks.

Since you have a third disk in there, you could also do something like
the following, which would have lower downtime:

1. Replace the hot spare with a 300 GB disk.
2. Create a RAID0 volume on the new 300 GB disk (get rid of the hot spare).
3. Create a partition on the new RAID0 volume and make a filesystem on it.
4. Use dd to transfer the partition table and boot record to a file on
the new filesystem. (dd if=/dev/sda of=/mnt/foo/track0 count=63)
5. Boot in emergency mode, or live boot a CentOS install disc or other
live CD, the objective being to make sure all filesystems are either
unmounted or mounted read-only.
6. Use dump to copy the other filesystems to the new disk. (e.g. dump 0f
/mnt/foo/root.0 /dev/sda1, etc.)
7. Make copies of the static dump and restore binaries on the new disk,
in case you don't have them later.
8. Delete the RAID1 volume.
9. Replace the other two disks.
10. Create a new RAID1 volume.
11. Boot a live or install disc again and mount the RAID0 filesystem.
12. Use dd to copy the partition table and boot record to the new RAID1.
13. Tweak the partition table to suit your needs.
14. Create new filesystems on the new partitions. Run mkswap on your
swap partition, if you have one. Check /etc/fstab and be sure to specify
filesystem labels where needed, e.g. if /etc/fstab says LABEL=/usr for
the /usr mount, be sure to add -L /usr to your mkfs line. You can also
tweak labels after the fact using e2label. Also pay attention to whether
there's a label on your swap partition, and use -L with mkswap in that
case as well.
15. Mount each new filesystem and use restore to recover the appropriate
filesystem.
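
A minimal sketch of the dump/restore parts (steps 6 and 15), assuming the
new RAID0 filesystem is mounted on /mnt/foo and / is on /dev/sda1; the
names are illustrative only:

# step 6: with everything unmounted or mounted read-only, dump each filesystem
dump -0f /mnt/foo/root.0 /dev/sda1
# step 15: after creating the new filesystems on the RAID1 (and with the
# RAID0 filesystem mounted again on /mnt/foo), restore each one in place
mount /dev/sda1 /mnt/newroot
cd /mnt/newroot && restore -rf /mnt/foo/root.0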

I'm assuming you're not using LVM. If you are, then some of these steps
would become simpler.

It might be advisable to use Ghost, as you suggested, to make a backup
over the network to a different system just in case. But it will add
time to the process.

If you're not completely familiar with all of this, it may be best for
you to setup another system to practice on. Specific things to practice
ahead of time (that are good skills for you to have as a sysadmin anyway):

- Using grub to rewrite the boot record (see the sketch after this list).
- Changing grub config options (e.g. root, kernel) at the boot
screen in order to boot a system whose disks have been shuffled around.
- Booting in emergency mode.
- Getting your root filesystem remounted r/w in emergency mode.
- Using dump and restore.
- Checking and modifying filesystem labels with e2label.
- Identifying which RAID volume /dev/sda actually refers to.
- Using a CentOS/Red Hat install disk to get to a command line without
nuking anything on your disk.
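
For the first item, a minimal grub (legacy) sketch, assuming /boot is the
first partition of the first BIOS disk:

grub> root (hd0,0)
grub> setup (hd0)
grub> quit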


Re: 16tb filesystems on linux

2010-08-26 Thread Jefferson Ogata
On 2010-08-26 17:26, Nick Stephens wrote:
 Does anyone have any tips or tricks for this scenario?  I am utilizing 
 RHEL5 based installations, btw.

Don't create very large filesystems.
Use LVM.

- Very large filesystems take a long time to fsck. Using smaller 
filesystems with LVM snapshots lets you fsck periodically without even 
unmounting your filesystems (see the sketch after this list).
- A serious error or inconsistency in a very large filesystem may blow 
away all of your data; smaller filesystems constrain the damage.
- The properties of one giant filesystem (e.g. striping, inode/block 
ratio) can't be tuned to the different needs of different types of files 
you might store. Your application might be more efficient if it put 
larger files on a different filesystem with a better large-file 
allocation strategy.
- Very large filesystems limit you to a small subset of possible 
filesystem types.
- Very large filesystems keep you from migrating your data to 
off-the-shelf hardware in an emergency.
- You're going to hit limits of some kind sooner or later, so your 
application should be designed to tolerate having your data on multiple 
filesystems anyway.
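
A minimal sketch of the snapshot-and-fsck idea from the first point,
assuming a volume group vg0 with a logical volume named data (names and
sizes are illustrative):

lvcreate -s -L 10G -n data-check /dev/vg0/data   # snapshot of the mounted LV
e2fsck -fy /dev/vg0/data-check                   # check the snapshot, not the live fs
lvremove -f /dev/vg0/data-check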



Re: 16tb filesystems on linux

2010-08-26 Thread Jefferson Ogata
On 2010-08-26 18:30, Nick Stephens wrote:
 I actually gave that a shot myself but didn't think it was available yet 
 due to getting the same error message.  Now that I think about it 
 though, it could be a different issue I'm encountering. 
 
 [r...@localhost ~]# mkfs.ext4dev -T news -m0 -L backup -E 
 stride=16,stripe-width=208 /dev/sda1
 mke2fs 1.41.12 (17-May-2010)
 mkfs.ext4dev: Size of device /dev/sda1 too big to be expressed in 32 
 bits
 using a blocksize of 4096.

Another reason to use LVM: you've put a partition table on your giant 
block device. Did you align the start of the first partition with your 
RAID stripe size? If not, then many of your filesystem blocks will span 
two disks, meaning reading one of those blocks requires two disks to seek 
instead of one. If you make the whole block device an LVM physical 
volume instead, you won't have to worry about that (unless you have a 
stripe size > 64 kB, and in that case, you can override the default PV 
metadata size to make it a multiple of your RAID stripe size).

See:

http://insights.oetiker.ch/linux/raidoptimization/
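
A minimal sketch of the PV metadata trick, assuming a 256 kB stripe; the
device name is illustrative:

pvcreate --metadatasize 250k /dev/sdb   # metadata area gets rounded up, so data starts at 256 kB
pvs -o +pe_start /dev/sdb               # verify where the data area actually begins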

[snip]
 The MD1000 is populated with (15) 2TB 7200rpm SAS drives in a RAID-5 
 with 1 hotspare (leaving 13 data disks).  I know that conventional 
 wisdom says that raid5 is a poor choice when you are looking for 
 performance, but localized benchmarking has proven that in our scenario 
 the total-size gains acquired with the striping outweigh the redundancy 
 provided with RAID-10 (since we are unable to get significant 
 performance increases).

Consider creating two 7-disk RAID5s instead of a single 14-disk RAID5. 
This will double your redundancy, and you can still stripe over all 14 
disks using LVM.
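
A minimal sketch of the LVM striping, assuming the two RAID5 virtual disks
show up as /dev/sdb and /dev/sdc and you want a 256 kB stripe element:

pvcreate /dev/sdb /dev/sdc
vgcreate vg_data /dev/sdb /dev/sdc
lvcreate -i 2 -I 256 -l 100%FREE -n data vg_data   # stripe across both PVs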

In addition, if you use slots 0-6 for one RAID5 and 7-13 for the other, 
you can dual-connect the MD1000 and have one SAS channel dedicated to 
each RAID.

Or, as others have suggested, consider RAID6.



Re: idrac firmware update on R710, can't connect to idrac now

2010-08-06 Thread Jefferson Ogata
On 2010-08-06 18:01, Sabuj Pattanayek wrote:
 I just updated the idrac firmware on an R710 :
[snip]
 But now I can't connect to the idrac. Did the settings on the idrac
 basically get blown away so that it doesn't remember what static IP I
 had set for it? How do I set the idrac ipv4 settings without going
 over to data center? Can I do it from Linux? I was looking at this :
 
 http://support.dell.com/support/edocs/software/smdrac3/idrac/idrac1.11/en/ug/html/chap02.htm#wp95392
 
 but it doesn't look like it's possible?

If you have ipmitool you can set LAN and user settings with lan set 
and user set commands, respectively.
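
A minimal sketch, assuming the iDRAC uses the usual LAN channel 1 and that
the addresses here are examples only:

ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 192.168.0.120
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 192.168.0.1
ipmitool user set password 2 NEWPASSWORD   # user ID 2 is typically root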



Re: idrac firmware update on R710, can't connect to idrac now

2010-08-06 Thread Jefferson Ogata
On 2010-08-06 20:04, Sabuj Pattanayek wrote:
 But, I'm still not getting any pings, nor can I connect to the web, or
 via ssh. Any other ideas? Do I just have to bite the bullet and try
 power cycling it?
 
 I'm also assuming that ipmitool chassis power cycle does an
 immediate power cycle, i.e. it doesn't call shutdown first? I could
 also try a warm boot first to see if that fixes the idrac. Has anyone
 experienced the same issue, i.e. is the idrac supposed to stop working
 after an update until a cold/warm boot ?

ipmitool chassis power cycle will, as you say, immediately power cycle 
the main system. It won't necessarily reset the iDRAC if that's wedged 
somehow. The way to do that is to actually pull all physical power to 
the system for some time between 30 seconds and a few minutes.

A regular reboot is at least worth a shot.



Re: collecting RAID info

2010-07-21 Thread Jefferson Ogata
On 2010-07-21 05:33, Geoff Galitz wrote:
 We have systems with various PERC RAID cards and I would like to be able 
 to gather basic data about the RAID configs on our servers programmatically.
  
 In other words, I want to write a script that can report the disk drive 
 models, size and the RAID level configuration.  This is for inventory 
 purposes so I don't need to worry about the runtime state of the array.  
 I'm ok with perl and shell, so I really just need guidance on what 
 interfaces to use to collect the data.

You can use megactl/megasasctl for this. That's pretty much what I wrote 
it for.

http://megactl.sourceforge.net/
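
A minimal inventory sketch, assuming megactl/megasasctl from the above URL
is installed (megasasctl for the SAS PERCs, megactl for the older ones):

#!/bin/sh
echo "== $(hostname) =="
megasasctl 2>/dev/null || megactl   # summary of adapters, logical drives, and disks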



Re: collecting RAID info

2010-07-21 Thread Jefferson Ogata
On 2010-07-21 21:20, Paul M. Dyer wrote:
 Is megactl/megasasctl available for a WinOS?

Afraid not; sorry.



Re: Vendor provided support

2010-07-19 Thread Jefferson Ogata
On 2010-07-19 23:01, Wiley Sanders wrote:
 I am trying to resolve an issue with RHEL on a newly purchased R510 (a 
 botched RHEL4 to RHEL5 upgrade), I opened a ticket with RHN, and RHN is 
 saying support is vendor provided and to contact Dell and not RHN for 
 support issues.
 
 Huh? (A polite way of saying WT*?)
 
 Is that what I'm supposed to do now? It's been a while (RHEL 3 days) 
 since I've bought a system with RHEL support. I've been wandering 
 through the labyrinth of dell.com http://dell.com for the last 15 min 
 looking for *anything* that references RHEL support and so far nothing. 
 RHEL support costs big bucks and I don't appreciate a runaround - from 
 RHN and not Dell in this case it looks like. That's what I get for not 
 sticking with CentOS - hey I just wanted to make sure I got good System 
 Management Tools support (which I did - installing omreport just *worked*!)

Open a ticket with Dell, and make sure your sales rep knows about the issue.

FWIW I had a whole slew of problems with RHEL purchases via Dell 
auto-activated licenses this year, involving Red Hat entitling systems 
for only two weeks when a full year had been purchased.



Re: Vendor provided support

2010-07-19 Thread Jefferson Ogata
On 2010-07-19 23:10, Robin Bowes wrote:
 I would imagine Dell support will be limited to RHEL runs on this
 hardware. If you have any issues with kernel oops or other
 hardware-related crashing then Dell may be interested in fixing it.

Not if it's a licensing issue where Dell is the OEM that sold the Red 
Hat license. In that case, Red Hat will run you (and Dell tech support) 
around in circles of ever-increasing size until you finally conclude 
that the only viable option is CentOS.

My only support tickets with Red Hat in the past two years have been 
related to licensing. Kinda tells you something



Re: Help with error message from megasasctl

2010-07-08 Thread Jefferson Ogata
On 2010-07-09 04:10, brijesh patel wrote:
 I have been receiving error messages from megasasctl which says the
 following.
[snip]
 a0e32s3   SEAGATE ST9300603SS  279GiB  a0d0  online   errs:
 media:0  other:5
  write errors: corr:  0delay:  0rewrit:  2tot/corr: 
 2tot/uncorr: 13  
   read errors: corr: 32Mi  delay:  1reread:  0tot/corr:
 32Mi  tot/uncorr:  0  
 verify errors: corr:159Mi  delay:  9revrfy:  0   
 tot/corr:159Mi  tot/uncorr:  0  
 
 This is the 4th hard drive in my RAID Array. I couldn't find a proper
 explanation on google. I think one of my hard drive is failing but i
 would like to have your thoughts on it.

Yes, that is a failing disk. I don't know why the PERCs don't fail them
when this starts to happen, but the disk is clearly reporting that it
has had 13 uncorrected write errors. That may be okay for Dell's data,
but not for mine.

Usually in this case I start a long self-test on the disk in question
(as well as on a spare so I have a reliable replacement). Often the
self-test will fail sufficiently to get Dell to replace the disk, or it
may even force the PERC to fail the disk.
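
A sketch of kicking that off, assuming megasasctl accepts the same -T
syntax as megactl (shown elsewhere in this archive) and using the device
name from the output above:

megasasctl -T long a0e32s3   # start a long (full-surface) self-test
# check the drive's status again later to see whether the self-test failed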



Re: Blew away my partition table

2010-06-29 Thread Jefferson Ogata
On 2010-06-29 20:22, Eberhard Moenkeberg wrote:
 On Tue, 29 Jun 2010, Jefferson Ogata wrote:
 You should be able to use dmsetup to create device nodes with offset
 into /dev/sda if you want to do this. But you should be able to find
 your filesystem headers with dd and xxd (or any hexdump program).
 
 A very good idea, to avoid the reboot.

Well, apparently using dmsetup doesn't work because the kernel refuses 
to set up new mappings directly on /dev/sda, possibly because there's an 
existing lock on the device due to the partition table being loaded.

 Where to look:

 - The first partition starts one track into the disk; typically that's
 63 512-byte sectors.

 - The second, third, and fourth partitions are usually on cylinder
 boundaries, with a cylinder typically being 63 * 255 512-byte sectors.

 - If you had more than four partitions, then the last physical partition
 has a partition table at the beginning. The first logical partition will
 begin one track into that physical partition.

 What to look for:

 - For ext3 filesystems, a superblock begins 1024 bytes into the
 partition. At offset 0x38 in the superblock you should find the magic
 number 0x53ef (big-endian).

 - For swap partitions, look at the first 4096 bytes. At the end of that
 page you should find the string SWAPSPACE2.

 - For LVM physical volumes you should see an LVM label 512 bytes from
 the beginning of the partition.
 
 A nice collection. Thanks, I will keep it in case i get into partition 
 table trouble.

A little more info:

- For ext3 filesystems, the superblock begins with a series of uint32_ts 
in little-endian format. The second uint32_t is the number of filesystem 
blocks in the filesystem. The seventh uint32_t is the block size, 
expressed as the number of bits to shift 1024 left. (So 0 for 1024-byte 
blocks, 1 for 2048-byte blocks, 2 for 4096-byte blocks). From this you 
can calculate the offset to the next partition--multiply the number of 
blocks by the actual blocksize and round up to a cylinder boundary.
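
A minimal sketch of reading those fields, assuming the partition starts at
sector 63 of /dev/sda (so the 1024-byte superblock offset is sector 65):

dd if=/dev/sda bs=512 skip=65 count=1 2>/dev/null | xxd -l 64
# offset 0x04-0x07 (little-endian): block count
# offset 0x18-0x1b (little-endian): shift value, block size = 1024 << value
# offset 0x38-0x39: magic, should read 53 ef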



Re: Blew away my partition table

2010-06-29 Thread Jefferson Ogata
On 2010-06-30 01:53, J. Epperson wrote:
 On Tue, June 29, 2010 21:27, Jefferson Ogata wrote:
 Number  Start   End SizeType File system  Flags
  1  63s 401622s 401560s primary  ext3 boot
  2  401625s 139299608s  138897984s  primary  ext3
  3  139299616s  143380124s  4080509sprimary   swap
 I would say those end sectors on partitions 1 and 2 should be one less
 than the following partition's start sector. The end sector of partition
 3 looks correct; though the last sector on the disk is 143380479, when
 you round down to a cylinder boundary you end up at 143380124.
 
 I was thinking the same thing, but that's what the parted rescue found,
 so I assumed it was correct.  Looking at another F12 system, what you say
 is how that one is.  Not sure what to do, try it as is or make the
 adjustment.  I do notice from the other system that I should probably mark
 the swap as FS type linux-swap(v1).  The other system looks like:

I don't think it would actually matter with partition 1. If your 
filesystem has a 2kB or 4kB block size, then those extra 2 sectors won't 
ever be addressed. With partition 2, however, the additional 7 sectors 
extend the volume by one or two filesystem blocks (with 3 extra sectors 
on the end).

I would go ahead and extend the partitions to the n-1 values. It's 
always safe to have a filesystem on a block device that is larger than 
the filesystem, but the converse is not true. You can also check the 
superblock with tune2fs -l to see how big the filesystem thinks the block 
device is. Block count * Block size / 512 should be <= the number of 
sectors.
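
A minimal sketch of that check, assuming the filesystem in question is on
/dev/sda2:

tune2fs -l /dev/sda2 | egrep 'Block count|Block size'
# sectors used by the filesystem = Block count * Block size / 512;
# that figure must not exceed the partition's size in sectors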



Re: RHEL5 and PERC H700

2010-06-24 Thread Jefferson Ogata
On 2010-06-24 12:34, Kipp, Jim wrote:
 Thanks, I will give RHEL5.4 a try

Uh, why not RHEL 5.5?

 -Original Message-
 From: linux-poweredge-boun...@dell.com
 [mailto:linux-poweredge-boun...@dell.com] On Behalf Of
 raghavendra_bilig...@dell.com
 Sent: Thursday, June 24, 2010 12:36 AM
 To: robin-li...@robinbowes.com; linux-powere...@lists.us.dell.com
 Subject: RE: RHEL5 and PERC H700
 
 RHEL5.2 does not include native support for H700 controller.
 
 H700 controllers is supported natively from RHEL5.3 onwards. 



Re: dell 2850 initrd problem.

2010-06-09 Thread Jefferson Ogata
On 2010-06-09 15:33, Paul M. Dyer wrote:
 Sorry for the delay.  Been busy with things.
 
 If you believe the other LVOL is a  filesystem, you can run e2fsck on it 
 also.   Yes, the first LVOL may be a swap partition.   The -b 32768 
 parameter is telling e2fsck to use superblock at location 32768, instead of 
 the default location.   If you have a superblock that is corrupt, that 
 command is using the first backup superblock.   So, you could try the 
 default, if fails, then the first backup superblock, then the second, ...
 default superblock:
 e2fsck -f /dev/mapper/VolGroup00-LogVol00

Even if there were a filesystem on that volume, which there almost 
certainly isn't, I don't know why you think the first backup superblock 
is at 32768. The locations of backup superblocks depend on the size of 
the device. The way to find out is to run a non-destructive mkfs on the 
device (i.e. with mkfs.ext3 -n).
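
For example (-n means nothing is actually written):

mkfs.ext3 -n /dev/mapper/VolGroup00-LogVol00
# the output ends with "Superblock backups stored on blocks: ..."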



Re: dell 2850 initrd problem.

2010-06-08 Thread Jefferson Ogata
On 2010-06-08 14:18, Ron Croonenberg wrote:
 Hi Paul,
 
 I managed to move pretty much all data from the 'hosed' machine to 
 another place. (there are a few things that I would like to salvage but 
 couldn't)
 
 I would like to make an attempt to fix the filesystem and see if I can 
 get some more of it.
 
 So if I issue: 
 e2fsck -f -b 32768 /dev/mapper/VolGroup00-LogVol01
 
 because that is the one with the bad superblock? that I found a bit ago 
 with:

Again, there's no reason to think you have a bad superblock.

Have you made an image of LogVol00 so you can safely fsck it?

 e2label /dev/mapper/VolGroup00-LogVol01 : bad magic number in
 Group-LogVol01. couldn't find valid filesystem superblock
 
 (although Jefferson mentioned that might be swap)

Given that that's the only other volume, it's almost certainly swap. Can
you recover the original /etc/fstab from
/dev/mapper/VolGroup00-LogVol00? That should tell you if it's swap.

Or you can run the following command:

dd if=/dev/mapper/VolGroup00-LogVol01 bs=1 skip=$[0xff6] count=10 | strings

If that yields SWAPSPACE2, it's swap.

 there is also:
 
 /dev/mapper/VolGroup00-LogVol00  but I thought that was /boot ?

/boot is /dev/sda1.

 sorry about all the questions, but I am a bit of a rookie with fixing crashed 
 filesystems.

That's okay.



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 15:42, Ron Croonenberg wrote:
 Jefferson Ogata wrote:
 What does your partition table actually say? Is /dev/sda2 *supposed* to
 be a filesystem, or is an LVM physical volume?
 
 it says that /dev/sda2 is an LVM volume

In that case, I strongly suggest that you *not* follow any advice about
trying to recover a superblock on it.

What is the filesystem label on /dev/sda1 (run e2label /dev/sda1)? (I'm
guessing it's /boot.)

How important is the content of this system? Do you have another system
you can image the disk to in case you do something destructive?



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 15:42, Ron Croonenberg wrote:
 it says that /dev/sda2 is an LVM volume

In case it isn't clear to you, BTW, this means that /dev/sda2 is NOT /.

Check what devices you have under /dev/mapper/.



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 16:58, Robin Bowes wrote:
 On 04/06/10 17:51, J. Epperson wrote:
 On Fri, June 4, 2010 12:08, Jefferson Ogata wrote:
 On 2010-06-04 15:42, Ron Croonenberg wrote:
 it says that /dev/sda2 is an LVM volume

 In case it isn't clear to you, BTW, this means that /dev/sda2 is NOT /.

 Does it?
 
 Hmm, I queried that statement too.
 
 Why does this mean that /dev/sda2 is not / ?
 
 As I understand it, the default RH/Fedora install is two partitions: a
 small /boot, and the rest an LVM PV assigned to a VG (VolGroup00) with
 root on an LV inside VolGroup00 ?

Regardless of default install layout, we have a bad superblock on
/dev/sda2 and /dev/sda2 marked as an LVM device. The evidence is strong
that / is an LV within the PV that is on /dev/sda2. If you tried to
recover a superblock directly on /dev/sda2 you would wipe out who knows
what.



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 17:08, Ron Croonenberg wrote:
 Here is what is in /dev/mapper/
 
 10, 63  control
 253, 0  VolGroup00-LogVol00
 253, 1  VolGroup00-LogVol01

And what do you get from:

e2label /dev/mapper/VolGroup00-LogVol00
e2label /dev/mapper/VolGroup00-LogVol01

(or, alternately):

e2label /dev/VolGroup00/LogVol00
e2label /dev/VolGroup00/LogVol01

What are the sizes listed in /proc/partitions for these two devices
(identified by major/minor numbers 253/0 and 253/1)?
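
For example, matching on those major/minor numbers:

awk '$1 == 253 && ($2 == 0 || $2 == 1)' /proc/partitions
# columns are: major minor #blocks name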



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 18:57, Paul M. Dyer wrote:
 Actually, you would want to try and recover the superblock from the 
 /dev/mapper/..   device.

No, given that he already said he can see /etc, albeit missing important
things, his superblock is almost certainly intact.



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 17:17, Ron Croonenberg wrote:
 In what sense are these systems fried? Can you move the RAID
 controller and disks from one system to another?
 
 Uhm,  no.  the machine I am talking about now has the hardware repaired,
 by  putting in a new raid kit.
 Moving it to another server (I don't have another 2850, I have a few
 more 2950's though)( would make much sense I think.
 
 The filesystem is 'corrupted'  so moving the disks doesn't help much there?

I meant the other way around. If the two backup systems are fried but
you can get their disks running on alternate non-fried systems, you
can recover your data from those.

 question:  since  I have a broken filesystem,  does it even make sense
 to make a disk image of it?  or do you mean just for backup? (that if I
 break it, I still have the broken stuff I started with?)

Yes, it does make sense, and for the reason you surmise.

Ideally you would get a disk image onto another system as an LV, take a
snapshot of that, and then work with the snapshot. Then if things go
awry, you just drop the snapshot and make a new one.
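
A minimal sketch of that workflow, assuming a volume group vg0 with enough
free space on the rescue system; all names and sizes are illustrative:

lvcreate -L 80G -n img vg0
ssh root@damaged-host 'dd if=/dev/mapper/VolGroup00-LogVol00 bs=1M' | dd of=/dev/vg0/img bs=1M
lvcreate -s -L 10G -n img-work /dev/vg0/img   # snapshot; work only on this
fsck.ext3 -fy /dev/vg0/img-work
# if it goes badly: lvremove -f /dev/vg0/img-work and snapshot again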

If you're lucky, what's happened is that the /etc directory file has
been damaged but that all of the objects it linked to are intact. fsck
will reparent these objects under /lost+found if so.

But there may be other damage as well.

But even if you can't recover /etc, you may well be able to recover the
data you're really interested in.

And even if files have been orphaned and fsck doesn't reattach them
under /lost+found you may be able to recover them using debugfs.



Re: dell 2850 initrd problem.

2010-06-04 Thread Jefferson Ogata
On 2010-06-04 19:14, Ron Croonenberg wrote:
 What jefferson says is correct, I can mount the volume with the rescue 
 cd, in /mnt/sysimage.
 
 If I browse around (within /mnt/sysimage) I can see 'everything', not 
 just etc.
 
 However, /initrd is empty  and in /etc there are a bunch of damaged 
 files,  initrd.conf, ldap.conf ...  but those are just a few (a dozen or 
 so)  (an ls in the directory would show  some question marks  on the 
 lines with damaged files.
 I tried to rename one of the damaged files and got an Input/Output error.

Is there anything in the output of dmesg that relates to the I/O error?

The /etc directory file may have an invalid block number in its block 
list, pointing past the end of the volume, for example.

I wouldn't try to make any modifications to the filesystem, let alone 
fully boot it, without imaging it first. You could very easily have a 
corrupted free block list that could cause boot logging to write all 
over the data you care about.

Your best bet might be just to do a reinstall.

IIRC, /initrd shouldn't even exist except during boot before swaproot; 
that's a transitory filesystem used to load drivers so the kernel can 
find the devices it needs. If it does exist it *might* mean that your 
last kernel patch had a problem building the new initrd and aborted, 
e.g. if the filesystem was full or if the RAID controller caused 
problems. You might have a previous kernel you could boot (should show 
up in your grub list), but as I say, I wouldn't even try to boot the 
system in its current state without making a disk image.



Re: YA Perc H700 Question

2010-05-25 Thread Jefferson Ogata
On 2010-05-25 16:48, Wiley Sanders wrote:
 This is another basic quick presales question that sales doesn't seem to
 be able to answer: The specs for the H700 say it has 4x2 SAS ports.
 What exactly does 4x2 mean? As far as I can tell the controller has
 two channels, period.

I believe it means two four-lane channels (12 Gb/s each). But I don't
have any H700s so I'm not certain. That's how the PERC 6s are tho.



Re: YA Perc H700 Question

2010-05-25 Thread Jefferson Ogata
On 2010-05-25 19:23, Jefferson Ogata wrote:
 On 2010-05-25 16:48, Wiley Sanders wrote:
 This is another basic quick presales question that sales doesn't seem to
 be able to answer: The specs for the H700 say it has 4x2 SAS ports.
 What exactly does 4x2 mean? As far as I can tell the controller has
 two channels, period.
 
 I believe it means two four-lane channels (12 Gb/s each). But I don't
 have any H700s so I'm not certain. That's how the PERC 6s are tho.

BTW, I would (and do) avoid the H* controllers until Dell makes good on
its pledge to remove the crippleware firmware features that prevent it
from interoperating with non-Dell drives. AFAIK new firmware addressing
this hasn't yet been released.



Re: Mounting an LVM disk

2010-05-24 Thread Jefferson Ogata
On 2010-05-24 19:09, J. Epperson wrote:
 Some good points, but having had this hole in my own foot, I'll say that
 it's very unlikely that it's _just_ the partition table that got wiped.  I
 also never had any luck getting a partition editor to work with a disk
 that had a table saying it was bigger than it actually was.  Always had to
 wipe it at a hardware level to get it repartitioned.
 
 I hope OP's luck is better.

OP doesn't need a partition table. Assuming that a dd was executed in 
the wrong direction for some period but aborted without wiping out too 
much of the disk, he needs to know the offset where the /home filesystem 
started, and a lower bound on its size. The filesystem could start at 
any multiple of LVM chunk size from the beginning of the physical 
volume, which would have covered either the entire disk (which may still 
be what's going on) or have started at a track offset from the start of 
the disk, or cylinder offset if not on the same cylinder as the 
partition table or logical partition table (unless the disk was 
partitioned in some unusual way). If not too much of the disk is gone, 
he also might be able to find a backup of the LVM config somewhere. It 
would be worthwhile imaging the whole disk as a backup, and using 
strings(1) to try to find an LVM backup.
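
A sketch of that search, assuming the damaged disk is /dev/sda (LVM2
metadata and its backups are plain text, so strings will find them):

strings -a -t d /dev/sda | egrep 'physical_volumes|logical_volumes' | head
# -t d prints the byte offset, which tells you where the metadata lives
# also check /etc/lvm/backup/ and /etc/lvm/archive/ on the OS disk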

A bigger question for me is why the OP isn't using any redundancy 
(single disk for OS and RAID0 for the rest), but whatever...



Re: how to enable BMC sensors

2010-05-24 Thread Jefferson Ogata
On 2010-05-24 23:00, Zhichao Li wrote:
 1. From the output of ipmitool sdr, can I ensure that the drivers for 
 BMC haven't been installed(or supported) in my Cent OS?
 If so, where can I get those drivers for my CentOS?
 2. Since currently, no ipmi modules is installed on my CentOS, where can 
 I find and install those ipmi modules on my CentOS for Dell PowerEdge R710?

Did you run service ipmi start?
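
On CentOS that's the OpenIPMI init script; a minimal sketch, assuming the
OpenIPMI and ipmitool packages are installed:

service ipmi start    # loads ipmi_si/ipmi_devintf and creates /dev/ipmi0
chkconfig ipmi on     # make it persistent across reboots
ipmitool sdr          # should now list the BMC sensors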



Re: any advice to find root cause of Falling back to HPET ?

2010-05-23 Thread Jefferson Ogata
On 2010-05-24 01:01, Bond Masuda wrote:
 This time around, on 3rd chassis, the lost ticks are no longer, the
 server is fast/normal again, and all is well. I just can't believe we
 ran into 2 servers with the same issue back to back??? (the guys at the
 hosting company are usually great, but it almost makes me doubt that
 they did the 1st chassis swap??)

For future reference, you might want to save a dump of dmidecode output
before requesting chassis swaps so you can check if the service tag
actually changed.
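
For example:

dmidecode -s system-serial-number         # on Dell hardware this is the service tag
dmidecode > /root/dmidecode.$(date +%Y%m%d)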



Re: any advice to find root cause of Falling back to HPET ?

2010-05-23 Thread Jefferson Ogata
On 2010-05-24 04:45, Bond Masuda wrote:
 As an aside, this weekend has been a bizarre weekend. After finally
 resolving this lost ticks issue, another server kernel panic and crashed
 mysteriously, and again kernel panic upon boot up.. some messages about bad
 memory in DIMM8 and DIMM7. then my friend's Drobo went all red and failed
 this afternoon. We must be getting showered by intense cosmic rays this
 weekend

Maybe the lost ticks was happening because of the Lost series
finale. <rim-shot/>



Re: Can't set lcd on R710?

2010-05-22 Thread Jefferson Ogata
On 2010-05-21 19:31, Sam Flory wrote:
 I'm unable to set the lcd message on my R710.  This seems to work fine on a 
 2950.  Is there a patch for ipmitool I'm missing or something?

The R710s have a more elaborate LCD setup than the 2950s. There are
settings both in the BIOS and in the iDRAC Express. I'm not surprised
the same ipmitool method doesn't work on the R710.



Re: any advice to find root cause of Falling back to HPET ?

2010-05-22 Thread Jefferson Ogata
On 2010-05-22 19:52, Bond Masuda wrote:
 I wish RHEL4 had smartctl that worked with megasas; i'll have to compile the
 latest smartctl to see if SMART data will tell me anything about the drives.
 One thing to note is that during the build of these  servers 2 weeks ago,
 one of the drives on s8 did fail and had to be replaced.

Try megactl/megasasctl.



Re: Serial over LAN

2010-05-20 Thread Jefferson Ogata
On 2010-05-20 17:45, chandrasekha...@dell.com wrote:
 Install OpenManage Server Administrator on that system. It will provide
 you GUI or CLI interface to configure the SOL.

Talk about using a sledgehammer to drive a pushpin.

Nice job quoting the entire digest too.

On 2010-05-20 16:14, Marc Moreau wrote:
 I'm looking to setup Serial over Lan on my cluster of PowerEdge 1950's.   
 Does anyone have this setup?

Sure. Use it all the time.

 I'm kind of confused on how all the redirection works.  From posts that I 
 have read, we redirect console to a serial port, then tell the BMC to forward 
 the serial console to the LAN.  But my BIOS has a 'Direct Connect' mode.  My 
 IPMI doesn't have any of the Serial redirect.  I fear that I am confusing 
 IPMI and BMC somewhere too.  Could some take a stab at explaining this please.

Assuming you don't have DRACs in these systems, the BMC is the device
that is providing the IPMI interface.

1. In BIOS setup, set console redirection to use COM2.

2. In BMC setup (control-E during POST), enable serial over LAN, set IP
and password.

** First thing you should do in BMC setup is reset to default. The BMCs
often ship with a weird non-default setting that will cause lots of
serial port feedback if you try to run a getty on the serial console.

3. Use ipmitool -H bmc-host -I lanplus -U bmc-user sol activate to
reach the serial console.

 One further question. If I do get this setup, does it give me 'full console 
 access' over serial. In other words, do I get the BIOS POST, or does the 
 serial console only come up after post?  I am aware that I need to setup my 
 OS (Centos) to spawn a serial tty, and this I have done before, just not over 
 baseboard LAN.

Yes, you will get the entire POST including the initial BIOS id. You
might also want to set Redirection after boot in the BIOS. In grub,
add something like console=ttyS1,57600 console=tty0 to your kernel line.
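
A sketch of the corresponding grub (legacy) bits, assuming COM2/ttyS1 at
57600 as above; the kernel and root device names are illustrative:

# /boot/grub/grub.conf
serial --unit=1 --speed=57600
terminal --timeout=5 serial console
...
kernel /vmlinuz-... ro root=/dev/VolGroup00/LogVol00 console=ttyS1,57600 console=tty0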

If you have DRACs the setup is actually quite similar, but you can also
do it completely remotely by hitting the web service on 192.168.0.120
(IIRC) to access the KVM console, then use that to do the BIOS setup.



Re: Serial over LAN

2010-05-20 Thread Jefferson Ogata
On 2010-05-20 20:56, Alexander Dupuy wrote:
 Jefferson Ogata wrote:
 ** First thing you should do in BMC setup is reset to default. The BMCs
 often ship with a weird non-default setting that will cause lots of
 serial port feedback if you try to run a getty on the serial console.
 
 For often you can substitute always (at least in my experience).

I've had one or two cases where I *didn't* have this problem. :^)

 Yes, you will get the entire POST including the initial BIOS id. You
 might also want to set Redirection after boot in the BIOS. In grub,
 add something like console=ttyS1,57600 console=tty0 to your kernel 
 line.
 
 Just to make this clear, if the BIOS redirection after boot is 
 enabled, the VGA console will be duplicated on the serial port.  This 
 may be okay (and is the easiest way to make it work), but if you go that 
 route you want to make sure that you are not also using the serial port 
 directly (e.g. grub console=ttyS1,57600, /etc/inittab ttyS1 entry, 
 etcetera).
 
 If you have redirection after boot disabled, you are responsible for 
 making serial access work in ISOLINUX/PXELINUX, grub, and Linux.

Hmm, my impression was that Redirection After Boot only affects comms 
up to the point of getting a bootloader running, but I'm not 100% sure.

Anyway, no harm in trying either way.



Re: 2850 PERC 4e/Di drive errors

2010-05-11 Thread Jefferson Ogata
On 2010-05-11 14:54, Matthew Lenz wrote:
 $ megactl -H
 a0   PERC 4e/Di   chan:2 ldrv:1  batt:good
 a0c0t0 279GiB  a0d0  online   errs: media:6  other:1
  write errors: corr:  0delay:  0rewrit:  0tot/corr:  
 0tot/uncorr:  0   
   read errors: corr:  4Mi  delay: 58reread:  0tot/corr:  
 0tot/uncorr:  0   
 verify errors: corr:  0delay:  0revrfy:  6tot/corr:  
 0tot/uncorr:  6   
 temperature: current:28C threshold:0C
 
 This is the only system with this showing up (of several x850 raid 
 setups).  These systems are still under warranty should I request a 
 replacement of this drive?  I really don't know how long this drive has 
 been erroring.

Try running a long self-test on the drive (megactl -T long a0c0t0). If
it fails that it will be worth replacing it.

I'm assuming that that drive is part of a redundant RAID. Be aware that
if the disk is failing, a long self-test may turn up enough problems for
the PERC to knock the disk offline. If you don't have redundancy, back
the system up first...



Re: Debian lenny on PE2950

2010-04-27 Thread Jefferson Ogata
On 2010-04-28 04:27, Carlos Beltrame wrote:
 I'm a beginner with dell products and I have several douts about
 installing debian lenny (amd64) into a PE2950. It has 6 HDs, 2x73gb and
 4x146gb. I know that this machine comes with PERC5 and when the
 instalation begins, the partition manager detects only one scsi with
 73gb and another with 438gb, scsi too. Is it correct? How to detect each
 hd to create my own raid and lvm?
 I hope that my explanation was clearly. Best regards.

Obviously the PERC is set up with a RAID1 on the two 73GB disks and a
RAID5 on the four 146GB disks. This is hardware RAID. You don't need to
see the individual disks in Linux; the PERC is managing them for you and
giving you logical disks to work with.

If you want to change the hardware RAID config, go into the PERC BIOS
during boot (watch for the prompt). If you really want to see the
individual disks in Linux for some reason, you can delete the two
logical disks and configure each disk as its own RAID0.



Re: Very high load during mkfs on a new 1tb raid 1

2010-04-20 Thread Jefferson Ogata
On 2010-04-20 16:04, Daniele Paoni wrote:
 I just installed two new 1TB disks configured as RAID1 on a PE2950 with
 PERC 6/i controller.
 When I try to create the filesystem with mkfs the load on the server
 goes over 500 (i have then killed the mkfs) and all the processes are
 blocked in D state (the processes are apache + mysql running on another
 raid1 array)

Did you wait for the RAID1 to finish initializing?

-- 
Jefferson Ogata : Internetworker, Antibozo
og...@antibozo.net  http://www.antibozo.net/ogata/



Re: Very high load during mkfs on a new 1tb raid 1

2010-04-20 Thread Jefferson Ogata
On 2010-04-20 16:59, Daniele Paoni wrote:
 On 04/20/2010 06:25 PM, Jefferson Ogata wrote:
 On 2010-04-20 16:04, Daniele Paoni wrote:
 When I try to create the filesystem with mkfs the load on the server
 goes over 500 (i have then killed the mkfs) and all the processes are
 blocked in D state (the processes are apache + mysql running on another
 raid1 array)
  
 Did you wait for the RAID1 to finish initializing?

 Yes, the array status is ready.
 I have initialized it with the fast initialize command, could it cause
 the problem ?

Fast initialization just means it marks the array as ready and continues
scrubbing in the background. The scrubbing competes heavily for I/O.

Use MegaCLI to check initialization status, e.g. MegaCLI -ldbi
-showprog -lall -aall.

Or look at the disks when you aren't doing anything. Are they both lit
up heavily?



Re: Very high load during mkfs on a new 1tb raid 1

2010-04-20 Thread Jefferson Ogata
On 2010-04-20 20:54, Daniele Paoni wrote:
 On 04/20/2010 07:35 PM, Jefferson Ogata wrote:
 Fast initialization just means it marks the array as ready and continues
 scrubbing in the background. The scrubbing competes heavily for I/O.

 Ok I have reinitialized the array with the slow initialization method.

I don't think it actually makes any difference whether you choose fast
or slow.

 Use MegaCLI to check initialization status, e.g. MegaCLI -ldbi
 -showprog -lall -aall.

 [r...@ns01 daniele]# /opt/MegaRAID/MegaCli/MegaCli -ldbi -showprog -lall 
 -aall
 
 Background Initialization on VD #0 is not in Progress.
 Background Initialization on VD #1 is not in Progress.

If you check shortly after boot, this will be the case. Check again
after about 5 minutes. The controller holds off on background
initializations for a little while.

 [r...@ns01 daniele]# omreport storage vdisk
 List of Virtual Disks in the System
 
 Controller PERC 6/i Integrated (Embedded)
 
 ID   : 1
 Status   : Ok
 Name : Data
 State: Ready
 HotSpare Policy violated : Not Assigned
 Virtual Disk Bad Blocks  : Not Applicable
 Secured  : Not Applicable
 Progress : Not Applicable
 Layout   : RAID-1
 Size : 931.00 GB (999653638144 bytes)
 Device Name  : /dev/sdd
 Bus Protocol : SAS
 Media: HDD
 Read Policy  : Read Ahead
 Write Policy : Write Back
 Cache Policy : Not Applicable
 Stripe Element Size  : 64 KB
 Disk Cache Policy: Enabled
 
 Or look at the disks when you aren't doing anything. Are they both lit
 up heavily?

 The disks are 200Km away from me :-( so I cannot check them.
 
 I tried to create the filesystem after the slow initialization but the 
 issue is still present.
 
 A simple dd if=/dev/zero of=/dev/sdd1 exibits the same problem , after 
 30 seconds the load is already about 46 and all the waiting processes 
 are in state D



Re: Adding 3rd Party RAID card halts Linux boot

2010-04-17 Thread Jefferson Ogata
On 2010-04-17 19:39, Nathan Milford wrote:
 Give it a shot and no go.
 
 Gives me Filesystem type unknown
 
 Since array has no filesystem and the original message says Filesystem
 type is ext2fs I presume it can see that (hd0,0) is correct, but it
 just stops short...

You might also need to tweak your root= option in the kernel line.

What are the lines in your default grub boot conf?
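
For reference, a typical CentOS stanza looks something like this (names
illustrative); with a new controller installed, both the (hdX,Y) lines and
root= may need to point at a different device:

title CentOS
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=/
        initrd /initrd-2.6.18-194.el5.img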



Re: debian monitoring

2010-04-13 Thread Jefferson Ogata
On 2010-04-13 16:56, P.A wrote:
 Prasana, thanks for the info, I actually  had downloaded this but it 
 does not come with MegaCLI which is shown on the example page you have 
 below. I don't think megactl and megacli are the same, someone correct 
 me if im wrong.

MegaCLI and megactl are definitely different. MegaCLI is the official 
configuration and inspection tool from LSI. megactl/megasasctl is an 
alternate program I authored that does some things MegaCLI does not do; 
in particular it uses SCSI passthrough to get info directly from the 
disks in a RAID so you can check error conditions at the drive 
controller level rather than the RAID controller level. It can also 
start disk self-tests and check their results.



Re: megasas failures at MWT2

2010-03-31 Thread Jefferson Ogata
You might get some more info about what's going on by dumping the PERC 
log using MegaCLI.

MegaCli -AdpEventLog -GetEvents will give you the basic log.

MegaCli -FwTermLog -Dsply will give you a more detailed log.



Re: / full

2010-03-25 Thread Jefferson Ogata
On 2010-03-25 17:07, Dimitri Yioulos wrote:
 At this point, we don't want to add more space to 
 the partition.  Can anyone suggest any other 
 things we can try? 

Does this condition survive a reboot?

Is /boot on /, or is it a separate filesystem? One thing that happens on 
Red Hat and derivatives is that as you patch the kernel you need to 
remove older kernels to avoid filling up whichever filesystem /boot 
resides on.
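
A quick way to check, on a Red Hat-style system (package-cleanup comes from
yum-utils, which may not be installed):

du -xsk /* 2>/dev/null | sort -n | tail   # -x stays on the / filesystem
rpm -q kernel                             # how many kernels are installed
package-cleanup --oldkernels --count=2    # remove all but the 2 newest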



Re: External array showing as /dev/sda

2010-03-21 Thread Jefferson Ogata
On 2010-03-21 22:30, David Hubbard wrote:
 From: Stephan van Hienen [mailto:stephan.van.hie...@thevalley.nl] 
 No sure which controller you have, but with the perc5/6 you 
 can create multiple virtual disks ?
 I have one PE2950 server with a Perc5i with 6 * 750GB drives, 
 with a 150gb boot vdisk, and a 3TB vdisk. (raid5)

 
 When doing raid 50 the H200/700/800-series controllers do not
 let you do that, the virtual disk size box becomes
 a fixed value.  I think it may let you do that on some
 other raid types.

As if we needed another reason to avoid the H-series controllers

To address your problem: you could do two RAID 5s and stripe them with LVM.



Re: RAID-5 and database servers

2010-03-12 Thread Jefferson Ogata
On 2010-03-12 15:39, Craig White wrote:
 On Fri, 2010-03-12 at 07:06 +, Jefferson Ogata wrote:
 On 2010-03-12 04:26, Craig White wrote:
 On Fri, 2010-03-12 at 02:23 +, Jefferson Ogata wrote:
 On 2010-03-11 22:23, Matthew Geier wrote:
 I've had a disk fail in such a way on a SCSI array that all disks on
 that SCSI bus became unavailable simultaneously. When half the disks
 dropped of the array at the same time, it gave up and corrupted the RAID
 5 meta data so that even after removing the offending drive, the array
 didn't recover.
 I also should point out (in case it isn't obvious), that that sort of
 failure would take out the typical RAID 10 as well.
 
 ignoring that a 2nd failed disk on RAID 5 is always fatal and only 50%
 fatal on RAID 10, I suppose that would be true.
 The poster wrote that all of the disks on a bus failed, not just a
 second one. Depending on the RAID structure, this could take out a RAID
 10 100% of the time.
 
 actually, this is what he wrote...
 
 When half the disks dropped of the array at the same time, it gave up
 and corrupted the RAID 5 meta data so that even after removing the
 offending drive, the array didn't recover.
 
 Half != all

Read it again: I've had a disk fail in such a way on a SCSI array that
all disks on that SCSI bus became unavailable simultaneously.

Unless you have a disk on a separate bus for every mirror in the RAID
10, this will kill your RAID 10 100% of the time. While that
configuration is more bulletproof, it also may not perform as well on a
saturated RAID 10 since every write has to be queued to two separate
buses instead of one.

The original poster's failure was a recoverable one, anyway. He just
didn't know the technique for recovery.

 I had a 5 disk RAID 5 array fail the wrong disk and thus had 2 drives go
 offline and had a catastophic failure and thus had to re-install and
 recover from backup once (PERC 3/di  SCSI disks). Not something I wish
 to do again.

PERC 5 and PERC 6 are worlds different from the PERC 3/di.

 I don't think I understand your 'odds' model. I interpret the first
 example as RAID 50 having 5 times more likelihood of loss than RAID 10
 and I presume that isn't what you were after

Yes, it is 5 times higher. But it is not 100%; it's actually less than
50%. And the probability for RAID 10 is not 50% as you said it was. I
was just correcting your analysis. I'm still not sure what RAID
structure you had in mind where a second failure on a
RAID 10 has a 50% probability of loss.

 
 In the alternative fair comparison, RAID 5 vs. RAID 1, the second
 failure kills both RAIDs 100% of the time.
 
 actually, I didn't raise the RAID 5 vs RAID 10 comparison, I only
 amplified with my experiences

You wrote: ignoring that a 2nd failed disk on RAID 5 is always fatal
and only 50% fatal on RAID 10, I suppose that would be true. That was
you comparing RAID 5 with RAID 10.

 the last time I bought an MD-1000, Dell would only sell me the PERC-5e,
 I don't know why.

Currently you can buy an MD1000 with or without a PERC 6.

(If I could recommend an enclosure from a different manufacturer at this
point, I would, but I haven't evaluated any others since I started
buying MD1000s some years ago.)

-- 
Jefferson Ogata : Internetworker, Antibozo
og...@antibozo.net  http://www.antibozo.net/ogata/



Re: RAID-5 and database servers

2010-03-12 Thread Jefferson Ogata
On 2010-03-12 15:45, John G. Heim wrote:
 Well, its not really practical to suggest that I consult with my vendor. My 
 whole budget is $6000. This is just the Math Department at the University of 
 Wisconsin. I mentioned in my original message that our databases consist 
 primarily of spamassassin bayesian rules and horde3/imp web mail. Those do a 
 lot of updates -- well, a lot by our standards. Every time a spam message 
 comes in, it it is added to the bayesian rule set for the user. I'm going to 
 say that typically each user gets 100 spam messages a day and there are 200 
 users. But each new rule consistes of several table updates. Even so, its 
 not like we're ebay.
 
 Anyway, speed of updates is critical because we can't have the mail system 
 getting bogged down by database updates. I put the bayesian rules in a mysql 
 DB in the first place because it was getting bogged down saving bayesian 
 data to bbm files on the mail server.
 
 I just want to make sure that I'm not setting myself up for a disaster.

Can you estimate the number of transactions per second you need? Is the
current mysql implementation keeping up with the mail? If so, run iostat
-kthx 60 under peak load, wait a minute, and post the last report
indicating which block device has the mysql database on it.
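
(Note that the first report iostat prints covers the period since boot;
run it as, say,

  iostat -kthx 60 2

and read the second report. The columns of interest here are w/s, wkB/s,
await, and %util for the device holding the database.)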

It doesn't sound like it would be a disaster if your database filesystem
crashed; you'd just drop the spam filtering while you reconstruct it.

Is your $6000 just for storage or do you have to buy a PowerEdge to go
along with it?



Re: RAID-5 and database servers

2010-03-12 Thread Jefferson Ogata
On 2010-03-12 22:10, John G. Heim wrote:
 I really think my boss is nearly out of patience with me. I think I know 
 what I want though. If I want to set up two RAIDs, one for the operating 
 system and one for the database files, do I need two PERCs? Can a single 
 PERC put 2 disks in a RAID-1 array and 3 others in a RAID-5 array?

Yes, no problem. You'll have /dev/sda and /dev/sdb.
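
If you create them through OMSA rather than the controller BIOS, it looks
roughly like this (controller and slot numbers are illustrative):

  omconfig storage controller action=createvdisk controller=0 raid=r1 \
    size=max pdisk=0:0:0,0:0:1
  omconfig storage controller action=createvdisk controller=0 raid=r5 \
    size=max pdisk=0:0:2,0:0:3,0:0:4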



Re: RAID-5 and database servers

2010-03-11 Thread Jefferson Ogata
On 2010-03-11 18:09, Preston Hagar wrote:
 Actually it says if money is no object, go with RAID 10:
 
 http://www.orafaq.com/wiki/RAID#RAID_10
 
 RAID 10 is the ideal RAID level in terms of performance and
 availability, but it can be expensive as it requires at least twice
 the amount of disk space. If money is no objective, always choose RAID
 10!
 
 I would agree with the RAID 10 recommendation.  I at one time did a
 lot of RAID 5 to try to comprimise price vs performance, but had
 several array failures resulting in having to restore from backup.
 Now, I put anything important on either RAID 1, or RAID 10.  Basically
 I use RAID 1 if it needs to be reliable and RAID 10 if it needs to be
 reliable and fast.

I've got several hundred disks running on RAID 5 and I've had one actual 
full RAID failure in 10 years, and that was my fault.

In terms of performance, depending on the workload, RAID 5 can 
outperform RAID 10. Furthermore Oracle's recommendations are based on 
what appears to be 5-10-year-old data, back when mid-level RAID 
controllers weren't capable of pushing ~700 MB/s onto a RAID 5. 
Nowadays, they can do that, and achieve pretty stellar IOPS as well. The 
difference in performance between RAID 5 (or better yet, RAID 50, 
striped using LVM), and RAID 10 is not what it used to be. Bear in mind 
also that now that Oracle is a hardware company, they'd just love you to 
buy almost twice as much disk (from them).

*Again*, this is why if you have particular performance requirements, 
you should consult with your database vendor to determine what bandwidth 
and IOPS you need, and benchmark your gear using different RAID configs. 
You may find that RAID 5 is just fine performance-wise, and you can get 
around 1.7 times the storage capacity with the same rack space, heat, 
and power load over RAID 10. Asking here you're just going to get people 
parroting Oracle's stale recommendations and speculating wildly without 
knowing anything about your workload.
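
If you do benchmark, something simple along these lines against each
candidate layout (fio parameters are purely illustrative; match the block
size and read/write mix to your real workload, and note this destroys
whatever is on the target device):

  fio --name=oltp --filename=/dev/sdb --direct=1 --ioengine=libaio \
      --rw=randrw --rwmixread=70 --bs=8k --iodepth=32 \
      --runtime=120 --time_based --group_reporting

will tell you far more than any general rule of thumb.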



Re: RAID-5 and database servers

2010-03-11 Thread Jefferson Ogata
On 2010-03-11 19:48, Eric Rostetter wrote:
 Quoting Jefferson Ogata powere...@antibozo.net:
 I've got several hundred disks running on RAID 5 and I've had one actual
 full RAID failure in 10 years, and that was my fault.
 
 You've been lucky! :)
 
 In 10 years, I've think I've had 3 RAID 5 failures (all rebuilt without
 problems).

That's not what I mean by a full RAID failure. I've had plenty of disks 
fail and subsequent successful rebuilds. I'm saying on one occasion 
(because of an oversight) I ended up with an unrecoverable RAID 5 
because of disk failures.

Of course, this wasn't a serious problem because I also had backups.

-- 
Jefferson Ogata : Internetworker, Antibozo
og...@antibozo.net  http://www.antibozo.net/ogata/



Re: Installing RHEL5 via DRAC

2010-03-11 Thread Jefferson Ogata
On 2010-03-12 01:31, Paul M. Dyer wrote:
 Sorry Stephen, but that will not work.
 
 The iso is setup in the comps to be a CD/DVD.  The process of creating a 
 repository rebuilds the comps for the new media, i.e. harddisk.

I have no problem using the iso image directly.

I also have no problem installing by simply loopback-mounting the ISO and
either NFS-exporting it directly or copying the whole thing to an
exported directory.
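
The loopback/NFS variant is just (paths and ISO name are illustrative):

  mount -o loop,ro rhel-5.3-x86_64-dvd.iso /srv/rhel53
  echo '/srv/rhel53 *(ro,no_root_squash)' >> /etc/exports
  exportfs -ra

and then you point the installer at nfs:yourserver:/srv/rhel53.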



Re: RAID-5 and database servers

2010-03-11 Thread Jefferson Ogata
On 2010-03-12 04:26, Craig White wrote:
 On Fri, 2010-03-12 at 02:23 +, Jefferson Ogata wrote:
 On 2010-03-11 22:23, Matthew Geier wrote:
 I've had a disk fail in such a way on a SCSI array that all disks on
 that SCSI bus became unavailable simultaneously. When half the disks
 dropped of the array at the same time, it gave up and corrupted the RAID
 5 meta data so that even after removing the offending drive, the array
 didn't recover.
 I also should point out (in case it isn't obvious), that that sort of
 failure would take out the typical RAID 10 as well.
 
 ignoring that a 2nd failed disk on RAID 5 is always fatal and only 50%
 fatal on RAID 10, I suppose that would be true.

The poster wrote that all of the disks on a bus failed, not just a
second one. Depending on the RAID structure, this could take out a RAID
10 100% of the time.

In your second disk scenario, comparing RAID 5 with RAID 10 in terms
of failure likelihood isn't fair; you need to compare RAID 50 with RAID
10. And the odds depend on the number of disks and the RAID structure.

Suppose you have 12 disks arranged as a 6x2 RAID 10, and the same number
of disks as a 2x6 RAID 50. When the second disk fails the odds of loss are:

- RAID 50: 5/11.
- RAID 10: 1/11.

If instead we have the 12 disks as a 3x4 RAID 50, then the odds of loss
when the second disk fails are:

- RAID 50: 3/11.
- RAID 10: 1/11.

We can now tolerate a third disk failure with our RAID 50 with the odds
of loss:

- RAID 50: 6/10.
- RAID 10: 2/10.

How often does this happen? It hasn't happened to me, and it hasn't
happened to anyone I know.

In the alternative fair comparison, RAID 5 vs. RAID 1, the second
failure kills both RAIDs 100% of the time.

And there's always RAID 6.

 So if Dell is selling a high quality hard drive with more than average
 durability and the anticipation that it is going to last longer under
 24/7 usage, its entirely reasonable to have to pay more than the
 cheapest dirt SATA drive you can find online. Of course you will have to
 live with the consequences if you go with the dirt cheap drive.
 Personally, I put a lot of value on my time and my customers data.

I have hundreds of Dell disks online. They fail regularly. Often they
fail during system burn-in. For the kind of markup Dell is charging on
these drives I don't think I should be finding dead ones after only 24
hours of operation. And a one-year warranty is just ridiculous.

 I read this article last year...
 
 http://www.enterprisestorageforum.com/technology/features/article.php/3839636
 
 and I had already forsaken RAID 5 but it pretty much confirmed what my
 experiences had been... that when I considered the life cycle of the
 installation, the time lost in waiting for file transfer, etc. on RAID
 5, etc. that it was foolish for me to recommend RAID 5 to anyone. 

It's pretty clear you don't speak from any recent experience as far as
RAID 5 performance goes, and you yourself say as much when you say you
had already forsaken RAID 5. Like Oracle, you're living in the past.
You should do some of your own benchmarks.

In any case, the argument in that article applies to RAID 10 as well; it
gives you better probabilities but eventually it will take too long to
rebuild mirrors and failure will be just as inevitable as with RAID 5.
Error rates will have to drop to prevent this, and no doubt they will,
sufficiently that the article's argument is moot. Eventually they will
drop to the point where we will be using RAID 0.

  On top of that,
 it seems to me that RAID 10 smokes RAID 5 on every performance
 characteristic my clients are likely to use (and yes, that means
 databases). RAID 5 primarily satisfies the needs for maximum storage for
 the least amount of money and that was rarely what I need in a storage
 system for a server.

For a lot of access patterns, RAID 5 yields much better write bandwidth
than RAID 10. I don't know why you think RAID 10 smokes RAID 5. You
should grab a PERC 6 and a couple of MD1000s and try some different
configurations. I don't think you'll see any smoke in the margins, even
over the oddly limited gamut of access patterns your clients use.



Alternatives to Dell storage?

2010-03-09 Thread Jefferson Ogata
So, does anyone want to recommend alternatives to the MD1000 for 3.5" 
SAS/SATA storage? I've looked at HP's offering but it appears to have a 
lower drive count.

Obviously direct LSI replacements for the prior generations of PERC are 
available, as well as the Arecas that some people seem to like. Any 
other suggestions on the controller front for RAID5/6 connecting to 
SAS/SATA enclosures?

The reasons for this question should be obvious. :^)



Re: iDRAC6 out of range

2010-02-19 Thread Jefferson Ogata
On 2010-02-19 15:59, Nick Lunt wrote:
 just been trying to install Red Hat on R710 over the DRAC and I just get
 out of range when it starts anaconda gui, so I had to reboot and
 install in text mode, which is less than ideal.
 
 Anyone know how to solve the out of range issue please ?

Try adding resolution=800x600 or resolution=1280x1024 to the install
boot options.

See also:

http://fedoraproject.org/wiki/Anaconda/Options

Or use kickstart with PXE and don't look at the screen at all. :^)
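
For the PXE route, the resolution option just goes on the append line of
your pxelinux entry, along these lines (server name and paths are made up):

  label rhel5
    kernel vmlinuz
    append initrd=initrd.img ks=nfs:installserver:/ks/r710.cfg ksdevice=eth0 resolution=800x600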



Re: Third-party drives not permitted on Gen 11 servers

2010-02-16 Thread Jefferson Ogata
On 2010-02-16 17:46, Blake Hudson wrote:
 Attached was a pdf explaining the stringent quality
 control standards for Dell's HDDs. No apology, remorse, alternative
 solutions, etc.

That's pretty funny considering the fairly high failure rate of Dell 
drives. If you actually check the SMART statistics you'll see the PERC 
often tries to pretend bad drives are just fine. For example I have a 
Dell-provided Seagate in a PE2950 right now that has logged 100 
uncorrected write errors and 10 uncorrected read errors, and has failed 
a SMART long self-test. The PERC says 2 media errors and hasn't failed 
it out of the RAID.
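
(If you want to check your own drives behind a PERC 5/6, smartmontools can
reach them through the controller; the number after megaraid, is the
physical disk ID and will vary:

  smartctl -a -d megaraid,0 /dev/sda
  smartctl -t long -d megaraid,0 /dev/sda

The second command kicks off the long self-test mentioned above.)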

Well, I guess this is the year I start diving into HP or IBM gear.



Re: Reverse DNS lookup syslog (Debian 4.0)

2010-02-15 Thread Jefferson Ogata
On 2010-02-15 10:22, Brian O'Mahony wrote:
 Not possible with 600+ machines in different locations around the world. MAC 
 reservations for clients is not an option here.

Only 600+? Of course it's possible. The only legitimate reason to use
non-static assignment is if you don't have enough address space for your
devices and they need to take turns.

Not knowing what devices are attaching to your networks gives you
exactly the sort of problem you're currently trying to find a kluge for,
and invites unwelcome guests on your network.



Re: Reverse DNS lookup syslog (Debian 4.0)

2010-02-12 Thread Jefferson Ogata
On 2010-02-12 14:58, Brian O'Mahony wrote:
 As our domain use DHCP for the windows clients, when I go back a few 
 days later, the syslog entries are close to useless to me as the IP may 
 have changed. How do I get syslog to log the hostname of the connecting 
 machine?

Your real problem is that you are using DHCP to assign addresses from a 
pool. Reserve your IPs to MAC addresses so that machines don't change 
IPs and you've solved both problems.
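
If the server side is ISC dhcpd, that's one stanza per machine, e.g.
(names and addresses invented):

  host alice-pc {
    hardware ethernet 00:21:9b:aa:bb:cc;
    fixed-address 192.168.10.21;
  }

Windows DHCP servers have the same thing under reservations.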



Re: Firmware repository out of date

2010-01-29 Thread Jefferson Ogata
On 2010-01-29 22:28, jeffrey_l_mend...@dell.com wrote:
 Looks like the firmware repo is falling out of date. This happened
 before when it was unofficial. Maybe now that it is official something
 can be done?
 
 The OM 6.2 server update utility (SUU) disc that was released in December was 
 pulled and replaced with a new version last week. The DRAC5 firmware going 
 from 1.5 to 1.51 was one of the changes. I just updated the yum repo 
 yesterday to reflect the new SUU disc.

Hmm. Does that mean there were problems with SUU 6.2.0?



Re: Lifecycle Controller not-so slick IMO.

2010-01-28 Thread Jefferson Ogata
On 2010-01-28 14:11, patrick_b...@dell.com wrote:
 You should look at USC 1.3 and the latest DRAC for your system. There exists 
 a WSMAN interface through the DRAC to do just what you are saying.

Dell folks: please feel free to include URLs for resources providing
further details in your postings. :^)



Re: PXE boot R900 via 10Gb Intel NIC?

2010-01-23 Thread Jefferson Ogata
On 2010-01-23 13:17, Pavel Matěja wrote:
 I have gotten as far as figuring out I might need to run IBAUTIL.EXE to
 enable PXE boot on the device ROM, but I'm so far stymied at actually
 getting to a state where I can execute that program. None of the ancient
 DOS bootable floppy images I have around seems to want to boot on this
 system via virtual floppy.

 Any clues? Anyone successfully booted a bare metal R900 to a DOS/Windows
 command line in order to run such a beast?
 
 Try http://www.freedos.org/

Oh, I did, of course. It doesn't run on R900s. Boots but crashes with
invalid instruction errors. But thanks for the suggestion.

I've also tried booting gPXE on virtual CDROM but it fails silently; I'm
guessing that either there's no current driver for that NIC or the boot
code doesn't work on an R900.

WTF is wrong with Intel, anyway? Who expects someone to boot DOS in 2010
in order to tune a 10GBE NIC? They could at least provide some way to do
it in Linux so I could boot a live CD running an operating system from
this millennium to fix it. And if Dell is providing IBAUTIL as part of
their support packages, it would be nice if they could suggest a way you
could run it successfully.

Maybe there's some way to do it with ethtool -E, but I don't have any
way to find out what it is.

Any other ideas?


Re: PXE boot R900 via 10Gb Intel NIC?

2010-01-22 Thread Jefferson Ogata
On 2010-01-23 03:32, thomas_chena...@dell.com wrote:
 Jefferson Ogata wrote:
 Anyone know how to PXE boot an R900 off of an Intel 10Gb optical NIC 
 rather than one of the onboard copper NICs?
 
 Some Intel 10Gb optical NICs are bootable, but not necessarily all. For 
 bootable adapters, the boot ROM needs to be programmed and enabled using a 
 DOS-based tool; either flautil.exe or ibautil.exe. These tools can be found 
 on support.dell.com in packages named in the pattern 
 Intel_LAN_*_DOSUtilities*.

Thanks for the reply!

Well, this is the 10GBE NIC provided by Dell as part RN219, and the
specs on that part claim it supports PXE boot.

I have gotten as far as figuring out I might need to run IBAUTIL.EXE to
enable PXE boot on the device ROM, but I'm so far stymied at actually
getting to a state where I can execute that program. None of the ancient
DOS bootable floppy images I have around seems to want to boot on this
system via virtual floppy.

Any clues? Anyone successfully booted a bare metal R900 to a DOS/Windows
command line in order to run such a beast?



Re: megactl

2009-12-30 Thread Jefferson Ogata
On 2009-12-30 09:07, Pavel Mateja wrote:
 One of these days I'll add some code to do that in mega{,sas}ctl, as
 well as add an XML output mode so that various monitoring tasks become
 easier, e.g. writing a simple SNMP agent.
 IMHO this should be done by udev.
 Create rule file like /etc/udev/rules.d/55-megadev.rules:
 KERNEL=megadev* MODE=600
 and run udevtrigger.
 That's a fine idea, but the tools should still try to work on systems
 where no one has done this.
 
 But you can make such file part of your package, can't you?

I think that would be a bit overzealous, personally. Not really my place
to muck with udev. Certainly could include it and recommend
installation. udevtrigger can do unexpected things. E.g. I just ran
udevtrigger on a system with 7 down interfaces and it brought them all
up with dhclient.

The safer course is simply to create the device node if it's not
present. This is what dellmgr used to do with older PERCs.

Maybe in a secondary RPM it would be okay, with appropriate caution.



Re: megactl

2009-12-30 Thread Jefferson Ogata
On 2009-12-30 10:39, Pavel Mateja wrote:
 The safer course is simply to create the device node if it's not
 present. This is what dellmgr used to do with older PERCs.
 
 I've met similar problem year ago. We had raidmon init script which deleted 
 /dev/megadev0 and was unable to recreate it because the device was moved to 
 misc with major number 10 in newer kernels. Udev just worked with both old 
 and 
 new kernels.

That's why you look in /proc/devices for the correct major number.
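
In other words, roughly (assuming the driver registers itself under a name
beginning with megadev; adjust the pattern to whatever your kernel
actually lists):

  major=$(awk '$2 ~ /^megadev/ {print $1}' /proc/devices)
  [ -c /dev/megadev0 ] || mknod /dev/megadev0 c $major 0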



Re: RAID Perc 5 monitoring

2009-12-28 Thread Jefferson Ogata
On 2009-12-28 16:36, Brian A. Seklecki wrote:
 It is in a remote location, so command line tools would be great.  I
 
   MegaCli from LSI is all that you need.   OMSA if your nights are long.

megasasctl can send you periodic reports of health of all physical and
logical disks. On SAS disks you can also check disk temperature and
various other log pages, or initiate disk self-tests in disk firmware.
One case where this is handy is when you're about to rotate a new disk
in after a failure; it's nice to be able to do a long self-test of the
new disk first so you don't have it fail in the middle of the rebuild.

http://sourceforge.net/projects/megactl/

You may wish to build from SVN, particularly if you need PERC6 RAID6
support. I haven't made a new release since before I had one to work with.
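
The periodic reports mentioned above are just cron plus mail; a minimal
sketch, assuming the binary landed in /usr/sbin:

  # /etc/cron.d/raid-report
  0 6 * * * root /usr/sbin/megasasctl | mail -s raid-status root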



Re: PERC/6E kernel modules

2009-12-15 Thread Jefferson Ogata
On 2009-12-14 15:45, Karl Zander wrote:
 Does the PERC/6E use the same LSI MegaRAID modules in the kernel as some of 
 the other PERCs?  
 
 Is there a specific kernel version needed for the PERC/6E?
 
 I am am compiling my own kernel.

PERCs up to and including PERC 4 (the AMI/LSI one, not the Adaptec) use 
the older megaraid driver.

PERC 5/i, 5/E, and 6/E use megaraid_sas.

PERC 6/i uses mptsas (i.e. the standard LSI Fusion driver).

If you want to cover most of your bases, build megaraid, megaraid_sas 
and mptsas.



Re: massive io problems

2009-12-11 Thread Jefferson Ogata
On 2009-12-11 09:04, John Hodrien wrote:
 On Fri, 11 Dec 2009, Adam Nielsen wrote:
 Do you have a utility like the old megamgr that can get you controller
 stats?  That will report actual disk errors that are seen by the RAID
 controller, which may not make it all the way through to the OS.
 
 No, I was relying on omsa.  I'll take a look at what else I can use.

Have you tried using MegaCli to dump the controller log?

Have you tried using megasasctl to check SAS error log pages? (Check out
the current source from sourceforge subversion and build from source for
PERC 6 support.)



Re: R900 internal disks I/O slow!!!!

2009-12-03 Thread Jefferson Ogata
On 2009-12-03 21:50, mcclnx mcc wrote:
 we have several R900 server with Redhat 5.3 O.S. in it.  R900 have 5 internal 
 450GB SAS disks.  we configured it as 0 and 1 mirror, 2 and 3 mirror, last 
 disk singe.
 
 I have been perform file in between each logical disks and I/O is very slow 
 (something like 8 to 9GB/sec).  Even my OLD DELL 2650 servers (Redhat 4.7) 
 internal disks run fast than it.
 
 Does anyone know why?

I assume you actually mean 8-9 MB/s. Have you allowed the RAIDs to 
complete their background initialization? This will contend with your 
disk activity.
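
You can check whether background initialization is still running with
either of these (adapter and controller numbers illustrative):

  MegaCli -LDBI -ShowProg -LALL -aALL
  omreport storage vdisk controller=0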



Re: Operation disabled with PERC/5i

2009-12-03 Thread Jefferson Ogata
On 2009-12-03 16:12, one...@waste.org wrote:
 Thank you very much for your reply. I thought these were errors relating
 to the SCSI Inquiry not being read but the drive came with a Dell Server
 we have, and it was working fine prior:
 
 Vendor ID : DELL
 Product ID: MAX3073RC
 Revision  : D206
 
 Though you mention something that is worrying, that the slot failed. Could
 I make this drive a hot swappable or clear, which I did successfully, if
 the slot had failed?

I haven't had a slot fail so I don't know. But it's certainly a 
possibility that could explain both the previous disk failing and the 
current one acting weird. That's all I'm saying.

Have you used MegaCli to dump the controller event log?

 Is there any way to test a failed slot through these tools? I checked
 consistency (when the drive was replaced) and all was okay it seems. If
 the slot failed, it seems to me that the drive may not even be recognized.

I suggest putting the drive in a different box to see if it behaves 
better. That would be some evidence either way about the slot.

I'm confused tho. Earlier you wrote:
 We're running a RAID 5 and two days ago, a drive went down. We replaced it
 and it went into Foreign mode to which I cleared it:
 
  sudo omconfig storage controller action=clearforeignconfig controller=0
 
 That worked fine. However, I cannot seem to get this new hard drive to
 attach itself to the RAID array, no matter what I try:

How is it that you were unable to add the drive to the array but were 
able to perform a consistency check?

 The drive is a Dell drive (well Fujitsu) - I find it odd it would bug
 out...

Drives bug out all the time.

Please don't top post.



Re: Running custom code on the DRAC

2009-11-24 Thread Jefferson Ogata
On 2009-11-25 01:58, Adam Nielsen wrote:
 I have now installed a cross-compiler and managed to get some test code
 running on the device.  The hard part was figuring out how make files
 available on the DRAC, but luckily it has NFS support built in so I
 could just mount a folder from another PC and run the code from there.

It's been a couple of years since I've played with this, and my memory 
is fuzzy, but this is what I can dredge up; sorry if any of it is 
misremembered:

The virtual media plugin gives you a path to move stuff to the DRAC, 
albeit in a block-oriented way. This is not needed with the NFS path and 
a root shell, but it was when I was working from the restricted busybox 
user shell.

The firmware image has a CRAMFS at some offset. You can dd this off and 
unpack it to get the firmware contents. Again, not as necessary with a 
source release.
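
If anyone wants to poke at an image, the approach is roughly this (the
image name and offset are hypothetical; locate the cramfs superblock
first, e.g. by searching the file for its "Compressed ROMFS" signature):

  dd if=firmimg.d5 of=root.cramfs bs=1 skip=$OFFSET
  mount -o loop -t cramfs root.cramfs /mnt/drac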

This site used to host a bunch of MIPS/Linux binaries which would run on 
the DRAC, though I'm not sure where they've gone to.

http://www.paralogos.com/mipslinux/

There appear to be more resources these days. This looks promising:

ftp://ftp.linux-mips.org/pub/linux/mips/redhat/7.1/RPMS/



Re: Determine interface's MAC address prior to OS install?

2009-11-24 Thread Jefferson Ogata
On 2009-11-24 17:49, Jefferson Cowart wrote:
 I have a couple R710s. Those latches are designed to hold the server without 
 the screws. Simply push the server into the rack and they should latch. You 
 can then lift the latches to release the server. I believe the screws are for 
 securing the server if you are shipping it in a rack. (See 
 http://support.dell.com/support/edocs/systems/per710/multlang/Rack/H153KA00.pdf)

Huh--well, I'll happily stand corrected on that. Unfortunately, the 
fellow who brought me the R710s failed to bring the rails along (they're 
for a remote install), so I haven't seen them interact with the rails. 
It wasn't evident to me that there was any catch mechanism other than 
the recessed screws.

They're still ugly, though, IMO. :^)



Re: DRAC firmware source code available for download

2009-11-17 Thread Jefferson Ogata
On 2009-11-18 06:44, Adam Nielsen wrote:
 Maybe this was to discourage us from flashing our own code?  Who knows.
  If anyone from Dell is listening, you really don't have to bother doing
 that :-)

Since the firmware shell was based on busybox, GPL compels them to
publish source for at least part of the firmware. Theoretically, they
should be publishing it in the same place as the binary, but this is not
too bad. I'm not sure how long they've been publishing source, but I
couldn't find it when I looked around a couple of years ago.



BMC SOL agetty feedback loop

2009-10-23 Thread Jefferson Ogata
Apologies if this has been discussed; it seems to affect a lot of my 
systems so I'm surprised if it hasn't, but I haven't found an effective 
solution looking through the list.

Various PowerEdge 1950s/2950s without DRAC exhibit this symptom running 
RHEL 5. There is a multiline warning banner in /etc/issue. Console 
redirection is enabled, and the following form of kernel line is used in 
/boot/grub/menu.lst:

timeout=5
serial --unit=1 --speed=57600
terminal --timeout=5 serial console
title Red Hat Enterprise Linux Server (2.6.18-164.2.1.el5)
 root (hd0,1)
 kernel /boot/vmlinuz-2.6.18-164.2.1.el5 ro root=LABEL=/ console=tty0 console=ttyS1,57600 rhgb
 initrd /boot/initrd-2.6.18-164.2.1.el5.img

The following is in /etc/inittab:

co:2345:respawn:/sbin/agetty -h ttyS1 57600 vt100-nav

(The agetty -h option doesn't seem to matter.)

Boot is fine; I have access to the console over IPMI/SOL during boot, 
all the way to getting an agetty login banner and prompt. But once I 
disconnect my IPMI/SOL session, after some delay, the BMC enters a kind 
of feedback loop where the banner text is fed back into agetty. agetty 
then logs a lot of failed login attempts, and eventually init pauses 
spawning agetty for 5 minutes because of excessive restarts. This looks 
like this in /var/log/secure (over and over again):

Oct 23 21:03:15 foo login: FAILED LOGIN 1 FROM (null) FOR 
warning**warning**warning**warning**warning**warning**warning, User not 
known to the underlying authentication module
Oct 23 21:03:17 foo login: pam_unix(login:auth): bad username []
Oct 23 21:03:17 foo login: pam_succeed_if(login:auth): error retrieving 
information about user
Oct 23 21:03:17 foo login: FAILED LOGIN 2 FROM (null) FOR , User not 
known to the underlying authentication module
Oct 23 21:03:18 foo login: pam_unix(login:auth): bad username []
Oct 23 21:03:18 foo login: pam_succeed_if(login:auth): error retrieving 
information about user
Oct 23 21:03:18 foo login: FAILED LOGIN 3 FROM (null) FOR , User not 
known to the underlying authentication module
Oct 23 21:03:20 foo login: pam_unix(login:auth): bad username []
Oct 23 21:03:20 foo login: pam_succeed_if(login:auth): error retrieving 
information about user
Oct 23 21:03:20 foo login: FAILED LOGIN SESSION FROM (null) FOR , User 
not known to the underlying authentication module

And in /var/log/messages:

Oct 23 21:03:20 hobo init: Id co respawning too fast: disabled for 5 
minutes

If I reconnect to the SOL console using IPMI, everything's fine 
(assuming init hasn't disabled agetty at the time I connect). Once I 
disconnect, the same thing starts again (again after some delay).

I haven't found any options to agetty that alter this behavior, and the 
BIOS Redirection after boot option doesn't alter it either. It's a 
problem because init makes agetty unavailable for 5 minutes at a time, 
and it also causes lots of noise in /var/log/secure.

Does anyone know what I'm missing? I would appreciate any help on this.
