Re: 2850 very slow cp compared to identical hardware
On 2010-12-10 20:40, Brian A. Seklecki wrote:
> [the slow one has] fw 5A2D, BIOS H433, and the fast one has fw 516A and BIOS H418. RAID adapter and container settings match across systems, as do the tune2fs -l settings.

Just to confirm: same cache settings for each volume? And same RAID battery state?
Re: [BULK] RE: R910/Linux CPU Heat Problems?
On 2010-12-08 23:31, Bond Masuda wrote:
> yeah, looks like the R910 has 4 PSUs... definitely something off with one of them. I'd consider taking a physical look at it; who knows? maybe one PSU is failing and generating a lot of heat?

Or perhaps the high fan speed in one PSU is part of a scaled response to the high CPU temp. Maybe at higher CPU temps the other PSU fans will spin up to high speed.
Re: iDRAC6 firmware download links broken (and where is the source code?)
On 2010-12-03 00:30, Adam Nielsen wrote:
> I would very much like to see a full build and reflash environment; there are a number of improvements I would like to make to the DRAC - and of course Dell would be free to include them in future releases if they wanted to, so I still don't really understand why they are so opposed to having people work on improving their products for free...

What Adam said.
Re: Issues with syncing mirror/fetching rpms
On 2010-12-01 21:00, Matt Domsch wrote:
> On Wed, Dec 01, 2010 at 10:23:45AM -0500, Bryan wrote:
>> Can the person who runs linux.dell.com take a look at this and fix it? This occurs from multiple different networks/clients on this same file.
>
> I can reproduce the failure too, though it's not clear why it's failing. apache logs show the connection and report the whole file was sent... The file is very much readable on the server itself. Gremlins...

Is it being served from an NFS filesystem? If so, maybe you need to turn EnableSendfile off...
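For reference, a minimal sketch of the Apache directive in question (the zero-copy sendfile(2) path is known to misbehave with NFS-backed content, producing exactly this "log says sent, client says truncated" pattern):

    # httpd.conf: fall back to ordinary read/write instead of sendfile(2),
    # which avoids short/stale transfers when DocumentRoot lives on NFS.
    EnableSendfile Off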
OT: mailing list issue
Got a message from the linux-poweredge@dell.com mailman interface last night claiming:

> Your membership in the mailing list Linux-PowerEdge has been disabled due to excessive bounces. The last bounce received from you was dated 31-Oct-2010. You will not get any more messages from this list until you re-enable your membership. You will receive 3 more reminders like this before your membership in the list is deleted.

with a link to re-enable the membership. Checked my mail server logs and there have been no bounces on my end. Something's funky.
Re: perc6i alignment?
On 2010-10-12 14:52, Tino Schwarze wrote:
> I suppose(!) alignment doesn't matter that much (or at all) for RAID10 (which is the right choice for DB loads with only few disks). But that's just my gut feeling.

My gut thinks your gut is wrong about that. :^) Why would RAID10 be exempt? The PERC is still going to bunch up disk addressing into RAID chunks. If your filesystem blocks aren't aligned with the chunk boundaries, some read requests will require seeks on two disks to be satisfied, and some write requests will require four.
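As an illustration (device name and 64 KiB chunk size are hypothetical), starting the first partition on a chunk boundary avoids the straddling described above:

    # Start at sector 128 (128 * 512 B = 64 KiB) instead of the DOS-legacy
    # sector 63, so filesystem blocks line up with RAID chunk boundaries.
    parted /dev/sdb unit s mkpart primary 128s 100%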
Re: Dumping the BMC/DRAC event log without installing a bunch of software on R410 Ubuntu 10
On 2010-09-20 18:47, Drew Weaver wrote:
> Is there a way to view the event log in the BMC/DRAC without installing OMSA on an R410 with Ubuntu 10?

apt-get install ipmitool
man ipmitool
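In particular, the sel subcommand is what reads the BMC event log:

    ipmitool sel list     # dump the System Event Log from the local BMC
    ipmitool sel elist    # extended listing with sensor numbers resolved to names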
Re: PE2850 RAID1 upgrade drives
On 2010-09-04 12:15, Raymond Kolbe wrote:
> Over the past couple of months I have been looking into upgrading the drives in one of our servers, a PE2850 running CentOS 4.8. Currently it has 3x146GB 10K drives, two of which are RAID1 and the third being a hot spare. I would like to upgrade the drives to 3x300GB 15K drives but I do not want to reinstall the OS. I have found many articles on the web related to upgrading RAID1 configurations and it seems like everyone says the following:
>
> 1) Create a Ghost image of OS/data, etc. for backup.
> 2) Break the array (degrade it).
> 3) Pull one of the drives (drive 1) and replace it with the newer 300GB drive.
> 4) Let the array rebuild to the bigger drive.
> 5) Pull drive 0 and replace it with the newer 300GB drive.
> 6) Let the array rebuild.
> 7) Use gParted or another partition resizing program to increase my partitions.
>
> or
>
> 1) Create a Ghost image of OS/data, etc. for backup and restore.
> 2) Turn off the server and replace both drives with the newer 300GB drives.
> 3) Turn on the server and create a new RAID1 array.
> 4) Restore the Ghost image from step 1.
> 5) Use gParted or another partition resizing program to increase my partitions.
>
> However, no one has confirmed that these methods worked for them. Now, both ways sound like they would work, but I am extremely nervous about this because I have also found forum postings and articles about having to manually copy over partition information, and that disk block sizes matter, etc. (not exactly sure about the technical issues here). This is also a mission critical production server so uptime is key. So my question is, are either of the two methods above realistic, and/or has anyone actually upgraded RAID1 in a PE2850 or PE server before without having to reinstall their OS?

Method 1 may not give you a larger RAID1. Method 2 may not preserve your boot record (MBR) and partition table, which are stored in the first track of the RAID volume, i.e. blocks 0-62; Ghost may or may not be able to image these. In both cases, there may be limits to how you can resize the partitions because of the actual layout on disk. Also, depending on utilization, copying full images of your filesystems, rather than using dump/restore, may waste a lot of time on unallocated blocks.

Since you have a third disk in there, you could also do something like the following, which would have lower downtime:

1. Replace the hot spare with a 300 GB disk.
2. Create a RAID0 volume on the new 300 GB disk (get rid of the hot spare).
3. Create a partition on the new RAID0 volume and make a filesystem on it.
4. Use dd to transfer the partition table and boot record to a file on the new filesystem (dd if=/dev/sda of=/mnt/foo/track0 count=63).
5. Boot in emergency mode, or live boot a CentOS install disc or other live CD, the objective being to make sure all filesystems are either unmounted or mounted read-only.
6. Use dump to copy the other filesystems to the new disk (e.g. dump 0f /mnt/foo/root.0 /dev/sda1, etc.).
7. Make copies of the static dump and restore binaries on the new disk, in case you don't have them later.
8. Delete the RAID1 volume.
9. Replace the other two disks.
10. Create a new RAID1 volume.
11. Boot a live or install disc again and mount the RAID0 filesystem.
12. Use dd to copy the partition table and boot record to the new RAID1.
13. Tweak the partition table to suit your needs.
14. Create new filesystems on the new partitions. Run mkswap on your swap partition, if you have one. Check /etc/fstab and be sure to specify filesystem labels where needed, e.g. if /etc/fstab says LABEL=/usr for the /usr mount, be sure to add -L /usr to your mkfs line. You can also tweak labels after the fact using e2label. Also pay attention to whether there's a label on your swap partition, and use -L with mkswap in that case as well.
15. Mount each new filesystem and use restore to recover the appropriate filesystem. (A minimal sketch of this copy-back phase follows at the end of this message.)

I'm assuming you're not using LVM. If you are, then some of these steps would become simpler. It might be advisable to use Ghost, as you suggested, to make a backup over the network to a different system just in case, but it will add time to the process.

If you're not completely familiar with all of this, it may be best for you to set up another system to practice on. Specific things to practice ahead of time (they're good skills for a sysadmin to have anyway):

- Using grub to rewrite the boot record.
- Changing grub config options (e.g. root, kernel) at the boot screen in order to boot a system whose disks have been shuffled around.
- Booting in emergency mode.
- Getting your root filesystem remounted r/w in emergency mode.
- Using dump and restore.
- Checking and modifying filesystem labels with e2label.
- Identifying which RAID volume /dev/sda actually refers to.
- Using a CentOS/Red Hat install disc to get to a command line without nuking anything on your disk.
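Here's a minimal, hypothetical sketch of steps 12-15 (device names, labels, and the single root dump are assumptions; adjust for your actual layout):

    # Step 12: write the saved boot record + partition table onto the new RAID1.
    dd if=/mnt/foo/track0 of=/dev/sda count=63
    # Steps 13-14: tweak partitions (fdisk), then make filesystems with the right labels.
    mke2fs -j -L / /dev/sda1
    mkswap -L SWAP-sda2 /dev/sda2
    # Step 15: mount each new filesystem and unpack the corresponding dump into it.
    mount /dev/sda1 /mnt/newroot
    cd /mnt/newroot && /mnt/foo/restore -rf /mnt/foo/root.0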
Re: 16tb filesystems on linux
On 2010-08-26 17:26, Nick Stephens wrote:
> Does anyone have any tips or tricks for this scenario? I am utilizing RHEL5 based installations, btw.

Don't create very large filesystems. Use LVM.

- Very large filesystems take a long time to fsck. Using smaller filesystems with LVM snapshots lets you fsck periodically without even unmounting your filesystems (see the sketch at the end of this message).
- A serious error or inconsistency in a very large filesystem may blow away all of your data; smaller filesystems constrain the damage.
- The properties of one giant filesystem (e.g. striping, inode/block ratio) can't be tuned to the different needs of the different types of files you might store. Your application might be more efficient if it put larger files on a different filesystem with a better large-file allocation strategy.
- Very large filesystems limit you to a small subset of possible filesystem types.
- Very large filesystems keep you from migrating your data to off-the-shelf hardware in an emergency.
- You're going to hit limits of some kind sooner or later, so your application should be designed to tolerate having your data on multiple filesystems anyway.
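The snapshot-fsck trick from the first bullet, as a minimal sketch (VG/LV names and snapshot size are hypothetical):

    lvcreate -s -n data-snap -L 4G /dev/vg0/data    # snapshot the live LV
    e2fsck -f /dev/vg0/data-snap                    # check the snapshot, not the live fs
    lvremove -f /dev/vg0/data-snap                  # discard when done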
Re: 16tb filesystems on linux
On 2010-08-26 18:30, Nick Stephens wrote:
> I actually gave that a shot myself but didn't think it was available yet due to getting the same error message. Now that I think about it though, it could be a different issue I'm encountering.
>
>   [r...@localhost ~]# mkfs.ext4dev -T news -m0 -L backup -E stride=16,stripe-width=208 /dev/sda1
>   mke2fs 1.41.12 (17-May-2010)
>   mkfs.ext4dev: Size of device /dev/sda1 too big to be expressed in 32 bits using a blocksize of 4096.

Another reason to use LVM: you've put a partition table on your giant block device. Did you align the start of the first partition with your RAID stripe size? If not, then many of your filesystem blocks will span two disks, meaning reading one of those blocks requires two disks to seek instead of one. If you make the whole block device an LVM physical volume instead, you won't have to worry about that (unless you have a stripe size larger than 64 kB, in which case you can override the default PV metadata size to make it a multiple of your RAID stripe size). See:

http://insights.oetiker.ch/linux/raidoptimization/

[snip]

> The MD1000 is populated with (15) 2TB 7200rpm SAS drives in a RAID-5 with 1 hotspare (leaving 13 data disks). I know that conventional wisdom says that raid5 is a poor choice when you are looking for performance, but localized benchmarking has proven that in our scenario the total-size gains acquired with the striping outweigh the redundancy provided with RAID-10 (since we are unable to get significant performance increases).

Consider creating two 7-disk RAID5s instead of a single 14-disk RAID5. This will double your redundancy, and you can still stripe over all 14 disks using LVM (see the sketch below). In addition, if you use slots 0-6 for one RAID5 and 7-13 for the other, you can dual-connect the MD1000 and have one SAS channel dedicated to each RAID. Or, as others have suggested, consider RAID6.
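A hypothetical sketch of striping one LV across the two RAID5 virtual disks (assuming they show up as /dev/sdb and /dev/sdc; stripe size is an example):

    pvcreate /dev/sdb /dev/sdc
    vgcreate vg_backup /dev/sdb /dev/sdc
    lvcreate -i 2 -I 512 -l 100%FREE -n backup vg_backup   # -i 2: stripe across both PVs
    mkfs.ext4dev -T news -m0 -L backup /dev/vg_backup/backup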
Re: idrac firmware update on R710, can't connect to idrac now
On 2010-08-06 18:01, Sabuj Pattanayek wrote:
> I just updated the idrac firmware on an R710: [snip] But now I can't connect to the idrac. Did the settings on the idrac basically get blown away so that it doesn't remember what static IP I had set for it? How do I set the idrac ipv4 settings without going over to the data center? Can I do it from Linux? I was looking at this: http://support.dell.com/support/edocs/software/smdrac3/idrac/idrac1.11/en/ug/html/chap02.htm#wp95392 but it doesn't look like it's possible?

If you have ipmitool you can set LAN and user settings with the lan set and user set commands, respectively.
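For example (channel number and addresses are hypothetical; channel 1 is typical for the BMC/iDRAC LAN interface):

    ipmitool lan set 1 ipsrc static
    ipmitool lan set 1 ipaddr 10.0.0.50
    ipmitool lan set 1 netmask 255.255.255.0
    ipmitool lan set 1 defgw ipaddr 10.0.0.1
    ipmitool lan print 1                      # verify the settings took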
Re: idrac firmware update on R710, can't connect to idrac now
On 2010-08-06 20:04, Sabuj Pattanayek wrote:
> But I'm still not getting any pings, nor can I connect to the web, or via ssh. Any other ideas? Do I just have to bite the bullet and try power cycling it? I'm also assuming that ipmitool chassis power cycle does an immediate power cycle, i.e. it doesn't call shutdown first? I could also try a warm boot first to see if that fixes the idrac. Has anyone experienced the same issue, i.e. is the idrac supposed to stop working after an update until a cold/warm boot?

ipmitool chassis power cycle will, as you say, immediately power cycle the main system. It won't necessarily reset the iDRAC if that's wedged somehow. The way to do that is to actually pull all physical power to the system for some time between 30 seconds and a few minutes. A regular reboot is at least worth a shot.
Re: collecting RAID info
On 2010-07-21 05:33, Geoff Galitz wrote:
> We have systems with various PERC RAID cards and I would like to be able to gather basic data about the RAID configs on our servers programmatically. In other words, I want to write a script that can report the disk drive models, size and the RAID level configuration. This is for inventory purposes so I don't need to worry about the runtime state of the array. I'm ok with perl and shell, so I really just need guidance on what interfaces to use to collect the data.

You can use megactl/megasasctl for this. That's pretty much what I wrote it for.

http://megactl.sourceforge.net/
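For instance (flags as used elsewhere in this archive; the line-oriented output is easy to parse from perl or shell):

    megasasctl        # SAS PERCs (PERC 5/6): adapter, logical drive, and disk summary
    megactl -H a0     # older SCSI PERCs: detailed status for adapter a0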
Re: collecting RAID info
On 2010-07-21 21:20, Paul M. Dyer wrote:
> Is megactl/megasasctl available for a WinOS?

Afraid not; sorry.
Re: Vendor provided support
On 2010-07-19 23:01, Wiley Sanders wrote:
> I am trying to resolve an issue with RHEL on a newly purchased R510 (a botched RHEL4 to RHEL5 upgrade). I opened a ticket with RHN, and RHN is saying support is vendor provided and to contact Dell and not RHN for support issues. Huh? (A polite way of saying WT*?) Is that what I'm supposed to do now? It's been a while (RHEL 3 days) since I've bought a system with RHEL support. I've been wandering through the labyrinth of dell.com for the last 15 min looking for *anything* that references RHEL support and so far nothing. RHEL support costs big bucks and I don't appreciate a runaround - from RHN and not Dell in this case, it looks like. That's what I get for not sticking with CentOS - hey, I just wanted to make sure I got good System Management Tools support (which I did - installing omreport just *worked*!)

Open a ticket with Dell, and make sure your sales rep knows about the issue. FWIW, I had a whole slew of problems with RHEL purchases via Dell auto-activated licenses this year, involving Red Hat entitling systems for only two weeks when a full year had been purchased.
Re: Vendor provided support
On 2010-07-19 23:10, Robin Bowes wrote:
> I would imagine Dell support will be limited to "RHEL runs on this hardware". If you have any issues with kernel oops or other hardware-related crashing then Dell may be interested in fixing it.

Not if it's a licensing issue where Dell is the OEM that sold the Red Hat license. In that case, Red Hat will run you (and Dell tech support) around in circles of ever-increasing size until you finally conclude that the only viable option is CentOS. My only support tickets with Red Hat in the past two years have been related to licensing. Kinda tells you something...
Re: Help with error message from megasasctl
On 2010-07-09 04:10, brijesh patel wrote:
> I have been receiving error messages from megasasctl which say the following:
>
>   a0e32s3  SEAGATE ST9300603SS  279GiB  a0d0  online   errs: media:0  other:5
>    write errors:  corr:     0  delay: 0  rewrit: 2  tot/corr:     2  tot/uncorr: 13
>    read errors:   corr:  32Mi  delay: 1  reread: 0  tot/corr:  32Mi  tot/uncorr:  0
>    verify errors: corr: 159Mi  delay: 9  revrfy: 0  tot/corr: 159Mi  tot/uncorr:  0
>
> This is the 4th hard drive in my RAID array. I couldn't find a proper explanation on google. I think one of my hard drives is failing, but I would like to have your thoughts on it.

Yes, that is a failing disk. I don't know why the PERCs don't fail them when this starts to happen, but the disk is clearly reporting that it has had 13 uncorrected write errors. That may be okay for Dell's data, but not for mine. Usually in this case I start a long self-test on the disk in question (as well as on a spare, so I have a reliable replacement). Often the self-test will fail sufficiently to get Dell to replace the disk, or it may even force the PERC to fail the disk.
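Starting that self-test, as a hypothetical sketch (the device name is taken from the report above, and the -T syntax mirrors the megactl usage shown elsewhere in this archive; check your version's usage output):

    megasasctl -T long a0e32s3    # kick off a long SMART self-test on that disk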
Re: Blew away my partition table
On 2010-06-29 20:22, Eberhard Moenkeberg wrote:
> On Tue, 29 Jun 2010, Jefferson Ogata wrote:
>> You should be able to use dmsetup to create device nodes with offset into /dev/sda if you want to do this. But you should be able to find your filesystem headers with dd and xxd (or any hexdump program).
>
> A very good idea, to avoid the reboot.

Well, apparently using dmsetup doesn't work, because the kernel refuses to set up new mappings directly on /dev/sda, possibly because there's an existing lock on the device due to the partition table being loaded.

Where to look:

- The first partition starts one track into the disk; typically that's 63 512-byte sectors.
- The second, third, and fourth partitions are usually on cylinder boundaries, with a cylinder typically being 63 * 255 512-byte sectors.
- If you had more than four partitions, then the last physical partition has a partition table at the beginning. The first logical partition will begin one track into that physical partition.

What to look for:

- For ext3 filesystems, a superblock begins 1024 bytes into the partition. At offset 0x38 in the superblock you should find the magic number 0x53ef (big-endian).
- For swap partitions, look at the first 4096 bytes. At the end of that page you should find the string SWAPSPACE2.
- For LVM physical volumes you should see an LVM label 512 bytes from the beginning of the partition.

> A nice collection. Thanks, I will keep it in case I get into partition table trouble.

A little more info:

- For ext3 filesystems, the superblock begins with a series of uint32_ts in little-endian format. The second uint32_t is the number of filesystem blocks in the filesystem. The seventh uint32_t is the block size, expressed as the number of bits to shift 1024 left (so 0 for 1024-byte blocks, 1 for 2048-byte blocks, 2 for 4096-byte blocks). From this you can calculate the offset to the next partition: multiply the number of blocks by the actual block size and round up to a cylinder boundary.
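A hypothetical probe for an ext3 superblock at the classic first-partition offset (sector 63), using dd and xxd as suggested (device name is an assumption):

    # The superblock starts 1024 bytes into the partition; the 53 ef magic bytes
    # sit at +0x38 into it, i.e. at 63*512 + 0x438 from the start of the disk.
    dd if=/dev/sda bs=512 skip=63 count=3 2>/dev/null | xxd | grep '^0000043'
    # An ext3 superblock shows "53ef" in the xxd line starting 00000430.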
Re: Blew away my partition table
On 2010-06-30 01:53, J. Epperson wrote:
> On Tue, June 29, 2010 21:27, Jefferson Ogata wrote:
>>> Number  Start       End         Size        Type     File system  Flags
>>>  1      63s         401622s     401560s     primary  ext3         boot
>>>  2      401625s     139299608s  138897984s  primary  ext3
>>>  3      139299616s  143380124s  4080509s    primary  swap
>>
>> I would say those end sectors on partitions 1 and 2 should be one less than the following partition's start sector. The end sector of partition 3 looks correct; though the last sector on the disk is 143380479, when you round down to a cylinder boundary you end up at 143380124.
>
> I was thinking the same thing, but that's what the parted rescue found, so I assumed it was correct. Looking at another F12 system, what you say is how that one is. Not sure what to do: try it as is, or make the adjustment. I do notice from the other system that I should probably mark the swap as FS type linux-swap(v1). The other system looks like:

I don't think it would actually matter with partition 1. If your filesystem has a 2kB or 4kB block size, then those extra 2 sectors won't ever be addressed. With partition 2, however, the additional 7 sectors extend the volume by one or two filesystem blocks (with 3 extra sectors on the end). I would go ahead and extend the partitions to the n-1 values. It's always safe to have a filesystem on a block device that is larger than the filesystem, but the converse is not true. You can also check the superblock with tune2fs -l to see how big the filesystem thinks the block device is: Block count * Block size / 512 should be <= the number of sectors in the partition.
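That superblock-vs-partition check, as a hypothetical sketch (device name assumed):

    tune2fs -l /dev/sda2 | egrep 'Block count|Block size'
    blockdev --getsz /dev/sda2        # partition size in 512-byte sectors
    # Block count * Block size / 512 must be <= the blockdev figure.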
Re: RHEL5 and PERC H700
On 2010-06-24 12:34, Kipp, Jim wrote:
> Thanks, I will give RHEL5.4 a try.

Uh, why not RHEL 5.5?

> -----Original Message-----
> From: raghavendra_bilig...@dell.com
> Sent: Thursday, June 24, 2010 12:36 AM
> To: robin-li...@robinbowes.com; linux-powere...@lists.us.dell.com
> Subject: RE: RHEL5 and PERC H700
>
> RHEL5.2 does not include native support for the H700 controller. H700 controllers are supported natively from RHEL5.3 onwards.
Re: dell 2850 initrd problem.
On 2010-06-09 15:33, Paul M. Dyer wrote:
> Sorry for the delay. Been busy with things. If you believe the other LVOL is a filesystem, you can run e2fsck on it also. Yes, the first LVOL may be a swap partition. The -b 32768 parameter is telling e2fsck to use the superblock at location 32768, instead of the default location. If you have a superblock that is corrupt, that command is using the first backup superblock. So, you could try the default; if that fails, then the first backup superblock, then the second, ...
>
> default superblock: e2fsck -f /dev/mapper/VolGroup00-LogVol00

Even if there were a filesystem on that volume, which there almost certainly isn't, I don't know why you think the first backup superblock is at 32768. The locations of backup superblocks depend on the size of the device. The way to find out is to run a non-destructive mkfs on the device (i.e. with mkfs.ext3 -n).
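i.e. something like this (-n makes mkfs print what it would do, including the backup superblock locations, without writing anything):

    mkfs.ext3 -n /dev/mapper/VolGroup00-LogVol00
    # look for the "Superblock backups stored on blocks:" line in the output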
Re: dell 2850 initrd problem.
On 2010-06-08 14:18, Ron Croonenberg wrote:
> Hi Paul, I managed to move pretty much all data from the 'hosed' machine to another place (there are a few things that I would like to salvage but couldn't). I would like to make an attempt to fix the filesystem and see if I can get some more of it. So if I issue:
>
>   e2fsck -f -b 32768 /dev/mapper/VolGroup00-LogVol01
>
> because that is the one with the bad superblock? that I found a bit ago with:

Again, there's no reason to think you have a bad superblock. Have you made an image of LogVol00 so you can safely fsck it?

>   e2label /dev/mapper/VolGroup00-LogVol01: bad magic number in VolGroup00-LogVol01. Couldn't find valid filesystem superblock.
>
> (although Jefferson mentioned that might be swap)

Given that that's the only other volume, it's almost certainly swap. Can you recover the original /etc/fstab from /dev/mapper/VolGroup00-LogVol00? That should tell you if it's swap. Or you can run the following command:

    dd if=/dev/mapper/VolGroup00-LogVol01 bs=1 skip=$[0xff6] count=10 | strings

(0xff6 = 4086, i.e. 10 bytes shy of the end of the first 4096-byte page, which is where the signature lives.) If that yields SWAPSPACE2, it's swap.

> there is also: /dev/mapper/VolGroup00-LogVol00, but I thought that was /boot?

/boot is /dev/sda1.

> sorry about all the questions, but I am a bit of a rookie with fixing crashed filesystems.

That's okay.
Re: dell 2850 initrd problem.
On 2010-06-04 15:42, Ron Croonenberg wrote:
> Jefferson Ogata wrote:
>> What does your partition table actually say? Is /dev/sda2 *supposed* to be a filesystem, or is it an LVM physical volume?
>
> it says that /dev/sda2 is an LVM volume

In that case, I strongly suggest that you *not* follow any advice about trying to recover a superblock on it. What is the filesystem label on /dev/sda1 (run e2label /dev/sda1)? (I'm guessing it's /boot.) How important is the content of this system? Do you have another system you can image the disk to, in case you do something destructive?
Re: dell 2850 initrd problem.
On 2010-06-04 15:42, Ron Croonenberg wrote:
> it says that /dev/sda2 is an LVM volume

In case it isn't clear to you, BTW, this means that /dev/sda2 is NOT /. Check what devices you have under /dev/mapper/.
Re: dell 2850 initrd problem.
On 2010-06-04 16:58, Robin Bowes wrote:
> On 04/06/10 17:51, J. Epperson wrote:
>> On Fri, June 4, 2010 12:08, Jefferson Ogata wrote:
>>> On 2010-06-04 15:42, Ron Croonenberg wrote:
>>>> it says that /dev/sda2 is an LVM volume
>>>
>>> In case it isn't clear to you, BTW, this means that /dev/sda2 is NOT /.
>>
>> Does it?
>
> Hmm, I queried that statement too. Why does this mean that /dev/sda2 is not /? As I understand it, the default RH/Fedora install is two partitions: a small /boot, and the rest an LVM PV assigned to a VG (VolGroup00) with root on an LV inside VolGroup00?

Regardless of default install layout, we have a bad superblock on /dev/sda2 and /dev/sda2 marked as an LVM device. The evidence is strong that / is an LV within the PV that is on /dev/sda2. If you tried to recover a superblock directly on /dev/sda2, you would wipe out who knows what.
Re: dell 2850 initrd problem.
On 2010-06-04 17:08, Ron Croonenberg wrote:
> Here is what is in /dev/mapper/:
>
>   10,  63  control
>   253,  0  VolGroup00-LogVol00
>   253,  1  VolGroup00-LogVol01

And what do you get from:

    e2label /dev/mapper/VolGroup00-LogVol00
    e2label /dev/mapper/VolGroup00-LogVol01

(or, alternately):

    e2label /dev/VolGroup00/LogVol00
    e2label /dev/VolGroup00/LogVol01

What are the sizes listed in /proc/partitions for these two devices (identified by major/minor numbers 253/0 and 253/1)?
Re: dell 2850 initrd problem.
On 2010-06-04 18:57, Paul M. Dyer wrote:
> Actually, you would want to try and recover the superblock from the /dev/mapper/.. device.

No, given that he already said he can see /etc, albeit missing important things, his superblock is almost certainly intact.
Re: dell 2850 initrd problem.
On 2010-06-04 17:17, Ron Croonenberg wrote:
>> In what sense are these systems fried? Can you move the RAID controller and disks from one system to another?
>
> Uhm, no. the machine I am talking about now has the hardware repaired, by putting in a new raid kit. Moving it to another server (I don't have another 2850; I have a few more 2950's though) wouldn't make much sense I think. The filesystem is 'corrupted', so moving the disks doesn't help much there?

I meant the other way around: if the two backup systems are fried but you can get their disks running on alternate non-fried systems, you can recover your data from those.

> question: since I have a broken filesystem, does it even make sense to make a disk image of it? or do you mean just for backup? (that if I break it, I still have the broken stuff I started with?)

Yes, it does make sense, and for the reason you surmise. Ideally you would get a disk image onto another system as an LV, take a snapshot of that, and then work with the snapshot. Then if things go awry, you just drop the snapshot and make a new one.

If you're lucky, what's happened is that the /etc directory file has been damaged but all of the objects it linked to are intact. fsck will reparent these objects under /lost+found if so. But there may be other damage as well. But even if you can't recover /etc, you may well be able to recover the data you're really interested in. And even if files have been orphaned and fsck doesn't reattach them under /lost+found, you may be able to recover them using debugfs.
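A hypothetical sketch of that image-then-snapshot workflow (host name, VG name, and sizes are all assumptions):

    # On a healthy box with free space in vg0: receive an image of the damaged volume.
    lvcreate -n rescue -L 150G vg0
    ssh badbox 'dd if=/dev/mapper/VolGroup00-LogVol00 bs=1M' | dd of=/dev/vg0/rescue bs=1M
    # Work only on a snapshot; if an experiment goes wrong, drop it and start over.
    lvcreate -s -n rescue-work -L 20G /dev/vg0/rescue
    e2fsck -f /dev/vg0/rescue-work
    lvremove -f /dev/vg0/rescue-work    # discard a botched attempt, re-snapshot, retry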
Re: dell 2850 initrd problem.
On 2010-06-04 19:14, Ron Croonenberg wrote:
> What Jefferson says is correct: I can mount the volume with the rescue cd, in /mnt/sysimage. If I browse around (within /mnt/sysimage) I can see 'everything', not just etc. However, /initrd is empty, and in /etc there are a bunch of damaged files: initrd.conf, ldap.conf... but those are just a few (a dozen or so). (an ls in the directory would show some question marks on the lines with damaged files.) I tried to rename one of the damaged files and got an Input/Output error.

Is there anything in the output of dmesg that relates to the I/O error? The /etc directory file may have an invalid block number in its block list, pointing past the end of the volume, for example. I wouldn't try to make any modifications to the filesystem, let alone fully boot it, without imaging it first. You could very easily have a corrupted free block list that could cause boot logging to write all over the data you care about. Your best bet might be just to do a reinstall.

IIRC, /initrd shouldn't even exist except during boot, before the switch to the real root; that's a transitory filesystem used to load drivers so the kernel can find the devices it needs. If it does exist it *might* mean that your last kernel patch had a problem building the new initrd and aborted, e.g. if the filesystem was full or if the RAID controller caused problems. You might have a previous kernel you could boot (it should show up in your grub list), but as I say, I wouldn't even try to boot the system in its current state without making a disk image.
Re: YA Perc H700 Question
On 2010-05-25 16:48, Wiley Sanders wrote:
> This is another basic quick presales question that sales doesn't seem to be able to answer: the specs for the H700 say it has 4x2 SAS ports. What exactly does 4x2 mean? As far as I can tell the controller has two channels, period.

I believe it means two four-lane channels (12 Gb/s each). But I don't have any H700s so I'm not certain. That's how the PERC 6s are, though.
Re: YA Perc H700 Question
On 2010-05-25 19:23, Jefferson Ogata wrote:
> On 2010-05-25 16:48, Wiley Sanders wrote:
>> This is another basic quick presales question that sales doesn't seem to be able to answer: the specs for the H700 say it has 4x2 SAS ports. What exactly does 4x2 mean? As far as I can tell the controller has two channels, period.
>
> I believe it means two four-lane channels (12 Gb/s each). But I don't have any H700s so I'm not certain. That's how the PERC 6s are, though.

BTW, I would (and do) avoid the H* controllers until Dell makes good on its pledge to remove the crippleware firmware features that prevent them from interoperating with non-Dell drives. AFAIK new firmware addressing this hasn't yet been released.
Re: Mounting an LVM disk
On 2010-05-24 19:09, J. Epperson wrote:
> Some good points, but having had this hole in my own foot, I'll say that it's very unlikely that it's _just_ the partition table that got wiped. I also never had any luck getting a partition editor to work with a disk that had a table saying it was bigger than it actually was. Always had to wipe it at a hardware level to get it repartitioned. I hope OP's luck is better.

OP doesn't need a partition table. Assuming that a dd was executed in the wrong direction for some period but aborted without wiping out too much of the disk, he needs to know the offset where the /home filesystem started, and a lower bound on its size. The filesystem could start at any multiple of the LVM chunk size from the beginning of the physical volume, which may have covered the entire disk (which may still be what's going on), or have started at a track offset from the start of the disk, or a cylinder offset if not on the same cylinder as the partition table or logical partition table (unless the disk was partitioned in some unusual way).

If not too much of the disk is gone, he also might be able to find a backup of the LVM config somewhere. It would be worthwhile imaging the whole disk as a backup, and using strings(1) to try to find an LVM backup. A bigger question for me is why the OP isn't using any redundancy (single disk for OS and RAID0 for the rest), but whatever...
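Searching for that LVM config with strings, as a hypothetical sketch (the LVM2 on-disk metadata is plain text near the start of the PV; device name assumed):

    dd if=/dev/sdb bs=1M count=4 2>/dev/null | strings | less
    # Look for a text block containing "physical_volumes {" and "pe_start = ..."
    # A copy may also survive on the OS disk under /etc/lvm/backup/ or /etc/lvm/archive/.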
Re: how to enable BMC sensors
On 2010-05-24 23:00, Zhichao Li wrote:
> 1. From the output of ipmitool sdr, can I ensure that the drivers for the BMC haven't been installed (or supported) in my CentOS? If so, where can I get those drivers for my CentOS?
> 2. Since currently no ipmi modules are installed on my CentOS, where can I find and install those ipmi modules on my CentOS for a Dell PowerEdge R710?

Did you run service ipmi start?
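On CentOS/RHEL the ipmi init script (from the OpenIPMI package) loads the kernel modules for you; a quick check, with hypothetical package state:

    yum install OpenIPMI ipmitool
    service ipmi start            # loads ipmi_msghandler, ipmi_si, ipmi_devintf
    ipmitool sdr                  # should now list sensor readings via /dev/ipmi0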
Re: any advice to find root cause of Falling back to HPET ?
On 2010-05-24 01:01, Bond Masuda wrote:
> This time around, on the 3rd chassis, the lost ticks are no longer, the server is fast/normal again, and all is well. I just can't believe we ran into 2 servers with the same issue back to back??? (the guys at the hosting company are usually great, but it almost makes me doubt that they did the 1st chassis swap??)

For future reference, you might want to save a dump of dmidecode output before requesting chassis swaps, so you can check whether the service tag actually changed.
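e.g. (the service tag is exposed as the system serial number in DMI):

    dmidecode -s system-serial-number > service-tag.before
    # ...after the swap, run it again and diff the two files.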
Re: any advice to find root cause of Falling back to HPET ?
On 2010-05-24 04:45, Bond Masuda wrote:
> As an aside, this weekend has been a bizarre weekend. After finally resolving this lost ticks issue, another server kernel panicked and crashed mysteriously, and again kernel panicked upon boot up... some messages about bad memory in DIMM8 and DIMM7. Then my friend's Drobo went all red and failed this afternoon. We must be getting showered by intense cosmic rays this weekend.

Maybe the lost ticks were happening because of the Lost series finale. *rim-shot*
Re: Can't set lcd on R710?
On 2010-05-21 19:31, Sam Flory wrote:
> I'm unable to set the LCD message on my R710. This seems to work fine on a 2950. Is there a patch for ipmitool I'm missing or something?

The R710s have a more elaborate LCD setup than the 2950s; there are settings both in the BIOS and in the iDRAC Express. I'm not surprised the same ipmitool method doesn't work on the R710.
Re: any advice to find root cause of Falling back to HPET ?
On 2010-05-22 19:52, Bond Masuda wrote:
> I wish RHEL4 had a smartctl that worked with megasas; I'll have to compile the latest smartctl to see if SMART data will tell me anything about the drives. One thing to note is that during the build of these servers 2 weeks ago, one of the drives on s8 did fail and had to be replaced.

Try megactl/megasasctl.
Re: Serial over LAN
On 2010-05-20 17:45, chandrasekha...@dell.com wrote:
> Install OpenManage Server Administrator on that system. It will provide you a GUI or CLI interface to configure the SOL.

Talk about using a sledgehammer to drive a pushpin. Nice job quoting the entire digest, too.

On 2010-05-20 16:14, Marc Moreau wrote:
> I'm looking to set up Serial over LAN on my cluster of PowerEdge 1950's. Does anyone have this setup?

Sure. Use it all the time.

> I'm kind of confused on how all the redirection works. From posts that I have read, we redirect console to a serial port, then tell the BMC to forward the serial console to the LAN. But my BIOS has a 'Direct Connect' mode. My IPMI doesn't have any of the Serial redirect. I fear that I am confusing IPMI and BMC somewhere too. Could someone take a stab at explaining this please.

Assuming you don't have DRACs in these systems, the BMC is the device that is providing the IPMI interface.

1. In BIOS setup, set console redirection to use COM2.
2. In BMC setup (control-E during POST), enable serial over LAN, and set the IP and password. (** The first thing you should do in BMC setup is reset to defaults. The BMCs often ship with a weird non-default setting that will cause lots of serial port feedback if you try to run a getty on the serial console.)
3. Use ipmitool -H bmc-host -I lanplus -U bmc-user sol activate to reach the serial console.

> One further question. If I do get this setup, does it give me 'full console access' over serial? In other words, do I get the BIOS POST, or does the serial console only come up after POST? I am aware that I need to set up my OS (CentOS) to spawn a serial tty, and this I have done before, just not over baseboard LAN.

Yes, you will get the entire POST, including the initial BIOS id. You might also want to set Redirection after boot in the BIOS. In grub, add something like console=ttyS1,57600 console=tty0 to your kernel line (see the sketch below). If you have DRACs the setup is actually quite similar, but you can also do it completely remotely by hitting the web service on 192.168.0.120 (IIRC) to access the KVM console, then use that to do the BIOS setup.
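Putting the grub side together, a hypothetical grub.conf entry for a serial console on COM2 (kernel version and root device are assumptions):

    # /boot/grub/grub.conf
    serial --unit=1 --speed=57600          # unit 1 = COM2 = ttyS1
    terminal --timeout=5 serial console
    title CentOS
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/VolGroup00/LogVol00 \
            console=ttyS1,57600 console=tty0
        initrd /initrd-2.6.18-194.el5.img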
Re: Serial over LAN
On 2010-05-20 20:56, Alexander Dupuy wrote:
> Jefferson Ogata wrote:
>> ** The first thing you should do in BMC setup is reset to defaults. The BMCs often ship with a weird non-default setting that will cause lots of serial port feedback if you try to run a getty on the serial console.
>
> For "often" you can substitute "always" (at least in my experience).

I've had one or two cases where I *didn't* have this problem. :^)

>> Yes, you will get the entire POST, including the initial BIOS id. You might also want to set Redirection after boot in the BIOS. In grub, add something like console=ttyS1,57600 console=tty0 to your kernel line.
>
> Just to make this clear: if the BIOS redirection after boot is enabled, the VGA console will be duplicated on the serial port. This may be okay (and is the easiest way to make it work), but if you go that route you want to make sure that you are not also using the serial port directly (e.g. grub console=ttyS1,57600, /etc/inittab ttyS1 entry, etcetera). If you have redirection after boot disabled, you are responsible for making serial access work in ISOLINUX/PXELINUX, grub, and Linux.

Hmm, my impression was that Redirection After Boot only affects comms up to the point of getting a bootloader running, but I'm not 100% sure. Anyway, no harm in trying either way.
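For the do-it-in-the-OS route Alexander describes, the getty line would look something like this (a sketch; runlevels and speed assumed to match the BMC setting):

    # /etc/inittab: spawn a login on the SOL serial port (COM2 = ttyS1)
    S1:2345:respawn:/sbin/agetty ttyS1 57600 vt100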
Re: 2850 PERC 4e/Di drive errors
On 2010-05-11 14:54, Matthew Lenz wrote:
> $ megactl -H a0
> PERC 4e/Di  chan:2  ldrv:1  batt:good
> a0c0t0  279GiB  a0d0  online   errs: media:6  other:1
>  write errors:  corr:   0  delay:  0  rewrit: 0  tot/corr: 0  tot/uncorr: 0
>  read errors:   corr: 4Mi  delay: 58  reread: 0  tot/corr: 0  tot/uncorr: 0
>  verify errors: corr:   0  delay:  0  revrfy: 6  tot/corr: 0  tot/uncorr: 6
>  temperature: current:28C  threshold:0C
>
> This is the only system with this showing up (of several x850 raid setups). These systems are still under warranty; should I request a replacement of this drive? I really don't know how long this drive has been erroring.

Try running a long self-test on the drive (megactl -T long a0c0t0). If it fails that, it will be worth replacing it. I'm assuming that that drive is part of a redundant RAID. Be aware that if the disk is failing, a long self-test may turn up enough problems for the PERC to knock the disk offline. If you don't have redundancy, back the system up first...
Re: Debian lenny on PE2950
On 2010-04-28 04:27, Carlos Beltrame wrote:
> I'm a beginner with Dell products and I have several doubts about installing Debian lenny (amd64) onto a PE2950. It has 6 HDs: 2x73GB and 4x146GB. I know that this machine comes with a PERC5, and when the installation begins, the partition manager detects only one SCSI disk with 73GB and another with 438GB. Is that correct? How do I detect each HD to create my own RAID and LVM? I hope my explanation was clear. Best regards.

Obviously the PERC is set up with a RAID1 on the two 73GB disks and a RAID5 on the four 146GB disks. This is hardware RAID. You don't need to see the individual disks in Linux; the PERC is managing them for you and giving you logical disks to work with. If you want to change the hardware RAID config, go into the PERC BIOS during boot (watch for the prompt). If you really want to see the individual disks in Linux for some reason, you can delete the two logical disks and configure each disk as its own RAID0.
Re: Very high load during mkfs on a new 1tb raid 1
On 2010-04-20 16:04, Daniele Paoni wrote:
> I just installed two new 1TB disks configured as RAID1 on a PE2950 with a PERC 6/i controller. When I try to create the filesystem with mkfs, the load on the server goes over 500 (I have then killed the mkfs) and all the processes are blocked in D state (the processes are apache + mysql running on another RAID1 array).

Did you wait for the RAID1 to finish initializing?
Re: Very high load during mkfs on a new 1tb raid 1
On 2010-04-20 16:59, Daniele Paoni wrote:
> On 04/20/2010 06:25 PM, Jefferson Ogata wrote:
>> Did you wait for the RAID1 to finish initializing?
>
> Yes, the array status is ready. I initialized it with the fast initialize command; could that cause the problem?

Fast initialization just means it marks the array as ready and continues scrubbing in the background. The scrubbing competes heavily for I/O. Use MegaCLI to check initialization status, e.g. MegaCLI -ldbi -showprog -lall -aall. Or look at the disks when you aren't doing anything. Are they both lit up heavily?
Re: Very high load during mkfs on a new 1tb raid 1
On 2010-04-20 20:54, Daniele Paoni wrote:
> On 04/20/2010 07:35 PM, Jefferson Ogata wrote:
>> Fast initialization just means it marks the array as ready and continues scrubbing in the background. The scrubbing competes heavily for I/O.
>
> Ok, I have reinitialized the array with the slow initialization method.

I don't think it actually makes any difference whether you choose fast or slow.

>> Use MegaCLI to check initialization status, e.g. MegaCLI -ldbi -showprog -lall -aall.
>
> [r...@ns01 daniele]# /opt/MegaRAID/MegaCli/MegaCli -ldbi -showprog -lall -aall
> Background Initialization on VD #0 is not in Progress.
> Background Initialization on VD #1 is not in Progress.

If you check shortly after boot, this will be the case. Check again after about 5 minutes; the controller holds off on background initializations for a little while.

> [r...@ns01 daniele]# omreport storage vdisk
> List of Virtual Disks in the System
> Controller PERC 6/i Integrated (Embedded)
> ID                       : 1
> Status                   : Ok
> Name                     : Data
> State                    : Ready
> HotSpare Policy violated : Not Assigned
> Virtual Disk Bad Blocks  : Not Applicable
> Secured                  : Not Applicable
> Progress                 : Not Applicable
> Layout                   : RAID-1
> Size                     : 931.00 GB (999653638144 bytes)
> Device Name              : /dev/sdd
> Bus Protocol             : SAS
> Media                    : HDD
> Read Policy              : Read Ahead
> Write Policy             : Write Back
> Cache Policy             : Not Applicable
> Stripe Element Size      : 64 KB
> Disk Cache Policy        : Enabled
>
>> Or look at the disks when you aren't doing anything. Are they both lit up heavily?
>
> The disks are 200 km away from me :-( so I cannot check them. I tried to create the filesystem after the slow initialization but the issue is still present. A simple dd if=/dev/zero of=/dev/sdd1 exhibits the same problem: after 30 seconds the load is already about 46 and all the waiting processes are in state D.
Re: Adding 3rd Party RAID card halts Linux boot
On 2010-04-17 19:39, Nathan Milford wrote:
> Gave it a shot and no go. It gives me "Filesystem type unknown", since the array has no filesystem, and the original message says "Filesystem type is ext2fs", so I presume it can see that (hd0,0) is correct, but it just stops short...

You might also need to tweak your root= option in the kernel line. What are the lines in your default grub boot conf?
Re: debian monitoring
On 2010-04-13 16:56, P.A wrote:
> Prasana, thanks for the info. I actually had downloaded this, but it does not come with MegaCLI, which is shown on the example page you have below. I don't think megactl and MegaCLI are the same; someone correct me if I'm wrong.

MegaCLI and megactl are definitely different. MegaCLI is the official configuration and inspection tool from LSI. megactl/megasasctl is an alternate program I authored that does some things MegaCLI does not do; in particular, it uses SCSI passthrough to get info directly from the disks in a RAID, so you can check error conditions at the drive controller level rather than the RAID controller level. It can also start disk self-tests and check their results.
Re: megasas failures at MWT2
You might get some more info about what's going on by dumping the PERC log using MegaCLI. MegaCli -AdpEventLog -GetEvents will give you the basic log. MegaCli -FwTermLog -Dsply will give you a more detailed log.
Re: / full
On 2010-03-25 17:07, Dimitri Yioulos wrote:
> At this point, we don't want to add more space to the partition. Can anyone suggest any other things we can try?

Does this condition survive a reboot? Is /boot on /, or is it a separate filesystem? One thing that happens on Red Hat and derivatives is that as you patch the kernel, you need to remove older kernels to avoid filling up whichever filesystem /boot resides on.
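A hypothetical sketch of that cleanup (package-cleanup comes from yum-utils; keep-count is an example):

    rpm -q kernel                              # list installed kernels
    package-cleanup --oldkernels --count=2     # keep the newest two, remove the rest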
Re: External array showing as /dev/sda
On 2010-03-21 22:30, David Hubbard wrote:
> From: Stephan van Hienen [mailto:stephan.van.hie...@thevalley.nl]
>> Not sure which controller you have, but with the PERC 5/6 you can create multiple virtual disks? I have one PE2950 server with a PERC 5/i with 6 x 750GB drives, with a 150GB boot vdisk and a 3TB vdisk (RAID5).
>
> When doing RAID 50, the H200/700/800-series controllers do not let you do that; the virtual disk size box becomes a fixed value. I think it may let you do that on some other RAID types.

As if we needed another reason to avoid the H-series controllers...

To address your problem: you could do two RAID 5s and stripe them with LVM.
Re: RAID-5 and database servers
On 2010-03-12 15:39, Craig White wrote:
> On Fri, 2010-03-12 at 07:06 +0000, Jefferson Ogata wrote:
>> On 2010-03-12 04:26, Craig White wrote:
>>> On Fri, 2010-03-12 at 02:23 +0000, Jefferson Ogata wrote:
>>>> On 2010-03-11 22:23, Matthew Geier wrote:
>>>>> I've had a disk fail in such a way on a SCSI array that all disks on that SCSI bus became unavailable simultaneously. When half the disks dropped off the array at the same time, it gave up and corrupted the RAID 5 metadata, so that even after removing the offending drive, the array didn't recover.
>>>>
>>>> I also should point out (in case it isn't obvious) that that sort of failure would take out the typical RAID 10 as well.
>>>
>>> ignoring that a 2nd failed disk on RAID 5 is always fatal and only 50% fatal on RAID 10, I suppose that would be true.
>>
>> The poster wrote that all of the disks on a bus failed, not just a second one. Depending on the RAID structure, this could take out a RAID 10 100% of the time.
>
> actually, this is what he wrote...
>
>> When half the disks dropped off the array at the same time, it gave up and corrupted the RAID 5 metadata so that even after removing the offending drive, the array didn't recover.
>
> Half != all

Read it again: "I've had a disk fail in such a way on a SCSI array that all disks on that SCSI bus became unavailable simultaneously."

Unless you have a disk on a separate bus for every mirror in the RAID 10, this will kill your RAID 10 100% of the time. While that configuration is more bulletproof, it also may not perform as well on a saturated RAID 10, since every write has to be queued to two separate buses instead of one. The original poster's failure was a recoverable one, anyway; he just didn't know the technique for recovery.

> I had a 5 disk RAID 5 array fail the wrong disk and thus had 2 drives go offline and had a catastrophic failure, and thus had to re-install and recover from backup once (PERC 3/di, SCSI disks). Not something I wish to do again.

PERC 5 and PERC 6 are worlds different from the PERC 3/di.

> I don't think I understand your 'odds' model. I interpret the first example as RAID 50 having 5 times more likelihood of loss than RAID 10, and I presume that isn't what you were after.

Yes, it is 5 times higher. But it is not 100%; it's actually less than 50% (see the worked example at the end of this message). And the probability for RAID 10 is not 50% as you said it was. I was just correcting your analysis. I'm still not sure what RAID structure you had in mind where a second failure on a RAID 10 has a 50% probability of loss. In the alternative fair comparison, RAID 5 vs. RAID 1, the second failure kills both RAIDs 100% of the time.

> actually, I didn't raise the RAID 5 vs RAID 10 comparison, I only amplified with my experiences.

You wrote: "ignoring that a 2nd failed disk on RAID 5 is always fatal and only 50% fatal on RAID 10, I suppose that would be true." That was you comparing RAID 5 with RAID 10.

> the last time I bought an MD-1000, Dell would only sell me the PERC-5e, I don't know why.

Currently you can buy an MD1000 with or without a PERC 6. (If I could recommend an enclosure from a different manufacturer at this point, I would, but I haven't evaluated any others since I started buying MD1000s some years ago.)
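To make the odds concrete, a worked example consistent with the figures above (the 12-disk layout is my assumption): take twelve disks arranged either as two 6-disk RAID 5 legs striped together (RAID 50) or as six mirrored pairs (RAID 10), with one disk already failed and a second, randomly chosen disk failing among the remaining 11. The RAID 50 dies only if the second failure lands in the same 6-disk leg: 5 of the 11 remaining disks are fatal, so P = 5/11, about 45% (less than 50%). The RAID 10 dies only if the second failure is the dead disk's mirror partner: 1 of 11, about 9%. The ratio is exactly 5.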
Re: RAID-5 and database servers
On 2010-03-12 15:45, John G. Heim wrote:
> Well, it's not really practical to suggest that I consult with my vendor. My whole budget is $6000. This is just the Math Department at the University of Wisconsin. I mentioned in my original message that our databases consist primarily of spamassassin bayesian rules and horde3/imp web mail. Those do a lot of updates -- well, a lot by our standards. Every time a spam message comes in, it is added to the bayesian rule set for the user. I'm going to say that typically each user gets 100 spam messages a day and there are 200 users. But each new rule consists of several table updates. Even so, it's not like we're ebay. Anyway, speed of updates is critical because we can't have the mail system getting bogged down by database updates. I put the bayesian rules in a mysql DB in the first place because it was getting bogged down saving bayesian data to bbm files on the mail server. I just want to make sure that I'm not setting myself up for a disaster.

Can you estimate the number of transactions per second you need? Is the current mysql implementation keeping up with the mail? If so, run iostat -kthx 60 under peak load, wait a minute, and post the last report, indicating which block device has the mysql database on it. It doesn't sound like it would be a disaster if your database filesystem crashed; you'd just drop the spam filtering while you reconstruct it. Is your $6000 just for storage, or do you have to buy a PowerEdge to go along with it?
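(For a rough sense of scale from the numbers you gave, assuming, say, 5 table updates per message: 200 users x 100 spam/day = 20,000 messages/day, or about 0.23 messages/s on average, so roughly 1-2 update transactions/s average and perhaps 10-20/s at peak. That's likely a modest load for any of the RAID levels under discussion, which is why measuring with iostat beats guessing.)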
Re: RAID-5 and database servers
On 2010-03-12 22:10, John G. Heim wrote: I really think my boss is nearly out of patience with me. I think I know what I want though. If I want to set up two RAIDs, one for the operating system and one for the database files, do I need two PERCs? Can a single PERC put 2 disks in a RAID-1 array and 3 others in a RAID-5 array? Yes, no problem. You'll have /dev/sda and /dev/sdb. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: RAID-5 and database servers
On 2010-03-11 18:09, Preston Hagar wrote: Actually it says if money is no object, go with RAID 10: http://www.orafaq.com/wiki/RAID#RAID_10 RAID 10 is the ideal RAID level in terms of performance and availability, but it can be expensive as it requires at least twice the amount of disk space. If money is no object, always choose RAID 10! I would agree with the RAID 10 recommendation. I at one time did a lot of RAID 5 to try to compromise between price and performance, but had several array failures resulting in having to restore from backup. Now I put anything important on either RAID 1 or RAID 10. Basically I use RAID 1 if it needs to be reliable, and RAID 10 if it needs to be reliable and fast. I've got several hundred disks running on RAID 5 and I've had one actual full RAID failure in 10 years, and that was my fault. In terms of performance, depending on the workload, RAID 5 can outperform RAID 10. Furthermore, Oracle's recommendations are based on what appears to be 5-10-year-old data, back when mid-level RAID controllers weren't capable of pushing ~700 MB/s onto a RAID 5. Nowadays they can do that, and achieve pretty stellar IOPS as well. The difference in performance between RAID 5 (or better yet, RAID 50, striped using LVM) and RAID 10 is not what it used to be. Bear in mind also that now that Oracle is a hardware company, they'd just love you to buy almost twice as much disk (from them). *Again*, this is why if you have particular performance requirements, you should consult with your database vendor to determine what bandwidth and IOPS you need, and benchmark your gear using different RAID configs. You may find that RAID 5 is just fine performance-wise, and you can get around 1.7 times the storage capacity with the same rack space, heat, and power load as RAID 10. Asking here you're just going to get people parroting Oracle's stale recommendations and speculating wildly without knowing anything about your workload. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
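If you want to benchmark rather than speculate, fio is one option (a sketch; /dev/sdb is a placeholder for a scratch virtual disk, and this writes to the device, so don't point it at data you care about):

    fio --name=oltp --filename=/dev/sdb --direct=1 --rw=randrw --rwmixwrite=30 \
        --bs=8k --ioengine=libaio --iodepth=16 --numjobs=4 --runtime=60 --group_reporting

Run the same job against RAID 5, RAID 50, and RAID 10 layouts of the same disks and compare the reported IOPS and bandwidth.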
Re: RAID-5 and database servers
On 2010-03-11 19:48, Eric Rostetter wrote: Quoting Jefferson Ogata powere...@antibozo.net: I've got several hundred disks running on RAID 5 and I've had one actual full RAID failure in 10 years, and that was my fault. You've been lucky! :) In 10 years, I think I've had 3 RAID 5 failures (all rebuilt without problems). That's not what I mean by a full RAID failure. I've had plenty of disks fail and subsequent successful rebuilds. I'm saying on one occasion (because of an oversight) I ended up with an unrecoverable RAID 5 because of disk failures. Of course, this wasn't a serious problem because I also had backups. -- Jefferson Ogata : Internetworker, Antibozo og...@antibozo.net http://www.antibozo.net/ogata/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Installing RHEL5 via DRAC
On 2010-03-12 01:31, Paul M. Dyer wrote: Sorry Stephen, but that will not work. The iso is set up in the comps to be a CD/DVD. The process of creating a repository rebuilds the comps for the new media, i.e. harddisk. I have no problem using the iso image directly. I also have no problem installing by simply loopback-mounting the ISO and either NFS-exporting that directly or copying the whole thing to an exported directory. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
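In case it helps anyone, the loopback/NFS dance is just (a sketch; the ISO filename and client subnet are placeholders):

    mkdir -p /srv/rhel5
    mount -o loop,ro rhel-server-5.iso /srv/rhel5
    echo '/srv/rhel5 192.168.0.0/24(ro)' >> /etc/exports
    exportfs -ra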
Re: RAID-5 and database servers
On 2010-03-12 04:26, Craig White wrote: On Fri, 2010-03-12 at 02:23, Jefferson Ogata wrote: On 2010-03-11 22:23, Matthew Geier wrote: I've had a disk fail in such a way on a SCSI array that all disks on that SCSI bus became unavailable simultaneously. When half the disks dropped off the array at the same time, it gave up and corrupted the RAID 5 metadata so that even after removing the offending drive, the array didn't recover. I also should point out (in case it isn't obvious) that that sort of failure would take out the typical RAID 10 as well. ignoring that a 2nd failed disk on RAID 5 is always fatal and only 50% fatal on RAID 10, I suppose that would be true. The poster wrote that all of the disks on a bus failed, not just a second one. Depending on the RAID structure, this could take out a RAID 10 100% of the time. In your second-disk scenario, comparing RAID 5 with RAID 10 in terms of failure likelihood isn't fair; you need to compare RAID 50 with RAID 10. And the odds depend on the number of disks and the RAID structure. Suppose you have 12 disks arranged as a 6x2 RAID 10, and the same number of disks as a 2x6 RAID 50. When the second disk fails, the odds of loss are: - RAID 50: 5/11. - RAID 10: 1/11. If instead we have the 12 disks as a 3x4 RAID 50, then the odds of loss when the second disk fails are: - RAID 50: 3/11. - RAID 10: 1/11. We can now tolerate a third disk failure with our RAID 50, with the odds of loss: - RAID 50: 6/10. - RAID 10: 2/10. How often does this happen? It hasn't happened to me, and it hasn't happened to anyone I know. In the alternative fair comparison, RAID 5 vs. RAID 1, the second failure kills both RAIDs 100% of the time. And there's always RAID 6. So if Dell is selling a high-quality hard drive with more than average durability and the anticipation that it is going to last longer under 24/7 usage, it's entirely reasonable to have to pay more than for the dirt-cheapest SATA drive you can find online. Of course you will have to live with the consequences if you go with the dirt-cheap drive. Personally, I put a lot of value on my time and my customers' data. I have hundreds of Dell disks online. They fail regularly. Often they fail during system burn-in. For the kind of markup Dell is charging on these drives I don't think I should be finding dead ones after only 24 hours of operation. And a one-year warranty is just ridiculous. I read this article last year... http://www.enterprisestorageforum.com/technology/features/article.php/3839636 and I had already forsaken RAID 5, but it pretty much confirmed what my experiences had been... that when I considered the life cycle of the installation, the time lost waiting for file transfers on RAID 5, and so on, it was foolish for me to recommend RAID 5 to anyone. It's pretty clear you don't speak from any recent experience as far as RAID 5 performance goes, and you yourself say as much when you say you had already forsaken RAID 5. Like Oracle, you're living in the past. You should do some of your own benchmarks. In any case, the argument in that article applies to RAID 10 as well; it gives you better probabilities, but eventually it will take too long to rebuild mirrors and failure will be just as inevitable as with RAID 5. Error rates will have to drop to prevent this, and no doubt they will, sufficiently that the article's argument is moot. Eventually they will drop to the point where we will be using RAID 0. 
On top of that, it seems to me that RAID 10 smokes RAID 5 on every performance characteristic my clients are likely to use (and yes, that means databases). RAID 5 primarily satisfies the need for maximum storage for the least amount of money, and that is rarely what I need in a storage system for a server. For a lot of access patterns, RAID 5 yields much better write bandwidth than RAID 10. I don't know why you think RAID 10 smokes RAID 5. You should grab a PERC 6 and a couple of MD1000s and try some different configurations. I don't think you'll see any smoke in the margins, even over the oddly limited gamut of access patterns your clients use. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
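The write-bandwidth point is easy to see for large sequential (full-stripe) writes. A sketch of the arithmetic for the 12-disk example above, with W standing for one drive's streaming write rate:

    # RAID 10 (6x2): every block is written twice        -> ~6*W of user write bandwidth
    # RAID 5 (11+1): one parity strip per 11 data strips -> ~11*W on full-stripe writes
    echo "scale=2; 11/6" | bc   # -> 1.83, i.e. nearly 2x in RAID 5's favor
    # Small random writes are a different story (read-modify-write), which is why you benchmark.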
Alternatives to Dell storage?
So, does anyone want to recommend alternatives to the MD1000 for 3.5" SAS/SATA storage? I've looked at HP's offering but it appears to have a lower drive count. Obviously direct LSI replacements for the prior generations of PERC are available, as well as the Arecas that some people seem to like. Any other suggestions on the controller front for RAID 5/6 connecting to SAS/SATA enclosures? The reasons for this question should be obvious. :^) ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: iDRAC6 out of range
On 2010-02-19 15:59, Nick Lunt wrote: Just been trying to install Red Hat on an R710 over the DRAC and I just get "out of range" when it starts the anaconda GUI, so I had to reboot and install in text mode, which is less than ideal. Anyone know how to solve the "out of range" issue, please? Try adding resolution=800x600 or resolution=1280x1024 to the install boot options. See also: http://fedoraproject.org/wiki/Anaconda/Options Or use kickstart with PXE and don't look at the screen at all. :^) ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
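For the kickstart/PXE route, a minimal pxelinux.cfg/default sketch (paths, server name, and kickstart file are placeholders):

    default rhel5-ks
    prompt 0
    label rhel5-ks
      kernel rhel5/vmlinuz
      append initrd=rhel5/initrd.img ks=nfs:installserver:/ks/r710.cfg console=tty0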
Re: Third-party drives not permitted on Gen 11 servers
On 2010-02-16 17:46, Blake Hudson wrote: Attached was a pdf explaining the stringent quality control standards for Dell's HDDs. No apology, remorse, alternative solutions, etc. That's pretty funny considering the fairly high failure rate of Dell drives. If you actually check the SMART statistics you'll see the PERC often tries to pretend bad drives are just fine. For example I have a Dell-provided Seagate in a PE2950 right now that has logged 100 uncorrected write errors and 10 uncorrected read errors, and has failed a SMART long self-test. The PERC says 2 media errors and hasn't failed it out of the RAID. Well, I guess this is the year I start diving into HP or IBM gear. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Reverse DNS lookup syslog (Debian 4.0)
On 2010-02-15 10:22, Brian O'Mahony wrote: Not possible with 600+ machines in different locations around the world. MAC reservations for clients is not an option here. Only 600+? Of course it's possible. The only legitimate reason to use non-static assignment is if you don't have enough address space for your devices and they need to take turns. Not knowing what devices are attaching to your networks gives you exactly the sort of problem you're currently trying to find a kluge for, and invites unwelcome guests on your network. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Reverse DNS lookup syslog (Debian 4.0)
On 2010-02-12 14:58, Brian O'Mahony wrote: As our domain uses DHCP for the Windows clients, when I go back a few days later the syslog entries are close to useless to me as the IP may have changed. How do I get syslog to log the hostname of the connecting machine? Your real problem is that you are using DHCP to assign addresses from a pool. Reserve your IPs to MAC addresses so that machines don't change IPs and you've solved both problems. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
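In ISC dhcpd that's one host block per machine (a sketch; the name, MAC, and address are examples):

    host winclient01 {
      hardware ethernet 00:21:9b:aa:bb:cc;
      fixed-address 10.10.4.21;
    }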
Re: Firmware repository out of date
On 2010-01-29 22:28, jeffrey_l_mend...@dell.com wrote: Looks like the firmware repo is falling out of date. This happened before when it was unofficial. Maybe now that it is official something can be done? The OM 6.2 server update utility (SUU) disc that was released in December was pulled and replaced with a new version last week. The DRAC5 firmware going from 1.5 to 1.51 was one of the changes. I just updated the yum repo yesterday to reflect the new SUU disc. Hmm. Does that mean there were problems with SUU 6.2.0? ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Lifecycle Controller not-so slick IMO.
On 2010-01-28 14:11, patrick_b...@dell.com wrote: You should look at USC 1.3 and the latest DRAC for your system. There exists a WSMAN interface through the DRAC to do just what you are saying. Dell folks: please feel free to include URLs for resources providing further details in your postings. :^) ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: PXE boot R900 via 10Gb Intel NIC?
On 2010-01-23 13:17, Pavel Matěja wrote: I have gotten as far as figuring out I might need to run IBAUTIL.EXE to enable PXE boot on the device ROM, but I'm so far stymied at actually getting to a state where I can execute that program. None of the ancient DOS bootable floppy images I have around seems to want to boot on this system via virtual floppy. Any clues? Anyone successfully booted a bare metal R900 to a DOS/Windows command line in order to run such a beast? Try http://www.freedos.org/ Oh, I did, of course. It doesn't run on R900s. Boots but crashes with invalid instruction errors. But thanks for the suggestion. I've also tried booting gPXE on virtual CDROM but it fails silently; I'm guessing that either there's no current driver for that NIC or the boot code doesn't work on an R900. WTF is wrong with Intel, anyway? Who expects someone to boot DOS in 2010 in order to tune a 10GbE NIC? They could at least provide some way to do it in Linux so I could boot a live CD running an operating system from this millennium to fix it. And if Dell is providing IBAUTIL as part of their support packages, it would be nice if they could suggest a way you could run it successfully. Maybe there's some way to do it with ethtool -E, but I don't have any way to find out what it is. Any other ideas? ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: PXE boot R900 via 10Gb Intel NIC?
On 2010-01-23 03:32, thomas_chena...@dell.com wrote: Jefferson Ogata wrote: Anyone know how to PXE boot an R900 off of an Intel 10Gb optical NIC rather than one of the onboard copper NICs? Some Intel 10Gb optical NICs are bootable, but not necessarily all. For bootable adapters, the boot ROM needs to be programmed and enabled using a DOS-based tool; either flautil.exe or ibautil.exe. These tools can be found on support.dell.com in packages named in the pattern Intel_LAN_*_DOSUtilities*. Thanks for the reply! Well, this is the 10GBE NIC provided by Dell as part RN219, and the specs on that part claim it supports PXE boot. I have gotten as far as figuring out I might need to run IBAUTIL.EXE to enable PXE boot on the device ROM, but I'm so far stymied at actually getting to a state where I can execute that program. None of the ancient DOS bootable floppy images I have around seems to want to boot on this system via virtual floppy. Any clues? Anyone successfully booted a bare metal R900 to a DOS/Windows command line in order to run such a beast? ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: megactl
On 2009-12-30 09:07, Pavel Mateja wrote: One of these days I'll add some code to do that in mega{,sas}ctl, as well as add an XML output mode so that various monitoring tasks become easier, e.g. writing a simple SNMP agent. IMHO this should be done by udev. Create a rule file like /etc/udev/rules.d/55-megadev.rules: KERNEL=="megadev*", MODE="0600" and run udevtrigger. That's a fine idea, but the tools should still try to work on systems where no one has done this. But you can make such a file part of your package, can't you? I think that would be a bit overzealous, personally. Not really my place to muck with udev. Certainly could include it and recommend installation. udevtrigger can do unexpected things. E.g. I just ran udevtrigger on a system with 7 down interfaces and it brought them all up with dhclient. The safer course is simply to create the device node if it's not present. This is what dellmgr used to do with older PERCs. Maybe in a secondary RPM it would be okay, with appropriate caution. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: megactl
On 2009-12-30 10:39, Pavel Mateja wrote: The safer course is simply to create the device node if it's not present. This is what dellmgr used to do with older PERCs. I ran into a similar problem a year ago. We had a raidmon init script that deleted /dev/megadev0 and was unable to recreate it, because the device had been moved to misc, with major number 10, in newer kernels. udev just worked with both old and new kernels. That's why you look in /proc/devices for the correct major number. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
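A sketch of that approach, assuming the driver shows up under the name "megadev" in /proc/devices (on kernels where it is a misc device you would look up the minor in /proc/misc instead):

    major=$(awk '$2 == "megadev" { print $1 }' /proc/devices)
    [ -n "$major" ] && [ ! -c /dev/megadev0 ] && mknod /dev/megadev0 c "$major" 0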
Re: RAID Perc 5 monitoring
On 2009-12-28 16:36, Brian A. Seklecki wrote: It is in a remote location, so command line tools would be great. MegaCli from LSI is all that you need. OMSA if your nights are long. megasasctl can send you periodic reports on the health of all physical and logical disks. On SAS disks you can also check disk temperature and various other log pages, or initiate disk self-tests in disk firmware. One case where this is handy is when you're about to rotate a new disk in after a failure; it's nice to be able to do a long self-test of the new disk first so you don't have it fail in the middle of the rebuild. http://sourceforge.net/projects/megactl/ You may wish to build from SVN, particularly if you need PERC 6 RAID 6 support. I haven't made a new release since before I had one to work with. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
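For the periodic reports, a cron sketch (the install path and recipient are assumptions; megasasctl with no arguments prints the status summary):

    # /etc/cron.d/raid-report
    0 6 * * * root /usr/sbin/megasasctl 2>&1 | mail -s "RAID status" root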
Re: PERC/6E kernel modules
On 2009-12-14 15:45, Karl Zander wrote: Does the PERC/6E use the same LSI MegaRAID modules in the kernel as some of the other PERCs? Is there a specific kernel version needed for the PERC/6E? I am compiling my own kernel. PERCs up to and including PERC 4 (the AMI/LSI ones, not the Adaptec ones) use the older megaraid driver. PERC 5/i, 5/E, 6/i, and 6/E use megaraid_sas. The SAS 6/iR uses mptsas (i.e. the standard LSI Fusion driver). If you want to cover most of your bases, build megaraid, megaraid_sas, and mptsas. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
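If it helps, the corresponding .config fragment would look roughly like this (symbol names from 2.6-era trees; verify in menuconfig for your kernel version):

    CONFIG_MEGARAID_LEGACY=m   # older megaraid driver (PERC 2/3/4)
    CONFIG_MEGARAID_SAS=m      # megaraid_sas (PERC 5, PERC 6)
    CONFIG_FUSION=y
    CONFIG_FUSION_SAS=m        # mptsas (SAS 6/iR and other LSI Fusion parts)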
Re: massive io problems
On 2009-12-11 09:04, John Hodrien wrote: On Fri, 11 Dec 2009, Adam Nielsen wrote: Do you have a utility like the old megamgr that can get you controller stats? That will report actual disk errors that are seen by the RAID controller, which may not make it all the way through to the OS. No, I was relying on omsa. I'll take a look at what else I can use. Have you tried using MegaCli to dump the controller log? Have you tried using megasasctl to check SAS error log pages? (Check out the current source from sourceforge subversion and build from source for PERC 6 support.) ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
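For the controller log specifically, the MegaCli incantation is along these lines (MegaCli64 on x86_64; the output filename is arbitrary):

    MegaCli -AdpEventLog -GetEvents -f ctrl-events.log -aALL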
Re: R900 internal disks I/O slow!!!!
On 2009-12-03 21:50, mcclnx mcc wrote: We have several R900 servers with the Red Hat 5.3 O.S. on them. The R900s have 5 internal 450GB SAS disks. We configured them as disks 0 and 1 mirrored, disks 2 and 3 mirrored, and the last disk single. I have been copying files between the logical disks and I/O is very slow (something like 8 to 9GB/sec). Even my OLD Dell 2650 servers' (Red Hat 4.7) internal disks run faster. Does anyone know why? I assume you actually mean 8-9 MB/s. Have you allowed the RAIDs to complete their background initialization? This will contend with your disk activity. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
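If OMSA is installed, a quick way to check (controller number assumed to be 0):

    omreport storage vdisk controller=0
    # Look at the State/Progress fields: "Background Initialization" means BGI is still running.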
Re: Operation disabled with PERC/5i
On 2009-12-03 16:12, one...@waste.org wrote: Thank you very much for your reply. I thought these were errors relating to the SCSI Inquiry not being read, but the drive came with a Dell server we have, and it was working fine prior: Vendor ID : DELL Product ID: MAX3073RC Revision : D206 Though you mention something that is worrying: that the slot failed. Could I make this drive hot-swappable, or clear it (which I did successfully), if the slot had failed? I haven't had a slot fail so I don't know. But it's certainly a possibility that could explain both the previous disk failing and the current one acting weird. That's all I'm saying. Have you used MegaCli to dump the controller event log? Is there any way to test a failed slot through these tools? I checked consistency (when the drive was replaced) and all was okay, it seems. If the slot failed, it seems to me that the drive may not even be recognized. I suggest putting the drive in a different box to see if it behaves better. That would be some evidence either way about the slot. I'm confused, though. Earlier you wrote: We're running a RAID 5 and two days ago, a drive went down. We replaced it and it went into Foreign mode, so I cleared it: sudo omconfig storage controller action=clearforeignconfig controller=0 That worked fine. However, I cannot seem to get this new hard drive to attach itself to the RAID array, no matter what I try: How is it that you were unable to add the drive to the array but were able to perform a consistency check? The drive is a Dell drive (well, Fujitsu) - I find it odd it would bug out... Drives bug out all the time. Please don't top-post. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Running custom code on the DRAC
On 2009-11-25 01:58, Adam Nielsen wrote: I have now installed a cross-compiler and managed to get some test code running on the device. The hard part was figuring out how to make files available on the DRAC, but luckily it has NFS support built in, so I could just mount a folder from another PC and run the code from there. It's been a couple of years since I've played with this, and my memory is fuzzy, but this is what I can dredge up; sorry if any of it is misremembered: The virtual media plugin gives you a path to move stuff to the DRAC, albeit in a block-oriented way. This is not needed with the NFS path and a root shell, but it was when I was working from the restricted busybox user shell. The firmware image has a CRAMFS at some offset. You can dd this off and unpack it to get the firmware contents. Again, not as necessary with a source release. This site used to host a bunch of MIPS/Linux binaries which would run on the DRAC, though I'm not sure where they've gone: http://www.paralogos.com/mipslinux/ There appear to be more resources these days. This looks promising: ftp://ftp.linux-mips.org/pub/linux/mips/redhat/7.1/RPMS/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
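A sketch of the CRAMFS extraction (the firmware filename is a placeholder; the 16-byte adjustment is because the ASCII signature "Compressed ROMFS" sits 16 bytes into the cramfs superblock):

    sig=$(grep -abo 'Compressed ROMFS' firmimg.d5 | head -1 | cut -d: -f1)
    dd if=firmimg.d5 of=fs.cramfs bs=1 skip=$((sig - 16))
    mount -o loop -t cramfs fs.cramfs /mnt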
Re: Determine interface's MAC address prior to OS install?
On 2009-11-24 17:49, Jefferson Cowart wrote: I have a couple R710s. Those latches are designed to hold the server without the screws. Simply push the server into the rack and they should latch. You can then lift the latches to release the server. I believe the screws are for securing the server if you are shipping it in a rack. (See http://support.dell.com/support/edocs/systems/per710/multlang/Rack/H153KA00.pdf) Huh--well, I'll happily stand corrected on that. Unfortunately, the fellow who brought me the R710s failed to bring the rails along (they're for a remote install), so I haven't seen them interact with the rails. It wasn't evident to me that there was any catch mechanism other than the recessed screws. They're still ugly, though, IMO. :^) ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: DRAC firmware source code available for download
On 2009-11-18 06:44, Adam Nielsen wrote: Maybe this was to discourage us from flashing our own code? Who knows. If anyone from Dell is listening, you really don't have to bother doing that :-) Since the firmware shell was based on busybox, GPL compels them to publish source for at least part of the firmware. Theoretically, they should be publishing it in the same place as the binary, but this is not too bad. I'm not sure how long they've been publishing source, but I couldn't find it when I looked around a couple of years ago. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
BMC SOL agetty feedback loop
Apologies if this has been discussed; it seems to affect a lot of my systems so I'm surprised if it hasn't, but I haven't found an effective solution looking through the list. Various PowerEdge 1950s/2950s without DRAC exhibit this symptom running RHEL 5. There is a multiline warning banner in /etc/issue. Console redirection is enabled, and the following form of kernel line is used in /boot/grub/menu.lst: timeout=5 serial --unit=1 --speed=57600 terminal --timeout=5 serial console title Red Hat Enterprise Linux Server (2.6.18-164.2.1.el5) root (hd0,1) kernel /boot/vmlinuz-2.6.18-164.2.1.el5 ro root=LABEL=/ console=tty0 console=ttyS1,57600 rhgb initrd /boot/initrd-2.6.18-164.2.1.el5.img The following is in /etc/inittab: co:2345:respawn:/sbin/agetty -h ttyS1 57600 vt100-nav (The agetty -h option doesn't seem to matter.) Boot is fine; I have access to the console over IPMI/SOL during boot, all the way to getting an agetty login banner and prompt. But once I disconnect my IPMI/SOL session, after some delay, the BMC enters a kind of feedback loop where the banner text is fed back into agetty. agetty then logs a lot of failed login attempts, and eventually init pauses spawning agetty for 5 minutes because of excessive restarts. It looks like this in /var/log/secure (over and over again): Oct 23 21:03:15 foo login: FAILED LOGIN 1 FROM (null) FOR warning**warning**warning**warning**warning**warning**warning, User not known to the underlying authentication module Oct 23 21:03:17 foo login: pam_unix(login:auth): bad username [] Oct 23 21:03:17 foo login: pam_succeed_if(login:auth): error retrieving information about user Oct 23 21:03:17 foo login: FAILED LOGIN 2 FROM (null) FOR , User not known to the underlying authentication module Oct 23 21:03:18 foo login: pam_unix(login:auth): bad username [] Oct 23 21:03:18 foo login: pam_succeed_if(login:auth): error retrieving information about user Oct 23 21:03:18 foo login: FAILED LOGIN 3 FROM (null) FOR , User not known to the underlying authentication module Oct 23 21:03:20 foo login: pam_unix(login:auth): bad username [] Oct 23 21:03:20 foo login: pam_succeed_if(login:auth): error retrieving information about user Oct 23 21:03:20 foo login: FAILED LOGIN SESSION FROM (null) FOR , User not known to the underlying authentication module And in /var/log/messages: Oct 23 21:03:20 foo init: Id co respawning too fast: disabled for 5 minutes If I reconnect to the SOL console using IPMI, everything's fine (assuming init hasn't disabled agetty at the time I connect). Once I disconnect, the same thing starts again (again after some delay). I haven't found any options to agetty that alter this behavior, and the BIOS Redirection after boot option doesn't alter it either. It's a problem because init makes agetty unavailable for 5 minutes at a time, and it also causes lots of noise in /var/log/secure. Does anyone know what I'm missing? I would appreciate any help on this. ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
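One mitigation sketch, on the assumption that it really is the /etc/issue banner being echoed back: agetty's -i flag suppresses printing /etc/issue, which should at least stop the bogus login attempts and the respawn storm, at the cost of losing the banner on the serial console:

    co:2345:respawn:/sbin/agetty -i -h ttyS1 57600 vt100-nav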