Re: Strange system lockups - kernel saying disk error
On 5 Jun 2011 at 16:55, Michael Powell wrote:

> per...@pluto.rain.com wrote:
> [snip]
>> Power supplies do fail occasionally, and not always in obvious ways such as failing to turn on at all. The output voltages may be a little too high or too low, or they may be correct but with excessive ripple or electrical noise; or the supply may be just fine until a disk draws a current spike to move the arm rapidly.
>
> I've seen a fair number of power supplies degrade somewhere around the 5 year mark. Simple voltage checks with a VOM, within its accuracy, will usually still show the voltages as being correct. To see the ripple you'll need an oscilloscope. Excessive ripple can make a PC appear to have all kinds of intermittent hardware failures with little or no rhyme or reason.
>
> A degraded power supply will show large variations in ripple based on load. The largest load from hard drives is when they are first spinning up. Servers are commonly configured with the ability to spin up drives one at a time with a short delay in between. You won't usually find this on a desktop.
>
> Generally, this situation will develop more often on an old machine that had a 'barely enough' capacity power supply when new. Add 3 more hard drives, bigger video, etc. and it was still just inside the envelope until enough time went by and the power supply got old. Since the hard drives pull the most amps on power up, you will see the ripple on a 'scope look really ugly while this happens. The unseen danger here is that bits on the drive(s) can get scrambled until things settle down. You will know this is happening when stuff goes wrong and fsck is needed to get the file system clean, and, after being cleaned and working again, the system does the same thing at some future reboot.
>
> Easiest way to look at this without a 'scope is to simply substitute a known good PSU of sufficient rating from a machine with no troubles. If all the random nonsense suddenly stops, you'll know. This is easiest for folks these days as those without an analog electronics background are unlikely to have an oscilloscope lying around.
>
>> It might be worth checking the fan mounted on the CPU heatsink if there is one, and the fan in the power supply (which ventilates the case as well as the power supply itself).
>
> Aside from the fans themselves, dust buildup plugs heat sinks, eventually drastically reducing their ability to get rid of heat. When you get to this stage blowing them out with canned air can work wonders. My 2 servers at home sit on the floor and need this about once a year.
>
> -Mike

Hi..

I've recently replaced all the 3.3V decoupling caps on a 7 year old Compaq mobo that was showing all sorts of odd behaviour, more (at first glance) related to the video card. It wasn't expensive, but it was time consuming. Even for me as a skilled electronics tech, with more years of soldering iron time than I care to admit, it took a good couple of hours! These things aren't made to be easily repaired, but it can be done. In fact, for some common mobos you can buy complete re-cap kits with all the right parts. Same for all sorts of other consumer electronics. (DVD players, Games consoles, DTV and other set-top boxes etc.)

As a result, that box now runs sweet as a nut. Passing all diags with flying colours, even when hot.

Any caps that have a bulging top, on the mobo or in the PSU, need changing. Ideally replace them with the same value and voltage rating. You can go higher (within reason) in value, but don't go too high in voltage rating, as they can deteriorate if they don't have enough volts across them, and start to fail early again.

Re the PSU thing. Don't get fooled into the common lore that bigger is better. You can have too big a PSU, one that will fail to regulate the auxiliary output lines correctly until you add extra load to its main output. Many PC supplies (sadly not all) do have a note to that effect on the ratings label.

Most Switch Mode supplies work best loaded to between half and full power on their main output. Much less than 1/4 of their capability, and the auxiliary outputs will start to wander about a bit, especially if the incoming line is a bit high in voltage. Common symptoms are strange audible noises from CD drives, or hard drives that struggle to start up, but are OK once working.

Yes, keeping things clean and cool is a good move too.

Hope that helps someone.

Cheers.
Dave B.

PS: I don't suppose anyone knows a real good simple blow by blow total newbie dialog, as to how to reliably and correctly create and set up Jails on FreeBSD 8.0? All the man pages I've found so far are way over my head. Good Reference material admittedly, but no good as an instructional if you don't already know How To... I don't understand ezjail either... Something to do with the faded grey cell and too many years etc...
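Dave's loading rule of thumb can be written down as a small check. This is only a sketch of the figures given in his post (half to full load preferred, much less than a quarter problematic); the 0.25 cut-off is a rounding of "much less than 1/4", and the function name and example wattages are hypothetical.

```python
def load_band(load_watts, rated_watts):
    """Classify PSU loading using the rule of thumb from Dave's post.

    The band edges (1/4, 1/2, full) come from the post; treating 1/4 as a
    hard cut-off is an assumption made for illustration only.
    """
    fraction = load_watts / rated_watts
    if fraction > 1.0:
        return "overloaded"
    if fraction >= 0.5:
        return "preferred range (half to full load)"
    if fraction >= 0.25:
        return "between 1/4 and 1/2 (not characterised in the post)"
    return "light load: auxiliary outputs may wander"


# Hypothetical example: a small desktop drawing 150 W from two different supplies.
print(load_band(150, 800))  # oversized supply
print(load_band(150, 250))  # modest supply
```

The point of the comparison is the one made above: the same machine can sit in the troublesome light-load band on an oversized supply and in the preferred band on a smaller one.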
Re: Strange system lockups - kernel saying disk error
[...]
> PS: I don't suppose anyone knows a real good simple blow by blow total newbie dialog, as to how to reliably and correctly create and set up Jails on FreeBSD 8.0? All the man pages I've found so far are way over my head. Good Reference material admittedly, but no good as an instructional if you don't already know How To... I don't understand ezjail either... Something to do with the faded grey cell and too many years etc...
> ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org

http://wiki.optiplex-networks.com/xwiki/bin/view/FreeBSD/Jails

Still a work in progress and running from a VM in a laptop on an ADSL line but it does the job :-)

Regards,
Kaya
[direct] Re: Strange system lockups - kernel saying disk error
recovery and self contained AV disks, and also Memtest86, I carry a copy of SpinRite around with me too. I just wish I could come up with something as successful, and able to continue selling over and over...

As for changing mobo caps, it's not difficult, but it sure takes a lot of time and care. Caps in PSUs go bad too (usually the low voltage ones); again, not difficult to change, but take care. There's often considerable High Voltage stored in some places, that can bite you, and it hurts!

Lastly, large slow running fans last the longest, and are nice and quiet too. Just regularly blow the dust bunnies out of the systems (two or three times a year?) and keep things like the CPU cooler and PSU clean, and your hardware will work for many years just fine.

Oh.. CPU coolers. If your system has the ability to monitor the CPU temperature, get to know how that behaves depending on the software you use. If it starts to slowly rise, but the room temperature is not correspondingly warmer, and cleaning the dust from the cooler doesn't seem to help, it may need the cooler removing, the old heat transfer compound removing and cleaning, and fresh compound using when you refit the cooler. This issue seems worse with the earlier single core P4s, which had a very small contact area to the cooler.

At least Intel chips just slow down as they get hotter (cycle skipping) so as not to burn out. Some AMDs will destroy themselves if the cooler fails! There is a YouTube video somewhere, showing a PC with an Intel CPU with no cooler getting slower and slower till it almost stops.

I hope you get things sorted out, one way or another. Life is so much nicer if you don't have to keep messing with the blessed things!

I have a sick Land Rover to fix too. Gearbox rear oil seal, also rear drive shaft UJs. At least I can use big hammers on that sometimes... (Therapy!) Oh, the grass needs cutting, and I'm now also under instruction to change the bed, when the cat's finished sleeping on it!!!
Best Regards.
Dave B.

On 4 Jun 2011 at 21:35, Kaya Saman wrote:
> Subject: Re: Strange system lockups - kernel saying disk error
> [...]
>> Hmmm... Hard drives do not like heat! Check the PSU voltages with a meter, for accuracy and ripple. Failing SMPSs can do all sorts of odd things. Capacitor problems. Been there done that. They can be changed for very low cost, other than your time.
>> DaveB
>> You might guess by now, I know far more about hardware than I do about software, but for the latter to run well, the former must be good.
>
> Many thanks Dave for all the suggestions!!!
>
> To be honest I think the drives are fine but the system is just so old, including the IDE drives. I mean if I get a SATA/IDE USB adapter I should be able to back up the drives to the new DAS system I will have in place shortly, since I am much more in favor of running Nexenta Core 3 OS with ZFS spanning the 16x drives, meaning a total of 36TB, with 2 internal drives used for logging and caching. Then this system will be obsolete.
>
> However, I will keep your suggestion of using SpinRite in mind next time I encounter issues! BTW I respect your H/W knowledge, that's quite in-depth :-) thank you for your insight.
>
> Just an observation: demon.co.uk :-) used to be my old ISP till I went with Pipex which is now bust, then I moved out of the UK and now everything is roasting hot.
>
> Best regards,
> Kaya
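Dave's advice about watching CPU temperature relative to room temperature can be sketched as a simple trend check. This is a minimal illustration and not a monitoring tool: it assumes you already have paired CPU and room readings logged under a comparable workload, and the function names and the 0.1 degree-per-sample limit are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their sample index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def cooler_needs_attention(cpu_temps, room_temps, limit_per_sample=0.1):
    """Return True if the CPU runs progressively hotter than the room explains.

    limit_per_sample is an arbitrary illustrative threshold, not a figure
    from the thread. At least two paired samples are required.
    """
    rise_over_ambient = [c - r for c, r in zip(cpu_temps, room_temps)]
    return slope(rise_over_ambient) > limit_per_sample


# Room is steady but the CPU creeps upward: worth cleaning or re-pasting the cooler.
print(cooler_needs_attention([50, 51, 52, 53, 54], [22, 22, 22, 22, 22]))
# CPU rises only because the room does: nothing to fix on the cooler.
print(cooler_needs_attention([50, 52, 54], [20, 22, 24]))
```

Subtracting the room temperature first is the design choice that matters: it separates a warm day from a cooler that is losing contact with the CPU.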
Re: Strange system lockups - kernel saying disk error
On 06/05/2011 03:48 AM, per...@pluto.rain.com wrote:
> Kaya Saman kayasa...@gmail.com wrote:
>>> Did you apply any updates shortly before it started to fail?
>> No updates! I did however, install unrar through ports.
> Intuitively, that seems unlikely to have triggered the problem.

This doesn't sound like an issue to me either as it wouldn't touch the kernel or any modules.

>> I remember on other boards that went on me in the past with capacitor issues, a bunch of orange stuff starts leaking out of them when they blow up.
> A leaking capacitor has surely gone bad, but the syndrome I'm thinking of is more subtle. The top of the can, which should be flat, bulges upward a little bit.
>
> Whether replacing bad capacitors qualifies as quick depends on how comfortable you are using a soldering iron. It does generally require taking the board out of the case, which may or may not be quick or easy depending on the case design.

I have a degree in Electronic Engineering :-) - though no soldering iron :-(

>> Also the chassis doesn't have any cooling fans either since it was bought extremely cheaply by the family member but not sure that's the culprit neither power problems as the system has run in high outside ambient temps in the past with no A/C in the room and also was working fine on the PSU installed with the 4 disks.
> Fans that were never there can't have suddenly failed :)

Odd that, isn't it :-P

> Power supplies do fail occasionally, and not always in obvious ways such as failing to turn on at all. The output voltages may be a little too high or too low, or they may be correct but with excessive ripple or electrical noise; or the supply may be just fine until a disk draws a current spike to move the arm rapidly.

This needs either a voltmeter or oscilloscope to check out the voltages, fluctuations, and ripple. None of those at home :-( Man, what am I doing with 2 racks and no tools to fix things???

> It might be worth checking the fan mounted on the CPU heatsink if there is one, and the fan in the power supply (which ventilates the case as well as the power supply itself).

CPU fan works - at least it spins. The fan in the PSU is not checked, as I'd need to open it; it's a PS/2 design if I'm not mistaken!

But all these tips would be useful for a system that was given more value than mine. If I had actually paid for the system and it had been quite advanced, it would definitely be worth taking everything into account.

Regards,
Kaya
Re: [direct] Re: Strange system lockups - kernel saying disk error
[...]

Thanks Dave for this very graphic and insightful story :-) It was a pleasure to read and a nice display of how experience really does prevail over things!!!
I liked the radio chart on the site provided :-) - what exactly is it measuring? Background noise?

I think not having a UPS for over a year killed me, with the power cutting out almost every weekend for 10 - 20 minutes/night. Now I have UPS, 2x 1500VA APC systems... nice, but they need the network and temp monitoring cards. Need plenty of £££ for that! Plus the new server I am intending to build as the DAS box already cost $2000.

Regards,
Kaya
Re: Strange system lockups - kernel saying disk error
per...@pluto.rain.com wrote:
[snip]
> Power supplies do fail occasionally, and not always in obvious ways such as failing to turn on at all. The output voltages may be a little too high or too low, or they may be correct but with excessive ripple or electrical noise; or the supply may be just fine until a disk draws a current spike to move the arm rapidly.

I've seen a fair number of power supplies degrade somewhere around the 5 year mark. Simple voltage checks with a VOM, within its accuracy, will usually still show the voltages as being correct. To see the ripple you'll need an oscilloscope. Excessive ripple can make a PC appear to have all kinds of intermittent hardware failures with little or no rhyme or reason.

A degraded power supply will show large variations in ripple based on load. The largest load from hard drives is when they are first spinning up. Servers are commonly configured with the ability to spin up drives one at a time with a short delay in between. You won't usually find this on a desktop.

Generally, this situation will develop more often on an old machine that had a 'barely enough' capacity power supply when new. Add 3 more hard drives, bigger video, etc. and it was still just inside the envelope until enough time went by and the power supply got old. Since the hard drives pull the most amps on power up, you will see the ripple on a 'scope look really ugly while this happens. The unseen danger here is that bits on the drive(s) can get scrambled until things settle down. You will know this is happening when stuff goes wrong and fsck is needed to get the file system clean, and, after being cleaned and working again, the system does the same thing at some future reboot.

Easiest way to look at this without a 'scope is to simply substitute a known good PSU of sufficient rating from a machine with no troubles. If all the random nonsense suddenly stops, you'll know. This is easiest for folks these days as those without an analog electronics background are unlikely to have an oscilloscope lying around.

> It might be worth checking the fan mounted on the CPU heatsink if there is one, and the fan in the power supply (which ventilates the case as well as the power supply itself).

Aside from the fans themselves, dust buildup plugs heat sinks, eventually drastically reducing their ability to get rid of heat. When you get to this stage blowing them out with canned air can work wonders. My 2 servers at home sit on the floor and need this about once a year.

-Mike
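Mike's point that a meter can read "correct" while the ripple is bad can be illustrated with a small check. This is a sketch under stated assumptions: the +/-5 % tolerance and the 50 mV / 120 mV peak-to-peak ripple ceilings are the figures commonly quoted from the ATX12V design guide, not numbers from this thread, and the function name and readings are hypothetical.

```python
# Assumed limits (commonly quoted ATX12V figures, not taken from the thread).
RAILS = {
    "+3.3V": {"nominal": 3.3, "tolerance": 0.05, "max_ripple_mv": 50},
    "+5V": {"nominal": 5.0, "tolerance": 0.05, "max_ripple_mv": 50},
    "+12V": {"nominal": 12.0, "tolerance": 0.05, "max_ripple_mv": 120},
}


def check_rail(name, dc_volts, ripple_mv=None):
    """Return a list of problems for one rail; an empty list means it looks fine.

    dc_volts is what a meter shows; ripple_mv is a peak-to-peak 'scope
    reading, or None when no oscilloscope measurement is available.
    """
    spec = RAILS[name]
    low = spec["nominal"] * (1 - spec["tolerance"])
    high = spec["nominal"] * (1 + spec["tolerance"])
    problems = []
    if not low <= dc_volts <= high:
        problems.append("DC level out of tolerance")
    if ripple_mv is None:
        problems.append("ripple not measured")
    elif ripple_mv > spec["max_ripple_mv"]:
        problems.append("excessive ripple")
    return problems


# The meter reading is fine, yet the supply is still suspect.
print(check_rail("+12V", 12.05, ripple_mv=300))
# A meter-only check cannot clear the supply.
print(check_rail("+5V", 5.02))
```

Reporting "ripple not measured" as an open problem, instead of silently passing, mirrors the advice above: a good DC reading on its own does not rule out the power supply.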
Re: Strange system lockups - kernel saying disk error
Many thanks for the response!

On 06/04/2011 02:00 AM, per...@pluto.rain.com wrote:
> Kaya Saman kayasa...@gmail.com wrote:
>> I have an ancient pre-HT PIV machine with 500MB RAM. ... Everything was running fine until round about 2 days ago when the system started locking up on me? ... is there any way to fix the kernel error quickly?
> Did you apply any updates shortly before it started to fail?

No updates! I did however, install unrar through ports.

> If not, this is likely to be a hardware problem. I'd suggest checking the power supply and the fans, running memtest86, and taking a close look at the electrolytic filter capacitors on the system board -- the last because it sounds as if this system may be about the right age to have been built with some bad ones. (If any of the capacitors are bulging, either those caps, or the entire board, need to be replaced.) Power and heat problems can cause all sorts of strange symptoms.

I guess so. I did mention that the system was old, and I've also been running it 24/7 online for the past year and a half, as this box got passed down to me by a family member.

It has a Gigabyte system board. Not sure about the capacitors; I'll check. I remember on other boards that went on me in the past with capacitor issues, a bunch of orange stuff starts leaking out of them when they blow up.

Also the chassis doesn't have any cooling fans, since it was bought extremely cheaply by the family member, but I'm not sure that's the culprit, nor power problems, as the system has run in high outside ambient temps in the past with no A/C in the room and was also working fine on the installed PSU with the 4 disks.

I guess it's hardware related somehow, as something's blown up, either the PSU, system board or so..

As I explained in the beginning, if there's no clear way to fix the problem easily then I'll wait a bit. I have a 16 disk Promise DAS on the way and will build a server using a Chenbro industrial rack chassis and a Supermicro AMD based 8-12 core system board. These systems will fit better in the 2 racks I have in my living room. This should be a bit more stable and also give me higher capacity too!

Regards,
Kaya
Re: Strange system lockups - kernel saying disk error
On 3 Jun 2011 at 15:09, Kaya Saman wrote:

> Hi,
>
> I have an ancient pre-HT PIV machine with 500MB RAM. The system has an extra PCI-SATA card installed so I can make use of modern high capacity drives. Everything was running fine until round about 2 days ago when the system started locking up on me.
>
> Current drive configuration for the system is:
>
> 40GB IDE drive as root (ad2) - UFS2
> 500GB IDE drive for storage (ad3) - EXT3
> 1TB SATA drive for storage (ad4) - UFS2
> 750GB SATA drive for storage (ad8) - EXT3
>
> I had an issue with the 750GB drive, whose file system seemed to have got corrupted, so I powered down and backed the information up to a 2TB SATA drive using ddrescue and the Gentoo Linux based System Rescue CD. I put the 2TB drive in place of the 1TB ad4 drive physically. Once backed up I powered down again, re-installed the 1TB SATA drive in the ad4 position and completely removed the 2TB backup.
>
> When booted back into FreeBSD I received this error:
>
> WARNING: Kernel Errors Present
> ad4: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=1 ...: 1 Time(s)
> g_vfs_done():ad4e[WRITE(offset=97691456, length=16384)]error = 5 ...: 1 Time(s)
>
> The current status of the disks seemed to be ok though:
>
> 1 Time(s): ad2: 38166MB <Seagate ST340014A 3.06> at ata1-master UDMA33
> 1 Time(s): ad2: DMA limited to UDMA33, controller found non-ATA66 cable
> 1 Time(s): ad3: 476940MB <Seagate ST3500630A 3.AAF> at ata1-slave UDMA33
> 1 Time(s): ad3: DMA limited to UDMA33, controller found non-ATA66 cable
> 1 Time(s): ad4: 953869MB <SAMSUNG HD103SJ 1AJ10001> at ata2-master SATA150
> 1 Time(s): ad8: 715404MB <Seagate ST3750640AS 3.AAE> at ata4-master SATA150
> 1 Time(s): agp0: <SiS 651 host to AGP bridge> on hostb0
> 1 Time(s): ata0: <ATA channel 0> on atapci0
> 1 Time(s): ata0: [ITHREAD]
> 1 Time(s): ata1: <ATA channel 1> on atapci0
> 1 Time(s): ata1: [ITHREAD]
> 1 Time(s): ata2: <ATA channel 0> on atapci1
> 1 Time(s): ata2: [ITHREAD]
> 1 Time(s): ata3: <ATA channel 1> on atapci1
> 1 Time(s): ata3: [ITHREAD]
> 1 Time(s): ata4: <ATA channel 2> on atapci1
> 1 Time(s): ata4: [ITHREAD]
> 1 Time(s): ata5: <ATA channel 3> on atapci1
>
> In order to test if the error was due to disk failure I powered down, disconnected the ad4 and ad3 disks and powered back up. The system still seems to be locking up on me and I can't understand why.
>
> Through Googling I discovered a post by Jeremy Chadwick about these kinds of errors:
>
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
>
> however since the system board is pre-SATA it doesn't even have S.M.A.R.T. so I'm totally lost on how to fix this.
>
> I mean the best remedy would be to get a new computer and migrate the stored information (something like this is on the way) but currently I don't have access to any of the disks at all and, to make matters worse, no NTP or DNS server, nor a TFTP boot server for my IP phones, as I was running these services on the same machine. - I do run multiboot UNIX on my notebook so Bind9 is naturally installed, hence me writing this, but I only activate it in emergencies.
>
> One way I thought of for fixing this would be to grab a USB - ATA/SATA adapter:
>
> http://www.startech.com/product/USB2SATAIDE-USB-20-to-IDE-or-SATA-Adapter-Cable
>
> and hook the drives up to both Linux and FreeBSD in my notebook and copy the information across to the new system when it arrives in a few months.
>
> Aside from that, is there any way to fix the kernel error quickly?
>
> Thanks,
> Kaya

Hmmm... No backups then?

First, check the drive data cables. Many do fail with age. Some SATA types are made with Aluminium not copper, and are extremely fragile when they age.

If that doesn't shed some light... Take a look at http://www.grc.com/spinrite.htm

It will often restore a failing drive to full use, if it's not mechanically damaged. It can take time though, if any sector corruption is very bad. Days, weeks, even months have been seen in some cases, but if the software keeps going, it usually does the job.

It's not a Windows program; if anything it's a DOS program, but it comes with its own FreeDOS system to boot and run from, so you don't even need an OS on the machine to test! It will work with IDE or SATA types, even over a USB adapter if needed (but then it can't access any SMART data the drive may have) but it'll run a lot slower as it won't be aware of the drive's detailed physical timing etc.

I've used it on Windows and Linux machines in anger, and on the FreeBSD box when I got it (an old Gateway E-1400) to make sure the drive was healthy. It's the hard drive equivalent of Memtest86, and you know how good that is.

Even if it doesn't report any problems found, often it will cause the drive to maintain things itself, improving performance as a result.

Even if the recovered drive is still less than 100% happy, or some of
Re: Strange system lockups - kernel saying disk error
On 4 Jun 2011 at 10:52, Kaya Saman wrote:
> [...]
Hmmm... Hard drives do not like heat!

Check the PSU voltages with a meter, for accuracy and ripple. Failing SMPSs can do all sorts of odd things.

Capacitor problems. Been there done that. They can be changed for very low cost, other than your time.

DaveB

You might guess by now, I know far more about hardware than I do about software, but for the latter to run well, the former must be good.
Re: Strange system lockups - kernel saying disk error
[...]

Many thanks Dave for all the suggestions!!!

To be honest I think the drives are fine but the system is just so old, including the IDE drives. I mean if I get a SATA/IDE USB adapter I should be able to back up the drives to the new DAS system I will have in place shortly, since I am much more in favor of running Nexenta Core 3 OS with ZFS spanning the 16x drives, meaning a total of 36TB, with 2 internal drives used for logging and caching. Then this system will be obsolete.

However, I will keep your suggestion of using SpinRite in mind next time I encounter issues! BTW I respect your H/W knowledge, that's quite in-depth :-) thank you for your insight.

Just an observation: demon.co.uk :-) used to be my old ISP till I went with Pipex which is now bust, then I moved out of the UK and now everything is roasting hot.

Best regards,
Kaya
Re: Strange system lockups - kernel saying disk error
Kaya Saman kayasa...@gmail.com wrote:
>> Did you apply any updates shortly before it started to fail?
> No updates! I did however, install unrar through ports.

Intuitively, that seems unlikely to have triggered the problem.

> I remember on other boards that went on me in the past with capacitor issues, a bunch of orange stuff starts leaking out of them when they blow up.

A leaking capacitor has surely gone bad, but the syndrome I'm thinking of is more subtle. The top of the can, which should be flat, bulges upward a little bit.

Whether replacing bad capacitors qualifies as quick depends on how comfortable you are using a soldering iron. It does generally require taking the board out of the case, which may or may not be quick or easy depending on the case design.

> Also the chassis doesn't have any cooling fans either since it was bought extremely cheaply by the family member but not sure that's the culprit neither power problems as the system has run in high outside ambient temps in the past with no A/C in the room and also was working fine on the PSU installed with the 4 disks.

Fans that were never there can't have suddenly failed :)

Power supplies do fail occasionally, and not always in obvious ways such as failing to turn on at all. The output voltages may be a little too high or too low, or they may be correct but with excessive ripple or electrical noise; or the supply may be just fine until a disk draws a current spike to move the arm rapidly.

It might be worth checking the fan mounted on the CPU heatsink if there is one, and the fan in the power supply (which ventilates the case as well as the power supply itself).
Re: Strange system lockups - kernel saying disk error
Kaya Saman <kayasa...@gmail.com> wrote:

> I have an ancient pre-HT PIV machine with 500MB RAM. ...
> Everything was running fine until round about 2 days ago when the system started locking up on me? ...
> is there anyway to fix the kernel error quickly?

Did you apply any updates shortly before it started to fail? If not, this is likely to be a hardware problem. I'd suggest checking the power supply and the fans, running memtest86, and taking a close look at the electrolytic filter capacitors on the system board -- the last because it sounds as if this system may be about the right age to have been built with some bad ones. (If any of the capacitors are bulging, either those caps, or the entire board, need to be replaced.)

Power and heat problems can cause all sorts of strange symptoms.
Re: disk error / reboot / 6.3
Hi Paul,

The patch worked (almost). The first time, a program accessing a disk that reported an uncorrectable error just segfaulted. Another instance led to a situation where I was only able to ping the server; no ssh or console access was possible anymore.

-Pat

________________________________
From: Paul B. Mahol [mailto:one...@gmail.com]
To: jerome [mailto:jer...@code-monkey.nl]
Cc: freebsd-questions@freebsd.org
Sent: Mon, 22 Dec 2008 13:15:12 +0100
Subject: Re: disk error / reboot / 6.3

On 12/22/08, jerome <jer...@code-monkey.nl> wrote:
> Hi Paul,
> The server resets while running, like pressing the reset button...

Try this patch:

--- src/sys/dev/ata/ata-queue.c 2008/10/27 09:26:24 1.74
+++ src/sys/dev/ata/ata-queue.c 2008/11/27 03:37:46 1.75
@@ -357,7 +357,7 @@ ata_completed(void *context, int dummy)
 	"\6MEDIA_CHANGED\5NID_NOT_FOUND"
 	"\4MEDIA_CHANGE_REQEST"
 	"\3ABORTED\2NO_MEDIA\1ILLEGAL_LENGTH");
-	if ((request->flags & ATA_R_DMA) &&
+	if ((request->flags & ATA_R_DMA) && request->dma &&
 	    (request->dma->status & ATA_BMSTAT_ERROR))
 	    printf(" dma=0x%02x", request->dma->status);
 	if (!(request->flags & (ATA_R_ATAPI | ATA_R_CONTROL)))

-- Paul
Re: disk error / reboot / 6.3
On 12/28/08, jerome <jer...@code-monkey.nl> wrote:
> Hi Paul,
> The patch worked (almost). The first time, a program accessing a disk that reported an uncorrectable error just segfaulted. Another instance led to a situation where I was only able to ping the server; no ssh or console access was possible anymore.

That is somewhat to be expected: the point of the patch is to fix the panic, not the trashing due to a faulty disk, drivers, or something else ...

-- Paul
Re: disk error / reboot / 6.3
On 12/22/08, jerome <jer...@code-monkey.nl> wrote:
> Hi Paul,
> The server resets while running, like pressing the reset button...

Try this patch:

--- src/sys/dev/ata/ata-queue.c 2008/10/27 09:26:24 1.74
+++ src/sys/dev/ata/ata-queue.c 2008/11/27 03:37:46 1.75
@@ -357,7 +357,7 @@ ata_completed(void *context, int dummy)
 	"\6MEDIA_CHANGED\5NID_NOT_FOUND"
 	"\4MEDIA_CHANGE_REQEST"
 	"\3ABORTED\2NO_MEDIA\1ILLEGAL_LENGTH");
-	if ((request->flags & ATA_R_DMA) &&
+	if ((request->flags & ATA_R_DMA) && request->dma &&
 	    (request->dma->status & ATA_BMSTAT_ERROR))
 	    printf(" dma=0x%02x", request->dma->status);
 	if (!(request->flags & (ATA_R_ATAPI | ATA_R_CONTROL)))

-- Paul
Re: disk error / reboot / 6.3
Hi Paul,

Ok, thanks. Will let you know the outcome.

-Jerome

________________________________
From: Paul B. Mahol [mailto:one...@gmail.com]
To: jerome [mailto:jer...@code-monkey.nl]
Cc: freebsd-questions@freebsd.org
Sent: Mon, 22 Dec 2008 13:15:12 +0100
Subject: Re: disk error / reboot / 6.3

On 12/22/08, jerome <jer...@code-monkey.nl> wrote:
> Hi Paul,
> The server resets while running, like pressing the reset button...

Try this patch:

--- src/sys/dev/ata/ata-queue.c 2008/10/27 09:26:24 1.74
+++ src/sys/dev/ata/ata-queue.c 2008/11/27 03:37:46 1.75
@@ -357,7 +357,7 @@ ata_completed(void *context, int dummy)
 	"\6MEDIA_CHANGED\5NID_NOT_FOUND"
 	"\4MEDIA_CHANGE_REQEST"
 	"\3ABORTED\2NO_MEDIA\1ILLEGAL_LENGTH");
-	if ((request->flags & ATA_R_DMA) &&
+	if ((request->flags & ATA_R_DMA) && request->dma &&
 	    (request->dma->status & ATA_BMSTAT_ERROR))
 	    printf(" dma=0x%02x", request->dma->status);
 	if (!(request->flags & (ATA_R_ATAPI | ATA_R_CONTROL)))

-- Paul
disk error / reboot / 6.3
Hi,

We are running 6.3 on a fileserver with a couple of data disks. Once the server encounters an error on a data disk (the OS disk is separate), the server will reset itself without warning. We can usually identify the problem disk with smartctl: the disk will show 'Offline uncorrectable errors'.

Is it normal that the server reboots itself? Can we prevent this from happening?

The disks are attached to the on-board SATA ports of the mainboard itself, so no (RAID) controllers whatsoever. We also do not use software RAID.

Best regards
Jerome
Re: disk error / reboot / 6.3
On 12/21/08, jerome <jer...@code-monkey.nl> wrote:
> Hi,
> We are running 6.3 on a fileserver with a couple of data disks. Once the server encounters an error on a data disk (os disk is separate) the server will reset itself without warning.

Did it just reset, or did it panic? There is a known panic on bad blocks in some FreeBSD versions, but I don't think that regression hit 6.x.

-- Paul
Re: disk error / reboot / 6.3
Hi Paul,

The server resets while running, like pressing the reset button...

-Jerome

________________________________
From: Paul B. Mahol [mailto:one...@gmail.com]
To: jerome [mailto:jer...@code-monkey.nl]
Cc: freebsd-questions@freebsd.org
Sent: Mon, 22 Dec 2008 00:35:04 +0100
Subject: Re: disk error / reboot / 6.3

On 12/21/08, jerome <jer...@code-monkey.nl> wrote:
> Hi,
> We are running 6.3 on a fileserver with a couple of data disks. Once the server encounters an error on a data disk (os disk is separate) the server will reset itself without warning.

Did it just reset, or did it panic? There is a known panic on bad blocks in some FreeBSD versions, but I don't think that regression hit 6.x.

-- Paul
RAID 1 / disk error / Offline uncorrectable sectors
Hello,

I'd like to ask your advice. We have RAID 1 / SATA turned on in the BIOS. A couple of days ago smartd let me know about a disk problem.

Jun 14 01:13:38 relay kernel: ad12: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=374468863
Jun 14 01:13:38 relay kernel: ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
Jun 14 01:14:19 relay kernel: ad12: WARNING - WRITE_DMA taskqueue timeout - completing request directly
Jun 14 01:14:19 relay kernel: ad12: WARNING - WRITE_DMA48 freeing taskqueue zombie request
Jun 14 01:37:38 relay smartd[683]: Device: /dev/ad12, 1 Currently unreadable (pending) sectors
Jun 14 01:37:38 relay smartd[683]: Device: /dev/ad12, 1 Offline uncorrectable sectors

If I do smartctl -a /dev/ad12 I get

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1

My understanding is that RAID 1 no longer works because of this error. There is a bad sector on the HD (Offline uncorrectable sectors) and the best we can do is replace the drive? Does it make sense to try to turn RAID 1 on, ignoring this error (however, this is done in the BIOS, so the machine would have to be taken down in order to do that)? It seems serious enough for me not to ignore it, but then I know close to nothing about HDs.

Many thanks for your suggestions!

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
In response to Zbigniew Szalbot [EMAIL PROTECTED]:

> A couple of days ago smartd let me know about a disk problem.
>
> Jun 14 01:13:38 relay kernel: ad12: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=374468863
> Jun 14 01:13:38 relay kernel: ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
> Jun 14 01:14:19 relay kernel: ad12: WARNING - WRITE_DMA taskqueue timeout - completing request directly
> Jun 14 01:14:19 relay kernel: ad12: WARNING - WRITE_DMA48 freeing taskqueue zombie request
> Jun 14 01:37:38 relay smartd[683]: Device: /dev/ad12, 1 Currently unreadable (pending) sectors
> Jun 14 01:37:38 relay smartd[683]: Device: /dev/ad12, 1 Offline uncorrectable sectors
>
> If I do smartctl -a /dev/ad12 I get
>
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
>
> My understanding is that RAID 1 no longer works because of this error. There is a bad sector on HD (Offline uncorrectable sectors) and the best we can do is replace the drive? Does it make sense to try to turn RAID 1 on ignoring this error (however, this is done in BIOS so the machine would have to be taken down in order to do that)? It seems serious enough for me not to ignore it but then I know close to nothing about HDs.

Replace the hard drive.

Every modern hard drive keeps extra space available to remap bad sectors. This happens magically behind the scenes without you ever knowing about it. Once you've hit uncorrectable errors, it means your re-mappable sectors are used up, and that means the drive is on its last legs.

-- Bill Moran http://www.potentialtech.com
Re: RAID 1 / disk error / Offline uncorrectable sectors
Dear all,

Bill Moran:

>> My understanding is that RAID 1 no longer works because of this error. There is a bad sector on HD (Offline uncorrectable sectors) and the best we can do is replace the drive? Does it make sense to try to turn RAID 1 on ignoring this error (however, this is done in BIOS so the machine would have to be taken down in order to do that)? It seems serious enough for me not to ignore it but then I know close to nothing about HDs.
>
> Replace the hard drive.
>
> Every modern hard drive keeps extra space available to remap bad sectors. This happens magically behind the scenes without you ever knowing about it. Once you've hit uncorrectable errors, it means your re-mappable sectors are used up, and that means the drive is on its last legs.

Thank you Bill. One last question. RAID 1 is off now (degraded) and the hosting company is asking if I can try to bring it up (to check if it will work). They have given me this link: http://www.freebsd.org/doc/en/books/handbook/raid.html. The problem is that, as far as I understand, we are not using gmirror but RAID 1 turned on in the BIOS (although it is also software-based).

Thank you very much in advance!

Zbigniew Szalbot
www.lc-words.com
Re: RAID 1 / disk error / Offline uncorrectable sectors
Zbigniew Szalbot wrote:

> Dear all,
>
> Bill Moran:
>
>>> My understanding is that RAID 1 no longer works because of this error. There is a bad sector on HD (Offline uncorrectable sectors) and the best we can do is replace the drive? Does it make sense to try to turn RAID 1 on ignoring this error (however, this is done in BIOS so the machine would have to be taken down in order to do that)? It seems serious enough for me not to ignore it but then I know close to nothing about HDs.
>>
>> Replace the hard drive.
>>
>> Every modern hard drive keeps extra space available to remap bad sectors. This happens magically behind the scenes without you ever knowing about it. Once you've hit uncorrectable errors, it means your re-mappable sectors are used up, and that means the drive is on its last legs.
>
> Thank you Bill. One last question. RAID 1 is off now (degraded) and the hosting company is asking if I can try to bring it up (to check if it will work). They have given me this link http://www.freebsd.org/doc/en/books/handbook/raid.html. The problem is that as far as I understand we are not using gmirror but RAID 1 turned on in BIOS (although it is also software-based).
>
> Thank you very much in advance!
>
> Zbigniew Szalbot
> www.lc-words.com

Hey Zbigniew ;)

I understand you are using the ataraid (ar) driver. I always use gmirror, but it seems they pointed you to the right place in the handbook. Look at section 18.4.3 - you would probably need to do something like:

    # atacontrol list

From the list, get the ATA channel for /dev/ad12, which is the faulty one, e.g. ata2.

Detach and re-attach (maybe this will reset the state of the drive):

    atacontrol detach ata2
    atacontrol attach ata2
    atacontrol addspare ar0 ad12
    atacontrol rebuild ar0

I've done more or less the same with gmirror when I had similar messages a few months back. It may work for a few hours/days but it will fail again. Have it replaced ASAP.

Manolis
Re: RAID 1 / disk error / Offline uncorrectable sectors
> Replace the hard drive.
>
> Every modern hard drive keeps extra space available to remap bad sectors. This happens magically behind the scenes without you ever knowing about it. Once you've hit uncorrectable errors, it means

no. usually it means that there was an error when writing that sector, and later there is an error on read. the media may be good (quite often it is).

if you were right, i wouldn't have my disk still running one year after it had a whole block of uncorrectable errors. i just rewrote those blocks and they are readable.

the drive HAS TO know about bad media to remap, and no HDDs today perform verification.
Re: RAID 1 / disk error / Offline uncorrectable sectors
Hello Manolis,

> I understand you are using the ataraid (ar) driver. I always use gmirror, but it seems they pointed you to the right place in the handbook. Look at section 18.4.3 - you would probably need to do something like:
>
> # atacontrol list

ATA channel 6:
    Master: ad12 ST3250310NS/SN04 Serial ATA v1.0
    Slave: no device present
ATA channel 0:
    Master: no device present
    Slave: no device present
ATA channel 1:
    Master: no device present
    Slave: no device present
ATA channel 2:
    Master: no device present
    Slave: no device present
ATA channel 3:
    Master: no device present
    Slave: no device present
ATA channel 4:
    Master: no device present
    Slave: no device present
ATA channel 5:
    Master: ad10 ST3250310NS/SN04 Serial ATA v1.0
    Slave: no device present
ATA channel 6:
    Master: ad12 ST3250310NS/SN04 Serial ATA v1.0
    Slave: no device present
ATA channel 7:
    Master: no device present
    Slave: no device present
ATA channel 8:
    Master: no device present
    Slave: no device present
ATA channel 9:
    Master: no device present
    Slave: no device present
ATA channel 10:
    Master: no device present
    Slave: no device present

So in this case it would be ata6? Sorry for asking for confirmation at every step, but it is just so new to me! And thanks for the list of steps to perform!

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
On Mon, Jun 16, 2008 at 04:41:15PM +0200, Wojciech Puchar wrote:

>> Replace the hard drive.
>>
>> Every modern hard drive keeps extra space available to remap bad sectors. This happens magically behind the scenes without you ever knowing about it. Once you've hit uncorrectable errors, it means
>
> no. usually it means that there was an error when writing that sector, and later there is an error on read. madia may be good (quite often is).
>
> if you would be right i wouldn't have my disk running one year after having whole block of uncorrectable errors i just rewrote that blocks and they are readable.
>
> drive HAS TO know about bad media to remap, and no HDDs today perform verification

Also, remapping can only happen if the error is encountered on a write operation. If there is an error on read the drive cannot remap, since it does not know what data should be there.

(A good RAID implementation could however handle a read error by reading the corresponding sector from the other disk(s) in the array and writing it back to the failing disk, probably causing it to remap the block.)

(Write errors are however usually a strong indication that the drive should be replaced ASAP.)

-- Insert your favourite quote here.
Erik Trulsson [EMAIL PROTECTED]
Re: RAID 1 / disk error / Offline uncorrectable sectors
Zbigniew Szalbot wrote:

> Hello Manolis,
>
>> I understand you are using the ataraid (ar) driver. I always use gmirror, but it seems they pointed you to the right place in the handbook. Look at section 18.4.3 - you would probably need to do something like:
>>
>> # atacontrol list
>
> ATA channel 6:
>     Master: ad12 ST3250310NS/SN04 Serial ATA v1.0
>     Slave: no device present
> ATA channel 0:
>     Master: no device present
>     Slave: no device present
> ATA channel 1:
>     Master: no device present
>     Slave: no device present
> ATA channel 2:
>     Master: no device present
>     Slave: no device present
> ATA channel 3:
>     Master: no device present
>     Slave: no device present
> ATA channel 4:
>     Master: no device present
>     Slave: no device present
> ATA channel 5:
>     Master: ad10 ST3250310NS/SN04 Serial ATA v1.0
>     Slave: no device present
> ATA channel 6:
>     Master: ad12 ST3250310NS/SN04 Serial ATA v1.0
>     Slave: no device present
> ATA channel 7:
>     Master: no device present
>     Slave: no device present
> ATA channel 8:
>     Master: no device present
>     Slave: no device present
> ATA channel 9:
>     Master: no device present
>     Slave: no device present
> ATA channel 10:
>     Master: no device present
>     Slave: no device present
>
> So in this case it would be ata6? Sorry for asking confirmation for every step but it is just so new to me! And thanks for the list of steps to perform!
>
> Zbigniew Szalbot

Yes, it is ata6.

Give it a try. If the problem is serious enough, it will probably not even finish the rebuild :(
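[Editorial sketch] Picking the channel out of a long `atacontrol list` by eye is easy to get wrong. The snippet below is a minimal sketch of doing the lookup mechanically. It assumes the listing has the shape quoted in this thread (an "ATA channel N:" header followed by "Master:" and "Slave:" lines); the function name `channel_for` and the shortened sample text are made up for illustration, and real output may differ in spacing or in how the model string is bracketed.

```python
import re

# Shortened sample in the layout quoted in this thread (assumed, not captured live).
SAMPLE = """\
ATA channel 5:
    Master: ad10 ST3250310NS/SN04 Serial ATA v1.0
    Slave: no device present
ATA channel 6:
    Master: ad12 ST3250310NS/SN04 Serial ATA v1.0
    Slave: no device present
"""


def channel_for(listing, device):
    """Return 'ataN' for the channel whose Master/Slave line names `device`."""
    channel = None
    for line in listing.splitlines():
        header = re.match(r"\s*ATA channel (\d+):", line)
        if header:
            # Remember which channel the following Master/Slave lines belong to.
            channel = "ata" + header.group(1)
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("Master:", "Slave:") and fields[1] == device:
            return channel
    return None


print(channel_for(SAMPLE, "ad12"))  # prints ata6
```

The result is the argument to pass to `atacontrol detach` and `atacontrol attach`, matching the ata6 confirmed above.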
Re: RAID 1 / disk error / Offline uncorrectable sectors
> (Write errors is however usually a strong indication that the drive should be replaced ASAP.)

he got a read error... but your sentence alone is true of course.
Re: RAID 1 / disk error / Offline uncorrectable sectors
Hi Manolis,

> Yes, it is ata6
> Give it a try, if the problem is serious enough, it will probably not even finish rebuild :(

Detaching and attaching went well, but when I issued

    atacontrol addspare ar0 ad12

it said

    atacontrol: ioctl(IOCATARAIDADDSPARE): Device busy

I am not sure if that means I should wait, or rather that it is mission impossible?

Thanks!

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
Zbigniew Szalbot wrote:

> Hi Manolis,
>
>> Yes, it is ata6
>> Give it a try, if the problem is serious enough, it will probably not even finish rebuild :(
>
> Detaching and ataching went well but when I issued
>
>     atacontrol addspare ar0 ad12
>
> it said
>
>     atacontrol: ioctl(IOCATARAIDADDSPARE): Device busy
>
> I am not sure if that means I should wait or rather that it is mission impossible?
>
> Thanks!
>
> Zbigniew Szalbot

Try

    atacontrol status ar0

Since you haven't actually removed/replaced ad12 you may simply have to continue with:

    atacontrol rebuild ar0

but see what status says first.
Re: RAID 1 / disk error / Offline uncorrectable sectors
Hello,

Manolis Kiagias:

> Try
>
>     atacontrol status ar0

ar0: ATA RAID1 status: DEGRADED
 subdisks:
   0 ad10 ONLINE
   1 MISSING

> Since you haven't actually removed/replaced ad12 you may simply have to continue with:
>
>     atacontrol rebuild ar0

I'll try it now. Thanks!

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
Hello,

Manolis Kiagias:

> Try
>
>     atacontrol status ar0
>
> Since you haven't actually removed/replaced ad12 you may simply have to continue with:
>
>     atacontrol rebuild ar0

    atacontrol rebuild ar0
    atacontrol: ioctl(IOCATARAIDREBUILD): Input/output error

So it looks like it cannot be done?

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
Zbigniew Szalbot wrote:

> Hello,
>
> Manolis Kiagias:
>
>> Try
>>
>>     atacontrol status ar0
>
> ar0: ATA RAID1 status: DEGRADED
>  subdisks:
>    0 ad10 ONLINE
>    1 MISSING
>
>> Since you haven't actually removed/replaced ad12 you may simply have to continue with:
>>
>>     atacontrol rebuild ar0
>
> I'll try it now. Thanks!
>
> Zbigniew Szalbot

OK, ad12 is missing, so it seems it was detached but not reattached. Try again:

    atacontrol attach ata6

If this succeeds:

    atacontrol addspare ar0 ad12
    atacontrol rebuild ar0

If attach fails, then someone at the remote site may have to physically detach / reattach the disk in question.
Re: RAID 1 / disk error / Offline uncorrectable sectors
Hello one last time,

Manolis Kiagias:

> Ok, ad12 is missing, so it seems it was detached but not reattached. try again:
>
>     atacontrol attach ata6

$ sudo atacontrol attach ata6
atacontrol: ioctl(IOCATAATTACH): File exists

Thank you all for a lot of suggestions!

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
Zbigniew Szalbot wrote:

> Hello one last time,
>
> Manolis Kiagias:
>
>> Ok, ad12 is missing, so it seems it was detached but not reattached. try again:
>>
>>     atacontrol attach ata6
>
> $ sudo atacontrol attach ata6
> atacontrol: ioctl(IOCATAATTACH): File exists
>
> Thank you all for a lot of suggestions!
>
> Zbigniew Szalbot

As a last resort, you could also try:

    atacontrol reinit ata6

and try reattaching again.
Re: RAID 1 / disk error / Offline uncorrectable sectors
Hello,

> As a last resort, you could also try:
>
>     atacontrol reinit ata6
>
> and try reattaching again

Thank you Manolis - you have been more than patient with me! Unfortunately, the result is still the same.

OK. I am going to ask our hosting company to replace the drive. Again, many thanks for your help!

Zbigniew Szalbot
Re: RAID 1 / disk error / Offline uncorrectable sectors
Bill Moran wrote:
> Zbigniew Szalbot wrote:
>> [...]
>> Jun 14 01:13:38 relay kernel: ad12: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=374468863
>> [...]
>
> Replace the hard drive.
>
> Every modern hard drive keeps extra space available to remap bad sectors. This happens magically behind the scenes without you ever knowing about it. Once you've hit uncorrectable errors, it means your re-mappable sectors are used up, and that means the drive is on its last legs.

That's not completely true.

When a disk drive encounters a bad sector during a read operation, it will remember the bad sector address, but it is unable to transparently remap the sector because it doesn't know the correct contents of the sector. So it has to report the unrecoverable error to the OS, even if there's still plenty of space for remapping sectors. Upon the next write operation to a sector marked as bad, the drive will finally remap it and write the data to a spare location.

Therefore, getting uncorrectable errors does *not* mean that the drive has used up its spare sectors. You only need to overwrite the bad sectors (e.g. with dd(1)) so the drive gets a chance to remap them.

Of course, it might still be a good idea to replace the drive anyway. It depends on the cause of the bad sectors (mechanical or electrical). If you had a head crash (caused by mechanical impact or a media manufacturing error or whatever), it is possible that it caused debris within the drive which will cause further bad blocks. This can lead to a snowball effect that can really exhaust all spare sectors quickly.

On the other hand, if the bad sectors were caused by a voltage spike, a power failure or similar, chances are that the drive is fine and you can continue to use it after making sure that the bad sectors are remapped (by overwriting them, see above).

Finally, there is also the possibility that the problem is caused by a bug in the drive's firmware. If that's the case, I would be inclined to replace the drive with a different brand. However, I guess all drives have bugs ... the question is whether they affect you.

Another question is whether it's possible at all to find out what caused the problem in the first place.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

What is this talk of 'release'? We do not make software 'releases'. Our software 'escapes', leaving a bloody trail of designers and quality assurance people in its wake.
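[Editorial sketch] The "overwrite the bad sector with dd(1)" advice above boils down to turning the LBA from the kernel message into a `seek=` value. The helper below is a minimal sketch that only builds the command string and never runs it. It assumes 512-byte sectors and that the reported LBA is counted from the start of the whole device; the function name `overwrite_command` is hypothetical. The write destroys whatever was in that sector, so it only makes sense when that data is already lost or can be restored from the mirror or a backup, and whether the OS allows a raw write to a disk that is in use depends on the setup.

```python
def overwrite_command(device, lba, sector_size=512, count=1):
    """Build (but do not run) a dd command that zeroes `count` sectors at `lba`.

    Assumption: `lba` is relative to the start of `device`, and the drive
    uses `sector_size`-byte logical sectors.
    """
    if lba < 0 or count < 1:
        raise ValueError("lba must be >= 0 and count >= 1")
    # With bs equal to the sector size, seek= is simply the LBA.
    return f"dd if=/dev/zero of={device} bs={sector_size} seek={lba} count={count}"


# LBA taken from the kernel message quoted in this thread.
print(overwrite_command("/dev/ad12", 374468863))
```

Printing the command rather than executing it leaves room to double-check the device name and LBA first; a mistake in either one overwrites good data.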
disk error
Hi all,

Just found these messages in my logfile. Is it something to worry about? I've never seen them before upgrading to 6.3.

ra kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=281550271
ra kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=281550271
ra kernel: g_vfs_done():ad0s1f[READ(offset=138248126464, length=16384)]error = 5
ra kernel: handle_workitem_freeblocks: block count
ra kernel: handle_workitem_freeblks: got error 5 while accessing filesystem

Peter
-- http://www.boosten.org
Re: disk error
On Sat, 2008-02-16 at 17:59 +0100, Peter Boosten wrote:
> Hi all,
>
> Just found these messages in my logfile. Is it something to worry about? I've never seen them before upgrading to 6.3.
>
> ra kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=281550271
> ra kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=281550271

Yea -- normally that means a bad sector(*), and where there's one, there's bound to be more. Failed drive eventually.

I would pull this server from rotation and run a full surface sector scan on it (download an ISO of Hiren's Boot CD).

Or if it's a geom mirror RAID-1, test this component.

If it was SCSI, I would recommend camcontrol(8) to query the disk for a list of grown defect sectors.

~BAS

*. If you've never seen it before and it developed. Bad cables/controllers/drives/interference can cause it too, but you would have seen it from inception.

> ra kernel: g_vfs_done():ad0s1f[READ(offset=138248126464, length=16384)]error = 5
> ra kernel: handle_workitem_freeblocks: block count
> ra kernel: handle_workitem_freeblks: got error 5 while accessing filesystem
>
> Peter
Re: disk error
On Sat, 16 Feb 2008, Peter Boosten wrote:

> Brian, thanks for your answer (and suggestion). Isn't a drive supposed to mark a bad sector as bad and ignore it (that is: not use it anymore)?

They ship with a certain number of unallocated sectors to reassign failed ones to (I don't think ATA/IDE disks have a way to ask this, maybe SMART).

Once all of the silent allocations have happened, unbeknown to the user, then your suffering starts.

Install smartmontools and check these values:

  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

~BAS

> -- http://www.boosten.org

l8*
-lava (Brian A. Seklecki - Pittsburgh, PA, USA)
http://www.spiritual-machines.org/

Guilty? Yeah. But he knows it. I mean, you're guilty. You just don't know it. So who's really in jail? ~Maynard James Keenan
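[Editorial sketch] The attribute table above can be checked mechanically rather than by eye. The snippet below is a minimal sketch, assuming the usual ten-column smartctl attribute layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The function name `nonzero_attributes`, the choice of watched IDs, and the sample text are illustrative; the sample mixes rows quoted in different messages of this archive and is not the output of one real drive.

```python
# Rows in the ten-column layout quoted in this archive (illustrative sample).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

# Attribute IDs related to reallocated or unreadable sectors.
WATCHED = {5, 196, 197, 198}


def nonzero_attributes(table, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in table.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            hits[name] = int(raw)
    return hits


print(nonzero_attributes(SAMPLE))
```

A non-empty result does not prove the drive is dying, as the replies in this archive point out, but it shows which sectors are pending and whether the counts grow between runs.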
Re: disk error
Brian A. Seklecki wrote:
> On Sat, 2008-02-16 at 17:59 +0100, Peter Boosten wrote:
>> Hi all,
>>
>> Just found these messages in my logfile. Is it something to worry about? I've never seen them before upgrading to 6.3.
>>
>> ra kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=281550271
>> ra kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=281550271
>
> Yea -- normally that means a bad sector(*), and where there's one, there's bound to be more. Failed drive eventually.
>
> I would pull this server from rotation and run a full surface sector scan on it (download an ISO of Hiran's Boot CD)

Brian, thanks for your answer (and suggestion).

Isn't a drive supposed to mark a bad sector as bad and ignore it (that is: not use it anymore)?

-- http://www.boosten.org
Re: disk error
On Sat, Feb 16, 2008 at 07:30:37PM +0100, Peter Boosten wrote:
> Brian A. Seklecki wrote:
>> On Sat, 2008-02-16 at 17:59 +0100, Peter Boosten wrote:
>>> Hi all,
>>>
>>> Just found these messages in my logfile. Is it something to worry about? I've never seen them before upgrading to 6.3.
>>>
>>> ra kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=281550271
>>> ra kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=281550271
>>
>> Yea -- normally that means a bad sector(*), and where there's one, there's bound to be more. Failed drive eventually.
>>
>> I would pull this server from rotation and run a full surface sector scan on it (download an ISO of Hiran's Boot CD)
>
> Brian, thanks for your answer (and sugggestion).
>
> Isn't a drive supposed to mark a bad sector as bad and ignore it (that is: not use it anymore)?

The drive can only remap bad sectors when you write to them. When you read from a bad sector the drive does not know what data was supposed to be there and thus can only return an error or return garbage data. Returning an error (which is what disks do) is a much better choice.

-- Insert your favourite quote here.
Erik Trulsson [EMAIL PROTECTED]
uncorrectable disk error
ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=465628608
g_vfs_done():ad4a[READ(offset=238401650688, length=638976)]error = 5

How can I find (UFS2) which file uses that block?
Block to i-node to file name (was Re: uncorrectable disk error)
On Tue, Aug 21, 2007 at 01:04:38AM +0200, Wojciech Puchar wrote:

ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=465628608
g_vfs_done():ad4a[READ(offset=238401650688, length=638976)]error = 5

how can i find (UFS2) what file uses that block?

[I took the liberty to change the subject for better archival]

Unless you're an fs guru or very patient and careful, you probably won't, or you'll have a hard time. But don't give up yet! Try the following procedure:

1. Determine the slice where the block is located (fdisk).

2. Determine the partition of the block (bsdlabel).

3. Calculate the partition-relative offset of the block (i.e. subtract the slice offset and subtract from the result the partition offset).

4. Fire up fsdb(8) with the -r option on that file system.

5. Use fsdb's findblk command with that fs-relative offset to determine the inode that is holding this block. From man fsdb:

   findblk disk block number ...
   Find the inode(s) owning the specified disk block(s) number(s). Note that these are not absolute disk blocks numbers, but offsets from the start of the partition.

   Keep in mind that the block could also be in the free list (unused); but you'd not get this error message if it was (?).

6. Verify that the resulting i-node number is the right one by jumping to that inode with the inode command of fsdb, and rechecking that this block is indeed held by this i-node with the blocks command of fsdb. (You may want to run fsdb in a script(1), to capture the potentially long list of blocks.)

7. The inode number you get won't tell you the name of the file. To find this, scan all directories of that file system for this inode number (I'd write a small C proggy for that, but you could just as well use find(1)'s -inum switch). If your disk is dying, this can (whether with a C program or with find(1)) crash your system. If the number of directories is not very high, you could try to use fsdb(8) for that.

BEWARE: Always use fsdb(8) with the read-only flag -r! Otherwise you could irrevocably damage your file system if you don't know exactly what you're doing.

Good luck!

Regards,
-cpghost.

P.S.: We really need a little LBA to i-node utility for UFS/UFS2, that we could combine with find /fs -inum n...! If possible, a utility that also takes care of GEOM-ified disks etc...

--
Cordula's Web. http://www.cordula.ws/
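The arithmetic in step 3 and the directory scan in step 7 can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the slice start and partition offset in the example are made-up placeholders (read the real ones from fdisk and bsdlabel), and whether bsdlabel reports offsets relative to the slice or to the whole disk should be checked on your system before subtracting both. The scan does roughly what find(1) with -inum does.

```python
import os


def partition_relative_sector(lba, slice_start, partition_offset):
    """Step 3: turn an absolute LBA into a partition-relative sector number.

    lba              -- absolute sector number from the kernel error message
    slice_start      -- first sector of the slice (from fdisk)
    partition_offset -- offset of the partition inside the slice (from bsdlabel);
                        assumed here to be slice-relative
    """
    relative = lba - slice_start - partition_offset
    if relative < 0:
        raise ValueError("LBA lies before the start of the partition")
    return relative


def find_paths_by_inode(root, inode):
    """Step 7: list every path under root whose inode number matches.

    Stays on the file system that root lives on, since inode numbers are
    only unique within one file system.
    """
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry could not be examined; skip it
            if st.st_dev != root_dev:
                if name in dirnames:
                    dirnames.remove(name)  # do not descend into other mounts
                continue
            if st.st_ino == inode:
                matches.append(path)
    return matches


if __name__ == "__main__":
    # LBA taken from the error message above; 63 and 0 are placeholders only.
    print(partition_relative_sector(465628608, 63, 0))
```

Hard links share an inode, so the scan may return more than one path. As the message warns, walking a dying disk can itself trigger further read errors.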
Re: Disk Error - DUMP output.
Grant Peel [EMAIL PROTECTED] writes:

Is there any way to figure out the files that are not being read using the DUMP error output below?

DUMP: read error from /dev/da0s1g: Input/output error: [block 42718592]: count=8192
DUMP: read error from /dev/da0s1g: Input/output error: [sector 42718594]: count=512
DUMP: read error from /dev/da0s1g: Input/output error: [block 42671366]: count=5120
DUMP: read error from /dev/da0s1g: Input/output error: [sector 42671371]: count=512

I had such a problem just last night. I tracked it down by copying directory trees within the filesystem to /dev/null until one failed. Then I repeated the process one directory level down, narrowing down the problem. [It turned out to be my wife's incoming mail spool...]
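The narrowing-down described above can be automated by reading every file once and noting which reads fail. The following is a minimal Python sketch of that idea, not the procedure the poster actually ran; the function name is hypothetical, and it assumes a failing sector surfaces as an OSError (typically "Input/output error") on read.

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns a list of (path, exception) pairs for files that could not be
    read completely, which is the same signal as a failed copy to /dev/null.
    """
    failures = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, devices, sockets and FIFOs
            try:
                with open(path, "rb") as fh:
                    while fh.read(chunk_size):
                        pass  # throw the data away; only errors matter
            except OSError as exc:
                failures.append((path, exc))
    return failures
```

Calling find_unreadable_files("/var/mail"), for example, would return an empty list on a healthy tree. Note that os.walk crosses mount points, so point it at a directory inside the affected file system.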
Disk Error - DUMP output.
Is there any way to figure out the files that are not being read using the DUMP error output below?

DUMP: read error from /dev/da0s1g: Input/output error: [block 42718592]: count=8192
DUMP: read error from /dev/da0s1g: Input/output error: [sector 42718594]: count=512
DUMP: read error from /dev/da0s1g: Input/output error: [block 42671366]: count=5120
DUMP: read error from /dev/da0s1g: Input/output error: [sector 42671371]: count=512

-Grant
Finding an LBA after a disk error
After much revision I finally have a tool that does a pretty good job of identifying the usage of an LBA. It's not perfect, but it's normally only used with a disk with a bad sector. It no longer needs the complete source distribution but can be built from the normal libraries. It has been tested on FreeBSD 5.3 and 6.0. One of the libraries it uses was introduced in 5.1, so it's not likely to work on anything earlier. It works on ufs1 and ufs2 formats and there is even a man page now.

It could be made into a port, but I am out of time right now. A quick look at the documents for creating ports shows that it will take quite a bit of time to figure out that part. Contact me off-list if you would like to get it.
Fwd: Re: Disk error messages (ad0: HARD READ ERROR blk# xxxxxx)
On Tue, 3 Jan 2006 12:25 pm, Gayn Winters wrote:
[mailto:[EMAIL PROTECTED] On Behalf Of Russell J. Wood
Sent: Monday, January 02, 2006 3:54 PM
To: freebsd-questions@freebsd.org
Subject: Re: Disk error messages (ad0: HARD READ ERROR blk# xx)

On Mon, Jan 02, 2006 at 11:15:08PM +, [EMAIL PROTECTED] wrote:

Hi there, On my screen, messages like the following keep coming up. I have to reboot multiple times to get it to boot up normally. Does this mean I have to replace the disk, which is relatively new (1-2 years)? Is there any simple way to fix it and avoid that time-consuming task?

ad0: 39205MB Maxtor 6EX [79656/16/63] at ata0-master WDMA2
ad0: HARD READ ERROR blk# 131199
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: DMA problem fallback to PIO mode
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: HARD READ ERROR blk# 3473535 status=59 error=40
ad0: HARD READ ERROR blk# 9240703 status=59 error=40
ad0: HARD READ ERROR blk# 17367167 status=59 error=40
ad0: HARD READ ERROR blk# 17760383 status=59 error=40

I suspect that you have bad sectors on your hard disk drive (and many of them). A good tool to use is Seagate's Seatools (http://www.seagate.com/support/seatools/index.html). Just burn the Seatools Desktop edition to CDROM and boot from it.
- Russell

After you've checked for loose cables, you might want to take the drive out and check it in another system (using the Seagate or other such tools). If indeed the problem is with DMA, the drive might be ok but the MB is flaky. Perhaps the PC or MB manufacturer has diagnostics with which you can zero in on the latter ugly possibility. In any case, get yourself a backup asap (at least of the user data so that you can recover from a fresh installation.)

Unless you are getting other types of errors, it is probably still possible to copy the drive with dd using bs=512b, and this would be your quickest fix of a hard drive problem. Run fsck on your new disk after the copy.

dd is generally not a good choice for copying disks (although it does sort of work). The new disk will appear unclean when copied from a live fs and may in fact have an odd instance of a file which has not yet been physically updated. And it just takes too long, since you copy empty space as well as real data.

Instead, slice and partition the new disk and run newfs on the new partitions. Then pipe dump (using the snapshot option) through to restore for each fs on the disks.

I have (successfully) used this approach extensively for cloning systems.

Malcolm
Disk error messages (ad0: HARD READ ERROR blk# xxxxxx)
Hi there, On my screen, messages like the following keep coming up. I have to reboot multiple times to get it to boot up normally. Does this mean I have to replace the disk, which is relatively new (1-2 years)? Is there any simple way to fix it and avoid that time-consuming task?

ad0: 39205MB Maxtor 6EX [79656/16/63] at ata0-master WDMA2
ad0: HARD READ ERROR blk# 131199
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: DMA problem fallback to PIO mode
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: HARD READ ERROR blk# 3473535 status=59 error=40
ad0: HARD READ ERROR blk# 9240703 status=59 error=40
ad0: HARD READ ERROR blk# 17367167 status=59 error=40
ad0: HARD READ ERROR blk# 17760383 status=59 error=40
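The status=59 error=40 pairs in the log above are hexadecimal ATA status and error register values. A small Python sketch can decode them into bit names. The labels below are shorthand that loosely follows the names later FreeBSD kernels print (READY, DSC, ERROR, UNCORRECTABLE, ABORTED); the meaning of some bits changed between ATA revisions, so treat the less common labels as approximate. The function names are hypothetical.

```python
import re

# (mask, label) pairs, highest bit first.
STATUS_BITS = [
    (0x80, "BUSY"),
    (0x40, "READY"),
    (0x20, "DEVICE_FAULT"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORRECTABLE"),
    (0x02, "INDEX"),
    (0x01, "ERROR"),
]
ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNCORRECTABLE"),
    (0x20, "MEDIA_CHANGED"),
    (0x10, "ID_NOT_FOUND"),
    (0x08, "MEDIA_CHANGE_REQUEST"),
    (0x04, "ABORTED"),
    (0x02, "NO_MEDIA"),
    (0x01, "ILLEGAL_LENGTH"),
]

# Matches the plain "status=59 error=40" form used in this log.
PAIR_RE = re.compile(r"status=([0-9a-fA-F]{1,2})\b.*?error=([0-9a-fA-F]{1,2})\b")


def decode_bits(value, table):
    """Return the labels of all bits set in value."""
    return [label for mask, label in table if value & mask]


def decode_line(line):
    """Decode the status/error pair in one kernel log line, or return None."""
    match = PAIR_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


if __name__ == "__main__":
    print(decode_line("ad0: HARD READ ERROR blk# 131199 status=59 error=40"))
```

For the lines above this gives READY, DSC, DRQ and ERROR for the status byte and UNCORRECTABLE for the error byte, which is consistent with the replies suggesting an unreadable sector rather than a purely cabling or DMA problem.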
Re: Disk error messages (ad0: HARD READ ERROR blk# xxxxxx)
On Monday 02 January 2006 02:15 pm, [EMAIL PROTECTED] wrote:

Hi there, On my screen, messages like the following keep coming up. I have to reboot multiple times to get it to boot up normally. Does this mean I have to replace the disk, which is relatively new (1-2 years)? Is there any simple way to fix it and avoid that time-consuming task?

ad0: 39205MB Maxtor 6EX [79656/16/63] at ata0-master WDMA2
ad0: HARD READ ERROR blk# 131199
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: DMA problem fallback to PIO mode
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: HARD READ ERROR blk# 3473535 status=59 error=40
ad0: HARD READ ERROR blk# 9240703 status=59 error=40
ad0: HARD READ ERROR blk# 17367167 status=59 error=40
ad0: HARD READ ERROR blk# 17760383 status=59 error=40

Check that your cables are tight. You might even try swapping your drive cable. Other than that, it looks like your drive is failing. You do have backups, don't you?

Beech

--
---
Beech Rintoul - System Administrator - [EMAIL PROTECTED]
/\   ASCII Ribbon Campaign  | NorthWind Communications
\ / - NO HTML/RTF in e-mail  | 201 East 9th Avenue Ste.310
 X  - NO Word docs in e-mail | Anchorage, AK 99501
/ \ - Please visit Alaska Paradise - http://akparadise.byethost33.com
---
Re: Disk error messages (ad0: HARD READ ERROR blk# xxxxxx)
On Mon, Jan 02, 2006 at 11:15:08PM +, [EMAIL PROTECTED] wrote:

Hi there, On my screen, messages like the following keep coming up. I have to reboot multiple times to get it to boot up normally. Does this mean I have to replace the disk, which is relatively new (1-2 years)? Is there any simple way to fix it and avoid that time-consuming task?

ad0: 39205MB Maxtor 6EX [79656/16/63] at ata0-master WDMA2
ad0: HARD READ ERROR blk# 131199
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: DMA problem fallback to PIO mode
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: HARD READ ERROR blk# 3473535 status=59 error=40
ad0: HARD READ ERROR blk# 9240703 status=59 error=40
ad0: HARD READ ERROR blk# 17367167 status=59 error=40
ad0: HARD READ ERROR blk# 17760383 status=59 error=40

I suspect that you have bad sectors on your hard disk drive (and many of them). A good tool to use is Seagate's Seatools (http://www.seagate.com/support/seatools/index.html). Just burn the Seatools Desktop edition to CDROM and boot from it.

- Russell
RE: Disk error messages (ad0: HARD READ ERROR blk# xxxxxx)
[mailto:[EMAIL PROTECTED] On Behalf Of Russell J. Wood
Sent: Monday, January 02, 2006 3:54 PM
To: freebsd-questions@freebsd.org
Subject: Re: Disk error messages (ad0: HARD READ ERROR blk# xx)

On Mon, Jan 02, 2006 at 11:15:08PM +, [EMAIL PROTECTED] wrote:

Hi there, On my screen, messages like the following keep coming up. I have to reboot multiple times to get it to boot up normally. Does this mean I have to replace the disk, which is relatively new (1-2 years)? Is there any simple way to fix it and avoid that time-consuming task?

ad0: 39205MB Maxtor 6EX [79656/16/63] at ata0-master WDMA2
ad0: HARD READ ERROR blk# 131199
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: DMA problem fallback to PIO mode
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 11272319 status=59 error=40
ad0: HARD READ ERROR blk# 131199 status=59 error=40
ad0: HARD READ ERROR blk# 3473535 status=59 error=40
ad0: HARD READ ERROR blk# 9240703 status=59 error=40
ad0: HARD READ ERROR blk# 17367167 status=59 error=40
ad0: HARD READ ERROR blk# 17760383 status=59 error=40

I suspect that you have bad sectors on your hard disk drive (and many of them). A good tool to use is Seagate's Seatools (http://www.seagate.com/support/seatools/index.html). Just burn the Seatools Desktop edition to CDROM and boot from it.
- Russell

After you've checked for loose cables, you might want to take the drive out and check it in another system (using the Seagate or other such tools). If indeed the problem is with DMA, the drive might be ok but the MB is flaky. Perhaps the PC or MB manufacturer has diagnostics with which you can zero in on the latter ugly possibility. In any case, get yourself a backup asap (at least of the user data so that you can recover from a fresh installation.)

Unless you are getting other types of errors, it is probably still possible to copy the drive with dd using bs=512b, and this would be your quickest fix of a hard drive problem. Run fsck on your new disk after the copy.

Good luck,
-gayn

Bristol Systems Inc.
714/532-6776
www.bristolsystems.com
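What a block-by-block copy that tolerates read errors does can be shown in a few lines of Python. This is a sketch of the general idea, similar in spirit to running dd with conv=noerror,sync, and not a replacement for dd or for a dump/restore clone. The function name and the max_bad_blocks limit are hypothetical, and it assumes an unreadable block raises OSError.

```python
def copy_skipping_errors(src_path, dst_path, block_size=512, max_bad_blocks=1000):
    """Copy src to dst one block at a time, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    Gives up once more than max_bad_blocks blocks have failed, since at
    that point the device has probably gone away altogether.
    """
    bad_offsets = []
    offset = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        while True:
            try:
                src.seek(offset)
                block = src.read(block_size)
            except OSError:
                bad_offsets.append(offset)
                if len(bad_offsets) > max_bad_blocks:
                    raise RuntimeError("too many unreadable blocks, giving up")
                block = b"\0" * block_size  # keep later data at the same offset
            else:
                if not block:
                    break  # end of the source
            dst.write(block)
            offset += len(block)
    return bad_offsets
```

A small block size keeps the amount of data lost per bad sector low, at the cost of speed. The zero-filled blocks still leave holes in whatever files owned them, which is why running fsck on the copy afterwards is advised above.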
Re: Non-system disk or disk error
I don't know if this is poor netiquette or not, but I am bumping my own question in case anyone missed it. Basically my 5.4 installation will not boot up from a warm reboot but will boot with no problems from a power-off situation. Thanks.

--- Portie Owner [EMAIL PROTECTED] wrote:

I am sure there is an easy solution to this, but here is my problem, and it is driving me nuts. The error message on boot is Non-system disk or disk error. I only get this message if I do a warm reboot with no power off. If I halt the system, power off and restart, it boots right up.

Computer is a Compaq AP500 (P-II 450 MHz, 700MB RAM, Adaptec SCSI card). The system has two SCSI drives, C: which is at ID 1 and D: which is at ID 2. The OS is FreeBSD 5.4, standard installation using the FreeBSD-only boot manager (I also tried the alternate FreeBSD boot choice). No other OSs reside on the machine, and I have tried to start with a clean DOS Fdisked machine before installing FreeBSD. The PC does not have the Compaq BIOS partition installed, but that does not seem to matter. I have not been able to upgrade the ROM BIOS on this machine, but the Compaq Diagnostics and Setup programs seem to work and report the right information about the disks. I even tried disabling floppy and CD media boot, but that didn't help either.

Thanks,
Portie
Non-system disk or disk error
I am sure there is an easy solution to this, but here is my problem, and it is driving me nuts. The error message on boot is Non-system disk or disk error. I only get this message if I do a warm reboot with no power off. If I halt the system, power off and restart, it boots right up.

Computer is a Compaq AP500 (P-II 450 MHz, 700MB RAM, Adaptec SCSI card). The system has two SCSI drives, C: which is at ID 1 and D: which is at ID 2. The OS is FreeBSD 5.4, standard installation using the FreeBSD-only boot manager (I also tried the alternate FreeBSD boot choice). No other OSs reside on the machine, and I have tried to start with a clean DOS Fdisked machine before installing FreeBSD. The PC does not have the Compaq BIOS partition installed, but that does not seem to matter. I have not been able to upgrade the ROM BIOS on this machine, but the Compaq Diagnostics and Setup programs seem to work and report the right information about the disks. I even tried disabling floppy and CD media boot, but that didn't help either.

Thanks,
Portie
disk error?
Hi all... suddenly today, out of nowhere, this happens (log below), and now I get vchkpw core dumps every few minutes or so. vchkpw is the authorization module for vpopmail... does this mean the disk where vpopmail lives (ad2) is already crapping out?! Thanks... here is the log:

Aug 31 22:53:33 chavo /kernel: ad2: READ command timeout tag=0 serv=0 - resetting
Aug 31 22:53:33 chavo /kernel: ata1: resetting devices .. done
Sep 1 00:36:22 chavo /kernel: ad2: READ command timeout tag=0 serv=0 - resetting
Sep 1 00:36:22 chavo /kernel: ata1: resetting devices .. done
Sep 1 01:12:42 chavo /kernel: ad2: READ command timeout tag=0 serv=0 - resetting
Sep 1 01:12:42 chavo /kernel: ata1: resetting devices .. done
Sep 1 01:49:54 chavo /kernel: ad2: WRITE command timeout tag=0 serv=0 - resetting
Sep 1 01:49:54 chavo /kernel: ata1: resetting devices .. done
Sep 1 01:52:12 chavo /kernel: ad2: WRITE command timeout tag=0 serv=0 - resetting
Sep 1 01:52:12 chavo /kernel: ata1: resetting devices ..
Sep 1 01:52:12 chavo /kernel: ad2: removed from configuration
Sep 1 01:52:12 chavo /kernel: ad3: removed from configuration
Sep 1 01:52:12 chavo /kernel: done
Sep 1 01:53:02 chavo /kernel: handle_workitem_freeblocks: block count
Sep 1 01:54:04 chavo /kernel: handle_workitem_freeblocks: block count
Sep 1 01:55:37 chavo last message repeated 2 times
Re: Disk Error ... back up method
Yance Kowara [EMAIL PROTECTED] writes:

Hi all, I am a FreeBSD newbie... would like to know more about backing up the whole FreeBSD system to a new hard disk. What is the most convenient method of backing up to a new hard disk? Any pointers appreciated.

The question isn't completely clear, but I think the FAQ entry for "How do I move my system over to my huge new disk?" is probably what you're looking for.
http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#NEW-HUGE-DISK

I cut and pasted Aftab's reply to the Disk Error thread ... Thanks in advance.

ASAP
1. fsck -y
2. tunefs (enable soft updates)
3. backup to new hard disk
4. remove this faulty hard disk
Your hard disk is dying.

--
Lowell Gilbert, embedded/networking software engineer, Boston area
http://be-well.ilk.org/~lowell/
RE: Disk Error
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Doug Hardie
Sent: Sunday, March 06, 2005 10:24 PM
To: Aftab Jahan Subedar
Cc: FreeBSD Questions
Subject: Re: Disk Error

I doubt that it's dying. There is only one bad sector. The drive is in constant use. It ran at 100% for almost 12 hours while copying the files and no errors were detected. It's always the same sector with the error.

I've seen something like this once when a drive/BIOS combo lied about the number of blocks the drive had available. The BSD partition was created larger than the actual available blocks, thus whenever the OS sent data to blocks that didn't exist, you got this problem.

If this is set up OK then, as the other poster said, your days on this drive are coming to an end. IDE drives have a number of reserved blocks available that are used internally by the drive to map out bad sectors. When a drive starts going bad, the sectors start failing one by one and the drive maps them out. When it uses up all the reserved blocks, the drive starts returning errors to the operating system.

If this drive supports S.M.A.R.T. and it's enabled and you're running 5.X, then smartmontools might give you some data about the actual real state of the drive, rather than the lies that the drive normally tells the OS.

Ted
Re: Disk Error
Doug Hardie wrote:

I doubt that it's dying. There is only one bad sector. The drive is in constant use. It ran at 100% for almost 12 hours while copying the files and no errors were detected. It's always the same sector with the error.

Just as a note, hard drives now come with a number of spare sectors which they map automatically to replace dead sectors. This is done because all drives ship with a few bad sectors. Usually when errors like this show up, it is because the drive is out of spares. Since problems like these tend to accelerate, it is a good idea at least to consider replacing the disk before you start losing data more than a sector at a time.

You might consider getting smartmontools and seeing what the drive's diagnostics have to say. Usually that will tell you if this is a fluke or a symptom of a failing drive.
Disk Error ... back up method
Hi all, I am a FreeBSD newbie... would like to know more about backing up the whole FreeBSD system to a new hard disk. What is the most convenient method of backing up to a new hard disk? Any pointers appreciated.

I cut and pasted Aftab's reply to the Disk Error thread ... Thanks in advance.

ASAP
1. fsck -y
2. tunefs (enable soft updates)
3. backup to new hard disk
4. remove this faulty hard disk
Your hard disk is dying.
Disk Error
I have been getting the following disk errors consistently for the last month.

ad2s1e: hard error reading fsbn 6934399 of 3467168-3467295 (ad2s1 bn 6934399; cn 431 tn 164 sn 52) status=59 error=40
spec_getpages:(#ad/0x20014) I/O read failure: (error=5) bp 0xc5678f94 vp 0xcb5f3a80
size: 65536, resid: 65536, a_count: 65536, valid: 0x0
nread: 0, reqpage: 0, pindex: 504, pcount: 16
vm_fault: pager read error, pid 35441 (expireover)

How do you figure out which file has the problem? expireover's logs are all buffered, so you don't get the last partial buffer. I don't know yet if I can mark that particular sector as bad, but if I can find the file I can at least move it to someplace where it won't get deleted. I chased through the core dump and the only directory indicated, but all of those files are good. I have also tar'd the entire news directory elsewhere and no errors were encountered. The sector is the same every day.
Re: Disk Error
ASAP
1. fsck -y
2. tunefs (enable soft updates)
3. backup to new hard disk
4. remove this faulty hard disk
Your hard disk is dying.

Doug Hardie wrote:

I have been getting the following disk errors consistently for the last month.

ad2s1e: hard error reading fsbn 6934399 of 3467168-3467295 (ad2s1 bn 6934399; cn 431 tn 164 sn 52) status=59 error=40
spec_getpages:(#ad/0x20014) I/O read failure: (error=5) bp 0xc5678f94 vp 0xcb5f3a80
size: 65536, resid: 65536, a_count: 65536, valid: 0x0
nread: 0, reqpage: 0, pindex: 504, pcount: 16
vm_fault: pager read error, pid 35441 (expireover)

How do you figure out which file has the problem? expireover's logs are all buffered, so you don't get the last partial buffer. I don't know yet if I can mark that particular sector as bad, but if I can find the file I can at least move it to someplace where it won't get deleted. I chased through the core dump and the only directory indicated, but all of those files are good. I have also tar'd the entire news directory elsewhere and no errors were encountered. The sector is the same every day.
Re: Disk Error
I doubt that it's dying. There is only one bad sector. The drive is in constant use. It ran at 100% for almost 12 hours while copying the files and no errors were detected. It's always the same sector with the error.

On Mar 7, 2005, at 09:54, Aftab Jahan Subedar wrote:

ASAP
1. fsck -y
2. tunefs (enable soft updates)
3. backup to new hard disk
4. remove this faulty hard disk
Your hard disk is dying.

Doug Hardie wrote:

I have been getting the following disk errors consistently for the last month.

ad2s1e: hard error reading fsbn 6934399 of 3467168-3467295 (ad2s1 bn 6934399; cn 431 tn 164 sn 52) status=59 error=40
spec_getpages:(#ad/0x20014) I/O read failure: (error=5) bp 0xc5678f94 vp 0xcb5f3a80
size: 65536, resid: 65536, a_count: 65536, valid: 0x0
nread: 0, reqpage: 0, pindex: 504, pcount: 16
vm_fault: pager read error, pid 35441 (expireover)

How do you figure out which file has the problem? expireover's logs are all buffered, so you don't get the last partial buffer. I don't know yet if I can mark that particular sector as bad, but if I can find the file I can at least move it to someplace where it won't get deleted. I chased through the core dump and the only directory indicated, but all of those files are good. I have also tar'd the entire news directory elsewhere and no errors were encountered. The sector is the same every day.
disk error or what ?
Hi, I have a SCSI controller:

iir0: Intel Integratd RAID Controller mem 0xfc2f-0xfc2f3fff irq 20 at device 8.0 on pci4

Yesterday I got these messages in my dmesg output:

iir0: SCSI-B, ID 0: last status 0x0107. I/O status: SELECTION_TIMEOUT
iir0: SCSI-B, ID 0: Check cables, termination, termpower, LVDS operation, etc.
iir0: Array Drive 0: Logical Drive 0 SCSI-B, ID 0, LUN 0 failed
iir0: Array Drive 0: FAIL state entered
iir0: SCSI-B, ID 0: Auto Hot Plug started for slot 0
iir0: SCSI-B, ID 0: MPI returned 0x0043
iir0: Bus B: The SCSI controller successfully recovered from a SCSI BUS issue. The issue may still be present on the BUS. Check cables, termination, termpower, LVDS operation, etc
iir0: SCSI-B, ID 0: MPI returned 0x0048

I want to be sure I have understood this right. It seems that Drive 0 has failed, but I want to be sure that is correct. How can I verify that Drive 0 has the problem?

---
Omer Faruk Sen
http://www.EnderUNIX.ORG Software Development Team @ Turkey
http://www.Faruk.NET
For Public key: http://www.enderunix.org/ofsen/ofsen.asc
First Turkish FreeBSD book is out! Go check it.
Have you heard! Turkey's first FreeBSD book is out.
http://www.acikkod.com/freebsd.php
4.9-Stable: Disk error booting from hard drive
Hi all, I am completely stumped by this one. I have a new MSI 1U P1-1000 server: 2.4GHz P4, 1GB RAM, with a 40GB IDE (38166MB ST340014A [77545/16/63]) hard drive on the primary master channel.

Here's the problem: I can install 4.9-Stable, but when I finish and reboot the machine the BIOS reports a disk error, the machine reboots and is trapped in this loop (I've tried setting the box up with a Boot Manager and Standard, but it doesn't work for either). To make things interesting, 5.2-Release works fine: I can install, reboot, everything is cool. I don't understand what is going on here. I even tried re-downloading the 4.9 ISO to check my CD, with no luck. This machine will be a high-use server so I really want to run Stable. What do I need to do?

Thanks in advance,
Max

This is the dmesg from 5.2:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2-RELEASE #0: Sun Jan 11 04:21:45 GMT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
Preloaded elf kernel /boot/kernel/kernel at 0xc0a33000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc0a331f4.
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2391.15-MHz 686-class CPU)
Origin = GenuineIntel Id = 0xf29 Stepping = 9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
real memory = 1073676288 (1023 MB)
avail memory = 1033510912 (985 MB)
ACPI APIC Table: IntelR AWRDACPI
ioapic0 Version 2.0 irqs 0-23 on motherboard
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: IntelR AWRDACPI on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 10 entries at 0xc00fdec0
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x408-0x40b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
device_probe_and_attach: acpi_cpu1 attach returned 6
acpi_tz0: Thermal Zone on acpi0
acpi_button0: Power Button on acpi0
acpi_button1: Sleep Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: Intel 82845 host to AGP bridge mem 0xd000-0xdfff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
uhci0: Intel 82801DB (ICH4) USB controller USB-A port 0xd800-0xd81f irq 16 at device 29.0 on pci0
usb0: Intel 82801DB (ICH4) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801DB (ICH4) USB controller USB-B port 0xd000-0xd01f irq 19 at device 29.1 on pci0
usb1: Intel 82801DB (ICH4) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801DB (ICH4) USB controller USB-C port 0xd400-0xd41f irq 18 at device 29.2 on pci0
usb2: Intel 82801DB (ICH4) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: serial bus, USB at device 29.7 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 30.0 on pci0
pci2: ACPI PCI bus on pcib2
em0: Intel(R) PRO/1000 Network Connection, Version - 1.7.19 port 0xc000-0xc03f mem 0xe200-0xe201 irq 21 at device 5.0 on pci2
em0: Speed:N/A Duplex:N/A
fxp0: Intel 82551 Pro/100 Ethernet port 0xc400-0xc43f mem 0xe202-0xe203,0xe2041000-0xe2041fff irq 23 at device 6.0 on pci2
fxp0: Ethernet address 00:0c:76:4e:78:73
miibus0: MII bus on fxp0
inphy0: i82555 10/100 media interface on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci2: display, VGA at device 7.0 (no driver attached)
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel ICH4 UDMA100 controller port 0xf000-0xf00f,0-0x3,0-0x7,0-0x3,0-0x7 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
pci0: serial bus, SMBus at device 31.3 (no driver attached)
fdc0: Enhanced floppy controller (i82077, NE72065 or clone) port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: Keyboard controller (i8042) port 0x64,0x60 irq 1 on acpi0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
acpi_cpu1: CPU on acpi0
device_probe_and_attach: acpi_cpu1 attach returned 6
orm0: Option ROM at iomem 0xc-0xc7fff on isa0
pmtimer0 on isa0
ppc0: parallel port not found.
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles
vinum crashed disk error
Hello,

We have three IDE drives multiplexed together to make one large partition (for mounting as /usr/local). We were messing with hardware in the box, and when we rebooted, vinum spat out errors about defective objects and the boot came to a halt.

We figured we had left something loose or unplugged on the motherboard, so we shut it down and took a look. Sure enough, the plug in the mobo's secondary IDE channel was loose, so we reseated it and powered the machine up again. We saw that the kernel found all the IDE drives and figured the problem was over.

But vinum had the same problem. It said (loose quotation):

  /dev/ mounted read-only. vinum config not being rebuilt.

And then it spat out the same errors it had the first time.

Unfortunately, the vinum.org domain is having problems of some sort, and the vinum help pages on lemis.com are redirected to the vinum.org site, so I am deprived of a great troubleshooting resource.

System is 4.8-STABLE. Below you will see the output of 'vinum start'.

Any suggestions as to fixing this problem would be greatly appreciated, as we do make production use of this box. Thank you in advance for any words you may be able to offer.

-John

output of 'vinum start':

Warning: defective objects

V bigdisk     State: down      Plexes:   1    Size: 23 GB
P big_plex  C State: faulty    Subdisks: 3    Size: 23 GB
S drive0      State: down      PO:    0  B    Size: 8063 MB
S drive2      State: crashed   PO: 8063 MB    Size: 8063 MB
S drive3      State: crashed   PO:   15 GB    Size: 8063 MB

--
John Fox [EMAIL PROTECTED]
System Administrator, InfoStructure

Gideon: I thought you said don't hold a grudge.
Galen: I don't. I have no surviving enemies...at all.
  -- Crusade, _Racing the Night_
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
vinum crashed disk error -- addendum
Maybe it'll help if I give some more comprehensive information:

vinum -> l
3 drives:
D ide0e       State: up        Device /dev/ad0s1e   Avail: 0/8063 MB (0%)
D ide2e       State: up        Device /dev/ad2s1e   Avail: 0/4031 MB (0%)
D ide3e       State: up        Device /dev/ad3s1e   Avail: 0/8063 MB (0%)

1 volumes:
V bigdisk     State: down      Plexes:   1    Size: 23 GB

1 plexes:
P big_plex  C State: faulty    Subdisks: 3    Size: 23 GB

3 subdisks:
S drive0      State: down      PO:    0  B    Size: 8063 MB
S drive2      State: crashed   PO: 8063 MB    Size: 8063 MB
S drive3      State: crashed   PO:   15 GB    Size: 8063 MB

Thanks,
John
Re: vinum crashed disk error -- addendum
On Tuesday, 23 September 2003 at 11:29:44 -0700, John Fox wrote:
> Maybe it'll help if I give some more comprehensive information:
>
> vinum -> l
> 3 drives:
> D ide0e       State: up        Device /dev/ad0s1e   Avail: 0/8063 MB (0%)
> D ide2e       State: up        Device /dev/ad2s1e   Avail: 0/4031 MB (0%)
> D ide3e       State: up        Device /dev/ad3s1e   Avail: 0/8063 MB (0%)
>
> 1 volumes:
> V bigdisk     State: down      Plexes:   1    Size: 23 GB
>
> 1 plexes:
> P big_plex  C State: faulty    Subdisks: 3    Size: 23 GB
>
> 3 subdisks:
> S drive0      State: down      PO:    0  B    Size: 8063 MB
> S drive2      State: crashed   PO: 8063 MB    Size: 8063 MB
> S drive3      State: crashed   PO:   15 GB    Size: 8063 MB

By itself, this is meaningless. If you have a problem, look at the man page or the web site for information on how to report it.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply or reply to the original recipients.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers
Re: vinum crashed disk error
On Tuesday, 23 September 2003 at 11:21:49 -0700, John Fox wrote:
> Hello,
>
> Three IDE drives multiplexed together to make one large partition
> (for mounting as /usr/local). We were messing with hardware in the
> box and when we rebooted vinum spat out errors about defective
> objects and the boot came to a halt.
>
> We figured we had left something loose or unplugged on the
> motherboard, so we shut it down and took a look. Sure enough, the
> plug in the mobo's secondary IDE channel was loose, so we reseated
> it and powered the machine up again. We saw that the kernel found
> all the IDE drives and figured the problem was over.
>
> But vinum had the same problem. It said (loose quotation):
>
>   /dev/ mounted read-only. vinum config not being rebuilt.
>
> And then spit out the same errors it had the first time.

These messages have a purpose. You shouldn't just ignore them.

> Unfortunately, the vinum.org domain is having problems of some sort,

What sort? More error messages? I have no problem accessing it (and no, it's not here, it's at the other end of the world).

> and the vinum help pages on lemis.com are redirected to the
> vinum.org site,

They're on the same server.

> so I am deprived of a great trouble-shooting resource.

There are still the man pages.

> output of 'vinum start':
>
> Warning: defective objects
>
> V bigdisk     State: down      Plexes:   1    Size: 23 GB
> P big_plex  C State: faulty    Subdisks: 3    Size: 23 GB
> S drive0      State: down      PO:    0  B    Size: 8063 MB
> S drive2      State: crashed   PO: 8063 MB    Size: 8063 MB
> S drive3      State: crashed   PO:   15 GB    Size: 8063 MB
>
> Any suggestions as to fixing this problem would be greatly
> appreciated, as we do make production use of this box.

Do these objects have any relationship to each other? The naming is confusing to say the least.

In general, though, if your drives are up again, and the volume only has one plex, you can use the 'vinum setupstate' command to explicitly set the state to up. You'll then need to save the configuration with saveconfig after you've confirmed that the data is OK.

Greg
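Greg's precondition ("if your drives are up again, and the volume only has one plex") can be checked by eye in the listing, but a small script makes it explicit. The following is only a sketch in Python: the function name not_up and the regular expression are made up for illustration, and the listing format is assumed to be the one John posted (type letter, object name, then "State: ..."). It changes nothing in vinum; it just reports which objects are not up.

```python
import re

# Matches listing lines such as "S drive2 State: crashed ...". The first
# letter is the object type as shown in the thread's listing
# (D drive, V volume, P plex, S subdisk).
LINE = re.compile(r"^([DVPS])\s+(\S+).*?State:\s*(\S+)")

def not_up(listing):
    """Return (type, name, state) for every object whose state is not 'up'."""
    found = []
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(3) != "up":
            found.append(match.groups())
    return found

# Abbreviated copy of the listing from the thread.
listing = """\
D ide0e     State: up       Device /dev/ad0s1e
V bigdisk   State: down     Plexes: 1   Size: 23 GB
P big_plex  C State: faulty Subdisks: 3 Size: 23 GB
S drive0    State: down     PO: 0 B     Size: 8063 MB
S drive2    State: crashed  PO: 8063 MB Size: 8063 MB
"""

for kind, name, state in not_up(listing):
    print(kind, name, state)
```

In John's listing every D line is up while the volume, plex and subdisks are not, which is the situation Greg's suggestion is aimed at.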
Re: hard disk error , run fsck manually
Hi sirs,

Thanks to all who gave me good help and hints on my problem, but I am afraid that I really need to replace the hard disk. I tried David Wolfskill's last suggestion but it failed; I mean I could not get the disk to re-allocate the bad blocks. That is only the /home partition, though. My other partitions, such as /usr, /var and /, are still clean.

At this time I simply want to back up /var (actually the MySQL data), /etc and /usr/local/www/data. Is dump alone enough for the backup?

Once again, thanks so much to all of you.

--- David Wolfskill [EMAIL PROTECTED] wrote:

> Date: Sun, 29 Jun 2003 00:17:27 -0700 (PDT)
> From: manee [EMAIL PROTECTED]
> Subject: Re: hard disk error , run fsck manually
> To: David Wolfskill [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
>
> > i got, after running fsck -p
> >
> >   THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
> >   /dev/ad0s2g (/home)
> >
> > so that i ran fsck
>
> OK.
>
> > and the following messages come
> >
> >   Phase 1
> >   ad0s2g: hard error reading fsbn 98971950 of 21364912-21365023
> >   (ad0s2 bn 98971950; cn 6160 tn 183 sn 21) status=59 error=40;
> >   CAN NOT READ: BLK 21364912
> >   continue? [yn]
> >
> > i had to hit y, and a few messages similar to the above popped up,
> > and before Phase 2 started, i got
> >
> >   FILE SYSTEM STILL DIRTY
> >   PLEASE RERUN fsck MANUALLY.
>
> Well, you had data on your disk that is no longer readable.
>
> If you are lucky, you may be able to get the disk to re-allocate some
> of the bad sectors. If there were more than about 6 or 8 of these,
> though, I suspect that you will need to replace the disk soon enough
> that it is not worth your time.
>
> To try to get the disk to re-allocate block 21364912, I would do:
>
>   dd bs=512 count=1 if=/dev/zero of=/dev/ad0s2g seek=21364912
>
> Note that this has a very high probability of ensuring that whatever
> data is now written to block 21364912 is different from what it had
> been; its only saving grace is that it is data that may possibly be
> readable.
>
> Once you have done this for each block that was reported as
> "CAN NOT READ: BLK", then re-run fsck. Because things are almost
> assuredly going to be inconsistent, you may wish to merely do
>
>   fsck -y
>
> An alternate, and possibly faster, approach would be to skip the fsck
> altogether, and just use newfs. Of course, that will obliterate any
> data you once had on the file system, and you would then need to
> reconstruct the data -- from backups or other sources. But then, you
> may well need to do that anyway, especially for files affected by the
> bad blocks.
>
> > at this point i had to edit /etc/fstab and put /home as read only
> > in order to bring system up and running.
>
> Seems that you have a disk drive that is getting bad enough that its
> continued usefulness is in question.

with best regards,
มานี
http://www.thai-aec.org
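David's "do this for each block that was reported" step is easy to get wrong by hand when fsck reports several blocks. Here is a minimal sketch in Python that only builds the dd command lines from saved fsck output and prints them; it does not run anything. The function name dd_commands is made up for illustration, and it assumes the fsck output contains lines of exactly the form quoted in the thread ("CAN NOT READ: BLK <number>"). The dd arguments are copied from David's message. Remember that each command overwrites the named block with zeros.

```python
import re

# Pattern for the fsck line quoted in the thread, e.g. "CAN NOT READ: BLK 21364912".
BAD_BLOCK = re.compile(r"CAN NOT READ: BLK (\d+)")

def dd_commands(fsck_output, device):
    """Return one dd command per distinct unreadable block, in first-seen order."""
    seen = []
    for match in BAD_BLOCK.finditer(fsck_output):
        block = int(match.group(1))
        if block not in seen:
            seen.append(block)
    # Same dd arguments as in David's message; only the seek value varies.
    return [
        f"dd bs=512 count=1 if=/dev/zero of={device} seek={block}"
        for block in seen
    ]

sample = "CAN NOT READ: BLK 21364912\ncontinue? [yn]\n"
for command in dd_commands(sample, "/dev/ad0s2g"):
    print(command)
```

Printing rather than executing is deliberate: it leaves a chance to read each command before typing it in single-user mode.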
Re: hard disk error , run fsck manually
Hi sirs,

Thanks for your time indeed.

--- David Wolfskill [EMAIL PROTECTED] wrote:

> Reboot your system.
>
> During the 10-second "spinning propeller" count-down, press the space
> bar once. You should see the prompt
>
>   boot
>
> At that point, type
>
>   boot -s
>
> and press Enter. This will enable you to boot into single-user mode.
>
> The machine should show the usual device probes, but instead of
> mounting filesystems and starting daemons, you will get a prompt like:
>
>   Enter full pathname of shell or RETURN for /bin/sh:
>
> At that point, press Enter. The prompt should read
>
>   #
>
> This means that you are in single-user mode; you are running as root.
>
> At this point, I would (first) try
>
>   fsck -p && reboot
>
> That is, do the fsck in "preen" mode; if that works OK, just reboot.
>
> If that does not automatically reboot, you have problems that fsck -p
> cannot fix easily. In that case, try

As expected, I needed to run fsck.

>   fsck
>
> and answer the questions as best you can. If you are (finally!) able
> to get through that OK, try

I got, after running fsck -p:

  THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
  /dev/ad0s2g (/home)

so I ran fsck, and the following messages came up:

  Phase 1
  ad0s2g: hard error reading fsbn 98971950 of 21364912-21365023
  (ad0s2 bn 98971950; cn 6160 tn 183 sn 21) status=59 error=40;
  CAN NOT READ: BLK 21364912
  continue? [yn]

I had to hit y, and a few messages similar to the above popped up. Before Phase 2 started, I got:

  FILE SYSTEM STILL DIRTY
  PLEASE RERUN fsck MANUALLY.

At this point I had to edit /etc/fstab and mark /home read-only in order to bring the system up and running.

>   reboot
>
> and see how far you get.
>
> Peace,
> david
> --
> David H. Wolfskill [EMAIL PROTECTED]
> Based on what I have seen to date, the use of Microsoft products is
> not consistent with reliability. I recommend FreeBSD for reliable
> systems.

Once again, please cc me.

with best regards,
มานี
http://www.thai-aec.org
Re:hard disk error , run fsck manually
--- Alex Zivenko [EMAIL PROTECTED] wrote:

> You just need to run fsck. See the man page.
> I had this problem too, only yesterday. All was fixed with fsck: I
> just gave the root password and logged in in single-user mode. Then I
> just ran fsck with some params.

Thank you for your time, but in my case fsck cannot help. I also tried

  fsck -p -y

and I still got

  FILE SYSTEM STILL DIRTY
  PLEASE RERUN fsck MANUALLY.

What I did was simply put that partition (file system) in read-only mode and exit single-user mode in order to bring the system up and running.

Anyway, thanks so much for your help.

with best regards,
มานี
http://www.thai-aec.org
hard disk error , run fsck manually
Hi sirs,

I am facing a hard disk error because the power went down while the machine was in use. Once the power came back, my machine got stuck at "file system is still dirty, please run fsck manually". The dirty partition is /dev/ad0s2g, the home partition.

I did run fsck /dev/ad0s2g several times but still got the same message. What I decided to do was edit /etc/fstab, put the read-only option on /dev/ad0s2g, and exit single-user mode. I got a message saying that /home was not properly dismounted, as you can see in the attachment. Up to this point, only root can log in.

My question is: is there any method to recover from an fsck error at boot time?

Please cc me since I am not a member of the list.

with best regards,
มานี
http://www.thai-aec.org

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.8-STABLE #2: Sun Jun 1 18:59:28 ICT 2003
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/Bank
Timecounter i8254 frequency 1193182 Hz
CPU: Intel(R) Celeron(TM) CPU1100MHz (1102.51-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x6b1  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 125763584 (122816K bytes)
config di sn0
config di lnc0
config di ie0
config di fe0
config di cs0
config q
avail memory = 118005760 (115240K bytes)
Preloaded elf kernel kernel at 0xc0447000.
Preloaded userconfig_script /boot/kernel.conf at 0xc044709c.
Preloaded elf module snd_via8233.ko at 0xc04470ec.
Preloaded elf module snd_pcm.ko at 0xc0447190.
Preloaded elf module snd_via82c686.ko at 0xc0447230.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 6 entries at 0xc00fdd40
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Host to PCI bridge on motherboard
pci0: PCI bus on pcib0
agp0: VIA Generic host to PCI bridge mem 0xe000-0xe3ff at device 0.0 on pci0
pcib1: PCI to PCI bridge (vendor=1106 device=8601) at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: Trident model 8500 VGA-compatible display device at 0.0 irq 10
isab0: VIA 82C686 PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: VIA 82C686 ATA100 controller port 0xd000-0xd00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: VIA 83C572 USB controller port 0xd400-0xd41f irq 9 at device 7.2 on pci0
usb0: VIA 83C572 USB controller on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pci0: unknown card (vendor=0x1106, dev=0x3057) at 7.4
pcm0: VIA VT82C686A port 0xe400-0xe403,0xe000-0xe003,0xdc00-0xdcff irq 11 at device 7.5 on pci0
pcm0: Avance Logic ALC200/200P ac97 codec
ed0: NE2000 PCI Ethernet (RealTek 8029) port 0xe800-0xe81f irq 11 at device 10.0 on pci0
ed0: address 00:00:21:2d:ad:f7, type NE2000 (16 bit)
orm0: Option ROM at iomem 0xc-0xcbfff on isa0
fdc0: NEC 72065B or clone at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
atkbdc0: Keyboard controller (i8042) at port 0x60,0x64 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, logging disabled
ad0: 38166MB ST340810A [77545/16/63] at ata0-master UDMA100
acd0: CDROM LTN526 at ata1-slave PIO4
Mounting root from ufs:/dev/ad0s2a
ad0s2g: hard error reading fsbn 98971886 of 21364912-21365023 (ad0s2 bn 98971886; cn 6160 tn 182 sn 20) trying PIO mode
ad0: DMA problem fallback to PIO mode
ad0: DMA problem fallback to PIO mode
ad0: DMA problem fallback to PIO mode
ad0: DMA problem fallback to PIO mode
ad0s2g: hard error reading fsbn 98971950 of 21364912-21365023 (ad0s2 bn 98971950; cn 6160 tn 183 sn 21) status=59 error=40
WARNING: /home was not properly dismounted

(dmesg ended here)

Here is the edited /etc/fstab:

# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/ad0s2b     none            swap    sw      0       0
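As a cross-check on the two "hard error" lines in the dmesg output, the cn/tn/sn figures can be turned back into the reported bn value. The sketch below is in Python and rests on an assumption that is not stated anywhere in the thread: a geometry of 255 heads and 63 sectors per track, with the sector counted from 0. That geometry is inferred purely because it makes the arithmetic come out; note that the probe line for ad0 reports [77545/16/63] instead. The function name block_number is made up for illustration.

```python
# Assumed geometry: 255 heads and 63 sectors per track. This is inferred from
# the arithmetic only; the thread does not state it, and the probe line for
# ad0 reports [77545/16/63].
HEADS = 255
SECTORS_PER_TRACK = 63

def block_number(cn, tn, sn):
    """Convert cylinder/track/sector (sector counted from 0) to a linear block number."""
    return (cn * HEADS + tn) * SECTORS_PER_TRACK + sn

print(block_number(6160, 182, 20))  # compare with bn 98971886 in the first message
print(block_number(6160, 183, 21))  # compare with bn 98971950 in the second message
```

Under that assumption both results agree with the bn values the kernel printed, so the two errors sit on adjacent tracks of the same cylinder.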
Re: hard disk error , run fsck manually
On Saturday 28 June 2003 07:17 pm, manee wrote:
> hi sirs,
>
> i face problem of hard disk error because of the power downed during
> using the machine. once the power is coming, my machine stuck at
> file system is still dirty please run fsck manually. the partion
> that is dirty is /dev/ad0s2g, a home partion one.
>
> i did run fsck /dev/ad0s2g several times but still get the same
> message. what i decided to do was that to edit /etc/fstab and put
> read only option for /dev/ad0s2g and exited a single mode. i got a
> message said that /home was not dismount, as you see in the
> attachment. up to this point, only root that can log in.
>
> my question is that are there any method to recover an fsck error
> during boot time?
>
> please cc to me since i do not a member of the list.

You need to do something like fsck -y from single user mode. The fs has to be unmounted to fix it.

Kent
--
Kent Stewart
Richland, WA
http://users.owt.com/kstewart/index.html
Re: slice extends beyond end of disk error on install
Physician heal thyself... I recreated my install disks and the problem disappeared...

Will

--- W. J. Williams [EMAIL PROTECTED] wrote:
> I keep getting the following error when trying to install FreeBSD 4.7
>
>   ad0: 9773MB FUJITSU MPF3102AT [19857/16/63] at ata0-master UDMA 33
>   Mounting root from ufs:/dev/md0c
>   md0s4: slice extends beyond end of disk: truncating from 5 to 8640 sectors
>
> After this message the system just hangs. I have low-level formatted
> the disk twice now, but still the same error. Does anyone know what
> I am doing wrong?
>
> Will

=
Will Williams

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-questions in the body of the message
slice extends beyond end of disk error on install
I keep getting the following error when trying to install FreeBSD 4.7

  ad0: 9773MB FUJITSU MPF3102AT [19857/16/63] at ata0-master UDMA 33
  Mounting root from ufs:/dev/md0c
  md0s4: slice extends beyond end of disk: truncating from 5 to 8640 sectors

After this message the system just hangs. I have low-level formatted the disk twice now, but still the same error. Does anyone know what I am doing wrong?

Will

=
Will Williams
Disk error 0x10 (lba=0x48)
Hello,

I use FreeBSD 4.3 and I am having problems with the installation floppies. The two disks I made work fine on my two other PCs. On a third machine, I cannot boot from kern.flp; I receive the following messages:

  Disk error 0x10 (lba=0x48)
  Disk error 0x10 (lba=0x48)
  No /boot/loader

  FreeBSD/i386 BOOT
  Default: 0:fd(0,a)/kernel
  Boot:
  Disk error 0x10 (lba=0x48)
  No /kernel

  FreeBSD/i386 BOOT
  Default: 0:fd(0,a)/kernel

I can boot DOS and Linux from this floppy drive. Linux is actually installed on the hard disk and is working fine.

It's a second-hand PC that I bought for a good price because the first IDE connector is defective, so the first hard disk is master on the second controller (with a working Linux on it). It's standard hardware, nothing special: Intel PII, Realtek ethernet, Maxtor 91021u2 and floppy drive, no CD.

Can you help me with this problem? Thanks a lot.

Peter
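For what it's worth, the numbers in "Disk error 0x10 (lba=0x48)" can be unpacked a little. Assuming the first value is the BIOS INT 13h status returned to the boot block, 0x10 is the code for an uncorrectable CRC/ECC error on read. The lba value says which sector failed, and the sketch below (Python, with a made-up function name lba_to_chs) converts it to cylinder/head/sector on the assumption that kern.flp is a standard 1.44 MB floppy with 2 heads and 18 sectors per track.

```python
# Assumed geometry of a 1.44 MB floppy: 2 heads, 18 sectors per track.
HEADS = 2
SECTORS_PER_TRACK = 18

def lba_to_chs(lba):
    """Convert a linear block address to (cylinder, head, sector); sectors count from 1."""
    cylinder, rest = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector + 1

print(lba_to_chs(0x48))
```

The same lba appears in all three messages, which suggests one specific spot that this particular drive cannot read, rather than a random failure on every access.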
Looking up disk error codes
Hello all.

Does anybody know how I can figure out what status=51 error=04 means? dmesg reported it for an (obviously) bad disk (damaged by a power outage). I couldn't find a look-up table or anything similar anywhere on the Net that could give me a *precise* answer to my question.

Anybody feeling helpful out there? Thanks in advance.
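One way to read such a pair, assuming the kernel prints the raw ATA status and error registers in hexadecimal, is to split each value into its bits. The sketch below is in Python; the function names decode and explain are made up for illustration, and the bit names follow the older ATA specifications (later revisions renamed or retired some of them), so treat the labels as an approximation rather than a definitive table.

```python
# Bit names of the ATA status and error registers, highest bit first.
# Names follow the older ATA specifications; later revisions renamed or
# retired some bits, so treat the labels as an approximation.
STATUS_BITS = ["BSY", "DRDY", "DF", "DSC", "DRQ", "CORR", "IDX", "ERR"]
ERROR_BITS = ["ICRC", "UNC", "MC", "IDNF", "MCR", "ABRT", "TK0NF", "AMNF"]

def decode(value, names):
    """Return the names of the bits set in an 8-bit register value."""
    return [name for bit, name in zip(range(7, -1, -1), names) if value & (1 << bit)]

def explain(status_hex, error_hex):
    """Format a status/error pair, given as hex strings, as lists of bit names."""
    status = decode(int(status_hex, 16), STATUS_BITS)
    error = decode(int(error_hex, 16), ERROR_BITS)
    return "status=" + "|".join(status) + " error=" + "|".join(error)

print(explain("51", "04"))  # the pair asked about here
print(explain("59", "40"))  # the pair from the fsck thread earlier in this archive
```

Read this way, error=04 has only the ABRT (command aborted) bit set, while error=40 from the other thread has the UNC (uncorrectable data) bit set, which fits a sector that can no longer be read.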