Prolonging a drive's life

2017-08-14 Thread Mikhail T.
One of the four drives in my system is frequently timing out of late, 
although the operation succeeds on a second attempt:


   Aug 14 13:51:59 aldan kernel: (ada4:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
   Aug 14 13:51:59 aldan kernel: (ada4:ahcich5:0:0:0): CAM status: Command timeout
   Aug 14 13:51:59 aldan kernel: (ada4:ahcich5:0:0:0): Retrying command
   Aug 14 13:59:12 aldan kernel: (ada4:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
   Aug 14 13:59:12 aldan kernel: (ada4:ahcich5:0:0:0): CAM status: Command timeout
   Aug 14 13:59:12 aldan kernel: (ada4:ahcich5:0:0:0): Retrying command
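
To keep an eye on how often these timeouts recur while waiting for the replacement, the log lines above can be tallied per device. The following is only a minimal sketch: the names count_timeouts and TIMEOUT_RE are made up for illustration, the sample text is the excerpt above with wrapped lines joined, and on a live system the input would presumably come from the kernel log file instead.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt above (wrapped lines joined).
LOG = """\
Aug 14 13:51:59 aldan kernel: (ada4:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Aug 14 13:51:59 aldan kernel: (ada4:ahcich5:0:0:0): CAM status: Command timeout
Aug 14 13:51:59 aldan kernel: (ada4:ahcich5:0:0:0): Retrying command
Aug 14 13:59:12 aldan kernel: (ada4:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Aug 14 13:59:12 aldan kernel: (ada4:ahcich5:0:0:0): CAM status: Command timeout
Aug 14 13:59:12 aldan kernel: (ada4:ahcich5:0:0:0): Retrying command
"""

# Hypothetical pattern: captures the device name (e.g. "ada4") from lines
# that report a CAM command timeout in the format shown above.
TIMEOUT_RE = re.compile(
    r"kernel: \((?P<dev>\w+):[^)]*\): CAM status: Command timeout"
)


def count_timeouts(text):
    """Return a Counter mapping device name to number of command timeouts."""
    counts = Counter()
    for line in text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    for dev, n in count_timeouts(LOG).items():
        print(dev, n)
```

Only the "CAM status: Command timeout" lines are counted, so each failed command is tallied once regardless of the accompanying "Retrying command" line.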

While I'm getting a replacement, maybe I can use camcontrol to somehow 
lower the operating system's expectations of it? For example, 
"camcontrol negotiate" reports the following about it:


   Current parameters:
   (pass5:ahcich5:0:0:0): SATA revision: 2.x
   (pass5:ahcich5:0:0:0): ATA mode: UDMA6
   (pass5:ahcich5:0:0:0): ATAPI packet length: 0
   (pass5:ahcich5:0:0:0): PIO transaction length: 8192
   (pass5:ahcich5:0:0:0): PMP presence: 0
   (pass5:ahcich5:0:0:0): Number of tags: 32
   (pass5:ahcich5:0:0:0): SATA capabilities: 0030
   (pass5:ahcich5:0:0:0): tagged queueing: enabled

Is there anything I can tweak for it to keep working even if at lower 
speeds?


Also, years ago, some BIOSes had a feature that would "verify" a 
drive -- is there something similar I can trigger with camcontrol or 
smartctl?


Thanks!

   -mi


___
freebsd-hardware@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to "freebsd-hardware-unsubscr...@freebsd.org"


ada vs. da?

2017-08-13 Thread Mikhail T.

The four AHCI-drives in my system appear as both adaX and daX each:

 at scbus2 target 0 lun 0 (ada1,pass2)
 at scbus3 target 0 lun 0 (ada2,pass3)
 at scbus5 target 0 lun 0 (ada3,pass4)
 at scbus6 target 0 lun 0 (ada4,pass5)
 at scbus7 target 0 lun 0 (da0,pass6)
 at scbus7 target 0 lun 1 (da1,pass7)
 at scbus7 target 0 lun 2 (da2,pass8)
 at scbus7 target 0 lun 3 (da3,pass9)

Each one is listed in /var/run/dmesg.boot like this:

   ada2:  ATA8-ACS SATA 3.x device
   ada2: Serial Number Z1F1E8NK
   ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
   ada2: Command Queueing enabled
   ada2: 2861588MB (5860533168 512 byte sectors)
   ada2: quirks=0x1<4K>
   ada2: Previously was known as ad8
   da2:  Removable Direct Access SPC-3 SCSI device
   da2: Serial Number 0195
   da2: 40.000MB/s transfers
   da2: Attempt to query device size failed: NOT READY, Medium not present
   da2: quirks=0x3
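
As a quick sanity check on the ada2 size line above, the megabyte figure follows from the sector count if "MB" is taken to mean 2^20 bytes. This is a small sketch of that arithmetic; the helper name sectors_to_mib is made up for illustration.

```python
def sectors_to_mib(sectors, sector_bytes=512):
    """Convert a sector count to whole mebibytes (2**20 bytes), rounding down."""
    return sectors * sector_bytes // (1024 * 1024)


if __name__ == "__main__":
    sectors = 5860533168                # from the ada2 line above
    print(sectors_to_mib(sectors))      # size in units of 2**20 bytes
    print(sectors * 512 / 10**12)       # same size in decimal terabytes
```

The sector count works out to 2861588 such megabytes, matching the dmesg line, or about 3.0 decimal terabytes.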

What am I supposed to make of it? Can they be accessed through either 
name? What are the advantages of each? If ada is always a better choice, 
how do I make the da ones disappear -- such as from systat's output? 
Thanks!


   -mi



Do I need SAS drives?..

2017-08-09 Thread Mikhail T.
My server has 8 "hot-plug" slots that can accept both SATA and SAS drives. 
SATA ones tend to be cheaper for the same features (like cache size). What am 
I getting for the extra money spent on SAS?

Asking specifically about the protocol differences... It would seem, for 
example, SATA can not be as easily hot-plugged, but with camcontrol(8) that 
should not be a problem, right? What else? Thank you!
-- 
Sent from mobile device, please, pardon shorthand.




monitoring hardware temperatures

2010-12-06 Thread Mikhail T.

Hello!

I have a server (Dell Poweredge 2900), that's loaded with sensors.

While it was in Windows-mode, a utility was able to tell me not only the 
temperature of each CPU-core, but also that of every DIMM!.. One of them 
was running far hotter than others, and I'd like to continue keeping an 
eye on it now that the box runs FreeBSD.


In FreeBSD there is coretemp(4), which is nice, but nothing else... 
There is no hw.acpi.thermal hierarchy either on this box... Yet, the box 
has 6 fans, two power-supplies, plus DIMMs -- all of them with sensors, 
that I can't read...


It seems, in 2007, there was an attempt to introduce OpenBSD's 
sensor-framework:


   http://kerneltrap.org/OpenBSD/BSDCan_2008_Hardware_Sensors_Framework

but it was backed out after being declared a pile of crap and 
festering junkpile by our most mirthful contributor:


   http://docs.freebsd.org/cgi/getmsg.cgi?fetch=193129+0+archive/2007/cvs-all/20071021.cvs-all

until a proper architectural solution has been found. Has that 
happened in the three years that have passed since that lovely discussion? 
Or are we still waiting for someone to design and implement it not 
merely adequately, but perfectly?


If the three other BSD-cousins have had this for a while (NetBSD -- for 
10 years, apparently), continuing to insist on some future perfection 
seems wrong -- we should have this adequate but imperfect method if 
only for cross-BSD compatibility.


Is there, perhaps, a set of patches still secretly maintained by some 
die-hard? I'd love to try it here, and will be very thankful, if it 
gives me the monitoring, that I can not obtain otherwise... Thanks! Yours,


   -mi



Re: monitoring hardware temperatures

2010-12-06 Thread Mikhail T.

On 06.12.2010 07:44, Andriy Gapon wrote:

> Well, that code has support only for a few types of hardware monitoring chips
> (Super I/Os with hardware monitoring function).

Damn, I wish I knew earlier... The machine I'm retiring now -- but which 
was my primary horse 3 years ago -- has Super I/O :-(

> So, it greatly depends on the exact kind of hardware and sensors that you have.
> The first thing you should do is to discover what kind of hardware is used for
> monitoring in your server.
> In your case that data might be provided via IPMI.

Thanks, I'll explore that pointer...

> Especially I am not sure about monitoring DIMM temperature - greatly depends on
> the way that it is actually done.  Perhaps it's reported via SMBus by the DIMMs
> themselves, not sure...

Both NetBSD and OpenBSD (and, likely, DragonFly too) have something 
called sdtemp(4):


   http://fxr.watson.org/fxr/source/dev/i2c/sdtemp.c?v=NETBSD

I thought, that driver would be part of the unfortunate basic support 
for a few sensors...


Anyway, I'll try merging 
http://people.freebsd.org/~avg/sensors9.diff and see what gives...


Isn't it just like Linux, that one needs to get patches from here and 
there to get going :-\ ?


   -mi



Re: monitoring hardware temperatures

2010-12-06 Thread Mikhail T.

On 06.12.2010 14:51, Michael Fuckner wrote:

> did you try to read the data via IPMI?
> kldload ipmi; ipmitool sdr

Interestingly, I was doing just that, when your e-mail arrived...

ipmitool was impressive enough and I'm building openipmi to take a look 
at that too.


I don't see information on each DIMM (yet?), but other information is 
quite useful...


One of the fans, for example, was listed as cr (rather than ok) -- 
which was, apparently, causing all other fans to run at maximum speed 
(the fans in a PowerEdge 2900 are *very* noisy).


I reset it (by pulling it out and back again), and now the box is 
quieting back down...
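

A fan stuck in a non-ok state like this could be spotted automatically by filtering the sensor listing. Below is a rough sketch, assuming the listing has three pipe-separated columns (name, reading, status). The sample text is invented for illustration and is not output from this machine, and the helper name non_ok_sensors is likewise made up.

```python
# Invented sample in the assumed "name | reading | status" layout.
SAMPLE_SDR = """\
FAN 1 RPM        | 3600 RPM          | ok
FAN 2 RPM        | 0 RPM             | cr
Ambient Temp     | 24 degrees C      | ok
"""


def non_ok_sensors(sdr_text):
    """Return (name, reading, status) tuples for sensors whose status is not 'ok'."""
    problems = []
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip lines that do not fit the assumed layout
        name, reading, status = fields
        if status != "ok":
            problems.append((name, reading, status))
    return problems


if __name__ == "__main__":
    for name, reading, status in non_ok_sensors(SAMPLE_SDR):
        print(f"{name}: {reading} [{status}]")
```

Any status other than ok is reported, so sensors that are merely absent or unreadable would be listed too and may need to be filtered out separately.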


The sensors-patches did not add any new entries under hw.sensors 
hierarchy :(


The coretemp(4) stopped functioning, unfortunately... Whereas before, 
when I simply kldload-ed it, it was reporting reasonable temperatures, 
now that I have the sensors-patch merged in, I see nonsense like:


   hw.sensors.cpu0.temp0: -1282,97 degC
   hw.sensors.cpu1.temp0: -1272,97 degC
   hw.sensors.cpu2.temp0: -1282,97 degC
   hw.sensors.cpu3.temp0: -1262,97 degC

Seems like some kind of calibration issue -- the numbers differ from 
each other and change with time... I think, I'll back the patch out as 
it did not give me any new information -- the it- and lm-devices aren't 
found on this box :-(


Anyway, sdtemp(4) -- or equivalent -- is something, I'd like to have...

Thanks! Yours,

   -mi



Re: monitoring hardware temperatures

2010-12-06 Thread Mikhail T.

On 06.12.2010 18:02, Andriy Gapon wrote:

> BTW, you could probably write a simple script employing smbmsg(1) to query the
> DIMMs based on logic in the sdtemp driver.

From OpenBSD's sdtemp man-page, it would seem, the driver uses the iic 
framework (if that's the right word, khmm...)


And on this server I can't get /dev/iic* (nor smb*) to appear despite 
loading everything I could think of (even the viapm):


    3    1 0x80c23000 d22      iic.ko
    4    4 0x80c24000 10e7     iicbus.ko
    5    1 0x80c26000 f16      iicsmb.ko
    6    5 0x80c27000 819      smbus.ko
    7    1 0x80c28000 c02      smb.ko
    8    3 0x80c29000 114f     iicbb.ko
    9    1 0x80c2b000 1df3     ichsmb.ko
   10    1 0x80c2d000 1aed     intpm.ko
   11    1 0x80c2f000 e38      pcf.ko
   12    1 0x80c3 b83      lpbb.ko
   13    1 0x80c31000 368b     ppbus.ko
   14    1 0x80c35000 262a     viapm.ko
Could it be, that the motherboard simply does not have the iic-circuitry 
and that some other method has to be used? Thanks! Yours,


   -mi



Re: monitoring hardware temperatures

2010-12-06 Thread Mikhail T.

On 06.12.2010 18:19, Andriy Gapon wrote:

> Another possibility is that a driver that should be able to handle your hardware
> just doesn't know the particular IDs.
>
> pciconf -lv output could shed some light.

Attached -- it is a vanilla PowerEdge 2900 with just one add-on card 
-- audio...


Thanks! Yours,

   -mi

hos...@pci0:0:0:0:  class=0x06 card=0x80868086 chip=0x25c08086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000X Chipset Memory Controller Hub'
    class      = bridge
    subclass   = HOST-PCI
pc...@pci0:0:2:0:   class=0x060400 card=0x chip=0x25e28086 rev=0x12 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset PCIe x4 Port 2'
    class      = bridge
    subclass   = PCI-PCI
pc...@pci0:0:3:0:   class=0x060400 card=0x chip=0x25e38086 rev=0x12 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset PCIe x4 Port 3'
    class      = bridge
    subclass   = PCI-PCI
pc...@pci0:0:4:0:   class=0x060400 card=0x chip=0x25e48086 rev=0x12 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset PCIe x4 Port 4'
    class      = bridge
    subclass   = PCI-PCI
pci...@pci0:0:5:0:  class=0x060400 card=0x chip=0x25e58086 rev=0x12 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset PCIe x4 Port 5'
    class      = bridge
    subclass   = PCI-PCI
pci...@pci0:0:6:0:  class=0x060400 card=0x chip=0x25f98086 rev=0x12 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset PCIe x8 Port 6-7'
    class      = bridge
    subclass   = PCI-PCI
pci...@pci0:0:7:0:  class=0x060400 card=0x chip=0x25e78086 rev=0x12 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset PCIe x4 Port 7'
    class      = bridge
    subclass   = PCI-PCI
no...@pci0:0:8:0:   class=0x088000 card=0x80868086 chip=0x1a388086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset DMA Engine (5000P)'
    class      = base peripheral
hos...@pci0:0:16:0: class=0x06 card=0x01b11028 chip=0x25f08086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset Error Reporting Registers'
    class      = bridge
    subclass   = HOST-PCI
hos...@pci0:0:16:1: class=0x06 card=0x01b11028 chip=0x25f08086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset Error Reporting Registers'
    class      = bridge
    subclass   = HOST-PCI
hos...@pci0:0:16:2: class=0x06 card=0x01b11028 chip=0x25f08086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset Error Reporting Registers'
    class      = bridge
    subclass   = HOST-PCI
hos...@pci0:0:17:0: class=0x06 card=0x80868086 chip=0x25f18086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset Reserved Registers'
    class      = bridge
    subclass   = HOST-PCI
hos...@pci0:0:19:0: class=0x06 card=0x80868086 chip=0x25f38086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset Reserved Registers'
    class      = bridge
    subclass   = HOST-PCI
hos...@pci0:0:21:0: class=0x06 card=0x80868086 chip=0x25f58086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset FBD Registers'
    class      = bridge
    subclass   = HOST-PCI
hos...@pci0:0:22:0: class=0x06 card=0x80868086 chip=0x25f68086 rev=0x12 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5000 Series Chipset FBD Registers'
    class      = bridge
    subclass   = HOST-PCI
pci...@pci0:0:28:0: class=0x060400 card=0x01b11028 chip=0x26908086 rev=0x09 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '631xESB/632xESB/3100 PCIe Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
uh...@pci0:0:29:0:  class=0x0c0300 card=0x01b11028 chip=0x26888086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '631xESB/632xESB/3100 Chipset USB Universal Host Controller *1'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:1:  class=0x0c0300 card=0x01b11028 chip=0x26898086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '631xESB/632xESB/3100 Chipset USB Universal Host Controller *2'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:2:  class=0x0c0300 card=0x01b11028 chip=0x268a8086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '631xESB/632xESB/3100 Chipset USB Universal Host Controller *3'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:3:  class=0x0c0300 card=0x01b11028 chip=0x268b8086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '631xESB/632xESB/3100 Chipset USB Universal Host

After a disk's disappearance, ar0 (raid5) hung...

2010-11-07 Thread Mikhail T.
All of a sudden, one of the three parts (ad8) of my ar0 threw a fit. As
one might expect, the OS logged the event and told me the array is now
in degraded mode.

Unfortunately, all I/O on the array is hanging. The machine is otherwise
responsive, but processes trying to access the array hang in either
biord or getblk. I'm pretty sure that, if I reboot, things will get
back to normal. But I was hoping to buy some redundancy by using RAID5...

If anybody is interested in any diagnostics -- let me know, I'll hold
off rebooting for 12 hours. Yours,

-mi
