Adam Bahe wrote:
Hello all,
'All' includes me as well, but keep in mind I am not a BTRFS dev.
I have a drive that has been in my btrfs array for about 6 months now.
It was purchased new. It's an IBM-ESXS SAS drive rebranded from an HGST
HUH721010AL4200. Here are the stats. It passed a long
smartctl test, but I'm not sure what to make of it.
[/dev/sdi].write_io_errs 1823
[/dev/sdi].read_io_errs 0
[/dev/sdi].flush_io_errs 0
[/dev/sdi].corruption_errs 0
[/dev/sdi].generation_errs 0
Just a few observations.
You are more likely to get (faster) help from the friendly devs here if
you provide the output of...
btrfs --version
uname -a
btrfs filesystem show
Have you gone through the "regular" stuff?! E.g. things like bad cables,
rerouting cables, checking your power supply (noise, correct voltages),
temperature (your drive is not *that* far off the trip temperature, if
it is 52 C I imagine it could easily hit 65 C with a bit of load),
trying to eliminate other hardware, sound cards, graphics cards etc...
If you run your array in a USB enclosure, weird things may/will/(have to)
happen.
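To put a number on the temperature remark: a minimal Python sketch that reads the two temperature lines as they appear in the quoted smartctl output and reports the headroom before the trip temperature. The function name trip_margin is made up for illustration, and the sample text is just the two lines from this thread.

```python
import re

# The two temperature lines as they appear in the quoted smartctl output.
SAMPLE = """\
Current Drive Temperature:     52 C
Drive Trip Temperature:        65 C
"""

def trip_margin(text):
    """Return (trip - current) in degrees C, or None if either line is missing."""
    cur = re.search(r"Current Drive Temperature:\s+(\d+) C", text)
    trip = re.search(r"Drive Trip Temperature:\s+(\d+) C", text)
    if not (cur and trip):
        return None
    return int(trip.group(1)) - int(cur.group(1))

if __name__ == "__main__":
    print("headroom in C:", trip_margin(SAMPLE))
```

With the figures above the headroom is 13 C. Whether that is enough under load depends on your airflow, so it is worth sampling this while the array is busy.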
=== START OF INFORMATION SECTION ===
Vendor: IBM-ESXS
Product: HUH721010AL4200
Revision: J6R2
User Capacity: 9,931,038,130,176 bytes [9.93 TB]
Logical block size: 4096 bytes
Formatted with type 2 protection
Logical block provisioning type unreported, LBPME=0, LBPRZ=0
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca266a405e4
Serial number: *YOINK*
Device type: disk
Transport protocol: SAS
Local Time is: Sat May 12 03:06:35 2018 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 52 C
Drive Trip Temperature: 65 C
Manufactured in week 33 of year 2017
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 28
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 170
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1848304782540800
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0    1096283      14317.360           0
write:         0        0         0         0       2906      27801.489           0
verify:        0        0         0         0      13027          0.000           0
Non-medium error count: 0
SMART Self-test log
Num  Test              Status       segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                    number   (hours)
# 1  Background long   Completed          -      2466              - [-   -    -]
# 2  Background short  Completed          -      2448              - [-   -    -]
Long (extended) Self Test duration: 65535 seconds [1092.2 minutes]
I have not seen 'correction algorithm invocations' before, but I
expect that such a large drive probably does some of this as part of
regular use. If the number is significantly higher than on your other
drives (assuming they have the same load) I would suspect something is fishy
with your drive. But then again, it's better to ask someone else.
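One way to make that comparison fair across drives with different amounts of traffic is to normalise the invocation count by the gigabytes processed. Below is a rough Python sketch, assuming the error counter rows are laid out the way smartctl normally prints them for SAS drives (seven numeric columns after the row label). The function name invocations_per_gb is hypothetical, and the sample rows are the ones from this thread.

```python
import re

# Rows from the error counter log in this thread, in smartctl's usual layout.
SAMPLE = """\
read:          0        0         0         0    1096283      14317.360           0
write:         0        0         0         0       2906      27801.489           0
verify:        0        0         0         0      13027          0.000           0
"""

ROW = re.compile(r"^(read|write|verify):\s+(.*)$")

def invocations_per_gb(text):
    """Map each operation to invocations per 10^9 bytes processed.

    The value is None when no data has been processed for that operation.
    """
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        fields = m.group(2).split()
        if len(fields) != 7:
            continue  # unexpected layout: skip rather than guess
        invocations = int(fields[4])   # "Correction algorithm invocations"
        gigabytes = float(fields[5])   # "Gigabytes processed [10^9 bytes]"
        rates[m.group(1)] = invocations / gigabytes if gigabytes > 0 else None
    return rates

if __name__ == "__main__":
    for op, rate in invocations_per_gb(SAMPLE).items():
        print(op, "n/a" if rate is None else f"{rate:.3f}")
```

Running the same thing against the output of each drive in the array would show whether this one stands out. It says nothing about what the counter actually means on these drives, which I still don't know.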
I can't RMA the drive as I have no idea how or where to RMA an IBM
branded HGST drive. So if on the off chance someone here is reading
this who can also point me in the right direction, let me know where
to RMA an IBM standalone drive with no FRU.
Uhm... can't you just return the drive where you purchased it?
But is this drive healthy or should I have it replaced? What is the
extent of a write_io_err? Are they somewhat common or a sign of a bad
drive? A scrub returned no errors.
The manual is a bit hard to understand:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-device
It does not say clearly what happens if you have a redundant storage
profile for your (meta)data. Would a write be redirected to another
copy? If yes, would it retry the original write? I *assume* that as long
as you don't get any write errors in your application it works. But
perhaps someone else cares to explain this better (preferably by updating
the manual/wiki).
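Until someone clarifies that, it may help to check whether write_io_errs is still growing. Here is a small Python sketch that picks the non-zero counters out of text in the same format as the device stats pasted in this thread. The function name nonzero_counters is made up, and the sample is simply the five lines from the original post.

```python
import re

# Device stats exactly as pasted in this thread.
SAMPLE = """\
[/dev/sdi].write_io_errs 1823
[/dev/sdi].read_io_errs 0
[/dev/sdi].flush_io_errs 0
[/dev/sdi].corruption_errs 0
[/dev/sdi].generation_errs 0
"""

# Matches "[<device>].<counter> <value>"
LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

def nonzero_counters(text):
    """Return a list of (device, counter, value) for every counter above zero."""
    hits = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m and int(m.group("value")) > 0:
            hits.append((m.group("dev"), m.group("counter"), int(m.group("value"))))
    return hits

if __name__ == "__main__":
    for dev, counter, value in nonzero_counters(SAMPLE):
        print(f"{dev}: {counter} = {value}")
```

As far as I understand the counters are cumulative until they are reset, so a single reading cannot tell you whether the 1823 errors came from one bad moment (a cable glitch, say) or are still accumulating. Comparing two readings taken some days apart would.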
Also what about the correction algorithm invocations? All of my IBM
drives seem to have those. Whereas all of my other drives do not. I
was curious about that too, if anyone knows. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html