On 3/17/2024 4:32 AM, Andrea Venturoli wrote:
> On 3/15/24 19:17, mike tancsa wrote:
>> (da5:mpr0:0:15:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on,
>> reset, or bus device reset occurred)
>
> Hello.
> I know I'm probably blaming the wrong component, but is your PSU up to
> the task?
> How many drives do you have? Are they power-hungrier than the others
> you tried (Samsung ???)?
> Do you have a spare PSU to test/add?
> Probably this is not the cause... still, before you bid farewell to
> 400 bucks...
Hehe, thanks Andrea :) I too don't want to be out the money. The power
supply is for sure a good thing to check. In this case, the main server
chassis is fitted with a pair of redundant 1000W power supplies that
should handle 12 full HDDs, so I am pretty sure 6 SSDs should not
stress it beyond its limits. But I also had 2 other test boxes on the
bench, and the one common variable seems to be the WDs.
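To check whether the resets really do cluster on the WD drives, the sense messages can be tallied per device. This is a minimal sketch in Python, assuming log lines in the same format as the one quoted above; the function name is hypothetical, and where the lines come from (a saved dmesg or /var/log/messages) is up to the reader.

```python
import re
from collections import Counter

# Matches the device name and the sense key / ASC / ASCQ of lines such as
# "(da5:mpr0:0:15:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on,"
SENSE_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):(?P<ctrl>[a-z]+\d+):\d+:\d+:\d+\): "
    r"SCSI sense: (?P<key>[A-Z ]+?) asc:(?P<asc>[0-9a-f]+),(?P<ascq>[0-9a-f]+)"
)


def tally_sense_events(lines):
    """Count SCSI sense events per (device, sense key, asc, ascq)."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line)
        if match:
            key = (
                match.group("dev"),
                match.group("key"),
                match.group("asc"),
                match.group("ascq"),
            )
            counts[key] += 1
    return counts
```

Feeding it the lines of a saved log and printing `counts.most_common()` gives a quick per-drive picture; continuation lines that carry only the description text are ignored.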
I feel like this is a sunk cost I am pushing myself into, but I did do
some more testing. My co-worker came across this post which was
interesting.
https://forum.hddguru.com/viewtopic.php?f=10&t=43284
The very last entry says
"For WD BLUE SA 510 there are some problems with this type of SSD. This
YODA model
To fix the SSD if it is still recognized, use the firmware update tools.
And then do a secure erase or full wipe of the SSD. After this it will
work well. I can give you a link to this utility if it necessary. Also
possible download it from manufacture FTP.
If it is not recognized by the computer or is identified as a SSD
device, there only one way, use production tools with new firmware to
begin the production process by testing the controller and NAND chip and
forming a translator. The SSD will be like brand new.
"
After I did the erase, the tests ran fine for a good 5 cycles and
performance was MUCH smoother and more consistent. But then the drives
started to fail again. So I really wonder if TRIM has something to do
with it, as my test is essentially writing a 250G data set with about 28
million txt files, destroying the dataset and then copying it again.
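For anyone who wants to try a similar write/destroy/copy loop, here is a rough sketch in Python. It is not the actual test used here: it assumes plain directories and stands in for "destroying the dataset" with a recursive delete, which may not generate the same TRIM pattern on the drives. The function name and paths are hypothetical.

```python
import shutil
import time
from pathlib import Path


def run_copy_cycles(source: Path, target: Path, cycles: int) -> list[float]:
    """Copy source to target, delete target, repeat.

    Returns the wall-clock seconds each copy took, so a slowdown or
    stall in later cycles is easy to spot.
    """
    timings = []
    for _ in range(cycles):
        # Start every cycle from a clean target (assumption: a plain
        # directory delete stands in for destroying the dataset).
        if target.exists():
            shutil.rmtree(target)
        start = time.monotonic()
        shutil.copytree(source, target)
        timings.append(time.monotonic() - start)
        # Remove the copy so the next cycle rewrites everything.
        shutil.rmtree(target)
    return timings
```

Something like `run_copy_cycles(Path("/data/src"), Path("/pool/test"), 10)` would run ten cycles; comparing the returned timings against the console log shows which cycle the errors start in.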
I noticed these 2 commits for other drives. I wonder if the WD is having
similar issues.
https://cgit.freebsd.org/src/commit/?h=stable/14&id=bf11fee6a5cf97102f87695185cadb63d5a2a7de
and
https://cgit.freebsd.org/src/commit/?h=stable/14&id=50aa22323424ccea00ef5d8f24e729a480cc77eb
I hope you don't mind me bcc'ing you, Andriy. I noticed you only added
the NCQ quirks for CAM ata and not for CAM scsi. I am running into odd
issues with some WD drives and am wondering if these WD SA 510 drives
have the same root limitation as the Samsungs? However, in my use of
the Samsungs I have not been able to trigger these bugs so far.
---Mike