

On 1/21/24 18:29, David Christensen wrote:

On 1/21/24 14:48, gene heskett wrote:

On 1/21/24 16:13, David Christensen wrote:

On 1/21/24 03:47, gene heskett wrote:

On 1/21/24 01:33, David Christensen wrote:
I am still uncertain if those are internal SSD errors or SATA 
errors. Please check if you see matching errors in dmesg(1).


There aren't any. Those hours would very closely correspond to my 
attempts to rsync and the OOM daemon killed the machine, which it
did around 10 times.  So logging by then had been killed. That to me 
is the smoking gun. 



Kernel ring buffer is renewed with each boot and newer messages 
overwrite older messages.  So, you will want to save or clear the 
ring buffer with dmesg(1), save a SMART full report, exercise the
disk with dd(1) and/or a SMART test, save the ring buffer, save a 
SMART full report, and analyze everything to see if you have disk 
problems, SATA problems, and/or system problems.  Once everything 
passes without error, the disk is ready to be put into service.



2T is enough /home for the nonce. so I'll do the rsync thing going 
the other direction, using it for a backup of /home until I'm ready 
for trixie.


However I am tempted to zero the drives and recreate the raid w/o
formatting since mdadm seems capable of installing its own
filesystems to use the whole drive unpartitioned, giving me a backup 
that sizewise is about the same as the single 2T drive has now.


And although my single experience with lvm over a decade ago was a 
total disaster, made out of used spinning rust I may now see how the 
other 4 2T's assembled as a lvm for amandas vtapes as an 8T lvm to 
backup the whole system, which in addition to the 4 cnc'd machines, 
has over the last 5 years seen a train of 3d printers go by. If all 
3, currently a WIP, get rebuilt, the smallest is 305 by, the largest 
is 400 by.  And all I hope will lay plastic at 200+ mm a second. 
Normal consumer stuff is 40 to 60.


Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some 
breakfast and git to it.



This and other threads have led me to the conclusion that consumer 
SSD's are meant for devices that are off most of the time -- e.g. 
notepad, laptop, desktop, and workstation computers.  If you put them 
into a NAS/ file server and run them 24x7, they will die sometime 
after 2 years.


That has not been my experience at all David, I bought a 4 pack of 
120G ssd's when they were the biggest available and replaced 3 
spinning rust drives that had 50-70k hours on them with these. My cnc 
machines are all wired so power for the mill/lathe/what have you is 
totally controlled by the enable key, f2, so if f2 is off only the 
computer is running. That was at least 6 years ago. Then I installed a 
240G as an extra drive on the rpi4 that runs my biggest lathe and made 
a buildbot out of it to pull linuxcnc-master from github and build it, 
also armhf kernels for linuxcnc's realtime needs.  The 120G 
disappeared in about a year, replaced the adapter with a startech, 
drive was and is just fine. There are now at least 5 years on every one
of those original 120's, zero SSD problems in the whole lot.



I also have small SSD's that have lasted far longer than 2 years on 
mixed duty, including 24x7 (Intel SSD 520 Series 60 GB).  The relevant 
recent threads on this list seem to be 1+ TB Samsung's.



It is interesting to note that BackBlaze does not seem to use Samsung 
SSD's:


https://www.backblaze.com/blog/ssd-edition-2023-mid-year-drive-stats-review/


3.  For Amanda, either add more HDD's to the storage server or build 
another server.  If another server, shut it down when you are not 
using it.



Speaking as someone who has used amanda for about 25 years:

People don't always understand that one of Amanda's prime directives is
to balance the size of an individual backup run by advancing the
level 3 scheduled for tonight to level 0 if this run
is only going to be small. The only guarantee is that if you have a 10 
day schedule, all machines/dle's, will get that level 0 backup not 
more than 10 days after the last one.   You choose how many days long 
that cycle is. I adjust it so the storage is around 75 to 80% used 
after the schedule has stabilized. This may take quite a few such
cycles. It is designed to run every night when things are relatively
quiet; how well this works depends on the other machines it is backing
up being available.


Machines missing at backup time can and will muck things up for this
efficient scheduling. Corporate users of Amanda, used to doing it
their way, backing up the week's business on Friday nights, just don't
understand that the Amanda way, which gets them a 100% coverage backup
by backing up only the differences from the previous run of that dle
every night, is far superior to their Friday night run, when most of
the office's machines are turned off for the weekend. For those cases we
recommend

Re: smartctl cannot access my storage, need syntax help

2024-01-21 Thread David Christensen


On 1/21/24 14:48, gene heskett wrote:

On 1/21/24 16:13, David Christensen wrote:

On 1/21/24 03:47, gene heskett wrote:

On 1/21/24 01:33, David Christensen wrote:
I am still uncertain if those are internal SSD errors or SATA 
errors. Please check if you see matching errors in dmesg(1).


There aren't any. Those hours would very closely correspond to my 
attempts to rsync and the OOM daemon killed the machine, which it did
around 10 times.  So logging by then had been killed. That to me is 
the smoking gun. 



Kernel ring buffer is renewed with each boot and newer messages 
overwrite older messages.  So, you will want to save or clear the ring 
buffer with dmesg(1), save a SMART full report, exercise the disk with
dd(1) and/or a SMART test, save the ring buffer, save a SMART full 
report, and analyze everything to see if you have disk problems, SATA 
problems, and/or system problems.  Once everything passes without 
error, the disk is ready to be put into service.



2T is enough /home for the nonce. so I'll do the rsync thing going 
the other direction, using it for a backup of /home until I'm ready 
for trixie.


However I am tempted to zero the drives and recreate the raid w/o
formatting since mdadm seems capable of installing its own
filesystems to use the whole drive unpartitioned, giving me a backup 
that sizewise is about the same as the single 2T drive has now.


And although my single experience with lvm over a decade ago was a 
total disaster, made out of used spinning rust I may now see how the 
other 4 2T's assembled as a lvm for amandas vtapes as an 8T lvm to 
backup the whole system, which in addition to the 4 cnc'd machines, 
has over the last 5 years seen a train of 3d printers go by. If all 
3, currently a WIP, get rebuilt, the smallest is 305 by, the largest 
is 400 by.  And all I hope will lay plastic at 200+ mm a second.  
Normal consumer stuff is 40 to 60.


Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some breakfast 
and git to it.



This and other threads have led me to the conclusion that consumer 
SSD's are meant for devices that are off most of the time -- e.g. 
notepad, laptop, desktop, and workstation computers.  If you put them 
into a NAS/ file server and run them 24x7, they will die sometime 
after 2 years.


That has not been my experience at all David, I bought a 4 pack of 120G 
ssd's when they were the biggest available and replaced 3 spinning rust 
drives that had 50-70k hours on them with these. My cnc machines are all 
wired so power for the mill/lathe/what have you is totally controlled by 
the enable key, f2, so if f2 is off only the computer is running. That 
was at least 6 years ago. Then I installed a 240G as an extra drive on 
the rpi4 that runs my biggest lathe and made a buildbot out of it to 
pull linuxcnc-master from github and build it, also armhf kernels for 
linuxcnc's realtime needs.  The 120G disappeared in about a year, 
replaced the adapter with a startech, drive was and is just fine. There 
are now at least 5 years on every one of those original 120's, zero SSD
problems in the whole lot.



I also have small SSD's that have lasted far longer than 2 years on 
mixed duty, including 24x7 (Intel SSD 520 Series 60 GB).  The relevant 
recent threads on this list seem to be 1+ TB Samsung's.



It is interesting to note that BackBlaze does not seem to use Samsung SSD's:

https://www.backblaze.com/blog/ssd-edition-2023-mid-year-drive-stats-review/


3.  For Amanda, either add more HDD's to the storage server or build 
another server.  If another server, shut it down when you are not 
using it.



Speaking as someone who has used amanda for about 25 years:

People don't always understand that one of Amanda's prime directives is to
balance the size of an individual backup run by advancing the level 3
scheduled for tonight to level 0 if this run is only
going to be small. The only guarantee is that if you have a 10 day 
schedule, all machines/dle's, will get that level 0 backup not more than 
10 days after the last one.   You choose how many days long that cycle 
is. I adjust it so the storage is around 75 to 80% used after the 
schedule has stabilized. This may take quite a few such cycles.
It is designed to run every night when things are relatively quiet; how well
this works depends on the other machines it is backing up being available.


Machines missing at backup time can and will muck things up for this
efficient scheduling. Corporate users of Amanda, used to doing it their
way, backing up the week's business on Friday nights, just don't understand
that the Amanda way, which gets them a 100% coverage backup by backing up
only the differences from the previous run of that dle every night, is far
superior to their Friday night run, when most of the office's machines are
turned off for the weekend. For those cases we recommend composing two
or more dle files and rigging cron to

Re: smartctl cannot access my storage, need syntax help


On 1/21/24 16:13, David Christensen wrote:

On 1/21/24 03:47, gene heskett wrote:

On 1/21/24 01:33, David Christensen wrote:
I am still uncertain if those are internal SSD errors or SATA errors. 
Please check if you see matching errors in dmesg(1).


There aren't any. Those hours would very closely correspond to my 
attempts to rsync and the OOM daemon killed the machine, which it did
around 10 times.  So logging by then had been killed. That to me is 
the smoking gun. 



Kernel ring buffer is renewed with each boot and newer messages 
overwrite older messages.  So, you will want to save or clear the ring 
buffer with dmesg(1), save a SMART full report, exercise the disk with
dd(1) and/or a SMART test, save the ring buffer, save a SMART full 
report, and analyze everything to see if you have disk problems, SATA 
problems, and/or system problems.  Once everything passes without error, 
the disk is ready to be put into service.



2T is enough /home for the nonce. so I'll do the rsync thing going the 
other direction, using it for a backup of /home until I'm ready for 
trixie.


However I am tempted to zero the drives and recreate the raid w/o
formatting since mdadm seems capable of installing its own
filesystems to use the whole drive unpartitioned, giving me a backup 
that sizewise is about the same as the single 2T drive has now.


And although my single experience with lvm over a decade ago was a 
total disaster, made out of used spinning rust I may now see how the 
other 4 2T's assembled as a lvm for amandas vtapes as an 8T lvm to 
backup the whole system, which in addition to the 4 cnc'd machines, 
has over the last 5 years seen a train of 3d printers go by. If all 3, 
currently a WIP, get rebuilt, the smallest is 305 by, the largest is 
400 by.  And all I hope will lay plastic at 200+ mm a second.  Normal 
consumer stuff is 40 to 60.


Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some breakfast 
and git to it.



This and other threads have led me to the conclusion that consumer SSD's 
are meant for devices that are off most of the time -- e.g. notepad, 
laptop, desktop, and workstation computers.  If you put them into a NAS/ 
file server and run them 24x7, they will die sometime after 2 years.


That has not been my experience at all David, I bought a 4 pack of 120G 
ssd's when they were the biggest available and replaced 3 spinning rust 
drives that had 50-70k hours on them with these. My cnc machines are all 
wired so power for the mill/lathe/what have you is totally controlled by 
the enable key, f2, so if f2 is off only the computer is running. That 
was at least 6 years ago. Then I installed a 240G as an extra drive on 
the rpi4 that runs my biggest lathe and made a buildbot out of it to 
pull linuxcnc-master from github and build it, also armhf kernels for 
linuxcnc's realtime needs.  The 120G disappeared in about a year, 
replaced the adapter with a startech, drive was and is just fine. There 
are now at least 5 years on every one of those original 120's, zero SSD
problems in the whole lot.


So, I suggest:

1.  Build a storage server using NAS or enterprise HDD's.  Use an 
enterprise SSD or DOM for the OS.  Run it 24x7 or shut it down as you like.


2.  Use your Asus PRIME Z370-A II as a workstation.  Install the WD 
Black M.2 NVMe PCIe SSD.  Connect the optical drive to the first 
motherboard SATA port.  Install Debian onto the WD Black.  Then, connect 
the five Samsung EVO 870's to the remaining motherboard SATA ports.  Set 
them up as a 5-way mirror (RAID1).  Use the Samsung RAID as a scratch 
disk for your 3-D work.  As the Samsung's die off, replace them with the 
Gigastones.  Shut it down when you are not using it.


3.  For Amanda, either add more HDD's to the storage server or build 
another server.  If another server, shut it down when you are not using it.



Speaking as someone who has used amanda for about 25 years:

People don't always understand that one of Amanda's prime directives is to
balance the size of an individual backup run by advancing the level 3
scheduled for tonight to level 0 if this run is only
going to be small. The only guarantee is that if you have a 10 day 
schedule, all machines/dle's, will get that level 0 backup not more than 
10 days after the last one.   You choose how many days long that cycle 
is. I adjust it so the storage is around 75 to 80% used after the 
schedule has stabilized. This may take quite a few such cycles.
It is designed to run every night when things are relatively quiet; how well
this works depends on the other machines it is backing up being available.


Machines missing at backup time can and will muck things up for this 
efficient scheduling. Corporate users of Amanda, used to doing it their 
way, backing up the week's business on Friday nights, just don't understand
that the Amanda way gets them a 100% coverage backup by backing up only 
the

Re: smartctl cannot access my storage, need syntax help

2024-01-21 Thread David Christensen


On 1/21/24 03:47, gene heskett wrote:

On 1/21/24 01:33, David Christensen wrote:
I am still uncertain if those are internal SSD errors or SATA errors. 
Please check if you see matching errors in dmesg(1).


There aren't any. Those hours would very closely correspond to my 
attempts to rsync and the OOM daemon killed the machine, which it did
around 10 times.  So logging by then had been killed. That to me is the 
smoking gun. 



Kernel ring buffer is renewed with each boot and newer messages 
overwrite older messages.  So, you will want to save or clear the ring 
buffer with dmesg(1), save a SMART full report, exercise the disk with
dd(1) and/or a SMART test, save the ring buffer, save a SMART full 
report, and analyze everything to see if you have disk problems, SATA 
problems, and/or system problems.  Once everything passes without error, 
the disk is ready to be put into service.
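
The sequence above could be scripted roughly as follows. This is only a sketch: the device path /dev/sdX and the log file names are placeholders, the helper burn_in_plan is made up for illustration, and it only builds the list of commands so that each step can be reviewed before anything is run as root.

```python
from typing import List, Tuple

# (output file, or "" when the output is not saved, argv)
Step = Tuple[str, List[str]]


def burn_in_plan(device: str, prefix: str = "burnin") -> List[Step]:
    """Return the commands for a before/after disk check, in order.

    Nothing is executed here; the caller decides how to run each step
    (for example with subprocess.run as root) and where to store output.
    """
    return [
        # 1. Save the kernel ring buffer and a full SMART report first.
        (f"{prefix}-dmesg-before.txt", ["dmesg"]),
        (f"{prefix}-smart-before.txt", ["smartctl", "-x", device]),
        # 2. Exercise the disk: a full sequential read, then a long self-test.
        ("", ["dd", f"if={device}", "of=/dev/null", "bs=1M"]),
        ("", ["smartctl", "-t", "long", device]),
        # 3. Save both again for comparison once the self-test has finished.
        (f"{prefix}-dmesg-after.txt", ["dmesg"]),
        (f"{prefix}-smart-after.txt", ["smartctl", "-x", device]),
    ]


if __name__ == "__main__":
    for outfile, argv in burn_in_plan("/dev/sdX"):
        target = f"  > {outfile}" if outfile else ""
        print(" ".join(argv) + target)
```

Comparing the two SMART reports and the two dmesg captures afterwards should show whether new errors appeared on the drive, on the SATA link, or not at all.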



2T is enough /home for the nonce. so I'll do the rsync 
thing going the other direction, using it for a backup of /home 
until I'm ready for trixie.


However I am tempted to zero the drives and recreate the raid w/o
formatting since mdadm seems capable of installing its own
filesystems to use the whole drive unpartitioned, giving me a backup 
that sizewise is about the same as the single 2T drive has now.


And although my single experience with lvm over a decade ago was a total 
disaster, made out of used spinning rust I may now see how the other 4 
2T's assembled as a lvm for amandas vtapes as an 8T lvm to backup the 
whole system, which in addition to the 4 cnc'd machines, has over the 
last 5 years seen a train of 3d printers go by. If all 3, currently a 
WIP, get rebuilt, the smallest is 305 by, the largest is 400 by.  And 
all I hope will lay plastic at 200+ mm a second.  Normal consumer stuff 
is 40 to 60.


Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some breakfast 
and git to it.



This and other threads have led me to the conclusion that consumer SSD's 
are meant for devices that are off most of the time -- e.g. notepad, 
laptop, desktop, and workstation computers.  If you put them into a NAS/ 
file server and run them 24x7, they will die sometime after 2 years.



So, I suggest:

1.  Build a storage server using NAS or enterprise HDD's.  Use an 
enterprise SSD or DOM for the OS.  Run it 24x7 or shut it down as you like.


2.  Use your Asus PRIME Z370-A II as a workstation.  Install the WD 
Black M.2 NVMe PCIe SSD.  Connect the optical drive to the first 
motherboard SATA port.  Install Debian onto the WD Black.  Then, connect 
the five Samsung EVO 870's to the remaining motherboard SATA ports.  Set 
them up as a 5-way mirror (RAID1).  Use the Samsung RAID as a scratch 
disk for your 3-D work.  As the Samsung's die off, replace them with the 
Gigastones.  Shut it down when you are not using it.


3.  For Amanda, either add more HDD's to the storage server or build 
another server.  If another server, shut it down when you are not using it.



David

Re: smartctl cannot access my storage, need syntax help


On 1/21/24 04:35, Max Nikulin wrote:


On 21/01/2024 03:23, gene heskett wrote:

Right now nothing in the system is north of 32C, might get to 36C at
the end of a 9 minute build of something in OpenSCAD. 


I would say that 53°C and even 44°C is well above 36°C you expected:

On 21/01/2024 12:48, gene heskett wrote:


SCT Status Version:  3
SCT Version (vendor specific):   256 (0x0100)
Device State:    DST executing in background (3)
Current Temperature:    28 Celsius
Power Cycle Min/Max Temperature: 26/44 Celsius
Lifetime    Min/Max Temperature: 24/53 Celsius
Specified Max Operating Temperature:    70 Celsius
Under/Over Temperature Limit Count:   0/0



Device Statistics (GP Log 0x04)

0x05  =  =   =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1  28  ---  Current Temperature
0x05  0x020  1  53  ---  Highest Temperature
0x05  0x028  1  24  ---  Lowest Temperature
0x05  0x058  1  70  ---  Specified Maximum Operating Temperature


IIRC the fan in the front of an upper drive cage got unplugged for a
while, half an hour maybe, about a year ago while I was doing my annual
D on it.  These SSD's, all of them, have a label claiming they need 5
volts and 1 amp, that is 5 watts, but I don't think that is a steady
load, probably only when writing at 500+ mhz.  1 watt or less of heat is
much closer to normal operation.


Thank you, take care, stay warm, dry and well, Max. Having a heat wave
here, it's up to 21F out at 12:25 pm here, 16" of white stuff on the
front deck, got cold & had to replace the batteries in my smart t-stat
about an hour ago.


Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis

Re: smartctl cannot access my storage, need syntax help

On 1/21/24 01:33, David Christensen wrote:

On 1/20/24 21:48, gene heskett wrote:

New -x version for this SSD attached

 > SMART Attributes Data Structure revision number: 1
 > Vendor Specific SMART Attributes with Thresholds:
 > ID# ATTRIBUTE_NAME  FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
 >   5 Reallocated_Sector_Ct   PO--CK   094   094   010    -    64
 > 183 Runtime_Bad_Block   PO--C-   094   094   010    -    64
 > 187 Uncorrectable_Error_Cnt -O--CK   099   099   000    -    392
 > 195 ECC_Error_Rate  -O-RC-   199   199   000    -    392
 > 199 CRC_Error_Count -OSRCK   099   099   000    -    2

Those attributes are worrisome.  Especially Reallocated_Sector_Ct and 
Runtime_Bad_Block -- I am confident those are inside the SSD.

 >   9 Power_On_Hours  -O--CK   095   095   000    -    21194

That is equivalent to 10.2 years at 40 hours/week.

Machine runs 24/7/365.25
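
The two duty cycles give very different ages for the same Power_On_Hours value. A small sketch of the arithmetic, assuming 52 working weeks per year for the 40 hours/week case and 365.25 days per year for continuous operation (the function name is only illustrative):

```python
HOURS_PER_YEAR_24X7 = 24 * 365.25  # continuous operation
HOURS_PER_YEAR_OFFICE = 40 * 52    # 40 hours/week, 52 weeks/year


def years_of_use(power_on_hours: int, hours_per_year: float) -> float:
    """Convert a SMART Power_On_Hours raw value into years at a given duty."""
    return power_on_hours / hours_per_year


if __name__ == "__main__":
    poh = 21194  # raw value of attribute 9 quoted above
    print(f"40 h/week: {years_of_use(poh, HOURS_PER_YEAR_OFFICE):.1f} years")
    print(f"24x7:      {years_of_use(poh, HOURS_PER_YEAR_24X7):.1f} years")
```

So 21194 hours is about 10.2 years of office-hours use, but only about 2.4 years on a machine that is never switched off.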

 > 241 Total_LBAs_Written  -O--CK   099   099   000    -    38429262625

TBW specification for 1 TB drive is 600TB.  You are at 19.7.

relatively low IOW.
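
For reference, the 19.7 figure can be reproduced from the raw value of attribute 241. This sketch assumes the drive counts 512-byte LBAs (the logical sector size in the report) and takes the 600 TB endurance figure from the message above; the function names are made up for illustration.

```python
def terabytes_written(total_lbas: int, sector_bytes: int = 512) -> float:
    """Convert Total_LBAs_Written into decimal terabytes (10**12 bytes)."""
    return total_lbas * sector_bytes / 1e12


def percent_of_rating(tb_written: float, rated_tbw: float) -> float:
    """Share of the rated endurance that has been consumed, in percent."""
    return 100.0 * tb_written / rated_tbw


if __name__ == "__main__":
    written = terabytes_written(38429262625)
    print(f"written: {written:.1f} TB")
    print(f"of 600 TB rating: {percent_of_rating(written, 600):.1f} %")
```

That works out to roughly 19.7 TB, or a little over 3 % of the rating, so write wear alone does not look like the explanation for the reallocated sectors.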

 > Error 466 [1] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)
 >   When the command that caused the error occurred, the device was active or idle.
 >
 >   After command completion occurred, registers were:
 >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
 >   -- -- -- == -- == == == -- -- -- -- --
 >   40 -- 51 00 40 00 00 1b a4 0d 18 40 00  Error: WP at LBA = 0x1ba40d18 = 463736088
 >
 >   Commands leading to the command that caused the error were:
 >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
 >   -- == -- == -- == == == -- -- -- -- --  ---
 >   61 00 08 00 40 00 00 1b a4 0d 18 40 08  1d+03:35:20.430  WRITE FPDMA QUEUED
 >   60 0a 00 00 38 00 00 70 f1 a4 00 40 07  1d+03:35:20.430  READ FPDMA QUEUED
 >   60 07 80 00 30 00 00 70 f1 3c 80 40 06  1d+03:35:20.430  READ FPDMA QUEUED
 >   61 00 28 00 28 00 00 1b a4 0d 38 40 05  1d+03:35:20.430  WRITE FPDMA QUEUED
 >   47 00 00 00 01 00 00 00 00 00 00 40 02  1d+03:35:20.430  READ LOG DMA EXT
 >
 > Error 465 [0] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)
 > ...
 > Error 464 [3] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)
 > ...
 > Error 463 [2] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)

I am still uncertain if those are internal SSD errors or SATA errors. 
Please check if you see matching errors in dmesg(1).

There aren't any. Those hours would very closely correspond to my 
attempts to rsync and the OOM daemon killed the machine, which it did
around 10 times.  So logging by then had been killed. That to me is the 
smoking gun. 2T is enough /home for the nonce. so I'll do the rsync 
thing going the other direction, using it for a backup of /home until 
I'm ready for trixie.

However I am tempted to zero the drives and recreate the raid w/o
formatting since mdadm seems capable of installing its own
filesystems to use the whole drive unpartitioned, giving me a backup 
that sizewise is about the same as the single 2T drive has now.

And although my single experience with lvm over a decade ago was a total 
disaster, made out of used spinning rust I may now see how the other 4 
2T's assembled as a lvm for amandas vtapes as an 8T lvm to backup the 
whole system, which in addition to the 4 cnc'd machines, has over the 
last 5 years seen a train of 3d printers go by. If all 3, currently a 
WIP, get rebuilt, the smallest is 305 by, the largest is 400 by.  And 
all I hope will lay plastic at 200+ mm a second.  Normal consumer stuff 
is 40 to 60.

Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some breakfast 
and git to it.

Thank you David, take care, stay warm dry and well.

David

.

Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

2024-01-21 Thread Max Nikulin




On 21/01/2024 03:23, gene heskett wrote:

Right now nothing in the system is north of 32C, might get to 36C at
the end of a 9 minute build of something in OpenSCAD. 


I would say that 53°C and even 44°C is well above 36°C you expected:

On 21/01/2024 12:48, gene heskett wrote:


SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        DST executing in background (3)
Current Temperature:                 28 Celsius
Power Cycle Min/Max Temperature:     26/44 Celsius
Lifetime    Min/Max Temperature:     24/53 Celsius
Specified Max Operating Temperature: 70 Celsius
Under/Over Temperature Limit Count:  0/0



Device Statistics (GP Log 0x04)

0x05  =  =   =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1  28  ---  Current Temperature
0x05  0x020  1  53  ---  Highest Temperature
0x05  0x028  1  24  ---  Lowest Temperature
0x05  0x058  1  70  ---  Specified Maximum Operating Temperature

Re: smartctl cannot access my storage, need syntax help

2024-01-20 Thread David Christensen


On 1/20/24 21:48, gene heskett wrote:

New -x version for this SSD attached


> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   5 Reallocated_Sector_Ct   PO--CK   094   094   010    -    64
> 183 Runtime_Bad_Block       PO--C-   094   094   010    -    64
> 187 Uncorrectable_Error_Cnt -O--CK   099   099   000    -    392
> 195 ECC_Error_Rate          -O-RC-   199   199   000    -    392
> 199 CRC_Error_Count         -OSRCK   099   099   000    -    2

Those attributes are worrisome.  Especially Reallocated_Sector_Ct and 
Runtime_Bad_Block -- I am confident those are inside the SSD.



>   9 Power_On_Hours          -O--CK   095   095   000    -    21194

That is equivalent to 10.2 years at 40 hours/week.


> 241 Total_LBAs_Written      -O--CK   099   099   000    -    38429262625

TBW specification for 1 TB drive is 600TB.  You are at 19.7 TB.


> Error 466 [1] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   40 -- 51 00 40 00 00 1b a4 0d 18 40 00  Error: WP at LBA = 0x1ba40d18 = 463736088
>
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  ---
>   61 00 08 00 40 00 00 1b a4 0d 18 40 08  1d+03:35:20.430  WRITE FPDMA QUEUED
>   60 0a 00 00 38 00 00 70 f1 a4 00 40 07  1d+03:35:20.430  READ FPDMA QUEUED
>   60 07 80 00 30 00 00 70 f1 3c 80 40 06  1d+03:35:20.430  READ FPDMA QUEUED
>   61 00 28 00 28 00 00 1b a4 0d 38 40 05  1d+03:35:20.430  WRITE FPDMA QUEUED
>   47 00 00 00 01 00 00 00 00 00 00 40 02  1d+03:35:20.430  READ LOG DMA EXT
>
> Error 465 [0] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)
> ...
> Error 464 [3] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)
> ...
> Error 463 [2] occurred at disk power-on lifetime: 21078 hours (878 days + 6 hours)


I am still uncertain if those are internal SSD errors or SATA errors. 
Please check if you see matching errors in dmesg(1).



David

Re: smartctl cannot access my storage, need syntax help

2024-01-20 Thread gene heskett


On 1/21/24 00:30, Max Nikulin wrote:

On 21/01/2024 03:23, gene heskett wrote:

On 1/20/24 10:24, Max Nikulin wrote:

On 19/01/2024 06:10, gene heskett wrote:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0032   071   049   000    Old_age   Always       -       29


A worst value of 49, down from the initial 100, means that the drive has
run fairly hot at times.


I've been under the impression that 100C was the absolute temp limit


Do not confuse normalized values (100 means shiny new, 0 means really 
old or damaged) and RAW_VALUE. For some drives smartctl -x may report 
history of temperature measurements, but I think summer values are 
already unavailable.


and it has not been over 36C that I know of, according to gkrellm, which
is set to monitor that stuff in real time. Right now nothing in the
system is north of 32C, might get to 36C


71 <-> 29 °C and 49 <-> 36 °C mapping might be possible, but I would 
expect higher temperature for 49.


I read up on the manpage.
New -x version for this SSD attached

Cheers, Gene Heskett.
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 870 EVO 1TB
Serial Number:    S626NF0R302498T
LU WWN Device Id: 5 002538 f413394a5
Firmware Version: SVT01B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jan 21 00:44:08 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 117) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   094   094   010    -    64
  9 Power_On_Hours          -O--CK   095   095   000    -    21194
 12 Power_Cycle_Count       -O--CK   099   099   000    -    86
177 Wear_Leveling_Count     PO--C-   099   099   000    -    23
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   094   094   010    -    64
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183

Re: smartctl cannot access my storage, need syntax help

2024-01-20 Thread Max Nikulin


On 21/01/2024 03:23, gene heskett wrote:

On 1/20/24 10:24, Max Nikulin wrote:

On 19/01/2024 06:10, gene heskett wrote:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0032   071   049   000    Old_age   Always   -           29


Initial 100 decreased to 49 means that sometimes the drive is hot enough.


I've been under the impression that 100C was the absolute temp limit


Do not confuse normalized values (100 means shiny new, 0 means really 
old or damaged) and RAW_VALUE. For some drives smartctl -x may report 
history of temperature measurements, but I think summer values are 
already unavailable.
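
A sketch of the distinction, in case it helps: the snippet below splits one row of the attribute table into its normalized columns (VALUE, WORST, THRESH) and its RAW_VALUE. It assumes the ten-column layout quoted in this thread; the names `SmartAttr`, `parse_attr` and `headroom` are made up for illustration, and other drives or smartctl options may print different columns.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized, current
    worst: int   # normalized, lowest seen so far
    thresh: int  # normalized failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attr(line: str) -> SmartAttr:
    """Split one attribute row (assumed 10-column layout) into fields."""
    f = line.split(None, 9)
    if len(f) < 10:
        raise ValueError(f"unexpected attribute line: {line!r}")
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[9].strip())


def headroom(attr: SmartAttr) -> int:
    """Normalized points left between the worst value seen and the threshold."""
    return attr.worst - attr.thresh


temp = parse_attr(
    "190 Airflow_Temperature_Cel 0x0032   071   049   000    Old_age   Always   -   29"
)
print(temp.value, temp.worst, temp.raw)
```

For the temperature row this yields a normalized 71 (worst 49) next to a raw reading of 29, which shows the two numbers are on different scales.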


and it has not been over 36C that I know of according to gkrellm, which is set 
to monitor that stuff in real time. Right now nothing in the system is 
north of 32C, might get to 36C


71 <-> 29 °C and 49 <-> 36 °C mapping might be possible, but I would 
expect higher temperature for 49.


# 2  Extended offline    Completed: read failure   50%   10917   1847474376
# 3  Extended offline    Completed: read failure   50%   10586   1847474376


May it happen that disk firmware does not remap failed sectors to 
allow the user to identify what file is damaged?


IDK Max. I know the Microware OS-9 file system well enough to connect the 
dots, but have little knowledge of how one might do this with ext4.


If you are motivated enough then docs either for badblocks or for some 
data recovery software may give you a recipe. A search engine should 
help to find it.
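
One commonly described recipe for ext4 starts by turning the failing LBA into a filesystem block number; a sketch of that arithmetic follows. Everything not in the self-test log is an assumption: that the filesystem sits directly on a partition of that disk (not behind md RAID or LVM, which changes the mapping), that logical sectors are 512 bytes, that the ext4 block size is 4096, and the partition start sector of 2048 is purely hypothetical.

```python
def lba_to_fs_block(lba: int, part_start_lba: int,
                    sector_size: int = 512, fs_block_size: int = 4096) -> int:
    """Map a whole-disk LBA to a block number inside one partition's filesystem."""
    if lba < part_start_lba:
        raise ValueError("LBA lies before the start of the partition")
    # Byte offset inside the partition, then whole filesystem blocks.
    return (lba - part_start_lba) * sector_size // fs_block_size


# LBA_of_first_error values from the self-test log; 2048 is a made-up start sector.
for lba in (1847474376, 1847474744):
    print(lba, "->", lba_to_fs_block(lba, part_start_lba=2048))
```

The resulting block number is what debugfs(8) accepts in its `icheck` request to find the owning inode, and `ncheck` then maps that inode to a path name.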

Re: smartctl cannot access my storage, need syntax help

2024-01-20 Thread gene heskett


On 1/20/24 10:24, Max Nikulin wrote:

On 19/01/2024 06:10, gene heskett wrote:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   085   085   010    Pre-fail  Always   -           168
183 Runtime_Bad_Block       0x0013   085   085   010    Pre-fail  Always   -           168


85 is still far enough from 10, however the change is noticeable.

190 Airflow_Temperature_Cel 0x0032   071   049   000    Old_age   Always   -           29


Initial 100 decreased to 49 means that sometimes the drive is hot 
enough.


I've been under the impression that 100C was the absolute temp limit, 
and it has not been over 36C that I know of according to gkrellm, which is set 
to monitor that stuff in real time. Right now nothing in the system is 
north of 32C, might get to 36C at the end of a 9 minute build of 
something in OpenSCAD.


On the other hand the raw value of 29 is likely centigrade
degrees and it is not really hot for the normalized value of 71.


this is true, all reported temps are in C.


SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  50%        21128            1847474744
# 2  Extended offline    Completed: read failure  50%        10917            1847474376
# 3  Extended offline    Completed: read failure  50%        10586            1847474376


May it happen that disk firmware does not remap failed sectors to allow 
the user to identify what file is damaged?


IDK Max. I know the Microware OS-9 file system well enough to connect the 
dots, but have little knowledge of how one might do this with ext4.


Thanks Max, take care & stay well.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis

Re: smartctl cannot access my storage, need syntax help

2024-01-20 Thread Max Nikulin


On 19/01/2024 06:10, gene heskett wrote:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   085   085   010    Pre-fail  Always   -           168
183 Runtime_Bad_Block       0x0013   085   085   010    Pre-fail  Always   -           168


85 is still far enough from 10, however the change is noticeable.


190 Airflow_Temperature_Cel 0x0032   071   049   000    Old_age   Always   -           29


Initial 100 decreased to 49 means that sometimes the drive is hot 
enough. On the other hand the raw value of 29 is likely centigrade 
degrees and it is not really hot for the normalized value of 71.



SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  50%        21128            1847474744
# 2  Extended offline    Completed: read failure  50%        10917            1847474376
# 3  Extended offline    Completed: read failure  50%        10586            1847474376


May it happen that disk firmware does not remap failed sectors to allow 
the user to identify what file is damaged?

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread David Christensen


On 1/19/24 21:34, gene heskett wrote:

On 1/19/24 20:29, Felix Miata wrote:

gene heskett composed on 2024-01-19 19:09 (UTC-0500):

On 1/19/24 15:56, David Christensen wrote:

https://www.cablematters.com/pc-187-156-3-pack-straight-60-gbps-sata-iii-cable.aspx



Cheap enough at 18", ordered 4 packs of 3 for service & build stock,
thanks David.


Among the elements of that page, opened in web browser lacking JS support, was
absence of a price, and also were the following "features":

Serial ATA/150
and
Fast data transfer rate of up to 150 Mbps

Those describe SATA revision 1.0 (1.5 Gbit/s), not SATA revision 2.0 (300MB/s,
3.0 Gbit/s), not SATA revision 3.0 (600MB/s, 6.0 Gbit/s).
https://en.wikipedia.org/wiki/SATA

With JS enabled, the page radically changed to show $8.49 for a 3-pack of 6.0
Gbit/s cables.


They had 2 lengths; the 24" will, if everything isn't good, sign on as 
SATA-II, but the 18" I bought claim SATA-III.



I bought black cables, 18" and 24", straight-straight and straight-90. 
The older ones are labeled "Serial ATA 6G".  The newer ones are labeled 
"Serial ATA3.2".



David

Re: smartctl cannot access my storage, need syntax help


On 1/19/24 20:29, Felix Miata wrote:

gene heskett composed on 2024-01-19 19:09 (UTC-0500):


On 1/19/24 15:56, David Christensen wrote:



No sign of that snipped stuff.


https://www.cablematters.com/pc-187-156-3-pack-straight-60-gbps-sata-iii-cable.aspx



Cheap enough at 18", ordered 4 packs of 3 for service & build stock,
thanks David.


Among the elements of that page, opened in web browser lacking JS support, was
absence of a price, and also were the following "features":

Serial ATA/150
and
Fast data transfer rate of up to 150 Mbps

Those describe SATA revision 1.0 (1.5 Gbit/s), not SATA revision 2.0 (300MB/s,
3.0 Gbit/s), not SATA revision 3.0 (600MB/s, 6.0 Gbit/s).
https://en.wikipedia.org/wiki/SATA

With JS enabled, the page radically changed to show $8.49 for a 3-pack of 6.0
Gbit/s cables.


They had 2 lengths; the 24" will, if everything isn't good, sign on as 
SATA-II, but the 18" I bought claim SATA-III.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread Felix Miata

gene heskett composed on 2024-01-19 19:09 (UTC-0500):

> On 1/19/24 15:56, David Christensen wrote:

> No sign of that snipped stuff.
> 
>> https://www.cablematters.com/pc-187-156-3-pack-straight-60-gbps-sata-iii-cable.aspx

> Cheap enough at 18", ordered 4 packs of 3 for service & build stock, 
> thanks David. 

Among the elements of that page, opened in web browser lacking JS support, was
absence of a price, and also were the following "features":

Serial ATA/150
and
Fast data transfer rate of up to 150 Mbps

Those describe SATA revision 1.0 (1.5 Gbit/s), not SATA revision 2.0 (300MB/s,
3.0 Gbit/s), not SATA revision 3.0 (600MB/s, 6.0 Gbit/s).
https://en.wikipedia.org/wiki/SATA
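
The pairing of Gbit/s and MB/s figures above follows from SATA's 8b/10b line encoding, where every data byte travels as 10 bits on the wire. A small worked sketch of that arithmetic (the function name is made up; reading the page's "150 Mbps" as a mislabeled 150 MB/s is an assumption):

```python
def sata_payload_mb_per_s(line_rate_gbit: float) -> float:
    """Payload rate in MB/s for a SATA line rate in Gbit/s, given 8b/10b encoding."""
    bits_per_data_byte = 10  # 8 data bits are sent as a 10-bit symbol
    return line_rate_gbit * 1e9 / bits_per_data_byte / 1e6


for revision, gbit in (("1.0", 1.5), ("2.0", 3.0), ("3.0", 6.0)):
    print(f"SATA revision {revision}: {gbit} Gbit/s -> {sata_payload_mb_per_s(gbit):.0f} MB/s")
```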

With JS enabled, the page radically changed to show $8.49 for a 3-pack of 6.0
Gbit/s cables.
-- 
Evolution as taught in public schools is, like religion,
based on faith, not based on science.

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata

Re: smartctl cannot access my storage, need syntax help


On 1/19/24 15:56, David Christensen wrote:
No sign of that snipped stuff.


https://www.cablematters.com/pc-187-156-3-pack-straight-60-gbps-sata-iii-cable.aspx


Cheap enough at 18", ordered 4 packs of 3 for service & build stock, 
thanks David.



I call that the "wiggle" test.


So do I but I've had to explain it.  Several times.

Now they'll have to dig me out, got around 16" of white stuff in the 
last 36 hrs. I believe winter has arrived.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread David Christensen


On 1/18/24 23:23, gene heskett wrote:

On 1/19/24 00:55, David Christensen wrote:
I am unclear if those errors are inside the SSD or if they are in the 
SATA communications link between the SSD and the motherboard or HBA 
port and/or main memory (?).  Does dmesg(1) show anything?


I'm not sure what I should be looking for, and I don't see anything that 
is looping to correct an error.  Suggested grep targets?



Here is a dmesg(1) excerpt from 2014 -- Debian 7, good SSD, bad SATA cable:

[    2.086360] ata3.00: ATA-9: INTEL SSDSC2CW060A3, 400i, max UDMA/133
[    2.086365] ata3.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.096265] ata3.00: configured for UDMA/133
[   14.718054] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[   18.449227] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[   20.157693] ata3.00: exception Emask 0x10 SAct 0x40 SErr 0xc1 action 0x6 frozen
[   20.157699] ata3.00: irq_stat 0x0800, interface fatal error
[   20.157703] ata3: SError: { RecovData Handshk LinkSeq }
[   20.157709] ata3.00: failed command: WRITE FPDMA QUEUED
[   20.157716] ata3.00: cmd 61/08:b0:a0:e0:61/00:00:00:00:00/40 tag 22 ncq 4096 out
[   20.157721] ata3.00: status: { DRDY }
[   20.157727] ata3: hard resetting link
[   20.473489] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   20.484835] ata3.00: configured for UDMA/133
[   20.484847] ata3: EH complete
[   21.059825] ata3.00: exception Emask 0x10 SAct 0x4000 SErr 0x400100 action 0x6 frozen
[   21.059831] ata3.00: irq_stat 0x0800, interface fatal error
[   21.059835] ata3: SError: { UnrecovData Handshk }
[   21.059840] ata3.00: failed command: WRITE FPDMA QUEUED
[   21.059848] ata3.00: cmd 61/08:70:50:e2:61/00:00:00:00:00/40 tag 14 ncq 4096 out
[   21.059853] ata3.00: status: { DRDY }
[   21.059859] ata3: hard resetting link
[   21.376135] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   21.397234] ata3.00: configured for UDMA/133
[   21.397246] ata3: EH complete
[   22.590805] ata3.00: exception Emask 0x10 SAct 0x600 SErr 0x400100 action 0x6 frozen
[   22.590811] ata3.00: irq_stat 0x0800, interface fatal error
[   22.590815] ata3: SError: { UnrecovData Handshk }
[   22.590819] ata3.00: failed command: WRITE FPDMA QUEUED
[   22.590826] ata3.00: cmd 61/08:48:f0:ee:1d/00:00:00:00:00/40 tag 9 ncq 4096 out
[   22.590831] ata3.00: status: { DRDY }
[   22.590834] ata3.00: failed command: WRITE FPDMA QUEUED
[   22.590840] ata3.00: cmd 61/08:50:70:ef:1d/00:00:00:00:00/40 tag 10 ncq 4096 out
[   22.590844] ata3.00: status: { DRDY }
[   22.590851] ata3: hard resetting link
[   22.909955] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   22.921525] ata3.00: configured for UDMA/133
[   22.937878] ata3: EH complete
[   22.938635] ata3: limiting SATA link speed to 3.0 Gbps
[   22.938638] ata3.00: exception Emask 0x10 SAct 0x40 SErr 0x400100 action 0x6 frozen
[   22.938640] ata3.00: irq_stat 0x0800, interface fatal error
[   22.938642] ata3: SError: { UnrecovData Handshk }
[   22.938645] ata3.00: failed command: WRITE FPDMA QUEUED
[   22.938648] ata3.00: cmd 61/60:b0:20:28:66/00:00:00:00:00/40 tag 22 ncq 49152 out
[   22.938650] ata3.00: status: { DRDY }
[   22.938652] ata3: hard resetting link
[   23.257418] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   23.269251] ata3.00: configured for UDMA/133
[   23.285387] ata3: EH complete
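
As for grep targets, a minimal sketch: the function below counts, per ata port, the libata messages that stand out in the excerpt above (interface fatal error, failed command, hard resetting link, limiting SATA link speed). The pattern list covers only what this one excerpt shows and is not a complete catalogue; `summarize` and `PATTERNS` are made-up names.

```python
import re
from collections import Counter

# Message fragments taken from the dmesg excerpt above.
PATTERNS = {
    "interface_fatal": re.compile(r"interface fatal error"),
    "failed_command": re.compile(r"failed command:"),
    "hard_reset": re.compile(r"hard resetting link"),
    "speed_limited": re.compile(r"limiting SATA link speed"),
}
# Matches "ata3:" as well as "ata3.00:" and captures the port name.
PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def summarize(log_text: str) -> Counter:
    """Count suspicious libata messages, keyed by (port, label)."""
    counts = Counter()
    for line in log_text.splitlines():
        port = PORT.search(line)
        if port is None:
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(port.group(1), label)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage idea: dmesg > dmesg.txt, then pass the file on standard input.
    for (port, label), n in sorted(summarize(sys.stdin.read()).items()):
        print(port, label, n)
```

A port that shows repeated hard resets followed by a speed limit, as ata3 does above, is the pattern to look for.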


In any case, make sure that you are using SATA III 6 Gbps cables with 
locking connectors for your drives and that all the connections are good.


That's hard to verify once the cables are removed from the packing. All 
are black, with locking clips. There is a cable maker under every tree 
in China so I'm not swearing any are up to specs. I've had cable problems 
in the past but usually a magenta colored one that is over 2 years old. 
If you have a known good source of straight-on cables, please share.  You 
would be doing everyone a favor. 



https://www.cablematters.com/pc-187-156-3-pack-straight-60-gbps-sata-iii-cable.aspx

https://www.cablematters.com/pc-188-156-cable-matters-3-pack-90-degree-right-angle-60-gbps-sata-iii-cable-18-inches.aspx


Test what you have by taking a wooden stick and moving each one a 
centimeter or so, if the log blows up with sata resets, bingo, bad 
cable. replace it asap.



I call that the "wiggle" test.


David

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread David Christensen


On 1/19/24 00:03, Anssi Saari wrote:

My only mdraid was on raw partitions but that never had any issues. I
think zfs effectively does the same, no partitions.



You can do it either way on ZFS.


David

Re: To partition or not to partition MD arrays (Was Re: smartctl cannotaccess my storage, need syntax help)

2024-01-19 Thread Franco Martelli


On 19/01/24 at 20:14, Nicolas George wrote:

Franco Martelli (12024-01-19):

One case against using partitions on mdraid: if your array gets messed
up, you get to recreate those partition tables yourself and that's just
hilarious if you don't have a backup. Happened to a friend of mine,
reason was a UPS brownout.

How can I get a backup of mdadm RAID partition?


You do not need a backup of the RAID partitions, that would be terribly
inefficient. You need a backup of the partition table.


Yes, I agree of course. I was asking Anssi this because it looks 
strange to me to have a backup of the partitions, as he pointed out (as 
I understand it)




Which, if you are organized, you already have in
$notes_dir/$hostname/install.md as something that looks like this:

```
sudo sfdisk /dev/sdX <
```

The partitions table of my HDD is part of my backup.

Cheers,

--
Franco Martelli

Re: To partition or not to partition MD arrays (Was Re: smartctl cannotaccess my storage, need syntax help)

2024-01-19 Thread Nicolas George

Franco Martelli (12024-01-19):
> > One case against using partitions on mdraid: if your array gets messed
> > up, you get to recreate those partition tables yourself and that's just
> > hilarious if you don't have a backup. Happened to a friend of mine,
> > reason was a UPS brownout.
> How can I get a backup of mdadm RAID partition?

You do not need a backup of the RAID partitions, that would be terribly
inefficient. You need a backup of the partition table.

Which, if you are organized, you already have in
$notes_dir/$hostname/install.md as something that looks like this:

```
sudo sfdisk /dev/sdX <
```


Re: To partition or not to partition MD arrays (Was Re: smartctl cannotaccess my storage, need syntax help)

2024-01-19 Thread Franco Martelli


On 19/01/24 at 09:03, Anssi Saari wrote:

One case against using partitions on mdraid: if your array gets messed
up, you get to recreate those partition tables yourself and that's just
hilarious if you don't have a backup. Happened to a friend of mine,
reason was a UPS brownout.


How can I get a backup of mdadm RAID partition? And which tool would back up 
the whole disks of an array? The only tool that comes to mind is 
"dd", which isn't a viable solution for me.
I think it is useless to back up the raw data stored in a partition or the 
whole disk. I back up files and directories stored in the filesystem, not 
raw data. If an error occurs in the RAID, mdadm takes care to warn me 
via email... I hope!



I think he scanned his disks for copies of
the superblock but didn't find any and then somehow with a lot of hassle
eventually figured out what the partition tables were.

So in a catastrophe, partition tables are one more obstacle to cross
before you can start actually recovering your data.


I too ran into a catastrophe scenario: I had lost /dev/md0. The reason 
was using hibernate (suspend to disk) in a logical volume placed inside 
the RAID. I think it damaged the RAID metadata.
I got out of this using Debian-installer. I thought that I had lost 
everything and I prepared for a reinstall. When Debian-installer asked me 
to create the new RAID I specified all four partitions, I saved, and 
magically the logical device and all my logical volumes, embedded in the 
old RAID, reappeared. Partitioning was not a problem in those circumstances.




My only mdraid was on raw partitions but that never had any issues. I
think zfs effectively does the same, no partitions.


Which raw partitions? Did you maybe mean without partitions? I never 
used zfs, it's full featured; I prefer to keep things simple: RAID -> 
LVM -> ext4


Cheers,
--
Franco Martelli

Re: smartctl cannot access my storage, need syntax help


On 1/19/24 04:50, Thomas Schmitt wrote:

Hi,

Anssi Saari

It does seem strange to me, even in MS-DOS era I was able to set a
terminal scrollback to 5000 lines without issue, when RAM was maybe 4 MB
and a DOS terminal program probably had access to way less than that.


I have no problems with 130 xterms of 10,000 lines each.



So does rsync really generate gigabytes of verbose output?


rsync can be extremely verbose when the number of transferred files is
very high.



Or is xfce-terminal storing the scrollback in a very inefficient way?


I would not be astonished to learn that the luxury ornamented terminals
of the various desktops waste many extra bytes when memorizing plain text.
But the real bug is the fact that the scroll back memory is unlimited and
can summon the OOM killer. (I imagine it like the Discworld Death of Rats.)

If i were a user of Xfce i would report this as bug to its Debian
maintainer. Bug title "xfce-terminal: A landmine on the kids' playground".


Have a nice day :)

Thomas

.

Excellent description Thomas. Love it.
Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/19/24 03:12, Anssi Saari wrote:

gene heskett  writes:


The OOM death of the system was the xfce4 terminal apparently being
set for unlimited scrollback and that was eating the memory. Switching
to Konsole, which has the ability to control the scrollback to 200
lines, and it's taken all 32G's as .cache and 1536 1k blocks of swap,
and it's working w/o any OOM actions I've detected.


It does seem strange to me, even in MS-DOS era I was able to set a
terminal scrollback to 5000 lines without issue, when RAM was maybe 4 MB
and a DOS terminal program probably had access to way less than that. So
does rsync really generate gigabytes of verbose output? Or is
xfce-terminal storing the scrollback in a very inefficient way?

That I can't answer, other than -v outputs a full from-/ pathlist for 
every file it touches, and if storing that in RAM, I can sure see it 
eating 32G very quickly when it is moving 335G. It only got around 13G 
moved before OOM struck and killed the system, on each of probably 15 
attempts. Knowing that the tech of an SSD and the common micro-SD has a 
relatively limited actual write speed after it has used up its input 
cache of fast RAM, I took the v off the -av, and then limited it to 
10megs a second. It took around 9 hours and the system acted normally, 
no OOM problems. I have edited the /etc/fstab and am now running on that 
copy for /home.  The raid is now automounted to /raid10 and says it's 
valid, despite the 4th drive's log being a mess. My thoughts are to 
reverse the copy and put it in crontab to keep an up-to-date backup of 
/home until I can re-invent my wrappers for amanda. /home is by far the 
biggest glop of data, and none of my printers or cnc machines will use 
more than 10G each, so I'm inclined to think of the other 8T of drives 
as an lvm managed 8T, which should give me room enough to keep 30 days 
worth of amanda's way of doing things.
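
A quick sanity check on the rate-limited copy: at a capped rate, transfer time is just size divided by rate. The sketch below assumes decimal units (1 GB = 1000 MB) and a steady rate with no overhead; `transfer_hours` is a made-up helper, and the 335G and 10 MB/s figures are the ones mentioned above.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a steady rate_mb_per_s (decimal units)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


print(f"{transfer_hours(335, 10):.1f} hours")
```

That comes out a little over 9 hours, which agrees with the roughly 9 hours the copy took.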


But I'm hibernating for the nonce, I woke up at 6 with 6" of new snow on 
the deck, and the weather fabricators are promising another 24 hours of 
that, might wind up with 3 or 4 feet of it. I've got coffee, the 
freezers are well stocked.  Boring but safe.


All the messy logs were at hour 21027 so that was a single actual event, 
probably caused by OOM.


Take care Anssi, stay warm, dry and well where ever you are.

Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread Thomas Schmitt

Hi,

Anssi Saari
> It does seem strange to me, even in MS-DOS era I was able to set a
> terminal scrollback to 5000 lines without issue, when RAM was maybe 4 MB
> and a DOS terminal program probably had access to way less than that.

I have no problems with 130 xterms of 10,000 lines each.


> So does rsync really generate gigabytes of verbose output?

rsync can be extremely verbose when the number of transferred files is
very high.


> Or is xfce-terminal storing the scrollback in a very inefficient way?

I would not be astonished to learn that the luxury ornamented terminals
of the various desktops waste many extra bytes when memorizing plain text.
But the real bug is the fact that the scroll back memory is unlimited and
can summon the OOM killer. (I imagine it like the Discworld Death of Rats.)

If i were a user of Xfce i would report this as bug to its Debian
maintainer. Bug title "xfce-terminal: A landmine on the kids' playground".


Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread Anssi Saari

gene heskett  writes:

> The OOM death of the system was the xfce4 terminal apparently being
> set for unlimited scrollback and that was eating the memory. Switching
> to Konsole, which has the ability to control the scrollback to 200
> lines, and it's taken all 32G's as .cache and 1536 1k blocks of swap,
> and it's working w/o any OOM actions I've detected.

It does seem strange to me, even in MS-DOS era I was able to set a
terminal scrollback to 5000 lines without issue, when RAM was maybe 4 MB
and a DOS terminal program probably had access to way less than that. So
does rsync really generate gigabytes of verbose output? Or is
xfce-terminal storing the scrollback in a very inefficient way?

Re: smartctl cannot access my storage, need syntax help

2024-01-19 Thread Anssi Saari

Franco Martelli  writes:

> I don't know if it is a good idea, in fact it exists a special
> partition type for RAID array listed in fdisk, I used that for my
> RAID:

One case against using partitions on mdraid: if your array gets messed
up, you get to recreate those partition tables yourself and that's just
hilarious if you don't have a backup. Happened to a friend of mine,
reason was a UPS brownout. I think he scanned his disks for copies of
the superblock but didn't find any and then somehow with a lot of hassle
eventually figured out what the partition tables were.

So in a catastrophe, partition tables are one more obstacle to cross
before you can start actually recovering your data.
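
One way to avoid reconstructing a table by hand is to keep a text dump of it with the notes or backups. A hedged sketch: it shells out to `sfdisk --dump`, needs root to read the device, and the helper names, the `.sfdisk` file suffix and the destination directory are all made up. Restoring is done by feeding the saved file back to `sfdisk` on its standard input.

```python
import subprocess
from pathlib import Path


def dump_command(device: str) -> list[str]:
    """Command line that prints a device's partition table as text."""
    return ["sfdisk", "--dump", device]


def save_partition_table(device: str, dest_dir: str) -> Path:
    """Run sfdisk --dump on device and store the output under dest_dir."""
    dest = Path(dest_dir) / f"{Path(device).name}.sfdisk"
    result = subprocess.run(
        dump_command(device), check=True, capture_output=True, text=True
    )
    dest.write_text(result.stdout)
    return dest


# Hypothetical usage, run as root:
#   save_partition_table("/dev/sdX", "/root/notes")
```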

My only mdraid was on raw partitions but that never had any issues. I
think zfs effectively does the same, no partitions.

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread gene heskett


On 1/19/24 00:55, David Christensen wrote:

On 1/18/24 15:10, gene heskett wrote:

On 1/18/24 16:08, David Christensen wrote:

On 1/18/24 03:47, gene heskett wrote:
I have issued a smartctl -tlong on all 4 drives, results in about 3 
hours.



A SMART long test should find and fix any read errors.

Which has now been done on all 4 SSD. but the log is still a mess. 4th 
one in particular, smartctl -a /dev/sdg attached.



179 Used_Rsvd_Blk_Cnt_Tot   0x0013   085   085   010    Pre-fail  Always   -   168

183 Runtime_Bad_Block       0x0013   085   085   010    Pre-fail  Always   -   168
187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age   Always   -   3275

195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always   -   3275


Error 3332 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)
   When the command that caused the error occurred, the device was 
active or idle.


   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 38 e8 ea 67 40  Error: WP at LBA = 0x0067eae8 = 6810344

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --    
   61 18 38 e8 ea 67 40 07  15:17:03.046  WRITE FPDMA QUEUED
   60 00 30 00 5e a9 40 06  15:17:03.046  READ FPDMA QUEUED
   60 28 28 00 f4 87 40 05  15:17:03.046  READ FPDMA QUEUED
   60 00 20 00 7c a9 40 04  15:17:03.046  READ FPDMA QUEUED
   60 00 18 00 4a a9 40 03  15:17:03.046  READ FPDMA QUEUED

Error 3331 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)


Error 3330 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)


Error 3329 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)


Error 3328 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)



I am unclear if those errors are inside the SSD or if they are in the SATA 
communications link between the SSD and the motherboard or HBA port 
and/or main memory (?).  Does dmesg(1) show anything?


I'm not sure what I should be looking for, and I don't see anything that 
is looping to correct an error.  Suggested grep targets?


In any case, make sure that you are using SATA III 6 Gbps cables with 
locking connectors for your drives and that all the connections are good.


That's hard to verify once the cables are removed from the packing. All 
are black, with locking clips. There is a cable maker under every tree 
in China so I'm not swearing any are up to specs. I've had cable problems 
in the past but usually a magenta colored one that is over 2 years old. 
If you have a known good source of straight-on cables, please share.  You 
would be doing everyone a favor. No hot red need apply. People think it's 
pretty, but the dye that gives the color eats the copper in the cable.


I am the src of the internet legend about that, first observed in the 
early 1970's when all the cb radio mic cables switched from dull red to 
this bright red/magenta as the tx wire in multiconductor cables. And 
that wire literally dissolved the copper in the hot red conductor to a 
dull rusty powder in 2 years.
And it's been doing that same failure in sata cables of that color for a 
decade now.


Test what you have by taking a wooden stick and moving each one a 
centimeter or so, if the log blows up with sata resets, bingo, bad 
cable. replace it asap.



When deploying an SSD into a new role, I like to do a "secure erase" 
followed by a SMART long test.


not fam with that, I usually just reformat.  But I'll not do that 
until I have amanda running again.



Secure erase will erase all of the blocks in the drive, including those 
that are held in reserve.  This both verifies that each block can be 
erased, and provides maximum performance when you put the disk into 
service and start writing to it.




Thanks David, take care & stay well


Likewise.  :-)


David


.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread David Christensen


On 1/18/24 15:10, gene heskett wrote:

On 1/18/24 16:08, David Christensen wrote:

On 1/18/24 03:47, gene heskett wrote:
I have issued a smartctl -tlong on all 4 drives, results in about 3 
hours.



A SMART long test should find and fix any read errors.

Which has now been done on all 4 SSD. but the log is still a mess. 4th 
one in particular, smartctl -a /dev/sdg attached.



179 Used_Rsvd_Blk_Cnt_Tot   0x0013   085   085   010    Pre-fail  Always   -   168

183 Runtime_Bad_Block       0x0013   085   085   010    Pre-fail  Always   -   168
187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age   Always   -   3275

195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always   -   3275


Error 3332 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)
  When the command that caused the error occurred, the device was 
active or idle.


  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 38 e8 ea 67 40  Error: WP at LBA = 0x0067eae8 = 6810344

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --    
  61 18 38 e8 ea 67 40 07  15:17:03.046  WRITE FPDMA QUEUED
  60 00 30 00 5e a9 40 06  15:17:03.046  READ FPDMA QUEUED
  60 28 28 00 f4 87 40 05  15:17:03.046  READ FPDMA QUEUED
  60 00 20 00 7c a9 40 04  15:17:03.046  READ FPDMA QUEUED
  60 00 18 00 4a a9 40 03  15:17:03.046  READ FPDMA QUEUED

Error 3331 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)


Error 3330 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)


Error 3329 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)


Error 3328 occurred at disk power-on lifetime: 21027 hours (876 days + 3 
hours)
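
For reading entries like these, a small sketch of how the numbers fit together. It assumes the 28-bit layout of the register dump, with the low LBA byte in SN, then CL, then CH, and the top four bits in the low nibble of DH; the function names are made up. It reproduces the LBA printed for error 3332 and the "876 days + 3 hours" breakdown of 21027 power-on hours.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN, CL, CH and DH register bytes."""
    # Only the low nibble of DH carries LBA bits 24..27.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def split_power_on_hours(hours: int) -> tuple[int, int]:
    """Split a power-on hour count into (whole days, remaining hours)."""
    return divmod(hours, 24)


lba = lba28_from_registers(sn=0xE8, cl=0xEA, ch=0x67, dh=0x40)
print(hex(lba), lba)
print(split_power_on_hours(21027))
```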



I am unclear if those errors are inside the SSD or if they are in the SATA 
communications link between the SSD and the motherboard or HBA port 
and/or main memory (?).  Does dmesg(1) show anything?



In any case, make sure that you are using SATA III 6 Gbps cables with 
locking connectors for your drives and that all the connections are good.



When deploying an SSD into a new role, I like to do a "secure erase" 
followed by a SMART long test.


not fam with that, I usually just reformat.  But I'll not do that until 
I have amanda running again.



Secure erase will erase all of the blocks in the drive, including those 
that are held in reserve.  This both verifies that each block can be 
erased, and provides maximum performance when you put the disk into 
service and start writing to it.




Thanks David, take care & stay well


Likewise.  :-)


David

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread David Wright

On Thu 18 Jan 2024 at 00:57:07 (-0800), David Christensen wrote:
> On 1/17/24 22:44, gene heskett wrote:
> > One thing that bothers me is there is no way the installers parted
> > shows partition names for non-raid disks. To me that is a serious
> > bug. It appears from the help that it can LABEL a partition but
> > can't read that LABEL.
> 
> When installing to UEFI/GPT, I am able to label partitions in the
> Debian Installer, the labels are visible in the installer, and the
> labels persist on disk after installation is complete.

Agreed, and that doesn't depend on UEFI; MBR/GPT disks show
the same behaviour. But those are PARTLABELS.

But it may be that Gene meant filesystem LABELs.

Gene, to check/display the LABELs, just place, in turn, the
highlight on the line for each partition, like:

  │   >   #5   31.5 GB    ext4    Viva-B▒ │

press Return for it to display:

  │ Partition settings:                                 │
  │                                                     │
  │ Name:                  Viva-B                       │
  │ Use as:                Ext4 journaling file system  │
  │                                                     │
  │ Format the partition:  yes, format it               │
  │ Mount point:           /                            │
  │ Mount options:         defaults                     │
  │ Label:                 viva05 ←←                    │
  │ Reserved blocks:       5%                           │
  │                                                     │
  │ Done setting up the partition                       │

where Name: ⇒ PARTLABEL and Label: ⇒ LABEL.

Then select "Done setting up …" or  to back out each time.

Cheers,
David.

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread gene heskett


On 1/18/24 16:08, David Christensen wrote:

On 1/18/24 03:47, gene heskett wrote:

On 1/18/24 03:57, David Christensen wrote:
The old /home RAID10 still has its metadata on disk.  I would install 
the "mdadm" package, edit /etc/fstab, copy and rework the old /home 
line (new mount point, add option "ro"), create the mount point, and 
mount.


I believe mdadm is already installed. At least enough to collect and 
mount this raid10 and use it for /home for the last nearly 2 years.



I made the suggestion to install the "mdadm" package because I thought 
you were going to do a fresh install of Debian.



Now after all this folderol, all 4 of the SSD's are reporting read 
errors at very high lba's.


all 4 drives are reporting the same poh, 21027 hours for the occurrence 
of the error, that sounds like it could be just one crash or dirty 
power down.  In which case it s/b repairable


  Do we have a repair utility that will force the drive to reallocate 
a spare sector and fix those?
I have issued a smartctl -tlong on all 4 drives, results in about 3 
hours.



A SMART long test should find and fix any read errors.

Which has now been done on all 4 SSD. but the log is still a mess. 4th 
one in particular, smartctl -a /dev/sdg attached.


When deploying an SSD into a new role, I like to do a "secure erase" 
followed by a SMART long test.
not fam with that, I usually just reformat.  But I'll not do that until 
I have amanda running again.


Thanks David, take care & stay well


David




.


Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 870 EVO 1TB
Serial Number:    S626NF0R302509W
LU WWN Device Id: 5 002538 f413394b0
Firmware Version: SVT01B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 18 18:02:48 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 117) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   085   085   010    Pre-fail  Always       -       168
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21139
 12 Power_Cycle_Count
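
When comparing several drives, it can help to pull just the raw values out of the attribute table. A small sketch, assuming the usual `smartctl -A` column layout where the raw value is the last field; `smart_raw` is a hypothetical helper, and the sample rows are the two complete ones from the report above.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the name
# and raw value of the attributes whose IDs are listed in $1 (space-separated).
smart_raw() {
    awk -v ids=" $1 " '$1 ~ /^[0-9]+$/ && index(ids, " " $1 " ") { print $2, $NF }'
}

# Sample rows copied from the report above; on a live system one would run
#   smartctl -A /dev/sdg | smart_raw "5 9"
smart_raw "5 9" <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   085   085   010    Pre-fail  Always       -       168
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21139
EOF
```

Attributes whose raw value contains spaces (some temperature entries, for example) would need more careful parsing than taking the last field.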

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread David Christensen


On 1/18/24 03:47, gene heskett wrote:

On 1/18/24 03:57, David Christensen wrote:
The old /home RAID10 still has its metadata on disk.  I would install 
the "mdadm" package, edit /etc/fstab, copy and rework the old /home 
line (new mount point, add option "ro"), create the mount point, and 
mount.


I believe mdadm is already installed. At least enough to collect and 
mount this raid10 and use it for /home for the last nearly 2 years.



I made the suggestion to install the "mdadm" package because I thought 
you were going to do a fresh install of Debian.



Now after all this folderol, all 4 of the SSD's are reporting read 
errors at very high lba's.


all 4 drives are reporting the same poh, 21027 hours for the occurrence 
of the error, that sounds like it could be just one crash or dirty power 
down.  In which case it s/b repairable


  Do we have a repair utility that will force the drive to reallocate a 
spare sector and fix those?

I have issued a smartctl -tlong on all 4 drives, results in about 3 hours.



A SMART long test should find and fix any read errors.


When deploying an SSD into a new role, I like to do a "secure erase" 
followed by a SMART long test.



David

Re: To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)

2024-01-18 Thread Andy Smith

Hi,

On Thu, Jan 18, 2024 at 10:28:30AM -0600, Nicholas Geovanis wrote:
> Sounds like this group has finally achieved a long overdue consensus. How
> many times since LVM was ready for root/boot volumes have I been told that
> using partitions was necessary good practice. Even had that in job
> interviews, where half the team would grin at me saying it and the other
> half scowling at my "poor practice".
> 
> Now we know it was just personal preference all along. Like somebody said
> :-)

Look, if you're going to resolve this thread so quickly all it means
is that someone is going to have to mention home.arpa or their time
zone setting again. We have strict quotas here for the amount of
circular repeating "you don't do things like me therefore you are
wrong and here are a selection of Internet standards to back me up"
threads that must be taking place at once.



Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread Curt

On 2024-01-17, Thomas Schmitt  wrote:
> Hi,
>
> Curt wrote:
>> I discovered a couple of discussions of the phenomenon, the upshot of which
>> were:
>> 1) That's what you get when you purchase cheap SSDs.
>> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
>> 2) SSDs belonging to the same software RAID show identical serial numbers
>> in software, but these numbers don't match the serial numbers printed on the
>> SSDs themselves.
>> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
>
> Those URLs are identical. (OMG ! Is it contagious ?)

Human error may very well be:
https://www.reddit.com/r/synology/comments/18fe6ez/how_to_fix_2_drives_with_same_serial_number/


> Number 2 would match my suspicion that some layer in the disk driving
> gets confused and mixes up the serial numbers.
>
>
>> But you said *similar*.
>
> By "colliding serial numbers" i mean indeed "identical serial numbers".
>
> How cheap the disks may ever be, that would be no excuse for not making
> them individually distinguishable.
>
>
>> As Gene's threads have too many movable parts
>> for me to follow, on that point I couldn't say.
>
> This one begins to gain presence in the web. So one can use search engines
> and AI to untangle its sub-threads. I meanwhile participate in two of them:
> serial number collision, rsync caused OOM killer (solved now, but how ?).
>
>
> Have a nice day :)
>
> Thomas
>
>


--

normally start new xterms [was: Re: smartctl cannot access my storage, need syntax help]

2024-01-18 Thread Max Nikulin


On 18/01/2024 04:20, Thomas Schmitt wrote:


I normally start new xterms by

   xterm -ls -geometry 80x24 -bg wheat -fg black -sl 1 +sb &


Options may be put into ~/.Xresources

xterm*vt100.saveLines: 1
xterm*VT100.background: wheat
xterm*VT100.foreground: black
! etc

Use xrdb to merge changes without restarting X session. It is possible 
to have several presets (-name or -class), see /etc/X11/app-defaults/

Re: To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)

2024-01-18 Thread Nicholas Geovanis

On Wed, Jan 17, 2024, 9:35 PM gene heskett  wrote:

> On 1/17/24 19:54, Steve McIntyre wrote:
> > Andy Smith wrote:
> ...
> >> Then there will just be people going by taste.
> >>
> >> Personally I still put them directly on drives. If I ever get taken
> >> out by one of those crappy motherboards, I reserve the right to get
> >> a different religion. 
> >
> > I'm clearly a member of a third group of people,,, :-)
> >
> > Putting partitions on the RAID drives helps *me* identify them.
> >
> you aren't alone Steve.
> Cheers, Gene Heskett.
>

Sounds like this group has finally achieved a long overdue consensus. How
many times since LVM was ready for root/boot volumes have I been told that
using partitions was necessary good practice. Even had that in job
interviews, where half the team would grin at me saying it and the other
half scowling at my "poor practice".

Now we know it was just personal preference all along. Like somebody said
:-)

>

Re: To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)

2024-01-18 Thread Steve McIntyre

Hey Andy.

Andy Smith wrote:
>
>On Thu, Jan 18, 2024 at 12:53:43AM +, Steve McIntyre wrote:
>> I'm clearly a member of a third group of people,,, :-)
>
>Oh, I didn't mean to imply that those going by taste were in a
>minority! Taste, or possibly, "just never thought about it" could
>well be the biggest group. I was only talking about my observations
>of those who seem to hold strong opinions on this, usually to the
>point where they will advocate "their way" to others.

ACK!

>> Putting partitions on the RAID drives helps *me* identify them.
>
>So, I don't care what people do and I'm not trying to change your
>mind. Would you mind going into what makes "sda1" more identifiable
>for you than "sda" though?
>
>Or is it that you make use of partition labels for some extra info?

If I'm looking at disks on a system, the first thing I'll look for is
the partition table. If a disk has a partition table with "Linux RAID"
partitions visible, that gives me a strong hint of what I should
expect on the disk. Especially if I'm swapping disks around between
systems, commissioning new systems and re-using disks etc.

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
Can't keep my eyes from the circling sky,
Tongue-tied & twisted, Just an earth-bound misfit, I...

Re: To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)

2024-01-18 Thread Andy Smith

Hello,

On Thu, Jan 18, 2024 at 12:53:43AM +, Steve McIntyre wrote:
> I'm clearly a member of a third group of people,,, :-)

Oh, I didn't mean to imply that those going by taste were in a
minority! Taste, or possibly, "just never thought about it" could
well be the biggest group. I was only talking about my observations
of those who seem to hold strong opinions on this, usually to the
point where they will advocate "their way" to others.

> Putting partitions on the RAID drives helps *me* identify them.

So, I don't care what people do and I'm not trying to change your
mind. Would you mind going into what makes "sda1" more identifiable
for you than "sda" though?

Or is it that you make use of partition labels for some extra info?

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread gene heskett


On 1/18/24 03:57, David Christensen wrote:
On 1/17/24 22:44, gene heskett wrote:
>> On 1/18/24 00:50, David Christensen wrote:
The migration took two passes because udev can't make up its alleged 
mind so I was finally forced to use the rescue mode to edit fstab to 
mount it by UUID and that worked, I've got /home on the copy right now.



Congratulations!  :-)


and I took the 60 G's of swap out too since I've never used more than 
20G with any gfx program, so I figure 47G's on /dev/sda is enough. 



1 GB swap works for me.  When a memory leak gets out of control, I do 
not have to wait long for the lock up.



So now none of the raid is mounted, but the 30+ second lag when 
opening a write path is still there, so I was erroneously blaming the 
raid. So I've narrowed the problem 



Good to know.


but w/o a good clue what to do next. 



Find the needle in the haystack or do a fresh install.  I prefer the 
latter, because I can estimate the effort and I am reasonably confident 
of the outcome.



One thing that bothers me is there is no way the installers parted 
shows partition names for non-raid disks. To me that is a serious bug. 
It appears from the help that it can LABEL a partition but can't read 
that LABEL.



When installing to UEFI/GPT, I am able to label partitions in the Debian 
Installer, the labels are visible in the installer, and the labels 
persist on disk after installation is complete.



parted when asked to print all does that just fine, but the | doesn't 
put it to less, so it scrolls off screen the top 60% of a parted's 
print all output at some fraction of C speed. Not exactly helpful. I 
have other things to do while I cogitate on what to do next.



The following works as expected on my machine:

2024-01-18 00:34:41 root@laalaa ~
# parted -l | less



Many thanks to all that helped.



YW.  :-)


If you use rsync(1), I suggest using some kind of integrity checking 
tool to verify that the source and destination file systems are 
identical.  I prefer BSD mtree(8):


I assume I'd have to remount the raid like to /raid?
Whew!  That's got more arguments than rsync...



The old /home RAID10 still has its metadata on disk.  I would install 
the "mdadm" package, edit /etc/fstab, copy and rework the old /home line 
(new mount point, add option "ro"), create the mount point, and mount.


I believe mdadm is already installed. At least enough to collect and 
mount this raid10 and use it for /home for the last nearly 2 years.
Now after all this folderol, all 4 of the SSD's are reporting read 
errors at very high lba's.


all 4 drives are reporting the same poh, 21027 hours for the occurrence 
of the error, that sounds like it could be just one crash or dirty power 
down.  In which case it s/b repairable


 Do we have a repair utility that will force the drive to reallocate a 
spare sector and fix those?

I have issued a smartctl -tlong on all 4 drives, results in about 3 hours.


David

.


Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis

Re: smartctl cannot access my storage, need syntax help

2024-01-18 Thread David Christensen

On 1/17/24 22:44, gene heskett wrote:
>> On 1/18/24 00:50, David Christensen wrote:
The migration took two passes because udev can't make up its alleged 
mind so I was finally forced to use the rescue mode to edit fstab to 
mount it by UUID and that worked, I've got /home on the copy right now.



Congratulations!  :-)


and I took the 60 G's of swap out too since I've never used more than 20G 
with any gfx program, so I figure 47G's on /dev/sda is enough. 



1 GB swap works for me.  When a memory leak gets out of control, I do 
not have to wait long for the lock up.



So now 
none of the raid is mounted, but the 30+ second lag when opening a write 
path is still there, so I was erroneously blaming the raid. So I've 
narrowed the problem 



Good to know.


but w/o a good clue what to do next. 



Find the needle in the haystack or do a fresh install.  I prefer the 
latter, because I can estimate the effort and I am reasonably confident 
of the outcome.



One thing that 
bothers me is there is no way the installers parted shows partition 
names for non-raid disks. To me that is a serious bug. It appears from 
the help that it can LABEL a partition but can't read that LABEL.



When installing to UEFI/GPT, I am able to label partitions in the Debian 
Installer, the labels are visible in the installer, and the labels 
persist on disk after installation is complete.



parted 
when asked to print all does that just fine, but the | doesn't put it to 
less, so it scrolls off screen the top 60% of a parted's print all 
output at some fraction of C speed. Not exactly helpful. I have other 
things to do while I cogitate on what to do next.



The following works as expected on my machine:

2024-01-18 00:34:41 root@laalaa ~
# parted -l | less



Many thanks to all that helped.



YW.  :-)


If you use rsync(1), I suggest using some kind of integrity checking 
tool to verify that the source and destination file systems are 
identical.  I prefer BSD mtree(8):


I assume I'd have to remount the raid like to /raid?
Whew!  That's got more arguments than rsync...



The old /home RAID10 still has its metadata on disk.  I would install 
the "mdadm" package, edit /etc/fstab, copy and rework the old /home line 
(new mount point, add option "ro"), create the mount point, and mount.
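
As an illustration of the reworked fstab line, here is a small sketch. The UUID is a placeholder, and the mount point `/mnt/oldhome`, the `ext4` type and the `nofail` option are my assumptions rather than values from this thread; `fstab_line` is a hypothetical helper that only prints the entry.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab entry that mounts a filesystem by UUID.
# Arguments: UUID, mount point, filesystem type, mount options.
fstab_line() {
    printf 'UUID=%s  %s  %s  %s  0  0\n' "$1" "$2" "$3" "$4"
}

# Placeholder UUID; the real one comes from `blkid /dev/md0p1`.
fstab_line 00000000-0000-0000-0000-000000000000 /mnt/oldhome ext4 ro,nofail

# Remaining steps as root, after adding the printed line to /etc/fstab:
#   mkdir -p /mnt/oldhome
#   mount /mnt/oldhome
```

Mounting by UUID sidesteps the device-name shuffling that otherwise happens between boots.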



David

Re: smartctl cannot access my storage, need syntax help

Hi,

gene heskett wrote:
> > where did the extra 19.4G's come from? Can filesystem
> > ext4's overhead account for that?

In an earlier mail:

> > > command line: rsync -a --bwlimit=10m --fsync --progress /home/ 
> > > /mnt/homevol

David Christensen wrote:
> Please RTFM rsync(1) to choose your options.  These look
> useful:
>    --archive, -a   (-rlptgoD)
>    --delete
>    --hard-links, -H
>    --one-file-system, -x
>    --sparse, -S

I bet on --hard-links and --sparse as means to avoid the extra disk space
consumption. (--archive is important for other reasons, but it was
already in use as -a with your successful rsync run. --delete will be
of importance if the rsync run gets repeated on the already filled target
directory tree.)

man rsync:

 -H, --hard-links
        This tells rsync to look for hard-linked files in the source and
        link together the corresponding files on the destination.  Without
        this option, hard-linked files in the source are treated as
        though they were separate files.
[...]
 -S, --sparse
        Try to handle sparse files efficiently so they take up less
        space on the destination. [...]

One can observe a similar inflation effect when copying the files of a
Debian installation ISO to hard disk. In the original disk directory
on the machine which created the ISO there were hardlinked kernels and
firmware packages. In the ISO these link siblings share the same file
content storage.
But when mounted, the siblings get treated as separate files with
different inode numbers. So the 8,135,584 bytes of the hardlink siblings
  /install.amd/gtk/vmlinuz
  /install.amd/vmlinuz
  /install.amd/xen/vmlinuz
get triplicated when these three files get copied out of the ISO.
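
The same inflation can be reproduced in a scratch directory. This sketch uses GNU cp as a stand-in for rsync: `cp -r` behaves like rsync without --hard-links, and `cp -a` like rsync with it. The file names are made up for the demonstration and `hardlink_demo` is a hypothetical helper.

```shell
#!/bin/sh
# Build three hard-linked names for one file, then copy the tree twice:
# once without and once with hard-link preservation (GNU coreutils assumed).
hardlink_demo() {
    dir=$(mktemp -d) || return 1
    mkdir "$dir/src"
    echo kernel > "$dir/src/vmlinuz"
    ln "$dir/src/vmlinuz" "$dir/src/vmlinuz.gtk"
    ln "$dir/src/vmlinuz" "$dir/src/vmlinuz.xen"

    cp -r "$dir/src" "$dir/plain"     # siblings become three separate files
    cp -a "$dir/src" "$dir/linked"    # -a keeps the siblings linked

    echo "source link count: $(stat -c %h "$dir/src/vmlinuz")"
    echo "plain copy:        $(stat -c %h "$dir/plain/vmlinuz")"
    echo "preserving copy:   $(stat -c %h "$dir/linked/vmlinuz")"
    rm -r "$dir"
}

hardlink_demo
```

A link count of 1 in the plain copy means each name now occupies its own storage, which is the kind of growth seen above.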

I am somewhat astonished that --hard-links is not default in rsync,
as it is quite important for backup fidelity.
(On the other hand it is some effort to find all siblings on the disk.)

Sparse files are files with large areas of 0-bytes. Many filesystems
don't store the zeros but rather an instruction to hand out the given
number of 0-bytes when requested by a reader.

If i were you, i'd let rsync make a complete new copy with --hard-links
--sparse, and --delete, but without --bwlimit= in order to get a higher
copy fidelity and also to check whether the transfer speed really was not
to blame for the appearance of the OOM killer.
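
Spelled out, that suggestion would look roughly like the command printed below. It keeps -a, --fsync and --progress from the earlier command line, and the paths are the ones quoted there; `rsync_plan` is a hypothetical helper that prints the command rather than running it, since --delete removes files from the destination.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an rsync command for a full re-copy
# with hard links and sparse files preserved and no bandwidth limit.
rsync_plan() {
    src=$1 dst=$2
    printf 'rsync -a --hard-links --sparse --delete --fsync --progress %s %s\n' \
        "$src" "$dst"
}

rsync_plan /home/ /mnt/homevol

# Adding --dry-run --itemize-changes to the printed command first shows what
# would change without touching the destination.
```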

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help

2024-01-17 Thread Charles Curley

On Tue, 16 Jan 2024 21:10:28 -0500
gene heskett  wrote:

> gene@coyote:~/src/klipper-docs$ lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
> NAME MAJ:MIN MODEL         SERIAL         WWN
> sdh    8:112 Gigastone SSD GSTD02TB230102
> sdi    8:128 Gigastone SSD GST02TBG221146
> sdj    8:144 Gigastone SSD GST02TBG221146
> sdk    8:160 Gigastone SSD GSTG02TB230206
> sdl    8:176 Gigastone SSD GSTG02TB230206

Something is seriously wrong here. I worked at Maxtor for a while. They
went out of their way to be sure there were no duplicate serial
numbers.

Gene, I suggest you check these SNs with the SN on the packages (if
there is one) and on the label on the drive.

Also, take each drive, one at a time, attach it to another computer
with a fresh installation of Debian, one you haven't mucked with in any
way, and only one other drive already in it, and read the SNs there.

I also went looking for Gigastone's web site. Every page I tried at
gigastone.com led to what I presume was an Error 404 page. I say
presume because most of the text was in non-English, probably Chinese,
characters.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Re: smartctl cannot access my storage, need syntax help


On 1/18/24 00:50, David Christensen wrote:

On 1/17/24 20:20, gene heskett wrote:

On 1/17/24 19:58, David Christensen wrote:

On 1/17/24 15:58, gene heskett wrote:
Now the question is how did it make this: homevol s/b very close to 
/home  in size but:

root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1912    3270772   1% /run
/dev/sda1       863983352  22348472  797673232   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       784   45291076   1% /tmp
/dev/md0p1     1796382580 335102676 1369954928  20% /home
tmpfs             3272684      4956    3267728   1% /run/user/1000
/dev/sdh1      1967892164 354519236 1513336680  19% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3417576      515520      934540    30072184    29309264
Swap:      111902712        2048   111900664
root@coyote:~#

It somehow changed 335G into 354G. Thinking the AppImages dir is 
full of soft links of short names pointing at the long filename and 
had turned the links into duplicates, that was the first thing I 
checked, but it was all good soft-links, so where did the extra 
19.4G's come from? Can filesystem ext4's overhead account for that?
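
The figure can be checked directly from the two "Used" values in the df output. A minimal sketch; `used_diff_kib` is a hypothetical helper, and the du commands in the comments are one possible way to narrow down where the growth is.

```shell
#!/bin/sh
# Hypothetical helper: difference between two df "Used" values (1K-blocks).
used_diff_kib() {
    echo $(( $2 - $1 ))
}

# "Used" for /dev/md0p1 (/home) and /dev/sdh1 (/mnt/homevol) from the df output.
diff_kib=$(used_diff_kib 335102676 354519236)
echo "extra space on the copy: ${diff_kib} KiB, about $(( diff_kib / 1024 )) MiB"
# prints: extra space on the copy: 19416560 KiB, about 18961 MiB

# To see which directories grew, compare per-directory usage on both sides:
#   (cd /home        && du -xk --max-depth=2 . | sort -k 2) > /tmp/home.du
#   (cd /mnt/homevol && du -xk --max-depth=2 . | sort -k 2) > /tmp/homevol.du
#   diff /tmp/home.du /tmp/homevol.du
```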



I suggest running rsync(1) with --dry-run, --log-file=FILE, 
--itemize-changes, and whatever other options are needed to find the 
differences.  Please RTFM rsync(1) to choose your options.  These 
look useful:


 --archive, -a    (-rlptgoD)
 --delete

Why --delete?



If you have files on the destination from a previous run of rsync(1) and 
they no longer exist on the source, --delete will get rid of extraneous 
files on the destination.




 --hard-links, -H
 --one-file-system, -x
 --sparse, -S

or --sparse?



First, you need to understand what "sparse file" means:

https://en.wikipedia.org/wiki/Sparse_file


If you have sparse files on the source -- say, 10 GB virtual machine 
images -- then you want rsync(1) to create sparse files on the destination.



Well, my abundance of curiosity, may have killed the cat, but if I 
understand how rsync's -a works, re-running the same command will only 
update for the incoming email and any posts I've made while it was 
running the first time.  So the same command quoted last is now 
running again. when it has exited, which it has now done in about 15 
minutes I'll edit fstab to remove the 60 gigs of swap on md1, remove 
the existing mount of md0p1 as /home taking the raid10 completely out 
of the system. And add the mounting of LABEL=homevolsdh1 as the /home 
partition and reboot. In the event I have to re-install, the raid will 
still contain my data and can be recovered.
I already have a dvd with the most recent netinstall burnt. All I have 
to do is convince it to not install orca and brltty. Probably by 
unplugging _all_ usb stuff except the keyboard and mouse buttons.


What would solve many of my problems is a bit of help from someone who 
is running trinity to tell me how to install it on a system w/o any 
installed gui which obviously disables synaptic. That leaves apt, 
apt-get, and aptitude, unless there is a better way. aptitude is 
uncontrollable, has fixed me once, has torn the system down to another 
install 3 times so the odds are not in my favor.


So those fstab edits have been done, next is a reboot



You should be able to migrate your /home file system from RAID10 to an 
SSD without needing to reinstall Debian.


The migration took two passes because udev can't make up its alleged 
mind so I was finally forced to use the rescue mode to edit fstab to 
mount it by UUID and that worked, I've got /home on the copy right now. 
and I took the 60 G's of swap out too since I've never used more than 20G 
with any gfx program, so I figure 47G's on /dev/sda is enough.  So now 
none of the raid is mounted, but the 30+ second lag when opening a write 
path is still there, so I was erroneously blaming the raid. So I've 
narrowed the problem but w/o a good clue what to do next. One thing that 
bothers me is there is no way the installers parted shows partition 
names for non-raid disks. To me that is a serious bug. It appears from 
the help that it can LABEL a partition but can't read that LABEL. parted 
when asked to print all does that just fine, but the | doesn't put it to 
less, so it scrolls off screen the top 60% of a parted's print all 
output at some fraction of C speed. Not exactly helpful. I have other 
things to do while I cogitate on what to do next.  Many thanks to all 
that helped.



If you use rsync(1), I suggest using some kind of integrity checking 
tool to verify that the source and destination file systems are 
identical.  I prefer BSD mtree(8):


I assume I'd have to remount the raid l

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 20:20, gene heskett wrote:

On 1/17/24 19:58, David Christensen wrote:

On 1/17/24 15:58, gene heskett wrote:
Now the question is how did it make this: homevol s/b very close to 
/home  in size but:

root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1912    3270772   1% /run
/dev/sda1       863983352  22348472  797673232   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       784   45291076   1% /tmp
/dev/md0p1     1796382580 335102676 1369954928  20% /home
tmpfs             3272684      4956    3267728   1% /run/user/1000
/dev/sdh1      1967892164 354519236 1513336680  19% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3417576      515520      934540    30072184    29309264
Swap:      111902712        2048   111900664
root@coyote:~#

It somehow changed 335G into 354G. Thinking the AppImages dir is full 
of soft links of short names pointing at the long filename and had 
turned the links into duplicates, that was the first thing I checked, 
but it was all good soft-links, so where did the extra 19.4G's come 
from? Can filesystem ext4's overhead account for that?



I suggest running rsync(1) with --dry-run, --log-file=FILE, 
--itemize-changes, and whatever other options are needed to find the 
differences.  Please RTFM rsync(1) to choose your options.  These look 
useful:


 --archive, -a    (-rlptgoD)
 --delete

Why --delete?



If you have files on the destination from a previous run of rsync(1) and 
they no longer exist on the source, --delete will get rid of extraneous 
files on the destination.




 --hard-links, -H
 --one-file-system, -x
 --sparse, -S

or --sparse?



First, you need to understand what "sparse file" means:

https://en.wikipedia.org/wiki/Sparse_file


If you have sparse files on the source -- say, 10 GB virtual machine 
images -- then you want rsync(1) to create sparse files on the destination.
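
A quick way to see the effect in a scratch directory is sketched below. It assumes GNU coreutils and uses `cp --sparse=always` as the local analogue of `rsync --sparse`; `sparse_demo` is a hypothetical helper and the 10 MiB size is arbitrary.

```shell
#!/bin/sh
# Create a sparse file, copy it while keeping the holes, and compare the
# allocated space with the apparent size.
sparse_demo() {
    dir=$(mktemp -d) || return 1
    truncate -s 10M "$dir/sparse.img"                      # all hole, no data
    cp --sparse=always "$dir/sparse.img" "$dir/copy.img"   # keep the holes
    echo "allocated KiB (original): $(du -k "$dir/sparse.img" | cut -f1)"
    echo "allocated KiB (copy):     $(du -k "$dir/copy.img" | cut -f1)"
    echo "apparent bytes (copy):    $(stat -c %s "$dir/copy.img")"
    rm -r "$dir"
}

sparse_demo
```

On most filesystems the allocated figures come out far below the 10240 KiB that the apparent size suggests; rsync without --sparse would write the zeros out in full on the destination.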



Well, my abundance of curiosity, may have killed the cat, but if I 
understand how rsync's -a works, re-running the same command will only 
update for the incoming email and any posts I've made while it was 
running the first time.  So the same command quoted last is now running 
again. when it has exited, which it has now done in about 15 minutes 
I'll edit fstab to remove the 60 gigs of swap on md1, remove the 
existing mount of md0p1 as /home taking the raid10 completely out of the 
system. And add the mounting of LABEL=homevolsdh1 as the /home partition 
and reboot. In the event I have to re-install, the raid will still 
contain my data and can be recovered.
I already have a dvd with the most recent netinstall burnt. All I have 
to do is convince it to not install orca and brltty. Probably by 
unplugging _all_ usb stuff except the keyboard and mouse buttons.


What would solve many of my problems is a bit of help from someone who 
is running trinity to tell me how to install it on a system w/o any 
installed gui which obviously disables synaptic. That leaves apt, 
apt-get, and aptitude, unless there is a better way. aptitude is 
uncontrollable, has fixed me once, has torn the system down to another 
install 3 times so the odds are not in my favor.


So those fstab edits have been done, next is a reboot



You should be able to migrate your /home file system from RAID10 to an 
SSD without needing to reinstall Debian.



Copying a file system that is mounted read-write is problematic.  It is 
best to remount it read-only, and then copy.  This is hard to do when 
you are logged in and using the file system you want to copy.  Options 
include rebooting into single-user root console or using live media.



To make an exact copy of the source, consider using a tool designed for 
this task -- such as cpio(1), tar(1), or a backup/restore system such as 
amanda(8).



If you use rsync(1), I suggest using some kind of integrity checking 
tool to verify that the source and destination file systems are 
identical.  I prefer BSD mtree(8):


https://manpages.debian.org/bullseye/mtree-netbsd/mtree.8.en.html


(Be careful not to confuse the above with mtree(5) via libarchive.)
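
If mtree is not to hand, a cruder check can be done with find and sha256sum. This is a stand-in, not mtree: it compares file names and contents only, not ownership, permissions or timestamps. `manifest` and `same_tree` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print "checksum  path" for every regular file under $1,
# with paths relative to $1 and sorted so that two trees can be compared.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Hypothetical helper: exit status 0 when both trees hold the same files
# with the same contents.
same_tree() {
    tmp=$(mktemp -d) || return 2
    manifest "$1" > "$tmp/a" && manifest "$2" > "$tmp/b" && cmp -s "$tmp/a" "$tmp/b"
    rc=$?
    rm -r "$tmp"
    return "$rc"
}

# With the paths from this thread (as root, ideally with the source read-only):
#   same_tree /home /mnt/homevol && echo identical || echo DIFFERENT
```

Reading every file on both sides takes a while at this size, so it is best run when nothing else is writing to either tree.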


David

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 19:58, David Christensen wrote:

On 1/17/24 15:58, gene heskett wrote:
Now the question is how did it make this: homevol s/b very close to 
/home  in size but:

root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1912    3270772   1% /run
/dev/sda1       863983352  22348472  797673232   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       784   45291076   1% /tmp
/dev/md0p1     1796382580 335102676 1369954928  20% /home
tmpfs             3272684      4956    3267728   1% /run/user/1000
/dev/sdh1      1967892164 354519236 1513336680  19% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3417576      515520      934540    30072184    29309264
Swap:      111902712        2048   111900664
root@coyote:~#

It somehow changed 335G into 354G. Thinking the AppImages dir is full 
of soft links of short names pointing at the long filename and had 
turned the links into duplicates, that was the first thing I checked, 
but it was all good soft-links, so where did the extra 19.4G's come 
from? Can filesystem ext4's overhead account for that?



I suggest running rsync(1) with --dry-run, --log-file=FILE, 
--itemize-changes, and whatever other options are needed to find the 
differences.  Please RTFM rsync(1) to choose your options.  These look 
useful:


 --archive, -a    (-rlptgoD)
 --delete

Why --delete?

 --hard-links, -H
 --one-file-system, -x
 --sparse, -S

or --sparse?

Well, my abundance of curiosity, may have killed the cat, but if I 
understand how rsync's -a works, re-running the same command will only 
update for the incoming email and any posts I've made while it was 
running the first time.  So the same command quoted last is now running 
again. when it has exited, which it has now done in about 15 minutes 
I'll edit fstab to remove the 60 gigs of swap on md1, remove the 
existing mount of md0p1 as /home taking the raid10 completely out of the 
system. And add the mounting of LABEL=homevolsdh1 as the /home partition 
and reboot. In the event I have to re-install, the raid will still 
contain my data and can be recovered.
I already have a dvd with the most recent netinstall burnt. All I have 
to do is convince it to not install orca and brltty. Probably by 
unplugging _all_ usb stuff except the keyboard and mouse buttons.


What would solve many of my problems is a bit of help from someone who 
is running trinity to tell me how to install it on a system w/o any 
installed gui which obviously disables synaptic. That leaves apt, 
apt-get, and aptitude, unless there is a better way. aptitude is 
uncontrollable, has fixed me once, has torn the system down to another 
install 3 times so the odds are not in my favor.


So those fstab edits have been done, next is a reboot



David

.


Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis

Re: smartctl cannot access my storage, need syntax help

2024-01-17 Thread David Wright

On Wed 17 Jan 2024 at 15:34:09 (-0500), gene heskett wrote:
> On 1/17/24 12:27, Thomas Schmitt wrote:
> > David Christensen wrote:
> > > I suspect the conflicting serial numbers are causing problems in the 
> > > kernel,
> > > as indicated by the /dev/disk/by-id/* problems.
> > 
> > That's not in the kernel but in udev/systemd's process of creating the
> > symbolic links in /dev/disk/by-id/.
> > It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
> > But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
> > In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
> > In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
> > sdl and sdl1 are missing.
> > 
> missing because the original command line did not look at sdl.
> I added the l and it showed up. No magic.

What do you mean, it was "missing"? The original command, which I wrote
for you, contained a wildcard, so it doesn't miss anything that's there:

  root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done

and there was no sdl in the output from that command. In fact,
there was no "l" in your post between the "l" in "realpath",
above, and the "l" in "like", below:

  root@coyote:~#
  but like I wrote, 2 pairs with identical "serial numbers", so the

https://lists.debian.org/debian-user/2024/01/msg00658.html
shows this, that no sdl was seen under by-id/.

> > The open question (at least to me) is whether it's the disks or the
> > controllers or the drivers which cause the duplication.
> > Neither, a typo in the original command.

Cheers,
David.

Re: To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)


On 1/17/24 19:54, Steve McIntyre wrote:

Andy Smith wrote:


The newer set of people recommending partitions are mostly doing so
because there's been a few incidents of "helpful" PC motherboards
detecting on boot what they think is a corrupt GPT, and replacing it
with a blank one, damaging the RAID. This is a real thing that has
happened to more than one person; it even got linked on Hacker News
I believe.

Then there will just be people going by taste.

Personally I still put them directly on drives. If I ever get taken
out by one of those crappy motherboards, I reserve the right to get
a different religion. 


I'm clearly a member of a third group of people,,, :-)

Putting partitions on the RAID drives helps *me* identify them.


you aren't alone Steve.
Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 15:58, gene heskett wrote:
Now the question is how did it make 
this: homevol s/b very close to /home  in size but:

root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1912    3270772   1% /run
/dev/sda1       863983352  22348472  797673232   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       784   45291076   1% /tmp
/dev/md0p1     1796382580 335102676 1369954928  20% /home
tmpfs             3272684      4956    3267728   1% /run/user/1000
/dev/sdh1      1967892164 354519236 1513336680  19% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3417576      515520      934540    30072184    29309264
Swap:      111902712        2048   111900664
root@coyote:~#

It somehow changed 335G into 354G. Thinking the AppImages dir is full of 
soft links of short names pointing at the long filename and had turned 
the links into duplicates, that was the first thing I checked, but it 
was all good soft-links, so where did the extra 19.4G's come from? Can 
filesystem ext4's overhead account for that?
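
For reference, the size of the gap can be worked out from the two "Used" columns in the df output.  This is only arithmetic on the figures quoted there; the variable names are made up.  It does not say where the extra blocks came from, but a copy made without rsync's --hard-links or --sparse would store hard-linked files as separate copies and write out the holes in sparse files, and either can inflate the destination.

```shell
# "Used" columns in 1K-blocks, copied from the df output.
home_used=335102676      # /dev/md0p1 mounted on /home
copy_used=354519236      # /dev/sdh1 mounted on /mnt/homevol

diff_kib=$((copy_used - home_used))
pct=$((diff_kib * 100 / home_used))   # integer percentage, rounded down

echo "difference: ${diff_kib} 1K-blocks"
echo "growth over source: about ${pct}%"
```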



I suggest running rsync(1) with --dry-run, --log-file=FILE, 
--itemize-changes, and whatever other options are needed to find the 
differences.  Please RTFM rsync(1) to choose your options.  These look 
useful:


--archive, -a   (-rlptgoD)
--delete
--hard-links, -H
--one-file-system, -x
--sparse, -S
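
Put together, a comparison run might look like the sketch below.  The source and destination paths are taken from the df output in this thread; the log file location is a placeholder I chose.  The script only prints the command so it can be reviewed first; remove the leading echo to run it.  With --dry-run, rsync reports what it would transfer or delete without changing the destination.

```shell
# Options from the list above, plus the dry-run and reporting options.
RSYNC_OPTS="--dry-run --itemize-changes --archive --delete --hard-links --one-file-system --sparse"
SRC=/home/                     # trailing slash: the contents of /home
DST=/mnt/homevol/
LOG=/tmp/rsync-compare.log     # hypothetical log location

# Print the full command for review instead of executing it.
echo rsync $RSYNC_OPTS --log-file="$LOG" "$SRC" "$DST"
```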


David

Re: To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)

2024-01-17 Thread Steve McIntyre

Andy Smith wrote:
>
>The newer set of people recommending partitions are mostly doing so
>because there's been a few incidents of "helpful" PC motherboards
>detecting on boot what they think is a corrupt GPT, and replacing it
>with a blank one, damaging the RAID. This is a real thing that has
>happened to more than one person; it even got linked on Hacker News
>I believe.
>
>Then there will just be people going by taste.
>
>Personally I still put them directly on drives. If I ever get taken
>out by one of those crappy motherboards, I reserve the right to get
>a different religion.

I'm clearly a member of a third group of people,,, :-)

Putting partitions on the RAID drives helps *me* identify them.

-- 
Steve McIntyre, Cambridge, UK.                    st...@einval.com
Can't keep my eyes from the circling sky,
Tongue-tied & twisted, Just an earth-bound misfit, I...

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 16:45, David Christensen wrote:

On 1/17/24 12:30, gene heskett wrote:

By LABELing the partitions uniquely, that problem so far as I can
see, is solved.



Okay.


So, are you confident that your motherboard ports, HBA ports, and SSD's
are all working correctly now?



The OOM death of the system was the xfce4 terminal apparently being
set for unlimited scrollback and that was eating the memory. I switched
to Konsole, which can limit the scrollback to 200 lines; it has taken
all 32G's as cache and 1536 1k blocks of swap, and it's working w/o
any OOM actions I've detected.



Okay.


Xfce -> Terminal Emulator -> right click on screen -> Preferences ->
General -> Scrolling:

 Scrollback    200
 Unlimited scrollback    uncheck


Using tee(1) would allow you to both monitor progress and save standard 
output and/or standard error (via shell redirection).
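
A small sketch of that tee(1) arrangement follows.  The noisy_job function is a stand-in I invented for a long-running copy; the log file is a temporary file.

```shell
LOG=$(mktemp)

# Stand-in for a long-running command that writes to both streams.
noisy_job() {
    echo "copied file1"
    echo "warning: skipped file2" >&2
}

# 2>&1 merges standard error into standard output;
# tee prints the merged stream and also writes it to the log file.
noisy_job 2>&1 | tee "$LOG"

LINES_SAVED=$(wc -l < "$LOG" | tr -d ' ')
echo "saved $LINES_SAVED lines to $LOG"
```

To keep the terminal quiet and still have the record, replace the pipe with a plain redirection such as `noisy_job > "$LOG" 2>&1`.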



A related issue is that lots of standard output can slow a program. 
Minimizing a terminal can help.  Redirecting standard output to a file 
or to /dev/null can help, especially when done on the remote host while 
using ssh(1).



The best solution is to tell rsync(1) not to generate messages on 
standard output -- do not use --verbose, do not use --info, do not use 
--progress, etc.; use --quiet, etc..


All good hints after it is done.  Now the question is how did it make 
this: homevol s/b very close to /home  in size but:

root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1912    3270772   1% /run
/dev/sda1       863983352  22348472  797673232   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       784   45291076   1% /tmp
/dev/md0p1     1796382580 335102676 1369954928  20% /home
tmpfs             3272684      4956    3267728   1% /run/user/1000
/dev/sdh1      1967892164 354519236 1513336680  19% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3417576      515520      934540    30072184    29309264
Swap:      111902712        2048   111900664
root@coyote:~#

It somehow changed 335G into 354G. Thinking the AppImages dir is full of 
soft links of short names pointing at the long filename and had turned 
the links into duplicates, that was the first thing I checked, but it 
was all good soft-links, so where did the extra 19.4G's come from? Can 
filesystem ext4's overhead account for that?


David


Thanks David.

.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 12:30, gene heskett wrote:

By LABELing the partitions uniquely, that problem so far as I can
see, is solved.



Okay.


So, are you confident that your motherboard ports, HBA ports, and SSD's
are all working correctly now?



The OOM death of the system was the xfce4 terminal apparently being
set for unlimited scrollback and that was eating the memory. I switched
to Konsole, which can limit the scrollback to 200 lines; it has taken
all 32G's as cache and 1536 1k blocks of swap, and it's working w/o
any OOM actions I've detected.



Okay.


Xfce -> Terminal Emulator -> right click on screen -> Preferences ->
General -> Scrolling:

 Scrollback    200
 Unlimited scrollback    uncheck


Using tee(1) would allow you to both monitor progress and save standard 
output and/or standard error (via shell redirection).



A related issue is that lots of standard output can slow a program. 
Minimizing a terminal can help.  Redirecting standard output to a file 
or to /dev/null can help, especially when done on the remote host while 
using ssh(1).



The best solution is to tell rsync(1) not to generate messages on 
standard output -- do not use --verbose, do not use --info, do not use 
--progress, etc.; use --quiet, etc..



David

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 16:16, Thomas Schmitt wrote:

Hi,

i wrote:

What did finally help ? Just the shorter terminal scroll back memory ?


gene heskett wrote:

That, and possibly the --bwlimit=10m, giving the SSD time to keep their
stuff in one sock.


Then i place my bet on the terminal alone.
Linux is able to handle disk-to-disk copies that are larger than the
available memory. This is a standard use case.



How large was it set when your runs caused the OOM killer to act ?



different terminal, xfce4's is apparently unlimited but can't find it in the
config prefs.


I normally start new xterms by

   xterm -ls -geometry 80x24 -bg wheat -fg black -sl 1 +sb &

The -sl option gives the number of lines to be memorized for scrollback.
Black-on-wheat is a calmative color combination which does not overwork
the eyes.


Thank you, I did not know that.


Have a nice day :)

Thomas

.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

Hi,

i wrote:
> > What did finally help ? Just the shorter terminal scroll back memory ?

gene heskett wrote:
> That, and possibly the --bwlimit=10m, giving the SSD time to keep their
> stuff in one sock.

Then i place my bet on the terminal alone.
Linux is able to handle disk-to-disk copies that are larger than the
available memory. This is a standard use case.

> > How large was it set when your runs caused the OOM killer to act ?

> different terminal, xfce4's is apparently unlimited but can't find it in the
> config prefs.

I normally start new xterms by

  xterm -ls -geometry 80x24 -bg wheat -fg black -sl 1 +sb &

The -sl option gives the number of lines to be memorized for scrollback.
Black-on-wheat is a calmative color combination which does not overwork
the eyes.

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 09:31, Thomas Schmitt wrote:

Hi,

David Christensen wrote:

I suspect the conflicting serial numbers are causing problems in the kernel,
as indicated by the /dev/disk/by-id/* problems.


That's not in the kernel but in udev/systemd's process of creating the
symbolic links in /dev/disk/by-id/.
It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
sdl and sdl1 are missing.

The open question (at least to me) is whether it's the disks or the
controllers or the drivers which cause the duplication.



Thank you for the explanation.


I would still remove them.


David

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 15:13, gene heskett wrote:

On 1/17/24 11:30, Thomas Schmitt wrote:

Hi,

after i began enumerating suspects, gene heskett wrote:

terminals scroll back memory, I purposely set this
particular terminals scrollback to 200 lines with that in mind.


How large was it set when your runs caused the OOM killer to act ?
different terminal, xfce4's is apparently unlimited but can't find it in 
the config prefs.



I have a good number of xterms with 10,000 lines each. No tabs, no KDE,
but 8 fvwm "desktops" (virtual screens) full of terminal windows.


12 workspaces with 1 to 8 tabs open. 32G of main memory.




[Request to test the disks one-by-one on some other computer, whether
  they bear the same serial number at all controllers in all machines.]


Not as easily tried, the other 4 are in twin mounts in another portion of
the drive cages in this 30" tall tiger direct cage and not too readily
accessible w/o tipping the mobo out on its hinged mount.


One should raise protest at Gigastone if the disks really have the same
serial numbers. But before doing so, one would have to make sure that
it is not some weird effect of them all being plugged into that machine
at the same time.
Should not be a problem if labeled uniquely.  And that's easily effected 
by gparted.




One of you made the remark that seems to be the secret password.


What did finally help ? Just the shorter terminal scroll back memory ?


That, and possibly the --bwlimit=10m, giving the SSD time to keep their 
stuff in one sock.



It would explain why a verbose rsync could summon the OOM killer always
around the same stage of progress. But what waste of memory would have
to happen with each of the rsync messages ?


Everything you see flying by when the -v is in the opts, and some of the 
pathnames are 250-300 bytes long.



(You mentioned LABEL as a possibility. But not as actually used.)


Yes I have, repeatedly.



Its still, slowly at 10 megs a second, working.


I see in your previous mail rsync option --bwlimit=10m . But in the
same mail there is an older quote from you that --bwlimit=3m only
prolonged the time until the OOM killer appeared.
So i wonder whether it would work at a more contemporary speed.


I can't change it for testing?  Boggles my mind.


A probably informative test.
But as yet not tested.




Self-incrimination: The rest of this mail is off topic.


they gave all 7th graders the Iowa
test in 1947, similar to the S/B IQ test but not copyrighted, therefore a
lot cheaper, and I came out of that with an equivalent of 147.


I was tested in the 1960s but they did not tell the results to kids or
parents. We only got recommendations at which of our three types of school
we should continue at the age of 10 or 11 years.


That I believe was the intention but one of the teachers was a 
blabbermouth.



(So it was not to avoid discrimination of the dumb but rather to avoid
that pupils feel more intelligent than their teachers.)


That avoidance was untenable, in the 1st semester of my freshman year I 
got thrown out of the senior physics class for correcting an erroneous 
statement by the teacher that was patently at odds with Newton's 3rd law 
of motion. For every action, there is an equal but opposite reaction.


Pretty basic stuff. But correcting the teacher in front of the other 
students was absolutely not to be tolerated. But I felt correcting him 
AND setting it straight was more important to the rest of the nominally 
20 students present than any embarrassment it may have caused him.


Same with the papered EE's who can't understand that E=MV2 does not have 
a speed floor, below which it doesn't work when the electron beam in a 
klystron amplifier is only moving at a potential of 20,000 volts. The 
problem not understood is that the amplification is obtained not from a 
current variation, but a velocity variation induced by a 1 watt signal 
speeding up or slowing down the passing beam as it traverses the first 
cavity of 4, the next two to control the bandwidth, the last one picks 
30 kilowatts back off the beam by the capacitative coupling effects as 
the beam goes on thru into a copper funnel cooled by 70 gallons of very 
pure water to absorb the end of that beam which takes around 125 
kilowatts to generate.


I forgot to mention that 70 gallons figure is a per minute value 
supplied by a 15 hp ingersol-rand pump. A semi sealed system that has a 
4' wide x8' long x1.5' thick radiator supplied with external cooling air 
by a another 20 horse motor. Rigged by vent louvers to control the air 
flow to maintain the water above freezing. That 20 horse had the power 
to blow that whole louver out into the field behind the building when 
the modutrol motor that controlled that hot air exit louver failed to 
open it at signon time one morning. Panic call from the remote control 
site as it was only about 20F outsid

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 12:27, Thomas Schmitt wrote:

Hi,

David Christensen wrote:

I suspect the conflicting serial numbers are causing problems in the kernel,
as indicated by the /dev/disk/by-id/* problems.


That's not in the kernel but in udev/systemd's process of creating the
symbolic links in /dev/disk/by-id/.
It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
sdl and sdl1 are missing.


missing because the original command line did not look at sdl.
I added the l and it showed up. No magic.


The open question (at least to me) is whether it's the disks or the
controllers or the drivers which cause the duplication.

Neither, a typo in the original command.



Have a nice day :)

Thomas

.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

On 1/17/24 12:16, Thomas Schmitt wrote:

Hi,

Curt wrote:

I discovered a couple of discussions of the phenomenon, the upshot of which
were:
1) That's what you get when you purchase cheap SSDs.
https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
2) SSDs belonging to the same software RAID show identical serial numbers
in software, but these numbers don't match the serial numbers printed on the
SSDs themselves.
https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

Those URLs are identical. (OMG ! Is it contagious ?)

Number 2 would match my suspicion that some layer in the disk driving
gets confused and mixes up the serial numbers.

But you said *similar*.

By "colliding serial numbers" i mean indeed "identical serial numbers".

How cheap the disks may ever be, that would be no excuse for not making
them individually distinguishable.

As Gene's threads have too many movable parts
for me to follow, on that point I couldn't say.

This one begins to gain presence in the web. So one can use search engines
and AI to untangle its sub-threads. I meanwhile participate in two of them:
serial number collision, rsync caused OOM killer (solved now, but how ?).

By LABELing the partitions uniquely, that problem so far as I can see,
is solved.

The OOM death of the system was the xfce4 terminal apparently being set
for unlimited scrollback and that was eating the memory. I switched to
Konsole, which can limit the scrollback to 200 lines; it has taken all
32G's as cache and 1536 1k blocks of swap, and it's working w/o any OOM
actions I've detected.

Have a nice day :)

Thomas

Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 11:38, Curt wrote:

On 2024-01-17, Thomas Schmitt  wrote:


This is just weird.
I still have difficulties to believe that any disk manufacturer would
hand out disks with colliding serial numbers. I googled for this
phenomenon, but except two mails of Gene nothing similar popped up.


I discovered a couple of discussions of the phenomenon, the upshot of which
were:

1) That's what you get when you purchase cheap SSDs.

https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

2) SSDs belonging to the same software RAID show identical serial numbers
in software, but these numbers don't match the serial numbers printed on the 
SSDs themselves.


But the drives in question are not yet and never have been in a raid 
just plugged in awaiting my putting them to work.


https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

But you said *similar*. As Gene's threads have too many movable parts
for me to follow, on that point I couldn't say.

.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 11:30, Thomas Schmitt wrote:

Hi,

after i began enumerating suspects, gene heskett wrote:

terminals scroll back memory, I purposely set this
particular terminals scrollback to 200 lines with that in mind.


How large was it set when your runs caused the OOM killer to act ?
different terminal, xfce4's is apparently unlimited but can't find it in 
the config prefs.



I have a good number of xterms with 10,000 lines each. No tabs, no KDE,
but 8 fvwm "desktops" (virtual screens) full of terminal windows.


12 workspaces with 1 to 8 tabs open. 32G of main memory.




[Request to test the disks one-by-one on some other computer, whether
  they bear the same serial number at all controllers in all machines.]



Not as easily tried, the other 4 are in twin mounts in another portion of
the drive cages in this 30" tall tiger direct cage and not too readily
accessible w/o tipping the mobo out on its hinged mount.


One should raise protest at Gigastone if the disks really have the same
serial numbers. But before doing so, one would have to make sure that
it is not some weird effect of them all being plugged into that machine
at the same time.
Should not be a problem if labeled uniquely.  And that's easily effected 
by gparted.




One of you made the remark that seems to be the secret password.


What did finally help ? Just the shorter terminal scroll back memory ?


That, and possibly the --bwlimit=10m, giving the SSD time to keep their 
stuff in one sock.



It would explain why a verbose rsync could summon the OOM killer always
around the same stage of progress. But what waste of memory would have
to happen with each of the rsync messages ?

(You mentioned LABEL as a possibility. But not as actually used.)



Its still, slowly at 10 megs a second, working.


I see in your previous mail rsync option --bwlimit=10m . But in the
same mail there is an older quote from you that --bwlimit=3m only
prolonged the time until the OOM killer appeared.
So i wonder whether it would work at a more contemporary speed.


A probably informative test.
But as yet not tested.




Self-incrimination: The rest of this mail is off topic.


they gave all 7th graders the Iowa
test in 1947, similar to the S/B IQ test but not copyrighted, therefore a
lot cheaper, and I came out of that with an equivalent of 147.


I was tested in the 1960s but they did not tell the results to kids or
parents. We only got recommendations at which of our three types of school
we should continue at the age of 10 or 11 years.


That I believe was the intention but one of the teachers was a blabbermouth.


(So it was not to avoid discrimination of the dumb but rather to avoid
that pupils feel more intelligent than their teachers.)


That avoidance was untenable, in the 1st semester of my freshman year I 
got thrown out of the senior physics class for correcting an erroneous 
statement by the teacher that was patently at odds with Newton's 3rd law 
of motion. For every action, there is an equal but opposite reaction.


Pretty basic stuff. But correcting the teacher in front of the other 
students was absolutely not to be tolerated. But I felt correcting him 
AND setting it straight was more important to the rest of the nominally 
20 students present than any embarrassment it may have caused him.


Same with the papered EE's who can't understand that E=MV2 does not have 
a speed floor, below which it doesn't work when the electron beam in a 
klystron amplifier is only moving at a potential of 20,000 volts. The 
problem not understood is that the amplification is obtained not from a 
current variation, but a velocity variation induced by a 1 watt signal 
speeding up or slowing down the passing beam as it traverses the first 
cavity of 4, the next two to control the bandwidth, the last one picks 
30 kilowatts back off the beam by the capacitative coupling effects as 
the beam goes on thru into a copper funnel cooled by 70 gallons of very 
pure water to absorb the end of that beam which takes around 125 
kilowatts to generate.


But that beams electrons have mass, another name for weight, and one 
watt to slow them slows them more than 1 watt to speed them up speeds 
them up, so at high power levels, the tube is effective longer in terms 
of the transit time. This puts a time of flight error into the signal we 
didn't know how to pre-distort for in the 1970's. A very dependable way 
to generate transmitter power levels that was also not very efficient, 
95% of the uhf stations that went dark in those years were bankrupted by 
the power bills even at 3 cents a kw.


So there was a huge financial push to find a better method as that time 
distortion would have killed hidef tv before it ever got out of the 
laboratory,


And E=MV2 is as valid at 25 mph as it is at C speed, nominally 186,272 
miles per second.


Yup, I understand Albert Einstein's theory.  Di

Re: smartctl cannot access my storage, need syntax help

Hi,

i see that i messed up "h" and "k" in my explanation of the fight over
the link targets in /dev/disk/by-id. So another attempt:

sdh has a unique serial number GSTD02TB230102. Thus we see in
  https://lists.debian.org/debian-user/2024/01/msg00667.html
these two links:

  /dev/sdh    /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
  /dev/sdh1   /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102-part1

sdi and sdj share the serial number GST02TBG221146. So the concurrent
attempts to create the links let only these two survive:

  /dev/sdi    /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
  /dev/sdj1   /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1

sdk and sdl share GSTG02TB230206. The survivors are:

  /dev/sdk    /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
  /dev/sdk1   /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1

The next system startup might yield other survivors.
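
To spot such collisions without reading the link list by eye, one could list device names with their serial numbers and let uniq(1) report the repeats.  The sketch below uses the device/serial pairs named in this mail as sample input written to a temporary file; on a live system the same two columns could probably be taken from `lsblk -dn -o NAME,SERIAL`, which I have not run against these disks.

```shell
# Sample input: device name and serial number, as described in this mail.
SERIALS=$(mktemp)
cat > "$SERIALS" <<'EOF'
sdh GSTD02TB230102
sdi GST02TBG221146
sdj GST02TBG221146
sdk GSTG02TB230206
sdl GSTG02TB230206
EOF

# Take the serial column, sort it, and keep only values seen more than once.
DUPES=$(awk '{print $2}' "$SERIALS" | LC_ALL=C sort | uniq -d)
echo "$DUPES"
```

Each serial printed belongs to at least two devices, so its by-id links are contested.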


Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help

Hi,

David Christensen wrote:
> I suspect the conflicting serial numbers are causing problems in the kernel,
> as indicated by the /dev/disk/by-id/* problems.

That's not in the kernel but in udev/systemd's process of creating the
symbolic links in /dev/disk/by-id/.
It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
sdl and sdl1 are missing.

The open question (at least to me) is whether it's the disks or the
controllers or the drivers which cause the duplication.

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help

Hi,

Curt wrote:
> I discovered a couple of discussions of the phenomenon, the upshot of which
> were:
> 1) That's what you get when you purchase cheap SSDs.
> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
> 2) SSDs belonging to the same software RAID show identical serial numbers
> in software, but these numbers don't match the serial numbers printed on the
> SSDs themselves.
> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

Those URLs are identical. (OMG ! Is it contagious ?)

Number 2 would match my suspicion that some layer in the disk driving
gets confused and mixes up the serial numbers.

> But you said *similar*.

By "colliding serial numbers" i mean indeed "identical serial numbers".

However cheap the disks may be, that would be no excuse for not making
them individually distinguishable.

> As Gene's threads have too many movable parts
> for me to follow, on that point I couldn't say.

This one begins to gain presence in the web. So one can use search engines
and AI to untangle its sub-threads. I meanwhile participate in two of them:
serial number collision, rsync caused OOM killer (solved now, but how ?).

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 06:18, gene heskett wrote:

On 1/17/24 00:52, David Christensen wrote:
I suggest removing one GST02TBG221146 and one GSTG02TB230206.  Put 
them on the shelf, in other computer(s), or sell them.  Then perhaps 
copying the /home RAID10 2 TB to one Gigastone 2 TB SSD would work.


Or LABEL them.



I suspect the conflicting serial numbers are causing problems in the 
kernel, as indicated by the /dev/disk/by-id/* problems.  I would remove 
one each of the duplicate serial number disks to eliminate that possibility.



David

Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/16/24 23:46, Thomas Schmitt wrote:

Gene Heskett wrote:
One of these mails from a thread in december reveals that the three
unique serial numbers GSTD02TB230102, GST02TBG221146, GSTG02TB230206
each come with a different version of "1C0", "7A0", "5A0", respectively.
   https://www.mail-archive.com/debian-user@lists.debian.org/msg799307.html
That's unexpected, too, as the disk properties look identical elsewise.



Thank you for locating the lshw(1) output.  It appears to have been run 
when one Gigastone SSD was on the motherboard SATA controller and four 
Gigastone SSD's were on the 6-port HBA:


2024-01-17 08:58:54 dpchrist@laalaa ~
$ egrep 'sata|disk|product|version|serial' gene-heskett-coyote-lshw.out | grep -B 1 -A 2 Gigastone

  *-disk:1
   product: Gigastone SSD
   version: 7A0
   serial: GST02TBG221146
--
  *-disk:0
   product: Gigastone SSD
   version: 7A0
   serial: GST02TBG221146
--
  *-disk:1
   product: Gigastone SSD
   version: 5A0
   serial: GSTG02TB230206
--
  *-disk:2
   product: Gigastone SSD
   version: 5A0
   serial: GSTG02TB230206
--
  *-disk:3
   product: Gigastone SSD
   version: 1C0
   serial: GSTD02TB230102


David

Re: smartctl cannot access my storage, need syntax help

2024-01-17 Thread Curt

On 2024-01-17, Thomas Schmitt  wrote:
>
> This is just weird.
> I still have difficulties to believe that any disk manufacturer would
> hand out disks with colliding serial numbers. I googled for this
> phenomenon, but except two mails of Gene nothing similar popped up.

I discovered a couple of discussions of the phenomenon, the upshot of which
were:

1) That's what you get when you purchase cheap SSDs.

https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

2) SSDs belonging to the same software RAID show identical serial numbers
in software, but these numbers don't match the serial numbers printed on the 
SSDs themselves.

https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

But you said *similar*. As Gene's threads have too many movable parts
for me to follow, on that point I couldn't say.

Re: smartctl cannot access my storage, need syntax help

Hi,

after i began enumerating suspects, gene heskett wrote:
> terminals scroll back memory, I purposely set this
> particular terminals scrollback to 200 lines with that in mind.

How large was it set when your runs caused the OOM killer to act ?

I have a good number of xterms with 10,000 lines each. No tabs, no KDE,
but 8 fvwm "desktops" (virtual screens) full of terminal windows.

> > [Request to test the disks one-by-one on some other computer, whether
> >  they bear the same serial number at all controllers in all machines.]

> Not as easily tried, the other 4 are in twin mounts in another portion of
> the drive cages in this 30" tall tiger direct cage and not too readily
> accessible w/o tipping the mobo out on its hinged mount.

One should raise protest at Gigastone if the disks really have the same
serial numbers. But before doing so, one would have to make sure that
it is not some weird effect of them all being plugged into that machine
at the same time.

> One of you made the remark that seems to be the secret password.

What did finally help ? Just the shorter terminal scroll back memory ?

It would explain why a verbose rsync could summon the OOM killer always
around the same stage of progress. But what waste of memory would have
to happen with each of the rsync messages ?

(You mentioned LABEL as a possibility. But not as actually used.)

> Its still, slowly at 10 megs a second, working.

I see in your previous mail rsync option --bwlimit=10m . But in the
same mail there is an older quote from you that --bwlimit=3m only
prolonged the time until the OOM killer appeared.
So i wonder whether it would work at a more contemporary speed.

Self-incrimination: The rest of this mail is off topic.

> they gave all 7th graders the Iowa
> test in 1947, similar to the S/B IQ test but not copyrighted, therefore a
> lot cheaper, and I came out of that with an equivalent of 147.

I was tested in the 1960s but they did not tell the results to kids or
parents. We only got recommendations at which of our three types of school
we should continue at the age of 10 or 11 years.
(So it was not to avoid discrimination of the dumb but rather to avoid
that pupils feel more intelligent than their teachers.)

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 02:42, Thomas Schmitt wrote:

Hi,

Gene Heskett wrote:

lsblk, which I've published several times, shows 5 drives.


Duh. Obviously this thread overstretches my mental capacity.



And I've since tried cp in addition to rsync, does the same thing, killing
the sysytem with the OOM but much quicker. cp using all system memory (32Gb)
in 1 minute, another 500K into swap adds another 15 secs, and the OOM kills
the system. So both cp and rsync act broken.


I get the suspicion that your disk set overstretches the mental capacity
of the hardware or the operating system.
Both "cp" and "rsync" are heavily tested by the GNU/Linux community and
quite independently developed. A common memory leak would have to sit
deeper in the software stack, i.e. in kernel or firmware.


Kernel, firmware, or the terminal's scrollback memory. I purposely set this 
particular terminal's scrollback to 200 lines with that in mind.





rsync, with a --bwlimit=3m set, takes much longer to kill the system but the
amount of data moved is very similar, 13.5G from clean disk to system freeze
for rsync, 13.4G for cp.


This observation might be significant. But i fail to make up a theory.


One of the things I'm fairly good at, they gave all 7th graders the 
Iowa test in 1947, similar to the S/B IQ test but not copyrighted, 
therefore a lot cheaper, and I came out of that with an equivalent of 147. 
I quit school 2 years later when I could and went to work fixing TVs. Had 
my draft number moved up in '52 in the middle of Korea to get that out 
of the way (drafted was 2 years, volunteered was 4 years), but failed the 
AFQT by getting a 98 out of 100, which earned me a 4F classification 
because I wouldn't take orders from the sergeant. I found out the next 
best score that day among 130+ boys was 36/100. That freed me to let a 
girl become my wife in '57, and we started making kids. Got a 1st phone in 
1962 without cracking a book, did the same thing in 1972 to become a 
registered CET, which I'll readily admit is getting rusty in my dotage at 
89 yo. The technology is slowly passing me by since I retired in the 
middle of 2002. Because I went diabetic in the '80s, my beer limit is 
1, but I'd do it with any of you folks if we ever meet in person.  Let 
the war stories flow. ;o)>  <-smiley with a goatee.


That copy is now up to 4x the data copied in any other try.
root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1904    3270780   1% /run
/dev/sda1       863983352  22346308  797675396   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       612   45291248   1% /tmp
/dev/md0p1     1796382580 335101664 1369955940  20% /home
tmpfs             3272684      3752    3268932   1% /run/user/1000
/dev/sdh1      1967892164  64369552 1803486364   4% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3453372      199708      919044    30336824    29273468
Swap:      111902712        1536   111901176
And swap use has not increased, it's stabilized.
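
A quick cross-check of the free(1) figures, as a sketch rather than anything
authoritative. It reads the Mem: line as total 32726840, used 3453372,
free 199708, buff/cache 30336824 and available 29273468 KiB (the columns run
together in the archived mail, so that split is an assumption). The function
name summarize_free is invented for this example.

```python
def summarize_free(total_kib, used_kib, free_kib, buff_cache_kib, available_kib):
    """Derive a few ratios from one Mem: line of free(1) output."""
    gib = 1024 * 1024  # KiB per GiB
    return {
        # In these figures "used" equals total minus available.
        "used_is_total_minus_available": used_kib == total_kib - available_kib,
        "free_share": free_kib / total_kib,
        "buff_cache_share": buff_cache_kib / total_kib,
        "available_gib": available_kib / gib,
    }

mem = summarize_free(32726840, 3453372, 199708, 30336824, 29273468)
for key, value in mem.items():
    print(key, value)
```

Almost all of the RAM is page cache that the kernel can give back, so a
nearly empty "free" column alone is not a sign of trouble; "available" is
the figure to watch.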





gene@coyote:~/src/klipper-docs$  lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
NAME MAJ:MIN MODEL SERIAL WWN
sdh    8:112 Gigastone SSD GSTD02TB230102
sdi    8:128 Gigastone SSD GST02TBG221146
sdj    8:144 Gigastone SSD GST02TBG221146
sdk    8:160 Gigastone SSD GSTG02TB230206
sdl    8:176 Gigastone SSD GSTG02TB230206


This is just weird.
I still have difficulties to believe that any disk manufacturer would
hand out disks with colliding serial numbers. I googled for this
phenomenon, but except two mails of Gene nothing similar popped up.

One of these mails from a thread in december reveals that the three
unique serial numbers GSTD02TB230102, GST02TBG221146, GSTG02TB230206
each come with a different version of "1C0", "7A0", "5A0", respectively.


Which is why, when I let my imagination out to play w/o a chaperone, my 
thoughts run toward some invented date code for a batch number.



   https://www.mail-archive.com/debian-user@lists.debian.org/msg799307.html
That's unexpected, too, as the disk properties look identical elsewise.

I guess that it is not possible to identify which disk came with which
of the two separate purchases ?


Once removed from the boxes, no.


How many days were these purchases apart ?


6 weeks or so, as I formulated what to do next. But that isn't carved 
even in sandstone.


David Christensen wrote:

I suggest removing one GST02TBG221146 and one GSTG02TB230206.  Put them on
the shelf, in other computer(s), or sell them.  Then perhaps copying the
/home RAID10 2 TB to one Gigastone 2 TB SSD would work.


I join this proposal.
... and dimly remember to have seen the proposal to attach the disks
one by one without the other four, in order to see whether the

Re: smartctl cannot access my storage, need syntax help


On 1/17/24 00:52, David Christensen wrote:

On 1/16/24 17:08, gene heskett wrote:
 > lsblk, which I've published several times, shows 5 drives. by-id listing
 > only shows 3. The drive I've been trying to use bounces from /dev/sdd to
 > sde to sdh dependin on which controller it is curently plugged into.
 >
 > And I've since tried cp in addition to rsync, does the same thing,
 > killing the sysytem with the OOM but much quicker. cp using all system
 > memory (32Gb) in 1 minute, another 500K into swap adds another 15 secs,
 > and the OOM kills the system. So both cp and rsync act broken.
 >
 > rsync, with a --bwlimit=3m set, takes much longer to kill the system but
 > the amount of data moved is very similar, 13.5G from clean disk to
 > system freeze for rsync, 13.4G for cp.


On 1/16/24 18:10, gene heskett wrote:

On 1/16/24 11:08, Thomas Schmitt wrote:

  ls -l /dev/sd[ij]*

root@coyote:~#  ls -l /dev/sd[ij]*
brw-rw---- 1 root disk 8, 128 Jan 16 05:01 /dev/sdi
brw-rw---- 1 root disk 8, 129 Jan 16 05:01 /dev/sdi1
brw-rw---- 1 root disk 8, 144 Jan 16 05:01 /dev/sdj
brw-rw---- 1 root disk 8, 145 Jan 16 05:01 /dev/sdj1
root@coyote:~#

lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
gene@coyote:~/src/klipper-docs$  lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]

NAME MAJ:MIN MODEL SERIAL WWN
sdh    8:112 Gigastone SSD GSTD02TB230102
sdi    8:128 Gigastone SSD GST02TBG221146
sdj    8:144 Gigastone SSD GST02TBG221146
sdk    8:160 Gigastone SSD GSTG02TB230206
sdl    8:176 Gigastone SSD GSTG02TB230206



I suggest removing one GST02TBG221146 and one GSTG02TB230206.  Put them 
on the shelf, in other computer(s), or sell them.  Then perhaps copying 
the /home RAID10 2 TB to one Gigastone 2 TB SSD would work.



David

.

Or LABEL them.
And I seem to be making some progress this morning. Opening a konsole 
and setting scrollback to 200 lines, limiting its use of memory, the tan 
memory bar in htop is full scale and it is a couple megs into swap out of 
107G, and the system still feels normal.

in another multitabbed xfce4 shell, a "df && free" is showing this:
root@coyote:~# df && free
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16327704         0   16327704   0% /dev
tmpfs             3272684      1904    3270780   1% /run
/dev/sda1       863983352  22346276  797675428   3% /
tmpfs            16363420      1244   16362176   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868       580   45291280   1% /tmp
/dev/md0p1     1796382580 335100148 1369957456  20% /home
tmpfs             3272684      3752    3268932   1% /run/user/1000
/dev/sdh1      1967892164  23830812 1844025104   2% /mnt/homevol
               total        used        free      shared  buff/cache   available
Mem:        32726840     3343048      218316      922196    30443960    29383792
Swap:      111902712        1536   111901176
root@coyote:~#

rsync has been stopped and restarted, 4 times, but stopping it has not 
recovered the cache, so swap is increasing slowly.

That faint knocking sound? Me, knocking on wood... ;o)>

command line: rsync -a --bwlimit=10m --fsync --progress /home/ /mnt/homevol

So we'll eventually either git-r-done or crash the system, but this is 
farther than it ever got before in several days.


Thanks everybody.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Thomas Schmitt

Hi,

Gene Heskett wrote:
> lsblk, which I've published several times, shows 5 drives.

Duh. Obviously this thread overstretches my mental capacity.

> And I've since tried cp in addition to rsync, does the same thing, killing
> the sysytem with the OOM but much quicker. cp using all system memory (32Gb)
> in 1 minute, another 500K into swap adds another 15 secs, and the OOM kills
> the system. So both cp and rsync act broken.

I get the suspicion that your disk set overstretches the mental capacity
of the hardware or the operating system.
Both "cp" and "rsync" are heavily tested by the GNU/Linux community and
quite independently developed. A common memory leak would have to sit
deeper in the software stack, i.e. in kernel or firmware.

> rsync, with a --bwlimit=3m set, takes much longer to kill the system but the
> amount of data moved is very similar, 13.5G from clean disk to system freeze
> for rsync, 13.4G for cp.

This observation might be significant. But i fail to make up a theory.

> gene@coyote:~/src/klipper-docs$  lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
> NAME MAJ:MIN MODEL SERIAL WWN
> sdh    8:112 Gigastone SSD GSTD02TB230102
> sdi    8:128 Gigastone SSD GST02TBG221146
> sdj    8:144 Gigastone SSD GST02TBG221146
> sdk    8:160 Gigastone SSD GSTG02TB230206
> sdl    8:176 Gigastone SSD GSTG02TB230206

This is just weird.
I still have difficulties to believe that any disk manufacturer would
hand out disks with colliding serial numbers. I googled for this
phenomenon, but except two mails of Gene nothing similar popped up.

One of these mails from a thread in december reveals that the three
unique serial numbers GSTD02TB230102, GST02TBG221146, GSTG02TB230206
each come with a different version of "1C0", "7A0", "5A0", respectively.
  https://www.mail-archive.com/debian-user@lists.debian.org/msg799307.html
That's unexpected, too, as the disk properties look identical elsewise.

I guess that it is not possible to identify which disk came with which
of the two separate purchases ?
How many days were these purchases apart ?

David Christensen wrote:
> I suggest removing one GST02TBG221146 and one GSTG02TB230206.  Put them on
> the shelf, in other computer(s), or sell them.  Then perhaps copying the
> /home RAID10 2 TB to one Gigastone 2 TB SSD would work.

I join this proposal.
... and dimly remember to have seen the proposal to attach the disks
one by one without the other four, in order to see whether the serial
numbers are the same as with all five together.

Since you have quite a hardware zoo:
Consider trying the Gigastone disks in a different machine.
Do the serial numbers show up the same as on the machine where you
experience all those difficulties ?

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread David Christensen


On 1/16/24 17:08, gene heskett wrote:
> lsblk, which I've published several times, shows 5 drives. by-id listing
> only shows 3. The drive I've been trying to use bounces from /dev/sdd to
> sde to sdh dependin on which controller it is curently plugged into.
>
> And I've since tried cp in addition to rsync, does the same thing,
> killing the sysytem with the OOM but much quicker. cp using all system
> memory (32Gb) in 1 minute, another 500K into swap adds another 15 secs,
> and the OOM kills the system. So both cp and rsync act broken.
>
> rsync, with a --bwlimit=3m set, takes much longer to kill the system but
> the amount of data moved is very similar, 13.5G from clean disk to
> system freeze for rsync, 13.4G for cp.


On 1/16/24 18:10, gene heskett wrote:

On 1/16/24 11:08, Thomas Schmitt wrote:

  ls -l /dev/sd[ij]*

root@coyote:~#  ls -l /dev/sd[ij]*
brw-rw---- 1 root disk 8, 128 Jan 16 05:01 /dev/sdi
brw-rw---- 1 root disk 8, 129 Jan 16 05:01 /dev/sdi1
brw-rw---- 1 root disk 8, 144 Jan 16 05:01 /dev/sdj
brw-rw---- 1 root disk 8, 145 Jan 16 05:01 /dev/sdj1
root@coyote:~#

lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
gene@coyote:~/src/klipper-docs$  lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]

NAME MAJ:MIN MODEL SERIAL WWN
sdh    8:112 Gigastone SSD GSTD02TB230102
sdi    8:128 Gigastone SSD GST02TBG221146
sdj    8:144 Gigastone SSD GST02TBG221146
sdk    8:160 Gigastone SSD GSTG02TB230206
sdl    8:176 Gigastone SSD GSTG02TB230206



I suggest removing one GST02TBG221146 and one GSTG02TB230206.  Put them 
on the shelf, in other computer(s), or sell them.  Then perhaps copying 
the /home RAID10 2 TB to one Gigastone 2 TB SSD would work.



David

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Felix Miata

gene heskett composed on 2024-01-16 20:08 (UTC-0500):

> Felix Miata wrote:

>> I straightened out the wrapping mess, and gave each entry a line number. I see
>> nothing I recognize as representing serial number duplication among /dev/sdX
>> (physical device) names:

>> /dev/sda     9  /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V
>> /dev/sdd    19  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
>> /dev/sde    28  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
>> /dev/sdf    36  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
>> /dev/sdg    43  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
>> /dev/sdh    51  /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
>> /dev/sdi    53  /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
>> /dev/sdk    55  /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206

>> Exactly which line numbers represent duplication among the physical drives?

> lsblk, which I've published several times, shows 5 drives. by-id listing 
> only shows 3. The drive I've been trying to use bounces from /dev/sdd to 
> sde to sdh dependin on which controller it is curently plugged into.

From your 2024-01-15 17:56 -0500 post, I see 8 unique serial numbers from SATA
SSDs, 5 Samsung, 3 Gigastone.

I ignore all your posts with lsblk that didn't use the -f option to facilitate
identifying individual SSDs.

> And I've since tried cp in addition to rsync, does the same thing, 
> killing the sysytem with the OOM but much quicker. cp using all system 
> memory (32Gb) in 1 minute, another 500K into swap adds another 15 secs, 
> and the OOM kills the system. So both cp and rsync act broken.

> rsync, with a --bwlimit=3m set, takes much longer to kill the system but 
> the amount of data moved is very similar, 13.5G from clean disk to 
> system freeze for rsync, 13.4G for cp.

--
Evolution as taught in public schools is, like religion,
based on faith, not based on science.

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata

Re: smartctl cannot access my storage, need syntax help


On 1/16/24 11:08, Thomas Schmitt wrote:

  ls -l /dev/sd[ij]*

root@coyote:~#  ls -l /dev/sd[ij]*
brw-rw---- 1 root disk 8, 128 Jan 16 05:01 /dev/sdi
brw-rw---- 1 root disk 8, 129 Jan 16 05:01 /dev/sdi1
brw-rw---- 1 root disk 8, 144 Jan 16 05:01 /dev/sdj
brw-rw---- 1 root disk 8, 145 Jan 16 05:01 /dev/sdj1
root@coyote:~#

lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
gene@coyote:~/src/klipper-docs$  lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]

NAME MAJ:MIN MODEL SERIAL WWN
sdh    8:112 Gigastone SSD GSTD02TB230102
sdi    8:128 Gigastone SSD GST02TBG221146
sdj    8:144 Gigastone SSD GST02TBG221146
sdk    8:160 Gigastone SSD GSTG02TB230206
sdl    8:176 Gigastone SSD GSTG02TB230206
note added l to get them all

gene@coyote:~/src/klipper-docs$

Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/16/24 06:09, Felix Miata wrote:

Tom Furie composed on 2024-01-16 08:18 (UTC):


Felix Miata writes:



/dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
How does a printer get a storage device assignment???



By having some kind of SD card slot or similar.


So this pollution only results from a USB-connected printer? IP printer
connections don't cause it too?


Since I have one of the above printers: it does indeed have an editable 
ipv4 address, but I don't generally use it as the usb2 is faster. It's 
been so long since I did use that interface that I do not recall if it 
listed the card memory.  I'd expect it would, since it can also do a 
free-standing copy from its tabloid sized scanner.  The printer can handle 
tabloid sized paper by hand feeding, so the copy function includes 
tabloid size too.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help

On Tue 16 Jan 2024 at 20:08:12 (-0500), gene heskett wrote:
> On 1/16/24 00:56, Felix Miata wrote:
> > gene heskett composed on 2024-01-15 17:56 (UTC-0500):
> > 
> > > Thanks for that composition: but it will be word wrapped:
> > > root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done
[ … ]
> > I straightened out the wrapping mess, and gave each entry a line number. I see
> > nothing I recognize as representing serial number duplication among /dev/sdX
> > (physical device) names:
> > [ … ]
> > /dev/sdd    19  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
> > /dev/sdd    20  /dev/disk/by-id/wwn-0x5002538f413394ae
> > /dev/sdd1   21  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part1
> > /dev/sdd1   22  /dev/disk/by-id/wwn-0x5002538f413394ae-part1
> > /dev/sdd2   23  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part2
> > /dev/sdd2   24  /dev/disk/by-id/wwn-0x5002538f413394ae-part2
> > /dev/sdd3   25  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part3
> > /dev/sdd3   26  /dev/disk/by-id/wwn-0x5002538f413394ae-part3
> > /dev/sde    27  /dev/disk/by-id/wwn-0x5002538f413394a9
> > /dev/sde    28  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
> > /dev/sde1   29  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part1
> > /dev/sde1   30  /dev/disk/by-id/wwn-0x5002538f413394a9-part1
> > /dev/sde2   31  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part2
> > /dev/sde2   32  /dev/disk/by-id/wwn-0x5002538f413394a9-part2
> > /dev/sde3   33  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part3
> > /dev/sde3   34  /dev/disk/by-id/wwn-0x5002538f413394a9-part3
> > /dev/sdf    35  /dev/disk/by-id/wwn-0x5002538f413394a5
> > /dev/sdf    36  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
> > /dev/sdf1   37  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part1
> > /dev/sdf1   38  /dev/disk/by-id/wwn-0x5002538f413394a5-part1
> > /dev/sdf2   39  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part2
> > /dev/sdf2   40  /dev/disk/by-id/wwn-0x5002538f413394a5-part2
> > /dev/sdf3   41  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part3
> > /dev/sdf3   42  /dev/disk/by-id/wwn-0x5002538f413394a5-part3
> > /dev/sdg    43  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
> > /dev/sdg    44  /dev/disk/by-id/wwn-0x5002538f413394b0
> > /dev/sdg1   45  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part1
> > /dev/sdg1   46  /dev/disk/by-id/wwn-0x5002538f413394b0-part1
> > /dev/sdg2   47  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part2
> > /dev/sdg2   48  /dev/disk/by-id/wwn-0x5002538f413394b0-part2
> > /dev/sdg3   49  /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part3
> > /dev/sdg3   50  /dev/disk/by-id/wwn-0x5002538f413394b0-part3

> lsblk, which I've published several times, shows 5 drives. by-id
> listing only shows 3. The drive I've been trying to use bounces from
> /dev/sdd to sde to sdh dependin on which controller it is curently
> plugged into.

I take it that you're trying to copy to one Gigastone SSD. Presumably
the kernel favours some controllers over others in the race to name
them. This is why using the kernel's device names is no longer
recommended.
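
For anyone who wants the stable names in a script: a small Python sketch of
the same lookup the shell loop in this thread performs with realpath. The
function name resolve_links is made up here, and the demo runs on a
throwaway directory with a fake "sdh" file so that it works without real
disks. On a live system one would pass /dev/disk/by-id instead.

```python
import os
import tempfile

def resolve_links(directory):
    """Return sorted (resolved_target, link_path) pairs for symlinks in directory."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        link = os.path.join(directory, name)
        if os.path.islink(link):
            pairs.append((os.path.realpath(link), link))
    return pairs

# Demo on a temporary directory, so nothing under /dev is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sdh")  # stand-in for a kernel device node
    open(target, "w").close()
    link = os.path.join(tmp, "ata-Gigastone_SSD_GSTD02TB230102")
    os.symlink(target, link)
    demo = resolve_links(tmp)
    for dev, path in demo:
        print(f"{dev}\t{path}")
```

The by-id name stays the same whichever controller the disk hangs on; only
the resolved kernel name changes. That holds only as long as the serial
numbers are unique, which is exactly what is in doubt in this thread.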

> And I've since tried cp in addition to rsync, does the same thing,
> killing the sysytem with the OOM but much quicker. cp using all system
> memory (32Gb) in 1 minute, another 500K into swap adds another 15
> secs, and the OOM kills the system. So both cp and rsync act broken.

I'd be tempted to bisect the problem by copying to another machine
through a cat5 cable.

> rsync, with a --bwlimit=3m set, takes much longer to kill the system
> but the amount of data moved is very similar, 13.5G from clean disk to
> system freeze for rsync, 13.4G for cp.

I don't know enough about how rsync behaves to interpret that
coincidence, but it seems ominous on its face.

Cheers,
David.

Re: smartctl cannot access my storage, need syntax help


On 1/16/24 01:18, Felix Miata wrote:

Felix Miata composed on 2024-01-16 01:05 (UTC-0500):


gene heskett composed on 2024-01-15 18:37 (UTC-0500):



Ah, but I finally glommed onto the big tan memory bar in htop as it was
running; someplace in the data chain is a huge memory leak, my crash is
caused by the OOM daemon killing things. And it only occurs when I run
rsync. Only takes it 10 minutes to eat 32G of memory, then 500k into
swap, and the OOM daemon starts killing the system until there's nothing
left to run.



What does free report before starting rsync? Do you have all your swap on a
partition? Do you have any swapspace?



I would log out of XFCE, login on a vtty to open top, then login on another to
try to run rsync. If that fails OOM too, since the target is ostensibly starting
from scratch, use MC, and divide the job into the source's directories if
necessary. MC gets rather bogged down if you try to do a bazillion individual
files in a single copy operation.


Trying to think outside the box, something else to think about, from the man
page:
[quote]
--archive, -a
This is equivalent to -rlptgoD.  It is a quick way of saying you want recursion
and want to preserve almost everything.  Be aware that it does not include
preserving ACLs (-A), xattrs (-X), atimes (-U), crtimes (-N), nor the finding
and preserving of hardlinks (-H).
[/quote]

If rsync really is bugged, maybe a change of options would avoid the bug. Try
instead of -av, -rlptgoDAXUNH. Could it be that verbosity is the OOM crippler,
and not necessarily from rsync itself, but possibly from the xterm in which
rsync is running? Does your source contain any hard links? Do you use ACLs or
xattrs?
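
To make the option juggling less error-prone, here is a minimal Python
sketch that spells out what -a stands for and assembles the command line.
It only builds the argument list and does not run rsync. The function name
build_rsync_args and its parameters are hypothetical; the flags themselves
are the ones quoted from the man page and used elsewhere in this thread.

```python
ARCHIVE_FLAGS = "rlptgoD"  # what -a expands to, per the man page excerpt

def build_rsync_args(src, dst, extra_flags="AXUNH", bwlimit=None, verbose=False):
    """Return an argv list for rsync with -a written out explicitly."""
    flags = ARCHIVE_FLAGS + extra_flags + ("v" if verbose else "")
    args = ["rsync", "-" + flags]
    if bwlimit:
        # Same --bwlimit option that was tried with 3m and 10m in this thread.
        args.append(f"--bwlimit={bwlimit}")
    args += [src, dst]
    return args

print(" ".join(build_rsync_args("/home/", "/mnt/homevol", bwlimit="10m")))
```

Leaving verbose off keeps rsync from printing one line per file, which is
the part that would feed a terminal's scrollback.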


Unreported here because it didn't seem to have any effect: I've tried to 
test that theory by clearing the back-trace buffer at 30 second 
intervals. No obviously detectable effect. Untested is setting that back 
to a 1000 line default.


And since I've driven around 170 miles in poor visibility bad weather 
today, no more tests will be done tonight, I'm not the 16 years old I 
was when I learned to drive 70 mph on even worse roads 75 years ago. So 
I'll sign off shortly.


Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/16/24 01:05, Felix Miata wrote:

gene heskett composed on 2024-01-15 18:37 (UTC-0500):


Ah, but I finally glommed onto the big tan memory bar in htop as it was
running; someplace in the data chain is a huge memory leak, my crash is
caused by the OOM daemon killing things. And it only occurs when I run
rsync. Only takes it 10 minutes to eat 32G of memory, then 500k into
swap, and the OOM daemon starts killing the system until there's nothing
left to run.


What does free report before starting rsync? Do you have all your swap on a
partition? Do you have any swapspace?


Actually, swap is in 2 locations, one is a swap-dir on /dev/sda, 47G 
IIRC, and 60G on md1.  Shows in htop as 107G total.


I would log out of XFCE, login on a vtty to open top, then login on another to
try to run rsync. If that fails OOM too, since the target is ostensibly starting
from scratch, use MC, and divide the job into the source's directories if
necessary. MC gets rather bogged down if you try to do a bazillion individual
files in a single copy operation.


True, but I don't recall it ever failing

Cheers, Gene Heskett.

Re: smartctl cannot access my storage, need syntax help


On 1/16/24 00:56, Felix Miata wrote:

gene heskett composed on 2024-01-15 17:56 (UTC-0500):


Thanks for that composition: but it will be word wrapped:
root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done
/dev/sr0    /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
/dev/sdi    /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
/dev/sdj1   /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
/dev/sdh    /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
/dev/sdh1   /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102-part1
/dev/sdk    /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
/dev/sdk1   /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1
/dev/sdf    /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
/dev/sdf1   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part1
/dev/sdf2   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part2
/dev/sdf3   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part3
/dev/sde    /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
/dev/sde1   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part1
/dev/sde2   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part2
/dev/sde3   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part3
/dev/sdd    /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
/dev/sdd1   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part1
/dev/sdd2   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part2
/dev/sdd3   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part3
/dev/sdg    /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
/dev/sdg1   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part1
/dev/sdg2   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part2
/dev/sdg3   /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part3
/dev/sda    /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V
/dev/sda1   /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part1
/dev/sda2   /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part2
/dev/sda3   /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part3
/dev/md0    /dev/disk/by-id/md-name-coyote:0
/dev/md0p1  /dev/disk/by-id/md-name-coyote:0-part1
/dev/md2    /dev/disk/by-id/md-name-coyote:2
/dev/md1    /dev/disk/by-id/md-name-_none_:1
/dev/md0    /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb
/dev/md0p1  /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb-part1
/dev/md1    /dev/disk/by-id/md-uuid-57a88605:27f5a773:5be347c1:7c5e7342
/dev/md2    /dev/disk/by-id/md-uuid-bb6e03ce:19d290c8:5171004f:0127a392
/dev/sdc    /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0
/dev/sdb    /dev/disk/by-id/usb-USB_Mass_Storage_Device_816820130806-0:0
/dev/sdf    /dev/disk/by-id/wwn-0x5002538f413394a5
/dev/sdf1   /dev/disk/by-id/wwn-0x5002538f413394a5-part1
/dev/sdf2   /dev/disk/by-id/wwn-0x5002538f413394a5-part2
/dev/sdf3   /dev/disk/by-id/wwn-0x5002538f413394a5-part3
/dev/sde    /dev/disk/by-id/wwn-0x5002538f413394a9
/dev/sde1   /dev/disk/by-id/wwn-0x5002538f413394a9-part1
/dev/sde2   /dev/disk/by-id/wwn-0x5002538f413394a9-part2
/dev/sde3   /dev/disk/by-id/wwn-0x5002538f413394a9-part3
/dev/sdd    /dev/disk/by-id/wwn-0x5002538f413394ae
/dev/sdd1   /dev/disk/by-id/wwn-0x5002538f413394ae-part1
/dev/sdd2   /dev/disk/by-id/wwn-0x5002538f413394ae-part2
/dev/sdd3   /dev/disk/by-id/wwn-0x5002538f413394ae-part3
/dev/sdg    /dev/disk/by-id/wwn-0x5002538f413394b0
/dev/sdg1   /dev/disk/by-id/wwn-0x5002538f413394b0-part1
/dev/sdg2   /dev/disk/by-id/wwn-0x5002538f413394b0-part2
/dev/sdg3   /dev/disk/by-id/wwn-0x5002538f413394b0-part3
/dev/sda    /dev/disk/by-id/wwn-0x5002538f42205e8e
/dev/sda1   /dev/disk/by-id/wwn-0x5002538f42205e8e-part1
/dev/sda2   /dev/disk/by-id/wwn-0x5002538f42205e8e-part2
/dev/sda3   /dev/disk/by-id/wwn-0x5002538f42205e8e-part3
root@coyote:~#
but like I wrote, 2 pairs with identical "serial numbers", so the
assumption is that udev lets the last one overwrite the first one, when
IMO it should be yelling about the duplicates.
  
I straightened out the wrapping mess, and gave each entry a line number. I see
nothing I recognize as representing serial number duplication among /dev/sdX
(physical device) names:

/dev/md0     1  /dev/disk/by-id/md-name-coyote:0
/dev/md0     2  /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb
/dev/md0p1   3  /dev/disk/by-id/md-name-coyote:0-part1
/dev/md0p1   4  /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb-part1
/dev/md1     5  /dev/disk/by-id/md-name-_none_:1
/dev/md1     6  /dev/disk/by-id/md-uuid-57a88605:27f5a773:5be347c1:7c5e7342
/dev/md2     7  /dev/disk/by-id/md-name-coyote:2
/dev/md2     8  /dev/disk/by-id/md-uuid-bb6e03ce:19d290c8:5171004f:0127a392

Re: smartctl cannot access my storage, need syntax help

On Tue 16 Jan 2024 at 06:08:35 (-0500), Felix Miata wrote:
> Tom Furie composed on 2024-01-16 08:18 (UTC):
> > Felix Miata writes:
> 
> >> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
> >> How does a printer get a storage device assignment???
> 
> > By having some kind of SD card slot or similar.
> 
> So this pollution only results from a USB-connected printer? IP printer
> connections don't cause it too?

AIUI (not very well), you only get a /dev/sdX when the linux kernel
is what's writing the blocks on the filesystem.

So when I plug in my Galaxy 4 mobile and tap the appropriate buttons
on its screen, /dev/sdb{,1} appear as a block device and partition:

  sdb      8:16   1  29.7G  0 disk
  └─sdb1   8:17   1  29.7G  0 part

so I can run fdisk on the SD card while in the phone, for example:

  $ sudo fdisk -l /dev/sdb
  Disk /dev/sdb: 29.72 GiB, 31914983424 bytes, 62333952 sectors
  Disk model: S5360 Card  
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disklabel type: dos
  Disk identifier: 0x03399e11

  Device     Boot Start      End  Sectors  Size Id Type
  /dev/sdb1        2048 62333951 62331904 29.7G  c W95 FAT32 (LBA)
  $ 

OTOH with my A13 phone, I don't get a block device created, but just
a FUSE wrapper round the filesystems that Android is running, both
internal and any SD card:

  $ mount
  [ … ]
  aft-mtp-mount on /media/samsungd type fuse.aft-mtp-mount (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
  $ 

Cheers,
David.

Re: smartctl cannot access my storage, need syntax help

On Tue 16 Jan 2024 at 09:40:19 (-0500), Greg Wooledge wrote:
> On Tue, Jan 16, 2024 at 09:31:54AM -0500, Felix Miata wrote:
> > David Wright composed on 2024-01-16 08:05 (UTC-0600):
> > > On Tue 16 Jan 2024 at 00:55:52 (-0500), Felix Miata wrote:
> > >> gene heskett composed on 2024-01-15 17:56 (UTC-0500):
> > 
> > >>> Thanks for that composition: but it will be word wrapped:
> > >>> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done
> > >>> /dev/sr0    /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
> > >>> /dev/sdi    /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> > >>> /dev/sdj1   /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
> > 
> > > It's right here at the top.
> > 
> > I missed that, probably because i & j look similar in the big sea of
> > alphanumerics, /and/ sdi has no partitions, while sdj1 has no parent disk.
> > That seems to smell as much like a bug somewhere as two different disks
> > with the same serial number, a cheap SATA port card maybe. Does ...1146 get
> > duplication like that when connected to any/every available SATA port?
> 
> I missed it too.  It actually looks like someone copy/pasted the
> pathnames on the right, but then manually typed the device names on
> the left, and made a typo here.  Or, somehow, the device names and
> the pathnames got mixed together, and someone tried to separate them
> manually, and got these two crossed.

It's the sticky labels that convinced me. I had one last possibility
in mind, that the serial numbers were being generated by the
interfaces somehow, but they wouldn't be able to read the labels.

I know nothing about Gene's interfaces, but my SD cards can appear
with false by-id/ values depending on where they're plugged in:
slots (on different PCs), via µSD-SD adapter, SD-USB adapter, etc.

Cheers,
David.

To partition or not to partition MD arrays (Was Re: smartctl cannot access my storage, need syntax help)

2024-01-16 Thread Andy Smith

Hello,

On Tue, Jan 16, 2024 at 01:01:02PM -0800, David Christensen wrote:
> On 1/16/24 11:51, Franco Martelli wrote:
> > I thought it was mandatory for a RAID to partition drives with this
> > partition type, am I wrong?

In the ancient past it was required, because that was one of the
ways that mdadm arrays were assembled: the md kernel driver saw the
"Linux RAID" partition types and tried using them. If you weren't
going to do that, you had to have an mdadm config file, or even
specify the topology on the kernel command line. This was 15
or more years ago.

Ever since udev, each newly-appearing block device is handed to a
script for incremental assembly based on the md metadata on the
device itself, so any kind of block device will do.

> As I switched from mdadm(8) to zfs(8) years ago, perhaps another
> reader can explain what mdadm(8) does when given whole disks and
> when given disk partitions.

mdadm doesn't care.

The older set of people recommending partitions did so because drive
capacities used to vary quite a lot more than they do today. So
people used to say, "put a partition on and make it a few hundred MB
less than the total size of the drive, then if you have to replace
it with a slightly smaller one you'll be fine."
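
As a rough illustration of that old advice, the arithmetic can be sketched in shell. The function name and the 200 MiB reserve are made up for the example; the sector count is the one fdisk reports for the 1 TB drives quoted elsewhere in this thread.

```shell
# part_end_sector TOTAL_SECTORS SECTOR_BYTES RESERVE_MIB
# Prints the last sector a partition may use if RESERVE_MIB mebibytes
# are left unallocated at the end of the drive (sectors count from 0).
part_end_sector() {
    local reserve_sectors=$(( $3 * 1024 * 1024 / $2 ))
    echo $(( $1 - reserve_sectors - 1 ))
}

part_end_sector 1953525168 512 200   # prints 1953115567
```

The result could then be given to a partitioning tool as the end sector; which tool and which start sector to use is left open here.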

Since 2005 or so there has been a standard called IDEMA LBA1-03¹
about what the actual capacity in sectors should be for any stated
drive capacity, and most drives obey this, though there are still a
few exceptions. So this is very much less of a concern, especially
for those buying "enterprise" storage.
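
For 512-byte sectors, the LBA1-03 rule is usually stated as: LBA count = 97,696,368 + 1,953,504 x (capacity in GB - 50), for capacities above 50 GB. A small sketch that evaluates it follows; the function name is made up. A 1000 GB drive comes out at 1953525168 sectors, the same figure fdisk reports for the ST1000DM003 drives quoted elsewhere in this thread.

```shell
# idema_lba512 GB: sector count for a drive of GB decimal gigabytes,
# per the IDEMA LBA1-03 formula for 512-byte sectors (valid above 50 GB).
idema_lba512() {
    echo $(( 97696368 + 1953504 * ($1 - 50) ))
}

idema_lba512 1000   # prints 1953525168
```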

The newer set of people recommending partitions are mostly doing so
because there have been a few incidents of "helpful" PC motherboards
detecting on boot what they think is a corrupt GPT, and replacing it
with a blank one, damaging the RAID. This is a real thing that has
happened to more than one person; it even got linked on Hacker News
I believe.

Then there will just be people going by taste.

Personally I still put them directly on drives. If I ever get taken
out by one of those crappy motherboards, I reserve the right to get
a different religion. 

Thanks,
Andy

¹ https://idema.org/wp-content/downloads/2169.pdf

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread David Christensen


On 1/16/24 11:51, Franco Martelli wrote:

On 15/01/24 at 08:43, David Christensen wrote:
When I built and ran a Debian 2 @ HDD RAID1 using mdadm(8), I did not 
partition the HDD's -- I gave mdadm(8) the whole drives.


I don't know if it is a good idea; in fact there is a special partition 
type for RAID arrays listed in fdisk, and I used that for my RAID:


---
~# fdisk -l /dev/sd[a-d]
Disk /dev/sda: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00088ecc
...
I thought it was mandatory for a RAID to partition drives with this 
partition type, am I wrong?



Having STFW and RTFM, I have seen recommendations for and against using whole 
disks for RAID and for and against using partitions for RAID.  And, as 
this is the Internet, there are countless rumors and speculation.  As I 
switched from mdadm(8) to zfs(8) years ago, perhaps another reader can 
explain what mdadm(8) does when given whole disks and when given disk 
partitions.



David

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Franco Martelli


On 15/01/24 at 08:43, David Christensen wrote:
This I am still trying to do, the first pass copied all 350G of /home 
but went to the wrong drive, and I had mounted the drive by its label.

It is now /dev/sdh and all labels above it are now wrong. Crazy.
These SSD's all have an OTP serial number. I am tempted to use that 
serial number as a label _I_ can control.



When I built and ran a Debian 2 @ HDD RAID1 using mdadm(8), I did not 
partition the HDD's -- I gave mdadm(8) the whole drives.


I don't know if it is a good idea; in fact there is a special partition 
type for RAID arrays listed in fdisk, and I used that for my RAID:


---
~# fdisk -l /dev/sd[a-d]
Disk /dev/sda: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00088ecc

Device     Boot Start        End    Sectors   Size Id Type
/dev/sda1  *     2048 1953523711 1953521664 931,5G fd Linux raid autodetect


Disk /dev/sdb: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x000d65c9

Device     Boot Start        End    Sectors   Size Id Type
/dev/sdb1        2048 1953523711 1953521664 931,5G fd Linux raid autodetect


Disk /dev/sdc: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x000306a3

Device     Boot Start        End    Sectors   Size Id Type
/dev/sdc1        2048 1953523711 1953521664 931,5G fd Linux raid autodetect


Disk /dev/sdd: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x0007a1fe

Device     Boot Start        End    Sectors   Size Id Type
/dev/sdd1        2048 1953523711 1953521664 931,5G fd Linux raid autodetect
---

I thought it was mandatory for a RAID to partition drives with this 
partition type, am I wrong?


Cheers,

--
Franco Martelli

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Andy Smith

Hello,

On Tue, Jan 16, 2024 at 01:17:42AM -0500, Felix Miata wrote:
> If rsync really is bugged, maybe a change of options would avoid the bug. Try
> instead of -av, -rlptgoDAXUNH. Could it be that verbosity is the OOM
> crippler, and not necessarily from rsync itself, but possibly from the xterm
> in which rsync is running? Does your source contain any hard links? Do you
> use ACLs or xattrs?

I'm totally burned out on trying to get info out of Gene, but my
experience with rsync is that use of some options can massively
increase memory usage.

The options covered by -a don't tend to do it (and I doubt -v does
anything), but things like --delay-updates, --delete-before,
--delete-after and --prune-empty-dirs do. This is because rsync
normally incrementally finds files to transfer so it only keeps a
certain number of entries in memory and can sync any number of files
without blowing up RAM, but those options disable that strategy.

Even so, rsync only needs about 100 bytes of RAM per file that is
checked on source, and the size of the files doesn't matter.

In desperate circumstances, a file tree can be rsynced in multiple
segments, e.g. one rsync for each subdir or whatever other split
makes sense.
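
A minimal sketch of such a split, one rsync run per top-level directory. The function name and the RSYNC override are made up for illustration; only the -a option comes from the discussion above.

```shell
# sync_by_subdir SRC DST: one rsync run per top-level directory of SRC.
# RSYNC may be overridden (e.g. RSYNC=echo) to preview the commands
# instead of running them.
sync_by_subdir() {
    local src=${1%/} dst=${2%/} d name
    for d in "$src"/*/; do
        [ -d "$d" ] || continue            # the glob matched nothing
        name=$(basename "$d")
        ${RSYNC:-rsync} -a -- "$d" "$dst/$name/" || return 1
    done
}
```

Files sitting directly in SRC and directories whose names start with a dot are not matched by the glob, so they would need a separate pass.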

Maybe also ulimit can be used to set an artificially low value on
the memory that rsync is allowed to use. It will fail sooner, but
hopefully before using all the system's RAM and swap and having the
oom-killer intervene.
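
One way this might look, assuming a bash-like shell where ulimit -v takes a value in KiB. The function name and the paths in the example are hypothetical.

```shell
# run_capped KIB CMD...: run CMD with its virtual memory limited to KIB
# kibibytes. The body is a subshell, so the calling shell keeps its limits.
run_capped() (
    ulimit -v "$1" || exit 1
    shift
    exec "$@"
)

# Example (not executed here): cap rsync at 4 GiB of address space.
#   run_capped 4194304 rsync -a /home/ /mnt/backup/home/
```

With the cap in place, a runaway rsync should fail on its own allocation rather than drag the whole machine into the OOM killer.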

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Thomas Schmitt

Hi,

I, too, wondered where there should be a duplicate serial number.
But indeed:

David Wright wrote:
> > /dev/sdi    53  /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> > /dev/sdj1   54  /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
>   ↑ that is /really/ bad!

Does the number of 4 device files /dev/sd[h-k] match the number of
installed ata-Gigastone_SSD devices ? Gene talked of
"5, ordered in 2 separate orders".
(Looking at https://lists.debian.org/debian-user/2024/01/msg00667.html)
Now we see 3 to 4, depending on what one wants to believe.

Wild ideas:
One possible reason could be that a device is mapped to both, /dev/sdi
and /dev/sdj. udev would then suffer a race condition when creating the
/dev/disk/by-id.
Another could be that udev's assessment of the drives derails and that
serial number information spilled from the assessment of /dev/sdi to
the assessment of /dev/sdj*.

It would be interesting to see the output of

  ls -l /dev/sd[ij]*

in order to learn about the existence of /dev/sdj and the device
numbers of sdi* and sdj*.

Further one should inquire the serial numbers by

  lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijk]
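
To spot a repeated serial number without reading the listing by eye, the NAME and SERIAL columns could be piped through a small filter. This is only a sketch; the function name dup_serials is made up, and it assumes serial numbers contain no spaces.

```shell
# dup_serials: read "NAME SERIAL" pairs on stdin and print every serial
# number that is reported by more than one device, with the device names.
dup_serials() {
    awk 'NF >= 2 { count[$2]++; names[$2] = names[$2] " " $1 }
         END { for (s in count) if (count[s] > 1) print s ":" names[s] }'
}

# Example (not executed here); -n suppresses the lsblk header line:
#   lsblk -d -n -o NAME,SERIAL /dev/sd[hijk] | dup_serials
```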

Have a nice day :)

Thomas

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Max Nikulin


On 16/01/2024 15:18, Tom Furie wrote:

/dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
How does a printer get a storage device assignment???


By having some kind of SD card slot or similar.


I have heard that some devices expose a USB mass storage interface out 
of the box to autorun an installer when the device is plugged. Finally 
the installer switches the device to its normal mode. On Linux 
usb-modeswitch might be required.

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Greg Wooledge

On Tue, Jan 16, 2024 at 09:31:54AM -0500, Felix Miata wrote:
> David Wright composed on 2024-01-16 08:05 (UTC-0600):
> 
> > On Tue 16 Jan 2024 at 00:55:52 (-0500), Felix Miata wrote:
> 
> >> gene heskett composed on 2024-01-15 17:56 (UTC-0500):
> 
> >>> Thanks for that composition: but it will be word wrapped:
> >>> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done
> >>> /dev/sr0    /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
> >>> /dev/sdi    /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> >>> /dev/sdj1   /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
> 
> > It's right here at the top.
> 
> I missed that, probably because i & j look similar in the big sea of
> alphanumerics, /and/ sdi has no partitions, while sdj1 has no parent disk.
> That seems to smell as much like a bug somewhere as two different disks with
> the same serial number, a cheap SATA port card maybe. Does ...1146 get
> duplication like that when connected to any/every available SATA port?

I missed it too.  It actually looks like someone copy/pasted the
pathnames on the right, but then manually typed the device names on
the left, and made a typo here.  Or, somehow, the device names and
the pathnames got mixed together, and someone tried to separate them
manually, and got these two crossed.

Re: smartctl cannot access my storage, need syntax help

2024-01-16 Thread Felix Miata

David Wright composed on 2024-01-16 08:05 (UTC-0600):

> On Tue 16 Jan 2024 at 00:55:52 (-0500), Felix Miata wrote:

>> gene heskett composed on 2024-01-15 17:56 (UTC-0500):

>>> Thanks for that composition: but it will be word wrapped:
>>> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done
>>> /dev/sr0    /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
>>> /dev/sdi    /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
>>> /dev/sdj1   /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1

> It's right here at the top.

I missed that, probably because i & j look similar in the big sea of
alphanumerics, /and/ sdi has no partitions, while sdj1 has no parent disk. That
seems to smell as much like a bug somewhere as two different disks with the same
serial number, a cheap SATA port card maybe. Does ...1146 get duplication like
that when connected to any/every available SATA port?
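
A mismatch like this is easy to miss by eye, so here is a rough sketch of a filter that flags it. It reads the tab-separated "device, by-id path" lines that the quoted loop prints and reports any -partN link whose device is not a partition of the device behind the matching whole-disk link. The function name is made up, and the prefix comparison is a simplification.

```shell
# check_byid_parents: stdin is "device<TAB>by-id path" per line.
# Prints a MISMATCH line when a "-partN" link resolves to a device that
# does not start with the name of the device behind the whole-disk link.
check_byid_parents() {
    awk -F '\t' '
        { dev[NR] = $1; id[NR] = $2 }
        END {
            # First collect whole-disk links, so input order does not matter.
            for (i = 1; i <= NR; i++)
                if (id[i] !~ /-part[0-9]+$/) disk[id[i]] = dev[i]
            for (i = 1; i <= NR; i++) {
                if (id[i] !~ /-part[0-9]+$/) continue
                base = id[i]
                sub(/-part[0-9]+$/, "", base)
                if ((base in disk) && index(dev[i], disk[base]) != 1)
                    printf "MISMATCH %s -> %s, but %s -> %s\n",
                           base, disk[base], id[i], dev[i]
            }
        }'
}

# Example (not executed here):
#   for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done | check_byid_parents
```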
-- 
Evolution as taught in public schools is, like religion,
based on faith, not based on science.

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata

Re: smartctl cannot access my storage, need syntax help