[PLUG] Raving mad RAID

2021-02-01 Thread John Jason Jordan
About a week ago I finally succeeded in creating a RAID0 array on
my four NVMe drives, which are installed in a Thunderbolt 3 enclosure.
After creating the array it appeared in /dev as md0; after rebooting it
became md127. I copied the UUID from Gparted and used it in a line
that I added to /etc/fstab.
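
The general shape of the commands was roughly this (a sketch, not my
exact invocation - the UUID and mount point here are placeholders):

sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1

UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/raid  ext4  defaults  0  2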

The array has been working fine ever since I created it, including
copying files to it late last night. This morning I tried to add a
torrent for a distro ISO to Ktorrent, and got an error message that
Ktorrent couldn't add the torrent because the location to copy it to
did not exist. WTH?

I looked at my GUI file manager and all the files in the array were
listed. I right-clicked on one of them and immediately noticed that
Rename and Delete were no longer listed in the options. After a bit
more poking around I determined that the array had become read-only
overnight.

I decided to umount it and then re-mount it. The umount command gave me
'can't read superblock on /dev/md127p1' (which is what /dev/md0 became
after rebooting a week ago). Apparently the umount succeeded anyway,
because the array was no longer mounted. But when I tried to re-mount
it I got the same superblock error.

Looking at /dev I see that almost everything has changed. nvme1-3 now
have namespace 2 instead of the namespace 1 they had when I created the
array. nvme5-8 are now listed, although those drives don't exist. And
nvme4n1, which was the only drive with a partition after I created the
array, now has two partitions.

It looks like I'm going to have to nuke the array, re-make it, and wait
24 hours to copy the 10TB of data back to the new array from the NAS
backup. But before I do that I need to find out what went wrong. Might
there be a defect in one of the NVMe drives? Or might there be a bug in
mdadm when it tries to create an array out of NVMe media? Or when the
ext4 filesystem was created? I assume that there exists a utility to
check a drive, but I've never done that before. Suggestions?

I'm considering throwing my computers into the river and doing
something useful with my life.
___
PLUG: https://pdxlinux.org
PLUG mailing list
PLUG@pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug


Re: [PLUG] Raving mad RAID

2021-02-01 Thread TomasK
On Mon, 2021-02-01 at 16:19 -0800, John Jason Jordan wrote:
> About a week ago I finally succeeded in creating a RAID0 array on
> my four NVMe drives, which are installed in a Thunderbolt 3 enclosure.
>
> I'm considering throwing my computers into the river and doing
> something useful with my life.

Perhaps now would be the time to dig out those old emails and consider
some of the native alternatives rejected in favor of RAID0.

Just saying, -T



Re: [PLUG] Raving mad RAID

2021-02-01 Thread Ben Koenig



On 2/1/21 9:16 PM, TomasK wrote:

> Perhaps now would be the time to dig out those old emails and consider
> some of the native alternatives rejected in favor of RAID0.

Unfortunately it looks like RAID might not be the culprit if his NVMe 
/dev nodes are moving around. RAID0 isn't the cause but it will make 
things more complicated when something fails further down in the stack.



If his system is dynamically naming devices in /dev/nvme* then that 
needs to be dealt with before even thinking about RAID. Not really sure 
where to start looking at that off the top of my head since I was under 
the assumption that this wasn't supposed to happen with NVMe.


-Ben



Re: [PLUG] Raving mad RAID

2021-02-01 Thread John Jason Jordan
On Mon, 1 Feb 2021 22:15:12 -0800
Ben Koenig  dijo:

>> Perhaps now would be the time to dig out those old emails and
>> consider some of the native alternatives rejected in favor of RAID0.

>Unfortunately it looks like RAID might not be the culprit if his NVMe
>/dev nodes are moving around. RAID0 isn't the cause but it will make
>things more complicated when something fails further down in the stack.
>
>If his system is dynamically naming devices in /dev/nvme* then that
>needs to be dealt with before even thinking about RAID. Not really
>sure where to start looking at that off the top of my head since I was
>under the assumption that this wasn't supposed to happen with NVMe.

There was recently a bit of discussion about LVM, and Rich sent me some
links. I tried to read and understand it, but it seemed even more
complicated than RAID. Plus, several years ago, when I was using Fedora,
one of their obligatory updates changed my setup to LVM (without
telling me that it was going to do so), and I couldn't get rid of it.
That left a bad taste in my mouth and I have always avoided LVM ever
since. But I must admit that my dislike of LVM is pure bias without
much science.

I am more concerned about devices renaming themselves and changing how
they are mounted, all without any input from me. About January 20 I lost
the first array that I had been running without a problem for about a
month. And now my re-creation of that array is playing up after only a
week. As I mentioned before, after rebooting the drives appear fine,
read-write, but when I launched Ktorrent it complained that about half
of the files it was seeding were missing. The files are all there and I
can do anything I want to with them, but something is screwy with
access. And why just half of the files? Either they should all work or
they should all fail.

Right now my money is on a defective drive. I need to find some tools
for diagnosis and learn how to use them. All four are brand new Intel
drives. If they all check out OK, then it's time to consider other
possibilities.


Re: [PLUG] Raving mad RAID

2021-02-01 Thread Ben Koenig



On 2/1/21 11:35 PM, John Jason Jordan wrote:

> As I mentioned before, after rebooting the drives appear fine,
> read-write, but when I launched Ktorrent it complained that about half
> of the files it was seeding were missing. The files are all there and I
> can do anything I want to with them, but something is screwy with
> access. And why just half of the files? Either they should all work or
> they should all fail.


That's what seems so odd. A defective drive wouldn't actually change the
way things are enumerated. You have 4 drives; one of them would
disappear and the others would stay the same (for the most part).



A simple test to help everyone here understand what your machine is
doing would be to run through a few reboots and grab the list of
devices, like so:

1) unplug your TB-3 drives and reboot.

2) record the output of 'ls -l /dev/nvme*' here

3) turn the computer off

4) plug in the TB-3 drives

5) turn the computer on and run 'ls -l /dev/nvme*' again.

This will clearly isolate the device nodes for your enclosure
independently of everything else on your computer. Once we have the
drives isolated, it's trivial to watch them for irregular behavior.
Until we have more confidence in the existence of your /dev/nvme nodes
we can ignore the other symptoms.





Right now my money is on a defective drive. I need to find some tools
for diagnosis and learn how to use them. All four are brand new Intel
drives. If they all check out OK, then it's time to consider other
possibilities.
___
PLUG: https://pdxlinux.org
PLUG mailing list
PLUG@pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug

___
PLUG: https://pdxlinux.org
PLUG mailing list
PLUG@pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug


Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Mon, 1 Feb 2021 23:48:03 -0800
Ben Koenig  dijo:

>That's what seems so odd. A defective drive wouldn't actually change
>the way things are enumerated. You have 4 drives, one of those would
>disappear and the others would stay the same (for the most part).

My understanding is that, since it is RAID0, if one drive fails the
whole array fails. (But that's why this array is backed up to a NAS.)

>A simple test to help everyone here understand what your machine is
>doing would be to run through a few reboots and grab the list of
>devices, like so

I will do these things in the morning. It's too late and my brain is
going into shutdown mode.

But I should add one more thought: I have a RAID0 array on the Synology
NAS, and another on a Mediasonic enclosure with two WD drives. Both
have worked flawlessly for about four years. I've never had to mess
with the arrays.

And one more thought: I'd consider LVM instead of RAID0. But whatever
system I set up, I need the four 7.68TB NVMe drives to appear as one
big-ass 31TB drive.

Now it's bedtime. :)



Re: [PLUG] Raving mad RAID

2021-02-02 Thread Rich Shepard

On Tue, 2 Feb 2021, John Jason Jordan wrote:


> And one more thought: I'd consider LVM instead of RAID0. But whatever
> system I set up, I need the four 7.68TB NVMe drives to appear as one
> big-ass 31TB drive.


John,

That's what LVM does. You have four physical drives. They can be collected
into volume groups any way you want, including all four in one group. Then
each group (one in your case) can be divided into as many logical volumes as
you want ... one in your case.
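
In rough strokes the sequence looks like this (a sketch with placeholder
device and volume names, not a recipe):

sudo pvcreate /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
sudo vgcreate bigvg /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
sudo lvcreate -l 100%FREE -n bigvol bigvg   # add -i 4 to stripe RAID0-style
sudo mkfs.ext4 /dev/bigvg/bigvol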

That's the way I set up my Mediasonic 4-bay NAS. I now have a single 8TiB
volume available.

The process is a bit different from preparing a new drive (hard or solid
state), but one of the URLs I sent lays them out step-by-step.

So, should you choose to once again dip your toes in the LVM waters you'll
find plenty of help here to keep them toes warm.

Keep on truckin',

Rich


Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Mon, 1 Feb 2021 23:48:03 -0800
Ben Koenig  dijo:

>A simple test to help everyone here understand what your machine is
>doing would be to run through a few reboots and grab the list of
>devices, like so
>
>1) unplug your TB-3 drives and reboot.
>
>2) record the output of 'ls -l /dev/nvme*' here
>
>3) turn the computer off
>
>4) plug in the TB-3 drives
>
>5) turn the computer on and run 'ls /dev/nvme*' again.
>
>This will clearly isolate the device nodes for your enclosure
>independently of everything else on your computer. Once we have the
>drives isolate, it's trivial to watch them for irregular behavior.
>Until we have more confidence in the existence of your /dev/nvme nodes
>we can ignore the other symptoms.

Here are the results:

1: (after unplugging TB3 device and rebooting)
crw------- 1 root root 239, 0 Feb  2 12:01 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Feb  2 12:01 /dev/nvme0n1
brw-rw---- 1 root disk 259, 1 Feb  2 12:01 /dev/nvme0n1p1
brw-rw---- 1 root disk 259, 2 Feb  2 12:01 /dev/nvme0n1p2
Note that nvme0 is a 1TB m.2 drive inside the Thinkpad that holds / and
/home.

2: (after turning off computer, plugging in TB3 device, and booting)
crw------- 1 root root 239, 0 Feb  2 11:47 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Feb  2 11:47 /dev/nvme0n1
crw------- 1 root root 239, 1 Feb  2 11:47 /dev/nvme1
brw-rw---- 1 root disk 259, 2 Feb  2 11:47 /dev/nvme1n1
crw------- 1 root root 239, 2 Feb  2 11:47 /dev/nvme2
brw-rw---- 1 root disk 259, 1 Feb  2 11:47 /dev/nvme2n1
crw------- 1 root root 239, 3 Feb  2 11:47 /dev/nvme3
brw-rw---- 1 root disk 259, 3 Feb  2 11:47 /dev/nvme3n1
crw------- 1 root root 239, 4 Feb  2 11:47 /dev/nvme4
brw-rw---- 1 root disk 259, 4 Feb  2 11:47 /dev/nvme4n1
brw-rw---- 1 root disk 259, 5 Feb  2 11:47 /dev/nvme4n1p1
brw-rw---- 1 root disk 259, 6 Feb  2 11:47 /dev/nvme4n1p2


Re: [PLUG] Raving mad RAID

2021-02-02 Thread Ben Koenig



On 2/2/21 12:09 PM, John Jason Jordan wrote:


OK so then everything seems to be connecting at the hardware level. Your 
TB controller is exposing 4 NVMe devices and they are identifying as 
block devices ("disks") which means that the hardware is functioning.



What stands out is that of the 4 disks only 1 of them actually has 
partitions. This strikes me as odd.



As a simple test, can you create a test folder in /mnt and see if those 
partitions on nvme4 are mountable?


$ mkdir /mnt/nvme

$ mount /dev/nvme4n1p1 /mnt/nvme


Both commands should be run as root and I'm assuming you have nothing 
mounted in /mnt. If that succeeds and you can view files let me know. 
Also post any output from the mount command here.


-Ben





Re: [PLUG] Raving mad RAID

2021-02-02 Thread carl day
I have used F2FS for years on my SSD drives. [F2FS always benchmarks as
the fastest fs for me.] There is an option in mkfs.f2fs, -c, that can
make use of up to 7 devices as one large drive. I have 7 240G drives
[they do not have to be the same size] as one /dev in one of my large
boxes. I use Arch Linux, YMMV.
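
A sketch of that invocation (device names are placeholders; the -c
devices get appended to the first one):

mkfs.f2fs -c /dev/nvme1n1 -c /dev/nvme2n1 -c /dev/nvme3n1 /dev/nvme0n1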



Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Tue, 2 Feb 2021 12:09:06 -0800
John Jason Jordan  dijo:


After running the tests above I opened Ktorrent. It presented me with a
couple of error messages about missing files; I pointed it to a folder
that I had renamed, which it happily accepted, and it is now seeding all
its torrents. The renamed folders were my fault. Then I opened a file
manager on /dev and scrolled down to the nvme entries, which gave me:

nvme0
nvme0n1
nvme1
nvme1n1
nvme2
nvme2n1
nvme3
nvme3n1
nvme4
nvme4n1
nvme4n1p1
nvme4n1p2

And scrolling up a bit I see md127 and md127p1.

Everything is back to normal. My only worry is what happens when md127
and md127p1 suddenly become read-only again. It happened during the
night of February 1, so I assume that eventually it's going to happen
again.


Re: [PLUG] Raving mad RAID

2021-02-02 Thread Ben Koenig



On 2/2/21 12:31 PM, John Jason Jordan wrote:

> Everything is back to normal. My only worry is what happens when md127
> and md127p1 suddenly become read-only again. It happened during the
> night of February 1, so I assume that eventually it's going to happen
> again.



There's probably a bunch of debug information dumped into a log
somewhere. If it seems to be working now and you didn't actually change
anything, then it was probably a minor drop in the connection with one
or more of the drives. Depending on how paranoid you want to be, there
are steps you can take to try to root-cause the problem.
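
For example, something like this might turn up clues (a sketch - exact
log contents vary by distro):

sudo dmesg | grep -iE 'nvme|md127'
sudo journalctl -k -b -1 | grep -i nvme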



The one thing to keep an eye on is the number of nvme devices in /dev. 
If you encounter a situation where you have drives numbered above nvme4 
then that means the kernel is losing track of your connected drives. 
That's a very specific class of problem and usually easy to deal with 
once isolated.



For now it's probably best to continue as normal and if it happens again 
just take a look at your nvme devices in /dev and see if they have changed.


-Ben



Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Tue, 2 Feb 2021 05:50:57 -0800 (PST)
Rich Shepard  dijo:

>On Tue, 2 Feb 2021, John Jason Jordan wrote:
>
>> And one more thought: I'd consider LVM instead of RAID0. But whatever
>> system I set up, I need the four 7.68TB NVMe drives to appear as one
>> big-ass 31TB drive.

>That's what LVM does. You have four physical drives. They can be
>collected into virtual groups any way you want, including all four in
>one group. Then each group (one in your case) can be divided into as
>many logical volumes as you want ... one in your case.
>
>That's the way I set up my Mediasonic 4-bay NAS. I now have a single
>8TiB volume available.
>
>The process is a bit different from preparing a new drive (hard or
>solid state), but one of the URLs I sent lays them out step-by-step.

At the moment everything is working. But I'm pretty sure things
will eventually turn upside down again. My best guess is that one of my
drives is defective, but the failure is intermittent. I need something
to check the disks, sort of like the memtest tool that appears in the
grub menu. I want something that will check every part of each drive,
repeatedly.

I haven't completely given up on LVM, and I may end up there
eventually. But while poking around I stumbled on F2FS, a Linux
filesystem for flash drives written by Samsung. Here is the Wikipedia
article:

https://en.wikipedia.org/wiki/F2FS

Opinions welcome!


Re: [PLUG] Raving mad RAID

2021-02-02 Thread Ben Koenig



On 2/2/21 3:20 PM, John Jason Jordan wrote:

> At the moment everything is working. But I'm pretty sure things will
> eventually turn upside down again. My best guess is that one of my
> drives is defective, but the failure is intermittent. I need something
> to check the disks, sort of like the memtest tool that appears in the
> grub menu. I want something that will check every part of each drive,
> repeatedly.



smartctl can give you a quick look at drive statistics.

$ smartctl -a /dev/nvme0n1


It's by no means guaranteed to tell you if something is wrong or what 
the cause is, but it will probably let you know if there is a major problem.
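
If you have nvme-cli installed, its smart-log gives a similar health
summary, and badblocks can exercise the whole device in read-only mode
(a sketch - device names as in your listing; a full pass over 7.68TB
will take a while):

sudo nvme smart-log /dev/nvme0
sudo badblocks -b 4096 -sv /dev/nvme0n1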







Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Tue, 2 Feb 2021 17:07:45 -0800
Ben Koenig  dijo:

> smartctl -a /dev/nvme0n1

OK, this is really weird. All this time nvme0 has been the 1TB m.2
Samsung drive inside the Thinkpad, which holds / and /home. I was amazed
when I ran the command exactly as above and it said the drive was a
7.68TB Intel NVMe. I continued, incrementing the number each time I ran
the command, and the 1TB Samsung is now nvme4n1.

I had been perplexed with why nvme4n1 had two partitions, because I was
thinking it was part of the array. The mdadm command that I used to
create the array listed nvme1, nvme2, nvme3, and nvme4, in that order.
I copied the command from the terminal and put it in a text file to save
it, and I verified just now that I did not misremember. Drive nvme4 was
originally part of the array, and now it is the m.2 drive, while the
array is now made of nvme0-3.

I have long been aware that drive labeling like sda, sdb, etc. can
annoyingly swap around. Apparently NVMe drives can be just as
exasperating.

As for the results of the command on the drives in the array, there were
no errors reported, and everything else looked normal.


Re: [PLUG] Raving mad RAID

2021-02-02 Thread wes
On Tue, Feb 2, 2021 at 6:32 PM John Jason Jordan  wrote:

> I have long been aware that drive labeling like sda, sdb, etc. can
> annoyingly swap around. Apparently NVMe drives can be just as
> exasperating.
>
>
Do you have a /dev/disk directory? If so, within it you should find various
ways of addressing each drive, including by UUID. You could specify this
path in your fstab file, as well as re-create the RAID array using these
names rather than /dev/nvmewhatever.
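
For example (a sketch - blkid will report whatever UUIDs your system
actually has):

ls -l /dev/disk/by-id/ | grep -i nvme
sudo blkid /dev/md127p1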

-wes


Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Tue, 2 Feb 2021 18:49:21 -0800
wes  dijo:

>On Tue, Feb 2, 2021 at 6:32 PM John Jason Jordan 
>wrote:
>
>> I have long been aware that drive labeling like sda, sdb, etc. can
>> annoyingly swap around. Apparently NVMe drives can be just as
>> exasperating.

>Do you have a /dev/disk directory? If so, within it you should find
>various ways of addressing each drive, including by UUID. You could
>specify this path in your fstab file, as well as re-create the RAID
>array using these names rather than /dev/nvmewhatever.

Oh, I always do. After creating the array as md0 I opened it in Gparted
(GUI) to create and format the partition. If you right-click on the
partition you can get its UUID, which I saved in my text file where I
keep all the junk I had to go through to create the array and mount it.

It's nice to know that you can get the UUID from /dev/disk, but there
is also a command (that I have forgotten) to get the UUID. For a long
time I used LABEL= for mounting, which requires giving the partition a
label. Labels are great, but you have to be careful not to give more
than one partition the same label. I like labels because they are
human-readable, unlike UUIDs.
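
For ext4 that is something like (the label and mount point here are
just examples):

sudo e2label /dev/md127p1 bigraid

and then in fstab:

LABEL=bigraid  /mnt/raid  ext4  defaults  0  2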


Re: [PLUG] Raving mad RAID

2021-02-02 Thread wes
On Tue, Feb 2, 2021 at 7:36 PM John Jason Jordan  wrote:

> It's nice to know that you can get the UUID from /dev/disk, but there
> is also a command (that I have forgotten) to get the UUID. For a long
> time I used LABEL= for mounting, where I had give the partition a
> label. Labels are great, but you have to be careful not to label more
> than one partition with the same label. I like labels, because they are
> human readable, unlike UUIDs.
>
>
If you create the RAID array with UUIDs, it (probably) won't break when
your system renames devices.
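
Something like this, using the stable by-id names (a sketch - the
serial strings are made up; use whatever ls -l /dev/disk/by-id actually
shows):

sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/disk/by-id/nvme-INTEL_MODEL_SERIAL1 \
  /dev/disk/by-id/nvme-INTEL_MODEL_SERIAL2 \
  /dev/disk/by-id/nvme-INTEL_MODEL_SERIAL3 \
  /dev/disk/by-id/nvme-INTEL_MODEL_SERIAL4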

-wes


Re: [PLUG] Raving mad RAID

2021-02-02 Thread John Jason Jordan
On Tue, 2 Feb 2021 19:46:51 -0800
wes  dijo:

>If you create the RAID array with UUIDs, it (probably) won't break when
>your system renames devices.

Brilliant!


Re: [PLUG] Raving mad RAID

2021-02-03 Thread Larry Brigman
You can use mdadm to examine the superblock on each drive, and it will
give you the details of the array that the drive thinks it is in.
Without an mdadm.conf, the kernel will attempt to assemble the array
based on what it finds in the drive superblocks and will default to
md127, counting down from there.
/proc/mdadm will give you the status of the assembled arrays directly
without the need to go through the mdadm util.
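
For example (point it at each member device in turn):

sudo mdadm --examine /dev/nvme0n1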



Re: [PLUG] Raving mad RAID

2021-02-03 Thread John Jason Jordan
On Wed, 3 Feb 2021 12:48:48 -0800
Larry Brigman  dijo:

>You can use mdadm to examine the superblock on each drive and it will
>give you the details of the array that the drive thinks it is in.
>Without a mdadm.conf, the kernel will attempt to assemble the array
>based on what it finds in the drive superblocks and will default to
>md127 and count up from there.

Ah, now I know where the '127' came from.

As for mdadm.conf, when I started I noticed that I had such a file, and
I left it alone. But when I got massive errors in my attempts to create
the array, in hopes of making things work better I appended '.old' to
the file name. I still had errors, but after trying over and over I
finally succeeded in creating /dev/md0 without errors. After I finished
I deleted the renamed mdadm.conf file, and then used:
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
to recreate the mdadm.conf file.
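
For reference, --detail --scan emits lines of this general form (the
UUID here is a placeholder):

ARRAY /dev/md0 metadata=1.2 name=host:0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx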

>/proc/mdadm will give you the status of the assembled arrays directly
>without the need to go through the mdadm util.

/proc/mdadm
bash: /proc/mdadm: No such file or directory


Re: [PLUG] Raving mad RAID

2021-02-03 Thread wes
On Wed, Feb 3, 2021 at 2:19 PM John Jason Jordan  wrote:

> On Wed, 3 Feb 2021 12:48:48 -0800 Larry Brigman 
> dijo:
> >/proc/mdadm will give you the status of the assembled arrays directly
> >without the need to go through the mdadm util.
>
> /proc/mdadm
> bash: /proc/mdadm: No such file or directory
>
>
I get this mixed up frequently also. It's actually /proc/mdstat.

-wes


Re: [PLUG] Raving mad RAID

2021-02-03 Thread John Jason Jordan
On Wed, 3 Feb 2021 15:05:33 -0800
wes  dijo:

>On Wed, Feb 3, 2021 at 2:19 PM John Jason Jordan 
>wrote:
>
>> On Wed, 3 Feb 2021 12:48:48 -0800 Larry Brigman
>>  dijo:
>> >/proc/mdadm will give you the status of the assembled arrays
>> >directly without the need to go through the mdadm util.
>>
>> /proc/mdadm
>> bash: /proc/mdadm: No such file or directory

>I get this mixed up frequently also. it's actually /proc/mdstat.

$ /proc/mdadm
bash: /proc/mdadm: No such file or directory
jjj@Devil-Thinkpad:~$ /proc/mdstat
bash: /proc/mdstat: Permission denied
jjj@Devil-Thinkpad:~$ sudo /proc/mdstat
[sudo] password for jjj:
sudo: /proc/mdstat: command not found

Some utility for mdadm that I don't have installed?


Re: [PLUG] Raving mad RAID

2021-02-03 Thread wes
On Wed, Feb 3, 2021 at 4:14 PM John Jason Jordan  wrote:

>
> $ /proc/mdstat
>
>
Ah, yeah, we could have been clearer about that too. Try 'cat /proc/mdstat'.
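
For a healthy RAID0 it prints something of this general shape
(illustrative, not your actual output):

Personalities : [raid0]
md127 : active raid0 nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      ... blocks super 1.2 512k chunks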

-wes


Re: [PLUG] Raving mad RAID

2021-02-03 Thread Rich Shepard

On Wed, 3 Feb 2021, John Jason Jordan wrote:


> jjj@Devil-Thinkpad:~$ sudo /proc/mdstat


John,

The /proc files are text. Use:
sudo less /proc/mdstat

Rich


Re: [PLUG] Raving mad RAID

2021-02-17 Thread John Jason Jordan
On Tue, 2 Feb 2021 12:31:45 -0800
John Jason Jordan  dijo:

>nvme0
>nvme0n1
>nvme1
>nvme1n1
>nvme2
>nvme2n1
>nvme3
>nvme3n1
>nvme4
>nvme4n1
>nvme4n1p1
>nvme4n1p2
>
>And scrolling up a bit I see md127 and md127p1.
>
>Everything is back to normal. My only problem is what happens when the
>md127 and md127p1 suddenly become read-only again. It happened during
>the night of February 1, so I can assume that eventually it's going to
>happen again.

It worked fine for a couple weeks, and now it has stopped working again.
The contents of /dev now include:

md127
md127p1
nvme0n2 #was nvme0n1
nvme1n2 #was nvme1n1
nvme2n2 #was nvme2n1
nvme3n2 #was nvme3n1
nvme4
nvme4n1
nvme4n1p1
nvme4n1p2
nvme5
nvme6
nvme7
nvme8

And nvme5-8 do not exist. Things are all screwed up again. I am unable
to access any of the files on md127p1 (the array). The array is
(supposed to be) made up of nvme0-3, and nvme4 is inside my Thinkpad
for / and /home.

I pulled the TB3 cable out of the enclosure (which automatically shuts
it down) and then plugged it back in again. Nothing changed. From past
experience the only way to restore things is to completely reboot.

I really need to figure out what is causing this. Is it a defective
drive? Is it one of the two PCI cards that the four drives are plugged
into? Is it the enclosure that the two PCI cards are plugged into? Why
does it work for a couple weeks and then go south?

The only thing that I can rule out is the TB3 cable, because I bought
three of them and they have been swapped, with no change.

This is driving me nuts.


Re: [PLUG] Raving mad RAID

2021-02-17 Thread John Jason Jordan
On Wed, 17 Feb 2021 14:05:20 -0800
John Jason Jordan  dijo:

>This is driving me nuts.

Before rebooting I tried to umount md127p1, but got the 'busy' error
message, which I tried to get around with the 'lazy' -l option. But
although umount -l executed without error, it didn't unmount the array.
Eventually I gave up and rebooted, and after rebooting everything is
back to normal.

I really need ideas.


Re: [PLUG] Raving mad RAID

2021-02-17 Thread Tomas Kuchta
On Wed, Feb 17, 2021, 17:30 John Jason Jordan  wrote:

> On Wed, 17 Feb 2021 14:05:20 -0800
> John Jason Jordan  dijo:
>
> >This is driving me nuts.
>
> Before rebooting I tried to umount md127p1, but got the 'busy' error
> message, which I solved with the 'lazy' -l option. But although umount
> -l executed without error, it didn't umount the array. Eventually I
> gave up and rebooted, and after rebooting everything is back to normal.
>
> I really need ideas.
>

I thought you switched to using UUIDs - why do you care about name
changes?

TB is removable hotplug, like USB ... device name changes are to be
expected.

No?


Re: [PLUG] Raving mad RAID

2021-02-17 Thread John Jason Jordan
On Wed, 17 Feb 2021 18:55:11 -0500
Tomas Kuchta  dijo:

>I thought you switched to using uuids - why do you care about name
>changes?
>
>TB is removable hotplug  like USB ... device name changes are to be
>expected.

Well, yes. But I spent just a couple of minutes trying to umount and
mount the array, and it didn't dawn on me to use the UUID. I had a file
browser window open and /dev was displayed, so I just used
/dev/md127p1. That should have worked, at least for umount. When I
rebooted, the array was mounted with the UUID, because the line in fstab
has the UUID, not /dev/md127. The following command executed without
error, but afterwards the array was still mounted, even though I was
unable to access any of the files on it:

sudo umount -l /dev/md127p1

And afterwards I tried to mount it, but the mount command gave the error
that the device was already mounted.

And before umount/mount I pulled the TB3 plug from the enclosure,
waited a couple of minutes, then plugged it back in, but I was still
unable to access any of the files. It should have automatically
mounted, because that's what TB3 does. From past history I was pretty
sure that rebooting would fix things - I was just trying to solve the
problem without having to reboot. And none of these weird results
surprised me, because when I discovered that I couldn't access files on
the array my very first step was to look at /dev, and when I saw the
strange list of device names I knew that things had gotten screwed up
again.

Yes, device names may change on booting or when plugged in, but don't
device names stay put once you are booted or a drive is plugged in?


Re: [PLUG] Raving mad RAID

2021-02-17 Thread Tomas Kuchta
On Wed, Feb 17, 2021, 19:41 John Jason Jordan  wrote:

> And before umount/mount I pulled the TB3 plug from the enclosure,
> waited a couple of minutes, then plugged it back in, but I was still
> unable to access any of the files. It should have automatically
> mounted, because that's what TB3 does.


That might work with a single disk if you run sync beforehand and
nothing accesses the disk.

Definitely not a good idea to unplug a disk array without a clean
unmount and stopping mdadm.
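
Roughly (a sketch - the mount point is an example, use whatever is in
your fstab):

sudo fuser -vm /mnt/raid   # show anything still using the mount
sudo umount /mnt/raid
sudo sync
sudo mdadm --stop /dev/md127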

You say that it all worked great in the past. Your past and current
problems would indicate that it was not all great in the past - or you
ran out of luck.

Hope that helps,
T



Re: [PLUG] Raving mad RAID

2021-02-17 Thread John Jason Jordan
On Thu, 18 Feb 2021 00:27:25 -0500
Tomas Kuchta  dijo:

>> And before umount/mount I pulled the TB3 plug from the enclosure,
>> waited a couple of minutes, then plugged it back in, but I was still
>> unable to access any of the files. It should have automatically
>> mounted, because that's what TB3 does.

>That might work with single disk if you run sync beforehand and nothing
>accesses the disk.

I never heard of 'sync.' Is that a command?

>Definitely not a good idea to unplug disk array without clean unmount
>and stopping mdadm.

I have had numerous RAID0 arrays, and I never had to stop mdadm before
mounting/unmounting. I don't even know how to stop mdadm.

>You say that  in the past was great. Your past and current problems
>would indicate that it was not all great in the past - or you runout of
>luck.

I think I need some education. An mdadm array is apparently not the same
as a disk, but how it's different and how to deal with it is something
I need to learn about.


Re: [PLUG] Raving mad RAID

2021-02-17 Thread TomasK
On Wed, 2021-02-17 at 21:59 -0800, John Jason Jordan wrote:
> I think I need some education. An mdadm array is apparently not the
> same as a disk, but how it's different and how to deal with it is
> something I need to learn about.

Note: My advice is to never let the array or the TB enclosure go to
sleep - SSDs can sleep to save power if mdraid lets them.

There are two ways to cleanly stop a raid array:
   1. Easiest:
      sudo shutdown -h
   2. If you need to change configuration, add/remove disks, etc.:
      copy-paste from:
-Tomas


Re: [PLUG] Raving mad RAID

2021-02-17 Thread John Jason Jordan
On Wed, 17 Feb 2021 22:57:55 -0800
TomasK  dijo:


>Note: My advice is: Never let the array or TB go to sleep - SSDs can
>sleep to save power if mdraid lets them.

I need to figure out how to stop the SSDs from sleeping to save power.

>There are two ways to cleanly stop raid array:
>   1. Easiest:
>  sudo shutdown -h
>   2. If you need to change configuration, add/remove disks, etc.
>  Copy
>  paste from:

What does -h reference? How do I enter a command to shut down the
array?


Re: [PLUG] Raving mad RAID

2021-02-18 Thread Tomas Kuchta
On Thu, Feb 18, 2021, 02:18 John Jason Jordan  wrote:

>
>
> >There are two ways to cleanly stop raid array:
> >   1. Easiest:
> >  sudo shutdown -h
> >   2. If you need to change configuration, add/remove disks, etc.
> >  Copy
> >  paste from:
>
> What does -h reference? How do I enter a command to shut down the
> array?


sudo shutdown -h now
will shut down your laptop - there are, obviously, other ways of doing
that - and the shutdown process should take care of doing it correctly,
as long as the numbers/dependencies in your fstab are correct.

man shutdown
should answer all the questions about shutdown command options.

-Tomas



Re: [PLUG] Raving mad RAID

2021-02-18 Thread David

On 2/17/21 4:40 PM, John Jason Jordan wrote:


Yes, device names may change on booting or when plugged in, but don't
device names stay put once you are booted or a drive is plugged in?


This sounds like a cable or hardware issue to me. If a drive is mounted
and something happens at a level the kernel isn't involved in, you can
end up with the same device coming back under a new address.


Either the drives are going to sleep, as already suggested, or there is 
a glitch in the communication path which is causing a sporadic 
disconnect/reconnect sequence.
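
If it is the drives sleeping, one knob that sometimes helps (an
assumption on my part - I have not tried it with a TB3 enclosure) is
disabling the deepest NVMe power-saving states with a kernel parameter,
added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by
update-grub and a reboot:

nvme_core.default_ps_max_latency_us=0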


I have seen this periodically with USB, and TB3 likely uses something 
similar, where udev is involved.


Sorry that I don't have a solution, but if you have another TB3 cable 
you could use, that may help narrow things down if the problem goes away.


dafr


Re: [PLUG] Raving mad RAID

2021-02-18 Thread John Jason Jordan
On Thu, 18 Feb 2021 10:40:04 -0800
David  dijo:

>On 2/17/21 4:40 PM, John Jason Jordan wrote:
>
>> Yes, device names may change on booting or when plugged in, but don't
>> device names stay put once you are booted or a drive is plugged in?
>
>This sounds like a cable or hardware issue to me. If a drive is
>mounted and something happens and the kernel isn't involved, you will
>get a new mount address of the same thing.
>
>Either the drives are going to sleep, as already suggested, or there
>is a glitch in the communication path which is causing a sporadic
>disconnect/reconnect sequence.
>
>I have seen this periodically with USB, and TB3 likely uses something
>similar, where udev is involved.
>
>Sorry that I don't have a solution, but if you have another TB3 cable
>you could use, that may help narrow things down if the problem goes
>away.

It's definitely not the cable. I have three TB3 cables, and they have
been swapped, yet the problem remains.

The four U.2 drives are plugged into two PCI cards (two drives per
card) which are then plugged into a four-slot enclosure. The enclosure
has two TB3 ports, but no on-off switch. When a powered TB3 cable is
plugged into either port the enclosure automatically powers on, and it
powers off when the cable is unplugged. And its fan is loud enough that
there is no mistaking when it is running. It is possible that there was
a power loss of only a few milliseconds - enough to un-mount the
array, but not long enough that the fan stopped running. But since it
is TB3, when the power is restored, why doesn't the array automatically
mount?

Besides, looking at the NVMe drives in /dev, when things go upside
down I see nvme5 to nvme8 listed, and those drives don't exist. After
rebooting they are no longer listed. The array is made from nvme0 to
nvme3, and nvme4 is inside my laptop and holds partitions for / and
/home. And from the command line I get screwy results from the mount
command - just 'mount' does not list the array (i.e., it must not be
mounted), but if I try to mount it I get an error message that it is
already mounted. Yet the file manager does not see it, nor does the ls
command.


Re: [PLUG] Raving mad RAID

2021-02-18 Thread wes
On Thu, Feb 18, 2021 at 1:24 PM John Jason Jordan  wrote:

>
> It's definitely not the cable. I have three TB3 cables, and they have
> been swapped, yet the problem remains.
>
>
So the next logical step is to swap the other bits of hardware. Which is
really difficult when you only have 1 of each. Normally in such a case we
could reach out to the community to see if anyone has similar hardware we
can borrow for troubleshooting, but I think you're the only one in the
Portland metro area with one of those enclosures.


> The four U.2 drives are plugged into two PCI cards (two drives per
> card) which are then plugged into a four-slot enclosure. The enclosure
> has two TB3 ports, but no on-off switch. When a powered TB3 cable is
> plugged into either port the enclosure automatically powers on, and it
> powers off when the cable is unplugged.
>

Does the enclosure offer any options to supply power externally? Might be
worth a try, just as an experiment.

-wes


Re: [PLUG] Raving mad RAID

2021-02-18 Thread John Jason Jordan
On Thu, 18 Feb 2021 13:36:15 -0800
wes  dijo:

>On Thu, Feb 18, 2021 at 1:24 PM John Jason Jordan 
>wrote:
>
>>
>> It's definitely not the cable. I have three TB3 cables, and they have
>> been swapped, yet the problem remains.

>So the next logical step is to swap the other bits of hardware. Which
>is really difficult when you only have 1 of each. Normally in such a
>case we could reach out to the community to see if anyone has similar
>hardware we can borrow for troubleshooting, but I think you're the
>only one in the Portland metro area with one of those enclosures.

>> The four U.2 drives are plugged into two PCI cards (two drives per
>> card) which are then plugged into a four-slot enclosure. The
>> enclosure has two TB3 ports, but no on-off switch. When a powered
>> TB3 cable is plugged into either port the enclosure automatically
>> powers on, and it powers off when the cable is unplugged.

>Does the enclosure offer any options to supply power externally? Might
>be worth a try, just as an experiment.

It's a Magma Expressbox 3T-V3:

https://www.onestopsystems.com/product/expressbox-3t-v3

Magma was bought out by One Stop Systems.

In the pic on the above link it looks like there is a power button on
the front, but it's really just a light.

Inside there are four PCI slots, but one is occupied by the TB3 card,
and one of the others is blocked by a big heat sink so you can't hang a
long card off the end of it. But that's OK because I just need slots
for two full length cards. There are also two fans, one in front
blowing in and another in back blowing out.

It has an external power supply that plugs in with a connector like the
end of a desktop PC power cord, except that the end that plugs into the
wall has a brick on it. I don't know how many volts the enclosure takes,
but it's not straight 120V, because I assume the brick is stepping it
down. So there is another way to turn it off - just pull the power cord
out of its socket. But unlike the TB3 cord, plugging the power cord back
in doesn't turn it on.

Even if I had spares to swap for the other bits, it would be difficult
to swap things out because it runs fine for a couple weeks before it
fails.