[PLUG] Hard drive issues

2022-08-18 Thread Rich Shepard

A few months ago you helped me re-do the MediaSonic Probox external drive
enclosure. I replaced all four hard drives with WD RED 2T drives.

Bay 1 is mounted as /media/data2/, bay 2 is mounted as /media/data3, and
bays 3 and 4 are a RAID1 (/dev/md0) mounted as /media/backup.

Yesterday my dirvish backup reported rsync errors with /media/data2/ and
/media/data3/. I sent one error report to the dirvish mail list (since
they're both the same rsync input/output errors). No response yet.

I've uploaded one dirvish error report (as temp.tmp) to
 because at 201 lines it's too large to include in
this message. It will remain there for 5 days.

The two drives are entered in /etc/fstab as:
UUID=b50f1824-45ee-4623-adc7-ea737a88902b  /media/data2  ext4  auto,users,rw  1 2
UUID=8b47d782-8e1c-46bb-b314-0c53d90d6fac  /media/data3  ext4  auto,users,rw  1 2

and fdisk -l reports:
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdi: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 516A555D-993E-4F90-97A5-D698B61E7170

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdj: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E999846E-C55C-4DF6-8656-F7B68F542F6C

lsblk doesn't include them.

This all worked fine until Wednesday night.

I think the fstab UUIDs are for the partitions /dev/sdi1 and /dev/sdj1
rather than for /dev/sdi and /dev/sdj.

Could the WD RED drives have failed so quickly?

Should I run fsck? If so, do I specify /dev/sdi and /dev/sdj or the *1
partitions?

I need the data on these disks. Fortunately, I've added nothing to either
since Tuesday so the backups through Tuesday are all okay.

Rich



Re: [PLUG] Hard drive issues

2022-08-18 Thread Tomas Kuchta
On Thu, Aug 18, 2022, 08:43 Rich Shepard wrote:

> A few months ago you helped me re-do the MediaSonic Probox external drive
> enclosure. I replaced all four hard drives with WD RED 2T drives.
>
> Bay 1 is mounted as /media/data2/, bay 2 is mounted as /media/data3, and
> bays 3 and 4 are a RAID1 (/dev/md0) mounted as /media/backup.


I have no time to look through the logs; hopefully others will.

That said, I would not make the disks the top suspect. RAID over a
USB-attached disk array will fail sooner or later. That has been discussed
(and ignored; I understand the inconvenience) a lot here with respect to
these MediaSonic enclosures.

Not wanting to repeat the same thing over and over, but JBOD, Btrfs, or ZFS
are better choices over a USB disk array, IMHO.

Please weigh the above as general advice. The logs may show the real root
cause, which will likely support this.

-Tomas


Re: [PLUG] Hard drive issues

2022-08-18 Thread Rich Shepard

On Thu, 18 Aug 2022, Tomas Kuchta wrote:


> That said, I would not blame the disks as top suspect. Raid over usb
> attached disk array will fail sooner or later. That has been discussed
> (and ignored, I understand the inconvenience) a lot here in respect to
> these MediaSonic enclosures.
>
> Not wanting to repeat the same over and over, but JBOD or Btrfs or zfs are
> better choices - over usb disk array IMHO.


Tomas,

Let me describe the Probox again.

Bay 1 is /media/data2
Bay 2 is /media/data3
Bays 3 and 4 are RAID1 /media/backup

The RAID is fine; it's the two NON-RAID1 disks that are not available.

Rich


Re: [PLUG] Hard drive issues

2022-08-18 Thread Rich Shepard

On Thu, 18 Aug 2022, Ben Koenig wrote:


> The log has generic input/output errors. You'll need to check your system
> logs at the time of the error. dmesg in particular will tell you if the
> USB connection reset because as Tomas mentioned USB connections tend to be
> unreliable.


Ben,

Hadn't thought of dmesg:
...
[9151279.099122] EXT4-fs warning (device sdd1): htree_dirblock_to_tree:995: inode #2: lblock 0: comm gvfs-udisks2-vo: error -5 reading directory block
[9151279.099132] EXT4-fs warning (device sdd1): htree_dirblock_to_tree:995: inode #2: lblock 0: comm gvfs-udisks2-vo: error -5 reading directory block
[9151279.099142] EXT4-fs warning (device sdd1): htree_dirblock_to_tree:995: inode #2: lblock 0: comm gvfs-udisks2-vo: error -5 reading directory block
[9151279.099151] EXT4-fs warning (device sdd1): htree_dirblock_to_tree:995: inode #2: lblock 0: comm gvfs-udisks2-vo: error -5 reading directory block
[9151279.099635] EXT4-fs error (device sdd1): ext4_find_entry:1455: inode #2: comm pool: reading directory lblock 0
[9151279.099638] EXT4-fs error (device sdc1): ext4_find_entry:1455: inode #2: comm pool: reading directory lblock 0
[9153896.496795] EXT4-fs error (device sdc1): ext4_find_entry:1455: inode #2: comm gvfsd-trash: reading directory lblock 0
[9153896.496819] EXT4-fs error (device sdc1): ext4_find_entry:1455: inode #2: comm gvfsd-trash: reading directory lblock 0
[9153896.496852] EXT4-fs error (device sdd1): ext4_find_entry:1455: inode #2: comm gvfsd-trash: reading directory lblock 0
[9153896.496869] EXT4-fs error (device sdd1): ext4_find_entry:1455: inode #2: comm gvfsd-trash: reading directory lblock 0
[9156467.969892] EXT4-fs (sdc1): error count since last fsck: 54
[9156467.969897] EXT4-fs (sdd1): error count since last fsck: 30
[9156467.969898] EXT4-fs (sdc1): initial error at time 1660653781: ext4_find_entry:1455: inode 2
[9156467.969901] EXT4-fs (sdd1): initial error at time 1660653781: ext4_find_entry:1455
[9156467.969902] EXT4-fs (sdc1): last error at time 1660826190: ext4_find_entry:1455
[9156467.969902] : inode 2
[9156467.969904] : inode 2
[9156467.969909] EXT4-fs (sdd1): last error at time 1660826190: ext4_find_entry:1455: inode 2

So dmesg sees sdc1 and sdd1 while fdisk sees sdi and sdj. And I used UUIDs
in fstab.
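
The "initial error at time" and "last error at time" values in the dmesg
lines above are seconds since the Unix epoch. A minimal sketch for turning
them into readable dates, assuming GNU date (the -d @N form is a GNU
extension) and using epoch_to_utc as a made-up helper name:

```shell
#!/bin/sh
# Convert a Unix epoch value, as printed by the kernel in
# "initial error at time N" / "last error at time N", to a UTC timestamp.
# Assumption: GNU date is installed; epoch_to_utc is a hypothetical helper.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1660653781   # initial error -> 2022-08-16 12:43:01 UTC
epoch_to_utc 1660826190   # last error    -> 2022-08-18 12:36:30 UTC
```

Comparing those times with the system log for the same period may show
whether a USB reset was logged just before the first error.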


> Also, why is it that nobody ever posts the output of the mount command
> when they have filesystem errors? Nobody cares about your /etc/fstab.
> Seriously, it doesn't matter and we don't need to see it.
> 'mount | grep media'


I looked at mount a few times; didn't think of posting it. Just now it
shows:
# mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdb1 on /home type ext4 (rw)
/dev/sdb2 on /opt type ext4 (rw)
/dev/sdb3 on /data1 type ext4 (rw)
/dev/sda1 on /boot/efi type vfat (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sdc1 on /media/data2 type ext4 (rw,noexec,nosuid,nodev)
/dev/sdd1 on /media/data3 type ext4 (rw,noexec,nosuid,nodev)
/dev/md0 on /media/backup type ext4 (rw)
gvfsd-fuse on /tmp/runtime-rshepard/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,user=rshepard)

Thanks,

Rich


Re: [PLUG] Hard drive issues

2022-08-18 Thread Michael Ewan
Ext4 is not known for being a robust file system and can cause corruption.
Plus, as others have pointed out, USB is fine for making a quick copy and
then disconnecting; it is not suitable for RAID storage.  Would it be
possible to get an external eSATA enclosure?  As for the file system, XFS
is paramount in my experience: very fast and reliable.

On Thu, Aug 18, 2022 at 2:13 PM Rich Shepard wrote:

> So dmesg sees sdc1 and sdd1 while fdisk sees sdi and sdj. And I used UUIDs
> in fstab.


[PLUG] Recovering physical memory question

2022-08-18 Thread American Citizen

Hi:

I have been running a mathematical programming language on my openSuse
Linux system, and have noticed that programs run in this language seem to
chew up physical memory without releasing it when the program terminates
or is killed. Once I had all 32 gigs of memory allocated plus about 12 gigs
of swap, leading to a severely swamped system, which I barely recovered
from.


Is there any command that can be run to recover good physical memory? I
know rebooting the system will recover the physical memory, but that is
a last resort.


I suspect a memory leak in the programming language as the cause of all 
this.


Thanks for your input.

Randall




Re: [PLUG] Hard drive issues

2022-08-18 Thread Rich Shepard

On Thu, 18 Aug 2022, Ben Koenig wrote:


> If fdisk is seeing those partitions as sdi/sdj but they show in mount as
> their original sdc/sdd, then that definitely means they disconnected and
> reconnected. It probably happened so fast that the system didn't have time
> to clean up the old devices. You might want to check to make sure /dev
> doesn't have any strange block devices laying around. ls /dev/sd*
> shouldn't show any files for sdc or sdd. lsblk is a more detailed version
> of that.


# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb      8:16   0   1.8T  0 disk
├─sdb2   8:18   0   100G  0 part  /opt
├─sdb3   8:19   0   1.3T  0 part  /data1
└─sdb1   8:17   0   400G  0 part  /home
sdi      8:128  0   1.8T  0 disk
└─md0    9:0    0   1.8T  0 raid1 /media/backup
sr0     11:0    1  1024M  0 rom
sdg      8:96   0   1.8T  0 disk
└─sdg1   8:97   0   1.8T  0 part
sda      8:0    0 232.9G  0 disk
├─sda2   8:2    0    32G  0 part  [SWAP]
├─sda3   8:3    0 200.8G  0 part  /
└─sda1   8:1    0   100M  0 part  /boot/efi
sdj      8:144  0   1.8T  0 disk
└─md0    9:0    0   1.8T  0 raid1 /media/backup
sdh      8:112  0   1.8T  0 disk
└─sdh1   8:113  0   1.8T  0 part



> It looks like all you need to do is umount the drives, then mount fresh to
> force a replay of the ext4 journal. Since the fstab entries go by UUID you
> probably don't need to do anything more, it will find the correct path in
> /dev.


Sonofagun. I umounted then mounted /dev/sdc1 and /dev/sdd1 and now mount
shows them (again) as /dev/sdg1 and /dev/sdh1, but I can (as a user) once
again access those two drives. Ergo, that fixed the problem.

As long as I can access and back up those two partitions I don't care that
different utilities see them as differently named devices.
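
The situation described above, where mount still lists a device node that
the kernel has since re-enumerated under another name, can be spotted with
a small check. This is only a sketch: stale_mounts is a made-up helper, it
assumes mount output in the "DEV on DIR type FS (OPTS)" format shown in
this thread, and it does not handle mount points containing spaces.

```shell
#!/bin/sh
# stale_mounts: read mount-style lines ("DEV on DIR type FS (OPTS)") on
# stdin and print "DEV DIR" for every /dev/* source that is no longer
# present as a block device. Hypothetical helper, not part of any package.
stale_mounts() {
    while read -r dev _on dir _rest; do
        case $dev in
            # Only real device paths are checked; proc, sysfs, tmpfs and
            # similar pseudo sources are skipped.
            /dev/*) [ -b "$dev" ] || printf '%s %s\n' "$dev" "$dir" ;;
        esac
    done
}
```

Typical use would be "mount | stale_mounts". Any mount point it prints is
a candidate for the umount followed by mount described above.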

Thanks very much, Ben.

Rich




Re: [PLUG] Hard drive issues

2022-08-18 Thread Rich Shepard

On Thu, 18 Aug 2022, Michael Ewan wrote:


> Ext4 is not known for being a robust file system and can cause corruption,
> plus as others have pointed out USB is fine for making a quick copy and
> then disconnecting, it is not suitable for RAID storage. Would it be
> possible to get an external eSATA enclosure? As for the file system, XFS
> is paramount in my experience, very fast and reliable.


Michael,

Be that as it may, there's nothing wrong with the two disks of the RAID1
array. And the Probox supports eSATA, but the current desktop hasn't an
eSATA port. The new one does ... as soon as I can make the time to finish
wiring components and install and configure the OS and applications.

Regards,

Rich


Re: [PLUG] Recovering physical memory question

2022-08-18 Thread Rich Shepard

On Thu, 18 Aug 2022, American Citizen wrote:


> I have been running a mathematical programming language on my openSuse
> Linux system, but have noticed that running programs in this language seem
> to be chewing up physical memory, but not releasing it back when the
> program is terminated, or killed. Once I had all 32 gigs of memory
> allocated and about 12 gigs of swap, leading to a severely swamped system,
> which I barely recovered from.


Randall,

Just out of curiosity, what language are you using?

Rich


Re: [PLUG] Recovering physical memory question

2022-08-18 Thread Russell Senior
If you want to recover the swap space back into RAM (assuming the RAM is
available again):

  sudo swapoff -a
  sudo swapon -a

... should do the trick.

On Thu, Aug 18, 2022 at 3:00 PM American Citizen wrote:

> Hi:
>
> I have been running a mathematical programming language on my openSuse
> Linux system, but have noticed that running programs in this language
> seem to be chewing up physical memory, but not releasing it back when
> the program is terminated, or killed. Once I had all 32 gigs of memory
> allocated and about 12 gigs of swap, leading to a severely swamped
> system, which I barely recovered from.
>
> Is there any command that can be run, to recover good physical memory? I
> know rebooting the system will recover the physical memory, but this is
> the last step.


Re: [PLUG] Recovering physical memory question

2022-08-18 Thread American Citizen
Before running this command, please make sure that your system has
enough available physical memory to hold everything that is in swap. I
did this once with about 11 or 12 gigs in swap and only 4 gigs available,
and suddenly realized that all physical memory was going to be swallowed.
Try as I might (using htop), I could NOT abort the swapoff command, no
matter how many signals I sent to it once it was running.
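
The warning above can be turned into a pre-check. A minimal sketch,
assuming a kernel that reports MemAvailable in /proc/meminfo; swap_fits is
a made-up helper name, and the comparison is deliberately simple (it leaves
no safety margin):

```shell
#!/bin/sh
# swap_fits [FILE]: succeed (exit 0) only if the swap currently in use is
# smaller than MemAvailable, i.e. swapoff would plausibly have room to pull
# everything back into RAM. FILE defaults to /proc/meminfo.
# Hypothetical helper; all values are in kB as reported by the kernel.
swap_fits() {
    awk '
        /^MemAvailable:/ { avail = $2 }
        /^SwapTotal:/    { total = $2 }
        /^SwapFree:/     { free  = $2 }
        END {
            used = total - free
            printf "swap in use: %d kB, available RAM: %d kB\n", used, avail
            if (used < avail) exit 0
            exit 1
        }
    ' "${1:-/proc/meminfo}"
}
```

One way to use it is "swap_fits && sudo swapoff -a && sudo swapon -a", so
that swapoff only starts when the check passes.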



On 8/18/22 15:38, Russell Senior wrote:

> If you want to recover the swap space back into RAM (assuming the RAM is
> available again):
>
>    sudo swapoff -a
>    sudo swapon -a
>
> ... should do the trick.


Re: [PLUG] Recovering physical memory question

2022-08-18 Thread Tomas Kuchta


If you have a memory leak, your only option is to check whether the process
that is holding the memory is still running, and kill that process.

top and ps -ef should help to find the process to kill.
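
As a rough alternative to scanning top or ps output by eye, the resident
memory of every process can be read from /proc and sorted. This is a
sketch only: top_rss is a made-up helper name, and it reports just the
first word of each process name.

```shell
#!/bin/sh
# top_rss [N]: print the N (default 10) processes with the largest resident
# set size as "RSS_kB<TAB>PID<TAB>NAME", largest first. Reads
# /proc/PID/status directly; kernel threads have no VmRSS line and are
# skipped. Hypothetical helper, not part of any package.
top_rss() {
    for f in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read; ignore that.
        awk '/^Name:/  { name = $2 }
             /^Pid:/   { pid = $2 }
             /^VmRSS:/ { rss = $2 }
             END { if (rss != "") printf "%d\t%s\t%s\n", rss, pid, name }' \
            "$f" 2>/dev/null || :
    done | sort -rn | head -n "${1:-10}"
}
```

Running "top_rss 5" lists the five largest processes. If the suspect
program no longer appears there, it is not the one holding the memory.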

If your system is healthy and you want to free RAM by evicting
buffers/cache, memhog can help by taking a lot of RAM and then releasing it.
Just make sure you do not ask it to "hog" so much memory that the OS is left
short of what it needs.

-Tomas



Re: [PLUG] Recovering physical memory question

2022-08-18 Thread Randy Bush
Valgrind?


Re: [PLUG] Recovering physical memory question

2022-08-18 Thread Bill Barry
On Thu, Aug 18, 2022 at 5:00 PM American Citizen wrote:
>
> Hi:
>
> I have been running a mathematical programming language on my openSuse
> Linux system, but have noticed that running programs in this language
> seem to be chewing up physical memory, but not releasing it back when
> the program is terminated, or killed. Once I had all 32 gigs of memory
> allocated and about 12 gigs of swap, leading to a severely swamped
> system, which I barely recovered from.

If memory is not freed when the program terminates, that seems to me
to be an operating system bug, not a programming language bug. How do
you know the program actually terminated and how do you know the
memory is not being freed?

Bill




Re: [PLUG] Recovering physical memory question

2022-08-18 Thread Russell Senior
Top can sort processes by memory size, which can help figure out if your
memory eating process is still alive.

Another useful command line is: echo 3 > /proc/sys/vm/drop_caches
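
Memory held as buffers and page cache is often counted as "used" even
though the kernel can reclaim it, so it is worth checking how large that
share is before concluding that memory was never freed. A minimal sketch,
with cache_kb as a made-up helper name; the sum is only a rough upper bound
on what dropping caches could release, since dirty and shared-memory pages
are not dropped:

```shell
#!/bin/sh
# cache_kb [FILE]: print Buffers + Cached + SReclaimable in kB from a
# meminfo-format file (default /proc/meminfo). Hypothetical helper.
cache_kb() {
    awk '/^(Buffers|Cached|SReclaimable):/ { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Writing to drop_caches needs root, and the redirection itself must run
# as root, so "sudo echo 3 > ..." fails from a normal shell. One common
# form, run sync first so dirty pages are written out:
#   sync
#   echo 3 | sudo tee /proc/sys/vm/drop_caches
```

Comparing "cache_kb" before and after the drop shows how much of the
apparent memory use was cache rather than a leak.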