Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-16 Thread Anssi Saari
gene heskett  writes:

> Is this info helpful?

I don't know really. I was thinking about the file dialogs or requestors
and how they often try to access previously used locations. For example,
I've learned not to download with Firefox to a network drive.

I don't know if Firefox is still like that but in the past, after
downloading to a network drive, Firefox wanted to put the next download
in the same place and if the network drive wasn't available, it just
froze indefinitely and only getting that network drive going would bring
it out of its coma.

So I was just thinking if your file dialogs try to access something that
isn't available it could cause this kind of delay but I don't know, it
doesn't seem to fit that well. Also if the issue happened with any
common app packaged in Debian, it might be easier to figure out what's
happening.



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-15 Thread Stefan Monnier
> I've no idea how to start debugging this but I feel like the problem

`strace` maybe?
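
A minimal sketch of how that could look, assuming OpenSCAD is the program under test and that /tmp is not on the RAID-backed /home. The helper name `slow_calls` and the trace file path are made up for illustration. The idea is to let strace record how long each system call took (-T) and then list only the calls that blocked for more than a few seconds, which keeps the amount of output to read small.

```shell
# Print syscalls from an `strace -T` log that took longer than a threshold.
# Usage: slow_calls TRACE_FILE [SECONDS]   (hypothetical helper, default 1 s)
slow_calls() {
    # With -T, strace ends each line with the call duration, e.g. "<25.002113>".
    # Splitting on < and > leaves that duration in the second-to-last field.
    awk -F'[<>]' -v limit="${2:-1}" 'NF >= 3 && $(NF-1) + 0 > limit + 0' "$1"
}

# Hypothetical invocation (not run here): follow child processes (-f), add
# wall-clock timestamps (-tt) and per-call durations (-T), and write the log
# to a file with -o instead of the terminal.
#   strace -f -tt -T -o /tmp/openscad.trace openscad
#   slow_calls /tmp/openscad.trace 5
```

If the long calls turn out to be poll() or recvmsg() on a socket rather than reads from /home, that would point away from the disks.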


Stefan



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-15 Thread Andy Smith
Hello,

On Fri, Dec 15, 2023 at 12:00:00AM -0800, David Christensen wrote:
> On 12/14/23 18:36, gene heskett wrote:
> > Thunar, yes, but I don't use it, not my cup of tea.

[…]

> It sounds like OpenSCAD and qidislicer have something in common that is
> causing the issue, while the other apps do not have that something.  So, the
> challenge is finding the shared object files (dynamic linking) and/or the
> source files (static linking) that are present in the affected programs and
> not present in the unaffected programs.

I will add that the Thunar file manager was included in Gene's
"affected" list and I think Gene posted some logs before that showed
some dbus timeout.

I've no idea how to start debugging this but I feel like the problem
may exist somewhere in Gene's desktop environment and affect
things that call its file dialog.
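
One way to check the dbus angle directly, as a rough sketch: the error Gene posted was "Error calling StartServiceByName for org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached", so the same request can be sent to the session bus by hand and timed. The helper name `time_activation` is made up here, and the systemd unit name in the comment is an assumption about how gvfs is packaged, so check it on the machine first.

```shell
# Ask the session bus to start a service by name and report how long the
# request took. Usage: time_activation BUS_NAME   (hypothetical helper)
time_activation() {
    start=$(date +%s)
    if gdbus call --session \
        --dest org.freedesktop.DBus \
        --object-path /org/freedesktop/DBus \
        --method org.freedesktop.DBus.StartServiceByName \
        "$1" 0
    then status=0
    else status=$?
    fi
    end=$(date +%s)
    echo "$1: exit status $status after $((end - start)) s"
}

# Hypothetical use (not run here):
#   time_activation org.gtk.vfs.GPhoto2VolumeMonitor
# Assumed unit name, verify with `systemctl --user list-units 'gvfs*'`:
#   systemctl --user status gvfs-gphoto2-volume-monitor.service
```

If this alone stalls for about as long as the GUI freeze, the delay can be reproduced without any of the affected applications.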

Anyway, I think we can all agree at this point that this has got
nothing to do with RAID and mdadm. I might suggest a reinstall, but I
think Gene said he has already reinstalled several times with the same
outcome, so another install is unlikely to avoid the problem.

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-15 Thread David Christensen

On 12/14/23 18:36, gene heskett wrote:
> On 12/14/23 16:36, Anssi Saari wrote:
>> gene heskett writes:
>>
>>> It repeats per gui access. Starting a gfx program such as OpenSCAD, or
>>> qidislicer from an xfce4 terminal cli, is delayed for this similar but
>>> not always identical lag. And reports odd warnings etc while it's
>>> getting ready to open its gui.
>>
>> Does this happen with common GUI tools too like, say, Firefox?
>
> firefox, no.
>
>> Or XFCE's file manager, Thunar I believe?
>
> Thunar, yes, but I don't use it, not my cup of tea.  It wants to be a
> replacement for mc, but fails at 90% of what mc can do.
>
>> Or a text editor like Gedit?
>
> Gedit has been banned from any of my machines for at least 15 years, it
> made scrambled eggs out of several linuxcnc configuration files I had
> to re-write from scratch, but geany has never done that.  And geany is
> as instant as nano.
>
>> Or even the XFCE terminal?
>
> Comes up instantly from the menu, I use it heavily because it has tabs.
> I use them much like workspaces.
>
> Is this info helpful?
>
> Thank you Anssi Saari
>
> Cheers, Gene Heskett.



It sounds like OpenSCAD and qidislicer have something in common that is 
causing the issue, while the other apps do not have that something.  So, 
the challenge is finding the shared object files (dynamic linking) 
and/or the source files (static linking) that are present in the 
affected programs and not present in the unaffected programs.



It would be helpful if you posted a list of affected programs and a list 
of unaffected programs to provide alternatives for a search.  Please 
note any programs that you did not install using conventional Debian 
packages (and that may be the root cause of the issue).
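
For the dynamic-linking half of that search, a rough sketch using ldd and comm. The helper names `libs_of` and `only_in_first` are made up for illustration, and the program names in the usage comment are only examples taken from this thread. Note that ldd lists link-time dependencies only; modules loaded at run time (GIO/gvfs modules, for instance) will not show up, so an empty difference does not rule anything out.

```shell
# List the shared libraries a program is linked against, one name per line.
libs_of() {
    ldd "$(command -v "$1")" | awk '{print $1}' | sort -u
}

# Print libraries linked by the first program but not by the second.
# Usage: only_in_first AFFECTED UNAFFECTED   (hypothetical helper)
only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    libs_of "$1" > "$a"
    libs_of "$2" > "$b"
    comm -23 "$a" "$b"      # column 1 only: lines unique to the first list
    rm -f "$a" "$b"
}

# Hypothetical use (not run here):
#   only_in_first openscad geany
```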



David



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-14 Thread gene heskett

On 12/14/23 16:36, Anssi Saari wrote:
> gene heskett writes:
>
>> It repeats per gui access. Starting a gfx program such as OpenSCAD, or
>> qidislicer from an xfce4 terminal cli, is delayed for this similar but
>> not always identical lag. And reports odd warnings etc while it's
>> getting ready to open its gui.
>
> Does this happen with common GUI tools too like, say, Firefox?

firefox, no.

> Or XFCE's file manager, Thunar I believe?

Thunar, yes, but I don't use it, not my cup of tea.  It wants to be a
replacement for mc, but fails at 90% of what mc can do.

> Or a text editor like Gedit?

Gedit has been banned from any of my machines for at least 15 years, it
made scrambled eggs out of several linuxcnc configuration files I had
to re-write from scratch, but geany has never done that.  And geany is
as instant as nano.

> Or even the XFCE terminal?

Comes up instantly from the menu, I use it heavily because it has tabs.
I use them much like workspaces.


Is this info helpful?

Thank you Anssi Saari

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-14 Thread Anssi Saari
gene heskett  writes:

> It repeats per gui access. Starting a gfx program such as OpenSCAD, or
> qidislicer from an xfce4 terminal cli, is delayed for this similar but 
> not always identical lag. And reports odd warnings etc while it's
> getting ready to open its gui.

Does this happen with common GUI tools too like, say, Firefox? Or XFCE's
file manager, Thunar I believe? Or a text editor like Gedit? Or even the
XFCE terminal?



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-14 Thread gene heskett

On 12/14/23 04:17, Nicolas George wrote:
> to...@tuxteam.de (12023-12-14):
>> I've skimmed some of the answers, and they correspond to your confusing
>> request. Someone mentions DNS timeouts to rule them out right away (do
>> you access your RAID over the net? Is DNS resolution involved at all?)

no, and no.

> He quoted:
>
>>> Error creating proxy: Error calling StartServiceByName for
>>> org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached (g-io-error-quark, 24)

The odd part of that is that there is, stuck on screen on every
workspace, a volume control gui of some kind that has no exit icon. I
cannot get rid of it. It has a wrench icon where most gui's have an exit
button. And that led to a red trash can icon labeled "remove widget", and
it did, whatever the heck a widget is.

> That means the issue is in the DBus monster moussaka¹. The odds of
> finding a solution in the current circumstances are vanishingly thin.
>
> Regards,

Cheers, Nik, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-14 Thread gene heskett

On 12/14/23 00:39, to...@tuxteam.de wrote:

On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote:

Greetings all;

I thought I was doing things right a year back when I built a raid10 for my
/home partition. but I'm tired of fighting with it for access. Anything that
wants to open a file on it, is subjected to a freeze of at least 30 seconds
BEFORE the file requester is drawn on screen.  Once it has done the screen
draw and the path is established, read/writes then proceed at multi-gigabyte
speeds just like it should [...]


  - disk access latency
  - digikam
  - photo volume monitor
  - cache buffers (which?)
  - klipper


I've been here several times with this problem without any constructive
responses [...]



So one more time: Why can't I use my software raid10 on 4 1T SSD's ?


Gene, just a humble suggestion. I'm too short in time to wade through all
this deep software cake, of which I know but a fraction.

Perhaps if you structured your requests a bit better, the quality of the
answers would improve?


The latest info is that non-gui stuff works instantly. gui stuff lags at
least 30 seconds, mouse still moves but the rest of the same screen is
non-responsive until this timeout has taken place, then everything
returns to normal.



I've skimmed some of the answers, and they correspond to your confusing
request. Someone mentions DNS timeouts to rule them out right away (do
you access your RAID over the net? Is DNS resolution involved at all?)


no

Other answers veer off in similar disparate directions, but that corresponds
to your request's deeply confusing nature.


Because I was not able to define it any better.



Let me humbly suggest to structure your search a bit (you do have deep
experience in fault searching, we all know).

What I get from your post is that you seem to see the root of your problems
in a long latency on (first?) storage access to your block device (whether
it matters that it be a RAID10 or a RAID42 we just don't know!).


As I also don't know, this raid10 is my only experience with a raid of 
any kind.




This looks like a promising avenue, so let's pretend we start with this
one.

Do you experience this latency also with simpler tools (something which
doesn't "draw a requester on screen", like, say, ls or find)?


no, even a dd write is essentially instant.


Let's thus try to rule out the deep pie of sh*** (uh, software stack)
you are using to access the disk. Do you still observe this latency?
Is there a pattern (like, when accessing something for the first time,
and/or accessing things after a longer inactivity period, yadda, yadda).


It repeats per gui access. Starting a gfx program such as OpenSCAD, or 
qidislicer from an xfce4 terminal cli, is delayed for this similar but 
not always identical lag. And reports odd warnings etc while it's getting 
ready to open its gui.


Might there be a clue there?  IDK  I could copy/paste some of it if you 
like to see it, but to me it doesn't look related or I would have already.


If yes, you can follow the path "disk access latency". If no, the problem
might lie further up the stack (and then, things like DNS latencies might
play a role again!).

With your posts, my head spins and my time slot in the mornings, before
I go to $DAYJOB is used up before I can start even to think about how
debug things.


And I appreciate that, Tomas; $DAYJOBS take precedence, always. Triply
appreciated when you are the only tech person responsible for keeping a
tv station on the air and working smoothly like I was for nearly 50
years before I retired. It's not a $DAYJOB, it's a $24/7/365.25JOB.



In one short word: please focus. Debugging complex stuff becomes impossible
otherwise.


I now think I have a gui problem. I can imagine something left over from
the original debian gnome install getting the request to open the gui
and having to fail before xfce4 even gets the request. Whether that is
true or not needs a way to test the theory, and there I'm clueless.  There
is enough kde/plasma installed that it thinks it has to start kmail at
boot time, and there are lots of kde and gnome leftovers; the popups
asking for a pw are also subjected to this delay after the bootup is
completed. Is this all connected? At this time IDK.



Cheers


Thank you Tomas.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread David Christensen

On 12/13/23 15:33, gene heskett wrote:
> gene@coyote:~$ time dd if=/dev/zero of=/home/gene/zero bs=1M count=100 oflag=sync
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.935655 s, 112 MB/s
>
> real    0m0.940s
> user    0m0.000s
> sys     0m0.254s



Thank you for providing a console session that confirms the issue is not 
md RAID.



For completeness, I suggest that you do both write and read benchmarks:

2023-12-13 17:56:58 root@taz ~
# smartctl -i /dev/sda | grep "Device Model"
Device Model: INTEL SSDSC2CW060A3

2023-12-13 17:57:12 root@taz ~
# dd if=/dev/zero of=/home/dpchrist/100mb.zero bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.6329 s, 64.2 MB/s

2023-12-13 17:57:57 root@taz ~
# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
               total        used        free      shared  buff/cache   available
Mem:        32698252     1941032    29428472      761160     1328748    29606508
Swap:         976892           0      976892
               total        used        free      shared  buff/cache   available
Mem:        32698252     1941516    29580404      717360     1176332    29650836
Swap:         976892           0      976892

2023-12-13 17:58:03 root@taz ~
# dd of=/dev/null if=/home/dpchrist/100mb.zero bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.263723 s, 398 MB/s
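
The write-then-read sequence above could be wrapped up roughly as follows. This is only a sketch: the function name `dd_roundtrip` is made up, the test file path is whatever you pass in, and it assumes GNU dd (for `bs=1M` and `oflag=sync`). `oflag=sync` affects writes only, so it is left off the read step here, and the cache drop is shown as a comment because it needs root.

```shell
# Write COUNT MiB of zeros to FILE with synchronous writes, read the file
# back, print dd's summary line for each step, then delete the file.
# Usage: dd_roundtrip FILE [COUNT]   (hypothetical helper, default 100 MiB)
dd_roundtrip() {
    file=$1
    count=${2:-100}
    # Write test: oflag=sync makes every block hit the device before dd goes on.
    dd if=/dev/zero of="$file" bs=1M count="$count" oflag=sync 2>&1 | tail -n 1
    # Optional, root only: empty the page cache so the read comes from disk.
    #   sync && echo 3 > /proc/sys/vm/drop_caches
    # Read test.
    dd if="$file" of=/dev/null bs=1M 2>&1 | tail -n 1
    rm -f "$file"
}

# Hypothetical use (not run here):
#   dd_roundtrip /home/gene/ddtest.tmp 100
```

Without the cache drop, the read figure mostly measures RAM rather than the array.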


I have found that as I run computers, there is an accumulation of cruft 
over time.  The more I mess with a computer, the sooner it becomes 
unstable.   Eventually, "finding the needle in the haystack" and 
"putting Humpty Dumpty back together again" do not work -- the computer 
requires a backup-wipe-install-restore cycle.  Your posts indicate that 
your computer is overdue.



And, I suspect a deeper issue -- you have one computer that is your 
workstation, your file server, and your backup server.  This 
over-complicates everything and creates a strong disincentive to 
backup-wipe-install-restore.  I have been there, done that, lost 
service, and lost data.  Now I have several laptops/ desktops/ 
workstations, a dedicated file server, and a dedicated backup server. 
Life is good.  :-)



Again -- I suggest that you build a backup server, then build a file 
server, then rebuild the workstation.  I am confident you will be 
rewarded with simpler administration and improved reliability.



David



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread gene heskett

On 12/13/23 16:30, Andy Smith wrote:
> Hello,
>
> On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote:
>> I thought I was doing things right a year back when I built a raid10 for my
>> /home partition. but I'm tired of fighting with it for access. Anything that
>> wants to open a file on it, is subjected to a freeze of at least 30 seconds
>> BEFORE the file requester is drawn on screen.
>
> I haven't chimed in to any of the multiple times you've brought this
> to the list, because it's just so bizarre. I've about 20 years'
> experience of using mdadm and have never seen anything like what
> you're reporting, so I just don't know what the problem could be or
> how to find it.
>
> The only times I've seen anything remotely like it have been when
> there's been hardware problems with failing writes, but I know
> you've been through this with the list several times and no such low
> level issues were ever uncovered.
>
> Would it be correct to say that you only experience these IO delays
> from GUI applications? Like, if you do a simple:
>
> $ time echo "test" > ~/foo
>
> does that complete in a normal time?

The lshw >lshw.txt would be in the "guiless" category, and it worked
even quicker than if I left it to come out on the cli. Another item that
may be of interest is that the gui is xfce4.

That was a bit over a 40k write

> And if you did a bigger write, again from the command line?
>
> $ dd if=/dev/zero of=/path/to/your/home/dir/zero bs=1m count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.0783268 s, 1.3 GB/s

I had to use an uppercase M in the bs=, to get:
gene@coyote:~$ time dd if=/dev/zero of=/home/gene/zero bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0829926 s, 1.3 GB/s

real    0m0.085s
user    0m0.005s
sys     0m0.080s
I'd have to say that's realtime. no lag.

> and with sync:
>
> $ dd if=/dev/zero of=/path/to/your/home/dir/zero bs=1m count=100 oflag=sync
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.463356 s, 226 MB/s

gene@coyote:~$ time dd if=/dev/zero of=/home/gene/zero bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.935655 s, 112 MB/s

real    0m0.940s
user    0m0.000s
sys     0m0.254s
A lot slower but still no lag, about an even second.

> But then when you try to use some GUI application to save something
> to a file, the initial save file dialog takes ages to appear and
> everything seems frozen?

Correct...

> If so then I feel like this may actually be some sort of problem
> with your desktop environment, but then I've no idea how to narrow
> that down.

I think we have made progress, Andy, having narrowed it down to the gui.
To me that is progress.  Now I need a gui expert, which I am for sure
not.  Never have been, never will be.

> Thanks,
> Andy

Thanks a bunch Andy, I think your logic was quite helpful in narrowing
down the problem area.

Take care, stay warm and well.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Andrew M.A. Cater
On Wed, Dec 13, 2023 at 02:19:07PM -0500, gene heskett wrote:
> On 12/13/23 13:24, Andrew M.A. Cater wrote:
> > On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote:
> > > Greetings all;
> > > 
> > 
> > Hi Gene,
> > 
> > Respectfully, if I were you, I might consider tearing down one machine
> > and rebuilding the data on it bit by bit.
> > 
> > Questions to answer first:
> > 
> > 1. Are all the disks the same size?
> > 
> yes
> > 2. Are all the disks the same manufacturer?
> All 1T Samsung 870's
> > 3. Are they all connected to the same controller if this is an add-in card?
> > 
> yes. add in card.
> 
> > If not an add in card:
> > 
> > 4. Are they all connected to the SATA sockets on the motherboard?
> > 
> No. All connected to a 6 port board, in port order.
> 
> > 4. If to the motherboard, are they the only devices connected to the SATA
> > sockets there?
> No, main board is Asus Prime Z370-A II but it only has 6 ports, all busy.
> > 
> > 5. What is the primary device that has / on it - NVME / SSD / spinning rust?
> 
> SSD, another 1T samsung.
> 
> > > So one more time: Why can't I use my software raid10 on 4 1T SSD's ?
> > > 
> > 
> > _How did you set the RAID 10 up?
> > 
> > Would you be willing to scrap the data in /home and start again?
> No, I have a lot of work I'd be the rest of my life rebuilding.
> However, in preparation to restarting amanda, I've just installed a 2nd sata
> add on card, this one with 16 ports, 4 of which are already loaded with 2T
> gigastones so I do have the means to rsync /home to 1 or more of those.
> 

Copy /home to another drive - then disconnect power and drive cables to it.
You seem to like adding many disks to one machine: I'd honestly suggest
grabbing another machine to put half these disks into.

If you've got NVME - put that in as your boot drive, maybe.
Maybe use LVM and guided partitioning with all files in one partition.

Then use the four 1T disks and the four way card and mdadm to set up the
mirrored RAID with LVM on top for /home and add that to your fstab.

Rsync the data back from your one drive that you put the original /home
onto and you're done with that disk.

Do all this with a brand new bookworm disk and linuxcnc and you're done
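
As a rough sketch of the sequence described above, here is a dry run that only prints the commands it would use. Everything specific in it is an assumption for illustration: the device names /dev/sdW to /dev/sdZ, the array name /dev/md0, the volume group and logical volume names, the spare-drive mount point, RAID level 10 and ext4 are all placeholders, not details from this thread. The create and mkfs steps destroy whatever is on the disks, so nothing here should be run before the copy of /home has been verified.

```shell
# Dry-run helper: print a command line instead of executing it.
plan() { printf '%s\n' "$*"; }

# Print the rebuild steps in order (hypothetical names throughout).
rebuild_home_plan() {
    spare=/mnt/spare    # placeholder mount point of the spare drive
    # 1. Copy /home to the spare drive, keeping hard links, ACLs and xattrs.
    plan rsync -aHAX /home/ "$spare/home/"
    # 2. Create a four-disk RAID 10 array with mdadm.
    plan mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdW /dev/sdX /dev/sdY /dev/sdZ
    # 3. Put LVM on top of the array and make a filesystem for /home.
    plan pvcreate /dev/md0
    plan vgcreate vg_home /dev/md0
    plan lvcreate -l 100%FREE -n home vg_home
    plan mkfs.ext4 /dev/vg_home/home
    # 4. Add the new volume to fstab, then copy the data back.
    plan "echo '/dev/vg_home/home /home ext4 defaults 0 2' >> /etc/fstab"
    plan rsync -aHAX "$spare/home/" /home/
}

rebuild_home_plan
```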

Simplify, simplify, simplify :)

Andy
(amaca...@debian.org)



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread gene heskett

On 12/13/23 13:24, Andrew M.A. Cater wrote:

On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote:

Greetings all;

I thought I was doing things right a year back when I built a raid10 for my
/home partition. but I'm tired of fighting with it for access. Anything that
wants to open a file on it, is subjected to a freeze of at least 30 seconds
BEFORE the file requester is drawn on screen.  Once it has done the screen
draw and the path is established, read/writes then proceed at multi-gigabyte
speeds just like it should, but some applications refuse to wait that long,
so digiKam cannot import from my camera for example one, QIDISlicer is
another that get plumb upset and declares a segfault, core dumped, but it
can't write the core dump for the same reason it declared a segfault.  Here
is a copy/paste of the last attempt to select the "device" tab in
QIDISlicer:
---
Error creating proxy: Error calling StartServiceByName for
org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached (g-io-error-quark, 24)

** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI
scheme wxfs more than once

** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI
scheme memory more than once

(qidi-slicer:389574): Gtk-CRITICAL **: 04:55:47.084:
gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
[2023-12-13 05:10:27.325222] [0x7f77e6ffd6c0] [error]   Socket created.
Multicast: 255.255.255.255. Interface: 192.168.71.3
Unhandled unknown exception; terminating the application.
Segmentation fault (core dumped)
-
This where it was attempting to open the cache buffers if needed to remember
what moonraker, a web server driver which is part of the klipper install on
the printer, addressed at 192.168.71.110: with an odd, high numbered port
above 10,000.

I've been here several times with this problem without any constructive
responses other than strace, which of course does NOT work for network
stuff, and would if my past history with it is any indication, generate
several terabytes of output, but it fails for the same reason, no place to
put its output because I assume, it can't write to the raid10 in a timely
manner.



Hi Gene,

Respectfully, if I were you, I might consider tearing down one machine
and rebuilding the data on it bit by bit.

Questions to answer first:

1. Are all the disks the same size?


yes

2. Are all the disks the same manufacturer?

All 1T Samsung 870's

3. Are they all connected to the same controller if this is an add-in card?


yes. add in card.


If not an add in card:

4. Are they all connected to the SATA sockets on the motherboard?


No. All connected to a 6 port board, in port order.


4. If to the motherboard, are they the only devices connected to the SATA
sockets there?

No, main board is Asus Prime Z370-A II but it only has 6 ports, all busy.


5. What is the primary device that has / on it - NVME / SSD / spinning rust?


SSD, another 1T samsung.


So one more time: Why can't I use my software raid10 on 4 1T SSD's ?



_How did you set the RAID 10 up?

Would you be willing to scrap the data in /home and start again?

No, I have a lot of work I'd be the rest of my life rebuilding.
However, in preparation to restarting amanda, I've just installed a 2nd 
sata add on card, this one with 16 ports, 4 of which are already loaded 
with 2T gigastones so I do have the means to rsync /home to 1 or more of 
those.


Since lshw is a bit verbose, and this machine is stuffed, except the m2
sockets, I'll attach the output of lshw for those who want to peruse it.
Maybe it will answer additional hdwe questions. I do have an m2 module I
intend to add at some point, a WD_BLACK SN770 NVMe SSD of 2T capacity.
Supposedly rated at 5160 MHZ/SEC.  It's still in the box.


All best, as ever,

Andy

(amaca...@debian.org)


Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
coyote
description: Desktop Computer
product: System Product Name (SKU)
vendor: System manufacturer
version: System Version
serial: System Serial Number
width: 64 bits
capabilities: smbios-3.1.1 dmi-3.1.1 smp vsyscall32
configuration: boot=normal chassis=desktop family=To be filled by O.E.M. sku=SKU uuid=93a9e285-63b0-4a26-8f43-40b0765b113c
  *-core
   description: Motherboard
   product: PRIME Z370-A II
   vendor: ASUSTeK COMPUTER INC.
   physical id: 0
   

Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Pocket



On 12/13/23 13:50, Dan Ritter wrote:
> Pocket wrote:
>> Many reasons
>>
>> If the RAID controller bites the bullet you are usually toast unless you
>> have another RAID controller (same manufacturer and type) as a spare.
>
> mdadm, zfs and btrfs all lack this problem.

Not for me as I am not going down that worm hole

>> I have zero luck replacing one company's raid controller with another and
>> ditto on raid built into the motherboard.
>
> As above.

As above

>> I really don't need any help losing my data/files as I do a good job of that
>> all by myself ;)
>
> btrfs and zfs have snapshots which really help avoiding losing
> data. On other machines, rsnapshot is often suitable.

I am exploring rdiff-backup

>> I found it is better to just have my data on several backup disks, that way
>> if one fails I get another disk and copy all the data to the newly purchased
>> disk.
>
> RAID isn't a backup solution, it's a way of keeping things going
> until you have time to restore. (And also a way of improving
> performance and/or manageability.)
>
> If you don't need or want it, you shouldn't use it. Same as any
> tool.

I don't need the expense or trouble.

Raspberry pi(s) and USB drives equate to "just works"

--

It's not easy to be me



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Pocket



On 12/13/23 13:47, Nicolas George wrote:
> Pocket (12023-12-13):
>> If the RAID controller
>
> Then use software RAID with a Libre implementation.

Nope been there done that and I ain't doing that

>> I found it is better to just have my data on several backup disks
>
> Yeah, backups and RAID are not meant to protect against the same issues,
> so if you think one replaces the other…
>
>> After removing raid, I completely redesigned my network to be more inline
>> with the howtos and other information.
>
> You know that RAID has nothing to do with the setup of your network,
> right?

Not saying it did


--
It's not easy to be me



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Dan Ritter
Pocket wrote: 
> 
> Many reasons
> 
> If the RAID controller bites the bullet you are usually toast unless you
> have another RAID controller (same manufacturer and type) as a spare.

mdadm, zfs and btrfs all lack this problem.

> I have zero luck replacing one company's raid controller with another and
> ditto on raid built into the motherboard.

As above.
 
> I really don't need any help losing my data/files as I do a good job of that
> all by myself ;)

btrfs and zfs have snapshots which really help avoiding losing
data. On other machines, rsnapshot is often suitable.


> I found it is better to just have my data on several backup disks, that way
> if one fails I get another disk and copy all the data to the newly purchased
> disk.

RAID isn't a backup solution, it's a way of keeping things going
until you have time to restore. (And also a way of improving
performance and/or manageability.)

If you don't need or want it, you shouldn't use it. Same as any
tool.

-dsr-



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Tom Furie
gene heskett  writes:

> It is a separate 6 port sata controller because the mobo is out of
> ports.  There is no obvious lag during bios post or grub booting it.

That *should* rule out DNS then, unless something really strange is
going on. What does mdadm tell you about the raid device, and its
component devices?

Is the filesystem on the raid healthy?
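
For reference, a sketch of what answering those questions could look like. The kernel's view of all md arrays is in /proc/mdstat, where a healthy four-disk array shows a status like [UUUU] and a missing member shows up as an underscore. The helper name `degraded_arrays` is made up for illustration, and /dev/md0 and /dev/sdX1 in the comments are placeholder device names.

```shell
# Print the names of md arrays whose member status contains a "_", i.e. a
# missing or failed device. Reads /proc/mdstat unless another file is given.
# Usage: degraded_arrays [FILE]   (hypothetical helper)
degraded_arrays() {
    awk '/^md/ { name = $1 }                 # remember the current array name
         /\[[U_]*_[U_]*\]/ { print name }    # status such as [UU_U]
        ' "${1:-/proc/mdstat}"
}

# More detail, as root (placeholder device names, not run here):
#   mdadm --detail /dev/md0        # array state, members, sync status
#   mdadm --examine /dev/sdX1      # superblock of one component device
```

No output from the helper means no array is reporting a missing member.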

Cheers,
Tom



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Nicolas George
Pocket (12023-12-13):
> If the RAID controller

Then use software RAID with a Libre implementation.

> I found it is better to just have my data on several backup disks

Yeah, backups and RAID are not meant to protect against the same issues,
so if you think one replaces the other…

> After removing raid, I completely redesigned my network to be more inline
> with the howtos and other information.

You know that RAID has nothing to do with the setup of your network,
right?

-- 
  Nicolas George




Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread Pocket



On 12/13/23 13:20, gene heskett wrote:

On 12/13/23 11:51, Pocket wrote:


On 12/13/23 10:26, gene heskett wrote:

Greetings all;

I thought I was doing things right a year back when I built a raid10 
for my /home partition. but I'm tired of fighting with it for 
access. Anything that wants to open a file on it, is subjected to a 
freeze of at least 30 seconds BEFORE the file requester is drawn on 
screen. Once it has done the screen draw and the path is 
established, read/writes then proceed at multi-gigabyte speeds just 
like it should, but some applications refuse to wait that long, so 
digiKam cannot import from my camera for example one, QIDISlicer is 
another that get plumb upset and declares a segfault, core dumped, 
but it can't write the core dump for the same reason it declared a 
segfault.  Here is a copy/paste of the last attempt to select the 
"device" tab in QIDISlicer:

---
Error creating proxy: Error calling StartServiceByName for 
org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached 
(g-io-error-quark, 24)


** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register 
URI scheme wxfs more than once


** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register 
URI scheme memory more than once


(qidi-slicer:389574): Gtk-CRITICAL **: 04:55:47.084: 
gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
[2023-12-13 05:10:27.325222] [0x7f77e6ffd6c0] [error] Socket 
created. Multicast: 255.255.255.255. Interface: 192.168.71.3

Unhandled unknown exception; terminating the application.
Segmentation fault (core dumped)
-
This where it was attempting to open the cache buffers if needed to 
remember what moonraker, a web server driver which is part of the 
klipper install on the printer, addressed at 192.168.71.110: with an 
odd, high numbered port above 10,000.


I've been here several times with this problem without any
constructive responses other than strace, which of course does NOT
work for network stuff, and would, if my past history with it is any
indication, generate several terabytes of output. But it fails for
the same reason: no place to put its output because, I assume, it
can't write to the raid10 in a timely manner.


So one more time: Why can't I use my software raid10 on 4 1TB SSDs?
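
One way to narrow this down, offered only as a minimal sketch: time a plain stat and directory listing on the array with no GUI file requester involved. The function names `time_call` and `probe_directory` are made up for this example, and it assumes the home directory lives on the raid10 as described. If these calls return in milliseconds while the requester still takes 30 seconds, that would point away from the array and toward the desktop layer (the log above shows a D-Bus timeout while starting org.gtk.vfs.GPhoto2VolumeMonitor), though it would not prove it.

```python
import os
import time


def time_call(label, func, *args):
    """Run func(*args) and return (label, elapsed_seconds, result)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return label, elapsed, result


def probe_directory(path):
    """Time a stat and a directory listing of `path`, with no GUI involved."""
    timings = []
    timings.append(time_call("stat", os.stat, path)[:2])
    timings.append(time_call("listdir", os.listdir, path)[:2])
    return timings


if __name__ == "__main__":
    # Assumption: the home directory is on the raid10, as the post describes.
    for label, seconds in probe_directory(os.path.expanduser("~")):
        print(f"{label}: {seconds:.3f} s")
```

Running it once right after boot and again after the first slow file dialog would show whether the filesystem itself is ever slow to answer.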


Cheers, Gene Heskett.



I gave up using raid many years ago and I used the extra drives as 
backups.



So why did you give up? Must have been a reason.


Many reasons

No real benefit (companies excepted), and issues like you have been posting.

If the RAID controller bites the bullet you are usually toast unless you 
have another RAID controller (same manufacturer and type) as a spare.


I have had zero luck replacing one company's RAID controller with 
another's, and ditto for RAID built into the motherboard.


I really don't need any help losing my data/files as I do a good job of 
that all by myself ;)


I found it is better to just have my data on several backup disks, that 
way if one fails I get another disk and copy all the data to the newly 
purchased disk.


After removing raid, I completely redesigned my network to be more 
in line with the howtos and other information.


I have little to nothing on the client system I use daily; everything is 
on networked systems, and each of them has certain things it does.


I have a "git" server that has all my setup/custom/building scripts and 
all my programming and solidworks projects.


I have Delphi-built apps going back to about 1995.

It is all backed up to a backup server (master and slave) and also to a 
4TB offline external hard drive.  I have not "lost" any information since.


I also found that DHCP and NetworkManager are your friends.

Maybe you should review your network setup, as you seem to have a lot of 
issues with it?





Wrote a script to rsync  /home to the backup drives.



Cheers, Gene Heskett.


--
It's not easy to be me



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread gene heskett

On 12/13/23 11:51, Pocket wrote:


On 12/13/23 10:26, gene heskett wrote:

Greetings all;

I thought I was doing things right a year back when I built a raid10 
for my /home partition, but I'm tired of fighting with it for access. 
Anything that wants to open a file on it is subjected to a freeze of 
at least 30 seconds BEFORE the file requester is drawn on screen. Once 
it has done the screen draw and the path is established, reads and 
writes proceed at multi-gigabyte speeds just like they should, but some 
applications refuse to wait that long. digiKam, for one, cannot import 
from my camera. QIDISlicer is another: it gets plumb upset and declares 
a segfault, core dumped, but it can't write the core dump for the same 
reason it declared a segfault.  Here is a copy/paste of the last 
attempt to select the "device" tab in QIDISlicer:

---
Error creating proxy: Error calling StartServiceByName for 
org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached 
(g-io-error-quark, 24)


** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register 
URI scheme wxfs more than once


** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register 
URI scheme memory more than once


(qidi-slicer:389574): Gtk-CRITICAL **: 04:55:47.084: 
gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
[2023-12-13 05:10:27.325222] [0x7f77e6ffd6c0] [error]   Socket 
created. Multicast: 255.255.255.255. Interface: 192.168.71.3

Unhandled unknown exception; terminating the application.
Segmentation fault (core dumped)
-
This is where it was attempting to open the cache buffers, if needed, 
to remember where moonraker, a web server driver which is part of the 
klipper install on the printer, is addressed: 192.168.71.110, with an 
odd, high-numbered port above 10,000.


I've been here several times with this problem without any 
constructive responses other than strace, which of course does NOT 
work for network stuff, and would, if my past history with it is any 
indication, generate several terabytes of output. But it fails for the 
same reason: no place to put its output because, I assume, it can't 
write to the raid10 in a timely manner.


So one more time: Why can't I use my software raid10 on 4 1TB SSDs?

Cheers, Gene Heskett.



I gave up using raid many years ago and I used the extra drives as backups.


So why did you give up? Must have been a reason.


Wrote a script to rsync  /home to the backup drives.



Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis



Re: raid10 is killing me, and applications that aren't willing to wait for it to respond

2023-12-13 Thread gene heskett

On 12/13/23 10:41, Tom Furie wrote:

gene heskett  writes:


I thought I was doing things right a year back when I built a raid10
for my /home partition, but I'm tired of fighting with it for
access. Anything that wants to open a file on it is subjected to a
freeze of at least 30 seconds BEFORE the file requester is drawn on
screen.  Once it has done the screen draw and the path is established,


Where is the raid10 located and how is it interfaced to the device
you're accessing it from? That delay, along with other things you
mentioned suggests (but this is only a guess without other relevant
information) a DNS timeout.
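
The DNS-timeout guess can be checked directly. The following is a minimal sketch, not a diagnosis: it times how long a name lookup takes through the system resolver. The function name `time_lookup` and the choice of names to resolve (the machine's own hostname and localhost) are assumptions made for the example. A lookup that takes many seconds or fails would support the guess.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, error_or_None) for resolving `name`."""
    start = time.perf_counter()
    error = None
    try:
        resolver(name, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.perf_counter() - start, error


if __name__ == "__main__":
    # Assumption: this machine's own hostname and localhost are worth testing.
    for name in (socket.gethostname(), "localhost"):
        seconds, error = time_lookup(name)
        status = "ok" if error is None else f"failed: {error}"
        print(f"{name}: {seconds:.3f} s ({status})")
```

The `resolver` parameter exists only so the timing logic can be exercised without touching the network.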

It is on a separate 6-port SATA controller because the mobo is out of 
ports.  There is no obvious lag during BIOS POST or GRUB booting it.

/etc/fstab:
gene@coyote:/etc/lvm/profile$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# systemd generates mount units based on this file, see systemd.mount(5).
# Please run 'systemctl daemon-reload' after making changes here.
#
#
# / was on /dev/sda1 during installation
UUID=f295334b-fdcb-4428-bed3-cb9e9e129be6 /     ext4 errors=remount-ro 0 1

# /tmp was on /dev/sda3 during installation
UUID=518cb65d-21f0-493f-8bb5-a5f435796991 /tmp  ext4 defaults          0 2

# swap was on /dev/sda2 during installation
UUID=422b50db-9913-4ed3-92c3-dc18be72cc61 none  swap sw                0 0

/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0
UUID=bc6135de-0578-4e3b-b2c0-5c4687abd9bd /home ext4 errors=remount-ro 0 2
UUID=d24c3a99-9f40-4b71-92d4-916804553cb5 none  swap sw                0 0

-
From df:
gene@coyote:/etc/lvm/profile$ sudo df
[sudo] password for gene:
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16328024         0   16328024   0% /dev
tmpfs             3272676      1876    3270800   1% /run
/dev/sda1       863983352  18784928  801236776   3% /
tmpfs            16363376        12   16363364   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868    222628   45069232   1% /tmp
/dev/md0p1     1796382580 330887008 1374170596  20% /home
tmpfs             3272672     45868    3226804   2% /run/user/1000
-
A dhcpd has been installed, but it is limited to issuing a single fixed 
address to a 3D printer plugged into my network, as the printer doesn't 
seem to know what to do with the hosts file that runs the rest of my home 
network. Anything else you want, just ask.



Cheers,
Tom

.

Thank you Tom.

Cheers, Gene Heskett.