Hey Tony,

I haven't been following this thread very closely, but I enjoyed your email. 

I have some personal experience using backup software that scans and saves in 
parallel, and it's wonderful. Last time I looked, it took about 30 seconds on a 
10-year-old system with 2 CPUs and 1 HDD to scan over 30,000 files for changes 
during an incremental backup. 

I wonder what software the OP's IT staff are using that it's timing out...

David Cook
Software Engineer
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia

Office: 02 9212 0899
Online: 02 8005 0595

-----Original Message-----
From: dspace-tech@googlegroups.com <dspace-tech@googlegroups.com> On Behalf Of 
Tony Brian Albers
Sent: Thursday, 13 August 2020 3:44 PM
To: dspace-tech@googlegroups.com; mwoodiu...@gmail.com
Subject: Re: [dspace-tech] Backup issues due to assetstore size

I hope I'm not being annoying, please bear with me. 

I'd like to explain how parallelism actually works in some (not all) backup 
systems. See my answers below if you're interested. 

On Wed, 2020-08-12 at 09:14 -0400, Mark H. Wood wrote:
> 
> 
> If the bottleneck is network, higher parallelism doesn't help.

Right, it won't.

>   If
> the bottleneck is CPU or memory, higher parallelism doesn't help.

Yes, but that really depends on whether the CPU/MEM is actually used properly.

>   If
> the bottleneck is disk, higher parallelism only helps if those ten 
> volumes are on separate physical disks.

Not necessarily. In general, backup software scans the file system looking for 
changes etc. before it actually starts streaming data to the backup storage. 
Often the scan and the save stream are handled by single-threaded processes that 
traverse the file and folder structure. In a case like this one, with a large 
number of files, that takes a long time even though the disk might not be 
struggling at all, and yet it will still be seen as a disk bottleneck.
By employing several scanning and streaming processes on the same file 
system/disk, we can actually speed things up a lot. 
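
To make the scanning part concrete, here is a minimal sketch in Python. It is 
an illustration only, not how NetWorker or any other product does it. The names 
scan_tree and parallel_scan, the "since" timestamp and the default of 4 workers 
are all assumptions made up for this example. It splits the tree at the top 
level and lets each worker thread walk one subdirectory, comparing modification 
times against the time of the last backup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_tree(root, since):
    """Walk one subtree; return paths modified after `since` (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > since:
                    changed.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return changed


def parallel_scan(root, since, workers=4):
    """Scan each top-level subdirectory of `root` in its own worker thread."""
    subdirs, changed = [], []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif (entry.is_file(follow_symlinks=False)
                  and entry.stat().st_mtime > since):
                changed.append(entry.path)  # file directly under root
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda d: scan_tree(d, since), subdirs):
            changed.extend(result)
    return sorted(changed)
```

Whether this helps in practice depends on the storage and on how evenly the 
files are spread across the top-level directories, so measure before relying 
on it.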

For instance, EMC NetWorker versions later than 8.1 can do parallel save streams 
in one backup job on the same file system. I've used this a number of times in 
situations like this and it helps a lot.


>   (Where do you buy 10GB disks
> these days?)  Same for the storage controller(s).  That left me to 
> consider the backup system itself, where the slowest thing in the 
> entire process, by far, is the tape drive(s).

If they use tape drives and write directly to them, that might be true.
It's hard to fully utilize a tape drive's performance by saving directly to it 
from a client. However, for sequential reads/writes tape is often extremely fast 
and can easily outperform a RAID-5 storage system of the same capacity. LTO-8 for 
example has a compressed write speed of 900 MB/sec.
So if you have a fast temporary storage area on the backup system, you can 
stream to that from the client and let the backup system dump it to tape and 
save huge amounts of time. 

> 
> The problem *could* be the local disks.  I'd want to prove that first.

It actually could. And that's probably a good place to start since it's quite 
easy to check.
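
One rough way to check is to time a plain sequential read of a large file on 
the volume in question. The following is a minimal sketch in Python under a few 
assumptions: the function names measure_read and mb_per_second and the 1 MiB 
block size are made up for this example, and the file should not have been read 
recently, since the operating system's page cache will otherwise make the disk 
look much faster than it is.

```python
import time


def measure_read(path, block_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


def mb_per_second(total_bytes, elapsed):
    """Convert a byte count and a duration into MB/sec (1 MB = 10**6 bytes)."""
    return total_bytes / 1_000_000 / elapsed if elapsed > 0 else float("inf")
```

If the figure for a multi-gigabyte file is far above what the backup achieves, 
the local disks are probably not the limiting factor for streaming, although 
scanning many small files can still behave very differently.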

/tony
(who is actually a certified EMC NetWorker Specialist - Implementation
Engineer) 

> 
> --
> Mark H. Wood
> Lead Technology Analyst
> 
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
> 
--
Tony Albers - Systems Architect - IT Development Royal Danish Library, Victor 
Albecks Vej 1, 8000 Aarhus C, Denmark
Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142

--
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/ada0a42220c81cfbd244fe6741e93225d5f76e35.camel%40kb.dk.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/0adc01d67138%240ab48bc0%24201da340%24%40prosentient.com.au.
