Hey Tony, I haven't been following this thread very closely, but I enjoyed your email.
I have some personal experience using backup software that scans and saves in parallel, and it's wonderful. Last time I looked, it took about 30 seconds on a 10-year-old system with 2 CPUs and 1 HDD to scan over 30,000 files for changes on an incremental backup. I wonder what software the OP's IT staff are using that is timing out...

David Cook
Software Engineer
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia

Office: 02 9212 0899
Online: 02 8005 0595

-----Original Message-----
From: dspace-tech@googlegroups.com <dspace-tech@googlegroups.com> On Behalf Of Tony Brian Albers
Sent: Thursday, 13 August 2020 3:44 PM
To: dspace-tech@googlegroups.com; mwoodiu...@gmail.com
Subject: Re: [dspace-tech] Backup issues due to assetstore size

I hope I'm not being annoying; please bear with me. I'd like to explain how parallelism actually works in some (not all) backup systems. See my answers below if you're interested.

On Wed, 2020-08-12 at 09:14 -0400, Mark H. Wood wrote:
>
> If the bottleneck is network, higher parallelism doesn't help.

Right, it won't.

> If the bottleneck is CPU or memory, higher parallelism doesn't help.

Yes, but that really depends on whether the CPU/memory is actually used properly.

> If the bottleneck is disk, higher parallelism only helps if those ten
> volumes are on separate physical disks.

Not necessarily. In general, backup software scans the file system looking for changes etc. before it actually starts streaming data to the backup storage. Often the scan and the save stream are handled by single-threaded processes that traverse the file and folder structure, and when there is a large number of files, as in this case, that takes a long time even though the disk might not be struggling at all. It will still look like a disk bottleneck, though. By running several scanning and streaming processes on the same file system/disk, we can actually speed things up a lot.
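To make the idea of several scanners working on one file system concrete, here is a minimal Python sketch of a parallel change scan. It is not how NetWorker or any other product implements it; it assumes that "changed" simply means a modification time newer than the last backup (passed in as `since`, a Unix timestamp), and the function names `scan_dir` and `parallel_changed_files` are hypothetical. Each directory becomes one task for a thread pool, and every subdirectory it finds is queued as a further task, so several threads walk the same tree at once instead of one sequential walker.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def scan_dir(path, since):
    """Scan ONE directory (non-recursively).

    Returns (changed_files, subdirs): regular files whose mtime is newer
    than `since` (assumed change criterion), plus subdirectories that
    still need scanning.
    """
    changed, subdirs = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        if entry.stat(follow_symlinks=False).st_mtime > since:
                            changed.append(entry.path)
                except OSError:
                    # Entry vanished or is unreadable mid-scan; skip it.
                    continue
    except OSError:
        # Directory vanished or permission denied; treat it as empty.
        pass
    return changed, subdirs


def parallel_changed_files(root, since, workers=8):
    """Hypothetical helper: files under `root` modified after `since`.

    Up to `workers` threads list directories concurrently; each finished
    directory scan feeds its subdirectories back into the pool.
    """
    changed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(scan_dir, root, since)}
        while pending:
            # Block until at least one directory scan has finished.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                files, subdirs = fut.result()
                changed.extend(files)
                for d in subdirs:
                    pending.add(pool.submit(scan_dir, d, since))
    return sorted(changed)
```

Something like `parallel_changed_files("/dspace/assetstore", since=last_backup_time, workers=8)` (path and variable are hypothetical) would return the sorted list of changed files. Threads are a reasonable fit because most of the time goes into directory-listing and `stat` system calls, which CPython runs without holding the GIL; whether it helps on a given system still depends on the storage underneath.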
For instance, EMC NetWorker versions above 8.1 can run parallel save streams within a single backup job on the same file system. I've used this a number of times in situations like this, and it helps a lot.

> (Where do you buy 10GB disks
> these days?) Same for the storage controller(s). That left me to
> consider the backup system itself, where the slowest thing in the
> entire process, by far, is the tape drive(s).

If they use tape drives and write directly to them, that might be true. It's hard to make full use of a tape drive's throughput when a client saves directly to it. However, for sequential reads/writes, tape is often extremely fast and can easily outperform a RAID-5 storage system of the same capacity. LTO-8, for example, has a compressed write speed of 900 MB/s. So if you have a fast temporary storage area on the backup system, you can stream to that from the client, let the backup system dump it to tape, and save huge amounts of time.

>
> The problem *could* be the local disks. I'd want to prove that first.

It actually could. And that's probably a good place to start, since it's quite easy to check.

/tony (who is actually a certified EMC NetWorker Specialist - Implementation Engineer)

>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>

--
Tony Albers - Systems Architect - IT Development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark
Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/ada0a42220c81cfbd244fe6741e93225d5f76e35.camel%40kb.dk.

To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/0adc01d67138%240ab48bc0%24201da340%24%40prosentient.com.au.