I'll second that. Our policy is that no node should generate a non-zero exit code at the end of a backup. There are a few things that we do:
* Exclude directories that generate errors and do not need to be backed up (Firefox cache, all sorts of Apple stuff, etc.)
* Work with users to identify files that don't need SHRDYNAMIC copy serialization. This reduces the number of retries the client does. For data in a workflow pipeline, eventually it will be static, and there's no point spending a lot of time waiting for it to become static.
* Provide directories that users can create that are never backed up (in our case, these are NoBackup, NOBACKUP, nobackup, nobackups, NoBackups, and NOBACKUPS). We are an HPC shop, and many files created on the filesystems we back up are transient and only useful for the life of the job. If the job is running while backups are happening, we get lots of errors when these files are removed. After a disaster, that job would just be resubmitted based on source data that are elsewhere on the filesystem. Users can create those directories anywhere, and files in those directories are never backed up.

-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine

On 11/ 7/12 08:02 AM, Arbogast, Warren K wrote:
One more thought. In my experience, some, if not many, client admins assume that failed file backups (file in use, file not found, file changed, etc.) are the cause of failed backups (condition code 12). If the failed files aren't important to them, they look no further for the cause of the failed backup. I believe this misunderstanding exacerbates their habit of ignoring automated alerts. We have text in the alert message to correct that perception, but we still hear that opinion.

Keith
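
To make the distinction in the quoted message concrete, here is a minimal Python sketch of how the backup-archive client's documented return codes (0, 4, 8, 12) could be spelled out in an alert. The function names and the wording of the descriptions are illustrative assumptions, not part of any TSM tool or of either site's alerting setup.

```python
# Return codes documented for the TSM backup-archive client.
# The description strings are paraphrased for illustration.
RETURN_CODES = {
    0: "completed successfully",
    4: "completed, but some files were skipped (in use, changed, excluded)",
    8: "completed with at least one warning message",
    12: "completed with at least one error message other than skipped files",
}


def classify(rc: int) -> str:
    """Describe a client return code (hypothetical helper)."""
    return RETURN_CODES.get(rc, f"return code {rc} is not one of the standard codes")


def explained_by_skipped_files(rc: int) -> bool:
    """Only return code 4 can be put down to skipped files alone.

    A 12 means some other error occurred, so the client logs still
    need to be checked even if the skipped files are unimportant.
    """
    return rc == 4


if __name__ == "__main__":
    for rc in (0, 4, 8, 12):
        print(rc, classify(rc), explained_by_skipped_files(rc))
```

An alert built this way could state outright that a 12 is never caused by skipped files alone, which is the perception the alert text is trying to correct.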