Bright minds,

Some time ago a problem arose with one of the GPFS file systems that I happen to back up. The system in question:

Server: IBM x345 with SLES 9 SP3
FC switch: Cisco MDS9124
Storage: DS4400 (formerly FAStT700), latest firmware
GPFS version: 2.3
Multipathing: IBM-supplied RDAC (Linux MPP Driver Version: 09.01.B5.76)
TSM client: 5.4.1.2; server: 5.3.6

The file system:

bioinfo4:~ # df -h /srv
Filesystem            Size  Used Avail Use% Mounted on
/dev/gpfs1            452G  377G   76G  84% /srv

bioinfo4:~ # mmlsfs /dev/gpfs1
flag value          description
---- -------------- -----------------------------------------------------
 -s  roundRobin     Stripe method
 -f  2048           Minimum fragment size in bytes
 -i  512            Inode size in bytes
 -I  8192           Indirect block size in bytes
 -m  1              Default number of metadata replicas
 -M  1              Maximum number of metadata replicas
 -r  1              Default number of data replicas
 -R  1              Maximum number of data replicas
 -j  cluster        Block allocation type
 -D  posix          File locking semantics in effect
 -k  posix          ACL semantics in effect
 -a  1048576        Estimated average file size
 -n  32             Estimated number of nodes that will mount file system
 -B  65536          Block size
 -Q  user;group     Quotas enforced
     user;group     Default quotas enabled
 -F  6999936        Maximum number of inodes
 -V  8.01           File system version. Highest supported version: 8.02
 -u  yes            Support for large LUNs?
 -z  no             Is DMAPI enabled?
 -E  yes            Exact mtime mount option
 -S  no             Suppress atime mount option
 -d  gpfs4nsd       Disks in file system
 -A  yes            Automatic mount option
 -o  none           Additional mount options
 -T  /srv           Default mount point

Basically, the backup of that particular file system never completes; it stops short with return code 12. I have two GPFS file systems on that Linux box. Both reside on the same storage and are connected identically in terms of storage, FC topology, and multipathing. One backs up without a hitch, while the other doesn't. The log excerpt below illustrates what's going on. An online GPFS fsck (mmfsck /dev/gpfs1 -o) returns no errors. I haven't tried an offline fsck.

Any ideas on how to proceed with this problem will be appreciated!
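
One way to narrow this down is to pull every error-level ANS message out of the client schedule log together with the last file that was logged before it. The following is only a minimal sketch, not a TSM tool: it assumes one log entry per line in the "MM/DD/YY HH:MM:SS message" layout seen in the excerpt below, and the function name find_failures is hypothetical. The last file logged is not necessarily the object that triggered the failure; it only marks where in the tree the run stopped.

```python
import re

# Assumed entry layout: "MM/DD/YY HH:MM:SS <message>", one entry per line.
ENTRY = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
# Error (E) and severe (S) client messages, e.g. ANS1999E or ANS1028S.
ANS_ERROR = re.compile(r"^(ANS\d{4}[ES])\b")
# File entries as they appear in the excerpt:
# "Normal File--> <size> <path> [Sent]"
FILE_LINE = re.compile(r"^Normal File-->\s+[\d,]+\s+(.*?)(?:\s+\[[^\]]*\])?\s*$")


def find_failures(log_text):
    """Return a list of (timestamp, code, last_file_seen) tuples.

    last_file_seen is the most recent file entry logged before the
    message, or None if no file entry preceded it.
    """
    failures = []
    last_file = None
    for line in log_text.splitlines():
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # skip "<snip>" markers and wrapped lines
        timestamp, message = entry.groups()
        file_match = FILE_LINE.match(message)
        if file_match:
            last_file = file_match.group(1)
            continue
        error = ANS_ERROR.match(message)
        if error:
            failures.append((timestamp, error.group(1), last_file))
    return failures


# Sample taken from the log excerpt in this post.
sample = """\
01/29/08 02:29:04 Normal File--> 694,651,680 /srv/databases/unigeneU/Hs.data [Sent]
01/29/08 02:29:28 Normal File--> 684,135,874 /srv/databases/unigeneU/Hs.retired.lst [Sent]
01/29/08 02:29:28 ANS1999E Incremental processing of '/srv' stopped.
01/29/08 02:29:28 ANS1028S An internal program error occurred.
"""

results = find_failures(sample)
for timestamp, code, last_file in results:
    print(timestamp, code, "last file logged:", last_file)
```

Running the same function over the full schedule log from several nights would show whether the backup always stops near the same path, which would point at a particular directory or object rather than at the storage path.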

01/28/08 21:00:12 Scheduler has been started by Dsmcad.
01/28/08 21:00:12 Querying server for next scheduled event.
01/28/08 21:00:12 Node Name: BIOINFO4
01/28/08 21:00:12 Session established with server GALAHAD: Linux/i386
01/28/08 21:00:12   Server Version 5, Release 3, Level 6.0
01/28/08 21:00:12   Server date/time: 01/28/08 21:00:12  Last access: 01/28/08 20:26:46
01/28/08 21:00:12 --- SCHEDULEREC QUERY BEGIN
01/28/08 21:00:12 --- SCHEDULEREC QUERY END
01/28/08 21:00:12 Next operation scheduled:
01/28/08 21:00:12 ------------------------------------------------------------
01/28/08 21:00:12 Schedule Name:         21_SCHED_18
01/28/08 21:00:12 Action:                Incremental
01/28/08 21:00:12 Objects:
01/28/08 21:00:12 Options:
01/28/08 21:00:12 Server Window Start:   21:00:00 on 01/28/08
01/28/08 21:00:12 ------------------------------------------------------------
01/28/08 21:00:12 Executing scheduled command now.
01/28/08 21:00:12 --- SCHEDULEREC OBJECT BEGIN 21_SCHED_18 01/28/08 21:00:00
01/28/08 21:00:12 Incremental backup of volume '/'
01/28/08 21:00:12 Incremental backup of volume '/boot'
01/28/08 21:00:12 Incremental backup of volume '/csminstall'
01/28/08 21:00:12 Incremental backup of volume '/home'
01/28/08 21:00:12 Incremental backup of volume '/srv'
<snip>
01/28/08 21:07:51 Successful incremental backup of '/boot'
<snip>
01/28/08 21:08:05 Successful incremental backup of '/'
<snip>
01/28/08 21:09:53 Successful incremental backup of '/csminstall'
<snip>
01/28/08 23:59:45 ANS1802E Incremental backup of '/home' finished with 1 failure
<snip>
01/29/08 00:00:01 Normal File-->      59,008 /srv/group.quota [Sent]
01/29/08 00:00:01 Normal File-->     262,144 /srv/user.quota [Sent]
01/29/08 00:00:01 Normal File-->       8,109 /srv/LogShared/apache2/access_log [Sent]
<snip>
01/29/08 02:28:31 Normal File-->   1,268,946 /srv/databases/unigeneU/Hs.lib.info [Sent]
01/29/08 02:28:46 Normal File--> 221,453,209 /srv/databases/unigeneU/Hs.profiles [Sent]
01/29/08 02:29:04 Normal File--> 694,651,680 /srv/databases/unigeneU/Hs.data [Sent]
01/29/08 02:29:28 Normal File--> 684,135,874 /srv/databases/unigeneU/Hs.retired.lst [Sent]
01/29/08 02:29:28 ANS1999E Incremental processing of '/srv' stopped.
01/29/08 02:29:28 --- SCHEDULEREC STATUS BEGIN
01/29/08 02:29:28 Total number of objects inspected: 3,039,708
01/29/08 02:29:28 Total number of objects backed up:   559,287
01/29/08 02:29:28 Total number of objects updated:           1
01/29/08 02:29:28 Total number of objects rebound:           0
01/29/08 02:29:28 Total number of objects deleted:           0
01/29/08 02:29:28 Total number of objects expired:          95
01/29/08 02:29:28 Total number of objects failed:            1
01/29/08 02:29:28 Total number of bytes transferred:     70.16 GB
01/29/08 02:29:28 Data transfer time:                 6,053.40 sec
01/29/08 02:29:28 Network data transfer rate:        12,153.50 KB/sec
01/29/08 02:29:28 Aggregate data transfer rate:       3,723.94 KB/sec
01/29/08 02:29:28 Objects compressed by:                     0%
01/29/08 02:29:28 Elapsed processing time:            05:29:15
01/29/08 02:29:28 --- SCHEDULEREC STATUS END
01/29/08 02:29:28 ANS1028S An internal program error occurred.
01/29/08 02:29:28 --- SCHEDULEREC OBJECT END 21_SCHED_18 01/28/08 21:00:00
01/29/08 02:29:28 ANS1512E Scheduled event '21_SCHED_18' failed. Return code = 12.
01/29/08 02:29:28 Sending results for scheduled event '21_SCHED_18'.
01/29/08 02:29:29 Results sent to server for scheduled event '21_SCHED_18'.

--
Warm regards,
Michael Green