Hi JAB,

You have to run "... change --simulate-dead" or "... --simulate-failing" to get the disk out of the system. Afterwards you can start the replacement procedure.
mmvdisk pdisk change --recovery-group dssg2 --pdisk e1d2s25 --simulate-dead

Kind regards
Timm Stamer

On Thursday, 20.06.2024 at 21:14 +0100, Jonathan Buzzard wrote:
>
> So this came to light because I was checking the mmbackup logs and found
> that we had not been getting any successful backups for several days and
> were seeing lots of errors like this
>
> Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Error on gpfs_iopen([/gpfs/users/xxxyyyyy/.swr],68050746): Stale file handle
> Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Summary of errors:: _dirscan failures:3, _serious unclassified errors:3.
>
> After some digging around wondering what was going on I came across
> these being logged on one of the DSS-G nodes
>
> [Wed Jun 12 22:22:05 2024] blk_update_request: I/O error, dev sdbv, sector 9144672512 op 0x1:(WRITE) flags 0x700 phys_seg 17 prio class 0
>
> Yikes, looks like I have a failed disk. However if I do
>
> [root@gpfs2 ~]# mmvdisk pdisk list --recovery-group all --not-ok
> mmvdisk: All pdisks are ok.
>
> Clearly that's a load of rubbish.
>
> After a lot more prodding
>
> [root@gpfs2 ~]# mmvdisk pdisk list --recovery-group dssg2 --pdisk e1d2s25 -L
> pdisk:
>    replacementPriority = 1000
>    name = "e1d2s25"
>    device = "//gpfs1/dev/sdft(notEnabled),//gpfs1/dev/sdfu(notEnabled),//gpfs2/dev/sdfb,//gpfs2/dev/sdbv"
>    recoveryGroup = "dssg2"
>    declusteredArray = "DA1"
>    state = "ok"
>    IOErrors = 444
>    IOTimeouts = 8958
>    mediaErrors = 15
>
> What on earth gives? Why has the disk not been failed? It's not great
> that a clearly bad disk is allowed to stick around in the file system
> and cause problems IMHO.
>
> When I try and prepare the disk for removal I get
>
> [root@gpfs2 ~]# mmvdisk pdisk replace --prepare --rg dssg2 --pdisk e1d2s25
> mmvdisk: Pdisk e1d2s25 of recovery group dssg2 is not currently scheduled for replacement.
> mmvdisk:
> mmvdisk:
> mmvdisk: Command failed. Examine previous error messages to determine cause.
>
> Do I have to use the --force option? I would like to get this disk out
> the file system ASAP.
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
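
For reference, the sequence suggested in the reply (mark the pdisk dead, check its state, then prepare it for replacement) could be sketched as a small dry-run script. This is only an illustration, not a tested procedure: the `run` and `replacement_plan` helpers and the `DRY_RUN` variable are hypothetical names, the recovery group and pdisk names are the ones from this thread, and the only mmvdisk invocations used are those that already appear above. By default it prints the commands and executes nothing.

```shell
#!/bin/sh
# Dry-run sketch: prints the mmvdisk commands instead of executing them.
# RG and PDISK default to the names from this thread; the helper
# functions and DRY_RUN are hypothetical and exist only for illustration.
RG="${RG:-dssg2}"
PDISK="${PDISK:-e1d2s25}"

# Echo each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

replacement_plan() {
    # 1. Tell the recovery group to treat the pdisk as dead.
    run mmvdisk pdisk change --recovery-group "$1" --pdisk "$2" --simulate-dead
    # 2. Inspect the pdisk state. Assumption: the state has to change
    #    (and data may need to drain) before the next step is accepted.
    run mmvdisk pdisk list --recovery-group "$1" --pdisk "$2" -L
    # 3. Prepare the pdisk for physical replacement.
    run mmvdisk pdisk replace --prepare --recovery-group "$1" --pdisk "$2"
}

replacement_plan "$RG" "$PDISK"
```

Running the script as-is only prints the three commands, each prefixed with "+". Setting DRY_RUN=0 would execute them one after another without waiting between steps, so on a real system it is probably safer to run each step by hand and check the output first.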