Hi JAB,

You have to run "... change --simulate-dead" or "... --simulate-failing"
to get the disk out of the system. Afterwards you can start the
replacement procedure.


mmvdisk pdisk change --recovery-group dssg2 --pdisk e1d2s25 --simulate-dead
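
For reference, the sequence could be sketched roughly as below. This is only a sketch: the step() and retire_pdisk helpers and the RG, PDISK and RUN variables are hypothetical names made up for illustration, and the mmvdisk invocations are the ones already quoted in this thread. By default it only prints the commands. The steps are not meant to be run back to back, since the prepare step will only succeed once the pdisk has actually been scheduled for replacement.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the commands for taking a
# suspect pdisk out of service. Defaults are the values from this thread.

RG="${RG:-dssg2}"
PDISK="${PDISK:-e1d2s25}"

# Hypothetical helper: echo each command, execute it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: the three steps discussed in this thread.
retire_pdisk() {
    # 1. Tell the recovery group to treat the pdisk as dead.
    step mmvdisk pdisk change --recovery-group "$RG" --pdisk "$PDISK" --simulate-dead
    # 2. Check that the pdisk is now reported as not ok.
    step mmvdisk pdisk list --recovery-group "$RG" --not-ok
    # 3. Once it is scheduled for replacement, prepare it for removal.
    step mmvdisk pdisk replace --prepare --recovery-group "$RG" --pdisk "$PDISK"
}
```

Calling retire_pdisk with no variables set just prints the three commands prefixed with "+ ", so they can be reviewed first. Setting RUN=1 would actually execute them.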


Kind regards

Timm Stamer


On Thursday, 20.06.2024 at 21:14 +0100, Jonathan Buzzard wrote:
> 
> This came to light because I was checking the mmbackup logs and found
> that we had not been getting any successful backups for several days
> and was seeing lots of errors like this:
> 
> Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Error on gpfs_iopen([/gpfs/users/xxxyyyyy/.swr],68050746): Stale file handle
> Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Summary of errors:: _dirscan failures:3, _serious unclassified errors:3.
> 
> After some digging around, wondering what was going on, I came across
> these being logged on one of the DSS-G nodes:
> 
> [Wed Jun 12 22:22:05 2024] blk_update_request: I/O error, dev sdbv, sector 9144672512 op 0x1:(WRITE) flags 0x700 phys_seg 17 prio class 0
> 
> Yikes, looks like I have a failed disk. However, if I do
> 
> [root@gpfs2 ~]# mmvdisk pdisk list --recovery-group all --not-ok
> mmvdisk: All pdisks are ok.
> 
> Clearly that's a load of rubbish.
> 
> After a lot more prodding
> 
> [root@gpfs2 ~]# mmvdisk pdisk list --recovery-group dssg2 --pdisk e1d2s25 -L
> pdisk:
>     replacementPriority = 1000
>     name = "e1d2s25"
>     device = "//gpfs1/dev/sdft(notEnabled),//gpfs1/dev/sdfu(notEnabled),//gpfs2/dev/sdfb,//gpfs2/dev/sdbv"
>     recoveryGroup = "dssg2"
>     declusteredArray = "DA1"
>     state = "ok"
>     IOErrors = 444
>     IOTimeouts = 8958
>     mediaErrors = 15
> 
> 
> What on earth gives? Why has the disk not been failed? It's not great
> that a clearly bad disk is allowed to stick around in the file system
> and cause problems IMHO.
> 
> When I try to prepare the disk for removal I get
> 
> [root@gpfs2 ~]# mmvdisk pdisk replace --prepare --rg dssg2 --pdisk e1d2s25
> mmvdisk: Pdisk e1d2s25 of recovery group dssg2 is not currently scheduled for replacement.
> mmvdisk:
> mmvdisk:
> mmvdisk: Command failed. Examine previous error messages to determine cause.
> 
> Do I have to use the --force option? I would like to get this disk
> out of the file system ASAP.
> 
> 
> 
> JAB.
> 
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
