Eric,

In case you haven't seen it, IBM officially released 8.1.14.200 with quite a few replication/STGRULE fixes, including one related to canceling replication processes causing hangs (https://www.ibm.com/support/pages/apar/IT41066). The whole list is here:
https://www.ibm.com/support/pages/node/6567121

On Tue, Aug 9, 2022 at 2:54 AM Loon, Eric van (ITOP DI) - KLM <eric-van.l...@klm.com> wrote:

> Hi Zoltan,
>
> This all sounds so familiar. In my opinion, all releases after 8.1.12 are
> the most buggy versions IBM ever released (yes, maybe even more buggy than
> the infamous 6.1). I'm in close contact with somebody from development and
> it already resulted in 6 or 7 APARs and still not everything is working OK.
> The crashes, the hanging replications, the slow replications, the stale
> sessions which can't be canceled, I have seen them all...
> One question: are you using 'traditional' node replication or are you
> using stgrule replication? The latter one contains a nasty bug: when
> replication fails or gets canceled, the next replication runs VERY slow.
> The only way to fix this is by running a special script I received from
> support, along with several DB2 commands... No permanent fix available yet.
> But to come back to your question: I have never been able to cancel those
> sessions; the only way to get rid of them is by bouncing the server.
>
> Kind regards,
> Eric van Loon
> Air France/KLM Core Infra
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Zoltan
> Forray
> Sent: Monday, August 8, 2022 16:08
> To: ADSM-L@VM.MARIST.EDU
> Subject: Cancel Session with EXTREME PREJUDICE (a.k.a. FORCE)
>
> First off, we do not know how anyone can run replication of data from a
> FILE-based storage pool (yes, we know that CONTAINERS fixes everything🙄)
> to another server. Every attempt we have made usually ends up in a mess
> that we have to undo/clean up. We find it is very slow (10G on both ends)
> and the replication processes never seem to finish/end.
>
> We have observed that no matter how many or how few replication sessions
> we start, most of them seem to go idle/wait (e.g. MAXSESSIONS=10 starts
> 20 sessions to the target server, of which 16+ become idle even though
> there is 4TB to replicate).
>
> Since we need to get off magnetic tape (moving to a new building with
> restricted space so existing ATL has to go!), we have been using the
> offsite server as Virtual Volumes and creating offsite backups to it. This
> was working pretty well until we started experiencing server crashes/cores
> after upgrading to 8.1.14 (support confirmed a bug - sent us an eFix for it
> - we were continuing to have intermittent crashes - support discovered
> another related bug via RECONCILE VOLUMES command - just installed another
> eFix that is supposed to address the crashes).
>
> While waiting for a fix for the original server problem, we decided to try
> to transition back to replication - only to have more problems than the
> crashes. We have had to bounce the target and source servers multiple times
> due to replication sessions that won't go away/end as well as the
> performance issues I mentioned above.
>
> Did I mention the issues with the 8.1.15 Linux client also related to
> replication?
>
> Since we have the shared stgpools/reconcile volumes eFix (8.1.14.110)
> installed on all servers, we have decided to go back to virtual volumes.
>
> Now back to the subject of this post. Right now I have 4 replication
> sessions on the target server that say they are doing something (i.e. not
> in a WAIT), but in reality have been hung since August 1st (we had
> installed the eFix but forgot to disable the admin command that kicks off
> replication). There are no replication sessions on the source server.
>
> All attempts to cancel the ghost sessions on the target server say they
> can't be canceled.
>
> So before we bounce it one more time, we were wondering if there is a
> super-secret "cancel session with force" we are not aware of?
>
> --
> *Zoltan Forray*
> Enterprise Backup Administrator
> VMware Systems Administrator
> Enterprise Compute & Storage Platforms Team
> VCU Infrastructure Services
> www.ucc.vcu.edu
> zfor...@vcu.edu - 804-828-4807
> Don't be a phishing victim - VCU and other reputable organizations will
> never use email to request that you reply with your password, social
> security number or confidential personal information. For more details
> visit http://phishing.vcu.edu/ <https://adminmicro2.questionpro.com>

--
*Zoltan Forray*
Enterprise Backup Administrator
VMware Systems Administrator
Enterprise Compute & Storage Platforms Team
VCU Infrastructure Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807