Hi Eric,

Thank you for confirming what my co-worker and I have been experiencing and
that we aren't alone with these issues.  We would never have jumped from
8.1.12 to .14 if it were not for log4j and the plethora of other possible
intrusions!

I have seen way too many APARs related to STGRULE and have no desire to
inflict even more pain upon ourselves! We use traditional node replication:
these are (were) standard administrative-schedule or console "replicate node"
commands, using node groups or sometimes just a single node, trying to
reconcile "errors" caused by canceling replication.

This brings up another question: what is the purpose of the "cancel
replication" command if it causes such damage that subsequent replications
issue these dire warnings about *"detected partially replicated data from a
previous replication operation. This might result in extended processing
time while the server is replicating"*? IMO, it sounds like a regular
"cancel process" would do about the same. I understand the value of special
commands like "cancel expiration", which allows subsequent expire inventory
commands to resume processing where the canceled expiration left off, but
cancel replication doesn't seem to offer much benefit. Or is this another
APAR waiting for mitigation?


On Tue, Aug 9, 2022 at 2:54 AM Loon, Eric van (ITOP DI) - KLM <
eric-van.l...@klm.com> wrote:

> Hi Zoltan,
>
> This all sounds so familiar. In my opinion, all releases after 8.1.12 are
> the most buggy versions IBM ever released (yes, maybe even more buggy than
> the infamous 6.1). I'm in close contact with somebody from development and
> it already resulted in 6 or 7 APARs and still not everything is working OK.
> The crashes, the hanging replications, the slow replications, the stale
> sessions which can't be canceled, I have seen them all...
> One question: are you using 'traditional' node replication or are you
> using stgrule replication? The latter one contains a nasty bug: when
> replication fails or gets canceled, the next replication runs VERY slowly.
> The only way to fix this is by running a special script I received from
> support, along with several DB2 commands... No permanent fix available yet.
> But to come back to your question: I have never been able to cancel those
> sessions; the only way to get rid of them is by bouncing the server.
>
> Kind regards,
> Eric van Loon
> Air France/KLM Core Infra
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Zoltan Forray
> Sent: Monday, August 8, 2022 16:08
> To: ADSM-L@VM.MARIST.EDU
> Subject: Cancel Session with EXTREME PREJUDICE (a.k.a. FORCE)
>
> First off, we do not know how anyone can run replication of data from a
> FILE-based storage pool (yes, we know that CONTAINERS fixes everything🙄) to
> another server. Every attempt we have made usually ends up in a mess that
> we have to undo/clean up. We find it is very slow (10G on both ends), and
> the replication processes never seem to finish/end.
>
> We have observed that no matter how many or how few replication sessions
> we start, most of them seem to go idle/wait (e.g., MAXSESSIONS=10 starts
> 20 sessions to the target server, of which 16+ become idle even though there
> is 4TB to replicate).
>
> Since we need to get off magnetic tape (moving to a new building with
> restricted space so existing ATL has to go!), we have been using the
> offsite server as Virtual Volumes and creating offsite backups to it.  This
> was working pretty well until we started experiencing server crashes/cores
> after upgrading to 8.1.14 (support confirmed a bug and sent us an eFix for
> it; we continued to have intermittent crashes; support then discovered
> another related bug via the RECONCILE VOLUMES command; we just installed
> another eFix that is supposed to address the crashes).
>
> While waiting for a fix for the original server problem, we decided to try
> to transition back to replication - only to have more problems than the
> crashes. We have had to bounce the target and source servers multiple times
> due to replication sessions that won't go away/end as well as the
> performance issues I mentioned above.
>
> Did I mention the issues with the 8.1.15 Linux client also related to
> replication?
>
> Since we have the shared stgpools/reconcile volumes eFix (8.1.14.110)
> installed on all servers, we have decided to go back to virtual volumes.
>
> Now back to the subject of this post. Right now I have four replication
> sessions on the target server that say they are doing something (i.e., not
> in a WAIT state), but in reality have been hung since August 1st (we had
> installed the eFix but forgot to disable the admin command that kicks off
> replication).  There are no replication sessions on the source server.
>
> All attempts to cancel the ghost sessions on the target server say they
> can't be canceled.
>
> So before we bounce it one more time, we were wondering if there is a
> super-secret "cancel session with force" we are not aware of?
>
> --
> *Zoltan Forray*
> Enterprise Backup Administrator
> VMware Systems Administrator
> Enterprise Compute & Storage Platforms Team
> VCU Infrastructure Services
> www.ucc.vcu.edu
> zfor...@vcu.edu - 804-828-4807
> Don't be a phishing victim - VCU and other reputable organizations will
> never use email to request that you reply with your password, social
> security number or confidential personal information. For more details
> visit http://phishing.vcu.edu/
> <https://adminmicro2.questionpro.com>
>


