Tasks ENQ'ing exclusive on resource not getting control in FIFO order
It was my understanding that tasks enqueueing on a resource get control in FIFO order when contention exists. Today we had a situation where this was not true. The environment is as follows:
- 4-system parallel sysplex, z/OS V1.13, GRS STAR mode.
- A dozen or so jobs running the same program are active across all 4 systems at a time. More jobs are submitted as jobs end (more or less).
- The programs serialize using an EXCLUSIVE ENQ on a resource, scope SYSTEMS.

As expected, one job is running and all others are waiting to get the resource assigned. But suddenly we then recognized that jobs on two systems never got to run. They had been waiting for the resource for hours, while newer jobs got control one after the other. So resource assignment is clearly not FIFO. We then saw (in EJES) that once a job ended, all waiting jobs were active for a very short time, then one job continued to run while all others were waiting again.

I have RTFM, and still think ENQ is FIFO. I have not found anything related to GRS STAR mode that contradicts this. I have not followed GRS news lately. What am I missing?

-- Peter Hunkeler
-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Tasks ENQ'ing exclusive on resource not getting control in FIFO order
It sounds as if the relative CPU speed and/or hypervisor dispatching of LPARs may be involved in any given LPAR's getting the resource. If, after GRS signals all enqueuing systems that the resource is available, it then waits for a response from those systems, it may be that GRS gives the resource to whichever system responds first and ignores the order in which the processors issued the original ENQs. Perhaps a timestamp needs to be associated with each ENQ request and the global resource allocator made sensitive to the timestamp. Or maybe the documentation needs to be updated to reflect how SCOPE=SYSTEMS ENQ under GRS works differently from SCOPE=SYSTEM with no GRS involved.

Relative processor speed has certainly been known to affect which of several sharing processors will next get access to a shared DASD volume using the RESERVE/RELEASE hardware function.

Bill Fairchild
Programmer, Rocket Software
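The timestamp idea suggested above can be illustrated with a small toy model. This is only a sketch of the concept, not a description of how GRS works internally: the class `TimestampedResourceQueue`, its `enq`/`deq` methods, and the job names are all hypothetical, and the "timestamps" are plain integers supplied by the caller. It shows that if every request carries the time at which it was issued, the allocator can grant the resource in request order even when the requests reach it out of order.

```python
import heapq
import itertools


class TimestampedResourceQueue:
    """Toy model (hypothetical, not GRS): exclusive requests are granted by timestamp."""

    def __init__(self):
        self._waiters = []             # min-heap of (timestamp, tie_breaker, requester)
        self._tie = itertools.count()  # stable ordering for equal timestamps
        self.owner = None

    def enq(self, requester, timestamp):
        """Request exclusive ownership; return True if granted immediately."""
        if self.owner is None and not self._waiters:
            self.owner = requester
            return True
        heapq.heappush(self._waiters, (timestamp, next(self._tie), requester))
        return False

    def deq(self):
        """Release the resource and hand it to the waiter with the oldest timestamp."""
        self.owner = None
        if self._waiters:
            _, _, self.owner = heapq.heappop(self._waiters)
        return self.owner


def grant_order(requests):
    """Feed (requester, timestamp) pairs in arrival order; return the grant order."""
    queue = TimestampedResourceQueue()
    order = []
    for requester, timestamp in requests:
        if queue.enq(requester, timestamp):
            order.append(requester)
    while True:
        next_owner = queue.deq()
        if next_owner is None:
            break
        order.append(next_owner)
    return order


if __name__ == "__main__":
    # JOBD's request reaches the allocator before JOBB's and JOBC's,
    # although it was issued later (larger timestamp).
    arrivals = [("JOBA@SYS1", 10), ("JOBD@SYS4", 40), ("JOBB@SYS2", 20), ("JOBC@SYS3", 30)]
    print(grant_order(arrivals))
```

In this model the order in which requests arrive at the allocator does not matter; only the timestamps do. Whether something comparable would be practical across systems in a sysplex is a separate question that this sketch does not address.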
Re: Tasks ENQ'ing exclusive on resource not getting control in FIFO order
In reply to the message sent to IBM-MAIN on 01/17/2013 at 04:42:33 PM:

GRS resource contention is intended to be processed FIFO, regardless of Ring mode vs. Star mode.

Jim Mulder
z/OS System Test, IBM Corp., Poughkeepsie, NY
Re: Tasks ENQ'ing exclusive on resource not getting control in FIFO order
Have you checked to see if the system that ends up winning the ENQ is the same as the one which was running the job that just ended and did the DEQ? It seems to me that the DEQing system has the best chance of winning any race condition since it is the first to know of the DEQ and thus the first to be able to grab the ENQ lock.

(In reply to Peter Hunkeler's message of 01/17/2013, 14:46 -0600.)
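To make the suspected race concrete, here is a small deterministic simulation. It is a toy model under stated assumptions, not GRS behavior: the function `simulate`, the policy names `"fifo"` and `"releaser_first"`, the job names, and the rule that each replacement job is submitted on the system where the previous job ended are all hypothetical. It contrasts strict FIFO granting with a policy in which a waiter on the releasing system always wins.

```python
from collections import deque


def simulate(policy, rounds=6):
    """Return the order in which waiting jobs obtain the resource.

    policy "fifo":           the oldest waiter gets the resource.
    policy "releaser_first": the oldest waiter on the releasing system wins
                             (hypothetical race outcome); otherwise the oldest waiter.
    """
    # Waiters in arrival order as (job, system); J0 on SYS1 holds the resource.
    waiters = deque([("J1", "SYS1"), ("J2", "SYS2"), ("J3", "SYS3"), ("J4", "SYS1")])
    owner = ("J0", "SYS1")
    granted = []
    next_id = 5
    for _ in range(rounds):
        releasing_system = owner[1]
        # Assumption: a replacement job is submitted on the system where a job just ended.
        waiters.append((f"J{next_id}", releasing_system))
        next_id += 1
        if policy == "releaser_first":
            pick = next((w for w in waiters if w[1] == releasing_system), waiters[0])
        else:
            pick = waiters[0]
        waiters.remove(pick)
        owner = pick
        granted.append(pick[0])
    return granted


if __name__ == "__main__":
    print("fifo:          ", simulate("fifo"))
    print("releaser_first:", simulate("releaser_first"))
```

Under the `"releaser_first"` policy in this model, J2 and J3 (on SYS2 and SYS3) are never granted the resource while newer jobs on SYS1 keep getting it, which resembles the reported symptom. Under `"fifo"` the jobs are served in arrival order.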
Re: Tasks ENQ'ing exclusive on resource not getting control in FIFO order
"Have you checked to see if the system that ends up winning the ENQ is the same as the one which was running the job that just ended and did the DEQ? It seems to me that the DEQing system has the best chance of winning any race condition since it is the first to know of the DEQ and thus the first to be able to grab the ENQ lock."

Yes, we saw symptoms like that, too, but if GRS contention resolution is defined to be FIFO, then there is no race condition that one system can win. I wanted to confirm that I'm not missing something that changed in GRS processing before opening a PMR. Jim Mulder confirmed that it still has to be FIFO. We now have to find out why we saw different behavior.

-- Peter Hunkeler