AW: AW: Problems with SCSI FBA device under z/VM
Thanks for your answer. Linux was actually shut down when we tried to vary the EDEVice online. What options do we have for tracing an EDEVice in z/VM? We ran a SCSIDISC debug and saw the following:

    INFO::FCP SUB-CHANNEL 2000 Re-Initialized
    INFO::WWPN 21D02317 Opened
    INFO::WWPN 21D02317 UTIL LUN Opened
    INFO::WWPN 21D02317 UTIL LUN Closed
    INFO::For WWPN 21D02317 No of LUNs found=8
    INFO::For 2000 21D02317 Choosen LUN=
    DEBUG::LUN for WWPN 21D02317 Opened
    DEBUG::LUN Closed
    INFO::WWPN 21D02317 Closed

We will also open a service request with IBM.

Regards,
Heiko Ruck

From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Eric R Farman
Sent: Wednesday, 4 May 2011 17:28
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: AW: Problems with SCSI FBA device under z/VM

Have you removed the configuration of the LUN from Linux before you varied the EDEVice online? z/VM will not be able to open a LUN R/W if it is already open on another FCP CHPID/WWPN pair. It doesn't matter if it's a different subchannel, if it's on the same CHPID.

If that's not it, I would suggest opening a service request so that we can collect some traces and determine why the LUN can't be opened. Non-IBM storage should work, but you should only get the UDID Mismatch message with multiple-path devices, not single-path.

Regards,
Eric

Eric Farman
z/VM I/O Development
IBM Endicott, NY

From: Heiko Ruck hr...@mainstorconcept.de
To: IBMVM@LISTSERV.UARK.EDU
Date: 05/04/2011 03:55 PM
Subject: AW: Problems with SCSI FBA device under z/VM
Sent by: The IBM z/VM Operating System IBMVM@LISTSERV.UARK.EDU

We used another subchannel and another LUN for z/Linux: device 2000 was used for z/VM (edev 9000) with LUN1, and device 2002 was used for z/Linux with LUN0. We tried it with and without NPIV; it works in both modes for Linux but not for z/VM. The PPP in the WWPN can be ignored (I changed the WWPN after copy and paste, sorry for the confusion).
-----Original Message-----
From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Richard Troth
Sent: Wednesday, 4 May 2011 15:29
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Problems with SCSI FBA device under z/VM

So ... when it works on Linux, how is it defined? Using the same subchannel? (2000) Is NPIV in place? What's with the PPP in the WWPN in the error report? Is Linux also using this device when you try to use it as an EDEV?

-- R;
Rick Troth
Velocity Software
http://www.velocitysoftware.com/

On Wed, May 4, 2011 at 08:58, Heiko Ruck hr...@mainstorconcept.de wrote:

Hi all,

we have problems using our SCSI storage system under z/VM, defined as FCP. The same storage system works fine for our z/Linux running as a guest of this VM, so we assume that there are no errors in the definition of the IOCP, HMC, Fibre Channel switch and SCSI storage system.

We are currently using z/VM 5.4 with Service Level 1101, and we can see the WWPN and LUNs of the storage when using scsidisc. The set edev command also works, but as soon as we vary the device online we receive the following error:

    HCPSZP8701I Path FCP_DEV 2000 WWPN 2200PPP023100017 LUN 0001 was deleted from EDEV 9000 because it is invalid.
    HCPCPN8700I Emulated Device 9000 cannot be varied online because
    HCPCPN8700I there are no valid paths defined to the device.

There are no errors on the operator console. The same test on z/VM 5.3 ends with:

    11:51:01 HCPAXS3586E UDID mismatch for path 6000PATH01. Function:scsi_mpio_init
    11:51:01 HCPCSS3507E Thread (0x01B9C148) encountered a severe error in p ers_paix_loadDevice at line 1604: EID(3) RC(0x00 15) RSN(0x0033)

Does anyone have an idea why we receive an error in z/VM although it works under z/Linux? Of course I have to say that this is not an IBM storage system.
Does anyone have a list of storage systems that work with z/VM, or experience with a system that works? I think only the IBM 2105, 2107, 1750, 2145 and 2810 are supported, but which other systems are also functional, even if they are not supported and/or come from another vendor?

Thanks,
Heiko Ruck
Re: AW: AW: Problems with SCSI FBA device under z/VM
In your original note, you were asking to open LUN x0001 as part of an EDEV. But in your SCSIDISC DEBUG, you are opening LUN x. Can you try the same debug against that LUN?

Regards,
Eric

Eric Farman
z/VM I/O Development
IBM Endicott, NY
(607)429-4958 (tie 620)

From: Heiko Ruck hr...@mainstorconcept.de
To: IBMVM@LISTSERV.UARK.EDU
Date: 05/05/2011 11:13 AM
Subject: AW: AW: Problems with SCSI FBA device under z/VM
Sent by: The IBM z/VM Operating System IBMVM@LISTSERV.UARK.EDU

Thanks for your answer. Linux was actually shut down when we tried to vary the EDEVice online. What options do we have for tracing an EDEVice in z/VM? We ran a SCSIDISC debug and saw the following:

    INFO::FCP SUB-CHANNEL 2000 Re-Initialized
    INFO::WWPN 21D02317 Opened
    INFO::WWPN 21D02317 UTIL LUN Opened
    INFO::WWPN 21D02317 UTIL LUN Closed
    INFO::For WWPN 21D02317 No of LUNs found=8
    INFO::For 2000 21D02317 Choosen LUN=
    DEBUG::LUN for WWPN 21D02317 Opened
    DEBUG::LUN Closed
    INFO::WWPN 21D02317 Closed

We will also open a service request with IBM.

Regards,
Heiko Ruck

From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Eric R Farman
Sent: Wednesday, 4 May 2011 17:28
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: AW: Problems with SCSI FBA device under z/VM

Have you removed the configuration of the LUN from Linux before you varied the EDEVice online? z/VM will not be able to open a LUN R/W if it is already open on another FCP CHPID/WWPN pair. It doesn't matter if it's a different subchannel, if it's on the same CHPID.

If that's not it, I would suggest opening a service request so that we can collect some traces and determine why the LUN can't be opened. Non-IBM storage should work, but you should only get the UDID Mismatch message with multiple-path devices, not single-path.
Re: Problem with z/Linux guest Ethernet frames (buffering ?)
Hello Scott,

We are using a Layer 2 VSWITCH for the guest, and the MTU size is set to 1492. None of the Ethernet frames are being dropped, but they are being delayed.

Regards,
Ashwin

From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Scott Rohling
Sent: Wednesday, May 04, 2011 8:12 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Problem with z/Linux guest Ethernet frames (buffering ?)

Just a guess, but have you checked MTU sizes? Are you using a VSWITCH for the guests or a dedicated OSA?

Scott Rohling

On Wed, May 4, 2011 at 5:15 PM, Bhemidhi, Ashwin ashw...@ti.com wrote:

Hello all,

Recently we started noticing on a few of our Linux guests that Ethernet frames were being delayed up to 25 seconds from the time they were sent to the time the guest received them. The frames are Ethernet LLC keep-alive polls (layer 2 polls) that are sent by a Cisco SNA switch router every 30 secs. Both the router and the Linux guest are in the same LAN.

Looking at the ethereal traces captured on the guest: during normal operation the keep-alive frames are sent every 30 secs and the z/Linux guest responds to the poll within 60 microseconds. But a few times we noticed that the frames were delayed up to 25 seconds (total time from the previous poll is 30+25) between the router sending the poll frame and the Linux guest receiving it. This causes the keep-alive timer (9 secs = 1 sec X 8 retries) to expire and disconnect sessions. The Linux guest eventually receives the frames, including the retries, all at the same time, but by then the sessions have been dropped by the router.

It appears that the frames are being buffered and delayed before the guest receives them. We know for sure that the router is sending the poll every 30 seconds, but somewhere, somehow, the frames were buffered (?) for 25 secs before being delivered to the guest. I am trying to figure out at which layer the delay was introduced.
Are there any other traces that I can turn on in z/VM to diagnose the problem? Where do I start looking?

The z/VM LPAR is a small one running 8 guests with 80MB memory and 16MB and 48MB vdisk on a z10.

    IFL utilization : 2% X 2 IFLs
    Central Storage : 95% of 768 MB
    XSTORE          : 97% of 256 MB
    PAGE            : 12% X 2 3390-3 page DASD
    Paging/Spooling activity: 0/s (most of the time)

Thank you,
Ashwin Bhemidhi
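As a rough aid for working through the capture described above, the following Python sketch flags polls that arrive later than the 30-second interval and compares the delay with the 9-second keep-alive timer. The function name, the tolerance value, and the sample arrival times are illustrative assumptions, not values taken from the actual traces.

```python
from typing import List, Tuple

POLL_INTERVAL = 30.0      # seconds between polls, as stated in the message
KEEPALIVE_TIMEOUT = 9.0   # seconds, the keep-alive timer quoted in the message


def find_delayed_polls(arrivals: List[float],
                       interval: float = POLL_INTERVAL,
                       tolerance: float = 1.0) -> List[Tuple[float, float]]:
    """Return (arrival_time, delay) for every poll that arrived more than
    `tolerance` seconds later than `interval` after the previous poll."""
    delayed = []
    for prev, cur in zip(arrivals, arrivals[1:]):
        gap = cur - prev
        if gap > interval + tolerance:
            delayed.append((cur, gap - interval))
    return delayed


# Hypothetical arrival times (seconds) as seen by the guest:
# the third poll shows up 55 s after the second one (30 + 25).
arrivals = [0.0, 30.0, 85.0, 90.0]
for when, delay in find_delayed_polls(arrivals):
    expired = delay > KEEPALIVE_TIMEOUT
    print(f"poll at t={when}: delayed {delay} s, keep-alive expired: {expired}")
```

Feeding it the frame timestamps exported from the capture would give a list of incidents whose times can then be matched against z/VM monitor data for the same moments.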
Re: Problem with z/Linux guest Ethernet frames (buffering ?)
Right now the eligible list is 0. I could not check the queues at the time of the event; the 3 times it occurred were either after business hours or over the weekend. By the time I was able to log on, the eligible list was 0.

I did change the SRM STORBUF setting from the default to 300%, 300%, 200%.

Regards,
Ashwin

    q srm
    IABIAS : INTENSITY=90%; DURATION=2
    LDUBUF : Q1=100% Q2=75% Q3=60%
    STORBUF: Q1=300% Q2=300% Q3=200%
    DSPBUF : Q1=32767 Q2=32767 Q3=32767
    DISPATCHING MINOR TIMESLICE = 5 MS
    MAXWSS : LIMIT=%
    .. : PAGES=99
    XSTORE : 0%
    LIMITHARD METHOD: DEADLINE

From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Marcy Cortes
Sent: Wednesday, May 04, 2011 8:37 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Problem with z/Linux guest Ethernet frames (buffering ?)

Be sure your guest isn't dropping into the eligible list. That can look like a network problem. Issue ind q to see if it is there, and q srm, or consult your performance monitor. The default SRM STORBUF setting is too low for Linux workloads on z/VM, so hopefully that has been bumped up.

Marcy

From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Scott Rohling
Sent: Wednesday, May 04, 2011 6:12 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: [IBMVM] Problem with z/Linux guest Ethernet frames (buffering ?)

Just a guess, but have you checked MTU sizes? Are you using a VSWITCH for the guests or a dedicated OSA?

Scott Rohling

On Wed, May 4, 2011 at 5:15 PM, Bhemidhi, Ashwin ashw...@ti.com wrote:

Hello all,

Recently we started noticing on a few of our Linux guests that Ethernet frames were being delayed up to 25 seconds from the time they were sent to the time the guest received them. The frames are Ethernet LLC keep-alive polls (layer 2 polls) that are sent by a Cisco SNA switch router every 30 secs. Both the router and the Linux guest are in the same LAN. Looking at the ethereal traces captured on the guest.
Re: PIPEDDR and attached DASD
I have solved this problem and present the resolution for the benefit of others: it was an MTU mismatch between the VM TCPIP stacks (set at 8992) and what the network between them would support (1500).

Further testing showed that ATTACH vs. MDISK and the versions of PIPELINES and PIPEDDR were not the cause. Instead, success or failure depended on the data being transferred and on whether it traversed the external network. After ripping apart PIPEDDR to understand it, so that I could add debugging code and options, I found that the transfer for a particular disk always failed at a unique track within that disk. Manually added pacing (via DELAY stages) would work (although *tediously* more slowly than expected based on the delay value used), but not if the delay was too small.

Eventually, while looking for pacing information in the TCPIP stack, I read the note that "Selecting an MTU size that is too large may cause a client application to hang." The light bulb went on, and after adjusting the MTU size used in the VM TCPIP stack everything works fine now.

Brian Nielsen

On Tue, 5 Apr 2011 12:20:59 -0500, Brian Nielsen bniel...@sco.idaho.gov wrote:

Should PIPEDDR work with attached DASD or does it only support MDISKs? The documentation doesn't seem to say. I get an error with attached DASD, but it works fine with a full-pack MDISK of the same DASD volume when doing a dump/restore over TCPIP.

Using attached DASD will avoid label conflicts and also avoids maintaining hardcoded MDISKs with DEVNOs in the directory. Unfortunately, the DEFINE MDISK command doesn't have a DEVNO option.

Here is what the failure looks like from the receiving and sending sides using attached DASD at both ends:

-----
    pipeddr restore * 6930 11000 (listen noprompt
    Connecting to TCP/IP. Enter PIPMOD STOP to terminate.
    Waiting for connection on port 11000 to restore BNIELSEN 6930.
    Sending user is BNIELSEN at VMP2
    Receiving data from 172.16.64.45
    PIPTCQ1015E ERRNO 54: ECONNRESET.
    PIPMSG004I ... Issued from stage 3 of pipeline 3 name iprestore.
    PIPMSG001I ... Running tcpdata.
    PIPUPK072E Last record not complete.
    PIPMSG003I ... Issued from stage 2 of pipeline 1.
    PIPMSG001I ... Running unpack.
    Data restore failed.
    Ready(01015); T=0.01/0.01 08:58:15
-----
    pipeddr dump * 9d5e 172.16.64.44 11000
    Dumping disk BNIELSEN 9D5E to 172.16.64.44
    PIPTCQ1015E ERRNO 32: EPIPE.
    PIPMSG004I ... Issued from stage 7 of pipeline 1 name ipread.
    PIPMSG001I ... Running tcpclient 172.16.64.44 11000 linger 10 reuseaddr U.
    Dump failed.
    Ready(01015); T=0.02/0.03 07:54:34
-----

If I create full-pack MDISKs via DEFINE MDISK (starting at cyl zero) it works fine, as shown below.

-----
    pipeddr restore * 6930 11000 (listen noprompt
    Connecting to TCP/IP. Enter PIPMOD STOP to terminate.
    Waiting for connection on port 11000 to restore BNIELSEN 6930.
    Sending user is BNIELSEN at VMP2
    Receiving data from 172.16.64.45
    41 MB received.
    Data restored successfully.
    Ready; T=4.04/4.54 09:34:18
-----
    pipeddr dump * 9d5e 172.16.64.44 11000
    Dumping disk BNIELSEN 9D5E to 172.16.64.44
    -- All data sent to BNIELSEN AT VMP1 --
    41 MB transmitted.
    Dump completed.
    Ready; T=6.27/8.21 08:30:37
-----

Brian Nielsen
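To put numbers on the MTU mismatch described in this message (8992 on the stacks, 1500 on the network in between), here is a small Python sketch of the IPv4 fragmentation arithmetic. It assumes a 20-byte IPv4 header without options and a network that fragments oversized datagrams; whether the network in this case fragmented or silently discarded them is not stated in the thread. The function name is made up for illustration.

```python
import math

IPV4_HEADER = 20  # bytes; assumes an IPv4 header without options


def fragments_needed(datagram_size: int, path_mtu: int) -> int:
    """Number of IPv4 fragments needed to carry a datagram of
    `datagram_size` bytes (header included) over a path with `path_mtu`."""
    if datagram_size <= path_mtu:
        return 1
    payload = datagram_size - IPV4_HEADER
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    per_fragment = (path_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


print(fragments_needed(8992, 1500))  # full-size datagram from a stack using MTU 8992
print(fragments_needed(1500, 1500))  # datagram that already fits the path
```

With these assumptions a full 8992-byte datagram needs 7 fragments on a 1500-byte path, while anything at or below 1500 bytes passes unchanged. That may be one reason the failure looked data-dependent: only transfers large enough to fill a datagram beyond the path MTU are affected.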
May 12 Webcast: Bringing You Up to Date with LE for z/VSE
Cross-posted to IBMVM, LINUX390 and IBMMAIN for those who are interested in updates via hour-long, no-charge webcasts. Feel free to register to listen to one of the two live calls on Thurs. May 12, or listen to the automatic replay in about a week. (Remember, there's also a Linux webcast on May 10/11.)
http://www.vm.ibm.com/education/lvc/

Title: Bringing You Up to Date with LE for z/VSE

Abstract: This webcast will provide an overview of Language Environment for z/VSE: a recap of features, programming aspects, help for debugging, tooling, plus recent functional enhancements with z/VSE 4.3 such as PL/I Multitasking, LE TCP/IP Multiplexer, added dynamic call capabilities and more.

Speaker: Wolfgang Bosch, IBM Boeblingen - z/VSE Development Service, Language Environment for z/VSE

Webcast registration, information and replays are at this site:
http://www.vm.ibm.com/education/lvc/

Please direct LVC questions to Julie Liesenfelt at jul...@us.ibm.com

The LVC page has the archive of the past webcasts. For your convenience, you can also find them in their own section below the current year's events on the calendar page:
http://www.vm.ibm.com/events/#2011W

Regards,
Pam C
Re: Problem with z/Linux guest Ethernet frames (buffering ?)
On Thursday, 05/05/2011 at 03:22 EDT, Bhemidhi, Ashwin ashw...@ti.com wrote:

> Right now the eligible list is 0. I could not check the queues at the time
> of the event; the 3 times it occurred were either after business hours or
> over the weekend. By the time I was able to log on, the eligible list was 0.

You can write an exec that issues IND QUEUES every, say, 10 seconds. If the result shows nonzero eligible lists, record the results (with the time) in a file or on the (spooled) console. Start it when you leave for the day.

> I did change the SRM STORBUF setting from the default to 300%, 300%, 200%.

Why? If you put it back to the defaults, does the problem go away?

Alan Altmark
z/VM and Linux on System z Consultant
IBM System Lab Services and Training
ibm.com/systems/services/labservices
office: 607.429.3323
mobile: 607.321.7556
alan_altm...@us.ibm.com
IBM Endicott
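On CMS the exec suggested above would normally be written in REXX. The filtering and logging logic is sketched here in Python instead, working on IND QUEUES output that has already been captured as text. The sketch assumes each user appears as a user ID followed by a list indicator, with E0 to E3 marking eligible-list entries; check that against the output on your own system. The names `eligible_entries` and `watch`, the `get_output` callable, and the sample line are all hypothetical.

```python
import re
import time
from typing import Callable, List, Tuple


def eligible_entries(output: str) -> List[Tuple[str, str]]:
    """Return (userid, list) pairs for users shown in an eligible list.

    Assumes each user is printed as '<userid> <list> ...' where <list>
    is E0..E3 for the eligible list (dispatch-list entries are ignored).
    """
    found = []
    for line in output.splitlines():
        tokens = line.split()
        for user, status in zip(tokens, tokens[1:]):
            if re.fullmatch(r"E[0-3]", status):
                found.append((user, status))
    return found


def watch(get_output: Callable[[], str], log: List[str],
          rounds: int, interval: float = 10.0) -> None:
    """Poll `get_output` every `interval` seconds and record a timestamped
    entry in `log` whenever any user is in an eligible list."""
    for _ in range(rounds):
        hits = eligible_entries(get_output())
        if hits:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.append(f"{stamp} {hits}")
        time.sleep(interval)


# Hypothetical captured output: two users on one line, the second one eligible.
sample = "LINUX01 Q3 PS 00012345/00012300 LINUX02 E3 R 00004567/00004500"
print(eligible_entries(sample))
```

Recording only the nonzero cases keeps the log short enough to line up against the times at which the sessions were dropped.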