Re: [s390] Question on WWPN WP

2014-01-14 Thread Berthold Gunreben
On Fri, 10 Jan 2014 11:21:00 -0700,
Mark Post mp...@suse.com wrote:

  On 1/10/2014 at 01:03 PM, Will, Chris cw...@bcbsm.com wrote: 
  So if we can't pre-stage our NPIV definitions, this will
  significantly increase the cutover time from the Z10 to the EC12.
  We have about 60 servers with about 600 LUNs.  With other sites
  having hundreds of guests, I would think there would be a better
  way to do this.  We have done cpu migrations in the past but this
  is the first time z/VM and NPIV have been involved.
  
  Chris Will
 
 You would think so, I agree.  Unfortunately, after talking with many
 people over time at places like SHARE, etc., there doesn't seem to
 be.  Things like IBM's SAN Volume Controller seem to make things
 somewhat easier, but not as easy as it should be.  (I don't have any
 personal experience with the SVC, so I could be overly pessimistic
 here.)  I and several other people see a potentially large
 opportunity for some ISV to provide a SAN/LUN
 discovery/inventory/management tool to make a lot of things easier,
 including CPU migrations.  Considering how hard IBM pushes customers
 to upgrade to new CPUs when they're announced, this is a rather large
 speed bump to run into.
 
 
 Mark Post

I recently did such a migration. This is how I proceeded:

1) set up the new EC12, including the IOCDS
2) retrieve the list of NPIV WWPNs from the HMC
3) add the new WWPNs to the host connections on the storage
4) set up the zoning for all the new NPIV adapters

After this, all machines found their respective disks when we migrated,
and I just had to do some cleanup when the migration was done.
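
The four steps above amount to building a worklist of new WWPNs per system before cutover. A minimal sketch of that bookkeeping, assuming the list from the HMC has been saved as CSV text with the hypothetical columns `partition`, `device`, and `wwpn` (the real export layout may differ, and the WWPN values below are made up):

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of an NPIV WWPN list; column names and values are
# illustrative only and do not reflect the actual HMC export format.
SAMPLE = """partition,device,wwpn
LNX01,1a00,c05076ffe5000000
LNX01,1b00,c05076ffe5000004
LNX02,1a01,c05076ffe5000008
"""

def wwpns_by_partition(csv_text):
    """Group the new NPIV WWPNs by partition so each host connection
    on the storage side can be extended in one pass."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["partition"]].append(row["wwpn"].lower())
    return dict(grouped)

def format_wwpn(wwpn):
    """Render a 16-hex-digit WWPN in colon-separated form."""
    return ":".join(wwpn[i:i + 2] for i in range(0, 16, 2))

worklist = wwpns_by_partition(SAMPLE)
for partition, wwpns in sorted(worklist.items()):
    print(partition, ", ".join(format_wwpn(w) for w in wwpns))
```

The grouped list can then be used for both the host-connection update on the storage (step 3) and the zoning (step 4).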

Note that I have heard there is supposed to be some kind of prediction
tool from IBM, but I have never seen it. If you have access to such a
tool, you could even start configuring before the new machine is in
place.

Berthold Gunreben

-- 
--
 Berthold Gunreben  Build Service Team
 http://www.suse.de/ Maxfeldstr. 5
 SUSE LINUX Products GmbH   D-90409 Nuernberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
 HRB 16746 (AG Nürnberg)

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/


RHEL CACHE usage being reported erroneously by Monwrite?

2014-01-14 Thread Diep, David (OCTO-Contractor)
Hi everyone,

I am getting MONWRITE data from my RHEL machines for measurement/accounting 
purposes.  We take this data and it gets sent to two places: Performance 
Toolkit and z/OS (where we use MXG to produce a table).

At RHEL startup, these commands are issued:

modprobe appldata_os
modprobe appldata_mem
modprobe appldata_net_sum
echo 1 > /proc/sys/appldata/timer
echo 1 > /proc/sys/appldata/mem
echo 1 > /proc/sys/appldata/os
echo 1 > /proc/sys/appldata/net_sum
echo 5000 > /proc/sys/appldata/interval
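
The same settings can be applied from a script. This is only a sketch: the file names and values are the ones quoted above, while the function name and the `base` parameter are my own additions so that it can be tried against a scratch directory. On a real guest it would need root and the appldata modules already loaded.

```python
import os
import tempfile

# File names and values are the ones quoted in the message; `base` is a
# parameter so the function can be tried against a scratch directory.
SETTINGS = [
    ("timer", "1"),
    ("mem", "1"),
    ("os", "1"),
    ("net_sum", "1"),
    ("interval", "5000"),
]

def apply_appldata_settings(base="/proc/sys/appldata"):
    """Write each value to its control file and return what was written."""
    written = {}
    for name, value in SETTINGS:
        path = os.path.join(base, name)
        with open(path, "w") as handle:
            handle.write(value + "\n")
        written[name] = value
    return written

# Dry run against a temporary directory instead of the real /proc tree.
with tempfile.TemporaryDirectory() as scratch:
    result = apply_appldata_settings(scratch)
    with open(os.path.join(scratch, "interval")) as handle:
        interval_readback = handle.read().strip()
print(result, interval_readback)
```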

Everything is by the book, but I see a big difference between what my 
Monwrite-based monitor reports and what RMF-PMS and top report.  The Cache 
usage figure is way off, by a factor of 10.

From performance toolkit:

Linux    --- Main ---   --- High ---            Buffers  Cache  -Space (MB)-
Userid   M_Total %MUsed H_Total %HUsed  Shared  /CaFree   Used
System      1101   91.7      .0     .0      .0    111.1  573.4
VIPSERVP   184.1   87.1      .0     .0      .0       .5   50.6

But issuing top in RHEL yields a different number:

top - 14:20:17 up 4 days, 23:09,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 103 total,   1 running, 102 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.2%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    188560k total,   168208k used,    20352k free,      572k buffers
Swap:  1304648k total,   355616k used,   949032k free,     9908k cached
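
To put the two sources on the same scale, the top figures (in k) can be converted to MB and compared with the Performance Toolkit line for VIPSERVP. A small worked calculation, assuming the last Performance Toolkit value (50.6) is the cache figure in MB and the value before it (.5) is buffers:

```python
def kib_to_mib(kib):
    """Convert a kilobyte figure as printed by top to megabytes."""
    return kib / 1024.0

top_cached_kib = 9908      # "cached" from the top output above
top_buffers_kib = 572      # "buffers" from the top output above
perfkit_cache_mb = 50.6    # assumed to be the Performance Toolkit cache column

top_cached_mb = kib_to_mib(top_cached_kib)
top_buffers_mb = kib_to_mib(top_buffers_kib)
ratio = perfkit_cache_mb / top_cached_mb
print(f"top cached: {top_cached_mb:.1f} MB, buffers: {top_buffers_mb:.2f} MB, "
      f"ratio: {ratio:.1f}")
```

Under those assumptions the buffers figures agree (about 0.56 MB against .5), while the cache figures differ by a factor of roughly 5 for this particular sample.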

This is the only field that is off, and this is the only machine where it 
is off.  I made one change to this server: I changed swappiness to a lower 
value:

# echo vm.swappiness=10 >> /etc/sysctl.conf

Any suggestions???  Thanks!



David Diep

Look out for those in need this winter. When the temperature or wind chill is 
32°F or below, the District issues a Hypothermia Alert. For assistance during 
an Alert: call the Shelter Hotline at 1-800-535-7252 or 311. Or, send an email 
to the Shelter Hotline (up...@upo.org).




Re: Question on WWPN WP

2014-01-14 Thread Raymond Higgs
Chris,

We've been talking about your migration problem internally.  If you open a
hardware PMH asking for migration assistance, we might be able to help
you.

Regards,

Ray Higgs
System z FCP Firmware Development
Bld. 706, B42
2455 South Road
Poughkeepsie, NY 12601
(845) 435-8666,  T/L 295-8666
rayhi...@us.ibm.com

Linux on 390 Port LINUX-390@vm.marist.edu wrote on 01/13/2014 11:32:42 AM:

 From: Will, Chris cw...@bcbsm.com
 To: LINUX-390@vm.marist.edu
 Date: 01/13/2014 11:38 AM
 Subject: Re: Question on WWPN WP
 Sent by: Linux on 390 Port LINUX-390@vm.marist.edu

 We try to keep the number of NPIV WWPNs at 32 or less per physical
 channel.  Otherwise we get nameserver and login problems.
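
A rule like this is easy to check against an inventory. A minimal sketch, assuming the inventory is available as (CHPID, WWPN) pairs; the function name and the sample data are made up, and only the limit of 32 comes from the message above:

```python
from collections import Counter

MAX_NPIV_PER_CHANNEL = 32  # limit quoted in the message above

def overloaded_channels(assignments, limit=MAX_NPIV_PER_CHANNEL):
    """Return {chpid: count} for channels carrying more than `limit` WWPNs.

    `assignments` is an iterable of (chpid, wwpn) pairs.
    """
    counts = Counter(chpid for chpid, _ in assignments)
    return {chpid: n for chpid, n in counts.items() if n > limit}

# Made-up inventory: channel "50" carries 40 WWPNs, channel "51" carries 20.
inventory = [("50", f"wwpn-{i}") for i in range(40)]
inventory += [("51", f"wwpn-{i}") for i in range(40, 60)]
flagged = overloaded_channels(inventory)
print(flagged)
```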

 Chris Will
 Systems Software
 (313) 549-9729
 cw...@bcbsm.com


 -Original Message-
 From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf
 Of Raymond Higgs
 Sent: Saturday, January 11, 2014 2:16 PM
 To: LINUX-390@VM.MARIST.EDU
 Subject: Re: Question on WWPN WP

 Chris,

 Yes, there is port zoning, but all of the vendors will recommend
 small zones, and port zoning does not encourage small zones.  When
 setting up a zone, it is the number of virtual ports in it that
 matters.  Port zoning makes it easy to ignore the virtual
 considerations because you are working with physical resources.
 Extra care is needed to avoid making a zone too big, and sometimes
 it only takes adding 1 port.

 Fibre channel has a notification mechanism called state changes.
 Whenever a virtual nport logs in or out of the fabric, all of the
 other virtual ports in the fabric are notified by the fabric
 controller service running on the switch(es).  This can be a very big
 burden on the switch fabric. The switch vendors have recommendations
 about zone sizes which are much smaller than their consoles will let
 a person set up.  For the most part, newer switches will work with
 larger zones, so people really need to check for the hardware that
 they have.

 600 LUNs was mentioned in another email.  If this means 600 NPIV
 subchannels, then 1 giant zone with 600+ members would be too big!
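
To illustrate the difference between one giant zone and many small ones, here is a rough sketch that splits a flat member list into zones of bounded size. The cap of 16 is an arbitrary number picked for the example, not a vendor recommendation, and the member names are placeholders; real zones would also have to include the relevant storage target ports.

```python
def split_into_zones(members, max_members):
    """Split a flat member list into consecutive zones of at most
    `max_members` entries each."""
    if max_members < 1:
        raise ValueError("max_members must be at least 1")
    return [members[i:i + max_members]
            for i in range(0, len(members), max_members)]

# 600 placeholder members, matching the count mentioned above; the cap of
# 16 is an arbitrary illustration, not a vendor recommendation.
members = [f"npiv-{i:03d}" for i in range(600)]
zones = split_into_zones(members, 16)
print(len(zones), "zones, largest has", max(len(z) for z in zones), "members")
```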

 The other Chris said they were using Brocade.  I believe Brocade
 also recommends either port, or WWPN zoning.  So no mixing port and
 WWPN zoning.

 The toughest aspect of making zones too big is that it isn't
 apparent right away.  The symptoms do not show up until events like
 fibre pulls, pchid/chpid/switch port vary off/on, guest IPL/
 shutdown, etc. happen.

 Regards,

 Ray Higgs
 System z FCP Firmware Development
 Bld. 706, B42
 2455 South Road
 Poughkeepsie, NY 12601
 (845) 435-8666,  T/L 295-8666
 rayhi...@us.ibm.com

 Linux on 390 Port LINUX-390@vm.marist.edu wrote on 01/10/2014 02:37:10 PM:

  From: burgess, christopher christopher.burg...@emc.com
  To: LINUX-390@vm.marist.edu
  Date: 01/10/2014 04:30 PM
  Subject: Re: Question on WWPN WP
  Sent by: Linux on 390 Port LINUX-390@vm.marist.edu
 
  You don't have to zone by WWPN. In the switch you can set up your
  zones by port number and then make sure the cables are in the right
  ports.
 
  Thanks,
  Chris Burgess
  Phone: 1-800-445-2588 x42149
         1-508-249-2149
  Fax: 1-508-497-8027
  Email: christopher.burg...@emc.com
 
  -Original Message-
  From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of
  Raymond Higgs
  Sent: Friday, January 10, 2014 2:27 PM
  To: LINUX-390@VM.MARIST.EDU
  Subject: Re: Question on WWPN WP
 
  Chris,
 
  I don't agree with that.  My memory is foggy because there are so many
  storage array vendors and their consoles are all different.  I haven't
  run across one that doesn't let you enter WWPNs manually.
  It isn't always easy to find, and might be through a CLI, but it is
  always there.
 
  So, I bet you have 2 options:
  enter them in some manual way
  drag and drop with the GUI as you have been advised
 
  Regards,
 
  Ray Higgs
  System z FCP Firmware Development
  Bld. 706, B42
  2455 South Road
  Poughkeepsie, NY 12601
  (845) 435-8666,  T/L 295-8666
  rayhi...@us.ibm.com
 
  Linux on 390 Port LINUX-390@vm.marist.edu wrote on 01/10/2014 12:38:25 PM:
 
   From: Will, Chris cw...@bcbsm.com
   To: LINUX-390@vm.marist.edu
   Date: 01/10/2014 12:40 PM
   Subject: Re: Question on WWPN WP
   Sent by: Linux on 390 Port LINUX-390@vm.marist.edu
  
   Our issue is that z/VM and the zLinux guests have to be up and the
   npiv channel logged in before the new NPIV WWPN can be zoned from
   the SAN side.  At least this is my understanding with EMC storage.
  
   Chris Will
  
   -Original Message-
   From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf
   Of Scott Rohling
   Sent: Friday, January 10, 2014 12:28 PM
   To: LINUX-390@VM.MARIST.EDU
   Subject: Re: Question on WWPN WP
  
   I'm not familiar with the WPT tool, but my experience using NPIV
   would lead me to believe that the tool simply tells you what your new
   WWPNs will be for the FCP channels, so that you can get 

Swap behavior change between SLES 11SP2 and 11SP3?

2014-01-14 Thread Ted Rodriguez-Bell
We've noticed something pretty bothersome for our
environment when we went from SLES 11SP2 to 11SP3.  The
penalty for using more virtual memory on a machine than you have
real memory allocated to it has gone up dramatically.  Have any
of you seen anything like this?

This has been seen with all three SP3 kernels: 3.0.82-0.7.9,
3.0.93-0.8.2, and 3.0.101-0.8.1.  kswapd0 seems to start frantically 
going through the virtual memory space looking for something it can 
free or swap out; the CPU use is very high and the machine is close 
to non-responsive.

A test case we came up with was a simple Perl script that allocated 3.5 GB of
memory as one big array and stepped through it.  That took two hours 
on a server with 1 GB of memory total but a minute on a server with 3 GB free.
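
For anyone who wants to try something similar, here is a rough Python stand-in for that kind of test (it is not the Perl script we used). It allocates one block and writes one byte per page so that every page has to be backed. The 4 KiB page size and the small demonstration size are assumptions; the size would need to be raised past real memory to reproduce the effect.

```python
import time

PAGE_SIZE = 4096  # assumed page size (4 KiB)

def touch_memory(size_bytes, page_size=PAGE_SIZE):
    """Allocate `size_bytes` in one block, write one byte per page so every
    page is really backed, and return (pages_touched, elapsed_seconds)."""
    block = bytearray(size_bytes)
    start = time.perf_counter()
    pages = 0
    for offset in range(0, size_bytes, page_size):
        block[offset] = 1
        pages += 1
    return pages, time.perf_counter() - start

# Small demonstration size; the test described above used about 3.5 GB.
pages, elapsed = touch_memory(16 * 1024 * 1024)
print(f"touched {pages} pages in {elapsed:.3f} s")
```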

Our first engineer said that this is normal behavior because you shouldn't dip 
deeply into swap and expect the system to perform decently.  That argument works
in the Intel world because swap goes to disk and disks are very slow.
It's not nearly as true on mainframes because swap (at least our swap)
goes to extended storage and that's still memory.  Since then SUSE has 
come around to our view that we shouldn't be seeing this.

Besides, we got away with it in SP2.  Something changed with SP3.  

So:  questions. 
  * Has anyone else seen this?
  * Does anyone know what changed in the kernel between SP2 and SP3?
    SP2 kernels of similar release number to SP3 don't show this.
  * Can we tune something to alleviate this?
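
One way to narrow down the tuning question is to capture `sysctl -a` output on an SP2 and an SP3 system and compare the settings. A minimal sketch of that comparison follows; the function names are my own, and the sample values are invented purely to show the output shape. They say nothing about what actually changed between the service packs.

```python
def parse_sysctl_dump(text):
    """Parse `sysctl -a` style lines ("key = value") into a dict."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ
    or that exist on only one side."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k))
            for k in sorted(keys) if old.get(k) != new.get(k)}

# Invented excerpts; real input would be captured on each system.
sp2 = parse_sysctl_dump("vm.swappiness = 60\nvm.min_free_kbytes = 2048\n")
sp3 = parse_sysctl_dump("vm.swappiness = 60\nvm.min_free_kbytes = 4096\n")
changes = diff_settings(sp2, sp3)
print(changes)
```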

For those who are interested, more detail follows:

The real-world applications that trigger this are Java applications
that use a lot of memory.  Some are WebSphere and some are
home-grown.  I'm convinced that it's the memory used, not the
details of the application, that is the problem.

Once kswapd finishes looking around it really doesn't take that long
to go through the array.  Once the system gives up cleaning the
cupboards and actually starts going through them it's not too bad;
it's slower than with adequate memory but by a factor closer to 4
than to 60.  The CPU use is also what convinces me it's a kernel
problem instead of understandably poor hardware performance.

We could get our test systems to go back to the old behavior by
downgrading the kernel to SP2, even if the SP2 kernel had a higher
version than the SP3 one.  For example, we can run the 3.0.93-0.5.1
kernel from SP2 successfully on an otherwise-SP3 system; the 3.0.93-0.8.2
kernel from SP3 has the problem on an otherwise-SP2 system. 

We asked about this here earlier; that thread starts at
http://www.mail-archive.com/linux-390@vm.marist.edu/msg64647.html

And if you got this far:  thank you!

Ted Rodriguez-Bell
Mainframe and Midrange Services, Wells Fargo
te...@wellsfargo.com



