Re: vmcp cannot allocate memory

2017-10-18 Thread Pavelka, Tomas
Would it be possible for the vmcp kernel driver to have an option to 
pre-allocate a fixed amount of contiguous memory and hold onto it? This would 
protect vmcp from fragmentation, which can hit you at any time. I remember using 
vmcp from a Linux-based installer that typically started by downloading large 
files over scp. For some reason scp caused a lot of fragmentation, which made 
allocation failures very likely.
If you have an application that runs a large number of vmcp commands, then a 
vmcp failure caused by fragmentation is fatal, because AFAIK there is no 
reliable way to defragment memory other than waiting an unspecified amount of 
time.

Tomas

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/


Re: Page allocation failure - SLES11 (s390x)

2016-11-25 Thread Pavelka, Tomas
> I have problem with page allocation failure on SUSE Linux Enterprise Server 
> 11 (s390x) VERSION = 11 PATCHLEVEL = 4

> Logs on messages:

> kernel: WebContainer : : page allocation failure: order:4, mode:0x4d0
> kernel: mount.nfs: page allocation failure: order:4, mode:0x4d0

When the kernel requests memory, in some cases it needs it in contiguous 
chunks. When the kernel reports the size of the chunk that it needs, it uses 
this "order": your order "4" means that the kernel asked for 2^4 pages = 
2^4 * 4 KB = 64 KB. If your kernel runs long enough (sometimes downloading a 
large file over SCP is enough to cause this, not sure why), memory may become 
fragmented so that no free chunk is large enough. In this case the kernel will 
try to free up some caches and buffers to see if a contiguous chunk can be 
reclaimed. If it does not succeed after trying several strategies, the 
allocation fails and you get the "page allocation failure" that you are 
seeing. This article has a good description:

https://utcc.utoronto.ca/~cks/space/blog/linux/DecodingPageAllocFailures
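The order arithmetic above can be written out as a small Python sketch. It assumes 4 KB pages (as on s390x), and the helper name is made up for illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages, as on s390x


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of one contiguous block of the given order (2**order pages)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (1 << order) * page_size


# order:4 from the log lines above -> 16 pages -> 64 KB
for order in range(9):
    print(f"order {order}: {order_to_bytes(order) // 1024} KB")
```

Each step up in order doubles the size of the contiguous block the kernel has to find.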

I once researched a similar problem and my conclusion was that there is no good 
way out. The kernel developers try to not use large contiguous chunks, but 
there are still drivers that do. Here are a few more things that I have 
discovered during my research that you may find useful:

- If you want to see to what extent your memory is fragmented, use this command:

echo m > /proc/sysrq-trigger

Then look at the output of dmesg; you will see statistics about kernel memory 
usage. The documentation for this is here: 
https://www.kernel.org/doc/Documentation/sysrq.txt

- If you want to see what strategies the kernel has for trying to free up 
chunks of contiguous memory, see mm/page_alloc.c, mm/vmscan.c in the kernel 
sources.
- One of the kernel drivers that need large contiguous chunks of memory is the 
vmcp driver. If you run vmcp --buffer=1M often enough you'll eventually run 
into this.
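Going the other way, the following minimal sketch computes the smallest order that can hold a buffer of a given size. This is roughly what the kernel's get_order() helper computes. The helper name is hypothetical, and 4 KB pages are assumed. By this arithmetic, and assuming the driver allocates the whole buffer as one block, vmcp --buffer=1M needs an order-8 block of 256 contiguous pages:

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def min_order_for(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order n such that a block of 2**n pages holds nbytes."""
    if nbytes <= 0:
        raise ValueError("size must be positive")
    pages = -(-nbytes // page_size)   # ceiling division: pages needed
    return (pages - 1).bit_length()   # smallest n with 2**n >= pages


print(min_order_for(1024 * 1024))  # vmcp --buffer=1M
print(min_order_for(64 * 1024))    # the 64 KB (order 4) request from the log
```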

Good luck,
Tomas 


Re: Creating root LVM

2016-08-23 Thread Pavelka, Tomas
I don't know if that changed but it used to be that zipl could safely boot from 
LVM that was on only one physical volume. See this older discussion for details:
http://www.mail-archive.com/linux-390%40vm.marist.edu/msg62491.html

Tomas

Tomas Pavelka
CA Technologies
Sr Software Engineer

CA CZ, s.r.o 
V Parku 12, 
148 00 Praha 
Czech Republic

Office: +25996 | tomas.pave...@ca.com



Id. No. 25694073, registered in the Commercial Register maintained by the 
Municipal Court in Prague, Section C, File 61808


 


Re: How to unformat a dasd drive?

2016-05-16 Thread Pavelka, Tomas
I am assuming you want to do this from Linux and not from CMS. If I remember it 
right, the DASD driver does not show you the entire device, only the data 
portion of each CKD record, and not record 0 either. This means that once a 
DASD has been formatted, Linux will always access it as a formatted drive.

There is a way to access raw tracks from Linux: look in "Device Drivers, 
Features, and Commands" and search for "Accessing full ECKD tracks". 
Unfortunately, the last time I tried this (a couple of years ago), not all 
kernels supported it, and even on newer kernels some non-IBM devices did 
not support raw-track access from Linux.

Why do you need to unformat drives? I once had a similar need because under 
certain conditions the Linux dasd driver would go into a loop trying to access 
badly formatted drives (this happened only on certain devices, but I forgot the 
details). I ended up writing a user exit for the directory manager that would 
wipe out the first few tracks on a newly created minidisk.

HTH,
Tomas


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Dimitri 
John Ledkov
Sent: Tuesday, May 17, 2016 8:12 AM
To: LINUX-390@VM.MARIST.EDU
Subject: How to unformat a dasd drive?

Hello,

Is it possible to "unformat" a dasd drive such that on Linux it appears as 
"n/f" in lsdasd output?

--
Regards,

Dimitri.


Re: Maven dependency

2016-02-24 Thread Pavelka, Tomas
Hi Neale,
I would guess that the issue is that a Maven repository is a different thing 
from an RPM repository:
https://maven.apache.org/guides/introduction/introduction-to-repositories.html

I am also guessing there isn't anything that could turn an RPM repo into a Maven 
repo, but I have not tried looking for such a thing.

The standard practice is to simply let Maven download from Maven Central. 
Alternatively, if you want full control over which packages are used, you 
could run something like Artifactory as your own remote Maven repo. But I 
think in either case you would have to give your mock build (assuming you are 
using mock) network access. 
There is also the local Maven repo, which serves as a cache for the remote ones, 
but I don't know how you would populate it in any automated way.

HTH,
Tomas


Re: Searching Java Class in .jar

2016-02-18 Thread Pavelka, Tomas
The same Java class can be present in multiple JAR files in your class path 
which may be a pain to debug. What I found helpful is to turn on class loading 
tracing - add the "-XX:+TraceClassLoading" option.

This tells you exactly where each class was loaded from.

It works in Oracle Java:
http://www.oracle.com/technetwork/articles/java/vmoptions-jsp-140102.html

I am not sure about other JVM implementations; they may have their own variants 
of the option. 

If that does not help, try the option -verbose:class, which should be available 
in all JVMs.
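To see in advance which JARs on a class path contain a given class, a small Python sketch can look for the matching .class entry in each archive. The function name is hypothetical. It only checks plain JAR/ZIP files: directories, missing files and nested JARs are skipped. The list comes back in class-path order, and normally the first hit is the copy the application class loader picks.

```python
import zipfile
from typing import Iterable, List


def jars_containing(class_name: str, jar_paths: Iterable[str]) -> List[str]:
    """Return the JARs, in class-path order, that contain class_name.

    class_name is a binary name such as "com.example.Foo"; nested
    classes use '$', e.g. "com.example.Foo$Bar".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except (OSError, zipfile.BadZipFile):
            # directories, missing files and non-ZIP entries are skipped
            continue
    return hits
```

For example, `jars_containing("com.example.Foo", os.environ["CLASSPATH"].split(os.pathsep))` lists every JAR that would compete for that class.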

HTH,
Tomas


Re: SMAPI call to get USER DIRECT

2015-11-25 Thread Pavelka, Tomas
> Is this a known issue?  I'm assuming it's SMAPI in the backend, not smaclient 
> on the front end causing the slowness. 

The best I can remember is that for a user directory with somewhere between 
100 and 200 entries, Query_All_DM took 1-3 seconds. Most of the CPU was spent on 
the SMAPI side, because the work involved in parsing the Query_All_DM output 
was trivial (if I remember it right, it just returns one large text chunk). So 
if it takes much longer for you, I would guess it is an issue in smaclient. But 
that is just a guess; I have never worked with smaclient. 

Tomas


Re: SMAPI call to get USER DIRECT

2015-11-25 Thread Pavelka, Tomas
I have used it successfully with FORMAT=NO, but that was in our proprietary 
SMAPI library, so I can't share the code. The call is not complicated: just put 
together the function name and target and append FORMAT=NO. I am not familiar 
with smaclient, but once you understand how it assembles the SMAPI function 
calls, adding support for Query_All_DM should not be difficult.

HTH,
Tomas 


Re: zLinux CPU monitoring

2015-09-30 Thread Pavelka, Tomas
I was once researching something similar (i.e. how to get reliable CPU readings 
from within Linux) but unfortunately never finished. But I still have the 
links, maybe you will find something interesting there:

Presentation about how Linux kernel can get accurate CPU readings from the 
hypervisor:
http://linuxvm.org/present/SHARE110/S9266ms.pdf

How to set up the hypervisor file system mentioned in the presentation above:
http://www-01.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lgdd/lgdd_r_hypfs_setup.html

s390 Debug fs (I think this is where hyptop reads from):
https://www.kernel.org/doc/Documentation/s390/s390dbf.txt

Also look at /proc/[pid]/stat in the proc man page:
http://linux.die.net/man/5/proc

None of these will give you ready-to-use readings without additional work (like 
computing percentages from cumulative readings).
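As an example of that additional work, here is a minimal Python sketch that turns two samples of /proc/[pid]/stat into a CPU percentage. It assumes the field layout documented in proc(5): utime and stime are fields 14 and 15, counted in clock ticks. The function names are hypothetical. Under z/VM the guest's own accounting may differ from what the hypervisor reports, which is what the links above are about.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """utime + stime (fields 14 and 15 of /proc/[pid]/stat), in clock ticks."""
    # comm (field 2) is in parentheses and may contain spaces,
    # so split only what follows the last ')'
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n is rest[n - 3]
    return int(rest[11]) + int(rest[12])


def cpu_percent(before: str, after: str, elapsed_s: float, clk_tck: int = 100) -> float:
    """Percent of one CPU used by the process between two stat samples."""
    used_s = (cpu_ticks(after) - cpu_ticks(before)) / clk_tck
    return 100.0 * used_s / elapsed_s


def sample_pid(pid: int, interval_s: float = 1.0) -> float:
    """Read /proc/<pid>/stat twice, interval_s apart, and return the CPU percent."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = f.read()
    t0 = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    elapsed = time.monotonic() - t0
    return cpu_percent(before, after, elapsed, os.sysconf("SC_CLK_TCK"))
```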

HTH,
Tomas


Re: SSH no-root, key-based authentication video

2015-07-20 Thread Pavelka, Tomas
> SourceForge R/W access has been offline for 4 days!!

Have you seen this?

https://www.reddit.com/r/sysadmin/comments/3do9k0/sourceforge_is_down_due_to_storage_problems_no_eta/



-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Michael 
MacIsaac
Sent: Monday, July 20, 2015 8:18 PM
To: LINUX-390@VM.MARIST.EDU
Subject: SSH no-root, key-based authentication video

Hello lists,

I made a short 5 minute video on automating SSH key setup using zoom. (the 
user's names are totally mythical :)) See:
https://www.youtube.com/watch?v=p19-08aJUEA

I plan to present on it in more detail tomorrow at MVMUA in Poughkeepsie NY.

Unfortunately, the latest code is not yet on the Internet as SourceForge R/W 
access has been offline for 4 days!!  (so much for 'five nines'; they're down 
to one - but 'what do you want for nothing?' :)).

-Mike MacIsaac


Re: How to find a memory leak?

2015-07-13 Thread Pavelka, Tomas
Let me try a different example than Mike's 'Q DASD DETAILS 0-': Suppose you 
are writing software for disaster recovery of LVM disks. The Linux that owned 
them will not come up, so you link them from another Linux. You get a list of 
minidisk addresses and their owners and issue LINK against each. LINK needs a 
local virtual address that is free. You don't know who will be using your tool, 
so it needs to work in any configuration. You could ask the user to provide a 
range in a config file, but it would be nicer to find free addresses 
automatically, for example by running QUERY VIRTUAL and checking which virtual 
addresses are occupied. Once you do this, you run QUERY VIRTUAL every time you 
need a free address, including when many minidisks are already linked and the 
response is correspondingly long.
But at this point you realize that your program is unreliable: it can 
occasionally fail due to memory fragmentation, because listing all the used 
virtual addresses needs a large contiguous buffer in the kernel. What people 
wrote about reading the vmcp response and enlarging the buffer if needed 
helps in the average case but not in the worst case. So if you are using vmcp 
in critical code, you have to be very careful to keep buffers small. It would 
be nice if this were not necessary, i.e. if there were a way to run DIAG 8 
without the need for a contiguous buffer.
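The "enlarge the buffer if needed" approach can be sketched as follows. Here run_cp is a hypothetical stand-in for whatever issues the CP command with a given response buffer size and reports whether the response was truncated. It is not a real vmcp API. The loop keeps the common case small, but a long response still ends up needing one large contiguous buffer, which is exactly the worst case described above.

```python
from typing import Callable, Tuple

# Hypothetical interface: run_cp(command, bufsize) issues the CP command with a
# response buffer of bufsize bytes and returns (response_text, truncated).
RunCP = Callable[[str, int], Tuple[str, bool]]


def query_with_growing_buffer(run_cp: RunCP, command: str,
                              start: int = 4096, limit: int = 1 << 20) -> str:
    """Start with a small buffer and double it only while the response is truncated."""
    size = start
    while True:
        text, truncated = run_cp(command, size)
        if not truncated:
            return text
        if size >= limit:
            raise RuntimeError(f"response to {command!r} exceeds {limit} bytes")
        size = min(size * 2, limit)
```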

Tomas


Re: How to find a memory leak?

2015-07-13 Thread Pavelka, Tomas
> the qeth driver has been improved in 2014 to reduce its demand for contiguous 
> storage:

Thanks Ursula, the memories keep coming back ;-) I remembered that the vmcp is 
still problematic but managed to forget that the qeth driver got fixed.


Re: How to find a memory leak?

2015-07-13 Thread Pavelka, Tomas
> I wouldn't really put that at the feet of s390 (z/Architecture).

Bad wording on my part. When I said s390 I meant the s390 part of the Linux 
kernel implementation, not the entire architecture. I meant to point out that 
the other parts of the kernel are working on removing the need for large 
contiguous buffers. AFAIK the vmcp driver uses the largest buffer, and as you 
say, if the DIAG allowed returning data in discontiguous memory, that would 
solve the fragmentation problem. There were a few other places that used larger 
buffers; the NIC driver was one of them. So I am not sure "all over" was good 
wording either; maybe I should have said "multiple places" ;-)

Tomas


Re: How to find a memory leak?

2015-07-10 Thread Pavelka, Tomas
> Only admins would have access to those sudo commands.

But the sudoers line shows an intent to restrict access to tee only:

%zoom ALL=NOPASSWD:/usr/bin/tee

The hole that Karsten has shown is that the line in sudoers is really the 
security equivalent of:

%zoom ALL=NOPASSWD:ALL

Whether it is a huge hole depends on whether you would be ok with allowing all 
users in %zoom to be able to run any command through sudo without a password.

Tomas


Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
I replied to Mike and Alan yesterday evening but it does not show in the 
archives. I am assuming it got lost and I am resending. Sorry if this is a 
duplicate.

> If vmcp is called with a buffer of 1M and the last slab in 
> /proc/buddyinfo is 0, would it not be reasonable to nudge 
> the kernel to free at least one slot up, assuming this can be done safely?

> So there's no point in nudging the kernel to do a Hail Mary attempt to
> find more memory.  If it were available, the slab count would already be > 0.

As I understand it from the time I was researching this, /proc/buddyinfo shows 
the current number of free blocks of each order in the kernel's page allocator. 
Since the kernel uses a large amount of memory for caches and buffers, and these 
are ready to be freed when needed, a zero count does not necessarily mean that 
an allocation of that order will fail. The kernel does several rounds of freeing 
and rearranging memory to find or construct a suitable block. 
I looked at this in kernel 2.6 and it may have changed, but there the algorithm 
was different for blocks of 32k or less: for those it tried even harder to free 
memory. I also remember there was some time limit on the freeing; if the kernel 
did not free the memory in time, the allocation failed.
So a vmcp failure happens when there are zero free blocks of the needed size and 
the kernel fails to free enough contiguous memory. I guess you can end up 
freeing a lot and still have enough fragmentation that no large block can be 
found.
Where the s390 is different is that it uses large contiguous buffers all over. 
The rest of Linux tries to use smaller or discontiguous buffers, which may be 
why the kernel mainline is not bothered by problems with reclaiming larger 
blocks. So the question for the VM/zLinux devs could be whether the DIAG that 
allows Linux to issue CP commands could be changed to return partial data or do 
something else in order to avoid using a large buffer.
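The 32k boundary corresponds to the kernel constant PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline kernels (2^3 pages of 4 KB = 32 KB). Allocations above that order are treated as "costly" and retried less. The following minimal sketch shows that classification, assuming 4 KB pages. The actual retry policy differs between kernel versions and is not modeled here.

```python
PAGE_ALLOC_COSTLY_ORDER = 3  # value in mainline include/linux/mmzone.h
PAGE_SIZE = 4096             # assumption: 4 KB pages


def is_costly(order: int) -> bool:
    """True for orders the page allocator treats as costly (> 32 KB here)."""
    return order > PAGE_ALLOC_COSTLY_ORDER


for order in range(9):
    kb = (1 << order) * PAGE_SIZE // 1024
    label = "costly" if is_costly(order) else "retried harder"
    print(f"order {order} ({kb} KB): {label}")
```

By this classification both the 64 KB (order 4) request and a 1 MB vmcp buffer (order 8) fall on the costly side.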

Tomas


Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
> Maybe I'll think about sudo-enabling cmmflush and checking the last field of 
> /proc/buddyinfo to see if it needs to be run.

I tried doing things based on the values of /proc/buddyinfo, but what I found 
is that if there are zeroes in the high-order counts, there is a chance that 
vmcp with a 1M buffer will fail, but no guarantee. Sometimes Linux just 
rearranges memory and finds a large enough block, which makes it even harder to 
reproduce. Beware that you can spend ages debugging this ;-)
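To script the /proc/buddyinfo check, a minimal Python parser could look like this (function names hypothetical). As described above, the result is only a hint. A zero does not guarantee failure, because the kernel may still reclaim or rearrange memory. A non-zero count can also be gone by the time vmcp runs. A free block of any order at or above the one you need counts, since it can be split.

```python
from typing import Dict, List


def parse_buddyinfo(text: str) -> Dict[str, List[int]]:
    """Map 'node N zone Z' to its free-block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # expected layout: Node <n>, zone <name> <order 0> <order 1> ...
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = parts[1].rstrip(",")
        zones[f"node {node} zone {parts[3]}"] = [int(x) for x in parts[4:]]
    return zones


def has_free_block(counts: List[int], order: int) -> bool:
    """True if a free block of this order, or a larger splittable one, exists."""
    return any(c > 0 for c in counts[order:])
```

On a live system this would be fed with `open("/proc/buddyinfo").read()`, and a 1M vmcp buffer corresponds to order 8 with 4 KB pages.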

Tomas


Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
> The next question is - can this ever be done by a non-root user? I tried 
> adding /bin/echo to /etc/sudoers, but still get an error:

I was able to google these two approaches to dropping caches over sudo:

sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"

or

echo 3 | sudo tee /proc/sys/vm/drop_caches

See the comments here: http://www.linuxinsight.com/proc_sys_vm_drop_caches.html

But as I said, in my experiments dropping caches did not help. What makes this 
hard to test is that vmcp running out of memory is not easily reproducible. It 
can happen once, and if you keep rerunning for a while it keeps happening. 
But suddenly the kernel rearranges memory and you can run fine for days. The 
problem is that I have not found a way to free memory for large contiguous 
kernel allocations from within a script. If you are trying to fix the problem 
as a human, the solution is to repeatedly run vmcp --buffer=1M q userid until 
the failure eventually goes away.

Tomas


Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
> Thanks.  I copied and pasted cmmflush and it seems to work nicely

If I understand it right then you have to look at how cmmflush affects the 
output of /proc/buddyinfo. If you see non-zero in the last order of slab (i.e. 
the one with 1MB size) then you are good to run vmcp --buffer=1M. Otherwise you 
may still run into problems even if free -m shows a lot of free memory.

But I have not tried cmmflush, maybe it will help.

The way that I was able to reproduce the memory fragmentation problem was by 
copying large amount of data over SCP to that Linux machine. Try that and see 
if you can reproduce the vmcp --buffer=1M failure.

Tomas


Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
> As a workaround, is there a command to flush the buffer cache?

I forgot to answer this question: you can drop buffers and cache by running

echo 3 > /proc/sys/vm/drop_caches

See http://linux-mm.org/Drop_Caches

As far as I remember this did not help at all. My guess is that when looking 
for memory, the kernel already tries to drop some caches, but in the case of 
memory fragmentation that does not help. But feel free to try.

Other things I tried that did not work, or did not work consistently, were 
repeating the vmcp call (possibly after a wait) and increasing the server memory 
to about 2G. What definitely does not help is adding memory with chmem, because 
that adds memory the kernel cannot use for this kind of buffer allocation 
(again, I forgot the details).

Tomas


Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
This is a really ugly problem that I don't have a solution for. But let me give 
you a bit of info if you want to do your own digging:

The way I found this is that I was adding NICs to a Linux on the fly. Sometimes 
this would fail with a page allocation failure in syslog. The discussion on this 
list is here:

http://www.mail-archive.com/linux-390%40vm.marist.edu/msg65371.html

What I found later is that the NIC driver needs 64k of memory in kernel space, 
and this memory needs to be contiguous. The kernel's page allocator keeps free 
memory in blocks of 2^n contiguous pages ("order" n) and keeps a pool of free 
blocks for each order. If you do 

cat /proc/buddyinfo
Node 0, zone      DMA   9078  10398   3135    838    164     14      0      0      2

Another way to get memory report is to run "echo m > /proc/sysrq-trigger" and 
look into syslog for a report about kernel memory usage.

You will see how many free blocks of each order you have: 9078 blocks of 
order 0 (4 KB), 10398 of order 1 (8 KB) ... 2 of order 8 (1 MB). If a block of 
lower order is needed, the kernel may split a higher-order one (e.g. if it 
wants a 4k block, it may split an 8k block into two). After lots of kernel 
allocations you may run out of the higher-order blocks. What worked for me for 
triggering this condition was moving a lot of data to the Linux over SCP. There 
may be other causes.
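The splitting described above can be illustrated with a toy model of the buddy allocator's free lists. This is a sketch only: it ignores freeing, merging of buddies, zones and migrate types, and the function name is made up.

```python
from typing import List, Optional


def allocate(free: List[int], order: int) -> Optional[int]:
    """Take one block of `order` from per-order free counts, splitting if needed.

    free[k] is the number of free blocks of order k (2**k pages) and is
    modified in place. Returns the order of the block that was used, or
    None if no block of this order or larger is free.
    """
    for k in range(order, len(free)):
        if free[k] > 0:
            free[k] -= 1
            # each split halves the block: one half goes back on the
            # free list one order lower, the other half is split further
            for j in range(k - 1, order - 1, -1):
                free[j] += 1
            return k
    return None
```

Starting from one free order-2 block, an order-0 allocation leaves one free block each of order 1 and order 0. After a following order-1 allocation only a single 4 KB block is free, so another order-1 request fails even though memory is still free.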

Now the significance of 32k is that above this size Linux stops retrying to 
rearrange memory to find a large enough block. I don't remember the details, 
but if you want to investigate, look at the kernel sources, namely 
mm/page_alloc.c and mm/vmscan.c

So the bottom line is, anytime you have an operation that needs a large buffer 
in kernel (chccwdev of a NIC, vmcp with --buffer, DIAG from Linux) it may fail 
at unexpected times. I have not found a good way to get around this but I will 
be interested if you find anything.

In the case of VMCP what may help is if it allocated a buffer at kernel 
startup. At the moment it allocates it for every call, see 
http://lxr.free-electrons.com/source/drivers/s390/char/vmcp.c#L105 
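The idea of allocating the buffer once and reusing it can be illustrated with a user-space Python analogy (hypothetical class, not kernel code). Memory is claimed once at startup, and every request reuses it under a lock. Requests larger than the reserve are rejected instead of triggering a new large allocation. The trade-offs are that callers are serialized and the reserved memory is held even when vmcp is idle.

```python
import threading
from typing import Callable


class ReservedBuffer:
    """Allocate one buffer up front and reuse it for every request."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)     # claimed once, at "startup"
        self._lock = threading.Lock()   # one user of the buffer at a time

    def run(self, fill: Callable[[memoryview], int], nbytes: int) -> bytes:
        """Call fill(view) on the first nbytes of the reserve and return what it wrote.

        fill must write its response into the view and return the number
        of bytes written.
        """
        if nbytes > len(self._buf):
            raise ValueError(f"request of {nbytes} bytes exceeds reserve of {len(self._buf)}")
        with self._lock:
            view = memoryview(self._buf)[:nbytes]
            n = fill(view)
            return bytes(view[:n])
```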

Tomas



Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
> In the past this server has gone to near zero memory, and vmcp commands fail.

Do you have any specifics? Did you use a buffer larger than 32k on those vmcp 
commands? Vmcp can fail due to memory fragmentation even on a server with lots 
of free memory.


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Michael 
MacIsaac
Sent: Thursday, July 09, 2015 4:15 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: How to find a memory leak?

Thanks Richard for the joke :))

Thanks Thomas for the input.  I changed the ps command flag to '--sort -rss', 
and restarted memusage - will continue to monitor.

Thanks Dave for the pointer, but I don't have any of my own C/C++ programs 
running, just many bash scripts (if they do no 'malloc's, can they still cause 
memory leaks?).

In the past this server has gone to near zero memory, and vmcp commands fail.  
I'm guessing the OOM killer was invoked, but by then it's already too late ...

-Mike

On Thu, Jul 9, 2015 at 9:54 AM, Dave Jones  wrote:

> Hi, Mike.
>
> if the package AddressSanitizer (ASan) is available, you might want to 
> give it a go.  It is a fast memory error detector that can find 
> use-after-free and {heap,stack,global}-buffer overflow bugs in C/C++ 
> programs. it's here:
>
> https://code.google.com/p/address-sanitizer/
>
> Good luck. I still think C/C++ will be the death of us all. :-)
>
> DJ

Re: How to find a memory leak?

2015-07-09 Thread Pavelka, Tomas
Look at the "-/+ buffers/cache" line in the free output:

Before:
-/+ buffers/cache:         41        450
After:
-/+ buffers/cache:         48        443

(First number used, second free, both in MB)

Linux allocates various buffers and caches when there is free memory, for 
example for disk reads. These are dropped if the memory is needed by 
processes. The "-/+ buffers/cache" line shows the memory that is actually used 
by processes, not counting the buffers and caches. In your case the used memory 
rose by only 7 MB.
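The "-/+ buffers/cache" numbers can be derived from /proc/meminfo. The following minimal Python sketch assumes the classic definition used by older versions of free: used memory minus Buffers and Cached. Newer procps versions compute this differently. The helper names are hypothetical.

```python
from typing import Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines like 'MemFree:   123456 kB' into kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.split():
            info[key.strip()] = int(value.split()[0])
    return info


def buffers_cache_line(info: Dict[str, int]) -> Tuple[int, int]:
    """(used, free) in MB as shown on the old 'free -m' '-/+ buffers/cache' line."""
    total, free = info["MemTotal"], info["MemFree"]
    reclaimable = info["Buffers"] + info["Cached"]
    used = total - free - reclaimable
    return used // 1024, (free + reclaimable) // 1024
```

On a live system, `buffers_cache_line(parse_meminfo(open("/proc/meminfo").read()))` gives the pair to compare between samples.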

BTW I would not look at the virtual memory size (VSZ) of processes; it can be 
far larger than the memory of your virtual machine. The more interesting 
metric is RSS, which is how much memory is actually resident. 

HTH,
Tomas


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Michael 
MacIsaac
Sent: Thursday, July 09, 2015 2:19 PM
To: LINUX-390@VM.MARIST.EDU
Subject: How to find a memory leak?

Hello list,

I have a SLES 11 SP3 system that is leaking memory, but I don't know how or 
where.

I find a script on the Internet that runs forever, adapt it somewhat, and start 
logging some info to a temp file. Here's the script:

# cat memusage
#!/bin/bash
#
# track memory usage
#
outFile="/tmp/memusage"
while true
do
  echo "---" >> $outFile
  date >> $outFile
  ps aux --sort -vsz | head -22 >> $outFile
  echo >> $outFile
  free -m >> $outFile
  sleep 300
done

After a fresh reboot of a 512 MB virtual machine, I start the script and the 
first entry in the temp file shows about 20 MB (512 - 492) used by Linux and 97 
MB used by processes:

Wed Jul  8 12:37:45 EDT 2015
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root  2181  0.0  0.2 115404  1024 ?Ssl  12:36   0:00
/usr/sbin/nscd
root  1851  0.0  0.1  11512   692 ?S


Re: Slightly OT: LVMing root file system & other Linux on z best practices (or not so best).....

2015-05-04 Thread Pavelka, Tomas
> "Do not include the root file system in the LVM structure because, if for any 
> reason the LVM
>  fails, the operating system will not boot. "  - Set up Linux on IBM System z 
> for Production

People often give non-unique names to volume groups. For example, if the VG 
that holds root is named "RootVG" and you try to access it (via CP LINK) from 
another Linux system that also has its root on a VG named "RootVG", you can't 
put the VG online. We once got around this by using UUIDs for VGs; this ensures 
that any root FS is accessible from other Linux systems that can be used as 
emergency repair machines.

> Are there other reasoning's to not do so?  These are the types of things that 
> would really be helpful.

One other reason I know about is that in the general case you cannot boot from 
an LVM root on multiple minidisks. This is a ZIPL limitation. Previous 
discussion is here:

http://www.mail-archive.com/linux-390%40vm.marist.edu/msg62491.html

This limits the size of the root partition to whatever is the largest single 
virtual disk you have available.

HTH,
Tomas



Tomas Pavelka
CA Technologies
Sr Software Engineer

CA CZ, s.r.o 
V Parku 12, 
148 00 Praha 
Czech Republic

Office: +25996 | tomas.pave...@ca.com

Id. číslo 25694073, z obchodního rejstříku, vedeného Městským soudem v Praze, 
oddíl C, vložka 61808 / Id. No. 25694073, registered in the Commercial Register 
maintained by the Municipal Court in Prague, Section C, File 61808



Re: Java performance under a second level VM

2015-04-17 Thread Pavelka, Tomas
> And because the optimizer uses elapsed time rather than cpu time it keeps 
> recompiling classes and kind of digs is own grave. You might try without the 
> optimizer on 2nd level Linux.

I wrote a simple benchmark that just prints a number of lines on the console. 
Java uses about 180 times more CPU on the second level when compared to first 
level. It uses about 40 times more CPU than Perl doing the same thing on second 
level.

When I turn the JIT off, the CPU time on the second level is slightly lower 
than with the JIT on, but the wall-clock time is cut in half.

> This is probably too specialist topic for general consumption.

I find this pretty fascinating, not sure about the others. If no one else 
replies let's move this offline.

Thanks,
Tomas



Re: Java performance under a second level VM

2015-04-16 Thread Pavelka, Tomas
Hi Rob, 
thanks for the reply. The performance of Linux under a second level VM always 
seemed unpredictable to me but after reading your response I started seeing a 
pattern: whenever I had a poorly performing application on Linux on a second 
level VM, there was always some kind of networking involved which in turn 
involved waiting on open sockets.
I just want to make sure I understand correctly what you are saying: is it 
true that waits under software SIE burn CPU, whereas the same waits under 
hardware SIE do not? That would explain a lot of the performance problems I 
have seen, but I am sure there are caveats.

Thanks,
Tomas



Java performance under a second level VM

2015-04-16 Thread Pavelka, Tomas
I have a Java application running a REST API built on top of the Dropwizard 
stack (www.dropwizard.io). I have been running it on a first level z/VM, where 
it was always well behaved, consuming less than 1% of a CPU on an EC12 during 
normal operation. I needed to do some testing that has to run on a second level 
z/VM, and there the performance pretty much rendered the application unusable. 
To get a better understanding of the difference, I measured the initialization 
time of the application with CP INDICATE USER EXPANDED, subtracting the Ttime 
and Vtime values at the beginning of the measured interval from the values at 
the end.
For the first level z/VM the initialization took 5 seconds for both Ttime and 
Vtime. For the second level system Ttime was 0:41:12 and Vtime was 0:41:11, 
i.e. the initialization on the second level took a little less than 500 times 
as long as on the first level.
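The ratio is easy to check: 0:41:12 is 2472 seconds, and 2472 / 5 is about 494. The sketch below does the same arithmetic in C. It assumes the times are written as hours:minutes:seconds, as in the figures above. The exact format printed by CP INDICATE USER EXPANDED may differ (it may, for example, include fractions of a second). The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "h:mm:ss" string such as "0:41:12" to seconds.
 * Returns -1 if the string is not in that form. */
long hms_to_seconds(const char *s)
{
    int h, m, sec;
    char extra;

    assert(s != NULL);
    /* Exactly three fields must match; "%c" catches trailing junk. */
    if (sscanf(s, "%d:%d:%d%c", &h, &m, &sec, &extra) != 3)
        return -1;
    if (h < 0 || m < 0 || m > 59 || sec < 0 || sec > 59)
        return -1;
    return (long)h * 3600L + (long)m * 60L + sec;
}

/* How many times longer the second-level run took than the first-level
 * run (given in seconds). Returns -1.0 on bad input. */
double slowdown(const char *second_level_hms, long first_level_seconds)
{
    long t = hms_to_seconds(second_level_hms);

    if (t < 0 || first_level_seconds <= 0)
        return -1.0;
    return (double)t / (double)first_level_seconds;
}
```

With the numbers above, slowdown("0:41:12", 5) returns 494.4.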

We have another group testing software written in Perl on that system, and they 
have not complained about performance (unfortunately that is the closest I have 
to performance data for the Perl app). The Linux machines on the second level 
do not feel sluggish when doing things like text editing, directory browsing 
and copying files, all under an SSH session. This makes me believe that the 
problem may be Java specific. I also tried running other Java applications; the 
response was pretty bad, especially if the applications did console or file I/O 
(but I don't have any measurements). I tried this under IBM JRE 7.1 and 8.0 
without noticeable change in performance.

I have a few questions:

1) Can the total and virtual time on a second level system be relied upon? Or 
can the time be skewed? Unfortunately I don't have the performance data from 
the first level VM on which the second level runs.
2) Could anyone recommend a Java profiler for s390x?
3) Has anyone experienced something similar?

Thanks,
Tomas





Re: GREP command to Find UID

2015-02-27 Thread Pavelka, Tomas
> Is there a place within OMVS to find the ID being defined ? Is there a path 
> where get the list of ID defined within OMVS ?

Sorry, I assumed you were on Linux... I don't know much about USS.

Tomas 


Re: GREP command to Find UID

2015-02-27 Thread Pavelka, Tomas
getent?

http://linux.die.net/man/1/getent

Examples:

getent passwd 99
nobody:x:99:99:Nobody:/:/sbin/nologin

getent passwd nobody
nobody:x:99:99:Nobody:/:/sbin/nologin

getent group nobody
nobody:x:99:

getent group 99
nobody:x:99:
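getent resolves its keys through the C library's name-service functions, so the same lookups are available from a program. Below is a minimal sketch using the standard getpwuid/getpwnam and getgrgid/getgrnam calls. The rule "an all-digit key is an ID, anything else is a name" is an assumption that mimics the getent examples above. lookup_user and lookup_group are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return 1 and store the value if `key` is entirely decimal digits. */
static int parse_id(const char *key, unsigned long *id)
{
    char *end;

    if (*key < '0' || *key > '9')
        return 0;
    *id = strtoul(key, &end, 10);
    return *end == '\0';
}

/* Like "getent passwd KEY": numeric keys are UIDs, others are names.
 * Lookups go through NSS, so LDAP/NIS users are found too. */
struct passwd *lookup_user(const char *key)
{
    unsigned long id;

    assert(key != NULL);
    return parse_id(key, &id) ? getpwuid((uid_t)id) : getpwnam(key);
}

/* Like "getent group KEY". */
struct group *lookup_group(const char *key)
{
    unsigned long id;

    assert(key != NULL);
    return parse_id(key, &id) ? getgrgid((gid_t)id) : getgrnam(key);
}
```

On the system from the example, lookup_user("99") and lookup_user("nobody") would return the same entry. The returned pointer refers to static storage that the next lookup overwrites.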

HTH, 
Tomas


Re: KVM in Linux for system z capabilities

2015-02-16 Thread Pavelka, Tomas
> I envision people using KVM on System z eventually deciding they want to 
> upgrade to z/VM for the better capabilities and manageability it provides.

Could you elaborate on what the most important things are that z/VM provides 
and KVM does not? I occasionally get questions from people who heard somewhere 
that KVM will replace z/VM on the mainframe, but I can't give answers because 
I don't really know anything about KVM.

Thanks,
Tomas



Re: Bug in vmcp?

2015-02-02 Thread Pavelka, Tomas
> Use FOR to execute a CP command on another virtual machine and receive the
> command's responses and return code either to your terminal or over an IUCV
> connection to the Asynchronous CP Command Response system service (*ASYNCMD).

Note that you have AF_IUCV support in the Linux kernel, so you can use it to 
connect to *ASYNCMD and get the response of the FOR command; see:

http://dev.man-online.org/man7/af_iucv/

HTH,
Tomas



Re: What are your feelings about non-RPM installers for Linux?

2015-01-23 Thread Pavelka, Tomas
Just to explain what I am trying to do: over the years I have been involved in 
multiple discussions about installations that involved people supporting 
different platforms. It is often difficult to convince everyone that RPM is the 
right thing to do and that interactive installers are not a good idea on Linux. 
So I thought that the convincing could be easier if I could point to actual 
zLinux users supporting this idea.

However, I do not follow the part about why putting RPMs in a tarball is evil. 
I see a tarball as just a transport vehicle: the vendor puts a bunch of RPMs in 
a tarball for a single-file download, and the end user unpacks it and proceeds 
as normal with the RPMs. What am I missing here? Could you give me an example 
where the tarball makes the workflow difficult?

Thanks,
Tomas



Re: What are your feelings about non-RPM installers for Linux?

2015-01-23 Thread Pavelka, Tomas
> Things like InstallMangler seem like good ideas to product owners since it 
> allows them to provide the illusion of a simplified install experience that 
> is common across all platforms

For those same product owners, the opinions of (potential) customers often 
carry more weight than those of a developer. Thanks very much for the replies, 
I promise to make good use of them ;-)

Tomas


What are your feelings about non-RPM installers for Linux?

2015-01-22 Thread Pavelka, Tomas
Hello,
I am in the middle of a discussion about how to package and install software 
on Linux for System z. There are people new to Linux involved, and things like 
InstallAnywhere are coming up. What is your experience with non-RPM installers? 
For example, IBM Java no longer comes as an RPM for s390x (or at least I can't 
find the RPM version anymore). Does this cause problems for you? And if so, 
could you give examples of such problems?

Thanks,
Tomas






Re: device-mapper: reload ioctl on failed

2014-11-18 Thread Pavelka, Tomas
I just realized that I was in a different situation than you are. We had 
multiple Linux guests sharing an LVM disk R/O, but what you have is one R/W 
and multiple R/O. If the R/W system makes changes, the file system on the R/O 
systems has no means to sync up. So even if you manage to solve your current 
problems, you could still end up with I/O errors on the Linux that's linking 
R/O.
Is there a particular reason why you need to share Linux data via minidisk 
links? Could an NFS share do the same job? 

Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Shumate, 
Scott
Sent: Tuesday, November 18, 2014 3:56 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: device-mapper: reload ioctl on failed

Thanks for the reply Thomas.  I'm going to pass this on to our Linux guys, this 
is their arena.  I'm trying to help them trouble shoot.  I'm not keen on making 
the disk mw.  I'm with you on that one.  It should work by linking it r/o and 
mount the volume group r/o on the linux side.  Works fine in our QA 
environment.  Just fails on our test environment.

Thanks
Scott




Re: device-mapper: reload ioctl on failed

2014-11-17 Thread Pavelka, Tomas
> It works if I attach the mini-disk was multi-write.  This is not what we want 
> but it works.  Any ideas?

A lot of speculation on my part follows. Again, I will point to Red Hat as the 
better informed party. The bug I dealt with before seems very similar to what 
you are experiencing. There were functions in the LVM code used for volume 
initialization that came in two variants: the read-write variant would write 
metadata to the volume during initialization, while the read-only variant would 
not try to write anything. The cause of the bug was that someone forgot to call 
the RO variant when the disk was linked RO, so the RW variant tried to write to 
the disk and failed, which in turn caused the whole initialization to fail, and 
the volume was not brought online.

If your case is similar, then linking the minidisk MW would let the 
initialization logic write to a disk that is supposed to be only read. 
I assume that you are using an ext-type file system, and I am also assuming 
that Linux does not support the RESERVE/RELEASE function there (if anyone knows 
better, please correct me). This means that the two Linux systems sharing the 
disk can write to it at the same time. You may get lucky and this will work, or 
you may end up with corrupted LVM metadata or worse. Before you proceed, check 
how important the data on those disks is, what your skills are regarding 
recovery of LVM disks with corrupted metadata, and whether you have usable 
backups. Even if the data on those disks is not valuable, corrupted LVM 
metadata could make debugging your original problem more complicated.

The safe rule for MW is to never use it unless you know very well what you 
are doing.

HTH,
Tomas



Re: device-mapper: reload ioctl on failed

2014-11-13 Thread Pavelka, Tomas
> lvm2-2.02.100-8.el6.s390x

When I was dealing with this, I had a version with the bug fixed, and I think 
it was older than 2.02.100; it was 2.02.9x something. My guess is that your 
problem is LVM related but not the same one I had. Red Hat would give you 
better advice than I can.

Tomas


Re: device-mapper: reload ioctl on failed

2014-11-13 Thread Pavelka, Tomas
What version of the lvm RPM are you running? We ran into a problem on a CentOS 
6 installation where LVM would not bring read-only disks online. The bug 
started at a version that I unfortunately forgot and was fixed in a newer 
version that I forgot as well. But I managed to find one of the affected 
versions, which was lvm2-2.02.87-6. At that time we were on CentOS and did not 
have a support contract with Red Hat, so we just rolled a custom RPM with a 
patch, which I was able to find:

--- ./lib/device/dev-io.c.orig  2012-08-31 03:45:14.0 -0400
+++ ./lib/device/dev-io.c   2012-08-31 03:46:22.0 -0400
@@ -282,7 +282,7 @@ static int _dev_read_ahead_dev(struct de
 		return 1;
 	}
 
-	if (!dev_open(dev))
+	if (!dev_open_readonly(dev))
 		return_0;
 
 	if (ioctl(dev->fd, BLKRAGET, &read_ahead_long) < 0) {

The project where we needed this got cancelled, so I have not followed up on 
whether Red Hat fixed it, but this should give you some leads.
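The patch swaps dev_open() for dev_open_readonly() because reading the read-ahead value is a pure query. Below is the same idea as a standalone sketch. It is not LVM's code, and get_read_ahead is a hypothetical helper. It opens the block device O_RDONLY and issues the BLKRAGET ioctl, so the query also works when the minidisk is linked R/O and the kernel would refuse a read-write open.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>      /* BLKRAGET */
#include <sys/ioctl.h>
#include <unistd.h>

/* Store the device's read-ahead setting (in 512-byte sectors) in *ra.
 * Returns 0 on success, -1 on failure. */
int get_read_ahead(const char *path, long *ra)
{
    int fd, rc;

    assert(path != NULL && ra != NULL);
    /* Read-only open: a query must not need write access. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = ioctl(fd, BLKRAGET, ra);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

On a non-block device such as /dev/null, the ioctl fails and the function returns -1.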

HTH,
Tomas


Re: New to z/VM and Linux on System z: Minidisk definitions in IBM supplied USER DIRECT question

2014-10-23 Thread Pavelka, Tomas
Are you looking for documentation of the individual user directory statements? 
That can be found in the IBM book "CP Planning and Administration", in the 
section "Creating and Updating a User Directory"

http://pic.dhe.ibm.com/infocenter/zvm/v6r3/topic/com.ibm.zvm.v630.hcpa5/cusrdir.htm

If you follow from there, you can find descriptions of the individual 
statements. For example, here is ACCOUNT:

http://pic.dhe.ibm.com/infocenter/zvm/v6r3/topic/com.ibm.zvm.v630.hcpa5/daccoun.htm

Or did I misunderstand, and you are looking for how to display the contents of 
the user directory?

Tomas



Re: How to reset the Linux root pw

2014-10-03 Thread Pavelka, Tomas
Thanks everyone for the very insightful comments.

Tomas



Re: How to reset the Linux root pw

2014-10-02 Thread Pavelka, Tomas
> warm bodies authenticate with PKI using a central LDAP store for public keys

Out of curiosity, how do you deal with situations when LDAP is temporarily 
unavailable?

Tomas



Re: Adding memory dynamically to a RHEL6.5 guest running under z/VM 6.3

2014-10-02 Thread Pavelka, Tomas
One way is to define standby memory. 
For example, if you have user LINUX and want it to have 512MB with the option 
to go up to 2G, then you specify 2G as the maximum memory on the USER directory 
statement:

USER LINUX PASWD 512M 2048M G 64

Then add the following statement to the user directory to define the remaining 
memory as standby:

COMMAND DEFINE STORAGE AS 512M STANDBY 1536M

If you set up the user like this, then from inside the Linux guest you can add 
memory at run time with the chmem command. For example, to add an extra 512 MB, 
run:

chmem -e 512M

Note that once standby memory has been brought online, it is not always 
possible to give it back without re-IPLing the guest.
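chmem brings memory online in units of memory blocks, and the block size depends on the system. The kernel reports it in /sys/devices/system/memory/block_size_bytes as a hexadecimal number. Below is a small sketch of the arithmetic. The helper names are hypothetical. The 256 MB block size in the test is only an illustration, not necessarily what your guest reports.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* block_size_bytes contains a hex number without a "0x" prefix. */
unsigned long long parse_block_size(const char *text)
{
    assert(text != NULL);
    return strtoull(text, NULL, 16);
}

/* Read the memory block size from sysfs; 0 if it cannot be read. */
unsigned long long read_block_size(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return parse_block_size(buf);
}

/* Number of whole memory blocks needed to cover `bytes` (rounded up). */
unsigned long long blocks_needed(unsigned long long bytes,
                                 unsigned long long block_size)
{
    assert(block_size > 0);
    return (bytes + block_size - 1) / block_size;
}
```

With 256 MB blocks, the 1536M of standby storage from the example is 6 blocks, and `chmem -e 512M` needs 2 of them.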



Re: Adding CPUs

2014-07-24 Thread Pavelka, Tomas
Unless you want all your Linux machines to have four CPUs, you'd have to change 
your profile. There is this line in the documentation of the CPU directory 
statement:

"If specified, the CPU statement must appear before any device statements."

Since you have NICDEF in your profile, you can't put CPU statements after the 
INCLUDE of your profile.

The MACHINE statement you mentioned may be the way out of that:

http://pic.dhe.ibm.com/infocenter/zvm/v6r3/topic/com.ibm.zvm.v630.hcpa5/dmachin.htm

It allows you to specify how many virtual CPUs the user is allowed to define, 
e.g.

MACHINE ESA 4

would make the user start with one CPU (assuming there are no CPU statements in 
the directory) and let the user define extra CPUs via DEFINE CPU:

http://pic.dhe.ibm.com/infocenter/zvm/v6r3/topic/com.ibm.zvm.v630.hcpb7/defincpu.htm

You could run DEFINE CPU from PROFILE EXEC if you IPL CMS first and then IPL 
Linux from PROFILE EXEC.

Just like CPU, the MACHINE statement must precede any device statements, so 
you'd have to put MACHINE ESA 4 in your profile, which would allow all your 
Linux users to define 4 CPUs at runtime. So it may be better to restructure 
your profile and leave the device statements out of it. 

To tell which directory statements are device statements, see this table:

http://pic.dhe.ibm.com/infocenter/zvm/v6r3/topic/com.ibm.zvm.v630.hcpa5/hcpa5226.htm#hcpa5-gen225__dircat

Tomas




Re: Running on CP or IFL ?

2014-07-14 Thread Pavelka, Tomas
> There are no conditions in which a CPU's type is unknown.  Rather, it's a bug 
> in hyptop.

If anyone actually tries to use this, note that the version of hyptop I was 
running is quite old. I did not test whether newer versions fix this behavior.



Re: Running on CP or IFL ?

2014-07-14 Thread Pavelka, Tomas
> There must be some way to determine. At least when using hyptop in LPAR, it 
> tells the number of available IFL and CP in the top line.

I have looked at the source of hyptop, and it reads the information from 
debugfs, namely from these two files:

/sys/kernel/debug/s390_hypfs/diag_204  - if running on LPAR
/sys/kernel/debug/s390_hypfs/diag_2fc  - if running under VM

I assume that DIAG 2FC is issued by the kernel and its result saved to the 
file. The diagnose is documented here:
http://pic.dhe.ibm.com/infocenter/zvm/v6r3/topic/com.ibm.zvm.v630.hcpb4/hcpb4295.htm

I have not found the documentation for DIAG 204, nor do I know how it is 
possible to issue a diagnose when not running under z/VM (would that be a 
diagnose handled by PR/SM?), but if you look inside the kernel sources, you may 
be able to find out.
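A program that wants the same data as hyptop could first check which of the two files exists. This is a minimal sketch. It assumes debugfs is mounted at /sys/kernel/debug and that the program is allowed to read it (usually root). Checking diag_2fc before diag_204 is an arbitrary choice, and detect_hyp_source is a hypothetical name. The file contents are not parsed here.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

enum hyp_source { HYP_NONE, HYP_VM_DIAG2FC, HYP_LPAR_DIAG204 };

/* Look for the hypfs files under `base` (normally
 * "/sys/kernel/debug/s390_hypfs") and return which one is readable,
 * writing its full path to `path` (empty string if neither is). */
enum hyp_source detect_hyp_source(const char *base, char *path, size_t len)
{
    assert(base != NULL && path != NULL && len > 0);

    snprintf(path, len, "%s/diag_2fc", base);     /* under z/VM */
    if (access(path, R_OK) == 0)
        return HYP_VM_DIAG2FC;

    snprintf(path, len, "%s/diag_204", base);     /* on LPAR */
    if (access(path, R_OK) == 0)
        return HYP_LPAR_DIAG204;

    path[0] = '\0';
    return HYP_NONE;
}
```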

There is one catch that I know of: besides CP and IFL, there is also an unknown 
processor type. When I run hyptop under z/VM 6.3 and RHEL 6.1, the CPU shows up 
as unknown (CPU-T: UN). I am running on an IFL-only system. If you wanted to 
use this detection in practice, you would have to figure out under which 
conditions the CPU can be reported as unknown.

Tomas


Re: Using DIRMAINT to copy an existing guest

2014-07-07 Thread Pavelka, Tomas
An interesting alternative is to mount by label. Here is an example of fstab 
lines:

LABEL=root  /     ext3  defaults  1 1
LABEL=usr   /usr  ext3  defaults  0 0

Before you can mount by label, you need to put a label on the file system, 
e.g. like this:

tune2fs -L root /dev/dasda1

The advantage is that this is resistant to minidisk virtual address changes 
and, if the file system is on LVM, to volume group name changes.
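Before cloning, it can be handy to check which fstab entries already mount by label. This is a minimal sketch using the C library's setmntent/getmntent, which parse fstab-format files. list_label_mounts is a hypothetical name. It does not check that the labels actually exist on any device; e2label or blkid can show that.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Print and count the entries of an fstab-format file that mount by
 * label ("LABEL=..." in the first field). Returns -1 if the file
 * cannot be opened. */
int list_label_mounts(const char *fstab)
{
    FILE *f;
    struct mntent *m;
    int count = 0;

    assert(fstab != NULL);
    f = setmntent(fstab, "r");
    if (f == NULL)
        return -1;
    while ((m = getmntent(f)) != NULL) {
        if (strncmp(m->mnt_fsname, "LABEL=", 6) == 0) {
            printf("label %s -> %s (%s)\n",
                   m->mnt_fsname + 6, m->mnt_dir, m->mnt_type);
            count++;
        }
    }
    endmntent(f);
    return count;
}
```

Called on "/etc/fstab" containing the two example lines, it would report the root and usr labels.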

Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Chuck 
Tribolet
Sent: Tuesday, July 08, 2014 12:27 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Using DIRMAINT to copy an existing guest

Before cloning, make sure that /etc/fstab doesn't contain any entries that look 
like:

/dev/disk/by-id/

or

/dev/dasd

You will need to change them to:

/dev/disk/by-path/ccw-0.0.-part1

The first form will change on a clone, the second form I've had problems with 
the drive letter changing.





Chuck Tribolet
trib...@us.ibm.com (IBM business)
trib...@garlic.com (Personal)
http://www.almaden.ibm.com/cs/people/triblet



From:   Mark Post 
To: LINUX-390@vm.marist.edu,
Date:   07/07/2014 12:16 PM
Subject:Re: Using DIRMAINT to copy an existing guest
Sent by:Linux on 390 Port 



>>> On 7/6/2014 at 07:19 PM, Cameron Seay  wrote:
> Es claro.  For my edification, how does the system handle the cloning 
> of the minidisks of the cloned-from guest?  Are they physically the 
> same ones?  Does it grad available DASD for the cloned user?  Thanks!

DIRMAINT will use available space from the various DASD volumes you've told it 
to manage.  That means the cloning does involve actual copying, and not some 
sort of thin provisioning.


Mark Post



Re: Netstat Grep Port Range

2014-07-01 Thread Pavelka, Tomas
> 'NR > 2 {split($5,a,":");

One thing you should watch out for is that the IP address could be an IPv6 
address. In that case, you cannot split on ":" because there may be many 
colons in the address. You would need something that extracts the number after 
the last colon in field 5. But I'm not really good enough with awk to write that...
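Taking the text after the last colon works for both address families. In awk, split() returns the number of fields, so `n = split($5, a, ":"); port = a[n]` picks the last one. The same idea is shown below as a small C function. port_of and port_in_range are hypothetical names. The sketch assumes numeric netstat output (netstat -n), so ports are not replaced by service names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Port number from a netstat address field such as "10.0.0.5:22",
 * ":::22" or "::ffff:10.0.0.5:8080". Only the text after the LAST colon
 * is used, so IPv6 addresses are handled. Returns -1 when there is no
 * numeric port (e.g. "0.0.0.0:*"). */
long port_of(const char *field)
{
    const char *colon;
    char *end;
    long port;

    assert(field != NULL);
    colon = strrchr(field, ':');
    if (colon == NULL || colon[1] < '0' || colon[1] > '9')
        return -1;
    errno = 0;
    port = strtol(colon + 1, &end, 10);
    if (errno != 0 || *end != '\0' || port > 65535)
        return -1;
    return port;
}

/* 1 if the field's port lies in [lo, hi], else 0. */
int port_in_range(const char *field, long lo, long hi)
{
    long port = port_of(field);

    return port >= 0 && port >= lo && port <= hi;
}
```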

Tomas


Re: Adding a zfcp LUN to root file system logical volume

2014-05-20 Thread Pavelka, Tomas
Not sure if this is the root cause of your problem, but zipl does not support 
booting from a logical volume consisting of more than one physical volume. It 
may work in some cases but is generally unsafe. Previous discussion is here:

http://www.mail-archive.com/linux-390%40vm.marist.edu/msg62491.html

Tomas



Re: The device stopped operating while being set offline

2014-04-30 Thread Pavelka, Tomas
> Does writing to "ungroup" synchronously generate a udev event?   

For the event to be generated synchronously, the write to the file would have 
to block until the udev event is generated. I don't think that is the case, but 
I have not actually tried it or researched it.



Re: The device stopped operating while being set offline

2014-04-30 Thread Pavelka, Tomas
Thanks for all the replies. I did not fully explain what I do before the 
detach, so let me make that clearer. To take a NIC offline and detach it, I do 
the following:

1) Write 0 to /sys/bus/ccwgroup/devices//online
2) Wait for /sys/bus/ccwgroup/devices//online to become 0 and for 
/sys/bus/ccwgroup/devices//state to become "DOWN"
3) Write 1 to /sys/bus/ccwgroup/devices//ungroup
4) Wait for the /sys/bus/ccwgroup/devices// directory to disappear

I should also explain what the "wait" above means: 

1) Run udevadm settle
2) Check if the various files in /sys/bus have expected values (see above for 
which)
3) If not, sleep for a second and repeat the check
4) Do this until the expected values are there or until a specified number of 
iterations passed
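The wait procedure above can be sketched in C as follows. It shells out to udevadm settle, as step 1 says. The helper names and the 128-byte buffer are assumptions. This reproduces only the waiting logic. As the rest of this message explains, the attributes reaching their final values does not by itself prove that it is safe to detach the NIC.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Step 1 of the wait: let udev finish processing queued events. */
static void settle(void)
{
    int rc = system("udevadm settle >/dev/null 2>&1");
    (void)rc;   /* a failed settle just means we rely on polling */
}

/* Read the first line of a sysfs attribute, without the newline. */
static bool read_attr(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    bool ok;

    if (f == NULL)
        return false;
    ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (ok)
        buf[strcspn(buf, "\n")] = '\0';
    return ok;
}

/* Steps 1-4: settle, check, sleep a second, repeat up to max_iter times.
 * Returns true once the attribute reads back as `expected`. */
bool wait_for_attr(const char *path, const char *expected, int max_iter)
{
    char buf[128];

    assert(path != NULL && expected != NULL);
    for (int i = 0; i < max_iter; i++) {
        settle();
        if (read_attr(path, buf, sizeof buf) && strcmp(buf, expected) == 0)
            return true;
        if (i + 1 < max_iter)
            sleep(1);
    }
    return false;
}

/* Same loop, waiting for a path (e.g. the ccwgroup directory) to vanish. */
bool wait_for_absence(const char *path, int max_iter)
{
    struct stat st;

    assert(path != NULL);
    for (int i = 0; i < max_iter; i++) {
        settle();
        if (stat(path, &st) != 0)
            return true;
        if (i + 1 < max_iter)
            sleep(1);
    }
    return false;
}
```

For step 2 of the offline sequence, one would call wait_for_attr on the ccwgroup device's online file with "0", and then on its state file with "DOWN".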

I have verified that by the time I'm done with the last wait:
/sys/bus/ccwgroup/devices// directory is gone
/sys/bus/ccw/devices//online is 0
the change uevent at the very end of the offline sequence has been generated 
(more about this later).

Note that by this time the ethX interface is long gone, so unfortunately 
ethtool cannot be used for any checking.

If at this point I run CP DETACH NIC, I get "The device stopped operating 
while being set offline" in syslog. If I recreate the NIC immediately after 
that (in a loop in a script) and try to set it online, it fails to come 
online. To be more precise about what I mean by "fails to come online":
/sys/bus/ccw/devices//online is 1
/sys/bus/ccwgroup/devices//online is still 0, even after a wait of 20 
seconds

Let me also expand on the change uevent:

I have noticed that /sys/bus/ccw/devices//online is set to zero at the 
beginning of the offline sequence and there is a wait after that, so just 
checking the online file will not tell me that the kernel is done offlining 
the device. I also noticed that a function called __qeth_l2_set_offline 
generates a change uevent at its very end. So I thought that if I waited for 
that uevent, I could avoid detaching the device too early. But that did not 
work either, and it only reminded me that my understanding of the s390 kernel 
drivers is still rather poor ;-)
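A program could wait for that change uevent directly instead of going through udevadm. The kernel delivers uevents over a NETLINK_KOBJECT_UEVENT netlink socket, which is the same stream `udevadm monitor --kernel` shows. Setting up the socket is omitted here. The sketch only shows how one received message could be matched, and uevent_matches is a hypothetical helper. The device number in the test is made up. The sketch does not fix the detach race described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A kernel uevent read from a NETLINK_KOBJECT_UEVENT socket is a header
 * "action@devpath" followed by NUL-separated KEY=VALUE strings.
 * Return true if ACTION equals `action` and DEVPATH contains `dev_part`. */
bool uevent_matches(const char *msg, size_t len,
                    const char *action, const char *dev_part)
{
    const char *act = NULL, *dev = NULL;
    size_t off = 0;

    assert(msg != NULL && action != NULL && dev_part != NULL);
    while (off < len) {
        const char *s = msg + off;
        const char *nul = memchr(s, '\0', len - off);

        if (nul == NULL)          /* unterminated tail: ignore it */
            break;
        if (strncmp(s, "ACTION=", 7) == 0)
            act = s + 7;
        else if (strncmp(s, "DEVPATH=", 8) == 0)
            dev = s + 8;
        off += (size_t)(nul - s) + 1;
    }
    return act != NULL && dev != NULL &&
           strcmp(act, action) == 0 && strstr(dev, dev_part) != NULL;
}
```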

One other thing I tried was different kernels. I originally found this problem 
on an old 2.6.32-131.0.15.el6.s390x kernel on RHEL. I also tried 
2.6.32-431.11.2.cl6.s390x on ClefOS and was able to reproduce the problem. I 
was, however, not able to reproduce it on 3.0.13-0.27-default on SLES 11 SP2.

I should also say why I'm going through all this trouble of writing to the 
/sys/bus files on my own. We previously used znetconf, but it kept failing 
occasionally, e.g. with ENODEV (echo: write error: No such device) when trying 
to write to /sys/bus/ccwgroup/devices//online. I thought I could be more 
clever than that and actually check and wait for the right files and values 
before taking action. But in at least one case I have ended up with a fixed 
sleep, which I very much dislike.

Tomas



The device stopped operating while being set offline

2014-04-29 Thread Pavelka, Tomas
I have a script that dynamically adds and removes virtual NICs. For example,
it runs DEFINE NIC, couples the NIC to a vswitch, sets it online, connects
somewhere, sets it offline, ungroups it, and detaches it.
The problem is that I cannot figure out how to tell that the device is really
offline and the NIC is safe to detach. I have looked at the s390 kernel
drivers, namely drivers/s390/cio/device.c. There I see that during offline
processing (ccw_device_set_offline), the ccw_device online field is set to 0
first, and then a wait loop finishes the offline process. I mention the
ccw_device online field because it is what gets read when you read
/sys/bus/ccw/devices//online from user mode.

If the device is detached before the offline processing is finished, you will 
get this error message in syslog:

: The device stopped operating while being set offline

I found a reference for this message here:
http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fcom.ibm.linux.l0kmsg.doc%2Fl0km_m_cio.390dcf.html

It says "The device is now inactive, but setting it online again might
fail.", which describes exactly the problem I'm seeing ;-)

Can anyone think of a way to tell that the device is really offline (i.e. in
DEV_STATE_DISCONNECTED or DEV_STATE_OFFLINE), so that it is safe to detach,
and to do this from user mode?

My current solution is simply to put a short sleep before the CP DETACH
command. That does not seem very safe, even though I have not been able to
reproduce the failure with it.

Thanks,
Tomas



Tomas Pavelka
CA Technologies
Sr Software Engineer
Tel:  +420226207796
tomas.pave...@ca.com



Re: MACPROTECT question

2014-03-19 Thread Pavelka, Tomas
> When setting the MACPROTECT ON at the NIC level, the  specified must be 
> that of the data device.  This is probably A002 in your case.

That did the trick in both cases. Thanks! 

Tomas

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/


MACPROTECT question

2014-03-18 Thread Pavelka, Tomas
I would like to use MACPROTECT ON for Linux guests on a vswitch, but it is not
working as I expected. My understanding is that MACPROTECT ON prevents a NIC
from sending a frame whose source MAC address differs from the one assigned
by CP.
I have a Linux bridge that bridges layer 2 traffic between two vswitches. I
would like to have MACPROTECT on for all guests except the one that runs the
bridge. I intended to do this by running SET VSWITCH  MACPROTECT ON and
SET NIC USER   MACPROTECT OFF. But as soon as I turn MACPROTECT on for the
vswitch, the traffic through the bridge stops, regardless of whether
MACPROTECT on the NIC is on or off.
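Here is a toy model in C that makes the expectation above precise. It covers what MACPROTECT is understood to do in this message, plus the assumption that a NIC-level setting overrides the vswitch-level one. It models the expected behaviour only and is not CP's implementation. The enum and function names are hypothetical. A later reply in this thread points out that the NIC-level setting has to be applied to the data device.

```c
#include <stdbool.h>
#include <string.h>

enum mp_setting { MP_UNSET, MP_OFF, MP_ON };

/* Assumed precedence: an explicit NIC setting wins, otherwise the vswitch
 * setting applies. */
bool effective_macprotect(enum mp_setting vswitch, enum mp_setting nic)
{
    if (nic != MP_UNSET)
        return nic == MP_ON;
    return vswitch == MP_ON;
}

/* With protection on, only frames whose source MAC equals the MAC that CP
 * assigned to the NIC may leave it. */
bool macprotect_allows(const unsigned char src[6],
                       const unsigned char assigned[6], bool protect_on)
{
    return !protect_on || memcmp(src, assigned, 6) == 0;
}
```

Under this model, the second experiment (vswitch OFF, bridge NIC ON) should have dropped the bridged frames from 02:00:c2:0a:6d:ff. That is why the observed behaviour looked like a bug.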

As an additional check, I tried it the other way around: MACPROTECT OFF on the
vswitch and MACPROTECT ON for the bridge NIC. I expected this to stop the
traffic through the bridge, but that did not happen.
More details for the second case:

q v nic a000
Adapter A000.P00 Type: QDIO  Name: UNASSIGNED  Devices: 3
  MAC: 02-00-C2-0A-6D-D5 VSWITCH: SYSTEM ALBL07
 Device: A000 Protected

znetconf -c | grep a000
0.0.a000,0.0.a001,0.0.a002 1731/01 GuestLAN QDIO 08 qeth eth6 online

tcpdump -e -i eth6 '(host 141.202.59.44 or host 141.202.59.45)'
tcpdump: WARNING: eth6: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth6, link-type EN10MB (Ethernet), capture size 65535 bytes
07:46:54.596577 02:00:c2:0a:6d:ff (oui Unknown) > Broadcast, ethertype ARP 
(0x0806), length 42: Request who-has 141.202.59.45 tell 141.202.59.44, length 28
07:46:54.596827 02:00:c2:0a:6e:00 (oui Unknown) > 02:00:c2:0a:6d:ff (oui 
Unknown), ethertype ARP (0x0806), length 42: Reply 141.202.59.45 is-at 
02:00:c2:0a:6e:00 (oui Unknown), length 28
07:46:54.596985 02:00:c2:0a:6d:ff (oui Unknown) > 02:00:c2:0a:6e:00 (oui 
Unknown), ethertype IPv4 (0x0800), length 98: 141.202.59.44 > 141.202.59.45: 
ICMP echo request, id 1913, seq 1, length 64

The A000 NIC on the bridge has MAC address 02-00-C2-0A-6D-D5, but it passes
traffic between MAC addresses 02:00:c2:0a:6d:ff and 02:00:c2:0a:6e:00 despite
protection being on.
Is my understanding of MACPROTECT incorrect, or have I found a bug?

Thanks,
Tomas



Re: dasdfmt slowness

2014-03-04 Thread Pavelka, Tomas
To speed up dasdfmt, we split the large DASD into smaller minidisks and join
them with LVM. The individual minidisks can then be formatted in parallel.
This is easy to script with xargs, whose --max-procs option sets the maximum
number of worker processes to run at a time.
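xargs already does this, so in practice no extra code is needed. The C sketch below only illustrates the idea behind --max-procs: start workers until the limit is reached, then wait for one to finish before starting the next. The function name is hypothetical. In practice each argv vector would be a dasdfmt invocation for one minidisk, and its options are deliberately left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs each argv vector in cmds as a child process, with at most max_procs
 * children alive at once. Returns how many commands exited non-zero, were
 * killed by a signal, or could not be started. Note: wait() reaps any child
 * of the calling process, so call this from a process with no other children. */
int run_bounded(char *const *const cmds[], size_t n, int max_procs)
{
    int running = 0, failures = 0;
    size_t next = 0;

    if (max_procs < 1)
        max_procs = 1;
    while (next < n || running > 0) {
        /* Fill free worker slots. */
        while (next < n && running < max_procs) {
            pid_t pid = fork();
            if (pid == 0) {
                execvp(cmds[next][0], cmds[next]);
                _exit(127);            /* exec failed in the child */
            }
            if (pid < 0)
                failures++;            /* could not fork this command */
            else
                running++;
            next++;
        }
        if (running == 0)
            continue;                  /* nothing to wait for */

        /* Wait for any worker to finish, freeing one slot. */
        int status;
        if (wait(&status) < 0) {
            if (errno == EINTR)
                continue;
            break;                     /* no children left to wait for */
        }
        running--;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```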

Tomas


Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-27 Thread Pavelka, Tomas
> I gave up and looked at some code.  :-)  The x200C is generated when you try 
> to register a universal MAC address or one that potentially conflicts with a 
> MAC address CP might create.
> That means you can only register additional MAC addresses that have MAC 
> prefixes that are outside the range of the SYSTEM or USER MACPREFIX 
> identified in SYSTEM CONFIG.  

Thanks, this explains a lot. Knowing this, I was able to register a secondary
MAC address with the vswitch in my experimental setup.
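The rule in the quoted reply can be written as a small predicate. This sketches the rule as described and is not CP's code. It assumes that "universal" means the locally administered bit (0x02 in the first octet) is clear. It also assumes that a MACPREFIX conflict means the first three octets equal the SYSTEM or USER MACPREFIX. The prefix values used when calling it are inputs, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the locally administered (U/L) bit is clear, i.e. the address is
 * a universally administered MAC. */
bool is_universal_mac(const uint8_t mac[6])
{
    return (mac[0] & 0x02) == 0;
}

/* The first three octets of the MAC as a 24-bit value, e.g. 0x0200C2. */
uint32_t mac_prefix(const uint8_t mac[6])
{
    return ((uint32_t)mac[0] << 16) | ((uint32_t)mac[1] << 8) | mac[2];
}

/* Model of the described x200C rule: refuse universal MACs and MACs whose
 * prefix matches the system or user MACPREFIX. */
bool setvmac_would_be_refused(const uint8_t mac[6],
                              uint32_t system_prefix, uint32_t user_prefix)
{
    uint32_t p = mac_prefix(mac);
    return is_universal_mac(mac) || p == system_prefix || p == user_prefix;
}
```

With a system prefix of 0x0200C2 (inferred only from the MAC addresses in this thread), an address such as 02-00-C2-00-01-3A would be refused. An address such as 02-11-22-33-44-55 would be accepted.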

> This opens a dangerous door because it means that the non-bridge guests must 
> also have MACPROTECT OFF on their NICs.
> Further, you can't manage the MAC addresses in CP; the MAC address has to 
> appear in the Linux configuration (hwaddr).

This solves the problem for the likes of KVM, where the hypervisor inside
Linux can manage its own set of MAC addresses. It does not solve the problem
for us, because we rely on CP to manage the MAC addresses (i.e. we do not want
the guest OSes to be able to change their MAC addresses). But this is very
useful information. We will most likely go into a redesign phase, but now we
know much more about the limits we operate under.

> I think I would raise a PMR to discuss this with z/VM Development.

We actually have an open PMR, but so far I have learned more from these
discussions, so thanks again.

Tomas



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-26 Thread Pavelka, Tomas
We have done more research and found that virtualization software running
under Linux faces the same problem we have. I have found a discussion about a
feature called "secondary unicast address" that would allow registering
multiple MAC addresses with a NIC. Here is a discussion that explains why this
is useful for virtualization:
http://thread.gmane.org/gmane.linux.network/64719/focus=64775
I have been looking at the layer 2 qeth driver and found that the code seems
to support secondary unicast MAC addresses. Adding a secondary unicast address
to a NIC can actually be done from user mode. Here is a code snippet that
registers a secondary MAC via a raw packet socket:

s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

...

struct packet_mreq mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mr_ifindex=ifr.ifr_ifindex;
mreq.mr_type=PACKET_MR_UNICAST;
mreq.mr_alen=ETHER_ADDR_LEN;
memcpy(mreq.mr_address,mac,ETHER_ADDR_LEN);
ret = setsockopt(s, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
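For anyone who wants to try this, here is the same technique as a self-contained C function. It follows the snippet above. The interface lookup via SIOCGIFINDEX and the error handling are additions, and the function name is hypothetical. Two caveats apply. First, the membership only lasts as long as the packet socket stays open, so the function returns the socket instead of closing it. Second, a successful setsockopt() only means the kernel accepted the request. The hypervisor can still reject the address, and that rejection shows up only in the s390dbf trace.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>        /* htons */
#include <net/ethernet.h>     /* ETHER_ADDR_LEN */
#include <net/if.h>           /* struct ifreq, IFNAMSIZ */
#include <sys/ioctl.h>        /* SIOCGIFINDEX */
#include <sys/socket.h>       /* socket, setsockopt, SOL_PACKET */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct packet_mreq, PACKET_MR_UNICAST */

/* Registers mac as an additional unicast address on ifname. Returns the
 * packet socket on success (keep it open: closing it drops the membership)
 * or -1 on error. Needs CAP_NET_RAW. */
int add_secondary_mac(const char *ifname, const unsigned char mac[ETHER_ADDR_LEN])
{
    struct ifreq ifr;
    struct packet_mreq mreq;
    int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (s < 0)
        return -1;

    /* Translate the interface name into an ifindex. */
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(s, SIOCGIFINDEX, &ifr) < 0)
        goto fail;

    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = ifr.ifr_ifindex;
    mreq.mr_type = PACKET_MR_UNICAST;
    mreq.mr_alen = ETHER_ADDR_LEN;
    memcpy(mreq.mr_address, mac, ETHER_ADDR_LEN);
    if (setsockopt(s, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0)
        goto fail;
    return s;

fail:
    close(s);
    return -1;
}
```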

The dbf trace shows that the layer 2 qeth driver code gets exercised and leads
to a command called IPA_CMD_SETVMAC. The user mode call succeeds, but I have
found this error message in /sys/kernel/debug/s390dbf/qeth_msg/sprintf:

00 01393420032:989799 2 - 00 03c00092471e  IPA: setvmac(x21) for 
0.0.a100/eth7 returned x200C "L2 mac not authorized by hypervisor"

This is where we are stuck again. Is the secondary unicast address what I
think it is? Are there any restrictions on when secondary MAC addresses can be
added to a virtual NIC?

Tomas



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-23 Thread Pavelka, Tomas
I was talking to Carsten, and he said that the MAC address gets registered
with the OSA card via calls from the Linux kernel drivers. He also mentioned
that KVM faces the same problem. We will attempt to register the extra MAC
addresses from a kernel module running on the bridge machine. I'll let you
know if we have any success with that.

Tomas



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-21 Thread Pavelka, Tomas
> so you must be performing MAC address translation such that you look more
> like a layer 2 router (a la OSA in layer 3 mode),
> not an 802.1d bridge (I said 802.3 earlier; I meant 802.1d.)  That is, all
> guests on the PUBLIC vswitch have the same MAC
> address as viewed by all hosts on the PRIVATE vswitch (and vice versa).

In my original question I changed the names of the vswitches and virtual
machines. This message uses the real names, which are not as
self-explanatory because I'm using whatever is available. Here are more
details on the setup:

AL070009 (141.202.59.44/02-00-C2-00-01-3A) -  - TOMRH61 (runs bridge) - 
 - AL07000A (141.202.59.45/02-00-C2-00-01-3B)

I ran a ping from AL070009:

PING 141.202.59.45 (141.202.59.45) 56(84) bytes of data.
64 bytes from 141.202.59.45: icmp_seq=1 ttl=64 time=1.40 ms
64 bytes from 141.202.59.45: icmp_seq=2 ttl=64 time=0.426 ms

And three TCP dumps:

First one at the interface on AL070009:

03:21:01.089237 02:00:c2:00:01:3a (oui Unknown) > 02:00:c2:00:01:3b (oui 
Unknown), ethertype IPv4 (0x0800), length 98: 141.202.59.44 > 141.202.59.45: 
ICMP echo request, id 2040, seq 1, length 64

Second one on the bridge (TOMRH61):

03:21:01.089304 02:00:c2:00:01:3a (oui Unknown) > 02:00:c2:00:01:3b (oui 
Unknown), ethertype IPv4 (0x0800), length 98: 141.202.59.44 > 141.202.59.45: 
ICMP echo request, id 2040, seq 1, length 64

And the third one at the interface of the destination guest (AL07000A):

03:21:01.090045 02:00:c2:00:01:3a (oui Unknown) > 02:00:c2:00:01:3b (oui 
Unknown), ethertype IPv4 (0x0800), length 98: 141.202.59.44 > 141.202.59.45: 
ICMP echo request, id 2040, seq 1, length 64

As a sanity check, I saved the output of CP Q VSWITCH ACT for both vswitches
and verified that ALBL07 (private vswitch) does not have 02-00-C2-00-01-3B
registered and that INTRA59 (public vswitch) does not have 02-00-C2-00-01-3A
registered:

+ grep 02-00-C2-00-01-3A dumps/private_switch.act
  02-00-C2-00-01-3A IP: 141.202.59.44
+ grep 02-00-C2-00-01-3B dumps/private_switch.act
+ grep 02-00-C2-00-01-3A dumps/public_switch.act
+ grep 02-00-C2-00-01-3B dumps/public_switch.act
  02-00-C2-00-01-3B IP: 141.202.59.45

As a second sanity check, I revoked AL07000A's access to all vswitches except
INTRA59. That left the Linux machine with a single active NIC with MAC
02-00-C2-00-01-3B (I used another vswitch for SSH access). I was still able
to ping in that scenario.

From this I think that sending frames with MAC addresses unknown to a vswitch
is possible and that the vswitch will not block them. A frame with source MAC
address 02-00-C2-00-01-3A, which is only known on ALBL07, gets received by a
guest connected only to INTRA59, which does not know the source MAC address.
I do not think we have duplicate MAC addresses anywhere on the z/VM system.

The bridge software I use is this one:
http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
which is described there as implementing a subset of ANSI/IEEE 802.1d.

I hope we are not developing on top of a bug exploit. Let me know if I should 
add more details about the setup.

Thanks,
Tomas



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-20 Thread Pavelka, Tomas
> so you must be performing MAC address translation such that you look more like
> a layer 2 router (a la OSA in layer 3 mode), not an 802.1d bridge (I said
> 802.3 earlier; I meant 802.1d.)  That is, all guests on the PUBLIC vswitch
> have the same MAC address as viewed by all hosts on the PRIVATE vswitch (and
> vice versa).

I don't think we do any translation. But I want to make sure that what I
think is actually consistent with reality ;-) I will rerun the experiment and
post details of the vswitch settings and tcpdump captures showing the MAC
addresses at various points in the communication.

Thanks,
Tomas



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-20 Thread Pavelka, Tomas
> Further, the VSWITCH is already acting as an IEEE 802.3 layer 2 bridge and 
> its filtering database will drop unicast frames destined for unknown MAC 
> addresses.

One thing I forgot to mention: we have successfully sent packets between two
vswitches connected by a Linux bridge (LINUX1 and LINUX2 communicate in the
example below). But we needed to put the Linux bridge into promiscuous mode on
both of the bridged vswitches.

(LINUX1) -  - (LINUXBR) -  - (LINUX2)



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-20 Thread Pavelka, Tomas
> What is LINUXBR doing for you that the VSWITCH cannot do for you?

We are in the business of porting software that works on top of a Linux
bridge. We have a kernel driver that hooks into the Linux bridge and filters
layer 2 frames based on rules. We got it working on Xen, VMware and inside
z/VM (Linux bridging between vswitches). The OSA problem caught us by
surprise, namely the fact that the OSA behaves differently from a vswitch.
Otherwise we would have discovered this sooner.



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-20 Thread Pavelka, Tomas
Another question that comes to mind: if there is negotiation with the OSA, how
does Linux tell that a real OSA is involved? My assumption (which may be
false ;-)) was that Linux as a z/VM guest should not be able to tell whether a
NIC is real or virtual. In our case the NIC is always virtual, because we do
not connect directly to the OSA; we go through a vswitch. The bridging works
as long as the traffic does not go through the OSA card, so somehow the Linux
guest is able to pass frames with MAC addresses it does not own. These frames
are only dropped if they go towards the OSA. I was not able to tell whether
they are dropped by the OSA or by the vswitch that connects to it. But the
same vswitch passes the bridged packets without problems to other Linux
machines inside the box...



Re: Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-20 Thread Pavelka, Tomas
Thanks, this means a big change to our plans ;-) Do you know if there are any 
public docs (or source code) that we could look at to understand how the 
negotiation works?

Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Carsten 
Otte
Sent: Thursday, February 20, 2014 9:35 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Layer 2 frames passing through a Linux bridge get dropped before 
leaving the mainframe box

This setup won't work, because Linux negotiates its mac address with the OSA, 
and cannot send frames from another mac. You could use ip forwarding, and have 
Linux route on layer 3. This should work, as long as you use the OSA in layer 2 
mode.



Layer 2 frames passing through a Linux bridge get dropped before leaving the mainframe box

2014-02-20 Thread Pavelka, Tomas
We have a problem where frames that pass through a Linux bridge do not reach 
the gateway outside of the mainframe box. We have set up an experiment that 
reproduces the problem, which looks like this:

(LINUX1) -  - (LINUXBR) -  - OSA - gateway

The problem is that in this setup we cannot ping the gateway. But in a
different setup:

(LINUX1) -  - (LINUXBR) -  - (LINUX2)

LINUX1 and LINUX2 can communicate with each other. Moreover, LINUX2 can ping
the gateway (the OSA card is still connected to the public vswitch; I just did
not put it in the picture).

Some more details that may be important:
- Both the public and the private vswitch are layer 2
- LINUXBR runs RHEL 6 and uses bridge-utils to create the bridge
- private vswitch is not connected to any OSA card

We have experimented with tcpdump and found that ARP (broadcast) packets do
reach the gateway and come back, but ping's ICMP (unicast) packets get
dropped. This led us to the following hypothesis: if a unicast packet
originates from a MAC address not known to the public vswitch, it gets dropped
somewhere on the way between LINUXBR and the gateway.
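The hypothesis can be stated as a small predicate. Broadcast frames, such as the ARP requests, are not affected. Unicast frames, such as the ICMP echo requests, are dropped when their source MAC is unknown to the public vswitch. The C sketch below encodes only that hypothesis and does not describe actual vswitch or OSA behaviour. The names are hypothetical, and so is the list of "known" MACs standing in for the vswitch's view.

```c
#include <stdbool.h>
#include <string.h>

enum frame_kind { FRAME_UNICAST, FRAME_MULTICAST, FRAME_BROADCAST };

/* Classifies a frame by destination MAC: all-ones is broadcast, otherwise
 * the I/G bit (lowest bit of the first octet) marks group addresses. */
enum frame_kind classify_dst(const unsigned char dst[6])
{
    static const unsigned char bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
    if (memcmp(dst, bcast, 6) == 0)
        return FRAME_BROADCAST;
    return (dst[0] & 0x01) ? FRAME_MULTICAST : FRAME_UNICAST;
}

/* The hypothesis as a predicate: drop only unicast frames whose source MAC
 * is not in the vswitch's known list. */
bool hypothesis_predicts_drop(const unsigned char src[6],
                              const unsigned char dst[6],
                              const unsigned char (*known)[6], size_t n_known)
{
    if (classify_dst(dst) != FRAME_UNICAST)
        return false;
    for (size_t i = 0; i < n_known; i++)
        if (memcmp(src, known[i], 6) == 0)
            return false;
    return true;
}
```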

Does anyone know of any settings that could affect filtering done either by
the vswitch or by the OSA card? We asked our hardware people, but they did not
know of anything that could cause the problem. A more targeted question might
help, if we knew what to ask.

Any debugging tips will be much appreciated.

Thanks,
Tomas


Re: Running RTC on Linux under z/VM

2014-02-19 Thread Pavelka, Tomas
Are you planning to run the RTC server or clients? I have no experience with
the server, but I have been running a client for a few years. For the client,
only the command line runs on zLinux (i.e. no Eclipse). The CLI runs fine with
512 MB of memory.

Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of BESSETTE, 
JOHN
Sent: Wednesday, February 19, 2014 7:30 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Running RTC on Linux under z/VM

Is anyone running RTC (Rational Team Concert) on a Linux server running under 
z/VM?  If so what are the memory and IFL requirements for these servers.

Thanks,

John J. Bessette
Team Lead,  z/OS Mainframe Systems
State Employees' Credit Union
Office  919-839-7257
Fax  919-832-0429
john.besse...@ncsecu.org

This email may contain confidential and privileged material for the sole use of 
the intended recipient. If you are not the intended recipient, please contact 
the sender and delete all copies. Any review or distribution by others is 
strictly prohibited. Personal emails are restricted by policy of the State 
Employees' Credit Union (SECU).  Therefore SECU specifically disclaims any 
responsibility or liability for any personal information or opinions of the 
author expressed in this email.



Re: Virtualization Cookbook - first draft hopefully soon?

2013-09-04 Thread Pavelka, Tomas
> Does anyone know a freeware graphics tool that could somehow reverse-video, 
> or just brighten these graphics?  They are all GIFs (pronounced JIFs :))

How about ImageMagick? 
http://www.imagemagick.org/Usage/color_basics/#replace

RPMs exist for s390x; at least on RHEL they are part of the distro.



Re: DASD format from Linux only

2013-03-14 Thread Pavelka, Tomas
> One error message is ignorable. Hundreds are a problem that should be 
> fixed.

I just learned this the hard way... I started playing with raw_track_access,
noticed the large number of I/O errors reported, but paid them no mind and made
it work: I wrote a CKD track that was not recognized as a known format, and
that fixed my problem. Then I started more large-scale testing and found that
while I can write the track to a minidisk on one real device, other devices
would not let me write it. I noticed that the vendor (as reported in /sys) of
the good device is IBM, while the one I cannot write to is HTC. And again I
have a large wall of I/O errors and sense data...
I'm on the verge of giving up on driving all this from Linux. We already have
a working directory manager user exit that puts a label on every new disk to
let Linux know that the disk is not formatted in any known format. That takes
care of the problem.

The problem is that when Linux sets a disk online, it analyzes the disk to
determine whether it is in a known format. If it finds a known format, it
starts reading all over the disk. If there are errors, like bad cylinders in
the count field, all kinds of bad things can happen. There is no way to bring
the disk online without this analysis, which means that Linux has no way to
safely format its own disks.
If there were a way to tell the kernel to bring the device online without the
analysis, the problem would be solved. There is a state called "unformatted"
that the disk goes into if the kernel can't identify a known label. I would
like a way to explicitly set a disk online in the unformatted state. For
example, writing 1 to /sys/bus/ccw/devices//online would bring the disk
online with format detection, and writing 2 would bring it online without
detection, ending up in the "unformatted" state. A user who intends to format
a disk with potentially unsafe data on it could then simply bring it online
unformatted. (By unsafe I mean data that may be partially in a good format and
partially in a bad one.)

Tomas



Re: DASD format from Linux only

2013-03-14 Thread Pavelka, Tomas
The main problem is not the huge number of errors in the syslog. It is the
contention caused by the Locate Record CCWs that end in error. I have seen
this happen whenever there are count fields with the wrong cylinder anywhere
on the disk and the first cylinder of the disk is in a recognizable format.
The only fast way I have found to prevent Linux from issuing Locate Records
all over the device when the device is brought online is to write data to the
first track of the first cylinder so that it is no longer recognizable as a
format known to Linux. Linux then treats the disk as unformatted, and the
contention problem goes away.
One thing I did not explain is why I insist on doing the format from Linux. I
have a user interface that reports the progress of a long format to the user,
and the infrastructure for this reporting is in Linux. There isn't time to
port it to CMS. I can format from CMS, but only as long as the format is
quick. If I CMS-format whole disks, I can't report progress (within my
existing infrastructure).

Tomas


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Mark Post
Sent: Wednesday, March 13, 2013 9:05 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: DASD format from Linux only

>>> On 3/13/2013 at 12:20 PM, Stefan Haberland  wrote: 
> Hi Tomas,
> 
> I have a possible solution for you from within Linux. You can set the 
> device online with raw_track_access enabled.
> 
> $ echo 1 > /sys/bus/ccw/devices/0.0./raw_track_access
> $ chccwdev -e 
> 
> Please ignore the few Buffer I/O errors in syslog.

I can't say this made a huge amount of difference.  For a 1,000 cylinder TDISK, 
I got 572 lines of output from bringing a mis-formatted disk online.  Using the 
raw_track_access, I got 444 lines of output.  I don't know if the number of 
messages is proportional to the number of cylinders or not, but I suspect it 
is.  So, either method is going to be generating a lot of output regardless.  I 
think the best method is to CPFMTXA at _least_ cylinder 0 before giving it to a 
guest.  It really should be the entire volume, or use the DIRMAINT function to 
erase things for you.


Mark Post



Re: DASD format from Linux only

2013-03-13 Thread Pavelka, Tomas
Thanks Stefan,
this is actually a path I was investigating today, except that I took a
slightly different approach. I noticed that the Linux driver looks at the
sizes of the first records on the track to check whether the disk is in a
recognizable format. Specifically, it checks whether the key is 4 bytes long.
So I accessed the device with raw_track_access and wrote the first track with
a single record with a six-byte key. I then set it offline, switched off
raw_track_access, set it online again, and got a nice "The DASD is not
formatted" error in the syslog. The device was immediately ready for
formatting.
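The kind of check described above can be sketched as follows. This is a simplified model, loosely based on the format detection in the Linux DASD ECKD driver (drivers/s390/block/dasd_eckd.c). The real code also compares cylinder, head and record numbers and data lengths, so treat the details, including the LDL branch, as assumptions. A first record with a 6-byte key matches neither layout, which is consistent with the "not formatted" result reported above.

```c
#include <stddef.h>

/* Simplified stand-in for one record's count field on track 0. */
struct count_field {
    unsigned key_len;   /* KL */
    unsigned data_len;  /* DL */
};

enum layout { LAYOUT_CDL, LAYOUT_LDL, LAYOUT_UNKNOWN };

/* Simplified model: three leading records with 4-byte keys look like CDL;
 * five leading keyless records with equal data lengths look like LDL;
 * anything else is treated as not formatted. */
enum layout guess_layout(const struct count_field *rec, size_t n)
{
    size_t i;

    if (n >= 3) {
        for (i = 0; i < 3; i++)
            if (rec[i].key_len != 4)
                break;
        if (i == 3)
            return LAYOUT_CDL;
    }
    if (n >= 5) {
        for (i = 0; i < 5; i++)
            if (rec[i].key_len != 0 || rec[i].data_len != rec[0].data_len)
                break;
        if (i == 5)
            return LAYOUT_LDL;
    }
    return LAYOUT_UNKNOWN;
}
```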

The only thing that bothers me is the large number of I/O error messages in
the syslog when you access the device with raw_track_access. We use the
syslog for other errors as well, so I'm afraid of the log being rotated away
too quickly. But if I remember correctly, I saw those errors even when running
a normal dasdfmt, so this is nothing new.

Tomas

-Original Message-
From: Stefan Haberland [mailto:s...@linux.vnet.ibm.com] 
Sent: Wednesday, March 13, 2013 5:20 PM
To: Linux on 390 Port
Cc: Pavelka, Tomas
Subject: Re: DASD format from Linux only

Hi Tomas,

I have a possible solution for you from within Linux. You can set the device 
online with raw_track_access enabled.

$ echo 1 > /sys/bus/ccw/devices/0.0./raw_track_access
$ chccwdev -e 

Please ignore the few Buffer I/O errors in syslog.

Afterwards you can format the device using dasdfmt. It was not intended to
work this way, but it does; I just checked on my system. Since the disk is in
raw_track_access mode, dasdfmt cannot write the volume label to the disk.
So dasdfmt will fail with the following error message:

Finished formatting the device.
dasdfmt: Writing the bootstrap IPL1 failed, only wrote -1 bytes.

But the disk is formatted correctly. When you set the disk online afterwards 
without raw_track_access you can call fdasd on the device. It will ask you if 
it should write the missing volume label to the disk.

$ fdasd /dev/dasde
reading volume label ..: no known label
Should I create a new one? (y/n): y

With these steps done, the disk should be ready for use.

Regards,
Stefan



Re: DASD format from Linux only

2013-03-13 Thread Pavelka, Tomas
Thanks for the replies; here are my thoughts on the problem:

I agree that before a minidisk is given to a guest (before the guest is started 
for the first time) the minidisk needs to be formatted and any data that was 
previously on the disk erased. The question is, when to do it and from which OS.
I can create both the new guest and the minidisk from Linux (via SMAPI), but I
cannot safely format the disk from Linux, because I cannot safely bring the
disk online for formatting. By unsafe I mean that bringing the disk online can
create contention on the real device that can last several minutes.

There are several possibilities I can think of:
1) Format every newly created disk in CMS before formatting it in Linux. Directory 
maintenance products can do this. This means every disk would be formatted 
twice and every new disk creation would take twice as long (unless you stay 
with the CMS format and do not use CDL at all).
2) Do a security erase on every deleted disk. Again, directory managers can do 
this, but the setting is optional. If you rely on this, you have to apply it 
rigorously to the entire DASD pool on which minidisks are created. A single 
deletion without security erase can potentially cause trouble.
3) Write nonsense data to the first tracks of the disk so that Linux would not 
recognize it as a known format and would not go into loops when the data on the 
disk is not right. Similar to 1) but faster.

After this, it is safe to format a disk with CDL from Linux.

As Mark has suggested, I need the ability to format the disk from Linux without 
needing to put it online first with Linux examining the contents. Without this, 
the CDL format is incomplete as it can only be safely applied to an already 
formatted disk.

As for the security question about Linux running on LPAR with disk shared by 
z/OS: what makes Linux different from other platforms? If Linux is not used to 
format disks, there must be another OS that has the ability to wipe out any of 
the shared disks and the person doing the format must know which disk they are 
formatting. Also, we are talking about security in the sense of preventing 
accidental deletion. A malicious user with access to a Linux system that shares 
disks with z/OS can harm the shared disks by using raw_track_access unless the shared 
disks are protected against access from Linux. (As long as the attacker knows 
CKD architecture. As I have recently learned, you cannot just redirect 
/dev/zero to the disk in raw track format ;-))

Tomas


Re: DASD format from Linux only

2013-03-13 Thread Pavelka, Tomas
> Excuse my ignorance, but what is "FBAF"?

This is a minidisk definition in the user directory:
MDISK FBAF 3390 4819 1000 VMBL2H WR
FBAF is the virtual address of the disk.

When we first ran into this, we were doing experiments and weren't sure whether 
we would be able to reproduce the problem. But we had one minidisk where this 
always happened. Multiple people were involved, so someone suggested the virtual 
address FBAD so that we knew which minidisk was the bad one. Further experiments 
led to more virtual addresses being used; one of them was FBAF, which I copied 
into my question. I did not realize it would confuse people, otherwise I would 
have just used something more normal, like 0100 ;-)

Tomas


DASD format from Linux only

2013-03-12 Thread Pavelka, Tomas
We have been trying to format all minidisks from Linux only and this turned out 
to be problematic. I am looking for a solution that would let us stay in Linux 
without having to involve CMS format for every new minidisk. Let me first 
describe the problem:
When there is a record on the dasd that has an incorrect cylinder number in its 
count area, this leads to "record not found" errors when the dasd is brought 
online. Since the dasd needs to be online before the problem can be fixed (by 
formatting), the only way around it that I can see is to preformat in CMS.
If new minidisks are regularly formatted and destroyed, it is possible to run 
into a situation where part of the disk has the correct format and part has the 
wrong cylinder number in the count area.

Here is a way to reproduce:


1) Create a minidisk and format it with CDL, e.g.

MDISK FBAF 3390 4819 1000 VMBL2H WR

2) Delete it and create a minidisk starting at the next cylinder, but half the 
size of the first one:

MDISK FBAF 3390 4820 500 VMBL2H WR

3) Format it with CDL

4) Delete the disk and create a new one, spanning the first disk except for the 
first cylinder:

MDISK FBAF 3390 4820 999 VMBL2H WR

This will create a disk that has the first half correct, but the rest of the 
disk has the cylinders off by one in the count area.

5) Link it from Linux, and put it online
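Why step 4 leaves the second half of the disk wrong can be seen by simulating the three definitions. The bash sketch below assumes the model implied by the description above: each CDL format writes the virtual cylinder number (real cylinder minus the minidisk's start cylinder) into the count areas of every real cylinder it covers, and a later format only overwrites the cylinders it covers. It illustrates the arithmetic; it does not read a real disk, and the function names are hypothetical.

```bash
#!/bin/bash
# Model (assumption): a CDL format writes the virtual cylinder number
# (real cylinder - minidisk start) into the count areas of each cylinder.
declare -a recorded=()   # real cylinder -> virtual cylinder last written

# format_mdisk START SIZE: simulate a CDL format of one minidisk
format_mdisk() {
    local start=$1 size=$2 c
    for (( c = start; c < start + size; c++ )); do
        recorded[c]=$(( c - start ))
    done
}

# check_mdisk START SIZE: print runs of virtual cylinders whose recorded
# cylinder number differs from the expected one by the same amount
check_mdisk() {
    local start=$1 size=$2 v from=0 cur delta
    cur=$(( recorded[start] ))
    for (( v = 1; v <= size; v++ )); do
        (( v < size )) && delta=$(( recorded[start + v] - v ))
        if (( v == size || delta != cur )); then
            printf 'virtual cyl %d-%d (real %d-%d): off by %d\n' \
                "$from" $(( v - 1 )) $(( start + from )) $(( start + v - 1 )) "$cur"
            from=$v
            cur=$delta
        fi
    done
}

format_mdisk 4819 1000   # step 1: MDISK FBAF 3390 4819 1000 VMBL2H WR
format_mdisk 4820 500    # steps 2-3: MDISK FBAF 3390 4820 500 VMBL2H WR
check_mdisk 4820 999     # step 4: MDISK FBAF 3390 4820 999 VMBL2H WR
# prints:
# virtual cyl 0-499 (real 4820-5319): off by 0
# virtual cyl 500-998 (real 5320-5818): off by 1
```

"Off by 1" here means the count area records cylinder N+1 where the minidisk from step 4 expects cylinder N, left over from the format in step 1.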



When the disk is put online, a large number of "record not found" errors appear 
in the syslog. On some of our real devices, the errors appear in less than a 
second and the device can be formatted. On other real devices, the errors 
appear in the course of several minutes (highest I have observed was about 25 
minutes). While the errors appear, the device is not usable and cannot be put 
offline.



Why I think this is a problem (beyond cluttered syslog):

- The device cannot be put offline until the errors stop appearing. Sometimes 
dasdfmt with --force stops this, but only as long as the device is present in 
/dev which is not always the case.

- While the errors appear, there is contention on the real device where the 
minidisk is located. Any other Linux guests running from the real device become 
next to unusable.



There is a fix in the newer kernels that deals with a similar problem:

http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=3bc9fef9cc1e4047c3a3c51d84cc1c5d2ef03cea



I have tested it, and it seems that the initial check is made on the first few 
cylinders only; if the count errors are further towards the end of the disk, 
the problem is still present.

Here is an example of the "record not found" error:

Mar 12 05:52:10 kernelts kernel: dasd-eckd 0.0.fbaf: The specified record was not found
Mar 12 05:52:10 kernelts kernel: dasd(eckd): I/O status report for device 0.0.fbaf:
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): in req: 1ba4dcf0 CC:00 FC:04 AC:00 SC:17 DS:02 CS:20 fcxs:01 schxs:02 RC:0
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): device 0.0.fbaf: Failing TCW: 1ba4de40
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->length 64
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->flags d1
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->dcw_offset 0
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->count 4096
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): residual 4068
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_time 81
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.def_time 0
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.queue_time 0
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_busy_time 0
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_act_time 0
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex)  0- 7: 00 08 00 00 45 e6 3e 00
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex)  8-15: 00 00 00 00 00 00 00 04
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 16-23: e5 11 6a 27 85 00 0f 00
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 03 e6 0e
Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP
Mar 12 05:52:10 kernelts kernel: Buffer I/O error on device dasdd, logical block 179819
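The logical block in the last log line can be mapped back to a cylinder with simple arithmetic. The bash sketch below assumes a 3390 formatted with 4 KiB blocks (12 blocks per track, 15 tracks per cylinder), uniform numbering from block 0 on the whole device, and records counted from 1; none of these values is printed in the log itself, and the function name is hypothetical.

```bash
#!/bin/bash
# Assumed 3390 geometry with 4 KiB blocks.
BLOCKS_PER_TRACK=12
TRACKS_PER_CYL=15

# block_to_cyl_trk BLOCK: print cylinder, head (track in cylinder), record
block_to_cyl_trk() {
    local block=$1
    local track=$(( block / BLOCKS_PER_TRACK ))
    printf 'cylinder %d, head %d, record %d\n' \
        $(( track / TRACKS_PER_CYL )) \
        $(( track % TRACKS_PER_CYL )) \
        $(( block % BLOCKS_PER_TRACK + 1 ))
}

block_to_cyl_trk 179819
# prints: cylinder 998, head 14, record 12
```

Under those assumptions the 999-cylinder minidisk from step 4 holds 999 * 15 * 12 = 179820 blocks, so block 179819 would be its very last block, on cylinder 998, which lies in the part of the disk whose count areas are off by one.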

Let me know if I need to supply any more information. Also, can anyone think of 
a reason why on some real devices the errors appear in seconds and on others it 
takes such a long time?

Thanks,
Tomas

Tomas Pavelka
CA Technologies
Sr Software Engineer
Tel:  +420226207796
tomas.pave...@ca.com


Re: Using dasd_configure in SLES11 SP2

2013-02-18 Thread Pavelka, Tomas
If it helps, I have done the following experiment: I added a new minidisk to the 
guest with LVM and ext2 on it at vaddr 204 and added the following udev rules:

/etc/udev/rules.d/51-dasd-0.0.0204.rules
ACTION=="add", SUBSYSTEM=="ccw", KERNEL=="0.0.0204", IMPORT{program}="collect 0.0.0204 %k 0.0.0204 dasd-eckd"
ACTION=="add", SUBSYSTEM=="drivers", KERNEL=="dasd-eckd", IMPORT{program}="collect 0.0.0204 %k 0.0.0204 dasd-eckd"
ACTION=="add", ENV{COLLECT_0.0.0204}=="0", ATTR{[ccw/0.0.0204]online}="1" ATTR{[ccw/0.0.0204]readonly}="0"

The volume group was active and mountable after boot and did not require 
recreating the initrd. What it needed was to have LVM support in the initrd in 
the first place, but once it works for one disk (203 in my case), additional 
ones can be added without the need to change the initrd.
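If you need more than a couple of these files, they can be generated. The bash function below is a hypothetical helper that writes a rules file following the pattern of the 0.0.0204 example above for any bus ID, with the readonly value as a parameter. It assumes the dasd-eckd discipline, as in the example, separates the two ATTR keys with a comma, and only writes the file; it does not trigger udev.

```bash
#!/bin/bash
# write_dasd_rule BUSID [READONLY] [RULES_DIR]  (hypothetical helper)
# Writes RULES_DIR/51-dasd-BUSID.rules and prints the file name.
write_dasd_rule() {
    local busid=$1 ro=${2:-0} dir=${3:-/etc/udev/rules.d}
    local file="$dir/51-dasd-$busid.rules"
    # %k is left literal for udev; $busid and $ro are expanded by the shell.
    cat > "$file" <<EOF
ACTION=="add", SUBSYSTEM=="ccw", KERNEL=="$busid", IMPORT{program}="collect $busid %k $busid dasd-eckd"
ACTION=="add", SUBSYSTEM=="drivers", KERNEL=="dasd-eckd", IMPORT{program}="collect $busid %k $busid dasd-eckd"
ACTION=="add", ENV{COLLECT_$busid}=="0", ATTR{[ccw/$busid]online}="1", ATTR{[ccw/$busid]readonly}="$ro"
EOF
    printf '%s\n' "$file"
}
```

For example, `write_dasd_rule 0.0.0205 1` would create /etc/udev/rules.d/51-dasd-0.0.0205.rules for a disk linked R/O.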

Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Leland 
Lucius
Sent: Monday, February 18, 2013 9:53 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Using dasd_configure in SLES11 SP2

This would have been nice, but it doesn't work.  :-(

I will prevail though!  :-)

Leland

On 2/17/2013 11:36 PM, Leland Lucius wrote:
> An addendum to my previous post...
>
> Ran into an issue this weekend where we'd added a new disk to the root VG of 
> a system and that system wouldn't boot because we didn't recreate the initrd. 
>  Yes, really!  We actually had to recreate the initrd just because we added a 
> disk to the root VG.

>
> Being the non-conformist that I am, I couldn't abide that requirement, so a 
> small script and 3 commands later, I now have that problem solved.
>
> Save this script as /lib/mkinitrd/scripts/setup-dasd_all.sh:
>
> <>
> #!/bin/bash
> #
> #%stage: device
> #
>
>  cat > "$tmp_mnt/etc/udev/rules.d/51-dasd-all.rules" <<EOF
> ACTION=="add", SUBSYSTEM=="ccw", DRIVER=="dasd-eckd", ATTR{online}="1"
> ACTION=="add", SUBSYSTEM=="ccw", DRIVER=="dasd-fba", ATTR{online}="1"
> ACTION=="add", SUBSYSTEM=="ccw", DRIVER=="dasd-diag", ATTR{online}="1"
> EOF
>
>  verbose "[DASD] All DASD devices"
> <>
>
> Tell mkinitrd to use it: (all one line...watch for wrappage)
>
> ln -s ../scripts/setup-dasd_all.sh /lib/mkinitrd/setup/$(basename /lib/mkinitrd/setup/*-dasd.sh | cut -f 1 -d -)-dasd_all.sh
>
> Recreate the initrd:
>
> mkinitrd -v
>
> You should see a line that says:
>
> [DASD] All DASD devices
>
> Now run zipl: (however you normally do it)
>
> zipl
>
> And now you're protected from missing disks in your root VG.
>
> Leland
>


Re: Allocating DASD to new users.

2013-02-14 Thread Pavelka, Tomas
A minidisk is a contiguous range of cylinders on a real disk that the guest 
sees as a virtual disk. For example, here is what this statement means:
MDISK 700 3390 0001 10016 MM2101

700 - this is the virtual address of the minidisk as seen by the Linux guest
3390 - this is the type of the real disk that the minidisk is on
0001 - The minidisk starts at cylinder 1 of the real disk on which it is located
10016 - The minidisk is 10016 cylinders long
MM2101 - this is the volume serial of the real disk that the minidisk is on.
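The field list above translates directly into a cylinder range: the last cylinder is start + size - 1. The bash sketch below (hypothetical helper names) parses the simple start/size form of the statement and checks whether two minidisks on the same volume share any cylinder, which is the overlap to avoid when allocating new ones. V-DISK and other statement forms are not handled.

```bash
#!/bin/bash
# mdisk_extent "MDISK vaddr devtype start size volser ...":
# print "volser first_cyl last_cyl" (only the start/size form is handled)
mdisk_extent() {
    local kw vaddr devtype start size volser rest
    read -r kw vaddr devtype start size volser rest <<< "$1"
    # 10# forces base 10, so leading zeros (0001) are not read as octal
    printf '%s %d %d\n' "$volser" $(( 10#$start )) $(( 10#$start + 10#$size - 1 ))
}

# mdisk_overlap STMT1 STMT2: succeed if both minidisks share a real cylinder
mdisk_overlap() {
    local v1 a1 b1 v2 a2 b2
    read -r v1 a1 b1 <<< "$(mdisk_extent "$1")"
    read -r v2 a2 b2 <<< "$(mdisk_extent "$2")"
    [[ $v1 == "$v2" ]] && (( a1 <= b2 && a2 <= b1 ))
}

mdisk_extent "MDISK 700 3390 0001 10016 MM2101"    # prints: MM2101 1 10016
```

For example, `mdisk_overlap "MDISK 191 3390 0001 150 MM2001" "MDISK 191 3390 0301 150 MM2001"` fails (cylinders 1-150 and 301-450 do not touch), so a directory manager-free allocation script could refuse any new extent for which it succeeds.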

More info about the MDISK directory statement is here:

http://publib.boulder.ibm.com/infocenter/zvm/v6r2/topic/com.ibm.zvm.v620.hcpa5/dmdisk.htm?resultof=%22%6d%64%69%73%6b%22%20

If you do not have a directory maintenance product (e.g. DIRMAINT or 
VM:Director), then in the general case you have to find gaps in the free space 
by hand. But in your case it looks like your minidisks span whole real disks 
(but I'm just guessing from the size):
MDISK 700 3390 0001 10016 MM2101
MDISK 701 3390 0001 10016 MM2201
So you may have an easier job of finding free real disks and assigning them 
from cyl 1 to the end as minidisks for the Linux guests.

HTH 
Tomas


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Cameron 
Seay
Sent: Friday, February 15, 2013 4:07 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Allocating DASD to new users.

I am trying to copy either of these guests and replicate them for a class I am 
teaching:

USER LINUX001 LINUX001 1G 4G G
INCLUDE LNXDFLT
CPU 00
CPU 01
NICDEF F000 TYPE QDIO DEV 3 LAN SYSTEM VSWITCH1
MDISK 191 3390 0001 150 MM2001
MDISK 700 3390 0001 10016 MM2101
MDISK 701 3390 0001 10016 MM2201
MDISK 900 FB-512 V-DISK 409600 MR
*
**
*
USER LINUX002 LINUX002 1G 4G G
INCLUDE LNXDFLT
CPU 00
CPU 01
NICDEF F000 TYPE QDIO DEV 3 LAN SYSTEM VSWITCH1
MDISK 191 3390 0301 150 MM2001
MDISK 700 3390 0001 10016 MM2104
MDISK 701 3390 0001 10016 MM2204
MDISK 900 FB-512 V-DISK 409600 MR

My question: Disks 191 and 900 are exactly the same for both users. For disks 
700 and 701, the disks where Linux is going to live, the disks are slightly 
different in their labeling: 700 is MM2101 for user LINUX001 and MM2104 for 
user LINUX002, etc. I know how to use DISKMAP somewhat. But I'm not completely 
clear about how to apportion DASD for these users. I sure don't want to 
overwrite an address and I know DISKMAP will warn you about overlaps but I 
would hate to mess up the addressing. I was with the VM sysprog when he defined 
these, but I forgot exactly what I'm supposed to do. Do I find more minidisks 
labeled MM and assign them sequentially in DISKMAP? I have never allocated 
DASD before. I am going to need about 30-40 guests and I have the DASD for it.

Thanks!


--
Cameron Seay, Ph.D.
Department of Computer Systems Technology
School of Technology
NC A & T State University
Greensboro, NC
336 334 7717 x2251


Re: Using dasd_configure in SLES11 SP2

2013-02-14 Thread Pavelka, Tomas
Hi Leland,
I did, I'm just far away timezone-wise ;-) It did not occur to me that you can 
do tests in udev rules. What we ended up doing was to access the file system 
before IPLing Linux and drop in the individual dasd rules. We did this 
because we were already accessing the fs for other tasks and because we knew in 
advance what should be R/O and what R/W. But your way is useful when you cannot 
afford to access the fs before every IPL.

Thanks,
Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Leland 
Lucius
Sent: Thursday, February 14, 2013 1:58 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Using dasd_configure in SLES11 SP2

Hi Tomas,

Did you catch this line in my other reply?

# Set them to readonly if linked R/O
ACTION=="add", SUBSYSTEM=="ccw", DRIVER=="dasd-eckd", PROGRAM="/bin/sh -c '/sbin/modprobe vmcp;/sbin/vmcp q v dasd|grep ${DEVPATH##*.}|grep -q R/O'", ATTR{readonly}="1"

Again...make sure it's all on one line.

Leland

On 2/13/2013 1:31 AM, Pavelka, Tomas wrote:
> Will your solution preserve read only attributes? I.e. if you bring all dasd 
> online with a single udev rule, will those linked as read only have the 
> correct read only attributes so the kernel knows that it cannot write to them?
>
> Example of what I mean by correct read only attributes:
>
>> vmcp q v dasd
> DASD 0200 3390 VMBL1V R/W        353 CYL ON DASD  8460 SUBCHANNEL = 0001
> DASD 0201 3390 VMBL1J R/O        683 CYL ON DASD  845C SUBCHANNEL = 0002
>
>> lsdasd
> Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
> ==============================================================================
> 0.0.0200   active      dasda     94:0    ECKD  4096   248MB     63540
> 0.0.0201   active(ro)  dasdb     94:4    ECKD  4096   480MB     122940
>
> We have run into the same problem you are describing and ended up making 
> individual rules for each dasd (e.g. 
> /etc/udev/rules.d/51-dasd-0.0.0200.rules) to preserve read only attributes. 
> But we are new to SUSE and haven't experimented with a single rule for all 
> dasd which is why I am curious.
>
> Tomas
>


Re: Using dasd_configure in SLES11 SP2

2013-02-12 Thread Pavelka, Tomas
Will your solution preserve read only attributes? I.e. if you bring all dasd 
online with a single udev rule, will those linked as read only have the correct 
read only attributes so the kernel knows that it cannot write to them?

Example of what I mean by correct read only attributes:

> vmcp q v dasd
DASD 0200 3390 VMBL1V R/W        353 CYL ON DASD  8460 SUBCHANNEL = 0001
DASD 0201 3390 VMBL1J R/O        683 CYL ON DASD  845C SUBCHANNEL = 0002

> lsdasd
Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.0200   active      dasda     94:0    ECKD  4096   248MB     63540
0.0.0201   active(ro)  dasdb     94:4    ECKD  4096   480MB     122940

We have run into the same problem you are describing and ended up making 
individual rules for each dasd (e.g. /etc/udev/rules.d/51-dasd-0.0.0200.rules) 
to preserve read only attributes. But we are new to SUSE and haven't 
experimented with a single rule for all dasd which is why I am curious.
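The read-only decision can be made from the `vmcp q v dasd` output shown above. The bash sketch below (hypothetical function name) compares the device number against the second column only, rather than grepping the whole line, so that a matching real device number or subchannel number elsewhere on the line cannot produce a false hit. It assumes the vmcp module is loaded and needs bash 4 for the upper-case conversion.

```bash
#!/bin/bash
# is_linked_ro DEVNO: read "vmcp q v dasd" output on stdin and succeed only
# if virtual device DEVNO (e.g. 0201 or 0.0.0201) is listed as R/O.
is_linked_ro() {
    local devno=${1##*.}   # 0.0.0201 -> 0201
    devno=${devno^^}       # compare in upper case, as CP prints it
    awk -v dev="$devno" '
        $1 == "DASD" && $2 == dev { found = 1; ro = ($5 ~ /^R\/O/) }
        END { exit !(found && ro) }'
}
```

For example, `vmcp q v dasd | is_linked_ro 0.0.0201 && echo "0.0.0201 is linked R/O"`. A device that is not listed at all is treated as not R/O.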

Tomas


Re: Speed of BASH script vs. Python vs. Perl vs. compiled

2013-02-04 Thread Pavelka, Tomas
The internal bash parameter expansion functions (e.g. ${line%% *}) tend to be 
quite inefficient. Here is one example comparing the performance of bash 
substitution to Perl substitution:

#!/bin/bash
comma_sep=$(perl -e 'for($i=0;$i<1000;$i++) { print("$i;") };')
time space_sep=${comma_sep//;/ }
time space_sep=$(echo "$comma_sep" | perl -pe 's/;/ /g')

On my machine, the times look like this:

Bash:
real    0m2.140s
user    0m2.049s
sys     0m0.002s

Perl:
real    0m0.007s
user    0m0.010s
sys     0m0.002s

I have been burned more than once by unexpected inefficiencies in bash. As others 
have said, Perl is very fast and reliable for large text manipulations. 

(The script is just an example to show the point. Replacing semicolons with 
spaces through substitution is not a good idea; transliteration will do the job 
faster.)
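The transliteration mentioned above can be done with tr(1), which maps characters one-for-one in a single pass instead of doing pattern substitution. The sketch below builds the same test string with printf instead of Perl; no timings are claimed here, so run it on your own system to compare. Inside Perl, the equivalent is tr/;/ / instead of s/;/ /g. The function name is hypothetical.

```bash
#!/bin/bash
# Same test string as above: "0;1;2;...;999;"
comma_sep=$(printf '%s;' {0..999})

# Transliteration: tr replaces every ';' with ' ' character by character.
semis_to_spaces() {
    printf '%s' "$1" | tr ';' ' '
}

time space_sep=$(semis_to_spaces "$comma_sep")
```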

HTH,

Tomas


Re: dasdfmt - why are you so darn slow?

2012-11-08 Thread Pavelka, Tomas
I have recently run a similar experiment measuring how fast various disk 
operations run. Although I haven't attempted to compare dd with dasdfmt, and 
the variation is high, here are the most commonly seen numbers:

dasdfmt + pvcreate : 30-40 MB/s
dd : 30-80 MB/s
dasdfmt + pvcreate, parallel pool of 20 workers formatting all physical disks 
in a large LVM group: 150-300 MB/s

After reading your results I have noticed that I have never seen dasdfmt run 
faster than 40 MB/s while I have regularly seen dd run at 80 MB/s.
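A parallel pool like the 20-worker setup above can be approximated with xargs -P. The bash sketch below is not the script used for those measurements: format_one is a hypothetical per-disk worker, and the dasdfmt, fdasd and pvcreate invocations (4 KiB blocks, the default CDL layout, one auto-created partition per disk) are assumptions to check against the man pages of your s390-tools level, since option spellings have changed between releases.

```bash
#!/bin/bash
# format_one /dev/dasdX: hypothetical per-disk worker (flags are assumptions)
format_one() {
    local dev=$1
    dasdfmt -b 4096 -y "$dev" &&   # 4 KiB blocks, no confirmation prompt
        fdasd -a "$dev" &&         # one partition spanning the whole disk
        pvcreate "${dev}1"         # LVM physical volume on that partition
}
export -f format_one

# run_pool WORKERS FUNC DEV...: run FUNC once per DEV, at most WORKERS at a time.
# FUNC must be an exported function or an executable.
run_pool() {
    local workers=$1 func=$2
    shift 2
    printf '%s\n' "$@" | xargs -P "$workers" -n 1 bash -c '"$0" "$1"' "$func"
}
```

For example, `run_pool 20 format_one /dev/dasdb /dev/dasdc /dev/dasdd`. GNU xargs exits with status 123 if any worker failed, so the caller can at least tell that something needs a rerun.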

Tomas


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Marcy 
Cortes
Sent: Thursday, November 08, 2012 2:05 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: dasdfmt - why are you so darn slow?

CA Hidro can copy one in 11 minutes.
DFDSS can dump one to VTS in 6 minutes according to my z/OS guy dumping our 
stuff  (I wouldn't think VTS would be quicker than DASD - maybe pretty similar) 
I haven't timed DDR but I don't think it is all that great either - certainly 
it is much worse than Hidro.

So yeah, I think both VM and Linux could be doing better here :)


Marcy 


-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Robert J 
Brenneman
Sent: Wednesday, November 07, 2012 4:15 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] dasdfmt - why are you so darn slow?

Additionally - why does Linux not make better use of the I/O subsystem ?

For example, a z/OS DSF copy job copying a dataset from one volume to another 
uses like 6% of a CPU at most, whereas Linux dd or cp uses 100% of a CPU, and 
doesn't go noticeably faster than the DSF job.

Could Linux make better use of CCWs to have the subsystem handle the copy 
rather than actually moving the data blocks through the main memory?

--
Jay Brenneman


Booting from LVM device with multiple physical volumes

2012-08-29 Thread Pavelka, Tomas
Hello,
Does anyone have experience with booting from an LVM device that spans multiple 
physical disks? This presentation:
https://share.confex.com/share/118/webprogram/Handout/Session10310/System%20z%20Roadmap%20SHARE%202012.pdf
says that it is possible:
"zipl bootloader supports device-mapper (LVM & multipath) devices. Installer 
now allows /boot these device"

I was even able to boot from a multi volume LVM device by specifying the base 
device parameters without using the zipl_helper.device-mapper helper script. 
However, if I try it with the helper script zipl_helper.device-mapper, it ends 
with this error:

"Error: Unsupported setup: Directory '$directory' is located on a multi-target 
device-mapper device"

Also, the "Device Drivers, Features and Commands" book says that it is not 
possible to boot from multiple volume device except for e.g. mirror setup (if 
my reading of the text is correct):

"For a mapping to multiple real devices all the real devices must share the 
device characteristics and contain the same data (for example, a mirror setup). 
The mapping can also be to parts of the devices as long as the parts contain 
block 0. The mapping must not combine multiple devices into one large device."

My understanding is that zipl does not understand the filesystem the kernel and 
initrd images are on (like GRUB does), but instead stores the offset at which 
the kernel image can be found (like LILO does). What happens when the initrd 
image ends up on one physical device and the kernel image on another? Or part 
on one device and part on another? Will zipl still be able to boot? My guess is 
that it won't, which makes booting from multi-disk LVM possible in some cases 
but generally unsafe (which I suspect is why the error message in the helper 
script is there). Is this a correct interpretation? It would be very useful for 
my project if I were proven wrong here.
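Whether the logical volume holding /boot spans more than one physical volume can at least be checked before running zipl. The bash sketch below counts the distinct PVs in the devices column of `lvs --segments`. It only answers that question; it does not show on which PV the kernel or initrd blocks actually sit. The function names are hypothetical, findmnt is an assumption (older systems may lack it), and lvs may need the VG/LV name rather than the /dev/mapper path.

```bash
#!/bin/bash
# count_pvs: read lvs "devices" values such as "/dev/dasdb1(0)" or
# "/dev/dasdb1(0),/dev/dasdc1(0)" on stdin; print the number of distinct PVs.
count_pvs() {
    tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/([0-9]*)[[:space:]]*$//' |
        grep -v '^$' | sort -u | wc -l
}

# boot_lv_pv_count: number of PVs behind the device that holds /boot
boot_lv_pv_count() {
    local src
    src=$(findmnt -n -o SOURCE --target /boot) || return 1
    lvs --noheadings --segments -o devices "$src" | count_pvs
}
```

For example, `n=$(boot_lv_pv_count) && (( n > 1 )) && echo "/boot spans $n PVs"` would flag the multi-target setup that the helper script rejects.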

Thanks,
Tomas


Re: Synchronous option for chccwdev -- was there a resolution?

2012-07-26 Thread Pavelka, Tomas
Is there a known reliable workaround? I have tried running dasdfmt in a script 
right after chccwdev and ran into the non-existent device problem. So I tried 
putting a loop after chccwdev that waits for the device to appear in /dev, like 
this:

while [[ ! -b $dev ]] ; do
  sleep 0.1
done

and then ran dasdfmt. That works most of the time, but still isn't reliable 
enough; especially on slow systems, I occasionally get 

"DASD format failed: dasdfmt: (format cylinder) IOCTL BIODASDFMT failed. 
(Input/output error)"

I know it is possible to rerun dasdfmt, or whatever command follows the 
chccwdev, but it would be much nicer to have some indicator that the device is 
really ready. Then I could just write a wrapper around chccwdev and forget 
about this problem. 
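One way to bound the loop above is to wait for udev to settle and then poll, with a timeout, for the sysfs online attribute and the block device. The sketch below assumes the sysfs layout /sys/bus/ccw/devices/BUSID/online and /sys/bus/ccw/devices/BUSID/block/NAME; wait_for and online_and_wait are hypothetical helpers. As described above, an existing block device is not always enough for dasdfmt, so this narrows the race rather than guaranteeing readiness.

```bash
#!/bin/bash
# wait_for TRIES CMD...: run CMD every 0.1 s, up to TRIES times; succeed
# as soon as CMD succeeds, fail after the last try.
wait_for() {
    local tries=$1
    shift
    while (( tries-- > 0 )); do
        "$@" && return 0
        sleep 0.1
    done
    return 1
}

# online_and_wait BUSID: set the device online and print its device node
# (sysfs paths are assumptions; 300 tries is about 30 seconds each)
online_and_wait() {
    local busid=$1 sysdev=/sys/bus/ccw/devices/$1 node
    chccwdev -e "$busid" || return 1
    udevadm settle                      # wait for queued udev events
    wait_for 300 grep -qx 1 "$sysdev/online" || return 1
    wait_for 300 test -d "$sysdev/block" || return 1
    node=/dev/$(ls "$sysdev/block")
    wait_for 300 test -b "$node" || return 1
    printf '%s\n' "$node"
}
```

For example, `dev=$(online_and_wait 0.0.0200) && dasdfmt -b 4096 -y "$dev"`; keeping a retry around dasdfmt is still advisable.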

Thanks,

Tomas

-Original Message-
From: Linux on 390 Port [mailto:LINUX-390@VM.MARIST.EDU] On Behalf Of Florian 
Bilek
Sent: Friday, July 27, 2012 6:28 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Synchronous option for chccwdev -- was there a resolution?

Hi David,

Thank you for bringing up this topic again. No, unfortunately there was no 
other solution than to rerun the commands.

I think there should be an option for chccwdev to wait till DE/CE is received 
and not to terminate with device busy.

Kind regards,
Florian

On Thu, Jul 26, 2012 at 6:52 PM, David Boyes  wrote:

> A week or two back, someone (I think it was Florian Bilek) asked why 
> there was a delay between invoking chccwdev and the device becoming 
> available, and whether there was an option or command that would exit 
> only when the device was actually available. There was some discussion 
> of the --settle option in udev, but I don't recall seeing a resolution 
> other than "loop on (test for device availability;sleep a few seconds) 
> repeat".
>
> Was there a better solution? If not, could the IBM developers add a 
> --sync option to chccwdev that forces chccwdev to wait until the 
> requested operation is actually completed before exiting?
>
