Re: LVM Program Dinner at SHARE

2019-03-13 Thread Martin Schwidefsky
Hi Rich,

On Tue, 12 Mar 2019 12:01:18 -0700
Rich Smrcina  wrote:

> For those of you at SHARE in Phoenix: if you wish to join us for the LVM
> program dinner, please let me know either via email or here at SHARE
> face-to-face.
>
> We will be going to Steve’s Greenhouse Grill on Adams Street, half a block
> from the 2nd Street entrance to the convention center.
> 
> Feel free to meet us there by 7PM, or a group will depart from the Sheraton 
> at 6:45 sharp to get to the restaurant.
> 
> Must be present to attend.

Meet you all at 6:45 then, I'll join.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390


Re: Linux guest stop responding

2018-09-24 Thread Martin Schwidefsky
On Fri, 21 Sep 2018 01:23:46 +
Victor Echavarry  wrote:

> Today two linux guests stops responding. Both server are idle and using less 
> than 10% of cpu.
> 
> 
> Checking the console we found these message
> 
> 
> 1: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop 
> from CPU 00.
> 
> 00: HCPGIR450W CP entered; disabled wait PSW 00020001 8000  
> 0010BF02
> 
> 
> 
> The VM, 6.4 was ok at the problem. We open an issue with IBM. Does anyone see 
> this before?

Do you know the exact kernel version that was running at the time of the
stop? The instruction address 0x10bf02 should give us a hint about the
function that caused the stop.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/


Re: z/Linux 32-bit modules

2018-05-22 Thread Martin Schwidefsky
On Mon, 21 May 2018 07:21:38 +1000
Paul Edwards  wrote:

> When 32-bit modules are created on z/Linux
> using "gcc -m32" or whatever, is the resultant
> module run as AM31 or AM64?

The -m31 option creates objects and binaries in the ELF32 format.
Without the option or with -m64 the ELF64 format is used.

A 64-bit kernel recognizes the format of the binary and starts it
in the appropriate mode. Native for ELF64 or compat for ELF32.
The addressing mode is AMODE-64 for native 64-bit programs and
AMODE-31 for compat 31-bit code.

> If the answer is AM31, then what happens if it
> is run as AM64 instead?

The process is started by the kernel in the "correct" mode,
you cannot specify that an ELF32 binary is started in AMODE-64.

The kernel enforces limits as well, e.g. with a 31-bit compat
process (ELF32) you can not map anything above the 2GB line.
The format of the signal handler frame is different as well
for ELF32 vs ELF64.

But you can switch the addressing mode in your program with
the sam24, sam31, and sam64 instructions. If the code of the
program is at a location that goes along with the chosen
addressing mode, the CPU will happily execute the instructions.
You better not forget to switch back to the default mode before
calling a function in the runtime environment. Otherwise your
program will quickly terminate.
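
As a rough illustration of the ELF32 vs ELF64 distinction, the class of a
binary can be read from the first bytes of the file. This is only a sketch and
not what the kernel does internally; elf_class_name() is a made-up helper name,
and it looks at nothing but the e_ident bytes defined in <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Classify a buffer that holds the first EI_NIDENT bytes of a file.
 * Returns "ELF32", "ELF64", "unknown class" or "not ELF". */
const char *elf_class_name(const unsigned char *ident, size_t len)
{
    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return "not ELF";

    switch (ident[EI_CLASS]) {
    case ELFCLASS32:
        return "ELF32";   /* on s390x: started as a 31-bit compat process */
    case ELFCLASS64:
        return "ELF64";   /* on s390x: started as a native 64-bit process */
    default:
        return "unknown class";
    }
}
```

Feed it the first 16 bytes of a binary (e.g. read with fread()). "readelf -h"
reports the same information in its "Class:" line.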

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Martin Schwidefsky
On Thu, 12 Apr 2018 08:13:43 -0400
Rick Barlow  wrote:

> We have started seeing a rash of errors on some of our Linux virtual
> servers. The message we see on the virtual machine console is "Unknown
> program exception: 0028 [#1] SMP". It appears to only affect servers that
> were recently patched to kernel level "2.6.32-696.23.1.el6.s390x #1 SMP Sat
> Feb 10 11:11:31 EST 2018". It does not affect all of our servers that were
> recently patched. I suspect that it might be related to patches related to
> Spectre. I did a google search and did not get any hits on the message. I
> expect our Linux team will contact Red Hat support.

Program check 28 is an ALET-specification exception. It is very unlikely that
Linux causes this exception. The only piece of code that uses the access
register mode is the clock_gettime() function in the vdso code. And then
only for the CPUCLOCK_VIRT clock source with a constant ALET.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: SLES 11 under z/VM clock incorrectly set at boot

2016-11-29 Thread Martin Schwidefsky
On Wed, 30 Nov 2016 00:22:41 -0500
Alan Altmark <alan_altm...@us.ibm.com> wrote:

> On Tuesday, 11/29/2016 at 08:56 GMT, Aria Bamdad <a...@bsc.gwu.edu> wrote:
> > When you say the virtual TOD is on UTC, do you mean that even if I IPL
> > z/VM using
> > the local time and timezone, the TOD reported to Linux is in UTC?
> > Then this means
> > regardless of what z/VM time is set to, the guests always see UTC as
> > the TOD time.
>
> Unless you use CP SET VTOD command or the guest issues a SET CLOCK or
> PERFORM TIMING FACILITY FUNCTION instruction to change the virtual TOD,
> the virtual TOD will match the LPAR TOD.  Further, the LPAR TOD will match
> the machine TOD.  I'm over-simplifying, but it's good enough for our
> discussion.

Linux uses the only available source to get the initial time which is
the TOD clock. The assumption is that the wall clock returned by the TOD
is STP time. If you configure your machine by the book the following
holds:
STP timestamp = UTC timestamp + leap-second-offset

As Alan already mentioned most people simply use UTC for the TOD clock
and ignore leap seconds. In this case the time stamp returned by the TOD
clock is UTC. The Linux time is calculated in UTC, any timezone offset
is added by userspace.

Now we have three cases:
1) TOD = UTC with leap seconds.
   The initial system time is UTC. All is good.
2) TOD = UTC without leap seconds.
   The initial system time is UTC + leap-seconds. The initial system
   time is off by the number of leap seconds.
3) TOD = local time
   Linux has no way to know what you did to your machine TOD clock.
   The initial system time will be off by the timezone offset.

There is an upstream patch to read the configured number of leap seconds
at IPL and subtract it from the TOD clock to get the initial time. This
fixes case 2). See the following git commit:

commit 936cc855ffe808b428cf75116fe031af9f12237e
Author: Martin Schwidefsky <schwidef...@de.ibm.com>
Date:   Tue May 31 12:47:03 2016 +0200

s390/time: add leap seconds to initial system time

The PTFF instruction can be used to retrieve information about UTC
including the current number of leap seconds. Use this value to
convert the coordinated server time value of the TOD clock to a
proper UTC timestamp to initialize the system time. Without this
correction the system time will be off by the number of leap seonds
until it has been corrected via NTP.

Signed-off-by: Martin Schwidefsky <schwidef...@de.ibm.com>

This is upstream only right now.
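
To make the arithmetic concrete, here is a minimal sketch of the conversion
from a raw TOD value to a UTC timestamp. It is not the code from the commit;
tod_to_unix_utc() and the macro names are made up for illustration. It assumes
only that the TOD epoch is 1900-01-01 and that TOD bit 51 ticks once per
microsecond, and it takes the leap second count as a plain parameter instead
of reading it with PTFF.

```c
#include <stdint.h>

/* Seconds from the TOD epoch (1900-01-01) to the Unix epoch (1970-01-01). */
#define TOD_TO_UNIX_EPOCH_SECS 2208988800ULL
/* TOD bit 51 is one microsecond, so one microsecond is 4096 TOD units. */
#define TOD_UNITS_PER_SEC (4096ULL * 1000000ULL)

/* Hypothetical helper: convert a raw TOD clock value to Unix seconds (UTC).
 * leap_seconds is the accumulated number of leap seconds that has to be
 * subtracted from the clock time to arrive at UTC. */
int64_t tod_to_unix_utc(uint64_t tod, unsigned int leap_seconds)
{
    int64_t secs_since_1900 = (int64_t)(tod / TOD_UNITS_PER_SEC);

    return secs_since_1900
           - (int64_t)TOD_TO_UNIX_EPOCH_SECS
           - (int64_t)leap_seconds;
}
```

Calling it with a leap second count of 0 on a TOD clock that includes leap
seconds gives a result that is ahead of UTC by exactly that number of seconds.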

> > Based on the above, I tested setting the correct region, and then
> > selected 'Hardware
> > clock set to UTC' in Yast and rebooted.  Sure enough, the time in
> > Linux shows my local
> > time correctly.
> >
> > So does this mean that regardless of what z/VM time is set to, the
> > Linux guests should
> > select their time as 'Hardware clock set to UTC'?
>
> Before Linux can correctly calculate the time, it needs to know how the
> (virtual) TOD clock is set.  Some people (unwisely) set the TOD clock to
> the local time instead of UTC by setting local time with offset 0 (gag).
> But most people set the TOD clock to UTC (ignoring leap seconds).

If you do that the only way to get your system time corrected is NTP.

> > Is the kernel parameter clocksource=tod needed?
>
> I don't think so, but I'm not sure.  A long time ago the timer support in
> Linux was changed to use the TOD rather than depending on a ticking
> software clock.  (Martin?)

No, you do not have to do this. There is only one clocksource which is the
TOD and it is used automatically.
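
If you want to check which clocksource a running system uses, the kernel
exports it in sysfs. A small sketch, with read_clocksource() being a made-up
helper name; the sysfs path is the generic Linux one and nothing in it is
specific to s390:

```c
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the first line of 'path' into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Call it as read_clocksource(CLOCKSOURCE_PATH, buf, sizeof buf). On a Linux on
z guest I would expect the TOD clocksource to show up here without any
clocksource= kernel parameter.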

My suggestion would be to configure your machine's TOD clock to UTC.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Back to the future?

2016-07-27 Thread Martin Schwidefsky
On Wed, 27 Jul 2016 10:28:04 +0200
Rob van der Heij  wrote:

> On 26 July 2016 at 19:33, Marcy Cortes 
> wrote:
>
> > Martin wrote:
> >
> > >Either the sysadmin or NTP should do this, otherwise the system clock
> > will be off by 26 seconds (soon 27 seconds as another leap second is
> > scheduled).
> >
> > This kind of implies that if I disable NTP on a server and reboot, it
> > should be 26 seconds or so off.
> > It's not.   I'm seeing .003590 offset on one I just tried.
> >
>
> Unfortunately NTP does not work well inside a virtual machine. The
> algorithms are designed with network latency and eliminate those aspects.
> The latency in dispatching a virtual machine is rather different. I have
> seen an ntpd steered guest have serious mood swings, larger than the
> dispatch latency. And 3.5 ms is pretty good. If you need reasonable time in
> Linux, you might want to adjust clocks once during boot and not run ntpd
> after that. Go read the 100 pages on para-virtualized RTC on other
> platforms and notice there are still guests that drift away.
>
> Last time I looked, the HWCLOCK was broken for s390x. It is meant to deal
> with the cheap RTC chip to keep time over periods of power-off and primed
> the clock the wrong way (confusing ntpd and make the clock jump after a
> while).

Are you referring to the RTC clock interface of the kernel? If so then yes,
that never worked for s390. If you look into drivers/rtc/Kconfig you'll
find this:

menuconfig RTC_CLASS
bool "Real Time Clock"
default n
depends on !S390 && !UML
select RTC_LIB
help
  Generic RTC class support. If you say yes here, you will
  be allowed to plug one or more RTCs to your system. You will
  probably want to enable one or more of the interfaces below.

> A z/OS system requires the hardware TOD to run TAI, and when it converts TOD
> clock values (current or historic) it uses a table with 26 ^w 27 TOD values
> where leap seconds were added. If you are close with z/OS your z/VM TOD runs
> TAI. Since z/VM does not have that table, your UTC (and local time for CP and
> CMS things) will be off by the 26 leap seconds unless you change it at IPL
> (which goes into the LPAR offset). The HMC has the current number of leap
> seconds defined to obtain the TAI from NTP. You schedule the 27th leap
> second to happen just at the right time. When you're just 25 seconds off,
> someone forgot that update ;-)

If I do a "#cp query time" on my z/VM guests I get something like this:

00: CP QUERY TIME
00: TIME IS 14:01:14 CST WEDNESDAY 07/27/16
00: CONNECT= 99:59:59 VIRTCPU= 007:13.79 TOTCPU= 007:35.53

The interesting part is "CST", which stands for coordinated server time.
And this does not include leap seconds. That implies that your local time
for CP and CMS is not in UTC, no?

> If you don't care about z/OS, you might keep the HMC-defined offset at 0
> and run at UTC rather than TA1. When NTP injects the leap second, STP/ETR
> will work through the night and slowly adjust time again.

Why should STP adjust anything after NTP injected a leap second?!? The TOD
clock does not include leap seconds, there is nothing to adjust.
The NTP injected leap second will only show up in the system time offset
that is added to the TOD clock.
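
On the Linux side you can look at the leap second state the kernel keeps for
the system time with a read-only adjtimex() call. This is just a sketch to
show where that information lives; query_leap_state() is a made-up name and
the call does not touch the TOD clock at all.

```c
#include <string.h>
#include <sys/timex.h>

/* Query the kernel clock state without changing anything (modes == 0).
 * Returns the adjtimex() clock state (TIME_OK, TIME_INS, TIME_DEL, ...)
 * or -1 on error. On success the TAI-UTC offset known to the kernel is
 * stored in *tai_offset. */
int query_leap_state(int *tai_offset)
{
    struct timex tx;
    int state;

    memset(&tx, 0, sizeof tx);   /* modes = 0 means a read-only query */
    state = adjtimex(&tx);
    if (state >= 0 && tai_offset != NULL)
        *tai_offset = tx.tai;
    return state;
}
```

A return value of TIME_INS means a leap second insertion has been armed (for
example by ntpd) and will be applied to the system time at the end of the UTC
day.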

> A pragmatic approach could be to have a single virtual machine after z/VM
> IPL obtain UTC time (eg through my NTPDATE EXEC or a table with leap second
> moments) and issue a SET VTOD to correct the offset at IPL. The Linux
> guests could have the directory statement to say "I have what she has" and
> run all with the proper 26 seconds offset. Linux would see a TOD in UTC
> again. After that STP/ETR will steer the hardware TOD to stay on time.
> Unfortunately this does not handle the moment when the leap second is
> injected.

Well, STP will report the TOD time stamp when a leap second is due. How to
do the leap second clock drift is up to the OS.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Back to the future?

2016-07-27 Thread Martin Schwidefsky
On Tue, 26 Jul 2016 17:33:31 +
Marcy Cortes  wrote:

> Martin wrote:
>
> >Either the sysadmin or NTP should do this, otherwise the system clock will 
> >be off by 26 seconds (soon 27 seconds as another leap second is scheduled).
>
> This kind of implies that if I disable NTP on a server and reboot, it should 
> be 26 seconds or so off.
> It's not.   I'm seeing .003590 offset on one I just tried.

LPAR or z/VM? I just tried as well and I do get the 26 second offset for both 
LPAR and z/VM.
How did you get the timestamps you are comparing?

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Back to the future?

2016-07-27 Thread Martin Schwidefsky
On Tue, 26 Jul 2016 17:29:24 +0200
WF Konynenberg <w...@konynenberg.org> wrote:

> Hi Martin,
>
> On 07/26/2016 04:12 PM, Martin Schwidefsky wrote:
> > On Tue, 26 Jul 2016 10:37:33 +0200
> > WF Konynenberg <w...@konynenberg.org> wrote:
> >
> > Either the sysadmin or NTP should do this, otherwise the system clock will
> > be off by 26 seconds (soon 27 seconds as another leap second is scheduled).
>
> Indeed.  Where "the sysadmin" could simply translate to "a suitable
> hwclock equivalent tool that understands how to apply the leap second
> adjustment to the value returned by the hardware clock", being run in
> the RC scripts roughly where on other systems hwclock would be run.
> Though in this case, if the leap second info is actually available from
> the hardware in a defined way, you might as well just stick that tiny
> bit of logic in the linux kernel code that reads the hw clock at boot to
> initialize the system clock.
>
> >
> >>>> In an LPAR, Linux will sync the LPAR TOD to the CPC TOD when it boots.
> >>>> After that it depends on STP, and it really depends on having the proper
> >>>> number of leap seconds configured in the CPC and in Linux.
> >>> Only if STP is enabled. The default is off, in this case Linux just uses
> >>> the current TOD clock to initialize the system time.
> >> This is getting a bit confusing to me.
> >> Do I understand correctly that when STP is enabled, Linux uses the
> >> hardware TOD clock directly as its system clock, adjusting it on the fly
> >> as needed as per any local system clock adjustments that have been made?
> >> And that otherwise it simply initializes the internal system clock from
> >> the TOD clock at startup and then runs an independent software system
> >> clock, in the classic way?
> > The Linux kernel does this as IPL time, it does know better at this early
> > stage. Later when user space is powered up you can use e.g. ntpdate to
> > set a more reasonable system time.
>
> Given what you write below about STP providing this info and even a nice
> interrupt when the value changes, I'ld say that the Linux kernel
> *should* know better and be able to initialize the system clock
> correctly from the hardware clock from the get go.

Like this? ;-)

commit 936cc855ffe808b428cf75116fe031af9f12237e
Author: Martin Schwidefsky <schwidef...@de.ibm.com>
Date:   Tue May 31 12:47:03 2016 +0200

s390/time: add leap seconds to initial system time

The PTFF instruction can be used to retrieve information about UTC
including the current number of leap seconds. Use this value to
convert the coordinated server time value of the TOD clock to a
proper UTC timestamp to initialize the system time. Without this
correction the system time will be off by the number of leap seonds
until it has been corrected via NTP.

Signed-off-by: Martin Schwidefsky <schwidef...@de.ibm.com>

> >> You shouldn't really need NTP for that.
> >> The doc I just read about STP seemed to suggest that the leap second
> >> info is entered into the hardware console by the admin and then obtained
> >> by some means by z/OS to apply the appropriate UTC adjustment to the TOD
> >> clock, before applying the local time adjustment.
> >> If Linux could access this same info somehow, it could apply the same
> >> adjustment to its UTC system clock.
> > STP reports the current number of leap seconds. It even gives you a nice
> > interrupt to inform you that a change in the number of leap seconds
> > is due. But what do you do with this information? You can not just add
> > the leap second, if you have NTP running as well you end up with duplicates.
> > And you do not want to insert the leap second twice..
>
> That ought to be reasonably simple:
>
> When running with an STP synchronized hardware TOD clock, by default you
> do exactly what the documentation says z/OS does in that case:
> Take the STCK(E) output, apply the leap second offset provided by the
> STP hardware, and use the result as the reference UTC value.
> When the Linux kernel is using the TOD as its system clock and merely
> applies a delta to report the actual system clock value, then it should
> probably adjust the offset whenever the leap second offset changes.
> When the Linux kernel is using its own internal software system clock, a
> similar delta should be applied to the system clock upon a leap second
> change.
> The resulting behaviour of the Linux system clock will be comparable to
> its behaviour when it is managed by NTP.
>
> When running with an STP synchronized hard

Re: Back to the future?

2016-07-26 Thread Martin Schwidefsky
On Tue, 26 Jul 2016 09:42:23 -0400
Robert J Brenneman  wrote:

> Hmmm... When configuring STP and the HMC to retrieve time from a GPS based
> NTP server appliance, I am betting that the GPS time source also already
> has the leap seconds applied. In that case - Should we be setting the HMC
> leap second value for the STP network to 0 ?

No, that would just remove the correct information about the leap seconds.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Back to the future?

2016-07-26 Thread Martin Schwidefsky
On Tue, 26 Jul 2016 10:37:33 +0200
WF Konynenberg <w...@konynenberg.org> wrote:

> Hi Martin,
>
> A more definite source of information.   Good.  :-)
>
>
> On 07/26/2016 09:21 AM, Martin Schwidefsky wrote:
> > On Tue, 26 Jul 2016 01:30:03 -0400
> > Alan Altmark <alan_altm...@us.ibm.com> wrote:
> >
> >
> >> Linux on z does not alter the TOD clock after it's booted.  NTP affects
> >> only the offset from the TOD that the kernel maintains.  Any app that
> >> wants to know the time must ask the kernel.  It can't just issue STCK(E).
> > This is important enough to reiterate: STCK(E) will not give you the correct
> > system time if you are using NTP to drift the time. Use gettimeofday() to
> > retrieve the time, this call is optimized with a function in the VDSO.
> > The VDSO knows about the offset between the TOD clock and the system time
> > and does the necessary calculations. It is reasonable fast as no system
> > call is required.
>
> Well duh.  Applications on Linux should generally not go directly to the
> hardware for this kind of info, but rather should use defined standard
> operating system interfaces.
>
> Even if you aren't using NTP to keep the time in sync, when the hardware
> TOD is not kept in UTC (as it appears to be the case with STP), and the
> sysadmin has correctly set the system time to UTC, the time you get from
> STCK(E) would be off by some 26 seconds from the correct UTC system time.

Either the sysadmin or NTP should do this, otherwise the system clock will
be off by 26 seconds (soon 27 seconds as another leap second is scheduled).

> >
> >> In an LPAR, Linux will sync the LPAR TOD to the CPC TOD when it boots.
> >> After that it depends on STP, and it really depends on having the proper
> >> number of leap seconds configured in the CPC and in Linux.
> > Only if STP is enabled. The default is off, in this case Linux just uses the
> > current TOD clock to initialize the system time.
>
> This is getting a bit confusing to me.
> Do I understand correctly that when STP is enabled, Linux uses the
> hardware TOD clock directly as its system clock, adjusting it on the fly
> as needed as per any local system clock adjustments that have been made?
> And that otherwise it simply initializes the internal system clock from
> the TOD clock at startup and then runs an independent software system
> clock, in the classic way?

The Linux kernel does this at IPL time; it does not know better at this early
stage. Later when user space is powered up you can use e.g. ntpdate to
set a more reasonable system time.

> >
> >> We really need CP to virtualize STP (when real STP is being used e.g. with
> >> NTP).  That would allow Linux to always have the correct time without
> >> running its own NTP client.
> > It would certainly improve things but for at least one aspect you still
> > need NTP: to inject leap seconds.
>
> You shouldn't really need NTP for that.
> The doc I just read about STP seemed to suggest that the leap second
> info is entered into the hardware console by the admin and then obtained
> by some means by z/OS to apply the appropriate UTC adjustment to the TOD
> clock, before applying the local time adjustment.
> If Linux could access this same info somehow, it could apply the same
> adjustment to its UTC system clock.

STP reports the current number of leap seconds. It even gives you a nice
interrupt to inform you that a change in the number of leap seconds
is due. But what do you do with this information? You can not just add
the leap second, if you have NTP running as well you end up with duplicates.
And you do not want to insert the leap second twice.

> >
> >> The problem today is that VM will not generate a sync check when the
> >> difference between NTP and the TOD becomes large enough that it cannot be
> >> steered out in a short period of time, such as when the external time
> >> source is reconnected or when a leap second is added.
> > STP does not generate a sync check for leap seconds, they are not included
> > in the TOD clock. The programming notes in the PoP about the Time-of-Day
> > Clock you will find this:
> >
> > "In converting to or from the current date or time, the programming support
> >   must take into account that leap seconds have been inserted or deleted
> >   because of time-correction standards. When the TOD clock has been set
> >   correctly to a time within the standard epoch, the sum of the accumulated
> >   leap seconds must be subtracted from the clock time to determine UTC
> >   time."
>
> Hmm, so how is the leap second info kept up to date for z/OS?
>

Re: Back to the future?

2016-07-26 Thread Martin Schwidefsky
On Tue, 26 Jul 2016 01:30:03 -0400
Alan Altmark  wrote:

> On Monday, 07/25/2016 at 11:36 GMT, WF Konynenberg wrote:
> > Yes, when your hardware clock is not managed by NTP you cannot rely on it
> > to accurately initialize the system clock on boot, so you should force an
> > NTP clock sync ("jump") on startup.  Otherwise the NTP smooth clock
> > adjustment could take quite a while to get the system clock synced.
> >
> > Once the initial sync has happened, NTP should be able to keep the clock
> > within at most a few milliseconds accurate, even on a VM with some
> > scheduling artifacts.
>
> Linux on z does not alter the TOD clock after it's booted.  NTP affects
> only the offset from the TOD that the kernel maintains.  Any app that
> wants to know the time must ask the kernel.  It can't just issue STCK(E).

This is important enough to reiterate: STCK(E) will not give you the correct
system time if you are using NTP to drift the time. Use gettimeofday() to
retrieve the time, this call is optimized with a function in the VDSO.
The VDSO knows about the offset between the TOD clock and the system time
and does the necessary calculations. It is reasonably fast as no system
call is required.
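
For completeness, a minimal sketch of asking the kernel for the time instead
of issuing STCK(E) yourself. system_time_usec() is a made-up helper name;
gettimeofday() is the standard call mentioned above.

```c
#include <stddef.h>
#include <sys/time.h>

/* Return the current system time in microseconds since the Unix epoch,
 * or -1 on error. The value includes whatever offset the kernel maintains
 * on top of the hardware clock. */
long long system_time_usec(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}
```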

> In an LPAR, Linux will sync the LPAR TOD to the CPC TOD when it boots.
> After that it depends on STP, and it really depends on having the proper
> number of leap seconds configured in the CPC and in Linux.

Only if STP is enabled. The default is off, in this case Linux just uses the
current TOD clock to initialize the system time.

> We really need CP to virtualize STP (when real STP is being used e.g. with
> NTP).  That would allow Linux to always have the correct time without
> running its own NTP client.

It would certainly improve things but for at least one aspect you still
need NTP: to inject leap seconds.

> The problem today is that VM will not generate a sync check when the
> difference between NTP and the TOD becomes large enough that it cannot be
> steered out in a short period of time, such as when the external time
> source is reconnected or when a leap second is added.

STP does not generate a sync check for leap seconds, they are not included in
the TOD clock. In the programming notes in the PoP about the Time-of-Day Clock
you will find this:

"In converting to or from the current date or time, the programming support
 must take into account that leap seconds have been inserted or deleted
 because of time-correction standards. When the TOD clock has been set
 correctly to a time within the standard epoch, the sum of the accumulated
 leap seconds must be subtracted from the clock time to determine UTC time."

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: SUSE12 Gold image on VM

2015-06-17 Thread Martin Schwidefsky
Mark Post <mp...@suse.com> wrote on 06/16/2015 04:53:29 PM:

> From: Mark Post <mp...@suse.com>
> To: Linux on 390 Port <LINUX-390@VM.MARIST.EDU>,
> Date: 06/16/2015 04:53 PM
> Subject: Re: SUSE12 Gold image on VM
>
> > On 6/16/2015 at 09:55 AM, Alan Altmark <alan_altm...@us.ibm.com> wrote:
> > On Tuesday, 06/16/2015 at 12:50 EDT, Mark Post <mp...@suse.com> wrote:
> > > On 6/15/2015 at 11:54 PM, Alan Altmark <alan_altm...@us.ibm.com> wrote:
> > > Neither z/VM nor Linux have to specify a portname (real or virtual).  I
> > > recommend that you do NOT specify the portname.
> > >
> > > I don't understand why the distros still worry about the portname.  We
> > > got rid of the requirement for z/VM and Linux on the z900 and z800 back
> > > in 2003.  Just issue a msg that it's obsolete and ignore it.
> >
> > Even for Linux in an LPAR sharing an OSA with a z/OS system that is
> > specifying (their equivalent of) a portname?
> >
> > Yes, even then.  You can google OSA portname relief.
>
> OK then.  I'm going to open up a bug with IBM to either remove that
> attribute from the driver, or make it read-only in sysfs.
> Preferably removing it altogether since it can only cause problems
> by being present.
>
> Looking at the results of my Google searches, I can see why I never
> made the complete connection.  There were always statements made
> about z/OS requiring the parameter, and if you specify it in Linux
> it has to match.  Looking at it now, I understand what was meant.
> Back then it meant (to me and apparently others) if you're sharing
> an OSA you need to specify the same portname as z/OS is using.

Well, technically there have been machines with OSA cards that required
the portname which is why the parameter has survived until now. There
is a bit in the response block of the read channel activation ccw that
tells us if the portname is required.

With the option for 31-bit kernel builds gone from the upstream source
and the fact that the z900/z800 does not require the portname we could
remove the portname code from the OSA driver.

Uschi, care to create a patch to remove everything portname related from
the OSA driver?

blue skies,
   Martin

Martin Schwidefsky
Linux on System z Development
IBM System & Technology Group, System Software Development

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: Shared library question

2011-11-10 Thread Martin Schwidefsky
On Wed, 9 Nov 2011 13:21:34 -0600
Neale Ferguson <ne...@sinenomine.net> wrote:

> I’ve verified that the code generated is correct:
>
>         lgr     %r1,%r11
>         aghi    %r1,168
>         lgr     %r2,%r1
>         lghi    %r3,0
>         brasl   %r14,gettimeofday@PLT
>         lg      %r1,168(%r11)

That looks good.

> Used readelf on the object to see what it thinks about that symbol:
>
> 012a  06290013 R_390_PC32DBL  gettimeofday + 2
> 3736  06290013 R_390_PC32DBL  gettimeofday + 2
>   1577:  0 NOTYPE  GLOBAL DEFAULT  UND gettimeofday

This is not good. On which object did you run readelf?
On my little test shared library it looks like this:

Relocation section '.rela.plt' at offset 0x4f0 contains 2 entries:
  Offset  Info   Type   Sym. ValueSym. Name + Addend
2018  0004000b R_390_JMP_SLOT __cxa_finalize + 0
2020  0005000b R_390_JMP_SLOT gettimeofday + 0

No R_390_PC32DBL relocation (which cannot work as you branch between two shared
objects, the new shared library and the glibc).

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Shared library question

2011-11-10 Thread Martin Schwidefsky
On Thu, 10 Nov 2011 15:16:55 +0100
Andreas Krebbel <kreb...@linux.vnet.ibm.com> wrote:

> On 11/09/2011 08:21 PM, Neale Ferguson wrote:
>
> Hi,
>
> your cmdlines to build the shared lib look correct. The assembler code shows
> that the code has been built correctly with -fPIC. However, the relocations
> in the .o and .so files look like non-pic code. With -fPIC it should be
> R_390_PLT32DBL in the .o and there has to be a R_390_JMP_SLOT relocation in
> the .so. The R_390_PC32DBL is not sufficient to reach the symbol.
>
> So it looks like AS ignores the @PLT modifier completely. But your system
> would not work at all if this really would be the case.

That is pretty much impossible. md_gather_operands does this for @PLT:

      else if (suffix == ELF_SUFFIX_PLT)
        {
          if ((operand->flags & S390_OPERAND_PCREL)
              && (operand->bits == 16))
            reloc = BFD_RELOC_390_PLT16DBL;
          else if ((operand->flags & S390_OPERAND_PCREL)
                   && (operand->bits == 32))
            reloc = BFD_RELOC_390_PLT32DBL;
        }

A BFD_RELOC_390_PLT32DBL relocation is always emitted as an
R_390_PLT32DBL. On the final link the bfd code will either convert that
to a direct call without any kind of relocation, or it will create a PLT
slot and an R_390_JMP_SLOT. If the final shared library has an
R_390_PC32DBL, there must have been an object file that contained the
relocation.

> What distro are you using? I wasn't able to reproduce it on the system
> I have access to.

Me neither, a simple library that wraps gettimeofday just works.
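
For anyone who wants to repeat the experiment, a wrapper of that kind
could look roughly like the sketch below. This is not the test library
used in this thread; the file name and the function name
wrapped_time_usec are made up for illustration.

```c
/* wrap_time.c: hypothetical minimal library that wraps gettimeofday. */
#include <stddef.h>
#include <sys/time.h>

/* Return the current wall-clock time in microseconds, or -1 on error.
 * When this file is compiled with -fPIC, the call to gettimeofday is
 * expected to go through the PLT. */
long long wrapped_time_usec(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}
```

Building it with "gcc -fPIC -shared -o libwraptime.so wrap_time.c" and
then running "readelf -r libwraptime.so" should, on a toolchain like the
one discussed here, list gettimeofday in .rela.plt rather than as a
R_390_PC32DBL.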

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: HugePage support with RHEL

2011-09-05 Thread Martin Schwidefsky
On Fri, 2 Sep 2011 09:54:31 -0500
Stephen Frazier ste...@doc.state.ok.us wrote:

> On 9/2/2011 7:37 AM, Martin Schwidefsky wrote:
> > There is no documentation for z/VM as the edat facility is not
> > supported. The emulation for large pages takes place in the Linux guest.
> Looking at IBM past support for new hardware in guests of z/VM, I would
> suggest that it will be supported in z/VM 7.1 or before. Look for edat
> facility support to be added in an as yet unannounced point release of
> z/VM 6.?. Also, IBM probably will not talk about it until they are ready
> to announce it.

This is harder than it looks. With z/VM you have two complete sets of
page tables to worry about, the host page table and the guest page table.
The host page table differs from the guest page table, each entry in the
lowest level of the page table (the pte) has an extension (the pgste).
Among other things you have the guest/host reference-backup-bit and the
guest/host change-backup-bit in the pgste which are required to get
paging right over the two levels of page tables.

Now what happens if z/VM uses large pages in the host page table for
the guest? The pgste table is tied to the pte table but that table does
not exist if the segment table entry points to a 1MB frame. You lose
access to the backup bits and your normal paging does not work anymore.
The virtualization of the edat architecture is hard.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: HugePage support with RHEL

2011-09-02 Thread Martin Schwidefsky
On Thu, 1 Sep 2011 15:01:52 -0400
Brad Hinson bhin...@redhat.com wrote:

> Hi James,
>
> It appears that in order to use hardware large page support,
> Linux must be running in LPAR mode.  I can't find anything that
> says this is supported in z/VM.  Hopefully someone can correct
> me if I'm wrong.  I can confirm that on a z10 under z/VM 6.1
> I also do not see 'edat' in /proc/cpuinfo, so hugepage support
> is emulated in software.

You can use large pages in LPAR and under z/VM. In LPAR we have
real large pages if we have the edat facility. If there is no
edat facility or if we are running under z/VM we use large page
emulation.

There are two benefits to using hugepages:
1) The TLB pressure is reduced by using 1MB frames. To get this
   benefit the edat facility is required since this needs the
   large page segment table entries. No love here for z/VM.
2) The memory savings due to the reduced number of page tables.
   There are two cases:
   2a) under LPAR with edat the 1MB frames are directly referenced
       by the segment table entry, the lowest page table level is
       not allocated at all.
   2b) under z/VM there is no edat facility and no large page
       segment table entries. Here a single page table for the 1MB
       frame is allocated which is shared by all users of the large
       page.

The page table overhead to map 2GB of memory:
i) without large pages: 1 segment table, 2048 page tables
ii) with large page emulation: 1 segment table, 1 page table
iii) with edat large pages: 1 segment table

In numbers: i) 4112 KB, ii) 18 KB, and iii) 16 KB. These numbers
are per process. If your database uses processes for its
transactions and maps large shared memory areas, the memory
savings quickly add up.
If you have e.g. 128 processes mapping 2GB each, you'll need for
case i) 514 MB, ii) 2.25 MB, and iii) 2 MB.
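
The arithmetic can be replayed with a few lines of C. This is only a
sketch: the 16 KB segment table and 2 KB page table sizes are inferred
from the totals above, the table counts are the ones listed for cases i)
to iii), and the function and enum names are made up.

```c
/* Sizes inferred from the totals in the post (assumptions). */
enum {
    SEGMENT_TABLE_KB = 16,   /* one segment table covering 2GB */
    PAGE_TABLE_KB    = 2,    /* one page table covering a 1MB segment */
    SEGMENTS_PER_2GB = 2048
};

enum mapping_mode {
    MODE_SMALL_PAGES,     /* i)   one page table per 1MB segment */
    MODE_EMULATED_LARGE,  /* ii)  one page table, as counted in the post */
    MODE_EDAT_LARGE       /* iii) no page tables at all */
};

/* Page table overhead in KB for one process mapping 2GB. */
unsigned long overhead_kb_per_process(enum mapping_mode mode)
{
    switch (mode) {
    case MODE_SMALL_PAGES:
        return SEGMENT_TABLE_KB + SEGMENTS_PER_2GB * PAGE_TABLE_KB;
    case MODE_EMULATED_LARGE:
        return SEGMENT_TABLE_KB + PAGE_TABLE_KB;
    case MODE_EDAT_LARGE:
        return SEGMENT_TABLE_KB;
    }
    return 0;
}

/* Overhead in KB summed over nproc processes that each map 2GB. */
unsigned long overhead_kb_total(enum mapping_mode mode, unsigned long nproc)
{
    return overhead_kb_per_process(mode) * nproc;
}
```

With nproc = 128 the totals come out as 526336 KB (514 MB), 2304 KB
(2.25 MB) and 2048 KB (2 MB), matching the figures above.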

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: HugePage support with RHEL

2011-09-02 Thread Martin Schwidefsky
On Fri, 2 Sep 2011 07:35:56 -0400
CHAPLIN, JAMES (CTR) james.chap...@associates.dhs.gov wrote:

> I almost gave up hope on HugePages due to all our zLinux Guests run
> under zVM, and we like using zVM to control the swapping as best as
> possible. I do have two follow-up questions:
>
> 1) Can you point me to any reference material dealing with HugePages
> with zVM (v6.1) where I can start my homework on the topic? I did a
> search online of IBM zVM 6.1 doc on HugePages, Large Pages and came up
> with nothing.

There is no documentation for z/VM as the edat facility is not supported.
The emulation for large pages takes place in the Linux guest.


> 2) What is required or needs to be done to enable edat in zLinux from
> the zVM side? Again, would you be able to point me to any doc in the zVM
> side? How do we get the edat facility, is it hardware or a setting in
> either the SE or HMC with the LPAR definition, or in the IODF?

You cannot enable the edat facility under z/VM for a Linux guest (and you
won't see edat in the cpu features) but you can still use large pages
in the Linux guest. Just specify the hugepages=<nr> kernel parameter,
where <nr> is the number of large pages to reserve.
The Linux kernel will automatically use the large page emulation if the
edat facility is missing.

You may want to look at the Large Page Support chapter in Linux on
System z Device Drivers, Features and Commands:
http://public.dhe.ibm.com/software/dw/linux390/docu/lk39dd11.pdf
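
To tell from a program which of the two cases applies, one could look
for the edat word in the features line of /proc/cpuinfo. The helper
below is a hedged sketch: cpu_has_feature is a made-up name, and the
caller is assumed to have read the features line into a string already.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `feature` appears as a whole whitespace-separated
 * word in `features_line`. */
bool cpu_has_feature(const char *features_line, const char *feature)
{
    size_t len = strlen(feature);
    const char *p = features_line;

    if (len == 0)
        return false;
    while ((p = strstr(p, feature)) != NULL) {
        bool starts_word = (p == features_line) ||
                           p[-1] == ' ' || p[-1] == '\t';
        bool ends_word = p[len] == '\0' || p[len] == ' ' ||
                         p[len] == '\t' || p[len] == '\n';

        if (starts_word && ends_word)
            return true;
        p += len;   /* partial match inside another word, keep looking */
    }
    return false;
}
```

If cpu_has_feature(line, "edat") is false, the kernel falls back to the
large page emulation described above; hugepages=<nr> works either way.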

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Server rebooting after doing a CP Q DA from the console

2011-03-21 Thread Martin Schwidefsky
On Sat, 19 Mar 2011 23:10:16 -0300
Mauro Souza thoriu...@gmail.com wrote:

> I saw this problem some time ago with an Oracle RAC guest. It haven't set
> the CP SET RUN ON, and as soon as the client issued some #CP Q SOMETHING,
> the server froze down, and linux rebooted. Looks like Oracle RAC have some
> kind of watchdog, and as CP MODE stops running Linux kernel for a little
> moment, the watchdog thinks the system froze down, and reboots the system.
> Setting RUN ON solved the problem.
> You can try this, it won't hurt, and I think RUN ON should be the default.

There is the important hint: if you have Oracle RAC and the watchdog is
running, the z/VM guest must not be stopped for longer periods of time. My
gut feeling is that the large output of the #CP Q xyz command stopped the
Linux guest for too long. Once the output completed the guest continued
and the Oracle watchdog did what it is programmed to do: reboot.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Where is kernel loaded in memory?

2011-03-17 Thread Martin Schwidefsky
On Thu, 17 Mar 2011 10:46:18 +0100
Heiko Carstens heiko.carst...@de.ibm.com wrote:

> On Wed, Mar 16, 2011 at 01:40:36PM -0500, Mark Wheeler wrote:
> > Greetings all,
> >
> > Is there a way to tell externally (command or otherwise) where the
> > zLinux kernel is loaded in memory?
>
> The kernel gets loaded to address absolute zero and uses a 1:1
> mapping for virtual to physical pages.
> /proc/iomem tells you which memory areas the kernel uses.
> This does not include kernel modules which get loaded into the
> vmalloc area which is a virtual address range that for current
> kernels starts at address 0x03c0. That range (or better
> parts of it) gets backed with arbitrary physical pages when needed.

Well, the kernel image is linked with a starting address of 0, logically
it is loaded to absolute address zero of the guest container.
In truth the kernel image usually is loaded starting from either 0x10000
or 0x100000, the ipl loader strips the first 64KB / 1MB.
But the knowledge that your guest kernel is at a specific address doesn't
tell you much, for a z/VM guest the memory is virtualized and for the
LPAR partition you are running in you have another remapping mechanism
(zoning). Basically the pages of your kernel can be anywhere in the
physical memory.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Assembler question

2010-07-07 Thread Martin Schwidefsky
On Wed, 7 Jul 2010 10:19:18 +0100
Richard J Moore richardj_mo...@uk.ibm.com wrote:

> gcc/as option --march=z10 should certainly provide EPSW, but I'm
> certain it's older than z10. When I mean new I mean more recent than
> s/370 :-)
>
> But as mentioned in another response, unless you require the entire PSW to
> be stored you are better off using 3 or 4 instructions designed to extract
> specific parts of the PSW.

The relevant line from the binutils opcode description file:

b98d epsw RRE_RR "extract psw" z900 esa,zarch

That instruction exists for esa and zarch mode starting with the z900.
A -march=z900 should enable the instruction if your binutils version is
recent enough. With older binutils versions you can use the .insn
pseudo-op:

.insn rre,0xb98d0000,%rx,%ry

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Low address protection

2010-03-12 Thread Martin Schwidefsky
On Thu, 11 Mar 2010 17:04:04 -0600
Neale Ferguson ne...@sinenomine.net wrote:

> It was doing a MVCLE 4,2 (a842)
>
> R2 - 0 so it was trying to reference page 0.
>
>
> On 3/11/10 5:59 PM, Marcy Cortes marcy.d.cor...@wellsfargo.com wrote:
>
> Mar 11 04:36:00 cpzpv17020 kernel: Low-address protection: 0004 [#1]
> Mar 11 04:36:00 cpzpv17020 kernel: CPU: 1 Not tainted (2.6.5-7.317-s390x 200905261627510200)
> Mar 11 04:36:00 cpzpv17020 kernel: Process java (pid: 11073, task: 7c7ee070, ksp: 4b5835c8)
> Mar 11 04:36:00 cpzpv17020 kernel: User PSW : 0705f0018000 0289ec04 (0x289ec04)
> Mar 11 04:36:00 cpzpv17020 kernel: User GPRS:  0001
> Mar 11 04:36:00 cpzpv17020 kernel: 020098ef2000 08e8 020098ef2b60 0002
> Mar 11 04:36:00 cpzpv17020 kernel: 0200c6f9f508  3400 800382f8
> Mar 11 04:36:00 cpzpv17020 kernel: 02861560 028a9e98 0287d936 0200c6f9f258
> Mar 11 04:36:00 cpzpv17020 kernel: User Code: a8 42 00 00 a7 14 ff fe 07 fe 07 07 a7 29 00 00 07 fe 07 07

MVCLE with a zero source length is a memset. The instruction only
accesses the target address, the source address is of no concern.

A low-address protection exception must NEVER happen for a user-space
process, because the user space asce has the private space control bit set.
This is definitely a kernel bug, we've seen the same on SLES10.
The fix is a backport of the new TLB flush logic.

Just for verification: this is on a z10, no?

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: CPU Capability values [PUBLIC]

2010-02-17 Thread Martin Schwidefsky
On Wed, 17 Feb 2010 08:18:12 -0600
Shedlock, George gshedl...@aegonusa.com wrote:

> If the selection of the CPU capability is incorrect, what exactly does that
> mean to the kernel? Does that mean that linux has been running at the speed
> of the GP processors?

No, an IFL always runs at its native speed. Only the number that is supposed to
detail the processor speed is wrong.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: CPU Capability values

2010-02-16 Thread Martin Schwidefsky
On Mon, 15 Feb 2010 14:26:12 -0500
Quay, Jonathan (IHG) jonathan.q...@ihg.com wrote:

> zsles10sp3ctl:/sys/devices/system/cpu/cpu0 # cat capability
> 1760
>
> It looks like the kernel uses the wrong value.  IBM says that the
> secondary number is the IFL capability (lower is faster).

Hmm, that number is retrieved by this code snippet:

        rc = stsi(info, 1, 2, 2);
        if (rc == -ENOSYS)
                goto out;
        rc = 0;
        *capability = info->capability;

struct sysinfo_1_2_2 {
        char format;
        char reserved_0[1];
        unsigned short acc_offset;
        char reserved_1[24];
        unsigned int secondary_capability;
        unsigned int capability;
        unsigned short cpus_total;
        unsigned short cpus_configured;
        unsigned short cpus_standby;
        unsigned short cpus_reserved;
        unsigned short adjustment[0];
};

It always reports the primary capability. We'll investigate.
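
To make the difference concrete, here is a small stand-alone sketch. The
struct layout is copied from the snippet above (with a flexible array
member in place of the zero-length array); pick_capability and its
policy are purely hypothetical and are NOT the fix that went into the
kernel.

```c
#include <stddef.h>

/* Layout copied from the struct quoted above. */
struct sysinfo_1_2_2 {
    char format;
    char reserved_0[1];
    unsigned short acc_offset;
    char reserved_1[24];
    unsigned int secondary_capability;
    unsigned int capability;
    unsigned short cpus_total;
    unsigned short cpus_configured;
    unsigned short cpus_standby;
    unsigned short cpus_reserved;
    unsigned short adjustment[];
};

/* Hypothetical policy for illustration only: report the secondary
 * capability when the machine provides one (non-zero), otherwise
 * fall back to the primary value that is reported today. */
unsigned int pick_capability(const struct sysinfo_1_2_2 *info)
{
    if (info->secondary_capability != 0)
        return info->secondary_capability;
    return info->capability;
}
```

The two fields sit next to each other at offsets 28 and 32 of the
block, so reading the wrong one is an easy mistake to make.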

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: x86 to z CPU comparison for calculating IFLs needed

2010-01-08 Thread Martin Schwidefsky
On Fri, 8 Jan 2010 14:37:50 +0100
Rob van der Heij rvdh...@velocitysoftware.com wrote:

> On Fri, Jan 8, 2010 at 1:56 PM, Agblad Tore tore.agb...@volvo.com wrote:
>
> > SLES also have the hz_timer file.
> > It is zero here as well, SLES10 SP2, z10.
> > and the HZ contains 100.
>
> Correct. The hz_timer is to enable the tick timer - the 0
> setting is the right one, since that gets us the tickless timer
> which avoids overhead.
>
> When the other platforms changed from 10 ms to 1 ms granularity, s390
> did not dare because it would seriously impact installations that
> still run with the tick timer on. With the tickless timer, it would
> be helpful for granularity of the CPU metrics to have HZ raised to
> 1000. But..

The increase of the number of timer ticks from 100 to 1000 does not
change the output format of the different /proc entries. That would be
an incompatible change that would break quite a few user space tools.

> The HZ also determines the granularity of wake-up requests. Some
> applications misbehave and ask to sleep for very short delays
> (polling). Today (after the right kernel fixes) these get rounded up
> to the next 10 ms. So even when some application is polling with 1 ms
> (!) delay, the process will wake up only once every 10 ms. Waking up
> the server every ms would be 10 times as expensive. The 10 ms
> granularity also helps to improve timer merging, again reducing the
> overhead in a virtualized environment.

That is true for the kernels used in the SLES10 / RHEL5 and older
distributions. More recent kernel versions support true high-resolution
wake-up. The hrtimer system call allows you to specify a timeout with a
granularity of nanoseconds. With the clockevents infrastructure that
gets rounded to the capabilities of the currently active clock event
device. For s390 this always is the clock comparator which has
microsecond or better resolution. You lose the timer merging though.
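
The effect of tick rounding on a polling application can be put into
numbers with a simplified model. This is only a sketch: a real kernel's
jiffies rounding may add a further tick, the function names are made up,
and hz is assumed to be between 1 and 1000000000.

```c
#include <stdint.h>

/* Round a requested sleep up to the next multiple of the tick length
 * for a tick-based timer running at `hz` ticks per second. */
uint64_t round_up_to_tick_ns(uint64_t request_ns, unsigned int hz)
{
    uint64_t tick_ns = UINT64_C(1000000000) / hz;

    return (request_ns + tick_ns - 1) / tick_ns * tick_ns;
}

/* Wake-ups per second for a loop that keeps sleeping `request_ns`.
 * Returns 0 for a zero-length request. */
uint64_t wakeups_per_second(uint64_t request_ns, unsigned int hz)
{
    uint64_t rounded_ns = round_up_to_tick_ns(request_ns, hz);

    if (rounded_ns == 0)
        return 0;
    return UINT64_C(1000000000) / rounded_ns;
}
```

In this model a 1 ms poll at HZ=100 wakes up 100 times per second, while
at HZ=1000 (or with high-resolution timers) it wakes up 1000 times per
second, which is the factor of 10 mentioned in the quoted text.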

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Hugepages+oracle 10.0.2.0.4 in sles10SP2

2009-11-28 Thread Martin Schwidefsky
On Sat, 28 Nov 2009 12:07:16 +1000
Shane ibm-m...@tpg.com.au wrote:

> I missed the earlier part of this thread, but is large page support even
> in z/VM ?. Alan ?.
> I have a recollection that Mario Held from the Boeblingen labs said this
> has been tested, but only for an LPAR install. And I'm pretty sure I
> followed up on this and got the answer it's not even in V6.1.
> Happy to be proved wrong.

Large page support is not available in z/VM, in particular guest
support is pretty hard to do. But the nice thing is: you don't need
guest support to reap the main benefit of large pages. There are two
benefits:
1) Fewer TLB entries for the same amount of addressed shared memory.
2) Reduced memory overhead for the page tables to map the shared memory.

You get TLB saving only on LPAR with the hardware support but you can
get the page table savings on z/VM with a little trick. Allocate huge
pages like you do if you have the hardware support. Then in addition
allocate a page table page to map the huge page and associate it with
the huge page. When the huge page is mapped to a process use the same
page table for all mappers. Sort of a poor man's page table sharing. This
trick works on z/VM and on older machines that do not even have large
pages.

> On non-s390 one of the big plusses is avoiding TLB misses. But given
> that I believe z/VM also doesn't support hiperdispatch, that may not be
> much of a gain on zSeries either.

I don't see the connection between TLB misses and hiperdispatch. Could
you elaborate please?

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Hugepages+oracle 10.0.2.0.4 in sles10SP2

2009-11-28 Thread Martin Schwidefsky
On Sat, 28 Nov 2009 07:26:15 -0800
Barton Robinson bar...@vm1.velocity-software.com wrote:

> Martin, please correct me if i am wrong.
> Large page support supports 1mb pages - meaning consecutive 256 4k pages
> in hardware. These pages are fixed, are not pageable with current
> technology.  The advantage is that there is one TLB entry per megabyte
> instead of one per 4k page, so that the TLB is more efficient and more
> entries fit into the hardware cache, requiring less DAT translations.
> To get advantage then, there would have to be 1mb of very active
> programs or data packaged in that 1mb page. Operating systems could be
> packaged for this with work. zTPF took advantage of this, but that
> architecture is focused on performance, not virtualization.

Technically large pages could be swapped, the invalid bit in the
segment table entry is there. Linux does not do this because it would
be complicated and not very effective because of the packing of data
you mentioned. The main use case for large pages are large in-memory
databases.

> For oracle, it would require dedicating 256 consecutive hardware pages
> to an oracle database, running virtual under linux, running virtual
> under z/vm - yep, quite a challenge.  In a virtual environment where we
> do run many different programs - the benefit to programs and data would
> be difficult to show.

The pages that are dedicated to the large pages in the guest can be
swapped by z/VM without a problem. You could argue that these large
pages are not really large pages because of the software emulation
via shared page table pages but you do get the savings in the page
table memory which is imho well worth the effort.

> The benefits of large pages are to those places that can and do measure
> differences in performance at the single digit percentage difference
> (zTPF). z/VM could get a little advantage with a lot of work, linux in
> an LPAR as well. Linux under z/VM not. I would be surprised if the
> improvement was measurable in any truly virtualized environment.

There will be considerable improvement if the oracle workload uses
i) many processes and ii) large shared segments. For each process that
maps 1GB of shared memory you save 2MB in page tables.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: time oddity - maybe a z/VM question

2009-11-19 Thread Martin Schwidefsky
On Thu, 19 Nov 2009 08:48:30 -0600
bruce.light...@its.ms.gov wrote:

> Currently it is 08:43 CST
>
> br...@taxdbp01:~ date
> Thu Nov 19 07:43:17 CST 2009
>
> user MAINT on z/VM shows that it is 08:43

Is there an NTP daemon running on that system and if yes what is its
configuration?

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Clearing linux buffers and caches

2009-07-08 Thread Martin Schwidefsky
On Wed, 8 Jul 2009 03:00:27 +0200
Ivan Warren i...@vmfacility.fr wrote:

> cmm-1 (arch/s390/mm/cmm.c) uses SMSGs to communicate with the VMRM
> service virtual machine. The VMRM will instruct the guest to release its
> pages using good ole Diag 10.. nothing fancy here..

diag 10 actively tells z/VM that the page is free.

> cmma (arch/s390/mm/page_states.c) uses the B9AB instruction to indicate
> to CP which pages linux no longer needs.. It uses the standard Linux MM
> mechanism to be instructed to do so. (page_free()).. The only diff
> between Diag 10 and B9AB is that a page marked Unused with B9AB will
> generate an Addressing Exception (PIC 5) if you attempt to access it,
> while Diag 10 won't do that.

The B9AB instruction will mark a page as free in the pgste. That does
not cause a sie exit to z/VM which means z/VM doesn't know the page is
free until it does a page scan for the guest. So you won't see an
immediate effect in the z/VM page counts. If z/VM needs memory it will
eventually find the unused pages and the page count will drop belatedly.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: Error linking 31 bit

2009-06-04 Thread Martin Schwidefsky
On Wed, 3 Jun 2009 12:30:19 -0600
Lee Stewart lstewart.dsgr...@attglobal.net wrote:

> This was posed by one of our customers..
>
> I'm having trouble linking (ld) as a 31-bit executable. I've got the C
> flags set-up correctly, I think (they are compiling as 31-bit), but I'm
> getting this message when doing ld:
> ld: Relocatable linking with relocations from format elf32-s390
> (/devel/supra1/rel1.3.1m/linux/obj/dis_iqt.o) to format elf64-s390
> (/devel/supra1/rel1.3.1m/linux/obj/dapdis.o) is not supported
> I'm using -m31
> And
> --oformat elf32-s390
>
> Has anyone dabbled in such things and have any advice/counsel?
> Thanks,
> Lee

Run file *.o on all the objects that are passed to ld. One of them
is not an elf32-s390 ELF object. Then check the Makefiles to find the
place where an -m31 is missing.
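
If the file command is not at hand, the same check boils down to looking
at the fifth byte of each object, the EI_CLASS field of the ELF header.
A minimal sketch, with a made-up function name, that works on a buffer
holding the first bytes of a file:

```c
#include <stddef.h>
#include <string.h>

/* Return 32 or 64 for a buffer that starts with a valid ELF
 * identification, 0 for anything else. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 5 || memcmp(buf, magic, sizeof magic) != 0)
        return 0;
    switch (buf[4]) {           /* e_ident[EI_CLASS] */
    case 1:
        return 32;              /* ELFCLASS32 */
    case 2:
        return 64;              /* ELFCLASS64 */
    default:
        return 0;
    }
}
```

An object built with -m31 reports 32 here, while one built without it on
a 64 bit system reports 64, which is the mismatch ld complains about.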

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: IPL of SLES11 .ins file in Hercules yields immediate disabled wait state

2009-04-17 Thread Martin Schwidefsky
On Wed, 15 Apr 2009 09:55:08 -0400
David Boyes dbo...@sinenomine.net wrote:

> Rather than testing for specific model numbers, why don't you test for
> specific capabilities that you need/want? That way you get an exception that
> you can handle and actually issue a useful message or load a specific PSW
> (similar to the way things work on other OSes), rather than just dying in a
> cryptic way.

How about this:

Subject: [PATCH] use facility list for cpu type safety check

From: Martin Schwidefsky schwidef...@de.ibm.com

Signed-off-by: Martin Schwidefsky schwidef...@de.ibm.com
---
 arch/s390/include/asm/lowcore.h |1
 arch/s390/kernel/head.S |   49 +++-
 2 files changed, 35 insertions(+), 15 deletions(-)

diff -urpN linux-2.6/arch/s390/include/asm/lowcore.h linux-2.6-patched/arch/s390/include/asm/lowcore.h
--- linux-2.6/arch/s390/include/asm/lowcore.h   2009-04-17 09:37:40.0 +0200
+++ linux-2.6-patched/arch/s390/include/asm/lowcore.h   2009-04-17 12:14:06.0 +0200
@@ -30,6 +30,7 @@
 #define __LC_SUBCHANNEL_NR 0x00ba
 #define __LC_IO_INT_PARM   0x00bc
 #define __LC_IO_INT_WORD   0x00c0
+#define __LC_STFL_FAC_LIST 0x00c8
 #define __LC_MCCK_CODE 0x00e8

 #define __LC_DUMP_REIPL    0x0e00
diff -urpN linux-2.6/arch/s390/kernel/head.S linux-2.6-patched/arch/s390/kernel/head.S
--- linux-2.6/arch/s390/kernel/head.S   2009-04-17 09:37:40.0 +0200
+++ linux-2.6-patched/arch/s390/kernel/head.S   2009-04-17 12:14:06.0 +0200
@@ -478,27 +478,46 @@ startup:basr  %r13,0  # get base
    mvc     __LC_LAST_UPDATE_TIMER(8),6f-.LPG0(%r13)
    mvc     __LC_EXIT_TIMER(8),5f-.LPG0(%r13)
 #ifndef CONFIG_MARCH_G5
-   # check processor version against MARCH_{G5,Z900,Z990,Z9_109,Z10}
-   stidp   __LC_CPUID              # store cpuid
-   lhi     %r0,(3f-2f) / 2
-   la      %r1,2f-.LPG0(%r13)
-0: clc     __LC_CPUID+4(2),0(%r1)
-   jne     3f
-   lpsw    1f-.LPG0(13)            # machine type not good enough, crash
+   # check capabilities against MARCH_{G5,Z900,Z990,Z9_109,Z10}
+   xc      __LC_STFL_FAC_LIST(8),__LC_STFL_FAC_LIST
+   stfl    __LC_STFL_FAC_LIST      # store facility list
+   tm      __LC_STFL_FAC_LIST,0x01 # stfle available ?
+   jz      0f
+   la      %r0,0
+   stfle   __LC_STFL_FAC_LIST      # store facility list extended
+0: l       %r0,__LC_STFL_FAC_LIST
+   n       %r0,2f+8-.LPG0(%r13)
+   cl      %r0,2f+8-.LPG0(%r13)
+   jne     1f
+   l       %r0,__LC_STFL_FAC_LIST
+   n       %r0,2f+12-.LPG0(%r13)
+   cl      %r0,2f+12-.LPG0(%r13)
+   je      3f
+1: lpsw    2f-.LPG0(13)            # machine type not good enough, crash
    .align 16
-1: .long   0x000a,0x
-2:
+2: .long   0x000a,0x8bad
+#if defined(CONFIG_64BIT)
 #if defined(CONFIG_MARCH_Z10)
-   .short 0x9672, 0x2064, 0x2066, 0x2084, 0x2086, 0x2094, 0x2096
+   .long 0xc100efe3, 0xf068
 #elif defined(CONFIG_MARCH_Z9_109)
-   .short 0x9672, 0x2064, 0x2066, 0x2084, 0x2086
+   .long 0xc100efc3, 0x
 #elif defined(CONFIG_MARCH_Z990)
-   .short 0x9672, 0x2064, 0x2066
+   .long 0xc0002000, 0x
 #elif defined(CONFIG_MARCH_Z900)
-   .short 0x9672
+   .long 0xc000, 0x
 #endif
-3: la      %r1,2(%r1)
-   brct    %r0,0b
+#else
+#if defined(CONFIG_MARCH_Z10)
+   .long 0x8100c880, 0x
+#elif defined(CONFIG_MARCH_Z9_109)
+   .long 0x8100c880, 0x
+#elif defined(CONFIG_MARCH_Z990)
+   .long 0x80002000, 0x
+#elif defined(CONFIG_MARCH_Z900)
+   .long 0x8000, 0x
+#endif
+#endif
+3:
 #endif

    l       %r13,4f-.LPG0(%r13)
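
For readers who do not speak s390 assembler: each n / cl / jne group in
the patch checks that every required facility bit is present in the
stored facility word. Expressed in C it is roughly the following (a
sketch only; facilities_sufficient is a made-up name and the kernel does
this in assembler before any C code runs):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when every bit set in `required` is also set in `installed`.
 * Mirrors the n / cl / jne sequence: AND the facility word with the
 * mask, then compare the result against the mask itself. */
bool facilities_sufficient(uint32_t installed, uint32_t required)
{
    return (installed & required) == required;
}
```

A machine that reports extra facilities still passes, a machine that
lacks even one required bit ends in the disabled wait PSW.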


--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: IPL of SLES11 .ins file in Hercules yields immediate disabled wait state

2009-04-17 Thread Martin Schwidefsky
On Fri, 17 Apr 2009 15:41:49 +0200
Ivan Warren i...@vmfacility.fr wrote:

> Martin Schwidefsky wrote:
> >  #if defined(CONFIG_MARCH_Z10)
> > -   .short 0x9672, 0x2064, 0x2066, 0x2084, 0x2086, 0x2094, 0x2096
> > +   .long 0xc100efe3, 0xf068
>
> Does the linux kernel (with CONFIG_MARCH_Z10) really need all the z10
> facilities ? (especially some of those are simply hints - like bit 19)

Bit 19 is 0x1000 in the first stfl word and is not set, no?

The logic I've used for the capability bits:
1) the bits the kernel really needs, e.g. the z/Architecture bit for
   a 64 bit kernel
2) the bits for facilities that can be used in user space and therefore
   in theory could be generated by the compiler, e.g. by __builtin_xyz

I did not include:
1) bits for facilities that can only be used by the kernel and are
   currently unused, e.g. the ASN-and-LX reuse facility.
2) hint bits, e.g. about the performance of long-displacement or DFP
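
The bit 19 arithmetic is easy to get wrong because facility bits are
numbered from the left, with bit 0 being the most significant bit of the
first word. A small sketch with made-up helper names, valid for bits 0
to 31 of one 32 bit stfl word:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask for facility bit `bit` (0..31) within one 32 bit stfl word.
 * Bit 0 is the leftmost, most significant bit. */
uint32_t facility_mask(unsigned int bit)
{
    return UINT32_C(0x80000000) >> bit;
}

/* True when facility bit `bit` is set in `word`. */
bool has_facility(uint32_t word, unsigned int bit)
{
    return (word & facility_mask(bit)) != 0;
}
```

facility_mask(19) gives 0x1000, and that bit is indeed clear in the
0xc100efe3 mask from the patch, while bits 0 and 1 are set.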

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: IPL of SLES11 .ins file in Hercules yields immediate disabled wait state

2009-04-15 Thread Martin Schwidefsky
On Wed, 15 Apr 2009 09:55:08 -0400
David Boyes dbo...@sinenomine.net wrote:

> On 4/15/09 5:49 AM, Heiko Carstens heiko.carst...@de.ibm.com wrote:
>
> > How about the patch below?
> > Since I would expect that this is going to happen a lot of times as
> > soon as some distro starts to compile the kernel with e.g. only z9-109
> > and higher support we indeed need a magic number here.
> > Otherwise we can't tell immediately what's going wrong.
>
> Rather than testing for specific model numbers, why don't you test for
> specific capabilities that you need/want? That way you get an exception that
> you can handle and actually issue a useful message or load a specific PSW
> (similar to the way things work on other OSes), rather than just dying in a
> cryptic way.
>
> If I remember, the response from STSI should give you whether a specific set
> of capabilities is present, and then you don't care what weird-ass things
> the marketing people do.

If you look at arch/s390/Makefile you will find this:

cflags-$(CONFIG_MARCH_G5)   += $(call cc-option,-march=g5)
cflags-$(CONFIG_MARCH_Z900) += $(call cc-option,-march=z900)
cflags-$(CONFIG_MARCH_Z990) += $(call cc-option,-march=z990)
cflags-$(CONFIG_MARCH_Z9_109) += $(call cc-option,-march=z9-109)
cflags-$(CONFIG_MARCH_Z10) += $(call cc-option,-march=z10)

We could translate the compile option for a particular model to the
set of capabilities of the machine. That would help with future models
where we don't know the model number yet.

Printing a message is harder to do. We cannot use the device drivers
because they are compiled with options that will generate instructions
that will trap on older machines. The only option would be to use
a function written in assembler to print a message via sclp.

--
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: zLinux on Hercules and DIAG 308

2008-12-15 Thread Martin Schwidefsky
On Sun, 2008-12-14 at 17:28 -0500, Richard Troth wrote:
 What happens is this:  the kernel loads, console traffic begins, but
 before /sbin/init can get launched something triggers DIAG 308.  (In
 fact, I think this is before the root gets mounted.)  If I boot from
 disk, it loops rebooting.  If I boot from the
 reader, it stops.  (empty hopper - duh!)

That is probably the diag308_set_works detection code that is called
early in the boot process:

void __init ipl_update_parameters(void)
{
	int rc;

	rc = diag308(DIAG308_STORE, ipl_block);
	if ((rc == DIAG308_RC_OK) || (rc == DIAG308_RC_NOCONFIG))
		diag308_set_works = 1;
}

I did not look at the hercules code but I guess that this detection code
triggers a reboot. It does match the symptom ..

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: question on top

2008-12-08 Thread Martin Schwidefsky
On Mon, 2008-12-08 at 07:25 -0800, Barton Robinson wrote:
 Sorry Christian, but with the latest and greatest, there are many cases where 
 Linux and
 TOP now seriously under report utilization (I think by factor of 5 in the 
 lab, and by 4 in
 a production server). Not sure we've bothered to report the details since 
 this problem
 would not impact our users.  So the data still can not be used for serious 
 performance
 work, capacity planning or accounting/chargeback.  It's like putting gas in a 
 car, and the
 price per unit varies with the number of other people wanting gas.  Doesn't 
 lead one to
 trust the instrumentation.

1) Rob, please report these discrepancies. The numbers linux reports
should be correct.
2) The factor of 4-5 is based on what numbers exactly? I doubt that you
get that discrepancy if you are running a cpu bound linux process that
uses more than a few percent of cpu. As already pointed out the
situation that started this thread is very likely a multi threaded
program and top aggregates the cputime.
3) Top is by no means a monitoring tool. You can use it to get a rough
snapshot of the current situation but please don't use it instead of a
real monitor because top itself uses a lot of cpu.

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: question on top

2008-12-08 Thread Martin Schwidefsky
On Mon, 2008-12-08 at 17:41 +0100, Rob van der Heij wrote:
  a production server). Not sure we've bothered to report the details since 
  this problem
  would not impact our users.  So the data still can not be used for serious 
  performance
 
  The last time we talked, your tool used the Linux data as one input value of
  your calculations. So if the Linux data is really wrong, any fix would 
  improve
  the accuracy of your tool, no?

 I don't think the measurements based on CPU timer are more accurate
 than those based on TOD.

Sorry Rob but this is nonsense.

 For one thing because the CPU timer is less accurate than the TOD clock.

Principles of Operation chapter 4 about the CPU timer:
The CPU timer is a binary counter with a format which is the same as
that of bits 0-63 of the TOD clock, except that bit 0 is considered a
sign. The CPU timer nominally is decremented by subtracting a one in bit
position 51 every microsecond.

I would call this as accurate as the TOD clock. The stepping rates are
not 100% the same if the TOD-clock-steering facility is installed but
the difference is very very small. By the way z/VM is using the same
mechanism to do its own cputime accounting.

 It's accurate enough when you measure a single virtual machine.
 But when the kernel is reloading the CPU timer again and again for
 each process or thread using a small amount of CPU, the error adds up
 very quick.

This statement is wrong. The CPU timer is reprogrammed when a CPU goes
idle, after it wakes up from idle, when a new earliest CPU timer event
is added and when a CPU timer event expires. Usually there are no CPU
timer events so we only reprogram the CPU timer going in and out of
idle. In particular the kernel does not reprogram the CPU timer for each
process. The overall error is minuscule, the following function programs
the CPU timer:

static inline void set_vtimer(__u64 expires)
{
	__u64 timer;

	asm volatile ("  STPT %0\n"  /* Store current cpu timer value */
		      "  SPT %1"     /* Set new value immediately afterwards */
		      : "=m" (timer) : "m" (expires) );
	S390_lowcore.system_timer += S390_lowcore.last_update_timer - timer;
	S390_lowcore.last_update_timer = expires;

	/* store expire time for this CPU timer */
	__get_cpu_var(virt_cpu_timer).to_expire = expires;
}

The instruction to store the current value and the instruction to set
the new value are next to each other. You cannot do better.

There is one problem we recently identified and that is the cputime
spent by the idle process doing actual system work is accounted as idle
time instead of system time. I have a patch for this problem, it will go
upstream with the next merge window. The maximum difference I was able
to create with my testcases has been 0.35%.

 And because the CPU timer measures only in-SIE time, you miss the
 resources that CP and SIE spent on behalf of the virtual machine. Even
 when you don't measure it, someone still has to pay for it ;-)

This is called CP overhead and there are two cases. If CP wants to
account CPU time to the guest because it has done work on behalf of the
guest, it can simply add the time to the guest CPU timer in the SIE
control block before the guest cpu is restarted. The cputime spent by CP
for things not directly related to a guest should NOT be accounted to
the guest. This part of the CP overhead has to be accounted by z/VM.

 When I was diagnosing the customer problem, I did notice one bug in
 the kernel that probably could be fixed. But I did not have time yet
 to try that and see how big the difference would be. And in general,
 the additional code for dealing with the CPU timer makes the
 unmeasured part of time longer, so in general reduces the capture
 ratio.

How is the unmeasured part of the time longer? There is some overhead
for doing the improved Linux cputime accounting but the additional
instructions are fully accounted as cputime in Linux.

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: z10 binutils release

2008-06-13 Thread Martin Schwidefsky
On Thu, 2008-06-12 at 09:11 -0600, Mark Post wrote:
 Why not just use the linux versions of the binutils? They are released
  more often and the latest version 2.18.50.0.7 comes with the z10
  instructions included. You'll find them on ftp.kernel.org under
  pub/linux/devel/binutils.

 Wouldn't the OP also need to use a version of GCC that generates the
 z10 instructions and optimizations?

The question was specifically about the binutils version. If you want to
use the z10 instructions with your standard C/C++ applications you indeed
will need a new gcc that makes use of the z10 instructions. The code is
in the gcc SVN but we did not finish the patches in time for gcc 4.3.
You'll either have to use the upstream variant or wait for gcc 4.4.

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Where is cmma setting?

2008-02-27 Thread Martin Schwidefsky
On Wed, 2008-02-27 at 14:37 +0100, Rob van der Heij wrote:
 Oh, and I doubt your patch does the trick. The ESSA will program check
 when not there. I don't think fault.c will like that.

The EX_TABLE entry will take care of the fault. As far as I can tell
Heiko's patch is fine.

 PS I did read that Brian Wade's experiments were done with cmma=on and
 use the CP settings to enable / disable the feature. When my
 assumptions about the implementation are correct, then that was indeed
 the right way to do it. But it does not reveal what will happen to
 customers who end up with a mix of penguins.

This would be an interesting experiment. The difference between cmma=on
and cmma=off for a page cache page that has been paged-out by z/VM is
not big. If a guest accesses such a page with cmma=off the z/VM system
will initiate an i/o to get the page and deliver a pfault interrupt to
the guest so that the guest can schedule another process. If a guest
uses cmma=on it will get a discard fault on access of a paged-out page.
In this case it is Linux that initiates the i/o. The process will have
to wait for the page and another process is scheduled.

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Betr.: Re: Where is cmma setting?

2008-02-27 Thread Martin Schwidefsky
On Sun, 2008-02-24 at 15:36 -0700, Mark Post wrote:
  Why? What is broken?

 Pretty much the whole thing.  On a too-frequent basis it would go
 pathological and hurt you more than help you.  According to Martin
 Schwidefsky, however, about a week ago, he's changed his opinion on
 CMMA.  He gave me this URL to look at:
 http://www.vm.ibm.com/perf/reports/zvm/html/530cmm.html  Look it over
 and make sure you meet the maintenance requirements.  Since I'm not a
 performance expert, I can't really assess the new-and-improved CMMA
 myself.  I have to depend on other people for that.

We had a rather interesting time to get cmma working the way we wanted
it to work. We had some rather nasty bugs and got worried that there might
be more daemons lurking in the dark. So we decided to play safe and make
cmma=off the default. In the meantime we got good results and didn't
find any more bugs so it is safe to turn on cmma now. But make sure that
you have the latest service levels.

  For now CMMA seems a lot easier to do (far less moving parts
 involved), and
  I have all the requirements.

 Not sure how many moving parts you think CMM-1 has.  As far as I can
 see, it's loading the kernel module and echoing some values
 into /proc/sys/vm/cmm_*

You need to set up VMRM to generate the messages that direct the cmm-1
modules on the different machines. With cmma you don't have to do
anything but switch it on.

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Linuxes on LPARs become unresponsive

2008-01-04 Thread Martin Schwidefsky
On Fri, 2008-01-04 at 10:34 +0200, Niemi Ari wrote:
  I wonder if anyone would have any clue that why this is happening?
 
 One thing to add to all the other responses.  The screen shot shows 0
 swap space available.  That could just mean it was all used up, but it
 does raise the question: do you have any swap space defined for these
 systems?

 Yes. For instance, there is 1,5 GB swap space for those systems with 1
 GB of memory. From top:

 top - 10:32:16 up 19:35,  2 users,  load average: 0.01, 0.06, 0.02
 Tasks:  80 total,   2 running,  78 sleeping,   0 stopped,   0 zombie
 Cpu(s): 10.7% us,  0.7% sy,  0.0% ni, 88.3% id,  0.0% wa,  0.0% hi,
 0.3% si
 Mem:   1033748k total,   990516k used,43232k free,   224960k buffers
 Swap:  1572856k total,0k used,  1572856k free,   268520k cached

 BTW, the swap space is located on an lvm logical volume.

The fact that the system has 1.5GB swap space and the oom killer reports
FreeSwap: 0kb, Active:29378 inactive:29410 and dirty:0 indicates
that there has been roughly 1.8GB of anonymous memory around at the time
the oom killer struck. Before the system ooms all clean pages with a
backing are removed. Since there have not been any dirty pages, the
pages on the active/inactive list are to a very large degree anonymous
memory - 1.5GB swap + (29378+29410)*4K of anonymous memory.
The system either has not enough swap space or some process went
ballistic. The oom killer usually picks the process that went ballistic
since it has the biggest badness. So my best guess is that you have
processes with a memory leak and the name of one of these processes has
been kuma610.
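
The estimate can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the function name is made up, and the swap size is the total from the top output quoted above, assumed to be fully used at the time of the oom.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate anonymous memory in KiB: used swap plus the pages on the
 * active and inactive lists (assumed to be almost all anonymous).
 */
static uint64_t anon_estimate_kib(uint64_t swap_used_kib,
				  uint64_t active_pages,
				  uint64_t inactive_pages,
				  uint64_t page_size_kib)
{
	assert(page_size_kib > 0);
	return swap_used_kib +
	       (active_pages + inactive_pages) * page_size_kib;
}
```

With 1572856 KiB of swap, 29378 active and 29410 inactive pages of 4 KiB each, this gives 1808008 KiB, in line with the rough 1.8GB above.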

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: zLinux performance collection tool(s)?

2007-10-31 Thread Martin Schwidefsky
On Wed, 2007-10-31 at 09:49 +0100, Rob van der Heij wrote:
 The other part is the virtualization effect, where the Linux kernel
 believes it uses 100% of the CPU, but because of shared resources on
 z/VM it only gets 10% (so all readings are order of magnitude off).
 The latest Linux kernels have changed this to report 10% - so what's
 the conclusion when you see Linux report 10% usage; is it short on CPU
 or not ;-).

If Linux reports 10% usage and a steal time of 90% then it's short on
CPU, if the sum of the usage and steal time is not close to 100% then
there is spare cpu time that could be used by the Linux processes. This
can be concluded inside Linux, you don't need an external monitor for
this particular problem.
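
The rule can be written down as a small helper. This is a sketch only; the names are hypothetical and the threshold that defines "close to 100%" is an arbitrary parameter, not something taken from the kernel or from top.

```c
#include <assert.h>

enum cpu_verdict { CPU_SHORT, CPU_SPARE };

/*
 * usage_pct and steal_pct are the percentages reported inside Linux.
 * If their sum is close to 100% (at or above 'threshold'), the guest
 * is short on CPU; otherwise there is spare cpu time.
 */
static enum cpu_verdict judge_cpu(double usage_pct, double steal_pct,
				  double threshold)
{
	assert(usage_pct >= 0.0 && steal_pct >= 0.0);
	return (usage_pct + steal_pct >= threshold) ? CPU_SHORT : CPU_SPARE;
}
```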

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Linux password and the VM console

2007-10-03 Thread Martin Schwidefsky
On Wed, 2007-10-03 at 01:13 +0200, Ivan Warren wrote:
 Masking a password masked at the 3215 console should be possible :

 1) In drivers/s390/char/con3215.c, if the driver detects that ECHO is
 off for the underlying TTY, then issue X'0E' instead of X'0A'  for the
 read CCW (in raw3215_mk_read_req() ?)
 2) When prompted for a password, press ENTER (or whatever your 3270 SEND
 AID key may be) first, enter your password and then press ENTER again..
 If you type the password directly without first sending ATTN, the X'0E'
 will be issued too late and the input area won't be masked (that's
 because, contrary to a real 3215, on VM, you don't have to ATTN before
 you type something) :

In principle it is a good idea to use read inhibited (X'0E') instead of
read inquiry (X'0A') but unfortunately it doesn't work with the current
3215 support in z/VM. The password is only suppressed if a X'0E' read is
pending when the password is typed. The user might take a long time to
type in the password and console output might get printed in the
meantime. To get the output on the screen the 3215 driver would have to
stop the outstanding read but that is not possible! The halt subchannel
on an emulated 3215 device does not have the desired effect. The read
just continues. This means you can block the console by typing in a user
name on the login prompt and then doing nothing. If the console output
buffer is filled up by some messages this will stop the whole virtual
machine. Only after the pending read has completed the system can
continue. That is where I stopped with the patch for the 3215 driver, to
me it is not acceptable that the system can drop dead because of a
pending read.

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: How much more memory to relieve swapping?

2007-06-28 Thread Martin Schwidefsky
On Thu, 2007-06-28 at 15:24 +0200, Rob van der Heij wrote:
 A higher value of swappiness means that when Linux memory management
 needs some free memory, it is more willing to swap out a process than
 to purge data in cache.

The definition of swappiness is murky. The relevant code snippets and
comments from the source:

	/*
	 * `distress' is a measure of how much trouble we're having
	 * reclaiming pages.  0 -> no problems.  100 -> great trouble.
	 */
	distress = 100 >> min(zone->prev_priority, priority);

	/*
	 * The point of this algorithm is to decide when to start
	 * reclaiming mapped memory instead of just pagecache.  Work out
	 * how much memory
	 * is mapped.
	 */
	mapped_ratio = ((global_page_state(NR_FILE_MAPPED) +
			global_page_state(NR_ANON_PAGES)) * 100) /
				vm_total_pages;
	...
	/*
	 * Now decide how much we really want to unmap some pages.  The
	 * mapped ratio is downgraded - just because there's a lot of
	 * mapped memory doesn't necessarily mean that page reclaim
	 * isn't succeeding.
	 *
	 * The distress ratio is important - we don't want to start
	 * going oom.
	 *
	 * A 100% value of vm_swappiness overrides this algorithm
	 * altogether.
	 */
	swap_tendency = mapped_ratio / 2 + distress + sc->swappiness;
	...
	/*
	 * Now use this metric to decide whether to start moving mapped
	 * memory onto the inactive list.
	 */
	if (swap_tendency >= 100)
force_reclaim_mapped:
		reclaim_mapped = 1;


If the distress -- which is defined in terms of zone priorities --
reaches a certain limit, the system starts to swap. try_to_free_pages
repeatedly scans the active/inactive list with a priority value that
starts at 12. The priority is decreased from run to run if not enough
pages can be freed. Black magic.
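
To take some of the magic out of it, the formula can be tried in user space. The following is a sketch that copies only the arithmetic from the snippets above; the function names are made up and the inputs are passed in as plain integers instead of being read from the zone.

```c
#include <assert.h>

#define DEF_PRIORITY 12	/* starting scan priority mentioned above */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* 0 at priority 12, rising to 100 when the priority has dropped to 0. */
static int distress(int prev_priority, int priority)
{
	assert(prev_priority >= 0 && prev_priority <= DEF_PRIORITY);
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return 100 >> min_int(prev_priority, priority);
}

static int swap_tendency(int mapped_ratio, int prev_priority,
			 int priority, int swappiness)
{
	return mapped_ratio / 2 + distress(prev_priority, priority) +
	       swappiness;
}

/* Mapped memory is reclaimed once the tendency reaches 100. */
static int reclaim_mapped(int mapped_ratio, int prev_priority,
			  int priority, int swappiness)
{
	return swap_tendency(mapped_ratio, prev_priority, priority,
			     swappiness) >= 100;
}
```

With 40% mapped memory and a swappiness of 60 nothing is swapped at priority 12 (20 + 0 + 60 = 80), but at priority 2 the distress of 25 pushes the sum to 105.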

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: what's Linux page size

2007-06-27 Thread Martin Schwidefsky
On Wed, 2007-06-27 at 11:13 -0400, Richard Troth wrote:
 4K on all platforms

There are several platforms where the pagesize is not 4K, e.g.
alpha(8K), powerpc(4K or 64K), ia64(4K, 8K, 16K or 64K), ...
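
Since the page size differs between platforms, a program should query it at run time instead of hardcoding 4K. A minimal sketch using the POSIX sysconf() call; the wrapper name is made up:

```c
#include <assert.h>
#include <unistd.h>

/* Return the page size of the running system in bytes. */
static long runtime_page_size(void)
{
	long size = sysconf(_SC_PAGESIZE);

	assert(size > 0);
	return size;
}
```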

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Let Novell Know if you want a easy CMS-friendly starter system!

2007-06-19 Thread Martin Schwidefsky
On Tue, 2007-06-19 at 06:19 -0400, Rick Troth wrote:
 Not only are some inbound converted to meaningful VT100-like sequences
 but I found out that some of the *outbound* are already handled too!
 Dunno if this was done by Boeblingen or UTSGlobal,  but either way,
 nice job folks!!

Thanks, the vt100/ANSI emulation was a pet project of mine :-)

 So ... in a pinch,  I tested the followin sequences:

 ESC[...H   for explicit cursor placement
 ESC[...J   for clearing the screen
 ESC[...m   for text attributes (eg: color)

Implemented are:
  Esc [ 0 K   Erase from current position to end of line inclusive
  Esc [ 1 K   Erase from beginning of line to current position inclusive
  Esc [ 2 K   Erase entire line (without moving cursor)
  Esc [ 0 J   Erase from current position to bottom of screen inclusive
  Esc [ 1 J   Erase from top of screen to current position inclusive
  Esc [ 2 J   Erase entire screen (without moving the cursor)
  Esc [ attr ; attr ; ... m
with attr ; attr ; ... a sequence of
  0  Reset highlight and color
  4  Start underlining
  5  Start blink
  7  Start reverse
 24  End underlining
 25  End blink
 27  End reverse
 30  Black
 31  Red
 32  Green
 33  Yellow
 34  Blue
 35  Magenta
 36  Cyan
 37  White
 39  Black
  Esc [ n A   Cursor n Up
  Esc [ n F   Cursor n Up
  Esc [ n B   Cursor n Down
  Esc [ n e   Cursor n Down
  Esc [ n E   Cursor n Down
  Esc [ n C   Cursor n Forward
  Esc [ n a   Cursor n Forward
  Esc [ n D   Cursor n Backward
  Esc [ x G   Set Cursor Horizontal Absolute
  Esc [ x `   Set Cursor Horizontal Absolute
  Esc [ y ; x H Set Cursor Position (x,y)
  Esc [ y ; x f Set Cursor Position (x,y)
  Esc [ y d   Set Cursor Vertical Absolute
  Esc 7 Save Cursor Position
  Esc [ s   Save Cursor Position
  Esc 8 Restore Cursor Position
  Esc [ u   Restore Cursor Position
  Esc [ n @   Insert n Characters
  Esc [ n P   Delete n Characters
  Esc [ n X   Erase n Characters
  Esc c Reset Terminal
  Esc D Line Feed
  Esc E Next Line
  Esc M Reverse Index
  Esc Z Respond ID
  Esc [ 5 n Device Status Report
  Esc [ 6 n Cursor Position Report
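
A short sketch of how a program could build two of the sequences from the list above. The helper names are hypothetical; only the escape sequences themselves come from the list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "Esc [ y ; x H" (set cursor position); returns the length. */
static int seq_cursor_pos(char *buf, size_t len, int y, int x)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "\033[%d;%dH", y, x);
}

/* Build "Esc [ attr m", e.g. 31 for red or 0 to reset. */
static int seq_attr(char *buf, size_t len, int attr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "\033[%dm", attr);
}
```

Writing the buffers with fputs() to the terminal, followed by the text, should place red text at the chosen position; a final seq_attr(..., 0) resets highlight and color.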

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Let Novell Know if you want a easy CMS-friendly starter system!

2007-05-31 Thread Martin Schwidefsky
On Wed, 2007-05-30 at 23:35 -0400, Rick Troth wrote:
 On an 'xterm',  pressing F1 delivers an  ESC[224z  sequence.
 One way to get full-screen 3270 interaction to ASCII apps
 is to have the 3270 PF1 AID converted to an  ESC[224z  sequence.

With the current 3270 driver in 2.6.x kernels some PFx AID keys are
already translated to meaningful escape sequences. For example PF1 AID
is translated to \033[[A which is F1 in the vt100 world.
You can change the translation with the loadkeys user space tool. The
3270 driver uses a little trick. loadkeys is normally used to do the
input translation of raw keyboard keys. The 3270 driver uses the
translation table defined with loadkeys for the output as well. This way
it is possible to change the code page that is used with a particular
device. Just define a new map file and feed it into loadkeys. The map
file contains a section where you can replace keys with strings, here is
the line for the PF1 translation:
string F1 = \033[[A
The PF1 key itself is mapped as
shift   control keycode 113 = F1
The keycode tables in linux are 128 bytes long but there are several of
them (keycode, shift keycode, control keycode, ...). On the 3270 side we
have 8 bit characters and the AID prefix. The AID prefix codes get
translated to 256 + character, so there are keycodes 0-511. For example
PF1 is EBCDIC 0xF1 but it is always preceded by an AID, which makes
the keycode for PF1 0x1F1. Four keymap tables are used to cover
everything, keycode for 0-127, shift keycode for 128-255, control
keycode for 256-383 and shift control keycode for 384-511. So for AID
PF1 0x1F1 this translates to shift control keycode 113. It is a bit
awkward but I did not want to write new user space tools. You'll find
the default keymap in linux/drivers/s390/char/defkeymap.map. Change at
will, load it with loadkeys and watch :-)
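
The keycode arithmetic can be checked with a small sketch. The type and function names are hypothetical; the ranges are the ones described above.

```c
#include <assert.h>

enum keymap_table { KT_PLAIN, KT_SHIFT, KT_CONTROL, KT_SHIFT_CONTROL };

struct keymap_slot {
	enum keymap_table table;	/* which of the four tables */
	unsigned int index;		/* keycode within that table, 0-127 */
};

/* An AID-prefixed character becomes 256 + character. */
static unsigned int aid_keycode(unsigned char ch)
{
	return 256u + ch;
}

/* Map a keycode 0-511 to one of the four 128-entry tables. */
static struct keymap_slot keycode_to_slot(unsigned int keycode)
{
	struct keymap_slot slot;

	assert(keycode < 512);
	slot.table = (enum keymap_table)(keycode / 128);
	slot.index = keycode % 128;
	return slot;
}
```

For PF1, aid_keycode(0xF1) gives 0x1F1 (497), which lands in the shift control table at index 497 - 384 = 113.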

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: what is the granularity of itimer

2007-05-24 Thread Martin Schwidefsky
On Thu, 2007-05-24 at 11:28 -0400, Brad Hinson wrote:
 itimer is based on jiffies, so the granularity should be 1 millisecond
 (HZ=1000/s) in kernel space.  I haven't tested this, though.  Are you
 seeing something different?

HZ is 100 on s390 so the granularity is 10 milliseconds.
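
The arithmetic, as a sketch: the tick length follows directly from HZ, while the rounding helper is an assumption about how a jiffies based timer treats a requested interval, not code from the kernel.

```c
#include <assert.h>

/* Length of one tick in milliseconds for a given HZ value. */
static unsigned int tick_ms(unsigned int hz)
{
	assert(hz > 0 && hz <= 1000);
	return 1000 / hz;
}

/* Assumed behaviour: a requested interval is rounded up to full ticks. */
static unsigned int round_up_to_tick(unsigned int ms, unsigned int hz)
{
	unsigned int tick = tick_ms(hz);

	return ((ms + tick - 1) / tick) * tick;
}
```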

--
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.



Re: Steal % in top - s390x only?

2007-04-07 Thread Martin Schwidefsky
On Fri, 2007-04-06 at 16:34 -0400, Michael MacIsaac wrote:
 My question is, does this also work on other architectures? I don't have
 an Intel box with RHEL5 - does that new value show up?  And, if you are
 running VMware or other hipervisor technology, will it also show non-zero
 (i.e. does it also work?)

The st field shows up, e.g. under debian running on my thinkpad. It is
just that the field is zero on almost all architectures. With today's
git tree s390, powerpc and the i386 paravirtual timer support steal time
accounting.
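
Tools like top get the value from the aggregate cpu line in /proc/stat, where steal is the eighth number. A sketch of a parser for such a line; the function name is made up and the line is passed in as a string so it can be tried without reading the file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the aggregate "cpu" line of /proc/stat. The fields are
 * user, nice, system, idle, iowait, irq, softirq, steal.
 * Returns 1 and stores the steal ticks on success, 0 otherwise.
 */
static int parse_steal(const char *line, unsigned long long *steal)
{
	unsigned long long user, nice, sys, idle, iowait, irq, softirq;

	assert(line != NULL && steal != NULL);
	return sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
		      &user, &nice, &sys, &idle, &iowait, &irq, &softirq,
		      steal) == 8;
}
```

On a kernel without the steal field the line is shorter and the function returns 0.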

-- 
blue skies,              IBM Deutschland Entwicklung GmbH
   Martin                Vorsitzender des Aufsichtsrats: Johann Weihen
                         Geschäftsführung: Herbert Kircher
Martin Schwidefsky       Sitz der Gesellschaft: Böblingen
Linux on zSeries         Registergericht: Amtsgericht Stuttgart,
   Development           HRB 243294

Reality continues to ruin my life. - Calvin.



Re: vmcp problem

2007-03-15 Thread Martin Schwidefsky
On Thu, 2007-03-15 at 12:54 +0200, Avinoam hirschberg wrote:
 we don't have any .o or .ko files so this explain the error
 we did see that make file that come with the s390-tools remove the .o
 files

The kernel module vmcp.ko is delivered with the kernel, not the
s390-tools package. If the modprobe vmcp fails you need to upgrade your
kernel. The vmcp.o file is an intermediate file for the vmcp user space
program.

-- 
blue skies,              IBM Deutschland Entwicklung GmbH
   Martin                Vorsitzender des Aufsichtsrats: Johann Weihen
                         Geschäftsführung: Herbert Kircher
Martin Schwidefsky       Sitz der Gesellschaft: Böblingen
Linux on zSeries         Registergericht: Amtsgericht Stuttgart,
   Development           HRB 243294

Reality continues to ruin my life. - Calvin.



Re: vmcp problem

2007-03-15 Thread Martin Schwidefsky
On Thu, 2007-03-15 at 09:56 -0400, David Boyes wrote:
 Geez. No wonder they have to use bigger paper sheets in Europe -- that 
 address is a mouthful. 

There is a new law in .de that requires a specific e-mail closing (what
you see below) for external business mails. But I could remove that
mouthful for mailing list postings, they have lifted the rules a bit.

Compared to my old signature this one is only a bit wider, the new one
has 8 lines as did the old signature.

-- 
blue skies,              IBM Deutschland Entwicklung GmbH
   Martin                Vorsitzender des Aufsichtsrats: Johann Weihen
                         Geschäftsführung: Herbert Kircher
Martin Schwidefsky       Sitz der Gesellschaft: Böblingen
Linux on zSeries         Registergericht: Amtsgericht Stuttgart,
   Development           HRB 243294

Reality continues to ruin my life. - Calvin.



Re: Trying to Debug IMAP Support in PHP

2007-01-22 Thread Martin Schwidefsky
On Sun, 2007-01-21 at 13:43 -0500, Mark Post wrote:
 0x016f7cb8 auth_md5_valid+24: la  %r4,160(%r11)
 0x016f7cbc auth_md5_valid+28: brasl   %r14,0x1008000aec4
 0x016f7cc2 auth_md5_valid+34: ltr %r2,%r2
 0x016f7cc4 auth_md5_valid+36: je  0x16f7cd8

Looks like a problem with shared library code that has been compiled
without -fpic/-fPIC. The brasl has a reach of +-4GB, for branches that
are farther away it cannot be used. Check the compile logs, if you still
have them, for a "relocation truncated to fit" error message.
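
The reach of brasl can be expressed as a range check. This is a sketch, not the linker's code: brasl encodes a signed 32-bit offset counted in halfwords, so the byte distance has to be even and lie between -2^32 and 2^32 - 2. The function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if a brasl at address 'from' can reach the target 'to'. */
static int brasl_reachable(uint64_t from, uint64_t to)
{
	int64_t delta;

	assert((from & 1) == 0);	/* instructions are halfword aligned */
	if (to & 1)
		return 0;
	delta = (int64_t)(to - from);
	return delta >= -((int64_t)1 << 32) &&
	       delta <= (((int64_t)1 << 32) - 2);
}
```

For the addresses in the disassembly above, the distance from 0x016f7cbc to 0x1008000aec4 is far outside that range.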

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: Trying to Debug IMAP Support in PHP

2007-01-22 Thread Martin Schwidefsky
On Mon, 2007-01-22 at 09:25 -0500, Mark Post wrote:
 No, that wasn't it.  I had to rebuild glibc anyway, so I wrote everything to
 a log (as usual).  No truncation messages, but the problem still exists.

 One thing I didn't think to look at before are the several calls to __xstat
 earlier in the gdb run.  Those all seem to go fine.  When I look at what
 routine is calling it, it's __libc_csu_fini.  When I disassemble that, the
 call looks like this:
 0x80064ad0 __libc_csu_fini+148:   brasl   %r14,0x8000aec4
 [EMAIL PROTECTED]

 You can see that the branch is going to [EMAIL PROTECTED]

 What I find odd is that when the breakpoint trips, it says:
 Breakpoint 1, 0x012fe2dc in _xstat () from /lib64/libc.so.6

 So, the difference between success and failure is
 0x8000aec4 versus 0x1008000aec4

Do a readelf -a on the object that contains auth_md5_valid and take a
look at the relocations that are applied to the function. The code
snippet above does not match the code snippet that contains the broken
brasl. The last three digits of the address have to match. I think you
are looking at a different object.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: [PATCH 0/59] Cleanup sysctl

2007-01-17 Thread Martin Schwidefsky
On Tue, 2007-01-16 at 09:33 -0700, Eric W. Biederman wrote:
 There has not been much maintenance on sysctl in years, and as a result is
 there is a lot to do to allow future interesting work to happen, and being
 ambitious I'm trying to do it all at once :)

The s390 parts look good. The kernel boots and the system controls are still
working. I had to add an #include <linux/uaccess.h> to ipc/ipc_sysctl.c
to get the kernel compiled. That include should be added to patch #51.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED] for:
[PATCH 33/59] sysctl: s390 move sysctl definitions to sysctl.h
[PATCH 34/59] sysctl: s390 Remove unnecessary use of insert_at_head

and the s390 parts of
[PATCH 55/59] sysctl: Remove insert_at_head from register_sysctl

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: what is the conversion from mips to Ghz or back

2006-11-02 Thread Martin Schwidefsky
On Sat, 2006-10-28 at 02:32 -0400, Alan Altmark wrote:
 On Friday, 10/27/2006 at 04:12 ZE2, Martin Schwidefsky
 [EMAIL PROTECTED] wrote:
  After the BogoMips number has entertained us once again I'm inclined to
  fix this stupid number.

 Yipee!

:-)

  For replacement we could use the cpu capability that is reported by the
  store system information instruction in SYSIB 1.2.2. On our test system
  it reports a cpu capability of 1456 (System z type 2094 model 738/S38).
  The bogomips number is between ~3400 and ~4000 dependent on the phase of
  the moon and the my caffeine-level. With the attached patch the same
  number is report every time. Makes sense, doesn't it ?

 Is it possible to call it the BogoFactor instead of BogoMIPS?  Esp.
 since there is no formal definition of the CPU Capability?

At least the bogomips entry in the /proc/cpuinfo interface is fixed, we
cannot just change it. I would leave it as it is.

 It seems that you should apply the adjustment factors and select the
 Primary, Secondary, or Alternate CPU capability as appropriate.  Also, to
 quote the book, a lower value indicates a proportionally higher CPU
 capacity.   Beyond that, there is no formal description of the algorithm
 used to generate this value.  Hmmm... lower value ... hmmm From
 that I infer that it is related to cycle time in some undefined way, but
 that a value of 0 would represent an infinitely capable CPU.  (I know,
 I'm making big leaps in my assumptions.)

Arg, the cpu capacity gets lower for faster machines! Now I finally have
understood why they introduced a floating point number in the interface.
Lower value, that makes the number useless for bogomips. I'd have to
invent some calculation like 100 / cpu capacity.
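
A minimal sketch of such an inversion, in plain C rather than kernel code.
The scale constant and the function name are assumptions made for
illustration only; the thread does not settle on a formula.

```c
/* Hypothetical scale; the mail only suggests "something like 100 / cpu capacity". */
#define SKETCH_SCALE 1000000u

/*
 * Map the STSI cpu capability (lower value = faster machine) to a
 * number that grows with machine speed. A capability of 0 is treated
 * as "unknown" and mapped to 0 to avoid a division by zero.
 */
static unsigned int sketch_bogo_from_capability(unsigned int capability)
{
	if (capability == 0)
		return 0;
	return SKETCH_SCALE / capability;
}
```

With this mapping a machine reporting half the capability value ends up
with roughly twice the number, which is the direction a bogomips-like
figure is expected to move.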

 BTW, I *love* the 42 answer on a Really Old Machine!

But that would indicate a really fast machine! I can't use 42 as the
answer for old machines.. too bad ;-)

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: what is the conversion from mips to Ghz or back

2006-11-02 Thread Martin Schwidefsky
On Thu, 2006-11-02 at 11:42 -0500, David Boyes wrote:

   BTW, I *love* the 42 answer on a Really Old Machine!
 
  But that would indicate a really fast machine! I can't use 42 as the
  answer for old machines.. too bad ;-)

 2.71828, aka e.

 After all, you guys spent all that money trademarking eServer for
 those machines.

But 2.71828 would be even faster than 42 ..

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: what is the conversion from mips to Ghz or back

2006-10-27 Thread Martin Schwidefsky
On Thu, 2006-10-26 at 13:28 -0400, Alan Altmark wrote:
 On Thursday, 10/26/2006 at 09:52 AST, Richard Troth
 [EMAIL PROTECTED] wrote:
   Note the existence of bogo in the word -- for bogus.
   It's a useless number.
 
  He says, as if MIPS were not also a useless number.   :-)

 (yawn...scratch..scratch)  So, I'm curious.  Does that mean bogomips
 has, like, O(n**2) level of bogocity?  Kul.

After the BogoMips number has entertained us once again I'm inclined to
fix this stupid number. The only place where we use the result of the
bogomips calculation is in __spin_lock_debug(). Any number roughly in
the same range as the bogomips number will do for __spin_lock_debug().
For replacement we could use the cpu capability that is reported by the
store system information instruction in SYSIB 1.2.2. On our test system
it reports a cpu capability of 1456 (System z type 2094 model 738/S38).
The bogomips number is between ~3400 and ~4000 dependent on the phase of
the moon and my caffeine level. With the attached patch the same
number is reported every time. Makes sense, doesn't it?
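
The arithmetic behind a preset value can be sketched in user-space C as
follows. This assumes a tick rate of 100 and the customary 500000/HZ
scaling between loops_per_jiffy and the printed BogoMIPS figure; the
sketch_ names are hypothetical and not part of the patch.

```c
#include <stdio.h>

#define SKETCH_HZ 100	/* assumed tick rate */

/* Preset loops_per_jiffy from a capability value instead of measuring it. */
static unsigned long sketch_preset_lpj(unsigned int capability)
{
	return (unsigned long) capability * (500000 / SKETCH_HZ);
}

/* Format loops_per_jiffy as "integer.fraction" with two decimal places. */
static void sketch_format_bogomips(unsigned long loops_per_jiffy,
				   char *buf, size_t len)
{
	snprintf(buf, len, "%lu.%02lu",
		 loops_per_jiffy / (500000 / SKETCH_HZ),
		 (loops_per_jiffy / (5000 / SKETCH_HZ)) % 100);
}
```

Because the value is derived from a constant rather than from a timed
loop, the formatted result is identical on every boot of the same machine.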

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.

--
Index: arch/s390/Kconfig
===================================================================
RCS file: /home/cvs/linux-2.5/arch/s390/Kconfig,v
retrieving revision 1.65
diff -u -r1.65 Kconfig
--- arch/s390/Kconfig	24 Oct 2006 12:04:11 -0000	1.65
+++ arch/s390/Kconfig	27 Oct 2006 13:41:49 -0000
@@ -26,10 +26,6 @@
 	bool
 	default y

-config GENERIC_CALIBRATE_DELAY
-	bool
-	default y
-
 mainmenu "Linux Kernel Configuration"

 config S390
Index: drivers/s390/sysinfo.c
===================================================================
RCS file: /home/cvs/linux-2.5/drivers/s390/sysinfo.c,v
retrieving revision 1.8
diff -u -r1.8 sysinfo.c
--- drivers/s390/sysinfo.c	20 Sep 2006 08:52:54 -0000	1.8
+++ drivers/s390/sysinfo.c	27 Oct 2006 13:41:49 -0000
@@ -9,6 +9,7 @@
 #include <linux/mm.h>
 #include <linux/proc_fs.h>
 #include <linux/init.h>
+#include <linux/delay.h>
 #include <asm/ebcdic.h>

 struct sysinfo_1_1_1 {
@@ -351,3 +352,26 @@

 __initcall(create_proc_sysinfo);

+/*
+ * calibrate the delay loop
+ */
+void __init calibrate_delay(void)
+{
+	struct sysinfo_1_2_2 *info = (void *) get_zeroed_page(GFP_KERNEL);
+	unsigned int capability;
+
+	if (stsi(info, 1, 2, 2) == -ENOSYS)
+		/*
+		 * Really old machine without stsi block for basic
+		 * cpu information. Report 42.0 bogomips.
+		 */
+		capability = 42;
+	else
+		capability = info->capability;
+	loops_per_jiffy = capability * (500000/HZ);
+	free_page((unsigned long) info);
+	/* Print the good old Bogomips line .. */
+	printk(KERN_DEBUG "Calibrating delay loop (skipped)... "
+	       "%lu.%02lu BogoMIPS preset\n", loops_per_jiffy/(500000/HZ),
+	       (loops_per_jiffy/(5000/HZ)) % 100);
+}



Re: CP commands through a Web interface

2006-10-20 Thread Martin Schwidefsky
On Thu, 2006-10-19 at 16:57 -0400, Post, Mark K wrote:
 What are the permissions on /dev/vmcp?

Even if you set the permission of /dev/vmcp to allow normal users to
access the device, it won't allow the user to execute cp commands. There
is an additional CAP_SYS_ADMIN check in the vmcp_open function.
The reason is that a user that can execute cp commands owns the machine:
with strategically placed 'vmcp STORE addr data' calls you can change
any code in the kernel. So you had better make sure that nobody who is not
trusted can get control to issue arbitrary cp commands. That is
especially true if you use vmcp in a web interface. It sounds like a
very dangerous thing to do.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: Server Time Protocol support for zSeries

2006-10-12 Thread Martin Schwidefsky
On Thu, 2006-10-12 at 00:32 +0200, Rob van der Heij wrote:
  If the underlying hardware clock keeps good time, does the Linux clock
  actually drift?

 On zSeries, the Linux system clock was supposed to be locked to the
 TOD (apart from the corrections by ntpd). That's because the TOD is
 used to measure time rather than count by interrupts (similar to the
 instruction counter in Intel CPUs). There have been bugs that caused
 Linux system clock to drop behind. I believe those were/are bugs.

Linux on zSeries uses the TOD clock to initialize the internal time and
to advance that internal time each 1/100 of a second. In the absence of
bugs (we had a few, in particular in regard to NO_IDLE_HZ) the linux time
and the TOD will be in sync. If you read the linux time with the
gettimeofday system call the value of the internal linux time will be
adjusted by using the difference of the current TOD clock and the last
jiffy timestamp. That gives you a very good resolution of the clock.
That means as long as you do not use NTP the result of a STCK in user
space and the gettimeofday call should be very close. With NTP they will
drift apart.
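
The interpolation described above can be sketched in user-space C. The
struct and function names are hypothetical; the only hardware fact relied
on is that bit 51 of the TOD clock ticks once per microsecond, so 4096 TOD
units make up one microsecond. NTP adjustments are deliberately left out.

```c
#include <stdint.h>

/* One microsecond is bit 51 of the 64-bit TOD value, i.e. 4096 TOD units. */
#define SKETCH_TOD_UNITS_PER_USEC 4096ULL

struct sketch_time {
	uint64_t sec;
	uint32_t usec;
};

/*
 * base:       internal time as of the last jiffy update
 * jiffy_tod:  TOD value stored at that update
 * now_tod:    TOD value read at the time of the call
 */
static struct sketch_time sketch_gettimeofday(struct sketch_time base,
					      uint64_t jiffy_tod,
					      uint64_t now_tod)
{
	uint64_t usec = base.usec +
		(now_tod - jiffy_tod) / SKETCH_TOD_UNITS_PER_USEC;

	/* carry whole seconds out of the microsecond field */
	base.sec += usec / 1000000;
	base.usec = (uint32_t)(usec % 1000000);
	return base;
}
```

The result has microsecond resolution even though the internal time
itself only advances once per tick.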

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: BUG: Soft Lockup detected on CPU#

2006-10-10 Thread Martin Schwidefsky
On Tue, 2006-10-10 at 12:14 +0200, Carsten Otte wrote:
  The way the soft-lockup detection works right now is broken for
  systems that utilize virtual cpus. You could argue that all zSeries
  systems use virtualized cpus so the feature does not make sense.
 With dedicated PUs on a logical partition, and when running on raw
 iron the feature should work fine afaict. No?

Yes, in that special case it should work. But who is using dedicated PUs
on LPAR? 1% of the installed linux systems? The best thing to do is to
disable the config option, as it is harmful for the majority of the
systems. Heiko already sent a patch.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: BUG: Soft Lockup detected on CPU#

2006-10-09 Thread Martin Schwidefsky
On Mon, 2006-10-09 at 15:06 -0400, Post, Mark K wrote:
 Is the plan to make this work in the future?  If not, it should not be a
 config option that can be set.

The way the soft-lockup detection works right now is broken for
systems that utilize virtual cpus. You could argue that all zSeries
systems use virtualized cpus so the feature does not make sense.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: zLinux User Passwords on console

2006-09-28 Thread Martin Schwidefsky
On Wed, 2006-09-27 at 14:20 -0400, Alan Altmark wrote:
 On Wednesday, 09/27/2006 at 12:48 AST, Morris, Kevin J. (LNG-DAY)
  Is there a way to let zVM know that this is a password field similar to
  when you are logging on to zVM or using the VM-FTP Client?

 Yes, the function is there in the underlying z/VM device support (3215
 opcode 0x0E), but the con3215 driver in Linux doesn't appear to issue it.
 I imagine there would have to be termios (?) and ioctl() changes to the
 driver of some sort to support some type of NOECHO specification.

I tried to use opcode 0x0e to do password suppression in the 3215 driver
but found out the hard way that it doesn't work. To suppress the output
on a 3215 device you need to have a pending 0x0e read. The read will sit
there until the user presses the attention key. If in the meantime the
console has to print a new message, the read needs to be stopped,
followed by the write for the message and then the read needs to be
started again. Consider my surprise when I found that a halt-subchannel
on the 3215 device did not work .. it is not implemented in the 3215
emulation.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: zLinux User Passwords on console

2006-09-28 Thread Martin Schwidefsky
On Thu, 2006-09-28 at 10:44 -0400, Alan Altmark wrote:
 On Thursday, 09/28/2006 at 09:27 AST, [EMAIL PROTECTED]
 For all EBCDIC stuff, Linux assumes you are code page 37 (Grr!),
 including ^ and square brackets.  IMO Linux should extract the code page
 information from the virtual console and translate accordingly.  (sigh...
 I'm a cp924 guy myself...)

Care to come up with a patch?

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: gcc 3.4.6 converting sprintf to strcpy calls causing kernel linkedit failure

2006-09-13 Thread Martin Schwidefsky
On Tue, 2006-09-12 at 19:09 -0400, Post, Mark K wrote:
 I'm _almost_ ready to get a good compile of Linux kernel 2.4.33.3 with
 all the developerWorks patches integrated.  The last problem is
 unresolved references to strcpy in the lcs.c and qeth.c modules.  I've
 isolated the problem to a single sprintf command in each of them (out of
 _many_ that seem fine).  The following patches seems to fix it.  It
 certainly compiles, but I am not sure it is the correct way to go about
 it.

That is the too-clever compiler, which transforms a sprintf(buf,"%s",str)
into a simple strcpy. 2.4 has an inline function for strcpy but not a
non-inlined version. The compiler optimization requires a non-inline
function for strcpy. The easiest way to get one is to backport
arch/s390/lib/string.c from a 2.6 kernel to 2.4.
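
As a rough sketch of what such an out-of-line function looks like (this is
not the code from arch/s390/lib/string.c, and the sketch_ prefix is only
there so the example does not clash with the C library):

```c
/*
 * Out-of-line copy routine. In a 2.4 tree the real thing would
 * presumably be named strcpy and made visible to modules such as
 * lcs and qeth, so that the call emitted by gcc can be resolved.
 */
char *sketch_strcpy(char *dest, const char *src)
{
	char *ret = dest;

	while ((*dest++ = *src++) != '\0')
		;	/* copy up to and including the terminating NUL */
	return ret;
}
```

The point is only that the symbol has to exist as a real function; an
inline definition in a header gives the linker nothing to resolve the
compiler-generated call against.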

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: gcc 3.4.6 converting sprintf to strcpy calls causing kernel linkedit failure

2006-09-13 Thread Martin Schwidefsky
On Wed, 2006-09-13 at 11:51 -0400, Post, Mark K wrote:
 Yeah, I know the cause, I just wasn't at all sure about the cure.  Seems
 to me it would be even easier to manually convert sprintf(buf,"%s",str)
 to strcpy(buf,str) then, wouldn't it?  That's what someone did with
 fs/reiserfs/prints.c

Well, a matter of perspective. I think it is easier to simply define a
strcpy function that the gcc can call, instead of converting all
occurrences of sprintf(buf,"%s",str).

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: [PATCH] s390: kill __SMALL_STACK define

2006-09-04 Thread Martin Schwidefsky
On Fri, 2006-09-01 at 12:21 -0700, Dave Hansen wrote:
 s390 seems to define a macro: __SMALL_STACK with gcc's -D, and
 bases that -D flag off of CONFIG_SMALL_STACK.  This patch makes
 it use the existing CONFIG_ option instead.

No, that won't work. There is a reason why we have CONFIG_SMALL_STACK
and __SMALL_STACK: the first define states the wish that we want to use
a small stack, the second define reflects if the compiler can actually
do it. Your compiler needs to know about the -mpacked-stack option,
otherwise you can't use the feature.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: Fw: [LINUX-390] Who's been reading our list...

2006-05-18 Thread Martin Schwidefsky
On Wed, 2006-05-17 at 12:07 -0500, Tom Duerbusch wrote:
 #4 Context switching.  Seems like when you switch from one task to
 another on some processors, all of cache is invalidated.  Doesn't seem
 to be so with the mainframe.  I assume there is a point, where we
 thrash cache, but it seems like when we switch tasks on the mainframe,
 your part of cache (instruction cache...stuff within the processor),
 seems to stay intact.

On x86 it is the translation-lookaside buffers (TLBs) which get flushed
each time control register 3 (CR3) is loaded. Switching between threads is
fine because they use the same translation table. Switching between
processes has a performance penalty. On mainframes the TLBs are not
flushed for any context switch.

The cache is a different story. Mainframes have the advantage of a
shared level 2 cache compared to x86. If a process migrates from one
processor to another, the cache lines of the process just have to be
loaded from level 2 cache to level 1 cache again before they can be
accessed. On x86 it goes over memory.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: Dumping a LIN-z System

2006-04-28 Thread Martin Schwidefsky
On Fri, 2006-04-28 at 12:30 +0200, Rob van der Heij wrote:
 On 4/28/06, Peter 1 Oberparleiter [EMAIL PROTECTED] wrote:

  Note that the vmhalt function is only triggered after a user-initiated
  system halt (meaning 'shutdown -h' or equivalent actions). Specifically
  vmhalt will not be called after a kernel panic.

 Bummer!  I guess that's a memory fault on my side then...

 Don't you agree it would be extremely useful to have a vmoops=
 function then? If the kernel is still vital enough to show a trace
 back and what have you, then I would be able to issue a diag8 (even
 ignoring the response). The alternative is that people set up separate
 things to monitor for servers sitting in CP READ (like HMF can do) or
 watch the console with SCIF, or simply issue SEND CP xx IPL to any
 Linux that appears to be unresponsive... :-(

Yes, I have heard that and similar needs from various sides now. Our
testers for example would like to have a way to do an automatic dump as
well. Trouble is how do we get to a consistent interface, one that is
usable under z/VM and under LPAR? How do we specify the dump device?
A vmoops analog to vmhalt would solve the problem just for z/VM. We
should be able to come up with a general interface for z/VM and LPAR,
something along the lines of /sys/firmware/ipl.

--
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.



Re: [PATCH] arch/s390/kernel/setup.c: fix compilation on UP

2006-01-12 Thread Martin Schwidefsky
On Wed, 2006-01-11 at 19:07 +0300, Alexey Dobriyan wrote:
Hi Alexey,

   CC  arch/s390/kernel/setup.o
 arch/s390/kernel/setup.c: In function `do_machine_restart_nonsmp':
 arch/s390/kernel/setup.c:271: error: too few arguments to function `__cpcmd'
 arch/s390/kernel/setup.c: In function `do_machine_halt_nonsmp':
 arch/s390/kernel/setup.c:279: error: too few arguments to function `__cpcmd'
 arch/s390/kernel/setup.c: In function `do_machine_power_off_nonsmp':
 arch/s390/kernel/setup.c:286: error: too few arguments to function `__cpcmd'

I added the bug-fix to my patch list. Will be sent to Andrew shortly.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: [PATCH] arch/s390/kernel/setup.c: fix compilation on UP

2006-01-12 Thread Martin Schwidefsky
On Thu, 2006-01-12 at 15:54 +0300, Alexey Dobriyan wrote:
 FYI, s390 on UP is currently broken in other ways:

   CC [M]  drivers/s390/block/dasd.o
 drivers/s390/block/dasd.c: In function `dasd_setup_queue':
 drivers/s390/block/dasd.c:1638: error: too few arguments to function 
 `blk_queue_ordered'

   CC  arch/s390/lib/spinlock.o
 arch/s390/lib/spinlock.c: In function `_raw_spin_lock_wait':
 arch/s390/lib/spinlock.c:49: warning: implicit declaration of function 
 `_raw_compare_and_swap'

I'll take care of it.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: [2.6 patch] arch/s390/Makefile: remove -finline-limit=10000

2006-01-11 Thread Martin Schwidefsky
On Tue, 2006-01-10 at 21:57 +0100, Adrian Bunk wrote:
 -finline-limit might have been required for older compilers, but
 nowadays it does no longer make sense.

I didn't check the effects of reverting to the default inline-limit, did
you find any negative impacts? I'm thinking about the critical code
paths e.g. minor faults. There had better not be an additional
function call that would have been inlined with the bigger inline limit,
since function calls are quite expensive on s390.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: 2005-10-04 Recommended Linux on zSeries code drop to developerWorks

2005-10-14 Thread Martin Schwidefsky
On Thu, 2005-10-13 at 08:58 -0700, Fargusson.Alan wrote:
 But it isn't a device is it?  It isn't a filesystem either.

 It looks like sysfs is the best fit even if it isn't really the right
 place for it.  Actually a architecture specific system call might be the
 right thing to do, but I suspect it would be hard to get it implemented.

It's not a device, it's not a filesystem, it is a service that lets you
call a hypervisor function (z/VM AND LPAR by the way). Dependent on the
diagnose code it has to do memory copies. That screams for a system call
and they are not harder to implement than e.g. an ioctl. Doing the
diagnose via a system call would have the benefit that you don't have to
open some device first. Even better you don't need any special device
node, /proc or /sys entry at all. So if we decide that we want such a
beast then we can as well implement it as a system call.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: DIAG [was: 2005-10-04 Recommended Linux on zSeries ...]

2005-10-11 Thread Martin Schwidefsky
On Mon, 2005-10-10 at 21:57 +0200, Christian Borntraeger wrote:
  You missed the point.
  The point is that neither 'vmcp' nor /dev/vmcp offer even a hint of
  the underlying DIAG interface.   Both 'vmcp' and /dev/vmcp obscure
  the DIAG interface from the very programmers who should see it.
 
  In the CPINT case,  yes,  'hcp'  hides the DIAG details.
  That's as it should be.

 Ah, now I get you. You are moving beyond cpint/vmcp and you want a generic
 diagnose interface which currently does not exist, right?
 At least I haven't found a diag8 device in cpint which does not hide
 details.
 Am I still missing your intention?

Sorry to burst bubbles here. A generic diag interface doesn't make
sense. diag is a way to call the hypervisor from the guest kernel to
do something. What is done by the diag is up to the inventor of the
particular diagnose call. We are facing a number of problems here:
1) It depends on the diagnose how many and which registers are used
2) The semantics of the register contents depends on the diagnose. In
particular if one of the registers passed to the diag points to memory
then which piece of memory is meant? Usually the diag instructions take
real memory as input. The application that calls the diag service only
knows about virtual addresses. Who converts the virtual address to a
real address? You need to know about the semantics of the diagnose to be
able to do that. There are even diagnoses where the memory passed to the
diagnose contains additional pointers to more real memory.
3) The diagnoses do wildly different things. It is NOT a good idea to
have a single device driver that does a lot of different things. Each
function or group of functions that logically belong together should be
embedded in a stand-alone driver that can be used independently of all
the other functions.
4) Some of the diagnoses have side effects on other subsystems. E.g. the
diag250 does block i/o on a dasd, which naturally interferes with the
dasd driver.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: DIAG [was: 2005-10-04 Recommended Linux on zSeries ...]

2005-10-11 Thread Martin Schwidefsky
On Tue, 2005-10-11 at 13:25 +0100, Alan Cox wrote:
 On Maw, 2005-10-11 at 12:33 +0200, Martin Schwidefsky wrote:
  Sorry to burst bubbles here. A generic diag interface doesn't make
  sense. diag is a way to call the hypervisor from the guest kernel to
  do something.

 I'm not entirely sure I agree. Think about things like scsi generic
 or /dev/ioport on the PC. There is a good argument to allow
 CAP_SYS_RAWIO capability users to do anything. It is up to them to get
 it right. That allows a simple kernel API for it and CAP_SYS_RAWIO
 capable users can already machine gun themselves in both feet via other
 interfaces anyway

 There is much to be said for a -privileged- API which is of the form

   struct {
           struct s390_regs reg_in;
           struct s390_regs reg_out;
           unsigned long in_ptr;   /* Bit map of pointers */
           unsigned long out_ptr;  /* Bit map of pointers */
           struct {
                   void *addr;
                   int direction; /* IN/OUT/BOTH */
                   unsigned long length;
           } addresses[MAX_MAP];
   }

 which lets the user say what should be mapped and have the kernel do the
 copies in/out as described by the syscall.

 Now you can do anything if you have the rights.

Well, there is at least one example where such a scheme breaks down:
for the diag 250 interface the read/write control block contains a
pointer to a block list. You have a control block pointing to another
control block. Both need to be copied to kernel space but then the
address of the block list needs to get corrected in the first control
block. We'd need to add a way to store the alias address of one of the
memory areas to some field in the other memory areas.
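
The fix-up step can be illustrated with a user-space sketch. The two
structures are hypothetical and only loosely modelled on the description
above, not on the actual diag 250 parameter layout; malloc stands in for
a kernel-side bounce buffer and count is assumed to be nonzero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts for illustration only. */
struct sketch_block_entry {
	unsigned long block_nr;
	void *buffer;
};

struct sketch_rw_ctrl {
	unsigned int count;			/* entries in the list */
	struct sketch_block_entry *list;	/* pointer to the second area */
};

/*
 * Copy the control block and the list it points to, then make the
 * copied control block refer to the copied list. A flat "copy these
 * N areas" description cannot express this last step.
 * Returns NULL on allocation failure; the caller frees copy->list
 * and copy.
 */
static struct sketch_rw_ctrl *sketch_copy_in(const struct sketch_rw_ctrl *user)
{
	struct sketch_rw_ctrl *copy = malloc(sizeof(*copy));
	size_t list_size = user->count * sizeof(*user->list);

	if (!copy)
		return NULL;
	memcpy(copy, user, sizeof(*copy));	/* first area */
	copy->list = malloc(list_size);		/* corrected address */
	if (!copy->list) {
		free(copy);
		return NULL;
	}
	memcpy(copy->list, user->list, list_size);	/* second area */
	return copy;
}
```

The assignment to copy->list is the piece the proposed structure has no
field for: the interface would need a way to say "store the alias address
of area 2 at this offset in area 1".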

That whole interface is scary. I'm not sure I want to debug a system
that has a self-destruct unit like that.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: catfight.

2005-10-10 Thread Martin Schwidefsky
On Sat, 2005-10-08 at 12:53 -0400, Neale Ferguson wrote:
 Christoph's point is well taken: The best place for the code was to be
 submitted for inclusion in the mainstream and subject to peer review. My
 point is that I hadn't avoided that process but had been trying to work
 with IBM (via SHARE and the Technical Steering Committee) to setup a
 co-operative process to get code like this in a place where it could be
 dealt with by the s390 community prior to going to the main kernel list.

Working with SHARE and/or the Technical Steering Committee will help to
get the s390 review tree in place. That is good.

 We were doing so in order to avoid duplication of effort and to deliver
 quality stuff to the kernel. However, things got bogged down in the
 legals such that we now have to find an alternative means of doing so.

But as long as the review tree isn't there yet we need to use the
alternative of sending patches to the mailing list and the peers. Waiting
will only lead to code rot while it is sitting on your hard-drive.

 In the interim people needed a mainline solution and the folks at the
 lab stepped up and got one in. If it's a technically superior option
 then all the better. However, I just took offence at the implication
 that I was avoiding submitting things and was bitching. I have had my
 dealings with the glibc list so I know how necessary it is to develop a
 thick skin. Similarly, the work on the Mono s390 JIT was done directly
 with the mono list: so I think I'm familiar with what the process
 involves.

The glibc folks have their own set of rules. One important thing is in
common with the linux development though: if you don't play by their
rules you get nasty mails. But if you do play by their rules it works
well. You just have to find out what the rules are ...

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: catfight.

2005-10-10 Thread Martin Schwidefsky
On Fri, 2005-10-07 at 14:46 -0400, Post, Mark K wrote:
 And this has been one of my worries, and why we initially tried to go
 through IBM Legal to get things changed there.  It may very well be that
 the answer needs to be we keep pursuing that long-term, while having
 non-IBM contributors submitting stuff directly to Andrew Morton
 short-term.

Yes, please do so and please add me to CC when sending mainframe related
patches. That way I can comment on them directly.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: catfight.

2005-10-10 Thread Martin Schwidefsky
On Mon, 2005-10-10 at 10:43 +0200, Carsten Otte wrote:
 It's good to see that at the end of day after a long emotional
 discussion we got to the real bottom-line:
 Neale needs to submit his things for peer-review, and we IBMers
 need to focus on actually reviewing until the result is feasible
 for inclusion.

 Btw also the opposite would be nice: people like Neale and Rob
 peer-reviewing our contributions would also help the result.

I second that. We definitely are interested in your opinion about our
code. The review works both ways.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH



Re: 2005-10-04 Recommended Linux on zSeries code drop to developerWorks

2005-10-10 Thread Martin Schwidefsky
On Mon, 2005-10-10 at 10:36 +0200, Rob van der Heij wrote:
 On 10/10/05, Carsten Otte [EMAIL PROTECTED] wrote:

  No. The Linux kernel should return Linux error codes. This way you get
  reasonable messages like out of memory, localized in the language the user
  has chosen. Users don't expect to see CP return values in Linux.

 The users who are Linux kernel developers do not expect that... ;-)
 and you may even claim that you know what the users *should* expect...
 but the other part was the PLA that Arty brought up.

There is obviously a clash between what a CP programmer expects to see
as the return code and what a Linux programmer wants to see.

 The point I tried to make is that CP command return codes are part of
 the response and allow the user (programs) to deal with expected
 errors (as opposed to things like 'out of memory' which is unexpected
 errors). When the diag8 is issued correctly it returns two responses:
 1. The CP return code (a number)
 2. The human readable response (in text)
 Although we're told not to use the text as an API, real life is
 different. But the CP return code is very useful to deal with expected
 errors.

 Looking at how Linux programs work, I believe the logical approach
 would be for cpint/vmcp to write (2) either to stdout or to stderr,
 depending on whether (1) is zero or not. But I doubt this would be
 practical for people to use. And we still would not have the return
 code to use. That's why I suggested to cheat.

If you use the vmcp api in a program you can get the CP return code by
an ioctl. I kind of like the idea of using an option, e.g. --rc, to make
vmcp return the CP return code instead of the Linux return code. That
way everybody is happy. You get what you ask for.
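
A rough sketch of how such an option could select the exit status. This
is only an illustration of the idea: the helper names has_rc_option and
vmcp_exit_status are made up here and are not taken from the vmcp
sources, and a shell would only see the low 8 bits of whatever is
returned.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: check whether --rc was given before the CP command.
 * Scanning stops at the first word that does not start with '-', so that
 * the text of the CP command itself is never mistaken for an option.
 */
bool has_rc_option(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] != '-')
            break;
        if (strcmp(argv[i], "--rc") == 0)
            return true;
    }
    return false;
}

/*
 * Hypothetical helper: pick the value the tool exits with.
 *   linux_rc   - Linux style result of issuing the command
 *   cp_rc      - return code reported by CP for the command
 *   want_cp_rc - true if the user asked for the CP return code
 */
int vmcp_exit_status(int linux_rc, int cp_rc, bool want_cp_rc)
{
    return want_cp_rc ? cp_rc : linux_rc;
}
```

A caller would evaluate has_rc_option() once at startup and pass the
result to vmcp_exit_status() after the command has been issued, so the
default behaviour stays the Linux one.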

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: 2005-10-04 Recommended Linux on zSeries code drop to developerWorks

2005-10-07 Thread Martin Schwidefsky
On Thu, 2005-10-06 at 12:37 -0400, Neale Ferguson wrote:
 Please don't patronise me. I and several others from the VM/Linux
 Technical Steering committee of SHARE have been working hand-in-hand
 with the lab people since SHARE was in San Francisco a few years ago to
 work out a means of doing co-operative development. Those involved
 determined it would be a good idea if instead of an uncoordinated
 approach to kernel development for s390 we could do things in a more
 planned way. This way we could avoid the reinvention syndrome. We had
 setup a system on the Marist VM/Linux complex and ported bitkeeper over
 so that we had up to date kernel sources; there would be general access
 for the s390 people interested in kernel development; and the lab people
 could see what we were up to. It was then going to be a matter of
 submitting our changes to the kernel we made through the s390
 maintainer. Technically this was done. Legally we couldn't get it going.
 Not our legals: YOURS.

Yes, we need a central place where we can keep all the code that is
floating around for mainframe Linux so that all interested parties can
look at the code. However, that only addresses one aspect; the code
doesn't magically get better if we keep it in one place. Some important
facts I have learned over the past few years about Linux kernel
development:
1) It is a push model, not a pull model. If you want to get something in
you need to actively push it up the chain.
2) Be prepared for criticism, sometimes it is very harsh. We had an
example of one of the extremes of how bad news is delivered. Even if you
don't like the style you need to listen to what is said. These people
tend to be right in what they say. Unfortunately not in the how...
3) Be prepared to rewrite your code. Completely. Even if the function is
desperately needed, code that is considered inferior is not accepted.
Inferior in the sense that how things are done is improvable. The
function is often good, but the how is lacking. As an example (sorry
Neale): last time I looked at Neale's iucv driver it still used a
character device driver as user interface to connect to the z/VM service
via iucv. That does work but imho the correct solution is to implement
an AF_IUCV network family and use netcat.

 So yes, the alternative approach of going through the standard mailing
 lists appears to be our only option now. We were keen to work closely
 with the s390 lab so that stuff like my utterly crap code could be
 made less crappy. But please don't complain to me about my complaints
 when we've been trying to do what we feel has been the best for the s390
 Linux branch.

What we need to set up is a way to do peer reviews. I still hope
that the Marist proposal takes place, it would solve some of our current
problems. For now please send the patches to one of the appropriate
mailing lists, if you want me to comment include me on the CC list.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: catfight.

2005-10-07 Thread Martin Schwidefsky
 proposed implementation. I can't work at the consolidated repository
yet but we keep trying to find a solution. 
About me being the only pipe to the s390 backend: that is just not
true. The point here is that you need to have a name in the kernel
community to get things accepted for s390 without me. There are numerous
examples where that has happened. If someone unknown tries to change
something in arch/s390, include/s390 or drivers/s390 I get asked about
the change. I usually get asked about changes from well known people as
well but there my agreement isn't really required. That is what is meant
by being the maintainer. And if something has broken the kernel for s390
then I usually have to deal with it.

 One option on the table is to share the responsibility a bit -- it's gotta
 be a major hassle for Martin's free time to be the only guy doing this. If
 the option of using a external organization like SHARE or WAVV would solve
 some of the legal problems, then it's worth the discussion.  At least IBM
 Legal has some history with those organizations, and has a past model of how
 to deal with them. We're not starting from scratch there. 

And what is solved by that? Bad code is still bad code. To name a new
maintainer won't get you anywhere if that person starts to wave through
bad code. Maintainership is about trust that only code that matches a
certain standard gets accepted and about saying no to bad code.

 For me, that's the business of what we're doing here in this exchange --
 there is a perceived problem present. This discussion is supposed to explore
 options to solving the problem. It's not about Martin failing to do
 something, it's about solving the problem, however perceived or real. I
 happen to think that a non-IBM voice would be helpful in partially solving
 this problem; you have a different perspective. Let's discuss it and find
 some middle ground. 

If that non-IBM voice is a nobody as far as the kernel community is
concerned he or she has to spend some considerable time to make
himself/herself known to the community before we will see an
improvement.

-- 
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: 2005-10-04 Recommended Linux on zSeries code drop to developerWorks

2005-10-06 Thread Martin Schwidefsky
On Thu, 2005-10-06 at 07:58 +0200, Waite, Dick wrote:
   We read, Because IBM's lawyers forbids IBM employees to sign
 off on code that they did not write.  At the moment, Martin Schwidefsky
 of IBM is considered the architecture maintainer for mainframe Linux.
 This makes him the only person from which Linus and Andrew Morton will
 accept mainframe-related patches.

I'm not the only person from which Linus and Andrew will accept s390
code. You just never tried to send patches to Andrew. He will probably
ask me for my opinion and that is where the legal aspects will come in.
If the code is of good quality and matches the coding style guidelines
chances are good that it will get accepted. I can't sign them off but I
can comment on them. What we need here is a way to get the suitable
patches together in one place.

Our lawyers are working on it - unluckily that doesn't mean that we'll
have a solution tomorrow. It takes way too long for my taste but there is
not much I can do about it. Legal questions always take longer than you
think, fact of life I guess.

--
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: Linux 2.6 and 3270 fullscreen [Was: Re: [LINUX-390] NED and SLES 9]

2005-08-11 Thread Martin Schwidefsky
Linux on 390 Port LINUX-390@VM.MARIST.EDU wrote on 08/11/2005 03:42:05
AM:

 I'm adding class_simple_*() calls at the appropriate places in raw3270.c
 to create /sys/class/3270 and thence (for, say, device 6a0)
 /sys/class/3270/tty06a0 and /sys/class/3270/tub06a0; this leads to what
 I'm doing in /etc/udev/udev.rules, which is work in progress.  I'll come
 up with a patch for you in a day or two with all the above stuff, if you
 like; or you may want to fix at least the bugs and send me another
 patch, and I'll rebase on it.

Yes, please send a patch with your changes after you are content with it.
I'll do other things in the meantime..

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: Linux 2.6 and 3270 fullscreen [Was: Re: [LINUX-390] NED and SLES 9]

2005-08-10 Thread Martin Schwidefsky
  As to /dev/3270/tub, believe me, devfs was the furthest thing from my
  mind.  Here's what I was talking about.  Consider the device /dev/tty.
  It exists to provide a path for the application to the current
  controlling tty.  What would the application have to do (absent
  stdout/stderr) to access its controlling terminal if there was no
  /dev/tty device?  It wouldn't be easy, right?

 Hmm, the controlling terminal for a fullscreen application still is
 /dev/tty. If I start the test program on a ssh terminal any output to
 stdout goes to the ssh terminal. If I start the test program on the
 3270 terminal I get nothing though. Kind of strange, the output should
 have gone to the tty view of the 3270. Needs more investigation.

Found the reason why the tty stopped while the fullscreen view is
active: the tty is stopped... Removed the stop_tty/start_tty calls
from tty3270_deactivate/tty3270_activate.

  Similarly, /dev/3270/tub needs to exist to provide a path for the
  application to the fullscreen, when the controlling tty is in fact a
  line-mode 3270 (and ENODEV otherwise).
 
  Notice that the permissions of /dev/tty are crw-rw-rw and those of
  /dev/3270/tub are crw-rw-rw as well.  For /dev/tty notice the logic in
  drivers/char/tty_io.c:tty_open(), right at retry_open:, where a test is
  made for major 5, minor 0 (the maj/min of /dev/tty) and the device at
  current-signal-tty is used.  That's what /dev/3270/tub wants as well,
  only of course to use the corresponding full-screen major number 228
  with the minor number of the current-signal-tty device.  In 2.4 I
  reserved minor number 0 strictly for /dev/3270/tub, analogous to
  /dev/tty, and I ensured that 227,N and 228,N referred to the same
  physical device.  That is, there was no 227,0 device: the console would
  probably come in at 227,1.  My test program should work with no operands
  when invoked from a logged-on 3270: in that case, it opens /dev/3270/tub.

 Ok, I think I understand. You want some special device node analogous to
 /dev/tty that redirects you automatically to the 3270 fullscreen node
 that corresponds to your controlling 3270 terminal. So that you can do
 a simple open("/dev/tub") in your application and that gets redirected
 to the fullscreen node. That shouldn't be too hard to implement.

Ok, used the old approach from the 2.4 driver to reserve the minor 0 as
a multiplexer device. Open on char-major/minor 227/0 always returns -ENODEV,
open on char-major/minor 228/0 opens a fullscreen view on the 3270 device
that is associated with the current tty. If the controlling tty isn't
a 3270 then again -ENODEV is returned.
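
The minor 0 rule can be written down roughly like this. It is a user
space sketch of the decision only, not the driver code from the patch:
the function name resolve_minor and the two macro names are invented for
the illustration, and the controlling tty is passed in as plain numbers
instead of being looked up in the kernel.

```c
#include <errno.h>

#define TTY3270_MAJOR 227   /* line-mode tty view, numbers as used in this thread */
#define FS3270_MAJOR  228   /* fullscreen view */

/*
 * Decide which fullscreen/tty minor an open() should end up on.
 * Returns the minor to use, or -ENODEV if the open has to fail.
 * ctty_major/ctty_minor describe the controlling tty of the caller.
 */
int resolve_minor(int major, int minor, int ctty_major, int ctty_minor)
{
    if (minor != 0)
        return minor;       /* ordinary node, nothing to redirect */
    if (major == TTY3270_MAJOR)
        return -ENODEV;     /* 227/0 never refers to a device */
    if (major == FS3270_MAJOR &&
        ctty_major == TTY3270_MAJOR && ctty_minor != 0)
        return ctty_minor;  /* 228/0: fullscreen view of the controlling 3270 */
    return -ENODEV;         /* controlling tty is not a 3270 */
}
```

With a controlling tty of 227/3, an open of 228/0 ends up on fullscreen
minor 3; from an ssh session the same open fails with -ENODEV.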

In addition I fixed a few other bugs I found along the way. Latest patch
attached. Can you give it a try ?

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

(See attached file: fs3270.diff)




Re: Linux 2.6 and 3270 fullscreen [Was: Re: [LINUX-390] NED and SLES 9]

2005-08-05 Thread Martin Schwidefsky
Linux on 390 Port LINUX-390@VM.MARIST.EDU wrote on 07/29/2005 10:44:24
PM:

 Thanks a lot for that blindingly fast turnaround!

Thanks for the flowers, although the second time wasn't so fast after all..

 Problems remain; I've attached an enhanced version of my tester that
 will show them up.   On Linux 2.4 my tester's fullscreen read() returns
 a count of 15, and I suspect it should in Linux 2.6 as well.

Ok, should be fixed with the attached patch. fs3270_read and fs3270_write
didn't return the number of processed bytes.

 As to /dev/3270/tub, believe me, devfs was the furthest thing from my
 mind.  Here's what I was talking about.  Consider the device /dev/tty.
 It exists to provide a path for the application to the current
 controlling tty.  What would the application have to do (absent
 stdout/stderr) to access its controlling terminal if there was no
 /dev/tty device?  It wouldn't be easy, right?

Hmm, the controlling terminal for a fullscreen application still is
/dev/tty. If I start the test program on a ssh terminal any output to
stdout goes to the ssh terminal. If I start the test program on the
3270 terminal I get nothing though. Kind of strange, the output should
have gone to the tty view of the 3270. Needs more investigation.

 Similarly, /dev/3270/tub needs to exist to provide a path for the
 application to the fullscreen, when the controlling tty is in fact a
 line-mode 3270 (and ENODEV otherwise).

 Notice that the permissions of /dev/tty are crw-rw-rw and those of
 /dev/3270/tub are crw-rw-rw as well.  For /dev/tty notice the logic in
 drivers/char/tty_io.c:tty_open(), right at retry_open:, where a test is
 made for major 5, minor 0 (the maj/min of /dev/tty) and the device at
 current-signal-tty is used.  That's what /dev/3270/tub wants as well,
 only of course to use the corresponding full-screen major number 228
 with the minor number of the current-signal-tty device.  In 2.4 I
 reserved minor number 0 strictly for /dev/3270/tub, analogous to
 /dev/tty, and I ensured that 227,N and 228,N referred to the same
 physical device.  That is, there was no 227,0 device: the console would
 probably come in at 227,1.  My test program should work with no operands
 when invoked from a logged-on 3270: in that case, it opens /dev/3270/tub.

Ok, I think I understand. You want some special device node analogous to
/dev/tty that redirects you automatically to the 3270 fullscreen node
that corresponds to your controlling 3270 terminal. So that you can do
a simple open("/dev/tub") in your application and that gets redirected
to the fullscreen node. That shouldn't be too hard to implement.

 This hardly begins to address the issue you very rightly brought up, how
 to make /dev/3270/tty0987 and /dev/3270/tub0987 for a new device, and
 I'll try plowing into that.

Good.

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

(See attached file: fs3270.diff)




Re: CAN-2004-0887

2004-10-26 Thread Martin Schwidefsky
 I presume, then, that a signal handler could be called, but
 instead of code getting executed in home space, code would
 get executed in primary space instead.  If a carefully
 crafted signal handler address was created then the code
 actually executed could put the user space in root mode ??

Kernel code would get executed with the user registers set up
for the signal handler. By taking careful aim with the help
of a kernel listing, a malicious user program could have done
ugly things. This is fixed in BitKeeper since yesterday, see
ChangeSet 1.2091.

 I suppose what I am really trying to understand a little better
 is how s390 linux works.  This is what I'm guessing:
 1) userland runs in home space mode
 2) kernel runs in primary space mode, uses mvcs/mvcp to
 copy between kernel and userland
 3) syscall is implemented using the svc instruction
 4) cow is implemented by forcing program interrupt 0x04
 on write

Yes, yes, yes and yes.

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: CAN-2004-0887

2004-10-25 Thread Martin Schwidefsky
Hi Greg,

 Can't find any info on this but it is mentioned here:
 http://www.ussg.iu.edu/hypermail/linux/kernel/0410.2/2264.html

CAN-2004-0887 is a local root exploit specific to s390. The only
affected distro is SLES9 and they have a security update in place.
Do you need to know anything more specific ?

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH



Re: Documentation for User Process Faults

2004-09-30 Thread Martin Schwidefsky
 Slack may be giving you the fullword at 0x8c, consisting of two-byte ILC
 and two-byte code.  The instruction length code will be the number of
 bytes in the faulting instruction:  2, 4, or 6.  The code will be the
 0x0001 part of the examples you gave, Operation Exception for code
 0x0001.  (Operation Exception means an invalid instruction opcode.)

Exactly. For 2.4 the first level program check handler in entry.S ANDed
the 16 bit value from 0x8c with 0x7f which removed the instruction length
code. For 2.6 I changed this because it is really nice to know how long
the instruction that caused the fault is. If the psw points behind
the faulting instruction you have to subtract the ilc to find the correct
address to look at.
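
As a small illustration of the arithmetic, here is a user space sketch
that takes the fullword as it was described above (ILC in the upper
halfword, interruption code in the lower halfword). The struct and
function names are invented for the example and the layout is simply the
one assumed in this thread, not something copied from entry.S.

```c
#include <stdint.h>

struct pgm_info {
    uint16_t ilc;   /* instruction length in bytes: 2, 4 or 6 */
    uint16_t code;  /* program interruption code, e.g. 0x0001 */
};

/* Split the fullword into ILC and interruption code (2.6 style view). */
struct pgm_info decode_pgm_word(uint32_t word)
{
    struct pgm_info info;

    info.ilc = (uint16_t)(word >> 16);
    info.code = (uint16_t)(word & 0xffff);
    return info;
}

/* 2.4 style view: masking with 0x7f keeps the code and drops the ILC. */
uint16_t pgm_code_masked(uint32_t word)
{
    return (uint16_t)(word & 0x7f);
}

/* If the psw points behind the instruction, step back by its length. */
uint64_t faulting_address(uint64_t psw_addr, uint16_t ilc)
{
    return psw_addr - ilc;
}
```

For a word of 0x00040001 and a psw address of 0x1004 this gives an ILC
of 4, code 0x0001 (operation exception) and a faulting instruction at
0x1000.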

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: S/390 Breakage in 2.4.27

2004-09-22 Thread Martin Schwidefsky
Hi Mark,

 Thanks.  I guess I should apologize.

No need to apologize. There are so many things that slip my notice,
I can understand this. You can't know everything, this is why there
are mailing-lists.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: [PATCH] timer (6/6): add cpu steal time fields to procps.

2004-08-23 Thread Martin Schwidefsky
Hi Pavel,

 You really need to write a nice description into Documentation describing
 what cpu steal time is. Will it always be 0 on non-virtualized machines?
 What about hyperthreading?

Yes, the documentation is one thing that still needs to be done. The steal
time is the time where the cpu has been scheduled to do something but did
something else. This can be used for hyperthreading as well and it would
reflect perfectly what the cpu is doing. The two threads that are scheduled
on the cpu won't get 100% of the cpu, but let's say 80% and 60% of the single
processor performance. The steal time would be 20% and 40%. You can even
combine hyperthreading with virtual processors. The steal time is just the
difference between 100% and the actual cpu usage of the process/thread.
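
To make the definition concrete, a minimal sketch of the calculation,
assuming the time a virtual cpu or thread was scheduled and the time it
actually got to execute are both available in microseconds. The function
name steal_percent and its parameters are made up for this example and
do not come from the cputime patches.

```c
/*
 * Steal time as a percentage of the scheduled time.
 *   scheduled_us - time the cpu was supposed to run for us
 *   executed_us  - time the cpu really spent on our work
 * Returns 0.0 if nothing was scheduled or nothing was stolen.
 */
double steal_percent(unsigned long long scheduled_us,
                     unsigned long long executed_us)
{
    if (scheduled_us == 0 || executed_us >= scheduled_us)
        return 0.0;
    return 100.0 * (double)(scheduled_us - executed_us) / (double)scheduled_us;
}
```

For the hyperthreading example above, a thread that executes 800 out of
1000 scheduled microseconds sees 20% steal, one that executes 600 sees
40%.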

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: [PATCH] cputime (1/6): move call to update_process_times.

2004-08-06 Thread Martin Schwidefsky
  Wouldn't it be possible to move the #ifndef into sched.h?

 You can't simply define it to a nop in case of SMP, because
 there it is called from a different place, but we could
 have a separate version for UP and SMP in sched.h:

 void update_process_times(int user_tick);
 static inline void update_process_times_nonsmp(int user_tick)
 {
 #ifndef CONFIG_SMP
 update_process_times(user_tick);
 #endif
 }

Well, the #ifndef can just be removed for most of the architectures
because they are non-smp architectures anyway. But to avoid breaking
anything I decided to play safe and move the whole #ifndef block.
It's up to the arch maintainer to remove the #ifndef if the arch
doesn't need it.
The reason for moving the #ifndef is twofold. 1) it's just confusing
that a common code function depends on CONFIG_SMP in the way do_timer
does. do_timer should do just one thing, and not two but only if this
is a non-smp kernel. 2) do_timer and update_process_times needs to get
separated to make it possible to account cputime independent of the
xtime updates.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: [PATCH] cputime (3/6): move jiffies stuff to jiffies.h

2004-08-06 Thread Martin Schwidefsky
  --- linux-2.6.8-rc3/include/linux/times.h   Wed Jun 16 07:18:57 2004
  +++ linux-2.6.8-s390/include/linux/times.h  Thu Jan  1 01:00:00 1970
  @@ -1,65 +0,0 @@
 ...
  -
  -struct tms {
  -   clock_t tms_utime;
  -   clock_t tms_stime;
  -   clock_t tms_cutime;
  -   clock_t tms_cstime;
  -};
  -

 This should probably stay in linux/times.h, in order to be moved
 to abi/times.h one day. glibc has its own sys/times.h, but struct
 tms simply belongs into times.h, not time.h, according to the
 times man page.

The times man page says that struct tms is defined in sys/times.h.
This doesn't make it necessary to have a linux/times.h header file.
These are kernel headers and not user space headers. Does anybody think
it's important to keep the user/kernel header files names similar ?

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



[RFC] cputime patches.

2004-08-05 Thread Martin Schwidefsky
Hi folks,
the cleanup of my cputime patches is done and now I'm ready for the
first round of bashing ;-)
I've split the kernel patch into 5 parts, Jan works on another kernel
patch that introduces virtual cpu time slices but this isn't ready yet.
Patch number 6 is a patch against procps that makes the cpu steal field
visible. Patches number 1 to 3 are kernel code cleanups that make life
easier (I think), number 4 is the one that introduces the cputime
interface to common code and number 5 is s390 architecture code that
makes use of the interface to get exact cputime numbers.

The patches are against 2.6.8-rc3. I'll keep the fingers crossed
that I didn't break any architecture. Have fun.

[PATCH] cputime (1/6): move call to update_process_times.
[PATCH] cputime (2/6): remove unused definitions from timex.h.
[PATCH] cputime (3/6): move jiffies stuff to jiffies.h
[PATCH] cputime (4/6): introduce cputime.
[PATCH] cputime (5/6): microsecond based cputime for s390.
[PATCH] cputime (6/6): add cpu steal time fields to procps.

blue skies,
  Martin.



[PATCH] timer (6/6): add cpu steal time fields to procps.

2004-08-05 Thread Martin Schwidefsky
[PATCH] timer (6/6): add cpu steal time fields to procps.

From: Martin Schwidefsky [EMAIL PROTECTED]

Make use of the cpu steal time field in /proc/stat that has been
introduced by the cputime patch. The new output of top looks like
this:

top - 09:50:20 up 11 min,  3 users,  load average: 8.94, 7.17, 3.82
Tasks:  78 total,   8 running,  70 sleeping,   0 stopped,   0 zombie
 Cpu0 : 38.7%us,  4.2%sy,  0.0%ni,  0.0%id,  2.4%wa,  1.8%hi,  0.0%si, 53.0%st
 Cpu1 : 38.5%us,  0.6%sy,  0.0%ni,  5.1%id,  1.3%wa,  1.9%hi,  0.0%si, 52.6%st
 Cpu2 : 54.0%us,  0.6%sy,  0.0%ni,  0.6%id,  4.9%wa,  1.2%hi,  0.0%si, 38.7%st
 Cpu3 : 49.1%us,  0.6%sy,  0.0%ni,  1.2%id,  0.0%wa,  0.0%hi,  0.0%si, 49.1%st
 Cpu4 : 35.9%us,  1.2%sy,  0.0%ni, 15.0%id,  0.6%wa,  1.8%hi,  0.0%si, 45.5%st
 Cpu5 : 43.0%us,  2.1%sy,  0.7%ni,  0.0%id,  4.2%wa,  1.4%hi,  0.0%si, 48.6%st
Mem:    251832k total,   155448k used,    96384k free,     1212k buffers
Swap:   524248k total,    17716k used,   506532k free,    18096k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20629 root      25   0 30572  27m 7076 R 55.2 11.1   0:02.14 cc1
20617 root      25   0 40600  37m 7076 R 47.0 15.1   0:03.04 cc1
20635 root      24   0 26356  20m 7076 R 42.3  8.4   0:00.75 cc1
20638 root      25   0 23196  17m 7076 R 27.0  7.2   0:00.46 cc1
20642 root      25   0 15028 9824 7076 R 18.2  3.9   0:00.31 cc1
20644 root      20   0 14852 9648 7076 R 17.0  3.8   0:00.29 cc1
   26 root       5 -10     0    0    0 S  0.6  0.0   0:00.03 kblockd/5
  915 root      16   0  3012  884 2788 R  0.6  0.4   0:02.33 top
    1 root      16   0  2020  284 1844 S  0.0  0.1   0:00.06 init
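
For readers who want to see where the %st column comes from, here is a
stand-alone sketch that reads two samples of the aggregate "cpu" line
with the additional eighth field and turns the difference into a
percentage. It is not the procps code from the diff below; the struct
and function names are invented for the illustration and the field order
is assumed to be user, nice, system, idle, iowait, irq, softirq, steal.

```c
#include <stdio.h>

struct cpu_sample {
    unsigned long long user, nice, system, idle;
    unsigned long long iowait, irq, softirq, steal;
};

/* Parse one aggregate "cpu" line; 0 on success, -1 if a field is missing. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->system, &s->idle,
                   &s->iowait, &s->irq, &s->softirq, &s->steal);

    return n == 8 ? 0 : -1;
}

/* Sum of all eight counters of one sample. */
static unsigned long long total_ticks(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq + s->steal;
}

/* Steal share in percent between two samples; 0.0 if no tick has passed. */
double steal_share(const struct cpu_sample *prev, const struct cpu_sample *cur)
{
    unsigned long long ticks = total_ticks(cur) - total_ticks(prev);

    if (ticks == 0)
        return 0.0;
    return 100.0 * (double)(cur->steal - prev->steal) / (double)ticks;
}
```

A kernel without the steal field prints only seven numbers on that line,
in which case parse_cpu_line() in this sketch reports an error instead
of guessing a value.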

Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]

diff -urN procps-3.2.0/proc/sysinfo.c procps-3.2.0-steal/proc/sysinfo.c
--- procps-3.2.0/proc/sysinfo.c 2004-07-16 09:52:29.0 +0200
+++ procps-3.2.0-steal/proc/sysinfo.c   2004-07-16 09:51:44.0 +0200
@@ -216,11 +216,11 @@
 #define NAN (-0.0)
 #endif
 #define JT unsigned long long
-void seven_cpu_numbers(double *restrict uret, double *restrict nret, double *restrict sret, double *restrict iret, double *restrict wret, double *restrict xret, double *restrict yret){
-double tmp_u, tmp_n, tmp_s, tmp_i, tmp_w, tmp_x, tmp_y;
+void seven_cpu_numbers(double *restrict uret, double *restrict nret, double *restrict sret, double *restrict iret, double *restrict wret, double *restrict xret, double *restrict yret, double *restrict zret){
+double tmp_u, tmp_n, tmp_s, tmp_i, tmp_w, tmp_x, tmp_y, tmp_z;
 double scale;  /* scale values to % */
-static JT old_u, old_n, old_s, old_i, old_w, old_x, old_y;
-JT new_u, new_n, new_s, new_i, new_w, new_x, new_y;
+static JT old_u, old_n, old_s, old_i, old_w, old_x, old_y, old_z;
+JT new_u, new_n, new_s, new_i, new_w, new_x, new_y, new_z;
 JT ticks_past; /* avoid div-by-0 by not calling too often :-( */

 tmp_w = 0.0;
@@ -229,10 +229,12 @@
 new_x = 0;
 tmp_y = 0.0;
 new_y = 0;
+tmp_z = 0.0;
+new_z = 0;

 FILE_TO_BUF(STAT_FILE,stat_fd);
-sscanf(buf, "cpu %Lu %Lu %Lu %Lu %Lu %Lu %Lu", &new_u, &new_n, &new_s, &new_i, &new_w, &new_x, &new_y);
-ticks_past = (new_u+new_n+new_s+new_i+new_w+new_x+new_y)-(old_u+old_n+old_s+old_i+old_w+old_x+old_y);
+sscanf(buf, "cpu %Lu %Lu %Lu %Lu %Lu %Lu %Lu %Lu", &new_u, &new_n, &new_s, &new_i, &new_w, &new_x, &new_y, &new_z);
+ticks_past = (new_u+new_n+new_s+new_i+new_w+new_x+new_y+new_z)-(old_u+old_n+old_s+old_i+old_w+old_x+old_y+old_z);
 if(ticks_past){
   scale = 100.0 / (double)ticks_past;
   tmp_u = ( (double)new_u - (double)old_u ) * scale;
@@ -242,6 +244,7 @@
   tmp_w = ( (double)new_w - (double)old_w ) * scale;
   tmp_x = ( (double)new_x - (double)old_x ) * scale;
   tmp_y = ( (double)new_y - (double)old_y ) * scale;
+  tmp_z = ( (double)new_z - (double)old_z ) * scale;
 }else{
   tmp_u = NAN;
   tmp_n = NAN;
@@ -250,6 +253,7 @@
   tmp_w = NAN;
   tmp_x = NAN;
   tmp_y = NAN;
+  tmp_z = NAN;
 }
 SET_IF_DESIRED(uret, tmp_u);
 SET_IF_DESIRED(nret, tmp_n);
@@ -258,6 +262,7 @@
 SET_IF_DESIRED(wret, tmp_w);
 SET_IF_DESIRED(iret, tmp_x);
 SET_IF_DESIRED(wret, tmp_y);
+SET_IF_DESIRED(zret, tmp_z);
 old_u=new_u;
 old_n=new_n;
 old_s=new_s;
@@ -265,6 +270,7 @@
 old_w=new_w;
 old_i=new_x;
 old_w=new_y;
+old_z=new_z;
 }
 #undef JT
 #endif
@@ -341,7 +347,7 @@

 /***/

-void getstat(jiff *restrict cuse, jiff *restrict cice, jiff *restrict csys, jiff *restrict cide, jiff *restrict ciow, jiff *restrict cxxx, jiff *restrict cyyy,
+void getstat(jiff *restrict cuse, jiff *restrict cice, jiff *restrict csys, jiff *restrict cide, jiff *restrict ciow, jiff *restrict cxxx, jiff *restrict cyyy, jiff *restrict czzz,
 unsigned long *restrict pin, unsigned long *restrict pout, unsigned long *restrict s_in, unsigned long

[PATCH] cputime (1/6): move call to update_process_times.

2004-08-05 Thread Martin Schwidefsky
[PATCH] cputime (1/6): move call to update_process_times.

From: Martin Schwidefsky [EMAIL PROTECTED]

For non-smp kernels the call to update_process_times is done
in the do_timer function. It is more consistent with smp kernels
to move this call to the architecture file which calls do_timer.

Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]

diffstat:
 arch/alpha/kernel/time.c |3 +++
 arch/arm/kernel/time.c   |3 +++
 arch/arm/mach-iop3xx/iq80310-time.c  |3 +++
 arch/arm26/kernel/time.c |3 +++
 arch/cris/arch-v10/kernel/time.c |3 +++
 arch/h8300/kernel/time.c |3 +++
 arch/ia64/kernel/time.c  |8 +---
 arch/m68k/kernel/time.c  |3 +++
 arch/m68k/sun3/sun3ints.c|3 +++
 arch/m68knommu/kernel/time.c |3 +++
 arch/mips/au1000/common/time.c   |9 +
 arch/mips/baget/time.c   |3 +++
 arch/mips/galileo-boards/ev96100/time.c  |3 +++
 arch/mips/gt64120/common/time.c  |3 +++
 arch/mips/kernel/time.c  |3 ---
 arch/mips/momentum/ocelot_g/gt-irq.c |3 +++
 arch/mips/sgi-ip27/ip27-timer.c  |2 --
 arch/parisc/kernel/time.c|2 ++
 arch/ppc/kernel/time.c   |3 +++
 arch/ppc64/kernel/time.c |3 +++
 arch/s390/kernel/time.c  |4 +++-
 arch/sh/kernel/time.c|3 +++
 arch/sh64/kernel/time.c  |3 +++
 arch/sparc/kernel/pcic.c |3 +++
 arch/sparc/kernel/time.c |4 
 arch/sparc64/kernel/time.c   |1 +
 arch/um/kernel/time_kern.c   |2 --
 arch/v850/kernel/time.c  |3 +++
 arch/x86_64/kernel/time.c|3 +++
 include/asm-arm/arch-clps711x/time.h |3 +++
 include/asm-arm/arch-integrator/time.h   |3 +++
 include/asm-arm/arch-l7200/time.h|3 +++
 include/asm-i386/mach-default/do_timer.h |3 +++
 include/asm-i386/mach-visws/do_timer.h   |3 +++
 include/asm-i386/mach-voyager/do_timer.h |3 +++
 kernel/timer.c   |5 -
 36 files changed, 98 insertions(+), 20 deletions(-)

diff -urN linux-2.6.8-rc3/arch/alpha/kernel/time.c linux-2.6.8-s390/arch/alpha/kernel/time.c
--- linux-2.6.8-rc3/arch/alpha/kernel/time.c   Thu Aug  5 18:39:48 2004
+++ linux-2.6.8-s390/arch/alpha/kernel/time.c   Thu Aug  5 18:40:21 2004
@@ -138,6 +138,9 @@

while (nticks > 0) {
do_timer(regs);
+#ifndef CONFIG_SMP
+   update_process_times(user_mode(regs));
+#endif
nticks--;
}

diff -urN linux-2.6.8-rc3/arch/arm/kernel/time.c linux-2.6.8-s390/arch/arm/kernel/time.c
--- linux-2.6.8-rc3/arch/arm/kernel/time.c  Thu Aug  5 18:39:48 2004
+++ linux-2.6.8-s390/arch/arm/kernel/time.c Thu Aug  5 18:40:21 2004
@@ -321,6 +321,9 @@
do_leds();
do_set_rtc();
do_timer(regs);
+#ifndef CONFIG_SMP
+   update_process_times(user_mode(regs));
+#endif
 }

 void (*init_arch_time)(void);
diff -urN linux-2.6.8-rc3/arch/arm/mach-iop3xx/iq80310-time.c linux-2.6.8-s390/arch/arm/mach-iop3xx/iq80310-time.c
--- linux-2.6.8-rc3/arch/arm/mach-iop3xx/iq80310-time.c Wed Jun 16 07:19:43 2004
+++ linux-2.6.8-s390/arch/arm/mach-iop3xx/iq80310-time.c   Thu Aug  5 18:40:21 2004
@@ -97,6 +97,9 @@
*timer_en |= 2;

do_timer(regs);
+#ifndef CONFIG_SMP
+   update_process_times(user_mode(regs));
+#endif

return IRQ_HANDLED;
 }
diff -urN linux-2.6.8-rc3/arch/arm26/kernel/time.c linux-2.6.8-s390/arch/arm26/kernel/time.c
--- linux-2.6.8-rc3/arch/arm26/kernel/time.c   Wed Jun 16 07:19:42 2004
+++ linux-2.6.8-s390/arch/arm26/kernel/time.c   Thu Aug  5 18:40:21 2004
@@ -188,6 +188,9 @@
 static irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
 do_timer(regs);
+#ifndef CONFIG_SMP
+   update_process_times(user_mode(regs));
+#endif
 do_set_rtc(); //FIME - EVERY timer IRQ?
 do_profile(regs);
return IRQ_HANDLED; //FIXME - is this right?
diff -urN linux-2.6.8-rc3/arch/cris/arch-v10/kernel/time.c linux-2.6.8-s390/arch/cris/arch-v10/kernel/time.c
--- linux-2.6.8-rc3/arch/cris/arch-v10/kernel/time.c   Thu Aug  5 18:39:48 2004
+++ linux-2.6.8-s390/arch/cris/arch-v10/kernel/time.c   Thu Aug  5 18:40:21 2004
@@ -227,6 +227,9 @@
/* call the real timer interrupt handler */

do_timer(regs);
+#ifndef CONFIG_SMP
+   update_process_times(user_mode(regs));
+#endif

/*
 * If we have an externally synchronized Linux clock, then update
diff -urN linux-2.6.8-rc3/arch/h8300/kernel/time.c linux-2.6.8-s390/arch/h8300/kernel/time.c
--- linux-2.6.8-rc3/arch/h8300/kernel/time.c   Wed Jun 16 07:19:10 2004
+++ linux-2.6.8-s390/arch/h8300/kernel/time.c   Thu Aug  5 18:40

[PATCH] cputime (2/6): remove unused definitions from timex.h.

2004-08-05 Thread Martin Schwidefsky
[PATCH] cputime (2/6): remove unused definitions from timex.h.

From: Martin Schwidefsky [EMAIL PROTECTED]

The CLOCK_TICK_FACTOR and FINETUNE defines from asm/timex.h
are not used anywhere. Kill them.

Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]

diffstat:
 include/asm-arm/arch-lh7a40x/timex.h |    1 -
 include/asm-arm/arch-sa1100/timex.h  |    1 -
 include/asm-h8300/timex.h            |    4 ----
 include/asm-i386/timex.h             |    4 ----
 include/asm-m68k/timex.h             |    4 ----
 include/asm-ppc/timex.h              |    4 ----
 include/asm-ppc64/timex.h            |    4 ----
 include/asm-s390/timex.h             |    4 ----
 include/asm-sh/timex.h               |    4 ----
 include/asm-sparc/timex.h            |    4 ----
 include/asm-sparc64/timex.h          |    4 ----
 include/asm-v850/timex.h             |    4 ----
 include/asm-x86_64/timex.h           |    4 ----
 13 files changed, 46 deletions(-)

diff -urN linux-2.6.8-rc3/include/asm-arm/arch-lh7a40x/timex.h linux-2.6.8-s390/include/asm-arm/arch-lh7a40x/timex.h
--- linux-2.6.8-rc3/include/asm-arm/arch-lh7a40x/timex.h   Wed Jun 16 07:18:55 2004
+++ linux-2.6.8-s390/include/asm-arm/arch-lh7a40x/timex.h   Thu Aug  5 18:40:22 2004
@@ -14,5 +14,4 @@

 /*
 #define CLOCK_TICK_RATE 3686400
-#define CLOCK_TICK_FACTOR  80
 */
diff -urN linux-2.6.8-rc3/include/asm-arm/arch-sa1100/timex.h linux-2.6.8-s390/include/asm-arm/arch-sa1100/timex.h
--- linux-2.6.8-rc3/include/asm-arm/arch-sa1100/timex.h Wed Jun 16 07:19:37 2004
+++ linux-2.6.8-s390/include/asm-arm/arch-sa1100/timex.h   Thu Aug  5 18:40:22 2004
@@ -10,4 +10,3 @@
  * SA1100 timer
  */
 #define CLOCK_TICK_RATE 3686400
-#define CLOCK_TICK_FACTOR  80
diff -urN linux-2.6.8-rc3/include/asm-h8300/timex.h linux-2.6.8-s390/include/asm-h8300/timex.h
--- linux-2.6.8-rc3/include/asm-h8300/timex.h   Wed Jun 16 07:19:23 2004
+++ linux-2.6.8-s390/include/asm-h8300/timex.h  Thu Aug  5 18:40:22 2004
@@ -7,10 +7,6 @@
 #define _ASM_H8300_TIMEX_H

 #define CLOCK_TICK_RATE CONFIG_CPU_CLOCK*1000/8192 /* Timer input freq. */
-#define CLOCK_TICK_FACTOR  20  /* Factor of both 1000000 and CLOCK_TICK_RATE */
-#define FINETUNE ((((((long)LATCH * HZ - CLOCK_TICK_RATE) << SHIFT_HZ) * \
-       (1000000/CLOCK_TICK_FACTOR) / (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR)) \
-               << (SHIFT_SCALE-SHIFT_HZ)) / HZ)

 typedef unsigned long cycles_t;
 extern short h8300_timer_count;
diff -urN linux-2.6.8-rc3/include/asm-i386/timex.h linux-2.6.8-s390/include/asm-i386/timex.h
--- linux-2.6.8-rc3/include/asm-i386/timex.h   Thu Aug  5 18:40:05 2004
+++ linux-2.6.8-s390/include/asm-i386/timex.h   Thu Aug  5 18:40:22 2004
@@ -15,10 +15,6 @@
 #  define CLOCK_TICK_RATE 1193182 /* Underlying HZ */
 #endif

-#define CLOCK_TICK_FACTOR  20  /* Factor of both 1000000 and CLOCK_TICK_RATE */
-#define FINETUNE ((((((long)LATCH * HZ - CLOCK_TICK_RATE) << SHIFT_HZ) * \
-       (1000000/CLOCK_TICK_FACTOR) / (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR)) \
-               << (SHIFT_SCALE-SHIFT_HZ)) / HZ)

 /*
  * Standard way to access the cycle counter on i586+ CPUs.
diff -urN linux-2.6.8-rc3/include/asm-m68k/timex.h linux-2.6.8-s390/include/asm-m68k/timex.h
--- linux-2.6.8-rc3/include/asm-m68k/timex.h   Wed Jun 16 07:18:57 2004
+++ linux-2.6.8-s390/include/asm-m68k/timex.h   Thu Aug  5 18:40:22 2004
@@ -7,10 +7,6 @@
 #define _ASMm68k_TIMEX_H

 #define CLOCK_TICK_RATE 1193180 /* Underlying HZ */
-#define CLOCK_TICK_FACTOR  20  /* Factor of both 1000000 and CLOCK_TICK_RATE */
-#define FINETUNE ((((((long)LATCH * HZ - CLOCK_TICK_RATE) << SHIFT_HZ) * \
-       (1000000/CLOCK_TICK_FACTOR) / (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR)) \
-               << (SHIFT_SCALE-SHIFT_HZ)) / HZ)

 typedef unsigned long cycles_t;

diff -urN linux-2.6.8-rc3/include/asm-ppc/timex.h linux-2.6.8-s390/include/asm-ppc/timex.h
--- linux-2.6.8-rc3/include/asm-ppc/timex.h Wed Jun 16 07:19:23 2004
+++ linux-2.6.8-s390/include/asm-ppc/timex.h   Thu Aug  5 18:40:22 2004
@@ -11,10 +11,6 @@
 #include <asm/cputable.h>

 #define CLOCK_TICK_RATE 1193180 /* Underlying HZ */
-#define CLOCK_TICK_FACTOR  20  /* Factor of both 1000000 and CLOCK_TICK_RATE */
-#define FINETUNE ((((((long)LATCH * HZ - CLOCK_TICK_RATE) << SHIFT_HZ) * \
-       (1000000/CLOCK_TICK_FACTOR) / (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR)) \
-               << (SHIFT_SCALE-SHIFT_HZ)) / HZ)

 typedef unsigned long cycles_t;

diff -urN linux-2.6.8-rc3/include/asm-ppc64/timex.h linux-2.6.8-s390/include/asm-ppc64/timex.h
--- linux-2.6.8-rc3/include/asm-ppc64/timex.h   Wed Jun 16 07:20:26 2004
+++ linux-2.6.8-s390/include/asm-ppc64/timex.h  Thu Aug  5 18:40:22 2004
@@ -12,10 +12,6 @@
 #define _ASMPPC64_TIMEX_H

 #define CLOCK_TICK_RATE 1193180 /* Underlying HZ */
-#define CLOCK_TICK_FACTOR  20  /* Factor of both 1000000 and CLOCK_TICK_RATE */
-#define FINETUNE ((((((long)LATCH * HZ - CLOCK_TICK_RATE

Re: Linux 2.6.7 Patch Status

2004-07-16 Thread Martin Schwidefsky
Hi Mark,

> No, that's not the name inside the tarball.  See my note to Ulrich (or
> download the file yourself if you don't believe me).  As I also said to
> Ulrich, even if that were the name of the file, I have no way of knowing
> which lines actually comprise _just_ the multicast notifier patch.  So I
> have no way of only adding that on top of 2.6.8-rc1.

Are you looking at the linux-2.6.5-s390-xxx-april2004.tar.gz tarball or
at the linux-2.6.5-s390-xxx-april2004-patches.tar.gz tarball? You'll
find the per-problem patches only in the -patches variant.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390


Re: Linux 2.6.7 Patch Status

2004-07-12 Thread Martin Schwidefsky
> The only significant exceptions I know of for 2.6.7 are these:
> - The shared-IPv6-card patch:
>   linux-2.6.5-s390-base-14-april2004.diff
>   Sadly, no one has so far come up with a solution that can be
>   merged into the official kernel, so you have to apply this
>   to be able to use IPv6 with multiple VM guests on one card.
> - The lost-dirty-bit fix:
>   linux-2.6.5-s390-04-16-april2004.diff
>   This came too late for 2.6.7 but is integrated in the latest
>   BitKeeper snapshots, like many other less important bug fixes.
> - The xip2fs filesystem (you know the story...).
>
> We're always trying to minimize the number of these patches, but probably
> some more work should be put into documenting the state.
> If you have found anything else in the DeveloperWorks stream that is not
> in the real kernel, please tell!

I have three more patches for 2.6.7 on my list:
- The kerntypes patch
  linux-2.6.5-s390-base-12-april2004.diff
  This patch hasn't been accepted into the official BitKeeper because
  the discussion on how to do post-mortem problem determination isn't
  finished yet. Red Hat has its own crash analysis tool, whereas we
  at IBM prefer lcrash.
- The multicast notifier patch
  linux-2.6.5-s390-04-26-april2004.diff
  This patch adds another multicast notifier chain that reports all
  addresses. The standard notifier doesn't report all of them. The final
  solution will be to merge these two notifiers, but this might affect
  other network drivers, so we introduced this interim solution.
- The zfcp module_exit vs. lun/port object kfree patch
  This is part of the patch linux-2.6.5-s390-base-07-april2004.diff.
  I reverted a part of the patch to get the rest of the zfcp patches
  integrated into the official BitKeeper. Greg strongly objected
  to the kfree trick, so we had to remove it.
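
For readers unfamiliar with the term, a notifier chain is a list of callbacks that are all invoked when an event occurs. The following is a minimal user-space sketch of that idea only; the types and the names chain_register(), chain_call() and count_event() are hypothetical and are not taken from the multicast notifier patch or from the kernel's notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives an event code and opaque data. */
typedef int (*notifier_fn)(unsigned long event, void *data);

struct notifier {
	notifier_fn call;
	struct notifier *next;
};

/* Add a notifier at the head of the chain. */
static void chain_register(struct notifier **head, struct notifier *n)
{
	assert(head != NULL && n != NULL && n->call != NULL);
	n->next = *head;
	*head = n;
}

/* Invoke every registered notifier; return how many were called. */
static int chain_call(struct notifier *head, unsigned long event, void *data)
{
	int called = 0;

	for (struct notifier *n = head; n != NULL; n = n->next) {
		n->call(event, data);
		called++;
	}
	return called;
}

/* Example callback: counts the events it sees through the data pointer. */
static int count_event(unsigned long event, void *data)
{
	(void)event;
	++*(int *)data;
	return 0;
}
```

With two such chains, a subscriber registers on whichever one reports the set of events it needs. Merging them later means every existing subscriber has to cope with the combined event stream, which is the kind of side effect on other drivers mentioned above.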

Currently I have 4 patches pending against BitKeeper: the patches
listed above, minus the dirty bit patch and the xip2fs patch.
The dirty bit patch was added to -bk shortly after the release
of 2.6.7, and xip2fs is currently being reworked based on suggestions
from Andrew Morton.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: 2.6 kernel and old DASD problem

2004-05-11 Thread Martin Schwidefsky
> Martin, you are the Wizard, indeed! Thank you very much! Now i've got new
> kernel with old disks :)

Thanks for the flowers, but actually it was Conny who came up with the
idea for the fix. I only did the debugging part ...

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: 2.6 kernel and old DASD problem

2004-05-03 Thread Martin Schwidefsky
> SenseID : device 0750 reports: CU  Type/Mod = 3990/EC, Dev Type/Mod =
> 3390/0A (for Shark)
> SenseID : device 080b reports: CU  Type/Mod = 3990/E9, Dev Type/Mod =
> 3390/0A   (for Tetragon)
> SenseID : device 080c reports: CU  Type/Mod = 3990/E9, Dev Type/Mod =
> 3390/0A   (for Tetragon)

Hmm, the cu type/dev type combination is fine. There must be another reason
why the DASDs aren't recognized. If this system runs under z/VM, could you
do a #CP CPU ALL TR IO 080B INST INT CCW RUN and post the log?

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: 2.6 kernel and old DASD problem

2004-04-30 Thread Martin Schwidefsky
Hi Sergey,

> Is there a problem with not supported by new driver or i've just
> forgotten to compile something into the kernel?

Could you IPL with the cio_msg=yes parameter and tell me what the
SenseID line for the device looks like? I have the feeling that
there is a cu type/device type pair missing in the list of devices
(see drivers/s390/block/dasd_eckd.c:dasd_eckd_ids[]).
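
To illustrate what a missing pair would mean, here is a simplified sketch of such a lookup. It assumes a flat table of cu type/device type pairs and a linear search; struct dasd_id, example_ids[] and find_dasd_id() are hypothetical names, the entries are illustrative, and the sketch ignores the model bytes. The authoritative list is dasd_eckd_ids[] in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for one cu type/device type entry. */
struct dasd_id {
	uint16_t cu_type;
	uint16_t dev_type;
};

/* Illustrative entries only; not copied from dasd_eckd_ids[]. */
static const struct dasd_id example_ids[] = {
	{ 0x3990, 0x3390 },
	{ 0x3990, 0x3380 },
};

/* Return the index of the matching pair, or -1 when the pair is unknown. */
static int find_dasd_id(const struct dasd_id *ids, size_t n,
			uint16_t cu_type, uint16_t dev_type)
{
	assert(ids != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ids[i].cu_type == cu_type && ids[i].dev_type == dev_type)
			return (int)i;
	}
	return -1;
}
```

A device whose SenseID pair is not in the table yields -1 here, which corresponds to the driver never claiming the device.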

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: 2.6 device node help

2004-04-27 Thread Martin Schwidefsky
> I don't think you're on crack,
> but the book might be.   4,64 is a UART class serial line.
> Not likely your zSeries will have that.   But all HW have 5,1 console.

Well, neither is on crack. The 3215/sclp console drivers use 4,64 as their
device node, so the dd-book is correct. Nevertheless, you have to use 5,1
for /dev/console. The device that hides behind 5,1 is a redirector to the
real console.
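
The two major,minor pairs can be expressed as device numbers with the makedev(), major() and minor() macros. This is a small sketch assuming Linux with glibc, where those macros come from <sys/sysmacros.h>; the helper names make_char_dev() and is_console_redirector() are hypothetical.

```c
#include <assert.h>
#include <sys/sysmacros.h>  /* makedev(), major(), minor() on Linux/glibc */
#include <sys/types.h>

/* Numbers taken from the discussion above. */
#define CONSOLE_MAJOR 5u    /* 5,1: the /dev/console redirector */
#define CONSOLE_MINOR 1u
#define DRIVER_MAJOR  4u    /* 4,64: node used by the 3215/sclp console driver */
#define DRIVER_MINOR  64u

/* Hypothetical helper: build a device number from a major,minor pair. */
static dev_t make_char_dev(unsigned int maj, unsigned int min)
{
	assert(min < 256u); /* this sketch only handles classic 8-bit minors */
	return makedev(maj, min);
}

/* Return 1 if dev is the 5,1 redirector that /dev/console should use. */
static int is_console_redirector(dev_t dev)
{
	return major(dev) == CONSOLE_MAJOR && minor(dev) == CONSOLE_MINOR;
}
```

A check like this could be applied to the st_rdev field returned by stat() on /dev/console to confirm that the node was created as 5,1 rather than 4,64.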

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]



Re: [PATCH] Clean up asm/pgalloc.h include (s390)

2004-04-19 Thread Martin Schwidefsky
> This patch cleans up needless includes of asm/pgalloc.h from the
> arch/s390/ subtree.  This has not been compile tested, so
> needs the architecture maintainers (or willing volunteers) to
> test.

Doesn't compile. s390_ksyms needs pgalloc.h for the definition of diag10.
The other includes of pgalloc.h can be removed without a problem.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: +49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]


