Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-25 Thread Richard L. Hamilton
[...]
  To get around the problem
  pmap: cannot examine 5608: address space is changing

  and get a closer look, try stopping the process first:

  pstop 5608

  and then running pmap or whatever to inspect it, and finally running

  prun 5608

  to let it run again.

 Why is pmap not doing this?

I don't know.  I (a) don't work for Sun, and (b) only run
OpenSolaris under VirtualBox on an Intel Mac Mini (my other
systems are SPARC and running Solaris 9, and Solaris SXCE snv_97,
respectively).  So just starting OpenSolaris under VirtualBox uses
about all the time I'm willing to spend getting my answer right
on anything but the most interesting problems.  I would _guess_ that
either using the -F option might also help

 -F  Force. Grabs the target process even  if
 another process has control.

or else that there's a bug in (or affecting) pmap.  I suggested
pstop/prun simply as the first thing I could think of to work
around the problem.

  I suspect the shell script being run is the real problem; not too many
  well-written shell scripts should grow to such monster size.

 We called upon the author to explain: He said the script caches many
 data in memory (arrays) during execution and the 19G memory peak usage
 matches the working set of the input data.

Ok.  Sometimes, esp. if it would otherwise run very slowly,
that can make sense.

 We're still verifying the output because the script finished in four
 hours while the legacy perl version of the script used to run a whole
 weekend.
 This is suspicious and too good to be true.

Maybe, maybe not.  If the rewrite also improved the algorithm
or implementation, that's entirely possible.  Also, a lot of perl
scripts run more external programs than they really ought to.
Recent ksh93 has a lot of built-in commands that used to require
running external programs.  So for any of various reasons, such
an improvement, although suspicious, doesn't seem at all impossible
to me.  (It also leaves me thinking that perhaps with enough effort,
further improvements might be possible.)
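
To illustrate the point about external programs, here is a minimal sketch of the same task done two ways. The function names and the example path are made up for illustration; whether basename is really an external program depends on the shell and how it was built, so treat this as an assumption rather than a measurement.

```shell
#!/bin/sh
# Sketch: two ways to get the file-name part of a path.
# name_external and name_expansion are hypothetical helper names.

# Calls the basename utility; in many shells this starts a new process
# on every call, which adds up inside a loop over thousands of files.
name_external() {
    basename "$1"
}

# Uses only parameter expansion, so no other program is started.
name_expansion() {
    printf '%s\n' "${1##*/}"
}

# /tmp/example/input.dat is an illustrative path, not one from the thread.
name_external /tmp/example/input.dat
name_expansion /tmp/example/input.dat
```

The two differ on edge cases such as a trailing slash, so the expansion form is only a drop-in replacement for ordinary file paths.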

Note: I'm not necessarily saying that recent ksh93 is faster than
perl.  Both can be fast or slow depending on how the script is
written; probably neither will give as much help to optimize out
slow things the programmer did as would some other languages.
It used to be that perl was often faster than shell scripts, perhaps
only slower if the longer startup time for perl was an issue.  But
recent ksh93 is probably at least capable of being quite close
(or _maybe_ even a bit faster), so simply assuming that one
is faster than the other is probably more wrong now than it
ever was.

I think I've read things suggesting that the ksh93 developers
want to make it competitive with perl in both speed and
at least commonly used functionality.  Although even if it
were there today (and on functionality, I don't think it is really),
it would be a long time before there was anything like perl's CPAN.
Nevertheless, where it's good enough, I like ksh93 better myself.
I learned C and awk and sed way back before perl even existed, so
perl looks to me like a stew with random bits of everything
mixed in.  It gives me a headache, and I have to have a manual
open the whole time I write something in it.  Borrowing stuff
from everywhere may look natural to someone like Larry Wall
(perl's creator), who started as a linguist, but I'm usually
more comfortable with predictable computers than unpredictable
people, so I'd rather use a language where I can remember a few
rules than one where I have to remember a lot of details.
-- 
This message posted from opensolaris.org
___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-25 Thread Hugh McIntyre

Richard L. Hamilton wrote:

 [...]
   To get around the problem
   pmap: cannot examine 5608: address space is changing

   and get a closer look, try stopping the process first:

   pstop 5608

   and then running pmap or whatever to inspect it, and finally running

   prun 5608

   to let it run again.

  Why is pmap not doing this?

 I don't know.  I (a) don't work for Sun, and (b) only run
 OpenSolaris under VirtualBox on an Intel Mac Mini (my other
 systems are SPARC and running Solaris 9, and Solaris SXCE snv_97,
 respectively).  So just starting OpenSolaris under VirtualBox uses
 about all the time I'm willing to spend getting my answer right
 on anything but the most interesting problems.  I would _guess_ that
 either using the -F option might also help

  -F  Force. Grabs the target process even if
  another process has control.


You could also try -L, since (apart from changing the output) this
seems to change the way the data is collected: it uses an agent LWP inside
the process itself, possibly stopping the process along the way.  Like
Richard, I cannot say for sure though.


As for the default behavior, the presence of the "address space is
changing" message and the fact that the code loops with "#define
MAX_RETRIES 5" imply that the process is definitely not stopped and
that the possibility of change is expected.
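
A bounded retry loop of the kind described here could look roughly like the sketch below. This is not the pmap source, only an illustration of the pattern: retry_cmd is a hypothetical helper, and the "one attempt plus MAX_RETRIES retries" count is an inference from the six header lines printed in the original report.

```shell
#!/bin/sh
# Sketch of a bounded retry loop, NOT the actual pmap code.
# MAX_RETRIES mirrors the constant mentioned above; retry_cmd is hypothetical.
MAX_RETRIES=5

# retry_cmd COMMAND [ARGS...] runs the command until it succeeds,
# giving up after one initial attempt plus MAX_RETRIES retries.
retry_cmd() {
    attempt=0
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        "$@" && return 0
        attempt=$(( $attempt + 1 ))
    done
    echo "giving up after $attempt attempts" >&2
    return 1
}
```

A tool built this way never stops its target; it just hopes that one of the attempts sees a stable snapshot, which fits a process that is growing the whole time failing every attempt.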


The code that handles -L with an agent LWP, on the other hand, goes to
some effort to buffer stdout to avoid the deadlock you'd get with "pmap
`pgrep xterm`" from inside the relevant xterm, and has comments about the
process being blocked.  Running pmap on the X server that displays the
terminal you're issuing pmap from is a well-known way to lock up your
desktop session ...


Hugh.


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-25 Thread Casper . Dik

 A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel
 is concerned, AFAIK won't be bigger than 2GB in terms of regular memory
 (although it could have a frame buffer or something mapped in the
 part of the address space reserved for I/O devices, making its total
 size perhaps appear larger than 2GB, but still definitely <= 4GB).

It can be nearly 4GB but not over 4GB.  (On a 64-bit kernel or on sparcv9
there is no kernel address space as part of the userland address space.)

 *actually, a 32-bit _address space_ can't be.  On suitable hardware, a
 32-bit kernel can use special instructions to address more than 4GB of RAM,
 and some other OSs allow even a user process to own more than one address
 space. I don't think any of that applies here though.

We had a specific form of memory-based filesystem for x86, and it is also
possible for a 32-bit process to cache more than 4GB in the file cache.

Casper



Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-25 Thread Yves Huang
On Sun, Apr 25, 2010 at 8:43 AM, Richard L. Hamilton rlha...@smart.net wrote:
 [...]
  To get around the problem
  pmap: cannot examine 5608: address space is changing

  and get a closer look, try stopping the process first:

  pstop 5608

  and then running pmap or whatever to inspect it, and finally running

  prun 5608

  to let it run again.

 Why is pmap not doing this?

 I don't know.  I (a) don't work for Sun, and (b) only run
 OpenSolaris under VirtualBox on an Intel Mac Mini (my other
 systems are SPARC and running Solaris 9, and Solaris SXCE snv_97,
 respectively).  So just starting OpenSolaris under VirtualBox uses
 about all the time I'm willing to spend getting my answer right
 on anything but the most interesting problems.  I would _guess_ that
 either using the -F option might also help

 -F  Force. Grabs the target process even  if
 another process has control.

-F has no effect.

 We're still verifying the output because the script finished in four
 hours while the legacy perl version of the script used to run a whole
 weekend.
 This is suspicious and too good to be true.

We are done with the verification of the data and did not find any
problems so far.

 Maybe, maybe not.  If the rewrite also improved the algorithm
 or implementation, that's entirely possible.  Also, a lot of perl
 scripts run more external programs than they really ought to.
 Recent ksh93 has a lot of built-in commands that used to require
 running external programs.

Is there any documentation about this feature?

 So for any of various reasons, such
 an improvement, although suspicious, doesn't seem at all impossible
 to me. (It also leaves me thinking that perhaps with enough effort,
 further improvements might be possible.)

OK. We'll have a look.

Yves


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-25 Thread Richard L. Hamilton
[...]
  Recent ksh93 has a lot of built-in commands that used to require
  running external programs.

 Is there any documentation about this feature?

The ksh93 man page includes some (but perhaps not all) information about
builtin commands.

The ksh93 built-in command "builtin" will list all such commands that it
knows about.  Some are in the form of pathnames; they are described in the
Korn Shell FAQ at http://kornshell.com/doc/faq.html as follows:

    Q12. When I type builtin, I notice that some of these are full pathnames.
         What does this mean?
    A12. Builtins that are not bound to pathnames are always searched
         for before doing a path search.  Builtins that are bound
         to pathnames are only executed when the path search would
         bind to this pathname.

The FAQ also mentions that one can add one's own built-in commands
(where the text below says "mail" I think it means "main"):

    Q4.  How do I add built-in commands?
    A4.  There are two ways to do this.  One is write a shared library
         with functions whose names are b_<name> where <name> is the name of
         the builtin.  The function b_<name> takes three argument.  The first
         two are the same as a mail program.  The third parameter is
         a pointer argument which will point to the current shell context.
         The second way is to write a shared library with a function named
         lib_init().  This function will be called with an argument of 0
         after the library is loaded.  This function can add built-ins
         with the sh_addbuiltin() API function.  In both cases, the
         library is loaded into the shell with the builtin utility.


The folks working on ksh93 integration have submitted at least one ARC
request recently to add additional built-in shell commands.  So there's
a pretty good chance that even without any rewrites, existing ksh93 scripts
that use those will get faster once they become available.  Look through
arc-discuss in the forums or mail archives if you're curious.  I don't think
it will say what build they're targeting though, so no idea when that might
become available.


[osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-24 Thread Yves Huang
Good Day all; I'm new here.

We have an urgent problem. A test server which is ready for
demonstration (and purchase from Sun/Oracle) suffers from some kind of
kernel problem: A 32bit process (ksh) started to consume more than 4G
of memory and is still running but defeats some attempts to observe
it:

Memory usage is 4.8G and rising and we're troubled what may cause the
kernel to ignore the 32bit address limit:
prstat 1 | head -3
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  5608 remy2    4930M 4588M cpu9    58    0   0:24:57 5.0% ksh/1
   380 root     8896K 2392K sleep   59    0   0:00:58 0.1% automountd/4

Observing the process is not always possible, some tools fail like this one:
pmap -x 5608
5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
pmap: cannot examine 5608: address space is changing

What should we do in this case? Reboot? How can we prevent this from
happening again during the demonstration on Monday?

Yves


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-24 Thread Yves Huang
On Sun, Apr 25, 2010 at 1:39 AM, Yves Huang
yves.huang.proje...@googlemail.com wrote:
 Good Day all; I'm new here.

 We have an urgent problem. A test server which is ready for
 demonstration (and purchase from Sun/Oracle) suffers from some kind of
 kernel problem: A 32bit process (ksh) started to consume more than 4G
 of memory and is still running but defeats some attempts to observe
 it:

 Memory usage is 4.8G and rising and we're troubled what may cause the
 kernel to ignore the 32bit address limit:
 prstat 1 | head -3
    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   5608 remy2    4930M 4588M cpu9    58    0   0:24:57 5.0% ksh/1
    380 root     8896K 2392K sleep   59    0   0:00:58 0.1% automountd/4

 Observing the process is not always possible, some tools fail like this one:
 pmap -x 5608
 5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
 5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
 5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
 5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
 5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
 5608:   ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
 pmap: cannot examine 5608: address space is changing

 What should we do in this case? Reboot? How can we prevent this from
 happening again during the demonstration on Monday?

HELP HELP

The memory usage is now 12.1G. We managed to monitor the output and it
makes sense: the script is doing correct work, but the memory usage is
still worrying.

Yves


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-24 Thread Richard L. Hamilton
A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel
is concerned, AFAIK won't be bigger than 2GB in terms of regular memory
(although it could have a frame buffer or something mapped in the
part of the address space reserved for I/O devices, making its total
size perhaps appear larger than 2GB, but still definitely <= 4GB).

*actually, a 32-bit _address space_ can't be.  On suitable hardware, a
32-bit kernel can use special instructions to address more than 4GB of RAM,
and some other OSs allow even a user process to own more than one address
space. I don't think any of that applies here though.

Assuming you're running OpenSolaris and not Solaris 10 or SXCE,
ksh is actually ksh93, and was built both 32-bit and 64-bit.

Programs like that are typically just a link to (or copy of)
/usr/lib/isaexec, which looks in subdirectories (i86 or amd64 for x86,
sparcv7 or sparcv9 for SPARC) of the $PATH directories to find a
64-bit or 32-bit version, and then execs the 64-bit version if on a 64-bit
capable system, otherwise the 32-bit version.

Example:

$ uname -a
SunOS virtualbox-indiana 5.11 snv_108 i86pc i386 i86pc
$ pargs -x $$|grep AT_SUN_EXECNAME
AT_SUN_EXECNAME 0xfd7fffdfffdb /usr/bin/amd64/ksh93
$ pflags $$
789:ksh
data model = _LP64  flags = ORPHAN|MSACCT|MSFORK
 /1:flags = ASLEEP  waitid(0x7,0x0,0xfd7fffdfebf0,0xf)

$ file /usr/bin/ksh* /usr/bin/*/ksh*  
/usr/bin/ksh:   ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped
/usr/bin/ksh93: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
/usr/bin/amd64/ksh93:   ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], dynamically linked, not stripped, no debugging information available
/usr/bin/i86/ksh93: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available

(My /usr/bin/ksh and /usr/bin/ksh93 are not quite the same, possibly due to
having separately put a ksh93 update on the system.  But /usr/bin/ksh is
still tiny, just a wrapper, and even if I specifically execute /usr/bin/ksh,
what ends up running is still /usr/bin/amd64/ksh93.  So don't let that
confuse the issue.)

To get around the problem
pmap: cannot examine 5608: address space is changing

and get a closer look, try stopping the process first:

pstop 5608

and then running pmap or whatever to inspect it, and finally running

prun 5608

to let it run again.

I suspect the shell script being run is the real problem; not too many
well-written shell scripts should grow to such monster size.


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-24 Thread Richard L. Hamilton
(oops, got chopped in the forum the first time, due to punctuation it
didn't like)

A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel
is concerned, AFAIK won't be bigger than 2GB in terms of regular memory
(although it could have a frame buffer or something mapped in the
part of the address space reserved for I/O devices, making its total
size perhaps appear larger than 2GB, but still definitely less than or
equal to 4GB).

*actually, a 32-bit _address space_ can't be.  On suitable hardware, a
32-bit kernel can use special instructions to address more than 4GB of RAM,
and some other OSs allow even a user process to own more than one address
space. I don't think any of that applies here though.
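
As a quick sanity check on the numbers reported earlier in the thread: 2^32 bytes is 4096 MB, and prstat showed a SIZE of 4930M, so by this argument the process could not have been 32-bit. A minimal sketch of that comparison follows; size_to_mib and fits_in_32bit are hypothetical helper names, and the parser assumes whole-number sizes with a K, M or G suffix.

```shell
#!/bin/sh
# Sketch: compare a prstat-style size with the 4096 MB (2^32 bytes)
# ceiling of a 32-bit address space. Helper names are hypothetical.

# Convert a size such as "4930M" to whole megabytes (rounding down).
# Assumes an integer followed by K, M or G; anything else is rejected.
size_to_mib() {
    num=${1%[KMG]}
    case $1 in
        *K) echo $(( $num / 1024 )) ;;
        *M) echo "$num" ;;
        *G) echo $(( $num * 1024 )) ;;
        *)  return 1 ;;
    esac
}

# Succeeds if the size is no larger than a 32-bit address space.
fits_in_32bit() {
    mib=$(size_to_mib "$1") || return 2
    [ "$mib" -le 4096 ]
}

if fits_in_32bit 4930M; then
    echo "4930M fits in a 32-bit address space"
else
    echo "4930M exceeds 4096M, so the process cannot be 32-bit"
fi
```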

Assuming you're running OpenSolaris and not Solaris 10 or SXCE,
ksh is actually ksh93, and was built both 32-bit and 64-bit.

Programs like that are typically just a link to (or copy of)
/usr/lib/isaexec, which looks in subdirectories (i86 or amd64 for x86,
sparcv7 or sparcv9 for SPARC) of the $PATH directories to find a
64-bit or 32-bit version, and then execs the 64-bit version if on a 64-bit
capable system, otherwise the 32-bit version.
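
The lookup described above can be sketched roughly as follows. This is a simplified illustration, not the real isaexec implementation: pick_isa_binary is a hypothetical helper, and the caller supplies the ISA directory names in order of preference (on a real system that ordering would come from the system itself, for example via isalist).

```shell
#!/bin/sh
# Simplified sketch of an isaexec-style lookup (not the real implementation).
# pick_isa_binary DIR NAME ISA... prints the first executable found at
# DIR/ISA/NAME, trying the ISA names in the order given.
pick_isa_binary() {
    dir=$1
    name=$2
    shift 2
    for isa in "$@"; do
        if [ -x "$dir/$isa/$name" ]; then
            printf '%s\n' "$dir/$isa/$name"
            return 0
        fi
    done
    # No ISA-specific version was found.
    return 1
}

# Example call with the 64-bit directory listed first (commented out
# because these directories only exist on a matching system):
# pick_isa_binary /usr/bin ksh93 amd64 i86
```

With the 64-bit name first in the list, the 32-bit binary is chosen only when no 64-bit one is present, which matches the behavior shown in the example that follows.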

Example:

$ uname -a
SunOS virtualbox-indiana 5.11 snv_108 i86pc i386 i86pc
$ pargs -x $$|grep AT_SUN_EXECNAME
AT_SUN_EXECNAME 0xfd7fffdfffdb /usr/bin/amd64/ksh93
$ pflags $$
789:ksh
data model = _LP64  flags = ORPHAN|MSACCT|MSFORK
 /1:flags = ASLEEP  waitid(0x7,0x0,0xfd7fffdfebf0,0xf)

$ file /usr/bin/ksh* /usr/bin/*/ksh*  
/usr/bin/ksh:   ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped
/usr/bin/ksh93: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
/usr/bin/amd64/ksh93:   ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], dynamically linked, not stripped, no debugging information available
/usr/bin/i86/ksh93: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available

(My /usr/bin/ksh and /usr/bin/ksh93 are not quite the same, possibly due to
having separately put a ksh93 update on the system.  But /usr/bin/ksh is
still tiny, just a wrapper, and even if I specifically execute /usr/bin/ksh,
what ends up running is still /usr/bin/amd64/ksh93.  So don't let that
confuse the issue.)

To get around the problem
pmap: cannot examine 5608: address space is changing

and get a closer look, try stopping the process first:

pstop 5608

and then running pmap or whatever to inspect it, and finally running

prun 5608

to let it run again.
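
The stop, inspect, resume sequence above can be wrapped so that the process is always resumed, even if the inspection fails. This is only a sketch: inspect_stopped is a hypothetical helper name, and it assumes that pstop, pmap and prun are on the PATH and that the caller has permission to control the target process.

```shell
#!/bin/sh
# Sketch only: stop a process, inspect it, and always resume it.
# inspect_stopped is a hypothetical helper; pstop, pmap and prun are the
# proc tools named in this thread.
inspect_stopped() {
    pid=$1
    # Stop the target so its address space cannot change under pmap.
    pstop "$pid" || return 1
    # Inspect while stopped; keep the exit status but do not bail out yet.
    pmap -x "$pid"
    rc=$?
    # Resume the process whether or not the inspection succeeded.
    prun "$pid"
    return "$rc"
}

# Usage (commented out so that sourcing this file has no side effects):
# inspect_stopped 5608
```

Keeping the resume step unconditional matters here, since a script left stopped by accident would look hung to anyone watching it.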

I suspect the shell script being run is the real problem; not too many
well-written shell scripts should grow to such monster size.


Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?

2010-04-24 Thread Yves Huang
On Sun, Apr 25, 2010 at 6:08 AM, Richard L. Hamilton rlha...@smart.net wrote:
 A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel
 is concerned, AFAIK won't be bigger than 2GB in terms of regular memory
 (although it could have a frame buffer or something mapped in the
 part of the address space reserved for I/O devices, making its total
 size perhaps appear larger than 2GB, but still definitely <= 4GB).

 *actually, a 32-bit _address space_ can't be.  On suitable hardware, a
 32-bit kernel can use special instructions to address more than 4GB of RAM,
 and some other OSs allow even a user process to own more than one address
 space. I don't think any of that applies here though.

This is what we assumed. But none of the senior admins expected ksh to
be a 64bit shell and we were quite worried about the out-of-control
32bit process.

 Assuming you're running OpenSolaris and not Solaris 10 or SXCE,
 ksh is actually ksh93, and was built both 32-bit and 64-bit.

OK. This comes as a surprise, albeit a good one.

We've figured out that we made a mistake and passed the whole set of data
to the script instead of the demo data, which is a difference between 500
files (demo set) and 725298 files (production set). We figure that
without a 64bit shell the script would've crashed.

 To get around the problem
 pmap: cannot examine 5608: address space is changing

 and get a closer look, try stopping the process first:

 pstop 5608

 and then running pmap or whatever to inspect it, and finally running

 prun 5608

 to let it run again.

Why is pmap not doing this?

 I suspect the shell script being run is the real problem; not too many
 well-written shell scripts should grow to such monster size.

We called upon the author to explain: He said the script caches many
data in memory (arrays) during execution and the 19G memory peak usage
matches the working set of the input data.

We're still verifying the output because the script finished in four
hours while the legacy perl version of the script used to run a whole
weekend.
This is suspicious and too good to be true.

Yves