Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
[...]
>> To get around the problem
>>
>>     pmap: cannot examine 5608: address space is changing
>>
>> and get a closer look, try stopping the process first:
>>
>>     pstop 5608
>>
>> and then running pmap or whatever to inspect it, and finally running
>>
>>     prun 5608
>>
>> to let it run again.
>
> Why is pmap not doing this?

I don't know. I (a) don't work for Sun, and (b) only run OpenSolaris under VirtualBox on an Intel Mac Mini (my other systems are SPARC and running Solaris 9, and Solaris SXCE snv_97, respectively). So just starting OpenSolaris under VirtualBox uses about all the time I'm willing to spend getting my answer right on anything but the most interesting problems.

I would _guess_ that either using the -F option might also help

    -F    Force. Grabs the target process even if another process has control.

or else that there's a bug in (or affecting) pmap. I suggested pstop/prun simply as the first thing I could think of to work around the problem.

>> I suspect the shell script being run is the real problem; not too many well-written shell scripts should grow to such monster size.
>
> We called upon the author to explain: He said the script caches a lot of data in memory (arrays) during execution and the 19G memory peak usage matches the working set of the input data.

Ok. Sometimes, esp. if it would otherwise run very slowly, that can make sense.

> We're still verifying the output because the script finished in four hours while the legacy perl version of the script used to run a whole weekend. This is suspicious and too good to be true.

Maybe, maybe not. If the rewrite also improved the algorithm or implementation, that's entirely possible. Also, a lot of perl scripts run more external programs than they really ought to. Recent ksh93 has a lot of built-in commands that used to require running external programs.

So for any of various reasons, such an improvement, although suspicious, doesn't seem at all impossible to me. (It also leaves me thinking that perhaps with enough effort, further improvements might be possible.)

Note: I'm not necessarily saying that recent ksh93 is faster than perl. Both can be fast or slow depending on how the script is written; probably neither will give as much help to optimize out slow things the programmer did as would some other languages. It used to be that perl was often faster than shell scripts, perhaps only slower if the longer startup time for perl was an issue. But recent ksh93 is probably at least capable of being quite close (or _maybe_ even a bit faster), so simply assuming that one is faster than the other is probably more wrong now than it ever was.

I think I've read things suggesting that the ksh93 developers want to make it competitive with perl in both speed and at least commonly used functionality. Even if it were there today (and on functionality, I don't think it really is), it would be a long time before there was anything like perl's CPAN.

Nevertheless, where it's good enough, I like ksh93 better myself. I learned C and awk and sed way back before perl even existed, so perl looks to me like a stew with random bits of everything mixed in. It gives me a headache, and I have to have a manual open the whole time I write something in it. Borrowing stuff from everywhere may look natural to someone like Larry Wall (perl creator) who started as a linguist, but I'm usually more comfortable with predictable computers than unpredictable people, and I'd rather use a language where I can remember a few rules rather than a lot of details.

--
This message posted from opensolaris.org
___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
Richard L. Hamilton wrote:
> [...]
>>> To get around the problem
>>>
>>>     pmap: cannot examine 5608: address space is changing
>>>
>>> and get a closer look, try stopping the process first:
>>>
>>>     pstop 5608
>>>
>>> and then running pmap or whatever to inspect it, and finally running
>>>
>>>     prun 5608
>>>
>>> to let it run again.
>>
>> Why is pmap not doing this?
>
> I don't know. I (a) don't work for Sun, and (b) only run OpenSolaris under VirtualBox on an Intel Mac Mini (my other systems are SPARC and running Solaris 9, and Solaris SXCE snv_97, respectively). So just starting OpenSolaris under VirtualBox uses about all the time I'm willing to spend getting my answer right on anything but the most interesting problems.
>
> I would _guess_ that either using the -F option might also help
>
>     -F    Force. Grabs the target process even if another process has control.

You could also try -L, since (apart from changing the output) this seems to change the way the data is collected, using an agent LWP inside the process itself and possibly stopping the process. Like Richard, I cannot say for sure though.

As for the default behavior, the presence of the "address space is changing" message and the fact that the code loops with

    #define MAX_RETRIES 5

imply that the process is definitely not stopped and that the possibility of change is expected.

The code that handles -L with an agent LWP, on the other hand, goes to some effort to buffer stdout to avoid the deadlock you'd get with

    pmap `pgrep xterm`

from inside the relevant xterm, and has comments about the process being blocked. Running pmap on the X server that displays the terminal you're issuing pmap from is a well-known way to lock up your desktop session ...

Hugh.
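The retry behaviour described above can be imitated from the shell when a tool gives up on a busy process. This is only a sketch: retry is a hypothetical helper, not part of pmap or the proc tools, and it does not reproduce what pmap does internally. The count of 5 in the usage line simply mirrors the MAX_RETRIES value quoted above.

```shell
# retry MAX CMD [ARGS...]: run CMD until it succeeds, at most MAX attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0            # stop as soon as the command succeeds
        attempt=$((attempt + 1))
    done
    return 1
}
```

On a Solaris system this could be used as: retry 5 pmap -x 5608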
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
> A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel is concerned, AFAIK won't be bigger than 2GB in terms of regular memory (although it could have a frame buffer or something mapped in the part of the address space reserved for I/O devices, making its total size perhaps appear larger than 2GB, but still definitely <= 4GB).

It can be nearly 4GB but not over 4GB. (On a 64-bit kernel or on sparcv9 there is no kernel address space as part of the userland address space.)

> *actually, a 32-bit _address space_ can't be. On suitable hardware, a 32-bit kernel can use special instructions to address more than 4GB of RAM, and some other OSs allow even a user process to own more than one address space. I don't think any of that applies here though.

We had a specific form of memory-based filesystem for x86, and it is also possible for a 32-bit process to cache more than 4GB in the file cache.

Casper
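As a quick arithmetic check of the limit discussed above: a 32-bit address space spans 2^32 bytes, which is 4 GB or 4096 MB, so the roughly 4.8G (about 4930 MB) reported in the original post cannot belong to a 32-bit process. A minimal sketch in POSIX shell; LIMIT_MB and fits_32bit are hypothetical names used only for illustration.

```shell
# Size of a 32-bit address space in megabytes: 2^32 bytes = 4 GB = 4 * 1024 MB.
LIMIT_MB=$((4 * 1024))

# fits_32bit SIZE_MB: succeed if SIZE_MB could fit in a 32-bit address space.
fits_32bit() {
    [ "$1" -le "$LIMIT_MB" ]
}

echo "$LIMIT_MB"    # prints 4096
fits_32bit 4930 || echo "4930 MB is larger than any 32-bit address space"
```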
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
On Sun, Apr 25, 2010 at 8:43 AM, Richard L. Hamilton rlha...@smart.net wrote:
> [...]
>>> To get around the problem
>>>
>>>     pmap: cannot examine 5608: address space is changing
>>>
>>> and get a closer look, try stopping the process first:
>>>
>>>     pstop 5608
>>>
>>> and then running pmap or whatever to inspect it, and finally running
>>>
>>>     prun 5608
>>>
>>> to let it run again.
>>
>> Why is pmap not doing this?
>
> I don't know. I (a) don't work for Sun, and (b) only run OpenSolaris under VirtualBox on an Intel Mac Mini (my other systems are SPARC and running Solaris 9, and Solaris SXCE snv_97, respectively). So just starting OpenSolaris under VirtualBox uses about all the time I'm willing to spend getting my answer right on anything but the most interesting problems.
>
> I would _guess_ that either using the -F option might also help
>
>     -F    Force. Grabs the target process even if another process has control.

-F has no effect.

>> We're still verifying the output because the script finished in four hours while the legacy perl version of the script used to run a whole weekend. This is suspicious and too good to be true.

We are done with the verification of the data and did not find any problems so far.

> Maybe, maybe not. If the rewrite also improved the algorithm or implementation, that's entirely possible. Also, a lot of perl scripts run more external programs than they really ought to. Recent ksh93 has a lot of built-in commands that used to require running external programs.

Is there any documentation about this feature?

> So for any of various reasons, such an improvement, although suspicious, doesn't seem at all impossible to me. (It also leaves me thinking that perhaps with enough effort, further improvements might be possible.)

OK. We'll have a look.

Yves
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
[...]
>> Recent ksh93 has a lot of built-in commands that used to require running external programs.
>
> Is there any documentation about this feature?

The ksh93 man page includes some (but perhaps not all) information about builtin commands. The ksh93 built-in command

    builtin

will list all such commands that it knows about. Some are in the form of pathnames; they are described in the Korn Shell FAQ at http://kornshell.com/doc/faq.html as follows:

    Q12. When I type builtin, I notice that some of these are full pathnames. What does this mean?
    A12. Builtins that are not bound to pathnames are always searched for before doing a path search. Builtins that are bound to pathnames are only executed when the path search would bind to this pathname.

The FAQ also mentions that one can add one's own built-in commands (where the text below says "mail", I think it means "main"):

    Q4. How do I add built-in commands?
    A4. There are two ways to do this. One is write a shared library with functions whose names are b_NAME where NAME is the name of the builtin. The function b_NAME takes three argument. The first two are the same as a mail program. The third parameter is a pointer argument which will point to the current shell context. The second way is to write a shared library with a function named lib_init(). This function will be called with an argument of 0 after the library is loaded. This function can add built-ins with the sh_addbuiltin() API function. In both cases, the library is loaded into the shell with the builtin utility.

The folks working on ksh93 integration have submitted at least one ARC request recently to add additional built-in shell commands. So there's a pretty good chance that even without any rewrites, existing ksh93 scripts that use those will get faster once they become available. Look through arc-discuss in the forums or mail archives if you're curious. I don't think it will say what build they're targeting though, so no idea when that might become available.
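To illustrate why running fewer external programs can matter: each external command costs the shell a fork and an exec, while work done inside the shell itself does not. The sketch below uses only POSIX shell features (parameter expansion and arithmetic expansion), not the pathname-bound ksh93 builtins that the FAQ describes. The function names base_name and add and the example path are made up for illustration.

```shell
# base_name PATH: like basename(1), but without starting an external program.
base_name() {
    printf '%s\n' "${1##*/}"        # strip everything up to the last slash
}

# add A B: like `expr A + B`, using shell arithmetic instead of expr(1).
add() {
    printf '%s\n' "$(( $1 + $2 ))"
}

base_name /tmp/example/daily.dat    # prints daily.dat
add 40 2                            # prints 42
```

In a loop over hundreds of thousands of input files, replacing a per-file external call with an in-shell equivalent like these is the kind of change that could plausibly account for a large speed-up, although nothing in this thread says that is what the rewritten script actually does.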
[osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
Good Day all;

I'm new here. We have an urgent problem. A test server which is ready for demonstration (and purchase from Sun/Oracle) suffers from some kind of kernel problem: A 32bit process (ksh) started to consume more than 4G of memory and is still running but defeats some attempts to observe it.

Memory usage is 4.8G and rising, and we're troubled about what may cause the kernel to ignore the 32bit address limit:

    prstat 1 | head -3
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
      5608 remy2    4930M 4588M cpu9    58    0   0:24:57 5.0% ksh/1
       380 root     8896K 2392K sleep   59    0   0:00:58 0.1% automountd/4

Observing the process is not always possible; some tools fail like this one:

    pmap -x 5608
    5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
    5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
    5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
    5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
    5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
    5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
    pmap: cannot examine 5608: address space is changing

What should we do in this case? Reboot? How can we prevent this from happening again during the demonstration on Monday?

Yves
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
On Sun, Apr 25, 2010 at 1:39 AM, Yves Huang yves.huang.proje...@googlemail.com wrote:
> Good Day all;
>
> I'm new here. We have an urgent problem. A test server which is ready for demonstration (and purchase from Sun/Oracle) suffers from some kind of kernel problem: A 32bit process (ksh) started to consume more than 4G of memory and is still running but defeats some attempts to observe it.
>
> Memory usage is 4.8G and rising, and we're troubled about what may cause the kernel to ignore the 32bit address limit:
>
>     prstat 1 | head -3
>        PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>       5608 remy2    4930M 4588M cpu9    58    0   0:24:57 5.0% ksh/1
>        380 root     8896K 2392K sleep   59    0   0:00:58 0.1% automountd/4
>
> Observing the process is not always possible; some tools fail like this one:
>
>     pmap -x 5608
>     5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
>     5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
>     5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
>     5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
>     5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
>     5608: ksh /home/remy2/prod/test/opensolaris/transactions/daily423 /hom
>     pmap: cannot examine 5608: address space is changing
>
> What should we do in this case? Reboot? How can we prevent this from happening again during the demonstration on Monday?

HELP HELP

The memory usage is now 12.1G. We managed to monitor the output and it makes sense: the script is doing correct work, but the memory usage is still worrying.

Yves
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
(oops, got chopped in the forum the first time, due to punctuation it didn't like)

A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel is concerned, AFAIK won't be bigger than 2GB in terms of regular memory (although it could have a frame buffer or something mapped in the part of the address space reserved for I/O devices, making its total size perhaps appear larger than 2GB, but still definitely less than or equal to 4GB).

*actually, a 32-bit _address space_ can't be. On suitable hardware, a 32-bit kernel can use special instructions to address more than 4GB of RAM, and some other OSs allow even a user process to own more than one address space. I don't think any of that applies here though.

Assuming you're running OpenSolaris and not Solaris 10 or SXCE, ksh is actually ksh93, and was built both 32-bit and 64-bit. Programs like that are typically just a link to (or copy of) /usr/lib/isaexec, which looks in subdirectories (i86 or amd64 for x86, sparcv7 or sparcv9 for SPARC) of the $PATH directories to find a 64-bit or 32-bit version, and then execs the 64-bit version if on a 64-bit capable system, otherwise the 32-bit version.

Example:

    $ uname -a
    SunOS virtualbox-indiana 5.11 snv_108 i86pc i386 i86pc
    $ pargs -x $$|grep AT_SUN_EXECNAME
    AT_SUN_EXECNAME 0xfd7fffdfffdb /usr/bin/amd64/ksh93
    $ pflags $$
    789:    ksh
            data model = _LP64  flags = ORPHAN|MSACCT|MSFORK
     /1:    flags = ASLEEP  waitid(0x7,0x0,0xfd7fffdfebf0,0xf)
    $ file /usr/bin/ksh* /usr/bin/*/ksh*
    /usr/bin/ksh: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped
    /usr/bin/ksh93: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
    /usr/bin/amd64/ksh93: ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], dynamically linked, not stripped, no debugging information available
    /usr/bin/i86/ksh93: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available

(My /usr/bin/ksh and /usr/bin/ksh93 are not quite the same, possibly due to having separately put a ksh93 update on the system. But /usr/bin/ksh is still tiny, just a wrapper, and even if I specifically execute /usr/bin/ksh, what ends up running is still /usr/bin/amd64/ksh93. So don't let that confuse the issue.)

To get around the problem

    pmap: cannot examine 5608: address space is changing

and get a closer look, try stopping the process first:

    pstop 5608

and then running pmap or whatever to inspect it, and finally running

    prun 5608

to let it run again.

I suspect the shell script being run is the real problem; not too many well-written shell scripts should grow to such monster size.
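The stop, inspect, resume sequence suggested above can be wrapped in a small function so that the process is resumed even when pmap fails. A sketch only: inspect_stopped is a hypothetical name, and it assumes the Solaris proc tools pstop, pmap and prun are on the PATH.

```shell
# inspect_stopped PID: stop PID, run pmap -x on it, then always resume it.
# Returns pmap's exit status, or 1 if the process could not be stopped.
inspect_stopped() {
    pid=$1
    status=0
    pstop "$pid" || return 1        # could not stop it, so do not go on
    pmap -x "$pid" || status=$?     # inspect while the process is stopped
    prun "$pid"                     # let it run again, even if pmap failed
    return "$status"
}
```

For the process in this thread the call would be: inspect_stopped 5608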
Re: [osol-discuss] URGENT: Kernel lets 32bit process consume 4G memory?
On Sun, Apr 25, 2010 at 6:08 AM, Richard L. Hamilton rlha...@smart.net wrote:
> A 32-bit process _can't_* be bigger than 4GB, and as far as the kernel is concerned, AFAIK won't be bigger than 2GB in terms of regular memory (although it could have a frame buffer or something mapped in the part of the address space reserved for I/O devices, making its total size perhaps appear larger than 2GB, but still definitely <= 4GB).
>
> *actually, a 32-bit _address space_ can't be. On suitable hardware, a 32-bit kernel can use special instructions to address more than 4GB of RAM, and some other OSs allow even a user process to own more than one address space. I don't think any of that applies here though.

This is what we assumed. But none of the senior admins expected ksh to be a 64-bit shell, and we were quite worried about the out-of-control 32-bit process.

> Assuming you're running OpenSolaris and not Solaris 10 or SXCE, ksh is actually ksh93, and was built both 32-bit and 64-bit.

OK. This comes as a surprise, albeit a good one. We figured out that we made a mistake and passed the whole set of data to the script instead of the demo data, which is the difference between 500 files (demo set) and 725298 files (production set). We reckon that without a 64-bit shell the script would have crashed.

> To get around the problem
>
>     pmap: cannot examine 5608: address space is changing
>
> and get a closer look, try stopping the process first:
>
>     pstop 5608
>
> and then running pmap or whatever to inspect it, and finally running
>
>     prun 5608
>
> to let it run again.

Why is pmap not doing this?

> I suspect the shell script being run is the real problem; not too many well-written shell scripts should grow to such monster size.

We called upon the author to explain: he said the script caches a lot of data in memory (arrays) during execution, and the 19G peak memory usage matches the working set of the input data.

We're still verifying the output because the script finished in four hours, while the legacy perl version of the script used to run a whole weekend. This is suspicious and too good to be true.

Yves
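Whether a running process is 32-bit or 64-bit can be checked directly instead of assumed, using the pflags output shown elsewhere in this thread. A sketch, assuming pflags prints a "data model = ..." field as in that example (_LP64 for a 64-bit process; a 32-bit process should report _ILP32). data_model is a hypothetical helper name.

```shell
# data_model PID: print the data model (for example _LP64) that pflags
# reports for PID; prints nothing if no "data model" field is found.
data_model() {
    pflags "$1" | sed -n 's/.*data model = \(_[A-Z0-9]*\).*/\1/p'
}
```

For the process in this thread the call would be: data_model 5608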