Pawel (privately) wrote:
Pawel,

> Hi, I know that it sounds unbelievable to most of you, but I would like to share the problems that we started to face on our test servers about 2-3 days ago.
>
> We started to get out-of-memory errors on our test servers, e.g.:
>
> jsh t24fe ~ -->jdiag
> ** Warning [ PERFORM_ERROR ] **
> Unix error number 0 while attempting PERFORM , Line 111 , Source jsh.b
> Trap from an error message, error message name = PERFORM_ERROR
> Line 111 , Source jsh.b
> jBASE debugger->q
> Are you sure ?y

Have you set the environment variable for AIX that causes memory allocation to be from the top of the HEAP down? It changes the allocation algorithm for AIX and is a huge improvement over the standard one. In fact, I suspect that, since the engineers produced this new algorithm for allocation patterns like your batch job, the standard one no longer tries to deal with them, though the literature does not indicate this:

Improvements to malloc subsystem

The number of malloc-related environment variables supported by AIX 5L Version 5.3 has been reduced to three and the attributes they can assume have been redefined. These environment variables are: MALLOCTYPE, MALLOCOPTIONS, and MALLOCDEBUG. MALLOCOPTIONS is a new environment variable that has been added to take care of all current and future options to the MALLOCTYPE allocators. It supplants the MALLOCBUCKETS, MALLOCMULTIHEAP, and MALLOCDISCLAIM that have been deprecated.

The three environment variables have the following definitions and some of the attributes they can assume:
The following enhancements have also been incorporated: ... etc.

I think that you want:

export MALLOCTYPE=watson

in your .profile. However, there are other options to tune that, including a malloc cache and so on, so read up on it. It is safe to go right ahead and try that simple export, but don't give up on it if it seems not to improve things right away.

Also check your local settings here. A lot of malloc algorithms pre-allocate swap space in case they NEED to swap. Hence you can run out of swap even though you have not actually used any. For this, you either allocate a huge amount of swap (disk is cheap) or find the tuning options to tell the system to run in optimistic mode for swap, which means it doesn't pre-allocate the swap and you are on your own if you run out of it.

> We have several instances of some banking product on each test server. We run batch processing (COB) on these test servers, usually 2-3 environments at the same time on a single AIX "test" server. What is specific to our setup is that we run these COBs using multiple agents, say 20 processes, giving 60 agents in total (60 jBASE processes running at the same time on one test machine). Today I was informed about the problem for the first time and we checked memory allocation. It showed that we simply run out of memory (both physical and swap run out).

You have to be a bit careful here, as you need to distinguish between memory that is shared by all processes (for mmap() of files and libraries) and real data memory consumed by your application. top seems to be reasonably good at distinguishing, though.

> We have also found out that some of the jBASE (COB) processes are consuming large amounts of memory, for example:
>
> 16 eoyrx08 61482 633 (521) 1025 26K 9.38M 395K 6541 1212M 35m 2 SLEEP tSA 8 (BATCH.JOB.CONTROL,593)
> 26 t24ferx 503830 786 (776) 228 266K 12.2M 1.33M 1814 1321M 34m 2 SLEEP tSA 7 (BATCH.JOB.CONTROL,323)
> 28 t24ferx 430118 808 (799) 200 240K 17.6M 1.28M 1813 1200M 45m 2 tSA 6 (BATCH.JOB.CONTROL,322)

However, what usually happens here is poor application programming. You can of course only inspect your local code, but you should look for subroutines that do things like logging but never free up the variables they are using to accumulate log records, things like that.
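One way to follow up on the points above is to log per-process virtual size at intervals during the COB and see whether a job keeps growing or levels off. A minimal sketch, assuming a POSIX `ps` (only the standard `pid`, `vsz` and `comm` output fields are used). The function names and the log path in the comment are made up for illustration, and `vsz` includes shared mappings, so treat it as a rough indicator rather than a measure of private data memory:

```shell
#!/bin/sh
# Keep the N largest lines of "pid vsz command" input,
# sorted by vsz (1024-byte units) in descending order.
top_by_vsz() {
    sort -k2,2nr | head -n "${1:-5}"
}

# One timestamped snapshot of the biggest processes.
# Uses POSIX ps output fields only; headers are suppressed with "=".
snapshot() {
    date
    ps -e -o pid= -o vsz= -o comm= | top_by_vsz "${1:-5}"
}

# Example (not run here): sample once a minute while the batch runs.
# /tmp/vsz.log is a hypothetical path.
# while :; do snapshot 10 >> /tmp/vsz.log; sleep 60; done
```

A process whose size climbs and then stays flat is probably just holding a buffer it reuses; one that climbs for the whole run is the one worth inspecting.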
As the only person to have written an MQ interface was me, I have supreme confidence in your MQ links ;-)

> I know that somebody may suggest that our local C/C++ code is causing memory leaks. Please believe me that we do not run any C/C++ code during batch processing. We have only one (MQ) library written in C and interfaced (DEFC) to jBASE. It was written by an external vendor and has been LIVE for 5 years. It was thoroughly tested against memory leaks a few years ago. You have to believe me, this library does not run during COB. It is used only by some online processes.

I see how you got here, but I would be extremely surprised :-) In 19 years, this has rarely been the case, though Greg and I had to fly all over the world to show that it wasn't more times than I can remember ;-). That doesn't mean there isn't a fault there, but there almost always isn't.

> Therefore I claim that something must be wrong with jBASE.

Nope, the system wouldn't run for more than 5 minutes if that were the case.

> My guess is that jBASE does not free the "transaction buffer" (does not downsize it once the transaction is finished).

Well, the best answer to that is of course to fix the things, as it is a stupid design [well, it isn't a design ;-)], but they are probably not your programs. However, while the internal buffer might grow, even if it were never shrunk, it would be reused, not lost and re-allocated, and would soon reach the maximum that your application needed and stay there. Hence, if this were the issue, it would be because your application was just growing the transaction forever.

> There are some (single-threaded) jobs during our COB that create huge transactions (e.g. 900K changes in one transaction). It seems to me that the "changes" buffer is never downsized or this memory simply "leaks" somehow.

However, I suspect that the buffer allocation for such large transactions might be the root of your problem and that if you change the malloc algorithm to Watson, it will have a much easier time of it.
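The Watson suggestion amounts to one line in the profile of the user that starts the COB agents. A minimal sketch, assuming a POSIX shell profile; only `MALLOCTYPE=watson` comes from this thread, no `MALLOCOPTIONS` values are shown because none were given, and the `echo` is just a convenience check:

```shell
# Candidate line for the .profile of the user that starts the batch agents.
# The value is the one suggested in this thread.
export MALLOCTYPE=watson

# Show what newly started processes will inherit from this shell.
echo "MALLOCTYPE=${MALLOCTYPE:-unset}"
```

Environment variables are inherited when a process starts, so agents that are already running keep their current allocator until they are restarted from a shell that has the new setting.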
Yes - it is usually the allocation algorithm you are using and rogue applications :-)

> Does anyone face(d) similar problems? jBASE version 4.1.5.17 (Major 4.1 , Minor 5.17 , Patch 5690 (Change 52756)), AIX 5.3.0.0-06.

Jim
