Hi guys,

We are trying to find out the details, i.e. to locate the processes (jobs) that use so much memory.
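As a starting point for locating those processes, here is a minimal sketch of how a snapshot could be ranked by virtual size. It assumes a POSIX `ps` that supports `-o vsz` (reported in 1024-byte units); the function name `top_by_vsz` is made up for illustration, and this is not a tool supplied by jBASE or Temenos.

```shell
#!/bin/sh
# top_by_vsz N: read "PID VSZ COMMAND..." rows on stdin (no header line)
# and print the N rows with the largest VSZ, biggest first.
# N defaults to 10. The name is hypothetical, used only for this sketch.
top_by_vsz() {
    sort -k2,2nr | head -n "${1:-10}"
}

# Possible usage (each "-o name=" suppresses that column's header):
#   ps -e -o pid= -o vsz= -o args= | top_by_vsz 10
```

Virtual size is only a rough indicator, since it also counts shared libraries and memory that was reserved but never touched, so a high figure is a hint about where to look rather than proof of a leak.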
We did not apply "export MALLOCTYPE=buckets" to the .profile files as the HelpDesk suggested. This option is exported only in one of our test areas, which is rarely used for processing.

I read http://www.redbooks.ibm.com/redbooks/pdfs/sg247463.pdf, which says:

"The current default allocator is the Yorktown allocator. With respect to speed, the Yorktown allocator does not efficiently handle repeated small to medium size requests. This deficiency was previously addressed by adding the Malloc Buckets algorithm. Malloc Buckets, however, provides no way to consolidate freed memory into the heap. The new bucket allocator will allow freed memory to be reclaimed. Through the use of the new bucket allocator, the Watson Allocator handles small requests quickly and with comparatively little wasted memory. The Watson Allocator also performs quicker than the default allocator, and with less internal fragmentation than the old bucket allocator. The Watson Allocator can be configured in three distinct ways to try and identify which sections reveal the largest gains. It can be enabled with caching mechanisms (a per thread cache and an adaptive heap cache) and with the new bucket allocator. It can be enabled without the caching mechanisms and with the new bucket allocator. It can also be enabled without the caching mechanisms and without the new bucket allocator. The new bucket allocator and thread cache have been adapted to work with the Yorktown allocator."

It seems to me that we should try what Jim suggested: "export MALLOCTYPE=watson" and perhaps some MALLOCOPTIONS settings.

Let's leave that for a moment, because something interesting showed up on our LIVE area very early this morning.
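For the allocator trial itself, one cautious option is to set MALLOCTYPE for a single command instead of exporting it from a .profile, so that only the job under test is affected. The sketch below assumes nothing beyond the `MALLOCTYPE=watson` value mentioned above; the function name `run_with_malloctype` and the job name in the usage comment are placeholders of mine.

```shell
#!/bin/sh
# run_with_malloctype TYPE CMD [ARGS...]
# Run CMD with MALLOCTYPE set to TYPE for that command (and its children)
# only. The calling shell's environment is left unchanged.
# The function name is hypothetical, used only for this sketch.
run_with_malloctype() {
    malloctype=$1
    shift
    env MALLOCTYPE="$malloctype" "$@"
}

# Possible usage (some_test_job is a placeholder, not a real command):
#   run_with_malloctype watson some_test_job
```

Because the allocator is chosen when a process starts, a process that is already running would not pick up the change; only commands started through the wrapper would.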
The session that runs TSM produced the following output:

jsh techuser ~ -->START.TSM
START.TSM
Phantom process started on process id 2449420
[2449420] Done : tSA 1
jsh techuser ~ -->Process ID 1232928 , port 794 , hangup
Program source name CLEAR.TOKENS , line 26
Recursive debugger calls - program aborting
Process ID 5054700 , port 794 , hangup
Program source name CLEAR.TOKENS , line 27
Recursive debugger calls - program aborting
jBASE: Segmentation violation. Aborting
cp: ../bnk.data/int.data/DM.TEMP/DC.CARD.ISSUE.HIS.DM: No such file or directory
jBASE: Segmentation violation. Aborting
jBASE: Attempting to free NULL pointer at jediTransaction.c,1636(EB.TRANS.JBASE, 26)
jsh techuser ~ -->Process ID 7737374 , port 805 , hangup
Program source name F.READ , line 7

You can ignore the hangups, but I am worried about these jBASE errors (Segmentation violation / Attempting to free NULL pointer at jediTransaction.c,1636). This does not sound good to me. We do not know which processes threw these messages, but most likely they were COB agents. I do not know yet whether physical or swap memory ran out yesterday on PROD, but it is quite unlikely (the total memory of the LIVE system is 2-3 times bigger than on the test machines).

I would like to mention one fact from the past. During the "start of year" (2nd January) processing we faced one "little" problem:

a) one of the single-threaded jobs did a large transaction (over 900k changes) - we have already requested an improvement to this core EOY job
b) then later one of the batch sessions (agents) failed with a SUBROUTINE_CALL_FAIL error.

There was nothing wrong with our libraries: the called object was there, and the routine that failed was called successfully by other COB agents. Only one COB agent reported the SUBROUTINE_CALL_FAIL error, which seemed very strange. We raised that, and the CSHD conclusion was: "agent ran out of shared memory" (ulimit is unlimited on LIVE), so use "slibclean" periodically to reclaim memory.
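Since we do not know which processes were involved, one thing that might help next time is to keep periodic per-process memory samples and then see which PIDs grew the most. The sketch below only does the second half: it assumes a sample log with one line per observation in the form "TIMESTAMP PID VSZ_KB", oldest first. Both that log format and the name `vsz_growth` are my own assumptions, not an existing jBASE or AIX facility.

```shell
#!/bin/sh
# vsz_growth: read samples "TIMESTAMP PID VSZ_KB" on stdin (oldest first)
# and print "PID FIRST_KB LAST_KB GROWTH_KB" for each PID, with the
# largest growth first. The name and the log format are hypothetical.
vsz_growth() {
    awk '
        !($2 in first) { first[$2] = $3 }   # remember the first sample per PID
        { last[$2] = $3 }                   # keep overwriting with the latest
        END {
            for (pid in first)
                print pid, first[pid], last[pid], last[pid] - first[pid]
        }
    ' | sort -k4,4nr
}

# Possible usage, assuming samples were appended to a file by a scheduled job:
#   vsz_growth < /path/to/vsz_samples.log
```

An agent whose size jumps during a large transaction and never comes back down would show up near the top of this list, which is the pattern suspected for the 2nd January failure.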
I now think that the agent which failed on 2nd January had performed a large transaction in an earlier step, meaning it allocated a large "transaction buffer", and then finally got SUBROUTINE_CALL_FAIL on one of the following jobs (not immediately). That is why I suggested that the "transaction buffer" may not get downsized, or may leak memory. I also guess that it may not be a leak at all, but a fault of the default MALLOC allocator. I am not sure if Watson will help, but having read Jim's emails we will give it a try.

We need to do more analysis and testing, so I will come back with a conclusion. We have a red alert now and are trying to understand what is going on with memory utilisation :)

Kind regards
Pawel

On 5-02-2009 at 0:31, Mike Preece wrote:
> What is the CoB job that fails?
>
> Is it a report job that has gone wrong somehow, generating
> ridiculously large transactions?
>
> Is it localdev or core?
>
> If it is a report, should it run in CoB or can it be set to run on an
> "A"dhoc basis instead?
>
> On Feb 4, 4:16 pm, Jim Idle <[email protected]> wrote:
> > Pawel (privately) wrote:
> > > Hi,
> > >
> > > I know that it sounds unbelievable for most of you, but I would like to
> > > share with problems that we started to face on our test servers about
> > > 2-3 days ago.
> > >
> > > We started to run out of memory errors on our test servers, eg.:
> > > jsh t24fe ~ -->jdiag
> > > ** Warning [ PERFORM_ERROR ] **
> > > Unix error number 0 while attempting PERFORM , Line 111 , Source jsh.b
> > > Trap from an error message, error message name = PERFORM_ERROR
> > > Line 111 , Source jsh.b
> > > jBASE debugger->q
> > > Are you sure ?y
> > > .
> >
> > I meant to add that if valgrind can be compiled on your system, then you
> > could run it against a jBASE program that exited normally. You will see
> > that some allocations are left to the system to reclaim, but nothing
> > from the application.
> > However valgrind does have a useful heap
> > examination tool that can identify where memory is being held out on the
> > stack. I have only ever used it in Linux however.
> >
> > There are other tools for AIX, but I think that most of them would
> > require source code. What does Temenos help desk say by the way? They
> > are only going to have to ask the jBASE guys, but we have been through
> > this sort of thing lots of times.
> >
> > Jim
