Hi guys,

We are trying to find out the details, i.e. to locate the processes 
(jobs) that use so much memory.
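
A rough sketch of how the biggest processes could be listed with plain 
ps, assuming the local ps accepts the POSIX output fields pid, vsz and 
args (rank_by_vsz is a made-up helper name, and VSZ is only an 
approximation of what a job really holds):

```shell
# rank_by_vsz (hypothetical helper): read "pid vsz args" lines on stdin
# and print the N lines with the largest vsz (second column), biggest first.
rank_by_vsz() {
    n="${1:-10}"
    sort -k2,2nr | awk -v n="$n" 'NR <= n'
}

# Feed it the process table; "name=" suppresses the header for that column.
ps -e -o pid= -o vsz= -o args= | rank_by_vsz 10
```

Running this a few times during COB should show whether one agent keeps 
growing or whether all of them are large.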

Although the HelpDesk suggested it, we have not added "export 
MALLOCTYPE=buckets" to the .profile files. The option is exported only 
in one of our test areas, where processing is rarely run.

I read http://www.redbooks.ibm.com/redbooks/pdfs/sg247463.pdf, which says:
"The current default allocator is the Yorktown allocator. With respect 
to speed, the Yorktown allocator does not efficiently handle repeated 
small to medium size requests. This deficiency was previously addressed 
by adding the Malloc Buckets algorithm. Malloc Buckets, however, 
provides no way to consolidate freed memory into the heap. The new 
bucket allocator will allow freed memory to be reclaimed. Through the 
use of the new bucket allocator, the Watson Allocator handles small 
requests quickly and with comparatively little wasted memory. The Watson 
Allocator also performs quicker than the default allocator, and with 
less internal fragmentation than the old bucket allocator.

The Watson Allocator can be configured in three distinct ways to try and 
identify which sections reveal the largest gains. It can be enabled with 
caching mechanisms (a per thread cache and an adaptive heap cache) and 
with the new bucket allocator. It can be enabled without the caching 
mechanisms and with the new bucket allocator. It can also be enabled 
without the caching mechanisms and without the new bucket allocator. The 
new bucket allocator and thread cache have been adapted to work with the 
Yorktown allocator."

It seems to me that we should try what Jim suggested: "export 
MALLOCTYPE=watson" and perhaps some MALLOCOPTIONS settings.
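
For a first trial the variable does not have to go into .profile at all. 
A minimal sketch, assuming the job can be started from a shell and that 
the allocator is chosen when the process starts (run_with_watson is a 
made-up helper name, and the echo only stands in for the real job):

```shell
# run_with_watson (hypothetical helper): start one command with
# MALLOCTYPE=watson in its environment only; the calling shell and
# every other session keep their current allocator setting.
run_with_watson() {
    env MALLOCTYPE=watson "$@"
}

# Stand-in for the real job: show what the child process sees.
run_with_watson sh -c 'echo "child sees MALLOCTYPE=$MALLOCTYPE"'
```

This keeps the change limited to the one process under test, so it can 
be compared against an agent running with the default allocator.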

Let's leave that for a moment, because something interesting showed up 
in our LIVE area very early this morning. The session that runs TSM 
produced the following output:
jsh techuser ~ -->START.TSM
START.TSM
Phantom process started on process id 2449420
 [2449420] Done : tSA 1
jsh techuser ~ -->Process ID 1232928 , port 794 , hangup
  Program source name CLEAR.TOKENS , line 26
Recursive debugger calls - program aborting
Process ID 5054700 , port 794 , hangup
  Program source name CLEAR.TOKENS , line 27
Recursive debugger calls - program aborting
jBASE: Segmentation violation. Aborting
cp: ../bnk.data/int.data/DM.TEMP/DC.CARD.ISSUE.HIS.DM: No such file or directory
jBASE: Segmentation violation. Aborting
jBASE: Attempting to free NULL pointer at jediTransaction.c,1636(EB.TRANS.JBASE,26)
jsh techuser ~ -->Process ID 7737374 , port 805 , hangup
  Program source name F.READ , line 7

You can ignore the hangups, but I am worried about these jBASE errors 
(Segmentation violation / Attempting to free NULL pointer at 
jediTransaction.c,1636). This does not sound good to me. We do not know 
which processes threw these messages, but they were most likely COB agents.

I do not know yet whether physical / swap memory ran out yesterday on 
PROD, but it is quite unlikely (the total memory of the LIVE system is 
2-3 times bigger than on the test machines).

I would like to mention one fact from the past.

During "start of year" (2nd January) processing we faced one "little" 
problem:
a) one of the single-threaded jobs did a large transaction (over 900k 
changes) - we have already requested an improvement to this core EOY job
b) later, one of the batch sessions (agents) failed with a 
SUBROUTINE_CALL_FAIL error. There was nothing wrong with our libraries: 
the called object was there, and the routine that failed was called 
successfully by other COB agents. Only one COB agent reported the 
SUBROUTINE_CALL_FAIL error, which seemed very strange. We raised it, and 
the CSHD conclusion was: "agent run out of shared memory" (ulimit is 
unlimited on LIVE), so use "slibclean" periodically to reclaim memory.
I now think that the agent which failed on 2nd January had performed a 
large transaction in one of its previous steps, which means it allocated 
a large "transaction buffer", and it finally got SUBROUTINE_CALL_FAIL in 
one of the following jobs (not immediately).
That is why I suggested that the "transaction buffer" may not get 
downsized or may leak memory. I also guess that it may not be a leak, 
but a fault of the default MALLOC allocator. I am not sure whether 
Watson will help, but having read Jim's emails we will give it a try.

We need to do more analysis and testing, so I will come back with our 
conclusions. We are on red alert now and are trying to understand what 
is going on with memory utilisation :)

Kind regards
Pawel

On 5-02-2009 at 0:31, Mike Preece wrote:
> What is the CoB job that fails?
> 
> Is it a report job that has gone wrong somehow, generating
> ridiculously large transactions?
> 
> Is it localdev or core?
> 
> If it is a report, should it run in CoB or can it be set to run on an
> "A"dhoc basis instead?
> 
> On Feb 4, 4:16 pm, Jim Idle <[email protected]> wrote:
> > Pawel (privately) wrote:
> > > Hi,
> >
> > > I know that it sounds unbeliveable for most of you, but I would like to
> > > share with problems that we started to face on our test servers about
> > > 2-3 days ago.
> >
> > > We started to run out of memory errors on our test servers, eg.:
> > > jsh t24fe ~ -->jdiag
> > >  ** Warning [ PERFORM_ERROR ] **
> > > Unix error number 0 while attempting PERFORM , Line   111 , Source jsh.b
> > > Trap from an error message, error message name = PERFORM_ERROR
> > > Line 111 , Source jsh.b
> > > jBASE debugger->q
> > > Are you sure ?y
> > > .
> >
> > I meant to add that if valgrind can be compiled on your system, then you
> > could run it against a jBASE program that exited normally. You will see
> > that some allocations are left to the system to reclaim, but nothing
> > from the application. However valgrind does have a useful heap
> > examination tool that can identify where memory is being held out on the
> > stack. I have only ever used it in Linux however.
> >
> > There are other tools for AIX, but I think that most of them would
> > require source code. What does Temenos help desk say by the way? They
> > are only going to have to ask the jBASE guys, but we have been through
> > this sort of thing lots of times.
> >
> > Jim
