Hi Pam,

During incremental backup, the peak memory utilization usually occurs when
the client is processing the directory with the largest number of files. In
this case, unless your machine has only 8 MB of RAM ;-) I do not see how
~15K objects could cause memory to be exhausted. It doesn't pass the "sniff
test".
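
As a rough back-of-the-envelope check on that claim, here is a minimal
Python sketch. The figure of about 300 bytes of client memory per object is
an assumed ballpark for illustration only; it is not a number given in this
thread, and the function name is hypothetical. The 15551 count is the largest
directory reported by the SELECT output quoted below.

```python
# Rough estimate of the memory needed to hold one directory's object list
# during an incremental backup. This is a sketch, not a client internals model.

# ASSUMPTION: ~300 bytes per object is an illustrative ballpark only.
ASSUMED_BYTES_PER_OBJECT = 300


def estimate_scan_memory_mb(object_count, bytes_per_object=ASSUMED_BYTES_PER_OBJECT):
    """Return the estimated memory in MiB for object_count objects."""
    return object_count * bytes_per_object / (1024 * 1024)


cases = [
    ("largest directory in the SELECT output", 15551),
    ("hypothetical directory with 1 million files", 1_000_000),
]

for label, count in cases:
    print(f"{label}: {count} objects, about {estimate_scan_memory_mb(count):.1f} MiB")
```

Under that assumption, the largest directory needs only a few MiB, while a
directory with a million files would need a few hundred MiB, which is the
scale at which data ulimits start to matter.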

Regarding the 6.4 error where you see the return code 11: This most likely
corresponds to errno EAGAIN, which means there were insufficient system
resources to create a new thread. This is not an insufficient memory issue;
some other system resource is exhausted. A shot in the dark, but... by any
chance is the AIX system configured to use 64 KB page sizes? I ask because
of this AIX APAR which *might* be a match:

http://www.ibm.com/support/docview.wss?uid=isg1IZ27457

(See the Comments section of the APAR to match the actual 6.1 maintenance
level.)

Best regards,

Andy

____________________________________________________________________________

Andrew Raibeck | IBM Spectrum Protect Level 3 | stor...@us.ibm.com

IBM Tivoli Storage Manager links:
Product support:
https://www.ibm.com/support/entry/portal/product/tivoli/tivoli_storage_manager

Online documentation:
http://www.ibm.com/support/knowledgecenter/SSGSG7/landing/welcome_ssgsg7.html

Product Wiki:
https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager

"ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote on 2016-04-27 15:56:42:

> From: "Pagnotta, Pamela (CONTR)" <pamela.pagno...@hq.doe.gov>
> To: ADSM-L@VM.MARIST.EDU
> Date: 2016-04-27 15:59
> Subject: Re: TSM Client upgrade on AIX
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
>
> Andy,
>
> Here are the top few entries from the select statement. From what I
> can tell, none of the filesystems have more than 200K objects on this client.
>
> FILESPACE_NAME: /gridfs
> HL_NAME: /oraem/Oracle/admin/emrep/adump/
> TOTAL_OBJECTS: 15551
>
> FILESPACE_NAME: /gridfs
> HL_NAME: /oraem/Oracle/middleware/oms/sysman/archives/emgc/
> deployments/GCDomain/emgc.ear/em.war/cabo/jsLibs/resources/
> TOTAL_OBJECTS: 2473
>
> FILESPACE_NAME: /gridfs
> HL_NAME: /oraem/Oracle/middleware/logs/
> TOTAL_OBJECTS: 2344
>
> Since moving back to version 6.4.2.0 there is a new message
>
> 04/27/16 02:33:22 ANS0361I DIAG: Thread creation failed; rc=11.
> 04/27/16 02:33:24 ANS1999E Incremental processing of '/usr' stopped.
>
> The only Technote I can find that is close to this message and rc is
> TSM server related on a Linux system. Is there anywhere where these
> diagnostic return codes are defined for TSM administrators?
>
> I have moved this backup to a quieter time of the night to see if
> that helps at all.
>
> I will open a ticket for this new error tomorrow.
>
> Thank you,
>
> Pam Pagnotta
> Sr. System Engineer
> Criterion Systems, Inc./ActioNet
> Contractor to US. Department of Energy
> Office of the CIO/IM-622
> Office: 301-903-5508
> Mobile: 301-335-8177
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On
> Behalf Of Andrew Raibeck
> Sent: Wednesday, April 27, 2016 2:19 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] TSM Client upgrade on AIX
>
> Hi Pam,
>
> Do any of the file systems happen to have directories that contain large
> numbers of files (say, more than a million)? You could try running this
> SELECT statement from an administrative command line client to assess this.
> Make sure to replace the node name RAIBECK with your node name (in all
> upper case):
>
> select filespace_name, hl_name, count(*) as total_objects from backups
> where node_name='RAIBECK' and state='ACTIVE_VERSION' group by
> filespace_name, hl_name order by 3 desc
>
> You can cancel the output after the first few lines. What you are looking
> for is the top of the list, which would show you which file system and
> directory has the largest number of files. How many objects are there?
>
> If there are millions of files involved, it could be that memory is being
> exhausted (how much memory is available on this system?); though I would
> normally expect a proper "out of memory" message, rather than the more
> cryptic message you are seeing.
> In the past, I have heard customers say
> that this occurs after upgrading the client, but what really happened was
> that the number of files in the directory was growing continuously, and
> eventually the backup could not allocate enough memory; and the upgrade
> just happened to roughly coincide with the onset of the issue. I cannot say
> whether this is possible in your situation, but I am just sharing some of
> my past experiences with this issue.
>
> From which client version and bit-architecture did you upgrade to 7.1? I
> see you put 6.4 on as a "workaround", but what was the original version?
> Earlier client versions did see an increase in memory usage when the
> clients were changed from 32-bit to 64-bit, as 64-bit software tends to use
> more memory (pointer variables are 8 bytes rather than 4 bytes, and that is
> one chief contributor). But no such change occurred from 6.4 to 7.1, so why
> memory would be exhausted in 7.1 but not 6.4, I have no immediate idea.
>
> If the affected machine does not really have any directories with huge
> numbers of files, then this could be something else... I would invite you
> to reopen your PMR, let me know, and I will have it escalated to our Level
> 2 support for further investigation. As I mentioned earlier, the cryptic
> calloc() error does not seem right.
>
> Best regards,
>
> Andy
>
> ____________________________________________________________________________
>
> Andrew Raibeck | IBM Spectrum Protect Level 3 | stor...@us.ibm.com
>
> IBM Tivoli Storage Manager links:
> Product support:
> https://www.ibm.com/support/entry/portal/product/tivoli/tivoli_storage_manager
>
> Online documentation:
> http://www.ibm.com/support/knowledgecenter/SSGSG7/landing/welcome_ssgsg7.html
>
> Product Wiki:
> https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager
>
> "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote on 2016-04-27 13:15:50:
>
> > From: "Pagnotta, Pamela (CONTR)" <pamela.pagno...@hq.doe.gov>
> > To: ADSM-L@VM.MARIST.EDU
> > Date: 2016-04-27 13:17
> > Subject: Re: Re: TSM Client upgrade on AIX
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> >
> > Hi Dave,
> >
> > Thank you for the information. I, of course, could not get the
> > person assigned to my ticket to even acknowledge that this might be
> > due to some different memory requirements for the newer TSM clients.
> > The only response I received was that we just must not have enough
> > memory on our system to do the backup despite being told that there
> > was no issue with an older client.
> >
> > Pam
> >
> > Pam Pagnotta
> > Sr. System Engineer
> > Criterion Systems, Inc./ActioNet
> > Contractor to US. Department of Energy
> > Office of the CIO/IM-622
> > Office: 301-903-5508
> > Mobile: 301-335-8177
> >
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On
> > Behalf Of David Bronder
> > Sent: Wednesday, April 27, 2016 12:56 PM
> > To: ADSM-L@VM.MARIST.EDU
> > Subject: Re: [ADSM-L] Re: TSM Client upgrade on AIX
> >
> > This isn't really helpful for your specific situation, Pam (I don't think
> > I've had the specific errors you've seen).
> > But I have noticed (with much
> > dismay) that the 7.x clients for AIX have required significantly more memory
> > than earlier versions. I have clients with 1+ million files in a filesystem
> > that had no problems with 6.x and earlier clients, but consistently required
> > huge data ulimits after upgrading to 7.x (and would fail, often completely
> > silently, if the ulimit wasn't high enough).
> >
> > I don't know what IBM did with the 7.x clients to make them so memory-greedy
> > compared to earlier versions. Maybe the client-side dedupe support or
> > something, though I'm not using those newer features currently, so I would
> > hope that wouldn't be a factor. Then again, I would hope IBM would realize
> > that setting the data ulimit to unlimited isn't really a best practice and
> > that having successful backups shouldn't require risking breaking services on
> > the systems those backups are protecting. (</soapbox>)
> >
> > So far, I've gotten by with a non-unlimited ulimit, but it seems like I do
> > have to keep raising it with each new 7.x client release...
> >
> > =Dave
> >
> >
> > On 04/27/2016 09:09 AM, Pagnotta, Pamela (CONTR) wrote:
> > > Hi Matthew,
> > >
> > > Yes, the root user ulimits are set to unlimited on all the AIX servers.
> > >
> > > Regards,
> > >
> > > Pam Pagnotta
> > > Sr. System Engineer
> > > Criterion Systems, Inc./ActioNet
> > > Contractor to US. Department of Energy
> > > Office of the CIO/IM-622
> > > Office: 301-903-5508
> > > Mobile: 301-335-8177
> > >
> > > From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Matthew McGeary
> > > Sent: Wednesday, April 27, 2016 9:56 AM
> > > To: ADSM-L@VM.MARIST.EDU
> > > Subject: Re: [ADSM-L] TSM Client upgrade on AIX
> > >
> > > Good morning Pam,
> > >
> > > We encountered errors backing up filesystems with large numbers of
> > > files until we set the root user ulimits to unlimited.
> > > That fixed the problem but can have other consequences, obviously.
> > > Do you know if your AIX admin tried changing the ulimits?
> > >
> > > Regards,
> > > __________________________
> > >
> > > Matthew McGeary
> > > Senior Technical Specialist - Infrastructure
> > > PotashCorp
> > > T: (306) 933-8921
> > > www.potashcorp.com
> > >
> > > From: "Pagnotta, Pamela (CONTR)" <pamela.pagno...@hq.doe.gov>
> > > To: ADSM-L@VM.MARIST.EDU
> > > Date: 04/27/2016 07:47 AM
> > > Subject: [ADSM-L] TSM Client upgrade on AIX
> > > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> > >
> > > ________________________________
> > >
> > > Hello,
> > >
> > > Recently one of our AIX administrators upgraded the TSM client to
> > > 7.1.4.4 on her servers. Many of them started receiving errors like
> > >
> > > calloc() failed: Size 31496 File ../mem/mempool.cpp Line 1092
> > >
> > > I looked this up and the indication is that the AIX server could
> > > not supply enough memory to TSM to complete the backup. We opened a
> > > ticket and were told to try memoryefficientbackup with
> > > diskcachemethod. This did not fix the issue.
> > >
> > > In frustration the administrator reinstalled a TSM client version
> > > of 6.4.2.0 and is no longer experiencing the memory problems.
> > >
> > > Any thoughts?
> > >
> > > Thank you,
> > >
> > > Pam
> > >
> > > Pam Pagnotta
> > > Sr. System Engineer
> > > Criterion Systems, Inc./ActioNet
> > > Contractor to US. Department of Energy
> > > Office of the CIO/IM-622
> > > Office: 301-903-5508
> > > Mobile: 301-335-8177
> > >
> >
> > --
> > Hello World.                                David Bronder - Systems Architect
> > Segmentation Fault                          ITS-EI, Univ. of Iowa
> > Core dumped, disk trashed, quota filled, soda warm.   david-bron...@uiowa.edu
> >