Was thread logging enabled when the server was started? The init should look like this:

<THRD> /* Sun Feb 22 2009 02:36:07.5150 */ Thread Trace Log -- ON (AR Server 7.1.00 Patch 002 200802011900)
<THRD> /* Sun Feb 22 2009 02:36:15.4060 */ Thread Id 3076 (thread number 0) Thread Manager started.
<THRD> /* Sun Feb 22 2009 02:36:15.4060 */ Thread Id 3080 (thread number 1) timed call thread started.
<THRD> /* Sun Feb 22 2009 02:36:15.4060 */ Thread Id 3084 (thread number 2) on ADMIN queue started.
<THRD> /* Sun Feb 22 2009 02:36:19.5150 */ InitServerCache Begin
<THRD> /* Sun Feb 22 2009 02:43:20.3880 */ InitServerCache End: rpcCallProc=0 tid=3084

And re-caches look like this:

<THRD> /* Fri Feb 20 2009 13:17:16.3370 */ CopyCache Begin: rpcCallProc=10002 user="Remedy Application Service" tid=2808 rpcId=0
<THRD> /* Fri Feb 20 2009 13:19:27.7490 */ CopyCache End
<THRD> /* Fri Feb 20 2009 13:22:38.8550 */ FreeServerCache: rpcCallProc=5 user="blah" tid=5776 rpcId=1761714

Can you verify that the server completes an InitServerCache before performing a CopyCache?

Tony Worthington
Sr. Technical Analyst
Kohl's Department Stores
N56 W17000 Ridgewood Drive
Menomonee Falls, WI 53051
262.703.5911 (phone)
tony.worthing...@kohls.com
www.Kohls.com

From: Anthony K R <anthony_rathna...@dell.com>
To: arslist@ARSLIST.ORG
Date: 02/25/2009 10:40 AM
Subject: Re: ARS 7.1 server group issue
Sent by: "Action Request System discussion list(ARSList)" <arslist@ARSLIST.ORG>

Why is it doing "InitServerCache" instead of "CopyCache"?

-Anthony

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Walters, Mark
Sent: Wednesday, February 25, 2009 10:02 PM
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

OK, that's just a failure of the admin thread then. Another plus of Windows is that we seem to be able to handle individual thread failures more gracefully than on Unix. In this case the admin thread is getting a malloc error, dying, and restarting to try again.
Mark

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Anthony K R
Sent: 25 February 2009 15:36
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

Mark,

No entry seen in armonitor.log, but the arerror.log says:

Wed Feb 25 05:32:28 2009 390600 : Malloc failed on server (ARERR 300)
Wed Feb 25 05:32:28 2009 390600 : AR System server terminated -- fatal error encountered (ARNOTE 21)

Thanks,
Anthony

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Walters, Mark
Sent: Wednesday, February 25, 2009 5:17 PM
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

This looks like just the admin thread dying and not the arserver crashing? What do the arerror.log and armonitor.log show at these times?

Mark

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Anthony K R
Sent: 25 February 2009 11:38
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

Here are the entries from the thread log:

<THRD> /* Wed Feb 25 2009 05:16:37.8330 */ Thread Trace Log -- ON (AR Server 7.1.00 Patch 005 200809150630)
<THRD> /* Wed Feb 25 2009 05:22:15.5140 */ InitServerCache Begin
<THRD> /* Wed Feb 25 2009 05:22:29.0760 */ FreeServerCache: rpcCallProc=10004 user="Remedy Application Service" tid=3076 rpcId=390600
<THRD> /* Wed Feb 25 2009 05:22:29.3260 */ Thread Id 3076 (thread number 1) on ADMIN queue died.
<THRD> /* Wed Feb 25 2009 05:22:29.3260 */ Thread Id 4600 (thread number 1) on ADMIN queue restarted.
<THRD> /* Wed Feb 25 2009 05:32:15.5020 */ InitServerCache Begin
<THRD> /* Wed Feb 25 2009 05:32:28.7990 */ FreeServerCache: rpcCallProc=10004 user="Remedy Application Service" tid=4600 rpcId=390600
<THRD> /* Wed Feb 25 2009 05:32:29.0490 */ Thread Id 4600 (thread number 1) on ADMIN queue died.
<THRD> /* Wed Feb 25 2009 05:32:29.0490 */ Thread Id 5916 (thread number 1) on ADMIN queue restarted.
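Tony's question at the top of the thread (does the server finish an InitServerCache before it starts a CopyCache?) can be checked mechanically against an excerpt like the one above. The following is a minimal Python sketch, not a BMC tool: it assumes the thread log is plain text with one <THRD> entry per line and matches only the marker strings quoted in this thread. The names check_cache_load and CacheLoadReport are hypothetical.

```python
import re
import sys
from dataclasses import dataclass, field
from typing import Iterable, List

# Marker strings copied from the thread-log excerpts quoted in this thread.
INIT_BEGIN = "InitServerCache Begin"
INIT_END = "InitServerCache End"
COPY_BEGIN = "CopyCache Begin"
ADMIN_DIED = re.compile(r"Thread Id (\d+) \(thread number \d+\) on ADMIN queue died")


@dataclass
class CacheLoadReport:
    init_completed: bool = False        # an "InitServerCache End" line was seen
    copy_before_init_end: bool = False  # a "CopyCache Begin" came before any End
    # Thread ids of ADMIN queue threads that died while an InitServerCache was open
    admin_deaths_during_init: List[int] = field(default_factory=list)


def check_cache_load(lines: Iterable[str]) -> CacheLoadReport:
    """Walk thread-log lines in file order and check the cache-load sequence."""
    report = CacheLoadReport()
    init_open = False
    for line in lines:
        if INIT_BEGIN in line:
            init_open = True
        elif INIT_END in line:
            init_open = False
            report.init_completed = True
        elif COPY_BEGIN in line:
            if not report.init_completed:
                report.copy_before_init_end = True
        else:
            died = ADMIN_DIED.search(line)
            if died and init_open:
                # Treat this load attempt as abandoned; in the excerpt above,
                # the restarted thread logs a fresh "InitServerCache Begin".
                report.admin_deaths_during_init.append(int(died.group(1)))
                init_open = False
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_cache_load.py <path to thread log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(check_cache_load(log))
```

Run against Anthony's nine lines above, this reports that no InitServerCache completed and that ADMIN threads 3076 and 4600 died mid-load, which is consistent with the malloc failure he found in arerror.log.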
Regards,
Anthony

From: Rathnappa, Anthony
Sent: Wednesday, February 25, 2009 4:57 PM
To: arslist@ARSLIST.ORG
Subject: RE: ARS 7.1 server group issue

I have verified that the boot.ini file has the /3GB switch. Also, using the "dumpbin" tool, I confirmed that arserver can address more than 2GB. After startup the memory consumed is ~1.3GB, as shown in Task Manager. This is still a pre-prod environment, so there are no users.

In the Dev environment I had used the "CopyCache Begin" flag, where the log showed only "CopyCache Begin:" but no "CopyCache End".

I will enable both flags and update you.

Thanks,
Anthony

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Walters, Mark
Sent: Wednesday, February 25, 2009 1:46 PM
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

By default the maximum memory arserver can access on 32-bit Windows is 2GB. If it tries to grow beyond this, it will fail. This is an OS limitation that can be raised to 3GB by adding the /3GB switch to the appropriate line in the boot.ini file. See http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx and many of the other pages returned by a Google search for "windows 3gb boot.ini". The arserver is compiled with the large address aware flag, which enables it to make use of the additional 1GB of RAM provided by this switch.

However, I'd be interested to understand why your arserver process is getting so large that it is reaching the 2GB limit. How much memory does arserver.exe consume after startup, at the point that users can log in? How many concurrent users are there?

The initial size of the process is largely determined by the number of forms and the amount of workflow on the system, as these are all read into the server to create the cache. If you have a full ITSM system with multiple language packs, the initial size could be in excess of 700MB. Once it is up and running, the server will increase in size as it allocates memory to handle its day-to-day work: processing query results and so on. One of the advantages of the Windows platform is that once the server releases memory, it is returned to the OS and the footprint should shrink again.

If the maximum process size (2 or 3 GB, depending on the switch above) minus the current size of arserver is LESS than the startup size, a recache operation is likely to fail.

Things that you could do:

· Enable the /3GB option.
· If your startup size is very large, look to remove unused views, forms, and workflow from the system.
· Set Large-Result-Logging-Threshold: 100000 in ar.cfg and enable thread logging on the secondary servers. This will show you whether users are running queries that return large datasets and consume memory.
· Set Copy-Cache-Logging: T too. This will record the recache operations in the thread log. You want to make sure that you see the FreeServerCache entry that indicates that the server has released the original copy of the cache. If you have long-running API calls, it is possible for the server to end up with more than 2 copies of the cache; if this is a large cache, you can very quickly hit the memory limit.

E.g., this is bad (multiple copies); you want to see a begin, end, and free before the next begin:

CopyCache Begin: rpcCallProc=10002 user="Remedy Application Service" tid=5 rpcId=0
CopyCache End
CopyCache Begin: rpcCallProc=10002 user="Remedy Application Service" tid=5 rpcId=0
CopyCache End
FreeServerCache: rpcCallProc=10018 user="Remedy Application Service" tid=5 rpcId=1178442632

Incidentally, if you are using 64-bit Windows, I believe the maximum size of a large address aware 32-bit application is 4GB by default: http://msdn.microsoft.com/en-us/library/ms791558.aspx

Mark Walters

The opinions, statements, and/or suggested courses of action expressed in this E-mail do not necessarily reflect those of BMC Software, Inc. My voluntary participation in this forum is not intended to convey a role as a spokesperson, liaison or support representative for BMC Software, Inc.

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Anthony K R
Sent: 25 February 2009 07:17
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

Joe,

The chunk setting should not cause a malloc error, and there is no timeout issue either. Today I saw the memory consumption report when the recache was triggered on the secondary servers. It crosses 2GB before the malloc error. Is that a memory limitation of the OS or of the arserver process?

Regards,
Anthony

From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Joe DeSouza
Sent: Wednesday, February 25, 2009 7:50 AM
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

It's a known issue that ARS on Windows connected to a remote Oracle database takes forever to recache, and takes forever to restart if the services have been stopped and restarted. This is because of the way that data is read in chunks of 100 rows. It is as designed, and Remedy has nothing to do with the design; it is more about how the Oracle client communicates with remote Oracle databases when the client is on Windows. I didn't experience the kinds of problems you are talking about on UNIX ARS servers connected to remote Oracle databases, so I guessed your configuration from the symptoms you described. Unfortunately you've got to live with it unless you decide to move to UNIX.

Joe

From: Lyle Taylor <tayl...@ldschurch.org>
To: arslist@ARSLIST.ORG
Sent: Tuesday, February 24, 2009 6:02:40 PM
Subject: Re: ARS 7.1 server group issue

Correct.
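Mark's rule of thumb above (maximum process size minus current size, compared with the startup size) is simple arithmetic. A minimal Python sketch of it follows; the function names are hypothetical, the limits are the 2GB, 3GB, and 4GB figures quoted in this thread (the 4GB value is Mark's stated belief for 64-bit Windows), and it treats the rule as a rough predictor rather than an exact model of arserver's allocator.

```python
# Maximum address space (GB) for the 32-bit, large-address-aware arserver
# process, using the figures quoted in this thread.
MAX_PROCESS_GB = {
    "32-bit Windows, default": 2.0,
    "32-bit Windows, /3GB in boot.ini": 3.0,
    "64-bit Windows (Mark's stated belief)": 4.0,
}


def recache_headroom_gb(max_gb: float, current_gb: float) -> float:
    """Address space still available to the process for a new cache copy."""
    return max_gb - current_gb


def recache_likely_to_fail(max_gb: float, current_gb: float, startup_gb: float) -> bool:
    """Mark's rule of thumb: a recache is likely to fail when the remaining
    headroom is less than the process size right after startup."""
    return recache_headroom_gb(max_gb, current_gb) < startup_gb


if __name__ == "__main__":
    # Anthony's pre-prod figures: ~1.3GB after startup and no users, so the
    # current size is assumed to equal the startup size.
    for label, max_gb in MAX_PROCESS_GB.items():
        print(f"{label}: headroom {recache_headroom_gb(max_gb, 1.3):.1f}GB, "
              f"likely to fail: {recache_likely_to_fail(max_gb, 1.3, 1.3)}")
```

With Anthony's figures, the rule predicts a failure under the 2GB default (0.7GB of headroom against a 1.3GB startup size) but not under a 3GB limit.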
From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Joe DeSouza
Sent: Tuesday, February 24, 2009 3:20 PM
To: arslist@ARSLIST.ORG
Subject: Re: ARS 7.1 server group issue

Your AR Servers are probably on Windows and connect to Oracle set up as a remote database?

Joe

From: Lyle Taylor <tayl...@ldschurch.org>
To: arslist@ARSLIST.ORG
Sent: Tuesday, February 24, 2009 4:27:56 PM
Subject: Re: ARS 7.1 server group issue

I see server groups as being more useful for load balancing and redundancy. While you can indeed have users on the other systems while you perform the updates, the other servers become nearly unusable while the cache updates, especially for anything other than very minor changes. I've had fewer issues when I simply bring down the other servers during the changes and then bring them back up again afterwards. In my experience that actually provides a better user experience, because knowing that the system is down for a short time is easier to deal with than extremely slow performance during a cache update.

Lyle

**********************************************************************
CONFIDENTIALITY NOTICE: This is a transmission from Kohl's Department Stores, Inc. and may contain information which is confidential and proprietary. If you are not the addressee, any disclosure, copying or distribution or use of the contents of this message is expressly prohibited. If you have received this transmission in error, please destroy it and notify us immediately at 262-703-7000.

CAUTION: Internet and e-mail communications are Kohl's property and Kohl's reserves the right to retrieve and read any message created, sent and received. Kohl's reserves the right to monitor messages by authorized Kohl's Associates at any time without any further consent.
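Mark's advice in this thread is that each recache in the thread log should show a begin, an end, and a free before the next begin. A minimal Python sketch of that check follows, assuming Copy-Cache-Logging: T is enabled and the log lines contain the marker strings quoted in the thread; the function name find_overlapping_recaches is hypothetical, and the simple three-state tracker is an illustrative choice, not how arserver itself accounts for cache copies.

```python
from typing import Iterable, List


def find_overlapping_recaches(lines: Iterable[str]) -> List[int]:
    """Return 0-based indices of "CopyCache Begin" lines that start before the
    previous recache has logged both "CopyCache End" and "FreeServerCache".

    Healthy pattern: Begin, End, FreeServerCache, then the next Begin.
    """
    # idle -> copying (Begin seen) -> copied (End seen) -> idle (Free seen)
    state = "idle"
    overlaps: List[int] = []
    for index, line in enumerate(lines):
        if "CopyCache Begin" in line:
            if state != "idle":
                # A new copy started before the previous cycle finished.
                overlaps.append(index)
            state = "copying"
        elif "CopyCache End" in line:
            if state == "copying":
                state = "copied"
        elif "FreeServerCache" in line:
            # A free with no preceding copy (e.g. during InitServerCache,
            # as in Anthony's log) is ignored.
            if state == "copied":
                state = "idle"
    return overlaps


if __name__ == "__main__":
    # Mark's "bad" example from this thread: two copies before a single free.
    bad = [
        'CopyCache Begin: rpcCallProc=10002 user="Remedy Application Service" tid=5 rpcId=0',
        "CopyCache End",
        'CopyCache Begin: rpcCallProc=10002 user="Remedy Application Service" tid=5 rpcId=0',
        "CopyCache End",
        'FreeServerCache: rpcCallProc=10018 user="Remedy Application Service" tid=5 rpcId=1178442632',
    ]
    print(find_overlapping_recaches(bad))  # -> [2]
```

Tony's healthy recache sequence (Begin, End, FreeServerCache) yields an empty list, while Mark's example flags the second CopyCache Begin.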