I finally got some scripting to do exactly what I needed. I did it the kludgy 
way for now but it works. I first had to use Autoexpect to help me build a 
script to connect to the mailbox via openssl s_client, then issue the STAT 
command to get the message count. I tweaked that Expect script and used the 
log_file command to send the output to a text file.
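
For anyone who would rather avoid Expect, the same STAT check can be sketched in plain shell by piping the POP3 commands into openssl s_client. This is not the Expect script described above, only an illustration: the function names and the host, port, user and password arguments are hypothetical, and sending every command up front assumes the server tolerates commands that arrive before its replies.

```shell
#!/usr/bin/env bash
# Sketch only: function names and all arguments are hypothetical.

# Emit the POP3 dialogue: log in, ask for the mailbox status, then quit.
# POP3 lines end in CRLF. STAT only reports the message count and total
# size; it does not retrieve or delete anything.
pop3_stat_commands() {
    local user=$1 pass=$2
    printf 'USER %s\r\nPASS %s\r\nSTAT\r\nQUIT\r\n' "$user" "$pass"
}

# Send that dialogue over TLS and capture the server replies in a file.
# -quiet suppresses the certificate dump and keeps the connection open
# after stdin reaches EOF, so the replies can still arrive.
pop3_stat_session() {
    local host=$1 port=$2 user=$3 pass=$4 out=$5
    pop3_stat_commands "$user" "$pass" |
        openssl s_client -quiet -connect "$host:$port" >"$out" 2>/dev/null
}
```

A call such as pop3_stat_session pop.example.com 995 "$POP_USER" "$POP_PASS" "$ETC_DIR/output.txt" (hypothetical host and variables) would leave only the server replies in the file, without the echoed command lines that an Expect log contains, so any parsing step would need to allow for that.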

I then wrote a BASH script that first calls the Expect script, then parses the 
resulting text file to get out the number of messages queued up; if that is 
greater than zero then it uses mailx to send me an alert so that I can bounce 
the e-mail engine. Phase two will be modifying the script so that it will 
bounce the e-mail engine and just send an informational (instead of actionable) 
e-mail to notify me that there had been messages queued but the engine was 
restarted.
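
The alerting step described above might look roughly like the following. It is a sketch rather than the actual script: the alert_if_queued name, the ALERT_TO recipient and the MAILER override are made up for illustration, and only the use of mailx comes from the description.

```shell
#!/usr/bin/env bash
# Sketch only: recipient, subject text and MAILER override are assumptions.

MAILER=${MAILER:-mailx}        # command used to send the alert
ALERT_TO=${ALERT_TO:-root}     # hypothetical recipient

# Send an alert when the queued-message count is greater than zero.
# Returns 1 if the mailbox was empty, 2 if the count was not a number
# (for example because the login failed), otherwise the mailer's status.
alert_if_queued() {
    local count=$1
    case $count in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$count" -gt 0 ] || return 1
    printf '%s message(s) are waiting in the POP3 mailbox.\n' "$count" |
        "$MAILER" -s "E-mail engine alert: $count queued" "$ALERT_TO"
}
```

Treating a non-numeric count as its own case keeps a failed login from being mistaken for an empty mailbox.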

Of course the problem hasn’t recurred since my last message on 9/5. I did get 
a successful test last night, though: I put the script in the crontab to run 
every minute, and since the e-mail engine only polls the mailbox every two 
minutes I got several alerts overnight. In the future I will investigate using 
Ruby or Perl instead so that I can leverage existing POP3 libraries, but for now 
this addresses my problem (even if it isn’t in the most elegant manner). My 
script is now running every ten minutes, which gives the e-mail engine at least 
four tries to pull down messages.

In case anyone is curious the STAT command returns this sort of output 
(including the line break) where the first integer is message count and the 
second is total size:
stat
+OK 14 3561

That made it slightly more difficult to parse out what I needed, which is the 
first integer after the OK. I ended up chaining two awk commands together to do 
it:

awk '/stat/,/\+OK/' "$ETC_DIR/output.txt" | awk '{print $2}'
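
For comparison, the same extraction can be sketched as a single awk program that waits for the echoed stat line and then prints the second field of the next +OK reply. The function name is hypothetical and the log layout is assumed to match the sample output above; carriage returns are stripped because POP3 lines end in CRLF.

```shell
#!/usr/bin/env bash
# Print the message count from a captured POP3 session log.
# Assumes the log holds the echoed "stat" command followed by the
# "+OK <count> <size>" reply, as in the sample output above.
pop3_message_count() {
    awk '
        { sub(/\r$/, "") }                          # drop CR from CRLF
        tolower($1) == "stat" { seen = 1; next }    # the echoed command
        seen && $1 == "+OK"   { print $2; exit }    # first reply after it
    ' "$1"
}
```

Earlier +OK lines from the login are skipped because nothing is printed until the stat line has been seen, and only the count itself is printed, with no blank line in front of it.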


-Rick


From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Rick Westbrock
Sent: Friday, September 05, 2014 2:39 PM
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

Theo-

Actually that is a good solution and we have something almost exactly like that 
in place. That works well in restarting the service and since our script sends 
a notification I know that it fires off every week or two (we don’t have very 
much e-mail volume). My current issue is that our script doesn’t catch the fact 
that the engine is not pulling down the messages so it never stops the service.

That being said the script predates me here so I need to read it through and 
see if it references the specific string you mentioned. If not that might be my 
magic bullet.

Thanks for the suggestion!

Rick

From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Theo Fondse (GMail)
Sent: Wednesday, September 03, 2014 11:29 PM
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

Rick,

We were also running 7.1 on RHEL up until a couple of weeks ago, and also had 
to deal with reliability issues: the Remedy Email Engine would stop processing 
mail without any warning and without any entries in the debug log pointing to 
a real issue or to what the cause might be. Very frustrating.

We send and receive a lot of e-mail to/from our Remedy server and users have 
become quite dependent on the e-mails (against our better advice), so we got 
into a lot of hot water every time someone discovered that mail was not being 
processed.
This meant that we had to log a daily maintenance call for whoever’s turn it was 
to be on standby for the week to restart the Remedy Email Engine and babysit 
the Email Engine to clear out the unprocessed mail queue. This was NOT fun.

In response to this I implemented a workaround and our Remedy Email Engine woes 
disappeared.
Emails continued to be processed 24/7 and we did not get lambasted any more by 
angry users about emails not being sent/received.

This workaround was quite simple to implement:


1)      Configure the Remedy Email Engine to write to the debug log.

2)      Write a small shell script to monitor the debug log file for 
entries containing “/221 2.0.0 Service closing transmission channel” and kill 
the Email Engine process once such an entry appears.

(Since the Email Engine is taking a nap for a while when it writes this line to 
the log, you are not killing it whilst it is busy processing mail.)

Old Faithful ARMonitor will do its job immediately and start the Email Engine 
right up again (unless your email engine is running on a different server from 
your ARSystem server – then your script will have to start the Email Engine up 
again).

3)      Schedule the script as a cron job to run once an hour.

This is not the most desirable or elegant method, and some might call it crude, 
but it has the end result of emails being processed 24/7 without fail and keeps 
angry users off our backs.
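
The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the script actually in use: the function names, the state file and the process pattern passed to pkill are hypothetical, and the state file is an addition so that an old log entry does not trigger a kill on every hourly run.

```shell
#!/usr/bin/env bash
# Sketch only: log path, state file and process pattern are hypothetical.

MARKER='221 2.0.0 Service closing transmission channel'

# Print how many marker lines have appeared in the log since the last
# run, using a small state file to remember the previous total.
new_marker_entries() {
    local log=$1 state=$2 total previous=0
    total=$(grep -c -F -- "$MARKER" "$log" 2>/dev/null || true)
    total=${total:-0}
    if [ -f "$state" ]; then
        previous=$(cat "$state")
    fi
    # A smaller total suggests the log was rotated; count from zero again.
    if [ "$total" -lt "$previous" ]; then
        previous=0
    fi
    printf '%s\n' "$total" >"$state"
    printf '%s\n' "$((total - previous))"
}

# Kill the Email Engine when new marker lines are found; armonitor is
# then expected to start it again, as described above.
restart_if_idle() {
    local log=$1 state=$2 pattern=$3
    if [ "$(new_marker_entries "$log" "$state")" -gt 0 ]; then
        pkill -f -- "$pattern"
    fi
}
```

An hourly cron entry would then call restart_if_idle with the debug log path, a state file path and whatever pattern identifies the Email Engine process on that server.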


Best Regards,
Theo


From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Rick Westbrock
Sent: 02 September 2014 20:00
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

With a suggestion from Laurent on BMC Communities I realized that I should have 
just searched for POP3 commands to find what I need and then written a quick 
BASH script to log into the mailbox (hopefully I can do it via SSH instead of 
telnet) and then, if the result of the LIST command is greater than zero, fire 
the alert. Since the e-mail engine does a POP3 poll every two minutes I can run 
my script every 10 minutes to minimize false positives. In fact it looks like 
maybe just doing the login should tell me the number of messages. Once I get 
this written I will post my results in case anyone else might benefit from it.

Incidentally the same thing happened on Friday night and I caught it when I got 
back in the office this morning by visually inspecting the mailbox. I have a 
feeling the problem may be related to the fact that this particular mailbox is 
hosted by Microsoft in the cloud. :/

-Rick

From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Grooms, Frederick W
Sent: Friday, August 29, 2014 6:22 AM
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

You can do the same thing I have listed in reverse (I do that also).

Change the Count to find out whether there have been any inbound messages in 
the last x minutes and, if not, send an email.

Fred


From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Rick Westbrock
Sent: Thursday, August 28, 2014 5:39 PM
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

Thanks for all the helpful suggestions. Our outbound e-mail is SMTP so it is 
decoupled from the inbound mail which is where I’m having the problem. I guess 
what I am looking for is really something I can run from a different server 
that can check the number of messages in the mailbox via POP3 without pulling 
any of them down. If the check only runs every 10 minutes then the Remedy 
e-mail engine will have had four chances to pick up the messages and I could 
trigger an alert on that. I’ll have to exercise some good old-fashioned 
Google-fu to see what I can come up with.

The frustrating part is that there was nothing logged at all; the engine just 
failed to grab the messages. I had already thought of calling an external 
script to use sendmail to send the alert for queued up outbound messages but 
fortunately that isn’t a problem.

-Rick

From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Thad Esser
Sent: Thursday, August 28, 2014 10:51 AM
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

I essentially do the same thing as Fred, but since we have Windows servers, I 
use powershell to send the email, if it helps anyone:
powershell -command Send-MailMessage -smtpServer <smtpserver> -to $Char Param 01$ -from $SERVER$@<companyname> -subject \"=== ALERT === AR System Email Messages Count: $z1D Integer 01$ ($SERVER$)\" -body \"$z1D Char 01$\"

A Set Fields action prior to this Run Process action sets the char field for 
the body.  I also have similar filters to check for Application Pending records 
that might not be processing and a few other things.  It's all attached to an 
"Automation" form, with separate records and filters for the things I'm checking.

Thad

On Thu, Aug 28, 2014 at 10:30 AM, Grooms, Frederick W  wrote:
This is how I do it…

First … I have a form with only a single record in it that we use to hold 
configuration info.

I have an escalation that runs against this “Config” form that sets a 
display-only field to trigger filter workflow.  The filter workflow does:
            Filter 1   xxxyyyzzz-1 Check_Counts
                        Set Fields
                                    zTmp_Integer_1 = $SERVERTIMESTAMP$
                        SQL Set Fields
                                    zTmp_Integer_2 = SELECT COUNT(*) FROM AR_SYSTEM_EMAIL_MESSAGES
                                        WHERE MESSAGE_TYPE = 1 AND SEND_MESSAGE = 1
                                        AND CREATE_DATE <= ($zTmp_Integer_1$ - 150)

            Filter 2   xxxyyyzzz-2 SendEmail
                        Run-If  'zTmp_Integer_2' > 25
                        Set Fields
                                    zTmp_String_1 = $PROCESS$  echo "Subject: Email Count Error
   Server $SERVER$ has $zTmp_Integer_2$ messages waiting to send
   .
   " | /usr/lib/sendmail ema...@mydomain.com,ema...@mydomain.com


Filter 1 gets me a count of emails waiting to send that are at least 2 1/2 
minutes (150 seconds) old.
Filter 2 says that if there are more than 25 emails waiting to send, alert 
people using sendmail.

This is the same basic logic I use to monitor other parts of the system, except 
that instead of using sendmail I push to the email messages form.

Fred

From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of William Rentfrow
Sent: Thursday, August 28, 2014 11:32 AM
To: arslist@ARSLIST.ORG
Subject: Re: E-mail engine not getting POP3 messages (Linux) but not logging 
errors

I've seen this in Linux from time to time as well.  It's not really frequent 
but it does happen.  We're on SuSe linux running 7.6.04 sp 5.  Another 
environment is on SuSe with 8.1 - and it's happened to both.

There's not a great way to test it honestly, since when it dies this way it 
doesn't appear to do anything bad.  There's nothing in the log files for the 
monitoring tools to grab.  In fact, a couple of weeks ago this died on a 
Saturday and for some reason no one noticed until Tuesday morning.  Then I 
fixed it....and it sent 200,000+ emails out.  I was *very* popular that day....

We've kicked around a couple of ideas like writing workflow to notify us of 
this, but the problem there is that everyone wants to get notified by 
email...so....that's not going to work.  It turns out a broken email process 
won't send email either :)

I think long term the best solution would be for BMC to separate the email 
process completely from the AR server and do a check-in like it does for the 
server group.   Right now in a server group if email dies but the ar server 
itself stays up the email process won't hop to another machine.  It's annoying 
and completely fixable, but BMC has not yet chosen to do that.

If it did have a check-in then armonitor could kill it when it wasn't 
responding, regardless of whether you were in a server group or not.

Right now we just check it intermittently and hope for the best.  Fortunately 
our email volume is high enough that our customers usually notice within an 
hour or two.

From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of Rick Westbrock
Sent: Thursday, August 28, 2014 10:25 AM
To: arslist@ARSLIST.ORG
Subject: E-mail engine not getting POP3 messages (Linux) but not logging errors

Hi all-

I had an interesting issue today and wondered if someone else had run into it 
before. I am running my e-mail engine (7.1) on a Linux server (RHEL 5.10) and 
using POP3 to get messages from a remote mail server. Normally if there’s a 
problem the Email Error form fills up with connection errors but this time it 
failed to pull down messages for over 24 hours but never logged an error.

I used the emaild.sh script with the stop parameter to kill the process and 
normally it stops it immediately, then a monitoring script sees that it isn’t 
running and starts it up again. However today the stop script appeared to hang 
and after five minutes I finally did a kill -9 on the PID to kill the process. 
The monitoring script started it back up immediately with a new PID and it 
processed the 124 waiting messages via POP3 within 30 seconds.
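
The stop-then-force sequence described above could be automated along these lines. This is a sketch under assumptions, not part of any existing script: the stop_or_kill name and the timeout are hypothetical, and it sends TERM directly where a real script would more likely call emaild.sh with the stop parameter first.

```shell
#!/usr/bin/env bash
# Sketch only: a real script would probably call the stop script
# instead of sending TERM itself.

# Ask a process to stop, wait up to <timeout> seconds, then force it.
# Returns 0 if the process exited on its own, 1 if kill -9 was needed.
stop_or_kill() {
    local pid=$1 timeout=$2 waited=0
    kill "$pid" 2>/dev/null || true
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -9 "$pid" 2>/dev/null || true
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

With a timeout of 300 this would mirror the five-minute wait from the incident, and the return value lets the caller note in its alert whether a forced kill was needed.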

Any ideas what would cause the engine to hang without logging an error? Any 
suggestions on how to monitor and alert on this situation? To date I have just 
been visually looking at the Inbox via Outlook on my local machine to make sure 
there are no messages waiting (the e-mail engine polls every two minutes) but 
that is obviously not an optimal solution. Apparently I forgot to check it 
yesterday, hence the 24 hour backup of messages.

Thanks in advance,
Rick

_________________________
Rick Westbrock
AppOps Engineer | IT Department
24 Hour Fitness USA, Inc.


_ARSlist: "Where the Answers Are" and have been for 20 years_


_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org