Re: qmail died again... 3x in 3 weeks
It seems that all of a sudden my RH had a resource limit problem. DNS is
fine, but after 61 qmail-remotes it would appear that RH ran out of
resources.

I searched the archives and added some ulimit commands to the qmail.init
script, but I couldn't find a way to determine how many open files, etc.,
to allow.

If anyone knows how many resources qmail needs for a concurrency of 100,
let me know, as the default RH settings are too low once you add the other
services on the box (https, ssh, ntp, etc.).

Paul Farber
Farber Technology
[EMAIL PROTECTED]
Ph 570-628-5303
Fax 570-628-5545

On Sat, 22 Jul 2000, Eric Cox wrote:

>
> Paul Farber wrote:
> >
> > telnetting to port 25 and 110 just timed out.
>
> This usually means (when it has happened to me anyway) that the
> server is listening on the port you're telnetting to, but is
> stalled doing a reverse DNS lookup of the client's IP address.
> Perhaps a munged reverse DNS zonefile?
>
> > DNS was fine... it means
> > just that, I could ping via hostname and the dns logs show it was running.
>
> That could still happen under the above scenario...
>
> Eric
>
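On the resource-limit question above: limits are inherited per process, so what an interactive shell reports from ulimit is not necessarily what processes started from qmail.init get. A minimal sketch, assuming Python is available on the box, that prints the two limits most relevant to "unable to fork" and to open files as seen by whatever environment runs it. The function names are made up for illustration, and it does not say how much qmail needs at a concurrency of 100; it only shows what is currently allowed.

```python
import resource


def limit_report():
    """Return {label: (soft, hard)} for the limits visible to this process."""
    names = {
        "max user processes": resource.RLIMIT_NPROC,
        "open files": resource.RLIMIT_NOFILE,
    }
    return {label: resource.getrlimit(res) for label, res in names.items()}


def fmt(value):
    """Render a limit the way ulimit does, with 'unlimited' for no limit."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, (soft, hard) in limit_report().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```

Running it once from a login shell and once from the same init script that starts qmail would show whether the two environments really have the same limits.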
Re: qmail died again... 3x in 3 weeks
Paul Farber wrote:
>
> telnetting to port 25 and 110 just timed out.

This usually means (when it has happened to me anyway) that the
server is listening on the port you're telnetting to, but is
stalled doing a reverse DNS lookup of the client's IP address.
Perhaps a munged reverse DNS zonefile?

> DNS was fine... it means
> just that, I could ping via hostname and the dns logs show it was running.

That could still happen under the above scenario...

Eric
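One way to check the reverse-lookup theory above is to time a PTR lookup of a client address from the mail server itself; forward lookups (ping by hostname) can work while reverse lookups stall. A rough sketch, assuming Python on the server; `time_reverse_lookup` is a hypothetical helper name and 192.0.2.1 is a documentation address used only as a placeholder.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        name = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        name = None
    elapsed = time.monotonic() - start
    return name, elapsed


if __name__ == "__main__":
    # Replace the placeholder with the address of a client that timed out.
    name, elapsed = time_reverse_lookup("192.0.2.1")
    print(f"name={name} took {elapsed:.2f}s")
```

A lookup that takes many seconds before failing would be consistent with a server that accepts the connection but stalls before greeting the client.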
Re: qmail died again... 3x in 3 weeks
>
> Have you thought about migrating to the latest version of
> daemontools and its startup idiom? More importantly,
> using cyclog?

That would be kind of difficult with the _latest_ version. :)
Try multilog :)

RC

--
+---
| Ricardo Cerqueira
| PGP Key fingerprint - B7 05 13 CE 48 0A BF 1E 87 21 83 DB 28 DE 03 42
| Novis - Engenharia ISP / Rede Técnica
| Pç. Duque Saldanha, 1, 7º E / 1050-094 Lisboa / Portugal
| Tel: +351 21 3166700 (24h/dia) - Fax: +351 21 3166701
Re: qmail died again... 3x in 3 weeks
On Thu, Jul 20, 2000 at 02:16:21PM -0400, Paul Farber wrote:
> Here is the ps output after the crash at 12:35 today:
>
>   339 ?  S  657:23 syslogd -m 0
[...]
>   486 ?  S    1:43 splogger qmail

Have you thought about the resources used by syslogd during heavy
qmail use?

Have you thought about migrating to the latest version of
daemontools and its startup idiom? More importantly,
using cyclog?

John
Re: qmail died again... 3x in 3 weeks
telnetting to port 25 and 110 just timed out.

DNS was fine... it means just that, I could ping via hostname and the dns
logs show it was running. No other host (the web server specifically)
showed a dns failure (log files had hostnames resolved during the 30 minute
mail 'outage').

I have the ps output saved as files; gpm cut/paste will only go so far.

qmail unable to fork is a new one for me. I have concurrency set to 100,
and ulimit returns unlimited (from the command line).

ulimit -a shows

    core file size (blocks)     100
    data seg size (kbytes)      unlimited
    file size (blocks)          unlimited
    max memory size (kbytes)    unlimited
    stack size (kbytes)         8192
    cpu time (seconds)          unlimited
    max user processes          2048
    pipe size (512 bytes)       8
    open files                  1024
    virtual memory (kbytes)     2105343

System memory available:

    Mem:  131022848 125976576 5046272 8724480 77127680 23965696
    Swap: 41250816 3854336 37396480
    MemTotal:    127952 kB
    MemFree:       4928 kB
    MemShared:     8520 kB
    Buffers:      75320 kB
    Cached:       23404 kB
    SwapTotal:    40284 kB
    SwapFree:     36520 kB

I can see a large emailing from 1 single user in the logs, 2 of over 50
bounced (from qmail-qread), but I'm pretty sure I've had CC's or To: of 150
before without problems (from a site we host).

What else can I do to up the available resources???

Paul Farber
Farber Technology
[EMAIL PROTECTED]
Ph 570-628-5303
Fax 570-628-5545

On Thu, 20 Jul 2000 [EMAIL PROTECTED] wrote:

> On Thu, Jul 20, 2000 at 02:16:21PM -0400, Paul Farber wrote:
> > Hello all, I had another qmail failure... I say that because qmail seems
> > to be the only service affected. I lump tcpserver, smtpd and pop3d in
> > as qmail for now as I don't have a definitive place to look.
> >
> > Here is the ps output after the crash at 12:35 today:
>
> Which looks ok to me. All the processes that are needed seem to be
> there.
>
> > No pop3 or smtp services were available, telnet didn't work either.
>
> What do you mean by "available?". What happened when you tried to connect
> to those ports?
>
> > I have identd commented out in inet.conf, so I have no idea where it is
> > coming from, or how it started.
>
> What do you expect identd to tell you that's relevant to qmail?
>
> > After stopping qmail.init, vpopmail, pop3d and smtpd this is a list of the
> > new processes:
>
> Apart from the qmail-remotes, what relevant differences do you see
> between this and the last ps?
>
> > 23026 tty1 S 0:00 supervise /var/lock/qmail qmail-start # Using
> > qmail-l23027 tty1 S 0:00 splogger qmail
> > 23028 tty1 S 0:00 qmail-send
> > 23029 tty1 S 0:00 qmail-lspawn # Using qmail-local to deliver
> > messages
>
> Is this a heavily edited 'ps' output? It makes it difficult if it is.
>
> > Notice that I have a new pid 23094 tcprules, but the pid seems
> > to indicate that it was from the restart of the services, not the original
> > ps output.
>
> tcprules hasn't much to do with the running of qmail though.
>
> > The mail log just before the stoppage indicates that tcpserver encountered
> > a huge number of cname lookup failures for outgoing mail, but DNS was
> > fine (the main DNS server is still running, and I do not cache entries on
> > the mail server).
>
> What do you mean by DNS "was fine"? Do you mean it had reachability to
> resolve queries or do you mean that the process was running?
>
> My suspicion is that you lost the ability to do DNS lookups and
> that stalled reverse lookups, which stalls all processes that want to
> do that (such as a wrappered telnet).
>
> > Going through the mail log I see:
> >
> > Jul 20 11:30:19 mail qmail: 964107019.317584 status: local 1/100 remote 51/100
> > Jul 20 11:30:20 mail qmail: 964107020.447592 delivery 118231: deferral: qmail-spawn_unable_to_fork._(#4.3.0)/
> >
> > and
> >
> > Jul 20 11:48:28 mail qmail: 964108108.168548 status: local 0/100 remote 46/100
> > Jul 20 11:48:35 mail qmail: 964108115.918335 delivery 118202: deferral: Connected_to_24.0.95.29_but_connection_died._(#4.4.2)/
> >
> > then
> >
> > Jul 20 12:01:47 mail qmail: 964108907.824767 status: local 0/100 remote 47/100
> > Jul 20 12:01:47 mail qmail: 964108907.917107 delivery 118245: deferral: CNAME_lookup_failed_temporarily._(#4.4.3)/
> >
> > Concurrency went up to 51 for remote, local 1 during the 30 minutes it
> > took to fail.
>
> And what do you make of that? The first message tells me that you
> haven't given qmail-send enough resources to start all the processes
> you've configured it for - but it's not fatal.
>
>
> Mark.
>
Re: qmail died again... 3x in 3 weeks
On Thu, Jul 20, 2000 at 02:16:21PM -0400, Paul Farber wrote:
> Hello all, I had another qmail failure... I say that because qmail seems
> to be the only service affected. I lump tcpserver, smtpd and pop3d in
> as qmail for now as I don't have a definitive place to look.
>
> Here is the ps output after the crash at 12:35 today:

Which looks ok to me. All the processes that are needed seem to be
there.

> No pop3 or smtp services were available, telnet didn't work either.

What do you mean by "available?". What happened when you tried to connect
to those ports?

> I have identd commented out in inet.conf, so I have no idea where it is
> coming from, or how it started.

What do you expect identd to tell you that's relevant to qmail?

> After stopping qmail.init, vpopmail, pop3d and smtpd this is a list of the
> new processes:

Apart from the qmail-remotes, what relevant differences do you see
between this and the last ps?

> 23026 tty1 S 0:00 supervise /var/lock/qmail qmail-start # Using
> qmail-l23027 tty1 S 0:00 splogger qmail
> 23028 tty1 S 0:00 qmail-send
> 23029 tty1 S 0:00 qmail-lspawn # Using qmail-local to deliver
> messages

Is this a heavily edited 'ps' output? It makes it difficult if it is.

> Notice that I have a new pid 23094 tcprules, but the pid seems
> to indicate that it was from the restart of the services, not the original
> ps output.

tcprules hasn't much to do with the running of qmail though.

> The mail log just before the stoppage indicates that tcpserver encountered
> a huge number of cname lookup failures for outgoing mail, but DNS was
> fine (the main DNS server is still running, and I do not cache entries on
> the mail server).

What do you mean by DNS "was fine"? Do you mean it had reachability to
resolve queries or do you mean that the process was running?

My suspicion is that you lost the ability to do DNS lookups and
that stalled reverse lookups, which stalls all processes that want to
do that (such as a wrappered telnet).
> Going through the mail log I see:
>
> Jul 20 11:30:19 mail qmail: 964107019.317584 status: local 1/100 remote 51/100
> Jul 20 11:30:20 mail qmail: 964107020.447592 delivery 118231: deferral: qmail-spawn_unable_to_fork._(#4.3.0)/
>
> and
>
> Jul 20 11:48:28 mail qmail: 964108108.168548 status: local 0/100 remote 46/100
> Jul 20 11:48:35 mail qmail: 964108115.918335 delivery 118202: deferral: Connected_to_24.0.95.29_but_connection_died._(#4.4.2)/
>
> then
>
> Jul 20 12:01:47 mail qmail: 964108907.824767 status: local 0/100 remote 47/100
> Jul 20 12:01:47 mail qmail: 964108907.917107 delivery 118245: deferral: CNAME_lookup_failed_temporarily._(#4.4.3)/
>
> Concurrency went up to 51 for remote, local 1 during the 30 minutes it
> took to fail.

And what do you make of that? The first message tells me that you
haven't given qmail-send enough resources to start all the processes
you've configured it for - but it's not fatal.

Mark.
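
For reading a longer stretch of the log than the six lines quoted above, a small script can report the peak concurrency reached and count deferrals by reason. This is only a sketch, assuming the log lines look exactly like the ones quoted in this thread (with the mail client's line wrapping undone); `summarize` and `SAMPLE` are names invented for the example.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
STATUS_RE = re.compile(r"status: local (\d+)/(\d+) remote (\d+)/(\d+)")
DEFERRAL_RE = re.compile(r"delivery \d+: deferral: (\S+)")


def summarize(lines):
    """Return (peak local, peak remote, Counter of deferral reasons)."""
    peak_local = peak_remote = 0
    deferrals = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            peak_local = max(peak_local, int(m.group(1)))
            peak_remote = max(peak_remote, int(m.group(3)))
            continue
        m = DEFERRAL_RE.search(line)
        if m:
            deferrals[m.group(1)] += 1
    return peak_local, peak_remote, deferrals


# The six log lines from the message above, unwrapped.
SAMPLE = [
    "Jul 20 11:30:19 mail qmail: 964107019.317584 status: local 1/100 remote 51/100",
    "Jul 20 11:30:20 mail qmail: 964107020.447592 delivery 118231: deferral: "
    "qmail-spawn_unable_to_fork._(#4.3.0)/",
    "Jul 20 11:48:28 mail qmail: 964108108.168548 status: local 0/100 remote 46/100",
    "Jul 20 11:48:35 mail qmail: 964108115.918335 delivery 118202: deferral: "
    "Connected_to_24.0.95.29_but_connection_died._(#4.4.2)/",
    "Jul 20 12:01:47 mail qmail: 964108907.824767 status: local 0/100 remote 47/100",
    "Jul 20 12:01:47 mail qmail: 964108907.917107 delivery 118245: deferral: "
    "CNAME_lookup_failed_temporarily._(#4.4.3)/",
]

if __name__ == "__main__":
    local, remote, reasons = summarize(SAMPLE)
    print(f"peak local={local} peak remote={remote}")
    for reason, count in reasons.most_common():
        print(f"{count:5d}  {reason}")
```

Fed the whole log instead of `SAMPLE`, the count of unable_to_fork deferrals against the peak remote figure would show at what concurrency the fork failures start.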