Hi all,

This is in connection with the fairly frequent runaway lpd processes
we're seeing with LPRng-3.8.27 under RH9.  By "fairly frequent", I
mean 5 or more a week across 5 print servers, serving about 60 printers
between them.  My previous mail to the list, which describes the
problem, is quoted at the end of this message.  It's definitely not
connected with futexes, as I originally thought, since I see the same
symptoms when running without futexes.

I have turned on debugging to level 3 on a queue to try to get some
information.  This is the tail of the log from the most recent
runaway:

2004-08-24-17:19:56.177 cessnock [14080] (Server)  at3: cleanup: done, exit(0)
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Update_spool_info: file 'control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Get_spool_control:  file 'control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Get_file_image: 'control.pr', maxsize 0
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Checkread: file 'control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Checkread: 'control.pr' fd 5, size 73
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Get_fd_image: fd 5
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: Get_fd_image: len 73 'debug=3
printing_aborted=0x0
pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3: *** Dump_subserver_info: 'Do_queue_jobs - after setup' - 1 subservers
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:  server 0 - 0x80c4498, count 7, max 102, list 0x80c6c80
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 0] 0x80c5e80 ='debug=3'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 1] 0x80c3fb8 ='printer=at3'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 2] 0x80c5eb0 ='printing_aborted=0x0'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 3] 0x80c3f78 ='printing_disabled=0x0'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 4] 0x80c73b8 ='queue_control_file=control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 5] 0x80c7660 ='spooldir=/var/spool/lpd/at3'
2004-08-24-17:21:13.051 cessnock [14089] (Server)  at3:   [ 6] 0x80c76b8 ='spooling_disabled=0x0'

.. at this point nothing more gets logged to the log file.  Other
runaway processes show the same.
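
To compare where the different runaways stop logging, the last line
written by each pid can be pulled out of a queue log mechanically.
What follows is only a rough sketch: the regular expression is
inferred from the lines shown above rather than from the LPRng
sources, it assumes each log entry sits on one unwrapped line, and the
name last_message_per_pid is made up for illustration.

```python
import re

# Layout inferred from the log excerpt (an assumption, not an LPRng spec):
#   <timestamp> <host> [<pid>] (<role>)  <queue>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<host>\S+) \[(?P<pid>\d+)\] \((?P<role>[^)]*)\)\s+"
    r"(?P<queue>[^:\s]+):\s*(?P<msg>.*)$"
)


def last_message_per_pid(lines):
    """Return {pid: (timestamp, message)} for the final entry each pid logged.

    Lines that do not match the layout (for example the continuation
    lines of a multi-line value) are skipped.
    """
    last = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            last[int(match.group("pid"))] = (match.group("ts"), match.group("msg"))
    return last
```

Feeding it the open log file (for instance
last_message_per_pid(open(path)), where path is wherever the queue's
log lives) gives one entry per pid, so the entry for a runaway pid is
the last thing it managed to log.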

I've also debugged the runaway process with gdb a couple of times and
it's been stuck at the same point each time, namely after the call to
'opendir (".")' in Scan_queue in getqueue.c (line 82).  Here's the
backtrace from gdb:

(gdb) bt
#0  0x4023697b in malloc_consolidate () from /lib/i686/libc.so.6
#1  0x40236007 in _int_malloc () from /lib/i686/libc.so.6
#2  0x40235a34 in calloc () from /lib/i686/libc.so.6
#3  0x4026bdc7 in opendir () from /lib/i686/libc.so.6
#4  0x0804fa53 in Scan_queue (spool_control=0x80bc640, sort_order=0x80bc0b4,
    pprintable=0xbfff97bc, pheld=0xbfff97c0, pmove=0xbfff97c4,
    only_queue_process=1, perr=0xbfff97c8, pdone=0xbfff97cc,
    remove_prefix=0x0, remove_suffix=0x0) at common/getqueue.c:82
#5  0x0806280b in Do_queue_jobs (name=0x11 <Address 0x11 out of bounds>,
    subserver=0) at common/lpd_jobs.c:566
#6  0x080714dc in Receive_secure (sock=0xbfffa330, 
    input=0x808ab58 "U\211�WVS\203�\024\213}$\213u �E�")
    at common/lpd_secure.c:247
#7  0x080617d5 in Service_lpd (talk=-1, 
    from_addr=0xbfffa360 "129.215.45.134 port 43234")
    at common/lpd_dispatch.c:341
#8  0x080614d0 in Service_connection (args=0xbfffa360)
    at common/lpd_dispatch.c:310
#9  0x0805d8c5 in Do_work (name=0x809fd68 "server", args=0xbfffa490)
    at common/linelist.c:3847
#10 0x0805d676 in Make_lpd_call (name=0x809fd68 "server", passfd=0xbfffa4a0,
    args=0xbfffa490) at common/linelist.c:3820
#11 0x0805d9e7 in Start_worker (name=0x809fd68 "server", parms=0xbfffa500,
    fd=14) at common/linelist.c:3876
#12 0x0804d3a7 in Accept_connection (sock=8, lpd_socket=0, unix_socket=0)
    at common/lpd.c:1008
#13 0x0804be3a in main (argc=1, argv=0xbfffa7b4, envp=0x1) at common/lpd.c:687
#14 0x401d7a67 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)
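
Since the process has been at the same point each time, it may be
worth grabbing a backtrace automatically whenever a runaway is
spotted, so that several can be compared.  A minimal sketch, assuming
gdb is on the PATH and the caller is allowed to attach to the lpd
process; the function names are made up for illustration:

```python
import subprocess


def gdb_backtrace_command(pid, gdb="gdb"):
    """Build a non-interactive gdb invocation: attach to pid, run 'bt', exit."""
    return [gdb, "-batch", "-ex", "bt", "-p", str(pid)]


def capture_backtrace(pid, timeout=60):
    """Run gdb against a live process and return whatever it printed.

    Assumes sufficient privileges to attach (typically root for lpd).
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling capture_backtrace on the runaway pid a few times, some seconds
apart, would show whether it really never leaves malloc_consolidate or
only happens to be caught there.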

But I'm a bit stuck as to what I can do next to try to work out what
is happening to these processes to make them run away.  I can't debug
into the system calls.  Where is "." when the opendir is called?
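
On the "." question: on Linux, the current directory of a running
process can be read from the /proc/<pid>/cwd symlink, given
permission to inspect that process.  A small sketch follows; the
function name and the proc_root parameter are only there for
illustration and testing.  The log above lists
spooldir=/var/spool/lpd/at3, which is what one would expect to see if
lpd has changed into the spool directory, but that is an assumption to
check rather than a given.

```python
import os


def process_cwd(pid, proc_root="/proc"):
    """Return the current working directory of a process.

    Reads the <proc_root>/<pid>/cwd symlink; raises OSError if the
    process has gone away or the caller lacks permission.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "cwd"))
```

For a runaway such as pid 8178, process_cwd(8178) would then show
which directory the opendir(".") is being run against.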

I'd very much appreciate any advice on what I can do next - is it a
RH9 problem or an LPRng problem?  What can I try next, etc?

Cheers
Toby Blake
University of Edinburgh

> I'm running LPRng-3.8.27.
> 
> However, I'm still seeing runaway lpd processes - it's always the
> 'server' process and it consumes as much CPU as it can - an lpc kill
> fixes the problem, but obviously this impacts on the overall
> reliability of the printing system - it's happened twice in the last
> day or so.
> 
> I was wondering if anyone else has seen this problem at all.  Here's
> an example of it happening:
> 
> [kant]toby: lpq -Pat8
> Printer: [EMAIL PROTECTED] 'HP Laserjet 8150DN in 5.05 (Level 5 West Lab)
AT'
>  Queue: 7 printable jobs
>  Server: pid 8178 active
>  Status: job '[EMAIL PROTECTED]' removed at 16:02:46.028
>  Rank   Owner/ID               Pr/Class Job Files                 Size
Time
> 1      [EMAIL PROTECTED]                A   372 print.ps            224344
16:04:36
> [kant]toby:
> 
> .. with the 8178 process chewing up all CPU, until an lpc kill kills
> this process and gets the queue moving again.  Note that strace
> doesn't reveal anything at all - not a single line of output.  I have
> enabled debugging on this queue, so will hopefully get some
> information if/when I get the next runaway.



