> Hello,
>
> Apache httpd 2.x is running into some odd CGI problems on Solaris 10. We've
> had a number of people hit this in the wild - including myself using Apache
> 2.0.54 (our latest stable release). We're stumped and we're trying to see if
> we can find some help from people who know Solaris fairly well. =)
>
> We've narrowed it down to some weirdness with reading and writing sockets
> between processes.
Do you use AF_INET or AF_UNIX sockets? I'd guess that these are AF_UNIX
sockets. It looks similar to bug

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6227895

or

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6249138

If you are using AF_INET sockets, it is something else.

The two bugs above are fixed in SolarisExpress/OpenSolaris bits. There is an
S10 port, which will show up with the update.

- Alexander Kolbasov

>
> First off, here's our httpd bug report:
>
> <http://issues.apache.org/bugzilla/show_bug.cgi?id=34264>
>
> We have confirmed this with Solaris 10 GA with all current Sol10 GA patches
> applied. We have heard reports that 'go[ing] back to the next to last Solaris
> Express driver prior to GA' works. So, this bug must have been introduced
> just before 10 went GA. There were no issues with Solaris 9 or earlier.
>
> Here's the overview: When httpd 2.x is threaded (with the 'worker' MPM), CGIs
> are handled by a dedicated process that sits in an accept() loop. When an
> incoming request gets assigned to a thread and needs to exec() a CGI, the
> thread creates a socket to this standalone cgid process. The thread then
> writes a bunch of information over this socket - such as the program name,
> arguments, etc. The cgid process then reads this data and executes the script
> accordingly and shuffles back the program output over that socket.
>
> Here's the issue we have: the environment variables to use in the CGI are
> passed in a <4-byte len><value> format on the socket from the thread. At
> times, the environment variable length is 'skipped' or corrupted. This causes
> httpd to think that there's a *lot* of data to be read - it then calloc's
> roughly 1GB of memory (ouch!).
>
> Through dtrace, we know that we're writing it successfully to the socket - but
> it will occasionally come out on the other side corrupted. N.B. if you truss
> the program, it'll work just fine.
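Independent of the kernel bug, the reader side of a <4-byte len><value> stream can protect itself from a bad length. What follows is only a minimal sketch, not the code in mod_cgid.c: the names read_full and read_field and the MAX_FIELD_LEN cap are made up for illustration, and it assumes the length is a 32-bit integer in host byte order (both ends run on the same machine). It loops over short reads and refuses any length above the cap instead of allocating it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sanity cap; not a limit taken from mod_cgid. */
#define MAX_FIELD_LEN (64 * 1024)

/* Read exactly len bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or premature end of stream. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* peer closed before the record was complete */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read one <4-byte len><value> record.
 * Returns a malloc'd, NUL-terminated copy of the value, or NULL if the
 * stream ended, a read failed, or the length exceeds MAX_FIELD_LEN. */
static char *read_field(int fd)
{
    uint32_t len;
    char *val;

    if (read_full(fd, &len, sizeof(len)) != 0)
        return NULL;
    if (len > MAX_FIELD_LEN)
        return NULL; /* implausible length: do not allocate it */

    val = malloc((size_t)len + 1);
    if (val == NULL)
        return NULL;
    if (read_full(fd, val, len) != 0) {
        free(val);
        return NULL;
    }
    val[len] = '\0';
    return val;
}
```

With a check like this the cgid side would drop the request rather than calloc roughly 1GB. It does not fix the missing bytes; it only limits the damage.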
>
> Here's a dtrace file you might find helpful:
>
> <http://people.apache.org/~jerenkrantz/httpd.d>
>
> (I'm new to dtrace, so there might be easier ways to write this script.)
>
> Configure httpd with --enable-mpm=worker such that we only have 1 worker
> thread:
>
> <IfModule worker.c>
> StartServers 1
> MaxClients 1
> MinSpareThreads 1
> MaxSpareThreads 1
> ThreadsPerChild 1
> MaxRequestsPerChild 0
> </IfModule>
>
> Run it with:
>
> ./httpd.d <pid of worker process> <pid of cgid process>
>
> The worker process is the httpd with multiple threads; look for a thread that
> has these stack characteristics (this is the idle 'worker' waiting for a
> connection):
> ----------------- lwp# 3 / thread# 3 --------------------
> cba7d3a9 lwp_park (0, 0, 0)
> cba77c2a cond_wait_queue (81e5768, 81e5734, 0, 0) + 3b
> cba78123 _cond_wait (81e5768, 81e5734) + 66
> cba78165 cond_wait (81e5768, 81e5734) + 21
> cba7819e pthread_cond_wait (81e5768, 81e5734) + 1b
> cbcaf9a6 apr_thread_cond_wait (81e5760, 81e5730) + 36
> 080ada99 ap_queue_pop (81e5718, ca49dfa4, ca49df98) + 69
> 080aac3f worker_thread (81e5838, 87cfdd8) + 10f
> cbcaac0a dummy_worker (81e5838) + 3a
> cba7d03f _thr_setup (ca9b0000) + 4e
> cba7d330 _lwp_start (ca9b0000, 0, 0, ca49dff8, cba7d330, ca9b0000)
>
> The cgid program will have a pstack output similar to:
>
> 15089: httpd -k start
> cba7d905 accept (25, 8046b7a, 8046b64, 1)
> 0809c1e4 cgid_server (811f820) + 314
> 0809c852 cgid_start (811b0a0, 811f820, 8119e18) + a2
> 0809b405 cgid_maint (0, 8119e18, f) + 95
> cbcad0cd apr_proc_other_child_alert (8046c7c, 0, f) + 7d
> cbcad300 apr_proc_other_child_read (8046c7c, f) + 30
> 080ac1e9 server_main_loop (0) + 199
> 080ac57b ap_mpm_run (811b0a0, 8153180, 811f820) + 2eb
> 080b5a34 main (3, 8046d44, 8046d54) + 9c4
> 0807ba3a ???????? (3, 8046e84, 8046ea4, 8046ea7, 0, 8046ead)
>
> Pretty much any CGI script will demonstrate the problem. There's a few in
> the httpd PR link above.
>
> So, you could run dtrace with:
>
> ./httpd.d 14581 15089
>
> Here's a 'bad' output from a Solaris 10/Intel (SMP) box:
>
> -----
> 0 40471 get_req:entry (15274) Entering get_req!
> 0 10 read:entry read (15274) fd: 38 - 64 bytes
> 0 11 read:return read (15274) fd: 38 - 64 bytes
> 0 11 read:return
> 0 10 read:entry read (15274) fd: 38 - 42 bytes
> 0 11 read:return read (15274) fd: 38 - 42 bytes
> 0 11 read:return /home/jerenk/public_html/weblog/weblog.cgi
> 0 10 read:entry read (15274) fd: 38 - 10 bytes
> 0 11 read:return read (15274) fd: 38 - 10 bytes
> 0 11 read:return weblog.cgii/software/
> 0 10 read:entry read (15274) fd: 38 - 20 bytes
> 0 11 read:return read (15274) fd: 38 - 20 bytes
> 0 11 read:return /weblog.cgi/software
> 0 10 read:entry read (15274) fd: 38 - 4 bytes
> 0 11 read:return read (15274) fd: 38 - 4 bytes
> 0 11 read:return 1430084180
> 0 11 read:return TZ=U
> ...UH-OH. This isn't correct....15274 will now go allocate that much memory...
> 1 12 write:entry write (14581) fd: 38 - 64 bytes
> 1 12 write:entry
> 1 13 write:return wrote (14581): 64
> 1 12 write:entry write (14581) fd: 38 - 42 bytes
> 1 12 write:entry /home/jerenk/public_html/weblog/weblog.cgi
> 1 13 write:return wrote (14581): 42
> 1 12 write:entry write (14581) fd: 38 - 10 bytes
> 1 12 write:entry weblog.cgi
> 1 13 write:return wrote (14581): 10
> 1 12 write:entry write (14581) fd: 38 - 20 bytes
> 1 12 write:entry /weblog.cgi/software
> 1 13 write:return wrote (14581): 20
> 1 12 write:entry write (14581) fd: 38 - 4 bytes
> 1 12 write:entry
> 1 13 write:return wrote (14581): 4
> 1 12 write:entry write (14581) fd: 38 - 13 bytes
> 1 12 write:entry TZ=US/Pacific
> 1 13 write:return wrote (14581): 13
> 1 12 write:entry write (14581) fd: 38 - 4 bytes
> 1 12 write:entry $
> 1 13 write:return wrote (14581): 4
> 1 12 write:entry write (14581) fd: 38 - 36 bytes
> 1 12 write:entry HTTP_HOST=weblog.erenkrantz.com:8080
> 1 13 write:return wrote (14581): 36
> 1 12 write:entry write (14581) fd: 38 - 4 bytes
> 1 12 write:entry l
> 1 13 write:return wrote (14581): 4
> 1 12 write:entry write (14581) fd: 38 - 108 bytes
> 1 12 write:entry HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050819 Firefox/1.0+
> 1 13 write:return wrote (14581): 108
> ...snip...pid 15274 is done allocating 1430084180 bytes....
> 1 10 read:entry read (15274) fd: 38 - 1430084180 bytes
> 1 11 read:return read (15274): Short read: 1430084180, 165
> 1 11 read:return S/Pacific
> 1 10 read:entry read (15274) fd: 38 - 1430084015 bytes
> 1 11 read:return read (15274): Short read: 1430084015, 1849
> 1 11 read:return Data too large
> 1 10 read:entry read (15274) fd: 38 - 1430082166 bytes
> 1 11 read:return read (15274): Short read: 1430082166, 0
> 1 11 read:return
> 1 40472 get_req:return (15274) Leaving get_req!
> -----
>
> Since it's an SMP box, I'm guessing that'll partially explain why they appear
> out of order. (This is the first connection, so it's not from somewhere
> else.)
>
> For 4 byte reads, I have dtrace doing the length conversion on the read data.
>
> Notice the write pattern: 64 bytes, 42 bytes, 10, 20, 4, 13, 4, 36....
> Notice the read pattern: 64 bytes, 42 bytes, 10, 20, 4**, 1430084180...
>
> That 4-byte read is corrupted. This causes the reader to allocate 1430084180
> bytes.
>
> So, does this ring a bell for anyone?
>
> Our code worked fine on Solaris 9 - which I was using until yesterday on this
> particular machine. And, as mentioned before, it also worked fine on pre-GA
> releases of Sol10. (It also works identically on FreeBSD, Linux, etc, etc.)
>
> The cgid source file is here:
>
> <http://svn.apache.org/repos/asf/httpd/httpd/tags/2.0.54/modules/generators/mod_cgid.c>
>
> The thread from [email protected] today is here:
>
> <http://mail-archives.apache.org/mod_mbox/httpd-dev/200508.mbox/[EMAIL PROTECTED]>
>
> Thanks in advance for any help.
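One observation on the bad length, as a sketch under two assumptions: the length is read as a 32-bit integer in host byte order, and the box is little-endian (it is described as Solaris 10/Intel). 1430084180 is 0x553D5A54, and those four bytes in memory order are 'T' 'Z' '=' 'U'. That matches the "TZ=U" the script prints for the same read, and the next read returning "S/Pacific". So the reader did not get random garbage: it got the first four bytes of "TZ=US/Pacific" where the length should have been, which points at the preceding 4-byte write (length 13) never showing up in the stream rather than being altered. The helper name le32_from_bytes below is made up for illustration.

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian 32-bit integer, which is how an
 * x86 reader sees a raw 4-byte length pulled off the socket. */
static uint32_t le32_from_bytes(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Calling le32_from_bytes on the bytes "TZ=U" gives 1430084180, the exact size the cgid process tried to read. The same decoding explains the write side of the trace: the 4-byte lengths 36 and 108 print as "$" and "l" because those are the ASCII characters with those codes.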
> -- justin
> _______________________________________________
> opensolaris-code mailing list
> [email protected]
> https://opensolaris.org:444/mailman/listinfo/opensolaris-code
