Re: mod_fcgid: Excessive memory usage when large files are uploaded
Hello,

It turns out that there is an existing bug report for this, dating back to 2011: https://bz.apache.org/bugzilla/show_bug.cgi?id=51747

The latest proposed patch, by Jan Stürtz on 2015-08-28, fixes both the Linux and Windows versions and still applies cleanly. Furthermore, it uses the same algorithm as mine but is better coded, since it uses the existing Apache APR library functions. This patch seems suitable for production, and we're already evaluating it on our servers.

Best regards.
--Ivan

On 17.1.2017, 17:49, Ivan Zahariev wrote:
> [quoted message trimmed; it appears in full below]
Re: mod_fcgid: Excessive memory usage when large files are uploaded
Hi,

Here is the patch: https://github.com/famzah/mod_fcgid/commit/84c7c2dbf2047745c6aea87a4bc6b4061c98ac8e

A few notes:

* I tested it with several files containing random data of various lengths: 0b, 1b, 127b, 128b, 129b, 8191b, 8192b, 8193b, 1MB, 150MB, 250MB, 350MB. The uploaded size and SHA1 checksum match in every case.
* I also made sure that the special case "There are something left" in the code was exercised, at the end when "nvec != 0".
* I can't test this on Windows, but porting the patch to "fcgid_proc_win.c" looks easy, since the same loop method is used there too. Additionally, "fcgid_proc_win.c" doesn't group write requests into batches of 8 and does everything one by one (one read, then one write to the IPC socket).

Please review the patch. If it looks good, I'll test it on our production machines and then submit it for a mainstream merge.

Best regards.
--Ivan

On 17.1.2017, 13:03, Yann Ylavic wrote:
> On Tue, Jan 17, 2017 at 9:06 AM, Ivan Zahariev wrote:
>> 1. Delete each bucket after sending it to the "ipc_handle". I've looked through the call tree, and the *output_brigade is last used by proc_write_ipc(). Therefore, it should be safe to empty it while it is being processed there.
>> 2. Take the same approach as mod_http2, which handles FILE buckets in a different way. Instead of using apr_bucket_read(), they process FILE buckets with apr_file_read() and manage the data buffer manually. This way the original *output_brigade won't be modified and automatically split by apr_bucket_read(). This requires more coding work.
>
> I would go for 1., but keep in mind that you can't delete the bucket while it is still pointed to by the iovec... An (optimal) alternative would be to use apr_socket_sendfile() (for file buckets) like sendfile_nonblocking() does in httpd's core output filter (see server/core_filters.c). But 1. is still wise for non-file buckets.
>
> Regards,
> Yann.
mod_fcgid: Excessive memory usage when large files are uploaded
Hello,

If a large amount of data is POST'ed to a process running mod_fcgid, the Apache child uses an excessive amount of memory while processing it. The client request is properly received, and the following statement from the documentation holds: "Once the amount of request body read from the client exceeds FcgidMaxRequestInMem bytes, the remainder of the request body will be stored in a temporary file."

The problem occurs when the temporary file is being sent to the FastCGI handling process via its IPC socket. Here is the corresponding function which sends the prepared "output_brigade" to the IPC socket: https://gist.github.com/famzah/125a91971fba3450dd4926fe13e0ede6

The documentation of apr_bucket_read() clearly states: "if buckets are read in a loop, and aren't deleted after being processed, the potentially large bucket will slowly be converted into RAM resident heap buckets. If the file is larger than available RAM, an out of memory condition could be caused."

I need your guidance in order to fix this properly. I've researched a bit and see the following possible options:

1. Delete each bucket after sending it to the "ipc_handle". I've looked through the call tree, and the *output_brigade is last used by proc_write_ipc(). Therefore, it should be safe to empty it while it is being processed there.

2. Take the same approach as mod_http2, which handles FILE buckets in a different way. Instead of using apr_bucket_read(), it processes FILE buckets with apr_file_read() and manages the data buffer manually. This way the original *output_brigade won't be modified and automatically split by apr_bucket_read(). This requires more coding work.

Best regards.
--Ivan
Re: mod_fcgid: Immediate HTTP error 503 if the max total process count is reached
Hi,

Thanks for the review. You can find my comments inline below.

On 20.9.2016, 16:33, Eric Covener wrote:
> Unfortunately there are not many reviewers for mod_fcgid. I tried to take a look as a relative layman and had a few comments/questions:
>
> * a few C99 // comments were added and should be zapped

Fixed.

> * the assert looks funny -- no assert. maybe ap_debug_assert() so it blows up in maintainer builds only?

It's a paranoid check. If it ever blows the whistle, it will be in some very hard-to-reproduce case, and I don't expect that we would catch it in a maintainer build. I made this an APLOG_CRIT message because a mismatch between the counts of proctable_count_all_nonfree_nodes() and "g_total_process" could make FastCGI requests fail erroneously with an immediate HTTP 503 error. A system administrator who encounters this bug will appreciate an immediate APLOG_CRIT message in the "error_log". I don't insist that it remain like this, and I will change it as you suggest in order to comply with the usual development practices. Or we can give this check function a better name.

> * The new flag is merged, but cannot be unset?
> ** Probably need to make it look like an AP_INIT_FLAG

Right. I've converted this to an AP_INIT_FLAG, which makes more sense. I don't understand what you mean by "it cannot be unset" -- I'm merging it in the same way as the other mod_fcgid config options. Please read below for more information on this matter.

> * I couldn't follow the concept in the comments around max_process_count moving or being per-request. Is it maybe out of date? I think the current impl takes an existing global only and merges it with vhost server confs but it will always be a copy.

The original implementation allows you to set "max_process_count" only in the GLOBAL_ONLY context. This is enforced by ap_check_cmd_context() in the AP_INIT_TAKE1() handler, because RSRC_CONF would otherwise allow this config option to be defined "per vhost" as well. I've added the new variable "max_process_used_no_wait_enable" in the same way.

The original implementation works without further magic because it only uses "max_process_count" in code which operates on the "main_server" (global) context. Now I need to access the value of "max_process_count" through a "request_rec" (per-VirtualHost) variable. If one or more mod_fcgid directives are defined not only in the main server config but also as additional settings in a VirtualHost context, then we need to merge "max_process_count" too. If we don't merge, the value of "max_process_count" returned via the "request_rec" variable will be the default init value (zero, in our case).

Long story short -- you are right: this is a global-only variable, and it merges into the vhost confs always as a copy of the global value. This is just a behind-the-scenes action and doesn't affect Apache administrators, since they can't define this variable in a vhost context anyway. I tried to summarize this in the comment ["global only" but we use its main_server value in a "request_rec"], but maybe someone can propose a better text.

I've committed the source code changes, and you can review them again. When we're satisfied with the changes, I'll test the final version on our server fleet.

Best regards.
--Ivan
Re: mod_fcgid: Immediate HTTP error 503 if the max total process count is reached
Hello devs,

It's been four months since I originally proposed this new feature for "mod_fcgid". During this time I've tested and deployed it on hundreds of busy production servers, and it works as expected: if enabled, web visitors get an immediate response when FastCGI is overloaded, and no RAM is over-utilized.

You can find all the information and a patch in the enhancement request at Bugzilla: https://bz.apache.org/bugzilla/show_bug.cgi?id=59656

Can we get this merged into "mod_fcgid"?

Best regards.
--Ivan

On 2.6.2016, 11:57, Ivan Zahariev wrote:
> [quoted message trimmed; it appears in full below]
Re: mod_fcgid: Immediate HTTP error 503 if the max total process count is reached
Hi Nick,

Thanks for the info. I've followed your instructions and submitted an enhancement request: https://bz.apache.org/bugzilla/show_bug.cgi?id=59656

Cheers.
--Ivan

On 31.5.2016, 13:45, Nick Kew wrote:
> On Tue, 2016-05-31 at 11:15 +0300, Ivan Zahariev wrote:
>> Hello, I got no feedback. Am I posting this suggestion at the right mailing list?
>
> Sorry, I see your original post marked for attention in my mail folder, but languishing hitherto unattended. Just now opened your link in a browser to take a look. There could be others who have done something similar.
>
> As a general reply to this question: yes, this is the best available mailing list. The other place to post it would be as an enhancement request in Bugzilla (issues.apache.org). The keyword "PatchAvailable" there may help by marking it as low-hanging fruit.
Re: mod_fcgid: Immediate HTTP error 503 if the max total process count is reached
Hello,

I got no feedback. Am I posting this suggestion at the right mailing list?

Best regards.
--Ivan

On 19.5.2016, 10:40, Ivan Zahariev wrote:
> [quoted message trimmed; it appears in full below]
mod_fcgid: Immediate HTTP error 503 if the max total process count is reached
Hi all,

I'd like to propose a new configuration setting for "mod_fcgid". The source code changes to review:

* The whole patch compared to version 2.3.9: https://github.com/famzah/mod_fcgid/compare/2.3.9...maxnowait?diff=split&name=maxnowait
* The whole patch as a single file: https://github.com/famzah/mod_fcgid/compare/2.3.9...maxnowait.patch?diff=unified&name=maxnowait
* Every single commit compared to version 2.3.9: https://github.com/famzah/mod_fcgid/commits/maxnowait
* There should be no merge conflicts with the current "trunk" version 2.3.10.

The motivation is that the current behavior of queueing up new pending requests differs from the behavior of the RLimitNPROC directive. When there is a spike in web hits, lots of Apache children get busy just waiting for up to 30+ seconds for an idle FastCGI process. Thus we "waste" Apache children doing nothing while they could be serving static content, and this also puts unneeded pressure on memory. Additionally, users today won't wait for a page to load if it takes more than a few seconds; we'd rather notify them that we are currently overloaded by sending them an HTTP 503 response immediately.

Here is the documentation for the new directive: http://www.famzah.net/temp/FcgidMaxProcessesUsedNoWait.docs.txt

Let me know what you think.

Best regards.
--Ivan