Re: [Libevent-users] http input buffer deallocation problem
Niels Provos wrote:
> On Mon, May 12, 2008 at 3:33 AM, Cezary Rzewuski <[EMAIL PROTECTED]> wrote:
>> I suspect there may be two reasons, but I don't know if this is how
>> libevent works: 1) there is some priority among events processed in
>> libevent, and buffer deallocation remains at the tail of the event
>> queue; 2) maybe there is some HTTP header that causes libevent to
>> postpone closing the connection (one which the crawler uses but wget
>> does not).
>
> We would have to see your code to figure out what's going on. For
> example, HTTP has persistent connections, i.e. connections that stay
> open after a request has been handled. It would also be good to know
> which version of libevent you are using.
>
> Niels.

Actually, my code is very similar to SpyBye. However, I don't need sites to be cached or to wait for antivirus scanning, so it's a kind of simplified SpyBye. The libevent functions are used in exactly the same way as in SpyBye. I use libevent 1.4.3-stable.

After more extensive debugging I suspect that, under heavy workload, libevent just doesn't complete all the requests that are scheduled. I can see that my proxy receives a request from the crawler, sends the request to a www server, and then receives the response from the server. evhttp_send_reply() is then called to return the reply to the crawler. But then, strangely, evhttp_write() is never called for this particular connection. Consequently, evhttp_send_done(), which is responsible for freeing the connection and request structures, is never called either. As a result, the evhttp_connection and evhttp_request structures are never freed.

The situation I've described happens quite randomly and only when crawling (which means many requests). When I use the proxy with a browser (with a human doing the interacting :)), everything looks fine.

Cezary Rzewuski
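[One way to observe whether the teardown path ever runs is to register a close callback on each incoming connection and count live connections. A minimal sketch against the libevent 1.4 evhttp API; the handler names, counter, and port are illustrative, not the poster's actual proxy code. The req->evcon struct access is the 1.4-era style, where evhttp.h exposes the request struct:

    #include <stdio.h>
    #include <event.h>
    #include <evhttp.h>

    static int open_conns; /* crude live-connection counter */

    /* Called by libevent when an evhttp_connection is torn down. If this
     * never fires for a connection, its request/buffer memory is still live. */
    static void on_close(struct evhttp_connection *evcon, void *arg)
    {
        open_conns--;
        fprintf(stderr, "connection closed, %d still open\n", open_conns);
    }

    static void on_request(struct evhttp_request *req, void *arg)
    {
        open_conns++;
        evhttp_connection_set_closecb(req->evcon, on_close, NULL);

        /* ... proxying work would happen here; reply when done ... */
        struct evbuffer *body = evbuffer_new();
        evbuffer_add_printf(body, "hello\n");
        evhttp_send_reply(req, HTTP_OK, "OK", body);
        evbuffer_free(body);
    }

    int main(void)
    {
        event_init();
        struct evhttp *httpd = evhttp_start("0.0.0.0", 8080);
        evhttp_set_gencb(httpd, on_request, NULL);
        event_dispatch();
        return 0;
    }

A steadily growing open_conns under crawler load would confirm that connections are being opened faster than the library is closing them.]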
[Libevent-users] http input buffer deallocation problem
Hello,

I'm experiencing a problem with memory freeing in libevent (1.4.3-stable). I've implemented a proxy server using the libevent HTTP functions. Everything works OK as long as the workload isn't heavy; under load, memory allocation grows rapidly. My memory leak detection tool (I use MemoryScape from TotalView Technologies) reports that the memory which is never deallocated is allocated in the evbuffer_expand() function (when realloc is called). The backtrace is:

    event_dispatch
    event_loop
    event_base_loop
    event_read_body
    evbuffer_add
    evbuffer_expand

When I use a tool like wget to download a file, everything works fine. But when the proxy is serving crawler requests (which means a heavy workload), the heap grows. I suspect there may be two reasons, but I don't know if this is how libevent works: 1) there is some priority among events processed in libevent, and buffer deallocation remains at the tail of the event queue; 2) maybe there is some HTTP header that causes libevent to postpone closing the connection (one which the crawler uses but wget does not).

Any help appreciated.

Best regards,
Cezary Rzewuski
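[For context on that backtrace: an evbuffer grows via realloc inside evbuffer_expand() as data is added, and the allocation is only returned when the owning buffer is freed; draining alone does not shrink it. A tiny standalone sketch of that lifecycle (not the proxy code itself), using the 1.4 evbuffer API:

    #include <event.h>
    #include <string.h>

    int main(void)
    {
        struct evbuffer *buf = evbuffer_new();
        char chunk[4096];
        int i;
        memset(chunk, 'x', sizeof(chunk));

        /* Each add may trigger evbuffer_expand() -> realloc() internally. */
        for (i = 0; i < 1024; i++)
            evbuffer_add(buf, chunk, sizeof(chunk));

        /* Draining consumes the data but keeps the underlying allocation. */
        evbuffer_drain(buf, EVBUFFER_LENGTH(buf));

        /* Only freeing the buffer returns the memory. In evhttp this happens
         * when the request/connection is freed, so a request that never
         * completes pins its input buffer - consistent with the leak above. */
        evbuffer_free(buf);
        return 0;
    }
]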
Re: [Libevent-users] garbage collection in libevent
Niels Provos wrote:
> You could try to use event_set_mem_functions() to change the functions
> that libevent uses for memory allocation internally.

Which function exactly do you mean? I'd like to change the memory allocation functions to detect memory leaks, but I cannot find event_set_mem_functions, either by grepping the libevent source or through Google.

Cezary Rzewuski
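[Editorial note: event_set_mem_functions() is not in the 1.4.x tree, which explains why grepping found nothing; it appears in the later libevent 2.0 series (event2/event.h). Assuming a 2.0+ build, counting wrappers for leak hunting could look roughly like this; the counter and wrapper names are made up for illustration:

    #include <stdio.h>
    #include <stdlib.h>
    #include <event2/event.h>   /* libevent 2.0+; not present in 1.4.x */

    static size_t live_allocs;  /* crude counter, not thread-safe */

    static void *counting_malloc(size_t sz)
    {
        live_allocs++;
        return malloc(sz);
    }

    static void *counting_realloc(void *p, size_t sz)
    {
        if (p == NULL)
            live_allocs++;      /* realloc(NULL, sz) is an allocation */
        return realloc(p, sz);
    }

    static void counting_free(void *p)
    {
        if (p != NULL)
            live_allocs--;
        free(p);
    }

    int main(void)
    {
        /* Must be called before any other libevent function. */
        event_set_mem_functions(counting_malloc, counting_realloc,
                                counting_free);

        struct event_base *base = event_base_new();
        event_base_free(base);

        fprintf(stderr, "libevent allocations still live: %zu\n", live_allocs);
        return 0;
    }
]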
Re: [Libevent-users] http: libevent vs many threads
Thank you for your suggestions. I've just finished the implementation. I used the approach of libevent as the HTTP server with threads working on the downloaded content (they perform some statistical computation on the downloaded JavaScript). It seems to work efficiently.

It's probably not the right group for this, but you say that switching between threads is expensive. However, I've read somewhere (it was probably "Advanced Linux Programming" by Alex Samuel) that creating a new thread is nearly as fast as calling a function. Does that mean that switching between threads is slower than creating a new thread?

Once more: thanks for the comprehensive answer.

William Ahern wrote:
> On Wed, Apr 02, 2008 at 10:15:59PM +0200, Cezary Rzewuski wrote:
>> Hi,
>> I'd like to ask whether sending HTTP requests with libevent is carried
>> out in a separate thread, or whether the library is single-threaded? I
>> want to use the library in a program which will visit many URLs and
>> download their content. Is it a good idea to use libevent, or would
>> the classic solution of creating a separate thread per URL request be
>> much more efficient?
>
> It depends. What you describe is not nearly enough information to even
> give a suggestion.
>
> One thread per URL is normally a very poor choice (just as a matter of
> runtime efficiency), unless each URL causes you to do a lot of disk
> I/O, or each URL causes you to do CPU-intensive operations, like
> decoding compressed audio/video. In each of those two situations, the
> process context-switching costs are diminished relative to the type of
> work being done. Basically, the idea is that if your thread will block
> on an operation--CPU or I/O--but another thread running in parallel
> (not merely concurrently) could utilize additional resources, you want
> to multi-thread.
>
> If your application is merely moving bytes (say, as a proxy), usually a
> single thread is enough; you can multiplex non-blocking network
> operations on a single thread. In that sense, you're "switching
> contexts" in the application, and not the kernel. This reduces the
> workload, because context switching in the kernel is usually more
> expensive.
>
> OTOH, copying data is in itself CPU-intensive. If you read into a
> buffer from one socket, you might evict previous data you read in
> earlier. If you then try to re-read and/or copy that previous data over
> to another buffer later, the process will block as the data is fetched
> from RAM. If your proxy is on even a 100Mb connection, depending on how
> you process the data, you may well need multiple threads. That's
> because 100Mb of network data could balloon to 5x or 10x that amount of
> byte shuffling. Of course, depending on how the L1, L2 and L3 caches
> are shared, it might not actually make much of a difference. It all
> depends!
>
> Of course, you can always use an event-oriented model within each
> particular thread, or spread event delivery and processing across
> multiple threads. Given that you seem new to this (or at least new to
> the particular problem you're trying to solve), your best bet is to use
> a single thread using libevent, or go totally multi-threaded without
> libevent. In 90% of the circumstances one of those options (though not
> both) is as near to optimal as you'll get, and you don't need the
> headaches of any additional complexity.
>
>> I saw that libevent is used in SpyBye, which is kind of similar to
>> what I want to do. I was wondering whether SpyBye would be more
>> efficient with requests served in separate threads instead of using
>> libevent (I don't say that it's not efficient, just theoretically).
> I'm not sure; maybe it's most efficient using _both_. But I suspect it
> probably just uses libevent in a single thread.
>
> Note, there are other ways to use threads. You could use one thread
> running libevent to handle all your queries and network I/O. Then you
> could use a separate worker thread pool to, for instance, run ClamAV on
> the data. This works well if you can isolate your CPU-intensive work
> from the mundane network I/O parts. If your application is overall CPU
> bound, and the latency of particular requests isn't of primary concern,
> then it doesn't matter that libevent is running in a single thread. All
> your CPUs are doing work, just not the same types of work.
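[The "libevent thread for I/O plus a worker pool for CPU work" split described above can be sketched with a pthread work queue. Everything here (the job struct, queue, and function names) is illustrative, not SpyBye's actual design; the libevent thread calls submit_job() when a body has been downloaded, and workers do the heavy scanning:

    #include <pthread.h>
    #include <stdlib.h>

    /* A downloaded document handed off for CPU-heavy processing. */
    struct job {
        char *data;
        size_t len;
        struct job *next;
    };

    static struct job *queue_head;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

    /* Called from the libevent thread; takes ownership of data. */
    void submit_job(char *data, size_t len)
    {
        struct job *j = malloc(sizeof(*j));
        j->data = data;
        j->len = len;
        pthread_mutex_lock(&queue_lock);
        j->next = queue_head;
        queue_head = j;
        pthread_cond_signal(&queue_cond);
        pthread_mutex_unlock(&queue_lock);
    }

    /* Each worker blocks on the queue and does the CPU-bound work
     * (e.g. scanning), leaving the event loop free to move bytes. */
    static void *worker(void *arg)
    {
        for (;;) {
            pthread_mutex_lock(&queue_lock);
            while (queue_head == NULL)
                pthread_cond_wait(&queue_cond, &queue_lock);
            struct job *j = queue_head;
            queue_head = j->next;
            pthread_mutex_unlock(&queue_lock);

            /* scan(j->data, j->len); -- hypothetical CPU-intensive step */
            free(j->data);
            free(j);
        }
        return NULL;
    }

    void start_workers(int n)
    {
        while (n-- > 0) {
            pthread_t tid;
            pthread_create(&tid, NULL, worker, NULL);
        }
    }
]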
[Libevent-users] http: libevent vs many threads
Hi,
I'd like to ask whether sending HTTP requests with libevent is carried out in a separate thread, or whether the library is single-threaded? I want to use the library in a program which will visit many URLs and download their content. Is it a good idea to use libevent, or would the classic solution of creating a separate thread per URL request be much more efficient?

I saw that libevent is used in SpyBye, which is kind of similar to what I want to do. I was wondering whether SpyBye would be more efficient with requests served in separate threads instead of using libevent (I don't say that it's not efficient, just theoretically).

Kind regards,
Cezary
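[For the single-threaded option asked about here: libevent's HTTP client lets one thread issue many requests and multiplex them on the event loop. A hedged sketch against the 1.4-era client API; the host and URIs are placeholders, and connection cleanup is omitted for brevity:

    #include <stdio.h>
    #include <event.h>
    #include <evhttp.h>

    static void on_response(struct evhttp_request *req, void *arg)
    {
        if (req == NULL) {
            fprintf(stderr, "request failed\n");
            return;
        }
        printf("%s -> %d, %zu body bytes\n", (char *)arg,
               req->response_code,
               (size_t)EVBUFFER_LENGTH(req->input_buffer));
        /* A real client would also free the evhttp_connection when done
         * (evhttp_connection_free); omitted here to keep the sketch short. */
    }

    /* Issue one GET; libevent multiplexes all requests on a single thread. */
    static void fetch(const char *host, const char *uri)
    {
        struct evhttp_connection *conn = evhttp_connection_new(host, 80);
        struct evhttp_request *req = evhttp_request_new(on_response,
                                                        (void *)uri);
        evhttp_add_header(req->output_headers, "Host", host);
        evhttp_make_request(conn, req, EVHTTP_REQ_GET, uri);
    }

    int main(void)
    {
        event_init();
        /* example.com is a placeholder target */
        fetch("example.com", "/");
        fetch("example.com", "/index.html");
        event_dispatch();  /* runs until all pending requests complete */
        return 0;
    }
]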