Re: select() efficiency / epoll
Davide Libenzi wrote:
> There is no known problem in using epoll_ctl() in one thread while another does epoll_wait(). I suggest you ask Valgrind to take a look at your binary. Since I have no clue what your software does, please create the *minimal* code snippet that exploits the eventual problem, and post it.

Yes, I have pretty much confirmed this. Unfortunately, I tried to make a minimal code snippet that demonstrates the problem, but wasn't able to do so before I figured out a work-around. I may still try to create something for you to test against so you can fix it. But I'm going to have to keep working with the existing implementation, since I'll be running this code on some production servers where updating the kernel might not be an option.

The work-around is as follows:

1) I create a queue that can hold operations to perform on the epoll structure, and I protect it with a mutex.

2) Other threads, when they need to modify the epoll set, lock the mutex and enqueue the operation into the operation queue instead of calling epoll_ctl() themselves (e.g. add this socket for reading, add this socket for writing, remove this socket, etc.) *and* then cancel the epoll_wait(). I implemented the cancel by having a pipe() always being watched for read, and writing a byte to it when I want to cancel (is there a better way?). There are several operations that could be supported (add/remove/modify/change userdata/etc.), but I only need two myself.

3) There's only one thread that actually does the epoll_wait(). When epoll_wait() returns, I first drain the cancel pipe so it never fills up, then handle whatever events need handling, and then lock the operations queue mutex, perform all the operations in the queue, and clear the queue.

So, this works for me now. Thanks for all your guys' info.

-- Davy

P.S. Davide, I still might get you that snippet, but it's not a trivial snippet as you can imagine... timing is everything to the problem :( .. and there's also the question of WHERE it corrupts memory.. it seemed to be unpredictable so far.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
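For readers who want to see the shape of the work-around described in this message (operation queue under a mutex, a wake-up pipe to cancel the wait, and a single polling thread), here is a minimal sketch. It is written in Python against the standard `select.epoll` wrapper rather than the C API, and the class name `EpollLoop`, the method names, and the `"add"`/`"remove"` action strings are all hypothetical names chosen for illustration; it is not the poster's code.

```python
import os
import select
import threading
from collections import deque


class EpollLoop:
    """One thread owns the epoll object; other threads only enqueue requests."""

    def __init__(self):
        self._epoll = select.epoll()
        self._ops = deque()             # pending (action, fd, eventmask) tuples
        self._lock = threading.Lock()   # protects self._ops
        # Wake-up pipe: always watched for read, written to cancel the wait.
        self._wake_r, self._wake_w = os.pipe()
        os.set_blocking(self._wake_r, False)
        self._epoll.register(self._wake_r, select.EPOLLIN)

    def request(self, action, fd, eventmask=0):
        """Callable from any thread: queue an operation, then cancel the wait."""
        with self._lock:
            self._ops.append((action, fd, eventmask))
        os.write(self._wake_w, b"\0")

    def _drain_wake_pipe(self):
        # Empty the pipe so it can never fill up.
        try:
            while os.read(self._wake_r, 4096):
                pass
        except BlockingIOError:
            pass

    def _apply_queued_ops(self):
        # Take the whole queue under the lock, then touch epoll outside it.
        with self._lock:
            ops = list(self._ops)
            self._ops.clear()
        for action, fd, eventmask in ops:
            if action == "add":
                self._epoll.register(fd, eventmask)
            elif action == "remove":
                self._epoll.unregister(fd)

    def run_once(self, handler, timeout=None):
        """Called only by the single polling thread."""
        events = self._epoll.poll(timeout)
        if any(fd == self._wake_r for fd, _ in events):
            self._drain_wake_pipe()
        for fd, mask in events:
            if fd != self._wake_r:
                handler(fd, mask)
        self._apply_queued_ops()
```

Only `run_once()` ever touches the epoll object after construction, so the question of concurrent epoll_ctl()/epoll_wait() never arises; the cost is one extra wake-up per queued request.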
Re: select() efficiency / epoll
Jari Sundell wrote:
> On 8/23/05, Davy Durham <[EMAIL PROTECTED]> wrote:
>
> I was hoping you would mention in your reply that you knew epoll_data_t was a union and you didn't touch epoll_data::fd, so I wouldn't have to say it explicitly. ;)

Oh!.. unless epoll_data_t is a union just for convenience, in that it already has an 'int fd' if you want to use that, but you don't have to.. that at least makes the void *ptr useful. The example in 'man epoll' sort of made it look necessary to set the 'fd' member of the union.

But that still doesn't fix the issue, of course.. but good to know.
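To make the union point concrete, here is a small sketch using Python's `ctypes` to mirror the layout of C's `epoll_data_t` (members `ptr`, `fd`, `u32`, `u64`). The class name `EpollData` and the sample values are made up for illustration. It shows why setting `fd` after `ptr` would leave a "pointer" holding a low integer, which is the failure mode Jari was asking about.

```python
import ctypes


class EpollData(ctypes.Union):
    """Illustrative mirror of C's epoll_data_t: all members share one storage."""
    _fields_ = [
        ("ptr", ctypes.c_void_p),
        ("fd", ctypes.c_int),
        ("u32", ctypes.c_uint32),
        ("u64", ctypes.c_uint64),
    ]


data = EpollData()
data.ptr = 0x12345678   # pretend this is a user-data pointer
data.fd = 7             # storing an fd reuses the same bytes
clobbered = data.ptr    # no longer 0x12345678
```

On a little-endian machine `clobbered` reads back as 7. The practical rule is to pick one member per registration and read back that same member from the event.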
Re: select() efficiency / epoll
Jari Sundell wrote:
> On 8/23/05, Davy Durham <[EMAIL PROTECTED]> wrote:
>
> I was hoping you would mention in your reply that you knew epoll_data_t was a union and you didn't touch epoll_data::fd, so I wouldn't have to say it explicitly. ;)

No, I saw that epoll_data_t was a union (although it kind of makes the ptr useless as a user data pointer.. but I'm not using it for that).

When I say that pointers are getting corrupted, I mean in other parts of the code (actually it's some C++ STL container's data, completely unrelated to the epoll-specific code). Something, somewhere seems to be writing to memory that it's not supposed to be writing to. And as far as I can tell, it happens when I use epoll and doesn't when I use select :-/

-- Davy
Re: select() efficiency / epoll
Davide Libenzi wrote:
> I should mention that the 2.4 patch is old WRT mainline epoll in 2.6 (I stopped maintaining it when 2.6 went "stable"). I'd definitely suggest using 2.6 if you are looking at epoll.

I am using linux-2.6.11 and glibc-2.3.4.. and using select() in its place seems to work fine.

Are there any known issues with, say, one thread doing epoll_wait() calls while other threads may be doing epoll_ctl() calls?

Is there someone else I should be asking this question?

Thanks,
Davy
Re: select() efficiency / epoll
Thanks for the info.. I did find this thread and was wondering if this patch ever got put in:
http://www.ussg.iu.edu/hypermail/linux/kernel/0303.3/1139.html

Willy Tarreau wrote:
> On Tue, Aug 23, 2005 at 06:24:42AM -0500, Davy Durham wrote:
>> That's probably a good idea. Where would I find out what other projects use it?
>
> I use it in my load-balancer (haproxy), and it could somewhat match your needs, because I ported the earlier select()-based version to epoll() with the smallest possible changes. Indeed, the new epoll() loop still uses FD_ISSET() to determine what to do with epoll_ctl(). If you have changed your code to use select(), you may find similarities. But I want to tell you up front that my code is NOT multi-threaded.
>
> It could be a bug in the epoll implementation, because I don't think that there are so many applications using epoll on MT models. Bert says that the epoll implementation is heavily benchmarked, which is true, but that does not guarantee that it is tested under every condition.
>
> You can download it from there:
> http://w.ods.org/tools/haproxy/src/devel/
>
> Use version 1.2.6. I added epoll in 1.2.5, so the diff between 1.2.4 and 1.2.5 could help you too.
>
> Good luck!
> Willy
Re: select() efficiency / epoll
Jari Sundell wrote:
> On 8/23/05, Davy Durham <[EMAIL PROTECTED]> wrote:
>> However, I'm getting segfaults because some pointers in places are getting set to low integer values (which didn't used to have those values).
>
> Is it possible that you are overwriting the pointers with file descriptors, as those would have low integer values?

Yes, that is what I was thinking, and it is why I mentioned that. But I'm apparently not overwriting the pointers with FDs.. it seems that epoll is the cause at this point (unless I'm misusing the epoll API).

I've made some changes to now use select() instead of epoll, and things work flawlessly (although it obviously won't work as efficiently when I really connect a lot of clients to this server).
Re: select() efficiency / epoll
That's probably a good idea. Where would I find out what other projects use it?

Willy Tarreau wrote:
> Hi,
>
> On Tue, Aug 23, 2005 at 06:01:15AM -0500, Davy Durham wrote:
>> I just mean that when I debug and catch the segv, it dies because some pointers now have corrupted values (usually because something is overwriting some memory somewhere).
>>
>> I'm currently re-writing some code to make it use select() instead of epoll_wait() and see if everything is suddenly fixed. If so, then I will suspect that epoll has a problem. But it's still not ruled out being my fault, since it could be a timing issue that makes the crash show up.
>
> Just out of curiosity, have you had the opportunity to read some other code which uses epoll? Maybe reading others' code could enlighten you on potential bugs in your code, potential races, etc...
>
> Regards,
> Willy
Re: select() efficiency / epoll
Davy Durham wrote:
> I'm currently re-writing some code to make it use select() instead of epoll_wait() and see if everything is suddenly fixed. If so, then I will suspect that epoll has a problem. But it's still not ruled out being my fault, since it could be a timing issue that makes the crash show up.

Well, the select() replacement works fine... so hrmm..
Re: select() efficiency / epoll
bert hubert wrote:
> On Tue, Aug 23, 2005 at 04:49:14AM -0500, Davy Durham wrote:
>> However, I'm getting segfaults because some pointers in places are getting set to low integer values (which didn't used to have those values).
>
> epoll is pretty heavily benchmarked and hence tested. I don't entirely understand the remark above and suggest looking at the generated core dumps.

I just mean that when I debug and catch the segv, it dies because some pointers now have corrupted values (usually because something is overwriting some memory somewhere).

I'm currently re-writing some code to make it use select() instead of epoll_wait() and see if everything is suddenly fixed. If so, then I will suspect that epoll has a problem. But it's still not ruled out being my fault, since it could be a timing issue that makes the crash show up.
Re: select() efficiency / epoll
So, I've been trying to use epoll on linux-2.6.11-6mdk. However, I'm getting segfaults because some pointers in places are getting set to low integer values (which they didn't have before).

The deal is that my application is multi-threaded, and I was wondering if epoll has issues if you use epoll_ctl() while an epoll_wait() is waiting, or something like that. I'm also compiling with -D_MULTI_THREADED.

I'm not new to threading, but am stumped at this point. I'm not ruling out it being my code, but wanted to ask about epoll since it's so new.

Any ideas?

Thanks,
Davy

bert hubert wrote:
> On Fri, Jul 22, 2005 at 04:18:46PM -0500, Davy Durham wrote:
>> Please forgive and redirect me if this is not the right place to ask this question:
>>
>> I'm looking to write a sort of messaging system that would take input from any number of entities that "register" with it.. it would then route the messages to outputs and so forth..
>
> Look at epoll, or libevent, which uses epoll to be quick in this scenario.
Re: /proc question
Jan Engelhardt wrote:
>> I have a zombie process which has apparently died for some unknown reason.. I know it was terminated by a signal (found that from the 9th field (scheduler flags) in /proc/pid/stat)
>
> Start the process under the observation of strace.
>
>> However, I'm trying to figure out what signal killed it.
>
> Jan Engelhardt

Wish I could.. but it's already happened (to a lot of processes, for the same reason). It's an intermittent problem and I can't really reproduce it at will.

I've redeployed the binary now, so I can hopefully attach to it with gdb to figure out some things the next time it does happen.
/proc question
After much research.. I have a question regarding /proc.

I have a zombie process which has apparently died for some unknown reason.. I know it was terminated by a signal (found that from the 9th field (scheduler flags) in /proc/pid/stat).

However, I'm trying to figure out what signal killed it.

Also, it would be nice if /proc could show what the exit status of a dead process is.. it seems strange that it doesn't contain that information (or am I just not seeing it in there?).

Any info would be helpful..

Thanks,
Davy
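For context on where the terminating signal is normally visible: a zombie is a process whose parent has not yet waited for it, and the wait status handed to the parent encodes the signal. The sketch below only helps when you control the parent (or can add logging to it); the helper names `describe_exit` and `run_and_reap` are made up for illustration, and Python's `os` wrappers stand in for the C `waitpid()` macros.

```python
import os
import signal


def describe_exit(status):
    """Decode a wait status as returned by os.waitpid()."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with status %d" % os.WEXITSTATUS(status)
    return "other status 0x%x" % status


def run_and_reap(child_action):
    """Fork, run child_action in the child, and return the child's wait status."""
    pid = os.fork()
    if pid == 0:
        try:
            child_action()
        finally:
            os._exit(3)   # only reached if the child was not killed
    _, status = os.waitpid(pid, 0)   # reaping the zombie yields the status
    return status
```

For example, `describe_exit(run_and_reap(lambda: os.kill(os.getpid(), signal.SIGKILL)))` reports the signal number that ended the child.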
select() efficiency
Please forgive and redirect me if this is not the right place to ask this question:

I'm looking to write a sort of messaging system that would take input from any number of entities that "register" with it.. it would then route the messages to outputs and so forth..

I'm guessing that the messaging system would be a single process on the machine.. So, I'm considering making the means of input to the system a unix socket. An entity would connect to the socket as its means of inputting messages into the system.

However, let's suppose that 1000+ entities connect to that socket.. this would require the message system's loop to add 1000+ file descriptors to an fd_set and call select() every time it loops around to check for any messages.

So, my question is: how efficient would things be, doing select() calls very often on 1000+ file descriptors? I'm not aware of the max size for an fd_set.. (I do know that NT is limited to 64 handles.. but that's really beside the point unless I look at porting someday.)

Should I go another route? The system is meant to route messages ASAP.. so it would be a bad idea to, say, write them to a file and poll the file or something like that...

Another thought was to use a system-wide mutex and write to a named pipe, but the socket method seems more appealing to me in design... and I didn't know if it was pretty much equivalent either way, since either I will do the work of dealing with 1000+ things or the kernel will.

Thanks,
Davy
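As a rough illustration of the design asked about here (one process, one unix socket, many connected entities), the sketch below uses Python's `selectors.DefaultSelector`, which picks epoll on Linux, so the loop is handed only the descriptors that are ready instead of rebuilding an fd_set on every pass. On the fd_set question: glibc's `FD_SETSIZE` is 1024, so select() cannot watch descriptors numbered 1024 or above. The function names `make_listener` and `serve_once` and the `on_message` callback are hypothetical.

```python
import selectors
import socket


def make_listener(path):
    """Create a non-blocking unix stream socket listening at `path`."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(128)
    listener.setblocking(False)
    return listener


def serve_once(listener, sel, on_message, timeout=1.0):
    """Run one loop iteration: accept a new entity or read from ready ones."""
    for key, _ in sel.select(timeout):
        sock = key.fileobj
        if sock is listener:
            conn, _ = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                on_message(sock.fileno(), data)
            else:
                # Peer closed the connection: stop watching it.
                sel.unregister(sock)
                sock.close()
```

A caller would register the listener once with `sel.register(listener, selectors.EVENT_READ)` and then call `serve_once()` in a loop; routing to outputs would live in `on_message`.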
Suspend/Resume
Hi,

I've been trying for the last few days to get my D810 to suspend and resume in linux. I'm doing it from klaptop in kde using Fedora Core 3, but I've now compiled my own linux-2.6.12-rc2-mm3 kernel since I've seen some ACPI changes going in.

At 2.6.11 it would seem to suspend ok, but on resume it would come back with I/O errors.. causing the computer to freeze for a few seconds, then run for a second, then freeze again, etc.. the HDD light would stay on solid, and on tty1 I saw something like "ata1: command 0xc8 timeout... I/O error..." So apparently something isn't getting started back up. Thinking it might be the HDD not spinning, I powered off, but DID hear it spin down.

Running what I compiled, 2.6.12-rc2-mm3, the suspend happens a little faster, but the resume comes to a blank screen, then immediately reboots without any messages that I can see.

I'm very interested in getting this to work and will do whatever someone needs to gather information. I may need to ask basic kernel questions when asked to do something, as I haven't done much troubleshooting at this low a level before, but I'm game. From googling around, this is a problem for many and I would like to help resolve it.

If I need to take this message to another mailing list, or to another individual working on ACPI stuff, just let me know.

Any ideas?

Thanks,
Davy
Dell D810 Laptop Suspend/Resume
Hi,

I've been trying for the last few days to get my D810 to suspend and resume in linux. I'm doing it from klaptop in kde using Fedora Core 3, but I've now compiled my own linux-2.6.12-rc2-mm3 kernel since I've seen some ACPI changes going in.

At 2.6.11 it would seem to suspend ok, but on resume it would come back with I/O errors.. causing the computer to freeze for a few seconds, then run for a second, then freeze again, etc.. the HDD light would stay on solid, and on tty1 I saw something like "ata1: command 0xc8 timeout... I/O error..." So apparently something isn't getting started back up. Thinking it might be the HDD not spinning, I powered off, but DID hear it spin down.

Running what I compiled, 2.6.12-rc2-mm3, the suspend happens a little faster, but the resume comes to a blank screen, then immediately reboots without any messages that I can see.

I'm very interested in getting this to work and will do whatever someone needs to gather information. I may need to ask basic kernel questions when asked to do something, as I haven't done much troubleshooting at this low a level before, but I'm game. From googling around, this is a problem for many and I would like to help resolve it.

Any ideas?

Thanks,
Davy