RE: slow open() calls and o_nonblock
Aaron Wiebe wrote:
> David Schwartz wrote:
> > There is no way you can re-try the request. The open must either
> > succeed or not return a handle. It is not like a 'read' operation
> > that has an "I didn't do anything, and you can retry this request"
> > option.
> >
> > If 'open' returns a file handle, you can't retry it (since it must
> > succeed in order to do that, failure must not return a handle). If
> > your 'open' doesn't return a file handle, you can't retry it
> > (because, without a handle, there is no way to associate a future
> > request with this one; if it creates a file, the file must not be
> > created if you don't call 'open' again).
>
> I understand, but this is exactly the situation that I'm complaining
> about. There is no functionality to provide a nonblocking open - no
> ability to come back around and retry a given open call.

I agree. I'm addressing why things can't "just work", not arguing that they aren't broken or should stay broken. ;)

I think a good solution would be to re-use the 'connect' and 'shutdown' calls. You would need a new asynchronous flag to 'open' that would mean, *really* don't block. You would have to follow up with 'connect' to complete the actual opening -- the 'open' would just assign a file descriptor (unless it could complete or error immediately, of course).

To asynchronously close such a socket, you simply call 'shutdown'. Once the 'shutdown' completes, 'close' would be guaranteed not to block.

Obviously, being able to 'poll' or 'select' would be a huge plus (while an 'open' or 'close' is in progress, of course; otherwise it would always report immediate availability).

I think this covers all the bases, and the only ugly API change is an extra 'open' flag. (Which I think is unavoidable.)

> I'm speaking to my ideal world view - but any application I write
> should not have to wait for the kernel if I don't want it to. I
> should be able to submit my request, and come back to it later as I so
> decide.
A working generic asynchronous system call interface would be the best solution, I think. But that may be further off than just an asynchronous file open/close interface.

> (And I did actually consider writing my own NFS client for about
> 5 minutes.)

Yeah, what a pain that would be.

The obvious counter-argument to what I propose above is that it doesn't handle reads and writes, so why bother with a complex partial solution?

DS
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: slow open() calls and o_nonblock
On Mon, 2007-06-04 at 12:26 -0400, Aaron Wiebe wrote:
> Actually, let's see if I can summarize this more generically... I
> realize I'm suggesting something that probably would be a massive
> undertaking, but...
>
> Regular files are the only interface that requires an application to
> wait. In any other case, the nonblocking interfaces are fairly
> complete and easy to work with. If userspace could treat regular
> files in the same fashion as sockets, life would be good.
>
> I admittedly do not understand the internal kernel semantics of the
> differences between a socket and a regular file. Why couldn't we just
> have a different 'socket type' like PF_FILE or something like this?
>
> Abstracting any IO through the existing interfaces provided to sockets
> would be ideal from my perspective. The code required to use a file
> through these interfaces would be more complex in userspace, but the
> abstraction of the current open() itself could simply be an aggregate
> of these interfaces without a nonblocking flag.
>
> It would, however, fix problems with event-based applications handling
> events from both disk and sockets. I can't trigger disk read/write
> events in the same event handlers I use for sockets (i.e., poll or
> epoll). I end up having two separate event handlers - one for disk
> (currently using glibc's aio thread kludge), and one for sockets.
>
> I'm sure this isn't a new idea. Coming from my own development
> background that had little to do with disk, I was actually surprised
> when I first discovered that I couldn't edge-trigger disk IO through
> poll().
>
> Thoughts, comments?

Unless you're planning on rearchitecting the entire VFS lookup and permissions code, you would basically have to fall back onto having a pool of service threads actually perform the I/O. That can just as easily be done today in userland.
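Trond's "pool of service threads in userland" can be sketched with a stdlib thread pool. This is a minimal illustration of the idea, not code from the thread; the `open_async` helper name is made up here:

```python
# A userland pool of service threads performs the blocking open() calls,
# so the caller's event loop never waits on the filer. Futures stand in
# for the missing asynchronous syscall interface.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)    # the "service threads"

def open_async(path, flags, mode=0o644):
    # Submit the blocking os.open() to a worker; collect the fd later.
    return pool.submit(os.open, path, flags, mode)

tmpdir = tempfile.mkdtemp()
futures = [open_async(os.path.join(tmpdir, "log%d" % i),
                      os.O_WRONLY | os.O_CREAT) for i in range(4)]
fds = [f.result() for f in futures]         # ready when each worker finishes
for fd in fds:
    os.close(fd)
pool.shutdown()
```

The main loop would normally poll the futures for completion rather than call `.result()` immediately; the point is only that the open()s block a worker thread, not the event loop.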
AFAICS, syslets should give you the means to implement a more scalable scheme, but we'll have to wait and see if/when those are ready for kernel inclusion.

Cheers
  Trond
Re: slow open() calls and o_nonblock
Actually, let's see if I can summarize this more generically... I realize I'm suggesting something that probably would be a massive undertaking, but...

Regular files are the only interface that requires an application to wait. In any other case, the nonblocking interfaces are fairly complete and easy to work with. If userspace could treat regular files in the same fashion as sockets, life would be good.

I admittedly do not understand the internal kernel semantics of the differences between a socket and a regular file. Why couldn't we just have a different 'socket type' like PF_FILE or something like this?

Abstracting any IO through the existing interfaces provided to sockets would be ideal from my perspective. The code required to use a file through these interfaces would be more complex in userspace, but the abstraction of the current open() itself could simply be an aggregate of these interfaces without a nonblocking flag.

It would, however, fix problems with event-based applications handling events from both disk and sockets. I can't trigger disk read/write events in the same event handlers I use for sockets (i.e., poll or epoll). I end up having two separate event handlers - one for disk (currently using glibc's aio thread kludge), and one for sockets.

I'm sure this isn't a new idea. Coming from my own development background that had little to do with disk, I was actually surprised when I first discovered that I couldn't edge-trigger disk IO through poll().

Thoughts, comments?

-Aaron

On 6/4/07, Aaron Wiebe <[EMAIL PROTECTED]> wrote:
On 6/4/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> So exactly how would you expect a nonblocking open to work? Should it
> be starting I/O? What if that involves blocking? How would you know
> when to try again?

Well, there's a bunch of options - some have been suggested in the thread already.
The idea of an open with O_NONBLOCK (or a different flag) returning a handle immediately, and subsequent calls returning EAGAIN if the open is incomplete, or ESTALE if it fails (with some auxiliary method of getting the reason why it failed), is not too far a stretch from my perspective.

The other option that comes to mind would be to add an interface that behaves like sockets - get a handle from one system call, set it nonblocking using fcntl, and use another call to attach it to a regular file. This method would make the most sense to me - but that's also because I've worked with sockets in the past far, far more than with regular files.

The one that would take the least amount of work from the application perspective would be to simply reply to the nonblocking open call with EAGAIN (or something), and when an open on the same file is performed later, the kernel could have performed its work in the background. I can understand, given the fact that there is no handle provided to the application, that this idea could be sloppy.

I'm still getting caught up on some of the other suggestions (I'm currently reading about the syslets work that Zach and Ingo are doing), and it sounds like this is a common complaint that is being addressed through a number of initiatives. I'm looking forward to seeing where that work goes.

-Aaron
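The asymmetry complained about in this thread - sockets are poll()able, regular files are not usefully so - can be demonstrated directly. A small Python sketch (behavior as on Linux, where epoll_ctl() rejects regular files with EPERM and poll() reports them ready unconditionally):

```python
# Two behaviors from this thread, shown concretely: epoll refuses regular
# files outright, and poll() has no "not ready yet" state for them.
import os
import select
import tempfile

fd, path = tempfile.mkstemp()

# epoll_ctl() returns EPERM for a regular-file descriptor.
ep = select.epoll()
try:
    ep.register(fd, select.EPOLLIN)
    epoll_refused = False
except PermissionError:
    epoll_refused = True
ep.close()

# poll() accepts the fd, but reports it readable and writable instantly.
poller = select.poll()
poller.register(fd, select.POLLIN | select.POLLOUT)
ready = dict(poller.poll(0)).get(fd, 0)      # zero timeout: no waiting
always_ready = bool(ready & select.POLLIN) and bool(ready & select.POLLOUT)

os.close(fd)
os.unlink(path)
```

This is exactly why an event loop cannot edge-trigger disk IO the way it does socket IO: there is no pending state to wake up from.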
Re: slow open() calls and o_nonblock
On 6/4/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> So exactly how would you expect a nonblocking open to work? Should it
> be starting I/O? What if that involves blocking? How would you know
> when to try again?

Well, there's a bunch of options - some have been suggested in the thread already.

The idea of an open with O_NONBLOCK (or a different flag) returning a handle immediately, and subsequent calls returning EAGAIN if the open is incomplete, or ESTALE if it fails (with some auxiliary method of getting the reason why it failed), is not too far a stretch from my perspective.

The other option that comes to mind would be to add an interface that behaves like sockets - get a handle from one system call, set it nonblocking using fcntl, and use another call to attach it to a regular file. This method would make the most sense to me - but that's also because I've worked with sockets in the past far, far more than with regular files.

The one that would take the least amount of work from the application perspective would be to simply reply to the nonblocking open call with EAGAIN (or something), and when an open on the same file is performed later, the kernel could have performed its work in the background. I can understand, given the fact that there is no handle provided to the application, that this idea could be sloppy.

I'm still getting caught up on some of the other suggestions (I'm currently reading about the syslets work that Zach and Ingo are doing), and it sounds like this is a common complaint that is being addressed through a number of initiatives. I'm looking forward to seeing where that work goes.

-Aaron
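Of the socket-style option above, the "set it nonblocking using fcntl" step already works today on any descriptor; it is the later "attach this handle to a pathname" call that does not exist. A sketch of the part that does exist:

```python
# fcntl(F_SETFL) can mark any descriptor nonblocking today; what's missing
# is a syscall to asynchronously bind a fresh handle to a regular file.
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # set the flag
now_nonblocking = bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

os.close(fd)
os.unlink(path)
```

For a regular file the flag sticks but changes nothing about read/write latency, which is the heart of the complaint in this thread.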
Re: slow open() calls and o_nonblock
On Mon, 2007-06-04 at 10:20 -0400, Aaron Wiebe wrote:
> I understand, but this is exactly the situation that I'm complaining
> about. There is no functionality to provide a nonblocking open - no
> ability to come back around and retry a given open call.

So exactly how would you expect a nonblocking open to work? Should it be starting I/O? What if that involves blocking? How would you know when to try again?

Trond
Re: slow open() calls and o_nonblock
Sorry for the unthreaded responses - I wasn't cc'd here, so I'm replying to these based on mailing list archives.

Al Viro wrote:
> BTW, why close these suckers all the time? It's not that kernel would
> be unable to hold thousands of open descriptors for your process...
> Hash descriptors by pathname and be done with that; don't bother with
> close unless you decide that you've got too many of them (e.g. when
> you get a hash conflict).

A valid point - I currently keep a pool of 4000 descriptors open and cycle them out based on inactivity. I hadn't seriously considered just keeping them all open, because I simply wasn't sure how well things would go with 100,000 files open. Would my backend storage keep up? Would the kernel mind maintaining 100,000 open files over NFS? The majority of the files would simply be idle - I would be keeping file handles open for no reason.

Pooling allows me to substantially drop the number of opens I require, but I am hesitant to blow the pool size up to substantially higher numbers. Can anyone shed light on any issues that may come up with a massive pool size, such as 128k?

-Aaron
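One concrete issue a 128k pool would hit is RLIMIT_NOFILE, the per-process descriptor cap: open() starts failing with EMFILE well before 128k unless the limit is raised first. A hedged sketch of checking and raising it (the 131072 figure is just the "128k" from the question above):

```python
# Check the per-process fd limit and try to raise the soft limit toward
# the 128k pool size discussed above. Raising the soft limit up to the
# hard limit needs no privilege; exceeding the hard limit requires
# CAP_SYS_RESOURCE (or root).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
wanted = 131072                                  # the 128k pool
if soft < wanted:
    try:
        target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except ValueError:
        pass                                     # refused; keep the old limit
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
```

Beyond the process limit there are also system-wide knobs (fs.file-max) and, for NFS, per-file state held on the client and server, so the kernel side is only part of the answer.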
Re: slow open() calls and o_nonblock
On 6/4/07, John Stoffel <[EMAIL PROTECTED]> wrote:
> So how many files are in the directory where you're seeing the delays?
> And what's the average size of the files in there?

The directories themselves will have a maximum of 160 files, and the files are maybe a few megs each - the delays are (as you pointed out earlier) due to the RAM restrictions and our filesystem design of very deep directory structures that Netapps suck at.

My point is more generic, though - I will come up with ways to handle this problem in my application (probably with threads), but I'm griping more about the lack of a kernel interface that would have allowed me to avoid it.

-Aaron
Re: slow open() calls and o_nonblock
Replying to David Schwartz here... (David, good to hear from you again - haven't seen you around since the irc days :))

David Schwartz wrote:
> There is no way you can re-try the request. The open must either
> succeed or not return a handle. It is not like a 'read' operation
> that has an "I didn't do anything, and you can retry this request"
> option.
>
> If 'open' returns a file handle, you can't retry it (since it must
> succeed in order to do that, failure must not return a handle). If
> your 'open' doesn't return a file handle, you can't retry it (because,
> without a handle, there is no way to associate a future request with
> this one; if it creates a file, the file must not be created if you
> don't call 'open' again).

I understand, but this is exactly the situation that I'm complaining about. There is no functionality to provide a nonblocking open - no ability to come back around and retry a given open call.

> You need either threads or a working asynchronous system call
> interface. Short of that, you need your own NFS client code.

This is exactly my point - there is no asynchronous system call to do this work, to my knowledge. I will likely fix this in my own code using threads, but I see using threads in this case as working around that lack of a system interface. Threads, imho, should be limited to cases where I'm using them to distribute load across multiple processors, not needed because the kernel interfaces for IO cannot support nonblocking calls.

I'm speaking to my ideal world view - but any application I write should not have to wait for the kernel if I don't want it to. I should be able to submit my request, and come back to it later as I so decide. (And I did actually consider writing my own NFS client for about 5 minutes.)

Thanks for the response!
-Aaron
Re: slow open() calls and o_nonblock
> "Aaron" == Aaron Wiebe <[EMAIL PROTECTED]> writes:

Aaron> On 6/4/07, Alan Cox <[EMAIL PROTECTED]> wrote:
>> > Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
>> > call with a nonblocking flag return EAGAIN if its going to take
>> > anywhere near 415ms?
>>
>> Violation of causality. We don't know it will block for 415ms until
>> 415ms have elapsed.

Aaron> Understood - but what I'm getting at is more the fact that
Aaron> there really doesn't appear to be any real implementation of
Aaron> nonblocking open(). On the socket side of the fence, I would
Aaron> consider a regular file open() to be equivalent to a connect()
Aaron> call - the difference obviously being that we already have a
Aaron> handle for the socket.

Aaron> The end result, however, is roughly the same. We have a file
Aaron> descriptor with the endpoint established. In the socket world,
Aaron> we assume that a nonblocking request will always return
Aaron> immediately and the application is expected to come back around
Aaron> and see if the request has completed. Regular files have no
Aaron> equivalent.

So how many files are in the directory where you're seeing the delays? And what's the average size of the files in there?

John
Re: slow open() calls and o_nonblock
On 6/4/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> > Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
> > call with a nonblocking flag return EAGAIN if it's going to take
> > anywhere near 415ms?
>
> Violation of causality. We don't know it will block for 415ms until
> 415ms have elapsed.

Understood - but what I'm getting at is more the fact that there really doesn't appear to be any real implementation of nonblocking open(). On the socket side of the fence, I would consider a regular file open() to be equivalent to a connect() call - the difference obviously being that we already have a handle for the socket.

The end result, however, is roughly the same. We have a file descriptor with the endpoint established. In the socket world, we assume that a nonblocking request will always return immediately and the application is expected to come back around and see if the request has completed. Regular files have no equivalent.

-Aaron
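The socket-world sequence described above - the nonblocking request returns immediately, and the application comes back to check - is the standard nonblocking connect() dance. A minimal runnable Python sketch, with a local listener standing in for a remote server:

```python
# Nonblocking connect(): the call returns at once (EINPROGRESS), poll()
# signals writability when the attempt finishes, and SO_ERROR carries the
# final result. This is the pattern regular-file open() lacks.
import select
import socket

# An ephemeral-port listener stands in for the remote endpoint.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

sock = socket.socket()
sock.setblocking(False)
rc = sock.connect_ex(listener.getsockname())   # returns immediately

poller = select.poll()
poller.register(sock, select.POLLOUT)          # writable == connect done
events = poller.poll(1000)                     # come back and check

result = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)  # 0 = success
sock.close()
listener.close()
```

An open() analogue would need the same three pieces: an immediate handle, a pollable "in progress" state, and a way to read back the final status.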
Re: slow open() calls and o_nonblock
On 6/3/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> Have you tried the "nocto" mount option for your NFS filesystems?
> The cache-coherency rules of NFS require the client to check with the
> server at each open. If you are the sole client on this filesystem,
> then you don't need the same cache-coherency, and "nocto" will tell
> the NFS client not to bother checking with the server if the
> information is available in cache.

No I haven't - I will research this a little further today. While we're not the only client using these filesystems, this process is (currently) the only one that writes to these files. Thanks for the suggestion.

-Aaron
Re: slow open() calls and o_nonblock
> Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
> call with a nonblocking flag return EAGAIN if it's going to take
> anywhere near 415ms?

Violation of causality. We don't know it will block for 415ms until 415ms have elapsed.

Alan
RE: slow open() calls and o_nonblock
David Schwartz writes:
> [Aaron Wiebe]
> > open("/somefile", O_WRONLY|O_NONBLOCK|O_CREAT, 0644) = 1621 <0.415147>
>
> How could they make any difference? I can't think of any conceivable
> way they could.
>
> > Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
> > call with a nonblocking flag return EAGAIN if it's going to take
> > anywhere near 415ms? Is there a way I can force opens to EAGAIN if
> > they take more than 10ms?
>
> There is no way you can re-try the request. The open must either
> succeed or not return a handle. It is not like a 'read' operation
> that has an "I didn't do anything, and you can retry this request"
> option.
>
> If 'open' returns a file handle, you can't retry it (since it must
> succeed in order to do that, failure must not return a handle). If
> your 'open' doesn't return a file handle, you can't retry it (because,
> without a handle, there is no way to associate a future request with
> this one; if it creates a file, the file must not be created if you
> don't call 'open' again).
>
> The 'open' function must, at minimum, confirm that the file exists
> (or doesn't exist and can be created, or whatever). This takes however
> long it takes on NFS.

This is not the case, though we might need to allocate a new flag to avoid breaking things. Let open() with O_UNCHECKED always return a file descriptor, except perhaps when failure can be identified without doing IO. The "real" open then proceeds in the background.

From poll() or select(), you can see that the file descriptor is not ready for anything. Eventually it becomes ready for IO or reports an error condition. Both select() and poll() are capable of reporting errors.

If the "real" (background) open() fails, then the only valid operation is close(). Attempts to do anything else get EBADFD or ESTALE. You'll also need a background close().
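The claim that select()/poll() can report errors - which the O_UNCHECKED scheme above leans on - is exactly how nonblocking connect() failures surface today. A Python sketch (it assumes nothing grabs the probed loopback port between the probe and the connect):

```python
# A background (nonblocking) connect that fails: poll() wakes us, and the
# error is read back via SO_ERROR -- the same delivery path O_UNCHECKED
# would need for a failed background open.
import errno
import select
import socket

# Find a port with nothing listening by binding and releasing a socket.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
port = probe.getsockname()[1]
probe.close()

sock = socket.socket()
sock.setblocking(False)
sock.connect_ex(("127.0.0.1", port))       # returns at once; fails later

poller = select.poll()
poller.register(sock, select.POLLOUT)
events = poller.poll(1000)                 # wakes on completion or error
err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
sock.close()
```

Here `err` comes back as ECONNREFUSED; a background open would analogously hand back ENOENT, EACCES, and friends.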
Re: slow open() calls and o_nonblock
In article <[EMAIL PROTECTED]> you wrote:
> In short, I'm distributing logs in realtime for about 600,000
> websites. The sources of the logs (http, ftp, realmedia, etc) are
> flexible, however the base framework was built around a large cluster
> of webservers. The output can be to several hundred thousand files
> across about two dozen filers for user consumption - some can be very
> active, some can be completely inactive.

Assuming you have multiple request log summary files, I would just run multiple "splitters".

> You can certainly open the file, but not block on the call to do it.
> What confuses me is why the kernel would "block" for 415ms on an open
> call. That's an eternity to suspend a process that has to distribute
> data such as this.

Because it has to, to return the result with the given API. But if you had an async interface, the operation would still take that long, and your throughput would still be limited by the opens/sec your filers support, no?

> Except I can't very well keep 600,000 files open over NFS. :) Pool
> and queue, and cycle through the pool. I've managed to achieve a
> balance in my production deployment with this method - my email was
> more of a rant after months of trying to work around a problem (caused
> by a limitation in system calls),

I agree that a unified async layer is nice from the programmer's POV, but I disagree that it would help your performance problem, which is caused by NFS and/or the NetApp (and I won't blame them).

Gruss
Bernd
Re: slow open() calls and o_nonblock
On Sunday June 3, [EMAIL PROTECTED] wrote:
> You can certainly open the file, but not block on the call to do it.
> What confuses me is why the kernel would "block" for 415ms on an open
> call. That's an eternity to suspend a process that has to distribute
> data such as this.

Have you tried the "nocto" mount option for your NFS filesystems?

The cache-coherency rules of NFS require the client to check with the server at each open. If you are the sole client on this filesystem, then you don't need the same cache-coherency, and "nocto" will tell the NFS client not to bother checking with the server if the information is available in cache. This should speed up the time for open considerably.

NeilBrown
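For reference, a hypothetical mount invocation combining "nocto" with common NFSv3 performance options (the server name, export path, mount point, and sizes are placeholders, not values from this thread):

```shell
# nocto relaxes close-to-open cache coherency; only safe when no other
# client needs to see this client's writes immediately on open.
mount -t nfs -o nocto,tcp,vers=3,rsize=32768,wsize=32768 \
    filer:/vol/logs /mnt/logs
```

The trade-off is the one Neil states: opens stop revalidating against the server, so stale cached attributes can be served to this client.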
Re: slow open() calls and o_nonblock
In article <[EMAIL PROTECTED]> you wrote:
> (ps. having come from the socket side of the fence, its incredibly
> frustrating to be unable to poll() or epoll regular file FDs --
> Especially knowing that the kernel is translating them into a TCP
> socket to do NFS anyway. Please add regular files to epoll and give
> me a way to do the opens in the same fasion as connects!)

You might want to use Windows? :)

Gruss
Bernd
Re: slow open() calls and o_nonblock
Hi John, thanks for responding. I'm using kernel 2.6.20 on a home-grown distro. I've responded to a few specific points inline - but as a whole, Davide directed me to work that is being done specifically to address these issues in the kernel, as well as a userspace implementation that would allow me to sidestep this failing for the time being.

On 6/3/07, John Stoffel <[EMAIL PROTECTED]> wrote:
> How large are these files? Are they all in a single directory? How
> many files are in the directory?
>
> Ugh. Why don't you just write to a DB instead? It sounds like you're
> writing small records, with one record to a file. It can work, but
> when you're doing thousands per minute, the open/close overhead is
> starting to dominate. Can you just amortize that overhead across a
> bunch of writes instead by writing to a single file which is more
> structured for your needs?

In short, I'm distributing logs in realtime for about 600,000 websites. The sources of the logs (http, ftp, realmedia, etc) are flexible; however, the base framework was built around a large cluster of webservers. The output can be to several hundred thousand files across about two dozen filers for user consumption - some can be very active, some can be completely inactive.

> Netapps usually scream for NFS writes and such, so it sounds to me
> that you've blown out the NVRAM cache on the box. Can you elaborate
> more on your hardware & Network & Netapp setup?

You're totally correct here - Netapp has told us as much about our filesystem design; we use too much RAM on the filer itself. It's true that the application would handle just fine if our filesystem structure were redesigned - I am approaching this from an application perspective, though. These units are capable of the raw IO; it's the simple fact that open calls are taking a while. If I were to thread off the application (and Davide has been kind enough to provide some libraries which will make that substantially easier), the problem wouldn't exist.

> The problem is that O_NONBLOCK on a file open doesn't make sense. You
> either open it, or you don't. How long it takes to complete isn't
> part of the spec.

You can certainly open the file, but not block on the call to do it. What confuses me is why the kernel would "block" for 415ms on an open call. That's an eternity to suspend a process that has to distribute data such as this.

> But in this case, I think you're doing something hokey with your data
> design. You should be opening just a handful of files and then
> streaming your writes to those files. You'll get much more
> performance.

Except I can't very well keep 600,000 files open over NFS. :) Pool and queue, and cycle through the pool. I've managed to achieve a balance in my production deployment with this method - my email was more of a rant after months of trying to work around a problem (caused by a limitation in system calls), only to have it present an order of magnitude worse than I expected.

Sorry for not giving more information off the line - and thanks for your time.

-Aaron
Re: slow open() calls and o_nonblock
On Sun, Jun 03, 2007 at 05:27:06PM -0700, David Schwartz wrote:
> > Now, Netapp speed aside, O_NONBLOCK and O_DIRECT seem to make zero
> > difference to my open times. Example:
> >
> > open("/somefile", O_WRONLY|O_NONBLOCK|O_CREAT, 0644) = 1621 <0.415147>
>
> The 'open' function must, at minimum, confirm that the file exists (or
> doesn't exist and can be created, or whatever). This takes however
> long it takes on NFS.
>
> You need either threads or a working asynchronous system call
> interface. Short of that, you need your own NFS client code.

BTW, why close these suckers all the time? It's not that kernel would be unable to hold thousands of open descriptors for your process... Hash descriptors by pathname and be done with that; don't bother with close unless you decide that you've got too many of them (e.g. when you get a hash conflict).
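Al's hash-by-pathname scheme can be sketched in a few lines. This is an illustration only (Python for brevity; a dict with LRU eviction stands in for the hash table, and `FdCache` is a made-up name):

```python
# Keep descriptors open, keyed by pathname; only close when the pool is
# full and the coldest entry must be evicted.
import os
import tempfile
from collections import OrderedDict

class FdCache:
    def __init__(self, limit=4096):
        self.limit = limit
        self.fds = OrderedDict()            # pathname -> open descriptor

    def get(self, path, flags=os.O_WRONLY | os.O_CREAT | os.O_APPEND):
        if path in self.fds:
            self.fds.move_to_end(path)      # mark as recently used
            return self.fds[path]
        if len(self.fds) >= self.limit:     # evict the coldest descriptor
            _, victim = self.fds.popitem(last=False)
            os.close(victim)
        fd = os.open(path, flags, 0o644)
        self.fds[path] = fd
        return fd

    def close_all(self):
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()

# Usage: with a limit of 2, re-requesting "a" reuses its descriptor and
# a third path evicts the least recently used entry.
tmp = tempfile.mkdtemp()
cache = FdCache(limit=2)
fd_a = cache.get(os.path.join(tmp, "a"))
fd_b = cache.get(os.path.join(tmp, "b"))
reused = cache.get(os.path.join(tmp, "a")) == fd_a
cache.get(os.path.join(tmp, "c"))           # evicts "b", the coldest entry
pool_size = len(cache.fds)
cache.close_all()
```

This is essentially the 4000-descriptor pool Aaron describes elsewhere in the thread, just with the pathname itself as the lookup key.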
RE: slow open() calls and o_nonblock
> Now, Netapp speed aside, O_NONBLOCK and O_DIRECT seem to make zero
> difference to my open times. Example:
>
> open("/somefile", O_WRONLY|O_NONBLOCK|O_CREAT, 0644) = 1621 <0.415147>

How could they make any difference? I can't think of any conceivable way they could.

> Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
> call with a nonblocking flag return EAGAIN if it's going to take
> anywhere near 415ms? Is there a way I can force opens to EAGAIN if
> they take more than 10ms?

There is no way you can re-try the request. The open must either succeed or not return a handle. It is not like a 'read' operation that has an "I didn't do anything, and you can retry this request" option.

If 'open' returns a file handle, you can't retry it (since it must succeed in order to do that, failure must not return a handle). If your 'open' doesn't return a file handle, you can't retry it (because, without a handle, there is no way to associate a future request with this one; if it creates a file, the file must not be created if you don't call 'open' again).

The 'open' function must, at minimum, confirm that the file exists (or doesn't exist and can be created, or whatever). This takes however long it takes on NFS.

You need either threads or a working asynchronous system call interface. Short of that, you need your own NFS client code.

DS
Re: slow open() calls and o_nonblock
> "Aaron" == Aaron Wiebe <[EMAIL PROTECTED]> writes: More details on which kernel you're using and which distro would be helpful. Also, more details on your App and reasons why you're writing bunches of small files would help as well. Aaron> Greetings all. I'm not on this list, so I apologize if this subject Aaron> has been covered before. (Also, please cc me in the response.) Aaron> I've spent the last several months trying to work around the lack of a Aaron> decent disk AIO interface. I'm starting to wonder if one exists Aaron> anywhere. The short version: Aaron> I have written a daemon that needs to open several thousand Aaron> files a minute and write a small amount of data to each file. How large are these files? Are they all in a single directory? How many files are in the directory? Ugh. Why don't you just write to a DB instead? It sounds like you're writing small records, with one record to a file. It can work, but when you're doing thousands per-minute, the open/close overhead is starting to dominate. Can you just amortize that overhead across a bunch of writes instead by writing to a single file which is more structured for your needs? Aaron> After extensive research, I ended up going with the POSIX AIO Aaron> kludgy pthreads wrapper in glibc to handle my writes due to the Aaron> time constraints of writing my own pthreads handler into the Aaron> application. Aaron> The problem with this equation is that opens, closes and Aaron> non-readwrite operations (fchmod, fcntl, etc) have no interface Aaron> in posix aio. Now I was under the assumption that given open Aaron> and close operations are comparatively less common than the Aaron> write operations, this wouldn't be a huge problem. My tests Aaron> seemed to reflect that. Aaron> I went to production with this yesterday to discover that under Aaron> production load, our filesystems (nfs on netapps) were Aaron> substantially slower than I was expecting. 
Aaron> open() calls are taking upwards of 2 seconds on occasion, and
Aaron> usually ~20ms.

NetApps usually scream for NFS writes and such, so it sounds to me like
you've blown out the NVRAM cache on the box. Can you elaborate more on
your hardware, network, and NetApp setup?

Of course, you could also be using a sucky NFS configuration, so we need
to see your mount options as well. You are using TCP and NFSv3, right?
And large wsize/rsize values too? Have you also checked your NetApp to
make sure you have the following options turned OFF:

	nfs.per_client_stats.enable
	nfs.mountd_trace

Seeing your exports file and the output of 'options nfs' would help.

Aaron> Now, NetApp speed aside, O_NONBLOCK and O_DIRECT seem to make
Aaron> zero difference to my open times. Example:

Aaron> open("/somefile", O_WRONLY|O_NONBLOCK|O_CREAT, 0644) = 1621 <0.415147>

Aaron> Now, I'm a userspace guy so I can be pretty dense, but
Aaron> shouldn't a call with a nonblocking flag return EAGAIN if it's
Aaron> going to take anywhere near 415ms? Is there a way I can force
Aaron> opens to EAGAIN if they take more than 10ms?

The problem is that O_NONBLOCK on a regular file open doesn't make
sense. You either open the file, or you don't. How long the call takes
to complete isn't part of the spec.

But in this case, I think you're doing something hokey with your data
design. You should be opening just a handful of files and then streaming
your writes to those files. You'll get much more performance. Also, have
you tried writing to a local disk instead of via NFS, to see how local
disk speed compares?

Aaron> (P.S. Having come from the socket side of the fence, it's
Aaron> incredibly frustrating to be unable to poll() or epoll regular
Aaron> file FDs -- especially knowing that the kernel is translating
Aaron> them into a TCP socket to do NFS anyway. Please add regular
Aaron> files to epoll and give me a way to do the opens in the same
Aaron> fashion as connects!)
epoll isn't going to help you much here; it's the open which is causing
the delay, not the writing to the file itself.

Maybe you need to be caching more of your writes in memory on the client
side, and then streaming them to the NetApp later on, when you know you
can write a bunch of data at once.

But honestly, I think you've done a bad job architecting your
application's backend data store, and you really need to re-think it
through. Heck, I'm not even much of a programmer; I'm a sysadmin who
runs NetApps and talks users into more sane ways of getting better
performance out of their applications. *grin*

John
Re: slow open() calls and o_nonblock
On Sun, 3 Jun 2007, Aaron Wiebe wrote:

> (P.S. Having come from the socket side of the fence, it's incredibly
> frustrating to be unable to poll() or epoll regular file FDs --
> especially knowing that the kernel is translating them into a TCP
> socket to do NFS anyway. Please add regular files to epoll and give
> me a way to do the opens in the same fashion as connects!)

You may want to follow Ingo's and Zach's work on syslets/threadlets. If
that goes in, you can make *any* syscall asynchronous.

In the meantime, I ended up writing a userspace library to cover the
same exact problem you have:

http://www.xmailserver.org/guasi.html

I basically host an epoll_wait (covering all my sockets, pipes, etc.)
inside a GUASI async request, alongside the other, non-pollable async
requests. So guasi_fetch() becomes my main event collector, and when the
epoll_wait async request shows up, I handle all the epoll events in
there.

This is a *very trivial* HTTP server using such a solution (coroutines,
epoll, and GUASI):

http://www.xmailserver.org/cghttpd-home.html

- Davide