>OK, the app is used to deal with standard disks, that is why it is not
>handling the EINTR signal properly.

I think you're misunderstanding what a "signal" is in the Unix sense. EINTR isn't a signal; it's an error code: write() returns -1 and sets errno to EINTR to say, "Hey, you got a signal in the middle of this write() call and it didn't complete". It doesn't mean that there was an error writing the file; if that was happening, you'd get a (presumably different) error code.

Signals can be sent by the operating system, but those signals are things like SIGSEGV, which basically means, "your program screwed up". Programs can also send signals to each other, with kill(2) and the like.

Now, NORMALLY system calls like write() are interrupted by signals when you're writing to "slow" devices, like network sockets. According to the signal(7) man page, disks are not normally considered slow devices, so I can understand the application not being used to handling this. And you know, now that I think about it I'm not even sure that network filesystems SHOULD allow I/O system calls to be interrupted by signals ... I'd have to think more about it.

I suspect what happened is that something changed between 1.8.5 and the previous version of Lustre that you were using that allowed some operations to be interruptible by signals.

Some things to try:

- Check whether your application is, in fact, receiving a signal, or
  whether Lustre is returning EINTR for some other reason.

- If you are receiving a signal, when you set the signal handler for it
  you could use the SA_RESTART flag to restart the interrupted I/O; I
  think that would make everything work like it did before.

--Ken
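
P.S. Here is a minimal sketch of the SA_RESTART suggestion, assuming a POSIX system and that your application installs its own handler with sigaction(2). The names on_signal, install_restarting_handler and write_all are made up for illustration; they are not from your application or from Lustre. I have not checked that Lustre 1.8.5 actually restarts an interrupted write() under SA_RESTART, so the sketch also includes a plain retry loop on EINTR, which is a different technique from SA_RESTART and works as a fallback.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: does nothing except make the signal catchable. */
static void on_signal(int signo)
{
    (void)signo;
}

/* Install on_signal for signo with SA_RESTART set, so that system calls
 * interrupted by this signal are restarted by the kernel where possible.
 * Returns 0 on success, -1 on failure (errno is set by sigaction). */
static int install_restarting_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Fallback: write all of buf, retrying when write() fails with EINTR and
 * continuing after short writes. Returns 0 on success, -1 on a real error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            return -1;      /* some other error: report it to the caller */
        }
        p += n;             /* short write: advance and keep going */
        len -= (size_t)n;
    }
    return 0;
}
```

You would call install_restarting_handler() once at startup for whichever signal your application turns out to be receiving, and use write_all() in place of a bare write() wherever a partial or interrupted write would otherwise be treated as a failure.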