Add KAPI-annotated kerneldoc for the sys_close system call in fs/open.c. The specification documents the file descriptor parameter, error conditions, locking requirements, side effects on pending I/O, and the close-on-exec relationship.

Signed-off-by: Sasha Levin <[email protected]>
---
 fs/open.c | 238 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 234 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 8e805233a277b..cf74912d15eb5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1808,10 +1808,240 @@ int filp_close(struct file *filp, fl_owner_t id)
 }
 EXPORT_SYMBOL(filp_close);
 
-/*
- * Careful here! We test whether the file pointer is NULL before
- * releasing the fd. This ensures that one clone task can't release
- * an fd while another clone is opening it.
+/**
+ * sys_close - Close a file descriptor
+ * @fd: The file descriptor to close
+ *
+ * long-desc: Terminates access to an open file descriptor, releasing the file
+ * descriptor for reuse by subsequent open(), dup(), or similar syscalls. POSIX
+ * advisory record locks held by the process on the associated file are
+ * released. When this is the last file descriptor referring to the underlying
+ * open file description, OFD and flock locks are released and associated
+ * resources are freed. If the file was previously unlinked, the file itself is
+ * deleted when the last reference is closed.
+ *
+ * CRITICAL: The file descriptor is ALWAYS closed, even when close() returns
+ * an error. This differs from POSIX semantics where the state of the file
+ * descriptor is unspecified after EINTR. On Linux, the fd is released early
+ * in close() processing before flush operations that may fail. Therefore,
+ * retrying close() after an error return is DANGEROUS and may close an
+ * unrelated file descriptor that was assigned to another thread.
+ *
+ * Errors returned from close() (EIO, ENOSPC, EDQUOT) indicate that the final
+ * flush of buffered data failed. These errors commonly occur on network
+ * filesystems like NFS when write errors are deferred to close time. A
+ * successful return from close() does NOT guarantee that data has been
+ * successfully written to disk; the kernel defers writes in the page cache.
+ * To ensure data persistence, call fsync() before close().
+ *
+ * On close, the following cleanup operations are performed: POSIX advisory
+ * locks are removed, dnotify registrations are cleaned up, the file is
+ * flushed to storage if applicable, and the file
+ * reference is released. If this was the last reference, additional cleanup
+ * includes: fsnotify close notification, epoll cleanup, flock and lease
+ * removal, FASYNC cleanup, and the file structure deallocation.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, INT_MAX
+ * cdesc: Must be a valid, open file descriptor for the current process.
+ * The values 0, 1, and 2 (stdin, stdout, stderr) may be closed like any other
+ * fd, though this is unusual and may cause issues with libraries that assume
+ * these descriptors are valid. The parameter is unsigned int to match kernel
+ * file descriptor table indexing, but values exceeding INT_MAX are effectively
+ * invalid due to internal checks.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_EXACT
+ * success: 0
+ * desc: Returns 0 on success. On error, returns a negative error code.
+ * IMPORTANT: For any error other than EBADF, the file descriptor is still
+ * closed and must not be used again. The error indicates a problem with
+ * the final flush operation, not that the fd remains open.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: The file descriptor fd is not a valid open file descriptor, or was
+ * already closed. This is the only error that indicates the fd was NOT
+ * closed (because it was never open to begin with). Occurs when fd is out
+ * of range, has no file assigned, or was already closed.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The flush operation was interrupted by a signal before completion.
+ * This occurs when the close-time flush operation (e.g., on NFS) performs an
+ * interruptible wait that receives a signal. IMPORTANT: Despite this error,
+ * the file descriptor IS closed and must not be used again. This error
+ * is generated by converting kernel-internal restart codes (ERESTARTSYS,
+ * ERESTARTNOINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK) to EINTR because
+ * restarting the syscall would be incorrect once the fd is freed.
+ *
+ * error: EIO, I/O error
+ * desc: An I/O error occurred during the flush of buffered data to the
+ * underlying storage. This typically indicates a hardware error, network
+ * failure on NFS, or other storage system error. The file descriptor is
+ * still closed. Previously buffered write data may have been lost.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: There was insufficient space on the storage device to flush buffered
+ * writes. This is common on NFS when the server runs out of space between
+ * write() and close(). The file descriptor is still closed.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota was exceeded while attempting to flush buffered
+ * writes. Common on NFS when quota is exceeded between write() and close().
+ * The file descriptor is still closed.
+ *
+ * lock: files->file_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired via file_close_fd() to atomically look up and remove the fd
+ * from the file descriptor table. Held only during the table manipulation;
+ * released before flush and final cleanup operations. This serializes the
+ * removal against concurrent fd allocation; once the lock is dropped, the
+ * fd number may be reused even though close() has not yet returned.
+ *
+ * lock: file->f_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired during epoll cleanup (eventpoll_release_file) and dnotify
+ * cleanup to safely unlink the file from monitoring structures. May also
+ * be acquired during lock context operations.
+ *
+ * lock: ep->mtx
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: Acquired during epoll cleanup if the file was monitored by epoll.
+ * Used to safely remove the file from epoll interest lists.
+ *
+ * lock: flc_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: File lock context spinlock, acquired during locks_remove_file() to
+ * safely remove POSIX, flock, and lease locks associated with the file.
+ *
+ * signal: pending_signals
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When close-time flush performs interruptible wait
+ * desc: If the close-time flush operation (e.g., on NFS) performs an
+ * interruptible wait and a signal is pending, the wait is interrupted.
+ * Any kernel restart codes are converted to EINTR since close cannot be
+ * restarted after the fd is freed.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY | KAPI_EFFECT_IRREVERSIBLE
+ * target: File descriptor table entry
+ * desc: The file descriptor is removed from the process's file descriptor
+ * table, making the fd number available for reuse by subsequent open(),
+ * dup(), or similar calls. This occurs BEFORE any flush or cleanup that
+ * might fail, making the operation irreversible regardless of return value.
+ * condition: Always (when fd is valid)
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_LOCK_RELEASE
+ * target: POSIX advisory locks, OFD locks, flock locks
+ * desc: All advisory locks held on the file by this process are removed.
+ * POSIX locks are removed via locks_remove_posix() during filp_flush().
+ * All lock types (POSIX, OFD, flock) are removed via locks_remove_file()
+ * during __fput() when this is the last reference.
+ * condition: File has FMODE_OPENED and !(FMODE_PATH)
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY
+ * target: File leases
+ * desc: Any file leases held on the file are removed during locks_remove_file()
+ * when this is the last reference to the open file description.
+ * condition: File had leases and this is the last close
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: dnotify registrations
+ * desc: Directory notification (dnotify) registrations associated with this
+ * file are cleaned up via dnotify_flush(). This only applies to directories.
+ * condition: File is a directory with dnotify registrations
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: epoll interest lists
+ * desc: If the file was being monitored by epoll instances, it is removed
+ * from those interest lists via eventpoll_release().
+ * condition: File was added to epoll instances
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Buffered data
+ * desc: Any buffered data is flushed if applicable (e.g., on NFS). This
+ * attempts to write any buffered data to storage
+ * and may return errors (EIO, ENOSPC, EDQUOT) if the flush fails. The
+ * success of this flush is NOT guaranteed even with a 0 return; use
+ * fsync() before close() to ensure data persistence.
+ * condition: File was opened for writing and has buffered data
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ * target: struct file and related structures
+ * desc: When this is the last reference to the file, the file structure is
+ * freed and the dentry and mount references are released.
+ * condition: This is the last reference to the file
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Unlinked file deletion
+ * desc: If the file was previously unlinked (deleted) but kept open, closing
+ * the last reference causes the actual file data to be removed from the
+ * filesystem and the inode to be freed.
+ * condition: File was unlinked and this is the last reference
+ * reversible: no
+ *
+ * state-trans: file_descriptor
+ * from: open
+ * to: closed/free
+ * condition: Valid fd passed to close
+ * desc: The file descriptor transitions from open (usable) to closed (invalid).
+ * The fd number becomes available for reuse. This transition occurs early
+ * in close() processing, before any operations that might fail.
+ *
+ * state-trans: file_reference_count
+ * from: n
+ * to: n-1 (or freed if n was 1)
+ * condition: Always on successful fd lookup
+ * desc: The file's reference count is decremented. If this was the last
+ * reference, the file is fully cleaned up and freed.
+ *
+ * constraint: File Descriptor Reuse Race
+ * desc: Because the fd is freed early in close() processing, another thread
+ * may receive the same fd number from a concurrent open() before close()
+ * returns. Applications must not retry close() after an error return, as
+ * this could close an unrelated file opened by another thread.
+ * expr: After close(fd) returns (even with error), fd is invalid
+ *
+ * examples: close(fd); // Basic usage - ignore errors (common but not ideal)
+ * if (close(fd) == -1) perror("close"); // Log errors for debugging
+ * fsync(fd); close(fd); // Ensure data persistence before closing
+ *
+ * notes: The fd is always freed regardless of the return value. POSIX
+ * specifies that on EINTR the state of the fd is unspecified, but Linux
+ * always closes it. Retrying close() after an error may close an unrelated
+ * fd that was reassigned by another thread, so callers should never retry.
+ *
+ * Error codes like EIO, ENOSPC, and EDQUOT indicate that previously buffered
+ * writes may have failed to reach storage. These errors are particularly
+ * common on NFS where write errors are often deferred to close time.
+ *
+ * Calling close() on a file descriptor while another thread is using it
+ * (e.g., in a blocking read() or write()) does not interrupt the blocked
+ * operation. The blocked operation continues on the underlying file and
+ * may complete even after close() returns.
  */
 SYSCALL_DEFINE1(close, unsigned int, fd)
 {
-- 
2.51.0
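
For userspace callers, the close-once rule described in the kerneldoc could look roughly like the sketch below. It is not part of this patch: flush_and_close() is a hypothetical helper name, and the sketch assumes the caller wants the first failure reported as a negative errno value. It calls fsync() while the fd is still valid, then calls close() exactly once and never retries on EINTR.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not from the patch):
 * flush buffered data, then close the descriptor exactly once.
 * Returns 0 on success or a negative errno for the first failure.
 */
static int flush_and_close(int fd)
{
	int err = 0;

	/*
	 * fsync() surfaces write-back errors while fd is still usable.
	 * Note that it fails with EINVAL on objects that do not support
	 * synchronization (pipes, sockets); that is reported as-is here.
	 */
	if (fsync(fd) == -1)
		err = -errno;

	/*
	 * Call close() once. On Linux the fd is released for every
	 * result except EBADF, so looping on EINTR could close an
	 * unrelated descriptor that another thread just opened.
	 */
	if (close(fd) == -1 && err == 0)
		err = -errno;

	return err;
}
```

A caller that sees a nonzero result should treat the data as possibly lost and the fd as gone; the only recovery is to reopen the file and rewrite, not to call close() again.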

