Re: patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
On Fri, 4 May 2007 23:34:08 -0400 Quentin Godfroy <[EMAIL PROTECTED]> wrote: > By the way, is init 32 bits or 64 bits? It could break the ia32 > emulation thing, but not the 64bit native mode. akpm2:/home/akpm> file /sbin/init /sbin/init: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped akpm2:/home/akpm> cat /etc/issue Fedora Core release 6 (Zod) Kernel \r on an \m - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
Quentin Godfroy wrote: >> Won't this break with ET_DYN executables? And besides, isn't this the >> same thing? >> > > Indeed, I haven't seen that. For ET_DYN executables, it could be done a > thing like load_addr+elf_ppnt->p_vaddr (in the function that creates the > auxv, as ity has access to the elf header), and for ET_EXEC do what I > propose. I think this is trivial to do. I'll do it as soon as I come back > in front of my machine. > I don't think you need to special-case it. You can compute the offset between the linked address and the load address (first PT_LOAD[0]->p_vaddr - load_addr) and use that to offset all the other addresses. > I don't understand. Yes it is what it is supposed to be, and the kernel > is supposed to give the vaddr of the phdr table to the interpreter and > not load addr + offset of phdr in file, which is sometimes wrong. > How can it be wrong? Does the PT_PHDR point to a different array of Phdr entries? J - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
On Fri, May 04, 2007 at 04:22:08PM -0700, Andrew Morton wrote: > This patch kills my FC6 machine (using a config which was derived from RH's > original): > > Freeing unused kernel memory: 368k freed > Write protecting the kernel read-only data: 959k > request_module: runaway loop modprobe binfmt-464c > request_module: runaway loop modprobe binfmt-464c > request_module: runaway loop modprobe binfmt-464c > request_module: runaway loop modprobe binfmt-464c > request_module: runaway loop modprobe binfmt-464c > > > .config: http://userweb.kernel.org/~akpm/config-akpm2.txt I didn't try it on a 64bit kernel. I'll do as soon as I can reach my machine. Probably the loop does not find PT_PHDR and then returns noexec. I had such a problem, but it was because I forgot elf_ppnt = elf_phdata before the loop. By the way, is init 32 bits or 64 bits? It could break the ia32 emulation thing, but not the 64bit native mode. Anyway the problem could be addressed by returning back to the old behaviour if the loop fails, but it's not clean at all. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
On Fri, May 04, 2007 at 04:31:49PM -0700, Jeremy Fitzhardinge wrote: > Quentin Godfroy wrote: > > + elf_ppnt = elf_phdata; > > + for (i = 0; i< loc->elf_ex.e_phnum; i++, elf_ppnt++) > > + if (elf_ppnt->p_type == PT_PHDR) { > > + phdr_addr = elf_ppnt->p_vaddr; > > > > Won't this break with ET_DYN executables? And besides, isn't this the > same thing? Indeed, I haven't seen that. For ET_DYN executables, it could be done a thing like load_addr+elf_ppnt->p_vaddr (in the function that creates the auxv, as ity has access to the elf header), and for ET_EXEC do what I propose. I think this is trivial to do. I'll do it as soon as I come back in front of my machine. > Shouldn't PT_PHDR->p_vaddr point to the vaddr of the Phdr > table itself? I don't understand. Yes it is what it is supposed to be, and the kernel is supposed to give the vaddr of the phdr table to the interpreter and not load addr + offset of phdr in file, which is sometimes wrong. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
Quentin Godfroy wrote: > + elf_ppnt = elf_phdata; > + for (i = 0; i< loc->elf_ex.e_phnum; i++, elf_ppnt++) > + if (elf_ppnt->p_type == PT_PHDR) { > + phdr_addr = elf_ppnt->p_vaddr; > Won't this break with ET_DYN executables? And besides, isn't this the same thing? Shouldn't PT_PHDR->p_vaddr point to the vaddr of the Phdr table itself? J - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
On Fri, 4 May 2007 10:09:21 -0400 Quentin Godfroy <[EMAIL PROTECTED]> wrote: > On a dynamic ELF executable, the current kernel loader gives to the > interpreter (in the AUXV vector) the AT_PHDR argument as : > offset_of_phdr_in_file + first address. > > It can be wrong for an executable where the program headers are not located > in the first loaded segment. > > This patch corrects the behaviour. > > Signed-off-by: Quentin Godfroy <[EMAIL PROTECTED]> > --- > Here is an example of such an ELF executable which the current code > fails on : > ftp://quatramaran.ens.fr/pub/godfroy/addrpath/broken-sample > > --- linux-2.6.21.1/fs/binfmt_elf.c2007-05-04 03:20:00.0 -0400 > +++ linux-2.6.21.1-patch/fs/binfmt_elf.c 2007-05-04 08:02:18.0 > -0400 > @@ -134,6 +134,7 @@ static int padzero(unsigned long elf_bss > static int > create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, > int interp_aout, unsigned long load_addr, > + unsigned long phdr_addr, > unsigned long interp_load_addr) > { > unsigned long p = bprm->p; > @@ -190,7 +191,7 @@ create_elf_tables(struct linux_binprm *b > NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP); > NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE); > NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC); > - NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff); > + NEW_AUX_ENT(AT_PHDR, phdr_addr); > NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr)); > NEW_AUX_ENT(AT_PHNUM, exec->e_phnum); > NEW_AUX_ENT(AT_BASE, interp_load_addr); > @@ -529,7 +530,7 @@ static unsigned long randomize_stack_top > static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs) > { > struct file *interpreter = NULL; /* to shut gcc up */ > - unsigned long load_addr = 0, load_bias = 0; > + unsigned long load_addr = 0, load_bias = 0, phdr_addr = 0; > int load_addr_set = 0; > char * elf_interpreter = NULL; > unsigned int interpreter_type = INTERPRETER_NONE; > @@ -718,6 +719,16 @@ static int load_elf_binary(struct linux_ > break; > } > > + elf_ppnt = elf_phdata; > + for (i = 0; i< loc->elf_ex.e_phnum; i++, elf_ppnt++) > + if (elf_ppnt->p_type == PT_PHDR) { > + phdr_addr = elf_ppnt->p_vaddr; > + break; > + } > + retval = -ENOEXEC; > + if (!phdr_addr) > + goto out_free_dentry; > + > /* Some simple consistency checks for the interpreter */ > if (elf_interpreter) { > interpreter_type = INTERPRETER_ELF | INTERPRETER_AOUT; > @@ -987,7 +998,7 @@ static int load_elf_binary(struct linux_ > current->flags &= ~PF_FORKNOEXEC; > create_elf_tables(bprm, &loc->elf_ex, > (interpreter_type == INTERPRETER_AOUT), > - load_addr, interp_load_addr); > + load_addr, phdr_addr, interp_load_addr); > /* N.B. passed_fileno might not be initialized? */ > if (interpreter_type == INTERPRETER_AOUT) > current->mm->arg_start += strlen(passed_fileno) + 1; This patch kills my FC6 machine (using a config which was derived from RH's original): Freeing unused kernel memory: 368k freed Write protecting the kernel read-only data: 959k request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c .config: http://userweb.kernel.org/~akpm/config-akpm2.txt - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] AF_RXRPC: Sort out MTU handling
From: David Howells <[EMAIL PROTECTED]> Date: Fri, 04 May 2007 15:56:47 +0100 > Sort out the MTU determination and handling in AF_RXRPC: > > (1) If it's present, parse the additional information supplied by the peer at > the end of the ACK packet (struct ackinfo) to determine the MTU sizes > that peer is willing to support. > > (2) Initialise the MTU size to that peer from the kernel's routing records. > > (3) Send ACKs rather than ACKALLs as the former carry the additional info, > and the latter do not. > > (4) Declare the interface MTU size in outgoing ACKs as a maximum amount of > data that can be stuffed into an RxRPC packet without it having to be > fragmented to come in this computer's NIC. > > (5) If sendmsg() is given MSG_MORE then it should allocate an skb of the > maximum size rather than one just big enough for the data it's got left > to process on the theory that there is more data to come that it can > append to that packet. > > This means, for example, that if AFS does a large StoreData op, all the > packets barring the last will be filled to the maximum unfragmented size. > > Signed-off-by: David Howells <[EMAIL PROTECTED]> Applied, thanks. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] JFS: Fix race waking up jfsIO kernel thread
I've been looking at a hang that was reported off-list, and I believe I found the cause. I'm still waiting for confirmation on whether the patch does fix the problem, but I wanted to distribute the patch in case anyone else is seeing a similar hang. It looks like the problem is in kernels from 2.6.17 to the present. Thanks, Shaggy JFS: Fix race waking up jfsIO kernel thread It's possible for a journal I/O request to be added to the log_redrive queue and the jfsIO thread to be awakened after the thread releases log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE. The jfsIO thread should set its state before giving up the spinlock, so the waking thread will really wake it. Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]> diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c index ff7f1be..16c6268 100644 --- a/fs/jfs/jfs_logmgr.c +++ b/fs/jfs/jfs_logmgr.c @@ -2354,12 +2354,13 @@ int jfsIOWait(void *arg) lbmStartIO(bp); spin_lock_irq(&log_redrive_lock); } - spin_unlock_irq(&log_redrive_lock); if (freezing(current)) { + spin_unlock_irq(&log_redrive_lock); refrigerator(); } else { set_current_state(TASK_INTERRUPTIBLE); + spin_unlock_irq(&log_redrive_lock); schedule(); __set_current_state(TASK_RUNNING); } -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] AF_RXRPC: Sort out MTU handling
Sort out the MTU determination and handling in AF_RXRPC: (1) If it's present, parse the additional information supplied by the peer at the end of the ACK packet (struct ackinfo) to determine the MTU sizes that peer is willing to support. (2) Initialise the MTU size to that peer from the kernel's routing records. (3) Send ACKs rather than ACKALLs as the former carry the additional info, and the latter do not. (4) Declare the interface MTU size in outgoing ACKs as a maximum amount of data that can be stuffed into an RxRPC packet without it having to be fragmented to come in this computer's NIC. (5) If sendmsg() is given MSG_MORE then it should allocate an skb of the maximum size rather than one just big enough for the data it's got left to process on the theory that there is more data to come that it can append to that packet. This means, for example, that if AFS does a large StoreData op, all the packets barring the last will be filled to the maximum unfragmented size. Signed-off-by: David Howells <[EMAIL PROTECTED]> --- net/rxrpc/ar-ack.c| 80 ++--- net/rxrpc/ar-error.c |2 + net/rxrpc/ar-output.c |2 + net/rxrpc/ar-peer.c | 45 +++- 4 files changed, 115 insertions(+), 14 deletions(-) diff --git a/net/rxrpc/ar-ack.c b/net/rxrpc/ar-ack.c index fc07a92..657ee69 100644 --- a/net/rxrpc/ar-ack.c +++ b/net/rxrpc/ar-ack.c @@ -543,6 +543,38 @@ static void rxrpc_zap_tx_window(struct rxrpc_call *call) } /* + * process the extra information that may be appended to an ACK packet + */ +static void rxrpc_extract_ackinfo(struct rxrpc_call *call, struct sk_buff *skb, + unsigned latest, int nAcks) +{ + struct rxrpc_ackinfo ackinfo; + struct rxrpc_peer *peer; + unsigned mtu; + + if (skb_copy_bits(skb, nAcks + 3, &ackinfo, sizeof(ackinfo)) < 0) { + _leave(" [no ackinfo]"); + return; + } + + _proto("Rx ACK %%%u Info { rx=%u max=%u rwin=%u jm=%u }", + latest, + ntohl(ackinfo.rxMTU), ntohl(ackinfo.maxMTU), + ntohl(ackinfo.rwind), ntohl(ackinfo.jumbo_max)); + + mtu = min(ntohl(ackinfo.rxMTU), ntohl(ackinfo.maxMTU)); + + peer = call->conn->trans->peer; + if (mtu < peer->maxdata) { + spin_lock_bh(&peer->lock); + peer->maxdata = mtu; + peer->mtu = mtu + peer->hdrsize; + spin_unlock_bh(&peer->lock); + _net("Net MTU %u (maxdata %u)", peer->mtu, peer->maxdata); + } +} + +/* * process packets in the reception queue */ static int rxrpc_process_rx_queue(struct rxrpc_call *call, @@ -606,6 +638,8 @@ process_further: rxrpc_acks[ack.reason], ack.nAcks); + rxrpc_extract_ackinfo(call, skb, latest, ack.nAcks); + if (ack.reason == RXRPC_ACK_PING) { _proto("Rx ACK %%%u PING Request", latest); rxrpc_propose_ACK(call, RXRPC_ACK_PING_RESPONSE, @@ -801,9 +835,9 @@ void rxrpc_process_call(struct work_struct *work) struct msghdr msg; struct kvec iov[5]; unsigned long bits; - __be32 data; + __be32 data, pad; size_t len; - int genbit, loop, nbit, ioc, ret; + int genbit, loop, nbit, ioc, ret, mtu; u32 abort_code = RX_PROTOCOL_ERROR; u8 *acks = NULL; @@ -899,9 +933,30 @@ void rxrpc_process_call(struct work_struct *work) } if (test_bit(RXRPC_CALL_ACK_FINAL, &call->events)) { - hdr.type = RXRPC_PACKET_TYPE_ACKALL; genbit = RXRPC_CALL_ACK_FINAL; - goto send_message; + + ack.bufferSpace = htons(8); + ack.maxSkew = 0; + ack.serial = 0; + ack.reason = RXRPC_ACK_IDLE; + ack.nAcks = 0; + call->ackr_reason = 0; + + spin_lock_bh(&call->lock); + ack.serial = call->ackr_serial; + ack.previousPacket = call->ackr_prev_seq; + ack.firstPacket = htonl(call->rx_data_eaten + 1); + spin_unlock_bh(&call->lock); + + pad = 0; + + iov[1].iov_base = &ack; + iov[1].iov_len = sizeof(ack); + iov[2].iov_base = &pad; + iov[2].iov_len = 3; + iov[3].iov_base = &ackinfo; + iov[3].iov_len = sizeof(ackinfo); + goto send_ACK; } if (call->events & ((1 << RXRPC_CALL_RCVD_BUSY) | @@ -971,8 +1026,6 @@ void rxrpc_process_call(struct work_struct *work) /* consider sending an ordinary ACK */ if (test_bit(RXRPC_CALL_ACK, &call->events)) { - __be32 pad; - _debug("send ACK: window: %d - %d { %lx }",
patch: VFS: fix passing of AT_PHDR value in auxv to ELF interpreter
On a dynamic ELF executable, the current kernel loader gives to the interpreter (in the AUXV vector) the AT_PHDR argument as : offset_of_phdr_in_file + first address. It can be wrong for an executable where the program headers are not located in the first loaded segment. This patch corrects the behaviour. Signed-off-by: Quentin Godfroy <[EMAIL PROTECTED]> --- Here is an example of such an ELF executable which the current code fails on : ftp://quatramaran.ens.fr/pub/godfroy/addrpath/broken-sample --- linux-2.6.21.1/fs/binfmt_elf.c 2007-05-04 03:20:00.0 -0400 +++ linux-2.6.21.1-patch/fs/binfmt_elf.c2007-05-04 08:02:18.0 -0400 @@ -134,6 +134,7 @@ static int padzero(unsigned long elf_bss static int create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, int interp_aout, unsigned long load_addr, + unsigned long phdr_addr, unsigned long interp_load_addr) { unsigned long p = bprm->p; @@ -190,7 +191,7 @@ create_elf_tables(struct linux_binprm *b NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP); NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE); NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC); - NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff); + NEW_AUX_ENT(AT_PHDR, phdr_addr); NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr)); NEW_AUX_ENT(AT_PHNUM, exec->e_phnum); NEW_AUX_ENT(AT_BASE, interp_load_addr); @@ -529,7 +530,7 @@ static unsigned long randomize_stack_top static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs) { struct file *interpreter = NULL; /* to shut gcc up */ - unsigned long load_addr = 0, load_bias = 0; + unsigned long load_addr = 0, load_bias = 0, phdr_addr = 0; int load_addr_set = 0; char * elf_interpreter = NULL; unsigned int interpreter_type = INTERPRETER_NONE; @@ -718,6 +719,16 @@ static int load_elf_binary(struct linux_ break; } + elf_ppnt = elf_phdata; + for (i = 0; i< loc->elf_ex.e_phnum; i++, elf_ppnt++) + if (elf_ppnt->p_type == PT_PHDR) { + phdr_addr = elf_ppnt->p_vaddr; + break; + } + retval = -ENOEXEC; + if (!phdr_addr) + goto out_free_dentry; + /* Some simple consistency checks for the interpreter */ if (elf_interpreter) { interpreter_type = INTERPRETER_ELF | INTERPRETER_AOUT; @@ -987,7 +998,7 @@ static int load_elf_binary(struct linux_ current->flags &= ~PF_FORKNOEXEC; create_elf_tables(bprm, &loc->elf_ex, (interpreter_type == INTERPRETER_AOUT), - load_addr, interp_load_addr); + load_addr, phdr_addr, interp_load_addr); /* N.B. passed_fileno might not be initialized? */ if (interpreter_type == INTERPRETER_AOUT) current->mm->arg_start += strlen(passed_fileno) + 1; - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Implement renaming for debugfs
On Thu 03-05-07 17:16:02, Greg KH wrote: > On Thu, May 03, 2007 at 11:54:52AM +0200, Jan Kara wrote: > > On Tue 01-05-07 20:26:27, Greg KH wrote: > > > On Mon, Apr 30, 2007 at 07:55:36PM +0200, Jan Kara wrote: > > > > Hello, > > > > > > > > attached patch implements renaming for debugfs. I was asked for this > > > > feature by WLAN guys and I guess it makes sence (they have some debug > > > > info > > > > in the directory identified by interface name and that can change...). > > > > Could someone have a look at what I wrote whether it looks reasonable? > > > > Thanks. > > > > > > > > Honza > > > > > > > > -- > > > > Jan Kara <[EMAIL PROTECTED]> > > > > SuSE CR Labs > > > > > > > Implement debugfs_rename() to allow renaming files/directories in > > > > debugfs. > > > > > > I think you are going to need more infrastructure here, the caller > > > doesn't want to have to allocate a new dentry themselves, they just want > > > to pass in the new filename :) > > Actually, I wanted the call to be in the spirit of other debugfs calls. > > So we have for example: > > void debugfs_remove(struct dentry *dentry) > > That is because 'debugfs_create' returns a dentry. > > > struct dentry *debugfs_create_dir(const char *name, struct dentry *parent) > > etc. > > Same here, you already have a dentry to place this directory into, _and_ > all the user needs to provide is a name for the new directory. They > don't ever create a dentry themselves, which is what your function > required them to do. > > Try using your function and you'll see what I mean :) I've tried it when testing the function :). The code looked like: dir1 = debugfs_create_dir("dir1", NULL); dir2 = debugfs_create_dir("dir2", NULL); file1 = debugfs_create_file("file1", 0644, dir1, NULL, NULL); file2 = debugfs_rename(dir1, file1, dir2, "new_name"); No new dentries needed to be created... > > So it seemed to me that the interface with dentries was perfectly > > appropriate... One possibility would be to take filename of a file to > > rename instead of old_dentry. But dirs should IMHO remain to be dentries... > > Sure, they should be dentries, but the caller should just provide a > name. Hmm, I'm not sure I understand... So you prefer debugfs_rename() to be: struct dentry *debugfs_rename(struct dentry *old_dir, const char *old_name, struct dentry *new_dir, const char *new_name); Right? Honza - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] fs: add 4th case to do_path_lookup
Oh and btw, net/sunrpc/rpc_pipe.c:rpc_lookup_parent() and fs/nfsctl.c:do_open() should be switched to the new code, at which point the path_walk() export can go. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc
On Thu, May 03, 2007 at 11:28:15PM -0700, Andrew Morton wrote: > On Fri, 4 May 2007 16:07:31 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > On Thu, May 03, 2007 at 09:29:55PM -0700, Andrew Morton wrote: > > > On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> > > > wrote: > > > > > > > This patch implements the fallocate() system call and adds support for > > > > i386, x86_64 and powerpc. > > > > > > > > ... > > > > +{ > > > > + struct file *file; > > > > + struct inode *inode; > > > > + long ret = -EINVAL; > > > > + > > > > + if (len == 0 || offset < 0) > > > > + goto out; > > > > > > The posix spec implies that negative `len' is permitted - presumably > > > "allocate > > > ahead of `offset'". How peculiar. > > > > I just checked the man page for posix_fallocate() and it says: > > > > EINVAL offset or len was less than zero. > > > > We should probably follow this lead. > > Yes, I think so. I'm suspecting that > http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html > is just buggy. Or I can't read. > > I mean, if we're going to support negative `len' then is the byte at > `offset' inside or outside the segment? Head spins. I don't think we should care. If we provide a syscall with the semantics of "allocate from offset to offset+len" then glibc's implementation can turn negative length into two separate fallocate syscalls > > > > + ret = -ENODEV; > > > > + if (!S_ISREG(inode->i_mode)) > > > > + goto out_fput; > > > > > > So we return ENODEV against an S_ISBLK fd, as per the posix spec. That > > > seems a bit silly of them. > > > > H - I thought that the intention of sys_fallocate() was to > > be generic enough to eventually allow preallocation on directories. > > If that is the case, then this check will prevent that > > The above opengroup page only permits S_ISREG. Preallocating directories > sounds quite useful to me, although it's something which would be pretty > hard to emulate if the FS doesn't support it. And there's a decent case to > be made for emulating it - run-anywhere reasons. Does glibc emulation support > directories? Quite unlikely. > > But yes, sounds like a desirable thing. Would XFS support it easily if the > above > check was relaxed? No - right now empty blocks are pruned from the directory immediately so I don't think we really have a concept of empty blocks in the btree structure. dir2 is bloody complex, so adding preallocation is probably not going to be simple to do. It's not high on my list to add, either, because we can typically avoid the worst case directory fragmentation by using larger directory block sizes (e.g. 16k instead of the default 4k on a 4k block size fs). IIRC directory preallocation has been talked about more for ext3/4 Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] fs: add 4th case to do_path_lookup
sorry, I proposed Jeff a reply long ago but haven't done yet. On Fri, May 04, 2007 at 12:02:00AM -0700, Andrew Morton wrote: > > @@ -1125,6 +1125,10 @@ static int fastcall do_path_lookup(int dfd, const > > char *name, > > nd->mnt = mntget(fs->rootmnt); > > nd->dentry = dget(fs->root); > > read_unlock(&fs->lock); > > + } else if (flags & LOOKUP_ONE) { > > + /* nd->mnt and nd->dentry already set, just grab references */ > > + mntget(nd->mnt); > > + dget(nd->dentry); > > } else if (dfd == AT_FDCWD) { > > read_lock(&fs->lock); > > nd->mnt = mntget(fs->pwdmnt); > > Well the patch passes my too-small-to-care-about test ;) > > Unless someone objects I'd suggest that you add it to the unionfs tree. The code is obviously correct. There is one little thing that bothers me, and that's that nd was purely an output paramter to path_lookup and do_path_lookup, and no it's an input paramter for the least used case. It might make sense to just a simple helper ala: static int path_component_lookup(struct dentry *dentry, struct vfsmount *mnt, const char *name, unsigned int flags, struct nameidata *nd) { int retval; nd->last_type = LAST_ROOT; nd->flags = flags; nd->mnt = mntget(mnt); nd->dentry = dget(dentry); nd->depth = 0; retval = path_walk(name, nd); if (unlikely(!retval && !audit_dummy_context() && nd->dentry && nd->dentry->d_inode)) audit_inode(name, nd->dentry->d_inode); return retval; } instead. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] fs: add 4th case to do_path_lookup
On Sun, 29 Apr 2007 23:30:12 -0400 Josef Sipek <[EMAIL PROTECTED]> wrote: > Signed-off-by: Josef 'Jeff' Sipek <[EMAIL PROTECTED]> > > diff --git a/fs/namei.c b/fs/namei.c > index 2995fba..1516a9b 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1125,6 +1125,10 @@ static int fastcall do_path_lookup(int dfd, const char > *name, > nd->mnt = mntget(fs->rootmnt); > nd->dentry = dget(fs->root); > read_unlock(&fs->lock); > + } else if (flags & LOOKUP_ONE) { > + /* nd->mnt and nd->dentry already set, just grab references */ > + mntget(nd->mnt); > + dget(nd->dentry); > } else if (dfd == AT_FDCWD) { > read_lock(&fs->lock); > nd->mnt = mntget(fs->pwdmnt); > diff --git a/include/linux/namei.h b/include/linux/namei.h > index 92b422b..aa89d97 100644 > --- a/include/linux/namei.h > +++ b/include/linux/namei.h > @@ -48,6 +48,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, > LAST_BIND}; > * - internal "there are more path compnents" flag > * - locked when lookup done with dcache_lock held > * - dentry cache is untrusted; force a real lookup > + * - lookup path from given dentry/vfsmount pair > */ > #define LOOKUP_FOLLOW 1 > #define LOOKUP_DIRECTORY 2 > @@ -55,6 +56,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, > LAST_BIND}; > #define LOOKUP_PARENT16 > #define LOOKUP_NOALT 32 > #define LOOKUP_REVAL 64 > +#define LOOKUP_ONE 128 Well the patch passes my too-small-to-care-about test ;) Unless someone objects I'd suggest that you add it to the unionfs tree. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html