Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Christos Zoulas
In article <20110731224944.ga23...@britannica.bec.de>,
Joerg Sonnenberger   wrote:
>On Sun, Jul 31, 2011 at 07:49:20PM +0200, Emmanuel Dreyfus wrote:
>> Both behaviors are standard compliant, since SUSv2 says nothing about
>> resolving symlinks or not. I found at least one program (glusterfs),
>> which assumes the Linux behavior, and is a real pain to fix on NetBSD
>> because of that.
>
>The standard is explicitly open on this to allow filesystems that
>implement symlinks without using inodes. Essentially, it is valid to
>store a symlink inside the directory entry itself. That's one of the
>reasons why no change semantic is provided either.

And approximately this (storing the symlink data inside the source
inode without using an extra inode if the link target fit) was
attempted in BSD4.4 and it failed miserably. We had to undo it, and
use separate inodes again.

christos



Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Emmanuel Dreyfus
Joerg Sonnenberger  wrote:

> Given the very small number of programs that manage to mess up the
> symlink usage, I'm kind of opposed to providing another system call just
> as work around for them.

You did not explain what problems it would introduce, did you?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Emmanuel Dreyfus
David Holland  wrote:

> You still haven't explained what glusterfs is doing that's so evil or
> why it can't be fixed by having it copy the symlink when that's the
> case in question.

glusterfs uses the native filesystem as its storage backend. When you
rename a filesystem object in the distributed and replicated setup, they
have to make sure it remains accessible by another client during the
operation.

Directories are all present on all servers and therefore are just
handled by a rename(2). Other objects are stored on some server and are
retrieved using a DHT. When they are renamed, they are handled by a
distributed link(2)/rename(2)/unlink(2) algorithm. This breaks on NetBSD
when the object is a symlink to a directory or a symlink to a
nonexistent target, since you cannot link(2) to such an object.
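
To make the failing step concrete, here is a single-host analogue of the
link(2)/unlink(2) part of such a rename. It is only a sketch and is not
taken from glusterfs: the function name rename_via_link is hypothetical,
and the distributed coordination is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not glusterfs code: give an object a new name
 * while keeping it reachable under at least one name the whole time.
 * Returns 0 on success, or -1 with errno set by the failing call.
 */
int
rename_via_link(const char *oldpath, const char *newpath)
{
	assert(oldpath != NULL && newpath != NULL);

	/* Step 1: add the new name; both names are valid from here on. */
	if (link(oldpath, newpath) == -1)
		return -1;

	/* Step 2: drop the old name; on failure, undo step 1. */
	if (unlink(oldpath) == -1) {
		int saved = errno;

		(void)unlink(newpath);
		errno = saved;
		return -1;
	}
	return 0;
}
```

When oldpath is a symlink, step 1 is where the two behaviors diverge: a
link(2) that resolves the symlink first fails if the target is a
directory or does not exist, so the helper never reaches step 2.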

The fix is not straightforward, and requires a change in the way glusterfs
stores symlinks in its distributed and replicated setup. I suspect it
may involve treating such objects like directories, and having them
duplicated on all servers. An alternative would be to sacrifice the
guarantee that symlinks are available during a rename, at least for
NetBSD.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: Adding linux_link(2) system call, second round

2011-07-31 Thread David Holland
On Sun, Jul 31, 2011 at 07:49:20PM +0200, Emmanuel Dreyfus wrote:
 > Quick summary for the impatient: NetBSD link(2) first resolves symlinks
 > before doing the actual link to the target. As a result, NetBSD link(2)
 > fails on symlinks to directories or to nonexistent targets.
 > 
 > On the other side, Linux link(2) is dumb and just creates a second
 > symlink with the same inode. Therefore it does not care about the
 > symlink target, and will succeed even if it is a directory or if it is
 > nonexistent. 
 > 
 > Both behaviors are standard compliant, since SUSv2 says nothing about
 > resolving symlinks or not. I found at least one program (glusterfs),
 > which assumes the Linux behavior, and is a real pain to fix on NetBSD
 > because of that.

You still haven't explained what glusterfs is doing that's so evil or
why it can't be fixed by having it copy the symlink when that's the
case in question.
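
The "copy the symlink" alternative could look roughly like the sketch
below. The helper name link_or_copy_symlink is hypothetical, and the
policy of always copying when the source is a symlink (rather than trying
link(2) first) is an assumption made to keep the behavior identical on
every platform.

```c
#define _POSIX_C_SOURCE 200809L	/* for lstat(2), readlink(2), symlink(2) */

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: hard-link ordinary objects, but give a symlink a
 * second name by creating a new symlink with the same target string.
 * Returns 0 on success, or -1 with errno set.
 */
int
link_or_copy_symlink(const char *from, const char *to)
{
	struct stat st;
	char target[PATH_MAX];
	ssize_t len;

	assert(from != NULL && to != NULL);

	if (lstat(from, &st) == -1)
		return -1;

	/* Not a symlink: the plain hard link behaves the same everywhere. */
	if (!S_ISLNK(st.st_mode))
		return link(from, to);

	/* readlink(2) does not NUL-terminate; keep one byte in reserve. */
	len = readlink(from, target, sizeof(target) - 1);
	if (len == -1)
		return -1;
	if ((size_t)len == sizeof(target) - 1) {
		/* The target may have been truncated; refuse to copy it. */
		errno = ENAMETOOLONG;
		return -1;
	}
	target[len] = '\0';

	return symlink(target, to);
}
```

Note that the copy is a separate object with its own inode, so it is not
a drop-in replacement wherever the caller relies on link counts or inode
identity.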

I remain not thrilled about adding this, mostly on the grounds that
adding variant functionality with no clear purpose or value tends to
create maintenance hassles in the long run.

-- 
David A. Holland
dholl...@netbsd.org


Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Joerg Sonnenberger
On Sun, Jul 31, 2011 at 07:49:20PM +0200, Emmanuel Dreyfus wrote:
> Both behaviors are standard compliant, since SUSv2 says nothing about
> resolving symlinks or not. I found at least one program (glusterfs),
> which assumes the Linux behavior, and is a real pain to fix on NetBSD
> because of that.

The standard is explicitly open on this to allow filesystems that
implement symlinks without using inodes. Essentially, it is valid to
store a symlink inside the directory entry itself. That's one of the
reasons why no change semantic is provided either.

Given the very small number of programs that manage to mess up the
symlink usage, I'm kind of opposed to providing another system call just
as work around for them. Besides, NetBSD isn't the only implementation
following this strategy...

Joerg


Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Roland C. Dowdeswell
On Sun, Jul 31, 2011 at 06:36:53PM +, Christos Zoulas wrote:
>

> Also perhaps just call it link2(from, to, flags) in the long tradition
> of adding a number to existing syscalls when extending them ;-)

Or perhaps llink(2) for symmetry with lchmod(2) and lstat(2).

--
Roland Dowdeswell  http://Imrryr.ORG/~elric/


Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Christos Zoulas
On Jul 31,  9:18pm, el...@imrryr.org ("Roland C. Dowdeswell") wrote:
-- Subject: Re: Adding linux_link(2) system call, second round

| On Sun, Jul 31, 2011 at 06:36:53PM +, Christos Zoulas wrote:
| >
| 
| > Also perhaps just call it link2(from, to, flags) in the long tradition
| > of adding a number to existing syscalls when extending them ;-)
| 
| Or perhaps llink(2) for symmetry with lchmod(2) and lstat(2).

I like that even more!

christos


Re: Adding linux_link(2) system call, second round

2011-07-31 Thread Christos Zoulas
In article <1k5abxi.a8h289rm4jc3m%m...@netbsd.org>,
Emmanuel Dreyfus  wrote:
>Quick summary for the impatient: NetBSD link(2) first resolves symlinks
>before doing the actual link to the target. As a result, NetBSD link(2)
>fails on symlinks to directories or to nonexistent targets.
>
>On the other side, Linux link(2) is dumb and just creates a second
>symlink with the same inode. Therefore it does not care about the
>symlink target, and will succeed even if it is a directory or if it is
>nonexistent. 
>
>Both behaviors are standard compliant, since SUSv2 says nothing about
>resolving symlinks or not. I found at least one program (glusterfs),
>which assumes the Linux behavior, and is a real pain to fix on NetBSD
>because of that.
>
>I proposed to implement a linux_link(2), or lazy_link(2), whatever
>sounds nicer. It seems it does not reach consensus, but I am not sure I
>understood why: what are the problems that would be introduced by adding
>such a system call? At least I can tell what benefit it would have: it
>would ease porting from Linux.

I don't have an issue with it as long as:
- fsck does not get confused
- filesystems don't need to be modified to support it
- there is consensus that this is not harmful
- I am also ambivalent about exposing this in the native abi
  because it will only cause confusion.

Also perhaps just call it link2(from, to, flags) in the long tradition
of adding a number to existing syscalls when extending them ;-)
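
For comparison, POSIX.1-2008 already specifies a link call with a flags
argument, linkat(2), on systems that provide it. The sketch below wraps
it in the shape suggested here; the wrapper name link_with_flags and its
boolean parameter are hypothetical, and nothing in this thread says the
proposed call would be built this way.

```c
#define _POSIX_C_SOURCE 200809L	/* for linkat(2) and AT_* constants */

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: follow == true asks for the symlink to be
 * resolved before linking; follow == false asks for a link to the
 * symlink itself.  Returns 0 on success, or -1 with errno set.
 */
int
link_with_flags(const char *from, const char *to, bool follow)
{
	assert(from != NULL && to != NULL);

	return linkat(AT_FDCWD, from, AT_FDCWD, to,
	    follow ? AT_SYMLINK_FOLLOW : 0);
}
```

With follow set, a symlink whose target does not exist cannot be linked,
because there is nothing to resolve it to. With follow clear, the outcome
depends on whether the filesystem supports hard links to symlinks.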

christos



kcpuset(9) interface

2011-07-31 Thread Mindaugas Rasiukevicius
Hello,

Here is a reworked dynamic CPU set implementation for the kernel (the shared
cpuset.c in src/common will be moved to libc) - a kcpuset(9) interface:

http://www.netbsd.org/~rmind/kcpuset_ng.diff

It supports early use while the system is cold through a fix-up mechanism,
see kcpuset_sysinit().  That would enable us to use kcpuset(9) in MD code,
such as pmap(9).  The intention of the interface is to:  1) replace hard-coded
parts (e.g. limited to uint32_t or the MAXCPUS constant) with a more dynamic
mechanism;  2) replace and unify duplicated CPU bitset code (e.g. in MIPS,
PowerPC, sparc64, which have their own copies).
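
As an illustration of the first goal, a CPU set sized at run time instead
of being capped by a single uint32_t or a compile-time constant might be
sketched as follows. This is a userland sketch, not the kcpuset(9) API
from the diff: every name here (dyn_cpuset and its functions) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type; the real interface is in the linked diff. */
struct dyn_cpuset {
	unsigned int	ncpu;	/* number of CPUs the set can describe */
	size_t		nwords;	/* number of 32-bit words in bits[] */
	uint32_t	bits[];	/* one bit per CPU index */
};

/* Allocate an empty set able to hold CPU indexes 0 .. ncpu - 1. */
struct dyn_cpuset *
dyn_cpuset_create(unsigned int ncpu)
{
	size_t nwords = ((size_t)ncpu + 31) / 32;
	struct dyn_cpuset *cs;

	cs = calloc(1, sizeof(*cs) + nwords * sizeof(cs->bits[0]));
	if (cs == NULL)
		return NULL;
	cs->ncpu = ncpu;
	cs->nwords = nwords;
	return cs;
}

void
dyn_cpuset_set(struct dyn_cpuset *cs, unsigned int cpu)
{
	assert(cs != NULL && cpu < cs->ncpu);
	cs->bits[cpu / 32] |= UINT32_C(1) << (cpu % 32);
}

void
dyn_cpuset_clear(struct dyn_cpuset *cs, unsigned int cpu)
{
	assert(cs != NULL && cpu < cs->ncpu);
	cs->bits[cpu / 32] &= ~(UINT32_C(1) << (cpu % 32));
}

bool
dyn_cpuset_isset(const struct dyn_cpuset *cs, unsigned int cpu)
{
	assert(cs != NULL && cpu < cs->ncpu);
	return (cs->bits[cpu / 32] & (UINT32_C(1) << (cpu % 32))) != 0;
}

void
dyn_cpuset_destroy(struct dyn_cpuset *cs)
{
	free(cs);
}
```

The word count is derived from the CPU count at creation time, so the
same code serves a 2-CPU and a 1024-CPU machine without a rebuild.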

Comments?

-- 
Mindaugas


exec and VM_MAP_TOPDOWN - chicken & egg?

2011-07-31 Thread Martin Husemann
I have a small (mostly conceptual) issue with sys/kern/exec_elf.c.
In my view the exec operation is a kind of constructor op for a vmspace,
but on the other hand exec needs to know where to put the interpreter,
which slightly differs if we are about to arrange for a topdown VM layout.

My concrete issue popped up when I tried to exec in a proc that has no
p_vmspace at all yet - so it crashes when checking for VM_MAP_TOPDOWN
in the vmspace flags.

This is easily worked around by this patch:

Index: exec_elf.c
===
RCS file: /cvsroot/src/sys/kern/exec_elf.c,v
retrieving revision 1.30
diff -c -u -p -r1.30 exec_elf.c
--- exec_elf.c  19 Jul 2011 19:45:36 -  1.30
+++ exec_elf.c  31 Jul 2011 18:01:22 -
@@ -84,6 +84,7 @@ __KERNEL_RCSID(1, "$NetBSD: exec_elf.c,v
 #include 
 
 #include 
+#include 
 
 extern struct emul emul_netbsd;
 
@@ -406,9 +407,19 @@ elf_load_file(struct lwp *l, struct exec
 	u_long phsize;
 	Elf_Addr addr = *last;
 	struct proc *p;
+	bool use_topdown;
 
 	p = l->l_proc;
 
+	if (p->p_vmspace)
+		use_topdown = p->p_vmspace->vm_map.flags & VM_MAP_TOPDOWN;
+	else
+#ifdef __USING_TOPDOWN_VM
+		use_topdown = true;
+#else
+		use_topdown = false;
+#endif
+
 	/*
 	 * 1. open file
 	 * 2. read filehdr
@@ -552,7 +563,7 @@ elf_load_file(struct lwp *l, struct exec
 	flags = VMCMD_BASE;
 	if (addr == ELF_LINK_ADDR)
 		addr = ph0->p_vaddr;
-	if (p->p_vmspace->vm_map.flags & VM_MAP_TOPDOWN)
+	if (use_topdown)
 		addr = ELF_TRUNC(addr, ph0->p_align);
 	else
 		addr = ELF_ROUND(addr, ph0->p_align);



Obviously this is a hack. Thinking about what happens in the normal case:
we are about to create the new vmspace, but the check tests the flags for
the old vmspace. The new vmspace will not inherit the flags, but will
have the same default as the use_topdown variable I added in the patch.

I would have expected that emulations would care, but I can't find traces
of it. And the only exec format that cares is ELF. Wouldn't it be conceptually
cleaner if the "we would like to arrange for topdown VM, if possible" flag
were part of struct exec_pack and explicitly set up front (maybe by
just copying it from the current proc's vmspace flags)?

Object loaders and emulations could override it, and the vmspace flag
could later be set accordingly.

Am I missing something?

Martin


Adding linux_link(2) system call, second round

2011-07-31 Thread Emmanuel Dreyfus
Quick summary for the impatient: NetBSD link(2) first resolves symlinks
before doing the actual link to the target. As a result, NetBSD link(2)
fails on symlinks to directories or to nonexistent targets.

On the other side, Linux link(2) is dumb and just creates a second
symlink with the same inode. Therefore it does not care about the
symlink target, and will succeed even if it is a directory or if it is
nonexistent. 
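
The difference can be observed with a small probe along these lines. It
is a sketch: the enum and function names are hypothetical, and it assumes
that no file named "no-such-target-for-link-probe" exists next to the
symlink it creates.

```c
#define _POSIX_C_SOURCE 200809L	/* for symlink(2) and lstat(2) */

#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

enum link_probe {
	LINK_PROBE_ERROR = -1,		/* the probe could not be run */
	LINK_PROBE_RESOLVES = 0,	/* link(2) resolved the symlink */
	LINK_PROBE_LINKS_SYMLINK = 1	/* link(2) linked the symlink itself */
};

/*
 * Create a dangling symlink at sympath, try to link(2) it to newpath,
 * and report which behavior was seen.  Both names are removed again.
 */
enum link_probe
probe_link_on_symlink(const char *sympath, const char *newpath)
{
	struct stat st;
	enum link_probe result;

	assert(sympath != NULL && newpath != NULL);

	/* Dangling symlink; the target name is assumed not to exist. */
	if (symlink("no-such-target-for-link-probe", sympath) == -1)
		return LINK_PROBE_ERROR;

	if (link(sympath, newpath) == -1) {
		/* ENOENT here means link(2) went looking for the target. */
		result = (errno == ENOENT) ?
		    LINK_PROBE_RESOLVES : LINK_PROBE_ERROR;
		(void)unlink(sympath);
		return result;
	}

	/* The link succeeded; check what the new name actually is. */
	if (lstat(newpath, &st) == -1)
		result = LINK_PROBE_ERROR;
	else if (S_ISLNK(st.st_mode))
		result = LINK_PROBE_LINKS_SYMLINK;
	else
		result = LINK_PROBE_RESOLVES;

	(void)unlink(newpath);
	(void)unlink(sympath);
	return result;
}
```

Going by the description above, one would expect the "resolves" result
from the NetBSD behavior and the "links symlink" result from the Linux
behavior.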

Both behaviors are standard compliant, since SUSv2 says nothing about
resolving symlinks or not. I found at least one program (glusterfs),
which assumes the Linux behavior, and is a real pain to fix on NetBSD
because of that.

I proposed to implement a linux_link(2), or lazy_link(2), whatever
sounds nicer. It seems it does not reach consensus, but I am not sure I
understood why: what are the problems that would be introduced by adding
such a system call? At least I can tell what benefit it would have: it
would ease porting from Linux.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org