Re: resource leak in linux emulation?
In article <201405040936.21907.m...@ecs.vuw.ac.nz>,
Mark Davies <m...@ecs.vuw.ac.nz> wrote:
>On Thu, 24 Apr 2014 07:18:10 David Laight wrote:
>
>To fix, this should be added somewhere, probably at
>sys/kern/kern_exit.c:487 (but I'm not sure if there's a better
>location):
>
>	if ((l->l_pflag & LP_PIDLID) != 0 && l->l_lid != p->p_pid) {
>		proc_free_pid(l->l_lid);
>	}
>
>That doesn't look like the right place.
>I think it should be further down (and with proc_lock held).
>
>So can someone suggest where exactly the patch should go. And isn't
>proc_lock held at this point (entered at line 344, exit at line 569)?

How about this?

christos

Index: kern_exit.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exit.c,v
retrieving revision 1.243
diff -u -u -r1.243 kern_exit.c
--- kern_exit.c	9 Jun 2013 01:13:47 -0000	1.243
+++ kern_exit.c	4 May 2014 21:26:00 -0000
@@ -541,12 +541,10 @@
 	 */
 	pcu_discard_all(l);
 
-	/*
-	 * Remaining lwp resources will be freed in lwp_exit2() once we've
-	 * switch to idle context; at that point, we will be marked as a
-	 * full blown zombie.
-	 */
 	mutex_enter(p->p_lock);
+	/* Free the linux lwp id */
+	if ((l->l_pflag & LP_PIDLID) != 0 && l->l_lid != p->p_pid)
+		proc_free_pid(l->l_lid);
 	lwp_drainrefs(l);
 	lwp_lock(l);
 	l->l_prflag &= ~LPR_DETACHED;
Re: Inconsistency with COMPAT_10
In article <5350e2b5.6000...@m00nbsd.net>,
Maxime Villard <m...@m00nbsd.net> wrote:
>Hi all,
>I think there's an inconsistency with COMPAT_10 in the open() syscall:
>
>- kern/vfs_syscalls.c - l.1631 --
>
>#ifdef COMPAT_10 /* XXX: and perhaps later */
>	if (path == NULL) {
>		pb = pathbuf_create(".");
>		if (pb == NULL)
>			return ENOMEM;
>	} else
>#endif
>	{
>		error = pathbuf_copyin(path, &pb);
>		if (error)
>			return error;
>	}
>
>- compat/netbsd32/netbsd32_netbsd.c - l.240 --
>
>	if (SCARG(ua, path) != NULL) {
>		error = pathbuf_copyin(SCARG(ua, path), &pb);
>		if (error)
>			return error;
>	} else {
>		pb = pathbuf_create(".");
>		if (pb == NULL)
>			return ENOMEM;
>	}
>
>COMPAT_10 should be added in netbsd32, or removed from the native
>syscall. But I'm not sure which fix should be applied.
>
>I guess there's someone around here who knows how to fix that.

I guess add COMPAT_10 in netbsd32_netbsd.c

christos
Re: Rewrite kernfs and procfs.
On Apr 8, 9:15pm, net...@izyk.ru (Ilia Zykov) wrote: -- Subject: Rewrite kernfs and procfs. | Hello! | I desire become a NetBSD developer and develop this project. Excellent... | Sorry to disturb, maybe I need anything else. What else do you need? | Also little patch, that removes unusable hack(any more, see below) from kernfs and | returns its work. | Right, thanks for fixing that! I meant to look at what broke it, but I kept forgetting about it. Nevertheless, applied. Your application looks fine to me, and I guess membership-e...@netbsd.org is CC:ed. You can add the kernfs patch to it now :-) christos
Re: Enhance ptyfs to handle multiple instances.
On Apr 4, 12:29pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| > - I don't like the refactoring because it makes ptyfs less optional (brings
| >   in code and headers to the base kernel). I think it is simpler to provide
| >   an entry function to get the mount point instead, and this way all the guts
| >   of ptyfs stay in ptyfs.
|
| Looks better, thank you.
|
| > - Is it important to append to the list? Then perhaps use a different set
| >   of macros than LIST_. I've changed the code just to prepend.
| >
| > I hope I did not break it. Comments?
|
| Order IMPORTANT here, because, pty_getmp returns the first found,
| traditionally /dev/pts - must be persistently first(for security too).
| All others MPs are useful only inside chroot.

Should we put a pointer in the pty node that points to the primary mount point
then so we get the correct one? Or that does not work? I will change the list
so it always appends.

Thanks,

christos
Re: Enhance ptyfs to handle multiple instances.
On Apr 4, 6:40pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| > Should we put a pointer in the pty node that points to the primary mount point
| > then so we get the correct one?
|
| Why? In general case we forever must return first which mount first, next mount point,
| shouldn't replace previous, else incorrect TIOCPTMGET(path) for already opened pty we will have.
| Simple appends it to the tail will be good(IMHO).

I have to think about it. If you opened a pty in a chroot, the pty node
will appear both in the chroot and in the regular mount. If you opened
a pty outside the chroot, the pty will appear only in the regular mount
and not in the chroot, right? If you have 2 chroots, each one will only
show its own ptys, but the regular not rooted mount will show all of them?

| > Or that does not work? I will change the list
| > so it always appends.

I'll convert to an STAILQ

christos
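
The ordering argument in this exchange (the first mount must keep winning the lookup, so new mounts go on the tail) can be illustrated with a small userland sketch built on the STAILQ macros from <sys/queue.h>. Everything here is an assumption made for illustration: struct mnt, add_mount(), pick_mount() and the string-prefix test is_under() are hypothetical stand-ins for struct ptyfsmount, the mount path, pty_getmp() and vn_isunder(); none of it is the committed kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical userland stand-in for struct ptyfsmount. */
struct mnt {
	const char *path;		/* where this instance is mounted */
	STAILQ_ENTRY(mnt) link;
};

STAILQ_HEAD(mnt_list, mnt);

/*
 * Hypothetical stand-in for vn_isunder(): true if path lies below root.
 * A plain prefix test; it does not handle root == "/".
 */
static int
is_under(const char *path, const char *root)
{
	size_t n = strlen(root);

	return strncmp(path, root, n) == 0 &&
	    (path[n] == '/' || path[n] == '\0');
}

/* Append, so the earliest mount stays at the head of the list. */
static void
add_mount(struct mnt_list *head, struct mnt *m, const char *path)
{
	m->path = path;
	STAILQ_INSERT_TAIL(head, m, link);
}

/*
 * Return the first mount visible from root, or NULL if there is none.
 * root == NULL models a process that is not chrooted.
 */
static struct mnt *
pick_mount(struct mnt_list *head, const char *root)
{
	struct mnt *m;

	STAILQ_FOREACH(m, head, link) {
		if (root == NULL || is_under(m->path, root))
			return m;
	}
	return NULL;
}
```

With tail insertion, a process outside any chroot always gets the oldest mount (traditionally /dev/pts), while a chrooted process gets the first mount that lies under its root. Head insertion would instead hand the non-chrooted process whichever instance was mounted last.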
Re: Enhance ptyfs to handle multiple instances.
On Apr 4, 7:28pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | On 04.04.2014 18:55, Christos Zoulas wrote: | On Apr 4, 6:40pm, net...@izyk.ru (Ilya Zykov) wrote: | -- Subject: Re: Enhance ptyfs to handle multiple instances. | | | | | Should we put a pointer in the pty node that points to the primary mount point | | then so we get the correct one? | | | | Why? In general case we forever must return first which mount first, next mount point, | | shouldn't replace previous, else incorrect TIOCPTMGET(path) for already opened pty we will have. | | Simple appends it to the tail will be good(IMHO). | | I have to think about it. If you opened a pty in a chroot, the pty node | will appear both in the chroot and in the regular mount. If you opened | a pty outside the chroot, the pty will appear only in the regular mount | and not in the chroot, right? If you have 2 chroots, each one will only | show its own ptys, but the regular not rooted mount will show all of them? | | | Maybe I do not understand you correctly. | No. Every is in proper MP. I'll test it. christos
Re: Enhance ptyfs to handle multiple instances.
On Apr 2, 10:36am, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

Looks very good. Some changes:

- I don't like the refactoring because it makes ptyfs less optional (brings
  in code and headers to the base kernel). I think it is simpler to provide
  an entry function to get the mount point instead, and this way all the guts
  of ptyfs stay in ptyfs.
- Is it important to append to the list? Then perhaps use a different set
  of macros than LIST_. I've changed the code just to prepend.

I hope I did not break it. Comments?

christos

Index: sys/pty.h
===================================================================
RCS file: /cvsroot/src/sys/sys/pty.h,v
retrieving revision 1.9
diff -u -p -u -r1.9 pty.h
--- sys/pty.h	27 Mar 2014 17:31:56 -0000	1.9
+++ sys/pty.h	3 Apr 2014 21:51:03 -0000
@@ -41,7 +41,7 @@ int pty_grant_slave(struct lwp *, dev_t,
 dev_t pty_makedev(char, int);
 int pty_vn_open(struct vnode *, struct lwp *);
 struct ptm_pty *pty_sethandler(struct ptm_pty *);
-int ptyfs_getmp(struct lwp *, struct mount **);
+int pty_getmp(struct lwp *, struct mount **);
 
 /*
  * Ptm_pty is used for switch ptm{x} driver between BSDPTY, PTYFS.
@@ -53,7 +53,7 @@ struct ptm_pty {
 	    char);
 	int (*makename)(struct mount *, struct lwp *, char *, size_t, dev_t, char);
 	void (*getvattr)(struct mount *, struct lwp *, struct vattr *);
-	void *arg;
+	int (*getmp)(struct lwp *, struct mount **);
 };
 
 #ifdef COMPAT_BSDPTY
Index: fs/ptyfs/ptyfs.h
===================================================================
RCS file: /cvsroot/src/sys/fs/ptyfs/ptyfs.h,v
retrieving revision 1.11
diff -u -p -u -r1.11 ptyfs.h
--- fs/ptyfs/ptyfs.h	21 Mar 2014 17:21:53 -0000	1.11
+++ fs/ptyfs/ptyfs.h	3 Apr 2014 21:51:04 -0000
@@ -106,6 +106,8 @@ struct ptyfsnode {
 };
 
 struct ptyfsmount {
+	LIST_ENTRY(ptyfsmount) pmnt_le;
+	struct mount *pmnt_mp;
 	gid_t pmnt_gid;
 	mode_t pmnt_mode;
 	int pmnt_flags;
Index: fs/ptyfs/ptyfs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/fs/ptyfs/ptyfs_vfsops.c,v
retrieving revision 1.48
diff -u -p -u -r1.48 ptyfs_vfsops.c
--- fs/ptyfs/ptyfs_vfsops.c	27 Mar 2014 17:31:56 -0000	1.48
+++ fs/ptyfs/ptyfs_vfsops.c	3 Apr 2014 21:51:04 -0000
@@ -77,6 +77,7 @@ static int ptyfs__allocvp(struct mount *
 static int ptyfs__makename(struct mount *, struct lwp *, char *, size_t,
     dev_t, char);
 static void ptyfs__getvattr(struct mount *, struct lwp *, struct vattr *);
+static int ptyfs__getmp(struct lwp *, struct mount **);
 
 /*
  * ptm glue: When we mount, we make ptm point to us.
@@ -84,13 +85,37 @@ static void ptyfs__getvattr(struct mount
 struct ptm_pty *ptyfs_save_ptm;
 static int ptyfs_count;
 
+static LIST_HEAD(, ptyfsmount) ptyfs_head;
+
 struct ptm_pty ptm_ptyfspty = {
 	ptyfs__allocvp,
 	ptyfs__makename,
 	ptyfs__getvattr,
-	NULL
+	ptyfs__getmp,
 };
 
+static int
+ptyfs__getmp(struct lwp *l, struct mount **mpp)
+{
+	struct cwdinfo *cwdi = l->l_proc->p_cwdi;
+	struct mount *mp;
+	struct ptyfsmount *pmnt;
+
+	LIST_FOREACH(pmnt, &ptyfs_head, pmnt_le) {
+		mp = pmnt->pmnt_mp;
+		if (cwdi->cwdi_rdir == NULL)
+			goto ok;
+
+		if (vn_isunder(mp->mnt_vnodecovered, cwdi->cwdi_rdir, l))
+			goto ok;
+	}
+	*mpp = NULL;
+	return EOPNOTSUPP;
+ok:
+	*mpp = mp;
+	return 0;
+}
+
 static const char *
 ptyfs__getpath(struct lwp *l, const struct mount *mp)
 {
@@ -137,6 +162,18 @@ ptyfs__makename(struct mount *mp, struct
 		len = snprintf(tbuf, bufsiz, "/dev/null");
 		break;
 	case 't':
+		/*
+		 * We support traditional ptys, so we can get here,
+		 * if pty had been opened before PTYFS was mounted,
+		 * or was opened through /dev/ptyXX devices.
+		 * Return it only outside chroot for more security :).
+		 */
+		if (l->l_proc->p_cwdi->cwdi_rdir == NULL
+		    && ptyfs_save_ptm != NULL
+		    && ptyfs_used_get(PTYFSptc, minor(dev), mp, 0) == NULL)
+			return (*ptyfs_save_ptm->makename)(mp, l,
+			    tbuf, bufsiz, dev, ms);
+
 		np = ptyfs__getpath(l, mp);
 		if (np == NULL)
 			return EOPNOTSUPP;
@@ -189,6 +226,7 @@
 void
 ptyfs_init(void)
 {
+	LIST_INIT(&ptyfs_head);
 	malloc_type_attach(M_PTYFSMNT);
 	malloc_type_attach(M_PTYFSTMP);
 	ptyfs_hashinit();
@@ -274,9 +312,9 @@ ptyfs_mount(struct mount *mp, const char
 		return error;
 	}
 
-	/* Point pty access to us */
-	if (ptyfs_count == 0) {
-		ptm_ptyfspty.arg = mp;
+	LIST_INSERT_HEAD(&ptyfs_head, pmnt, pmnt_le);
+
Re: Enhance ptyfs to handle multiple instances.
On Mar 27, 5:53pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | This is a multi-part message in MIME format. | --040300020609040305030709 | Content-Type: text/plain; charset=ISO-8859-1 | Content-Transfer-Encoding: 7bit | | On 27.03.2014 12:51, Ilya Zykov wrote: | Hello! | Maybe you skipped: | Minor corrections readdir and lookup for multi-mountpoint use. | | ptyfs_vnops.c |6 -- | 1 file changed, 4 insertions(+), 2 deletions(-) | | Resending. | | Also main patch for subject. | I didn't want locate many code in ptm driver, but in real world, | it was the most suitable place(performance, flexible ..). | Most of changes were tied with code refactoring. I had added only one ptyfs_getmp(). | In the near future we will need only decide how will keep,get list of mount points? | Yes, we need to think about it. In the general case there won't be many so it should not be that hard. christos
Re: [M-Labs devel] NetBSD kernel booting on lm32
On Mar 27, 8:55pm, yann.sionn...@gmail.com (Yann Sionneau) wrote: -- Subject: Re: [M-Labs devel] NetBSD kernel booting on lm32 | Le 21/03/14 21:42, Christos Zoulas a écrit : | On Mar 21, 7:05pm, yann.sionn...@gmail.com (Yann Sionneau) wrote: | -- Subject: Re: [M-Labs devel] NetBSD kernel booting on lm32 | | | Do you see the printf: | | | | printf(sysctl_createv: sysctl_locate(%s) returned %d\n, | | nnode.sysctl_name, error); | | | | Yes that's exactly the printf I am seeing, I am using diagnostic and | | debuglock options if that matters. I have seen such messages a bit while | | googling, for instance in the boot of NetBSD/avr32 which have been posted | | to the list. | | This is really strange. I would add some more printfs to see what's causing | it. This printf would appear in all kernels... Perhaps something is | uninitialized and ends up 0 usually? | | christos | FYI I found the root of the issue, setfault was a stub, therefore | kcopy() was behaving weirdly. | Fixed in | https://github.com/fallen/NetBSD/commit/ebb6e13376f0ed008a6dd4ff81ca16e8756ce40e I'll put a comment to that effect! Thanks for the info. christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 28, 12:37am, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | Please, don't forget this, otherwise readdir returns released(free), but still hashed inode numbers. | Got it, thanks! christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 26, 2:09pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| Index: fs/ptyfs/ptyfs_subr.c
| ===================================================================
| RCS file: /cvsil/nbcur/src/sys/fs/ptyfs/ptyfs_subr.c,v
| retrieving revision 1.3
| diff -u -r1.3 ptyfs_subr.c
| --- fs/ptyfs/ptyfs_subr.c	24 Mar 2014 20:48:08 -0000	1.3
| +++ fs/ptyfs/ptyfs_subr.c	26 Mar 2014 09:44:44 -0000
| @@ -116,7 +116,7 @@
|  static void
|  ptyfs_getinfo(struct ptyfsnode *ptyfs, struct lwp *l)
|  {
| -	extern struct ptm_pty *ptyfs_save_ptm, ptm_ptyfspty;
| +	extern struct ptm_pty *ptyfs_save_ptm;
|  
|  	if (ptyfs->ptyfs_type == PTYFSroot) {
|  		ptyfs->ptyfs_mode = S_IRUSR|S_IXUSR|S_IRGRP|S_IXGRP|
| @@ -126,7 +126,7 @@
|  		ptyfs->ptyfs_mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|
|  		    S_IROTH|S_IWOTH;
|  
| -	if (ptyfs_save_ptm != NULL && ptyfs_save_ptm != &ptm_ptyfspty) {
| +	if (ptyfs_save_ptm != NULL) {
|  		int error;
|  		struct pathbuf *pb;
|  		struct nameidata nd;

Ok,

| Index: kern/tty_bsdpty.c
| ===================================================================
| RCS file: /cvsil/nbcur/src/sys/kern/tty_bsdpty.c,v
| retrieving revision 1.1.1.1
| diff -u -r1.1.1.1 tty_bsdpty.c
| --- kern/tty_bsdpty.c	4 Mar 2014 18:16:04 -0000	1.1.1.1
| +++ kern/tty_bsdpty.c	26 Mar 2014 09:44:44 -0000
| @@ -121,7 +121,7 @@
|  	struct nameidata nd;
|  	char name[TTY_NAMESIZE];
|  
| -	error = (*ptm->makename)(ptm, l, name, sizeof(name), dev, ms);
| +	error = pty_makename(ptm, l, name, sizeof(name), dev, ms);
|  	if (error)
|  		return error;
|  

Are you sure about this one? It is used when ptyfs is mounted and you have
old pty nodes around (so you get consistent names).

christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 26, 6:36pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | PTYFS has dependency from ptm driver. | If config has NO_DEV_PTM, PTYFS isn't compiled. | PTYFS is useless without ptm. | | How, better, this condition is fixed in config files? Add ' !no_dev_ptm' next to the ptyfs files in sys/fs/ptyfs/files.ptyfs? I am not sure... Is there a way to bail out configuration of a filesystem if a feature is missing, for example no coda if no venus? christos
Re: Enhance ptyfs to handle multiple instances.
In article <5332dc47.1060...@izyk.ru>, Ilya Zykov <net...@izyk.ru> wrote:
>> | -	error = (*ptm->makename)(ptm, l, name, sizeof(name), dev, ms);
>> | +	error = pty_makename(ptm, l, name, sizeof(name), dev, ms);
>> |  	if (error)
>> |  		return error;
>>
>> Are you sure about this one? It is used when ptyfs is mounted and you have
>> old pty nodes around (so you get consistent names).
>>
>> christos
>
>I think pty_allocvp can be invoked only when ptm == ptm_bsdpty,
>therefore:
>	ptm->allocvp == pty_allocvp,
>	ptm->makename == pty_makename,
>	ptm->getvattr == pty_getvattr.
>In what condition and who can invokes pty_allocvp, when ptm == ptm_ptyfspty?

Yes, that is true, ignore me.

christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 24, 5:46pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | Hello! | | Please, tell me know if I wrong. | In general case I can't find(easy), from driver, where its device file located on file system, | its vnode or its directory vnode where this file located. | Such files can be many and I can't find what file used for current operation. | Maybe anybody had being attempted get this info from the driver? You can't find from the driver where the device node file is located in the filesystem, as well as you cannot reliably find from the vnode of the device node the filesystem path. There could be many device nodes that satisfy the criteria (you can make your own tty node with mknod) christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 22, 3:50pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | | I don't understand why you want to get rid of the mountpoint arg inside | the pty structure. It certainly makes things faster, and the pty can't | be shared... | | christos | | | Sorry, but I don't understand too, what structure do you mean exactly and how. The mountpoint inside ptm_pty. Perhaps by having separate instances in the ptm driver? christos
Re: Enhance ptyfs to handle multiple instances.
In article 532def5e.2040...@izyk.ru, Ilya Zykov net...@izyk.ru wrote: The mountpoint inside ptm_pty. Perhaps by having separate instances in the ptm driver? christos I think, it's not better. I can do so, but: 1. Now we have only 2 instances ptm_pty, one for ptyfs one for bsdpty and use its mainly for switch from one to other(we will have ptm_pty array). 2. Now we keep local ptyfs' data pointer(mp) inside external ptm_pty it's mistaken way(IMHO). We have useless ping pong local data. Maybe it is conceived for other goal. Easier keep it in local static pointer and don't pass it in parameters every function call. 3. I don't want dispose ptyfs code inside ptm driver. 4. We will have export ptyfs__getpath(). or 5. Use ptm minor numbers for differentiating factor(or other differentiating factor), maybe, useless work, because user space programs use only one instance. Really multiple instances are needed only for chroot. Also it complicates user space programs, but give more productivity(we don't need look up mount point). Ok this is fine for now then. christos
Re: [M-Labs devel] NetBSD kernel booting on lm32
In article <CACi+aWYQ9x8c2W6M7qwzOdr=l9cvzkz_p4jsporttsoagjx...@mail.gmail.com>,
Yann Sionneau <yann.sionn...@gmail.com> wrote:
>2014-03-20 10:38 GMT+01:00 Yann Sionneau <yann.sionn...@gmail.com>:
>
>> Hi,
>>
>> I am very happy to announce that the NetBSD/lm32 project is making good
>> progress :)
>>
>> [...]
>>
>> Regards,
>
>Hi NetBSD guys,
>
>Is this normal I get that much error messages about sysctl_createv ?
>Am I doing something wrong here? seems return 2 means ENOENT when doing the
>sysctl_locate() from within sysctl_createv().
>
>Moreover, when adding options INET in my kernel config, then arp_init() is
>doing a lot of sysctl_createv() one of them is filling a node pointer with
>NULL, then this node is dereferenced later in other calls to
>sysctl_createv(), I don't see exactly why this happens ( cf
>http://nxr.netbsd.org/xref/src/sys/netinet/if_arp.c#1631 ).

Do you see the printf:

	printf("sysctl_createv: sysctl_locate(%s) returned %d\n",
	    nnode.sysctl_name, error);

I don't see this in any of the kernels I boot..

christos
Re: Enhance ptyfs to handle multiple instances.
In article <532c0718.3020...@izyk.ru>, Ilya Zykov <net...@izyk.ru> wrote:
>Hello!
>
>Correct ptyfs_readdir for multi mount points use.

committed.

christos
Re: [M-Labs devel] NetBSD kernel booting on lm32
On Mar 21, 7:05pm, yann.sionn...@gmail.com (Yann Sionneau) wrote:
-- Subject: Re: [M-Labs devel] NetBSD kernel booting on lm32

| > Do you see the printf:
| >
| >	printf("sysctl_createv: sysctl_locate(%s) returned %d\n",
| >	    nnode.sysctl_name, error);
|
| Yes that's exactly the printf I am seeing, I am using diagnostic and
| debuglock options if that matters. I have seen such messages a bit while
| googling, for instance in the boot of NetBSD/avr32 which have been posted
| to the list.

This is really strange. I would add some more printfs to see what's causing
it. This printf would appear in all kernels... Perhaps something is
uninitialized and ends up 0 usually?

christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 21, 10:23pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | If seriously, it's first working prototype for comments and objections. | | It's working as follow: | | Mount first ptyfs instance in /dev/pts(or other path) you can get access to master side | through ptm{x} device. | | Mount second ptyfs instance inside chroot(Example: /var/chroot/test/dev/pts), create ptm{x} | device inside chroot(Ex. /var/chroot/test/dev/ptm{x}). | Chroot: chroot /var/chroot/test /rescue/sh, | now you can see only second instance in /dev/pts. | | I'm leaving for the weekend till Monday. Enjoy! I don't understand why you want to get rid of the mountpoint arg inside the pty structure. It certainly makes things faster, and the pty can't be shared... christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 19, 9:51pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| Ok, but bug will stay in the system, temporarily.

No problem, no worse than we have now.

christos

|
| fs/ptyfs/ptyfs_vfsops.c |   16 +++-
| kern/tty_ptm.c          |    9 -
| 2 files changed, 19 insertions(+), 6 deletions(-)
|
| Ilya.
|
| Index: fs/ptyfs/ptyfs_vfsops.c
| ===================================================================
| RCS file: /cvsil/nbcur/src/sys/fs/ptyfs/ptyfs_vfsops.c,v
| retrieving revision 1.1.1.1
| diff -u -p -r1.1.1.1 ptyfs_vfsops.c
| --- fs/ptyfs/ptyfs_vfsops.c	4 Mar 2014 18:16:03 -0000	1.1.1.1
| +++ fs/ptyfs/ptyfs_vfsops.c	19 Mar 2014 17:36:48 -0000
| @@ -109,14 +109,16 @@ ptyfs__getpath(struct lwp *l, const stru
|  	buf = malloc(MAXBUF, M_TEMP, M_WAITOK);
|  	bp = buf + MAXBUF;
|  	*--bp = '\0';
| -	error = getcwd_common(cwdi->cwdi_rdir, rootvnode, &bp,
| +	error = getcwd_common(mp->mnt_vnodecovered, cwdi->cwdi_rdir, &bp,
|  	    buf, MAXBUF / 2, 0, l);
| -	if (error)	/* XXX */
| +	if (error) {	/* Mount point is out of rdir */
| +		rv = NULL;
|  		goto out;
| +	}
|  
|  	len = strlen(bp);
|  	if (len < sizeof(mp->mnt_stat.f_mntonname))	/* XXX */
| -		rv += len;
| +		rv += strlen(rv) - len;
|  out:
|  	free(buf, M_TEMP);
|  	return rv;
| @@ -128,6 +130,7 @@ ptyfs__makename(struct ptm_pty *pt, stru
|  {
|  	struct mount *mp = pt->arg;
|  	size_t len;
| +	const char *np;
|  
|  	switch (ms) {
|  	case 'p':
| @@ -135,8 +138,11 @@ ptyfs__makename(struct ptm_pty *pt, stru
|  		len = snprintf(tbuf, bufsiz, "/dev/null");
|  		break;
|  	case 't':
| -		len = snprintf(tbuf, bufsiz, "%s/%llu", ptyfs__getpath(l, mp),
| -		    (unsigned long long)minor(dev));
| +		np = ptyfs__getpath(l, mp);
| +		if (np == NULL)
| +			return EOPNOTSUPP;
| +		len = snprintf(tbuf, bufsiz, "%s/%llu", np,
| +		    (unsigned long long)minor(dev));
|  		break;
|  	default:
|  		return EINVAL;
| Index: kern/tty_ptm.c
| ===================================================================
| RCS file: /cvsil/nbcur/src/sys/kern/tty_ptm.c,v
| retrieving revision 1.1.1.2
| diff -u -p -r1.1.1.2 tty_ptm.c
| --- kern/tty_ptm.c	17 Mar 2014 11:46:10 -0000	1.1.1.2
| +++ kern/tty_ptm.c	19 Mar 2014 17:36:48 -0000
| @@ -381,7 +381,9 @@ ptmioctl(dev_t dev, u_long cmd, void *da
|  			goto bad;
|  
|  		/* now, put the indices and names into struct ptmget */
| -		return pty_fill_ptmget(l, newdev, cfd, sfd, data);
| +		if ((error = pty_fill_ptmget(l, newdev, cfd, sfd, data)) != 0)
| +			break;	/* goto bad2 */
| +		return 0;
|  	default:
|  #ifdef COMPAT_60
|  		error = compat_60_ptmioctl(dev, cmd, data, flag, l);
| @@ -391,6 +393,11 @@ ptmioctl(dev_t dev, u_long cmd, void *da
|  		DPRINTF(("ptmioctl EINVAL\n"));
|  		return EINVAL;
|  	}
| +/* bad2: close sfd too */
| +	fp = fd_getfile(sfd);
| +	if (fp != NULL) {
| +		fd_close(sfd);
| +	}
|  bad:
|  	fp = fd_getfile(cfd);
|  	if (fp != NULL) {
|
-- End of excerpt from Ilya Zykov
Re: Enhance ptyfs to handle multiple instances.
On Mar 14, 12:30pm, net...@izyk.ru (Ilya Zykov) wrote: -- Subject: Enhance ptyfs to handle multiple instances. | Hello! | I desire develop this project. Excellent. | About me. | I am system administrator in little Italy-Russia's firm. I live in Moscow. | OS kernel it's my hobby mainly. | I have free time now and can do this project about 1-2 months. | I have little experience with Linux kernel's tty layer and few accepted patches. | The largest: | https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/tty/tty_buffer.c?id=64325a3be08d364a62ee8f84b2cf86934bc2544a Looks fine to me. | I have few questions about project. | Christos, can I ask you about this? | Please, if anybody has objections or already doing it, tell me know. Nobody is already doing it, and if you have questions, you came to the right place. christos
Re: Enhance ptyfs to handle multiple instances.
In article 20140314143532.ga17...@britannica.bec.de, Joerg Sonnenberger jo...@britannica.bec.de wrote: On Fri, Mar 14, 2014 at 09:51:12AM -0400, Christos Zoulas wrote: I don't think that putting ptmx inside devpts makes sense. OTOH, we could have multiple ptmx devices with different minor numbers and use that as the differentiating factor for the pty devices. I think that's too complex and probably not worth it (at least in the first pass). Actually, it would simplify things a lot if /dev/ptmx was a symlink to /dev/pts/ptmx. The ptyfs instance could assign a unique minor number per instance and use that to associate the instance with the correct mount point. Yes, except that ptmx is supposed to work without ptyfs, but I guess that nobody uses it this way anymore. christos
Re: Enhance ptyfs to handle multiple instances.
On Mar 14, 6:49pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| > We first need to decide if disclosing gaps in the pty number is a security
| > issue. If not, it is simple; we just allocate the next free one and we don't
| > care about gaps. I.e. first mount can grab 0,1,2,3,5,6 second mount can grab
| > 4,7,8 etc.
|
| It introduces limits.

How so?

| > I think that this is not very desirable because it again introduces limits
| > to the number of ptys per mountpoint.
|
| I don't understand how?

if the first mount can only have [0..n-1] the second [n...2*n] etc...

| Ok.
| One remark: mount one instance more than one time useless, because, which mount point must return TIOCPTMGET in this case?
| Maybe I don't understand fully NetBSD pty layer realization yet.

If they point to the same device and they are both reachable, it does not
matter. If you are inside a chroot, then a reachable one within the root.

christos
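
The "allocate the next free one and don't care about gaps" scheme from this message can be sketched in a few lines of userland C. This is only an illustration under stated assumptions: NPTY, pty_used[], pty_alloc() and pty_free() are hypothetical names, and a single shared table stands in for whatever bookkeeping the pty layer really uses.

```c
#include <assert.h>
#include <stdbool.h>

#define NPTY 16			/* hypothetical small limit for the sketch */

static bool pty_used[NPTY];	/* one number space shared by all mounts */

/* Return the lowest free pty number, or -1 if all are taken. */
static int
pty_alloc(void)
{
	for (int i = 0; i < NPTY; i++) {
		if (!pty_used[i]) {
			pty_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a number so a later allocation, from any mount, can reuse it. */
static void
pty_free(int n)
{
	assert(n >= 0 && n < NPTY);
	pty_used[n] = false;
}
```

Because every mount draws from the same table, no mount is restricted to a fixed range such as [0..n-1], which is the per-mountpoint limit the alternative scheme would introduce. The price is that each mount sees gaps in its own numbering, which is the disclosure question raised at the top of the message.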
Re: Enhance ptyfs to handle multiple instances.
On Mar 14, 5:08pm, campbell+netbsd-tech-k...@mumble.net (Taylor R Campbell) wrote: -- Subject: Re: Enhance ptyfs to handle multiple instances. | We could install a symlink at /dev/ptmx pointing to pts/ptmx, and we | could install a device node at /dev/pts/ptmx, which gets hidden by the | ptyfs mount if you use ptyfs. That way the non-ptyfs case still works | and the ptyfs case enables multi-instance goodies. That is the easy part; the harder part is the kernel driver portion. christos
Re: DIOCGDISKINFO support for vnd
In article 20140311130940.GA456@quark, Patrick Welche pr...@cam.ac.uk wrote: -=-=-=-=-=- The attached trivial patch allows vnd(4) to support generic disk ioctls. The only one in kern/subr_disk.c at the moment is DIOCGDISKINFO. Before: $ ./vndtest /dev/vnd0a vndtest: DIOCGDISKINFO: Inappropriate ioctl for device After: $ ./vndtest /dev/vnd0a size of /dev/vnd0a: 524288 bytes Thanks to pooka@ for help in creating librumpdev_vnd.so which made finding the root of the problem easy. Comments? LGTM. christos
Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM
In article <52f7c96e.6000...@execsw.org>,
SAITOH Masanobu <msai...@execsw.org> wrote:
>Hello, all.
>
>I'm now working to support Intel Quark X1000.
>This chip's internal com is MMIO(PCI_MAPREG_TYPE_MEM).
>Our com and puc don't support such type of device, yet.
>To solve the problem, I wrote a patch.
>
>Registers of Quark X1000's com are 4byte aligned.
>Some other machines have such type of device, so
>I modified COM_INIT_REGS() macro to support both
>byte aligned and 4byte aligned. This change reduce
>special modifications done in atheros, rmi and
>marvell drivers.
>
>One of problem is serial console on i386 and amd64.
>These archs calls consinit() three times. The function
>is called in the following order:
>
>	1) machdep.c::init386() or init_x86_64()
>	2) init_main.c::main()
>		*) (call uvm_init())
>		*) (call extent_init())
>	3) machdep.c::cpu_startup()
>
>When consinit() called in init386(), it calls
>
>	comcnattach()
>	  ->comcnattach1()
>	    ->comcninit()
>	      ->bus_space_map() with x86_bus_space_mem tag.
>		->bus_space_reservation_map()
>		  ->x86_mem_add_mapping()
>		    ->uvm_km_alloc()
>			panic in KASSERT(vm_map_pmap(map) == pmap_kernel());
>
>What should I do?
>One of the solution is to check whether extent_init() was called
>or not. There is no easy way to know it, so I added a global
>variable extent_initted. Is it acceptable?

Looks great, can't you use cold instead, or is that too late?

christos
Re: [PATCH] netbsd32 swapctl, round 4
On Feb 3, 8:04am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [PATCH] netbsd32 swapctl, round 4

| On Mon, Feb 03, 2014 at 03:06:14AM +0000, Christos Zoulas wrote:
| > I thought we decided that it is better to have one sep32 on the stack
| > and do copyout in the loop.
|
| I can do that too, I just need to sort out the size question:

ok.

| > > +out:
| > > +	kmem_free(sep, sizeof(*sep));
| > > +	kmem_free(sep32, sizeof(*sep32));
| > The sizes are wrong.
|
| How are they wrong?

* count?

christos
Re: [PATCH] netbsd32 swapctl, round 4
In article <20140202043534.ga8...@homeworld.netbsd.org>,
Emmanuel Dreyfus <m...@netbsd.org> wrote:
>Latest revision of the netbsd32 swapctl patch

>+	for (i = 0; i < count; i++) {
>+		sep32[i].se_dev = sep[i].se_dev;
>+		sep32[i].se_flags = sep[i].se_flags;
>+		sep32[i].se_nblks = sep[i].se_nblks;
>+		sep32[i].se_inuse = sep[i].se_inuse;
>+		sep32[i].se_priority = sep[i].se_priority;
>+		memcpy(sep32[i].se_path, sep[i].se_path,
>+		    sizeof(sep32[i].se_path));
>+	}
>+
>+	error = copyout(sep32, SCARG(uap, arg), sizeof(*sep32) * count);

I thought we decided that it is better to have one sep32 on the stack
and do copyout in the loop.

>+out:
>+	kmem_free(sep, sizeof(*sep));
>+	kmem_free(sep32, sizeof(*sep32));

The sizes are wrong.

christos
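
The "one sep32 on the stack and do copyout in the loop" suggestion might look roughly like the following userland sketch. All names and layouts are assumptions for illustration: struct swapent_native and struct swapent_32 are made-up stand-ins for the real structures, SKETCH_PATH_MAX is arbitrary, and copyout_sketch() replaces copyout(9) with a plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PATH_MAX 64	/* hypothetical; not the kernel's PATH_MAX */

/* Made-up layouts; the real structures are not reproduced here. */
struct swapent_native {
	uint64_t se_dev;
	int se_flags, se_nblks, se_inuse, se_priority;
	char se_path[SKETCH_PATH_MAX];
};

struct swapent_32 {
	uint32_t se_dev;
	int se_flags, se_nblks, se_inuse, se_priority;
	char se_path[SKETCH_PATH_MAX];
};

/* Stand-in for copyout(9): a memcpy that always succeeds. */
static int
copyout_sketch(const void *kaddr, void *uaddr, size_t len)
{
	memcpy(uaddr, kaddr, len);
	return 0;
}

/* Convert and copy out one entry at a time through a single buffer. */
static int
export_entries(const struct swapent_native *sep, struct swapent_32 *udst,
    int count)
{
	struct swapent_32 s32;	/* the only 32-bit buffer, on the stack */
	int error;

	for (int i = 0; i < count; i++) {
		memset(&s32, 0, sizeof(s32));	/* do not copy out stale bytes */
		s32.se_dev = (uint32_t)sep[i].se_dev;
		s32.se_flags = sep[i].se_flags;
		s32.se_nblks = sep[i].se_nblks;
		s32.se_inuse = sep[i].se_inuse;
		s32.se_priority = sep[i].se_priority;
		memcpy(s32.se_path, sep[i].se_path, sizeof(s32.se_path));

		error = copyout_sketch(&s32, &udst[i], sizeof(s32));
		if (error != 0)
			return error;
	}
	return 0;
}
```

With this shape the second array disappears, so only the native array remains to be released. As the follow-up in this thread points out, the size passed when freeing it should then be sizeof(*sep) * count, matching what was allocated, not sizeof(*sep).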
Re: compat_netbsd32 swapctl
In article <20140129153713.gl5...@homeworld.netbsd.org>,
Emmanuel Dreyfus <m...@netbsd.org> wrote:
>On Wed, Jan 29, 2014 at 01:48:35PM +0100, Martin Husemann wrote:
>> My vote for this special case: hard code it #ifdef __x86_64__.
>> If we run into other instances, we can add a define (like: DEV_T_ALIGN_32).
>
>Here is a patch that fixes the problem.

>+	default:
>+		panic("unexpected cmd = %d", SCARG(ua, cmd));
>+		break;

Anyone can panic the kernel now...

>+	}
>+
>+	*retval = 0;
>+	SCARG(&ssa, cmd) = SCARG(ua, cmd);
>+	SCARG(&ssa, misc) = 1;
>+
>+	for (i = 0; i < SCARG(ua, misc); i++) {
>+		SCARG(&ssa, arg) =
>+		    (char *)SCARG(ua, arg) + (i * swapctl32_len);
>+
>+		if ((error = sys_swapctl(l, &ssa, &rv)) != 0) {
>+			*retval = rv;
>+			break;
>+		}

I think you probably want to do some struct conversion because if
sys_swapctl expects to write to a bigger buffer, you can end up trashing
the user's stack.

christos
Re: amd64 kernel, i386 userland
In article <1lfyrch.n0en6eho0ft6m%m...@netbsd.org>,
Emmanuel Dreyfus <m...@netbsd.org> wrote:
>Since nobody opposes, I am going to commit that. Perhaps the option name
>could be better: NATIVE_EMULROOT or EMULROOT_NATIVE?
>
>--- sys/kern/kern_exec.c.orig	2014-01-21 16:55:00.0 +0100
>+++ sys/kern/kern_exec.c	2014-01-21 16:55:13.0 +0100
>@@ -184,9 +184,13 @@
>
> /* NetBSD emul struct */
> struct emul emul_netbsd = {
> 	.e_name =		"netbsd",
>+#ifdef COMPAT_NATIVE
>+	.e_path =		COMPAT_NATIVE,
>+#else
> 	.e_path =		NULL,
>+#endif
> #ifndef __HAVE_MINIMAL_EMUL
> 	.e_flags =		EMUL_HAS_SYS___syscall,
> 	.e_errno =		NULL,
> 	.e_nosys =		SYS_syscall,

I think EMUL_NATIVEROOT is better, since everything starts with EMUL?

christos
Re: Autoload of pseudo-device driver module
In article pine.neb.4.64.1312280809050.26...@screamer.whooppee.com, Paul Goyette p...@whooppee.com wrote: I've noticed that the vnd(4) driver seems to be able to auto-load when one runs vndconfig. Can someone tell me how this is triggered? The module_autoload() in sys/miscfs/specfs/specfs_vnops.c, triggered on the open of the device vnode. christos
Re: Problem with autounload of nfsserver module
In article pine.neb.4.64.1312140541170@screamer.whooppee.com, Paul Goyette p...@whooppee.com wrote: I believe that the nfsserver module should not be allowed to autounload. Consider the following sequence of events: 1. mountd is started, and calls nfssvc(2) 2. The module subsystem autoloads the nfsserver module 3. mountd continues, adding entries to the exports list So far, everything is fine. However 4. When the autounload timer expires, the module subsystem unloads the nfsserver module 5. As part of nfsserver_modcmd(), the export list is cleared 6. The autounload completes successfully 7. At some later time, we finally get around to starting nfsd. This succeeds, but the export list has been cleared, so there is nothing for nfsd to deliver to the clients. So, depending on how much time it takes between starting mountd and starting nfsd, we could end up serving an empty exports list. The following patch prevents the module subsystem from unloading the nfsserver module. (Manual unloading of the module will still work.) Comments? Perhaps you want it to prevent it from unloading while there are exported filesystems? christos
Re: Problem with autounload of nfsserver module
In article pine.neb.4.64.1312140648400.22...@screamer.whooppee.com, Paul Goyette p...@whooppee.com wrote: On Sat, 14 Dec 2013, Christos Zoulas wrote: In article pine.neb.4.64.1312140541170@screamer.whooppee.com, Paul Goyette p...@whooppee.com wrote: I believe that the nfsserver module should not be allowed to autounload. Consider the following sequence of events: 1. mountd is started, and calls nfssvc(2) 2. The module subsystem autoloads the nfsserver module 3. mountd continues, adding entries to the exports list So far, everything is fine. However 4. When the autounload timer expires, the module subsystem unloads the nfsserver module 5. As part of nfsserver_modcmd(), the export list is cleared 6. The autounload completes successfully 7. At some later time, we finally get around to starting nfsd. This succeeds, but the export list has been cleared, so there is nothing for nfsd to deliver to the clients. So, depending on how much time it takes between starting mountd and starting nfsd, we could end up serving an empty exports list. The following patch prevents the module subsystem from unloading the nfsserver module. (Manual unloading of the module will still work.) Comments? Perhaps you want it to prevent it from unloading while there are exported filesystems? Yeah, that would work, too. :) The following patch prevents the module from being auto-unloaded if there are exported filesystems. (If a manual unload is requested, we will still forcibly delete the exports list.) I committed something similar. christos
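The behaviour discussed above (refuse the automatic unload while exports exist, but let a manual unload go through) could look roughly like this. It is a sketch only: the `toy_` enum, counter and function are hypothetical stand-ins, and the change that was actually committed is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's module commands. */
enum toy_modcmd { TOY_CMD_INIT, TOY_CMD_FINI, TOY_CMD_AUTOUNLOAD };

static size_t toy_export_count;	/* number of exported filesystems */

static int
toy_nfsserver_modcmd(enum toy_modcmd cmd)
{
	switch (cmd) {
	case TOY_CMD_INIT:
		toy_export_count = 0;
		return 0;
	case TOY_CMD_AUTOUNLOAD:
		/* Refuse the automatic unload while anything is exported. */
		return toy_export_count != 0 ? EBUSY : 0;
	case TOY_CMD_FINI:
		/* Manual unload: forcibly drop the exports list. */
		toy_export_count = 0;
		return 0;
	}
	return ENOTTY;
}
```

With this shape, the window between starting mountd and starting nfsd no longer matters: as soon as mountd has added an export, the autounload request is declined.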
re: Problem with autounload of nfsserver module
On Dec 15, 5:07pm, m...@eterna.com.au (matthew green) wrote: -- Subject: re: Problem with autounload of nfsserver module | right, that's my whole point. | | what's the real benefit to the end user for having these modules | use auto unload? mostly they seem to present bugs we have to fix | (which is good for fixing bugs, but annoying for users), instead | of actually being useful. i don't consider the minor memory | savings to be a real consideration here. | | i've long thought we should not auto unload by default. Think about it the other way around... If you don't auto-unload, most users will not run the unload code and the bugs will stay hidden. And you know too well, that I've been bitten by one unload bug that took me a month to figure out, since it manifested as random page faults in the pool code... christos
Re: qsort_r
In article 20131209061036.ge2...@apb-laptoy.apb.alt.za, Alan Barrett a...@cequrux.com wrote: On Sun, 08 Dec 2013, David Holland wrote: My irritation with not being able to pass a data pointer through qsort() boiled over just now. Apparently Linux and/or GNU has a qsort_r() that supports this; so, following is a patch that gives us a compatible qsort_r() plus mergesort_r(), and heapsort_r(). Apparently FreeBSD [1] and GNU [2] have incompatible versions of qsort_r, passing the extra 'thunk' or 'data' argument in a different position. [1]: FreeBSD qsort_r http://www.manpagez.com/man/3/qsort_r/ [2]: Linux qsort_r http://man7.org/linux/man-pages/man3/qsort.3.html If we have to pick one, let's pick the FreeBSD version. Actually let's not (fortunately dh@ chose the right one). We should pick the linux one: http://sourceware.org/ml/libc-alpha/2008-12/msg7.html christos
Re: qsort_r
In article 20131208222953.gb25...@netbsd.org, David Holland dholland-t...@netbsd.org wrote: (Cc: tech-kern because of kheapsort()) My irritation with not being able to pass a data pointer through qsort() boiled over just now. Apparently Linux and/or GNU has a qsort_r() that supports this; so, following is a patch that gives us a compatible qsort_r() plus mergesort_r(), and heapsort_r(). I have done it by having the original, non-_r functions provide a thunk for the comparison function, as this is least invasive. If we think this is too expensive, an alternative is generating a union of function pointers and making tests at the call sites; another option is to duplicate the code (hopefully with cpp rather than CP) but that seems like a bad plan. Note that the thunks use an extra struct to hold the function pointer; this is to satisfy C standards pedantry about void pointers vs. function pointers, and if we decide not to care it could be simplified. This patch was supposed to have all the necessary support widgetry, like namespace.h changes, but there's at least one more thing not in it: MLINKS for the new functions and corresponding setlist changes. If I've forgotten anything else, let me know. heapsort() is used in one place in the kernel as kheapsort(), which takes an extra argument so that the heapsort code itself doesn't have to know how to malloc in the kernel. I have done the following: - add kheapsort_r() - change the signature of plain kheapsort() to move the extra argument to a place it is less likely to cause confusion with the userdata argument; - update the caller for the signature change; but I have not changed the caller to call kheapsort_r instead. Based on the code, this should probably be done later as like many sort calls it's using a global for context. At this point the plain kheapsort can be removed. LGTM. christos
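The thunk approach described in the patch summary can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: `toy_sort_r()` is a plain insertion sort standing in for the real sort routines, and `struct cmp_holder` and `cmp_thunk()` are hypothetical names. The context argument comes last, as in the Linux/GNU flavour mentioned in the thread, and the extra struct exists so that a function pointer is never converted directly to `void *`.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cmp_fn)(const void *, const void *);
typedef int (*cmp_r_fn)(const void *, const void *, void *);

static void
swap_bytes(unsigned char *a, unsigned char *b, size_t size)
{
	while (size-- > 0) {
		unsigned char t = *a;
		*a++ = *b;
		*b++ = t;
	}
}

/* Stand-in for a reentrant sort: insertion sort, context pointer last. */
static void
toy_sort_r(void *base, size_t nmemb, size_t size, cmp_r_fn cmp, void *data)
{
	unsigned char *p = base;

	assert(base != NULL || nmemb == 0);
	for (size_t i = 1; i < nmemb; i++)
		for (size_t j = i; j > 0 &&
		    cmp(p + (j - 1) * size, p + j * size, data) > 0; j--)
			swap_bytes(p + (j - 1) * size, p + j * size, size);
}

/* The struct keeps the function pointer out of a bare void *. */
struct cmp_holder {
	cmp_fn fn;
};

static int
cmp_thunk(const void *a, const void *b, void *data)
{
	const struct cmp_holder *h = data;

	return h->fn(a, b);
}

/* Non-_r entry point implemented on top of the _r one. */
static void
toy_sort(void *base, size_t nmemb, size_t size, cmp_fn cmp)
{
	struct cmp_holder h = { cmp };

	toy_sort_r(base, nmemb, size, cmp_thunk, &h);
}

static int
cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* An _r comparator: the context selects ascending (1) or descending (-1). */
static int
cmp_int_dir(const void *a, const void *b, void *data)
{
	return *(const int *)data * cmp_int(a, b);
}
```

The cost of this design is one extra indirect call per comparison in the non-_r case, which is the expense the message above weighs against duplicating the code.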
Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work
In article 96497888-f8c7-49f2-958b-532a2093b...@gmail.com, Dennis Ferguson dennis.c.fergu...@gmail.com wrote: I think getting rid of uses of the CIRCLEQ macros was the right thing to do in any case, since code which works like that doesn't need to exist. I'm not sure that that TAILQ macros are the best answer to the problem, though. Unfortunately I agree... But in the absence of a better alternative... christos
Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work
On Nov 26, 8:20pm, m...@linuxbox.com (Matt W. Benjamin) wrote:
-- Subject: Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ

| What's your issue with TAILQ?

#define TAILQ_LAST(head, headname) \
	(*(((struct headname *)((head)->tqh_last))->tqh_last))
#define TAILQ_PREV(elm, headname, field) \
	(*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))

etc.

christos
Re: Help for PR kern/46606 is needed
In article 6eefbae491842ca53ddd56690ba31...@mail.marples.name, Roy Marples r...@marples.name wrote: -=-=-=-=-=- Hi On 25/11/2013 10:33, Ryo ONODERA wrote: pulseaudio needs pkgsrc/sysutils/hal, and running hal causes PR kern/46606 kernel panic when the NetBSD system is shutdown. See http://gnats.netbsd.org/46606 (and duplicated bug http://gnats.netbsd.org/47012 ). How to debug this problem? This problem is observed even on NetBSD/amd64 6.99.27 of Tue Nov 19 06:16:01 JST 2013. I have had this for months on i386. I am running the attached patch from christos@ which stops the crash, but probably Does Bad Things. The actual error seems to be when hal starts, but only causes a problem when hal stops. This happens for you at shutdown, because hal is stopped then. Here's the output from a run: I think I understand what's going on finally. 1st start Creation: /usr/src/sys/kern/kern_lwp.c,731: hald[2231]: [uid=0] (0/1) Setuid: /usr/src/sys/kern/kern_prot.c,357: hald[2231]: [uid=0] (1/-1) /usr/src/sys/kern/kern_prot.c,361: hald[2231]: [uid=1005] (0/1) /usr/src/sys/kern/kern_prot.c,381: hald[2231]: [uid=1005] (1/0) Setuid after using p-p_cred for the uid of the process: /usr/src/sys/kern/kern_prot.c,386: hald[2231]: [uid=1005] (1/0) /usr/src/sys/kern/kern_lwp.c,731: hald-runner[1965]: [uid=0] (0/1) 1st stop Destruction: XXX: This is using l-l_cred to find the uid of the lwp. This is still pointing to root?!?!? Didn't we setuid just above to 1005? All creds of hald should be pointing to 1005, yet this lwp cred is still pointing to root. /usr/src/sys/kern/kern_lwp.c,1128: hald[2231]: [uid=0] (1/-1) Now root has one cred missing! So when hald-runner dies we end up: /usr/src/sys/kern/kern_lwp.c,1128: hald-runner[1965]: [uid=0] (0/-1) With lwp count == -1 as you can see below. We should have crashed now, but my patch comments out the KASSERT! Too late now, the damage has been done. 
2nd start /usr/src/sys/kern/kern_lwp.c,731: hald[1209]: [uid=0] (4294967295/1) /usr/src/sys/kern/kern_prot.c,357: hald[1209]: [uid=0] (0/-1) /usr/src/sys/kern/kern_prot.c,361: hald[1209]: [uid=1005] (1/1) /usr/src/sys/kern/kern_prot.c,381: hald[1209]: [uid=1005] (2/0) /usr/src/sys/kern/kern_prot.c,386: hald[1209]: [uid=1005] (2/0) /usr/src/sys/kern/kern_lwp.c,731: hald-runner[738]: [uid=0] (4294967295/1) 2nd stop /usr/src/sys/kern/kern_lwp.c,1128: hald[1209]: [uid=1005] (2/-1) /usr/src/sys/kern/kern_lwp.c,1128: hald-runner[738]: [uid=0] (0/-1) Thanks Roy -=-=-=-=-=- Index: sys/kern/kern_lwp.c === RCS file: /cvsroot/src/sys/kern/kern_lwp.c,v retrieving revision 1.175 diff -u -p -r1.175 kern_lwp.c --- sys/kern/kern_lwp.c9 Jun 2013 01:13:47 - 1.175 +++ sys/kern/kern_lwp.c25 Nov 2013 14:31:20 - @@ -781,6 +781,12 @@ lwp_create(lwp_t *l1, proc_t *p2, vaddr_ */ if (p2-p_nlwps != 0 p2 != proc0) { uid_t uid = kauth_cred_getuid(l1-l_cred); + if (strncmp(p2-p_comm, hald, 4) == 0) { + struct uidinfo *uip = uid_find(uid); + printf(%s,%d: %s[%d]: [uid=%d] (%lu/%d)\n, __FILE__, + __LINE__, p2-p_comm, (int)p2-p_pid, (int)uid, + uip-ui_lwpcnt, 1); + } int count = chglwpcnt(uid, 1); if (__predict_false(count p2-p_rlimit[RLIMIT_NTHR].rlim_cur)) { @@ -789,6 +795,13 @@ lwp_create(lwp_t *l1, proc_t *p2, vaddr_ KAUTH_ARG(KAUTH_REQ_PROCESS_RLIMIT_BYPASS), p2-p_rlimit[RLIMIT_NTHR], KAUTH_ARG(RLIMIT_NTHR)) != 0) { + if (strncmp(p2-p_comm, hald, 4) == 0) { + struct uidinfo *uip = uid_find(uid); + printf(%s,%d: %s[%d]: [uid=%d] + (%lu/%d)\n, __FILE__, __LINE__, + p2-p_comm, (int)p2-p_pid, + (int)uid, uip-ui_lwpcnt, -1); + } (void)chglwpcnt(uid, -1); return EAGAIN; } @@ -1174,8 +1187,16 @@ lwp_free(struct lwp *l, bool recycle, bo KASSERT(l != curlwp); KASSERT(last || mutex_owned(p-p_lock)); - if (p != proc0 p-p_nlwps != 1) + if (p != proc0 p-p_nlwps != 1) { + uid_t uid = kauth_cred_getuid(l-l_cred); + if (strncmp(p-p_comm, hald, 4) == 0) { + struct uidinfo *uip = uid_find(uid); + printf(%s,%d: 
%s[%d]: [uid=%d] (%lu/%d)\n, __FILE__, + __LINE__, p-p_comm, (int)p-p_pid, (int)uid, + uip-ui_lwpcnt, -1); + } (void)chglwpcnt(kauth_cred_getuid(l-l_cred), -1); + }
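The accounting mismatch described in the annotated log can be reproduced with a toy model. This is a deliberately simplified sketch, not kernel code: the `toy_` names are hypothetical, and the only behaviour modelled is that each lwp is charged to a uid at creation, setuid moves the charge to the new uid, and the free path decrements whatever uid the lwp's cached credential still names.

```c
#include <assert.h>

enum { TOY_ROOT = 0, TOY_USER = 1, TOY_NUIDS = 2 };	/* hypothetical uid slots */

static long toy_lwpcnt[TOY_NUIDS];	/* per-uid lwp count */

struct toy_proc { int uid; };		/* models the process credential */
struct toy_lwp { int cred_uid; };	/* models the lwp's cached credential */

static void
toy_chglwpcnt(int uid, int diff)
{
	assert(uid >= 0 && uid < TOY_NUIDS);
	toy_lwpcnt[uid] += diff;
}

static void
toy_lwp_create(const struct toy_proc *p, struct toy_lwp *l)
{
	l->cred_uid = p->uid;
	toy_chglwpcnt(l->cred_uid, 1);
}

/* Moves the charge to the new uid; the lwp's cached uid is left stale. */
static void
toy_setuid(struct toy_proc *p, int uid)
{
	toy_chglwpcnt(p->uid, -1);
	toy_chglwpcnt(uid, 1);
	p->uid = uid;
}

/* Decrements the uid named by the (possibly stale) cached credential. */
static void
toy_lwp_free(const struct toy_lwp *l)
{
	toy_chglwpcnt(l->cred_uid, -1);
}

/* Replays the "1st start / 1st stop" sequence from the log. */
static void
toy_run(void)
{
	struct toy_proc hald = { TOY_ROOT }, runner = { TOY_ROOT };
	struct toy_lwp hald_lwp, runner_lwp;

	toy_lwpcnt[TOY_ROOT] = toy_lwpcnt[TOY_USER] = 0;
	toy_lwp_create(&hald, &hald_lwp);	/* root: 1 */
	toy_setuid(&hald, TOY_USER);		/* root: 0, user: 1 */
	toy_lwp_create(&runner, &runner_lwp);	/* root: 1 */
	toy_lwp_free(&hald_lwp);		/* stale uid, root: 0 */
	toy_lwp_free(&runner_lwp);		/* root: -1 */
}
```

After `toy_run()` the root count is -1 and the user count is stuck at 1, which matches the pattern in the log; a count of -1 printed as an unsigned 32-bit value shows up as 4294967295, as in the "2nd start" lines.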
Re: posix_fallocate
On Nov 17, 1:15pm, k...@munnari.oz.au (Robert Elz) wrote: -- Subject: Re: posix_fallocate | ps: I have not examined the FreeBSD implementation - if they've done it the | hard, safe, way, and worked out all the potential kinks, and if it doesn't | depend too much upon other aspects of their I/O system implementation (like | whatever they have to make softdeps work) then perhaps copying that might be | feasible -- if the demand for this really exists, and it isn't being requested | just because it is in the spec and NetBSD is lacking it. From the cursory look at it, they just write. christos
Re: [patch] changing lua_Number to int64_t
On Nov 17, 10:36am, lourival.n...@gmail.com (Lourival Vieira Neto) wrote: -- Subject: Re: [patch] changing lua_Number to int64_t | I mean know it as a script programmer. I think that would be helpful | to know the exact lua_Number width when you are writing a script. | AFAIK, you don't have sizeof functionality from Lua. So, IMHO, | lua_Number width should be fixed and documented. Lua should provide manifest constants for it (like INTMAX_MAX). Otherwise you'd be making assumptions christos
Re: [patch] changing lua_Number to int64_t
On Nov 17, 10:46am, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

| On Sun, Nov 17, 2013 at 7:37 AM, Marc Balmer m...@msys.ch wrote:
| On 17.11.13 04:49, Terry Moore wrote:
| I believe that if you want the Lua scripts to be portable across NetBSD
| deployments, you should choose a well-known fixed width.
|
| I don't see this as very important. Lua scripts will hardly depend on
| the size of an integer.
|
| But they could. I think that the script programmers should know if the
| numeric data type is enough for their usage (e.g., time diffs).

By making it the biggest type possible, you never need to be worried.

christos
Re: [patch] changing lua_Number to int64_t
On Nov 17, 3:36pm, lourival.n...@gmail.com (Lourival Vieira Neto) wrote: -- Subject: Re: [patch] changing lua_Number to int64_t | 1. Lua 5.3 will have 64 bit integer support as standard, which will | make interop and reuse between kernel and userspace code much easier, | iff we use int64_t | | If they are using int64_t for integers, I think it is a good reason to us to | stick to int64_t. This is not relevant. The numeric type will still be double, so forget about compatibility between kernel and userland. There is no need for the interpreter to use a fixed width type, but rather it is convenient to use the largest numeric type the machine can represent. christos
Re: [patch] changing lua_Number to int64_t
On Nov 17, 7:14pm, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

| Humm.. I think that §2.1 brings a good argument: Standard Lua uses
| 64-bit integers and double-precision floats, (...). I think that
| would not hurt to stick to the future standard; once 64 bit is good
| enough for kernel purposes.

Ok,

christos
Re: [patch] changing lua_Number to int64_t
In article 52872b0c.5080...@msys.ch, Marc Balmer m...@msys.ch wrote: Changing the number type to int64_t is certainly a good idea. Two questions, however: Why not intmax_t? christos
Re: [patch] changing lua_Number to int64_t
On Nov 16, 9:30pm, lourival.n...@gmail.com (Lourival Vieira Neto) wrote: -- Subject: Re: [patch] changing lua_Number to int64_t | On Sat, Nov 16, 2013 at 8:52 PM, Christos Zoulas chris...@astron.com wrote: | In article 52872b0c.5080...@msys.ch, Marc Balmer m...@msys.ch wrote: | Changing the number type to int64_t is certainly a good idea. Two | questions, however: | | Why not intmax_t? | | My only argument is that int64_t has a well-defined width and, AFAIK, | intmax_t could vary. But I have no strong feelings about this. Do you | think intmax_t would be better? Bigger is better. And you can use %jd to print which is a big win. christos
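The printing argument is easy to see in code. A minimal sketch, with hypothetical helper names: an `intmax_t` can be formatted with the standard `%jd` conversion, whereas an `int64_t` needs the `PRId64` macro from `<inttypes.h>` spliced into the format string.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* intmax_t: the j length modifier works everywhere. */
static int
fmt_intmax(char *buf, size_t len, intmax_t n)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%jd", n);
}

/* int64_t: the conversion has to come from a macro. */
static int
fmt_int64(char *buf, size_t len, int64_t n)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%" PRId64, n);
}
```

Both helpers return the number of characters written, as `snprintf()` does.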
Re: posix_fallocate
In article 1lcgiu4.18zr2h51aac07zm%m...@netbsd.org, Emmanuel Dreyfus m...@netbsd.org wrote:

Hi

NetBSD-current seems to lack posix_fallocate(2)
http://pubs.opengroup.org/onlinepubs/009695299/functions/posix_fallocate.html

Is someone already working on it, or has thoughts about how it should be implemented?

FreeBSD has it as a system call. It should be easy to dup.

christos
Re: Changing __USING_TOPDOWN_VM to a runtime decision
In article 20131105144023.gc17...@mail.duskware.de, Martin Husemann mar...@duskware.de wrote:

Hey folks,

I would like to change the current (mostly) compile time decision whether we will use top-down VA layout for userland processes to a runtime check. This allows emulations to disable it, and also allows MD code to recognize binaries not suitable for topdown VM layout and give those binaries the old layout.

The latter point is what I actually need: on sparc64 we have compiled most code in the medlow code model, which does not allow big addresses. I am about to commit changes that switch this default and properly mark new binaries. To still allow running old binaries, I need something like the attached patch.

The patch is mostly straightforward: I define a new flag EXEC_TOPDOWN_VM, initialized by default according to __USING_TOPDOWN_VM, but overridable by a MD function. This way the exec_package carries over the information whether we will use topdown-vm for the to-be-loaded binary. Most other changes are mechanical, like passing this information through a few uvm layers. For architectures already using topdown-VM, no change is intended.

Comments?

I don't like the !!(expr) syntax, I'd prefer to hide the ugliness in a macro that does (expr != 0)

christos
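As an illustration of the suggestion, the flag test can be wrapped once in a macro that compares against zero. The macro name `TOY_ISSET`, the helper `toy_use_topdown()` and the flag value below are hypothetical; only the flag's name comes from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EXEC_TOPDOWN_VM	0x0400u	/* hypothetical flag value */

/* Normalizes a flag test to 0 or 1 without writing !!(expr) at call sites. */
#define TOY_ISSET(flags, bit)	(((flags) & (bit)) != 0)

static bool
toy_use_topdown(unsigned int exec_flags)
{
	return TOY_ISSET(exec_flags, TOY_EXEC_TOPDOWN_VM);
}
```

The result is the same value that `!!(flags & bit)` would produce; the difference is purely in readability at the call site.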
Re: zero-length symlinks
In article 20131105220754.gb...@snowdrop.l8s.co.uk, David Laight da...@l8s.co.uk wrote:

On Sun, Nov 03, 2013 at 04:35:19PM -0800, John Nemeth wrote:

It has to do with the fact that historically mkdir(2) was actually mkdir(3), it wasn't an atomic syscall and was a sequence of operations performed by a library routine...

Actually I think you'll find that mkdir was always a system call. It was directory rename that was done with a series of link and unlink system calls.

Nope, on 4.1BSD and I believe SVR1 (please correct me), it was a setuid binary that did:

mknod("foo", 04, 0);
chown("foo", getuid());
link("foo", "foo/.");
link(".", "foo/..");

Also, if you look at any current fs code the processing of . and .. is special - they will be treated as requests for the current and parent directories regardless of the inodes they reference. Doing otherwise is a complete locking nightmare!

I think that this also came much later. I believe with 4.4BSD.

christos
Re: autoconf deferred processing
In article 20131022205705.c0dc812...@ren.fdy2.co.uk, Robert Swindells r...@fdy2.co.uk wrote:

Can somebody explain how the deferred processing code in subr_autoconf.c is supposed to work?

Looking at config_create_interruptthreads() it creates 8 threads all of which seem to walk the same list and delete elements from it.

I'm getting crashes in i386 at startup and am trying to track down what is causing it. The faulting PC is random so I'm looking for anything that calls through function pointers.

Robert Swindells

Edit subr_autoconf.c and set the number of threads to 1.

int interrupt_config_threads = 8;
int mountroot_config_threads = 2;

Or you can patch them in ddb with boot -d

Also you can add DEBUG_AUTOCONF in current, which prints the name of the deferred driver as well as the count of the deferred mutex. For even better success, boot -1

christos
Re: kgdb on NetBSD/amd64 6.99.23
In article 523aab61.8000...@gmail.com, Jan Danielsson jan.m.daniels...@gmail.com wrote:

On 9/18/13 7:00 PM, Jan Danielsson wrote:

I'm trying to get kgdb working between two virtual box instances. (I have verified that /dev/tty00 - /dev/tty00 works by running GENERIC kernels and minicom on both virtual machines). [---]

Problem #1 solved (worked-around). It looks like RB_KDB isn't being passed over to the kernel properly. I simply commented out if (boothowto & RB_KDB), and now have a kernel which actually waits for a remote debugger to attach on boot.

I enabled DEBUG_KGDB, and when I attach the debugger from the remote the target clearly reacts to it. Unfortunately, the remote says PC register not available -- and it doesn't appear to actually be connected.

Is kgdb supposed to work on amd64 and/or -current? I'm starting to get the feeling that this is somewhat untested.

I did not test kgdb last time I upgraded gdb, so it might need fixing...

christos
Re: high load, no bottleneck
On Sep 19, 11:35am, buh...@nfbcal.org (Brian Buhrow) wrote:
-- Subject: Re: high load, no bottleneck

| Hello. The worst case scenario is when a raid set is running in
| degraded mode. Greg sent me some notes on how to calculate the memory
| utilization in this instance. I'll go dig them out and send them along in
| a bit. In theory, if all your raid sets are in degraded mode at once, and
| i/o is busy, you could be highly impacted, since you can have up to 40
| i/o's outstanding for each raid set with my configuration option. However,
| even on machines with multiple raid5 sets, with 2 of them running in
| degraded mode, I've not seen a memory bottleneck. I don't recommend this,
| of course, but sometimes stuff happens. In any case, except for the
| potential memory utilization, there's no down side to setting this number
| in the kernel and not worrying about it anymore. In fact, this is what I
| do for all our machines around here regardless of whether the machine is
| hosting raid1 sets, raid5 sets or a combination of the two.

If we are going to add a sysctl, we might also put a different value for the raid-degraded condition? Ideally I would prefer that things autotune, but that is much more difficult.

christos
Re: high load, no bottleneck
On Sep 18, 3:34am, m...@netbsd.org (Emmanuel Dreyfus) wrote: -- Subject: Re: high load, no bottleneck | Christos Zoulas chris...@zoulas.com wrote: | | On large filesystems with many files fsck can take a really long time after | a crash. In my personal experience power outages are much less frequent than | crashes (I crash quite a lot since I always fiddle with things). If you | don't care about fsck time, you don't need WAPBL. | | But you just told me that I will need a fsck after crash now I am | running with vfs.wapbl.flush_disk_cache=0 so I wonder if I should not | just mount without -o log. What are WAPBL benefits when running with | vfs.wapbl.flush_disk_cache=0? You *might* need an fsck after power loss. If you crash and the disk syncs then you should be ok if the disk flushed (which it probably did if you say syncing disks after the panic). christos
Re: high load, no bottleneck
In article 1l9czcn.y6kr35aruvzvm%m...@netbsd.org, Emmanuel Dreyfus m...@netbsd.org wrote:

Emmanuel Dreyfus m...@netbsd.org wrote:

db{0} show vnode c5a24b08
OBJECT 0xc5a24b08: locked=0, pgops=0xc0b185a8, npages=1720, refs=16
VNODE flags 0x4030<MPSAFE,LOCKSWORK,ONWORKLST>
mp 0xc4a14000 numoutput 0 size 0x6f writesize 0x6f
data 0xc5a25d74 writecount 0 holdcnt 2
tag VT_UFS(1) type VREG(1) mount 0xc4a14000 typedata 0xc4fe5480
v_lock 0xc5a24bac

While many threads are waiting, another nfsd thread holds the lock with this backtrace:

turnstile_block
rw_vector_enter
wapbl_begin
ffs_write
VOP_WRITE
nfsrv_write
nfssvc_nfsd
sys_nfssvc
syscall

I understand it is waiting for another process to complete I/O before it can enter the rwlock in wapbl_begin. I have a first-class suspect with this other nfsd thread which is engaged in I/O:

sleepq_block
wdc_exec_command
wd_flushcache
wdioctl
bdev_ioctl
spec_ioctl
VOP_IOCTL
rf_sync_component_caches
raidioctl
bdev_ioctl
spec_ioctl
VOP_IOCTL
wapbl_cache_sync

Is it a nasty interaction between RAIDframe, NFS and WAPBL?

My suggestion is to try:

sysctl -w vfs.wapbl.flush_disk_cache=0

for now...

christos
Re: high load, no bottleneck
On Sep 17, 9:48pm, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: high load, no bottleneck

| Excellent: the load does not go over 2 now (compared to 50).
|
| Thank you for saving my day. But now what happens?
| I note the SATA disks are in IDE emulation mode, and not AHCI. This is
| something I need to try changing:

What happens highly depends on the drive (how frequently it flushes cache to disk internally and how long it keeps data in cache), but it is never good. The best case scenario would be that WAPBL writes are ordered properly and that cache-flush is only sent occasionally between transactionally safe metadata commit points, but it seems that this is not happening (because we are getting too many flushes).

The case to worry about is the scenario where the machine suddenly loses power, the data never makes it to the physical media, and gets lost from the cache. In this case you might end up with a filesystem that has inconsistent metadata, so the next reboot might end up causing a panic when the filesystem is used. The solution there is to reboot and force an fsck. If you have a UPS I would not worry too much about it; even if your system panics the kernel should issue the flush commands to the disk.

BTW I hope that everyone realizes that WAPBL deals only with metadata and not the actual file data, so if you crash/lose power you typically end up with garbage in the active files (usually bits and pieces of files from other files, or NUL's).

christos
Re: high load, no bottleneck
On Sep 18, 2:22am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: high load, no bottleneck

| The case to worry about is the scenario where the machine
| suddenly loses power, the data never makes it to the physical media,
| and gets lost from the cache. In this case you might end up with a
| filesystem that has inconsistent metadata, so the next reboot might
| end up causing a panic when the filesystem is used. The solution there
| is to reboot and force an fsck.
|
| It seems the system would be better without WAPBL enabled in this case.
| Is there any benefit left?

On large filesystems with many files fsck can take a really long time after a crash. In my personal experience power outages are much less frequent than crashes (I crash quite a lot since I always fiddle with things). If you don't care about fsck time, you don't need WAPBL.

Another easy thing you can try is to put the WAPBL log on a flash drive and re-enable the cache flushes.

christos
Re: NFS over-quota not detected if utimes() called before fsync()/close()
In article 20130731222303.gj96...@trav.math.uni-bonn.de, Edgar Fuß e...@math.uni-bonn.de wrote:

Yes, I believe you are right. Return an error for all errors.

Any idea what the intent of only catching EINTR was?

The flawed logic of: If the write fails for any other reason than being interrupted by the user, why not at least succeed in changing the permissions?

christos
Re: NFS over-quota not detected if utimes() called before fsync()/close()
In article 20130730211200.gd96...@trav.math.uni-bonn.de, Edgar Fuß e...@math.uni-bonn.de wrote: I think the problem is in nfs_setattr(), sys/nfs/nfs_vnops.c:681, where files are flushed before setattr because a later write of cached data might change timestamps or reset sugid bits, but the only return value of nfs_vinvalbuf() that's treated as an error is EINTR. Why? Any comments on this? We are losing mail because of this problem so I would like to get it fixed. Yes, I believe you are right. Return an error for all errors. christos
Re: ibcs2 syscalls.master problem
In article 51c9db37.1090...@netbsd.org, Jeff Rizzo r...@netbsd.org wrote: The last time sys/compat/ibcs2/syscalls.master was edited [1] (July 2010), the dependent files were not regenerated. There was at least one typo (fixed), but there are also duplicate syscall names, which cause the generated files to break the i386 build. Can someone who knows what's what fix this, so the resulting files work? I did notice that FreeBSD's ibcs2 emulation has more info on at least one of the syscalls. Thanks, +j [1] http://mail-index.netbsd.org/source-changes/2010/07/23/msg011989.html I think FreeBSD does not have ibcs2 emulation anymore... christos
Re: DTrace syscall provider - please test/comment
On Jun 24, 6:12pm, m...@3am-software.com (Matt Thomas) wrote: -- Subject: Re: DTrace syscall provider - please test/comment | | On Jun 24, 2013, at 6:01 PM, Christos Zoulas chris...@astron.com wrote: | | Can't this be done as an addition/enhancement to the trace_enter()/ | trace_exit() facility instead of having to enter each syscall entry? | | that only gets called if p-p_trace_enabled is set. So now you need | a hook to set that on every lwp switch if the provider is tracing. Right, and it (dtrace) can set a different (or the same flag) to enable it. christos
Re: DTrace syscall provider - please test/comment
On Jun 25, 9:32am, m...@3am-software.com (Matt Thomas) wrote: -- Subject: Re: DTrace syscall provider - please test/comment | | On Jun 25, 2013, at 5:25 AM, chris...@zoulas.com (Christos Zoulas) wrote: | | On Jun 24, 6:12pm, m...@3am-software.com (Matt Thomas) wrote: | -- Subject: Re: DTrace syscall provider - please test/comment | | | | | On Jun 24, 2013, at 6:01 PM, Christos Zoulas chris...@astron.com wrote: | | | | Can't this be done as an addition/enhancement to the trace_enter()/ | | trace_exit() facility instead of having to enter each syscall entry? | | | | that only gets called if p-p_trace_enabled is set. So now you need | | a hook to set that on every lwp switch if the provider is tracing. | | Right, and it (dtrace) can set a different (or the same flag) to enable | it. | | | How does it set the same flag since that's per-proc and will need to changed | on context switch. | | A different flag is more overhead per syscall. I am trying to balance that against adding of two more conditionals per syscall per architecture and touching dozens of source files adding the same code in each one. Perhaps the syscall_plain/syscall_fancy idea was not that bad after all :-( Perhaps a different bit on the same flag. If any of them is set, you call trace enter, and you clear/move the bit on context switch. christos
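The "different bit on the same flag" idea might look something like the following. This is a hedged sketch with entirely hypothetical names and values; it only models the control flow being proposed: the syscall path makes a single test on one flag word, and the context-switch path sets or clears the provider's bit in that word.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TRACE_PROC		0x1u	/* hypothetical: per-process tracing */
#define TOY_TRACE_DTRACE	0x2u	/* hypothetical: provider is active */

struct toy_tproc {
	unsigned int trace_flags;
};

static bool toy_dtrace_active;	/* system-wide provider state */

/* Context switch: mirror the provider state into the per-process word. */
static void
toy_cpu_switchto(struct toy_tproc *p)
{
	if (toy_dtrace_active)
		p->trace_flags |= TOY_TRACE_DTRACE;
	else
		p->trace_flags &= ~TOY_TRACE_DTRACE;
}

/* Syscall path: one conditional covers both consumers. */
static bool
toy_syscall_needs_trace(const struct toy_tproc *p)
{
	return p->trace_flags != 0;
}
```

The trade-off under discussion is visible here: the per-syscall cost stays at one test, while the cost of keeping the bit current moves to the context-switch path.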
Re: NetBSD/avr32
In article caev1cwcb1-+eyu+enmdq9omh8tzuanu8k4spurxw8uvpwfa...@mail.gmail.com, Tomas Niño Kehoe tomasninoke...@gmail.com wrote: -=-=-=-=-=- Hi all, I'd like to announce the existence of a NetBSD port to the AVR32 processor architecture. This port is being developed in the context of my engineering thesis at the University of Buenos Aires, Argentina. It is directed by Leandro Santi. Looks like you've made a lot of progress. You might be interested in looking at https://wiki.freebsd.org/FreeBSD/avr32 christos
Re: revert broken O_SEARCH
In article 74e9a033-b75c-45b8-beee-a7380baa8...@gmail.com, Garrett Cooper yaneg...@gmail.com wrote: On Jan 13, 2013, at 12:59 AM, Martin Husemann wrote: On Sun, Jan 13, 2013 at 08:49:06AM +, David Holland wrote: Nope, don't have that kind of setup and atf is way too invasive to allow just building the test programs somewhere else. ATF is available from pkgsrc and straight forward to install, so I tried on FreeBSD 9, but they do not have O_SEARCH: ATF is available on FreeBSD CURRENT (and will be the defacto test infrastructure for FreeBSD) as of last October and we're working towards having a sane set of wrapper Makefiles for producing tests (based largely on what jmmv did for NetBSD, but divergent because the build systems are divergent). Would it make sense then to provide the tests as a shared separate repository managed by both projects? I think a large number of the tests can be shared. christos
Re: revert broken O_SEARCH
In article c96ccc7b-08cc-4404-b567-1a5aa2c02...@gmail.com, Garrett Cooper yaneg...@gmail.com wrote: This is what I would like to happen (similar to LTP with Linux), but it hasn't yet because of other items on my priority list of things to do. But, I would really like working with someone at NetBSD (and hopefully eventually DragonFlyBSD and OpenBSD) to make this a reality. Versioning would be the only difficult thing that would need to be properly thought out because functional requirements change over time, and as such some tests may or may not apply (or the requirements may be different) between multiple OS distro versions. Fine, just let me know how I can help :-) christos
Re: Porting FreeBSD drm2 driver
In article 20130112154830.gc22...@falu.nl, Rhialto rhia...@falu.nl wrote:

I just noticed that FreeBSD's new 9.1 release has Kernel Mode Setting:

The drm2(4) Intel GPU driver, which supports GEM and KMS and works with new generations of GPUs such as IronLake, SandyBridge, and IvyBridge, has been added. The agp(4) driver now supports SandyBridge and IvyBridge CPU northbridges.[r236926, r236927, r239965]
(from http://www.freebsd.org/releases/9.1R/relnotes.html)

It seems that it is practically impossible to buy a PCI graphics card these days for which X doesn't require KMS. This is going to make the use of NetBSD with X impossible at a very rapid rate. Even the card that I thought was supported properly (Radeon HD 5450-based) isn't - X claims no acceleration. I haven't tried Xv yet (needed to view video) but I fear the worst.

Is anybody by any chance working on porting this FreeBSD driver to NetBSD?

We are painfully aware of this and we are commissioning work to remedy the situation.

christos
Re: Importing lua(4), but where in the source tree?
In article de1aa2ee-2a57-4255-9f6c-84b240596...@msys.ch, Marc Balmer m...@msys.ch wrote: On 09.01.2013 at 16:28, matthew green m...@eterna.com.au wrote: I want to import the lua(4) device driver, which is currently a module only, which seems wrong. Is sys/dev/lua/ a good place? can you give a little more details on what is included? Sure. The full diff is at http://www.netbsd.org/~mbalmer/diffs/kernel_lua_010.diff and it's the files that the diff now places in sys/modules/lua/ that I think should better go to sys/dev/lua/ at a guess, if there are more than a couple of files then sys/dev/lua is an OK place, otherwise just sys/dev seems reasonable to me. - Marc

Fine (with sys/dev/lua), but:

-#define fputs(s, f) printf(s)
+#define fputs(s, f) printf("%s", s)

And perhaps:

-#define realloc(ptr, nsize) kmem_alloc(nsize, KM_SLEEP)
-#define free(ptr) kmem_free(ptr, osize)
+#define realloc(ptr, nsize) kern_realloc(ptr, nsize, KM_SLEEP)
+#define free(ptr) kern_free(ptr)

And:

-char *strncat(char *dst, const char *src, size_t n);
-size_t strspn(const char *s, const char *charset);
-size_t strcspn(const char *s, const char *charset);
-char *strpbrk(const char *s, const char *charset);
+char *strncat(char *, const char *, size_t);
+size_t strspn(const char *, const char *);
+size_t strcspn(const char *, const char *);
+char *strpbrk(const char *, const char *);

And where does the luapmf driver go? In sys/dev/lua/ or sys/dev/lua/pmf?

christos
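The fputs change suggested in this message matters because the first argument of printf is a format string, so any '%' in the data gets interpreted. A minimal sketch of the difference follows; it formats into a buffer with snprintf so the result can be inspected, and the two helper names are hypothetical, not part of the lua(4) diff.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring "#define fputs(s, f) printf(s)":
 * the caller's string is interpreted as a format string.
 */
static int
emit_as_format(char *out, size_t outlen, const char *s)
{
	assert(out != NULL && outlen > 0);
	return snprintf(out, outlen, s);
}

/*
 * Hypothetical helper mirroring "#define fputs(s, f) printf("%s", s)":
 * the caller's string is copied verbatim.
 */
static int
emit_as_data(char *out, size_t outlen, const char *s)
{
	assert(out != NULL && outlen > 0);
	return snprintf(out, outlen, "%s", s);
}
```

With the input "100%% done" the first helper collapses the doubled percent sign, while the second preserves it. A string containing a conversion such as %s or %n would be undefined behavior in the first form, which is the reason for the change.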
Re: WAPBL and write cacheing (was: SATA write performance problems)
In article cajcb3foogeoiw_xgnrkovykigab+vf8jsiffiz8zgpyjrij...@mail.gmail.com, Andy Ruhl acr...@gmail.com wrote: On Thu, Jan 3, 2013 at 4:54 AM, Lars Heidieker lars.heidie...@googlemail.com wrote: On Thu, Jan 3, 2013 at 12:49 PM, Edgar Fuß e...@math.uni-bonn.de wrote: Doesn't this depend on filesystem journaling? Can someone please enlighten me? Is it safe to use write caching on a SATA drive with FFS/WAPBL on it? AFAIK it depends on the drive, if it doesn't lie about the command to flush the cache it's safe. WAPBL sends such a command on commit. So I think what you are saying is that WAPBL asks the drive to flush its volatile cache before the journal update is done? There was talk a while back on some list (I don't remember if it was a NetBSD list) that certain OS behavior (maybe not NetBSD) was flushing cache so often that drive cache performance benefits were essentially negated. So the drive would ignore some of the cache requests which leaves systems using journaling vulnerable. The fail-safe was to just turn off cache completely. Actually it was the addition of: sysctl -w vfs.wapbl.flush_disk_cache=0 and the discussion on the actual behavior of various cache flushing commands on different types of buses and drives. christos
Re: fixing compat_12 getdents
In article 20121210195346.ga8...@apb-laptoy.apb.alt.za, Alan Barrett a...@cequrux.com wrote: also, EINVAL doesn't seem like a great error code for this condition. it's not an input parameter that's causing the error, but rather that the required output format cannot express the data to be returned. I think solaris uses EOVERFLOW for this kind of situation, and ERANGE doesn't seem too bad either. any opinions on that? There's also E2BIG, but I don't think it fits. ERANGE is documented in terms of the available space, while EOVERFLOW is documented in terms of a numeric result. So perhaps EOVERFLOW for integer is too large to fit in N bits, and ERANGE for string is too long to fit in N bytes? Or vice versa? Somebody(TM) should go through the errno(2) documentation and make the descriptions more generic, and add guidance for choosing which code to return. We need to be careful here because the set of errnos returned by many syscalls is fixed by POSIX etc. christos
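One of the conventions floated in this message (EOVERFLOW when a number does not fit in N bits, ERANGE when a string does not fit in N bytes) can be sketched as follows. This is only an illustration of that one option; the structure and function names are hypothetical and this is not the compat_12 getdents code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-format entry: 32-bit file number, short name buffer. */
struct old_dirent {
	uint32_t d_fileno;
	char     d_name[16];
};

/* Convert a new-style entry; returns 0 or an errno value. */
static int
to_old_dirent(struct old_dirent *dst, uint64_t fileno, const char *name)
{
	assert(dst != NULL && name != NULL);

	/* Numeric result cannot be represented in 32 bits. */
	if (fileno > UINT32_MAX)
		return EOVERFLOW;

	/* String plus its terminating NUL does not fit in the buffer. */
	if (strlen(name) >= sizeof(dst->d_name))
		return ERANGE;

	dst->d_fileno = (uint32_t)fileno;
	strcpy(dst->d_name, name);
	return 0;
}
```

Neither case returns EINVAL, since the caller's input is valid and only the output format is too narrow. Whether a given syscall may return these codes is still constrained by POSIX, as noted above.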
Re: Making forced unmounts work
In article 31490263-5a8e-411a-bb57-f7fc5cffc...@eis.cs.tu-bs.de, J. Hannken-Illjes hann...@eis.cs.tu-bs.de wrote: The more I think the more I just want to remove forced unmounts. I think that any operation that cannot be undone (and requires reboot to be undone) makes the OS less resilient to failure. To take some examples: - A hard,nointr NFS mount hanging because the server stops responding. Even if it were possible to use fstrans_ here (and it would become ugly) it would not help. The root node of the mount will likely be locked by the first thread trying to lookup so unmount won't be able to even lookup the mount point. If it were possible to run `mount -u' or `unmount' it should be possible to update the mount as `soft,intr' and proceed as usual, kill threads blocking an unmount and unmount. Store the normalized mount path with the mountpoint, look it up in the mount list, make all blocked threads give an I/O error on the current operation, etc. christos
Re: fexecve, round 3
In article 20121125152520.ga17...@panix.com, Thor Lancelot Simon t...@panix.com wrote: On Sat, Nov 24, 2012 at 06:53:16PM +0100, Emmanuel Dreyfus wrote: Let's try to move forward, and I will start with a summary of what I understand from the standard. It would be nice if we could at least reach consensus on standard interpretation. I think your interpretation of the standard is correct. The particularly problematic part is: O_EXEC is mutually exclusive with O_RDONLY, O_WRONLY, or O_RDWR This -- along with the basic shift from checking permissions when a handle to an object is obtained to checking them when it's used -- is exemplary of the poor design that seems to have gone into this set of features. Does everyone agree on this interpretation? If we do, next steps are - describe threats this introduces to chrooted processes - decide if they are acceptable and if they are not, propose mitigation. I think you left out part of the solution space: - simply don't include this poorly-designed functionality in NetBSD. Unless you want to change O_RDONLY to be non-zero and version all the syscalls that use it :-) christos
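The remark about O_RDONLY being zero can be made concrete with a short sketch. It assumes a system where O_RDONLY is defined as 0 (true on NetBSD and Linux, though not required by POSIX); the function names are made up for illustration.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Broken where O_RDONLY is 0: masking with zero always yields zero,
 * so "read-only" cannot be detected as a flag bit.
 */
static bool
is_rdonly_by_bit(int flags)
{
	return (flags & O_RDONLY) != 0;
}

/* Portable: compare the whole access-mode field. */
static bool
is_rdonly_by_accmode(int flags)
{
	return (flags & O_ACCMODE) == O_RDONLY;
}
```

Because "no access-mode bits set" already means read-only, a mutually exclusive O_EXEC has to be encoded as another value of the access-mode field rather than as an extra bit, unless O_RDONLY itself is renumbered.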
Re: WAPL panic
In article 20121106221628.gl22...@trav.math.uni-bonn.de, Edgar Fuß e...@math.uni-bonn.de wrote: So, while investigating my WAPBL performance problems, it looks like I can crash the machine (not reliably, but more often than not) with a simple seq 1 3000 | xargs mkdir command. I get the following backtrace in ddb (wetware OCR):

panic: wapbl_register_deallocation: out of resources
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8016f01d cs 8 rflags 246 cr2 80011fc2d000 cpl 0 rsp fe811e0fe6f0
Stopped in pid 12551.1 (mkdir) at netbsd:breakpoint+0x5: leave
db{3}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
wapbl_register_inode() at netbsd:wapbl_register_inode
ffs_truncate() at netbsd:ffs_truncate+0x917
ufs_direnter() at netbsd:ufs_direnter+0x481
ufs_mkdir() at netbsd:ufs_mkdir+0x617
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdir() at netbsd:do_sys_mkdir+0x10f
syscall() at netbsd:syscall+0xc4

It's unreasonable to take a dump because that would take an estimated four to five hours. Is there any reasonable way to get a dump out of a 16G box?

Try to get a sparse dump via machdep.sparse_dump=1

christos
Re: ETHERCAP_* ioctl()
In article 5090fc73.4060...@execsw.org, Masanobu SAITOH msai...@execsw.org wrote: Hi, all. I sent the following mail more than two years ago. http://mail-index.netbsd.org/tech-kern/2010/07/28/msg008613.html As the starting point to solve this problem, I committed the change to add SIOCGETHERCAP stuff. Example:

msk0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
        ec_enabled=0
        address: 00:50:43:00:4b:c5
        media: Ethernet autoselect
        status: no carrier
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
        ec_enabled=0
        address: 00:1b:21:58:68:34
        media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
        status: active
        inet 192.168.1.5 netmask 0xff00 broadcast 192.168.1.255
        inet6 fe80::21b:21ff:fe58:6834%wm0 prefixlen 64 scopeid 0x2
        inet6 2001:240:694:1:21b:21ff:fe58:6834 prefixlen 64

What do you think about this output?

Very nice!

christos
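The ec_capabilities values in the sample output (5 listing VLAN_MTU and JUMBO_MTU, 7 listing all three names) are a hex bitmask followed by the names of the set bits. A minimal sketch of that formatting is below. The bit values are inferred from the output above and are an assumption, not taken from the kernel headers, and format_caps is a hypothetical helper rather than the ifconfig code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bit values inferred from the sample output; treat as assumptions. */
struct capname {
	unsigned    bit;
	const char *name;
};

static const struct capname ec_names[] = {
	{ 0x1, "VLAN_MTU" },
	{ 0x2, "VLAN_HWTAGGING" },
	{ 0x4, "JUMBO_MTU" },
};

/* Format "label=<hex><NAME,NAME,...>" into out, in the ifconfig style. */
static void
format_caps(char *out, size_t outlen, const char *label, unsigned caps)
{
	size_t i, used;
	int first = 1;

	assert(out != NULL && outlen > 0);
	snprintf(out, outlen, "%s=%x", label, caps);

	for (i = 0; i < sizeof(ec_names) / sizeof(ec_names[0]); i++) {
		if ((caps & ec_names[i].bit) == 0)
			continue;
		used = strlen(out);
		snprintf(out + used, outlen - used, "%s%s",
		    first ? "<" : ",", ec_names[i].name);
		first = 0;
	}

	/* Close the name list only if at least one name was printed. */
	if (!first) {
		used = strlen(out);
		snprintf(out + used, outlen - used, ">");
	}
}
```

A value of 0 prints with no name list at all, which matches the ec_enabled=0 lines in the example.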
Re: suenv
In article c75a84166056c94f84d238a44af9f6ad277...@ausx10mpc103.amer.dell.com, paul_kon...@dell.com wrote: But apache is security critical, isn't it? And it certainly is threaded. Or are you applying the term security critical only to a smaller set of components? Yes, but apache is designed to be threaded. login, su, and other PAM users are not necessarily. Typically programs know the closure of shared libraries that they can potentially use, and PAM breaks that model. The threaded/non-threaded case is a particularly nasty example, where a program might assume that it can use static storage and non-threaded interfaces (res_foo() instead of res_nfoo(), getdbfoo() instead of getdbfoo_r()) and then suddenly it finds itself in a threaded environment with potential heisenbugs. In the apache case these may affect only the apache user and whatever access it has, but in the login/su and other PAM users' cases this leads to a complete system compromise. christos
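The static-storage hazard described here can be shown without any threads at all: two calls to a non-reentrant interface share one buffer, so the second result overwrites the first. The sketch below uses made-up function names that follow the foo()/foo_r() naming convention mentioned above.

```c
#include <assert.h>
#include <stdio.h>

/* Non-reentrant style: the result lives in static storage shared by all callers. */
static const char *
uid_label(unsigned uid)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "uid-%u", uid);
	return buf;
}

/* Reentrant style (the "_r" convention): the caller supplies the buffer. */
static const char *
uid_label_r(unsigned uid, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	snprintf(buf, buflen, "uid-%u", uid);
	return buf;
}
```

In a single-threaded program the first form is merely awkward. Once a PAM module pulls in threads, two threads can race on the shared buffer, and in login or su the corrupted value may be the identity being authenticated.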
Re: suenv
In article 20121023162142.gb24...@panix.com, Thor Lancelot Simon t...@panix.com wrote: Nasty hacks like subverting the protection against LD_PRELOAD on setuid executables are not called for in a case like this. If we resort to them, why should our users trust us to deliver quality software? If you want the wild west, you can find Debian's openssl patches over there -. Not that I advocate doing that (and I will not provide the recipe to do it), but if you want to always load libpthread you can do so via ld.so.conf(5). Resist the temptation :-) christos
Re: fixing zfs
In article 20121014193635.6ccf360...@jupiter.mumble.net, Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote: The attached patches fix a lot of issues in our zfs port mainly having to do with locking and our (insane) vop protocols. With it, many of the zfs tests pass much more reliably, although there remain a number that still fail, mainly having to do with permissions and file flags. Currently zfs is badly hosed, and I am pretty confident that at least the intent of these patches is correct even if I have made some mistakes in the details. So I strongly doubt whether committing these patches would make our zfs situation any worse than it currently is. Any objections? If not, I'll commit them tomorrow or in the next few days. None from here. christos
Re: 5.1 vs gdb
In article 201210142318.taa11...@sparkle.rodents-montreal.org, Mouse mo...@rodents-montreal.org wrote: I've run into an issue with gdb on 5.1, and ktrace leads me to think it's likely a kernel issue (hence this list). It wouldn't surprise me too much if I were wrong, though; feel free to point me elsewhere if appropriate. The surface manifestation is straightforward: % cat gdbtest.c int main(void); int main(void) { return(0); } Fixed in 6. You'll need all my sys/ commits around 2011-08-20 - 2011-09-05 christos
Re: pass-through linux ioctl for mfi(4)
On Sep 19, 12:38am, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | Hello, | so it seems we can't do much better in compat_linux. | Here's an updated patch, which checks the size before malloc in mfifioctl(), | and I also removed a debug printf in compat_linux. | I intend to commit this next weekend. Fine with me. christos
Re: pass-through linux ioctl for mfi(4)
On Sep 17, 2:49pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | I agree, but I don't know how to do this (is there a better way than | hardcoding mfi's major number in compat_linux), can you give details on how | you would do this ? devsw_name2{blk,chr}("mfi", NULL, 0), but that could be expensive. Perhaps do it once? But what about hotplug? christos
Re: pass-through linux ioctl for mfi(4)
On Sep 17, 5:47pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | But this assumes that the mfi driver is compiled in. it doesn't | look right, especially in the context of modules. It works for modules (which is the reason we cannot cache the result). christos
Re: pass-through linux ioctl for mfi(4)
On Sep 17, 6:08pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | Sorry but I can't see how a kernel with COMPAT_LINUX but without | mfi would compile. You get the major by name using "mfi"... christos
Re: pass-through linux ioctl for mfi(4)
On Sep 17, 8:42pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | On Mon, Sep 17, 2012 at 02:31:35PM -0400, Christos Zoulas wrote: | On Sep 17, 6:08pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: | -- Subject: Re: pass-through linux ioctl for mfi(4) | | | Sorry but I can't see how a kernel with COMPAT_LINUX but without | | mfi would compile. | | You get the major by name using "mfi"... | | I was talking about Robert's solution, which needs the mfiioctl address. | | I looked a bit at this and it's really not straightforward. We need to | get the vp to have the information we need, so we're basically re-writing | some of sys/kern's code ... | Do you have a better way to get the file's type and major ? file's f_type should be DTYPE_VNODE, then f_data points to the vnode... This whole thing is too complicated. Perhaps add dlsym() in the kernel to do dlsym("mfiioctl")? :-) christos
Re: pass-through linux ioctl for mfi(4)
On Sep 17, 8:42pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | On Mon, Sep 17, 2012 at 02:30:03PM -0400, Christos Zoulas wrote: | On Sep 17, 5:47pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: | -- Subject: Re: pass-through linux ioctl for mfi(4) | | | But this assumes that the mfi driver is compiled in. it doesn't | | look right, especially in the context of modules. | | It works for modules (which is the reason we cannot cache the result). | | But mfi's major won't change, it's independent of the driver being | present or not, isn't it ? Right. One should assume so (until we get devfs at least). christos
Re: pass-through linux ioctl for mfi(4)
On Sep 17, 9:22pm, bou...@antioche.eu.org (Manuel Bouyer) wrote: -- Subject: Re: pass-through linux ioctl for mfi(4) | I agree it's too complicated. Couldn't we just keep the dispatch based on | com then ? Let's leave it as it is. christos
Re: pass-through linux ioctl for mfi(4)
In article 20120916132322.ga6...@antioche.eu.org, Manuel Bouyer bou...@antioche.eu.org wrote: Hello, the attached patch adds a pass-through ioctl interface, with the necessary linux compat code, for mfi(4). This allows running the linux binary of the MegaCLI tool provided by LSI logic. Adding support for the FreeBSD binaries should be easy, once the COMPAT_FREEBSD is updated to run recent binaries. (I found that running a 9265-8i without MegaCLI has lots of limitations, e.g. you have to reboot and enter firmware to start reconstruction after a disk replacement). One problem is that the key conflicts with the ossaudio ioctl. What I've done is that I explicitly test for the mfi ioctls in linux_ioctl.c. Does anyone see a better way of handling this ? More generally, does anyone have any comments about this code ? Where is the patch? christos
Re: CVS commit: src
On Sep 12, 4:04pm, mar...@duskware.de (Martin Husemann) wrote: -- Subject: Re: CVS commit: src | On Wed, Sep 12, 2012 at 01:00:52PM +, Christos Zoulas wrote: | This is orthogonal. I believe that in the discussion we had in core | we decided to not define _UC_TLSBASE unconditionally, and that ports | should define it as needed. | | What does "as needed" mean here? Can you show an example of an arch not | needing it? I don't have one. christos
Re: freebsd binary and kern.usrstack
In article 20120912202823.ga5...@antioche.eu.org, Manuel Bouyer bou...@antioche.eu.org wrote: Hello, I'm trying to run a FreeBSD binary under emulation, but it dies in this piece of code: if (sysctl(mib, 2, &_usrstack, &len, NULL, 0) == -1) PANIC("Cannot get kern.usrstack from sysctl"); (this is in FreeBSD's src/lib/libthr/thread/thr_init.c). Is there something that can be done about it easily ? And, BTW, do we support FreeBSD threaded binaries ? sysctl kern.smp.cpus may also be needed ... These are simply implemented as emul.freebsd.kern.smp.cpus etc. See how this is done for linux. Yes, there is no support for amd64 binaries, but it is pretty easy to add. It is more work to get all the new syscalls in place. As I mentioned before, if someone can give me a set of binaries and libraries, I can take a whack at it. christos
Re: quotactl permissions
In article 20120905123416.gb10...@homeworld.netbsd.org, Emmanuel Dreyfus m...@netbsd.org wrote: On Wed, Sep 05, 2012 at 06:37:27AM +, David Holland wrote: Changing it to effective uid seems like a good plan. The change below fixes the test case. Is it safe to commit?

Yes, but it should all be encapsulated in the kauth call. It is an abstraction violation to do the id check separately.

christos

Index: sys/ufs/ufs/ufs_quota.c
===
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.111
diff -U4 -r1.111 ufs_quota.c
--- sys/ufs/ufs/ufs_quota.c 26 Aug 2012 02:32:14 - 1.111
+++ sys/ufs/ufs/ufs_quota.c 5 Sep 2012 12:33:07 -
@@ -334,9 +334,9 @@
 /* XXX shouldn't all this be in kauth ? */
 static int
 quota_get_auth(struct mount *mp, struct lwp *l, uid_t id) {
 	/* The user can always query about his own quota. */
-	if (id == kauth_cred_getuid(l->l_cred))
+	if (id == kauth_cred_geteuid(l->l_cred))
 		return 0;
 	return kauth_authorize_system(l->l_cred, KAUTH_SYSTEM_FS_QUOTA,
 	    KAUTH_REQ_SYSTEM_FS_QUOTA_GET, mp, KAUTH_ARG(id), NULL);
 }

-- Emmanuel Dreyfus m...@netbsd.org
Re: [PATCH] swapcontext vs libpthread
On Aug 25, 7:00am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [PATCH] swapcontext vs libpthread

| FIX: ./alpha/gen/swapcontext.S: CALL(setcontext) /* setcontext(ucp) */
|
| That one seems already fine to me. The CALL macro is here to invoke a function
| Am I wrong?

CALL() is good.

| FIX: ./hppa/gen/swapcontext.S: SYSCALL(setcontext)
|
| If I try to steal from resumecontext, I would do this. Does it make sense?
|
| #ifdef PIC
|         ldw     HPPA_FRAME_EDP(%sp), %r19
|         addil   LT%_C_LABEL(setcontext), %r19
|         ldw     RT%_C_LABEL(setcontext)(%r1), %r1
| #else
|         ldil    L%_C_LABEL(setcontext), %r1
|         ldo     R%_C_LABEL(setcontext)(%r1), %r1
| #endif

Yes, that loads the address to %r1, you'll need to call afterwards.

| FIX: ./mips/gen/_resumecontext.S:SYSTRAP(setcontext) # yes, become it.
| FIX: ./mips/gen/swapcontext.S: SYSTRAP(setcontext)
|
| I would do this:
|         PIC_TAILCALL(setcontext)

I guess.

| FIX?: ./sh3/gen/swapcontext.S:mov.l .L_setcontext, r2
| FIX?: ./sh3/gen/swapcontext.S:2: CALL r2 /* setcontext(ucp) */
|
| There is this later in the file, therefore I would say it is okay.
| .L_setcontext: CALL_DATUM(_C_LABEL(setcontext), 2b)

Ok.

| FIX: ./sparc/gen/swapcontext.S: mov SYS_setcontext|SYSCALL_G2RFLAG, %g1
| FIX: ./sparc64/gen/swapcontext.S: mov SYS_setcontext|SYSCALL_G2RFLAG, %g1
|
| I would do this:
|         call    _C_LABEL(setcontext)

Sure.

| FIX: ./powerpc64/gen/swapcontext.S: bl .setcontext # setcontext(ucp)
|
| Here it seems to be:
|         bl      PIC_PLT(_C_LABEL(setcontext))

Ok, sounds good. Portmasters, please chime in!

christos
Re: [PATCH] swapcontext vs libpthread
On Aug 25, 9:10am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [PATCH] swapcontext vs libpthread

| On Sat, Aug 25, 2012 at 03:10:51AM -0400, Christos Zoulas wrote:
| [Call a C function from hppa assembly]
| Yes, that loads the address to %r1, you'll need to call afterwards.
|
| It seems to be done with bv,n %r0(%r1). I understand bv
| is branch-something, %r1 makes sense, but I am not sure about %r0.

Typically %r0 == 0 on RISC...

| I would have something like this:
|
| #ifdef PIC
|         ldw     HPPA_FRAME_EDP(%sp), %r19
|         addil   LT%_C_LABEL(setcontext), %r19
|         ldw     RT%_C_LABEL(setcontext)(%r1), %r1
| #else
|         ldil    L%_C_LABEL(setcontext), %r1
|         ldo     R%_C_LABEL(setcontext)(%r1), %r1
| #endif
|         bv,n    %r0(%r1)

I am not that familiar with PA-RISC assembly, but I guess the tests will fail if this is incorrect :-)

| I will prepare a patch with all libc proposed changes in about
| 18 hours.

Sounds good.

christos
Re: [PATCH] swapcontext vs libpthread
In article 20120822170050.gj2...@homeworld.netbsd.org, Emmanuel Dreyfus m...@netbsd.org wrote: Here is an updated patch for sorting out swapcontext with libpthread, with documentation and test cases. I would appreciate feedback on LWP_PRESERVETLS flag to _lwp_create(). This tells the kernel that the TLS base register will be used by libpthread and that setcontext() should leave it untouched. This is done in kernel because it seems to be the easiest way: another approach would be to have libpthread overriding setcontext(), but that seems a bad choice: after unsetting _UC_TLSBASE it needs to call the real setcontext, which means doing a system call from libpthread. That looks wrong. Why do you say that? pthread_cancelstub.c does exactly this (wrapping a syscall and calling it) all the time. I don't think we should be getting the kernel involved with this. christos
Re: [RFC][PATCH] _UC_TLSBASE for all ports
In article 20120810173818.ga8...@britannica.bec.de, Joerg Sonnenberger jo...@britannica.bec.de wrote: On Fri, Aug 10, 2012 at 07:31:59PM +0200, Emmanuel Dreyfus wrote: Joerg Sonnenberger jo...@britannica.bec.de wrote: I maintain that trying to move contexts between threads is an inherently bad idea and that it is a very inefficient interface for implementing coroutines. I object to this change for the sake of misdesigned software. Did you look at that test case? This is a nasty bug, and I think we need a way for user to opt out of that behavior. I don't agree with it being a nasty bug. Heck, document it as limitation if you want. But essentially don't mix *context and pthread in this way, it will create other interesting issues later. Like it or not most of the world has turned into linux. We can either provide compatibility where possible (and not overly disgusting) to gain compatibility with 3rd party code developed for linux, or simply say tough, it will not work on NetBSD because we refuse to compromise. It is a slippery slope, but I think in this case it is wise to bend. If we cannot reach agreement here, consult core. christos
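For readers who have not used the *context interfaces under discussion, here is a minimal sketch of the uncontroversial case: a coroutine that yields once and is resumed, all within a single thread. It uses only getcontext, makecontext and swapcontext as specified for <ucontext.h>; the function names and the stack size are arbitrary choices for illustration. It does not move a context between threads, which is the case the thread is arguing about.

```c
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int steps;

/* Coroutine body: record progress, yield once, then finish. */
static void
coroutine(void)
{
	steps = 1;
	swapcontext(&co_ctx, &main_ctx);	/* yield to the caller */
	steps = 2;
	/* Returning resumes uc_link, which is main_ctx. */
}

/* Returns 12 when the coroutine ran up to the yield and then to the end. */
int
run_coroutine_demo(void)
{
	size_t stack_size = 64 * 1024;
	void *stack = malloc(stack_size);
	int after_first;

	if (stack == NULL)
		return -1;
	if (getcontext(&co_ctx) == -1) {
		free(stack);
		return -1;
	}
	co_ctx.uc_stack.ss_sp = stack;
	co_ctx.uc_stack.ss_size = stack_size;
	co_ctx.uc_link = &main_ctx;
	makecontext(&co_ctx, coroutine, 0);

	steps = 0;
	swapcontext(&main_ctx, &co_ctx);	/* run until the yield */
	after_first = steps;
	swapcontext(&main_ctx, &co_ctx);	/* resume until it returns */

	free(stack);
	return after_first * 10 + steps;
}
```

Each swapcontext saves and restores the full register state, which is why the thread worries both about its cost and about what happens to the TLS base register when the saved context came from a different thread.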
Re: [RFC][PATCH] _UC_TLSBASE for all ports
On Aug 11, 11:16am, m...@netbsd.org (Emmanuel Dreyfus) wrote: -- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports | Christos Zoulas chris...@astron.com wrote: | | Like it or not most of the world has turned into linux. We can either | provide compatibility where possible (and not overly disgusting) to | gain compatibility with 3rd party code developed for linux, or simply | say tough, it will not work on NetBSD because we refuse to compromise. | | IMO here it is even a worse stance, since we already have the desired | fix for amd64, i386, m68k, mips, vax, and hppa. It would be more like: | it will work on NetBSD, except for sh3, sparc, and sparc64 ports | because we refuse to compromise. And on powerpc and alpha it will work | but with different interfaces. | | We are supposed to focus on portability, it is really weird to argue | that a feature should have a MI interface. I don't see why this change is met with so much resistance... I could believe that, if the change suggested to make this the default behavior (which some would argue it should be...) christos
Re: [RFC][PATCH] _UC_TLSBASE for all ports
On Aug 11, 12:40pm, m...@netbsd.org (Emmanuel Dreyfus) wrote: -- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports | Christos Zoulas chris...@zoulas.com wrote: | | I could believe that, if the change suggested to make this the default | behavior (which some would argue it should be...) | | In an ideal world, we would set _UC_TLSBASE by default (as it is today, | except on powerpc), and we would automatically ignore _UC_TLSBASE when | l->l_proc->p_nlwps > 1, since we cannot think of a sane usage for that. Well, why don't we make it that way then? christos
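The policy proposed in the quoted text (honor _UC_TLSBASE by default, but ignore it once the process has more than one LWP) amounts to masking one flag. A minimal sketch follows; EX_UC_TLSBASE and its value are hypothetical stand-ins, since the real _UC_TLSBASE is machine-dependent, and effective_uc_flags is not an existing kernel function.

```c
/*
 * Hypothetical flag value for illustration only; the real _UC_TLSBASE
 * is defined per port.
 */
#define EX_UC_TLSBASE 0x00080000u

/*
 * Return the context flags that setcontext() would act on:
 * the TLS base request is dropped when the process is multi-threaded.
 */
static unsigned
effective_uc_flags(unsigned flags, unsigned nlwps)
{
	if (nlwps > 1)
		flags &= ~EX_UC_TLSBASE;
	return flags;
}
```

Under this sketch a single-threaded process keeps today's behavior, and a threaded process gets the behavior that the LWP_PRESERVETLS proposal would have required an explicit opt-in for.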
Re: [RFC][PATCH] _UC_TLSBASE for all ports
On Aug 11, 5:13pm, m...@netbsd.org (Emmanuel Dreyfus) wrote: -- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports | Well, why don't we make it that way then? | | We cannot toggle an option that does not exist, so that requires adding | _UC_TLSBASE for ports that miss it. This meets a strong opposition for | now. Again, if there is no consensus in the lists, we'll have to let core decide. christos
Re: [RFC][PATCH] _UC_TLSBASE for all ports
On Aug 11, 1:35pm, t...@panix.com (Thor Lancelot Simon) wrote: -- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports | On Sat, Aug 11, 2012 at 06:45:12AM +, Christos Zoulas wrote: | | It is a slippery slope, but I think in this case it is wise to bend. | If we cannot reach agreement here, consult core. | | I see no point bending NetBSD into knots in this case if the resulting | performance is as bad as Joerg claims it will be. Is it actually the | case that our *context() functions are almost as heavy as a full | kernel-level thread switch? The point is that glusterfs works without code modifications (or minimal ones) and with acceptable performance. christos
Re: malo@pci vs malo@pcmcia
In article 20120803084934.GA3362@bugfree, Arnaud Degroote arnaud.degro...@laas.fr wrote: On 01/Aug - 21:43, KIYOHARA Takashi wrote: Hi! all, I have a 'I-O DATA WN-G54/CF'. And some on-board 88W8686 has on pcmcia-bus of Gumstix. I think, malo@pcmcia and malo@pci is all different. These drivers can't merge maybe. From a quick review, the driver seems quite different, and it is probably why they are not merged in OpenBSD. http://www.openbsd.org/cgi-bin/man.cgi?query=malo&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html Shall I commit to tree the source for malo@pcmcia? If 'yes' then I will try to verify next week end. :-) I think it would be nice to have the support for malo@pcmcia too in NetBSD. There is not much pcmcia left around, but I guess it would be nice to have the driver for those who have the card. I would commit it. christos
Re: pinning down dk? assignment
In article julcud$9sd$1...@serpens.de, Michael van Elst mlel...@serpens.de wrote: Let wd1 disappear and the raid will try to use wd0a (dk0) and sd0a (dk1). Of course raidframe will notice the mismatch in this case, but you can easily imagine more complex scenarios where it doesn't. But a simple failure case comes from trying to recover the failed wd1 without rebooting. When you replace the drive it may attach as wd1 but then as dk2. Now try to teach raidframe that its component changed the path. That works too because the components are not signed. Actually this is the exact failure I got because wd0 was not found because of the latest ata changes! christos
Re: pinning down dk? assignment
In article 20120723141721.gj4...@trav.math.uni-bonn.de, Edgar Fuß e...@math.uni-bonn.de wrote: Can I somehow pin down which dk? gets assigned to which GPT partition? In a disklabel world, I have components sd2a..sd6a making raid1. I then have raid1a mounted on /export/home and raid1e on /export/mail. In a GPT/wedge world, I have dk0..dk4 (on sd2..sd6) making raid1. I then have dk5 and dk6 on raid1 mounted on /export/home resp. /export/mail. Now suppose the machine comes up with sd6 failing to attach. At that time, I have dk0..dk3, which (plus an absent dk4) will build raid1. I then get dk4 and dk5 on raid1 and the mail fs mounted on /export/home. Use NAME=guid or NAME=underlying-device-name instead of /dev/dkX for the fs_spec field. christos
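The renumbering described in this message can be reproduced with a toy model in which each wedge that attaches simply receives the next free dk number. This is a sketch of the scenario only, with hypothetical names; it is not the kernel's dk(4) allocation code.

```c
#include <stddef.h>

/* Toy model of a wedge: label, whether it attached, and its dk unit. */
struct wedge {
	const char *label;
	int         present;
	int         dk;		/* assigned unit, or -1 if absent */
};

/* Hand out dk units in attach order, skipping wedges that did not attach. */
static void
assign_dk(struct wedge *w, size_t n)
{
	int next = 0;
	size_t i;

	for (i = 0; i < n; i++)
		w[i].dk = w[i].present ? next++ : -1;
}
```

With sd2..sd6 present, the wedges holding the home and mail filesystems end up as dk5 and dk6. If sd6 fails to attach they become dk4 and dk5, so an fstab entry naming /dev/dk5 now points at the mail filesystem. A NAME= entry refers to the wedge by name and is unaffected by the shift.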
Re: pinning down dk? assignment
In article juk971$qpi$1...@serpens.de, Michael van Elst mlel...@serpens.de wrote: e...@math.uni-bonn.de (=?iso-8859-1?Q?Edgar_Fu=DF?=) writes: It probably won't help you with raidframe. It would indeed help in my case. In case sd6 has gone missing, so dk4 is on the RAID and not on sd6, it would prevent the wrong filesystem being mounted for dk5. I was referring to the situation with building a raid on wedge components. Wedges on top of raid are no problem.

Actually I do exactly that (raid on top of wedges):

dk0 at wd0: wd0a
dk0: 488397105 blocks at 63, type: raidframe
dk1 at wd1: wd1a
dk1: 488397105 blocks at 63, type: raidframe
dk2 at sd0: sd0a
dk2: 117210177 blocks at 63, type: ffs
Component on: dk0: 488397105
Component on: dk1: 488397105
Found: dk1 at 0
Found: dk1 at 0
Found(low mod_counter): dk0 at 1
raid0: Components: /dev/dk1 /dev/dk0[**FAILED**]

Ignore the failed dk here, this is because of the bad ata code in current. In fstab I have:

NAME=raid0a / ffs rw,log 7 1
NAME=raid0b none swap sw 0 0
NAME=raid0e /var ffs rw,log 7 3
NAME=raid0f /usr ffs rw,log 7 3
NAME=raid0g /usr/local ffs rw,log 7 3

christos