Re: resource leak in linux emulation?

2014-05-04 Thread Christos Zoulas
In article 201405040936.21907.m...@ecs.vuw.ac.nz,
Mark Davies  m...@ecs.vuw.ac.nz wrote:
On Thu, 24 Apr 2014 07:18:10 David Laight wrote:
  To fix, this should be added somewhere, probably at
  sys/kern/kern_exit.c:487 (but I'm not sure if there's a better
  location):
  if ((l->l_pflag & LP_PIDLID) != 0 && l->l_lid != p->p_pid) {
          proc_free_pid(l->l_lid);
  }
 
 That doesn't look like the right place.
 I think it should be further down (and with proc_lock held).

So can someone suggest where exactly the patch should go.  And isn't proc_lock 
held at this point (entered at line 344, exit at line 569)?

How about this?

christos

Index: kern_exit.c
===
RCS file: /cvsroot/src/sys/kern/kern_exit.c,v
retrieving revision 1.243
diff -u -u -r1.243 kern_exit.c
--- kern_exit.c 9 Jun 2013 01:13:47 -   1.243
+++ kern_exit.c 4 May 2014 21:26:00 -
@@ -541,12 +541,10 @@
 */
pcu_discard_all(l);
 
-   /*
-* Remaining lwp resources will be freed in lwp_exit2() once we've
-* switch to idle context; at that point, we will be marked as a
-* full blown zombie.
-*/
mutex_enter(p->p_lock);
+   /* Free the linux lwp id */
+   if ((l->l_pflag & LP_PIDLID) != 0 && l->l_lid != p->p_pid)
+       proc_free_pid(l->l_lid);
lwp_drainrefs(l);
lwp_lock(l);
l->l_prflag &= ~LPR_DETACHED;
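
The condition in the patch above can be read in isolation. The following is a minimal userland sketch, not kernel code: `struct model_lwp`, `struct model_proc`, `lwp_owns_pid_slot()` and the numeric value given to `LP_PIDLID` are all assumptions made for illustration. It only shows when the exiting LWP holds a pid-table slot of its own, that is, when the flag is set and the LWP id differs from the process id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value only; the real LP_PIDLID lives in the kernel headers. */
#define LP_PIDLID 0x01

/* Hypothetical stand-ins for the kernel's struct proc and struct lwp. */
struct model_proc {
	int p_pid;
};

struct model_lwp {
	int l_pflag;
	int l_lid;
	struct model_proc *l_proc;
};

/*
 * True when the LWP id was allocated from the pid table (LP_PIDLID) and is
 * not the process id itself, so it needs a separate release on exit.
 */
static bool
lwp_owns_pid_slot(const struct model_lwp *l)
{
	assert(l != NULL && l->l_proc != NULL);
	return (l->l_pflag & LP_PIDLID) != 0 && l->l_lid != l->l_proc->p_pid;
}
```

In this model the first thread of a process (lid equal to pid) and any LWP without the flag return false, so only the extra Linux-emulation threads would reach `proc_free_pid()`.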



Re: Inconsistency with COMPAT_10

2014-04-18 Thread Christos Zoulas
In article 5350e2b5.6000...@m00nbsd.net,
Maxime Villard  m...@m00nbsd.net wrote:
Hi all,
I think there's an inconsistency with COMPAT_10 in the open() syscall:

- kern/vfs_syscalls.c - l.1631 --

#ifdef COMPAT_10   /* XXX: and perhaps later */
   if (path == NULL) {
           pb = pathbuf_create(".");
           if (pb == NULL)
                   return ENOMEM;
   } else
#endif
   {
           error = pathbuf_copyin(path, &pb);
           if (error)
                   return error;
   }

-

 compat/netbsd32/netbsd32_netbsd.c - l.240 --

   if (SCARG(ua, path) != NULL) {
           error = pathbuf_copyin(SCARG(ua, path), &pb);
           if (error)
                   return error;
   } else {
           pb = pathbuf_create(".");
           if (pb == NULL)
                   return ENOMEM;
   }

-

COMPAT_10 should be added in netbsd32, or removed from the native
syscall. But I'm not sure which fix should be applied.

I guess there's someone around here who knows how to fix that.

I guess add COMPAT_10 in netbsd32_netbsd.c

christos
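
To make the inconsistency concrete, here is a small userland sketch of the two behaviours behind one switch. `resolve_open_path()` and its `compat10` parameter are hypothetical names, and returning `EFAULT` for a NULL path without the compat option is an assumption about what a failed copyin would report; the point is only that both entry points should take the same branch for the same option.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: choose the path an open() call should use.
 * With compat10 set, a NULL path means "." as in the quoted code.
 * Without it, NULL is rejected (assumed here to be EFAULT).
 */
static int
resolve_open_path(const char *path, int compat10, char *buf, size_t buflen)
{
	const char *src = path;

	assert(buf != NULL && buflen > 0);
	if (src == NULL) {
		if (!compat10)
			return EFAULT;
		src = ".";
	}
	if (strlen(src) >= buflen)
		return ENAMETOOLONG;
	strcpy(buf, src);	/* length checked above */
	return 0;
}
```

If both the native and the netbsd32 syscall went through one helper like this, the `#ifdef` would only need to exist in one place.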



Re: Rewrite kernfs and procfs.

2014-04-08 Thread Christos Zoulas
On Apr 8,  9:15pm, net...@izyk.ru (Ilia Zykov) wrote:
-- Subject: Rewrite kernfs and procfs.

| Hello!
| I desire become a NetBSD developer and develop this project.

Excellent...

| Sorry to disturb, maybe I need anything else.

What else do you need?

| Also little patch, that removes unusable hack(any more, see below) from kernfs and
| returns its work.
| 

Right, thanks for fixing that! I meant to look at what broke it, but I kept
forgetting about it. Nevertheless, applied.

Your application looks fine to me, and I guess membership-e...@netbsd.org
is CC:ed. You can add the kernfs patch to it now :-)

christos


Re: Enhance ptyfs to handle multiple instances.

2014-04-04 Thread Christos Zoulas
On Apr 4, 12:29pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

|  
|  - I don't like the refactoring because it makes ptyfs less optional (brings
|    in code and headers to the base kernel). I think it is simpler to provide
|    an entry function to get the mount point instead, and this way all the guts
|    of ptyfs stay in ptyfs.
| 
| Looks better, thank you.
| 
|  - Is it important to append to the list? Then perhaps use a different set
|of macros than LIST_. I've changed the code just to prepend.
|  
|  I hope I did not break it. Comments?
| 
| Order IMPORTANT here, because, pty_getmp returns the first found,
| traditionally /dev/pts - must be persistently first(for security too). 
| All others MPs are useful only inside chroot.

Should we put a pointer in the pty node that points to the primary mount point
then so we get the correct one? Or that does not work? I will change the list
so it always appends.

Thanks,

christos


Re: Enhance ptyfs to handle multiple instances.

2014-04-04 Thread Christos Zoulas
On Apr 4,  6:40pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

|  
|  Should we put a pointer in the pty node that points to the primary mount point
|  then so we get the correct one? 
| 
| Why? In general case we forever must return first which mount first, next mount point,
| shouldn't replace previous, else incorrect TIOCPTMGET(path) for already opened pty we will have.
| Simple appends it to the tail will be good(IMHO).

I have to think about it. If you opened a pty in a chroot, the pty node
will appear both in the chroot and in the regular mount. If you opened
a pty outside the chroot, the pty will appear only in the regular mount
and not in the chroot, right? If you have 2 chroots, each one will only
show its own ptys, but the regular not rooted mount will show all of them?

| Or that does not work? I will change the list
|  so it always appends.

I'll convert to an STAILQ

christos
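
The ordering argument in this exchange (the first mount must stay first, later mounts go to the tail) can be sketched in userland with the `STAILQ_` macros from `<sys/queue.h>`. Everything below is an illustrative model, not the ptyfs code: `struct model_mount`, `mount_register()`, `mount_lookup()` and the string-prefix test standing in for `vn_isunder()` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical record for one mounted instance. */
struct model_mount {
	const char *path;			/* where the instance is mounted */
	STAILQ_ENTRY(model_mount) link;
};
STAILQ_HEAD(mount_list, model_mount);

/* Crude stand-in for vn_isunder(): is path at or below root? */
static int
path_is_under(const char *path, const char *root)
{
	size_t n = strlen(root);

	if (strncmp(path, root, n) != 0)
		return 0;
	if (n > 0 && root[n - 1] == '/')
		return 1;
	return path[n] == '/' || path[n] == '\0';
}

/* Append, so the earliest mount keeps its place at the head. */
static void
mount_register(struct mount_list *head, struct model_mount *m)
{
	assert(head != NULL && m != NULL);
	STAILQ_INSERT_TAIL(head, m, link);
}

/* First match wins; root == NULL models a process that is not chrooted. */
static struct model_mount *
mount_lookup(struct mount_list *head, const char *root)
{
	struct model_mount *m;

	STAILQ_FOREACH(m, head, link) {
		if (root == NULL || path_is_under(m->path, root))
			return m;
	}
	return NULL;
}
```

With tail insertion a process outside any chroot always gets the first registered mount, which is the property asked for above; with head insertion the most recent chroot mount would shadow it.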


Re: Enhance ptyfs to handle multiple instances.

2014-04-04 Thread Christos Zoulas
On Apr 4,  7:28pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| On 04.04.2014 18:55, Christos Zoulas wrote:
|  On Apr 4,  6:40pm, net...@izyk.ru (Ilya Zykov) wrote:
|  -- Subject: Re: Enhance ptyfs to handle multiple instances.
|  
|  |  
|  |  Should we put a pointer in the pty node that points to the primary mount point
|  |  then so we get the correct one? 
|  | 
|  | Why? In general case we forever must return first which mount first, next mount point,
|  | shouldn't replace previous, else incorrect TIOCPTMGET(path) for already opened pty we will have.
|  | Simple appends it to the tail will be good(IMHO).
|  
|  I have to think about it. If you opened a pty in a chroot, the pty node
|  will appear both in the chroot and in the regular mount. If you opened
|  a pty outside the chroot, the pty will appear only in the regular mount
|  and not in the chroot, right? If you have 2 chroots, each one will only
|  show its own ptys, but the regular not rooted mount will show all of them?
|  
| 
| Maybe I do not understand you correctly.
| No. Every is in proper MP.

I'll test it.

christos


Re: Enhance ptyfs to handle multiple instances.

2014-04-03 Thread Christos Zoulas
On Apr 2, 10:36am, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

Looks very good. Some changes:

- I don't like the refactoring because it makes ptyfs less optional (brings
  in code and headers to the base kernel). I think it is simpler to provide
  an entry function to get the mount point instead, and this way all the guts
  of ptyfs stay in ptyfs.
- Is it important to append to the list? Then perhaps use a different set
  of macros than LIST_. I've changed the code just to prepend.

I hope I did not break it. Comments?

christos

Index: sys/pty.h
===
RCS file: /cvsroot/src/sys/sys/pty.h,v
retrieving revision 1.9
diff -u -p -u -r1.9 pty.h
--- sys/pty.h   27 Mar 2014 17:31:56 -  1.9
+++ sys/pty.h   3 Apr 2014 21:51:03 -
@@ -41,7 +41,7 @@ int pty_grant_slave(struct lwp *, dev_t,
 dev_t pty_makedev(char, int);
 int pty_vn_open(struct vnode *, struct lwp *);
 struct ptm_pty *pty_sethandler(struct ptm_pty *);
-int ptyfs_getmp(struct lwp *, struct mount **);
+int pty_getmp(struct lwp *, struct mount **);
 
 /*
  * Ptm_pty is used for switch ptm{x} driver between BSDPTY, PTYFS.
@@ -53,7 +53,7 @@ struct ptm_pty {
char);
int (*makename)(struct mount *, struct lwp *, char *, size_t, dev_t, 
char);
void (*getvattr)(struct mount *, struct lwp *, struct vattr *);
-   void *arg;
+   int (*getmp)(struct lwp *, struct mount **);
 };
 
 #ifdef COMPAT_BSDPTY
Index: fs/ptyfs/ptyfs.h
===
RCS file: /cvsroot/src/sys/fs/ptyfs/ptyfs.h,v
retrieving revision 1.11
diff -u -p -u -r1.11 ptyfs.h
--- fs/ptyfs/ptyfs.h21 Mar 2014 17:21:53 -  1.11
+++ fs/ptyfs/ptyfs.h3 Apr 2014 21:51:04 -
@@ -106,6 +106,8 @@ struct ptyfsnode {
 };
 
 struct ptyfsmount {
+   LIST_ENTRY(ptyfsmount) pmnt_le;
+   struct mount *pmnt_mp;
gid_t pmnt_gid;
mode_t pmnt_mode;
int pmnt_flags;
Index: fs/ptyfs/ptyfs_vfsops.c
===
RCS file: /cvsroot/src/sys/fs/ptyfs/ptyfs_vfsops.c,v
retrieving revision 1.48
diff -u -p -u -r1.48 ptyfs_vfsops.c
--- fs/ptyfs/ptyfs_vfsops.c 27 Mar 2014 17:31:56 -  1.48
+++ fs/ptyfs/ptyfs_vfsops.c 3 Apr 2014 21:51:04 -
@@ -77,6 +77,7 @@ static int ptyfs__allocvp(struct mount *
 static int ptyfs__makename(struct mount *, struct lwp *, char *, size_t,
 dev_t, char);
 static void ptyfs__getvattr(struct mount *, struct lwp *, struct vattr *);
+static int ptyfs__getmp(struct lwp *, struct mount **);
 
 /*
  * ptm glue: When we mount, we make ptm point to us.
@@ -84,13 +85,37 @@ static void ptyfs__getvattr(struct mount
 struct ptm_pty *ptyfs_save_ptm;
 static int ptyfs_count;
 
+static LIST_HEAD(, ptyfsmount) ptyfs_head;
+
 struct ptm_pty ptm_ptyfspty = {
ptyfs__allocvp,
ptyfs__makename,
ptyfs__getvattr,
-   NULL
+   ptyfs__getmp,
 };
 
+static int
+ptyfs__getmp(struct lwp *l, struct mount **mpp)
+{
+   struct cwdinfo *cwdi = l->l_proc->p_cwdi;
+   struct mount *mp;
+   struct ptyfsmount *pmnt;
+ 
+   LIST_FOREACH(pmnt, &ptyfs_head, pmnt_le) {
+       mp = pmnt->pmnt_mp;
+       if (cwdi->cwdi_rdir == NULL)
+           goto ok;
+
+       if (vn_isunder(mp->mnt_vnodecovered, cwdi->cwdi_rdir, l))
+           goto ok;
+   }
+   *mpp = NULL;
+   return EOPNOTSUPP;
+ok:
+   *mpp = mp;
+   return 0;
+}
+
 static const char *
 ptyfs__getpath(struct lwp *l, const struct mount *mp)
 {
@@ -137,6 +162,18 @@ ptyfs__makename(struct mount *mp, struct
len = snprintf(tbuf, bufsiz, "/dev/null");
break;
case 't':
+   /*
+* We support traditional ptys, so we can get here,
+* if pty had been opened before PTYFS was mounted,
+* or was opened through /dev/ptyXX devices.
+* Return it only outside chroot for more security :).
+*/
+   if (l->l_proc->p_cwdi->cwdi_rdir == NULL
+       && ptyfs_save_ptm != NULL
+       && ptyfs_used_get(PTYFSptc, minor(dev), mp, 0) == NULL)
+       return (*ptyfs_save_ptm->makename)(mp, l,
+           tbuf, bufsiz, dev, ms);
+
np = ptyfs__getpath(l, mp);
if (np == NULL)
return EOPNOTSUPP;
@@ -189,6 +226,7 @@ void
 ptyfs_init(void)
 {
 
+   LIST_INIT(&ptyfs_head);
malloc_type_attach(M_PTYFSMNT);
malloc_type_attach(M_PTYFSTMP);
ptyfs_hashinit();
@@ -274,9 +312,9 @@ ptyfs_mount(struct mount *mp, const char
return error;
}
 
-   /* Point pty access to us */
-   if (ptyfs_count == 0) {
-   ptm_ptyfspty.arg = mp;
+   LIST_INSERT_HEAD(&ptyfs_head, pmnt, pmnt_le);
+   

Re: Enhance ptyfs to handle multiple instances.

2014-03-27 Thread Christos Zoulas
On Mar 27,  5:53pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| On 27.03.2014 12:51, Ilya Zykov wrote:
|  Hello!
|  Maybe you skipped:
|  Minor corrections readdir and lookup for multi-mountpoint use.
|  
|   ptyfs_vnops.c |6 --
|   1 file changed, 4 insertions(+), 2 deletions(-)
|  
|  Resending.
|  
|  Also main patch for subject.
|  I didn't want locate many code in ptm driver, but in real world,
|  it was the most suitable place(performance, flexible ..).
|  Most of changes were tied with code refactoring. I had added only one ptyfs_getmp().
|  In the near future we will need only decide how will keep,get list of mount points?
|  

Yes, we need to think about it. In the general case there won't be many so
it should not be that hard.

christos


Re: [M-Labs devel] NetBSD kernel booting on lm32

2014-03-27 Thread Christos Zoulas
On Mar 27,  8:55pm, yann.sionn...@gmail.com (Yann Sionneau) wrote:
-- Subject: Re: [M-Labs devel] NetBSD kernel booting on lm32

| Le 21/03/14 21:42, Christos Zoulas a écrit :
|  On Mar 21,  7:05pm, yann.sionn...@gmail.com (Yann Sionneau) wrote:
|  -- Subject: Re: [M-Labs devel] NetBSD kernel booting on lm32
| 
|  |  Do you see the printf:
|  | 
|  |  printf("sysctl_createv: sysctl_locate(%s) returned %d\n",
|  |  nnode.sysctl_name, error);
|  | 
|  | Yes that's exactly the printf I am seeing, I am using diagnostic and
|  | debuglock options if that matters. I have seen such messages a bit while
|  | googling, for instance in the boot of NetBSD/avr32 which have been posted
|  | to the list.
| 
|  This is really strange. I would add some more printfs to see what's causing
|  it. This printf would appear in all kernels... Perhaps something is
|  uninitialized and ends up 0 usually?
| 
|  christos
| FYI I found the root of the issue, setfault was a stub, therefore 
| kcopy() was behaving weirdly.
| Fixed in https://github.com/fallen/NetBSD/commit/ebb6e13376f0ed008a6dd4ff81ca16e8756ce40e

I'll put a comment to that effect! Thanks for the info.

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-27 Thread Christos Zoulas
On Mar 28, 12:37am, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| Please, don't forget this, otherwise readdir returns released(free), but still hashed inode numbers.
| 

Got it, thanks!

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-26 Thread Christos Zoulas
On Mar 26,  2:09pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| Index: fs/ptyfs/ptyfs_subr.c
| ===
| RCS file: /cvsil/nbcur/src/sys/fs/ptyfs/ptyfs_subr.c,v
| retrieving revision 1.3
| diff -u -r1.3 ptyfs_subr.c
| --- fs/ptyfs/ptyfs_subr.c 24 Mar 2014 20:48:08 -  1.3
| +++ fs/ptyfs/ptyfs_subr.c 26 Mar 2014 09:44:44 -
| @@ -116,7 +116,7 @@
|  static void
|  ptyfs_getinfo(struct ptyfsnode *ptyfs, struct lwp *l)
|  {
| - extern struct ptm_pty *ptyfs_save_ptm, ptm_ptyfspty;
| + extern struct ptm_pty *ptyfs_save_ptm;
|  
|   if (ptyfs->ptyfs_type == PTYFSroot) {
|   ptyfs->ptyfs_mode = S_IRUSR|S_IXUSR|S_IRGRP|S_IXGRP|
| @@ -126,7 +126,7 @@
|   ptyfs->ptyfs_mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|
|   S_IROTH|S_IWOTH;
|  
| - if (ptyfs_save_ptm != NULL && ptyfs_save_ptm != &ptm_ptyfspty) {
| + if (ptyfs_save_ptm != NULL) {
|   int error;
|   struct pathbuf *pb;
|   struct nameidata nd;

Ok,

| Index: kern/tty_bsdpty.c
| ===
| RCS file: /cvsil/nbcur/src/sys/kern/tty_bsdpty.c,v
| retrieving revision 1.1.1.1
| diff -u -r1.1.1.1 tty_bsdpty.c
| --- kern/tty_bsdpty.c 4 Mar 2014 18:16:04 -   1.1.1.1
| +++ kern/tty_bsdpty.c 26 Mar 2014 09:44:44 -
| @@ -121,7 +121,7 @@
|   struct nameidata nd;
|   char name[TTY_NAMESIZE];
|  
| - error = (*ptm->makename)(ptm, l, name, sizeof(name), dev, ms);
| + error = pty_makename(ptm, l, name, sizeof(name), dev, ms);
|   if (error)
|   return error;
|  

Are you sure about this one? It is used when ptyfs is mounted and you have
old pty nodes around (so you get consistent names).

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-26 Thread Christos Zoulas
On Mar 26,  6:36pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| PTYFS has dependency from ptm driver.
| If config has NO_DEV_PTM, PTYFS isn't compiled.
| PTYFS is useless without ptm.
| 
| How, better, this condition is fixed in config files?

Add ' !no_dev_ptm' next to the ptyfs files in sys/fs/ptyfs/files.ptyfs?
I am not sure... Is there a way to bail out configuration of a filesystem
if a feature is missing, for example no coda if no venus?

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-26 Thread Christos Zoulas
In article 5332dc47.1060...@izyk.ru, Ilya Zykov  net...@izyk.ru wrote:
 |  
 | -  error = (*ptm->makename)(ptm, l, name, sizeof(name), dev, ms);
 | +  error = pty_makename(ptm, l, name, sizeof(name), dev, ms);
 |if (error)
 |return error;
 |  
 
 Are you sure about this one? It is used when ptyfs is mounted and you have
 old pty nodes around (so you get consistent names).
 
 christos
 

I think pty_allocvp can be invoked only when ptm == ptm_bsdpty,
therefore:
ptm->allocvp == pty_allocvp,
ptm->makename == pty_makename,
ptm->getvattr == pty_getvattr.

In what condition and who can invokes pty_allocvp, when ptm == ptm_ptyfspty?

Yes, that is true, ignore me.

christos



Re: Enhance ptyfs to handle multiple instances.

2014-03-24 Thread Christos Zoulas
On Mar 24,  5:46pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| Hello!
| 
| Please, tell me know if I wrong.
| In general case I can't find(easy), from driver, where its device file located on file system,
| its vnode or its directory vnode where this file located.
| Such files can be many and I can't find what file used for current operation.
| Maybe anybody had being attempted get this info from the driver?

You can't find from the driver where the device node file is located
in the filesystem, as well as you cannot reliably find from the
vnode of the device node the filesystem path. There could be many
device nodes that satisfy the criteria (you can make your own tty
node with mknod)

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-22 Thread Christos Zoulas
On Mar 22,  3:50pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

|  
|  I don't understand why you want to get rid of the mountpoint arg inside
|  the pty structure. It certainly makes things faster, and the pty can't
|  be shared...
|  
|  christos
|   
| 
| Sorry, but I don't understand too, what structure do you mean exactly and how.

The mountpoint inside ptm_pty. Perhaps by having separate instances in the ptm
driver?

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-22 Thread Christos Zoulas
In article 532def5e.2040...@izyk.ru, Ilya Zykov  net...@izyk.ru wrote:
 The mountpoint inside ptm_pty. Perhaps by having separate instances in the ptm
 driver?
 
 christos
 
 

I think, it's not better.
I can do so, but:
 1. Now we have only 2 instances ptm_pty, one for ptyfs one for bsdpty 
and use its mainly for switch from one to other(we will have ptm_pty array).
 2. Now we keep local ptyfs' data pointer(mp) inside external ptm_pty
it's mistaken way(IMHO).
We have useless ping pong local data. Maybe it is conceived for other goal.
Easier  keep it in local static pointer and don't pass it in parameters
every function call.
 3. I don't want dispose ptyfs code inside ptm driver.
 4. We will have export ptyfs__getpath().
 or
 5. Use ptm minor numbers for differentiating factor(or other
differentiating factor), maybe, 
useless work, because user space programs use only one instance. 
Really multiple instances are needed only for chroot.
Also it complicates user space programs, but give more productivity(we
don't need look up mount point).

Ok this is fine for now then.

christos



Re: [M-Labs devel] NetBSD kernel booting on lm32

2014-03-21 Thread Christos Zoulas
In article CACi+aWYQ9x8c2W6M7qwzOdr=l9cvzkz_p4jsporttsoagjx...@mail.gmail.com,
Yann Sionneau  yann.sionn...@gmail.com wrote:
2014-03-20 10:38 GMT+01:00 Yann Sionneau
yann.sionn...@gmail.com:
 Hi,

 I am very happy to announce that the NetBSD/lm32 project is making good
 progress :)

[...]

 Regards,


Hi NetBSD guys,

Is this normal I get that much error messages about sysctl_createv ?
Am I doing something wrong here?

seems return 2 means ENOENT when doing the sysctl_locate() from
within sysctl_createv().

Moreover, when adding options INET in my kernel config, then
arp_init() is doing a lot of sysctl_createv() one of them is filling a
node pointer with NULL, then this node is dereferenced later in
other calls to sysctl_createv(), I don't see exactly why this happens
( cf http://nxr.netbsd.org/xref/src/sys/netinet/if_arp.c#1631 ).

Do you see the printf:

printf("sysctl_createv: sysctl_locate(%s) returned %d\n",
nnode.sysctl_name, error);

I don't see this in any of the kernels I boot..

christos



Re: Enhance ptyfs to handle multiple instances.

2014-03-21 Thread Christos Zoulas
In article 532c0718.3020...@izyk.ru, Ilya Zykov  net...@izyk.ru wrote:

Hello!

Correct ptyfs_readdir for multi mount points use.

committed.

christos



Re: [M-Labs devel] NetBSD kernel booting on lm32

2014-03-21 Thread Christos Zoulas
On Mar 21,  7:05pm, yann.sionn...@gmail.com (Yann Sionneau) wrote:
-- Subject: Re: [M-Labs devel] NetBSD kernel booting on lm32

|  Do you see the printf:
| 
|  printf("sysctl_createv: sysctl_locate(%s) returned %d\n",
|  nnode.sysctl_name, error);
| 
| Yes that's exactly the printf I am seeing, I am using diagnostic and
| debuglock options if that matters. I have seen such messages a bit while
| googling, for instance in the boot of NetBSD/avr32 which have been posted
| to the list.

This is really strange. I would add some more printfs to see what's causing
it. This printf would appear in all kernels... Perhaps something is
uninitialized and ends up 0 usually?

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-21 Thread Christos Zoulas
On Mar 21, 10:23pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| If seriously, it's first working prototype for comments and objections.
| 
| It's working as follow:
| 
| Mount first ptyfs instance in /dev/pts(or other path) you can get access to master side
| through ptm{x} device.
| 
| Mount second ptyfs instance inside chroot(Example: /var/chroot/test/dev/pts), create ptm{x}
| device inside chroot(Ex. /var/chroot/test/dev/ptm{x}). 
| Chroot: chroot /var/chroot/test /rescue/sh,
| now you can see only second instance in /dev/pts.
| 
| I'm leaving for the weekend till Monday.

Enjoy!

I don't understand why you want to get rid of the mountpoint arg inside
the pty structure. It certainly makes things faster, and the pty can't
be shared...

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-19 Thread Christos Zoulas
On Mar 19,  9:51pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| Ok, but bug will stay in the system, temporarily.

No problem, no worse than we have now.

christos
| 
| fs/ptyfs/ptyfs_vfsops.c |   16 +++-
| kern/tty_ptm.c  |9 -
| 2 files changed, 19 insertions(+), 6 deletions(-)
| 
| Ilya.
| 
| 
| Index: fs/ptyfs/ptyfs_vfsops.c
| ===
| RCS file: /cvsil/nbcur/src/sys/fs/ptyfs/ptyfs_vfsops.c,v
| retrieving revision 1.1.1.1
| diff -u -p -r1.1.1.1 ptyfs_vfsops.c
| --- fs/ptyfs/ptyfs_vfsops.c   4 Mar 2014 18:16:03 -   1.1.1.1
| +++ fs/ptyfs/ptyfs_vfsops.c   19 Mar 2014 17:36:48 -
| @@ -109,14 +109,16 @@ ptyfs__getpath(struct lwp *l, const stru
|   buf = malloc(MAXBUF, M_TEMP, M_WAITOK);
|   bp = buf + MAXBUF;
|   *--bp = '\0';
| - error = getcwd_common(cwdi->cwdi_rdir, rootvnode, &bp,
| + error = getcwd_common(mp->mnt_vnodecovered, cwdi->cwdi_rdir, &bp,
|   buf, MAXBUF / 2, 0, l);
| - if (error)  /* XXX */
| + if (error) {/* Mount point is out of rdir */
| + rv = NULL;
|   goto out;
| + }
|  
|   len = strlen(bp);
|   if (len < sizeof(mp->mnt_stat.f_mntonname)) /* XXX */
| - rv += len;
| + rv += strlen(rv) - len;
|  out:
|   free(buf, M_TEMP);
|   return rv;
| @@ -128,6 +130,7 @@ ptyfs__makename(struct ptm_pty *pt, stru
|  {
|   struct mount *mp = pt->arg;
|   size_t len;
| + const char *np;
|  
|   switch (ms) {
|   case 'p':
| @@ -135,8 +138,11 @@ ptyfs__makename(struct ptm_pty *pt, stru
|   len = snprintf(tbuf, bufsiz, "/dev/null");
|   break;
|   case 't':
| - len = snprintf(tbuf, bufsiz, "%s/%llu", ptyfs__getpath(l, mp),
| - (unsigned long long)minor(dev));
| + np = ptyfs__getpath(l, mp);
| + if (np == NULL)
| + return EOPNOTSUPP;
| + len = snprintf(tbuf, bufsiz, "%s/%llu", np,
| + (unsigned long long)minor(dev));
|   break;
|   default:
|   return EINVAL;
| Index: kern/tty_ptm.c
| ===
| RCS file: /cvsil/nbcur/src/sys/kern/tty_ptm.c,v
| retrieving revision 1.1.1.2
| diff -u -p -r1.1.1.2 tty_ptm.c
| --- kern/tty_ptm.c17 Mar 2014 11:46:10 -  1.1.1.2
| +++ kern/tty_ptm.c19 Mar 2014 17:36:48 -
| @@ -381,7 +381,9 @@ ptmioctl(dev_t dev, u_long cmd, void *da
|   goto bad;
|  
|   /* now, put the indices and names into struct ptmget */
| - return pty_fill_ptmget(l, newdev, cfd, sfd, data);
| + if ((error = pty_fill_ptmget(l, newdev, cfd, sfd, data)) != 0)
| + break;  /* goto bad2 */
| + return 0;
|   default:
|  #ifdef COMPAT_60
|   error = compat_60_ptmioctl(dev, cmd, data, flag, l);
| @@ -391,6 +393,11 @@ ptmioctl(dev_t dev, u_long cmd, void *da
|   DPRINTF(("ptmioctl EINVAL\n"));
|   return EINVAL;
|   }
| +/* bad2: close sfd too */
| + fp = fd_getfile(sfd);
| + if (fp != NULL) {
| + fd_close(sfd);
| + }
|   bad:
|   fp = fd_getfile(cfd);
|   if (fp != NULL) {
| 
-- End of excerpt from Ilya Zykov
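
The ptmioctl part of the quoted patch is about releasing both descriptors when the last step fails. As a rough userland analogy only, the sketch below uses `pipe()` to obtain two descriptors and closes both on a simulated late failure. `open_pair()` and its `fail_last_step` parameter are hypothetical, and the choice of `EOPNOTSUPP` simply mirrors the error the patch returns from the name lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical model of "allocate two descriptors, then do one more step".
 * If that last step fails, both descriptors are released, not just the first.
 */
static int
open_pair(int fds[2], int fail_last_step)
{
	assert(fds != NULL);
	if (pipe(fds) != 0)
		return errno;

	if (fail_last_step) {
		close(fds[1]);		/* the "bad2" step: second descriptor too */
		close(fds[0]);		/* the existing "bad" step */
		fds[0] = fds[1] = -1;
		return EOPNOTSUPP;
	}
	return 0;
}
```

A simple way to check for a leak in a single-threaded test is to compare descriptor numbers before and after a failed call: if nothing leaked, the next successful call gets the same numbers again.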




Re: Enhance ptyfs to handle multiple instances.

2014-03-14 Thread Christos Zoulas
On Mar 14, 12:30pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Enhance ptyfs to handle multiple instances.

| Hello!
| I desire develop this project.

Excellent.

| About me.
| I am system administrator in little Italy-Russia's firm. I live in Moscow.
| OS kernel it's my hobby mainly.
| I have free time now and can do this project about 1-2 months.
| I have little experience with Linux kernel's tty layer and few accepted patches.
| The largest:
| https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/tty/tty_buffer.c?id=64325a3be08d364a62ee8f84b2cf86934bc2544a

Looks fine to me.

| I have few questions about project.
| Christos, can I ask you about this?
| Please, if anybody has objections or already doing it, tell me know.

Nobody is already doing it, and if you have questions, you came to the right
place.

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-14 Thread Christos Zoulas
In article 20140314143532.ga17...@britannica.bec.de,
Joerg Sonnenberger  jo...@britannica.bec.de wrote:
On Fri, Mar 14, 2014 at 09:51:12AM -0400, Christos Zoulas wrote:
 I don't think that putting ptmx inside devpts makes sense. OTOH, we
 could have multiple ptmx devices with different minor numbers and use that
 as the differentiating factor for the pty devices. I think that's too complex
 and probably not worth it (at least in the first pass).

Actually, it would simplify things a lot if /dev/ptmx was a symlink to
/dev/pts/ptmx. The ptyfs instance could assign a unique minor number per
instance and use that to associate the instance with the correct mount
point.

Yes, except that ptmx is supposed to work without ptyfs, but I guess that
nobody uses it this way anymore.

christos



Re: Enhance ptyfs to handle multiple instances.

2014-03-14 Thread Christos Zoulas
On Mar 14,  6:49pm, net...@izyk.ru (Ilya Zykov) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

|  We first need to decide if disclosing gaps in the pty number is a security
|  issue. If not, it is simple; we just allocate the next free one and we don't
|  care about gaps. I.e. first mount can grab 0,1,2,3,5,6 second mount can grab
|  4,7,8 etc. 
| 
| It introduces limits.

How so?

|  I think that this is not very desirable because it again introduces limits
|  to the number of ptys per mountpoint.
| 
| I don't understand how?

if the first mount can only have [0..n-1] the second [n...2*n] etc...

| Ok.
| One remark: mount one instance more than one time useless, because, which mount point must return TIOCPTMGET in this case?
| Maybe I don't understand fully NetBSD pty layer realization yet.

If they point to the same device and they are both reachable, it does
not matter. If you are inside a chroot, then a reachable one within the root.

christos


Re: Enhance ptyfs to handle multiple instances.

2014-03-14 Thread Christos Zoulas
On Mar 14,  5:08pm, campbell+netbsd-tech-k...@mumble.net (Taylor R Campbell) wrote:
-- Subject: Re: Enhance ptyfs to handle multiple instances.

| We could install a symlink at /dev/ptmx pointing to pts/ptmx, and we
| could install a device node at /dev/pts/ptmx, which gets hidden by the
| ptyfs mount if you use ptyfs.  That way the non-ptyfs case still works
| and the ptyfs case enables multi-instance goodies.

That is the easy part; the harder part is the kernel driver portion.

christos


Re: DIOCGDISKINFO support for vnd

2014-03-11 Thread Christos Zoulas
In article 20140311130940.GA456@quark,
Patrick Welche  pr...@cam.ac.uk wrote:

The attached trivial patch allows vnd(4) to support generic disk ioctls.
The only one in kern/subr_disk.c at the moment is DIOCGDISKINFO.

Before:
$ ./vndtest /dev/vnd0a 
vndtest: DIOCGDISKINFO: Inappropriate ioctl for device

After:
$ ./vndtest /dev/vnd0a
size of /dev/vnd0a: 524288 bytes

Thanks to pooka@ for help in creating librumpdev_vnd.so which made finding
the root of the problem easy.

Comments?

LGTM.

christos



Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM

2014-02-09 Thread Christos Zoulas
In article 52f7c96e.6000...@execsw.org,
SAITOH Masanobu  msai...@execsw.org wrote:
Hello, all.

 I'm now working to support Intel Quark X1000.
This chip's internal com is MMIO(PCI_MAPREG_TYPE_MEM).
Our com and puc don't support such type of device, yet.
To solve the problem, I wrote a patch.

 Registers of Quark X1000's com are 4byte aligned.
Some other machines have such type of device, so
I modified COM_INIT_REGS() macro to support both
byte aligned and 4byte aligned. This change reduce
special modifications done in atheros, rmi and
marvell drivers.

 One of problem is serial console on i386 and amd64.
These archs calls consinit() three times. The function
is called in the following order:

   1) machdep.c::init386() or init_x86_64()
   2) init_main.c::main()
   *) (call uvm_init())
   *) (call extent_init())
   3) machdep.c::cpu_startup()

When consinit() called in init386(), it calls

  comcnattach()
    ->comcnattach1()
      ->comcninit()
        ->bus_space_map() with x86_bus_space_mem tag.
          ->bus_space_reservation_map()
            ->x86_mem_add_mapping()
              ->uvm_km_alloc()
                panic in KASSERT(vm_map_pmap(map) == pmap_kernel());

What should I do?
One of the solution is to check whether extent_init() was called
or not. There is no easy way to know it, so I added a global
variable extent_initted. Is it acceptable?

Looks great, can't you use cold instead, or is that too late?

christos



Re: [PATCH] netbsd32 swapctl, round 4

2014-02-03 Thread Christos Zoulas
On Feb 3,  8:04am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [PATCH] netbsd32 swapctl, round 4

| On Mon, Feb 03, 2014 at 03:06:14AM +, Christos Zoulas wrote:
|  I thought we decided that it is better to have one sep32 on the stack
|  and do copyout in the loop.
| 
| I can do that too, I just need to sort out the size question:

ok.

|  +out:
|  +  kmem_free(sep, sizeof(*sep));
|  +  kmem_free(sep32, sizeof(*sep32));
|  The sizes are wrong.
| 
| How are they wrong?

* count?

christos
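
The "* count?" remark refers to the rule that the size passed to `kmem_free()` must match the size that was allocated, so an array of `count` entries has to be freed with the per-entry size multiplied by `count`. A minimal sketch of computing that size once and reusing it follows; `struct model_swapent` and `swapent_array_size()` are hypothetical names, and the overflow check is an addition not discussed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry type standing in for struct swapent. */
struct model_swapent {
	int se_dev;
	int se_flags;
	char se_path[64];
};

/*
 * Size in bytes of an array of count entries, or 0 if the multiplication
 * would overflow. The same value is meant for both allocation and free.
 */
static size_t
swapent_array_size(size_t entsize, size_t count)
{
	assert(entsize > 0);
	if (count > (size_t)-1 / entsize)
		return 0;
	return entsize * count;
}
```

Keeping the result in one local variable and passing it to both the allocation and the free avoids the mismatch pointed out above.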


Re: [PATCH] netbsd32 swapctl, round 4

2014-02-02 Thread Christos Zoulas
In article 20140202043534.ga8...@homeworld.netbsd.org,
Emmanuel Dreyfus  m...@netbsd.org wrote:

Latest revision of the netbsd32 swapctl patch

+  for (i = 0; i < count; i++) {
+  sep32[i].se_dev = sep[i].se_dev;
+  sep32[i].se_flags = sep[i].se_flags;
+  sep32[i].se_nblks = sep[i].se_nblks;
+  sep32[i].se_inuse = sep[i].se_inuse;
+  sep32[i].se_priority = sep[i].se_priority;
+  memcpy(sep32[i].se_path, sep[i].se_path,
+  sizeof(sep32[i].se_path));
+  }
+
+  error = copyout(sep32, SCARG(uap, arg), sizeof(*sep32) * count);

I thought we decided that it is better to have one sep32 on the stack
and do copyout in the loop.

+out:
+  kmem_free(sep, sizeof(*sep));
+  kmem_free(sep32, sizeof(*sep32));

The sizes are wrong.

christos
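
The alternative suggested here, one 32-bit entry on the stack with a copyout per loop iteration, might look roughly like the userland sketch below. The two struct layouts, `MODEL_PATHLEN`, `model_copyout()` (a plain `memcpy` standing in for the kernel's `copyout()`) and `swapents_to_user32()` are all assumptions for illustration, not the netbsd32 definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PATHLEN 32	/* illustrative length, not the kernel's */

/* Hypothetical native and 32-bit layouts. */
struct model_swapent {
	uint64_t se_dev;
	int se_flags;
	int se_nblks;
	int se_inuse;
	int se_priority;
	char se_path[MODEL_PATHLEN];
};

struct model_swapent32 {
	uint64_t se_dev;
	int32_t se_flags;
	int32_t se_nblks;
	int32_t se_inuse;
	int32_t se_priority;
	char se_path[MODEL_PATHLEN];
};

/* Stand-in for copyout(9); always succeeds in this model. */
static int
model_copyout(const void *kaddr, void *uaddr, size_t len)
{
	memcpy(uaddr, kaddr, len);
	return 0;
}

/* Convert and copy one entry at a time; no second array is allocated. */
static int
swapents_to_user32(const struct model_swapent *sep, size_t count,
    struct model_swapent32 *uarg)
{
	struct model_swapent32 s32;	/* single scratch entry on the stack */
	size_t i;
	int error;

	assert(sep != NULL && uarg != NULL);
	for (i = 0; i < count; i++) {
		memset(&s32, 0, sizeof(s32));	/* keep padding bytes defined */
		s32.se_dev = sep[i].se_dev;
		s32.se_flags = sep[i].se_flags;
		s32.se_nblks = sep[i].se_nblks;
		s32.se_inuse = sep[i].se_inuse;
		s32.se_priority = sep[i].se_priority;
		memcpy(s32.se_path, sep[i].se_path, sizeof(s32.se_path));
		error = model_copyout(&s32, &uarg[i], sizeof(s32));
		if (error != 0)
			return error;
	}
	return 0;
}
```

This drops the second allocation entirely, which also removes one of the two sizes that can be gotten wrong at free time.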



Re: compat_netbsd32 swapctl

2014-01-29 Thread Christos Zoulas
In article 20140129153713.gl5...@homeworld.netbsd.org,
Emmanuel Dreyfus  m...@netbsd.org wrote:
On Wed, Jan 29, 2014 at 01:48:35PM +0100, Martin Husemann wrote:
 My vote for this special case: hard code it #ifdef __x86_64__.
 If we run into other instances, we can add a define (like: DEV_T_ALIGN_32).

Here is a patch that fixes the problem.

+  default:
+  panic("unexpected cmd = %d", SCARG(ua, cmd));
+  break;

Anyone can panic the kernel now...

+  }
+
+  *retval = 0;
+  SCARG(ssa, cmd) = SCARG(ua, cmd);
+  SCARG(ssa, misc) = 1;
+
+  for (i = 0; i < SCARG(ua, misc); i++) {
+  SCARG(ssa, arg) = 
+  (char *)SCARG(ua, arg) + (i * swapctl32_len);
+
+  if ((error = sys_swapctl(l, ssa, rv)) != 0) {
+  *retval = rv;
+  break;
+  }

I think you probably want to do some struct conversion because if sys_swapctl
expects to write to a bigger buffer, you can end up trashing the user's stack.

christos
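
To illustrate the first objection: a command number that arrives from
userland is untrusted, so an unknown value is better reported as an error
than turned into a panic. The sketch below uses made-up command numbers
and a hypothetical check_cmd() helper; it is not the patch under review.

```c
#include <errno.h>

/* Hypothetical command numbers; the real SWAP_* values live in <sys/swap.h>. */
enum { CMD_STATS = 1, CMD_ON = 2, CMD_OFF = 3 };

/*
 * Validate a command that came from userland.  An unknown value is the
 * caller's mistake, so report EINVAL rather than bringing the system down.
 */
static int
check_cmd(int cmd)
{
	switch (cmd) {
	case CMD_STATS:
	case CMD_ON:
	case CMD_OFF:
		return 0;
	default:
		return EINVAL;
	}
}
```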



Re: amd64 kernel, i386 userland

2014-01-25 Thread Christos Zoulas
In article 1lfyrch.n0en6eho0ft6m%m...@netbsd.org,
Emmanuel Dreyfus m...@netbsd.org wrote:
Since nobody opposes, I am going to commit that. Perhaps the option name
could be better:  NATIVE_EMULROOT or EMULROOT_NATIVE?

 --- sys/kern/kern_exec.c.orig   2014-01-21 16:55:00.0 +0100
 +++ sys/kern/kern_exec.c2014-01-21 16:55:13.0 +0100
 @@ -184,9 +184,13 @@
  
  /* NetBSD emul struct */
  struct emul emul_netbsd = {
 .e_name =   "netbsd",
 +#ifdef COMPAT_NATIVE
 +   .e_path =   COMPAT_NATIVE,
 +#else
 .e_path =   NULL,
 +#endif
  #ifndef __HAVE_MINIMAL_EMUL
 .e_flags =  EMUL_HAS_SYS___syscall,
 .e_errno =  NULL,
 .e_nosys =  SYS_syscall,

I think EMUL_NATIVEROOT is better, since everything starts with EMUL?

christos



Re: Autoload of pseudo-device driver module

2013-12-28 Thread Christos Zoulas
In article pine.neb.4.64.1312280809050.26...@screamer.whooppee.com,
Paul Goyette  p...@whooppee.com wrote:
I've noticed that the vnd(4) driver seems to be able to auto-load when 
one runs vndconfig.  Can someone tell me how this is triggered?

The module_autoload() in sys/miscfs/specfs/specfs_vnops.c, triggered
on the open of the device vnode.

christos



Re: Problem with autounload of nfsserver module

2013-12-14 Thread Christos Zoulas
In article pine.neb.4.64.1312140541170@screamer.whooppee.com,
Paul Goyette  p...@whooppee.com wrote:
I believe that the nfsserver module should not be allowed to autounload.

Consider the following sequence of events:

1. mountd is started, and calls nfssvc(2)
2. The module subsystem autoloads the nfsserver module
3. mountd continues, adding entries to the exports list

So far, everything is fine.  However

4. When the autounload timer expires, the module subsystem unloads the
nfsserver module
5. As part of nfsserver_modcmd(), the export list is cleared
6. The autounload completes successfully
7. At some later time, we finally get around to starting nfsd.  This
succeeds, but the export list has been cleared, so there is nothing
for nfsd to deliver to the clients.

So, depending on how much time it takes between starting mountd and 
starting nfsd, we could end up serving an empty exports list.

The following patch prevents the module subsystem from unloading the 
nfsserver module.  (Manual unloading of the module will still work.) 
Comments?

Perhaps you want it to prevent it from unloading while there are exported
filesystems?

christos



Re: Problem with autounload of nfsserver module

2013-12-14 Thread Christos Zoulas
In article pine.neb.4.64.1312140648400.22...@screamer.whooppee.com,
Paul Goyette  p...@whooppee.com wrote:
On Sat, 14 Dec 2013, Christos Zoulas wrote:

 In article pine.neb.4.64.1312140541170@screamer.whooppee.com,
 Paul Goyette  p...@whooppee.com wrote:
 I believe that the nfsserver module should not be allowed to autounload.

 Consider the following sequence of events:

 1. mountd is started, and calls nfssvc(2)
 2. The module subsystem autoloads the nfsserver module
 3. mountd continues, adding entries to the exports list

 So far, everything is fine.  However

 4. When the autounload timer expires, the module subsystem unloads the
nfsserver module
 5. As part of nfsserver_modcmd(), the export list is cleared
 6. The autounload completes successfully
 7. At some later time, we finally get around to starting nfsd.  This
succeeds, but the export list has been cleared, so there is nothing
for nfsd to deliver to the clients.

 So, depending on how much time it takes between starting mountd and
 starting nfsd, we could end up serving an empty exports list.

 The following patch prevents the module subsystem from unloading the
 nfsserver module.  (Manual unloading of the module will still work.)
 Comments?

 Perhaps you want it to prevent it from unloading while there are exported
 filesystems?

Yeah, that would work, too.  :)   The following patch prevents the 
module from being auto-unloaded if there are exported filesystems.  (If 
a manual unload is requested, we will still forcibly delete the exports 
list.)

I committed something similar.

christos
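
The idea discussed above can be sketched roughly as follows: the handler
vetoes an automatic unload while exports exist, but still honours a manual
one. The command names are only modelled on module(9), and export_count
and sketch_modcmd() are hypothetical; this is an assumption-laden
illustration, not the change that was committed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command codes modelled on module(9); values are made up. */
enum modcmd { CMD_INIT, CMD_FINI, CMD_AUTOUNLOAD };

/* Hypothetical stand-in for the number of exported filesystems. */
static int export_count;

static int
sketch_modcmd(enum modcmd cmd)
{
	assert(export_count >= 0);
	switch (cmd) {
	case CMD_INIT:
		export_count = 0;
		return 0;
	case CMD_AUTOUNLOAD:
		/* Veto the automatic unload while anything is exported. */
		return export_count != 0 ? EBUSY : 0;
	case CMD_FINI:
		/* A manual unload still clears the exports and proceeds. */
		export_count = 0;
		return 0;
	}
	return ENOTTY;
}
```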



re: Problem with autounload of nfsserver module

2013-12-14 Thread Christos Zoulas
On Dec 15,  5:07pm, m...@eterna.com.au (matthew green) wrote:
-- Subject: re: Problem with autounload of nfsserver module

| right, that's my whole point.
| 
| what's the real benefit to the end user for having these modules
| use auto unload?  mostly they seem to present bugs we have to fix
| (which is good for fixing bugs, but annoying for users), instead
| of actually being useful.  i don't consider the minor memory
| savings to be a real consideration here.
| 
| i've long thought we should not auto unload by default.

Think about it the other way around... If you don't auto-unload, most users
will not run the unload code and the bugs will stay hidden. And you know too
well, that I've been bitten by one unload bug that took me a month to figure
out, since it manifested as random page faults in the pool code...

christos


Re: qsort_r

2013-12-09 Thread Christos Zoulas
In article 20131209061036.ge2...@apb-laptoy.apb.alt.za,
Alan Barrett  a...@cequrux.com wrote:
On Sun, 08 Dec 2013, David Holland wrote:
 My irritation with not being able to pass a data pointer through 
 qsort() boiled over just now. Apparently Linux and/or GNU 
 has a qsort_r() that supports this; so, following is a patch 
 that gives us a compatible qsort_r() plus mergesort_r(), and 
 heapsort_r().

Apparently FreeBSD [1] and GNU [2] have incompatible versions 
of qsort_r, passing the extra 'thunk' or 'data' argument in a 
different position.

[1]: FreeBSD qsort_r http://www.manpagez.com/man/3/qsort_r/
[2]: Linux qsort_r  http://man7.org/linux/man-pages/man3/qsort.3.html

If we have to pick one, let's pick the FreeBSD version.

Actually let's not (fortunately dh@ chose the right one).
We should pick the linux one:

http://sourceware.org/ml/libc-alpha/2008-12/msg7.html

christos



Re: qsort_r

2013-12-08 Thread Christos Zoulas
In article 20131208222953.gb25...@netbsd.org,
David Holland  dholland-t...@netbsd.org wrote:
(Cc: tech-kern because of kheapsort())

My irritation with not being able to pass a data pointer through
qsort() boiled over just now. Apparently Linux and/or GNU has a
qsort_r() that supports this; so, following is a patch that gives us
a compatible qsort_r() plus mergesort_r(), and heapsort_r().

I have done it by having the original, non-_r functions provide a
thunk for the comparison function, as this is least invasive. If we
think this is too expensive, an alternative is generating a union of
function pointers and making tests at the call sites; another option
is to duplicate the code (hopefully with cpp rather than CP) but that
seems like a bad plan. Note that the thunks use an extra struct to
hold the function pointer; this is to satisfy C standards pedantry
about void pointers vs. function pointers, and if we decide not to
care it could be simplified.

This patch was supposed to have all the necessary support widgetry,
like namespace.h changes, but there's at least one more thing not in
it: MLINKS for the new functions and corresponding setlist
changes. If I've forgotten anything else, let me know.

heapsort() is used in one place in the kernel as kheapsort(), which
takes an extra argument so that the heapsort code itself doesn't have
to know how to malloc in the kernel. I have done the following:
   - add kheapsort_r()
   - change the signature of plain kheapsort() to move the extra
 argument to a place it is less likely to cause confusion with the
 userdata argument;
   - update the caller for the signature change;
but I have not changed the caller to call kheapsort_r instead. Based
on the code, this should probably be done later as like many sort
calls it's using a global for context. At this point the plain
kheapsort can be removed.


LGTM.

christos
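
For readers following along, the thunk approach described in the proposal
can be sketched like this: the plain sort wraps the caller's comparison
function in a small struct and hands it to the _r variant as the context
pointer, which arrives as the last argument (the GNU ordering). The
insertion sort and the names sketch_sort_r(), sketch_sort(), cmp_wrap and
cmp_thunk() are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *);
typedef int (*cmp_r_fn)(const void *, const void *, void *);

/* Hypothetical insertion sort taking the context pointer last (GNU order). */
static void
sketch_sort_r(void *base, size_t n, size_t size, cmp_r_fn cmp, void *ctx)
{
	unsigned char *a = base, tmp[64];
	size_t i, j;

	assert(size <= sizeof(tmp));
	for (i = 1; i < n; i++) {
		memcpy(tmp, a + i * size, size);
		for (j = i; j > 0 && cmp(a + (j - 1) * size, tmp, ctx) > 0; j--)
			memcpy(a + j * size, a + (j - 1) * size, size);
		memcpy(a + j * size, tmp, size);
	}
}

/* Wrapping the function pointer in a struct avoids casting it to void *. */
struct cmp_wrap { cmp_fn fn; };

static int
cmp_thunk(const void *x, const void *y, void *ctx)
{
	const struct cmp_wrap *w = ctx;

	return w->fn(x, y);
}

/* The plain interface is a thin layer over the _r one. */
static void
sketch_sort(void *base, size_t n, size_t size, cmp_fn cmp)
{
	struct cmp_wrap w = { cmp };

	sketch_sort_r(base, n, size, cmp_thunk, &w);
}

/* Example comparison function for int elements. */
static int
cmp_int(const void *x, const void *y)
{
	int a = *(const int *)x, b = *(const int *)y;

	return (a > b) - (a < b);
}
```

The cost of this layout is one extra indirect call per comparison, which
is the overhead the proposal weighs against duplicating the sort code.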



Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-26 Thread Christos Zoulas
In article 96497888-f8c7-49f2-958b-532a2093b...@gmail.com,
Dennis Ferguson  dennis.c.fergu...@gmail.com wrote:

I think getting rid of uses of the CIRCLEQ macros was the right thing
to do in any case, since code which works like that doesn't need to
exist.  I'm not sure that the TAILQ macros are the best answer
to the problem, though.

Unfortunately I agree... But in the absence of a better alternative...

christos



Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-26 Thread Christos Zoulas
On Nov 26,  8:20pm, m...@linuxbox.com (Matt W. Benjamin) wrote:
-- Subject: Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ 

| What's your issue with TAILQ?

#define TAILQ_LAST(head, headname) \
(*(((struct headname *)((head)->tqh_last))->tqh_last))
#define TAILQ_PREV(elm, headname, field) \
(*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))

etc.

christos


Re: Help for PR kern/46606 is needed

2013-11-25 Thread Christos Zoulas
In article 6eefbae491842ca53ddd56690ba31...@mail.marples.name,
Roy Marples  r...@marples.name wrote:
-=-=-=-=-=-

Hi

On 25/11/2013 10:33, Ryo ONODERA wrote:
 pulseaudio needs pkgsrc/sysutils/hal, and running hal causes
 PR kern/46606 kernel panic when the NetBSD system is shutdown.
 See http://gnats.netbsd.org/46606 (and duplicated bug
 http://gnats.netbsd.org/47012 ).
 
 How to debug this problem?
 This problem is observed even on NetBSD/amd64 6.99.27
 of Tue Nov 19 06:16:01 JST 2013.

I have had this for months on i386.
I am running the attached patch from christos@ which stops the crash, 
but probably Does Bad Things.

The actual error seems to be when hal starts, but only causes a problem 
when hal stops.
This happens for you at shutdown, because hal is stopped then.

Here's the output from a run:

I think I understand what's going on finally.

1st start
Creation:

/usr/src/sys/kern/kern_lwp.c,731: hald[2231]: [uid=0] (0/1)

Setuid:
/usr/src/sys/kern/kern_prot.c,357: hald[2231]: [uid=0] (1/-1)
/usr/src/sys/kern/kern_prot.c,361: hald[2231]: [uid=1005] (0/1)
/usr/src/sys/kern/kern_prot.c,381: hald[2231]: [uid=1005] (1/0)

Setuid after using p->p_cred for the uid of the process:
/usr/src/sys/kern/kern_prot.c,386: hald[2231]: [uid=1005] (1/0)
/usr/src/sys/kern/kern_lwp.c,731: hald-runner[1965]: [uid=0] (0/1)

1st stop

Destruction:

XXX: This is using l->l_cred to find the uid of the lwp. This is still
pointing to root?!?!? Didn't we setuid just above to 1005? All creds
of hald should be pointing to 1005, yet this lwp cred is still pointing to
root.

/usr/src/sys/kern/kern_lwp.c,1128: hald[2231]: [uid=0] (1/-1)

Now root has one cred missing! So when hald-runner dies we end up:


/usr/src/sys/kern/kern_lwp.c,1128: hald-runner[1965]: [uid=0] (0/-1)

With lwp count == -1 as you can see below. We should have crashed now,
but my patch comments out the KASSERT!

Too late now, the damage has been done.

2nd start
/usr/src/sys/kern/kern_lwp.c,731: hald[1209]: [uid=0] (4294967295/1)
/usr/src/sys/kern/kern_prot.c,357: hald[1209]: [uid=0] (0/-1)
/usr/src/sys/kern/kern_prot.c,361: hald[1209]: [uid=1005] (1/1)
/usr/src/sys/kern/kern_prot.c,381: hald[1209]: [uid=1005] (2/0)
/usr/src/sys/kern/kern_prot.c,386: hald[1209]: [uid=1005] (2/0)
/usr/src/sys/kern/kern_lwp.c,731: hald-runner[738]: [uid=0] 
(4294967295/1)

2nd stop
/usr/src/sys/kern/kern_lwp.c,1128: hald[1209]: [uid=1005] (2/-1)
/usr/src/sys/kern/kern_lwp.c,1128: hald-runner[738]: [uid=0] (0/-1)
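
The 4294967295 in the second run is what a 32-bit unsigned count prints
after being decremented below zero. A tiny sketch of that wrap-around,
assuming a 32-bit counter (as unsigned long is on i386); lwpcnt and
sketch_chgcnt() are hypothetical names, not the kernel's chglwpcnt():

```c
#include <stdint.h>

/* Hypothetical per-uid counter; 32 bits wide, like unsigned long on i386. */
static uint32_t lwpcnt;

/* Apply a signed delta the way an unchecked counter update would. */
static uint32_t
sketch_chgcnt(int diff)
{
	lwpcnt += (uint32_t)diff;
	return lwpcnt;
}
```

Starting from 0, a delta of -1 yields 4294967295, and the next +1 brings
the value back to 0, which matches the (4294967295/1) lines in the log.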

Thanks

Roy
-=-=-=-=-=-
Index: sys/kern/kern_lwp.c
===
RCS file: /cvsroot/src/sys/kern/kern_lwp.c,v
retrieving revision 1.175
diff -u -p -r1.175 kern_lwp.c
--- sys/kern/kern_lwp.c9 Jun 2013 01:13:47 -   1.175
+++ sys/kern/kern_lwp.c25 Nov 2013 14:31:20 -
@@ -781,6 +781,12 @@ lwp_create(lwp_t *l1, proc_t *p2, vaddr_
*/
   if (p2->p_nlwps != 0 && p2 != &proc0) {
   uid_t uid = kauth_cred_getuid(l1->l_cred);
+  if (strncmp(p2->p_comm, "hald", 4) == 0) {
+  struct uidinfo *uip = uid_find(uid);
+  printf("%s,%d: %s[%d]: [uid=%d] (%lu/%d)\n", __FILE__,
+  __LINE__, p2->p_comm, (int)p2->p_pid, (int)uid,
+  uip->ui_lwpcnt, 1);
+  }
   int count = chglwpcnt(uid, 1);
   if (__predict_false(count >
   p2->p_rlimit[RLIMIT_NTHR].rlim_cur)) {
@@ -789,6 +795,13 @@ lwp_create(lwp_t *l1, proc_t *p2, vaddr_
   KAUTH_ARG(KAUTH_REQ_PROCESS_RLIMIT_BYPASS),
   &p2->p_rlimit[RLIMIT_NTHR], KAUTH_ARG(RLIMIT_NTHR))
   != 0) {
+  if (strncmp(p2->p_comm, "hald", 4) == 0) {
+  struct uidinfo *uip = uid_find(uid);
+  printf("%s,%d: %s[%d]: [uid=%d]"
+  " (%lu/%d)\n", __FILE__, __LINE__,
+  p2->p_comm, (int)p2->p_pid,
+  (int)uid, uip->ui_lwpcnt, -1);
+  }
   (void)chglwpcnt(uid, -1);
   return EAGAIN;
   }
@@ -1174,8 +1187,16 @@ lwp_free(struct lwp *l, bool recycle, bo
   KASSERT(l != curlwp);
   KASSERT(last || mutex_owned(p->p_lock));
 
-  if (p != &proc0 && p->p_nlwps != 1)
+  if (p != &proc0 && p->p_nlwps != 1) {
+  uid_t uid = kauth_cred_getuid(l->l_cred);
+  if (strncmp(p->p_comm, "hald", 4) == 0) {
+  struct uidinfo *uip = uid_find(uid);
+  printf("%s,%d: %s[%d]: [uid=%d] (%lu/%d)\n", __FILE__,
+  __LINE__, p->p_comm, (int)p->p_pid, (int)uid,
+  uip->ui_lwpcnt, -1);
+  }
   (void)chglwpcnt(kauth_cred_getuid(l->l_cred), -1);
+  }
  

Re: posix_fallocate

2013-11-17 Thread Christos Zoulas
On Nov 17,  1:15pm, k...@munnari.oz.au (Robert Elz) wrote:
-- Subject: Re: posix_fallocate

| ps: I have not examined the FreeBSD implementation - if they've done it the
| hard, safe, way, and worked out all the potential kinks, and if it doesn't
| depend too much upon other aspects of their I/O system implementation (like
| whatever they have to make softdeps work) then perhaps copying that might be
| feasible -- if the demand for this really exists, and it isn't being requested
| just because it is in the spec and NetBSD is lacking it.

From a cursory look at it, they just write.

christos


Re: [patch] changing lua_Number to int64_t

2013-11-17 Thread Christos Zoulas
On Nov 17, 10:36am, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

| I mean know it as a script programmer. I think that would be helpful
| to know the exact  lua_Number width when you are writing a script.
| AFAIK, you don't have sizeof functionality from Lua. So, IMHO,
| lua_Number width should be fixed and documented.

Lua should provide manifest constants for it  (like INTMAX_MAX).
Otherwise you'd be making assumptions.

christos


Re: [patch] changing lua_Number to int64_t

2013-11-17 Thread Christos Zoulas
On Nov 17, 10:46am, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

| On Sun, Nov 17, 2013 at 7:37 AM, Marc Balmer m...@msys.ch wrote:
|  Am 17.11.13 04:49, schrieb Terry Moore:
|  I believe that if you want the Lua scripts to be portable across NetBSD
|  deployments, you should choose a well-known fixed width.
| 
|  I don't see this as very important.  Lua scripts will hardly depend on
|  the size of an integer.
| 
| But they could. I think that the script programmers should know if the
| numeric data type is enough for their usage (e.g., time diffs).

By making it the biggest type possible, you never need to be worried.

christos


Re: [patch] changing lua_Number to int64_t

2013-11-17 Thread Christos Zoulas
On Nov 17,  3:36pm, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

|  1. Lua 5.3 will have 64 bit integer support as standard, which will
|  make interop and reuse between kernel and userspace code much easier,
|  iff we use int64_t
| 
| If they are using int64_t for integers, I think it is a good reason for us to
| stick to int64_t.

This is not relevant. The numeric type will still be double, so forget
about compatibility between kernel and userland. There is no need for
the interpreter to use a fixed width type, but rather it is convenient
to use the largest numeric type the machine can represent.

christos


Re: [patch] changing lua_Number to int64_t

2013-11-17 Thread Christos Zoulas
On Nov 17,  7:14pm, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

| Humm.. I think that §2.1 brings a good argument: "Standard Lua uses
| 64-bit integers and double-precision floats, (...)". I think that
| would not hurt to stick to the future standard; once 64 bit is good
| enough for kernel purposes.

Ok,

christos


Re: [patch] changing lua_Number to int64_t

2013-11-16 Thread Christos Zoulas
In article 52872b0c.5080...@msys.ch, Marc Balmer  m...@msys.ch wrote:
Changing the number type to int64_t is certainly a good idea.  Two
questions, however:

Why not intmax_t?

christos



Re: [patch] changing lua_Number to int64_t

2013-11-16 Thread Christos Zoulas
On Nov 16,  9:30pm, lourival.n...@gmail.com (Lourival Vieira Neto) wrote:
-- Subject: Re: [patch] changing lua_Number to int64_t

| On Sat, Nov 16, 2013 at 8:52 PM, Christos Zoulas chris...@astron.com wrote:
|  In article 52872b0c.5080...@msys.ch, Marc Balmer  m...@msys.ch wrote:
| Changing the number type to int64_t is certainly a good idea.  Two
| questions, however:
| 
|  Why not intmax_t?
| 
| My only argument is that int64_t has a well-defined width and, AFAIK,
| intmax_t could vary. But I have no strong feelings about this. Do you
| think intmax_t would be better?

Bigger is better. And you can use %jd to print which is a big win.

christos
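
As a small illustration of the printing argument: any signed integer can
be widened to intmax_t and formatted with the standard %jd conversion,
without a width-specific macro. format_number() is a hypothetical helper
written for this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a signed integer value after widening it to intmax_t. */
static int
format_number(char *buf, size_t len, intmax_t n)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%jd", n);
}
```

With a fixed int64_t the portable spelling would instead need the PRId64
macro from <inttypes.h> in every format string.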


Re: posix_fallocate

2013-11-16 Thread Christos Zoulas
In article 1lcgiu4.18zr2h51aac07zm%m...@netbsd.org,
Emmanuel Dreyfus m...@netbsd.org wrote:
Hi

NetBSD-current seems to lack posix_fallocate(2)
http://pubs.opengroup.org/onlinepubs/009695299/functions/posix_fallocate
.html

Is someone already working on it, or has thoughts about how it should be
implemented?

FreeBSD has it as a system call. It should be easy to dup.

christos



Re: Changing __USING_TOPDOWN_VM to a runtime decision

2013-11-05 Thread Christos Zoulas
In article 20131105144023.gc17...@mail.duskware.de,
Martin Husemann  mar...@duskware.de wrote:
-=-=-=-=-=-

Hey folks,

I would like to change the current (mostly) compile time decision
whether we will use top-down VA layout for userland processes to a
runtime check.

This allows emulations to disable it, and also allows MD code to recognize
binaries not suitable for topdown VM layout and give those binaries the
old layout.

The latter point is what I actually need: on sparc64 we have compiled most
code in the medlow code model, which does not allow big addresses. I am
about to commit changes that switch this default and properly mark new
binaries. To still allow running old binaries, I need something like the
attached patch.

The patch is mostly straightforward: I define a new flag EXEC_TOPDOWN_VM,
initialized by default according to __USING_TOPDOWN_VM, but overridable
by a MD function. This way the exec_package carries over the information,
whether we will use topdown-vm for the to-be-loaded binary.

Most other changes are mechanical, like passing this information through
a few uvm layers.

For architectures already using topdown-VM, no change is intended.

Comments?

I don't like the !!(expr) syntax, I'd prefer to hide the ugliness in a macro
that does (expr != 0) 

christos
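
Something along these lines, as a sketch only: the flag value and the
names SKETCH_TOPDOWN, FLAG_ISSET() and uses_topdown() are hypothetical and
do not come from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag value and macro name, for illustration only. */
#define SKETCH_TOPDOWN		0x0100
#define FLAG_ISSET(v, f)	(((v) & (f)) != 0)

/* The comparison already yields 0 or 1, so no double negation is needed. */
static bool
uses_topdown(unsigned int flags)
{
	return FLAG_ISSET(flags, SKETCH_TOPDOWN);
}
```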



Re: zero-length symlinks

2013-11-05 Thread Christos Zoulas
In article 20131105220754.gb...@snowdrop.l8s.co.uk,
David Laight  da...@l8s.co.uk wrote:
On Sun, Nov 03, 2013 at 04:35:19PM -0800, John Nemeth wrote:
 
  It has to do with the fact that historically mkdir(2) was
 actually mkdir(3), it wasn't an atomic syscall and was a sequence
 of operation performed by a library routine...

Actually I think you'll find that mkdir was always a system call.
It was directory rename that was done with a series of link and
unlink system calls.

Nope, on 4.1BSD and I believe SVR1 (please correct me),
it was a setuid binary that did:

mknod("foo", 04, 0);
chown("foo", getuid());
link("foo", "foo/.");
link(".", "foo/..");

Also, if you look at any current fs code the processing of . and
.. is special - they will be treated as requests for the current
and parent directories regardless of the inodes they reference.
Doing otherwise is a complete locking nightmare!

I think that this also came much later. I believe with 4.4BSD.

christos



Re: autoconf deferred processing

2013-10-22 Thread Christos Zoulas
In article 20131022205705.c0dc812...@ren.fdy2.co.uk,
Robert Swindells  r...@fdy2.co.uk wrote:

Can somebody explain how the deferred processing code in subr_autoconf.c
is supposed to work ?

Looking at config_create_interruptthreads() it creates 8 threads all
of which seem to walk the same list and delete elements from it.

I'm getting crashes in i386 at startup and am trying to track down what
is causing it. The faulting PC is random so I'm looking for anything
that calls through function pointers.

Robert Swindells

Edit subr_autoconf.c and set the number of threads to 1.

int interrupt_config_threads = 8;
int mountroot_config_threads = 2;

Or you can patch them in ddb with boot -d

Also you can add DEBUG_AUTOCONF in current, which prints the name of
the deferred driver as well as the count of the deferred mutex. For
even better success, boot -1

christos



Re: kgdb on NetBSD/amd64 6.99.23

2013-09-19 Thread Christos Zoulas
In article 523aab61.8000...@gmail.com,
Jan Danielsson  jan.m.daniels...@gmail.com wrote:
On 9/18/13 7:00 PM, Jan Danielsson wrote:
I'm trying to get kgdb working between two virtual box instances. (I
 have verified that /dev/tty00 - /dev/tty00 works by running GENERIC
 kernels and minicom on both virtual machines).
[---]

   Problem #1 solved (worked-around). It looks like RB_KDB isn't being
passed over to the kernel properly. I simply commented out "if
(boothowto & RB_KDB)", and now have a kernel which actually waits for a
remote debugger to attach on boot.

   I enabled DEBUG_KGDB, and when I attach the debugger from the
remote the target clearly reacts to it. Unfortunately, the remote
says "PC register not available" -- and it doesn't appear to actually be
connected.

   Is kgdb supposed to work on amd64 and/or -current? I'm starting to
get the feeling that this is somewhat untested.

I did not test kgdb last time I upgraded gdb, so it might need fixing...

christos



Re: high load, no bottleneck

2013-09-19 Thread Christos Zoulas
On Sep 19, 11:35am, buh...@nfbcal.org (Brian Buhrow) wrote:
-- Subject: Re: high load, no bottleneck

|   Hello.  the worst case scenario is when a raid set is running in
| degraded mode.  Greg sent me some notes on how to calculate the memory
| utilization in this instance.  I'll go dig them out and send them along in
| a bit.  In theory, if all your raid sets are in degraded mode at once, and
| i/o is busy, you could be highly impacted, since you can have up to 40
| i/o's outstanding for each raid set with my configuration option.  However,
| even on machines with multiple raid5 sets, with 2 of them running in
| degraded mode, I've not seen a memory bottleneck.  I don't recommend this,
| of course, but sometimes stuff happens.  In any case, except for the
| potential memory utilization, there's no down side to setting this number
| in the kernel and not worrying about it anymore.  In fact, this is what I
| do for all our machines around here regardless of whether the machine is
| hosting raid1 sets, raid5 sets or a combination of the two.

If we are going to add a sysctl, we might also put a different value for
the raid-degraded condition? Ideally I prefer if things autotuned, but that
is much more difficult.

christos


Re: high load, no bottleneck

2013-09-18 Thread Christos Zoulas
On Sep 18,  3:34am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: high load, no bottleneck

| Christos Zoulas chris...@zoulas.com wrote:
| 
|  On large filesystems with many files fsck can take a really long time after
|  a crash. In my personal experience power outages are much less frequent than
|  crashes (I crash quite a lot since I always fiddle with things). If you
|  don't care about fsck time, you don't need WAPBL. 
| 
| But you just told me that I will need a fsck after crash now I am
| running with vfs.wapbl.flush_disk_cache=0 so I wonder if I should not
| just mount without -o log. What are WAPBL benefits when running with
| vfs.wapbl.flush_disk_cache=0?

You *might* need an fsck after power loss. If you crash and the disk syncs
then you should be ok if the disk flushed (which it probably did if you
saw "syncing disks" after the panic).

christos


Re: high load, no bottleneck

2013-09-17 Thread Christos Zoulas
In article 1l9czcn.y6kr35aruvzvm%m...@netbsd.org,
Emmanuel Dreyfus m...@netbsd.org wrote:
Emmanuel Dreyfus m...@netbsd.org wrote:

 db{0} show vnode c5a24b08
 OBJECT 0xc5a24b08: locked=0, pgops=0xc0b185a8, npages=1720, refs=16
 
 VNODE flags 0x4030<MPSAFE,LOCKSWORK,ONWORKLST>
 mp 0xc4a14000 numoutput 0 size 0x6f writesize 0x6f
 data 0xc5a25d74 writecount 0 holdcnt 2
 tag VT_UFS(1) type VREG(1) mount 0xc4a14000 typedata 0xc4fe5480
 v_lock 0xc5a24bac

While many threads are waiting, another nfsd thread holds the lock with
this backtrace:
turnstile_block
rw_vector_enter
wapbl_begin
ffs_write
VOP_WRITE
nfsrv_write
nfssvc_nfsd
sys_nfssvc
syscall

I understand it is waiting for another process to complete I/O before
it can enter the rwlock in wapbl_begin.

I have a first-class suspect with this other nfsd thread which is
engaged in I/O:
sleepq_block
wdc_exec_command
wd_flushcache
wdioctl
bdev_ioctl
spec_ioctl
VOP_IOCTL
rf_sync_component_caches
raidioctl
bdev_ioctl
spec_ioctl
VOP_IOCTL
wapbl_cache_sync

Is it a nasty interaction between RAIDframe, NFS and WAPBL?

My suggestion is to try:

sysctl -w vfs.wapbl.flush_disk_cache=0

for now...

christos




Re: high load, no bottleneck

2013-09-17 Thread Christos Zoulas
On Sep 17,  9:48pm, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: high load, no bottleneck

| Excellent: the load does not go over 2 now (compared to 50).
| 
| Thank you for saving my day. But now what happens?
| I note the SATA disks are in IDE emulation mode, and not AHCI. This is
| something I need to try changing:

What happens depends heavily on the drive (how frequently it flushes
its cache to disk internally and how long it keeps data in cache),
but it is never good. The best-case scenario would be that WAPBL
writes are ordered properly and that a cache flush is only sent
occasionally between transactionally safe metadata commit points, but
it seems that this is not happening (because we are getting too many
flushes).

The case to worry about is the scenario where the machine
suddenly loses power, the data never makes it to the physical media,
and gets lost from the cache. In this case you might end up with a
filesystem that has inconsistent metadata, so the next reboot might
end up causing a panic when the filesystem is used. The solution there
is to reboot and force an fsck. If you have a UPS I would not worry
too much about it; even if your system panics the kernel should issue
the flush commands to the disk.

BTW I hope that everyone realizes that WAPBL deals only with metadata
and not the actual file data, so if you crash/lose power you typically
end up with garbage in the active files (usually bits and pieces of files
from other files, or NULs).

christos


Re: high load, no bottleneck

2013-09-17 Thread Christos Zoulas
On Sep 18,  2:22am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: high load, no bottleneck

|  The case to worry about is the scenario where the machine
|  suddenly loses power, the data never makes it to the physical media,
|  and gets lost from the cache. In this case you might end up with a
|  filesystem that has inconsistent metadata, so the next reboot might
|  end up causing a panic when the filesystem is used. The solution there
|  is to reboot and force an fsck. 
| 
| It seems the system would be better without WAPBL enabled in this case.
| Is there any benefit left?

On large filesystems with many files fsck can take a really long time after
a crash. In my personal experience power outages are much less frequent than
crashes (I crash quite a lot since I always fiddle with things). If you
don't care about fsck time, you don't need WAPBL. Another easy thing you can
try is to put the WAPBL log in a flash drive and re-enable the cache flushes.

christos


Re: NFS over-quota not detected if utimes() called before fsync()/close()

2013-08-04 Thread Christos Zoulas
In article 20130731222303.gj96...@trav.math.uni-bonn.de,
Edgar Fuß  e...@math.uni-bonn.de wrote:
 Yes, I believe you are right. Return an error for all errors.
Any idea what the intent of only catching EINTR was?

The flawed logic of:

If the write fails for any reason other than being interrupted
by the user, why not at least succeed in changing the permissions?

christos



Re: NFS over-quota not detected if utimes() called before fsync()/close()

2013-07-31 Thread Christos Zoulas
In article 20130730211200.gd96...@trav.math.uni-bonn.de,
Edgar Fuß  e...@math.uni-bonn.de wrote:
 I think the problem is in nfs_setattr(), sys/nfs/nfs_vnops.c:681,
 where files are flushed before setattr because a later write of
 cached data might change timestamps or reset sugid bits, but the
 only return value of nfs_vinvalbuf() that's treated as an error is
 EINTR. Why?
Any comments on this?
We are losing mail because of this problem so I would like to get it fixed.

Yes, I believe you are right. Return an error for all errors.

christos
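
In sketch form, the change amounts to propagating every flush error
instead of singling out EINTR. The function-pointer stubs and the name
sketch_setattr() below are hypothetical and only model the control flow;
they are not the nfs_setattr() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flush and attribute-change steps. */
typedef int (*step_fn)(void);

static int
sketch_setattr(step_fn flush, step_fn setattr)
{
	int error;

	assert(flush != NULL && setattr != NULL);
	error = flush();
	if (error != 0)		/* previously only EINTR stopped here */
		return error;
	return setattr();
}

/* Example steps: one that succeeds and one that reports over-quota. */
static int step_ok(void) { return 0; }
static int step_edquot(void) { return EDQUOT; }
```

With the EINTR-only test, an over-quota failure from the flush would be
dropped and the caller would see success from the attribute change.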




Re: ibcs2 syscalls.master problem

2013-06-26 Thread Christos Zoulas
In article 51c9db37.1090...@netbsd.org, Jeff Rizzo  r...@netbsd.org wrote:
The last time sys/compat/ibcs2/syscalls.master was edited [1] (July 
2010), the dependent files were not regenerated.  There was at least one 
typo (fixed), but there are also duplicate syscall names, which cause 
the generated files to break the i386 build. Can someone who knows 
what's what fix this, so the resulting files work?  I did notice that 
FreeBSD's ibcs2 emulation has more info on at least one of the syscalls.

Thanks,
+j

[1] http://mail-index.netbsd.org/source-changes/2010/07/23/msg011989.html

I think FreeBSD does not have ibcs2 emulation anymore...

christos



Re: DTrace syscall provider - please test/comment

2013-06-25 Thread Christos Zoulas
On Jun 24,  6:12pm, m...@3am-software.com (Matt Thomas) wrote:
-- Subject: Re: DTrace syscall provider - please test/comment

| 
| On Jun 24, 2013, at 6:01 PM, Christos Zoulas chris...@astron.com wrote:
| 
|  Can't this be done as an addition/enhancement to the trace_enter()/
|  trace_exit() facility instead of having to enter each syscall entry?
| 
| that only gets called if p->p_trace_enabled is set.  So now you need
| a hook to set that on every lwp switch if the provider is tracing.

Right, and it (dtrace) can set a different (or the same flag) to enable
it.

christos


Re: DTrace syscall provider - please test/comment

2013-06-25 Thread Christos Zoulas
On Jun 25,  9:32am, m...@3am-software.com (Matt Thomas) wrote:
-- Subject: Re: DTrace syscall provider - please test/comment

| 
| On Jun 25, 2013, at 5:25 AM, chris...@zoulas.com (Christos Zoulas) wrote:
| 
|  On Jun 24,  6:12pm, m...@3am-software.com (Matt Thomas) wrote:
|  -- Subject: Re: DTrace syscall provider - please test/comment
|  
|  | 
|  | On Jun 24, 2013, at 6:01 PM, Christos Zoulas chris...@astron.com wrote:
|  | 
|  |  Can't this be done as an addition/enhancement to the trace_enter()/
|  |  trace_exit() facility instead of having to enter each syscall entry?
|  | 
|  | that only gets called if p->p_trace_enabled is set.  So now you need
|  | a hook to set that on every lwp switch if the provider is tracing.
|  
|  Right, and it (dtrace) can set a different (or the same flag) to enable
|  it.
| 
| 
| How does it set the same flag since that's per-proc and will need to changed
| on context switch.  
| 
| A different flag is more overhead per syscall.

I am trying to balance that against adding of two more conditionals per
syscall per architecture and touching dozens of source files adding the
same code in each one. Perhaps the syscall_plain/syscall_fancy idea
was not that bad after all :-( Perhaps a different bit on the same flag.
If any of them is set, you call trace enter, and you clear/move the
bit on context switch.
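
A minimal userland sketch of the "different bit on the same flag" idea, for illustration only. The names TRACE_PTRACE, TRACE_DTRACE, struct fake_proc and both helpers are hypothetical and are not the kernel's actual fields or functions; the point is just that the syscall path keeps a single test while the provider bit is adjusted at context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bits sharing one per-process flag word. */
#define TRACE_PTRACE 0x01u	/* set by the existing tracing facility */
#define TRACE_DTRACE 0x02u	/* set while the DTrace provider is active */

struct fake_proc {
	unsigned int trace_flags;
};

/* One conditional on the syscall path, whichever consumer is active. */
static bool
need_trace_enter(const struct fake_proc *p)
{
	return p->trace_flags != 0;
}

/* On context switch, make the incoming process reflect the provider state. */
static void
switch_to(struct fake_proc *next, bool provider_active)
{
	assert(next != NULL);
	if (provider_active)
		next->trace_flags |= TRACE_DTRACE;
	else
		next->trace_flags &= ~TRACE_DTRACE;
}
```

Whether the per-switch update is cheaper than a second per-syscall conditional is exactly the trade-off under discussion; the sketch does not settle it.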

christos


Re: NetBSD/avr32

2013-05-18 Thread Christos Zoulas
In article caev1cwcb1-+eyu+enmdq9omh8tzuanu8k4spurxw8uvpwfa...@mail.gmail.com,
Tomas Niño Kehoe  tomasninoke...@gmail.com wrote:
-=-=-=-=-=-

Hi all,

I'd like to announce the existence of a NetBSD port to the AVR32 processor
architecture.
This port is being developed in the context of my engineering thesis at the
University of Buenos Aires, Argentina. It is directed by Leandro Santi.

Looks like you've made a lot of progress. You might be interested in looking
at https://wiki.freebsd.org/FreeBSD/avr32

christos



Re: revert broken O_SEARCH

2013-01-13 Thread Christos Zoulas
In article 74e9a033-b75c-45b8-beee-a7380baa8...@gmail.com,
Garrett Cooper  yaneg...@gmail.com wrote:
On Jan 13, 2013, at 12:59 AM, Martin Husemann wrote:

 On Sun, Jan 13, 2013 at 08:49:06AM +, David Holland wrote:
 Nope, don't have that kind of setup and atf is way too invasive to
 allow just building the test programs somewhere else.
 
 ATF is available from pkgsrc and straightforward to install, so I tried
 on FreeBSD 9, but they do not have O_SEARCH:

   ATF is available on FreeBSD CURRENT (and will be the de facto test
infrastructure for FreeBSD) as of last October and we're working towards
having a sane set of wrapper Makefiles for producing tests (based
largely on what jmmv did for NetBSD, but divergent because the build
systems are divergent).

Would it make sense then to provide the tests as a shared separate repository
managed by both projects? I think a large number of the tests can be shared.

christos



Re: revert broken O_SEARCH

2013-01-13 Thread Christos Zoulas
In article c96ccc7b-08cc-4404-b567-1a5aa2c02...@gmail.com,
Garrett Cooper  yaneg...@gmail.com wrote:

   This is what I would like to happen (similar to LTP with Linux), but it
hasn't yet because of other items on my priority list of things to do.
But, I would really like working with someone at NetBSD (and hopefully
eventually DragonFlyBSD and OpenBSD) to make this a reality. Versioning
would be the only difficult thing that would need to be properly thought
out because functional requirements change over time, and as such some
tests may or may not apply (or the requirements may be different)
between multiple OS distro versions.

Fine, just let me know how I can help :-)

christos



Re: Porting FreeBSD drm2 driver

2013-01-12 Thread Christos Zoulas
In article 20130112154830.gc22...@falu.nl, Rhialto  rhia...@falu.nl wrote:
I just noticed that FreeBSD's new 9.1 release has Kernel Mode Setting:

The drm2(4) Intel GPU driver, which supports GEM and KMS and works with
new generations of GPUs such as IronLake, SandyBridge, and IvyBridge,
has been added. The agp(4) driver now supports SandyBridge and IvyBridge
CPU northbridges.[r236926, r236927, r239965]

(from http://www.freebsd.org/releases/9.1R/relnotes.html)

It seems that it is practically impossible to buy a PCI graphics card
these days for which X doesn't require KMS. This is going to make the use
of NetBSD with X impossible at a very rapid rate.

Even the card that I thought was supported properly (Radeon HD
5450-based) isn't - X claims no acceleration. I haven't tried Xv yet
(needed to view video) but I fear the worst.

Is anybody by any chance working on porting this FreeBSD driver to NetBSD?

We are painfully aware of this and we are commissioning work to remedy the
situation.

christos



Re: Importing lua(4), but where in the source tree?

2013-01-09 Thread Christos Zoulas
In article de1aa2ee-2a57-4255-9f6c-84b240596...@msys.ch,
Marc Balmer  m...@msys.ch wrote:

Am 09.01.2013 um 16:28 schrieb matthew green m...@eterna.com.au:

 
 I want to import the lua(4) device driver, which is currently a
module only, which seems wrong.
 
 Is sys/dev/lua/ a good place?
 
 can you give a little more details on what is included?

Sure. The full diff is at
http://www.netbsd.org/~mbalmer/diffs/kernel_lua_010.diff and it's the
files that the diff now places in sys/modules/lua/ that I think should
better go to sys/dev/lua/

 
 at a guess, if there are more than a couple of files
 then sys/dev/lua is an OK place, otherwise just sys/dev
 seems reasonable to me.

- Marc


Fine (with sys/dev/lua), but:

-#define fputs(s, f)	printf(s)
+#define fputs(s, f)	printf("%s", s)

And perhaps:
-#define realloc(ptr, nsize)	kmem_alloc(nsize, KM_SLEEP)
-#define free(ptr)		kmem_free(ptr, osize)
+#define realloc(ptr, nsize)	kern_realloc(ptr, nsize, KM_SLEEP)
+#define free(ptr)		kern_free(ptr)

And:
-char *strncat(char *dst, const char *src, size_t n);
-size_t strspn(const char *s, const char *charset);
-size_t strcspn(const char *s, const char *charset);
-char *strpbrk(const char *s, const char *charset);
+char *strncat(char *, const char *, size_t);
+size_t strspn(const char *, const char *);
+size_t strcspn(const char *, const char *);
+char *strpbrk(const char *, const char *);

And where does the luapmf driver go? in sys/dev/lua/ or sys/dev/lua/pmf?
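
On the fputs() change above: passing the string as the format lets any '%' in it be interpreted as a conversion. A small userland sketch of the safe form, assuming snprintf() stands in for the kernel printf(); emit_safe() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the text s into buf with s passed as data, so any '%' it
 * contains is copied literally rather than interpreted.
 * Returns the number of characters that the full string needs.
 */
static int
emit_safe(char *buf, size_t len, const char *s)
{
	assert(buf != NULL && s != NULL);
	return snprintf(buf, len, "%s", s);
}
```

With the unsafe form, a string such as "100%d done" would make printf() fetch an int argument that was never passed.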

christos



Re: WAPBL and write cacheing (was: SATA write performance problems)

2013-01-03 Thread Christos Zoulas
In article cajcb3foogeoiw_xgnrkovykigab+vf8jsiffiz8zgpyjrij...@mail.gmail.com,
Andy Ruhl  acr...@gmail.com wrote:
On Thu, Jan 3, 2013 at 4:54 AM, Lars Heidieker
lars.heidie...@googlemail.com wrote:
 On Thu, Jan 3, 2013 at 12:49 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
 Doesn't this depend on filesystem journaling?
 Can someone please enlighten me?
 Is it safe to use write cacheing on a SATA drive with FFS/WAPBL on it?

 AFAIK it depends on the drive, if it doesn't lie about the command to
 flush the cache it's safe.
 WAPBL sends such a command on commit.

So I think what you are saying is that WAPBL asks the drive to flush
its volatile cache before the journal update is done?

There was talk a while back on some list (I don't remember if it was a
NetBSD list) that certain OS behavior (maybe not NetBSD) was flushing
cache so often that drive cache performance benefits were essentially
negated. So the drive would ignore some of the cache requests which
leaves systems using journaling vulnerable. The fail-safe was to just
turn off cache completely.

Actually it was the addition of:

sysctl -w vfs.wapbl.flush_disk_cache=0

and the discussion on the actual behavior of various cache flushing
commands on different types of buses and drives.

christos



Re: fixing compat_12 getdents

2012-12-10 Thread Christos Zoulas
In article 20121210195346.ga8...@apb-laptoy.apb.alt.za,
Alan Barrett  a...@cequrux.com wrote:
 also, EINVAL doesn't seem like a great error code for this 
 condition.  it's not an input parameter that's causing the 
 error, but rather that the required output format cannot express 
 the data to be returned.  I think solaris uses EOVERFLOW for 
 this kind of situation, and ERANGE doesn't seem too bad either. 
 any opinions on that?

There's also E2BIG, but I don't think it fits.  ERANGE is 
documented in terms of the available space, while EOVERFLOW is 
documented in terms of a numeric result.  So perhaps EOVERFLOW 
for "integer is too large to fit in N bits", and ERANGE for 
"string is too long to fit in N bytes"?  Or vice versa?
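
A hedged sketch of the first of the two assignments floated above (EOVERFLOW for a number that does not fit, ERANGE for a string that does not fit). The helpers narrow_u64() and store_name() are hypothetical and only illustrate the distinction; they are not the compat_12 getdents code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Narrow a 64-bit value into a 32-bit field; EOVERFLOW if it does not fit. */
static int
narrow_u64(uint64_t in, uint32_t *out)
{
	assert(out != NULL);
	if (in > UINT32_MAX)
		return EOVERFLOW;
	*out = (uint32_t)in;
	return 0;
}

/* Copy a name into a fixed-size field; ERANGE if it is too long. */
static int
store_name(char *dst, size_t dstlen, const char *src)
{
	size_t n = strlen(src);

	if (n >= dstlen)
		return ERANGE;
	memcpy(dst, src, n + 1);	/* include the terminating NUL */
	return 0;
}
```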

Somebody(TM) should go through the errno(2) documentation and make 
the descriptions more generic, and add guidance for choosing which 
code to return.

We need to be careful here because the set of errnos returned by
many syscalls is fixed by POSIX etc.

christos



Re: Making forced unmounts work

2012-12-08 Thread Christos Zoulas
In article 31490263-5a8e-411a-bb57-f7fc5cffc...@eis.cs.tu-bs.de,
J. Hannken-Illjes hann...@eis.cs.tu-bs.de wrote:
The more I think the more I just want to remove forced unmounts.

I think that any operation that cannot be undone (and requires reboot
to be undone) makes the OS less resilient to failure.

To take some examples:

- A hard,nointr NFS mount hanging because the server stops responding.

Even if it were possible to use fstrans_ here (and it would become ugly)
it would not help.  The root node of the mount will likely be locked by the
first thread trying to lookup so unmount won't be able to even lookup
the mount point.  If it were possible to run `mount -u' or `unmount' it
should be possible to update the mount as `soft,intr' and proceed as usual,
kill threads blocking an unmount and unmount.

Store the normalized mount path with the mountpoint, look it up in the mount
list, make all blocked threads give an I/O error on the current operation,
etc.

christos



Re: fexecve, round 3

2012-11-25 Thread Christos Zoulas
In article 20121125152520.ga17...@panix.com,
Thor Lancelot Simon  t...@panix.com wrote:
On Sat, Nov 24, 2012 at 06:53:16PM +0100, Emmanuel Dreyfus wrote:
 Let's try to move forward, and I will start with a summary of what I
 understand from the standard. It would be nice if we could at least
 reach consensus on standard interpretation.

I think your interpretation of the standard is correct.  The
particularly problematic part is:

 O_EXEC is mutually exclusive with O_RDONLY, O_WRONLY, or O_RDWR

This -- along with the basic shift from checking permissions when a handle
to an object is obtained to checking them when it's used -- is exemplary of
the poor design that seems to have gone into this set of features.

 Does everyone agree on this interpretation? If we do, next steps are
 - describe threats this introduces to chrooted processes
 - decide if they are acceptable and if they are not, propose mitigation.

I think you left out part of the solution space:

 - simply don't include this poorly-designed functionality in NetBSD.

Unless you want to change O_RDONLY to be non-zero and version all
the syscalls that use it :-)
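
The reason O_RDONLY being zero matters can be sketched in a few lines of userland C. This assumes a platform where O_RDONLY is 0 (true of the BSDs and Linux); the helper names are hypothetical. Because the value is 0, "O_EXEC" and "O_EXEC | O_RDONLY" are the same flag word, so the kernel cannot tell whether the caller violated the mutual-exclusion rule.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Wrong: with O_RDONLY == 0 this mask test can never be true. */
static bool
is_rdonly_by_mask(int flags)
{
	return (flags & O_RDONLY) != 0;
}

/* Usual idiom: compare the access-mode field as a whole. */
static bool
is_rdonly(int flags)
{
	return (flags & O_ACCMODE) == O_RDONLY;
}
```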

christos



Re: WAPL panic

2012-11-06 Thread Christos Zoulas
In article 20121106221628.gl22...@trav.math.uni-bonn.de,
Edgar Fuß  e...@math.uni-bonn.de wrote:
So, while investigating my WAPBL performance problems, it looks like I can 
crash the machine (not reliably, but more often than not) with a simple
   seq 1 3000 | xargs mkdir
command. I get the following backtrace in ddb (wetware OCR):

panic: wapbl_register_deallocation: out of resources
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8016f01d cs 8 rflags 246 cr2
80011fc2d000 cpl 0 rsp fe811e0fe6f0
Stopped in pid 12551.1 (mkdir) at  netbsd:breakpoint+0x5:  leave
db{3} bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
wapbl_register_inode() at netbsd:wapbl_register_inode
ffs_truncate() at netbsd:ffs_truncate+0x917
ufs_direnter() at netbsd:ufs_direnter+0x481
ufs_mkdir() at netbsd:ufs_mkdir+0x617
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdir() at netbsd:do_sys_mkdir+0x10f
syscall() at netbsd:syscall+0xc4

It's unreasonable to take a dump because that would take an estimated four 
to five hours. Is there any reasonable way to get a dump out of a 16G box?

Try to get a sparse dump via machdep.sparse_dump=1

christos



Re: ETHERCAP_* ioctl()

2012-10-31 Thread Christos Zoulas
In article 5090fc73.4060...@execsw.org,
Masanobu SAITOH  msai...@execsw.org wrote:
 Hi, all.

 I sent the following mail more than two years ago.

 http://mail-index.netbsd.org/tech-kern/2010/07/28/msg008613.html

 As the starting point to solve this problem, I committed the change to
add SIOCGETHERCAP stuff.

 Example:
 msk0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
 ec_capabilities=5<VLAN_MTU,JUMBO_MTU>
 ec_enabled=0
 address: 00:50:43:00:4b:c5
 media: Ethernet autoselect
 status: no carrier
 wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
 enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
 ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
 ec_enabled=0
 address: 00:1b:21:58:68:34
 media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
 status: active
 inet 192.168.1.5 netmask 0xff00 broadcast 192.168.1.255
 inet6 fe80::21b:21ff:fe58:6834%wm0 prefixlen 64 scopeid 0x2
 inet6 2001:240:694:1:21b:21ff:fe58:6834 prefixlen 64


 What do you think about this output?

Very nice!

christos



Re: suenv

2012-10-23 Thread Christos Zoulas
In article c75a84166056c94f84d238a44af9f6ad277...@ausx10mpc103.amer.dell.com,
 paul_kon...@dell.com wrote:

But apache is security critical, isn't it?  And it certainly is
threaded.  Or are you applying the term security critical only to a
smaller set of components?  

Yes, but apache is designed to be threaded. login, su, and other
pam users not necessarily. Typically programs know the closure
of shared libraries that they can potentially use, and PAM breaks
that model. The threaded/non-threaded case is a particularly nasty
example, where a program might assume that it can use static storage
and non-threaded interfaces (res_foo() instead of res_nfoo(),
getdbfoo() instead of getdbfoo_r()) and then suddenly it finds
itself in a threaded environment with potential heisenbugs. In the
apache case these may affect only the apache user and whatever
access it has, but in the login/su and other PAM users' cases this
leads to a complete system compromise.
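
The static-storage hazard mentioned above can be shown with a small userland sketch. Both functions are hypothetical illustrations of the foo()/foo_r() pattern, not real library interfaces: the first returns a pointer into shared static storage, the second writes into storage the caller owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Non-reentrant style: the result lives in static storage shared by all callers. */
static const char *
uid_label(unsigned int uid)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "uid=%u", uid);
	return buf;
}

/* Reentrant style: the caller supplies the storage. */
static const char *
uid_label_r(unsigned int uid, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "uid=%u", uid);
	return buf;
}
```

Two calls to uid_label() return the same pointer, so the second call silently overwrites the first result. In a single-threaded program that is merely awkward; once a PAM module pulls in threads, two threads can race on that buffer.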

christos



Re: suenv

2012-10-23 Thread Christos Zoulas
In article 20121023162142.gb24...@panix.com,
Thor Lancelot Simon  t...@panix.com wrote:

Nasty hacks like subverting the protection against LD_PRELOAD
on setuid executables are not called for in a case like this.
If we resort to them, why should our users trust us to deliver
quality software?  If you want the wild west, you can find
Debian's openssl patches over there ->.

Not that I advocate doing that (and I will not provide the
recipe to do it), but if you want to always load libpthread
you can do so via ld.so.conf(5). Resist the temptation :-)

christos



Re: fixing zfs

2012-10-14 Thread Christos Zoulas
In article 20121014193635.6ccf360...@jupiter.mumble.net,
Taylor R Campbell  campbell+netbsd-tech-k...@mumble.net wrote:
-=-=-=-=-=-

The attached patches fix a lot of issues in our zfs port mainly
having to do with locking and our (insane) vop protocols.  With it,
many of the zfs tests pass much more reliably, although there remain a
number that still fail, mainly having to do with permissions and file
flags.

Currently zfs is badly hosed, and I am pretty confident that at least
the intent of these patches is correct even if I have made some
mistakes in the details.  So I strongly doubt whether committing these
patches would make our zfs situation any worse than it currently is.

Any objections?  If not, I'll commit them tomorrow or in the next few
days.

None from here.

christos



Re: 5.1 vs gdb

2012-10-14 Thread Christos Zoulas
In article 201210142318.taa11...@sparkle.rodents-montreal.org,
Mouse  mo...@rodents-montreal.org wrote:
I've run into an issue with gdb on 5.1, and ktrace leads me to think
it's likely a kernel issue (hence this list).  It wouldn't surprise me
too much if I were wrong, though; feel free to point me elsewhere if
appropriate.

The surface manifestation is straightforward:

% cat gdbtest.c
int main(void);
int main(void)
{
 return(0);
}

Fixed in 6. You'll need all my sys/ commits around 2011-08-20 - 2011-09-05

christos



Re: pass-through linux ioctl for mfi(4)

2012-09-18 Thread Christos Zoulas
On Sep 19, 12:38am, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| Hello,
| so it seems we can't do much better in compat_linux.
| Here's an updated patch, which checks the size before malloc in mfiioctl(),
| and I also removed a debug printf in compat_linux.
| I intend to commit this next weekend.

Fine with me.

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-17 Thread Christos Zoulas
On Sep 17,  2:49pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| I agree, but I don't know how to do this (is there a better way than
| hardcoding mfi's major number in compat_linux), can you give details on how
| you would do this ?

devsw_name2{blk,chr}("mfi", NULL, 0), but that could be expensive. Perhaps
do it once? But what about hotplug?

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-17 Thread Christos Zoulas
On Sep 17,  5:47pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| But this assumes that the mfi driver is compiled in. it doesn't
| look right, especially in the context of modules.

It works for modules (which is the reason we cannot cache the result).

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-17 Thread Christos Zoulas
On Sep 17,  6:08pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| Sorry but I can't see how a kernel with COMPAT_LINUX but without
| mfi would compile.

You get the major by name using "mfi"...

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-17 Thread Christos Zoulas
On Sep 17,  8:42pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| On Mon, Sep 17, 2012 at 02:31:35PM -0400, Christos Zoulas wrote:
|  On Sep 17,  6:08pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
|  -- Subject: Re: pass-through linux ioctl for mfi(4)
|  
|  | Sorry but I can't see how a kernel with COMPAT_LINUX but without
|  | mfi would compile.
|  
|  You get the major by name using "mfi"...
| 
| I was talking about Robert's solution, which needs the mfiioctl address.
| 
| I looked a bit at this and it's really not straightforward. We need to
| get the vp to have the information we need, so we're basically re-writing
| some of sys/kern's code ...
| Do you have a better way to get the file's type and major ?

file's f_type should be DTYPE_VNODE, then f_data points to the vnode...
This whole thing is too complicated. Perhaps add dlsym() in the kernel
to do dlsym("mfiioctl")? :-)

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-17 Thread Christos Zoulas
On Sep 17,  8:42pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| On Mon, Sep 17, 2012 at 02:30:03PM -0400, Christos Zoulas wrote:
|  On Sep 17,  5:47pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
|  -- Subject: Re: pass-through linux ioctl for mfi(4)
|  
|  | But this assumes that the mfi driver is compiled in. it doesn't
|  | look right, especially in the context of modules.
|  
|  It works for modules (which is the reason we cannot cache the result).
| 
| But mfi's major won't change, it's independent of the driver being
| present or not, isn't it ?

Right. One should assume so (until we get devfs at least).

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-17 Thread Christos Zoulas
On Sep 17,  9:22pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
-- Subject: Re: pass-through linux ioctl for mfi(4)

| I agree it's too complicated. Couldn't we just keep the dispatch based on
| com then ?

Let's leave it as it is.

christos


Re: pass-through linux ioctl for mfi(4)

2012-09-16 Thread Christos Zoulas
In article 20120916132322.ga6...@antioche.eu.org,
Manuel Bouyer  bou...@antioche.eu.org wrote:
Hello,
the attached patch adds a pass-through ioctl interface, with the
necessary linux compat code, for mfi(4). This allows running the
linux binary of the MegaCLI tool provided by LSI logic.
Adding support for the FreeBSD binaries should be easy, once
the COMPAT_FREEBSD is updated to run recent binaries.
(I found that running a 9265-8i without MegaCLI has lots of
limitations, e.g. you have to reboot and enter firmware to start
reconstruction after a disk replacement).

One problem is that the key conflicts with the ossaudio ioctl.
What I've done is that I explicitly test for the mfi ioctls in
linux_ioctl.c. Does anyone see a better way of handling this?
More generally, does anyone have any comments about this code?

Where is the patch?

christos



Re: CVS commit: src

2012-09-12 Thread Christos Zoulas
On Sep 12,  4:04pm, mar...@duskware.de (Martin Husemann) wrote:
-- Subject: Re: CVS commit: src

| On Wed, Sep 12, 2012 at 01:00:52PM +, Christos Zoulas wrote:
|  This is orthogonal. I believe that in the discussion we had in core
|  we decided to not define _UC_TLSBASE unconditionally, and that ports
|  should define it as needed.
| 
| What does as needed mean here? Can you show an example of an arch not
| needing it?

I don't have one.

christos


Re: freebsd binary and kern.usrstack

2012-09-12 Thread Christos Zoulas
In article 20120912202823.ga5...@antioche.eu.org,
Manuel Bouyer  bou...@antioche.eu.org wrote:
Hello,
I'm trying to run a FreeBSD binary under emulation, but it dies in this
piece of code:
   if (sysctl(mib, 2, &_usrstack, &len, NULL, 0) == -1)
   PANIC("Cannot get kern.usrstack from sysctl");

(this is in FreeBSD's src/lib/libthr/thread/thr_init.c).

Is there something that can be done about it easily ?
And, BTW, do we support FreeBSD threaded binaries ?

sysctl kern.smp.cpus may also be needed ...

These are simply implemented as emul.freebsd.kern.smp.cpus etc. See
how this is done for linux. Yes, there is no support for amd64 binaries,
but it is pretty easy to add. It is more work to get all the new syscalls
in place. As I mentioned before, if someone can give me a set of binaries
and libraries, I can take a whack at it.

christos



Re: quotactl permissions

2012-09-05 Thread Christos Zoulas
In article 20120905123416.gb10...@homeworld.netbsd.org,
Emmanuel Dreyfus  m...@netbsd.org wrote:
On Wed, Sep 05, 2012 at 06:37:27AM +, David Holland wrote:
 Changing it to effective uid seems like a good plan.

The change below fixes the test case. Is it safe to commit?

Yes, but it should all be encapsulated in the kauth call. It is an abstraction
violation to do the id check separately.

christos


Index: sys/ufs/ufs/ufs_quota.c
===
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.111
diff -U4 -r1.111 ufs_quota.c
--- sys/ufs/ufs/ufs_quota.c 26 Aug 2012 02:32:14 -  1.111
+++ sys/ufs/ufs/ufs_quota.c 5 Sep 2012 12:33:07 -
@@ -334,9 +334,9 @@
 /* XXX shouldn't all this be in kauth ? */
 static int
 quota_get_auth(struct mount *mp, struct lwp *l, uid_t id) {
/* The user can always query about his own quota. */
-   if (id == kauth_cred_getuid(l->l_cred))
+   if (id == kauth_cred_geteuid(l->l_cred))
return 0;
return kauth_authorize_system(l-l_cred, KAUTH_SYSTEM_FS_QUOTA,
KAUTH_REQ_SYSTEM_FS_QUOTA_GET, mp, KAUTH_ARG(id), NULL);
 }


-- 
Emmanuel Dreyfus
m...@netbsd.org






Re: [PATCH] swapcontext vs libpthread

2012-08-25 Thread Christos Zoulas
On Aug 25,  7:00am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [PATCH] swapcontext vs libpthread

|  FIX: ./alpha/gen/swapcontext.S:  CALL(setcontext)	/* setcontext(ucp) */
| 
| That one seems already fine to me. The CALL macro is here to invoke a function
| Am I wrong?

CALL() is good.

| 
|  FIX: ./hppa/gen/swapcontext.S:   SYSCALL(setcontext)
| 
| If I try to steal from resumecontext, I would do this. Does it make sense?
| 
| #ifdef PIC 
| ldw     HPPA_FRAME_EDP(%sp), %r19
| addil   LT%_C_LABEL(setcontext), %r19
| ldw     RT%_C_LABEL(setcontext)(%r1), %r1
| #else
| ldil    L%_C_LABEL(setcontext), %r1
| ldo     R%_C_LABEL(setcontext)(%r1), %r1
| #endif

Yes, that loads the address to %r1, you'll need to call afterwards.

|  
|  FIX: ./mips/gen/_resumecontext.S:SYSTRAP(setcontext)	# yes, become it.
|  FIX: ./mips/gen/swapcontext.S:   SYSTRAP(setcontext)
| 
| I would do this:
|   PIC_TAILCALL(setcontext)

I guess.

| 
|  FIX?: ./sh3/gen/swapcontext.S:mov.l   .L_setcontext, r2
|  FIX?: ./sh3/gen/swapcontext.S:2:  CALL r2  /* setcontext(ucp) */
| 
| There is this later in the file, therefore I would say it is okay.
| .L_setcontext:  CALL_DATUM(_C_LABEL(setcontext), 2b)

Ok.

|  FIX: ./sparc/gen/swapcontext.S:  mov SYS_setcontext|SYSCALL_G2RFLAG, %g1
|  FIX: ./sparc64/gen/swapcontext.S:  mov SYS_setcontext|SYSCALL_G2RFLAG, %g1
| 
| I would do this:
|   call _C_LABEL(setcontext)

Sure.

| 
|  FIX: ./powerpc64/gen/swapcontext.S:  bl  .setcontext	# setcontext(ucp)
| 
| Here it seems to be:
| bl  PIC_PLT(_C_LABEL(setcontext))
| 

Ok, sounds good. Portmasters, please chime in!

christos


Re: [PATCH] swapcontext vs libpthread

2012-08-25 Thread Christos Zoulas
On Aug 25,  9:10am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [PATCH] swapcontext vs libpthread

| On Sat, Aug 25, 2012 at 03:10:51AM -0400, Christos Zoulas wrote:
| [Call a C function from hppa assembly]
|  Yes, that loads the address to %r1, you'll need to call afterwards.
| 
| It seems to be done with bv,n %r0(%r1). I understand bv
| is branch-something, %r1 makes sense, but I am not sure about %r0.

Typically %r0 == 0 on RISC...

| I would have something like this:
| 
| #ifdef PIC
| ldw     HPPA_FRAME_EDP(%sp), %r19
| addil   LT%_C_LABEL(setcontext), %r19
| ldw     RT%_C_LABEL(setcontext)(%r1), %r1
| #else
| ldil    L%_C_LABEL(setcontext), %r1
| ldo     R%_C_LABEL(setcontext)(%r1), %r1
| #endif
| bv,n    %r0(%r1)
| 

I am not that familiar with PA-RISC assembly, but I guess the tests
will fail if this is incorrect :-)

| I will prepare a patch with all libc proposed changes in about 
| 18 hours.

Sounds good.

christos


Re: [PATCH] swapcontext vs libpthread

2012-08-23 Thread Christos Zoulas
In article 20120822170050.gj2...@homeworld.netbsd.org,
Emmanuel Dreyfus  m...@netbsd.org wrote:
-=-=-=-=-=-

Here is an updated patch for sorting out swapcontext with libpthread,
with documentation and test cases.

I would appreciate feedback on LWP_PRESERVETLS flag to _lwp_create().
This tells the kernel that the TLS base register will be used by 
libpthread and that setcontext() should leave it untouched. 

This is done in kernel because it seems to be the easiest way: 
another approach would be to have libpthread overriding setcontext(),
but that seems a bad choice: after unsetting _UC_TLSBASE it needs
to call the real setcontext, which means doing a system call from
libpthread. That looks wrong.

Why do you say that? pthread_cancelstub.c does exactly this (wrapping
a syscall and calling it) all the time. I don't think we should be
getting the kernel involved with this.

christos



Re: [RFC][PATCH] _UC_TLSBASE for all ports

2012-08-11 Thread Christos Zoulas
In article 20120810173818.ga8...@britannica.bec.de,
Joerg Sonnenberger  jo...@britannica.bec.de wrote:
On Fri, Aug 10, 2012 at 07:31:59PM +0200, Emmanuel Dreyfus wrote:
 Joerg Sonnenberger jo...@britannica.bec.de wrote:
 
  I maintain that trying to move contexts between threads is an inherently
  bad idea and that it is a very inefficient interface for implementing
  coroutines. I object to this change for the sake of misdesigned
  software.
 
 Did you look at that test case? This is a nasty bug, and I think we need
 a way for user to opt out of that behavior.

I don't agree with it being a nasty bug. Heck, document it as limitation
if you want. But essentially don't mix *context and pthread in this way,
it will create other interesting issues later.

Like it or not most of the world has turned into linux. We can either
provide compatibility where possible (and not overly disgusting) to
gain compatibility with 3rd party code developed for linux, or simply
say tough, it will not work on NetBSD because we refuse to compromise.

It is a slippery slope, but I think in this case it is wise to bend.
If we cannot reach agreement here, consult core.

christos



Re: [RFC][PATCH] _UC_TLSBASE for all ports

2012-08-11 Thread Christos Zoulas
On Aug 11, 11:16am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports

| Christos Zoulas chris...@astron.com wrote:
| 
|  Like it or not most of the world has turned into linux. We can either
|  provide compatibility where possible (and not overly disgusting) to
|  gain compatibility with 3rd party code developed for linux, or simply
|  say tough, it will not work on NetBSD because we refuse to compromise.
| 
| IMO here it is even a worse stance, since we already have the desired
| fix for amd64, i386, m68k, mips, vax, and hppa. It would be more like:
| it will work on NetBSD, except for sh3, sparc, and sparc64 ports
| because we refuse to compromise. And on powerpc and alpha it will work
| but with different interfaces. 
| 
| We are supposed to focus on portability, it is really weird to argue
| that a feature should have a MI interface.

I don't see why this change is met with so much resistance...
I could believe that, if the change suggested to make this the default
behavior (which some would argue it should be...)

christos


Re: [RFC][PATCH] _UC_TLSBASE for all ports

2012-08-11 Thread Christos Zoulas
On Aug 11, 12:40pm, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports

| Christos Zoulas chris...@zoulas.com wrote:
| 
|  I could believe that, if the change suggested to make this the default
|  behavior (which some would argue it should be...)
| 
| In an ideal world, we would set _UC_TLSBASE by default (as it is today,
| except on powerpc), and we would automatically ignore _UC_TLSBASE when
| l->l_proc->p_nlwps > 1, since we cannot think of a sane usage for that.

Well, why don't we make it that way then?
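
A rough sketch of that rule, in userland C for illustration. FAKE_UC_TLSBASE and effective_uc_flags() are hypothetical; the real _UC_TLSBASE value is port-specific and the real check would live in the MD setcontext path.

```c
#include <assert.h>

#define FAKE_UC_TLSBASE 0x00080000u	/* hypothetical value, for illustration */

/*
 * Return the context flags to honour: drop the TLS-base bit when the
 * process has more than one LWP, leave every other bit untouched.
 */
static unsigned int
effective_uc_flags(unsigned int flags, int nlwps)
{
	assert(nlwps >= 1);
	if (nlwps > 1)
		flags &= ~FAKE_UC_TLSBASE;
	return flags;
}
```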

christos


Re: [RFC][PATCH] _UC_TLSBASE for all ports

2012-08-11 Thread Christos Zoulas
On Aug 11,  5:13pm, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports

|  Well, why don't we make it that way then?
| 
| We cannot toggle an option that does not exist, so that requires adding
| _UC_TLSBASE for ports that miss it. This meets strong opposition for
| now.

Again, if there is no consensus in the lists, we'll have to let core
decide.

christos


Re: [RFC][PATCH] _UC_TLSBASE for all ports

2012-08-11 Thread Christos Zoulas
On Aug 11,  1:35pm, t...@panix.com (Thor Lancelot Simon) wrote:
-- Subject: Re: [RFC][PATCH] _UC_TLSBASE for all ports

| On Sat, Aug 11, 2012 at 06:45:12AM +, Christos Zoulas wrote:
|  
|  It is a slippery slope, but I think in this case it is wise to bend.
|  If we cannot reach agreement here, consult core.
| 
| I see no point bending NetBSD into knots in this case if the resulting
| performance is as bad as Joerg claims it will be.  Is it actually the
| case that our *context() functions are almost as heavy as a full
| kernel-level thread switch?

The point is that glusterfs works without code modifications (or minimal ones)
and with acceptable performance.

christos


Re: malo@pci vs malo@pcmcia

2012-08-03 Thread Christos Zoulas
In article 20120803084934.GA3362@bugfree,
Arnaud Degroote  arnaud.degro...@laas.fr wrote:

On 01/Aug - 21:43, KIYOHARA Takashi wrote:
 Hi! all,
 
 
 I have a 'I-O DATA WN-G54/CF'.  And some on-board 88W8686 has on
 pcmcia-bus of Gumstix.
 I think, malo@pcmcia and malo@pci is all different.  These drivers
 can't merge maybe.

From a quick review, the drivers seem quite different, which is
probably why they are not merged in OpenBSD.
 

http://www.openbsd.org/cgi-bin/man.cgi?query=malo&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html
 
 
 Shall I commit to tree the source for malo@pcmcia?
 If 'yes' then I will try to verify next week end.  :-)

I think it would be nice to have the support for malo@pcmcia too in
NetBSD. 


There is not much pcmcia left around, but I guess it would be nice to have
the driver for those who have the card. I would commit it.

christos



Re: pinning down dk? assignment

2012-07-24 Thread Christos Zoulas
In article julcud$9sd$1...@serpens.de,
Michael van Elst mlel...@serpens.de wrote:

Let wd1 disappear and the raid will try to use wd0a (dk0) and sd0a (dk1).
Of course raidframe will notice the mismatch in this case, but you can
easily imagine more complex scenarios where it doesn't. But a simple
failure case comes from trying to recover the failed wd1 without
rebooting. When you replace the drive it may attach as wd1 but then
as dk2. Now try to teach raidframe that its component changed the path.

That works too because the components are not signed. Actually this
is the exact failure I got because wd0 was not found because of the
latest ata changes!

christos



Re: pinning down dk? assignment

2012-07-23 Thread Christos Zoulas
In article 20120723141721.gj4...@trav.math.uni-bonn.de,
Edgar Fuß  e...@math.uni-bonn.de wrote:
Can I somehow pin down which dk? gets assigned to which GPT partition?

In a disklabel world, I have components sd2a..sd6a making raid1.
I then have raid1a mounted on /export/home and raid1e on /export/mail.

In a GPT/wedge world, I have dk0..dk4 (on sd2..sd6) making raid1.
I then have dk5 and dk6 on raid1 mounted on /export/home resp. /export/mail.

Now suppose the machine comes up with sd6 failing to attach.
At that time, I have dk0..dk3, which (plus an absent dk4) will build raid1.
I then get dk4 and dk5 on raid1 and the mail fs mounted on /export/home.

Use NAME=guid or NAME=underlying-device-name instead of /dev/dkX for the
fs_spec field.
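
To illustrate why the name helps, here is a small model of the failure
described above. It assumes dk units are handed out purely in discovery
order, which is how the quoted scenario reads; struct wedge, the two
functions and the labels "c0".."c4", "home" and "mail" are hypothetical
and only there for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a wedge: a label plus the dkN unit it got. */
struct wedge { const char *name; int unit; };

/* Assign dk units in discovery order: first wedge found becomes dk0. */
int
sketch_assign_units(struct wedge *w, const char *const *names, int n)
{
	assert(w != NULL && names != NULL && n >= 0);

	for (int i = 0; i < n; i++) {
		w[i].name = names[i];
		w[i].unit = i;
	}
	return n;
}

/* Look a wedge up by label; returns its dk unit, or -1 if absent. */
int
sketch_unit_by_name(const struct wedge *w, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(w[i].name, name) == 0)
			return w[i].unit;
	return -1;
}
```

With all five components present, "home" lands on dk5 and "mail" on
dk6. Drop one component and "home" becomes dk4 while "mail" becomes
dk5, so an fstab line that says /dev/dk5 now mounts the mail filesystem
on /export/home. A lookup by label still finds the right wedge in both
cases.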

christos



Re: pinning down dk? assignment

2012-07-23 Thread Christos Zoulas
In article juk971$qpi$1...@serpens.de,
Michael van Elst mlel...@serpens.de wrote:
e...@math.uni-bonn.de (=?iso-8859-1?Q?Edgar_Fu=DF?=) writes:

 It probably won't help you with raidframe.
It would indeed help in my case. In case sd6 has gone missing, so dk4
is on the RAID and not on sd6, it would prevent the wrong filesystem
being mounted for dk5.

I was referring to the situation with building a raid on wedge components.
Wedges on top of raid are no problem.

Actually I do exactly that (raid on top of wedges):

dk0 at wd0: wd0a
dk0: 488397105 blocks at 63, type: raidframe
dk1 at wd1: wd1a
dk1: 488397105 blocks at 63, type: raidframe
dk2 at sd0: sd0a
dk2: 117210177 blocks at 63, type: ffs
Component on: dk0: 488397105
Component on: dk1: 488397105
Found: dk1 at 0
Found: dk1 at 0
Found(low mod_counter): dk0 at 1
raid0: Components: /dev/dk1 /dev/dk0[**FAILED**]

Ignore the failed dk here; this is because of the bad ata code in current.
In fstab I have:

NAME=raid0a /           ffs  rw,log 7 1
NAME=raid0b none        swap sw     0 0
NAME=raid0e /var        ffs  rw,log 7 3
NAME=raid0f /usr        ffs  rw,log 7 3
NAME=raid0g /usr/local  ffs  rw,log 7 3


christos


