Paravirtualized optimizations for KVM

2014-07-08 Thread Stefan Fritsch
Hi,

I have been trying to increase fork performance of openbsd/amd64 on KVM. 
It turns out that when I increase the number of CPUs of a VM from 1 to 3, 
a fork+exit micro benchmark is slowed down by a factor of 7.

The main reason for this seems to be a very large number of cross-CPU TLB 
flushes (about 4 per fork+exit). Each IPI causes several VM exits which 
are expensive. To reduce this, I have been trying to use paravirtualized 
interfaces provided by KVM and optimize some other things. These changes 
are mostly activated by a new pseudo device paravirt (which has the 
advantage that one can use UKC to tweak things without recompiling). 
However, some changes will remain if not running on a hypervisor (or 
paravirt is disabled). For example, x86_ipi() will use a pointer to 
dispatch to the appropriate implementation.

Is this the way to go forward? Or would you rather prefer a compile time 
option and maybe ship a bsd.mp.paravirt kernel in addition to bsd+bsd.mp?


The attached patch speeds up the fork+exit micro benchmark by a factor of 
3 on a 3 CPU system. And the time to build a kernel with -j4 on a 4 CPU 
system is also reduced by about 20%:

current:
real1m50.089s
user4m46.240s
sys 1m29.510s

current+paravirt:
real1m29.313s
user4m54.720s
sys 0m45.100s



BTW, why does amd64 use the APTE mapping/unmapping dance in pmap despite 
the memory being available in the direct map area all the time?


Cheers,
Stefan


diff --git a/sys/arch/amd64/amd64/cpu.c b/sys/arch/amd64/amd64/cpu.c
index 88725f7..248ebb8 100644
--- a/sys/arch/amd64/amd64/cpu.c
+++ b/sys/arch/amd64/amd64/cpu.c
@@ -83,6 +83,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -569,6 +570,19 @@ cpu_init(struct cpu_info *ci)
ci->ci_flags |= CPUF_RUNNING;
tlbflushg();
 #endif
+#if NPARAVIRT > 0
+   if (kvm_pv_eoi_enabled) {
+   paddr_t pa;
+   ci->ci_kvm_pv_eoi = 0;
+   if (pmap_extract(pmap_kernel(), (vaddr_t)&ci->ci_kvm_pv_eoi, 
&pa) &&
+   ((uint64_t)pa & 0x3) == 0) {
+   wrmsr(MSR_KVM_EOI_EN, (1 | (uint64_t)pa) );
+   } else {
+   printf("could not get phys addr for MSR_KVM_EOI_EN, 
disabling pv_eoi\n");
+   kvm_pv_eoi_enabled = 0;
+   }
+   }
+#endif
 }
 
 
diff --git a/sys/arch/amd64/amd64/genassym.cf b/sys/arch/amd64/amd64/genassym.cf
index e13a477..ab20329 100644
--- a/sys/arch/amd64/amd64/genassym.cf
+++ b/sys/arch/amd64/amd64/genassym.cf
@@ -114,6 +114,9 @@ member  CPU_INFO_MUTEX_LEVELci_mutex_level
 endif
 member CPU_INFO_GDTci_gdt
 member CPU_INFO_TSSci_tss
+if NPARAVIRT > 0
+member CPU_INFO_KVM_PV_EOI ci_kvm_pv_eoi
+endif
 
 struct intrsource
 member is_recurse
diff --git a/sys/arch/amd64/amd64/lapic.c b/sys/arch/amd64/amd64/lapic.c
index d09e3fc..857af4b 100644
--- a/sys/arch/amd64/amd64/lapic.c
+++ b/sys/arch/amd64/amd64/lapic.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -235,20 +236,42 @@ lapic_boot_init(paddr_t lapic_base)
lapic_map(lapic_base);
 
 #ifdef MULTIPROCESSOR
-   idt_allocmap[LAPIC_IPI_VECTOR] = 1;
-   idt_vec_set(LAPIC_IPI_VECTOR, Xintr_lapic_ipi);
-   idt_allocmap[LAPIC_IPI_INVLTLB] = 1;
-   idt_vec_set(LAPIC_IPI_INVLTLB, Xipi_invltlb);
-   idt_allocmap[LAPIC_IPI_INVLPG] = 1;
-   idt_vec_set(LAPIC_IPI_INVLPG, Xipi_invlpg);
-   idt_allocmap[LAPIC_IPI_INVLRANGE] = 1;
-   idt_vec_set(LAPIC_IPI_INVLRANGE, Xipi_invlrange);
+#if NPARAVIRT > 0
+   if (kvm_pv_eoi_enabled) {
+   idt_allocmap[LAPIC_IPI_VECTOR] = 1;
+   idt_vec_set(LAPIC_IPI_VECTOR, Xintr_lapic_ipi_kvm_pv_eoi);
+   idt_allocmap[LAPIC_IPI_INVLTLB] = 1;
+   idt_vec_set(LAPIC_IPI_INVLTLB, Xipi_invltlb_kvm_pv_eoi);
+   idt_allocmap[LAPIC_IPI_INVLPG] = 1;
+   idt_vec_set(LAPIC_IPI_INVLPG, Xipi_invlpg_kvm_pv_eoi);
+   idt_allocmap[LAPIC_IPI_INVLRANGE] = 1;
+   idt_vec_set(LAPIC_IPI_INVLRANGE, Xipi_invlrange_kvm_pv_eoi);
+   }
+   else
+#endif
+   {
+   idt_allocmap[LAPIC_IPI_VECTOR] = 1;
+   idt_vec_set(LAPIC_IPI_VECTOR, Xintr_lapic_ipi);
+   idt_allocmap[LAPIC_IPI_INVLTLB] = 1;
+   idt_vec_set(LAPIC_IPI_INVLTLB, Xipi_invltlb);
+   idt_allocmap[LAPIC_IPI_INVLPG] = 1;
+   idt_vec_set(LAPIC_IPI_INVLPG, Xipi_invlpg);
+   idt_allocmap[LAPIC_IPI_INVLRANGE] = 1;
+   idt_vec_set(LAPIC_IPI_INVLRANGE, Xipi_invlrange);
+   }
 #endif
idt_allocmap[LAPIC_SPURIOUS_VECTOR] = 1;
idt_vec_set(LAPIC_SPURIOUS_VECTOR, Xintrspurious);
 
idt_allocmap[LAPIC_TIMER_VECTOR] = 1;
+#if NPARAVIRT > 0
+   if (kvm_pv_eoi_enabled)
+   idt_vec_set(LAPIC_TIMER_VECTOR, Xintr_lapic_ltimer_kvm_pv_eoi);
+  

Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Martin Pieuchot
On 07/07/14(Mon) 13:46, Matthew Dempsky wrote:
> p_sigmask is only modified by the owning thread from process context
> (e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
> but it can be accessed anywhere (e.g., interrupts or threads on other
> CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
> but that's not MP-safe.
>
> 
> The simpler solution is to take advantage of our atomics APIs.
> Unfortunately for implementing SIG_SETMASK, we don't have an
> atomic_store_int() function, and I can't bring myself to abuse
> "volatile" for this purpose, so I've resorted to atomic_swap_uint().
> 
> While here, I also refactored the P_SIGSUSPEND code a bit so there's
> less code duplication.
> 
> I've included just amd64 and i386's machdep.c for now, because those
> were the only ones I'm readily able to test.  The others should be
> easy to similarly fix though, and I can send a followup diff for them.

There was a previous attempt to mark sigpromask(2) as nolock but if I
recall properly setting p_sigmask atomically is not enough and there's
still a possible race in ptsignal().

Have you looked into this function, is it safe now?

> Index: sys/proc.h
> ===
> RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/proc.h,v
> retrieving revision 1.186
> diff -u -p -r1.186 proc.h
> --- sys/proc.h15 May 2014 03:52:25 -  1.186
> +++ sys/proc.h7 Jul 2014 19:23:06 -
> @@ -520,6 +520,8 @@ int   proc_cansugid(struct proc *);
>  void proc_finish_wait(struct proc *, struct proc *);
>  void process_zap(struct process *);
>  void proc_free(struct proc *);
> +void proc_sigsetmask(struct proc *, sigset_t);
> +void proc_sigsuspend(struct proc *, sigset_t);
>  
>  struct sleep_state {
>   int sls_s;
> @@ -559,4 +561,3 @@ struct cpu_info *cpuset_first(struct cpu
>  
>  #endif   /* _KERNEL */
>  #endif   /* !_SYS_PROC_H_ */
> -
> Index: kern/kern_sig.c
> ===
> RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_sig.c,v
> retrieving revision 1.166
> diff -u -p -r1.166 kern_sig.c
> --- kern/kern_sig.c   4 May 2014 05:03:26 -   1.166
> +++ kern/kern_sig.c   7 Jul 2014 19:24:07 -
> @@ -62,6 +62,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -439,37 +440,55 @@ sys_sigprocmask(struct proc *p, void *v,
>   syscallarg(int) how;
>   syscallarg(sigset_t) mask;
>   } */ *uap = v;
> - int error = 0;
> - int s;
> - sigset_t mask;
> + sigset_t mask = SCARG(uap, mask);
>  
>   *retval = p->p_sigmask;
> - mask = SCARG(uap, mask);
> - s = splhigh();
>  
>   switch (SCARG(uap, how)) {
>   case SIG_BLOCK:
> - p->p_sigmask |= mask &~ sigcantmask;
> + atomic_setbits_int(&p->p_sigmask, mask &~ sigcantmask);
>   break;
>   case SIG_UNBLOCK:
> - p->p_sigmask &= ~mask;
> + atomic_clearbits_int(&p->p_sigmask, mask);
>   break;
>   case SIG_SETMASK:
> - p->p_sigmask = mask &~ sigcantmask;
> + proc_sigsetmask(p, mask &~ sigcantmask);
>   break;
>   default:
> - error = EINVAL;
> - break;
> + return (EINVAL);
>   }
> - splx(s);
> - return (error);
> +
> + return (0);
> +}
> +
> +void
> +proc_sigsetmask(struct proc *p, sigset_t mask)
> +{
> + KASSERT(p == curproc);
> +
> + /* XXX: An atomic store would suffice here. */
> + (void)atomic_swap_uint(&p->p_sigmask, mask);
> +}
> +
> +/*
> + * Temporarily replace calling proc's signal mask for the duration of a
> + * system call.  Original signal mask will be restored by userret().
> + */
> +void
> +proc_sigsuspend(struct proc *p, sigset_t mask)
> +{
> + KASSERT(p == curproc);
> + KASSERT(!(p->p_flag & P_SIGSUSPEND));
> +
> + p->p_oldmask = p->p_sigmask;
> + atomic_setbits_int(&p->p_flag, P_SIGSUSPEND);
> + proc_sigsetmask(p, mask);
>  }
>  
>  /* ARGSUSED */
>  int
>  sys_sigpending(struct proc *p, void *v, register_t *retval)
>  {
> -
>   *retval = p->p_siglist;
>   return (0);
>  }
> @@ -489,16 +508,7 @@ sys_sigsuspend(struct proc *p, void *v, 
>   struct process *pr = p->p_p;
>   struct sigacts *ps = pr->ps_sigacts;
>  
> - /*
> -  * When returning from sigpause, we want
> -  * the old mask to be restored after the
> -  * signal handler has finished.  Thus, we
> -  * save it here and mark the sigacts structure
> -  * to indicate this.
> -  */
> - p->p_oldmask = p->p_sigmask;
> - atomic_setbits_int(&p->p_flag, P_SIGSUSPEND);
> - p->p_sigmask = SCARG(uap, mask) &~ sigcantmask;
> + proc_sigsuspend(p, SCARG(uap, mask) &~ sigcantmask);
>   while (tsleep(ps, PPAUSE|PCATCH, "pause", 0) == 0)
>   /* void */;
>   /* alway

Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Mark Kettenis
> Date: Mon, 7 Jul 2014 13:46:03 -0700
> From: Matthew Dempsky 
> 
> p_sigmask is only modified by the owning thread from process context
> (e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
> but it can be accessed anywhere (e.g., interrupts or threads on other
> CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
> but that's not MP-safe.
> 
> The simpler solution is to take advantage of our atomics APIs.
> Unfortunately for implementing SIG_SETMASK, we don't have an
> atomic_store_int() function, and I can't bring myself to abuse
> "volatile" for this purpose, so I've resorted to atomic_swap_uint().

Sory, but I think that's wrong.  You need volatile to make sure the
mask is read from memory when you're checking bits.  Or you need to
insert the proper memory barriers I think.



unify some bpf code

2014-07-08 Thread Henning Brauer
I'll need this for some upcoming changes, at least to do it WITHOUT
adding the 3rd or 4th or 5th copy of the bpf_mtap loop. most of these
bpf_mtap_* are almost identical, minor differences in what to prepend,
and foremost: passing custom copy functions. since bpf_mtap is all
over the place I made bpf_mtap_hdr the flexible one: takes the copy
function as parameter (if NULL, use default), and don't prepend
anything if dlen == 0 (so can be used like bpf_mtap).

lightly tested.

Index: arch/sparc/dev/if_ie.c
===
RCS file: /cvs/src/sys/arch/sparc/dev/if_ie.c,v
retrieving revision 1.45
diff -u -p -r1.45 if_ie.c
--- arch/sparc/dev/if_ie.c  28 Nov 2013 22:18:52 -  1.45
+++ arch/sparc/dev/if_ie.c  8 Jul 2014 14:37:17 -
@@ -1335,7 +1335,7 @@ ie_readframe(sc, num)
if (bpf_gets_it) {
/* Pass it up. */
bpf_mtap_hdr(sc->sc_arpcom.ac_if.if_bpf, (caddr_t)&eh,
-   sizeof(eh), m, BPF_DIRECTION_IN);
+   sizeof(eh), m, BPF_DIRECTION_IN, NULL);
}
/*
 * A signal passed up from the filtering code indicating that the
Index: net/bpf.c
===
RCS file: /cvs/src/sys/net/bpf.c,v
retrieving revision 1.93
diff -u -p -r1.93 bpf.c
--- net/bpf.c   23 Apr 2014 10:50:18 -  1.93
+++ net/bpf.c   8 Jul 2014 15:09:10 -
@@ -92,6 +92,8 @@ LIST_HEAD(, bpf_d) bpf_d_list;
 void   bpf_allocbufs(struct bpf_d *);
 void   bpf_freed(struct bpf_d *);
 void   bpf_ifname(struct ifnet *, struct ifreq *);
+void   _bpf_mtap(caddr_t, struct mbuf *, u_int,
+   void (*)(const void *, void *, size_t));
 void   bpf_mcopy(const void *, void *, size_t);
 intbpf_movein(struct uio *, u_int, struct mbuf **,
struct sockaddr *, struct bpf_insn *);
@@ -1159,10 +1161,11 @@ bpf_mcopy(const void *src_arg, void *dst
 }
 
 /*
- * Incoming linkage from device drivers, when packet is in an mbuf chain.
+ * like bpf_mtap, but copy fn can be given. used by various bpf_mtap*
  */
 void
-bpf_mtap(caddr_t arg, struct mbuf *m, u_int direction)
+_bpf_mtap(caddr_t arg, struct mbuf *m, u_int direction,
+void (*cpfn)(const void *, void *, size_t))
 {
struct bpf_if *bp = (struct bpf_if *)arg;
struct bpf_d *d;
@@ -1174,6 +1177,9 @@ bpf_mtap(caddr_t arg, struct mbuf *m, u_
if (m == NULL)
return;
 
+   if (cpfn == NULL)
+   cpfn = bpf_mcopy;
+
pktlen = 0;
for (m0 = m; m0 != 0; m0 = m0->m_next)
pktlen += m0->m_len;
@@ -1191,13 +1197,22 @@ bpf_mtap(caddr_t arg, struct mbuf *m, u_
 
if (!gottime++)
microtime(&tv);
-   bpf_catchpacket(d, (u_char *)m, pktlen, slen, bpf_mcopy, &tv);
+   bpf_catchpacket(d, (u_char *)m, pktlen, slen, cpfn, &tv);
if (d->bd_fildrop)
m->m_flags |= M_FILDROP;
}
 }
 
 /*
+ * Incoming linkage from device drivers, when packet is in an mbuf chain.
+ */
+void
+bpf_mtap(caddr_t arg, struct mbuf *m, u_int direction)
+{
+   _bpf_mtap(arg, m, direction, NULL);
+}
+
+/*
  * Incoming linkage from device drivers, where we have a mbuf chain
  * but need to prepend some arbitrary header from a linear buffer.
  *
@@ -1208,17 +1223,23 @@ bpf_mtap(caddr_t arg, struct mbuf *m, u_
  */
 void
 bpf_mtap_hdr(caddr_t arg, caddr_t data, u_int dlen, struct mbuf *m,
-u_int direction)
+u_int direction, void (*cpfn)(const void *, void *, size_t))
 {
-   struct m_hdr mh;
-
-   mh.mh_flags = 0;
-   mh.mh_next = m;
-   mh.mh_len = dlen;
-   mh.mh_data = data;
+   struct m_hdr mh;
+   struct mbuf *m0;
 
-   bpf_mtap(arg, (struct mbuf *) &mh, direction);
-   m->m_flags |= mh.mh_flags & M_FILDROP;
+   if (dlen > 0) {
+   mh.mh_flags = 0;
+   mh.mh_next = m;
+   mh.mh_len = dlen;
+   mh.mh_data = data;
+   m0 = (struct mbuf *)&mh;
+   } else 
+   m0 = m;
+
+   _bpf_mtap(arg, (struct mbuf *) m0, direction, cpfn);
+   if (m0 != m)
+   m->m_flags |= m0->m_flags & M_FILDROP;
 }
 
 /*
@@ -1233,17 +1254,10 @@ bpf_mtap_hdr(caddr_t arg, caddr_t data, 
 void
 bpf_mtap_af(caddr_t arg, u_int32_t af, struct mbuf *m, u_int direction)
 {
-   struct m_hdr mh;
u_int32_tafh;
 
-   mh.mh_flags = 0;
-   mh.mh_next = m;
-   mh.mh_len = 4;
afh = htonl(af);
-   mh.mh_data = (caddr_t)&afh;
-
-   bpf_mtap(arg, (struct mbuf *) &mh, direction);
-   m->m_flags |= mh.mh_flags & M_FILDROP;
+   bpf_mtap_hdr(arg, (caddr_t)&afh, 4, m, direction, NULL);
 }
 
 /*
@@ -1259,7 +1273,6 @@ void
 bpf_mtap_ether(caddr_t arg, struct mbuf *m, u_int direction)
 {
 #if NVLAN > 0
-   struct m_hdr mh;
struct ether_vlan_header evh;
 
if ((m->m_flags & M_VLANT

Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 1:21 AM, Martin Pieuchot  wrote:
> There was a previous attempt to mark sigpromask(2) as nolock but if I
> recall properly setting p_sigmask atomically is not enough and there's
> still a possible race in ptsignal().
>
> Have you looked into this function, is it safe now?

I haven't looked very closely yet, but I'll be sure to look again
before proposing NOLOCK for sys_sigprocmask().  However, this diff
only replaces the SPLs with atomic operations; it leaves
sys_sigprocmask() under kernel lock.

As long as sys_sigprocmask() is still kernel locked, ptsignal() can
only execute concurrently with sys_sigprocmask() if its an interrupt
on the same CPU.  Without this diff, that interrupt can only occur
before or after the SPL-protected code section; while after this diff,
the interrupt can only occur before or after the atomic operation
(since by definition atomic operations are atomic).



Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 1:29 AM, Mark Kettenis  wrote:
>> Date: Mon, 7 Jul 2014 13:46:03 -0700
>> From: Matthew Dempsky 
>>
>> p_sigmask is only modified by the owning thread from process context
>> (e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
>> but it can be accessed anywhere (e.g., interrupts or threads on other
>> CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
>> but that's not MP-safe.
>>
>> The simpler solution is to take advantage of our atomics APIs.
>> Unfortunately for implementing SIG_SETMASK, we don't have an
>> atomic_store_int() function, and I can't bring myself to abuse
>> "volatile" for this purpose, so I've resorted to atomic_swap_uint().
>
> Sory, but I think that's wrong.  You need volatile to make sure the
> mask is read from memory when you're checking bits.  Or you need to
> insert the proper memory barriers I think.

To be clear: I meant I don't want to abuse "volatile" as though it's a
magic "make-these-operations-**atomic**" flag, as that's not what it
really means (even if in practice it often has that effect).

Also, as long as the (always current thread) mutators and cross-thread
accessors are still serialized by the kernel lock, marking p_sigmask
as volatile shouldn't be necessary.  However, I do agree that once we
start unlocking any of them (which is a future goal of this work), we
need some sort of atomic guarantee on the read side too, and marking
p_sigmask as volatile seems like a reasonable first step.  (I'd
probably go further and also decorate the accesses with C11-style
atomic_load()s.)

So I'm happy to mark p_sigmask as volatile with this diff if you'd
prefer.  I just don't think it's strictly necessary yet, but I'm not
opposed to it either.



Allow tsleep() with ident==NULL

2014-07-08 Thread Matthew Dempsky
It's rare, but there *are* cases where a thread wants to sleep and
doesn't expect any wakeup() calls at all; e.g., nanosleep() and
sigsuspend().  In these cases, there's no point in requiring a valid
wait channel identifier or adding the thread to the sleep queue.

Diff below explicitly allows tsleep() with ident==NULL, but skips
adding the thread to a sleep queue in that case.

As an implementation detail, we currently treat "p_wchan == NULL" as
an indication that the thread has been woken up.  To avoid breaking
that assumption this diff uses "p_wchan == NOSLPQUE" to identify
threads that are still sleeping, but aren't associated with a sleep
queue.

This also lets us tighten the the kernel lock KASSERT in tsleep():
rather than weakly only checking non-timed sleeps, we can precisely
check all sleeps that use a wait channel.

ok?


Index: share/man/man9/tsleep.9
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/tsleep.9,v
retrieving revision 1.9
diff -u -p -r1.9 tsleep.9
--- share/man/man9/tsleep.9 22 Jan 2014 07:32:47 -  1.9
+++ share/man/man9/tsleep.9 8 Jul 2014 16:25:53 -
@@ -96,8 +96,9 @@ The same identifier must be used in a ca
 .Fn wakeup
 to get the process going again.
 .Fa ident
-should not be
-.Dv NULL .
+may be
+.Dv NULL
+if the process is not waiting on a wait channel.
 .It Fa priority
 The process priority to be used when the process is awakened and put on
 the queue of runnable processes.
Index: sys/kern/kern_synch.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.115
diff -u -p -r1.115 kern_synch.c
--- sys/kern/kern_synch.c   22 Mar 2014 06:05:45 -  1.115
+++ sys/kern/kern_synch.c   8 Jul 2014 16:32:50 -
@@ -78,6 +78,12 @@ sleep_queue_init(void)
TAILQ_INIT(&slpque[i]);
 }
 
+/*
+ * Dummy wait channel identifier for threads that don't expect a wakeup().
+ * Threads sleeping on this wait channel aren't queued on slpque.
+ */
+#defineNOSLPQUE((volatile void *)(&noslpque))
+static char noslpque;
 
 /*
  * During autoconfiguration or after a panic, a sleep will simply
@@ -109,8 +115,17 @@ tsleep(const volatile void *ident, int p
 
KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
 
+   /* Make sure caller gave us at least one reason to stop sleeping. */
+   KASSERT(ident != NULL || (priority & PCATCH) != 0 || timo != 0);
+
 #ifdef MULTIPROCESSOR
-   KASSERT(timo || __mp_lock_held(&kernel_lock));
+   if (ident != NULL) {
+   /*
+* Waiting on a condition variable requires a lock, otherwise
+* we might miss our wakeup notification.
+*/
+   KASSERT(__mp_lock_held(&kernel_lock));
+   }
 #endif
 
if (cold || panicstr) {
@@ -191,12 +206,11 @@ sleep_setup(struct sleep_state *sls, con
 {
struct proc *p = curproc;
 
-#ifdef DIAGNOSTIC
+   KASSERT(ident != NOSLPQUE);
+   KASSERT(p->p_stat == SONPROC);
+
if (ident == NULL)
-   panic("tsleep: no ident");
-   if (p->p_stat != SONPROC)
-   panic("tsleep: not SONPROC");
-#endif
+   ident = NOSLPQUE;
 
 #ifdef KTRACE
if (KTRPOINT(p, KTR_CSW))
@@ -213,7 +227,8 @@ sleep_setup(struct sleep_state *sls, con
p->p_wmesg = wmesg;
p->p_slptime = 0;
p->p_priority = prio & PRIMASK;
-   TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
+   if (ident != NOSLPQUE)
+   TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
 }
 
 void
@@ -350,7 +365,8 @@ void
 unsleep(struct proc *p)
 {
if (p->p_wchan) {
-   TAILQ_REMOVE(&slpque[LOOKUP(p->p_wchan)], p, p_runq);
+   if (p->p_wchan != NOSLPQUE)
+   TAILQ_REMOVE(&slpque[LOOKUP(p->p_wchan)], p, p_runq);
p->p_wchan = NULL;
}
 }
@@ -365,6 +381,9 @@ wakeup_n(const volatile void *ident, int
struct proc *p;
struct proc *pnext;
int s;
+
+   KASSERT(ident != NULL);
+   KASSERT(ident != NOSLPQUE);
 
SCHED_LOCK(s);
qp = &slpque[LOOKUP(ident)];



Compile kernel with -std=gnu99

2014-07-08 Thread Matthew Dempsky
Diff below converts the kernel to build with -std=gnu99.  (For
simplicity, I've only included amd64 for now, but I'll make the same
change to all kernel Makefiles if this is ok.)

The only incompatibility (that I'm aware of) is that ISO C99's inline
semantics differ slightly from GNU C89's historical (but non-standard)
inline semantics, but I believe the diff below keeps us consistent
with the semantics the kernel currently assumes.  (More details
below.)

I've tested on amd64 and I get the exact same .o files with or without
this change (except vers.o, but only because of timestamping).  It's
probably worth conducting the same test on one of our GCC 3
architectures.

ok?

Boring standards details: GNU89 and C99 define "static inline" to have
the same semantics, but they differ for non-static definitions.  In
particular, C99 introduces the concept of "inline definitions" so that
it can allow non-static inline functions to be defined in headers and
used across multiple translation units without causing the function to
be receive an external definition in each object file.  GNU89 requires
"static inline" for inline functions in header files, but then you can
end up with multiple static definitions of the function if it's not
inlined everywhere.

That might seem scary for silently introducing incompatibilities, but
"fortunately" GCC 3.3 and 4.2 don't support C99 inline semantics
anyway (they always use GNU89 inline), and GCC 4.2 unconditionally
emits a warning if it sees code that would be compiled differently
under C99 semantics.  For newer compilers that do support C99 inline,
-fgnu89-inline forces the legacy GNU89 semantics while also silencing
the GCC 4.2 warning.

Index: Makefile.amd64
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.59
diff -u -p -r1.59 Makefile.amd64
--- Makefile.amd64  8 May 2014 17:59:28 -   1.59
+++ Makefile.amd64  8 Jul 2014 17:41:49 -
@@ -39,8 +39,13 @@ CMACHFLAGS+= -fno-stack-protector
 CMACHFLAGS+=   -Wa,-n
 .endif
 
+CSTD=  -std=gnu99
+.if ${COMPILER_VERSION} != gcc3
+CSTD+= -fgnu89-inline
+.endif
+
 COPTS?=-O2
-CFLAGS=${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTS} ${PIPE}
+CFLAGS=${CSTD} ${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTS} 
${PIPE}
 AFLAGS=-D_LOCORE -x assembler-with-cpp ${CWARNFLAGS} 
${CMACHFLAGS}
 LINKFLAGS= -Ttext 0x810001e0 -e start --warn-common -nopie
 



Re: increase netcat's buffer...

2014-07-08 Thread Ted Unangst
On Thu, Jun 26, 2014 at 13:43, Arne Becker wrote:

> Hi.
> 
>> Now soliciting diffs to change readwrite to a loop with two buffers
>> that poll()s in all four directions. :)
> 
> Good thing you made me remember I wrote just this a while ago.
> This is my first OpenBSD diff, so tell me if I missed anything obvious.
> Tested quite extensively originally; for this diff I only checked a
> simple nc to nc "hello".
> @@ -608,7 +616,7 @@ remote_connect(const char *host, const c
> 
> if (bind(s, (struct sockaddr *)ares->ai_addr,
> ares->ai_addrlen) < 0)
> - err(1, "bind failed");
> + errx(1, "bind failed: %s", strerror(errno));
> freeaddrinfo(ares);

This doesn't seem necessary, or correct.

> @@ -640,7 +648,7 @@ timeout_connect(int s, const struct sock
> if (timeout != -1) {
> flags = fcntl(s, F_GETFL, 0);
> if (fcntl(s, F_SETFL, flags | O_NONBLOCK) == -1)
> - err(1, "set non-blocking mode");
> + warn("unable to set non-blocking mode");

ok, maybe. i wonder what this will break...

> -readwrite(int nfd)
> +readwrite(int net_fd)

this part read ok, for the actual poll changes.

> @@ -877,8 +1049,20 @@ atelnet(int nfd, unsigned char *buf, uns
> 
> p++;
> obuf[2] = *p;
> +
> + if (!blocking) {
> + flags = fcntl(nfd, F_GETFL, 0);
> + if (fcntl(nfd, F_SETFL, flags & ~O_NONBLOCK) == -1)
> + warn("unable to set blocking mode");
> + blocking = 1;
> + }
> if (atomicio(vwrite, nfd, obuf, 3) != 3)
> warn("Write Error!");
> + }
> + if (blocking) {
> + flags = fcntl(nfd, F_GETFL, 0);
> + if (fcntl(nfd, F_SETFL, flags | O_NONBLOCK) == -1)
> + warn("unable to set non-blocking mode");
> }
> }

I don't understand this part. What's the reasoning?



Re: fsck_msdos: memleak merge with NetBSD

2014-07-08 Thread Ted Unangst
On Mon, Jun 30, 2014 at 21:26, Tobias Stoeckmann wrote:
> Hi,
> 
> this diff merges NetBSD's revision 1.20 into our tree:  There are
> some memory leaks in resetDosDirSection.
> 
> This is not a simple merge, I have added some things:
> 
> - rootDir was not properly freed in NetBSD's commit
> (actually it's put into a "free dir entry queue")
> - also free memory if root directory checks fail
> - I use goto's instead of rewriting the code every single time

> +fail:
> + freeDosDirEntry(rootDir);
> + rootDir = NULL;
> +fail_root:
> + free(delbuf);
> + delbuf = NULL;
> +fail_delbuf:
> + free(buffer);
> + buffer = NULL;
> + return (FSFATAL);

I think it's safe to just have one goto label? free(NULL) is fine. a
small preference, but it seems less error prone when there's one cleanup
section that does what's necessary, as opposed to the jumping code
needing to determine where to jump each time.



Re: Compile kernel with -std=gnu99

2014-07-08 Thread Mark Kettenis
> Date: Tue, 8 Jul 2014 11:17:35 -0700
> From: Matthew Dempsky 
> 
> Diff below converts the kernel to build with -std=gnu99.  (For
> simplicity, I've only included amd64 for now, but I'll make the same
> change to all kernel Makefiles if this is ok.)
> 
> The only incompatibility (that I'm aware of) is that ISO C99's inline
> semantics differ slightly from GNU C89's historical (but non-standard)
> inline semantics, but I believe the diff below keeps us consistent
> with the semantics the kernel currently assumes.  (More details
> below.)
> 
> I've tested on amd64 and I get the exact same .o files with or without
> this change (except vers.o, but only because of timestamping).  It's
> probably worth conducting the same test on one of our GCC 3
> architectures.
> 
> ok?

I disagree with this diff.  We should discourage the use of GNU
extensions in our kernel.  Therefore I think std=gnu99 would give the
wrong signal.

As for the inline issue.  IMHO, given the incompatbility problems
between ISO C and GNU C, only "static inline" is usable.  The
semantics of the other inline "forms" is just too confusing.  The
occasional extra copy of the code is as far as I'm concerned
acceptable.  The functions should be small enough for it not to
matter.



Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Mark Kettenis
> Date: Tue, 8 Jul 2014 09:05:38 -0700
> From: Matthew Dempsky 
> 
> On Tue, Jul 8, 2014 at 1:29 AM, Mark Kettenis  wrote:
> >> Date: Mon, 7 Jul 2014 13:46:03 -0700
> >> From: Matthew Dempsky 
> >>
> >> p_sigmask is only modified by the owning thread from process context
> >> (e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
> >> but it can be accessed anywhere (e.g., interrupts or threads on other
> >> CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
> >> but that's not MP-safe.
> >>
> >> The simpler solution is to take advantage of our atomics APIs.
> >> Unfortunately for implementing SIG_SETMASK, we don't have an
> >> atomic_store_int() function, and I can't bring myself to abuse
> >> "volatile" for this purpose, so I've resorted to atomic_swap_uint().
> >
> > Sory, but I think that's wrong.  You need volatile to make sure the
> > mask is read from memory when you're checking bits.  Or you need to
> > insert the proper memory barriers I think.
> 
> To be clear: I meant I don't want to abuse "volatile" as though it's a
> magic "make-these-operations-**atomic**" flag, as that's not what it
> really means (even if in practice it often has that effect).
> 
> Also, as long as the (always current thread) mutators and cross-thread
> accessors are still serialized by the kernel lock, marking p_sigmask
> as volatile shouldn't be necessary.  However, I do agree that once we
> start unlocking any of them (which is a future goal of this work), we
> need some sort of atomic guarantee on the read side too, and marking
> p_sigmask as volatile seems like a reasonable first step.  (I'd
> probably go further and also decorate the accesses with C11-style
> atomic_load()s.)
> 
> So I'm happy to mark p_sigmask as volatile with this diff if you'd
> prefer.  I just don't think it's strictly necessary yet, but I'm not
> opposed to it either.

Even if the kernel lock protects us now, I think not adding volatile
would cause nasty surprises in the future.



Re: Compile kernel with -std=gnu99

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 12:03 PM, Mark Kettenis  wrote:
> I disagree with this diff.  We should discourage the use of GNU
> extensions in our kernel.  Therefore I think std=gnu99 would give the
> wrong signal.

Can you clarify your concern?  Currently we're (implicitly) compiling
with -std=gnu89, which is ISO C90 + GNU extensions.  This diff changes
us to (explicitly) compile with -std=gnu99 -fgnu89-inline, which is
ISO C99 + GNU extensions + GNU C89 inline.

I think moving from C90 to C99 is a good idea.

I'm indifferent to GNU extensions.  If we could make do without them,
then great; but technically inline asm is a GNU extension (even if
it's available in C99 mode) and I doubt we'll benefit from eliminating
that.

Using GNU89 inline is an intermediary step, because the kernel isn't
ready for C99 inline.  See below.

> As for the inline issue.  IMHO, given the incompatbility problems
> between ISO C and GNU C, only "static inline" is usable.  The
> semantics of the other inline "forms" is just too confusing.  The
> occasional extra copy of the code is as far as I'm concerned
> acceptable.  The functions should be small enough for it not to
> matter.

Converting the existing inline functions to be C99 compatible (either
by adding static or removing inline) is a next step I have planned,
but I want to allow other C99 features first.

Also, there are "inline" functions in MD code, so I'd rather have all
kernels on -std=gnu99 -fgnu89-inline.  Then we can start cleaning up
GNU89 inlines and remove -fgnu89-inline on arches once they're clean.
E.g., first clean up all MI and x86 inlines, then the x86 kernels can
start compiling without -fgnu89-inline, which will make sure we don't
regress in MI code while we start tackling other MD files.



Re: Compile kernel with -std=gnu99

2014-07-08 Thread Mark Kettenis
> 
> On Tue, Jul 8, 2014 at 12:03 PM, Mark Kettenis  
> wrote:
> > I disagree with this diff.  We should discourage the use of GNU
> > extensions in our kernel.  Therefore I think std=gnu99 would give the
> > wrong signal.
> 
> Can you clarify your concern?  Currently we're (implicitly) compiling
> with -std=gnu89, which is ISO C90 + GNU extensions.  This diff changes
> us to (explicitly) compile with -std=gnu99 -fgnu89-inline, which is
> ISO C99 + GNU extensions + GNU C89 inline.
> 
> I think moving from C90 to C99 is a good idea.
> 
> I'm indifferent to GNU extensions.  If we could make do without them,
> then great; but technically inline asm is a GNU extension (even if
> it's available in C99 mode) and I doubt we'll benefit from eliminating
> that.
> 
> Using GNU89 inline is an intermediary step, because the kernel isn't
> ready for C99 inline.  See below.
> 
> > As for the inline issue.  IMHO, given the incompatbility problems
> > between ISO C and GNU C, only "static inline" is usable.  The
> > semantics of the other inline "forms" is just too confusing.  The
> > occasional extra copy of the code is as far as I'm concerned
> > acceptable.  The functions should be small enough for it not to
> > matter.
> 
> Converting the existing inline functions to be C99 compatible (either
> by adding static or removing inline) is a next step I have planned,
> but I want to allow other C99 features first.
> 
> Also, there are "inline" functions in MD code, so I'd rather have all
> kernels on -std=gnu99 -fgnu89-inline.  Then we can start cleaning up
> GNU89 inlines and remove -fgnu89-inline on arches once they're clean.
> E.g., first clean up all MI and x86 inlines, then the x86 kernels can
> start compiling without -fgnu89-inline, which will make sure we don't
> regress in MI code while we start tackling other MD files.

With that explanation, this sounds a lot more reasonable.



Re: fsck_msdos: memleak merge with NetBSD

2014-07-08 Thread Tobias Stoeckmann
On Tue, Jul 08, 2014 at 03:01:58PM -0400, Ted Unangst wrote:
> I think it's safe to just have one goto label? free(NULL) is fine.

That's true.  If we go into that direction, we can also use the cleanup
function that would be done at the end of directory processing.

Also put two variables static that are not used anywhere else, easier
to verify that there are no side-effects.

This version looks definitely simpler.


Tobias

Index: dir.c
===
RCS file: /cvs/src/sbin/fsck_msdos/dir.c,v
retrieving revision 1.23
diff -u -p -r1.23 dir.c
--- dir.c   18 Jun 2014 17:29:07 -  1.23
+++ dir.c   8 Jul 2014 19:29:54 -
@@ -147,7 +147,7 @@ freeDirTodo(struct dirTodoNode *dt)
 /*
  * The stack of unread directories
  */
-struct dirTodoNode *pendingDirectories = NULL;
+static struct dirTodoNode *pendingDirectories = NULL;
 
 /*
  * Return the full pathname for a directory entry.
@@ -203,7 +203,7 @@ static char longName[DOSLONGNAMELEN] = "
 static u_char *buffer = NULL;
 static u_char *delbuf = NULL;
 
-struct dosDirEntry *rootDir;
+static struct dosDirEntry *rootDir;
 static struct dosDirEntry *lostDir;
 
 /*
@@ -223,14 +223,14 @@ resetDosDirSection(struct bootblock *boo
|| !(delbuf = malloc(b2))
|| !(rootDir = newDosDirEntry())) {
xperror("No space for directory");
-   return (FSFATAL);
+   goto fail;
}
(void)memset(rootDir, 0, sizeof *rootDir);
if (boot->flags & FAT32) {
if (boot->RootCl < CLUST_FIRST || boot->RootCl >= 
boot->NumClusters) {
pfatal("Root directory starts with cluster out of 
range(%u)\n",
   boot->RootCl);
-   return (FSFATAL);
+   goto fail;
}
cl = fat[boot->RootCl].next;
if (cl < CLUST_FIRST
@@ -243,7 +243,7 @@ resetDosDirSection(struct bootblock *boo
  rsrvdcltype(cl));
else {
pfatal("Root directory doesn't start a cluster 
chain\n");
-   return (FSFATAL);
+   goto fail;
}
if (ask(1, "Fix")) {
fat[boot->RootCl].next = CLUST_FREE;
@@ -257,6 +257,9 @@ resetDosDirSection(struct bootblock *boo
}
 
return (ret);
+fail:
+   finishDosDirSection();
+   return (FSFATAL);
 }
 
 /*
Index: ext.h
===
RCS file: /cvs/src/sbin/fsck_msdos/ext.h,v
retrieving revision 1.11
diff -u -p -r1.11 ext.h
--- ext.h   16 Jun 2014 18:33:33 -  1.11
+++ ext.h   8 Jul 2014 19:29:54 -
@@ -46,8 +46,6 @@ extern int rdonly;/* device is opened r
 
 extern struct disklabel lab;
 
-extern struct dosDirEntry *rootDir;
-
 /*
  * function declarations
  */



Re: Compile kernel with -std=gnu99

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 12:28 PM, Mark Kettenis  wrote:
> With that explanation, this sounds a lot more reasonable.

Ah, good. :)  Sorry, I should have mentioned up front the followup
steps I had planned and the rationale for taking this path.



more deadbeef

2014-07-08 Thread Ted Unangst
how would you like your deadbeef? well done?

since we support variable poison values, we should invert the patterns
sometimes to find bugs where the deadbeef values hide the bugs. (we
can of course do even more with extra tricksy bit patterns, but start
with this.)

Index: subr_poison.c
===
RCS file: /cvs/src/sys/kern/subr_poison.c,v
retrieving revision 1.7
diff -u -p -r1.7 subr_poison.c
--- subr_poison.c   19 May 2014 14:30:03 -  1.7
+++ subr_poison.c   8 Jul 2014 20:09:38 -
@@ -43,7 +43,17 @@ poison_value(void *v)
 
l = l >> PAGE_SHIFT;
 
-   return (l & 1) ? POISON0 : POISON1;
+   switch (l & 3) {
+   case 0:
+   return POISON0;
+   case 1:
+   return POISON1;
+   case 2:
+   return ~POISON0;
+   case 3:
+   return ~POISON1;
+   }
+   return 0;
 }
 
 void



Re: more deadbeef

2014-07-08 Thread Theo de Raadt
> how would you like your deadbeef? well done?
> 
> since we support variable poison values, we should invert the patterns
> sometimes to find bugs where the deadbeef values hide the bugs. (we
> can of course do even more with extra tricksy bit patterns, but start
> with this.)

I have been worried about this for quite a while.  Especially for flag
bits .



diff: Option to use duids in /etc/dumpdates

2014-07-08 Thread Maximilian Fillinger
Hi!

This diff adds a "-U" flag to dump that allows using disklabel
UIDs in /etc/dumpdates. That makes incremental dumps possible when a
disk is roaming between device files.

I'd be happy to receive comments.

Also, sorry for the noise in misc, and thanks to everyone pointing me
in the right direction.

Best regards,
Max

--- sbin/dump/dump.h2014/06/24 21:35:13 1.1
+++ sbin/dump/dump.h2014/06/24 21:38:47 1.2
@@ -56,9 +56,11 @@
 char   *tape;  /* name of the tape file */
 char   *dumpdates; /* name of the file containing dump date information*/
 char   *temp;  /* name of the file for doing rewrite of dumpdates */
+char   *duid;  /* duid of the disk being dumped */
 char   lastlevel;  /* dump level of previous dump */
 char   level;  /* dump level of this dump */
 intuflag;  /* update flag */
+intduidflag;   /* use duids in dumpdates flag */
 intdiskfd; /* disk file descriptor */
 inttapefd; /* tape file descriptor */
 intpipeout;/* true => output to standard output */


--- sbin/dump/main.c2014/06/24 21:35:37 1.1
+++ sbin/dump/main.c2014/06/24 21:38:23 1.2
@@ -112,7 +112,7 @@
usage();

obsolete(&argc, &argv);
-   while ((ch = getopt(argc, argv, "0123456789aB:b:cd:f:h:ns:ST:uWw")) != 
-1)
+   while ((ch = getopt(argc, argv, "0123456789aB:b:cd:f:h:ns:ST:UuWw")) != 
-1)
switch (ch) {
/* dump level */
case '0': case '1': case '2': case '3': case '4':
@@ -180,6 +180,9 @@
lastlevel = '?';
break;

+   case 'U':
+   duidflag = 1;   /* use duids */
+   break;
case 'u':   /* update /etc/dumpdates */
uflag = 1;
break;
@@ -370,6 +373,21 @@
(void)gethostname(spcl.c_host, sizeof(spcl.c_host));
spcl.c_level = level - '0';
spcl.c_type = TS_TAPE;
+
+   if ((diskfd = open(disk, O_RDONLY)) < 0) {
+   msg("Cannot open %s\n", disk);
+   exit(X_STARTUP);
+   }
+   if (ioctl(diskfd, DIOCGDINFO, (char *)&lab) < 0)
+   err(1, "ioctl (DIOCGDINFO)");
+   if (duidflag && asprintf(&duid,
+   "%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx.%c",
+   lab.d_uid[0], lab.d_uid[1], lab.d_uid[2], lab.d_uid[3],
+   lab.d_uid[4], lab.d_uid[5], lab.d_uid[6], lab.d_uid[7],
+   disk[strlen(disk)-1]) == -1) {
+   msg("Cannot malloc duid\n");
+   exit(X_STARTUP);
+   }
if (!Tflag)
getdumptime();  /* /etc/dumpdates snarfed */

@@ -387,10 +405,6 @@
else
msgtail("to %s\n", tape);

-   if ((diskfd = open(disk, O_RDONLY)) < 0) {
-   msg("Cannot open %s\n", disk);
-   exit(X_STARTUP);
-   }
if (ioctl(diskfd, DIOCGPDINFO, (char *)&lab) < 0)
err(1, "ioctl (DIOCGPDINFO)");
sync();


--- sbin/dump/itime.c   2014/06/24 21:35:30 1.1
+++ sbin/dump/itime.c   2014/06/24 21:38:33 1.2
@@ -124,7 +124,7 @@
int i;
char *fname;

-   fname = disk;
+   fname = duidflag ? duid : disk;
 #ifdef FDEBUG
msg("Looking for name %s in dumpdates = %s for level = %c\n",
fname, dumpdates, level);
@@ -164,7 +164,7 @@
quit("cannot rewrite %s: %s\n", dumpdates, strerror(errno));
fd = fileno(df);
(void) flock(fd, LOCK_EX);
-   fname = disk;
+   fname = duidflag ? duid : disk;
free((char *)ddatev);
ddatev = 0;
nddates = 0;


--- include/protocols/dumprestore.h 2014/06/24 22:23:05 1.1
+++ include/protocols/dumprestore.h 2014/06/24 22:52:53 1.3
@@ -152,8 +152,8 @@
 #define DR_NEWHEADER   0x0001  /* new format tape header */
 #define DR_NEWINODEFMT 0x0002  /* new format inodes on tape */

-#defineDUMPOUTFMT  "%-16s %c %s"   /* for printf */
+#defineDUMPOUTFMT  "%-18s %c %s"   /* for printf */
/* name, level, ctime(date) */
-#defineDUMPINFMT   "%16s %c %[^\n]\n"  /* inverse for scanf */
+#defineDUMPINFMT   "%18s %c %[^\n]\n"  /* inverse for scanf */

 #endif /* !_PROTOCOLS_DUMPRESTORE_H_ */


--- sbin/dump/dump.82014/06/24 21:35:21 1.1
+++ sbin/dump/dump.82014/07/07 23:33:58 1.3
@@ -40,7 +40,7 @@
 .Sh SYNOPSIS
 .Nm dump
 .Bk -words
-.Op Fl 0123456789acnSuWw
+.Op Fl 0123456789acnSUuWw
 .Op Fl B Ar records
 .Op Fl b Ar blocksize
 .Op Fl d Ar density
@@ -229,6 +229,13 @@
 flag is mutually exclusive from the
 .Fl u
 flag.
+.It Fl U
+Use the
+.Xr disklabel 8
+UID instead of the device name when updating
+.Pa /etc/dumpdates
+and when searching for the date of the latest
+lower-level dump.
 .It Fl u
 Up

Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
Updated diff below with the following changes:

- p_sigmask is now volatile, per kettenis's request
- kern_fork.c's memcpy() for p_startcopy needs to cast away the
  volatile now
- kettenis pointed out atomic_swap_uint() isn't safe to use in MI code
  yet, so for now proc_sigsetmask() just relies on volatile assignment
  being atomic

ok?

Index: sys/proc.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/proc.h,v
retrieving revision 1.186
diff -u -p -r1.186 proc.h
--- sys/proc.h  15 May 2014 03:52:25 -  1.186
+++ sys/proc.h  8 Jul 2014 19:19:17 -
@@ -310,7 +310,7 @@ struct proc {
 
 /* The following fields are all copied upon creation in fork. */
 #definep_startcopy p_sigmask
-   sigset_t p_sigmask; /* Current signal mask. */
+   volatile sigset_t p_sigmask;/* Current signal mask. */
 
u_char  p_priority; /* Process priority. */
u_char  p_usrpri;   /* User-priority based on p_cpu and ps_nice. */
@@ -520,6 +520,8 @@ int proc_cansugid(struct proc *);
 void   proc_finish_wait(struct proc *, struct proc *);
 void   process_zap(struct process *);
 void   proc_free(struct proc *);
+void   proc_sigsetmask(struct proc *, sigset_t);
+void   proc_sigsuspend(struct proc *, sigset_t);
 
 struct sleep_state {
int sls_s;
@@ -559,4 +561,3 @@ struct cpu_info *cpuset_first(struct cpu
 
 #endif /* _KERNEL */
 #endif /* !_SYS_PROC_H_ */
-
Index: kern/kern_sig.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.166
diff -u -p -r1.166 kern_sig.c
--- kern/kern_sig.c 4 May 2014 05:03:26 -   1.166
+++ kern/kern_sig.c 8 Jul 2014 20:29:50 -
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -439,37 +440,55 @@ sys_sigprocmask(struct proc *p, void *v,
syscallarg(int) how;
syscallarg(sigset_t) mask;
} */ *uap = v;
-   int error = 0;
-   int s;
-   sigset_t mask;
+   sigset_t mask = SCARG(uap, mask);
 
*retval = p->p_sigmask;
-   mask = SCARG(uap, mask);
-   s = splhigh();
 
switch (SCARG(uap, how)) {
case SIG_BLOCK:
-   p->p_sigmask |= mask &~ sigcantmask;
+   atomic_setbits_int(&p->p_sigmask, mask &~ sigcantmask);
break;
case SIG_UNBLOCK:
-   p->p_sigmask &= ~mask;
+   atomic_clearbits_int(&p->p_sigmask, mask);
break;
case SIG_SETMASK:
-   p->p_sigmask = mask &~ sigcantmask;
+   proc_sigsetmask(p, mask &~ sigcantmask);
break;
default:
-   error = EINVAL;
-   break;
+   return (EINVAL);
}
-   splx(s);
-   return (error);
+
+   return (0);
+}
+
+void
+proc_sigsetmask(struct proc *p, sigset_t mask)
+{
+   KASSERT(p == curproc);
+
+   /* XXX: Abusing volatile assignment as atomic store operation. */
+   p->p_sigmask = mask;
+}
+
+/*
+ * Temporarily replace calling proc's signal mask for the duration of a
+ * system call.  Original signal mask will be restored by userret().
+ */
+void
+proc_sigsuspend(struct proc *p, sigset_t mask)
+{
+   KASSERT(p == curproc);
+   KASSERT(!(p->p_flag & P_SIGSUSPEND));
+
+   p->p_oldmask = p->p_sigmask;
+   atomic_setbits_int(&p->p_flag, P_SIGSUSPEND);
+   proc_sigsetmask(p, mask);
 }
 
 /* ARGSUSED */
 int
 sys_sigpending(struct proc *p, void *v, register_t *retval)
 {
-
*retval = p->p_siglist;
return (0);
 }
@@ -489,16 +508,7 @@ sys_sigsuspend(struct proc *p, void *v, 
struct process *pr = p->p_p;
struct sigacts *ps = pr->ps_sigacts;
 
-   /*
-* When returning from sigpause, we want
-* the old mask to be restored after the
-* signal handler has finished.  Thus, we
-* save it here and mark the sigacts structure
-* to indicate this.
-*/
-   p->p_oldmask = p->p_sigmask;
-   atomic_setbits_int(&p->p_flag, P_SIGSUSPEND);
-   p->p_sigmask = SCARG(uap, mask) &~ sigcantmask;
+   proc_sigsuspend(p, SCARG(uap, mask) &~ sigcantmask);
while (tsleep(ps, PPAUSE|PCATCH, "pause", 0) == 0)
/* void */;
/* always return EINTR rather than ERESTART... */
@@ -749,7 +759,7 @@ trapsignal(struct proc *p, int signum, u
p->p_ru.ru_nsignals++;
(*pr->ps_emul->e_sendsig)(ps->ps_sigact[signum], signum,
p->p_sigmask, trapno, code, sigval);
-   p->p_sigmask |= ps->ps_catchmask[signum];
+   atomic_setbits_int(&p->p_sigmask, ps->ps_catchmask[signum]);
if ((ps->ps_sigreset & mask) != 0) {
ps->ps_sigcatch &= ~mask;
if (signum != SIGCONT && sigprop[signum] & S

make ifconfig scan show encryption type

2014-07-08 Thread Stefan Sperling
This makes ifconfig scan indicate the type of encryption used on a network.

To make this work, the kernel must fill in RSN info every time it runs
a scan, not just if wpa was already enabled (i.e. the IEEE80211_F_RSNON
flag is already set).

While here, add missing definition for IEEE80211_WPA_CIPHER_BIP.

I don't have an 802.1x (wpa enterprise) network in range, so I couldn't
test that part.

This is an ABI break since the ioctl node request data structure grows.

ok?

Index: sbin/ifconfig/ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.284
diff -u -p -r1.284 ifconfig.c
--- sbin/ifconfig/ifconfig.c23 Jun 2014 18:44:43 -  1.284
+++ sbin/ifconfig/ifconfig.c8 Jul 2014 21:11:22 -
@@ -2281,6 +2281,18 @@ ieee80211_printnode(struct ieee80211_nod
nr->nr_capinfo &= ~IEEE80211_CAPINFO_ESS;
if (nr->nr_capinfo) {
printb_status(nr->nr_capinfo, IEEE80211_CAPINFO_BITS);
+   if (nr->nr_capinfo & IEEE80211_CAPINFO_PRIVACY) {
+   if (nr->nr_rsnciphers & IEEE80211_WPA_CIPHER_CCMP)
+   fputs(",wpa2", stdout);
+   else if (nr->nr_rsnciphers & IEEE80211_WPA_CIPHER_TKIP)
+   fputs(",wpa1", stdout);
+   else
+   fputs(",wep", stdout);
+
+   if (nr->nr_rsnakms & IEEE80211_WPA_AKM_8021X ||
+   nr->nr_rsnakms & IEEE80211_WPA_AKM_SHA256_8021X)
+   fputs(",802.1x", stdout);
+   }
putchar(' ');
}
 
Index: sys/net80211/ieee80211_input.c
===
RCS file: /cvs/src/sys/net80211/ieee80211_input.c,v
retrieving revision 1.123
diff -u -p -r1.123 ieee80211_input.c
--- sys/net80211/ieee80211_input.c  11 Jun 2013 18:15:53 -  1.123
+++ sys/net80211/ieee80211_input.c  8 Jul 2014 21:11:22 -
@@ -1579,11 +1579,11 @@ ieee80211_recv_probe_resp(struct ieee802
ieee80211_parse_wmm_params(ic, wmmie);
}
 
-   if (ic->ic_state == IEEE80211_S_SCAN &&
+   if (ic->ic_state == IEEE80211_S_SCAN
 #ifndef IEEE80211_STA_ONLY
-   ic->ic_opmode != IEEE80211_M_HOSTAP &&
+   && ic->ic_opmode != IEEE80211_M_HOSTAP
 #endif
-   (ic->ic_flags & IEEE80211_F_RSNON)) {
+  ) {
struct ieee80211_rsnparams rsn;
const u_int8_t *saveie = NULL;
/*
@@ -1613,8 +1613,7 @@ ieee80211_recv_probe_resp(struct ieee802
ni->ni_rsncaps = rsn.rsn_caps;
} else
ni->ni_rsnprotos = IEEE80211_PROTO_NONE;
-   } else if (ic->ic_state == IEEE80211_S_SCAN)
-   ni->ni_rsnprotos = IEEE80211_PROTO_NONE;
+   }
 
if (ssid[1] != 0 && ni->ni_esslen == 0) {
ni->ni_esslen = ssid[1];
Index: sys/net80211/ieee80211_ioctl.c
===
RCS file: /cvs/src/sys/net80211/ieee80211_ioctl.c,v
retrieving revision 1.34
diff -u -p -r1.34 ieee80211_ioctl.c
--- sys/net80211/ieee80211_ioctl.c  29 Sep 2010 20:00:51 -  1.34
+++ sys/net80211/ieee80211_ioctl.c  8 Jul 2014 21:11:22 -
@@ -91,7 +91,18 @@ ieee80211_node2req(struct ieee80211com *
nr->nr_inact = ni->ni_inact;
nr->nr_txrate = ni->ni_txrate;
nr->nr_state = ni->ni_state;
-   /* XXX RSN */
+
+   /* RSN */
+   nr->nr_rsnciphers = ni->ni_rsnciphers;
+   nr->nr_rsnakms = 0;
+   if (ni->ni_rsnakms & IEEE80211_AKM_8021X)
+   nr->nr_rsnakms |= IEEE80211_WPA_AKM_8021X;
+   if (ni->ni_rsnakms & IEEE80211_AKM_PSK)
+   nr->nr_rsnakms |= IEEE80211_WPA_AKM_PSK;
+   if (ni->ni_rsnakms & IEEE80211_AKM_SHA256_8021X)
+   nr->nr_rsnakms |= IEEE80211_WPA_AKM_SHA256_8021X;
+   if (ni->ni_rsnakms & IEEE80211_AKM_SHA256_PSK)
+   nr->nr_rsnakms |= IEEE80211_WPA_AKM_SHA256_PSK;
 
/* Node flags */
nr->nr_flags = 0;
Index: sys/net80211/ieee80211_ioctl.h
===
RCS file: /cvs/src/sys/net80211/ieee80211_ioctl.h,v
retrieving revision 1.18
diff -u -p -r1.18 ieee80211_ioctl.h
--- sys/net80211/ieee80211_ioctl.h  4 Mar 2011 23:48:15 -   1.18
+++ sys/net80211/ieee80211_ioctl.h  8 Jul 2014 21:11:22 -
@@ -230,6 +230,7 @@ struct ieee80211_wpapsk {
 #define IEEE80211_WPA_CIPHER_TKIP  0x04
 #define IEEE80211_WPA_CIPHER_CCMP  0x08
 #define IEEE80211_WPA_CIPHER_WEP1040x10
+#define IEEE80211_WPA_CIPHER_BIP   0x20
 
 #define IEEE80211_WPA_AKM_PSK  0x01
 #define IEEE80211_WPA_AKM_8021X0x02
@@ -311,7 +312,9 @@ struct ieee80211_nodereq {
u_int8_tnr_txrate;  /* index to nr_rates[]

Re: Paravirtualized optimizations for KVM

2014-07-08 Thread Mark Kettenis
> Date: Tue, 8 Jul 2014 09:22:41 +0200 (CEST)
> From: Stefan Fritsch 
> 
> Hi,
> 
> I have been trying to increase fork performance of openbsd/amd64 on KVM. 
> It turns out that when I increase the number of CPUs of a VM from 1 to 3, 
> a fork+exit micro benchmark is slowed down by a factor of 7.
> 
> The main reason for this seems to be a very large number of cross-CPU TLB 
> flushes (about 4 per fork+exit). Each IPI causes several VM exits which 
> are expensive. To reduce this, I have been trying to use paravirtualized 
> interfaces provided by KVM and optimize some other things. These changes 
> are mostly activated by a new pseudo device paravirt (which has the 
> advantage that one can use UKC to tweak things without recompiling). 
> However, some changes will remain if not running on a hypervisor (or 
> paravirt is disabled). For example, x86_ipi() will use a pointer to 
> dispatch to the appropriate implementation.
> 
> Is this the way to go forward? Or would you rather prefer a compile time 
> option and maybe ship a bsd.mp.paravirt kernel in addition to bsd+bsd.mp?

Are these paravirtualization APIs stable?  Are they (properly)
documented somewhere?

If we're serious about supporting OpenBSD on (KVM) hypervisors,
something like this makes sense.  We tend to try and have a single
kernel that runs on the widest range of hardware that is possible.
For example the OpenBSD/sparc64 kernel runs on both sun4u and sun4v
hardware, and the sun4v platforms has written paravirtualization all
over it.  There I successfully made use of code patching techniques.
That might help on x86 as well.

Can't say I'm happy with making the interrupt handling code even more
complicated though...



Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Philip Guenther
On Mon, Jul 7, 2014 at 1:46 PM, Matthew Dempsky  wrote:

> p_sigmask is only modified by the owning thread from process context
> (e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
> but it can be accessed anywhere (e.g., interrupts or threads on other
> CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
> but that's not MP-safe.
>

(warning: "MP-safe" used before definition.)


While moderately cool, I don't think this is the right direction to go for
two reasons:

 * using atomics when we don't even have correct process-pending signals
risks a solution that
   doesn't work when more than one pending set has to be examined.  In the
mean time, atomics
   do have a cost in bus contention, so would we just end up pulling the
atomics back out after
   putting this stuff under a mutex or rwlock?

 * something tells me you're looking at this because it's a tall pole in
syscall counts because of
   ld.so..but I'm working to drop it from the top 20 syscalls as a
side-effect of adding __kbind()


Philip Guenther


Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 3:51 PM, Philip Guenther  wrote:
>  * using atomics when we don't even have correct process-pending signals
> risks a solution that
>doesn't work when more than one pending set has to be examined.

I'll hold off on this until we have process-pending signals then.



port-modules.5 small fix

2014-07-08 Thread Brian Callahan

Hi everyone --

port-modules.5 is missing an "If" to make a complete sentence.
OK?

~Brian

Index: port-modules.5
===
RCS file: /cvs/src/share/man/man5/port-modules.5,v
retrieving revision 1.172
diff -u -p -r1.172 port-modules.5
--- port-modules.5  2 Apr 2014 15:00:27 -   1.172
+++ port-modules.5  9 Jul 2014 01:37:23 -
@@ -1184,6 +1184,7 @@ is appended to
 .Ev MODGNOME_RUN_DEPENDS
 and a link to /usr/bin/true is created under
 .Pa ${WRKDIR}/bin/desktop-file-validate .
+If
 .Ev MODGNOME_TOOLS
 is set to docbook,
 .Pa textproc/docbook-xsl



Re: port-modules.5 small fix

2014-07-08 Thread Daniel Dickman
On Tue, Jul 8, 2014 at 9:59 PM, Brian Callahan  wrote:
> Hi everyone --
>
> port-modules.5 is missing an "If" to make a complete sentence.
> OK?

ok daniel@

>
> ~Brian
>
> Index: port-modules.5
> ===
> RCS file: /cvs/src/share/man/man5/port-modules.5,v
> retrieving revision 1.172
> diff -u -p -r1.172 port-modules.5
> --- port-modules.5  2 Apr 2014 15:00:27 -   1.172
> +++ port-modules.5  9 Jul 2014 01:37:23 -
> @@ -1184,6 +1184,7 @@ is appended to
>  .Ev MODGNOME_RUN_DEPENDS
>  and a link to /usr/bin/true is created under
>  .Pa ${WRKDIR}/bin/desktop-file-validate .
> +If
>  .Ev MODGNOME_TOOLS
>  is set to docbook,
>  .Pa textproc/docbook-xsl
>



growfs error message, lack of comprehension

2014-07-08 Thread Adam Thompson
On 5.4-RELEASE, I'm trying to use growfs to expand a root filesystem. 
I've grown the disk from 2GB to 10GB, I've used disklabel(8) to adjust 
the OpenBSD area and the size of partition 'a'.  All those numbers line up.
Rebooting into bsd.rd, copying /sbin/growfs to the ramdisk, and then 
using it (successfully, more or less), I still see this error:


new filesystem size is: 3411480 frags
Warning: 376928 sector(s) cannot be allocated.
growfs: not enough new space

I can see in growfs.c where that messages is being emitted, and I see 
the comment there that says "The space in the new last cylinder group is 
too small, so revert back.
I lack, however, a sufficiently low-level understanding of FFS internals 
to understand _why_ the space in the new last cylinder group is too small.
dumpfs(8) reports "cylgrp dynamic inodes 4.4BSD fslevel 3", which I 
vaguely understood to mean that cylinder groups were now 
dynamically-sized things...?


So, per growfs, what is the space too small _for_ ?  Too small for 
another cylinder group?  How would I go about calculating how much more 
(or less) disk space I would need to make growfs happy, and not wasting ?


Now waiting incoming cluebats :-(

--
-Adam Thompson
 athom...@athompso.net



divert(4) without mbuf tags

2014-07-08 Thread Lawrence Teo
The current divert(4) implementation allocates an mbuf tag in pf_test()
to store the divert port specified by a divert-packet PF rule.

The divert_packet() function then looks up that mbuf tag to retrieve the
divert port number before sending the packet to userspace.

As far as I can tell, this approach of using an mbuf tag was borrowed
from divert-to's implementation.  However, in the case of divert(4) I
think it's overkill because once the packet has reached userspace,
its mbuf and mbuf tag are no longer needed.

I would like to simplify divert(4)'s implementation by passing the
divert port to divert_packet() directly as an argument, which avoids the
allocation of an mbuf tag completely.

ok?


Index: net/pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.878
diff -u -p -u -p -r1.878 pf.c
--- net/pf.c20 May 2014 11:03:13 -  1.878
+++ net/pf.c14 Jun 2014 18:12:06 -
@@ -6617,14 +6617,8 @@ done:
}
}
 
-   if (action == PF_PASS && r->divert_packet.port) {
-   struct pf_divert *divert;
-
-   if ((divert = pf_get_divert(pd.m)))
-   divert->port = r->divert_packet.port;
-
+   if (action == PF_PASS && r->divert_packet.port)
action = PF_DIVERT;
-   }
 
if (pd.pflog) {
struct pf_rule_item *ri;
@@ -6651,12 +6645,12 @@ done:
case PF_DIVERT:
switch (pd.af) {
case AF_INET:
-   if (divert_packet(pd.m, pd.dir) == 0)
+   if (!divert_packet(pd.m, pd.dir, r->divert_packet.port))
*m0 = NULL;
break;
 #ifdef INET6
case AF_INET6:
-   if (divert6_packet(pd.m, pd.dir) == 0)
+   if (!divert6_packet(pd.m, pd.dir, 
r->divert_packet.port))
*m0 = NULL;
break;
 #endif /* INET6 */
Index: netinet/ip_divert.c
===
RCS file: /cvs/src/sys/netinet/ip_divert.c,v
retrieving revision 1.22
diff -u -p -u -p -r1.22 ip_divert.c
--- netinet/ip_divert.c 23 Apr 2014 14:43:14 -  1.22
+++ netinet/ip_divert.c 14 Jun 2014 17:45:12 -
@@ -189,12 +189,11 @@ fail:
 }
 
 int
-divert_packet(struct mbuf *m, int dir)
+divert_packet(struct mbuf *m, int dir, u_int16_t divert_port)
 {
struct inpcb *inp;
struct socket *sa = NULL;
struct sockaddr_in addr;
-   struct pf_divert *divert;
 
inp = NULL;
divstat.divs_ipackets++;
@@ -205,15 +204,8 @@ divert_packet(struct mbuf *m, int dir)
return (0);
}
 
-   divert = pf_find_divert(m);
-   if (divert == NULL) {
-   divstat.divs_errors++;
-   m_freem(m);
-   return (0);
-   }
-
TAILQ_FOREACH(inp, &divbtable.inpt_queue, inp_queue) {
-   if (inp->inp_lport != divert->port)
+   if (inp->inp_lport != divert_port)
continue;
if (inp->inp_divertfl == 0)
break;
Index: netinet/ip_divert.h
===
RCS file: /cvs/src/sys/netinet/ip_divert.h,v
retrieving revision 1.5
diff -u -p -u -p -r1.5 ip_divert.h
--- netinet/ip_divert.h 23 Apr 2014 14:43:14 -  1.5
+++ netinet/ip_divert.h 14 Jun 2014 17:52:05 -
@@ -55,7 +55,7 @@ extern struct divstat divstat;
 
 voiddivert_init(void);
 voiddivert_input(struct mbuf *, ...);
-int divert_packet(struct mbuf *, int);
+int divert_packet(struct mbuf *, int, u_int16_t);
 int divert_sysctl(int *, u_int, void *, size_t *, void *, size_t);
 int divert_usrreq(struct socket *,
int, struct mbuf *, struct mbuf *, struct mbuf *, struct proc *);
Index: netinet6/ip6_divert.c
===
RCS file: /cvs/src/sys/netinet6/ip6_divert.c,v
retrieving revision 1.23
diff -u -p -u -p -r1.23 ip6_divert.c
--- netinet6/ip6_divert.c   28 Apr 2014 15:43:04 -  1.23
+++ netinet6/ip6_divert.c   14 Jun 2014 18:13:03 -
@@ -188,12 +188,11 @@ fail:
 }
 
 int
-divert6_packet(struct mbuf *m, int dir)
+divert6_packet(struct mbuf *m, int dir, u_int16_t divert_port)
 {
struct inpcb *inp;
struct socket *sa = NULL;
struct sockaddr_in6 addr;
-   struct pf_divert *divert;
 
inp = NULL;
div6stat.divs_ipackets++;
@@ -204,15 +203,8 @@ divert6_packet(struct mbuf *m, int dir)
return (0);
}
 
-   divert = pf_find_divert(m);
-   if (divert == NULL) {
-   div6stat.divs_errors++;
-   m_freem(m);
-   return (0);
-   }
-
TAILQ_FOREACH(inp, &divb6table.inpt_queue, inp_queue) {
-   if (inp->inp_l

Re: growfs error message, lack of comprehension

2014-07-08 Thread Otto Moerbeek
On Tue, Jul 08, 2014 at 09:41:08PM -0500, Adam Thompson wrote:

> On 5.4-RELEASE, I'm trying to use growfs to expand a root filesystem. I've
> grown the disk from 2GB to 10GB, I've used disklabel(8) to adjust the
> OpenBSD area and the size of partition 'a'.  All those numbers line up.
> Rebooting into bsd.rd, copying /sbin/growfs to the ramdisk, and then using
> it (successfully, more or less), I still see this error:
> 
> new filesystem size is: 3411480 frags
> Warning: 376928 sector(s) cannot be allocated.
> growfs: not enough new space
> 
> I can see in growfs.c where that messages is being emitted, and I see the
> comment there that says "The space in the new last cylinder group is too
> small, so revert back.
> I lack, however, a sufficiently low-level understanding of FFS internals to
> understand _why_ the space in the new last cylinder group is too small.
> dumpfs(8) reports "cylgrp dynamic inodes 4.4BSD fslevel 3", which I vaguely
> understood to mean that cylinder groups were now dynamically-sized
> things...?
> 
> So, per growfs, what is the space too small _for_ ?  Too small for another
> cylinder group?  How would I go about calculating how much more (or less)
> disk space I would need to make growfs happy, and not wasting ?
> 
> Now waiting incoming cluebats :-(
> 
> -- 
> -Adam Thompson
>  athom...@athompso.net

An ffs filsystems consist of a number of cylinder groups. Note that
this term is fornm the old days, these days cylinders do not matter at
all. Each group holds it's own inode table, block usage bitmap and
summary information (but note that files can span cgs).

The cylinder group size is fixed during newfs. The last cylinder group can
be smaller than the standard size, but needs to be big enough to hold
the minimal amount of metadata like the inode table and block usage bitmap.

Now assume a cylgroup size is 100, old size of the disk was 500, so
you have 5 cgs. If you grow it to 601, the last cg would need to be 1
in size, which might be too small to hold the cg meta info.

The message says that you hit a situation where the last cg would have
been to small, so instead of n new cylinder groups you get n-1. 

Now use dumpfs.

Look at the superblock: dblkno is the offset of the start of
the data area (in frags) of a cylgroup. That's the minimal size of a
cylinder group.
fpg is the standard group size (in frags). You should be able to
compute what you want using these numbers. dumpfs also reports the
size of the last cg.

"dynamic" has to do with the rotational info, it is hardly relevant,
afaik. 

-Otto