from:"Kostik Belousov"

Re: FreeBSD 9 recompile ports

2012-01-13 Thread Kostik Belousov

On Fri, Jan 13, 2012 at 04:11:22PM +0200, Andriy Gapon wrote:
 on 13/01/2012 14:57 George Kontostanos said the following:
  Still the question remains regarding COMPAT_FREEBSD8 and how this
  affects ports/misc/compat8x/
 
 It looks like none of the previous hints have been clear enough.
 There is no direct relation between COMPAT_FREEBSD8 and misc/compat8x.
 COMPAT_FREEBSDX options are only needed if, going from release X to
 release X+1, there was a change to an existing system call at the
 kernel-userland boundary.
 A side note: kernel options affect only what's in the kernel, quite obviously.
 misc/compatXx contains versions of shared libraries from release X that are
 no longer present in X+1.

An additional twist is that not every change at the kernel/usermode boundary
is covered by backward-compatibility shims. A recent example is the CAM
ABI change, which makes libcam.so.5 from compat8x useless.



Re: Mystery panic, FreeBSD 7.2-PRE

2011-12-23 Thread Kostik Belousov

On Thu, Dec 22, 2011 at 04:04:48PM -0700, Charlie Martin wrote:
 We've got another mystery panic in 7.2-PRE.  Upgrading is not an option; 
 however, if this is familiar to anyone, backporting a patch would be.
 
 The stack trace is:
 
 db_trace_self_wrapper() at 0x8019120a = db_trace_self_wrapper+0x2a
 panic() at 0x80308797 = panic+0x187
 devfs_populate_loop() at 0x802a45c8 = devfs_populate_loop+0x548
 devfs_populate() at 0x802a46ab = devfs_populate+0x3b
 devfs_lookup() at 0x802a7824 = devfs_lookup+0x264
 [24165][irq261: plx0] DEBUG (hasc_sv_rcv_cb):  rcvd hrtbt ts 24051, 7/9, rc 0
 VOP_LOOKUP_APV() at 0x804d5995 = VOP_LOOKUP_APV+0x95
 lookup() at 0x80384a3e = lookup+0x4ce
 namei() at 0x80385768 = namei+0x2c8
 vn_open_cred() at 0x8039b283 = vn_open_cred+0x1b3
 kern_open() at 0x8039a4a0 = kern_open+0x110
 syscall() at 0x804b0e3c = syscall+0x1ec
 Xfast_syscall() at 0x80494ecb = Xfast_syscall+0xab
 --- syscall (5, FreeBSD ELF64, open), rip = 0x800e022fc, rsp = 0x7fbfa128, rbp = 0x801002240 ---
 KDB: enter: panic

It is impossible to diagnose the real cause of the panic from the backtrace
above. 99.99% of the issues causing that backtrace are bugs in specific
drivers that fail to dev_ref() the newly created cdev, e.g. in the clone
handler.

My interest in the issue is limited to the slightest possibility that the
bug is not yet fixed in HEAD or 9/8. The usual suspect is tty, which was
completely rototilled in 8.
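The reference-counting rule above can be illustrated with a small userland model. This is not kernel code: `struct model_cdev`, `model_make_dev()`, `model_dev_ref()`, `model_dev_rel()` and both clone handlers are hypothetical stand-ins for a cdev, make_dev(), dev_ref()/dev_rel() and a driver clone handler. The sketch assumes that the lookup path drops one reference on the cdev after the clone handler returns, so a handler that does not take its own reference ends up consuming the driver's reference.

```c
#include <assert.h>

/* Hypothetical userland model of a device; NOT the kernel's struct cdev. */
struct model_cdev {
	int refcount;	/* models si_refcount */
};

/* Models make_dev(): a new device starts with one reference, which
 * belongs to the driver until the device is destroyed. */
static void model_make_dev(struct model_cdev *dev)
{
	dev->refcount = 1;
}

/* Models dev_ref(). */
static void model_dev_ref(struct model_cdev *dev)
{
	dev->refcount++;
}

/* Models dev_rel(); dropping a reference that does not exist is a bug. */
static void model_dev_rel(struct model_cdev *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Buggy clone handler: creates the device but hands it back unreferenced. */
static void clone_handler_buggy(struct model_cdev *dev)
{
	model_make_dev(dev);
}

/* Correct clone handler: takes a reference on the cdev it hands back. */
static void clone_handler_fixed(struct model_cdev *dev)
{
	model_make_dev(dev);
	model_dev_ref(dev);
}

/* Models the lookup side: run the clone handler, use the device, then
 * drop the reference the handler was expected to take. Returns the
 * references left on a device the driver still considers alive. */
static int model_lookup(void (*handler)(struct model_cdev *))
{
	struct model_cdev dev = { 0 };

	handler(&dev);
	/* ... the lookup would use the device here ... */
	model_dev_rel(&dev);
	return dev.refcount;
}
```

In the buggy case the device is still alive from the driver's point of view but has no references left, which is the kind of state in which a later devfs walk, such as the devfs_populate_loop() frame in the trace, may touch a freed cdev.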



Re: directory listing hangs in ufs state

2011-12-22 Thread Kostik Belousov

On Wed, Dec 21, 2011 at 09:03:02PM +0400, Andrey Zonov wrote:
 On 15.12.2011 17:01, Kostik Belousov wrote:
 On Thu, Dec 15, 2011 at 03:51:02PM +0400, Andrey Zonov wrote:
 On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
 
 On Wed, Dec 14, 2011 at 11:47:10PM +0400, Andrey Zonov wrote:
 On 14.12.2011 22:22, Jeremy Chadwick wrote:
 On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:
 Hi Jeremy,
 
 This is not a hardware problem, I've already checked that. I also ran
 fsck today and got no errors.
 
 After some more exploration of how mongodb works, I found that when the
 listing hangs, one of the mongodb threads is in the biowr state for a long
 time. It periodically calls msync(MS_SYNC), according to the ktrace
 output.
 
 If I remove the msync() calls from mongodb, how often will the data be
 synced by the OS?
 
 --
 Andrey Zonov
 
 On 14.12.2011 2:15, Jeremy Chadwick wrote:
 On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:
 
 Do you have any idea what is going on, or how to catch the problem?
 
 Assuming this isn't a file on the root filesystem, try booting the
 machine in single-user mode and using fsck -f on the filesystem in
 question.
 
 Can you verify there are no problems with the disk this file lives on as
 well (smartctl -a /dev/disk)?  I doubt this is the problem, but
 thought I'd mention it.
 
 I have no real answer, I'm sorry.  msync(2) indicates it's effectively
 deprecated (see BUGS).  It looks like this is effectively an mmap version
 of fsync(2).
 
 I replaced msync(2) with fsync(2).  Unfortunately, from the man pages it
 is not obvious that I can do this. Anyway, thanks.
 
 Sorry, that wasn't what I was implying.  Let me try to explain
 differently.
 
 msync(2) looks, to me, like an mmap-specific version of fsync(2).  Based
 on the man page, it seems that with msync() you can effectively
 guarantee flushing of certain pages within an mmap()'d region to disk.
 fsync() would cause **all** buffers/internal pages to be flushed to
 disk.
 
 One would need to look at the mongodb code to find out what it's
 actually doing with msync().  That is to say, if it's doing something
 like this (I probably have the semantics wrong -- I've never spent much
 time with mmap()):
 
 fd = open("/some/file", O_RDWR);
 ptr = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 ret = msync(ptr, 65536, MS_SYNC);
 /* or alternatively, this:
 ret = msync(ptr, NULL, MS_SYNC);
 */
 
 Then this, to me, would be mostly the equivalent to:
 
 fp = fopen("/some/file", "r+");
 ret = fsync(fileno(fp));
 
 Otherwise, if it's calling msync() only on an address/location within
 the region ptr points to, then that may be more efficient (fewer pages to
 flush).
 
 
 They call msync() for the whole file.  So, there will not be any difference.
 
 
 The mmap() arguments -- specifically flags (see man page) -- also play
 a role here.  The one that catches my attention is MAP_NOSYNC.  So you
 may need to look at the mongodb code to figure out what its mmap()
 call is.
 
 One might wonder why they don't just use open() with O_SYNC.  I
 imagine that has to do with, again, performance; possibly they don't want
 all I/O synchronous, and would rather flush certain pages in the mmap'd
 region to disk as needed.  I see the legitimacy in that approach (vs.
 just using O_SYNC).
 
 There's really no easy way for me to tell you which is more efficient,
 better, blah blah without spending a lot of time with a benchmarking
 program that tests all of this, *plus* an entire system (world) built
 with profiling.
 
 
 I ran mongodb with fsync() for two hours and got the following:
 STARTED                  INBLK   OUBLK MAJFLT  MINFLT
 Thu Dec 15 10:34:52 2011     3  192744    314 3080182
 
 This is the output of `ps -o lstart,inblock,oublock,majflt,minflt -U mongodb'.
 
 Then I ran it with the default msync():
 STARTED                  INBLK   OUBLK MAJFLT  MINFLT
 Thu Dec 15 12:34:53 2011     0 7241555     79 5401945
 
 There are also two graphs of disk busyness [1] [2].
 
 The difference is significant: 37 times!  That is what I expected to get.
 
 In commentaries for vm_object_page_clean() I found this:
 
   *  When stuffing pages asynchronously, allow clustering.  XXX we need a
   *  synchronous clustering mode implementation.
 
 To me this means that msync(MS_SYNC) flushes every page to disk in a
 separate I/O transaction.  If we multiply 4K by 37 we get about 150K, which
 in my experience is the size of a single transaction.
 
 +alc@, kib@
 
 Am I right? Is there any plan to implement this?
 Current buffer clustering code can do only async writes. In fact, I
 am not quite sure what would constitute sync clustering, because the
 ability to delay the write is important to be able to cluster at all.
 
 Also, I am not sure that the lack of clustering is the biggest problem.
 IMO, the fact that each write is sync is the first problem there. It
 would be quite a lot of work to add tracking of the issued writes

Re: fsck_ufs out of swapspace

2011-12-20 Thread Kostik Belousov

On Tue, Dec 20, 2011 at 09:51:43AM +1100, Peter Jeremy wrote:
 On 2011-Dec-19 22:27:49 +0100, Michiel Boland bolan...@xs4all.nl wrote:
 Problem solved - it was indeed an endian thing.
 The problem is that fsck uses a real_dev_bsize variable that is declared
 long, but the DIOCGSECTORSIZE ioctl takes a u_int argument.
 
 To be accurate, this isn't an endian problem, it's a general problem
 of passing a pointer to an incorrectly sized object.  The bug is
 masked on amd64 & iA64 because real_dev_bsize is statically allocated
 and therefore initialised to zero.  This means the failure to assign
 the top 32 bits in the ioctl doesn't affect the final result.
 
 A PR has been submitted.
 
 sparc64/163460 for the record.  Thank you for tracking that down.

The easier fix is to change the type of real_dev_bsize. I used long only
because the other variables keeping the sector size are long, but there
is not much reason to use long there.
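Peter's explanation of the size mismatch can be reproduced in userland. A minimal sketch, assuming a hypothetical `fake_get_sectorsize()` that behaves like DIOCGSECTORSIZE by storing exactly sizeof(u_int) bytes at the address it is given: on a little-endian host a zero-initialized long still ends up holding the right value, while on a 64-bit big-endian host such as sparc64 the four bytes land in the high half of the long.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an ioctl whose argument is a u_int:
 * the "kernel" copies exactly sizeof(unsigned int) bytes to *arg. */
static void fake_get_sectorsize(void *arg)
{
	unsigned int secsize = 4096;

	memcpy(arg, &secsize, sizeof(secsize));
}

/* The original bug: the result is stored into a long. Starting from zero,
 * like a statically allocated variable, only the first
 * sizeof(unsigned int) bytes of the long are written. */
long sectorsize_into_long(void)
{
	long bsize = 0;

	fake_get_sectorsize(&bsize);
	return bsize;
}

/* The fix from the patch: use a variable of the type the ioctl expects. */
unsigned int sectorsize_into_uint(void)
{
	unsigned int bsize = 0;

	fake_get_sectorsize(&bsize);
	return bsize;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
	unsigned int one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}
```

On a 64-bit big-endian machine the long variant would yield 4096 << 32 rather than 4096, which is why switching the variable to u_int is enough to fix it.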

Peter, would you please retest the +J case on non-512-byte sectors with the
attached patch?

diff --git a/sbin/fsck_ffs/fsck.h b/sbin/fsck_ffs/fsck.h
index 8091d0f..4e30a7e 100644
--- a/sbin/fsck_ffs/fsck.h
+++ b/sbin/fsck_ffs/fsck.h
@@ -268,7 +268,7 @@ char	snapname[BUFSIZ];	/* when doing snapshots, the name of the file */
 char	*cdevname;		/* name of device being checked */
 long	dev_bsize;		/* computed value of DEV_BSIZE */
 long	secsize;		/* actual disk sector size */
-long	real_dev_bsize;
+u_int	real_dev_bsize;		/* actual disk sector size, not overridden */
 char	nflag;			/* assume a no response */
 char	yflag;			/* assume a yes response */
 int	bkgrdflag;		/* use a snapshot to run on an active system */
diff --git a/sbin/fsck_ffs/suj.c b/sbin/fsck_ffs/suj.c
index ec8b5ab..b784519 100644
--- a/sbin/fsck_ffs/suj.c
+++ b/sbin/fsck_ffs/suj.c
@@ -206,7 +206,7 @@ opendisk(const char *devnam)
 	    &real_dev_bsize) == -1)
 		real_dev_bsize = secsize;
 	if (debug)
-		printf("dev_bsize %ld\n", real_dev_bsize);
+		printf("dev_bsize %u\n", real_dev_bsize);
 }
 
 /*



Re: directory listing hangs in ufs state

2011-12-15 Thread Kostik Belousov

On Thu, Dec 15, 2011 at 03:51:02PM +0400, Andrey Zonov wrote:
 On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
 
  On Wed, Dec 14, 2011 at 11:47:10PM +0400, Andrey Zonov wrote:
   On 14.12.2011 22:22, Jeremy Chadwick wrote:
   On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:
   Hi Jeremy,
   
   This is not a hardware problem, I've already checked that. I also ran
   fsck today and got no errors.
   
   After some more exploration of how mongodb works, I found that when the
   listing hangs, one of the mongodb threads is in the biowr state for a long
   time. It periodically calls msync(MS_SYNC), according to the ktrace
   output.
   
   If I remove the msync() calls from mongodb, how often will the data be
   synced by the OS?
   
   --
   Andrey Zonov
   
   On 14.12.2011 2:15, Jeremy Chadwick wrote:
   On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:
   
   Do you have any idea what is going on, or how to catch the problem?
   
   Assuming this isn't a file on the root filesystem, try booting the
   machine in single-user mode and using fsck -f on the filesystem in
   question.
   
   Can you verify there are no problems with the disk this file lives on as
   well (smartctl -a /dev/disk)?  I doubt this is the problem, but
   thought I'd mention it.
   
   I have no real answer, I'm sorry.  msync(2) indicates it's effectively
   deprecated (see BUGS).  It looks like this is effectively an mmap version
   of fsync(2).
  
   I replaced msync(2) with fsync(2).  Unfortunately, from the man pages it
   is not obvious that I can do this. Anyway, thanks.
 
  Sorry, that wasn't what I was implying.  Let me try to explain
  differently.
 
  msync(2) looks, to me, like an mmap-specific version of fsync(2).  Based
  on the man page, it seems that with msync() you can effectively
  guarantee flushing of certain pages within an mmap()'d region to disk.
  fsync() would cause **all** buffers/internal pages to be flushed to
  disk.
 
  One would need to look at the mongodb code to find out what it's
  actually doing with msync().  That is to say, if it's doing something
  like this (I probably have the semantics wrong -- I've never spent much
  time with mmap()):
 
  fd = open("/some/file", O_RDWR);
  ptr = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  ret = msync(ptr, 65536, MS_SYNC);
  /* or alternatively, this:
  ret = msync(ptr, NULL, MS_SYNC);
  */
 
  Then this, to me, would be mostly the equivalent to:
 
  fp = fopen("/some/file", "r+");
  ret = fsync(fileno(fp));
 
  Otherwise, if it's calling msync() only on an address/location within
  the region ptr points to, then that may be more efficient (fewer pages to
  flush).
 
 
 They call msync() for the whole file.  So, there will not be any difference.
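For a mapping that covers the whole file, as in mongodb's case, msync(MS_SYNC) over the full range and fsync(2) on the descriptor ask the kernel to write back the same dirty pages; how efficiently each path does it is a separate question. The sketch below puts the two calls side by side. The function name `write_and_flush()` is hypothetical, and the code uses only POSIX calls (it does not use the FreeBSD-specific MAP_NOSYNC flag).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes `len` bytes of `fill` to the file at `path` through a shared
 * mapping, then flushes them. If use_msync is non-zero, the whole mapping
 * is flushed with msync(MS_SYNC); otherwise fsync(2) is called on the
 * descriptor. Returns 0 on success, -1 on failure. */
int write_and_flush(const char *path, size_t len, char fill, int use_msync)
{
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -1;

	/* The file must be at least as long as the mapping. */
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return -1;
	}

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Dirty every page of the mapping. */
	memset(p, fill, len);

	/* Either flush the mapped range or the whole file. */
	int ret = use_msync ? msync(p, len, MS_SYNC) : fsync(fd);

	munmap(p, len);
	close(fd);
	return ret;
}
```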
 
 
  The mmap() arguments -- specifically flags (see man page) -- also play
  a role here.  The one that catches my attention is MAP_NOSYNC.  So you
  may need to look at the mongodb code to figure out what its mmap()
  call is.
 
  One might wonder why they don't just use open() with O_SYNC.  I
  imagine that has to do with, again, performance; possibly they don't want
  all I/O synchronous, and would rather flush certain pages in the mmap'd
  region to disk as needed.  I see the legitimacy in that approach (vs.
  just using O_SYNC).
 
  There's really no easy way for me to tell you which is more efficient,
  better, blah blah without spending a lot of time with a benchmarking
  program that tests all of this, *plus* an entire system (world) built
  with profiling.
 
 
 I ran mongodb with fsync() for two hours and got the following:
 STARTED                  INBLK   OUBLK MAJFLT  MINFLT
 Thu Dec 15 10:34:52 2011     3  192744    314 3080182
 
 This is the output of `ps -o lstart,inblock,oublock,majflt,minflt -U mongodb'.
 
 Then I ran it with the default msync():
 STARTED                  INBLK   OUBLK MAJFLT  MINFLT
 Thu Dec 15 12:34:53 2011     0 7241555     79 5401945
 
 There are also two graphs of disk busyness [1] [2].
 
 The difference is significant: 37 times!  That is what I expected to get.
 
 In commentaries for vm_object_page_clean() I found this:
 
  *  When stuffing pages asynchronously, allow clustering.  XXX we need a
  *  synchronous clustering mode implementation.
 
 To me this means that msync(MS_SYNC) flushes every page to disk in a
 separate I/O transaction.  If we multiply 4K by 37 we get about 150K, which
 in my experience is the size of a single transaction.
 
 +alc@, kib@
 
 Am I right? Is there any plan to implement this?
Current buffer clustering code can do only async writes. In fact, I
am not quite sure what would constitute sync clustering, because the
ability to delay the write is important to be able to cluster at all.
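The point about clustering can be illustrated with a toy model that counts write operations. It is not the kernel's cluster code: `clustered_writes()`, `unclustered_writes()` and the `max_pages` limit are hypothetical. Adjacent dirty pages can be merged into one write only if they are all known at the time the writes are issued, which is why the ability to delay the write matters; when every page is written synchronously as soon as it is cleaned, the count degenerates to one write per page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of write clustering; NOT the kernel implementation.
 * `dirty` lists dirty page indices in ascending order. Runs of adjacent
 * pages are merged into one write of at most `max_pages` pages.
 * Returns the number of write operations issued. */
size_t clustered_writes(const size_t *dirty, size_t n, size_t max_pages)
{
	size_t writes = 0;
	size_t i = 0;

	assert(max_pages > 0);
	while (i < n) {
		size_t run = 1;

		/* Extend the run while the next dirty page is adjacent
		 * and the cluster size limit has not been reached. */
		while (i + run < n && run < max_pages &&
		    dirty[i + run] == dirty[i + run - 1] + 1)
			run++;
		writes++;
		i += run;
	}
	return writes;
}

/* Without clustering, every dirty page is its own synchronous write. */
size_t unclustered_writes(size_t n)
{
	return n;
}
```

For example, 37 contiguous dirty pages become 2 writes with `max_pages` set to 32, versus 37 writes without clustering, which is the same order of difference as the 37x ratio and the 4K × 37 estimate in Andrey's message.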

Also, I am not sure that the lack of clustering is the biggest problem.
IMO, the fact that each write is sync is the first problem there. It
would be quite a lot of work to add tracking of the issued writes to the

Re: tmpfs deadlock on stable/9

2011-12-07 Thread Kostik Belousov

On Wed, Dec 07, 2011 at 01:57:08PM +0400, Dmitry Morozovsky wrote:
 Dear colleagues,
 
 I have a ports tinderbox running on stable/9-amd64, with working directories on
 tmpfs. I have had two consecutive tmpfs deadlocks like
 
 root@beaver:/usr/local/tb/scripts# ps t2
   PID  TT  STAT    TIME COMMAND
  2337   2  Is   0:00.04 /bin/tcsh
  3079   2  I    0:00.01 sudo -sE
  3260   2  I    0:00.02 /bin/tcsh
 20309   2  I+   0:00.06 /bin/sh ./tc tinderbuild -nullfs -norebuild -b 9-i386-RiNet
 27035   2  S+   0:00.13 make PACKAGES=/usr/local/tb/packages/9-i386-RiNet -k -j1 all
 46470   2  I+   0:00.00 sh -ev
 46471   2  I+   0:00.01 /bin/sh /usr/local/tb/scripts/lib/portbuild 9-i386-RiNet 9-i386 RiNet -nullfs gsm-1.0.13.tbz /usr/ports/audio/gsm
 46677   2  I+   0:00.00 /bin/sh /buildscript /usr/ports/audio/gsm 2
 46766   2  I+   0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
 46767   2  I+   0:00.02 make build
 46768   2  I+   0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
 46789   2  I+   0:00.00 [sh]
 46790   2  D+   0:00.01 make -f Makefile -j4 all
 46918   2  I+   0:00.00 sh -ev
 46926   2  I+   0:00.00 sh -ev
 46928   2  Z+   0:00.09 <defunct>
 46938   2  D+   0:00.00 mv gsm_create.o ./src/gsm_create.o
 46940   2  D+   0:00.00 mv gsm_print.o ./src/gsm_print.o
 
 (this is a parallel build; the last 2 mv's are deadlocked on tmpfs)
 
 What kind of additional info should I send? I have debugging turned on in
 the kernel, if it's needed.
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

No, I do not promise to look into it.



Re: tmpfs deadlock on stable/9

2011-12-07 Thread Kostik Belousov

On Wed, Dec 07, 2011 at 09:59:32PM +0400, Dmitry Morozovsky wrote:
 On Wed, 7 Dec 2011, Kostik Belousov wrote:
 
   I have a ports tinderbox running on stable/9-amd64, with working
   directories on tmpfs. I have had two consecutive tmpfs deadlocks like
   
   root@beaver:/usr/local/tb/scripts# ps t2
     PID  TT  STAT    TIME COMMAND
    2337   2  Is   0:00.04 /bin/tcsh
    3079   2  I    0:00.01 sudo -sE
    3260   2  I    0:00.02 /bin/tcsh
   20309   2  I+   0:00.06 /bin/sh ./tc tinderbuild -nullfs -norebuild -b 9-i386-RiNet
   27035   2  S+   0:00.13 make PACKAGES=/usr/local/tb/packages/9-i386-RiNet -k -j1 all
   46470   2  I+   0:00.00 sh -ev
   46471   2  I+   0:00.01 /bin/sh /usr/local/tb/scripts/lib/portbuild 9-i386-RiNet 9-i386 RiNet -nullfs gsm-1.0.13.tbz /usr/ports/audio/gsm
   46677   2  I+   0:00.00 /bin/sh /buildscript /usr/ports/audio/gsm 2
   46766   2  I+   0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
   46767   2  I+   0:00.02 make build
   46768   2  I+   0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
   46789   2  I+   0:00.00 [sh]
   46790   2  D+   0:00.01 make -f Makefile -j4 all
   46918   2  I+   0:00.00 sh -ev
   46926   2  I+   0:00.00 sh -ev
   46928   2  Z+   0:00.09 <defunct>
   46938   2  D+   0:00.00 mv gsm_create.o ./src/gsm_create.o
   46940   2  D+   0:00.00 mv gsm_print.o ./src/gsm_print.o
   
   (this is a parallel build; the last 2 mv's are deadlocked on tmpfs)
   
   What kind of additional info should I send? I have debugging turned on in
   the kernel, if it's needed.
  http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
  
  No, I do not promise to look into it.
 
 It is available at http://bsd.woozle.net/tmpfs-lock-20111207.txt (~260k)
 
 BTW, at least some of the debugger commands referenced (show locks, show
 alllocks) no longer exist
This means that you do not have WITNESS in your kernel.
Look at the reference I pointed you to once more.



Re: Something missing in truss

2011-12-04 Thread Kostik Belousov

On Sat, Dec 03, 2011 at 01:54:58PM -0600, Dan Nelson wrote:
 In the last episode (Dec 02), Eivind Evensen said:
  Does anybody else see this or know why?
  
  The machine here is running :
  
   uname -a
  FreeBSD elg.hjerdalen.lokalnett 8.2-STABLE FreeBSD 8.2-STABLE #36: Wed Nov 30 22:03:07 CET 2011 rumrunner@elg.hjerdalen.lokalnett:/usr/obj/usr/src/sys/RUM  amd64
  
  While trying to weed out some firefox problems, I've noticed
  that truss doesn't recognise certain syscalls :
  
  getpid() = 1519 (0x5ef)
  clock_gettime(4,{48496.335142903 })  = 0 (0x0)
  kevent(20,{0x23,EVFILT_READ,EV_ADD,0,0x0,0x809ec9d80},1,{0x15,EVFILT_READ,0x0,0,0x1,0x809ec9e80},64,0x0) = 1 (0x1)
  clock_gettime(4,{48496.335293202 })  = 0 (0x0)
  read(21,"\0",1)  = 1 (0x1)
  clock_gettime(4,{48496.335382599 })  = 0 (0x0)
  umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
  -- UNKNOWN SYSCALL -14704864 --
  syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
  umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
  -- UNKNOWN SYSCALL -14704864 --
  syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
  umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
  -- UNKNOWN SYSCALL -14704864 --
  syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
  umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
  -- UNKNOWN SYSCALL -14704864 --
  syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
  umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
  -- UNKNOWN SYSCALL -14704864 --
  syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
 
 Two problems: truss gets confused when you attach to a process that's
 currently executing a syscall, and it gets even more confused when you have
 a threaded process waiting in many syscalls at once.
 
 The following patch fixes problem #1, but problem #2 involves keeping more
 per-thread state and ends up touching a lot of the truss code.  See
 http://www.evoy.net/FreeBSD/truss.diff for one solution (and more syscall
 decodes).
 
 Index: setup.c
 ===
 --- setup.c	(revision 228242)
 +++ setup.c	(working copy)
 @@ -202,8 +202,10 @@
  		find_thread(info, lwpinfo.pl_lwpid);
  		switch(WSTOPSIG(waitval)) {
  		case SIGTRAP:
 -			info->pr_why = info->curthread->in_syscall?S_SCX:S_SCE;
 -			info->curthread->in_syscall = 1 - info->curthread->in_syscall;
 +			if ((lwpinfo.pl_flags & (PL_FLAG_SCE|PL_FLAG_SCX)) == 0)
 +				err(1,"pl_flags=%x contains neither PL_FLAG_SCE or PL_FLAG_SCX", lwpinfo.pl_flags);
 +			info->pr_why = (lwpinfo.pl_flags & PL_FLAG_SCE) ? S_SCE:S_SCX;
 +			info->curthread->in_syscall = (info->pr_why == S_SCE) ? 1:0;
  			break;
  		default:
  			info->pr_why = S_SIG;
 
I started a similar but bigger patch to handle syscall entry and exit using
explicit kernel hints. The patch is bigger because it also aims to
handle execve(2)-like syscalls, to properly switch the ABI decoder, and
forks, to attach to the children in a race-free manner. Unfortunately, it
has stalled.

I just committed a similar change from the patch, adding your assertion
for the case when neither PL_FLAG_SCE nor PL_FLAG_SCX is provided. I think
that assertion is in fact not quite right, and the code should fall through
to the default case in the switch. The reason is that SIGTRAP may be sent
as a normal signal. But this change is more controversial, and the patch
should be an improvement over the current situation.
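The alternative described above, where a SIGTRAP carrying neither PL_FLAG_SCE nor PL_FLAG_SCX falls through to the default case instead of triggering err(), might look like the sketch below. It is a standalone model, not truss code: the `MODEL_PL_FLAG_*` values, `enum stop_why` and `classify_stop()` are hypothetical, whereas truss itself would test the real PL_FLAG_SCE/PL_FLAG_SCX from <sys/ptrace.h> against the pl_flags field returned by PT_LWPINFO.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical flag values for illustration only; the real PL_FLAG_SCE
 * and PL_FLAG_SCX are defined in <sys/ptrace.h> on FreeBSD. */
#define MODEL_PL_FLAG_SCE 0x01	/* stopped at syscall entry */
#define MODEL_PL_FLAG_SCX 0x02	/* stopped at syscall exit */

enum stop_why { STOP_SCE, STOP_SCX, STOP_SIG };

/* Classifies a ptrace stop using the kernel hints instead of toggling a
 * per-thread in_syscall bit. A SIGTRAP carrying neither flag is treated
 * as an ordinary signal by falling through to the default case. */
enum stop_why classify_stop(int sig, int pl_flags)
{
	switch (sig) {
	case SIGTRAP:
		if (pl_flags & MODEL_PL_FLAG_SCE)
			return STOP_SCE;
		if (pl_flags & MODEL_PL_FLAG_SCX)
			return STOP_SCX;
		/* FALLTHROUGH */
	default:
		return STOP_SIG;
	}
}
```

Treating the flagless SIGTRAP as a signal keeps truss running when a traced program receives SIGTRAP through kill(2), at the cost of silently accepting a kernel that provides no hints at all.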

Also, I should note that the patch cannot be merged even to stable/9,
because MIPS and ARM still do not properly support PL_FLAGS_XXX.
I hope to handle the merges after 9.0 is released.



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-20 Thread Kostik Belousov

On Fri, Nov 18, 2011 at 12:07:51PM -0800, Doug Barton wrote:
 On 11/18/2011 01:19, Kostik Belousov wrote:
  On Fri, Nov 18, 2011 at 12:00:57AM -0800, Doug Barton wrote:
  On 11/17/2011 02:57, Kostik Belousov wrote:
  It's not catching there though:
 
  Reading symbols from /libexec/ld-elf.so.1...done.
  Loaded symbols for /libexec/ld-elf.so.1
  0x28183b2d in accept () at accept.S:3
  3   RSYSCALL(accept)
  (gdb) c
  Continuing.
  no thread to satisfy query
  0x28183b2d in accept () at accept.S:3
  3   RSYSCALL(accept)
  (gdb) info threads
  Cannot get thread info: invalid key
  (gdb)
  Err, the other part of my message was that you shall set the breakpoint
  on sigprocmask.
 
  I'm sorry I'm not making myself clear. We are setting the breakpoint on
  sigprocmask. But, maybe I'm doing it wrong. Can you give precise
  instructions as to what you want me to do, from the beginning? Sorry to
  be so dense.
  Find the pid of the process issuing an excessive number of sigprocmask
  calls. Do
  $ gdb /usr/local/bin/httpd
  (gdb) attach pid
  (gdb) b _sigprocmask
  (gdb) c
  Bah ! Breakpoint fired.
  (gdb) bt
  (gdb) c
  ... Repeat ... 
 
 Right, so we're on the same page at least. I've been abbreviating the
 output of gdb to make it easier to see the problem, but here is a
 (nearly) complete transcript:
 
 gdb /usr/local/bin/httpd
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "i386-marcel-freebsd"...
 (gdb) attach 1380
 Attaching to program: /usr/local/bin/httpd, process 1380
 Reading symbols from  (lots of symbol-reading snipped)
 3 RSYSCALL(accept)
 Current language:  auto; currently asm
 (gdb) b _sigprocmask
 Breakpoint 1 at 0x282d9055: file /usr/src/lib/libthr/thread/thr_sig.c, line 210.
 (gdb) c
 Continuing.
 no thread to satisfy query
 0x28183b2d in accept () at accept.S:3
 3 RSYSCALL(accept)
 (gdb) c
 Continuing.
 no thread to satisfy query
 0x28183b2d in accept () at accept.S:3
 3 RSYSCALL(accept)
 (gdb) c
 Continuing.
 no thread to satisfy query
 0x28183b2d in accept () at accept.S:3
 3 RSYSCALL(accept)
 
  etc.

This is an issue with either your environment or your gdb, or a bug in gdb.
It seems that 'continue' did not work for you at all. I tried to reproduce
this locally, but was not able to.

Also, I am unable to hit sigprocmask for my apache anywhere except rtld.
I also have libthr linked in.

So the way forward to catch sigprocmask callers is one of:
- figure out why your gdb does not work and fix it; maybe try to use
gdb from ports
- or add libunwind backtraces into sigprocmask
- or use dtrace (I doubt that 8.2 has the necessary usermode bits, and
seriously doubt its stability).



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-18 Thread Kostik Belousov

On Fri, Nov 18, 2011 at 12:00:57AM -0800, Doug Barton wrote:
 On 11/17/2011 02:57, Kostik Belousov wrote:
   It's not catching there though:
   
   Reading symbols from /libexec/ld-elf.so.1...done.
   Loaded symbols for /libexec/ld-elf.so.1
   0x28183b2d in accept () at accept.S:3
   3RSYSCALL(accept)
   (gdb) c
   Continuing.
   no thread to satisfy query
   0x28183b2d in accept () at accept.S:3
   3RSYSCALL(accept)
   (gdb) info threads
   Cannot get thread info: invalid key
   (gdb)
  Err, the other part of my message was that you shall set the breakpoint
  on sigprocmask.
 
 I'm sorry I'm not making myself clear. We are setting the breakpoint on
 sigprocmask. But, maybe I'm doing it wrong. Can you give precise
 instructions as to what you want me to do, from the beginning? Sorry to
 be so dense.
Find the pid of the process issuing an excessive number of sigprocmask
calls. Do
$ gdb /usr/local/bin/httpd
(gdb) attach pid
(gdb) b _sigprocmask
(gdb) c
Bah ! Breakpoint fired.
(gdb) bt
(gdb) c
... Repeat ... 

 
  I want to see a backtrace from the breakpoint hit.
  Several times.
 
 Me too. :)
 
 Meanwhile, in response to one of the other questions, we are using
 mpm_prefork. Also, the particular problem we're seeing does not appear
 related to fork(). The pattern of sigprocmask() calls is different from
 the pattern you see with fork().
I am sure that your sigprocmask calls do not come from rtld; most likely
they come from some use of setjmp(), or of sigsetjmp() with a non-zero savemask.

I am not aware of any significant users of setjmp or sigprocmask in
our system libraries.
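On FreeBSD, setjmp() records the signal mask, which costs a sigprocmask(2) call, while sigsetjmp() with a zero savemask skips it. A minimal sketch, assuming a hypothetical error path `give_up()` that unwinds with siglongjmp(); using savemask 0 is only safe when the code being unwound does not change the signal mask, since the mask is then not restored by the jump. As reported elsewhere in this thread, in Doug's case this change alone did not significantly reduce the number of calls, so treat it as a diagnostic direction rather than a fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>

static sigjmp_buf recover_point;

/* Hypothetical deep error path, e.g. a matcher giving up on its input. */
static void give_up(int code)
{
	siglongjmp(recover_point, code);	/* code must be non-zero */
}

/* Performs `attempts` operations that each abort via siglongjmp().
 * With savemask == 0, sigsetjmp() does not record the signal mask, so
 * neither side of the jump needs a sigprocmask(2) call.
 * Returns how many aborts were recovered from. */
int run_recoverable(int attempts, int savemask)
{
	volatile int recovered = 0;

	assert(attempts >= 0);
	for (volatile int i = 0; i < attempts; i++) {
		if (sigsetjmp(recover_point, savemask) == 0)
			give_up(i + 1);		/* does not return */
		else
			recovered++;
	}
	return recovered;
}
```

Running both variants under ktrace(1) on FreeBSD should show sigprocmask calls only for the non-zero savemask case.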
 
 
 Doug
 
 -- 
 
   We could put the whole Internet into a book.
   Too practical.
 
   Breadth of IT experience, and depth of knowledge in the DNS.
   Yours for the right price.  :)  http://SupersetSolutions.com/



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-17 Thread Kostik Belousov

On Wed, Nov 16, 2011 at 11:59:06PM -0800, Doug Barton wrote:
 On 11/16/2011 23:49, Kostik Belousov wrote:
  On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote:
  On 11/15/2011 02:09, Jeremy Chadwick wrote:
  On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote:
  On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote:
  On 11/14/2011 12:31, Doug Barton wrote:
  Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386
  in a busy web hosting environment I came across the following post:
 
  http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html
 
  That basically describes what we're seeing as well, including the
  doesn't happen on Linux part.
 
  Does anyone have any ideas about this?
 
  With incredibly similar stuff running on 7.x we didn't see this problem,
  so it seems to be something new in 8.
 
  Just took a closer look at our ktrace, and actually our pattern is
  slightly different than the one in that post. In ours the second option
  is null, but the third is set:
 
  74195 httpd    0.17 RET   sigprocmask 0
  74195 httpd    0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
  74195 httpd    0.09 RET   sigprocmask 0
  74195 httpd    0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
  74195 httpd    0.09 RET   sigprocmask 0
  74195 httpd    0.12 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
 
  But repeated hundreds of times in a row.
 
  The calls cannot come from rtld, they are generated by some setjmp()
  invocation. If signal-safety is not needed, sigsetjmp() should be used
  instead.
 
  Quick grep of the apache httpd source shows a single setjmp() in their
  copy of pcre. No idea whether it is safe to change setjmp() into sigsetjmp(?, 0).
 
  I hate cross-posting, but: adding freebsd-apache@ to the list.  Some of
  the Apache folks (not just port committers) may have some insight to
  Kostik's findings.
 
  Thanks to everyone for the responses. We tried Kostik's suggestion and
  unfortunately it didn't reduce the number of sigprocmask() calls to a
  statistically significant degree.
 
  Does anyone have any other ideas on ways to debug this? We're sort of
  running out of things to test. :-/
 
  Given how important (and prevalent) the Apache + FreeBSD combination is,
  I'm kind of disturbed that we're seeing this performance problem, and if
  it's something in 8.x that's also in 9.x, it would be better to fix it
  prior to 9.0-RELEASE.
  
  Since my guess appeared to be not useful,
 
 Well I wouldn't say that they weren't useful, we eliminated the obvious
 candidate. So, not good news certainly, but not unhelpful. :)
 
  the way forward is to identify
  the location of the call(s) that cause the issue. I suggest compiling
  at least apache itself, libc, rtld and libthr (if used) with debugging
  information. Then, attach to the running apache worker with gdb and
Note this part.

  set breakpoint on sigprocmask. Several backtraces from the hit breakpoint
  should give enough data.
 
 We tried that, and got this:
 
 Loaded symbols for /libexec/ld-elf.so.1
 0x28183a5d in accept () from /lib/libc.so.7
 (gdb) b sigprocmask
 Breakpoint 1 at 0x282d8f84
 (gdb) c
 Continuing.
 no thread to satisfy query
 0x28183a5d in accept () from /lib/libc.so.7
 (gdb)
It seems your libc has no debugging information.
accept() is a pure syscall wrapper; it cannot call sigprocmask.
If gdb caught the PLT trampoline instead of the real accept(), we would
see the rtld frames. So install libc, libthr and rtld with debug information.

Also, having debug symbols for apache itself can be useful.

 
 Of course I'm not the world's greatest gdb'er, so maybe there is a
 better way to do it?
 
  High-tech solution is to link with libunwind and add code into sigprocmask()
  to gather the stacks. But I expect that gdb attach is enough.
 
 Ok, we'll look into that, thanks.
 
 
 Doug
 



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-17 Thread Kostik Belousov

On Thu, Nov 17, 2011 at 01:26:49AM -0800, Doug Barton wrote:
 On 11/17/2011 00:12, Kostik Belousov wrote:
  On Wed, Nov 16, 2011 at 11:59:06PM -0800, Doug Barton wrote:
  On 11/16/2011 23:49, Kostik Belousov wrote:
  On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote:
  On 11/15/2011 02:09, Jeremy Chadwick wrote:
  On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote:
  On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote:
  On 11/14/2011 12:31, Doug Barton wrote:
  Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386
  in a busy web hosting environment I came across the following post:
 
  http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html
 
  That basically describes what we're seeing as well, including the
  doesn't happen on Linux part.
 
  Does anyone have any ideas about this?
 
  With incredibly similar stuff running on 7.x we didn't see this problem,
  so it seems to be something new in 8.
 
  Just took a closer look at our ktrace, and actually our pattern is
  slightly different than the one in that post. In ours the second option
  is null, but the third is set:
 
  74195 httpd0.17 RET   sigprocmask 0
  74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
  74195 httpd0.09 RET   sigprocmask 0
  74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
  74195 httpd0.09 RET   sigprocmask 0
  74195 httpd0.12 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
 
  But repeated hundreds of times in a row.
 
  The calls cannot come from rtld, they are generated by some setjmp()
  invocation. If signal-safety is not needed, sigsetjmp() should be used
  instead.
 
  Quick grep of the apache httpd source shows a single setjmp() in their
  copy of pcre. No idea whether it is safe to change setjmp() into
  sigsetjmp(..., 0).
 
  I hate cross-posting, but: adding freebsd-apache@ to the list.  Some of
  the Apache folks (not just port committers) may have some insight to
  Kostik's findings.
 
  Thanks to everyone for the responses. We tried Kostik's suggestion and
  unfortunately it didn't reduce the number of sigprocmask() calls to a
  statistically significant degree.
 
  Does anyone have any other ideas on ways to debug this? We're sort of
  running out of things to test. :-/
 
  Given how important (and prevalent) the Apache + FreeBSD combination is,
  I'm kind of disturbed that we're seeing this performance problem, and if
  it's something in 8.x that's also in 9.x, it would be better to fix it
  prior to 9.0-RELEASE.
 
  Since my guess appeared to be not useful,
 
  Well I wouldn't say that they weren't useful, we eliminated the obvious
  candidate. So, not good news certainly, but not unhelpful. :)
 
  the way forward is to identify
  the location of the call(s) that cause the issue. I suggest compiling
  at least apache itself, libc, rtld and libthr (if used) with debugging
  information. Then, attach to the running apache worker with the gdb and
  Note this part.
 
 Right, we attached to a worker, that's why it's in accept(). :)
 
  It seems your libc has no debugging information.
  accept() is a pure syscall wrapper; it cannot call sigprocmask.
  If gdb had caught the PLT trampoline instead of the real accept(), we would
  see the rtld frames. So install libc, libthr and rtld with debugging information.
 
 It's not catching there though:
 
 Reading symbols from /libexec/ld-elf.so.1...done.
 Loaded symbols for /libexec/ld-elf.so.1
 0x28183b2d in accept () at accept.S:3
 3 RSYSCALL(accept)
 (gdb) c
 Continuing.
 no thread to satisfy query
 0x28183b2d in accept () at accept.S:3
 3 RSYSCALL(accept)
 (gdb) info threads
 Cannot get thread info: invalid key
 (gdb)

Err, the other part of my message was that you should set a breakpoint
on sigprocmask. I want to see a backtrace from each breakpoint hit,
several times over.

The backtrace taken at attach time is of no use.



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-16 Thread Kostik Belousov

On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote:
 On 11/15/2011 02:09, Jeremy Chadwick wrote:
  On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote:
  On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote:
  On 11/14/2011 12:31, Doug Barton wrote:
  Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386
  in a busy web hosting environment I came across the following post:
 
  http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html
 
  That basically describes what we're seeing as well, including the
  doesn't happen on Linux part.
 
  Does anyone have any ideas about this?
 
  With incredibly similar stuff running on 7.x we didn't see this problem,
  so it seems to be something new in 8.
 
  Just took a closer look at our ktrace, and actually our pattern is
  slightly different than the one in that post. In ours the second option
  is null, but the third is set:
 
  74195 httpd0.17 RET   sigprocmask 0
  74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
  74195 httpd0.09 RET   sigprocmask 0
  74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
  74195 httpd0.09 RET   sigprocmask 0
  74195 httpd0.12 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
 
  But repeated hundreds of times in a row.
 
  The calls cannot come from rtld, they are generated by some setjmp()
  invocation. If signal-safety is not needed, sigsetjmp() should be used
  instead.
 
  Quick grep of the apache httpd source shows a single setjmp() in their
  copy of pcre. No idea whether it is safe to change setjmp() into
  sigsetjmp(..., 0).
  
  I hate cross-posting, but: adding freebsd-apache@ to the list.  Some of
  the Apache folks (not just port committers) may have some insight to
  Kostik's findings.
 
 Thanks to everyone for the responses. We tried Kostik's suggestion and
 unfortunately it didn't reduce the number of sigprocmask() calls to a
 statistically significant degree.
 
 Does anyone have any other ideas on ways to debug this? We're sort of
 running out of things to test. :-/
 
 Given how important (and prevalent) the Apache + FreeBSD combination is,
 I'm kind of disturbed that we're seeing this performance problem, and if
 it's something in 8.x that's also in 9.x, it would be better to fix it
 prior to 9.0-RELEASE.

Since my guess turned out not to be useful, the way forward is to identify
the location of the call(s) that cause the issue. I suggest compiling
at least apache itself, libc, rtld and libthr (if used) with debugging
information. Then attach gdb to a running apache worker and set a
breakpoint on sigprocmask. Several backtraces taken from breakpoint hits
should give enough data.

A high-tech solution is to link with libunwind and add code to sigprocmask()
that gathers the stacks. But I expect that attaching with gdb is enough.



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-15 Thread Kostik Belousov

On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote:
 On 11/14/2011 12:31, Doug Barton wrote:
  Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386
  in a busy web hosting environment I came across the following post:
  
  http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html
  
  That basically describes what we're seeing as well, including the
  doesn't happen on Linux part.
  
  Does anyone have any ideas about this?
  
  With incredibly similar stuff running on 7.x we didn't see this problem,
  so it seems to be something new in 8.
 
 Just took a closer look at our ktrace, and actually our pattern is
 slightly different than the one in that post. In ours the second option
 is null, but the third is set:
 
 74195 httpd0.17 RET   sigprocmask 0
 74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
 74195 httpd0.09 RET   sigprocmask 0
 74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
 74195 httpd0.09 RET   sigprocmask 0
 74195 httpd0.12 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
 
 But repeated hundreds of times in a row.

The calls cannot come from rtld; they are generated by some setjmp()
invocation. If signal-safety is not needed, sigsetjmp() should be used
instead.

Quick grep of the apache httpd source shows a single setjmp() in their
copy of pcre. No idea whether it is safe to change setjmp() into sigsetjmp(..., 0).



Re: valgrind on FreeBSD?

2011-10-09 Thread Kostik Belousov

On Sun, Oct 09, 2011 at 04:18:48PM +0200, Václav Zeman wrote:
 Jeremy Chadwick wrote, On 9.10.2011 16:11:
  On Sun, Oct 09, 2011 at 03:48:57PM +0200, Václav Zeman wrote:
  Václav Zeman wrote, On 9.10.2011 15:25:
  Bakul Shah wrote, On 6.10.2011 8:40:
  On Wed, 05 Oct 2011 23:06:04 +0200 Václav Zeman
  v.hais...@sh.cvut.cz  wrote:
  Hi.
 
  No matter what I try, valgrind on 7.3-STABLE is giving me this, both
  Valgrind ports:
 
  valgrind: Startup or configuration error:
 Can't establish current working directory at startup
  valgrind: Unable to start up properly.  Giving up.
 
  What do I need to do to make it work?
 
  Try running valgrind under ktrace ( view with kdump). That
  will tell you what directory it is trying to access or what
  syscall fails and why.
  Hi.
 
  So I have done that and more. I have first updated from 7.3 to 8.2 (RELENG_8
  actually). I have not managed to recompile all of the installed Ports yet,
  but I made sure to recompile valgrind and its dependencies. The same thing
  has happened!
 
  As I have said, I have done the ktrace and here is the interesting bit:
 
   78028 valgrind NAMI  /usr/local/lib/valgrind/memcheck-amd64-freebsd
   78028 memcheck-amd64-free RET   execve 0
   78028 memcheck-amd64-free CALL  getpid
   78028 memcheck-amd64-free RET   getpid 78028/0x130cc
   78028 memcheck-amd64-free CALL  __sysctl(0x39a91450,0x4,0x389a3800,0x39a91468,0,0)
   78028 memcheck-amd64-free SCTL  kern.proc.vmmap.78028
   78028 memcheck-amd64-free RET   __sysctl 0
   78028 memcheck-amd64-free CALL  mmap(0x49000,0x40,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_ANON,0x,0)
   78028 memcheck-amd64-free RET   mmap 17179906048/0x49000
   78028 memcheck-amd64-free CALL  getrlimit(RLIMIT_DATA,0x39e6a780)
   78028 memcheck-amd64-free RET   getrlimit 0
   78028 memcheck-amd64-free CALL  setrlimit(RLIMIT_DATA,0x39a919e0)
   78028 memcheck-amd64-free RET   setrlimit 0
   78028 memcheck-amd64-free CALL  getrlimit(RLIMIT_STACK,0x39e6a790)
   78028 memcheck-amd64-free RET   getrlimit 0
   78028 memcheck-amd64-free CALL  __getcwd(0x3882d700,0x3ff)
   78028 memcheck-amd64-free NAMI  ..
   78028 memcheck-amd64-free RET   __getcwd -1 errno 2 No such file or directory
   78028 memcheck-amd64-free CALL  write(0x2,0x3830b060,0x6c)
   78028 memcheck-amd64-free GIO   fd 2 wrote 108 bytes
 valgrind: Startup or configuration error:
  valgrind:Can't establish current working directory at startup
 
   78028 memcheck-amd64-free RET   write 108/0x6c
   78028 memcheck-amd64-free CALL  write(0x2,0x3830b060,0x33)
   78028 memcheck-amd64-free GIO   fd 2 wrote 51 bytes
 valgrind: Unable to start up properly.  Giving up.
 
   78028 memcheck-amd64-free RET   write 51/0x33
   78028 memcheck-amd64-free CALL  exit(0x1)
 
  Now what? Why would the __getcwd call be failing with No such file or
  directory?
 
  It is the nullfs!
 
  I have /home mounted using nullfs to /usr/home:
 
  /usr/home   /home   nullfs  rw,multilabel,acls  0 0
 
  When I run valgrind from the /usr based directory, it works:
 
  shell::wilx:/usr/home/users/wilx/tmp/yttool valgrind --tool=memcheck ./yttool
  ==34679== Memcheck, a memory error detector
  ==34679== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
  ==34679== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
  ==34679== Command: ./yttool
  ==34679==
  ==34679==
  ==34679== HEAP SUMMARY:
  ==34679== in use at exit: 20,395 bytes in 119 blocks
  ==34679==   total heap usage: 6,719 allocs, 6,600 frees, 716,787 bytes allocated
  ==34679==
  ==34679== LEAK SUMMARY:
  ==34679==definitely lost: 0 bytes in 0 blocks
  ==34679==indirectly lost: 0 bytes in 0 blocks
  ==34679==  possibly lost: 134 bytes in 4 blocks
  ==34679==still reachable: 20,261 bytes in 115 blocks
  ==34679== suppressed: 0 bytes in 0 blocks
  ==34679== Rerun with --leak-check=full to see details of leaked memory
  ==34679==
  ==34679== For counts of detected and suppressed errors, rerun with: -v
  ==34679== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
 
  But when I run it from the nullfs mount, it fails:
 
  shell::wilx:/usr/home/users/wilx/tmp/yttool cd $HOME/tmp/yttool
  shell::wilx:~/tmp/yttool valgrind --tool=memcheck ./yttool
  valgrind: Startup or configuration error:
  valgrind:Can't establish current working directory at startup
  valgrind: Unable to start up properly.  Giving up.
  
  Amazing how userland utilities behave differently depending upon the
  underlying filesystem type, eh?  Good thing I asked what your underlying
  filesystem types were.  Don't ever think that it'll all just work.
  :-)
  
  I believe there are other issues/stipulations with nullfs (some have
  been reported over the years), so I'm not too surprised by this issue.
  I have no idea who currently

Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero

2011-09-29 Thread Kostik Belousov

On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote:
 Hello!
 
 I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports
 (I've upgraded from 8.2). I'm getting panic. Is it known
 problem/already fixed somewhere?
 
 FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28
 17:11:17 EEST 2011 r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9
 i386
 
 Unread portion of the kernel message buffer:
 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
 cpuid = 1
 Uptime: 16h6m53s
 Physical memory: 1904 MB
 Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160
 144 128 112 96 80 64 48 32 16
 
 #0  doadump (textdump=1) at pcpu.h:244
 #1  0xc071e5cb in kern_reboot (howto=260)
 at /usr/src/sys/kern/kern_shutdown.c:442
 #2  0xc071e82b in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:607
 #3  0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0)
 at /usr/src/sys/vm/vm_page.c:1905
Please do frame 2, then p/x *m and show the result.

 #4  0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4)
 at /usr/src/sys/kern/vfs_bio.c:1638
 #5  0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0,
 size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949
 #6  0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0,
 slptimeo=0, flags=Variable flags is not available.
 ) at /usr/src/sys/kern/vfs_bio.c:2788
 #7  0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520,
 blkno=1209440, size=16384, run=Variable run is not available.
 ) at /usr/src/sys/kern/vfs_cluster.c:332
 #8  0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088,
 lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7,
 bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254
 #9  0xc0934cf5 in ffs_read (ap=0xf5824bac)
 at /usr/src/sys/ufs/ffs/ffs_vnops.c:514
 #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac)
 at vnode_if.c:887
 #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48,
 active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384
 #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508,
 auio=0xf5824c48, offset=41189376, flags=1) at file.h:254
 #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48,
 offset=41189376) at /usr/src/sys/kern/sys_generic.c:288
 #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec)
 at /usr/src/sys/kern/sys_generic.c:189
 #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131
 #16 0xc0996db1 in Xint0x80_syscall ()
 at /usr/src/sys/i386/i386/exception.s:266
 #17 0x0033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
 -- 
 Alexandr Kovalenko
 http://uafug.org.ua/
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org



Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero

2011-09-29 Thread Kostik Belousov

On Thu, Sep 29, 2011 at 03:47:19PM +0300, Alexandr Kovalenko wrote:
 On Thu, Sep 29, 2011 at 3:30 PM, Kostik Belousov kostik...@gmail.com wrote:
  On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote:
  Hello!
 
  I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports
  (I've upgraded from 8.2). I'm getting panic. Is it known
  problem/already fixed somewhere?
 
  FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28
  17:11:17 EEST 2011     r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9
  i386
 
  Unread portion of the kernel message buffer:
  panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
  cpuid = 1
  Uptime: 16h6m53s
  Physical memory: 1904 MB
  Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160
  144 128 112 96 80 64 48 32 16
 
  #0  doadump (textdump=1) at pcpu.h:244
  #1  0xc071e5cb in kern_reboot (howto=260)
      at /usr/src/sys/kern/kern_shutdown.c:442
  #2  0xc071e82b in panic (fmt=Variable fmt is not available.
  ) at /usr/src/sys/kern/kern_shutdown.c:607
  #3  0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0)
      at /usr/src/sys/vm/vm_page.c:1905
  Please do frame 2, then p/x *m and show the result.
 
 (kgdb) frame 2
frame 3, sorry. p/x *(struct vm_page *)0xc2a38dc8 will do it as well.

 #2  0xc071e82b in panic (fmt=Variable fmt is not available.) at
 /usr/src/sys/kern/kern_shutdown.c:607
 607 kern_reboot(bootopt);
 (kgdb) p/x *m
 No symbol m in current context.
 
 
  #4  0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4)
      at /usr/src/sys/kern/vfs_bio.c:1638
  #5  0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0,
      size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949
  #6  0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0,
      slptimeo=0, flags=Variable flags is not available.
  ) at /usr/src/sys/kern/vfs_bio.c:2788
  #7  0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520,
      blkno=1209440, size=16384, run=Variable run is not available.
  ) at /usr/src/sys/kern/vfs_cluster.c:332
  #8  0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088,
      lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7,
      bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254
  #9  0xc0934cf5 in ffs_read (ap=0xf5824bac)
      at /usr/src/sys/ufs/ffs/ffs_vnops.c:514
  #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac)
      at vnode_if.c:887
  #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48,
      active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384
  #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508,
      auio=0xf5824c48, offset=41189376, flags=1) at file.h:254
  #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48,
      offset=41189376) at /usr/src/sys/kern/sys_generic.c:288
  #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec)
      at /usr/src/sys/kern/sys_generic.c:189
  #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131
  #16 0xc0996db1 in Xint0x80_syscall ()
      at /usr/src/sys/i386/i386/exception.s:266
  #17 0x0033 in ?? ()
  Previous frame inner to this frame (corrupt stack?)
 
  --
  Alexandr Kovalenko
  http://uafug.org.ua/
 
 
 
 
 -- 
 Alexandr Kovalenko
 http://uafug.org.ua/



Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero

2011-09-29 Thread Kostik Belousov

On Thu, Sep 29, 2011 at 03:51:53PM +0300, Alexandr Kovalenko wrote:
 2011/9/29 Kostik Belousov kostik...@gmail.com:
  On Thu, Sep 29, 2011 at 03:47:19PM +0300, Alexandr Kovalenko wrote:
  On Thu, Sep 29, 2011 at 3:30 PM, Kostik Belousov kostik...@gmail.com wrote:
   On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote:
   Hello!
  
   I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports
   (I've upgraded from 8.2). I'm getting panic. Is it known
   problem/already fixed somewhere?
Do you use a custom kernel config? Is there a chance you have the
ZERO_COPY_SOCKETS option enabled?

  
   FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28
   17:11:17 EEST 2011     r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9
   i386
  
   Unread portion of the kernel message buffer:
   panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
   cpuid = 1
   Uptime: 16h6m53s
   Physical memory: 1904 MB
   Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160
   144 128 112 96 80 64 48 32 16
  
   #0  doadump (textdump=1) at pcpu.h:244
   #1  0xc071e5cb in kern_reboot (howto=260)
       at /usr/src/sys/kern/kern_shutdown.c:442
   #2  0xc071e82b in panic (fmt=Variable fmt is not available.
   ) at /usr/src/sys/kern/kern_shutdown.c:607
   #3  0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0)
       at /usr/src/sys/vm/vm_page.c:1905
   Please do frame 2, then p/x *m and show the result.
 
  (kgdb) frame 2
  frame 3, sorry. p/x *(struct vm_page *)0xc2a38dc8 will do it as well.
 
 (kgdb) frame 3
 #3  0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at
 /usr/src/sys/vm/vm_page.c:1905
 1905            panic("vm_page_unwire: page %p's wire count is zero", m);
 (kgdb) p/x *(struct vm_page *)0xc2a38dc8
 $1 = {pageq = {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b0}, listq =
 {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b8}, left = 0x0, right =
 0x0, object = 0xc5725770, pindex = 0xbd3, phys_addr = 0x56a32000, md =
 {pv_list = {tqh_first = 0xc3cc6418, tqh_last = 0xc3cc641c},
 pat_mode = 0x6}, queue = 0x1, segind = 0x2, hold_count = 0x0,
 order = 0xb, pool = 0x0, cow = 0x0, wire_count = 0x0, aflags = 0x3,
 flags = 0x0, oflags = 0x0, act_count = 0x5, busy = 0x0, valid = 0xff,
 dirty = 0xff}

Please show the output of p *(struct vm_object *)0xc5725770 from kgdb.
 
 
  #2  0xc071e82b in panic (fmt=Variable fmt is not available.) at
  /usr/src/sys/kern/kern_shutdown.c:607
  607             kern_reboot(bootopt);
  (kgdb) p/x *m
  No symbol m in current context.
 
 
   #4  0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4)
       at /usr/src/sys/kern/vfs_bio.c:1638
   #5  0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0,
       size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949
   #6  0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0,
       slptimeo=0, flags=Variable flags is not available.
   ) at /usr/src/sys/kern/vfs_bio.c:2788
   #7  0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520,
       blkno=1209440, size=16384, run=Variable run is not available.
   ) at /usr/src/sys/kern/vfs_cluster.c:332
   #8  0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088,
       lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7,
       bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254
   #9  0xc0934cf5 in ffs_read (ap=0xf5824bac)
       at /usr/src/sys/ufs/ffs/ffs_vnops.c:514
   #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac)
       at vnode_if.c:887
   #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48,
       active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384
   #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508,
       auio=0xf5824c48, offset=41189376, flags=1) at file.h:254
   #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48,
       offset=41189376) at /usr/src/sys/kern/sys_generic.c:288
   #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec)
       at /usr/src/sys/kern/sys_generic.c:189
   #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131
   #16 0xc0996db1 in Xint0x80_syscall ()
       at /usr/src/sys/i386/i386/exception.s:266
   #17 0x0033 in ?? ()
   Previous frame inner to this frame (corrupt stack?)
  
   --
   Alexandr Kovalenko
   http://uafug.org.ua/
  
 
 
 
  --
  Alexandr Kovalenko
  http://uafug.org.ua/
 
 
 
 
 -- 
 Alexandr Kovalenko
 http://uafug.org.ua/



Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero

2011-09-29 Thread Kostik Belousov

On Thu, Sep 29, 2011 at 04:12:19PM +0300, Alexandr Kovalenko wrote:
 2011/9/29 Kostik Belousov kostik...@gmail.com:
  On Thu, Sep 29, 2011 at 03:51:53PM +0300, Alexandr Kovalenko wrote:
  2011/9/29 Kostik Belousov kostik...@gmail.com:
   On Thu, Sep 29, 2011 at 03:47:19PM +0300, Alexandr Kovalenko wrote:
   On Thu, Sep 29, 2011 at 3:30 PM, Kostik Belousov kostik...@gmail.com wrote:
On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote:
Hello!
   
I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports
(I've upgraded from 8.2). I'm getting panic. Is it known
problem/already fixed somewhere?
  Do you use a custom kernel config? Is there a chance you have the
  ZERO_COPY_SOCKETS option enabled?
 
 Yes, ZERO_COPY_SOCKETS is there.
OK, this is the cause. Remove it.

I asked for some additional data below, which you ignored, but now that
we have found ZERO_COPY_SOCKETS in the kernel config, I do not expect
to see anything new there.

 
 
 
 
   
FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28
17:11:17 EEST 2011     r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9
i386
   
Unread portion of the kernel message buffer:
panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
cpuid = 1
Uptime: 16h6m53s
Physical memory: 1904 MB
Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160
144 128 112 96 80 64 48 32 16
   
#0  doadump (textdump=1) at pcpu.h:244
#1  0xc071e5cb in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:442
#2  0xc071e82b in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:607
#3  0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0)
    at /usr/src/sys/vm/vm_page.c:1905
Please do frame 2, then p/x *m and show the result.
  
   (kgdb) frame 2
   frame 3, sorry. p/x *(struct vm_page *)0xc2a38dc8 will do it as well.
 
  (kgdb) frame 3
  #3  0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at
  /usr/src/sys/vm/vm_page.c:1905
  1905                    panic("vm_page_unwire: page %p's wire count is zero", m);
  (kgdb) p/x *(struct vm_page *)0xc2a38dc8
  $1 = {pageq = {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b0}, listq =
  {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b8}, left = 0x0, right =
  0x0, object = 0xc5725770, pindex = 0xbd3, phys_addr = 0x56a32000, md =
  {pv_list = {tqh_first = 0xc3cc6418, tqh_last = 0xc3cc641c},
      pat_mode = 0x6}, queue = 0x1, segind = 0x2, hold_count = 0x0,
  order = 0xb, pool = 0x0, cow = 0x0, wire_count = 0x0, aflags = 0x3,
  flags = 0x0, oflags = 0x0, act_count = 0x5, busy = 0x0, valid = 0xff,
  dirty = 0xff}
 
  Please show the output of p *(struct vm_object *)0xc5725770 from kgdb.
 
 
   #2  0xc071e82b in panic (fmt=Variable fmt is not available.) at
   /usr/src/sys/kern/kern_shutdown.c:607
   607             kern_reboot(bootopt);
   (kgdb) p/x *m
   No symbol m in current context.
  
  
#4  0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4)
    at /usr/src/sys/kern/vfs_bio.c:1638
#5  0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0,
    size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949
#6  0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0,
    slptimeo=0, flags=Variable flags is not available.
) at /usr/src/sys/kern/vfs_bio.c:2788
#7  0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520,
    blkno=1209440, size=16384, run=Variable run is not available.
) at /usr/src/sys/kern/vfs_cluster.c:332
#8  0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088,
    lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7,
    bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254
#9  0xc0934cf5 in ffs_read (ap=0xf5824bac)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:514
#10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac)
    at vnode_if.c:887
#11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48,
    active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384
#12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508,
    auio=0xf5824c48, offset=41189376, flags=1) at file.h:254
#13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48,
    offset=41189376) at /usr/src/sys/kern/sys_generic.c:288
#14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec)
    at /usr/src/sys/kern/sys_generic.c:189
#15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131
#16 0xc0996db1 in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:266
#17 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
   
--
Alexandr Kovalenko
http://uafug.org.ua/

Re: NFSD hang

2011-09-28 Thread Kostik Belousov

On Tue, Sep 27, 2011 at 08:06:50PM -0400, Rick Macklem wrote:

 I suspect something is making those threads loop, but I don't know how
 to figure out where? (In the bad old days, I would have escaped to a
 debugger and looked where the program counter was, then repeated after
 a cont a few times, to see where they were executing. However, I
 have no idea how to do that on a multicore system, even if you still
 had the system sitting there? If someone does know how to do this,
 please feel free to chime in;-)

Exactly the same way. Break into the debugger, use ps to find the pid/tid
of the looping threads, then do bt <tid> to get a backtrace for each of them.
Repeat several times to get some idea where the loop is located.



Re: luit -encoding gbk causes Segmentation fault (core dumped) in 9-stable

2011-09-28 Thread Kostik Belousov

On Wed, Sep 28, 2011 at 09:09:30PM +0800, Adrian Chadd wrote:
 Hm, it's not all that useful. But it first calls dlopen(). Does it
 have shared modules? Do you have old copies of those somewhere lying
 around?

You need to build everything, i.e. both base and the port, with debug
symbols. Otherwise, the backtrace gives no useful information.



Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-09-18 Thread Kostik Belousov

On Sun, Sep 18, 2011 at 02:54:34PM +0300, Mikolaj Golub wrote:
 
 On Sun, 18 Sep 2011 13:25:26 +0200 Ronald Klop wrote:
 
  RK It is a while since I programmed C, but why will writing 0 bytes give
  RK the reader an end-of-file? Shouldn't the fd be closed to indicate
  RK end-of-file?
 
 AFAIR, this trick of writing 0 bytes is used to emulate EOF because we can't close the fd
 -- we still want to read from it.  Poor shutdown(2) for non-socket :-).
 
 Colin might tell more...

Please note that interpreting a 0-byte read on the terminal as EOF is
only a convention. Done absolutely properly, script should not interpret
a zero-byte read as EOF. Perhaps the reasonable thing to do would be to
look at stdin only once per second after receiving zero bytes, and to
switch back to normal mode if something is read.
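A minimal sketch of that suggestion, assuming a script(1)-like relay loop that polls both stdin and the pty master: after a zero-byte read, stdin is left out of the poll set and only looked at again about a second later, and any real input switches it back to normal mode. The struct and function names are hypothetical, and the current time is passed in explicitly to keep the logic testable; this is not the actual script(1) code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-loop state for stdin handling in a script(1)-like relay. */
struct stdin_pacer {
    bool   paused;      /* set after a zero-byte read from stdin */
    time_t resume_at;   /* earliest time to look at stdin again */
};

/* Update the state from the result of read(2) on stdin at time 'now'. */
static void pacer_on_read(struct stdin_pacer *p, ssize_t n, time_t now)
{
    if (n == 0) {
        /* Zero bytes is only conventionally EOF on a terminal:
           do not give up on stdin, just back off for a second. */
        p->paused = true;
        p->resume_at = now + 1;
    } else if (n > 0) {
        p->paused = false;   /* real input: back to normal mode */
    }
    /* n < 0 (error) is left to the caller; the state is unchanged. */
}

/* Should stdin be included in the poll set at time 'now'? */
static bool pacer_watch_stdin(const struct stdin_pacer *p, time_t now)
{
    return !p->paused || now >= p->resume_at;
}

/* Poll timeout in milliseconds: -1 (no timeout) in normal mode,
   otherwise wake up in time to take another look at stdin. */
static int pacer_timeout_ms(const struct stdin_pacer *p, time_t now)
{
    if (!p->paused)
        return -1;
    if (now >= p->resume_at)
        return 0;
    return (int)(p->resume_at - now) * 1000;
}
```

The timeout matters when stdin is paused and the pty is idle: without it, the loop would sleep until the child produced output and never take another look at stdin.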



Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-09-18 Thread Kostik Belousov

On Sun, Sep 18, 2011 at 11:57:57PM +0300, Mikolaj Golub wrote:
 
 On Sun, 18 Sep 2011 20:24:23 +0300 Kostik Belousov wrote:
 
  KB On Sun, Sep 18, 2011 at 02:54:34PM +0300, Mikolaj Golub wrote:
   
   On Sun, 18 Sep 2011 13:25:26 +0200 Ronald Klop wrote:
   
RK It is a while since I programmed C, but why will writing 0 bytes give
RK the reader an end-of-file? Shouldn't the fd be closed to indicate
RK end-of-file?
   
   AFAIR, this trick with writing 0 to emulate EOF because we can't close 
 the fd
   -- we still want to read from it.  Poor shutdown(2) for non-socket :-).
   
   Colin might tell more...
 
  KB Please note that interpreting the receiving of 0 bytes on the terminal 
  KB as EOF is only a convention. If done absolutely properly, script shall
  KB not interpret zero-byte read as EOF. Might be, the reasonable thing to
  KB do would be to only look at the stdin once in a second after receiving
  KB zero-bytes, and switching it back to normal mode if something is read.
 
 Ok. I see. Below is the patch that does something like this.

Looks fine to me, but I have not tested it. I would also suggest documenting
this behaviour, which can cause a 1-second pause in processing of the user
input, somewhere in the script(1) manpage, perhaps under BUGS?



Re: panic: spin lock held too long (RELENG_8 from today)

2011-09-03 Thread Kostik Belousov

On Sat, Sep 03, 2011 at 12:05:47PM +0200, Attilio Rao wrote:
 This should be enough for someone NFS-aware to look into it.
 
 Were you also able to get a core?
 
 I'll try to look into it in the next days, in particular about the
 softclock state.
 
I am absolutely sure that this is a ZFS deadlock.



Re: sigwait return 4

2011-08-27 Thread Kostik Belousov

On Sat, Aug 27, 2011 at 04:25:36PM +0200, Jilles Tjoelker wrote:
 On Thu, Aug 25, 2011 at 12:29:29AM +0300, Kostik Belousov wrote:
  On Wed, Aug 24, 2011 at 10:56:09PM +0200, Jilles Tjoelker wrote:
   sigwait() was fixed not to return EINTR in 9-current in r212405 (fixed
   up in r219709). The discussion started at
   http://lists.freebsd.org/pipermail/freebsd-threads/2010-September/004892.html
   
   Solaris is simply wrong in the same way we were wrong. Although POSIX
   may not be as clear on this as one may like, its intention is clear and
   additionally not returning EINTR reduces subtle portability problems.
 
  Can you, please, describe why do you consider the behaviour prohibiting
  return of EINTR reasonable ? I do consider that the Solaris behaviour is
  useful.
 
 Applications need to cope with EINTR returns (usually by retrying the
 call); if they do not do this, bugs arise in uncommon cases.
 
 In the case of sigwait(), applications do not really need EINTR: they
 can include the respective signal into the signal set and do the work
 inline that was originally in the signal handler. This might require
 additional pthread_sigmask() calls. This also fixes the race condition
 almost always associated with EINTR.
 
 Historically, this is because sigwait() came with POSIX threads, which
 also explains why it returns an error number rather than setting errno.
 The threads group considered EINTR errors not useful enough, given that
 they may lead to subtle bugs. This is fully standardized for functions
 like pthread_cond_wait() and pthread_mutex_lock().
 
 In the case of sigwait(), it also plays a role that glibc has decided
 not to return EINTR, so that returning EINTR may lead to subtle bugs
 appearing on FreeBSD in software originally written for GNU/Linux.
 
 The functions sigwaitinfo() and sigtimedwait() came with POSIX realtime
 and therefore follow different conventions.

I think I finally realized what problem Slawa was searching for a fix
for. The fix from r212405 indeed prevents sigwait() from returning EINTR
in the new libc, but it still leaves the compat libc and libthr
returning EINTR.

Below is the patch that I provided to Slawa to handle the EINTR
condition in the kernel. The meat is two lines in kern_sig.c; everything
else is the revert of r212405.
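The libc wrapper that this patch reverts (lib/libc/sys/sigwait.c, added in r212405) retried the system call while it failed with EINTR. Its body is cut off in this archive. Here is a minimal standalone sketch of the same retry idea. It is written against the public sigwait() rather than __sys_sigwait, and the name sigwait_noeintr is hypothetical. Note that sigwait() reports failure as a returned error number, not through errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>

/*
 * Hypothetical wrapper: wait for a signal from *set, hiding EINTR from
 * the caller.  Where sigwait() already never returns EINTR, the loop
 * simply runs once.
 */
static int
sigwait_noeintr(const sigset_t *set, int *sig)
{
	int ret;

	do {
		ret = sigwait(set, sig);	/* 0 or an error number */
	} while (ret == EINTR);
	return (ret);
}
```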

diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index fe5061d..aa0959b 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -24,9 +24,6 @@ SRCS+=${SYSCALL_COMPAT_SRCS}
 NOASM+=${SYSCALL_COMPAT_SRCS:S/.c/.o/}
 PSEUDO+= _fcntl.o
 .endif
-SRCS+= sigwait.c
-NOASM+= sigwait.o
-PSEUDO+= _sigwait.o
 
 # Add machine dependent asm sources:
 SRCS+=${MDASM}
diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map
index 095751a..2ba1f8f 100644
--- a/lib/libc/sys/Symbol.map
+++ b/lib/libc/sys/Symbol.map
@@ -937,7 +937,6 @@ FBSDprivate_1.0 {
_sigtimedwait;
__sys_sigtimedwait;
_sigwait;
-   __sigwait;
__sys_sigwait;
_sigwaitinfo;
__sys_sigwaitinfo;
diff --git a/lib/libc/sys/sigwait.c b/lib/libc/sys/sigwait.c
deleted file mode 100644
index 2fdffdd..000
--- a/lib/libc/sys/sigwait.c
+++ /dev/null
@@ -1,46 +0,0 @@
-/*-
- * Copyright (c) 2010 davi...@freebsd.org
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-
-#include <sys/cdefs.h>
-__FBSDID("$FreeBSD$");
-
-#include <errno.h>
-#include <signal.h>
-
-int __sys_sigwait(const sigset_t * restrict, int * restrict);
-
-__weak_reference(__sigwait, sigwait);
-
-int
-__sigwait(const sigset_t * restrict set, int * restrict sig)
-{
-   int ret;
-
-   /* POSIX does not allow EINTR to be returned */
-   do  {
-   ret = __sys_sigwait(set, sig

Re: -m32 on freeBSD 8.2r amd64

2011-08-24 Thread Kostik Belousov

On Wed, Aug 24, 2011 at 01:57:02PM +0100, Tom Evans wrote:
 On Wed, Aug 24, 2011 at 12:11 PM, Michael Hoffmann benz...@arcor.de wrote:
  Maybe off topic?
 
  1: echo "int main(void) { return 0; }" > t.c
 
  2: setenv LDEMULATION elf_i386_fbsd
 
  3: gcc -c -m32 -o t.o t.c
 
  4: gcc -nostartfiles -o a.out
  t.o -L/usr/lib32 /usr/lib32/crt1.o /usr/lib32/crti.o
 
  5: file a.out
  a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD),
  dynamically linked (uses shared libs), for FreeBSD 8.2, not stripped
 
  6: uname -m
  amd64
 
  2: q.v. info binutils - Selecting The Target System
 
  Maybe there is a more comfortable way.
  Michael
 
 
 You don't need to go to all that effort:
 
 $ uname -m
 amd64
 $ echo "int main(void) { return 0; }" > t.c
 $ gcc -c -m32 -o t.o t.c
 $ gcc -m32 -o t t.o -B/usr/lib32
 $ file t
 t: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD),
 dynamically linked (uses shared libs), for FreeBSD 8.2 (802510), not
 stripped

A well-known problem is that /usr/include/machine/*.h still contains
the amd64 arch definitions. The resulting binary is broken in quite
subtle ways.



Re: sigwait return 4

2011-08-24 Thread Kostik Belousov

On Wed, Aug 24, 2011 at 10:19:07PM +0400, Slawa Olhovchenkov wrote:
 System is 8.2-RELEASE (GENERIC), amd64.
 Application -- i386 for freebsd7.
 
 In ktrace dump I find some strange result:
 
  22951 100556 kas-milter CALL  sigwait(0xffdfdf80,0xffdfdf7c)
  22951 100556 kas-milter RET   sigwait 4
  22951 100556 kas-milter PSIG  SIGUSR2 caught handler=0x804c0f0 mask=0x4003 
 code=0x0
 
 RET   sigwait 4 confused me, and, I think, confused application too.
 
 man sigwait:
 
 ERRORS
  The sigwait() system call will fail if:
 
  [EINVAL]   The set argument specifies one or more invalid signal
 numbers.
 
  [EFAULT]   Any arguments point outside the allocated address
 space or there is a memory protection fault.
 
 
 How sigwait can return '4'?
 May be EINTR, converted from ERESTART? But kern_sigtimedwait from sigwait 
 must 
 be called with timeout == NULL...
 

What should the system do for a delivered signal that is not present in
the set? I guess this is the case in your ktrace.

Looking at SUSv4, I see no mention of the situation, but the Oracle
SunOS 5.10 man page for sigwait(2) says explicitly:
"EINTR The wait was interrupted by an unblocked, caught signal."

So I think that we have a bug in the man page.

diff --git a/lib/libc/sys/sigwait.2 b/lib/libc/sys/sigwait.2
index 8c00cf4..b462201 100644
--- a/lib/libc/sys/sigwait.2
+++ b/lib/libc/sys/sigwait.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGWAIT 2
 .Os
 .Sh NAME
@@ -94,6 +94,8 @@ The
 .Fn sigwait
 system call will fail if:
 .Bl -tag -width Er
+.It Bq Er EINTR
+The system call was interrupted by an unblocked, caught signal.
 .It Bq Er EINVAL
 The
 .Fa set



Re: sigwait return 4

2011-08-24 Thread Kostik Belousov

On Wed, Aug 24, 2011 at 11:24:46PM +0400, Slawa Olhovchenkov wrote:
 On Wed, Aug 24, 2011 at 10:07:03PM +0300, Kostik Belousov wrote:
 
  On Wed, Aug 24, 2011 at 10:19:07PM +0400, Slawa Olhovchenkov wrote:
   System is 8.2-RELEASE (GENERIC), amd64.
   Application -- i386 for freebsd7.
   
   In ktrace dump I find some strange result:
   
22951 100556 kas-milter CALL  sigwait(0xffdfdf80,0xffdfdf7c)
22951 100556 kas-milter RET   sigwait 4
22951 100556 kas-milter PSIG  SIGUSR2 caught handler=0x804c0f0 
   mask=0x4003 code=0x0
   
   RET   sigwait 4 confused me, and, I think, confused application too.
   
   man sigwait:
   
   ERRORS
The sigwait() system call will fail if:
   
[EINVAL]   The set argument specifies one or more invalid 
   signal
   numbers.
   
[EFAULT]   Any arguments point outside the allocated address
   space or there is a memory protection fault.
   
   
   How sigwait can return '4'?
   May be EINTR, converted from ERESTART? But kern_sigtimedwait from sigwait 
   must 
   be called with timeout == NULL...
   
  
  What should the system do for a delivered signal not present in the set ?
  I guess this is the case of your ktrace.
  
  Looking at the SUSv4, I see no mention of the situation, but in Oracle
  SunOS 5.10 man page for sigwait(2), it is said explicitely
  EINTR The wait was interrupted by an unblocked, caught signal.
 
 I don't think you right in this case.
 This is kas-milter and in this thread (this is multi-thread
 application) kas-milter wait for USR2 for reload config.
 
 System return from sigwait only on USR2, but not each return w/
 non-zero return code.
 
 On freebsd7 this application don't complain about sigwait's return value.

Could it be that some other thread has the signal unblocked?
(You can verify this with procstat -j.)

Can you write a self-contained test case that demonstrates the behaviour?



Re: sigwait return 4

2011-08-24 Thread Kostik Belousov

On Wed, Aug 24, 2011 at 11:42:29PM +0400, Slawa Olhovchenkov wrote:
 On Wed, Aug 24, 2011 at 10:32:02PM +0300, Kostik Belousov wrote:
 
What should the system do for a delivered signal not present in the set 
?
I guess this is the case of your ktrace.

Looking at the SUSv4, I see no mention of the situation, but in Oracle
SunOS 5.10 man page for sigwait(2), it is said explicitely
EINTR The wait was interrupted by an unblocked, caught signal.
   
   I don't think you right in this case.
   This is kas-milter and in this thread (this is multi-thread
   application) kas-milter wait for USR2 for reload config.
   
   System return from sigwait only on USR2, but not each return w/
   non-zero return code.
   
   On freebsd7 this application don't complain about sigwait's return value.
  
  Could it be that some other thread has the signal unblocked ?
  (You can verify this with procstat -j).
  
  Can you write the self-contained test case that demonstrates the behaviour ?
 
 This is closed-source software.
How is this statement related to the creation of a standalone test case?

 # procstat -j
   PID    TID COMM         SIG  FLAGS
  1395 100199 kas-milter   USR2 --
  1395 100232 kas-milter   USR2 --

Both threads have the signal unblocked. This is not definitive, since
the signal must be blocked during the call to sigwait(2). Note that
SUSv4 says: "The signals defined by set shall have been blocked at
the time of the call to sigwait(); otherwise, the behavior is
undefined."
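For a self-contained test case, one pattern that satisfies this SUSv4 requirement is to block the signal before any thread is created. Every thread then inherits the mask, and the signal is consumed only in a dedicated thread through sigwait(). The sketch below assumes SIGUSR2 and POSIX threads (link with -pthread). The struct and function names are hypothetical and not taken from kas-milter.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical signal-handling thread state. */
struct waiter {
	sigset_t set;	/* signals to wait for */
	int sig;	/* signal number returned by sigwait() */
	int error;	/* sigwait() return value, 0 on success */
};

static void *
waiter_main(void *arg)
{
	struct waiter *w = arg;

	w->error = sigwait(&w->set, &w->sig);
	return (NULL);
}

/*
 * Block SIGUSR2 in the calling thread before creating the waiter, so the
 * new thread inherits the blocked mask.  Then send SIGUSR2 to the waiter
 * and return the signal number it received, or -1 on failure.
 */
static int
run_sigwait_demo(void)
{
	struct waiter w;
	pthread_t tid;

	sigemptyset(&w.set);
	sigaddset(&w.set, SIGUSR2);
	w.sig = 0;
	w.error = -1;
	if (pthread_sigmask(SIG_BLOCK, &w.set, NULL) != 0)
		return (-1);
	if (pthread_create(&tid, NULL, waiter_main, &w) != 0)
		return (-1);
	/* Stays pending on the waiter until it calls sigwait(). */
	(void)pthread_kill(tid, SIGUSR2);
	pthread_join(tid, NULL);
	return (w.error == 0 ? w.sig : -1);
}
```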



Re: sigwait return 4

2011-08-24 Thread Kostik Belousov

On Wed, Aug 24, 2011 at 10:56:09PM +0200, Jilles Tjoelker wrote:
 On Wed, Aug 24, 2011 at 10:07:03PM +0300, Kostik Belousov wrote:
  On Wed, Aug 24, 2011 at 10:19:07PM +0400, Slawa Olhovchenkov wrote:
   System is 8.2-RELEASE (GENERIC), amd64.
   Application -- i386 for freebsd7.
 
   In ktrace dump I find some strange result:
 
22951 100556 kas-milter CALL  sigwait(0xffdfdf80,0xffdfdf7c)
22951 100556 kas-milter RET   sigwait 4
22951 100556 kas-milter PSIG  SIGUSR2 caught handler=0x804c0f0 
   mask=0x4003 code=0x0
 
   RET   sigwait 4 confused me, and, I think, confused application too.
 
   man sigwait:
 
   ERRORS
The sigwait() system call will fail if:
 
[EINVAL]   The set argument specifies one or more invalid 
   signal
   numbers.
 
[EFAULT]   Any arguments point outside the allocated address
   space or there is a memory protection fault.
 
   How sigwait can return '4'?
   May be EINTR, converted from ERESTART? But kern_sigtimedwait from
   sigwait must be called with timeout == NULL...
 
  What should the system do for a delivered signal not present in the set ?
  I guess this is the case of your ktrace.
 
  Looking at the SUSv4, I see no mention of the situation, but in Oracle
  SunOS 5.10 man page for sigwait(2), it is said explicitely
  EINTR The wait was interrupted by an unblocked, caught signal.
 
  So I think that we have a bug in the man page.
 
  diff --git a/lib/libc/sys/sigwait.2 b/lib/libc/sys/sigwait.2
  index 8c00cf4..b462201 100644
  --- a/lib/libc/sys/sigwait.2
  +++ b/lib/libc/sys/sigwait.2
  @@ -27,7 +27,7 @@
   .\"
   .\" $FreeBSD$
   .\"
  -.Dd November 11, 2005
  +.Dd August 24, 2011
   .Dt SIGWAIT 2
   .Os
   .Sh NAME
  @@ -94,6 +94,8 @@ The
   .Fn sigwait
   system call will fail if:
   .Bl -tag -width Er
  +.It Bq Er EINTR
  +The system call was interrupted by an unblocked, caught signal.
   .It Bq Er EINVAL
   The
   .Fa set
 
 This patch would be wrong, except to document existing behaviour in
 -stable branches.
 
 sigwait() was fixed not to return EINTR in 9-current in r212405 (fixed
 up in r219709). The discussion started at
 http://lists.freebsd.org/pipermail/freebsd-threads/2010-September/004892.html
 
 Solaris is simply wrong in the same way we were wrong. Although POSIX
 may not be as clear on this as one may like, its intention is clear and
 additionally not returning EINTR reduces subtle portability problems.
Can you please describe why you consider the behaviour prohibiting the
return of EINTR reasonable? I do consider the Solaris behaviour useful.

Since we went the other route, an addition to the sigwait(2) manpage
that clarifies this looks useful. Also, sigwait(2) should be
sigwait(3). And the sentence "the sigwaitinfo() function is equivalent
to sigwait() ..." in sigwaitinfo(2) is incomplete, due to EINTR.
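To illustrate the different conventions, consider how each function reports failure. sigwait() returns the error number directly. sigtimedwait() returns -1 and sets errno: EAGAIN when the timeout expires, and possibly EINTR. The helper poll_sigusr1() below is a hypothetical minimal sketch, assuming the caller has already blocked SIGUSR1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Non-blocking check for a pending SIGUSR1 (the caller must already have
 * SIGUSR1 blocked).  Returns the signal number, or the negated errno:
 * -EAGAIN when nothing is pending before the zero timeout expires,
 * -EINTR if the wait was interrupted by another caught signal.
 */
static int
poll_sigusr1(void)
{
	sigset_t set;
	struct timespec zero = { 0, 0 };
	int ret;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	ret = sigtimedwait(&set, NULL, &zero);	/* -1 and errno on failure */
	if (ret == -1)
		return (-errno);
	return (ret);
}
```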

 
 Note that sigwaitinfo() and sigtimedwait() may return EINTR. SA_RESTART
 applies to sigwaitinfo() but not to sigtimedwait() (because the timeout
 cannot be restarted).
 
 -- 
 Jilles Tjoelker



Re: sigwait return 4

2011-08-24 Thread Kostik Belousov

On Thu, Aug 25, 2011 at 12:29:29AM +0300, Kostik Belousov wrote:
  Solaris is simply wrong in the same way we were wrong. Although POSIX
  may not be as clear on this as one may like, its intention is clear and
  additionally not returning EINTR reduces subtle portability problems.
 Can you, please, describe why do you consider the behaviour prohibiting
 return of EINTR reasonable ? I do consider that the Solaris behaviour is
 useful.
 
 Since we went the other route, the addition to sigwait(2) manpage that
 clarifies this looks useful. And, sigwait(2) shall be sigwait(3). Also,
 the sentence the sigwaitinfo() function is equivalent to sigwait() ...
 in the sigwaitinfo(2) is not complete, due to EINTR.

Like this (svn cp to be applied).

diff --git a/lib/libc/sys/sigwait.2 b/lib/libc/sys/sigwait.2
index 8c00cf4..a9e605c 100644
--- a/lib/libc/sys/sigwait.2
+++ b/lib/libc/sys/sigwait.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGWAIT 2
 .Os
 .Sh NAME
@@ -82,6 +82,14 @@ selected, it will be the lowest numbered one.
 The selection order between realtime
 and non-realtime signals, or between multiple pending non-realtime signals,
 is unspecified.
+.Sh IMPLEMENTATION NOTES
+The
+.Fn sigwait
+function is implemented as a wrapper around the
+.Fn __sys_sigwait
+system call, which retries the call on
+.Er EINTR
+error.
 .Sh RETURN VALUES
 If successful,
 .Fn sigwait
diff --git a/lib/libc/sys/sigwaitinfo.2 b/lib/libc/sys/sigwaitinfo.2
index 41be9e2..a83de06 100644
--- a/lib/libc/sys/sigwaitinfo.2
+++ b/lib/libc/sys/sigwaitinfo.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGTIMEDWAIT 2
 .Os
 .Sh NAME
@@ -116,6 +116,16 @@ except that the selected signal number shall be stored in the
 member, and the cause of the signal shall be stored in the
 .Va si_code
 member.
+Besides this, the
+.Fn sigwaitinfo
+and
+.Fn sigtimedwait
+system calls may return
+.Er EINTR
+if interrupted by signal, which is not allowed for the
+.Fn sigwait
+function.
+.Pp
 If any value is queued to the selected signal, the first such queued
 value is dequeued and, if the info argument is
 .Pf non- Dv NULL ,



Re: 32GB limit per swap device?

2011-08-20 Thread Kostik Belousov

On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
 On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov melif...@ipfw.ru wrote:
 
  On 10.08.2011 19:16, per...@pluto.rain.com wrote:
 
  Chuck Swiger cswi...@mac.com wrote:
 
   On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
 
  I am trying to set up 64GB partitions for swap for a system that
  has 64GB of RAM (with the idea to dump kernel core etc). But, on
  8-stable as of today I get:
 
  WARNING: reducing size to maximum of 67108864 blocks per swap unit
 
  Is there workaround for this limitation?
 
 
  Another interesting question:
 
  swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
 
  Block device size in passed to swaponsomething() in number of _disk_ blocks
   (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of which swap
  pager is build) maximum objects check is enforced.
 
  The (possible) problem is that real object count we will operate on is not
  the value passed to swaponsomething() since it is calculated in wrong units.
 
  we should check b-list limit on (X * DEV_BSIZE (512) / PAGE_SIZE) value which
  is rough (X / 8) so we should be able to address 32*8=256G.
 
  The code should look like this:
 
  Index: vm/swap_pager.c
  ===================================================================
  --- vm/swap_pager.c (revision 223877)
  +++ vm/swap_pager.c (working copy)
  @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
 u_long mblocks;
 
 /*
  +* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
  +* First chop nblks off to page-align it, then convert.
  +*
  +* sw->sw_nblks is in page-sized chunks now too.
  +*/
  +   nblks &= ~(ctodb(1) - 1);
  +   nblks = dbtoc(nblks);
  +
  +   /*
 
  * If we go beyond this, we get overflows in the radix
  * tree bitmap code.
  */
  @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
 mblocks);
 nblks = mblocks;
 }
  -   /*
  -* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
  -* First chop nblks off to page-align it, then convert.
  -*
  -* sw->sw_nblks is in page-sized chunks now too.
  -*/
  -   nblks &= ~(ctodb(1) - 1);
  -   nblks = dbtoc(nblks);
 
 sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
 sp->sw_vp = vp;
 
 
  (move pages recalculation before b-list check)
 
 
  Can someone comment on this?
 
 
 I believe that you are correct.  Have you tried testing this change on a
 large swap device?
I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.

When the initial code was committed, our daddr_t was 32-bit; I checked
the RELENG_4 sources. The current code uses int64_t for daddr_t. My
impression right now is that we only utilize the low 32 bits of daddr_t.

Especially interesting is the following typedef:
typedef uint32_t u_daddr_t;	/* unsigned disk address */
which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.

I wonder whether we could just use the full 64 bits and de facto remove
the limitation on the swap partition size.
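Here is a small userland sketch of the width mismatch. It uses stand-in typedefs: daddr_sketch_t and u_daddr_sketch_t are hypothetical names mirroring the 64-bit daddr_t and the 32-bit u_daddr_t quoted above. The all-ones mask in the 32-bit type is 0xffffffff. Storing a 64-bit block number in that type keeps only its low 32 bits.

```c
#include <stdint.h>

/* Stand-ins mirroring the kernel types discussed above (assumption). */
typedef int64_t  daddr_sketch_t;	/* like daddr_t: 64-bit signed */
typedef uint32_t u_daddr_sketch_t;	/* like u_daddr_t: 32-bit unsigned */

/* The usual all-ones mask, (u_daddr_t)-1, in the 32-bit type. */
static const u_daddr_sketch_t all_ones = (u_daddr_sketch_t)-1;

/*
 * Converting a 64-bit block number to the 32-bit unsigned type is well
 * defined in C: the value is reduced modulo 2^32, so the high half is
 * silently dropped.
 */
static u_daddr_sketch_t
to_u_daddr(daddr_sketch_t blk)
{
	return ((u_daddr_sketch_t)blk);
}
```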



Re: 32GB limit per swap device?

2011-08-20 Thread Kostik Belousov

On Sat, Aug 20, 2011 at 10:42:28PM +0400, Alexander V. Chernikov wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
 Alan Cox wrote:
  On 08/20/2011 12:41, Kostik Belousov wrote:
  On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
  On Thu, Aug 18, 2011 at 3:16 AM, Alexander V.
  Chernikov melif...@ipfw.ru wrote:
 
  On 10.08.2011 19:16, per...@pluto.rain.com wrote:
 
  Chuck Swiger cswi...@mac.com wrote:
 
On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
  I am trying to set up 64GB partitions for swap for a system that
  has 64GB of RAM (with the idea to dump kernel core etc). But, on
  8-stable as of today I get:
 
  WARNING: reducing size to maximum of 67108864 blocks per swap unit
 
  Is there workaround for this limitation?
 
  Another interesting question:
 
  swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
 
  Block device size in passed to swaponsomething() in number of _disk_
  blocks
(e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of
  which swap
  pager is build) maximum objects check is enforced.
 
  The (possible) problem is that real object count we will operate on
  is not
  the value passed to swaponsomething() since it is calculated in
  wrong units.
 
  we should check b-list limit on (X * DEV_BSIZE (512) / PAGE_SIZE) value
  which
  is rough (X / 8) so we should be able to address 32*8=256G.
 
  The code should look like this:
 
  Index: vm/swap_pager.c
  ===================================================================
  --- vm/swap_pager.c (revision 223877)
  +++ vm/swap_pager.c (working copy)
  @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id,
  u_long
  u_long mblocks;
 
  /*
  +* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
  chunks.
  +* First chop nblks off to page-align it, then convert.
  +*
  +* sw->sw_nblks is in page-sized chunks now too.
  +*/
  +   nblks &= ~(ctodb(1) - 1);
  +   nblks = dbtoc(nblks);
  +
  +   /*
 
   * If we go beyond this, we get overflows in the radix
   * tree bitmap code.
   */
  @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id,
  u_long
  mblocks);
  nblks = mblocks;
  }
  -   /*
  -* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
  chunks.
  -* First chop nblks off to page-align it, then convert.
  -*
  -* sw->sw_nblks is in page-sized chunks now too.
  -*/
  -   nblks &= ~(ctodb(1) - 1);
  -   nblks = dbtoc(nblks);
 
  sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
  sp->sw_vp = vp;
 
 
  (move pages recalculation before b-list check)
 
 
  Can someone comment on this?
 
 
  I believe that you are correct.  Have you tried testing this change on a
  large swap device?
 I will try tomorrow.
 
  I probably agree too, but I am in the process of re-reading the swap
  code,
  and I do not quite believe in the limit.
 
  
  I'm uncertain whether the current limit, 0x40000000 /
  BLIST_META_RADIX, is exact or not, but I doubt that it is too large.
 
 It is not exact.  It is rough estimation of
 sizeof(blmeta_t) * X < 4G (blist_create() assumes malloc() not being
 able to allocate more than 4G. I'm not sure if it is true these days)
 X is number of blocks we need to store. Actual number, however, it is X
 / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...) but it differs
 from X not very much.
 
 blist can be seen as tree of radix trees, with metainformation for all
 those radix trees allocated by single allocation which imposes this
 limit. Metainformation is used to find free blocks more quickly
 
 Single linear allocation is required to advance to next radix tree on
 the same level very fast:
 
 
 *   *   *   *   *
 **  **  **  **  **
 
 ^^^
 Some kind of schema with 3 level in tree and BLIST_META_RADIX=2 (instead
 of 16).
 
 
 
  
  When the initial code was committed, our daddr_t was 32bit, I checked
  the RELENG_4 sources. Current code uses int64_t for daddr_t. My
  impression
  right now is that we only utilize the low 32bits of daddr_t.
 
  Esp. interesting looks the following typedef:
  typedef uint32_t u_daddr_t; /* unsigned disk address */
  which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
 
  I wonder whether we could just use full 64bit and de-facto remove the
  limitation on the swap partition size.
 
 This will increase struct blmeta_t twice and cause 2*X memory usage for
 every swap configuration.
No, daddr_t is already 64-bit. Nothing will increase.
My point is that the current limitation is artificial.

I think Alan's note referred to the number of radix tree nodes
required to cover a large swap partition. But it could be a good
temporary measure.

I expect to be able to provide some numeric evidence later.
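Here are the numbers behind the 32GB figure, under some assumptions. The limit is taken to be 0x40000000 / BLIST_META_RADIX with BLIST_META_RADIX = 16, which reproduces the 67108864 in the warning. DEV_BSIZE is 512 and PAGE_SIZE is 4096. The sketch compares clamping before and after the DEV_BSIZE-to-page conversion. The SK_ macros are hypothetical stand-ins for ctodb()/dbtoc(), not the kernel's definitions, and the real swaponsomething() does more than this. Clamping the disk-block count first caps a device at 32 GiB. Converting to pages first moves the cap to 256 GiB, the 32*8 figure from the proposed patch.

```c
#include <stdint.h>

/* Assumed values for common amd64/i386 configurations. */
#define SK_DEV_BSIZE		512ULL
#define SK_PAGE_SIZE		4096ULL
#define SK_BLIST_META_RADIX	16ULL
#define SK_MBLOCKS		(0x40000000ULL / SK_BLIST_META_RADIX) /* 67108864 */

/* Hypothetical stand-ins for ctodb()/dbtoc(): pages <-> DEV_BSIZE blocks. */
#define SK_CTODB(x)	((x) * (SK_PAGE_SIZE / SK_DEV_BSIZE))
#define SK_DBTOC(x)	((x) / (SK_PAGE_SIZE / SK_DEV_BSIZE))

/* Current order: clamp the DEV_BSIZE block count, then convert to pages.
 * Returns the usable swap size in bytes. */
static uint64_t
swap_bytes_clamp_first(uint64_t nblks)
{
	if (nblks > SK_MBLOCKS)
		nblks = SK_MBLOCKS;
	nblks &= ~(SK_CTODB(1) - 1);	/* page-align the block count */
	return (SK_DBTOC(nblks) * SK_PAGE_SIZE);
}

/* Proposed order: convert to pages first, then clamp the page count. */
static uint64_t
swap_bytes_convert_first(uint64_t nblks)
{
	nblks &= ~(SK_CTODB(1) - 1);
	nblks = SK_DBTOC(nblks);
	if (nblks > SK_MBLOCKS)
		nblks = SK_MBLOCKS;
	return (nblks * SK_PAGE_SIZE);
}
```

For a 64 GiB device (134217728 blocks of 512 bytes), the first function yields 32 GiB and the second the full 64 GiB.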
 
  
  I would rather argue first that the subr_list code

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Kostik Belousov

On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote:
[skip]

 But I also would like to use this opportunity to discuss how we can
 make it easier to debug such issue as this. I think that this problem
 demonstrates that when we treat certain junk in kernel address value
 as a userland address value, we throw additional heaps of irrelevant
 stuff on top of an actual problem. One solution could be to use a
 special flag that would mark all actual attempts to access userland
 address (e.g. setting the flag on entrance to copyin and clearing it
 upon return), so that in the page fault handler we could distinguish
 actual faults on userland addresses from faults on garbage kernel
 addresses. I am sure that there could be other clever techniques to
 catch such garbage addresses early.

We already have such a mechanism: kernel code that is aware it accesses
usermode pages sets pcb_onfault. See the end of the trap_pfault() handler.
In fact, we can catch it earlier, before even calling vm_fault().

BTW, I think this is especially useful in combination with support
for SMEP in recent Intel CPUs.

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov kos...@pooma.home
Date:   Thu Aug 18 00:08:50 2011 +0300

Assert that the exiting process does not return to usermode.
On x86, do not call vm_fault() when the kernel is not prepared
to handle unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
goto nogo;
 
map = vm->vm_map;
+
+   /*
+* When accessing a usermode address, kernel must be
+* ready to accept the page fault, and provide a
+* handling routine.  Since accessing the address
+* without the handler is a bug, do not try to handle
+* it normally, and panic immediately.
+*/
+   if (!usermode && (td->td_intr_nesting_level != 0 ||
+   PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+   trap_fatal(frame, eva);
+   return (-1);
+   }
}
 
/*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
goto nogo;
 
map = vm->vm_map;
+   if (!usermode && (td->td_intr_nesting_level != 0 ||
+   PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+   trap_fatal(frame, eva);
+   return (-1);
+   }
}
 
/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 
CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
 td->td_name);
+   KASSERT((p->p_flag & P_WEXIT) == 0,
+   ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
/* Check that we called signotify() enough. */



Re: dtrace ustack kernel panic

2011-07-30 Thread Kostik Belousov

On Sat, Jul 30, 2011 at 12:05:33PM -0700, maestro something wrote:
 Hi,
 
 
  Have you started kgdb with the correct kernel and core file?
  If yes, then I am out of ideas.
 
 
  I hope so, I only recompiled the kernel once according to the DTRACE wiki
  instructions and I certainly only have one /var/crash/vmcore.* file.
 
  I'll try recompiling the kernel with -O1 and try again. In the meantime,
  I'm wondering whether I'm really the only/first one that ran into this
  problem or if there are people that actually successfully used the ustack()
  target on freebsd-8.2?
 
 
 I could not get the information even after recompiling the kernel here is
 the relevant (I think information).
 
 fb82i386# cat /etc/make.conf
 CFLAGS= -O
 
 (accodring to man make.conf only -O and -O2 is supported for CFLAGS anyways)
 
 kernel.debug is the newly compiled kernel (according to the timestamp)
 
 fb82i386# kgdb kernel.debug /var/crash/vmcore.0
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as i386-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 kernel trap 12 with interrupts disabled
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address= 0x108
 fault code= supervisor write, page not present
 instruction pointer= 0x20:0xc1100847
 stack pointer= 0x28:0xcd39a9e4
 frame pointer= 0x28:0xcd39a9fc
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags= resume, IOPL = 0
 current process= 1060 (nc)
 trap number= 12
 panic: page fault
 cpuid = 0
 KDB: stack backtrace:
 #0 0xc09036a7 at kdb_backtrace+0x47
 #1 0xc08d1a07 at panic+0x117
 #2 0xc0c158c3 at trap_fatal+0x323
 #3 0xc0c15bc0 at trap_pfault+0x2f0
 #4 0xc0c1612a at trap+0x48a
 #5 0xc0bfc97c at calltrap+0x6
 #6 0xc10e99db at dtrace_panic+0x1b
 #7 0xc10e9a0d at dtrace_assfail+0x2d
 #8 0xc10fa6a6 at dtrace_probe+0xfd6
 #9 0xc1237ce4 at systrace_probe+0x84
 #10 0xc090f63f at syscallenter+0x47f
 #11 0xc0c15c14 at syscall+0x34
 #12 0xc0bfca11 at Xint0x80_syscall+0x21
 Uptime: 2m39s
 Physical memory: 239 MB
 Dumping 78 MB: 63 47 31 15
 
 
 
 (kgdb) where
 #0  doadump () at pcpu.h:231
 #1  0xc08d17a3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
 #2  0xc08d1a40 in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:592
 #3  0xc0c158c3 in trap_fatal (frame=0xcd39a9a4, eva=264) at
 /usr/src/sys/i386/i386/trap.c:946
 #4  0xc0c15bc0 in trap_pfault (frame=0xcd39a9a4, usermode=0, eva=264) at
 /usr/src/sys/i386/i386/trap.c:859
 #5  0xc0c1612a in trap (frame=0xcd39a9a4) at
 /usr/src/sys/i386/i386/trap.c:532
 #6  0xc0bfc97c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
 #7  0xc1100847 in dtrace_panic_trigger () from /boot/kernel/dtrace.ko
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) list *dtrace_probe+0xfd6
 No source file for address 0xc10fa6a6.
 
 So I'm stuck at the same point.
 
 Any other ideas?

This is i386, right?
I think the cause is that the assembler routine panic_trigger does not
establish the standard i386 frame. Basically, you need either this
or DWARF annotations for gdb to be able to walk over the frame.

You need to add the standard prologue
pushl   %ebp
movl    %esp,%ebp
and the standard epilogue
leave
to the function. I have no idea whether it will continue to operate
correctly afterwards.



Re: Sleeping thread owns a nonsleepable lock panic ( lor)

2011-07-27 Thread Kostik Belousov

On Tue, Jul 26, 2011 at 07:12:23PM -0400, Rick Macklem wrote:
Kostik Belousov wrote:
On Tue, Jul 26, 2011 at 01:17:52PM +0200, Herve Boulouis wrote:
On 26/07/2011 12:06, Kostik Belousov wrote:
On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
On 25/07/2011 11:59, Kostik Belousov wrote:

OK, the patched server crashed strangely this morning: all httpd
processes were stuck in nfs or vmopar and were unkillable. Below is
the full ps.

Please see the
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for information required to debug the deadlocks.

The box was not strictly deadlocked, since I was able to interact with
it, but I suppose you want me to break into the debugger when the
symptoms appear again and report all the commands listed in the
handbook's deadlock section?

Exactly.

I think everything that accessed an NFS mount point was hung.
From usermode, procstat -kk could capture some interesting
information, but it is redundant if the ddb output is captured.

Would it be worth considering reverting r223054?
(Note that I don't understand the VM side, so this may be completely
wrong:-)

The sleeps on vmopar could be happening because a dirty page is busy
and r223054 changes the VM_PAGER_xx value set a couple of ways.
1 - When it returns VM_PAGER_ERROR instead of VM_PAGER_AGAIN, the
return value of runlen from vm_pageout_flush() changes.
2 - I'm not sure, but I think the pre-r223054 code marked a partially
written page as VM_PAGER_OK instead of VM_PAGER_AGAIN?
(I'm wondering about this one, since the problem seems to happen
when the file's size has been truncated.)

Herve Boulouis, if you want to see what r223054 changes, just go to
http://svn.freebsd.org/viewvc/stable/8/sys/nfsclient
and then click on nfs_bio.c.
(The changes are small and could easily be reverted with a manual
edit.)

Since r223054 went into stable/8 on Jun 13, it seems a possible
explanation? rick

I doubt it. The ps output makes it not implausible that the
reporter hit the LOR between the vnode lock and the page busy flag. The
correct order is: vnode lock first, then the busy bit. vmopar is a wait
for the busy page state.

The mentioned revision does not change the lock order.

Anyway, this is only speculation until the requested data is provided.
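To make the lock order concrete, here is a minimal user-space sketch, assuming POSIX threads. The structs and the functions page_busy(), page_unbusy() and write_page() are hypothetical stand-ins (a mutex for the vnode lock, a flag plus condition variable for the page busy state), not the kernel's VM code. Every path takes the vnode lock first and only then waits for the busy state.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: a page whose "busy" state is
 * protected by an object mutex; waiters sleep on a condition variable
 * until it clears (the analogue of the "vmopar" wait). */
struct page_model {
    pthread_mutex_t obj_mtx;
    pthread_cond_t unbusied;
    bool busy;
};

struct vnode_model {
    pthread_mutex_t vnode_lock;  /* stands in for the vnode lock */
    struct page_model pg;
    long writes;                 /* protected by vnode_lock */
};

static void page_busy(struct page_model *p)
{
    pthread_mutex_lock(&p->obj_mtx);
    while (p->busy)              /* "vmopar": wait for the busy state to end */
        pthread_cond_wait(&p->unbusied, &p->obj_mtx);
    p->busy = true;
    pthread_mutex_unlock(&p->obj_mtx);
}

static void page_unbusy(struct page_model *p)
{
    pthread_mutex_lock(&p->obj_mtx);
    p->busy = false;
    pthread_cond_broadcast(&p->unbusied);
    pthread_mutex_unlock(&p->obj_mtx);
}

/* Documented order: vnode lock first, then the page busy state. */
static void write_page(struct vnode_model *vp)
{
    pthread_mutex_lock(&vp->vnode_lock);   /* 1st: vnode lock */
    page_busy(&vp->pg);                    /* 2nd: busy bit   */
    vp->writes++;                          /* page I/O would go here */
    page_unbusy(&vp->pg);
    pthread_mutex_unlock(&vp->vnode_lock);
}
```

A thread that busied the page and then blocked on the vnode lock would invert this order; it and a thread inside write_page() could then each wait forever for what the other holds.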


Re: Sleeping thread owns a nonsleepable lock panic ( lor)

2011-07-26 Thread Kostik Belousov

On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
 On 25/07/2011 11:59, Kostik Belousov wrote:
 
 OK, the patched server crashed strangely this morning: all httpd
 processes were stuck in nfs or vmopar and were unkillable. Below is
 the full ps.

Please see the
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for information required to debug the deadlocks.



Re: Sleeping thread owns a nonsleepable lock panic ( lor)

2011-07-26 Thread Kostik Belousov

On Tue, Jul 26, 2011 at 01:17:52PM +0200, Herve Boulouis wrote:
 On 26/07/2011 12:06, Kostik Belousov wrote:
  On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
   On 25/07/2011 11:59, Kostik Belousov wrote:
   
   OK, the patched server crashed strangely this morning: all httpd
   processes were stuck in nfs or vmopar and were unkillable. Below is
   the full ps.
  
  Please see the
  http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
  for information required to debug the deadlocks.
 
 The box was not strictly deadlocked, since I was able to interact with
 it, but I suppose you want me to break into the debugger when the
 symptoms appear again and report all the commands listed in the
 handbook's deadlock section?

Exactly.

I think everything that accessed an NFS mount point was hung.
From usermode, procstat -kk could capture some interesting information,
but it is redundant if the ddb output is captured.



Re: Sleeping thread owns a nonsleepable lock panic ( lor)

2011-07-25 Thread Kostik Belousov

On Mon, Jul 25, 2011 at 12:21:07PM +0200, Herve Boulouis wrote:
 Hi list,
 
 We have 2 FreeBSD 8.2-STABLE servers (cvsupped on June 22) that keep
 crashing in a bad way:
 
 They are doing heavy apache / php4 web serving from an NFS mount and
 panic at least once a day with the following message (no crash dump
 produced, hand copied from the console):
 
 Sleeping on vmopar with the following non-sleepable locks held:
 exclusive sleep mutex NFSnode lock (NFSnode lock) r =  0 (0xff0201798000) 
 locked @ nfsclient/nfs_subs.c:538
 lock order reversal:
  1st 0x018ff6da80 turnstile lock (turnstile lock) @ 
 kern/subr_turnstile.c:190
  2nd 0xff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
 lock order reversal:
  1st 0x018ff6da80 turnstile lock (turnstile lock) @ 
 kern/subr_turnstile.c:190
  2nd 0xff80b78ef8 sleepq chain (sleepq chain) @ 
 kern/subr_turnstile.c:203
 lock order reversal:
  1st 0xff80b78ef8 sleepq chain (sleepq chain) @ 
 kern/subr_turnstile.c:203
  2nd 0xff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
 Sleeping thread (tid 100998, pid 20700) owns a non-sleepable lock
 panic: sleeping thread
 cpuid = 1
 panic: bufwrite: buffer is not busy???
 cpuid = 1
 
 The 2 servers share the same load and panic consistently. I enabled WITNESS 
 on the 2 in the hope
 it would allow the boxes to auto reboot after panic and get extra debug info. 
 I got debug info
 but the servers still hang after the double panic :(

Try this. Calling vnode_pager_setsize() while holding a mutex is prohibited.
On the other hand, I remember that my attempt to add a strict assertion
that the vnode is exclusively locked in vnode_pager_setsize() had to be
reverted because nfs_loadattrcache() is sometimes called without the
vnode lock held.

commit 2aa7d15c38b0c01e3f724f04d7ed02ce11c82cc0
Author: Konstantin Belousov kostik...@gmail.com
Date:   Mon Jul 25 11:56:04 2011 +0300

Postpone the vnode_pager_setsize() call until the nfs node mutex is dropped.

diff --git a/sys/nfsclient/nfs_subs.c b/sys/nfsclient/nfs_subs.c
index 19fde06..351885a 100644
--- a/sys/nfsclient/nfs_subs.c
+++ b/sys/nfsclient/nfs_subs.c
@@ -478,7 +478,9 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 	struct timespec mtime, mtime_save;
 	int v3 = NFS_ISV3(vp);
 	int error = 0;
+	int do_setsize;
 
+	do_setsize = 0;
 	md = *mdp;
 	t1 = (mtod(md, caddr_t) + md->m_len) - *dposp;
 	cp2 = nfsm_disct(mdp, dposp, NFSX_FATTR(v3), t1, M_WAIT);
@@ -606,7 +608,7 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
 			}
-			vnode_pager_setsize(vp, np->n_size);
+			do_setsize = 1;
 		} else {
 			np->n_size = vap->va_size;
 		}
@@ -643,6 +645,8 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 	KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, &np->n_vattr, 0);
 #endif
 	mtx_unlock(&np->n_mtx);
+	if (do_setsize)
+		vnode_pager_setsize(vp, np->n_size);
 out:
 #ifdef KDTRACE_HOOKS
 	if (error)

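The pattern in the patch, recording the decision under the mutex and making the possibly-sleeping call only after mtx_unlock(), can be sketched in user space with POSIX threads. This is a hypothetical model: struct node, load_attrs() and resize_backing_store() merely stand in for the nfsnode, nfs_loadattrcache() and vnode_pager_setsize().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an nfsnode: a cached size guarded by a mutex. */
struct node {
    pthread_mutex_t mtx;        /* plays the role of np->n_mtx */
    long size;                  /* plays the role of np->n_size */
    long last_resize;           /* size passed to the last resize call */
    int resize_calls;
    int called_with_lock_held;  /* set if the rule below is ever violated */
};

/* Hypothetical stand-in for vnode_pager_setsize(): it may sleep, so it
 * must not be called while n->mtx is held. */
static void resize_backing_store(struct node *n, long newsize)
{
    /* Demo-only check (single-threaded): trylock fails if mtx is held. */
    if (pthread_mutex_trylock(&n->mtx) != 0)
        n->called_with_lock_held = 1;
    else
        pthread_mutex_unlock(&n->mtx);
    n->last_resize = newsize;
    n->resize_calls++;
}

/* Update the cached size under the mutex; defer the sleepable call. */
static void load_attrs(struct node *n, long newsize)
{
    bool do_setsize = false;
    long size_snapshot = 0;

    pthread_mutex_lock(&n->mtx);
    if (n->size != newsize) {
        n->size = newsize;
        size_snapshot = n->size;
        do_setsize = true;      /* decide while holding the mutex ... */
    }
    pthread_mutex_unlock(&n->mtx);

    if (do_setsize)             /* ... act once the mutex is dropped */
        resize_backing_store(n, size_snapshot);
}
```

Unlike the patch, which reads np->n_size again after dropping the mutex, this sketch snapshots the size while still holding the lock; either way the sleepable call runs with no mutex held.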


Re: disable 64-bit dma for one PCI slot only?

2011-07-20 Thread Kostik Belousov

On Wed, Jul 20, 2011 at 11:54:06AM +0200, Stefan Esser wrote:
 The Rev column is required for devices that are not uniquely
 identified by their Vnd/Dev-IDs. (These used to exist, e.g. the Symbios
 SCSI controllers, though I'm not aware of any device that needed a
 different driver depending on the PCI revision number.)
There may indeed be no device that requires a different driver
depending on its revision, but there are definitely devices that require
different workarounds in the driver based on the revision. Seeing the
revision in the pciconf output helps a lot to reduce the mail turnaround
when analyzing user reports.



Re: panic: spin lock held too long (RELENG_8 from today)

2011-07-07 Thread Kostik Belousov

On Thu, Jul 07, 2011 at 10:36:42AM +0300, Andriy Gapon wrote:
 on 07/07/2011 08:55 Mike Tancsa said the following:
  I did a buildworld on this box to bring it up to RELENG_8 for the BIND
  fixes.  Unfortunately, the formerly solid box (April 13th kernel)
  panic'd tonight with
  
  Unread portion of the kernel message buffer:
  spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
  panic: spin lock held too long
  cpuid = 0
  Uptime: 13h30m4s
  Physical memory: 2035 MB
  
  
  It's a somewhat busy box taking in mail as well as backups for a few
  servers over nfs.  At the time, it would have been getting about 250Mb/s
  inbound on its gigabit interface.  Full core.txt file at
  
  http://www.tancsa.com/core-jul8-2011.txt
 
 I thought that this was supposed to contain output of 'thread apply all bt' in
 kgdb.  Anyway, I think that stacktrace for tid 100107 may have some useful
 information.
 
  #0  doadump () at pcpu.h:231
  231 pcpu.h: No such file or directory.
  in pcpu.h
  (kgdb) #0  doadump () at pcpu.h:231
  #1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
  #2  0xc06fd937 in panic (fmt=Variable fmt is not available.
  ) at /usr/src/sys/kern/kern_shutdown.c:602
  #3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0)
  at /usr/src/sys/kern/kern_mutex.c:490
  #4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
  file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526
  #5  0xc0720254 in sched_add (td=0xc5dac5c0, flags=0)
  at /usr/src/sys/kern/sched_ule.c:1119
  #6  0xc07203f9 in sched_wakeup (td=0xc5dac5c0)
  at /usr/src/sys/kern/sched_ule.c:1950
  #7  0xc07061f8 in setrunnable (td=0xc5dac5c0)
  at /usr/src/sys/kern/kern_synch.c:499
  #8  0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0,
  pri=Variable pri is not available.
  )
  at /usr/src/sys/kern/subr_sleepqueue.c:751
  #9  0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0)
  at /usr/src/sys/kern/subr_sleepqueue.c:825
  #10 0xc06b6764 in cv_signal (cvp=0xc5fafe50)
  at /usr/src/sys/kern/kern_condvar.c:422
  #11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available.
  ) at /usr/src/sys/rpc/svc.c:342
  #12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at
  /usr/src/sys/rpc/svc.c:378
  #13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600,
  waitflag=1)
  at /usr/src/sys/rpc/svc_vc.c:747
  #14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34)
  at /usr/src/sys/kern/uipc_sockbuf.c:191
  #15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824,
  so=0xc6372ce0,
  tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2)
  at /usr/src/sys/netinet/tcp_input.c:1775
  #16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20)
  at /usr/src/sys/netinet/tcp_input.c:1329
  #17 0xc07ddaf7 in ip_input (m=0xcaa8d200)
  at /usr/src/sys/netinet/ip_input.c:787
  #18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200)
  at /usr/src/sys/net/netisr.c:859
  #19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200)
  at /usr/src/sys/net/netisr.c:946
  #20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200)
  at /usr/src/sys/net/if_ethersubr.c:894
  #21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200)
  at /usr/src/sys/net/if_ethersubr.c:753
  #22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1)
  at /usr/src/sys/dev/nfe/if_nfe.c:2187
  #23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440)
  at /usr/src/sys/kern/subr_taskqueue.c:248
  #24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130)
  at /usr/src/sys/kern/subr_taskqueue.c:385
  #25 0xc06d1027 in fork_exit (callout=0xc07388a0 taskqueue_thread_loop,
  arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861
  #26 0xc09a5c24 in fork_trampoline () at
  /usr/src/sys/i386/i386/exception.s:275
  (kgdb)
  

BTW, we had a similar panic (spin lock held too long, with the spinlock
being sched lock N) on a busy 8-core box recently upgraded to stable/8.
Unfortunately, the machine hung while dumping core, so the stack trace
for the owner thread was not available.

I was unable to draw any conclusion from the data that was present.
If the situation is reproducible, you could try reverting r221937. This
is pure speculation, though.



Re: csh Cannot open /etc/termcap after starting screen

2011-06-18 Thread Kostik Belousov

On Sat, Jun 18, 2011 at 10:14:32PM +0200, Stefan `Sec` Zehl wrote:
 Hi,
 
 On Thu, Jun 16, 2011 at 13:15 -0700, Jeremy Chadwick wrote:
Example: run mutt from within GNU screen while connected to
  the system with PuTTY, then copy some of the terminal content and paste
  it somewhere.  Wow, look at all those extraneous spaces at the end of
  lines, which you now gloriously have to manually remove.
 
 While I don't want to stand in the way of your rant, this is actually a
 bug/problem of mutt. -- mutt is really printing spaces there, so it is
 (IMHO) correct that copy&paste copies spaces.

This is a property of the default termcap entry for screen.
Try TERM=screen-bce mutt.



Re: doscmd under 8-stable, anyone?

2011-06-15 Thread Kostik Belousov

On Wed, Jun 15, 2011 at 03:57:05PM +0200, Joerg Wunsch wrote:
 When trying to use doscmd on 8-stable, all I get is:
 
 Error mapping HMA, HMA disabled: : Invalid argument
 Segmentation fault (core dumped)
 
 The segfault happens at the end of mem_init(), when the allocated DOS
 memory (which is located at virtual address 0) is attempted to be
 written to.  Apparently, the mmap() failure that causes the HMA
 disabled message is actually a fatal error rather than a benign one
 that could be ignored, as it results in no valid DOS memory allocation
 at all.
 
 Right now, the only older system I could test it against uses FreeBSD
 5.x, where the mmap() works as expected.  So does anyone have an idea
 why this mmap() call:
 
 if (mmap((caddr_t)0x00, 0x10,
PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_ANON | MAP_FIXED | MAP_SHARED,
-1, 0) == MAP_FAILED) {
 perror("Error mapping HMA, HMA disabled: ");
 HMA_a20 = -1;
 close(HMA_fd_off);
 close(HMA_fd_on);
 return;
 }
 
 yields an EINVAL now under 8-stable?

Do sysctl security.bsd.map_at_zero=1
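A quick way to check whether the running kernel allows the mapping doscmd needs is to attempt it directly. The sketch below, with the hypothetical helper try_map_at_zero(), issues the same kind of MAP_FIXED request at address 0 (minus PROT_EXEC) and reports errno. With security.bsd.map_at_zero=0, FreeBSD rejects it with EINVAL as in the report; Linux enforces its own vm.mmap_min_addr limit and typically returns EPERM.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANON on glibc */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try to map `len` bytes of anonymous memory at virtual address 0 with
 * MAP_FIXED, the way doscmd sets up DOS memory (minus PROT_EXEC).
 * Returns 0 on success, after unmapping the region again, or the errno
 * value of the failed mmap(2). */
static int try_map_at_zero(size_t len)
{
    void *p = mmap((void *)0, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_FIXED | MAP_SHARED, -1, 0);

    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);
    return 0;
}
```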



Re: doscmd under 8-stable, anyone?

2011-06-15 Thread Kostik Belousov

On Wed, Jun 15, 2011 at 04:44:55PM +0200, Joerg Wunsch wrote:
 As Kostik Belousov wrote:
 
  Do sysctl security.bsd.map_at_zero=1
 
 Just for the record, this sysctl also makes my really really old utree
 binary work again.  The binary dates back to 386BSD 0.0, and I'm only
 keeping it out of curiosity:
 
 j@uriah 66% ls -l /usr/local/bin/utree
 -rwxr-xr-x  1 bin  bin  179639 Apr 30  1992 /usr/local/bin/utree*
 
 The only thing to make it run is to use a termcap entry that is
 smaller than 1024 bytes, as this used to be a hard-coded limitation in
 the termcap library of those days, and the binary is statically
 linked.  TERM=vt100 works, xterm no longer does.
 
 The ability to run this binary only serves as a proof that no backward
 compatibility has ever been broken in FreeBSD. ;-)  (Obviously, all
 the various COMPAT_* options must be present in the kernel config.)

Yes, doscmd and N-magic a.out binaries were the arguments for implementing
the sysctl instead of outright disabling mappings at address 0.
You are the first documented case proving the wisdom of that decision :).

BTW, I semi-jokingly committed support for the FreeBSD-1.0/i386 ABI
on amd64 on April 1. It would be interesting to see how your binary
behaves.



Re: automoc4 processes lock again

2011-05-09 Thread Kostik Belousov

On Mon, May 09, 2011 at 12:40:56PM +0400, Max Brazhnikov wrote:
 Hi,
 
 After recent Qt-4.7.3 update I can't build KDE4 ports anymore (tested on 
 8.2-STABLE amd64 only). The problem is always reproduced with x11/kdelibs4. 
 The build stalls with hanging automoc4 processes. Any help is appreciated.
 
 # ps | grep automoc
 18636   3  IN+0:00.02 /usr/local/bin/automoc4 
 /usr/obj/usr/local/tinderbox/portstrees/FreeBSD/ports/x11/kdelibs4/work/kdelibs-4.6.3/build/kde3support/
 18640   3  IN+0:00.00 /usr/local/bin/automoc4 
 /usr/obj/usr/local/tinderbox/portstrees/FreeBSD/ports/x11/kdelibs4/work/kdelibs-4.6.3/build/kde3support/
 
 # gdb automoc4 18636
...
 Reading symbols from /lib/libthr.so.3...done.
 [New Thread 801c0ae40 (LWP 100660/automoc4)]
 [New Thread 801c041c0 (LWP 100590/initial thread)]
...
 [Switching to Thread 801c0ae40 (LWP 100660/automoc4)]
 0x00080104c99c in select () at select.S:3
 3   RSYSCALL(select)
 (gdb) bt
 #0  0x00080104c99c in select () at select.S:3
 #1  0x0008008502cd in QProcessManager::run (this=0x800b196e0) at 
 io/qprocess_unix.cpp:245
 #2  0x000800749bde in QThreadPrivate::start (arg=0x800b196e0) at 
 thread/qthread_unix.cpp:320
 #3  0x0008017985e1 in thread_start (curthread=0x801c0ae40) at 
 /usr/freebsd/8/src/lib/libthr/thread/thr_create.c:288
 #4  0x in ?? ()
 Error accessing memory address 0x7fbff000: Bad address.
 Current language:  auto; currently asm
 
 # gdb automoc4 18640
...
 0x0008017a24cc in _umtx_op_err () at 
 /usr/freebsd/8/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
 37  RSYSCALL_ERR(_umtx_op)
 (gdb) bt
 #0  0x0008017a24cc in _umtx_op_err () at 
 /usr/freebsd/8/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
 #1  0x0008017a21bc in __thr_umutex_lock (mtx=0x8018a7380, id=100590) at 
 /usr/freebsd/8/src/lib/libthr/thread/thr_umtx.c:58
 #2  0x00080179d04a in init_static (thread=0x801c041c0, mutex=0x801166c78) 
 at thr_umtx.h:88
 #3  0x00080179d7ad in __pthread_mutex_lock (mutex=0x801166c78) at 
 /usr/freebsd/8/src/lib/libthr/thread/thr_mutex.c:441
 #4  0x00080104b21e in _flockfile (fp=0x801166be0) at 
 /usr/freebsd/8/src/lib/libc/stdio/_flock_stub.c:70
 #5  0x000801021515 in fileno (fp=0x801166be0) at 
 /usr/freebsd/8/src/lib/libc/stdio/fileno.c:52
 #6  0x00080084f109 in QProcessPrivate::execChild (this=0x801c51600, 
 workingDir=0x0, path=0x0, argv=0x801c5b7c0, envp=0x0) at 
 io/qprocess_unix.cpp:712
 #7  0x000800851fc3 in QProcessPrivate::startProcess (this=0x801c51600) at 
 io/qprocess_unix.cpp:665
 #8  0x000800802248 in QProcess::start (this=0x7fffcd10, 
 program=@0x7fffd8f8, arguments=@0x7fffcd00, mode=@0x7fffcd20)
 at io/qprocess.cpp:1960
 #9  0x0040acd2 in AutoMoc::echoColor (this=0x7fffd8d0, 
 msg=@0x7fffce80)
 at 
 /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:73
 #10 0x0040517c in AutoMoc::generateMoc (this=0x7fffd8d0, 
 sourceFile=@0x801c0f910, mocFileName=@0x801c0f918)
 at 
 /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:569
 #11 0x00408011 in AutoMoc::run (this=0x7fffd8d0) at 
 /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:470
 #12 0x00409135 in main (argc=6, argv=0x7fffd9a8) at 
 /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:114
 Current language:  auto; currently asm

You did not supply enough information.
Which of the processes is the parent, and which is the child?
Note that there are other threads in pid 18636. What do they do?

If I may make a guess, I would assume that pid 18640 is the child.
Note that the child is waiting for the pthread mutex that protects
the stdio FILE structure. Now, assume additionally that the parent had
the FILE locked in one thread while another thread did the fork. Then
the child process would never be able to obtain the lock, because the
lock was acquired by a thread that no longer exists (in the child
process, only the thread that called fork is duplicated).

In fact, I believe that you already reported a similar problem with
malloc(3) some time ago. The root of the problem would be the undefined
(and permitted by POSIX) behaviour of calling non-async-signal-safe
functions in the child of a multithreaded process after fork.

For malloc(3), this can be argued to be a quality-of-implementation
issue, but there is no reason to specially handle random mutexes, even
from libc. If the mutex was locked at fork time, the protected data
structure is arguably in an inconsistent state in the child after the
fork.
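A minimal sketch of the safe pattern, assuming a POSIX system: the hypothetical helper spawn_and_wait() forks and execs, and between fork() and execve() the child calls only async-signal-safe functions (execve, write, _exit). In particular it never touches stdio, so a FILE lock held by some other parent thread at fork time cannot block it. This is an illustration, not how Qt's QProcessPrivate::execChild() is written.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork and exec `path` from a process that may have other threads.
 * The child uses only async-signal-safe calls until execve(): no stdio,
 * no malloc, no fileno(), since their locks may have been held at fork
 * time by a thread that does not exist in the child.
 * Returns the child's exit status, or -1 on error. */
static int spawn_and_wait(const char *path, char *const argv[])
{
    static const char msg[] = "exec failed\n";
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        /* write(2), not perror()/fprintf(): no FILE lock is touched. */
        ssize_t unused = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)unused;
        _exit(127);
    }
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

posix_spawn(3) packages the same idea and avoids hand-written child code altogether.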



Re: automoc4 processes lock again

2011-05-09 Thread Kostik Belousov

On Mon, May 09, 2011 at 07:39:46PM +0400, Max Brazhnikov wrote:
 On Mon, 9 May 2011 15:41:05 +0300, Kostik Belousov wrote:
  You did not supply enough information.
  Which of the processes is the parent, and which is the child?
  Note that there are other threads in pid 18636. What do they do?
 
 Here are backtraces from all threads:
 http://people.freebsd.org/~makc/automoc4.bt
 63373 is a parent now, 63374 is a child.
 
 There were no related changes in the Qt4 and automoc4 sources; probably my
 update from 8.2-PRERELEASE to STABLE a week ago triggered the issue.

It is obviously an application bug; yes, I think my guess was right.
Thou shalt not call non-async-signal-safe functions in the child of thy
multithreaded process.

Since it is a race, I find it more curious that it did not manifest
itself previously.

 
  If I may make a guess, I would assume that pid 18640 is the child.
  Note that the child is waiting for the pthread mutex that protects
  the stdio FILE structure. Now, assume additionally that the parent had
  the FILE locked in one thread while another thread did the fork. Then
  the child process would never be able to obtain the lock, because the
  lock was acquired by a thread that no longer exists (in the child
  process, only the thread that called fork is duplicated).
  
  In fact, I believe that you already reported a similar problem with
  malloc(3) some time ago. The root of the problem would be the undefined
  (and permitted by POSIX) behaviour of calling non-async-signal-safe
  functions in the child of a multithreaded process after fork.
  
  For malloc(3), this can be argued to be a quality-of-implementation
  issue, but there is no reason to specially handle random mutexes, even
  from libc. If the mutex was locked at fork time, the protected data
  structure is arguably in an inconsistent state in the child after the
  fork.



Re: Kernel memory leak in 8.2-PRERELEASE?

2011-04-02 Thread Kostik Belousov

On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
 Ahoy. This morning, I awoke to the following on one of my servers:
 
 pid 59630 (httpd), uid 80, was killed: out of swap space
 pid 59341 (find), uid 0, was killed: out of swap space
 pid 23134 (irssi), uid 1001, was killed: out of swap space
 pid 49332 (sshd), uid 1001, was killed: out of swap space
 pid 69074 (httpd), uid 0, was killed: out of swap space
 pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
 ...
 
 And so on.
 
 The machine is:
 
 FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2: Thu 
 Dec  2 11:39:21 EST 2010 
 sp...@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS  amd64
 
 10:13AM  up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00
 
 The memory line from top intrigued me:
 
 Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free
 
 The machine has 8 gigs of memory, and I don't know what all that wired 
 memory is being used for. There is a large-ish (6 x 1.5-TB) ZFS RAID-Z2 
 on it which has had a disk in the UNAVAIL state for a few months:
 
 # zpool status
   pool: home
  state: DEGRADED
 status: One or more devices could not be used because the label is 
 missing or
 invalid.  Sufficient replicas exist for the pool to continue
 functioning in a degraded state.
 action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
  scrub: none requested
 config:
 
 NAME        STATE     READ WRITE CKSUM
 home        DEGRADED     0     0     0
   raidz2    DEGRADED     0     0     0
     ada0    ONLINE       0     0     0
     ada1    ONLINE       0     0     0
     ada2    ONLINE       0     0     0
     ada3    ONLINE       0     0     0
     ada4    ONLINE       0     0     0
     ada5    UNAVAIL  08511  experienced I/O failures
 
 errors: No known data errors
 
 vmstat -m and vmstat -z output:
 
 http://acm.poly.edu/~spawk/vmstat-m.txt
 http://acm.poly.edu/~spawk/vmstat-z.txt
 
 Anyone have a clue? I know it's just going to happen again if I reboot 
 the machine. It is still up in case there are diagnostics for me to run.

Try r218795. Most likely, your issue is not a leak.



Re: Problem using POSIX message queues

2011-03-28 Thread Kostik Belousov

On Mon, Mar 28, 2011 at 01:19:38PM -0400, Derek Tattersall wrote:
 While trying to develop an understanding of the use of POSIX message
 queues, I found that issuing an mq_open(2) call resulted in a "Bad system
 call: 12" error message. I have tried to run the tools/regression/mqueue
 tests, but they fail in mq_open with the bad system call error. In
 addition, the mq_open(2) man page refers to mq_timedreceive(3) and
 mq_timedsend(3), which exist as section 2 man pages, and to an
 mq_unlink(3) man page which I can't find at all.
Try kldload mqueuefs before the tests.
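When mqueuefs is not loaded, the mq_* system calls are unimplemented; FreeBSD then sends SIGSYS (signal 12, hence "Bad system call: 12") and the call returns ENOSYS. The sketch below, built around a hypothetical probe_mqueue() helper, ignores SIGSYS around mq_open() so a program can detect the condition and print a hint instead of dying. Link with -lrt where the mq_* functions live in librt (FreeBSD, older glibc).

```c
#include <errno.h>
#include <fcntl.h>
#include <mqueue.h>
#include <signal.h>
#include <stdio.h>

/* Create and remove a POSIX message queue as an availability probe.
 * SIGSYS is ignored around the call so that, on a kernel without
 * mqueuefs, mq_open() fails with ENOSYS instead of killing the process.
 * Returns 0 on success, otherwise the errno value from mq_open(). */
static int probe_mqueue(const char *name)
{
    struct mq_attr attr = { .mq_maxmsg = 4, .mq_msgsize = 64 };
    void (*old_handler)(int) = signal(SIGSYS, SIG_IGN);
    mqd_t mq = mq_open(name, O_CREAT | O_RDWR, 0600, &attr);
    int err = (mq == (mqd_t)-1) ? errno : 0;

    signal(SIGSYS, old_handler);
    if (err == ENOSYS)
        fprintf(stderr, "no POSIX mqueue support; on FreeBSD try: kldload mqueuefs\n");
    if (err != 0)
        return err;
    mq_close(mq);
    mq_unlink(name);
    return 0;
}
```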



Re: [releng_7 tinderbox] failure on ia64/ia64

2011-03-13 Thread Kostik Belousov

On Sun, Mar 13, 2011 at 01:04:05PM +, FreeBSD Tinderbox wrote:
 TB --- 2011-03-13 11:05:21 - tinderbox 2.6 running on freebsd-legacy.sentex.ca
 TB --- 2011-03-13 11:05:21 - starting RELENG_7 tinderbox run for ia64/ia64
 TB --- 2011-03-13 11:05:21 - cleaning the object tree
 TB --- 2011-03-13 11:05:35 - cvsupping the source tree
 TB --- 2011-03-13 11:05:35 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 -s /usr/home/tinderbox/RELENG_7/ia64/ia64/supfile
 TB --- 2011-03-13 11:05:42 - building world
 TB --- 2011-03-13 11:05:42 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-03-13 11:05:42 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-03-13 11:05:42 - TARGET=ia64
 TB --- 2011-03-13 11:05:42 - TARGET_ARCH=ia64
 TB --- 2011-03-13 11:05:42 - TZ=UTC
 TB --- 2011-03-13 11:05:42 - __MAKE_CONF=/dev/null
 TB --- 2011-03-13 11:05:42 - cd /src
 TB --- 2011-03-13 11:05:42 - /usr/bin/make -B buildworld
  World build started on Sun Mar 13 11:05:44 UTC 2011
  Rebuilding the temporary build tree
  stage 1.1: legacy release compatibility shims
  stage 1.2: bootstrap tools
  stage 2.1: cleaning up the object tree
  stage 2.2: rebuilding the object tree
  stage 2.3: build tools
  stage 3: cross tools
  stage 4.1: building includes
  stage 4.2: building libraries
  stage 4.3: make dependencies
  stage 4.4: building everything
  World build completed on Sun Mar 13 12:50:12 UTC 2011
 TB --- 2011-03-13 12:50:12 - generating LINT kernel config
 TB --- 2011-03-13 12:50:12 - cd /src/sys/ia64/conf
 TB --- 2011-03-13 12:50:12 - /usr/bin/make -B LINT
 TB --- 2011-03-13 12:50:12 - building LINT kernel
 TB --- 2011-03-13 12:50:12 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-03-13 12:50:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-03-13 12:50:12 - TARGET=ia64
 TB --- 2011-03-13 12:50:12 - TARGET_ARCH=ia64
 TB --- 2011-03-13 12:50:12 - TZ=UTC
 TB --- 2011-03-13 12:50:12 - __MAKE_CONF=/dev/null
 TB --- 2011-03-13 12:50:12 - cd /src
 TB --- 2011-03-13 12:50:12 - /usr/bin/make -B buildkernel KERNCONF=LINT
  Kernel build for LINT started on Sun Mar 13 12:50:12 UTC 2011
  stage 1: configuring the kernel
  stage 2.1: cleaning up the object tree
  stage 2.2: rebuilding the object tree
  stage 2.3: build tools
  stage 3.1: making dependencies
  stage 3.2: building everything
 [...]
 cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
 -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
 -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc 
  -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
 -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
 -finline-limit=15000 --param inline-unit-growth=100 --param 
 large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
 -mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
 /src/sys/kern/imgact_shell.c
 /src/sys/kern/imgact_shell.c: In function 'exec_shell_imgact':
 /src/sys/kern/imgact_shell.c:238: error: invalid storage class for function 
 'shell_modevent'
 cc1: warnings being treated as errors
 /src/sys/kern/imgact_shell.c:238: warning: no previous prototype for 
 'shell_modevent'
 /src/sys/kern/imgact_shell.c:238: error: initializer element is not constant
 /src/sys/kern/imgact_shell.c:238: error: (near initialization for 
 'shell_mod.evhand')
 /src/sys/kern/imgact_shell.c:238: error: expected declaration or statement at 
 end of input
 *** Error code 1
 
 Stop in /obj/ia64/src/sys/LINT.
 *** Error code 1
 
 Stop in /src.
 *** Error code 1
 
 Stop in /src.
 TB --- 2011-03-13 13:04:05 - WARNING: /usr/bin/make returned exit code  1 
 TB --- 2011-03-13 13:04:05 - ERROR: failed to build lint kernel
 TB --- 2011-03-13 13:04:05 - 5823.39 user 719.80 system 7123.70 real
 
I committed from the wrong tree, sorry. Should be fixed now.



Re: Linker set issues with ath(4) HALs

2011-03-05 Thread Kostik Belousov

On Sat, Mar 05, 2011 at 07:50:05PM +1100, Peter Jeremy wrote:
 I have an Atheros AR5424 and so, based on the 8.2-STABLE i386 NOTES
 and some rummaging in the sources, I tried to build a kernel with:
 
 device		ath			# Atheros pci/cardbus NIC's
 device		ath_ar5212		# HAL for Atheros AR5212 and derived chips
 device		ath_rate_sample		# SampleRate tx rate control for ath
 
 and this died during the kernel linking with:
 linking kernel.debug
 ah.o(.text+0x23c): In function `ath_hal_rfprobe':
 /usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__start_set_ah_rfs'
 ah.o(.text+0x241):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs'
 ah.o(.text+0x25a):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs'
 
 Following a suggestion by a friend, I changed that to:
 
 device		ath			# Atheros pci/cardbus NIC's
 options		AH_SUPPORT_AR5416
 device		ath_hal			# Atheros HAL
 device		ath_rate_sample		# SampleRate tx rate control for ath
 
 and it worked.  Normally, I would leave it at that but I'd like to
 understand what is actually going on...
 
 In both cases, ah.o contains the following 4 references:
  U __start_set_ah_chips
  U __start_set_ah_rfs
  U __stop_set_ah_chips
  U __stop_set_ah_rfs
 generated by:
 /* linker set of registered chips */
 OS_SET_DECLARE(ah_chips, struct ath_hal_chip);
 /* linker set of registered RF backends */
 OS_SET_DECLARE(ah_rfs, struct ath_hal_rf);
 
 These symbols do not appear in any other .o files, though there are a
 variety of other __{start,stop}_set_* symbols - all of which show up
 as 'A' (absolute) values in the final kernel.
 
 My questions are:
 How are these linker set references resolved?  I can't find anything
 that defines these symbols - either in .o files or in ldscript files.
 
 In the first case, there are 2 pairs of undefined linker set variables
 but the linker only reports one pair as unresolved.  Why don't both
 sets show up as resolved or unresolved?  Why does using the generic
 ath_hal, rather than the hardware-specific HAL fix the problem?

The linker synthesizes the symbols provided the following two conditions
are met:
- the symbols are referenced;
- there exists an ELF section named `set_ah_rfs'.
It assigns the (relocated) start of the section to __start_sectionname,
and its end to __stop_sectionname.

Most likely, omitting the option causes some SET_ENTRY() macro, which puts
a symbol into the linker set, to be omitted. Then no section is created,
and the linker does not synthesize the missing symbols.
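The mechanism can be reproduced in user space. Below is a minimal C sketch, assuming GCC or Clang with a GNU-compatible ELF linker (GNU ld or lld). DEMO_SET_ENTRY(), struct demo_chip and the set_demo section are hypothetical stand-ins for the kernel's SET_ENTRY()/OS_SET_DECLARE() machinery, not the ath(4) code. No object file defines __start_set_demo or __stop_set_demo; the linker provides them only because they are referenced and a section named set_demo exists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element type collected into the set. */
struct demo_chip {
	const char *name;
	int id;
};

/*
 * Hypothetical stand-in for SET_ENTRY(): emit a pointer to `sym' into the
 * ELF section "set_demo". Because the section name is a valid C identifier,
 * the linker defines __start_set_demo and __stop_set_demo around it.
 */
#define DEMO_SET_ENTRY(sym) \
	static const void *demo_set_entry_##sym \
	    __attribute__((used, section("set_demo"))) = &(sym)

static const struct demo_chip chip_a = { "ar5212", 5212 };
static const struct demo_chip chip_b = { "ar5416", 5416 };
DEMO_SET_ENTRY(chip_a);
DEMO_SET_ENTRY(chip_b);

/* Synthesized by the linker; never defined in any object file. */
extern const void *__start_set_demo[];
extern const void *__stop_set_demo[];

/* Number of pointers placed into the set by all DEMO_SET_ENTRY() uses. */
size_t
demo_set_count(void)
{
	return ((size_t)(__stop_set_demo - __start_set_demo));
}

/* Walk the set between the two linker-provided bounds. */
const struct demo_chip *
demo_set_find(const char *name)
{
	for (const void **p = __start_set_demo; p < __stop_set_demo; p++) {
		const struct demo_chip *c = *p;

		if (strcmp(c->name, name) == 0)
			return (c);
	}
	return (NULL);
}
```

If both DEMO_SET_ENTRY() lines were compiled out (for instance by an #ifdef tied to a kernel option), set_demo would not exist and the link would fail with undefined references to __start_set_demo and __stop_set_demo, as in the ath(4) build above.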



Re: svn commit: r219178 - head/sys/crypto/aesni

2011-03-02 Thread Kostik Belousov

On Wed, Mar 02, 2011 at 02:56:58PM +, Konstantin Belousov wrote:
 Author: kib
 Date: Wed Mar  2 14:56:58 2011
 New Revision: 219178
 URL: http://svn.freebsd.org/changeset/base/219178
 
 Log:
   Fix a bug in the result of manual assembly.
   
   Reported by:Stefan Grundmann sg2342 googlemail com
   PR: kern/155118
   MFC after:  3 days
This bug should affect only the AES256 variants, causing an incorrect
key schedule calculation. If you have a geli partition with a 256-bit key
that worked with the previous version of aesni(4), the best strategy is
to back up, reinitialize the geli volume with the new driver, and then
restore.

Sorry.


 
 Modified:
   head/sys/crypto/aesni/aeskeys_amd64.S
   head/sys/crypto/aesni/aeskeys_i386.S
 
 Modified: head/sys/crypto/aesni/aeskeys_amd64.S
 ==
 --- head/sys/crypto/aesni/aeskeys_amd64.S Wed Mar  2 14:39:26 2011
 (r219177)
 +++ head/sys/crypto/aesni/aeskeys_amd64.S Wed Mar  2 14:56:58 2011
 (r219178)
 @@ -162,7 +162,7 @@ ENTRY(aesni_set_enckey)
   .byte   0x66,0x0f,0x3a,0xdf,0xc8,0x20
   call_key_expansion_256b
  //   aeskeygenassist $0x40,%xmm2,%xmm1   # round 7
 - .byte   0x66,0x0f,0x3a,0xdf,0xca,0x20
 + .byte   0x66,0x0f,0x3a,0xdf,0xca,0x40
   call_key_expansion_256a
   retq
  .Lenc_key192:
 
 Modified: head/sys/crypto/aesni/aeskeys_i386.S
 ==
 --- head/sys/crypto/aesni/aeskeys_i386.S  Wed Mar  2 14:39:26 2011
 (r219177)
 +++ head/sys/crypto/aesni/aeskeys_i386.S  Wed Mar  2 14:56:58 2011
 (r219178)
 @@ -167,7 +167,7 @@ ENTRY(aesni_set_enckey)
   .byte   0x66,0x0f,0x3a,0xdf,0xc8,0x20
   call_key_expansion_256b
  //   aeskeygenassist $0x40,%xmm2,%xmm1   # round 7
 - .byte   0x66,0x0f,0x3a,0xdf,0xca,0x20
 + .byte   0x66,0x0f,0x3a,0xdf,0xca,0x40
   call_key_expansion_256a
   .cfi_adjust_cfa_offset -4
   leave
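The diff changes only the last byte of the hand-assembled instruction, which is the immediate operand of aeskeygenassist, i.e. the AES round constant. The following C sketch derives the round constants as defined in FIPS-197 and re-encodes the corrected line. The function names are hypothetical, and the encoder covers only the register-to-register form with %xmm0 to %xmm7. It shows why 0x40 is correct: AES-256 key expansion consumes seven round constants, and the seventh is 0x40, not 0x20.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by x (0x02) in GF(2^8), reducing by the AES polynomial 0x11b. */
static uint8_t
xtime(uint8_t b)
{
	return ((uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00)));
}

/* Round constant byte for iteration i (1-based): 0x01, 0x02, 0x04, ... */
uint8_t
aes_rcon(unsigned i)
{
	uint8_t r = 0x01;

	while (i-- > 1)
		r = xtime(r);
	return (r);
}

/*
 * Round constants consumed by key expansion for an nk-word key: one for
 * every word index in [nk, 4 * (nr + 1)) that is a multiple of nk.
 */
unsigned
aes_rcon_count(unsigned nk)
{
	unsigned nr = nk + 6;		/* 10, 12 or 14 rounds */
	unsigned nwords = 4 * (nr + 1);	/* 44, 52 or 60 schedule words */

	return ((nwords - 1) / nk);
}

/*
 * Encode "aeskeygenassist $imm, %xmm<src>, %xmm<dst>" the same way the
 * .byte lines in the diff do: 66 0F 3A DF, ModRM (register form), imm8.
 */
void
encode_aeskeygenassist(unsigned src, unsigned dst, uint8_t imm, uint8_t out[6])
{
	out[0] = 0x66;
	out[1] = 0x0f;
	out[2] = 0x3a;
	out[3] = 0xdf;
	out[4] = (uint8_t)(0xc0 | ((dst & 7) << 3) | (src & 7));
	out[5] = imm;
}
```

encode_aeskeygenassist(2, 1, aes_rcon(7), buf) produces 66 0f 3a df ca 40, the corrected round 7 line. Passing aes_rcon(6) instead reproduces the old 0x20 byte, which repeated the round 6 constant.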



Re: FYI: Userspace DTrace MFC to stable/8

2011-03-01 Thread Kostik Belousov

On Tue, Mar 01, 2011 at 11:03:07AM +0200, Nikolay Denev wrote:
 On 1 Mar, 2011, at 01:33 , Robert Watson wrote:
 
  Dear all:
  
  Just an FYI that I've gone ahead and merged userspace DTrace support to 
  FreeBSD 8.x from 9.x.  While it appeared to pass build tests locally, boot 
  and run, etc, this is a non-trivial merge, and it's possible I've messed 
  up.  If so, apologies in advance, and I'll try to resolve any problems as 
  quickly as I can!
  
  And of course, many thanks go to Rui Paulo, who did the port of userspace 
  DTrace to FreeBSD 9.x with support from the FreeBSD Foundation!
  
  Thanks,
  
  Robert N M Watson
  Computer Laboratory
  University of Cambridge
  
 
 That's great news! Many thanks to all that made this possible!
 
 I have a quick question though, now do I have to rebuild my world with 
 WITH_CTF ?
 I'm asking because I did that by mistake some months ago on a RELENG_8 
 machine, and
 the world that was built had some problems, like gcc giving segfault 11 while 
 compiling world or some ports.
 
It was a known issue that ctfconvert (I think it is ctfconvert) damages
statically linked binaries. Most likely, it has not been fixed yet.



Re: FreeBSD 8.2 Release, ZFS + Samba, running out of memory

2011-02-22 Thread Kostik Belousov

On Tue, Feb 22, 2011 at 10:55:37PM +0100, Henner Heck wrote:
 
 Hello,
 
 i experience freezing of my FreeBSD machine when performing certain
 operations
 on a Samba share.
 
 Technical info:
 - FreeBSD 8.2 Release 64 Bit (it also happened with 8.2 RC3)
 - Samba 3.5.6.1
 - Athlon II Quadcore, 4 GB Ram
 - 1 SSD with a ZFS pool (No.0) containing the FreeBSD system
 - 12x2TB RaidZ2 pool (No.1) for data, created on 12 GEOM eli encrypted
 partitions on 12 disks,
 shared to a Windows 7 PC with Samba,
 8 of the disks are attached to 2 Marvell SATA controllers, 4 to the
 onboard controller
 - ZPool v15, ZFS v4
 
 Scenarios (checked using top):
 
 A:
 When copying files from one directory in pool 1 to another, the free
 memory drops from
 about 3700M to about 200M in the process, but seems to stabilize then.
 
 B:
 When copying the files onto a Windows machine using the Samba share,
 the free memory seems to stabilize at about 100M.
 
 C:
 When computing a hashvalue of files from the share on Windows or doing a
 binary compare to copies of the files stored on the Windows PC (using
 Total Commander),
 the free memory on the FreeBSD machine drops even lower and shortly
 after the BSD system freezes.
 Here is the last top output i got via ssh:
 
 /last pid:  1328;  load averages:  4.53,  2.23,  0.99up 0+00:04:39 
 22:07:50
 263 processes: 43 running, 201 sleeping, 19 waiting
 CPU:  0.9% user,  0.0% nice, 23.1% system,  4.2% interrupt, 71.9% idle
 Mem: 720K Active, 516M Wired, 144K Cache, 320K Buf, *39M Free*
 Swap: 4096M Total, 12M Used, 4084M Free, 3008K In, 5124K Out
 
   PID USERNAME  THR PRI NICE   SIZERES STATE   C   TIME   WCPU COMMAND
11 root4 171 ki31 0K64K RUN 0  15:54 303.61% idle
  1321 root1  520 27812K   704K swread  1   0:24 14.26% smbd
12 root   19 -60- 0K   304K WAIT0   0:21 12.45% intr
16 root1  48- 0K16K psleep  2   0:01  3.76%
 pagedaemon
 3 root1  -8- 0K16K RUN 0   0:06  3.27% g_up
 4 root1  -8- 0K16K -   3   0:05  2.69% g_down
 0 root  108  -80 0K  1712K -   0   1:02  1.86% kernel
 8 root6  -8- 0K88K tx-tx  1   0:00  1.27% zfskern
  1268 root1  44- 0K16K geli:w  1   0:03  0.98%
 g_eli[1] gpt
  1225 root1  45- 0K16K RUN 3   0:02  0.98%
 g_eli[3] gpt
  1267 root1  44- 0K16K geli:w  0   0:02  0.98%
 g_eli[0] gpt
  1237 root1  44- 0K16K RUN 0   0:02  0.88%
 g_eli[0] gpt
  1214 root1  44- 0K16K RUN 2   0:02  0.88%
 g_eli[2] gpt
  1244 root1  44- 0K16K RUN 2   0:02  0.78%
 g_eli[2] gpt
  1243 root1  44- 0K16K RUN 1   0:02  0.78%
 g_eli[1] gpt
  1212 root1  44- 0K16K RUN 0   0:02  0.78%
 g_eli[0] gpt
  1215 root1  44- 0K16K RUN 3   0:02  0.78%
 g_eli[3] gpt
  1213 root1  44- 0K16K RUN 1   0:02  0.78%
 g_eli[1] gpt
  1240 root1  44- 0K16K RUN 3   0:02  0.78%
 g_eli[3] gpt
  1217 root1  44- 0K16K RUN 0   0:02  0.78%
 g_eli[0] gpt
  1242 root1  44- 0K16K RUN 0   0:02  0.68%
 g_eli[0] gpt
  1238 root1  44- 0K16K RUN 1   0:02  0.68%
 g_eli[1] gpt
  1248 root1  44- 0K16K RUN 1   0:02  0.68%
 g_eli[1] gpt
  1252 root1  44- 0K16K RUN 0   0:02  0.68%
 g_eli[0] gpt
  1249 root1  44- 0K16K RUN 2   0:02  0.68%
 g_eli[2] gpt
  1269 root1  44- 0K16K geli:w  2   0:02  0.68%
 g_eli[2] gpt/
 
 It looks like a caching problem to me, but i don't know how to fix it.
 I am also a bit confused, since i don't see an obvious difference
 between scenario B and C.
 I had a similar setup with 5 disks RaidZ1 and Samba running on 8.1 Release,
 and never experienced such a freeze.
 
 Does anyone have advice on how to get rid of this problem?
Try the patch from rev. 218795.

If it does help, we will need an errata notice.



Re: About panic: bufwrite: buffer is not busy???

2011-02-21 Thread Kostik Belousov

On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote:
 On 2/20/2011 9:33 AM, Andrey Smagin wrote:
  On week -current I have same problem, my box paniced every 2-15 min. I 
  resolve problem by next steps - unplug network connectors from 2 intel em 
  (82574L) cards. I think last time that mpd5 related panic, but mpd5 work 
  with another re interface interated on MB. I think it may be em related 
  panic, or em+mpd5.
 
 The latest panic I saw didnt have anything to do with em.  Are you sure
 your crashes are because of the nic drive ?
 
 The latest I saw was on Friday.
 
 # kgdb /usr/obj/usr/src/sys/router/kernel.debug vmcore.11
 (kgdb) bt
 #0  doadump () at pcpu.h:231
 #1  0xc04a51f9 in db_fncall (dummy1=1, dummy2=0, dummy3=-106856,
 dummy4=0xc6b9696c ) at /usr/src/sys/ddb/db_command.c:548
 #2  0xc04a55f1 in db_command (last_cmdp=0xc096f73c, cmd_table=0x0,
 dopager=1) at /usr/src/sys/ddb/db_command.c:445
 #3  0xc04a574a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
 #4  0xc04a764d in db_trap (type=12, code=0) at
 /usr/src/sys/ddb/db_main.c:229
 #5  0xc068ba7e in kdb_trap (type=12, code=0, tf=0xc6b96b94) at
 /usr/src/sys/kern/subr_kdb.c:546
 #6  0xc088056f in trap_fatal (frame=0xc6b96b94, eva=52) at
 /usr/src/sys/i386/i386/trap.c:937
 #7  0xc0880830 in trap_pfault (frame=0xc6b96b94, usermode=0, eva=52) at
 /usr/src/sys/i386/i386/trap.c:859
 #8  0xc0880d4a in trap (frame=0xc6b96b94) at
 /usr/src/sys/i386/i386/trap.c:532
 #9  0xc086716c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
 #10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
 #11 0xc0654ec9 in crcopy (dest=0xce3ee800, src=0xce3ee600) at
 /usr/src/sys/kern/kern_prot.c:1873
 #12 0xc0654fd1 in crcopysafe (p=0xc90cc810, cr=0xce3ee800) at
 /usr/src/sys/kern/kern_prot.c:1950
 #13 0xc0656d7f in seteuid (td=0xc9196b80, uap=0xc6b96cec) at
 /usr/src/sys/kern/kern_prot.c:615
 #14 0xc06985ff in syscallenter (td=0xc9196b80, sa=0xc6b96ce4) at
 /usr/src/sys/kern/subr_trap.c:315
 #15 0xc0880884 in syscall (frame=0xc6b96d28) at
 /usr/src/sys/i386/i386/trap.c:1061
 #16 0xc08671d1 in Xint0x80_syscall () at
 /usr/src/sys/i386/i386/exception.s:264
 #17 0x0033 in ?? ()
 
 (kgdb) frame 10
 #10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
 1248    {
 (kgdb) list
 1243     * Place another refcount on a uidinfo struct.
 1244     */
 1245    void
 1246    uihold(uip)
 1247            struct uidinfo *uip;
 1248    {
 1249
 1250            refcount_acquire(&uip->ui_ref);
 1251    }
 1252
 (kgdb) p *uip
 Cannot access memory at address 0x0
 (kgdb) p uip
 $1 = (struct uidinfo *) 0x0
 (kgdb)
Is this reproducible?
What system version is it?

Could you please go to frame 12 and show the output of p *p and
p *(p->p_ucred)?



Re: minor data-typing error in 8.1 fs/devfs/devfs_vnops.c

2011-02-07 Thread Kostik Belousov

On Mon, Feb 07, 2011 at 12:53:14AM -0800, per...@pluto.rain.com wrote:
 Noticed while digging through devfs_read_f() and devfs_write_f() in
 the course of investigating some unexpected (by me) geom behavior:
 
 ...
 int ioflag, error, resid;
 ...
 resid = uio->uio_resid;
 ...
 if (uio->uio_resid != resid || ...
 
 IOW resid (an int) is being assigned from and compared with
 uio->uio_resid (an ssize_t).
 
 I suppose it's probably harmless on any arch where an (int) is at
 least as large as an (ssize_t), but strictly speaking it does look
 like a bug -- or am I missing something?

The only consequence of resid truncating uio_resid would be a failure
to update the access time for the devfs node, which is probably not a big
issue.

In fact, HEAD cannot generate a request for I/O larger than 4GB anyway.
The type of uio_resid was widened from int to ssize_t so as not to break
the KBI and to ease the intended fix to support full size_t arguments for
read(2)/write(2). That change requires a lot of careful review, and has
thus stalled.

I integrated your fix into the patch, see
http://people.freebsd.org/~kib/misc/uio_resid.4.patch
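To make the consequence concrete, here is a small C sketch of the snapshot-and-compare check. The function names are hypothetical and only the types mirror the report. When ssize_t is wider than int, a request of 2^32 + 5 bytes that transfers exactly 2^32 bytes leaves the truncated snapshot equal to the new residual count, so the int version reports no progress: the missed access-time update described above. Requests that large are not generated by HEAD today.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical model of the check in devfs_read_f()/devfs_write_f():
 * snapshot uio_resid before the I/O, compare after it, and treat any
 * difference as "some data was transferred".
 */

/* Pattern from the report: the snapshot is kept in an int. */
bool
io_progressed_int(ssize_t resid_before, ssize_t resid_after)
{
	/*
	 * Out-of-range conversion to int is implementation-defined;
	 * GCC and Clang reduce the value modulo 2^32.
	 */
	int resid = (int)resid_before;

	return (resid_after != resid);
}

/* Fixed pattern: the snapshot has the same type as uio_resid. */
bool
io_progressed_ssize(ssize_t resid_before, ssize_t resid_after)
{
	ssize_t resid = resid_before;

	return (resid_after != resid);
}
```

On an LP64 machine built with GCC or Clang, io_progressed_int(((ssize_t)1 << 32) + 5, 5) returns false even though 4GB were transferred, while io_progressed_ssize() returns true for the same arguments.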




Re: Xorg in swwrt

2011-02-06 Thread Kostik Belousov

On Sun, Feb 06, 2011 at 03:19:12PM +1030, Daniel O'Connor wrote:
 I updated ports (portmaster -a basically) on this 8.2-PRE box and now I find 
 X takes a long, long time to start up and uses lots of CPU. It shows the 
 wchan as swwrt.
 
 eg..
 last pid: 21791;  load averages:  0.12,  0.29,  0.23 up 
 0+16:09:07  15:16:15
 496 processes: 2 running, 494 sleeping
 CPU:  0.0% user,  0.0% nice, 46.7% system,  0.0% interrupt, 53.3% idle
 Mem: 190M Active, 33M Inact, 3217M Wired, 198M Cache, 15M Buf, 171M Free
 Swap: 4096M Total, 621M Used, 3475M Free, 15% Inuse, 212K Out
 
   PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME   WCPU COMMAND
 21787 fiona 1  760   168M   134M swwrt   0   0:04 32.37% Xorg
swwrt means the thread is waiting for a synchronous swap-out to finish.
This is consistent with top showing a non-trivial amount of swap space
in use and a swap-out happening right now.

Look at the working set of the application you are starting.
Another thing that stands out is the huge wired count.
 21788 darius1  760 31860K  4868K pause   1   0:00  1.17% zsh
  2081 darius4  440   113M 11620K ucond   1   9:45  0.10% python2.6
   656 root  1  440 24392K  1096K select  1   3:44  0.00% ppp
  1881 darius   32  520   135M  8804K uwait   0   2:24  0.00% python2.6
 
 Does anyone else see this?
 If it matters I am using the xf86-video-ati driver
 (II) RADEON(0): vgaHWGetIOBase: hwp->IOBase is 0x03d0, hwp->PIOOffset is 0x
 (==) RADEON(0): RGB weight 888
 (II) RADEON(0): Using 8 bits per RGB (8 bit DAC)
 (--) RADEON(0): Chipset: ATI Radeon HD 4200 (ChipID = 0x9710)
 (--) RADEON(0): Linear framebuffer at 0xd800
 (II) RADEON(0): PCI card detected
 
 [snip]
 
 (II) RADEON(0):   MC_AGP_LOCATION  : 0x003f
 (II) RADEON(0): Depth moves disabled by default
 (II) RADEON(0): Allocating from a screen of 131008 kb
 (II) RADEON(0): Will use 32 kb for hardware cursor 0 at offset 0x00b7c000
 (II) RADEON(0): Will use 32 kb for hardware cursor 1 at offset 0x00b8
 (II) RADEON(0): Will use 11760 kb for front buffer at offset 0x
 (II) RADEON(0): Will use 64 kb for PCI GART at offset 0x07ff
 (II) RADEON(0): Will use 11760 kb for back buffer at offset 0x00b84000
 (II) RADEON(0): Will use 11760 kb for depth buffer at offset 0x0170
 (II) RADEON(0): Will use 47616 kb for textures at offset 0x0227c000
 (II) RADEON(0): Will use 48080 kb for X Server offscreen at offset 0x050fc000
 drmOpenDevice: node name is /dev/dri/card0
 drmOpenDevice: open result is 10, (OK)
 drmOpenDevice: node name is /dev/dri/card0
 drmOpenDevice: open result is 10, (OK)
 drmOpenByBusid: Searching for BusID pci::01:05.0
 drmOpenDevice: node name is /dev/dri/card0
 drmOpenDevice: open result is 10, (OK)
 drmOpenByBusid: drmOpenMinor returns 10
 drmOpenByBusid: drmGetBusid reports pci::01:05.0
 (II) [drm] DRM interface version 1.2
 (II) [drm] DRM open master succeeded.
 (II) RADEON(0): [drm] Using the DRM lock SAREA also for drawables.
 (II) RADEON(0): [drm] framebuffer handle = 0xd800
 (II) RADEON(0): [drm] added 1 reserved context for kernel
 (II) RADEON(0): X context handle = 0x3
 (II) RADEON(0): [drm] installed DRM signal handler
 [in swwrt]
 
 Does anyone else see this?
 
 Thanks.
 
 --
 Daniel O'Connor software and network engineer
 for Genesis Software - http://www.gsoft.com.au
 The nice thing about standards is that there
 are so many of them to choose from.
   -- Andrew Tanenbaum
 GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
 
 
 
 
 
 
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org



Re: tmpfs is zero bytes (no free space), maybe a zfs bug?

2011-01-30 Thread Kostik Belousov

On Wed, Jan 19, 2011 at 05:27:38PM +0100, Ivan Voras wrote:
 On 19 January 2011 16:02, Kostik Belousov kostik...@gmail.com wrote:
 
  http://people.freebsd.org/~ivoras/diffs/tmpfs.h.patch
 
  I don't think this is a complete solution but it's a start. If you can,
  try it and see if it helps.
  This is not a start; it is actually a step in the wrong direction.
  Tmpfs is wrong now, but the patch would make the wrongness even bigger.
 
  The issue is that the tmpfs calculation should not depend on the
  length of the inactive queue or the amount of free pages. These numbers
  only measure the pressure on the pagedaemon, and have absolutely no relation
  to the amount of data that can be put into anonymous objects before the
  system runs out of swap.
 
  The vm_lowmem handler is invoked in two situations:
  - when KVA cannot satisfy a request for space allocation;
  - when the pagedaemon has to start a scan.
  Neither situation has any direct correlation with what tmpfs needs
  to check, that is, "Is there enough swap to keep all my future
  anonymous memory requests?".
 
  Perhaps the swap reservation numbers could be useful for tmpfs reporting.
  Perhaps also tmpfs should reserve swap explicitly at mount time, instead
  of trying to guess how much can be allocated at a random moment.
 
 Thank you for your explanation! I'm still not very familiar with VM
 and VFS. Could you also read my report at
 http://www.mail-archive.com/freebsd-current@freebsd.org/msg126491.html
 ? I'm curious about the fact that there is lots of 'free' memory here
 in the same situation.
This is another ugliness of the dynamic calculation. Your wired count is
around 15GB, which is always greater than available swap + free + inactive.
As a result, tmpfs_mem_info() always returns 0.
In this situation TMPFS_PAGES_MAX() seems to return a negative value, and
then TMPFS_PAGES_AVAIL() clamps it at 0.
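The clamping can be illustrated with a simplified model. This C sketch follows the calculation as described in this reply; it is not the code from tmpfs.h. The struct, the function names and the idea of subtracting a fixed count of pages already charged to the mount are assumptions made for illustration. All counts are in pages.

```c
#include <stdint.h>

/* Hypothetical snapshot of VM counters, in pages. */
struct demo_vmstats {
	int64_t swap_avail;
	int64_t free_count;
	int64_t inactive_count;
	int64_t wire_count;
};

/* Model of the dynamic estimate: swap + free + inactive - wired, never < 0. */
int64_t
demo_mem_info(const struct demo_vmstats *s)
{
	int64_t size = s->swap_avail + s->free_count + s->inactive_count;

	return (size > s->wire_count ? size - s->wire_count : 0);
}

/*
 * Model of the space reported for the mount: the estimate minus pages
 * already charged to it. Once the estimate is 0 this goes negative,
 * and it is then clamped to 0.
 */
int64_t
demo_pages_avail(const struct demo_vmstats *s, int64_t charged_pages)
{
	int64_t max = demo_mem_info(s) - charged_pages;

	return (max > 0 ? max : 0);
}
```

With 4KB pages, roughly 15GB wired is 3932160 pages. If that exceeds swap + free + inactive, both functions return 0, so the mount reports no space even though free_count is non-zero.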

 
 Do you think that there is something which can be done as a band-aid
 without a major modification to tmpfs?



Re: Living on gmirror: need to reincarnate /etc/rc.early

2011-01-25 Thread Kostik Belousov

On Tue, Jan 25, 2011 at 01:20:53PM +0600, Eugene Grosbein wrote:
 Hi!
 
 In RELENG_8, gmirror is good enough to keep whole HDD pair within the mirror.
 Its performance, stability and pretty ease of maintenance allows
 to use it widely.
 
 With wide deployment of gmirror in production I've faced inability
 of RELENG_8 to store kernel crashdumps out-of-the-box.
 gmirror manual page documents a way to setup FreeBSD so that
 it would store crashdumps again but that way involves /etc/rc.early
 removed from RELENG_8. I've read about intentions - it was unsafe etc.
 But we still need working crashdump support.
 
 Easiest way is to reincarnate /etc/rc.d/early support making it better and 
 safer
 and it should support gmirror's mechanics for crashdumps out-of-the-box.
 
 Comments?
Yes, I have had this change for ages, in fact since the moment rc.early
was booted out.

diff --git a/etc/rc.d/Makefile b/etc/rc.d/Makefile
index 6f80b87..7981ce0 100755
--- a/etc/rc.d/Makefile
+++ b/etc/rc.d/Makefile
@@ -9,7 +9,7 @@ FILES=  DAEMON FILESYSTEMS LOGIN NETWORKING SERVERS \
ccd cleanvar cleartmp cron \
ddb defaultroute devd devfs dhclient \
dmesg dumpon \
-   encswap \
+   early encswap \
faith fsck ftp-proxy ftpd \
gbde geli geli2 gptboot gssd \
hastd hcsecd \
diff --git a/etc/rc.d/early b/etc/rc.d/early
new file mode 100755
index 000..8a863d0
--- /dev/null
+++ b/etc/rc.d/early
@@ -0,0 +1,29 @@
+#!/bin/sh
+#
+# $FreeBSD$
+#
+
+# PROVIDE: early
+# REQUIRE: disks localswap
+# BEFORE:  fsck
+
+#
+# Support for legacy /etc/rc.early script
+#
+. /etc/rc.subr
+
+name=early
+start_cmd=early_start
+stop_cmd=:
+
+early_start()
+{
+   if [ -r /etc/rc.early ]; then
+   echo -n 'Executing rc.early script:'
+   . /etc/rc.early
+   echo '.'
+   fi
+}
+
+load_rc_config $name
+run_rc_command $1



Re: Living on gmirror: need to reincarnate /etc/rc.early

2011-01-25 Thread Kostik Belousov

On Tue, Jan 25, 2011 at 11:30:06AM -0800, Doug Barton wrote:
 On 01/24/2011 23:20, Eugene Grosbein wrote:
 Hi!
 
 In RELENG_8, gmirror is good enough to keep whole HDD pair within the
 mirror.
 Its performance, stability and pretty ease of maintenance allows
 to use it widely.
 
 With wide deployment of gmirror in production I've faced inability
 of RELENG_8 to store kernel crashdumps out-of-the-box.
 gmirror manual page documents a way to setup FreeBSD so that
 it would store crashdumps again but that way involves /etc/rc.early
 removed from RELENG_8. I've read about intentions - it was unsafe etc.
 But we still need working crashdump support.
 
 Easiest way is to reincarnate /etc/rc.d/early support making it better and 
 safer
 and it should support gmirror's mechanics for crashdumps out-of-the-box.
 
 I'll tell you the same thing I told Kostik way back when I removed it. 
 This is the only thing that anyone has ever suggested a use for in 
 /etc/rc.early, and the solution in the man page is a hack. :)
 
 If this is something that is necessary to do then I'd prefer to do it 
 properly and add an /etc/rc.d/gmirror that runs in the proper (early) 
 position, and then figure out the proper location in rc.d to handle the 
 second half of the configuration.
 
No, my use for rc.early is different. I use it to load modules
before filesystems are mounted.

 I'm happy to review patches.  :)
 
 
 Doug
 
 -- 
 
   Nothin' ever doesn't change, but nothin' changes much.
   -- OK Go
 
   Breadth of IT experience, and depth of knowledge in the DNS.
   Yours for the right price.  :)  http://SupersetSolutions.com/
 



Re: Living on gmirror: need to reincarnate /etc/rc.early

2011-01-25 Thread Kostik Belousov

On Tue, Jan 25, 2011 at 12:30:37PM -0800, Doug Barton wrote:
 On 01/25/2011 12:28, Kostik Belousov wrote:
 No, my use for rc.early is different. I use it to load modules
 before filesystems are mounted.
 
 Ok, I'll bite ... what is deficient about doing this in /boot/loader.conf?
The fact that, with a failing driver, I can still reliably get to
single-user mode with boot -s, without doing loader command-line magic
under stress. Or not having to describe that magic over the phone to
somebody who would prefer to play^H^H^H do something else instead.



Re: tmpfs is zero bytes (no free space), maybe a zfs bug?

2011-01-19 Thread Kostik Belousov

On Wed, Jan 19, 2011 at 11:39:41AM +0100, Ivan Voras wrote:
 On 19/01/2011 11:09, Attila Nagy wrote:
 On 01/19/11 09:46, Jeremy Chadwick wrote:
 On Wed, Jan 19, 2011 at 09:37:35AM +0100, Attila Nagy wrote:
 I first noticed this problem on machines with more memory (32GB
 eg.), but now it happens on 4G machines too:
 tmpfs 0B 0B 0B
 100% /tmp
 FreeBSD builder 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Sat Jan 8
 22:11:54 CET 2011
 
 Maybe it's related, that I use zfs on these machines...
 
 Sometimes it grows and shrinks, but generally there is no space even
 for a small file, or a socket to create.
 http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/060867.html
 
 
 Oh crap. :(
 
 I hope somebody can find the time to look into this, it's pretty
 annoying...
 
 http://people.freebsd.org/~ivoras/diffs/tmpfs.h.patch
 
 I don't think this is a complete solution but it's a start. If you can, 
 try it and see if it helps.
This is not a start; it is actually a step in the wrong direction.
Tmpfs is wrong now, but the patch would make the wrongness even bigger.

The issue is that the tmpfs calculation should not depend on the
length of the inactive queue or the amount of free pages. These numbers
only measure the pressure on the pagedaemon, and have absolutely no relation
to the amount of data that can be put into anonymous objects before the
system runs out of swap.

The vm_lowmem handler is invoked in two situations:
- when KVA cannot satisfy a request for space allocation;
- when the pagedaemon has to start a scan.
Neither situation has any direct correlation with what tmpfs needs
to check, that is, "Is there enough swap to keep all my future
anonymous memory requests?".

Perhaps the swap reservation numbers could be useful for tmpfs reporting.
Perhaps also tmpfs should reserve swap explicitly at mount time, instead
of trying to guess how much can be allocated at a random moment.
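A sketch of the explicit-reservation idea, under stated assumptions: all names are hypothetical, swap_free stands in for whatever the VM system would report at mount time, and there is no locking. The available space is derived only from the mount's own reservation, so it no longer moves with the lengths of the free and inactive queues.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-mount accounting, in pages. */
struct demo_tmpfs_resv {
	uint64_t reserved;	/* swap reserved once, at mount time */
	uint64_t used;		/* pages currently backing file data */
};

/* Reserve `want' pages up front; fail the mount if swap cannot cover it. */
bool
demo_tmpfs_mount(struct demo_tmpfs_resv *r, uint64_t want, uint64_t swap_free)
{
	if (want > swap_free)
		return (false);
	r->reserved = want;
	r->used = 0;
	return (true);
}

/* Charge an allocation against the reservation only. */
bool
demo_tmpfs_alloc(struct demo_tmpfs_resv *r, uint64_t npages)
{
	if (npages > r->reserved - r->used)
		return (false);
	r->used += npages;
	return (true);
}

/* Return pages to the reservation (never below zero). */
void
demo_tmpfs_release(struct demo_tmpfs_resv *r, uint64_t npages)
{
	r->used -= (npages <= r->used) ? npages : r->used;
}

/* Space this model would report: independent of pagedaemon pressure. */
uint64_t
demo_tmpfs_avail(const struct demo_tmpfs_resv *r)
{
	return (r->reserved - r->used);
}
```

In this design the failure moves to mount time: a mount that cannot be backed by swap is refused up front, instead of later reporting zero free space.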



Re: 8.2-PRERELEASE: live deadlock, almost all processes in pfault state

2011-01-08 Thread Kostik Belousov

On Sat, Jan 08, 2011 at 09:44:57PM +0300, Lev Serebryakov wrote:
 Hello, Freebsd-stable.
 
  I've added `transmission' BitTorrent client to my home server and now
 it deadlocks easily (after about 1 hour of intensive download and
 seeding). This server is upgraded from 7.x and last time I've run
 transmission on 7.x system without any problems.
 
  I have home partition on geom_raid5 device, so I can not exclude this
 third-party module from experiments.
 
  My home filesystem has 32KiB block and all other filesystems (/, /var,
 /tmp, /usr) has standard 16KiB block sizes. I know, that 7.x system
 had (has?) deadlock when 16KiB and 64KiB file systems are mixed up on
 one system, but I never experienced deadlocks with 16KiB and 32KiB
 mixture.
 
  All filesystems (Except root) is SU, but no gjournal so SU_J patch
 are in use.
 
  Same BitTorrent client on same filesystem, but accessed via NFS (from
 other host), doesn't cause deadlock and works rock-stable for days.
 
  I've built kernel with all debug options, waited for deadlock and
 collect all information, mentioned in Developer's Handbook / Debugging
 Deadlocks.
 
  Capture from debug session is attached, together with kernel config
 and dmesg from rebooting.
 
  As I can easily reproduce this deadlock, I could provide any
 additional information from kernel debugger, if needed.
 
 System:   FreeBSD 8.2-PRERELEASE
 cvsup:2011-01-08 00:41:24 MSK (GTM+3) time
 Platform: amd64
There is a weird backtrace for pid 20. What is g_raid5?

If I am guessing right, this creature has the classic deadlock where
bio processing requires memory allocation. It seems that tid 100079
is sleeping not even due to a free page shortage, but due to address
space exhaustion. As a result, read/write requests are stalled.

Then the syncer is blocked waiting for some physical buffer (look at tid
100075) while owning the vnode lock. Other processes also wait for the
locked buffers, etc.

So my belief is that this is plain I/O loss in the driver (g_raid5,
whatever it is). Try the same load without it.



Re: 8.2-PRERELEASE: live deadlock, almost all processes in pfault state

2011-01-08 Thread Kostik Belousov

On Sat, Jan 08, 2011 at 10:29:09PM +0300, Lev Serebryakov wrote:
 Hello, Kostik.
 You wrote on 8 January 2011, 22:02:32:
 
 
  There is some weird backtrace at the pid 20, what is g_raid5 ?
   It is geom_raid5, with two threads -- working one and one for
  processing finished bios.
 
  If I am guessing right, this creature has a classic deadlock when 
  bio processing requires memory allocation. It seems that tid 100079
   tid 100079 sleep in waiting for some data in queue.
 
  is sleeping not even due to the free page shortage, but due to address
  space exhaustion. As result, read/write requests are stalled.
   tid 100078 sleep in malloc(). But geom_raid5 never ever allocate
  more than 128MiB of memory and it is 64bit system with huge amount of
  kmem_size/kmem_size_max...
 
   How could I explore allocation (like vmstat -m) from kdb to be sure,
 it doesn't allocated more?
Use show uma and show malloc from ddb.

 
   And, if it is classic deadlock is here any classical solution to
 it?
Do not allocate during bio processing.
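One way to follow that advice, sketched in user-space C under stated assumptions: buffers are allocated once when the class attaches. After that, the I/O path only takes buffers from and returns them to a free list. An empty pool means the request stays queued until a completion returns a buffer; nothing sleeps in an allocator. The names are hypothetical, not geom_raid5 code, and a real GEOM class would also need a mutex around the free list.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed pool of equal-sized buffers. */
struct buf_pool {
	void	*mem;		/* one backing chunk, allocated at attach */
	void	**freelist;	/* stack of currently free buffers */
	size_t	 nfree;
	size_t	 nbufs;
	size_t	 bufsize;
};

/* Attach time: the only place that calls malloc(). */
int
pool_init(struct buf_pool *p, size_t nbufs, size_t bufsize)
{
	p->mem = malloc(nbufs * bufsize);
	p->freelist = malloc(nbufs * sizeof(void *));
	if (p->mem == NULL || p->freelist == NULL) {
		free(p->mem);
		free(p->freelist);
		return (-1);
	}
	for (size_t i = 0; i < nbufs; i++)
		p->freelist[i] = (char *)p->mem + i * bufsize;
	p->nfree = p->nbufs = nbufs;
	p->bufsize = bufsize;
	return (0);
}

/* I/O path: never allocates; NULL tells the caller to keep the bio queued. */
void *
pool_get(struct buf_pool *p)
{
	return (p->nfree > 0 ? p->freelist[--p->nfree] : NULL);
}

/* Completion path: hand the buffer back so a queued request can proceed. */
void
pool_put(struct buf_pool *p, void *buf)
{
	p->freelist[p->nfree++] = buf;
}

/* Detach time. */
void
pool_destroy(struct buf_pool *p)
{
	free(p->freelist);
	free(p->mem);
	p->freelist = NULL;
	p->mem = NULL;
	p->nfree = p->nbufs = 0;
}
```

Because pool_get() never waits for memory, a pageout that has to pass through the class cannot block on the very page shortage it is trying to relieve.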

 
   Really, I'm maintainer of geom_raid5 now, so I need fix this
 deadlock, but I don't really understand, why does it occur? I've
 hit panic with kernel memory exhausted symptoms when module allocate
 too much, but not deadlock :(
Hm, I missed the kmem_back() in the stack. Yes, the thread is waiting for page
allocation.

 
  Then, syncer is blocked waiting for some physical buffer (look at tid
  100075), owning the vnode lock. Other processes also wait for the
  locked buffers, etc.
 
  So my belief is that this is plain driver (g_raid5, whatever is it)
  i/o loss. Try the same load without it.
   I can not, because all data is on this GEOM :)
 
 -- 
 // Black Lion AKA Lev Serebryakov l...@serebryakov.spb.ru
 



Re: 8.2-PRERELEASE: live deadlock, almost all processes in pfault state

2011-01-08 Thread Kostik Belousov

On Sat, Jan 08, 2011 at 11:10:21PM +0300, Lev Serebryakov wrote:
 Hello, Kostik.
 You wrote on 8 January 2011, 22:56:13:
 
 
And, if it is classic deadlock is here any classical solution to
  it?
  Do not allocate during bio processing.
  So, if GEOM need some cache, it needs pre-allocate it and implements
 custom allocator over allocated chunk? :(
 
  And what is bio processing in this context? geom_raid5 puts all
bio processing == the whole time needed to finish the pageout. Pageout is
often performed to clean pages in order to reduce the page shortage.
If the pageout requires more free pages to finish during the shortage,
then we get the deadlock.

Also, it seems that you allocate not only bios (small objects; not every
request causes a page allocation), but also huge buffers, which require
free pages each time.

 bios into the (private, internal) queue and geom_start() exits
 immediately, and bio could spend rather long time in queue (if it is
 write request) before it will be sent to underlying provider. And,
 yes, it could be combined with other bios to form new one (why
 allocation of new bio is needed).
 
  So, is bio processing a whole time before bio is complete, or only
 geom_start() call or what?
 
  Also, RAID5 needs to read data (other stripes) and write data (new
 checksum) when write bio is processed. BTW, system geom_raid3 and
 geom_vinum (with raid5 volume) need to do the same to maintain
 checksums, so they could deadlock (in theory) too, if problem is
 allocate memory during bio processing. And geom_mirror needs
 allocate bio for second (third, ...) component on every write...
 
 -- 
 // Black Lion AKA Lev Serebryakov l...@freebsd.org
 



Re: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.

2011-01-07 Thread Kostik Belousov

On Fri, Jan 07, 2011 at 02:37:25PM -0500, Rick Macklem wrote:
  Hi,
  
  OpenOffice hangs on NFS when I try to save a file or even when I try
  to
  open the save dialog in this case.
  
  
  $ 17:25:35 ron...@ronald [~]
  procstat -kk 85575
  PID TID COMM TDNAME KSTACK
  85575 100322 soffice.bin initial thread mi_switch+0x176
  sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39
  VOP_LOCK1_APV+0x46
  _vn_lock+0x44 vget+0x67 vfs_hash_get+0xeb nfs_nget+0xa8
  nfs_lookup+0x65e
  VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 kern_statat_vnhook+0x82
  kern_statat+0x15 lstat+0x22 syscallenter+0x186 syscall+0x40
  85575 100502 soffice.bin - mi_switch+0x176
  sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
  do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
  syscall+0x40
  Xfast_syscall+0xe2
  85575 100576 soffice.bin - mi_switch+0x176
  sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
  do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
  syscall+0x40
  Xfast_syscall+0xe2
  85575 100577 soffice.bin - mi_switch+0x176
  sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d
  kern_accept+0x19c accept+0xfe syscallenter+0x186 syscall+0x40
  Xfast_syscall+0xe2
  85575 100578 soffice.bin - mi_switch+0x176
  sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _cv_wait_sig+0x10e
  seltdwait+0xed poll+0x457 syscallenter+0x186 syscall+0x40
  Xfast_syscall+0xe2
  85575 100579 soffice.bin - mi_switch+0x176
  sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12
  _cv_timedwait_sig+0x11d seltdwait+0x79 poll+0x457 syscallenter+0x186
  syscall+0x40 Xfast_syscall+0xe2
  
  $ 17:25:35 ron...@ronald [~]
  uname -a
  FreeBSD ronald.office.base.nl 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #6:
  Mon Dec 27 23:49:30 CET 2010
  r...@ronald.office.base.nl:/usr/obj/usr/src/sys/GENERIC amd64
  
 I think all the above tells us is that the thread is waiting for
 a vnode lock. The question then becomes what is holding a lock
 on that vnode, and why?
 
  It is not possible to exit or kill soffice.bin. I had a slightly
  different procstat stack before, but that was fixed a couple of days ago.
 
 Yeah, it will be in an uninterruptible sleep when waiting for a vnode lock.
 
  Any thoughts? Enabling local locks in NFS doesn't fix it.
 
 Here's some things you could try:
 1 - apply the attached patch. It fixes a known problem w.r.t. the
 client side of the krpc. Not likely to fix this, but I can hope:-)
1a - Look around for other processes in the uninterruptible sleep state;
quite possibly, one of them owns the lock that OpenOffice is waiting
for. Also see
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Of particular interest are the witness output and the backtraces of
all threads that witness reports as owning vnode locks.

 2 - If #1 doesn't fix the problem:
 - before making it hang, start capturing packets via:
 # tcpdump -s 0 -w xxx host server
 - then make it hang, kill the above and
 # procstat -ka
 # ps axHlww
 and capture the output of both of these. Hopefully these 2 commands
 will indicate what is holding the vnode lock and maybe, why. The
 xxx file can be looked at in wireshark to see what/if any NFS
 traffic is happening.
 If you aren't comfortable looking at the above, you can email them
 to me and I'll take a stab at them someday.
 3 - Try the experimental client to see if it behaves differently. The
 mount command is:
 # mount -t newnfs -o nfsv3,the options you already use server:/path /mntpath
 (This might identify whether the regular client has an infrequently executed
  code path that forgets to unlock the vnode, since the experimental client
  uses a somewhat different RPC layer. The buffer cache handling etc. are
  almost the same, but the RPC stuff is fairly different.)
 
  The nfs server is an up-to-date Linux Debian 5 with kernel 2.6.26.
  
 I'm afraid I can't blame Linux (at least not until we have more info;-).
 
  If more info is needed, I can easily reproduce this.
 
 See above #2.
 
 Good luck with it and let us know how it goes, rick
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org



Re: FreeBSD 8.2-PRERELEASE hangs under load with live kernel

2011-01-06 Thread Kostik Belousov

On Thu, Jan 06, 2011 at 01:31:45PM +0300, Lev Serebryakov wrote:
 Hello, Freebsd-stable.
 
  I've added a torrent client (transmission) to the software on my home
  server, and it started to hang in a very unusual way: the kernel works
  but userland doesn't.
 
 I can ping it (and it answers). I can scroll the console with the
  Scroll Lock button and keys. I can break into the debugger with
  Ctrl+SysReq, and it shows that one CPU is occupied by the idle process
  and the other by the Giant taskq, but no userland processes answer: I
  cannot ssh to it, I cannot log in on the console, samba is dead, etc.
 
 ps in the kernel debugger shows that many processes are in the pfault
  state, and nothing more special.
 
 memtest86+ doesn't show any errors after 8 passes of tests (about
  10 hours), so RAM looks OK.
 
 What should I do in kdb to understand what is happening?
 
 Kernel config and /var/run/dmesg.boot are attached.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html



Re: RFC: Upgrade BIND version in RELENG_7 to BIND 9.6.x

2010-12-18 Thread Kostik Belousov

On Fri, Dec 17, 2010 at 09:41:54PM -0800, Doug Barton wrote:
 
 Howdy,
 
 Traditionally for contributed software generally, and BIND in particular
 we have tried to keep the major version of the contributed software
 consistent throughout a given RELENG_$N branch of FreeBSD. Hopefully the
 reasoning for this is obvious: we want to avoid POLA violations.
Actually not. My own POV is that we should follow the vendor release
cycle, and not the FreeBSD release cycle, for the contributed software.

I do not advocate an immediate upgrade of third-party software that has
reached its EOL, but I think that we should do it without pushback
if the maintainer considers the upgrade necessary.



Re: Following vendor release cycle (Was: Re: RFC: Upgrade BIND version in RELENG_7 to BIND 9.6.x)

2010-12-18 Thread Kostik Belousov

On Sat, Dec 18, 2010 at 03:07:11PM -0800, Doug Barton wrote:
 On 12/18/2010 03:15, Kostik Belousov wrote:
 On Fri, Dec 17, 2010 at 09:41:54PM -0800, Doug Barton wrote:
 Howdy,
 
 Traditionally for contributed software generally, and BIND in particular
 we have tried to keep the major version of the contributed software
 consistent throughout a given RELENG_$N branch of FreeBSD. Hopefully the
 reasoning for this is obvious: we want to avoid POLA violations.
 Actually not. My own POV is that we should follow the vendor release
 cycle, and not the FreeBSD release cycle, for the contributed software.
 
 I do not advocate an immediate upgrade of third-party software that has
 reached its EOL, but I think that we should do it without pushback
 if the maintainer considers the upgrade necessary.
 
 Just to be clear, there were considerable discussions over a long
 period of time between myself, the release engineers, and the
 security-officer team regarding the subject of BIND 9.3 in RELENG_6. I 
 was given the green light to upgrade if I felt it was necessary (as 
 you're suggesting here) but the final decision to live with the status 
 quo was mine, and I accept responsibility for it.
 
 My reasoning was as follows:
 
 1. All the latest versions of BIND are available in ports, and I made 
 sure that they worked in RELENG_6 so that users who wanted to stay at 
 that OS level but had more serious DNS needs had an easy path.
 
 2. Because BIND 9.3 lacked the ability to do modern DNSSEC, anyone who
 wanted that feature would have to upgrade anyway.
 
 3. BIND 9.3 was still suitable for the (primary) stated purpose of BIND 
 in the base, a basic local resolving name server.
 
 4. BIND 9.3 was different enough that users migrating from it to more 
 modern versions were experiencing problems.
 
 5. Users were naturally migrating to RELENG_[78] at a pace which 
 minimized the impact of the issue.
 
 If any of those things had stopped being true my decision would have 
 been different, but as it was I chose to grin and bear it in order to 
 avoid the POLA violation for any users who were actually using BIND 9.3 
 in RELENG_6. However, the circumstances for BIND 9.4 and RELENG_7 are 
 different, and much more amenable to the upgrade, which is why I'm 
 proposing it.

I do not question your decision to upgrade or to leave the legacy version
of BIND in the legacy branch of FreeBSD src. I only noted that my personal
POV is that we develop the OS, and are not the vendor of the third-party
software, in this case BIND. As such, I think that following the
vendor life cycle for contrib is less resource-intensive for the project,
and should be the default.

If anybody who does the real work feels that it is interesting/nice for
the users/generally better to spend the time necessary to keep the
upgrade path on the branch smoother, I am fine with this.



Re: vm.swap_reserved toooooo large?

2010-12-15 Thread Kostik Belousov

On Wed, Dec 15, 2010 at 03:43:56PM +0200, George Mamalakis wrote:
 On 15/12/2010 13:26, Trond Endrestøl wrote:
 On Wed, 15 Dec 2010 13:04+0200, George Mamalakis wrote:
 
 I was testing a program that would exhaust all my memory (in C++),
 and when this would happen, it would call set_new_handler() along
 with one of my functions that would inform the user about the lack
 of memory and then it would exit the program. Instead, the program
 was force-killed by the kernel (signal 9) and I was informed that:
 If all your process' memory is exhausted, then there is no memory left
 for the runtime system for doing I/O and the other stuff you want.
 Next, unless I'm on drugs, maybe you should call set_new_handler()
 before you actually run out of memory. Just my $0.02.
 
 
 Trond.
 
 
 
 Trond,
 
 My problem was not that the program was force-killed, my problem was 
 that the system reserved 500G+ of swap, even though the total size is 4G.

Read tuning(7), in particular the description of the vm.overcommit sysctl.
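To illustrate the rule tuning(7) describes, here is a minimal C sketch. It is a simplified model, not the kernel's swap accounting code: OC_STRICT_SWAP, OC_RLIMIT_SWAP, would_refuse() and read_int_sysctl() are hypothetical names, and only bit 0 of vm.overcommit is actually modeled. With bit 0 clear (the default), vm.swap_reserved may grow past vm.swap_total, which is how a machine with 4G of swap can report 500G+ reserved; with bit 0 set, the allocation that would cross vm.swap_total fails instead, so, presumably, operator new would see the failure and the set_new_handler() handler could run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Bits of vm.overcommit as described in tuning(7); the names are hypothetical. */
#define OC_STRICT_SWAP	0x1	/* bit 0: refuse reservations beyond vm.swap_total */
#define OC_RLIMIT_SWAP	0x2	/* bit 1: enforce RLIMIT_SWAP (root exempt); not modeled */

/*
 * Simplified model: would a request for 'incr' bytes of anonymous memory
 * be refused, given the current vm.swap_reserved and vm.swap_total?
 */
static bool
would_refuse(int overcommit, uint64_t reserved, uint64_t total, uint64_t incr)
{
	assert(incr <= UINT64_MAX - reserved);	/* avoid wrap-around */
	if ((overcommit & OC_STRICT_SWAP) == 0)
		return (false);	/* default: reservations may exceed the total */
	return (reserved + incr > total);
}

#ifdef __FreeBSD__
/* Fetch an int sysctl such as "vm.overcommit"; returns 0 on success. */
static int
read_int_sysctl(const char *name, int *val)
{
	size_t len = sizeof(*val);

	return (sysctlbyname(name, val, &len, NULL, 0));
}
#endif
```

On FreeBSD, read_int_sysctl("vm.overcommit", &oc) could supply the first argument; vm.swap_reserved and vm.swap_total would need buffers of their own sysctl types.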



Re: cryptodev cipher registration (aesni and padlock)

2010-12-13 Thread Kostik Belousov

On Mon, Dec 13, 2010 at 10:27:00AM -0500, Mike Tancsa wrote:
 While doing some testing with the aesni driver, it seems some ciphers are 
 registered with openssl and some are not.
 
 e.g. if I start an ssh session using aes128, I see the following
 
 [pyroxene]% ssh -c aes128-cbc smarthost1 cryptostats | grep sym
 654198 symmetric crypto ops (0 errors, 0 times driver blocked)
 [pyroxene]% ssh -c aes128-cbc smarthost1 cryptostats | grep sym
 654225 symmetric crypto ops (0 errors, 0 times driver blocked)
 [pyroxene]% 
 
 i.e. it shows the hardware transformation count increasing. But if I use
 AES-192 or AES-256, it does not:
 
 [pyroxene]% ssh -c aes256-cbc smarthost1 cryptostats | grep sym
 654231 symmetric crypto ops (0 errors, 0 times driver blocked)
 [pyroxene]% ssh -c aes192-cbc smarthost1 cryptostats | grep sym
 654231 symmetric crypto ops (0 errors, 0 times driver blocked)
 [pyroxene]% ssh -c aes192-cbc smarthost1 cryptostats | grep sym
 654231 symmetric crypto ops (0 errors, 0 times driver blocked)
 [pyroxene]% ssh -c aes192-cbc smarthost1 cryptostats | grep sym
 654231 symmetric crypto ops (0 errors, 0 times driver blocked)
 [pyroxene]% 
 Yet they are supposed to be supported, no? Where in openssl is this
 configured? The padlock driver does the same thing.
 
 % ssh -c aes256-cbc smarthost1 cryptotest -z
 0.000 sec,   2    aes crypts,   16 bytes,       400 byte/sec,   30.5 Mb/sec
 0.000 sec,   2    aes crypts,   32 bytes,      1600 byte/sec,  122.1 Mb/sec
 0.000 sec,   2    aes crypts,   64 bytes,      3200 byte/sec,  244.1 Mb/sec
 0.000 sec,   2    aes crypts,  128 bytes,      6400 byte/sec,  488.3 Mb/sec
 0.000 sec,   2    aes crypts,  256 bytes,     12800 byte/sec,  976.6 Mb/sec
 0.000 sec,   2    aes crypts,  512 bytes,     17067 byte/sec, 1302.1 Mb/sec
 0.000 sec,   2    aes crypts, 1024 bytes, 292571429 byte/sec, 2232.1 Mb/sec
 0.000 sec,   2    aes crypts, 2048 bytes,     45511 byte/sec, 3472.2 Mb/sec
 0.000 sec,   2    aes crypts, 4096 bytes,     51200 byte/sec, 3906.2 Mb/sec
 0.000 sec,   2    aes crypts, 8192 bytes, 420102564 byte/sec, 3205.1 Mb/sec
 0.000 sec,   2 aes192 crypts,   16 bytes,       800 byte/sec,   61.0 Mb/sec
 0.000 sec,   2 aes192 crypts,   32 bytes,      1600 byte/sec,  122.1 Mb/sec
 0.000 sec,   2 aes192 crypts,   64 bytes,      3200 byte/sec,  244.1 Mb/sec
 0.000 sec,   2 aes192 crypts,  128 bytes,      6400 byte/sec,  488.3 Mb/sec
 0.000 sec,   2 aes192 crypts,  256 bytes,     12800 byte/sec,  976.6 Mb/sec
 0.000 sec,   2 aes192 crypts,  512 bytes,     20480 byte/sec, 1562.5 Mb/sec
 0.000 sec,   2 aes192 crypts, 1024 bytes,     34133 byte/sec, 2604.2 Mb/sec
 0.000 sec,   2 aes192 crypts, 2048 bytes,     40960 byte/sec, 3125.0 Mb/sec
 0.000 sec,   2 aes192 crypts, 4096 bytes,     54613 byte/sec, 4166.7 Mb/sec
 0.000 sec,   2 aes192 crypts, 8192 bytes, 496484848 byte/sec, 3787.9 Mb/sec
 0.000 sec,   2 aes256 crypts,   16 bytes,      1067 byte/sec,   81.4 Mb/sec
 0.000 sec,   2 aes256 crypts,   32 bytes,      2133 byte/sec,  162.8 Mb/sec
 0.000 sec,   2 aes256 crypts,   64 bytes,      3200 byte/sec,  244.1 Mb/sec
 0.000 sec,   2 aes256 crypts,  128 bytes,      5120 byte/sec,  390.6 Mb/sec
 0.000 sec,   2 aes256 crypts,  256 bytes,     10240 byte/sec,  781.2 Mb/sec
 0.000 sec,   2 aes256 crypts,  512 bytes,     20480 byte/sec, 1562.5 Mb/sec
 0.000 sec,   2 aes256 crypts, 1024 bytes, 292571429 byte/sec, 2232.1 Mb/sec
 0.000 sec,   2 aes256 crypts, 2048 bytes,     40960 byte/sec, 3125.0 Mb/sec
 0.000 sec,   2 aes256 crypts, 4096 bytes,     51200 byte/sec, 3906.2 Mb/sec
 0.000 sec,   2 aes256 crypts, 8192 bytes, 442810811 byte/sec, 3378.4 Mb/sec

From my reading of src/crypto/openssl/crypto/engine/eng_cryptodev.c,
and browsing
http://cvs.openssl.org/rlog?f=openssl/crypto/engine/eng_cryptodev.c
it seems that only OpenSSL HEAD and 1.0 branch have support for
AES-192 and AES-256 when working with /dev/crypto.
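A hypothetical C model of that observation: the engine only hands a cipher to /dev/crypto if it appears in its registration table, so a cipher that is missing presumably falls back to OpenSSL's software AES and never moves the cryptostats counter. The two tables below are assumptions that encode the reading above (128-bit AES-CBC only in the older engine, 192- and 256-bit added in 1.0); the real eng_cryptodev.c keys its table by OpenSSL NIDs rather than ssh cipher names and lists other ciphers too, and engine_keylen() is an illustrative helper, not an OpenSSL function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct engine_cipher {
	const char *name;	/* ssh(1)-style cipher name, for illustration */
	int keylen;		/* AES key length in bytes */
};

/* Assumed registration table of the older engine: AES-CBC-128 only. */
static const struct engine_cipher old_engine[] = {
	{ "aes128-cbc", 16 },
};

/* Assumed registration table of the 1.0 branch. */
static const struct engine_cipher new_engine[] = {
	{ "aes128-cbc", 16 },
	{ "aes192-cbc", 24 },
	{ "aes256-cbc", 32 },
};

#define NELEM(a)	(sizeof(a) / sizeof((a)[0]))

/*
 * Return the key length if 'name' is registered with the engine (so the
 * work would reach /dev/crypto), or 0 if software AES would be used.
 */
static int
engine_keylen(const struct engine_cipher *tbl, size_t n, const char *name)
{
	assert(tbl != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return (tbl[i].keylen);
	return (0);
}
```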



Re: aesni(?) corrupts data on 8.2-BETA1

2010-12-12 Thread Kostik Belousov

On Sat, Dec 11, 2010 at 07:37:51PM -0500, Mike Tancsa wrote:
 On 12/11/2010 6:22 PM, Kostik Belousov wrote:
  On Sat, Dec 11, 2010 at 06:08:08PM -0500, Mike Tancsa wrote:
  On 12/11/2010 11:01 AM, Kostik Belousov wrote:
 
  I have no access to AESNI hardware. For start, you may use
  src/tools/tools/crypto/cryptotest
  to somewhat verify the sanity of the driver.
 
  It doesn't happen every time, but one out of 5 or so.
 
  First, which arch is it, amd64 or i386 ?
  
  Also, please revert r216162 and do the same tests.
 
 Hi,
   It's AMD64, but i386 seems to be impacted too. I am not sure how to
 revert to a specific commit, but for now I csup'd with a date tag of
 
 *date=2010.12.02.23.00.00
 
 which is a day before
 http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-December/004338.html
 
 
 And that seems to fix it!
 
 I  have been running
 cryptotest -c -z -t 10
 in a loop for the past 10min and not one error.

Please try this patch on the latest HEAD or RELENG_8.

diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
index 482b5da..1b493b4 100644
--- a/sys/amd64/amd64/fpu.c
+++ b/sys/amd64/amd64/fpu.c
@@ -426,7 +426,9 @@ fpudna(void)
fxrstor(&fpu_initialstate);
if (pcb->pcb_initial_fpucw != __INITIAL_FPUCW__)
fldcw(&pcb->pcb_initial_fpucw);
-   fpuuserinited(curthread);
+   pcb->pcb_flags |= PCB_FPUINITDONE;
+   if (PCB_USER_FPU(pcb))
+   pcb->pcb_flags |= PCB_USERFPUINITDONE;
} else
fxrstor(pcb->pcb_save);
critical_exit();
diff --git a/sys/i386/isa/npx.c b/sys/i386/isa/npx.c
index 9ec5d25..f314e44 100644
--- a/sys/i386/isa/npx.c
+++ b/sys/i386/isa/npx.c
@@ -684,7 +684,9 @@ npxdna(void)
fpurstor(&npx_initialstate);
if (pcb->pcb_initial_npxcw != __INITIAL_NPXCW__)
fldcw(&pcb->pcb_initial_npxcw);
-   npxuserinited(curthread);
+   pcb->pcb_flags |= PCB_NPXINITDONE;
+   if (PCB_USER_FPU(pcb))
+   pcb->pcb_flags |= PCB_NPXUSERINITDONE;
} else {
/*
 * The following fpurstor() may cause an IRQ13 when the



Re: aesni(?) corrupts data on 8.2-BETA1

2010-12-11 Thread Kostik Belousov

On Fri, Dec 10, 2010 at 08:43:18PM -0500, Mike Tancsa wrote:
 Actually, I just noticed something like this as well with ssh via
 cryptodev and rsync as well. It was erroring out. eg.
 
 
 Dec 10 16:50:01 backup3 sshd[13120]: Corrupted MAC on input.
 Dec 10 16:50:01 backup3 sshd[13120]: Finished discarding for 64.x.x.x
 
 I had a few ssh sessions die as well.  It was working ok with a kernel /
 world from last week.  I was going to try and see if I can narrow it
 down, but it seemed to have been working fine with world from last week.
  Not sure if it's the openssl update? But if you are seeing issues with
 geli, then I doubt it's openssl.
 
   ---Mike
 
 On 12/10/2010 7:49 PM, Jan Henrik Sylvester wrote:
  I just upgraded my main laptop from 8.1-RELEASE (GENERIC, amd64) to
  8.2-BETA1 and added aesni_load=YES to my /boot/loader.conf.
  
  (If my interpretation is correct:) With aesni loaded, I see many files
  corrupted on my geli encrypted volume. Without aesni loaded, they are ok.
  
  I have got a journaling UFS2 on gjournal on geli on a FreeBSD partition
  on a MBR slice on a disk with ahci loaded.
  
  Story: First I noticed some weirdness of Thunderbird not showing the
  upgraded message properly and reloading IMAP messages that have
  already been read, but did not think of anything. Only during my usual
  rsyncing of the encrypted volume, I saw that some files could not be
  read (invalid file descriptor?). I rebooted without aesni and got a
  different error message.
  
  I created checksums of all files on that encrypted volume with and
  without aesni loaded (rebooting in between): 150 differences (one file
  could not be read in either case).
  
  Just to make sure, I tried to rsync with --checksum and --dry-run to
  the other machine that is supposed to have the same files: With aesni,
  many files were scheduled to be synced and one could not be read, but
  without aesni, only that one file was scheduled to be synced -- it
  probably got corrupted for good with aesni loaded. It is especially
  weird that I did not attempt to write to the file that got corrupted on
  disk with aesni loaded.
  
  Is there anything I am doing wrong or is it really aesni or the
  processor failing?
  
  The processor is a Core i7-M620 (with AESNI of course).
  
  Before I investigate any further, I have to make a real backup...
  rsyncing does not prevent silent corruption. I am lucky that it was not
  so silent after all.

I have no access to AESNI hardware. For a start, you may use
src/tools/tools/crypto/cryptotest
to somewhat verify the sanity of the driver.



Re: aesni(?) corrupts data on 8.2-BETA1

2010-12-11 Thread Kostik Belousov

On Sat, Dec 11, 2010 at 06:08:08PM -0500, Mike Tancsa wrote:
 On 12/11/2010 11:01 AM, Kostik Belousov wrote:
  
  I have no access to AESNI hardware. For start, you may use
  src/tools/tools/crypto/cryptotest
  to somewhat verify the sanity of the driver.
 
 It doesn't happen every time, but one out of 5 or so.
 
First, which arch is it, amd64 or i386 ?

Also, please revert r216162 and do the same tests.



Re: top io mode

2010-11-25 Thread Kostik Belousov

On Thu, Nov 25, 2010 at 05:28:30AM -0600, Adam Vande More wrote:
 top io doesn't seem to display stats when dealing directly with a block device
 like so:
 
 dd if=/dev/ada0 of=/dev/null
 
 However if dd runs on a regular file eg
 
 dd if=test.file of=/dev/null
 
 then stats are reported in top.
 
 Is this the expected behavior?

I do not think so, and the patch at the end of the message worked for me.

I cannot explain the
if (!TD_IS_IDLETHREAD(curthread))
curthread->td_ru.ru_inblock++;
checks that are done in vfs_bio.c.

diff --git a/sys/kern/kern_physio.c b/sys/kern/kern_physio.c
index d6be6e7..34072f3 100644
--- a/sys/kern/kern_physio.c
+++ b/sys/kern/kern_physio.c
@@ -57,10 +57,13 @@ physio(struct cdev *dev, struct uio *uio, int ioflag)
for (i = 0; i < uio->uio_iovcnt; i++) {
while (uio->uio_iov[i].iov_len) {
bp->b_flags = 0;
-   if (uio->uio_rw == UIO_READ)
+   if (uio->uio_rw == UIO_READ) {
bp->b_iocmd = BIO_READ;
-   else
+   curthread->td_ru.ru_inblock++;
+   } else {
bp->b_iocmd = BIO_WRITE;
+   curthread->td_ru.ru_oublock++;
+   }
bp->b_iodone = bdone;
bp->b_data = uio->uio_iov[i].iov_base;
bp->b_bcount = uio->uio_iov[i].iov_len;
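top -m io shows per-process block I/O counts taken from the rusage counters that this patch increments (ru_inblock for reads, ru_oublock for writes). As a rough way to check the change without top, here is a minimal userland C sketch that reads a path dd-style and reports how much its own ru_inblock grew. read_inblock_delta() and its max_reads parameter are hypothetical. On a kernel without the physio() change, a read from a raw device such as /dev/ada0 would be expected to leave the delta at 0; with it, the delta should grow with the number of transfers. Reads from regular files may be served from the cache and not count at all.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Read 'path' in bs-byte chunks, like dd(1), stopping after max_reads
 * reads (0 means read to EOF), and store in *delta how much this
 * process's ru_inblock counter grew.  Returns 0 on success, -1 on error.
 */
static int
read_inblock_delta(const char *path, size_t bs, size_t max_reads, long *delta)
{
	struct rusage before, after;
	size_t reads = 0;
	ssize_t n = 0;
	char *buf;
	int error = 0, fd;

	assert(path != NULL && bs > 0 && delta != NULL);
	if ((buf = malloc(bs)) == NULL)
		return (-1);
	if ((fd = open(path, O_RDONLY)) == -1) {
		free(buf);
		return (-1);
	}
	if (getrusage(RUSAGE_SELF, &before) == -1)
		error = -1;
	while (error == 0 && (max_reads == 0 || reads < max_reads) &&
	    (n = read(fd, buf, bs)) > 0)
		reads++;
	if (n == -1 || getrusage(RUSAGE_SELF, &after) == -1)
		error = -1;
	close(fd);
	free(buf);
	if (error == 0)
		*delta = after.ru_inblock - before.ru_inblock;
	return (error);
}
```

For example, read_inblock_delta("/dev/ada0", 65536, 1000, &d) roughly mirrors dd if=/dev/ada0 of=/dev/null bs=64k count=1000.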




Re: top io mode

2010-11-25 Thread Kostik Belousov

On Thu, Nov 25, 2010 at 04:35:53PM -0600, Adam Vande More wrote:
 On Thu, Nov 25, 2010 at 3:04 PM, Jeremy Chadwick
 free...@jdc.parodius.com wrote:
 
  Bad form to follow up to my own Email of course, but some discussion
  material:
 
 
 I'm a frequent offender myself so I won't be pointing any fingers.
 
 top -m io doesn't show any I/O writes, while gstat(8) does, and to
  numerous devices all which make up some form of ZFS pool.
 
 
 Yes, it's a generic ZFS mirror.
 
 If you do something like dd if=/dev/urandom of=/pool/file bs=64k, does
  top -m io show write I/O for the dd process?
 
 
 It does not on my ZFS STABLE system with Kostik's patch. So the patch fixes
 reads, but not writes. cc'ing to notify in case he has more ideas.
Can you show the exact command and describe some details about the setup
for the case where you still do not observe the counter in top?
(With my patch applied.)

Two things are mixed up in this thread.
First, there is (was) missing accounting for I/O to physical devices,
and the patch I posted should fix it.

Second, it is relatively well known that ZFS does not properly account
I/O. Maybe the patch by Andrew fixes it; I do not know.



Re: top io mode

2010-11-25 Thread Kostik Belousov

On Thu, Nov 25, 2010 at 05:18:13PM -0600, Adam Vande More wrote:
 On Thu, Nov 25, 2010 at 4:44 PM, Kostik Belousov kostik...@gmail.com wrote:
 
  Can you show exact command and describe some details about setup for
  the case where you still do not observe the counter in top ?
  (With my patch applied).
 
 -
 Still broken with patch applied;
 dd if=/dev/zero of=/tmp/delete.me bs=64k
What is /tmp/delete.me ? A file ? On what kind of filesystem is it
located ?

Summoning some psychic power, I predict that delete.me is
located on a ZFS or tmpfs filesystem. Is this right? If yes, then
the result is expected and nothing is broken there (except ZFS,
but I already described that).

 
 during this top -m io displays for dd:
  2235 adam   14 24  0  0  0  0   0.00% dd
 
 -
 Fixed with patch applied;
 
 dd if=/dev/ada0 of=/dev/null bs=64k
 
 during this top -m io displays for dd:
  2248 adam 3262  0   3262  0  0   3262 100.00% dd
 
 
 
 -- 
 Adam Vande More



Re: Call for testers: FPU changes

2010-11-20 Thread Kostik Belousov

On Sat, Nov 20, 2010 at 01:30:54AM -0500, Mike Tancsa wrote:
 On 11/16/2010 4:43 AM, Kostik Belousov wrote:
  On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:
  On 11/15/2010 4:13 PM, Kostik Belousov wrote:
 
  Patch is at
  http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch
 
 
 I did some more tests post commit today using the aesni kld taken
 directly from HEAD.  BTW, do you plan to MFC this as well ?
Sure, I will merge aesni(4); it was the only reason to work on the
kern_fpu in stable/8.

I want some pause between the KPI and driver MFCs, to ease the handling
of a possible mismerge or the fixing of latent bugs (since stable has a much
larger testing base than HEAD).
 
 Results at the bottom of http://www.tancsa.com/fpu.html
 
 It certainly makes a difference with geli. IPSEC and userland stuff, not
 so much. The CPU itself is crazy fast, so it's hard to see a difference
 in things like ssh, and even ipsec didn't yield any differences.  For ssh
 and userland stuff I guess once there is an aesni userland engine, this
 would probably help over the cryptodev interface.

Yes, small-block encryption/decryption has a large overhead from the
loop setup code.
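A toy cost model makes the effect visible. Assume, hypothetically, that each request pays a fixed setup cost (loop setup, copyin/copyout, session lookup) plus a per-byte cost, and that AES-NI only shrinks the per-byte part. throughput_bps() and speedup() are illustrative helpers, and the parameters used below (1000 ns of setup, 2.0 versus 0.5 ns per byte) are invented, not measured.

```c
#include <assert.h>

/*
 * Toy model: throughput in bytes/second when each request costs
 * setup_ns + block_bytes * ns_per_byte nanoseconds.
 */
static double
throughput_bps(double block_bytes, double setup_ns, double ns_per_byte)
{
	assert(block_bytes > 0 && setup_ns >= 0 && ns_per_byte > 0);
	return (block_bytes * 1e9 / (setup_ns + block_bytes * ns_per_byte));
}

/* Speedup from lowering the per-byte cost while the setup cost stays fixed. */
static double
speedup(double block_bytes, double setup_ns, double slow_nspb, double fast_nspb)
{
	return (throughput_bps(block_bytes, setup_ns, fast_nspb) /
	    throughput_bps(block_bytes, setup_ns, slow_nspb));
}
```

With these made-up parameters, a 16-byte block gains roughly 2.4% (1032/1008), while an 8192-byte block gains roughly 3.4x (17384/5096), which matches the shape of the small-block versus large-block results.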

Thank you.



Re: Call for testers: FPU changes

2010-11-17 Thread Kostik Belousov

On Tue, Nov 16, 2010 at 08:46:23PM -0500, Mike Tancsa wrote:
 On 11/16/2010 5:19 PM, Kostik Belousov wrote:
  Would your conclusion be that the patch seems to increase the throughput
  of the aesni(4) ?
  
  I think that on small-sized blocks, when using aesni(4), the dominating
  factor is the copyin/copyout of the data to/from the kernel address
  space. Still, it would be interesting to compare the full output
  of openssl speed on aesni(4) with and without the patch I posted.
 
 Hi,
   There does seem to be some improvement on large blocks.  But there are
 some freakishly fast times. On other sizes, there is no difference in
 speed it would seem
 
 I did 20 runs. Updated stats at http://www.tancsa.com/fpu.html

Thank you. Indeed, I think that the test units are too small, so
random system events can cause the variation. Nonetheless, the patch seems
to help, so I committed it.

Meanwhile, a similar change may be beneficial for padlock(4) too.
If you are going to test it, please note that, most likely, the openssl padlock
engine does not use padlock(4); I do not know for sure.

diff --git a/sys/crypto/via/padlock.c b/sys/crypto/via/padlock.c
index 77e059b..ba63093 100644
--- a/sys/crypto/via/padlock.c
+++ b/sys/crypto/via/padlock.c
@@ -170,7 +170,7 @@ padlock_newsession(device_t dev, uint32_t *sidp, struct cryptoini *cri)
struct padlock_session *ses = NULL;
struct cryptoini *encini, *macini;
struct thread *td;
-   int error;
+   int error, saved_ctx;
 
if (sidp == NULL || cri == NULL)
return (EINVAL);
@@ -238,10 +238,18 @@ padlock_newsession(device_t dev, uint32_t *sidp, struct cryptoini *cri)
 
if (macini != NULL) {
td = curthread;
-   error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+   if (!is_fpu_kern_thread(0)) {
+   error = fpu_kern_enter(td, &ses->ses_fpu_ctx,
+   FPU_KERN_NORMAL);
+   saved_ctx = 1;
+   } else {
+   error = 0;
+   saved_ctx = 0;
+   }
if (error == 0) {
error = padlock_hash_setup(ses, macini);
-   fpu_kern_leave(td, &ses->ses_fpu_ctx);
+   if (saved_ctx)
+   fpu_kern_leave(td, &ses->ses_fpu_ctx);
}
if (error != 0) {
padlock_freesession_one(sc, ses, 0);
diff --git a/sys/crypto/via/padlock_cipher.c b/sys/crypto/via/padlock_cipher.c
index 0ae26c8..1456ddf 100644
--- a/sys/crypto/via/padlock_cipher.c
+++ b/sys/crypto/via/padlock_cipher.c
@@ -205,7 +205,7 @@ padlock_cipher_process(struct padlock_session *ses, struct cryptodesc *enccrd,
struct thread *td;
u_char *buf, *abuf;
uint32_t *key;
-   int allocated, error;
+   int allocated, error, saved_ctx;
 
buf = padlock_cipher_alloc(enccrd, crp, &allocated);
if (buf == NULL)
@@ -250,14 +250,21 @@ padlock_cipher_process(struct padlock_session *ses, struct cryptodesc *enccrd,
}
 
td = curthread;
-   error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+   if (!is_fpu_kern_thread(0)) {
+   error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+   saved_ctx = 1;
+   } else {
+   error = 0;
+   saved_ctx = 0;
+   }
if (error != 0)
goto out;
 
padlock_cbc(abuf, abuf, enccrd->crd_len / AES_BLOCK_LEN, key, cw,
ses->ses_iv);
 
-   fpu_kern_leave(td, &ses->ses_fpu_ctx);
+   if (saved_ctx)
+   fpu_kern_leave(td, &ses->ses_fpu_ctx);
 
if (allocated) {
crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip,
diff --git a/sys/crypto/via/padlock_hash.c b/sys/crypto/via/padlock_hash.c
index 58c58b2..0fe182b 100644
--- a/sys/crypto/via/padlock_hash.c
+++ b/sys/crypto/via/padlock_hash.c
@@ -366,17 +366,24 @@ padlock_hash_process(struct padlock_session *ses, struct cryptodesc *maccrd,
 struct cryptop *crp)
 {
struct thread *td;
-   int error;
+   int error, saved_ctx;
 
td = curthread;
-   error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+   if (!is_fpu_kern_thread(0)) {
+   error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+   saved_ctx = 1;
+   } else {
+   error = 0;
+   saved_ctx = 0;
+   }
if (error != 0)
return (error);
if ((maccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0)
padlock_hash_key_setup(ses, maccrd->crd_key, maccrd->crd_klen);
 
error = padlock_authcompute(ses, maccrd, crp->crp_buf, crp->crp_flags);
-   fpu_kern_leave(td, &ses->ses_fpu_ctx);
+   if (saved_ctx)
+   fpu_kern_leave(td, &ses->ses_fpu_ctx);
return (error

Re: Call for testers: FPU changes

2010-11-17 Thread Kostik Belousov

On Wed, Nov 17, 2010 at 02:18:50PM -0500, Mike Tancsa wrote:
 On 11/17/2010 11:35 AM, Kostik Belousov wrote:
  Meanwhile, a similar change may be beneficial for padlock(4) too.
  If you are going to test it, please note that, most likely, the openssl padlock
  engine does not use padlock(4); I do not know for sure.
  
  diff --git a/sys/crypto/via/padlock.c b/sys/crypto/via/padlock.c
  index 77e059b..ba63093 100644
  --- a/sys/crypto/via/padlock.c
  +++ b/sys/crypto/via/padlock.c
 
 Patch applied cleanly
 
 
 Full results at the bottom of
 http://www.tancsa.com/fpu.html
 
 On large blocks, version 1 vs the above patch show no significant
 difference.  This is with openssl using the cryptodev engine. I also
 compared to the openssl padlock engine which gave interesting results!
 
 
 
 0(via)# cat version1.txt | sed -e 's/k//g' | awk '{print $6}' > 1
 0(via)# cat version2.txt | sed -e 's/k//g' | awk '{print $6}' > 2
 0(via)# ministat 1 2
 x 1
 + 2
     N           Min           Max        Median           Avg        Stddev
 x  30     2591851.6     6645345.1     4326340.6     4227917.6     1083181.2
 +  30     2574883.9     8830282.8     4033610.4     4241195.6     1519334.8
 No difference proven at 95.0% confidence
 
 0(via)# cat version1.txt | sed -e 's/k//g' | awk '{print $5}' > 1
 0(via)# cat version2.txt | sed -e 's/k//g' | awk '{print $5}' > 2
 0(via)# ministat 1 2
     N           Min           Max        Median           Avg        Stddev
 x  30     1124673.3     2320883.7     1527677.1     1550631.9      295165.4
 +  30     1069788.2     2508865.7     1594506.2     1588193.2     389414.33
 No difference proven at 95.0% confidence
 0(via)#
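The ministat(1) verdicts above can be checked by hand from the summary rows, assuming (as we understand it) that ministat compares the two means with a pooled-variance Student's t test. A minimal C sketch under that assumption; difference_proven() is a hypothetical helper, and the caller supplies the two-sided critical value (about 2.00 for the 58 degrees of freedom here at 95%). It compares t squared against the squared critical value, which avoids needing a square root.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Pooled-variance two-sample Student's t test from summary statistics
 * (N, Avg, Stddev per sample).  t_crit is the two-sided critical value
 * for n1 + n2 - 2 degrees of freedom at the chosen confidence level.
 */
static bool
difference_proven(int n1, double avg1, double sd1,
    int n2, double avg2, double sd2, double t_crit)
{
	double diff, sp2, t2;

	assert(n1 > 1 && n2 > 1 && t_crit > 0);
	/* Pooled variance of the two samples. */
	sp2 = ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2) / (n1 + n2 - 2);
	diff = avg1 - avg2;
	/* t^2 = diff^2 / (sp^2 * (1/n1 + 1/n2)) */
	t2 = diff * diff / (sp2 * (1.0 / n1 + 1.0 / n2));
	return (t2 > t_crit * t_crit);
}
```

For the first table t is roughly 0.04 and for the second roughly 0.42, both far below 2.00, which agrees with "No difference proven at 95.0% confidence".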
 

Thank you once more.

If nothing new pops up, I will commit the MFC tomorrow.
Unfortunately, no suspend/resume testers appeared, so be it.



Re: Call for testers: FPU changes

2010-11-16 Thread Kostik Belousov

On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:
 On 11/15/2010 4:13 PM, Kostik Belousov wrote:
  
  Patch is at
  http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch
 
 
 Hi,
   One small failure on the patch
 
 The text leading up to this was:
 --
 |Index: pc98/include/npx.h
 |===
 |--- pc98/include/npx.h (revision 215253)
 |+++ pc98/include/npx.h (working copy)
 --
 Patching file pc98/include/npx.h using Plan A...
 Hunk #1 failed at 1.
 1 out of 1 hunks failed--saving rejects to pc98/include/npx.h.rej
This is because our patch(1) in base is somewhat old, I believe.
The diff was generated by svn diff from an up-to-date stable/8
checkout, and the reason for the failure is the expanded $FreeBSD$ tags.

Newer GNU patch, available in ports, handles this correctly,
reporting that the patches were applied with fuzz.

 
 
 I tested with openssl and openvpn and all seems to work great on the via
 board and my i5 board!!  Simple test details at
 
 http://www.tancsa.com/fpu.html
 
 I will try out geli and some more extensive tests tomorrow
 
 Thanks for porting this back to RELENG_8 !
This is actually somewhat puzzling. Does openssl in base automatically
use crypto(4) ?

Also, could you, please redo the speed tests for aesni(4) with the
following patch applied over the driver sources ?

Thank you !

diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c
index 36c66ea..3fd397c 100644
--- a/sys/crypto/aesni/aesni_wrap.c
+++ b/sys/crypto/aesni/aesni_wrap.c
@@ -246,14 +246,21 @@ int
 aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini)
 {
struct thread *td;
-   int error;
+   int error, saved_ctx;
 
td = curthread;
-   error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL);
+   if (!is_fpu_kern_thread(0)) {
+   error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL);
+   saved_ctx = 1;
+   } else {
+   error = 0;
+   saved_ctx = 0;
+   }
if (error == 0) {
error = aesni_cipher_setup_common(ses, encini->cri_key,
encini->cri_klen);
-   fpu_kern_leave(td, &ses->fpu_ctx);
+   if (saved_ctx)
+   fpu_kern_leave(td, &ses->fpu_ctx);
}
return (error);
 }
@@ -264,16 +271,22 @@ aesni_cipher_process(struct aesni_session *ses, struct 
cryptodesc *enccrd,
 {
struct thread *td;
uint8_t *buf;
-   int error, allocated;
+   int error, allocated, saved_ctx;
 
buf = aesni_cipher_alloc(enccrd, crp, allocated);
if (buf == NULL)
return (ENOMEM);
 
td = curthread;
-   error = fpu_kern_enter(td, ses-fpu_ctx, FPU_KERN_NORMAL);
-   if (error != 0)
-   goto out;
+   if (!is_fpu_kern_thread(0)) {
+   error = fpu_kern_enter(td, ses-fpu_ctx, FPU_KERN_NORMAL);
+   if (error != 0)
+   goto out;
+   saved_ctx = 1;
+   } else {
+   saved_ctx = 0;
+   error = 0;
+   }
 
if ((enccrd-crd_flags  CRD_F_KEY_EXPLICIT) != 0) {
error = aesni_cipher_setup_common(ses, enccrd-crd_key,
@@ -311,7 +324,8 @@ aesni_cipher_process(struct aesni_session *ses, struct 
cryptodesc *enccrd,
ses-iv);
}
}
-   fpu_kern_leave(td, ses-fpu_ctx);
+   if (saved_ctx)
+   fpu_kern_leave(td, ses-fpu_ctx);
if (allocated)
crypto_copyback(crp-crp_flags, crp-crp_buf, enccrd-crd_skip,
enccrd-crd_len, buf);
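
The patch makes the FPU context switch conditional: if the current thread is already an FPU kernel thread (is_fpu_kern_thread(0) is true), the driver skips fpu_kern_enter(), and the local saved_ctx flag records whether a matching fpu_kern_leave() is owed. Presumably the point is to avoid a full FPU state save/restore per request when the crypto work already runs on such a thread. The following is a minimal userspace sketch of just that bookkeeping. fake_is_fpu_kern_thread(), fake_fpu_enter(), fake_fpu_leave() and setup_common() are hypothetical stand-ins that only count calls; they are not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel FPU API; they only do bookkeeping. */
static bool thread_is_fpu_kern;  /* models is_fpu_kern_thread(0) */
static int  fpu_depth;           /* enters not yet matched by a leave */
static int  fpu_enter_calls;     /* total number of enters performed */

static bool fake_is_fpu_kern_thread(void) { return (thread_is_fpu_kern); }

static int fake_fpu_enter(void)
{
	fpu_depth++;
	fpu_enter_calls++;
	return (0);
}

static void fake_fpu_leave(void)
{
	assert(fpu_depth > 0);   /* a leave without an enter would be a bug */
	fpu_depth--;
}

/* Hypothetical placeholder for aesni_cipher_setup_common(). */
static int setup_common(void) { return (0); }

/* Same control flow as the patched aesni_cipher_setup(). */
static int cipher_setup_sketch(void)
{
	int error;
	bool saved_ctx;

	if (!fake_is_fpu_kern_thread()) {
		error = fake_fpu_enter();
		saved_ctx = true;
	} else {
		error = 0;
		saved_ctx = false;
	}
	if (error == 0) {
		error = setup_common();
		if (saved_ctx)
			fake_fpu_leave();
	}
	return (error);
}
```

On a thread that is not an FPU kernel thread, one call performs exactly one enter and one leave; on a thread that already is one, the call performs neither, and in both cases fpu_depth returns to zero.
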



Re: Call for testers: FPU changes

2010-11-16 Thread Kostik Belousov

On Tue, Nov 16, 2010 at 05:08:30PM -0500, Mike Tancsa wrote:
 On 11/16/2010 4:43 AM, Kostik Belousov wrote:
  On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:
  On 11/15/2010 4:13 PM, Kostik Belousov wrote:
 
  Patch is at
  http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch
 
 
  Hi,
 One small failure on the patch
 
  The text leading up to this was:
  --
  |Index: pc98/include/npx.h
  |===
  |--- pc98/include/npx.h (revision 215253)
  |+++ pc98/include/npx.h (working copy)
  --
  Patching file pc98/include/npx.h using Plan A...
  Hunk #1 failed at 1.
  1 out of 1 hunks failed--saving rejects to pc98/include/npx.h.rej
  This is because our patch(1) in base is somewhat old, I believe.
  The diff was generated by svn diff from the up to date stable/8
  checkout, and the reason for failure is expanded $FreeBSD$ tags.
  
  Newer GNU patch, available in ports, handles this correctly,
  reporting the patches that were applied with fuzz.
  
 
 
  I tested with openssl and openvpn and all seems to work great on the via
  board and my i5 board!!  Simple test details at
 
  http://www.tancsa.com/fpu.html
 
  I will try out geli and some more extensive tests tomorrow
 
  Thanks for porting this back to RELENG_8 !
  This is actually somewhat puzzling. Does openssl in base automatically
  use crypto(4) ?
 
 
 I force it via ssl.cnf
 
 
 0(achinetboot)% tail -11 /etc/ssl/openssl.cnf
 
 openssl_conf = openssl_def
 
 [openssl_def]
 engines = openssl_engines
 
 [openssl_engines]
 padlock = cryptodev_engine
 
 [cryptodev_engine]
 default_algorithms = ALL
 0(achinetboot)%
Ah, that explains the results.

 
 
 The limiting factor here for ssh seems to be the 100Mb link my i5 box is
 on. Here is with and without aesni loaded
 
 0(achinetboot)% /usr/bin/time scp -c aes128-cbc test.bin
 mdtan...@10.255.255.1:/dev/null
 test.bin
   100%   88MB  11.0MB/s   00:08
 8.14 real 0.44 user 0.57 sys
 0(achinetboot)% /usr/bin/time scp -c aes128-cbc test.bin
 mdtan...@10.255.255.1:/dev/null
 test.bin
   100%   88MB  11.0MB/s   00:08
 8.15 real 1.46 user 0.36 sys
 0(achinetboot)%
 
 I will move it to gigabit to get a better test shortly.
 
  
  Also, could you, please redo the speed tests for aesni(4) with the
  following patch applied over the driver sources ?
  
  Thank you !
  
  diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c
  index 36c66ea..3fd397c 100644
  --- a/sys/crypto/aesni/aesni_wrap.c
  +++ b/sys/crypto/aesni/aesni_wrap.c
  @@ -246,14 +246,21 @@ int
 
 
 
  patch -p2 < a
 Hmm...  Looks like a unified diff to me...
 The text leading up to this was:
 --
 |diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c
 |index 36c66ea..3fd397c 100644
 |--- a/sys/crypto/aesni/aesni_wrap.c
 |+++ b/sys/crypto/aesni/aesni_wrap.c
 --
 Patching file crypto/aesni/aesni_wrap.c using Plan A...
 Hunk #1 succeeded at 246.
 Hunk #2 succeeded at 271.
 Hunk #3 succeeded at 324.
 Hmm...  Ignoring the trailing garbage.
 done
 
 
 Seems to work ok
 
 
 
 0(achinetboot)# kldload aesni
 0(achinetboot)#  openssl speed -evp aes-128-cbc
 To get the most accurate results, try to run this
 program when this computer is idle.
 Doing aes-128-cbc for 3s on 16 size blocks: 2587085 aes-128-cbc's in 0.39s
 Doing aes-128-cbc for 3s on 64 size blocks: 2425301 aes-128-cbc's in 0.38s
 Doing aes-128-cbc for 3s on 256 size blocks: 1925353 aes-128-cbc's in 0.19s
 Doing aes-128-cbc for 3s on 1024 size blocks: 1098255 aes-128-cbc's in 0.11s
 Doing aes-128-cbc for 3s on 8192 size blocks: 152631 aes-128-cbc's in 0.05s
 OpenSSL 0.9.8n 24 Mar 2010
 built on: date not available
 options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long)
 aes(partial) blowfish(idx)
 compiler: cc
 available timing options: USE_TOD HZ=128 [sysconf value]
 timing function used: getrusage
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
 aes-128-cbc    105979.48k   404781.84k  2632455.13k  9955323.90k 27619906.16k
 0(achinetboot)#
 
 But there is a LOT of variation between runs for some reason.
 
 I added to http://www.tancsa.com/fpu.html
 
 the different runs
 
 
Mike, thank you again.

Would your conclusion be that the patch seems to increase the throughput
of aesni(4)?

I think that on small blocks, when using aesni(4), the dominating
factor is the copyin/copyout of the data to/from the kernel address
space. Still, it would be interesting to compare the full output
of openssl speed on aesni(4) with and without the patch I posted.
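
One way to sanity-check the openssl speed figures: each reported value is the operation count times the block size, divided by the measured time, in thousands of bytes per second. With "timing function used: getrusage", the measured time is, as far as I know, the process's user CPU time (the 0.39s above), not the 3s of wall-clock time, so work done inside the kernel through crypto(4) is not counted. The sketch below recomputes rows both ways; kbytes_per_sec() is a hypothetical helper, and because the printed 0.39s is rounded, the CPU-time result differs slightly from the reported 105979.48k.

```c
#include <assert.h>

/*
 * Throughput in the units "openssl speed" prints: 1000s of bytes per second.
 *   ops     - number of operations completed
 *   blksize - bytes processed per operation
 *   secs    - time used as the divisor (CPU time or wall-clock time)
 */
static double kbytes_per_sec(unsigned long ops, unsigned int blksize, double secs)
{
	assert(secs > 0.0);
	return ((double)ops * blksize / secs / 1000.0);
}
```

For the 16-byte row, kbytes_per_sec(2587085, 16, 0.39) gives about 106137k, while dividing by the 3s wall-clock time gives about 13798k. For the 8192-byte row, 152631 operations over 3s come to roughly 416784k rather than the reported 27619906.16k, which may help explain the large run-to-run variation.
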



Call for testers: FPU changes

2010-11-15 Thread Kostik Belousov

Hello,
this is a call for testers of the merge of fpu_kern_enter/leave(9)
to RELENG_8. The changes are required to fix some issues with the VIA
padlock engine, and to actually merge aesni(4) to RELENG_8.

I ask testers to look for possible regressions in FPU context handling.
Reports from users of VIA padlock hardware are also needed.
Any user who has suspend/resume magically working on the
8 branch, please test that the patch does not make things
worse.

Please note that the pre-release freeze will start in 2 weeks, so 
I need to get testing results relatively quickly to be in time for 8.2.

Patch is at
http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch

Thanks in advance.



Re: 8.1-STABLE: problem with unmounting ZFS snapshots

2010-11-13 Thread Kostik Belousov

On Sat, Nov 13, 2010 at 01:09:55PM +0200, Andriy Gapon wrote:
 on 13/11/2010 13:06 Martin Matuska said the following:
  No, this is not good for us. Solaris does not allow mounting of
  snapshots on any vnode, like we do. Solaris has them only in
  .zfs/snapshots. This allows us to have read-only mounts without even
  mounting the parent zfs.
  
  Before v15 we have been happy with that code and had no issues :-)
  
  I have a very simple testcase where just fixing the VFS_RELE breaks our
  forced unmount. Let's say we use the correct VFS_RELE in zfs_vfsops.c:
  VFS_RELE(vfsp->mnt_vnodecovered->v_vfsp);
  
  Now let's say you have a mounted filesystem (e.g. md) under /mnt:
  /dev/md5 on /mnt (ufs, local)
  
  # mkdir /mnt/test
  # mount -t zfs t...@t2 /mnt/test
  # umount -f /mnt
  
  Now you will hang because the second VFS_HOLD.
 
 Hang here would be bad, I agree.
 But I think that the umount shouldn't succeed either, in this case.
Normal unmount indeed must not succeed in this case, because the mount
adds a reference to the covered vnode. But a forced unmount should be
allowed to proceed.

After the forced unmount, you can use the fsid to unmount the lower mount point.
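
At the system-call level, FreeBSD's unmount(2) can be given a file system ID instead of a path by setting MNT_BYFSID; as far as I know, the kernel parses that string as FSID:<val0>:<val1>, the two words of f_fsid from statfs(2). A hedged sketch follows. format_fsid() and unmount_by_fsid() are hypothetical helpers, and the unmount() call is only compiled on FreeBSD; how you obtain the fsid of the orphaned mount (for example from getfsstat(2)) is left out.

```c
#include <stdint.h>
#include <stdio.h>
#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/mount.h>
#endif

/*
 * Format an fsid pair (statfs(2) f_fsid.val[0], f_fsid.val[1]) as the
 * "FSID:<val0>:<val1>" string used with MNT_BYFSID.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_fsid(char *buf, size_t len, int32_t v0, int32_t v1)
{
	int n;

	n = snprintf(buf, len, "FSID:%d:%d", (int)v0, (int)v1);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (n);
}

#ifdef __FreeBSD__
/* Force-unmount a file system that can no longer be reached by path. */
static int unmount_by_fsid(int32_t v0, int32_t v1)
{
	char buf[64];

	if (format_fsid(buf, sizeof(buf), v0, v1) < 0)
		return (-1);
	return (unmount(buf, MNT_BYFSID | MNT_FORCE));
}
#endif
```
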
 
  So I stick to my opinion
  that this extra protection is more a problem than a solution in our
  case and it should be commented out.
 
 
 -- 
 Andriy Gapon
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org



Re: AESNI

2010-11-12 Thread Kostik Belousov

On Fri, Nov 12, 2010 at 09:58:53AM -0500, Daryl Richards wrote:
 I'm wondering what the status is of AES-NI in 8-STABLE? I can find 
 references to it being put into 9 back in July, with the note that it 
 would be MFC'ed within a month, but as far as I've been able to find, 
 nothing after that.
As a public service, since I already got several private mails,
I will state the current situation:

The AESNI merge depends on the merge of r208833 and a lot of followups to
r208833. The merge of r208833 needs r208453, which was committed to stable/8
only recently, since it required approval from r...@.

This delay, together with my recent lack of time, makes me
skeptical about the chances of having aesni(4) in 8.2.

Please note that aesni(4) is not needed to use AESNI in usermode.
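
The AES-NI instructions (AESENC, AESDEC and friends) are ordinary unprivileged instructions operating on XMM registers, so a userland library can use them directly once it has checked that the CPU supports them; the kernel driver only matters for in-kernel consumers such as crypto(4). A sketch of that check, assuming a GCC or Clang compiler that provides <cpuid.h> and __get_cpuid(); cpu_has_aesni() is a hypothetical helper name.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>                 /* GCC/Clang: __get_cpuid() */
#endif

#define CPUID1_ECX_AES (1u << 25)  /* AES-NI feature flag: CPUID leaf 1, ECX bit 25 */

/* Returns true if the CPU advertises AES-NI; no kernel driver is involved. */
static bool cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
		return (false);          /* CPUID leaf 1 not available */
	return ((ecx & CPUID1_ECX_AES) != 0);
#else
	return (false);              /* not an x86 CPU */
#endif
}
```
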

 
 Did I miss something? Does it just need testing? How can I go about 
 doing that? I'd like to help!
 
 Thanks,
 -- 
 Daryl Richards
 Isle Technical Services Inc.



Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Kostik Belousov

On Fri, Nov 05, 2010 at 10:27:09AM -0700, Josh Carroll wrote:
  I'm having a problem with nfsd hanging and not serving mount points,
  during which time it can not not be killed. This problem started
  happening sometime after November 2nd, since kernel from 11/2 sources
  does not exhibit this problem.
 
  Please try the attached patch, rick
 
 Thanks! I had to manually patch for some reason, but I can confirmed
 that nfsd is now well-behaved with your patch applied. I tested a
 couple of different mounts and played two separate files on the
 Popcorn Hour (one lower bitrate, the other higher bitrate) and both
 played without a hiccup. While those were playing I also was able to
 automount my home directory on the macbook and move around my home
 directory.
 
 So it looks like this patch did the trick. Thanks Rick, really
 appreciate the fast response. Is there a reason why this doesn't seem
 to be getting reported a lot? What is particular in my setup that
 broke it?
 
  ps: Starting about Monday I won't be able to do commits for about 3 weeks
     so, if this patch works, could someone else please commit it, thanks,
     rick
 
 
 If someone can commit this, I'd really appreciate it. I will report
 back if I notice any problems, but I imagine this would probably get
 fixed in HEAD first, then MFC'd anyway, right? Unless this is already
 fixed in HEAD.
 
 Anyway, thanks again Rick! I appreciate it.
 
 Regards,
 Josh
 As far as I can tell, there have been no adverse effects or
 regressions with the kernel built with this patch (I had t

I agree that the fix is the right fix for a real issue. It should only
affect filesystems that support VFS_VGET(). In other words,
it is relevant for e.g. UFS exports, but not for ZFS, which is
Andrey's case.

The change is committed as r214851 with the shortest possible MFC timeout.

There is a further issue with the use of VOP_ISLOCKED(). Andrey, can you
try this untested change in your setup?

Thanks and sorry.

diff --git a/sys/nfsserver/nfs_serv.c b/sys/nfsserver/nfs_serv.c
index 2b9131f..668b02c 100644
--- a/sys/nfsserver/nfs_serv.c
+++ b/sys/nfsserver/nfs_serv.c
@@ -3037,6 +3037,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	struct vattr va, at, *vap = &va;
 	struct nfs_fattr *fp;
 	int len, nlen, rem, xfer, tsiz, i, error = 0, error1, getret = 1;
+	int vp_locked;
 	int siz, cnt, fullsiz, eofflag, rdonly, dirlen, ncookies;
 	u_quad_t off, toff, verf;
 	u_long *cookies = NULL, *cookiep; /* needs to be int64_t or off_t */
@@ -3067,10 +3068,12 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	fullsiz = siz;
 	error = nfsrv_fhtovp(fhp, 1, &vp, &vfslocked, nfsd, slp,
 	    nam, &rdonly, TRUE);
+	vp_locked = 1;
 	if (!error && vp->v_type != VDIR) {
 		error = ENOTDIR;
 		vput(vp);
 		vp = NULL;
+		vp_locked = 0;
 	}
 	if (error) {
 		nfsm_reply(NFSX_UNSIGNED);
@@ -3090,6 +3093,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	error = nfsrv_access(vp, VEXEC, cred, rdonly, 0);
 	if (error) {
 		vput(vp);
+		vp_locked = 0;
 		vp = NULL;
 		nfsm_reply(NFSX_V3POSTOPATTR);
 		nfsm_srvpostop_attr(getret, &at);
@@ -3097,6 +3101,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 		goto nfsmout;
 	}
 	VOP_UNLOCK(vp, 0);
+	vp_locked = 0;
 	rbuf = malloc(siz, M_TEMP, M_WAITOK);
 again:
 	iv.iov_base = rbuf;
@@ -3110,6 +3115,7 @@ again:
 	io.uio_td = NULL;
 	eofflag = 0;
 	vn_lock(vp, LK_SHARED | LK_RETRY);
+	vp_locked = 1;
 	if (cookies) {
 		free((caddr_t)cookies, M_TEMP);
 		cookies = NULL;
@@ -3118,6 +3124,7 @@ again:
 	off = (u_quad_t)io.uio_offset;
 	getret = VOP_GETATTR(vp, &at, cred);
 	VOP_UNLOCK(vp, 0);
+	vp_locked = 0;
 	if (!cookies && !error)
 		error = NFSERR_PERM;
 	if (!error)
@@ -3238,8 +3245,10 @@ again:
 			} else {
 				cn.cn_flags &= ~ISDOTDOT;
 			}
-			if (!VOP_ISLOCKED(vp))
+			if (!vp_locked) {
 				vn_lock(vp, LK_SHARED | LK_RETRY);
+				vp_locked = 1;
+			}
 			if ((vp->v_vflag & VV_ROOT) != 0 &&
 			    (cn.cn_flags & ISDOTDOT) != 0) {
 				vref(vp);
@@ -3342,7 +3351,7 @@ invalid:
 		cookiep++;
 		ncookies--;
 	}
-	if (!usevget && VOP_ISLOCKED(vp))
+	if (!usevget && vp_locked)
 		vput(vp);
 	else
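
My reading of the VOP_ISLOCKED() part of the change: when the vnode is share-locked, VOP_ISLOCKED() reports that it is locked without saying by whom, so `if (!VOP_ISLOCKED(vp)) vn_lock(...)` can skip taking a lock that this code later drops with vput(). The patch instead records ownership in the local vp_locked flag. The toy model below illustrates the difference; struct toy_lock and its functions are hypothetical and are not lockmgr(9).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock: it counts holders but, like a shared lock,
 * cannot say which thread holds it. Not lockmgr(9). */
struct toy_lock {
	int  shared_holders;
	bool exclusive;
};

/* Analogue of asking VOP_ISLOCKED(): "is anyone holding it?" */
static bool toy_is_locked(const struct toy_lock *lk)
{
	return (lk->exclusive || lk->shared_holders > 0);
}

static void toy_lock_shared(struct toy_lock *lk)
{
	assert(!lk->exclusive);
	lk->shared_holders++;
}

static void toy_unlock_shared(struct toy_lock *lk)
{
	assert(lk->shared_holders > 0);
	lk->shared_holders--;
}

/* Patch-style approach: the caller's own flag (like vp_locked) says
 * whether *this* code path holds the lock. */
static void ensure_locked(struct toy_lock *lk, bool *held)
{
	if (!*held) {
		toy_lock_shared(lk);
		*held = true;
	}
}
```

When another holder already has the lock shared, toy_is_locked() returns true even though this code path holds nothing; a check based on it would skip locking, whereas ensure_locked() takes its own reference that it can later release safely.
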

Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Kostik Belousov

On Sat, Oct 30, 2010 at 05:43:54PM +0300, Andriy Gapon wrote:
 on 30/10/2010 14:25 Artemiev Igor said the following:
  On Sat, Oct 30, 2010 at 01:33:00PM +0300, Andriy Gapon wrote:
  on 30/10/2010 13:12 Artemiev Igor said the following:
  On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
 
  Heh, next try.
 
  Got a panic, vm_page_unwire: invalid wire count: 0
 
  Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for 
  vm_page_alloc):
  
  Yep, it work. But VM_ALLOC_WIRE not exists in RELENG_8, therefore i 
  slightly modified your patch:
 
 I apologize for my haste, it should have been VM_ALLOC_WIRED.
 Here is a corrected patch:
 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
 ===================================================================
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c	(revision 214318)
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c	(working copy)
 @@ -67,6 +67,7 @@
  #include <sys/sf_buf.h>
  #include <sys/sched.h>
  #include <sys/acl.h>
 +#include <vm/vm_pageout.h>
 
  /*
   * Programming rules.
 @@ -464,7 +465,7 @@
  			uiomove_fromphys(&m, off, bytes, uio);
  			VM_OBJECT_LOCK(obj);
  			vm_page_wakeup(m);
 -		} else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
 +		} else if (uio->uio_segflg == UIO_NOCOPY) {
  			/*
  			 * The code below is here to make sendfile(2) work
  			 * correctly with ZFS. As pointed out by ups@
 @@ -474,9 +475,23 @@
  			 */
  			KASSERT(off == 0,
  			    ("unexpected offset in mappedread for sendfile"));
 -			if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
 +			if (m != NULL && vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
  				goto again;
 -			vm_page_busy(m);
 +			if (m == NULL) {
 +				m = vm_page_alloc(obj, OFF_TO_IDX(start),
 +				    VM_ALLOC_NOBUSY | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
 +				if (m == NULL) {
 +					VM_OBJECT_UNLOCK(obj);
 +					VM_WAIT;
 +					VM_OBJECT_LOCK(obj);
 +					goto again;
 +				}
 +			} else {
 +				vm_page_lock_queues();
 +				vm_page_wire(m);
 +				vm_page_unlock_queues();
 +			}
 +			vm_page_io_start(m);
Why wire the page if it is busied?



Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Kostik Belousov

On Fri, Oct 29, 2010 at 06:31:21PM +0400, Alexander Zagrebin wrote:
   I've tried the nginx with
   disabled sendfile (the nginx.conf contains sendfile off;):
   
   $ dd if=/dev/random of=test bs=1m count=100
   100+0 records in
   100+0 records out
   104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
   $ fetch -o /dev/null http://localhost/test
   /dev/null 100% of  100 MB   41 MBps
   $ fetch -o /dev/null http://localhost/test
   /dev/null 100% of  100 MB   44 MBps
   $ fetch -o /dev/null http://localhost/test
   /dev/null 100% of  100 MB   44 MBps
   
  
  I am really surprised with such a bad performance of sendfile.
  Will you be able to profile the issue further?
 
 Yes.
 
  I will also try to think of some measurements.
 
 A transfer rate is too low for the _first_ attempt only.
 Further attempts demonstrates a reasonable transfer rate.
 For example, nginx with sendfile on;:
 
 $ dd if=/dev/random of=test bs=1m count=100
 100+0 records in
 100+0 records out
 104857600 bytes transferred in 5.855305 secs (17908136 bytes/sec)
 $ fetch -o /dev/null http://localhost/test
 /dev/null   3% of  100 MB  118 kBps
 13m50s^C
 fetch: transfer interrupted
 $ fetch -o /dev/null http://localhost/test
 /dev/null 100% of  100 MB   39 MBps
 
 If there was no access to the file during some time, then everything
 repeats:
 The first attempt - transfer rate is too low
 A further attempts - no problems
 
 Can you reproduce the problem on your system?

Could it be the priming of the vm object's page contents?
Due to double-buffering, and the (possibly false) optimization of only
performing double-buffering when the vm object already has some data cached,
reads can prime the vm object's page list before the file is mmapped or
sendfile-ed.




Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Kostik Belousov

On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote:
 on 29/10/2010 17:53 Kostik Belousov said the following:
  Could it be the priming of the vm object pages content ?
 
 Sorry, not familiar with this term.
 Do you mean prepopulation of vm object with valid pages?
 
  Due to double-buffering, and (possibly false) optimization to only
 
 What optimization?
On a zfs vnode read, the page from the corresponding vm object is only
populated with the vnode data if the page already exists in the
object.

Not doing the optimization would mean allocating the page unconditionally
on read if it is not already present, and copying the data from the ARC into the page.
 
  perform double-buffering when vm object already has some data cached,
  reads can prime vm object page list before file is mmapped or
  sendfile-ed.
  
 
 No double-buffering is done to optimize anything. Double-buffering
 is a consequence of having page cache and ARC. The special
 double-buffering code is to just handle that fact - e.g. making
 sure that VOP_READ reads data from page cache instead of ARC if it's
 possible that the data in them differs (i.e. page cache has more
 recent data).

 So, if I understood the term 'priming' correctly, no priming should
 ever occur.
The priming is done on the first call to VOP_READ() with the right
offset after the page is allocated.



Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Kostik Belousov

On Fri, Oct 29, 2010 at 06:22:54PM +0300, Andriy Gapon wrote:
 on 29/10/2010 18:17 Kostik Belousov said the following:
  On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote:
  on 29/10/2010 17:53 Kostik Belousov said the following:
  Could it be the priming of the vm object pages content ?
 
  Sorry, not familiar with this term.
  Do you mean prepopulation of vm object with valid pages?
 
  Due to double-buffering, and (possibly false) optimization to only
 
  What optimization?
  On zfs vnode read, the page from the corresponding vm object is only
  populated with the vnode data if the page already exists in the
  object.
 
 Do you mean a specific type of read?
 For normal reads it's the other way around - if the page already exists and 
 is
 valid, then we read from the page, not from ARC.
Let me repeat it once more:
zfs does not properly cache the vnode data in the page cache
(here "cache" is used in a weaker sense: not FreeBSD's 'cached'
memory, but a cache in the more common sense). Not doing the optimization
I mentioned would mean always allocating the pages and making them
(partially) valid on each read call.
 
  Not doing the optimization would be to allocate the page unconditionally
  on the read if not already present, and copy the data from ARC to the page.
 
  perform double-buffering when vm object already has some data cached,
  reads can prime vm object page list before file is mmapped or
  sendfile-ed.
 
 
  No double-buffering is done to optimize anything. Double-buffering
  is a consequence of having page cache and ARC. The special
  double-buffering code is to just handle that fact - e.g. making
  sure that VOP_READ reads data from page cache instead of ARC if it's
  possible that the data in them differs (i.e. page cache has more
  recent data).
 
  So, if I understood the term 'priming' correctly, no priming should
  ever occur.
  The priming is done on the first call to VOP_READ() with the right
  offset after the page is allocated.
 
 Again, what is priming?
Filling the cache with the appropriate content.



Re: kpanic on install 32GB of RAM [SEC=UNCLASSIFIED]

2010-10-21 Thread Kostik Belousov

On Thu, Oct 21, 2010 at 09:50:03AM -0700, Sean Bruno wrote:
 On Thu, 2010-10-21 at 05:48 -0700, Andriy Gapon wrote:
  on 20/10/2010 21:28 Sean Bruno said the following:
   I guess, I could replace the kernel on the CD and have them reburn it?
  
  That should work.
  BTW, here I described yet another way of building custom 
  recovery/installation
  CDs that I use:
  http://wiki.freebsd.org/AvgLiveCD
  
 
 Before I get started on this, it looks like something else is going on.
 
 Here is a panic + trace on the latest 9-current snap shot.  hammer
 time indeed.  
 
 Suggestions are welcome!
 
 
 http://people.freebsd.org/~sbruno/9-current-panic.png
 
 http://people.freebsd.org/~sbruno/9-current-trace-panic.png

It looks like the msgbufp variable has an absurd value. Can you arrange
to get the output of a verbose boot, especially the SMAP lines?
Also, you could add printfs near amd64/amd64/machdep.c:1517

	/* Map the message buffer. */
	msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]);

to show the values of all the participants, i.e. msgbufp, pa_indx
and phys_avail[pa_indx].



Re: Panic with chromium and 8.1-STABLE (Thu Sep 16 09:52:17 BRT 2010)

2010-09-22 Thread Kostik Belousov

On Sun, Sep 19, 2010 at 07:28:13PM -0300, Mario Sergio Fujikawa Ferreira wrote:
 Hi,
 
   I've just began trying chrome web browser from
 http://chromium.hybridsource.org/ but it triggered 2 panics on my
 8.1-STABLE system.
 
 $ uname -a
 FreeBSD exxodus.fedaykin.here 8.1-STABLE FreeBSD 8.1-STABLE #26: Thu Sep 16 
 09:52:17 BRT 2010 li...@exxodus:/usr/obj/usr/src/sys/LIOUX  amd64
 
   The panic information is:
 
 
 panic: vm_page_unwire: invalid wire count: 0
 cpuid = 0
 KDB: enter: panic
 
 0xff006ecce000: tag ufs, type VREG
 usecount 1, writecount 1, refcount 4 mountedhere 0
 flags ()
 v_object 0xff0151489870 ref 0 pages 8
 lock type ufs: EXCL by thread 0xff00200947c0 (pid 25025)
 ino 119526591, on dev ufs/fsusr
 
 0xff011107f938: tag ufs, type VREG
 usecount 0, writecount 0, refcount 4 mountedhere 0
 flags (VV_NOSYNC|VI_DOINGINACT)
 v_object 0xff0151f7f870 ref 0 pages 1284
 lock type ufs: EXCL by thread 0xff01882cc7c0 (pid 26689)
 ino 263, on dev md0
 
 
   I've made available 2 ddb textdumps at:
 
 http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.0
 http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.1
 
   I was able to use chrome prior to this latest kernel update.
 Now, I can reproduce a kernel panic even browsing www.google.com
 
   Please, let me know if I can provide any further information.

Does it panic if you remove ZERO_COPY_SOCKETS option from the kernel
config ?



Re: Panic with chromium and 8.1-STABLE (Thu Sep 16 09:52:17 BRT 2010)

2010-09-22 Thread Kostik Belousov

On Wed, Sep 22, 2010 at 03:58:12PM -0400, jhell wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 09/22/2010 09:28, Kostik Belousov wrote:
  On Sun, Sep 19, 2010 at 07:28:13PM -0300, Mario Sergio Fujikawa Ferreira 
  wrote:
  Hi,
 
 I've just began trying chrome web browser from
  http://chromium.hybridsource.org/ but it triggered 2 panics on my
  8.1-STABLE system.
 
  $ uname -a
  FreeBSD exxodus.fedaykin.here 8.1-STABLE FreeBSD 8.1-STABLE #26: Thu Sep 
  16 09:52:17 BRT 2010 li...@exxodus:/usr/obj/usr/src/sys/LIOUX  amd64
 
 The panic information is:
 
  
  panic: vm_page_unwire: invalid wire count: 0
  cpuid = 0
  KDB: enter: panic
 
  0xff006ecce000: tag ufs, type VREG
  usecount 1, writecount 1, refcount 4 mountedhere 0
  flags ()
  v_object 0xff0151489870 ref 0 pages 8
  lock type ufs: EXCL by thread 0xff00200947c0 (pid 25025)
  ino 119526591, on dev ufs/fsusr
 
  0xff011107f938: tag ufs, type VREG
  usecount 0, writecount 0, refcount 4 mountedhere 0
  flags (VV_NOSYNC|VI_DOINGINACT)
  v_object 0xff0151f7f870 ref 0 pages 1284
  lock type ufs: EXCL by thread 0xff01882cc7c0 (pid 26689)
  ino 263, on dev md0
  
 
 I've made available 2 ddb textdumps at:
 
  http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.0
  http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.1
 
 I was able to use chrome prior to this latest kernel update.
  Now, I can reproduce a kernel panic even browsing www.google.com
 
 Please, let me know if I can provide any further information.
  
  Does it panic if you remove ZERO_COPY_SOCKETS option from the kernel
  config ?
 
 This is triggered as well on a system without ZERO_COPY_SOCKETS just to
 clear that bit up.
I do not know what prompted you to decide that the issue is the same.
There is nothing in common between the report by lioux and your
backtraces except the word "panic".

You may have better luck showing your traces on fs@ or asking the zfs
porters directly.



Re: SuperMicro i7 (UP) - very slow performance

2010-09-18 Thread Kostik Belousov

On Sat, Sep 18, 2010 at 08:32:32AM -0500, Bryce Edwards wrote:
 I have a Supermicro with the C7X58 motherboard and an i7 930 cpu, and
 it is nowhere near the performance it should be.  A buildworld just
 took 22.5 hours!
I use a 5046A-XB with an i7-930 as my home workstation, running the latest
RELENG_8, and I do not have the problem you describe. My BIOS is v1.1, and
legacy USB is enabled.

I did notice one hardware issue: the built-in FireWire controller generated
an excessively high interrupt rate, so I usually do not load firewire.ko
unless it is needed.

 
 br...@tahiti[~]uname -a
 FreeBSD tahiti.bryce.net 8.1-STABLE FreeBSD 8.1-STABLE #0: Tue Sep  7
 22:45:38 CDT 2010
 r...@tahiti.bryce.net:/usr/obj/usr/src/sys/GENERIC  amd64
 
 I have disabled Legacy USB Support in the BIOS and that helped, but
 I'm not finding any other setting that are getting things where they
 need to be.
 
 I have tested the two system drives independently (currently a zfs
 mirror), so it is not likely to be an hdd issue.
 
 Here's the verbose dmesg boot details - http://www.bryce.net/files/dmesg.boot
 
 And, the IPMI ASL in case that is of any value -
 http://www.bryce.net/files/tahiti.asl
 
 Currently, I'm not running powerd, performance is not better with it running.
 
 r...@tahiti[/usr/src]#cat /boot/loader.conf
 ahci_load=YES
 coretemp_load=YES
 
 zfs_load=YES
 vfs.root.mountfrom=zfs:system
 #vfs.zfs.prefetch_disable=1
 
 kern.maxfiles=16384
 
 # async i/o
 aio_load=YES
 
 # VirtualBox
 #vboxdrv_load=YES
 
 # SMB
 #ichsmb_load=YES
 #smb_load=YES
 
 # Power Saving
 #kern.hz=100
 
 # Disable APIC subsystem - no longer needed when disabling lapic below
 #hint.apic.0.disabled=1
 
 # Disable local APIC (LAPIC) timer - for C3 state
 #hint.apic.0.clock=0
 
 # Avoid 128 interrupts/sec per core, at cost of scheduling precision
 #hint.atrtc.0.clock=0
 
 # Disable throttle control (and rely on EIST)
 hint.p4tcc.0.disabled=1
 hint.acpi_throttle.0.disabled=1
 
 Thanks in advance for your time!
 
 ::Bryce::



Re: strange problem with FreeBSD 7.3 64bit

2010-09-10 Thread Kostik Belousov

On Fri, Sep 10, 2010 at 10:45:08AM +0200, freebsd wrote:
 hi list,

 we upgraded some 20 boxes from 7.1 and 7.2 to 7.3-RELEASE-p2 (all amd64)
 and now are experiencing some weird behaviour on 6 of them with rsnapshot:

 after a few days/several weeks (seems to be completely random),
 rsnapshot reports that it can't start due it's lockfile and process
 still being present. on such boxes either a zombie rm or find process
 (which presumably were launched by rsnapshot) can be found.
 if the backup was done to a separate partition (physical disks or RAIDs)
 any access (ls, stat, fsck, etc) to the partition would kill the current
 SSH session, creating a new zombie of the process one just started.
 unmounting the affected partition would render the server completely
 unresponsive and required a hardware reset.

 when trying to restart, the machines wouldn't even shut down completely
 but hanged somewhere after syncing buffers, only a hardware reset
 worked. after the reboot, those partitions were unmounted and fscked.
 after which the backups would work again until the next error happened
 again.

 the hardware of affected and unaffected system are:

 HP ProLiant DL380 G4
 HP ProLiant DL380 G5
 HP ProLiant DL360 G5

 there is no visible pattern between affected and unaffected boxes. also
 those machines were upgraded the exact same way, running identical
 kernels (more or less GENERIC, with QUOTA activated).

 we upgraded the most critical boxes which showed that behaviour on a
 daily interval to 8.0-RELEASE and ever since this behavior has
 disappeared since nearly 3 months now.

 we installed a debug-kernel on an affected box, but the machine wouldn't
 panic when the error occured. when trying to unmount the affected
 partition it just went completely unresponsive, as mentioned above.

 before trying to unmount procstat -ak showed some processes with
 VOP_LOCK1_APV:

 55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire
 _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup
 vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
 70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire
 _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf
 ufs_lookup_ vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat

 since this hardware has been working before 7.3 and -- as we assume --
 would work again with 8.*, we would be grateful for any hints what could
 be the cause of all this.
It sounds like a deadlock, but the cause cannot be identified without
further diagnostics. It might be the driver (ciss, I assume), but it may
also be the quota code, or something else entirely.

Please follow the
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
to obtain the required information.


Re: strange problem with FreeBSD 7.3 64bit

2010-09-10 Thread Kostik Belousov

On Fri, Sep 10, 2010 at 12:04:50PM +0200, freebsd wrote:
 Am 10.09.2010 11:21, schrieb Kostik Belousov:
 On Fri, Sep 10, 2010 at 10:45:08AM +0200, freebsd wrote:
 It sounds like a deadlock, but the cause cannot be identified without
 further diagnostic. It might be driver (ciss I assume), but may be quota
 code, or even something else.
 
 Please follow the
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
 to obtain the required information.
 
 thanks for the quick answer.
 we've added the additional options to debug deadlocks. we'll have the 
 required information in the timeframe of 1-2 weeks, since the testbox 
 isn't that fast at generating the error.
 
 QUOTA most likely isn't the culprit, since 2 of the affected 6 boxes 
 were running GENERIC w/o any modifications.
Ah. Then ciss(4) is the main suspect, but I cannot help with it.



Re: csup in repomirror mode dumps core @ stable/8

2010-09-02 Thread Kostik Belousov

On Thu, Sep 02, 2010 at 03:59:07AM +0400, Dmitry Morozovsky wrote:
 Dear colleagues,
 
 some 2 days ago my repo mirror (stable/8...@amd64) starts dumping core on 
 copying 
 repo:
 
 ...
  SetAttrs CVSROOT-src/Emptydir
  Edit CVSROOT-src/access,v
 Segmentation fault (core dumped)
 
 deleting files from sup/cvsroot-all/ did not help
 
 unfortunately, quick usual `make -DDEBUG_FLAGS=-g' in /usr/src/usr.bin/csup 
 does not work, and I did not dig into this deeply yet, so trace are without 
 parameters:
I think it should be `make DEBUG_FLAGS=-g', without the -D.



Re: STABLE kernel panic: privileged instruction fault

2010-08-16 Thread Kostik Belousov

On Mon, Aug 16, 2010 at 07:15:16PM +0400, Alexey Tarasov wrote:
 Hello.
 
 I have a couple of Supermicro servers which got the similar kernel panic with 
 all FreeBSD versions I tried since 6.4.
 Now I want to investigate into the problem.
 The servers get into panic with similar workload: file server with a lot of 
 files and connections. Web server software is nginx. File system is 
 UFS+GJOURNAL. Outgoing traffic on each server is ~10 MB/s.
 I think it is not software problem, because when I've installed Linux with 
 such configuration there were no kernel panics.
 
 Here is the short overview of the hardware:
 
 CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.51-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf65  Family = f  Model = 6  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe59d<SSE3,DTES64,MON,DS_CPL,EST,TM2,CNXT-ID,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
 real memory  = 2147483648 (2048 MB)
 avail memory = 2054619136 (1959 MB)
 
 DMESG: http://lexasoft.ru/m/dmesg.txt
 
 CORE: http://lexasoft.ru/m/core.txt
 
 Fatal trap 1: privileged instruction fault while in kernel mode
 cpuid = 1; apic id = 01
 instruction pointer = 0x20:0xff8040d2cc83
 stack pointer   = 0x28:0xff8040d2ca80
 frame pointer   = 0x28:0xff0060c0b740
 code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 9388 (nginx)
 trap number = 1
 panic: privileged instruction fault
 cpuid = 1
 Uptime: 17d15h48m49s
 Physical memory: 2032 MB
 Dumping 1485 MB: 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 
 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 
 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 
 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 
 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 
 142 126 110 94 78 62 46 30 14
 
 
 (kgdb) #0  doadump () at pcpu.h:223
 #1  0x80590c59 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:416
 #2  0x8059108c in panic (fmt=0x80951fc4 %s)
at /usr/src/sys/kern/kern_shutdown.c:579
 #3  0x80878fd8 in trap_fatal (frame=0xff0060c0b740, eva=Variable 
 eva is not available.
 )
at /usr/src/sys/amd64/amd64/trap.c:857
 #4  0x808799ea in trap (frame=0xff8040d2c9d0)
at /usr/src/sys/amd64/amd64/trap.c:644
 #5  0x8085f983 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:224
 #6  0xff8040d2cc83 in ?? ()
 #7  0xff8040d2cb50 in ?? ()
 #8  0xff8040d2caf0 in ?? ()
 #9  0xff8040d2cbf0 in ?? ()
 #10 0xff0060c0b740 in ?? ()
 #11 0x80b83c60 in sysent ()
 #12 0xff8040d2cc80 in ?? ()
 #13 0xff8040d2cae0 in ?? ()
 #14 0x8059c431 in bintime (bt=0x80ad3140)
at /usr/src/sys/kern/kern_tc.c:200
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) 
The backtrace makes absolutely no sense. I would not trust kgdb here anyway.

Compile DDB into the kernel and take a backtrace on the console when it
panics. Also, disassemble the kernel at the fault address; I am very curious
which instruction causes this. This is a stock GENERIC kernel booted on bare
metal, right?
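Below is a minimal sketch of what compiling ddb in amounts to: two options appended to a copy of the GENERIC config. The comments name the two ddb commands that produce the requested data. The address in the comment is a placeholder for the instruction pointer that the panic message reports.

```
# Appended to a copy of GENERIC so that the panic drops into ddb on the console
options         KDB     # kernel debugger framework
options         DDB     # interactive in-kernel debugger
# At the db> prompt after the panic:
#   bt                    backtrace of the panicking thread
#   x/i <fault-address>   disassemble the instruction at the fault address
```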



Re: STABLE kernel panic: privileged instruction fault

2010-08-16 Thread Kostik Belousov

On Mon, Aug 16, 2010 at 11:21:15PM +0400, Alexey Tarasov wrote:
 Hello Kostik!
 
 On Aug 16, 2010, at 10:48 PM, Kostik Belousov wrote:
 
  
  The backtrace make absolutely no sense. I would not trust kgdb anyway.
  
  Compile ddb in and do backtrace in console on the panic. Also, disassemble
  the kernel at the fault address. I am very curious which instruction causes
  this. This is stock GENERIC on the bare metal booted, right ?
 
 Yes, stock GENERIC.
 
 Please, check this out:
 
 Dump of assembler code from 0xff0060c0b700 to 0xff0060c0b780:

It would be nice if you kept all the requested data in one place, so that
we do not need to search old mails to see the context.

According to your previous mail, the fault happened at the address
instruction pointer = 0x20:0xff8040d2cc83
You disassembled the stack instead. Please just do
disass 0xff8040d2cc83,0xff8040d2cca0
in kgdb.

But I also want to see the backtrace and disassembly output from ddb.



Re: STABLE kernel panic: privileged instruction fault

2010-08-16 Thread Kostik Belousov

On Mon, Aug 16, 2010 at 11:35:36PM +0400, Alexey Tarasov wrote:
 
 On Aug 16, 2010, at 11:31 PM, Kostik Belousov wrote:
 
  On Mon, Aug 16, 2010 at 11:21:15PM +0400, Alexey Tarasov wrote:
  Hello Kostik!
  
  On Aug 16, 2010, at 10:48 PM, Kostik Belousov wrote:
  
  
  The backtrace make absolutely no sense. I would not trust kgdb anyway.
  
  Compile ddb in and do backtrace in console on the panic. Also, disassemble
  the kernel at the fault address. I am very curious which instruction 
  causes
  this. This is stock GENERIC on the bare metal booted, right ?
  
  Yes, stock GENERIC.
  
  Please, check this out:
  
  Dump of assembler code from 0xff0060c0b700 to 0xff0060c0b780:
  
  Would be nice if you keep all requested data in one place, so that
  we do not need to search for the old mails to see the context.
  
  According to your previous mail, the fault happen at the
  address
  instruction pointer = 0x20:0xff8040d2cc83
  Your disassembled the stack instead. Please just do
  disass 0xff8040d2cc83,0xff8040d2cca0
  in kgdb.
  
  But also, I want to see the backtrace and disassembly output from ddb.
 
 (kgdb) disass 0xff8040d2cc83,0xff8040d2cca0
 No function contains specified address.
Err, it seems that the old gdb accepts only spaces between the two
addresses. Please try
disass 0xff8040d2cc83 0xff8040d2cca0
instead.

 
 I will build kernel with DDB tomorrow, install it on some servers and wait 
 for the panic occurs.

Ok. Did you check for things such as rootkits?



Re: Kernel symbol file alternate location

2010-08-06 Thread Kostik Belousov

On Fri, Aug 06, 2010 at 09:29:31AM +0200, Oliver Fromme wrote:
 Daniel O'Connor wrote:
   On 06/08/2010, at 2:38, Oliver Fromme wrote:
 I think this is the main reason / has had to grow - the actual kernel
 is relatively small so even a 256Mb / could hold several, but with
 the symbol files it is not possible.

I think a very simple solution would be to install the symbol
files elsewhere (probably configurable via make.conf), and
install symlinks in the kernel directory.  If you do this,
tools using the symbol files won't have to be changed.

This would probably be a fairly trivial change to the install-
kernel target, I guess.  I don't have patches, though.
   
   Yeah, I don't think it's hard to move them, however I'm worried what
   it will break :)
  
   The only thing I can see that would have to change would be kgdb so
   it tells gdb where to find the symbols.
 
 That's why I suggested to place symlinks in the kernel
 directory.  No change to kgdb necessary.
 
 It might even be possible to not install the symbol files
 at all, but keep them under /usr/obj, so the installkernel
 target would have to do nothing more than create symlinks.
 This could be controlled by a make.conf variable, like
 SYMLINK_SYMBOLS=YES (NO would be the existing behaviour
 of installing the actual symbol files in /boot/kernel).

If you keep /usr/obj around, you do not need the installed symbol files
at all, and INSTALL_NODEBUG?=true in make.conf is enough. You can always
point kgdb at kernel.debug and at the modules with debugging symbols in
the build directory.


