Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-27 Thread Emile `iMil' Heitor

On Thu, 26 Nov 2015, Emile `iMil' Heitor wrote:

> If this issue _is_ NFS related, which I doubt now, it is then read-related, as
> the build is done in tmpfs.


Pushing the logic further, I just tried with pkgsrc itself being in tmpfs, and
it froze even faster.


Emile `iMil' Heitor * 
                                                  _
| http://imil.net        | ASCII ribbon campaign ( )
| http://www.NetBSD.org  |  - against HTML email  X
| http://gcu.info        |  & vCards             / \



Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread Manuel Bouyer
On Thu, Nov 26, 2015 at 07:13:04PM +0100, Emile `iMil' Heitor wrote:
> On Thu, 26 Nov 2015, Manuel Bouyer wrote:
> 
> >what does 'show uvm' report ?
> 
> db{0}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> , ncolors=8  7444115 VM pages: 53990 active, 1807 inactive, 1 wired, 7302474 free

OK, so it's not an "out of memory" issue.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread Emile `iMil' Heitor

On Thu, 26 Nov 2015, Manuel Bouyer wrote:


> what does 'show uvm' report ?


db{0}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
, ncolors=8  7444115 VM pages: 53990 active, 1807 inactive, 1 wired, 7302474 free
  pages  26626 anon, 25029 file, 4143 exec
  freemin=4096, free-target=5461, wired-max=2481371
  cpu0:
faults=2789285, traps=2796277, intrs=1396531, ctxswitch=1585247
softint=792597, syscalls=1259477
  cpu1:
faults=1696656, traps=1698017, intrs=180486, ctxswitch=127378
softint=36160, syscalls=608644
  cpu2:
faults=1207093, traps=1208093, intrs=160266, ctxswitch=65538
softint=18178, syscalls=412550
  cpu3:
faults=1434344, traps=1435516, intrs=174413, ctxswitch=100028
softint=24126, syscalls=512909
  cpu4:
faults=1273978, traps=1275187, intrs=161384, ctxswitch=68847
softint=19305, syscalls=424913
  cpu5:
faults=1622825, traps=1624084, intrs=171817, ctxswitch=105319
softint=31330, syscalls=510165
  cpu6:
faults=1734292, traps=1735749, intrs=170374, ctxswitch=99131
softint=26841, syscalls=551106
  cpu7:
faults=1392652, traps=1393985, intrs=166582, ctxswitch=81469
softint=20174, syscalls=442880
  cpu8:
faults=1492063, traps=1493265, intrs=166791, ctxswitch=88768
softint=24325, syscalls=492824
  cpu9:
faults=1579170, traps=1580406, intrs=167471, ctxswitch=89049
softint=23423, syscalls=506804
  cpu10:
faults=2153399, traps=2154831, intrs=184225, ctxswitch=149924
softint=40597, syscalls=828691
  cpu11:
faults=3136585, traps=3138031, intrs=219926, ctxswitch=251413
softint=67270, syscalls=1262227
  cpu12:
faults=4211510, traps=4213265, intrs=222549, ctxswitch=273560
softint=78470, syscalls=1584403
  cpu13:
faults=3938228, traps=3940765, intrs=252763, ctxswitch=368601
softint=110598, syscalls=1636441
  cpu14:
faults=1720207, traps=1721476, intrs=183332, ctxswitch=138148
softint=43486, syscalls=759336
  cpu15:
faults=1547431, traps=1548457, intrs=177462, ctxswitch=126099
softint=36803, syscalls=657976
  fault counts:
noram=0, noanon=0, pgwait=0, pgrele=0
ok relocks(total)=19975519(19975516), anget(retrys)=1498606(0), amapcopy=1862658
neighbor anon/obj pg=1672558/779689, gets(lock/unlock)=20195408/19975523
cases: anon=1035657, anoncow=462949, obj=18439642, prcopy=1755801, przero=11148433
  daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=0, swpgavail=0
swpages=0, swpginuse=0, swpgonly=0, paging=0
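
As a rough cross-check of the figures above, here is a minimal Python sketch
that converts the page counts to GiB and applies a simplified test for memory
starvation. The dictionary keys and the `looks_memory_starved` heuristic are
illustrative assumptions, not the kernel's actual pagedaemon logic; only the
numbers come from the dump.

```python
# Values copied from the 'show uvm' dump above.
PAGE_SIZE = 4096  # pagesize=4096

UVM = {
    "total": 7444115,
    "active": 53990,
    "inactive": 1807,
    "wired": 1,
    "free": 7302474,
    "freemin": 4096,
    "noram": 0,    # fault counts: noram
    "pgwait": 0,   # fault counts: pgwait
    "pageouts": 0, # daemon and swap counts: pageouts
}


def gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


def looks_memory_starved(stats):
    """Simplified heuristic (an assumption, not the kernel's real logic):
    flag starvation if free pages are below freemin, or if any fault had
    to wait for memory (noram/pgwait non-zero)."""
    return (
        stats["free"] < stats["freemin"]
        or stats["noram"] > 0
        or stats["pgwait"] > 0
    )


print(f"free: {gib(UVM['free']):.1f} GiB of {gib(UVM['total']):.1f} GiB")
print("memory starved:", looks_memory_starved(UVM))
```

With almost all pages free and no page-outs recorded, this heuristic does not
flag the dump as memory-starved.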






Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread J. Hannken-Illjes
On 26 Nov 2015, at 17:22, Emile `iMil' Heitor  wrote:

> 
> 24269   1 3  14     0 a0003f446a80 cat         nfsrcv
> 1868    1 3  15     0 a0003dca59c0 getty       nfsrcv
> 2354    1 3   5     0 a0003ed51b00 cron        nfsrcv
> 2086    1 3  14   100 a0003eca46a0 qmgr        nfsrcv
> 677     1 3   5     0 a0003dd6fa40 syslogd     nfsrcv
> 0     131 3   3   200 a0003d75d140 nfskqpoll   nfsrcv

Looks like a NFS problem, too many threads in nfsrcv...

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread Manuel Bouyer
On Thu, Nov 26, 2015 at 05:22:16PM +0100, Emile `iMil' Heitor wrote:
> On Thu, 26 Nov 2015, Emile `iMil' Heitor wrote:
> 
> >Again, as there's no log at all, what would help debugging this behaviour?
> 
> FWIW, some ddb output (ddb is triggered by hitting + on domU's console):

what does 'show uvm' report ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread Manuel Bouyer
On Thu, Nov 26, 2015 at 03:55:27PM +0100, Emile `iMil' Heitor wrote:
> [...]
> Again, as there's no log at all, what would help debugging this behaviour?

Can you enter ddb on the console? (On a PV domU this is done with '+++',
not break, which doesn't exist for xl console.)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread Emile `iMil' Heitor

On Thu, 26 Nov 2015, Emile `iMil' Heitor wrote:


> Again, as there's no log at all, what would help debugging this behaviour?


FWIW, some ddb output (ddb is triggered by hitting + on domU's console):

fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8012e5ad cs e030 rflags 202 cr2 7f7ff6c1e049
ilevel 8 rsp a0051864cc58
curlwp 0xa00035538840 pid 0.2 lowest kstack 0xa0051864a2c0
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
xencons_tty_input() at netbsd:xencons_tty_input+0xb2
xencons_handler() at netbsd:xencons_handler+0x65
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
evtchn_do_event() at netbsd:evtchn_do_event+0x281
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x143
hypervisor_callback() at netbsd:hypervisor_callback+0x9e
idle_loop() at netbsd:idle_loop+0xe8
ds  3c80
es  c780
fs  c040
gs  7524
rdi a0003a62b330
rsi 8437d01f
rbp a0051864cc58
rbx 8437d01f
rdx 2b
rcx 2b
rax 1
r8  0
r9  805fc780 cpu_info_primary
r10 cdd9f51e239cbb87
r11 246
r12 a0003d754c00
r13 8437d020
r14 a0003a62b330
r15 1
rip 8012e5ad breakpoint+0x5
cs  e030
rflags  202
rsp a0051864cc58
ss  e02b
netbsd:breakpoint+0x5:  leave
db{0}> ps
PID   LID S CPU FLAGS STRUCT LWP * NAME        WAIT
16259   1 3  13     0 a0003f435080 awk         netio
24269   1 3  14     0 a0003f446a80 cat         nfsrcv
23301   1 3  14    80 a0003f3f8a60 sh          wait
11835   1 3  14    80 a0003f40c0c0 bmake       wait
9493    1 3  14    80 a0003f26d4a0 sh          wait
4831    1 3  14    80 a0003f42d320 bmake       wait
28513   1 3  14    80 a0003f3f09e0 sh          wait
6232    1 3  15    80 a0003ed15ae0 sh          wait
1172    1 3  15    80 a0003f083000 bmake       wait
16544   1 3  15    80 a0003f439500 sh          wait
18141   1 3  15    80 a0003f430420 sh          wait
2349    1 3  15    80 a0003f448280 bash        wait
490     1 3   5    80 a0003f447680 sshd        select
2135    1 3   0    80 a0003f445640 sshd        select
2234    1 3   4    80 a0003f2be580 bash        ttyraw
381     1 3  12    80 a0003f2be9a0 sshd        select
382     1 3  13    80 a0003e9cc680 sshd        select
1868    1 3  15     0 a0003dca59c0 getty       nfsrcv
2354    1 3   5     0 a0003ed51b00 cron        nfsrcv
1675    1 3  11    80 a0003edbd2e0 inetd       kqueue
2105    1 3  12    80 a0003ee0d720 nrpe        select
2086    1 3  14   100 a0003eca46a0 qmgr        nfsrcv
2033    1 3  13     0 a0003eca4ac0 pickup      nfskqdet
2055    1 3   4     0 a0003edbd700 master      tstile
1640   13 5  11  1000 a0003efff740 python2.7
1640    9 3  11    80 a0003ee0db40 python2.7   kqueue
1640    8 3  12    80 a0003ed152a0 python2.7   kqueue
1640    1 3   9    80 a0003e0f6240 python2.7   select
1555    1 3  13    80 a0003e0f6660 sshd        select
1407    1 3  13    80 a0003dd15a20 powerd      kqueue
892     1 3   2    80 a0003e099640 rpc.lockd   select
884     1 3  15    80 a0003e099a60 rpc.statd   select
686     1 3   7    80 a0003dd151e0 rpcbind     select
677     1 3   5     0 a0003dd6fa40 syslogd     nfsrcv
1       1 3   8    80 a0003d75b100 init        wait
0     131 3   3   200 a0003d75d140 nfskqpoll   nfsrcv
0     129 3   4   200 a0003dc4e160 aiodoned    aiodoned
0     128 3   7   200 a0003dc4e580 ioflush     syncer
0     127 3   0   200 a0003dc4e9a0 pgdaemon    pgdaemon
0     124 3  14   200 a0003d75a920 nfsio       nfsiod
0     123 3  13   200 a0003d75a500 nfsio       nfsiod
0     122 3   9   200 a0003d75a0e0 nfsio       nfsiod
0     121 3  15   200 a0003d75b940 nfsio       nfsiod
0     120 3   0   200 a0003d75b520 cryptoret   crypto_w
0     119 3   0   200 a0003d7530c0 unpgc       unpgc
0     118 3   0   200 a0003d75c960 xen_balloon xen_balloon
0     117 3   9   200 a0003d75c540 vmem_rehash vmem_rehash
0     116 3   0   200 a0003d75d980 xenbus      r
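
One way to see which wait channels dominate a listing like the one above is to
tally the last column. The following is a minimal Python sketch; the rows are
a hand-picked subset of the `ps` output with the columns separated by hand,
and the names `PS_ROWS` and `wait_channels` are illustrative assumptions, not
part of any NetBSD tool.

```python
from collections import Counter

# Subset of the ddb 'ps' rows above, re-spaced by hand so that each
# field is whitespace-separated:
# PID LID S CPU FLAGS STRUCT-LWP NAME [WAIT]
PS_ROWS = """\
16259 1 3 13 0 a0003f435080 awk netio
24269 1 3 14 0 a0003f446a80 cat nfsrcv
23301 1 3 14 80 a0003f3f8a60 sh wait
1868 1 3 15 0 a0003dca59c0 getty nfsrcv
2354 1 3 5 0 a0003ed51b00 cron nfsrcv
2086 1 3 14 100 a0003eca46a0 qmgr nfsrcv
2055 1 3 4 0 a0003edbd700 master tstile
1640 13 5 11 1000 a0003efff740 python2.7
677 1 3 5 0 a0003dd6fa40 syslogd nfsrcv
0 131 3 3 200 a0003d75d140 nfskqpoll nfsrcv
"""


def wait_channels(text):
    """Count LWPs per wait channel; rows without one are filed as '(none)'."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # not a process row
        wchan = fields[7] if len(fields) > 7 else "(none)"
        counts[wchan] += 1
    return counts


for wchan, n in wait_channels(PS_ROWS).most_common():
    print(f"{wchan:10} {n}")
```

In this subset, `nfsrcv` is the most common wait channel, and the LWPs
sleeping on it include unrelated daemons such as getty, cron and syslogd.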

NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)

2015-11-26 Thread Emile `iMil' Heitor

On Mon, 2 Nov 2015, Emile `iMil' Heitor wrote:


> I've been trying to get rid of those hangs for weeks now and have tried every
> mount flag combination without success: the system freezes randomly, leaving
> the whole OS unresponsive. There's no log and no kernel message; the domU
> still responds to network solicitations (ping, telnet 22...), but once it's
> frozen it is impossible to run any command, it will just hang.
>
> The exact same setup has been running successfully since Sept 2014 on
> NetBSD 6.1/amd64.
>
> Any idea how to get some valuable information to help track down this
> awful behaviour?


A bit of follow-up. I've been trying many workarounds during the past weeks, and
right now I'm not convinced it is even an NFS problem.
I've set up a tmpfs bulk build directory, and even that way, NetBSD 7.0
freezes randomly after a couple of minutes while processing `pbulk.sh'.
What I can say:

- the server is a fresh diskless NetBSD 7.0 domU (PXE/NFS)
- there's no information at all about the freeze, not even on the console
- I've only witnessed those freezes when calling `pbulk.sh' (couldn't get
  further anyway)
- cvs co pkgsrc does not freeze; I ran it many times without issues
- the domU stays up for days if no operation is made
- I started this domU on various dom0s to validate this was not a hardware
  problem, always had the same symptoms
- I tried a custom 7.0_STABLE kernel without success

If this issue _is_ NFS related, which I doubt now, it is then read-related, as
the build is done in tmpfs.

Again, as there's no log at all, what would help debugging this behaviour?





Re: Raspberry Pi 2, nfs mount hangs after some time

2015-11-02 Thread Christos Zoulas
In article ,
Emile `iMil' Heitor  wrote:
>On Mon, 2 Nov 2015, Christos Zoulas wrote:
>
>> Can you get into ddb?
>
>unfortunately no, the system hangs but does not panic, it just becomes 
>unusable.

If you start running crash on the console and leave it running, is
crash responsive when it hangs? Alternatively, since it is pingable,
you can implement a ping of death that calls the debugger or
crash-dumps and reboots. It seems strange to me though that the
system would be pingable and you can't get into the debugger.

christos



Re: Raspberry Pi 2, nfs mount hangs after some time

2015-11-02 Thread Emile `iMil' Heitor

On Mon, 2 Nov 2015, Christos Zoulas wrote:


> Can you get into ddb?


Unfortunately no, the system hangs but does not panic, it just becomes unusable.





Re: Raspberry Pi 2, nfs mount hangs after some time

2015-11-02 Thread Emile `iMil' Heitor

On Mon, 26 Oct 2015, Robert Elz wrote:


> For workarounds, mount using tcp (won't cure the problem, but will
> make it far less common), and use interruptible mounts (mount_nfs -T -i)
> so when it does hang, you can kill the process(es) at least.


A me-too reply.

I've set up a NetBSD 7.0/amd64 bulk-build domU as I did for NetBSD 6.1/amd64; it
uses our platform's NetApp NFS servers (thousands of Linux domUs are using
those, so the hardware is not guilty).
I've been trying to get rid of those hangs for weeks now and have tried every
mount flag combination without success: the system freezes randomly, leaving
the whole OS unresponsive. There's no log and no kernel message; the domU
still responds to network solicitations (ping, telnet 22...), but once it's
frozen it is impossible to run any command, it will just hang.

The exact same setup has been running successfully since Sept 2014 on
NetBSD 6.1/amd64.

Any idea how to get some valuable information to help track down this
awful behaviour?





Re: Raspberry Pi 2, nfs mount hangs after some time

2015-10-26 Thread Rhialto
On Mon 26 Oct 2015 at 20:44:15 +0700, Robert Elz wrote:
> Almost for sure the trigger is lost packets, perhaps only in some specific

Just recently I posted about a case where the queue of outstanding
requests (I think) got corrupted, leading either to hangs or kernel
crashes :-(

> Date: Tue, 20 Oct 2015 00:32:55 +0200
> From: Rhialto 
> Subject: NFS related panic? (was: Re: Killing a zombie process?)
> To: current-us...@netbsd.org

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl-- 'this bath is too hot.'




Re: Raspberry Pi 2, nfs mount hangs after some time

2015-10-26 Thread Robert Elz
Date: Mon, 26 Oct 2015 19:39:23 +0700
From: Pongthep Kulkrisada
Message-ID: <20151026123923.g...@gmail.com>

  | Could you please be specific or kindly give some examples?

It has been a long time, but if you search the list archives you'll
find many references to hangs using NFS.   I recall no actual fixes
(plenty of avoidance suggestions.)

Almost for sure the trigger is lost packets, perhaps only in some specific
situations, rather than just any lost packets, which is why tcp helps,
it will usually recover fine, and all NFS sees is a slight delay.

That would mean that if (in practice) your net loses very few packets, you
might never experience a problem.

But try swamping the net &/or server (mbufs in particular) while attempting
a UDP mounted busy NFS connection (ie: force lost packets) and I expect
you'll see the problem too.

When we were doing this we had dozens of busy nfs clients (or as busy
as SS1's and 486 class PCs of the vintage can ever be) sharing a 10Mbps
(thin wire) ethernet - lost packets would have been common.   Hangs were
not exactly frequent, even there, but often enough to be noticeable.

kre



Re: Raspberry Pi 2, nfs mount hangs after some time

2015-10-26 Thread Pongthep Kulkrisada
* Robert Elz (k...@munnari.oz.au) wrote:
> Issue, yes, NetBSD's nfs client has been buggy approximately forever
> (I used to see that problem back in 1.3 days (ie: ~ 20 years ago), and
> have never seen anything that could even remotely be considered a fix being
> committed.)
> 
> NFS server on NetBSD is just fine, and
> perfectly safe to use, it is just the client that tends to hang.

Could you please be specific or kindly give some examples?
I have been using NFS client on NetBSD for a while without issues.
(NFS server on OS-X)

-- 
Pongthep Kulkrisada
 
"UNIX is basically a simple operating system,
but you have to be a genius to understand the simplicity."
-- Dennis M. Ritchie


Re: Raspberry Pi 2, nfs mount hangs after some time

2015-10-26 Thread Robert Elz
Date: Mon, 26 Oct 2015 09:41:14 +0530
From: Mayuresh
Message-ID: <20151026041114.GA29429@odin>

  | after a while the nfs mount got hung.

What mount options did you use?

  | Is there any known issue and workaround for this.

Issue, yes, NetBSD's nfs client has been buggy approximately forever
(I used to see that problem back in 1.3 days (ie: ~ 20 years ago), and
have never seen anything that could even remotely be considered a fix being
committed.)

For workarounds, mount using tcp (won't cure the problem, but will
make it far less common), and use interruptible mounts (mount_nfs -T -i)
so when it does hang, you can kill the process(es) at least.

  | I tried mounting an NFS share from Linux, though got "RPC Program not
  | registered error". Not sure why.

Something that you didn't start that linux expects probably.   But that
would make no difference, the NFS server on NetBSD is just fine, and
perfectly safe to use, it is just the client that tends to hang.

kre
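
The workaround described above (a TCP, interruptible mount via
`mount_nfs -T -i`) could be scripted roughly as follows. This is a minimal
sketch only: the helper name `mount_nfs_argv`, the server name, the export and
the mount point are all hypothetical placeholders, and only the `-T` and `-i`
flags come from the message itself.

```python
import shlex


def mount_nfs_argv(server, export, mountpoint, tcp=True, interruptible=True):
    """Build the argument list for mount_nfs without running it.

    -T and -i are the flags quoted in the message above (TCP transport and
    interruptible mount); everything else is a caller-supplied placeholder.
    """
    argv = ["mount_nfs"]
    if tcp:
        argv.append("-T")
    if interruptible:
        argv.append("-i")
    argv += [f"{server}:{export}", mountpoint]
    return argv


if __name__ == "__main__":
    # Hypothetical server, export and mount point, for illustration only.
    argv = mount_nfs_argv("nfs.example.org", "/export/pkgsrc", "/usr/pkgsrc")
    print(shlex.join(argv))
```

The helper only builds the command line; passing the list to
`subprocess.run` as root would perform the actual mount.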