buildworld errors at outset on fresh svn checkout

2016-10-06 Thread Scott Bennett
 I'm running into a problem in updating my 10-STABLE system from source.
A "make buildworld" quits immediately.  I tried a fresh svn checkout for
base/stable/10 and then tried to run buildworld again, but got the same error.
I've been scratching my head over this for hours, but must be missing something
simple.
 I have ccache installed and have been using it for a fairly long time now.
My /etc/src.conf contains just two lines:

PORTS_MODULES=multimedia/cuse4bsd-kmod sysutils/pefs-kmod # emulators/virtualbox-ose-kmod
WITH_LLDB=yes

My /etc/make.conf is rather longer, so I'll append it following .sig below.

 Here's what happens.

Script started on Thu Oct  6 23:31:47 2016
hellas# cd /usr/src
hellas# nice make buildworld
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1113: Malformed conditional (${BUILDKERNELS:[)
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1122: if-less endif
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1144: Malformed conditional (${BUILDKERNELS:[)
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1161: if-less endif
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1183: Malformed conditional (${BUILDKERNELS:[)
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1190: if-less endif
bmake: fatal errors encountered -- cannot continue
*** Error code 1

Stop.
make: stopped in /usr/src
hellas# exit
exit

Script done on Thu Oct  6 23:37:00 2016

 This just started happening after my machine had been down for a couple
of days after a hang that damaged stuff in /usr/home.  I had already restored
/usr/local from backups before narrowing down the weird behavior I was seeing
in wmaker to /usr/home corruption.  So /usr/home has now been restored to
good condition, too, but perhaps I need to restore something else as well.
This mess was part of my justification to myself for the fresh checkout of
/usr/src, but that doesn't seem to have made any difference in the buildworld
failure.
 If anyone else can see what's wrong and clue me in, I'd be grateful.
I'm subscribed to the digest for this list, so please Cc: me directly, so
I'll get replies right away.
Thanks in advance!


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**
/etc/make.conf contains:

CPUTYPE?=core2
CFLAGS+="-mtune=core2"
SVNFLAGS?="-r RELENG_10"
# build ports with clang stack protector
WITH_SSP=yes
SSP_CFLAGS=-fstack-protector-all
# added for ports system use to avoid dialogs by SJB  4 May 2007
BATCH=YES
# added for new pkg system  --SJB  10 December 2014
WITH_PKGNG=yes
# build ports using ccache  --SJB  19 January 2015
WITH_CCACHE_BUILD=yes
## buildworld and buildkernel using ccache  --SJB  26 January 2015
.if (!empty(.CURDIR:M/usr/src*) || !empty(.CURDIR:M/usr/obj*))
.if !defined(NOCCACHE) && exists(/usr/local/libexec/ccache/world/cc)
CC:=${CC:C,^cc,/usr/local/libexec/ccache/world/cc,1}
CXX:=${CXX:C,^c\+\+,/usr/local/libexec/ccache/world/c++,1}
CCACHE_COMPILERCHECK=content
CCACHE_DIR=/buildwork/ccache.freebsd
.endif
.else
CFLAGS+="-mssse3"
#CFLAGS+="-mssse3 -msse4.1"
.endif
# added to deal with ccache bug 8460  --SJB  2 November 2013
# bug has been reported fixed, so try without this workaround
#CCACHE_CPP2=1
# added as a better specification of -j by SJB 17 November 2009
MAKE_JOBS_NUMBER=4
# put build tree where there is plenty of temporary workspace
WRKDIRPREFIX=/buildwork/ports
DEFAULT_VERSIONS+=  ssl=openssl
# Allow updating of Mesa3D from 7.4.4 to 7.6.1 and libdrm from 2.4.12 to 2.4.17
WITHOUT_NOUVEAU=yes
# Use ATLAS libraries in ports that use BLAS libraries
OPTIONS_SET=ATLAS
# Tell gnustep-related ports to use base system's compiler
GNUSTEP_WITH_BASE_GCC=yes
GNUSTEP_WITHOUT_LIBOBJC=yes
QT4_OPTIONS= CUPS NAS QGTKSTYLE
# Begin portconf settings
# Do not touch these lines
.if !empty(.CURDIR:M/usr/ports*) && exists(/usr/local/libexec/portconf)
_PORTCONF!=/usr/local/libexec/portconf
.if ${_PORTCONF} != "|"
.for i in ${_PORTCONF:S/^|//:S/|/ /g}
${i:C/^([^=]*)=.*/\1/}=${i:C/^[^=]*=//:S/%/ /g}
.endfor
.endif
.endif
# End portconf settings
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Reproducible panic - Going nowhere without my init!

2016-10-06 Thread Graham Menhennitt
Let me preface this by saying that I know nothing about this particular 
bit of code, but...


As a general rule, I would question the use of gettimeofday() while 
panicking. At that stage, everything could have already gone down the 
plug hole.


That said, it already calls sleep(), so maybe that uses the same 
gettimeofday() call internally. In which case, please ignore this comment.


Graham


On 7/10/2016 9:32 AM, Andy Farkas wrote:

With your latest patch applied, I ran through my procedure more
than a dozen times and no panics!

Any explanation why sleep(STALL_TIMEOUT) as opposed to a
bunch of sleep(1)'s tickles the panic?

Also, it is definitely not sleeping for 30 seconds. I guess some
event interrupts the sleep loop?

Thanks heaps for your time and effort,

-andyf

%%%
Please try the following patch.

diff --git a/sbin/init/init.c b/sbin/init/init.c
index bda86b5..25ac2bd 100644
--- a/sbin/init/init.c
+++ b/sbin/init/init.c
@@ -870,6 +870,7 @@ single_user(void)
   sigset_t mask;
   const char *shell;
   char *argv[2];
+ struct timeval tv, tn;
  #ifdef SECURE
   struct ttyent *typ;
   struct passwd *pp;
@@ -884,8 +885,13 @@ single_user(void)
   if (Reboot) {
   /* Instead of going single user, let's reboot the machine */
   sync();
- reboot(howto);
- _exit(0);
+ if (reboot(howto) == -1) {
+ emergency("reboot(%#x) failed, %s", howto,
+strerror(errno));
+ _exit(1); /* panic and reboot */
+ }
+ warning("reboot(%#x) returned", howto);
+ _exit(0); /* panic as well */
   }

   shell = get_shell();
@@ -1002,7 +1008,14 @@ single_user(void)
   *  reboot(8) killed shell?
   */
   warning("single user shell terminated.");
- sleep(STALL_TIMEOUT);
+ gettimeofday(&tv, NULL);
+ tn = tv;
+ tv.tv_sec += STALL_TIMEOUT;
+ while (tv.tv_sec > tn.tv_sec || (tv.tv_sec ==
+tn.tv_sec && tv.tv_usec > tn.tv_usec)) {
+ sleep(1);
+ gettimeofday(&tn, NULL);
+ }
   _exit(0);
   } else {
   warning("single user shell terminated, restarting");


Re: Reproducible panic - Going nowhere without my init!

2016-10-06 Thread Andy Farkas
With your latest patch applied, I ran through my procedure more
than a dozen times and no panics!

Any explanation why sleep(STALL_TIMEOUT) as opposed to a
bunch of sleep(1)'s tickles the panic?

Also, it is definitely not sleeping for 30 seconds. I guess some
event interrupts the sleep loop?

Thanks heaps for your time and effort,

-andyf

%%%
Please try the following patch.

diff --git a/sbin/init/init.c b/sbin/init/init.c
index bda86b5..25ac2bd 100644
--- a/sbin/init/init.c
+++ b/sbin/init/init.c
@@ -870,6 +870,7 @@ single_user(void)
  sigset_t mask;
  const char *shell;
  char *argv[2];
+ struct timeval tv, tn;
 #ifdef SECURE
  struct ttyent *typ;
  struct passwd *pp;
@@ -884,8 +885,13 @@ single_user(void)
  if (Reboot) {
  /* Instead of going single user, let's reboot the machine */
  sync();
- reboot(howto);
- _exit(0);
+ if (reboot(howto) == -1) {
+ emergency("reboot(%#x) failed, %s", howto,
+strerror(errno));
+ _exit(1); /* panic and reboot */
+ }
+ warning("reboot(%#x) returned", howto);
+ _exit(0); /* panic as well */
  }

  shell = get_shell();
@@ -1002,7 +1008,14 @@ single_user(void)
  *  reboot(8) killed shell?
  */
  warning("single user shell terminated.");
- sleep(STALL_TIMEOUT);
+ gettimeofday(&tv, NULL);
+ tn = tv;
+ tv.tv_sec += STALL_TIMEOUT;
+ while (tv.tv_sec > tn.tv_sec || (tv.tv_sec ==
+tn.tv_sec && tv.tv_usec > tn.tv_usec)) {
+ sleep(1);
+ gettimeofday(&tn, NULL);
+ }
  _exit(0);
  } else {
  warning("single user shell terminated, restarting");


11.0-RELEASE Status Update

2016-10-06 Thread Glen Barber
As many of you are aware, 11.0-RELEASE needed to be rebuilt to address
several issues that were discovered after the release was built.  Extra
caution is being taken in testing the rebuilt releases, so at present,
the final release announcement is planned for Monday, October 10.

Thank you for your patience in waiting for 11.0-RELEASE.

Glen
On behalf of:   re@





Re: Reproducible panic - Going nowhere without my init!

2016-10-06 Thread Konstantin Belousov
On Thu, Oct 06, 2016 at 06:31:59PM +1000, Andy Farkas wrote:
> Reverted your patch then changed line 1011 of init.c to _exit(97):
> 
> --- init.c-orig 2016-10-05 18:52:24.02291 +1000
> +++ init.c 2016-10-06 17:02:33.714624000 +1000
> @@ -1008,7 +1008,7 @@
>   */
>   warning("single user shell terminated.");
>   sleep(STALL_TIMEOUT);
> - _exit(0);
> + _exit(97);
>   } else {
>   warning("single user shell terminated, restarting");
>   return (state_func_t) single_user;
> 
> ...and got a panic that showed "exit 97":  http://imgur.com/xonPwxR
> 
> I think that kern_reboot() is not being called somehow.
> kern_reboot() is the only place rebooting = 1; is executed.
> 
> "init died (signal 0, exit 97)
> panic: Going nowhere without my init!"
> 
> can only happen if rebooting = 0 in kern_exit.c exit1().
> 
> Another tell that kern_reboot() has not been called is "cpuid = 3"
> because the first thing kern_reboot() does is bind to CPU 0.
> 
> Why is kern_reboot() being skipped? I have no idea.
> 
> Anything more I can do to help?  Do you want a core dump?
> 

Please try the following patch.

diff --git a/sbin/init/init.c b/sbin/init/init.c
index bda86b5..25ac2bd 100644
--- a/sbin/init/init.c
+++ b/sbin/init/init.c
@@ -870,6 +870,7 @@ single_user(void)
sigset_t mask;
const char *shell;
char *argv[2];
+   struct timeval tv, tn;
 #ifdef SECURE
struct ttyent *typ;
struct passwd *pp;
@@ -884,8 +885,13 @@ single_user(void)
if (Reboot) {
/* Instead of going single user, let's reboot the machine */
sync();
-   reboot(howto);
-   _exit(0);
+   if (reboot(howto) == -1) {
+   emergency("reboot(%#x) failed, %s", howto,
+   strerror(errno));
+   _exit(1); /* panic and reboot */
+   }
+   warning("reboot(%#x) returned", howto);
+   _exit(0); /* panic as well */
}
 
shell = get_shell();
@@ -1002,7 +1008,14 @@ single_user(void)
 *  reboot(8) killed shell?
 */
warning("single user shell terminated.");
-   sleep(STALL_TIMEOUT);
+   gettimeofday(&tv, NULL);
+   tn = tv;
+   tv.tv_sec += STALL_TIMEOUT;
+   while (tv.tv_sec > tn.tv_sec || (tv.tv_sec ==
+   tn.tv_sec && tv.tv_usec > tn.tv_usec)) {
+   sleep(1);
+   gettimeofday(&tn, NULL);
+   }
_exit(0);
} else {
warning("single user shell terminated, restarting");


Re: iicsmb

2016-10-06 Thread Mark Saad
On Thu, Oct 6, 2016 at 5:39 AM, Mark Dixon  wrote:

> If I load the module on my laptop (Lenovo Thinkpad X1 Carbon), I get:
>
> iicsmb0:  on iicbus0
> iicsmb1:  on iicbus1
> iicsmb2:  on iicbus2
> iicsmb3:  on iicbus3
> iicsmb4:  on iicbus4
> iicsmb5:  on iicbus5
> iicsmb6:  on iicbus6
> iicsmb7:  on iicbus7
> iicsmb8:  on iicbus8
> iicsmb9:  on iicbus9
> iicsmb10:  on iicbus10
> iicsmb11:  on iicbus11
> smbus1:  on iicsmb0
> smbus2:  on iicsmb1
> smbus3:  on iicsmb2
> smbus4:  on iicsmb3
> smbus5:  on iicsmb4
> smbus6:  on iicsmb5
> smbus7:  on iicsmb6
> smbus8:  on iicsmb7
> smbus9:  on iicsmb8
> smbus10:  on iicsmb9
> smbus11:  on iicsmb10
> smbus12:  on iicsmb11
>
> I have no idea what this means though.
>
> Regards,
>
> Mark


Andriy,
 Likewise, I have devices that appear, but I am not sure what they are. I am on
smbios.planar.maker="BIOSTAR Group"
smbios.planar.product="A68I-350 DELUXE"

iicsmb0:  on iicbus0
iicsmb1:  on iicbus1
iicsmb2:  on iicbus2
iicsmb3:  on iicbus3
iicsmb4:  on iicbus4
iicsmb5:  on iicbus5
iicsmb6:  on iicbus6
iicsmb7:  on iicbus7
smbus0:  on iicsmb0
smbus1:  on iicsmb1
smbus2:  on iicsmb2
smbus3:  on iicsmb3
smbus4:  on iicsmb4
smbus5:  on iicsmb5
smbus6:  on iicsmb6
smbus7:  on iicsmb7

root@ostrich:~ # ls -l /dev/iic*
crw-------  1 root  wheel  0x6d Sep 26 16:21 /dev/iic0
crw-------  1 root  wheel  0x6e Sep 26 16:21 /dev/iic1
crw-------  1 root  wheel  0x6f Sep 26 16:21 /dev/iic2
crw-------  1 root  wheel  0x70 Sep 26 16:21 /dev/iic3
crw-------  1 root  wheel  0x71 Sep 26 16:21 /dev/iic4
crw-------  1 root  wheel  0x72 Sep 26 16:21 /dev/iic5
crw-------  1 root  wheel  0x73 Sep 26 16:21 /dev/iic6
crw-------  1 root  wheel  0x77 Sep 26 16:21 /dev/iic7

Probing them with smbmsg doesn't return any data.

-- 
mark saad | nones...@longcount.org


Re: 11.0 stuck on high network load

2016-10-06 Thread Julien Charbon

 Hi,

On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote:
> On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote:
>> 
>>  I am still trying to reproduce your issue, without success so far.

 Thanks to Slawa's efforts and multiple debug reports, we are starting to
see the bottom of this issue, and it seems to be a generic one.  The most
useful report being:

panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at 0x8032467b =
db_trace_self_wrapper+0x2b/frame 0xfe1f9e1f8730
vpanic() at 0x804b5672 = vpanic+0x182/frame 0xfe1f9e1f87b0
kassert_panic() at 0x804b54e6 = kassert_panic+0x126/frame
0xfe1f9e1f8820
tcp_usr_detach() at 0x806564dc = tcp_usr_detach+0x1bc/frame
0xfe1f9e1f8850
sofree() at 0x8053de66 = sofree+0x1a6/frame 0xfe1f9e1f8880
tcp_close() at 0x8064dd8e = tcp_close+0x11e/frame 0xfe1f9e1f88b0
tcp_timer_2msl() at 0x80653c28 = tcp_timer_2msl+0x278/frame
0xfe1f9e1f88e0
softclock_call_cc() at 0x804cbacc =
softclock_call_cc+0x19c/frame 0xfe1f9e1f89c0
softclock() at 0x804cbec7 = softclock+0x47/frame 0xfe1f9e1f89e0
intr_event_execute_handlers() at 0x8047aa86 =
intr_event_execute_handlers+0x96/frame 0xfe1f9e1f8a20
ithread_loop() at 0x8047b106 = ithread_loop+0xa6/frame
0xfe1f9e1f8a70
fork_exit() at 0x804781b4 = fork_exit+0x84/frame 0xfe1f9e1f8ab0
fork_trampoline() at 0x80713fce = fork_trampoline+0xe/frame
0xfe1f9e1f8ab0

 The scenario:

1. thread1:  tcp_timer_2msl() expires and tcp_close() is called to clean
this TCP connection.

2. thread1:  In tcp_close() the inp is marked with INP_DROPPED flag, the
process continues and calls INP_WUNLOCK() here:

https://github.com/freebsd/freebsd/blob/releng/11.0/sys/netinet/tcp_subr.c#L1568

3. thread2:  Now because INP_WLOCK is released, the inp can transition
to INP_TIMEWAIT state and nothing is preventing it.

4. thread2:  During the INP_TIMEWAIT state transition, the inp is marked
with INP_TIMEWAIT flag.

5. thread1:  Back in business, the tcp_close() call continues with
sofree() -> tcp_usr_detach() -> tcp_detach().  Then, as the inp is marked
with INP_DROPPED|INP_TIMEWAIT flags, in_pcbfree() is called.  With
INVARIANTS you hit an assertion here; without INVARIANTS the process continues.

6. Later:  tcp_twclose() cleans up this INP_TIMEWAIT inp and calls
in_pcbfree() again to achieve a fancy inp double-free.

 This issue is a tricky one and seems to have been here for quite a while.
It has been witnessed at least once in 10.1 and by two different people in 11.0.

Astute questions:

 o Why is the INP_DROPPED flag not tested in tcp_input() in the first place?
 When you are marked as INP_DROPPED, you are almost dead, you should not
be allowed to transition to a different state!

 Good point, and tcp_input() relies on the fact that INP_DROPPED inps
are no longer in the TCP hash table.  But tcp_input() in some cases does
relock the inp (see the relocked: label), and although it checks a lot of
things after having relocked the inp, it does not check for a recently
added INP_DROPPED flag.

 o Why does tcp_detach() do an unconditional in_pcbfree() for inps in
TIMEWAIT state?  This is because inps in TIMEWAIT state have only one exit:
being freed.  And it is the duty of tcp_detach() to free all inps with
INP_DROPPED|INP_TIMEWAIT.

 o Why is this issue so rare?

 Good question, I can see how specific TCP traffic could make it more
frequent, but there is no definitive answer yet.

Fix proposal:

 This issue description is still a bit fresh but I would enforce that an
inp with INP_DROPPED flag should not be allowed to change state.

Lesson learned:

 When re-locking an inp, it might have changed a lot, and you might not
like what it became.

 Thanks again to Slawa, for his numerous debug reports and always
questioning my explanations.  His last question directly led to this
finding.  He is testing a quick workaround patch to check if there is more.

 I will create a review with a fix proposal, and don't hesitate to share
any other comments you have on this issue.

--
Julien





Re: 11.0 stuck on high network load

2016-10-06 Thread Slawa Olhovchenkov
On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote:

> 2. thread1:  In tcp_close() the inp is marked with INP_DROPPED flag, the
> process continues and calls INP_WUNLOCK() here:
> 
> https://github.com/freebsd/freebsd/blob/releng/11.0/sys/netinet/tcp_subr.c#L1568

Also look at sys/netinet/tcp_timewait.c:488

And check the other locks from r160549



Re: iicsmb

2016-10-06 Thread Mark Dixon
If I load the module on my laptop (Lenovo Thinkpad X1 Carbon), I get:

iicsmb0:  on iicbus0
iicsmb1:  on iicbus1
iicsmb2:  on iicbus2
iicsmb3:  on iicbus3
iicsmb4:  on iicbus4
iicsmb5:  on iicbus5
iicsmb6:  on iicbus6
iicsmb7:  on iicbus7
iicsmb8:  on iicbus8
iicsmb9:  on iicbus9
iicsmb10:  on iicbus10
iicsmb11:  on iicbus11
smbus1:  on iicsmb0
smbus2:  on iicsmb1
smbus3:  on iicsmb2
smbus4:  on iicsmb3
smbus5:  on iicsmb4
smbus6:  on iicsmb5
smbus7:  on iicsmb6
smbus8:  on iicsmb7
smbus9:  on iicsmb8
smbus10:  on iicsmb9
smbus11:  on iicsmb10
smbus12:  on iicsmb11

I have no idea what this means though.

Regards,

Mark



Re: 11.0 stuck on high network load

2016-10-06 Thread Julien Charbon

 Hi Hiren,

On 10/6/16 9:44 AM, hiren panchasara wrote:
> On 10/06/16 at 09:28P, Julien Charbon wrote:
>> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote:
>>> On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote:
 
>>>>  I am still trying to reproduce your issue, without success so far.
>>
>>  Thanks to Slawa's efforts and multiple debug reports, we are starting to
>> see the bottom of this issue, and it seems to be a generic one.  The most
>> useful report being:
>>
>> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
> 
> I know there are multiple and probably related problems being
> discussed here but what about the one mentioned in subject of this
> thread?
> Apologies if I've missed something conclusive in one of the replies of
> this thread about that issue.

 This issue can lead to the machine being stuck on high network load:  by
double freeing an inp, you can corrupt/leak an inp lock, and the network
stack can wait indefinitely for this inp lock to be released.  You get this
assert only with INVARIANTS defined.

 As usual, we can have more than one issue here, but this
INP_TIMEWAIT|INP_DROPPED issue needs to be fixed anyway.

--
Julien





Re: Reproducible panic - Going nowhere without my init!

2016-10-06 Thread Peter Jeremy
On 2016-Oct-04 11:14:38 +1000, Andy Farkas  wrote:
>Is it just me or
>
>Step 1: boot
>Step 2: login as root
>Step 3: type "w" *
>Step 4: type "shutdown now; logout"
>Step 5: press <Enter> at the 'Enter full pathname of shell or RETURN for
>/bin/sh:' prompt
>Step 6: type "reboot"
>Step 7: get a Panic: "Going nowhere without my init!"
>
>* The panic will not happen if you skip step 3.
>
>The panic will not happen if you type "sync; sync; sync" after step 5.
>
>The panic will not happen if you wait (an unknown amount of) some time
>after step 5.

I can reproduce this on the console of my GCE instance but the timing
seems important.  It doesn't seem to fail if I ssh in or if I pause between
any of the commands.

...
gce1# w
 7:47PM  up 38 secs, 1 users, load averages: 0.69, 0.22, 0.08
USER   TTY  FROM  LOGIN@  IDLE WHAT
root   u0   - 7:47PM - w
gce1# shutdown now;logout
Shutdown NOW!
shutdown: [pid 1071]
Stopping cron.
Stopping sshd.
Stopping ntpd.
Stopping local_unbound.
Stopping devd.
Writing entropy file:.
Writing early boot entropy file:.
Terminated
.
Oct  6 19:47:09 pflog0: promiscuous mode disabled
Enter full pathname of shell or RETURN for /bin/sh:
gce1# reboot
Oct  6 19:47:17 init: single user shell terminated.

init died (signal 0, exit 0)
panic: Going nowhere without my init!
Uptime: 55s
Changing serial settings was 0/0 now 3/0
Start bios (version 1.7.2-20150226_170051-google)

gce1$ uname -a
FreeBSD gce1.rulingia.com 11.0-PRERELEASE FreeBSD 11.0-PRERELEASE #83 r306704M: Thu Oct  6 13:22:27 AEDT 2016 r...@gce1.rulingia.com:/usr/obj/usr/src/sys/GCE  amd64

I haven't investigated the cause yet.

-- 
Peter Jeremy




Re: Reproducible panic - Going nowhere without my init!

2016-10-06 Thread Andy Farkas
Reverted your patch then changed line 1011 of init.c to _exit(97):

--- init.c-orig 2016-10-05 18:52:24.02291 +1000
+++ init.c 2016-10-06 17:02:33.714624000 +1000
@@ -1008,7 +1008,7 @@
  */
  warning("single user shell terminated.");
  sleep(STALL_TIMEOUT);
- _exit(0);
+ _exit(97);
  } else {
  warning("single user shell terminated, restarting");
  return (state_func_t) single_user;

...and got a panic that showed "exit 97":  http://imgur.com/xonPwxR

I think that kern_reboot() is not being called somehow.
kern_reboot() is the only place rebooting = 1; is executed.

"init died (signal 0, exit 97)
panic: Going nowhere without my init!"

can only happen if rebooting = 0 in kern_exit.c exit1().

Another tell that kern_reboot() has not been called is "cpuid = 3"
because the first thing kern_reboot() does is bind to CPU 0.

Why is kern_reboot() being skipped? I have no idea.

Anything more I can do to help?  Do you want a core dump?

-andyf


Re: 11.0 stuck on high network load

2016-10-06 Thread hiren panchasara
On 10/06/16 at 09:51P, Julien Charbon wrote:
> 
>  Hi Hiren,
> 
> On 10/6/16 9:44 AM, hiren panchasara wrote:
> > On 10/06/16 at 09:28P, Julien Charbon wrote:
> >> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote:
> >>> On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote:
>  
> >>>>  I am still trying to reproduce your issue, without success so far.
> >>
> >>  Thanks to Slawa's efforts and multiple debug reports, we are starting to
> >> see the bottom of this issue, and it seems to be a generic one.  The most
> >> useful report being:
> >>
> >> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
> > 
> > I know there are multiple and probably related problems being
> > discussed here but what about the one mentioned in subject of this
> > thread?
> > Apologies if I've missed something conclusive in one of the replies of
> > this thread about that issue.
> 
>  This issue can lead to the machine being stuck on high network load:  by
> double freeing an inp, you can corrupt/leak an inp lock, and the network
> stack can wait indefinitely for this inp lock to be released.  You get this
> assert only with INVARIANTS defined.
> 
>  As usual, we can have more than one issue here, but this
> INP_TIMEWAIT|INP_DROPPED issue needs to be fixed anyway.

Thanks for the explanation, Julien.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-10-06 Thread hiren panchasara
On 10/06/16 at 09:28P, Julien Charbon wrote:
> 
>  Hi,
> 
> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote:
> > On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote:
> >> 
> >>  I am still trying to reproduce your issue, without success so far.
> 
>  Thanks to Slawa's efforts and multiple debug reports, we are starting to
> see the bottom of this issue, and it seems to be a generic one.  The most
> useful report being:
> 
> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL

I know there are multiple and probably related problems being
discussed here but what about the one mentioned in subject of this
thread?
Apologies if I've missed something conclusive in one of the replies of
this thread about that issue.

Cheers,
Hiren

