[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-24 Thread Bug Watch Updater
Launchpad has imported 7 comments from the remote bug at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60931.


On 2014-04-23T07:06:09+00:00 Anton Blanchard wrote:

Created attachment 32659
Bump page size to 64kB

We are seeing random failures with Go programs on a 64kB page size ppc64
box. They look like garbage collection issues: sometimes we SEGV in
timer code, sometimes in the code that wraps a kernel read syscall. If I
prevent the garbage collector from running, the programs work.

The libgo malloc hard codes the page size so I wrote a quick hack to
bump this (and a few other dependent variables). This makes the problem
go away, but we will need to come up with a better way to do this at
runtime.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/15


On 2014-04-23T07:11:55+00:00 Pinskia wrote:

This is going to be true on AARCH64 also where most distros are going to
be using 64k pages (some might use 4k pages if they also support
AARCH32).  MIPS has many different page sizes too (4k, 8k, 16k, 32k, and
64k).  Hard coding the page size therefore seems wrong; maybe you should
call getpagesize() instead.
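
As a minimal sketch of the runtime query suggested here (not the libgo
code; the helper name system_page_size is hypothetical), the page size
can be read once via sysconf(_SC_PAGESIZE), the POSIX equivalent of
getpagesize(), and cached:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: return the system page size, queried at runtime
 * instead of being hard coded. The value is cached after the first call.
 * This sketch does not try to be thread-safe. */
size_t system_page_size(void)
{
    static size_t cached;

    if (cached == 0) {
        long n = sysconf(_SC_PAGESIZE);
        assert(n > 0);          /* sysconf returns -1 on error */
        cached = (size_t)n;
    }
    return cached;
}
```

A value obtained this way is not a compile-time constant, so it cannot
be used directly for things like static array sizes.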

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/17


On 2014-04-23T07:26:53+00:00 Anton Blanchard wrote:

I agree, but when I tried this I found a few places that expect PageSize
to be a compile time constant so it is not as trivial as I had hoped.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/18


On 2014-04-23T16:42:11+00:00 Ian Lance Taylor wrote:

It would be extremely helpful if you could find a test case that can
recreate this problem with some reliability.  There is no obvious
dependency on the system page size in libgo.  The PageSize constant is
the unit that the memory allocator deals in, and should have no direct
relationship to the system page size.  I believe that there is a bug,
but we need to track it down.

If you set the environment variable GOGC=1 the garbage collector will
run much more frequently; perhaps that will help get a reproducible test
case.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/19


On 2014-04-24T00:17:06+00:00 Anton Blanchard wrote:

Created attachment 32669
Don't use madvise(DONT_NEED) on sub pages

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/20


On 2014-04-24T00:18:11+00:00 Anton Blanchard wrote:

I think I see it:

19112 madvise(0xc21103, 4096, MADV_DONTNEED) = 0

That 4kB madvise(MADV_DONTNEED) gets rounded up to the system page size
of 64kB, so we end up covering memory that is still in use.

The following patch fixes it for me, but it just ignores any sub pages.
We should keep them around so later calls have a chance at consolidating
regions up to a system page size.
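
The arithmetic behind that rounding can be sketched as follows. This is
an illustration only, assuming the start address is already aligned to
the system page size and that the kernel rounds the length up to a whole
page; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of page; page must be a power of two. */
uintptr_t round_up_to(uintptr_t x, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (x + page - 1) & ~(page - 1);
}

/* Number of bytes discarded beyond what the caller asked for when a
 * request of len bytes at start is widened to whole system pages.
 * Assumes start is aligned to sys_page. */
size_t madvise_overshoot(uintptr_t start, size_t len, size_t sys_page)
{
    uintptr_t asked_end = start + len;
    uintptr_t real_end = round_up_to(asked_end, sys_page);

    return (size_t)(real_end - asked_end);
}
```

With a 4096-byte request and a 65536-byte system page, the overshoot is
61440 bytes, which is memory the allocator may still consider in use.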

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/21


On 2014-04-24T05:38:26+00:00 Jakub-gcc wrote:

Perhaps, instead of skipping the madvise entirely when the start or
length isn't page aligned, it would be better to round the start up to
the next page boundary and the end down to the previous page boundary,
and call madvise only if the rounded end is above the rounded start.
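
A minimal sketch of that inward rounding, under the assumption that the
system page size is a power of two. This is not the patch attached to
the bug; whole_page_span and release_whole_pages are hypothetical names.

```c
#define _DEFAULT_SOURCE         /* for madvise() and MADV_DONTNEED on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Find the largest range of whole system pages inside
 * [start, start + len). Stores its start in *lo_out and returns its
 * length, or 0 if no whole page fits. */
size_t whole_page_span(uintptr_t start, size_t len, size_t sys_page,
                       uintptr_t *lo_out)
{
    assert(sys_page != 0 && (sys_page & (sys_page - 1)) == 0);

    uintptr_t mask = (uintptr_t)sys_page - 1;
    uintptr_t lo = (start + mask) & ~mask;      /* round start up */
    uintptr_t hi = (start + len) & ~mask;       /* round end down */

    *lo_out = lo;
    return hi > lo ? (size_t)(hi - lo) : 0;
}

/* Release only the whole system pages inside the range. Returns 0 when
 * there was nothing to release, otherwise the result of madvise(). */
int release_whole_pages(void *start, size_t len, size_t sys_page)
{
    uintptr_t lo;
    size_t n = whole_page_span((uintptr_t)start, len, sys_page, &lo);

    if (n == 0)
        return 0;               /* sub-page range: leave it alone */
    return madvise((void *)lo, n, MADV_DONTNEED);
}
```

With 64kB system pages, a lone 4kB region yields an empty span and is
left untouched, while a larger region has only its interior whole pages
released.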

Reply at:
https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/comments/22


** Changed in: gcc
   Status: Unknown => New

** Changed in: gcc
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1304754

Title:
  gccgo on ppc64el using split stacks when not supported

To manage notifications about this bug go to:
https://bugs.launchpad.net/gcc/+bug/1304754/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-23 Thread Adam Conrad
** Also affects: gcc via
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60931
   Importance: Unknown
   Status: Unknown



[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-23 Thread Anton Blanchard
Hi Dave,

It does look like a page size issue. I submitted the following bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60931

** Bug watch added: GCC Bugzilla #60931
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60931



Re: [Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-16 Thread Dave Cheney
An excellent point. Timers are managed by a single goroutine and a
priority queue of events to wait on and channels to send the timer
event. It should be doable to write some code that stresses timers.

However, I don't believe that SIGALRM is used, at least not in gc, from
which most of the gccgo standard library derives; gccgo might be
slightly different.

The event that crashes the go process is related to a watchdog timer
that expires and tries to kill the subprocess.


On Wed, Apr 16, 2014 at 6:04 PM, Anton Blanchard  wrote:
> There shouldn't be any difference in terms of signal handling.
>
> I've now seen a couple of failures in mongodb/TLS networking code:
>
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal 0xb code=0x1 addr=0x38]
>
> goroutine 16 [running]:
> crypto_tls.SetWriteDeadline.pN15_crypto_tls.Conn
> ../../../gcc/libgo/go/crypto/tls/conn.go:111
> labix.org_v2_mgo.updateDeadline.pN28_labix.org_v2_mgo.mongoSocket
> /home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/socket.go:273
> labix.org_v2_mgo.Query.pN28_labix.org_v2_mgo.mongoSocket
> /home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/socket.go:474
> labix.org_v2_mgo.SimpleQuery.pN28_labix.org_v2_mgo.mongoSocket
> /home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/socket.go:320
> labix.org_v2_mgo.pinger.pN28_labix.org_v2_mgo.mongoServer
> /home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/server.go:278
> created by mgo.newServer
> /home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/server.go:80
>
> which is:
>
> func (c *Conn) SetWriteDeadline(t time.Time) error {
> return c.conn.SetWriteDeadline(t)
> }
>
> SetWriteDeadline will end up in timer code, and I've previously seen
> failures in the timer code.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1304754
>
> Title:
>   gccgo on ppc64el using split stacks when not supported
>
> Status in “gccgo-4.9” package in Ubuntu:
>   Confirmed
>
> Bug description:
>   On kernels 3.13-18 and 3.13-23 (there may be others) the kernel is
>   killing gccgo compiled binaries
>
>   [18519.444748] jujud[19277]: bad frame in setup_rt_frame:
>    nip  lr 
>   [18519.673632] init: juju-agent-ubuntu-local main process (19220)
>   killed by SEGV signal
>   [18519.673651] init: juju-agent-ubuntu-local main process ended, respawning
>
>   In powerpc/kernel/signal_64.c:
>
>   sys_rt_sigreturn is jumping to the badframe: label and executing an
>   unconditional force_sigsegv which is delivered to the userland
>   process. Like C++, gccgo tries to decode SIGSEGV as a nil pointer
>   access and blame some random function that happened to be the top
>   stack frame.
>
>   Reverting to the 3.13-08 kernel appears to resolve the issue which
>   (weakly) points the finger at the recent switch to 64k pages.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754/+subscriptions


[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-16 Thread Anton Blanchard
There shouldn't be any difference in terms of signal handling.

I've now seen a couple of failures in mongodb/TLS networking code:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x38]

goroutine 16 [running]:
crypto_tls.SetWriteDeadline.pN15_crypto_tls.Conn
../../../gcc/libgo/go/crypto/tls/conn.go:111
labix.org_v2_mgo.updateDeadline.pN28_labix.org_v2_mgo.mongoSocket
/home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/socket.go:273
labix.org_v2_mgo.Query.pN28_labix.org_v2_mgo.mongoSocket
/home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/socket.go:474
labix.org_v2_mgo.SimpleQuery.pN28_labix.org_v2_mgo.mongoSocket
/home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/socket.go:320
labix.org_v2_mgo.pinger.pN28_labix.org_v2_mgo.mongoServer
/home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/server.go:278
created by mgo.newServer
/home/anton/juju-core-1.18.1/src/labix.org/v2/mgo/server.go:80

which is:

func (c *Conn) SetWriteDeadline(t time.Time) error {
return c.conn.SetWriteDeadline(t)
}

SetWriteDeadline will end up in timer code, and I've previously seen
failures in the timer code.



Re: [Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-15 Thread Dave Cheney
Hi Anton,

I've been looking at another angle via a different crash. I see a
crash if a child process gets a signal, which sort of reflects back on
the parent.

Are there any alignment requirements for signal handling on 64k kernels?

Dave

On Wed, Apr 16, 2014 at 4:28 PM, Anton Blanchard  wrote:
> This doesn't explain why we failed in the first place however. Using
> gdb, I have seen a couple of SEGVs in:
>
> * 1 Thread 0x3fffa8c447e0 (LWP 5562) "jujud" timerproc (dummy=)
> at ../../../gcc/libgo/runtime/time.goc:217
>
> ie:
>
> f = (void*)t->fv->fn;
>
> Perhaps a stale timer that we aren't cancelling?
>
> I've also seen a fail here:
>
> fatal error: runtime_lock: lock count
>
> goroutine 2 [running]:
> runtime_dopanic
> ../../../gcc/libgo/runtime/panic.c:78
> runtime_throw
> ../../../gcc/libgo/runtime/panic.c:116
> runtime_lock
> ../../../gcc/libgo/runtime/lock_futex.c:41
> runtime_allocmcache
> ../../../gcc/libgo/runtime/malloc.goc:337
> runtime_startpanic
> ../../../gcc/libgo/runtime/panic.c:46
> runtime_throw
> ../../../gcc/libgo/runtime/panic.c:114
> runtime_unlock
> ../../../gcc/libgo/runtime/lock_futex.c:101
> runtime_MHeap_Scavenger
> ../../../gcc/libgo/runtime/mheap.c:482
> kickoff
> ../../../gcc/libgo/runtime/proc.c:237
>
> :0
>
> :0
> created by runtime_main
> ../../../gcc/libgo/runtime/proc.c:565


Re: [Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-15 Thread Dave Cheney
On Wed, Apr 16, 2014 at 4:26 PM, Anton Blanchard  wrote:
> I've made some progress with these fails. A lot of the confusion is
> around the way gccgo hooks the SEGV handler and attempts to backtrace
> all goroutines (the code is in runtime_tracebackothers())
>
> It does this by calling runtime_gogo() which temporarily switches to the
> goroutine using setcontext(). If the context is bad in any way, this
> will cause us to SEGV again. I printed out the stack pointer (r1) and
> the NIA during this stack backtracing, and we see where things go south
> just as we are about to dump goroutine 0:
>
> goroutine 0 [idle]:
> DEBUG: runtime_gogo r1 0 nia 0
>
> r1 = 0, nia = 0. When we call setcontext on this invalid context we die
> with:
>
> jujud[5258]: bad frame in setup_rt_frame:  nip
>  lr 
>
> Perhaps we aren't saving away the context for goroutine 0 correctly.

Hmm, could be. It looks like the process was crashing anyway.



[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-15 Thread Anton Blanchard
This doesn't explain why we failed in the first place however. Using
gdb, I have seen a couple of SEGVs in:

* 1 Thread 0x3fffa8c447e0 (LWP 5562) "jujud" timerproc (dummy=)
at ../../../gcc/libgo/runtime/time.goc:217

ie:

f = (void*)t->fv->fn;

Perhaps a stale timer that we aren't cancelling?

I've also seen a fail here:

fatal error: runtime_lock: lock count

goroutine 2 [running]:
runtime_dopanic
../../../gcc/libgo/runtime/panic.c:78
runtime_throw
../../../gcc/libgo/runtime/panic.c:116
runtime_lock
../../../gcc/libgo/runtime/lock_futex.c:41
runtime_allocmcache
../../../gcc/libgo/runtime/malloc.goc:337
runtime_startpanic
../../../gcc/libgo/runtime/panic.c:46
runtime_throw
../../../gcc/libgo/runtime/panic.c:114
runtime_unlock
../../../gcc/libgo/runtime/lock_futex.c:101
runtime_MHeap_Scavenger
../../../gcc/libgo/runtime/mheap.c:482
kickoff 
../../../gcc/libgo/runtime/proc.c:237

:0

:0
created by runtime_main
../../../gcc/libgo/runtime/proc.c:565



[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-15 Thread Anton Blanchard
I've made some progress with these fails. A lot of the confusion is
around the way gccgo hooks the SEGV handler and attempts to backtrace
all goroutines (the code is in runtime_tracebackothers())

It does this by calling runtime_gogo() which temporarily switches to the
goroutine using setcontext(). If the context is bad in any way, this
will cause us to SEGV again. I printed out the stack pointer (r1) and
the NIA during this stack backtracing, and we see where things go south
just as we are about to dump goroutine 0:

goroutine 0 [idle]:
DEBUG: runtime_gogo r1 0 nia 0

r1 = 0, nia = 0. When we call setcontext on this invalid context we die
with:

jujud[5258]: bad frame in setup_rt_frame:  nip
 lr 

Perhaps we aren't saving away the context for goroutine 0 correctly.



[Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

2014-04-13 Thread Dave Cheney
Anton:

I've done some experiments with the peano.go test and confirmed that
gccgo on ppc is correctly configured not to use -fsplit-stack. It turns
out that peano.go can't pass without split stacks. On gccgo/ppc64 the
program crashes at a stack depth of roughly 31,000 frames:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb770 (LWP 24713)]
0x10004e0c in main.is_zero ()
(gdb) bt
#0  0x10004e0c in main.is_zero ()
#1  0x100051fc in main.count ()
#2  0x1000522c in main.count ()
...
#31380 0x1000522c in main.count ()
#31381 0x10005854 in main.main ()

I think the peano example is just a straight 'fall off the stack' type
error; it also generates a slightly different kernel log message:
ubuntu@winton-02:~/go/test$ ./a.out 

  
Segmentation fault (core dumped)
ubuntu@winton-02:~/go/test$ dmesg | tail -n1
[501663.078093] a.out[25679]: bad frame in setup_rt_frame: 00c20ffaf0e0 nip 10004e0c lr 100051fc
