Re: TLS bug?

2011-06-21 Thread Jason Evans

On 06/17/2011 02:35 PM, Marius Strobl wrote:

On Fri, Jun 17, 2011 at 03:31:29PM -0400, Nathaniel W Filardo wrote:

On Fri, Jun 17, 2011 at 08:07:13PM +0200, Marius Strobl wrote:

Using bonnie++ I can't reproduce this (didn't try mysql) but I have


I seem to have good luck reproducing it with "-r 5 -s 10 -x 10" by about the
third iteration.


Ok, with these parameters I can reproduce it.




some TLS fixes for libthr I forgot about but could be relevant here
(most actually date back to 2008 when the base binutils didn't support
GNUTLS for sparc64 so I couldn't test them easily). Could you please
give a libthr build with the following patch a try?
http://people.freebsd.org/~marius/libthr_sparc64.diff


Concurrent runs both with and without those diffs still asserted.
Interestingly, libc's .tbss section, even after the assertion, is still full
of zeros, so it looks like something stranger than a wild-write back to
.tbss.  I'll go diving through the tls allocation code again when I get a
minute.



In combination with the below patch bonnie++ survived 100 iterations
here. I'm not sure what this means though as I don't have much knowledge
about TLS, I merely implemented the necessary relocations. Could be
that malloc() actually requires the initial exec model for variant II.
Unfortunately, it's not documented why it was added for x86.
Jason, can you shed some light on this?

Marius

Index: malloc.c
===
--- malloc.c(revision 219535)
+++ malloc.c(working copy)
@@ -234,7 +234,7 @@
  #ifdef __sparc64__
  #  define LG_QUANTUM  4
  #  define LG_SIZEOF_PTR   3
-#  define TLS_MODEL/* default */
+#  define TLS_MODEL__attribute__((tls_model("initial-exec")))
  #endif
  #ifdef __amd64__
  #  define LG_QUANTUM  4



I added the initial-exec TLS_MODEL because it is faster than the 
default; jemalloc in no way depends on this for correctness though, so 
your patch is safe.


Thanks,
Jason
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: TLS bug?

2011-06-17 Thread Marius Strobl
On Fri, Jun 17, 2011 at 03:31:29PM -0400, Nathaniel W Filardo wrote:
> On Fri, Jun 17, 2011 at 08:07:13PM +0200, Marius Strobl wrote:
> > Using bonnie++ I can't reproduce this (didn't try mysql) but I have
> 
> I seem to have good luck reproducing it with "-r 5 -s 10 -x 10" by about the
> third iteration.

Ok, with these parameters I can reproduce it.

> 
> > some TLS fixes for libthr I forgot about but could be relevant here
> > (most actually date back to 2008 when the base binutils didn't support
> > GNUTLS for sparc64 so I couldn't test them easily). Could you please
> > give a libthr build with the following patch a try?
> > http://people.freebsd.org/~marius/libthr_sparc64.diff
> 
> Concurrent runs both with and without those diffs still asserted.
> Interestingly, libc's .tbss section, even after the assertion, is still full
> of zeros, so it looks like something stranger than a wild-write back to
> .tbss.  I'll go diving through the tls allocation code again when I get a
> minute.
> 

In combination with the below patch bonnie++ survived 100 iterations
here. I'm not sure what this means though as I don't have much knowledge
about TLS, I merely implemented the necessary relocations. Could be
that malloc() actually requires the initial exec model for variant II.
Unfortunately, it's not documented why it was added for x86.
Jason, can you shed some light on this?

Marius

Index: malloc.c
===
--- malloc.c(revision 219535)
+++ malloc.c(working copy)
@@ -234,7 +234,7 @@
 #ifdef __sparc64__
 #  define LG_QUANTUM   4
 #  define LG_SIZEOF_PTR3
-#  define TLS_MODEL/* default */
+#  define TLS_MODEL__attribute__((tls_model("initial-exec")))
 #endif
 #ifdef __amd64__
 #  define LG_QUANTUM   4
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: TLS bug?

2011-06-17 Thread Marius Strobl
On Thu, Jun 16, 2011 at 03:53:19AM -0400, Nathaniel W Filardo wrote:
> Atcht; it's late.  I forgot to mention that this system is a sparc64 V240
> 2-way SMP machine.  It's running a kernel from 9.0-CURRENT r222833+262af52:
> Tue Jun  7 18:47:35 EDT 2011 and a userland from a little later.
> 
> Sorry about that.
> --nwf;
> 
> On Thu, Jun 16, 2011 at 03:31:38AM -0400, Nathaniel W Filardo wrote:
> > I have a few applications (bonnie++ and mysql, specifically, both from
> > ports) which trip over the assertion in
> > lib/libc/stdlib/malloc.c:/^_malloc_thread_cleanup that
> > >   assert(tcache != (void *)(uintptr_t)1);
> > 
> > I have patched malloc.c thus:
> > 
> > > --- a/lib/libc/stdlib/malloc.c
> > > +++ b/lib/libc/stdlib/malloc.c
> > > @@ -1108,7 +1108,7 @@ static __thread arena_t   *arenas_map 
> > > TLS_MODEL;
> > >  
> > >  #ifdef MALLOC_TCACHE
> > >  /* Map of thread-specific caches. */
> > > -static __thread tcache_t   *tcache_tls TLS_MODEL;
> > > +__thread tcache_t  *tcache_tls TLS_MODEL;
> > >  
> > >  /*
> > >   * Number of cache slots for each bin in the thread cache, or 0 if tcache
> > >   * is
> > > @@ -6184,10 +6184,17 @@ _malloc_thread_cleanup(void)
> > >  #ifdef MALLOC_TCACHE
> > > tcache_t *tcache = tcache_tls;
> > >  
> > > +fprintf(stderr, "_m_t_c for %d:%lu with %p\n", 
> > > +   getpid(),
> > > +   (unsigned long) _pthread_self(),
> > > +   tcache);
> > > +
> > > if (tcache != NULL) {
> > > -   assert(tcache != (void *)(uintptr_t)1);
> > > -   tcache_destroy(tcache);
> > > -   tcache_tls = (void *)(uintptr_t)1;
> > > +   /* assert(tcache != (void *)(uintptr_t)1); */
> > > +   if((uintptr_t)tcache != (uintptr_t)1) {
> > > +   tcache_destroy(tcache);
> > > +   tcache_tls = (void *)(uintptr_t)1;
> > > +   }
> > 
> > and libthr/thread/thr_create.c thus:
> > 
> > > --- a/lib/libthr/thread/thr_create.c
> > > +++ b/lib/libthr/thread/thr_create.c
> > > @@ -243,6 +243,8 @@ create_stack(struct pthread_attr *pattr)
> > > return (ret);
> > >  }
> > >  
> > > +extern __thread void *tcache_tls;
> > > +
> > >  static void
> > >  thread_start(struct pthread *curthread)
> > >  {
> > > @@ -280,6 +282,11 @@ thread_start(struct pthread *curthread)
> > > curthread->attr.stacksize_attr;
> > >  #endif
> > >  
> > > +fprintf(stderr, "t_s for %d:%lu with %p\n",
> > > +getpid(),
> > > +(unsigned long) _pthread_self(),
> > > +tcache_tls);
> > > +
> > > /* Run the current thread's start routine with argument: */
> > > _pthread_exit(curthread->start_routine(curthread->arg));
> > >  
> > 
> > to attempt to debug this issue.  With those changes in place, bonnie++'s
> > execution looks like this:
> > 
> > >[...]
> > > Writing a byte at a time...done
> > > Writing intelligently...done
> > > Rewriting...done
> > > Reading a byte at a time...done
> > > Reading intelligently...done
> > > t_s for 79654:1086343168 with 0x0
> > > t_s for 79654:1086345216 with 0x0
> > > t_s for 79654:1086346240 with 0x0
> > > t_s for 79654:1086347264 with 0x0
> > > t_s for 79654:1086344192 with 0x0
> > > start 'em...done...done...done...done..._m_t_c for 79654:1086344192 with
> > > 0x41404400
> > > _m_t_c for 79654:1086346240 with 0x40d2c400
> > > _m_t_c for 79654:1086343168 with 0x41404200
> > > _m_t_c for 79654:1086345216 with 0x41804200
> > > done...
> > > _m_t_c for 79654:1086347264 with 0x41004200
> > > Create files in sequential order...done.
> > > Stat files in sequential order...done.
> > > Delete files in sequential order...done.
> > > Create files in random order...done.
> > > Stat files in random order...done.
> > > Delete files in random order...done.
> > > 1.96,1.96,hydra.priv.oc.ietfng.org,1,1308217772,10M,,7,81,2644,7,3577,14,34,93,+,+++,773.7,61,16,,,
> > > ,,2325,74,13016,99,2342,86,3019,91,11888,99,2184,89,16397ms,1237ms,671ms,2009ms,177us,1305ms,489ms,1029
> > > us,270ms,140ms,53730us,250ms
> > > Writing a byte at a time...done
> > > Writing intelligently...done
> > > Rewriting...done
> > > Reading a byte at a time...done
> > > Reading intelligently...done
> > > t_s for 79654:1086343168 with 0x1
> > > t_s for 79654:1086346240 with 0x1
> > > t_s for 79654:1086345216 with 0x1
> > > t_s for 79654:1086347264 with 0x1
> > > t_s for 79654:1086344192 with 0x1
> > > start 'em...done...done...done...done...done...
> > > _m_t_c for 79654:1086347264 with 0x1
> > > _m_t_c for 79654:1086344192 with 0x1
> > > _m_t_c for 79654:1086343168 with 0x1
> > >[...]
> > 
> > So what seems to be happening is that the TLS area is being set up
> > incorrectly, eventually: rather than zeroing the tcache_tls value, it is
> > being set to 1, which means no tcache is ever allocated, so when we get
> > around to exiting, the assert trips.
> > 
> > Unfortunately, setting a b

Re: TLS bug?

2011-06-16 Thread Nathaniel W Filardo
Atcht; it's late.  I forgot to mention that this system is a sparc64 V240
2-way SMP machine.  It's running a kernel from 9.0-CURRENT r222833+262af52:
Tue Jun  7 18:47:35 EDT 2011 and a userland from a little later.

Sorry about that.
--nwf;

On Thu, Jun 16, 2011 at 03:31:38AM -0400, Nathaniel W Filardo wrote:
> I have a few applications (bonnie++ and mysql, specifically, both from
> ports) which trip over the assertion in
> lib/libc/stdlib/malloc.c:/^_malloc_thread_cleanup that
> >   assert(tcache != (void *)(uintptr_t)1);
> 
> I have patched malloc.c thus:
> 
> > --- a/lib/libc/stdlib/malloc.c
> > +++ b/lib/libc/stdlib/malloc.c
> > @@ -1108,7 +1108,7 @@ static __thread arena_t   *arenas_map 
> > TLS_MODEL;
> >  
> >  #ifdef MALLOC_TCACHE
> >  /* Map of thread-specific caches. */
> > -static __thread tcache_t   *tcache_tls TLS_MODEL;
> > +__thread tcache_t  *tcache_tls TLS_MODEL;
> >  
> >  /*
> >   * Number of cache slots for each bin in the thread cache, or 0 if tcache
> >   * is
> > @@ -6184,10 +6184,17 @@ _malloc_thread_cleanup(void)
> >  #ifdef MALLOC_TCACHE
> > tcache_t *tcache = tcache_tls;
> >  
> > +fprintf(stderr, "_m_t_c for %d:%lu with %p\n", 
> > +   getpid(),
> > +   (unsigned long) _pthread_self(),
> > +   tcache);
> > +
> > if (tcache != NULL) {
> > -   assert(tcache != (void *)(uintptr_t)1);
> > -   tcache_destroy(tcache);
> > -   tcache_tls = (void *)(uintptr_t)1;
> > +   /* assert(tcache != (void *)(uintptr_t)1); */
> > +   if((uintptr_t)tcache != (uintptr_t)1) {
> > +   tcache_destroy(tcache);
> > +   tcache_tls = (void *)(uintptr_t)1;
> > +   }
> 
> and libthr/thread/thr_create.c thus:
> 
> > --- a/lib/libthr/thread/thr_create.c
> > +++ b/lib/libthr/thread/thr_create.c
> > @@ -243,6 +243,8 @@ create_stack(struct pthread_attr *pattr)
> > return (ret);
> >  }
> >  
> > +extern __thread void *tcache_tls;
> > +
> >  static void
> >  thread_start(struct pthread *curthread)
> >  {
> > @@ -280,6 +282,11 @@ thread_start(struct pthread *curthread)
> > curthread->attr.stacksize_attr;
> >  #endif
> >  
> > +fprintf(stderr, "t_s for %d:%lu with %p\n",
> > +getpid(),
> > +(unsigned long) _pthread_self(),
> > +tcache_tls);
> > +
> > /* Run the current thread's start routine with argument: */
> > _pthread_exit(curthread->start_routine(curthread->arg));
> >  
> 
> to attempt to debug this issue.  With those changes in place, bonnie++'s
> execution looks like this:
> 
> >[...]
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading a byte at a time...done
> > Reading intelligently...done
> > t_s for 79654:1086343168 with 0x0
> > t_s for 79654:1086345216 with 0x0
> > t_s for 79654:1086346240 with 0x0
> > t_s for 79654:1086347264 with 0x0
> > t_s for 79654:1086344192 with 0x0
> > start 'em...done...done...done...done..._m_t_c for 79654:1086344192 with
> > 0x41404400
> > _m_t_c for 79654:1086346240 with 0x40d2c400
> > _m_t_c for 79654:1086343168 with 0x41404200
> > _m_t_c for 79654:1086345216 with 0x41804200
> > done...
> > _m_t_c for 79654:1086347264 with 0x41004200
> > Create files in sequential order...done.
> > Stat files in sequential order...done.
> > Delete files in sequential order...done.
> > Create files in random order...done.
> > Stat files in random order...done.
> > Delete files in random order...done.
> > 1.96,1.96,hydra.priv.oc.ietfng.org,1,1308217772,10M,,7,81,2644,7,3577,14,34,93,+,+++,773.7,61,16,,,
> > ,,2325,74,13016,99,2342,86,3019,91,11888,99,2184,89,16397ms,1237ms,671ms,2009ms,177us,1305ms,489ms,1029
> > us,270ms,140ms,53730us,250ms
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading a byte at a time...done
> > Reading intelligently...done
> > t_s for 79654:1086343168 with 0x1
> > t_s for 79654:1086346240 with 0x1
> > t_s for 79654:1086345216 with 0x1
> > t_s for 79654:1086347264 with 0x1
> > t_s for 79654:1086344192 with 0x1
> > start 'em...done...done...done...done...done...
> > _m_t_c for 79654:1086347264 with 0x1
> > _m_t_c for 79654:1086344192 with 0x1
> > _m_t_c for 79654:1086343168 with 0x1
> >[...]
> 
> So what seems to be happening is that the TLS area is being set up
> incorrectly, eventually: rather than zeroing the tcache_tls value, it is
> being set to 1, which means no tcache is ever allocated, so when we get
> around to exiting, the assert trips.
> 
> Unfortunately, setting a breakpoint on __libc_allocate_tls seems to do bad
> things to the kernel (inducing a SIR without any panic message).  I am
> somewhat at a loss; help?
> 
> Thanks in advance!
> --nwf;




pgppZBaMjAyJt.pgp
Description: PGP signature


TLS bug?

2011-06-16 Thread Nathaniel W Filardo
I have a few applications (bonnie++ and mysql, specifically, both from
ports) which trip over the assertion in
lib/libc/stdlib/malloc.c:/^_malloc_thread_cleanup that
>   assert(tcache != (void *)(uintptr_t)1);

I have patched malloc.c thus:

> --- a/lib/libc/stdlib/malloc.c
> +++ b/lib/libc/stdlib/malloc.c
> @@ -1108,7 +1108,7 @@ static __thread arena_t   *arenas_map TLS_MODEL;
>  
>  #ifdef MALLOC_TCACHE
>  /* Map of thread-specific caches. */
> -static __thread tcache_t   *tcache_tls TLS_MODEL;
> +__thread tcache_t  *tcache_tls TLS_MODEL;
>  
>  /*
>   * Number of cache slots for each bin in the thread cache, or 0 if tcache
>   * is
> @@ -6184,10 +6184,17 @@ _malloc_thread_cleanup(void)
>  #ifdef MALLOC_TCACHE
> tcache_t *tcache = tcache_tls;
>  
> +fprintf(stderr, "_m_t_c for %d:%lu with %p\n", 
> +   getpid(),
> +   (unsigned long) _pthread_self(),
> +   tcache);
> +
> if (tcache != NULL) {
> -   assert(tcache != (void *)(uintptr_t)1);
> -   tcache_destroy(tcache);
> -   tcache_tls = (void *)(uintptr_t)1;
> +   /* assert(tcache != (void *)(uintptr_t)1); */
> +   if((uintptr_t)tcache != (uintptr_t)1) {
> +   tcache_destroy(tcache);
> +   tcache_tls = (void *)(uintptr_t)1;
> +   }

and libthr/thread/thr_create.c thus:

> --- a/lib/libthr/thread/thr_create.c
> +++ b/lib/libthr/thread/thr_create.c
> @@ -243,6 +243,8 @@ create_stack(struct pthread_attr *pattr)
> return (ret);
>  }
>  
> +extern __thread void *tcache_tls;
> +
>  static void
>  thread_start(struct pthread *curthread)
>  {
> @@ -280,6 +282,11 @@ thread_start(struct pthread *curthread)
> curthread->attr.stacksize_attr;
>  #endif
>  
> +fprintf(stderr, "t_s for %d:%lu with %p\n",
> +getpid(),
> +(unsigned long) _pthread_self(),
> +tcache_tls);
> +
> /* Run the current thread's start routine with argument: */
> _pthread_exit(curthread->start_routine(curthread->arg));
>  

to attempt to debug this issue.  With those changes in place, bonnie++'s
execution looks like this:

>[...]
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> t_s for 79654:1086343168 with 0x0
> t_s for 79654:1086345216 with 0x0
> t_s for 79654:1086346240 with 0x0
> t_s for 79654:1086347264 with 0x0
> t_s for 79654:1086344192 with 0x0
> start 'em...done...done...done...done..._m_t_c for 79654:1086344192 with
> 0x41404400
> _m_t_c for 79654:1086346240 with 0x40d2c400
> _m_t_c for 79654:1086343168 with 0x41404200
> _m_t_c for 79654:1086345216 with 0x41804200
> done...
> _m_t_c for 79654:1086347264 with 0x41004200
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> 1.96,1.96,hydra.priv.oc.ietfng.org,1,1308217772,10M,,7,81,2644,7,3577,14,34,93,+,+++,773.7,61,16,,,
> ,,2325,74,13016,99,2342,86,3019,91,11888,99,2184,89,16397ms,1237ms,671ms,2009ms,177us,1305ms,489ms,1029
> us,270ms,140ms,53730us,250ms
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> t_s for 79654:1086343168 with 0x1
> t_s for 79654:1086346240 with 0x1
> t_s for 79654:1086345216 with 0x1
> t_s for 79654:1086347264 with 0x1
> t_s for 79654:1086344192 with 0x1
> start 'em...done...done...done...done...done...
> _m_t_c for 79654:1086347264 with 0x1
> _m_t_c for 79654:1086344192 with 0x1
> _m_t_c for 79654:1086343168 with 0x1
>[...]

So what seems to be happening is that the TLS area is being set up
incorrectly, eventually: rather than zeroing the tcache_tls value, it is
being set to 1, which means no tcache is ever allocated, so when we get
around to exiting, the assert trips.

Unfortunately, setting a breakpoint on __libc_allocate_tls seems to do bad
things to the kernel (inducing a SIR without any panic message).  I am
somewhat at a loss; help?

Thanks in advance!
--nwf;


pgpyJIoqnx75o.pgp
Description: PGP signature