[Bug libstdc++/21334] New: Lack of Posix compliant thread safety in std::basic_string

jkanze at cheuvreux dot com Mon, 02 May 2005 04:45:47 -0700

I am sending this to the g++ bug list on the recommendation of
Gabriel Dos Reis.  From what little I've read in the g++
documentation, I'm not convinced that the authors of the g++
library intend for it to be supported, although Posix would seem
to require it.


For the record, the statement in Posix is: "Applications shall
ensure that access to any memory location by more than one
thread of control (threads or processes) is restricted such that
no thread of control can read or modify a memory location while
another thread of control may be modifying it."  The obvious
extension to C++ is that of replacing "memory location" with
"object"; at the very least, of course, one can only require
something of "memory locations" which the user sees, directly or
indirectly.  The statement in the libstdc++-v3 FAQ (question
5.6) is: "All library objects are safe to use in a multithreaded
program as long as each thread carefully locks out access by any
other thread while it uses any object visible to another thread,
i.e., treat library objects like any other shared resource. In
general, this requirement includes both read and write access to
objects; unless otherwise documented as safe, do not assume that
two threads may access a shared standard library object at the
same time."  A considerably weaker guarantee than what one
normally expects under Posix.  (Note that the clause "like any
other shared resource" is simply false for those of us used to
the Posix model.  If I replace std::string with char[] in my
code below, the behavior is perfectly defined under Posix.)
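
To make the char[] remark concrete, here is a minimal sketch of what the Posix rule permits.  The names shared_text, read_shared and concurrent_reads_agree are made up for this illustration and are not part of the test program.  The buffer is fully written before either thread is created, and afterwards both threads only read it, so no locking is needed:

```cpp
#include <pthread.h>
#include <cstring>

// Shared buffer, fully initialized before any reader thread is created.
static char shared_text[] = "0123456789";

// Each reader only reads shared_text and writes into its own private buffer.
extern "C" void* read_shared(void* out) {
    std::strcpy(static_cast<char*>(out), shared_text);
    return NULL;
}

// Starts two unsynchronized readers; returns true if both saw the full text.
bool concurrent_reads_agree() {
    char a[sizeof shared_text] = "";
    char b[sizeof shared_text] = "";
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, &read_shared, a) != 0) return false;
    if (pthread_create(&t2, NULL, &read_shared, b) != 0) {
        pthread_join(t1, NULL);
        return false;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return std::strcmp(a, shared_text) == 0 && std::strcmp(b, shared_text) == 0;
}
```

Nothing modifies shared_text while the readers run, so the quoted Posix requirement is met without a mutex.  The complaint in this report is that the same read-only usage of a std::string is not covered.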

The following is an example of a program which may cause
problems:

    #include <pthread.h>
    #include <iostream>
    #include <ostream>
    #include <string>

    std::string         g ;

    extern "C" void*
    thread1(
        void*               lock )
    {
        std::string         t( g ) ;
        pthread_mutex_lock( static_cast< pthread_mutex_t* >( lock ) ) ;
        std::cout << t << '\n' ;
        pthread_mutex_unlock( static_cast< pthread_mutex_t* >( lock ) ) ;
        return NULL ;
    }

    extern "C" void*
    thread2(
        void*               lock )
    {
        std::string         t( g.begin(), g.end() ) ;
        pthread_mutex_lock( static_cast< pthread_mutex_t* >( lock ) ) ;
        std::cout << t << '\n' ;
        pthread_mutex_unlock( static_cast< pthread_mutex_t* >( lock ) ) ;
        return NULL ;
    }

    int
    main()
    {
        g = "0123456789" ;
        pthread_mutex_t     lock ;
        pthread_mutex_init( &lock, NULL ) ;
        pthread_t           t1 ;
        if ( pthread_create( &t1, NULL, &thread1, &lock ) != 0 ) {
            std::cerr << "Could not create thread1" << std::endl ;
        }
        pthread_t           t2 ;
        if ( pthread_create( &t2, NULL, &thread2, &lock ) != 0 ) {
            std::cerr << "Could not create thread2" << std::endl ;
        }
        pthread_join( t1, NULL ) ;
        pthread_join( t2, NULL ) ;
        return 0 ;
    }

Consider the following scenario:

 -- Thread 2 executes first.  It gets to the expression
    g.begin() (which for purposes of argument, we will suppose
    is called before g.end() -- the problem will occur in
    whichever function is called first), and starts executing it.

    At this point, the value _M_refcount in the _Rep_base is 0,
    since there is only one instance, g, which shares the
    representation.  The representation is not "leaked", so we
    call _M_leak_hard.

    _M_leak_hard calls _M_rep()->_M_is_shared(), which returns
    false.

 -- Thread 1 interrupts.  Thread 1 calls the copy constructor,
    with g as a parameter, which ultimately calls _M_grab on the
    representation.  _M_is_leaked() returns false, since the
    _M_refcount is still 0 in the representation.  Thread 1 thus
    calls _M_refcopy() on the representation, which (atomically)
    increments _M_refcount.  Thread 1 leaves the copy constructor.

 -- Now back to thread 2.   Since _M_is_shared() returned false,
    thread 2 doesn't call _M_mutate -- it simply calls
    _M_set_leaked() on the representation, which sets
    _M_refcount to -1.

    We will suppose that this modification is visible to all
    other threads, despite the fact that there are no memory
    barriers around it (which means that this supposition will
    be false on certain platforms).

 -- And life goes on.  The second call to begin()/end() doesn't
    change anything, because it finds that the representation is
    already "leaked".

 -- Finally, suppose that thread 1 finishes while thread 2 is
    still using its iterators.  Thread 1 calls the destructor
    for its string.  It sees that _M_refcount < 0, concludes
    that the representation is leaked, and deletes it.  Despite
    the fact that thread 2 still holds iterators referring to it,
    and despite the fact that there is still a global variable
    (usable after the pthread_joins in main) which depends on
    it.

The problem is, of course, that the sequence which tests whether
we have to leak, and then leaks, is not atomic.
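
The interleaving can be replayed deterministically in a single thread with a toy model.  This is only a sketch: ToyRep, grab, dispose and replay_race are hypothetical stand-ins that follow the description above, not the actual libstdc++ code, and the condition used in dispose is an assumption taken from the scenario.

```cpp
#include <cassert>

// Hypothetical stand-in for the shared representation; not libstdc++ code.
struct ToyRep {
    int  refcount = 0;       // 0: single owner, >0: shared, <0: leaked
    bool destroyed = false;  // set when the buffer would be freed
};

bool is_leaked(const ToyRep& r) { return r.refcount < 0; }
bool is_shared(const ToyRep& r) { return r.refcount > 0; }

// Copy constructor path: share the representation unless it is leaked.
// Returns true if the copy shares the buffer.
bool grab(ToyRep& r) {
    if (is_leaked(r)) return false;  // the real code would deep-copy here
    ++r.refcount;
    return true;
}

// Destructor path (assumed): free the buffer when the count seen
// before the decrement is <= 0.
void dispose(ToyRep& r) {
    if (r.refcount-- <= 0) r.destroyed = true;
}

// Replays the steps of the scenario in the order described.
ToyRep replay_race() {
    ToyRep rep;                            // owned by g only
    bool must_unshare = is_shared(rep);    // thread 2: the check (false)
    bool shared = grab(rep);               // thread 1: copy ctor, count -> 1
    assert(shared);
    if (!must_unshare) rep.refcount = -1;  // thread 2: acts on the stale check
    dispose(rep);                          // thread 1: destructor
    return rep;
}
```

In this model replay_race() returns a representation with destroyed set to true, although g and the iterators held by thread 2 would still refer to it.  The write of -1 simply overwrites the increment made by the copy in between.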

-- 
           Summary: Lack of Posix compliant thread safety in
                    std::basic_string
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: minor
          Priority: P2
         Component: libstdc++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jkanze at cheuvreux dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: sparc-sun-solaris2.8
  GCC host triplet: sparc-sun-solaris2.8
GCC target triplet: sparc-sun-solaris2.8


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334
