[Ecls-list] Improving ECL (and my software :)

Matthew Mondor Tue, 06 Sep 2011 07:00:43 -0700

Hello again,

I recently wrote a test HTTPd for ECL.  It's getting along quite well,
and can handle ~2000 requests/second on a dual Core2 system.  If it
eventually becomes very stable I intend to isolate it as a library for
use by application servers, on which frameworks and applications could
be written.  But, as it was beginning to be complete enough to write a
first stress-test dynamic application for it, it was possible to
discover some oddities.


There are two issues.  The first, which I was already hitting all along
while developing it, appears to be some race condition of sorts.  Even
when loading, it's possible for ECL to start spinning in a busy loop or
to outright crash, yet it doesn't occur every time.  Eventually one of
the two (stuck busy-looping or crashing) occurs randomly, but this can
take anywhere from minutes to days.  If it's not a race condition, it
could also be due to some memory corruption happening somewhere in the
CL C library.

I'm not used to debugging code in gdb with this many threads and
spurious signals.  There also appears to be a problem on the NetBSD
branch I'm using with live debugging of threaded applications using gdb
(thread-related features only work properly on core dumps).  So I also
set up ECL+Emacs+SLIME+test-httpd.lisp on Linux yesterday, where perhaps
I'll find out more.

When I audited the thread locking code some weeks back, I noticed
various things which might possibly lead to race conditions, and I have
also written alternative mutex code.  Unfortunately, I'm not sure that
this solves any issue; in the short time I've used it I've still seen
issues.  Among the potential problems I've spotted was the use of
recursive mutexes everywhere, even for locks meant to be non-recursive,
along with custom recursion counting code; also a check for the owner
in with-lock.  I attach here the alternative versions of mutex.d and
mp.lsp which I also briefly tried a few weeks back.  It's possible that
they do fix some of the issues but I'm not sure yet.
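
To illustrate the distinction the alternative mutex.d relies on, here is
a minimal standalone sketch (not ECL code) of how the two POSIX mutex
types behave.  The helper names make_mutex, errorcheck_relock,
errorcheck_unlock_unheld and recursive_relock are made up for this
example; only the pthread calls themselves are real.

```c
#define _XOPEN_SOURCE 600       /* For pthread mutex attributes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialize M as a mutex of the given POSIX type. */
static void
make_mutex(pthread_mutex_t *m, int type)
{
        pthread_mutexattr_t attr;
        int rc;

        rc = pthread_mutexattr_init(&attr);
        assert(rc == 0);
        rc = pthread_mutexattr_settype(&attr, type);
        assert(rc == 0);
        rc = pthread_mutex_init(m, &attr);
        assert(rc == 0);
        (void)rc;
        pthread_mutexattr_destroy(&attr);
}

/* Lock an error-checking mutex twice from the same thread and return
 * the result of the second attempt (an error code, not a deadlock). */
static int
errorcheck_relock(void)
{
        pthread_mutex_t m;
        int rc;

        make_mutex(&m, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_lock(&m);
        rc = pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
        return rc;
}

/* Unlock an error-checking mutex that the caller does not hold and
 * return the resulting error code. */
static int
errorcheck_unlock_unheld(void)
{
        pthread_mutex_t m;
        int rc;

        make_mutex(&m, PTHREAD_MUTEX_ERRORCHECK);
        rc = pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
        return rc;
}

/* Lock a recursive mutex twice and unlock it twice; the mutex keeps its
 * own count, so no extra bookkeeping is needed.  Returns 0 on success. */
static int
recursive_relock(void)
{
        pthread_mutex_t m;
        int rc;

        make_mutex(&m, PTHREAD_MUTEX_RECURSIVE);
        rc = pthread_mutex_lock(&m);
        if (rc == 0)
                rc = pthread_mutex_lock(&m);
        if (rc == 0)
                rc = pthread_mutex_unlock(&m);
        if (rc == 0)
                rc = pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
        return rc;
}
```

Built with something like "cc -pthread", errorcheck_relock() should
return EDEADLK and errorcheck_unlock_unheld() should return EPERM on a
POSIX-conforming system, while recursive_relock() returns 0.  This is
why the error-checking type seems a reasonable fit for non-recursive
locks: misuse is reported by the library instead of being tracked with
custom holder/counter code.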

Unfortunately, these kinds of problems are usually the hardest to fix.
CLOS is not involved in the server code, except where standard generic
functions are used.  I would appreciate it if others could help audit
the code and/or confirm whether they also experience this.  My CL
experience is also limited, having mostly used C before.  The test
server code is available and requires no external dependencies other
than ECL:

cvs -z3 -d:pserver:anon...@cvs.pulsar-zone.net:/cvsroot co mmondor/mmsoftware/cl/server
http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/cl/server/

To test, simply change the options at the bottom of test-httpd.conf
(particularly the default vhost and address/port to bind).  There are
also debug options/features at the top (note that :beep is
NetBSD-specific though).  Then:

(setf *default-pathname-defaults* #P"<path-to-server-code-directory>")
(mapc #'compile-file '("dlist.lisp" "character.lisp" "html.lisp" 
"ecl-mp-server.lisp" "test-httpd.lisp"))
(load "test-httpd")

Then it should be ready (when it doesn't crash loading).  If the debug
feature is enabled for it (:test), then /test is available and contains
various information like passed GET/POST, version etc.  The little test
application I wrote yesterday is available as /chat.


The other problem, which I only discovered last night when testing the
first dynamic application, appears to be Unicode related.  In the test
application many messages of various lengths can be entered and
everything is fine.  Yet if I start copy-pasting UTF-8 from
UTF-8-demo.txt
(http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt) then
eventually the whole document isn't sent anymore: the browser keeps
waiting for it to finish loading but it never does (a partial page
results).

At first I thought the latter problem had to do with disabled
TCP_NODELAY, SO_LINGER and possibly a flushing issue, but I disabled
those options and even tried a variant using write-char and
finish-output, with similar results.  It also doesn't seem to be
related to some string size limit or the like, since when logging the
output, it seems complete.
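
For anyone wanting to reproduce the socket setup outside of Lisp, the
two options mentioned above can be toggled at the C level roughly as
follows.  This is only a sketch of the standard setsockopt/getsockopt
calls, not what the server itself does; the helper names set_nodelay,
set_linger, get_nodelay and get_linger_enabled are invented for this
example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set (on != 0) or clear (on == 0) TCP_NODELAY on a TCP socket.
 * When set, Nagle's algorithm is disabled and small writes are sent
 * without waiting to be coalesced.  Returns 0 on success, -1 on error. */
static int
set_nodelay(int fd, int on)
{
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Enable or disable SO_LINGER.  When enabled, close() may block for up
 * to SECONDS while unsent data is flushed.  Returns 0 or -1. */
static int
set_linger(int fd, int on, int seconds)
{
        struct linger lg;

        lg.l_onoff = on;
        lg.l_linger = seconds;
        return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Read back TCP_NODELAY: 1 if set, 0 if clear, -1 on error. */
static int
get_nodelay(int fd)
{
        int on = 0;
        socklen_t len = sizeof(on);

        if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) != 0)
                return -1;
        return on != 0;
}

/* Read back SO_LINGER: 1 if enabled, 0 if disabled, -1 on error. */
static int
get_linger_enabled(int fd)
{
        struct linger lg;
        socklen_t len = sizeof(lg);

        if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) != 0)
                return -1;
        return lg.l_onoff != 0;
}
```

Reading the options back with the getters on the accepted socket is a
cheap way to confirm that a given setting really is in effect before
ruling it out as a cause.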


Both problems occur on both NetBSD and Linux, so it doesn't appear to
be a kernel or libc issue.  Both systems run 32-bit software and are
i686 (one a P4 and the other a Core2).  The latest ECL from CVS/GIT is
used, built with threads and unicode support.


Thanks for any help,
-- 
Matt

/* -*- mode: c; c-basic-offset: 8 -*- */
/*
    threads_mutex.d -- Native mutually exclusive locks.
*/
/*
    Copyright (c) 2003, Juan Jose Garcia Ripoll.
    Copyright (c) 2011, Matthew Mondor.

    ECL is free software; you can redistribute it and/or
    modify it under the terms of the GNU Library General Public
    License as published by the Free Software Foundation; either
    version 2 of the License, or (at your option) any later version.

    See file '../Copyright' for full details.
*/

#ifndef ECL_WINDOWS_THREADS

#ifndef __sun__ /* See unixinit.d for this */
#define _XOPEN_SOURCE 600       /* For pthread mutex attributes */
#endif
#include <errno.h>
#include <ecl/ecl.h>
#include <pthread.h>
#include <ecl/internal.h>

/*----------------------------------------------------------------------
 * LOCKS or MUTEX
 */

static int initialized = 0;

static pthread_mutexattr_t mutexattr_normal;
static pthread_mutexattr_t mutexattr_recursive;

static void lock_init(void)
{

        pthread_mutexattr_init(&mutexattr_normal);
        pthread_mutexattr_settype(&mutexattr_normal,
            PTHREAD_MUTEX_ERRORCHECK);

        pthread_mutexattr_init(&mutexattr_recursive);
        pthread_mutexattr_settype(&mutexattr_recursive,
            PTHREAD_MUTEX_RECURSIVE);

        initialized = 1;
}


static void
FEerror_not_a_lock(cl_object lock)
{
        FEwrong_type_argument(@'mp::lock', lock);
}

static void
FEunknown_lock_error(cl_object error, cl_object lock)
{
        /* Callers pass the error code first and the lock second. */
        FEerror("Error ~A when operating on lock ~A.", 2, error, lock);
}

static void
FEerror_deprecated_lock_api(cl_object function, cl_object lock)
{
        FEerror("Called deprecated function ~A on lock ~A.",
            2, function, lock);
}



cl_object
ecl_make_lock(cl_object name, bool recursive)
{
        cl_env_ptr the_env = ecl_process_env();
        cl_object output = ecl_alloc_object(t_lock);
        ecl_disable_interrupts_env(the_env);

        if (!initialized)
                lock_init();

        pthread_mutex_init(&output->lock.mutex,
            (recursive ? &mutexattr_recursive : &mutexattr_normal));
        output->lock.name = name;
        output->lock.holder = Cnil;
        output->lock.counter = 0;
        output->lock.recursive = recursive;

        ecl_set_finalizer_unprotected(output, Ct);
        ecl_enable_interrupts_env(the_env);
        return output;
}

@(defun mp::make-lock (&key name ((:recursive recursive) Ct))
@
        @(return ecl_make_lock(name, !Null(recursive)))
@)



/*
 * XXX The following functions are mostly useless except perhaps for
 * reflection and/or debugging.  They unfortunately also add complexity,
 * raising the chances of race conditions.
 */

cl_object
mp_recursive_lock_p(cl_object lock)
{
        cl_env_ptr env = ecl_process_env();
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);
        ecl_return1(env, lock->lock.recursive? Ct : Cnil);
}

cl_object
mp_lock_name(cl_object lock)
{
        cl_env_ptr env = ecl_process_env();
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);
        ecl_return1(env, lock->lock.name);
}

cl_object
mp_lock_holder(cl_object lock)
{
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);
        FEerror_deprecated_lock_api(
            ecl_cstring_to_base_string_or_nil("MP:LOCK-HOLDER"),
            lock);
}

cl_object
mp_lock_mine_p(cl_object lock)
{
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);
        FEerror_deprecated_lock_api(
            ecl_cstring_to_base_string_or_nil("MP:LOCK-MINE-P"),
            lock);
}

cl_object
mp_lock_count(cl_object lock)
{
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);
        FEerror_deprecated_lock_api(
            ecl_cstring_to_base_string_or_nil("MP:LOCK-COUNT"),
            lock);
}

cl_object
mp_lock_count_mine(cl_object lock)
{
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);
        FEerror_deprecated_lock_api(
            ecl_cstring_to_base_string_or_nil("MP:LOCK-COUNT-MINE"),
            lock);
}



/* Now let's deal as directly as possible with mutexes. */

cl_object
mp_giveup_lock(cl_object lock)
{
        int rc;
        cl_env_ptr env = ecl_process_env();
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);

        if ((rc = pthread_mutex_unlock(&lock->lock.mutex)) != 0)
                FEunknown_lock_error(ecl_make_int(rc), lock);

        ecl_return1(env, Ct);
}

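cl_object
mp_get_lock_nowait(cl_object lock)
{
        int rc;
        cl_env_ptr env = ecl_process_env();
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);

        rc = pthread_mutex_trylock(&lock->lock.mutex);
        /* A busy lock is the expected outcome of a non-blocking
         * attempt: report it as NIL instead of signaling an error. */
        if (rc == EBUSY) {
                ecl_return1(env, Cnil);
        }
        if (rc != 0)
                FEunknown_lock_error(ecl_make_int(rc), lock);

        ecl_return1(env, lock);
}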

cl_object
mp_get_lock_wait(cl_object lock)
{
        int rc;
        cl_env_ptr env = ecl_process_env();
        if (type_of(lock) != t_lock)
                FEerror_not_a_lock(lock);

        if ((rc = pthread_mutex_lock(&lock->lock.mutex)) != 0)
                FEunknown_lock_error(ecl_make_int(rc), lock);

        ecl_return1(env, lock);
}

@(defun mp::get-lock (lock &optional (wait Ct))
@
        if (Null(wait))
                return mp_get_lock_nowait(lock);
        else
                return mp_get_lock_wait(lock);
@)

#endif /* ECL_WINDOWS_THREADS */

;;;;  -*- Mode: Lisp; Syntax: Common-Lisp; Package: SYSTEM -*-
;;;;
;;;;  MP.LSP  -- Multiprocessing capabilities.

;;;;  Copyright (c) 2003, Juan Jose Garcia-Ripoll
;;;;
;;;;    This program is free software; you can redistribute it and/or
;;;;    modify it under the terms of the GNU Library General Public
;;;;    License as published by the Free Software Foundation; either
;;;;    version 2 of the License, or (at your option) any later version.
;;;;
;;;;    See file '../Copyright' for full details.

#-threads
(defpackage "MP"
  (:use "CL" "SI")
  (:export "WITH-LOCK"))

(in-package "MP")

(defmacro without-interrupts (&body body)
  #!+sb-doc
  "Executes BODY with all deferrable interrupts disabled. Deferrable
interrupts arriving during execution of the BODY take effect after BODY has
been executed.

Deferrable interrupts include most blockable POSIX signals, and
SB-THREAD:INTERRUPT-THREAD. Does not interfere with garbage collection, and
unlike in many traditional Lisps using userspace threads, in SBCL
WITHOUT-INTERRUPTS does not inhibit scheduling of other threads.

Binds ALLOW-WITH-INTERRUPTS, WITH-LOCAL-INTERRUPTS and WITH-RESTORED-INTERRUPTS
as a local macros.

WITH-RESTORED-INTERRUPTS executes the body with interrupts enabled if and only
if the WITHOUT-INTERRUPTS was in an environment in which interrupts were 
allowed.

ALLOW-WITH-INTERRUPTS allows the WITH-INTERRUPTS to take effect during the
dynamic scope of its body, unless there is an outer WITHOUT-INTERRUPTS without
a corresponding ALLOW-WITH-INTERRUPTS.

WITH-LOCAL-INTERRUPTS executes its body with interrupts enabled provided that
for there is an ALLOW-WITH-INTERRUPTS for every WITHOUT-INTERRUPTS surrounding
the current one. WITH-LOCAL-INTERRUPTS is equivalent to:

  (allow-with-interrupts (with-interrupts ...))

Care must be taken not to let either ALLOW-WITH-INTERRUPTS or
WITH-LOCAL-INTERRUPTS appear in a function that escapes from inside the
WITHOUT-INTERRUPTS in:

  (without-interrupts
    ;; The body of the lambda would be executed with WITH-INTERRUPTS allowed
    ;; regardless of the interrupt policy in effect when it is called.
    (lambda () (allow-with-interrupts ...)))

  (without-interrupts
    ;; The body of the lambda would be executed with interrupts enabled
    ;; regardless of the interrupt policy in effect when it is called.
    (lambda () (with-local-interrupts ...)))
"
  (ext:with-unique-names (outer-allow-with-interrupts outer-interrupts-enabled)
    `(multiple-value-prog1
         (macrolet ((allow-with-interrupts (&body allow-forms)
                      `(let ((si:*allow-with-interrupts* ,',outer-allow-with-interrupts))
                         ,@allow-forms))
                    (with-restored-interrupts (&body with-forms)
                      `(let ((si:*interrupts-enabled* ,',outer-interrupts-enabled))
                         ,@with-forms))
                    (with-local-interrupts (&body with-forms)
                      `(let* ((si:*allow-with-interrupts* ,',outer-allow-with-interrupts)
                              (si:*interrupts-enabled* ,',outer-allow-with-interrupts))
                         (when ,',outer-allow-with-interrupts
                           (si::check-pending-interrupts))
                         (locally ,@with-forms))))
           (let* ((,outer-interrupts-enabled si:*interrupts-enabled*)
                  (si:*interrupts-enabled* nil)
                  (,outer-allow-with-interrupts si:*allow-with-interrupts*)
                  (si:*allow-with-interrupts* nil))
             (declare (ignorable ,outer-allow-with-interrupts
                                 ,outer-interrupts-enabled))
             ,@body))
       (when si:*interrupts-enabled*
         (si::check-pending-interrupts)))))

(defmacro with-interrupts (&body body)
  "Executes BODY with deferrable interrupts conditionally enabled. If there
are pending interrupts they take effect prior to executing BODY.

As interrupts are normally allowed WITH-INTERRUPTS only makes sense if there
is an outer WITHOUT-INTERRUPTS with a corresponding ALLOW-WITH-INTERRUPTS:
interrupts are not enabled if any outer WITHOUT-INTERRUPTS is not accompanied
by ALLOW-WITH-INTERRUPTS."
  (ext:with-unique-names (allowp enablep)
    ;; We could manage without ENABLEP here, but that would require
    ;; taking extra care not to ever have *ALLOW-WITH-INTERRUPTS* NIL
    ;; and *INTERRUPTS-ENABLED* T -- instead of risking future breakage
    ;; we take the tiny hit here.
    `(let* ((,allowp si:*allow-with-interrupts*)
            (,enablep si:*interrupts-enabled*)
            (si:*interrupts-enabled* (or ,enablep ,allowp)))
       (when (and ,allowp (not ,enablep))
         (si::check-pending-interrupts))
       (locally ,@body))))


(defmacro with-lock ((lock-form &rest options) &body body)
  #-threads
  `(progn ,@body)
  #+threads
  (ext:with-unique-names (lock interrupts)
    `(let ((,lock ,lock-form))
       (without-interrupts
           (unwind-protect
                (with-restored-interrupts
                    (mp::get-lock ,lock)
                  (locally ,@body))
                (mp::giveup-lock ,lock))))))
