Rick T wrote:

The code below (server addresses Xed out for security) has been used
on my website for years, but it does seem to misbehave on rare
occasions, so I have a few questions on how I might improve it. I
apologize in advance for my amateurish coding; I’m a high school
teacher who cannot afford to hire help, so I bought a bunch of O’Reilly
books and plunged in bravely. The heavy commenting is because I don’t
often work on the code and need the reminders of why I am doing things!

This code serves multiple choice questions to students, stores their
answer and other statistics, and emails me when the test is completed.
It almost always works well, but there are exceptions which puzzle me.

First: Sometimes a test will exit early, sending me the result
although the student did not make it through all the questions. Once
this happened to the same student twice. I don’t think students are
hacking my code because this has happened only to relatively
unsophisticated individuals.

Second: In one case, although I cannot verify this, the student
claimed that she had answered a lot of questions that got lost.
Perhaps I need to save multiple copies of their score results and
delete the old ones only after a successful completion? Perhaps I need
to save data in text files and only append them? General strategies
for structuring and securing data are way above my skill level, I’m
afraid. My host offers MySQL, but when I bought a book on it I
discovered that it had a steep and scary learning curve!

Third: On one occasion my server emailed me a student’s result twice in
quick succession (one second apart according to Header values). One of
the Received headers contained “by mx13.futurequest.net” and the other
had “mx14” instead of “13”. This is not actually a problem for me like
the anomaly above, but I am curious about it.

And of course I welcome any and all suggestions, however small, on how
to improve my code and even the strategies I am struggling to use, and
I don’t mind you being blunt about it. You folks are a treasure of
useful feedback and I am grateful for whatever you offer!




[ SNIP ]

my %progress_hash;
        my $progfile = "/xxx/xxxx/xxx/data/students/$student_id/p_$course_file";
        die "There is no file called $progfile: $!\n" unless -e $progfile;
        my $db = tie %progress_hash, 'DB_File', $progfile
            or die "Can't tie progress_hash to $progfile: $!\n";
        my $fd = $db->fd(); # get a file descriptor
        open PROGFILE, "+<&=$fd" or die "Can't safely open $progfile : $!\n";
        flock ( PROGFILE, LOCK_EX )
            or die "Unable to acquire exclusive lock on $progfile: $!\n";
        undef $db;

[ SNIP ]

my %course_hash;
        my $coursefile = "/xxx/xxxx/xxx/data/courses/$course_file";
        die "There is no file called $coursefile: $!\n" unless -e $coursefile;
        my $db2 = tie %course_hash, 'DB_File', $coursefile, O_RDONLY
            or die "Can't tie course_hash to $coursefile: $!\n";
        my $fd2 = $db2->fd(); # get a file descriptor
        open COURSEFILE, "<&=$fd2"
            or die "Can't safely open $coursefile for reading: $!\n";
        flock ( COURSEFILE, LOCK_SH )
            or die "Can't acquire a shared lock on $coursefile: $!";
        undef $db2;

[ SNIP ]

untie %progress_hash; close PROGFILE;
untie %course_hash; close COURSEFILE;

This could be your problem: you tie the database first and only then
flock the descriptor returned by fd().  According to my copy of the
DB_File documentation:

HINTS AND TIPS
   Locking: The Trouble with fd
       Until version 1.72 of this module, the recommended technique for
       locking DB_File databases was to flock the filehandle returned
       from the "fd" function. Unfortunately this technique has been
       shown to be fundamentally flawed (Kudos to David Harris for
       tracking this down). Use it at your own peril!

       The locking technique went like this.

           $db = tie(%db, 'DB_File', 'foo.db', O_CREAT|O_RDWR, 0644)
               || die "dbcreat foo.db $!";
           $fd = $db->fd;
           open(DB_FH, "+<&=$fd") || die "dup $!";
           flock (DB_FH, LOCK_EX) || die "flock: $!";
           ...
           $db{"Tom"} = "Jerry" ;
           ...
           flock(DB_FH, LOCK_UN);
           undef $db;
           untie %db;
           close(DB_FH);

       In simple terms, this is what happens:

       1.   Use "tie" to open the database.

       2.   Lock the database with fd & flock.

       3.   Read & Write to the database.

       4.   Unlock and close the database.

       Here is the crux of the problem. A side-effect of opening the
       DB_File database in step 1 is that an initial block from the
       database will get read from disk and cached in memory.

       To see why this is a problem, consider what can happen when two
       processes, say "A" and "B", both want to update the same DB_File
       database using the locking steps outlined above. Assume process
       "A" has already opened the database and has a write lock, but it
       hasn't actually updated the database yet (it has finished step
       2, but not started step 3 yet). Now process "B" tries to open
       the same database - step 1 will succeed, but it will block on
       step 2 until process "A" releases the lock. The important thing
       to notice here is that at this point in time both processes will
       have cached identical initial blocks from the database.

       Now process "A" updates the database and happens to change some
       of the data held in the initial buffer. Process "A" terminates,
       flushing all cached data to disk and releasing the database
       lock. At this point the database on disk will correctly reflect
       the changes made by process "A".

       With the lock released, process "B" can now continue. It also
       updates the database and unfortunately it too modifies the data
       that was in its initial buffer. Once that data gets flushed to
       disk it will overwrite some/all of the changes process "A" made
       to the database.

       The result of this scenario is at best a database that doesn't
       contain what you expect. At worst the database will corrupt.

       The above won't happen every time competing processes update the
       same DB_File database, but it does illustrate why the technique
       should not be used.
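
To make the scenario above concrete, here is a toy model of it in plain Perl. It does not use DB_File at all: the hash %disk stands in for the database file, and %cache_a and %cache_b stand in for the initial block each process caches when it ties the database. All of these names are made up for illustration.

```perl
use strict;
use warnings;

# Toy model only: %disk plays the role of the database file on disk.
my %disk = ( Tom => 'Jerry', count => 0 );

# Step 1: both processes open (tie) the database, and each one
# caches the same initial block before either holds the lock.
my %cache_a = %disk;
my %cache_b = %disk;

# Process A gets the lock, updates its cached block, flushes it
# to disk and releases the lock.
$cache_a{count} = 1;
%disk = %cache_a;

# Process B gets the lock only now, but it still works from the
# block it cached before A changed anything. Its flush overwrites
# A's update.
$cache_b{Tom} = 'Spike';
%disk = %cache_b;

print "count=$disk{count} Tom=$disk{Tom}\n";
```

After both "processes" have finished, count is back to 0: A's update has been silently lost, which is the kind of thing that could look like a student's answers disappearing.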

   Safe ways to lock a database
       Starting with version 2.x, Berkeley DB  has internal support for
       locking.  The companion module to this one, BerkeleyDB, provides
       an interface to this locking functionality. If you are serious
       about locking Berkeley DB databases, I strongly recommend using
       BerkeleyDB.

       If using BerkeleyDB isn't an option, there are a number of
       modules available on CPAN that can be used to implement locking.
       Each one implements locking differently and has different goals
       in mind. It is therefore worth knowing the difference, so that
       you can pick the right one for your application. Here are the
       three locking wrappers:

       Tie::DB_Lock
            A DB_File wrapper which creates copies of the database file
            for read access, so that you have a kind of a
            multiversioning concurrent read system. However, updates
            are still serial. Use for databases where reads may be
            lengthy and consistency problems may occur.

       Tie::DB_LockFile
            A DB_File wrapper that has the ability to lock and unlock
            the database while it is being used. Avoids the
            tie-before-flock problem by simply re-tie-ing the database
            when you get or drop a lock.  Because of the flexibility in
            dropping and re-acquiring the lock in the middle of a
            session, this can be massaged into a system that will work
            with long updates and/or reads if the application follows
            the hints in the POD documentation.

       DB_File::Lock
            An extremely lightweight DB_File wrapper that simply flocks
            a lockfile before tie-ing the database and drops the lock
            after the untie. Allows one to use the same lockfile for
            multiple databases to avoid deadlock problems, if desired.
            Use for databases where updates and reads are quick and
            simple flock locking semantics are enough.
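
If installing a CPAN module is not an option on your host, the lockfile idea described for DB_File::Lock can be sketched by hand with DB_File and Fcntl alone. What follows is only a rough sketch under my own assumptions, not code taken from DB_File::Lock: the helper name with_locked_db, the ".lock" suffix and the temporary directory are all invented for the example. The point is the ordering: flock a separate lock file first, then tie, then untie, and only then let go of the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code on the tied hash while holding an
# exclusive lock on a separate lock file (assumed name: "$dbfile.lock").
sub with_locked_db {
    my ($dbfile, $code) = @_;
    my $lockfile = "$dbfile.lock";

    # Lock first, so nothing is read from the database (and cached)
    # until this process owns the lock.
    open my $lock_fh, '>>', $lockfile
        or die "Can't open $lockfile: $!\n";
    flock($lock_fh, LOCK_EX)
        or die "Can't lock $lockfile: $!\n";

    my %hash;
    tie %hash, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Can't tie $dbfile: $!\n";

    my $result = eval { $code->(\%hash) };
    my $err = $@;

    untie %hash;       # flush changes to disk while still locked
    close $lock_fh;    # closing the handle releases the lock
    die $err if $err;
    return $result;
}

# Usage with a throwaway database in a temporary directory.
my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/progress_demo.db";

with_locked_db($dbfile, sub {
    my $h = shift;
    $h->{q1} = 'b';       # record an answer
    $h->{answered}++;     # bump a counter
});

my $n = with_locked_db($dbfile, sub { $_[0]{answered} });
print "answered: $n\n";
```

This only helps if every script that touches the database goes through the same lock file, including the ones that only read it.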




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein
