Use Samba, not NFS [Re: New Module Idea: MLDBM::Sync]

2000-11-22 Thread Joshua Chamas

Paul Lindner wrote:
> 
> Might MLDBM::Sync work over an NFS mounted partition?  That's one
> reason I've not used the BerkeleyDB stuff yet..
> 

Paul,

For the first time, I benchmarked concurrent Linux client write 
access over a SAMBA network share, and it worked: 0 data loss.
This is in contrast to an NFS share accessed from Linux, which
would see data loss due to lack of serialization of write requests.

With MLDBM::Sync, I benchmarked 8 Linux clients writing to a 
samba mount pointed at a WinNT PIII 450 over a 10Mbps network.
For 8000 writes, I got:

 SDBM_File: 105 writes/sec
 DB_File:    99 writes/sec  [ better than to local disk ]

It seems the network was the bottleneck on this test, as neither
client nor server CPU/disk was maxed out.  The WinNT server was 
running at 20-25% CPU utilization during the test.

As Apache::ASP $Session uses a method similar to MLDBM::Sync 
to flush i/o, you could then point StateDir at a samba/CIFS 
share to cluster an ASP application well, with 0 data loss.
My understanding is that you have a NetApp cluster which can
export CIFS?  
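A sketch of what that configuration might look like -- this is a hypothetical httpd.conf fragment, and the mount point and application paths are made up; the CIFS share would need to be mounted before Apache starts:

```apache
# Hypothetical setup: StateDir on a CIFS/samba mount shared by the cluster
<Location /asp>
  SetHandler perl-script
  PerlHandler Apache::ASP
  PerlSetVar Global   /usr/local/asp/app
  PerlSetVar StateDir /mnt/state/asp-state
</Location>
```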

Obviously I'd benchmark this heavily to see if there are any
NetApp cluster locking issues, but I'm guessing that you could
likely get 200+ ASP requests per second on a 100Mbps network, 
which would far exceed your base application performance.

-- Joshua
_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA  USA 
http://www.nodeworks.com   1-714-625-4051

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Joshua Chamas

Paul Lindner wrote:
> 
> > I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> > DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> > writers through a shared memory cache.  No open/close/flush required
> > per-write and very very much faster.
> >
> > Is there a reason I'm missing?
> 
> Might MLDBM::Sync work over an NFS mounted partition?  That's one
> reason I've not used the BerkeleyDB stuff yet..
> 

Kinda, but only in SDBM_File mode like Apache::ASP state.
Kinda, because flock() doesn't work over NFS, and that other
patch we worked with that called NFS locking didn't work
when I load tested it.  I've heard that a samba share might
support file locking transparently, but have yet to test this.

MLDBM::Sync uses a method similar to Apache::ASP::State's
to keep data synced.  In an NFS environment, whether data gets
committed comes down to the chance of a collision.

--Joshua





Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Perrin Harkins

On Wed, 22 Nov 2000, Paul Lindner wrote:
> Might MLDBM::Sync work over an NFS mounted partition?

It might, since it's lock-file based.  Performance will be poor though.  
You'd probably be better off using MySQL to share data if you have a
cluster of machines.
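A minimal sketch of the MySQL route -- the DSN, credentials, and the key/value table are all made up for illustration:

```perl
use DBI;
use strict;

# Assumed schema:
#   CREATE TABLE cache (ckey VARCHAR(64) PRIMARY KEY, cval BLOB)
my $dbh = DBI->connect('DBI:mysql:database=cache;host=dbhost',
                       'user', 'password', { RaiseError => 1 });

# REPLACE makes the write safe to repeat from any node in the cluster
my $serialized_data = '...';    # e.g. output of Storable::freeze
my $set = $dbh->prepare('REPLACE INTO cache (ckey, cval) VALUES (?, ?)');
$set->execute('session:42', $serialized_data);

# Any other machine mounting nothing at all can read it back
my ($val) = $dbh->selectrow_array(
    'SELECT cval FROM cache WHERE ckey = ?', undef, 'session:42');
```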

> That's one reason I've not used the BerkeleyDB stuff yet..

BerkeleyDB will definitely not work if you try to use it on multiple
machines that mount the same files using NFS.  It buffers writes in RAM,
so you would quickly get out of sync between machines.

- Perrin






Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Perrin Harkins

On Wed, 22 Nov 2000, Tim Bunce wrote:
> I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> writers through a shared memory cache.  No open/close/flush required
> per-write and very very much faster.
> 
> Is there a reason I'm missing?

There are a few.  It's much harder to build than most CPAN modules, partly
because of conflicts some people run into with the db library Red Hat
provides.  The documentation is pretty weak on how to use it with a shared
memory environment.  (You have to use BerkeleyDB.pm for this incidentally;
DB_File does not support it.)  We got past these problems and then ran
into issues with db corruption.  If Apache gets shut down with a SIGKILL
(and this seems to happen fairly often when using mod_perl), the data can
be corrupted in such a way that when you next try to open it BerkeleyDB
will just hang forever.  Sleepycat says this is a known issue with using
BerkeleyDB from Apache and they don't have a solution for it yet.  Even
using their transaction mechanism does not prevent this problem.

We tried lots of different things and finally have reached what seems to
be a solution by using database-level locks rather than page-level.  We
still get to open the database in ChildInit and keep it open, with all the
speed benefits of the shared memory buffer.  It is definitely the fastest
available way to share data between processes, but the problems we had
have got me looking at other solutions again.
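A sketch of the database-level-locking setup described above, using BerkeleyDB.pm's Concurrent Data Store mode (DB_INIT_CDB, one writer / many readers per database); the home directory and filename are placeholders:

```perl
use BerkeleyDB;
use strict;

# CDB mode locks at the database level rather than page level,
# sidestepping the deadlock/corruption scenarios described above.
my $env = BerkeleyDB::Env->new(
    -Home  => '/var/apache/bdb',                   # placeholder path
    -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
) or die "env: $BerkeleyDB::Error";

# Open once per child (e.g. from a ChildInit handler) and keep it
# open, retaining the shared memory buffer's speed benefits.
my %cache;
tie %cache, 'BerkeleyDB::Hash',
    -Filename => 'cache.db',
    -Flags    => DB_CREATE,
    -Env      => $env
  or die "db: $BerkeleyDB::Error";

$cache{key} = 'value';    # write goes through the shared mpool
```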

If you do try it out, I'd be eager to hear what your experiences with it
are.

- Perrin






Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Paul Lindner

On Wed, Nov 22, 2000 at 10:58:43AM +, Tim Bunce wrote:
> On Tue, Nov 21, 2000 at 03:00:01PM -0800, Perrin Harkins wrote:
> > On Fri, 17 Nov 2000, Joshua Chamas wrote:
> > > I'm working on a new module to be used for mod_perl style 
> > > caching.  I'm calling it MLDBM::Sync because its a subclass 
> > > of MLDBM that makes sure concurrent access is serialized with 
> > > flock() and i/o flushing between reads and writes.
> > 
> > I looked through the code and couldn't see how you are doing i/o
> > flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> > since Berkeley DB will cache things in memory.  Can you point to me it?
> 
> I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> writers through a shared memory cache.  No open/close/flush required
> per-write and very very much faster.
> 
> Is there a reason I'm missing?

Might MLDBM::Sync work over an NFS mounted partition?  That's one
reason I've not used the BerkeleyDB stuff yet..

-- 
Paul Lindner
[EMAIL PROTECTED]
Red Hat Inc.





Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Tim Bunce

On Wed, Nov 22, 2000 at 02:17:25PM +0300, Ruslan V. Sulakov wrote:
> Hi, Tim!
> 
> I'd like to use BerkeleyDB!  But have you tested it in a mod_perl environment?

Not yet, but I will be very soon. I'm sure others are using it.

> Maybe I wrote the scripts in the wrong fashion.
> I open $dbe and $db in mod_perl's startup.pl.
> Why do you think that no open/close/flush is required?

Not required *per write*.

Open when the child is started and close when the child exits.  (Probably
best not to open in the parent. I haven't checked the docs yet.)

No flush needed as the cache is shared and the last process to
disconnect from it will flush it automatically.

> Each new apache server generation (about 1 time per 30 requests in my case)
> needs to run startup.pl.
> So, how do I synchronize changes between different apache server processes when
> no flush is made?
> Or am I not right?
> Or is synchronization between simultaneous BerkeleyDB objects done
> automatically through the DBEnvironment?

I believe so.

> I think this theme is very important to all developers!

Tim.





Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Joshua Chamas

Tim Bunce wrote:
> 
> > I looked through the code and couldn't see how you are doing i/o
> > flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> > since Berkeley DB will cache things in memory.  Can you point to me it?
> 
> I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> writers through a shared memory cache.  No open/close/flush required
> per-write and very very much faster.
> 
> Is there a reason I'm missing?
> 

I'm not sure I want to go the shared memory route, generally,
and if I were to, I'd likely start with, like you say, BerkeleyDB
or IPC::Cache.  I know there isn't much of a learning curve, 
but it's not complexity that I want to add to the system I'm 
working on now.  I've been doing stuff like MLDBM::Sync for
a while, making DBMs work in a multiprocess environment, and it's
comforting.  1000 reads/writes per second is enough for my 
caching needs now, as it's just a front end to SQL queries.

--Joshua





Re: New Module Idea: MLDBM::Sync

2000-11-22 Thread Tim Bunce

On Tue, Nov 21, 2000 at 03:00:01PM -0800, Perrin Harkins wrote:
> On Fri, 17 Nov 2000, Joshua Chamas wrote:
> > I'm working on a new module to be used for mod_perl style 
> > caching.  I'm calling it MLDBM::Sync because its a subclass 
> > of MLDBM that makes sure concurrent access is serialized with 
> > flock() and i/o flushing between reads and writes.
> 
> I looked through the code and couldn't see how you are doing i/o
> flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> since Berkeley DB will cache things in memory.  Can you point to me it?

I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
writers through a shared memory cache.  No open/close/flush required
per-write and very very much faster.

Is there a reason I'm missing?

Tim.





Re: New Module Idea: MLDBM::Sync

2000-11-21 Thread Perrin Harkins

On Tue, 21 Nov 2000, Joshua Chamas wrote:
> On my box, some rough numbers in writes per sec, with doing a
> tie/untie for each write, are:
> 
>   sync writes/sec with tie/untie
> 
> SDBM_File  1000
> DB_File      30
> GDBM_File    40
> 
> Note that on a RAM disk in Linux, DB_File goes to 500 writes per sec,
> but setting up a RAM disk is a pain, so I'd probably use File::Cache
> which gets about 300 writes per sec on the file system.

Useful numbers.  It looks as if File::Cache is the best approach if you
need anything beyond the SDBM size limit.  Maybe some fine-tuning of that
module could bring it more in line with SDBM performance.

If you have the RAM to spare - and I guess you do, if you're considering
things like RAM disks - you could try IPC::MM too.  I think it will be
faster than the other IPC modules because it's a Perl API to a shared hash
written in C.

- Perrin






Re: New Module Idea: MLDBM::Sync

2000-11-21 Thread Joshua Chamas

Perrin Harkins wrote:
> 
> On Fri, 17 Nov 2000, Joshua Chamas wrote:
> > I'm working on a new module to be used for mod_perl style
> > caching.  I'm calling it MLDBM::Sync because its a subclass
> > of MLDBM that makes sure concurrent access is serialized with
> > flock() and i/o flushing between reads and writes.
> 
> I looked through the code and couldn't see how you are doing i/o
> flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> since Berkeley DB will cache things in memory.  Can you point to me it?
> 
> Also, I'm confused on the usage.  Do you open the dbm file and keep it
> open, or do you tie/untie on every request?
> 

Yes, tie/untie every request, as this consistently flushes i/o
in an atomic way for SDBM_File, DB_File, GDBM_File...
Without this, data can be lost, even for SDBM_File, if multiple
processes have tied to these DBMs and are writing concurrently.  
SDBM_File doesn't get corrupted like DB_File does, though, so access 
to it might seem error-free even without this sync method.

On my box, some rough numbers in writes per sec, with doing a
tie/untie for each write, are:

  sync writes/sec with tie/untie

SDBM_File   1000
DB_File       30
GDBM_File     40

Note that on a RAM disk in Linux, DB_File goes to 500 writes per sec,
but setting up a RAM disk is a pain, so I'd probably use File::Cache
which gets about 300 writes per sec on the file system.

> You might want to look at the Mason caching API.  It would be nice to make
> an interface like that available on top of a module like this.
> 

Because of the 1024 byte SDBM_File limit, Mason would probably be
better off using File::Cache for caching, but for little bits
of data SDBM_File with MLDBM::Sync works really well.
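For reference, File::Cache usage is roughly as follows -- the namespace and expiry values here are made up for illustration:

```perl
use File::Cache;
use strict;

# File::Cache stores each entry as its own file on disk, so there is
# no 1024-byte-per-record limit as there is with SDBM_File.
my $cache = File::Cache->new({ namespace  => 'mason_cache',
                               expires_in => 3600 });   # seconds

$cache->set('big_key', { large => 'x' x 4096 });   # >1KB is fine
my $data = $cache->get('big_key');                 # undef once expired
```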

Thanks for the feedback.  BTW, if you want to experiment with it,
the module code I posted will only provide SDBM_File access.  
I have a fixed version which will work for any DBM format that
MLDBM works with, so let me know and I'll send you my latest.

--Joshua





Re: New Module Idea: MLDBM::Sync

2000-11-21 Thread Perrin Harkins

On Fri, 17 Nov 2000, Joshua Chamas wrote:
> I'm working on a new module to be used for mod_perl style 
> caching.  I'm calling it MLDBM::Sync because its a subclass 
> of MLDBM that makes sure concurrent access is serialized with 
> flock() and i/o flushing between reads and writes.

I looked through the code and couldn't see how you are doing i/o
flushing.  This is more of an issue with Berkeley DB than SDBM I think,
since Berkeley DB will cache things in memory.  Can you point me to it?

Also, I'm confused on the usage.  Do you open the dbm file and keep it
open, or do you tie/untie on every request?

> Any thoughts? 

You might want to look at the Mason caching API.  It would be nice to make
an interface like that available on top of a module like this.

- Perrin

> package MLDBM::Sync;
> use MLDBM;
> use Fcntl qw(:flock);
> use strict;
> no strict qw(refs);
> use vars qw($AUTOLOAD);
> 
> sub TIEHASH { 
> my($class, $file, @args) = @_;
> 
> my $fh = "$file.lock";
> open($fh, ">>$fh") || die("can't open file $fh: $!");
> 
> bless { 
>  'args' => [ $file, @args ],
>  'lock' => $fh,
>  'keys' => [],
> };
> }
> 
> sub DESTROY { 
> my $self = shift;
> if (($self->{lock})) {
>   close($self->{lock})
> }
> }
> 
> sub AUTOLOAD {
> my $self = shift;
> $AUTOLOAD =~ /::([^:]+)$/;
> my $func = $1;
> $self->exlock;
> my $rv = $self->{dbm}->$func(@_);
> $self->unlock;
> $rv;
> }
> 
> sub STORE { 
> my $self = shift;
> $self->exlock;
> my $rv = $self->{dbm}->STORE(@_);
> $self->unlock;
> $rv;
> };
> 
> sub FETCH { 
> my $self = shift;
> $self->shlock;
> my $rv = $self->{dbm}->FETCH(@_);
> $self->unlock;
> $rv;
> };
> 
> sub FIRSTKEY {
> my $self = shift;
> $self->shlock;
> $self->{keys} = [ keys %{$self->{dbm_hash}} ];
> $self->unlock;
> $self->NEXTKEY;
> }
> 
> sub NEXTKEY {
> shift(@{shift->{keys}});
> }
> 
> sub mldbm_tie {
> my $self = shift;
> my $args = $self->{args};
> my %dbm_hash;
> my $dbm = tie(%dbm_hash, 'MLDBM', @$args) || die("can't tie to MLDBM with args: ".join(',', @$args)."; error: $!");
> $self->{dbm_hash} = \%dbm_hash;
> $self->{dbm} = $dbm;
> }
> 
> sub exlock {
> my $self = shift;
> flock($self->{lock}, LOCK_EX) || die("can't write lock $self->{lock}: $!");
> $self->mldbm_tie;
> }
> 
> sub shlock {
> my $self = shift;
> flock($self->{lock}, LOCK_SH) || die("can't share lock $self->{lock}: $!");
> $self->mldbm_tie;
> }
> 
> sub unlock {
> my $self = shift;
> undef $self->{dbm};
> untie %{$self->{dbm_hash}};
> flock($self->{lock}, LOCK_UN) || die("can't unlock $self->{lock}: $!");
> }






New Module Idea: MLDBM::Sync

2000-11-17 Thread Joshua Chamas

Hey,

I'm working on a new module to be used for mod_perl style 
caching.  I'm calling it MLDBM::Sync because it's a subclass 
of MLDBM that makes sure concurrent access is serialized with 
flock() and i/o flushing between reads and writes.  Below is 
the code for the module.  I believe it could also be used as a 
safe backing store for Memoize in a multi-process environment.

It could be used like:

  tie %mldbm, 'MLDBM::Sync', '/tmp/mldbm_dbm', O_CREAT|O_RDWR, 0666;
  $mldbm{rand()} = [ rand() ];
  %mldbm = ();

The history is that I hunted around for on-disk caching into 
which I could stuff db query results temporarily, and the one I 
liked best was File::Cache, which is really cool BTW.  I would use it, 
but MLDBM::Sync using the default SDBM_File seems to be 2 to 3 times 
faster, getting about 1000 writes/sec on my dual PIII 400.

MLDBM::Sync using MLDBM in DB_File mode is considerably slower 
than File::Cache, by 5-10 times, so which you might use really
depends on the data you want to store.  The 1024-byte
limit on SDBM_File often makes it not the right choice.

I also thought about calling it MLDBM::Lock, MLDBM::Serialize, 
MLDBM::Multi ... I like MLDBM::Sync though.  For mod_perl
caching usage, I imagine tying to it in each child, and clearing
when necessary, perhaps even at parent httpd initialization...
no auto-expiration here; use File::Cache or IPC::Cache for that!

Any thoughts? 

--Joshua

_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA  USA 
http://www.nodeworks.com   1-714-625-4051

package MLDBM::Sync;
use MLDBM;
use Fcntl qw(:flock);
use strict;
no strict qw(refs);
use vars qw($AUTOLOAD);

sub TIEHASH { 
my($class, $file, @args) = @_;

my $fh = "$file.lock";
open($fh, ">>$fh") || die("can't open file $fh: $!");

bless { 
   'args' => [ $file, @args ],
   'lock' => $fh,
   'keys' => [],
  };
}

sub DESTROY { 
my $self = shift;
if (($self->{lock})) {
close($self->{lock})
}
}

sub AUTOLOAD {
my $self = shift;
$AUTOLOAD =~ /::([^:]+)$/;
my $func = $1;
$self->exlock;
my $rv = $self->{dbm}->$func(@_);
$self->unlock;
$rv;
}

sub STORE { 
my $self = shift;
$self->exlock;
my $rv = $self->{dbm}->STORE(@_);
$self->unlock;
$rv;
};

sub FETCH { 
my $self = shift;
$self->shlock;
my $rv = $self->{dbm}->FETCH(@_);
$self->unlock;
$rv;
};

sub FIRSTKEY {
my $self = shift;
$self->shlock;
$self->{keys} = [ keys %{$self->{dbm_hash}} ];
$self->unlock;
$self->NEXTKEY;
}

sub NEXTKEY {
shift(@{shift->{keys}});
}

sub mldbm_tie {
my $self = shift;
my $args = $self->{args};
my %dbm_hash;
my $dbm = tie(%dbm_hash, 'MLDBM', @$args) || die("can't tie to MLDBM with args: ".join(',', @$args)."; error: $!");
$self->{dbm_hash} = \%dbm_hash;
$self->{dbm} = $dbm;
}

sub exlock {
my $self = shift;
flock($self->{lock}, LOCK_EX) || die("can't write lock $self->{lock}: $!");
$self->mldbm_tie;
}

sub shlock {
my $self = shift;
flock($self->{lock}, LOCK_SH) || die("can't share lock $self->{lock}: $!");
$self->mldbm_tie;
}

sub unlock {
my $self = shift;
undef $self->{dbm};
untie %{$self->{dbm_hash}};
flock($self->{lock}, LOCK_UN) || die("can't unlock $self->{lock}: $!");
}
