Use Samba, not NFS [Re: New Module Idea: MLDBM::Sync]
Paul Lindner wrote:
>
> Might MLDBM::Sync work over an NFS mounted partition?  That's one
> reason I've not used the BerkeleyDB stuff yet..
>

Paul,

For the first time, I benchmarked concurrent Linux client write access over a Samba network share, and it worked: zero data loss. This is in contrast to an NFS share accessed from Linux, which would see data loss due to the lack of serialization of write requests.

With MLDBM::Sync, I benchmarked 8 Linux clients writing to a Samba mount pointed at a WinNT PIII 450 over a 10Mbps network. For 8000 writes, I got:

  SDBM_File: 105 writes/sec
  DB_File:    99 writes/sec   [ better than to local disk ]

It seems the network was the bottleneck in this test, as neither client nor server CPU/disk was maxed out. The WinNT server ran at 20-25% CPU utilization during the test.

As Apache::ASP $Session uses a method similar to MLDBM::Sync to flush i/o, you could then point StateDir at a Samba/CIFS share to cluster an ASP application well, with zero data loss. My understanding is that you have a NetApp cluster which can export CIFS? I'd obviously benchmark this heavily to see if there are any NetApp cluster locking issues, but I'm guessing that you could likely get 200+ ASP requests per second on a 100Mbps network, which will likely far exceed your base application performance.

-- Joshua
____________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA USA
http://www.nodeworks.com                1-714-625-4051

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
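The clustered setup described above amounts to putting the dbm files on a network share that serializes write access correctly. A minimal sketch of what that looks like from a client node (the /mnt/winshare mount point and the key name are assumptions for illustration):

```perl
# Sketch: the MLDBM::Sync tie from the original announcement, with the
# dbm file on a CIFS/Samba mount (assumed to be at /mnt/winshare) so
# that several cluster nodes can share one synced cache.
use strict;
use Fcntl qw(O_CREAT O_RDWR);
use MLDBM::Sync;

tie my %cache, 'MLDBM::Sync', '/mnt/winshare/cache_dbm', O_CREAT|O_RDWR, 0640
    or die "can't tie: $!";

$cache{last_run} = [ time() ];   # flock + tie/untie happen per access
untie %cache;
```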
Re: New Module Idea: MLDBM::Sync
Paul Lindner wrote:
>
> > I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> > DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> > writers through a shared memory cache.  No open/close/flush required
> > per-write and very very much faster.
> >
> > Is there a reason I'm missing?
>
> Might MLDBM::Sync work over an NFS mounted partition?  That's one
> reason I've not used the BerkeleyDB stuff yet..
>

Kinda, but only in SDBM_File mode, like Apache::ASP state. Kinda, because flock() doesn't work over NFS, and the other patch we worked with that called NFS locking didn't work when I load tested it. I've heard that a Samba share might support file locking transparently, but I have yet to test this.

MLDBM::Sync uses a method similar to Apache::ASP::State's to keep data synced. In an NFS environment, whether data gets committed is a matter of chance of collision.

--Joshua
Re: New Module Idea: MLDBM::Sync
On Wed, 22 Nov 2000, Paul Lindner wrote:
> Might MLDBM::Sync work over an NFS mounted partition?

It might, since it's lock-file based. Performance will be poor, though. You'd probably be better off using MySQL to share data if you have a cluster of machines.

> That's one reason I've not used the BerkeleyDB stuff yet..

BerkeleyDB will definitely not work if you try to use it on multiple machines that mount the same files using NFS. It buffers writes in RAM, so you would quickly get out of sync between machines.

- Perrin
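Sharing data across a cluster through MySQL, as suggested above, can look like this minimal DBI sketch (the table name, columns, and connection details are assumptions for illustration):

```perl
# Sketch: a shared cache table in MySQL instead of an NFS-mounted dbm.
# Every node in the cluster talks to the same database server.
use strict;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=cache;host=dbhost',
                       'user', 'password', { RaiseError => 1 });

# REPLACE is a MySQL extension: insert, or atomically overwrite, a row.
$dbh->do('REPLACE INTO cache (ckey, cvalue) VALUES (?, ?)',
         undef, 'session:42', 'serialized data');

my ($value) = $dbh->selectrow_array(
    'SELECT cvalue FROM cache WHERE ckey = ?', undef, 'session:42');

$dbh->disconnect;
```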
Re: New Module Idea: MLDBM::Sync
On Wed, 22 Nov 2000, Tim Bunce wrote:
> I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> writers through a shared memory cache.  No open/close/flush required
> per-write and very very much faster.
>
> Is there a reason I'm missing?

There are a few. It's much harder to build than most CPAN modules, partly because of conflicts some people run into with the db library Red Hat provides. The documentation is pretty weak on how to use it with a shared memory environment. (You have to use BerkeleyDB.pm for this, incidentally; DB_File does not support it.)

We got past these problems and then ran into issues with db corruption. If Apache gets shut down with a SIGKILL (and this seems to happen fairly often when using mod_perl), the data can be corrupted in such a way that the next time you try to open it, BerkeleyDB will just hang forever. Sleepycat says this is a known issue with using BerkeleyDB from Apache, and they don't have a solution for it yet. Even using their transaction mechanism does not prevent this problem.

We tried lots of different things and finally reached what seems to be a solution by using database-level locks rather than page-level ones. We still get to open the database in ChildInit and keep it open, with all the speed benefits of the shared memory buffer. It is definitely the fastest available way to share data between processes, but the problems we had have got me looking at other solutions again.

If you do try it out, I'd be eager to hear what your experiences with it are.

- Perrin
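For what it's worth, the database-level locking described above corresponds to Berkeley DB's Concurrent Data Store mode, which allows many readers but only one writer per database. A hedged BerkeleyDB.pm sketch (the home directory and filename are assumptions):

```perl
# Sketch: a Berkeley DB environment in Concurrent Data Store (CDB) mode,
# which locks at the database level rather than the page level.
use strict;
use BerkeleyDB;

my $env = BerkeleyDB::Env->new(
    -Home  => '/var/cache/bdb',                # assumed directory
    -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
) or die "can't open env: $BerkeleyDB::Error";

my %hash;
tie %hash, 'BerkeleyDB::Hash',
    -Filename => 'shared.db',                  # assumed filename
    -Env      => $env,
    -Flags    => DB_CREATE
    or die "can't tie: $BerkeleyDB::Error";

$hash{key} = 'value';   # writes are serialized by the CDB write lock
untie %hash;
```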
Re: New Module Idea: MLDBM::Sync
On Wed, Nov 22, 2000 at 10:58:43AM, Tim Bunce wrote:
> On Tue, Nov 21, 2000 at 03:00:01PM -0800, Perrin Harkins wrote:
> > On Fri, 17 Nov 2000, Joshua Chamas wrote:
> > > I'm working on a new module to be used for mod_perl style
> > > caching.  I'm calling it MLDBM::Sync because it's a subclass
> > > of MLDBM that makes sure concurrent access is serialized with
> > > flock() and i/o flushing between reads and writes.
> >
> > I looked through the code and couldn't see how you are doing i/o
> > flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> > since Berkeley DB will cache things in memory.  Can you point me to it?
>
> I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> writers through a shared memory cache.  No open/close/flush required
> per-write and very very much faster.
>
> Is there a reason I'm missing?

Might MLDBM::Sync work over an NFS mounted partition? That's one reason I've not used the BerkeleyDB stuff yet..

--
Paul Lindner   [EMAIL PROTECTED]   Red Hat Inc.
Re: New Module Idea: MLDBM::Sync
On Wed, Nov 22, 2000 at 02:17:25PM +0300, Ruslan V. Sulakov wrote:
> Hi, Tim!
>
> I'd like to use BerkeleyDB! But have you tested it in a mod_perl environment?

Not yet, but I will be very soon. I'm sure others are using it.

> Maybe I wrote my scripts in the wrong fashion.
> I open $dbe and $db at startup.pl of mod_perl.
> Why do you think that no open/close/flush is required?

Not required *per write*. Open when the child is started and close when the child exits. (Probably best not to open in the parent. I haven't checked the docs yet.) No flush is needed, as the cache is shared and the last process to disconnect from it will flush it automatically.

> Each new apache server generation (about 1 time per 30 requests in my case)
> needs to run startup.pl.
> So, how to synchronize changes between different apache server processes when
> no flush is made?
> Or am I not right?
> Or is synchronization between simultaneous BerkeleyDB objects done
> automatically through the DBEnvironment?

I believe so.

> I think this theme is very important to all developers!

Tim.
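The per-child open/close lifecycle described above might look like this under the mod_perl 1.x API (directory and filename are assumptions; this is an untested sketch, not a recommended configuration):

```perl
# Sketch: open the shared-memory Berkeley DB environment once per Apache
# child (not per request, and not in the parent), and close it on child
# exit so the shared cache gets flushed.
use strict;
use Apache ();
use BerkeleyDB;

my ($env, $db);

Apache->push_handlers(PerlChildInitHandler => sub {
    $env = BerkeleyDB::Env->new(
        -Home  => '/var/cache/bdb',            # assumed directory
        -Flags => DB_CREATE | DB_INIT_MPOOL,   # shared memory cache
    ) or die "env: $BerkeleyDB::Error";
    $db = BerkeleyDB::Hash->new(
        -Filename => 'shared.db',              # assumed filename
        -Env      => $env,
        -Flags    => DB_CREATE,
    ) or die "db: $BerkeleyDB::Error";
});

Apache->push_handlers(PerlChildExitHandler => sub {
    undef $db;    # the last process to disconnect flushes the cache
    undef $env;
});
```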
Re: New Module Idea: MLDBM::Sync
Tim Bunce wrote:
>
> > I looked through the code and couldn't see how you are doing i/o
> > flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> > since Berkeley DB will cache things in memory.  Can you point me to it?
>
> I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via
> DB_File.pm or BerkeleyDB.pm) which supports multiple readers and
> writers through a shared memory cache.  No open/close/flush required
> per-write and very very much faster.
>
> Is there a reason I'm missing?
>

I'm not sure I want to go the shared memory route, generally, and if I were to, I'd likely start, as you say, with BerkeleyDB or IPC::Cache. I know there isn't much of a learning curve, but it's still complexity that I don't want to add to the system I'm working on now. I've been doing stuff like MLDBM::Sync for a while, making DBMs work in a multiprocess environment, and it's comforting. 1000 reads/writes per second is enough for my caching needs now, as it's just a front end to SQL queries.

--Joshua
Re: New Module Idea: MLDBM::Sync
On Tue, Nov 21, 2000 at 03:00:01PM -0800, Perrin Harkins wrote:
> On Fri, 17 Nov 2000, Joshua Chamas wrote:
> > I'm working on a new module to be used for mod_perl style
> > caching.  I'm calling it MLDBM::Sync because it's a subclass
> > of MLDBM that makes sure concurrent access is serialized with
> > flock() and i/o flushing between reads and writes.
>
> I looked through the code and couldn't see how you are doing i/o
> flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> since Berkeley DB will cache things in memory.  Can you point me to it?

I'm puzzled why people wouldn't just use version 3 of Berkeley DB (via DB_File.pm or BerkeleyDB.pm), which supports multiple readers and writers through a shared memory cache. No open/close/flush required per write, and it's very, very much faster.

Is there a reason I'm missing?

Tim.
Re: New Module Idea: MLDBM::Sync
On Tue, 21 Nov 2000, Joshua Chamas wrote:
> On my box, some rough numbers in writes per sec, with doing a
> tie/untie for each write, are:
>
>   sync writes/sec with tie/untie
>
>   SDBM_File   1000
>   DB_File       30
>   GDBM_File     40
>
> Note that on a RAM disk in Linux, DB_File goes to 500 writes per sec,
> but setting up a RAM disk is a pain, so I'd probably use File::Cache
> which gets about 300 writes per sec on the file system.

Useful numbers. It looks as if File::Cache is the best approach if you need anything beyond the SDBM size limit. Maybe some fine-tuning of that module could bring it more in line with SDBM performance.

If you have the RAM to spare - and I guess you do, if you're considering things like RAM disks - you could try IPC::MM too. I think it will be faster than the other IPC modules because it's a Perl API to a shared hash written in C.

- Perrin
Re: New Module Idea: MLDBM::Sync
Perrin Harkins wrote:
>
> On Fri, 17 Nov 2000, Joshua Chamas wrote:
> > I'm working on a new module to be used for mod_perl style
> > caching.  I'm calling it MLDBM::Sync because it's a subclass
> > of MLDBM that makes sure concurrent access is serialized with
> > flock() and i/o flushing between reads and writes.
>
> I looked through the code and couldn't see how you are doing i/o
> flushing.  This is more of an issue with Berkeley DB than SDBM I think,
> since Berkeley DB will cache things in memory.  Can you point me to it?
>
> Also, I'm confused about the usage.  Do you open the dbm file and keep it
> open, or do you tie/untie on every request?
>

Yes, tie/untie on every request, as this consistently flushes i/o in an atomic way for SDBM_File, DB_File, GDBM_File... Without this, data can be lost, even for SDBM_File, if multiple processes have tied to these DBMs and are writing concurrently. SDBM_File doesn't get corrupted the way DB_File does, though, so access to it would seem error free without this sync method.

On my box, some rough numbers in writes per second, doing a tie/untie for each write, are:

  sync writes/sec with tie/untie

  SDBM_File   1000
  DB_File       30
  GDBM_File     40

Note that on a RAM disk in Linux, DB_File goes up to 500 writes per second, but setting up a RAM disk is a pain, so I'd probably use File::Cache, which gets about 300 writes per second on the filesystem.

> You might want to look at the Mason caching API.  It would be nice to make
> an interface like that available on top of a module like this.
>

Because of the 1024 byte SDBM_File limit, Mason would probably be better off using File::Cache for caching, but for little bits of data, SDBM_File with MLDBM::Sync works really well.

Thanks for the feedback. BTW, if you want to experiment with it, the module code I posted will only provide SDBM_File access. I have a fixed version which will work with any DBM format that MLDBM works with, so let me know and I'll send you my latest.
--Joshua
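The tie/untie-per-access pattern discussed in this thread can be demonstrated with core modules alone. A small self-contained sketch (the sync_store/sync_fetch function names are mine, not part of MLDBM::Sync):

```perl
#!/usr/bin/perl
# Sketch of the tie/untie-per-write pattern: an exclusive flock on a
# side lock file guards each write, and untie flushes the dbm to disk
# so other processes see a consistent file.
use strict;
use warnings;
use Fcntl qw(:flock O_CREAT O_RDWR);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/cache";

open my $lock, '>>', "$file.lock" or die "can't open lock file: $!";

sub sync_store {
    my ($key, $value) = @_;
    flock($lock, LOCK_EX) or die "can't write lock: $!";
    tie my %dbm, 'SDBM_File', $file, O_CREAT | O_RDWR, 0640
        or die "can't tie: $!";
    $dbm{$key} = $value;
    untie %dbm;                      # untie flushes the write to disk
    flock($lock, LOCK_UN);
}

sub sync_fetch {
    my ($key) = @_;
    flock($lock, LOCK_SH) or die "can't share lock: $!";
    tie my %dbm, 'SDBM_File', $file, O_CREAT | O_RDWR, 0640
        or die "can't tie: $!";
    my $value = $dbm{$key};
    untie %dbm;
    flock($lock, LOCK_UN);
    return $value;
}

sync_store(answer => 42);
print sync_fetch('answer'), "\n";    # prints 42
```

The cost of this safety is a fresh tie/untie per access, which is exactly the overhead the benchmarks in this thread are measuring.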
Re: New Module Idea: MLDBM::Sync
On Fri, 17 Nov 2000, Joshua Chamas wrote:
> I'm working on a new module to be used for mod_perl style
> caching.  I'm calling it MLDBM::Sync because it's a subclass
> of MLDBM that makes sure concurrent access is serialized with
> flock() and i/o flushing between reads and writes.

I looked through the code and couldn't see how you are doing i/o flushing. This is more of an issue with Berkeley DB than SDBM I think, since Berkeley DB will cache things in memory. Can you point me to it?

Also, I'm confused about the usage. Do you open the dbm file and keep it open, or do you tie/untie on every request?

> Any thoughts?

You might want to look at the Mason caching API. It would be nice to make an interface like that available on top of a module like this.

- Perrin

> package MLDBM::Sync;
> use MLDBM;
> use Fcntl qw(:flock);
> use strict;
> no strict qw(refs);
> use vars qw($AUTOLOAD);
>
> sub TIEHASH {
>     my($class, $file, @args) = @_;
>
>     my $fh = "$file.lock";
>     open($fh, ">>$fh") || die("can't open file $fh: $!");
>
>     bless {
>         'args' => [ $file, @args ],
>         'lock' => $fh,
>         'keys' => [],
>     };
> }
>
> sub DESTROY {
>     my $self = shift;
>     if ($self->{lock}) {
>         close($self->{lock});
>     }
> }
>
> sub AUTOLOAD {
>     my $self = shift;
>     $AUTOLOAD =~ /::([^:]+)$/;
>     my $func = $1;
>     $self->exlock;
>     my $rv = $self->{dbm}->$func(@_);
>     $self->unlock;
>     $rv;
> }
>
> sub STORE {
>     my $self = shift;
>     $self->exlock;
>     my $rv = $self->{dbm}->STORE(@_);
>     $self->unlock;
>     $rv;
> }
>
> sub FETCH {
>     my $self = shift;
>     $self->shlock;
>     my $rv = $self->{dbm}->FETCH(@_);
>     $self->unlock;
>     $rv;
> }
>
> sub FIRSTKEY {
>     my $self = shift;
>     $self->shlock;
>     $self->{keys} = [ keys %{$self->{dbm_hash}} ];
>     $self->unlock;
>     $self->NEXTKEY;
> }
>
> sub NEXTKEY {
>     shift(@{shift->{keys}});
> }
>
> sub mldbm_tie {
>     my $self = shift;
>     my $args = $self->{args};
>     my %dbm_hash;
>     my $dbm = tie(%dbm_hash, 'MLDBM', @$args)
>       || die("can't tie to MLDBM with args: ".join(',', @$args)."; error: $!");
>     $self->{dbm_hash} = \%dbm_hash;
>     $self->{dbm} = $dbm;
> }
>
> sub exlock {
>     my $self = shift;
>     flock($self->{lock}, LOCK_EX) || die("can't write lock $self->{lock}: $!");
>     $self->mldbm_tie;
> }
>
> sub shlock {
>     my $self = shift;
>     flock($self->{lock}, LOCK_SH) || die("can't share lock $self->{lock}: $!");
>     $self->mldbm_tie;
> }
>
> sub unlock {
>     my $self = shift;
>     undef $self->{dbm};
>     untie %{$self->{dbm_hash}};
>     flock($self->{lock}, LOCK_UN) || die("can't unlock $self->{lock}: $!");
> }
New Module Idea: MLDBM::Sync
Hey,

I'm working on a new module to be used for mod_perl style caching. I'm calling it MLDBM::Sync because it's a subclass of MLDBM that makes sure concurrent access is serialized with flock() and i/o flushing between reads and writes. Below is the code for the module. I believe it could also be used as a safe backing store for Memoize in a multi-process environment.

It could be used like:

  tie %mldbm, 'MLDBM::Sync', '/tmp/mldbm_dbm', O_CREAT|O_RDWR, 0666;
  $mldbm{rand()} = [ rand() ];
  %mldbm = ();

The history is that I hunted around for on-disk caching in which I can stuff db query results temporarily, and the one I liked best was File::Cache, which is really cool BTW. I would use it, but MLDBM::Sync using the default SDBM_File seems to be 2 to 3 times faster, getting about 1000 writes/sec on my dual PIII 400. MLDBM::Sync using MLDBM in DB_File mode is considerably slower than File::Cache, by 5-10 times, so which you might use really depends on the data you want to store. The 1024 byte limit on SDBM_File often makes it not the right choice.

I also thought about calling it MLDBM::Lock, MLDBM::Serialize, MLDBM::Multi... I like MLDBM::Sync though. For mod_perl caching usage, I imagine tieing to it in each child, and clearing when necessary, perhaps even at parent httpd initialization... no auto-expiration here; use File::Cache or IPC::Cache for that!

Any thoughts?

--Joshua
____________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA USA
http://www.nodeworks.com                1-714-625-4051

package MLDBM::Sync;
use MLDBM;
use Fcntl qw(:flock);
use strict;
no strict qw(refs);
use vars qw($AUTOLOAD);

sub TIEHASH {
    my($class, $file, @args) = @_;

    my $fh = "$file.lock";
    open($fh, ">>$fh") || die("can't open file $fh: $!");

    bless {
        'args' => [ $file, @args ],
        'lock' => $fh,
        'keys' => [],
    };
}

sub DESTROY {
    my $self = shift;
    if ($self->{lock}) {
        close($self->{lock});
    }
}

sub AUTOLOAD {
    my $self = shift;
    $AUTOLOAD =~ /::([^:]+)$/;
    my $func = $1;
    $self->exlock;
    my $rv = $self->{dbm}->$func(@_);
    $self->unlock;
    $rv;
}

sub STORE {
    my $self = shift;
    $self->exlock;
    my $rv = $self->{dbm}->STORE(@_);
    $self->unlock;
    $rv;
}

sub FETCH {
    my $self = shift;
    $self->shlock;
    my $rv = $self->{dbm}->FETCH(@_);
    $self->unlock;
    $rv;
}

sub FIRSTKEY {
    my $self = shift;
    $self->shlock;
    $self->{keys} = [ keys %{$self->{dbm_hash}} ];
    $self->unlock;
    $self->NEXTKEY;
}

sub NEXTKEY {
    shift(@{shift->{keys}});
}

sub mldbm_tie {
    my $self = shift;
    my $args = $self->{args};
    my %dbm_hash;
    my $dbm = tie(%dbm_hash, 'MLDBM', @$args)
      || die("can't tie to MLDBM with args: ".join(',', @$args)."; error: $!");
    $self->{dbm_hash} = \%dbm_hash;
    $self->{dbm} = $dbm;
}

sub exlock {
    my $self = shift;
    flock($self->{lock}, LOCK_EX) || die("can't write lock $self->{lock}: $!");
    $self->mldbm_tie;
}

sub shlock {
    my $self = shift;
    flock($self->{lock}, LOCK_SH) || die("can't share lock $self->{lock}: $!");
    $self->mldbm_tie;
}

sub unlock {
    my $self = shift;
    undef $self->{dbm};
    untie %{$self->{dbm_hash}};
    flock($self->{lock}, LOCK_UN) || die("can't unlock $self->{lock}: $!");
}