Re: [Be-devel] BE command server

W. Trevor King Mon, 27 Aug 2012 06:13:22 -0700

On Mon, Aug 27, 2012 at 01:23:28PM +0100, Niall Douglas wrote:
> On 26 Aug 2012 at 19:51, W. Trevor King wrote:
> > On Sun, Aug 26, 2012 at 07:17:55PM +0100, Niall Douglas wrote:
> > > BEurtle/BEXML takes a lot of care to ensure it works correctly if the
> > > user is simultaneously using the BE command on the same repo, mainly
> > > by being ultra-paranoid and way overusing stat(), which is slow.
> > 
> > The new command server is single threaded, so acquiring a connection
> > acts as an effective lock.  This is not ideal, but it's fine for
> > proof-of-concept.  In a hypothetical asynchronous server, changes to
> > the single in-memory Storage instance should be effectively atomic
> > without needing a lock.
> 
> I think you're missing my intended point: right now if two BE 
> instances are run simultaneously, there is a chance of data 
> corruption. BE needs to use lock files to serialise access *on* 
> *disk*.


If you're running all your commands through a single serve-commands
process, you won't have simultaneous BE instances accessing the disk
(even with simultaneous client BE calls).  No need for locking here.

If users start simultaneous calls to BE on that disk database anyway:

  $ be serve-commands &
  $ be add "this might corrupt the database"

Then they're silly ;).  They should instead use

  $ be serve-commands &
  $ be --server http://localhost:8000 add "this is safe"

We can't protect against everything users might dream up.  If
protection is expensive, it's better to just warn people and then let
them do what they want.

> > > Right now I watch id-cache on the assumption that if BE writes, it'll
> > > sometimes get updated - unfortunately this misses comment updates, so
> > > every time BEurtle/BEXML touches a BE repo it has to recursively scan
> > > the .be directory and hash the stat() output to detect changes.
> > 
> > For Linux systems, pyinotify might be a better approach [1].  For
> > other systems, some sort of locking system might be the best you can
> > do.
> 
> No, for inotify you need a constantly running background running 
> process. Should that process stop, inotify updates are lost, and you 
> can't guarantee a process won't be exited.

You only get corruption if process A changes the on-disk filesystem,
and process B reads some of the changing files while they are in the
act of changing.  You can also get synchronization errors if process A
changes the on-disk filesystem and process B thinks its in-memory
versions of those files are still current.  In both cases, you can
avoid the problem if process B is using inotify and realizes that A
made a change.  If process B is not running, A can do whatever it
wants, and B will have a valid filesystem DB to load the next time it
starts.
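
To make the synchronization case concrete, here is a rough sketch of
process B checking whether its in-memory copy is stale before using
it.  It polls stat() as a portable stand-in for inotify events (with
pyinotify the reload would be triggered by notifications instead of a
rescan).  The `snapshot` and `CachedStorage` names and the `load`
callback are made up for illustration; they are not part of BE.

```python
import os
import tempfile


def snapshot(root):
    """Map every file under root to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # removed between the walk and the stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


class CachedStorage:
    """Hypothetical process B: reload when process A changed the disk."""

    def __init__(self, root, load):
        self.root = root
        self.load = load
        self.seen = snapshot(root)
        self.data = load(root)

    def get(self):
        current = snapshot(self.root)
        if current != self.seen:
            # Someone else wrote to the directory: drop the stale copy.
            self.data = self.load(self.root)
            self.seen = current
        return self.data


# Usage: the second get() notices the new file and reloads.
demo_root = tempfile.mkdtemp()
storage = CachedStorage(demo_root, lambda r: sorted(os.listdir(r)))
open(os.path.join(demo_root, "bug-a"), "w").close()
print(storage.get())
```

This only covers the stale-cache case.  It does nothing about B
reading a file while A is halfway through writing it.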

> Agreeing on a fsynced, append-only writelog/journal system is the
> only safe way out - it's why filing systems journal after all.

As far as I can tell, my above inotify scheme is safe.  Can you give
an example usage where it breaks down?

> > This sounds reasonable, but I think it should be a readers-writer lock
> > [1].  Reading data of the BE filesystem is slow enough without forcing
> > serial reads.
> 
> I don't know what you mean by serial reads.

I meant that with a single lock for both reading and writing:

> > > The lockfile could live in the .be directory, and be held
> > > whenever a reading or writing operation is being performed.

Then simultaneous calls to `be list` would have to wait in line to
acquire the lock for reading, even though having them all reading
simultaneously would be fine.

> You can't use two lockfiles using different semantics, that's a 
> guaranteed race condition scenario.

If your locking is not sufficiently atomic, you can always have a
master lock for changing the detailed locks.

  master      (only held by one process, allows you to change locks)
   |-- read   (can be held by many, allows you to read)
   `-- write  (only held by one, allows you to write if read is empty)

where `write` must be empty for a process to acquire `read`.

This is the alternate implementation of the readers-writer lock that
avoids writer-starvation [1].
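
A minimal sketch of that layout with plain lockfiles, assuming that
exclusive file creation (O_CREAT | O_EXCL) is atomic on the filesystem
in question.  The `LockDir` class, its method names, and the polling
interval are hypothetical, not existing BE code.

```python
import os
import tempfile
import time
import uuid
from contextlib import contextmanager


class LockDir:
    """Hypothetical layout: <root>/master, <root>/read/<token>, <root>/write."""

    def __init__(self, root, poll=0.01):
        self.master = os.path.join(root, "master")
        self.read_dir = os.path.join(root, "read")
        self.write = os.path.join(root, "write")
        self.poll = poll
        os.makedirs(self.read_dir, exist_ok=True)

    @contextmanager
    def _hold_master(self):
        # Creation fails if the file exists, so only one process gets it.
        while True:
            try:
                fd = os.open(self.master, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                break
            except FileExistsError:
                time.sleep(self.poll)
        os.close(fd)
        try:
            yield
        finally:
            os.unlink(self.master)

    def _wait(self, attempt, timeout):
        # Retry `attempt` under the master lock until it returns a true value.
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            with self._hold_master():
                result = attempt()
            if result:
                return result
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("lock not acquired")
            time.sleep(self.poll)

    def acquire_read(self, timeout=None):
        def attempt():
            if os.path.exists(self.write):
                return None  # a writer holds the lock or is waiting for it
            token = os.path.join(self.read_dir, uuid.uuid4().hex)
            open(token, "w").close()
            return token
        return self._wait(attempt, timeout)

    def release_read(self, token):
        with self._hold_master():
            os.unlink(token)

    def acquire_write(self, timeout=None):
        def claim():
            if os.path.exists(self.write):
                return False
            open(self.write, "w").close()  # blocks new readers from now on
            return True
        self._wait(claim, timeout)
        try:
            # Existing readers drain; `read` must be empty before writing.
            self._wait(lambda: not os.listdir(self.read_dir), timeout)
        except TimeoutError:
            self.release_write()
            raise

    def release_write(self):
        with self._hold_master():
            os.unlink(self.write)


# Usage: readers may overlap; the writer goes in once both have released.
locks = LockDir(tempfile.mkdtemp())
a, b = locks.acquire_read(), locks.acquire_read()
locks.release_read(a)
locks.release_read(b)
locks.acquire_write()
locks.release_write()
```

The writer creates `write` before waiting for readers to drain, which
is what keeps a steady stream of new readers from starving it.  The
sketch does not clean up stale lockfiles left behind by a crashed
process.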

> > Sure.  We need either a persistent server to minimize re-reading, a
> > well-indexed cache, or both.  I'm not quite sure where BEXML fits in
> > here.  What does bexmlsrv.py actually serve?
> 
> BEXML serves what you ask it for, so if you ask for XML you'll get 
> XML, if you ask for JSON you'll get JSON and so on. It'll even serve 
> HTML :)

XML/JSON/HTML are formats.  I'm wondering what the *content* is.

Trevor

[1]: http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

_______________________________________________
Be-devel mailing list
Be-devel@bugseverywhere.org
http://void.printf.net/cgi-bin/mailman/listinfo/be-devel