You might want to take a look at the Owen system under Inferno, which addresses some of your issues. Here are some pointers:
http://inferno-owen.googlecode.com/hg/doc/owen/intro.pdf
http://www.vitanuova.com/papers/ugrid.pdf
http://code.google.com/p/inferno-owen/source/browse/man?r=d640591ab8c2faf30a8a44c9abfb02801ae2407d

On 25 February 2010 22:21, Ciprian Dorin, Craciun <ciprian.crac...@gmail.com> wrote:
> Hello all!
>
> In what follows I would like to ask for your comments regarding a
> 9p file server that exports a file system with (message) queue
> semantics. (My major interest here is more about the actual semantics
> themselves, and less about the implementation details. But all comments
> are welcome :) .)
>
> To keep things short I shall prepare a small description of my
> proposal (for those with not too much time), and also a longer one
> (with more details for the patient ones).
>
>
> ----------------------------------------
> Short version
> ----------------------------------------
>
> [Motivation] I want to obtain a message-queue-like IPC for
> (distributed) applications, where it is not possible (or not wanted)
> to implement / use an existing queueing library / implementation (like
> JMS or AMQP based) (but where using the file system is a trivial
> operation). Possible target languages: Bash, Tcl, Lua, even Python or
> C.
>
> Also there could be legacy (or maybe just old) applications that
> already use the file system as an IPC mechanism and which could be
> just slightly updated to use a queue. Possible applications: anything
> related to SMTP, just think about how qmail or Postfix work.
>
> [Solution] Implementing a 9p file server that exports a file
> system with the following structure:
>   / (this / is actually relative to the mount point)
>     --> queues
>       --> <queue-name>
>         --> enqueue -- a folder
>         --> dequeue -- a folder
>         --> commit -- folder :)
>         --> rollback -- still a folder
>
> Possible operations (I'm assuming we use it via shell scripting,
> and the commands found on most UNIX-es):
>   * queue access (creation if not existing, or just "opening" it):
>       mkdir /queues/to-smtp-gateway
>   * enqueue operation:
>       cp /.../path-to-my-email-file /queues/to-smtp-gateway/enqueue/email-192832.eml
>       # instead of cp I could just create and edit the file in that folder
>       touch /queues/to-smtp-gateway/commit/email-192832.eml
>   * dequeue operation:
>       touch /queues/from-pop-server/dequeue/email-9283828.eml
>       # if there is no data in the queue the touch operation fails
>       # do something with the file like reading it or copying it
>       touch /queues/from-pop-server/commit/email-9283828.eml
>   * rollback:
>       in any case just touch the same file name inside the rollback
>       folder and the entire operation is rolled back
>
> End of short version. :)
> Comments? (Or go for the extended version.)
>
>
> ----------------------------------------
> Extended version
> ----------------------------------------
>
> [Motivation -- extended] My real motives are in fact somewhat
> different: for the moment I'm working at a university, and here we
> have a large (by our standards, but I'm betting small by your
> standards) cluster of Linux servers. Sometimes I have to run some
> independent simulations or jobs on (parts of) this cluster. So my
> possible solutions are:
>   * Condor, Slurm, or any other tried-and-true queueing system --
> the problems with them are that most are big (as in heavy) solutions,
> which need a stable environment, are tedious to install, and need a
> lot of care; (also they need root access to install and operate...)
>   * Globus (either pre WS or g-Lite): I don't even want to enter
> into this :) :) it just scares me... :) :)
>   * XCPU -- I'm aware of XCPU, but it needs me to push tasks onto
> the worker nodes... (it could be used for the execution of the jobs in
> my queue;)
>   * SSH + dtach -- my current solution -- I distribute an equal
> number of job files to the servers, and then I just run a couple of
> processes that try to grab a job file and execute it; (the problem is
> that the job assignment is static and if one worker node finishes
> early it just idles;)
>
> What I would like to have:
>   * (on the submitter) just copy the job files into a folder on my
> workstation (laptop) and that's all;
>   * (on the worker nodes) just try to acquire a file from a folder,
> execute it and write back the result to another folder;
>
> [Features] What the queue file system should support:
>   * transactional processing of individual enqueues / dequeues: as
> seen from my short description I want to be able to obtain the data
> file (in case of dequeue), read it (maybe multiple times, as in open /
> read / close, again open / read / close, etc.)
> and only when I'm done
> processing it, I want to tell the system to commit the dequeue
> operation;
>   * transactional processing of multiple related enqueues /
> dequeues: just think about an application that acts like a pipeline:
> it dequeues a task, executes it and enqueues it for further processing
> (to another queue); now the dequeue of the original message and the
> enqueue of the processed message should be atomic; (this is of course
> extended to multiple enqueues / dequeues from multiple queues);
>   * (maybe) tagging a message with some meta-data, and allowing me
> to dequeue only those messages that are tagged in a certain way (think
> of pattern matching in CLIPS or Prolog); (this allows me to match two
> related messages from two different queues, wait until I have both of
> them and process them as one (like a join in a workflow)); (this
> could allow me to implement something like map-reduce, if one process
> chooses to dequeue all messages tagged in a certain way);
>   * any other ideas? :)
>
> [Semantics] The semantics for the first feature set I've described
> in the short explanation so I won't repeat them here.
>
> For the multi-operation / multi-queue semantics I would propose
> something like this:
>   / (root)
>     --> queues -- the same as before (only one operation is transactional)
>     --> transactions
>       --> <transaction-id>
>         --> queues
>           --> <queue-name>
>             --> enqueue
>             --> dequeue
>             --> commit -- only to allow applications to work
>                 unmodified under the new transaction semantics
>             --> rollback
>       --> commit -- overall transactions commit folder
>       --> rollback -- likewise
>
> How these transactions work is simple: just `mkdir` a transaction
> folder inside `transactions`. Then `mkdir` those queues that we want
> to access. Then when rolling back or committing, just touch a file
> named exactly like the transaction inside the `commit` or `rollback`
> folder.
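To make the transaction steps above concrete, here is a minimal client-side sketch in Python. It only performs the `mkdir` / write / `touch` sequence from the proposal and does not talk to any real server. The function names are hypothetical, and it assumes that the overall `commit` and `rollback` folders sit directly under `transactions`, which the flattened tree does not state explicitly. The `exist_ok` and `parents` flags are there only so the sketch can be tried on an ordinary directory; a real server would presumably expose those folders itself.

```python
import uuid
from pathlib import Path

OUTCOMES = ("commit", "rollback")


def begin_transaction(mount: Path, queue_names: list[str]) -> str:
    """mkdir a transaction folder, then mkdir every queue it will access."""
    txn_id = str(uuid.uuid4())  # UUID name, to make clashes unlikely
    txn_dir = mount / "transactions" / txn_id
    txn_dir.mkdir(parents=True)  # raises FileExistsError on a name clash
    for name in queue_names:
        (txn_dir / "queues" / name).mkdir(parents=True, exist_ok=True)
    return txn_id


def stage_enqueue(mount: Path, txn_id: str, queue: str,
                  name: str, payload: bytes) -> Path:
    """Write a message into the transaction's view of a queue."""
    folder = mount / "transactions" / txn_id / "queues" / queue / "enqueue"
    folder.mkdir(exist_ok=True)  # no-op if the folder is already exposed
    target = folder / name
    target.write_bytes(payload)
    return target


def finish_transaction(mount: Path, txn_id: str,
                       outcome: str = "commit") -> Path:
    """Touch a file named like the transaction in `commit` or `rollback`."""
    if outcome not in OUTCOMES:
        raise ValueError(f"outcome must be one of {OUTCOMES}")
    # ASSUMPTION: the overall commit/rollback folders live under transactions/.
    folder = mount / "transactions" / outcome
    folder.mkdir(exist_ok=True)  # no-op if the folder is already exposed
    marker = folder / txn_id
    marker.touch()
    return marker
```

A pipeline stage would call `begin_transaction(mount, ["in", "out"])`, stage its output with `stage_enqueue`, and end with `finish_transaction(mount, txn_id, "commit")`, or `"rollback"` on failure. Whether the whole thing is atomic is entirely up to the server; the client only signals intent.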
>
> Now about the names for transactions or enqueue / dequeue files: I
> would propose UUIDs (and impose these names), as this would
> reduce the likelihood of name clashes.
>
> Also because we have a central 9p file server that exports this
> file system, there are two possible ways to "attach" the file system
> to a node:
>   * each client, when it attaches the same file system (`aname`),
> obtains a fresh view that is not shared with any other client; (thus
> one node can't interfere with another one's transaction);
>   * each client attaches the same `aname` and obtains a consolidated
> view of all the other operations going on in the cluster; (we could
> thus obtain distributed transactions;) (something like what was
> obtained inside `/proc` inside a Beowulf cluster -- as I understood
> from the XCPU and Beowulf papers;)
>
>
> ----------------------------------------
> Technical (as in implementation) details
> ----------------------------------------
>
> I already have the operations implemented in Python (in an OOP
> fashion) (both individual transactions and multi-operation /
> multi-queue operations thanks to BerkeleyDB). I've already managed to
> export and test a (local) file system based on Fuse (but with a
> slightly different way to obtain the semantics, and only for the
> individual operation transactions.)
>
> About the 9p protocol, I've already implemented the protocol
> (decoding messages from the client -> server, and encoding messages
> from the server -> client, thus the server side) in Python, and I've
> exposed it to the network with the help of the Twisted framework. (The
> message decoder / encoder, the OOP entities that embody / hide the 9p
> file system semantics, and the Twisted protocol and factory are all
> decoupled and can be reused independently.)
>
> If this works nicely I'm thinking of moving (if time allows) to
> RabbitMQ (obtaining now distributed queues), and Erlang (better
> performance from the network part of the project).
> Another direction
> would be to stick to BerkeleyDB and add support for its key / value
> tables (as BTrees or hash tables).
>
> Any comments or observations about my technical choices?
>
>
> ---------- Finally the end :) :)
>
>
> I hope I haven't missed anything. And I also hope that at least
> someone has reached this phrase :).
>
> Thanks to all of you who have devoted time to my email,
> Ciprian Craciun.
>
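For the worker-node side ("try to acquire a file from a folder, execute it"), the single-queue dequeue / commit / rollback cycle from the short version could be sketched like this in Python. This is only an illustration under assumptions: `process_one` and `handler` are hypothetical names, the client picks a UUID as the file name, and the "touch fails when the queue is empty" behaviour is assumed to surface as an `OSError`.

```python
import uuid
from pathlib import Path
from typing import Callable


def process_one(queue: Path, handler: Callable[[bytes], None]) -> bool:
    """Dequeue one message, run handler on it, then commit or roll back.

    Returns False when nothing could be dequeued, True after a commit.
    """
    name = str(uuid.uuid4())  # client-chosen name, UUID to avoid clashes
    claimed = queue / "dequeue" / name
    try:
        # Per the proposal, touching a name in dequeue/ binds a message to
        # it. ASSUMPTION: an empty queue makes this fail with an OSError.
        claimed.touch()
    except OSError:
        return False
    try:
        handler(claimed.read_bytes())  # may be read as often as needed
    except Exception:
        # Touching the same name in rollback/ undoes the dequeue.
        (queue / "rollback" / name).touch()
        raise
    # Only once processing is done do we commit the dequeue.
    (queue / "commit" / name).touch()
    return True
```

A worker would then loop on `process_one(Path("/queues/jobs"), run_job)` (both the path and `run_job` are made up here) and sleep for a while whenever it returns False.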