sorry about the length but i have to dump my views on this.
p5ee appears to me to be a framework of objects. there are many such
systems out there and the two most critical design factors are how
objects are created and how they communicate with each other. this
missive will address various forms of inter object communication (IOC)
and why i chose simple messages for stem.

RPC

the most common way for an object to communicate with a remote object
is some form of RPC. RPC is supported in most languages and object
frameworks. it has several advantages: it is easily understood by
programmers (similar to ordinary local method calls), it is synchronous
and it is (usually) reliable (you know immediately when it fails). but it also
has its problems. the synchronous and blocking nature of RPC means you
can't have multiple pending requests. they have to happen one at a time
so you can't send them in parallel to different remote resources. this
problem is partly cured by threads which can individually block but then
you still have to synchronize the results somewhere in another
thread. but since threads in perl are not clean yet (5.8 should fix
that, perl6 will have proper threads from the start), rpc is not a good
solution for p5ee. also many RPC systems require both sides to compile
in a template that defines the procedures and their parameters. rpc is
typically used in a client/server environment where the client initiates
the requests and it generally doesn't work well peer-to-peer (more on
those below).

message passing

the other major IOC methodology is message passing. this has become much
more common and is growing rapidly. in fact there is an industry term
for it: message oriented middleware (MOM). BTW if you wrote a mail fetching
program with message passing, it is a MOM and POP program. <me ducks>
message passing allows for parallel requests and better peer to peer
support. also it allows messages to be queued, delayed, resent, etc.
the major disadvantage with messages is unreliable delivery. this is
countered with complex reliable queues with DB backends, etc. also most
programmers just don't grok asynchronous communications well as their
brains are too linear. :) much of the MOM market is with financial and
commercial users who demand reliable messaging. this means complex
configuring and managing of queues. two well known products from redmond
and ibm are reviewed here:
http://www.networkcomputing.com/913/913ws1.html
notice how ibm recommends a single FULL time person just to manage
mqseries queues and configuration. that is what i call heavyweight
design. i was pointed to an open source java design that was hardwired to
use mysql for its message DB. this points out another issue, tying the
queue design to a particular technology.
message passing systems usually require some form of event loop engine
and that has its benefits and drawbacks as well. you can do simple
message passing without events but then you are limited to cases where
you know in advance exactly which messages you will send and receive.
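
a rough sketch of the event loop idea, in python just for illustration (this is not stem's perl code, and EventLoop, register, send and run are made-up names). the point it shows is that sending only queues a message, so two requests can be pending at once and the replies are handled as they arrive:

```python
from collections import deque


class EventLoop:
    """toy single-process message loop; every name here is hypothetical."""

    def __init__(self):
        self.queue = deque()   # pending messages, in arrival order
        self.handlers = {}     # address -> callable(loop, msg)

    def register(self, address, handler):
        self.handlers[address] = handler

    def send(self, to, body, sender=None):
        # sending never blocks: the message is only queued for later delivery
        self.queue.append({"to": to, "from": sender, "body": body})

    def run(self):
        # deliver until nothing is left; handlers may queue more messages
        while self.queue:
            msg = self.queue.popleft()
            self.handlers[msg["to"]](self, msg)


replies = []


def make_server(name):
    def server(loop, msg):
        # answer a request by sending a message back to whoever sent it
        loop.send(msg["from"], f"{name} did {msg['body']}", sender=name)
    return server


def client(loop, msg):
    replies.append(msg["body"])


loop = EventLoop()
loop.register("db", make_server("db"))
loop.register("cache", make_server("cache"))
loop.register("client", client)

# both requests are pending before any reply has been seen
loop.send("db", "query", sender="client")
loop.send("cache", "lookup", sender="client")
loop.run()
print(replies)
```

a real engine would also watch sockets and timers in the same loop. that part is left out here to keep the sketch short.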

client/server vs. peer to peer.

we all know what client/server means but peer to peer is much fuzzier
and currently buzzier :). most of the famous p2p systems are really
client to server to client. napster, jabber, gnutella come to mind
there. even ibm's mqseries is client/server/client as you can purchase
client nodes separately from server nodes. my take is that true peer to
peer is needed in a properly designed framework. why should a program be
limited to only making requests and getting responses and others have to
only support requests and manage resources and queues? it is better if
any program can do anything by just loading the appropriate modules and
configuring them. many MOM systems are c/c++ binary only and thus limit
themselves to the fixed client/server architecture. perl has the
wonderful advantage of being able to load modules on demand. stem uses
that to support remote configuration whereby the new config is either
loaded locally and sent to the remote process or the remote process is
told to load a config. in either case, the config will load up its
needed modules. this is true peer to peer where any object in any
process can interact with any other object.

object and message addressing

how an object is addressed by a message (or rpc) is a major design
issue. a common technique is called publish/subscribe. an object
announces it has data to publish at a known address. other objects that
want to receive it must subscribe to that data source. i feel this is
limiting as it requires all the receiving objects to know in advance
where they get their data from. it requires a new object that maintains
the publish/subscribe database and even more work if that needs to be
persistent. this also means that both sides have to perform some sort of
address registration before they can communicate. it is also a
unidirectional construct so replies aren't directly supported. stem uses
a simpler approach of registering each object with its globally (within
the application framework) unique address and then just using that
address in a message. so the sender just puts an address into a message
and it gets sent. there is no need to set up a queue or publish object
in advance. the addresses on both sides can be set via configs and
replies are handled because every message has a 'from' address and an
optional 'reply-to' address. the one to many benefit of publish/subscribe is easily
handled by a multiplexor module (see Stem::Switch) that can duplicate
incoming messages and redirect them to multiple destinations. the
multiplex map of course is controlled by a config and can be changed at
runtime.
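
here is a small python sketch of that addressing scheme. it is only an illustration: Registry, Logger and Switch are hypothetical classes, the message is a plain dict, and none of this is the real Stem::Switch api. it shows direct delivery by registered address, the reply address rule, and a fan-out map changed at runtime:

```python
class Registry:
    """maps framework-wide unique addresses to objects (hypothetical api)."""

    def __init__(self):
        self.cells = {}

    def register(self, address, obj):
        if address in self.cells:
            raise ValueError(f"address already registered: {address}")
        self.cells[address] = obj

    def deliver(self, msg):
        # no queue or publisher to set up: look up the address and hand over
        self.cells[msg["to"]].receive(self, msg)

    @staticmethod
    def reply_address(msg):
        # a reply goes to 'reply_to' when present, otherwise back to 'from'
        return msg.get("reply_to") or msg["from"]


class Logger:
    def __init__(self):
        self.seen = []

    def receive(self, registry, msg):
        self.seen.append(msg["body"])


class Switch:
    """one to many fan-out driven by a target list that can change at runtime."""

    def __init__(self, targets):
        self.targets = list(targets)

    def receive(self, registry, msg):
        for target in self.targets:
            # duplicate the message and redirect the copy
            registry.deliver({**msg, "to": target})


registry = Registry()
log_a, log_b = Logger(), Logger()
registry.register("log_a", log_a)
registry.register("log_b", log_b)
switch = Switch(["log_a"])
registry.register("switch", switch)

registry.deliver({"to": "switch", "from": "app", "body": "first"})
switch.targets.append("log_b")   # change the multiplex map at runtime
registry.deliver({"to": "switch", "from": "app", "body": "second"})
print(log_a.seen, log_b.seen)
```

the sender never registers anything and never learns who is behind the switch. it just addresses a message and sends it.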
one aspect of message passing that stem currently doesn't support is
guaranteed delivery. i didn't want to build stem up around some DB back
end and all that heavy bloat. the users who need that feature will be
prejudiced against perl and open source in most cases anyhow (see my
reference to commercial and financial markets above). and when and if
stem needs guaranteed messages, it will be easier to add a module that
does this (and even multiple different modules with different
guaranteed delivery methods). stem's message support is very simple and
easily extended. just by addressing a message to a queue type object,
you can change from normal tcp based delivery to a guaranteed form
(assuming the queue is configured with the end node address).
stem has a very simple address in the form of a triplet. the first part
designates the process (stem hub) and it defaults to the current
process. the second part is the destination object (stem cell) and it is
required. the final part is used to address a single object from a
family of cloned objects. read more about this at:
http://www.stemsystems.com/technotes/registry_notes.html
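
the triplet can be pictured like this. it is a python sketch under assumptions: the field names, the Address class and the LOCAL_HUB value are invented for the example, and only the three parts and their defaults come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_HUB = "this_hub"   # hypothetical name of the current process


@dataclass(frozen=True)
class Address:
    cell: str                      # destination object, required
    hub: Optional[str] = None      # process; None means the current process
    target: Optional[str] = None   # one member of a family of cloned objects

    def resolved(self):
        # fill in the default hub so the triplet is fully qualified
        return (self.hub or LOCAL_HUB, self.cell, self.target)

    def is_local(self):
        return self.hub is None or self.hub == LOCAL_HUB


print(Address("logger").resolved())
print(Address("worker", hub="backend", target="3").resolved())
```

an address with only a cell name stays inside the current process, so the same message code works for local and remote delivery.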
another issue is sending messages to your own process or internal
object. POE/IKC has a small flaw, discovered by someone who was
investigating using it or stem for a framework backbone: POE/IKC can't
send a message from an object to itself.

serialized message formats

sending a message inside a process is easy, you create it, send it and
some other object receives and processes it. but sending one to another
process requires serializing (also called marshalling or stringifying)
so it can be sent over a pipe, socket or other IPC method. selecting the
format used to serialize messages is a funny area that gets lots of
fights. the XML crowd (BOOO!) demands it even though messages are
usually internal and never get seen by human eyes. others want their
favorite technique. my view is that the serialized format doesn't matter
one bit. currently stem uses Data::Dumper and eval. it takes up 1 line
of code in 2 places. i could replace it with Storable or Data::Denter in
2 seconds. in fact i plan on supporting a message header string that
identifies the message serial format so multiple styles can be supported
at one time. as for the hue and cry "but i want my external program to
be able to use BLAH format to inject messages into the system!" i say
that is what gateway translators are for. just create a module that has
a socket interface and can send/receive messages on it in your favorite
format. it converts those to p5ee (or stem) internal messages which it
sends/receives internally. then you never need to know the internal
serialized message format. everybody is happy and the message format
wars are over.
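
a sketch of the format header and gateway ideas, again in python and again hypothetical: the header layout (format name, newline, payload), the CODECS table and the one-line external format are all assumptions made for the example, not stem's wire format.

```python
import json
import pickle

# format name -> (encode, decode). note that pickle, like eval on
# Data::Dumper output, should only be used between trusted processes.
CODECS = {
    b"json": (lambda m: json.dumps(m).encode("utf-8"),
              lambda b: json.loads(b.decode("utf-8"))),
    b"pickle": (pickle.dumps, pickle.loads),
}


def serialize(msg, fmt=b"json"):
    encode, _ = CODECS[fmt]
    # the header line names the format, the rest is the encoded message
    return fmt + b"\n" + encode(msg)


def deserialize(data):
    # split at the first newline only; the payload may contain more of them
    fmt, _, payload = data.partition(b"\n")
    if fmt not in CODECS:
        raise ValueError(f"unknown message format: {fmt!r}")
    _, decode = CODECS[fmt]
    return decode(payload)


def gateway_in(line):
    # hypothetical external format: "<address> <text>" on one line.
    # the gateway turns it into an internal message, so outside programs
    # never need to know the internal serialized format.
    to, _, body = line.strip().partition(" ")
    return {"to": to, "from": "gateway", "body": body}


msg = gateway_in("logger hello world\n")
for fmt in (b"json", b"pickle"):
    assert deserialize(serialize(msg, fmt)) == msg
print(msg)
```

adding another format means adding one entry to the table. nothing that sends or receives messages has to change.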
anyhow, this is long enough. i hope you see why i made the design
decisions i did for stem. it is easier to expand a lightweight and simple
framework than to make a heavyweight one flexible. so please don't
load down p5ee with complex api's, message queues, strong persistence and
guaranteed delivery right off the bat. i have seen threads on this list that
are pushing in that direction. that would make for a large complex
system that is no better than the others and competes against commercial
products. instead develop a simple messaging interface and layer the
complex services on top of that. then you can plug and play much more
easily and change stuff, support multiple technologies and formats, etc.
if the licensing issues could be resolved, then stem could be the
backbone of p5ee. maybe you could use it and dedicate your efforts to
adding the higher level services it lacks, including reliable delivery,
message gateways, etc. so stem would remain GPL and p5ee would be a set
of modules that are additions to the stem core. this whole issue of
licensing and copyright really needs to be worked out soon.
thanx,
uri
--
Uri Guttman ------ [EMAIL PROTECTED] -------- http://www.stemsystems.com
-- Stem is an Open Source Network Development Toolkit and Application Suite -
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org