to rules engine mongers:
damian has been very useful (what else is new? :) with helping me fine
tune the rules engine api. below is the current draft of the spec and it
is mostly poddish now. it is better organized and has more code examples
and such. obviously it is still a work in progress but feedback is
welcome. more to follow as it gets written and edited. when this gets to
something close to coherent, i will start on coding it. if anyone wants
to help with that (especially testing), let me know. i will be modifying
some existing code to compile and run flows. if you want to study that,
check out the Stem::Cell::Flow module in the (unreleased!) tarball
stemsystems.com/Stem-0.11.tar.gz.
have fun,
uri
=head1 NAME
RulesEngine - Manage data flowing through a tree of logic rules
=head1 SYNOPSIS
use RulesEngine ;
my $engine = RulesEngine->new( ... ) ;
$engine->insert_rules(
GET => sub {
my( $engine, $data, $key ) = @_ ;
return $data->{$key} ;
},
SET => sub {
my( $engine, $data, $key, $value ) = @_ ;
$data->{$key} = $value ;
return ;
},
COUNT_DOWN_WAIT => sub {
my( $engine, $data, $key ) = @_ ;
return 'WAIT' if --$data->{$key} > 0 ;
return ;
},
...
) ;
<more synopsis>
=head1 DESCRIPTION
The class RulesEngine is designed to shepherd data through trees of
logic rules. It can be viewed as a higher level state machine with
sequential control and asynchronous support. It can handle multiple
independent data elements passing through so it is well suited for event
loop applications.
A RulesEngine object is created and populated with rules and
flows. Rules are subs (either anonymous or a code reference to a named
sub) that get called and passed the current data element. Rules can
access and modify the data, trigger external events and operations and
control how the data flows to the next rule by their return value. Flows
are trees of rules and the engine passes the data from rule to rule
through a flow. Flows have basic control operations such as
if/then/while/call and the ability to pass arguments to each
of their rules.
* Can handle and track multiple data objects inside the engine
* Data flow can be linear, or state to state or any combination
* Data flow has support for conditionals, loops and calls
* Can be driven from synchronous or asynchronous systems
* Less coding needed to create complex logic systems
* Easy to integrate in applications
* Rules and Flows can be loaded at runtime from multiple sources
(files, DB, network)
* A library of common Rules and Flows is provided. They can be
used or modified as needed
* Useful for network protocols, state machines, business logic.
=head2 Advantages
when you finally look at this module, all you get is a different way to
call perl subs in sequence. so the question arises, why not just use
perl for that flow? the answer is granularity. RulesEngine
allows a flow to stop and wait for some external condition to be
satisfied (e.g. a header is read and parsed). this means you can handle
complex state systems with real time behavior. beyond that (which
the Expect.pm module can also do), it can manage multiple flows through the
state machine at the same time. so this module is well suited to event loop
systems and other parallel applications.
=head2 Method B<new>
This is a class method that constructs a new RulesEngine object. The
constructor takes a list of key/value options.
<constructor options>
timeout_style
this option selects from a supported list of timeout
styles. these will include Event.pm,
Stem, SIGALRM, etc.
timeout_creator
this option is a code ref to call when
the engine wants to create a
timeout. this call is passed the data
object (to be called back when the
timeout triggers) and the timeout
period. the call returns a timeout
object which has a cancel method.
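A minimal sketch of what a timeout_creator code ref could look like, assuming it is called with the data object and the timeout period as described above. The My::Timeout class, its fields and its is_active method are hypothetical and only there for illustration. A real creator would wrap an actual timer such as Event.pm or alarm().

```perl
use strict ;
use warnings ;

# hypothetical timeout class. a real one would wrap Event.pm, Stem or alarm()
package My::Timeout ;

sub new {
    my( $class, %args ) = @_ ;
    return bless { %args, ACTIVE => 1 }, $class ;
}

# the only method the spec requires of a timeout object
sub cancel {
    my( $self ) = @_ ;
    $self->{ACTIVE} = 0 ;
    return ;
}

# helper for this sketch only, not part of the spec
sub is_active { return $_[0]->{ACTIVE} }

package main ;

# the code ref that would be passed as the timeout_creator option.
# it gets the data object and the timeout period and returns an
# object which has a cancel method.
my $timeout_creator = sub {
    my( $data, $period ) = @_ ;
    return My::Timeout->new( DATA => $data, PERIOD => $period ) ;
} ;

my $data = {} ;
my $timeout = $timeout_creator->( $data, 1000 ) ;
$timeout->cancel() ;
```

The engine would only need to keep the returned object around so it can call cancel on it later.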
=head2 Method B<insert_rules>
This is an object method that inserts named rules into this engine. It
takes a list of key/value pairs, where the key is the name of the rule
and the value is a code reference (typically an anonymous sub).
$engine->insert_rules(
GET => sub {
my( $engine, $data, $key ) = @_ ;
return $data->{$key} ;
},
SET => sub {
my( $engine, $data, $key, $value ) = @_ ;
$data->{$key} = $value ;
return ;
},
...
) ;
=head2 Method B<insert_common_rules>
This is an object method that inserts named rules from the
RulesEngine::Common class into this engine. It takes a list of names
of the rule sets and installs all the named rules in each set.
$engine->insert_common_rules( qw(
ACCESS
WAIT
DBI
) ) ;
=head2 Method B<insert_flows>
$engine->insert_rules(
# get a data element. args are a list with a single key
GET => sub {
my( $data, $args ) = @{$_[0]}{qw( DATA ARGS )} ;
return $data->{ $args->[0] } ;
},
# set a data element. args are a list with a single key/value pair
SET => sub {
my( $data, $args ) = @{$_[0]}{qw( DATA ARGS )} ;
$data->{ $args->[0] } = $args->[1] ;
return ;
},
...
) ;
=head2 Method B<inject_data>
This is an object method that injects data into the engine and starts it
in a flow. It takes a hash reference for the data and a flow name.
my $data = {
} ;
$engine->inject_data( $data, 'INSERT_ROWS' ) ;
=head2 Method B<trigger_data>
<other names? trigger/kick/signal/continue/run()>
cause this data object to execute its current rule. this is called to
start the initial flow or to resume flowing after a WAIT. it has
multiple choices for its name so please show your creativity here and
pick or suggest one.
=head2 Method B<clone_data>
=head2 Method B<delete_data/destroy_data>
destroy this data object and remove it from its engine. all
knowledge of it, including the data object's PC (program counter), is tossed.
=head2 Method B<create_timeout>
This method creates a timeout for a data element in this engine.
Timeouts are single shot: they are automatically canceled and deleted
after they are triggered. You must recreate a timeout if you want it to
be triggered again.
# timeouts are in milliseconds. the actual resolution depends on what
# timer support is being used.
$engine->create_timeout( $data, 1000 ) ;
# this will set the data element 'TIMEOUT_FLAG' to true when the
# timeout is triggered. This overrides any default timeout token set
# at engine creation time.
$engine->create_timeout( $data, 1000, 'TIMEOUT_FLAG' ) ;
=head2 Method B<cancel_timeout>
This method cancels and deletes an active timeout for a data element in
this engine. It is not an error to cancel a timeout that doesn't exist.
$engine->cancel_timeout( $data ) ;
=head2 Method B<status>
This method returns status about this engine. It can return different
status based on its arguments:
data => 'ALL' # return status of all data objects
data => $data # return status of this data object
the status method needs more work. it will evolve over the implementation
period.
=head1 Engine Rules
=head2 Named Rules
rules come in two main flavors, named and anon. anonymous rules
are just code refs (either anon subs or \&sub refs in a
flow). named rules are installed into the engine and referenced
in a flow by name. so named rules are meant to be reused by many
flows throughout the engine. this means named rules will be
effectively templated and refer to all their data by key
names. since some flow compilers will support custom arguments
to each rule, we can handle even more complex things in named
rules. you can consider them a library of useful and reusable
rules. this implies a certain granularity of rules. if a rule is
so long and complex that it can only be used by one application
it is not a good idea to name it. it is the same concept of how
you break up a code library and specify its api but different in
that rules are called in a specific way and have a set of
expected return values. as the engine develops and gets used in
the real world, the list of named rules (and possibly commonly
used flows) will grow. named rules still have to be loaded (from
disk/db/net) and installed into the engine.
=head1 Calling Rules
Rules have a very simple API: they are passed a single hash reference
which holds their engine object, the current data element and the
arguments (from the compiled flow). Typically the rule accesses the hash
elements via a slice like this:
my( $engine, $data, $args ) = @{$_[0]}{qw( ENGINE DATA ARGS )} ;
Many simple rules only need to access the data element so they can
shorten that to:
my( $data ) = @{$_[0]}{qw( DATA )} ;
or even this:
my( $data ) = $_[0]->{DATA} ;
If you don't like to use $_[0], you can use shift instead:
my( $engine, $data, $args ) = @{ shift() }{qw( ENGINE DATA ARGS )} ;
my( $data ) = shift()->{DATA} ;
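A short sketch of how a rule written against this API might be called. The call_rule sub is a hypothetical stand-in for whatever the engine does internally, and the engine slot is left undef because no engine exists in this sketch.

```perl
use strict ;
use warnings ;

# a named rule written against the single hash ref calling convention
my $get_rule = sub {
    my( $data, $args ) = @{$_[0]}{qw( DATA ARGS )} ;
    return $data->{ $args->[0] } ;
} ;

# hypothetical stand-in for what the engine does when it calls a rule
sub call_rule {
    my( $rule, $engine, $data, @args ) = @_ ;
    return $rule->( {
        ENGINE => $engine,
        DATA   => $data,
        ARGS   => \@args,
    } ) ;
}

my $data = { NAME => 'uri' } ;
my $value = call_rule( $get_rule, undef, $data, 'NAME' ) ;
print "NAME is $value\n" ;
```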
=head1 Rule Return Values
Rules control the flow of the data element by their return values. Rules
can be called in a Boolean context (in an if/while flow statement) or in
a flow context (anywhere else in a flow). The return value of a rule
called in a Boolean context controls the if/while as you would expect:
if the rule returns a Perl true value, the if/while body is
executed. Rules called in a flow context must return a scalar that
controls how the data is passed to another rule. This return value must
be a scalar string, a hash ref or a plain return (undef in
a Perl scalar context). Flow context return values which are simple flags
(e.g. WAIT, DELETE) can be returned as just the token or as a hash
reference with the token as the key and a true value. Here are the
allowed return values and which rule is executed next for each.
RulesEngine rules are just code references or names of installed
rules (which in turn must be code references). they are called
in a standard way and are expected to return a status value
which tells the engine what to do next.
rules are called with 2 positional arguments and a list.
the first argument is the data object which is passing through
this rule. rules can use the data object directly as a hash
ref. (NOTE: this may break some OO rules, but damian okayed such
breakage as long as it is public :). the data object will likely
contain most of the data needed by the rule.
the second argument is the flow's argument hash (see above).
this hash ref will have the data attached to this flow
instance. it will be used for default values, or data that won't
change often, or flow logic changes.
the rest of the argument list is what the flow compiler (see
note in flows) parsed out. (more below in named rules).
rules return either a token or a hash ref or an empty return
(not return undef!). the return value is used by the engine to
determine what to do with the current data object. a single
token can be used if the return is simple or a hash ref of
tokens and values can be returned.
an empty return (no undef!) tells the engine to flow this data
to the next rule in the current flow. it is the default
behavior. this means all rules must use explicit return calls so
you don't get an accidental return of some random last
expression.
=head2 B<undef>
A plain return (which returns undef in a scalar context) will
cause the next rule in the flow to be executed. This is a common
return value for many rules. Be sure you always use an explicit
return for this, so you don't accidentally return the last
evaluated expression (which may be fatal??)
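A small illustration of why the explicit return matters. It uses only plain Perl and the DATA slot of the calling convention. Both rules are made up for this example.

```perl
use strict ;
use warnings ;

# no explicit return: the value of the last statement leaks out
my $sloppy_rule = sub {
    my( $data ) = $_[0]->{DATA} ;
    $data->{COUNT}++ ;
} ;

# explicit plain return: the caller sees undef in a scalar context
my $careful_rule = sub {
    my( $data ) = $_[0]->{DATA} ;
    $data->{COUNT}++ ;
    return ;
} ;

my $data = { COUNT => 5 } ;
my $sloppy  = $sloppy_rule->( { DATA => $data } ) ;
my $careful = $careful_rule->( { DATA => $data } ) ;

print "sloppy rule returned: $sloppy\n" ;
print "careful rule returned: ", ( defined $careful ? $careful : 'undef' ), "\n" ;
```

The sloppy rule hands back the old value of COUNT, which the engine would then try to interpret as a flow control token.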
=head2 B<WAIT>
return 'WAIT' ;
return { WAIT => 1 } ;
This flow return value will cause this same rule to be executed
the next time the data element is triggered.
returning 'WAIT' tells the engine to stop flowing this data and
to return to the outside world (our caller). this rule remains
the current rule (the PC for this data object is
unchanged). when this data is triggered/kicked (see below)
again, this rule is executed again. here is an example that
waits for a match in the data object.
sub {
return if $_[0]->{BUFFER} =~ /header/ ;
return 'WAIT' ;
}
the outside world must modify the BUFFER field of the data
object (see below) and then cause it to be triggered. the logic
for when to WAIT or not can be very simple (as above) or very
complex. here is a way to wait for a set of async operations to
complete:
sub {
return 'WAIT' if keys %{$_[0]->{ASYNC_KEYS}} ;
return ;
}
the code initially assigns a set of keys to ASYNC_KEYS, each one
reflecting a single async operation (e.g. web fetches, db
accesses). when a given async operation succeeds, you delete its
associated key from the ASYNC_KEYS hash and retrigger this
rule. only when all the keys are deleted will this rule flow to
the next rule.
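here is a sketch of that ASYNC_KEYS pattern run by hand. it follows the example above in treating the first argument as the data object. the trigger sub and the key names are hypothetical stand-ins for the engine and for real async operations.

```perl
use strict ;
use warnings ;

# the rule from above: wait while any async operation is outstanding
my $wait_for_async = sub {
    return 'WAIT' if keys %{$_[0]->{ASYNC_KEYS}} ;
    return ;
} ;

# hypothetical stand-in for triggering the data object's current rule.
# returns true when the data would flow on to the next rule.
sub trigger {
    my( $rule, $data ) = @_ ;
    my $result = $rule->( $data ) ;
    return !( defined $result && $result eq 'WAIT' ) ;
}

my $data = { ASYNC_KEYS => { web_fetch => 1, db_read => 1 } } ;

my @results ;
push @results, trigger( $wait_for_async, $data ) ? 'FLOW' : 'WAIT' ;

# one operation completes, the other is still pending
delete $data->{ASYNC_KEYS}{web_fetch} ;
push @results, trigger( $wait_for_async, $data ) ? 'FLOW' : 'WAIT' ;

# the last operation completes, so the rule no longer waits
delete $data->{ASYNC_KEYS}{db_read} ;
push @results, trigger( $wait_for_async, $data ) ? 'FLOW' : 'WAIT' ;

print "@results\n" ;
```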
=head2 B<FLOW_TO>
return { 'FLOW' => 'flow_name_foo' } ;
that causes the data object to start executing the
'flow_name_foo' flow. it is basically a flow oriented goto. by
selecting what the flow name is (say a hash lookup), you can use
this to make a flow dispatch table. an external command comes in
and you dispatch to the flow that handles that command. it might
even mix in some other state info to determine the flow to goto.
this might become a standard named rule as it looks like it
would be common.
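a sketch of such a dispatch rule, assuming the single hash ref calling convention and the 'FLOW' key shown in the return example above. the command names, the flow names and the COMMAND data element are all made up for illustration.

```perl
use strict ;
use warnings ;

# hypothetical command names and flow names
my %command_to_flow = (
    LOGIN => 'login_flow',
    FETCH => 'fetch_flow',
    QUIT  => 'quit_flow',
) ;

my $dispatch_rule = sub {
    my( $data ) = $_[0]->{DATA} ;

    my $flow_name = $command_to_flow{ $data->{COMMAND} || '' } ;

    # unknown command: go to a (hypothetical) error handling flow
    return { FLOW => 'unknown_command_flow' } unless $flow_name ;

    return { FLOW => $flow_name } ;
} ;

my $result = $dispatch_rule->( { DATA => { COMMAND => 'FETCH' } } ) ;
print "next flow: $result->{FLOW}\n" ;
```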
=head2 B<CLONE>
return { 'CLONE' => 'flow_name_foo' } ;
return { 'CLONE' => [qw(flow_1 flow_2)] } ;
this causes the current data object to be cloned (shallow or
deep??) and the clones are sent to the selected flows. this can
be used to start async operations or to trigger parallel events,
etc. the current data object will flow through to its next rule.
at the moment, i don't have any more rule return values but as
you can see it will be easy to add others. any suggestions will
be useful.
=head2 B<DELETE>
return 'DELETE' ;
return { 'DELETE' => 1 } ;
this causes the data object to be destroyed. all knowledge about
it is removed from the engine. this can be also done via the
destroy method (QUESTION: do we need both? for other
methods/returns as well?). you can return this when the rule
decides that the data object's flow is done and needs no more
processing in the engine.
=head1 Engine Data
<to be filled in>
=head1 Data Tracing
we can enable tracing of a data object via a method. then each time
a rule is called with this data, an entry is appended to a trace
log. the log will probably need to be stored inside the data object
(but see above for the design issues).
=head1 Rule Flows
a flow is created with a list/tree of rules, an attached set of
arguments and a unique name (in the flow namespace of this
engine). the list/tree of rules is to be executed sequentially
or in some other logical flow.
the only external method on a flow would be status. this would
report the flow's argument hash (see below), and anything else
we want to know (not much as the flow is mostly static). there
are some internal methods that start the flow on a data object
but that may be handled by the engine itself. all of the work
in a flow is done by its interpreter which crawls the rules
list/tree and executes its rules.
the program counter and call stack of a flow are actually data
object specific (see below). the flow keeps no special state
information.
the flow's argument hash is passed to each of the rules it
executes. it is a way to pass custom values to an instance of a
flow. you can consider it as a flow's environment hash. a flow
is effectively read only when it is installed with its argument
hash.
IDEA: is this argument hash read only? can it be changed as a
whole after the flow starts? if the outside code keeps access to
this hash, it can be changed by outside code. this can be used
to easily change runtime behavior on the fly.
NOTE: i have a similar module called Stem::Cell::Flow which
supports a simple mini-language (thanks to damian's
Parse::RecDescent!) which has loops, conditionals, rule
arguments and other flow features (more can be added). flows
could be of several types which support different input syntaxes,
each with its own parser (if needed) and interpreter. the simplest
flow would be a list of tokens (all are named rules) where each
rule is executed in turn. a more complex flow type could support
simpler ones too (the simple flow is easily done with the one
that handles loops).
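a toy sketch of the simplest flow type: a plain list of named rules run in order, with the program counter kept per data object and outside the flow. this is not the planned implementation. the trigger sub, the %pc hash, the flow layout and the MARK_DONE rule are assumptions made for illustration, and only plain returns and 'WAIT' are handled.

```perl
use strict ;
use warnings ;

# named rules, using the single hash ref calling convention
my %rules = (
    COUNT_DOWN_WAIT => sub {
        my( $data, $args ) = @{$_[0]}{qw( DATA ARGS )} ;
        return 'WAIT' if --$data->{ $args->[0] } > 0 ;
        return ;
    },
    MARK_DONE => sub {
        my( $data ) = $_[0]->{DATA} ;
        $data->{DONE} = 1 ;
        return ;
    },
) ;

# a simple flow: a list of [ rule name, args ... ] entries
my @flow = (
    [ 'COUNT_DOWN_WAIT', 'TICKS' ],
    [ 'MARK_DONE' ],
) ;

# per data object program counter. the flow itself keeps no state
my %pc ;

# hypothetical trigger: run rules until one waits or the flow ends
sub trigger {
    my( $data ) = @_ ;
    $pc{ $data } ||= 0 ;
    while ( $pc{ $data } < @flow ) {
        my( $name, @args ) = @{ $flow[ $pc{ $data } ] } ;
        my $result = $rules{ $name }->( { DATA => $data, ARGS => \@args } ) ;
        return 'WAIT' if defined $result && $result eq 'WAIT' ;
        $pc{ $data }++ ;
    }
    return 'DONE' ;
}

my $data = { TICKS => 3 } ;
my @trace ;
push @trace, trigger( $data ) for 1 .. 3 ;
print "@trace\n" ;
```

because the counter lives in %pc and not in the flow, several data objects could walk the same flow at different positions.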
=head1 Timeouts
Timeouts are a critical feature for a rules engine/state machine so it
can be used in real time situations such as protocols. RulesEngine
supports several standard timeout mechanisms as well as allowing the
user to provide a timeout technology. Standard supported timers are
SIGALRM, Event.pm and Stem::Event::Timer. These are selected in the
engine constructor with the B<timeout_style> option. The user provides a
timer mechanism by passing in a code reference (the B<timeout_creator>
option) that is called to create a timeout.
<more to be done here on user supplied timeout mechanisms>
A data element is notified that it has been timed out by having one of
its elements set to true. The element name can be set when the timeout
is created, or defaulted to the name set at engine construction time
(via the timeout_element argument) or it can default to
'TIMEOUT_FLAG'. When a timeout occurs for a data element, the timeout
name is set to 1 in that data. Then the current rule for that data is
executed. That rule should check the truth of timeout name and act
accordingly. Creating a timeout will clear that timeout element so you
don't have to do that.
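A sketch of a rule that checks the timeout element, assuming the default 'TIMEOUT_FLAG' name and the single hash ref calling convention. The flow name and the BUFFER element are hypothetical, and the timeout itself is simulated by setting the flag by hand.

```perl
use strict ;
use warnings ;

# a rule that waits for a header but gives up when the timeout flag is set
my $read_header = sub {
    my( $data ) = $_[0]->{DATA} ;

    # the engine sets this element to 1 and then runs the current rule
    if ( $data->{TIMEOUT_FLAG} ) {
        return { FLOW => 'header_timed_out' } ;    # hypothetical flow name
    }

    return if $data->{BUFFER} =~ /header/ ;
    return 'WAIT' ;
} ;

my $data = { BUFFER => '' } ;
my $first = $read_header->( { DATA => $data } ) ;    # nothing read yet

# simulate what the engine does when the timeout triggers
$data->{TIMEOUT_FLAG} = 1 ;
my $second = $read_header->( { DATA => $data } ) ;

print "first: $first, second flows to: $second->{FLOW}\n" ;
```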
=head1 Transactions
basic transactions on the data object can be done. when the
transaction method is called on a data object, a copy of the
current data (and possibly its PC) is made and stored away
(possibly in the data object itself, see the design issues
above). if a later rule calls the commit method on this data,
the saved data is thrown away and the transaction is marked as
complete. the data then flows to the next rule according to the return
value of this rule. if a rollback method is called instead, the saved
data (and possibly its PC) is copied back to the data
object. then the saved rule is reexecuted (if we saved the PC) or
the next rule is selected by the return value. of course these
transactions are done only in memory and are not persistent.
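a rough sketch of the in-memory save/commit/rollback idea. the three sub names and the %saved hash are hypothetical, the copy is shallow, and the PC is not saved here. the eventual engine methods may look quite different.

```perl
use strict ;
use warnings ;

# saved copies, keyed by the stringified data object reference
my %saved ;

sub begin_transaction {
    my( $data ) = @_ ;
    $saved{ $data } = { %{$data} } ;    # shallow copy only
    return ;
}

sub commit_transaction {
    my( $data ) = @_ ;
    delete $saved{ $data } ;            # throw the saved copy away
    return ;
}

sub rollback_transaction {
    my( $data ) = @_ ;
    my $copy = delete $saved{ $data } or return ;
    %{$data} = %{$copy} ;               # restore in place
    return ;
}

my $data = { ROWS => 0 } ;

begin_transaction( $data ) ;
$data->{ROWS}  = 10 ;
$data->{ERROR} = 'insert failed' ;
rollback_transaction( $data ) ;

print "ROWS after rollback: $data->{ROWS}\n" ;
```

restoring in place keeps the same hash reference, so anything else that holds the data object still sees the rolled back values.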
=head2 EXPORT
This module doesn't export anything
=head1 AUTHOR
Uri Guttman, E<lt>[EMAIL PROTECTED]E<gt>
=cut
--
Uri Guttman ------ [EMAIL PROTECTED] -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm