[Boston.pm] RulesEngine module

Uri Guttman Wed, 19 Jan 2005 19:43:02 -0800

hi all,

we (at least i) had a good time at the social at fire+ice tonight. i was
discussing a new module i am developing and i said i would send what i
have to the list. this is the most recent draft of the specs and there
are some open issues with the api and the design. so feel free to ask
questions, suggest features, solutions, whatever about this. i am
looking for outside eyes and brains as i have been working on this
mostly alone so far. if there is interest, we could have an open
discussion of it at the tech meeting next tuesday. in any case have fun
with this. if you want to help more or be a beta tester, contact me off
list. another goal i have with this is to give a talk about it at this
year's conferences.


uri

The class RulesEngine is designed to shepherd data through sets of logic
rules. These rules can process and modify the data as well as execute
side effects via external calls. Rules are Perl code references and they
are organized inside the engine in structures called
RulesEngine::Flows. Here are some of the primary features and benefits
of this module:

        * Can handle and track multiple data objects inside the engine
        * Data flow can be linear, or state to state or any combination
        * Data flow has support for conditionals, loops and calls
        * Can be driven from synchronous or asynchronous systems
        * Less coding needed to create complex logic systems
        * Easy to integrate in applications
        * Rules and Flows can be loaded at runtime from multiple sources
          (files, DB, network)
        * A library of common Rules and Flows is provided. They can be
          used or modified as needed
        * Useful for network protocols, state machines, business logic.

Here are the primary Classes, their preliminary specs and some of their
methods. The API is still under development. Note that any names of the
form foo/bar are alternate names for that method. In your feedback
select the names you prefer.

All objects below will be constructed with key/value argument
lists. They can be passed as a hash ref or a list of pairs.
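
A minimal sketch of how a constructor could accept either form. The helper name _parse_args and the 'flows.txt' file name are hypothetical and not part of the spec; only the hash ref / list of pairs convention comes from the text above.

```perl
use strict ;
use warnings ;

# hypothetical helper: accept either a single hash ref or a flat
# list of key/value pairs and always return a fresh hash ref.
sub _parse_args {

        my @pairs = ( @_ == 1 && ref $_[0] eq 'HASH' ) ? %{ $_[0] } : @_ ;

        die "odd number of arguments\n" if @pairs % 2 ;

        my %args = @pairs ;
        return \%args ;
}

# both calling styles end up with the same argument hash
my $from_ref  = _parse_args( { flow_file => 'flows.txt' } ) ;
my $from_list = _parse_args( flow_file => 'flows.txt' ) ;
```

Copying the caller's hash keeps the object from sharing storage with outside code, which may or may not be wanted (see the IDEA about the flow argument hash further down).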


Class RulesEngine


This class is the core of the rules engine system. It contains all the
Data, Rule and Flow objects (see below) and manages the flow of data
through the engine. It is where all the external state of the Data
objects is maintained (of course, the Data objects maintain their own
data). Rules and Flows are inserted into the engine object and then data
objects are injected into it and targeted to a named Flow. The engine
manages which Flows and Rules get passed the Data based on the results
of the Rules.


Question: should there be only a singleton RulesEngine or should it
allow multiple instances? You can always emulate multiple state machines
inside a single one with proper namespace management. In any case, the
class must track all created engines so it can map data objects to their
owning engine objects.


Timeouts are a critical feature for a rules engine/state machine so it
can be used in real time situations such as protocols. The module will
support multiple timeout techniques. Unsupported timeout techniques can
be handled with the timeout_create argument to new().


Methods and their arguments

RulesEngine class

        new()           constructs a new RulesEngine object

                flows           list of pairs of flow names and their rules.
                                or a list of flow objects? more below.

                flow_file       file to load that has flows/rules.
                                multiple file/flow formats will be
                                eventually supported.

                rules           list of rules and their names. used to
                                install rules that can be accessed by
                                name

                timeout_style   select from a supported list of timeout
                                styles. these will include Event.pm,
                                Stem, SIGALRM, etc.

                timeout_create  this argument is a code ref to call when
                                the engine wants to create a
                                timeout. this call is passed the data
                                object (to be called back when the
                                timeout triggers) and the timeout
                                period. the call returns a timeout
                                object which has a cancel method.


        install_flows()         install flows into the engine.
                                can be used on empty or running engine.


                flows           list of pairs of flow names and their rules.
                                or a list of flow objects? more below.

                flow_file       file to load that has flows/rules.
                                multiple file/flow formats will be
                                eventually supported.

        install_rules()         install named rules into the engine.

                                list of rules and their names. used to
                                install rules that can be accessed by
                                name

                name            name of rule. global to the engine

                rule            code reference

                rule_file       a file of rules to be installed

        inject_data()           inject data object to a flow in this engine
        

                data            data object being injected (could be a
                                hash and made into an object which is
                                returned).

                flow            name of flow which gets the data


                WAIT            don't start executing this flow. Execute
                                when the first trigger happens.


        status                  returns the status of the engine. info
                                about its rules, flows and data.

                rules           return status of all named rules

                flows           return status of all flows

                data            return status of all data objects
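
        going by the timeout_create description under new() above (a
        code ref that gets the data object and the timeout period and
        returns an object with a cancel method), a callback could look
        roughly like this. the My::Timeout class, its is_active method
        and its fields are made up for the sketch; a real one would
        wrap a timer from Event.pm, Stem or whatever loop is in use.

```perl
use strict ;
use warnings ;

# hypothetical timeout class. only the cancel method is asked for
# by the spec; everything else here is illustration.
package My::Timeout ;

sub new {
        my( $class, %args ) = @_ ;
        return bless { %args, active => 1 }, $class ;
}

sub cancel    { $_[0]->{active} = 0 ; return }
sub is_active { return $_[0]->{active} }

package main ;

# the code ref that would be passed as the timeout_create argument.
# it gets the data object and the timeout period in seconds.
my $timeout_create = sub {
        my( $data, $period ) = @_ ;
        return My::Timeout->new(
                data    => $data,
                expires => time() + $period,
        ) ;
} ;

# what the engine might do when a rule starts waiting and later
# gets triggered before the timeout fires
my $timeout = $timeout_create->( { BUFFER => '' }, 30 ) ;
$timeout->cancel() ;
```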


RulesEngine::Flow class

        Note: i am not sure if this needs to be blessed into any
        classes. i will assume it is blessed for now.

        a flow is created with a list/tree of rules, an attached set of
        arguments and a unique name (in the flow namespace of this
        engine).  the list/tree of rules is to be executed sequentially
        or in some other logical flow.

        the only external method on a flow would be status. this would
        report the flow's argument hash (see below), and anything else
        we want to know (not much as the flow is mostly static). there
        are some internal methods that start the flow on a data object
        but that may be handled by the engine itself.  all of the work
        in a flow is done by its interpreter which crawls the rules
        list/tree and executes its rules.

        the program counter and call stack of a flow are actually data
        object specific (see below). the flow keeps no special state
        information.

        the flow's argument hash is passed to each of the rules it
        executes. it is a way to pass custom values to an instance of a
        flow. you can consider it as a flow's environment hash.  a flow
        is effectively read only when it is installed with its argument
        hash.

        IDEA: is this argument hash read only? can it be changed as a
        whole after the flow starts? if the outside code keeps access to
        this hash, it can be changed by outside code. this can be used
        to easily change runtime behavior on the fly.

        NOTE: i have a similar module called Stem::Cell::Flow which
        supports a simple mini-language (thanks to damian's
        Parse::RecDescent!) which has loops, conditionals, rule
        arguments and other flow features (more can be added). flows
        could be of several types which support different input
        syntaxes, each with its own parser (if needed) and an
        interpreter. the simplest flow would be a list of tokens (all
        are named rules) where each rule is executed in turn. also a
        more complex flow could support simpler ones too (the simple
        flow is easily done with the one that handles loops).
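
        a minimal sketch of the simplest flow type described above: a
        list of named rules, with the program counter kept per data
        object outside the flow. this is not the module's code. the
        trigger sub, the %pc_of hash, the rule names and the 'DONE'
        return are all assumptions, and only WAIT and the empty return
        are handled.

```perl
use strict ;
use warnings ;
use Scalar::Util qw( refaddr ) ;

# hypothetical engine state: named rules, one flow and a PC per data object
my %rules = (
        need_header => sub {
                return if $_[0]->{BUFFER} =~ /header/ ;
                return 'WAIT' ;
        },
        mark_done => sub { $_[0]->{DONE} = 1 ; return },
) ;

my @flow = qw( need_header mark_done ) ;
my %flow_args ;         # the flow's argument hash, empty here
my %pc_of ;             # refaddr of data object => index into @flow

# run the data object's current rule and keep going until a rule
# returns WAIT or the flow runs out of rules.
sub trigger {
        my( $data ) = @_ ;

        my $id = refaddr( $data ) ;
        $pc_of{$id} = 0 unless exists $pc_of{$id} ;

        while ( $pc_of{$id} < @flow ) {

                my $rule = $rules{ $flow[ $pc_of{$id} ] } ;

                # list context so an empty return can be told from undef
                my @ret = $rule->( $data, \%flow_args ) ;

                # WAIT leaves the PC alone so this rule runs again later
                return 'WAIT' if @ret && $ret[0] eq 'WAIT' ;

                # an empty return moves on to the next rule
                $pc_of{$id}++ ;
        }

        delete $pc_of{$id} ;
        return 'DONE' ;
}

my $data = { BUFFER => '' } ;
my $first = trigger( $data ) ;          # no header yet, so the flow waits
$data->{BUFFER} .= "header: x\n" ;
my $second = trigger( $data ) ;         # header arrived, flow runs to the end
```

        note that the flow itself (@flow and %flow_args) is never
        modified while data moves through it, which matches the idea
        that a flow keeps no special state.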

RulesEngine Rules

        RulesEngine rules are just code references or names of installed
        rules (which in turn must be code references). they are called
        in a standard way and are expected to return a status value
        which tells the engine what to do next.

        rules are called with 2 positional arguments and a list.

        the first argument is the data object which is passing through
        this rule. rules can use the data object directly as a hash
        ref. (NOTE: this may break some OO rules, but damian okayed such
        breakage as long as it is public :). the data object will likely
        contain most of the data needed by the rule.

        the second argument is the flow's argument hash (see above).
        this hash ref will have the data attached to this flow
        instance. it will be used for default values, or data that won't
        change often, or flow logic changes.

        the rest of the argument list is what the flow compiler (see
        note in flows) parsed out. (more below in named rules).

        rules return either a token or a hash ref or an empty return
        (not return undef!). the return value is used by the engine to
        determine what to do with the current data object. a single
        token can be used if the return is simple or a hash ref of
        tokens and values can be returned.

        an empty return (no undef!) tells the engine to flow this data
        to the next rule in the current flow. it is the default
        behavior. this means all rules must use explicit return calls so
        you don't get an accidental return of some random last
        expression.
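
        the difference between an empty return and return undef only
        shows up when the sub is called in list context, so this
        presumably means the engine calls rules that way (the draft
        doesn't say so; that part is an assumption). a small
        illustration with made-up variable names:

```perl
use strict ;
use warnings ;

my $empty_rule = sub { return ; } ;
my $undef_rule = sub { return undef ; } ;

# calling in list context lets the caller count what came back
my @empty = $empty_rule->() ;   # no elements at all
my @undef = $undef_rule->() ;   # one element, which is undef

my $empty_count = @empty ;
my $undef_count = @undef ;
```

        in scalar context both calls would yield undef and the engine
        could not tell them apart.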

        return 'WAIT' ;
        return { 'WAIT' => 1 } ;

        returning 'WAIT' tells the engine to stop flowing this data and
        to return to the outside world (our caller). this rule remains
        the current rule (the PC for this data object is
        unchanged). when this data is triggered/kicked (see below)
        again, this rule is executed again. here is an example that
        waits for a match in the data object.

                sub {
                        return if $_[0]->{BUFFER} =~ /header/ ;
                        return 'WAIT' ;
                }

        the outside world must modify the BUFFER field of the data
        object (see below) and then cause it to be triggered. the logic
        for when to WAIT or not can be very simple (as above) or very
        complex. here is a way to wait for a set of async operations to
        complete:

                sub {
                        return 'WAIT' if keys %{$_[0]->{ASYNC_KEYS}} ;
                        return ;
                }

        the code initially assigns a set of keys to ASYNC_KEYS, each one
        reflecting a single async operation (e.g. web fetches, db
        accesses). when a given async operation succeeds, you delete its
        associated key from the ASYNC_KEYS hash and retrigger this
        rule. only when all the keys are deleted will this rule flow to
        the next rule.
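
        here is that sequence played out by hand, with the rule called
        directly instead of through an engine. the key names web_fetch
        and db_lookup are just examples.

```perl
use strict ;
use warnings ;

# the rule from above: wait while any async operation is pending
my $wait_for_async = sub {
        return 'WAIT' if keys %{ $_[0]->{ASYNC_KEYS} } ;
        return ;
} ;

# one key per outstanding async operation
my $data = { ASYNC_KEYS => { web_fetch => 1, db_lookup => 1 } } ;

# collect each return in list context so an empty return is visible
my @results ;

push @results, [ $wait_for_async->( $data ) ] ;         # both pending

delete $data->{ASYNC_KEYS}{web_fetch} ;                 # web fetch finished
push @results, [ $wait_for_async->( $data ) ] ;         # one still pending

delete $data->{ASYNC_KEYS}{db_lookup} ;                 # db lookup finished
push @results, [ $wait_for_async->( $data ) ] ;         # nothing pending
```

        the first two calls return 'WAIT' and the third returns
        nothing, which is what would let the data flow on to the next
        rule.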

        QUESTION: this is where timeouts need to be integrated. i haven't
        come up with the timeout api yet. does it trigger the rule but
        pass in a special argument? is that argument in the data object?
        or does the timeout force the data to another flow (longjump
        or even try/catch style)? all ideas here are welcome. i have done
        OO callbacks for timeouts but in this design, what would you
        like a timeout to do and how do you want to have it control the
        flow?

        return { 'FLOW' => 'flow_name_foo' } ;

        that causes the data object to start executing the
        'flow_name_foo' flow. it is basically a flow oriented goto. by
        selecting what the flow name is (say a hash lookup), you can use
        this to make a flow dispatch table. an external command comes in
        and you dispatch to the flow that handles that command. it might
        even mix in some other state info to determine the flow to goto.
        this might become a standard named rule as it looks like it
        would be common.
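
        a sketch of such a dispatch rule. the COMMAND key, the command
        names and all of the flow names are invented for the example;
        only the { 'FLOW' => ... } return comes from the text above.

```perl
use strict ;
use warnings ;

# hypothetical table mapping external commands to flow names
my %flow_for_command = (
        get  => 'handle_get',
        put  => 'handle_put',
        quit => 'shutdown',
) ;

# look up the flow for the command stored in the data object and
# tell the engine to go there. unknown commands get their own flow.
my $dispatch_rule = sub {
        my( $data ) = @_ ;

        my $command = defined $data->{COMMAND} ? lc( $data->{COMMAND} ) : '' ;
        my $flow    = $flow_for_command{$command} ;

        return { 'FLOW' => $flow } if defined $flow ;
        return { 'FLOW' => 'unknown_command' } ;
} ;

my $known   = $dispatch_rule->( { COMMAND => 'GET' } ) ;
my $unknown = $dispatch_rule->( { COMMAND => 'frobnicate' } ) ;
```

        other state could be mixed into the lookup key (for example
        "$state:$command") to pick a flow based on both.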

        return { 'CLONE' => 'flow_name_foo' } ;
        return { 'CLONE' => [qw(flow_1 flow_2)] } ;

        this causes the current data object to be cloned (shallow or
        deep??) and the clones are sent to the selected flows. this can
        be used to start async operations or to trigger parallel events,
        etc. the current data object will flow through to its next rule.
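
        on the shallow or deep question, this shows what each choice
        would mean for a data object that holds nested structures. it
        uses a plain hash copy for the shallow case and dclone from
        the core Storable module for the deep case. the field names
        are made up, and which one the engine should use is still
        open.

```perl
use strict ;
use warnings ;
use Storable qw( dclone ) ;

my $data = { ID => 1, HEADERS => { host => 'example.com' } } ;

my $shallow = { %{$data} } ;    # top level copied, nested refs shared
my $deep    = dclone( $data ) ; # everything copied

# a later rule changes the original's nested data
$data->{HEADERS}{host} = 'changed.example' ;

my $shallow_host = $shallow->{HEADERS}{host} ;  # sees the change
my $deep_host    = $deep->{HEADERS}{host} ;     # still the old value
```

        a shallow clone is cheaper but the clones and the original can
        step on each other's nested data. a deep clone is isolated but
        costs more and can't copy things like code refs without extra
        Storable settings.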

        at the moment, i don't have any more rule return values but as
        you can see it will be easy to add others. any suggestions will
        be useful.

        return 'DESTROY' ;
        return { 'DESTROY' => 1 } ;

        this causes the data object to be destroyed. all knowledge about
        it is removed from the engine. this can be also done via the
        destroy method (QUESTION: do we need both? for other
        methods/returns as well?). you can return this when the rule
        decides that the data object's flow is done and needs no more
        processing in the engine.

        Named Rules

        rules come in two main flavors, named and anon. anonymous rules
        are just code refs (either anon subs or \&sub refs in a
        flow). named rules are installed into the engine and referenced
        in a flow by name. so named rules are meant to be reused by many
        flows throughout the engine. this means named rules will be
        effectively templated and refer to all their data by key
        names. since some flow compilers will support custom arguments
        to each rule, we can handle even more complex things in named
        rules. you can consider them a library of useful and reusable
        rules. this implies a certain granularity of rules. if a rule is
        so long and complex that it can only be used by one application
        it is not a good idea to name it. it is the same concept of how
        you break up a code library and specify its api but different in
        that rules are called in a specific way and have a set of
        expected return values. as the engine develops and gets used in
        the real world, the list of named rules (and possibly commonly
        used flows) will grow. named rules still have to be loaded (from
        disk/db/net) and installed into the engine.

RulesEngine::Data class

        this class creates and manages the data objects which flow
        through the rules engine. a data object flows through the engine
        by being passed to the rules in sequence. since rules need quick
        and simple access to the data object, it must be a hash ref
        (hell, we could make it have accessor methods like get/set but i
        want to have speed here. any thoughts?). there is nothing
        special about the hash other than possible reserved keys (see
        design issues below).

        new()

        this is the data object constructor. it takes a single hash ref
        which will be blessed into this class and returned. this object
        can be injected into an engine with the
        RulesEngine::inject_data() method. (QUESTION: should that be a
        data or engine method? either way, both objects are needed).

        the outside code can save the data object so it can be modified
        and triggered.

        trigger/kick/signal/continue/run()

        cause this data object to execute its current rule. this is
        called to start the initial flow or to resume flowing after a
        WAIT. it has multiple choices for its name so please show your
        creativity here and pick or suggest one.

        destroy()

        destroy this data object and remove it from its engine. all
        knowledge including the data object's PC are tossed.

        get/set()

        standard get/set accessors. not needed if we allow direct public
        access to the hash

        append()

        append text to a data object member. good for building up a
        buffer of read data until it passes some criteria such as a
        pattern or size.
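
        a rough sketch of new() and append() as described above. the
        package is called Sketch::Data to make clear it is not the
        real class, and the (key, text) argument order for append is
        an assumption since the draft doesn't give its arguments.

```perl
use strict ;
use warnings ;

package Sketch::Data ;

# bless the caller's hash ref, as the new() description says
sub new {
        my( $class, $hash ) = @_ ;
        return bless $hash, $class ;
}

# append text to the named member, creating it if needed
sub append {
        my( $self, $key, $text ) = @_ ;

        $self->{$key} = '' unless defined $self->{$key} ;
        $self->{$key} .= $text ;

        return ;
}

package main ;

my $data = Sketch::Data->new( { BUFFER => '' } ) ;

# the outside world reads two partial chunks
$data->append( BUFFER => 'head' ) ;
$data->append( BUFFER => "er: 1\n" ) ;

# the kind of check a waiting rule would make after a trigger
my $ready = $data->{BUFFER} =~ /header/ ? 1 : 0 ;
```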

        Data Object Ideas:

        Tracing

        we can enable tracing on a data object via a method. then each
        time a rule is called with this data, a trace log is appended
        to. this will probably need to be inside the data object (but
        see above for the design issues).

        Transactions

        basic transactions on the data object can be done. when the
        transaction method is called on a data object a copy of the
        current data (and possibly its PC) is made and stored away
        (possibly in the data object itself, see the design issues
        above). in a later rule if a commit method is called on this
        data, the saved data is thrown away and the transaction is
        marked as complete. the data flows to the next rule according
        to the return value of this rule. if a rollback method is
        called, the saved data (and possibly its PC) are copied back to
        the data object. that saved rule is reexecuted (if we saved the
        PC) or the next rule is selected by the return value. of course
        these transactions are only done in memory and are not
        persistent.
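
        a minimal in-memory sketch of the transaction idea. it keeps
        the saved copy in a side hash keyed by the data object's
        address rather than in the object itself, makes only a shallow
        copy and does not save the PC. all of those are choices made
        for the example, not decisions in the design, and the field
        names are made up.

```perl
use strict ;
use warnings ;
use Scalar::Util qw( refaddr ) ;

my %saved_copy ;        # refaddr of data object => saved shallow copy

# start a transaction: stash a copy of the current data
sub transaction {
        my( $data ) = @_ ;
        $saved_copy{ refaddr( $data ) } = { %{$data} } ;
        return ;
}

# commit: throw the saved copy away
sub commit {
        my( $data ) = @_ ;
        delete $saved_copy{ refaddr( $data ) } ;
        return ;
}

# rollback: copy the saved data back over the live object.
# does nothing if there is no open transaction.
sub rollback {
        my( $data ) = @_ ;

        my $saved = delete $saved_copy{ refaddr( $data ) } ;
        return unless $saved ;

        %{$data} = %{$saved} ;
        return ;
}

my $data = { BALANCE => 100 } ;

transaction( $data ) ;
$data->{BALANCE} -= 30 ;
$data->{NOTE} = 'pending' ;
rollback( $data ) ;             # BALANCE is 100 again and NOTE is gone
```

        because the copy is shallow, changes made to nested structures
        during the transaction would not be undone by the rollback.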


RulesEngine advantages

when you finally look at this module, all you get is a different way to
call perl subs in sequence. and the question arises, why not just use
perl for that flow? the answer is in the term granularity. RulesEngine
allows for a flow to stop and wait for some external thing to be
satisfied (e.g. a header is read and parsed). this means you can handle
complex state systems with real time behavior. but beyond that (which
the Expect.pm module can do), it can manage multiple flows through the
state machine at the same time. so this module is perfect for event loop
systems and other parallel applications.





-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
 