RE: [chain] Pipeline implementation

Rory Winston Wed, 22 Sep 2004 08:46:59 -0700

Alex,

>Very interesting...  almost sounds like a generalized cocoon or
>something.  I like how you are building up using protocol transports to
>pump stuff in and out of this.  It's definitely a nice high level
>example of what can be done with pipelining and a juicy set of
>connectors.



Exactly: a "generalized Cocoon" was almost exactly what I was thinking of.
I like Cocoon's architecture, but what I would like to see is an even more
generic implementation that uses asynchronous processing and has a
generalized connector capability, so the endpoints could be
HTTP/FTP/JMS/SOAP/JDBC, or even ERP/CMS systems via a JCA-type capability.
The pipeline is there to transform and massage the data along the way.
Ideally, if there were some sort of workflow or process definition
protocol, the pipeline could "branch" at certain points. The pipeline
stages may have a generic "hook" for custom code, and may be reusable, so
a rich workflow could be built up quickly. Again, I'm using WebMethods
Integration for inspiration here: it's fantastic, and I haven't seen
anything in the OSS sphere that goes down exactly the same route.
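To make the "branch" idea concrete, here is a minimal Java sketch, assuming a stage can be modeled as a plain function from input to output. `Stage`, `then` and `branch` are hypothetical names for illustration only; they are not part of Commons Chain, Cocoon or WebMethods.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stage type (not a Commons Chain or Cocoon API):
// a stage is a function from its input to its output.
interface Stage<I, O> extends Function<I, O> {

    // Append another stage, yielding a longer pipeline.
    default <R> Stage<I, R> then(Stage<O, R> next) {
        return in -> next.apply(apply(in));
    }

    // A branch point: send the input down one of two sub-pipelines.
    static <T, R> Stage<T, R> branch(Predicate<T> test, Stage<T, R> ifTrue, Stage<T, R> ifFalse) {
        return in -> test.test(in) ? ifTrue.apply(in) : ifFalse.apply(in);
    }
}

class BranchDemo {
    public static void main(String[] args) {
        Stage<String, String> normalize = String::trim;
        Stage<String, String> toXml = s -> "<line>" + s + "</line>";
        Stage<String, String> toError = s -> "<error>" + s + "</error>";

        // Normalize every input, then branch on its content.
        Stage<String, String> flow =
            normalize.then(Stage.branch(s -> s.startsWith("ERR"), toError, toXml));

        System.out.println(flow.apply("  hello ")); // <line>hello</line>
        System.out.println(flow.apply("ERR 42"));   // <error>ERR 42</error>
    }
}
```

Because each branch is itself a `Stage`, a branch can contain further branches, which is roughly what a process definition would describe.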


Thanks for the link to Coconut, I'll check it out!

Cheers,
Rory

-----Original Message-----
From: Alex Karasulu [mailto:[EMAIL PROTECTED]
Sent: 22 September 2004 16:22
To: Jakarta Commons Developers List
Subject: RE: [chain] Pipeline implementation


On Wed, 2004-09-22 at 04:53, Rory Winston wrote:
> I've been following bits of this thread (I will re-read the whole thread
> later when I have time), and I am fascinated with this approach. I have
> been mulling over a similar implementation for a new framework, which
> comes from real-world requirements in my work. The basic idea will be a
> pipeline, which may have many inputs and many outputs. The outputs are
> configurable, and could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc.
> An input arrives at one end of the pipeline via a connector and passes
> through each stage in the pipeline. Each individual stage takes an input
> and produces an output, and the "work" done in each pipeline stage can be
> performed by a plugin, of sorts. A hook will be provided in each stage so
> that a developer could insert his/her custom code. Here is a simple
> diagram of what I have in mind. This is a trivial pipeline that reads
> line data files from an FTP connector, transforms the line data to XML
> via a predefined transform, then passes it through another pipeline stage
> that transforms the XML files to PDF. The final stage is another
> connector that stores the generated PDF files in a database.
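The FTP to XML to PDF to database flow described above can be sketched in Java roughly as follows. This is only an illustration under assumptions: `SourceConnector`, `SinkConnector` and `Pipeline` are hypothetical names, the connectors are in-memory stand-ins, and the "PDF" stage is a string placeholder rather than a real renderer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical connector roles; real FTP/JDBC connectors would implement these.
interface SourceConnector<T> { List<T> read(); }
interface SinkConnector<T> { void write(T item); }

// An immutable linear pipeline: each stage() call returns a longer pipeline.
final class Pipeline<I, O> {
    private final Function<I, O> chain;

    private Pipeline(Function<I, O> chain) { this.chain = chain; }

    static <T> Pipeline<T, T> start() { return new Pipeline<>(Function.identity()); }

    // The stage body is the "hook" where custom code goes.
    <R> Pipeline<I, R> stage(Function<? super O, ? extends R> body) {
        Function<I, R> next = chain.andThen(body);
        return new Pipeline<>(next);
    }

    // Pump every item from the source through all stages into the sink.
    int run(SourceConnector<? extends I> source, SinkConnector<? super O> sink) {
        int count = 0;
        for (I item : source.read()) {
            sink.write(chain.apply(item));
            count++;
        }
        return count;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        // Stand-in for an FTP connector: two lines of comma-separated data.
        SourceConnector<String> ftp = () -> Arrays.asList("a,1", "b,2");
        // Stand-in for a JDBC connector: collect "rows" in memory.
        List<String> table = new ArrayList<>();
        SinkConnector<String> db = table::add;

        Pipeline<String, String> p = Pipeline.<String>start()
            .stage(line -> line.split(","))                           // parse line data
            .stage(f -> "<row id=\"" + f[0] + "\">" + f[1] + "</row>") // line data -> XML
            .stage(xml -> "PDF[" + xml + "]");                        // placeholder for XML -> PDF

        p.run(ftp, db);
        System.out.println(table); // [PDF[<row id="a">1</row>], PDF[<row id="b">2</row>]]
    }
}
```

Keeping connectors behind two tiny interfaces means the stages never know whether data came from FTP or went to JDBC.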

Very interesting...  almost sounds like a generalized cocoon or
something.  I like how you are building up using protocol transports to
pump stuff in and out of this.  It's definitely a nice high level
example of what can be done with pipelining and a juicy set of
connectors.

<snip/>

Sorry to have lost yer ascii art :(.

> Thinking about this problem, there are a few things that are needed:
>
>  - A pipeline-based processing model
>  - A generic connector API
>  - Event-driven asynchronous pipeline processing
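For the event-driven asynchronous part, a SEDA-style stage can be approximated with an input queue drained by its own worker thread, as in the sketch below. `AsyncStage` is a hypothetical class, not the SEDA or Coconut API; it assumes one worker thread per stage, which keeps events in order within a stage.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical SEDA-style stage: an inbox queue drained by a dedicated worker thread.
final class AsyncStage<I, O> {
    private final BlockingQueue<I> inbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    AsyncStage(String name, Function<I, O> handler, BlockingQueue<? super O> downstream) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    I event = inbox.take();               // block until an event arrives
                    downstream.put(handler.apply(event)); // hand the result to the next queue
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // stop() was called
            }
        }, name);
        worker.setDaemon(true);
    }

    BlockingQueue<I> inbox() { return inbox; }
    void start() { worker.start(); }
    void stop() { worker.interrupt(); }
}

class AsyncDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        AsyncStage<String, String> toPdf =
            new AsyncStage<>("to-pdf", xml -> "PDF[" + xml + "]", results);
        AsyncStage<String, String> toXml =
            new AsyncStage<>("to-xml", line -> "<line>" + line + "</line>", toPdf.inbox());
        toPdf.start();
        toXml.start();

        toXml.inbox().put("hello");         // returns at once; work happens on stage threads
        System.out.println(results.take()); // PDF[<line>hello</line>]

        toXml.stop();
        toPdf.stop();
    }
}
```

Stages only share queues, so a slow stage backs up its own inbox instead of blocking the producer.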


> A workflow specification (possibly XML) could define the pipeline "flow",
> and this could be generated via a GUI. The WebMethods Integration platform
> is the slickest example of this approach that I have seen.
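An XML flow definition could be turned into a pipeline with nothing more than the JDK's DOM parser and a registry of named stages. This is a sketch under assumptions: the `<pipeline>`/`<stage name="..."/>` format and the `FlowLoader` class are hypothetical, and only linear flows are handled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical loader: builds a linear flow from a made-up XML format.
final class FlowLoader {
    private final Map<String, UnaryOperator<String>> registry = new HashMap<>();

    FlowLoader register(String name, UnaryOperator<String> stage) {
        registry.put(name, stage);
        return this;
    }

    // Chain <stage name="..."/> elements in document order.
    Function<String, String> load(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        NodeList stages = root.getElementsByTagName("stage");
        Function<String, String> flow = Function.identity();
        for (int i = 0; i < stages.getLength(); i++) {
            String name = ((Element) stages.item(i)).getAttribute("name");
            UnaryOperator<String> stage = registry.get(name);
            if (stage == null) throw new IllegalArgumentException("unknown stage: " + name);
            flow = flow.andThen(stage);
        }
        return flow;
    }
}

class FlowDemo {
    public static void main(String[] args) throws Exception {
        FlowLoader loader = new FlowLoader()
            .register("trim", String::trim)
            .register("upper", String::toUpperCase);
        Function<String, String> flow = loader.load(
            "<pipeline><stage name=\"trim\"/><stage name=\"upper\"/></pipeline>");
        System.out.println(flow.apply("  hello  ")); // HELLO
    }
}
```

A GUI editor would then only need to emit this XML; the stage implementations stay in code.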
>
> My questions are thus: could Commons-Chain (+ pipeline) be used as the
> basis for this type of processing? Are there any other open-source
> frameworks that anyone knows of that do this already?

This is sounding more and more like you can use a combination of chain
and SEDA code.  There is also another effort, more on the SEDA and
async IO side, called Coconut at Codehaus, which has similar
functionality but is missing some of the aspects of the chain library
here:

http://coconut.codehaus.org/

> -----Original Message-----
> From: Kris Nuttycombe [mailto:[EMAIL PROTECTED]
> Sent: 22 September 2004 00:47
> To: Jakarta Commons Developers List
> Subject: Re: [chain] Pipeline implementation
>
>
> Alex Karasulu wrote:
>
> >>subscribers that share the same index have events processed in parallel.
> >>Also, perhaps instead of returning void, StageHandler.handleEvent() could
> >>return a boolean value that flags whether or not the event is allowed to
> >>propagate to other stages with higher serial numbers.
> >
> >
> >That's also another good idea.  This almost reminds me of rule salience
> >in expert system shells.  What stage does the event have the most
> >affinity for?
> >
> >
> I hadn't thought of things in this context, but both the stage/event
> handling pieces of the SEDA framework and the pipeline we've developed
> here do seem a lot like frameworks for building specialized expert
> systems with concurrent processing.
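Kris's proposal (a boolean result from handleEvent(), and subscribers sharing a serial number processed in parallel) could look roughly like this. It is a sketch under assumptions: `BooleanHandler` and `SerialRouter` are hypothetical names, not the SEDA StageHandler API, whose handleEvent() returns void per the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical handler following the proposal: return false to stop propagation.
interface BooleanHandler<E> { boolean handleEvent(E event); }

final class SerialRouter<E> {
    // Subscribers grouped by serial number; TreeMap iterates in ascending order.
    private final TreeMap<Integer, List<BooleanHandler<E>>> groups = new TreeMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    void subscribe(int serial, BooleanHandler<E> handler) {
        groups.computeIfAbsent(serial, k -> new ArrayList<>()).add(handler);
    }

    // Runs groups in serial order; returns how many groups ran.
    int publish(E event) throws Exception {
        int ran = 0;
        for (List<BooleanHandler<E>> group : groups.values()) {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (BooleanHandler<E> h : group) tasks.add(() -> h.handleEvent(event));
            boolean propagate = true;
            // invokeAll waits until every handler in the group has finished.
            for (Future<Boolean> f : pool.invokeAll(tasks)) propagate &= f.get();
            ran++;
            if (!propagate) break; // any veto stops higher serial numbers
        }
        return ran;
    }

    void shutdown() { pool.shutdown(); }
}

class RouterDemo {
    public static void main(String[] args) throws Exception {
        SerialRouter<String> router = new SerialRouter<>();
        router.subscribe(1, e -> { System.out.println("audit " + e); return true; });
        router.subscribe(1, e -> !e.startsWith("spam")); // runs in parallel with "audit"
        router.subscribe(2, e -> { System.out.println("store " + e); return true; });
        router.publish("hello"); // reaches serial 2
        router.publish("spam!"); // vetoed at serial 1
        router.shutdown();
    }
}
```

The serial number plays a role similar to rule salience: lower numbers get the first chance to consume or veto an event.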
>
> >>but then it seems like you have bleeding of the application logic into
> >>the configuration realm. Maybe one could modify the StageHandler
> >>interface by adding a method that allows you to query for the runtime
> >>class of the event returned to get around this problem.
> >>
> >>
> >
> >I don't understand the "bleeding of the application logic" comment.
> >Could you clarify this some more and explain how this is removed when
> >the class of the event can be queried?
> >
> >
> StageHandler's handleEvent() method is regularly responsible for raising
> events and passing them back to the event router, right? The problem is
> that there's nothing in the public API that makes it clear what events a
> particular StageHandler may generate, so establishing a routing scheme
> is a manual process that involves the programmer having knowledge of the
> StageHandler's internals. In a situation where you're trying to set up a
> linear routing scheme from a configuration file, it would make more
> sense that the ordering of elements in that file would determine the
> routing. If it's possible for a configuration tool to look at a
> StageHandler and determine what events the handleEvent method has the
> potential to raise, then automatic configuration becomes much simpler.
> It might also be useful to define a method on the interface that allows
> a handler to announce what events it can handle.
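A minimal sketch of the "announce what you handle and what you may raise" idea, assuming handlers declare their event types as `Class` objects. `DescribedHandler`, `SimpleHandler`, `LinearRouteChecker` and the event classes are hypothetical; the point is that a configuration tool could check a linear routing taken from file order without knowing any handler's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event types for the FTP -> XML -> PDF example.
class LineEvent {}
class XmlEvent {}
class PdfEvent {}

// Hypothetical handler contract: declare what you accept and what you may raise.
interface DescribedHandler {
    Set<Class<?>> handledEvents();
    Set<Class<?>> raisedEvents();
}

final class SimpleHandler implements DescribedHandler {
    private final Set<Class<?>> handled;
    private final Set<Class<?>> raised;

    SimpleHandler(Set<Class<?>> handled, Set<Class<?>> raised) {
        this.handled = handled;
        this.raised = raised;
    }

    public Set<Class<?>> handledEvents() { return handled; }
    public Set<Class<?>> raisedEvents() { return raised; }
}

final class LinearRouteChecker {
    static Set<Class<?>> types(Class<?>... classes) {
        return new HashSet<>(Arrays.asList(classes));
    }

    // Treat list order as routing order; report each event a stage may raise
    // that the next stage does not declare it can handle (subtypes count).
    static List<String> check(List<DescribedHandler> stages) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i + 1 < stages.size(); i++) {
            for (Class<?> raised : stages.get(i).raisedEvents()) {
                boolean accepted = false;
                for (Class<?> handled : stages.get(i + 1).handledEvents()) {
                    if (handled.isAssignableFrom(raised)) {
                        accepted = true;
                        break;
                    }
                }
                if (!accepted) {
                    problems.add("stage " + i + " may raise " + raised.getSimpleName()
                        + ", which stage " + (i + 1) + " does not handle");
                }
            }
        }
        return problems;
    }
}

class CheckDemo {
    public static void main(String[] args) {
        DescribedHandler toXml = new SimpleHandler(
            LinearRouteChecker.types(LineEvent.class), LinearRouteChecker.types(XmlEvent.class));
        DescribedHandler toPdf = new SimpleHandler(
            LinearRouteChecker.types(XmlEvent.class), LinearRouteChecker.types(PdfEvent.class));
        System.out.println(LinearRouteChecker.check(Arrays.asList(toXml, toPdf))); // []
        System.out.println(LinearRouteChecker.check(Arrays.asList(toPdf, toXml)));
    }
}
```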
>
> >>We do things like this all the time, but I'm beginning to see how we
> >>could get around it by having a base event type that related stages all
> >>process and have each stage raise a subtype of that event. Seems a bit
> >>like going the long way around the horn for our use case, but it might
> >>add enough value to be worth it.
> >>
> >>
> >
> >Well this way may not be the best way for you.  This is our first
> >attempt using the pub/sub pattern.  Questions about subtyping versus
> >other means have been discussed.  Right now we simply don't know which
> >way is the best way.
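For comparison, the base-event-plus-subtypes approach can be routed with a plain `isInstance` check, so a subscriber to the base type sees every subtype while a subscriber to a subtype sees only that kind. A rough sketch; `PipelineEvent`, `XmlReadyEvent`, `PdfReadyEvent` and `TypeRouter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical common base type that related stages agree on.
class PipelineEvent {
    final String payload;
    PipelineEvent(String payload) { this.payload = payload; }
}

// Each stage raises its own subtype.
class XmlReadyEvent extends PipelineEvent { XmlReadyEvent(String p) { super(p); } }
class PdfReadyEvent extends PipelineEvent { PdfReadyEvent(String p) { super(p); } }

final class TypeRouter {
    private static final class Subscription<E> {
        final Class<E> type;
        final Consumer<? super E> handler;

        Subscription(Class<E> type, Consumer<? super E> handler) {
            this.type = type;
            this.handler = handler;
        }

        // Deliver only if the event is an instance of the subscribed type (or a subtype).
        boolean offer(Object event) {
            if (!type.isInstance(event)) return false;
            handler.accept(type.cast(event));
            return true;
        }
    }

    private final List<Subscription<?>> subs = new ArrayList<>();

    <E> void subscribe(Class<E> type, Consumer<? super E> handler) {
        subs.add(new Subscription<>(type, handler));
    }

    // Returns how many subscribers received the event.
    int publish(Object event) {
        int delivered = 0;
        for (Subscription<?> s : subs) if (s.offer(event)) delivered++;
        return delivered;
    }
}

class TypeRouterDemo {
    public static void main(String[] args) {
        TypeRouter router = new TypeRouter();
        router.subscribe(PipelineEvent.class, e -> System.out.println("log: " + e.payload));
        router.subscribe(XmlReadyEvent.class, e -> System.out.println("render: " + e.payload));
        router.publish(new XmlReadyEvent("<doc/>"));  // delivered to both subscribers
        router.publish(new PdfReadyEvent("doc.pdf")); // delivered only to the logger
    }
}
```

The cost Kris mentions is visible here: every stage needs its own event subclass, but routing then falls out of the type hierarchy.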
> >
> >
>
> I think that the pub/sub model definitely has the potential to be a lot
> more powerful than our current approach; it's just a matter of
> developing the interfaces to make them flexible enough to support use
> cases for both projects. I think that our use cases are different enough
> that if we can find a model that satisfies both it will be a broadly
> useful framework.
>
> Initially Craig had suggested setting up a commons-pipeline project in
> the sandbox. I've been preparing our code (licenses, submission
> agreements, etc.) to make this transition. Are you at all interested in
> refactoring out the stage, event routing, and thread handling pieces
> from the network-oriented bits of SEDA into this project? There are
> definitely parts of your code that I'd like to be able to use without
> forking them, although I'm sure you don't really want to introduce
> extraneous dependencies.
>
> Kris
>
> --
> =====================================================
> Kris Nuttycombe
> Associate Scientist
> Geospatial Data Services Group
> CIRES, National Geophysical Data Center/NOAA
> (303) 497-6337
> [EMAIL PROTECTED]
> =====================================================
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]