
This sounds fantastic! I would be very keen on taking a look at this when
you manage to get it into the repository. I would concur that something like
this would probably sit better as a separate project (commons-pipeline,
perhaps?). Sorry if I am repeating any of the previous correspondence, but I
haven't had a chance to check the list yet. Do you use NIO in this project,
and do your Connectors just conform to a generic interface? If so, what is
the contract for their input/output?


-----Original Message-----
From: Kris Nuttycombe [mailto:[EMAIL PROTECTED]
Sent: 22 September 2004 15:34
To: Jakarta Commons Developers List
Subject: Re: [chain] Pipeline implementation

This is exactly the sort of use case we have developed our pipeline
under, and there are in fact already plugins to obtain data over FTP and
HTTP, and work is ongoing on an XSLT plugin.

All of our pipeline implementations are configured from XML files using
Digester. It's interesting that you mention commons-chain because that
was where I started, as well. Earlier on in the thread we discussed the
relationship of commons-chain to the pipeline idea, and came to the
conclusion that since chain is more about decision making than
concurrent processing, pipeline should probably be a separate project
with an adapter that allows a chain to be used for decision making and
processing of a single stage in the pipeline.
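
A rough sketch of what such an adapter might look like, in Java. Everything
here is hypothetical: SketchCommand is only a stand-in loosely modeled on a
chain command (it returns true when processing is complete), and SketchStage
and ChainStageAdapter are illustrative names, not the actual commons-chain or
pipeline API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a chain command: returns true when processing is complete.
interface SketchCommand {
    boolean execute(Map<String, Object> context);
}

// Hypothetical stage contract: one object in, results handed to the downstream list.
interface SketchStage {
    void process(Object input, List<Object> downstream);
}

// Adapter that runs a chain of commands as the body of a single pipeline stage.
final class ChainStageAdapter implements SketchStage {
    private final List<SketchCommand> chain;

    ChainStageAdapter(List<SketchCommand> chain) {
        this.chain = new ArrayList<>(chain);
    }

    @Override
    public void process(Object input, List<Object> downstream) {
        Map<String, Object> context = new HashMap<>();
        context.put("input", input);
        for (SketchCommand command : chain) {
            if (command.execute(context)) {
                break; // this command reported that processing is complete
            }
        }
        // Only pass something downstream if the chain produced an output.
        Object result = context.get("output");
        if (result != null) {
            downstream.add(result);
        }
    }
}
```

The chain makes the decisions for one input at a time, while queuing and
concurrency between stages stay the pipeline's concern.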

I'll try and get our code to Craig today so that people can have a
closer look at it.


Rory Winston wrote:

>I've been following bits of this thread (will re-read the whole thread
>when I have time), and I am fascinated with this approach. I have been
>mulling over a similar implementation for a new framework, which comes from
>real-world requirements in my work. The basic idea will be a pipeline, which
>may have many inputs, and many outputs. The outputs are configurable, and
>could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
>one end of the pipeline via a connector, and passes through each stage in
>the pipeline. Each individual stage takes an input and produces an output,
>and the "work" done in each pipeline stage can be performed via a plugin, of
>sorts. A hook will be provided in each stage so that a developer could
>insert his/her custom code. Here is a simple diagram of what I have in mind.
>This is a trivial plugin that reads line data files from an FTP connector,
>transforms the line data to XML via a predefined transform, then passes
>through another pipeline stage that transforms the XML files to PDF. The
>final stage is another connector that stores the generated PDF files in a
>database.
>.-----------.      .------------.      .------------.      .-----------.
>|  Conn. A  |------|  Pipeline  |------|  Pipeline  |------|  Conn. B  |
>|   (FTP)   |      |  Stage A   |      |  Stage B   |      |  (JDBC)   |
>'-----------'      '------------'      '------------'      '-----------'
>      |                   |                   |                  |
> Read files        Transform line       Transform XML        Write PDF
>                     data to XML           to PDF              to DB
>Thinking about this problem, there are a few things that are needed:
> - A pipeline-based processing model
> - A generic connector API
> - Event-driven async. pipeline processing
> -
>A workflow specification (possibly XML) could define the pipeline "flow",
>and this could be generated via a GUI. The WebMethods Integration platform
>is the slickest example of this approach that I have seen.
>My questions are thus: could Commons-Chain (+ pipeline) be used as the
>basis for this type of processing? Are there any other open-source
>frameworks that anyone knows of that do this already?
>-----Original Message-----
>From: Kris Nuttycombe [mailto:[EMAIL PROTECTED]
>Sent: 22 September 2004 00:47
>To: Jakarta Commons Developers List
>Subject: Re: [chain] Pipeline implementation
>Alex Karasulu wrote:
>>>subscribers that share the same index have events processed in parallel.
>>>Also, perhaps instead of returning void StageHandler.handleEvent() could
>>>return a boolean value that flags whether or not the event is allowed to
>>>propagate to other stages with higher serial numbers.
>>That's also another good idea.  This almost reminds me of rule salience
>>in expert system shells.  What stage does the event have the most
>>affinity for?
>I hadn't thought of things in this context, but both the stage/event
>handling pieces of the SEDA framework and the pipeline we've developed
>here do seem a lot like frameworks for building specialized expert
>systems with concurrent processing.
>>>but then it seems like you have bleeding of the application logic into
>>>the configuration realm. Maybe one could modify the StageHandler
>>>interface by adding a method that allows you to query for the runtime
>>>class of the event returned to get around this problem.
>>I don't understand the "bleeding of the application logic" comment.
>>Could you clarify this some more and explain how this is removed when
>>the class of the event can be queried?
>StageHandler's handleEvent() method is regularly responsible for raising
>events and passing them back to the event router, right? The problem is
>that there's nothing in the public API that makes it clear what events a
>particular StageHandler may generate, so establishing a routing scheme
>is a manual process that involves the programmer having knowledge of the
>StageHandler's internals. In a situation where you're trying to set up a
>linear routing scheme from a configuration file, it would make more
>sense that the ordering of elements in that file would determine the
>routing. If it's possible for a configuration tool to look at a
>StageHandler and determine what events the handleEvent method has the
>potential to raise, then automatic configuration becomes much simpler.
>It might also be useful to define a method on the interface that allows
>a handler to announce what events it can handle.
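
A minimal sketch of the idea in the paragraph above, in Java. It assumes a
hypothetical DeclaringStageHandler interface on which each handler announces
the event types it accepts and the ones it may raise; LinearRouter and
FixedHandler are illustrative names too, and none of this is the actual SEDA
or pipeline API. With those declarations in place, a configuration tool can
derive a linear routing scheme purely from the order of the handlers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical interface: handlers declare their event types up front.
interface DeclaringStageHandler {
    Set<Class<?>> handledEvents(); // event types this handler accepts
    Set<Class<?>> raisedEvents();  // event types handleEvent() may raise
}

// Trivial implementation used only to illustrate the declarations.
final class FixedHandler implements DeclaringStageHandler {
    private final Set<Class<?>> handled;
    private final Set<Class<?>> raised;

    FixedHandler(Set<Class<?>> handled, Set<Class<?>> raised) {
        this.handled = handled;
        this.raised = raised;
    }

    @Override public Set<Class<?>> handledEvents() { return handled; }
    @Override public Set<Class<?>> raisedEvents() { return raised; }
}

final class LinearRouter {
    /**
     * Walks the handlers in configuration order. For each stage it collects
     * the raised event types that the next stage declares it can handle, and
     * fails fast if two neighbouring stages have nothing in common.
     */
    static Map<Integer, List<Class<?>>> route(List<DeclaringStageHandler> stages) {
        Map<Integer, List<Class<?>>> routes = new LinkedHashMap<>();
        for (int i = 0; i + 1 < stages.size(); i++) {
            List<Class<?>> passed = new ArrayList<>();
            for (Class<?> raised : stages.get(i).raisedEvents()) {
                for (Class<?> handled : stages.get(i + 1).handledEvents()) {
                    if (handled.isAssignableFrom(raised)) {
                        passed.add(raised);
                        break;
                    }
                }
            }
            if (passed.isEmpty()) {
                throw new IllegalStateException(
                    "Stage " + i + " raises nothing that stage " + (i + 1) + " can handle");
            }
            routes.put(i, passed);
        }
        return routes;
    }
}
```

The isAssignableFrom check means a stage that handles a base event type also
accepts its subtypes, which fits the base-event/subtype approach discussed in
this thread.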
>>>We do things like this all the time, but I'm beginning to see how we
>>>could get around it by having a base event type that related stages all
>>>process and have each stage raise a subtype of that event. Seems a bit
>>>like going the long way around the horn for our use case, but it might
>>>add enough value to be worth it.
>>Well this way may not be the best way for you.  This is our first
>>attempt using the pub/sub pattern.  Questions about subtyping versus
>>other means have been discussed.  Right now we simply don't know which
>>way is the best way.
>I think that the pub/sub model definitely has the potential to be a lot
>more powerful than our current approach; it's just a matter of
>developing the interfaces to make them flexible enough to support use
>cases for both projects. I think that our use cases are different enough
>that if we can find a model that satisfies both it will be a broadly
>useful framework.
>Initially Craig had suggested setting up a commons-pipeline project in
>the sandbox. I've been preparing our code (licenses, submission
>agreements, etc) to make this transition. Are you at all interested in
>refactoring out the stage, event routing, and thread handling pieces
>from the network-oriented bits of SEDA into this project? There are
>definitely parts of your code that I'd like to be able to use without
>forking them, although I'm sure you don't really want to introduce
>extraneous dependencies.
>Kris Nuttycombe
>Associate Scientist
>Geospatial Data Services Group
>CIRES, National Geophysical Data Center/NOAA
>(303) 497-6337
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]

Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

