Re: commons-pipeline status?

Kris Nuttycombe Mon, 07 Aug 2006 10:53:48 -0700

Hi, Steve,

I'm CC'ing this email to the commons user list so that if anyone else has similar questions, they can benefit as well.


Steve Christensen wrote:

Hi Kris,

Hope you are doing well. I've been looking at commons-pipeline over the
weekend, and it looks very close to what we'd been thinking of in our
high-level designs. Thank you very much for making it public.

I've got a couple quick questions:

1) Some of the stages in org...pipeline.stage have
ConsumedTypes / ProducedTypes annotations, but not all of them. Some of
the ones without annotations seem like they wouldn't need them (LogStage
and RaiseEventStage), but some seem like they're missing
(URLtoInputStreamStage and InputStreamLineBreakStage)

The stages which are missing annotations are simply ones that I haven't gotten around to annotating yet. Also, the unit tests to ensure that the validation components are working correctly have yet to be written, but it is definitely on the near-term todo list to get all of the validation pieces set up.

2) It looks like the pipeline holds information about branches but it's
up to the Stage implementation to route things through the branch, is
that correct? That is, if we want a pipeline with branches, we should
have some sort of RouteStage that identifies the objects being fed to
it, and calls emit(branch-key,object) to feed the object to the correct
branch

That is correct. Usually most Stage implementations just wrap business logic from other classes, so in practice I will frequently combine some sort of initial processing with the routing into a single stage, but having a stage that simply works as a router would work fine as well. The one straight "router" that I've done used Commons-Chain for making routing decisions and I found it pretty simple to work with.
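
A minimal sketch of the pure "router" idea follows. It deliberately does not use the commons-pipeline classes: BranchSketch and ExtensionRouterSketch are hypothetical stand-ins, and the emit(branchKey, obj) method only imitates the emit(branch-key,object) call mentioned in the question. The routing rule (file extension as branch key) is likewise an assumption taken from the download example later in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a pipeline branch: it only records what it is fed.
class BranchSketch {
    final List<Object> received = new ArrayList<>();

    void feed(Object obj) {
        received.add(obj);
    }
}

// Hypothetical router stage; NOT the commons-pipeline Stage API.
class ExtensionRouterSketch {
    private final Map<String, BranchSketch> branches = new HashMap<>();

    void addBranch(String key, BranchSketch branch) {
        branches.put(key, branch);
    }

    // Imitates the emit(branch-key, object) idea: hand the object to the named branch.
    void emit(String branchKey, Object obj) {
        BranchSketch branch = branches.get(branchKey);
        if (branch == null) {
            throw new IllegalStateException("No branch registered for key: " + branchKey);
        }
        branch.feed(obj);
    }

    // The routing decision: use the lower-cased file extension as the branch key.
    void process(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = (dot >= 0) ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        emit(ext, fileName);
    }
}
```

In a real stage the routing decision would usually be combined with some initial processing, as described above; the sketch keeps only the routing step.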

3) Also w/ regard to pipelines/branches, is there a mechanism to merge
the results of a branch back in to the main pipeline?

That is, we might have a pipeline that downloads files, identifies files
by extension and routes them to the correct pipeline branch. Once all
data has passed through all branches, there would be a stage that
collected all the transformed output into a package for distribution to
our customer-facing system.

                        +--> PDF processing --------------+
                       /                                   \
Download --> Route ---+--> Convert .DAT --+                 \
 Files       Files     \     To XML       |                  \
                        \                 |                   \
                         \                |                    \
                          +---------------+--> Convert XML -----+--> Merge Results
                                               To Standard          into output
                                               XML                  package

Yes and no; the way that this could be implemented using the current design would be to have a merge stage that would be registered as a StageEventListener, and to use events to pass the objects from other branches back to the main branch. I haven't thought much about how to do a genuine merge of multiple branches, but it seems like it would be easy to write a Stage implementation that used the Feeder from a specific StageDriver on your main branch. Configuring this setup in code would be straightforward; I'm not sure how one would do it using the Digester configuration setup.
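
The event-based merge described above might be sketched roughly as follows. Everything here is a hypothetical stand-in written against the JDK only: ResultListenerSketch plays the role of the StageEventListener, BranchTailSketch is the last stage of a branch, and MergeStageSketch is the merge stage on the main branch. None of these names or signatures come from commons-pipeline.

```java
import java.util.ArrayList;
import java.util.EventObject;
import java.util.List;

// Hypothetical event carrying one finished object out of a branch.
class BranchResultEvent extends EventObject {
    final Object result;

    BranchResultEvent(Object source, Object result) {
        super(source);
        this.result = result;
    }
}

// Hypothetical listener interface, standing in for the StageEventListener role.
interface ResultListenerSketch {
    void onEvent(EventObject event);
}

// Last stage of a branch: raises an event instead of emitting downstream.
class BranchTailSketch {
    private final List<ResultListenerSketch> listeners = new ArrayList<>();

    void register(ResultListenerSketch listener) {
        listeners.add(listener);
    }

    void process(Object obj) {
        BranchResultEvent event = new BranchResultEvent(this, obj);
        for (ResultListenerSketch listener : listeners) {
            listener.onEvent(event);
        }
    }
}

// Merge stage on the main branch: collects results delivered by events.
class MergeStageSketch implements ResultListenerSketch {
    private final List<Object> merged = new ArrayList<>();

    // Synchronized because branches may run on different threads.
    @Override
    public synchronized void onEvent(EventObject event) {
        if (event instanceof BranchResultEvent) {
            merged.add(((BranchResultEvent) event).result);
        }
    }

    // Returns a copy of everything merged so far, ready to be packaged.
    synchronized List<Object> packageResults() {
        return new ArrayList<>(merged);
    }
}
```

The Feeder-based alternative would replace the listener with a direct hand-off into the main branch; the sketch does not attempt that, since it depends on how the StageDriver exposes its Feeder.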


Hope this helps!

Kris

Cool! Looks like you guys have been busy.
I think the single FAQ, and the page describing configuration, are what
I needed to push me in the right direction to start playing with things.
I'll let you know when I've got questions.

Thanks,
Steve
Here is the most current source distribution. Our group has a clandestine copy of the project website with updated documentation at http://gdsg.ngdc.noaa.gov/projects/commons-pipeline that will hopefully go away if the patches get committed. Due to a Maven bug, the JavaDoc link doesn't work properly but http://gdsg.ngdc.noaa.gov/projects/commons-pipeline/apidocs/index-all.html should have the updated javadocs.
As usual on these projects, the documentation is a little thin but if you have any questions about how to proceed, let me know! If you want to set up a pipeline with a Digester configuration file, a simple example is available in the test code in the file src/test/resources/test_conf.xml.
Kris

Steve Christensen wrote:
Hi Kris,

It's too bad that things are in limbo at the moment. I'd love to get a
look at the latest code.
Also, is there a mailing list or homepage/wiki for the project?
Specifically, I'm looking for a tutorial or set of examples that I could
use to put together a quick proof-of-concept for our architect. I'm
slowly going through the Javadoc and JUnit tests, but it's slow going.

Thanks,
Steve
What happens next will depend upon whether or not a committer is willing to take on and mentor the project. I have submitted a patch set to JIRA that can be used to bring the code base up to date with respect to recent development that's been done, but if you want to take a look at the code sooner than that I'd be happy to just email you a source distribution to get you started.
Thanks for your interest!

Kris

Steve Christensen wrote:
Hi Kris,

I'm interested in commons-pipeline. I work for a content aggregator -- we
do online distribution of medical journals/books/bibliographies.

I think commons-pipeline could be a good fit for the backend of our
workflow system. We get data in many different formats, translate some
to XML, transform the XML to a standard form, then transform the
standard form to a couple different web-platform-specific formats.

It doesn't seem like there's been much activity in the Sandbox since
last year. Has commons-pipeline moved to a new location? I see from the
mailing list that moving it to Incubator was discussed.

Thanks,
Steve

