Hi, Steve,
I'm CC'ing this email to the commons user list so that if anyone else
has similar questions, they can benefit as well.
Steve Christensen wrote:
Hi Kris,
Hope you are doing well. I've been looking at commons-pipeline over the
weekend, and it looks very close to what we'd been thinking of in our
high-level designs. Thank you very much for making it public.
I've got a couple quick questions:
1) Some of the stages in org...pipeline.stage have
ConsumedTypes / ProducedTypes annotations, but not all of them. Some of
the ones without annotations seem like they wouldn't need them (LogStage
and RaiseEventStage), but some seem like they're missing
(URLtoInputStreamStage and InputStreamLineBreakStage).
The stages which are missing annotations are simply ones that I haven't
gotten around to annotating yet. Also, the unit tests to ensure that the
validation components are working correctly have yet to be written, but
it is definitely on the near-term todo list to get all of the validation
pieces set up.
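To illustrate the kind of mismatch that annotation-based validation is meant to catch, here is a minimal, self-contained sketch. It assumes the annotations simply list classes. The @ConsumedTypes/@ProducedTypes definitions, the compatible() check, and the toy stages UrlToStreamStage, LineBreakStage and UnannotatedStage are all hypothetical stand-ins (loosely modeled on URLtoInputStreamStage and InputStreamLineBreakStage), not the actual commons-pipeline annotations or validator.

```java
import java.io.InputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.net.URL;

public class StageTypeCheckSketch {
    // Hypothetical stand-ins for ConsumedTypes/ProducedTypes; the real
    // commons-pipeline definitions may differ.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ConsumedTypes { Class<?>[] value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ProducedTypes { Class<?>[] value(); }

    // Toy stages: only their annotations matter for this check.
    @ConsumedTypes(URL.class)
    @ProducedTypes(InputStream.class)
    static class UrlToStreamStage {}

    @ConsumedTypes(InputStream.class)
    @ProducedTypes(String.class)
    static class LineBreakStage {}

    // A stage with no annotations, like LogStage or RaiseEventStage.
    static class UnannotatedStage {}

    /**
     * Returns true if every type the upstream stage declares it produces can be
     * assigned to at least one type the downstream stage declares it consumes.
     * Unannotated stages pass, since there is nothing to validate against.
     */
    static boolean compatible(Class<?> upstream, Class<?> downstream) {
        ProducedTypes produced = upstream.getAnnotation(ProducedTypes.class);
        ConsumedTypes consumed = downstream.getAnnotation(ConsumedTypes.class);
        if (produced == null || consumed == null) {
            return true;
        }
        for (Class<?> out : produced.value()) {
            boolean accepted = false;
            for (Class<?> in : consumed.value()) {
                if (in.isAssignableFrom(out)) {
                    accepted = true;
                    break;
                }
            }
            if (!accepted) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compatible(UrlToStreamStage.class, LineBreakStage.class));   // true
        System.out.println(compatible(LineBreakStage.class, UrlToStreamStage.class));   // false
        System.out.println(compatible(UrlToStreamStage.class, UnannotatedStage.class)); // true
    }
}
```

This also shows why a missing annotation matters: an unannotated stage silently passes, so a wiring error next to it would go unnoticed.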
2) It looks like the pipeline holds information about branches but it's
up to the Stage implementation to route things through the branch, is
that correct? That is, if we want a pipeline with branches, we should
have some sort of RouteStage that identifies the objects being fed to
it, and calls emit(branch-key, object) to feed the object to the correct
branch.
That is correct. Most Stage implementations just wrap business logic
from other classes, so in practice I frequently combine some initial
processing with the routing in a single stage, but a stage that acts
purely as a router would work fine as well. The one pure "router" I've
written used Commons-Chain to make routing decisions, and I found it
pretty simple to work with.
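A router stage along the lines Steve describes might look roughly like the sketch below, which picks a branch from a file name's extension. BranchEmitter, RouteStage and the branch keys ("pdf", "dat-to-xml", "xml-to-standard", "unrouted") are hypothetical; in commons-pipeline the stage would call the framework's own emit(branch-key, object) rather than the stand-in interface used here, and the keys would have to match the branch names configured on the pipeline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RouteStageSketch {
    /** Hypothetical stand-in for the stage's emit(branchKey, obj) call. */
    interface BranchEmitter {
        void emit(String branchKey, Object obj);
    }

    /** Picks a branch key from the file extension and hands the object to that branch. */
    static class RouteStage {
        private final BranchEmitter emitter;

        RouteStage(BranchEmitter emitter) {
            this.emitter = emitter;
        }

        void process(Object obj) {
            String name = obj.toString().toLowerCase(Locale.ROOT);
            String branch;
            if (name.endsWith(".pdf")) {
                branch = "pdf";
            } else if (name.endsWith(".dat")) {
                branch = "dat-to-xml";
            } else if (name.endsWith(".xml")) {
                branch = "xml-to-standard";
            } else {
                branch = "unrouted"; // hypothetical catch-all branch
            }
            emitter.emit(branch, obj);
        }
    }

    public static void main(String[] args) {
        // Records what each branch would have received, keyed by branch name.
        Map<String, List<Object>> branches = new HashMap<>();
        RouteStage router = new RouteStage(
                (key, obj) -> branches.computeIfAbsent(key, k -> new ArrayList<>()).add(obj));
        for (String file : new String[] {"a.pdf", "b.DAT", "c.xml", "d.txt"}) {
            router.process(file);
        }
        System.out.println(branches);
    }
}
```

Replacing the if/else chain with a Commons-Chain lookup, as mentioned above, would keep the routing rules out of the stage itself.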
3) Also w/ regard to pipelines/branches, is there a mechanism to merge
the results of a branch back into the main pipeline?
That is, we might have a pipeline that downloads files, identifies files
by extension and routes them to the correct pipeline branch. Once all
data has passed through all branches, there would be a stage that
collects all the transformed output into a package for distribution to
our customer-facing system.
                        +--> PDF processing ------------------+
                       /                                       \
Download --> Route ---+--> Convert .DAT ---+                    \
 Files       Files     \   To XML                                \
                        \                  |                      \
                         \                 |                       \
                          +----------------+--> Convert XML --------+--> Merge Results
                                                To Standard              into output
                                                XML                      package
Yes and no; the way that this could be implemented using the current
design would be to have a merge stage that would be registered as a
StageEventListener, and to use events to pass the objects from other
branches back to the main branch. I haven't thought much about how to do
a genuine merge of multiple branches, but it seems like it would be easy
to write a Stage implementation that used the Feeder from a specific
StageDriver on your main branch. Configuring this setup in code would be
straightforward; I'm not sure how one would do it using the Digester
configuration setup.
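Here is a minimal sketch of the Feeder-based merge described above, in plain Java with no pipeline dependency. Feeder, MergeStage, BranchExitStage and the branchFinished()/isComplete()/buildPackage() methods are hypothetical stand-ins, not the commons-pipeline API. The idea is only that each branch's last stage holds a reference to the main branch's input and pushes its results there, and the merge stage packages everything once every branch has reported that it is done.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BranchMergeSketch {
    /** Hypothetical stand-in for a StageDriver's Feeder: pushes an object into a stage's input. */
    interface Feeder {
        void feed(Object obj);
    }

    /**
     * Hypothetical merge stage on the main branch. Every branch feeds its output here;
     * once each branch has reported that it is finished, the results can be packaged.
     */
    static class MergeStage implements Feeder {
        private final List<Object> collected = new ArrayList<>();
        private int branchesRemaining;

        MergeStage(int branchCount) {
            this.branchesRemaining = branchCount;
        }

        @Override
        public synchronized void feed(Object obj) {
            collected.add(obj);
        }

        synchronized void branchFinished() {
            branchesRemaining--;
        }

        synchronized boolean isComplete() {
            return branchesRemaining <= 0;
        }

        /** Bundles everything received into a single read-only "output package". */
        synchronized List<Object> buildPackage() {
            if (!isComplete()) {
                throw new IllegalStateException("branches still running");
            }
            return Collections.unmodifiableList(new ArrayList<>(collected));
        }
    }

    /** Hypothetical last stage of a branch: forwards its output to the main branch's feeder. */
    static class BranchExitStage {
        private final Feeder mainBranch;

        BranchExitStage(Feeder mainBranch) {
            this.mainBranch = mainBranch;
        }

        void process(Object obj) {
            mainBranch.feed(obj);
        }
    }

    public static void main(String[] args) {
        MergeStage merge = new MergeStage(2);
        BranchExitStage pdfBranch = new BranchExitStage(merge);
        BranchExitStage xmlBranch = new BranchExitStage(merge);

        pdfBranch.process("report.pdf (processed)");
        xmlBranch.process("article.xml (standardized)");
        merge.branchFinished();
        merge.branchFinished();

        // prints [report.pdf (processed), article.xml (standardized)]
        System.out.println(merge.buildPackage());
    }
}
```

The explicit branch count is just one simple way to decide when all data has passed through all branches; with the event-based approach, the same counting could be driven by StageEventListener notifications instead.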
Hope this helps!
Kris
Cool! Looks like you guys have been busy.
I think the single FAQ, and the page describing configuration, are what
I needed to push me in the right direction to start playing with things.
I'll let you know when I've got questions.
Thanks,
Steve
Here is the most current source distribution. Our group has a
clandestine copy of the project website with updated documentation at
http://gdsg.ngdc.noaa.gov/projects/commons-pipeline that will hopefully
go away if the patches get committed. Due to a Maven bug, the JavaDoc
link doesn't work properly, but
http://gdsg.ngdc.noaa.gov/projects/commons-pipeline/apidocs/index-all.html
should have the updated javadocs.
As usual on these projects, the documentation is a little thin, but if
you have any questions about how to proceed, let me know! If you want to
set up a pipeline with a Digester configuration file, a simple example
is available in the test code in the file src/test/resources/test_conf.xml.
Kris
Steve Christensen wrote:
Hi Kris,
It's too bad that things are in limbo at the moment. I'd love to get a
look at the latest code.
Also, is there a mailing list or homepage/wiki for the project?
Specifically, I'm looking for a tutorial or set of examples that I could
use to put together a quick proof-of-concept for our architect. I'm
working my way through the Javadoc and JUnit tests, but it's slow going.
Thanks,
Steve
What happens next will depend upon whether or not a committer is willing
to take on and mentor the project. I have submitted a patch set to JIRA
that can be used to bring the code base up to date with respect to
recent development that's been done, but if you want to take a look at
the code sooner than that I'd be happy to just email you a source
distribution to get you started.
Thanks for your interest!
Kris
Steve Christensen wrote:
Hi Kris,
I'm interested in commons-pipeline. I work for a content aggregator; we
do online distribution of medical journals/books/bibliographies.
I think commons-pipeline could be a good fit for the backend of our
workflow system. We get data in many different formats, translate some
to XML, transform the XML to a standard form, then transform the
standard form to a couple of different web-platform-specific formats.
It doesn't seem like there's been much activity in the Sandbox since
last year. Has commons-pipeline moved to a new location? I see from the
mailing list that moving it to Incubator was discussed.
Thanks,
Steve