Grzegorz Kossakowski wrote:
Jakob Spörk wrote:
Hello,
Hello Jakob,
I just want to share my thoughts on the unified pipeline and data conversion
topic. In my opinion, the pipeline can't do the data conversion, because it
has no information about how to do it. Let's take a simple example: we
have a pipeline processing XML documents that describe images. The first
components process this XML data, while the rest of the components
operate on the actual image. Now the question is: who will transform the
XML data into image data in the middle of the pipeline?
I agree with you that the pipeline implementation should not handle data
conversion, because there is no generic way to handle it.
Now I would like to answer your question: it should be another /pipeline
component/ that handles data conversion.
I believe the pipeline cannot do this because it simply does not know how to
transform the data; that's a custom operation. You would need a component
that is on the one hand an XML consumer and on the other hand an image
producer. Providing some automatic data conversions directly in the pipeline
may help developers who need exactly these default cases, but I believe it
would make things harder for people requiring custom data conversions (and
that is most of the cases).
The current architecture allows any component to be fitted into the pipeline,
and only the components themselves have to know whether they can work with
their predecessor or with the component following them. That allows the most
flexibility when thinking about any possible conversions. If the pipeline were
to do this, you would need "plug-ins" that are registered with the pipeline
and allow it to do the conversions. But then it is the responsibility of the
developer to register the right conversion plug-ins, and you would get
new problems if a pipeline requires two different converters between the same
pair of data types, because the pipeline cannot automatically know
which converter to use in which situation.
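To illustrate the ambiguity described above, here is a minimal sketch of such a plug-in registry. All names (ConverterRegistrySketch, register, convert) are hypothetical and not part of any Cocoon API. It assumes converters are looked up by source and target type only, which is exactly why a second converter for the same pair cannot be told apart from the first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical converter registry, keyed only by (source type, target type). */
final class ConverterRegistrySketch {
    private final Map<String, Function<?, ?>> converters = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    /** Registers a converter; a second one for the same type pair is rejected as ambiguous. */
    <A, B> void register(Class<A> from, Class<B> to, Function<A, B> converter) {
        if (converters.putIfAbsent(key(from, to), converter) != null) {
            throw new IllegalStateException(
                "Ambiguous: a converter for " + key(from, to) + " is already registered");
        }
    }

    /** Looks up the single converter registered for the type pair and applies it. */
    @SuppressWarnings("unchecked")
    <A, B> B convert(Class<A> from, Class<B> to, A value) {
        Function<A, B> converter = (Function<A, B>) converters.get(key(from, to));
        if (converter == null) {
            throw new IllegalStateException("No converter registered for " + key(from, to));
        }
        return converter.apply(value);
    }

    public static void main(String[] args) {
        ConverterRegistrySketch registry = new ConverterRegistrySketch();
        registry.register(String.class, Integer.class, Integer::parseInt);
        System.out.println(registry.convert(String.class, Integer.class, "42"));
        try {
            // A second String -> Integer converter: the registry cannot know when to use which.
            registry.register(String.class, Integer.class, String::length);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting the duplicate is only one possible policy; silently replacing the first converter would hide the same problem rather than solve it.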
I believe that these problems could be addressed by... the compiler. In my
opinion, pipelines should be type-safe, which basically means that for a given
pipeline fragment you know what it expects as input and what kind of output it
gives you. The same goes for components. This eliminates the "flexibility" of
having a component that accepts more than one kind of input or produces more
than one kind of output. I believe that having more than one kind of output or
input only adds complexity and does not solve any problem.
If a component accepted more than one kind of input, how could a user
know the list of accepted inputs? I guess the only way to find out would be
to check the source and look for all the "instanceof" statements in its code.
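As a hedged illustration of that concern, here is a made-up component (UntypedComponentSketch is a hypothetical name, not Cocoon code) whose signature accepts any Object. The set of inputs it really supports is visible only in its body.

```java
/** Hypothetical untyped component: the signature says nothing about accepted inputs. */
final class UntypedComponentSketch {

    /** Returns the length of the input; only String and byte[] are actually supported. */
    static Object process(Object input) {
        if (input instanceof String) {
            return ((String) input).length();
        }
        if (input instanceof byte[]) {
            return ((byte[]) input).length;
        }
        // Every other type compiles fine at the call site and fails only at run time.
        String typeName = (input == null) ? "null" : input.getClass().getName();
        throw new IllegalArgumentException("Unsupported input type: " + typeName);
    }

    public static void main(String[] args) {
        System.out.println(process("abc"));
        System.out.println(process(new byte[2]));
    }
}
```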
The same way as in Cocoon 2.2, I guess.
Users have to know that a FileReader must not be followed by any
component, that the Serializer must be the last component of the
pipeline and the Generator the first component.
Currently users don't need to actually read the source code to find that
out and I don't see why this would need to change.
Of course the user of a pipeline needs to know which components he uses
and he needs to know which combinations of components actually make sense.
But I also do expect him to know what the components he selected do and
whether they are compatible or not.
It's not like we're building SAX components that cannot be combined with
each other or that some StAX components won't work with some other StAX
component.
That image data represented as a bunch of bytes cannot be passed to a
SAX transformer is something I expect someone using Cocoon to know.
Just as I expect a certain knowledge of relational databases from someone
using an O/R mapper.
I would prefer a situation where components have a well-defined type of input
and output, and if you want to combine components whose input-output pairs do
not match, you add converters as intermediate components.
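A minimal sketch of this idea, under the assumption that each component has exactly one input type and one output type. The names (Component, then, XmlDoc, ImageData, TypedPipelineSketch) are hypothetical and are not taken from Cocoon or from the repository linked below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical typed component: consumes I, produces O. */
interface Component<I, O> {
    O process(I input);

    /** Chains this component with the next one; compiles only if the types line up. */
    default <R> Component<I, R> then(Component<? super O, ? extends R> next) {
        return input -> next.process(process(input));
    }
}

/** Placeholder data types, for illustration only. */
final class XmlDoc {
    final String text;
    XmlDoc(String text) { this.text = text; }
}

final class ImageData {
    final int width;
    final int height;
    ImageData(int width, int height) { this.width = width; this.height = height; }
}

final class TypedPipelineSketch {
    private static final Pattern SIZE =
        Pattern.compile("width=\"(\\d+)\"\\s+height=\"(\\d+)\"");

    /** XML -> XML step: strips surrounding whitespace. */
    static final Component<XmlDoc, XmlDoc> TRIM = doc -> new XmlDoc(doc.text.trim());

    /** The converter: an XML consumer and an image producer (naive attribute parsing). */
    static final Component<XmlDoc, ImageData> XML_TO_IMAGE = doc -> {
        Matcher m = SIZE.matcher(doc.text);
        if (!m.find()) {
            throw new IllegalArgumentException("No width/height attributes found");
        }
        return new ImageData(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
    };

    /** Image -> image step: doubles both dimensions. */
    static final Component<ImageData, ImageData> DOUBLE_SIZE =
        img -> new ImageData(img.width * 2, img.height * 2);

    /** TRIM.then(DOUBLE_SIZE) would not compile; the converter has to sit in between. */
    static final Component<XmlDoc, ImageData> PIPELINE =
        TRIM.then(XML_TO_IMAGE).then(DOUBLE_SIZE);

    public static void main(String[] args) {
        ImageData img = PIPELINE.process(new XmlDoc("  <image width=\"4\" height=\"3\"/>  "));
        System.out.println(img.width + "x" + img.height); // prints 8x6
    }
}
```

With this shape the compiler, not the user, rejects a mismatched combination, and the choice of converter stays an explicit decision of whoever assembles the pipeline.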
I've been thinking about generic yet type-safe pipelines for some time. I
designed them on paper and everything looked quite promising. Then I moved on
to implementing my ideas and got a rather disappointing result, which
can be seen here:
http://github.com/gkossakowski/cocoonpipelines/tree/master
The most interesting files are:
http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/Pipeline.java
(generic and type-safe pipeline interface)
http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/PipelineComponent.java
(generic and type-safe component def.)
http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/demo/RunPipeline.java
(shows how to use that thing)
The URLs above only return "Nothing to see here yet. Move along."...
Am I doing something wrong?
The only thing Cocoon can help with here is to provide as many "standard"
converters as possible, but it is still the responsibility of the
developer to use the right ones.
I think Cocoon could define a much better, type-safe Pipeline API, but we are
in the unfortunate situation of using a language that makes it extremely hard
to express this kind of generic solution. Of course, I would like to be proven
wrong and shown that Java is powerful enough to let us express our ideas and
solve our problems.
Actually I'm not sure which problems those are, as I'm sure we all have
slightly different views on all this.
Some of the suggestions are actually hard for me to comprehend since I
do not know which problem(s) they are trying to address.
I agree that we should try to avoid sources for mistakes as much as we can.
But trying to build a fail-proof API usually causes more harm than good IMO.
Actually, the whole idea of a pipeline is not rocket science, as it is, in
essence, just ordinary function composition. The only unique property of
pipelines I can see is that we want access to _partial_ results of pipeline
execution so we can make it streamable.
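One possible reading of this, as a sketch only: batch composition with java.util.function.Function next to a push-style variant in which every stage forwards each partial result downstream as soon as it is produced. StreamingCompositionSketch, stage and build are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class StreamingCompositionSketch {

    /** Batch style: plain function composition, the result exists only at the end. */
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, Integer> BATCH = TRIM.andThen(String::length);

    /** Streaming style: transforms each event and pushes it downstream immediately. */
    static <A, B> Consumer<A> stage(Function<A, B> f, Consumer<B> downstream) {
        return event -> downstream.accept(f.apply(event));
    }

    /** Builds the same trim-then-length pipeline, writing partial results into out. */
    static Consumer<String> build(List<Integer> out) {
        Consumer<Integer> sink = out::add;
        Consumer<String> lengths = stage(String::length, sink);
        return stage(String::trim, lengths);
    }

    public static void main(String[] args) {
        System.out.println(BATCH.apply("  ab  "));

        List<Integer> partial = new ArrayList<>();
        Consumer<String> pipeline = build(partial);
        pipeline.accept("  ab  ");
        // The first result is already available although more input is still to come.
        System.out.println(partial);
        pipeline.accept("xyz");
        System.out.println(partial);
    }
}
```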
What "_partial_ results" would you like to get from the pipeline?
And what for?
This became more of a brain dump than a real answer to your e-mail, Jakob, but
I hope you (and others) have got my point.