Steven Dolg wrote:
>> It depends how you define being successful. I've managed to express
>> this rather simple idea but the code is horrible
>> thus I consider it as a failure.
>>   
> Well cleaning up working code cannot be that hard - you haven't invested
> years of time, have you ;-)

The problem is that I currently see no way to clean it up. I have the feeling that I've hit the limits of
Java as a language, and I can't shake that feeling. As I said in my original e-mail, I would like to be
proven wrong, so any patches are welcome. :-)

You can treat it as a nice exercise in using Java generics. If you are curious, these are the limitations I
see here:
1. I would like to declare PipelineComponent.execute as a method with a signature like
Event|Nothing|Continue -> Event|Nothing|Continue. By Event|Nothing|Continue I mean a case type: you can pass
either the Nothing object, the Continue object, or an object that extends Event. There are no case types in
Java, so I had to introduce the interfaces Continue and Event. Even this does not solve the problem, because
the Nothing and Continue implementations will not extend the specific event type that a component accepts.
Conceptually this is actually fine, as Nothing and Continue are completely different cases from ordinary
events and should be handled differently. The problem is how to express this idea concisely in Java.

2. Have a look at PipelineImpl. There are two subclasses; ugly, right? But try to get rid of them. You would
need something like:

private Pipeline<T, W> pipeline;
private PipelineComponent<W, U> component;

This way we express that the component accepts what the pipeline produces, but W is not declared anywhere.
What we really need is a tuple, so that you could say something like:

private <W extends Event> (Pipeline<T,W>, PipelineComponent<W, U>) pipelineAndComponent;

Or something like that. Again, I have no idea how to express this _concisely_ in Java.
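For what it's worth, Java generics can approximate the "there exists some W" tuple with a wildcard plus a
generic constructor. This is only an illustrative sketch: Pipeline, PipelineComponent, Link and
ChainedPipeline are simplified, hypothetical stand-ins (not the classes in the actual code), and it still
needs a helper class, so it is not obviously more concise than the two subclasses.

```java
public class ExistentialPipelineSketch {
    // Hypothetical, simplified stand-ins for the real interfaces.
    interface Pipeline<T, U> { U execute(T input); }
    interface PipelineComponent<W, U> { U execute(W input); }

    // The only place where the intermediate type W is named.
    static final class Link<T, W, U> {
        private final Pipeline<T, W> pipeline;
        private final PipelineComponent<W, U> component;

        Link(Pipeline<T, W> pipeline, PipelineComponent<W, U> component) {
            this.pipeline = pipeline;
            this.component = component;
        }

        U execute(T input) {
            // Inside Link, W is an ordinary type parameter, so this type-checks.
            return component.execute(pipeline.execute(input));
        }
    }

    // The outer type mentions only T and U. The wildcard in Link<T, ?, U> plays the role
    // of "there exists some W"; the generic constructor is where W gets fixed.
    static final class ChainedPipeline<T, U> implements Pipeline<T, U> {
        private final Link<T, ?, U> link;

        <W> ChainedPipeline(Pipeline<T, W> pipeline, PipelineComponent<W, U> component) {
            this.link = new Link<>(pipeline, component);
        }

        @Override
        public U execute(T input) {
            return link.execute(input);
        }

        // Adding a component returns a new, longer pipeline.
        <V> ChainedPipeline<T, V> add(PipelineComponent<U, V> next) {
            return new ChainedPipeline<>(this, next);
        }
    }
}
```

W is bound once, in the generic constructor; afterwards only Link<T, ?, U> is stored, so ChainedPipeline
itself never has to name it.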

> That adding a component actually returns a different pipeline is an
> interesting approach but I'm not sure I want to declare a new variable
> for each of the new pipelines.

Yep, that's a valid concern. This kind of construct is functional in style, and at the same time it forces
you to use it differently. I don't want to go into details, but the main idea is that pipeline construction
is handled by various functions and you just pass around a partial pipeline without introducing any
additional variables. This is similar to method chaining (or method combining) in Java, with the difference
that in functional languages function composition is perceived as a basic programming technique. As we are
probably going to stay with Java, I would like to see whether this inspires someone to come up with an
idiomatic Java counterpart.
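A rough Java approximation of "passing a partial pipeline through functions" could look like this, assuming
a partial pipeline is modelled as a plain java.util.function.Function. The helpers trimmed, upperCased and
measured are hypothetical; the point is only that the whole pipeline is built in one nested expression, with
no variable per stage.

```java
import java.util.function.Function;

public class PipelineCombinators {
    // Each hypothetical helper takes a partial pipeline and returns a longer one.
    static <T> Function<T, String> trimmed(Function<T, String> partial) {
        return partial.andThen(String::trim);
    }

    static <T> Function<T, String> upperCased(Function<T, String> partial) {
        return partial.andThen(String::toUpperCase);
    }

    static <T> Function<T, Integer> measured(Function<T, String> partial) {
        return partial.andThen(String::length);
    }

    // Construction is a single expression: start from the empty (identity) pipeline
    // and let each function extend it; no intermediate variables are declared.
    static Function<String, Integer> build() {
        return measured(upperCased(trimmed(Function.<String>identity())));
    }
}
```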

> And method chaining is not really my idea of readable code.

That depends on the point of view, but I sort of agree that in most cases it's not readable.

> Also I'm wondering what return type a SAXSerializer would have or what
> event types SAX uses.

We would have to define our own types that represent SAX events as simple classes instead of method calls,
as is done in the standard API. I know it's not ideal to define our own APIs, but the original idea of
passing events through method calls (even if it was influenced by performance considerations) wasn't that
good. Anyway, we have already had this kind of discussion when the StAX research was discussed.
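As a minimal sketch of "SAX events as classes", assuming hypothetical event classes StartElement,
EndElement and Characters (not an existing Cocoon API), the standard org.xml.sax.helpers.DefaultHandler
callbacks can be adapted into a list of event objects:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventObjects {
    // Hypothetical event classes: SAX callbacks reified as plain objects.
    interface Event {}

    static final class StartElement implements Event {
        final String name;
        StartElement(String name) { this.name = name; }
        @Override public String toString() { return "start:" + name; }
    }

    static final class EndElement implements Event {
        final String name;
        EndElement(String name) { this.name = name; }
        @Override public String toString() { return "end:" + name; }
    }

    static final class Characters implements Event {
        final String text;
        Characters(String text) { this.text = text; }
        @Override public String toString() { return "text:" + text; }
    }

    // Adapter from the standard callback style to event objects.
    static final class EventRecorder extends DefaultHandler {
        final List<Event> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add(new StartElement(qName));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add(new EndElement(qName));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one text node over several calls.
            events.add(new Characters(new String(ch, start, length)));
        }
    }

    static List<Event> parse(String xml) {
        EventRecorder recorder = new EventRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), recorder);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return recorder.events;
    }
}
```

This only reifies elements and text; attributes, namespaces and the remaining SAX callbacks are left out.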

> Are those event types just for the compiler or are they actually used to
> pass the data around?

They would be used for passing data around. You can see examples in the implementations of the reworked
interfaces. For example, if a serializer produces an output stream, it just emits *one* event called
OutputStreamEvent. Or, if we want partial results from this kind of serializer, it could emit many events,
each containing just a fragment of the final output. Speaking of partial results, I remember you already
asked about them in some e-mail.
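To make the two serializer options concrete, here is a sketch assuming a hypothetical OutputStreamEvent that
simply carries bytes: the first variant emits exactly one event with the whole output, the second emits one
event per fragment as a partial result.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SerializerOutputSketch {
    // Hypothetical event carrying serialized bytes (the whole output or one fragment).
    static final class OutputStreamEvent {
        final byte[] bytes;
        OutputStreamEvent(byte[] bytes) { this.bytes = bytes; }
    }

    // Non-streaming variant: buffer everything and emit exactly one event.
    static List<OutputStreamEvent> serializeAtOnce(List<String> fragments) {
        String whole = String.join("", fragments);
        return Collections.singletonList(
            new OutputStreamEvent(whole.getBytes(StandardCharsets.UTF_8)));
    }

    // Streaming variant: one partial-result event per fragment, so each one
    // could be written to the client as soon as it is produced.
    static List<OutputStreamEvent> serializeIncrementally(List<String> fragments) {
        List<OutputStreamEvent> events = new ArrayList<>();
        for (String fragment : fragments) {
            events.add(new OutputStreamEvent(fragment.getBytes(StandardCharsets.UTF_8)));
        }
        return events;
    }

    // A consumer concatenates the payloads; both variants yield the same bytes.
    static String concat(List<OutputStreamEvent> events) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (OutputStreamEvent e : events) {
            out.write(e.bytes, 0, e.bytes.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```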

I would like to explain one nice "side effect" of my design. I'll show how one functional concept, lazy
evaluation of expressions, can easily be implemented in pipelines. To explain it, I'll first introduce my
view on pipelines and pipeline components.

A pipeline component is just a function f: Event|Nothing|Continue -> Event|Nothing|Continue. The Nothing and
Continue events will be explained later. If we have f_1, f_2, ..., f_n, a pipeline is just their function
composition (f_1 is applied first):
(f_n * ... * f_2 * f_1)(x) = f_n(...(f_2(f_1(x))))
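Java has no case types, but one common encoding of Event|Nothing|Continue is a small closed class hierarchy
with a match method that takes one handler per case. This is a hypothetical sketch, not the interfaces from
the original code: Signal, Emit, Nothing, Continue, TextEvent and UPPER_CASE are illustrative names. Because
Nothing and Continue are generic in the event type, they fit any component's signature without extending
that component's specific event type.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class CaseTypeSketch {
    // Hypothetical marker interface for payload events.
    interface Event {}

    // Hypothetical concrete event carrying character data.
    static final class TextEvent implements Event {
        final String text;
        TextEvent(String text) { this.text = text; }
    }

    // Encodes Event|Nothing|Continue; callers supply one handler per case to match().
    abstract static class Signal<E extends Event> {
        abstract <R> R match(Function<? super E, ? extends R> onEvent,
                             Supplier<? extends R> onNothing,
                             Supplier<? extends R> onContinue);
    }

    static final class Emit<E extends Event> extends Signal<E> {
        final E event;
        Emit(E event) { this.event = event; }
        @Override
        <R> R match(Function<? super E, ? extends R> onEvent,
                    Supplier<? extends R> onNothing, Supplier<? extends R> onContinue) {
            return onEvent.apply(event);
        }
    }

    // Generic in E, so Nothing fits any component without extending its event type.
    static final class Nothing<E extends Event> extends Signal<E> {
        @Override
        <R> R match(Function<? super E, ? extends R> onEvent,
                    Supplier<? extends R> onNothing, Supplier<? extends R> onContinue) {
            return onNothing.get();
        }
    }

    static final class Continue<E extends Event> extends Signal<E> {
        @Override
        <R> R match(Function<? super E, ? extends R> onEvent,
                    Supplier<? extends R> onNothing, Supplier<? extends R> onContinue) {
            return onContinue.get();
        }
    }

    // Hypothetical component type: Signal<I> -> Signal<O>.
    interface PipelineComponent<I extends Event, O extends Event> {
        Signal<O> execute(Signal<I> input);
    }

    // Example component: upper-cases text events, passes Nothing and Continue through.
    static final PipelineComponent<TextEvent, TextEvent> UPPER_CASE =
        input -> input.<Signal<TextEvent>>match(
            e -> new Emit<TextEvent>(new TextEvent(e.text.toUpperCase())),
            () -> new Nothing<TextEvent>(),
            () -> new Continue<TextEvent>());
}
```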

Now, what makes a pipeline different from ordinary function composition is, in my opinion, that each
function can emit a partial result based on partial input. A partial result (or input) is just a sequence of
events, none of which is a Continue event. The full result of a function's execution is a sequence of events
ending with the Nothing event (a special marker object). The ability to return partial results makes
functions (pipeline components) streamable. This allows, for example, sending fragments of an HTML page to
the browser as soon as they are computed, without waiting for all events to be processed.
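A minimal push-based sketch of this streaming property, assuming a hypothetical Signal class with a NOTHING
marker and a component that returns the output signals it can emit for each input. The log shows every
partial result appearing before the next input event is read:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StreamingSketch {
    // Hypothetical signal: a text event, or the Nothing end marker (compared by identity).
    static final class Signal {
        static final Signal NOTHING = new Signal("nothing");
        final String text;
        Signal(String text) { this.text = text; }
    }

    // A component maps one input signal to the output signals it can emit right now.
    interface Component { List<Signal> execute(Signal input); }

    // Stateless formatter: every input event immediately yields a partial result.
    static final Component UPPER_CASE = input -> Collections.singletonList(
        input == Signal.NOTHING ? Signal.NOTHING : new Signal(input.text.toUpperCase()));

    // Pushes source events through the component one at a time and logs the order
    // of inputs and outputs; each output is logged before the next input is read.
    static List<String> run(List<String> source, Component component) {
        List<Signal> inputs = new ArrayList<>();
        for (String text : source) inputs.add(new Signal(text));
        inputs.add(Signal.NOTHING); // the full input ends with Nothing
        List<String> log = new ArrayList<>();
        for (Signal in : inputs) {
            log.add("in:" + in.text);
            for (Signal out : component.execute(in)) log.add("out:" + out.text);
        }
        return log;
    }
}
```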

If you are wondering how a generator is defined: it's just a function g: Nothing -> Event|Nothing. This
definition reflects the fact that, from the pipeline's point of view, a generator is a special function that
_generates_ events out of nothing. It does not base its output on any incoming events but on some external
data source that is unknown to the pipeline and outside its focus. When a generator has emitted all its
events, it signals this with Nothing, so its result is a sequence of events ending with Nothing.
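Under similar assumptions (a hypothetical Signal with a NOTHING marker), a generator can be sketched as a
java.util.function.Supplier: the implicit Nothing input becomes a parameterless call, and the external data
source is just an Iterator the pipeline knows nothing about.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class GeneratorSketch {
    // Hypothetical signal: a record event, or the Nothing end marker.
    static final class Signal {
        static final Signal NOTHING = new Signal(null);
        final String text;
        Signal(String text) { this.text = text; }
    }

    // g: Nothing -> Event|Nothing. Each call pulls the next item from the external
    // source and answers Nothing once the source is exhausted.
    static Supplier<Signal> generator(Iterator<String> externalSource) {
        return () -> externalSource.hasNext()
            ? new Signal(externalSource.next())
            : Signal.NOTHING;
    }

    // Collects the generator's full result: its events up to (not including) Nothing.
    static List<String> drain(Supplier<Signal> g) {
        List<String> result = new ArrayList<>();
        for (Signal s = g.get(); s != Signal.NOTHING; s = g.get()) {
            result.add(s.text);
        }
        return result;
    }
}
```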

Now let's discuss Continue. This is a helper object that functions can emit to express that they need more
input events before they can produce any portion of their result. Think of a transformer that replaces some
fragment of XML with another fragment based on what the original fragment contained. It has to collect all
events representing the original XML fragment before it can produce new events. You can recognize that
"collecting" involves some buffering, but I won't go into details, as I want to focus on other aspects
rather than on implementation details.
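The buffering transformer described above might look like this, as a sketch: the marker events "<x>" and
"</x>" and the joining rule are hypothetical stand-ins for a real XML fragment and its replacement. While the
fragment is incomplete the transformer answers Continue; it emits the replacement once the closing marker
arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ContinueSketch {
    enum Kind { EVENT, NOTHING, CONTINUE }

    // Hypothetical signal covering the case type Event|Nothing|Continue.
    static final class Signal {
        static final Signal NOTHING = new Signal(Kind.NOTHING, null);
        static final Signal CONTINUE = new Signal(Kind.CONTINUE, null);
        final Kind kind;
        final String text;
        private Signal(Kind kind, String text) { this.kind = kind; this.text = text; }
        static Signal event(String text) { return new Signal(Kind.EVENT, text); }
        @Override public String toString() {
            return kind == Kind.EVENT ? text : kind.name().toLowerCase(Locale.ROOT);
        }
    }

    // Hypothetical transformer: replaces everything between "<x>" and "</x>"
    // with one event built from the buffered content.
    static final class FragmentReplacer {
        private final List<String> buffer = new ArrayList<>();
        private boolean collecting = false;

        Signal execute(Signal in) {
            // Nothing and Continue pass through; an unclosed fragment is simply dropped here.
            if (in.kind != Kind.EVENT) return in;
            if (in.text.equals("<x>")) { collecting = true; return Signal.CONTINUE; }
            if (in.text.equals("</x>")) {
                collecting = false;
                Signal replacement = Signal.event("[" + String.join("+", buffer) + "]");
                buffer.clear();
                return replacement;
            }
            if (collecting) { buffer.add(in.text); return Signal.CONTINUE; } // need more input
            return in;
        }
    }
}
```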

With these rather precise definitions in front of us, we can move on to the laziness of pipeline execution.
The definition of f (a pipeline component) does not say precisely when f may emit the Nothing event.
Strictly speaking, this wasn't part of the definition, but f must satisfy the property that it emits Nothing
after receiving a finite number of Nothing events (finding out why is left as an exercise for the reader).
This means that f can emit Nothing in response to any kind of event. Let's consider an example:
Pipeline P1: f_1 -> f_2 -> f_3
Pipeline P: P1 -> f_4 -> f_5

Now let's assume that f_1 is a generator producing a large stream of events from a big XML file or from a
set of records. Let's also assume that f_2 is a simple function doing some simple transformation, like text
formatting. Finally, f_3 is a query function with a query defined like NumberOfRecord() <= 20. This means
that in pipeline P1 we want to extract only the first 20 records of the big file. After consuming 20
records, f_3 simply returns the Nothing event to say that its result has ended.

For pipeline execution this means that once Nothing is received from f_3, the whole P1 pipeline can be
discarded and execution should continue with f_4 and f_5. As a result, the rest of that big XML file won't
be read.
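The P1 example can be sketched with pull-based evaluation, assuming hypothetical Generator, FORMAT and
firstRecords stand-ins for f_1, f_2 and f_3. The driver stops asking the generator for records as soon as
f_3 answers Nothing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class LazyPipelineSketch {
    // Hypothetical signal: a record event, or the Nothing end marker.
    static final class Signal {
        static final Signal NOTHING = new Signal(null);
        final String text;
        Signal(String text) { this.text = text; }
    }

    // f_1: generator over a large source; records are produced only when pulled.
    static final class Generator implements Supplier<Signal> {
        private final int total;
        int read = 0; // how many records were actually read from the source

        Generator(int total) { this.total = total; }

        @Override
        public Signal get() {
            if (read == total) return Signal.NOTHING;
            read++;
            return new Signal("record " + read);
        }
    }

    // f_2: simple formatting applied event by event.
    static final UnaryOperator<Signal> FORMAT =
        s -> s == Signal.NOTHING ? s : new Signal(s.text.toUpperCase());

    // f_3: the query NumberOfRecord() <= limit. Once the limit is passed it
    // answers Nothing, whatever it receives.
    static UnaryOperator<Signal> firstRecords(int limit) {
        int[] seen = {0};
        return s -> (s == Signal.NOTHING || seen[0]++ >= limit) ? Signal.NOTHING : s;
    }

    // Runs P1 = f_1 -> f_2 -> f_3, pulling one record at a time. As soon as f_3
    // answers Nothing, P1 is discarded and the generator is never asked again;
    // execution would then go on with f_4 and f_5 (omitted here).
    static List<String> runP1(Generator f1, int limit) {
        UnaryOperator<Signal> f3 = firstRecords(limit);
        List<String> result = new ArrayList<>();
        while (true) {
            Signal s = f3.apply(FORMAT.apply(f1.get()));
            if (s == Signal.NOTHING) return result;
            result.add(s.text);
        }
    }
}
```

With a source of one million records and a limit of 20, the generator is asked for only 21 records: the 21st
is the one to which f_3 answers Nothing.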

I won't give you a formal definition of laziness, but I'm sure you get my point. With this kind of pipeline
design we get laziness almost for free, which is a nice bonus after all, isn't it?

                                                             ---- o0o ----


OK, this e-mail got rather lengthy, but it gave me a chance to explain how I see Cocoon pipelines on paper
and to introduce the concept of lazy evaluation of pipelines. For you it was an opportunity to see what has
influenced my current view on pipeline design.

Thank you for your attention.

-- 
Best regards,
Grzegorz Kossakowski
