Well, it kind of depends on which definition of union we are using. If this
is a union in the set-theoretic sense, we can argue that the union of a stream
with itself should be the same stream, because it contains exactly the same
elements with the same timestamps and lineage.

On the other hand, stream and stream.map(id) are not exactly the same, as
their elements might arrive in a different order (the lineage differs).

So I wouldn't say that any particular self-union semantics is the only possible one.
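
To make the two readings concrete, here is a toy sketch in plain Java (no Flink dependency) that models a stream as a finite list of (value, timestamp) elements. The names `UnionSemantics`, `Element`, `bagUnion`, `setUnion` and `map` are hypothetical and not Flink API. The model deliberately ignores ordering and lineage, which is exactly where stream and stream.map(id) can differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.UnaryOperator;

public class UnionSemantics {

    // Hypothetical model of a stream element: a value plus its timestamp.
    // Not Flink's StreamRecord; just enough to compare the two semantics.
    record Element(String value, long timestamp) {}

    // Bag (multiset) union: every input element is forwarded, so
    // unioning a stream with itself emits each element twice.
    static List<Element> bagUnion(List<Element> a, List<Element> b) {
        List<Element> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Set union: equal elements (same value and timestamp) collapse,
    // so unioning a stream with itself yields the stream unchanged.
    static List<Element> setUnion(List<Element> a, List<Element> b) {
        LinkedHashSet<Element> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return new ArrayList<>(out);
    }

    // Element-wise map. With the identity function it produces equal
    // elements; this toy model cannot represent the differing lineage.
    static List<Element> map(List<Element> in, UnaryOperator<Element> f) {
        List<Element> out = new ArrayList<>();
        for (Element e : in) {
            out.add(f.apply(e));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Element> stream = List.of(new Element("a", 1L), new Element("b", 2L));
        List<Element> mapped = map(stream, UnaryOperator.identity());

        System.out.println(bagUnion(stream, stream).size()); // 4
        System.out.println(setUnion(stream, stream).size()); // 2
        System.out.println(setUnion(stream, mapped).size()); // 2
    }
}
```

Under bag semantics, stream.union(stream) duplicates every element. Under set semantics it returns the stream unchanged, and in this model stream.union(stream.map(id)) collapses the same way because the mapped elements are indistinguishable from the originals.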

Gyula

On Wed, 25 Nov 2015 at 13:47, Bruecke, Christoph <christoph.brue...@campus.tu-berlin.de>
wrote:

> Hi,
>
> the operation “stream.union(stream.map(id))” is equivalent to
> “stream.union(stream)”, isn’t it? So it might also duplicate the data.
>
> - Christoph
>
>
> > On 25 Nov 2015, at 11:24, Stephan Ewen <se...@apache.org> wrote:
> >
> > "stream.union(stream.map(..))" should definitely be possible. Not sure
> > why this is not permitted.
> >
> > "stream.union(stream)" would contain each element twice, so should either
> > give an error or actually union (or duplicate) elements...
> >
> > Stephan
> >
> >
> > On Wed, Nov 25, 2015 at 10:42 AM, Gyula Fóra <gyf...@apache.org> wrote:
> >
> >> Yes, I am not sure if this is the intentional behaviour. I think you are
> >> supposed to be able to do the things you described.
> >>
> >> stream.union(stream.map(..)) and things like this are fair operations.
> >> Also, maybe stream.union(stream) should just give stream instead of an
> >> error.
> >>
> >> Could someone comment on this who knows the reasoning behind the current
> >> mechanics?
> >>
> >> Gyula
> >>
> >> On Tue, 24 Nov 2015 at 16:46, Vasiliki Kalavri <vasilikikala...@gmail.com>
> >> wrote:
> >>
> >>> Hi squirrels,
> >>>
> >>> when porting the gelly streaming code from 0.9 to 0.10 today with
> >>> Paris, we hit an exception in union: "*A DataStream cannot be unioned
> >>> with itself*".
> >>>
> >>> The code raising this exception looks like this:
> >>> stream.union(stream.map(...)).
> >>>
> >>> Taking a look into the union code, we see that it's now not allowed to
> >>> union a stream, not only with itself, but with any product of itself.
> >>>
> >>> First, we are wondering: why is that? Does it make building the stream
> >>> graph easier in some way?
> >>> Second, we might want to give a better error message there, e.g. "*A
> >>> DataStream cannot be unioned with itself or a product of itself*", and
> >>> finally, we should update the docs, which currently state that unioning a
> >>> stream with itself is allowed and that "*If you union a data stream with
> >>> itself you will still only get each element once.*"
> >>>
> >>> Cheers,
> >>> -Vasia.
> >>>
> >>
>
>
