So, do we all agree that the current behavior is not correct? Shall I open
a JIRA about this?
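
To make the two readings from the thread concrete, here is a minimal sketch
in plain Python. It is not the Flink API and says nothing about what Flink
actually does; Record, map_identity, set_union and bag_union are hypothetical
names, and modelling lineage as a tuple of operator names is my own
assumption.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical stream element: a value plus its timestamp and lineage."""
    value: int
    timestamp: int
    lineage: Tuple[str, ...]


def map_identity(stream: List[Record]) -> List[Record]:
    # Stands in for stream.map(id): values are unchanged, but each record
    # has passed through one more operator, so its lineage differs.
    return [r._replace(lineage=r.lineage + ("map",)) for r in stream]


def set_union(a: List[Record], b: List[Record]) -> List[Record]:
    # Set-style reading: a record identical in value, timestamp and
    # lineage is emitted only once (first-seen order is kept).
    seen = set()
    out = []
    for r in a + b:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out


def bag_union(a: List[Record], b: List[Record]) -> List[Record]:
    # Bag-style reading: every record from both inputs is emitted.
    return a + b


stream = [Record(v, t, ("source",)) for t, v in enumerate([10, 20, 30])]

self_set = set_union(stream, stream)                 # same stream again
self_bag = bag_union(stream, stream)                 # every element twice
mapped_set = set_union(stream, map_identity(stream)) # lineage differs
```

Under these assumptions the set-style self-union gives back the original
stream, the bag-style self-union duplicates it, and unioning with the
identity-mapped stream yields every value twice under either reading.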

On 25 November 2015 at 13:58, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Well, it kind of depends on which definition of union we are using. If this
> is a union in the set-theoretic sense, we can argue that the union of a
> stream with itself should be the same stream, because it contains exactly
> the same elements with the same timestamps and lineage.
>
> On the other hand, stream and stream.map(id) are not exactly the same, as
> their elements might arrive in a different order (the lineage differs).
>
> So I wouldn't say that any self-union semantics is the only possible one.
>
> Gyula
>
> Bruecke, Christoph <christoph.brue...@campus.tu-berlin.de> wrote
> (on Wed, 25 Nov 2015, 13:47):
>
> > Hi,
> >
> > the operation “stream.union(stream.map(id))” is equivalent to
> > “stream.union(stream)”, isn’t it? So it might also duplicate the data.
> >
> > - Christoph
> >
> >
> > > On 25 Nov 2015, at 11:24, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > "stream.union(stream.map(..))" should definitely be possible. Not sure
> > > why this is not permitted.
> > >
> > > "stream.union(stream)" would contain each element twice, so it should
> > > either give an error or actually union (or duplicate) elements...
> > >
> > > Stephan
> > >
> > >
> > > On Wed, Nov 25, 2015 at 10:42 AM, Gyula Fóra <gyf...@apache.org> wrote:
> > >
> > > >> Yes, I am not sure if this is the intentional behaviour. I think you
> > > >> are supposed to be able to do the things you described.
> > >>
> > > >> stream.union(stream.map(..)) and things like this are fair operations.
> > > >> Also, maybe stream.union(stream) should just give stream instead of an
> > > >> error.
> > >>
> > > >> Could someone comment on this who knows the reasoning behind the
> > > >> current mechanics?
> > >>
> > >> Gyula
> > >>
> > > >> Vasiliki Kalavri <vasilikikala...@gmail.com> wrote (on Tue, 24 Nov
> > > >> 2015, 16:46):
> > >>
> > >>> Hi squirrels,
> > >>>
> > > >>> when porting the gelly streaming code from 0.9 to 0.10 today with
> > > >>> Paris, we hit an exception in union: "*A DataStream cannot be unioned
> > > >>> with itself*".
> > >>>
> > >>> The code raising this exception looks like this:
> > >>> stream.union(stream.map(...)).
> > >>>
> > > >>> Taking a look into the union code, we see that it's now not allowed
> > > >>> to union a stream, not only with itself, but with any product of
> > > >>> itself.
> > >>>
> > > >>> First, we are wondering, why is that? Does it make building the
> > > >>> stream graph easier in some way?
> > > >>> Second, we might want to give a better error message there, e.g. "*A
> > > >>> DataStream cannot be unioned with itself or a product of itself*",
> > > >>> and finally, we should update the docs, which currently state that
> > > >>> unioning a stream with itself is allowed and that "*If you union a
> > > >>> data stream with itself you will still only get each element once.*"
> > >>>
> > >>> Cheers,
> > >>> -Vasia.
> > >>>
> > >>
> >
> >
>
