Hi,

the operation “stream.union(stream.map(id))” is equivalent to “stream.union(stream)”, isn’t it? So it would also duplicate the data.
- Christoph

> On 25 Nov 2015, at 11:24, Stephan Ewen <se...@apache.org> wrote:
>
> "stream.union(stream.map(..))" should definitely be possible. Not sure why
> this is not permitted.
>
> "stream.union(stream)" would contain each element twice, so it should either
> give an error or actually union (or duplicate) elements...
>
> Stephan
>
>
> On Wed, Nov 25, 2015 at 10:42 AM, Gyula Fóra <gyf...@apache.org> wrote:
>
>> Yes, I am not sure if this is the intentional behaviour. I think you are
>> supposed to be able to do the things you described.
>>
>> stream.union(stream.map(..)) and things like this are fair operations. Also,
>> maybe stream.union(stream) should just give stream instead of an error.
>>
>> Could someone who knows the reasoning behind the current mechanics
>> comment on this?
>>
>> Gyula
>>
>> Vasiliki Kalavri <vasilikikala...@gmail.com> wrote (Tue, 24 Nov 2015, 16:46):
>>
>>> Hi squirrels,
>>>
>>> when porting the gelly streaming code from 0.9 to 0.10 today with Paris, we
>>> hit an exception in union: "*A DataStream cannot be unioned with itself*".
>>>
>>> The code raising this exception looks like this:
>>> stream.union(stream.map(...)).
>>>
>>> Taking a look into the union code, we see that it's now not allowed to
>>> union a stream, not only with itself, but with any product of itself.
>>>
>>> First, we are wondering, why is that? Does it make building the stream
>>> graph easier in some way?
>>> Second, we might want to give a better error message there, e.g. "*A
>>> DataStream cannot be unioned with itself or a product of itself*", and
>>> finally, we should update the docs, which currently state that unioning a
>>> stream with itself is allowed and that "*If you union a data stream with
>>> itself you will still only get each element once.*"
>>>
>>> Cheers,
>>> -Vasia.
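To make Christoph's point concrete, here is a minimal sketch in plain Java. It deliberately does not use the Flink API. It assumes that a "stream" can be modeled as a finite list and that union is bag (multiset) concatenation with no de-duplication. The class `UnionSketch` and its `map`/`union` helpers are hypothetical names chosen for illustration. The sketch models the semantics debated in the thread, not Flink 0.10's actual behaviour, which throws the exception quoted above. In real Flink the interleaving of union output is not ordered, so only the multiset of elements is meaningful here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model: a "stream" is a finite list and union is bag
// (multiset) concatenation. This is NOT the Flink DataStream API.
public class UnionSketch {

    // One-to-one transformation, like a map operator.
    static <T, R> List<R> map(List<T> stream, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T x : stream) {
            out.add(f.apply(x));
        }
        return out;
    }

    // Union keeps every element of both inputs; nothing is de-duplicated.
    static <T> List<T> union(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> stream = List.of(1, 2, 3);
        List<Integer> withSelf = union(stream, stream);
        List<Integer> withIdentityMap = union(stream, map(stream, x -> x));
        System.out.println(withSelf);                         // [1, 2, 3, 1, 2, 3]
        System.out.println(withIdentityMap);                  // [1, 2, 3, 1, 2, 3]
        System.out.println(withSelf.equals(withIdentityMap)); // true
    }
}
```

Under bag semantics, both forms contain every element twice. Only a set-style union would give the "each element once" behaviour that the docs describe.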