Re: What is the future of Reshuffle?

2018-05-21 Thread Raghu Angadi
Filed https://issues.apache.org/jira/browse/BEAM-4372 (unassigned). On Mon, May 21, 2018 at 10:22 AM Raghu Angadi wrote: > > > On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw > wrote: > >> We should probably keep the warning and all the caveats until we >> introduce the alternative (and migrate

Re: What is the future of Reshuffle?

2018-05-21 Thread Raghu Angadi
On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw wrote: > We should probably keep the warning and all the caveats until we introduce > the alternative (and migrate to it for the non-parallelism uses of > reshuffle). I was just proposing we do this via a separate transform that > just calls Reshuff

Re: What is the future of Reshuffle?

2018-05-21 Thread Ben Chambers
+1 to introducing alternative transforms even if they wrap Reshuffle. The benefit of making them distinct is that we can put appropriate Javadoc in place and runners can figure out what the user is intending and whether Reshuffle or some other implementation is appropriate. We can also see which o

Re: What is the future of Reshuffle?

2018-05-21 Thread Robert Bradshaw
We should probably keep the warning and all the caveats until we introduce the alternative (and migrate to it for the non-parallelism uses of reshuffle). I was just proposing we do this via a separate transform that just calls Reshuffle until we have the new story fully fleshed out (I don't know if

Re: What is the future of Reshuffle?

2018-05-20 Thread Raghu Angadi
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw wrote: > On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote: > >> [...] >> > I think it would be much more user friendly to un-deprecate it to add a >> warning for advanced users about non-portability of durability/replay >> guarantees/stable inpu

Re: What is the future of Reshuffle?

2018-05-20 Thread Reuven Lax
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw wrote: > On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote: > >> On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw >> wrote: >> >>> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: >>> True. I am still failing to see what is broken about

Re: What is the future of Reshuffle?

2018-05-19 Thread Robert Bradshaw
On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote: > On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw > wrote: > >> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: >> >>> True. I am still failing to see what is broken about Reshuffle that is >>> also not broken with GroupByKey transform. If

Re: What is the future of Reshuffle?

2018-05-19 Thread Raghu Angadi
On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw wrote: > On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: > >> True. I am still failing to see what is broken about Reshuffle that is >> also not broken with GroupByKey transform. If someone depends on GroupByKey >> to get stable input, isn't th

Re: What is the future of Reshuffle?

2018-05-19 Thread Robert Bradshaw
On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: > > On Fri, May 18, 2018 at 5:34 PM Robert Bradshaw > wrote: > >> Ah, thanks, that makes sense. That implies to me Reshuffle is no more >>> broken than GBK itself. Maybe Reshuffle.viaRandomKey() could have a clear >>> caveat. Reshuffle's JavaD

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 5:34 PM Robert Bradshaw wrote: > Ah, thanks, that makes sense. That implies to me Reshuffle is no more >> broken than GBK itself. Maybe Reshuffle.viaRandomKey() could have a clear >> caveat. Reshuffle's JavaDoc could add a caveat too about non-deterministic >> keys and re

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
On Fri, May 18, 2018 at 4:24 PM Raghu Angadi wrote: > On Fri, May 18, 2018 at 4:07 PM Kenneth Knowles wrote: > >> It isn't any particular logic in Reshuffle - it is, semantically, an >> identity transform. It is the fact that other runners are perfectly able to >> re-run transform prior to a GBK

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 4:07 PM Kenneth Knowles wrote: > It isn't any particular logic in Reshuffle - it is, semantically, an > identity transform. It is the fact that other runners are perfectly able to > re-run transform prior to a GBK. So, for example, randomly generated IDs > will be re-gener

Re: What is the future of Reshuffle?

2018-05-18 Thread Kenneth Knowles
It isn't any particular logic in Reshuffle - it is, semantically, an identity transform. It is the fact that other runners are perfectly able to re-run transform prior to a GBK. So, for example, randomly generated IDs will be re-generated. We tend to put in reshuffles in order to "commit" these ran

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 12:22 PM Robert Bradshaw wrote: > [resending] > Agreed that keeping this deprecated without a clear replacement for so long > is not ideal. > > I would at least break this into two separate transforms, the > parallelism-breaking one (which seems OK) and the stable input on

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 12:21 PM Robert Bradshaw wrote: > On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote: > >> Thanks Kenn. >> >> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote: >> >>> The fact that its usage has grown probably indicates that we have a >>> large number of transform

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
[resending] Agreed that keeping this deprecated without a clear replacement for so long is not ideal. I would at least break this into two separate transforms, the parallelism-breaking one (which seems OK) and the stable input one (which may just call the parallelism-breaking one, but should be d

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote: > Thanks Kenn. > > On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote: > >> The fact that its usage has grown probably indicates that we have a large >> number of transforms that can easily cause data loss / duplication. >> > > Is this spe

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote: >> Thanks Kenn. >> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote: The fact that its usage has grown probably indicates that we have a large number of transforms that can easily cause data loss / d

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
Agreed that keeping this deprecated without a clear replacement for so long is not ideal. I would at least break this into two separate transforms, the parallelism-breaking one (which seems OK) and the stable input one (which may just call the parallelism-breaking one, but should be decorated with

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
Thanks Kenn. On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote: > The fact that its usage has grown probably indicates that we have a large > number of transforms that can easily cause data loss / duplication. > Is this specific to Reshuffle or is it true for any GroupByKey? I see Reshuffl

Re: What is the future of Reshuffle?

2018-05-18 Thread Kenneth Knowles
The fact that its usage has grown probably indicates that we have a large number of transforms that can easily cause data loss / duplication. Yes, it is deprecated because it is primarily used as a Dataflow-specific way to ensure stable input. My understanding is that the SparkRunner also material

Re: What is the future of Reshuffle?

2018-05-18 Thread Eugene Kirpichov
Agreed that it should be undeprecated; many users are getting confused by this. I know that some people are working on a replacement for at least one of its use cases (RequiresStableInput), but the use case of breaking fusion is, as of yet, unaddressed, and there's not much to be gained by keeping

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
I am interested in more clarity on this as well. It has been deprecated for a long time without a replacement, and its usage has only grown, both within the Beam code base and in user applications. If we are certain that it will not be removed before there is a good replacement for it, can we u

What is the future of Reshuffle?

2018-05-18 Thread Ismaël Mejía
I saw in a recent thread that the use of the Reshuffle transform was recommended to solve a user issue: https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E I can see why it may fix the reported issue. I am just curious about the