@Neelesh Could you write an email to the user list explaining the change since it is a breaking change?
On Tue, Nov 1, 2016 at 10:08 PM Neelesh Salian <nsal...@cloudera.com> wrote: > Thanks everyone. The PR was merged. > > > On Thu, Oct 27, 2016 at 11:51 AM, Neelesh Salian <nsal...@cloudera.com> > wrote: > > > Thanks everyone for all the inputs. > > It's really encouraging for a new contributor, as myself, to get valuable > > input and mentoring (like on this thread) and, in turn, help make the > > community better. > > > > > > > > On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <j...@nanthrax.net> > > wrote: > > > >> You did well ! It's an interesting discussion we have and it's great to > >> have it on the mailing list (better than in Jira or PR comments IMHO). > >> > >> Thanks ! > >> > >> Regards > >> JB > >> > >> > >> > >> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw > >> <rober...@google.com.INVALID> wrote: > >> >+1 to all Dan says. > >> > > >> >I only brought this up because it seemed new contributors (yay) > >> >jumping in and renaming a core transform based on "Something to > >> >consider" deserved a couple more more eyeballs, but didn't intend for > >> >it to become a big deal. > >> > > >> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin > >> ><dhalp...@google.com.invalid> wrote: > >> >> Folks, I don't think this needs to be a "vote". This is just not that > >> >big a > >> >> deal :). It is important to be transparent and have these discussions > >> >on > >> >> the list, which is why we brought it here from GitHub/JIRA, but at > >> >the end > >> >> of the day I hope that a small group of committers and developers can > >> >> assess "good enough" consensus for these minor issues. > >> >> > >> >> Here's my assessment: > >> >> * We don't really have any rules about naming transforms. "Should be > >> >a > >> >> verb" is a sort of guiding principle inherited from the Google Flume > >> >> project from which Dataflow evolved, but honestly we violate this > >> >rule for > >> >> clarity all over the place. ("Values", for example). > >> >> * The "Big Data" community is significantly more familiar with the > >> >concept > >> >> of Distinct -- Jesse, who filed the original JIRA, is a good example > >> >here. > >> >> * Finally, nobody feels very strongly. We could argue minor points of > >> >each > >> >> solution, but at the end of the day I don't think anyone wants to > >> >block a > >> >> change. > >> >> > >> >> Let's go with Distinct. It's important to align Beam with the open > >> >source > >> >> big data community. (And thanks Jesse, our newest (*tied) committer, > >> >for > >> >> pushing us in the right direction!) > >> >> > >> >> Jesse, can you please take charge of wrapping up the PR and merging > >> >it? > >> >> > >> >> Thanks! > >> >> Dan > >> >> > >> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré > >> ><j...@nanthrax.net> > >> >> wrote: > >> >> > >> >>> Just to clarify. Davor is right for a code modification change: -1 > >> >means a > >> >>> veto. > >> >>> I meant that -1 is not a veto for a release vote. > >> >>> > >> >>> Anyway, even if it's not a formal code, we can have a discussion > >> >with > >> >>> "options" a,b and c. > >> >>> > >> >>> Regards > >> >>> JB > >> >>> > >> >>> > >> >>> > >> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci > >> ><da...@google.com.INVALID> > >> >>> wrote: > >> >>> >In terms of reaching a decision on any code or design changes, > >> >>> >including > >> >>> >this one, I'd suggest going without formal votes. Voting process > >> >for > >> >>> >code > >> >>> >modifications between choices A and B doesn't necessarily end with > >> >a > >> >>> >decision A or B -- a single (qualified) -1 vote is a veto and > >> >cannot be > >> >>> >overridden [1]. Said differently, the guideline is that code > >> >changes > >> >>> >should > >> >>> >be made by consensus; not by one group outvoting another. I'd like > >> >to > >> >>> >avoid > >> >>> >setting such precedent; we should try to drive consensus, as > >> >opposed to > >> >>> >attempting to outvote another part of the community. > >> >>> > > >> >>> >In this particular case, we have had a great discussion. Many > >> >>> >contributors > >> >>> >brought different perspectives. Consequently, some opinions have > >> >been > >> >>> >likely changed. At this point, someone should summarize the > >> >arguments, > >> >>> >try > >> >>> >to critique them from a neutral standpoint, and suggest a refined > >> >>> >proposal > >> >>> >that takes these perspectives into account. If nobody objects in a > >> >>> >short > >> >>> >time, we should consider this decided. [ I can certainly help here, > >> >but > >> >>> >I'd > >> >>> >love to see somebody else do it! ] > >> >>> > > >> >>> >[1] http://www.apache.org/foundation/voting.html > >> >>> > > >> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers > >> >>> ><bchamb...@google.com.invalid> > >> >>> >wrote: > >> >>> > > >> >>> >> I also like Distinct since it doesn't make it sound like it > >> >modifies > >> >>> >any > >> >>> >> underlying collection. RemoveDuplicates makes it sound like the > >> >>> >duplicates > >> >>> >> are removed, rather than a new PCollection without duplicates > >> >being > >> >>> >> returned. > >> >>> >> > >> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré > >> ><j...@nanthrax.net> > >> >>> >> wrote: > >> >>> >> > >> >>> >> > Agree. It was more a transition proposal. > >> >>> >> > > >> >>> >> > Regards > >> >>> >> > JB > >> >>> >> > > >> >>> >> > > >> >>> >> > > >> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw > >> >>> >> > <rober...@google.com.INVALID> wrote: > >> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré > >> >>> >> > ><j...@nanthrax.net> wrote: > >> >>> >> > >> And what about use RemoveDuplicates and create an alias > >> >Distinct > >> >>> >? > >> >>> >> > > > >> >>> >> > >I'd really like to avoid (long term) aliases--you end up > >> >having to > >> >>> >> > >document (and maintain) them both, and it adds confusion as to > >> >>> >which > >> >>> >> > >one to use (especially if they every diverge), and means > >> >searching > >> >>> >for > >> >>> >> > >one or the other yields half the results. > >> >>> >> > > > >> >>> >> > >> It doesn't break the API and would address both SQL users > >> >and > >> >>> >more > >> >>> >> > >"big data" users. > >> >>> >> > >> > >> >>> >> > >> My $0.01 ;) > >> >>> >> > >> > >> >>> >> > >> Regards > >> >>> >> > >> JB > >> >>> >> > >> > >> >>> >> > >> > >> >>> >> > >> > >> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin > >> >>> >> > ><dhalp...@google.com.INVALID> wrote: > >> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing > >> >>> >> > >>>preference: > >> >>> >> > >>> > >> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important > >> >keywords > >> >>> >are > >> >>> >> > >in > >> >>> >> > >>>the > >> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly > >> >pretty > >> >>> >dang > >> >>> >> > >>> descriptive. > >> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user > >> >and > >> >>> >> > >likely > >> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API > >> >>> >change, so > >> >>> >> > >>>we > >> >>> >> > >>>should do it before we go stable. > >> >>> >> > >>> > >> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong that > >> >>> >> > >"Distinct" > >> >>> >> > >>>>>> > >> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> > >> >"AvoidDuplicate". > >> >>> >> > >>> > >> >>> >> > >>>Dan > >> >>> >> > >>> > >> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles > >> >>> >> > >>><k...@google.com.invalid> > >> >>> >> > >>>wrote: > >> >>> >> > >>> > >> >>> >> > >>>> The precedent that we use verbs has many exceptions. We > >> >have > >> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I > >> >>> >would > >> >>> >> > >even > >> >>> >> > >>>> include Sum (at least when I read it). > >> >>> >> > >>>> > >> >>> >> > >>>> Historical note: the predilection towards verbs is from > >> >the > >> >>> >Google > >> >>> >> > >>>Style > >> >>> >> > >>>> Guide for Java method names > >> >>> >> > >>>> > >> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5. > >> >>> >> 2.3-method-names > >> >>> >> > >, > >> >>> >> > >>>> which states "Method names are typically verbs or verb > >> >>> >phrases". > >> >>> >> > >But > >> >>> >> > >>>even > >> >>> >> > >>>> in Google code there are lots of exceptions when it makes > >> >>> >sense, > >> >>> >> > >like > >> >>> >> > >>>> Guava's > >> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the > >> >>> >entire > >> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google > >> >code. > >> >>> >I > >> >>> >> > >>>suggest we > >> >>> >> > >>>> use our judgment rather than a policy. > >> >>> >> > >>>> > >> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a > >> >>> >standard > >> >>> >> > >>>widespread > >> >>> >> > >>>> name and also reads better as an adjective. I prefer it, > >> >but > >> >>> >also > >> >>> >> > >>>don't > >> >>> >> > >>>> care strongly enough to change it or to change it back :-) > >> >>> >> > >>>> > >> >>> >> > >>>> If we must have a verb, I like it as-is more than > >> >MakeDistinct > >> >>> >and > >> >>> >> > >>>> AvoidDuplicate. > >> >>> >> > >>>> > >> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson > >> >>> >> > >>><je...@smokinghand.com> > >> >>> >> > >>>> wrote: > >> >>> >> > >>>> > >> >>> >> > >>>> > My original thought for this change was that Crunch uses > >> >the > >> >>> >> > >class > >> >>> >> > >>>name > >> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct. > >> >>> >> > >>>> > > >> >>> >> > >>>> > Maybe the rule should be changed to adjectives or verbs > >> >>> >depending > >> >>> >> > >>>on the > >> >>> >> > >>>> > context. > >> >>> >> > >>>> > > >> >>> >> > >>>> > Using a verb to describe this class really doesn't > >> >connote > >> >>> >what > >> >>> >> > >the > >> >>> >> > >>>class > >> >>> >> > >>>> > does as succinctly as the adjective. > >> >>> >> > >>>> > > >> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian > >> >>> >> > >>><nsal...@cloudera.com> > >> >>> >> > >>>> > wrote: > >> >>> >> > >>>> > > >> >>> >> > >>>> > > Hello, > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse > >> >for > >> >>> >their > >> >>> >> > >>>review on > >> >>> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239 > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > A point that came up was using verbs explicitly for > >> >>> >Transforms. > >> >>> >> > >>>> > > Here is the PR: > >> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164 > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > Posting it to help understand if we have a consensus > >> >for > >> >>> >it and > >> >>> >> > >>>if yes, > >> >>> >> > >>>> > we > >> >>> >> > >>>> > > could perhaps document it for future changes. > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > Thank you. > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > -- > >> >>> >> > >>>> > > Neelesh Srinivas Salian > >> >>> >> > >>>> > > Engineer > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > >> >>> >> > >>>> > >> >>> >> > > >> >>> >> > >> >>> > >> > > > > > > > > -- > > Neelesh Srinivas Salian > > Customer Operations Engineer > > > > > > > > > > > -- > Neelesh Srinivas Salian > Customer Operations Engineer >