Will do. Feel free to chime in, if I missed anything. On Tue, Nov 1, 2016 at 9:16 PM, Jesse Anderson <je...@smokinghand.com> wrote:
> @Neelesh Could you write an email to the user list explaining the change > since it is a breaking change? > > On Tue, Nov 1, 2016 at 10:08 PM Neelesh Salian <nsal...@cloudera.com> > wrote: > > > Thanks everyone. The PR was merged. > > > > > > On Thu, Oct 27, 2016 at 11:51 AM, Neelesh Salian <nsal...@cloudera.com> > > wrote: > > > > > Thanks everyone for all the inputs. > > > It's really encouraging for a new contributor, as myself, to get > valuable > > > input and mentoring (like on this thread) and, in turn, help make the > > > community better. > > > > > > > > > > > > On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré < > j...@nanthrax.net> > > > wrote: > > > > > >> You did well ! It's an interesting discussion we have and it's great > to > > >> have it on the mailing list (better than in Jira or PR comments IMHO). > > >> > > >> Thanks ! > > >> > > >> Regards > > >> JB > > >> > > >> > > >> > > >> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw > > >> <rober...@google.com.INVALID> wrote: > > >> >+1 to all Dan says. > > >> > > > >> >I only brought this up because it seemed new contributors (yay) > > >> >jumping in and renaming a core transform based on "Something to > > >> >consider" deserved a couple more more eyeballs, but didn't intend for > > >> >it to become a big deal. > > >> > > > >> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin > > >> ><dhalp...@google.com.invalid> wrote: > > >> >> Folks, I don't think this needs to be a "vote". This is just not > that > > >> >big a > > >> >> deal :). It is important to be transparent and have these > discussions > > >> >on > > >> >> the list, which is why we brought it here from GitHub/JIRA, but at > > >> >the end > > >> >> of the day I hope that a small group of committers and developers > can > > >> >> assess "good enough" consensus for these minor issues. > > >> >> > > >> >> Here's my assessment: > > >> >> * We don't really have any rules about naming transforms. "Should > be > > >> >a > > >> >> verb" is a sort of guiding principle inherited from the Google > Flume > > >> >> project from which Dataflow evolved, but honestly we violate this > > >> >rule for > > >> >> clarity all over the place. ("Values", for example). > > >> >> * The "Big Data" community is significantly more familiar with the > > >> >concept > > >> >> of Distinct -- Jesse, who filed the original JIRA, is a good > example > > >> >here. > > >> >> * Finally, nobody feels very strongly. We could argue minor points > of > > >> >each > > >> >> solution, but at the end of the day I don't think anyone wants to > > >> >block a > > >> >> change. > > >> >> > > >> >> Let's go with Distinct. It's important to align Beam with the open > > >> >source > > >> >> big data community. (And thanks Jesse, our newest (*tied) > committer, > > >> >for > > >> >> pushing us in the right direction!) > > >> >> > > >> >> Jesse, can you please take charge of wrapping up the PR and merging > > >> >it? > > >> >> > > >> >> Thanks! > > >> >> Dan > > >> >> > > >> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré > > >> ><j...@nanthrax.net> > > >> >> wrote: > > >> >> > > >> >>> Just to clarify. Davor is right for a code modification change: -1 > > >> >means a > > >> >>> veto. > > >> >>> I meant that -1 is not a veto for a release vote. > > >> >>> > > >> >>> Anyway, even if it's not a formal code, we can have a discussion > > >> >with > > >> >>> "options" a,b and c. > > >> >>> > > >> >>> Regards > > >> >>> JB > > >> >>> > > >> >>> > > >> >>> > > >> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci > > >> ><da...@google.com.INVALID> > > >> >>> wrote: > > >> >>> >In terms of reaching a decision on any code or design changes, > > >> >>> >including > > >> >>> >this one, I'd suggest going without formal votes. Voting process > > >> >for > > >> >>> >code > > >> >>> >modifications between choices A and B doesn't necessarily end > with > > >> >a > > >> >>> >decision A or B -- a single (qualified) -1 vote is a veto and > > >> >cannot be > > >> >>> >overridden [1]. Said differently, the guideline is that code > > >> >changes > > >> >>> >should > > >> >>> >be made by consensus; not by one group outvoting another. I'd > like > > >> >to > > >> >>> >avoid > > >> >>> >setting such precedent; we should try to drive consensus, as > > >> >opposed to > > >> >>> >attempting to outvote another part of the community. > > >> >>> > > > >> >>> >In this particular case, we have had a great discussion. Many > > >> >>> >contributors > > >> >>> >brought different perspectives. Consequently, some opinions have > > >> >been > > >> >>> >likely changed. At this point, someone should summarize the > > >> >arguments, > > >> >>> >try > > >> >>> >to critique them from a neutral standpoint, and suggest a refined > > >> >>> >proposal > > >> >>> >that takes these perspectives into account. If nobody objects in > a > > >> >>> >short > > >> >>> >time, we should consider this decided. [ I can certainly help > here, > > >> >but > > >> >>> >I'd > > >> >>> >love to see somebody else do it! ] > > >> >>> > > > >> >>> >[1] http://www.apache.org/foundation/voting.html > > >> >>> > > > >> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers > > >> >>> ><bchamb...@google.com.invalid> > > >> >>> >wrote: > > >> >>> > > > >> >>> >> I also like Distinct since it doesn't make it sound like it > > >> >modifies > > >> >>> >any > > >> >>> >> underlying collection. RemoveDuplicates makes it sound like the > > >> >>> >duplicates > > >> >>> >> are removed, rather than a new PCollection without duplicates > > >> >being > > >> >>> >> returned. > > >> >>> >> > > >> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré > > >> ><j...@nanthrax.net> > > >> >>> >> wrote: > > >> >>> >> > > >> >>> >> > Agree. It was more a transition proposal. > > >> >>> >> > > > >> >>> >> > Regards > > >> >>> >> > JB > > >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > >> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw > > >> >>> >> > <rober...@google.com.INVALID> wrote: > > >> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré > > >> >>> >> > ><j...@nanthrax.net> wrote: > > >> >>> >> > >> And what about use RemoveDuplicates and create an alias > > >> >Distinct > > >> >>> >? > > >> >>> >> > > > > >> >>> >> > >I'd really like to avoid (long term) aliases--you end up > > >> >having to > > >> >>> >> > >document (and maintain) them both, and it adds confusion as > to > > >> >>> >which > > >> >>> >> > >one to use (especially if they every diverge), and means > > >> >searching > > >> >>> >for > > >> >>> >> > >one or the other yields half the results. > > >> >>> >> > > > > >> >>> >> > >> It doesn't break the API and would address both SQL users > > >> >and > > >> >>> >more > > >> >>> >> > >"big data" users. > > >> >>> >> > >> > > >> >>> >> > >> My $0.01 ;) > > >> >>> >> > >> > > >> >>> >> > >> Regards > > >> >>> >> > >> JB > > >> >>> >> > >> > > >> >>> >> > >> > > >> >>> >> > >> > > >> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin > > >> >>> >> > ><dhalp...@google.com.INVALID> wrote: > > >> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in > decreasing > > >> >>> >> > >>>preference: > > >> >>> >> > >>> > > >> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important > > >> >keywords > > >> >>> >are > > >> >>> >> > >in > > >> >>> >> > >>>the > > >> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly > > >> >pretty > > >> >>> >dang > > >> >>> >> > >>> descriptive. > > >> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL > user > > >> >and > > >> >>> >> > >likely > > >> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API > > >> >>> >change, so > > >> >>> >> > >>>we > > >> >>> >> > >>>should do it before we go stable. > > >> >>> >> > >>> > > >> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong > that > > >> >>> >> > >"Distinct" > > >> >>> >> > >>>>>> > > >> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> > > >> >"AvoidDuplicate". > > >> >>> >> > >>> > > >> >>> >> > >>>Dan > > >> >>> >> > >>> > > >> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles > > >> >>> >> > >>><k...@google.com.invalid> > > >> >>> >> > >>>wrote: > > >> >>> >> > >>> > > >> >>> >> > >>>> The precedent that we use verbs has many exceptions. We > > >> >have > > >> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and > I > > >> >>> >would > > >> >>> >> > >even > > >> >>> >> > >>>> include Sum (at least when I read it). > > >> >>> >> > >>>> > > >> >>> >> > >>>> Historical note: the predilection towards verbs is from > > >> >the > > >> >>> >Google > > >> >>> >> > >>>Style > > >> >>> >> > >>>> Guide for Java method names > > >> >>> >> > >>>> > > >> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5. > > >> >>> >> 2.3-method-names > > >> >>> >> > >, > > >> >>> >> > >>>> which states "Method names are typically verbs or verb > > >> >>> >phrases". > > >> >>> >> > >But > > >> >>> >> > >>>even > > >> >>> >> > >>>> in Google code there are lots of exceptions when it > makes > > >> >>> >sense, > > >> >>> >> > >like > > >> >>> >> > >>>> Guava's > > >> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), > the > > >> >>> >entire > > >> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google > > >> >code. > > >> >>> >I > > >> >>> >> > >>>suggest we > > >> >>> >> > >>>> use our judgment rather than a policy. > > >> >>> >> > >>>> > > >> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a > > >> >>> >standard > > >> >>> >> > >>>widespread > > >> >>> >> > >>>> name and also reads better as an adjective. I prefer it, > > >> >but > > >> >>> >also > > >> >>> >> > >>>don't > > >> >>> >> > >>>> care strongly enough to change it or to change it back > :-) > > >> >>> >> > >>>> > > >> >>> >> > >>>> If we must have a verb, I like it as-is more than > > >> >MakeDistinct > > >> >>> >and > > >> >>> >> > >>>> AvoidDuplicate. > > >> >>> >> > >>>> > > >> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson > > >> >>> >> > >>><je...@smokinghand.com> > > >> >>> >> > >>>> wrote: > > >> >>> >> > >>>> > > >> >>> >> > >>>> > My original thought for this change was that Crunch > uses > > >> >the > > >> >>> >> > >class > > >> >>> >> > >>>name > > >> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct. > > >> >>> >> > >>>> > > > >> >>> >> > >>>> > Maybe the rule should be changed to adjectives or > verbs > > >> >>> >depending > > >> >>> >> > >>>on the > > >> >>> >> > >>>> > context. > > >> >>> >> > >>>> > > > >> >>> >> > >>>> > Using a verb to describe this class really doesn't > > >> >connote > > >> >>> >what > > >> >>> >> > >the > > >> >>> >> > >>>class > > >> >>> >> > >>>> > does as succinctly as the adjective. > > >> >>> >> > >>>> > > > >> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian > > >> >>> >> > >>><nsal...@cloudera.com> > > >> >>> >> > >>>> > wrote: > > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > Hello, > > >> >>> >> > >>>> > > > > >> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse > > >> >for > > >> >>> >their > > >> >>> >> > >>>review on > > >> >>> >> > >>>> > > this: https://issues.apache.org/ > jira/browse/BEAM-239 > > >> >>> >> > >>>> > > > > >> >>> >> > >>>> > > A point that came up was using verbs explicitly for > > >> >>> >Transforms. > > >> >>> >> > >>>> > > Here is the PR: > > >> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164 > > >> >>> >> > >>>> > > > > >> >>> >> > >>>> > > Posting it to help understand if we have a consensus > > >> >for > > >> >>> >it and > > >> >>> >> > >>>if yes, > > >> >>> >> > >>>> > we > > >> >>> >> > >>>> > > could perhaps document it for future changes. > > >> >>> >> > >>>> > > > > >> >>> >> > >>>> > > Thank you. > > >> >>> >> > >>>> > > > > >> >>> >> > >>>> > > -- > > >> >>> >> > >>>> > > Neelesh Srinivas Salian > > >> >>> >> > >>>> > > Engineer > > >> >>> >> > >>>> > > > > >> >>> >> > >>>> > > > >> >>> >> > >>>> > > >> >>> >> > > > >> >>> >> > > >> >>> > > >> > > > > > > > > > > > > -- > > > Neelesh Srinivas Salian > > > Customer Operations Engineer > > > > > > > > > > > > > > > > > > -- > > Neelesh Srinivas Salian > > Customer Operations Engineer > > > -- Neelesh Srinivas Salian Customer Operations Engineer