+1 to all Dan says. I only brought this up because it seemed new contributors (yay) jumping in and renaming a core transform based on "Something to consider" deserved a couple more more eyeballs, but didn't intend for it to become a big deal.
On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin <dhalp...@google.com.invalid> wrote: > Folks, I don't think this needs to be a "vote". This is just not that big a > deal :). It is important to be transparent and have these discussions on > the list, which is why we brought it here from GitHub/JIRA, but at the end > of the day I hope that a small group of committers and developers can > assess "good enough" consensus for these minor issues. > > Here's my assessment: > * We don't really have any rules about naming transforms. "Should be a > verb" is a sort of guiding principle inherited from the Google Flume > project from which Dataflow evolved, but honestly we violate this rule for > clarity all over the place. ("Values", for example). > * The "Big Data" community is significantly more familiar with the concept > of Distinct -- Jesse, who filed the original JIRA, is a good example here. > * Finally, nobody feels very strongly. We could argue minor points of each > solution, but at the end of the day I don't think anyone wants to block a > change. > > Let's go with Distinct. It's important to align Beam with the open source > big data community. (And thanks Jesse, our newest (*tied) committer, for > pushing us in the right direction!) > > Jesse, can you please take charge of wrapping up the PR and merging it? > > Thanks! > Dan > > On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > >> Just to clarify. Davor is right for a code modification change: -1 means a >> veto. >> I meant that -1 is not a veto for a release vote. >> >> Anyway, even if it's not a formal code, we can have a discussion with >> "options" a,b and c. >> >> Regards >> JB >> >> >> >> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci <da...@google.com.INVALID> >> wrote: >> >In terms of reaching a decision on any code or design changes, >> >including >> >this one, I'd suggest going without formal votes. Voting process for >> >code >> >modifications between choices A and B doesn't necessarily end with a >> >decision A or B -- a single (qualified) -1 vote is a veto and cannot be >> >overridden [1]. Said differently, the guideline is that code changes >> >should >> >be made by consensus; not by one group outvoting another. I'd like to >> >avoid >> >setting such precedent; we should try to drive consensus, as opposed to >> >attempting to outvote another part of the community. >> > >> >In this particular case, we have had a great discussion. Many >> >contributors >> >brought different perspectives. Consequently, some opinions have been >> >likely changed. At this point, someone should summarize the arguments, >> >try >> >to critique them from a neutral standpoint, and suggest a refined >> >proposal >> >that takes these perspectives into account. If nobody objects in a >> >short >> >time, we should consider this decided. [ I can certainly help here, but >> >I'd >> >love to see somebody else do it! ] >> > >> >[1] http://www.apache.org/foundation/voting.html >> > >> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers >> ><bchamb...@google.com.invalid> >> >wrote: >> > >> >> I also like Distinct since it doesn't make it sound like it modifies >> >any >> >> underlying collection. RemoveDuplicates makes it sound like the >> >duplicates >> >> are removed, rather than a new PCollection without duplicates being >> >> returned. >> >> >> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré <j...@nanthrax.net> >> >> wrote: >> >> >> >> > Agree. It was more a transition proposal. >> >> > >> >> > Regards >> >> > JB >> >> > >> >> > >> >> > >> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw >> >> > <rober...@google.com.INVALID> wrote: >> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré >> >> > ><j...@nanthrax.net> wrote: >> >> > >> And what about use RemoveDuplicates and create an alias Distinct >> >? >> >> > > >> >> > >I'd really like to avoid (long term) aliases--you end up having to >> >> > >document (and maintain) them both, and it adds confusion as to >> >which >> >> > >one to use (especially if they every diverge), and means searching >> >for >> >> > >one or the other yields half the results. >> >> > > >> >> > >> It doesn't break the API and would address both SQL users and >> >more >> >> > >"big data" users. >> >> > >> >> >> > >> My $0.01 ;) >> >> > >> >> >> > >> Regards >> >> > >> JB >> >> > >> >> >> > >> >> >> > >> >> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin >> >> > ><dhalp...@google.com.INVALID> wrote: >> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing >> >> > >>>preference: >> >> > >>> >> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords >> >are >> >> > >in >> >> > >>>the >> >> > >>>Javadoc. This reduces churn on our users and is honestly pretty >> >dang >> >> > >>> descriptive. >> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and >> >> > >likely >> >> > >>>less clear otherwise. This is a backwards-incompatible API >> >change, so >> >> > >>>we >> >> > >>>should do it before we go stable. >> >> > >>> >> >> > >>>I am not super strong that 1 > 2, but I am very strong that >> >> > >"Distinct" >> >> > >>>>>> >> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> "AvoidDuplicate". >> >> > >>> >> >> > >>>Dan >> >> > >>> >> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles >> >> > >>><k...@google.com.invalid> >> >> > >>>wrote: >> >> > >>> >> >> > >>>> The precedent that we use verbs has many exceptions. We have >> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I >> >would >> >> > >even >> >> > >>>> include Sum (at least when I read it). >> >> > >>>> >> >> > >>>> Historical note: the predilection towards verbs is from the >> >Google >> >> > >>>Style >> >> > >>>> Guide for Java method names >> >> > >>>> >> >> > >>><https://google.github.io/styleguide/javaguide.html#s5. >> >> 2.3-method-names >> >> > >, >> >> > >>>> which states "Method names are typically verbs or verb >> >phrases". >> >> > >But >> >> > >>>even >> >> > >>>> in Google code there are lots of exceptions when it makes >> >sense, >> >> > >like >> >> > >>>> Guava's >> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the >> >entire >> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google code. >> >I >> >> > >>>suggest we >> >> > >>>> use our judgment rather than a policy. >> >> > >>>> >> >> > >>>> I think "Distinct" is one of those exceptions. It is a >> >standard >> >> > >>>widespread >> >> > >>>> name and also reads better as an adjective. I prefer it, but >> >also >> >> > >>>don't >> >> > >>>> care strongly enough to change it or to change it back :-) >> >> > >>>> >> >> > >>>> If we must have a verb, I like it as-is more than MakeDistinct >> >and >> >> > >>>> AvoidDuplicate. >> >> > >>>> >> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson >> >> > >>><je...@smokinghand.com> >> >> > >>>> wrote: >> >> > >>>> >> >> > >>>> > My original thought for this change was that Crunch uses the >> >> > >class >> >> > >>>name >> >> > >>>> > Distinct. SQL also uses the keyword distinct. >> >> > >>>> > >> >> > >>>> > Maybe the rule should be changed to adjectives or verbs >> >depending >> >> > >>>on the >> >> > >>>> > context. >> >> > >>>> > >> >> > >>>> > Using a verb to describe this class really doesn't connote >> >what >> >> > >the >> >> > >>>class >> >> > >>>> > does as succinctly as the adjective. >> >> > >>>> > >> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian >> >> > >>><nsal...@cloudera.com> >> >> > >>>> > wrote: >> >> > >>>> > >> >> > >>>> > > Hello, >> >> > >>>> > > >> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse for >> >their >> >> > >>>review on >> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239 >> >> > >>>> > > >> >> > >>>> > > A point that came up was using verbs explicitly for >> >Transforms. >> >> > >>>> > > Here is the PR: >> >> > >>>https://github.com/apache/incubator-beam/pull/1164 >> >> > >>>> > > >> >> > >>>> > > Posting it to help understand if we have a consensus for >> >it and >> >> > >>>if yes, >> >> > >>>> > we >> >> > >>>> > > could perhaps document it for future changes. >> >> > >>>> > > >> >> > >>>> > > Thank you. >> >> > >>>> > > >> >> > >>>> > > -- >> >> > >>>> > > Neelesh Srinivas Salian >> >> > >>>> > > Engineer >> >> > >>>> > > >> >> > >>>> > >> >> > >>>> >> >> > >> >> >>