Re: [REMINDER] Technical discussion on the mailing list

Daniel Kulp Thu, 06 Oct 2016 03:05:35 -0700

At the end of the day, it comes down to two questions:

1) Are there technical and project direction discussions happening off list and 
not reflected back to the list?


2) If yes, are the concrete decisions being made as a result of the off list 
discussions?


The answer to #1 is a definite yes.   From a quick perusal, several of the pull 
requests that have more than 10-15 comments is a discussion that should be back 
here.  Look at the Metrics PR, the splittable DoFn pr, etc… There are 
discussions there that NEED to be coming back to this list one way or another.  
 For things like the Splittable DoFn that is a long running discussion, it 
would most likely need periodic summaries/updates.  Trying to follow everything 
going on in that PR is impossible with the comments on the outdated commits and 
such.  

Because of the scale of the #1 problem, I’m unsure of the answer to #2, but my 
gut feeling is yes.  If the off list discussions are resulting in technical 
changes that people don’t know about and cannot object to or comment on, then 
there is a problem.  

From an Apache standpoint, we have to get the answer to BOTH questions to a 
“no” state.   That’s a requirement.   The question is how do we get there?

Dan



> On Oct 6, 2016, at 8:29 AM, Davor Bonaci <[email protected]> wrote:
> 
> Daniel, so glad you are starting to contribute to Beam! It was great
> talking with you in person back in May. Welcome!
> 
> --
> 
> There are lots of different things mentioned here; I'll try to address them
> separately.
> 
> The first use of AutoValue should have been discussed on the dev@ mailing
> list. I think the main reason for the discussion is a bit different --
> AutoValue has a non-trivial tradeoff -- compile complexity vs. boilerplate
> code. For example, AutoValue may degrade IDE experience for some
> contributors. If we'd go in depth on this, I'm sure we'd find opposing
> opinions on the use of AutoValue. This tradeoff should have been discussed
> on the dev@ list, followed by a community decision.
> 
> Note that this has happened *before* the JdbcIO work. Since AutoValue has
> been already used elsewhere in the project, there was no real reason not to
> use it in JdbcIO, as appropriate. Therefore, I think JB and Eugene did
> everything right! Second, third or thousandth usage of a concept doesn't
> require any particular discussion. They didn't make anything worse. Their
> discussion is totally appropriate for code review.
> 
> Now, as Daniel points out, I think it is not right to ask a contributor to
> change his PR to use AutoValue when none of the existing IO connectors use
> it. This is making a too high standard. In fact, it is desirable for new
> contributions to follow already established patterns, instead of inventing
> something new. If we want to change a pattern, we should do it as a
> separate effort across the board.
> 
> On the other hand, dev@ discussion wouldn't have helped to prevent review
> comments/discussions. Let's say we have had the discussion, and a new
> contributor comes a year later. Should we ask her to read all discussions
> that ever happened in the project to learn everything she might need? Of
> course not! She should follow already established patterns and learn any
> specifics during code review. And then, best practices should be documented
> on the website.
> 
> To summarize, a few things could have been better:
> * Discussion of the first use of AutoValue on dev@.
> * Avoiding overzealous core review comments.
> * Changing a pattern should have been done by filing several starter tasks
> in JIRA.
> 
> --
> 
> There are also several different proposals for altering a part of the
> workflow.
> 
>> code review comments not making to a list
> 
> We have >1000 PRs so far, with at least a dozen comments on average, with
> pace increasing. This is >10,000 emails, most of which are "fix a typo".
> This leads into information overload, with actual information being missed.
> 
> If someone wants this extra information -- just clicking the Watch button
> in the GitHub UI will make it happen!
> 
>> creating new JIRA and opening PR to dev@
> 
> These currently go to commits@. This would have resulted in another 1,700
> email threads compared to <150 now.
> 
> Generally speaking, *all* of this is already available to anyone who wants
> to receive it. However, anyone I know that has tried, has given up very
> quickly ;). If anybody is concerned, we can create several new lists for
> this traffic -- but we shouldn't repurpose dev@ for it.
> 
>> I feel I’m missing things as there is significant amount of things not
> happening on a list
> 
> I think "feeling of missing things" is totally valid. I feel that too, as
> well as almost everybody else.
> 
> My best answer is -- we should realize that we are an extremely large and
> complex project, with >100 contributors and >20 people working on it full
> time. Nobody can follow every SDK, every runner, every IO connector, every
> pull request, every comment that all contributors make each and every day.
> 
> While nobody can follow everything, everything is being followed by
> multiple people. And, we need to be accountable to each other to surface
> everything relevant to the dev@ list. And I believe that is already
> happening the vast majority of time. This is just one example where it
> didn't happen.
> 
> --
> 
> All that said, there are certainly areas for improvement. If anyone has
> specific ideas, please reach out! I'd love to discuss them in detail and
> propose improvements to the wider community.
> 
> Thanks!
> 
> 
> On Wed, Oct 5, 2016 at 6:16 PM, Thomas Weise <[email protected]> wrote:
>> 
>> How about sending just the notifications for creating new JIRA and opening
>> PR to dev@ so that those that are interested can start watching?
>> 
>> Thanks,
>> Thomas
>> 
>> On Wed, Oct 5, 2016 at 5:33 PM, Dan Halperin <[email protected]>
>> wrote:
>> 
>>> On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp <[email protected]> wrote:
>>> 
>>>> I just want to give a little more context to this….  I’ve been
> lurking on
>>>> this list for several months now reading everything that’s going on.
>>> From
>>>> Apache’s standpoint, that should be a “very good start” for getting to
>>> know
>>>> what is happening in a project.
>>>> 
>>>> On my last PR, Eugene commented about using the AutoValue pattern for
>>> part
>>>> of it which caught me off guard.   None of the other IO’s in master
> were
>>>> using it, there wasn’t any discussion on this list about it, I had no
>>> idea
>>>> what it was about.   So I asked JB to make sure I hadn’t missed
> anything.
>>>> 
>>> 
>>>> Anyway, this is one of the main concerns I have with Beam’s PR work
> flow,
>>>> I feel I’m missing things as there is significant amount of things not
>>>> happening on a list.   The initial pull request is going to the
> commits
>>>> list (ok, would prefer the dev list, but at least its on a list).
>>> However,
>>>> none of the comments or discussions or anything that is occurring as
> part
>>>> of the review is making it to any list.   The only people that “learn”
>>> from
>>>> the reviews are the reviewers and the person who initiated the PR
> unless
>>>> they go into each and every PR and read the comments (and find the
> news
>>>> ones and such).    With my Apache hat on, this bothers me.
>>> 
>>> 
>>> Anyway, I don’t really understand why the full github integration wasn’t
>>>> setup for the beam PR’s so that the comments would come back to the
> lists
>>>> as well (and JIRA, BTW).
>>>> 
>>> 
>>> This part confuses me. We've been told that discussions on JIRA, even
>>> though they are emailed to the mailing lists, don't count as happening
> on
>>> the mailing list. So why would github integration be helpful vs just
> more
>>> spam?
>>> 
>>> As another example, the comments on PR1003 are very applicable to anyone
>>>> looking into writing IO’s and they could learn about some of the “best
>>>> practices” presented there.
>>> 
>>> 
>>> As Beam grows during its incubation, we are moving a lot of knowledge to
>>> documentation, but yes -- right now, most of the I/O related practices
> live
>>> in Eugene's and my head (and now, JB's!). We're working on it, and hope
> to
>>> dramatically improve documentation for source authors in the next
> quarter.
>>> 
>>> For AutoValue specifically, this is by no means codified and it is
>>> DEFINITELY not mandatory. Eugene and JB just experimented with it in the
>>> last few days and decided it was useful in a few cases. We do (or did,
>>> before this thread) need to have an actual discussion on the mailing
> list
>>> before moving forward further towards making it policy.
>>> 
>>> Right now Ben Chambers is trying to apply AutoValue in places that need
>>> templated types and struggling with multiple ?s, so the discussion may
> need
>>> to continue! ...
>>> 
>>> Thanks,
>>> Dan
>>> 
>>> That’s basically the background as to why JB sent this.  I was confused
> and
>>>> bugged him.   :-)
>>>> 
>>>> Dan
>>>> 
>>>> 
>>>> 
>>>>> On Oct 5, 2016, at 1:51 PM, Jean-Baptiste Onofré <[email protected]>
>>>> wrote:
>>>>> 
>>>>> Hi team,
>>>>> 
>>>>> I would like to excuse myself to have forgotten to discuss and share
>>>> with you a technical point and generally speaking do a small reminder.
>>>>> 
>>>>> When we work with Eugene on the JdbcIO, we experimented AutoValue to
>>>> deal with IO configuration. AutoValue provides a nice way to reduce
> and
>>>> limit the boilerplate code required by the IO configuration.
>>>>> We used AutoValue in JdbcIO and, regarding the good improvements we
>>> saw,
>>>> we started to refactor the other IOs.
>>>>> 
>>>>> The use of AutoValue should have been notice and discussed on the
>>>> mailing list.
>>>>> 
>>>>> "If it doesn't exist on the mailing list, it doesn't exist at all."
>>>>> 
>>>>> So, any comment happening on a GitHub pull request, or discussion on
>>>> hangouts which can impact the project (generally speaking) has to
> happen
>>> on
>>>> the mailing list.
>>>>> 
>>>>> It provides project transparency and facilitates the new
> contribution
>>>> onboarding.
>>>>> 
>>>>> Thanks !
>>>>> 
>>>>> Regards
>>>>> JB
>>>>> 
>>>> 
>>>> --
>>>> Daniel Kulp
>>>> [email protected] <mailto:[email protected]> - http://dankulp.com/blog <
>>>> http://dankulp.com/blog>
>>>> Talend Community Coder - http://coders.talend.com <
>>>> http://coders.talend.com/>
>>>> 
>>> 

-- 
Daniel Kulp
[email protected] <mailto:[email protected]> - http://dankulp.com/blog 
<http://dankulp.com/blog>
Talend Community Coder - http://coders.talend.com <http://coders.talend.com/>

Re: [REMINDER] Technical discussion on the mailing list

Reply via email to