Thank you for bringing this up. I think the current committers are
bravely facing down a flood of PRs, and this (among other things) is a
step that needs to be taken to scale up and keep this fun. I'd love to
have a separate discussion about more steps, but for now I'll offer two
bits of advice from experience:

First, you guys most certainly can and should say 'no' to some
changes. It's part of keeping the project coherent. It's always good
to try to include all contributions, but appreciating contributions
does not always mean accepting them. I have seen projects turned to
mush by the 'anything's welcome' mentality. Push back on contributors
until the contribution is the thing you think is right. Please keep
the API succinct, yes.

Second, contrib/ modules are problematic. A contrib directory becomes
a ball of legacy code that you still have to keep compiling and
running. In a world of GitHub, I think 'contrib' stuff just belongs in
other repos. I know it sounds harmless to have a contrib/, but I think
you'd find the consensus here is that contrib is a mistake.
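
To make that concrete, here's a rough sketch (package, object, and
method names are made up) of the kind of specialization that can live
in its own repo, built purely against the public RDD API rather than
as a change to RDD itself:

  package org.example.sparkextras

  import org.apache.spark.rdd.RDD

  // Hypothetical external package; nothing here needs to live in spark-core.
  // An implicit class lets callers opt in with one import instead of a new
  // method on RDD.
  object SparkExtras {
    implicit class RichRDD[T](rdd: RDD[T]) {
      // The inverse of filter: expressible entirely through the existing API.
      def filterNot(p: T => Boolean): RDD[T] = rdd.filter(x => !p(x))
    }
  }

Callers just do 'import org.example.sparkextras.SparkExtras._' and get
rdd.filterNot(...) with no change to Spark at all.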

$0.02 --
--
Sean Owen | Director, Data Science | London


On Sun, Feb 23, 2014 at 6:06 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> Hi,
>
>   Over the past few months, I have seen a bunch of pull requests which
> have extended the Spark API ... most commonly RDD itself.
>
> Most of them are either relatively niche cases of specialization (which
> might not be useful for most users) or idioms which can be expressed
> (sometimes with a minor perf penalty) using the existing API.
>
> While all of them have non-zero value (hence the effort to contribute,
> which is gladly welcomed!), they extend the API in non-trivial ways and
> carry a maintenance cost ... and we already have a pending effort to
> clean up our interfaces prior to 1.0.
>
> I believe there is a need to keep the exposed API succinct, expressive
> and functional in Spark, while at the same time encouraging extensions
> and specialization within the Spark codebase so that other users can
> benefit from the shared contributions.
>
> One approach could be to start something akin to Piggybank in Pig for
> contributing user-generated specializations, helper utils, etc.:
> bundled as part of Spark, but not part of core itself.
>
> Thoughts, comments?
>
> Regards,
> Mridul
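
On the 'other repos' route, the build side is small. A sketch of a
build.sbt for a hypothetical external extensions project (the name and
organization are placeholders, and the Spark version is whatever
release it targets):

  name := "spark-extras"

  organization := "org.example"

  scalaVersion := "2.10.3"

  // Build against Spark's published artifact instead of living in its tree;
  // "provided" keeps the Spark jars out of the packaged extension.
  libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided"

This keeps the maintenance burden in the external repo rather than in
Spark's own build.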
