Hi,

  Over the past few months, I have seen a number of pull requests which
extend the Spark API ... most commonly RDD itself.

Most of them are either relatively niche specializations (which might
not be useful to most users) or idioms which can already be expressed
(sometimes with a minor perf penalty) using the existing API.
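
To illustrate (the names and the helper here are hypothetical, not from
any specific PR): a proposed "top k values per key" method on pair RDDs
can already be written on top of the existing combineByKey primitive. A
fused implementation inside RDD could avoid some intermediate lists,
but the existing API covers the use case:

  import org.apache.spark.SparkContext._
  import org.apache.spark.rdd.RDD

  object TopByKeyIdiom {
    // Keep the k largest values per key, built entirely on combineByKey
    // from the existing pair RDD API (result lists are in ascending order).
    def topByKey(rdd: RDD[(String, Int)], k: Int): RDD[(String, List[Int])] =
      rdd.combineByKey[List[Int]](
        (v: Int) => List(v),
        (acc: List[Int], v: Int) => (v :: acc).sorted.takeRight(k),
        (a: List[Int], b: List[Int]) => (a ++ b).sorted.takeRight(k)
      )
  }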

While all of them have non-zero value (hence the effort to contribute,
which is gladly welcomed!), they extend the API in nontrivial ways and
carry a maintenance cost ... and we already have a pending effort to
clean up our interfaces prior to 1.0.

I believe there is a need to keep the exposed API succinct, expressive
and functional in Spark, while at the same time encouraging extensions
and specialization within the Spark codebase so that other users can
benefit from the shared contributions.

One approach could be to start something akin to Piggybank in Pig for
contributing user-generated specializations, helper utils, etc.:
bundled as part of Spark, but not part of core itself.
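
As a sketch of how such a module could work (names here are made up):
contributed helpers could be exposed via the usual Scala implicit
enrichment pattern, so users opt in with an import and RDD itself
stays untouched:

  import org.apache.spark.rdd.RDD

  object ContribRDDFunctions {
    // Users who want the helpers import ContribRDDFunctions._;
    // nothing is added to core RDD.
    implicit class RichRDD[T](rdd: RDD[T]) {
      // Example contributed helper, expressed purely via the existing API.
      def countDistinct(): Long = rdd.distinct().count()
    }
  }

  // Usage:
  //   import ContribRDDFunctions._
  //   val n = someRdd.countDistinct()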

Thoughts, comments?

Regards,
Mridul
