I think telling people that they’re being considered as committers early on is
a good idea, but AFAIK we’ve always had individual committers do that with
contributors who were doing great work in various areas. We don’t have a
centralized process for it though — it’s up to whoever wants to work
That's fair, and it's great to find high-quality contributors. But I also
feel the two projects have very different backgrounds and are at different
maturity phases.
There are 1300+ contributors to Spark, and only 300 to Beam, with the vast
majority of Beam's contributions coming from a single company (based on my
As someone who floats a bit between both projects (as a contributor) I'd
love to see us adopt some of these techniques to be proactive about
growing our committership (I think perhaps we could do this by also moving
some of the newer committers into the PMC faster so there are more eyes out
Worth, I think, a read and consideration from Spark folks. I'd be
interested in comments; I have a few reactions too.
---------- Forwarded message ---------
From: Kenneth Knowles
Date: Sat, Jun 30, 2018 at 1:15 AM
Subject: Beam's recent community development work
To: Griselda Cuevas <
The vote passes. Thanks to all who helped with the release!
I'll start publishing everything tomorrow, and an announcement will
be sent when artifacts have propagated to the mirrors (probably
early next week).
+1 (* = binding):
- Marcelo Vanzin *
- Sean Owen *
- Tom Graves *
- Holden Karau *
I forgot to post it, I'm +1.
Tom
On Monday, July 2, 2018, 12:19:08 AM CDT, Holden Karau
wrote:
Leaving documents aside (I think we should maybe have a thread on how we want
to handle doc changes to existing releases on dev@), I'm +1; the PySpark venv
checks out.
On Sun, Jul 1, 2018 at 9:40
Maybe this is a bug. The source can be found at:
https://github.com/purijatin/spark-retrain-bug
*Issue:*
The program takes as input a set of documents, where each document is in a
separate file.
The Spark program computes the tf-idf of the terms (Tokenizer ->
StopWordsRemover -> stemming -> TF -> TF-IDF).
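For readers reproducing this, the final TF/TF-IDF stages of that pipeline boil down to the weighting below. This is a minimal pure-Python sketch of the computation, not the code from the linked repo; it uses Spark ML's documented smoothed IDF formula, idf(t) = log((m + 1) / (df(t) + 1)), and the function and variable names are my own.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF over tokenized docs (lists of terms).

    IDF uses Spark ML's smoothed formula: log((m + 1) / (df(t) + 1)),
    where m is the number of documents and df(t) the document frequency.
    """
    m = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    idf = {t: math.log((m + 1) / (df[t] + 1)) for t in df}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency, as in Spark's CountVectorizer
        weights.append({t: tf[t] * idf[t] for t in tf})
    return weights

docs = [["spark", "beam"], ["spark", "tfidf"]]
weights = tf_idf(docs)
# "spark" appears in every document, so its smoothed IDF is log(3/3) = 0
```

A term occurring in all documents gets weight 0, which is one thing worth checking when a retrained model's vocabulary looks wrong.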
Once