Hey Reynold,

Looks like we've put all of the proposed changes into the Proposed Community Mailing Lists / StackOverflow Changes document <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>. Anything else we can do to update the Spark Community page / welcome email?
Meanwhile, let's all start answering questions on SO, eh?! :)

Denny

On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> That's a good question. Looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO, including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers).
>
> On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>
> I was just wondering, before we move on to SO:
>
> Do we have enough contributors with enough reputation to manage things on SO?
>
> We would need contributors with enough reputation to have the relevant privileges.
>
> For example: creating tags (requires 1500 reputation), editing questions and answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000), access to moderator tools (10000, required to delete questions etc.), protecting questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO; however, if we do not have contributors with the required privileges and the willingness to help manage everything, then I am not sure this is a good fit.
>
> Assaf.
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> *Sent:* Wednesday, November 09, 2016 9:54 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
> Agreed that simply moving the questions to SO will not solve anything, but I think the point of the call-out about meta-tags is that we need to abide by SO rules. If we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions of SO.
>
> That said, perhaps we could suggest tags to place in the header of a question, whether on SO or the mailing lists, that would help us sort through all of these questions faster, just as you suggested. The Proposed Community Mailing Lists / StackOverflow Changes document <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags. WDYT?
>
> On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]> wrote:
>
> I like the document and I think it is good, but I still feel like we are missing an important part here.
>
> Look at SO today. There are:
>
> - 4658 unanswered questions under the apache-spark tag.
> - 394 unanswered questions under the spark-dataframe tag.
> - 639 unanswered questions under apache-spark-sql.
> - 859 unanswered questions under pyspark.
>
> Just moving people to ask there will not help. The whole issue is having people answer the questions.
>
> The problem is that many of these questions do not fit SO (but are already there, so they are noise), are bad (i.e. unclear or hard to answer), orphaned, etc., while some are simply harder than what people with some experience in Spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in noise. This is true for the mailing list and this is true for SO.
>
> For this reason I believe that just moving people to SO will not solve anything.
>
> My original thought was that if we had different tags, then different people could watch open questions on those tags and therefore see much less noise.
> I thought that we would have a low tier (the current one) of people just not following the documentation (which would remain as noise), then a beginner tier where people could downvote bad questions but where, in most cases, the community can answer the questions because they are common, then a “medium” tier of harder questions that can still be answered by advanced users, and lastly an “advanced” tier to which committers can actually subscribe (and adding sub-tags for subsystems would improve this even more).
>
> I was not aware of the SO policy on meta tags (the burnination link is about removing tags completely, so I am not sure how it applies; I believe this link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more relevant).
>
> There was actually a discussion along these lines on SO (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).
>
> The fact that SO did not solve this issue does not mean we shouldn’t try.
>
> The way I see it, some tags can easily be used even with the meta-tag limitation. For example, a spark-internal-development tag could be used to ask questions about the development of Spark. There are already tags for some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a spark-streaming tag, etc.). The main issue I see, and the one we can’t seem to get around, is separating simple questions that the community should answer from hard questions that only advanced users can answer.
>
> Maybe SO isn’t the right platform for that, but even within it we can try to find a non-meta name for Spark beginner questions vs. Spark advanced questions.
>
> Assaf.
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
> To help track and jump-start the verbiage for the Spark community page and welcome email, here's a working document for us to work with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#
>
> Hope this will help us collaborate on this stuff a little faster.
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
> - If we are thinking about shifting focus towards SO, all attempts at micromanaging should be discarded right from the beginning. This is especially true of things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info), or thread bumping. Depending on the context, these won't be manageable, will go against community guidelines, or will simply be obsolete.
> - Lack of expertise is unlikely to be an issue. Even now there are a number of advanced Spark users on SO. Of course, the more the merrier.
>
> Things that can be easily improved:
>
> - Identifying, improving and promoting canonical questions and answers. This means closing duplicates, suggesting edits to improve existing answers, and providing alternative solutions. This can also be used to identify gaps in the documentation.
> - Providing a set of clear posting guidelines to reduce the effort required to identify the problem (think about http://stackoverflow.com/q/5963269, a.k.a. "How to make a great R reproducible example?").
> - Helping users decide whether a question is a good fit for SO (see below). API questions are a great fit; debugging problems like "my cluster is slow" are not.
> - Actively cleaning up (closing, deleting) off-topic and low-quality questions. The less junk there is to sift through, the better the chance of good questions being answered.
> - Repurposing and actively moderating SO docs (https://stackoverflow.com/documentation/apache-spark/topics). Right now most of the stuff that goes there is useless, duplicated or plagiarized, or borderline spam.
> - Encouraging the community to monitor featured (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured) and active, upvoted and unanswered (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
> - Implementing some procedure to identify questions that are likely to be bugs or material for feature requests. Personally I am quite often tempted to simply send a link to the dev list, but I don't think that is really acceptable.
> - Getting a Spark-related chat room going. I tried this a couple of times, but to no avail. Without a certain critical mass of users it just won't work.
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO more prominently as a way for users to ask questions, would you, as someone who knows SO very well, be willing to help write a short guideline (ideally the shorter the better, which makes it hard) on what goes to user@ and what goes to SO?
>
> Sure, I'll be happy to help if I can.
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
>
> Damn, I always thought that the mailing list was only for nice and welcoming people and there was nothing here for me >:)
>
> To be serious though, there are many questions on the users list that would fit just fine on SO, but that is not true in general.
> There are dozens of questions that are too broad, opinion-based, ask for external resources and so on. If you want to direct users to SO, you have to help them decide whether it is the right channel. Otherwise it will just create a really bad experience both for those seeking help and for active answerers. The former will be downvoted and bashed; the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me able to directly close duplicates).
>
> Believe me, I've seen this before.
>
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
>
> You have to remember that the Stack Overflow crowd (like me) is highly opinionated, so many questions that would be just fine on the mailing list will be quickly downvoted and/or closed as off-topic. Just saying...
>
> --
> Best,
> Maciej
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK, I've checked on the ASF member list (which is private, so there is no public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
>
> Actually, after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or JIRA), but user questions don't have to be.
>
> I'm going to double check this.
> If it is true, I would actually recommend that we move the Q&A part of the user list entirely over to StackOverflow, or at least make that the recommended way rather than the existing user list, which is not very scalable.
>
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
>
> We’ve discussed upgrading our communication tools several times, as far back as 2014 and maybe even before that. The bottom line is that we can’t, due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
> · https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@...%3E
> · https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@...%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)
>
> Nick
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th-century tools (mailing lists) with some add-ons (like Stack Overflow).
>
> As usual, Sean's and Cody's contributions are very much to the point.
>
> I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
>
> So, concrete things people could do:
>
> - Users could tag subject lines according to the component they're asking about.
>
> - Contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for mailing list questions well-labeled as Kafka were higher than 5%.
>
> - Committers could be more aggressive about soliciting and merging PRs to improve documentation. It's a lot easier to answer even poorly-asked questions with a link to relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]> wrote:
> > There are already reviews@ and issues@. dev@ is for project development itself, and I think it is OK. You're suggesting splitting up user@, and I sympathize with the motivation. Experience tells me that we'll have a beginner@ that's then totally ignored, people will quickly learn to post to advanced@ to get attention, and we'll be back where we started. Putting it in JIRA doesn't help. I don't think this is a problem that is merely down to lack of process. It actually requires cultivating a culture change on the community list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]> wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting, mailing list B is only for PRs, and have something like Stack Overflow for developer questions (I would even go as far as to have beginner, intermediate and advanced mailing lists for users and beginner/advanced for dev).
> >>
> >> This can easily be done using Stack Overflow tags; however, that would probably be harder to manage.
> >>
> >> Maybe use special JIRA tags and manage it in JIRA?
> >>
> >> Anyway, as I said, the main issue is not user questions (except maybe advanced ones) but dev questions. It is so easy to get lost in the chatter that it becomes very hard for people to learn Spark internals…
> >>
> >> Assaf.
> >>
> >> From: Sean Owen [mailto:[hidden email]]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; [hidden email]
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has thousands of subscribers with different interests and levels of experience. For any given person, most messages will be irrelevant. I also find that a lot of questions on user@ are not well-asked, aren't an SSCCE (http://sscce.org/), and are not something most people are going to bother replying to even if they could answer. I almost entirely ignore user@ because there are higher-priority channels like PRs to deal with, which already have hundreds of messages per day. This is why little of it gets an answer -- it's too noisy.
> >>
> >> We have to have official mailing lists, in any event, to have some official channel for things like votes and announcements. It's not wrong to ask questions on user@ of course, but a lot of the questions I see could have been answered by researching existing docs or looking at the code. I think that, given the scale of the list, it's not wrong to assert that this is sort of a prerequisite for asking thousands of people to answer one's question. But we can't enforce that.
> >>
> >> The situation will get better to the extent people ask better questions, help other people ask better questions, and answer good questions. I'd encourage anyone feeling this way to try to help along those dimensions.
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off-topic, but I wanted to raise an issue about handling questions in the mailing lists (this is true for both the user and dev mailing lists, but since there are other options such as Stack Overflow for user questions, it is more problematic on dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was during Spark Summit Europe, so people were probably busy. In any case, no one answered.
> >>
> >> The problem is that if no one answers very soon, the question will almost certainly remain unanswered, because new messages will simply drown it.
> >>
> >> This is a common issue not just for questions but for any comment or idea that is not immediately picked up.
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong on Stack Overflow; after all, the way it is built is perfect for this. More seasoned Spark contributors and committers can periodically check unanswered questions and answer them.
> >>
> >> The problem is that Stack Overflow (as well as other targets such as the Databricks forums) tends to have a more user-based orientation. This means that any Spark internals question will almost certainly remain unanswered.
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >> Assaf.
> >>
> >> View this message in context: Handling questions in the mailing lists
> >> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> --
> Maciej Szymkiewicz
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau