On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
>
> Would people actually click on that link though? I think Kyle has a point
> that in practice users would only find and click on that link when they're
> having some kind of issue, especially if the link has "feedback" in it.
I think the idea is that we would make the link very lightweight, kind
of like a survey (but even easier, as it's pre-populated). Basically an
opt-in phone-home. If we don't collect any personal data (not even
IP/geo, just (say) version + runner, all visible in the URL), there is no
need to guard/anonymize (and this may be sufficient--I don't think we
have to worry about spammers and ballot stuffers given the target
audience). If we can catch people while they wait for their pipeline to
start up (and/or complete), this is a great time to get some feedback.

> I agree usage data would be really valuable, but I'm not sure that this
> approach would get us good data. Is there a way to get download statistics
> for the different runner artifacts? Maybe that could be a better metric to
> compare usage.

This'd be useful too, but hard to get and very noisy.

>
> On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
>>
>> I agree, these are the questions that need to be answered.
>> The data can be anonymized and stored as public data in BigQuery or some
>> other place.
>>
>> The intent is to get usage statistics so that we know what people are
>> using (Flink, Spark, etc.); it is not intended as a discussion or help
>> channel.
>> I also think that we don't need to monitor this actively, as it's more
>> like a survey than an active channel for getting issues resolved.
>>
>> If we think it's useful for the community, then we can come up with a
>> solution for how to do this (similar to how we released the container
>> images).
>>
>>
>>
>> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>
>>> There are some logistics that would need to be worked out. For example,
>>> where would the data go? Who would own it?
>>>
>>> Also, I'm not convinced we need yet another place to discuss Beam when we
>>> have already discussed the challenge of simultaneously monitoring mailing
>>> lists, Stack Overflow, Slack, etc.
>>> While "how do you use Beam" is certainly
>>> an interesting question, and I'd be curious to know that >= X many people
>>> use a certain runner, I'm not sure answers to these questions are as useful
>>> for guiding the future of Beam as discussions on the dev/users lists, etc.,
>>> as the latter likely result in more in-depth/specific feedback.
>>>
>>> However, I do think it could be useful in general to include links directly
>>> in the console output. For example, maybe something along the lines of "Oh
>>> no, your Flink pipeline crashed! Check Jira/file a bug/ask the mailing
>>> list."
>>>
>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>>
>>>
>>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> At the moment we don't really have a good way to collect any usage
>>>> statistics for Apache Beam, such as the runner used, and many users
>>>> don't really have a way to report their use case.
>>>> How about we create a feedback page where users can add their pipeline
>>>> details and use case?
>>>> Also, we can start printing the link to this page when users launch the
>>>> pipeline from the command line.
>>>> Example:
>>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
>>>>
>>>> Starting pipeline
>>>> Please use http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
>>>> Pipeline started
>>>> ......
>>>>
>>>> Using a link and not publishing the data automatically gives users
>>>> control over what they publish and what they don't. We can enhance the
>>>> text and usage further, but the basic idea is to ask for user feedback at
>>>> each run of the pipeline.
>>>> Let me know what you think.
>>>>
>>>>
>>>> Thanks,
>>>> Ankur
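
For illustration only, here is a minimal sketch of the pre-populated, opt-in link discussed in this thread. It is not an existing Beam feature. The endpoint is the example URL from the original proposal and is not known to exist, and the function names, the allow-list, and the version string are all hypothetical. The sketch follows the "version + runner only" suggestion, so other options such as the input path are dropped before the link is built.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the example URL from the proposal, not a real service.
FEEDBACK_BASE_URL = "http://feedback.beam.org"

# Assumption: only these non-personal fields may ever appear in the link.
ALLOWED_FIELDS = ("version", "runner")


def build_feedback_url(options, base_url=FEEDBACK_BASE_URL):
    """Build a feedback link carrying only the allow-listed options.

    Everything that would be shared is visible in the query string, and
    nothing is sent unless the user chooses to open the link.
    """
    params = [(key, str(options[key])) for key in ALLOWED_FIELDS if key in options]
    return base_url + "?" + urlencode(params)


def feedback_banner(options):
    """Return the one-line message printed while the pipeline starts up."""
    return ("Optional: tell us how you use Beam (nothing is sent unless you "
            "open the link): " + build_feedback_url(options))


if __name__ == "__main__":
    # Placeholder values; "input" is deliberately left out of the link.
    opts = {"version": "2.16.0", "runner": "DirectRunner", "input": "/tmp/abc"}
    print("Starting pipeline")
    print(feedback_banner(opts))
    print("Pipeline started")
```

Filtering through an allow-list rather than echoing every command-line argument keeps paths and other potentially sensitive values out of the URL, which is what would make the "no need to guard/anonymize" argument hold.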