I agree; these are the questions that need to be answered. The data can be anonymized and stored as public data in BigQuery or some other place.
The intent is to collect usage statistics so that we can learn what people are using (Flink, Spark, etc.); it is not intended as a discussion or help channel. I also think that we don't need to monitor this actively, as it's more like a survey than an active channel for getting issues resolved.
If we think it's useful for the community, then we can come up with a solution for how to do this (similar to how we released the container images).

On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:

> There are some logistics that would need to be worked out. For example,
> where would the data go? Who would own it?
>
> Also, I'm not convinced we need yet another place to discuss Beam when we
> already have discussed the challenge of simultaneously monitoring mailing
> lists, Stack Overflow, Slack, etc. While "how do you use Beam" is certainly
> an interesting question, and I'd be curious to know that >= X many people
> use a certain runner, I'm not sure answers to these questions are as useful
> for guiding the future of Beam as discussions on the dev/users lists, etc.,
> as the latter likely result in more in-depth, specific feedback.
>
> However, I do think it could be useful in general to include links
> directly in the console output. For example, maybe something along the
> lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask the
> mailing list."
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>
>
> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
>
>> Hi,
>>
>> At the moment we don't really have a good way to collect any usage
>> statistics for Apache Beam, such as the runner used, because many users
>> don't really have a way to report their use case.
>> How about we create a feedback page where users can add their pipeline
>> details and use case?
>> Also, we can start printing the link to this page when users launch the
>> pipeline from the command line.
>> Example:
>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
>>
>> Starting pipeline
>> Please use
>> http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
>> Pipeline started
>> ......
>>
>> Using a link and not publishing the data automatically will give users
>> control over what they publish and what they don't. We can enhance the text
>> and usage further, but the basic idea is to ask for user feedback at each
>> run of the pipeline.
>> Let me know what you think.
>>
>>
>> Thanks,
>> Ankur
>>
>
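For illustration, here is a rough sketch of how the link in the quoted example could be built from the command-line flags. It is not existing Beam code: `parse_flags` and `feedback_link` are hypothetical helper names, and `http://feedback.beam.org` is just the placeholder URL from the example, not a real page. The sketch only formats a link for the user to open; it sends nothing anywhere.

```python
from urllib.parse import urlencode

# Placeholder taken from the example in the thread; no such page exists yet.
FEEDBACK_URL = "http://feedback.beam.org"


def parse_flags(argv):
    """Collect `--name value` and `--name=value` pairs from argv into a dict."""
    flags = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--"):
            name = arg[2:]
            if "=" in name:
                name, value = name.split("=", 1)
            elif i + 1 < len(argv) and not argv[i + 1].startswith("--"):
                value = argv[i + 1]
                i += 1  # skip the value we just consumed
            else:
                value = "true"  # assumption: a bare flag is treated as boolean
            flags[name] = value
        i += 1
    return flags


def feedback_link(argv, base_url=FEEDBACK_URL):
    """Build the feedback link; nothing is sent, the user decides whether to open it."""
    flags = parse_flags(argv)
    args = ",".join(f"{name}={value}" for name, value in flags.items())
    # Keep '=', ',' and '/' unescaped so the link matches the format in the example.
    return f"{base_url}?{urlencode({'args': args}, safe='=,/')}"
```

A launcher could then print `"Please use " + feedback_link(sys.argv[1:])` between the "Starting pipeline" and "Pipeline started" messages. Since flag values such as input paths may be sensitive, a real version would probably want an allowlist of flags to include.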