I think the goal is to lower the barrier to entry. Displaying a URL to click on while you wait for your pipeline to start up, with all of the data explicitly visible in the URL itself, is about as easy as it gets. Remembering to re-run a new (and probably less representative) pipeline with that flag set is less so.
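
To make that concrete, here is a minimal sketch of what printing such a link could look like. Nothing here exists in Beam today: the function names, the query parameter names, and the version string are made up for illustration, and the base URL is just the placeholder from Ankur's original example. It only includes version + runner, as Robert suggested, so everything that would be shared is readable in the link.

```python
from urllib.parse import urlencode

# Placeholder address taken from the example in this thread; not a real service.
FEEDBACK_BASE_URL = "http://feedback.beam.org"


def build_feedback_url(sdk_version, runner, base_url=FEEDBACK_BASE_URL):
    """Return a pre-populated feedback link (hypothetical helper).

    Only non-personal fields are included, and all of them are visible
    in the query string, so the user can see exactly what is shared.
    """
    params = {"sdk_version": sdk_version, "runner": runner}
    return base_url + "?" + urlencode(params)


def print_feedback_hint(sdk_version, runner):
    """Print the link at startup; nothing is sent unless the user opens it."""
    print("Starting pipeline")
    print("Optional: tell us how you use Beam by opening this link:")
    print("  " + build_feedback_url(sdk_version, runner))


if __name__ == "__main__":
    # Example values only.
    print_feedback_hint("2.16.0", "DirectRunner")
```

Since the link is only printed and never fetched by the SDK, this stays opt-in without needing a separate flag.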

On Tue, Sep 24, 2019 at 11:04 AM Mikhail Gryzykhin <mig...@google.com> wrote:
>
> I'm with Luke on this. We can add a set of flags to send home stats and crash
> dumps if user agrees. If we keep code isolated, it will be easy enough for
> user to check what is being sent.
>
> One more heavy-weight option is to also allow user configure and persist what
> information he is ok with sharing.
>
> --Mikhail
>
>
> On Tue, Sep 24, 2019 at 10:02 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>> Why not add a flag to the SDK that would do the phone home when specified?
>>
>> From a support perspective it would be useful to know:
>> * SDK version
>> * Runner
>> * SDK provided PTransforms that are used
>> * Features like user state/timers/side inputs/splittable dofns/...
>> * Graph complexity (# nodes, # branches, ...)
>> * Pipeline failed or succeeded
>>
>> On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
>>> >
>>> > Would people actually click on that link though? I think Kyle has a point
>>> > that in practice users would only find and click on that link when
>>> > they're having some kind of issue, especially if the link has "feedback"
>>> > in it.
>>>
>>> I think the idea is that we would make the link very light-weight,
>>> kind of like a survey (but even easier as it's pre-populated).
>>> Basically an opt-in phone-home. If we don't collect any personal data
>>> (not even IP/geo, just (say) version + runner, all visible in the
>>> URL), no need to guard/anonymize (and this may be sufficient--I don't
>>> think we have to worry about spammers and ballot stuffers given the
>>> target audience). If we can catch people while they wait for their
>>> pipeline to start up (and/or complete), this is a great time to get
>>> some feedback.
>>>
>>> > I agree usage data would be really valuable, but I'm not sure that this
>>> > approach would get us good data. Is there a way to get download
>>> > statistics for the different runner artifacts? Maybe that could be a
>>> > better metric to compare usage.
>>>
>>> This'd be useful too, but hard to get and very noisy.
>>>
>>> >
>>> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
>>> >>
>>> >> I agree, these are the questions that need to be answered.
>>> >> The data can be anonymize and stored as public data in BigQuery or some
>>> >> other place.
>>> >>
>>> >> The intent is to get the usage statistics so that we can get to know
>>> >> what people are using Flink or Spark etc and not intended for discussion
>>> >> or a help channel.
>>> >> I also think that we don't need to monitor this actively as it's more
>>> >> like a survey rather than active channel to get issues resolved.
>>> >>
>>> >> If we think its useful for the community then we come up with the
>>> >> solution as to how can we do this (similar to how we released the
>>> >> container images).
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>>> >>>
>>> >>> There are some logistics that would need worked out. For example, Where
>>> >>> would the data go? Who would own it?
>>> >>>
>>> >>> Also, I'm not convinced we need yet another place to discuss Beam when
>>> >>> we already have discussed the challenge of simultaneously monitoring
>>> >>> mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam"
>>> >>> is certainly an interesting question, and I'd be curious to know that
>>> >>> >= X many people use a certain runner, I'm not sure answers to these
>>> >>> questions are as useful for guiding the future of Beam as discussions
>>> >>> on the dev/users lists, etc. as the latter likely result in more
>>> >>> depth/specific feedback.
>>> >>>
>>> >>> However, I do think it could be useful in general to include links
>>> >>> directly in the console output. For example, maybe something along the
>>> >>> lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask
>>> >>> the mailing list."
>>> >>>
>>> >>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>> >>>
>>> >>>
>>> >>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> At the moment we don't really have a good way to collect any usage
>>> >>>> statistics for Apache Beam. Like runner used etc. As many of the users
>>> >>>> don't really have a way to report their usecase.
>>> >>>> How about if we create a feedback page where users can add their
>>> >>>> pipeline details and usecase.
>>> >>>> Also, we can start printing the link to this page when user launch the
>>> >>>> pipeline in the command line.
>>> >>>> Example:
>>> >>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
>>> >>>>
>>> >>>> Starting pipeline
>>> >>>> Please use
>>> >>>> http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
>>> >>>> Pipeline started
>>> >>>> ......
>>> >>>>
>>> >>>> Using a link and not publishing the data automatically will give user
>>> >>>> control over what they publish and what they don't. We can enhance the
>>> >>>> text and usage further but the basic idea is to ask for user feeback
>>> >>>> at each run of the pipeline.
>>> >>>> Let me know what you think.
>>> >>>>
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Ankur