On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
>
> Would people actually click on that link though? I think Kyle has a point 
> that in practice users would only find and click on that link when they're 
> having some kind of issue, especially if the link has "feedback" in it.

I think the idea is that we would make the link very lightweight,
kind of like a survey (but even easier, as it's pre-populated).
Basically an opt-in phone-home. If we don't collect any personal data
(not even IP/geo, just, say, version + runner, all visible in the
URL), there's no need to guard/anonymize it (and this may be
sufficient--I don't think we have to worry about spammers and ballot
stuffers given the target audience). If we can catch people while
they wait for their pipeline to start up (and/or complete), that's a
great time to get some feedback.
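A minimal sketch of such an opt-in link, assuming the pipeline options are available as a plain dict. Only an allow-listed set of fields (here version and runner) goes into the URL, so the user can see exactly what would be shared before deciding to click, and nothing is sent otherwise. The feedback.beam.org endpoint is the hypothetical one from the original proposal, and feedback_link, ALLOWED_FIELDS and the option values are illustrative names, not an existing Beam API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, taken from the example in the original proposal.
FEEDBACK_BASE_URL = "http://feedback.beam.org"

# Only these fields ever go into the link; everything else (input paths,
# project IDs, etc.) is deliberately dropped. Illustrative choice.
ALLOWED_FIELDS = ("version", "runner")


def feedback_link(options):
    """Build a pre-populated feedback URL from a dict of pipeline options.

    Every field is visible in the URL itself, and nothing is sent unless
    the user chooses to open the link.
    """
    params = {k: options[k] for k in ALLOWED_FIELDS if k in options}
    return FEEDBACK_BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative options; "input" is filtered out of the link.
    opts = {"version": "2.15.0", "runner": "DirectRunner", "input": "/tmp/abc"}
    print("Starting pipeline")
    print("Tell us how you use Beam: " + feedback_link(opts))
```

Using an allow-list rather than a deny-list means a new pipeline option can never leak into the link by accident.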

> I agree usage data would be really valuable, but I'm not sure that this 
> approach would get us good data. Is there a way to get download statistics 
> for the different runner artifacts? Maybe that could be a better metric to 
> compare usage.

This'd be useful too, but hard to get and very noisy.

>
> On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
>>
>> I agree, these are the questions that need to be answered.
>> The data can be anonymized and stored as public data in BigQuery or some
>> other place.
>>
>> The intent is to get usage statistics so that we know whether people are
>> using Flink, Spark, etc.; it is not intended as a discussion or help
>> channel.
>> I also think that we don't need to monitor this actively, as it's more
>> like a survey than an active channel for getting issues resolved.
>>
>> If we think it's useful for the community, then we can come up with a
>> solution for how to do this (similar to how we released the container images).
>>
>>
>>
>> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>
>>> There are some logistics that would need to be worked out. For example,
>>> where would the data go? Who would own it?
>>>
>>> Also, I'm not convinced we need yet another place to discuss Beam when we
>>> have already discussed the challenge of simultaneously monitoring mailing
>>> lists, Stack Overflow, Slack, etc. While "how do you use Beam" is certainly
>>> an interesting question, and I'd be curious to know that >= X many people
>>> use a certain runner, I'm not sure answers to these questions are as useful
>>> for guiding the future of Beam as discussions on the dev/users lists, etc.,
>>> since the latter likely result in more in-depth, specific feedback.
>>>
>>> However, I do think it could be useful in general to include links directly 
>>> in the console output. For example, maybe something along the lines of "Oh 
>>> no, your Flink pipeline crashed! Check Jira/file a bug/ask the mailing 
>>> list."
>>>
>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>>
>>>
>>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> At the moment we don't really have a good way to collect usage
>>>> statistics for Apache Beam, such as which runner is used, as many users
>>>> don't really have a way to report their use case.
>>>> How about we create a feedback page where users can add their pipeline
>>>> details and use case?
>>>> We can also start printing the link to this page when users launch the
>>>> pipeline from the command line.
>>>> Example:
>>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
>>>>
>>>> Starting pipeline
>>>> Please use http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
>>>> Pipeline started
>>>> ......
>>>>
>>>> Using a link and not publishing the data automatically will give users
>>>> control over what they publish and what they don't. We can enhance the
>>>> text and usage further, but the basic idea is to ask for user feedback at
>>>> each run of the pipeline.
>>>> Let me know what you think.
>>>>
>>>>
>>>> Thanks,
>>>> Ankur
