+1 on collecting the download information.

Collecting data at startup is a bit dangerous, I'd say, both technically
and legally...

Maybe one option is to add a link on the master status page, or some
ASCII art in the master startup log, to guide people to our survey?
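
To make the startup-log idea concrete, here is a minimal sketch of a banner
that only prints text and sends nothing anywhere. Everything in it is an
assumption for illustration: the SurveyBanner class, the render method, and
the survey URL are hypothetical, and a real change would go through the
master's existing logger rather than System.out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing HBase class.
final class SurveyBanner {
    // Placeholder URL; no real survey location is implied.
    static final String SURVEY_URL = "https://example.org/hbase-version-survey";

    // Builds a boxed ASCII banner as a list of lines. Nothing is sent
    // over the network; the caller decides where the lines are written.
    static List<String> render(String version, String surveyUrl) {
        List<String> body = new ArrayList<>();
        body.add("HBase " + version + " master starting");
        body.add("Help us plan releases: take the version survey");
        body.add(surveyUrl);

        // The widest body line determines the box width.
        int width = 0;
        for (String s : body) {
            width = Math.max(width, s.length());
        }

        String border = "+" + fill('-', width + 2) + "+";
        List<String> out = new ArrayList<>();
        out.add(border);
        for (String s : body) {
            out.add("| " + s + fill(' ', width - s.length()) + " |");
        }
        out.add(border);
        return out;
    }

    // Returns a string made of n copies of c.
    private static String fill(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The version string here is only an example value.
        for (String line : render("2.1.1", SURVEY_URL)) {
            System.out.println(line);
        }
    }
}
```

Since this only writes to the local log, it avoids the authorization question
entirely: operators who see it can choose whether to follow the link.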

Allan Yang <allan...@apache.org> wrote on Thu, Nov 15, 2018 at 11:23 AM:

> I also think having metrics about the downloads from Apache/archives is
> doable. Most HBase clusters run on users' intranets with no public
> access, so sending anonymous data from them may not be possible. We would
> also need to find a way to obtain their authorization, I think...
> Best Regards
> Allan Yang
>
> Zach York <zyork.contribut...@gmail.com> wrote on Thu, Nov 15, 2018 at 5:35 AM:
>
> > Can we have metrics around the downloads from Apache/archives? I'm not
> > sure how that is all set up, but it might be a low-cost way to get some
> > metrics.
> >
> > On Wed, Nov 14, 2018, 12:12 PM Andrew Purtell <apurt...@apache.org> wrote:
> >
> > > While it seems you are proposing some kind of autonomous ongoing usage
> > > metrics collection, please note I ran an anonymous version usage survey
> > > via surveymonkey for 1.x last year. It was opt in and there were no PII
> > > concerns by its nature. All of the issues around data collection,
> > > storage, and processing were also handled (by surveymonkey).
> > > Unfortunately I recently cancelled my account.
> > >
> > > For occasional surveys something like that might work. Otherwise there
> > > are a ton of questions: How do we generate the data? How do we get
> > > per-site opt-in permission? How do we collect the data? Store it?
> > > Process it? Audit it? Seems more trouble than it's worth and requires
> > > ongoing volunteer hosting and effort to maintain.
> > >
> > >
> > > On Wed, Nov 14, 2018 at 11:47 AM Misty Linville <mi...@apache.org> wrote:
> > >
> > > > When discussing the 2.0.x branch in another thread, it came up that we
> > > > don’t have a good way to understand the version skew of HBase across
> > > > the user base. Metrics gathering can be tricky. You don’t want to
> > > > capture personally identifiable information (PII) and you need to be
> > > > transparent about what you gather, for what purpose, how long the data
> > > > will be retained, etc. The data can also be sensitive, for instance if
> > > > a large number of installations are running a version with a CVE or
> > > > known vulnerability against it. If you gather metrics, it really needs
> > > > to be opt-out rather than opt-in so that you actually get a reasonable
> > > > amount of data. You also need to stand up some kind of
> > > > metrics-gathering service and run it somewhere, and some kind of
> > > > reporting / visualization tooling. The flip side of all these
> > > > difficulties is a more intelligent way to decide when to retire a
> > > > branch or when to communicate more broadly / loudly asking people in a
> > > > certain version stream to upgrade, as well as where to concentrate our
> > > > efforts.
> > > >
> > > > I’m not sticking my hand up to implement such a monster. I only wanted
> > > > to open a discussion and see what y’all think. It seems to me that a
> > > > few must-haves are:
> > > >
> > > > - Transparency: Release notes, logging about the status of
> > > > metrics-gathering (on or off) at master or RS start-up, logging about
> > > > exactly when and what metrics are sent
> > > > - Low frequency: Would we really need to wake up and send metrics more
> > > > often than weekly?
> > > > - Conservative approach: Only collect what we can find useful today,
> > > > don’t collect the world.
> > > > - Minimize PII: This probably means not trying to group together
> > > > time-series results for a given server or cluster at all, but could
> > > > make the data look like there were a lot more clusters running in the
> > > > world than really are.
> > > > - Who has access to the data? Do we make it public or limit access to
> > > > the PMC? Making it public would bolster our discipline about
> > > > transparency and minimizing PII.
> > > >
> > > > I’m sure I’m missing a ton so I leave the discussion to y’all.
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> > >
> >
>
