Re: Telemetry Policy

2017-08-15 Thread Volker Krause
On Monday, 14 August 2017 22:40:28 CEST Ingo Klöcker wrote:
> On Monday 14 August 2017 19:28:06 Volker Krause wrote:
> > On Monday, 14 August 2017 14:16:12 CEST Bhushan Shah wrote:
> > > Can we have a policy on how long we can store data? It's just a random
> > > idea, but I think it makes sense to tell users that after X period
> > > of time their data will be invalidated.
> > > 
> > > This gives a partial solution to the problem where a user wants to
> > > delete their shared data.
> > 
> > Good point. I'm unsure what to pick as a suitable timeframe though.
> > It's hard to give a specific time right now, since we don't know yet
> > how quickly updates of our software are deployed, which is what is
> > going to determine the latency of getting the data we want. For that
> > question alone we are looking at years I think. Maybe this could be
> > worded as "data is only kept as long as the purpose of the data
> > collection hasn't been achieved yet", i.e. as soon as we have the
> > answer we were looking for we delete the raw data.
> 
> For most purposes (e.g. which parts of the software are used how often)
> it should be possible to aggregate the raw data monthly and then throw
> away the raw data.
> 
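A minimal sketch of the monthly aggregation suggested above, in Python. The record shape (a hypothetical `timestamp` string plus a `usageTime` value) and the function name are assumptions for illustration, not part of any existing KDE server code.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_monthly(raw_records):
    """Reduce raw submissions to per-month totals.

    Once the totals are stored, the raw records are no longer needed
    and can be deleted by the caller.
    """
    totals = defaultdict(lambda: {"submissions": 0, "usageTime": 0})
    for record in raw_records:
        # "timestamp" is a hypothetical field name (ISO 8601 string).
        month = datetime.fromisoformat(record["timestamp"]).strftime("%Y-%m")
        totals[month]["submissions"] += 1
        totals[month]["usageTime"] += record["usageTime"]["value"]
    return dict(totals)


raw = [
    {"timestamp": "2017-08-14T22:40:28", "usageTime": {"value": 12113}},
    {"timestamp": "2017-08-15T09:00:00", "usageTime": {"value": 100}},
    {"timestamp": "2017-09-01T00:00:00", "usageTime": {"value": 50}},
]
monthly = aggregate_monthly(raw)
```

Only the per-month totals survive this step, so a retention limit would apply to the raw records alone.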
> > The bigger problem however is that this conflicts with publishing the
> > data under a free license. At this point we lose any control to
> > enforce data retention limits.
> 
> With respect to the considerations to make the collected raw data public
> I ask you to contact a data protection officer
> (Datenschutzbeauftragte/r) to get her/his opinion. Quoting
> https://en.wikipedia.org/wiki/General_Data_Protection_Regulation: "Valid
> consent must be explicit for data collected and the purposes data is
> used for (Article 7; defined in Article 4)." Since you cannot state the
> purposes the data is used for (because once made public it could be used
> for any purpose), I cannot see how you could get the users' consent for
> this.

That is true, as long as we deal with personal data. When we discussed GDPR
compliance for the deployment in GammaRay, we came to the conclusion that the
collected data is not personal data, which makes this considerably easier.

For illustration, the following JSON data is what a random GammaRay instance
on this machine would submit right now if I opted in to the maximum
telemetry level:

{
  "applicationVersion": { "value": "2.8.50" },
  "compiler": { "type": "GCC", "version": "7.1" },
  "opengl": {
    "glslVersion": "1.30",
    "renderer": "Haswell Mobile ",
    "type": "GL",
    "vendor": "Intel",
    "vendorVersion": "Mesa 17.1.4",
    "version": "3.0"
  },
  "platform": {
    "os": "linux",
    "version": "opensuse-tumbleweed"
  },
  "qtVersion": { "value": "5.9.2" },
  "startCount": { "value": 34 },
  "toolRatio": {
    "objectinspector": { "property": 0.7619047619047619 },
    "quickinspector": { "property": 0.23809523809523808 }
  },
  "usageTime": { "value": 12113 }
}

The server would add a timestamp to that. That's also the level of detail we 
are looking at for telemetry in KDE I think.
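
As a rough sketch of the server-side timestamping step, assuming a hypothetical `stamp_submission` helper and a hypothetical `timestamp` field name (neither is taken from the actual server):

```python
import json
from datetime import datetime, timezone


def stamp_submission(body, now=None):
    """Parse a submitted JSON document and add the server's receive time."""
    record = json.loads(body)
    # The clock is injectable so the behaviour can be tested deterministically.
    now = now or datetime.now(timezone.utc)
    # "timestamp" is an assumed field name, added by the server only.
    record["timestamp"] = now.isoformat()
    return record


stamped = stamp_submission(
    '{"startCount": {"value": 34}}',
    now=datetime(2017, 8, 15, tzinfo=timezone.utc),
)
```

The client never sends a time of its own in this sketch; the only time information stored is the one the server attaches.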

The policy already implies that we do not want to track anything that common
sense or laws/regulations would classify as personal data; I'll make that
explicit to be sure.

The only personal data item we come into contact with is then the IP address,
I think, so separating it from the telemetry data early is crucial. The
telemetry data is then just "non-personal" data, and GDPR etc. wouldn't apply
(in my understanding, IANAL).
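
One way to picture that early separation, as a sketch only: the request dictionary, the `to_storage_record` name and the allow-list approach are assumptions for illustration; the key names come from the sample submission above.

```python
import json

# Top-level keys taken from the sample GammaRay submission above.
ALLOWED_KEYS = {
    "applicationVersion", "compiler", "opengl", "platform",
    "qtVersion", "startCount", "toolRatio", "usageTime",
}


def to_storage_record(request):
    """Build the record to persist from an incoming request.

    The sender's address (request["remote_addr"], a hypothetical field)
    is never copied into the result, and unexpected keys are dropped.
    """
    payload = json.loads(request["body"])
    return {key: value for key, value in payload.items() if key in ALLOWED_KEYS}


request = {
    "remote_addr": "192.0.2.1",  # documentation-range example address
    "body": '{"startCount": {"value": 34}, "hostname": "example"}',
}
record = to_storage_record(request)
```

With this shape, the address only exists in the transport layer and nothing downstream of `to_storage_record` can see it.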

Regards,
Volker



Re: Telemetry Policy

2017-08-15 Thread Volker Krause
On Monday, 14 August 2017 22:26:36 CEST Thomas Pfeiffer wrote:
> On Sonntag, 13. August 2017 11:47:28 CEST Volker Krause wrote:
> > ## Minimalism
> > 
> > We only track the bare minimum of data necessary to answer specific
> > questions, we do not collect data preemptively or for exploratory
> > research.
> > In particular, this means:
> > - collected data must have a clear purpose
> 
> While from a privacy perspective this certainly makes sense, with my user
> researcher hat on I'm worried that this might severely limit the usefulness
> of the whole operation, at least if changes to what is being tracked can
> only be made with each new major release of an application.
> 
> Psychologists usually collect more information in their studies than they
> would need strictly to test their hypotheses. We don't do that because we
> just want to collect data or to sell them or whatever.
> No, we collect them because in reality, things are hardly ever as clear-cut
> as we had hypothesized. Our hypotheses are often based on correlations
> between two variables, but in reality, more often than not there is some
> other variable which we had not thought of before that affects one or both
> of the variables we're interested in, and thereby distorts the data.
> 
> Now if we only collected the data that we had a-priori hypotheses about,
> that would mean that after every study, we'd have to go back to the drawing
> board and define which variables to collect next time. This would make
> research both slow and very expensive. By collecting additional data,
> however, we have the chance to run additional exploratory tests after the
> fact, and uncover new hypotheses that we can then test in the next study.
> 
> In the case of KUserFeedback, fortunately cost is not really an issue
> because we don't pay our users for providing the data. Time, on the other
> hand, _is_ an issue. If we strictly only collect data if a hypothesis
> exists about them, that means the following:
> 
> T0: The day of a KDE Applications release, I have a hypothesis about a
> causal link between two variables regarding the usage of KAlgebra.
> 
> T+1day: I use my incredible charming skills to coerce Aleix into
> implementing triggers for collecting data about these two variables.
> 
> T+4 months: The next release ships these collection triggers, data comes in.
> 
> T+5 months: After one month's worth of data are collected, I analyze them.
> The numbers look weird, something is odd. Damn, seems like some other
> variable is in play there. I have a few candidates in mind, some are more
> likely to be the culprit than others.
> 
> T+6 months: I convince Aleix to implement triggers for all the candidates.
> He's reluctant because that seems to go against the minimalism rule, but I
> convince him that I'm really unsure and don't want to risk another release
> cycle only to find out we had tested the wrong variables.
> 
> T+8 months: The release with the new variables is out.
> 
> T+9 months: After a month's worth of data, I run my analysis again. Eureka!
> I've finally found my causal link!
> 
> T+10 months: We come up with an improvement to KAlgebra based on the link
> we've found, and it gets implemented.
> 
> T+12 months: A year after I formulated my first hypothesis, the fruits of
> the whole endeavor get into users' hands.
> 
> And this scenario does not even take into account that it may take months
> until our software reaches the big chunk of users who are on "stable
> distros".
> 
> So, long story short: While I agree that we should not just wildly collect
> everything we can, being able to start measuring variables only on the next
> release after a concrete hypothesis has been formulated about them could
> really slow us down.
> 
> Is there any possible way to mitigate this issue?

The latency is indeed a very valid concern, and we can't even estimate this 
properly yet (deployment latency is one of the first things to measure with 
telemetry IMHO). Expecting anything below several months is way too optimistic 
I think.

More aggressive preemptive tracking might avoid one cycle in your above 
example, but only if you actually manage to think about everything you will 
need in the end.

So to have the complete picture, what data would you want to collect if the
policy didn't restrict you to purpose-bound minimalism? Having a few
examples would make it easier to tweak the balance here I think.

Also note that if we published and freely licensed the raw data, any
exploratory research on it would still be possible, even if that wasn't the
original purpose of the data collection.

Technically there are of course ways to address all this, for example by data 
collection scripts provided by the server and executed by a KUserFeedback 
application-side runtime. That's actually how this started, based on Björn's 
initial wishlist, but I think it's clear why we didn't end up there :)

Regards,
Volker



Re: [kde-community] Re: radical proposal: move IRC to Rocket.Chat

2017-08-15 Thread Jonathan Riddell
On Fri, Aug 11, 2017 at 09:49:11PM +0900, Eike Hein wrote:
> 
> I've given some more thought to Matrix as a contender and I'm
> increasingly liking this option among the available contenders.

We have the possibility of moving to Matrix and allowing individual
IRC channels to move to real Matrix channels in their own time.

But alas we've still not heard from the groups who have chosen not to
use IRC to see if it would interest them. VDG, Promo, anyone?

Jonathan


Re: [kde-community] Re: radical proposal: move IRC to Rocket.Chat

2017-08-15 Thread Thomas Pfeiffer

> On 15 Aug 2017, at 11:42, Jonathan Riddell wrote:
> 
> On Fri, Aug 11, 2017 at 09:49:11PM +0900, Eike Hein wrote:
>> 
>> I've given some more thought to Matrix as a contender and I'm
>> increasingly liking this option among the available contenders.
> 
> We have the possibility of moving to Matrix and allowing individual
> IRC channels to move to real Matrix channels in their own time.
> 
> But alas we've still not heard from the groups who have chosen not to
> use IRC to see if it would interest them. VDG, Promo, anyone?
> 

The VDG has contributed to the Etherpad, so their requirements are covered in 
there.



Re: [kde-community] Re: radical proposal: move IRC to Rocket.Chat

2017-08-15 Thread Jonathan Riddell
On 15 August 2017 at 10:44, Thomas Pfeiffer wrote:
> The VDG has contributed to the Etherpad, so their requirements are covered in 
> there.

How do we evaluate whether Matrix/Riot covers them? Stuff like "Have a UI
that someone who is < 20 years old and cares about the looks of a UI
would use" is hard to evaluate, and much of the rest is also about feel,
which is hard to quantify.

Jonathan


Re: Proposal: Have the Community Set Ambitious Goals for Itself

2017-08-15 Thread Eike Hein


On 08/15/2017 07:47 AM, Lydia Pintscher wrote:
> On Mon, Aug 14, 2017 at 4:48 PM, Mirko Boehm - KDE wrote:
>> I have seen only agreement and support for the proposal. What would be the
>> required steps to make an official announcement, and encourage people to
>> participate?
> 
> If I get at least two people to agree in this thread that they will
> submit a goal I commit to making the process work according to the
> proposed timeline.

I will.


Cheers,
Eike


Re: [kde-community] Re: radical proposal: move IRC to Rocket.Chat

2017-08-15 Thread Thomas Pfeiffer

> On 15 Aug 2017, at 12:09, Jonathan Riddell wrote:
> 
> On 15 August 2017 at 10:44, Thomas Pfeiffer wrote:
>> The VDG has contributed to the Etherpad, so their requirements are covered 
>> in there.
> 
> How do we evaluate whether Matrix/Riot covers them? Stuff like "Have a UI
> that someone who is < 20 years old and cares about the looks of a UI
> would use" is hard to evaluate, and much of the rest is also about feel,
> which is hard to quantify.

That one was my initial copy & paste from the mailing list thread. I admit it’s 
not very good and I’ll have to make it more objective. Other VDG members were 
better at that because they didn’t come from the emotion-laden email thread.

In general the Etherpad has to be cleaned up, the more discussion-y parts have 
to be removed.

I’ll go over it tonight and make sure it’s in a usable shape, but any help with 
that is welcome, so everybody please feel free to edit anything that isn’t an 
objective requirement to turn it into one, and just delete the comments.

Re: Proposal: Have the Community Set Ambitious Goals for Itself

2017-08-15 Thread Bhushan Shah
On Tue, Aug 15, 2017 at 12:47:15AM +0200, Lydia Pintscher wrote:
> If I get at least two people to agree in this thread that they will
> submit a goal I commit to making the process work according to the
> proposed timeline.

/me too! :)

-- 
Bhushan Shah
http://blog.bshah.in
IRC Nick : bshah on Freenode
GPG key fingerprint : 0AAC 775B B643 7A8D 9AF7 A3AC FE07 8411 7FBC E11D




Re: Collecting requirements for a KDE-wide instant messaging solution (was: Re: radical proposal: move IRC to Rocket.Chat)

2017-08-15 Thread Thomas Pfeiffer
Hey everyone,
just a quick progress update:

I have now cleaned up https://notes.kde.org/p/KDE_IM_requirements by removing
duplicates, removing all discussion / comments (so only plain requirements are
left) and rewording most requirements so that they have a somewhat common
wording.

The next step will be to turn this into a Kano survey which will be used to 
prioritize them (will do that tomorrow).

Cheers,
Thomas





Re: Proposal: Have the Community Set Ambitious Goals for Itself

2017-08-15 Thread Lydia Pintscher
OK great. Thanks! I'll look into the details by the end of the week. If
you want to help, let me know by email.


Cheers
Lydia