Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-07 Thread Sébastien Luttringer
On Sun, 2017-03-05 at 14:35 +0100, Lukas Fleischer wrote:
> ***

To me,
- It makes sense to have the AUR RPC searchable by username.
- The information is already public, so the question is more about a
  proper API interface vs. crawling.
- The solution should be available to every registered user, not only
  to a researcher.
so, go on.
Regards,

-- 
Sébastien "Seblu" Luttringer
https://seblu.net | Twitter: @seblu42
GPG: 0x2072D77A




Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-07 Thread Allan McRae
On 05/03/17 23:35, Lukas Fleischer wrote:
> Hi,
> 
> I was recently contacted by a Polish researcher asking for a list of AUR
> account names.

Please share this data.  It looks like this person is doing genuine
research into networks within open-source software communities.

Providing easy access to already public data should be encouraged in all
fields of research.

Allan


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-07 Thread Bartłomiej Piotrowski
On 2017-03-05 14:35, Lukas Fleischer wrote:
> 

As I already said on IRC, I do not consider a username private data. If
someone is afraid of having their activity on AUR connected with GitHub
(or any other service), they shouldn't use the same username in the first
place, especially given how easily the data can be scraped from AUR
itself and the git repositories. +1 for sharing it, either by RPC or by
giving the list to Dorota.

Bartłomiej





Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-07 Thread Lukas Fleischer
On Tue, 07 Mar 2017 at 09:23:47, Gaetan Bisson wrote:
> [2017-03-07 09:05:28 +0100] Lukas Fleischer:
> > If we *really* think that we need to keep user names secret, I think we
> > should take down the whole AUR website because we already share this
> > information everywhere without explicitly telling our users we do so. Or
> > at least censor the user names on every single page they appear on which
> > would be a lot of work.
> 
> Intent matters a lot in court. It's not just pure logic arguments.
> There's a big difference between showing the usernames of package
> maintainers, or comment posters, as users would expect, and plainly
> giving the whole list out to a third party --- as they wouldn't expect.

As I already mentioned in my initial email (below the part you quoted in
your first reply), my idea was to make the list of user names obtainable
via the RPC interface, similar to the GitHub API for user names [1]. If
you skipped that part, please read the complete email. The basic idea is
that user names are public anyway, so it makes sense to provide a sane
interface for retrieving them.

Regards,
Lukas

[1] https://developer.github.com/v3/users/


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-07 Thread Gaetan Bisson
[2017-03-07 09:05:28 +0100] Lukas Fleischer:
> If we *really* think that we need to keep user names secret, I think we
> should take down the whole AUR website because we already share this
> information everywhere without explicitly telling our users we do so. Or
> at least censor the user names on every single page they appear on which
> would be a lot of work.

Intent matters a lot in court. It's not just pure logic arguments.
There's a big difference between showing the usernames of package
maintainers, or comment posters, as users would expect, and plainly
giving the whole list out to a third party --- as they wouldn't expect.

Cheers.

-- 
Gaetan


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-07 Thread Lukas Fleischer
On Sun, 05 Mar 2017 at 22:54:07, Gaetan Bisson wrote:
> [2017-03-05 14:35:05 +0100] Lukas Fleischer:
> > My original question was: Are we fine with sharing the list of AUR
> > account names (only user names, no real names or email addresses) with
> > a researcher that seems trustworthy and agrees to not share the data in
> > any form other than the resulting anonymized statistics?
> 
> I am strongly against this because it seems to me it would put us in a
> very weak legal position (though as always IANAL).
> 
> The simple argument is that when users sign up for an AUR account they
> have no expectation that any data they submit (including their username)
> might be shared with a third-party.
> 
> Now as you've noticed with other Internet services, sharing data with
> third-parties is kind of a big deal. To the point that many services can
> only be used after you've agreed to some kind of EULA where you consent
> to your data being shared. For us it's even worse, there's no EULA, just
> what users might expect us to do with their data. So please let's err on
> the safe side here.
> [...]

I gave this a second thought and I still do not see how publishing the
list of user names would lead to a very weak legal position, especially
if you consider our legal position relative to the current situation.

If we *really* think that we need to keep user names secret, I think we
should take down the whole AUR website because we already share this
information everywhere without explicitly telling our users we do so. Or
at least censor the user names on every single page they appear on which
would be a lot of work.

Maybe we should do what Phil suggested in the email I just forwarded to
the list (forgot to fix the In-Reply-To and References headers, sorry).
Write ToS as soon as possible, make users accept them when logging in
and send notifications to all users. Then delete all remaining accounts
after a grace period. A nice side benefit of this is that we would make
sure all passwords are migrated from MD5 to bcrypt, see [1, 2].
Opinions?
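
The reason a forced login helps with the hash migration is that a stored
hash can only be upgraded while the plain-text password is at hand, which
is the moment the user logs in. A minimal sketch of that pattern follows.
Everything in it is an assumption made for illustration: the record
layout, the legacy formula md5(salt + password), and the use of PBKDF2
from the Python standard library as a stand-in for bcrypt. It is not the
aurweb code.

```python
import hashlib
import hmac
import os


def legacy_md5(password: str, salt: str) -> str:
    # Assumed legacy layout for this sketch: hex(md5(salt + password)).
    return hashlib.md5((salt + password).encode()).hexdigest()


def strong_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for bcrypt so the sketch needs only the standard library.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def login(user: dict, password: str) -> bool:
    """Check a password and upgrade a legacy MD5 record on success."""
    if user["scheme"] == "md5":
        expected = legacy_md5(password, user["salt"])
        if not hmac.compare_digest(user["hash"], expected):
            return False
        # The plain-text password is only known here, so rehash it now.
        new_salt = os.urandom(16)
        user.update(scheme="pbkdf2", salt=new_salt,
                    hash=strong_hash(password, new_salt))
        return True
    return hmac.compare_digest(user["hash"],
                               strong_hash(password, user["salt"]))


# Hypothetical account record, not real data.
user = {"scheme": "md5", "salt": "s1", "hash": legacy_md5("hunter2", "s1")}
```

Accounts that never log in during the grace period keep their legacy
hash, which is why deleting them afterwards would complete the migration.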

Regards,
Lukas

[1] https://lists.archlinux.org/pipermail/aur-dev/2017-February/004291.html
[2] https://git.archlinux.org/aurweb.git/commit/?id=29a4870


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Gaetan Bisson
Dave,

You appear to have inserted some text in the middle of Lukas' message
with no indication whatsoever of which paragraphs are yours and which are
his. I'm sure GMail can tell them apart, but for those of us who use
run-of-the-mill email clients, could you find a way to fix this behavior?

I'm attaching your mail as it got into my inbox.

Cheers.

-- 
Gaetan


dave.mbox
Description: application/mbox


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Dave Reisner
On Sun, Mar 5, 2017 at 10:34 AM, Lukas Fleischer wrote:

> On Sun, 05 Mar 2017 at 19:11:33, Dave Reisner wrote:
> > As long as we publish a list of all available packages, it doesn't matter
> > if we comply with this request -- the information is already obtainable
> > through RPC requests.
>
> Could you elaborate, please? I do not see how this information is
> already obtainable. In particular, there is no easy way to obtain user
> names of "inactive" accounts (no comments, no package submissions, no
> requests, ...) -- is there?
>

Ah, I guess you're right -- there isn't.


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Gaetan Bisson
[2017-03-05 14:35:05 +0100] Lukas Fleischer:
> My original question was: Are we fine with sharing the list of AUR
> account names (only user names, no real names or email addresses) with
> a researcher that seems trustworthy and agrees to not share the data in
> any form other than the resulting anonymized statistics?

I am strongly against this because it seems to me it would put us in a
very weak legal position (though as always IANAL).

The simple argument is that when users sign up for an AUR account they
have no expectation that any data they submit (including their username)
might be shared with a third-party.

Now as you've noticed with other Internet services, sharing data with
third-parties is kind of a big deal. To the point that many services can
only be used after you've agreed to some kind of EULA where you consent
to your data being shared. For us it's even worse, there's no EULA, just
what users might expect us to do with their data. So please let's err on
the safe side here.

Surely there's tons of other username lists on the Internet this
researcher can use...

Cheers.

-- 
Gaetan


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Lukas Fleischer
On Sun, 05 Mar 2017 at 19:11:33, Dave Reisner wrote:
> As long as we publish a list of all available packages, it doesn't matter
> if we comply with this request -- the information is already obtainable
> through RPC requests.

Could you elaborate, please? I do not see how this information is
already obtainable. In particular, there is no easy way to obtain user
names of "inactive" accounts (no comments, no package submissions, no
requests, ...) -- is there?


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Lukas Fleischer
On Sun, 05 Mar 2017 at 18:40:36, Thorsten Töpper wrote:
> As stated on IRC, I'm against handing out user data (including nick
> names) to a 3rd party. Personally due to the mentioned privacy stuff, but
> also because of the legal problems we may run into as we don't have a
> ToS. So under these circumstances I have a bad feeling about making this
> information available to someone else, even if the person leaves a
> proper impression.

While I agree that not having ToS might be a problem, I don't see why
publicly advertising the user name list is an issue, compared to the
situation we are currently in. We already make the user name public in
so many contexts (pretty much every action that requires an account).
Given that, it should be pretty clear that you implicitly agree to share
your user name by registering (IANAL, so this might be wrong from a
legal standpoint, though).

> 
> Regarding the crawler I put forward as a workaround for the researcher
> to collect the already public names, I don't understand why you extend
> this to brute-forcing the account pages or going through the archives
> of the mailing list. The suggestion I made was that it's simple to
> collect a list of all packages stored in AUR and then get the common
> fields of original submitter, maintainers and people who made comments
> for each package, either by using a plain GET to request the HTML page
> for the package or by using the interfaces available (I'm not familiar
> with those and what they provide). This does not involve any brute-force
> attacks, as the package names are available. Also, no login is necessary
> for the scripts doing this.

I only mentioned various possibilities to already obtain a list of user
names. Theoretically, the complete list is already available online but
brute-forcing all account details sites is not feasible in practice.

Parsing the package details pages is a first naive approach that works
in practice. Parsing even more sites is the next logical step. Scanning
account details pages for a list of known user names gives you even more
information and is still practically feasible.
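
To make the "first naive approach" concrete, here is a small sketch of
collecting maintainer names from RPC search results instead of parsing
HTML. It assumes the version 5 RPC layout (a `results` list whose entries
carry a `Maintainer` field) and the query parameters `v`, `type` and
`arg`. The helper names and the sample payload are made up for
illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: RPC endpoint and v5 "search" query layout.
RPC_URL = "https://aur.archlinux.org/rpc/"


def search_url(term: str) -> str:
    """Build an RPC search URL for one search term."""
    return RPC_URL + "?" + urlencode({"v": 5, "type": "search", "arg": term})


def maintainers(payload: dict) -> set:
    """Collect the distinct maintainer names from one decoded response."""
    names = set()
    for pkg in payload.get("results", []):
        name = pkg.get("Maintainer")
        if name:  # orphaned packages carry no maintainer
            names.add(name)
    return names


# Hand-written sample response, not real AUR data.
sample = {"results": [
    {"Name": "foo", "Maintainer": "alice"},
    {"Name": "bar", "Maintainer": None},
    {"Name": "baz", "Maintainer": "alice"},
]}
print(sorted(maintainers(sample)))
```

Note that this only ever yields accounts that maintain a package.
Commenters, requesters and inactive accounts stay invisible to it, which
is the gap discussed in this thread.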

> 
> The names gathered this way are already public and can be found with
> every large search engine. Sure this will create some load, but I
> assume any reasonable person would put a short sleep in between the
> requests.

All true, but if we are fine with sharing this information I still do
not see why we should not provide a sane interface.

> 
> I agree that we should get a ToS for the AUR.

Volunteers? :)

Regards,
Lukas


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Dave Reisner
On Mar 5, 2017 8:35 AM, "Lukas Fleischer" wrote:

Hi,

I was recently contacted by a Polish researcher asking for a list of AUR
account names. I did not expect this to be controversial but a couple of
Trusted Users raised concerns on IRC, so I decided to move this to the
public mailing list and discuss the whole topic in generality. I would
like to hear more opinions but please read the whole email and give it a
second thought before simply bringing up the usual privacy arguments
mentioned below.

My original question was: Are we fine with sharing the list of AUR
account names (only user names, no real names or email addresses) with
a researcher that seems trustworthy and agrees to not share the data in
any form other than the resulting anonymized statistics?


As long as we publish a list of all available packages, it doesn't matter
if we comply with this request -- the information is already obtainable
through RPC requests.

In this particular case, we are talking about Dorota Celinska [1] from
the University of Warsaw, Faculty of Economic Sciences [2], see [3] for
a list of her publications and [4] for a summary of her research project
funded recently by the Polish National Science Centre. She needs the
list of user names to perform a segmentation analysis, including users
who were active on the older AUR releases but do not show any
activity on AUR 4. She would also like to use the user names as
identifiers to establish connections with other platforms, such as
GitHub.

The next question is: Would it make sense to even make this data
publicly available? Would it make sense to extend our RPC interface such
that one can search for user names? GitHub, for example, already
provides such an interface [5]. Let me quickly summarize some arguments
for this idea which came up on IRC:

* User names are mostly identifiers. It is questionable whether they
  can/should be considered personal/private information. Maybe this can
  only be answered by a lawyer, though.

* The user names of all accounts with any kind of public activity, like
  uploading a package, filing a request, writing a comment, are public
  already.

* After logging into the aurweb interface, you can already check whether
  an account with a given user name exists because the account details
  page URIs have the form https://aur.archlinux.org/account/$username.
  This means that for any platform providing a list of user names (such
  as GitHub), you can "establish connections" with the AUR already.

Now the arguments against:

* Principle of data economy: We should not share any kind of information
  we do not need to share.

* Sharing user names lowers the threshold for sharing other information
  which is considered more confidential.

* Users can (and should) already use crawlers to fetch the user names.
  For example, the user names of all package maintainers and comment
  authors appear on the package details pages. The names of all users
  filing package requests appear in the mailing list archives etc.

* We do not have ToS so we better not share anything.

I, personally, find the second last argument a very weak one. Telling
users to build crawlers scraping and brute-forcing our HTML pages makes
life difficult for both them and us. What do you think?

On the other side of the coin, the last argument is a very good one and
it brings me to my last point. Independently of the outcome of this
discussion, I think we should add some ToS that users need to agree upon
when registering. It should contain information on liability and on
privacy. Is anybody willing to write a draft? Do we need the support of
a lawyer here?

Thank you for your time and have a nice Sunday!

Regards,
Lukas

[1] http://coin.wne.uw.edu.pl/dcelinska/en/
[2] https://www.wne.uw.edu.pl/index.php/en/
[3] http://coin.wne.uw.edu.pl/dcelinska/en/pages/publications.html
[4] https://ncn.gov.pl/sites/default/files/listy-rankingowe/2016-03-15/streszczenia/337724-en.pdf
[5] https://developer.github.com/v3/users/


Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Thorsten Töpper
On Sun, 05 Mar 2017 14:35:05 +0100
Lukas Fleischer wrote:

> Hi,
> 
> I was recently contacted by a Polish researcher asking for a list of
> AUR account names. I did not expect this to be controversial but a
> couple of Trusted Users raised concerns on IRC, so I decided to move
> this to the public mailing list and discuss the whole topic in
> generality. I would like to hear more opinions but please read the
> whole email and give it a second thought before simply bringing up
> the usual privacy arguments mentioned below.
> 
> My original question was: Are we fine with sharing the list of AUR
> account names (only user names, no real names or email addresses)
> with a researcher that seems trustworthy and agrees to not share the
> data in any form other than the resulting anonymized statistics?
> 
> In this particular case, we are talking about Dorota Celinska [1] from
> the University of Warsaw, Faculty of Economic Sciences [2], see [3]
> for a list of her publications and [4] for a summary of her research
> project funded recently by the Polish National Science Centre. She
> needs the list of user names to perform a segmentation analysis,
> including users who were active on the older AUR releases but do
> not show any activity on AUR 4. She would also like to use the user
> names as identifiers to establish connections with other platforms,
> such as GitHub.
> 
> The next question is: Would it make sense to even make this data
> publicly available? Would it make sense to extend our RPC interface
> such that one can search for user names? GitHub, for example, already
> provides such an interface [5]. Let me quickly summarize some
> arguments for this idea which came up on IRC:
> 
> * User names are mostly identifiers. It is questionable whether they
>   can/should be considered personal/private information. Maybe this
> can only be answered by a lawyer, though.
> 
> * The user names of all accounts with any kind of public activity,
> like uploading a package, filing a request, writing a comment, are
> public already.
> 
> * After logging into the aurweb interface, you can already check
> whether an account with a given user name exists because the account
> details page URIs have the form
> https://aur.archlinux.org/account/$username. This means that for any
> platform providing a list of user names (such as GitHub), you can
> "establish connections" with the AUR already.
> 
> Now the arguments against:
> 
> * Principle of data economy: We should not share any kind of
> information we do not need to share.
> 
> * Sharing user names lowers the threshold for sharing other
> information which is considered more confidential.
> 
> * Users can (and should) already use crawlers to fetch the user names.
>   For example, the user names of all package maintainers and comment
>   authors appear on the package details pages. The names of all users
>   filing package requests appear in the mailing list archives etc.
> 
> * We do not have ToS so we better not share anything.
> 
> I, personally, find the second last argument a very weak one. Telling
> users to build crawlers scraping and brute-forcing our HTML pages makes
> life difficult for both them and us. What do you think?
> 
> On the other side of the coin, the last argument is a very good one
> and it brings me to my last point. Independently of the outcome of
> this discussion, I think we should add some ToS that users need to
> agree upon when registering. It should contain information on
> liability and on privacy. Is anybody willing to write a draft? Do we
> need the support of a lawyer here?
> 
> Thank you for your time and have a nice Sunday!
> 
> Regards,
> Lukas
 

Hello,

As stated on IRC, I'm against handing out user data (including nick
names) to a 3rd party. Personally due to the mentioned privacy stuff, but
also because of the legal problems we may run into as we don't have a
ToS. So under these circumstances I have a bad feeling about making this
information available to someone else, even if the person leaves a
proper impression.

Regarding the crawler I put forward as a workaround for the researcher
to collect the already public names, I don't understand why you extend
this to brute-forcing the account pages or going through the archives
of the mailing list. The suggestion I made was that it's simple to
collect a list of all packages stored in AUR and then get the common
fields of original submitter, maintainers and people who made comments
for each package, either by using a plain GET to request the HTML page
for the package or by using the interfaces available (I'm not familiar
with those and what they provide). This does not involve any brute-force
attacks, as the package names are available. Also, no login is necessary
for the scripts doing this.

The names gathered this way are already public and can be found with
every large search engine. Sure this will create some load, but I
assume any reasonable person would put a short sleep in between the
requests.

I agree that we should get a ToS for the AUR.

[arch-dev-public] AUR ToS (aka making AUR user names public)

2017-03-05 Thread Lukas Fleischer
Hi,

I was recently contacted by a Polish researcher asking for a list of AUR
account names. I did not expect this to be controversial but a couple of
Trusted Users raised concerns on IRC, so I decided to move this to the
public mailing list and discuss the whole topic in generality. I would
like to hear more opinions but please read the whole email and give it a
second thought before simply bringing up the usual privacy arguments
mentioned below.

My original question was: Are we fine with sharing the list of AUR
account names (only user names, no real names or email addresses) with
a researcher that seems trustworthy and agrees to not share the data in
any form other than the resulting anonymized statistics?

In this particular case, we are talking about Dorota Celinska [1] from
the University of Warsaw, Faculty of Economic Sciences [2], see [3] for
a list of her publications and [4] for a summary of her research project
funded recently by the Polish National Science Centre. She needs the
list of user names to perform a segmentation analysis, including users
who were active on the older AUR releases but do not show any
activity on AUR 4. She would also like to use the user names as
identifiers to establish connections with other platforms, such as
GitHub.

The next question is: Would it make sense to even make this data
publicly available? Would it make sense to extend our RPC interface such
that one can search for user names? GitHub, for example, already
provides such an interface [5]. Let me quickly summarize some arguments
for this idea which came up on IRC:

* User names are mostly identifiers. It is questionable whether they
  can/should be considered personal/private information. Maybe this can
  only be answered by a lawyer, though.

* The user names of all accounts with any kind of public activity, like
  uploading a package, filing a request, writing a comment, are public
  already.

* After logging into the aurweb interface, you can already check whether
  an account with a given user name exists because the account details
  page URIs have the form https://aur.archlinux.org/account/$username.
  This means that for any platform providing a list of user names (such
  as GitHub), you can "establish connections" with the AUR already.
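
The existence check described in the last point can be sketched as
follows. The URI form is the one given above. The `page_exists` callback
is a hypothetical placeholder for a logged-in HTTP request and is
replaced here by a lookup in a hand-written set, so nothing is actually
fetched.

```python
from typing import Callable, Iterable, List
from urllib.parse import quote

ACCOUNT_BASE = "https://aur.archlinux.org/account/"


def account_url(username: str) -> str:
    """Return the account details URI for a user name."""
    return ACCOUNT_BASE + quote(username, safe="")


def known_on_aur(candidates: Iterable[str],
                 page_exists: Callable[[str], bool]) -> List[str]:
    """Keep the candidate names whose account page is reported to exist."""
    return [name for name in candidates if page_exists(account_url(name))]


# Stand-in for the real site: only "alice" has an account page here.
fake_site = {account_url("alice")}
print(known_on_aur(["alice", "bob"], lambda url: url in fake_site))
```

Fed with a list of user names from another platform, such a loop is all
it takes to link accounts, one request per candidate name.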

Now the arguments against:

* Principle of data economy: We should not share any kind of information
  we do not need to share.

* Sharing user names lowers the threshold for sharing other information
  which is considered more confidential.

* Users can (and should) already use crawlers to fetch the user names.
  For example, the user names of all package maintainers and comment
  authors appear on the package details pages. The names of all users
  filing package requests appear in the mailing list archives etc.

* We do not have ToS so we better not share anything.

I, personally, find the second last argument a very weak one. Telling
users to build crawlers scraping and brute-forcing our HTML pages makes
life difficult for both them and us. What do you think?

On the other side of the coin, the last argument is a very good one and
it brings me to my last point. Independently of the outcome of this
discussion, I think we should add some ToS that users need to agree upon
when registering. It should contain information on liability and on
privacy. Is anybody willing to write a draft? Do we need the support of
a lawyer here?

Thank you for your time and have a nice Sunday!

Regards,
Lukas

[1] http://coin.wne.uw.edu.pl/dcelinska/en/
[2] https://www.wne.uw.edu.pl/index.php/en/
[3] http://coin.wne.uw.edu.pl/dcelinska/en/pages/publications.html
[4] https://ncn.gov.pl/sites/default/files/listy-rankingowe/2016-03-15/streszczenia/337724-en.pdf
[5] https://developer.github.com/v3/users/