Re: [Haskell-community] 2018 state of Haskell survey results

Gershom B, Sun, 18 Nov 2018 21:07:31 -0800

On Sun, Nov 18, 2018 at 11:20 PM Richard Eisenberg <[email protected]> wrote:
>
> I have not analyzed the data myself, but I wonder how we jumped to the 
> conclusion that the troll was trying to promote Stack. Is there statistical 
> data that supports that conclusion? For example, just reading this thread, it 
> sounds like the bogus responses also really don't like the new release 
> schedule. Maybe the troll wants the old release schedule back and was just 
> lazy about programming the tool to vary the stack/cabal question answers 
> adequately.


Roughly 90% of the bogus responses disliked the new GHC release
schedule and 10% left the answer blank. As far as I know, 100% of the
bogus responses said they used stack exclusively. The answers to almost
every other question (except, I think, for targeted platform?) varied
significantly, although for the most part they followed uniform,
linear, or normal distributions. So as guesses go, this seems pretty
strong.
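
A minimal sketch of this kind of check, in Python, assuming each flagged
response is a dict from question name to answer (with None for a blank
answer). The field names, sample data, and the 0.99 threshold are all
hypothetical, and this is not the script that was actually run on the
survey. It looks for questions where one answer is pinned across nearly
every flagged response while the others vary.

```python
from collections import Counter


def answer_shares(responses, question):
    """Return {answer: fraction of responses} for one question.

    A missing or blank answer is counted under the key None.
    """
    counts = Counter(r.get(question) for r in responses)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def invariant_questions(responses, threshold=0.99):
    """List (question, answer) pairs where a single non-blank answer
    accounts for at least `threshold` of the given responses."""
    questions = sorted({q for r in responses for q in r})
    pinned = []
    for q in questions:
        shares = answer_shares(responses, q)
        top_answer, top_share = max(shares.items(), key=lambda kv: kv[1])
        if top_answer is not None and top_share >= threshold:
            pinned.append((q, top_answer))
    return pinned


# Hypothetical flagged responses: the build tool never changes, the
# release schedule answer is "dislike" or blank, the editor varies.
editors = ["vim", "emacs", "vscode"]
suspect = [
    {
        "build_tool": "stack only",
        "release_schedule": "dislike" if i < 9 else None,
        "editor": editors[i % 3],
    }
    for i in range(10)
]

print(invariant_questions(suspect))  # [('build_tool', 'stack only')]
print(answer_shares(suspect, "release_schedule"))
```

On this made-up sample only the build tool question is pinned, while the
release schedule question splits 90% "dislike" and 10% blank, which is
the shape described above.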

I will also say that, though there's speculation about "false flags"
and other silliness floating around, I personally have a very good
guess as to who did this. There's one well-known troll who has these
preoccupations and is known for creating serial sockpuppet accounts,
and is just the right amount of obsessed to do something like this. A
few of the bogus responses actually had comments, and the comments
were all written in a voice that was unmistakably this troll's as
well. Occam's razor seems to apply.

Finally, let me add why I don't think this was a "false flag" -- while
there were enough telltale markers that the fake answers could be
detected, I don't think this was on purpose. There was _too much_
effort put into distributions of other choices, etc. If they had
wanted the fakes to be detected they would have left much stronger
evidence. Rather, from a forensic standpoint, it seems pretty clear
to me that the pattern of data is that of someone _trying_ to cover
their tracks, but making four or five errors which I could assemble
into a pattern. If they hadn't made those errors -- likely based on
bad priors about what the organic data would be that theirs would need
to "mesh" into -- then I think the deception would have been much
harder to detect.

--Gershom

> Given the contention around cabal vs stack, I agree that sociological 
> concerns suggest that the troll meant to tilt those scales. But I wouldn't 
> want a public accusation without at least some statistical analysis that 
> independently supports that conclusion.
>
> In any case, thanks to all for putting this together!
>
> Richard
>
> On Nov 18, 2018, at 4:31 PM, Taylor Fausak <[email protected]> wrote:
>
> Oops, the ordering of the answer choices is manual because some questions 
> have a natural order while others should just be most to least popular. I've 
> made another run through to make sure everything is sorted properly. I'll 
> probably hit publish in the next half hour or so unless there are any 
> objections.
>
> https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c492eb6229a1b5c7/_posts/2018-11-18-2018-state-of-haskell-survey-results.markdown
>
>
> On Sun, Nov 18, 2018, at 3:07 PM, Gershom B wrote:
>
> The language extensions section doesn’t appear to be sorted properly. Outside 
> of that, I think that these results are looking much better and any effort to 
> find any additional outliers is probably not worth it for the moment. Thanks 
> for your work on this, and I appreciate you being responsive and attentive 
> when problems with the data were pointed out. There’s certainly some 
> interesting and helpful information to be gleaned from this data.
>
> Cheers,
> Gershom
>
> On November 18, 2018 at 2:55:10 PM, Taylor Fausak ([email protected]) wrote:
>
> Ok, I updated the function that checks for bad responses, re-ran the script, 
> and updated the announcement along with all the assets (charts, tables, and 
> CSV). Hopefully it's the last time, as I can't justify spending much more 
> time on this.
>
> https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4ce6a82a8b1a73f/_posts/2018-11-18-2018-state-of-haskell-survey-results.markdown
>
>
> On Sun, Nov 18, 2018, at 2:32 PM, Michael Snoyman wrote:
>
> Just wanted to add in: good catch Gershom on identifying the problem, and 
> thank you Taylor for working to remove the bogus responses from the report.
>
> On 18 Nov 2018, at 21:17, Taylor Fausak <[email protected]> wrote:
>
> Great catch, Gershom! There are indeed about 300 responses that tick all the 
> boxes except for disliking the new GHC release schedule. The main thing the 
> attacker seemed to be interested in was over-representing Stack and Stackage. 
> Also, bizarrely, Java.
>
> That brings the number of bogus responses up to 3,735, which puts the number 
> of legitimate responses at 1,361. For context, last year's survey asked far 
> fewer questions and had 1,335 responses.
>
>
> On Sun, Nov 18, 2018, at 1:26 PM, Imants Cekusins wrote:
>
> What if the announcement mentioned a large number of potentially bogus 
> responses, explained the grounds for this conclusion, with a new survey 
> conducted early next year?
>
> The next survey would then need to be done differently from this one somehow. 
> To improve the reliability, some authentication may be necessary.
>
>
> Maybe the Stack and Cabal questions could be split into separate surveys, 
> conducted by their maintainers through their own channels?
>
> I'm not sure how much value there is in exact numbers of Stack or Cabal 
> users. Both groups are large enough, and the maintainers of both are aware 
> of the usage stats.
>
> Is either library likely to be influenced by this survey?
> _______________________________________________
> Haskell-community mailing list
> [email protected]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-community
