On May 8, 2014, at 5:22 PM, Donald Stufft <don...@stufft.io> wrote:

>> Socially, this change does not seem to be having the effect of
>> persuading more package developers to host on PyPI. The stick doesn't
>> appear to have worked, maybe we should be trying to find a carrot?
> 
> Do you have any data to point to that says it hasn’t worked? Just to see
> what impact it has had, I’m running my scripts again that I ran a year
> ago to see what has changed, already I can see they are processing
> MUCH faster than last year.

The data has finished processing; it represents a time diff of approximately
one year. The pip release that caused all of this was released about 4-5 months
ago.

Overall PyPI has seen a 50% growth in installable projects in that time. If the
change had had no effect we'd expect to see a ~50% increase across the
board. However, what we've seen is a 60% (+10% of expected) increase in
projects that can only be installed from PyPI and a 12% decrease in projects
that have any unsafe files (-62% of expected).
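
As a rough sketch of the arithmetic behind the "of expected" figures: each one
is the observed change minus the ~50% baseline growth, in percentage points.
The function name below is purely illustrative and is not part of the scripts
used to produce the data.

```python
def deviation_from_expected(observed_pct, expected_pct=50):
    """Difference, in percentage points, between observed and expected growth.

    expected_pct defaults to the ~50% overall growth in installable projects.
    """
    return observed_pct - expected_pct


# Projects that can only be installed from PyPI grew by 60%.
pypi_only = deviation_from_expected(60)

# Projects that have any unsafe files shrank by 12%.
unsafe = deviation_from_expected(-12)

print(f"PyPI-only projects: {pypi_only:+d} points vs. expected")
print(f"Projects with unsafe files: {unsafe:+d} points vs. expected")
```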

Furthermore, we can see that if pip were to change the default of
--allow-all-external it would take 23 projects from unable to be installed by
default to able to be installed by default. This represents 0.2% of installable
projects on PyPI. For an additional 40 projects, it would make one or more
additional files able to be downloaded by default.

Some other data points:

* We've gone from 86% of projects being installable from PyPI to 92%.
* We've gone from 5% of projects being only unsafely installable to 3%.
* We've gone from 14% of projects having any files unsafe to install to 8%.
* We've gone from 0.004% of projects being safely hosted externally to 0.2%.

Looking at these numbers I think it's safe to say that the "hosting hygiene"
of a PyPI project is more likely to be in a better state now than it was a
year ago. We cannot state for a fact whether this is because of this change or
not; however, given that the fallout is ~23 (or ~63) projects out of 38,835, I
think it is incredibly reasonable to leave the defaults alone, since there is
a reasonably high chance that they played at least some part in that change.

I'd love to get these numbers to the point where the number of projects
installable strictly from PyPI is 100% (or at least 100% installable safely);
however, 92% (or 92.2%) is getting pretty close to that, and hopefully that
number will just continue to grow until it hits 100%.

For reference, here are the raw numbers as well as a summary of the data:

    https://gist.github.com/dstufft/b14008d11c0a5760dbed

The repository containing the raw data, as well as the scripts used to collect
and process it, is here:

    https://github.com/dstufft/pypi.linkcheck

linkcollector.py collects the data, linkwriter.py writes out the JSON file, and
stats2.py processes it and gives the numbers from the gist above. links.json is
the data from a year ago, and 2014-05-08.links.json is the data from today.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev