Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Nick Coghlan
A couple of significant upcoming changes:

"build label" will be renamed as "source label" (since it refers to
the common unbuilt source rather than a specific build)
"version URL" will be renamed as "source URL" (same rationale)

The current names are ambiguous as to whether they refer to the source
code for the version or can be used to refer to built versions. Since
they're specifically for source references (you need to add at least
PEP 425 compatibility tags to construct a built reference), it makes
sense to change the names.

A more minor change is that the "organization" type/role for contacts
will go away. Organizations will be able to have any of the defined
roles (author, maintainer, contributor), and if we later decide we need
a programmatic means to distinguish abstract organisations from flesh
and blood humans, we can consider adding a new mechanism.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Nick Coghlan
On Tue, May 28, 2013 at 7:28 AM, Donald Stufft  wrote:
> On May 27, 2013, at 10:44 AM, Ronald Oussoren 
> wrote:
> The versioning spec mentions that distribution tools may refuse to publish
> distributions that pin the versions of dependencies. I understand why this
> is needed, and agree in general, but have a use case that I don't know how to
> express without pinning.
>
> In particular, PyObjC consists of a number of distributions (pyobjc-core,
> pyobjc-framework-Cocoa, ...) and an umbrella package (pyobjc) that depends
> on the various distributions to make it easier to install all of PyObjC. The
> umbrella package currently pins the versions of subpackages to ensure that
> "pip install pyobjc==2.5.1" installs exactly that version of the entire
> project.  When I'd use the "compatible release" specifier I can no longer
> easily ensure that users can install an exact version of the entire project,
> other than by hacking the system: specify a compatible version with an
> additional level that isn't used by the project (for example ~=2.5.2.0).
> What is the correct way to create an umbrella project without getting
> yelled at by distribution tools?
>
>
> It's unlikely PyPI will give more than a warning for ``==``. `is` comparisons
> might be disallowed? Not sure.

I think Ronald's example of publishing metadistributions that pin
particular versions of subdistributions is a valid one (I do exactly
the same thing myself with RPM, it just didn't occur to me as a use
case while updating the PEPs), so I need to reconsider some of the
index server restrictions currently proposed in the PEPs.

However, I'd also still like to not-so-gently steer users away from
overly restrictive dependencies in the general case.

This is a case where in a *technical* sense there's no difference
between "We are making these distributions we maintain easier to
install all at once" and "Our distribution needs a compatible version
of this other distribution in order to work", but *semantically*
they're two quite different operations.

So, what do people think of the idea of a new top level "distributes"
field? Syntax identical to "requires", but *semantically*
distinguished in that version pinning in "distributes" would be not
only allowed, but encouraged. A metapackage like PyObjC would then
have entries only in the "distributes" field, and no direct
dependencies of its own.
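
As a rough illustration of the idea, the sketch below shows what metadata for an umbrella distribution might look like. The "distributes" key is only the proposal floated in this message, and the "name (==version)" entry format, the helper name unpinned and the sample values are assumptions made for the example, not anything taken from the PEP drafts.

```python
import json

# Hypothetical metadata for an umbrella distribution. "distributes" is only
# the field proposed in this message; the entry format is an assumption.
umbrella_metadata = {
    "name": "pyobjc",
    "version": "2.5.1",
    "requires": [],
    "distributes": [
        "pyobjc-core (==2.5.1)",
        "pyobjc-framework-Cocoa (==2.5.1)",
    ],
}


def unpinned(entries):
    """Return the entries that do not use an exact '==' pin."""
    return [entry for entry in entries if "==" not in entry]


print(json.dumps(umbrella_metadata, indent=2))
# An index server could warn when a metapackage leaves something unpinned.
print(unpinned(umbrella_metadata["distributes"]))
```

With a split like this, a tool could encourage pins in "distributes" while still warning about them in "requires".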

Thoughts?

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Nick Coghlan
On Wed, May 29, 2013 at 11:04 AM, Donald Stufft  wrote:
>
> On May 28, 2013, at 9:00 PM, Nick Coghlan  wrote:
>
> On Mon, May 27, 2013 at 9:36 PM, Nick Coghlan  wrote:
>
> After preliminary reviews by Donald and Daniel, I have now pushed the
> first complete draft of the JSON-based metadata 2.0 proposal to
> python.org
>
> PEP 426 (metadata 2.0): http://www.python.org/dev/peps/pep-0426/
> PEP 440 (versioning): http://www.python.org/dev/peps/pep-0440/
>
>
> Based on some offline feedback from Daniel, I'm going to change the
> current "type" field in the contact metadata to "role". The name of
> the default role will change from "individual" to "contributor", and
> projects will be given freedom to define their own roles beyond the
> predefined ones. (We're actually stealing this from the way contact
> metadata works in PHP's composer).

Hmm, I may actually drop the extensibility idea - it makes the tooling
harder without providing a significant benefit. So just the name
changes for the field and the default value.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 9:00 PM, Nick Coghlan  wrote:

> On Mon, May 27, 2013 at 9:36 PM, Nick Coghlan  wrote:
>> After preliminary reviews by Donald and Daniel, I have now pushed the
>> first complete draft of the JSON-based metadata 2.0 proposal to
>> python.org
>> 
>> PEP 426 (metadata 2.0): http://www.python.org/dev/peps/pep-0426/
>> PEP 440 (versioning): http://www.python.org/dev/peps/pep-0440/
> 
> Based on some offline feedback from Daniel, I'm going to change the
> current "type" field in the contact metadata to "role". The name of
> the default role will change from "individual" to "contributor", and
> projects will be given freedom to define their own roles beyond the
> predefined ones. (We're actually stealing this from the way contact
> metadata works in PHP's composer).
> 
> Cheers,
> Nick.

Please define what the valid values for the role field are when you include it.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Nick Coghlan
On Mon, May 27, 2013 at 9:36 PM, Nick Coghlan  wrote:
> After preliminary reviews by Donald and Daniel, I have now pushed the
> first complete draft of the JSON-based metadata 2.0 proposal to
> python.org
>
> PEP 426 (metadata 2.0): http://www.python.org/dev/peps/pep-0426/
> PEP 440 (versioning): http://www.python.org/dev/peps/pep-0440/

Based on some offline feedback from Daniel, I'm going to change the
current "type" field in the contact metadata to "role". The name of
the default role will change from "individual" to "contributor", and
projects will be given freedom to define their own roles beyond the
predefined ones. (We're actually stealing this from the way contact
metadata works in PHP's composer).

Cheers,
Nick.


[Distutils] [ANN] pypiserver 1.1.1 - minimal private pypi server

2013-05-28 Thread Ralf Schmitt
Hi,

I've just uploaded pypiserver 1.1.1 to the python package index.

pypiserver is a minimal PyPI compatible server. It can be used to serve
a set of packages and eggs to easy_install or pip.

pypiserver is easy to install (i.e. just 'pip install pypiserver'). It
doesn't have any external dependencies.

https://pypi.python.org/pypi/pypiserver/ should contain enough
information to easily get you started running your own PyPI server in a
few minutes.

The code is available on github: https://github.com/schmir/pypiserver

Changes in this version
-----------------------
- add 'overwrite' option to allow overwriting existing package
  files (default: false)
- show names with hyphens instead of underscores on the "/simple"
  listing
- make the standalone version work with jython 2.5.3
- upgrade waitress to 0.8.5 in the standalone version
- workaround broken xmlrpc api on pypi.python.org by using HTTPS

-- 
Cheers
Ralf


Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Daniel Holth
On Tue, May 28, 2013 at 2:07 PM, Erik Bray  wrote:
> On Mon, May 27, 2013 at 7:36 AM, Nick Coghlan  wrote:
>> After preliminary reviews by Donald and Daniel, I have now pushed the
>> first complete draft of the JSON-based metadata 2.0 proposal to
>> python.org
>>
>> PEP 426 (metadata 2.0): http://www.python.org/dev/peps/pep-0426/
>> PEP 440 (versioning): http://www.python.org/dev/peps/pep-0440/
>>
>> With the rationale and commentary, they're over 3000 lines between
>> them, so I'm not attaching them here.
>>
>> The rationale for many of the changes is at the end of each PEP, along
>> with some comments on features that I have either rejected or
>> deliberately chosen to defer to the next revision of the metadata (at
>> the earliest).
>>
>> Those with BitBucket accounts may also comment inline on the drafts here:
>>
>> PEP 426: 
>> https://bitbucket.org/ncoghlan/misc/src/05d3586464b10d6a04a35409468269d7c89a87ba/pep_drafts/pep-0426.txt?at=default
>> PEP 440: 
>> https://bitbucket.org/ncoghlan/misc/src/05d3586464b10d6a04a35409468269d7c89a87ba/pep_drafts/pep-0440.txt?at=default
>
> This is looking fantastic so far--thanks to Nick, Daniel, and Donald
> for their continued work on this.  For now I just have a handful of
> minor notes on the latest draft of PEP 426:
>
> Typos:
>
> Under "Essential dependency resolution metadata" the "may_require" and
> related metadata keywords are spelled with hyphens instead of
> underscores.
>
> Under "Metabuild system" in the first example I think
> "some_test_harness.metabuild_hook" was meant to read
> "some_test_harness:metabuild_hook"
>
>
> Under "Development, build and deployment dependencies":  "allow" -> "allows"
>
> Under "Support for metabuild hooks":  "by allows projects" -> "by
> allowing projects"
>
> Comment:
>
> I'm not sure if this PEP is the best place for this, but I wonder if
> the description of the "Keywords" format could provide some
> clarification on how that field should be formatted in older metadata
> versions (specifically when including version 1.x metadata for
> backwards compatibility).  In the past its format has never been
> specified.  Some tools treat it as a space-separated field.  Others
> have treated it as a comma-separated field.  Sometimes one or the
> other depending on whether commas are present.  It's a very annoying
> field.

I suggest treating it as a space-separated field for converting from
2.0 to 1.0. To convert from 1.0 to 2.0 you should just split on "not a
letter" or if you are feeling ambitious "not some larger set of
characters, probably resembling the identifier or package name rules".
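
A minimal sketch of the two conversions suggested above, assuming "letter" means an ASCII letter; the function names are made up for the example. Note that splitting on "not a letter" also breaks a keyword such as "python3" at the digit, which is why the wider character set may be preferable.

```python
import re


def keywords_1x_to_2x(raw):
    """Split a 1.x Keywords string on anything that is not an ASCII letter."""
    return [word for word in re.split(r"[^A-Za-z]+", raw) if word]


def keywords_2x_to_1x(keywords):
    """Join a 2.0 keyword list into a space-separated 1.x string."""
    return " ".join(keywords)


# Commas and spaces are both treated as separators when reading 1.x data.
print(keywords_1x_to_2x("json, metadata packaging"))
print(keywords_2x_to_1x(["json", "metadata", "packaging"]))
```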


Re: [Distutils] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 2:21 PM, Erik Bray  wrote:

> On Mon, May 27, 2013 at 1:19 AM, Lennart Regebro  wrote:
>> On Sun, May 26, 2013 at 7:34 PM, Noah Kantrowitz  wrote:
>>> 
>>> 
>>> but seriously, at long last today it was my honor to throw the DNS switch 
>>> to move PyPI to the Fastly caching CDN. I would like to thank Donald Stufft 
>>> for doing much of the heavy lifting on the PyPI side, and to Fastly for 
>>> graciously offering to host us. What does this mean for everyone? Well the 
>>> biggest change is PyPI should get a whole lot faster. There are two major 
>>> downsides however. There will now be a delay of several minutes in some 
>>> cases between updating a package and having it be installable, and download 
>>> counts will now be even more incorrect than they were before. The PyPI 
>>> admins are discussing what to do about download counts long-term, but for 
>>> now we all feel that the performance and availability benefits outweigh the 
>>> loss. If anyone has any questions, or hears anything about issues with PyPI 
>>> please don't hesitate to contact me.
>> 
>> This is going to spell disaster for the coffee industry, as you no
>> longer have to take a coffee break when re-running a buildout.
>> 
>> Thanks!
> 
> I always test pip installation from PyPI "just in case" after
> uploading a new package, so the new cache delay still leaves some time
> for a coffee break (until Daniel gets the cache invalidation
> integrated :/).  But yes, so many hoorays for this \o/

I already enabled Cache Invalidation.



-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Erik Bray
On Mon, May 27, 2013 at 1:19 AM, Lennart Regebro  wrote:
> On Sun, May 26, 2013 at 7:34 PM, Noah Kantrowitz  wrote:
>> 
>>
>> but seriously, at long last today it was my honor to throw the DNS switch to 
>> move PyPI to the Fastly caching CDN. I would like to thank Donald Stufft for 
>> doing much of the heavy lifting on the PyPI side, and to Fastly for 
>> graciously offering to host us. What does this mean for everyone? Well the 
>> biggest change is PyPI should get a whole lot faster. There are two major 
>> downsides however. There will now be a delay of several minutes in some 
>> cases between updating a package and having it be installable, and download 
>> counts will now be even more incorrect than they were before. The PyPI 
>> admins are discussing what to do about download counts long-term, but for 
>> now we all feel that the performance and availability benefits outweigh the 
>> loss. If anyone has any questions, or hears anything about issues with PyPI 
>> please don't hesitate to contact me.
>
> This is going to spell disaster for the coffee industry, as you no
> longer have to take a coffee break when re-running a buildout.
>
> Thanks!

I always test pip installation from PyPI "just in case" after
uploading a new package, so the new cache delay still leaves some time
for a coffee break (until Daniel gets the cache invalidation
integrated :/).  But yes, so many hoorays for this \o/


Re: [Distutils] Draft PEP for JSON based metadata published

2013-05-28 Thread Erik Bray
On Mon, May 27, 2013 at 7:36 AM, Nick Coghlan  wrote:
> After preliminary reviews by Donald and Daniel, I have now pushed the
> first complete draft of the JSON-based metadata 2.0 proposal to
> python.org
>
> PEP 426 (metadata 2.0): http://www.python.org/dev/peps/pep-0426/
> PEP 440 (versioning): http://www.python.org/dev/peps/pep-0440/
>
> With the rationale and commentary, they're over 3000 lines between
> them, so I'm not attaching them here.
>
> The rationale for many of the changes is at the end of each PEP, along
> with some comments on features that I have either rejected or
> deliberately chosen to defer to the next revision of the metadata (at
> the earliest).
>
> Those with BitBucket accounts may also comment inline on the drafts here:
>
> PEP 426: 
> https://bitbucket.org/ncoghlan/misc/src/05d3586464b10d6a04a35409468269d7c89a87ba/pep_drafts/pep-0426.txt?at=default
> PEP 440: 
> https://bitbucket.org/ncoghlan/misc/src/05d3586464b10d6a04a35409468269d7c89a87ba/pep_drafts/pep-0440.txt?at=default

This is looking fantastic so far--thanks to Nick, Daniel, and Donald
for their continued work on this.  For now I just have a handful of
minor notes on the latest draft of PEP 426:

Typos:

Under "Essential dependency resolution metadata" the "may_require" and
related metadata keywords are spelled with hyphens instead of
underscores.

Under "Metabuild system" in the first example I think
"some_test_harness.metabuild_hook" was meant to read
"some_test_harness:metabuild_hook"


Under "Development, build and deployment dependencies":  "allow" -> "allows"

Under "Support for metabuild hooks":  "by allows projects" -> "by
allowing projects"

Comment:

I'm not sure if this PEP is the best place for this, but I wonder if
the description of the "Keywords" format could provide some
clarification on how that field should be formatted in older metadata
versions (specifically when including version 1.x metadata for
backwards compatibility).  In the past its format has never been
specified.  Some tools treat it as a space-separated field.  Others
have treated it as a comma-separated field.  Sometimes one or the
other depending on whether commas are present.  It's a very annoying
field.

Erik


Re: [Distutils] changelog / CDN inconsistency

2013-05-28 Thread Martin v. Löwis

Am 28.05.13 11:04, schrieb Christian Theune:
> So - what's the next step that can happen ASAP?

In addition to the changes Donald already did, I think it would be
wise to restart mirroring at min(last_serial, last_mirroring - 1 minute).

This will cause any simple pages to be re-downloaded that had been
updated just before the last mirror run completed.

If you aren't checking md5sums of files after download, you should
(I always wanted to put this into pep381client).

Then, if you re-download the simple page, you can skip files that
you already have downloaded, and whose md5sum did not change.
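
The md5 check could look roughly like the sketch below. It is not taken from pep381client; the function names are invented for the example, and it assumes the expected md5 has already been read from the simple page.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the hex md5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """True if the local copy is missing or its md5 differs from expected_md5."""
    try:
        return md5_of(path) != expected_md5
    except FileNotFoundError:
        return True
```

A mirror run would call needs_download() for every file listed on a re-fetched simple page and only download the ones for which it returns True.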

Regards,
Martin



Re: [Distutils] pypi protocol

2013-05-28 Thread Martin v. Löwis
Am 26.05.13 22:08, schrieb Jonas Geiregat:
> I ended up reading pypiserver's source code to find out the internals. 

Notice that this is the wrong source code. The real PyPI source code is in

https://bitbucket.org/pypa/pypi/src

Regards,
Martin


Re: [Distutils] [Infrastructure] changelog / CDN inconsistency

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 12:00 PM, "Martin v. Löwis"  wrote:

> Am 28.05.13 16:23, schrieb Donald Stufft:
>> Option 4: We add the expected hash of the simple page to the change log.
>> Mirror clients can then assert their state consistent.
> 
> That would work. It would also cover the case where a new release
> happens while the mirroring is in progress.
> 
> On the other hand, it's difficult to advise what to do if you find that
> the simple page does *not* match the most recent hashsum. You'd have to
> wait a little bit, and hope that the CDN will eventually provide the
> current version.
> 
>> Should also probably assert the file hashes that are in the simple index. 
> 
> Indeed, with the same limitation as above: if you find that the CDN
> gives the old version, you'll have to wait (or bypass the CDN).
> 
> Regards,
> Martin
> 
> 

Immediately after committing the database transaction, PyPI tells the CDN to
purge its cache for the packages that have been affected. Fastly advertises
"instant" purging, and in practice this means that the CDN will be serving the
current version less than a second after the database transaction has been
committed, and certainly within a handful of seconds at most.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] [Infrastructure] changelog / CDN inconsistency

2013-05-28 Thread Martin v. Löwis
Am 28.05.13 16:23, schrieb Donald Stufft:
> Option 4: We add the expected hash of the simple page to the change log.
> Mirror clients can then assert their state consistent.

That would work. It would also cover the case where a new release
happens while the mirroring is in progress.

On the other hand, it's difficult to advise what to do if you find that
the simple page does *not* match the most recent hashsum. You'd have to
wait a little bit, and hope that the CDN will eventually provide the
current version.

> Should also probably assert the file hashes that are in the simple index. 

Indeed, with the same limitation as above: if you find that the CDN
gives the old version, you'll have to wait (or bypass the CDN).
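
On the client side, the "wait and hope" step might be sketched as follows. The thread does not say which hash the change log would carry, so SHA-256 is an assumption here, as are the function name and the fetch callable, which stands in for whatever actually downloads the simple page.

```python
import hashlib
import time


def fetch_consistent_page(fetch, expected_sha256, attempts=5, delay=2.0,
                          sleep=time.sleep):
    """Call fetch() until the body hashes to expected_sha256, or give up.

    fetch is any zero-argument callable returning the page body as bytes.
    Returns the matching body, or None if every attempt looked stale.
    """
    for attempt in range(attempts):
        body = fetch()
        if hashlib.sha256(body).hexdigest() == expected_sha256:
            return body
        if attempt < attempts - 1:
            sleep(delay)  # give the CDN time to serve the current version
    return None
```

A None result is the point where a mirror would either retry later or bypass the CDN.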

Regards,
Martin




Re: [Distutils] [Infrastructure] changelog / CDN inconsistency

2013-05-28 Thread Martin v. Löwis
Am 28.05.13 16:39, schrieb Donald Stufft:
> On May 28, 2013, at 10:36 AM, holger krekel  wrote:
>> yes, i also thought of option 4.  Is that easy to implement on the side of 
>> pypi?
>> If we checksum the simple-page, we need idem-potent generation of simple 
>> pages
>> and ordering to begin with -- which is probably anyway a good idea.  
>> It doesn't need to be version-ordering, just some consistent ordering.
> 
> Check summing is easy yes. 

And there is already a guarantee of a stable checksum for simple pages,
because of the server signing of simple pages (which also computes a
hash of the simple page already).

Regards,
Martin



Re: [Distutils] [Infrastructure] changelog / CDN inconsistency

2013-05-28 Thread Martin v. Löwis

> Mirroring is in a bad state because it comes (and has always) with 
> absolutely no guarantees of consistency.

That is not true. There are no absolute guarantees, but certainly
partial guarantees of consistency. It's a kind of "eventual
consistency": if the releases are all older than the mirror
frequency, the mirror will be consistent.

> You dismiss the issues of having serial n+1 changes, but that is a
> serious problem. If you fetch up to serial N of package1 which has
> the released version of 1.0, and then you fetch serial N+2 of
> package2 which has a hard requirement on package 1.1 (which was
> released in serial N+1) you now have packages that are not
> installable via your mirror because of inconsistent state.

Sure, but that is only a temporary problem, with an inconsistency window
of a few minutes in the worst case - and it only occurs if serials N,
N+1, and N+2 all happen within 5 minutes (i.e. two releases of package1,
and one release of package2).

When the mirror script runs again, it will find that serial N+1 already
happened, and fetch package1 and package2 again.

> If someone comes up with a better option that doesn't require a
> large rearch of the storage code in PyPI I'm happy to review and
> deploy it.

This could be fixed by having PyPI provide old versions of the simple
page. It would not be possible to do so exactly currently. However,
excluding releases newer than a given date would be possible, by
inspecting the journal.
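
Purely as an illustration of the date cut-off, and not of PyPI's real journal schema: the sketch assumes the journal can be reduced to (version, release time) pairs, and the function name is made up.

```python
from datetime import datetime


def releases_as_of(journal, cutoff):
    """Return the versions whose journal entry is not newer than cutoff.

    journal is an iterable of (version, released_at) pairs; this shape is
    a simplification for the example, not the actual journal layout.
    """
    return [version for version, released_at in journal if released_at <= cutoff]


journal = [
    ("1.0", datetime(2013, 5, 1, 9, 0)),
    ("1.1", datetime(2013, 5, 28, 12, 0)),
]
# A simple page "as of" 11:55 on 28 May would list 1.0 only.
print(releases_as_of(journal, datetime(2013, 5, 28, 11, 55)))
```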

Regards,
Martin


Re: [Distutils] [Infrastructure] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Martin v. Löwis
Am 28.05.13 14:48, schrieb M.-A. Lemburg:
> We've had the CDN discussion for quite a while and I even setup
> a test CDN some months ago. No one ever mentioned the HTTP/1.0
> problem and so it simply wasn't on the radar.

On the other hand, the other problems *were* mentioned with respect
to CDNs multiple times over the recent years, so this shouldn't have
surprised anybody.

Regards,
Martin




Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread Donald Stufft
On May 28, 2013, at 10:36 AM, holger krekel  wrote:

> On Tue, May 28, 2013 at 10:23 -0400, Donald Stufft wrote:
>> On May 28, 2013, at 8:20 AM, Donald Stufft  wrote:
>> 
>>> 
>>> On May 28, 2013, at 5:04 AM, Christian Theune  wrote:
>>> 
>>>> Hi,
>>>>
>>>>
>>>> On 27. May 2013, at 10:41 PM, Donald Stufft  wrote:
>>>>> Just to assure folks. I do consider Mirroring a first class citizen and
>>>>> an important feature.
>>>>
>>>> Thanks for that acknowledgement. Let's sort out what to do now - this is
>>>> becoming urgent for me as the author of the currently recommended
>>>> mirroring tool for public mirrors and as an operator of a mirror that is
>>>> being relied upon.
>>>>
>>>> I agree with Holger's points.
>>>>
>>>> I don't think the mirroring is completely backwards right now. I agree
>>>> there's been an incomplete PEP that's been hanging around too long.
>>>>
>>>> My current client implementation is pretty simple and has had reliable
>>>> semantics until now.
>>>>
>>>> A couple of things I noticed in the discussion that I'd like to point out:
>>>>
>>>> - We mirror simple pages because the PEP requires us to - this is part of
>>>> the existing validation approach. I can drop that to get mirrors not to
>>>> rely on simple pages from the CDN but then authentication of the simple
>>>> pages will be broken.
>>>>
>>>> - Release files are replaced all the time.
>>>>
>>>> The semantics that I like to keep with the mirrors is this:
>>>>
>>>> When I get a changelog for serial X and I start copying simple pages and
>>>> files then I (as a mirror) promise my clients that I have incorporated *at
>>>> least* all changes up until serial X  (but maybe also partial changes from
>>>> X+n).
>>>>
>>>> I'm afraid that the mirrors' data are now inconsistent - we can repair that
>>>> once we have a stable mirroring approach again, but until then people will
>>>> start getting annoyed again.
>>>>
>>>> I'm also concerned that I don't really have time to follow up on what's
>>>> happening with TUF regarding mirroring on top of what happened regarding
>>>> the CDN. My feeling is that will result in more fire fighting.
>>>>
>>>> So - what's the next step that can happen ASAP?
>>> 
>>> Options)
>>> 
>>> 1) When mirroring retain N minutes worth of old serials and redo them. 
>>> Mirroring is idempotent you can repeat it with no negative side effects.  
>>> Conditional HTTP requests should also be supported to minimize the 
>>> bandwidth.
>>> 2) Wait a few seconds after fetching the change log to begin processing.
>>> 3) Use front.python.org with the pypi.python.org HOST header with the 
>>> caveat this is not guaranteed to be stable in the long term.
>>> 4) ???
>> 
>> Option 4: We add the expected hash of the simple page to the change log. 
>> Mirror clients can then assert their state consistent.
>> 
>> Should also probably assert the file hashes that are in the simple index.
> 
> yes, i also thought of option 4.  Is that easy to implement on the side of 
> pypi?
> If we checksum the simple-page, we need idem-potent generation of simple pages
> and ordering to begin with -- which is probably anyway a good idea.  
> It doesn't need to be version-ordering, just some consistent ordering.

Check summing is easy yes. 

> 
> As mentioned in the other mail, for the short-term i'd go for 3) once Noah
> and you confirm you are not going to kill it before we have settled on
> a new solution (maybe option 4). 
> 
> best,
> holger
> 
> 
>>> Of them 1) is more likely to give you the best results
>>> http://mail.python.org/pipermail/distutils-sig/2013-May/020855.html
>>> the constraints of HTTP. All it takes is someone to run your mirroring 
>>> script behind a caching proxy and pre-CDN you'd have the exact situation we 
>>> have now.
>>> 
>>> Mirroring is in a bad state because it comes (and has always) with 
>>> absolutely no guarantees of consistency. You dismiss the issues of having 
>>> serial n+1 changes, but that is a serious problem. If you fetch up to 
>>> serial N of package1 which has the released version of 1.0, and then you 
>>> fetch serial N+2 of package2 which has a hard requirement on package 1.1 
>>> (which was released in serial N+1) you now have packages that are not 
>>> installable via your mirror because of inconsistent state.
>>> 
>>> If someone comes up with a better option that doesn't require a large 
>>> rearch of the storage code in PyPI I'm happy to review and deploy it.
>>> 
 
>>>> Christian
>>>>
>>>> --
>>>> Christian Theune · c...@gocept.com
>>>> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
>>>> http://gocept.com · Tel +49 345 1229889-7
>>>> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
>>> 
>>> 
>>> -
>>> Donald Stufft
>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>> 

Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 10:36 AM, holger krekel  wrote:

> On Tue, May 28, 2013 at 10:23 -0400, Donald Stufft wrote:
>> On May 28, 2013, at 8:20 AM, Donald Stufft  wrote:
>> 
>>> 
>>> On May 28, 2013, at 5:04 AM, Christian Theune  wrote:
>>> 
>>>> Hi,
>>>>
>>>>
>>>> On 27. May 2013, at 10:41 PM, Donald Stufft  wrote:
>>>>> Just to assure folks. I do consider Mirroring a first class citizen and
>>>>> an important feature.
>>>>
>>>> Thanks for that acknowledgement. Let's sort out what to do now - this is
>>>> becoming urgent for me as the author of the currently recommended
>>>> mirroring tool for public mirrors and as an operator of a mirror that is
>>>> being relied upon.
>>>>
>>>> I agree with Holger's points.
>>>>
>>>> I don't think the mirroring is completely backwards right now. I agree
>>>> there's been an incomplete PEP that's been hanging around too long.
>>>>
>>>> My current client implementation is pretty simple and has had reliable
>>>> semantics until now.
>>>>
>>>> A couple of things I noticed in the discussion that I'd like to point out:
>>>>
>>>> - We mirror simple pages because the PEP requires us to - this is part of
>>>> the existing validation approach. I can drop that to get mirrors not to
>>>> rely on simple pages from the CDN but then authentication of the simple
>>>> pages will be broken.
>>>>
>>>> - Release files are replaced all the time.
>>>>
>>>> The semantics that I like to keep with the mirrors is this:
>>>>
>>>> When I get a changelog for serial X and I start copying simple pages and
>>>> files then I (as a mirror) promise my clients that I have incorporated *at
>>>> least* all changes up until serial X  (but maybe also partial changes from
>>>> X+n).
>>>>
>>>> I'm afraid that the mirrors' data are now inconsistent - we can repair that
>>>> once we have a stable mirroring approach again, but until then people will
>>>> start getting annoyed again.
>>>>
>>>> I'm also concerned that I don't really have time to follow up on what's
>>>> happening with TUF regarding mirroring on top of what happened regarding
>>>> the CDN. My feeling is that will result in more fire fighting.
>>>>
>>>> So - what's the next step that can happen ASAP?
>>> 
>>> Options)
>>> 
>>> 1) When mirroring retain N minutes worth of old serials and redo them. 
>>> Mirroring is idempotent you can repeat it with no negative side effects.  
>>> Conditional HTTP requests should also be supported to minimize the 
>>> bandwidth.
>>> 2) Wait a few seconds after fetching the change log to begin processing.
>>> 3) Use front.python.org with the pypi.python.org HOST header with the 
>>> caveat this is not guaranteed to be stable in the long term.
>>> 4) ???
>> 
>> Option 4: We add the expected hash of the simple page to the change log. 
>> Mirror clients can then assert their state consistent.
>> 
>> Should also probably assert the file hashes that are in the simple index.
> 
> yes, i also thought of option 4.  Is that easy to implement on the side of 
> pypi?
> If we checksum the simple-page, we need idem-potent generation of simple pages
> and ordering to begin with -- which is probably anyway a good idea.  
> It doesn't need to be version-ordering, just some consistent ordering.
> 
> As mentioned in the other mail, for the short-term i'd go for 3) once Noah
> and you confirm you are not going to kill it before we have settled on
> a new solution (maybe option 4). 

#3 is how Fastly connects. 

> 
> best,
> holger
> 
> 
>>> Of them 1) is more likely to give you the best 
>>> results within 
>>> the constraints of HTTP. All it takes is someone to run your mirroring 
>>> script behind a caching proxy and pre-CDN you'd have the exact situation we 
>>> have now.
>>> 
>>> Mirroring is in a bad state because it comes (and has always) with 
>>> absolutely no guarantees of consistency. You dismiss the issues of having 
>>> serial n+1 changes, but that is a serious problem. If you fetch up to 
>>> serial N of package1 which has the released version of 1.0, and then you 
>>> fetch serial N+2 of package2 which has a hard requirement on package 1.1 
>>> (which was released in serial N+1) you now have packages that are not 
>>> installable via your mirror because of inconsistent state.
>>> 
>>> If someone comes up with a better option that doesn't require a large 
>>> rearch of the storage code in PyPI I'm happy to review and deploy it.
>>> 
 
>>>> Christian
>>>> 
>>>> -- 
>>>> Christian Theune · c...@gocept.com
>>>> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
>>>> http://gocept.com · Tel +49 345 1229889-7
>>>> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
>>> 
>>> 
>>> -
>>> Donald Stufft
>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>> 

Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread holger krekel
On Tue, May 28, 2013 at 10:23 -0400, Donald Stufft wrote:
> On May 28, 2013, at 8:20 AM, Donald Stufft  wrote:
> 
> > 
> > On May 28, 2013, at 5:04 AM, Christian Theune  wrote:
> > 
> >> Hi,
> >> 
> >> 
> >> On 27. May2013, at 10:41 PM, Donald Stufft  wrote:
> >>> Just to assure folks. I do consider Mirroring a first class citizen and 
> >>> an important feature.
> >> 
> >> Thanks for that acknowledgement. Lets sort out what to do now - this is 
> >> becoming urgent for me as the author of the currently recommended 
> >> mirroring tool for public mirrors and as an operator of a mirror that is 
> >> being relied upon.
> >> 
> >> I agree with Holgers points.
> >> 
> >> I don't think the mirroring is completely backwards right now. I agree 
> >> there's been an incomplete PEP that's been hanging around too long. 
> >> 
> >> My current client implementation is pretty simple and has had reliable 
> >> semantics until now.
> >> 
> >> A couple of things I noticed in the discussion that I'd like to point out:
> >> 
> >> - We mirror simple pages because the PEP requires us to - this is part of 
> >> the existing validation approach. I can drop that to get mirrors not to 
> >> rely on simple pages from the CDN but then authentication of the simple 
> >> pages will be broken.
> >> 
> >> - Release files are replaced all the time.
> >> 
> >> The semantics that I like to keep with the mirrors is this:
> >> 
> >> When I get a changelog for serial X and I start copying simple pages and 
> >> files then I (as a mirror) promise my clients that I have incorporated *at 
> >> least* all changes up until serial X  (but maybe also partial changes from 
> >> X+n).
> >> 
> >> I'm afraid that the mirrors data are now inconsistent - we can repair that 
> >> once we have a stable mirroring approach again, but until then people will 
> >> start getting annoyed again. 
> >> 
> >> I'm also concerned that I don't really have time to follow up on what's 
> >> happening with TUF regarding mirroring on top of what happened regarding 
> >> the CDN. My feeling is that will result in more fire fighting.
> >> 
> >> So - what's the next step that can happen ASAP?
> > 
> > Options)
> > 
> > 1) When mirroring retain N minutes worth of old serials and redo them. 
> > Mirroring is idempotent you can repeat it with no negative side effects.  
> > Conditional HTTP requests should also be supported to minimize the 
> > bandwidth.
> > 2) Wait a few seconds after fetching the change log to begin processing.
> > 3) Use front.python.org with the pypi.python.org HOST header with the 
> > caveat this is not guaranteed to be stable in the long term.
> > 4) ???
> > 
> 
> Option 4: We add the expected hash of the simple page to the change log. 
> Mirror clients can then assert their state consistent.
> 
> Should also probably assert the file hashes that are in the simple index. 

yes, i also thought of option 4.  Is that easy to implement on the side of pypi?
If we checksum the simple-page, we need idem-potent generation of simple pages
and ordering to begin with -- which is probably anyway a good idea.  
It doesn't need to be version-ordering, just some consistent ordering.
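A minimal sketch of what "consistent ordering" could mean for checksumming, assuming the simple page is rendered from a filename-to-md5 mapping. The function name render_simple_page and the link layout are hypothetical, not PyPI's actual code; the point is only that sorting the links makes the output bytes (and therefore any hash of them) independent of the order in which the files were stored.

```python
from html import escape


def render_simple_page(project, files):
    """Render a simple page whose bytes depend only on the set of files.

    `files` maps filename -> md5 hex digest. Links are emitted in sorted
    filename order, so two renders of the same data are byte-identical.
    The href layout below is an illustrative assumption.
    """
    lines = ["<html><head><title>Links for %s</title></head><body>" % escape(project)]
    for filename, md5 in sorted(files.items()):
        lines.append(
            '<a href="../../packages/%s#md5=%s">%s</a><br/>'
            % (escape(filename, quote=True), md5, escape(filename))
        )
    lines.append("</body></html>")
    return "\n".join(lines).encode("utf-8")
```

Plain lexical ordering is used here on purpose: as noted above, it does not have to be version ordering, it only has to be the same every time.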

As mentioned in the other mail, for the short-term i'd go for 3) once Noah
and you confirm you are not going to kill it before we have settled on
a new solution (maybe option 4). 

best,
holger


> > Of them 1) is more likely to give you the best 
> > results within 
> > the constraints of HTTP. All it takes is someone to run your mirroring 
> > script behind a caching proxy and pre-CDN you'd have the exact situation we 
> > have now.
> > 
> > Mirroring is in a bad state because it comes (and has always) with 
> > absolutely no guarantees of consistency. You dismiss the issues of having 
> > serial n+1 changes, but that is a serious problem. If you fetch up to 
> > serial N of package1 which has the released version of 1.0, and then you 
> > fetch serial N+2 of package2 which has a hard requirement on package 1.1 
> > (which was released in serial N+1) you now have packages that are not 
> > installable via your mirror because of inconsistent state.
> > 
> > If someone comes up with a better option that doesn't require a large 
> > rearch of the storage code in PyPI I'm happy to review and deploy it.
> > 
> >> 
> >> Christian
> >> 
> >> -- 
> >> Christian Theune · c...@gocept.com
> >> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> >> http://gocept.com · Tel +49 345 1229889-7
> >> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> > 
> > 
> > -
> > Donald Stufft
> > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> > 

Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread Donald Stufft
On May 28, 2013, at 8:20 AM, Donald Stufft  wrote:

> 
> On May 28, 2013, at 5:04 AM, Christian Theune  wrote:
> 
>> Hi,
>> 
>> 
>> On 27. May2013, at 10:41 PM, Donald Stufft  wrote:
>>> Just to assure folks. I do consider Mirroring a first class citizen and an 
>>> important feature.
>> 
>> Thanks for that acknowledgement. Lets sort out what to do now - this is 
>> becoming urgent for me as the author of the currently recommended mirroring 
>> tool for public mirrors and as an operator of a mirror that is being relied 
>> upon.
>> 
>> I agree with Holgers points.
>> 
>> I don't think the mirroring is completely backwards right now. I agree 
>> there's been an incomplete PEP that's been hanging around too long. 
>> 
>> My current client implementation is pretty simple and has had reliable 
>> semantics until now.
>> 
>> A couple of things I noticed in the discussion that I'd like to point out:
>> 
>> - We mirror simple pages because the PEP requires us to - this is part of 
>> the existing validation approach. I can drop that to get mirrors not to rely 
>> on simple pages from the CDN but then authentication of the simple pages 
>> will be broken.
>> 
>> - Release files are replaced all the time.
>> 
>> The semantics that I like to keep with the mirrors is this:
>> 
>> When I get a changelog for serial X and I start copying simple pages and 
>> files then I (as a mirror) promise my clients that I have incorporated *at 
>> least* all changes up until serial X  (but maybe also partial changes from 
>> X+n).
>> 
>> I'm afraid that the mirrors data are now inconsistent - we can repair that 
>> once we have a stable mirroring approach again, but until then people will 
>> start getting annoyed again. 
>> 
>> I'm also concerned that I don't really have time to follow up on what's 
>> happening with TUF regarding mirroring on top of what happened regarding the 
>> CDN. My feeling is that will result in more fire fighting.
>> 
>> So - what's the next step that can happen ASAP?
> 
> Options)
> 
> 1) When mirroring retain N minutes worth of old serials and redo them. 
> Mirroring is idempotent you can repeat it with no negative side effects.  
> Conditional HTTP requests should also be supported to minimize the bandwidth.
> 2) Wait a few seconds after fetching the change log to begin processing.
> 3) Use front.python.org with the pypi.python.org HOST header with the caveat 
> this is not guaranteed to be stable in the long term.
> 4) ???
> 

Option 4: We add the expected hash of the simple page to the change log. Mirror 
clients can then assert their state consistent.

Should also probably assert the file hashes that are in the simple index. 
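A minimal sketch of option 4 from the mirror client's side, assuming the change log entry carried a SHA-256 hex digest of the simple page. The field name simple_sha256 and the choice of SHA-256 are assumptions for illustration; nothing like this exists in the change log today.

```python
import hashlib


def simple_page_matches(page_bytes, expected_sha256):
    """Return True if the fetched simple page hashes to the expected value."""
    return hashlib.sha256(page_bytes).hexdigest() == expected_sha256


# Hypothetical change log entry that carries the expected page hash.
fresh_page = b"<html><body><a href='example-1.0.tar.gz'>example-1.0.tar.gz</a></body></html>"
entry = {
    "name": "example",
    "serial": 1234,
    "simple_sha256": hashlib.sha256(fresh_page).hexdigest(),
}

# A mirror would refetch (or retry later) when the check fails,
# for example because a cache served a stale copy of the page.
is_consistent = simple_page_matches(fresh_page, entry["simple_sha256"])
is_stale = not simple_page_matches(b"stale cached page", entry["simple_sha256"])
```

The same comparison could be repeated for each release file against the hashes listed in the simple index.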

> Of them 1) is more likely to give you the best 
> results within 
> the constraints of HTTP. All it takes is someone to run your mirroring script 
> behind a caching proxy and pre-CDN you'd have the exact situation we have now.
> 
> Mirroring is in a bad state because it comes (and has always) with absolutely 
> no guarantees of consistency. You dismiss the issues of having serial n+1 
> changes, but that is a serious problem. If you fetch up to serial N of 
> package1 which has the released version of 1.0, and then you fetch serial N+2 
> of package2 which has a hard requirement on package 1.1 (which was released 
> in serial N+1) you now have packages that are not installable via your mirror 
> because of inconsistent state.
> 
> If someone comes up with a better option that doesn't require a large rearch 
> of the storage code in PyPI I'm happy to review and deploy it.
> 
>> 
>> Christian
>> 
>> -- 
>> Christian Theune · c...@gocept.com
>> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
>> http://gocept.com · Tel +49 345 1229889-7
>> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> 
> 
> -
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 


Re: [Distutils] [Infrastructure] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread M.-A. Lemburg
On 28.05.2013 14:26, Nick Coghlan wrote:
> On Tue, May 28, 2013 at 10:07 PM, Donald Stufft  wrote:
>> Moving to a CDN has been discussed before on either catalog-sig or
>> distutils-sig (Can't recall which offhand).
>>
>> Weekly status updates were posted to the infrastructure list as well as the
>> communication between us and Fastly as we ironed out SSL issues.
>>
>> The mirroring issue pre-invalidation was quickly corrected. We now
>> invalidate and we are looking at a window that is at most a few seconds
>> large.
> 
> One of the things I (successfully) advocated for at PyCon US was to
> open up the PEP process to cover things where python-dev aren't
> directly involved, but we need an official avenue for publication of
> significant changes in the Python ecosystem (with my main aim being to
> empower distutils-sig as a place where we could actually make final
> decisions about the evolution of the packaging ecosystem).
> 
> Given that, an Informational PEP with Discussion-To set to
> infrastructure-sig and Noah as BDFL-Delegate would be an eminently
> suitable way of keeping PyPI users and mirror operators that *aren't*
> following infrastructure-sig informed of upcoming changes that may
> impact the operation of PyPI clients.
> 
> infrastructure-sig has historically just been for backend hosting
> details, without significant impact to *client* facing behaviour -
> while I think it's fine to change that, it's also understandable that
> most developers of PyPI clients wouldn't be aware of upcoming changes
> that have only been discussed in detail on that list.

I don't think the infra sig is the right host for such discussions
and decisions.

I'd suggest using distutils-sig and making Donald/Richard the
PEP masters for PyPI things, as they are maintaining it.

> So, as Holger said, great work and thanks for your efforts, but good
> communication does matter with these things. People don't like
> surprises, even well intentioned ones :)

We've had the CDN discussion for quite a while and I even set up
a test CDN some months ago. No one ever mentioned the HTTP/1.0
problem and so it simply wasn't on the radar.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Distutils] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Donald Stufft
On May 28, 2013, at 8:26 AM, Nick Coghlan  wrote:

> On Tue, May 28, 2013 at 10:07 PM, Donald Stufft  wrote:
>> Moving to a CDN has been discussed before on either catalog-sig or
>> distutils-sig (Can't recall which offhand).
>> 
>> Weekly status updates were posted to the infrastructure list as well as the
>> communication between us and Fastly as we ironed out SSL issues.
>> 
>> The mirroring issue pre-invalidation was quickly corrected. We now
>> invalidate and we are looking at a window that is at most a few seconds
>> large.
> 
> One of the things I (successfully) advocated for at PyCon US was to
> open up the PEP process to cover things where python-dev aren't
> directly involved, but we need an official avenue for publication of
> significant changes in the Python ecosystem (with my main aim being to
> empower distutils-sig as a place where we could actually make final
> decisions about the evolution of the packaging ecosystem).
> 
> Given that, an Informational PEP with Discussion-To set to
> infrastructure-sig and Noah as BDFL-Delegate would be an eminently
> suitable way of keeping PyPI users and mirror operators that *aren't*
> following infrastructure-sig informed of upcoming changes that may
> impact the operation of PyPI clients.
> 
> infrastructure-sig has historically just been for backend hosting
> details, without significant impact to *client* facing behaviour -
> while I think it's fine to change that, it's also understandable that
> most developers of PyPI clients wouldn't be aware of upcoming changes
> that have only been discussed in detail on that list.
> 

It is only a significant change if you make invalid assumptions about HTTP and 
consistent state between two requests. 

If you want to rely on that then let's talk about a system where we can 
reliably promise that. 

> So, as Holger said, great work and thanks for your efforts, but good
> communication does matter with these things. People don't like
> surprises, even well intentioned ones :)
> 

Point taken. In the future I'll post any infrastructure upgrades I'm involved in 
not only to the infrastructure list but also to distutils-sig. 

> Cheers,
> Nick.
> 
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Nick Coghlan
On Tue, May 28, 2013 at 10:07 PM, Donald Stufft  wrote:
> Moving to a CDN has been discussed before on either catalog-sig or
> distutils-sig (Can't recall which offhand).
>
> Weekly status updates were posted to the infrastructure list as well as the
> communication between us and Fastly as we ironed out SSL issues.
>
> The mirroring issue pre-invalidation was quickly corrected. We now
> invalidate and we are looking at a window that is at most a few seconds
> large.

One of the things I (successfully) advocated for at PyCon US was to
open up the PEP process to cover things where python-dev aren't
directly involved, but we need an official avenue for publication of
significant changes in the Python ecosystem (with my main aim being to
empower distutils-sig as a place where we could actually make final
decisions about the evolution of the packaging ecosystem).

Given that, an Informational PEP with Discussion-To set to
infrastructure-sig and Noah as BDFL-Delegate would be an eminently
suitable way of keeping PyPI users and mirror operators that *aren't*
following infrastructure-sig informed of upcoming changes that may
impact the operation of PyPI clients.

infrastructure-sig has historically just been for backend hosting
details, without significant impact to *client* facing behaviour -
while I think it's fine to change that, it's also understandable that
most developers of PyPI clients wouldn't be aware of upcoming changes
that have only been discussed in detail on that list.

So, as Holger said, great work and thanks for your efforts, but good
communication does matter with these things. People don't like
surprises, even well intentioned ones :)

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 5:04 AM, Christian Theune  wrote:

> Hi,
> 
> 
> On 27. May2013, at 10:41 PM, Donald Stufft  wrote:
>> Just to assure folks. I do consider Mirroring a first class citizen and an 
>> important feature.
> 
> Thanks for that acknowledgement. Lets sort out what to do now - this is 
> becoming urgent for me as the author of the currently recommended mirroring 
> tool for public mirrors and as an operator of a mirror that is being relied 
> upon.
> 
> I agree with Holgers points.
> 
> I don't think the mirroring is completely backwards right now. I agree 
> there's been an incomplete PEP that's been hanging around too long. 
> 
> My current client implementation is pretty simple and has had reliable 
> semantics until now.
> 
> A couple of things I noticed in the discussion that I'd like to point out:
> 
> - We mirror simple pages because the PEP requires us to - this is part of the 
> existing validation approach. I can drop that to get mirrors not to rely on 
> simple pages from the CDN but then authentication of the simple pages will be 
> broken.
> 
> - Release files are replaced all the time.
> 
> The semantics that I like to keep with the mirrors is this:
> 
> When I get a changelog for serial X and I start copying simple pages and 
> files then I (as a mirror) promise my clients that I have incorporated *at 
> least* all changes up until serial X  (but maybe also partial changes from 
> X+n).
> 
> I'm afraid that the mirrors data are now inconsistent - we can repair that 
> once we have a stable mirroring approach again, but until then people will 
> start getting annoyed again. 
> 
> I'm also concerned that I don't really have time to follow up on what's 
> happening with TUF regarding mirroring on top of what happened regarding the 
> CDN. My feeling is that will result in more fire fighting.
> 
> So - what's the next step that can happen ASAP?

Options)

1) When mirroring retain N minutes worth of old serials and redo them. 
Mirroring is idempotent you can repeat it with no negative side effects.  
Conditional HTTP requests should also be supported to minimize the bandwidth.
2) Wait a few seconds after fetching the change log to begin processing.
3) Use front.python.org with the pypi.python.org HOST header with the caveat 
this is not guaranteed to be stable in the long term.
4) ???

Of them 1) is more likely to give you the best results within the constraints 
of HTTP. All it takes is someone to run your mirroring script behind a caching 
proxy and pre-CDN you'd have the exact situation we have now.
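A rough sketch of option 1, assuming the mirror keeps a list of (serial, processed_at) pairs and remembers an ETag per simple page. The helper names and the /simple/example/ path are hypothetical; only the standard urllib.request API is used, and no request is actually sent here.

```python
import urllib.request


def serials_to_redo(seen, now, window_seconds):
    """Return serials processed within the last `window_seconds`, ascending.

    `seen` is a list of (serial, processed_at) pairs, with processed_at
    given as a POSIX timestamp. Replaying these is safe because mirroring
    is idempotent.
    """
    cutoff = now - window_seconds
    return sorted(serial for serial, processed_at in seen if processed_at >= cutoff)


def conditional_request(url, etag=None):
    """Build a GET request that lets the server answer 304 Not Modified."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


# Replay everything handled in the last five minutes.
seen = [(10, 100.0), (11, 400.0), (12, 590.0)]
redo = serials_to_redo(seen, now=600.0, window_seconds=300)

# Revalidate a page cheaply instead of downloading it again.
request = conditional_request("https://pypi.python.org/simple/example/", etag='"abc"')
```

Replaying a window of old serials costs extra requests, which is why the conditional header matters: unchanged pages come back as an empty 304 response.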

Mirroring is in a bad state because it comes (and has always) with absolutely 
no guarantees of consistency. You dismiss the issues of having serial n+1 
changes, but that is a serious problem. If you fetch up to serial N of package1 
which has the released version of 1.0, and then you fetch serial N+2 of 
package2 which has a hard requirement on package 1.1 (which was released in 
serial N+1) you now have packages that are not installable via your mirror 
because of inconsistent state.
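The scenario above can be reproduced with a toy model. This is only an illustration: the change log tuples and the helper names are made up, and real dependency resolution is far more involved than an exact version match.

```python
def mirror_state(changelog, applied_serials):
    """Build what the mirror holds after applying only the given serials."""
    versions = {}  # project -> set of versions present on the mirror
    requires = []  # (project, dependency, required_version)
    for serial, project, version, deps in changelog:
        if serial not in applied_serials:
            continue
        versions.setdefault(project, set()).add(version)
        for dep, dep_version in deps:
            requires.append((project, dep, dep_version))
    return versions, requires


def uninstallable(versions, requires):
    """List requirements that no file on the mirror can satisfy."""
    return [(p, d, v) for p, d, v in requires if v not in versions.get(d, set())]


N = 100
changelog = [
    (N, "package1", "1.0", ()),
    (N + 1, "package1", "1.1", ()),
    (N + 2, "package2", "2.0", (("package1", "1.1"),)),
]

# The mirror saw serial N for package1 but N+2 for package2, skipping N+1.
broken = uninstallable(*mirror_state(changelog, {N, N + 2}))

# With every serial applied, nothing is missing.
healthy = uninstallable(*mirror_state(changelog, {N, N + 1, N + 2}))
```

Here broken contains the single entry ("package2", "package1", "1.1") and healthy is empty.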

If someone comes up with a better option that doesn't require a large rearch of 
the storage code in PyPI I'm happy to review and deploy it.

> 
> Christian
> 
> -- 
> Christian Theune · c...@gocept.com
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · Tel +49 345 1229889-7
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> 


-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 2:57 AM, holger krekel  wrote:

> On Tue, May 28, 2013 at 07:42 +0100, Paul Moore wrote:
>> On 28 May 2013 02:53, Donald Stufft  wrote:
>> 
>>> Figured it out.
>>> 
>>> Use HTTPS.
>>> 
>> 
>> Can I suggest that if the new CDN means that there are additional
>> restrictions on what is supported (I've used the XMLRPC API without https
>> in one-off scripts in the past) then the officially supported API should be
>> properly documented once and for all in a PEP, including some sort of
>> "what's new" or "rationale" section describing the various changes that
>> have occurred recently and their impact on user code?
> 
> I second this.  I am building tools that interact with PyPI and people
> and customers are using them.  I don't want to find a switch announced
> which breaks them and then hear "sorry, that's the future now" without
> this future being documented and discussed before the fact.  The PyPI
> infrastructure and its supported tool interactions today are as important as
> evolving the language itself so PEPs are warranted.  As with PEP438 i am
> willing to help this process.
> 
>> I'm purely a casual user of the PyPI API and the discussion of these
>> changes have mostly gone over my head. The one thing I've taken away from it
>> is that I may get problems if I just google for sample code to use. For
>> example, the above comment implies that
>> http://wiki.python.org/moin/PyPIXmlRpc (AIUI, the nearest to formal
>> documentation that the XMLRPC API has) is wrong (as it uses http).
>> 
>> I do appreciate all the work that is going on to improve the PyPI
>> infrastructure. I'm not saying the changes should be reverted, just that
>> the consequences should be clearly explained.
> 
> I also appreciate Noah's and Donald's CDN work here, up to the point where 
> it breaks things for unclear reasons.  Reasons which might very well
> be valid, nevertheless!
> 
> best,
> holger

Moving to a CDN has been discussed before on either catalog-sig or 
distutils-sig (Can't recall which offhand).

Weekly status updates were posted to the infrastructure list as well as the 
communication between us and Fastly as we ironed out SSL issues.

The mirroring issue pre-invalidation was quickly corrected. We now invalidate 
and we are looking at a window that is at most a few seconds large.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread Christian Theune
Hi,


On 27. May 2013, at 10:41 PM, Donald Stufft  wrote:
> Just to assure folks. I do consider Mirroring a first class citizen and an 
> important feature.

Thanks for that acknowledgement. Let's sort out what to do now - this is 
becoming urgent for me as the author of the currently recommended mirroring 
tool for public mirrors and as an operator of a mirror that is being relied 
upon.

I agree with Holger's points.

I don't think the mirroring is completely backwards right now. I agree there's 
been an incomplete PEP that's been hanging around too long. 

My current client implementation is pretty simple and has had reliable 
semantics until now.

A couple of things I noticed in the discussion that I'd like to point out:

- We mirror simple pages because the PEP requires us to - this is part of the 
existing validation approach. I can drop that to get mirrors not to rely on 
simple pages from the CDN but then authentication of the simple pages will be 
broken.

- Release files are replaced all the time.

The semantics that I like to keep with the mirrors is this:

When I get a changelog for serial X and I start copying simple pages and files 
then I (as a mirror) promise my clients that I have incorporated *at least* all 
changes up until serial X  (but maybe also partial changes from X+n).
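One way to read that promise in code, as a sketch only: the mirror advances the serial it advertises after every change up to that serial has been copied, and never before. The sync function and the shape of the change log are assumptions for illustration, not the actual client implementation.

```python
def sync(changelog, fetch, last_serial):
    """Copy every change newer than `last_serial`; return the serial to promise.

    `changelog` is a list of (serial, project_name) pairs and `fetch` is a
    callable that copies the simple page and files for one project. If
    `fetch` raises, the function raises too, so the caller keeps
    advertising the old serial.
    """
    pending = sorted((serial, name) for serial, name in changelog if serial > last_serial)
    for serial, name in pending:
        fetch(name)
    return pending[-1][0] if pending else last_serial


copied = []
promised = sync([(5, "a"), (7, "b"), (6, "a")], copied.append, last_serial=5)
```

After this run promised is 7 and copied is ["a", "b"]. A fetch may well pick up content newer than serial 7, which is the "partial changes from X+n" case; what breaks the promise is a cache handing back content older than the serial being processed.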

I'm afraid that the mirrors' data are now inconsistent - we can repair that once 
we have a stable mirroring approach again, but until then people will start 
getting annoyed again. 

I'm also concerned that I don't really have time to follow up on what's 
happening with TUF regarding mirroring on top of what happened regarding the 
CDN. My feeling is that will result in more fire fighting.

So - what's the next step that can happen ASAP?

Christian

-- 
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · Tel +49 345 1229889-7
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations





Re: [Distutils] PyPI Download Counts

2013-05-28 Thread Domen Kožar
I'll also be at EP and can help to explain how Nix could solve the
isolation problem.


On Mon, May 27, 2013 at 5:46 PM, holger krekel  wrote:

> Hi Florian,
>
> On Mon, May 27, 2013 at 10:36 +0200, Florian Friesdorf wrote:
> > Hi Holger,
> >
> > holger krekel  writes:
> > > On Mon, May 27, 2013 at 17:41 +1000, Nick Coghlan wrote:
> > >> On Mon, May 27, 2013 at 5:27 PM, holger krekel 
> wrote:
> > >> > Not having download counts maybe lets us think harder about
> > >> > better metrics.  The number of projects using a package as a dep
> > >> > might be one.
> > >>
> > >> With the current downside being that it's hard for PyPI to figure out
> > >> that number, too :)
> > >
> > > Yip.  But something like Vinay's red-dove approach or Marius'
> get_deps.py
> > > could provide a base.  We might think about a docker instance which
> > > could allow to quickly spawn new light VMs so we can isolate setup.py
> runs.
> > > (Yes, it's only Linux but it'd be a start).
> >
> > nix and nixpkgs allow this isolation on top of Linux, FreeBSD, OS X and
> > theoretically also Cygwin (not sure how well Cygwin is supported at the
> > moment).
> >
> > http://nixos.org/nix/
> > http://nixos.org/nixpkgs/
> >
> > From nixos.org:
> > Nix is a purely functional package manager. This means that it can
> > ensure that an upgrade to one package cannot break others, that you can
> > always roll back to previous version, that multiple versions of a
> > package can coexist on the same system, and much more.
> >
> > Nixpkgs is a large collection of packages that can be installed with the
> > Nix package manager.
>
> Interesting stuff, didn't know about it.
>
> Did you post this as a suggestion for provisioning an environment to
> run setup.py (on nix-supported platforms)?  If so, i am not sure how it
> would help exactly.  I guess myself i'd aim for a 80% solution for
> discovering
> dependencies first.  Simplest/quickest wins there :)
>
> > >> Agreed it would be a good number to publish once it's more readily
> > >> available, too.
> > >
> > > I think "dep" numbers are mostly interesting for libraries, not so
> > > much for applications like django or pyramid or tools like nose/pytest.
> > >
> > > Another more practical data point would be "does this package even
> > > install on win32/linux/osx py26/py27/py33" and even better, do its
> automated
> > > tests pass?
> >
> > http://hydra.nixos.org/build/5062796
> >
> > > If we could evolve to have this info published on pypi.python.org
> > > it would be quite useful i think.  I am actually currently implementing
> > > a system which enables this (the "devpi" system) so i don't mean this
> all just
> > > as "nice to have" theory.  I aim to present the status of this work
> > > at EuroPython.
> >
> > Nice! Looking forward to that.
> >
> > If you have any questions about nix/nixpkgs/nixos, especially about the
> > way python packages are packaged, please let me know. Also, it's not set
> > in stone.
>
> are you going to be at EP?  It's a long conference and i am more than
> happy to sit together on this topic for a bit sometimes.
>
> best,
> holger
>
>
> > Personally, I'd love to see hydra.python.org providing builds of all
> > pypi packages and would be happy to help. Also including Domen and Rok
> > for whom I assume the same.
>
>
> >
> > You might have other tools that are better suited for you.
> >
> > regards
> > florian
> > --
> > Florian Friesdorf 
> >   GPG FPR: 7A13 5EEE 1421 9FC2 108D  BAAF 38F8 99A3 0C45 F083
> > Jabber/XMPP: f...@chaoflow.net
> > IRC: chaoflow on freenode,ircnet,blafasel,OFTC
>
>
>


Re: [Distutils] Good news everyone, PyPI is behind a CDN

2013-05-28 Thread Donald Stufft

On May 28, 2013, at 2:42 AM, Paul Moore  wrote:

> On 28 May 2013 02:53, Donald Stufft  wrote:
> Figured it out.
> 
> Use HTTPS.
> 
> Can I suggest that if the new CDN means that there are additional 
> restrictions on what is supported (I've used the XMLRPC API without https in 
> one-off scripts in the past) then the officially supported API should be 
> properly documented once and for all in a PEP, including some sort of "what's 
> new" or "rationale" section describing the various changes that have occurred 
> recently and their impact on user code?
> 
> I'm purely a casual user of the PyPI API and the discussion of these changes 
> has mostly gone over my head. The one thing I've taken away from it is that I 
> may get problems if I just google for sample code to use. For example, the 
> above comment implies that http://wiki.python.org/moin/PyPIXmlRpc (AIUI, the 
> nearest to formal documentation that the XMLRPC API has) is wrong (as it uses 
> http).
> 
> I do appreciate all the work that is going on to improve the PyPI 
> infrastructure. I'm not saying the changes should be reverted, just that the 
> consequences should be clearly explained.
> 
> Paul.

To be quite honest the HTTP 1.0 + HTTP issue simply wasn't discovered in 
testing. The http url works fine on Python 2.7 (which I'm assuming uses HTTP 
1.1). I'm not completely happy that HTTP is broken in Python 2.6 (and I'm 
assuming earlier) and have it on my list to see if there's anything that can be 
done.

That being said, the most future-compatible way will be to use the HTTPS URL for 
any interaction (and ideally verify the SSL certificate, but the built-in XMLRPC 
library doesn't do that). My "Use HTTPS" was more about how to solve the issue 
*right now*.

Documentation should be updated to point to HTTPS though.
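
For one-off scripts, switching to HTTPS could look roughly like the sketch
below. It is only an illustration: the endpoint URL is assumed to be the HTTPS
form of the one on the wiki page, and force_https and make_client are
hypothetical helper names. It is written for Python 3, where
xmlrpc.client.ServerProxy accepts an SSL context, so the certificate check
mentioned above is possible there.

```python
import ssl
import xmlrpc.client
from urllib.parse import urlsplit, urlunsplit

# Assumption: the HTTPS form of the XML-RPC endpoint used on the wiki page.
PYPI_XMLRPC_URL = "https://pypi.python.org/pypi"


def force_https(url):
    """Return url with its scheme replaced by https; the rest is unchanged."""
    parts = urlsplit(url)
    return urlunsplit(("https",) + tuple(parts[1:]))


def make_client(url=PYPI_XMLRPC_URL):
    """Build an XML-RPC proxy that verifies the server certificate.

    Nothing is sent over the network until a method is called on the proxy.
    """
    context = ssl.create_default_context()  # verifies certificate and hostname
    return xmlrpc.client.ServerProxy(force_https(url), context=context)
```

Old sample code that hard-codes an http:// URL can be passed through
force_https before the proxy is created.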

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

2013-05-28 Thread holger krekel
On Tue, May 28, 2013 at 11:04 +0200, Christian Theune wrote:
> On 27. May 2013, at 10:41 PM, Donald Stufft  wrote:
> > Just to assure folks. I do consider Mirroring a first class citizen and an 
> > important feature.
> 
> Thanks for that acknowledgement. Let's sort out what to do now - this is 
> becoming urgent for me as the author of the currently recommended mirroring 
> tool for public mirrors and as an operator of a mirror that is being relied 
> upon.
> 
> I agree with Holger's points.
> 
> I don't think the mirroring is completely backwards right now. I agree 
> there's been an incomplete PEP that's been hanging around too long. 
> 
> My current client implementation is pretty simple and has had reliable 
> semantics until now.
> 
> A couple of things I noticed in the discussion that I'd like to point out:
> 
> - We mirror simple pages because the PEP requires us to - this is part of the 
> existing validation approach. I can drop that to get mirrors not to rely on 
> simple pages from the CDN but then authentication of the simple pages will be 
> broken.
> 
> - Release files are replaced all the time.
> 
> The semantics that I'd like to keep with the mirrors is this:
> 
> When I get a changelog for serial X and I start copying simple pages and 
> files then I (as a mirror) promise my clients that I have incorporated *at 
> least* all changes up until serial X  (but maybe also partial changes from 
> X+n).
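
The promise described above can be sketched as a small sync step. This is a
minimal illustration, not the actual mirroring client: sync_mirror,
fetch_changelog and copy_project are hypothetical names, and the changelog is
assumed to be a list of (project, serial) pairs newer than the given serial.

```python
def sync_mirror(last_serial, fetch_changelog, copy_project):
    """Run one sync pass and return the serial the mirror may now advertise.

    fetch_changelog(since) is assumed to return (project, serial) pairs with
    serial > since; copy_project(name) is assumed to copy that project's
    simple page and release files, raising on failure.
    """
    changes = fetch_changelog(last_serial)
    if not changes:
        return last_serial  # nothing new, the old promise still holds

    # X: the highest serial seen in the changelog at the start of this pass.
    target_serial = max(serial for _, serial in changes)

    # Copy every touched project once. The copies may already contain newer
    # changes (partial X+n), which the promise explicitly allows.
    for name in sorted({name for name, _ in changes}):
        copy_project(name)

    # Advertise X only after every copy succeeded; an exception above leaves
    # the caller with the previous serial.
    return target_serial
```

The key design point is that the serial is advanced last, so a failed or
interrupted pass never claims more than the mirror actually holds.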
> 
> I'm afraid that the mirrors' data are now inconsistent - we can repair that 
> once we have a stable mirroring approach again, but until then people will 
> start getting annoyed again. 
> 
> I'm also concerned that I don't really have time to follow up on what's 
> happening with TUF regarding mirroring on top of what happened regarding the 
> CDN. My feeling is that will result in more fire fighting.
> 
> So - what's the next step that can happen ASAP?

The immediate way to get around the CDN/mirroring problems and to revert
to the pre-CDN consistency level is to use the same access that Fastly
uses to get updates from pypi.python.org, namely a request on front.python.org
with a host-header.  I have this info from Donald with the caveat that
it's not guaranteed to remain possible.  Maybe Noah could agree to not
remove this facility without the current actors being on board for changes?
(i am also fine with having a dedicated domain instead, of course).
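
A request of that shape could be built roughly as follows. This is only a
sketch under assumptions: the two host names come from the paragraph above,
but the URL scheme, the example path and the helper name
build_origin_request are made up for illustration, and the access itself is
not guaranteed to stay available.

```python
import urllib.request

FRONT_HOST = "front.python.org"  # origin host named above
PYPI_HOST = "pypi.python.org"    # value sent in the Host header


def build_origin_request(path, scheme="https"):
    """Build (but do not send) a request to the origin with a Host header.

    The scheme default is an assumption; the message above does not say
    whether the origin is reached over http or https.
    """
    if not path.startswith("/"):
        path = "/" + path
    url = "%s://%s%s" % (scheme, FRONT_HOST, path)
    return urllib.request.Request(url, headers={"Host": PYPI_HOST})


# Example (hypothetical path): a mirror client would pass this object to
# urllib.request.urlopen() to actually fetch the page.
request = build_origin_request("/simple/")
```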

Once this is settled, we can move on to fix current tools and deployments
and afterwards think about future improvements without the current urgency.

holger


> Christian
> 
> -- 
> Christian Theune · c...@gocept.com
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · Tel +49 345 1229889-7
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> 

