On May 28, 2013, at 10:36 AM, holger krekel <hol...@merlinux.eu> wrote:
> On Tue, May 28, 2013 at 10:23 -0400, Donald Stufft wrote:
>> On May 28, 2013, at 8:20 AM, Donald Stufft <don...@stufft.io> wrote:
>>
>>>
>>> On May 28, 2013, at 5:04 AM, Christian Theune <c...@gocept.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> On 27. May 2013, at 10:41 PM, Donald Stufft <don...@stufft.io> wrote:
>>>>> Just to assure folks. I do consider Mirroring a first-class citizen and
>>>>> an important feature.
>>>>
>>>> Thanks for that acknowledgement. Let's sort out what to do now - this is
>>>> becoming urgent for me as the author of the currently recommended
>>>> mirroring tool for public mirrors and as an operator of a mirror that is
>>>> being relied upon.
>>>>
>>>> I agree with Holger's points.
>>>>
>>>> I don't think the mirroring is completely backwards right now. I agree
>>>> there's been an incomplete PEP that's been hanging around too long.
>>>>
>>>> My current client implementation is pretty simple and has had reliable
>>>> semantics until now.
>>>>
>>>> A couple of things I noticed in the discussion that I'd like to point out:
>>>>
>>>> - We mirror simple pages because the PEP requires us to - this is part of
>>>> the existing validation approach. I can drop that to get mirrors not to
>>>> rely on simple pages from the CDN but then authentication of the simple
>>>> pages will be broken.
>>>>
>>>> - Release files are replaced all the time.
>>>>
>>>> The semantics that I'd like to keep with the mirrors are these:
>>>>
>>>> When I get a changelog for serial X and I start copying simple pages and
>>>> files then I (as a mirror) promise my clients that I have incorporated *at
>>>> least* all changes up until serial X (but maybe also partial changes from
>>>> X+n).
>>>>
>>>> I'm afraid that the mirrors' data are now inconsistent - we can repair that
>>>> once we have a stable mirroring approach again, but until then people will
>>>> start getting annoyed again.
>>>>
>>>> I'm also concerned that I don't really have time to follow up on what's
>>>> happening with TUF regarding mirroring on top of what happened regarding
>>>> the CDN. My feeling is that will result in more fire fighting.
>>>>
>>>> So - what's the next step that can happen ASAP?
>>>
>>> Options)
>>>
>>> 1) When mirroring, retain N minutes' worth of old serials and redo them.
>>> Mirroring is idempotent; you can repeat it with no negative side effects.
>>> Conditional HTTP requests should also be supported to minimize the
>>> bandwidth.
>>> 2) Wait a few seconds after fetching the change log to begin processing.
>>> 3) Use front.python.org with the pypi.python.org HOST header with the
>>> caveat this is not guaranteed to be stable in the long term.
>>> 4) ???
>>
>> Option 4: We add the expected hash of the simple page to the change log.
>> Mirror clients can then assert that their state is consistent.
>>
>> Should also probably assert the file hashes that are in the simple index.
>
> yes, i also thought of option 4. Is that easy to implement on the side of
> pypi?
> If we checksum the simple-page, we need idempotent generation of simple pages
> and ordering to begin with -- which is probably a good idea anyway.
> It doesn't need to be version-ordering, just some consistent ordering.
>
> As mentioned in the other mail, for the short term i'd go for 3) once Noah
> and you confirm you are not going to kill it before we have settled on
> a new solution (maybe option 4).

#3 is how Fastly connects.

>
> best,
> holger
>
>
>>> Of them 1) is more likely to give you the best
>>> results http://mail.python.org/pipermail/distutils-sig/2013-May/020855.html
>>> the constraints of HTTP. All it takes is someone to run your mirroring
>>> script behind a caching proxy and pre-CDN you'd have the exact situation we
>>> have now.
>>>
>>> Mirroring is in a bad state because it comes (and has always) with
>>> absolutely no guarantees of consistency.
>>> You dismiss the issues of having
>>> serial N+1 changes, but that is a serious problem. If you fetch up to
>>> serial N of package1, which has released version 1.0, and then you
>>> fetch serial N+2 of package2, which has a hard requirement on package1 1.1
>>> (which was released in serial N+1), you now have packages that are not
>>> installable via your mirror because of inconsistent state.
>>>
>>> If someone comes up with a better option that doesn't require a large
>>> re-architecture of the storage code in PyPI I'm happy to review and deploy it.
>>>
>>>>
>>>> Christian
>>>>
>>>> --
>>>> Christian Theune · c...@gocept.com
>>>> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
>>>> http://gocept.com · Tel +49 345 1229889-7
>>>> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
>>>
>>>
>>> -----------------
>>> Donald Stufft
>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>>
>>> _______________________________________________
>>> Distutils-SIG maillist - Distutils-SIG@python.org
>>> http://mail.python.org/mailman/listinfo/distutils-sig
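
For illustration only, here is a minimal Python sketch of two ideas from the thread: option 4 (the change log carries an expected hash of the simple page, which only works if page generation uses some consistent ordering) and option 1 (replaying recently processed serials). Everything in it is an assumption made for the example: the function names, the page markup, the choice of SHA-256, and the shape of the `seen` mapping are hypothetical and are not PyPI's or any mirror client's actual API or format.

```python
import hashlib


def render_simple_page(project, files):
    """Render a simple index page deterministically.

    `files` maps filename -> hex digest. Sorting by filename gives "some
    consistent ordering", so identical input always yields identical bytes.
    The markup is hypothetical, not PyPI's real template.
    """
    links = "".join(
        '<a href="../../packages/{0}#md5={1}">{0}</a><br/>\n'.format(name, digest)
        for name, digest in sorted(files.items())
    )
    page = "<html><body><h1>Links for {0}</h1>\n{1}</body></html>\n"
    return page.format(project, links).encode("utf-8")


def page_hash(body):
    """Hash of the page bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(body).hexdigest()


def verify_simple_page(body, expected_hash):
    """Option 4: True if the fetched page matches the change log's hash."""
    return page_hash(body) == expected_hash


def serials_to_replay(seen, now, window_seconds):
    """Option 1: serials processed within the last `window_seconds`.

    `seen` maps serial -> timestamp at which the mirror processed it.
    Because mirroring is idempotent, redoing these serials is safe.
    """
    cutoff = now - window_seconds
    return sorted(serial for serial, ts in seen.items() if ts >= cutoff)
```

A mirror client following this sketch would fetch the simple page, call `verify_simple_page` against the hash from the change log entry, and retry later on a mismatch (for example when a cache still serves a stale page) instead of recording the serial as done.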