On May 28, 2013, at 10:36 AM, holger krekel <hol...@merlinux.eu> wrote:

> On Tue, May 28, 2013 at 10:23 -0400, Donald Stufft wrote:
>> On May 28, 2013, at 8:20 AM, Donald Stufft <don...@stufft.io> wrote:
>> 
>>> 
>>> On May 28, 2013, at 5:04 AM, Christian Theune <c...@gocept.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> 
>>>> On 27. May 2013, at 10:41 PM, Donald Stufft <don...@stufft.io> wrote:
>>>>> Just to assure folks. I do consider Mirroring a first class citizen and 
>>>>> an important feature.
>>>> 
>>>> Thanks for that acknowledgement. Let's sort out what to do now - this is 
>>>> becoming urgent for me as the author of the currently recommended 
>>>> mirroring tool for public mirrors and as an operator of a mirror that is 
>>>> being relied upon.
>>>> 
>>>> I agree with Holger's points.
>>>> 
>>>> I don't think the mirroring is completely backwards right now. I agree 
>>>> there's been an incomplete PEP that's been hanging around too long. 
>>>> 
>>>> My current client implementation is pretty simple and has had reliable 
>>>> semantics until now.
>>>> 
>>>> A couple of things I noticed in the discussion that I'd like to point out:
>>>> 
>>>> - We mirror simple pages because the PEP requires us to - this is part of 
>>>> the existing validation approach. I can drop that to get mirrors not to 
>>>> rely on simple pages from the CDN but then authentication of the simple 
>>>> pages will be broken.
>>>> 
>>>> - Release files are replaced all the time.
>>>> 
>>>> The semantics that I'd like to keep with the mirrors are these:
>>>> 
>>>> When I get a changelog for serial X and I start copying simple pages and 
>>>> files then I (as a mirror) promise my clients that I have incorporated *at 
>>>> least* all changes up until serial X  (but maybe also partial changes from 
>>>> X+n).
>>>> 
>>>> I'm afraid that the mirrors' data are now inconsistent - we can repair that 
>>>> once we have a stable mirroring approach again, but until then people will 
>>>> start getting annoyed again. 
>>>> 
>>>> I'm also concerned that I don't really have time to follow up on what's 
>>>> happening with TUF regarding mirroring on top of what happened regarding 
>>>> the CDN. My feeling is that will result in more fire fighting.
>>>> 
>>>> So - what's the next step that can happen ASAP?
>>> 
>>> Options:
>>> 
>>> 1) When mirroring, retain N minutes' worth of old serials and redo them. 
>>> Mirroring is idempotent, so you can repeat it with no negative side 
>>> effects. Conditional HTTP requests should also be supported to minimize 
>>> the bandwidth.
>>> 2) Wait a few seconds after fetching the change log to begin processing.
>>> 3) Use front.python.org with the pypi.python.org Host header, with the 
>>> caveat that this is not guaranteed to be stable in the long term.
>>> 4) ???
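
To make option 1 concrete, here is a minimal sketch of the replay-window idea. It is not taken from any existing mirroring client: the changelog tuple shape, the function names, and the ten-minute window are assumptions for illustration only. The second helper just builds standard conditional request headers (`If-None-Match`, `If-Modified-Since`) so that replayed fetches can be answered with a 304.

```python
REPLAY_WINDOW_SECONDS = 10 * 60  # hypothetical value for "N minutes"


def select_changes_to_apply(changelog, last_serial, now,
                            window=REPLAY_WINDOW_SECONDS):
    """Return the package names to (re)sync, in first-seen order.

    changelog: iterable of (package_name, timestamp, serial) tuples.
    This shape is an assumption, not a documented PyPI format.

    Entries newer than last_serial are applied as usual. Entries that were
    already applied but are younger than `window` seconds are replayed,
    which is safe because syncing a package is idempotent.
    """
    cutoff = now - window
    seen = set()
    selected = []
    for name, timestamp, serial in changelog:
        if serial > last_serial or timestamp >= cutoff:
            if name not in seen:
                seen.add(name)
                selected.append(name)
    return selected


def conditional_headers(etag=None, last_modified=None):
    """Build conditional request headers from values cached on the last fetch."""
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers
```

A package that appears several times inside the window is synced only once per run, so the extra cost of replaying is mostly a handful of conditional requests.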
>> 
>> Option 4: We add the expected hash of the simple page to the change log. 
>> Mirror clients can then assert that their state is consistent.
>> 
>> They should probably also assert the file hashes that are in the simple 
>> index.
> 
> yes, i also thought of option 4.  Is that easy to implement on the side of 
> pypi?
> If we checksum the simple page, we need idempotent generation of simple pages
> and ordering to begin with -- which is probably a good idea anyway.  
> It doesn't need to be version-ordering, just some consistent ordering.

Checksumming is easy, yes. 
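
As a rough sketch of how option 4 could fit together (this is not PyPI's code; the function names, the page markup, the `(filename, md5)` input shape, and the choice of SHA-256 are all assumptions for illustration): the page is rendered with a consistent ordering so the same set of files always yields the same bytes, and the mirror compares the digest of what it fetched against the digest it was told to expect.

```python
import hashlib
from html import escape


def render_simple_page(project, files):
    """Render a simple page deterministically.

    files: iterable of (filename, md5_hex) pairs (hypothetical input shape).
    Sorting gives "some consistent ordering"; it is not version ordering.
    The markup below is illustrative, not PyPI's actual page layout.
    """
    lines = ["<html><head><title>Links for %s</title></head><body>"
             % escape(project)]
    for filename, digest in sorted(files):
        lines.append('<a href="%s#md5=%s">%s</a><br/>'
                     % (escape(filename, quote=True), escape(digest, quote=True),
                        escape(filename)))
    lines.append("</body></html>")
    return "\n".join(lines).encode("utf-8")


def page_digest(page_bytes):
    """Hex digest that would be published alongside the change log entry."""
    return hashlib.sha256(page_bytes).hexdigest()


def mirror_is_consistent(fetched_page, expected_digest):
    """True if the page the mirror fetched matches the expected digest."""
    return page_digest(fetched_page) == expected_digest
```

With something like this, a mirror that fetched a stale page from a cache would see a digest mismatch and could retry instead of silently recording the wrong state.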

> 
> As mentioned in the other mail, for the short-term i'd go for 3) once Noah
> and you confirm you are not going to kill it before we have settled on
> a new solution (maybe option 4). 
> 
> best,
> holger
> 
> 
>>> Of them, 1) is most likely to give you the best results within 
>>> the constraints of HTTP. All it takes is someone to run your mirroring 
>>> script behind a caching proxy and pre-CDN you'd have the exact situation we 
>>> have now.
>>> 
>>> Mirroring is in a bad state because it comes (and always has) with 
>>> absolutely no guarantees of consistency. You dismiss the issues of having 
>>> serial N+1 changes, but that is a serious problem. If you fetch up to 
>>> serial N of package1, whose released version is 1.0, and then you fetch 
>>> serial N+2 of package2, which has a hard requirement on package1 1.1 
>>> (released in serial N+1), you now have packages that are not 
>>> installable via your mirror because of inconsistent state.
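
The package1/package2 case can be spelled out with a toy model. This is only an illustration of the failure mode; the data shapes and the function name are hypothetical and exact-version pins stand in for real requirement specifiers.

```python
def missing_requirements(mirror_versions, requirements):
    """List (package, dependency, version) triples the mirror cannot satisfy.

    mirror_versions: {package_name: set of versions present on the mirror}
    requirements:    {package_name: {dependency_name: required_version}}
    Both shapes are assumptions made for this example.
    """
    missing = []
    for package, deps in sorted(requirements.items()):
        for dep, version in sorted(deps.items()):
            if version not in mirror_versions.get(dep, set()):
                missing.append((package, dep, version))
    return missing


# The mirror synced package1 up to serial N (only 1.0) but package2 at
# serial N+2, which requires package1 1.1 from serial N+1.
mirror = {"package1": {"1.0"}, "package2": {"2.0"}}
reqs = {"package2": {"package1": "1.1"}}
print(missing_requirements(mirror, reqs))
```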
>>> 
>>> If someone comes up with a better option that doesn't require a large 
>>> rearchitecture of the storage code in PyPI, I'm happy to review and 
>>> deploy it.
>>> 
>>>> 
>>>> Christian
>>>> 
>>>> -- 
>>>> Christian Theune · c...@gocept.com
>>>> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
>>>> http://gocept.com · Tel +49 345 1229889-7
>>>> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
>>> 
>>> 
>>> -----------------
>>> Donald Stufft
>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>> 
>>> _______________________________________________
>>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>>> http://mail.python.org/mailman/listinfo/distutils-sig
> 