On May 27, 2013, at 12:18 PM, holger krekel wrote:

> On Mon, May 27, 2013 at 14:59 -0400, Donald Stufft wrote:
>> On May 27, 2013, at 2:54 PM, holger krekel <hol...@merlinux.eu> wrote:
>>
>>> On Mon, May 27, 2013 at 13:50 -0400, Donald Stufft wrote:
>>>> On May 27, 2013, at 12:39 PM, Donald Stufft <don...@stufft.io> wrote:
>>>>
>>>>>
>>>>> On May 27, 2013, at 8:08 AM, holger krekel <hol...@merlinux.eu> wrote:
>>>>>
>>>>>> Hi Noah, Donald, (CC also Richard, Christian),
>>>>>>
>>>>>> i just checked with a test package and think we might have a cache
>>>>>> consistency / changelog API problem. It took me a while but here is
>>>>>> the basic thing: I uploaded a test package, changelog API reports it has
>>>>>> changed, then i go to its simple page, and some of the time the new
>>>>>> release file shows up, sometimes not.
>>>>>>
>>>>>> Tools like bandersnatch, pep381 and devpi-server (and probably others)
>>>>>> use PyPI's changelog API to determine if there are changes. It seems
>>>>>> those changes are signalled faster than they become consistently
>>>>>> accessible through the CDN. This can lead to inconsistent mirrors because
>>>>>> when the CDN has the files there is no change event anymore. Such mirrors
>>>>>> are run by companies in-house so i think it's a real problem.
>>>>>>
>>>>>> Even without mirroring there can be problems because installs are not
>>>>>> directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
>>>>>> then 2.0.0 a minute later. I had hoped that a particular ip address
>>>>>> sees things consistently.
>>>>>>
>>>>>> I am not familiar with Fastly's caching properties -- can they notify
>>>>>> about the fact that a page/file is consistently up-to-date everywhere?
>>>>>> Or can the cache be globally invalidated for a particular page/file?
>>>>>> Any other ideas?
>>>>>>
>>>>>> Failing customizing Fastly usage and also maybe for the short term,
>>>>>> is/could there be a special location provided by pypi.python.org which
>>>>>> the above tools could use to get at the actual non-cached data? We
>>>>>> could then maybe mitigate the problem through updates of the respective
>>>>>> tools.
>>>>>> That would at least solve the problem for one of my customers i think.
>>>>>>
>>>>>> best,
>>>>>> holger
>>>>>>
>>>>>>
>>>>>> On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
>>>>>>> </farnsworth>
>>>>>>>
>>>>>>> but seriously, at long last today it was my honor to throw the DNS
>>>>>>> switch to move PyPI to the Fastly caching CDN. I would like to thank
>>>>>>> Donald Stufft for doing much of the heavy lifting on the PyPI side, and
>>>>>>> to Fastly for graciously offering to host us. What does this mean for
>>>>>>> everyone? Well the biggest change is PyPI should get a whole lot
>>>>>>> faster. There are two major downsides however. There will now be a
>>>>>>> delay of several minutes in some cases between updating a package and
>>>>>>> having it be installable, and download counts will now be even more
>>>>>>> incorrect than they were before. The PyPI admins are discussing what to
>>>>>>> do about download counts long-term, but for now we all feel that the
>>>>>>> performance and availability benefits outweigh the loss. If anyone has
>>>>>>> any questions, or hears anything about issues with PyPI please don't
>>>>>>> hesitate to contact me.
>>>>>>>
>>>>>>> --Noah
>>>>>>>
>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Distutils-SIG maillist - Distutils-SIG@python.org
>>>>>>> http://mail.python.org/mailman/listinfo/distutils-sig
>>>>>
>>>>> I mentioned it on twitter but might as well mention it here as well.
>>>>>
>>>>> Currently there is no invalidation going on. The effect on the mirroring
>>>>> was unanticipated and I'm currently getting the invalidation API setup
>>>>> within PyPI.
>>>>>
>>>>> -----------------
>>>>> Donald Stufft
>>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>>>
>>>> /simple/ Pages should now be immediately invalidated when a new package is
>>>> released.
>>>
>>> thanks Donald. Looking at the implementation, i wonder what happens if
>>> after ``self._conn.commit()`` a changelog API call arrives, returns changes
>>> and a client uses it to retrieve changes before the fastly-purging takes
>>> place. It's still a potential race-condition or am i missing something?
>>>
>>> best,
>>> holger
>>>
>>>> -----------------
>>>> Donald Stufft
>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>
>> There's no way around a race condition.
>>
>> ``self._conn.commit()`` is what makes the changes available. If we purge
>> prior to committing it then if someone hits the page between the purge and
>> the self._conn.commit() then the client will see a page cached prior to the
>> update (while the change log will appear to be updated). Essentially the
>> same problem we have now.
>>
>> The current implementation does mean that if a client happens to hit between
>> the commit and the purge they'll see old data however that's pretty unlikely.
>
> Purging can take a second and also depends on the network connectivity
> between pypi.python.org and fastly's api to begin with. I am afraid
> the race-condition is bound to happen and then hard to detect.
>
> Not sure how exactly pypi.python.org is deployed but could commit() use
> a semaphore which also the changelog-APIs use so that the latter only
> returns after purging (and then some) has happened? I don't think
> mirrors would mind sometimes waiting a few seconds before the changelog* call
> returns as long as the state is then consistent.
>
> Lastly, i think introducing a bit of internal syncing overhead to commit()/
> changelog should be ok because we have only few writes and hardly read load.
Mirroring should not be affected by caching at all: new packages mean new URLs (/pypi/name/version), so there will be no cache issues when you retrieve them. What I think you mean is that this creates a race condition for pep381client; however, that is a bug in pep381client, not in PyPI. If you would like to submit a patch for a Paxos-based replication protocol, I'm sure Donald and I would be happy to review it.

--Noah
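
For readers following the thread, the semaphore idea quoted above can be sketched roughly as follows. This is only an illustration of the proposal, not how PyPI is implemented: `Origin`, `FakeCDN`, `upload` and `changelog` are hypothetical names, the CDN is a toy in-memory cache, and a single in-process `threading.Lock` stands in for whatever cross-process synchronisation a real deployment would need.

```python
import threading


class FakeCDN:
    """Toy cache (hypothetical): serves a stored copy of a page until purged."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, name):
        # Fill the cache from the origin on a miss, then serve the cached copy.
        if name not in self._cache:
            self._cache[name] = list(self._origin.simple_page(name))
        return self._cache[name]

    def purge(self, name):
        self._cache.pop(name, None)


class Origin:
    """Toy index (hypothetical): commit and purge happen under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._releases = {}
        self._changelog = []
        self.cdn = None

    def simple_page(self, name):
        return self._releases.get(name, [])

    def upload(self, name, version):
        with self._lock:
            # "commit": the new release becomes visible at the origin
            self._releases.setdefault(name, []).append(version)
            self._changelog.append((name, version))
            # purge before releasing the lock, so changelog readers wait
            if self.cdn is not None:
                self.cdn.purge(name)

    def changelog(self, since=0):
        # Blocks while an upload is between its commit and its purge.
        with self._lock:
            return self._changelog[since:]


origin = Origin()
cdn = FakeCDN(origin)
origin.cdn = cdn

origin.upload("pkg", "2.0.0")
cdn.get("pkg")                 # warm the cache with the old page
origin.upload("pkg", "2.0.1")

# Every change the changelog reports is already visible through the cache.
for name, version in origin.changelog():
    assert version in cdn.get(name)
```

The trade-off is the one raised in the quoted message: changelog callers may wait for a purge to finish, in exchange for never being told about a change that the cache cannot yet serve.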