On Mon, May 27, 2013 at 14:59 -0400, Donald Stufft wrote:
> On May 27, 2013, at 2:54 PM, holger krekel <hol...@merlinux.eu> wrote:
>
> > On Mon, May 27, 2013 at 13:50 -0400, Donald Stufft wrote:
> >> On May 27, 2013, at 12:39 PM, Donald Stufft <don...@stufft.io> wrote:
> >>
> >>>
> >>> On May 27, 2013, at 8:08 AM, holger krekel <hol...@merlinux.eu> wrote:
> >>>
> >>>> Hi Noah, Donald, (CC also Richard, Christian),
> >>>>
> >>>> i just checked with a test package and think we might have a cache
> >>>> consistency / changelog API problem. It took me a while but here is
> >>>> the basic thing: I uploaded a test package, changelog API reports it has
> >>>> changed, then i go to its simple page, and some of the time the new
> >>>> release
> >>>> file shows up, sometimes not.
> >>>>
> >>>> Tools like bandersnatch, pep381 and devpi-server (and probably others)
> >>>> use PyPI's changelog API to determine if there are changes. It seems
> >>>> those changes are signalled faster than they become consistently
> >>>> accessible
> >>>> through the CDN. This can lead to inconsistent mirrors because when
> >>>> the CDN has the files there is no change event anymore. Such mirrors
> >>>> are run by companies in-house so i think it's a real problem.
> >>>>
> >>>> Even without mirroring there can be problems because installs are not
> >>>> directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
> >>>> then 2.0.0 a minute later. I had hoped that a particular ip address
> >>>> sees things consistently.
> >>>>
> >>>> I am not familiar with Fastly's caching properties -- can they notify
> >>>> about the fact that a page/file is consistently up-to-date everywhere?
> >>>> Or can the cache be globally invalidated for a particular page/file?
> >>>> Any other ideas?
> >>>>
> >>>> Failing customizing Fastly usage and also maybe for the short term,
> >>>> is/could there be a special location provided by pypi.python.org which
> >>>> the above tools could use to get at the actual non-cached data? We
> >>>> could then maybe mitigate the problem through updates of the respective
> >>>> tools.
> >>>> That would at least solve the problem for one of my customers i think.
> >>>>
> >>>> best,
> >>>> holger
> >>>>
> >>>>
> >>>> On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
> >>>>> </farnsworth>
> >>>>>
> >>>>> but seriously, at long last today it was my honor to throw the DNS
> >>>>> switch to move PyPI to the Fastly caching CDN. I would like to thank
> >>>>> Donald Stufft for doing much of the heavy lifting on the PyPI side, and
> >>>>> to Fastly for graciously offering to host us. What does this mean for
> >>>>> everyone? Well the biggest change is PyPI should get a whole lot
> >>>>> faster. There are two major downsides however. There will now be a
> >>>>> delay of several minutes in some cases between updating a package and
> >>>>> having it be installable, and download counts will now be even more
> >>>>> incorrect than they were before. The PyPI admins are discussing what to
> >>>>> do about download counts long-term, but for now we all feel that the
> >>>>> performance and availability benefits outweigh the loss. If anyone has
> >>>>> any questions, or hears anything about issues with PyPI please don't
> >>>>> hesitate to contact me.
> >>>>>
> >>>>> --Noah
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>> _______________________________________________
> >>>>> Distutils-SIG maillist - Distutils-SIG@python.org
> >>>>> http://mail.python.org/mailman/listinfo/distutils-sig
> >>>
> >>> I mentioned it on twitter but might as well mention it here as well.
> >>>
> >>> Currently there is no invalidation going on. The effect on the mirroring
> >>> was unanticipated and I'm currently getting the invalidation API setup
> >>> within PyPI.
> >>>
> >>> -----------------
> >>> Donald Stufft
> >>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
> >>> DCFA
> >>
> >>
> >>
> >> /simple/ Pages should now be immediately invalidated when a new package is
> >> released.
> >
> > thanks Donald. Looking at the implementation, i wonder what happens if
> > after ``self._conn.commit()`` a changelog API call arrives, returns changes
> > and a client uses it to retrieve changes before the fastly-purging takes
> > place. It's still a potential race-condition or am i missing something?
> >
> > best,
> > holger
> >
> >> -----------------
> >> Donald Stufft
> >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
> >> DCFA
> >>
> >
> >
> >
> There's no way around a race condition.
>
> ``self._conn.commit()`` is what makes the changes available. If we purge
> prior to committing it then if someone hits the page between the purge and
> the self._conn.commit() then the client will see a page cached prior to the
> update (while the change log will appear to be updated). Essentially the same
> problem we have now.
>
> The current implementation does mean that if a client happens to hit between
> the commit and the purge they'll see old data however that's pretty unlikely.

Purging can take a second, and it also depends on the network connectivity
between pypi.python.org and Fastly's API to begin with. I am afraid the
race condition is bound to happen and will then be hard to detect.

I am not sure how exactly pypi.python.org is deployed, but could commit()
use a semaphore that the changelog APIs also use, so that the latter only
return after purging (and then some) has happened? I don't think mirrors
would mind sometimes waiting a few seconds before the changelog* call
returns, as long as the state is then consistent.

Lastly, I think introducing a bit of internal syncing overhead to
commit()/changelog should be ok because we have only few writes and
hardly any read load.

holger