On Mon, May 27, 2013 at 14:59 -0400, Donald Stufft wrote:
> On May 27, 2013, at 2:54 PM, holger krekel <hol...@merlinux.eu> wrote:
> 
> > On Mon, May 27, 2013 at 13:50 -0400, Donald Stufft wrote:
> >> On May 27, 2013, at 12:39 PM, Donald Stufft <don...@stufft.io> wrote:
> >> 
> >>> 
> >>> On May 27, 2013, at 8:08 AM, holger krekel <hol...@merlinux.eu> wrote:
> >>> 
> >>>> Hi Noah, Donald, (CC also Richard, Christian),
> >>>> 
> >>>> i just checked with a test package and think we might have a cache
> >>>> consistency / changelog API problem.  It took me a while but here is 
> >>>> the basic thing: I uploaded a test package, changelog API reports it has
> >>>> changed, then i go to its simple page, and some of the time the new
> >>>> release file shows up, sometimes not.
> >>>> 
> >>>> Tools like bandersnatch, pep381 and devpi-server (and probably others)
> >>>> use PyPI's changelog API to determine if there are changes.  It seems
> >>>> those changes are signalled faster than they become consistently
> >>>> accessible through the CDN.  This can lead to inconsistent mirrors
> >>>> because when the CDN has the files there is no change event anymore.
> >>>> Such mirrors are run by companies in-house so i think it's a real
> >>>> problem.
> >>>> 
> >>>> Even without mirroring there can be problems because installs are not
> >>>> directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
> >>>> then 2.0.0 a minute later.  I had hoped that a particular ip address
> >>>> sees things consistently.
> >>>> 
> >>>> I am not familiar with Fastly's caching properties -- can they notify
> >>>> about the fact that a page/file is consistently up-to-date everywhere?  
> >>>> Or can the cache be globally invalidated for a particular page/file?
> >>>> Any other ideas?
> >>>> 
> >>>> Failing customizing Fastly usage and also maybe for the short term,
> >>>> is/could there be a special location provided by pypi.python.org which
> >>>> the above tools could use to get at the actual non-cached data?  We
> >>>> could then maybe mitigate the problem through updates of the respective
> >>>> tools.  That would at least solve the problem for one of my customers
> >>>> i think.
> >>>> 
> >>>> best,
> >>>> holger
> >>>> 
> >>>> 
> >>>> On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
> >>>>> </farnsworth>
> >>>>> 
> >>>>> but seriously, at long last today it was my honor to throw the DNS 
> >>>>> switch to move PyPI to the Fastly caching CDN. I would like to thank 
> >>>>> Donald Stufft for doing much of the heavy lifting on the PyPI side, and 
> >>>>> to Fastly for graciously offering to host us. What does this mean for 
> >>>>> everyone? Well the biggest change is PyPI should get a whole lot 
> >>>>> faster. There are two major downsides however. There will now be a 
> >>>>> delay of several minutes in some cases between updating a package and 
> >>>>> having it be installable, and download counts will now be even more 
> >>>>> incorrect than they were before. The PyPI admins are discussing what to 
> >>>>> do about download counts long-term, but for now we all feel that the 
> >>>>> performance and availability benefits outweigh the loss. If anyone has 
> >>>>> any questions, or hears anything about issues with PyPI please don't 
> >>>>> hesitate to contact me.
> >>>>> 
> >>>>> --Noah
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>>> _______________________________________________
> >>>>> Distutils-SIG maillist  -  Distutils-SIG@python.org
> >>>>> http://mail.python.org/mailman/listinfo/distutils-sig
> >>>> 
> >>> 
> >>> I mentioned it on twitter but might as well mention it here as well.
> >>> 
> >>> Currently there is no invalidation going on. The effect on the mirroring 
> >>> was unanticipated and I'm currently getting the invalidation API setup 
> >>> within PyPI.
> >>> 
> >>> -----------------
> >>> Donald Stufft
> >>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> >>> 
> >> 
> >> 
> >> 
> >> /simple/ Pages should now be immediately invalidated when a new package is 
> >> released.
> > 
> > thanks Donald.  Looking at the implementation, i wonder what happens if 
> > after ``self._conn.commit()`` a changelog API call arrives, returns changes
> > and a client uses it to retrieve changes before the fastly-purging takes 
> > place.  It's still a potential race-condition or am i missing something?
> > 
> > best,
> > holger
> > 
> >> -----------------
> >> Donald Stufft
> >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> >> 
> > 
> > 
> 
> 
> There's no way around a race condition.
> 
> ``self._conn.commit()`` is what makes the changes available. If we purge 
> prior to committing it then if someone hits the page between the purge and 
> the self._conn.commit() then the client will see a page cached prior to the 
> update (while the change log will appear to be updated). Essentially the same 
> problem we have now.
> 
> The current implementation does mean that if a client happens to hit between 
> the commit and the purge they'll see old data however that's pretty unlikely.

Purging can take a second and also depends on the network connectivity
between pypi.python.org and Fastly's API to begin with.  I am afraid
the race condition is bound to happen and will then be hard to detect.
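
To make the window concrete, here is a toy model of the commit-then-purge
ordering.  It is only a sketch and not PyPI's actual code: the ``Index``
class, its attributes, and the file names are all made up for illustration,
and ``purge()`` merely stands in for the real call to Fastly.

```python
class Index:
    """Toy model: a database, a changelog serial, and a CDN cache."""

    def __init__(self):
        self.db_files = ["pkg-2.0.0.tar.gz"]   # committed release files
        self.changelog_serial = 1              # what the changelog API reports
        self.cdn_cache = list(self.db_files)   # what the CDN serves for /simple/

    def commit(self, filename):
        # Plays the role of self._conn.commit(): the change becomes visible
        # to the changelog API immediately.
        self.db_files.append(filename)
        self.changelog_serial += 1

    def purge(self):
        # Hypothetical stand-in for the purge: afterwards the CDN serves
        # fresh content again.
        self.cdn_cache = list(self.db_files)

    def simple_page(self):
        # Clients only ever see what the CDN has cached.
        return list(self.cdn_cache)


index = Index()
index.commit("pkg-2.0.1.tar.gz")

# A mirror polls exactly here, after the commit but before the purge.
serial_seen = index.changelog_serial   # already reports the change
page_seen = index.simple_page()        # still the stale cached page

index.purge()
page_after = index.simple_page()       # now contains the new file
```

A mirror that records ``serial_seen`` as processed has no later change event
telling it to fetch the page again, which is why the stale state can persist.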

Not sure how exactly pypi.python.org is deployed, but could commit() use
a semaphore which the changelog APIs also use, so that the latter only
return after purging (and then some) has happened?  I don't think
mirrors would mind sometimes waiting a few seconds before the changelog* call
returns, as long as the state is then consistent.
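
A rough sketch of what i mean, under the assumption that everything runs in
a single process (a multi-process deployment would need a cross-process lock
instead).  All names here (``Index``, ``changelog_since``, ``_purge``) are
hypothetical and the purge is only simulated with a short sleep.

```python
import threading
import time


class Index:
    """Toy model where commit+purge and changelog reads share one lock."""

    def __init__(self):
        self._publish_lock = threading.Lock()
        self.db_files = ["pkg-2.0.0.tar.gz"]
        self.changelog = []                     # list of changed file names
        self.cdn_cache = list(self.db_files)    # what the CDN serves

    def _purge(self):
        time.sleep(0.05)                        # simulated purge latency
        self.cdn_cache = list(self.db_files)

    def commit(self, filename):
        # Hold the lock across both the commit and the purge.
        with self._publish_lock:
            self.db_files.append(filename)
            self.changelog.append(filename)
            self._purge()

    def changelog_since(self, position):
        # Blocks while a commit/purge is in flight, so any change returned
        # here has already been purged from the CDN.
        with self._publish_lock:
            return list(self.changelog[position:])

    def simple_page(self):
        return list(self.cdn_cache)


def demo():
    index = Index()
    writer = threading.Thread(target=index.commit, args=("pkg-2.0.1.tar.gz",))
    writer.start()
    changes = []
    while not changes:                          # mirror polling the changelog
        changes = index.changelog_since(0)
        time.sleep(0.001)
    page = index.simple_page()                  # fetched right after the change
    writer.join()
    return changes, page
```

With this ordering a changelog call may wait for the duration of a purge,
but whatever it reports is already visible on the simple page.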

Lastly, i think introducing a bit of internal syncing overhead to commit()/
changelog should be ok because we have only a few writes and hardly any read load.

holger



> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 


