On May 27, 2013, at 2:54 PM, holger krekel <hol...@merlinux.eu> wrote:

> On Mon, May 27, 2013 at 13:50 -0400, Donald Stufft wrote:
>> On May 27, 2013, at 12:39 PM, Donald Stufft <don...@stufft.io> wrote:
>> 
>>> 
>>> On May 27, 2013, at 8:08 AM, holger krekel <hol...@merlinux.eu> wrote:
>>> 
>>>> Hi Noah, Donald, (CC also Richard, Christian),
>>>> 
>>>> i just checked with a test package and think we might have a cache
>>>> consistency / changelog API problem.  It took me a while but here is 
>>>> the basic thing: I uploaded a test package, changelog API reports it has
>>>> changed, then i go to its simple page, and some of the time the new release
>>>> file shows up, sometimes not.
>>>> 
>>>> Tools like bandersnatch, pep381 and devpi-server (and probably others)
>>>> use PyPI's changelog API to determine if there are changes.  It seems
>>>> those changes are signalled faster than they become consistently
>>>> accessible through the CDN.  This can lead to inconsistent mirrors because
>>>> when the CDN has the files there is no change event anymore.  Such mirrors
>>>> are run by companies in-house so i think it's a real problem.
>>>> 
>>>> Even without mirroring there can be problems because installs are not
>>>> directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
>>>> then 2.0.0 a minute later.  I had hoped that a particular ip address
>>>> sees things consistently.
>>>> 
>>>> I am not familiar with Fastly's caching properties -- can they notify
>>>> about the fact that a page/file is consistently up-to-date everywhere?  
>>>> Or can the cache be globally invalidated for a particular page/file?
>>>> Any other ideas?
>>>> 
>>>> Failing customizing Fastly usage and also maybe for the short term,
>>>> is/could there be a special location provided by pypi.python.org which
>>>> the above tools could use to get at the actual non-cached data?  We
>>>> could then maybe mitigate the problem through updates of the respective
>>>> tools.  That would at least solve the problem for one of my customers
>>>> i think.
>>>> 
>>>> best,
>>>> holger
>>>> 
>>>> 
>>>> On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
>>>>> </farnsworth>
>>>>> 
>>>>> but seriously, at long last today it was my honor to throw the DNS switch 
>>>>> to move PyPI to the Fastly caching CDN. I would like to thank Donald 
>>>>> Stufft for doing much of the heavy lifting on the PyPI side, and to 
>>>>> Fastly for graciously offering to host us. What does this mean for 
>>>>> everyone? Well the biggest change is PyPI should get a whole lot faster. 
>>>>> There are two major downsides however. There will now be a delay of 
>>>>> several minutes in some cases between updating a package and having it be 
>>>>> installable, and download counts will now be even more incorrect than 
>>>>> they were before. The PyPI admins are discussing what to do about 
>>>>> download counts long-term, but for now we all feel that the performance 
>>>>> and availability benefits outweigh the loss. If anyone has any questions, 
>>>>> or hears anything about issues with PyPI please don't hesitate to contact 
>>>>> me.
>>>>> 
>>>>> --Noah
>>>>> 
>>>> 
>>>> 
>>>> 
>>>>> _______________________________________________
>>>>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>>>>> http://mail.python.org/mailman/listinfo/distutils-sig
>>>> 
>>> 
>>> I mentioned it on twitter but might as well mention it here as well.
>>> 
>>> Currently there is no invalidation going on. The effect on the mirroring
>>> was unanticipated and I'm currently getting the invalidation API set up
>>> within PyPI.
>>> 
>>> -----------------
>>> Donald Stufft
>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>> 
>> 
>> 
>> 
>> /simple/ pages should now be immediately invalidated when a new package is
>> released.
> 
> thanks Donald.  Looking at the implementation, i wonder what happens if 
> after ``self._conn.commit()`` a changelog API call arrives, returns changes
> and a client uses it to retrieve changes before the fastly-purging takes 
> place.  It's still a potential race-condition or am i missing something?
> 
> best,
> holger
> 
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> 
> 
> 


There's no way around a race condition.

``self._conn.commit()`` is what makes the changes available. If we purged before
committing, anyone who hit the page between the purge and the
``self._conn.commit()`` would cause the pre-update page to be cached again, so
clients would keep seeing the old page while the changelog already reports the
change. That is essentially the same problem we have now.

The current implementation does mean that a client who happens to hit the page
between the commit and the purge will see old data; however, that's pretty
unlikely.
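
To make the two orderings concrete, here is a minimal sketch in Python. It is
a toy in-memory model, not PyPI's or Fastly's code: the ``Origin`` and
``Cache`` classes and both functions are hypothetical names made up for
illustration, and the cache is assumed to keep a page until it is purged.

```python
class Origin:
    """Hypothetical stand-in for the PyPI database (committed state only)."""

    def __init__(self):
        self.page = "2.0.0"      # newest release listed on the simple page
        self.changelog = []      # what the changelog API would report

    def commit(self, version):
        # After the commit, both the page and the changelog show the release.
        self.page = version
        self.changelog.append(version)


class Cache:
    """Hypothetical stand-in for the CDN: keeps a copy until purged."""

    def __init__(self, origin):
        self.origin = origin
        self.cached = None

    def get(self):
        if self.cached is None:          # cache miss: fetch from the origin
            self.cached = self.origin.page
        return self.cached

    def purge(self):
        self.cached = None


def purge_then_commit():
    origin = Origin()
    cache = Cache(origin)
    cache.get()                  # cache is warm with the old page
    cache.purge()
    cache.get()                  # a client hits between purge and commit
    origin.commit("2.0.1")
    # The changelog reports the release, but the old page is cached again.
    return origin.changelog, cache.get()


def commit_then_purge():
    origin = Origin()
    cache = Cache(origin)
    cache.get()                  # cache is warm with the old page
    origin.commit("2.0.1")
    in_window = cache.get()      # a client hits between commit and purge
    cache.purge()
    # Only the client inside the window saw old data; later ones do not.
    return origin.changelog, in_window, cache.get()
```

In this model, purging first leaves the old page in the cache after the
changelog has moved on, while committing first limits stale reads to requests
that land inside the commit-to-purge window.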

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


