On May 27, 2013, at 12:18 PM, holger krekel wrote:

> On Mon, May 27, 2013 at 14:59 -0400, Donald Stufft wrote:
>> On May 27, 2013, at 2:54 PM, holger krekel <hol...@merlinux.eu> wrote:
>> 
>>> On Mon, May 27, 2013 at 13:50 -0400, Donald Stufft wrote:
>>>> On May 27, 2013, at 12:39 PM, Donald Stufft <don...@stufft.io> wrote:
>>>> 
>>>>> 
>>>>> On May 27, 2013, at 8:08 AM, holger krekel <hol...@merlinux.eu> wrote:
>>>>> 
>>>>>> Hi Noah, Donald, (CC also Richard, Christian),
>>>>>> 
>>>>>> i just checked with a test package and think we might have a cache
>>>>>> consistency / changelog API problem.  It took me a while but here is 
>>>>>> the basic thing: I uploaded a test package, changelog API reports it has
>>>>>> changed, then i go to its simple page, and some of the time the new
>>>>>> release file shows up, sometimes not.
>>>>>> 
>>>>>> Tools like bandersnatch, pep381 and devpi-server (and probably others)
>>>>>> use PyPI's changelog API to determine if there are changes.  It seems
>>>>>> those changes are signalled faster than they become consistently accessible
>>>>>> through the CDN.  This can lead to inconsistent mirrors because when 
>>>>>> the CDN has the files there is no change event anymore.  Such mirrors 
>>>>>> are run by companies in-house so i think it's a real problem.
>>>>>> 
>>>>>> Even without mirroring there can be problems because installs are not
>>>>>> directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
>>>>>> then 2.0.0 a minute later.  I had hoped that a particular ip address
>>>>>> sees things consistently.
>>>>>> 
>>>>>> I am not familiar with Fastly's caching properties -- can they notify
>>>>>> about the fact that a page/file is consistently up-to-date everywhere?  
>>>>>> Or can the cache be globally invalidated for a particular page/file?
>>>>>> Any other ideas?
>>>>>> 
>>>>>> Failing customizing Fastly usage and also maybe for the short term,
>>>>>> is/could there be a special location provided by pypi.python.org which
>>>>>> the above tools could use to get at the actual non-cached data?  We
>>>>>> could then maybe mitigate the problem through updates of the respective tools.
>>>>>> That would at least solve the problem for one of my customers i think.
>>>>>> 
>>>>>> best,
>>>>>> holger
>>>>>> 
>>>>>> 
>>>>>> On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
>>>>>>> </farnsworth>
>>>>>>> 
>>>>>>> but seriously, at long last today it was my honor to throw the DNS 
>>>>>>> switch to move PyPI to the Fastly caching CDN. I would like to thank 
>>>>>>> Donald Stufft for doing much of the heavy lifting on the PyPI side, and 
>>>>>>> to Fastly for graciously offering to host us. What does this mean for 
>>>>>>> everyone? Well the biggest change is PyPI should get a whole lot 
>>>>>>> faster. There are two major downsides however. There will now be a 
>>>>>>> delay of several minutes in some cases between updating a package and 
>>>>>>> having it be installable, and download counts will now be even more 
>>>>>>> incorrect than they were before. The PyPI admins are discussing what to 
>>>>>>> do about download counts long-term, but for now we all feel that the 
>>>>>>> performance and availability benefits outweigh the loss. If anyone has 
>>>>>>> any questions, or hears anything about issues with PyPI please don't 
>>>>>>> hesitate to contact me.
>>>>>>> 
>>>>>>> --Noah
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>>>>>>> http://mail.python.org/mailman/listinfo/distutils-sig
>>>>>> 
>>>>> 
>>>>> I mentioned it on twitter but might as well mention it here as well.
>>>>> 
>>>>> Currently there is no invalidation going on. The effect on the mirroring 
>>>>> was unanticipated and I'm currently getting the invalidation API set up
>>>>> within PyPI.
>>>>> 
>>>>> -----------------
>>>>> Donald Stufft
>>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> /simple/ pages should now be immediately invalidated when a new package is
>>>> released.
>>> 
>>> thanks Donald.  Looking at the implementation, i wonder what happens if 
>>> after ``self._conn.commit()`` a changelog API call arrives, returns changes
>>> and a client uses it to retrieve changes before the fastly-purging takes 
>>> place.  It's still a potential race-condition or am i missing something?
>>> 
>>> best,
>>> holger
>>> 
>>>> -----------------
>>>> Donald Stufft
>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>>> 
>>> 
>>> 
>> 
>> 
>> There's no way around a race condition.
>> 
>> ``self._conn.commit()`` is what makes the changes available. If we purge 
>> prior to committing it then if someone hits the page between the purge and 
>> the self._conn.commit() then the client will see a page cached prior to the 
>> update (while the change log will appear to be updated). Essentially the 
>> same problem we have now.
>> 
>> The current implementation does mean that if a client happens to hit between 
>> the commit and the purge they'll see old data; however, that's pretty unlikely.
> 
> Purging can take a second and also depends on the network connectivity 
> between pypi.python.org and fastly's api to begin with.   I am afraid 
> the race-condition is bound to happen and then hard to detect.  
> 
> Not sure how exactly pypi.python.org is deployed but could commit() use
> a semaphore which also the changelog-APIs use so that the latter only
> returns after purging (and then some) has happened?  I don't think
> mirrors would mind sometimes waiting a few seconds before the changelog* call
> returns as long as the state is then consistent.
> 
> Lastly, i think introducing a bit of internal syncing overhead to commit()/
> changelog should be ok because we have only few writes and hardly read load.
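
A minimal sketch of the locking idea quoted above, in Python. It is not PyPI's code: the `Index` class, its method names, the `mirror_poll` helper, the file name, and the `time.sleep` standing in for the Fastly purge request are all hypothetical. A single in-process `threading.Lock` also assumes one server process, which may not match how pypi.python.org is deployed.

```python
import threading
import time


class Index:
    """Toy stand-in for the PyPI database plus a CDN-cached /simple/ page."""

    def __init__(self):
        self._sync = threading.Lock()  # shared by publish() and changelog()
        self.db_files = []             # committed release files (origin)
        self.cdn_files = []            # what the CDN cache currently serves
        self.events = []               # changelog entries

    def publish(self, filename, purge_delay=0.05):
        with self._sync:
            self.db_files.append(filename)  # stands in for commit()
            self.events.append(filename)
            time.sleep(purge_delay)         # simulated purge latency
            self.cdn_files = list(self.db_files)
            # The lock is released only after the purge has finished.

    def changelog(self):
        with self._sync:  # blocks while a publish is in flight
            return list(self.events)

    def simple_page(self):
        return list(self.cdn_files)  # cached reads never take the lock


def mirror_poll(index):
    """Poll the changelog, then fetch the cached page, as a mirror would."""
    while True:
        changes = index.changelog()
        if changes:
            return changes, index.simple_page()
        time.sleep(0.001)


index = Index()
writer = threading.Thread(target=index.publish, args=("pkg-2.0.1.tar.gz",))
writer.start()
changes, page = mirror_poll(index)
writer.join()
print(changes, page)
```

Under these assumptions a mirror never sees a changelog entry whose file is missing from the cached page. The cost is that changelog calls block for the duration of a purge.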

Mirroring should not be affected by caching at all, as new packages mean new 
URLs (/pypi/name/version), so when you retrieve them there will be no cache 
issues. What I think you mean is this makes a race condition for pep381client, 
however this is a bug in pep381client, not PyPI. If you would like to submit a 
patch for a Paxos-based replication protocol, I'm sure Donald and I would be 
happy to review it.

--Noah

