Re: [Distutils] Fwd: The state of PyPI
>> However, since this all forms a single
>> web site, integrating them will be either infeasible or pointless.
>
> There are 2 sites, "simple" and "pypi".

No, it's one site: http://pypi.python.org/pypi, and http://pypi.python.org/simple (plus there are other URLs, like /packages).

Regards,
Martin
___
Distutils-SIG maillist - Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 2:10 PM, "Martin v. Löwis" wrote:
> On 27.09.2011 13:12, Jim Fulton wrote:
>> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro wrote:
>>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote:
>>>> 1/ stability and high availability
>>>
>>> How are opinions on setting up country-specific PyPI mirrors? The lag
>>> to the US is pretty severe in Poland, and I suspect my buildouts would
>>> benefit from having a server in Poland. Now, of course, it could be
>>> called x.pypi.python.org, but maybe we should have aliases such as
>>> pl.pypi.python.org as well?
>>>
>>> I have no strong opinion on the issue, what do others think?
>>
>> Wouldn't CloudFront make this moot?
>
> I personally don't believe the CloudFront project is feasible - IMO,
> it just won't work. This is because there needs to be both dynamic
> content (at least for uploads) and static content; CloudFront can
> only mirror the static content.

That's the intent. To serve the static content.

> However, since this all forms a single
> web site, integrating them will be either infeasible or pointless.

There are 2 sites, "simple" and "pypi". I would only host the simple site in a CDN. I assume the mirrors are only mirroring the simple site. pypi would, I assume, either stay where it is now, or be evolved separately.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
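Jim's proposal splits the site by kind of content: read-only requests for the simple index (and the package files it links to) would be served through a CDN, while dynamic requests such as uploads stay on the origin server. The routing rule below is a minimal sketch of that split. The CDN hostname and the choice of /simple and /packages as the static prefixes are assumptions for illustration, not PyPI's actual configuration.

```python
from urllib.parse import urlsplit

# Hypothetical hostnames, for illustration only.
ORIGIN_HOST = "pypi.python.org"
CDN_HOST = "cdn.example.invalid"

# Assumed static, read-only areas of the site.
STATIC_PREFIXES = ("/simple", "/packages")


def is_static(method: str, path: str) -> bool:
    """Only read-only requests under a static prefix are cacheable."""
    if method.upper() not in ("GET", "HEAD"):
        return False  # uploads and other writes must reach the origin
    return any(path == p or path.startswith(p + "/") for p in STATIC_PREFIXES)


def route(method: str, url: str) -> str:
    """Return the host that should serve this request."""
    path = urlsplit(url).path
    return CDN_HOST if is_static(method, path) else ORIGIN_HOST
```

Under these assumptions a GET of http://pypi.python.org/simple/zc.buildout/ goes to the CDN, while a POST (such as an upload) always reaches the origin. Martin's objection is that the two halves still have to present themselves as one site.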
Re: [Distutils] Fwd: The state of PyPI
> That's an extra HTTP request I need to make when I'm
> considering use of a mirror. If the first mirror I check seems to
> be out of date, I may need to check all the mirrors. It's an open
> question what should be considered potentially out of date, a
> timestamp older than an hour? a day?

15 minutes; if all mirrors are older than 15 minutes, it probably means that the master is down, and you should then use the newest mirror you can find.

> How old is too old?

More than 15 minutes.

>> get the list of mirrors (-> the list of
>> mirrors and their timestamps get cached)
>
> They'll only get cached for the program invocation.

No, you can also cache the fastest mirror across invocations, and keep using it if it is younger than 15 minutes.

>> - pick the closest one
>
> How do I decide what's closest? Did you mean closest?

He probably means the one that responds fastest.

> "etc" is just waving hands. Selecting the right value is hard, possibly
> application dependent. Is this a configuration variable? Now the
> user has something to deal with.

No, it will be 15 minutes. People might request being able to configure it, but it won't really be necessary to do so.

Regards,
Martin
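Martin's rule can be written down as a small selection function: prefer the fastest mirror whose last sync is at most 15 minutes old, and if none qualifies, assume the master is down and take the most recently synced mirror. This is a sketch under assumptions: the `Mirror` record, and the use of a measured latency to stand in for "closest", are illustrative choices rather than anything PEP 381 or an existing client specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Martin's proposed freshness threshold.
MAX_AGE = timedelta(minutes=15)


@dataclass
class Mirror:
    host: str
    last_modified: datetime  # when the mirror last synced with the master
    latency: float           # measured response time in seconds (assumed metric)


def choose_mirror(
    mirrors: Sequence[Mirror], now: datetime, max_age: timedelta = MAX_AGE
) -> Optional[Mirror]:
    """Pick the fastest fresh mirror, or the newest one if all are stale."""
    if not mirrors:
        return None
    fresh = [m for m in mirrors if now - m.last_modified <= max_age]
    if fresh:
        # "Closest" read as "responds fastest".
        return min(fresh, key=lambda m: m.latency)
    # Every mirror is stale: the master is probably down, so use the newest.
    return max(mirrors, key=lambda m: m.last_modified)
```

Passing `now` explicitly keeps the function deterministic; a real client would pass the current UTC time.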
Re: [Distutils] Fwd: The state of PyPI
On Sep 28, 2011, at 09:59 AM, Greg Ewing wrote:

>Are there really no existing open-source mirroring
>systems out there?

rsync? :)

-Barry
Re: [Distutils] Fwd: The state of PyPI
Jim Fulton wrote:

> Life is short. We don't have to invent this ourselves.

Are there really no existing open-source mirroring systems out there? This seems like a common enough thing that it must have been solved already.

--
Greg
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 1:25 PM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton wrote:
...
>> But I don't want to have to update buildout *just* because of an itch
>> to have a custom protocol.
>
> I kind of wonder how hard it would be to have a standalone pypi
> download client, ripped off from python 3.3's packaging, so you would
> not have to worry about this.

I doubt I'm going to be able to avoid worrying about it. Still, a reference client implementation would be useful.

> And, well, you do not sound like you want to spend time in these
> matters in any case,

I don't know what you mean. Not sure I care. :)

> so if someone brings a patch I hope you will not
> refuse it.

No. I'll eventually implement it if no one else does.

>>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>>> don't think it's more complicated than this.
>>
>> I don't agree. On multiple levels. PYPI is often up but slow.
>
> That's an orthogonal issue : any server can be slow.

A service can be fast even if an individual server is slow. Also, CDNs can make lots of horsepower available that is shared among multiple customers. I really doubt that anything we build will be faster.

> One better way to drastically speed up buildout is to download /
> build stuff in parallel imo.

That's true and something I'd like to do at some point. That's one of the reasons I expect I'll have to worry about the protocol.

>
>> It's also in the wrong place. A CDN should provide better performance,
>> reliability and locality.
>
> Locality is indeed important, and picking up the nearest server is great.
> Reliability is also solved by the mirrors.

At the expense of increased complexity on the client.
>>
>> A client has to:
>>
>> - try pypi
>> - fallback to "last"
>> - If that's down, decide what other indexes to check
>>
>> I don't see how having timestamps help unless you know
>> what the current timestamp is, unless you say that you'll reject
>> a mirror with a timestamp more than some period in the past.
>
> How hard it is to make those decisions ?

It's not "hard" conceptually, but it's still a lot of implementation complexity and a lot of extra network requests.

> Do you really think getting the current timestamp is that hard ?
>
> And the mirror timestamp,
>
> http://b.pypi.python.org/last-modified
>
> In all you've said I fail to see how complicated it is, or long to do.

That's an extra HTTP request I need to make when I'm considering use of a mirror. If the first mirror I check seems to be out of date, I may need to check all the mirrors. It's an open question what should be considered potentially out of date, a timestamp older than an hour? a day?

> The ordering I see is:
>
> normal behavior:
> - if the cache is too old:

How old is too old?

> get the list of mirrors (-> the list of
> mirrors and their timestamps get cached)

They'll only get cached for the program invocation. This means I have to potentially check lots of mirrors every time someone runs buildout. I can reduce latency by doing this in parallel, but that's still a lot of requests.

> - pick the closest one

How do I decide what's closest? Did you mean closest? or most up to date?

> - use it
>
> the server times out:
> - try the "next closest"
>
>
>> It's not clear what this time delta should be and, in any case,
>> the client needs to first validate a mirror by checking its timestamp.
>
> This is the job of the client yes. An option that says, discard
> mirrors that are > 1 day, or 5 hours etc.

"etc" is just waving hands. Selecting the right value is hard, possibly application dependent. Is this a configuration variable? Now the user has something to deal with.
> Keeping a local cache that gets updated eventually is sufficient.

In process, or on disk? This just gets better and better. :)

>> I think this protocol is going to be hard to get right.
>
> Maybe ? but if a v1 allows us to switch from server 1 being down to
> server 2, it's already a success, no ?
>
> servers that *we* the community, manage.

I fail to see why this is inherently a good thing. I don't like "managing" things. Less work is good.

...

> Do we really want Amazon to handle PyPI ?

Yes, or Rackspace, or Google, or AOL, or, whatever. Just not us. (I suspect some of these might even do it for free.)

> I prefer a bunch of community mirrors. Heck, I have one at Mozilla,
> and might make it public one day :)
>
> Or maybe the optimal solution is our own CDN proxy so we don't deal
> with this on client side.
>
> raises, slowly>

Uh, yeah, sure.

FWIW, it hadn't occurred to me to use a CDN until a conversation a few days ago. Doh.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
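Jim's question (in process, or on disk?) has a simple on-disk answer, matching Martin's suggestion in the thread of caching the fastest mirror across invocations and reusing it for up to 15 minutes. The sketch below stores the chosen host and the time it was chosen in a small JSON file. The file location, its format, and treating the 15 minutes as a cache lifetime are all assumptions for illustration.

```python
import json
import os
import tempfile
import time
from typing import Optional

# Assumed cache lifetime in seconds (15 minutes, following Martin's figure).
CACHE_TTL = 15 * 60


def load_cached_mirror(path: str, ttl: float = CACHE_TTL,
                       now: Optional[float] = None) -> Optional[str]:
    """Return the cached host if the entry is younger than ttl, else None."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as f:
            entry = json.load(f)
        if now - entry["chosen_at"] <= ttl:
            return entry["host"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable or malformed cache: treat as a miss
    return None


def save_cached_mirror(path: str, host: str, now: Optional[float] = None) -> None:
    """Record the chosen host; write a temp file, then rename it into place."""
    now = time.time() if now is None else now
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"host": host, "chosen_at": now}, f)
    os.replace(tmp_path, path)  # atomic on the same filesystem
```

A client would consult the cache first and only probe mirrors on a miss. An in-process cache would just be a variable, but it would not help across separate buildout runs, which is the case Jim is worried about.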
Re: [Distutils] Fwd: The state of PyPI
> I understand why you don't want to rely on a proprietary solution.

It's not a proprietary solution. It uses standard technologies, such as HTTP and XML-RPC. If you say that the specific URLs to access are proprietary: true, but so would be a CloudFront mirror (which also requires a proprietary API to get the data into CloudFront).

> It's a reverse proxy. You point it at s3 and at a web server and it caches.
> Of course, it has aspects that are specific to its implementation.

That's not my understanding as to what the project to get PyPI onto CloudFront is doing (IIUC).

Regards,
Martin
Re: [Distutils] Fwd: The state of PyPI
On 27.09.2011 13:12, Jim Fulton wrote:
> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro wrote:
>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote:
>>> 1/ stability and high availability
>>
>> How are opinions on setting up country-specific PyPI mirrors? The lag
>> to the US is pretty severe in Poland, and I suspect my buildouts would
>> benefit from having a server in Poland. Now, of course, it could be
>> called x.pypi.python.org, but maybe we should have aliases such as
>> pl.pypi.python.org as well?
>>
>> I have no strong opinion on the issue, what do others think?
>
> Wouldn't CloudFront make this moot?

I personally don't believe the CloudFront project is feasible - IMO, it just won't work. This is because there needs to be both dynamic content (at least for uploads) and static content; CloudFront can only mirror the static content. However, since this all forms a single web site, integrating them will be either infeasible or pointless.

It would be possible to use CloudFront if the clients would be changed. But if so, it would be best if the CloudFront copy would just be a PEP 381 mirror.

Regards,
Martin
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton wrote:
>>> I understand where you're coming from but, ..
>> Sorry, I don't understand what you imply here.
> I understand why you don't want to rely on a proprietary solution.

But it's true that I don't want to rely on a proprietary solution. That's based on a good reason I think, mentioned at the end of this mail.

...
>
>> If you're saying that CloudFront is proven technology and that we
>> should not worry about relying on them, then I think we can do better
>> for the community to get locked-in for this, and continue to work on
>> an open protocol where everyone can participate by providing a spare
>> server. But maybe that's just me ?
>
> It's nice to have a hobby. :)

I think you've missed what we, bunch of hobbyists, did in the past two years:

+ 5 community mirrors are up and running, collecting download stats that get merged
+ pip does work with the mirrors, and offer fallback options

It's too bad you were not there to tell us we were wasting our time and how awesome CloudFront was ;)

But at this point, the shortest road to a better PyPI is to add the mirroring support to other clients, pip showed the lead. And if zc.buildout uses Distribute, it should get this feature at some point.

But having a CloudFront-based PyPI could also be interesting in parallel, I am not saying it's not. But the project is stalled, and has the flaws I've mentioned.

> But I don't want to have to update buildout *just* because of an itch
> to have a custom protocol.

I kind of wonder how hard it would be to have a standalone pypi download client, ripped off from python 3.3's packaging, so you would not have to worry about this.

And, well, you do not sound like you want to spend time in these matters in any case, so if someone brings a patch I hope you will not refuse it.

>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>> don't think it's more complicated than this.
>
> I don't agree. On multiple levels.
> PYPI is often up but slow.

That's an orthogonal issue : any server can be slow.

One better way to drastically speed up buildout is to download / build stuff in parallel imo.

> It's also in the wrong place. A CDN should provide better performance,
> reliability and locality.

Locality is indeed important, and picking up the nearest server is great. Reliability is also solved by the mirrors.

>
> A client has to:
>
> - try pypi
> - fallback to "last"
> - If that's down, decide what other indexes to check
>
> I don't see how having timestamps help unless you know
> what the current timestamp is, unless you say that you'll reject
> a mirror with a timestamp more than some period in the past.

How hard it is to make those decisions ?

Do you really think getting the current timestamp is that hard ?

And the mirror timestamp,

http://b.pypi.python.org/last-modified

In all you've said I fail to see how complicated it is, or long to do.

The ordering I see is:

normal behavior:
- if the cache is too old:
  get the list of mirrors (-> the list of mirrors and their timestamps get cached)
- pick the closest one
- use it

the server times out:
- try the "next closest"

> It's not clear what this time delta should be and, in any case,
> the client needs to first validate a mirror by checking its timestamp.

This is the job of the client yes. An option that says, discard mirrors that are > 1 day, or 5 hours etc.

Keeping a local cache that gets updated eventually is sufficient.

> I think this protocol is going to be hard to get right.

Maybe ? but if a v1 allows us to switch from server 1 being down to server 2, it's already a success, no ?

servers that *we* the community, manage.

>>>
>>> - It either requires extra dns calls or relies too heavily on the last
>>> mirror, which is probably likely
>>> to be the least reliable.
>>
>> Once you have the list, I don't think you require extra call.
>>
>> see
>> http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py
>
> It has to make extra dns calls to resolve the other mirror names to ips.

Yeah, once per session. But in any case, this is not a decision you're making on every download. It's something you do when you start to download stuff, and/or when a server times out. You stick with a server once it's working.

>
>
>>> Life is short. We don't have to invent this ourselves.
>>
>> Ah well, yeah -- Not sure what you are proposing right now.
>>
>> If you imply that everything should be solved on server-side, and that
>> we should not have mirroring
>
> I think we should pick a good CDN and use it.

I won't object, because this is orthogonal to the mirroring stuff, but I am not going to scratch the mirroring efforts to move PyPI to a single shop. Every service on the planet, even Amazon, can be down.

oh, my:

- https://forums.aws.amazon.com/message.jspa?messageID=244986
- http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm
- http://www.labnol.or
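Tarek's ordering (use the closest server, move to the next one when it times out, and stick with a server once it is working) can be sketched as a small session object. The class name, the caller-supplied `get(host, path)` function, and the assumption that timeouts and connection failures surface as `OSError` (which `TimeoutError`, `socket.timeout` and `urllib.error.URLError` all are in Python 3) are illustrative; no existing client API is implied.

```python
from typing import Callable, Dict, List, Optional, Sequence


class AllMirrorsFailed(Exception):
    """Raised when every candidate host failed."""


class MirrorSession:
    """Try hosts closest-first, then stick with whichever one works."""

    def __init__(self, hosts: Sequence[str]):
        self.hosts: List[str] = list(hosts)  # assumed ordered closest first
        self.current: Optional[str] = None   # the server we are sticking with

    def fetch(self, path: str, get: Callable[[str, str], bytes]) -> bytes:
        # Start with the server that worked last time, then the rest in order.
        order = self.hosts
        if self.current is not None:
            order = [self.current] + [h for h in self.hosts if h != self.current]
        errors: Dict[str, OSError] = {}
        for host in order:
            try:
                result = get(host, path)
            except OSError as exc:  # timeouts and connection errors
                errors[host] = exc
                continue            # try the "next closest"
            self.current = host     # stick with a server once it's working
            return result
        raise AllMirrorsFailed(errors)
```

With pypi.python.org placed first in `hosts`, this also covers Jim's "try pypi, then fall back" sequence; the open question in the thread is how to order and validate the remaining hosts.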
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 8:40 AM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 2:27 PM, Jim Fulton wrote:
> ...
>>
>> I understand where you're coming from but, ..
>
> Sorry, I don't understand what you imply here.

I understand why you don't want to rely on a proprietary solution.

>> I think it's saner to rely on proven technology
>> than to invent our own protocol. NIH?
>
> Ah sorry I misunderstood then. I thought CloudFront was a proprietary
> platform, with its own protocol.

It's a reverse proxy. You point it at s3 and at a web server and it caches. Of course, it has aspects that are specific to its implementation.

> If you're saying that we can move away from CloudFront at any time and
> have the same feature elsewhere, then it's perfect.

If we move to something else, *some* changes will be necessary, but we can certainly move. I agree, it's perfect. ;)

> If you're saying that CloudFront is proven technology and that we
> should not worry about relying on them, then I think we can do better
> for the community to get locked-in for this, and continue to work on
> an open protocol where everyone can participate by providing a spare
> server. But maybe that's just me ?

It's nice to have a hobby. :)

But I don't want to have to update buildout *just* because of an itch to have a custom protocol.

>
> Most of the mirroring protocol was inspired by Perl's CPAN btw.
> But the use case is usually: PyPI is down, we fallback to a mirror. I
> don't think it's more complicated than this.

I don't agree. On multiple levels.

PYPI is often up but slow.

It's also in the wrong place. A CDN should provide better performance, reliability and locality.

A client has to:

- try pypi
- fallback to "last"
- If that's down, decide what other indexes to check

I don't see how having timestamps help unless you know what the current timestamp is, unless you say that you'll reject a mirror with a timestamp more than some period in the past.
It's not clear what this time delta should be and, in any case, the client needs to first validate a mirror by checking its timestamp.

I think this protocol is going to be hard to get right.

>>
>> - It either requires extra dns calls or relies too heavily on the last
>> mirror, which is probably likely
>> to be the least reliable.
>
> Once you have the list, I don't think you require extra call.
>
> see
> http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py

It has to make extra dns calls to resolve the other mirror names to ips.

>> Life is short. We don't have to invent this ourselves.
>
> Ah well, yeah -- Not sure what you are proposing right now.
>
> If you imply that everything should be solved on server-side, and that
> we should not have mirroring

I think we should pick a good CDN and use it.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
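Validating a mirror costs the extra HTTP request Jim describes: fetch its /last-modified page (the thread gives http://b.pypi.python.org/last-modified as an example) and compare the timestamp with the current time. In the sketch below, the body format (a UTC timestamp written as YYYYMMDDThh:mm:ss) is an assumption to check against PEP 381, and the freshness threshold is a parameter because its value is exactly what is being debated (Martin proposes 15 minutes).

```python
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen

# Assumed body format of a mirror's /last-modified page (UTC); verify in PEP 381.
LAST_MODIFIED_FORMAT = "%Y%m%dT%H:%M:%S"


def parse_last_modified(body: str) -> datetime:
    """Parse the page body into a timezone-aware UTC datetime."""
    stamp = datetime.strptime(body.strip(), LAST_MODIFIED_FORMAT)
    return stamp.replace(tzinfo=timezone.utc)


def is_fresh(body: str, now: datetime, max_age: timedelta) -> bool:
    """True if the mirror synced no more than max_age before now."""
    return now - parse_last_modified(body) <= max_age


def fetch_last_modified(host: str, timeout: float = 10.0) -> datetime:
    """The one extra HTTP request per mirror that validation requires."""
    with urlopen(f"http://{host}/last-modified", timeout=timeout) as resp:
        return parse_last_modified(resp.read().decode("ascii"))
```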
Re: [Distutils] Fwd: The state of PyPI
On 9/27/11 9:07 AM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 3:00 PM, Wichert Akkerman wrote:
>> On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
>>> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman wrote:
>>> ..
>>>>>
>>>>> I understand where you're coming from but, ..
>>>>> I think it's saner to rely on proven technology
>>>>> than to invent our own protocol. NIH?
>>>>
>>>> This also feels like a problem that has already been solved in various ways
>>>> by Debian, RedHat, CPAN and others.
>>>
>>> Yes, and we've found a way similar to CPAN, with some Python specifics
>>> (PyPI download statistics mainly)
>>>
>>> Oh my, we're cycling again.
>>>
>>> Nothing personal to you or Jim, but I have a sudden fatigue on
>>> packaging because it seems like people are ignoring what's being done
>>> to complain afterwards about us suffering of some kind of NIH :)
>>
>> It's just that my perspective is that of a simple user. And from my
>> perspective nothing has changed in the last couple of years. Pypi still goes
>> down occasionally, and when that happens many things start breaking. It may
>> very well be that there are things planned or in progress, but until they
>> are both usable and used by standard tools, which for me means buildout and
>> setuptools, they are invisible.
>
> Fair enough,
>
> Pip has now the mirroring protocol implemented. I think they want to
> make it a default option for the next major Pip release. IOW, you
> should not suffer for downtimes using pip.
>
> We'd need to add the same feature in easy_install and zc.buildout.
> But since Pip did it, I think it's possible.
>
> We have 5 mirrors run by the community
> (http://pypi.python.org/mirrors) and I suspect porting pip's feature
> to zc.buildout and easy_install would take less time than creating a
> app.

I'm not sure I fully understand why each tool needs updating (vs. something that would provide HA but be invisible to the tools), but I suspect a lot of this is "legacy" related (i.e. PyPI was not originally designed to handle HA) and adding support to each tool, though potentially tedious-sounding (if not actually tedious), may be reasonable.

Alex

> Cheers
> Tarek
>
>> Wichert.

--
Alex Clark · http://aclark.net
Re: [Distutils] Fwd: The state of PyPI
On 9/27/11 8:51 AM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman wrote:
> ..
>>>
>>> I understand where you're coming from but, ..
>>> I think it's saner to rely on proven technology
>>> than to invent our own protocol. NIH?
>>
>> This also feels like a problem that has already been solved in various ways
>> by Debian, RedHat, CPAN and others.
>
> Yes, and we've found a way similar to CPAN, with some Python specifics
> (PyPI download statistics mainly)
>
> Oh my, we're cycling again.
>
> Nothing personal to you or Jim, but I have a sudden fatigue on
> packaging because it seems like people are ignoring what's being done
> to complain afterwards about us suffering of some kind of NIH :)
>
> If you're seeing anything you don't like in PEP 381 (accepted a while
> ago), go ahead and propose some improvements. But please keep in mind
> that we've looked at other systems before we wrote that PEP.

+1

If someone (Tarek in this case) wants to take the lead on this then I don't care where the fix was invented ;-) Obviously the less we make volunteers do, and the more we can outsource the better (I say this in Plone-land repeatedly) but sometimes inventing-something-here is the right thing to do. And it certainly would not preclude other (cloud) fixes AFAICT.

Alex

> Cheers
> Tarek

--
Alex Clark · http://aclark.net
Re: [Distutils] Fwd: The state of PyPI
Is there also any plan to work on an updated and better version of PyPI? I mean, I understand that we need to work on the infrastructure and make sure it will handle the load, but I also think that the current PyPI codebase needs to be updated. Is a refactoring planned as part of this work? I suspect this is something we'll have to do one day or another if we want to continue integrating things into it and adding features...

On Tue, Sep 27, 2011 at 9:07 AM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 3:00 PM, Wichert Akkerman wrote:
>> On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
>>> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman wrote:
>>> ..
>>>>>
>>>>> I understand where you're coming from but, ..
>>>>> I think it's saner to rely on proven technology
>>>>> than to invent our own protocol. NIH?
>>>>
>>>> This also feels like a problem that has already been solved in various ways
>>>> by Debian, RedHat, CPAN and others.
>>>
>>> Yes, and we've found a way similar to CPAN, with some Python specifics
>>> (PyPI download statistics mainly)
>>>
>>> Oh my, we're cycling again.
>>>
>>> Nothing personal to you or Jim, but I have a sudden fatigue on
>>> packaging because it seems like people are ignoring what's being done
>>> to complain afterwards about us suffering of some kind of NIH :)
>>
>> It's just that my perspective is that of a simple user. And from my
>> perspective nothing has changed in the last couple of years. Pypi still goes
>> down occasionally, and when that happens many things start breaking. It may
>> very well be that there are things planned or in progress, but until they
>> are both usable and used by standard tools, which for me means buildout and
>> setuptools, they are invisible.
>
> Fair enough,
>
> Pip has now the mirroring protocol implemented. I think they want to
> make it a default option for the next major Pip release. IOW, you
> should not suffer for downtimes using pip.
>
> We'd need to add the same feature in easy_install and zc.buildout.
> But since Pip did it, I think it's possible.
>
> We have 5 mirrors run by the community
> (http://pypi.python.org/mirrors) and I suspect porting pip's feature
> to zc.buildout and easy_install would take less time than creating a
> app.
>
>
> Cheers
> Tarek
>
>>
>> Wichert.
>>
>
> --
> Tarek Ziadé | http://ziade.org

--
Mathieu Leduc-Hamel
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 3:00 PM, Wichert Akkerman wrote:
> On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
>> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman wrote:
>> ..
>>>>
>>>> I understand where you're coming from but, ..
>>>> I think it's saner to rely on proven technology
>>>> than to invent our own protocol. NIH?
>>>
>>> This also feels like a problem that has already been solved in various ways
>>> by Debian, RedHat, CPAN and others.
>>
>> Yes, and we've found a way similar to CPAN, with some Python specifics
>> (PyPI download statistics mainly)
>>
>> Oh my, we're cycling again.
>>
>> Nothing personal to you or Jim, but I have a sudden fatigue on
>> packaging because it seems like people are ignoring what's being done
>> to complain afterwards about us suffering of some kind of NIH :)
>
> It's just that my perspective is that of a simple user. And from my
> perspective nothing has changed in the last couple of years. Pypi still goes
> down occasionally, and when that happens many things start breaking. It may
> very well be that there are things planned or in progress, but until they
> are both usable and used by standard tools, which for me means buildout and
> setuptools, they are invisible.

Fair enough,

Pip has now the mirroring protocol implemented. I think they want to make it a default option for the next major Pip release. IOW, you should not suffer for downtimes using pip.

We'd need to add the same feature in easy_install and zc.buildout. But since Pip did it, I think it's possible.

We have 5 mirrors run by the community (http://pypi.python.org/mirrors) and I suspect porting pip's feature to zc.buildout and easy_install would take less time than creating a app.

Cheers
Tarek

>
> Wichert.

--
Tarek Ziadé | http://ziade.org
Re: [Distutils] Fwd: The state of PyPI
On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman wrote:
> ..
>>>
>>> I understand where you're coming from but, ..
>>> I think it's saner to rely on proven technology
>>> than to invent our own protocol. NIH?
>>
>> This also feels like a problem that has already been solved in various ways
>> by Debian, RedHat, CPAN and others.
>
> Yes, and we've found a way similar to CPAN, with some Python specifics
> (PyPI download statistics mainly)
>
> Oh my, we're cycling again.
>
> Nothing personal to you or Jim, but I have a sudden fatigue on
> packaging because it seems like people are ignoring what's being done
> to complain afterwards about us suffering of some kind of NIH :)

It's just that my perspective is that of a simple user. And from my perspective nothing has changed in the last couple of years. Pypi still goes down occasionally, and when that happens many things start breaking. It may very well be that there are things planned or in progress, but until they are both usable and used by standard tools, which for me means buildout and setuptools, they are invisible.

Wichert.
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman wrote: .. >> >> I understand where you're coming from but, .. >> I think it's saner to rely on proven technology >> than to invent our own protocol. NIH? > > This also feels like a problem that has already been solved in various ways > by Debian, RedHat, CPAN and others. Yes, and we've found a way similar to CPAN, with some Python specifics (PyPI download statistics mainly) Oh my, we're cycling again. Nothing personal to you or Jim, but I have a sudden fatigue with packaging because it seems like people are ignoring what's being done and complaining afterwards that we suffer from some kind of NIH :) If you see anything you don't like in PEP 381 (accepted a while ago), go ahead and propose some improvements. But please keep in mind that we looked at other systems before we wrote that PEP. Cheers Tarek -- Tarek Ziadé | http://ziade.org
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 2:27 PM, Jim Fulton wrote: ... > > I understand where you're coming from but, .. Sorry, I don't understand what you imply here. > I think it's saner to rely on proven technology > than to invent our own protocol. NIH? Ah sorry, I misunderstood then. I thought CloudFront was a proprietary platform with its own protocol. If you're saying that we can move away from CloudFront at any time and have the same feature elsewhere, then it's perfect. If you're saying that CloudFront is proven technology and that we should not worry about relying on it, then I think we can do better than locking the community in, and we should continue to work on an open protocol where everyone can participate by providing a spare server. But maybe that's just me? Most of the mirroring protocol was inspired by Perl's CPAN, btw. > > BTW, in looking at PEP 381 (yeah, I know, I'm a bad person > for waiting so long) Yeah, it was started around 2 years ago, but comments are always welcome :) > I have lots of reservations about the protocol: > > - It's potentially complex to implement efficiently, especially given that: > > - We've had problems with mirrors getting out of date, meaning that, > potentially, clients should > check multiple indexes, Yeah, mirrors do get out of sync. There's a freshness timestamp. But the usual use case is: PyPI is down, so we fall back to a mirror. I don't think it's more complicated than this. > > - It either requires extra DNS calls or relies too heavily on the last > mirror, which is likely > to be the least reliable. Once you have the list, I don't think you need any extra calls; see http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py > > Life is short. We don't have to invent this ourselves. Ah well, yeah -- Not sure what you are proposing right now. 
If you're implying that everything should be solved on the server side, and that we should not have mirroring > > Jim > > -- > Jim Fulton > http://www.linkedin.com/in/jimfulton > -- Tarek Ziadé | http://ziade.org
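Tarek's answer to the staleness concern (PyPI is down, fall back to a mirror, and check its freshness timestamp) can be sketched as below. Assumptions: each mirror's last-sync time has already been fetched as text in a UTC format like `20110927T14:55:00` (check PEP 381 for the exact format), and the 15-minute limit is the figure Martin gives earlier in the thread, kept as a parameter. `usable_mirrors` is a hypothetical helper, not part of pip, packaging or any other existing tool.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Staleness limit; 15 minutes is the figure suggested in this thread.
MAX_AGE = timedelta(minutes=15)
# Assumed format of a mirror's last-synchronisation timestamp (UTC).
STAMP_FORMAT = "%Y%m%dT%H:%M:%S"


def usable_mirrors(
    stamps: dict[str, str],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> list[str]:
    """Return the mirrors worth trying, most recently synced first.

    `stamps` maps mirror hostname -> last-modified text. Every mirror synced
    within `max_age` is returned; if all of them are stale (the master is
    probably down), only the most recently synced one is returned.
    """
    synced = {m: datetime.strptime(t.strip(), STAMP_FORMAT) for m, t in stamps.items()}
    newest_first = sorted(synced, key=lambda m: synced[m], reverse=True)
    fresh = [m for m in newest_first if now - synced[m] <= max_age]
    return fresh if fresh else newest_first[:1]
```

Returning a short list rather than a single winner leaves room to pick the fastest-responding fresh mirror afterwards, and the result can be cached between runs so the timestamps are not re-fetched on every invocation.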
Re: [Distutils] Fwd: The state of PyPI
On 09/27/2011 02:27 PM, Jim Fulton wrote: > On Tue, Sep 27, 2011 at 7:58 AM, Tarek Ziadé wrote: >> 2011/9/27 Jim Fulton: >>> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro wrote: >>>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote: >>>>> 1/ stability and high availability >>>> >>>> How are opinions on setting up country-specific PyPI mirrors? The lag >>>> to the US is pretty severe in Poland, and I suspect my buildouts would >>>> benefit from having a server in Poland. Now, of course, it could be >>>> called x.pypi.python.org, but maybe we should have aliases such as >>>> pl.pypi.python.org as well? >>>> >>>> I have no strong opinion on the issue, what do others think? >>> >>> Wouldn't CloudFront make this moot? >> >> If we state that PyPI completely depends on Amazon I guess yes. >> >> But imho, it's saner for the long term to have a community-driven >> protocol for mirroring so we don't rely on third-party vendors. > > I understand where you're coming from but, .. > I think it's saner to rely on proven technology > than to invent our own protocol. NIH? This also feels like a problem that has already been solved in various ways by Debian, RedHat, CPAN and others. Wichert.
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 7:58 AM, Tarek Ziadé wrote: > 2011/9/27 Jim Fulton: >> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro wrote: >>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote: >>>> 1/ stability and high availability >>> >>> How are opinions on setting up country-specific PyPI mirrors? The lag >>> to the US is pretty severe in Poland, and I suspect my buildouts would >>> benefit from having a server in Poland. Now, of course, it could be >>> called x.pypi.python.org, but maybe we should have aliases such as >>> pl.pypi.python.org as well? >>> >>> I have no strong opinion on the issue, what do others think? >> >> Wouldn't CloudFront make this moot? > > If we state that PyPI completely depends on Amazon I guess yes. > > But imho, it's saner for the long term to have a community-driven > protocol for mirroring so we don't rely on third-party vendors. I understand where you're coming from but, .. I think it's saner to rely on proven technology than to invent our own protocol. NIH? BTW, in looking at PEP 381 (yeah, I know, I'm a bad person for waiting so long) I have lots of reservations about the protocol: - It's potentially complex to implement efficiently, especially given that: - We've had problems with mirrors getting out of date, meaning that, potentially, clients should check multiple indexes, - It either requires extra DNS calls or relies too heavily on the last mirror, which is likely to be the least reliable. Life is short. We don't have to invent this ourselves. Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton
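To make the DNS point concrete: PEP 381 names mirrors with consecutive letters (a.pypi.python.org, b.pypi.python.org, ...) and points last.pypi.python.org at the newest one, so one lookup reveals the last name and the others can be derived from it. The sketch below illustrates that idea only; it is not the code in packaging's mirrors.py, the `aa, ab, ...` continuation after `z` is an assumption, and the 2011-era hostnames may no longer resolve. All function names are hypothetical.

```python
from __future__ import annotations

import socket
from itertools import count, product
from string import ascii_lowercase
from typing import Iterator


def mirror_labels(last: str) -> Iterator[str]:
    """Yield a, b, ..., z, aa, ab, ... up to and including `last`."""
    if not last or any(c not in ascii_lowercase for c in last):
        raise ValueError(f"not a mirror label: {last!r}")
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            label = "".join(letters)
            yield label
            if label == last:
                return


def mirror_hosts(last_host: str) -> list[str]:
    """Expand e.g. 'c.pypi.python.org' into every mirror hostname up to it."""
    label, _, domain = last_host.partition(".")
    return [f"{name}.{domain}" for name in mirror_labels(label)]


def resolve_last(alias: str = "last.pypi.python.org") -> str:
    """One DNS lookup: the canonical name that the alias points to."""
    return socket.gethostbyname_ex(alias)[0]
```

Only `resolve_last` touches the network; `mirror_hosts(resolve_last())` yields the full list, which a client can cache instead of repeating the lookup on every run.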
Re: [Distutils] Fwd: The state of PyPI
2011/9/27 Jim Fulton: > On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro wrote: >> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote: >>> 1/ stability and high availability >> >> How are opinions on setting up country-specific PyPI mirrors? The lag >> to the US is pretty severe in Poland, and I suspect my buildouts would >> benefit from having a server in Poland. Now, of course, it could be >> called x.pypi.python.org, but maybe we should have aliases such as >> pl.pypi.python.org as well? >> >> I have no strong opinion on the issue, what do others think? > > Wouldn't CloudFront make this moot? If we state that PyPI completely depends on Amazon I guess yes. But imho, it's saner for the long term to have a community-driven protocol for mirroring so we don't rely on third-party vendors. > > I'm not sure I follow the plans, but it appears that CloudFront would > provide greater availability with few or no changes to clients. > > Jim > > -- > Jim Fulton > http://www.linkedin.com/in/jimfulton > -- Tarek Ziadé | http://ziade.org
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro wrote: > On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote: >> 1/ stability and high availability > > How are opinions on setting up country-specific PyPI mirrors? The lag > to the US is pretty severe in Poland, and I suspect my buildouts would > benefit from having a server in Poland. Now, of course, it could be > called x.pypi.python.org, but maybe we should have aliases such as > pl.pypi.python.org as well? > > I have no strong opinion on the issue, what do others think? Wouldn't CloudFront make this moot? I'm not sure I follow the plans, but it appears that CloudFront would provide greater availability with few or no changes to clients. Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 12:07 PM, Lennart Regebro wrote: > On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote: >> 1/ stability and high availability > > How are opinions on setting up country-specific PyPI mirrors? The lag > to the US is pretty severe in Poland, and I suspect my buildouts would > benefit from having a server in Poland. Now, of course, it could be > called x.pypi.python.org, but maybe we should have aliases such as > pl.pypi.python.org as well? > > I have no strong opinion on the issue, what do others think? I think it's a good idea to use the closest mirror. One long-term goal I had was to add client-side geolocation code that would prefer the closest mirror. IOW, given your IP and the mirrors' IPs, pick the mirror closest to you. > > //Lennart > -- Tarek Ziadé | http://ziade.org
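The client-side geolocation idea amounts to a nearest-point search. This sketch assumes the client and each mirror have already been mapped to (latitude, longitude) pairs by some IP geolocation database, which is hypothetical here and not shown; `closest_mirror` then picks the mirror with the smallest great-circle (haversine) distance.

```python
from __future__ import annotations

from math import asin, cos, radians, sin, sqrt
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def distance_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points, via haversine."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))  # clamp rounding error


def closest_mirror(client: LatLon, mirrors: Dict[str, LatLon]) -> str:
    """Return the name of the mirror whose location is nearest to the client."""
    if not mirrors:
        raise ValueError("no mirrors to choose from")
    return min(mirrors, key=lambda name: distance_km(client, mirrors[name]))
```

For a client in Warsaw, `closest_mirror((52.23, 21.01), {"us": (37.77, -122.42), "de": (50.11, 8.68)})` picks `"de"`. Distance is only a proxy for latency; timing a small request to each mirror needs no geolocation data at all and may be the simpler option.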
Re: [Distutils] Fwd: The state of PyPI
On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé wrote: > 1/ stability and high availability How are opinions on setting up country-specific PyPI mirrors? The lag to the US is pretty severe in Poland, and I suspect my buildouts would benefit from having a server in Poland. Now, of course, it could be called x.pypi.python.org, but maybe we should have aliases such as pl.pypi.python.org as well? I have no strong opinion on the issue, what do others think? //Lennart
[Distutils] Fwd: The state of PyPI
I have sent this to the PSF list because there's a PSF project about PyPI infra. But someone complained, saying that I was having this discussion "behind closed doors". Since this is not my goal, I am now spamming more lists... -- Forwarded message -- From: Tarek Ziadé Date: Tue, Sep 27, 2011 at 10:37 AM Subject: The state of PyPI To: PSF Members List Cc: Richard Jones, Steve Holden Hey This is just a mail that summarizes the current state of PyPI, the existing features, and what can be done next to improve stuff. I am sending this to the PSF members list because we had an infrastructure project going on, and I want to make sure all involved parties are on the same page. 1/ stability and high availability 2/ private mirrors 3/ private projects 4/ tutorial ? = stability and high availability = We went in two directions to improve PyPI: 1/ add the mirroring protocol 2/ make the PyPI server more reliable by pushing its storage into a redundant cloud. == mirroring == The mirroring protocol (PEP 381) is implemented on the server side. I've worked with Martin on this, and we have mirrors now: look at http://pypi.python.org/mirrors Also, there's a client that anyone can use to set up a mirror: http://pypi.python.org/pypi/pep381client The idea is that anyone in the community willing to maintain a mirror can do so. We add the mirror to the CDN, and make it available for client tools to use. What's really missing right now is more integration on the client side. - Pip supports the mirroring protocol, and can fall back to a mirror, but I am not 100% sure this is the default behavior (please correct me if it is now). - Buildout knows how to use *another server* than the main PyPI, so you can manually switch to a mirror, but I don't think it's transparent. It should be. - Distribute/Setuptools does not do anything for this, and should. - Everything is already implemented in packaging/distutils2. The effect of the mirrors is that PyPI being down should not impact the community.
This will be true once all tools are transparently using the mirrors. == better infra == I think the project is stalled right now. = private mirrors = Having a private mirror makes a lot of sense when companies need to make sure their build systems are not relying on external services like PyPI or a mirror. It's also a good way to dramatically reduce the load on the community servers. The idea is that a Jenkins server that builds hundreds of Python apps every hour should not hammer PyPI. We have everything needed these days to set up this kind of system, following zc.buildout or pip good practices. What we need is a good tutorial or a guide [*] = private projects = The part that we do not address in the community is private projects: since we don't have any permissions/groups/roles system in PyPI, everything is public. One way to solve this is to have a local repository for private packages, which tools like pip or easy_install look at through the --find-links option. What we need is a good tutorial or a guide [*] = tutorial = [*] If this helps, I am willing to work on a tutorial day for PyCon US that goes through all of this, to help people set up their dev environment in the best way possible. The material could then be published at python.org/pypi to help out. I know Richard has some material already, so maybe this could be a joint tutorial? HTH Cheers Tarek -- Tarek Ziadé | http://ziade.org -- Tarek Ziadé | http://ziade.org
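For the private-projects case, `--find-links` can point either at a local directory or at an HTML page of links to package archives. The sketch below builds such a page for a directory of archives so it can be served over HTTP; `find_links_page`, `write_find_links_page` and the list of archive suffixes are assumptions for illustration, and serving the page (with any static web server) is left out.

```python
from __future__ import annotations

import html
from pathlib import Path
from urllib.parse import quote

# Archive types listed on the page (an assumed, non-exhaustive list).
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def find_links_page(directory: Path) -> str:
    """Return an HTML page with one link per package archive in `directory`."""
    names = sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.endswith(ARCHIVE_SUFFIXES)
    )
    links = "\n".join(
        f'<a href="{quote(name)}">{html.escape(name)}</a><br/>' for name in names
    )
    return f"<html><body>\n{links}\n</body></html>\n"


def write_find_links_page(directory: Path) -> Path:
    """Write index.html next to the archives so the directory can be served."""
    target = directory / "index.html"
    target.write_text(find_links_page(directory), encoding="utf-8")
    return target
```

With the archives in, say, /srv/private-packages, `pip install --find-links=/srv/private-packages mypackage` or `easy_install -f /srv/private-packages mypackage` should work straight from the directory (`mypackage` is a placeholder); the generated page is only needed when the packages are served over HTTP to a build farm such as the Jenkins setup described above.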