Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Martin v. Löwis
>> However, since this all forms a single
>> web site, integrating them will be either infeasible or pointless.
> 
> There are 2 sites, "simple" and "pypi".

No, it's one site: http://pypi.python.org/pypi, and
http://pypi.python.org/simple (plus there are other URLs, like
/packages).

Regards,
Martin
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Jim Fulton
On Tue, Sep 27, 2011 at 2:10 PM, "Martin v. Löwis"  wrote:
> On 27.09.2011 13:12, Jim Fulton wrote:
>> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro  wrote:
>>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>>>> 1/ stability and high availability
>>>
>>> How are opinions on setting up country-specific PyPI mirrors? The lag
>>> to the US is pretty severe in Poland, and I suspect my buildouts would
>>> benefit from having a server in Poland. Now, of course, it could be
>>> called x.pypi.python.org, but maybe we should have aliases such as
>>> pl.pypi.python.org as well?
>>>
>>> I have no strong opinion on the issue, what do others think?
>>
>> Wouldn't CloudFront make this moot?
>
> I personally don't believe the CloudFront project is feasible - IMO,
> it just won't work. This is because there needs to be both dynamic
> content (at least for uploads) and static content; CloudFront can
> only mirror the static content.

That's the intent. To serve the static content.

> However, since this all forms a single
> web site, integrating them will be either infeasible or pointless.

There are 2 sites, "simple" and "pypi".  I would only host the simple
site in a CDN. I assume the mirrors are only mirroring the simple
site. pypi would, I assume, either stay where it is now, or be evolved
separately.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Martin v. Löwis
> That's an extra HTTP request I need to make when I'm
> considering use of a mirror.  If the first mirror I check seems to
> be out of date, I may need to check all the mirrors.  It's an open
> question what should be considered potentially out of date, a
> timestamp older than an hour? a day?

15 minutes; if all mirrors are older than 15 minutes, it probably
means that the master is down, and you should then use the newest
mirror you can find.
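
A minimal sketch of this 15-minute rule, assuming the client already holds each mirror's last-modified time. The function name and the dictionary layout are illustrative and not taken from any existing tool:

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=15)  # threshold proposed in this message


def usable_mirrors(last_modified, now):
    """Return the mirrors worth using, given {hostname: last-modified datetime}.

    Mirrors updated within MAX_AGE are considered fresh. If none are fresh,
    the master is probably down, so fall back to the single newest mirror.
    """
    if not last_modified:
        return []
    fresh = [host for host, ts in last_modified.items() if now - ts <= MAX_AGE]
    if fresh:
        return fresh
    # Every mirror is stale: use the newest one that can be found.
    return [max(last_modified, key=last_modified.get)]
```

A caller would pass `datetime.now(timezone.utc)` as `now` and then pick the fastest host from the returned list.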

> How old is too old?

More than 15 minutes.

>> get the list of mirrors  (-> the list of
>> mirrors and their timestamps get cached)
> 
> They'll only get cached for the program invocation.

No, you can also cache the fastest mirror across invocations,
and keep using it if it is younger than 15 minutes.
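
A rough sketch of caching the chosen mirror across invocations, assuming a small JSON file on disk. The file layout and the function names are hypothetical, and "younger than 15 minutes" is read here as applying to the mirror's recorded last-modified time:

```python
import json
import time

MAX_AGE_SECONDS = 15 * 60


def save_choice(path, hostname, last_modified):
    """Remember the fastest mirror and its last-modified time (epoch seconds)."""
    with open(path, "w") as f:
        json.dump({"host": hostname, "last_modified": last_modified}, f)


def load_choice(path, now=None):
    """Return the cached hostname if it is still fresh, else None."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            data = json.load(f)
        if now - data["last_modified"] <= MAX_AGE_SECONDS:
            return data["host"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or corrupt cache: behave as if nothing was cached
    return None
```

When `load_choice` returns `None`, the client would go back to fetching the mirror list and timing the mirrors again.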

>> - pick the closest one
> 
> How do I decide what's closest? Did you mean closest?

He probably means the one that responds fastest.

> "etc" is just waving hands.  Selecting the right value is hard, possibly
> application dependent. Is this a configuration variable?  Now the
> user has something to deal with.

No, it will be 15 minutes. People might request being able to configure
it, but it won't really be necessary to do so.

Regards,
Martin


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Barry Warsaw
On Sep 28, 2011, at 09:59 AM, Greg Ewing wrote:

>Are there really no existing open-source mirroring
>systems out there?

rsync? :)

-Barry


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Greg Ewing

Jim Fulton wrote:

> Life is short. We don't have to invent this ourselves.

Are there really no existing open-source mirroring
systems out there?

This seems like a common enough thing that it must
have been solved already.

--
Greg


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Jim Fulton
On Tue, Sep 27, 2011 at 1:25 PM, Tarek Ziadé  wrote:
> On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton  wrote:

...

>> But I don't want to have to update buildout *just* because of an itch
>> to have a custom protocol.
>
> I kind of wonder how hard it would be to have a standalone pypi
> download client, ripped off from python 3.3's packaging, so you would
> not have to worry about this.

I doubt I'm going to be able to avoid worrying about it.

Still a reference client implementation would be useful.

> And, well, you do not sound like you want to spend time in these
> matters in any case,

I don't know what you mean.  Not sure I care. :)

> so if someone brings a patch I hope you will not
> refuse it.

No.  I'll eventually implement it if no one else does.

>>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>>> don't think it's more complicated than this.
>>
>> I don't agree.  On multiple levels.  PYPI is often up but slow.
>
> That's an orthogonal issue :  any server can be slow.

A service can be fast even if an individual server is slow.
Also, CDNs can make lots of horsepower available that
is shared among multiple customers.  I really doubt that
anything we build will be faster.

> One better way to drastically speed up buildout is to  download /
> build stuff in parallel imo.

That's true and something I'd like to do at some point. That's one of
the reasons I expect I'll have to worry about the protocol.

>
>> It's also in the wrong place.  A CDN should provide better performance,
>> reliability and locality.
>
> Locality is indeed important, and picking up the nearest server is great.
> Reliability is also solved by the mirrors.

At the expense of increased complexity on the client.

>>
>> A client has to:
>>
>> - try pypi
>> - fallback to "last"
>> - If that's down, decide what other indexes to check
>>
>> I don't see how having timestamps help unless you know
>> what the current timestamp is, unless you say that you'll reject
>> a mirror with a timestamp more than some period in the past.
>
> How hard it is to make those decisions ?

It's not "hard" conceptually, but it's still a lot of
implementation complexity and a lot of extra network
requests.

> Do you really think getting the current timestamp is that hard ?
>
> And the mirror timestamp,
>
>  http://b.pypi.python.org/last-modified
>
> In all you've said I fail to see how complicated it is, or long to do.

That's an extra HTTP request I need to make when I'm
considering use of a mirror.  If the first mirror I check seems to
be out of date, I may need to check all the mirrors.  It's an open
question what should be considered potentially out of date, a
timestamp older than an hour? a day?

> The ordering I see is:
>
> normal behavior:
> - if the cache is too old:

How old is too old?

> get the list of mirrors  (-> the list of
> mirrors and their timestamps get cached)

They'll only get cached for the program invocation.
This means I have to potentially check lots of mirrors
every time someone runs buildout.  I can reduce latency
by doing this in parallel, but that's still a lot of requests.
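
The parallel check could be sketched with a thread pool. This is only an illustration: `fetch` is a caller-supplied, hypothetical function that returns a mirror's last-modified time or raises on failure. The number of requests stays the same; only the wall-clock latency drops.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_timestamps(mirrors, fetch, max_workers=8):
    """Call fetch(host) for every mirror concurrently.

    Returns {host: timestamp} for the mirrors that answered.
    """
    def safe(host):
        try:
            return host, fetch(host)
        except Exception:
            return host, None  # unreachable mirrors are simply skipped

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe, mirrors))
    return {host: ts for host, ts in results if ts is not None}
```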

> - pick the closest one

How do I decide what's closest? Did you mean closest, or most up to date?

> - use it
>
> the server times out:
> - try the "next closest"
>
>
>> It's not clear what this time delta should be and, in any case,
>> the client needs to first validate a mirror by checking it's timestamp.
>
> This is the job of the client yes. An option that says, discard
> mirrors that are > 1 day, or 5 hours etc.

"etc" is just waving hands.  Selecting the right value is hard, possibly
application dependent. Is this a configuration variable?  Now the
user has something to deal with.

> Keeping a local cache that gets updated eventually is sufficient.

In process, or on disk?  This just gets better and better. :)

>> I think this protocol is going to be hard to get right.
>
> Maybe ? but if a v1 allows us to switch from server 1 being down to
> server 2, it's already a success, no ?
>
> servers that *we* the community, manage.

I fail to see why this is inherently a good thing.  I don't like
"managing" things.  Less work is good.

...

> Do we really want Amazon to handle PyPI ?

Yes, or Rackspace, or Google, or AOL, or, whatever.  Just not us.

(I suspect some of these might even do it for free.)

> I prefer a bunch of community mirrors. Heck, I have one at Mozilla,
> and might make it public one day  :)
>
> Or maybe the optimal solution is our own CDN proxy so we don't deal
> with this on client side.
>
>  raises, slowly>

Uh, yeah, sure.

FWIW, it hadn't occurred to me to use a CDN until a conversation a few
days ago. Doh.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Martin v. Löwis
> I understand why you don't want to rely on a proprietary solution.

It's not a proprietary solution. It uses standard technologies, such
as HTTP and XML-RPC. If you say that the specific URLs to access are
proprietary: true, but so would be a CloudFront mirror (which also
requires a proprietary API to get the data into CloudFront).

> It's a reverse proxy.  You point it at s3 and at a web server and it caches.
> Of course, it has aspects that are specific to it's implementation.

That's not my understanding as to what the project to get PyPI onto
CloudFront is doing (IIUC).

Regards,
Martin


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Martin v. Löwis
On 27.09.2011 13:12, Jim Fulton wrote:
> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro  wrote:
>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>>> 1/ stability and high availability
>>
>> How are opinions on setting up country-specific PyPI mirrors? The lag
>> to the US is pretty severe in Poland, and I suspect my buildouts would
>> benefit from having a server in Poland. Now, of course, it could be
>> called x.pypi.python.org, but maybe we should have aliases such as
>> pl.pypi.python.org as well?
>>
>> I have no strong opinion on the issue, what do others think?
> 
> Wouldn't CloudFront make this moot?

I personally don't believe the CloudFront project is feasible - IMO,
it just won't work. This is because there needs to be both dynamic
content (at least for uploads) and static content; CloudFront can
only mirror the static content. However, since this all forms a single
web site, integrating them will be either infeasible or pointless.

It would be possible to use CloudFront if the clients were changed.
But in that case, it would be best for the CloudFront copy to simply be a
PEP 381 mirror.

Regards,
Martin


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Alex Clark

On 9/27/11 1:25 PM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton  wrote:
>
>>>> I understand where you're coming from but, ..
>
>>> Sorry, I don't understand what you imply here.
>
>> I understand why you don't want to rely on a proprietary solution.
>
> But it's true that I don't want to rely on a proprietary solution.
> That's based on a good reason I think, mentioned at the end of this
> mail.
>
> ...
>>
>>> If you're saying that CloudFront is proven technology and that we
>>> should not worry about relying on them, then I think we can do better
>>> for the community to get locked-in for this, and continue to work on
>>> an open protocol where everyone can participate by providing a spare
>>> server.  But maybe that's just me ?
>>
>> It's nice to have a hobby. :)
>
> I think you've missed what we, bunch of hobbyists, did in the past two years
>
> + 5 community mirrors are up and running, collecting download stats
> that get merged
> + pip does work with the mirrors, and offer fallback options
>
> It's too bad you were not there to tell us we were wasting our time
> and how awesome CloudFront was ;)
>
> But at this point, the shortest road to a better PyPI is to add the
> mirroring support to other clients, pip showed the lead. And if
> zc.buildout uses Distribute, it should get this feature at some point.
>
> But having a CloudFront-based PyPI could also be interesting in
> parallel, I am not saying it's not. But the project is stalled, and
> has the defaults I've mentioned.
>
>> But I don't want to have to update buildout *just* because of an itch
>> to have a custom protocol.
>
> I kind of wonder how hard it would be to have a standalone pypi
> download client, ripped off from python 3.3's packaging, so you would
> not have to worry about this.
>
> And, well, you do not sound like you want to spend time in these
> matters in any case, so if someone brings a patch I hope you will not
> refuse it.
>
>>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>>> don't think it's more complicated than this.
>>
>> I don't agree.  On multiple levels.  PYPI is often up but slow.
>
> That's an orthogonal issue :  any server can be slow.
>
> One better way to drastically speed up buildout is to  download /
> build stuff in parallel imo.
>
>> It's also in the wrong place.  A CDN should provide better performance,
>> reliability and locality.
>
> Locality is indeed important, and picking up the nearest server is great.
> Reliability is also solved by the mirrors.
>
>> A client has to:
>>
>> - try pypi
>> - fallback to "last"
>> - If that's down, decide what other indexes to check
>>
>> I don't see how having timestamps help unless you know
>> what the current timestamp is, unless you say that you'll reject
>> a mirror with a timestamp more than some period in the past.
>
> How hard it is to make those decisions ?
>
> Do you really think getting the current timestamp is that hard ?
>
> And the mirror timestamp,
>
>   http://b.pypi.python.org/last-modified
>
> In all you've said I fail to see how complicated it is, or long to do.
>
> The ordering I see is:
>
> normal behavior:
> - if the cache is too old: get the list of mirrors  (-> the list of
> mirrors and their timestamps get cached)
> - pick the closest one
> - use it
>
> the server times out:
> - try the "next closest"
>
>> It's not clear what this time delta should be and, in any case,
>> the client needs to first validate a mirror by checking it's timestamp.
>
> This is the job of the client yes. An option that says, discard
> mirrors that are > 1 day, or 5 hours etc.
>
> Keeping a local cache that gets updated eventually is sufficient.
>
>> I think this protocol is going to be hard to get right.
>
> Maybe ? but if a v1 allows us to switch from server 1 being down to
> server 2, it's already a success, no ?
>
> servers that *we* the community, manage.
>
>>>> - It either requires extra dns calls or relies to heavily on the last
>>>> mirror, which is probably likely
>>>>  to be the least reliable.
>>>
>>> Once you have the list, I don't think you require extra call.
>>>
>>> see http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py
>>
>> It has to make extra dns calls to resolve the other mirror names to ips.
>
> Yeah, once per session. but in any case, this is not a decision you're
> making on every download. It's something you do when you start to
> download stuff, and/or when a server times out.
>
> You stick with a server once it's working
>
>>>> Life is short. We don't have to invent this ourselves.
>>>
>>> Ah well, yeah -- Not sure what you are proposing right now.
>>>
>>> If you imply that everything should be solved on server-side, and that
>>> we should not have mirroring
>>
>> I think we should pick a good CDN and use it.
>
> I won't object, because this is orthogonal to the mirroring stuff, but
> I am not going to scratch the mirroring efforts to move PyPI to a
> single shop.
>
> Every service on the planet, even Amazon, can be down.
>
> oh, my:
>
> - https://forums.aws.amazon.com/message.jspa?messageID=244986
> - http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm.
> - http://www.labnol.org/internet/amazon-s3-cloudfront-down/5667/
> - h

Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton  wrote:

>>> I understand where you're coming from but, ..

>> Sorry, I don't understand what you imply here.

> I understand why you don't want to rely on a proprietary solution.

But it's true that I don't want to rely on a proprietary solution.
That's based on a good reason I think, mentioned at the end of this
mail.

...
>
>> If you're saying that CloudFront is proven technology and that we
>> should not worry about relying on them, then I think we can do better
>> for the community to get locked-in for this, and continue to work on
>> an open protocol where everyone can participate by providing a spare
>> server.  But maybe that's just me ?
>
> It's nice to have a hobby. :)

I think you've missed what we, a bunch of hobbyists, did in the past two years:

+ 5 community mirrors are up and running, collecting download stats
that get merged
+ pip does work with the mirrors, and offers fallback options

It's too bad you were not there to tell us we were wasting our time
and how awesome CloudFront was ;)

But at this point, the shortest road to a better PyPI is to add
mirroring support to other clients; pip led the way. And if
zc.buildout uses Distribute, it should get this feature at some point.

But having a CloudFront-based PyPI could also be interesting in
parallel, I am not saying it's not. But the project is stalled, and
has the flaws I've mentioned.

> But I don't want to have to update buildout *just* because of an itch
> to have a custom protocol.

I kind of wonder how hard it would be to have a standalone PyPI
download client, extracted from Python 3.3's packaging, so you would
not have to worry about this.

And, well, you do not sound like you want to spend time in these
matters in any case, so if someone brings a patch I hope you will not
refuse it.

>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>> don't think it's more complicated than this.
>
> I don't agree.  On multiple levels.  PYPI is often up but slow.

That's an orthogonal issue: any server can be slow.

A better way to drastically speed up buildout is to download and
build stuff in parallel, imo.


> It's also in the wrong place.  A CDN should provide better performance,
> reliability and locality.

Locality is indeed important, and picking up the nearest server is great.
Reliability is also solved by the mirrors.

>
> A client has to:
>
> - try pypi
> - fallback to "last"
> - If that's down, decide what other indexes to check
>
> I don't see how having timestamps help unless you know
> what the current timestamp is, unless you say that you'll reject
> a mirror with a timestamp more than some period in the past.

How hard is it to make those decisions?

Do you really think getting the current timestamp is that hard?

And the mirror timestamp:

  http://b.pypi.python.org/last-modified

In all you've said I fail to see how complicated it is, or how long it would take.

The ordering I see is:

normal behavior:
- if the cache is too old: get the list of mirrors  (-> the list of
mirrors and their timestamps get cached)
- pick the closest one
- use it

the server times out:
- try the "next closest"
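
The ordering above can be sketched as follows. "Closest" is interpreted here as "fastest to respond", which is an assumption; `download` is a caller-supplied, hypothetical function that raises `TimeoutError` when a server does not answer:

```python
def download_with_fallback(mirrors_by_latency, download):
    """Try mirrors from fastest to slowest until one succeeds.

    mirrors_by_latency: {host: measured response time in seconds}
    Returns (host, result of download(host)).
    """
    ordered = sorted(mirrors_by_latency, key=mirrors_by_latency.get)
    for host in ordered:
        try:
            return host, download(host)
        except TimeoutError:
            continue  # the server timed out: try the next closest
    raise RuntimeError("all mirrors timed out")
```

The returned host is the one a client would stick with for the rest of the session.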


> It's not clear what this time delta should be and, in any case,
> the client needs to first validate a mirror by checking it's timestamp.

This is the job of the client, yes. An option that says: discard
mirrors that are older than 1 day, or 5 hours, etc.

Keeping a local cache that gets updated eventually is sufficient.

> I think this protocol is going to be hard to get right.

Maybe? But if a v1 allows us to switch from server 1 being down to
server 2, it's already a success, no?

And those are servers that *we*, the community, manage.



>>>
>>> - It either requires extra dns calls or relies to heavily on the last
>>> mirror, which is probably likely
>>>  to be the least reliable.
>>
>> Once you have the list, I don't think you require extra call.
>>
>> see 
>> http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py
>
> It has to make extra dns calls to resolve the other mirror names to ips.

Yeah, once per session. But in any case, this is not a decision you're
making on every download. It's something you do when you start to
download stuff, and/or when a server times out.

You stick with a server once it's working
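
A sketch of how the mirror list can be derived from a single lookup, based on my reading of PEP 381: mirrors are named a, b, c, ... under one domain, and the `last` alias resolves to the final name. Treat that naming scheme as an assumption. The DNS call itself is left out so the example runs offline; in practice the canonical name would come from something like `socket.gethostbyname_ex("last.pypi.python.org")[0]`.

```python
import string
from itertools import product


def mirror_names(last_host):
    """Expand e.g. 'c.pypi.python.org' into every mirror name up to it."""
    prefix, domain = last_host.split(".", 1)
    if not prefix or any(c not in string.ascii_lowercase for c in prefix):
        raise ValueError("unexpected mirror label: %r" % prefix)
    names = []
    # Labels run a..z, then aa, ab, ... until the last label is reached.
    for length in range(1, len(prefix) + 1):
        for letters in product(string.ascii_lowercase, repeat=length):
            label = "".join(letters)
            names.append("%s.%s" % (label, domain))
            if label == prefix:
                return names
    return names
```

Resolving each of the returned names to an IP address is where the extra per-session DNS calls come from.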

>
>
>>> Life is short. We don't have to invent this ourselves.
>>
>> Ah well, yeah -- Not sure what you are proposing right now.
>>
>> If you imply that everything should be solved on server-side, and that
>> we should not have mirroring
>
> I think we should pick a good CDN and use it.

I won't object, because this is orthogonal to the mirroring stuff, but
I am not going to scrap the mirroring efforts to move PyPI to a
single shop.

Every service on the planet, even Amazon, can be down.

oh, my:

- https://forums.aws.amazon.com/message.jspa?messageID=244986
- http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm.
- http://www.labnol.or

Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Jim Fulton
On Tue, Sep 27, 2011 at 8:40 AM, Tarek Ziadé  wrote:
> On Tue, Sep 27, 2011 at 2:27 PM, Jim Fulton  wrote:
> ...
>>
>> I understand where you're coming from but, ..
>
> Sorry, I don't understand what you imply here.

I understand why you don't want to rely on a proprietary solution.

>> I think it's saner to rely on proven technology
>> than to invent our own protocol. NIH?
>
> Ah sorry I misunderstood then. I thought CloudFront was a proprietary
> platform, with its own protocol.

It's a reverse proxy.  You point it at S3 and at a web server and it caches.
Of course, it has aspects that are specific to its implementation.

> If you're saying that we can move away from CloudFront at any time and
> have the same feature elsewhere, then it's perfect.

If we move to something else, *some* changes will be necessary,
but we can certainly move. I agree, it's perfect. ;)

> If you're saying that CloudFront is proven technology and that we
> should not worry about relying on them, then I think we can do better
> for the community to get locked-in for this, and continue to work on
> an open protocol where everyone can participate by providing a spare
> server.  But maybe that's just me ?

It's nice to have a hobby. :)

But I don't want to have to update buildout *just* because of an itch
to have a custom protocol.

>
> Most of the mirroring protocol was inspired by Perl's CPAN btw.



> But the use case is usually: PyPI is down, we fallback to a mirror. I
> don't think it's more complicated than this.

I don't agree.  On multiple levels.  PyPI is often up but slow.
It's also in the wrong place.  A CDN should provide better performance,
reliability and locality.

A client has to:

- try pypi
- fallback to "last"
- If that's down, decide what other indexes to check

I don't see how having timestamps helps unless you know
what the current timestamp is, unless you say that you'll reject
a mirror with a timestamp more than some period in the past.
It's not clear what this time delta should be and, in any case,
the client needs to first validate a mirror by checking its timestamp.

I think this protocol is going to be hard to get right.

>>
>> - It either requires extra dns calls or relies to heavily on the last
>> mirror, which is probably likely
>>  to be the least reliable.
>
> Once you have the list, I don't think you require extra call.
>
> see 
> http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py

It has to make extra DNS calls to resolve the other mirror names to IPs.


>> Life is short. We don't have to invent this ourselves.
>
> Ah well, yeah -- Not sure what you are proposing right now.
>
> If you imply that everything should be solved on server-side, and that
> we should not have mirroring

I think we should pick a good CDN and use it.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Alex Clark

On 9/27/11 9:07 AM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 3:00 PM, Wichert Akkerman  wrote:
>> On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
>>>
>>> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman
>>>  wrote:
>>> ..
>>>>> I understand where you're coming from but, ..
>>>>> I think it's saner to rely on proven technology
>>>>> than to invent our own protocol. NIH?
>>>>
>>>> This also feels like a problem that has already been solved in various
>>>> ways
>>>> by Debian, RedHat, CPAN and others.
>>>
>>> Yes, and we've found a way similar to CPAN, with some Python specifics
>>> (PyPI download statistics mainly)
>>>
>>> Oh my, we're cycling again.
>>>
>>> Nothing personal to you or Jim, but I have a sudden fatigue on
>>> packaging because it seems like people are ignoring what's being done
>>> to complain afterwards about us suffering of some kind of NIH :)
>>
>> It's just that my perspective is that of a simple user. And from my
>> perspective nothing has changed in the last couple of years. Pypi still goes
>> down occasionally, and when that happens many things start breaking. It may
>> very well be that there are things planned or in progress, but until they
>> are both usable and used by standard tools, which for me means buildout and
>> setuptools, they are invisible.
>
> Fair enough,
>
> Pip has now the mirroring protocol implemented. I think they want to
> make it a default option for the next major Pip release. IOW, you
> should not suffer for downtimes using pip.
>
> We'd need to add the same feature in easy_install and zc.buildout.
> But since Pip did it, I think it's possible.
>
> We have 5 mirrors run by the community
> (http://pypi.python.org/mirrors) and I suspect porting pip's feature
> to zc.buildout and easy_install would take less time than creating a
>  app.

I'm not sure I fully understand why each-tool needs updating (vs.
something that would provide HA but be invisible to the tools), but I
suspect a lot of this is "legacy" related (i.e. PyPI was not originally
designed to handle HA) and adding support to each tool, though
potentially tedious-sounding (if not actually tedious) may be reasonable.

Alex

> Cheers
> Tarek
>
>> Wichert.

--
Alex Clark · http://aclark.net



Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Alex Clark

On 9/27/11 8:51 AM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman  wrote:
> ..
>>> I understand where you're coming from but, ..
>>> I think it's saner to rely on proven technology
>>> than to invent our own protocol. NIH?
>>
>> This also feels like a problem that has already been solved in various ways
>> by Debian, RedHat, CPAN and others.
>
> Yes, and we've found a way similar to CPAN, with some Python specifics
> (PyPI download statistics mainly)
>
> Oh my, we're cycling again.
>
> Nothing personal to you or Jim, but I have a sudden fatigue on
> packaging because it seems like people are ignoring what's being done
> to complain afterwards about us suffering of some kind of NIH :)
>
> If you're seeing anything you don't like in PEP 381 (accepted a while
> ago), go ahead and propose some improvements.
>
> But please keep in mind that we've looked at other systems before we
> wrote that PEP.

+1 If someone (Tarek in this case) wants to take the lead on this then I
don't care where the fix was invented ;-)

Obviously the less we make volunteers do, and the more we can outsource
the better (I say this in Plone-land repeatedly) but sometimes
inventing-something-here is the right thing to do. And it certainly
would not preclude other (cloud) fixes AFAICT.

Alex

> Cheers
> Tarek

--
Alex Clark · http://aclark.net



Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Mathieu Leduc-Hamel
Is there also any plan to work on an updated, better version of
PyPI? I understand that we need to work on the infrastructure
and make sure it will handle the load.

But I also think that the current PyPI codebase needs to be
updated. Is there any plan to refactor it as part of this work? I suspect
that this is something we'll have to do one day or another if we want
to continue integrating things into it and adding features...

On Tue, Sep 27, 2011 at 9:07 AM, Tarek Ziadé  wrote:
> On Tue, Sep 27, 2011 at 3:00 PM, Wichert Akkerman  wrote:
>> On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
>>>
>>> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman
>>>  wrote:
>>> ..
>
>>>>> I understand where you're coming from but, ..
>>>>> I think it's saner to rely on proven technology
>>>>> than to invent our own protocol. NIH?
>>>>
>>>> This also feels like a problem that has already been solved in various
>>>> ways
>>>> by Debian, RedHat, CPAN and others.
>>>
>>> Yes, and we've found a way similar to CPAN, with some Python specifics
>>> (PyPI download statistics mainly)
>>>
>>> Oh my, we're cycling again.
>>>
>>> Nothing personal to you or Jim, but I have a sudden fatigue on
>>> packaging because it seems like people are ignoring what's being done
>>> to complain afterwards about us suffering of some kind of NIH :)
>>
>> It's just that my perspective is that of a simple user. And from my
>> perspective nothing has changed in the last couple of years. Pypi still goes
>> down occasionally, and when that happens many things start breaking. It may
>> very well be that there are things planned or in progress, but until they
>> are both usable and used by standard tools, which for me means buildout and
>> setuptools, they are invisible.
>
> Fair enough,
>
> Pip has now the mirroring protocol implemented. I think they want to
> make it a default option for the next major Pip release. IOW, you
> should not suffer for downtimes using pip.
>
> We'd need to add the same feature in easy_install and zc.buildout.
> But since Pip did it, I think it's possible.
>
> We have 5 mirrors run by the community
> (http://pypi.python.org/mirrors) and I suspect porting pip's feature
> to zc.buildout and easy_install would take less time than creating a
>  app.
>
>
> Cheers
> Tarek
>
>>
>> Wichert.
>>
>
>
>
> --
> Tarek Ziadé | http://ziade.org
>



-- 
Mathieu Leduc-Hamel


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
On Tue, Sep 27, 2011 at 3:00 PM, Wichert Akkerman  wrote:
> On 09/27/2011 02:51 PM, Tarek Ziadé wrote:
>>
>> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman
>>  wrote:
>> ..

>>>> I understand where you're coming from but, ..
>>>> I think it's saner to rely on proven technology
>>>> than to invent our own protocol. NIH?
>>>
>>> This also feels like a problem that has already been solved in various
>>> ways
>>> by Debian, RedHat, CPAN and others.
>>
>> Yes, and we've found a way similar to CPAN, with some Python specifics
>> (PyPI download statistics mainly)
>>
>> Oh my, we're cycling again.
>>
>> Nothing personal to you or Jim, but I have a sudden fatigue on
>> packaging because it seems like people are ignoring what's being done
>> to complain afterwards about us suffering of some kind of NIH :)
>
> It's just that my perspective is that of a simple user. And from my
> perspective nothing has changed in the last couple of years. Pypi still goes
> down occasionally, and when that happens many things start breaking. It may
> very well be that there are things planned or in progress, but until they
> are both usable and used by standard tools, which for me means buildout and
> setuptools, they are invisible.

Fair enough,

Pip now has the mirroring protocol implemented. I think they want to
make it a default option for the next major Pip release. IOW, you
should not suffer from downtime when using pip.

We'd need to add the same feature in easy_install and zc.buildout.
But since Pip did it, I think it's possible.

We have 5 mirrors run by the community
(http://pypi.python.org/mirrors) and I suspect porting pip's feature
to zc.buildout and easy_install would take less time than creating a
 app.


Cheers
Tarek

>
> Wichert.
>



-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Wichert Akkerman

On 09/27/2011 02:51 PM, Tarek Ziadé wrote:

> On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman  wrote:
> ..
>>>
>>> I understand where you're coming from but, ..
>>> I think it's saner to rely on proven technology
>>> than to invent our own protocol. NIH?
>>
>> This also feels like a problem that has already been solved in various ways
>> by Debian, RedHat, CPAN and others.
>
> Yes, and we've found a way similar to CPAN, with some Python specifics
> (PyPI download statistics mainly)
>
> Oh my, we're cycling again.
>
> Nothing personal to you or Jim, but I have a sudden fatigue on
> packaging because it seems like people are ignoring what's being done
> to complain afterwards about us suffering of some kind of NIH :)


It's just that my perspective is that of a simple user. And from my
perspective nothing has changed in the last couple of years. PyPI still
goes down occasionally, and when that happens many things start
breaking. It may very well be that there are things planned or in
progress, but until they are both usable and used by standard tools,
which for me means buildout and setuptools, they are invisible.


Wichert.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
On Tue, Sep 27, 2011 at 2:39 PM, Wichert Akkerman  wrote:
..
>>
>> I understand where you're coming from but, ..
>> I think it's saner to rely on proven technology
>> than to invent our own protocol. NIH?
>
> This also feels like a problem that has already been solved in various ways
> by Debian, RedHat, CPAN and others.

Yes, and we've found a way similar to CPAN, with some Python specifics
(PyPI download statistics mainly)

Oh my, we're cycling again.

Nothing personal to you or Jim, but I have a sudden fatigue on
packaging because it seems like people are ignoring what's being done
to complain afterwards about us suffering of some kind of NIH :)

If you're seeing anything you don't like in PEP 381 (accepted a while
ago), go ahead and propose some improvements.

But please keep in mind that we've looked at other systems before we
wrote that PEP.

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
On Tue, Sep 27, 2011 at 2:27 PM, Jim Fulton  wrote:
...
>
> I understand where you're coming from but, ..

Sorry, I don't understand what you imply here.


> I think it's saner to rely on proven technology
> than to invent our own protocol. NIH?

Ah sorry I misunderstood then. I thought CloudFront was a proprietary
platform, with its own protocol.

If you're saying that we can move away from CloudFront at any time and
have the same feature elsewhere, then it's perfect.

If you're saying that CloudFront is proven technology and that we
should not worry about relying on them, then I think we can do better
for the community than getting locked in, and continue to work on
an open protocol where everyone can participate by providing a spare
server.  But maybe that's just me?

Most of the mirroring protocol was inspired by Perl's CPAN btw.

>
> BTW, in looking at PEP 381 (yeah, I know, I'm a bad person
> for waiting so long)

Yeah, started around 2 years ago, but comments are always welcome  :)

>  I have lots of reservations about the protocol:
>
> - It's potentially complex to implement efficiently, especially given that:
>
> - We've had problems with mirrors getting out of date, meaning that,
> potentially, clients should
>  check multiple indexes,

Yeah, mirrors do get out of sync. There's a freshness time stamp.

But the use case is usually: PyPI is down, we fall back to a mirror. I
don't think it's more complicated than this.

>
> - It either requires extra dns calls or relies to heavily on the last
> mirror, which is probably likely
>  to be the least reliable.

Once you have the list, I don't think you need any extra calls.

see http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py
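
For illustration, here is a minimal sketch of how a client could build
the full mirror list once it knows the name of the last mirror. It
assumes the PEP 381 naming scheme (a.pypi.python.org, b.pypi.python.org,
and so on, with last.pypi.python.org pointing at the last one). The
function names are made up for this example, and this is not the code
in mirrors.py.

```python
import itertools
import string


def mirror_prefixes(last):
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... up to and including `last`."""
    for length in range(1, len(last) + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            prefix = "".join(letters)
            yield prefix
            if prefix == last:
                return


def mirror_hosts(last_host, domain="pypi.python.org"):
    """Build the mirror host list from the canonical name of the last mirror.

    `last_host` is what resolving last.<domain> is assumed to return,
    for example 'c.pypi.python.org'.
    """
    suffix = "." + domain
    if not last_host.endswith(suffix):
        raise ValueError("unexpected mirror host: %r" % (last_host,))
    last = last_host[: -len(suffix)]
    return [prefix + suffix for prefix in mirror_prefixes(last)]
```

With this approach the only lookup is the one that resolves the last
mirror; the rest of the list is computed locally.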

>
> Life is short. We don't have to invent this ourselves.

Ah well, yeah -- Not sure what you are proposing right now.

If you imply that everything should be solved on server-side, and that
we should not have mirroring

>
> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
>



-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Wichert Akkerman

On 09/27/2011 02:27 PM, Jim Fulton wrote:

> On Tue, Sep 27, 2011 at 7:58 AM, Tarek Ziadé  wrote:
>> 2011/9/27 Jim Fulton:
>>> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro  wrote:
>>>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>>>>> 1/ stability and high availability
>>>>
>>>> How are opinions on setting up country-specific PyPI mirrors? The lag
>>>> to the US is pretty severe in Poland, and I suspect my buildouts would
>>>> benefit from having a server in Poland. Now, of course, it could be
>>>> called x.pypi.python.org, but maybe we should have aliases such as
>>>> pl.pypi.python.org as well?
>>>>
>>>> I have no strong opinion on the issue, what do others think?
>>>
>>> Wouldn't CloudFront make this moot?
>>
>> If we state that PyPI completely depends on Amazon I guess yes.
>>
>> But imho, it's saner for the long term to have a community-driven
>> protocol for mirroring so we don't rely on third-party vendors.
>
> I understand where you're coming from but, ..
> I think it's saner to rely on proven technology
> than to invent our own protocol. NIH?


This also feels like a problem that has already been solved in various 
ways by Debian, RedHat, CPAN and others.


Wichert.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Jim Fulton
On Tue, Sep 27, 2011 at 7:58 AM, Tarek Ziadé  wrote:
> 2011/9/27 Jim Fulton :
>> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro  wrote:
>>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>>>> 1/ stability and high availability
>>>
>>> How are opinions on setting up country-specific PyPI mirrors? The lag
>>> to the US is pretty severe in Poland, and I suspect my buildouts would
>>> benefit from having a server in Poland. Now, of course, it could be
>>> called x.pypi.python.org, but maybe we should have aliases such as
>>> pl.pypi.python.org as well?
>>>
>>> I have no strong opinion on the issue, what do others think?
>>
>> Wouldn't CloudFront make this moot?
>
> If we state that PyPI completely depends on Amazon I guess yes.
>
> But imho, it's saner for the long term to have a community-driven
> protocol for mirroring so we don't rely on third-party vendors.

I understand where you're coming from but, ..
I think it's saner to rely on proven technology
than to invent our own protocol. NIH?

BTW, in looking at PEP 381 (yeah, I know, I'm a bad person
for waiting so long) I have lots of reservations about the protocol:

- It's potentially complex to implement efficiently, especially given that:

- We've had problems with mirrors getting out of date, meaning that,
potentially, clients should
  check multiple indexes,

- It either requires extra dns calls or relies to heavily on the last
mirror, which is probably likely
  to be the least reliable.

Life is short. We don't have to invent this ourselves.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
2011/9/27 Jim Fulton :
> On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro  wrote:
>> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>>> 1/ stability and high availability
>>
>> How are opinions on setting up country-specific PyPI mirrors? The lag
>> to the US is pretty severe in Poland, and I suspect my buildouts would
>> benefit from having a server in Poland. Now, of course, it could be
>> called x.pypi.python.org, but maybe we should have aliases such as
>> pl.pypi.python.org as well?
>>
>> I have no strong opinion on the issue, what do others think?
>
> Wouldn't CloudFront make this moot?

If we state that PyPI completely depends on Amazon I guess yes.

But imho, it's saner for the long term to have a community-driven
protocol for mirroring so we don't rely on third-party vendors.

>
> I'm not clear I follow the plans, but it appears that CloudFront would
> provide greater availability with little or no changes to clients.
>
> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
>



-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Jim Fulton
On Tue, Sep 27, 2011 at 6:07 AM, Lennart Regebro  wrote:
> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>> 1/ stability and high availability
>
> How are opinions on setting up country-specific PyPI mirrors? The lag
> to the US is pretty severe in Poland, and I suspect my buildouts would
> benefit from having a server in Poland. Now, of course, it could be
> called x.pypi.python.org, but maybe we should have aliases such as
> pl.pypi.python.org as well?
>
> I have no strong opinion on the issue, what do others think?

Wouldn't CloudFront make this moot?

I'm not clear I follow the plans, but it appears that CloudFront would
provide greater availability with little or no changes to clients.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
On Tue, Sep 27, 2011 at 12:07 PM, Lennart Regebro  wrote:
> On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
>> 1/ stability and high availability
>
> How are opinions on setting up country-specific PyPI mirrors? The lag
> to the US is pretty severe in Poland, and I suspect my buildouts would
> benefit from having a server in Poland. Now, of course, it could be
> called x.pypi.python.org, but maybe we should have aliases such as
> pl.pypi.python.org as well?
>
> I have no strong opinion on the issue, what do others think?

I think it's a good idea to use the closest mirror.

One long-term goal I had was to add client-side geolocation code that
would prefer the closest mirror.

IOW, given your IP and the mirrors' IPs, pick the closest one.
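
A rough sketch of the idea, with one substitution stated up front:
instead of real IP geolocation (which needs a location database), it
uses TCP connect time as a stand-in for "closest". The function names,
the port and the timeout are assumptions made for this example only.

```python
import socket
import time


def connect_time(host, port=80, timeout=2.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_closest(hosts, measure=connect_time):
    """Return the host with the smallest measured time, or None if all fail."""
    timings = [(measure(host), host) for host in hosts]
    reachable = [(t, host) for t, host in timings if t is not None]
    if not reachable:
        return None
    return min(reachable)[1]
```

The `measure` argument is there so a geolocation-based distance could be
plugged in later without touching the selection logic.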

>
> //Lennart
>



-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Fwd: The state of PyPI

2011-09-27 Thread Lennart Regebro
On Tue, Sep 27, 2011 at 11:40, Tarek Ziadé  wrote:
> 1/ stability and high availability

How are opinions on setting up country-specific PyPI mirrors? The lag
to the US is pretty severe in Poland, and I suspect my buildouts would
benefit from having a server in Poland. Now, of course, it could be
called x.pypi.python.org, but maybe we should have aliases such as
pl.pypi.python.org as well?

I have no strong opinion on the issue, what do others think?

//Lennart
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


[Distutils] Fwd: The state of PyPI

2011-09-27 Thread Tarek Ziadé
I have sent that to the PSF list because there's a PSF project about PyPI infra.

But someone complained, saying that I was having this discussion
"behind closed doors".

Since this is not my goal, I am now spamming more lists...


-- Forwarded message --
From: Tarek Ziadé 
Date: Tue, Sep 27, 2011 at 10:37 AM
Subject: The state of PyPI
To: PSF Members List 
Cc: Richard Jones , Steve Holden 


Hey

This is just a mail that summarizes the current state of PyPI, the
existing features, and what can be done next to improve stuff.

I am sending this to the PSF members list because we had an
infrastructure project going on, and I want to make sure all involved
parties are on the same page.

1/ stability and high availability
2/ private mirrors
3/ private projects
4/ tutorial ?


= stability and high availability =

We went in two directions to improve PyPI:

1/ add the mirroring protocol
2/ make the PyPI server more reliable by moving its storage to a
redundant cloud.

== mirroring ==

The mirroring protocol (PEP 381) is implemented on server-side, I've
worked with Martin on this, and we have mirrors now:

Look at http://pypi.python.org/mirrors

Also, there's a client that anyone can use to set up a mirror:
http://pypi.python.org/pypi/pep381client

The idea is that anyone in the community willing to maintain a mirror
can do so. We add the mirror in the DNS, and make it available for
client tools to use. What's really missing right now is more
integration on the client side.

- Pip supports the mirroring protocol, and can fall back to a mirror,
but I am not 100% sure this is a default behavior.  (please correct me
if it is now)
- Buildout knows how to use *another server* than the main PyPI, so it
can be switched to a mirror manually, but I don't think it's transparent.
It should be.
- Distribute/Setuptools does not do anything for this, and should.
- everything is already implemented in packaging/distutils2

The effect of the mirrors is that PyPI being down should not impact
the community. This will be true once all tools are transparently using
the mirrors.
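
To make "transparently using the mirrors" concrete, here is a minimal
sketch of the fallback a client tool could do: try the main index
first, then each mirror in order. The function name and the `opener`
parameter are made up for this example, and it does not show how pip or
packaging actually implement it.

```python
import urllib.request


def fetch_with_fallback(path, indexes, opener=urllib.request.urlopen, timeout=10):
    """Try each index in order; return (index, body) from the first that answers."""
    errors = []
    for index in indexes:
        url = index.rstrip("/") + "/" + path.lstrip("/")
        try:
            with opener(url, timeout=timeout) as response:
                return index, response.read()
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses,
            # so an HTTP error on one index also moves on to the next one.
            errors.append((index, exc))
    raise RuntimeError("no index reachable: %r" % (errors,))
```

A caller would pass the main index URL followed by the mirror URLs, for
example the ones listed at http://pypi.python.org/mirrors.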

== better infra ==

I think the project is stalled right now.


= private mirrors =

Having a private mirror makes a lot of sense, when companies need to
make sure their build systems are not relying on external services
like PyPI or a mirror. It's also a good way to dramatically reduce the
load for the community servers.

The idea is that a Jenkins server that builds hundreds of Python apps
every hour should not hammer PyPI.

We have everything needed these days to set up this kind of system,
with zc.buildout or pip good practices.

What we need is a good tutorial or a guide [*]

= private projects =

The part that we do not address in the community is private projects:
since we don't have any permissions/group/roles system in PyPI,
everything is public.

One way to solve this is to have a local repository for private
packages, which tools like pip or easy_install can search via the
--find-links option.
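
As a small illustration, assuming pip is on the PATH and that the
private packages sit in a directory or on an internal web page (the
location below is a placeholder), a build script could wrap the call
like this. The helper names are made up for this example.

```python
import subprocess


def pip_install_command(requirement, private_links, pip="pip"):
    """Build a pip command line that also searches a private package location."""
    return [pip, "install", "--find-links", private_links, requirement]


def install(requirement, private_links):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(requirement, private_links))
```

Public dependencies are still resolved from the regular index; only the
extra location is added to the search.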


What we need is a good tutorial or a guide [*]

= tutorial =

[*] If this helps, I am willing to work on a tutorial day for Pycon
US, that goes through all of this, to help people set up their dev.
environment the best way possible.

The material could then be published at python.org/pypi to help out.

I know Richard has some material already, so maybe this could be a
joint tutorial ?

HTH

Cheers
Tarek





--
Tarek Ziadé | http://ziade.org



-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig