[Python-ideas] Re: "Curated" package repo?

2023-07-25 Thread Chris Angelico
On Wed, 26 Jul 2023 at 01:20, Jonathan Crall  wrote:
>
> > On Mon, Jul 24, 2023 at 10:04 AM Chris Angelico  wrote:
> > can you tell me what this vetting procedure proves that isn't already 
> > proven by mere popularity itself?
>
> I think that's what this thread is trying to discuss. Do I have the exact 
> perfect implementation? No. But I imagine it to be akin to peer-review. I 
> can't prove this, but I think it adds signal that complements popularity.
>

So you're suggesting nothing, only that there might be a possible
suggestion to be made. Do you have an actual concrete proposal?

> > And are you also saying that packages should be *removed* from this curated 
> > list?
>
> Yes, absolutely. If packages fall out of maintenance, are deprecated, or 
> end-of-life, they should no longer be on this curated list. I imagine the 
> mechanism would be a series of snapshots of what the state of the list is at 
> a particular point in time. What is the mechanism for determining the trigger 
> to remove a package? No clue right now.  There are a lot of problems - most 
> of them social - that need to be sorted out to make a useful curated package 
> list a reality. I don't claim to have the answers, but I'm willing to 
> participate in discussion to find them.
>

So who's going to actually do the work? You?

> > More robust in what way?
>
> It's curated by people I trust more than the average bear.

I trust the average bear a lot more than most people. They are pretty
awesome. Did you see the one that tried to eat Tom Scott's gopro
camera? I'd trust that bear (and stay out of its way).

> > If not, how is it different from "yet another collection"?
>
> We are discussing details about it here instead of just posting what we think 
> should be there on github right now. I'm putting a bit of trust in this group 
> to find a way to do that. I do think that separating the opinions of experts 
> (however we choose to define that group - but I hope you agree that it should 
> be possible to find some reasonable definition) as an auxiliary re-ranking on 
> top of popularity is a good differentiator.
>

So yeah, you have no concrete proposal. Okay then.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WNFRRVBMX3KTWVG4KXEJILXR26HMCQCS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: "Curated" package repo?

2023-07-25 Thread Jonathan Crall
> On Mon, Jul 24, 2023 at 10:04 AM Chris Angelico  wrote:
> can you tell me what this vetting procedure proves that isn't already
> proven by mere popularity itself?

I think that's what this thread is trying to discuss. Do I have the exact
perfect implementation? No. But I imagine it to be akin to peer-review. I
can't prove this, but I think it adds signal that complements popularity.

> And are you also saying that packages should be *removed* from this
> curated list?

Yes, absolutely. If packages fall out of maintenance, are deprecated, or
end-of-life, they should no longer be on this curated list. I imagine the
mechanism would be a series of snapshots of what the state of the list is
at a particular point in time. What is the mechanism for determining the
trigger to remove a package? No clue right now.  There are a lot of
problems - most of them social - that need to be sorted out to make a
*useful* curated package list a reality. I don't claim to have the answers,
but I'm willing to participate in discussion to find them.

> More robust in what way?

It's curated by people I trust more than the average bear.

> If not, how is it different from "yet another collection"?

We are discussing details about it here instead of just posting what we
think should be there on github right now. I'm putting a bit of trust in
this group to find a way to do that. I do think that separating the
opinions of experts (however we choose to define that group - but I hope
you agree that it should be possible to find *some* reasonable definition)
as an auxiliary re-ranking on top of popularity is a good differentiator.
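
To make the "auxiliary re-ranking on top of popularity" idea a little more
concrete, here is a minimal sketch. Everything in it is hypothetical: the
package names, the download counts, the expert_votes mapping and the boost
weight are made up for illustration, and nothing here describes an existing
service or an agreed-upon procedure.

```python
import math


def rerank(downloads, expert_votes, excluded=frozenset(), boost=1.0):
    """Order packages by popularity, adjusted by a hypothetical expert signal.

    downloads:    {package: download count}, the popularity signal
    expert_votes: {package: net expert votes}, a made-up curation signal
    excluded:     packages the curators removed (e.g. end-of-life)
    boost:        how much one expert vote counts relative to a tenfold
                  difference in downloads (an arbitrary assumption)
    """
    def score(pkg):
        popularity = math.log10(downloads[pkg] + 1)
        return popularity + boost * expert_votes.get(pkg, 0)

    candidates = [pkg for pkg in downloads if pkg not in excluded]
    return sorted(candidates, key=score, reverse=True)


# Made-up numbers, purely for illustration.
downloads = {"oldlib": 1_000_000, "newlib": 100_000, "deadlib": 5_000_000}
expert_votes = {"newlib": 2}
ranking = rerank(downloads, expert_votes, excluded={"deadlib"})
```

With no expert votes and nothing excluded, the function falls back to a plain
popularity ordering, so the expert signal only ever acts as an adjustment.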



On Tue, Jul 25, 2023 at 1:31 AM Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> George Fischhof writes:
>
> [For heaven's sake, trim!  You expressed your ideas very clearly, the
> quote adds little to them.]
>
>  > it has got to my mind that even just grouping similar / same goal
>  > packages could help the current situation.
>
> This is a good idea.  I doubt it reduces the problem compared to the
> review site or the curation very much: some poor rodent(s) still gotta
> put the dinger on the feline.
>
> However, in designing those pages, we could explicitly ask for names
> of similar packages and recommendations for use cases where an
> alternative package might be preferred, and provide links to the
> review pages for those packages that are mentioned in the response.
> We can also provide suggestions based on comparisons other users have
> made.  (Hopefully there won't be too many comparisons like "this
> package is the numpy of its category" -- that's hard to parse!)
>
>  > Additionally perhaps the users could give relative valuation,
>
> Not sure asking for rankings is a great idea, globally valid rankings
> are rare -- ask any heavy numpy user who occasionally uses the sum
> builtin on lists.
>
>  > for example there are A, B, C, D similar packages, users could say:
>  > I tried out A and B, and found that A is better than B, and could
>  > have some valuation categories: simple, easy, powerful etc. This
>  > would show for example that package A is simple, but B is more
>  > powerful
>
> These tags would be useful.  I think the explanation needs to be
> considered carefully, because absolutes don't really exist, and if
> you're comparing to the class, you want to know which packages the
> reviewer is comparing to.  I'm not sure many users would go to the
> trouble of providing full rankings, even for the packages they've
> mentioned.  Worth a try though!
>
> Steve
>


-- 
-Dr. Jon Crall (him)
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VOIGDXADBJIL2GKIYJE6STVY3XQXQFDG/


[Python-ideas] Re: "Curated" package repo?

2023-07-24 Thread Stephen J. Turnbull
George Fischhof writes:

[For heaven's sake, trim!  You expressed your ideas very clearly, the
quote adds little to them.]

 > it has got to my mind that even just grouping similar / same goal
 > packages could help the current situation.

This is a good idea.  I doubt it reduces the problem compared to the
review site or the curation very much: some poor rodent(s) still gotta
put the dinger on the feline.

However, in designing those pages, we could explicitly ask for names
of similar packages and recommendations for use cases where an
alternative package might be preferred, and provide links to the
review pages for those packages that are mentioned in the response.
We can also provide suggestions based on comparisons other users have
made.  (Hopefully there won't be too many comparisons like "this
package is the numpy of its category" -- that's hard to parse!)

 > Additionally perhaps the users could give relative valuation,

Not sure asking for rankings is a great idea, globally valid rankings
are rare -- ask any heavy numpy user who occasionally uses the sum
builtin on lists.

 > for example there are A, B, C, D similar packages, users could say:
 > I tried out A and B, and found that A is better than B, and could
 > have some valuation categories: simple, easy, powerful etc. This
 > would show for example that package A is simple, but B is more
 > powerful

These tags would be useful.  I think the explanation needs to be
considered carefully, because absolutes don't really exist, and if
you're comparing to the class, you want to know which packages the
reviewer is comparing to.  I'm not sure many users would go to the
trouble of providing full rankings, even for the packages they've
mentioned.  Worth a try though!
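
One possible shape for such review data, as a minimal sketch: each record
names both packages being compared, so a reader can always see what a tag
like "simple" is relative to. The Comparison class, its field names and the
summarize helper are hypothetical, not part of any existing site.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Comparison:
    reviewer: str
    preferred: str          # the package the reviewer liked better
    over: str               # the package it was compared against
    tags: dict = field(default_factory=dict)  # {package: {"simple", ...}}


def summarize(comparisons):
    """Collect, per package, the tags it received and what it was compared to."""
    tags = defaultdict(set)
    compared_with = defaultdict(set)
    for c in comparisons:
        compared_with[c.preferred].add(c.over)
        compared_with[c.over].add(c.preferred)
        for pkg, pkg_tags in c.tags.items():
            tags[pkg] |= set(pkg_tags)
    return dict(tags), dict(compared_with)


# Made-up review: A is simple, B is more powerful.
reviews = [Comparison("alice", "A", "B", {"A": {"simple"}, "B": {"powerful"}})]
tags, compared_with = summarize(reviews)
```

Note that this stores only pairwise statements and never tries to derive a
full ranking from them.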

Steve

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MGYK3EEEYKISRQMFJY7U73ID6VBQY4AT/


[Python-ideas] Re: "Curated" package repo?

2023-07-24 Thread Chris Angelico
On Mon, 24 Jul 2023 at 23:28, Jonathan Crall  wrote:
>
> If popular packages weren't favored that would be a problem. Popularity 
> should be correlated with "trustworthiness" or whatever the metric this 
> curated repo seeks to maximize. I think the important thing is that the 
> packages are both popular and have passed some sort of vetting procedure.
>

Okay, but can you tell me what this vetting procedure proves that
isn't already proven by mere popularity itself?

> For instance, for a very long time Python2 was far more popular than Python3, 
> but any expert in the field would encourage users to move to Python3 sooner 
> rather than later. Python2 is popular, but it wouldn't have made the cut on 
> some expert-curated list.
>

Experts were divided for a very long time. I'm not sure what your
point is here. And are you also saying that packages should be
*removed* from this curated list? Because if so, what's the mechanic
for this? (Python 2 absolutely WOULD have made the cut on any
expert-curated list prior to Python 3's inception.)

> So it helps in that it reranks popular packages (and also excludes some) for 
> those who want to adopt a more strict security / reliability posture.
>
> By no means do I think this would replace pypi as the de-facto packaging 
> repository. Its low barrier to entry is extremely important for a thriving 
> community, but I also wouldn't mind having something a bit more robust.
>
> I also think this project would have to be careful not to become yet another 
> "awesome-python-package" collection. Those certainly have value, but based on 
> the initial proposal, I'm interested in something a tad more robust.
>

More robust in what way? What exactly are the requirements to be part
of this list? Will all experts agree? If not, how is it different from
"yet another collection"?

(Also, PLEASE don't top-post. There's no value in it. Show what you're
responding to.)

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HCTAH2YI2F3AX5POPKMLNJVYKJ7NJPNB/


[Python-ideas] Re: "Curated" package repo?

2023-07-24 Thread George Fischhof
Jonathan Crall wrote (on Mon, Jul 24, 2023, 15:29):

> If popular packages weren't favored that would be a problem. Popularity
> should be correlated with "trustworthiness" or whatever the metric this
> curated repo seeks to maximize. I think the important thing is that the
> packages are both popular and have passed some sort of vetting procedure.
>
> For instance, for a very long time Python2 was far more popular than
> Python3, but any expert in the field would encourage users to move to
> Python3 sooner rather than later. Python2 is popular, but it wouldn't have
> made the cut on some expert-curated list.
>
> So it helps in that it reranks popular packages (and also excludes some)
> for those who want to adopt a more strict security / reliability posture.
>
> By no means do I think this would replace pypi as the de-facto packaging
> repository. Its low barrier to entry is extremely important for a thriving
> community, but I also wouldn't mind having something a bit more robust.
>
> I also think this project would have to be careful not to become yet another
> "awesome-python-package" collection. Those certainly have value, but based
> on the initial proposal, I'm interested in something a tad more robust.
>
>
... some old stuff cut ...

Hi Folks,

It occurred to me that even just grouping packages with similar or identical
goals could help the current situation.
Unfortunately, searching by name or category is not enough, and takes a lot
of time.
Linking similar packages together would give users the possibility to
evaluate all or several of them.

Additionally, perhaps users could give relative valuations. For example, if
there are similar packages A, B, C and D, users could say: I tried out A and
B, and found that A is better than B. There could also be some valuation
categories: simple, easy, powerful, etc. This would show, for example, that
package A is simple, but B is more powerful.

BR,
George
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OIX5GLWSJORZXS4DET3YJGKQR7IUFYUE/


[Python-ideas] Re: "Curated" package repo?

2023-07-24 Thread Jonathan Crall
If popular packages weren't favored that would be a problem. Popularity
should be correlated with "trustworthiness" or whatever the metric this
curated repo seeks to maximize. I think the important thing is that the
packages are both popular and have passed some sort of vetting procedure.

For instance, for a very long time Python2 was far more popular than
Python3, but any expert in the field would encourage users to move to
Python3 sooner rather than later. Python2 is popular, but it wouldn't have
made the cut on some expert-curated list.

So it helps in that it reranks popular packages (and also excludes some)
for those who want to adopt a more strict security / reliability posture.

By no means do I think this would replace pypi as the de-facto packaging
repository. Its low barrier to entry is extremely important for a thriving
community, but I also wouldn't mind having something a bit more robust.

I also think this project would have to be careful not to become yet another
"awesome-python-package" collection. Those certainly have value, but based
on the initial proposal, I'm interested in something a tad more robust.

On Mon, Jul 24, 2023 at 8:55 AM Chris Angelico  wrote:

> On Mon, 24 Jul 2023 at 21:02, James Addison via Python-ideas
>  wrote:
> > ... some thoughts on how to build a scalable, resilient trust network
> > based on user ratings; I can't guarantee that it'll change your opinion,
> > though!
> >
>
> This still has the fundamental problems of any sort of user rating
> system: popular packages are inherently favoured. And we can already
> get a list of popular packages, because download stats are available.
> So how would this scheme help?
>
> ChrisA


-- 
-Dr. Jon Crall (him)
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WRF6VSZ47KMGJI3PZEWBQCHFFGYE7AJ2/


[Python-ideas] Re: "Curated" package repo?

2023-07-24 Thread Chris Angelico
On Mon, 24 Jul 2023 at 21:02, James Addison via Python-ideas
 wrote:
> ... some thoughts on how to build a scalable, resilient trust network based 
> on user ratings; I can't guarantee that it'll change your opinion, though!
>

This still has the fundamental problems of any sort of user rating
system: popular packages are inherently favoured. And we can already
get a list of popular packages, because download stats are available.
So how would this scheme help?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LU6BFQGNCMZZVESCUUCPSVKWPKJEJB7H/


[Python-ideas] Re: "Curated" package repo?

2023-07-24 Thread James Addison via Python-ideas
On Sun, Jul 9, 2023, 23:35 Christopher Barker  wrote:

> On Sun, Jul 9, 2023 at 8:37 AM James Addison via Python-ideas <
> python-ideas@python.org> wrote:
>
>>>> ISTM the primary use cases advanced here have been for "naive" users.
>>>> Likely they won't be in a position to decide whether they trust Guido
>>>> van Rossum or Egg Rando more.
>
> There are 718,155 users on PyPi -- I can't imagine that trying to figure
> out which of those hundreds of thousands of users you trust for
> reviews would be at all helpful -- it simply doesn't scale.
>

The page https://levien.com/free/tmetric-HOWTO.html has some thoughts on
how to build a scalable, resilient trust network based on user ratings; I
can't guarantee that it'll change your opinion, though!

> I suppose if my fantasy "curated" site existed, and the curation group
> were of a manageable size, then you could do that, but the point of having
> a modest number of curators is that you can already trust them ;-)
>
>>> Honestly, I'd be more likely to go with "I can assume that projects that
>>> are dependencies of other projects that I already know are good quality,
>>> are themselves good quality". Which excludes people from the
>>> equation altogether,
>
> I think there are a number of metrics that could be used -- and "how many
> projects use this project as a dependency" is a good one -- "which"
> projects would be even stronger. And there are others.
>
> Anything like that can be gamed, but I'm not sure that's as huge a
> problem as it might be -- what is the incentive to game this system? This
> is all open source, no one's making money, and frankly, having a lot of
> users can be a burden as well!
>
> Sure, many of us would really like a lot of people to use our code, but
> the incentives to cheat to get more users really aren't that strong -- at
> least if you can filter out the malware in some other way.
>
> -CHB
>
>
>
>
>
>
>>> but which falls apart when I'm looking for a library in a new area.
>>>
>>> Paul
>>>
>>
>> Cautious +1, since PageRank did pretty well for a good stint in a
>> somewhat analogous environment.
>>
>
>
> --
> Christopher Barker, PhD (Chris)
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/REXQPXKI3IUB3Z5XZ2UTG6WLKWJLOVB5/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread Christopher Barker
On Sun, Jul 9, 2023 at 8:37 AM James Addison via Python-ideas <
python-ideas@python.org> wrote:

>>> ISTM the primary use cases advanced here have been for "naive" users.
>>> Likely they won't be in a position to decide whether they trust Guido
>>> van Rossum or Egg Rando more.

There are 718,155 users on PyPi -- I can't imagine that trying to figure
out which of those hundreds of thousands of users you trust for
reviews would be at all helpful -- it simply doesn't scale.

I suppose if my fantasy "curated" site existed, and the curation group
were of a manageable size, then you could do that, but the point of having
a modest number of curators is that you can already trust them ;-)

>> Honestly, I'd be more likely to go with "I can assume that projects that
>> are dependencies of other projects that I already know are good quality,
>> are themselves good quality". Which excludes people from the
>> equation altogether,

I think there are a number of metrics that could be used -- and "how many
projects use this project as a dependency" is a good one -- "which"
projects would be even stronger. And there are others.
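
As a minimal sketch of both variants: the function below counts how many
projects list a package as a direct dependency, and optionally lets "which"
projects matter by giving some dependents a larger weight. The function
name, the weights argument and the package names are hypothetical, and the
dependency data is made up; a real version would need dependency metadata
from somewhere.

```python
def dependent_score(dependencies, weights=None):
    """Score each package by the projects that directly depend on it.

    dependencies: {project: [packages it depends on]}
    weights:      optional {project: weight}; a dependent that is missing
                  from the mapping counts as 1.0, so with no weights the
                  score is simply the number of dependents
    """
    weights = weights or {}
    scores = {}
    for project, deps in dependencies.items():
        for dep in set(deps):           # ignore duplicate listings
            if dep != project:          # ignore self-dependencies
                scores[dep] = scores.get(dep, 0.0) + weights.get(project, 1.0)
    return scores


# Made-up dependency data.
dependencies = {
    "app_a": ["lib_x", "lib_y"],
    "app_b": ["lib_y"],
    "lib_y": [],
}
plain = dependent_score(dependencies)
weighted = dependent_score(dependencies, weights={"app_b": 3.0})
```

The weights are exactly where the gaming question comes in: whoever sets
them decides which dependents count as strong signals.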

Anything like that can be gamed, but I'm not sure that's as huge a
problem as it might be -- what is the incentive to game this system? This
is all open source, no one's making money, and frankly, having a lot of
users can be a burden as well!

Sure, many of us would really like a lot of people to use our code, but
the incentives to cheat to get more users really aren't that strong -- at
least if you can filter out the malware in some other way.

-CHB






>> but which falls apart when I'm looking for a library in a new area.
>>
>> Paul
>>
>
> Cautious +1, since PageRank did pretty well for a good stint in a somewhat
> analogous environment.
>


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Q35LO7KS5XPZVGTYA2XEFEJVSVO27EBC/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread James Addison via Python-ideas
On Sun, Jul 9, 2023, 16:25 Paul Moore  wrote:

>
>
> On Sun, 9 Jul 2023 at 15:56, Stephen J. Turnbull <
> turnbull.stephen...@u.tsukuba.ac.jp> wrote:
>
>> James Addison via Python-ideas writes:
>>
>>  > The implementation of such a system could either be centralized or
>>  > distributed; the trust signals that human users infer from it
>>  > should always be distributed.
>>
>> ISTM the primary use cases advanced here have been for "naive" users.
>> Likely they won't be in a position to decide whether they trust Guido
>> van Rossum or Egg Rando more.  So in practice they'll often want to go
>> with some kind of publicly weighted average of scores.
>>
>
> I'll also point out that I'm a long-standing Python developer, and a core
> dev, and I still *regularly* get surprised by finding out that community
> members that I know and respect are maintainers of projects that I had no
> idea they were associated with. Which suggests that I have no idea how many
> *other* people who I think of as "just another person" might be maintainers
> of key, high-profile projects. So I think that a model based round
> weighting results based on "who you trust" would have some rather
> unfortunate failure modes.
>
> Honestly, I'd be more likely to go with "I can assume that projects that
> are dependencies of other projects that I already know are good quality,
> are themselves good quality". Which excludes people from the
> equation altogether, but which falls apart when I'm looking for a library
> in a new area.
>
> Paul
>

Cautious +1, since PageRank did pretty well for a good stint in a somewhat
analogous environment.
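
For readers unfamiliar with it, here is a minimal sketch of textbook
power-iteration PageRank applied to a dependency graph, where an edge from
A to B means "project A depends on project B". This is an illustration
only: the graph is made up, the damping and iterations defaults are
conventional choices, and nothing suggests PyPI or any other index computes
such a score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a non-empty graph.

    links: {node: [nodes it points to]}; here A -> B means
    "A depends on B", so rank flows towards dependencies.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up dependency graph.
links = {"app_a": ["lib_x", "lib_y"], "app_b": ["lib_x"], "lib_x": [], "lib_y": []}
ranks = pagerank(links)
```

In this toy graph lib_x ends up ranked above lib_y, and both rank above the
two applications, which nothing depends on.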

>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/J5RH7ZGWO23APG42E6ZU5QPRXMYKJ7W4/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread Paul Moore
On Sun, 9 Jul 2023 at 15:56, Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> James Addison via Python-ideas writes:
>
>  > The implementation of such a system could either be centralized or
>  > distributed; the trust signals that human users infer from it
>  > should always be distributed.
>
> ISTM the primary use cases advanced here have been for "naive" users.
> Likely they won't be in a position to decide whether they trust Guido
> van Rossum or Egg Rando more.  So in practice they'll often want to go
> with some kind of publicly weighted average of scores.
>

I'll also point out that I'm a long-standing Python developer, and a core
dev, and I still *regularly* get surprised by finding out that community
members that I know and respect are maintainers of projects that I had no
idea they were associated with. Which suggests that I have no idea how many
*other* people who I think of as "just another person" might be maintainers
of key, high-profile projects. So I think that a model based round
weighting results based on "who you trust" would have some rather
unfortunate failure modes.

Honestly, I'd be more likely to go with "I can assume that projects that
are dependencies of other projects that I already know are good quality,
are themselves good quality". Which excludes people from the
equation altogether, but which falls apart when I'm looking for a library
in a new area.
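
A minimal sketch of that heuristic, and of where it stops working: start
from projects already known to be good and collect everything they depend
on. Following dependencies transitively is an assumption added here (the
heuristic above could equally mean direct dependencies only), and the
function name and package names are hypothetical.

```python
def presumed_good(dependencies, trusted):
    """Return packages reachable as dependencies of already-trusted projects.

    dependencies: {project: [packages it depends on]}
    trusted:      projects the user already knows to be good quality
    """
    trusted = set(trusted)
    found = set()
    stack = list(trusted)
    while stack:
        project = stack.pop()
        for dep in dependencies.get(project, ()):
            if dep not in found and dep not in trusted:
                found.add(dep)
                stack.append(dep)   # assumption: follow transitive dependencies
    return found


# Made-up dependency data.
dependencies = {
    "known_good": ["lib_x"],
    "lib_x": ["lib_y"],
    "unfamiliar_app": ["lib_z"],
}
good = presumed_good(dependencies, {"known_good"})
```

Here lib_z never shows up, because nothing the user already trusts depends
on it. That is the "new area" failure: no trusted starting point, no result.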

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/N6X7JFHR6U4TEE4YSZPTE2M4OPD6BMMM/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread James Addison via Python-ideas
I didn't really address your point there; indirectly mine was to reaffirm a
sense that not all participants may want to read the opinions of others
while learning technologies, and that's why I am skeptical of the
suggestions to include subjective user ratings of any kind within Python
packaging infrastructure.

On Sun, Jul 9, 2023, 16:09 James Addison  wrote:

> On Sun, Jul 9, 2023, 15:52 Stephen J. Turnbull <
> turnbull.stephen...@u.tsukuba.ac.jp> wrote:
>
>> James Addison via Python-ideas writes:
>>
>>  > The implementation of such a system could either be centralized or
>>  > distributed; the trust signals that human users infer from it
>>  > should always be distributed.
>>
>> ISTM the primary use cases advanced here have been for "naive" users.
>> Likely they won't be in a position to decide whether they trust Guido
>> van Rossum or Egg Rando more.  So in practice they'll often want to go
>> with some kind of publicly weighted average of scores.
>>
>> To avoid the problem of ballot-box stuffing, you could go the way that
>> pro sports often do for their All-Star teams: have one vote by anybody
>> who cares to register an ID, and another by verified committers,
>> including committers from "trusted" projects as well.
>>
>
> As someone who sometimes prefers to learn independently -- even if that
> takes longer and may produce unusual perspectives -- I remember learning
> web development by reading the source HTML of websites.
>
> Maybe that wouldn't be the typical way to learn programming -- but given
> the volume of successful and important software that exists in the world
> today, I think that having that code and the packages that it is composed
> of available to learn from would be highly beneficial to maintainers,
> educators and students, and other groups as well.
>
>>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DLH44V4UUDUQN6NCMIXSADM6RE27RIEJ/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread James Addison via Python-ideas
On Sun, Jul 9, 2023, 15:52 Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> James Addison via Python-ideas writes:
>
>  > The implementation of such a system could either be centralized or
>  > distributed; the trust signals that human users infer from it
>  > should always be distributed.
>
> ISTM the primary use cases advanced here have been for "naive" users.
> Likely they won't be in a position to decide whether they trust Guido
> van Rossum or Egg Rando more.  So in practice they'll often want to go
> with some kind of publicly weighted average of scores.
>
> To avoid the problem of ballot-box stuffing, you could go the way that
> pro sports often do for their All-Star teams: have one vote by anybody
> who cares to register an ID, and another by verified committers,
> including committers from "trusted" projects as well.
>

As someone who sometimes prefers to learn independently -- even if that
takes longer and may produce unusual perspectives -- I remember learning
web development by reading the source HTML of websites.

Maybe that wouldn't be the typical way to learn programming -- but given
the volume of successful and important software that exists in the world
today, I think that having that code and the packages that it is composed
of available to learn from would be highly beneficial to maintainers,
educators and students, and other groups as well.

>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XRPK7LDU3JMP7NBY75SUOHUSHHW33BKA/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread Stephen J. Turnbull
James Addison via Python-ideas writes:

 > The implementation of such a system could either be centralized or
 > distributed; the trust signals that human users infer from it
 > should always be distributed.

ISTM the primary use cases advanced here have been for "naive" users.
Likely they won't be in a position to decide whether they trust Guido
van Rossum or Egg Rando more.  So in practice they'll often want to go
with some kind of publicly weighted average of scores.

To avoid the problem of ballot-box stuffing, you could go the way that
pro sports often do for their All-Star teams: have one vote by anybody
who cares to register an ID, and another by verified committers,
including committers from "trusted" projects as well.
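
For illustration only, here is a minimal sketch of the two-ballot idea,
assuming numeric ratings and an arbitrary 50/50 weighting between the open
ballot and the verified-committer ballot.  The name `combined_score` and the
`committer_weight` parameter are hypothetical and not part of any concrete
proposal in this thread.

```python
from statistics import mean


def combined_score(open_votes, committer_votes, committer_weight=0.5):
    """Blend two separately tallied ballots into one score.

    Each ballot is averaged on its own first, so a flood of open
    registrations cannot outvote the (smaller) committer ballot.
    The 0.5 default weight is an assumption, not a recommendation.
    """
    if not 0.0 <= committer_weight <= 1.0:
        raise ValueError("committer_weight must be between 0 and 1")
    if not open_votes and not committer_votes:
        return None  # nobody has voted yet
    if not committer_votes:
        return mean(open_votes)
    if not open_votes:
        return mean(committer_votes)
    return (
        committer_weight * mean(committer_votes)
        + (1.0 - committer_weight) * mean(open_votes)
    )
```

Because each electorate is averaged before blending, a thousand stuffed
ballots in the open vote move the result no further than a single one would.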

Steve
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/4XKWJDHQWBCX7HIX7UT5GJNXMFOLMDWY/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread James Addison via Python-ideas
On Sun, Jul 9, 2023, 09:13 Chris Angelico  wrote:

> On Sun, 9 Jul 2023 at 18:06, James Addison via Python-ideas
>  wrote:
> >
> > On Sun, 9 Jul 2023 at 02:11, Cameron Simpson  wrote:
> > > I have always thought that any community scoring system should allow
> > > other users to mark up/down other reviewers w.r.t the scores presented.
> > > That markup should only affect the scoring as presented to the person
> > > doing the markup, like a personal killfile. The idea is that you can
> > > have the ratings you see affected by notions that "I trust the opinions
> > > of user A" or "I find user B's opinion criteria not useful for my
> > > criteria".
> >
> > That sounds to me like the basis of a distributed trust network, and
> > could be useful.
> >
>
> Why distributed? This sounded more like a centralized system, but one
> where you can "ignore reviews from this user" for any other user.
>

The implementation of such a system could either be centralized or
distributed; the trust signals that human users infer from it should always
be distributed.  And I'd argue that it's more difficult to guarantee that
the trust presented to all participants is fair and accurate in either a
centralized or a proprietary system.

>
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LAVE5SWYASATB7H3D4CAKZOCZX4GT3SW/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread Chris Angelico
On Sun, 9 Jul 2023 at 18:06, James Addison via Python-ideas
 wrote:
>
> On Sun, 9 Jul 2023 at 02:11, Cameron Simpson  wrote:
> > I have always thought that any community scoring system should allow
> > other users to mark up/down other reviewers w.r.t the scores presented.
> > That markup should only affect the scoring as presented to the person
> > doing the markup, like a personal killfile. The idea is that you can
> > have the ratings you see affected by notions that "I trust the opinions
> > of user A" or "I find user B's opinion criteria not useful for my
> > criteria".
>
> That sounds to me like the basis of a distributed trust network, and
> could be useful.
>

Why distributed? This sounded more like a centralized system, but one
where you can "ignore reviews from this user" for any other user.

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IOFZ4NR3XYQDUTD3FY2XUTRRADPMQ7AC/


[Python-ideas] Re: "Curated" package repo?

2023-07-09 Thread James Addison via Python-ideas
On Sun, 9 Jul 2023 at 02:11, Cameron Simpson  wrote:
>
> On 04Jul2023 17:21, Christopher Barker  wrote:
> >3) A rating system built into PyPI -- This could be a combination of
> >two things:
> >  A - Automated analysis -- download stats, dependency stats, release
> >frequency, etc, etc, etc.
> >  B - Community ratings -- upvotes, stars, whatever.
> >
> >If done well, that could be very useful -- search on PyPI listed by rating.
> >However -- "done well" is a huge challenge -- I don't think there's a way
> >to do the automated system right, and community scoring can be abused
> >pretty easily. But maybe folks smarter than me could make it work with one
> >or both of these approaches.
>
> I have always thought that any community scoring system should allow
> other users to mark up/down other reviewers w.r.t the scores presented.
> That markup should only affect the scoring as presented to the person
> doing the markup, like a personal killfile. The idea is that you can
> have the ratings you see affected by notions that "I trust the opinions
> of user A" or "I find user B's opinion criteria not useful for my
> criteria".
>
> Of course the "ignore user B" has some of the same downsides as trying
> to individually ignore certain spam sources: good for a single "bad" actor
> (by my personal criteria) to ignore their (apparent) gaming of the
> ratings but not good for a swarm of robots.

Hi Cameron,

That sounds to me like the basis of a distributed trust network, and
could be useful.

Some thoughts from experience working with Python (and other
ecosystem) packages: after getting to know the usernames of developers
and publishers of packages, I think that much of that trust can be
learned by individuals without the assistance of technology -- that is
to say, people begin to recognize authors that they trust, and authors
that they don't.

How to provide reassurance that each author's identity remains the
same between modifications to packages/code is a related challenge,
though.  FWIW, I don't really like many of the common multi-factor
authentication systems used today, because I don't like seeing
barriers to expression emerge, even when the intent is benevolent.
I'm not sure I yet have better alternatives to suggest, though.

Your message also helped me clarify why I don't like embedding any
review information at all within packaging ecosystems -- regardless of
whether transitive trust is additionally available in the form of
reviews.

The reason is that I would prefer to see end-to-end transparent supply
chain integrity for almost all, if not all, software products.  I'm
typing this in a GMail web interface, but I do not believe that many
people have access to all of the source code for the version that I'm
using.  If everyone did, and if that source included strong dependency
hashes to indicate the dependencies used -- similar to the way that
pip-tools[1] can write a persistent record of a dependency set,
allowing the same dependencies to be inspected and installed by others
-- then people could begin to build their own mental models of what
packages -- and what specific versions of those packages -- are worth
trusting.

In other words: if all of the software and bill-of-materials for it
became open and published, and could be constructed reproducibly[2],
then social trust would emerge without a requirement for reviews.
That would not be mutually-exclusive with the presence of reviews --
verbal, written, or otherwise -- elsewhere.
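
As a rough illustration of what a "strong dependency hash" check amounts to,
the sketch below compares a downloaded file against a pinned SHA-256 digest
using only the standard library.  (pip-tools can record such digests with
`pip-compile --generate-hashes`, and pip can enforce them with
`--require-hashes`.)  The function names `sha256_of` and `matches_pin` are
hypothetical; this is not how either tool is implemented.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, allowed_digests):
    """True if the file's digest equals one of the pinned hex digests."""
    actual = sha256_of(path)
    return any(hmac.compare_digest(actual, pinned) for pinned in allowed_digests)
```

Anyone holding the same list of pinned digests can repeat the check on their
own copy of the files, which is what makes the record inspectable by others.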

Thanks,
James

[1] - https://github.com/jazzband/pip-tools/

[2] - https://www.reproducible-builds.org/
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OOPIHTBTJFHYVJLJVYHWAK4EPYKP6YBH/


[Python-ideas] Re: "Curated" package repo?

2023-07-08 Thread Cameron Simpson

On 04Jul2023 17:21, Christopher Barker  wrote:
> 3) A rating system built into PyPI -- This could be a combination of
> two things:
>  A - Automated analysis -- download stats, dependency stats, release
> frequency, etc, etc, etc.
>  B - Community ratings -- upvotes, stars, whatever.
>
> If done well, that could be very useful -- search on PyPI listed by rating.
> However -- "done well" is a huge challenge -- I don't think there's a way
> to do the automated system right, and community scoring can be abused
> pretty easily. But maybe folks smarter than me could make it work with one
> or both of these approaches.


I have always thought that any community scoring system should allow 
other users to mark up/down other reviewers w.r.t the scores presented.  
That markup should only affect the scoring as presented to the person 
doing the markup, like a personal killfile. The idea is that you can 
have the ratings you see affected by notions that "I trust the opinions 
of user A" or "I find user B's opinion criteria not useful for my 
criteria".
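
A minimal sketch of the "personal killfile" idea, assuming each review is a
(reviewer, score) pair and each viewer keeps a private table of trust
weights.  The names `personal_score`, `my_trust` and `default_trust` are
hypothetical and only meant to make the mechanism concrete.

```python
def personal_score(reviews, my_trust, default_trust=1.0):
    """Weighted average of review scores as seen by one viewer.

    reviews  -- iterable of (reviewer, score) pairs
    my_trust -- this viewer's private weight per reviewer;
                0 ignores a reviewer (the "killfile" case),
                values above 1 boost a reviewer the viewer trusts
    """
    total = 0.0
    weight_sum = 0.0
    for reviewer, score in reviews:
        weight = my_trust.get(reviewer, default_trust)
        if weight < 0:
            raise ValueError("trust weights must be non-negative")
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        return None  # every reviewer was ignored
    return total / weight_sum
```

The weights never leave the viewer's own table, so marking a reviewer down
changes only what that one viewer sees.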


Of course the "ignore user B" has some of the same downsides as trying
to individually ignore certain spam sources: good for a single "bad" actor
(by my personal criteria) to ignore their (apparent) gaming of the 
ratings but not good for a swarm of robots.


Cheers,
Cameron Simpson 
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MXIVYW6PLBGHM2X4TH43CQJOBEFCJJXX/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread Dom Grigonis
Thanks David,

Not keen to take it on solo.

Ideally, IMO, this could be a joint project of this whole group. Someone more
senior from this group creates a repo and oversees the progress, a couple of
more experienced members make initial decisions to get things going, someone
issues a review PR, and a couple of new OPs are invited to write a review on
their queries.

Would be good to know if:
1. Someone has their own little benchmarking/reviews and would be willing to 
spend a little time to issue PR for some initial content.
2. People from this group see themselves going to such a place and adding a
new great package they have just found to existing reviews (or even more
importantly an awful package)
3. Someone sees the opportunity to contribute if someone took such a project
on. e.g.
  a) someone is very excited about benchmarking automation
  b) someone has some working scripts to fetch github stats / stack trends that 
are waiting to be used
  c) someone wants to take their devops to next level and sees this as a good 
opportunity
  d) someone is very keen on high level view and would like to contribute in 
working on categorisation (partially relying on python stdlib/libref could be 
intuitive, although then it is a dependency)

A lot of “someones” in this e-mail...

> On 6 Jul 2023, at 16:55, David Mertz, Ph.D.  wrote:
> 
> Dom:
> 
> I'd recommend you simply start a GitHub project for "Curated PyPI", find a 
> catchy domain name, and publish that via GH Pages.  That's a few hours of 
> work to get a skeleton.  But no, I'm not quite volunteering to create and 
> maintain it myself today.
> 
> After there is a concrete site existing, you can refine the presentation and 
> governance procedure iteratively.  As a start, it can basically just be a web 
> page with evaluations like yours of the JSON libraries.  At a first pass, 
> there's no need for anything dynamic on the page, just some tables (or maybe 
> accordions, or side-bar navigation, or whatever).
> 
> I'd be very likely to make some PRs to such a repository myself.  At some 
> point, with enough recommendations, you might add some automation. E.g. some 
> script that checks all the submitted "package reviews" and creates an 
> aggregation ("10 reviews with average rating of 8").  Even there, running 
> that thing offline every once in a while is plenty to start (you could do GH 
> Actions or something too, if you like).
> 
> There are a few decisions to make, but none that difficult.  For example, 
> what format are reviews? Markdown? YAML? TOML? JSON? Python with conventions? 
> Whatever it is, it should have a gentle learning curve and be human readable 
> IMO.
> 
> 
> 
> On Thu, Jul 6, 2023 at 9:26 AM Dom Grigonis  > wrote:
> It is possible, that issues being discussed at this stage are not as relevant 
> as they seem at stage 0, which this idea is at.
> (Unless someone here is looking for a very serious commitment.)
> 
> If some sort of starting point which is “light” in approach was decided on, 
> then the process can be readjusted as/if it progresses. Maybe no need to put 
> a “stamp” on a package, but simply provide comparison statistics given some 
> initial structure.
> 
> I think a lot of packages can be filtered on objective criteria, without even 
> reaching the stage of subjective opinions.
> 
> ———
> 
> General info - fairly easy to inspect without the need of subjective opinions.
> 1. License
> 2. Maintenance - hard stack overflow & repo stats
> 
> Performance - hard stats:
> 1. There will be lower level language extensions, which even if not up to 
> standards in other aspects are worth attention, someone else might pick it up 
> and rejuvenate if explicitly indicated.
> 2. There will be pure Python packages:
>   a) good coding standards with good knowledge on efficient programming in 
> pure python
>   b) pure python packages that take ages to execute
> 
> In many areas, this will filter out many libraries. Although, there are some, 
> where it wouldn’t. E.g. schema-based low level serialisation, where 
> benchmarks can be quite tight.
> 
> The remaining evaluation can be subjective opinions, where preferences of 
> curators can be taken into account:
> 1. Coding standards
> 2. Integration
> 3. Flexibility/functionality
> 4. …
> 
> IMO, all of this can be done while being on the safe side - if unsure, leave 
> the plain statistics for users and developers to see.
> 
> ———
> 
> An example. (I am not the developer of any of these)
> Json serialisers:
> 1. json - stdlib, average performance, well maintained, flexible, very safe 
> to depend on
> 2. simplejson - 3rd party, pure python, performance in line with 1), drop-in 
> replacement for json, been around for a while, safe to depend on
> 3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2),
> drop-in replacement for json, been around for a while, safe to depend on
> 4. ijson - 3rd party, C,

[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread James Addison via Python-ideas
On Thu, Jul 6, 2023, 14:22 James Addison  wrote:

> I agree, we should encourage or await a single organization to reimplement
> a packaging ecosystem with a slightly different set of properties that
> continue to provide them with editor biasing, preventing eventual global
> consensus and system neutrality.
>
> Is your time available to help build it?
>

I'd like to apologise for this comment; I don't think that I argued in good
faith here.

I was frustrated by a sense that many of the more straightforward attempts
to make improvements in packaging ecosystems are, in themselves, a
reinvention of previously-existing wheels, often producing similarly wonky
spokes to previous attempts that result in repeated off-course journeys
that, given enough knowledge of the history of technology, seem predictable.

To observe that and then to go on to suggest that we simply wait for the
next wonky wheel builder doesn't seem like genuine progress, and I should
neither argue for that nor ask whether other people want to spend their own
valuable time on it.

On Thu, Jul 6, 2023, 14:17 Gregory Disney 
> wrote:
>
>> why do people insist on reinventing the wheel? Blockchain is not the
>> answer for adding trust that is verifiable. Code signing is the answer,
>> it’s widely accepted and would be useful in cases of trusted computing and
>> other security use cases.
>>
>> I don’t want to load a hash table to load a third party module on a UEFI
>> interface.
>>
>> On Thu, Jul 6, 2023 at 9:11 AM James Addison via Python-ideas <
>> python-ideas@python.org> wrote:
>>
>>> On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:
>>>
>>>> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>>>>  wrote:
>>>> > I also agree with a later reply about avoiding the murkier side of
>>>> blockchains / etc.  That said, it seems to me (again, sample size one
>>>> anecdata) that creating a more levelled playing field for package
>>>> publication could benefit from the use of some distributed technologies.
>>>> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
>>>> one question related to recency of data, though.  Delaying availability of
>>>> a package to an audience -- if it's important enough -- could under some
>>>> circumstances become effectively similar to censorship.
>>>> >
>>>>
>>>> A blockchain won't solve anything here. It would be completely and
>>>> utterly impractical to put the packages themselves into a blockchain,
>>>> so all you'd have is the index, and that means it's just a bad version
>>>> of PyPI's own single-page index.
>>>>
>>>> ChrisA
>>>> ___
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-ideas@python.org/message/PTIS3HZHJSFV7ETWE7UP4HKXS4WN2OEO/
>>>
>>>
>>> Mostly agreed.  A distributed hash table or similar, though, could be
>>> appropriate in combination with ideas similar to the accreting layers of
>>> self-reinforcing consensus that some blockchain technologies provide.
>>> ___
>>> Message archived at
>>> https://mail.python.org/archives/list/python-ideas@python.org/message/I3CDZAXGYVS33DJ4JEENGYMF4MY6BQ7O/
>>>
>>
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KAKBDC3WSSUKCAY24SMABP3GIVXXEILD/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread David Mertz, Ph.D.
Dom:

I'd recommend you simply start a GitHub project for "Curated PyPI", find a
catchy domain name, and publish that via GH Pages.  That's a few hours of
work to get a skeleton.  But no, I'm not quite volunteering to create and
maintain it myself today.

After there is a concrete site existing, you can refine the presentation
and governance procedure iteratively.  As a start, it can basically just be
a web page with evaluations like yours of the JSON libraries.  At a first
pass, there's no need for anything dynamic on the page, just some tables
(or maybe accordions, or side-bar navigation, or whatever).

I'd be very likely to make some PRs to such a repository myself.  At some
point, with enough recommendations, you might add some automation. E.g.
some script that checks all the submitted "package reviews" and creates an
aggregation ("10 reviews with average rating of 8").  Even there, running
that thing offline every once in a while is plenty to start (you could do
GH Actions or something too, if you like).

There are a few decisions to make, but none that difficult.  For example,
what format are reviews? Markdown? YAML? TOML? JSON? Python with
conventions? Whatever it is, it should have a gentle learning curve and be
human readable IMO.
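
To make the aggregation step concrete, here is a minimal sketch that assumes
reviews are stored as JSON documents with a "package" name and a numeric
"rating".  Both field names, the function name `aggregate`, and the choice of
JSON are assumptions for illustration; the format question above is still
open.

```python
import json
from collections import defaultdict


def aggregate(review_documents):
    """Summarise ratings per package from JSON review strings.

    Returns a mapping of package name -> (review count, average rating).
    """
    ratings = defaultdict(list)
    for text in review_documents:
        review = json.loads(text)
        ratings[review["package"]].append(review["rating"])
    return {
        package: (len(scores), sum(scores) / len(scores))
        for package, scores in sorted(ratings.items())
    }


# Hypothetical reviews, purely to show the shape of the output.
docs = [
    '{"package": "pkg_a", "rating": 9}',
    '{"package": "pkg_a", "rating": 7}',
    '{"package": "pkg_b", "rating": 6}',
]
for package, (count, average) in aggregate(docs).items():
    print(f"{package}: {count} reviews with average rating of {average:.1f}")
```

Something this small can be run offline by hand at first and moved into CI
later without changing the review files themselves.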



On Thu, Jul 6, 2023 at 9:26 AM Dom Grigonis  wrote:

> It is possible, that issues being discussed at this stage are not as
> relevant as they seem at stage 0, which this idea is at.
> (Unless someone here is looking for a very serious commitment.)
>
> If some sort of starting point which is “light” in approach was decided
> on, then the process can be readjusted as/if it progresses. Maybe no need
> to put a “stamp” on a package, but simply provide comparison statistics
> given some initial structure.
>
> I think a lot of packages can be filtered on objective criteria, without
> even reaching the stage of subjective opinions.
>
> ———
>
> General info - fairly easy to inspect without the need of subjective
> opinions.
> 1. License
> 2. Maintenance - hard stack overflow & repo stats
>
> Performance - hard stats:
> 1. There will be lower level language extensions, which even if not up to
> standards in other aspects are worth attention, someone else might pick it
> up and rejuvenate if explicitly indicated.
> 2. There will be pure Python packages:
>   a) good coding standards with good knowledge on efficient programming in
> pure python
>   b) pure python packages that take ages to execute
>
> In many areas, this will filter out many libraries. Although, there are
> some, where it wouldn’t. E.g. schema-based low level serialisation, where
> benchmarks can be quite tight.
>
> The remaining evaluation can be subjective opinions, where preferences of
> curators can be taken into account:
> 1. Coding standards
> 2. Integration
> 3. Flexibility/functionality
> 4. …
>
> IMO, all of this can be done while being on the safe side - if unsure,
> leave the plain statistics for users and developers to see.
>
> ———
>
> An example. (I am not the developer of any of these)
> Json serialisers:
> 1. json - stdlib, average performance, well maintained, flexible, very
> safe to depend on
> 2. simplejson - 3rd party, pure python, performance in line with 1),
> drop-in replacement for json, been around for a while, safe to depend on
> 3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2),
> drop-in replacement for json, been around for a while, safe to depend on
> 4. ijson - 3rd party, C, average performance, proprietary interface
> relying heavily on iterator protocol, status 
> 5. orjson - 3rd party, highly optimised C, performance on par with fastest
> serialisers on the market, not-a-drop-in-replacement for json, due to
> sacrifices for performance, rich in functionality, well maintained, safe to
> depend on
> 6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a
> drop-in replacement for json, extends json to json5 features such as
> comments, well maintained, safe to depend on
>
> (THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)
>
> So there is still a bit of opinion here, but all of this can be
> standardised and put in numbers, and comparison of this type can be done
> with little to no personal opinion.
>
> ———
>
> After structure for this is in place, it would be easier to discuss
> further whether more serious curation is needed/worthwhile/makes sense.
>
> Allow queries from users, package developers, places to gather opinions,
> maybe volunteering to do a deeper analysis…
>
> And once there is enough input, maybe a curated guidance can be added to
> the review. But this is the next stage, which is not necessarily needed to
> be thoroughly thought out before putting in place something simple,
> objective & risk-free.
>
> ———
>
> Maybe stage 1. is all that users need - a reliable place to check hard
> stats, where users and developers can update them for the benefit of all.
> With 

[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread Dom Grigonis
It is possible, that issues being discussed at this stage are not as relevant 
as they seem at stage 0, which this idea is at.
(Unless someone here is looking for a very serious commitment.)

If some sort of starting point which is “light” in approach was decided on, 
then the process can be readjusted as/if it progresses. Maybe no need to put a 
“stamp” on a package, but simply provide comparison statistics given some 
initial structure.

I think a lot of packages can be filtered on objective criteria, without even 
reaching the stage of subjective opinions.

———

General info - fairly easy to inspect without the need of subjective opinions.
1. License
2. Maintenance - hard stack overflow & repo stats

Performance - hard stats:
1. There will be lower level language extensions, which even if not up to 
standards in other aspects are worth attention, someone else might pick it up 
and rejuvenate if explicitly indicated.
2. There will be pure Python packages:
  a) good coding standards with good knowledge on efficient programming in pure 
python
  b) pure python packages that take ages to execute

In many areas, this will filter out many libraries. Although, there are some, 
where it wouldn’t. E.g. schema-based low level serialisation, where benchmarks 
can be quite tight.

The remaining evaluation can be subjective opinions, where preferences of 
curators can be taken into account:
1. Coding standards
2. Integration
3. Flexibility/functionality
4. …

IMO, all of this can be done while being on the safe side - if unsure, leave 
the plain statistics for users and developers to see.

———

An example. (I am not the developer of any of these)
Json serialisers:
1. json - stdlib, average performance, well maintained, flexible, very safe to 
depend on
2. simplejson - 3rd party, pure python, performance in line with 1), drop-in 
replacement for json, been around for a while, safe to depend on
3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2),
drop-in replacement for json, been around for a while, safe to depend on
4. ijson - 3rd party, C, average performance, proprietary interface
relying heavily on iterator protocol, status 
5. orjson - 3rd party, highly optimised C, performance on par with fastest
serialisers on the market, not-a-drop-in-replacement for json, due to
sacrifices for performance, rich in functionality, well maintained, safe to
depend on
6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a drop-in
replacement for json, extends json to json5 features such as comments, well
maintained, safe to depend on

(THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)

So there is still a bit of opinion here, but all of this can be standardised
and put in numbers, and comparison of this type can be done with
little to no personal opinion.

———

After structure for this is in place, it would be easier to discuss further 
whether more serious curation is needed/worthwhile/makes sense.

Allow queries from users, package developers, places to gather opinions, maybe 
volunteering to do a deeper analysis… 

And once there is enough input, maybe a curated guidance can be added to the 
review. But this is the next stage, which is not necessarily needed to be 
thoroughly thought out before putting in place something simple, objective & 
risk-free.

———

Maybe stage 1. is all that users need - a reliable place to check hard stats, 
where users and developers can update them for the benefit of all. With enough 
popularity, package developers should be motivated to issue stat updates (e.g. 
add additional column to benchmarking script), and users would issue similar 
updates (e.g. add additional column to benchmarking script, where the library 
is extremely slow).
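
As a sketch of what such a shared benchmarking script could look like, the
snippet below times any number of serialiser callables on one payload; adding
a library would mean adding one entry (one "column") to the mapping.  Only
the standard library `json` module is exercised here, and the function name
`bench`, the payload and the labels are made up for illustration.

```python
import json
import timeit


def bench(serialisers, payload, number=1000):
    """Time each serialiser on the same payload.

    serialisers -- mapping of label -> callable taking the payload
    Returns a mapping of label -> seconds per call (best of 3 runs).
    """
    results = {}
    for label, dumps in serialisers.items():
        best = min(timeit.repeat(lambda: dumps(payload), number=number, repeat=3))
        results[label] = best / number
    return results


payload = {"name": "example", "values": list(range(100))}
candidates = {
    "json": json.dumps,
    "json (compact)": lambda obj: json.dumps(obj, separators=(",", ":")),
}
for label, seconds in bench(candidates, payload).items():
    print(f"{label}: {seconds * 1e6:.1f} us per call")
```

Timings depend on the machine and payload, so published numbers would need to
state both.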

It is possible that the project would naturally turn in the direction of hard
stat coverage instead of “deep” curation. E.g.
json serialisers become a sub-branch of schema-less serialisers,
which in turn become a branch of serialisers

The user can then view comparable stats of the whole branch, sub-branch,
sub-sub-branch to get the information he needs to make decisions. And apply
different filters in the process to get to the final list of packages on which
the user will have to do his final subjective analysis anyway.

———

E.g. a user needs a serialiser. He prefers schema-less, but is willing to go
schema-based given a large increase in performance. He does not mind low
maintenance status, given that he aims to maintain his own proprietary
serialisation library in the long run. Naturally, clean & simple coding with a
permissive license is preferred.

Just a portal with up-to-date stats where user could interactively navigate 
such decisions would be a good start and potentially a “safe” route to begin 
with.
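
The serialiser example could be expressed as a filter over plain stats
records, roughly as sketched below.  The record fields, the threshold of a 3x
speed-up before accepting a schema-based library, and the set of licenses
treated as permissive are all assumptions made for illustration, and the
sample packages used to exercise it should be placeholders rather than real
measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageStats:
    name: str
    schema_based: bool
    relative_speed: float  # 1.0 = baseline, higher is faster
    license: str


# Assumed set of permissive licenses (SPDX identifiers), for the example only.
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def shortlist(packages, min_speedup_for_schema=3.0):
    """Keep permissively licensed packages that are schema-less,
    plus schema-based ones that are at least `min_speedup_for_schema`
    times the baseline speed; fastest first."""
    keep = []
    for pkg in packages:
        if pkg.license not in PERMISSIVE:
            continue
        if pkg.schema_based and pkg.relative_speed < min_speedup_for_schema:
            continue
        keep.append(pkg)
    return sorted(keep, key=lambda p: p.relative_speed, reverse=True)
```

Maintenance status is deliberately not a filter here, matching the user in
the example who does not mind it; the final subjective comparison is left to
the person reading the shortlist.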

The starting work on such thing then would be more heavy on automation, rather 
than politics, which in turn will be easier to tackle later 

[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread James Addison via Python-ideas
I agree, we should encourage or await a single organization to reimplement
a packaging ecosystem with a slightly different set of properties that
continue to provide them with editor biasing, preventing eventual global
consensus and system neutrality.

Is your time available to help build it?

On Thu, Jul 6, 2023, 14:17 Gregory Disney 
wrote:

> why do people insist on reinventing the wheel? Blockchain is not the
> answer for adding trust that is verifiable. Code signing is the answer,
> it’s widely accepted and would be useful in cases of trusted computing and
> other security use cases.
>
> I don’t want to load a hash table to load a third party module on a UEFI
> interface.
>
> On Thu, Jul 6, 2023 at 9:11 AM James Addison via Python-ideas <
> python-ideas@python.org> wrote:
>
>> On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:
>>
>>> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>>>  wrote:
>>> > I also agree with a later reply about avoiding the murkier side of
>>> blockchains / etc.  That said, it seems to me (again, sample size one
>>> anecdata) that creating a more levelled playing field for package
>>> publication could benefit from the use of some distributed technologies.
>>> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
>>> one question related to recency of data, though.  Delaying availability of
>>> a package to an audience -- if it's important enough -- could under some
>>> circumstances become effectively similar to censorship.
>>> >
>>>
>>> A blockchain won't solve anything here. It would be completely and
>>> utterly impractical to put the packages themselves into a blockchain,
>>> so all you'd have is the index, and that means it's just a bad version
>>> of PyPI's own single-page index.
>>>
>>> ChrisA
>>> ___
>>> Message archived at
>>> https://mail.python.org/archives/list/python-ideas@python.org/message/PTIS3HZHJSFV7ETWE7UP4HKXS4WN2OEO/
>>
>>
>> Mostly agreed.  A distributed hash table or similar, though, could be
>> appropriate in combination with ideas similar to the accreting layers of
>> self-reinforcing consensus that some blockchain technologies provide.
>> ___
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/I3CDZAXGYVS33DJ4JEENGYMF4MY6BQ7O/
>>
>
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YS3UQFENIAX7GGXD2KCJ3GHZJJJV3KLM/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread Gregory Disney
Why do people insist on reinventing the wheel? Blockchain is not the answer
for adding trust that is verifiable. Code signing is the answer: it’s
widely accepted and would be useful in cases of trusted computing and other
security use cases.

I don’t want to load a hash table to load a third party module on a UEFI
interface.

On Thu, Jul 6, 2023 at 9:11 AM James Addison via Python-ideas <
python-ideas@python.org> wrote:

> On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:
>
>> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>>  wrote:
>> > I also agree with a later reply about avoiding the murkier side of
>> blockchains / etc.  That said, it seems to me (again, sample size one
>> anecdata) that creating a more levelled playing field for package
>> publication could benefit from the use of some distributed technologies.
>> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
>> one question related to recency of data, though.  Delaying availability of
>> a package to an audience -- if it's important enough -- could under some
>> circumstances become effectively similar to censorship.
>> >
>>
>> A blockchain won't solve anything here. It would be completely and
>> utterly impractical to put the packages themselves into a blockchain,
>> so all you'd have is the index, and that means it's just a bad version
>> of PyPI's own single-page index.
>>
>> ChrisA
>
>
> Mostly agreed.  A distributed hash table or similar, though, could be
> appropriate in combination with ideas similar to the accreting layers of
> self-reinforcing consensus that some blockchain technologies provide.
>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VMEPUO252ZSC6SCM7L5NNLHXUG7COXRB/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread James Addison via Python-ideas
On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:

> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>  wrote:
> > I also agree with a later reply about avoiding the murkier side of
> blockchains / etc.  That said, it seems to me (again, sample size one
> anecdata) that creating a more levelled playing field for package
> publication could benefit from the use of some distributed technologies.
> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
> one question related to recency of data, though.  Delaying availability of
> a package to an audience -- if it's important enough -- could under some
> circumstances become effectively similar to censorship.
> >
>
> A blockchain won't solve anything here. It would be completely and
> utterly impractical to put the packages themselves into a blockchain,
> so all you'd have is the index, and that means it's just a bad version
> of PyPI's own single-page index.
>
> ChrisA


Mostly agreed.  A distributed hash table or similar, though, could be
appropriate in combination with ideas similar to the accreting layers of
self-reinforcing consensus that some blockchain technologies provide.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/I3CDZAXGYVS33DJ4JEENGYMF4MY6BQ7O/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Brendan Barnwell

On 2023-07-05 00:00, Christopher Barker wrote:
> I'm noting this, because I think it's part of the problem to be solved, 
> but maybe not the main one (to me anyway). I've been focused more on 
> "these packages are worthwhile, by some definition of worthwhile". While 
> I think Chris A is more focused on "which of these seemingly similar 
> packages should I use?" -- not unrelated, but not the same question either.


	I noticed this in the discussion and I think it's an important 
difference in how people approach this question.  Basically what some 
people want from a curated index is "this package is not junk" while 
others want "this package is actually good" or even "you should use this 
package for this purpose".


	I think that providing "not-junk level" curation is somewhat more 
tractable, because this form of curation is closer to a logical OR on 
different people's opinions.  It may be that many people tried a package 
and didn't find it useful, but if at least one person did find it 
useful, then we can probably say it's not junk.
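
A minimal sketch of that distinction, assuming each curator simply records a yes/no verdict per package. The package names, the verdicts, and the majority threshold are all made up for illustration: "not junk" needs only one positive verdict, while a "recommended" label needs some rule for settling disagreement.

```python
# Hypothetical curator verdicts: True means "I found this package useful".
reviews = {
    "imap_tools": [False, True, False],
    "mytestpackage1": [False, False],
    "unreviewed_pkg": [],
}


def is_not_junk(verdicts):
    """Not-junk curation: a single positive verdict is enough (logical OR)."""
    return any(verdicts)


def is_recommended(verdicts, threshold=0.5):
    """Stricter curation: curators must agree (assumed simple majority rule)."""
    return bool(verdicts) and sum(verdicts) / len(verdicts) > threshold


not_junk = sorted(name for name, v in reviews.items() if is_not_junk(v))
recommended = sorted(name for name, v in reviews.items() if is_recommended(v))
```

With this data only imap_tools passes the not-junk test and nothing passes the majority rule, which is the sense in which OR-style curation is the easier target.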


	Providing "actually-good level" curation or "recommendations" is 
harder, because it means you actually have to address differences of 
opinion among curators.


	Personally I tend to think a not-junk type curation is the better one 
to aim at, for a few reasons.  First, it's easier.  Second, it 
eliminates one of the main problems with trying to search for packages 
on PyPI, namely the huge number of "mytestpackage1"-type packages. 
Third, this is what conda-forge does and it seems to be working pretty 
well there.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XH2GTRKHZ2T4Z3VHQUCC5L7OATSHPUQU/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Brendan Barnwell

On 2023-07-05 07:13, Chris Angelico wrote:

> Right; hence the question of how a "vetted Python package collection"
> would compare. I can type "sudo apt install python-" and add the name
> of a package, and I get some assurance that:
>
> 1) The package works
> 2) The package is useful enough
> 3) It's not malware
> 4) The specific *version* of the package works along with the versions
> of everything else.


	In my experience this is how conda-forge is too.  The level of 
assurance is somewhat lower, but there is still a level of assurance 
about all those things.  For point 4, the assurance is about the version 
you install working with the conda environment you install it into. 
This is an advantage over system-wide installs like Debian packages 
because it means you can have multiple environments and know each one is 
consistent.


	Most of the problems arise when you circumvent conda's consistency 
checking, for instance by installing a package with pip rather than with 
conda.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/B6ASNFCW2K47UKXAWM3LWWH2UTKUPSUE/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Gregory Disney
Why not just use GPG signatures and maintain trusted signing keys? There’s
no reason to reinvent the wheel. If a user wants to use unsigned or
untrusted packages, they have to accept the risk.

Thanks,
Greg

On Wed, Jul 5, 2023 at 2:05 PM Chris Angelico  wrote:

> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>  wrote:
> > I also agree with a later reply about avoiding the murkier side of
> blockchains / etc.  That said, it seems to me (again, sample size one
> anecdata) that creating a more levelled playing field for package
> publication could benefit from the use of some distributed technologies.
> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
> one question related to recency of data, though.  Delaying availability of
> a package to an audience -- if it's important enough -- could under some
> circumstances become effectively similar to censorship.
> >
>
> A blockchain won't solve anything here. It would be completely and
> utterly impractical to put the packages themselves into a blockchain,
> so all you'd have is the index, and that means it's just a bad version
> of PyPI's own single-page index.
>
> ChrisA
>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NYQSV7RO3GKE7272WZQ7VSIASNYKITMI/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Dom Grigonis
One way would be to categorise areas and sub-areas and clearly indicate 
where the work has not been done.

That way, if I came to such a place, I could find the sub-topic I am 
interested in, along with a clear indication of its status.

> On 5 Jul 2023, at 21:48, Brendan Barnwell  wrote:
> 
> On 2023-07-04 17:21, Christopher Barker wrote:
>> Anyway, I'd love to hear your thoughts on these ideas (or others?)  - both 
>> technical and social.
> 
>   To my mind there are two interrelated social problems that make this 
> task difficult:
> 
> 1) Promulgating a list of "good" packages tends to make people think packages 
> not on the list are not good (aka "implied exhaustiveness")
> 2) In order to curate all or nearly all packages, you need curators with a 
> wide range of areas of interest and expertise (aka "breadth").
> 
>   The reason these are interrelated is that once people start thinking 
> your list is exhaustive, it's really important to have breadth in the 
> curation, or else entire domains of utility can wind up having all packages 
> implicitly proscribed.
> 
>   As an example, a few months ago I wanted to do some automated email 
> manipulations via IMAP.  I looked at the builtin imaplib module and found it 
> useless, so I went looking for other things.  I eventually found one that 
> more or less met my needs (imap_tools).
> 
>   The question is, what happens when a person goes to our curated index 
> looking for an IMAP library?  If they don't find one, does that mean there 
> aren't any, or there are but they're all junk, or just that there was no 
> curator who had any reason to explore the space of packages available in this 
> area?  In short, it becomes difficult for a user to decide whether a tool's 
> *absence* from the index indicates a negative opinion or no opinion.
> 
>   There are ways around this, like adding categories (so if you see a 
> category you know someone at least attempted to evaluate some packages in 
> that category), but they can also have their own problems (like increasing 
> the level of work required for curation).  I'm not sure what the best 
> solution is, but just wanted to mention this issue.
> 
> -- 
> Brendan Barnwell
> "Do not follow where the path may lead.  Go, instead, where there is no path, 
> and leave a trail."
>   --author unknown
> 

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/O233MP5I5MJLTTVK7FRIKJPD6736R3KX/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Brendan Barnwell

On 2023-07-04 17:21, Christopher Barker wrote:
> Anyway, I'd love to hear your thoughts on these ideas (or others?)  - 
> both technical and social.


	To my mind there are two interrelated social problems that make this 
task difficult:


1) Promulgating a list of "good" packages tends to make people think 
packages not on the list are not good (aka "implied exhaustiveness")
2) In order to curate all or nearly all packages, you need curators with 
a wide range of areas of interest and expertise (aka "breadth").


	The reason these are interrelated is that once people start thinking 
your list is exhaustive, it's really important to have breadth in the 
curation, or else entire domains of utility can wind up having all 
packages implicitly proscribed.


	As an example, a few months ago I wanted to do some automated email 
manipulations via IMAP.  I looked at the builtin imaplib module and 
found it useless, so I went looking for other things.  I eventually 
found one that more or less met my needs (imap_tools).


	The question is, what happens when a person goes to our curated index 
looking for an IMAP library?  If they don't find one, does that mean 
there aren't any, or there are but they're all junk, or just that there 
was no curator who had any reason to explore the space of packages 
available in this area?  In short, it becomes difficult for a user to 
decide whether a tool's *absence* from the index indicates a negative 
opinion or no opinion.


	There are ways around this, like adding categories (so if you see a 
category you know someone at least attempted to evaluate some packages 
in that category), but they can also have their own problems (like 
increasing the level of work required for curation).  I'm not sure what 
the best solution is, but just wanted to mention this issue.
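
One way to picture the category idea, as a rough sketch rather than a proposal: if the index records which categories have been reviewed at all, a lookup can tell "nobody looked" apart from "someone looked and did not list it". The category names, the `curated` mapping, and the `lookup` helper below are hypothetical.

```python
from enum import Enum


class Status(Enum):
    LISTED = "reviewed and listed"
    NOT_LISTED = "category reviewed, package not listed"
    NO_OPINION = "category never reviewed"


# Hypothetical index: category -> set of listed packages.
# An empty set means curators looked and listed nothing;
# a missing key means nobody has looked at the category at all.
curated = {
    "email/imap": {"imap_tools"},
    "numeric": {"numpy"},
    "email/smtp": set(),
}


def lookup(category, package):
    """Return what the index can honestly say about a package."""
    if category not in curated:
        return Status.NO_OPINION
    if package in curated[category]:
        return Status.LISTED
    return Status.NOT_LISTED
```

Even then, NOT_LISTED only says the category was examined, not that this particular package was evaluated and rejected, and keeping the category list current is the extra curation work mentioned above.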


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WYLHRIBIRKP6W3HGCGFJGYHDL3GCSOR2/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Chris Angelico
On Thu, 6 Jul 2023 at 04:08, Gregory Disney
 wrote:
>
> Why not just use GPG signatures and maintain trusted signing keys? There’s no 
> reason to reinvent the wheel. If a user wants to use unsigned or untrusted 
> packages, they have to accept the risk.
>

As an alternative to a blockchain? No idea, but I've never considered
blockchains to be useful for anything more than toys anyway.

As an alternative to a curated package list? That just comes down to
who holds the trusted keys, so it's the same as the other suggestions,
only you're looking at the mechanics for knowing whether it's on the
list, as opposed to the mechanics for figuring out which things go on
the list - two sides of the same coin, pretty much.

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ANBP64KBYAB3MXO4NQDNMQHSXM525ZTN/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Chris Angelico
On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
 wrote:
> I also agree with a later reply about avoiding the murkier side of 
> blockchains / etc.  That said, it seems to me (again, sample size one 
> anecdata) that creating a more levelled playing field for package publication 
> could benefit from the use of some distributed technologies.  Even HTTP 
> mirrors are, arguably, a basic form of that.. there's at least one question 
> related to recency of data, though.  Delaying availability of a package to an 
> audience -- if it's important enough -- could under some circumstances become 
> effectively similar to censorship.
>

A blockchain won't solve anything here. It would be completely and
utterly impractical to put the packages themselves into a blockchain,
so all you'd have is the index, and that means it's just a bad version
of PyPI's own single-page index.

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PTIS3HZHJSFV7ETWE7UP4HKXS4WN2OEO/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread James Addison via Python-ideas
On Wed, Jul 5, 2023, 01:24 Christopher Barker  wrote:

> Starting a new thread with a correct title.
>
> On 2 Jul 2023, at 10:12, Paul Moore  wrote:
>
> Unfortunately, too much of this discussion is framed as “someone should”,
>> or “it would be good if”. No-one is saying “I will”. Naming groups, like
>> “the PyPA should” doesn’t help either - groups don’t do things, people do.
>> Who in the PyPA? Me? Nope, sorry, I don’t have the time or interest - I’d
>> *use* a curated index, I sure as heck couldn’t *create* one.
>
>
> Well, I started this topic, and I don't *think* I ever wrote "someone
> should", and I certainly didn't write "PyPa should".
>
> But whatever I or anyone else wrote, my intention was to discuss what
> might be done to address what I think is a real problem/limitation in the
> discoverability of useful packages for Python.
>
> And I think of it not so much as "someone should" but as "it would be nice
> to have".
>
> Of course, any of these ideas would take a lot of work to implement --
> and  even though there are a lot of folks, on this list  and elsewhere,
> that would like to help, I don't think any substantial open-source project
> has gotten anywhere without a concerted effort by a very small group (often
> just 1) of people doing a lot of work to get it to a useful state before a
> larger group can contribute. So I'm fully aware that nothing's going to
> happen unless *someone* really puts the work in up front. That someone
> *might* be me, but I'm really good at over-committing myself, and not so
> great at keeping my nose to the grindstone, so 
>
> And I think this particular problem calls for a solution that would have
> to be pretty well established before reaching critical mass to actually be
> useful -- after all, we already have PyPi -- why go anywhere else that is
> less comprehensive?
>
> All that being said, it's still worth having a conversation about what a
> good solution might look like -- there are a lot of options, and hashing
> out some of the ideas might inspire someone to rise to the occasion.
>
> The "problem", as I see it:
>
>  - The Python standard library is not, and will never be fully
> comprehensive -- most projects require *some* third party packages.
>  - There are a LOT of packages available on PyPI -- with a very wide range
> of usefulness, quality and maintenance -- everything from widely used
> packages with a huge community (e.g. numpy) to packages that are at release
> 0.0.1, have never seen an update, and may not even work.
>
> So the odds that there's a package that does what you need are good, but
> it can be pretty hard to find them sometimes -- and can be a fair bit
> of work to sift through to find the good ones -- and many folks don't feel
> qualified to do so.
>
> This can result in two opposite consequences:
>
> 1) People using a package that really isn't reliable or maintained (or not
> supported on all platforms, or ..) and getting stuck with it (I've had that
> on some of my projects -- I'm pretty careful, but not everyone on my team
> is)
>
> 2) People writing their own code - wasting time, and maybe not getting a
> very good solution either. I've also had that problem on my projects...
>
> To avoid this, SOME way for folks to find packages that have had at least
> some level of vetting would be good -- exactly what level of vetting
> is a very open question, but I think "even a little" could be very helpful.
>

Doesn't each individual / team / company / organization already discuss and
document their preferred packages, and (indirectly or directly) help to
evolve them and consider alternatives by doing so?

I think it's important that there's a common space where packages can exist
and be available.  Whether that realtime environment should state its own
preferred packages seems more debatable to me - because it could become a
source of contention and gaming similar to search engine optimization.

An exception could be software museums: I can see a curated collection of
best-known and most-effective libraries used by particular cultures at
points-in-time being of interest to future (and some current?) generations.

I also agree with a later reply about avoiding the murkier side of
blockchains / etc.  That said, it seems to me (again, sample size one
anecdata) that creating a more levelled playing field for package
publication could benefit from the use of some distributed technologies.
Even HTTP mirrors are, arguably, a basic form of that.. there's at least
one question related to recency of data, though.  Delaying availability of
a package to an audience -- if it's important enough -- could under some
circumstances become effectively similar to censorship.

> A few ideas that have come up in the previous thread -- more or less in
> order of level of effort.
>
> 1) A "clearing house" of package reviews, for lack of a better word -- a
> single web site that would point to various resources on the internet --
> blog posts, etc, that 

[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Chris Angelico
On Wed, 5 Jul 2023 at 17:00, Christopher Barker  wrote:
> I'm noting this, because I think it's part of the problem to be solved, but 
> maybe not the main one (to me anyway). I've been focused more on "these 
> packages are worthwhile, by some definition of worthwhile". While I think 
> Chris A is more focused on "which of these seemingly similar packages should 
> I use?" -- not unrelated, but not the same question either.
>

Indeed, not the same question; but "some definition of worthwhile" is
the crucial point here. If there is one single curated package index
of "worthwhile" packages, who decides what's on it and what's not? If
not everyone can agree, will there have to be multiple such listings?

> Technically, conda is similar to pip -- it has a default "channel" (a channel 
> is an indexed repository of packages) it points to, and you can point it to a 
> different one, or any number of others, or install a single package from a 
> particular channel.
>
> Socially, it's pretty different
> - There is no channel like PyPI that anyone can put anything on willy-nilly.
> - The default channel is operated by Anaconda.com -- and no one else can put 
> anything on there. (They take suggestions, but it's a pretty big lift to get 
> them to add a package.)
> - The protocol for a channel is pretty simple -- all you really need is an 
> http server, but in practice, most folks host their channels on the 
> Anaconda.org server -- it's a free service that anyone can create a channel 
> on -- there are a LOT -- folks use them for their personal projects, etc.
>

So, high barrier to entry. Good to know. That's neither good nor bad
inherently, but it is a point of note.

> - Then there is conda-forge:
> It grew out of an effort to collaborate among a number of folks operating 
> channels focused on particular fields -- met/ocean science, astronomy, 
> computational biology, ... we all had different needs, but they overlapped -- 
> why not share resources? Thanks to the heroic efforts of a few folks, it grew 
> to what it is now: a GitHub- and CI-based conda package build system that 
> publishes a conda channel on anaconda.org with over 22,000 (wow! I think I'm 
> reading that right) packages.
>
> (https://anaconda.org/conda-forge/repo)
>
> They are curated -- anyone can propose a new package (via PR) -- but it only 
> gets added once it's been reviewed and approved by the core team. Curation 
> wasn't the goal, but it's necessary in order to have any hope that they will 
> all work together. The review process is really of the package, not the code 
> in the package (is it built correctly? is it compatible with the rest of 
> conda-forge? Does it include the license file? Is there a maintainer? ...) 
> But the end result is a fair bit of curation -- users can be assured that:
> 1 - The package works
> 2 - The package is useful enough that someone took the time to get it up 
> there.
> 3 - It's very unlikely to be malware (I don't think the conda-forge policy 
> really looks hard for that, but typosquatting and that sort of thing are 
> pretty much impossible).
>

Cool. The trouble is, point 1 is nearly impossible to assure except in
the very narrowest of definitions, and point 2's value correlates with
the height of the barrier to entry, so it's a fairly strict tradeoff.
And unless that barrier is extremely high, there will always be the
possibility that someone puts in the effort to get malware pushed,
although it does become vanishingly improbable.

>> What about OS package managers like the Debian repositories?
>
> I have no idea, other than that the majors, at least, put a LOT of work into 
> having a pretty comprehensive base repository of "vetted" packages

Right; hence the question of how a "vetted Python package collection"
would compare. I can type "sudo apt install python-" and add the name
of a package, and I get some assurance that:

1) The package works
2) The package is useful enough
3) It's not malware
4) The specific *version* of the package works along with the versions
of everything else.

This is a very strong set of criteria, much stronger than we'd be
looking for here, as they come with correspondingly higher barriers to
entry (getting a package update into the Debian repositories becomes
harder and harder as the release date approaches).

> conda-forge has about 22,121 -- that's enough to be very useful, but a lot of 
> use-cases are not well covered, and I know I still have to contribute one 
> once in a while.
>
> Looking now -- PyPI has 465,295 projects, more than 20 times as many -- I 
> wonder how many of those are "useful"?

Contrariwise, the Debian repository has under a thousand "python-*"
packages, but with a much stronger requirement that they be useful.
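
For reference, the ratio implied by the two counts quoted above (465,295 PyPI projects and 22,121 conda-forge packages, both as reported in this thread) can be checked directly; the variable names are just for illustration.

```python
# Counts as quoted in the thread (July 2023), not re-measured here.
pypi_projects = 465_295
conda_forge_packages = 22_121

ratio = pypi_projects / conda_forge_packages
print(f"{ratio:.1f} PyPI projects per conda-forge package")  # prints 21.0
```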

It's interesting that there are only twenty on PyPI for every one on
conda-forge. I would have expected a greater ratio. It seems that
conda-forge is able to be incomplete AND dauntingly large; how
successful would you be at guessing 

[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread David Mertz, Ph.D.
I would go a bit further: DAOs are absolutely terrible for EVERYTHING, and
anything that remotely mentions the acronym is a scam.

Let's please, please, please not go down some cryptoscam, blockchain,
rabbit hole here.  Drop it, burn the remains, try to forget it ever
happened.

On Wed, Jul 5, 2023 at 3:57 AM Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> Christopher Barker writes:
>
>  > Yes, it needs to be funded somehow, but some sort of donation / non
>  > profit / etc funding mechanism would be best -- but I don't think
>  > peer reviewers should be paid. Peer review in academic journals
>  > isn't cash compensated either.
>
> It's been done.  The most common scheme is nominal compensation (say
> USD50 per review) dependent on beating a relatively short deadline
> (typically 1-3 months).  But this is not really the same as academic
> publishing.  It's also not the same as movie and book reviewers who
> are paid staffers (at least they used to be in the days of paper
> journals).  It has aspects of both.  It might work here, although
> funding and appointment of reviewers are tough issues.
>
>  > I had to look that up: "Decentralized autonomous organization (DAO)"
>  >
>  > So, yes.
>
> Please, no.  DAOs are fine when only money is at risk (too risky for
> me, though).  But they're a terrible way to manage a community or its
> money.  Too fragile, too inflexible.  The history of DAOs is basically
> an empirical confirmation of Arrow's Impossibility Theorem.
> https://simple.wikipedia.org/wiki/Arrow%27s_impossibility_theorem
>
>


-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TAJCT2VGRWQG6WRVFL5EYOEYIOFK6ZJZ/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Stephen J. Turnbull
Christopher Barker writes:

 > Yes, it needs to be funded somehow, but some sort of donation / non
 > profit / etc funding mechanism would be best -- but I don't think
 > peer reviewers should be paid. Peer review in academic journals
 > isn't cash compensated either.

It's been done.  The most common scheme is nominal compensation (say
USD50 per review) dependent on beating a relatively short deadline
(typically 1-3 months).  But this is not really the same as academic
publishing.  It's also not the same as movie and book reviewers who
are paid staffers (at least they used to be in the days of paper
journals).  It has aspects of both.  It might work here, although
funding and appointment of reviewers are tough issues.

 > I had to look that up: "Decentralized autonomous organization (DAO)"
 > 
 > So, yes.

Please, no.  DAOs are fine when only money is at risk (too risky for
me, though).  But they're a terrible way to manage a community or its
money.  Too fragile, too inflexible.  The history of DAOs is basically
an empirical confirmation of Arrow's Impossibility Theorem.
https://simple.wikipedia.org/wiki/Arrow%27s_impossibility_theorem

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZLBEHQGMFIA5PR26XVDQF4YAVPIOYWY4/


[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Christopher Barker
On Tue, Jul 4, 2023 at 5:49 PM Chris Angelico  wrote:

> On Wed, 5 Jul 2023 at 10:26, Christopher Barker 
> wrote:
> > The "problem", as I see it.
> >
> >  - The Python standard library is not, and will never be, fully
> > comprehensive -- most projects require *some* third party packages.
> >  - There are a LOT of packages available on PyPI -- with a very wide
> > range of usefulness, quality and maintenance -- everything from widely used
> > packages with a huge community (e.g. numpy) to packages that are at release
> > 0.0.1, have never seen an update, and may not even work.
> >
>
> Remember though that this problem is also Python's strength here. The
> standard library does not NEED to be comprehensive, and publishing on
> PyPI is deliberately easy. The barrier to entry for the stdlib is
> high; the barrier to entry for PyPI is low.
>

Absolutely -- and I'm not making any comment about the stdlib -- it's not
the point here.

But yes, the low (zero) barrier to entry to PyPI was probably the right way
to go when it got started -- now, I'm not so sure. *Some* barrier to entry
would be helpful.

>
> https://wiki.python.org/moin/FrontPage


Thanks, I had forgotten about the Wiki! It foundered for a while (a lot of
years ago -- I've been around...), but at a glance, it's looking good now.

> Imagine a page like
> https://wiki.python.org/moin/Python_Package_Recommendations

...

> That way, it's decentralized for editing, but has a central "hub" that
> people can easily find.
>

yup -- great idea.


> I suspect this would end up being broadly equivalent to the first
> option, but with more effort by a core group of people (or a single
> maintainer), and in return, would have a more consistent look and
> feel.
>
yup.


> > 3) A rating system built into PyPI -- This could be a combination of two
> > things:
> >   A - Automated analysis -- download stats, dependency stats, release
> > frequency, etc, etc, etc.
> >   B - Community ratings -- upvotes, stars, whatever.
> >
> > If done well, that could be very useful -- search on PyPI listed by
> > rating. However -- "done well" is a huge challenge -- I don't think
> > there's a way to do the automated system right, and community scoring can
> > be abused pretty easily. But maybe folks smarter than me could make it work
> > with one or both of these approaches.
> >
>
> Neither of them adequately answers questions like
> "which is right *for this use-case*",


I'm noting this, because I think it's part of the problem to be solved, but
maybe not the main one (to me anyway). I've been focused more on "these
packages are worthwhile, by some definition of worthwhile", while I think
Chris A is more focused on "which of these seemingly similar
packages should I use?" -- not unrelated, but not the same question either.

Which makes me realize that having a centralized package review site is
complementary to a curated package index -- they are not replacements for
one another.

> > 4)  A self contained repository of packages that you could point pip to --
> >
...

> Definitely possible; how would this compare to Conda?


Technically, conda is similar to pip -- it has a default "channel" (a
channel is an indexed repository of packages) it points to, and you can
point it to a different one, or any number of others, or install a single
package from a particular channel.

Socially, it's pretty different:
- There is no channel like PyPI that anyone can put anything on
willy-nilly.
- The default channel is operated by Anaconda.com -- and no one else can
put anything on there. (They take suggestions, but it's a pretty big lift
to get them to add a package.)
- The protocol for a channel is pretty simple -- all you really need is an
http server, but in practice, most folks host their channels on the
Anaconda.org server -- it's a free service that anyone can create a channel
on -- there are a LOT -- folks use them for their personal projects, etc.

- Then there is conda-forge:
It grew out of an effort to collaborate among a number of folks operating
channels focused on particular fields -- met/ocean science, astronomy,
computational biology, ... we all had different needs, but they overlapped
-- why not share resources? Thanks to the heroic efforts of a few folks, it
grew to what it is now: a GitHub- and CI-based conda package build system
that publishes a conda channel on anaconda.org with over 22,000 (wow! I
think I'm reading that right) packages.

(https://anaconda.org/conda-forge/repo)

They are curated -- anyone can propose a new package (via PR) -- but it
only gets added once it's been reviewed and approved by the core team.
Curation wasn't the goal, but it's necessary in order to have any hope that
they will all work together. The review process is really of the package,
not the code in the package (is it built correctly? is it compatible with
the rest of conda-forge? Does it include the license file? Is there a
maintainer? ...) But the end result is a fair bit of curation -- users can
be assured that:
1 - 

[Python-ideas] Re: "Curated" package repo?

2023-07-05 Thread Christopher Barker
On Tue, Jul 4, 2023 at 10:06 PM Jonathan Crall  wrote:

> I like the idea of a vetted package index that pip can point to. The more
> I think about it, the more I think that it needs some sort of peer review
> system as the barrier to entry, and my thoughts go to establishing some
> DeSci DAO that could distribute the peer review of packages amongst a set
> of trusted maintainers while also being a mechanism to add new trusted
> maintainers to the peer-review pool.
>


> Peer reviewers could be funded via a fee to submit a package for
> publishing.
>

I agree until this point -- we REALLY don't want to have a pay to play
system.

Yes, it needs to be funded somehow, but some sort of donation / non profit
/ etc funding mechanism would be best -- but I don't think peer
reviewers should be paid. Peer review in academic journals isn't cash
compensated either.


> I think to achieve a scalable, funded, decentralized, and trustworthy
> package index a DAO makes some amount of sense.
>

I had to look that up: "Decentralized autonomous organization (DAO)"

So, yes.

-CHB



-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/C2W5F6U5Y55SBGQZ5S3O3ZETSVYCOQYG/


[Python-ideas] Re: "Curated" package repo?

2023-07-04 Thread Chris Angelico
On Wed, 5 Jul 2023 at 15:07, Jonathan Crall  wrote:
>
> I like the idea of a vetted package index that pip can point to. The more I 
> think about it, the more I think that it needs some sort of peer review 
> system as the barrier to entry, and my thoughts go to establishing some DeSci 
> DAO that could distribute the peer review of packages amongst a set of 
> trusted maintainers while also being a mechanism to add new trusted 
> maintainers to the peer-review pool. Peer reviewers could be funded via a fee 
> to submit a package for publishing. There are a lot of open questions of how 
> this would be done correctly - or even if this is necessary - but I think to 
> achieve a scalable, funded, decentralized, and trustworthy package index a 
> DAO makes some amount of sense.
>

This adds up to a HUGE barrier to entry for new packages. For your
proposal to be successful, a good number of users would need to point
pip to the vetted index and NOT to the normal one (otherwise there's
no benefit - you're just getting non-vetted packages), and in order
for a new package to become relevant, it needs to:

1. Pay money. Even if it's not a huge dollar amount, that is already a
very significant barrier for casual users.
2. Get reviewed. This is going to take time.
3. Pass the review. If the review system is at all meaningful, it has
to knock some packages back, otherwise all you have is "pay to be
visible" which is a horrible system.
4. OR, instead of those steps: Convince your users to switch to the
"untrusted" package repository.

That makes for a very closed-off ecosystem, where an incumbent has a
dramatic advantage over anything that comes along. It would almost
certainly require that vetted packages not depend on non-vetted
packages (otherwise you'd need some bizarre mechanic whereby "pip
install package-name" looks at one repository, but it resolves
dependencies by looking at a different one), so nothing would have a
chance to be seen in the walled garden until you get it vetted AND
recognized.
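
The dependency constraint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the package names and the dependency mapping are invented, and the helper `unvetted_dependencies` is hypothetical, not part of pip or any existing index.

```python
# Hypothetical data: package name -> set of direct dependencies.
DEPENDENCIES = {
    "webapp": {"http-client", "templating"},
    "http-client": {"tls-helper"},
    "templating": set(),
    "tls-helper": set(),
}


def unvetted_dependencies(vetted, dependencies):
    """Map each vetted package to its direct dependencies that are not vetted."""
    problems = {}
    for name in sorted(vetted):
        missing = dependencies.get(name, set()) - vetted
        if missing:
            problems[name] = sorted(missing)
    return problems


# "tls-helper" has not been vetted, so "http-client" cannot be installed
# from a vetted-only index without reaching outside it.
vetted = {"webapp", "http-client", "templating"}
print(unvetted_dependencies(vetted, DEPENDENCIES))
```

An empty result would mean the vetted set is closed under direct dependencies; any non-empty result names the packages that would force a lookup in the non-vetted repository.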

So, sure, this would make life easier for those who want to randomly
download packages without thinking about them, but at the cost of
making the package repository extremely minimalist and insular. It'd
make the Python packaging ecosystem look as unfriendly as iPhone app
publishing.

ChrisA

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2WLUMPDRTDVWIHWIJ7KINPF6LD4KRDCY/


[Python-ideas] Re: "Curated" package repo?

2023-07-04 Thread Jonathan Crall
I like the idea of a vetted package index that pip can point to. The more I
think about it, the more I think that it needs some sort of peer review
system as the barrier to entry, and my thoughts go to establishing some
DeSci DAO that could distribute the peer review of packages amongst a set
of trusted maintainers while also being a mechanism to add new trusted
maintainers to the peer-review pool. Peer reviewers could be funded via a
fee to submit a package for publishing. There are a lot of open questions
of how this would be done correctly - or even if this is necessary - but I
think to achieve a scalable, funded, decentralized, and trustworthy package
index a DAO makes some amount of sense.



On Tue, Jul 4, 2023 at 8:52 PM Dom Grigonis  wrote:

> To have 4) would be great. 2) could be a temporary test pilot to get some
> ideas when/if 4) was to be implemented.
>
> Starting point for 2) could be simple (?). E.g. 4 parts:
>   1. Features & integration
>   2. Performance - set-up automatic benchmarking via CI
>   3. Status (stack hits, user base, open issues, lines of code, update
> frequency & similar)
>   4. User rating (wonder if up/down voting or star rating can be somehow
> implemented in github)
>
> Author collects inputs from this e-mail group or via opened issue and
> collates results. Once the comparison tables are there package authors can
> then issue PR themselves or next interested user does the update.
>
> Could be a simple way to gauge variables before committing to 3) or 4).
>
> On 5 Jul 2023, at 03:21, Christopher Barker  wrote:
>
> Starting a new thread with a correct title.
>
> On 2 Jul 2023, at 10:12, Paul Moore  wrote:
>
> Unfortunately, too much of this discussion is framed as “someone should”,
>> or “it would be good if”. No-one is saying “I will”. Naming groups, like
>> “the PyPA should” doesn’t help either - groups don’t do things, people do.
>> Who in the PyPA? Me? Nope, sorry, I don’t have the time or interest - I’d
>> *use* a curated index, I sure as heck couldn’t *create* one.
>
>
> Well, I started this topic, and I don't *think* I ever wrote "someone
> should", and I certainly didn't write "PyPA should".
>
> But whatever I or anyone else wrote, my intention was to discuss what
> might be done to address what I think is a real problem/limitation in the
> discoverability of useful packages for Python.
>
> And I think of it not so much as "someone should" but as "it would be nice
> to have".
>
> Of course, any of these ideas would take a lot of work to implement --
> and  even though there are a lot of folks, on this list  and elsewhere,
> that would like to help, I don't think any substantial open-source project
> has gotten anywhere without a concerted effort by a very small group (often
> just 1) of people doing a lot of work to get it to a useful state before a
> larger group can contribute. So I'm fully aware that nothing's going to
> happen unless *someone* really puts the work in up front. That someone
> *might* be me, but I'm really good at over-committing myself, and not so
> great at keeping my nose to the grindstone, so 
>
> And I think this particular problem calls for a solution that would have
> to be pretty well established before reaching critical mass to actually be
> useful -- after all, we already have PyPI -- why go anywhere else that is
> less comprehensive?
>
> All that being said, it's still worth having a conversation about what a
> good solution might look like -- there are a lot of options, and hashing
> out some of the ideas might inspire someone to rise to the occasion.
>
> The "problem", as I see it.
>
>  - The Python standard library is not, and will never be, fully
> comprehensive -- most projects require *some* third party packages.
>  - There are a LOT of packages available on PyPI -- with a very wide range
> of usefulness, quality and maintenance -- everything from widely used
> packages with a huge community (e.g. numpy) to packages that are at release
> 0.0.1, have never seen an update, and may not even work.
>
> So the odds that there's a package that does what you need are good, but
> it can be pretty hard to find them sometimes -- and can be a fair bit
> of work to sift through to find the good ones -- and many folks don't feel
> qualified to do so.
>
> This can result in two opposite consequences:
>
> 1) People using a package that really isn't reliable or maintained (or not
> supported on all platforms, or ..) and getting stuck with it (I've had that
> on some of my projects -- I'm pretty careful, but not everyone on my team
> is)
>
> 2) People writing their own code - wasting time, and maybe not getting a
> very good solution either. I've also had that problem on my projects...
>
> To avoid this -- SOME way for folks to find packages that have at least
> some level of vetting would be good -- exactly what level of vetting
> is a very open question, but I think "even a little" could be very helpful.
>
> A few ideas that have come up 

[Python-ideas] Re: "Curated" package repo?

2023-07-04 Thread Dom Grigonis
To have 4) would be great. 2) could be a temporary test pilot to get some ideas 
when/if 4) was to be implemented.

Starting point for 2) could be simple (?). E.g. 4 parts:
  1. Features & integration
  2. Performance - set-up automatic benchmarking via CI
  3. Status (stack hits, user base, open issues, lines of code, update 
frequency & similar)
  4. User rating (wonder if up/down voting or star rating can be somehow 
implemented in github)

Author collects inputs from this e-mail group or via opened issue and collates 
results. Once the comparison tables are there package authors can then issue PR 
themselves or next interested user does the update.

Could be a simple way to gauge variables before committing to 3) or 4).

> On 5 Jul 2023, at 03:21, Christopher Barker  wrote:
> 
> Starting a new thread with a correct title.
> 
> On 2 Jul 2023, at 10:12, Paul Moore wrote:
> 
> Unfortunately, too much of this discussion is framed as “someone should”, or 
> “it would be good if”. No-one is saying “I will”. Naming groups, like “the 
> PyPA should” doesn’t help either - groups don’t do things, people do. Who in 
> the PyPA? Me? Nope, sorry, I don’t have the time or interest - I’d *use* a 
> curated index, I sure as heck couldn’t *create* one.
> 
> Well, I started this topic, and I don't *think* I ever wrote "someone 
> should", and I certainly didn't write "PyPA should".
> 
> But whatever I or anyone else wrote, my intention was to discuss what might 
> be done to address what I think is a real problem/limitation in the 
> discoverability of useful packages for Python.
> 
> And I think of it not so much as "someone should" but as "it would be nice to 
> have".
> 
> Of course, any of these ideas would take a lot of work to implement -- and  
> even though there are a lot of folks, on this list  and elsewhere, that would 
> like to help, I don't think any substantial open-source project has gotten 
> anywhere without a concerted effort by a very small group (often just 1) of 
> people doing a lot of work to get it to a useful state before a larger group 
> can contribute. So I'm fully aware that nothing's going to happen unless 
> *someone* really puts the work in up front. That someone *might* be me, but 
> I'm really good at over-committing myself, and not so great at keeping my 
> nose to the grindstone, so 
> 
> And I think this particular problem calls for a solution that would have to 
> be pretty well established before reaching critical mass to actually be 
> useful -- after all, we already have PyPI -- why go anywhere else that is 
> less comprehensive?
> 
> All that being said, it's still worth having a conversation about what a good 
> solution might look like -- there are a lot of options, and hashing out some 
> of the ideas might inspire someone to rise to the occasion.
> 
> The "problem", as I see it.
> 
>  - The Python standard library is not, and will never be, fully comprehensive 
> -- most projects require *some* third party packages.
>  - There are a LOT of packages available on PyPI -- with a very wide range of 
> usefulness, quality and maintenance -- everything from widely used packages 
> with a huge community (e.g. numpy) to packages that are at release 0.0.1, have 
> never seen an update, and may not even work.
> 
> So the odds that there's a package that does what you need are good, but it 
> can be pretty hard to find them sometimes -- and can be a fair bit of work to 
> sift through to find the good ones -- and many folks don't feel qualified to 
> do so.
> 
> This can result in two opposite consequences:
> 
> 1) People using a package that really isn't reliable or maintained (or not 
> supported on all platforms, or ..) and getting stuck with it (I've had that 
> on some of my projects -- I'm pretty careful, but not everyone on my team is) 
> 
> 2) People writing their own code - wasting time, and maybe not getting a very 
> good solution either. I've also had that problem on my projects...
> 
> To avoid this -- SOME way for folks to find packages that have at least 
> some level of vetting would be good -- exactly what level of vetting is a 
> very open question, but I think "even a little" could be very helpful.
> 
> A few ideas that have come up in the previous thread -- more or less in order 
> of level of effort.
> 
> 1) A "clearing house" of package reviews, for lack of a better word -- a 
> single web site that would point to various resources on the internet -- blog 
> posts, etc, that would help people select packages. So one might go to the 
> "JSON" section, and find links to some of the comparisons of JSON parsers, to 
> help you choose. The authors of this site could even solicit folks to write 
> reviews, etc that they could then point to.
> Chris A: this is my interpretation of your decentralized idea -- please do 
> correct me if I got it wrong.
> 
> 2) A Python package review site. This could be set up and managed with, e.g. a 
> 

[Python-ideas] Re: "Curated" package repo?

2023-07-04 Thread Chris Angelico
On Wed, 5 Jul 2023 at 10:26, Christopher Barker  wrote:
> The "problem", as I see it.
>
>  - The Python standard library is not, and will never be, fully comprehensive 
> -- most projects require *some* third party packages.
>  - There are a LOT of packages available on PyPI -- with a very wide range of 
> usefulness, quality and maintenance -- everything from widely used packages 
> with a huge community (e.g. numpy) to packages that are at release 0.0.1, have 
> never seen an update, and may not even work.
>

Remember though that this problem is also Python's strength here. The
standard library does not NEED to be comprehensive, and publishing on
PyPI is deliberately easy. The barrier to entry for the stdlib is
high; the barrier to entry for PyPI is low.

> 1) A "clearing house" of package reviews, for lack of a better word -- a 
> single web site that would point to various resources on the internet -- blog 
> posts, etc, that would help people select packages. So one might go to the 
> "JSON" section, and find links to some of the comparisons of JSON parsers, to 
> help you choose. The authors of this site could even solicit folks to write 
> reviews, etc that they could then point to.
> Chris A: this is my interpretation of your decentralized idea -- please do 
> correct me if I got it wrong.
>

That is a correct interpretation of what I said, yes. I'll take it a
bit further though and add that this "single web site" would be
ideally editable by multiple people. In fact, Python has a place for
that sort of thing:

https://wiki.python.org/moin/FrontPage

Imagine a page like
https://wiki.python.org/moin/Python_Package_Recommendations (doesn't
currently exist) that would be managed the same way as other
recommendation pages like
https://wiki.python.org/moin/IntroductoryBooks - anyone can add a link
to their blog post about packages, and if they've focused on a
specific category (eg "web frameworks"), that can be right there in
the wiki page so people can search for it.

That way, it's decentralized for editing, but has a central "hub" that
people can easily find.

> 2) A Python package review site. This could be set up and managed with, e.g. a 
> GitHub repo, so that there could be a small number of editors, but anyone 
> could contribute a review via PR. The ultimate goal would be 
> reviews/recommendations of all the major package categories on PyPI. If well 
> structured and searchable, this could be very useful.
>  - This was proposed by someone on the previous thread -- again, correct me 
> if I'm wrong.
>

I suspect this would end up being broadly equivalent to the first
option, but with more effort by a core group of people (or a single
maintainer), and in return, would have a more consistent look and
feel.

> 3) A rating system built into PyPI -- This could be a combination of two 
> things:
>   A - Automated analysis -- download stats, dependency stats, release 
> frequency, etc, etc, etc.
>   B - Community ratings -- upvotes, stars, whatever.
>
> If done well, that could be very useful -- search on PyPI listed by rating. 
> However -- "done well" is a huge challenge -- I don't think there's a way to 
> do the automated system right, and community scoring can be abused pretty 
> easily. But maybe folks smarter than me could make it work with one or both 
> of these approaches.
>

Huge challenge indeed. Possibly insurmountable. Popularity contests
(purely based on download stats and such) have their value, but do not
tell you what's best, only what's the most used. Community ratings, as
you say, can be gamed all too easily, plus they STILL tend to favour
those that are heavily used rather than niche things that are
potentially better. Neither of them adequately answers questions like
"which is right *for this use-case*", which will leave a lot of
packages out in the cold.
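
As a rough illustration of why a purely automated score tends to reward heavy use, here is a minimal Python sketch. Everything in it is an assumption made for the example: the two package names, their metrics, the weights, and the `naive_score` formula are invented and do not describe any existing PyPI feature.

```python
import math

# Hypothetical metrics; none of these numbers come from PyPI.
PACKAGES = {
    "big-json": {"downloads": 5_000_000, "releases_last_year": 2},
    "niche-json": {"downloads": 20_000, "releases_last_year": 12},
}


def naive_score(metrics, download_weight=1.0, release_weight=0.5):
    """Combine log-scaled downloads and release frequency into one number."""
    return (
        download_weight * math.log10(metrics["downloads"] + 1)
        + release_weight * math.log10(metrics["releases_last_year"] + 1)
    )


# Sort package names from highest to lowest score.
ranking = sorted(PACKAGES, key=lambda name: naive_score(PACKAGES[name]), reverse=True)
print(ranking)
```

With these made-up numbers the heavily downloaded package ranks first even though the niche one is released six times as often, and nothing in the score says which of the two suits a particular use case.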

> 4)  A self contained repository of packages that you could point pip to -- it 
> would contain only the packages that had met some sort of "vetting" criteria. 
> In theory, anyone could run it, but a stamp of approval from the PSF would 
> make it far more acceptable to people. This would be a LOT of work to get set 
> up, and still a lot of work to maintain.
>

Definitely possible; how would this compare to Conda? What about OS
package managers like the Debian repositories?

> Personally, I think (4) is the best end result, but probably the most work as 
> well[*], so ???
>
> (1) and (2) have the advantage that they could be useful even without being 
> comprehensive -- they'd need to have some critical mass to get any traction, 
> but maybe not THAT much.

Right, and notably, they can be useful without covering every topic.
You could write a blog post about database ORM packages, and that's
useful right there, without worrying about whether there's any review
of internet protocol packages.

Hmm. Once that sort of collection gets some traction, someone might
say "Hey, anyone know about grammar parsers?" and add that to a "help