[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread Dom Grigonis
Thanks David,

Not keen to take it on solo.

Ideally, IMO, this could be a joint project of this whole group. Someone more 
senior from this group creates a repo and oversees the progress, a couple of 
more experienced members make initial decisions to get things going, someone 
issues a review PR, and a couple of new OPs are encouraged to write reviews on 
their queries.

It would be good to know if:
1. Someone has their own little benchmarking/reviews and would be willing to 
spend a little time to issue a PR with some initial content.
2. People from this group see themselves going to such a place and adding a 
great new package they have just found to existing reviews (or, even more 
importantly, an awful package).
3. Someone sees an opportunity to contribute if someone took such a project 
on, e.g.
  a) someone is very excited about benchmarking automation
  b) someone has some working scripts to fetch GitHub stats / Stack Overflow 
trends that are waiting to be used
  c) someone wants to take their devops to the next level and sees this as a 
good opportunity
  d) someone is very keen on the high-level view and would like to contribute 
to the categorisation (partially relying on the Python stdlib/libref structure 
could be intuitive, although it would then be a dependency)
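For item 3 b), a stats fetcher could start as small as the sketch below. It assumes PyPI's public JSON API (`https://pypi.org/pypi/<project>/json`) as the data source; the helper names `fetch_pypi_json` and `summarise` are hypothetical. Parsing is kept separate from the network call so it can be tested against saved responses.

```python
import json
import urllib.request
from datetime import datetime


def fetch_pypi_json(project: str) -> dict:
    """Download a project's metadata from PyPI's JSON API (requires network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarise(data: dict) -> dict:
    """Pull a few objective fields out of a PyPI JSON response."""
    info = data["info"]
    releases = data.get("releases", {})
    # Every uploaded file carries a timestamp; the newest one approximates "last activity".
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for f in files
    ]
    return {
        "name": info["name"],
        "version": info["version"],
        "license": info.get("license") or "UNKNOWN",
        "release_count": len(releases),
        "last_upload": max(uploads) if uploads else None,
    }
```

Usage would be along the lines of `summarise(fetch_pypi_json("orjson"))`. The free-text `license` field may be empty for some projects, so a real script would probably need a fallback source.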

A lot of “someones” in this e-mail...

> On 6 Jul 2023, at 16:55, David Mertz, Ph.D.  wrote:
> 
> Dom:
> 
> I'd recommend you simply start a GitHub project for "Curated PyPI", find a 
> catchy domain name, and publish that via GH Pages.  That's a few hours of 
> work to get a skeleton.  But no, I'm not quite volunteering to create and 
> maintain it myself today.
> 
> After there is a concrete site existing, you can refine the presentation and 
> governance procedure iteratively.  As a start, it can basically just be a web 
> page with evaluations like yours of the JSON libraries.  At a first pass, 
> there's no need for anything dynamic on the page, just some tables (or maybe 
> accordions, or side-bar navigation, or whatever).
> 
> I'd be very likely to make some PRs to such a repository myself.  At some 
> point, with enough recommendations, you might add some automation. E.g. some 
> script that checks all the submitted "package reviews" and creates an 
> aggregation ("10 reviews with average rating of 8").  Even there, running 
> that thing offline every once in a while is plenty to start (you could do GH 
> Actions or something too, if you like).
> 
> There are a few decisions to make, but none that difficult.  For example, 
> what format are reviews? Markdown? YAML? TOML? JSON? Python with conventions? 
> Whatever it is, it should have a gentle learning curve and be human readable 
> IMO.
> 
> 
> 
> On Thu, Jul 6, 2023 at 9:26 AM Dom Grigonis wrote:
> It is possible that the issues being discussed at this stage are not as 
> relevant as they seem at stage 0, which this idea is at.
> (Unless someone here is looking for a very serious commitment.)
>
> If some sort of “light” starting point were decided on, the process could 
> be readjusted as/if it progresses. Maybe there is no need to put a “stamp” 
> on a package; it could simply provide comparison statistics given some 
> initial structure.
> 
> I think a lot of packages can be filtered on objective criteria, without even 
> reaching the stage of subjective opinions.
> 
> ———
> 
> General info - fairly easy to inspect without the need for subjective opinions.
> 1. License
> 2. Maintenance - hard Stack Overflow & repo stats
>
> Performance - hard stats:
> 1. There will be lower-level language extensions which, even if not up to 
> standard in other aspects, are worth attention: someone else might pick them 
> up and rejuvenate them if this is explicitly indicated.
> 2. There will be pure Python packages:
>   a) with good coding standards and good knowledge of efficient programming 
> in pure Python
>   b) that take ages to execute
>
> In many areas, this will filter out many libraries, although there are some 
> where it wouldn’t, e.g. schema-based low-level serialisation, where 
> benchmarks can be quite tight.
> 
> The remaining evaluation can be subjective opinions, where preferences of 
> curators can be taken into account:
> 1. Coding standards
> 2. Integration
> 3. Flexibility/functionality
> 4. …
> 
> IMO, all of this can be done while being on the safe side - if unsure, leave 
> the plain statistics for users and developers to see.
> 
> ———
> 
> An example. (I am not the developer of any of these)
> JSON serialisers:
> 1. json - stdlib, average performance, well maintained, flexible, very safe 
> to depend on
> 2. simplejson - 3rd party, pure Python, performance in line with 1), drop-in 
> replacement for json, been around for a while, safe to depend on
> 3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2), 
> drop-in replacement for json, been around for a while, safe to depend on
> 4. ijson - 3rd party, C, 

[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread James Addison via Python-ideas
On Thu, Jul 6, 2023, 14:22 James Addison  wrote:

> I agree, we should encourage or await a single organization to reimplement
> a packaging ecosystem with a slightly different set of properties that
> continue to provide them with editor biasing, preventing eventual global
> consensus and system neutrality.
>
> Is your time available to help build it?
>

I'd like to apologise for this comment; I don't think that I argued in good
faith here.

I was frustrated by a sense that many of the more straightforward attempts
to make improvements in packaging ecosystems are, in themselves, a
reinvention of previously-existing wheels, often producing similarly wonky
spokes to previous attempts that result in repeated off-course journeys
that, given enough knowledge of the history of technology, seem predictable.

To observe that and then to go on to suggest that we simply wait for the
next wonky wheel builder doesn't seem like genuine progress, and I should
neither argue for that nor ask whether other people want to spend their own
valuable time on it.

> On Thu, Jul 6, 2023, 14:17 Gregory Disney wrote:
>
>> Why do people insist on reinventing the wheel? Blockchain is not the
>> answer for adding trust that is verifiable. Code signing is the answer:
>> it’s widely accepted and would be useful in cases of trusted computing and
>> other security use cases.
>>
>> I don’t want to load a hash table to load a third party module on a UEFI
>> interface.
>>
>> On Thu, Jul 6, 2023 at 9:11 AM James Addison via Python-ideas <
>> python-ideas@python.org> wrote:
>>
>>> On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:
>>>
>>>> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>>>>  wrote:
>>>> > I also agree with a later reply about avoiding the murkier side of
>>>> blockchains / etc.  That said, it seems to me (again, sample size one
>>>> anecdata) that creating a more levelled playing field for package
>>>> publication could benefit from the use of some distributed technologies.
>>>> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
>>>> one question related to recency of data, though.  Delaying availability of
>>>> a package to an audience -- if it's important enough -- could under some
>>>> circumstances become effectively similar to censorship.
>>>> >
>>>>
>>>> A blockchain won't solve anything here. It would be completely and
>>>> utterly impractical to put the packages themselves into a blockchain,
>>>> so all you'd have is the index, and that means it's just a bad version
>>>> of PyPI's own single-page index.
>>>>
>>>> ChrisA
>>>> ___
>>>> Python-ideas mailing list -- python-ideas@python.org
>>>> To unsubscribe send an email to python-ideas-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-ideas@python.org/message/PTIS3HZHJSFV7ETWE7UP4HKXS4WN2OEO/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>>
>>> Mostly agreed.  A distributed hash table or similar, though, could be
>>> appropriate in combination with ideas similar to the accreting layers of
>>> self-reinforcing consensus that some blockchain technologies provide.
>>>
>>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KAKBDC3WSSUKCAY24SMABP3GIVXXEILD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread David Mertz, Ph.D.
Dom:

I'd recommend you simply start a GitHub project for "Curated PyPI", find a
catchy domain name, and publish that via GH Pages.  That's a few hours of
work to get a skeleton.  But no, I'm not quite volunteering to create and
maintain it myself today.

After there is a concrete site existing, you can refine the presentation
and governance procedure iteratively.  As a start, it can basically just be
a web page with evaluations like yours of the JSON libraries.  At a first
pass, there's no need for anything dynamic on the page, just some tables
(or maybe accordions, or side-bar navigation, or whatever).

I'd be very likely to make some PRs to such a repository myself.  At some
point, with enough recommendations, you might add some automation. E.g.
some script that checks all the submitted "package reviews" and creates an
aggregation ("10 reviews with average rating of 8").  Even there, running
that thing offline every once in a while is plenty to start (you could do
GH Actions or something too, if you like).
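A minimal sketch of such an offline aggregation script, assuming each submitted review has already been loaded into a dict with hypothetical `package` and `rating` keys (the field names and the numeric scale are assumptions, not a decided format):

```python
from collections import defaultdict
from statistics import mean


def aggregate(reviews: list[dict]) -> dict[str, str]:
    """Summarise submitted reviews per package, e.g. "10 reviews with average rating of 8"."""
    ratings: dict[str, list[float]] = defaultdict(list)
    for review in reviews:
        ratings[review["package"]].append(review["rating"])
    summary = {}
    for package, scores in sorted(ratings.items()):
        noun = "review" if len(scores) == 1 else "reviews"
        # :g drops a trailing ".0", so an average of 8.0 prints as "8"
        summary[package] = f"{len(scores)} {noun} with average rating of {mean(scores):g}"
    return summary
```

Running it by hand every so often and committing the output is enough to start; the same function could later be called from a CI job.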

There are a few decisions to make, but none that difficult.  For example,
what format are reviews? Markdown? YAML? TOML? JSON? Python with
conventions? Whatever it is, it should have a gentle learning curve and be
human readable IMO.
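As one possible answer to the format question, a sketch that keeps each review as a small JSON document and validates it on load. JSON is used here only because its parser ships with every Python version; the field names, the 0-10 rating range and the `load_review` helper are all assumptions. TOML or YAML would work the same way with a different parser.

```python
import json

# Hypothetical minimal schema: required field -> expected type after parsing.
REQUIRED_FIELDS = {"package": str, "category": str, "rating": int, "summary": str}

# Placeholder review for a made-up package; not a real evaluation.
EXAMPLE_REVIEW = """
{
  "package": "example-json-lib",
  "category": "serialisers/schema-less/json",
  "rating": 8,
  "summary": "Placeholder text."
}
"""


def load_review(text: str) -> dict:
    """Parse one review and reject it early if required fields are missing or malformed."""
    review = json.loads(text)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in review:
            raise ValueError(f"missing field: {field}")
        if not isinstance(review[field], expected):
            raise ValueError(f"{field} should be of type {expected.__name__}")
    if not 0 <= review["rating"] <= 10:
        raise ValueError("rating must be between 0 and 10")
    return review
```

Validating at load time keeps PR review cheap: a malformed submission fails before anyone has to read it.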



On Thu, Jul 6, 2023 at 9:26 AM Dom Grigonis  wrote:

> It is possible that the issues being discussed at this stage are not as
> relevant as they seem at stage 0, which this idea is at.
> (Unless someone here is looking for a very serious commitment.)
>
> If some sort of “light” starting point were decided on, the process could
> be readjusted as/if it progresses. Maybe there is no need to put a “stamp”
> on a package; it could simply provide comparison statistics given some
> initial structure.
>
> I think a lot of packages can be filtered on objective criteria, without
> even reaching the stage of subjective opinions.
>
> ———
>
> General info - fairly easy to inspect without the need for subjective
> opinions.
> 1. License
> 2. Maintenance - hard Stack Overflow & repo stats
>
> Performance - hard stats:
> 1. There will be lower-level language extensions which, even if not up to
> standard in other aspects, are worth attention: someone else might pick them
> up and rejuvenate them if this is explicitly indicated.
> 2. There will be pure Python packages:
>   a) with good coding standards and good knowledge of efficient programming
> in pure Python
>   b) that take ages to execute
>
> In many areas, this will filter out many libraries, although there are
> some where it wouldn’t, e.g. schema-based low-level serialisation, where
> benchmarks can be quite tight.
>
> The remaining evaluation can be subjective opinions, where preferences of
> curators can be taken into account:
> 1. Coding standards
> 2. Integration
> 3. Flexibility/functionality
> 4. …
>
> IMO, all of this can be done while being on the safe side - if unsure,
> leave the plain statistics for users and developers to see.
>
> ———
>
> An example. (I am not the developer of any of these)
> JSON serialisers:
> 1. json - stdlib, average performance, well maintained, flexible, very
> safe to depend on
> 2. simplejson - 3rd party, pure Python, performance in line with 1),
> drop-in replacement for json, been around for a while, safe to depend on
> 3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2),
> drop-in replacement for json, been around for a while, safe to depend on
> 4. ijson - 3rd party, C, average performance, proprietary interface
> relying heavily on iterator protocol, status 
> 5. orjson - 3rd party, highly optimised C, performance on par with the fastest
> serialisers on the market, not a drop-in replacement for json due to
> sacrifices for performance, rich in functionality, well maintained, safe to
> depend on
> 6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a
> drop-in replacement for json, extends json with json5 features such as
> comments, well maintained, safe to depend on
>
> (THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)
>
> So there is still a bit of opinion here, but all of this can be
> standardised and put in numbers, and comparison of this type can be done
> with little-to-no personal opinion.
>
> ———
>
> Once a structure for this is in place, it would be easier to discuss
> further whether more serious curation is needed/worthwhile/makes sense.
>
> Then: allow queries from users and package developers, provide places to
> gather opinions, maybe have volunteers do a deeper analysis…
>
> And once there is enough input, maybe curated guidance can be added to
> the review. But this is the next stage, which does not need to be
> thoroughly thought out before putting in place something simple,
> objective & risk-free.
>
> ———
>
> Maybe stage 1 is all that users need - a reliable place to check hard
> stats, where users and developers can update them for the benefit of all.
> With 

[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread Dom Grigonis
It is possible that the issues being discussed at this stage are not as 
relevant as they seem at stage 0, which this idea is at.
(Unless someone here is looking for a very serious commitment.)

If some sort of “light” starting point were decided on, the process could be 
readjusted as/if it progresses. Maybe there is no need to put a “stamp” on a 
package; it could simply provide comparison statistics given some initial 
structure.

I think a lot of packages can be filtered on objective criteria, without even 
reaching the stage of subjective opinions.

———

General info - fairly easy to inspect without the need for subjective opinions.
1. License
2. Maintenance - hard Stack Overflow & repo stats

Performance - hard stats:
1. There will be lower-level language extensions which, even if not up to 
standard in other aspects, are worth attention: someone else might pick them up 
and rejuvenate them if this is explicitly indicated.
2. There will be pure Python packages:
  a) with good coding standards and good knowledge of efficient programming in 
pure Python
  b) that take ages to execute

In many areas, this will filter out many libraries, although there are some 
where it wouldn’t, e.g. schema-based low-level serialisation, where benchmarks 
can be quite tight.
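The objective part of this filtering can be expressed as plain thresholds, with no opinion involved. A sketch, assuming a hypothetical `PackageStats` record whose fields (license, days since last release, speed relative to a baseline) have already been collected; the thresholds are supplied by the caller and are not recommendations:

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    """Hard, opinion-free facts about one package (hypothetical record)."""
    name: str
    license: str
    days_since_release: int
    relative_speed: float  # 1.0 = same speed as the chosen baseline, 2.0 = twice as fast


def objective_filter(
    packages: list[PackageStats],
    *,
    allowed_licenses: set[str],
    max_days_stale: int,
    min_relative_speed: float,
) -> list[PackageStats]:
    """Return only the packages that pass every hard threshold."""
    return [
        p
        for p in packages
        if p.license in allowed_licenses
        and p.days_since_release <= max_days_stale
        and p.relative_speed >= min_relative_speed
    ]
```

Because every criterion is a number or a set membership, the same stats can be re-filtered by each user with their own thresholds instead of a curator fixing them.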

The remaining evaluation can be subjective opinions, where preferences of 
curators can be taken into account:
1. Coding standards
2. Integration
3. Flexibility/functionality
4. …

IMO, all of this can be done while being on the safe side - if unsure, leave 
the plain statistics for users and developers to see.

———

An example. (I am not the developer of any of these)
JSON serialisers:
1. json - stdlib, average performance, well maintained, flexible, very safe to 
depend on
2. simplejson - 3rd party, pure Python, performance in line with 1), drop-in 
replacement for json, been around for a while, safe to depend on
3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2), 
drop-in replacement for json, been around for a while, safe to depend on
4. ijson - 3rd party, C, average performance, proprietary interface 
relying heavily on iterator protocol, status 
5. orjson - 3rd party, highly optimised C, performance on par with the fastest 
serialisers on the market, not a drop-in replacement for json due to 
sacrifices for performance, rich in functionality, well maintained, safe to 
depend on
6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a drop-in 
replacement for json, extends json with json5 features such as comments, well 
maintained, safe to depend on

(THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)

So there is still a bit of opinion here, but all of this can be standardised 
and put in numbers, and comparison of this type can be done with 
little-to-no personal opinion.
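Putting the comparison "in numbers" could start with normalising raw benchmark timings against the stdlib baseline, so that statements like ">3x performance boost on 1)" come straight out of the data. A sketch, assuming timings are seconds for a fixed workload (lower is better); `relative_speeds` and `as_table` are hypothetical helpers, and `as_table` emits a static Markdown table of the kind a plain GH Pages site could show:

```python
def relative_speeds(timings: dict[str, float], baseline: str = "json") -> dict[str, float]:
    """Turn raw timings (seconds, lower is better) into speed relative to a baseline.

    1.0 means "same as the baseline", 3.0 means "3x faster", 0.5 means "half as fast".
    """
    base = timings[baseline]
    return {name: round(base / seconds, 2) for name, seconds in timings.items()}


def as_table(speeds: dict[str, float]) -> str:
    """Render relative speeds as a Markdown table, fastest first."""
    rows = ["| package | speed vs baseline |", "|---|---|"]
    for name, speed in sorted(speeds.items(), key=lambda kv: kv[1], reverse=True):
        rows.append(f"| {name} | {speed:.2f}x |")
    return "\n".join(rows)
```

Storing the normalised number instead of the raw seconds makes results from different machines easier to compare, as long as each run includes the baseline.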

———

Once a structure for this is in place, it would be easier to discuss further 
whether more serious curation is needed/worthwhile/makes sense.

Then: allow queries from users and package developers, provide places to gather 
opinions, maybe have volunteers do a deeper analysis… 

And once there is enough input, maybe curated guidance can be added to the 
review. But this is the next stage, which does not need to be thoroughly 
thought out before putting in place something simple, objective & risk-free.

———

Maybe stage 1 is all that users need - a reliable place to check hard stats, 
where users and developers can update them for the benefit of all. With enough 
popularity, package developers should be motivated to issue stat updates (e.g. 
add a column to the benchmarking script), and users would issue similar 
updates (e.g. add a column to the benchmarking script for a library that is 
extremely slow).
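A sketch of a benchmarking script where "adding a column" is a one-line change to a registry. It assumes `dumps` as the common entry point and a single small, hypothetical payload; the module names are the import names of libraries from the comparison (`ujson` for ultrajson), and anything not installed is simply skipped. `orjson.dumps` returns `bytes` rather than `str`, a difference worth recording alongside the timings.

```python
import functools
import importlib
import timeit

# One entry per "column": display label -> (import name, function to time).
# Adding a library to the comparison means adding one line here.
CANDIDATES = {
    "json": ("json", "dumps"),
    "simplejson": ("simplejson", "dumps"),
    "ultrajson": ("ujson", "dumps"),
    "orjson": ("orjson", "dumps"),  # returns bytes, not str
}

# A deliberately small payload; real comparisons need several workloads.
PAYLOAD = {"id": 1, "tags": ["a", "b", "c"], "nested": {"x": 1.5, "y": None}}


def run_benchmarks(number: int = 10_000) -> dict[str, float]:
    """Time each installed candidate; libraries that fail to import are skipped."""
    results = {}
    for label, (module_name, func_name) in CANDIDATES.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed: leave this column empty
        call = functools.partial(getattr(module, func_name), PAYLOAD)
        results[label] = timeit.timeit(call, number=number)
    return results


if __name__ == "__main__":
    for label, seconds in sorted(run_benchmarks().items(), key=lambda kv: kv[1]):
        print(f"{label:>12}: {seconds:.4f}s")
```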

It is possible that the project would naturally turn towards hard-stat 
coverage instead of “deep” curation, e.g. JSON serialisers become a 
sub-branch of schema-less serialisers, which in turn become a branch of 
serialisers.

The user can then view comparable stats for the whole branch, sub-branch or 
sub-sub-branch to get the information he needs to make decisions, applying 
different filters in the process to get to the final list of packages on which 
he will have to do his final subjective analysis anyway.
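The branch / sub-branch idea maps onto a simple tree in which viewing a branch means collecting everything beneath it. A sketch with a hypothetical `Category` class; the category names and package placement used to exercise it are illustrative only:

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of a hypothetical category tree (branch, sub-branch, ...)."""
    name: str
    packages: list[str] = field(default_factory=list)
    children: list["Category"] = field(default_factory=list)

    def all_packages(self) -> list[str]:
        """Packages in this branch plus every sub-branch below it."""
        found = list(self.packages)
        for child in self.children:
            found.extend(child.all_packages())
        return found

    def find(self, path: list[str]) -> "Category":
        """Walk down by child names, e.g. ["schema-less", "json"]."""
        node = self
        for name in path:
            match = next((c for c in node.children if c.name == name), None)
            if match is None:
                raise KeyError(name)
            node = match
        return node
```

Stats tables for a branch would then be built from `all_packages()`, so the same data serves the broad and the narrow view.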

———

E.g. a user needs a serialiser. He prefers schema-less, but is willing to go 
schema-based given large increases in performance. He does not mind low 
maintenance status, since he aims to maintain his own proprietary serialisation 
library in the long run. Naturally, clean & simple code with a permissive 
license is preferred.
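That user story can be written down as a shortlist rule. A sketch under stated assumptions: the `Candidate` fields, the `PERMISSIVE` set and the 2x threshold standing in for "large increases in performance" are all hypothetical choices; maintenance status is deliberately ignored, as in the story, and license is used to order results rather than as a hard filter.

```python
from dataclasses import dataclass

# Assumption: which license identifiers count as permissive for this user.
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}


@dataclass
class Candidate:
    name: str
    schema_based: bool
    relative_speed: float  # higher is faster, e.g. relative to the stdlib
    license: str


def shortlist(candidates: list[Candidate], min_schema_gain: float = 2.0) -> list[Candidate]:
    """Prefer schema-less; admit schema-based only for a large speed gain.

    Maintenance status is intentionally not considered. Permissive licenses are
    ranked first rather than required.
    """
    schema_less = [c for c in candidates if not c.schema_based]
    best = max((c.relative_speed for c in schema_less), default=0.0)
    worth_a_schema = [
        c for c in candidates
        if c.schema_based and c.relative_speed >= min_schema_gain * best
    ]
    return sorted(
        schema_less + worth_a_schema,
        key=lambda c: (c.license in PERMISSIVE, c.relative_speed),
        reverse=True,
    )
```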

Just a portal with up-to-date stats where a user could interactively navigate 
such decisions would be a good start and potentially a “safe” route to begin 
with.

The initial work on such a thing would then be heavier on automation than on 
politics, which in turn will be easier to tackle later 

[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread James Addison via Python-ideas
I agree, we should encourage or await a single organization to reimplement
a packaging ecosystem with a slightly different set of properties that
continue to provide them with editor biasing, preventing eventual global
consensus and system neutrality.

Is your time available to help build it?

On Thu, Jul 6, 2023, 14:17 Gregory Disney wrote:

> Why do people insist on reinventing the wheel? Blockchain is not the
> answer for adding trust that is verifiable. Code signing is the answer:
> it’s widely accepted and would be useful in cases of trusted computing and
> other security use cases.
>
> I don’t want to load a hash table to load a third party module on a UEFI
> interface.
>
> On Thu, Jul 6, 2023 at 9:11 AM James Addison via Python-ideas <
> python-ideas@python.org> wrote:
>
>> On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:
>>
>>> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>>>  wrote:
>>> > I also agree with a later reply about avoiding the murkier side of
>>> blockchains / etc.  That said, it seems to me (again, sample size one
>>> anecdata) that creating a more levelled playing field for package
>>> publication could benefit from the use of some distributed technologies.
>>> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
>>> one question related to recency of data, though.  Delaying availability of
>>> a package to an audience -- if it's important enough -- could under some
>>> circumstances become effectively similar to censorship.
>>> >
>>>
>>> A blockchain won't solve anything here. It would be completely and
>>> utterly impractical to put the packages themselves into a blockchain,
>>> so all you'd have is the index, and that means it's just a bad version
>>> of PyPI's own single-page index.
>>>
>>> ChrisA
>>
>>
>> Mostly agreed.  A distributed hash table or similar, though, could be
>> appropriate in combination with ideas similar to the accreting layers of
>> self-reinforcing consensus that some blockchain technologies provide.
>>
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YS3UQFENIAX7GGXD2KCJ3GHZJJJV3KLM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread Gregory Disney
Why do people insist on reinventing the wheel? Blockchain is not the answer
for adding trust that is verifiable. Code signing is the answer: it’s
widely accepted and would be useful in cases of trusted computing and other
security use cases.

I don’t want to load a hash table to load a third party module on a UEFI
interface.

On Thu, Jul 6, 2023 at 9:11 AM James Addison via Python-ideas <
python-ideas@python.org> wrote:

> On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:
>
>> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>>  wrote:
>> > I also agree with a later reply about avoiding the murkier side of
>> blockchains / etc.  That said, it seems to me (again, sample size one
>> anecdata) that creating a more levelled playing field for package
>> publication could benefit from the use of some distributed technologies.
>> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
>> one question related to recency of data, though.  Delaying availability of
>> a package to an audience -- if it's important enough -- could under some
>> circumstances become effectively similar to censorship.
>> >
>>
>> A blockchain won't solve anything here. It would be completely and
>> utterly impractical to put the packages themselves into a blockchain,
>> so all you'd have is the index, and that means it's just a bad version
>> of PyPI's own single-page index.
>>
>> ChrisA
>
>
> Mostly agreed.  A distributed hash table or similar, though, could be
> appropriate in combination with ideas similar to the accreting layers of
> self-reinforcing consensus that some blockchain technologies provide.
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VMEPUO252ZSC6SCM7L5NNLHXUG7COXRB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: "Curated" package repo?

2023-07-06 Thread James Addison via Python-ideas
On Wed, Jul 5, 2023, 19:06 Chris Angelico  wrote:

> On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas
>  wrote:
> > I also agree with a later reply about avoiding the murkier side of
> blockchains / etc.  That said, it seems to me (again, sample size one
> anecdata) that creating a more levelled playing field for package
> publication could benefit from the use of some distributed technologies.
> Even HTTP mirrors are, arguably, a basic form of that.. there's at least
> one question related to recency of data, though.  Delaying availability of
> a package to an audience -- if it's important enough -- could under some
> circumstances become effectively similar to censorship.
> >
>
> A blockchain won't solve anything here. It would be completely and
> utterly impractical to put the packages themselves into a blockchain,
> so all you'd have is the index, and that means it's just a bad version
> of PyPI's own single-page index.
>
> ChrisA


Mostly agreed.  A distributed hash table or similar, though, could be
appropriate in combination with ideas similar to the accreting layers of
self-reinforcing consensus that some blockchain technologies provide.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/I3CDZAXGYVS33DJ4JEENGYMF4MY6BQ7O/
Code of Conduct: http://python.org/psf/codeofconduct/