Thanks David,

Not keen to take it on solo.

Ideally, IMO, this could be a joint project of this whole group: someone more 
senior from this group creates a repo and oversees progress, a couple of more 
experienced members make the initial decisions to get things going, someone 
issues a review PR, and a couple of new OPs are invited to write reviews on 
their queries.

Would be good to know if:
1. Someone has their own little benchmarks/reviews and would be willing to 
spend a little time issuing a PR for some initial content.
2. People from this group see themselves going to such a place and adding a 
great new package they have just found to the existing reviews (or, even more 
importantly, an awful one).
3. Someone sees an opportunity to contribute, given that someone takes such a 
project on, e.g.:
  a) someone is very excited about benchmarking automation
  b) someone has working scripts to fetch GitHub stats / Stack Overflow trends 
that are waiting to be used
  c) someone wants to take their DevOps to the next level and sees this as a 
good opportunity
  d) someone is keen on the high-level view and would like to contribute by 
working on categorisation (partially relying on the Python stdlib/library 
reference structure could be intuitive, although then it is a dependency)
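For b), a minimal sketch of what such a stats script might look like, assuming 
the public GitHub REST API `GET /repos/{owner}/{repo}` (the endpoint and 
response field names are real; the sample values and the "maintained" 
threshold below are purely illustrative):

```python
import json
import urllib.request
from datetime import datetime, timezone

API = "https://api.github.com/repos/{owner}/{repo}"

def fetch_repo(owner, repo):
    """Fetch repository metadata from the public GitHub REST API."""
    with urllib.request.urlopen(API.format(owner=owner, repo=repo)) as resp:
        return json.load(resp)

def summarise(meta, stale_after_days=365):
    """Reduce raw repo metadata to the 'hard stats' a review would store."""
    pushed = datetime.fromisoformat(meta["pushed_at"].replace("Z", "+00:00"))
    age_days = (datetime.now(timezone.utc) - pushed).days
    return {
        "stars": meta["stargazers_count"],
        "open_issues": meta["open_issues_count"],
        "license": (meta.get("license") or {}).get("spdx_id"),
        "days_since_push": age_days,
        "maintained": age_days <= stale_after_days,
    }

# Illustrative metadata (shape matches the API response; values made up):
sample = {
    "stargazers_count": 1234,
    "open_issues_count": 12,
    "license": {"spdx_id": "MIT"},
    "pushed_at": "2023-07-01T12:00:00Z",
}
print(summarise(sample))
```

The `summarise()` output is exactly the kind of hard stat a review entry could 
store; what counts as "stale" would be a policy decision for the group.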

A lot of “someones” in this e-mail...

> On 6 Jul 2023, at 16:55, David Mertz, Ph.D. <david.me...@gmail.com> wrote:
> 
> Dom:
> 
> I'd recommend you simply start a GitHub project for "Curated PyPI", find a 
> catchy domain name, and publish that via GH Pages.  That's a few hours of 
> work to get a skeleton.  But no, I'm not quite volunteering to create and 
> maintain it myself today.
> 
> After there is a concrete site existing, you can refine the presentation and 
> governance procedure iteratively.  As a start, it can basically just be a web 
> page with evaluations like yours of the JSON libraries.  At a first pass, 
> there's no need for anything dynamic on the page, just some tables (or maybe 
> accordions, or side-bar navigation, or whatever).
> 
> I'd be very likely to make some PRs to such a repository myself.  At some 
> point, with enough recommendations, you might add some automation. E.g. some 
> script that checks all the submitted "package reviews" and creates an 
> aggregation ("10 reviews with average rating of 8").  Even there, running 
> that thing offline every once in a while is plenty to start (you could do GH 
> Actions or something too, if you like).
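That aggregation could start as small as the sketch below, assuming (purely as 
an illustration, not a decided format) one JSON file per review laid out as 
`reviews/<package>/<review>.json` with a numeric `rating` field:

```python
import json
import tempfile
from pathlib import Path
from statistics import mean

def aggregate(reviews_dir):
    """Summarise per-package ratings from a one-file-per-review layout."""
    ratings = {}  # package name -> list of ratings
    for path in Path(reviews_dir).glob("*/*.json"):  # reviews/<pkg>/<review>.json
        review = json.loads(path.read_text())
        ratings.setdefault(path.parent.name, []).append(review["rating"])
    return {
        pkg: {"reviews": len(rs), "average": round(mean(rs), 1)}
        for pkg, rs in ratings.items()
    }

# Build a tiny throwaway review tree to show the output shape:
root = Path(tempfile.mkdtemp())
(root / "somejsonlib").mkdir()
for i, rating in enumerate([7, 8, 9]):
    (root / "somejsonlib" / f"review{i}.json").write_text(
        json.dumps({"rating": rating}))
print(aggregate(root))
```

Running this offline every once in a while, or in a scheduled GH Action, would 
both work; the script has no dependencies beyond the stdlib.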
> 
> There are a few decisions to make, but none that difficult.  For example, 
> what format are reviews? Markdown? YAML? TOML? JSON? Python with conventions? 
> Whatever it is, it should have a gentle learning curve and be human readable 
> IMO.
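To make that decision concrete, a single review in TOML might be as small as 
the following; every field name and value here is hypothetical, just to show 
the shape:

```toml
# reviews/ultrajson/2023-07-review.toml  (hypothetical path and schema)
package = "ujson"
category = "serialisation/json"
rating = 8          # 1-10
license = "BSD-3-Clause"
drop_in_for = "json"
summary = "C extension, >3x faster than stdlib json, long-lived project."
```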
> 
> 
> 
> On Thu, Jul 6, 2023 at 9:26 AM Dom Grigonis <dom.grigo...@gmail.com> wrote:
> It is possible that the issues being discussed at this stage are not as 
> relevant as they seem at stage 0, which is where this idea is.
> (Unless someone here is looking for a very serious commitment.)
> 
> If some sort of starting point that is “light” in approach were decided on, 
> then the process could be readjusted as/if it progresses. Maybe there is no 
> need to put a “stamp” on a package - simply provide comparison statistics 
> given some initial structure.
> 
> I think a lot of packages can be filtered on objective criteria, without even 
> reaching the stage of subjective opinions.
> 
> ———————————————
> 
> General info - fairly easy to inspect without the need for subjective opinions:
> 1. License
> 2. Maintenance - hard Stack Overflow & repo stats
> 
> Performance - hard stats:
> 1. There will be lower-level language extensions which, even if not up to 
> standard in other aspects, are worth attention; someone else might pick one 
> up and rejuvenate it if this is explicitly indicated.
> 2. There will be pure Python packages:
>   a) those with good coding standards and good knowledge of efficient 
> programming in pure Python
>   b) those that take ages to execute
> 
> In many areas this will filter out many libraries, although there are some 
> where it wouldn't, e.g. schema-based low-level serialisation, where 
> benchmarks can be quite tight.
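Such a first-pass performance filter can start as a stdlib-only 
micro-benchmark along these lines; third-party serialisers are optional extras 
that get timed only when installed (the payload and iteration count are 
arbitrary choices, not a proposed standard):

```python
import json
import timeit

# A payload representative enough for a first-pass comparison (illustrative):
payload = {"id": 1, "name": "pkg", "tags": ["a", "b"], "nested": {"x": [1.5] * 50}}

def bench(name, dumps, loads, number=2000):
    """Time serialise + deserialise round trips; return (name, seconds)."""
    secs = timeit.timeit(lambda: loads(dumps(payload)), number=number)
    return name, secs

rows = [bench("json (stdlib)", json.dumps, json.loads)]
# Third-party candidates are optional dependencies; skip them if absent:
try:
    import ujson  # UltraJSON
    rows.append(bench("ujson", ujson.dumps, ujson.loads))
except ImportError:
    pass

for name, secs in rows:
    print(f"{name:15s} {secs:.3f}s")
```

Adding a serialiser is then just one more row, which is exactly the shape of 
contribution a drive-by PR could make.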
> 
> The remaining evaluation can be subjective opinions, where preferences of 
> curators can be taken into account:
> 1. Coding standards
> 2. Integration
> 3. Flexibility/functionality
> 4. …
> 
> IMO, all of this can be done while being on the safe side - if unsure, leave 
> the plain statistics for users and developers to see.
> 
> ———————————————
> 
> An example. (I am not the developer of any of these.)
> JSON serialisers:
> 1. json - stdlib, average performance, well maintained, flexible, very safe 
> to depend on
> 2. simplejson - 3rd party, pure Python, performance in line with 1), drop-in 
> replacement for json, been around for a while, safe to depend on
> 3. ultrajson - 3rd party, written in C, >3x performance boost over 1) & 2), 
> drop-in replacement for json, been around for a while, safe to depend on
> 4. ijson - 3rd party, C & Python, average performance, proprietary interface 
> relying heavily on the iterator protocol, status <TBC>
> 5. orjson - 3rd party, highly optimised (written in Rust), performance on 
> par with the fastest serialisers on the market, not a drop-in replacement 
> for json due to performance trade-offs, rich in functionality, well 
> maintained, safe to depend on
> 6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a 
> drop-in replacement for json, extends json with JSON5 features such as 
> comments, well maintained, safe to depend on
> 
> (THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)
> 
> So there is still a bit of opinion here, but all of this can be standardised 
> and put in numbers, and a comparison of this type can be done with 
> little-to-no personal opinion.
> 
> ———————————————
> 
> After the structure for this is in place, it would be easier to discuss 
> further whether more serious curation is needed/worthwhile/makes sense.
> 
> Allow queries from users and package developers, places to gather opinions, 
> maybe volunteers to do a deeper analysis… 
> 
> And once there is enough input, maybe curated guidance can be added to the 
> review. But this is the next stage, which does not necessarily need to be 
> thoroughly thought out before putting in place something simple, objective & 
> risk-free.
> 
> ———————————————
> 
> Maybe stage 1 is all that users need - a reliable place to check hard stats, 
> where users and developers can update them for the benefit of all. With 
> enough popularity, package developers should be motivated to issue stat 
> updates (e.g. add an additional column to the benchmarking script), and 
> users would issue similar updates (e.g. add a column for a library that is 
> extremely slow).
> 
> It is possible that the project would naturally turn in the direction of 
> hard-stat coverage instead of “deep” curation. E.g.
> json serialisers become a sub-branch of schema-less serialisers,
> which in turn become a branch of serialisers.
> 
> The user can then view comparable stats of the whole branch, sub-branch, or 
> sub-sub-branch to get the information he needs to make decisions, and apply 
> different filters in the process to get to the final list of packages on 
> which he will have to do his final subjective analysis anyway.
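That branch/sub-branch navigation could begin as nothing more than a category 
path per package plus a generic filter, e.g. (the package stats below are 
made-up illustrations, not measurements):

```python
# Each entry: category path + hard stats (illustrative values only).
packages = [
    {"name": "json",   "path": "serialisers/schema-less/json",
     "relative_speed": 1.0, "maintained": True},
    {"name": "ujson",  "path": "serialisers/schema-less/json",
     "relative_speed": 3.0, "maintained": True},
    {"name": "orjson", "path": "serialisers/schema-less/json",
     "relative_speed": 6.0, "maintained": True},
]

def browse(branch, **filters):
    """Packages under a branch whose stats meet the given minimum values."""
    hits = [p for p in packages if p["path"].startswith(branch)]
    for key, minimum in filters.items():
        hits = [p for p in hits if p[key] >= minimum]
    return [p["name"] for p in hits]

print(browse("serialisers"))                                  # whole branch
print(browse("serialisers/schema-less", relative_speed=3.0))  # filtered
```

A static site could precompute exactly these views per branch, so nothing 
dynamic is needed on the page itself.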
> 
> ———————————————
> 
> E.g. a user needs a serialiser. He prefers schema-less, but is willing to go 
> schema-based given large increases in performance. He does not mind low 
> maintenance status, given that he aims to maintain his own proprietary 
> serialisation library in the long run. Naturally, clean & simple code with a 
> permissive license is preferred.
> 
> Just a portal with up-to-date stats, where the user could interactively 
> navigate such decisions, would be a good start and potentially a “safe” 
> route to begin with.
> 
> The initial work on such a thing would then be heavier on automation than on 
> politics, which in turn will be easier to tackle later once there is 
> something more tangible to discuss.
> 
> > On 5 Jul 2023, at 21:34, Brendan Barnwell <brenb...@brenbarn.net> wrote:
> > 
> > On 2023-07-05 00:00, Christopher Barker wrote:
> >> I'm noting this, because I think it's part of the problem to be solved, 
> >> but maybe not the main one (to me anyway). I've been focused more on "these 
> >> packages are worthwhile, by some definition of worthwhile". While I think 
> >> Chris A is more focused on "which of these seemingly similar packages 
> >> should I use?" -- not unrelated, but not the same question either.
> > 
> >       I noticed this in the discussion and I think it's an important 
> > difference in how people approach this question.  Basically what some 
> > people want from a curated index is "this package is not junk" while others 
> > want "this package is actually good" or even "you should use this package 
> > for this purpose".
> > 
> >       I think that providing "not-junk level" curation is somewhat more 
> > tractable, because this form of curation is closer to a logical OR on 
> > different people's opinions.  It may be that many people tried a package 
> > and didn't find it useful, but if at least one person did find it useful, 
> > then we can probably say it's not junk.
> > 
> >       Providing "actually-good level" curation or "recommendations" is 
> > harder, because it means you actually have to address differences of 
> > opinion among curators.
> > 
> >       Personally I tend to think a not-junk type curation is the better one 
> > to aim at, for a few reasons.  First, it's easier.  Second, it eliminates 
> > one of the main problems with trying to search for packages on pypi, namely 
> > the huge number of "mytestpackage1"-type packages. Third, this is what 
> > conda-forge does and it seems to be working pretty well there.
> > 
> > -- 
> > Brendan Barnwell
> > "Do not follow where the path may lead.  Go, instead, where there is no 
> > path, and leave a trail."
> >   --author unknown
> > 
> > _______________________________________________
> > Python-ideas mailing list -- python-ideas@python.org
> > To unsubscribe send an email to python-ideas-le...@python.org
> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
> > Message archived at 
> > https://mail.python.org/archives/list/python-ideas@python.org/message/XH2GTRKHZ2T4Z3VHQUCC5L7OATSHPUQU/
> > Code of Conduct: http://python.org/psf/codeofconduct/
> 
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/57RIWZ72DZXZYPNJV4PFROVMLSFJB6KO/
> Code of Conduct: http://python.org/psf/codeofconduct/
> 
> 
> -- 
> The dead increasingly dominate and strangle both the living and the 
> not-yet born.  Vampiric capital and undead corporate persons abuse 
> the lives and control the thoughts of homo faber. Ideas, once born, 
> become abortifacients against new conceptions.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T7GGQR4DPZ3R26UIO24RWJTCN5XNI5KV/
Code of Conduct: http://python.org/psf/codeofconduct/
