[Python-ideas] Re: "Curated" package repo?

Dom Grigonis Thu, 06 Jul 2023 06:29:16 -0700

It is possible that the issues being discussed at this stage are not as relevant 
as they seem, given that this idea is still at stage 0.
(Unless someone here is looking to make a very serious commitment.)

If a “light” starting point were agreed on, the process could be readjusted as 
and if it progresses. Maybe there is no need to put a “stamp” on a package at 
all; it could be enough to provide comparison statistics within some initial 
structure.

I think a lot of packages can be filtered on objective criteria, without even 
reaching the stage of subjective opinions.

———————————————

General info: fairly easy to inspect without the need for subjective opinions.
1. License
2. Maintenance: hard Stack Overflow and repository stats

Performance: hard stats.
1. There will be extensions written in lower-level languages which, even if not 
up to standard in other respects, are worth attention; if this is explicitly 
indicated, someone else might pick one up and rejuvenate it.
2. There will be pure Python packages:
  a) well written, by authors with good knowledge of efficient pure Python 
programming
  b) pure Python packages that take ages to execute

In many areas this will filter out a lot of libraries, although in some it 
wouldn’t, e.g. schema-based low-level serialisation, where benchmark results 
can be quite close.

The remaining evaluation can be based on subjective opinions, taking the 
curators’ preferences into account:
1. Coding standards
2. Integration
3. Flexibility/functionality
4. …

IMO, all of this can be done while staying on the safe side: if unsure, just 
leave the plain statistics for users and developers to see.

———————————————

An example (I am not the developer of any of these).
JSON serialisers:
1. json - stdlib, average performance, well maintained, flexible, very safe to 
depend on
2. simplejson - 3rd party, pure Python with optional C speedups, performance in 
line with 1), drop-in replacement for json, been around for a while, safe to 
depend on
3. ultrajson - 3rd party, written in C, >3x performance boost over 1) & 2), 
drop-in replacement for json, been around for a while, safe to depend on
4. ijson - 3rd party, C & Python, average performance, its own interface 
relying heavily on the iterator protocol, status <TBC>
5. orjson - 3rd party, highly optimised Rust, performance on par with the 
fastest serialisers on the market, not a drop-in replacement for json due to 
sacrifices made for performance, rich in functionality, well maintained, safe 
to depend on
6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a drop-in 
replacement for json, extends json with JSON5 features such as comments, well 
maintained, safe to depend on

(THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)

So there is still a bit of opinion here, but all of it can be standardised and 
expressed in numbers, so that a comparison of this type can be done with little 
to no personal opinion.

———————————————

Once a structure for this is in place, it would be easier to discuss whether 
more serious curation is needed, worthwhile, or makes sense.

Allow queries from users and package developers, provide places to gather 
opinions, maybe let people volunteer to do a deeper analysis…

And once there is enough input, maybe curated guidance can be added to the 
review. But that is the next stage, which does not necessarily need to be 
thoroughly thought out before putting in place something simple, objective and 
risk-free.

———————————————

Maybe stage 1 is all that users need: a reliable place to check hard stats, 
which users and developers can update for the benefit of all. With enough 
popularity, package developers should be motivated to submit stat updates (e.g. 
add an additional column to the benchmarking script), and users would submit 
similar updates (e.g. add a column for a case where a library is extremely 
slow).
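As a rough illustration of such a shared benchmarking script, here is a minimal sketch. It assumes each library is one “column” (a callable registered in a dict), so contributors add a column by registering another function. The `CANDIDATES` dict, the `benchmark` helper and the sample payload are hypothetical. Only stdlib json is required. ujson (ultrajson) and orjson are included only if they happen to be installed.

```python
import json
import timeit
from typing import Any, Callable, Dict

# Hypothetical shared benchmark: each "column" is one library's dumps function.
CANDIDATES: Dict[str, Callable[[Any], Any]] = {"json": json.dumps}

# Optional third-party columns, only added if the library is installed.
try:
    import ujson  # ultrajson
    CANDIDATES["ujson"] = ujson.dumps
except ImportError:
    pass

try:
    import orjson
    CANDIDATES["orjson"] = orjson.dumps  # note: returns bytes, not str
except ImportError:
    pass

# Hypothetical sample payload; a real script would use agreed-upon datasets.
PAYLOAD = {"id": 1, "name": "example", "values": list(range(100)), "nested": {"a": [1.5, 2.5]}}


def benchmark(candidates: Dict[str, Callable[[Any], Any]], payload: Any,
              number: int = 1000) -> Dict[str, float]:
    """Return the best-of-3 time (seconds) for `number` calls of each candidate."""
    results = {}
    for name, func in candidates.items():
        timings = timeit.repeat(lambda: func(payload), number=number, repeat=3)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    # Print one row per library, fastest first.
    for name, seconds in sorted(benchmark(CANDIDATES, PAYLOAD).items(), key=lambda kv: kv[1]):
        print(f"{name:10s} {seconds:.4f} s")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes. Absolute numbers will of course depend on the machine and payload.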

It is possible that the project would naturally turn towards hard-stat coverage 
instead of “deep” curation. E.g.
JSON serialisers become a sub-branch of schema-less serialisers,
which in turn become a branch of serialisers.

The user can then view comparable stats for a whole branch, sub-branch or 
sub-sub-branch to get the information he needs to make decisions, and apply 
different filters along the way to arrive at the final list of packages on 
which he will have to do his final subjective analysis anyway.
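To make the branch / sub-branch idea concrete, here is a minimal sketch. It assumes categories form a simple tree and that packages are attached to its nodes. The `Category` class, its `all_packages` and `find` methods and the example tree are hypothetical illustrations, not an existing PyPI feature. The package names come from the JSON example above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Category:
    """A node in a hypothetical category tree, e.g. serialisers > schema-less > JSON."""
    name: str
    packages: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)

    def all_packages(self) -> Iterator[str]:
        """Yield the packages in this category and, recursively, in all sub-branches."""
        yield from self.packages
        for child in self.children:
            yield from child.all_packages()

    def find(self, name: str) -> Optional["Category"]:
        """Depth-first search for a (sub-)branch by name; None if absent."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


# Example tree built from the JSON example; the schema-based branch is left empty.
json_branch = Category("JSON", packages=["json", "simplejson", "ultrajson", "ijson", "orjson", "pyjson5"])
schema_less = Category("schema-less", children=[json_branch])
schema_based = Category("schema-based")
serialisers = Category("serialisers", children=[schema_less, schema_based])

print(list(serialisers.find("schema-less").all_packages()))
# → ['json', 'simplejson', 'ultrajson', 'ijson', 'orjson', 'pyjson5']
```

Viewing a whole branch then becomes a recursive walk. Per-package stats could be attached to the same nodes and aggregated in the same way.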

———————————————

E.g. a user needs a serialiser. He prefers schema-less, but is willing to go 
schema-based given a large increase in performance. He does not mind a low 
maintenance status, since he aims to maintain his own proprietary serialisation 
library in the long run. Naturally, clean and simple code with a permissive 
license is preferred.
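The decision in this example could be expressed as a set of filters over per-package stats. Here is a minimal sketch that assumes each package has a small record of hard stats. The `PackageStats` fields, the `shortlist` function, the license allow-list, the 3x threshold and the `pkg_*` entries are all hypothetical placeholders, not measured data.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical allow-list of permissive licenses.
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}


@dataclass
class PackageStats:
    name: str
    schema_based: bool
    license: str
    relative_speed: float  # hypothetical: throughput relative to a baseline (1.0)
    actively_maintained: bool


def shortlist(stats: List[PackageStats], min_schema_speedup: float = 3.0) -> List[str]:
    """Apply the preferences from the example, fastest first:
    - a permissive license is required;
    - maintenance status is ignored (the user will maintain his own library);
    - schema-less is preferred; a schema-based package is kept only if it is at
      least `min_schema_speedup` times faster than the fastest permissively
      licensed schema-less package.
    """
    permissive = [s for s in stats if s.license in PERMISSIVE]
    schema_less = [s for s in permissive if not s.schema_based]
    best_schema_less = max((s.relative_speed for s in schema_less), default=0.0)
    schema_based = [s for s in permissive
                    if s.schema_based and s.relative_speed >= min_schema_speedup * best_schema_less]
    ranked = sorted(schema_less + schema_based, key=lambda s: s.relative_speed, reverse=True)
    return [s.name for s in ranked]


# Placeholder records, purely illustrative.
candidates = [
    PackageStats("pkg_a", False, "MIT", 1.0, True),
    PackageStats("pkg_b", False, "GPL-3.0", 5.0, True),        # dropped: not permissive
    PackageStats("pkg_c", True, "Apache-2.0", 10.0, False),    # kept: >= 3x pkg_a
    PackageStats("pkg_d", True, "BSD-3-Clause", 2.0, True),    # dropped: speed-up too small
]
print(shortlist(candidates))  # → ['pkg_c', 'pkg_a']
```

A portal could expose each of these rules as an interactive filter. The remaining list is what the user would then review subjectively.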

Just a portal with up-to-date stats, where users could interactively navigate 
such decisions, would be a good start and potentially a “safe” route to begin 
with.

The initial work on such a thing would then be heavier on automation than on 
politics, and the politics will in turn be easier to tackle later, once there 
is something more tangible to discuss.

> On 5 Jul 2023, at 21:34, Brendan Barnwell <brenb...@brenbarn.net> wrote:
> 
> On 2023-07-05 00:00, Christopher Barker wrote:
>> I'm noting this, because I think it's part of the problem to be solved, but 
>> maybe not the main one (to me anyway). I've been focused more on "these 
>> packages are worthwhile" (by some definition of worthwhile). While I think 
>> Chris A is more focused on "which of these seemingly similar packages should 
>> I use?" -- not unrelated, but not the same question either.
> 
>       I noticed this in the discussion and I think it's an important 
> difference in how people approach this question.  Basically what some people 
> want from a curated index is "this package is not junk" while others want 
> "this package is actually good" or even "you should use this package for this 
> purpose".
> 
>       I think that providing "not-junk level" curation is somewhat more 
> tractable, because this form of curation is closer to a logical OR on 
> different people's opinions.  It may be that many people tried a package and 
> didn't find it useful, but if at least one person did find it useful, then we 
> can probably say it's not junk.
> 
>       Providing "actually-good level" curation or "recommendations" is 
> harder, because it means you actually have to address differences of opinion 
> among curators.
> 
>       Personally I tend to think a not-junk type curation is the better one 
> to aim at, for a few reasons.  First, it's easier.  Second, it eliminates one 
> of the main problems with trying to search for packages on pypi, namely the 
> huge number of "mytestpackage1"-type packages. Third, this is what 
> conda-forge does and it seems to be working pretty well there.
> 
> -- 
> Brendan Barnwell
> "Do not follow where the path may lead.  Go, instead, where there is no path, 
> and leave a trail."
>   --author unknown
> 
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/XH2GTRKHZ2T4Z3VHQUCC5L7OATSHPUQU/
> Code of Conduct: http://python.org/psf/codeofconduct/

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/57RIWZ72DZXZYPNJV4PFROVMLSFJB6KO/
