On 2021-03-03 18:39, Matus Kalas wrote:
> Hey all again, and thanks for your thoughts Andrius and Andreas!
>
> On 2021-03-03 09:36, Andreas Tille wrote:
>> Hi Andrius,
>>
>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>> Dear Matus,
>>>
>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>> I'd suggest hearing from the folks who have done most of the work
>>>> with manually including those IDs, and letting them approve/decide.
>>>
>>> Absolutely!
>
> Steffen et al., your opinions on this matter?
>
>>>
>>>> I can imagine that for purely practical reasons in the process of the
>>>> manual curation, it might make sense to allow explicitly:
>>>> - Name: OMICtools
>>>>   Entry: N/A (Meaning: I have checked and there was no record)
>>>> - Name: bio.tools
>>>>   Entry: "" (Meaning: I or someone else should check this out;
>>>>   or perhaps: I checked but wasn't conclusive yet)
>>>>
>>>> The latter might be useful for contributors who aren't used to all
>>>> those IDs, to make them more visible (including where the gaps are).
>>>> But on the other hand, if those are well present in an
>>>> upstream/metadata template and very clear in the documentation of
>>>> upstream/metadata, then it is not necessary and I'd then tend to
>>>> like your suggestion Andrius.
>>>
>>> To me, three flavors of "unknown" look like overkill. Most of the
>>> metadata in Debian does not even have the two flavors of "unknown":
>>> a missing Bug-Submit field in d/u/metadata, Homepage in d/control or
>>> Upstream-Contact in d/copyright means that this piece of information is
>>> either nonexistent or simply not entered (for example, due to the lack
>>> of time). Thus I am not sure whether the added value is worth the
>>> infrastructure/effort here. But again, this is solely my opinion,
>>> certainly not aimed at reflecting those of the people who enter and use
>>> the data in d/u/metadata.
>>
>> I wrote the UDD importer for the metadata files and thus look at the
>> data as a "consumer" of the provided information. From this side those
>> different meanings of unknown are all turned into "ignore this value".
>> So in this respect differentiating between those unknowns is basically
>> helpful for those who edit the metadata files. Flagging something as "I
>> was here and have checked" is probably kind of helpful. However, it
>> might well be that some registry will include that specific
>> software later and re-checking makes sense.
>>
>> For this reason I was recommending not to make those simple things too
>> complex, since making it complex just drains time from the people who
>> are working on it with no visible effect for the users.
>>
>>>
>>> If the three-flavors option is preferred, I would also suggest adding
>>> date fields for each entry to signal at which point in time the registry
>>> was inspected.
>>
>> As I wrote above, later addition of some software to some registry can
>> spoil the different meanings of unknown. This could be cured by such a
>> date field but I don't think it is of any better value than draining
>> time from people maintaining that extra field. Thus I do not think we
>> should do this.
>
> We definitely don't need a date, git blame does that. Also in the form
> of the Blame button in Salsa. Without a possibility for inconsistency.
Agree.

>> Thanks a lot for your work on this
>>
>> Andreas.
>>
>> --
>> http://fam-tille.de
>>>
>>> Best,
>>> Andrius
>
> There is one closely related issue, which we just briefly touched upon
> with Steffen and Hervé in a telcon: What to do with those "NA" packages
> that are missing in e.g. bio.tools?
>
> The registration in bio.tools (and surely also SciCrunch) could be
> automated, but there are at least a couple of things needing human
> curation:
>
> - Which src packages represent one tool (often e.g. libs | language
> bindings form separate Debian pkgs). How to mark this and where? Is
> there an existing Debian mechanism? Or do we need to abuse the
> d/u/metadata "Entry" for that, before they're added? (3rd or 4th flavour
> of info then 😀 ; btw. git branches could help here 😉 ; and not in
> google spreadsheet perhaps 😜 as it has to be machine-readable)

Maybe a separate field could be introduced for that? I would prefer
leaving "Entry" for IDs only, so that a URL inside the registry can be
constructed in a straightforward manner. Imposing internal structure on
fields (i.e., abusing "Entry") both harms machine-readability and risks
namespace collisions. Should there be a need for free-form storage of
information, I would rather introduce a "Comment" field for each entry,
where a maintainer could store anything they believe is important about
that entry.

> - Choosing an available, reasonable biotoolsID and tool name. Ideally
> tool name and biotoolsID are identical, with the ID being all lowercase
> and having spaces removed/replaced.
>
> - Any other things needing human curation?
>
>
>
> Thank you all, I'm very happy seeing this progressing!
> Matus
>
>
> P.S.: Could you please leave all the contents in when replying to the
> thread, so that others can reply to previously mentioned points without
> having to read every single email in the thread and possibly breaking
> linearity of it?
> I agree that it's not ecological to broadcast the same
> text all around the globe again and again, but there are other solutions
> than emails that handle that without compromising. Many thanks!

OK!

Best,
Andrius
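
For illustration only, here is a minimal Python sketch of how a consumer
of d/u/metadata might combine two points from this thread: every flavour
of "unknown" collapses to "ignore this value", and "Entry" holds a bare
ID from which a URL can be built directly. The set of "unknown"
spellings, the URL template, the example ID "example-tool" and the
function name registry_url are assumptions made for this sketch; the
"Comment" field is only the proposal made above, not an existing field.

```python
# Spellings treated as "no usable ID" in this sketch (an assumption based
# on the flavours of "unknown" mentioned in the thread).
UNKNOWN_VALUES = {"", "na", "n/a"}

# Hypothetical URL templates, for illustration only.
URL_TEMPLATES = {
    "bio.tools": "https://bio.tools/{id}",
}


def registry_url(entry):
    """Return a URL for one registry entry, or None if it should be ignored."""
    raw = entry.get("Entry")
    value = "" if raw is None else str(raw).strip()
    # All flavours of "unknown" are ignored in the same way.
    if value.lower() in UNKNOWN_VALUES:
        return None
    template = URL_TEMPLATES.get(entry.get("Name"))
    if template is None:
        # No template known for this registry in this sketch.
        return None
    # "Entry" is a plain ID, so no parsing of internal structure is needed.
    return template.format(id=value)


entries = [
    {"Name": "OMICtools", "Entry": "N/A"},
    {"Name": "bio.tools", "Entry": ""},
    # Free-form notes live in the proposed "Comment" field, not in "Entry".
    {"Name": "bio.tools", "Entry": "example-tool",
     "Comment": "libs are packaged separately"},
]

for entry in entries:
    print(entry["Name"], "->", registry_url(entry))
```

Because free-form notes stay out of "Entry", the URL construction is a
single string substitution and needs no special cases.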