>>>>> Kurt Hornik writes:

The variant attached drops the URL and does unique().
Hmm, the ones in

  head(with(a, sort_by(a, ~ family + given)), 100)

without a family look suspicious ...

Best
-k

x <- tools::CRAN_package_db()
a <- lapply(x[["Authors@R"]],
            function(a) {
                if(!is.na(a)) {
                    a <- tryCatch(utils:::.read_authors_at_R_field(a),
                                  error = identity)
                    if (inherits(a, "person"))
                        return(a)
                }
                NULL
            })
a <- do.call(c, a)
a <- lapply(a,
            function(e) {
                if(is.null(o <- e$comment["ORCID"]) || is.na(o))
                    return(NULL)
                cbind(given = paste(e$given, collapse = " "),
                      family = paste(e$family, collapse = " "),
                      oid = unname(tools:::.ORCID_iD_canonicalize(o)))
            })
a <- unique(as.data.frame(do.call(rbind, a)))
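
The canonicalize-then-unique() step can also be tried on a small table without the internal helper. What follows is only a sketch under stated assumptions: canonicalize_orcid() is a hypothetical stand-in written for this illustration, not the code behind tools:::.ORCID_iD_canonicalize(), and the second record is made up (the first uses the sample iD from the ORCID documentation).

```r
# Hypothetical helper (not part of R): reduce an ORCID given either as a
# bare iD or as a URL to the bare iD; NA where no iD is found.
canonicalize_orcid <- function(x) {
    pattern <- "[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]"
    m <- regexpr(pattern, x)
    ok <- !is.na(m) & m > 0L
    out <- rep(NA_character_, length(x))
    out[ok] <- regmatches(x, m)
    out
}

# Illustrative records: the same person listed once with the URL form and
# once with the bare iD, plus one made-up entry.
a <- data.frame(given  = c("Josiah", "Josiah", "Ann"),
                family = c("Carberry", "Carberry", "Example"),
                oid    = c("https://orcid.org/0000-0002-1825-0097",
                           "0000-0002-1825-0097",
                           "0000-0000-0000-0001"),
                stringsAsFactors = FALSE)

# Canonicalize first, then drop rows that have become exact duplicates.
a$oid <- canonicalize_orcid(a$oid)
a <- unique(a)
```

Doing unique() before canonicalizing would keep both Carberry rows, because the URL form and the bare iD differ as strings.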

>>>>> Dirk Eddelbuettel writes:

>> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote:
>> |
>> | Hi Kurt,
>> |
>> | On 20 August 2024 at 14:29, Kurt Hornik wrote:
>> | | I think for now you could use something like what I attach below.
>> | |
>> | | Not ideal: I had not too long ago started adding orcidtools.R to tools,
>> | | which e.g. has .persons_from_metadata(), but that works on the unpacked
>> | | sources and not the CRAN package db.  Need to think about that ...
>> |
>> | We need something like that too as I fat-fingered the string 'ORCID'.  See
>> | fortunes::fortune("Dirk can type").
>> |
>> | Will try the function below later.  Many thanks for sending it along.

>> Very nice. Resisted my common impulse to make it a data.table for easy
>> sorting via keys etc.

>> After running your code the line

>>    head(with(a, sort_by(a, ~ family + given)), 100)

>> shows that we need a bit more QA as person entries are not properly split
>> between 'family' and 'given', use the URL and that we have repeats.
>> Excluding those is next.

> Right.  One should canonicalize the ORCID (having the URLs is from being
> nice) and then do unique() ...

> Best
> -k

>> Dirk

>> | Dirk
>> |
>> | |
>> | | Best
>> | | -k
>> | |
>> | | ********************************************************************
>> | | x <- tools::CRAN_package_db()
>> | | a <- lapply(x[["Authors@R"]],
>> | |             function(a) {
>> | |                 if(!is.na(a)) {
>> | |                     a <- tryCatch(utils:::.read_authors_at_R_field(a),
>> | |                                   error = identity)
>> | |                     if (inherits(a, "person"))
>> | |                         return(a)
>> | |                 }
>> | |                 NULL
>> | |             })
>> | | a <- do.call(c, a)
>> | | a <- lapply(a,
>> | |             function(e) {
>> | |                 if(is.null(o <- e$comment["ORCID"]) || is.na(o))
>> | |                     return(NULL)
>> | |                 cbind(given = paste(e$given, collapse = " "),
>> | |                       family = paste(e$family, collapse = " "),
>> | |                       oid = unname(o))
>> | |             })
>> | | a <- as.data.frame(do.call(rbind, a))
>> | | ********************************************************************
>> | |
>> | | > Salut Thierry,
>> | |
>> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote:
>> | | > | Happy to help. I'm working on a new version of the checklist
>> | | > | package. I could export the function if that makes it easier
>> | | > | for you.
>> | |
>> | | > Would be happy to help / iterate. Can you take a stab at making the
>> | | > per-column split more robust so that we can bulk-process all non-NA
>> | | > entries of the returned db?
>> | |
>> | | > Best, Dirk
>> | |
>> | | > --
>> | | > dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel