On Thu, 07 Oct 2021, Leonard Mada via R-help writes: > Dear R Users, > > > I wrote a minimal parser to extract strings and > comments from the function definitions. > > > The string extraction works fine. But there are no comments: > > a.) Are the comments stripped from the compiled packages? > > b.) Alternatively: Is the deparse() not suited for this task? > > b.2.) Is deparse() parsing the function/expression itself? > > [see code for extract.str.fun() function below] > > > ### All strings in "base" > extract.str.pkg("base") > # type = 2 for Comments: > extract.str.pkg("base", type=2) > extract.str.pkg("sp", type=2) > extract.str.pkg("NetLogoR", type=2) > > The code for the 2 functions (extract.str.pkg & > extract.str.fun) and the code for the parse.simple() > parser are below. > > > Sincerely, > > > Leonard > > ======= > > The latest code is on GitHub: > > https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R > > > ### Code to process functions in packages: > extract.str.fun = function(fn, pkg, type=1, strip=TRUE) { > fn = as.symbol(fn); pkg = as.symbol(pkg); > fn = list(substitute(pkg ::: fn)); > # deparse > s = paste0(do.call(deparse, fn), collapse=""); > npos = parse.simple(s); > extract.str(s, npos[[type]], strip=strip) > } > extract.str.pkg = function(pkg, type=1, exclude.z = TRUE, strip=TRUE) { > nms = ls(getNamespace(pkg)); > l = lapply(nms, function(fn) extract.str.fun(fn, > pkg, type=type, strip=strip)); > if(exclude.z) { > hasStr = sapply(l, function(s) length(s) >= 1); > nms = nms[hasStr]; > l = l[hasStr]; > } > names(l) = nms; > return(l); > } > > ### minimal Parser: > # - proof of concept; > # - may be useful to process non-conformant R "code", e.g.: > # "{\"abc\" + \"bcd\"} {FUN}"; (still TODO) > # Warning: > # - not thoroughly checked & > # may be a little buggy! > > parse.simple = function(x, eol="\n") { > len = nchar(x); > n.comm = list(integer(0), integer(0)); > n.str = list(integer(0), integer(0)); > is.hex = function(ch) { > # Note: only for 1 character! > return((ch >= "0" && ch <= "9") || > (ch >= "A" && ch <= "F") || > (ch >= "a" && ch <= "f")); > } > npos = 1; > while(npos <= len) { > s = substr(x, npos, npos); > # State: COMMENT > if(s == "#") { > n.comm[[1]] = c(n.comm[[1]], npos); > while(npos < len) { > npos = npos + 1; > if(substr(x, npos, npos) == eol) break; > } > n.comm[[2]] = c(n.comm[[2]], npos); > npos = npos + 1; next; > } > # State: STRING > if(s == "\"" || s == "'") { > n.str[[1]] = c(n.str[[1]], npos); > while(npos < len) { > npos = npos + 1; > se = substr(x, npos, npos); > if(se == "\\") { > npos = npos + 1; > # simple escape vs Unicode: > if(substr(x, npos, npos) != "u") next; > len.end = min(len, npos + 4); > npos = npos + 1; > isAllHex = TRUE; > while(npos <= len.end) { > se = substr(x, npos, npos); > if( ! is.hex(se)) { isAllHex = FALSE; break; } > npos = npos + 1; > } > if(isAllHex) next; > } > if(se == s) break; > } > n.str[[2]] = c(n.str[[2]], npos); > npos = npos + 1; next; > } > npos = npos + 1; > } > return(list(str = n.str, comm = n.comm)); > } > > > extract.str = function(s, npos, strip=FALSE) { > if(length(npos[[1]]) == 0) return(character(0)); > strip.FUN = if(strip) { > function(id) { > if(npos[[1]][[id]] + 1 < npos[[2]][[id]]) { > nStart = npos[[1]][[id]] + 1; > nEnd = npos[[2]][[id]] - 1; # TODO: > Error with malformed string > return(substr(s, nStart, nEnd)); > } else { > return(""); > } > } > } else function(id) substr(s, npos[[1]][[id]], npos[[2]][[id]]); > sapply(seq(length(npos[[1]])), strip.FUN); > } >
On a.) There is an option "keep.source" that controls this behaviour. When you install a package via R CMD INSTALL, you can specify the option; see R CMD INSTALL --help . There is also the "remindR" package on CRAN which (I think) does something similar. -- Enrico Schumann Lucerne, Switzerland http://enricoschumann.net ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.