Re: [R] Extracting Comments from Functions/Packages

2021-10-07 Thread Enrico Schumann
On Thu, 07 Oct 2021, Leonard Mada via R-help writes:

> Dear R Users,
>
>
> I wrote a minimal parser to extract strings and
> comments from the function definitions.
>
>
> The string extraction works fine. But there are no comments:
>
> a.) Are the comments stripped from the compiled packages?
>
> b.) Alternatively: Is the deparse() not suited for this task?
>
> b.2.) Is deparse() parsing the function/expression itself?
>
> [see code for extract.str.fun() function below]
>
>
> ### All strings in "base"
> extract.str.pkg("base")
> # type = 2 for Comments:
> extract.str.pkg("base", type=2)
> extract.str.pkg("sp", type=2)
> extract.str.pkg("NetLogoR", type=2)
>
> The code for the 2 functions (extract.str.pkg &
> extract.str.fun) and the code for the parse.simple()
> parser are below.
>
>
> Sincerely,
>
>
> Leonard
>
> ===
>
> The latest code is on GitHub:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R
>
>
> ### Code to process functions in packages:
> extract.str.fun = function(fn, pkg, type=1, strip=TRUE) {
>     fn = as.symbol(fn); pkg = as.symbol(pkg);
>     fn = list(substitute(pkg ::: fn));
>     # deparse
>     s = paste0(do.call(deparse, fn), collapse="");
>     npos = parse.simple(s);
>     extract.str(s, npos[[type]], strip=strip)
> }
> extract.str.pkg = function(pkg, type=1, exclude.z = TRUE, strip=TRUE) {
>     nms = ls(getNamespace(pkg));
>     l = lapply(nms, function(fn) extract.str.fun(fn,
> pkg, type=type, strip=strip));
>     if(exclude.z) {
>         hasStr = sapply(l, function(s) length(s) >= 1);
>         nms = nms[hasStr];
>         l = l[hasStr];
>     }
>     names(l) = nms;
>     return(l);
> }
>
> ### minimal Parser:
> # - proof of concept;
> # - may be useful to process non-conformant R "code", e.g.:
> #   "{\"abc\" + \"bcd\"} {FUN}"; (still TODO)
> # Warning:
> # - not thoroughly checked &
> #   may be a little buggy!
>
> parse.simple = function(x, eol="\n") {
>     len = nchar(x);
>     n.comm = list(integer(0), integer(0));
>     n.str  = list(integer(0), integer(0));
>     is.hex = function(ch) {
>         # Note: only for 1 character!
>         return((ch >= "0" && ch <= "9") ||
>             (ch >= "A" && ch <= "F") ||
>             (ch >= "a" && ch <= "f"));
>     }
>     npos = 1;
>     while(npos <= len) {
>         s = substr(x, npos, npos);
>         # State: COMMENT
>         if(s == "#") {
>             n.comm[[1]] = c(n.comm[[1]], npos);
>             while(npos < len) {
>                 npos = npos + 1;
>                 if(substr(x, npos, npos) == eol) break;
>             }
>             n.comm[[2]] = c(n.comm[[2]], npos);
>             npos = npos + 1; next;
>         }
>         # State: STRING
>         if(s == "\"" || s == "'") {
>             n.str[[1]] = c(n.str[[1]], npos);
>             while(npos < len) {
>                 npos = npos + 1;
>                 se = substr(x, npos, npos);
>                 if(se == "\\") {
>                     npos = npos + 1;
>                     # simple escape vs Unicode:
>                     if(substr(x, npos, npos) != "u") next;
>                     len.end = min(len, npos + 4);
>                     npos = npos + 1;
>                     isAllHex = TRUE;
>                     while(npos <= len.end) {
>                         se = substr(x, npos, npos);
>                         if( ! is.hex(se)) { isAllHex = FALSE; break; }
>                         npos = npos + 1;
>                     }
>                     if(isAllHex) next;
>                 }
>                 if(se == s) break;
>             }
>             n.str[[2]] = c(n.str[[2]], npos);
>             npos = npos + 1; next;
>         }
>         npos = npos + 1;
>     }
>     return(list(str = n.str, comm = n.comm));
> }
>
>
> extract.str = function(s, npos, strip=FALSE) {
>     if(length(npos[[1]]) == 0) return(character(0));
>     strip.FUN = if(strip) {
>             function(id) {
>                 if(npos[[1]][[id]] + 1 < npos[[2]][[id]]) {
>                     nStart = npos[[1]][[id]] + 1;
>                     nEnd = npos[[2]][[id]] - 1; # TODO:
> Error with malformed string
>                     return(substr(s, nStart, nEnd));
>                 } else {
>                     return("");
>                 }
>             }
>         } else function(id) substr(s, npos[[1]][[id]], npos[[2]][[id]]);
>     sapply(seq(length(npos[[1]])), strip.FUN);
> }
>

On a.) There is an option "keep.source" that controls
   this behaviour. When you install a package via
   R CMD INSTALL, you can specify the option; see
   R CMD INSTALL --help .

There is also the "remindR" package on CRAN which
(I think) does something similar.


-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

[R] Extracting Comments from Functions/Packages

2021-10-07 Thread Leonard Mada via R-help

Dear R Users,


I wrote a minimal parser to extract strings and comments from the 
function definitions.



The string extraction works fine. But there are no comments:

a.) Are the comments stripped from the compiled packages?

b.) Alternatively: Is the deparse() not suited for this task?

b.2.) Is deparse() parsing the function/expression itself?

[see code for extract.str.fun() function below]


### All strings in "base"
extract.str.pkg("base")
# type = 2 for Comments:
extract.str.pkg("base", type=2)
extract.str.pkg("sp", type=2)
extract.str.pkg("NetLogoR", type=2)

The code for the 2 functions (extract.str.pkg & extract.str.fun) and the 
code for the parse.simple() parser are below.



Sincerely,


Leonard

===

The latest code is on GitHub:

https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R


### Code to process functions in packages:
extract.str.fun = function(fn, pkg, type=1, strip=TRUE) {
    fn = as.symbol(fn); pkg = as.symbol(pkg);
    fn = list(substitute(pkg ::: fn));
    # deparse
    s = paste0(do.call(deparse, fn), collapse="");
    npos = parse.simple(s);
    extract.str(s, npos[[type]], strip=strip)
}
extract.str.pkg = function(pkg, type=1, exclude.z = TRUE, strip=TRUE) {
    nms = ls(getNamespace(pkg));
    l = lapply(nms, function(fn) extract.str.fun(fn, pkg, type=type, 
strip=strip));

    if(exclude.z) {
        hasStr = sapply(l, function(s) length(s) >= 1);
        nms = nms[hasStr];
        l = l[hasStr];
    }
    names(l) = nms;
    return(l);
}

### minimal Parser:
# - proof of concept;
# - may be useful to process non-conformant R "code", e.g.:
#   "{\"abc\" + \"bcd\"} {FUN}"; (still TODO)
# Warning:
# - not thoroughly checked &
#   may be a little buggy!

parse.simple = function(x, eol="\n") {
    len = nchar(x);
    n.comm = list(integer(0), integer(0));
    n.str  = list(integer(0), integer(0));
    is.hex = function(ch) {
        # Note: only for 1 character!
        return((ch >= "0" && ch <= "9") ||
            (ch >= "A" && ch <= "F") ||
            (ch >= "a" && ch <= "f"));
    }
    npos = 1;
    while(npos <= len) {
        s = substr(x, npos, npos);
        # State: COMMENT
        if(s == "#") {
            n.comm[[1]] = c(n.comm[[1]], npos);
            while(npos < len) {
                npos = npos + 1;
                if(substr(x, npos, npos) == eol) break;
            }
            n.comm[[2]] = c(n.comm[[2]], npos);
            npos = npos + 1; next;
        }
        # State: STRING
        if(s == "\"" || s == "'") {
            n.str[[1]] = c(n.str[[1]], npos);
            while(npos < len) {
                npos = npos + 1;
                se = substr(x, npos, npos);
                if(se == "\\") {
                    npos = npos + 1;
                    # simple escape vs Unicode:
                    if(substr(x, npos, npos) != "u") next;
                    len.end = min(len, npos + 4);
                    npos = npos + 1;
                    isAllHex = TRUE;
                    while(npos <= len.end) {
                        se = substr(x, npos, npos);
                        if( ! is.hex(se)) { isAllHex = FALSE; break; }
                        npos = npos + 1;
                    }
                    if(isAllHex) next;
                }
                if(se == s) break;
            }
            n.str[[2]] = c(n.str[[2]], npos);
            npos = npos + 1; next;
        }
        npos = npos + 1;
    }
    return(list(str = n.str, comm = n.comm));
}


extract.str = function(s, npos, strip=FALSE) {
    if(length(npos[[1]]) == 0) return(character(0));
    strip.FUN = if(strip) {
            function(id) {
                if(npos[[1]][[id]] + 1 < npos[[2]][[id]]) {
                    nStart = npos[[1]][[id]] + 1;
                    nEnd = npos[[2]][[id]] - 1; # TODO: Error with 
malformed string

                    return(substr(s, nStart, nEnd));
                } else {
                    return("");
                }
            }
        } else function(id) substr(s, npos[[1]][[id]], npos[[2]][[id]]);
    sapply(seq(length(npos[[1]])), strip.FUN);
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.