Re: [R-pkg-devel] duplicate function during build
On Sat, 23 Jul 2016, ProfJCNash writes:

> Thanks Sven. That indeed works. And if anyone has ideas how it could be
> put into R so Windows users could benefit, I'm sure it would be useful
> in checks of packages.

You could use R functionality to rewrite the shell commands. Perhaps
along these lines:

--8<---cut here---start->8---
fun_names <- function(dir,
                      duplicates_only = TRUE,
                      file_pattern = "[.][rR]$",
                      fun_pattern = " *([^\\s]+) *<- *function.*") {

    files <- dir(dir, pattern = file_pattern, full.names = TRUE)
    ans <- data.frame(fun = character(0),
                      file = character(0),
                      line = integer(0),
                      stringsAsFactors = FALSE)

    for (f in files) {
        txt <- readLines(f)
        ## perl = TRUE so that "\\s" means whitespace, as in gsub() below
        fun.lines <- grepl(fun_pattern, txt, perl = TRUE)
        if (any(fun.lines)) {
            ans <- rbind(ans,
                         data.frame(fun = gsub(fun_pattern, "\\1",
                                               txt[fun.lines], perl = TRUE),
                                    file = f,
                                    line = which(fun.lines),
                                    stringsAsFactors = FALSE))
        }
    }
    ans <- ans[order(ans[["fun"]]), ]
    if (duplicates_only) {
        ## keep every occurrence of a duplicated name, including the first
        d <- duplicated(ans[["fun"]])
        d0 <- match(unique(ans[["fun"]][d]), ans[["fun"]])
        ans <- ans[sort(c(d0, which(d))), ]
    }
    ans
}
--8<---cut here---end--->8---

One would then call the function on a directory. For instance,

  fun_names("~/Packages/NMOF/R")

gives me output

          fun                                     file line
  10 cfHeston       /home/es/Packages/NMOF/R/callCF.R   41
  18 cfHeston /home/es/Packages/NMOF/R/callHestoncf.R   29
  ## [...]

But it will be tricky to catch only those re-definitions of functions
that have been left in the files by mistake. For instance, I often
define short helper functions within other functions, and such helper
functions might then get flagged, too.

Kind regards
     Enrico

> In other investigations of this, I realized that install.R has to
> prepare the .rdb and .rdx files and at that stage duplication might be
> detected. If install.R puts both versions of a duplicated name into
> these files, then the lazy load of library() or require() could be a
> place where detection would be useful, though only one of the names gets
> actually made available for use. However, my expertise with this
> internal aspect of R is rather weak.
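The caveat about helper functions defined inside other functions can be narrowed with a simple heuristic: count only definitions that start in the first column. The following is a minimal sketch under that assumption (consistently indented code); the function name `top_level_dups` is made up for illustration, and the pattern will still miss or misjudge unusual layouts.

```shell
#!/bin/sh
# Report name, file and line for function names defined more than once
# at top level. "Top level" is approximated as "the assignment starts in
# column 1", which is an assumption about indentation style.
top_level_dups() {
  # $@: R source files to scan
  awk '
    /^[A-Za-z.][A-Za-z0-9._]*[ \t]*(<-|=)[ \t]*function/ {
      name = $1
      sub(/(<-|=).*/, "", name)   # also handles "f<-function" without spaces
      count[name]++
      where[name] = where[name] name "\t" FILENAME "\t" FNR "\n"
    }
    END {
      # print every occurrence of each name that was seen more than once
      for (n in count) if (count[n] > 1) printf "%s", where[n]
    }
  ' "$@" | sort
}
```

Called as `top_level_dups R/*.R` from a package root, this prints tab-separated name, file and line for each repeated top-level definition. Indented helper definitions are ignored, so a helper that is deliberately reused inside two different functions is not flagged.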
Re: [R-pkg-devel] duplicate function during build
I don't know if ctags works with R files, but ctags does something
similar to what you are asking for in other languages, and can be
integrated into git using hooks, as in:

https://robots.thoughtbot.com/use-git-hooks-to-automate-annoying-tasks

Don't know if this helps, but thought I would pass it along.

-Roy

> On Jul 23, 2016, at 10:20 AM, ProfJCNash wrote:
>
> Thanks Sven. That indeed works. And if anyone has ideas how it could be
> put into R so Windows users could benefit, I'm sure it would be useful
> in checks of packages.
>
> In other investigations of this, I realized that install.R has to
> prepare the .rdb and .rdx files and at that stage duplication might be
> detected. If install.R puts both versions of a duplicated name into
> these files, then the lazy load of library() or require() could be a
> place where detection would be useful, though only one of the names gets
> actually made available for use. However, my expertise with this
> internal aspect of R is rather weak.
>
> Cheers, JN

**
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
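Leaving ctags aside, the hook idea can be sketched directly. A git pre-commit hook is an executable script at `.git/hooks/pre-commit`, and a non-zero exit status aborts the commit. The sketch below assumes the repository root is also the package root and that sources end in `.R`; the function name `check_dup_funs` is made up, and the check itself is the crude text match discussed elsewhere in this thread, not a real parse of the R code.

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-commit (the file must be executable).
# Assumes the hook runs from the package root, where R/ lives.
check_dup_funs() {
  # $1: directory holding the package's R files (default: R)
  dir=${1:-R}
  # Strip comments, take the first field of lines mentioning "function",
  # and keep the names that occur more than once.
  dups=$(sed -e 's/#.*//' "$dir"/*.R 2>/dev/null |
    awk '/function/{print $1}' | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "Duplicate function names found:" >&2
    echo "$dups" >&2
    return 1   # non-zero status makes git abort the commit
  fi
  return 0
}

check_dup_funs R || exit 1
```

Because the match is purely textual, lines such as `lapply(x, function(i) ...)` can produce false positives, so a hook like this is probably better treated as a prompt to look than as a hard gate.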
Re: [R-pkg-devel] duplicate function during build
Thanks Sven. That indeed works. And if anyone has ideas how it could be
put into R so Windows users could benefit, I'm sure it would be useful
in checks of packages.

In other investigations of this, I realized that install.R has to
prepare the .rdb and .rdx files and at that stage duplication might be
detected. If install.R puts both versions of a duplicated name into
these files, then the lazy load of library() or require() could be a
place where detection would be useful, though only one of the names
actually gets made available for use. However, my expertise with this
internal aspect of R is rather weak.

Cheers, JN

On 16-07-23 12:04 PM, Sven E. Templer wrote:
> Although it might help, learning/using git does not tackle this specific
> problem. I suggest code that does:
>
> sed -e 's/^[\ \t]*//' -e 's/#.*//' R/* | awk '/function/{print $1}' | sort | uniq -d
>
> or
>
> https://gist.github.com/setempler/7fcf2a3a737ce1293e0623d2bb8e08ed
> (any comments welcome)
>
> If one knows how to code in R, it might be more productive to develop a
> tiny tool for that, instead of learning a new (and complex) one (such as
> git).
>
> Nevertheless, git is great!
>
> Best wishes,
>
> Sven
>
> ---
>
> web: www.templer.se
> twitter: @setempler
Re: [R-pkg-devel] duplicate function during build
Hadley,

My initial reflex reaction was svn/git too, but then I could not see how
to use either to identify the problem John had. If you have a good
svn/git command for identifying duplicate functions, could you please
post it? I am curious. (BTW, John does use svn, and possibly git too.)

Thanks,
Paul

On 07/23/2016 10:17 AM, Hadley Wickham wrote:
> I think this sort of meta problem is best solved with svn/git because you
> can easily see if the changes you think you made align with the changes you
> actually made. Learning svn or git is a lot of work, but the payoff is
> worth it.
>
> Hadley
Re: [R-pkg-devel] duplicate function during build
Although it might help, learning/using git does not tackle this specific
problem. I suggest code that does:

sed -e 's/^[\ \t]*//' -e 's/#.*//' R/* | awk '/function/{print $1}' | sort | uniq -d

or

https://gist.github.com/setempler/7fcf2a3a737ce1293e0623d2bb8e08ed
(any comments welcome)

If one knows how to code in R, it might be more productive to develop a
tiny tool for that, instead of learning a new (and complex) one (such as
git).

Nevertheless, git is great!

Best wishes,

Sven

---

web: www.templer.se
twitter: @setempler

> On 23 Jul 2016, at 16:17, Hadley Wickham wrote:
>
> I think this sort of meta problem is best solved with svn/git because you
> can easily see if the changes you think you made align with the changes you
> actually made. Learning svn or git is a lot of work, but the payoff is
> worth it.
>
> Hadley
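To see what the comment-stripping step in the sed/awk pipeline buys, here is a small self-contained sketch on made-up fixture files. It uses the POSIX class `[[:space:]]` in place of `[\ \t]`, on the assumption that `\t` inside a bracket expression is not handled the same way by every sed implementation; the file names and function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical fixture: one real duplicate (myfun), one commented-out
# copy (oldfun), and comment lines that merely mention "function".
demo=$(mktemp -d)
mkdir "$demo/R"
cat > "$demo/R/a.R" <<'EOF'
# helper function, first version
myfun <- function(x) x + 1
# oldfun <- function(x) x
EOF
cat > "$demo/R/b.R" <<'EOF'
# helper function, second version
  myfun <- function(x) x + 2
oldfun <- function(x) x
EOF

# Without cleanup, awk takes "#" as the first field of the comment
# lines, so "#" itself is reported as a duplicate.
raw=$(cd "$demo" && awk '/function/{print $1}' R/* | sort | uniq -d)

# With cleanup: drop leading whitespace, then drop comments.
clean=$(cd "$demo" && sed -e 's/^[[:space:]]*//' -e 's/#.*//' R/* |
  awk '/function/{print $1}' | sort | uniq -d)

printf 'raw:\n%s\n' "$raw"
printf 'clean:\n%s\n' "$clean"
```

One side effect worth keeping in mind: `s/#.*//` also cuts a line at a `#` that sits inside a string literal, which is harmless for this purpose but means the cleaned text is not valid R.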
Re: [R-pkg-devel] duplicate function during build
I think this sort of meta problem is best solved with svn/git because you
can easily see if the changes you think you made align with the changes you
actually made. Learning svn or git is a lot of work, but the payoff is
worth it.

Hadley

On Friday, July 22, 2016, ProfJCNash wrote:

> In trying to rationalize some files in a package I'm working on, I
> copied a function from one file to another, but forgot to change the
> name of one of them. It turns out the name of the file containing the
> "old" function was later in collation sequence than the one I was
> planning to be the "new" one. To debug some issues, I put some print()
> and cat() statements in the "new" file, but after building the package,
> they weren't there. Turns out the "old" function got installed, as might
> be expected if files processed in order. Debugging this took about 2
> hours of slightly weird effort with 2 machines and 3 OS distributions
> before I realized the problem. It's fairly obvious that I should expect
> issues in this case, but not so clear how to detect the source of the
> problem.
>
> Question: Has anyone created a script to catch such duplicate functions
> from different files during build? I think a warning message that there
> are duplicate functions could save some time and effort. Maybe it's
> already there, but I saw no obvious message. In this case, I'm only
> working in R.
>
> I've found build.R in the R tarball, which is where I suspect such a
> check should go, and I'm willing to prepare a patch when I figure out
> how this should be done. However, it seems worth asking if anyone has
> needed to do this before. I've already done some searching, but the
> results seem to pick up quite different posts than I need.
>
> Cheers, JN

--
http://hadley.nz
Re: [R-pkg-devel] duplicate function during build
Not during build, but before it, you could run the following in bash
from the package source root:

$ awk '/function/{print $1}' R/* | uniq -d

To find the files, use:

$ grep R/*

Best wishes,

Sven

> On 23 Jul 2016, at 05:01, ProfJCNash wrote:
>
> In trying to rationalize some files in a package I'm working on, I
> copied a function from one file to another, but forgot to change the
> name of one of them. It turns out the name of the file containing the
> "old" function was later in collation sequence than the one I was
> planning to be the "new" one. To debug some issues, I put some print()
> and cat() statements in the "new" file, but after building the package,
> they weren't there. Turns out the "old" function got installed, as might
> be expected if files processed in order. Debugging this took about 2
> hours of slightly weird effort with 2 machines and 3 OS distributions
> before I realized the problem. It's fairly obvious that I should expect
> issues in this case, but not so clear how to detect the source of the
> problem.
>
> Question: Has anyone created a script to catch such duplicate functions
> from different files during build? I think a warning message that there
> are duplicate functions could save some time and effort. Maybe it's
> already there, but I saw no obvious message. In this case, I'm only
> working in R.
>
> I've found build.R in the R tarball, which is where I suspect such a
> check should go, and I'm willing to prepare a patch when I figure out
> how this should be done. However, it seems worth asking if anyone has
> needed to do this before. I've already done some searching, but the
> results seem to pick up quite different posts than I need.
>
> Cheers, JN
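A note on the one-liner in this message: `uniq -d` compares only adjacent lines, so duplicates that sit in different files are reported only if the names are sorted first, as the sed/awk variant posted later in the thread does. The sketch below shows this on a made-up demo directory and then uses `grep -n` to locate each duplicate; the directory layout and the names `myfun` and `other` are hypothetical, and the grep pattern is an assumption about how the definitions are written.

```shell
#!/bin/sh
# Hypothetical demo package: two files that both define myfun().
demo=$(mktemp -d)
mkdir "$demo/R"
printf 'myfun <- function(x) x + 1\n' > "$demo/R/a_new.R"
printf 'other <- function(x) x\nmyfun <- function(x) x + 2\n' > "$demo/R/z_old.R"

# List the first field of lines mentioning "function", sort so that equal
# names become adjacent, then keep only the repeated ones.
dups=$(cd "$demo" && awk '/function/{print $1}' R/* | sort | uniq -d)
echo "$dups"

# Locate each duplicate: with several files, grep -n prints
# "file:line:text" for every matching line.
for f in $dups; do
  (cd "$demo" && grep -n "^$f *<- *function" R/*)
done
```

Since the name is pasted into a regular expression, a dot in a function name matches any character here; that is usually harmless for locating definitions, but it is a shortcut rather than an exact match.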