Re: [R] by function does not separate output from function with multiple parts
Dear John,

Printing inside the function is problematic: your function itself does NOT print the labels. Just as a clarification:

F = factor(rep(1:2, 2))
by(data.frame(V = 1:4, F = F), F, function(x) { print(x); return(NULL); })
#   V F
# 1 1 1
# 3 3 1
#   V F
# 2 2 2
# 4 4 2
# F: 1   <- this is NOT printed inside the function
# NULL
# -
# F: 2
# NULL

### Return Results
by(data.frame(V = 1:4, F = F), F, function(x) { return(x); })
# F: 1
#   V F
# 1 1 1
# 3 3 1
# --
# F: 2
#   V F
# 2 2 2
# 4 4 2

Maybe others on the list can offer further assistance.

Sincerely, Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
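A sketch of a possible workaround (the loop over names() is my own convention, not part of the thread): return each group from by() unchanged and do the labelled printing afterwards, so the labels and the rows stay together.

```r
# Return each group from by() and print the labels ourselves,
# instead of printing inside the function passed to by().
F <- factor(rep(1:2, 2))
df <- data.frame(V = 1:4, F = F)
res <- by(df, df$F, function(x) x)   # no printing inside the function
for (lvl in names(res)) {
    cat("F:", lvl, "\n")
    print(res[[lvl]])
}
```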
Re: [R] Issue from R-devel: subset on table
Another solution could also be possible - see below.

On 10/21/2023 10:38 PM, Leonard Mada wrote:

My mistake! It actually does something else, which is incorrect. One could still use (although the code is more difficult to read):

subset(tmp <- table(sample(1:10, 100, T)), tmp > 10)

2) Alternative solution
Enhance subset.default to accept formulas as well, e.g.:

subset.default = function (x, subset, ...) {
    if(inherits(subset, "formula")) {
        subset = subset[[2]];
        subset = eval(subset, list("." = x));
    } else if(! is.logical(subset)) stop("'subset' must be logical")
    x[subset & ! is.na(subset)]
}

# it works now; but the results depend on sample():
subset(table(sample(1:10, 100, T)), ~ . > 10)
subset(table(sample(1:10, 100, T)), ~ . > 10 & . < 13)

Sincerely, Leonard

On 10/21/2023 10:26 PM, Leonard Mada wrote:

Dear List Members,

There was recently an issue on R-devel (which I noticed only very late):
https://stat.ethz.ch/pipermail/r-devel/2023-October/082943.html

It is possible to use subset as well, almost as initially stated:

subset(table(sample(1:5, 100, T)), table > 10)
# Error in table > 10 :
#   comparison (>) is possible only for atomic and list types

subset(table(sample(1:5, 100, T)), 'table' > 10)
#  1  2  3  4  5
# 21 13 15 28 23

Note: The result was ok only by chance! It is incorrect in general.

Works with the letters-example as well.

Sincerely, Leonard
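Since the results above depend on sample(), a fixed seed makes the formula-based call reproducible; a minimal sketch re-defining the same subset.default as above (the seed value 1 is an arbitrary choice):

```r
# Same subset.default as proposed in the message, with a fixed seed
# so the filtered table is reproducible.
subset.default <- function(x, subset, ...) {
    if (inherits(subset, "formula")) {
        subset <- subset[[2]]
        subset <- eval(subset, list("." = x))
    } else if (!is.logical(subset)) stop("'subset' must be logical")
    x[subset & !is.na(subset)]
}
set.seed(1)
tbl <- table(sample(1:10, 100, TRUE))
subset(tbl, ~ . > 10)   # only the cells with counts above 10
```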
Re: [R] Issue from R-devel: subset on table
My mistake! It actually does something else, which is incorrect. One could still use (although the code is more difficult to read):

subset(tmp <- table(sample(1:10, 100, T)), tmp > 10)

Sincerely, Leonard

On 10/21/2023 10:26 PM, Leonard Mada wrote:

Dear List Members,

There was recently an issue on R-devel (which I noticed only very late):
https://stat.ethz.ch/pipermail/r-devel/2023-October/082943.html

It is possible to use subset as well, almost as initially stated:

subset(table(sample(1:5, 100, T)), table > 10)
# Error in table > 10 :
#   comparison (>) is possible only for atomic and list types

subset(table(sample(1:5, 100, T)), 'table' > 10)
#  1  2  3  4  5
# 21 13 15 28 23

Note: The result was ok only by chance! It is incorrect in general.

Works with the letters-example as well.

Sincerely, Leonard
[R] Issue from R-devel: subset on table
Dear List Members,

There was recently an issue on R-devel (which I noticed only very late):
https://stat.ethz.ch/pipermail/r-devel/2023-October/082943.html

It is possible to use subset as well, almost as initially stated:

subset(table(sample(1:5, 100, T)), table > 10)
# Error in table > 10 :
#   comparison (>) is possible only for atomic and list types

subset(table(sample(1:5, 100, T)), 'table' > 10)
#  1  2  3  4  5
# 21 13 15 28 23

Works with the letters-example as well.

Sincerely, Leonard
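Why the quoted variant "worked" only by chance: 'table' > 10 compares the string "table" with 10 coerced to "10", which yields a single TRUE that is recycled over the whole table, i.e. no filtering happens at all. A small sketch (the toy vector is mine, for illustration):

```r
# 'table' > 10 coerces 10 to "10" and compares strings: a single TRUE.
'table' > 10
# [1] TRUE
x <- c(a = 5, b = 15, c = 25)   # toy counts
x['table' > 10]   # single TRUE recycled: returns ALL elements
x[x > 10]         # the intended filter
```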
Re: [R] Best way to test for numeric digits?
Dear Rui,

On 10/18/2023 8:45 PM, Rui Barradas wrote:

split_chem_elements <- function(x, rm.digits = TRUE) {
    regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
    if(rm.digits) {
        stringr::str_replace_all(mol, regex, "#") |>
            strsplit("#|[[:digit:]]") |>
            lapply(\(x) x[nchar(x) > 0L])
    } else {
        strsplit(x, regex, perl = TRUE)
    }
}

split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
    s <- strsplit(x, regex, perl = TRUE)
    if(rm.digits) {
        s <- lapply(s, \(x) x[grep("[[:digit:]]+", x, invert = TRUE)])
    }
    s
}

You have a glitch in the code of the first function: mol is hard-coded instead of x. The times are similar, after correcting for that glitch.

Note:
- grep("[[:digit:]]", ...) runs almost twice as slowly as grep("[0-9]", ...)!
- corrected results below;

Sincerely, Leonard

### Corrected code

split_chem_elements <- function(x, rm.digits = TRUE) {
    regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
    if(rm.digits) {
        stringr::str_replace_all(x, regex, "#") |>
            strsplit("#|[[:digit:]]") |>
            lapply(\(x) x[nchar(x) > 0L])
    } else {
        strsplit(x, regex, perl = TRUE)
    }
}

split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
    s <- strsplit(x, regex, perl = TRUE)
    if(rm.digits) {
        s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)])
    }
    s
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol1 <- rep(mol, 1)
system.time( split_chem_elements(mol1) )
#  user  system elapsed
#  0.58    0.00    0.58
system.time( split.symbol.character(mol1) )
#  user  system elapsed
#  0.67    0.00    0.67
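The timing note about "[[:digit:]]" vs "[0-9]" can be checked directly; a minimal sketch (the test vector and repetition count are arbitrary, and the exact ratio depends on platform and locale):

```r
# Compare the POSIX character class with the explicit range; both
# must select the same elements, only the speed differs.
x <- rep(c("Cl", "3", "Al", "16", "O4"), 2e5)
t.posix <- system.time(i1 <- grep("[[:digit:]]", x))["elapsed"]
t.range <- system.time(i2 <- grep("[0-9]", x))["elapsed"]
identical(i1, i2)   # same selection
c(posix = t.posix, range = t.range)
```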
Re: [R] Best way to test for numeric digits?
Dear Rui,

Thank you for your reply. I do actually have access to the chemical symbols: I have started to refactor and enhance the Rpdb package, see Rpdb::elements:
https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs to iterate through all chemical symbols (in decreasing nchar). Elements like C, and especially O, P or S, appear late in the regex expression - but are quite common in chemistry. The alternative regex is (in this respect) simpler. It actually works (once you know about the workaround).

Q: My question was whether there is anything like is.numeric, but applied to each element of a vector.

Sincerely, Leonard

On 10/18/2023 6:53 PM, Rui Barradas wrote:

On 18/10/2023 15:59, Leonard Mada via R-help wrote:

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K", "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3

The above requires the use of the suppressWarnings function. Are there any better ways?

I was working to extract chemical elements from a formula, something like this:

split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
    # stringi::stri_split(x, regex = regex);
    s = strsplit(x, regex, perl = TRUE);
    if(rm.digits) {
        s = lapply(s, function(s) {
            isNotD = is.na(suppressWarnings(as.numeric(s)));
            s = s[isNotD];
        });
    }
    return(s);
}
split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))

Sincerely, Leonard

Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

Hello,

If you want to extract chemical elements symbols, the following might work. It uses the periodic table in GitHub package chemr and a stringr function.

devtools::install_github("paleolimbot/chemr")

split_chem_elements <- function(x) {
    data(pt, package = "chemr", envir = environment())
    el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
    pat <- paste(el, collapse = "|")
    stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#> [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"

It is also possible to rewrite the function without calls to non-base packages, but that will take some more work.

Hope this helps,
Rui Barradas
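Regarding the closing remark, a base-R rewrite is indeed not much work; a sketch (assumptions: a small hand-made symbol vector stands in for chemr's full periodic table, and perl = TRUE makes the longest-first ordering of the alternation decisive):

```r
# Base-R variant of split_chem_elements: gregexpr()/regmatches()
# replace stringr; the symbol list is a stand-in for chemr's table.
split_chem_elements_base <- function(x,
        symbols = c("Cl", "Li", "Al", "Si", "C", "F", "H", "O", "P")) {
    el  <- symbols[order(nchar(symbols), decreasing = TRUE)]
    pat <- paste(el, collapse = "|")
    # perl = TRUE: the alternation tries "Cl" before "C" (longest first)
    regmatches(x, gregexpr(pat, x, perl = TRUE))
}
mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements_base(mol)
```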
[R] Best way to test for numeric digits?
Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K", "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3

The above requires the use of the suppressWarnings function. Are there any better ways?

I was working to extract chemical elements from a formula, something like this:

split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
    # stringi::stri_split(x, regex = regex);
    s = strsplit(x, regex, perl = TRUE);
    if(rm.digits) {
        s = lapply(s, function(s) {
            isNotD = is.na(suppressWarnings(as.numeric(s)));
            s = s[isNotD];
        });
    }
    return(s);
}
split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))

Sincerely, Leonard

Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
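One vectorised alternative to the suppressWarnings(as.double(...)) pattern is a regular-expression test per element; a minimal sketch:

```r
# Test each element for digits with grepl() instead of coercing:
x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3")
is.digits <- grepl("^[0-9]+$", x)
is.digits
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
x[!is.digits]   # keep only the element symbols
```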
Re: [R] Create new data frame with conditional sums
Dear Jason,

The code could look something like:

dummyData = data.frame(
    Tract  = seq(1, 10, by=1),
    Pct    = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
    Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))

# Define the cutoffs
# - allow for duplicate entries;
by = 0.03; # by = 0.01;
cutoffs <- seq(0, 0.20, by = by)

# Create a new column with cutoffs
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
    labels = cutoffs[-1], ordered_result = TRUE)

# Sort data
# - we could actually order only the columns: Totpop & Cutoff;
dummyData = dummyData[order(dummyData$Cutoff), ]

# Result
cs = cumsum(dummyData$Totpop)

# Only last entry:
# - I do not have a nice one-liner, but this should do it:
isLast = rev(! duplicated(rev(dummyData$Cutoff)))
data.frame(Total = cs[isLast], Cutoff = dummyData$Cutoff[isLast])

Sincerely, Leonard

On 10/15/2023 7:41 PM, Leonard Mada wrote:

Dear Jason,

I do not think that the solution based on aggregate offered by GPT was correct. That quasi-solution only aggregates within each individual level. As I understand, you want the cumulative sum.

The idea was proposed by Bert; you only need to sort first based on the cutoff (e.g. using an ordered factor), and then extract only the last value for each level. If Pct is unique, then you can skip this last step and use the cumsum directly (but on the sorted data set).

Alternatives: see the solutions with loops or with sapply.

Sincerely, Leonard
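Regarding the "nice one-liner" for the last entry per level: duplicated() has a fromLast argument, which may serve here; with fromLast = TRUE an element counts as duplicated if it re-appears later, so the last occurrence of each value is FALSE. A small check with a toy vector (my own stand-in for the sorted Cutoff column):

```r
# The rev()/rev() dance and duplicated(, fromLast = TRUE) agree:
x <- c(1, 1, 2, 2, 2, 3)   # toy stand-in for the sorted dummyData$Cutoff
isLast1 <- rev(!duplicated(rev(x)))
isLast2 <- !duplicated(x, fromLast = TRUE)
identical(isLast1, isLast2)
# [1] TRUE
```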
Re: [R] Create new data frame with conditional sums
Dear Jason,

I do not think that the solution based on aggregate offered by GPT was correct. That quasi-solution only aggregates within each individual level. As I understand, you want the cumulative sum.

The idea was proposed by Bert; you only need to sort first based on the cutoff (e.g. using an ordered factor), and then extract only the last value for each level. If Pct is unique, then you can skip this last step and use the cumsum directly (but on the sorted data set).

Alternatives: see the solutions with loops or with sapply.

Sincerely, Leonard
Re: [R] [Pkg-Collaboratos] BioShapes Almost-Package
Thank you very much for all the responses; especially Duncan's guidance. I will add some further ideas on workflows below.

There were quite a few views on GitHub; but there is not much to see, as there is absolutely no documentation. I have added in the meantime a basic example:
https://github.com/discoleo/BioShapes/blob/main/Examples.Bioshapes.png
The actual code can do a lot more.

Some ideas on workflows:

1. Most code is written in C/C++; the R code is a thin wrapper around the C/C++ functions.
It is practical to embed the documentation with the R code - as there is no complex code anyway. The same may apply to small packages.

2. Complex R code
The comments may clutter the code. It is also difficult to maintain this documentation, as the comments are less easily readable. Separating the documentation from the code is a good idea. Unfortunately, this is not so obvious when you start working on your first package.

Many thanks, Leonard

On 9/4/2023 5:47 AM, Jeff Newmiller wrote:

Leonard... the reason roxygen exists is to allow markup in source files to be used to automatically generate the numerous files required by standard R packages as documented in Writing R Extensions. If your goal is to not use source files this way, then the solution is to not use roxygen at all. Just create those files yourself by directly editing them from scratch.

On September 3, 2023 7:06:09 PM PDT, Leonard Mada via R-help wrote:

Thank you Bert.

Clarification: Indeed, I am using an add-on package: it is customary for that package - that is what I have seen - to have the entire documentation included as comments in the R src files. (But maybe I am wrong.)

I will try to find some time over the next few days to explore in more detail the R documentation. Although, I do not know how this will interact with the add-on package.

Sincerely, Leonard

On 9/4/2023 4:58 AM, Bert Gunter wrote:

1. R-package-devel is where queries about package protocols should go.

2. But...

"Is there a succinct, but sufficiently informative description of documentation tools?"
"Writing R Extensions" (shipped with R) is *the* reference for R documentation. Whether it's sufficiently "succinct" for you, I cannot say.

"I find that including the documentation in the source files is very distracting."
?? R documentation (.Rd) files are separate from source (.R) files. Inline documentation in source files is an "add-on" capability provided by optional packages if one prefers to do this. Such packages parse the source files to extract the documentation into the .Rd files. So not sure what you mean here. Apologies if I have misunderstood.

"I would prefer to have only basic comments in the source files and an expanded documentation in a separate location."
If I understand you correctly, this is exactly what the R package process specifies. Again, see the "Writing R Extensions" manual for details.

Also, if you wish to have your package on CRAN, it requires that the package documents all functions in the package as specified by the "Writing ..." manual.

Again, further questions and elaboration should go to the R-package-devel list, although I think the manual is really the authoritative resource to follow.

Cheers,
Bert

On Sun, Sep 3, 2023 at 5:06 PM Leonard Mada via R-help wrote:
[...]
Re: [R] [Pkg-Collaboratos] BioShapes Almost-Package
Thank you Bert.

Clarification: Indeed, I am using an add-on package: it is customary for that package - that is what I have seen - to have the entire documentation included as comments in the R src files. (But maybe I am wrong.)

I will try to find some time over the next few days to explore in more detail the R documentation. Although, I do not know how this will interact with the add-on package.

Sincerely, Leonard

On 9/4/2023 4:58 AM, Bert Gunter wrote:
> 1. R-package-devel is where queries about package protocols should go.
>
> 2. But...
> "Is there a succinct, but sufficiently informative description of
> documentation tools?"
> "Writing R Extensions" (shipped with R) is *the* reference for R
> documentation. Whether it's sufficiently "succinct" for you, I cannot say.
>
> "I find that including the documentation in the source files is very
> distracting."
> ?? R documentation (.Rd) files are separate from source (.R) files.
> Inline documentation in source files is an "add-on" capability
> provided by optional packages if one prefers to do this. Such packages
> parse the source files to extract the documentation into the .Rd
> files. So not sure what you mean here. Apologies if I have misunderstood.
>
> "I would prefer to have only basic comments in the source
> files and an expanded documentation in a separate location."
> If I understand you correctly, this is exactly what the R package
> process specifies. Again, see the "Writing R Extensions" manual for details.
>
> Also, if you wish to have your package on CRAN, it requires that the
> package documents all functions in the package as specified by the
> "Writing ..." manual.
>
> Again, further questions and elaboration should go to the
> R-package-devel list, although I think the manual is really the
> authoritative resource to follow.
>
> Cheers,
> Bert
>
> On Sun, Sep 3, 2023 at 5:06 PM Leonard Mada via R-help wrote:
> [...]
[R] [Pkg-Collaboratos] BioShapes Almost-Package
Dear R-List Members,

I am looking for collaborators to further develop the BioShapes almost-package. I added a brief description below.

A.) BioShapes (Almost-) Package

The aim of the BioShapes quasi-package is to facilitate the generation of graphical objects resembling biological and chemical entities, enabling the construction of diagrams based on these objects. It currently includes functions to generate diagrams depicting viral particles, liposomes, double helix / DNA strands, various cell types (like neurons, brush-border cells and duct cells), Ig-domains, as well as more basic shapes.

It should offer researchers in the field of biological and chemical sciences a tool to easily generate diagrams depicting the studied biological processes.

The package lacks a proper documentation and is not yet released on CRAN. However, it is available on GitHub:
https://github.com/discoleo/BioShapes

Although there are 27 unique cloners on GitHub, I am still looking for contributors and collaborators. I would appreciate any collaborations to develop it further. I can be contacted both by email and on GitHub.

B.) Documentation Tools

Is there a succinct, but sufficiently informative description of documentation tools?
I find that including the documentation in the source files is very distracting. I would prefer to have only basic comments in the source files and an expanded documentation in a separate location.

This question may be more appropriate for the R-package-devel list. I can move the 2nd question to that list.

###

As the biological sciences are very vast, I would be very happy for collaborators on the development of this package. Examples with existing shapes are available in (but are unfortunately not documented):

Man/examples/Examples.Man.R
R/Examples.R
R/Examples.Cells.R
tests/experimental/*

Many thanks,
Leonard
Re: [R] Query on finding root
Dear R-Users,

Just out of curiosity: which of the 2 methods is the better one? The results seem to differ slightly.

fun = function(u) { ((26104.50*u^0.03399381)/((1-u)^0.107)) - 28353.7 }
uniroot(fun, c(0,1))
# 0.6048184

curve(fun(x), 0, 1)
abline(v=0.3952365, col="red")
abline(v=0.6048184, col="red")
abline(h=0, col="blue")

fun = function(u) { (0.03399381*log(u) - 0.107*log(1-u)) - log(28353.7/26104.50) }
fun = function(u) { (0.03399381*log(u) - 0.107*log1p(-u)) - log(28353.7/26104.50) }
uniroot(fun, c(0,1))
# 0.6047968

curve(fun(x), 0, 1)
abline(v=0.3952365, col="red")
abline(v=0.6047968, col="red")
abline(h=0, col="blue")

Sincerely, Leonard
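A possible explanation for the small discrepancy (an assumption worth checking rather than a definitive answer): uniroot()'s default tol is .Machine$double.eps^0.25, roughly 1.2e-4, so the two formulations simply stop at slightly different points. Tightening tol makes the two roots agree:

```r
# Both formulations from above, solved with a much tighter tolerance:
f1 <- function(u) ((26104.50*u^0.03399381)/((1-u)^0.107)) - 28353.7
f2 <- function(u) 0.03399381*log(u) - 0.107*log1p(-u) - log(28353.7/26104.50)
r1 <- uniroot(f1, c(0, 1), tol = 1e-12)$root
r2 <- uniroot(f2, c(0, 1), tol = 1e-12)$root
c(r1 = r1, r2 = r2, diff = abs(r1 - r2))   # the roots now agree closely
```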
Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2
Dear Bert,

On 8/19/2023 2:47 AM, Bert Gunter wrote:
> "Values of type 2^(-n) (and its binary complement) are exactly
> represented as floating point numbers and do not generate the error.
> However, values away from such special x-values will generate errors:"
>
> That was exactly my point: The size of errors depends on the accuracy
> of binary representation of floating point numbers and their arithmetic.
>
> But you previously said:
> "The ugly thing is that the error only gets worse as x decreases. The
> value neither drops to 0, nor does it blow up to infinity; but it gets
> worse in a continuous manner."
>
> That is wrong and disagrees with what you say above.
>
> -- Bert

On "average", the error increases. But it does NOT increase monotonically:

x = 2^(-20) * 1.1 # is still relatively close to the exact value!
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2
# 58672303, not 0, nor close to 0;

Sincerely, Leonard

> On Fri, Aug 18, 2023 at 4:34 PM Leonard Mada wrote:
> [...]
Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2
Dear Bert,

Values of type 2^(-n) (and their binary complements) are represented exactly as floating point numbers and do not generate the error. However, values away from such special x-values will generate errors:

# exactly represented:
x = 9.53674316406250e-07
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2

# almost exact:
x = 9.536743164062502e-07
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2

x = 9.536743164062498e-07
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2

# the result behaves far better around values
# which can be represented exactly,
# but fails drastically for other values!
x = 2^(-20) * 1.1
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2
# 58672303 instead of 0!

Sincerely,

Leonard

On 8/19/2023 2:06 AM, Bert Gunter wrote:
> "The ugly thing is that the error only gets worse as x decreases. The
> value neither drops to 0, nor does it blow up to infinity; but it gets
> worse in a continuous manner."
>
> If I understand you correctly, this is wrong:
>
> > x <- 2^(-20) ## considerably less than 1e-4 !!
> > y <- 1 - x^2/2;
> > 1/(1 - y) - 2/x^2
> [1] 0
>
> It's all about the accuracy of the binary approximation of floating
> point numbers (and their arithmetic)
>
> Cheers,
> Bert
>
> [...]
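A minimal sketch of the cos1n()/cos1p() helpers proposed in this thread, built on the half-angle identity 1 - cos(x) = 2*sin(x/2)^2 that Iris suggested further down; the function names follow the proposal above, the implementation is only one possibility:

```r
# Sketch of the proposed helpers (names from the thread); the
# half-angle identities avoid computing 1 -/+ cos(x) directly:
# 1 - cos(x) = 2*sin(x/2)^2;  1 + cos(x) = 2*cos(x/2)^2
cos1n <- function(x) 2 * sin(x/2)^2
cos1p <- function(x) 2 * cos(x/2)^2

x <- 1e-4
1/(1 - cos(x)) - 2/x^2   # suffers from cancellation inside 1 - cos(x)
1/cos1n(x)    - 2/x^2    # close to the true limit, 1/6
```

Note that for much smaller x the outer subtraction 1/cos1n(x) - 2/x^2 still cancels, so such helpers fix only the inner term.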
Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2
I have added some clarifications below.

On 8/18/2023 10:20 PM, Leonard Mada wrote:
> [...]
> After more careful thinking, I believe that it is a limitation due to
> floating points:
> [...]
>
> The problem really stems from the representation of 1 - x^2/2 as shown
> below:
> x = 1E-4
> print(1 - x^2/2, digits=20)
> print(0.5, digits=20) # fails
> # 0.99999999500000003039

The floating point representation of 1 - x^2/2 is the real culprit:
# 0.99999999500000003039

The 3039 at the end is really an error due to the floating point
representation. However, this error blows up when inverting the value:
x = 1E-4;
y = 1 - x^2/2;
1/(1 - y) - 2/x^2
# 1.215494
# should be 1/(x^2/2) - 2/x^2 = 0

The ugly thing is that the error only gets worse as x decreases. The
value neither drops to 0, nor does it blow up to infinity; but it gets
worse in a continuous manner. At least the reason has now become clear.

> Maybe some functions of type cos1p and cos1n would be handy for such
> computations (to replace the manual series expansion):
> cos1p(x) = 1 + cos(x)
> cos1n(x) = 1 - cos(x)
> Though, I do not yet have the big picture.

Sincerely,

Leonard

On 8/17/2023 1:57 PM, Martin Maechler wrote:
>>>>>>> Leonard Mada on Wed, 16 Aug 2023 20:50:52 +0300 writes:

> Dear Iris,
> Dear Martin,

> Thank you very much for your replies. I add a few comments.

> 1.) Correct formula
> The formula in the Subject Title was correct. A small glitch swept into
> the last formula:
> - 1/(cos(x) - 1) - 2/x^2
> or
> 1/(1 - cos(x)) - 2/x^2 # as in the subject title;

> 2.) log1p
> Actually, the log-part behaves much better. And when it fails, it fails
> completely (which is easy to spot!).

> x = 1E-6
> log(x) - log(1 - cos(x))/2
> # 0.3465291

> x = 1E-8
> log(x) - log(1 - cos(x))/2
> # Inf
> log(x) - log1p(- cos(x))/2
> # Inf => fails as well!
> # although using only log1p(cos(x)) seems to do the trick;
> log1p(cos(x)); log(2)/2;

> 3.) 1/(1 - cos(x)) - 2/x^2
> It is possible to convert the formula to one which is numerically more
> stable. It is also possible to compute it manually, but it involves much
> more work and is also error prone:
> (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))

> And applying L'Hospital:
> (2*x - 2*sin(x)) / (2*x * (1 - cos(x)) + x^2*sin(x))
> # and a 2nd & 3rd & 4th time
> 1/6

> The big problem was that I did not expect it to fail for x = 1E-4. I
> thought it is more robust and works maybe until 1E-5.
> x = 1E-5
> 2/x^2 - 2E+10
> # -3.814697e-06

> This is the reason why I believe that there is room for improvement.

> Sincerely,
> Leonard

Thank you, Leonard.
Yes, I agree that it is amazing how much your formula suffers from (a
generalization of) "cancellation" --- leading you to think there was a
problem with cos() or log() or .. in R.
But really R uses the system builtin libmath library, and the problem
is really the inherent instability of your formula.

Indeed your first approximation was not really much more stable:

## 3.) 1/(1 - cos(x)) - 2/x^2
## It is possible to convert the formula to one which is numerically more
## stable. It is also possible to compute it manually, but it involves much
## more work and is also error prone:
## (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))

## MM: but actually, that approximation does not seem better
## (close to the breakdown region):
f1 <- \(x) 1/(1 - cos(x)) - 2/x^2
f2 <- \(x) (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
curve(f1, 1e-8, 1e-1, log="xy", n=2^10)
curve(f2, add = TRUE, col=2, n=2^10)
## Zoom in:
curve(f1, 1e-4, 1e-1, log="xy", n=2^9)
curve(f2, add = TRUE, col=2, n=2^9)
## Zoom in much more in y-direction:
yl <- 1/6 + c(-5, 20)/10
curve(f1, 1e-4, 1e-1, log="x", ylim=yl, n=2^9)
abline(h = 1/6, lty=3, col="gray")
curve(f2, add = TRUE, n=2^9, col=adjustcolor(2, 1/2))

Now, you can use the Rmpfr package (interface to the GNU MPFR
multiple-precision C library) to find out more :

if(!requireNamespace("Rmpfr")) install.packages("Rmpfr")
M <- function(x, precBits=128) Rmpfr::mpfr(x, precBits)
(xM <- M(1e-8)) # yes, only ~ 16 dig accurate
## 1.000000000000000020922560830128472675327e-8
M(10, 128)^-8 # would of course be more accurate,
## but we want the calculation for the double precision number 1e-8

## Now you can draw "the truth" into the above plots:
curve(f1, 1e-4, 1e-1, log="xy", n=2^9)
curve(f2, add = TRUE, col=2, n=2^9)
## correct:
curve(f1(M(x, 256)), add = TRUE, col=4, lwd=2, n=2^9)
abline(h = 1/6, lty=3, col="gray")

But, indeed we take note how much it is the formula instability:
Also MPFR needs a lot of extra bits precision before it gets to the
correct numbers:

xM <- c(M(1e-8, 80), M(1e-8, 96), M(1e-8, 112), M(1e-8, 128),
        M(1e-8, 180), M(1e-8, 256))
## and round back to 70 bits for display:
R <- \(x) Rmpfr::roundMpfr(x, 70)
R(f1(
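Following up on Martin's comparison above: rewriting 1 - cos(x) via the half-angle identity (Iris Simmons' suggestion earlier in the thread) gives a third variant that behaves much better near x = 1e-4. This is a minimal sketch, not part of Martin's original code; note that the outer subtraction of 2/x^2 still cancels for much smaller x:

```r
f1 <- \(x) 1/(1 - cos(x)) - 2/x^2
f2 <- \(x) (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
# half-angle rewrite: 1 - cos(x) = 2*sin(x/2)^2
f3 <- \(x) 1/(2*sin(x/2)^2) - 2/x^2

x <- c(1e-3, 1e-4, 1e-5)
rbind(x = x, f1 = f1(x), f2 = f2(x), f3 = f3(x), exact = 1/6)
# f3 stays near 1/6 here; all three variants still break down as
# x approaches 1e-8, because 1/(...) - 2/x^2 subtracts two numbers
# of size ~ 2/x^2 that agree in almost all of their digits.
```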
Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2
Dear Martin,

Thank you very much for your analysis.

I add only a small comment:
- the purpose of the modified formula was to apply l'Hospital;
- there are other ways to transform the formula; although applying
l'Hospital once is probably more robust than simple transformations
(but the computations are also more tedious and error prone);

After more careful thinking, I believe that it is a limitation due to
floating points:
x = 1E-4
1/(-x^2/2 - x^4/24) + 2/x^2
1/6

y = 1 - x^2/2 - x^4/24;
1/(cos(x) - 1) + 2/x^2
1/(y - 1) + 2/x^2
# -1.215494
# correct: 1/6

We need the 3rd term for the correct computation of cos(x) in this
problem: but this is x^4 / 24, which for 1E-4 requires precision at
least up to 1E-16 / 24, or ~ 1E-18. I had not thought about that
initially. The trigonometric functions skip one term, and are
therefore much uglier than the log.

The problem really stems from the representation of 1 - x^2/2 as shown
below:
x = 1E-4
print(1 - x^2/2, digits=20)
print(0.5, digits=20) # fails
# 0.99999999500000003039

Maybe some functions of type cos1p and cos1n would be handy for such
computations (to replace the manual series expansion):
cos1p(x) = 1 + cos(x)
cos1n(x) = 1 - cos(x)
Though, I do not yet have the big picture.

Sincerely,

Leonard

On 8/17/2023 1:57 PM, Martin Maechler wrote:
> [...]
Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2
Dear Iris,
Dear Martin,

Thank you very much for your replies. I add a few comments.

1.) Correct formula
The formula in the Subject Title was correct. A small glitch swept into
the last formula:
- 1/(cos(x) - 1) - 2/x^2
or
1/(1 - cos(x)) - 2/x^2 # as in the subject title;

2.) log1p
Actually, the log-part behaves much better. And when it fails, it fails
completely (which is easy to spot!).

x = 1E-6
log(x) - log(1 - cos(x))/2
# 0.3465291

x = 1E-8
log(x) - log(1 - cos(x))/2
# Inf
log(x) - log1p(- cos(x))/2
# Inf => fails as well!
# although using only log1p(cos(x)) seems to do the trick;
log1p(cos(x)); log(2)/2;

3.) 1/(1 - cos(x)) - 2/x^2
It is possible to convert the formula to one which is numerically more
stable. It is also possible to compute it manually, but it involves much
more work and is also error prone:
(x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))

And applying L'Hospital:
(2*x - 2*sin(x)) / (2*x * (1 - cos(x)) + x^2*sin(x))
# and a 2nd & 3rd & 4th time
1/6

The big problem was that I did not expect it to fail for x = 1E-4. I
thought it is more robust and works maybe until 1E-5.
x = 1E-5
2/x^2 - 2E+10
# -3.814697e-06

This is the reason why I believe that there is room for improvement.

Sincerely,

Leonard

On 8/16/2023 9:51 AM, Iris Simmons wrote:
> You could rewrite
>
> 1 - cos(x)
>
> as
>
> 2 * sin(x/2)^2
>
> and that might give you more precision?
>
> On Wed, Aug 16, 2023, 01:50 Leonard Mada via R-help wrote:
> > [...]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2
Dear R-Users,

I tried to compute the following limit:
x = 1E-3;
(-log(1 - cos(x)) - 1/(cos(x)-1)) / 2 - 1/(x^2) + log(x)
# 0.4299226
log(2)/2 + 1/12
# 0.4299069

However, the result diverges as x decreases:
x = 1E-4
(-log(1 - cos(x)) - 1/(cos(x)-1)) / 2 - 1/(x^2) + log(x)
# 0.9543207
# correct: 0.4299069

I expected the precision to remain good with x = 1E-4 or x = 1E-5.
This part blows up - probably some significant loss of precision of
cos(x) when x -> 0:
1/(cos(x) - 1) - 2/x^2

Maybe there is some room for improvement.

Sincerely,

Leonard

==
The limit was part of the integral:
up = pi/5;
integrate(function(x) 1 / sin(x)^3 - 1/x^3 - 1/(2*x), 0, up)
(log( (1 - cos(up)) / (1 + cos(up)) ) +
	+ 1/(cos(up) - 1) + 1/(cos(up) + 1) + 2*log(2) - 1/3) / 4 +
	+ (1/(2*up^2) - log(up)/2);

# see:
https://github.com/discoleo/R/blob/master/Math/Integrals.Trig.Fractions.Poly.R

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
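For reference, the half-angle identity 1 - cos(x) = 2*sin(x/2)^2 (suggested by Iris Simmons later in this thread) makes the limit above computable directly in double precision; a minimal sketch, not code from the original post:

```r
# Evaluate the limit using 1 - cos(x) = 2*sin(x/2)^2, which avoids
# the cancellation inside 1 - cos(x) and 1/(cos(x) - 1):
x <- 1e-4
s <- 2 * sin(x/2)^2          # stable replacement for 1 - cos(x)
(-log(s) + 1/s) / 2 - 1/x^2 + log(x)
# stays close to the exact limit:
log(2)/2 + 1/12
# 0.4299069
```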
Re: [R] Regex Split?
Dear Bert,

Thank you for the suggestion. Indeed, there are various solutions and
workarounds. However, there is still a bug in strsplit.

2.) gsub
I would try to avoid gsub on a Wikipedia-sized corpus: using strsplit
directly should be far more efficient.

3.) Punctuation marks
Abbreviations and "word1-word2" may be a problem:
gsub("([[:punct:]])", "\\1 ", "A.B.C.", perl=T)
# "A. B. C. "

I do not yet have an intuition whether the spaces in "A. B. C. " would
adversely affect the language model. But this goes off-topic.

Sincerely,

Leonard

On 5/6/2023 1:35 AM, Bert Gunter wrote:
> Primarily for my own amusement, here is a way to do what I think you
> wanted without look-aheads/behinds
>
> strsplit(gsub("([[:punct:]])"," \\1 ","a bc,def, adef,x; ,,gh"), " +")
> [[1]]
>  [1] "a"    "bc"   ","    "def"  ","    "adef" ","    "x"    ";"
> [10] ","    ","    "gh"
>
> I certainly would *not* claim that it is in any way superior to
> anything that has already been suggested -- indeed, probably the
> contrary. But it's simple (as am I).
>
> Cheers,
> Bert
>
> On Fri, May 5, 2023 at 2:54 PM Leonard Mada via R-help wrote:
> > [...]
Re: [R] Regex Split?
Dear Avi, Punctuation marks are used in various NLP language models. Preserving the "," is therefore useful in such scenarios and Regex are useful to accomplish this (especially if you have sufficient experience with such expressions). I observed only an odd behaviour using strsplit: the example string is constructed; but it is always wise to test a Regex expression against various scenarios. It is usually hard to predict what special cases will occur in a specific corpus. strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T) # "a" "bc" "," "def" "," "" "adef" "," "," "gh" stringi::stri_split("a bc,def, adef ,,gh", regex=" |(?=,)|(?<=,)(?![ ])") # "a" "bc" "," "def" "," "adef" "" "," "," "gh" stringi::stri_split("a bc,def, adef ,,gh", regex=" |(?)(?=,)|(?<=,)(?![ ])") # "a" "bc" "," "def" "," "adef" "," "," "gh" # Expected: # "a" "bc" "," "def" "," "adef" "," "," "gh" # see 2nd instance of stringi::stri_split Sincerely, Leonard On 5/5/2023 11:20 PM, avi.e.gr...@gmail.com wrote: Leonard, It can be helpful to spell out your intent in English or some of us have to go back to the documentation to remember what some of the operators do. Your text being searched seems to be an example of items between comas with an optional space after some commas and in one case, nothing between commas. So what is your goal for the example, and in general? You mention a bit unclearly at the end some of what you expect and I think it would be clearer if you also showed exactly the output you would want. I saw some other replies that addressed what you wanted and am going to reply in another direction. Why do things the hard way using things like lookahead or look behind? Would several steps get you the result way more clearly? For the sake of argument, you either want what reading in a CSV file would supply, or something else. Since you are not simply splitting on commas, it sounds like something else. But what exactly else? 
Something as simple as this on just a comma produces results including
empty strings and embedded leading or trailing spaces:

strsplit("a bc,def, adef ,,gh", ",")
[[1]]
[1] "a bc"   "def"    " adef " ""       "gh"

That can of course be handled by, for example, trimming the result
after unlisting the odd way strsplit returns results:

library("stringr")
str_squish(unlist(strsplit("a bc,def, adef ,,gh", ",")))
[1] "a bc" "def"  "adef" ""     "gh"

Now do you want the empty string to be something else, such as an NA?
That can be done too with another step.

And a completely different variant can be used to read in your
one-line CSV as text using standard overkill tools:

read.table(text="a bc,def, adef ,,gh", sep=",")
    V1  V2    V3 V4 V5
1 a bc def  adef NA gh

The above is a vector of texts. But if you simply want to reassemble
your initial string cleaned up a bit, you can use paste to put back
commas, as in a variation of the earlier example:

paste(str_squish(unlist(strsplit("a bc,def, adef ,,gh", ","))), collapse=",")
[1] "a bc,def,adef,,gh"

So my question is whether using advanced methods is really necessary
for your case, or even particularly efficient. If efficiency matters,
often, it is better to use tools without regular expressions such as
paste0() when they meet your needs.

Of course, unless I know what you are actually trying to do, my
remarks may be not useful.

-Original Message-
From: R-help On Behalf Of Leonard Mada via R-help
Sent: Thursday, May 4, 2023 5:00 PM
To: R-help Mailing List
Subject: [R] Regex Split?
Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:

strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a" "bc" "," "def" "," "" "adef" "," "," "gh"

strsplit("a bc,def, adef ,,gh", " |(? [...]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
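Avi's stringr-based trimming above can also be done in base R; a small sketch (note that trimws() only trims leading/trailing whitespace, while stringr::str_squish() also collapses internal runs, which is equivalent for this example):

```r
# split on commas, then trim the surrounding whitespace in base R:
toks <- strsplit("a bc,def, adef ,,gh", ",")[[1]]
trimws(toks)
# "a bc" "def"  "adef" ""     "gh"
```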
Re: [R] Regex Split?
Dear Bill, Indeed, there are other cases as well - as documented. Various Regex sites give the warning to avoid the legacy syntax "[[:<:]]", so this is the alternative syntax: strsplit(split="\\b(?=\\w)", "One, two; three!", perl=TRUE) # "O" "n" "e" ", " "t" "w" "o" "; " "t" "h" "r" "e" "e" "!" gsub("\\b(?=\\w)", "#", "One, two; three!", perl=TRUE) # "#One, #two; #three!" Sincerely, Leonard On 5/5/2023 6:19 PM, Bill Dunlap wrote: > https://eu01.z.antigena.com/l/BgIBOxsm88PwDTBiTTrQ784MFk2oGZVOA3RMHiarAZuyoEemKrcnpfJeD8X0FgxRDG33qHZho~NriRCbhv9_Ffr3EOfqn2vpaNUAlCDjQ8nOyVUgPM2iGnHi-qpN54kl1YVO_gHimn0m2ZJ68ntGtysras~0mRMDuAgwbTXsQcQ~ > > (from 2016, still labelled 'UNCONFIRMED") contains some other examples > of strsplit misbehaving when using 0-length perl look-behinds. E.g., > > > strsplit(split="[[:<:]]", "One, two; three!", perl=TRUE)[[1]] > [1] "O" "n" "e" ", " "t" "w" "o" "; " "t" "h" "r" "e" "e" "!" > > gsub(pattern="[[:<:]]", "#", "One, two; three!", perl=TRUE) > [1] "#One, #two; #three!" > > The bug report includes the comment > It may be possible that strsplit is not using the startoffset argument > to pcre_exec > >pcre/pcre/doc/html/pcreapi.html > A non-zero starting offset is useful when searching for another match > in the same subject by calling pcre_exec() again after a previous > success. Setting startoffset differs from just passing over a > shortened string and setting PCRE_NOTBOL in the case of a pattern that > begins with any kind of lookbehind. > > or it could be something else. > > > On Fri, May 5, 2023 at 3:25 AM Ivan Krylov wrote: > > On Thu, 4 May 2023 23:59:33 +0300 > Leonard Mada via R-help wrote: > > > strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T) > > # "a" "bc" "," "def" "," "" "adef" "," "," "gh" > > > > strsplit("a bc,def, adef ,,gh", " |(? perl=T) > > # "a" "bc" "," "def" "," "" "adef" "," "," "gh" > > > > strsplit("a bc,def, adef ,,gh", " |(? 
> perl=T) > > # "a" "bc" "," "def" "," "" "adef" "," "," "gh" > > > > > > Is this correct? > > Perl seems to return the results you expect: > > $ perl -E ' > say("$_:\n ", join " ", map qq["$_"], split $_, q[a bc,def, adef > ,,gh]) > for ( > qr[ |(?=,)|(?<=,)(?![ ])], > qr[ |(? qr[ |(? )' > (?^u: |(?=,)|(?<=,)(?![ ])): > "a" "bc" "," "def" "," "adef" "," "," "gh" > (?^u: |(? "a" "bc" "," "def" "," "adef" "," "," "gh" > (?^u: |(? "a" "bc" "," "def" "," "adef" "," "," "gh" > > The same thing happens when I ask R to replace the separators instead > of splitting by them: > > sapply(setNames(nm = c( > " |(?=,)|(?<=,)(?![ ])", > " |(? " |(? ), gsub, '[]', "a bc,def, adef ,,gh", perl = TRUE) > # |(?=,)|(?<=,)(?![ ]) |(? )(?=,)|(?<=,)(?![ ]) > # "a[]bc[],[]def[],[]adef[],[],[]gh" > "a[]bc[],[]def[],[]adef[],[],[]gh" > # |(? # "a[]bc[],[]def[],[]adef[],[],[]gh" > > I think that something strange happens when the delimeter pattern > matches more than once in the same place: > > gsub( > '(?=<--)|(?<=-->)', '[]', 'split here --><-- split here', > perl = TRUE > ) > # [1] "split here -->[]<-- split here" > > (Both Perl's split() and s///g agree with R's gsub() here, although I >
[R] Regex Split?
Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:

strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a" "bc" "," "def" "," "" "adef" "," "," "gh"

strsplit("a bc,def, adef ,,gh", " |(? [...]

- the first one could also return "", "," (but probably not; not fully
sure about this);

Sincerely,

Leonard

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split String in regex while Keeping Delimiter
Dear Emily,

I have written a more robust version of the function:

extract.nonLetters = function(x, rm.space = TRUE, normalize = TRUE, sort = TRUE) {
	if(normalize) x = stringi::stri_trans_nfc(x);
	ch = strsplit(x, "", fixed = TRUE);
	ch = unique(unlist(ch));
	if(sort) ch = sort(ch);
	pat = if(rm.space) "^[a-zA-Z ]" else "^[a-zA-Z]";
	isLetter = grepl(pat, ch);
	ch = ch[ ! isLetter];
	return(stringi::stri_escape_unicode(ch));
}

extract.nonLetters(str)
# "\\u2013" "+"

This code ("\u2013") is included in the expanded Regex expression:
tokens = strsplit(str, "(?<=[-+\u2010-\u2014])\\s++", perl=TRUE)

Sincerely,

Leonard

On 4/13/2023 9:40 PM, Leonard Mada wrote:
> [...]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
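As a base-R complement to the byte-level helper in this thread: utf8ToInt() returns Unicode code points directly, which sidesteps the "not trivial to get the Unicode code point" problem mentioned above; a small sketch:

```r
# utf8ToInt() gives code points (not UTF-8 bytes, unlike charToRaw()):
ch <- "\u2013"                      # EN DASH, the "-" look-alike
utf8ToInt(ch)                       # 8211
sprintf("U+%04X", utf8ToInt(ch))    # "U+2013"
charToRaw(ch)                       # e2 80 93, i.e. the bytes 226 128 147
```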
Re: [R] Split String in regex while Keeping Delimiter
Dear Emily,

Using a look-behind solves the split problem in this case. (Note:
Using Regex is in most/many cases the simplest solution.)

str = c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
	"leucocyten – grampositieve coccen +")

tokens = strsplit(str, "(?<=[-+])\\s++", perl=TRUE)

PROBLEM
The current expression does NOT work for a different reason: the "-"
is coded using a NON-ASCII character.

I have written a small utility function to approximately extract
"non-standard" characters:

### Identify non-ASCII Characters
# beware: the filtering and the sorting may break the codes;
extract.nonLetters = function(x, rm.space = TRUE, sort = FALSE) {
	code = as.numeric(unique(unlist(lapply(x, charToRaw))));
	isLetter = (code >= 97 & code <= 122) | (code >= 65 & code <= 90);
	code = code[ ! isLetter];
	if(rm.space) {
		# removes only the simple space!
		code = code[code != 32];
	}
	if(sort) code = sort(code);
	return(code);
}

extract.nonLetters(str, sort = FALSE)
# 43 226 128 147

Note:
- the code for "+" is 43, and for the simple "-" it is 45:
as.numeric(charToRaw("+-"));
- "226 128 147" codes something else, but it is not trivial to get the
Unicode code point;
https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec

The following is a more comprehensive Regex expression, which accepts
many variants of "-":
tokens = strsplit(str, "(?<=[-+\u2010-\u2014])\\s++", perl=TRUE)

Sincerely,

Leonard

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Default Generic function for: args(name, default = TRUE)
Dear Bert, Thank you for the idea. It works, although a little bit ugly. The original code generated an ugly warning as well. I have modified it slightly: is.function.generic = function(name) { # TODO: is.function.generic(); # - this version is a little bit ugly; # - S4: if(isGeneric(name)); length(do.call(.S3methods, list(name))) > 0; } The latest code is on GitHub: https://github.com/discoleo/R/blob/master/Stat/Tools.Code.R Sincerely, Leonard ### initial variant: is.function.generic = function(name) { length(.S3methods(name)) > 0; } is.function.generic(plot) # [1] TRUE # Warning message: # In .S3methods(name) : # generic function 'name' dispatches methods for generic 'plot' On 3/8/2023 9:24 PM, Bert Gunter wrote: > ?.S3methods > > f <- function()(2) > > length(.S3methods(f)) > [1] 0 > > length(.S3methods(print)) > [1] 206 > > There may be better ways, but this is what came to my mind. > -- Bert > > On Wed, Mar 8, 2023 at 11:09 AM Leonard Mada via R-help > wrote: > > Dear R-Users, > > I want to change the args() function to return by default the > arguments > of the default generic function: > args = function(name, default = TRUE) { > # TODO: && is.function.generic(); > if(default) { > fn = match.call()[[2]]; > fn = paste0(as.character(fn), ".default"); > name = fn; > } > .Internal(args(name)); > } > > Is there a nice way to find out if a function is generic: > something like > is.function.generic()? 
> > Many thanks, > > Leonard > === > > Note: > - the latest version of this code will be on GitHub: [edited] > https://github.com/discoleo/R/blob/master/Stat/Tools.Code.R __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
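For completeness, a variant (my sketch, not code from the thread) that takes the function name as a character string — which avoids the 'name' warning seen with the initial version — and additionally checks S4 generics via methods::isGeneric():

```r
# Sketch: detect S3 or S4 generics from a function name given as a
# character string (e.g. "plot"); passing the name as a string avoids
# the substitute()-related warning of the initial variant.
is.function.generic <- function(name) {
  if (methods::isGeneric(name)) return(TRUE)   # S4 generic
  # S3: does the name dispatch any registered methods?
  n <- tryCatch(
    length(suppressWarnings(utils::.S3methods(name))),
    error = function(e) 0L)
  n > 0L
}

is.function.generic("plot")   # TRUE
is.function.generic("q")      # FALSE
```

Note that the prefix-based S3 search can still yield false positives for names like "t" (where t.test merely looks like a method).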
Re: [R] Default Generic function for: args(name, default = TRUE)
Dear Gregg, Thank you for the fast response. I believe though that isGeneric() works only for S4 functions: isGeneric("plot") # FALSE I am still trying to get it to work. Sincerely, Leonard On 3/8/2023 9:13 PM, Gregg Powell wrote: Yes, there is a way to check if a function is generic. You can use the isGeneric function to check if a function is generic. Here's an updated version of the args function that includes a check for generic functions: args = function(name, default = TRUE) { if(default && isGeneric(name)) { fn = paste0(as.character(name), ".default") name = fn } .Internal(args(name)) } r/ Gregg --- Original Message --- On Wednesday, March 8th, 2023 at 12:09 PM, Leonard Mada via R-help wrote: Dear R-Users, I want to change the args() function to return by default the arguments of the default generic function: args = function(name, default = TRUE) { # TODO: && is.function.generic(); if(default) { fn = match.call()[[2]]; fn = paste0(as.character(fn), ".default"); name = fn; } .Internal(args(name)); } Is there a nice way to find out if a function is generic: something like is.function.generic()? Many thanks, Leonard === Note: - the latest version of this code will be on GitHub: [edited] https://github.com/discoleo/R/blob/master/Stat/Tools.Code.R
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Default Generic function for: args(name, default = TRUE)
Dear R-Users, I want to change the args() function to return by default the arguments of the default generic function: args = function(name, default = TRUE) { # TODO: && is.function.generic(); if(default) { fn = match.call()[[2]]; fn = paste0(as.character(fn), ".default"); name = fn; } .Internal(args(name)); } Is there a nice way to find out if a function is generic: something like is.function.generic()? Many thanks, Leonard === Note: - the latest version of this code will be on GitHub: https://github.com/discoleo/R/commits/master/Stat/Tools.Code.R __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
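A possible refinement (my sketch, not the posted code, with the hypothetical name args2): fall back to the generic itself when no ".default" method exists, assuming the name is supplied as a character string:

```r
# Sketch: args() wrapper that prefers the ".default" method when one
# exists, and falls back to the function itself otherwise.
# Assumes `name` is given as a character string, e.g. "plot".
args2 <- function(name, default = TRUE) {
  if (default) {
    fn <- paste0(name, ".default")
    if (exists(fn, mode = "function")) name <- fn
  }
  args(name)   # args() accepts the name of a function as a string
}

args2("plot")   # formals of plot.default (has x, y, type, ...)
args2("sum")    # no sum.default exists: falls back to sum
```

The exists() check avoids the error that the original version raises for generics without a ".default" method.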
[R] Generic Function read?
Dear R-Users, I noticed that *read* is not a generic function, although it could benefit from the functionality available for generic functions: read = function(file, ...) UseMethod("read") methods(read) # [1] read.csv read.csv2 read.dcf read.delim read.delim2 read.DIF read.fortran # [8] read.ftable read.fwf read.socket read.table The users would still need to call the full function name. But it seems useful to be able to find out rapidly which formats can be read, including with other packages (e.g. for Excel, SAS, ...) - although most packages do not adhere to the generic naming convention, maybe they will change in the future. Note: This should be possible (even though impractical), but actually does NOT work: read = function(file, ...) UseMethod("read") file = "file.csv" class(file) = c("csv", class(file)); read(file) Should it not work? Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
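Regarding "Should it not work?" — a hedged observation (mine, not from the post): UseMethod() does dispatch on the class of the first argument, so the mechanism itself is sound; with class "csv" the call should reach utils::read.csv, which would then fail only when trying to open a non-existent "file.csv" on disk. A self-contained sketch with hypothetical method names (csvfile/default are illustrative, not an existing API):

```r
# Sketch: class-based dispatch for a "read" generic; the classes and
# methods below are illustrative only.
read <- function(file, ...) UseMethod("read")
read.csvfile <- function(file, ...) paste("would read CSV:", file)
read.default <- function(file, ...) paste("unknown format:", file)

f <- "file.csv"
class(f) <- c("csvfile", class(f))
read(f)             # "would read CSV: file.csv"
read("plain.txt")   # dispatches to read.default
```

So the approach works in principle; only the side effect of the real read.csv (opening the file) makes the original demo fail.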
[R] Covid Mutations: Cumulative?
Dear R-Users, Did anyone follow more closely the SARS Cov-2 lineages? I have done a quick check of Cov-2 mutations on the list downloaded from NCBI (see GitHub page below); but it seems that the list contains the cumulative mutations only for B.1 => B.1.1, but not after the B.1.1 branch: # B.1 => B.1.1 seems cumulative diff.lineage("B.1.1", "B.1", data=z) # but B.1.1 => B.1.1.529 is NOT cumulative anymore; diff.lineage("B.1.1.529", "B.1.1", data=z) diff.lineage("B.1.1.529", "BA.2", data=z) diff.lineage("B.1.1.529", "BA.5", data=z) # Column id: B(oth) = present in both lineages: V Mutation P AA Pos AAi AAm Polymorphism id 899 B.1.1 nsp3:F106F nsp3 F106F 106 F F TRUE B 900 B.1.1 RdRp:P323L RdRp P323L 323 P L FALSE B 901 B.1.1 S:D614G S D614G 614 D G FALSE B 902 B.1.1 N:R203K N R203K 203 R K FALSE 1 903 B.1.1 N:R203R N R203R 203 R R TRUE 1 904 B.1.1 N:G204R N G204R 204 G R FALSE 1 896 B.1 nsp3:F106F nsp3 F106F 106 F F TRUE B 897 B.1 RdRp:P323L RdRp P323L 323 P L FALSE B 898 B.1 S:D614G S D614G 614 D G FALSE B # B.1.1.529 and branches do not have any of the defining mutations of B.1.1; I have uploaded the code on GitHub: https://github.com/discoleo/R/blob/master/Stat/Infx/Cov2.Variants.R 1.) Does anyone have a better picture of what is going on? The sub-variants should have cumulative mutations. This should be the logic for the sub-lineages and I deduce it also by the data/post on the GitHub pango page: https://github.com/cov-lineages/pango-designation/issues/361 2.) Cumulative List It maybe that NCBI kept only the new mutations, as the number of mutations increased. Does anyone know if there is a full cumulative list? Alternatively, there might be a list or package with the full lineage encoding. There is a list on the Pango GitHub project, but I hope to skip at least this step; the synonyms in the NCBI file seem uglier to process. Note: This question may be more oriented towards Bioconductor; but I haven't found any real Covid packages on Bioconductor. 
Thank you very much for any help. Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Covid-19 Variants & Lineages
Dear Ivan, Thank you very much. Indeed, I missed the download button. The csv file seems to contain all the mutations in a usable format. Sincerely, Leonard On 1/24/2023 11:29 PM, Ivan Krylov wrote: On Tue, 24 Jan 2023 22:26:34 +0200 Leonard Mada via R-help wrote: The data on the NCBI page "Explore in SARS-CoV-2 Variants Overview" seems very difficult to download: https://www.ncbi.nlm.nih.gov/activ E.g.: (in the lower-left corner, but impossible to copy) NSP1: S135R NSP13: R392C The page has a "download" button, which requests https://www.ncbi.nlm.nih.gov/genomes/VirusVariation/activ/?report=download_lineages and offers to save it as "lineages.csv". I think that the information you're looking for is available if you feed this URL to read.csv() and look at the aa_definition column. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Covid-19 Variants & Lineages
Dear R-Users, 1.) Is there a package which gives the full code of a Covid-19 lineage/variant? E.g. Omicron = B.1.1.529, while BA correspond to specific subtypes of Omicron: BA.x: BA.1 = B.1.1.529.1; BA.1.1 = B.1.1.529.1.1; BA.1.1.5 = B.1.1.529.1.1.5; Is there any package to offer such trans-coding functionality? And possibly warn if the lineage has been withdrawn? The full list is available on GitHub: https://github.com/cov-lineages/pango-designation/blob/master/lineage_notes.txt Some of the lineages are reassigned or withdrawn. It seems feasible to process this list. 2.) Covid Mutations Is there a package to retrieve the full list of mutations of a specific lineage/variant? E.g. each node in the "tree" B.1.1.529.1.1.5 accumulates 1 or more new mutations. It is probably very uncommon for a mutation to get mutated back; so the mutations accumulate. The data on the NCBI page "Explore in SARS-CoV-2 Variants Overview" seems very difficult to download: https://www.ncbi.nlm.nih.gov/activ E.g.: (in the lower-left corner, but impossible to copy) NSP1: S135R NSP13: R392C [...] Maybe there is a package already offering such functionality. I am now looking over the documentation of the COVID19.Analytics package, but I may miss the relevant functions. Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
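A minimal sketch of the trans-coding idea (my illustration only — a real implementation would load the full alias list from the pango-designation repository instead of this hard-coded one-entry table):

```r
# Sketch (illustrative): expand a Pango alias prefix to the full
# lineage code using a small lookup table.
alias <- c(BA = "B.1.1.529")   # real alias lists are much longer

expand.lineage <- function(x) {
  pre <- sub("\\..*", "", x)   # alias prefix, e.g. "BA"
  if (pre %in% names(alias)) {
    sub(pre, alias[[pre]], x, fixed = TRUE)
  } else x
}

expand.lineage("BA.1.1.5")   # "B.1.1.529.1.1.5"
expand.lineage("B.1.1.7")    # no alias: unchanged
```

Withdrawn/reassigned lineages would need an additional lookup against the lineage_notes.txt file mentioned above.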
Re: [R] R emulation of FindRoot in Mathematica
Dear Troels, There might be an error in one of the eqs: # [modified] TODO: check; mg2atp <- 10^(-7)*Mg*mgatp; This version works: x0 = c( atp = 0.008, adp = 0.1, pi = 0.003, pcr = 0.042, cr = 0.004, lactate = 0.005 ) / 30; # solved the positive value x0[1] = 1E-6; x = multiroot(solve.AcidSpecies, x0, H = 4E-8) print(x) # Results: # atp adp pi pcr cr lactate # 4.977576e-04 3.254998e-06 5.581774e-08 4.142785e-09 5.011807e-10 4.973691e-03 Sincerely, Leonard On 1/23/2023 2:24 AM, Leonard Mada wrote: > Dear Troels, > > I send you an updated version of the response. I think that a hybrid > approach is probably the best solution: > - Input / Starting values = free: ATP, ADP, Crea, CreaP, lactate, > inorganic phosphate; > - Output: diff(Total, given total value); > - I assume that the pH is given; > - I did NOT check all individual eqs; > > library(rootSolve) > > solve.AcidSpecies = function(x, H, Mg=0.0006, K = 0.12) { > # ... the eqs: ...; > ATPTotal = KaATPH * ATP * H + KaATPH2 * KaATPH * ATP * H^2 + > + KaATPH3 * KaATPH2 * KaATPH * ATP * H^3 + KdATPMg * ATP * Mg + > + KdATPHMg * ATP * H * Mg + KdATPMg2 * ATP * Mg^2 + KdATPK * ATP * K; > ### Output: > total = c( > ATPTotal - 0.008, > ADPTotal - 1E-5, > CreaTotal - 0.004, > CreaPTotal - 0.042, > PiTotal - 0.003, > LactateTotal - 0.005); > return(total); > } > > KaATPH = 10^6.494; # ... > x0 = c(ATP = 0.008, ADP = 1E-5, > Crea = 0.004, CreaP = 0.042, Pi = 0.003, Lactate = 0.005) / 2; > x = multiroot(solve.AcidSpecies, x0, H = 4E-8); > > > print(x) > > > I think that it is possible to use the eqs directly as provided > initially (with some minor corrections). You only need to output the > known totals (as a diff), see full code below. 
> > > Sincerely, > > > Leonard > > > > library(rootSolve) > > solve.AcidSpecies = function(x, H, Mg=0.0006, k = 0.12) { > # with(list(x), { seems NOT to work with multiroot }); > atp = x[1]; adp = x[2]; pi = x[3]; > pcr = x[4]; cr = x[5]; lactate = x[6]; > > ### > hatp <- 10^6.494*H*atp > hhatp <- 10^3.944*H*hatp > hhhatp <- 10^1.9*H*hhatp > atp <- 10*H*hhhatp > mgatp <- 10^4.363*atp*Mg > mghatp <- 10^2.299*hatp*Mg > mg2atp <- 10^1-7*Mg*mgatp > katp <- 10^0.959*atp*k > > hadp <- 10^6.349*adp*H > hhadp <- 10^3.819*hadp*H > hhhadp <- 10*H*hhadp > mgadp <- 10^3.294*Mg*adp > mghadp <- 10^1.61*Mg*hadp > mg2adp <- 10*Mg*mgadp > kadp <- 10^0.82*k*adp > > hpi <- 10^11.616*H*pi > hhpi <- 10^6.7*H*hpi > hhhpi <- 10^1.962*H*hhpi > mgpi <- 10^3.4*Mg*pi > mghpi <- 10^1.946*Mg*hpi > mghhpi <- 10^1.19*Mg*hhpi > kpi <- 10^0.6*k*pi > khpi <- 10^1.218*k*hpi > khhpi <- 10^-0.2*k*hhpi > > hpcr <- 10^14.3*H*pcr > hhpcr <- 10^4.5*H*hpcr > hhhpcr <- 10^2.7*H*hhpcr > pcr <- 100*H*hhhpcr > mghpcr <- 10^1.6*Mg*hpcr > kpcr <- 10^0.74*k*pcr > khpcr <- 10^0.31*k*hpcr > khhpcr <- 10^-0.13*k*hhpcr > > hcr <- 10^14.3*H*cr > hhcr <- 10^2.512*H*hcr > > hlactate <- 10^3.66*H*lactate > mglactate <- 10^0.93*Mg*lactate > > tatp <- atp + hatp + hhatp + hhhatp + mgatp + mghatp + mg2atp + katp > tadp <- adp + hadp + hhadp + hhhadp + mghadp + mgadp + mg2adp + kadp > tpi <- pi + hpi + hhpi + hhhpi + mgpi + mghpi + mghhpi + kpi + khpi + > khhpi > tpcr <- pcr + hpcr + hhpcr + hhhpcr + pcr + mghpcr + kpcr + khpcr > + khhpcr > tcr <- cr + hcr + hhcr > tlactate <- lactate + hlactate + mglactate > # tmg <- Mg + mgatp + mghatp + mg2atp + mgadp + mghadp + mg2adp + mgpi + > # kghpi + mghhpi + mghpcr + mglactate > # tk <- k + katp + kadp + kpi + khpi + khhpi + kpcr + khpcr + khhpcr > > > total = c( > tatp - 0.008, > tadp - 0.1, > tpi - 0.003, > tpcr - 0.042, > tcr - 0.004, > tlactate - 0.005) > return(total); > # }) > } > > # conditions > > x0 = c( > atp = 0.008, > adp = 0.1, > pi = 0.003, > pcr = 0.042, > cr = 
0.004, > lactate = 0.005 > ) / 3; > # tricky to get a positive value !!! > x0[1] = 0.001; # still NOT positive; > > x = multiroot(solve.AcidSpecies, x0, H = 4E-8) > > > On 1/23/2023 12:37 AM, Leonard Mada wrote: >> Dear Troels, >> >> The system that you mentioned needs to be transformed first. The >> equations are standard acid-base equilibria-type equations in >> analytic chemistry. >> >> ATP + H <-> ATPH >> ATPH + H <-> ATPH2 >> ATPH2 + H <-> ATPH3 >> [...] >> The total amount of [ATP] is provided, while the concentration of the >> intermediates are unknown. >> >> Q.) It was unclear from your description: >> Do you know the pH? >> Or is the pH also unknown? >> >> I believe that the system is exactly solvable. The "multivariable" >> system/solution may be easier to write down: but is uglier to solve, >> as the "system" is under-determined. You can use optim in such cases, >> see eg. an example were I use it: >> https://github.com/discoleo/R/blob/master/Stat/Polygons.Examples.R >> >> >> a2 = optim(c(0.9, 0.5), polygon
Re: [R] R emulation of FindRoot in Mathematica
Dear Troels, I send you an updated version of the response. I think that a hybrid approach is probably the best solution: - Input / Starting values = free: ATP, ADP, Crea, CreaP, lactate, inorganic phosphate; - Output: diff(Total, given total value); - I assume that the pH is given; - I did NOT check all individual eqs; library(rootSolve) solve.AcidSpecies = function(x, H, Mg=0.0006, K = 0.12) { # ... the eqs: ...; ATPTotal = KaATPH * ATP * H + KaATPH2 * KaATPH * ATP * H^2 + + KaATPH3 * KaATPH2 * KaATPH * ATP * H^3 + KdATPMg * ATP * Mg + + KdATPHMg * ATP * H * Mg + KdATPMg2 * ATP * Mg^2 + KdATPK * ATP * K; ### Output: total = c( ATPTotal - 0.008, ADPTotal - 1E-5, CreaTotal - 0.004, CreaPTotal - 0.042, PiTotal - 0.003, LactateTotal - 0.005); return(total); } KaATPH = 10^6.494; # ... x0 = c(ATP = 0.008, ADP = 1E-5, Crea = 0.004, CreaP = 0.042, Pi = 0.003, Lactate = 0.005) / 2; x = multiroot(solve.AcidSpecies, x0, H = 4E-8); print(x) I think that it is possible to use the eqs directly as provided initially (with some minor corrections). You only need to output the known totals (as a diff), see full code below. 
Sincerely, Leonard library(rootSolve) solve.AcidSpecies = function(x, H, Mg=0.0006, k = 0.12) { # with(list(x), { seems NOT to work with multiroot }); atp = x[1]; adp = x[2]; pi = x[3]; pcr = x[4]; cr = x[5]; lactate = x[6]; ### hatp <- 10^6.494*H*atp hhatp <- 10^3.944*H*hatp hhhatp <- 10^1.9*H*hhatp atp <- 10*H*hhhatp mgatp <- 10^4.363*atp*Mg mghatp <- 10^2.299*hatp*Mg mg2atp <- 10^1-7*Mg*mgatp katp <- 10^0.959*atp*k hadp <- 10^6.349*adp*H hhadp <- 10^3.819*hadp*H hhhadp <- 10*H*hhadp mgadp <- 10^3.294*Mg*adp mghadp <- 10^1.61*Mg*hadp mg2adp <- 10*Mg*mgadp kadp <- 10^0.82*k*adp hpi <- 10^11.616*H*pi hhpi <- 10^6.7*H*hpi hhhpi <- 10^1.962*H*hhpi mgpi <- 10^3.4*Mg*pi mghpi <- 10^1.946*Mg*hpi mghhpi <- 10^1.19*Mg*hhpi kpi <- 10^0.6*k*pi khpi <- 10^1.218*k*hpi khhpi <- 10^-0.2*k*hhpi hpcr <- 10^14.3*H*pcr hhpcr <- 10^4.5*H*hpcr hhhpcr <- 10^2.7*H*hhpcr pcr <- 100*H*hhhpcr mghpcr <- 10^1.6*Mg*hpcr kpcr <- 10^0.74*k*pcr khpcr <- 10^0.31*k*hpcr khhpcr <- 10^-0.13*k*hhpcr hcr <- 10^14.3*H*cr hhcr <- 10^2.512*H*hcr hlactate <- 10^3.66*H*lactate mglactate <- 10^0.93*Mg*lactate tatp <- atp + hatp + hhatp + hhhatp + mgatp + mghatp + mg2atp + katp tadp <- adp + hadp + hhadp + hhhadp + mghadp + mgadp + mg2adp + kadp tpi <- pi + hpi + hhpi + hhhpi + mgpi + mghpi + mghhpi + kpi + khpi + khhpi tpcr <- pcr + hpcr + hhpcr + hhhpcr + pcr + mghpcr + kpcr + khpcr + khhpcr tcr <- cr + hcr + hhcr tlactate <- lactate + hlactate + mglactate # tmg <- Mg + mgatp + mghatp + mg2atp + mgadp + mghadp + mg2adp + mgpi + # kghpi + mghhpi + mghpcr + mglactate # tk <- k + katp + kadp + kpi + khpi + khhpi + kpcr + khpcr + khhpcr total = c( tatp - 0.008, tadp - 0.1, tpi - 0.003, tpcr - 0.042, tcr - 0.004, tlactate - 0.005) return(total); # }) } # conditions x0 = c( atp = 0.008, adp = 0.1, pi = 0.003, pcr = 0.042, cr = 0.004, lactate = 0.005 ) / 3; # tricky to get a positive value !!! 
x0[1] = 0.001; # still NOT positive; x = multiroot(solve.AcidSpecies, x0, H = 4E-8) On 1/23/2023 12:37 AM, Leonard Mada wrote: > Dear Troels, > > The system that you mentioned needs to be transformed first. The > equations are standard acid-base equilibria-type equations in analytic > chemistry. > > ATP + H <-> ATPH > ATPH + H <-> ATPH2 > ATPH2 + H <-> ATPH3 > [...] > The total amount of [ATP] is provided, while the concentration of the > intermediates are unknown. > > Q.) It was unclear from your description: > Do you know the pH? > Or is the pH also unknown? > > I believe that the system is exactly solvable. The "multivariable" > system/solution may be easier to write down: but is uglier to solve, > as the "system" is under-determined. You can use optim in such cases, > see eg. an example were I use it: > https://github.com/discoleo/R/blob/master/Stat/Polygons.Examples.R > > > a2 = optim(c(0.9, 0.5), polygonOptim, d=x) > # where the function polygonOptim() computes the distance between the > starting-point & ending point of the polygon; > # (the polygon is defined only by the side lengths and optim() tries > to compute the angles); > # optimal distance = 0, when the polygon is closed; > # Note: it is possible to use more than 2 starting values in the > example above (the version with optim) works quit well; > # - but you need to "design" the function that is optimized for your > particular system, e.g. > # by returning: c((ATPTotal - value)^2, (ADPTotal - value)^2, ...); > > > S.1.) Exact Solution > ATP system: You can express all components as eqs of free ATP, [ATP], > and [H], [Mg], [K]. > ATPH = KaATPH * ATP * H; > ATPH2 = KaATPH2 * ATPH * H > = KaATPH2 * KaATPH * ATP * H^2; > [...] > > Then you substitute these into
Re: [R] return value of {....}
Dear Akshay, The best response was given by Andrew: "{...}" is not a closure. This is unusual for someone used to C-type languages, but I will try to explain some of the rationale. If "{...}" were a closure, then external variables would need to be explicitly declared before the closure (in order to reuse those values later): intermediate = c() { intermediate = ...; result = someFUN(intermediate); } 1.) Interactive Sessions This is cumbersome in interactive sessions. For example: you often compute the mean or the variance as intermediary results, and will need them later on as well. They could have been computed outside the "closure", but writing code in interactive sessions may not always be that straightforward. 2.) Missing arguments f = function(x, y) { if(missing(y)) { # assuming x = matrix y = x[,2]; x = x[,1]; } } It would be much more cumbersome to define/use a temporary tempY. I hope this gives a better perspective on why this is indeed a useful feature - even if it is counterintuitive. Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
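A concrete illustration (mine, not from the thread): a braced block simply evaluates to its last expression and creates no new scope, in contrast to a function body:

```r
# `{}` evaluates to its last expression...
v <- { tmp <- 1:4; mean(tmp) }
v     # 2.5

# ...and creates NO new scope: `tmp` leaks into the caller.
tmp   # 1 2 3 4

# A function body, by contrast, keeps its variables local:
f <- function() { tmp2 <- 1:4; mean(tmp2) }
f()   # 2.5
exists("tmp2")   # FALSE
```

This is exactly the behavior that makes the interactive-session and missing-argument patterns above convenient.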
[R] Problem with integrate(function(x) x^3 / sin(x), -pi/2, pi/2)
Dear List-Members, I encounter a problem while trying to integrate the following function: integrate(function(x) x^3 / sin(x), -pi/2, pi/2) # Error in integrate(function(x) x^3/sin(x), -pi/2, pi/2) : # non-finite function value # the value should be finite: curve(x^3 / sin(x), -pi/2, pi/2) integrate(function(x) x^3 / sin(x), -pi/2, 0) integrate(function(x) x^3 / sin(x), 0, pi/2) # but this does NOT work: integrate(function(x) x^3 / sin(x), -pi/2, pi/2, subdivisions=4096) integrate(function(x) x^3 / sin(x), -pi/2, pi/2, subdivisions=4097) # works: integrate(function(x) x^3 / sin(x), -pi/2, pi/2 + 1E-10) # Note: works directly with other packages pracma::integral(function(x) x^3 / sin(x), -pi/2, pi/2 ) # 3.385985 I hope that integrate() gets improved in base R as well. Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
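A likely explanation (my analysis, not in the original post): on the symmetric interval the quadrature evaluates the integrand exactly at x = 0, where x^3/sin(x) is 0/0 = NaN, even though the limit is 0 (x^3/sin(x) ~ x^2 near 0); on the half-intervals the endpoint 0 is never sampled, which is why those calls succeed. Patching this removable singularity by hand makes base integrate() work:

```r
# Patch the removable singularity at x = 0 (the limit value is 0):
f <- function(x) ifelse(x == 0, 0, x^3 / sin(x))

r <- integrate(f, -pi/2, pi/2)
r$value   # ~ 3.385985, in agreement with pracma::integral()
```

The `pi/2 + 1E-10` trick in the post works for the same reason: it shifts the quadrature nodes off the exact point x = 0.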
Re: [R] extract from a list of lists
Dear Terry, The following approach may be more suitable: fits <- lapply(argument, function) fits.df = do.call(rbind, fits); It works if all the lists returned by "function" have the same number of elements. Example: fits = lapply(seq(3), function(id) { list( beta = rnorm(1), loglik = id^2, iter = sample(seq(100,200), 1), id = id); }) fits.df = do.call(rbind, fits); fits.df I have added an id in case the function returns a variable number of "rows". Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
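One caveat worth adding (my note, not from the post): do.call(rbind, fits) on a list of lists yields a matrix whose columns are themselves lists; converting each element with as.data.frame() first gives a regular data.frame with atomic columns — a sketch with deterministic values:

```r
# Sketch: list of one-row lists -> data.frame with atomic columns
# (plain do.call(rbind, fits) would give a matrix of list columns).
fits <- lapply(seq(3), function(id) {
  list(beta = id / 10, loglik = id^2, iter = 100 + id, id = id)
})

fits.df <- do.call(rbind, lapply(fits, as.data.frame))
fits.df$loglik   # 1 4 9
nrow(fits.df)    # 3
```

With atomic columns, the usual data.frame operations (subsetting, ordering by loglik, etc.) then work directly.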
[R] Hidden Problems with Clustering Algorithms
Dear R-Users, Hidden Problems with Clustering Algorithms I stumbled recently upon a presentation about hierarchical clustering. Unfortunately, it contains a hidden problem of clustering algorithms. The problem is deeper and I think that it warrants a closer inspection by the statistical community. The presentation is available online. Both the scaled & non-scaled versions show the problem. de.NBI course - Advanced analysis of quantitative proteomics data using R: 03b Clustering Part2 [Note: it's more like introductory notes to basic statistics] https://www.youtube.com/watch?v=7e1uW_BhljA times: - at 6:15 - 6:28 & 6:29 - 7:10 [2 versions, both non-scaled] - at 5:51 - 6:10 [the scaled version] - same problem at 7:56; PROBLEM Non-Scaled Version: (e.g. the one at 6:15) - the upper 2 rows are split into various sub-clusters; - the top tree: a cluster is formed by the right-right sub-tree (some 17 "genes" or similar "activities" / "expressions"); - the left-most 2 "genes" are actually over-expressed "genes" and functionally really belong to the previous/right sub-cluster; Scaled-Version: (at 5:52) - the left-most 2 "genes" are over-expressed at the same time with the right cluster, and not otherwise; Unfortunately, the 2 over-expressed (outliers or extreme-values) are split off from the relevant cluster and inserted as a separate main-branch in the top dendrogram. Switching only the main left & right branches in the top tree would only mask this problem. The 2 pseudo-outliers are really the (probably) upper values in the larger cluster of over-expressed "genes" (all the dark genes should belong to the same cluster). The middle sub-cluster shows really NO activity (some 16 "genes"). The main branches in the top tree should really split between this *NO*-activity cluster and the cluster showing activity (including the 2 massively over-expressed genes). The problem is present in the scaled version as well. The hierarchical clustering algorithm fails. 
I have not analysed the data, but some problems may contribute to this: - "gene expression" or "activity" may not be linear, but exponential or follow some power rule: a logarithmic transformation (or some other transformation) may have been useful; - simple distances between clusters may be too inaccurate; - the variance in the low-activity (middle) cluster may be very low (almost 0!), while the variance in the high-activity cluster may be much higher: the Mahalanobis distance or joining the sub-clusters based on some z/t-test taking into account the different variances may be more robust; These questions should be addressed by more senior statisticians. I hope that the presentation remains on-line as is, as the clustering problem is really easy to see and to analyse. It is impossible to detect and visualise such anomalies in a heatmap with 1,000 gene-expressions or with 10,000 genes, or with 500-1000 samples. It is very obvious on this small heatmap. I do not know if there are any robust tools to validate the generated trees. Inspecting by "eye" a dendrogram with > 1,000 genes and hundreds of samples is really futile. Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Partition vector of strings into lines of preferred width
Dear Andrew, Thank you for the fast reply. I forgot about strwrap. Though my problem is slightly different: I do have the actual vector. Of course, I could first join the strings - but this then involves both a join and a split (done by strwrap). Maybe it's possible to avoid the join and the split. My 2nd approach may also be fine, but I have not tested it thoroughly (and I may miss an existing solution). Sincerely, Leonard On 10/29/2022 12:51 AM, Andrew Simmons wrote: I would suggest using strwrap(), the documentation at ?strwrap has plenty of details and examples. For paragraphs, I would usually do something like: strwrap(x = , width = 80, indent = 4) On Fri, Oct 28, 2022 at 5:42 PM Leonard Mada via R-help wrote: Dear R-Users, text = " What is the best way to split/cut a vector of strings into lines of preferred width? I have come up with a simple solution, albeit naive, as it involves many arithmetic divisions. I have an alternative idea which avoids this problem. But I may miss some existing functionality!" # Long vector of strings: str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]]; lenWords = nchar(str); # simple, but naive solution: # - it involves many divisions; cut.character.int = function(n, w) { ncm = cumsum(n); nwd = ncm %/% w; count = rle(nwd)$lengths; pos = cumsum(count); posS = pos[ - length(pos)] + 1; posS = c(1, posS); pos = rbind(posS, pos); return(pos); } npos = cut.character.int(lenWords, w=30); # let's print the results; for(id in seq(ncol(npos))) { len = npos[2, id] - npos[1, id]; cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n")); } The first solution performs an arithmetic division on all string lengths. It is possible to find out the total length and divide only the last element of the cumsum. Something like this should work (although it is not properly tested).
w = 30; cumlen = cumsum(lenWords); max = tail(cumlen, 1) %/% w + 1; pos = cut(cumlen, seq(0, max) * w); count = rle(as.numeric(pos))$lengths; # everything else is the same; pos = cumsum(count); posS = pos[ - length(pos)] + 1; posS = c(1, posS); pos = rbind(posS, pos); npos = pos; # then print The cut() may be optimized as well, as the cumsum is sorted ascending. I did not evaluate the efficiency of the code either. But do I miss some existing functionality? Note: - technically, the cut() function should probably return a vector of indices (something like: rep(seq_along(count), count)), but it was more practical to have both the start and end positions. Many thanks, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Partition vector of strings into lines of preferred width
Dear R-Users, text = " What is the best way to split/cut a vector of strings into lines of preferred width? I have come up with a simple solution, albeit naive, as it involves many arithmetic divisions. I have an alternative idea which avoids this problem. But I may miss some existing functionality!" # Long vector of strings: str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]]; lenWords = nchar(str); # simple, but naive solution: # - it involves many divisions; cut.character.int = function(n, w) { ncm = cumsum(n); nwd = ncm %/% w; count = rle(nwd)$lengths; pos = cumsum(count); posS = pos[ - length(pos)] + 1; posS = c(1, posS); pos = rbind(posS, pos); return(pos); } npos = cut.character.int(lenWords, w=30); # lets print the results; for(id in seq(ncol(npos))) { len = npos[2, id] - npos[1, id]; cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n")); } The first solution performs an arithmetic division on all string lengths. It is possible to find out the total length and divide only the last element of the cumsum. Something like this should work (although it is not properly tested). w = 30; cumlen = cumsum(lenWords); max = tail(cumlen, 1) %/% w + 1; pos = cut(cumlen, seq(0, max) * w); count = rle(as.numeric(pos))$lengths; # everything else is the same; pos = cumsum(count); posS = pos[ - length(pos)] + 1; posS = c(1, posS); pos = rbind(posS, pos); npos = pos; # then print The cut() may be optimized as well, as the cumsum is sorted ascending. I did not evaluate the efficiency of the code either. But do I miss some existing functionality? Note: - technically, the cut() function should probably return a vector of indices (something like: rep(seq_along(count), count)), but it was more practical to have both the start and end positions. 
Many thanks, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
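As the post notes, the cut() step can exploit the fact that the cumulative sums are sorted ascending; findInterval() performs exactly that search. A sketch of this variant (my code, implementing the same cumsum-based grouping as the %/% approach):

```r
# Sketch: group words into width-limited lines via findInterval()
# on the sorted cumulative lengths; left.open = TRUE reproduces the
# (0, w], (w, 2w], ... bins that cut() would use.
words  <- strsplit("What is the best way to split a vector of strings",
                   " ")[[1]]
w      <- 12
cumlen <- cumsum(nchar(words) + 1L)   # +1 for the separating space
brk    <- seq(0L, max(cumlen) + w, by = w)
grp    <- findInterval(cumlen, brk, left.open = TRUE)

# assemble the lines
unname(tapply(words, grp, paste, collapse = " "))
# "What is the" "best way to" "split a" "vector of" "strings"
```

Like the %/% version, this is a global binning rather than a true greedy wrap, but it avoids both the per-element division and the factor machinery of cut().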
[R] [External Help] Multivariate Polynomials in R
Dear R Users, I have written some R code for multivariate polynomials in R. I am looking forward for some help in redesigning and improving the code. Although this code was not planned initially to be released as a package, the functionality has become quite versatile over time. I will provide some examples below. If anyone is interested in multivariate polynomials and has some spare time, or has some students interested to learn some interesting math, feel free to contact me. The immediate focus should be on: 1) Writing/improving the automatic tests; 2) Redesigning the code (and build an R package); As the code has grown in size, I am very cautious to change anything, until proper tests are written. I have started to write some test code (the link to the GitHub page is below), but I am not yet very confident how to properly write the tests and also lack the time as well. I will appreciate any expertise and help on this topic. Ultimately, I hope to be able to focus more on the math topics. I will post a separate call for some of these topics. CODE DETAILS The source files are on GitHub: https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.R - (all) files named Polynomials.Helper.XXX.R are needed; (~ 25 files, including the test files); - if requested, I can also upload a zip file with all these source files; - the code started as some helper scripts (which is why all those files are mixed with other files); The multivariate polynomials are stored as data.frames and R's aggregate() function is the workhorse: but it proved sufficiently fast and the code works well even with polynomials with > 10,000 monomials. I have some older Java code which used a TreeMap (sorted map), but I do not maintain that code anymore. I was very reserved initially regarding the efficiency of the data frame; but it worked well! And it proved very useful for sub-setting specific monomials! I have attached some concrete examples below. 
Sincerely, Leonard

### Examples

source("Polynomials.Helper.R")
# - requires also the other Helper scripts;
# - not strictly needed (but are loaded automatically):
#   library(polynom)
#   library(pracma)

### Example 1:
n = 2; # power "n" will be evaluated automatically;
p1 = toPoly.pm("x^n*y + b*z - R")
p2 = toPoly.pm("y^n*z + b*x - R")
p3 = toPoly.pm("z^n*x + b*y - R")

pR = solve.lpm(p1, p2, p3, xn=c("z", "y"))
str(pR) # 124 monomials

# tweaking manually can improve the results:
pR = solve.lpm(p1, p2, p3, xn=c("y", "z"))
str(pR) # pR[[2]]$Rez: 19 monomials: much better;

pR2 = div.pm(pR[[2]]$Rez, "x^3 + b*x - R", "x") # divisible!
str(pR2) # order 12 polynomial in x (24 monomials);

### Note:
# - the P[12] contains the distinct roots:
#   it is the minimal-order polynomial;
# - the trivial solution (x^3 + b*x = R) was removed;
# - this is the naive way to solve this system (but good as a demo);

# print the coefficients of x
# (will be used inside the function coeff.S3Ht below);
pR2 = pR2$Rez;
pR2$coeff = - pR2$coeff; # positive leading coefficient;
toCoeff(pR2, "x")

### Quick Check
solve.S3Ht = function(R, b) {
	coeff = coeff.S3Ht(R, b);
	x = roots(coeff); # library(pracma)
	# Note: pracma orders from leading to free coefficient;
	z = b*x^11 - R*x^10 - 2*R^2*b*x^5 + 2*R^2*b^2*x^3 + R*b^4*x^2 - R*b^5;
	z = z / (- R^2*x^6 - R*b^2*x^5 + 3*R*b^3*x^3 - b^6);
	y = (R - z^2*x) / b;
	sol = cbind(x, y, z);
	return(sol);
}
coeff.S3Ht = function(R, b) {
	coeff = c(b^2, - 2*R*b, R^2 - b^3, 3*R*b^2, - 3*R^2*b + b^4,
		R^3 - 4*R*b^3, 2*R^2*b^2 - b^5, 5*R*b^4, R^4 - R^2*b^3 + b^6,
		- 3*R^3*b^2 - 3*R*b^5, - R^4*b + 3*R^2*b^4 - b^7,
		2*R^3*b^3 - R*b^6, - R^2*b^5 + b^8);
	return(coeff);
}

R = 5; b = -2;
sol = solve.S3Ht(R, b) # all 12 sets of solutions;
x = sol[,1]; y = sol[,2]; z = sol[,3];

### Test:
x^2*y + b*z
y^2*z + b*x
z^2*x + b*y

id = 1;
eval.pm(p1, list(x=x[id], y=y[id], z=z[id], b=b, R=R))


### Example 2:
n = 5
p1 = toPoly.pm("(x + a)^n + (y + a)^n - R1")
p2 = toPoly.pm("(x + b)*(y + b) - R2")

# very naive way to solve:
pR = solve.pm(p1, p2, "y")
str(pR)
table(pR$Rez$x) # order 10 with 109 monomials; [very naive!]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
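For readers curious how the data.frame representation works in principle, here is a minimal self-contained sketch (my own illustration, not the actual package code): each row is one monomial, the variable columns hold the exponents, the 'coeff' column holds the coefficient, and aggregate() collapses monomials with identical exponents after a multiplication.

```r
# Minimal sketch of data.frame polynomials (hypothetical helper,
# not the actual Polynomials.Helper code):
mult.pm.sketch <- function(p1, p2) {
    vars <- setdiff(union(names(p1), names(p2)), "coeff")
    # make sure both polynomials carry all variable columns:
    for (v in vars) {
        if (is.null(p1[[v]])) p1[[v]] <- 0
        if (is.null(p2[[v]])) p2[[v]] <- 0
    }
    # cross every monomial of p1 with every monomial of p2:
    idx  <- expand.grid(i = seq_len(nrow(p1)), j = seq_len(nrow(p2)))
    # exponents add, coefficients multiply:
    prod <- p1[idx$i, vars, drop = FALSE] + p2[idx$j, vars, drop = FALSE]
    prod$coeff <- p1$coeff[idx$i] * p2$coeff[idx$j]
    # aggregate() collapses monomials with identical exponents:
    aggregate(coeff ~ ., data = prod, FUN = sum)
}

p1  <- data.frame(x = c(1, 0), coeff = c(1, 1))   # x + 1
p2  <- data.frame(x = c(1, 0), coeff = c(1, -1))  # x - 1
res <- mult.pm.sketch(p1, p2)
res[res$coeff != 0, ]   # the monomials of x^2 - 1
```

The sub-setting benefit mentioned above follows directly: selecting, say, all monomials containing x is just `res[res$x > 0, ]`.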
Re: [R] regexpr: R takes very long with non-existent pattern
Dear Bert,

The variable patt does not exist in the R environment. I was pasting the code for an R function into the R console and had a syntax error on one line; the subsequent lines were then executed as plain R code, and patt had not been defined previously.

However, x was a different object than stated, and the long execution time may originate there:
x = the original xml with the Pubmed abstracts.

Sincerely, Leonard

On 5/19/2022 3:31 AM, Bert Gunter wrote:
> Doubt that I can help, but what does "not defined" mean? -- NA, "", " " ? Something else?
> I would guess that if it's NA, you should get an immediate error.
> If it's "", that's a legitimate pattern and would result in matches of 0 length for everything, which might trigger an error in other parts of your code.
> All a guess, though.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Wed, May 18, 2022 at 5:08 PM Leonard Mada via R-help wrote:
>
> Dear R Users,
>
> I have run the following command in R:
>
> # x = larger vector of strings (1200 Pubmed abstracts);
> # patt = not defined;
> npos = regexpr(patt, x, perl=TRUE);
> # Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found
>
> The problem: R becomes unresponsive and it takes 1-2 minutes to return the error. The operation completes almost instantaneously with a valid pattern.
>
> Is there a reason for this behavior? Tested with R 4.2.0 on MS Windows 10.
>
> I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone wants to check:
> - see file: Example_Abstracts_Title_Pubmed.csv;
> https://github.com/discoleo/R/tree/master/TextMining/Pubmed
>
> The variable patt was not defined due to an error: but it took very long to exit the operation and report the error.
> Many thanks,
>
> Leonard
Re: [R] regexpr: R takes very long with non-existent pattern
Dear Andrew,

I got this slightly wrong: the object was not a string vector, but an xml object (the original xml with the abstracts).

str(x)
# List of 2
#  $ node:
#  $ doc :
#  - attr(*, "class")= chr [1:2] "xml_document" "xml_node"

I pasted the R code for a function but had an error, which stopped the parsing of the function. The next lines were still executed:

npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found

The variable x was actually the xml object - my mistake. It still takes 1-2 minutes to generate the final error. Is regexpr trying to convert the xml with as.character first (I have not checked this)? It would make more sense to parse the regex pattern first.

Sincerely, Leonard

On 5/19/2022 3:26 AM, Andrew Simmons wrote:

Hello,

I tried this myself, something like:

dat <- utils::read.csv(
    "https://raw.githubusercontent.com/discoleo/R/master/TextMining/Pubmed/Example_Abstracts_Title_Pubmed.csv",
    check.names = FALSE
)
regexpr(patt, dat$Abstract, perl = TRUE)
regexpr(patt, dat$Title, perl = TRUE)

and I can't reproduce your issue. Mine seems to raise the error within a second or less that object 'patt' does not exist. I'm using R 4.2.0 and Windows 11, though that shouldn't make a difference: if you look at Sys.info(), it's still Windows 10 with a build version of 22000. Don't really know what else to say; have you tried it again since?

Regards,
Andrew Simmons

On Wed, May 18, 2022 at 5:09 PM Leonard Mada via R-help wrote:

Dear R Users,

I have run the following command in R:

# x = larger vector of strings (1200 Pubmed abstracts);
# patt = not defined;
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found

The problem: R becomes unresponsive and it takes 1-2 minutes to return the error. The operation completes almost instantaneously with a valid pattern.

Is there a reason for this behavior? Tested with R 4.2.0 on MS Windows 10.
I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone wants to check:
- see file: Example_Abstracts_Title_Pubmed.csv;
https://github.com/discoleo/R/tree/master/TextMining/Pubmed

The variable patt was not defined due to an error: but it took very long to exit the operation and report the error.

Many thanks,
Leonard
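A possible explanation, consistent with ?regexpr (the text argument is "a character vector ... or an object which can be coerced by as.character"): a non-character x is coerced first, and deparsing a large xml document into character can plausibly dominate the run time before the missing pattern is ever needed. A small defensive wrapper (my own sketch, not base R) fails fast instead:

```r
# Hypothetical guard: fail fast on non-character input instead of
# letting regexpr() coerce a large object via as.character().
regexprChr <- function(pattern, text, ...) {
    if (!is.character(text))
        stop("'text' must be a character vector")
    regexpr(pattern, text, ...)
}

regexprChr("b", c("abc", "xyz"))  # positions 2 and -1 (no match)
# regexprChr("b", some_xml_object)  # would error immediately
```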
[R] regexpr: R takes very long with non-existent pattern
Dear R Users,

I have run the following command in R:

# x = larger vector of strings (1200 Pubmed abstracts);
# patt = not defined;
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found

The problem: R becomes unresponsive and it takes 1-2 minutes to return the error. The operation completes almost instantaneously with a valid pattern.

Is there a reason for this behavior? Tested with R 4.2.0 on MS Windows 10.

I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone wants to check:
- see file: Example_Abstracts_Title_Pubmed.csv;
https://github.com/discoleo/R/tree/master/TextMining/Pubmed

The variable patt was not defined due to an error: but it took very long to exit the operation and report the error.

Many thanks,
Leonard
Re: [R] How to convert category (or range/group) into continuous?
Dear Marna,

If you want to extract the middle of those intervals, please find below an improved variant of Luigi's code.

Note:
- it is more efficient to process the levels of a factor, instead of all the individual strings;
- I expect benefits on a large data frame (> 1 million rows), although I have not explicitly checked this;
- the code also handles the open/closed interval brackets better;
- the returned data structure may require some tweaking (currently a data.frame is returned);

### Middle of an Interval
mid.factor = function(x, inf.to = NULL, split.str = ",") {
	lvl0 = levels(x);
	lvl = lvl0;
	lvl = sub("^[(\\[]", "", lvl);
	lvl = sub("[])]$", "", lvl); # tricky;
	lvl = strsplit(lvl, split.str);
	lvl = lapply(lvl, function(x) as.numeric(x));
	if( ! is.null(inf.to)) {
		FUN = function(x) {
			if(any(x == Inf)) 1 else if(any(x == - Inf)) -1 else 0;
		}
		whatInf = sapply(lvl, FUN); # TODO: more advanced;
		lvl[whatInf == -1] = inf.to[1];
		lvl[whatInf == 1] = inf.to[2];
	}
	mid = sapply(lvl, mean);
	lvl = data.frame(lvl = lvl0, mid = mid);
	merge(data.frame(lvl = x), lvl, by = "lvl");
}

# uses the daT data frame;
# requires a factor:
# - this is probably the case with the original data;
daT$group = as.factor(daT$group);
mid.factor(daT$group);

I have uploaded this code also to my GitHub collection of useful data tools:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

Sincerely, Leonard
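As a self-contained usage illustration, here is a stripped-down variant of the same idea (my own sketch, without the Inf handling of mid.factor above), paired with cut(), which produces exactly this kind of interval label:

```r
# Recover interval midpoints from cut()-style labels like "(0,5]".
# Simplified sketch: assumes finite numeric bounds in every level.
mid.simple <- function(f) {
    lvl <- levels(f)
    lvl <- sub("^[(\\[]", "", lvl)   # strip leading "(" or "["
    lvl <- sub("[])]$", "", lvl)     # strip trailing ")" or "]"
    bounds <- strsplit(lvl, ",")
    mid <- vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))
    mid[as.integer(f)]   # map each observation to its interval midpoint
}

x <- c(1, 4, 7, 9)
g <- cut(x, breaks = c(0, 5, 10))   # levels "(0,5]" and "(5,10]"
mid.simple(g)                        # 2.5 2.5 7.5 7.5
```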
Re: [R] Method Guidance
Dear Jeff,

I am sending an updated version of the code. The initial version assumed that the time points correspond to an integer sequence and would fail for arbitrary times. The new code is robust.

I still assume that the data is in column format and that you want the time to the previous "A"-event, even if there are other non-A events in between. The code is similar, but we cannot use seq(0, x-1) anymore. Instead, we repeat the time point of the previous A-event (last A carried forward).

# jrdf = the data frame from my previous mail;
cumEvent = cumsum(jrdf$Event_A);
# we cannot use the actual values of the cumsum,
# but we use the number of (equal) values up to the previous event;
freqEvent = rle(cumEvent);
freqEvent = freqEvent$lengths;
# repeat the time points:
timesA = jrdf$Time[jrdf$Event_A == 1];
sameTime = rep(timesA, freqEvent);
timeToA = jrdf$Time - sameTime;

### Step 2:
# extract/view the times (as before);
timeToA[jrdf$Event_B >= 1];
# every time-to-A: e.g. for multiple extractions;
cbind(jrdf, timeToA);
# time-to-A only for B: set the non-B entries to 0;

# Note:
# - the rle() function might be less known;
# - it is "equivalent" to:
tbl = table(cumEvent);
# to be on the safe side (as the cumsum is increasing):
id = order(as.numeric(names(tbl)));
tbl = tbl[id];

Hope this helps,
Leonard

On 1/14/2022 3:30 AM, Leonard Mada wrote:

Dear Jeff,

My answer is a little bit late, but I hope it helps.

jrdf = read.table(text="Time Event_A Event_B Lag_B
1 1 1 0
2 0 1 1
3 0 0 0
4 1 0 0
5 0 1 1
6 0 0 0
7 0 1 3
8 1 1 0
9 0 0 0
10 0 1 2", header=TRUE, stringsAsFactors=FALSE)

Assuming that:
- Time, Event_A, Event_B are given;
- Lag_B needs to be computed;

### Step 1:
# compute the time to the previous Event A;
tmp = jrdf[, c(1,2)];
# add an extra event so the last rows are not lost:
tmp = rbind(tmp, c(nrow(tmp) + 1, 1));
timeBetweenA = diff(tmp$Time[tmp$Event_A > 0]);
timeToA = unlist(sapply(timeBetweenA, function(x) seq(0, x-1)))

### Step 2:
# extract the times;
timeToA[jrdf$Event_B >= 1];
cbind(jrdf, timeToA);

Sincerely, Leonard
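The rle() approach above can be demonstrated end-to-end on a tiny example with arbitrary (non-consecutive) time points, which is exactly the case the update handles; the sketch below assumes, as the code does, that the series starts with an A-event:

```r
# Self-contained demo of the "last A carried forward" idea via rle(),
# on hypothetical data with irregular time points:
jrdf <- read.table(text = "Time Event_A Event_B
1 1 1
3 0 1
6 0 0
7 1 0
10 0 1", header = TRUE)

cumEvent  <- cumsum(jrdf$Event_A)
freqEvent <- rle(cumEvent)$lengths        # rows belonging to each A-event
timesA    <- jrdf$Time[jrdf$Event_A == 1] # time points of the A-events
sameTime  <- rep(timesA, freqEvent)       # last A-time carried forward
timeToA   <- jrdf$Time - sameTime
timeToA                                   # 0 2 5 0 3
```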
Re: [R] Method Guidance
Dear Jeff,

My answer is a little bit late, but I hope it helps.

jrdf = read.table(text="Time Event_A Event_B Lag_B
1 1 1 0
2 0 1 1
3 0 0 0
4 1 0 0
5 0 1 1
6 0 0 0
7 0 1 3
8 1 1 0
9 0 0 0
10 0 1 2", header=TRUE, stringsAsFactors=FALSE)

Assuming that:
- Time, Event_A, Event_B are given;
- Lag_B needs to be computed;

### Step 1:
# compute the time to the previous Event A;
tmp = jrdf[, c(1,2)];
# add an extra event so the last rows are not lost:
tmp = rbind(tmp, c(nrow(tmp) + 1, 1));
timeBetweenA = diff(tmp$Time[tmp$Event_A > 0]);
timeToA = unlist(sapply(timeBetweenA, function(x) seq(0, x-1)))

### Step 2:
# extract the times;
timeToA[jrdf$Event_B >= 1];
cbind(jrdf, timeToA);

Sincerely, Leonard
Re: [R] Sum every n (4) observations by group
Dear Miluji,

something like this could help:

sapply(tapply(x$Value, x$ID, cumsum), function(x)
	x[seq(4, length(x), by=4)] - c(0, x[head(seq(4, length(x), by=4), -1)]))

1.) Step 1: compute the cumsum for each ID:
tapply(x$Value, x$ID, cumsum)

2.) Step 2:
- iterate over the resulting list and select every 4th value;
- you can either run diff() on this, or subtract the (n-4)th sum directly;

Note:
- you may wish to check whether each group's length is a multiple of 4 (a trailing, incomplete block is otherwise dropped);
- alternative: you can do a LOCF (last observation carried forward);

Hope this code example helps.

Sincerely, Leonard
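An alternative sketch of the same task (on hypothetical data, since the original x was not shown): instead of differencing cumulative sums, assign each observation a block index within its group and sum per (ID, block):

```r
# Sum each consecutive block of 4 values per ID, via a block index:
x <- data.frame(ID = rep(c("a", "b"), each = 8), Value = 1:16)

block <- stats::ave(x$Value, x$ID, FUN = seq_along)  # position within ID
block <- (block - 1) %/% 4                           # 0-based block of 4

tapply(x$Value, list(x$ID, block), sum)
#    0  1
# a 10 26
# b 42 58
```

This avoids any edge cases around the cumulative sums; a trailing incomplete block simply becomes its own (shorter) group.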
Re: [R] How to create a proper S4 class?
Dear Martin,

Thank you very much for the guidance. Ultimately, I got it running. But, for mysterious reasons, it was challenging:
- I skipped the inheritance for now (and used 2 explicit, non-inherited slots): this is still unresolved; [*]
- the code is definitely cleaner;

[*] Mysterious errors, like:
"Error in cbind(deparse.level, ...) : cbind for agentMatrix is only defined for 2 agentMatrices"

One last question pops up: if B inherits from A, how can I down-cast back to A?
b = new("B", someA);
# ??? as.A(b) ???
Is there a direct method? I could not explore this, as I am still struggling with the inheritance. The information may be useful, though: it helps in deciding the design of the data structures. [Actually, all base methods should work natively as well - but it is good to have a solution in any case.]

Sincerely, Leonard

On 11/17/2021 5:48 PM, Martin Morgan wrote:

Hi Leonard --

Remember that a class can have 'has a' and 'is a' relationships. For instance, a People class might HAVE slots name and age

.People <- setClass(
    "People",
    slots = c(name = "character", age = "numeric")
)

while an Employees class might be described as an 'is a' relationship -- all employees are people -- while also having slots like years_of_employment and job_title

.Employees <- setClass(
    "Employees",
    contains = "People",
    slots = c(years_of_employment = "numeric", job_title = "character")
)

I've used .People and .Employees to capture the return value of setClass(), and these can be used as constructors

people <- .People(
    name = c("Simon", "Andre"),
    age = c(30, 60)
)

employees = .Employees(
    people, # unnamed arguments are class(es) contained in 'Employees'
    years_of_employment = c(3, 30),
    job_title = c("hard worker", "manager")
)

I would not suggest using attributes in addition to slots. Rather, embrace the paradigm and represent attributes as additional slots.
In practice it is often helpful to write a constructor function that might transform between formats useful for users and formats useful for programming, and that can be easily documented.

Employees <- function(name, age, years_of_employment, job_title) {
    ## implement sanity checks here, or in validity methods
    people <- .People(name = name, age = age)
    .Employees(people,
        years_of_employment = years_of_employment,
        job_title = job_title)
}

plot() and lines() are both S3 generics, and the rules for S3 generics using S4 objects are described in the help page ?Methods_for_S3. Likely you will want to implement a show() method; show() is an S4 generic, so see ?Methods_Details. Typically this should use accessors rather than relying on direct slot access, e.g.,

person_names <- function(x) x@name
employee_names <- person_names

The next method implemented is often [ (single-bracket subset); this is relatively complicated to get right, but worth exploring.

I hope that gets you a little further along the road.

Martin Morgan

On 11/16/21, 11:34 PM, "R-help on behalf of Leonard Mada via R-help" wrote:

Dear List-Members,

I want to create an S4 class with 2 data slots, as well as a plot and a lines method. Unfortunately I lack any experience with S4 classes. I have put together some working code - but I presume that it is not the best way to do it. The actual code is also available on GitHub (see below).

1.) S4 class - should contain 2 data slots:
Slot 1: the agents: an agentMatrix object (defined externally; NetlogoR S4 class);
Slot 2: the path traveled by the agents: a data frame (x, y, id);
- my current code defines only the agents ("t"):
setClass("agentsWithPath", contains = c(t = "agentMatrix"));

1.b.) Attribute with colors specific for each agent
- should probably be an attribute attached to the agentMatrix, and not a proper data slot;
Note:
- it is currently an attribute on the path data.frame, but I will probably change this once I get the S4 class properly implemented;
- the agentMatrix does NOT store the colors (which are stored in another class - but it is useful to have this information available with the agents);

2.) plot & lines methods for this class:
plot.agentsWithPath;
lines.agentsWithPath;

I somehow got stuck with the S4 class definition. Though it may be a good opportunity to learn about S4 classes (and it is probably better suited as an S4 class than the polynomials). The GitHub code draws the agents, but was somehow hacked together. For anyone interested:
https://github.com/discoleo/R/blob/master/Stat/ABM.Models.Particles.R
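Regarding the down-cast question above: for S4 classes, as(object, "ParentClass") performs the coercion directly; there is no need for a hand-written as.A(). A minimal sketch using Martin's People/Employees example:

```r
library(methods)

# Martin's classes, reduced to the essentials:
.People <- setClass("People",
    slots = c(name = "character", age = "numeric"))
.Employees <- setClass("Employees", contains = "People",
    slots = c(job_title = "character"))

# unnamed argument initializes the contained ("is a") part:
e <- .Employees(.People(name = "Simon", age = 30), job_title = "manager")

p <- as(e, "People")   # down-cast: coerce back to the parent class
class(p)[1]            # "People"
p@name                 # "Simon"
```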
[R] How to create a proper S4 class?
Dear List-Members,

I want to create an S4 class with 2 data slots, as well as a plot and a lines method. Unfortunately I lack any experience with S4 classes. I have put together some working code - but I presume that it is not the best way to do it. The actual code is also available on GitHub (see below).

1.) S4 class - should contain 2 data slots:
Slot 1: the agents: an agentMatrix object (defined externally; NetlogoR S4 class);
Slot 2: the path traveled by the agents: a data frame (x, y, id);
- my current code defines only the agents ("t"):
setClass("agentsWithPath", contains = c(t = "agentMatrix"));

1.b.) Attribute with colors specific for each agent
- should probably be an attribute attached to the agentMatrix, and not a proper data slot;
Note:
- it is currently an attribute on the path data.frame, but I will probably change this once I get the S4 class properly implemented;
- the agentMatrix does NOT store the colors (which are stored in another class - but it is useful to have this information available with the agents);

2.) plot & lines methods for this class:
plot.agentsWithPath;
lines.agentsWithPath;

I somehow got stuck with the S4 class definition. Though it may be a good opportunity to learn about S4 classes (and it is probably better suited as an S4 class than the polynomials). The GitHub code draws the agents, but was somehow hacked together. For anyone interested:
https://github.com/discoleo/R/blob/master/Stat/ABM.Models.Particles.R

Many thanks,
Leonard
[R] Dispatching on 2 arguments?
Dear List-members,

I would like to experiment with dispatching on 2 arguments and have a few questions.

p1 = data.frame(x=1:3, coeff=1)
class(p1) = c("pm", class(p1));

I want to replace variables in a polynomial with either: another polynomial, another variable (character), or a specific value.

1.) Can I dispatch on 2 arguments?

replace.pm.? = function(p1, p2, ...) {...}

or classic:

replace.pm = function(p1, p2, ...) {
	if(is.numeric(p2) || is.complex(p2))
		return(replace.pm.numeric(p1, p2, ...));
	if(is.character(p2))
		return(replace.pm.character(p1, p2=p2, ...));
	else ...
}

I will provide some realistic examples below.

2.) Advantages / disadvantages of each method
What are the advantages and disadvantages of each of these methods? I do not yet understand which would be the best design.

Real example:

### Quintic
p1 = toPoly.pm("x^5 - 5*K*x^3 - 5*(K^2 + K)*x^2 - 5*K^3*x - K^4 - 6*K^3 + 5*K^2 - K")

# fractional powers: [works as well]
r = toPoly.pm("K^(4/5) + K^(3/5) + K^(1/5)")
# - we just found a root of a non-trivial quintic!
#   all variables/monomials got cancelled;
replace.pm(p1, r, "x", pow=1)

# more formal:
r = toPoly.pm("k^4*m^4 + k^3*m^3 + k*m")
# m^5 = 1;
# m = any of the 5 roots of unity of order 5;
pR = p1;
pR = replace.pm(pR, r, xn="x")          # poly
pR = replace.pm(pR, "K", xn="k", pow=5) # character
pR = replace.pm(pR, 1, xn="m", pow=5)   # value
pR
# the roots worked! [no remaining rows]
# - we just found ALL 5 roots!

The code is on GitHub (see below).

Sincerely, Leonard

===

# very experimental code:
# some names & arguments may change;
source("Polynomials.Helper.R")
# also required, but loaded automatically if present in the wd:
# source("Polynomials.Helper.Parser.R")
# source("Polynomials.Helper.Format.R")
### not necessary for this test (just loaded):
# source("Polynomials.Helper.D.R")
# source("Polynomials.Helper.Factorize.R")
# the libraries pracma & polynom are not really required for this test either;

### GitHub:
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.R
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.Parser.R
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.Format.R
# not necessary for this test:
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.D.R
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.Factorize.R
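To the question "Can I dispatch on 2 arguments?": S3 dispatches on the first argument only, but S4 generics support multiple dispatch via the signature. A minimal sketch (the generic name and return values are placeholders; the real replace.pm does actual substitution):

```r
library(methods)

# S4 generic dispatching on BOTH arguments:
setGeneric("replacePM",
    function(p1, p2, ...) standardGeneric("replacePM"))

# "data.frame" is a registered S4 class, so it can appear in signatures:
setMethod("replacePM", signature("data.frame", "numeric"),
    function(p1, p2, ...) "substitute a numeric value")
setMethod("replacePM", signature("data.frame", "character"),
    function(p1, p2, ...) "substitute another variable")

p <- data.frame(x = 1:3, coeff = 1)
replacePM(p, 2)     # "substitute a numeric value"
replacePM(p, "K")   # "substitute another variable"
```

The classic is.numeric()/is.character() chain in the post does the same job with less machinery; S4 wins mainly when the method table grows (many p1 classes times many p2 classes).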
Re: [R] names.data.frame?
Thank you very much. Indeed, NextMethod() is the correct way and it is working fine. There are some alternatives (as pointed out), although I am still trying to figure out the best design strategy for such a class.

Note:
- I wanted to exclude the "coeff" column from the returned names;

names.pm = function(p) {
	nms = NextMethod();
	# exclude the coefficients:
	id = match("coeff", nms);
	return(nms[ - id]);
}

This is why I also hesitate regarding what to use: dimnames or names.

Sincerely, Leonard

On 11/3/2021 8:54 PM, Andrew Simmons wrote:
> First, your signature for names.pm is wrong. It should look something more like:
>
> names.pm <- function (x)
> {
> }
>
> As for the body of the function, you might do something like:
>
> names.pm <- function (x)
> {
>     NextMethod()
> }
>
> but you don't need to define a names method if you're just going to call the next method. I would suggest not defining a names method at all.
>
> As a side note, I would suggest making your class through the methods package, with methods::setClass("pm", ...)
> See the documentation for setClass for more details; it's the recommended way to define classes in R.
>
> On Wed, Nov 3, 2021 at 2:36 PM Leonard Mada via R-help wrote:
>
> Dear List members,
>
> Is there a way to access the default names() function?
>
> I tried the following:
>
> # Multi-variable polynomial
> p = data.frame(x=1:3, coeff=1)
> class(p) = c("pm", class(p));
>
> names.pm = function(p) {
>     # .Primitive("names")(p) # does NOT function
>     # .Internal("names")(p) # does NOT function
>     # nms = names.default(p) # does NOT exist
>     # nms = names.data.frame(p) # does NOT exist
>     # nms = names(p); # obvious infinite recursion;
>     nms = names(unclass(p));
> }
>
> Alternatively: would it be better to use dimnames.pm instead of names.pm? I am not fully aware of the advantages and disadvantages of dimnames vs names.
>
> Sincerely,
>
> Leonard
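The NextMethod() solution discussed above can be demonstrated as a self-contained snippet:

```r
# names() is an internal generic: an S3 method for class "pm" is
# dispatched, and NextMethod() falls through to the default behavior.
p <- data.frame(x = 1:3, y = 4:6, coeff = 1)
class(p) <- c("pm", class(p))

names.pm <- function(x) {
    nms <- NextMethod()   # default names() on the underlying data.frame
    nms[nms != "coeff"]   # hide the coefficient column
}

names(p)  # "x" "y"
```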
[R] names.data.frame?
Dear List members,

Is there a way to access the default names() function?

I tried the following:

# Multi-variable polynomial
p = data.frame(x=1:3, coeff=1)
class(p) = c("pm", class(p));

names.pm = function(p) {
	# .Primitive("names")(p) # does NOT function
	# .Internal("names")(p) # does NOT function
	# nms = names.default(p) # does NOT exist
	# nms = names.data.frame(p) # does NOT exist
	# nms = names(p); # obvious infinite recursion;
	nms = names(unclass(p));
}

Alternatively: would it be better to use dimnames.pm instead of names.pm? I am not fully aware of the advantages and disadvantages of dimnames vs names.

Sincerely, Leonard
Re: [R] How to use ifelse without invoking warnings
Dear Ravi,

I have uploaded to GitHub a version which also handles constant values instead of functions.

Regarding named arguments: these are actually handled automatically as well:

eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, y=2, x)
# [1] 1 4 9 16 25 6 14 8 18 10
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, x=2, x)
# [1] 4 4 4 4 4 2 14 2 18 2
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., list(FUN[[1]], 0, 1), y=2, x)
# [1] 0 0 0 0 0 1 14 1 18 1

But it still needs proper testing and maybe optimization: it would be possible to run sapply only on the filtered sequence (but I did not want to break anything now).

Sincerely, Leonard

On 10/9/2021 9:26 PM, Leonard Mada wrote:

Dear Ravi,

I wrote a small replacement for ifelse() which avoids such unnecessary evaluations (it bothered me a few times as well, so I decided to try a small replacement).

### Example:
x = 1:10
FUN = list();
FUN[[1]] = function(x, y) x*y;
FUN[[2]] = function(x, y) x^2;
FUN[[3]] = function(x, y) x;

# let's run multiple conditions
# eval.by.formula(conditions, FUN.list, ... (arguments for FUN));
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, x, x-1)

# Example 2
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, 2, x)

### Disclaimer: NOT properly tested.

The code for the function is below. Maybe someone can experiment with the code and improve it further. There are a few issues / open questions, like:
1.) Best name: eval.by.formula, ifelse.formula, ...?
2.) Named arguments: not yet;
3.) Fixed values inside FUN.list;
4.) Format of the expression for the conditions:
    expression(cond1, cond2, cond3) vs cond1 ~ cond2 ~ cond3 ???
5.) Code efficiency: some tests on large data sets & optimizations are warranted;

Sincerely, Leonard

===

The latest code is on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R
[...]
Re: [R] How to use ifelse without invoking warnings
Dear Ravi,

I wrote a small replacement for ifelse() which avoids such unnecessary evaluations (it bothered me a few times as well, so I decided to try a small replacement).

### Example:
x = 1:10
FUN = list();
FUN[[1]] = function(x, y) x*y;
FUN[[2]] = function(x, y) x^2;
FUN[[3]] = function(x, y) x;

# let's run multiple conditions
# eval.by.formula(conditions, FUN.list, ... (arguments for FUN));
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, x, x-1)

# Example 2
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, 2, x)

### Disclaimer: NOT properly tested.

The code for the function is below. Maybe someone can experiment with the code and improve it further. There are a few issues / open questions, like:
1.) Best name: eval.by.formula, ifelse.formula, ...?
2.) Named arguments: not yet;
3.) Fixed values inside FUN.list;
4.) Format of the expression for the conditions:
    expression(cond1, cond2, cond3) vs cond1 ~ cond2 ~ cond3 ???
5.) Code efficiency: some tests on large data sets & optimizations are warranted;

Sincerely, Leonard

===

The latest code is on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R

eval.by.formula = function(e, FUN.list, ..., default=NA) {
	tok = split.formula(e);
	if(length(tok) == 0) return();
	FUN = FUN.list;
	# Argument List
	clst = substitute(as.list(...))[-1];
	len = length(clst);
	clst.all = lapply(clst, eval);
	eval.f = function(idCond) {
		sapply(seq(length(isEval)), function(id) {
			if(isEval[[id]] == FALSE) return(default);
			args.l = lapply(clst.all, function(a)
				if(length(a) == 1) a else a[[id]]);
			do.call(FUN[[idCond]], args.l);
		});
	}
	# eval 1st condition:
	isEval = eval(tok[[1]]);
	rez = eval.f(1);
	if(length(tok) == 1) return(rez);
	# eval remaining conditions
	isEvalAll = isEval;
	for(id in seq(2, length(tok))) {
		if(tok[[id]] == ".") {
			# Remaining conditions: tok == ".";
			# makes sense only on the last position
			if(id < length(tok)) warning("\".\" is not last!");
			isEval = ! isEvalAll;
			rez[isEval] = eval.f(id)[isEval];
			next;
		}
		isEval = rep(FALSE, length(isEval));
		isEval[ ! isEvalAll] = eval(tok[[id]])[ ! isEvalAll];
		isEvalAll[isEval] = isEval[isEval];
		rez[isEval] = eval.f(id)[isEval];
	}
	return(rez);
}

# current code uses the formula format:
# cond1 ~ cond2 ~ cond3
# tokenizes a formula into its parts delimited by "~"
# Note:
# - tokenization is automatic for ",";
# - but the call MUST then use FUN(expression(_conditions_), other_args, ...);
split.formula = function(e) {
	tok = list();
	while(length(e) > 0) {
		if(e[[1]] == "~") {
			if(length(e) == 2) {
				tok = c(NA, e[[2]], tok);
				break;
			}
			tok = c(e[[3]], tok);
			e = e[[2]];
		} else {
			tok = c(e, tok);
			break;
		}
	}
	return(tok);
}
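For readers who only need the core behavior (no warnings from the branch that is not taken), the standard base-R idiom is logical-indexed assignment, which evaluates each branch only on its own subset; a minimal sketch:

```r
# Unlike ifelse(cond, sqrt(x), 0), which evaluates sqrt() on ALL of x
# (warning on negatives), this computes each branch only where valid:
x   <- c(-4, -1, 0, 1, 4)
res <- numeric(length(x))   # fallback value 0 for the invalid entries
ok  <- x >= 0
res[ok] <- sqrt(x[ok])      # evaluated only on the valid subset
res                         # 0 0 0 1 2
```

The eval.by.formula() function above generalizes this pattern to an arbitrary number of condition/function pairs.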
[R] Extracting Comments from Functions/Packages
Dear R Users,

I wrote a minimal parser to extract strings and comments from function definitions. The string extraction works fine, but no comments are returned:
a.) Are the comments stripped from the compiled packages?
b.) Alternatively: Is deparse() not suited for this task?
b.2.) Is deparse() parsing the function/expression itself?
[see the code for the extract.str.fun() function below]

### All strings in "base"
extract.str.pkg("base")
# type = 2 for comments:
extract.str.pkg("base", type=2)
extract.str.pkg("sp", type=2)
extract.str.pkg("NetLogoR", type=2)

The code for the 2 functions (extract.str.pkg & extract.str.fun) and the code for the parse.simple() parser are below.

Sincerely,
Leonard

===

The latest code is on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R

### Code to process functions in packages:
extract.str.fun = function(fn, pkg, type=1, strip=TRUE) {
    fn = as.symbol(fn);
    pkg = as.symbol(pkg);
    fn = list(substitute(pkg ::: fn));
    # deparse
    s = paste0(do.call(deparse, fn), collapse="");
    npos = parse.simple(s);
    extract.str(s, npos[[type]], strip=strip)
}

extract.str.pkg = function(pkg, type=1, exclude.z = TRUE, strip=TRUE) {
    nms = ls(getNamespace(pkg));
    l = lapply(nms, function(fn) extract.str.fun(fn, pkg, type=type, strip=strip));
    if(exclude.z) {
        hasStr = sapply(l, function(s) length(s) >= 1);
        nms = nms[hasStr];
        l = l[hasStr];
    }
    names(l) = nms;
    return(l);
}

### Minimal parser:
# - proof of concept;
# - may be useful to process non-conformant R "code", e.g.:
#   "{\"abc\" + \"bcd\"} {FUN}"; (still TODO)
# Warning:
# - not thoroughly checked & may be a little buggy!
parse.simple = function(x, eol="\n") {
    len = nchar(x);
    n.comm = list(integer(0), integer(0));
    n.str = list(integer(0), integer(0));
    is.hex = function(ch) {
        # Note: only for 1 character!
        return((ch >= "0" && ch <= "9") || (ch >= "A" && ch <= "F") ||
            (ch >= "a" && ch <= "f"));
    }
    npos = 1;
    while(npos <= len) {
        s = substr(x, npos, npos);
        # State: COMMENT
        if(s == "#") {
            n.comm[[1]] = c(n.comm[[1]], npos);
            while(npos < len) {
                npos = npos + 1;
                if(substr(x, npos, npos) == eol) break;
            }
            n.comm[[2]] = c(n.comm[[2]], npos);
            npos = npos + 1;
            next;
        }
        # State: STRING
        if(s == "\"" || s == "'") {
            n.str[[1]] = c(n.str[[1]], npos);
            while(npos < len) {
                npos = npos + 1;
                se = substr(x, npos, npos);
                if(se == "\\") {
                    npos = npos + 1;
                    # simple escape vs Unicode:
                    if(substr(x, npos, npos) != "u") next;
                    len.end = min(len, npos + 4);
                    npos = npos + 1;
                    isAllHex = TRUE;
                    while(npos <= len.end) {
                        se = substr(x, npos, npos);
                        if( ! is.hex(se)) { isAllHex = FALSE; break; }
                        npos = npos + 1;
                    }
                    if(isAllHex) next;
                }
                if(se == s) break;
            }
            n.str[[2]] = c(n.str[[2]], npos);
            npos = npos + 1;
            next;
        }
        npos = npos + 1;
    }
    return(list(str = n.str, comm = n.comm));
}

extract.str = function(s, npos, strip=FALSE) {
    if(length(npos[[1]]) == 0) return(character(0));
    strip.FUN = if(strip) {
        function(id) {
            if(npos[[1]][[id]] + 1 < npos[[2]][[id]]) {
                nStart = npos[[1]][[id]] + 1;
                nEnd = npos[[2]][[id]] - 1;
                # TODO: Error with malformed string
                return(substr(s, nStart, nEnd));
            } else {
                return("");
            }
        }
    } else function(id) substr(s, npos[[1]][[id]], npos[[2]][[id]]);
    sapply(seq(length(npos[[1]])), strip.FUN);
}

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
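[Editorial note] A side note on question a.) above (my addition, not part of the posted code): deparse() reconstructs code from the parsed function object, and R keeps comments only in source references; installed packages are typically built without them (options("keep.source.pkgs") defaults to FALSE), so deparse() cannot recover comments from them. When source references are kept, utils::getParseData() exposes COMMENT tokens directly. A small sketch:

```r
src = c(
    "f <- function(x) {",
    "  # double the input",
    "  x * 2  # inline comment",
    "}")
# keep.source = TRUE retains the source references:
expr = parse(text = src, keep.source = TRUE)
pd = utils::getParseData(expr)
comments = pd$text[pd$token == "COMMENT"]
comments
# deparse() of the evaluated function rebuilds the code from the
# parse tree and therefore contains no comments at all
```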
Re: [R] Descriptive Statistics: useful hacks
Dear R users,

I wrote in the meantime a new function:

apply.html(html, XPATH, FUN, ...)

This function applies FUN to the nodes selected using XPATH. However, I wonder if there is a possibility to use simpler selectors (e.g. jQuery-style ones). Although I am not an expert with jQuery, it may be easier for end users than XPATH.

Package htmltools does not seem to offer support to import a native html file, nor do I see any functions using jQuery selectors. I have not been able to find any such packages. I would be glad for any hints.

Many thanks,
Leonard

===

Latest code is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R

Notes:
1.) as.html() currently imports only a few types, but it could be easily extended to fully generic html; Note: the export as a shiny app may not work with fully generic html; I have not yet explored all the implications!
2.) I am still struggling to understand how to best design the option with.tags = TRUE.
3.) llammas.FUN: was implemented at great expense and at the last minute, but unfortunately is still incomplete and important visual styles are missing. Help is welcomed.

On 10/3/2021 1:00 AM, Leonard Mada wrote:
> Dear R Users,
>
> I have started to compile some useful hacks for the generation of nice
> descriptive statistics. I hope that these functions & hacks are useful
> to the wider R community. I hope that package developers also get some
> inspiration from the code or from these ideas.
>
> I have started to review various packages focused on descriptive
> statistics - although I am still at the very beginning.
>
> ### Hacks / Code
> - split table headers in 2 rows;
> - split results over 2 rows: view.gtsummary(...);
> - add abbreviations as footnotes: add.abbrev(...);
>
> The results are exported as a web page (using shiny) and can be
> printed as a pdf document.
See the following pdf example: > > https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.Example_1.pdf > > > > > ### Example > # currently focused on package gtsummary > library(gtsummary) > library(xml2) > > mtcars %>% > # rename2(): > # - see file Tools.Data.R; > # - behaves in most cases the same as dplyr::rename(); > rename2("HP" = "hp", "Displ" = disp, "Wt (klb)" = "wt", "Rar" = > drat) %>% > # as.factor.df(): > # - see file Tools.Data.R; > # - encode as (ordered) factor; > as.factor.df("cyl", "Cyl ") %>% > # the Descriptive Statistics: > tbl_summary(by = cyl) %>% > modify_header(update = header) %>% > add_p() %>% > add_overall() %>% > modify_header(update = header0) %>% > # Hack: split long statistics !!! > view.gtsummary(view=FALSE, len=8) %>% > add.abbrev( > c("Displ", "HP", "Rar", "Wt (klb)" = "Wt"), > c("Displacement (in^3)", "Gross horsepower", "Rear axle ratio", > "Weight (1000 lbs)")); > > > The required functions are on Github: > https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R > > > > The functions rename2() & as.factor.df() are only data-helpers and can > be found also on Github: > https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R > > > Note: > > 1.) The function add.abbrev() operates on the generated html-code: > > - the functionality is more generic and could be used easily with > other packages that export web pages as well; > > 2.) Split statistics: is an ugly hack. I plan to redesign the > functionality using xml-technologies. But I have already too many > side-projects. > > 3.) as.factor.df(): traditionally, one would create derived data-sets > or add a new column with the variable as factor (as the user may need > the numeric values for further analysis). But it looked nicer as a > single block of code. 
> > > Sincerely, > > > Leonard > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
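[Editorial note] On the simpler selectors asked about in the message above, one possible route (an assumption on my side, not tested against apply.html()): the selectr package, which rvest uses internally for its css= argument, translates CSS / jQuery-style selectors into XPath, so an XPath-based helper could stay unchanged underneath:

```r
# Sketch: translate a CSS selector to XPath, then reuse the existing
# XPath machinery; assumes the 'selectr' package is installed.
library(selectr)

xp = css_to_xpath("table.summary > tr", prefix = "//")
xp   # an XPath string usable with xml2::xml_find_all()

# hypothetical wrapper around the apply.html() mentioned above:
# apply.html.css = function(html, css, FUN, ...)
#     apply.html(html, css_to_xpath(css, prefix = "//"), FUN, ...)
```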
[R] Descriptive Statistics: useful hacks
Dear R Users,

I have started to compile some useful hacks for the generation of nice descriptive statistics. I hope that these functions & hacks are useful to the wider R community, and that package developers also get some inspiration from the code or from these ideas.

I have started to review various packages focused on descriptive statistics - although I am still at the very beginning.

### Hacks / Code
- split table headers in 2 rows;
- split results over 2 rows: view.gtsummary(...);
- add abbreviations as footnotes: add.abbrev(...);

The results are exported as a web page (using shiny) and can be printed as a pdf document. See the following pdf example:
https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.Example_1.pdf

### Example
# currently focused on package gtsummary
library(gtsummary)
library(xml2)

mtcars %>%
    # rename2():
    # - see file Tools.Data.R;
    # - behaves in most cases the same as dplyr::rename();
    rename2("HP" = "hp", "Displ" = disp, "Wt (klb)" = "wt", "Rar" = drat) %>%
    # as.factor.df():
    # - see file Tools.Data.R;
    # - encode as (ordered) factor;
    as.factor.df("cyl", "Cyl ") %>%
    # the Descriptive Statistics:
    tbl_summary(by = cyl) %>%
    modify_header(update = header) %>%
    add_p() %>%
    add_overall() %>%
    modify_header(update = header0) %>%
    # Hack: split long statistics !!!
    view.gtsummary(view=FALSE, len=8) %>%
    add.abbrev(
        c("Displ", "HP", "Rar", "Wt (klb)" = "Wt"),
        c("Displacement (in^3)", "Gross horsepower", "Rear axle ratio",
            "Weight (1000 lbs)"));

The required functions are on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R

The functions rename2() & as.factor.df() are only data-helpers and can also be found on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

Note:

1.) The function add.abbrev() operates on the generated html-code:
- the functionality is more generic and could be used easily with other packages that export web pages as well;

2.)
Split statistics: is an ugly hack. I plan to redesign the functionality using xml-technologies. But I have already too many side-projects. 3.) as.factor.df(): traditionally, one would create derived data-sets or add a new column with the variable as factor (as the user may need the numeric values for further analysis). But it looked nicer as a single block of code. Sincerely, Leonard __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Word-Wrapper Library/Package?
Many thanks for the hint. The function is actually a wrapper for stringi:::stri_wrap. I will need to take a closer look at this one (the documentation, and maybe also peek into the code):

ret <- .Call(C_stri_wrap, str, width, cost_exponent,
    indent, exdent, prefix, initial, whitespace_only,
    use_length, locale)

I thought at some point about the stringi package, but somehow overlooked it.

Sincerely,
Leonard

On 9/30/2021 1:09 AM, CALUM POLWART wrote:
> Have you looked at stringr::str_wrap or its parent function
> stringi::stri_wrap?
>
> It applies an algorithm for the wrap. But it doesn't vectorise the
> lines; they are returned with \n for new lines, but you could apply a
> string split to that result...
>
> On 29 Sep 2021 04:57, Andrew Simmons wrote:
>
> 'strwrap' should wrap at the target column, so I think it's behaving
> correctly. You could do + 1 if you're expecting it to wrap immediately
> after the target column.
>
> As far as splitting while trying to minimize a penalty, I don't think
> strwrap can do that, and I don't know of any packages that do such a
> thing. If such a thing exists in another language, there's probably an
> R package with a similar name containing ports of such functions; that
> might be your best bet. I hope this helps.
>
> On Tue, Sep 28, 2021, 23:51 Leonard Mada wrote:
>
> > Thank you Andrew.
> >
> > I will explore this function more, although I am struggling to get
> > it to work properly:
> >
> > strwrap("Abc. B. Defg", 7)
> > # [1] "Abc." "B." "Defg"
> > # both "Abc. B." and "B. Defg" are 7 characters long.
> >
> > strwrap(paste0(rep("ab", 7), collapse=""), 7)
> > # [1] "ababababababab"
> >
> > Can I set an absolute maximum width?
> >
> > It would be nice to have an algorithm that computes a penalty for the
> > split and selects the split with the smallest penalty (when no obvious
> > split is possible).
> > > > > > Sincerely, > > > > > > Leonard > > > > > > > > On 9/29/2021 6:30 AM, Andrew Simmons wrote: > > > > I think what you're looking for is 'strwrap', it's in package base. > > > > On Tue, Sep 28, 2021, 22:26 Leonard Mada via R-help > > > wrote: > > > >> Dear R-Users, > >> > >> > >> Does anyone know any package or library that implements > functions for > >> word wrapping? > >> > >> > >> I did implement a very rudimentary one (Github link below), but > would > >> like to avoid to reinvent the wheel. Considering that > word-wrapping is a > >> very common task, it should be available even in base R (e.g. in a > >> "format" module/package). > >> > >> > >> Sincerely, > >> > >> > >> Leonard > >> > >> === > >> > >> The latest versions of the functions are on Github: > >> > >> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R > >> # Note: > >> # - the function implementing word wrapping: split.N.line(...); > >> # - for the example below: the functions defined in > Tools.CRAN.R are > >> required; > >> > >> > >> Examples: > >> ### Search CRAN > >> library(pkgsearch) > >> > >> searchCran = function(s, from=1, len=60, len.print=20, extend="*", > >> sep=" ", sep.h="-") { > >> if( ! is.null(extend)) s = paste0(s, extend); > >> x = advanced_search(s, size=len, from=from); > >> if(length(x$package_data) == 0) { > >> cat("No packages found!", sep="\n"); > >> } else { > >> scroll.pkg(x, len=len.print, sep=sep, sep.h=sep.h); > >> } > >> invisible(x) > >> } > >> > >> # with nice formatting & printing: > >> x = searchCran("text", from=60, sep.h="-") > >> > >&g
Re: [R] Word-Wrapper Library/Package?
Thank you Andrew. I will explore this function more, although I am struggling to get it to work properly: strwrap("Abc. B. Defg", 7) # [1] "Abc." "B." "Defg" # both "Abc. B." and "B. Defg" are 7 characters long. strwrap(paste0(rep("ab", 7), collapse=""), 7) # [1] "ababababababab" Can I set an absolute maximum width? It would be nice to have an algorithm that computes a penalty for the split and selects the split with the smallest penalty (when no obvious split is possible). Sincerely, Leonard On 9/29/2021 6:30 AM, Andrew Simmons wrote: > I think what you're looking for is 'strwrap', it's in package base. > > On Tue, Sep 28, 2021, 22:26 Leonard Mada via R-help > mailto:r-help@r-project.org>> wrote: > > Dear R-Users, > > > Does anyone know any package or library that implements functions for > word wrapping? > > > I did implement a very rudimentary one (Github link below), but would > like to avoid to reinvent the wheel. Considering that > word-wrapping is a > very common task, it should be available even in base R (e.g. in a > "format" module/package). > > > Sincerely, > > > Leonard > > === > > The latest versions of the functions are on Github: > > https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R > <https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R> > # Note: > # - the function implementing word wrapping: split.N.line(...); > # - for the example below: the functions defined in Tools.CRAN.R are > required; > > > Examples: > ### Search CRAN > library(pkgsearch) > > searchCran = function(s, from=1, len=60, len.print=20, extend="*", > sep=" ", sep.h="-") { > if( ! 
is.null(extend)) s = paste0(s, extend); > x = advanced_search(s, size=len, from=from); > if(length(x$package_data) == 0) { > cat("No packages found!", sep="\n"); > } else { > scroll.pkg(x, len=len.print, sep=sep, sep.h=sep.h); > } > invisible(x) > } > > # with nice formatting & printing: > x = searchCran("text", from=60, sep.h="-") > > scroll.pkg(x, start=20, len=21, sep.h = "-*") > # test of sep.h=NULL vs ... > > > Notes: > > 1.) split.N.line: > > - was implemented to output a pre-specified number of lines (kind of > "maxLines"), but this is not required from an actual word-wrapper; > > - it was an initial design decision when implementing the > format.lines() > function; but I plan to implement a 1-pass exact algorithm during the > next few days anyway; > > 2.) Refactoring > > - I will also move the formatting code to a new file: probably > Tools.Formatting.R; > > - the same applies for the formatting code for ftable (currently > in file > Tools.Data.R); > > 3.) Package gridtext > > - seems to have some word-wrapping functionality, but does not > seem to > expose it; > > - I am also currently focused on character-based word wrapping > (e.g. for > RConsole); > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- > To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > <https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > <http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Word-Wrapper Library/Package?
Dear R-Users, Does anyone know any package or library that implements functions for word wrapping? I did implement a very rudimentary one (Github link below), but would like to avoid to reinvent the wheel. Considering that word-wrapping is a very common task, it should be available even in base R (e.g. in a "format" module/package). Sincerely, Leonard === The latest versions of the functions are on Github: https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R # Note: # - the function implementing word wrapping: split.N.line(...); # - for the example below: the functions defined in Tools.CRAN.R are required; Examples: ### Search CRAN library(pkgsearch) searchCran = function(s, from=1, len=60, len.print=20, extend="*", sep=" ", sep.h="-") { if( ! is.null(extend)) s = paste0(s, extend); x = advanced_search(s, size=len, from=from); if(length(x$package_data) == 0) { cat("No packages found!", sep="\n"); } else { scroll.pkg(x, len=len.print, sep=sep, sep.h=sep.h); } invisible(x) } # with nice formatting & printing: x = searchCran("text", from=60, sep.h="-") scroll.pkg(x, start=20, len=21, sep.h = "-*") # test of sep.h=NULL vs ... Notes: 1.) split.N.line: - was implemented to output a pre-specified number of lines (kind of "maxLines"), but this is not required from an actual word-wrapper; - it was an initial design decision when implementing the format.lines() function; but I plan to implement a 1-pass exact algorithm during the next few days anyway; 2.) Refactoring - I will also move the formatting code to a new file: probably Tools.Formatting.R; - the same applies for the formatting code for ftable (currently in file Tools.Data.R); 3.) Package gridtext - seems to have some word-wrapping functionality, but does not seem to expose it; - I am also currently focused on character-based word wrapping (e.g. 
for RConsole); [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
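[Editorial note] Since no existing package turned up in this thread, here is a hedged sketch (my own hypothetical helper, not part of Tools.CRAN.R) of the penalty-based splitting discussed above: classic minimum-raggedness wrapping, which scores each candidate line by its squared slack and picks the break points with the smallest total penalty:

```r
# Minimum-raggedness word wrap (dynamic programming sketch):
# penalty of a line = (width - line length)^2; the last line is free.
wrap.penalty = function(s, width) {
    w = strsplit(s, "[ \t]+")[[1]];
    n = length(w); lw = nchar(w);
    # cost[i] = minimal penalty for wrapping words i..n;
    # brk[i]  = last word on the line that starts with word i;
    cost = c(rep(Inf, n), 0); brk = integer(n);
    for(i in n:1) {
        len = -1;
        for(j in i:n) {
            len = len + lw[j] + 1;   # length of the line with words i..j
            # an over-long single word is still placed (j == i):
            if(len > width && j > i) break;
            slack = max(width - len, 0);
            pen = if(j == n) 0 else slack^2;   # last line is free
            if(pen + cost[j + 1] < cost[i]) {
                cost[i] = pen + cost[j + 1];
                brk[i] = j;
            }
        }
    }
    out = character(0); i = 1;
    while(i <= n) {
        j = brk[i];
        out = c(out, paste(w[i:j], collapse = " "));
        i = j + 1;
    }
    out;
}

wrap.penalty("Abc. B. Defg", 7)
# [1] "Abc. B." "Defg"
```

Unlike the greedy fill, this picks the break after "B." at width 7, the split the original poster expected.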
Re: [R] Reading File Sizes: very slow!
On 9/27/2021 1:06 AM, Leonard Mada wrote:
> Dear Bill,
>
> Does list.files() always sort the results?
> It seems so. The option full.names = FALSE does not have any effect:
> the results seem always sorted.
>
> Maybe it is better to process the files in an unsorted order, as
> stored on the disk?

After some more investigation: this took only a few seconds:

sapply(list.dirs(path=path, full.names=FALSE, recursive=FALSE),
    function(f) length(list.files(path = paste0(path, "/", f),
        full.names = FALSE, recursive = TRUE)))
# maybe due to caching, but the difference is enormous

BH seems to contain *by far* the most files: 11701. But excluding it from processing had only a linear effect: still 377 s. I had a look at src/main/platform.c, but do not fully understand it.

Sincerely,
Leonard

> Sincerely,
>
> Leonard
>
> On 9/25/2021 8:13 PM, Bill Dunlap wrote:
>> On my Windows 10 laptop I see evidence of the operating system
>> caching information about recently accessed files. This makes it
>> hard to say how the speed might be improved. Is there a way to clear
>> this cache?
>>
>> > system.time(L1 <- size.f.pkg(R.home("library")))
>>  user  system  elapsed
>>  0.48    2.81    30.42
>> > system.time(L2 <- size.f.pkg(R.home("library")))
>>  user  system  elapsed
>>  0.35    1.10     1.43
>> > identical(L1,L2)
>> [1] TRUE
>> > length(L1)
>> [1] 30
>> > length(dir(R.home("library"),recursive=TRUE))
>> [1] 12949
>>
>> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help
>> <r-help@r-project.org> wrote:
>>
>> Dear List Members,
>>
>> I tried to compute the file sizes of each installed package and the
>> process is terribly slow.
>> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>>
>> 1.) Package Sizes
>>
>> system.time({
>>     x = size.pkg(file=NULL);
>> })
>> # elapsed time: 509 s !!!
>> # 512 Packages; 1.64 GB; >> # R 4.1.1 on MS Windows 10 >> >> >> The code for the size.pkg() function is below and the latest >> version is >> on Github: >> >> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R >> <https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R> >> >> >> Questions: >> Is there a way to get the file size faster? >> It takes long on Windows as well, but of the order of 10-20 s, >> not 10 >> minutes. >> Do I miss something? >> >> >> 1.b.) Alternative >> >> It came to my mind to read first all file sizes and then use >> tapply or >> aggregate - but I do not see why it should be faster. >> >> Would it be meaningful to benchmark each individual package? >> >> Although I am not very inclined to wait 10 minutes for each new >> try out. >> >> >> 2.) Big Packages >> >> Just as a note: there are a few very large packages (in my list >> of 512 >> packages): >> >> 1 123,566,287 BH >> 2 113,578,391 sf >> 3 112,252,652 rgdal >> 4 81,144,868 magick >> 5 77,791,374 openNLPmodels.en >> >> I suspect that sf & rgdal have a lot of duplicated data structures >> and/or duplicate code and/or duplicated libraries - although I am >> not an >> expert in the field and did not check the sources. >> >> >> Sincerely, >> >> >> Leonard >> >> === >> >> >> # Package Size: >> size.f.pkg = function(path=NULL) { >> if(is.null(path)) path = R.home("library"); >> xd = list.dirs(path = path, full.names = FALSE, recursive = >> FALSE); >> size.f = function(p) { >> p = paste0(path, "/", p); >> sum(file.info <http://file.info>(list.files(path=p, >> pattern=".", >> full.names = TRUE, all.files = TRUE, recursive = >> TRUE))$size); >> } >> sapply(xd, size.f); >> } >> >> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") { >> x = size.f.pkg(path=path); >> x = as.data.frame(x); >> names(x) = "Size"
Re: [R] Reading File Sizes: very slow!
Dear Bill, Does list.files() always sort the results? It seems so. The option: full.names = FALSE does not have any effect: the results seem always sorted. Maybe it is better to process the files in an unsorted order: as stored on the disk? Sincerely, Leonard On 9/25/2021 8:13 PM, Bill Dunlap wrote: > On my Windows 10 laptop I see evidence of the operating system caching > information about recently accessed files. This makes it hard to say > how the speed might be improved. Is there a way to clear this cache? > > > system.time(L1 <- size.f.pkg(R.home("library"))) > user system elapsed > 0.48 2.81 30.42 > > system.time(L2 <- size.f.pkg(R.home("library"))) > user system elapsed > 0.35 1.10 1.43 > > identical(L1,L2) > [1] TRUE > > length(L1) > [1] 30 > > length(dir(R.home("library"),recursive=TRUE)) > [1] 12949 > > On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help > mailto:r-help@r-project.org>> wrote: > > Dear List Members, > > > I tried to compute the file sizes of each installed package and the > process is terribly slow. > > It took ~ 10 minutes for 512 packages / 1.6 GB total size of files. > > > 1.) Package Sizes > > > system.time({ > x = size.pkg(file=NULL); > }) > # elapsed time: 509 s !!! > # 512 Packages; 1.64 GB; > # R 4.1.1 on MS Windows 10 > > > The code for the size.pkg() function is below and the latest > version is > on Github: > > https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R > <https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R> > > > Questions: > Is there a way to get the file size faster? > It takes long on Windows as well, but of the order of 10-20 s, not 10 > minutes. > Do I miss something? > > > 1.b.) Alternative > > It came to my mind to read first all file sizes and then use > tapply or > aggregate - but I do not see why it should be faster. > > Would it be meaningful to benchmark each individual package? > > Although I am not very inclined to wait 10 minutes for each new > try out. > > > 2.) 
Big Packages > > Just as a note: there are a few very large packages (in my list of > 512 > packages): > > 1 123,566,287 BH > 2 113,578,391 sf > 3 112,252,652 rgdal > 4 81,144,868 magick > 5 77,791,374 openNLPmodels.en > > I suspect that sf & rgdal have a lot of duplicated data structures > and/or duplicate code and/or duplicated libraries - although I am > not an > expert in the field and did not check the sources. > > > Sincerely, > > > Leonard > > === > > > # Package Size: > size.f.pkg = function(path=NULL) { > if(is.null(path)) path = R.home("library"); > xd = list.dirs(path = path, full.names = FALSE, recursive = > FALSE); > size.f = function(p) { > p = paste0(path, "/", p); > sum(file.info <http://file.info>(list.files(path=p, > pattern=".", > full.names = TRUE, all.files = TRUE, recursive = > TRUE))$size); > } > sapply(xd, size.f); > } > > size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") { > x = size.f.pkg(path=path); > x = as.data.frame(x); > names(x) = "Size" > x$Name = rownames(x); > # Order > if(sort) { > id = order(x$Size, decreasing=TRUE) > x = x[id,]; > } > if( ! is.null(file)) { > if( ! is.character(file)) { > print("Error: Size NOT written to file!"); > } else write.csv(x, file=file, row.names=FALSE); > } > return(x); > } > > __ > R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- > To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > <https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > <http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading File Sizes: very slow!
Dear Bill,

- using the MS Windows Properties: ~ 15 s; [Windows newly started, 1st operation, bulk size]
- using R / file.info() (2nd operation): still 523.6 s [and R seems mostly unresponsive during this time]

Unfortunately, I do not know how to clear any cache. [The cache may play a role only for smaller sizes? But I am rather not inclined to run the ~ 10 minute procedure multiple times.]

Sincerely,
Leonard

On 9/26/2021 5:49 AM, Richard O'Keefe wrote:
On a $150 second-hand laptop with 0.9GB of library, and a single-user installation of R so only one place to look:

LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
cd $LIBRARY
echo "kbytes package"
du -sk * | sort -k1n

took 150 msec to report the disc space needed for every package. That'

On Sun, 26 Sept 2021 at 06:14, Bill Dunlap wrote:
On my Windows 10 laptop I see evidence of the operating system caching information about recently accessed files. This makes it hard to say how the speed might be improved. Is there a way to clear this cache?

system.time(L1 <- size.f.pkg(R.home("library")))
 user  system  elapsed
 0.48    2.81    30.42
system.time(L2 <- size.f.pkg(R.home("library")))
 user  system  elapsed
 0.35    1.10     1.43
identical(L1,L2)
[1] TRUE
length(L1)
[1] 30
length(dir(R.home("library"),recursive=TRUE))
[1] 12949

On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <r-help@r-project.org> wrote:
Dear List Members,

I tried to compute the file sizes of each installed package and the process is terribly slow. It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.

1.) Package Sizes

system.time({
    x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10

The code for the size.pkg() function is below and the latest version is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R

Questions:
Is there a way to get the file size faster? It takes long on Windows as well, but of the order of 10-20 s, not 10 minutes. Do I miss something?

1.b.)
Alternative

It came to my mind to read first all file sizes and then use tapply or aggregate - but I do not see why it should be faster.

Would it be meaningful to benchmark each individual package? Although I am not very inclined to wait 10 minutes for each new try-out.

2.) Big Packages

Just as a note: there are a few very large packages (in my list of 512 packages):

1 123,566,287 BH
2 113,578,391 sf
3 112,252,652 rgdal
4  81,144,868 magick
5  77,791,374 openNLPmodels.en

I suspect that sf & rgdal have a lot of duplicated data structures and/or duplicate code and/or duplicated libraries - although I am not an expert in the field and did not check the sources.

Sincerely,
Leonard

===

# Package Size:
size.f.pkg = function(path=NULL) {
    if(is.null(path)) path = R.home("library");
    xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
    size.f = function(p) {
        p = paste0(path, "/", p);
        sum(file.info(list.files(path=p, pattern=".",
            full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
    }
    sapply(xd, size.f);
}

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
    x = size.f.pkg(path=path);
    x = as.data.frame(x);
    names(x) = "Size";
    x$Name = rownames(x);
    # Order
    if(sort) {
        id = order(x$Size, decreasing=TRUE);
        x = x[id,];
    }
    if( ! is.null(file)) {
        if( ! is.character(file)) {
            print("Error: Size NOT written to file!");
        } else write.csv(x, file=file, row.names=FALSE);
    }
    return(x);
}

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reading File Sizes: very slow!
Dear List Members, I tried to compute the file sizes of each installed package, and the process is terribly slow. It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.

1.) Package Sizes

system.time({ x = size.pkg(file=NULL); })
# elapsed time: 509 s !!!
# 512 packages; 1.64 GB;
# R 4.1.1 on MS Windows 10

The code for the size.pkg() function is below; the latest version is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R

Questions:
Is there a way to get the file sizes faster? Listing the file sizes takes a while on Windows as well, but on the order of 10-20 s, not 10 minutes. Am I missing something?

1.b.) Alternative
It came to my mind to read all file sizes first and then use tapply or aggregate - but I do not see why that should be faster. Would it be meaningful to benchmark each individual package? Although I am not very inclined to wait 10 minutes for each new try-out.

2.) Big Packages
Just as a note: there are a few very large packages (in my list of 512 packages):
1 123,566,287 BH
2 113,578,391 sf
3 112,252,652 rgdal
4  81,144,868 magick
5  77,791,374 openNLPmodels.en

I suspect that sf & rgdal contain a lot of duplicated data structures and/or duplicated code and/or duplicated libraries - although I am not an expert in the field and did not check the sources.

Sincerely, Leonard

===

# Package Size:
size.f.pkg = function(path=NULL) {
	if(is.null(path)) path = R.home("library");
	xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
	size.f = function(p) {
		p = paste0(path, "/", p);
		sum(file.info(list.files(path=p, pattern=".",
			full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
	}
	sapply(xd, size.f);
}

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
	x = size.f.pkg(path=path);
	x = as.data.frame(x);
	names(x) = "Size";
	x$Name = rownames(x);
	# Order
	if(sort) {
		id = order(x$Size, decreasing=TRUE);
		x = x[id,];
	}
	if( ! is.null(file)) {
		if( ! is.character(file)) {
			print("Error: Size NOT written to file!");
		} else write.csv(x, file=file, row.names=FALSE);
	}
	return(x);
}

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
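The alternative sketched in 1.b — read all file sizes first, then aggregate — can be written with a single recursive list.files() call over the whole library, followed by tapply(). A rough sketch (the name size.pkg.fast() is made up here; whether it is actually faster depends on the file system, since both approaches ultimately have to stat every file):

```r
# Sketch of alternative 1.b: scan the whole library once,
# then sum the file sizes per package with tapply().
# Note: size.pkg.fast() is a hypothetical name, not from the original post.
size.pkg.fast = function(path = R.home("library")) {
	files = list.files(path, full.names = TRUE,
		all.files = TRUE, recursive = TRUE);
	size = file.info(files, extra_cols = FALSE)$size;
	# package name = first path component below 'path'
	pkg = sub("/.*", "", substring(files, nchar(path) + 2));
	sort(tapply(size, pkg, sum), decreasing = TRUE);
}
```

file.info(extra_cols = FALSE) skips the ownership/permission columns, which can shave off some of the per-file cost.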
Re: [R] Installed packages: Bioconductor vs CRAN?
[working version] On 9/25/2021 2:55 AM, Leonard Mada wrote: Dear List Members, Is there a way to determine whether an installed package comes from Bioconductor or whether it is a regular CRAN package? The information seems to be *not* available in: installed.packages()

### [updated]
# Basic Info:
info.pkg = function(pkg=NULL, fields="Repository") {
	if(is.null(pkg)) {
		pkg = installed.packages(fields=fields);
	} else {
		# fields are needed in this branch as well:
		all.pkg = installed.packages(fields=fields);
		pkg = all.pkg[all.pkg[,1] %in% pkg, ];
	}
	p = as.data.frame(pkg);
	p = p[ , c("Package", "Version", "Built", fields, "Imports")];
	return(p);
}

I will think later about how to improve the filtering of Bioconductor packages, probably based on biocViews.

Many thanks, Leonard

Sincerely, Leonard

===

I started to write some utility functions to analyse installed packages. The latest version is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R

# Basic Info:
info.pkg = function(pkg=NULL) {
	if(is.null(pkg)) {
		pkg = installed.packages();
	} else {
		all.pkg = installed.packages();
		pkg = all.pkg[all.pkg[,1] %in% pkg, ];
	}
	p = as.data.frame(pkg);
	p = p[ , c("Package", "Version", "Built", "Imports")];
	return(p);
}

# Imported packages:
imports.pkg = function(pkg=NULL, sort=TRUE) {
	p = info.pkg(pkg);
	### Imported packages
	imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*"));
	imp = unlist(imp);
	imp = imp[ ! is.na(imp)];
	# Cleanup:
	imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp, perl=TRUE);
	imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE);
	# Tabulate:
	tbl = as.data.frame(table(imp), stringsAsFactors=FALSE);
	names(tbl)[1] = "Name";
	if(sort) {
		id = order(tbl$Freq, decreasing=TRUE);
		tbl = tbl[id,];
	}
	return(tbl);
}

match.imports = function(pkg, x=NULL, quote=FALSE) {
	if(is.null(x)) x = info.pkg();
	if(quote) {
		pkg = paste0("\\Q", pkg, "\\E");
	}
	# TODO: Use word delimiters?
	# "(

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
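Building on the fields = "Repository" idea above, one possible classifier could look as follows (repo.pkg() and the "Other" category are my own names; the heuristic assumes that CRAN packages carry Repository: CRAN and Bioconductor packages carry a biocViews field, which matches the DESCRIPTION files inspected in this thread but is not guaranteed for every package):

```r
# Sketch: classify installed packages as CRAN / Bioconductor / Other,
# based on the optional DESCRIPTION fields "Repository" and "biocViews".
# (repo.pkg() is a hypothetical helper, not part of the posted code.)
repo.pkg = function() {
	p = installed.packages(fields = c("Repository", "biocViews"));
	p = as.data.frame(p, stringsAsFactors = FALSE);
	p$Source = ifelse( ! is.na(p$Repository) & p$Repository == "CRAN", "CRAN",
		ifelse( ! is.na(p$biocViews), "Bioconductor", "Other"));
	return(p[ , c("Package", "Version", "Source")]);
}
```

Base and locally built packages fall into "Other", since they carry neither tag.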
Re: [R] Installed packages: Bioconductor vs CRAN?
Dear Bert, Indeed, this seems to work: installed.packages(fields="Repository") I still need to figure out what variants to expect. Sincerely, Leonard On 9/25/2021 3:31 AM, Leonard Mada wrote: Dear Bert, The DESCRIPTION file contains additional useful information, e.g.: 1.) Package EBImage: biocViews: Visualization Packaged: 2021-05-19 23:53:29 UTC; biocbuild 2.) deSolve Repository: CRAN I have verified a few of the CRAN packages, and they seem to include the tag: Repository: CRAN The Bioconductor packages are different (see e.g. EBImage). I am wondering if there is already a method to extract this info? Sincerely, Leonard On 9/25/2021 3:06 AM, Bert Gunter wrote: The help file tells you that installed.packages() looks at the DESCRIPTION files of packages. Section 1.1.1 of "Writing R Extensions" tells you what information is in such files. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Fri, Sep 24, 2021 at 4:56 PM Leonard Mada via R-help wrote: Dear List Members, Is there a way to extract if an installed package is from Bioconductor or if it is a regular Cran package? The information seems to be *not* available in: installed.packages() Sincerely, Leonard === I started to write some utility functions to analyse installed packages. The latest version is on Github: https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R # Basic Info: info.pkg = function(pkg=NULL) { if(is.null(pkg)) { pkg = installed.packages(); } else { all.pkg = installed.packages(); pkg = all.pkg[all.pkg[,1] %in% pkg, ]; } p = pkg; p = as.data.frame(p); p = p[ , c("Package", "Version", "Built", "Imports")]; return(p); } # Imported packages: imports.pkg = function(pkg=NULL, sort=TRUE) { p = info.pkg(pkg); ### Imported packages imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*")) imp = unlist(imp) imp = imp[ ! 
is.na(imp)] # Cleanup: imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp, perl=TRUE) imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE); # Tabulate: tbl = as.data.frame(table(imp), stringsAsFactors=FALSE); names(tbl)[1] = "Name"; if(sort) { id = order(tbl$Freq, decreasing=TRUE); tbl = tbl[id,]; } return(tbl); } match.imports = function(pkg, x=NULL, quote=FALSE) { if(is.null(x)) x = info.pkg(); if(quote) { pkg = paste0("\\Q", pkg, "\\E"); } # TODO: Use word delimiters? # "(https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
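For a single package, utils::packageDescription() already parses the DESCRIPTION file, so the presence or absence of these tags can be checked directly; for example, base packages carry a Priority field but no Repository field:

```r
# Inspect DESCRIPTION fields of an individual installed package:
desc = utils::packageDescription("stats");
desc$Priority        # "base" for a base package
desc$Repository      # NULL: base packages carry no Repository tag
# A CRAN package would instead return "CRAN" for desc$Repository.
```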
Re: [R] Installed packages: Bioconductor vs CRAN?
Dear Bert, The DESCRIPTION file contains additional useful information, e.g.: 1.) Package EBImage: biocViews: Visualization Packaged: 2021-05-19 23:53:29 UTC; biocbuild 2.) deSolve Repository: CRAN I have verified a few of the CRAN packages, and they seem to include the tag: Repository: CRAN The Bioconductor packages are different (see e.g. EBImage). I am wondering if there is already a method to extract this info? Sincerely, Leonard On 9/25/2021 3:06 AM, Bert Gunter wrote: The help file tells you that installed.packages() looks at the DESCRIPTION files of packages. Section 1.1.1 of "Writing R Extensions" tells you what information is in such files. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Fri, Sep 24, 2021 at 4:56 PM Leonard Mada via R-help wrote: Dear List Members, Is there a way to extract if an installed package is from Bioconductor or if it is a regular Cran package? The information seems to be *not* available in: installed.packages() Sincerely, Leonard === I started to write some utility functions to analyse installed packages. The latest version is on Github: https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R # Basic Info: info.pkg = function(pkg=NULL) { if(is.null(pkg)) { pkg = installed.packages(); } else { all.pkg = installed.packages(); pkg = all.pkg[all.pkg[,1] %in% pkg, ]; } p = pkg; p = as.data.frame(p); p = p[ , c("Package", "Version", "Built", "Imports")]; return(p); } # Imported packages: imports.pkg = function(pkg=NULL, sort=TRUE) { p = info.pkg(pkg); ### Imported packages imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*")) imp = unlist(imp) imp = imp[ ! is.na(imp)] # Cleanup: imp = sub("[ \n\r\t]*+\\([-,. 
>=0-9\n\t\r]++\\) *+$", "", imp, perl=TRUE) imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE); # Tabulate: tbl = as.data.frame(table(imp), stringsAsFactors=FALSE); names(tbl)[1] = "Name"; if(sort) { id = order(tbl$Freq, decreasing=TRUE); tbl = tbl[id,]; } return(tbl); } match.imports = function(pkg, x=NULL, quote=FALSE) { if(is.null(x)) x = info.pkg(); if(quote) { pkg = paste0("\\Q", pkg, "\\E"); } # TODO: Use word delimiters? # "(https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Installed packages: Bioconductor vs CRAN?
Dear List Members, Is there a way to determine whether an installed package comes from Bioconductor or whether it is a regular CRAN package? The information seems to be *not* available in: installed.packages()

Sincerely, Leonard

===

I started to write some utility functions to analyse installed packages. The latest version is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R

# Basic Info:
info.pkg = function(pkg=NULL) {
	if(is.null(pkg)) {
		pkg = installed.packages();
	} else {
		all.pkg = installed.packages();
		pkg = all.pkg[all.pkg[,1] %in% pkg, ];
	}
	p = as.data.frame(pkg);
	p = p[ , c("Package", "Version", "Built", "Imports")];
	return(p);
}

# Imported packages:
imports.pkg = function(pkg=NULL, sort=TRUE) {
	p = info.pkg(pkg);
	### Imported packages
	imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*"));
	imp = unlist(imp);
	imp = imp[ ! is.na(imp)];
	# Cleanup:
	imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp, perl=TRUE);
	imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE);
	# Tabulate:
	tbl = as.data.frame(table(imp), stringsAsFactors=FALSE);
	names(tbl)[1] = "Name";
	if(sort) {
		id = order(tbl$Freq, decreasing=TRUE);
		tbl = tbl[id,];
	}
	return(tbl);
}

match.imports = function(pkg, x=NULL, quote=FALSE) {
	if(is.null(x)) x = info.pkg();
	if(quote) {
		pkg = paste0("\\Q", pkg, "\\E");
	}
	# TODO: Use word delimiters?
	# "(

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [Questionnaire] Standardized Options: Justify/Alignment
Dear R users, I have started to work on an improved version of the format.ftable function. The code and ideas should be reused to improve other R functions (enabling more advanced format of the character output). However, there are a number of open questions. These are focused on standardizing the names and the options of the various arguments used to format the output. A separate post will address questions related to technical and design decisions. Note: The arguments are passed to various helper functions. It may be tedious to modify once implemented. Furthermore, package developers should be encouraged to use the standardized names and options as well (and may use the helper functions as well). The users are also encouraged to test the various options. Some code to enable testing is available at the end of this post. Structure of "Questionnaire": a.) Answers should follow a Likert-type scale: # Strongly disagree (on option ...); # Disagree (on option ...); # Agree (on option ...); # Strongly agree (on option ...); b.) Motivation: ... c.) Other comments: ... ### The "Questionnaire" 1.) "MiddleTop", "MiddleBottom" vs "Middle" Example problem: positioning 3 lines of text on 4 rows; Workaround: user can easily prepend or append a newline to the relevant names, forcing the desired behaviour. However, there is a helper function to merge (cbind) 2 string matrices and it may be tedious for a user to modify all names. [but this is less used in ftable] Disadvantages: the 2 variants "break" pmatch()! 2.) Lower case vs Upper case Motivation: the options for most named algorithms are uppercase and are likely to remain uppercase; Example: option = c("MyName1", "MyName2", "Fields1", "Fields2", ...); Existing options in format.ftable: are lowercase, "left", "right", *"centre"*; 3.) Standardized Names 3.a.) Arguments: justify = ... or align = ...? pos = ... or position = ... or valign = ... ? 3.b.) Options: - "left", "right", "centre" vs "center"? 
- using both "centre" and "center" breaks pmatch();
- "top", "bottom", "middle";
Native English speakers should review this question as well.
Note: The new function makes it possible to justify the row names and the factor levels differently:
- there are actually 2 arguments: justify="left", justify.lvl="c"; # with the centre vs center issue!

I do not know if there is any facility to run such questionnaires through R. My resources are also rather limited - if anyone is willing to help, I would be very happy.

Sincerely, Leonard

===

### Test Code
The latest version of the ftable2 function (contains a fix) and the needed helper functions are available on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

### Some Data
mtcars$carbCtg = cut(mtcars$carb, c(1, 2, 4, +Inf), right=FALSE)
# Alternative:
# mtcars$carbCtg = cut(mtcars$carb, c(1, 2, 4, 8), include.lowest=TRUE)
tbl = with(mtcars, table(cyl, hp, carbCtg, gear))
id = c(1,3,4);
# Note: the names can be modified to test various scenarios
xnm = c("Long\nname: ", "", "Extremely\nlong\nname: ")
xnm = paste0(xnm, names(dimnames(tbl))[id])
names(dimnames(tbl))[id] = xnm;
ftbl = ftable(tbl, row.vars = id)

### Test: FTABLE
ftable2(ftbl, sep=" | ", justify="left", justify.lvl="c", pos="Top", split="\n")

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
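The pmatch() concern in 3.b is easy to demonstrate: as soon as both spellings are offered, every useful abbreviation becomes ambiguous and only the exact spellings still match:

```r
# Partial matching breaks when both spellings are offered as options:
opts = c("left", "right", "centre", "center");
pmatch("c",      opts);   # NA: ambiguous
pmatch("cent",   opts);   # NA: still ambiguous
pmatch("centre", opts);   # 3: only an exact spelling matches
```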
Re: [R] Improvement: function cut
Hello Andrew, I add this info as a complement (so other users can get a better understanding): If we want to perform a survival analysis, then the intervals should be closed to the right, but we should also include the first time point (as per Intention-to-Treat): [0, 4] (4, 8] (8, 12] (12, 16] [0, 4] (4, 8] (8, 12] (12, 16] (16, 20] So the series is extendible to the right without any errors! But the 1st interval (which is the same in both series) is different from the other intervals: [0, 4]. I feel that this should have been the default behaviour for cut(). Note: I was led to think about a different situation in my previous message, as you constructed intervals open on the right, and also extended to the right. But survival analysis should be as described in this mail, and this should probably be the default. Sincerely, Leonard On 9/18/2021 1:29 AM, Andrew Simmons wrote: > I disagree, I don't really think it's too long or ugly, but if you > think it is, you could abbreviate it as 'i'. > > > x <- 0:20 > breaks1 <- seq.int <http://seq.int>(0, 16, 4) > breaks2 <- seq.int <http://seq.int>(0, 20, 4) > data.frame( > cut(x, breaks1, right = FALSE, i = TRUE), > cut(x, breaks2, right = FALSE, i = TRUE), > check.names = FALSE > ) > > > I hope this helps. > > On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <mailto:leo.m...@syonic.eu>> wrote: > > Hello Andrew, > > > But "cut" generates factors. In most cases with real data one > expects to have also the ends of the interval: the argument > "include.lowest" is both ugly and too long. > > [The test-code on the ftable thread contains this error! I have > run through this error a couple of times.] > > > The only real situation that I can imagine to be problematic: > > - if the interval goes to +Inf (or -Inf): I do not know if there > would be any effects when including +Inf (or -Inf). 
> > > Leonard > > > On 9/18/2021 1:14 AM, Andrew Simmons wrote: >> While it is not explicitly mentioned anywhere in the >> documentation for .bincode, I suspect 'include.lowest = FALSE' is >> the default to keep the definitions of the bins consistent. For >> example: >> >> >> x <- 0:20 >> breaks1 <- seq.int <http://seq.int>(0, 16, 4) >> breaks2 <- seq.int <http://seq.int>(0, 20, 4) >> cbind( >> .bincode(x, breaks1, right = FALSE, include.lowest = TRUE), >> .bincode(x, breaks2, right = FALSE, include.lowest = TRUE) >> ) >> >> >> by having 'include.lowest = TRUE' with different ends, you can >> get inconsistent behaviour. While this probably wouldn't be an >> issue with 'real' data, this would seem like something you'd want >> to avoid by default. The definitions of the bins are >> >> >> [0, 4) >> [4, 8) >> [8, 12) >> [12, 16] >> >> >> and >> >> >> [0, 4) >> [4, 8) >> [8, 12) >> [12, 16) >> [16, 20] >> >> >> so you can see where the inconsistent behaviour comes from. You >> might be able to get R-core to add argument 'warn', but probably >> not to change the default of 'include.lowest'. I hope this helps >> >> >> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada > <mailto:leo.m...@syonic.eu>> wrote: >> >> Thank you Andrew. >> >> >> Is there any reason not to make: include.lowest = TRUE the >> default? >> >> >> Regarding the NA: >> >> The user still has to suspect that some values were not >> included and run that test. >> >> >> Leonard >> >> >> On 9/18/2021 12:53 AM, Andrew Simmons wrote: >>> Regarding your first point, argument 'include.lowest' >>> already handles this specific case, see ?.bincode >>> >>> Your second point, maybe it could be helpful, but since both >>> 'cut.default' and '.bincode' return NA if a value isn't >>> within a bin, you could make something like this on your own. >>> Might be worth pitching to R-bugs on the wishlist. >>> >>> >>> >>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help >>> mailto:r-help@r-project.org>> wrote: >>> >>> Hello L
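The behaviour described above is what cut() already produces with the default right = TRUE plus include.lowest = TRUE: every interval is closed to the right, only the first interval additionally includes its left endpoint, and extending the breaks to the right does not reclassify any earlier value. A small check:

```r
# Survival-style bins: (a, b] everywhere, with the first time point included.
x  = 0:20;
f1 = cut(x, seq(0, 16, 4), include.lowest = TRUE);
f2 = cut(x, seq(0, 20, 4), include.lowest = TRUE);
levels(f1);   # "[0,4]" "(4,8]" "(8,12]" "(12,16]"
# Values covered by both break sets fall into identical bins:
ok = ! is.na(f1);
all(as.character(f1[ok]) == as.character(f2[ok]));   # TRUE
```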
Re: [R] Improvement: function cut
The warn should be in cut() => .bincode(). It should be generated whenever a real value (excludes NA or NAN or +/- Inf) is not included in any of the bins. If the user writes a script and doesn't want any warnings: he can select warn = FALSE. But otherwise it would be very helpful to catch immediately the error (and not after a number of steps or miss the error altogether). Leonard On 9/18/2021 1:28 AM, Jeff Newmiller wrote: Re your objection that "the user has to suspect that some values were not included" applies equally to your proposed warn option. There are a lot of ways to introduce NAs... in real projects all analysts should be suspecting this problem. On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R-help wrote: Thank you Andrew. Is there any reason not to make: include.lowest = TRUE the default? Regarding the NA: The user still has to suspect that some values were not included and run that test. Leonard On 9/18/2021 12:53 AM, Andrew Simmons wrote: Regarding your first point, argument 'include.lowest' already handles this specific case, see ?.bincode Your second point, maybe it could be helpful, but since both 'cut.default' and '.bincode' return NA if a value isn't within a bin, you could make something like this on your own. Might be worth pitching to R-bugs on the wishlist. On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help mailto:r-help@r-project.org>> wrote: Hello List members, the following improvements would be useful for function cut (and .bincode): 1.) Argument: Include extremes extremes = TRUE if(right == FALSE) { # include also right for last interval; } else { # include also left for first interval; } 2.) Argument: warn = TRUE Warn if any values are not included in the intervals. 
Motivation: - reduce risk of errors when using function cut(); Sincerely, Leonard __ R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help <https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html <http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
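Until such an argument exists in base R, the proposed behaviour can be sketched as a thin wrapper (cutWarn() is a hypothetical name; following the description above, it ignores NA/NaN/Inf inputs and warns only when finite values fall outside every bin):

```r
# Sketch of the proposed 'warn = TRUE' argument as a wrapper around cut():
cutWarn = function(x, breaks, warn = TRUE, ...) {
	r = cut(x, breaks, ...);
	if(warn) {
		# finite inputs that ended up in no bin:
		nDropped = sum(is.na(r) & is.finite(x));
		if(nDropped > 0)
			warning(nDropped, " values were not included in any interval!");
	}
	return(r);
}
# cutWarn(0:20, seq(0, 16, 4))
# warns: 0 (left end open by default) and 17:20 fall outside all bins;
```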
Re: [R] Improvement: function cut
Why would you want to merge different factors? It makes no sense on real data. Even if some names are the same, the factors are not the same! The only real-data application that springs to mind is censoring (right or left, depending on the choice): but here we have both open and closed intervals, e.g. to the right (in the same data-set). Leonard On 9/18/2021 1:29 AM, Andrew Simmons wrote: > I disagree, I don't really think it's too long or ugly, but if you > think it is, you could abbreviate it as 'i'. > > > x <- 0:20 > breaks1 <- seq.int <http://seq.int>(0, 16, 4) > breaks2 <- seq.int <http://seq.int>(0, 20, 4) > data.frame( > cut(x, breaks1, right = FALSE, i = TRUE), > cut(x, breaks2, right = FALSE, i = TRUE), > check.names = FALSE > ) > > > I hope this helps. > > On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <mailto:leo.m...@syonic.eu>> wrote: > > Hello Andrew, > > > But "cut" generates factors. In most cases with real data one > expects to have also the ends of the interval: the argument > "include.lowest" is both ugly and too long. > > [The test-code on the ftable thread contains this error! I have > run through this error a couple of times.] > > > The only real situation that I can imagine to be problematic: > > - if the interval goes to +Inf (or -Inf): I do not know if there > would be any effects when including +Inf (or -Inf). > > > Leonard > > > On 9/18/2021 1:14 AM, Andrew Simmons wrote: >> While it is not explicitly mentioned anywhere in the >> documentation for .bincode, I suspect 'include.lowest = FALSE' is >> the default to keep the definitions of the bins consistent. For >> example: >> >> >> x <- 0:20 >> breaks1 <- seq.int <http://seq.int>(0, 16, 4) >> breaks2 <- seq.int <http://seq.int>(0, 20, 4) >> cbind( >> .bincode(x, breaks1, right = FALSE, include.lowest = TRUE), >> .bincode(x, breaks2, right = FALSE, include.lowest = TRUE) >> ) >> >> >> by having 'include.lowest = TRUE' with different ends, you can >> get inconsistent behaviour. 
While this probably wouldn't be an >> issue with 'real' data, this would seem like something you'd want >> to avoid by default. The definitions of the bins are >> >> >> [0, 4) >> [4, 8) >> [8, 12) >> [12, 16] >> >> >> and >> >> >> [0, 4) >> [4, 8) >> [8, 12) >> [12, 16) >> [16, 20] >> >> >> so you can see where the inconsistent behaviour comes from. You >> might be able to get R-core to add argument 'warn', but probably >> not to change the default of 'include.lowest'. I hope this helps >> >> >> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada > <mailto:leo.m...@syonic.eu>> wrote: >> >> Thank you Andrew. >> >> >> Is there any reason not to make: include.lowest = TRUE the >> default? >> >> >> Regarding the NA: >> >> The user still has to suspect that some values were not >> included and run that test. >> >> >> Leonard >> >> >> On 9/18/2021 12:53 AM, Andrew Simmons wrote: >>> Regarding your first point, argument 'include.lowest' >>> already handles this specific case, see ?.bincode >>> >>> Your second point, maybe it could be helpful, but since both >>> 'cut.default' and '.bincode' return NA if a value isn't >>> within a bin, you could make something like this on your own. >>> Might be worth pitching to R-bugs on the wishlist. >>> >>> >>> >>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help >>> mailto:r-help@r-project.org>> wrote: >>> >>> Hello List members, >>> >>> >>> the following improvements would be useful for function >>> cut (and .bincode): >>> >>> >>> 1.) Argument: Include extremes >>> extremes = TRUE >>> if(right == FALSE) { >>> # include also right for last interval; >>> } else { >>> # include also lef
Re: [R] Improvement: function cut
Hello Andrew, But "cut" generates factors. In most cases with real data one expects to have also the ends of the interval: the argument "include.lowest" is both ugly and too long. [The test-code on the ftable thread contains this error! I have run through this error a couple of times.] The only real situation that I can imagine to be problematic: - if the interval goes to +Inf (or -Inf): I do not know if there would be any effects when including +Inf (or -Inf). Leonard On 9/18/2021 1:14 AM, Andrew Simmons wrote: > While it is not explicitly mentioned anywhere in the documentation for > .bincode, I suspect 'include.lowest = FALSE' is the default to keep > the definitions of the bins consistent. For example: > > > x <- 0:20 > breaks1 <- seq.int <http://seq.int>(0, 16, 4) > breaks2 <- seq.int <http://seq.int>(0, 20, 4) > cbind( > .bincode(x, breaks1, right = FALSE, include.lowest = TRUE), > .bincode(x, breaks2, right = FALSE, include.lowest = TRUE) > ) > > > by having 'include.lowest = TRUE' with different ends, you can get > inconsistent behaviour. While this probably wouldn't be an issue with > 'real' data, this would seem like something you'd want to avoid by > default. The definitions of the bins are > > > [0, 4) > [4, 8) > [8, 12) > [12, 16] > > > and > > > [0, 4) > [4, 8) > [8, 12) > [12, 16) > [16, 20] > > > so you can see where the inconsistent behaviour comes from. You might > be able to get R-core to add argument 'warn', but probably not to > change the default of 'include.lowest'. I hope this helps > > > On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <mailto:leo.m...@syonic.eu>> wrote: > > Thank you Andrew. > > > Is there any reason not to make: include.lowest = TRUE the default? > > > Regarding the NA: > > The user still has to suspect that some values were not included > and run that test. 
> > > Leonard > > > On 9/18/2021 12:53 AM, Andrew Simmons wrote: >> Regarding your first point, argument 'include.lowest' already >> handles this specific case, see ?.bincode >> >> Your second point, maybe it could be helpful, but since both >> 'cut.default' and '.bincode' return NA if a value isn't within a >> bin, you could make something like this on your own. >> Might be worth pitching to R-bugs on the wishlist. >> >> >> >> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help >> mailto:r-help@r-project.org>> wrote: >> >> Hello List members, >> >> >> the following improvements would be useful for function cut >> (and .bincode): >> >> >> 1.) Argument: Include extremes >> extremes = TRUE >> if(right == FALSE) { >> # include also right for last interval; >> } else { >> # include also left for first interval; >> } >> >> >> 2.) Argument: warn = TRUE >> >> Warn if any values are not included in the intervals. >> >> >> Motivation: >> - reduce risk of errors when using function cut(); >> >> >> Sincerely, >> >> >> Leonard >> >> __ >> R-help@r-project.org <mailto:R-help@r-project.org> mailing >> list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible >> code. >> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improvement: function cut
Thank you Andrew. Is there any reason not to make: include.lowest = TRUE the default? Regarding the NA: The user still has to suspect that some values were not included and run that test. Leonard On 9/18/2021 12:53 AM, Andrew Simmons wrote: > Regarding your first point, argument 'include.lowest' already handles > this specific case, see ?.bincode > > Your second point, maybe it could be helpful, but since both > 'cut.default' and '.bincode' return NA if a value isn't within a bin, > you could make something like this on your own. > Might be worth pitching to R-bugs on the wishlist. > > > > On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help > mailto:r-help@r-project.org>> wrote: > > Hello List members, > > > the following improvements would be useful for function cut (and > .bincode): > > > 1.) Argument: Include extremes > extremes = TRUE > if(right == FALSE) { > # include also right for last interval; > } else { > # include also left for first interval; > } > > > 2.) Argument: warn = TRUE > > Warn if any values are not included in the intervals. > > > Motivation: > - reduce risk of errors when using function cut(); > > > Sincerely, > > > Leonard > > __ > R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- > To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > <https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > <http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Improvement: function cut
Hello List members,

the following improvements would be useful for function cut (and .bincode):

1.) Argument: Include extremes
extremes = TRUE
if(right == FALSE) {
	# include also right for last interval;
} else {
	# include also left for first interval;
}

2.) Argument: warn = TRUE
Warn if any values are not included in the intervals.

Motivation:
- reduce risk of errors when using function cut();

Sincerely,

Leonard

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R Code] Split long names in format.ftable
Dear List members,

I have uploaded an improved version on Github. The function is now fully functional:
- justify: left, right, cent...: TODO centre vs center;
- sep: separator when printing;
- pos: Top, Bottom; TODO: Middle;
see: https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

I will address some open questions in a separate post.

### Test
# Required:
# - all functions from Github / section "Formatting":
#   from space.builder() to ftable2();

### Data
mtcars$carbCtg = cut(mtcars$carb, c(1, 2, 4, 8), right=FALSE)
tbl = with(mtcars, table(cyl, hp, carbCtg, gear))
id = c(1,3,4);
xnm = c("Long\nname: ", "", "Extremely\nlong\nname: ")
xnm = paste0(xnm, names(dimnames(tbl))[id]);
names(dimnames(tbl))[id] = xnm;
ftbl = ftable(tbl, row.vars = id);

### FTABLE
ftable2(ftbl, sep="|"); # works nicely
ftable2(ftbl, sep=" ")
ftable2(ftbl, sep=" | ")
ftable2(ftbl, sep=" | ", justify="left")
ftable2(ftbl, sep=" | ", justify="cent") # TODO: center vs centre
ftable2(ftbl, sep=" | ", justify="left", justify.lvl="c")

Sincerely, Leonard

On 9/15/2021 11:14 PM, Leonard Mada wrote: Dear List members, I have uploaded an improved version on Github: - new option: align top vs bottom; Functions: split.names: splits and aligns the names; merge.align: aligns 2 string matrices; ftable2: enhanced version of format.ftable (proof of concept); see: https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R It makes sense to have such functionality in base R as well: it may be useful in various locations to format character output. Sincerely, Leonard

On 9/14/2021 8:18 PM, Leonard Mada wrote: Dear List members, I wrote some code to split long names in format.ftable. I hope it will be useful to others as well. Ideally, this code should be implemented natively in R. I will provide in the 2nd part of the mail a concept of how to actually implement the code in R. This may be interesting to R-devel as well. [...]

C.) split.names Function
This function may be useful in other locations as well, particularly to split names/labels used in axes and legends in various plots. But I do not have much knowledge of the graphics engine in R.

Sincerely, Leonard

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R Code] Split long names in format.ftable
Dear List members,

I have uploaded an improved version on Github:
- new option: align top vs bottom;

Functions:
- split.names: splits and aligns the names;
- merge.align: aligns 2 string matrices;
- ftable2: enhanced version of format.ftable (proof of concept);
see: https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

It makes sense to have such functionality in base R as well: it may be useful in various locations to format character output.

Sincerely,
Leonard

On 9/14/2021 8:18 PM, Leonard Mada wrote:
> Dear List members,
>
> I wrote some code to split long names in format.ftable. I hope it will
> be useful to others as well. Ideally, this code should be implemented
> natively in R. I will provide in the 2nd part of the mail a concept of
> how to actually implement the code in R. This may be interesting to
> R-devel as well.
> [...]
Re: [R] Evaluating lazily 'f<-' ?
[This message consists entirely of quoted text from earlier messages in this thread; the full messages appear further below.]
[R] [R Code] Split long names in format.ftable
Dear List members,

I wrote some code to split long names in format.ftable. I hope it will be useful to others as well. Ideally, this code should be implemented natively in R. I will provide in the 2nd part of the mail a concept of how to actually implement the code in R. This may be interesting to R-devel as well.

### Helper function
# Split the actual names
split.names = function(names, extend=0, justify="Right", blank.rm=FALSE,
        split.ch = "\n", detailed=TRUE) {
    justify = if(is.null(justify)) 0 else pmatch(justify, c("Left", "Right"));
    str = strsplit(names, split.ch);
    if(blank.rm) str = lapply(str, function(s) s[nchar(s) > 0]);
    nr  = max(sapply(str, function(s) length(s)));
    nch = lapply(str, function(s) max(nchar(s)));
    chf = function(nch) paste0(rep(" ", nch), collapse="");
    ch0 = sapply(nch, chf);
    mx  = matrix(rep(ch0, each=nr), nrow=nr, ncol=length(names));
    for(nc in seq(length(names))) {
        n = length(str[[nc]]);
        # Justifying
        s = sapply(seq(n), function(nr) paste0(
            rep(" ", nch[[nc]] - nchar(str[[nc]][nr])), collapse=""));
        s = if(justify == 2) paste0(s, str[[nc]]) else paste0(str[[nc]], s);
        mx[seq(nr + 1 - length(str[[nc]]), nr), nc] = s;
    }
    if(extend > 0) {
        mx = cbind(mx, matrix("", nrow=nr, ncol=extend));
    }
    if(detailed) attr(mx, "nchar") = unlist(nch);
    return(mx);
}

### ftable with name splitting
# - this code should ideally be integrated inside format.ftable;
ftable2 = function(ftbl, print=TRUE, quote=FALSE, ...) {
    ftbl2 = format(ftbl, quote=quote, ...);
    row.vars = names(attr(ftbl, "row.vars"));
    nr  = length(row.vars);
    nms = split.names(row.vars, extend = ncol(ftbl2) - nr);
    ftbl2 = rbind(ftbl2[1,], nms, ftbl2[-c(1,2),]);
    # TODO: update width of factor labels;
    # - new width available in attr(nms, "nchar");
    if(print) {
        cat(t(ftbl2), sep = c(rep(" ", ncol(ftbl2) - 1), "\n"));
    }
    invisible(ftbl2);
}

I have uploaded this code also on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

B.) Detailed Concept
# - I am ignoring any variants;
# - the splitting is actually done in format.ftable;
# - we set only an attribute in ftable;
ftable = function(..., split.ch="\n") {
    [...]
    attr(ftbl, "split.ch") = split.ch; # set an attribute "split.ch"
    return(ftbl);
}
format.ftable = function(ftbl, ..., split.ch) {
    if(missing(split.ch)) {
        # check if the split.ch attribute is set and use it;
    } else {
        # use the explicitly provided split.ch: if( ! is.null(split.ch))
    }
    [...]
}

C.) split.names Function
This function may be useful in other locations as well, particularly to split names/labels used in axes and legends in various plots. But I do not have much knowledge of the graphics engine in R.

Sincerely,
Leonard
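As a quick, self-contained illustration of the idea (this sketch re-implements the core of split.names() using only base R rather than calling it; the labels are made up):

```r
# Core idea behind split.names(): split multi-line labels on "\n",
# right-justify each label to its own width, and bottom-align shorter
# labels by padding blank rows on top.
nms   <- c("Long\nname: cyl", "gear")
parts <- strsplit(nms, "\n")
nr    <- max(lengths(parts))
mx <- sapply(seq_along(parts), function(i) {
    p <- parts[[i]]
    w <- max(nchar(p))
    p <- formatC(p, width = w)                  # right-justified padding
    c(rep(strrep(" ", w), nr - length(p)), p)   # blank rows on top
})
# one column per label; print row-wise:
cat(apply(mx, 1, paste, collapse = " "), sep = "\n")
```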
Re: [R] Fastest way to extract rows of smaller matrix many times by index to make larger matrix? and multiply columns of matrix by vector
Hello Nevil,

you could test something like:

# the Matrix
m = matrix(1:1000, ncol=10)
m = t(m)

# Extract Data
idcol = sample(seq(100), 100, TRUE); # now columns
for(i in 1:100) {
    m2 = m[ , idcol];
}
m2 = t(m2); # transpose back

It may be faster, although I did not benchmark it. There may be more complex variants. Maybe it is warranted to try them for 10^7 extractions:
- e.g. extracting one row and replacing all occurrences of that row;

Sincerely,
Leonard

It seems I cannot extract the digested mail anymore. I hope though that the message is processed properly.
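To actually measure the difference, a self-contained timing sketch along these lines could be used (sizes are illustrative only; the hypothesis is that, since R stores matrices column-major, extracting columns of the transposed matrix touches contiguous memory):

```r
# Compare: extract rows directly vs. transpose once, extract columns,
# transpose back. Both must yield the same matrix.
set.seed(1)
m  <- matrix(rnorm(100 * 50), nrow = 100)
id <- sample(nrow(m), 1e4, replace = TRUE)

t.rows <- system.time(r1 <- m[id, ])[["elapsed"]]

tm <- t(m)                                 # transpose once, up front
t.cols <- system.time(r2 <- t(tm[, id]))[["elapsed"]]

stopifnot(identical(r1, r2))               # same result either way
c(rows = t.rows, cols = t.cols)            # timings are machine-dependent
```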
Re: [R] Evaluating lazily 'f<-' ?
[This message consists entirely of quoted text from earlier messages in this thread; the full messages appear further below.]
Re: [R] Evaluating lazily 'f<-' ?
Hello,

I have found the evaluation: it is described in the section on subsetting. The forced evaluation makes sense for subsetting.

On 9/13/2021 9:42 PM, Leonard Mada wrote:
> Hello Andrew,
>
> I try now to understand the evaluation of the expression:
>
> e = expression(r(x) <- 1)
>
> # parameter named "value" seems to be required;
> 'r<-' = function(x, value) {print("R");}
> eval(e, list(x=2))
> # [1] "R"
>
> # both versions work
> 'r<-' = function(value, x) {print("R");}
> eval(e, list(x=2))
> # [1] "R"
>
> ### the Expression
> e[[1]][[1]] # "<-", not "r<-"
> e[[1]][[2]] # "r(x)"
>
> The evaluation of "e" somehow calls "r<-", but evaluates also the
> argument of r(...). I am still investigating what is actually happening.

The forced evaluation is relevant for subsetting, e.g.:

expression(r(x)[3] <- 1)
expression(r(x)[3] <- 1)[[1]][[2]]
# r(x)[3]
# the evaluation details are NOT visible in the expression per se;
# Note: indeed, it makes sense to first evaluate r(x) and then to
# perform the subsetting;

However, in the case of a non-subsetted expression, r(x) <- 1, it would make sense to evaluate r(x) lazily if no subsetting is involved (more precisely "r<-"(x, value)). Would this have any impact on the current code?

Sincerely,
Leonard

[remainder of quoted thread trimmed]
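What is actually happening can be made visible by recording the call order (a hedged sketch with stub bodies; per the R Language Definition, `right(padding(df)) <- 1` is evaluated roughly as ``df <- `padding<-`(df, value = `right<-`(padding(df), value = 1))``, with the intermediate values computed eagerly via the internal `*tmp*` variable):

```r
# Record which functions run, and in what order, for a nested
# replacement call; the bodies are stubs for illustration only.
calls <- character(0)
padding     <- function(x)        { calls <<- c(calls, "padding");   x }
`right<-`   <- function(x, value) { calls <<- c(calls, "right<-");   x }
`padding<-` <- function(x, value) { calls <<- c(calls, "padding<-"); x }

df <- data.frame(x = 1:5)
right(padding(df)) <- 1

calls
# [1] "padding"   "right<-"   "padding<-"
```

This reproduces the print order observed in the thread: padding() is evaluated first (to obtain the intermediate value), then `right<-` runs, and finally `padding<-` writes the result back.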
Re: [R] Evaluating lazily 'f<-' ?
Hello Andrew,

I try now to understand the evaluation of the expression:

e = expression(r(x) <- 1)

# parameter named "value" seems to be required;
'r<-' = function(x, value) {print("R");}
eval(e, list(x=2))
# [1] "R"

# both versions work
'r<-' = function(value, x) {print("R");}
eval(e, list(x=2))
# [1] "R"

### the Expression
e[[1]][[1]] # "<-", not "r<-"
e[[1]][[2]] # "r(x)"

The evaluation of "e" somehow calls "r<-", but evaluates also the argument of r(...). I am still investigating what is actually happening.

Sincerely,
Leonard

On 9/13/2021 9:15 PM, Andrew Simmons wrote:
> R's parser doesn't work the way you're expecting it to. When doing an
> assignment like:
>
> padding(right(df)) <- 1
>
> it is broken into small stages. The guide "R Language Definition"
> claims that the above would be equivalent to:
>
> `<-`(df, `padding<-`(df, value = `right<-`(padding(df), value = 1)))
>
> but that is not correct, and you can tell by using `substitute` as you
> were above. There isn't a way to do what you want with the syntax you
> provided, you'll have to do something different. You could add a
> `which` argument to each style function, and maybe put the code for
> `match.arg` in a separate function:
>
> match.which <- function (which)
>     match.arg(which, c("bottom", "left", "top", "right"), several.ok = TRUE)
>
> padding <- function (x, which)
> {
>     which <- match.which(which)
>     # more code
> }
>
> border <- function (x, which)
> {
>     which <- match.which(which)
>     # more code
> }
>
> some_other_style <- function (x, which)
> {
>     which <- match.which(which)
>     # more code
> }
>
> I hope this helps.

[remainder of quoted thread trimmed]
Re: [R] Evaluating lazily 'f<-' ?
Hello Andrew,

this could work. I will think about it.

But I was thinking more generically. Suppose we have a series of functions:
padding(), border(), some_other_style();
Each of these functions has the parameter "right" (or the group of parameters c("right", ...)).

Then I could design a function right(FUN) that assigns the value to this parameter and evaluates the function FUN().

There are a few ways to do this:

1.) Other parameters as ...:
right(FUN, value, ...) = value; and then pass "..." to FUN.
right(value, FUN, ...) = value; # or is this the syntax? (TODO: explore)

2.) Another way:
right(FUN(...other parameters already specified...)) = value;
I wanted to explore this 2nd option, but avoid evaluating FUN unless the parameter "right" is injected into the call.

3.) Option 3:
The option you mentioned.

Independent of the method: there are still weird/unexplained behaviours when I try the initial code (see the latest mail with the improved code).

Sincerely,
Leonard

On 9/13/2021 6:45 PM, Andrew Simmons wrote:
> I think you're trying to do something like:
>
> `padding<-` <- function (x, which, value)
> {
>     which <- match.arg(which, c("bottom", "left", "top", "right"),
>         several.ok = TRUE)
>     # code to pad to each side here
> }
>
> Then you could use it like
>
> df <- data.frame(x=1:5, y = sample(1:5, 5))
> padding(df, "right") <- 1
>
> Does that work as expected for you?

[remainder of quoted thread trimmed]
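The suggested `which`-based replacement function can be made concrete with a small runnable sketch (the body is purely illustrative and is my assumption, not part of the suggestion above: here the padding is simply stored as a "padding" attribute on the object):

```r
# Replacement function with a `which` argument: padding(df, side) <- value
# desugars to df <- `padding<-`(df, side, value = value).
`padding<-` <- function (x, which, value)
{
    which <- match.arg(which, c("bottom", "left", "top", "right"),
        several.ok = TRUE)
    pad <- attr(x, "padding")
    if (is.null(pad))
        pad <- c(bottom = 0, left = 0, top = 0, right = 0)
    pad[which] <- value                 # update only the requested sides
    attr(x, "padding") <- pad
    x
}

df <- data.frame(x = 1:5)
padding(df, "right") <- 1
attr(df, "padding")                     # right is now 1, the rest 0
```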
Re: [R] Evaluating lazily 'f<-' ?
I try to clarify the code:

###
right = function(x, val) {print("Right");};
padding = function(x, right, left, top, bottom) {print("Padding");};
'padding<-' = function(x, ...) {print("Padding = ");};
df = data.frame(x=1:5, y = sample(1:5, 5)); # anything

### Does NOT work as expected
'right<-' = function(x, value) {
    print("This line should be the first printed!")
    print("But ERROR: x was already evaluated, which printed \"Padding\"");
    x = substitute(x); # x was already evaluated before substitute();
    return("Nothing"); # do not know what the behaviour should be?
}

right(padding(df)) = 1;

### Output:
[1] "Padding"
[1] "This line should be the first printed!"
[1] "But ERROR: x was already evaluated, which printed \"Padding\""
[1] "Padding = " # How did this happen ???

### Problems:

1.) substitute(x) did not capture the expression;
- the first parameter of 'right<-' was already evaluated, which is not
  the case with '%f%';
Can I avoid evaluating this parameter? How can I avoid evaluating it and capture the expression "right(...)"?

2.) Unexpected:
'padding<-' was also called! I did not know this. Is it a feature or a bug?
R 4.0.4

Sincerely,
Leonard

On 9/13/2021 4:45 PM, Duncan Murdoch wrote:
> On 13/09/2021 9:38 a.m., Leonard Mada wrote:
>> Hello,
>>
>> I can include code for "padding<-" as well, but the error is before
>> that, namely in 'right<-':
>> [...]
>
> It "works" (i.e. doesn't generate an error) for me, when I correct
> your typo: the second argument to `right<-` should be `value`, not
> `val`.
>
> I'm still not clear whether it does what you want with that fix,
> because I don't really understand what you want.
>
> Duncan Murdoch

[remainder of quoted thread trimmed]
Re: [R] Evaluating lazily 'f<-' ?
Hello,

I can include code for "padding<-" as well, but the error is before that, namely in 'right<-':

right = function(x, val) {print("Right");};
# more options:
padding = function(x, right, left, top, bottom) {print("Padding");};
'padding<-' = function(x, ...) {print("Padding = ");};
df = data.frame(x=1:5, y = sample(1:5, 5));

### Does NOT work
'right<-' = function(x, val) {
    print("Already evaluated and also does not use 'val'");
    x = substitute(x); # x was evaluated before
}

right(padding(df)) = 1;

I want to capture the assignment event inside "right<-" and then call the function padding() properly. I have not thought yet about whether I should use:
padding(x, right, left, ... other parameters);
or
padding(x, parameter) <- value;

It also depends on whether I can properly capture the unevaluated expression inside "right<-":

'right<-' = function(x, val) {
    # x is automatically evaluated when using 'f<-'!
    # but not when implementing it as '%f%' = function(x, y);
}

Many thanks,
Leonard

On 9/13/2021 4:11 PM, Duncan Murdoch wrote:
> On 12/09/2021 10:33 a.m., Leonard Mada via R-help wrote:
>> How can I avoid evaluation?
>> [...]
>
> That doesn't make sense. You don't have a `padding<-` function, and
> yet you are trying to call right<- to assign something to padding(df).
>
> I'm not sure about your real intention, but assignment functions by
> their nature need to evaluate the thing they are assigning to, since
> they are designed to modify objects, not create new ones. To create a
> new object, just use regular assignment.
>
> Duncan Murdoch
[R] Evaluating lazily 'f<-' ?
How can I avoid evaluation?

right = function(x, val) {print("Right");};
padding = function(x) {print("Padding");};
df = data.frame(x=1:5, y = sample(1:5, 5));

### OK
'%=%' = function(x, val) {
    x = substitute(x);
}
right(padding(df)) %=% 1; # but ugly

### Does NOT work
'right<-' = function(x, val) {
    print("Already evaluated and also does not use 'val'");
    x = substitute(x); # is evaluated before
}
right(padding(df)) = 1

Sincerely,
Leonard
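A minimal demonstration of why substitute() cannot recover the expression here: by the time a replacement function runs, its first argument has already been evaluated and bound to R's internal `*tmp*` variable, so substitute() captures the symbol `*tmp*` rather than the original call (a hedged sketch; `probe<-` is an illustrative name):

```r
# Inside a replacement function, substitute() on the first argument
# yields the internal `*tmp*` symbol, not the original expression.
`probe<-` <- function(x, value) {
    captured <<- substitute(x)   # record what substitute() sees
    x
}
y <- 1:3
probe(y) <- 0
as.character(captured)
# [1] "*tmp*"
```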