Thanks. I found this to be quite informative and a nice example of how useful R-Help can be as a resource for R users.
Best,
Bert

On Mon, Jul 22, 2024 at 4:50 AM Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
>
> Base R. Regarding code improvements:
>
> 1. Personally I find (\(...) ...)() notation hard to read (although by
> placing (\(x), the body and )() on 3 separate lines it can be improved
> somewhat). Instead let us use a named function. The name of the
> function can also serve to self-document the code.
>
> 2. The use of dat both at the start of the pipeline and then again
> within a later step of the pipeline goes against a strict left to
> right flow. In general, if this occurs it is a sign either that we need
> to break the pipeline into two or that we need to find another
> approach, which is what we do here.
>
> We can use the base R code below. Note that the column names produced
> by transform(S = read.table(...)) are S.V1, S.V2, etc., so to fix the
> column names remove .V from all column names as in the fix_colnames
> function shown. It does no harm to apply that to all column names
> since the remaining column names will not match.
>
>   fix_colnames <- function(x) {
>     setNames(x, sub("\\.V", "", names(x)))
>   }
>
>   dat |>
>     transform(S = read.table(text = string,
>       header = FALSE, fill = TRUE, na.strings = "")) |>
>     fix_colnames()
>
> Another way to write this, which uses neither a separately defined
> function nor the anonymous function notation, is to box the output of
> transform:
>
>   dat |>
>     transform(S = read.table(text = string,
>       header = FALSE, fill = TRUE, na.strings = "")) |>
>     list(x = _) |>
>     with( setNames(x, sub("\\.V", "", names(x))) )
>
> dplyr. Alternatively, use dplyr, in which case we can make use of
> rename_with. In this case read.table(...) creates column names V1,
> V2, etc. and mutate does not change them, so simply replacing V with S
> at the start of each column name in the output of read.table will do.
> Also we can pipe the read.table output directly to rename_with using a
> nested pipeline (i.e. the second pipe is entirely within mutate rather
> than after it) since mutate won't change the column names. The win
> here is that, unlike transform, mutate does not require the S= that
> is needed with transform (although it allows it had we wanted it).
>
>   library(dplyr)
>
>   dat |>
>     mutate(read.table(text = string,
>       header = FALSE, fill = TRUE, na.strings = "") |>
>       rename_with(~ sub("^V", "S", .x))
>     )
>
>
> On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> >
> > As always, good point.
> > Here's a piped version of your code for those who are pipe
> > aficionados. As I'm not very skilled with pipes, it can certainly
> > be improved.
> >
> > dat <-
> >   dat$string |>
> >   read.table( text = _, fill = TRUE, header = FALSE, na.strings = "") |>
> >   (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
> >   (\(x)cbind(dat, x))()
> >
> > -- Bert
> >
> >
> > On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
> > <ggrothendi...@gmail.com> wrote:
> > >
> > > Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> > > we may not want to do that. If there are only 3 fields in string, at the
> > > most, we may wish to generate only 3 columns.
> > >
> > > On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter <bgunter.4...@gmail.com>
> > > wrote:
> > > >
> > > > Nice! -- Let read.table do the work of handling the NA's.
> > > > However, even simpler is to use the 'col.names' argument of
> > > > read.table() for the column names, no?
> > > >
> > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > >                      na.strings = "",
> > > >                      col.names = paste0("s", 1:5))
> > > > dat <- cbind(dat, string)
> > > >
> > > > -- Bert
> > > >
> > > > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> > > > <ggrothendi...@gmail.com> wrote:
> > > > >
> > > > > We can use read.table for a base R solution
> > > > >
> > > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > > >                      na.strings = "")
> > > > > names(string) <- paste0("S", seq_along(string))
> > > > > cbind(dat[-3], string)
> > > > >
> > > > > On Fri, Jul 19, 2024 at 12:52 PM Val <valkr...@gmail.com> wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I want to extract new variables from a string and add them to the
> > > > > > data frame.
> > > > > > The sample data is a CSV file.
> > > > > >
> > > > > > dat<-read.csv(text="Year, Sex,string
> > > > > > 2002,F,15 xc Ab
> > > > > > 2003,F,14
> > > > > > 2004,M,18 xb 25 35 21
> > > > > > 2005,M,13 25
> > > > > > 2006,M,14 ac 256 AV 35
> > > > > > 2007,F,11",header=TRUE)
> > > > > >
> > > > > > The string column has a maximum of five variables. Some rows have
> > > > > > all five and others do not. If a variable is missing, fill it
> > > > > > with NA.
> > > > > > The desired result is shown below:
> > > > > >
> > > > > >
> > > > > > Year,Sex,string, S1, S2, S3, S4, S5
> > > > > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > > > > 2003,F,14, 14,NA,NA,NA,NA
> > > > > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > > > > 2005,M,13 25,13, 25,NA,NA,NA
> > > > > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > > > > 2007,F,11, 11,NA,NA,NA,NA
> > > > > >
> > > > > > Any help?
> > > > > > Thank you in advance.
> > > > > >
> > > > > > ______________________________________________
> > > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > PLEASE do read the posting guide
> > > > > > http://www.R-project.org/posting-guide.html
> > > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > >
> > > > > --
> > > > > Statistics & Software Consulting
> > > > > GKX Group, GKX Associates Inc.
> > > > > tel: 1-877-GKX-GROUP
> > > > > email: ggrothendieck at gmail.com
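
A note on the base R approach in this thread. Per ?read.table, when fill = TRUE the number of columns is inferred from the first five lines of input, unless col.names is longer. The sample data happens to have a five-field row among its first five lines, so the posted solutions work on it, but a longer row further down could be wrapped onto extra rows. The sketch below is one possible way to avoid both that and the hard-coded 1:5: it counts the fields on every line with count.fields() and passes a col.names vector of that length. split_fields is a hypothetical helper name, not something posted in the thread, and the header of the sample data has been tidied to "Year,Sex,string" for this illustration.

```r
# Sample data from the original question (header spacing tidied here;
# that tidying is an assumption made for this sketch).
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# Hypothetical helper (not from the thread): split a whitespace-separated
# character vector into columns prefix1, prefix2, ..., sized to the data.
split_fields <- function(x, prefix = "S") {
  # count.fields() looks at every line, so the column count does not
  # depend on where the longest row sits in the input.
  con <- textConnection(x)
  on.exit(close(con))
  n <- max(count.fields(con, sep = ""), na.rm = TRUE)

  # Supplying n column names tells read.table how many columns to create;
  # fill = TRUE pads short rows and na.strings = "" turns the padding into NA.
  read.table(text = x, fill = TRUE, header = FALSE, na.strings = "",
             col.names = paste0(prefix, seq_len(n)))
}

# Keep the original string column, as in the desired result.
res <- cbind(dat, split_fields(dat$string))
print(res)

# A case where the longest row comes after the fifth line.
late <- split_fields(c("1", "2", "3", "4", "5", "6 a b"))
print(dim(late))
```

If fewer fields are present (say at most three), the same call should produce only S1 to S3, which is the behaviour Gabor argued for when objecting to a fixed paste0("S", 1:5).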