But have we lured you to the dark side with the tidyverse yet ;-)
On Mon, 22 Jul 2024, 15:22 Bert Gunter, <bgunter.4...@gmail.com> wrote:

> Thanks.
>
> I found this to be quite informative and a nice example of how useful
> R-Help can be as a resource for R users.
>
> Best,
> Bert
>
> On Mon, Jul 22, 2024 at 4:50 AM Gabor Grothendieck
> <ggrothendi...@gmail.com> wrote:
> >
> > Base R. Regarding code improvements:
> >
> > 1. Personally I find (\(...) ...)() notation hard to read (although by
> > placing (\(x), the body and )() on 3 separate lines it can be improved
> > somewhat). Instead let us use a named function. The name of the
> > function can also serve to self-document the code.
> >
> > 2. The use of dat both at the start of the pipeline and then again
> > within a later step of the pipeline goes against a strict left to
> > right flow. In general if this occurs it is either a sign that we need
> > to break the pipeline into two or that we need to find another
> > approach, which is what we do here.
> >
> > We can use the base R code below. Note that the column names produced
> > by transform(S = read.table(...)) are S.V1, S.V2, etc. so to fix the
> > column names remove .V from all column names as in the fix_colnames
> > function shown. It does no harm to apply that to all column names
> > since the remaining column names will not match.
> >
> > fix_colnames <- function(x) {
> >   setNames(x, sub("\\.V", "", names(x)))
> > }
> >
> > dat |>
> >   transform(S = read.table(text = string,
> >     header = FALSE, fill = TRUE, na.strings = "")) |>
> >   fix_colnames()
> >
> > Another way to write this, which uses neither a separately defined
> > function nor the anonymous function notation, is to box the output of
> > transform:
> >
> > dat |>
> >   transform(S = read.table(text = string,
> >     header = FALSE, fill = TRUE, na.strings = "")) |>
> >   list(x = _) |>
> >   with( setNames(x, sub("\\.V", "", names(x))) )
> >
> > dplyr. Alternately use dplyr in which case we can make use of
> > rename_with. In this case read.table(...)
> > creates column names V1, V2, etc. and mutate does not change them so
> > simply replacing V with S at the start of each column name in the
> > output of read.table will do. Also we can pipe the read.table output
> > directly to rename_with using a nested pipeline (i.e. the second pipe
> > is entirely within mutate rather than after it) since mutate won't
> > change the column names. The win here is that, unlike transform,
> > mutate does not require the S= that is needed with transform
> > (although it allows it had we wanted it).
> >
> > library(dplyr)
> >
> > dat |>
> >   mutate(read.table(text = string,
> >     header = FALSE, fill = TRUE, na.strings = "") |>
> >     rename_with(~ sub("^V", "S", .x))
> >   )
> >
> >
> > On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> > >
> > > As always, good point.
> > > Here's a piped version of your code for those who are pipe
> > > aficionados. As I'm not very skilled with pipes, it might certainly
> > > be improved.
> > > dat <-
> > >   dat$string |>
> > >   read.table( text = _, fill = TRUE, header = FALSE, na.strings = "") |>
> > >   (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
> > >   (\(x)cbind(dat, x))()
> > >
> > > -- Bert
> > >
> > >
> > > On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
> > > <ggrothendi...@gmail.com> wrote:
> > > >
> > > > Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> > > > we may not want to do that. If there are at most 3 fields in string,
> > > > we may wish to generate only 3 columns.
> > > >
> > > > On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> > > > >
> > > > > Nice! -- Let read.table do the work of handling the NA's.
> > > > > However, even simpler is to use the 'col.names' argument of
> > > > > read.table() for the column names, no?
> > > > >
> > > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > > >                      na.strings = "", col.names = paste0("s", 1:5))
> > > > > dat <- cbind(dat, string)
> > > > >
> > > > > -- Bert
> > > > >
> > > > > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> > > > > <ggrothendi...@gmail.com> wrote:
> > > > > >
> > > > > > We can use read.table for a base R solution
> > > > > >
> > > > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > > > >                      na.strings = "")
> > > > > > names(string) <- paste0("S", seq_along(string))
> > > > > > cbind(dat[-3], string)
> > > > > >
> > > > > > On Fri, Jul 19, 2024 at 12:52 PM Val <valkr...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I want to extract new variables from a string and add them to the dataframe.
> > > > > > > The sample data is a CSV file.
> > > > > > >
> > > > > > > dat<-read.csv(text="Year, Sex,string
> > > > > > > 2002,F,15 xc Ab
> > > > > > > 2003,F,14
> > > > > > > 2004,M,18 xb 25 35 21
> > > > > > > 2005,M,13 25
> > > > > > > 2006,M,14 ac 256 AV 35
> > > > > > > 2007,F,11",header=TRUE)
> > > > > > >
> > > > > > > The string column has a maximum of five variables. Some rows have all
> > > > > > > five and others do not. If a variable is missing, fill it with NA.
> > > > > > > The desired result is shown below:
> > > > > > >
> > > > > > > Year,Sex,string,S1,S2,S3,S4,S5
> > > > > > > 2002,F,15 xc Ab,15,xc,Ab,NA,NA
> > > > > > > 2003,F,14,14,NA,NA,NA,NA
> > > > > > > 2004,M,18 xb 25 35 21,18,xb,25,35,21
> > > > > > > 2005,M,13 25,13,25,NA,NA,NA
> > > > > > > 2006,M,14 ac 256 AV 35,14,ac,256,AV,35
> > > > > > > 2007,F,11,11,NA,NA,NA,NA
> > > > > > >
> > > > > > > Any help?
> > > > > > > Thank you in advance.
> > > > > > >
> > > > > > > ______________________________________________
> > > > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > > >
> > > > > > --
> > > > > > Statistics & Software Consulting
> > > > > > GKX Group, GKX Associates Inc.
> > > > > > tel: 1-877-GKX-GROUP
> > > > > > email: ggrothendieck at gmail.com
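
For anyone who wants to run the base R approach from this thread end to end, here is a minimal sketch. It combines the read.table() idea with the col.names argument, but sizes the names from the data rather than hard-coding 5. The helper name split_fields is hypothetical (it does not appear in the thread), the header of the sample data is written without the stray space, and the sketch assumes every row of string has at least one field. Sizing col.names explicitly also sidesteps read.table()'s default of inferring the number of columns from the first five lines of input.

```r
# Sample data from the original question (header spacing normalised).
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# Hypothetical helper (not from the thread): split a whitespace-delimited
# character vector into columns named <prefix>1, <prefix>2, ...
# Assumes each element of x contains at least one field.
split_fields <- function(x, prefix = "S") {
  # Width of the widest row, so col.names is never too short or too long.
  n <- max(lengths(strsplit(trimws(x), "[[:space:]]+")))
  read.table(text = x, fill = TRUE, header = FALSE, na.strings = "",
             col.names = paste0(prefix, seq_len(n)))
}

# Append the new columns to the original data frame.
res <- cbind(dat, split_fields(dat$string))
print(res)
```

The same helper could be dropped into a pipeline as dat |> cbind(split_fields(dat$string)), at the cost of mentioning dat twice, which is the left-to-right concern raised above.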