Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?
Yay Chuck! Boo Bert.

-- Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll

On Sat, Oct 24, 2015 at 9:05 PM, Charles C. Berry wrote:
> On Sat, 24 Oct 2015, Bert Gunter wrote:
>
>> Rolf's solution works for the situation where all duplicated values
>> are contiguous, which may be what you need. However, I wondered how it
>> could be done if this were not the case. Below is an answer. It is not
>> as efficient or elegant as Rolf's solution for the contiguous case I
>> think; maybe someone will come up with something better.
>
> The often underappreciated `ave' comes to mind. viz.,
>
>     ave(w,w,FUN=seq_along)
> and
>     ave(ID,ID,FUN=seq_along)
>
> agree with the results below.
>
> Of course, ave(...) is just split/unsplit in guise, further our discussion
> of a month or two back.
>
> Best,
>
> Chuck
>
>> But I think
>> it works. Here's an example with code:
>>
>> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> > w
>> [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> >
>> > d <- 0+duplicated(w)
>> > for(x in unique(w)){
>> + i <- w==x
>> + d[i]<-1+ cumsum(d[i])
>> +
>> + }
>> >
>> > d
>> [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>>
>> As always, corrections and/or improvements welcome.
>>
>> Cheers,
>> Bert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On 25/10/15 12:33, Bert Gunter wrote:

> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better. But I think
> it works. Here's an example with code:
>
> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> > w
> [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> > d <- 0+duplicated(w)
> > for(x in unique(w)){
> + i <- w==x
> + d[i]<-1+ cumsum(d[i])
> +
> + }
> > d
> [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.

How about:

    o <- order(w)
    d <- unlist(lapply(rle(w[o])$lengths,seq_len))[order(o)]

Works for the given example. :-)

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
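For anyone puzzling over the two calls to order() in the reply above, here is a minimal sketch of the same idea split into steps. The intermediate name `seq_sorted` is introduced only for illustration and does not appear in the thread; the rest follows the posted one-liner.

```r
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

# Permutation that sorts w. order() breaks ties by original position,
# so equal values stay in their original relative order.
o <- order(w)

# After sorting, equal values are contiguous, so the rle() trick applies.
# seq_sorted (illustrative name) holds the counters in sorted order.
seq_sorted <- unlist(lapply(rle(w[o])$lengths, seq_len))

# order(o) is the inverse permutation of o; indexing with it puts the
# counters back into the original row order.
d <- seq_sorted[order(o)]
print(d)
```

The result should match the output of the loop quoted in the same message.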
Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On Sat, 24 Oct 2015, Bert Gunter wrote:

> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better.

The often underappreciated `ave' comes to mind. viz.,

    ave(w,w,FUN=seq_along)

and

    ave(ID,ID,FUN=seq_along)

agree with the results below.

Of course, ave(...) is just split/unsplit in guise, further our discussion
of a month or two back.

Best,

Chuck

> But I think it works. Here's an example with code:
>
> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> > w
> [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> > d <- 0+duplicated(w)
> > for(x in unique(w)){
> + i <- w==x
> + d[i]<-1+ cumsum(d[i])
> +
> + }
> > d
> [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.
>
> Cheers,
> Bert

Charles C. Berry                 Dept of Family Medicine & Public Health
cberry at ucsd edu               UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/
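The remark that ave() is split/unsplit in disguise can be checked with a short sketch. This is only an illustration of the equivalence for this example, not a description of how ave() is implemented internally; the names `via_ave`, `via_split` and `groups` are made up here.

```r
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

# Direct route suggested in the message above
via_ave <- ave(w, w, FUN = seq_along)

# Hand-rolled route: split by value, number each group, put the pieces back
groups <- split(w, w)
via_split <- unsplit(lapply(groups, seq_along), w)

print(via_ave)
print(via_split)

# Same idea applied to the data from the original post
line <- 1:10
ID <- c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8)
print(cbind(line, ID, Seq = ave(ID, ID, FUN = seq_along)))
```

Neither route needs equal values to be adjacent, since grouping is by value rather than by run.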
Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?
Rolf's solution works for the situation where all duplicated values
are contiguous, which may be what you need. However, I wondered how it
could be done if this were not the case. Below is an answer. It is not
as efficient or elegant as Rolf's solution for the contiguous case I
think; maybe someone will come up with something better. But I think
it works. Here's an example with code:

> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> w
 [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> d <- 0+duplicated(w)
> for(x in unique(w)){
+ i <- w==x
+ d[i]<-1+ cumsum(d[i])
+
+ }
> d
 [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3

As always, corrections and/or improvements welcome.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll

On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner wrote:
> I *think* that
>
>     unlist(lapply(rle(ID)$lengths,seq_len))
>
> gives what you want. At least it does for the given example.
>
> cheers,
>
> Rolf Turner
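To make the contiguity point concrete, here is a hedged sketch that wraps the loop above in a function and compares it with the rle() one-liner on the same non-contiguous vector. The function name `seq_within` and the variable names `by_value` and `by_run` are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper wrapping the loop from the message above
seq_within <- function(v) {
  d <- 0 + duplicated(v)        # 0 for first occurrences, 1 for repeats
  for (x in unique(v)) {
    i <- v == x                 # positions holding the value x
    d[i] <- 1 + cumsum(d[i])    # running count 1, 2, 3, ... within x
  }
  d
}

w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

by_value <- seq_within(w)                          # counts per value
by_run   <- unlist(lapply(rle(w)$lengths, seq_len)) # counts per run

print(rbind(w, by_value, by_run))
```

On this vector the two rows differ wherever a value reappears after a gap, because rle() restarts its count at every new run.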
Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On 25/10/15 11:28, John Sorkin wrote:

> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
> appear in more than one row. For each row with a repeated ID, I want to add
> a number that gives the sequence number of the repeated ID number. The R
> code below demonstrates what I want to have, without any attempt to produce
> the result, as I have no idea how to accomplish my goal.
>
> line <- c(1,2,3,4,5,6,7,8,9,10)
> ID<-c(1,1,2,3,4,5,6,7,8,8)
> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
> cbind(line,ID)
> Seq <- c(1,2,1,1,1,1,1,1,1,2)
> cat("Sequence numbers within ID added to the data")
> cbind(line,ID,Seq)

I *think* that

    unlist(lapply(rle(ID)$lengths,seq_len))

gives what you want. At least it does for the given example.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
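A minimal sketch of what the rle() one-liner does with the ID vector from the original post, broken into steps. The intermediate name `runs` is introduced here for illustration and is not part of the reply.

```r
# ID vector from the original post
ID <- c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8)

# rle() encodes consecutive runs of equal values; runs is an
# illustrative name, not part of the original reply
runs <- rle(ID)
print(runs$lengths)

# Expand each run length n into 1..n and flatten the resulting list
Seq <- unlist(lapply(runs$lengths, seq_len))
print(cbind(line = seq_along(ID), ID, Seq))
```

Because the counter restarts at every run, this relies on rows with the same ID being adjacent, as they are in the example.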
[R] Add sequence numbers to lines with the same ID: How can this be accomplished?
I have a file that has (1) Line numbers, (2) IDs. A given ID number can
appear in more than one row. For each row with a repeated ID, I want to add
a number that gives the sequence number of the repeated ID number. The R
code below demonstrates what I want to have, without any attempt to produce
the result, as I have no idea how to accomplish my goal.

line <- c(1,2,3,4,5,6,7,8,9,10)
ID<-c(1,1,2,3,4,5,6,7,8,8)
cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
cbind(line,ID)
Seq <- c(1,2,1,1,1,1,1,1,1,2)
cat("Sequence numbers within ID added to the data")
cbind(line,ID,Seq)

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)