Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-25 Thread Bert Gunter
Yay Chuck!  Boo Bert.

-- Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Oct 24, 2015 at 9:05 PM, Charles C. Berry  wrote:
> On Sat, 24 Oct 2015, Bert Gunter wrote:
>
>> Rolf's solution works for the situation where all duplicated values
>> are contiguous, which may be what you need. However, I wondered how it
>> could be done if this were not the case. Below is an answer. It is not
>> as efficient or elegant as Rolf's solution for the contiguous case I
>> think; maybe someone will come up with something better.
>
>
> The often underappreciated `ave' comes to mind. viz.,
>
> ave(w,w,FUN=seq_along)
> and
> ave(ID,ID,FUN=seq_along)
>
> agree with the results below.
>
> Of course, ave(...) is just split/unsplit in disguise, further to our discussion
> of a month or two back.
>
> Best,
>
> Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Rolf Turner


On 25/10/15 12:33, Bert Gunter wrote:


> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better. But I think
> it works. Here's an example with code:
>
>> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> w
>  [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> d <- 0+duplicated(w)
>> for(x in unique(w)){
> +   i <- w==x
> +   d[i]<-1+ cumsum(d[i])
> +
> + }
>> d
>  [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.


How about:

o <- order(w)
d <- unlist(lapply(rle(w[o])$lengths,seq_len))[order(o)]

Works for the given example. :-)
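A minimal sketch of the order/rle idea, wrapped in a function so it can be checked against the example. The name `seq_within_sorted` is hypothetical. It relies on `order()` leaving ties in their original order, so the count within each run follows the original row order, and on `order(o)` being the inverse of the permutation `o`, which puts the counts back in place.

```r
# Hypothetical helper: sequence number of each element within its value group,
# computed by sorting, counting along each run, then undoing the sort.
seq_within_sorted <- function(x) {
  o <- order(x)                              # ties keep their original order
  runs <- rle(x[o])$lengths                  # one run per distinct value after sorting
  counts <- unlist(lapply(runs, seq_len))    # 1, 2, ... within each run
  counts[order(o)]                           # order(o) inverts the permutation o
}

w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)
seq_within_sorted(w)
# [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
```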

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276





Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Charles C. Berry

On Sat, 24 Oct 2015, Bert Gunter wrote:


> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better.


The often underappreciated `ave' comes to mind. viz.,

ave(w,w,FUN=seq_along)
and
ave(ID,ID,FUN=seq_along)

agree with the results below.

Of course, ave(...) is just split/unsplit in disguise, further to our discussion
of a month or two back.
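To make the split/unsplit remark concrete, a small sketch reusing the `w` from the quoted example (the names `by_hand` and `via_ave` are just for illustration): it numbers each group with an explicit `split()`/`lapply()`/`unsplit()` and compares the result with `ave()`. The explicit version returns integers, while `ave(w, ...)` returns doubles like `w`, so the check uses `==` rather than `identical()`.

```r
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

# Explicit version: split w into groups by value, number each group,
# then put the numbers back in the original positions.
by_hand <- unsplit(lapply(split(w, w), seq_along), w)

# ave() does the same split/apply/reassemble internally.
via_ave <- ave(w, w, FUN = seq_along)

by_hand
# [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
all(by_hand == via_ave)
# [1] TRUE
```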


Best,

Chuck


> But I think
> it works. Here's an example with code:
>
>> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> w
>  [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> d <- 0+duplicated(w)
>> for(x in unique(w)){
> +   i <- w==x
> +   d[i]<-1+ cumsum(d[i])
> +
> + }
>> d
>  [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.





Charles C. Berry Dept of Family Medicine & Public Health
cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/



Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Bert Gunter
Rolf's solution works for the situation where all duplicated values
are contiguous, which may be what you need. However, I wondered how it
could be done if this were not the case. Below is an answer. It is not
as efficient or elegant as Rolf's solution for the contiguous case I
think; maybe someone will come up with something better. But I think
it works. Here's an example with code:

> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> w
 [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> d <- 0+duplicated(w)
> for(x in unique(w)){
+   i <- w==x
+   d[i]<-1+ cumsum(d[i])
+
+ }
> d
 [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3

As always, corrections and/or improvements welcome.
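The loop can also be packaged as a reusable function. This is a minimal sketch (the name `seq_within_loop` is hypothetical) with comments spelling out why it works: `duplicated()` marks every occurrence after the first with 1, so within each group the flags read 0, 1, 1, and so on, and their cumulative sum plus one gives 1, 2, 3, and so on.

```r
# Hypothetical wrapper around the loop above.
seq_within_loop <- function(x) {
  d <- as.integer(duplicated(x))   # 0 at a value's first occurrence, 1 at later ones
  for (v in unique(x)) {
    i <- x == v                    # logical index of this value's positions
    d[i] <- 1L + cumsum(d[i])      # flags 0,1,1,... become 1,2,3,...
  }
  d
}

w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)
seq_within_loop(w)
# [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
```

Each pass only touches the positions of one value, so groups do not interfere with each other.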

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner  wrote:
> I *think* that
>
>   unlist(lapply(rle(ID)$lengths,seq_len))
>
> gives what you want.  At least it does for the given example.
>
> cheers,
>
> Rolf Turner


Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Rolf Turner

On 25/10/15 11:28, John Sorkin wrote:

> I have a file that has (1) Line numbers, (2) IDs. A given ID number can appear
> in more than one row. For each row with a repeated ID, I want to add a number
> that gives the sequence number of the repeated ID number. The R code below
> demonstrates what I want to have, without any attempt to produce the result, as
> I have no idea how to accomplish my goal.
>
>
> line <- c(1,2,3,4,5,6,7,8,9,10)
> ID<-c(1,1,2,3,4,5,6,7,8,8)
> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
> cbind(line,ID)
> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
> cat("Sequence numbers within ID added to the data")
> cbind(line,ID,Seq)


I *think* that

  unlist(lapply(rle(ID)$lengths,seq_len))

gives what you want.  At least it does for the given example.
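The `rle()` answer counts along runs of consecutive equal values, so it matches the desired `Seq` only when rows with the same ID are adjacent, as they are in John's example. A small sketch with a made-up, non-contiguous ID vector (`ID2`, purely illustrative) shows where it differs from a count within each ID, here done with `ave()`:

```r
ID <- c(1,1,2,3,4,5,6,7,8,8)
unlist(lapply(rle(ID)$lengths, seq_len))
# [1] 1 2 1 1 1 1 1 1 1 2

# Illustrative IDs where ID 1 reappears after a different ID.
ID2 <- c(1, 2, 1)
unlist(lapply(rle(ID2)$lengths, seq_len))   # three separate runs: 1 | 2 | 1
# [1] 1 1 1
ave(ID2, ID2, FUN = seq_along)              # counts within each ID
# [1] 1 1 2
```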

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



[R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread John Sorkin
I have a file that has (1) Line numbers, (2) IDs. A given ID number can appear 
in more than one row. For each row with a repeated ID, I want to add a number 
that gives the sequence number of the repeated ID number. The R code below 
demonstrates what I want to have, without any attempt to produce the result, as 
I have no idea how to accomplish my goal.


line <- c(1,2,3,4,5,6,7,8,9,10)
ID<-c(1,1,2,3,4,5,6,7,8,8)
cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
cbind(line,ID)
Seq <-  c(1,2,1,1,1,1,1,1,1,2)
cat("Sequence numbers within ID added to the data")
cbind(line,ID,Seq)
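For the data above, a minimal sketch using the `ave()` suggestion from later in the thread to add the column directly and check it against the hand-written `Seq`. The data frame name `df` and the character IDs in `chr_id` are only for illustration. One caveat worth knowing: `ave()` stores its results back into a copy of its first argument, so with character IDs the sequence numbers come back as character; passing `seq_along(...)` as the first argument avoids that.

```r
line <- c(1,2,3,4,5,6,7,8,9,10)
ID   <- c(1,1,2,3,4,5,6,7,8,8)
Seq  <- c(1,2,1,1,1,1,1,1,1,2)   # the desired result from the question

# Number the rows within each ID; works whether or not equal IDs are adjacent.
df <- data.frame(line = line, ID = ID)
df$Seq <- ave(df$ID, df$ID, FUN = seq_along)
df
identical(df$Seq, Seq)
# [1] TRUE

# Caveat (illustrative IDs): with character IDs, ave() returns character.
chr_id <- c("a", "a", "b", "c", "a")
ave(chr_id, chr_id, FUN = seq_along)
# [1] "1" "2" "1" "1" "3"
ave(seq_along(chr_id), chr_id, FUN = seq_along)
# [1] 1 2 1 1 3
```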



John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 


Confidentiality Statement:
This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not 
the intended recipient, please contact the sender by reply email and destroy 
all copies of the original message. 