Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-25 Thread Bert Gunter
Yay Chuck!  Boo Bert.

-- Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Oct 24, 2015 at 9:05 PM, Charles C. Berry  wrote:
> On Sat, 24 Oct 2015, Bert Gunter wrote:
>
>> Rolf's solution works for the situation where all duplicated values
>> are contiguous, which may be what you need. However, I wondered how it
>> could be done if this were not the case. Below is an answer. It is not
>> as efficient or elegant as Rolf's solution for the contiguous case I
>> think; maybe someone will come up with something better.
>
>
> The often underappreciated `ave' comes to mind. viz.,
>
> ave(w,w,FUN=seq_along)
> and
> ave(ID,ID,FUN=seq_along)
>
> agree with the results below.
>
> Of course, ave(...) is just split/unsplit in disguise, further to our discussion
> of a month or two back.
>
> Best,
>
> Chuck
>
>
>> But I think
>> it works. Here's an example with code:
>>
>> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> > w
>>  [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> > d <- 0+duplicated(w)
>> > for(x in unique(w)){
>> +   i <- w==x
>> +   d[i]<-1+ cumsum(d[i])
>> +
>> + }
>> > d
>>  [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>>
>> As always, corrections and/or improvements welcome.
>>
>> Cheers,
>> Bert
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>   -- Clifford Stoll
>>
>>
>> On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner 
>> wrote:
>>>
>>> On 25/10/15 11:28, John Sorkin wrote:


>>>> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
>>>> appear in more than one row. For each row with a repeated ID, I want to add
>>>> a number that gives the sequence number of the repeated ID number. The R
>>>> code below demonstrates what I want to have, without any attempt to produce
>>>> the result, as I have no idea how to accomplish my goal.
>>>>
>>>> line <- c(1,2,3,4,5,6,7,8,9,10)
>>>> ID<-c(1,1,2,3,4,5,6,7,8,8)
>>>> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
>>>> cbind(line,ID)
>>>> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
>>>> cat("Sequence numbers within ID added to the data")
>>>> cbind(line,ID,Seq)
>>>
>>>
>>>
>>> I *think* that
>>>
>>>   unlist(lapply(rle(ID)$lengths,seq_len))
>>>
>>> gives what you want.  At least it does for the given example.
>>>
>>> cheers,
>>>
>>> Rolf Turner
>>>
>>> --
>>> Technical Editor ANZJS
>>> Department of Statistics
>>> University of Auckland
>>> Phone: +64-9-373-7599 ext. 88276
>>>
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> Charles C. Berry Dept of Family Medicine & Public Health
> cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
> http://famprevmed.ucsd.edu/faculty/cberry/



Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Bert Gunter
Rolf's solution works for the situation where all duplicated values
are contiguous, which may be what you need. However, I wondered how it
could be done if this were not the case. Below is an answer. It is not
as efficient or elegant as Rolf's solution for the contiguous case I
think; maybe someone will come up with something better. But I think
it works. Here's an example with code:

> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> w
 [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> d <- 0+duplicated(w)
> for(x in unique(w)){
+   i <- w==x
+   d[i]<-1+ cumsum(d[i])
+
+ }
> d
 [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
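
To see why the loop works: duplicated(w) flags every occurrence after the
first, so within one value the flags read 0, 1, 1, ...; their cumulative sum
is 0, 1, 2, ..., and adding 1 gives the running count. Below is a minimal
sketch that wraps the same steps in a function and checks it on the vector
above. The name seq_within is made up for illustration, and the sketch
assumes w contains no NA values.

```r
# Hypothetical helper wrapping the loop above; not part of base R.
seq_within <- function(w) {
  d <- as.integer(duplicated(w))   # 0 for a first occurrence, 1 for each repeat
  for (x in unique(w)) {
    i <- w == x                    # positions holding the value x
    d[i] <- 1L + cumsum(d[i])      # 0,1,1,... -> 0,1,2,... -> 1,2,3,...
  }
  d
}

w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)
seq_within(w)
# [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
```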

As always, corrections and/or improvements welcome.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner  wrote:
> On 25/10/15 11:28, John Sorkin wrote:
>>
>> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
>> appear in more than one row. For each row with a repeated ID, I want to add
>> a number that gives the sequence number of the repeated ID number. The R
>> code below demonstrates what I want to have, without any attempt to produce
>> the result, as I have no idea how to accomplish my goal.
>>
>>
>> line <- c(1,2,3,4,5,6,7,8,9,10)
>> ID<-c(1,1,2,3,4,5,6,7,8,8)
>> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID
>> 8")
>> cbind(line,ID)
>> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
>> cat("Sequence numbers within ID added to the data")
>> cbind(line,ID,Seq)
>
>
> I *think* that
>
>   unlist(lapply(rle(ID)$lengths,seq_len))
>
> gives what you want.  At least it does for the given example.
>
> cheers,
>
> Rolf Turner
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
>


Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Rolf Turner

On 25/10/15 11:28, John Sorkin wrote:

> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
> appear in more than one row. For each row with a repeated ID, I want to add
> a number that gives the sequence number of the repeated ID number. The R
> code below demonstrates what I want to have, without any attempt to produce
> the result, as I have no idea how to accomplish my goal.
>
> line <- c(1,2,3,4,5,6,7,8,9,10)
> ID<-c(1,1,2,3,4,5,6,7,8,8)
> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
> cbind(line,ID)
> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
> cat("Sequence numbers within ID added to the data")
> cbind(line,ID,Seq)


I *think* that

  unlist(lapply(rle(ID)$lengths,seq_len))

gives what you want.  At least it does for the given example.
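
A quick check of this on the poster's data, plus a non-contiguous vector to
show the limitation: rle() only sees runs of adjacent equal values, so the
count restarts whenever an ID reappears after a gap. The vector w2 below is
made up for illustration.

```r
ID  <- c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8)
Seq <- c(1, 2, 1, 1, 1, 1, 1, 1, 1, 2)

# Run lengths of adjacent equal IDs, then 1..n within each run.
res <- unlist(lapply(rle(ID)$lengths, seq_len))
all(res == Seq)   # TRUE for the contiguous example

# Made-up non-contiguous case: the second 1 starts a new run,
# so it is numbered 1 again rather than 2.
w2   <- c(1, 2, 1)
res2 <- unlist(lapply(rle(w2)$lengths, seq_len))
res2              # 1 1 1
```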

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Charles C. Berry

On Sat, 24 Oct 2015, Bert Gunter wrote:


> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better.


The often underappreciated `ave' comes to mind. viz.,

ave(w,w,FUN=seq_along)
and
ave(ID,ID,FUN=seq_along)

agree with the results below.

Of course, ave(...) is just split/unsplit in disguise, further to our discussion
of a month or two back.
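
For anyone who wants to see the split/unsplit connection spelled out, here is
a small sketch using the w from the quoted example. It only illustrates the
equivalence for this case; it is not meant as a description of how ave is
implemented internally.

```r
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

# ave() applies FUN within each group of w and puts the results back in place.
a <- ave(w, w, FUN = seq_along)

# The same thing by hand: split by value, number each piece, reassemble.
b <- unsplit(lapply(split(w, w), seq_along), w)

all(a == b)   # TRUE
a
# [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
```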


Best,

Chuck


> But I think
> it works. Here's an example with code:
>
> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> > w
>  [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> > d <- 0+duplicated(w)
> > for(x in unique(w)){
> +   i <- w==x
> +   d[i]<-1+ cumsum(d[i])
> +
> + }
> > d
>  [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.
>
> Cheers,
> Bert
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>   -- Clifford Stoll


> On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner wrote:
>> On 25/10/15 11:28, John Sorkin wrote:
>>>
>>> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
>>> appear in more than one row. For each row with a repeated ID, I want to add
>>> a number that gives the sequence number of the repeated ID number. The R
>>> code below demonstrates what I want to have, without any attempt to produce
>>> the result, as I have no idea how to accomplish my goal.
>>>
>>> line <- c(1,2,3,4,5,6,7,8,9,10)
>>> ID<-c(1,1,2,3,4,5,6,7,8,8)
>>> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
>>> cbind(line,ID)
>>> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
>>> cat("Sequence numbers within ID added to the data")
>>> cbind(line,ID,Seq)
>>
>> I *think* that
>>
>>   unlist(lapply(rle(ID)$lengths,seq_len))
>>
>> gives what you want.  At least it does for the given example.
>>
>> cheers,
>>
>> Rolf Turner
>>
>> --
>> Technical Editor ANZJS
>> Department of Statistics
>> University of Auckland
>> Phone: +64-9-373-7599 ext. 88276






Charles C. Berry Dept of Family Medicine & Public Health
cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/



Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Rolf Turner


On 25/10/15 12:33, Bert Gunter wrote:


> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better. But I think
> it works. Here's an example with code:
>
> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> > w
>  [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> > d <- 0+duplicated(w)
> > for(x in unique(w)){
> +   i <- w==x
> +   d[i]<-1+ cumsum(d[i])
> +
> + }
> > d
>  [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.


How about:

o <- order(w)
d <- unlist(lapply(rle(w[o])$lengths,seq_len))[order(o)]

Works for the given example. :-)
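
The reason this should hold beyond the given example: order() leaves ties in
their original order, so after sorting, each run of equal values is numbered
in order of appearance, and indexing by order(o) undoes the sort. A minimal
sketch that checks the result against ave and shows an equivalent way of
undoing the permutation (assigning into d2[o]); the names s and d2 are
introduced here only for illustration.

```r
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

o <- order(w)                                    # ties keep their original order
s <- unlist(lapply(rle(w[o])$lengths, seq_len))  # sequence numbers in sorted order

d <- s[order(o)]       # order(o) is the inverse permutation of o

# Equivalent: write each sequence number straight back to its original slot.
d2 <- integer(length(w))
d2[o] <- s

identical(d, d2)                       # TRUE
all(d == ave(w, w, FUN = seq_along))   # TRUE
```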

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



> On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner wrote:
>> On 25/10/15 11:28, John Sorkin wrote:
>>>
>>> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
>>> appear in more than one row. For each row with a repeated ID, I want to add
>>> a number that gives the sequence number of the repeated ID number. The R
>>> code below demonstrates what I want to have, without any attempt to produce
>>> the result, as I have no idea how to accomplish my goal.
>>>
>>> line <- c(1,2,3,4,5,6,7,8,9,10)
>>> ID<-c(1,1,2,3,4,5,6,7,8,8)
>>> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
>>> cbind(line,ID)
>>> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
>>> cat("Sequence numbers within ID added to the data")
>>> cbind(line,ID,Seq)
>>
>> I *think* that
>>
>>    unlist(lapply(rle(ID)$lengths,seq_len))
>>
>> gives what you want.  At least it does for the given example.

