splitting a variable into groups, i.e. between start, NA, NA, stop1, start2, NA, stop2

Joris Meys Thu, 24 Jun 2010 09:04:06 -0700

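The same trick works here: find the positions of the non-NA entries in c2,
then build index ranges between consecutive markers.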
c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
c0
c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c1
c2 <- rbind(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
c2
C.df <- data.frame(c0, c1, c2)
C.df


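# positions of the non-NA markers in c2
pos <- which(!is.na(C.df$c2))

# index ranges from each marker up to the row before the next marker;
# lapply always returns a list, whereas sapply would simplify to a matrix
# if all groups happened to have the same length
idx <- lapply(2:length(pos), function(i) pos[i-1]:(pos[i]-1))
names(idx) <- sapply(2:length(pos),
    function(i) paste(C.df$c2[pos[i-1]], "-", C.df$c2[pos[i]], sep=""))

# summary of c0 and c1 for every group, as a list named by the interval
out <- lapply(idx, function(i) summary(C.df[i,1:2]))
out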
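
If you want the mean, sum, max and place of the max per group instead of
summary(), something along these lines might do. It is only a sketch: it
rebuilds the data as plain vectors with c() so that it runs on its own, and
group_stats is a name made up for this example, not an existing function.

```r
# Same values as above, but as plain vectors (assumption: c() instead of rbind())
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")

# rows holding a marker, and the index range from each marker
# up to the row just before the next one
pos <- which(!is.na(c2))
idx <- lapply(seq_len(length(pos) - 1),
              function(i) pos[i]:(pos[i + 1] - 1))
names(idx) <- paste(c2[head(pos, -1)], c2[tail(pos, -1)], sep = "-")

# hypothetical helper: the statistics asked for, computed on one group of c1
group_stats <- function(i) {
  x <- c1[i]
  c(mean      = mean(x, na.rm = TRUE),
    sum       = sum(x, na.rm = TRUE),
    max       = max(x, na.rm = TRUE),
    which.max = which.max(x))   # position of the max within the group
}

# one row per group, one column per statistic
res <- t(sapply(idx, group_stats))
res
```

Note that the values always go through a vector here. A call like
mean(20,30,40,50) averages only its first argument and returns 20, whereas
mean(c(20,30,40,50)) returns 35.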



2010/6/24 Eugeniusz Kałuża <eugeniusz.kal...@polsl.pl>:
> Dear useRs,
>
> Thanks for the advice from Joris Meys.
> Now I will try to work out how to make it work for a less specific case,
> to make the problem more general.
> The result should then be displayed for every group between non-empty strings
> in c2,
> i.e. not only the result for:
>  #mean:
>          c1     c3    c4           c5
>          20  Start1 Stop1 Start1-Stop1
>    25.48585  Start2 Stop2 Start2-Stop2
>
> but also for every group formed by the rows between two neighbouring strings in
> c2, which contains only a series of NA, NA, NA, separated from time to time by
> one string, i.e.:
>  #mean:
>          c1     c3    c4           c5
>          20 Start1 Stop1 Start1-Stop1
>          .. Stop1 Start2 Stop1-Start2
>    25.48585  Start2 Stop2 Start2-Stop2
>
> i.e.
> perhaps rewriting this as a simpler version of the command,
>
> so that it also covers every group formed by the rows between two neighbouring
> strings in c2, which contains only a series of NA, NA, NA, separated from time
> to time by one string: A, NA, NA, NA, NA, B, NA, NA, NA, C, NA, NA, NA, NA, D, NA, NA
> i.e.:
>  #mean:
>          c1     c3    c4           c5
>          20      A     B          A-B
>          ..      B     C          B-C
>    25.48585      C     D          C-D
> ...................
>
>
> I am looking for a more general method (function) that groups between these
> letters in c2.
> I will now study the solution proposed by Joris Meys.
> Thanks for the immediate answer,
> Kaluza
>
>
>
>
> -----Original message-----
> From: Joris Meys [mailto:jorism...@gmail.com]
> Sent: Thu 2010-06-24 15:14
> To: Eugeniusz Kałuża
> Cc: r-help@r-project.org
> Subject: Re: [R] ?to calculate sth for groups defined between points in one
> variable (string), / value separating/ splitting variable into groups by i.e.
> between start, NA, NA, stop1, start2, NA, stop2
>
> On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
> <eugeniusz.kal...@polsl.pl> wrote:
>>
>> Dear useRs,
>>
>> Thanks for any advice.
>>
>> # I do not know where to find examples of how to mark groups
>> # based on signal occurrence in an additional variable: cf. variable c2.
>> # How to perform different calculations for groups defined by (split by
>> # the occurrence of characteristic data in c2)
>>
>>
>> #First example of simple data
>> #mexample   1      2    3  4     5  6  7  8  9  10 11       12 13 14 15 16 17
>> c0<-rbind( 1,      2 , 3, 4,      5, 6, 7, 8, 9,10,11,      
>> 12,13,14,15,16,17     )
>> c0
>> c1<-rbind(10,     20 ,30,40,     50,10,60,20,30,40,50,      30,10, 
>> 0,NA,20,10.3444)
>> c1
>> c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2")
>> c2
>> C.df<-data.frame(cbind(c0,c1,c2))
>> colnames(C.df)<-c("c0","c1","c2")
>> C.df
>>
>> # preparation of a form for explaining the needed result (the next 3 lines are
>> # not actually needed; they only explain how to obtain the final result)
>>  c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2")
>>  c4<-rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", 
>> "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", 
>> "Stop2", "Stop2")
>>  C.df<-data.frame(cbind(c0,c1,c2,c3,c4))
>>  colnames(C.df)<-c("c0","c1","c2","c3","c4")
>>  C.df$c5<-paste(C.df$c3,C.df$c4,sep="-")
>>  C.df
>>
> Now this is something I don't get. The list "Start2-Stop2" starts way
> before Start2, actually at Stop1. Sure that's what you want?
>
> I took the liberty of showing how to get the data between start and
> stop for every entry, and how to apply functions to it. If you don't
> get the code, look at
> ?lapply
> ?apply
> ?grep
>
> I also adjusted your example, as you caused all variables to be
> factors by using the cbind in the data.frame function. Never do this
> unless you're really sure you have to. But I can't think of a case
> where that would be beneficial...
>
> ...
> C.df<-data.frame(c0,c1,c2)
> C.df
>
> # find positions
> Start <- grep("Start",C.df$c2)
> Stop <- grep("Stop",C.df$c2)
>
> # create indices
> idx <- apply(cbind(Start,Stop),1,function(i) i[1]:i[2])
> names(idx) <- paste("Start",1:length(Start),"-Stop",1:length(Start),sep="")
>
> # Apply the function summary and get a list back named by the interval.
> out <- lapply(idx,function(i) summary(C.df[i,1:2]))
> out
>
> If you really need to start Start2 right after Stop1, you can use a
> similar approach.
>
> Cheers
> Joris
>
>> # NEEDED RESULTS
>>  # needed result
>> # for Start1-Stop1: mean(20,30,40,50)
>> # for Start2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
>> #mean:
>>         c1     c3    c4           c5
>>         20  Start1 Stop1 Start1-Stop1
>>   25.48585  Start2 Stop2 Start2-Stop2
>>
>> #sum
>> # for Start1-Stop1: sum(20,30,40,50)
>> # for Start2-Stop2: sum(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
>> #sum:
>>         c1     c3    c4           c5
>>        140  Start1 Stop1 Start1-Stop1
>>   280.3444  Start2 Stop2 Start2-Stop2
>>
>> # for Start1-Stop1: max(20,30,40,50)
>> # for Start2-Stop2: max(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
>> #max:
>>         c1     c3    c4           c5
>>        50  Start1 Stop1 Start1-Stop1
>>        60  Start2 Stop2 Start2-Stop2
>>
>> # place of max (in Start1-Stop1: 4th element in group Start1-Stop1)
>> # place of max (in Start2-Stop2: 2nd element in group Start2-Stop2)
>>
>>        c0     c3    c4           c5
>>         4  Start1 Stop1 Start1-Stop1
>>         2  Start2 Stop2 Start2-Stop2
>>
>>
>> Thanks for any suggestion,
>> Kaluza
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



