Id      cat1    location        item_values     p-values        sequence        
a111    1       3002737 0.196504377     0.01    1       
a112    1       3017821 0.196504377     0.05    2       
a113    1       3027730 0.196504377     0.02    3       
a114    1       3036220 0.196504377     0.04    4       
a115    1       3053984 0.196504377     0.03    5       
a116    1       3063892 0.196504377     0.07    6       
a117    1       3076333 0.196504377     0.08    7       
a118    1       3090500 0.196504377     0.02    8       
a119    1       3103304 0.196504377     0.03    9       
a120    1       3119350 0.196504377     0.05    10      
a121    1       3129884 0.196504377     0.01    11      
a122    1       3154598 0.196504377     0.03    12      
a123    1       3170910 0.196504377     0.05    13      
a124    1       3180712 0.196504377     0.06    14      
a125    1       3186519 0.196504377     0.07    15      
a126    1       3192256 0.196504377     0.09    16      
a127    1       3198441 0.196504377     0.01    17      
a128    1       3205784 0.196504377     0.02    18      
a129    1       3210685 0.196504377     0.03    19      
a130    1       3218542 0.196504377     0.04    20      
a131    1       3234318 0.196504377     0.05    21      
a132    1       3239972 0.196504377     0.09    22      
a133    1       3245663 0.196504377     0.05    23      
a134    1       3257997 0.196504377     0.02    24      
a135    1       3273226 0.196504377     0.03    26      
a136    1       3285404 0.196504377     0.04    27      
a137    1       3290332 0.196504377     0.05    28      
a138    1       3300679 0.196504377     0.03    29      
a139    1       3310164 0.196504377     0.09    30      


First of all, please note the p-values: consecutive rows with a p-value <= 0.05
are treated as one region, which ends when a row with a p-value > 0.05 is
reached. For instance, REGION 1 is the rows from Id a111 to Id a115 (a112, with
p = 0.05, is included), REGION 2 is the rows from Id a118 to a123, etc.

What I want to do is extract the start and end location, and the peak value of
item_values, for each region.

Option 1:

   loop through the rows until a row with p-value > 0.05 is found, then
        start_location = the first location value in the region
        end_location = the location value of the last row before p > 0.05
        peak_value = the maximum of item_values within the region
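
A minimal sketch of option 1 in R, using rle() on the p-value test in place of
an explicit row loop. The data frame name dat, the column name p_values (a
syntactically valid stand-in for "p-values"), and the cutoff p <= 0.05 are
assumptions made for illustration; cat1 and sequence are left out because they
are not needed here.

```r
# Rows a111..a139 from the post (cat1 and sequence omitted).
dat <- data.frame(
  Id = paste0("a", 111:139),
  location = c(3002737, 3017821, 3027730, 3036220, 3053984, 3063892,
               3076333, 3090500, 3103304, 3119350, 3129884, 3154598,
               3170910, 3180712, 3186519, 3192256, 3198441, 3205784,
               3210685, 3218542, 3234318, 3239972, 3245663, 3257997,
               3273226, 3285404, 3290332, 3300679, 3310164),
  item_values = 0.196504377,
  p_values = c(0.01, 0.05, 0.02, 0.04, 0.03, 0.07, 0.08, 0.02, 0.03, 0.05,
               0.01, 0.03, 0.05, 0.06, 0.07, 0.09, 0.01, 0.02, 0.03, 0.04,
               0.05, 0.09, 0.05, 0.02, 0.03, 0.04, 0.05, 0.03, 0.09)
)

# Assumed cutoff: a row belongs to a region when p <= 0.05.
sig <- dat$p_values <= 0.05

# rle() collapses consecutive identical values into runs.
runs      <- rle(sig)
run_end   <- cumsum(runs$lengths)          # last row index of each run
run_start <- run_end - runs$lengths + 1    # first row index of each run
keep      <- runs$values                   # TRUE for runs that are regions

regions <- data.frame(
  region         = seq_len(sum(keep)),
  start_location = dat$location[run_start[keep]],
  end_location   = dat$location[run_end[keep]],
  # maximum item_values within each run of rows
  peak_value     = mapply(function(s, e) max(dat$item_values[s:e]),
                          run_start[keep], run_end[keep])
)
print(regions)
```

This relies on the rows already being sorted by location; if they are not, the
data frame would need to be ordered first.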

Option 2:

    create a sequence number for each row;
    subset the raw data frame by p <= 0.05;
    identify the p-value regions by the gaps in the sequence number.
For instance, sequence 1 to 5 will be considered one region:

Id      cat1    location        item_values     p-values        sequence
a111    1       3002737 0.196504377     0.01    1       
a112    1       3017821 0.196504377     0.05    2       
a113    1       3027730 0.196504377     0.02    3       
a114    1       3036220 0.196504377     0.04    4       
a115    1       3053984 0.196504377     0.03    5       
a118    1       3090500 0.196504377     0.02    8       
a119    1       3103304 0.196504377     0.03    9       
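
A minimal sketch of option 2 in R, using only the first 14 rows of the data
(a111 to a124) to keep it short. The names dat, sub and regions2, the column
name p_values, and the cutoff p <= 0.05 are assumptions for illustration. The
sequence is regenerated with seq_len() rather than taken from the posted
column, because the posted sequence skips 25, and a gap like that would split
a region in the wrong place.

```r
# First 14 rows from the post (cat1 omitted).
dat <- data.frame(
  Id = paste0("a", 111:124),
  location = c(3002737, 3017821, 3027730, 3036220, 3053984, 3063892, 3076333,
               3090500, 3103304, 3119350, 3129884, 3154598, 3170910, 3180712),
  item_values = 0.196504377,
  p_values = c(0.01, 0.05, 0.02, 0.04, 0.03, 0.07, 0.08,
               0.02, 0.03, 0.05, 0.01, 0.03, 0.05, 0.06)
)

# Step 1: a gap-free sequence number for every row.
dat$sequence <- seq_len(nrow(dat))

# Step 2: keep only the rows that pass the assumed cutoff.
# (Assumes at least one row passes.)
sub <- dat[dat$p_values <= 0.05, ]

# Step 3: a new region starts wherever the sequence jumps by more than 1.
sub$region <- cumsum(c(1, diff(sub$sequence) != 1))

# Summarise each region: first and last location, maximum item_values.
regions2 <- do.call(rbind, lapply(split(sub, sub$region), function(g) {
  data.frame(region         = g$region[1],
             start_location = g$location[1],
             end_location   = g$location[nrow(g)],
             peak_value     = max(g$item_values))
}))
print(regions2)
```

Both options should give the same regions as long as the sequence has no gaps
of its own, so the choice is mostly a matter of taste; this one avoids a loop.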


Could you recommend which of these approaches to use, or a different way to
implement this?
Thanks,

