Re: [R] Conditional Sums for Index creation

Patnaik, Tirthankar Mon, 14 May 2007 02:14:36 -0700

All,
        I guess the idea is identify the number of NAs na(r) in
particular row r, then calculate the 'original' row-sum ors(r) based on
the info on the set of non-missings in the previous row, i.e., na(r-1),
right?


Gaurav, I'd assume that the data is always for a stock, and the decision
to include a stock in the portfolio would be mine.

# Indexes of all the non-missing in all the rows
ad <- apply(ac,1,function(y)which(!is.na(y)))

Now here, if I try a row-by-row evaluation: 

> sum(ac[6,unlist(ad[5])],na.rm=TRUE)
[1] 208
> 
> sum(ac[11,unlist(ad[10])],na.rm=TRUE)
[1] 130
> 
> 

I get the answer I need! Works exactly well (Please see my mail below
for the expected answer. 151 in row=11 was incorrect, the right answer
is indeed 130. 

However, if I try and use this for all the rows in the data.frame:

ae <- ac

for (r in 2:nrow(ac)){
        print(r)
        ae$sumCorr <- unlist(sum(ac[r,unlist(ad[r-1])],na.rm=TRUE))
}

> head(ae)
  V1 V2 V3 V4 V5 sumCorr
1 27 93 82 NA NA     316
2 37 21 65 NA NA     316
3 57 65 78 NA NA     316
4 91 13 55 NA NA     316
5 20 27 53 NA NA     316
6 90 39 79 26 NA     316
> 

Which is obviously not right. The question is why does ae$sumCorr not
update with the change in row.  IMHO if I have this, then the solution
is nigh! :)

TIA and best,
-Tir

-----Original Message-----
From: Petr PIKAL [mailto:[EMAIL PROTECTED] 
Sent: Monday, May 14, 2007 2:21 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: [email protected]
Subject: Odp: [R] Conditional Sums for Index creation

Hi

there probably is not an easy out of box solution. Some use can be in
rle function e.g.

> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)
rle(x)$values[1])
   V1    V2    V3    V4    V5 
FALSE FALSE FALSE  TRUE  TRUE 

gives you columns which start with NA and

> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)
rle(x)$lengths[1])
V1 V2 V3 V4 V5
 9  6  6  5 10 
>

gives you length of NA consecutive sequence in each column. Then you can

elaborate a condition which will compute rowSums differently for a 
particular row. E.g. when there is a TRUE value take a number for that 
given column and increase it by one. You will get row number for which
you 
need rowSum with and without this particular column.

Regards

Petr Pikal
[EMAIL PROTECTED]

[EMAIL PROTECTED] napsal dne 14.05.2007 08:23:01:

> Hi,
>    Apologies for the long mail. I have a data.frame with columns of
> price/mcap data for a portfolio of stocks, and the date. To get the
> total value of the portfolio on a daily basis, I calculate rowSums of
> the data.frame. 
> 
> > set.seed(1)
> > ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> > ab[1:5,4:5] <- NA
> > ab[6:10,5] <- NA
> > ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> > ac
>    V1 V2 V3 V4 V5
> 1  27 93 82 NA NA
> 2  37 21 65 NA NA
> 3  57 65 78 NA NA
> 4  91 13 55 NA NA
> 5  20 27 53 NA NA
> 6  90 39 79 26 NA
> 7  94 NA NA 48 NA
> 8  66 38 48 77 NA
> 9  63 87 73  8 NA
> 10 NA 34 69 88 NA
> 11 21 48 48 34 24
> 12 18 60 86 84 NA
> 13 69 49 44 35 64
> 14 38 19 24 33 88
> 15 77 83 NA 48 78
> 16 50 67 10 89 80
> 17 72 79 32 86 46
> 18 99 11 52 39 41
> 19 38 72 66 78 81
> 20 78 41 41 96 60
> > 
> 
> Here the rows 1:20 are dates (also in my data.frame). 
> 
> Since some of the prices have NA, the rowSums is made to ignore these
> entries. 
> 
> > rowSums(ac,na.rm=TRUE)
>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
18
> 19  20 
> 202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315
242
> 335 316 
> > 
> 
> Stocks are being added to the portfolio too. So from date=6 (or row=6)
> we have the 4th stock V4, and from date=11, we have the 5th stock V5.
My
> problem is that I need to calculate the rowSums for row=6 (When a new
> stock was added), _with_ and _without_ the new stock. So my answer for
> row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
> original set of stocks (without V4). Similarly, my answer for row=11
> would be 175 for the plain sum, and 175 - 24 = 151 for the original
sum
> (without V5). 
> 
> Basically I'm interested in finding out the value of the portfolio
with
> and without the new stock for the purposes of creating an index. It's
> possible that some stocks my get dropped later, in which case there
> would be an NA series starting for say V1 at row=18 and so on. In that
> case, the aim would be to find the sum at row=18 with and without the
> value of V1. 
> 
> Is there any way I can get the sum over columns, deleting specific
> colums? To get the columns that are NA in any row, I tried (shown for
> the first 12 rows):
> 
> > apply(ac[1:12,],1,function(y)which(is.na(y)))
> 
> Which correctly gives 
> 
> $`1`
> V4 V5 
>  4  5 
> 
> $`2`
> V4 V5 
>  4  5 
> 
> $`3`
> V4 V5 
>  4  5 
> 
> $`4`
> V4 V5 
>  4  5 
> 
> $`5`
> V4 V5 
>  4  5 
> 
> $`6`
> V5 
>  5 
> 
> $`7`
> V2 V3 V5 
>  2  3  5 
> 
> $`8`
> V5 
>  5 
> 
> $`9`
> V5 
>  5 
> 
> $`10`
> V1 V5 
>  1  5 
> 
> $`11`
> integer(0)
> 
> $`12`
> V5 
>  5 
> 
> > 
> 
> But now I'm stuck. I don't how to use this list of indices at each row
> to exclude my columns. 
> 
> Any pointers please? Would such an exercise be easier if I use a
> time-series based object, like a zoo.
> 
> 
> TIA and best,
> -Tir
> 
> Tirthankar Patnaik
> India Strategy
> Citigroup Investment Research
> +91-22-6631 9887
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Conditional Sums for Index creation

Reply via email to