All,
I guess the idea is identify the number of NAs na(r) in
particular row r, then calculate the 'original' row-sum ors(r) based on
the info on the set of non-missings in the previous row, i.e., na(r-1),
right?
Gaurav, I'd assume that the data is always for a stock, and the decision
to include a stock in the portfolio would be mine.
# Indexes of all the non-missing in all the rows
ad <- apply(ac,1,function(y)which(!is.na(y)))
Now here, if I try a row-by-row evaluation:
> sum(ac[6,unlist(ad[5])],na.rm=TRUE)
[1] 208
>
> sum(ac[11,unlist(ad[10])],na.rm=TRUE)
[1] 130
>
>
I get the answer I need! Works exactly well (Please see my mail below
for the expected answer. 151 in row=11 was incorrect, the right answer
is indeed 130.
However, if I try and use this for all the rows in the data.frame:
ae <- ac
for (r in 2:nrow(ac)){
print(r)
ae$sumCorr <- unlist(sum(ac[r,unlist(ad[r-1])],na.rm=TRUE))
}
> head(ae)
V1 V2 V3 V4 V5 sumCorr
1 27 93 82 NA NA 316
2 37 21 65 NA NA 316
3 57 65 78 NA NA 316
4 91 13 55 NA NA 316
5 20 27 53 NA NA 316
6 90 39 79 26 NA 316
>
Which is obviously not right. The question is why does ae$sumCorr not
update with the change in row. IMHO if I have this, then the solution
is nigh! :)
TIA and best,
-Tir
-----Original Message-----
From: Petr PIKAL [mailto:[EMAIL PROTECTED]
Sent: Monday, May 14, 2007 2:21 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: [email protected]
Subject: Odp: [R] Conditional Sums for Index creation
Hi
there probably is not an easy out of box solution. Some use can be in
rle function e.g.
> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)
rle(x)$values[1])
V1 V2 V3 V4 V5
FALSE FALSE FALSE TRUE TRUE
gives you columns which start with NA and
> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)
rle(x)$lengths[1])
V1 V2 V3 V4 V5
9 6 6 5 10
>
gives you length of NA consecutive sequence in each column. Then you can
elaborate a condition which will compute rowSums differently for a
particular row. E.g. when there is a TRUE value take a number for that
given column and increase it by one. You will get row number for which
you
need rowSum with and without this particular column.
Regards
Petr Pikal
[EMAIL PROTECTED]
[EMAIL PROTECTED] napsal dne 14.05.2007 08:23:01:
> Hi,
> Apologies for the long mail. I have a data.frame with columns of
> price/mcap data for a portfolio of stocks, and the date. To get the
> total value of the portfolio on a daily basis, I calculate rowSums of
> the data.frame.
>
> > set.seed(1)
> > ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> > ab[1:5,4:5] <- NA
> > ab[6:10,5] <- NA
> > ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> > ac
> V1 V2 V3 V4 V5
> 1 27 93 82 NA NA
> 2 37 21 65 NA NA
> 3 57 65 78 NA NA
> 4 91 13 55 NA NA
> 5 20 27 53 NA NA
> 6 90 39 79 26 NA
> 7 94 NA NA 48 NA
> 8 66 38 48 77 NA
> 9 63 87 73 8 NA
> 10 NA 34 69 88 NA
> 11 21 48 48 34 24
> 12 18 60 86 84 NA
> 13 69 49 44 35 64
> 14 38 19 24 33 88
> 15 77 83 NA 48 78
> 16 50 67 10 89 80
> 17 72 79 32 86 46
> 18 99 11 52 39 41
> 19 38 72 66 78 81
> 20 78 41 41 96 60
> >
>
> Here the rows 1:20 are dates (also in my data.frame).
>
> Since some of the prices have NA, the rowSums is made to ignore these
> entries.
>
> > rowSums(ac,na.rm=TRUE)
> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18
> 19 20
> 202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315
242
> 335 316
> >
>
> Stocks are being added to the portfolio too. So from date=6 (or row=6)
> we have the 4th stock V4, and from date=11, we have the 5th stock V5.
My
> problem is that I need to calculate the rowSums for row=6 (When a new
> stock was added), _with_ and _without_ the new stock. So my answer for
> row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
> original set of stocks (without V4). Similarly, my answer for row=11
> would be 175 for the plain sum, and 175 - 24 = 151 for the original
sum
> (without V5).
>
> Basically I'm interested in finding out the value of the portfolio
with
> and without the new stock for the purposes of creating an index. It's
> possible that some stocks my get dropped later, in which case there
> would be an NA series starting for say V1 at row=18 and so on. In that
> case, the aim would be to find the sum at row=18 with and without the
> value of V1.
>
> Is there any way I can get the sum over columns, deleting specific
> colums? To get the columns that are NA in any row, I tried (shown for
> the first 12 rows):
>
> > apply(ac[1:12,],1,function(y)which(is.na(y)))
>
> Which correctly gives
>
> $`1`
> V4 V5
> 4 5
>
> $`2`
> V4 V5
> 4 5
>
> $`3`
> V4 V5
> 4 5
>
> $`4`
> V4 V5
> 4 5
>
> $`5`
> V4 V5
> 4 5
>
> $`6`
> V5
> 5
>
> $`7`
> V2 V3 V5
> 2 3 5
>
> $`8`
> V5
> 5
>
> $`9`
> V5
> 5
>
> $`10`
> V1 V5
> 1 5
>
> $`11`
> integer(0)
>
> $`12`
> V5
> 5
>
> >
>
> But now I'm stuck. I don't how to use this list of indices at each row
> to exclude my columns.
>
> Any pointers please? Would such an exercise be easier if I use a
> time-series based object, like a zoo.
>
>
> TIA and best,
> -Tir
>
> Tirthankar Patnaik
> India Strategy
> Citigroup Investment Research
> +91-22-6631 9887
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.