Hello

My dear friend.... the problem is not very heavy its very light, but you 
have to assume somethings.
as you say that any new stock can come anytime and go anytime, further, 
any time NA can come .....

both statements are so close that it would be nearly impossible to know 
which one is NA (of before the start of series and which one is end of 
series ) and 
which one is NA of series of genuine data which is missing for one day or 
two.

There can be n number of workarounds depending on the propersties of 
dataset.
for e.g.


1.
Given that genuine data is not having not more than two NA consequtively
Work around : Check for past two days that NAs are there or not
Flaw: Will be a lagged kind of computation

2. 
Given that series will have continuously NAs 
Work around : you can go for checking for whole of the past data whether 
there was any number or all NAs
Flaw : Based on backward looking hence will never know that new stock has 
come or not.
e,g, stock may have come but the observation was not recorded hence NA

etc etc

clarify more of your stand, do you get information before hand that today 
stock will get added or deleted.
meaning any exogenous information which will help to optimize it




Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: [EMAIL PROTECTED]
Bhagavad Gita:  Man is made by his belief, as He believes, so He is





"Patnaik, Tirthankar " <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED]
05/14/2007 11:53 AM

To
<r-help@stat.math.ethz.ch>
cc

Subject
[R] Conditional Sums for Index creation






Hi,
                 Apologies for the long mail. I have a data.frame with 
columns of
price/mcap data for a portfolio of stocks, and the date. To get the
total value of the portfolio on a daily basis, I calculate rowSums of
the data.frame. 

> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac
   V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60
> 

Here the rows 1:20 are dates (also in my data.frame). 

Since some of the prices have NA, the rowSums is made to ignore these
entries. 

> rowSums(ac,na.rm=TRUE)
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20 
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
335 316 
> 

Stocks are being added to the portfolio too. So from date=6 (or row=6)
we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
problem is that I need to calculate the rowSums for row=6 (When a new
stock was added), _with_ and _without_ the new stock. So my answer for
row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
original set of stocks (without V4). Similarly, my answer for row=11
would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
(without V5). 

Basically I'm interested in finding out the value of the portfolio with
and without the new stock for the purposes of creating an index. It's
possible that some stocks my get dropped later, in which case there
would be an NA series starting for say V1 at row=18 and so on. In that
case, the aim would be to find the sum at row=18 with and without the
value of V1. 

Is there any way I can get the sum over columns, deleting specific
colums? To get the columns that are NA in any row, I tried (shown for
the first 12 rows):

> apply(ac[1:12,],1,function(y)which(is.na(y)))

Which correctly gives 

$`1`
V4 V5 
 4  5 

$`2`
V4 V5 
 4  5 

$`3`
V4 V5 
 4  5 

$`4`
V4 V5 
 4  5 

$`5`
V4 V5 
 4  5 

$`6`
V5 
 5 

$`7`
V2 V3 V5 
 2  3  5 

$`8`
V5 
 5 

$`9`
V5 
 5 

$`10`
V1 V5 
 1  5 

$`11`
integer(0)

$`12`
V5 
 5 

> 

But now I'm stuck. I don't how to use this list of indices at each row
to exclude my columns. 

Any pointers please? Would such an exercise be easier if I use a
time-series based object, like a zoo.


TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



============================================================================================
DISCLAIMER AND CONFIDENTIALITY CAUTION:\ \ This message and ...{{dropped}}

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to