[R] Change in order of names after applying plyr package

2012-09-26 Thread Vincy Pyne
Dear R helpers

I have the following two data.frames, viz. equity_data and param.

equity_data = data.frame(security_id = c("Air", "Air", "Air", "Air", "Air", 
"Air", "Air", "Air", "Air", "Air", "Air", "Air", "AB", "AB", "AB", "AB", "AB", 
"AB", "AB", "AB", "AB", "AB", "AB", "AB", "AD", "AD", "AD", "AD", "AD", "AD", 
"AD", "AD", "AD", "AD", "AD", "AD"), ason_date = 
c("10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", 
"6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", 
"30-Dec-11", "10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", 
"6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", 
"30-Dec-11", "10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", 
"6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11",
"30-Dec-11"), security_rate = c(0.597,0.61,0.6,0.63,0.67,0.7,0.74,0.735, 
7.61,0.795,0.796, 0.84, 8.5,8.1,8.9,8.9,8.9,9,9,9,9,9,9,9,3.21,3.22,3.12, 3.51, 
3.5, 3.37, 3.25, 3, 3.07, 3, 2.94, 2.6))

param = data.frame(confidence_level = c(0.99), holding_period = c(10),  
calculation_method = "MC", no_simulation_mc = c(100))


library(plyr)
library(reshape2)

attach(equity_data)
attach(param)

security_names = unique(equity_data$security_id)  
# (security_names are used further in R code not included here)

alpha = param$confidence_level
t = param$holding_period
n = param$no_simulation_mc
method = param$calculation_method


mc_VaR = function(security_id, ason_date, security_rate)
    {
    security_rate_returns <- NULL
    # log-returns between consecutive observations
    for (i in 1:(length(ason_date) - 1))
    {
    security_rate_returns[i] = log(security_rate[i]/security_rate[i+1])
    }

    return_mean = mean(security_rate_returns)
    return_sd = sd(security_rate_returns)
    simulation = rnorm(n, return_mean, return_sd)
    qq = sort(simulation, decreasing = TRUE)
    VaR_mc = -qq[alpha * n]*sqrt(t)
    return(VaR_mc)
    }


result_method_other <- dlply(.data = equity_data, .variables = "security_id",
    .fun = function(x)
        mc_VaR(ason_date = x$ason_date, security_id = x$security_id,
               security_rate = x$security_rate))

    
> result_method_other
$AB
[1] 0.2657424

$AD
[1] 0.212061

$Air
[1] 6.789733

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  security_id
1          AB
2          AD
3         Air


MY PROBLEM :

My original data (i.e. equity_data) has the order Air, AB and AD. 
However, after applying plyr, the order (and the corresponding result) has 
changed to AB, AD and Air.


I need to maintain my original order of Air, AB and AD. How do I modify my 
R code for this?

Kindly guide


Vincy
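
A possible way to keep the order of first appearance, sketched in base R so
that it runs without plyr. Grouped results follow the factor levels of the
grouping column, and those default to sorted order. The data frame `toy` and
its values are made up for illustration; split() stands in for dlply(), which
is expected (but not verified here) to follow the factor levels in the same way.

```r
# Toy stand-in for equity_data (values are illustrative, not from the post)
toy <- data.frame(
  security_id   = rep(c("Air", "AB", "AD"), each = 3),
  security_rate = c(0.60, 0.61, 0.63, 8.5, 8.1, 8.9, 3.21, 3.22, 3.12),
  stringsAsFactors = FALSE
)

# Order in which the securities first appear in the data
security_names <- unique(toy$security_id)

# Default grouping: the ids become a factor with sorted levels
by_default <- sapply(split(toy$security_rate, toy$security_id), mean)

# Option 1: fix the factor levels to the order of first appearance
toy$security_id <- factor(toy$security_id, levels = security_names)
by_levels <- sapply(split(toy$security_rate, toy$security_id), mean)

# Option 2: leave the grouping alone and reorder the result by name
by_reindex <- by_default[security_names]

names(by_default)  # "AB"  "AD"  "Air"
names(by_levels)   # "Air" "AB"  "AD"
```

Applied to the code in the post, option 2 would read
result_method_other[as.character(security_names)], which returns the list
elements in the original order.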


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to append the random no.s for different variables in the same data.frame

2012-09-12 Thread Vincy Pyne
Dear R helpers,

(At the outset I sincerely apologize if I have not put forward my following 
query properly, though I have tried to do so.)


Following is a curtailed part of my R - code where I am trying to generate say 
100 random no.s for each of the products under consideration.


library(plyr)
n = 100

my_code = function(product, output_avg, output_stdev)

    {

BUR_mc = rnorm(n, output_avg, output_stdev)

sim_BUR = data.frame(product, BUR_mc)

write.csv(data.frame(sim_BUR), 'sim_BUR.csv', row.names = FALSE) 
 
return(list(output_avg, output_stdev))

    }


result <- dlply(.data = My_data, .variables = "product", .fun = function(x)
 my_code(product = x$product, output_avg = x$output_avg,
 output_stdev = x$output_stdev))


There are some 12 products (and this may vary each time). In my original code, 
the return statement returns me some other output. Here for simplicity sake, 
I am just using the values as given in input.


PROBLEM - A :

I want to store the random no.s (BUR_mc) as generated above for each of the 
products and store them in a single data.frame. Now when I access 
'sim_BUR.csv', I get the csv file where the random nos. generated for the last 
product are getting stored. I need something like

product     random no
product1    ...
product1    ...
...
product1    ...    # (This is the 100th value generated for product1)
product2    ...
product2    ...





Problem - B

Also, is it possible to have more than one 'return' statements in a given 
function? 

Thanking in advance

Vincy
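
A minimal sketch of one way to approach both problems, under stated
assumptions. The file holds only the last product because write.csv() is
called once per product with the same file name, so each call overwrites the
previous one. On Problem B: a function stops at the first return() it reaches,
so several results are usually handed back together in one list. The contents
of My_data and the name simulate_product are hypothetical, and base R Map() is
used in place of dlply() to keep the example free of extra packages.

```r
set.seed(42)   # only to make the sketch reproducible
n <- 100

# Hypothetical input; the post does not show My_data
My_data <- data.frame(
  product      = c("product1", "product2", "product3"),
  output_avg   = c(10, 20, 30),
  output_stdev = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Return BOTH the summary and the simulated values in a single list;
# a function ends at the first return() it reaches
simulate_product <- function(product, output_avg, output_stdev) {
  BUR_mc <- rnorm(n, output_avg, output_stdev)
  sims <- data.frame(product = product, BUR_mc = BUR_mc,
                     stringsAsFactors = FALSE)
  return(list(stats = c(avg = output_avg, stdev = output_stdev), sims = sims))
}

# One call per product (one row of My_data each)
per_product <- Map(simulate_product, My_data$product,
                   My_data$output_avg, My_data$output_stdev)

# Stack the per-product simulations and write the file once, at the end
sim_BUR <- do.call(rbind, lapply(per_product, function(p) p$sims))
rownames(sim_BUR) <- NULL
out_file <- file.path(tempdir(), "sim_BUR.csv")
write.csv(sim_BUR, out_file, row.names = FALSE)

# The per-product statistics are still available separately
result <- lapply(per_product, function(p) p$stats)
```

The key design choice is that the function no longer writes anything itself;
the file is written a single time after all products have been simulated.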



Re: [R] How to append the random no.s for different variables in the same data.frame

2012-09-12 Thread Vincy Pyne
Dear Mr Weylandt and R helpers,

Thanks a lot for your suggestion. Unfortunately the return statement in my 
original R code returns me different results which are obtained after 
processing the function I have constructed. 

My requirement for storing the product-wise random numbers is just a part of my 
whole exercise. For each of the products, I generate a set of random no.s, 
process these, construct some statistics and obtain these statistics using the 
Return statement. So for each of the products, I get these set of statistics 
generated and that is not my problem. 

My problem is BESIDES getting my required output (which anyways I am 
getting), I need the product-wise random numbers I have already generated and 
store them together in a single data.frame. So a single data.frame gives me all 
the product wise random nos.

I am reproducing my problem once again -

# 


library(plyr)
n = 100

my_code = function(product, output_avg, output_stdev)

    {

BUR_mc = rnorm(n, output_avg, output_stdev)

sim_BUR = data.frame(product, BUR_mc)

write.csv(data.frame(sim_BUR), 'sim_BUR.csv', row.names = FALSE) 
 
return(list(output_avg, output_stdev))   

    }


result <- dlply(.data = My_data, .variables = "product", .fun = function(x)
 my_code(product = x$product, output_avg = x$output_avg,
 output_stdev = x$output_stdev))


There are some 12 products (and this may vary each time). In my original 
code, the return statement returns me some other output. Here for 
simplicity sake, I am just using the values as given in input.


PROBLEM 

I want to store the random no.s (BUR_mc) as generated above for each of 
the products and store them in a single data.frame. Now when I access 
'sim_BUR.csv', I get the csv file where the random nos. generated for 
the last product are getting stored. I need something like

product     random no
product1    ...
product1    ...
...
product1    ...    # (There will be 100 such values for product1)
product2    ...
product2    ...
...
product12   ...
product12   ...


Thanking you in advance

Vincy




[R] Maintaining Column names while writing csv file.

2012-07-19 Thread Vincy Pyne
Dear R helpers,

I have one trivial problem while writing an output file in csv format.

I have two dataframes say df1 and df2 which I am reading from two different csv 
files. 

df1 has column names as date, r1, r2, r3 while the dataframe df2 has column 
names as date, 1w, 2w. 

(The dates in both data frames are identical, and the number of elements in 
each column is equal, say 10.)

I merge these dataframes as

df_new = merge(df1, df2, by = "date", all = TRUE) 

So my new data frame has columns as 

date, r1, r2, r3, 1w, 2w

However, if I try to write this new dataframe as a csv file as

write.csv(data.frame(df_new), 'df_new.csv', row.names = FALSE)

The file gets written, but when I open the csv file, the column names displayed 
are as

date, r1, r2, r3, X1w, X2w

My original output file has about 200 columns so it is not possible to write 
column names individually. Also, I can't change the column names since I am 
receiving these files from external source and need to maintain the column 
names.

Kindly guide


Regards

Vincy
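
A sketch of one likely cause and remedy, assuming the "X" prefix comes from
R's name checking. Both read.csv() and data.frame() use check.names = TRUE by
default, which rewrites names such as 1w into syntactically valid ones such as
X1w. The file names and values below are made up for illustration.

```r
# Two small CSV files with non-syntactic column names (hypothetical data)
out_dir <- tempdir()
f1 <- file.path(out_dir, "df1.csv")
f2 <- file.path(out_dir, "df2.csv")
writeLines(c("date,r1,r2,r3", "2012-07-17,1,2,3", "2012-07-18,4,5,6"), f1)
writeLines(c("date,1w,2w", "2012-07-17,0.1,0.2", "2012-07-18,0.3,0.4"), f2)

# check.names = FALSE stops read.csv from turning "1w" into "X1w"
df1 <- read.csv(f1, check.names = FALSE, stringsAsFactors = FALSE)
df2 <- read.csv(f2, check.names = FALSE, stringsAsFactors = FALSE)

df_new <- merge(df1, df2, by = "date", all = TRUE)

# Write df_new directly: wrapping it in data.frame() would check the names again
out <- file.path(out_dir, "df_new.csv")
write.csv(df_new, out, row.names = FALSE)

# First line of the written file, i.e. the column names
header <- readLines(out, n = 1)
header
```

If a data.frame() call is needed for some other reason, passing
check.names = FALSE to it should have the same effect.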




[R] How to have original (name) order after melt and cast command

2012-07-18 Thread Vincy Pyne
Dear R helpers,

I have a data.frame as given below -

dat1 = data.frame(date = 
as.Date(c("3/30/12","3/29/12","3/28/12","3/27/12","3/26/12",
"3/23/12","3/22/12","3/21/12","3/20/12", 
"3/30/12","3/29/12","3/28/12","3/27/12",
"3/26/12","3/23/12","3/22/12","3/21/12","3/20/12", 
"3/30/12","3/29/12","3/28/12",
"3/27/12","3/26/12","3/23/12","3/22/12","3/21/12","3/20/12"), 
format="%m/%d/%y"),

name = as.character(c("xyz","xyz","xyz","xyz","xyz","xyz","xyz","xyz", 
"xyz","abc","abc","abc","abc","abc","abc","abc","abc","abc","lmn","lmn", 
"lmn","lmn","lmn","lmn","lmn","lmn","lmn")),

rate = c(c(0.065550707, 0.001825007, 0.054441969, 0.020810572, 0.073430586, 
0.037299722, 0.099807733, 0.042072817, 0.099487289, 5.550737022, 4.877620777,  
5.462477493, 4.972518082, 5.01495407, 5.820459609, 5.403881954, 5.009506516, 
4.807763909, 10.11885434,10.1856975,10.04976806,10.15428632, 10.20399335, 
10.22966704,10.20967742,10.22927793,10.02439192)))

> dat1
         date  name         rate
1  2012-03-30  xyz  0.065550707
2  2012-03-29  xyz  0.001825007
3  2012-03-28  xyz  0.054441969
4  2012-03-27  xyz  0.020810572
5  2012-03-26  xyz  0.073430586
6  2012-03-23  xyz  0.037299722
7  2012-03-22  xyz  0.099807733
8  2012-03-21  xyz  0.042072817
9  2012-03-20  xyz  0.099487289
10 2012-03-30  abc  5.550737022
11 2012-03-29  abc  4.877620777
12 2012-03-28  abc  5.462477493
13 2012-03-27  abc  4.972518082
14 2012-03-26  abc  5.014954070
15 2012-03-23  abc  5.820459609
16 2012-03-22  abc  5.403881954
17 2012-03-21  abc  5.009506516
18 2012-03-20  abc  4.807763909
19 2012-03-30  lmn 10.118854340
20 2012-03-29  lmn 10.185697500
21 2012-03-28  lmn 10.049768060
22 2012-03-27  lmn 10.154286320
23 2012-03-26  lmn 10.203993350
24 2012-03-23  lmn 10.229667040
25 2012-03-22  lmn 10.209677420
26 2012-03-21  lmn 10.229277930
27 2012-03-20  lmn 10.024391920


attach(dat1)

library(plyr)
library(reshape)


in.melt <- melt(dat1, measure = 'rate')
(df = cast(in.melt, date ~ name))

df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]


> df_sorted
        date         abc      lmn         xyz
9 2012-03-30    5.550737 10.11885 0.065550707
8 2012-03-29    4.877621 10.18570 0.001825007
7 2012-03-28    5.462477 10.04977 0.054441969
6 2012-03-27    4.972518 10.15429 0.020810572
5 2012-03-26    5.014954 10.20399 0.073430586
4 2012-03-23    5.820460 10.22967 0.037299722
3 2012-03-22    5.403882 10.20968 0.099807733
2 2012-03-21    5.009507 10.22928 0.042072817
1 2012-03-20    4.807764 10.02439 0.099487289


My Problem :-

The original data.frame has the name order xyz, abc and lmn. However, 
after the melt and cast commands, the order in df_sorted has changed to abc, 
lmn and xyz. How do I maintain the original order in df_sorted, i.e. I 
need 

        date          xyz       abc       lmn
9 2012-03-30  0.065550707  5.550737  10.11885
8 2012-03-29  0.001825007  4.877621  10.18570
7 2012-03-28  0.054441969  5.462477  10.04977
6 2012-03-27  0.020810572  4.972518  10.15429
5 2012-03-26  0.073430586  5.014954  10.20399
4 2012-03-23  0.037299722  5.820460  10.22967
3 2012-03-22  0.099807733  5.403882  10.20968
2 2012-03-21  0.042072817  5.009507  10.22928
1 2012-03-20  0.099487289  4.807764  10.02439


Kindly guide

Thanking in advance

Vincy 
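
One simple possibility, sketched under assumptions: since the wide table is an
ordinary table with named columns, the columns can be reordered by name after
casting, using the order in which the names first appear in the long data. The
small df_sorted below is a hand-built stand-in for the cast() result (two rows
only), and long_names stands in for dat1$name.

```r
# Stand-in for the cast() result: name columns come out alphabetically
df_sorted <- data.frame(
  date = as.Date(c("2012-03-30", "2012-03-29")),
  abc  = c(5.550737, 4.877621),
  lmn  = c(10.11885, 10.18570),
  xyz  = c(0.065550707, 0.001825007)
)

# Names in the order they first appear in the long data (as in dat1$name)
long_names <- c("xyz", "xyz", "abc", "abc", "lmn", "lmn")
name_order <- unique(long_names)

# Reorder the columns by name, keeping date first
df_sorted <- df_sorted[, c("date", name_order)]
names(df_sorted)   # "date" "xyz" "abc" "lmn"
```

Another route that may work is to turn name into a factor with
levels = unique(name) before melting, so that the casting step sees the
desired order; that variant is not shown or tested here.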




Re: [R] How to have original (name) order after melt and cast command

2012-07-18 Thread Vincy Pyne
Dear Mr Rui Barradas,

Thanks a lot for your wonderful suggestion. It worked and will help me 
immensely in future too. Really heartfelt thanks once again.

Vincy

--- On Wed, 7/18/12, Rui Barradas ruipbarra...@sapo.pt wrote:

From: Rui Barradas ruipbarra...@sapo.pt
Subject: Re: [R] How to have original (name) order after melt and cast command
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Wednesday, July 18, 2012, 11:18 AM

Hello,

Try the following.

# This is your code
df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]

# This is my code
nams <- as.character(unique(dat1$name))
nums <- sapply(nams, function(nm) which(names(df_sorted) %in% nm))
df_sorted[, sort(nums)] <- df_sorted[, nams]
names(df_sorted)[sort(nums)] <- nams
df_sorted


Hope this helps,

Rui Barradas



Re: [R] How to use Sys.time() while writing a csv file name

2012-07-04 Thread Vincy Pyne

Dear Mr Newmiller and Mr Oettli,

Thanks a lot for your valuable guidance. Task is done. Thanks again.

Regards

Vincy


--- On Wed, 7/4/12, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

From: Jeff Newmiller jdnew...@dcn.davis.ca.us
Subject: Re: [R] How to use Sys.time() while writing a csv file name
To: Vincy Pyne vincy_p...@yahoo.ca, r-help@r-project.org
Received: Wednesday, July 4, 2012, 5:38 AM

You forgot to follow the posting guide and tell us what operating system you 
are using (sessionInfo), but I am going to guess that you are on Windows where 
the colon (:) is an illegal symbol in filenames. Try formatting the time 
explicitly in the conversion to character using the format string definitions 
found in ?strptime in a format that doesn't include colons.
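
A minimal sketch of that suggestion, assuming a time stamp with hyphens and an
underscore is acceptable in the file name. The format string and the use of
tempdir() are choices made for this example only.

```r
rr <- rbeta(25, 6.14, 8.12)

# Format the time stamp without colons or spaces
stamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
file_name <- paste("recovery_rates_at_", stamp, ".csv", sep = "")

# tempdir() is used here only to keep the sketch self-contained
out_path <- file.path(tempdir(), file_name)
write.csv(data.frame(recovery_rates = rr), file = out_path, row.names = FALSE)
```

The stamp has one-second resolution, so two files written within the same
second would still share a name.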
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.





[R] How to use Sys.time() while writing a csv file name

2012-07-03 Thread Vincy Pyne
Dear R helpers,

I am using Beta distribution to generate the random no.s (recovery rates in my 
example). However, each time I need to save these random no.s in a csv format. 
To distinguish different csv files, one way I thought was use of Sys.time in 
the file name. My code is as follows -

# My code

rr = rbeta(25, 6.14, 8.12)

lgd = 1 - mean(rr)

write.csv(data.frame(recovery_rates = rr), file = paste("recovery_rates_at_", 
Sys.time(), ".csv", sep = ""), row.names = FALSE)


However, I get the following error -

Error in file(file, ifelse(append, "a", "w")) :   cannot open the connection
In addition: Warning message: In file(file, ifelse(append, "a", "w")) :
cannot open file 'recovery_rates_at_2012-07-04 1:14:05.csv': Invalid argument


If instead of Sys.time, I use some other variable e.g. lgd as 

write.csv(data.frame(recovery_rates = rr), paste('rates_',lgd,'.csv', sep = 
""), row.names = FALSE)

I am able to store these simulated recovery rates in different files. But I 
need to use Sys.time in my csv file name. (or is there any other way of writing 
these csv files so that files don't get over-written).

Kindly guide.

Regards and thanking in advance

Vincy




[R] What's wrong with MEAN?

2012-05-22 Thread Vincy Pyne

Dear R helpers,

I have recently installed R version 2.15.0

I just wanted to calculate 

mean(16, 18)

Surprisingly I got answer as 

> mean(16, 18)
[1] 16

> mean(18, 16)
[1] 18

> mean(14, 11, 17, 9, 5, 18)
[1] 14


So instead of calculating the simple arithmetic average, the mean command is 
returning the first element as the average. I restarted the machine, changed 
the machine, but the result is still the same. I have been using this mean 
function ever since I started learning R, but this has never happened.

Kindly guide

Vincy






Re: [R] What's wrong with MEAN?

2012-05-22 Thread Vincy Pyne
Dear Mr. Thierry,

Thanks a lot for pointing out such a silly mistake from my side. I was simply 
wondering how come I am not getting such a simple mean. 

Thanks again.

Vincy

--- On Tue, 5/22/12, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:

From: ONKELINX, Thierry thierry.onkel...@inbo.be
Subject: RE: [R] What's wrong with MEAN?
To: Vincy Pyne vincy_p...@yahoo.ca, r-help@r-project.org 
r-help@r-project.org
Received: Tuesday, May 22, 2012, 9:17 AM

You'll need to pass the data as a vector.

mean(16, 18) is asking the mean of 16. 18 is passed to the second argument 
which is trim. So you are doing mean(16, trim = 18)

What you want is

mean(c(16, 18))
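
A short sketch of the difference, using only base R. Per ?mean, trim is the
fraction of observations dropped from each end, and values outside 0 to 0.5
are taken as the nearest endpoint, so with a single observation the call
simply returns that observation. The variable names are made up for this
example.

```r
# A single numeric vector: the arithmetic mean of both values
m_vector <- mean(c(16, 18))

# Two separate arguments: x = 16, and 18 is matched to trim
m_args <- mean(16, 18)
m_trim <- mean(16, trim = 18)

m_vector   # 17
m_args     # 16
```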

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey




[R] Multiple Conditional Statement

2012-04-24 Thread Vincy Pyne
Dear R helpers,

I have two separate data frames. In one data frame the transaction data is 
stored and the other data frame has exchange rates stored say rate_A and rate_B 
where rate_A and rate_B are series of rates. 

rate_A and rate_B are properly defined and I am reading them through the 
appropriate dataframe. (Actually I have different datasets; to keep things 
simple, I am describing them as above.)

I have BUY or SELL transaction (defined under the column head Type in 
transactions dataframe) and depending on the type of transaction, I need to 
define the rates. 

So if the type is BUY, rate_1 = rate_A and rate_2 = rate_B and if the type is 
SELL, rate_1 = rate_B and rate_2 = rate_A.

To begin with I have only one transaction in my data frame (I am not aware if 
it is BUY or SELL transaction)


Thus, I tried


if(Type == "Buy") 

{rate_1 = rate_A & rate_2 = rate_B} else {rate_1 = rate_B & rate_2 = rate_A}


I get the following error

Error in rate_A & rate_2 = rate_B : 
  could not find function "&<-"
    
How do I define multiple conditional statements?

Kindly guide.

Vincy     
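
A minimal sketch for the single-transaction case described above. Inside
braces, statements are separated by new lines (or semicolons); logical
operators such as & combine conditions, not statements. The rate values and
the value of Type below are hypothetical stand-ins.

```r
# Hypothetical stand-ins for the rate series and the transaction type
rate_A <- c(1.10, 1.12, 1.15)
rate_B <- c(0.90, 0.92, 0.95)
Type   <- "BUY"

# Several assignments in one branch: one per line (or separated by ";")
if (Type == "BUY") {
  rate_1 <- rate_A
  rate_2 <- rate_B
} else {
  rate_1 <- rate_B
  rate_2 <- rate_A
}
```

Note that if() looks at a single condition, so with many transactions in the
data frame this pattern would have to be applied per transaction.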
    



[R] Matrix multiplication by multple constants

2012-04-20 Thread Vincy Pyne
Dear R helpers

Suppose 

x <- c(1:3)

y <- matrix(1:12, ncol = 3, nrow = 4)

> y
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

I wish to multiply the 1st column of y by the first element of x i.e. 1, the 
2nd column of y by the 2nd element of x i.e. 2 and so on. Thus the resultant 
matrix should be like

> z
     [,1] [,2] [,3]
[1,]    1   10   27
[2,]    2   12   30
[3,]    3   14   33
[4,]    4   16   36


When I tried simple multiplication like x*y, y is getting multiplied 
column-wise 

> x*z
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    4   12   20
[3,]    9   21   33
[4,]   16   32   48


Kindly guide

Regards

Vincy



Re: [R] Matrix multiplication by multple constants

2012-04-20 Thread Vincy Pyne
Dear Mr. Dimitris Rizopoulos,

Thanks a lot for your great help. It worked nicely. I couldn't have figured it 
out. Thanks again.

Regards

Vincy

--- On Fri, 4/20/12, Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote:

From: Dimitris Rizopoulos d.rizopou...@erasmusmc.nl
Subject: Re: [R] Matrix multiplication by multple constants
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, April 20, 2012, 8:57 AM

try this:

x <- 1:3
y <- matrix(1:12, ncol = 3, nrow = 4)

y * rep(x, each = nrow(y))
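
Why this works, with two equivalent alternatives as a sketch: R stores a
matrix column by column and recycles a shorter vector along that order, so
repeating each element of x nrow(y) times lines every multiplier up with one
whole column. The names z_rep, z_sweep and z_diag are made up for this
example.

```r
x <- 1:3
y <- matrix(1:12, ncol = 3, nrow = 4)

# Repeat each element of x down a whole column, then multiply element-wise
z_rep <- y * rep(x, each = nrow(y))

# Same result with sweep(): MARGIN = 2 applies x across the columns
z_sweep <- sweep(y, 2, x, "*")

# Same result as a matrix product with a diagonal matrix
z_diag <- y %*% diag(x)

z_rep
```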


I hope it helps.

Best,
Dimitris



-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/



[R] Constructing a data.frame from csv files

2012-01-11 Thread Vincy Pyne
Dear R helpers,

Following is my R code where I am trying to calculate returns and then trying 
to create a data.frame. Since, I am not aware how many instruments I will be 
dealing so I have constructed a function. My R code is as follows -

library(plyr)

mydata <- data.frame(instru_name = 
c("instru_A","instru_A","instru_A","instru_A","instru_A","instru_A","instru_A",
"instru_A","instru_A","instru_A","instru_A","instru_A","instru_A","instru_A",
"instru_B","instru_B","instru_B","instru_B","instru_B","instru_B","instru_B",
"instru_B","instru_B","instru_B","instru_B","instru_B","instru_B","instru_B"),
 date = c("10-Jan-12","9-Jan-12","8-Jan-12","7-Jan-12", 
"6-Jan-12","5-Jan-12","4-Jan-12","3-Jan-12","2-Jan-12","1-Jan-12","31-Dec-11", 
"30-Dec-11","29-Dec-11","28-Dec-11","10-Jan-12","9-Jan-12","8-Jan-12", 
"7-Jan-12","6-Jan-12","5-Jan-12","4-Jan-12","3-Jan-12","2-Jan-12","1-Jan-12",
"31-Dec-11","30-Dec-11","29-Dec-11","28-Dec-11"),
 price = c(11.9,10.5,13,14.5,14.4,14.8,10.1,12,14.3, 
10.7,11.2,10.2,10.2,10.8,41.9,40.5,43,44.5,44.4,48.8,42.1,44,46.3,48.7,46.2,44.2,42.2,40.8))

attach(mydata)

opt_return_volatilty = function(price, instru_name)

{

price_returns = matrix(data = NA, nrow = (length(price)-1), ncol = 1)
    for (i in(1:(length(price)-1)))
    {
    price_returns[i] = log(price[i]/price[i+1])
    }
volatility = sd(price_returns)
entity_returns = unique(instru_name)
colnames(price_returns) = entity_returns

write.csv(price_returns, file = paste(entity_returns, ".csv", sep = ""), 
row.names = FALSE)

return(data.frame(list(volatility = volatility)))

}

entity_volatility - ddply(.data=mydata, .variables = instru_name,
    .fun=function(x) opt_return_volatilty(price = x$price, 
instru_name = x$instru_name))


 entity_volatility
  instru_name volatility
1    instru_A 0.17746897
2    instru_B 0.06565341                
                
                
fileNames - list.files(pattern = instru.*.csv)

 fileNames
[1] instru_A.csv instru_B.csv

# 
_

# MY QUERY

# I need to construct a data frame consisting of all the returns, i.e. I need 
# to have a data.frame like 

    instru_A      instru_B
 0.125163143   0.033983853
-0.2135741    -0.059898142
-0.109199292  -0.034289073
 0.006920443   0.00224972
-0.027398974  -0.094490843

I am using the following code:

input <- do.call(rbind, lapply(fileNames, function(.name)
    {
    .data <- read.csv(.name, header = TRUE, as.is = TRUE)
    .data$file <- .name
    .data
    }))

# I get the following error.

Error in match.names(clabs, names(xi)) : 
  names do not match previous names


Kindly guide

Regards

Vincy

 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constructing a data.frame from csv files

2012-01-11 Thread Vincy Pyne
Dear Sir,
Thanks a lot for your guidance. I have understood my mistake. It was the line 
naming the columns, viz. colnames(price_returns) = entity_returns, that was 
creating the problem. The code runs fine once I got rid of this particular 
line. I will use melt from reshape etc. to get the required data.frame.

Thanks again.
With warm regards
Vincy
--- On Wed, 1/11/12, jim holtman jholt...@gmail.com wrote:

From: jim holtman jholt...@gmail.com
Subject: Re: [R] Constructing a data.frame from csv files
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Wednesday, January 11, 2012, 1:49 PM

The error message says it all:  the dataframes that you are creating,
and then trying to 'rbind', do not have the same columns.  You need to
at least show what the first couple of lines of each of your input
files are, or output the names of the columns as you are reading the
files.  This is some elementary debugging that you will have to learn.
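
The debugging step described above can be sketched as follows. This is only a minimal illustration, assuming two small CSV files written to a temporary directory; the file contents and the helper name read_returns are made up for the example and are not taken from the original post. Printing the column names of each file shows why rbind stops, while cbind gives the side-by-side layout asked for (provided every file has the same number of rows).

```r
# Hypothetical stand-ins for instru_A.csv and instru_B.csv (values are made up).
tmp <- tempfile("returns_")
dir.create(tmp)
write.csv(data.frame(instru_A = c(0.125, -0.214, -0.109)),
          file.path(tmp, "instru_A.csv"), row.names = FALSE)
write.csv(data.frame(instru_B = c(0.034, -0.060, -0.034)),
          file.path(tmp, "instru_B.csv"), row.names = FALSE)

fileNames <- list.files(tmp, pattern = "^instru.*\\.csv$", full.names = TRUE)

# read_returns is an illustrative helper, not part of any package.
read_returns <- function(path) read.csv(path, header = TRUE, as.is = TRUE)
pieces <- lapply(fileNames, read_returns)

# Elementary debugging: look at the column names of each piece.
col_names <- lapply(pieces, names)
print(col_names)

# The names differ between files, so rbind() stops with an error ...
rbind_result <- try(do.call(rbind, pieces), silent = TRUE)

# ... whereas cbind() places the return series next to each other.
all_returns <- do.call(cbind, pieces)
print(all_returns)
```

If a long layout is preferred instead, giving every file the same column name before calling rbind would be one alternative.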

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



[R] KS and AD test for Generalized Pareto and Generalized Extreme Value

2012-01-04 Thread Vincy Pyne
Dear R helpers,

I need to use the KS and AD tests for the Generalized Pareto and Generalized 
Extreme Value distributions.

E.g. if I need to use KS for Weibull, I have the syntax

ks.test(x.wei, "pweibull", shape=2, scale=1)

Similarly, for AD I use

ad.test(x, distr.fun, ...)

My problem is that, for the given data, I have estimated the parameters of GPD 
and GEV using lmom, but I am not able to find out the distribution name I 
should use for these distributions if I wish to use these tests. 

E.g. for gamma, I can use pgamma etc. What distribution name should I use for 
GPD and GEV, and for that matter where can I find the distribution names I can 
use for the KS and AD tests?
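
A possible way to look at the question: ks.test does not need a built-in distribution name, it accepts any function that computes a cumulative distribution function. The sketch below is an assumption-laden illustration in base R only. The functions pgpd_sketch and qgpd_sketch are hand-written for this example, using the common (loc, scale, shape) parametrisation of the Generalized Pareto distribution, which may differ in sign convention from the one used by lmom; a GEV version would follow the same pattern with its own CDF. The lmom package documents its own CDF functions, and those would need to be checked against its parameter convention.

```r
# Hand-written GPD distribution function (illustrative, not from any package).
# F(q) = 1 - (1 + shape * z)^(-1/shape), z = (q - loc)/scale, for z >= 0.
pgpd_sketch <- function(q, loc = 0, scale = 1, shape = 0) {
  z <- pmax((q - loc) / scale, 0)
  if (abs(shape) < 1e-12) return(1 - exp(-z))   # exponential limit
  1 - pmax(1 + shape * z, 0)^(-1 / shape)
}

# Matching quantile function, used here only to simulate test data.
qgpd_sketch <- function(p, loc = 0, scale = 1, shape = 0) {
  if (abs(shape) < 1e-12) return(loc - scale * log(1 - p))
  loc + scale * ((1 - p)^(-shape) - 1) / shape
}

set.seed(1)
x <- qgpd_sketch(runif(200), loc = 0, scale = 2, shape = 0.2)

# Pass the function itself; extra arguments go to the CDF.
res <- ks.test(x, pgpd_sketch, loc = 0, scale = 2, shape = 0.2)
print(res)
```

Note that the help page for ks.test states that the parameters must be specified in advance and not estimated from the same data, so p-values obtained with fitted parameters should be treated with caution.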

Thanks in advance

Regards

Vincy



[R] Matching two datasets and updating values

2011-10-04 Thread Vincy Pyne
Dear R forum

I have two dataframes, with category and cat_val forming one dataframe and cust 
and cust_category forming another.

category = c("C", "D", "B", "A")
cat_val = c(0.10, 0.25, 0.40, 0.54)
cust = c("cust_1", "cust_2", "cust_3", "cust_4", "cust_5", "cust_6", "cust_7", 
"cust_8", "cust_9", "cust_10")
cust_category = c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

Thus, I have 

> category
[1] "C" "D" "B" "A"

> cat_val
[1] 0.10 0.25 0.40 0.54

> cust
 [1] "cust_1"  "cust_2"  "cust_3"  "cust_4"  "cust_5" 
 [6] "cust_6"  "cust_7"  "cust_8"  "cust_9"  "cust_10"

> cust_category
 [1] "C" "A" "A" "A" "A" "C" "D" "B" "B" "D"

My problem is to match 'cust_category' with 'category' and accordingly select 
the value assigned to that category. In other words, the 1st element of 
cust_category is C, so it should select the value 0.10; the second element is 
A, so it should assign the value 0.54 against it. So effectively I should get

cust      cust_category  cat_val
cust_1    C              0.10
cust_2    A              0.54
cust_3    A              0.54

cust_10   D              0.25
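
One way the lookup described above could be done is with match(), which returns, for each customer category, its position in the category vector. This is a minimal sketch using the vectors from the post; the name result is introduced here only for illustration.

```r
category      <- c("C", "D", "B", "A")
cat_val       <- c(0.10, 0.25, 0.40, 0.54)
cust          <- paste("cust", 1:10, sep = "_")
cust_category <- c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

# Position of each customer's category within 'category'
idx <- match(cust_category, category)

# Pick the matching value and assemble the requested table
result <- data.frame(cust, cust_category, cat_val = cat_val[idx],
                     stringsAsFactors = FALSE)
print(result)
```

merge() on two data frames would be an alternative, although it sorts the result by the key column by default, so the customer order would need restoring afterwards.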


Kindly guide

Regards

Vincy




[R] Question regarding dnorm()

2011-09-14 Thread Vincy Pyne
Hi,

I have one basic doubt. Suppose X ~ N(50,10).

I need to calculate the probability that X = 50.

dnorm(50, 50, 10) gives me
[1] 0.03989423

My understanding (which is a bit statistical, or maybe mathematical) is that 
on a continuous scale, probabilities of the type P(X = .) are nothing but 
1/Infinity, i.e. 0. So as per my understanding P(X = 50) should be 0, but even 
Excel gives 0.03989422. Obviously my understanding is wrong. If I put 
value of x = 0 in the normal density function, I do get 0.03989422.

My confusion is that, on the continuous scale, if the probability P(X = x) 
doesn't make sense, 0.03989423 is still too significant to neglect.

Please clarify

Regards

Vincy




Re: [R] Question regarding dnorm()

2011-09-14 Thread Vincy Pyne
Dear Sirs,

Thanks a lot for your explanation. This was such a huge conceptual error on 
my part. I never realized that probability and density are two different 
things. I used to feel I had started understanding stats a bit. This 
explanation has changed everything again. Thanks a lot again Mr Ellison and 
Mr Mark for your guidance.

Regards

Vincy

--- On Wed, 9/14/11, S Ellison s.elli...@lgcgroup.com wrote:

From: S Ellison s.elli...@lgcgroup.com
Subject: RE: [R] Question regarding dnorm()
To: Vincy Pyne vincy_p...@yahoo.ca, r-help@r-project.org 
r-help@r-project.org
Received: Wednesday, September 14, 2011, 11:37 AM

You have calculated density, not probability.

Probability is in [0,1]; density is in [0,Inf)  

And for a continuous variable, density cannot be interpreted as a probability 
or a frequency.
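
The distinction can be illustrated with a short sketch, assuming (as dnorm does) that the 10 in N(50,10) is the standard deviation. The variable names are chosen here for illustration. Probabilities come from differences of the distribution function pnorm over an interval; the density from dnorm is a height, which can even exceed 1.

```r
mu <- 50
sigma <- 10

# Height of the density curve at x = 50 (not a probability)
dens_at_50 <- dnorm(50, mean = mu, sd = sigma)

# Probability of an interval around 50: area under the curve
prob_interval <- pnorm(50.5, mu, sigma) - pnorm(49.5, mu, sigma)

# Probability of the single point 50: an interval of zero width
prob_point <- pnorm(50, mu, sigma) - pnorm(50, mu, sigma)

# A density can be larger than 1 when the spread is small
narrow_density <- dnorm(0, mean = 0, sd = 0.1)

# The area under the density over (almost) its whole range is 1
total_area <- integrate(dnorm, mu - 8 * sigma, mu + 8 * sigma,
                        mean = mu, sd = sigma)$value

print(c(dens_at_50 = dens_at_50, prob_interval = prob_interval,
        prob_point = prob_point, narrow_density = narrow_density,
        total_area = total_area))
```

For an interval of width 1 the probability is close to the density value, which is why the density looks like a plausible probability here even though it is not one.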

S




[R] Autocorrelation using acf

2011-08-25 Thread Vincy Pyne
Dear R list

As suggested by Prof Brian Ripley, I have tried to read the acf literature. 
The main problem is that I am not a statistician and hence have some trouble 
understanding the concepts immediately. I came across one document 
(http://www.stat.nus.edu.sg/~staxyc/REG32.pdf) on auto-correlation giving the 
methodology. As per that document, the auto-correlation is arrived at as 
follows.

y = 
c(15.91,9.80,17.16,16.68,15.53,22.66,31.01,8.62,45.82,10.97,45.46,28.69,36.75,37.75,
 41.18,42.67,46.05, 43.70,53.08,47.56)

t = c(1:20) # defining time variable.

Fitting y = a + bt + e, I get the estimates of a and b as a = 9.12 and b = 
2.07. So using these estimates I obtain

y_fit = 
c(11.19,13.26,15.33,17.40,19.47,21.54,23.61,25.68,27.75,29.82,31.89,33.96, 
36.03,38.10, 40.17,42.24,44.31,46.38,48.45,50.52)  # these are fitted values.


e_t = (y - y_fit)   # difference between the observed y and the corresponding fitted value

> e_t
 [1]   4.72  -3.46   1.83  -0.72  -3.94   1.12   7.40
 [8] -17.06  18.07 -18.85  13.57  -5.27   0.72  -0.35
[15]   1.01   0.43   1.74  -2.68   4.63  -2.96

# We define 

e_t1 = 
c(-3.46,1.83,-0.72,-3.94,1.12,7.40,-17.06,18.07,-18.85,13.57,-5.27,0.72,-0.35,1.01,
 0.43,1.74,-2.68,4.63,-2.96)   # 1 st element of e_t deleted

e_t2 = 
c(4.72,-3.46,1.83,-0.72,-3.94,1.12,7.40,-17.06,18.07,-18.85,13.57,-5.27,0.72,-0.35,
 1.01, 0.43,1.74,-2.68,4.63)     # Original series with last element deleted


> cor(e_t1, e_t2)
[1] -0.8732316


However, if I use 

acf(y, 1)

Autocorrelations of series ‘y’, by lag

    0     1 
1.000 0.343 

I am simply not able to figure out how acf is used? 
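
A note on comparing like with like: the cor() calculation above works on the residuals e_t, whereas acf(y, 1) works on the raw series y, so the two numbers would not be expected to agree in any case. The following is a hedged sketch of how the residual series could be passed to acf; the names time_index, fit, r1_resid and r1_raw are introduced here for illustration only.

```r
y <- c(15.91, 9.80, 17.16, 16.68, 15.53, 22.66, 31.01, 8.62, 45.82, 10.97,
       45.46, 28.69, 36.75, 37.75, 41.18, 42.67, 46.05, 43.70, 53.08, 47.56)
time_index <- 1:20

# Fit y = a + b*t + e by least squares and extract the residuals
fit <- lm(y ~ time_index)
print(round(coef(fit), 2))
e_t <- residuals(fit)

# Lag-1 autocorrelation of the residuals versus that of the raw series
r1_resid <- acf(e_t, lag.max = 1, plot = FALSE)$acf[2]
r1_raw   <- acf(y,   lag.max = 1, plot = FALSE)$acf[2]
print(c(residuals = r1_resid, raw_series = r1_raw))
```

Even on the residuals, the value from acf differs somewhat from cor() on the two shifted vectors, because acf uses the mean and variance of the whole series.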

Thanking you in advance.

Regards

Vincy

--- On Wed, 8/24/11, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

From: Prof Brian Ripley rip...@stats.ox.ac.uk
Subject: Re: [R] Autocorrelation using library(tseries)
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Wednesday, August 24, 2011, 9:08 AM

Your understanding is wrong.  For a start, there is no function acf() in 
package tseries: it is in stats.

And the autocorrelation at lag one is not the correlation omitting the first 
and last values: it uses the mean and variance estimated from the whole series 
and divisor n.
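
The definition described above can be written out by hand as a small sketch, using the series X from the original question quoted further down. The names m, r1_manual and r1_acf are illustrative. The lag-1 autocovariance and the variance both use the overall mean and the same divisor n, so n cancels in the ratio.

```r
X <- c(44, 41, 46, 49, 49, 50, 40, 44, 49, 41)
n <- length(X)
m <- mean(X)              # mean of the whole series

# Lag-1 autocovariance and lag-0 autocovariance, both with divisor n
c1 <- sum((X[-1] - m) * (X[-n] - m)) / n
c0 <- sum((X - m)^2) / n

r1_manual <- c1 / c0
r1_acf <- acf(X, lag.max = 1, plot = FALSE)$acf[2]

print(round(c(manual = r1_manual, acf = r1_acf), 3))
```

By contrast, cor() on the two shortened vectors re-estimates a separate mean and variance for each of them, which is why it gives a slightly different number.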

Have you looked at the reference given on ?acf ?  As the help says

     (This contains the exact definitions used.)

Neither the R help pages nor R-help are intended as tutorials in statistics.

On Wed, 24 Aug 2011, Vincy Pyne wrote:

> Dear R list
>
> I am trying to understand the auto-correlation concept. Auto-correlation is 
> the self-correlation of random variable X with a certain time lag of say t.
>
> The article 
> http://www.mit.tut.fi/MIT-3010/luentokalvot/lk10-11/MDA_lecture16_11.pdf 
> (Page no. 9 and 10) gives the methodology as under.

But that is not the definitive reference, and no, it doesn't (and what it does 
give is not the conventional definition in the time series literature).

> Suppose you have a time series observations as say
>
> X = c(44,41,46,49,49,50,40,44,49,41)
>
> # For autocorrelation with time lag of 1 we define
>
> A = c(41,46,49,49,50,40,44,49,41)  # first element of X not considered
> B = c(44,41,46,49,49,50,40,44,49)  # Last element of X not considered
>
> cor(A,B)
> [1] -0.02581234
>
> However, if I try the acf command using library tseries I get
>
> acf(X, 1)
>
> Autocorrelations of series 'X', by lag
>
>      0      1
>  1.000 -0.019
>
> So by usual correlation command (where same random variable X is converted 
> into two series with a time lag of 1), I obtain auto-correlation as 
> -0.02581234 and by acf command I get auto-correlation = -0.019 (for time lag 
> of 1).
>
> I am not able to figure out where I am going wrong or is it my understanding 
> of auto-correlation procedure is wrong?
>
> Will be grateful if someone guides .
>
> Vincy
 
 

-- Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,         
    Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Autocorrelation using library(tseries)

2011-08-24 Thread Vincy Pyne
Dear R list

I am trying to understand the auto-correlation concept. Auto-correlation is the 
self-correlation of random variable X with a certain time lag of say t.

The article 
http://www.mit.tut.fi/MIT-3010/luentokalvot/lk10-11/MDA_lecture16_11.pdf 
(Page no. 9 and 10) gives the methodology as under. 

Suppose you have a time series observations as say

X = c(44,41,46,49,49,50,40,44,49,41) 

# For autocorrelation with time lag of 1 we define 

A = c(41,46,49,49,50,40,44,49,41)  # first element of X not considered
B = c(44,41,46,49,49,50,40,44,49) # Last element of X not considered

> cor(A,B)
[1] -0.02581234

However, if I try the acf command using library tseries I get

> acf(X, 1)

Autocorrelations of series ‘X’, by lag

     0      1 
 1.000 -0.019 

So by usual correlation command (where same random variable X is converted into 
two series with a time lag of 1), I obtain auto-correlation as -0.02581234 and 
by acf command I get auto-correlation = -0.019 (for time lag of 1).

I am not able to figure out where I am going wrong, or whether my 
understanding of the auto-correlation procedure is wrong.

I will be grateful if someone can guide me.

Vincy





[R] Correlation discrepancy

2011-08-23 Thread Vincy Pyne
Dear R list, I have one very elementary question regarding correlation between 
two variables. 

x = c(44,46,46,47,45,43,45,44)
y = c(44,43,41,41,46,48,44,43)

> cov(x, y)
[1] -2.428571

However, if I try to calculate the covariance using the formula

covariance = sum((x-mean(x))*(y-mean(y)))/8   # number of paired obs. = 8

or 

covariance = sum(x*y)/8-(mean(x)*mean(y))

it gives

covariance = 2.125

I am not able to figure out where I am going wrong w.r.t. the covariance 
formula. Kindly guide.

Regards

Vincy


Re: [R] Correlation discrepancy

2011-08-23 Thread Vincy Pyne
Dear Mr. Dimitris and Mr Harding, thanks a lot for your guidance. It will be 
interesting to find out how Excel deals with this formula. I will try it. 
Thanks again.

Regards

Ashok

--- On Tue, 8/23/11, ted.hard...@wlandres.net ted.hard...@wlandres.net wrote:

From: ted.hard...@wlandres.net ted.hard...@wlandres.net
Subject: Re: [R] Correlation discrepancy
To: r-help@r-project.org
Cc: Vincy Pyne vincy_p...@yahoo.ca
Received: Tuesday, August 23, 2011, 11:38 AM

In addition, something has gone wrong, Vincy, with your data x,y
between evaluating cov(x,y) and evaluating your explicit formula.

If I repeat your commands:

  x = c(44,46,46,47,45,43,45,44)
  y = c(44,43,41,41,46,48,44,43)
  cov(x, y)
  # [1] -2.428571

 
  sum((x-mean(x))*(y-mean(y)))/8
  # [1] -2.125

which has the right sign and, when changed to incorporate the
correct denominator (n-1 = 7) as suggested by Dimitris:

  sum((x-mean(x))*(y-mean(y)))/7
  # [1] -2.428571

gives exact agreement. With regard to your second formula, this
should correspondingly be:

  sum(x*y)/7 - (mean(x)*mean(y))*8/7
  # [1] -2.428571
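
The two divisors discussed above can be put side by side in a short sketch. The data are those from the thread; the names cov_n, cov_n1 and cov_short are introduced here for illustration.

```r
x <- c(44, 46, 46, 47, 45, 43, 45, 44)
y <- c(44, 43, 41, 41, 46, 48, 44, 43)
n <- length(x)

# Sum of cross-products of deviations from the means
sxy <- sum((x - mean(x)) * (y - mean(y)))

cov_n  <- sxy / n        # divisor n
cov_n1 <- sxy / (n - 1)  # divisor n - 1, as used by cov()

# Shortcut formula, adjusted to the n - 1 divisor
cov_short <- sum(x * y) / (n - 1) - mean(x) * mean(y) * n / (n - 1)

print(c(divisor_n = cov_n, divisor_n_minus_1 = cov_n1,
        shortcut = cov_short, cov_function = cov(x, y)))
```

Multiplying the divisor-n value by n/(n-1) converts it to the value returned by cov().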

again agreeing exactly. Your result:

> covariance = sum((x-mean(x))*(y-mean(y)))/8   # no of paired obs. = 8
>
> or
>
> covariance = sum(x*y)/8-(mean(x)*mean(y))
>
> gives
>
> covariance = 2.125

agrees in numerical magnitude with the 1/8 form, but has
the wrong sign. Or maybe you simply mis-typed -2.125 as 2.125.

Hoping this helps,
Ted.

On 23-Aug-11 11:25:15, Dimitris Rizopoulos wrote:
 well, you don't have the correct denominator, i.e., n-1,
 with n denoting the sample size. Have a look at the *Details*
 section of the online help file for cov(), and try also
 
 sum((x-mean(x))*(y-mean(y)))/7
 cov(x, y)
 
 
 I hope it helps.
 
 Best,
 Dimitris
 
 
 -- 
 Dimitris Rizopoulos
 Assistant Professor
 Department of Biostatistics
 Erasmus University Medical Center
 
 Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
 Tel: +31/(0)10/7043478
 Fax: +31/(0)10/7043014
 Web: http://www.erasmusmc.nl/biostatistiek/
 


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-11                                       Time: 12:38:36
-- XFMail --



Re: [R] Correlation discrepancy

2011-08-23 Thread Vincy Pyne
Dear Mr Dimitris and Mr Harding, by mistake I typed my colleague's name 
(i.e. Ashok) while thanking you. Please excuse me for that.

Regards

Vincy



[R] Meaning of %%

2011-07-13 Thread Vincy Pyne
Dear r helpers

This may be a very elementary question, but I couldn't figure out what the 
operator %% does.

E.g.

p <- 100
q <- 200

> p%%q
[1] 100

> q%%p
[1] 0

Please guide.

Vincy



Re: [R] Meaning of %%

2011-07-13 Thread Vincy Pyne
Thanks a lot for the guidance. 

Before posting this query, I had tried the following things.

> ?%%
Error: unexpected SPECIAL in "?%%"
> ??%%
Error: unexpected SPECIAL in "??%%"

I also tried searching at search.r-project.org, but no luck.

> help("%%") gives me

Error in file(out, "wt") : cannot open the connection
In addition: Warning message:
In file(out, "wt") :
  cannot open file 'C:\DOCUME~1\LOCALS~1\Temp\RtmpoCnAxB\Rtxt52325f7': No such 
file or directory


Regards

Vincy


--- On Wed, 7/13/11, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:

From: ONKELINX, Thierry thierry.onkel...@inbo.be
Subject: RE: [R] Meaning of %%
To: Vincy Pyne vincy_p...@yahoo.ca, r-help@r-project.org 
r-help@r-project.org
Received: Wednesday, July 13, 2011, 10:13 AM

help("%%")
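
For readers who cannot open the help page, here is a short sketch of what the operator does: x %% y gives the remainder after dividing x by y (modulo), and x %/% y gives the integer quotient. The values p and q are those from the question; the other numbers are made up for illustration.

```r
p <- 100
q <- 200

r_pq <- p %% q     # 100 = 0 * 200 + 100, so the remainder is 100
r_qp <- q %% p     # 200 = 2 * 100 + 0,   so the remainder is 0

r_73 <- 7 %% 3     # remainder 1
d_73 <- 7 %/% 3    # integer quotient 2

# For a negative left operand the result takes the sign of the divisor
r_neg <- -7 %% 3   # 2, because -7 = -3 * 3 + 2

# Quotient and remainder together reconstruct the original number
rebuilt <- (7 %/% 3) * 3 + 7 %% 3

print(c(r_pq, r_qp, r_73, d_73, r_neg, rebuilt))
```

Operators containing special characters need quoting when asking for help, e.g. ?"%%" or help("%%"); both lead to the Arithmetic help page.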


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey




[R] Generalized Logistic and Richards Curve

2011-07-07 Thread Vincy Pyne
Dear R helpers,

I am not a statistician and right now I am struggling with the Richards curve. 
Wikipedia says

(http://en.wikipedia.org/wiki/Generalised_logistic_function)

"The generalized logistic curve or function, also known as Richard's curve is 
a widely-used and flexible sigmoid function for growth modelling, extending the 
well-known logistic curve."

Now I am confused and would like to know whether the Generalized Logistic 
distribution as described in the lmomco package is the same as what Wikipedia 
is describing. In other words, is the Generalized Logistic Function the same 
as the Generalized Logistic distribution?

I do understand there is a separate R package 'richards' for dealing with the 
Richards curve. 

Kindly guide

Vincy


[R] Value of 'pi'

2011-05-30 Thread Vincy Pyne
Dear R helpers,

I have one basic doubt about the value of pi. In school, we learned that 
pi = 22/7 (which is 3.142857). However, if I type pi in R, I get pi = 
3.141593. So which value of pi should be considered?

Regards

Vincy


Re: [R] Value of 'pi'

2011-05-30 Thread Vincy Pyne
That's the beauty of this R forum. It is full of knowledgeable wizards, and 
the replies received, along with the related discussions prompted by a simple 
harmless question like this, enrich us tremendously. Thanks a lot for all your 
comments. I am sticking to the value of 'pi' as provided in R, as I am a 
hardcore R disciple.

Regards

Vincy

--- On Mon, 5/30/11, ted.hard...@wlandres.net ted.hard...@wlandres.net wrote:

From: ted.hard...@wlandres.net ted.hard...@wlandres.net
Subject: Re: [R] Value of 'pi'
To: r-help@r-project.org
Received: Monday, May 30, 2011, 8:52 AM

On 30-May-11 07:06:57, Peter Langfelder wrote:
 On Sun, May 29, 2011 at 11:53 PM,  bill.venab...@csiro.au wrote:
 There is an urban legend that says Indiana passed a law implying
 pi = 3.

 (Because it says so in the bible...)
 
 Apparently the Fortran language has a DATA statement just for this
 purpose. This is allegedly a quote from an early Fortran manual:
 
 The primary purpose of the DATA statement is to give names to
 constants; instead of referring to pi as 3.141592653589793 at
 every appearance, the variable PI can be given that value with
 a DATA statement and used instead of the longer form of the
 constant. This also simplifies modifying the program, should
 the value of pi change.
 
 Peter

My take on this discussion:

Take a nice-looking pie, say 113355, slice it, and put one
half on top of the other. Call it pi:

  pi = 355/113

Compared with pi = 22/7, which is not even pretty, it is
also a much closer approximation to the mathematical ideal:

To 20 decimal places (using 'bc' here)

true pi
= 3.14159265358979323844

355/113
= 3.14159292035398230088

22/7
= 3.14285714285714285714

so 355/113 is good to the 6th decimal place (3.141593),
while 22/7 breaks down at the 3rd (3.143 instead of 3.142).
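
The comparison above can be reproduced in R itself, within the limits of double precision (roughly 15 to 16 significant digits, so fewer than the 20 decimal places shown from 'bc'). The names approximations and abs_error are introduced here for illustration.

```r
approximations <- c("22/7" = 22 / 7, "355/113" = 355 / 113)

# Absolute error of each fraction against R's built-in constant pi
abs_error <- abs(approximations - pi)

print(format(c(pi = pi, approximations), digits = 10))
print(signif(abs_error, 3))

# Rounded to 6 decimals, 355/113 agrees with pi; rounded to 3, 22/7 does not
print(round(c(pi, 355 / 113), 6))
print(round(c(pi, 22 / 7), 3))
```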

In the back of my head is a memory of a passage I read
some 50 years ago. I write a paraphrase, since I don't
recall the exact words:

 For an engineer, assuming that pi = 3.142 will
  probably enable him to build a very satisfactory
  bridge. Assuming that pi = 3.14159265358979323844
  will give the circumference of the Earth's orbit
  to one millionth of a millimetre. For a pure
  mathematician, however, either assumption leads to
  the conclusion that 1 = 0. It is necessary to
  preserve common sense in the application of
  mathematical deduction.

I suspect (from my context at the time) that it may
well have been by J.L. Synge (beautiful writer on
theoretical physics, especially Relativity Theory)
in one of his several writings on Ballistics.

However, the one possibly relevant printed item which
I still have from those days:

K.L. Nielsen and J.L. Synge,
On the motion of a spinning shell
Quarterly of Applied Mathematics, 4(3), Oct 1946,201-226.

discusses a very similar issue, but puts it quite
differently. If my quotation above reminds anyone
of the original, I would be very grateful to learn
of the reference to the source!

With thanks, and Many Happy Approximations to you all!
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 30-May-11                                       Time: 09:52:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] R forum for only Statistics

2011-04-28 Thread Vincy Pyne
Hi!

I wish to know whether there is any R forum meant only for statistics, i.e. one 
where we can clarify our statistics doubts and seek knowledge. I know there are 
many books and internet sites, but the R forum has an altogether different, very 
high standard, and one can learn a lot from it.

Regards

Vincy




[R] Reversing order of vector

2011-03-29 Thread Vincy Pyne
Dear R helpers

Suppose I have a vector as

vect1 = as.character(c("ABC", "XYZ", "LMN", "DEF"))

> vect1
[1] "ABC" "XYZ" "LMN" "DEF"

I want to reverse the order of this vector as

vect2 = c("DEF", "LMN", "XYZ", "ABC")

Kindly guide

Regards

Vincy
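
No reply is recorded in this part of the archive. A minimal sketch of one way to do this with base R follows; rev() is the standard function for reversing a vector, and vect2_alt is only an illustrative name for the index-based equivalent.

```r
vect1 <- c("ABC", "XYZ", "LMN", "DEF")

# rev() returns the elements in reverse order.
vect2 <- rev(vect1)

# Equivalent form using an explicit decreasing index.
vect2_alt <- vect1[length(vect1):1]

print(vect2)
```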








[R] Ordering data.frame based on class

2011-03-28 Thread Vincy Pyne
Dear R helpers

Suppose I have a data.frame as given below -

my_dat = data.frame(class = c("XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "ABC", "ABC", 
"ABC", "ABC", "ABC"), var1 = c(20, 14, 89, 81, 17, 44, 36, 41, 11, 20), var2 
= c(1001, 250, 456, 740, 380, 641, 111, 209, 830, 920))

> my_dat
   class var1 var2
1    XYZ   20 1001
2    XYZ   14  250
3    XYZ   89  456
4    XYZ   81  740
5    XYZ   17  380
6    ABC   44  641
7    ABC   36  111
8    ABC   41  209
9    ABC   11  830
10   ABC   20  920

I wish to sort the above data.frame class-wise on var1. Thus, I need to get

class  var1  var2
XYZ      14     250
XYZ      17     380
XYZ      20     1001
XYZ      81     740
XYZ      89     456
ABC      11     830
ABC      20     920
ABC      36     111
ABC      41     209
ABC      44     641

Kindly guide

Vincy





[R] Resending the mail - Ordering data.frame based on some class

2011-03-28 Thread Vincy Pyne
Dear R helpers

I am resending my mail as the output I desire was not properly visible and I 
apologize for the same.

Suppose I have a data.frame as given below -

my_dat = data.frame(class = c("XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "ABC", "ABC", 
"ABC", "ABC", "ABC"), var1 = c(20, 14, 89, 81, 17, 44, 36, 41, 11, 20), var2 
= c(1001, 250, 456, 740, 380, 641, 111, 209, 830, 920))

> my_dat
   class var1 var2
1    XYZ   20 1001
2    XYZ   14  250
3    XYZ   89  456
4    XYZ   81  740
5    XYZ   17  380
6    ABC   44  641
7    ABC   36  111
8    ABC   41  209
9    ABC   11  830
10   ABC   20  920

I wish to sort the above data.frame class-wise on var1. Thus, I need to get


class  var1  var2
XYZ      14     250
XYZ      17     380
XYZ      20     1001
XYZ      81     740
XYZ      89     456
ABC      11     830
ABC      20     920
ABC      36     111
ABC      41     209
ABC      44     641

Regards

Vincy




Re: [R] Ordering data.frame based on class

2011-03-28 Thread Vincy Pyne
Dear sir,

Thanks for the great solution.

Regards

Vincy

--- On Mon, 3/28/11, Henrique Dallazuanna www...@gmail.com wrote:

From: Henrique Dallazuanna www...@gmail.com
Subject: Re: [R] Ordering data.frame based on class
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Monday, March 28, 2011, 9:02 PM

Try this:

my_dat[order(my_dat$class, -my_dat$var1, decreasing = TRUE),]
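
A self-contained sketch of what this call does on the posted data. It assumes row 10 has var1 = 20, as in the printed table, and the names sorted, class_order and sorted_alt are illustrative only. The factor-based alternative is an addition, not part of the reply.

```r
# Data from the post; row 10 uses var1 = 20, as in the printed table.
my_dat <- data.frame(
  class = rep(c("XYZ", "ABC"), each = 5),
  var1  = c(20, 14, 89, 81, 17, 44, 36, 41, 11, 20),
  var2  = c(1001, 250, 456, 740, 380, 641, 111, 209, 830, 920)
)

# Suggested call: with decreasing = TRUE, "XYZ" sorts before "ABC", and
# negating var1 turns the decreasing sort into an increasing one within a class.
sorted <- my_dat[order(my_dat$class, -my_dat$var1, decreasing = TRUE), ]

# Alternative (an assumption, not from the thread): state the class order
# explicitly through factor levels and sort both keys in increasing order.
class_order <- factor(my_dat$class, levels = c("XYZ", "ABC"))
sorted_alt  <- my_dat[order(class_order, my_dat$var1), ]

print(sorted)
```

The factor version may be easier to read when the desired class order is not simply reverse alphabetical.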






-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O





[R] Appending data to a data.frame and writing a csv

2011-03-25 Thread Vincy Pyne
Dear R helpers

exposure <- data.frame(id = 
c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
ead = c(9483.686,5,6843.4968,10509.37125,21297.8905,5,706152.8354, 
62670.5625, 687.801995,50641.4875,59227.125,43818.5778,52887.72534,601788.7937, 
56813.14859,4012356.056,1419501.179,210853.4743,749961,6599.0862),
pd = c(0.0191,0.0050,0.0298,0.0449,0.0442,0.0479,0.0007,0.0203,0.0431,0.0069, 
0.0122,0.0022,0.0016,0.0082,0.0109,0.0008,0.0142,0.0171,0.0276,0.0178),
lgd = 
c(0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,
 0.45,0.45,0.45,0.45))

param <- data.frame(alpha = 0.99, size = 50)    # size is the number of simulations


n          <- length(exposure$id)
id         <- exposure$id
ead        <- exposure$ead
lgd        <- exposure$lgd
pd         <- exposure$pd 
alpha      <- param$alpha
samplesize <- param$size

## generate random numbers s.t. 1 = Default, 0 = no-default. 

L <- matrix(data=NA, nrow=n, ncol=samplesize, byrow=TRUE)

for(i in 1:n)
    L[i,] <- rbinom(n=samplesize, size=1, prob=exposure$pd[i])

# compute for each simulation

p_loss <- e_loss <- u_loss <- NULL

for(i in 1:samplesize)
{
  
defaulting <- subset(data.frame(id=exposure$id, ead=exposure$ead, 
lgd=exposure$lgd, pd=exposure$pd, loss=L[,i]),
 loss==1)

p_loss[i]  <- sum(defaulting$ead * defaulting$lgd)
e_loss[i]  <- sum(defaulting$ead * defaulting$lgd * defaulting$pd)
u_loss[i]  <- sum(sqrt((defaulting$ead*defaulting$lgd)^2*defaulting$pd - 
(defaulting$ead * defaulting$lgd * defaulting$pd)^2))

sim_data   <- data.frame(sim_no=rep(i,length(defaulting$id)), id=defaulting$id, 
ead=defaulting$ead, lgd=defaulting$lgd, pd=defaulting$pd)

write.csv(sim_data, file='sim_data.csv', append=TRUE, row.names=FALSE)

}

For a given set of 0's and 1's (i.e. for each of the 50 simulations), I first 
filter out all the entries corresponding to 0's, i.e. for a given simulation I 
need to store the ead, lgd and pd pertaining only to the 1's. Thus, for each of 
these 50 simulations, I need to define a data.frame giving me the filtered ead, 
lgd and pd, and in the end write a single file sim_data.csv.

I get the following warnings.

Warning messages:
1: In write.csv(sim_data, file = "sim_data.csv", append = TRUE,  ... :
  attempt to set 'append' ignored
2: In write.csv(sim_data, file = "sim_data.csv", append = TRUE,  ... :
  attempt to set 'append' ignored
.
.
50: In write.csv(sim_data, file = "sim_data.csv", append = TRUE,  ... :
  attempt to set 'append' ignored

Kindly guide

Regards

Vincy






Re: [R] Appending data to a data.frame and writing a csv

2011-03-25 Thread Vincy Pyne
Dear Mr Ista Zahn,

Thanks a lot for your suggestion. I had also realized that the write.csv 
command should be outside the loop. First, I need to construct the data.frame. 

Actually, it is appending to this data.frame that is causing me the problem, not 
writing the csv file. That particular command will be executed outside the loop.

Once the data.frame is generated, writing the csv file outside the loop should 
not be a problem. 

Regards

Vincy

--- On Fri, 3/25/11, Ista Zahn iz...@psych.rochester.edu wrote:

From: Ista Zahn iz...@psych.rochester.edu
Subject: Re: [R] Appending data to a data.frame and writing a csv
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, March 25, 2011, 4:02 PM

Hi Vincy,
Please read the help file, particularly the part about write.csv and
write.csv2 where it says "These wrappers are deliberately inflexible:
they are designed to ensure that the correct conventions are used to
write a valid file. Attempts to change append, col.names, sep, dec or
qmethod are ignored, with a warning."

Use write.table instead.
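
A hedged sketch of two ways to follow this advice. The tiny exposure table, the seed, and the names pieces, file_once and file_append are made up for illustration; the files go to tempfile() paths rather than sim_data.csv. Option 1 collects each simulation's rows in a list and writes once; option 2 appends with write.table(), writing the header only on the first pass.

```r
set.seed(42)  # only so that the sketch is reproducible

# Hypothetical, much smaller stand-in for the 'exposure' data in the post.
exposure <- data.frame(id  = 1:5,
                       ead = c(100, 200, 300, 400, 500),
                       pd  = c(0.9, 0.8, 0.7, 0.6, 0.5),
                       lgd = 0.45)
samplesize <- 3

# Option 1: keep every simulation's rows in a list, bind once, write once.
pieces <- vector("list", samplesize)
for (i in seq_len(samplesize)) {
  defaulted   <- rbinom(nrow(exposure), size = 1, prob = exposure$pd) == 1
  d           <- exposure[defaulted, ]
  pieces[[i]] <- data.frame(sim_no = rep(i, nrow(d)), d)
}
sim_data  <- do.call(rbind, pieces)
file_once <- tempfile(fileext = ".csv")
write.csv(sim_data, file = file_once, row.names = FALSE)

# Option 2: append piece by piece with write.table(), header only once.
file_append <- tempfile(fileext = ".csv")
for (i in seq_len(samplesize)) {
  write.table(pieces[[i]], file = file_append, sep = ",",
              row.names = FALSE, col.names = (i == 1), append = (i > 1))
}
```

Growing a list and binding once also avoids repeatedly copying a data.frame inside the loop.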

Best,
Ista






-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org





[R] Correlation for no of variables

2011-03-21 Thread Vincy Pyne
Dear R helpers,

Suppose I have stock returns data for, say, 1500 companies, each for the last 4 
years. Thus I have a matrix of dimension, say, 1000 * 1500, i.e. 1500 columns 
representing companies and 1000 rows of their returns.

I need to find the correlation matrix of these 1500 companies. 

So I can find the correlation as 

cor(returns) and expect to get a 1500 * 1500 matrix. However, the process takes a 
tremendous time. Is there any way of expediting such a process? In reality, I 
may be dealing with as many as 5000 stocks and may simulate even 10 stock 
returns.



Kindly guide. 

Vincy 







Re: [R] Correlation for no of variables

2011-03-21 Thread Vincy Pyne
Thanks Mr Langfelder,

Definitely I will go through the packages you have suggested. Actually, I will 
be multiplying three matrices of the order (1 X 1500) %*% (1500 X 1500) %*% 
(1500 X 1), giving me one value at the end.
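
The triple product described here is a quadratic form. A small sketch under made-up assumptions follows: the dimensions, the equal weights w and the simulated returns are hypothetical and only stand in for the real 1500-stock data.

```r
set.seed(1)  # reproducibility of the made-up data only

# Hypothetical small example: 250 return observations on 20 stocks.
n_obs    <- 250
n_stocks <- 20
returns  <- matrix(rnorm(n_obs * n_stocks, sd = 0.01), nrow = n_obs)

w    <- rep(1 / n_stocks, n_stocks)   # hypothetical weight vector
corr <- cor(returns)                  # n_stocks x n_stocks

# (1 x n) %*% (n x n) %*% (n x 1) gives a 1 x 1 matrix; drop() makes it a scalar.
q_full <- drop(t(w) %*% corr %*% w)

# Same value without forming t(w): one matrix-vector product and a dot product.
q_fast <- sum(w * (corr %*% w))
```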

I will be starting my process in a couple of days time and in between will 
refer to the packages you have suggested.

Thanks again

Vincy

--- On Mon, 3/21/11, Peter Langfelder peter.langfel...@gmail.com wrote:

From: Peter Langfelder peter.langfel...@gmail.com
Subject: Re: [R] Correlation for no of variables
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Monday, March 21, 2011, 4:50 PM

On Mon, Mar 21, 2011 at 8:34 AM, Vincy Pyne vincy_p...@yahoo.ca wrote:
 Dear R helpers,

 Suppose I have stock returns data of say 1500 companies each for say last 4 
 years. Thus I have a matrix of dimension say 1000 * 1500 i.e. 1500 columns 
 representing companies and 1000 rows of their returns.

 I need to find the correlation matrix of these 1500 companies.

 So I can find out the correlation as

 cor(returns) and expect to get 1500 * 1500 matrix. However, the process takes 
 a tremendous time. Is there any way in expediting such a process. In reality, 
 I may be dealing with lots of even 5000 stocks and may simulate even 10 
 stock returns.


How long is tremendous time?

What platform are you on? If you can compile R against a tuned BLAS
library, stats::cor will run faster IF you do not have any missing
data.

If you do have missing data, you may want to try the package WGCNA
(where we work with bigger correlation matrices) that implements a
correlation calculation that is faster particularly if there are few
missing data. This will also run faster if you do have a tuned BLAS
installed.

HTH,

Peter
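
One way to see where a tuned BLAS can help, sketched under the assumption of no missing data: the correlation matrix equals the cross-product of the standardised columns divided by n - 1, and crossprod() is a matrix multiplication. This is an illustration added here, not the method prescribed in the reply; the matrix size and the name cor_via_crossprod are made up.

```r
set.seed(123)

# Hypothetical returns matrix: 200 observations (rows) on 50 stocks (columns).
returns <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Centre each column and divide by its standard deviation.
z <- scale(returns)

# Cross-product of standardised columns, divided by n - 1.
cor_via_crossprod <- crossprod(z) / (nrow(z) - 1)

# Agrees with cor() up to floating-point error.
print(all.equal(cor_via_crossprod, cor(returns), check.attributes = FALSE))
```

Timing both versions with system.time() on the real data would show whether the difference matters on a given machine.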




 Kindly guide.

 Vincy







[R] One to One Matching multiple vectors

2011-03-16 Thread Vincy Pyne
Dear R helpers

Suppose,

x = c(0,  1,  2,  3)

y = c("A", "B", "C", "D")

z = c(1, 3)

For given values of z, I need to get the corresponding values of y. So I should 
get "B" and "D". 

I tried doing 

y[x][z] but it gives 

> y[x][z]
[1] "A" "C"

Kindly guide.

Regards

Vincy
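
No reply is recorded in this part of the archive. A minimal sketch of one possible approach: y[x][z] goes wrong because x is used as a position index (and index 0 is silently dropped), whereas the values of z need to be looked up in x first. The names y_for_z and y_in_z are illustrative.

```r
x <- c(0, 1, 2, 3)
y <- c("A", "B", "C", "D")
z <- c(1, 3)

# match() gives the positions in x at which each value of z occurs.
y_for_z <- y[match(z, x)]

# Logical alternative: keep the elements of y whose x value appears in z.
y_in_z <- y[x %in% z]

print(y_for_z)
```

match() keeps the order of z, while the %in% form keeps the order of x; here both agree.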





[R] Matching two vectors

2011-03-15 Thread Vincy Pyne
Dear R helpers

Suppose I have a vector as

vect_1 = c("AAA", "AA", "A", "BBB", "BB", "B", "CCC")

vect_1_id = c(1:length(vect_1))

Through some process I obtain

vect_2_id = c(2, 3, 7), then I need a new vector, say vect_2, which will give me

vect_2 = c("AA", "A", "CCC"),  i.e. I need the subset of vect_1 as per vect_2_id.

Thanking in advance

Regards

Vincy
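
No reply is recorded in this part of the archive. Since vect_1_id is simply 1, 2, ..., length(vect_1), plain positional indexing should be enough; the match() variant is a sketch for the case where the ids are not consecutive positions (vect_2_alt is an illustrative name).

```r
vect_1    <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC")
vect_1_id <- seq_along(vect_1)
vect_2_id <- c(2, 3, 7)

# Positional indexing: pick elements 2, 3 and 7.
vect_2 <- vect_1[vect_2_id]

# More general: look the ids up in vect_1_id first.
vect_2_alt <- vect_1[match(vect_2_id, vect_1_id)]

print(vect_2)
```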







[R] Identifying unique pairs

2011-03-12 Thread Vincy Pyne
Dear R helpers

Suppose I have a data frame as given below

mydat = data.frame(x = c(1,1,1, 2, 2, 2, 2, 2, 5, 5, 6), y = c(10, 10, 10, 8, 
8, 8, 7, 7, 2, 2, 4))


> mydat
    x  y
1   1 10
2   1 10
3   1 10
4   2  8
5   2  8
6   2  8
7   2  7
8   2  7
9   5  2
10  5  2
11  6  4

unique(mydat$x) will give me 1, 2, 5, 6, i.e. 4 values, and
unique(mydat$y) will give me 10, 8, 7, 2, 4.

What I need is a data frame where I will get a vector (say) x_new as (1, 2, 2, 
5, 6) and the corresponding y_new as (10, 8, 7, 2, 4). I need to use these two 
vectors, viz. x_new and y_new, separately for further processing. They may be 
under the same data frame, say mydat_new, but I should be able to access them as 
mydat_new$x_new and similarly for y.

I tried the following way.

pp = paste(mydat$x, mydat$y)

> pp
 [1] "1 10" "1 10" "1 10" "2 8"  "2 8"  "2 8"  "2 7"  "2 7"  "5 2"  "5 2"  "6 4" 

qq = unique(pp)

> qq
[1] "1 10" "2 8"  "2 7"  "5 2"  "6 4" 

So I get the desired pairs, but I want each element of the pair in two separate 
columns as

x_new y_new
    1    10
    2     8
    2     7
    5     2
    6     4

Kindly guide

Vincy






Re: [R] Identifying unique pairs

2011-03-12 Thread Vincy Pyne
Thanks sir for your reply. Unfortunately I couldn't figure out the solution.

Vincy

--- On Sat, 3/12/11, Dennis Murphy djmu...@gmail.com wrote:

From: Dennis Murphy djmu...@gmail.com
Subject: Re: [R] Identifying unique pairs
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Saturday, March 12, 2011, 11:45 AM

Hi:

This problem came up the other day - see

http://stats.stackexchange.com/questions/7884/fast-ways-in-r-to-get-the-first-row-of-a-data-frame-grouped-by-an-identifier/7985#7985


Dennis



Re: [R] Identifying unique pairs

2011-03-12 Thread Vincy Pyne
Dear sir, 

Thanks a lot for the solution.

It was such a simple solution, but people like me close their minds and don't 
think of the data.frame as a whole and keep on thinking about vector elements only.

I also almost got the solution when I tried
qq = unique(pp)

(qq <- sub(" .*", "", qq)) but this was giving me only the first element of each 
pair in qq.

So I reversed the order in which I had defined the paste command, saved the 
result under some other name, applied the above command again, and got the 
other element. But I know that is not how good programs are written.

Regards

Vincy



--- On Sat, 3/12/11, Petr Savicky savi...@praha1.ff.cuni.cz wrote:

From: Petr Savicky savi...@praha1.ff.cuni.cz
Subject: Re: [R] Identifying unique pairs
To: r-help@r-project.org
Received: Saturday, March 12, 2011, 2:10 PM

On Sat, Mar 12, 2011 at 03:20:01AM -0800, Vincy Pyne wrote:
 Dear R helpers
 
 Suppose I have a data frame as given below
 
 mydat = data.frame(x = c(1,1,1, 2, 2, 2, 2, 2, 5, 5, 6), y = c(10, 10, 10, 8, 
 8, 8, 7, 7, 2, 2, 4))
 
[...]
 
 unique(mydat$x) will give me 1, 2, 5, 6, i.e. 4 values and
 unique(mydat$y) will give me 10, 8, 7, 2, 4.
 
 What I need is a data frame where I will get a vector (say) x_new as (1, 2, 
 2, 5, 6) and corresponding y_new as (10, 8, 7, 2, 4). I need to use these two 
 vectors viz. x_new and y_new separately for further processing. They may be 
 under same data frame say mydat_new but I should be able to access them as 
 mydat_new$x_new and similarly for y.
 
 I tried following way.
 
 pp = paste(mydat$x, mydat$y)
 
 > pp
  [1] "1 10" "1 10" "1 10" "2 8"  "2 8"  "2 8"  "2 7"  "2 7"  "5 2"  "5 2"  "6 4" 
 
 qq = unique(pp)
 
 > qq
 [1] "1 10" "2 8"  "2 7"  "5 2"  "6 4" 

Hi.

If i understand you correctly, then the solution is easy, since function
unique() can handle also rows of a data frame. Is the following, what
you expect?

  unique(mydat)

     x  y
  1  1 10
  4  2  8
  7  2  7
  9  5  2
  11 6  4
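
To get the column names asked for in the question, the result of unique() can simply be renamed. This is a small sketch; resetting the row names is optional and is shown only to tidy the printout.

```r
mydat <- data.frame(x = c(1, 1, 1, 2, 2, 2, 2, 2, 5, 5, 6),
                    y = c(10, 10, 10, 8, 8, 8, 7, 7, 2, 2, 4))

# unique() on a data frame drops duplicated rows, i.e. duplicated (x, y) pairs.
mydat_new <- unique(mydat)

# Rename the columns and reset the row names.
names(mydat_new)    <- c("x_new", "y_new")
rownames(mydat_new) <- NULL

print(mydat_new$x_new)
print(mydat_new$y_new)
```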

Petr Savicky.



[R] Generation of random numbers in a function - (Return command)

2011-03-11 Thread Vincy Pyne
Dear R helpers

I have following data.frame and for each product_name, I have associated mean 
and standard deviation. I need to generate 1000 random no.s for each of these 
products and find the respective mean and standard deviation.
 
My R code is as follows. 
 

library(plyr)
library(reshape2)
 
filtered_new <- data.frame(product_name = c("P1", "P2", "P3", "P4", "P5"), 
output_avg = c(22.71078,22.16979,21.34420,20.17421,19.83799),
output_stdev = c(23.59924,21.21430,22.01025,18.88877,18.80436))

n <- 100

myfunction_mc = function(product_name, output_avg, output_stdev)

{

product_usage_borrowing_room_mc = rnorm(n, output_avg, output_stdev)

output_avg_mc = mean(product_usage_borrowing_room_mc)
output_stdev_mc = sd(product_usage_borrowing_room_mc)

return(output_avg_mc)

}

result <- dlply(.data = filtered_new, .variables = "product_name", .fun = 
function(x) 
 myfunction_mc(product_name = x$product_name, output_avg = 
x$output_avg, 
                 output_stdev = x$output_stdev))

result1 <- data.frame(result)

result2 <- melt(result1)

result <- data.frame(product = filtered_new$product_name, Monte_Carlo_result = 
result2$value)

And it gives me the desired result. 


# PROBLEM is as given below -

But if, in the return statement of myfunction_mc, I try to add 
'output_stdev_mc', i.e.

myfunction_mc = function(product_name, output_avg, output_stdev)

{

product_usage_borrowing_room_mc = rnorm(n, output_avg, output_stdev)

output_avg_mc = mean(product_usage_borrowing_room_mc)
output_stdev_mc = sd(product_usage_borrowing_room_mc)

return(output_avg_mc, output_stdev_mc)    # I have added output_stdev_mc

}

result <- dlply(.data = filtered_new, .variables = "product_name", .fun = 
function(x) 
 myfunction_mc(product_name = x$product_name, output_avg = 
x$output_avg, 
                 output_stdev = x$output_stdev))

I get the following error -

Error in return(output_avg_mc, output_stdev_mc) : multi-argument returns are 
not permitted

Kindly guide.

Regards


Vincy
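
No reply is recorded in this part of the archive. An R function can return only one object, so the usual remedy is to bundle the two values into a named vector (or a list). The sketch below uses base R's mapply() instead of plyr so that it runs without extra packages; the seed, the extra argument n, and the choice of n = 1000 (taken from the text of the post rather than its code) are assumptions.

```r
set.seed(2011)  # reproducibility of the sketch only

filtered_new <- data.frame(
  product_name = c("P1", "P2", "P3", "P4", "P5"),
  output_avg   = c(22.71078, 22.16979, 21.34420, 20.17421, 19.83799),
  output_stdev = c(23.59924, 21.21430, 22.01025, 18.88877, 18.80436)
)
n <- 1000

# Return both statistics as ONE object: a named numeric vector.
myfunction_mc <- function(output_avg, output_stdev, n) {
  sims <- rnorm(n, output_avg, output_stdev)
  c(output_avg_mc = mean(sims), output_stdev_mc = sd(sims))
}

# One call per product; mapply() returns a 2 x 5 matrix, transposed to 5 x 2.
mc <- t(mapply(myfunction_mc,
               filtered_new$output_avg, filtered_new$output_stdev,
               MoreArgs = list(n = n)))

result <- data.frame(product = filtered_new$product_name, mc)
print(result)
```

Returning list(mean = ..., sd = ...) would work equally well when the pieces are of different types.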




[R] How to use conditional statement

2011-03-10 Thread Vincy Pyne
Dear R helpers

Suppose

val1 = c(10, 20, 35, 80, 12)
val2 = c(3, 8, 11, 7)

I want to select either val1 or val2 depending on the value of a third quantity, val3.

val3 assumes either of the values "Monthly" or "Yearly".

If val3 = "Monthly", then val = val1, and if val3 = "Yearly", then val = val2.

I tried the ifelse statement as


ifelse(val3 = "Monthly", val = val1, val2)
 
I get the following error

> ifelse(val3 = "Monthly", val = val1, val2)
Error in ifelse(val3 = "Monthly", val = val1, val2) : 
  unused argument(s) (val3 = "Monthly", val = val1)

> val
Error: object 'val' not found

Kindly guide.

Regards

Vincy





Re: [R] How to use conditional statement

2011-03-10 Thread Vincy Pyne
Thanks a lot for the wonderful guidance.

Regards

Vincy

--- On Thu, 3/10/11, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

From: Ivan Calandra ivan.calan...@uni-hamburg.de
Subject: Re: [R] How to use conditional statement
To: Duncan Murdoch murdoch.dun...@gmail.com
Cc: r-help@r-project.org
Received: Thursday, March 10, 2011, 12:32 PM

Thanks for the comment, I didn't think about this thoroughly.

On 3/10/2011 12:54, Duncan Murdoch wrote:
 On 11-03-10 5:54 AM, Ivan Calandra wrote:
 Try with double == instead:
 ifelse(val3 == "Monthly", val <- val1, val <- val2)

 That might work, but it is not how you should do it.  It should work 
 if val3 has a single entry, but will do strange things if val3 is a 
 vector:

  > val3 <- c("Monthly", "Daily")
  > ifelse(val3 == "Monthly", val <- 1, val <- 2)
 [1] 1 2
  > val
 [1] 2


 The ifelse() function does a vectorized test, and picks results from 
 the two vector alternatives.  Vincy wants a simple logical if, which 
 can be computed in a few different ways:

  val <- if (val3 == "Monthly") val1 else val2

 or

  if (val3 == "Monthly") {
    val <- val1
  } else {
    val <- val2
  }

 For a simple calculation like this I'd probably use the former; if the 
 calculation got more complex I'd prefer the latter.

 Duncan Murdoch
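
 A runnable sketch of the points above, using the vectors from the original question. The switch() variant and the name first_only are additions for illustration and were not suggested in the thread.

```r
val1 <- c(10, 20, 35, 80, 12)
val2 <- c(3, 8, 11, 7)
val3 <- "Monthly"

# Plain if/else used as an expression: returns the whole chosen vector.
val <- if (val3 == "Monthly") val1 else val2

# switch() reads well when there are several named alternatives;
# the unnamed last argument is the fallback for any other value.
val_sw <- switch(val3,
                 Monthly = val1,
                 Yearly  = val2,
                 stop("unknown frequency: ", val3))

# ifelse() shapes its result like the test, so a length-one test
# returns only the first element of val1.
first_only <- ifelse(val3 == "Monthly", val1, val2)
```

 This is why ifelse() is a poor fit here: the test has length one, so only one element comes back.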


 Single = is for setting arguments within a function call. If you want
 to test equality, then double == is required.
 See ?"=="

 HTH,
 Ivan





-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php



[R] Rearranging the data

2011-03-09 Thread Vincy Pyne
Dear R helpers,

xx = data.frame(country = c(USA, UK, Canada), x = c(10, 50, 20), y = 
c(40, 80, 35), z = c(70, 62, 10))

 xx
       country  x y    z
1  USA    10    40  70
2 
 UK  50   80   62
3 Canada    20   35   10




I need to arrange this as a new data.frame as follows -

country  type  values
USA      x     10
USA      y     40
USA      z     70
UK       x     50
UK       y     80
UK       z     62
Canada   x     20
Canada   y     35
Canada   z     10

I did try the reshape package, but the result is a mess. Please guide.
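
One possible way to get the long layout described above, sketched in base R only (no reshape package). The helper names `measure_cols` and `long` are introduced here for illustration and do not come from the post.

```r
xx <- data.frame(country = c("USA", "UK", "Canada"),
                 x = c(10, 50, 20), y = c(40, 80, 35), z = c(70, 62, 10),
                 stringsAsFactors = FALSE)

# Every column except `country` is treated as a measurement column.
measure_cols <- setdiff(names(xx), "country")

long <- data.frame(
  country = rep(xx$country, each = length(measure_cols)),
  type    = rep(measure_cols, times = nrow(xx)),
  # t() puts one country per column, so as.vector() reads country by country.
  values  = as.vector(t(as.matrix(xx[measure_cols]))),
  stringsAsFactors = FALSE
)
```

Because `measure_cols` is computed from the column names, the same code should carry over to a data frame with more measurement columns.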

Regards

Vincy  
 




[R] How to sort using a predefined criterion

2011-03-08 Thread Vincy Pyne
Dear R helpers,

Suppose I have following data.frame.

df <- data.frame(category = c("treat_A", "treat_A", "treat_A", "treat_A", 
"treat_A", "treat_A", "treat_A", "treat_A", "treat_B", "treat_B", "treat_B", 
"treat_B", "treat_B", "treat_B", "treat_B", "treat_B"), type = c("AA", , 
"B", "AAA", "BB", , "BBB", "AAA", "B", "AAA", "BBB", "AA", , "BB", 
"A", ), values = c(0.382000183, 0.100680563, 0.596484268, 0.899105808, 
0.884609516, 0.958464309, 0.014496292, 0.407422102, 0.863246559, 0.138584552, 
0.245033113, 0.045472579, 0.032380139, 0.164128544, 0.219611194, 0.017090365))

> df
   category type     values
1   treat_A   AA 0.38200018
2   treat_A      0.10068056
3   treat_A    B 0.59648427
4   treat_A  AAA 0.89910581
5   treat_A   BB 0.88460952
6   treat_A      0.95846431
7   treat_A  BBB 0.01449629
8   treat_A    A 0.40742210
9   treat_B    B 0.86324656
10  treat_B  AAA 0.13858455
11  treat_B  BBB 0.24503311
12  treat_B   AA 0.04547258
13  treat_B      0.03238014
14  treat_B   BB 0.16412854
15  treat_B    A 0.21961119
16  treat_B      0.01709036

I need to sort above dataframe for the category treat_A and treat_B type-wise 
i.e. in the order (, AAA, AA, A, , BBB, BB, B) Thus I need 

   category type     values
1   treat_A      0.10068056
2   treat_A  AAA 0.89910581
3   treat_A   AA 0.38200018
4   treat_A    A 0.40742210
5   treat_A      0.95846431
6   treat_A  BBB 0.01449629
7   treat_A   BB 0.88460952
8   treat_A    B 0.59648427
9   treat_B      0.03238014
10  treat_B  AAA 0.13858455
11  treat_B   AA 0.04547258
12  treat_B    A 0.21961119
13  treat_B      0.01709036
14  treat_B  BBB 0.24503311
15  treat_B   BB 0.16412854
16  treat_B    B 0.86324656

Kindly advise how this can be achieved. I referred to the ?sort and ?order 
documentation, but couldn't find any example of this kind.
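
A common idiom for sorting by a custom order is to turn each label into its position in the desired order with `match()` and pass that to `order()`. The sketch below uses a six-row subset of the values in the post and only the six labels that survived in the archive; `type_order` and `sorted` are names assumed here for illustration.

```r
df <- data.frame(
  category = rep(c("treat_A", "treat_B"), each = 3),
  type     = c("B", "AAA", "AA", "AA", "B", "AAA"),
  values   = c(0.59648427, 0.89910581, 0.38200018,
               0.04547258, 0.86324656, 0.13858455),
  stringsAsFactors = FALSE
)

# The desired ordering of the labels, written out once.
type_order <- c("AAA", "AA", "A", "BBB", "BB", "B")

# Sort by category first, then by the position of `type` in `type_order`.
sorted <- df[order(df$category, match(df$type, type_order)), ]
```

An equivalent approach is to store `type` as a factor with `levels = type_order`, in which case `order(df$category, df$type)` follows the level order directly.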

Thanking you in advance.


Vincy





[R] Replacing an element in a vector

2011-02-28 Thread Vincy Pyne
Dear R helpers

I seem to have one trivial problem but can't find solution to it.

Suppose I have following input.

A = c(1,  3,  0,  5,  8)  # 3rd element is 0
B = c(100, 30,  0,  25,  40)  # 3rd element is 0

C = A/B

> C
[1] 0.01 0.10  NaN 0.20 0.20

Dividing 0 by 0 is undefined, hence the NaN. My problem is how to replace this 
NaN, say by 0.

So that I can have C as

C = c(0.01, 0.10, 0, 0.20, 0.20) 

I tried the replace command but can't get rid of NaN.
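
A minimal sketch of one way to do this, using the vectors from the post: `is.nan()` returns a logical vector marking the NaN positions, which can be used to index and overwrite them.

```r
A <- c(1, 3, 0, 5, 8)
B <- c(100, 30, 0, 25, 40)

C <- A / B           # third element is 0/0, i.e. NaN
C[is.nan(C)] <- 0    # overwrite only the NaN positions
```

`replace(C, is.nan(C), 0)` gives the same result, but it returns a new vector, so its value has to be assigned back to `C`.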

Kindly guide.

Vincy





[R] Subtracting elements of data.frame

2011-01-25 Thread Vincy Pyne
Dear R helpers

I have a dataframe as

df = data.frame(x = c(1, 14, 3, 21, 11), y = c(102, 500, 40, 101, 189))

 df
   x   y
1  1 102
2 14 500
3  3  40
4 21 101
5 11 189

# Actually my dataframe has multiple columns. I am just giving an 
example.

I need to subtract the first row of df from all the rows of df, i.e. I need to 
subtract 1 from each element of column 'x'. Likewise I need to subtract 102 
from all elements of column 'y'. Thus I need an output like

 df_new
   x   y
1  0   0
2 13 398
3  2 -62
4 20  -1
5 10  87

As I mentioned above, I have a number of columns in reality and thus I can't 
use a command such as

df_new = data.frame(x = df$x-df$x[1], y = df$y-df$y[1])
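
A sketch that avoids naming each column: apply the same "subtract the first element" step to every column with `lapply()`. It assumes all columns are numeric, as in the example.

```r
df <- data.frame(x = c(1, 14, 3, 21, 11), y = c(102, 500, 40, 101, 189))

df_new <- df
# df_new[] keeps the data.frame structure while replacing every column.
df_new[] <- lapply(df, function(col) col - col[1])
```

The number of columns never appears in the code, so it should work unchanged for a wider data frame.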

Kindly guide

Thanking you all in advance

Regards

Vincy





[R] Sorting data.frame datewise in a descending order

2010-12-30 Thread Vincy Pyne
Dear 'HTH' R friends

I have a small dataframe as given below. I need to sort this dataframe by 
date in descending order. I am not sure whether I have defined the date column 
in a proper format.

mydat <- data.frame(date = c("1/31/2010", "2/28/2010", "3/31/2010", "4/30/2010", 
"5/31/2010", "6/30/2010", "7/31/2010", "8/31/2010", "9/30/2010", "10/31/2010", 
"11/30/2010", "12/28/2010"), total = c(429, 25, 239, 99, 100, 96, 18, 21, 10, 
76, 101, 81), newspapers = c(103, 4, 37, 109, 52, 87, 17, 13, 10, 56, 87, 14))

                           
> mydat
         date total newspapers
1   1/31/2010   429        103
2   2/28/2010    25          4
3   3/31/2010   239         37
4   4/30/2010    99        109
5   5/31/2010   100         52
6   6/30/2010    96         87
7   7/31/2010    18         17
8   8/31/2010    21         13
9   9/30/2010    10         10
10 10/31/2010    76         56
11 11/30/2010   101         87
12 12/28/2010    81         14

I need to sort this data in DESCENDING order by date, i.e. I need to 
have

         date total newspapers
   12/28/2010    81         14
   11/30/2010   101         87
   10/31/2010    76         56
 .
 ..

    1/31/2010   429        103

When I tried

> mydat.sort <- mydat[order(mydat$date)]

Error in `[.data.frame`(mydat, order(mydat$date)) :  undefined columns selected

Kindly guide

Vincy Pyne






Re: [R] Sorting data.frame datewise in a descending order

2010-12-30 Thread Vincy Pyne
Dear sir,

Thanks a lot for your great guidance. It worked fantastically.

Regards

Vincy Pyne

--- On Thu, 12/30/10, Henrique Dallazuanna www...@gmail.com wrote:

From: Henrique Dallazuanna www...@gmail.com
Subject: Re: [R] Sorting data.frame datewise in a descending order
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Thursday, December 30, 2010, 11:31 AM

Try this:

mydat[order(as.Date(mydat$date, "%m/%d/%Y"), decreasing = TRUE), ]
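
Two things differ from the original attempt: the strings are parsed into real dates before ordering (as text, "10/31/2010" would sort before "2/28/2010"), and the comma after the row index makes `[` select rows instead of columns. A small self-contained sketch of the same idea on four of the rows; the names `parsed` and `mydat_sorted` are introduced here for illustration.

```r
mydat <- data.frame(
  date  = c("1/31/2010", "2/28/2010", "12/28/2010", "10/31/2010"),
  total = c(429, 25, 81, 76),
  stringsAsFactors = FALSE
)

# Parse month/day/four-digit-year strings into Date objects.
parsed <- as.Date(mydat$date, format = "%m/%d/%Y")

# Row index, then a comma, then nothing: keep all columns.
mydat_sorted <- mydat[order(parsed, decreasing = TRUE), ]
```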

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O






[R] Changing column names

2010-12-30 Thread Vincy Pyne
Dear R helpers

Wish you all a very Happy and Prosperous New Year 2011.

I have following query.

country = c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", 
"Canada")

Through some other R process, the result.csv file is generated as

result.csv

  var1 var2 var3 var4 var5 var6 var7 var8
1   25   45   29   92  108  105   65   56
2   80  132   83   38   38   11   47   74
3  135   11   74   56   74   74   74   29

                  
I need the country names to be column heads i.e. I need an output like

> result_new
   US France UK NewZealand Germany Austria Italy Canada
1  25     45 29         92     108     105    65     56
2  80    132 83         38      38      11    47     74
3 135     11 74         56      74      74    74     29
                  

The number of countries i.e. length(country) matches with total number of 
variables (i.e. no of columns in 'result.csv').                  
                                    
One way of doing this is to use country names as column names while writing the 
'result.csv' file. 

write.csv(data.frame(US = ..., France = ...), 'result.csv', 
row.names = FALSE)


However, the problem is I don't know in what order the country names will 
appear and also there could be addition or deletion of some country names. 
Also, if there are say 150 country names, the above way (i.e. using write.csv) of 
defining the column names is not practical. 

Basically I want to change the column heads after the 'result.csv' is generated.
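
A sketch of renaming after the file has been written: read it back and assign the `country` vector to `names()`. The temporary file below stands in for the real 'result.csv' (an assumption made so the example is self-contained), and the approach only gives the right labels if `country` is in the same order as the columns.

```r
country <- c("US", "France", "UK", "NewZealand", "Germany", "Austria",
             "Italy", "Canada")

# Stand-in for the file produced by the other process (values from the post).
csv_path <- tempfile(fileext = ".csv")
vals <- matrix(c(25, 80, 135,  45, 132, 11,  29, 83, 74,  92, 38, 56,
                 108, 38, 74,  105, 11, 74,  65, 47, 74,  56, 74, 29),
               nrow = 3)
result <- as.data.frame(vals)
names(result) <- paste0("var", 1:8)
write.csv(result, csv_path, row.names = FALSE)

# Read the file back and replace all column names in one step.
result_new <- read.csv(csv_path)
stopifnot(length(country) == ncol(result_new))  # guard against a mismatch
names(result_new) <- country
```

Because the names come from a vector, 150 countries need no more code than 8.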

Kindly guide.

Regards

Vincy






[R] Sequence generation in a table

2010-12-09 Thread Vincy Pyne




Dear R helpers

I have following input

f = c(257, 520, 110). I need to generate a decreasing sequence (decreasing by 
100) from each value, which will give me an output (in tabular form) like

257, 157, 57
520, 420, 320, 220, 120, 20
110, 10


I tried the following R code


f = c(257, 520, 110)
yy = matrix(data = NA, nrow = 3, ncol = 6)

for (i in 1:3)
 {
 value = NULL

 for (j in 1 : 6)
  {
  value = c(ans, seq(f[i], 1, by = -100))
  }
    yy[i,] = ans[i,j]
    }

I get the following message

Error in ans[i, j] : incorrect number of dimensions

Also, I understand that the above logic would generate a result in (3 by 6) 
matrix format, while I need to generate only 3 numbers starting from the first 
value, i.e. 257, 6 numbers beginning from 520, and only 2 numbers beginning 
from 110. I also tried tapply etc.

Please guide

Vincy







Re: [R] Sequence generation in a table

2010-12-09 Thread Vincy Pyne
Dear Sir,

Sorry to bother you again. Sir, the R code provided by you gives me the following 
output.

> yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
> yy
[[1]]
[1] 257 157  57

[[2]]
[1] 520 420 320 220 120  20

[[3]]
[1] 110  10

The biggest constraint for me is that here, as an example, I have taken only 
three cases, i.e. c(257, 520, 110); in reality I will be dealing with a number 
of cases and that number is unknown. Your code will certainly generate the 
required numbers. In the above case, for doing further calculations, I can define 
say 

yy1 = as.numeric(yy[[1]])
yy2 = as.numeric(yy[[2]])
yy3 = as.numeric(yy[[3]])

But when the number of cases is unknown, defining each one individually is 
perhaps not practical. So is there any way that all the generated sequence 
numbers can be accommodated in a single dataframe? I sincerely apologize for 
disturbing you Sir and hope I have been able to put my problem in a proper manner.
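
One way the sequences could be held in a single data frame, sketched under the assumption that a "long" layout is acceptable: one row per generated number, plus a column saying which case it came from. The column names `case` and `days` are made up for this illustration.

```r
f  <- c(257, 520, 110)
yy <- lapply(f, seq, to = 0, by = -100)

long <- data.frame(
  # Repeat each case number once per element of its sequence.
  case = rep(seq_along(yy), times = sapply(yy, length)),
  days = unlist(yy)
)
```

Nothing here depends on there being three cases; `f` can have any length, and the sequences can have different lengths without padding.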

Regards

Vincy Pyne


--- On Thu, 12/9/10, Jan van der Laan rh...@eoos.dds.nl wrote:

From: Jan van der Laan rh...@eoos.dds.nl
Subject: Re: [R] Sequence generation in a table
To: r-help@r-project.org, vincy_p...@yahoo.ca
Received: Thursday, December 9, 2010, 10:57 AM

Vincy,

I suppose the following does what you want. yy is now a list which allows for 
differing lengths of the vectors.

> yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
> yy[[1]]
[1] 257 157  57
> yy[[2]]
[1] 520 420 320 220 120  20


Regards,
Jan


On 9-12-2010 11:40, Vincy Pyne wrote:
 c(257, 520, 110)





Re: [R] Sequence generation in a table

2010-12-09 Thread Vincy Pyne
Dear Sirs,

I understand these are already numeric values. Sir, basically I am working on 
Value at Risk for a bond portfolio using historical simulation, and for 
this I need to find the Marked to Market (MTM) value using the present value of 
the coupon payments for each bond (here, as an example, I have taken only 3). 

What I have done so far is, for a given bond, find the number of days left to 
maturity. E.g. in the 1st case there are 257 days left to maturity. The bond pays 
a coupon twice a year, and thus on the 257th day the bond will mature and I will 
receive the principal and the final coupon payment. Since the bond pays 
coupons every 6 months, going backward from the 257th day, my earlier coupon 
payment falls on (257 - 180) = 77 days. (However, in the above example, I have 
taken 100 just for example purposes.)

Thus, assuming 100 days, my coupons will be paid on days 257, 157 and 57. I need 
to convert these days into years, and so when I try to divide yy, defined 
as 

yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)

by 360, i.e. yy/360, I get the following error.

Error in yy/360 : non-numeric argument to binary operator

On the other hand, 

yy[[1]]/360  fetches me 

[1] 0.7138889 0.4361111 0.1583333


Thus, I am trying to obtain the result of yy <- lapply(c(257, 520, 110), seq, 
to=0, by=-100) in such a form that I can carry out further 
analysis. What I was trying to say is that since here I am taking only three bonds, 
I can do it individually; however, if there are a number of bonds (say 1000) in 
the portfolio, my method of converting the days individually is not practical.
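
The error arises because `yy` is a list, and arithmetic operators are not defined for a list as a whole. A minimal sketch of applying the division to every element at once, using the 360-day convention from the message; the name `yy_years` is introduced here.

```r
yy <- lapply(c(257, 520, 110), seq, to = 0, by = -100)

# Divide each vector in the list by 360; the result is again a list.
yy_years <- lapply(yy, function(days) days / 360)
```

The same call works for 3 bonds or 1000, since `lapply()` walks over however many elements the list has.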

I am extremely sorry for the inconvenience caused. I tried to keep my problem 
short in order not to consume your valuable time.

Regards

Vince Pyne



--- On Thu, 12/9/10, Petr PIKAL petr.pi...@precheza.cz wrote:

From: Petr PIKAL petr.pi...@precheza.cz
Subject: Re: [R] Sequence generation in a table
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Thursday, December 9, 2010, 12:03 PM

Hi

r-help-boun...@r-project.org napsal dne 09.12.2010 12:41:47:

 Dear Sir,
 
 Sorry to bother you again. Sir, the R code provided by you gives me 
 following output.
 
 > yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
 > yy
 [[1]]
 [1] 257 157  57
 
 [[2]]
 [1] 520 420 320 220 120  20
 
 [[3]]
 [1] 110  10
 
 The biggest constraint for me is here as an example I have taken only three 
 cases i.e. c(257, 520, 110), however in reality I will be dealing with no of 
 cases and that number is unknown. But your code will certainly generate me the 
 required numbers. In above case for doing further calculations, I can define 
 say 
 
 yy1 = as.numeric(yy[[1]])
 yy2 = as.numeric(yy[[2]])
 yy3 = as.numeric(yy[[3]])

Why? Those values are already numeric.

lapply(yy, is.numeric)

[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

and you can use the same construction to perform almost any operation on 
list.

lapply(yy, max)
lapply(yy, mean)
lapply(yy, sd)
lapply(yy, t.test)
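
As a small companion to the calls above: when the function returns a single number per element, `sapply()` collapses the result into a plain vector, which is often easier to use in later steps. The names `n_values` and `largest` are chosen here for illustration.

```r
yy <- lapply(c(257, 520, 110), seq, to = 0, by = -100)

n_values <- sapply(yy, length)  # how many numbers each sequence holds
largest  <- sapply(yy, max)     # the starting value of each sequence
```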

Regards
Petr


 






[R] One silly question about tapply output

2010-10-27 Thread Vincy Pyne
Dear R helpers

I have a data which gives Month-wise and Rating-wise Rates. So the input file 
is something like

month      rating   rate
January    AAA      9.04
February   AAA      9.07
..
..
December   AAA      8.97
January    BBB     11.15
February   BBB     11.13



January    CCC     17.13
.

December   CCC     17.56

and so on.

My objective is to calculate Rating-wise mean rate, for which I have used 

rating_mean = tapply(rate, rating, mean) 

and I am getting following output

> tapply(rate, rating, mean)
       AAA        BBB        CCC 
    9.1104 11.1361637 17.1606779

which is correct when compared with an excel output.

However, I wish to have my output something like a data.frame (so that I should 
be able to save this output as csv file with respective headings and should be 
able to carry out further analysis)

Rating Mean
AAA    9.1104
BBB   11.1361637
CCC   17.1606779


Please guide me on how I can get my output in this form.
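
A sketch of one way to turn the `tapply()` result into a two-column data frame: the group labels are the names of the result and the means are its values. Only six of the rates quoted above are used, so the means below are not the ones from the full data set; `rates` and `out` are names assumed for the example.

```r
rates <- data.frame(
  rating = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  rate   = c(9.04, 9.07, 11.15, 11.13, 17.13, 17.56),
  stringsAsFactors = FALSE
)

rating_mean <- tapply(rates$rate, rates$rating, mean)

# names() holds the ratings, as.vector() drops them from the values.
out <- data.frame(Rating = names(rating_mean),
                  Mean   = as.vector(rating_mean),
                  stringsAsFactors = FALSE)

# Written to a temporary file here; any path can be used instead.
write.csv(out, tempfile(fileext = ".csv"), row.names = FALSE)
```

`aggregate(rate ~ rating, data = rates, FUN = mean)` is another base R route that returns a data frame directly.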
   
Thanking in advance.

Regards

Vincy








Re: [R] One silly question about tapply output

2010-10-27 Thread Vincy Pyne
Dear Sirs,

Thanks a lot for your great help. This is going to help me immensely in future 
as many times I had found myself struggling with this problem. 

Thanks again for the great help.

Regards

Vincy



--- On Wed, 10/27/10, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com 
wrote:

From: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com
Subject: Re: [R] One silly question about tapply output
To: Vincy Pyne vincy_p...@yahoo.ca
Received: Wednesday, October 27, 2010, 11:28 AM

Assign your result to an object and then write out the object as a csv
file. For example:

x <- data.frame(rating=rep(letters[1:3],2), rate=runif(6)) # example data frame
rating.means <- tapply(x$rate, x$rating, mean)
write.csv(rating.means, file="my.file.csv", row.names=TRUE)

Dimitri






-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com





Re: [R] Band-wise Conditional Sum - Actual problem

2010-08-30 Thread Vincy Pyne
 earlier mail. I could test 
the reply sent to me earlier by Winsemius Sir only today as I was traveling 
over the weekend. Also, I have tried to go through earlier emails dealing with 
such conditional sums. Unfortunately, I couldn't understand them as I have only 
recently started my venture with R.


Thanking you in advance and sincerely apologize for any mis-communication if it 
had occurred in my earlier mail. 

Regards

Vincy


--- On Fri, 8/27/10, David Winsemius dwinsem...@comcast.net wrote:

From: David Winsemius dwinsem...@comcast.net
Subject: Re: [R] Band-wise Sum
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM


On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

 Hi
 
 I have a large credit portfolio (exceeding 5 borrowers). For particular 
 process I need to add up the exposures based on the bands. I am giving a 
 small test data below.

I would think that cut() would be the accepted method for defining a factor 
variable based on specified cutpoints. If you then wanted to see what the 
cumsum() was across the range of possible levels, that too would be a fairly 
simple task.

df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000, 
10000000, 100000000) )
df
with(df, tapply(ead.cat, rating, length))
#  A  AA AAA   B  BB BBB
# 10   8   2   1   4   7
with(df, tapply(ead.cat, rating, table))
# returns a list of table objects by bond rating

lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
#returns the cumsum of those tables

# sapply gives a more compact output of that result:
 sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
               A AA AAA B BB BBB
(0,1e+05]      4  2   1 0  3   1
(1e+05,5e+05]  8  2   1 1  3   1
(5e+05,1e+06]  9  2   1 1  3   1
(1e+06,2e+06]  9  4   2 1  4   3
(2e+06,5e+06]  9  5   2 1  4   4
(5e+06,1e+07] 10  5   2 1  4   7
(1e+07,1e+08] 10  8   2 1  4   7

Loops, you say we need loops? We don't need no stinkin' loops.

--David.

 
 rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", 
 "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", 
 "AA", "B","A", "AA", "BBB", "A", "BBB")
 
 ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,  
 6000000, 17715000,  14430325.24, 1180946.57, 150000,
 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 
 1200000, 11800, 3000,  96894.02,  453671.72,  7590, 106065.24, 940711.67,  
 2443000, 9500000, 39000, 1501939.67)
 
 ## First I have sorted the data rating-wise as
 
 df <- data.frame(rating, ead)
 
 df_sorted <- df[order(df$rating),]
 
 df_sorted_AAA <- subset(df_sorted, rating=="AAA")
 df_sorted_AA <- subset(df_sorted, rating=="AA")
 df_sorted_A <- subset(df_sorted, rating=="A")
 df_sorted_BBB <- subset(df_sorted, rating=="BBB")
 df_sorted_BB <- subset(df_sorted, rating=="BB")
 df_sorted_B <- subset(df_sorted, rating=="B")
 df_sorted_CCC <- subset(df_sorted, rating=="CCC")
 
 ## we begin with BBB rating. The R output for df_sorted_BBB is as follows
 
 > df_sorted_BBB
    rating     ead
 4     BBB 9530149
 8     BBB 6000000
 16    BBB    3000
 20    BBB 3283049
 21    BBB 1200000
 30    BBB 9500000
 32    BBB 1501940
 
 My problem is I need the totals of eads falling in the respective bands
 
 I am defining bands in millions as
 
 seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
 
 # The output is
 [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06
 
 So for the sub data pertaining to Rating BBB, I want corresponding ead 
 totals i.e. I want ead totals where ead < 1e+06, then I want ead
 totals where 1e+06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on.
 
 I have tried the following code
 
 s_BBB <- NULL
 
 for (i in 1:length(s_BBB))
 {
 s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i]))
 }
 
 I was trying to find totals of eads < 1e+06, ead < 2e+06, ead < 3e+06 and so on.
 
 but the result is
 
 s_BBB
 [1] 0
 
 
 I apologize if I am not able to express my problem properly. My only 
 objective is first to sort the whole portfolio rating-wise and then within 
 each of these rating-wise sorted data, I wish to find out total of eads based
 on various bands starting 100,  100 - 20, 200 - 300, 
 300 - 400 and so on. Since the database contains more than 5 
 records, various ead amounts ranging from few 000's to billion are
 available.
 
 Please guide
 
 Thanking  you all in advance
 
 Vincy
 
 
 
 
 
 
 
 
 
 
 
 

David Winsemius, MD
West Hartford, CT





Re: [R] Band-wise Conditional Sum - Actual problem

2010-08-30 Thread Vincy Pyne
Dear David and Dennis Sir,

Thanks a lot for your guidance. 

As guided by Mr Dennis Murphy Sir in his reply 

Replace table in the tapply call with sum. While you're at it, typing 
?tapply to find out what the function does wouldn't hurt...

I had  really tried earlier to understand the apply, tapply, mapply and sapply 
commands before writing back to the R forum. But I was not able to figure out 
where was the problem. But Mr Dennis Sir really inspired me and when I 
revisited 'tapply', I realized that instead of using 'ead' for getting sum, I 
was using 'ead.cat', and that solved my problem. 
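
The fix described above can be sketched as follows: `tapply()` is given the `ead` amounts as the values to sum and both `rating` and the band factor as grouping variables. This uses six of the exposures from the thread and the break points implied by the band labels; the names `ead_cat` and `band_sum` are assumptions for the illustration, not the poster's actual code.

```r
rating <- c("A", "A", "A", "BBB", "BBB", "BBB")
ead    <- c(169229.93, 21000, 5877794.25, 9530148.63, 3000, 1501939.67)

# Band edges matching labels such as (0,1e+05] and (1e+05,5e+05].
breaks  <- c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8)
ead_cat <- cut(ead, breaks = breaks)

# Sum the amounts (not the band labels) within each rating/band cell.
band_sum <- tapply(ead, list(rating, ead_cat), sum)

# Cells with no borrowers come back as NA; treat them as a zero total.
band_sum[is.na(band_sum)] <- 0
```

The result is a matrix with one row per rating and one column per band, so `band_sum["BBB", ]` gives the band-wise totals for a single rating.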

Then I had a new problem of 'How to get rid of NA's' , Again instead of  
posting to the group, I had accessed the earlier R mails and in the end got the 
solution. I sincerely thank both of you for taking so much efforts and guiding 
me.

I will certainly take efforts to understand 'R' at the earliest.

Regards

Vincy



Replace table in the tapply call with sum. While you're at it, typing 
?tapply to find out what the function does wouldn't hurt...

HTH,
Dennis

 

--- On Mon, 8/30/10, David Winsemius dwinsem...@comcast.net wrote:

From: David Winsemius dwinsem...@comcast.net
Subject: Re: [R] Band-wise Conditional Sum - Actual problem

Cc: r-help@r-project.org
Received: Monday, August 30, 2010, 2:43 PM


On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:

 Dear R helpers,
 
 Thanks a lot for your earlier guidance, esp. Mr David Winsemius Sir. However, 
 there seems to be a mis-communication from my end
 regarding my requirement. As I had mentioned in my earlier mail, I am 
 dealing with a very large database of borrowers and I had given a part of it in 
 my earlier mail as given below. For a given rating, say "A", I needed to have 
 the band-wise sums of ead's (where bands are constructed using the ead size 
 itself) and not the number of borrowers falling in a particular band.
 
 I am reproducing the data and solution as provided by Winsemius Sir (which 
 generates the number of band-wise borrowers for a given rating.
 
 rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", 
 "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", 
 "AA", "B","A", "AA", "BBB", "A", "BBB")
 
 ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,  
 6000000, 17715000,  14430325.24, 1180946.57, 150000, 167490, 81255.16, 
 54812.5, 3000, 1275702.94, 9100,
 1763142.3, 3283048.61, 1200000, 11800, 3000,  96894.02,  453671.72,  7590, 
 106065.24, 940711.67,  2443000, 9500000, 39000, 1501939.67)
 
 df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 
 5000000, 10000000, 100000000) )
 
 df
 
 df_sorted <- df[order(df$rating),]      # the output is as given below.
 
 > df_sorted
    rating         ead       ead.cat
 1       A   169229.93 (1e+05,5e+05]
 3       A  5877794.25 (5e+06,1e+07]
 6       A    21000.00     (0,1e+05]
 12      A   150000.00 (1e+05,5e+05]
 13      A   167490.00 (1e+05,5e+05]
 18      A     9100.00     (0,1e+05]
 23      A     3000.00     (0,1e+05]
 25      A   453671.72 (1e+05,5e+05]
 28      A   940711.67 (5e+05,1e+06]
 31      A    39000.00     (0,1e+05]
 5      AA 75040962.06 (1e+07,1e+08]
 9      AA 17715000.00 (1e+07,1e+08]
 10     AA 14430325.24 (1e+07,1e+08]
 11     AA  1180946.57 (1e+06,2e+06]
 14     AA    81255.16     (0,1e+05]
 17     AA  1275702.94 (1e+06,2e+06]
 26     AA     7590.00     (0,1e+05]
 29     AA  2443000.00 (2e+06,5e+06]
 2     AAA      100.00     (0,1e+05]
 19    AAA  1763142.30 (1e+06,2e+06]
 27      B   106065.24 (1e+05,5e+05]
 7      BB  1028360.00 (1e+06,2e+06]
 15     BB    54812.50     (0,1e+05]
 22     BB    11800.00     (0,1e+05]
 24     BB    96894.02     (0,1e+05]
 4     BBB  9530148.63 (5e+06,1e+07]
 8     BBB  6000000.00 (5e+06,1e+07]
 16    BBB     3000.00     (0,1e+05]
 20    BBB  3283048.61 (2e+06,5e+06]
 21    BBB  1200000.00 (1e+06,2e+06]
 30    BBB  9500000.00 (5e+06,1e+07]
 32    BBB  1501939.67 (1e+06,2e+06]
 
 
 ## The following command fetches the rating-wise and ead-size-wise
 number of borrowers. Thus, for rating A, there are 4 borrowers in the ead 
 range (0, 1e+05], 4 borrowers in the range (1e+05, 5e+05] and so on.
 
 > with(df, tapply(ead.cat, rating, table))
 $A
 
     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
             4             4             1             0             0             1             0
 
 $AA
 
     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07] (1e

Re: [R] Band-wise Sum

2010-08-28 Thread Vincy Pyne
Dear David Sir,

Thanks a lot for your guidance. Your reply, besides helping, also taught me the 
importance of sharing one's knowledge. It also helped me understand where I 
stand. I am a beginner in R and I have started going through at least some mails 
every day whenever possible so that I can learn something from THE WISE like you.

Thanks once again, Sir. Your help was great and it means a lot to me and to 
other freshers like me.

Regards

Vincy Pyne

--- On Fri, 8/27/10, David Winsemius dwinsem...@comcast.net wrote:

From: David Winsemius dwinsem...@comcast.net
Subject: Re: [R] Band-wise Sum
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM


On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

 Hi
 
 I have a large credit portfolio (exceeding 5 borrowers). For particular 
 process I need to add up the exposures based on the bands. I am giving a 
 small test data below.

I would think that cut() would be the accepted method for defining a factor 
variable based on specified cutpoints. If you then wanted to see what the 
cumsum() was across the range of possible levels, that too would be a fairly 
simple task.

df$ead.cat <- cut(df$ead, breaks = c(0, 100000, 500000, 1000000, 2000000,
                                     5000000, 10000000, 100000000))
df
with(df, tapply(ead.cat, rating, length))
#  A  AA AAA   B  BB BBB
# 10   8   2   1   4   7
with(df, tapply(ead.cat, rating, table))
# returns a list of table objects by bond rating

lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
#returns the cumsum of those tables

# sapply gives a more compact output of that result:
 sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
               A AA AAA B BB BBB
(0,1e+05]      4  2   1 0  3   1
(1e+05,5e+05]  8  2   1 1  3   1
(5e+05,1e+06]  9  2   1 1  3   1
(1e+06,2e+06]  9  4   2 1  4   3
(2e+06,5e+06]  9  5   2 1  4   4
(5e+06,1e+07] 10  5   2 1  4   7
(1e+07,1e+08] 10  8   2 1  4   7
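
The tables above count borrowers per band, while the original question asked for exposure totals. A minimal sketch of the same cut() approach that sums ead instead of counting, assuming the cut points shown above; the six-row data frame is only an illustrative subset of the posted data, and band_sum and band_cumsum are names made up for this example:

```r
# Illustrative subset of the posted data (ratings A and BBB only)
df <- data.frame(
  rating = c("A", "A", "A", "BBB", "BBB", "BBB"),
  ead    = c(21000, 167490, 940711.67, 3000, 3283048.61, 9530148.63)
)

# Same cut points as in the reply above
breaks <- c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8)
df$ead.cat <- cut(df$ead, breaks = breaks)

# Sum of ead per rating (rows) and band (columns);
# combinations with no borrowers come back as NA, so set them to 0
band_sum <- with(df, tapply(ead, list(rating, ead.cat), sum))
band_sum[is.na(band_sum)] <- 0

# Cumulative exposure across the bands within each rating
band_cumsum <- t(apply(band_sum, 1, cumsum))
```

band_sum has one row per rating and one column per band, so the total for a single cell can be read off as band_sum["BBB", "(2e+06,5e+06]"].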

Loops, you say we need loops? We don't need no stinkin' loops.

--David.

 

David Winsemius, MD
West Hartford, CT

[R] Band-wise Sum

2010-08-27 Thread Vincy Pyne
Hi

I have a large credit portfolio (exceeding 5 borrowers). For a particular 
process I need to add up the exposures based on bands. I am giving a small 
test data set below.

rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA", "AA",
"A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB", "A", "BB",
"A", "AA", "B", "A", "AA", "BBB", "A", "BBB")

ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,
6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5,
3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000,
96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000,
1501939.67)

## First I have sorted the data rating-wise as

df <- data.frame(rating, ead)

df_sorted <- df[order(df$rating), ]

df_sorted_AAA <- subset(df_sorted, rating == "AAA")
df_sorted_AA <- subset(df_sorted, rating == "AA")
df_sorted_A <- subset(df_sorted, rating == "A")
df_sorted_BBB <- subset(df_sorted, rating == "BBB")
df_sorted_BB <- subset(df_sorted, rating == "BB")
df_sorted_B <- subset(df_sorted, rating == "B")
df_sorted_CCC <- subset(df_sorted, rating == "CCC")

## we begin with BBB rating. The R output for df_sorted_BBB is as follows

 df_sorted_BBB
   rating     ead
4     BBB 9530149
8     BBB 6000000
16    BBB    3000
20    BBB 3283049
21    BBB 1200000
30    BBB 9500000
32    BBB 1501940

My problem is that I need the totals of the eads falling in the respective bands.

I am defining the bands in millions as 

seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)

# The output is 
[1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06

So for the sub data pertaining to rating BBB, I want the corresponding ead 
totals, i.e. I want the ead total where ead < 1e+06, then the ead total where 
1e+06 < ead < 2e+06, 2e+06 < ead < 3e+06 ... and so on.
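
One possible way to get per-band totals like those described above is to cut the exposures at every million and sum within each band. This is only a sketch: ead_BBB holds a subset of the BBB exposures from the post, the names edges, band and totals are invented for the example, and the choice of half-open bands [lower, upper) is an assumption, since the post does not say which side of a band is closed.

```r
# A subset of the BBB exposures from the post
ead_BBB <- c(9530148.63, 3000, 3283048.61, 1501939.67)

# Band edges every one million, extended so the largest exposure is covered
upper <- ceiling(max(ead_BBB) / 1e6) * 1e6
edges <- seq(0, upper, by = 1e6)

# right = FALSE gives bands of the form [lower, upper) (an assumption);
# dig.lab = 8 keeps the band labels in plain digits
band <- cut(ead_BBB, breaks = edges, right = FALSE, dig.lab = 8)

# Total exposure per band; bands without any exposure are set to 0
totals <- tapply(ead_BBB, band, sum)
totals[is.na(totals)] <- 0
```

Because band is a factor with one level per band, empty bands still appear in totals, which keeps the result aligned with the band edges.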

I have tried the following code

s_BBB <- NULL

for (i in 1:length(s_BBB))
{
  s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i]))
}

I was trying to find the totals of eads < 1e+06, ead < 2e+06, ead < 3e+06 and so on.

but the result is 

 s_BBB
[1] 0
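
A likely reading of this result: s_BBB starts out as NULL, so 1:length(s_BBB) is 1:0, which counts down through 1 and 0, and each comparison is made against the empty s_BBB[i] rather than against the band limits in seq_BBB. The sketch below loops over seq_BBB instead, assuming the intent was the total of all eads below each limit; ead_BBB is a subset of the posted BBB values, and bad_index, good_index and below_limit are names made up for the example.

```r
ead_BBB <- c(9530148.63, 3000, 3283048.61, 1501939.67)  # subset of the BBB eads
seq_BBB <- seq(1e6, max(ead_BBB), by = 1e6)              # 1e+06, 2e+06, ..., 9e+06

# Pitfall: for an empty object, 1:length(x) yields c(1, 0), not an empty sequence
s_BBB <- NULL
bad_index <- 1:length(s_BBB)

# seq_along() gives one index per band limit
good_index <- seq_along(seq_BBB)

# Total of all eads strictly below each band limit
below_limit <- vapply(
  seq_BBB,
  function(limit) sum(ead_BBB[ead_BBB < limit]),
  numeric(1)
)
```

These are cumulative totals; diff(c(0, below_limit)) would turn them into per-band totals for the bands below the last limit.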


I apologize if I am not able to express my problem properly. My only objective 
is first to sort the whole portfolio rating-wise and then, within each of these 
rating-wise sorted data sets, to find the total of eads based on various bands: 
below 1000000, 1000000 - 2000000, 2000000 - 3000000, 3000000 - 4000000 and so 
on. Since the database contains more than 5 records, various ead amounts 
ranging from a few thousand to billions are available.

Please guide

Thanking  you all in advance

Vincy

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.