[R] Change in order of names after applying plyr package
Dear R helpers

I have following two data.frames viz. equity_data and param.

equity_data = data.frame(
  security_id = c("Air", "Air", "Air", "Air", "Air", "Air", "Air", "Air", "Air", "Air", "Air", "Air",
                  "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB",
                  "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD"),
  ason_date = c("10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", "6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", "30-Dec-11",
                "10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", "6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", "30-Dec-11",
                "10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", "6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", "30-Dec-11"),
  security_rate = c(0.597, 0.61, 0.6, 0.63, 0.67, 0.7, 0.74, 0.735, 7.61, 0.795, 0.796, 0.84,
                    8.5, 8.1, 8.9, 8.9, 8.9, 9, 9, 9, 9, 9, 9, 9,
                    3.21, 3.22, 3.12, 3.51, 3.5, 3.37, 3.25, 3, 3.07, 3, 2.94, 2.6))

param = data.frame(confidence_level = c(0.99), holding_period = c(10), calculation_method = "MC", no_simulation_mc = c(100))

library(plyr)
library(reshape2)

attach(equity_data)
attach(param)

security_names = unique(equity_data$security_id)
# (security_names are used further in R code not included here)

alpha = param$confidence_level
t = param$holding_period
n = param$no_simulation_mc
method = param$calculation_method

mc_VaR = function(security_id, ason_date, security_rate)
{
  security_rate_returns <- NULL
  # parentheses matter: 1:length(x)-1 would run from 0 to length(x)-1
  for (i in 1:(length(ason_date) - 1))
  {
    security_rate_returns[i] = log(security_rate[i]/security_rate[i+1])
  }
  return_mean = mean(security_rate_returns)
  return_sd = sd(security_rate_returns)
  simulation = rnorm(n, return_mean, return_sd)
  qq = sort(simulation, decreasing = TRUE)
  VaR_mc = -qq[alpha * n]*sqrt(t)
  return(VaR_mc)
}

result_method_other <- dlply(.data = equity_data, .variables = "security_id", .fun = function(x)
  mc_VaR(ason_date = x$ason_date, security_id = x$security_id, security_rate = x$security_rate))

> result_method_other
$AB
[1] 0.2657424

$AD
[1] 0.212061

$Air
[1] 6.789733

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  security_id
1          AB
2          AD
3         Air

MY PROBLEM:

My original data (i.e. equity_data) has the order Air, AB and AD. However, after applying plyr, the order (and the corresponding result) has changed to AB, AD, Air. I need to maintain my original order of Air, AB and AD. How do I modify my R code for this?

Kindly guide

Vincy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
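One possible way to keep the first-appearance order is sketched below. It is a minimal base-R illustration on toy numbers, not the poster's VaR code: split() stands in for dlply(), and the per-group mean stands in for mc_VaR(). The alphabetical output shown in the post suggests that plyr also walks groups in factor-level order, so setting the levels explicitly before the call is expected to help there too, but that part is an assumption.

```r
# Toy data in the poster's order: Air, AB, AD (values are illustrative only).
equity_data <- data.frame(
  security_id   = rep(c("Air", "AB", "AD"), each = 3),
  security_rate = c(0.60, 0.61, 0.63, 8.5, 8.1, 8.9, 3.21, 3.22, 3.12),
  stringsAsFactors = FALSE
)

# Levels in order of first appearance instead of alphabetical order.
original_order <- unique(equity_data$security_id)
equity_data$security_id <- factor(equity_data$security_id, levels = original_order)

# split() walks the groups in level order, so the result follows Air, AB, AD.
mean_rate <- sapply(split(equity_data$security_rate, equity_data$security_id), mean)
names(mean_rate)

# Alternative: leave the grouping alone and reorder the finished result by name.
alphabetical <- mean_rate[sort(names(mean_rate))]
restored <- alphabetical[original_order]
names(restored)
```

The second approach also works on a named list, so something like result_method_other[as.character(security_names)] may be all that is needed in the original code.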
[R] How to append the random no.s for different variables in the same data.frame
Dear R helpers,

(At the outset I sincerely apologize if I have not put forward my following query properly, though I have tried to do so.)

Following is a curtailed part of my R code where I am trying to generate say 100 random numbers for each of the products under consideration.

library(plyr)

n = 100

my_code = function(product, output_avg, output_stdev)
{
  BUR_mc = rnorm(n, output_avg, output_stdev)
  sim_BUR = data.frame(product, BUR_mc)
  write.csv(data.frame(sim_BUR), 'sim_BUR.csv', row.names = FALSE)
  return(list(output_avg, output_stdev))
}

result <- dlply(.data = My_data, .variables = "product", .fun = function(x)
  my_code(product = x$product, output_avg = x$output_avg, output_stdev = x$output_stdev))

There are some 12 products (and this may vary each time). In my original code, the return statement returns some other output. Here, for simplicity's sake, I am just using the values as given in the input.

PROBLEM - A:

I want to store the random numbers (BUR_mc) generated above for each of the products in a single data.frame. Now when I access 'sim_BUR.csv', I get a csv file where only the random numbers generated for the last product are stored.

I need something like

product     random no
product1    ...
product1    ...
.
product1    ...     # (This is 100th value generated for product1)
product2    ...
product2    ...

PROBLEM - B:

Also, is it possible to have more than one 'return' statement in a given function?

Thanking in advance

Vincy
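A minimal sketch of one way to address both problems, using base R only. The CSV in the post holds only the last product because write.csv() is called once per product and overwrites the same file each time. Here the function hands back both the statistics and the simulated values in one list, and the file is written once after all products are processed. My_data, the column names sim_mean and sim_sd, and the use of tempdir() are assumptions made for illustration; the post does not show the real data.

```r
set.seed(1)  # only to make the sketch repeatable
n <- 100

# Hypothetical stand-in for My_data, which the post does not show.
My_data <- data.frame(
  product      = c("product1", "product2", "product3"),
  output_avg   = c(10, 20, 30),
  output_stdev = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Return everything that is needed in one list; a function hands back only
# the value of the first return() it reaches.
my_code <- function(product, output_avg, output_stdev) {
  BUR_mc <- rnorm(n, output_avg, output_stdev)
  list(
    stats = data.frame(product = product, sim_mean = mean(BUR_mc), sim_sd = sd(BUR_mc)),
    sims  = data.frame(product = product, BUR_mc = BUR_mc)
  )
}

pieces <- lapply(seq_len(nrow(My_data)), function(i) {
  my_code(My_data$product[i], My_data$output_avg[i], My_data$output_stdev[i])
})

# Stack the per-product pieces, then write the CSV once, after the loop.
sim_BUR <- do.call(rbind, lapply(pieces, `[[`, "sims"))
result  <- do.call(rbind, lapply(pieces, `[[`, "stats"))
out_file <- file.path(tempdir(), "sim_BUR.csv")
write.csv(sim_BUR, out_file, row.names = FALSE)
```

On Problem B: a function body may contain several return() calls, for instance in different branches of an if, but only the first one reached is executed. Returning a list, as above, is the usual way to get several objects out of one call.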
Re: [R] How to append the random no.s for different variables in the same data.frame
Dear Mr Weylandt and R helpers,

Thanks a lot for your suggestion. Unfortunately the return statement in my original R code returns different results, which are obtained after processing the function I have constructed. My requirement for storing the product-wise random numbers is just a part of my whole exercise. For each of the products, I generate a set of random numbers, process these, construct some statistics and obtain these statistics using the return statement. So for each of the products, I get this set of statistics generated, and that is not my problem.

My problem is that BESIDES getting my required output (which I am getting anyway), I need the product-wise random numbers I have already generated, stored together in a single data.frame, so that a single data.frame gives me all the product-wise random numbers.

I am reproducing my problem once again -

#

library(plyr)

n = 100

my_code = function(product, output_avg, output_stdev)
{
  BUR_mc = rnorm(n, output_avg, output_stdev)
  sim_BUR = data.frame(product, BUR_mc)
  write.csv(data.frame(sim_BUR), 'sim_BUR.csv', row.names = FALSE)
  return(list(output_avg, output_stdev))
}

result <- dlply(.data = My_data, .variables = "product", .fun = function(x)
  my_code(product = x$product, output_avg = x$output_avg, output_stdev = x$output_stdev))

There are some 12 products (and this may vary each time). In my original code, the return statement returns some other output. Here, for simplicity's sake, I am just using the values as given in the input.

PROBLEM

I want to store the random numbers (BUR_mc) generated above for each of the products in a single data.frame. Now when I access 'sim_BUR.csv', I get a csv file where only the random numbers generated for the last product are stored.

I need something like

product      random no
product1     ...
product1     ...
.
product1     ...     # (There will be 100 such values for product1)
product2     ...
product2     ...
product12    ..
product12    ...

Thanking you in advance

Vincy
[R] Maintaining Column names while writing csv file.
Dear R helpers,

I have one trivial problem while writing an output file in csv format.

I have two data frames, say df1 and df2, which I am reading from two different csv files.

df1 has column names date, r1, r2, r3 while df2 has column names date, 1w, 2w.

(The dates in both data frames are identical, and the number of elements in each column is equal, say 10.)

I merge these data frames as

df_new = merge(df1, df2, by = "date", all = T)

So my new data frame has columns

date, r1, r2, r3, 1w, 2w

However, if I try to write this new data frame as a csv file as

write.csv(data.frame(df_new), 'df_new.csv', row.names = FALSE)

the file gets written, but when I open the csv file, the column names displayed are

date, r1, r2, r3, X1w, X2w

My original output file has about 200 columns, so it is not possible to write the column names individually. Also, I can't change the column names since I am receiving these files from an external source and need to maintain the column names.

Kindly guide

Regards

Vincy
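A minimal sketch of one likely cause and a way around it. By default both read.csv() and data.frame() pass column names through make.names(), which prefixes names that start with a digit with "X". The write.csv() call in the post wraps the merged frame in data.frame() again, which is enough to turn 1w into X1w even if the names were intact before. The toy frames and tempdir() below are assumptions for illustration only.

```r
# Hypothetical stand-ins for the two input files; check.names = FALSE keeps
# names such as "1w" exactly as given (read.csv() accepts the same argument).
df1 <- data.frame(date = c("2012-01-01", "2012-01-02"), r1 = c(0.1, 0.2),
                  check.names = FALSE)
df2 <- data.frame(date = c("2012-01-01", "2012-01-02"), `1w` = c(1.5, 1.6),
                  `2w` = c(2.5, 2.6), check.names = FALSE)

df_new <- merge(df1, df2, by = "date", all = TRUE)
names(df_new)               # names as they stand after the merge
names(data.frame(df_new))   # re-wrapping runs make.names(): "1w" becomes "X1w"

# Write the merged frame directly, without the data.frame() wrapper.
out_file <- file.path(tempdir(), "df_new.csv")
write.csv(df_new, out_file, row.names = FALSE)
header <- readLines(out_file, n = 1)
header
```

If the names are already X1w right after reading the source files, the place to intervene is the read step: read.csv(file, check.names = FALSE).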
[R] How to have original (name) order after melt and cast command
Dear R helpers,

I have a data.frame as given below -

dat1 = data.frame(
  date = as.Date(c("3/30/12", "3/29/12", "3/28/12", "3/27/12", "3/26/12", "3/23/12", "3/22/12", "3/21/12", "3/20/12",
                   "3/30/12", "3/29/12", "3/28/12", "3/27/12", "3/26/12", "3/23/12", "3/22/12", "3/21/12", "3/20/12",
                   "3/30/12", "3/29/12", "3/28/12", "3/27/12", "3/26/12", "3/23/12", "3/22/12", "3/21/12", "3/20/12"),
                 format = "%m/%d/%y"),
  name = as.character(c("xyz", "xyz", "xyz", "xyz", "xyz", "xyz", "xyz", "xyz", "xyz",
                        "abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc",
                        "lmn", "lmn", "lmn", "lmn", "lmn", "lmn", "lmn", "lmn", "lmn")),
  rate = c(c(0.065550707, 0.001825007, 0.054441969, 0.020810572, 0.073430586, 0.037299722, 0.099807733, 0.042072817, 0.099487289,
             5.550737022, 4.877620777, 5.462477493, 4.972518082, 5.01495407, 5.820459609, 5.403881954, 5.009506516, 4.807763909,
             10.11885434, 10.1856975, 10.04976806, 10.15428632, 10.20399335, 10.22966704, 10.20967742, 10.22927793, 10.02439192)))

> dat1
         date name         rate
1  2012-03-30  xyz  0.065550707
2  2012-03-29  xyz  0.001825007
3  2012-03-28  xyz  0.054441969
4  2012-03-27  xyz  0.020810572
5  2012-03-26  xyz  0.073430586
6  2012-03-23  xyz  0.037299722
7  2012-03-22  xyz  0.099807733
8  2012-03-21  xyz  0.042072817
9  2012-03-20  xyz  0.099487289
10 2012-03-30  abc  5.550737022
11 2012-03-29  abc  4.877620777
12 2012-03-28  abc  5.462477493
13 2012-03-27  abc  4.972518082
14 2012-03-26  abc  5.014954070
15 2012-03-23  abc  5.820459609
16 2012-03-22  abc  5.403881954
17 2012-03-21  abc  5.009506516
18 2012-03-20  abc  4.807763909
19 2012-03-30  lmn 10.118854340
20 2012-03-29  lmn 10.185697500
21 2012-03-28  lmn 10.049768060
22 2012-03-27  lmn 10.154286320
23 2012-03-26  lmn 10.203993350
24 2012-03-23  lmn 10.229667040
25 2012-03-22  lmn 10.209677420
26 2012-03-21  lmn 10.229277930
27 2012-03-20  lmn 10.024391920

attach(dat1)

library(plyr)
library(reshape)

in.melt <- melt(dat1, measure = 'rate')
(df = cast(in.melt, date ~ name))

df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]

> df_sorted
        date      abc      lmn         xyz
9 2012-03-30 5.550737 10.11885 0.065550707
8 2012-03-29 4.877621 10.18570 0.001825007
7 2012-03-28 5.462477 10.04977 0.054441969
6 2012-03-27 4.972518 10.15429 0.020810572
5 2012-03-26 5.014954 10.20399 0.073430586
4 2012-03-23 5.820460 10.22967 0.037299722
3 2012-03-22 5.403882 10.20968 0.099807733
2 2012-03-21 5.009507 10.22928 0.042072817
1 2012-03-20 4.807764 10.02439 0.099487289

My Problem :-

The original data.frame has the names in the order xyz, abc and lmn. However, after the melt and cast commands, the order in df_sorted has changed to abc, lmn and xyz. How do I maintain the original order in df_sorted, i.e. I need

        date         xyz      abc      lmn
9 2012-03-30 0.065550707 5.550737 10.11885
8 2012-03-29 0.001825007 4.877621 10.18570
7 2012-03-28 0.054441969 5.462477 10.04977
6 2012-03-27 0.020810572 4.972518 10.15429
5 2012-03-26 0.073430586 5.014954 10.20399
4 2012-03-23 0.037299722 5.820460 10.22967
3 2012-03-22 0.099807733 5.403882 10.20968
2 2012-03-21 0.042072817 5.009507 10.22928
1 2012-03-20 0.099487289 4.807764 10.02439

Kindly guide

Thanking in advance

Vincy
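A short sketch of one way to get the columns back in first-appearance order: select them by name, using unique() on the original name column. To stay self-contained it uses a cut-down dat1 with rounded rates and base-R tapply() as a stand-in for melt() and cast(); both of those are substitutions for illustration, not the poster's pipeline.

```r
# Cut-down version of dat1: three names in the order xyz, abc, lmn.
dat1 <- data.frame(
  date = as.Date(rep(c("3/30/12", "3/29/12"), times = 3), format = "%m/%d/%y"),
  name = rep(c("xyz", "abc", "lmn"), each = 2),
  rate = c(0.0656, 0.0018, 5.5507, 4.8776, 10.1189, 10.1857),
  stringsAsFactors = FALSE
)

# Base-R stand-in for melt() + cast(): one row per date, one column per name.
# tapply() arranges the name columns alphabetically, as seen in the post.
wide <- tapply(dat1$rate, list(format(dat1$date), dat1$name), sum)
df <- data.frame(date = as.Date(rownames(wide)), wide, row.names = NULL)
df_sorted <- df[order(df$date, decreasing = TRUE), ]

# Put the name columns back in order of first appearance in dat1.
original_order <- unique(as.character(dat1$name))
df_sorted <- df_sorted[, c("date", original_order)]
names(df_sorted)
```

The last two lines are the part that should carry over to the cast() result, on the assumption that its columns are named after the values of name.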
Re: [R] How to have original (name) order after melt and cast command
Dear Mr Rui Barradas,

Thanks a lot for your wonderful suggestion. It worked and will help me immensely in future too. Really heartfelt thanks once again.

Vincy

--- On Wed, 7/18/12, Rui Barradas ruipbarra...@sapo.pt wrote:

From: Rui Barradas ruipbarra...@sapo.pt
Subject: Re: [R] How to have original (name) order after melt and cast command
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Wednesday, July 18, 2012, 11:18 AM

Hello,

Try the following.

# This is your code
df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]

# This is my code
nams <- as.character(unique(dat1$name))
nums <- sapply(nams, function(nm) which(names(df_sorted) %in% nm))
df_sorted[, sort(nums)] <- df_sorted[, nams]
names(df_sorted)[sort(nums)] <- nams
df_sorted

Hope this helps,

Rui Barradas
Re: [R] How to use Sys.time() while writing a csv file name
Dear Mr Newmiller and Mr Oettli,

Thanks a lot for your valuable guidance. Task is done. Thanks again.

Regards

Vincy

--- On Wed, 7/4/12, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

From: Jeff Newmiller jdnew...@dcn.davis.ca.us
Subject: Re: [R] How to use Sys.time() while writing a csv file name
To: Vincy Pyne vincy_p...@yahoo.ca, r-help@r-project.org
Received: Wednesday, July 4, 2012, 5:38 AM

You forgot to follow the posting guide and tell us what operating system you are using (sessionInfo), but I am going to guess that you are on Windows, where the colon (:) is an illegal symbol in filenames. Try formatting the time explicitly in the conversion to character, using the format string definitions found in ?strptime, in a format that doesn't include colons.

---
Jeff Newmiller
DCN:jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
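The advice in the reply can be sketched as follows: format the timestamp explicitly so that it contains neither colons nor spaces, then paste it into the file name. The particular format string and the use of tempdir() are choices made for this illustration.

```r
# Build a timestamp with no colons or spaces; the codes are listed in ?strptime.
stamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
file_name <- paste("recovery_rates_at_", stamp, ".csv", sep = "")

rr <- rbeta(25, 6.14, 8.12)
out_file <- file.path(tempdir(), file_name)   # tempdir() only for the sketch
write.csv(data.frame(recovery_rates = rr), file = out_file, row.names = FALSE)
file_name
```

Two files written within the same second would still share a name with this format, so a run counter or a finer-grained stamp may be needed if the code is called in a tight loop.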
[R] How to use Sys.time() while writing a csv file name
Dear R helpers,

I am using the Beta distribution to generate random numbers (recovery rates in my example). However, each time I need to save these random numbers in csv format. To distinguish the different csv files, one way I thought of was to use Sys.time in the file name. My code is as follows -

# My code

rr = rbeta(25, 6.14, 8.12)
lgd = 1 - mean(rr)

write.csv(data.frame(recovery_rates = rr), file = paste("recovery_rates_at_", Sys.time(), ".csv", sep = ""), row.names = FALSE)

However, I get the following error -

Error in file(file, ifelse(append, "a", "w")) :
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'recovery_rates_at_2012-07-04 1:14:05.csv': Invalid argument

If instead of Sys.time I use some other variable, e.g. lgd, as

write.csv(data.frame(recovery_rates = rr), paste('rates_', lgd, '.csv', sep = ""), row.names = FALSE)

I am able to store these simulated recovery rates in different files. But I need to use Sys.time in my csv file name. (Or is there any other way of writing these csv files so that the files don't get overwritten?)

Kindly guide.

Regards and thanking in advance

Vincy
[R] What's wrong with MEAN?
Dear R helpers,

I have recently installed R version 2.15.0.

I just wanted to calculate mean(16, 18). Surprisingly I got the answer as

> mean(16, 18)
[1] 16

> mean(18, 16)
[1] 18

> mean(14, 11, 17, 9, 5, 18)
[1] 14

So instead of calculating the simple arithmetic average, the mean command is returning the first element as the average. I restarted the machine, changed the machine, but still the reply is the same. I have been using this mean function ever since I started learning R, but this has never happened.

Kindly guide

Vincy
Re: [R] What's wrong with MEAN?
Dear Mr. Thierry,

Thanks a lot for pointing out such a silly mistake from my side. I was simply wondering how come I am not getting such a simple mean.

Thanks again.

Vincy

--- On Tue, 5/22/12, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:

From: ONKELINX, Thierry thierry.onkel...@inbo.be
Subject: RE: [R] What's wrong with MEAN?
To: Vincy Pyne vincy_p...@yahoo.ca, r-help@r-project.org
Received: Tuesday, May 22, 2012, 9:17 AM

You'll need to pass the data as a vector. mean(16, 18) is asking for the mean of 16. 18 is passed to the second argument, which is trim. So you are doing

mean(16, trim = 18)

What you want is

mean(c(16, 18))

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
thierry.onkel...@inbo.be
www.inbo.be
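The point made in the reply can be checked directly. mean() takes the data as its first argument x; further positional arguments are matched to trim and the parameters after it, so they never enter the average.

```r
mean(c(16, 18))                    # the vector is the single x argument
mean(c(14, 11, 17, 9, 5, 18))

# Positional matching turns the second number into `trim`.
mean(16, 18)
mean(16, trim = 18)                # the same call, spelled out
```

With a trim of 0.5 or more, mean() falls back to the median of x, which for the single value 16 is 16 itself. That is why each call in the question returns its first number.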
[R] Multiple Conditional Statement
Dear R helpers,

I have two separate data frames. In one data frame the transaction data is stored, and the other data frame has exchange rates stored, say rate_A and rate_B, where rate_A and rate_B are series of rates. rate_A and rate_B are properly defined and I am reading them through the appropriate data frame. (Actually I have different datasets; to keep things simple, I am defining it as above.)

I have BUY or SELL transactions (defined under the column head Type in the transactions data frame) and depending on the type of transaction, I need to define the rates. So if the type is BUY, rate_1 = rate_A and rate_2 = rate_B, and if the type is SELL, rate_1 = rate_B and rate_2 = rate_A.

To begin with I have only one transaction in my data frame (I am not aware if it is a BUY or SELL transaction). Thus, I tried

if(Type == "Buy") {rate_1 = rate_A rate_2 = rate_B} else {rate_1 = rate_B rate_2 = rate_A}

I get the following error

Error in rate_A rate_2 = rate_B could not find function -

How do I define multiple conditional statements?

Kindly guide.

Vincy
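A minimal sketch of the likely fix: R needs a line break or a semicolon between two statements inside the braces, so each assignment goes on its own line. The rate values below are made up for illustration. Note also that the comparison is case-sensitive, so the literal has to match the stored value ("BUY" as described in the post, not "Buy").

```r
# Hypothetical values; in the post these come from the two data frames.
Type   <- "BUY"
rate_A <- c(1.10, 1.12, 1.15)
rate_B <- c(0.90, 0.89, 0.87)

# One assignment per line inside the braces (or separate them with ";").
if (Type == "BUY") {
  rate_1 <- rate_A
  rate_2 <- rate_B
} else {
  rate_1 <- rate_B
  rate_2 <- rate_A
}
```

if() expects a single TRUE or FALSE, so once the transactions frame holds more than one row the test would have to be applied row by row, for example inside a loop over the rows.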
[R] Matrix multiplication by multiple constants
Dear R helpers

Suppose

x <- c(1:3)
y <- matrix(1:12, ncol = 3, nrow = 4)

> y
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

I wish to multiply the 1st column of y by the first element of x, i.e. 1, the 2nd column of y by the 2nd element of x, i.e. 2, and so on. Thus the resultant matrix should be like

> z
     [,1] [,2] [,3]
[1,]    1   10   27
[2,]    2   12   30
[3,]    3   14   33
[4,]    4   16   36

When I tried simple multiplication like x*y, y is getting multiplied column-wise

> x*z
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    4   12   20
[3,]    9   21   33
[4,]   16   32   48

Kindly guide

Regards

Vincy
Re: [R] Matrix multiplication by multiple constants
Dear Mr. Dimitris Rizopoulos,

Thanks a lot for your great help. It worked nicely. I couldn't have figured it out.

Thanks again.

Regards

Vincy

--- On Fri, 4/20/12, Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote:

From: Dimitris Rizopoulos d.rizopou...@erasmusmc.nl
Subject: Re: [R] Matrix multiplication by multiple constants
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, April 20, 2012, 8:57 AM

try this:

x <- 1:3
y <- matrix(1:12, ncol = 3, nrow = 4)

y * rep(x, each = nrow(y))

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
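For reference, the suggestion from the reply is shown below next to three other base-R forms that are expected to give the same matrix. Element-wise multiplication recycles the vector down the columns of the matrix, which is why the vector has to be stretched (or the matrix transposed) so that each element of x lines up with one whole column.

```r
x <- 1:3
y <- matrix(1:12, ncol = 3, nrow = 4)

z1 <- y * rep(x, each = nrow(y))   # the suggestion from the reply
z2 <- sweep(y, 2, x, "*")          # multiply along columns (MARGIN = 2)
z3 <- t(t(y) * x)                  # transpose so recycling lines up with columns
z4 <- y %*% diag(x)                # post-multiply by a diagonal matrix
z1
```

Which form to prefer is mostly a matter of readability; sweep() states the intent most directly, while the diag() version builds an extra square matrix and is the least economical for wide inputs.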
[R] Constructing a data.frame from csv files
Dear R helpers,

Following is my R code, where I am trying to calculate returns and then create a data.frame. Since I do not know in advance how many instruments I will be dealing with, I have constructed a function. My R code is as follows -

library(plyr)

mydata <- data.frame(
  instru_name = c("instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A", "instru_A",
                  "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B", "instru_B"),
  date = c("10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", "6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", "30-Dec-11", "29-Dec-11", "28-Dec-11",
           "10-Jan-12", "9-Jan-12", "8-Jan-12", "7-Jan-12", "6-Jan-12", "5-Jan-12", "4-Jan-12", "3-Jan-12", "2-Jan-12", "1-Jan-12", "31-Dec-11", "30-Dec-11", "29-Dec-11", "28-Dec-11"),
  price = c(11.9, 10.5, 13, 14.5, 14.4, 14.8, 10.1, 12, 14.3, 10.7, 11.2, 10.2, 10.2, 10.8,
            41.9, 40.5, 43, 44.5, 44.4, 48.8, 42.1, 44, 46.3, 48.7, 46.2, 44.2, 42.2, 40.8))

attach(mydata)

opt_return_volatilty = function(price, instru_name)
{
  price_returns = matrix(data = NA, nrow = (length(price)-1), ncol = 1)
  for (i in (1:(length(price)-1)))
  {
    price_returns[i] = log(price[i]/price[i+1])
  }
  volatility = sd(price_returns)
  entity_returns = unique(instru_name)
  colnames(price_returns) = entity_returns
  write.csv(price_returns, file = paste(entity_returns, ".csv", sep = ""), row.names = FALSE)
  return(data.frame(list(volatility = volatility)))
}

entity_volatility <- ddply(.data = mydata, .variables = "instru_name", .fun = function(x)
  opt_return_volatilty(price = x$price, instru_name = x$instru_name))

> entity_volatility
  instru_name volatility
1    instru_A 0.17746897
2    instru_B 0.06565341

fileNames <- list.files(pattern = "instru.*.csv")

> fileNames
[1] "instru_A.csv" "instru_B.csv"

# _

# MY QUERY

# I need to construct the data frame consisting of all the returns, i.e. I need to have
# a data.frame like

    instru_A      instru_B
 0.125163143   0.033983853
-0.2135741    -0.059898142
-0.109199292  -0.034289073
 0.006920443   0.00224972
-0.027398974  -0.094490843

I am using the following code

input <- do.call(rbind, lapply(fileNames, function(.name)
{
  .data <- read.csv(.name, header = TRUE, as.is = TRUE)
  .data$file <- .name
  .data
}))

# I get following error.

Error in match.names(clabs, names(xi)) :
  names do not match previous names

Kindly guide

Regards

Vincy
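Since the desired layout has one column per instrument, one option is to bind the files side by side with cbind() rather than stacking them with rbind(). The sketch below is self-contained: it writes two small files with made-up returns into a temporary directory (the directory name and values are assumptions for illustration) and reads them back.

```r
# Hypothetical returns for two instruments, written one file each, mirroring
# what the function in the post leaves on disk.
out_dir <- file.path(tempdir(), "returns_sketch")
dir.create(out_dir, showWarnings = FALSE)
returns <- list(instru_A = c(0.125, -0.214, -0.109),
                instru_B = c(0.034, -0.060, -0.034))
for (nm in names(returns)) {
  one <- data.frame(returns[[nm]])
  names(one) <- nm
  write.csv(one, file.path(out_dir, paste(nm, ".csv", sep = "")), row.names = FALSE)
}

# Each file has a differently named column, so rbind() cannot stack them;
# cbind() places them side by side (every file must have the same row count).
fileNames <- list.files(out_dir, pattern = "^instru.*\\.csv$", full.names = TRUE)
all_returns <- do.call(cbind, lapply(fileNames, read.csv, header = TRUE))
all_returns
```

If the instruments can have different numbers of observations, cbind() will fail or recycle, and a long format (one returns column plus an instrument column, stacked with rbind()) is the safer target.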
Re: [R] Constructing a data.frame from csv files
Dear Sir, Thanks a lot for your guidance. I have understood my mistake. It was naming the columns viz. colnames(price_returns) = entity_returns which was creating the problems. Code is running excellently once I got rid of this particular line. I will use melt from reshape etc to get the required data.frame. Thanks again. With warm regards Vincy --- On Wed, 1/11/12, jim holtman jholt...@gmail.com wrote: From: jim holtman jholt...@gmail.com Subject: Re: [R] Constructing a data.frame from csv files To: Vincy Pyne vincy_p...@yahoo.ca Cc: r-help@r-project.org Received: Wednesday, January 11, 2012, 1:49 PM The error message says it all: the dataframes that you are creating, and then trying to 'rbind', do not have the same columns. You need to at least show what the first couple of lines of each of you input files are, or output the names of the columns as you are reading the files. This is some elementary debugging that you will have to learn. On Wed, Jan 11, 2012 at 7:38 AM, Vincy Pyne vincy_p...@yahoo.ca wrote: Dear R helpers, Following is my R code where I am trying to calculate returns and then trying to create a data.frame. Since, I am not aware how many instruments I will be dealing so I have constructed a function. 
My R code is as follows -

library(plyr)

mydata <- data.frame(
  instru_name = c("instru_A","instru_A","instru_A","instru_A","instru_A","instru_A","instru_A",
                  "instru_A","instru_A","instru_A","instru_A","instru_A","instru_A","instru_A",
                  "instru_B","instru_B","instru_B","instru_B","instru_B","instru_B","instru_B",
                  "instru_B","instru_B","instru_B","instru_B","instru_B","instru_B","instru_B"),
  date = c("10-Jan-12","9-Jan-12","8-Jan-12","7-Jan-12","6-Jan-12","5-Jan-12","4-Jan-12",
           "3-Jan-12","2-Jan-12","1-Jan-12","31-Dec-11","30-Dec-11","29-Dec-11","28-Dec-11",
           "10-Jan-12","9-Jan-12","8-Jan-12","7-Jan-12","6-Jan-12","5-Jan-12","4-Jan-12",
           "3-Jan-12","2-Jan-12","1-Jan-12","31-Dec-11","30-Dec-11","29-Dec-11","28-Dec-11"),
  price = c(11.9,10.5,13,14.5,14.4,14.8,10.1,12,14.3,10.7,11.2,10.2,10.2,10.8,
            41.9,40.5,43,44.5,44.4,48.8,42.1,44,46.3,48.7,46.2,44.2,42.2,40.8))

attach(mydata)

opt_return_volatilty = function(price, instru_name)
{
  price_returns = matrix(data = NA, nrow = (length(price)-1), ncol = 1)
  for (i in (1:(length(price)-1)))
  {
    price_returns[i] = log(price[i]/price[i+1])
  }
  volatility = sd(price_returns)
  entity_returns = unique(instru_name)
  colnames(price_returns) = entity_returns
  write.csv(price_returns, file = paste(entity_returns, ".csv", sep = ""), row.names = FALSE)
  return(data.frame(list(volatility = volatility)))
}

entity_volatility <- ddply(.data = mydata, .variables = "instru_name",
                           .fun = function(x) opt_return_volatilty(price = x$price, instru_name = x$instru_name))

entity_volatility
  instru_name volatility
1    instru_A 0.17746897
2    instru_B 0.06565341

fileNames <- list.files(pattern = "instru.*.csv")
fileNames
[1] "instru_A.csv" "instru_B.csv"

# MY QUERY

I need to construct a data frame consisting of all the returns, i.e. I need a data.frame like

instru_A       instru_B
 0.125163143    0.033983853
-0.2135741     -0.059898142
-0.109199292   -0.034289073
 0.006920443    0.00224972
-0.027398974   -0.094490843

I am using the following code

input <- do.call(rbind, lapply(fileNames, function(.name) {
  .data <- read.csv(.name, header = TRUE, as.is = TRUE)
  .data$file <- .name
  .data
}))

# I get the following error.

Error in match.names(clabs, names(xi)) :
  names do not match previous names

Kindly guide

Regards
Vincy

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
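A possible reading of the error: rbind() on data frames needs matching column names, and each file has its own column name (instru_A, instru_B). Since the wanted result has one column per instrument, binding column-wise seems closer to the goal. A minimal sketch, assuming every file holds a single column with the same number of rows; the temporary directory and the toy values below are made up for illustration.

```r
# Toy stand-ins for the files written by opt_return_volatilty()
# (values are made up; the real files hold the log returns).
out_dir <- tempfile("returns_")
dir.create(out_dir)
write.csv(data.frame(instru_A = c(0.12, -0.21, -0.11)),
          file.path(out_dir, "instru_A.csv"), row.names = FALSE)
write.csv(data.frame(instru_B = c(0.03, -0.06, -0.03)),
          file.path(out_dir, "instru_B.csv"), row.names = FALSE)

fileNames <- list.files(out_dir, pattern = "^instru.*\\.csv$", full.names = TRUE)

# Each file contributes one column, so bind column-wise rather than row-wise.
all_returns <- do.call(cbind, lapply(fileNames, read.csv, header = TRUE))
all_returns
```

If the instruments had different numbers of observations, cbind() would not be appropriate and the series would need to be aligned by date first.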
[R] KS and AD test for Generalized Pareto and Generalized Extreme Value
Dear R helpers,

I need to use the KS and AD tests for the Generalized Pareto and Generalized Extreme Value distributions. E.g. if I need to use KS for Weibull, I have the syntax

ks.test(x.wei, "pweibull", shape=2, scale=1)

Similarly, for AD I use

ad.test(x, distr.fun, ...)

My problem is that, for given data, I have estimated the parameters of the GPD and GEV using lmom, but I am not able to find out the distribution name I should use for these distributions in these tests. E.g. for gamma, I can use pgamma etc. What distribution name should I use for GPD and GEV and, for that matter, where can I find the distribution names I can use for the KS and AD tests?

Thanks in advance

Regards
Vincy
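For what it is worth, ks.test() accepts either the name of a cumulative distribution function or the function itself, so any CDF can be supplied. A minimal sketch for the GPD, assuming the (location mu, scale sigma, shape xi) parameterization; pgpd_sketch() is a hypothetical helper written here, not a function from lmom, and the data are simulated. Packages that fit these distributions usually ship their own CDF functions, possibly with a different sign convention for the shape, so their help pages should be checked before mixing parameter estimates with this helper.

```r
# Hypothetical helper (not from any package): CDF of the generalized Pareto
# distribution with location mu, scale sigma and shape xi.
pgpd_sketch <- function(q, mu = 0, sigma = 1, xi = 0) {
  z <- pmax((q - mu) / sigma, 0)
  if (abs(xi) < 1e-12) {
    1 - exp(-z)                          # exponential limit as xi -> 0
  } else {
    1 - pmax(1 + xi * z, 0)^(-1 / xi)    # upper end point is finite when xi < 0
  }
}

# Simulated GPD data via the inverse CDF (mu = 0, sigma = 2, xi = 0.1).
set.seed(1)
u <- runif(200)
x <- 2 * (u^(-0.1) - 1) / 0.1

# Pass the CDF itself; its parameters go through "...".
res <- ks.test(x, pgpd_sketch, mu = 0, sigma = 2, xi = 0.1)
res$p.value
```

One caveat: the usual KS p-value assumes the parameters were fixed in advance, so it is only approximate when they were estimated from the same data.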
[R] Matching two datasets and updating values
Dear R forum

I have two data frames, with category and cat_val forming one data frame and cust and cust_category forming another.

category = c("C", "D", "B", "A")
cat_val = c(0.10, 0.25, 0.40, 0.54)
cust = c("cust_1", "cust_2", "cust_3", "cust_4", "cust_5", "cust_6", "cust_7", "cust_8", "cust_9", "cust_10")
cust_category = c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

Thus, I have

category
[1] "C" "D" "B" "A"
cat_val
[1] 0.10 0.25 0.40 0.54
cust
 [1] "cust_1"  "cust_2"  "cust_3"  "cust_4"  "cust_5"
 [6] "cust_6"  "cust_7"  "cust_8"  "cust_9"  "cust_10"
cust_category
 [1] "C" "A" "A" "A" "A" "C" "D" "B" "B" "D"

My problem is to match 'cust_category' with 'category' and accordingly select the value assigned to that category. In other words, the 1st element of cust_category is C, so it should select the value 0.10; the second element is A, so it should assign the value 0.54 against it. So effectively I should get

cust      cust_category   cat_val
cust_1    C               0.10
cust_2    A               0.54
cust_3    A               0.54
...
cust_10   D               0.25

Kindly guide

Regards
Vincy
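One way to do this kind of lookup is match(), which returns the position of each customer's category within 'category'; indexing cat_val by those positions gives the values. A minimal sketch using the vectors from the post (the name 'result' is introduced here for illustration):

```r
category      <- c("C", "D", "B", "A")
cat_val       <- c(0.10, 0.25, 0.40, 0.54)
cust          <- paste0("cust_", 1:10)
cust_category <- c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

# match() gives, for each customer, the position of its category in `category`;
# indexing cat_val with those positions looks up the assigned value.
result <- data.frame(cust, cust_category,
                     cat_val = cat_val[match(cust_category, category)])
result
```

merge() on the two data frames would also work, but it does not by default preserve the original customer order, which is why match() is used in this sketch.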
[R] Question regarding dnorm()
Hi,

I have one basic doubt. Suppose X ~ N(50,10). I need to calculate the probability that X = 50.

dnorm(50, 50, 10)

gives me

[1] 0.03989423

My understanding (which is a bit statistical, or maybe mathematical) is that on a continuous scale, probabilities of the type P(X = .) are nothing but 1/Infinity, i.e. 0. So as per my understanding P(X = 50) should be 0, but even Excel gives 0.03989422. Obviously my understanding is wrong. If I put value of x = 0 in the normal density function, I do get 0.03989422.

My confusion is this: if on a continuous scale the probability P(X = x) doesn't make sense, 0.03989423 is still too significant to neglect.

Please clarify

Regards
Vincy
Re: [R] Question regarding dnorm()
Dear Sirs,

Thanks a lot for your explanation. This was such a huge conceptual error on my part. I never realized that probability and density are two different things. I used to feel I had started understanding stats a bit. This explanation has changed everything again. Thanks a lot again, Mr Ellison and Mr Mark, for your guidance.

Regards
Vincy

--- On Wed, 9/14/11, S Ellison s.elli...@lgcgroup.com wrote:

> You have calculated density, not probability.
> Probability is in [0,1]; density is in [0,Inf)
>
> And for a continuous variable, density cannot be interpreted as a probability or a frequency.
>
> S
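The distinction in the reply can be checked numerically. A minimal sketch, assuming (as in the dnorm() call of the original question) that 10 is the standard deviation: probabilities come from the distribution function pnorm() over an interval, which is the area under the density curve, and they shrink towards 0 as the interval around 50 narrows.

```r
mu <- 50
sigma <- 10

dnorm(50, mu, sigma)   # height of the density curve at 50, not a probability

# Probability of an interval: difference of the CDF at its end points.
p_interval <- pnorm(51, mu, sigma) - pnorm(49, mu, sigma)

# The same probability as the area under the density between 49 and 51.
p_area <- integrate(dnorm, lower = 49, upper = 51, mean = mu, sd = sigma)$value

# Narrower and narrower intervals around 50: the probability tends to 0.
widths <- c(1, 0.1, 0.001)
p_shrink <- pnorm(50 + widths / 2, mu, sigma) - pnorm(50 - widths / 2, mu, sigma)
p_shrink

# A density can exceed 1, which a probability never can.
dens_narrow <- dnorm(0, mean = 0, sd = 0.1)
dens_narrow
```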
[R] Autocorrelation using acf
Dear R list

As suggested by Prof Brian Ripley, I have tried to read the acf literature. The main problem is that I am not a statistician and hence have some difficulty understanding the concepts immediately. I came across one paper (http://www.stat.nus.edu.sg/~staxyc/REG32.pdf) on auto-correlation giving the methodology. As per that paper, the auto-correlation is arrived at as follows.

y = c(15.91,9.80,17.16,16.68,15.53,22.66,31.01,8.62,45.82,10.97,45.46,28.69,36.75,37.75,
      41.18,42.67,46.05,43.70,53.08,47.56)
t = c(1:20) # defining time variable

Fitting y = a + bt + e, I get the estimates of a and b as a = 9.12 and b = 2.07. Using these estimates I obtain

y_fit = c(11.19,13.26,15.33,17.40,19.47,21.54,23.61,25.68,27.75,29.82,31.89,33.96,
          36.03,38.10,40.17,42.24,44.31,46.38,48.45,50.52) # these are fitted values

e_t = (y - y_fit) # difference between the observed y and the corresponding fitted value

e_t
 [1]   4.72  -3.46   1.83  -0.72  -3.94   1.12   7.40
 [8] -17.06  18.07 -18.85  13.57  -5.27   0.72  -0.35
[15]   1.01   0.43   1.74  -2.68   4.63  -2.96

# We define

e_t1 = c(-3.46,1.83,-0.72,-3.94,1.12,7.40,-17.06,18.07,-18.85,13.57,-5.27,0.72,-0.35,1.01,
         0.43,1.74,-2.68,4.63,-2.96) # 1st element of e_t deleted
e_t2 = c(4.72,-3.46,1.83,-0.72,-3.94,1.12,7.40,-17.06,18.07,-18.85,13.57,-5.27,0.72,-0.35,
         1.01,0.43,1.74,-2.68,4.63) # original series with last element deleted

cor(e_t1, e_t2)
[1] -0.8732316

However, if I use

acf(y, 1)

Autocorrelations of series 'y', by lag

    0     1
1.000 0.343

I am simply not able to figure out how acf is used.

Thanking you in advance.

Regards
Vincy

--- On Wed, 8/24/11, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

> Your understanding is wrong. For a start, there is no function acf() in package tseries: it is in stats. And the autocorrelation at lag one is not the correlation omitting the first and last values: it uses the mean and variance estimated from the whole series and divisor n.
>
> Have you looked at the reference given on ?acf ? As the help says
>
>     (This contains the exact definitions used.)
>
> Neither the R help pages nor R-help are intended as tutorials in statistics.
>
> On Wed, 24 Aug 2011, Vincy Pyne wrote:
>
>> Dear R list
>>
>> I am trying to understand the auto-correlation concept. Auto-correlation is the self-correlation of random variable X with a certain time lag of say t. The article http://www.mit.tut.fi/MIT-3010/luentokalvot/lk10-11/MDA_lecture16_11.pdf (Page no. 9 and 10) gives the methodology as under.
>
> But that is not the definitive reference, and no, it doesn't (and what it does give is not the conventional definition in the time series literature).
>
>> Suppose you have a time series observations as say
>>
>> X = c(44,41,46,49,49,50,40,44,49,41)
>>
>> # For autocorrelation with time lag of 1 we define
>>
>> A = c(41,46,49,49,50,40,44,49,41) # first element of X not considered
>> B = c(44,41,46,49,49,50,40,44,49) # Last element of X not considered
>>
>> cor(A,B)
>> [1] -0.02581234
>>
>> However, if I try the acf command using library tseries I get
>>
>> acf(X, 1)
>>
>> Autocorrelations of series 'X', by lag
>>
>>      0      1
>>  1.000 -0.019
>>
>> So by usual correlation command (where same random variable X is converted into two series with a time lag of 1), I obtain auto-correlation as -0.02581234 and by acf command I get auto-correlation = -0.019 (for time lag of 1).
>>
>> I am not able to figure out where I am going wrong or is it my understanding of auto-correlation procedure is wrong?
>>
>> Will be grateful if someone guides.
>>
>> Vincy
>
> --
> Brian D. Ripley,                  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
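One thing that stands out in the follow-up is that cor() was applied to the residuals e_t, while acf() was applied to the raw series y, so the two numbers describe different series. A minimal sketch, assuming the intent is the lag-1 autocorrelation of the residuals from the straight-line fit; lm() is used here in place of the hand-typed fitted values, and the object names are introduced for illustration.

```r
y <- c(15.91, 9.80, 17.16, 16.68, 15.53, 22.66, 31.01, 8.62, 45.82, 10.97,
       45.46, 28.69, 36.75, 37.75, 41.18, 42.67, 46.05, 43.70, 53.08, 47.56)
t <- 1:20

fit <- lm(y ~ t)        # y = a + b t + e
coef(fit)               # close to the a = 9.12, b = 2.07 quoted in the post
e <- residuals(fit)

# Lag-1 autocorrelation of the residuals versus that of the raw series.
r1_resid <- acf(e, lag.max = 1, plot = FALSE)$acf[2]
r1_y     <- acf(y, lag.max = 1, plot = FALSE)$acf[2]
c(residuals = r1_resid, raw_y = r1_y)
```

The residual value will still not equal cor(e_t1, e_t2) exactly, for the reason given in the reply above: acf() uses the whole-series mean and variance.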
[R] Autocorrelation using library(tseries)
Dear R list

I am trying to understand the auto-correlation concept. Auto-correlation is the self-correlation of random variable X with a certain time lag of say t.

The article http://www.mit.tut.fi/MIT-3010/luentokalvot/lk10-11/MDA_lecture16_11.pdf (Page no. 9 and 10) gives the methodology as under.

Suppose you have a time series observations as say

X = c(44,41,46,49,49,50,40,44,49,41)

# For autocorrelation with time lag of 1 we define

A = c(41,46,49,49,50,40,44,49,41) # first element of X not considered
B = c(44,41,46,49,49,50,40,44,49) # Last element of X not considered

cor(A,B)
[1] -0.02581234

However, if I try the acf command using library tseries I get

acf(X, 1)

Autocorrelations of series 'X', by lag

     0      1
 1.000 -0.019

So by usual correlation command (where same random variable X is converted into two series with a time lag of 1), I obtain auto-correlation as -0.02581234 and by acf command I get auto-correlation = -0.019 (for time lag of 1).

I am not able to figure out where I am going wrong or is it my understanding of auto-correlation procedure is wrong?

Will be grateful if someone guides.

Vincy
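The two numbers differ because of how the mean and variance are estimated. cor(A, B) centres and scales A and B separately, each on 9 values, whereas the sample autocorrelation conventionally used in time series work centres both lagged copies on the mean of all 10 values and divides by the whole-series sum of squares. A minimal sketch of that calculation (variable names are introduced here):

```r
X <- c(44, 41, 46, 49, 49, 50, 40, 44, 49, 41)
n <- length(X)
m <- mean(X)                      # one mean, from the whole series

# Lag-1 sample autocorrelation: lagged cross-products over total sum of squares.
r1_manual <- sum((X[-1] - m) * (X[-n] - m)) / sum((X - m)^2)

r1_acf <- acf(X, lag.max = 1, plot = FALSE)$acf[2]
r1_cor <- cor(X[-1], X[-n])       # the cor(A, B) approach from the post

c(manual = r1_manual, acf = r1_acf, cor = r1_cor)
```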
[R] Correlation discrepancy
Dear R list,

I have one very elementary question regarding correlation between two variables.

x = c(44,46,46,47,45,43,45,44)
y = c(44,43,41,41,46,48,44,43)

cov(x, y)
[1] -2.428571

However, if I try to calculate the covariance using the formula

covariance = sum((x-mean(x))*(y-mean(y)))/8 # no of paired obs. = 8

or

covariance = sum(x*y)/8-(mean(x)*mean(y))

it gives covariance = 2.125

I am not able to figure out where I am going wrong w.r.t. the covariance formula.

Kindly guide.

Regards
Vincy
Re: [R] Correlation discrepancy
Dear Mr. Dimitris and Mr Harding,

Thanks a lot for your guidance. It will be interesting to find out how Excel deals with this formula. I will try it. Thanks again.

Regards
Ashok

--- On Tue, 8/23/11, ted.hard...@wlandres.net wrote:

> In addition, something has gone wrong, Vincy, with your data x,y between evaluating cov(x,y) and evaluating your explicit formula. If I repeat your commands:
>
>   x = c(44,46,46,47,45,43,45,44)
>   y = c(44,43,41,41,46,48,44,43)
>   cov(x, y)
>   # [1] -2.428571
>   sum((x-mean(x))*(y-mean(y)))/8
>   # [1] -2.125
>
> which has the right sign and, when changed to incorporate the correct denominator (n-1 = 7) as suggested by Dimitris:
>
>   sum((x-mean(x))*(y-mean(y)))/7
>   # [1] -2.428571
>
> gives exact agreement. With regard to your second formula, this should correspondingly be:
>
>   sum(x*y)/7 - (mean(x)*mean(y))*8/7
>   # [1] -2.428571
>
> again agreeing exactly. Your result:
>
>   covariance = sum((x-mean(x))*(y-mean(y)))/8 # no of paired obs. = 8
>   or
>   covariance = sum(x*y)/8-(mean(x)*mean(y))
>   gives covariance = 2.125
>
> agrees in numerical magnitude with the 1/8 form, but has the wrong sign. Or maybe you simply mis-typed -2.125 as 2.125.
>
> Hoping this helps,
> Ted.
>
> On 23-Aug-11 11:25:15, Dimitris Rizopoulos wrote:
>> well, you don't have the correct denominator, i.e., n-1, with n denoting the sample size. Have a look at the *Details* section of the online help file for cov(), and try also
>>
>>   sum((x-mean(x))*(y-mean(y)))/7
>>   cov(x, y)
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>> --
>> Dimitris Rizopoulos
>> Assistant Professor
>> Department of Biostatistics
>> Erasmus University Medical Center
>> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>> Tel: +31/(0)10/7043478
>> Fax: +31/(0)10/7043014
>> Web: http://www.erasmusmc.nl/biostatistiek/
>
> E-Mail: (Ted Harding) ted.hard...@wlandres.net
> Fax-to-email: +44 (0)870 094 0861
> Date: 23-Aug-11 Time: 12:38:36
> -- XFMail --
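On the Excel remark: as far as I know, Excel's COVAR (and COVARIANCE.P) divides by n, while COVARIANCE.S divides by n - 1 like R's cov(). A minimal sketch showing how the two conventions relate; cov_pop() is a hypothetical helper name introduced here, not a base R function.

```r
x <- c(44, 46, 46, 47, 45, 43, 45, 44)
y <- c(44, 43, 41, 41, 46, 48, 44, 43)

# Hypothetical helper: "population" covariance with divisor n,
# obtained by rescaling R's sample covariance (divisor n - 1).
cov_pop <- function(x, y) {
  n <- length(x)
  cov(x, y) * (n - 1) / n
}

cov_sample     <- cov(x, y)       # divisor n - 1
cov_population <- cov_pop(x, y)   # divisor n
c(sample = cov_sample, population = cov_population)
```

The correlation is unaffected by the choice, since the divisor cancels between the covariance and the two standard deviations.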
Re: [R] Correlation discrepancy
Dear Mr Dimitris and Mr Harding,

By mistake I typed my colleague's name (i.e. Ashok) while thanking you. Please excuse me for that.

Regards
Vincy
[R] Meaning of %%
Dear R helpers

This may be a very elementary question, but I couldn't figure out what the operator %% does. E.g.

p <- 100
q <- 200

p%%q
[1] 100

q%%p
[1] 0

Please guide.

Vincy
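For reference, %% is the modulo operator: x %% y is the remainder left after dividing x by y, and %/% is the matching integer division. A minimal sketch with the values from the post plus a couple of extra cases (the extra numbers are made up for illustration):

```r
p <- 100
q <- 200

r_pq <- p %% q    # 100 does not contain 200 even once, so the remainder is 100
r_qp <- q %% p    # 200 = 2 * 100 exactly, so the remainder is 0

rem  <- 17 %% 5   # remainder
quo  <- 17 %/% 5  # integer quotient

# The two operators fit together: x == (x %/% y) * y + (x %% y)
check <- (17 %/% 5) * 5 + (17 %% 5)

# In R the result takes the sign of the divisor.
neg <- -7 %% 3

c(r_pq, r_qp, rem, quo, check, neg)
```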
Re: [R] Meaning of %%
Thanks a lot for the guidance. Before posting this query, I had tried the following things.

?%%
Error: unexpected SPECIAL in "?%%"

??%%
Error: unexpected SPECIAL in "??%%"

I also tried searching on search.r-project.org, but had no luck.

help("%%") GIVES ME

Error in file(out, "wt") : cannot open the connection
In addition: Warning message:
In file(out, "wt") :
  cannot open file 'C:\DOCUME~1\LOCALS~1\Temp\RtmpoCnAxB\Rtxt52325f7': No such file or directory

Regards
Vincy

--- On Wed, 7/13/11, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:

> help("%%")
>
> ir. Thierry Onkelinx
> Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
> team Biometrics & Quality Assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
> tel. + 32 54/436 185
> thierry.onkel...@inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
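The "unexpected SPECIAL" errors arise because an operator is not a syntactic name, so it has to be quoted when passed to the help system. A minimal sketch; the help call is guarded with interactive() only so that the snippet also runs cleanly in a batch session.

```r
# Operators such as %% must be quoted when asking for help.
if (interactive()) {
  help("%%")     # equivalently: ?"%%"  (both should lead to the Arithmetic page)
}

# With backticks, the operator can also be called like any other function.
r <- `%%`(17, 5)
r
```

The "cannot open file" error in the message above looks like a separate problem with the session's temporary directory rather than with the help call itself; that is a guess from the path shown in the warning.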
[R] Generalized Logistic and Richards Curve
Dear R helpers,

I am not a statistician and right now I am struggling with the Richards curve.

Wikipedia says (http://en.wikipedia.org/wiki/Generalised_logistic_function):

"The generalized logistic curve or function, also known as Richards' curve, is a widely-used and flexible sigmoid function for growth modelling, extending the well-known logistic curve."

Now I am confused and would like to know whether the Generalized Logistic distribution as described in the lmomco package is the same as what Wikipedia is describing. In other words, is the Generalized Logistic Function the same as the Generalized Logistic distribution?

I do understand there is a separate R package 'richards' for dealing with the Richards curve.

Kindly guide

Vincy
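As a hedged pointer: the two share a name, but the Wikipedia article describes a growth curve, i.e. a deterministic function of time, whereas a "generalized logistic distribution" is a probability distribution for a random variable, so they need not be the same object. A minimal sketch of the curve only, assuming the parameterization Y(t) = A + (K - A) / (C + Q exp(-B t))^(1/nu) as I recall it from that article; richards_curve() is a hypothetical helper written here, not a function from the 'richards' or lmomco packages.

```r
# Hypothetical helper: generalized logistic (Richards) growth curve.
# A = lower asymptote, K = upper asymptote (when C = 1), B = growth rate,
# nu, Q, C = shape-related constants (parameter names are an assumption).
richards_curve <- function(t, A = 0, K = 1, B = 1, nu = 1, Q = 1, C = 1) {
  A + (K - A) / (C + Q * exp(-B * t))^(1 / nu)
}

t <- seq(-4, 4, by = 0.5)

# With the default constants the curve reduces to the ordinary logistic curve.
standard <- richards_curve(t)

# Changing nu makes the curve asymmetric around its inflection point.
skewed <- richards_curve(t, nu = 0.3)

round(cbind(t, standard, skewed), 3)
```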
[R] Value of 'pi'
Dear R helpers,

I have one basic doubt about the value of pi. In school, we learned that pi = 22/7 (which is 3.142857). However, if I type pi in R, I get pi = 3.141593. So which value of pi should be considered?

Regards
Vincy
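For context, 22/7 is only a convenient rational approximation; R's built-in constant pi holds the value to double precision, and the console merely prints 7 significant digits by default. A minimal sketch comparing two common fractions against it (the object names are introduced here):

```r
# R prints 7 significant digits by default; more are stored.
print(pi, digits = 16)

# Two classic rational approximations and their absolute errors.
approx  <- c("22/7" = 22 / 7, "355/113" = 355 / 113)
abs_err <- abs(approx - pi)
abs_err
```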
Re: [R] Value of 'pi'
That's the beauty of this R forum. It is full of knowledgeable wizards, and the replies and related discussions prompted by even a simple, harmless question like this enrich us tremendously.

Thanks a lot for all your comments. I am sticking to the value of 'pi' as provided in R, as I am a hardcore R disciple.

Regards
Vincy

--- On Mon, 5/30/11, ted.hard...@wlandres.net wrote:

> On 30-May-11 07:06:57, Peter Langfelder wrote:
>> On Sun, May 29, 2011 at 11:53 PM, bill.venab...@csiro.au wrote:
>>> There is an urban legend that says Indiana passed a law implying pi = 3.
>>> (Because it says so in the bible...)
>>
>> Apparently the Fortran language has a DATA statement just for this purpose. This is allegedly a quote from an early Fortran manual:
>>
>> The primary purpose of the DATA statement is to give names to constants; instead of referring to pi as 3.141592653589793 at every appearance, the variable PI can be given that value with a DATA statement and used instead of the longer form of the constant. This also simplifies modifying the program, should the value of pi change.
>>
>> Peter
>
> My take on this discussion: Take a nice-looking pie, say 113355, slice it, and put one half on top of the other. Call it pi:
>
>   pi = 355/113
>
> Compared with pi = 22/7, which is not even pretty, it is also a much closer approximation to the mathematical ideal. To 20 decimal places (using 'bc' here):
>
>   true pi = 3.14159265358979323844
>   355/113 = 3.14159292035398230088
>   22/7    = 3.14285714285714285714
>
> so 355/113 is good to the 6th decimal place (3.141593), while 22/7 breaks down at the 3rd (3.143 instead of 3.142).
>
> In the back of my head is a memory of a passage I read some 50 years ago. I write a paraphrase, since I don't recall the exact words:
>
>   For an engineer, assuming that pi = 3.142 will probably enable him to build a very satisfactory bridge. Assuming that pi = 3.14159265358979323844 will give the circumference of the Earth's orbit to one millionth of a millimetre. For a pure mathematician, however, either assumption leads to the conclusion that 1 = 0. It is necessary to preserve common sense in the application of mathematical deduction.
>
> I suspect (from my context at the time) that it may well have been by J.L. Synge (beautiful writer on theoretical physics, especially Relativity Theory) in one of his several writings on Ballistics. However, the one possibly relevant printed item which I still have from those days:
>
>   K.L. Nielsen and J.L. Synge, On the motion of a spinning shell,
>   Quarterly of Applied Mathematics, 4(3), Oct 1946, 201-226.
>
> discusses a very similar issue, but puts it quite differently. If my quotation above reminds anyone of the original, I would be very grateful to learn of the reference to the source!
>
> With thanks, and Many Happy Approximations to you all!
> Ted.
>
> E-Mail: (Ted Harding) ted.hard...@wlandres.net
> Fax-to-email: +44 (0)870 094 0861
> Date: 30-May-11 Time: 09:52:09
> -- XFMail --
[R] R forum for only Statistics
Hi!

I wish to know if there is any R forum which is meant only for statistics, i.e. where we can clarify our statistics doubts and seek knowledge. I know there are a great many books and internet sites, but the R forum has an altogether different standard and a very high level, and one can learn a lot from it.

Regards
Vincy
[R] Reversing order of vector
Dear R helpers

Suppose I have a vector as

vect1 = as.character(c("ABC", "XYZ", "LMN", "DEF"))

vect1
[1] "ABC" "XYZ" "LMN" "DEF"

I want to reverse the order of this vector, so as to get

vect2 = c("DEF", "LMN", "XYZ", "ABC")

Kindly guide

Regards
Vincy
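Base R has rev() for exactly this; indexing with a descending sequence gives the same result. A minimal sketch (vect2_alt is a name introduced here for the alternative):

```r
vect1 <- c("ABC", "XYZ", "LMN", "DEF")

vect2     <- rev(vect1)                  # reverse the element order
vect2_alt <- vect1[length(vect1):1]      # same thing by explicit indexing

vect2
```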
[R] Ordering data.frame based on class
Dear R helpers

Suppose I have a data.frame as given below -

my_dat = data.frame(class = c("XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "ABC", "ABC", "ABC", "ABC", "ABC"),
                    var1 = c(20, 14, 89, 81, 17, 44, 36, 41, 11, 20),
                    var2 = c(1001, 250, 456, 740, 380, 641, 111, 209, 830, 920))

my_dat
   class var1 var2
1    XYZ   20 1001
2    XYZ   14  250
3    XYZ   89  456
4    XYZ   81  740
5    XYZ   17  380
6    ABC   44  641
7    ABC   36  111
8    ABC   41  209
9    ABC   11  830
10   ABC   20  920

I wish to sort the above data.frame class-wise on var1. Thus, I need to get

class var1 var2
XYZ     14  250
XYZ     17  380
XYZ     20 1001
XYZ     81  740
XYZ     89  456
ABC     11  830
ABC     20  920
ABC     36  111
ABC     41  209
ABC     44  641

Kindly guide

Vincy
[R] Resending the mail - Ordering data.frame based on some class
Dear R helpers

I am resending my mail as the output I desire was not properly visible, and I apologize for that.

Suppose I have a data.frame as given below -

my_dat = data.frame(class = c("XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "ABC", "ABC", "ABC", "ABC", "ABC"),
                    var1 = c(20, 14, 89, 81, 17, 44, 36, 41, 11, 20),
                    var2 = c(1001, 250, 456, 740, 380, 641, 111, 209, 830, 920))

my_dat
   class var1 var2
1    XYZ   20 1001
2    XYZ   14  250
3    XYZ   89  456
4    XYZ   81  740
5    XYZ   17  380
6    ABC   44  641
7    ABC   36  111
8    ABC   41  209
9    ABC   11  830
10   ABC   20  920

I wish to sort the above data.frame class-wise on var1. Thus, I need to get

class var1 var2
XYZ     14  250
XYZ     17  380
XYZ     20 1001
XYZ     81  740
XYZ     89  456
ABC     11  830
ABC     20  920
ABC     36  111
ABC     41  209
ABC     44  641

Regards
Vincy
Re: [R] Ordering data.frame based on class
Dear sir,

Thanks for the great solution.

Regards
Vincy

--- On Mon, 3/28/11, Henrique Dallazuanna www...@gmail.com wrote:

> Try this:
>
>   my_dat[order(my_dat$class, -my_dat$var1, decreasing = TRUE),]
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
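The one-liner above works here because XYZ happens to sort after ABC, so a decreasing sort on class combined with the negated var1 gives the wanted order. A slightly more explicit variant is sketched below, assuming the classes should appear in the order in which they first occur in the data; the last var1 value is taken as 20, as in the printed data frame, and class_order and sorted are names introduced here.

```r
my_dat <- data.frame(
  class = c(rep("XYZ", 5), rep("ABC", 5)),
  var1  = c(20, 14, 89, 81, 17, 44, 36, 41, 11, 20),
  var2  = c(1001, 250, 456, 740, 380, 641, 111, 209, 830, 920)
)

# Assumption: classes keep their order of first appearance (XYZ, then ABC).
class_order <- factor(my_dat$class, levels = unique(as.character(my_dat$class)))

# Sort by class first, then by var1 in increasing order within each class.
sorted <- my_dat[order(class_order, my_dat$var1), ]
sorted
```

This version does not depend on the alphabetical order of the class labels, so it would still behave with three or more classes.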
[R] Appending data to a data.frame and writing a csv
Dear R helpers

    exposure <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
        ead = c(9483.686,5,6843.4968,10509.37125,21297.8905,5,706152.8354,
                62670.5625,687.801995,50641.4875,59227.125,43818.5778,52887.72534,601788.7937,
                56813.14859,4012356.056,1419501.179,210853.4743,749961,6599.0862),
        pd = c(0.0191,0.0050,0.0298,0.0449,0.0442,0.0479,0.0007,0.0203,0.0431,0.0069,
               0.0122,0.0022,0.0016,0.0082,0.0109,0.0008,0.0142,0.0171,0.0276,0.0178),
        lgd = c(0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,
                0.45,0.45,0.45,0.45))

    param <- data.frame(alpha = 0.99, size = 50)   # size is basically no of simulations

    n <- length(exposure$id)
    id <- exposure$id
    ead <- exposure$ead
    lgd <- exposure$lgd
    pd <- exposure$pd
    alpha <- param$alpha
    samplesize <- param$size

    ## generate random numbers s.t. 1 = Default, 0 = no-default.
    L <- matrix(data=NA, nrow=n, ncol=samplesize, byrow=TRUE)
    for(i in 1:n)
        L[i,] <- rbinom(n=samplesize, size=1, prob=exposure$pd[i])

    ## compute for each simulation
    p_loss <- e_loss <- u_loss <- NULL
    for(i in 1:samplesize) {
        defaulting <- subset(data.frame(id=exposure$id, ead=exposure$ead, lgd=exposure$lgd,
                                        pd=exposure$pd, loss=L[,i]), loss==1)
        p_loss[i] <- sum(defaulting$ead * defaulting$lgd)
        e_loss[i] <- sum(defaulting$ead * defaulting$lgd * defaulting$pd)
        u_loss[i] <- sum(sqrt((defaulting$ead*defaulting$lgd)^2*defaulting$pd -
                              (defaulting$ead * defaulting$lgd * defaulting$pd)^2))
        sim_data <- data.frame(sim_no=rep(i,length(defaulting$id)), id=defaulting$id,
                               ead=defaulting$ead, lgd=defaulting$lgd, pd=defaulting$pd)
        write.csv(sim_data, file='sim_data.csv', append=TRUE, row.names=FALSE)
    }

For a given set of 0's and 1's (i.e. for each simulation, and there are 50 simulations), I first filter out all the entries corresponding to 0's, i.e. for a given simulation I need to store ead, lgd and pd pertaining only to the non-zeros, i.e. pertaining to 1.

Thus, for each of these 50 simulations, I need to define a data.frame giving me the filtered ead, lgd and pd, and in the end write a single file sim_data.csv.

I get the following warnings.

    Warning messages:
    1: In write.csv(sim_data, file = "sim_data.csv", append = TRUE, ... :
      attempt to set 'append' ignored
    2: In write.csv(sim_data, file = "sim_data.csv", append = TRUE, ... :
      attempt to set 'append' ignored
    .
    .
    50: In write.csv(sim_data, file = "sim_data.csv", append = TRUE, ... :
      attempt to set 'append' ignored

Kindly guide

Regards

Vincy
Re: [R] Appending data to a data.frame and writing a csv
Dear Mr Ista Zahn,

Thanks a lot for your suggestion. I had also realized that the write.csv command should be outside the loop. First, I need to construct the data.frame. Actually, appending to this data.frame is what is causing me the problem, not writing the csv file. Once the data.frame is generated, writing the csv file outside the loop should not be a problem.

Regards

Vincy

--- On Fri, 3/25/11, Ista Zahn iz...@psych.rochester.edu wrote:

From: Ista Zahn iz...@psych.rochester.edu
Subject: Re: [R] Appending data to a data.frame and writing a csv
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, March 25, 2011, 4:02 PM

Hi Vincy,

Please read the help file, particularly the part about write.csv and write.csv2 where it says "These wrappers are deliberately inflexible: they are designed to ensure that the correct conventions are used to write a valid file. Attempts to change append, col.names, sep, dec or qmethod are ignored, with a warning." Use write.table instead.

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
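One way to do what the follow-up describes (build the data.frame first, write the file once) is to collect each simulation's rows in a list and bind them after the loop. The sketch below is only an illustration: the small `exposure` table, the seed, the four simulations and the temporary output path are all made up, not the poster's data.

```r
set.seed(1)

# Hypothetical, shortened exposure table (not the data from the post)
exposure <- data.frame(id  = 1:5,
                       ead = c(100, 200, 300, 400, 500),
                       lgd = 0.45,
                       pd  = c(0.3, 0.5, 0.2, 0.4, 0.6))
samplesize <- 4   # number of simulations, kept tiny for the example

sims <- vector("list", samplesize)
for (i in seq_len(samplesize)) {
  # TRUE where the exposure defaults in simulation i
  is_default <- rbinom(nrow(exposure), size = 1, prob = exposure$pd) == 1
  defaulting <- exposure[is_default, ]
  # keep the simulation number next to the filtered rows
  sims[[i]] <- data.frame(sim_no = rep(i, nrow(defaulting)), defaulting)
}

# Bind once, after the loop, then write a single file
sim_data <- do.call(rbind, sims)
out_file <- tempfile(fileext = ".csv")   # stand-in for 'sim_data.csv'
write.csv(sim_data, file = out_file, row.names = FALSE)
```

Growing a list and calling `rbind` once avoids both the `append` warning and the repeated copying that comes from growing a data.frame inside the loop. If appending to the file inside the loop is really wanted, `write.table(..., append = TRUE)` is the route the help page points to.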
[R] Correlation for no of variables
Dear R helpers,

Suppose I have stock returns data of say 1500 companies, each for say the last 4 years. Thus I have a matrix of dimension say 1000 * 1500, i.e. 1500 columns representing companies and 1000 rows of their returns.

I need to find the correlation matrix of these 1500 companies.

So I can find the correlation as cor(returns) and expect to get a 1500 * 1500 matrix. However, the process takes a tremendous time. Is there any way of expediting such a process? In reality, I may be dealing with lots of even 5000 stocks and may simulate even 10 stock returns.

Kindly guide.

Vincy
Re: [R] Correlation for no of variables
Thanks Mr Langfelder,

Definitely I will go through the packages you have suggested. Actually, I will be multiplying three matrices of the order (1 x 1500) %*% (1500 x 1500) %*% (1500 x 1), giving me one value at the end. I will be starting my process in a couple of days' time and in between will refer to the packages you have suggested.

Thanks again

Vincy

--- On Mon, 3/21/11, Peter Langfelder peter.langfel...@gmail.com wrote:

From: Peter Langfelder peter.langfel...@gmail.com
Subject: Re: [R] Correlation for no of variables
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Monday, March 21, 2011, 4:50 PM

How long is "tremendous time"? What platform are you on?

If you can compile R against a tuned BLAS library, stats::cor will run faster IF you do not have any missing data. If you do have missing data, you may want to try the package WGCNA (where we work with bigger correlation matrices) that implements a correlation calculation that is faster, particularly if there are few missing data. This will also run faster if you do have a tuned BLAS installed.

HTH,

Peter
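For reference, a small sketch of the calculation described in this thread: time `cor()` on a returns matrix, then reduce the correlation matrix to a single number with a (1 x p) %*% (p x p) %*% (p x 1) product. The sizes, the random returns and the equal weights `w` are assumptions chosen to keep the example quick; it says nothing about BLAS tuning or missing data.

```r
set.seed(42)

n_obs    <- 250   # rows: return observations (made-up size)
n_stocks <- 40    # columns: companies (made-up size)
returns  <- matrix(rnorm(n_obs * n_stocks, sd = 0.01),
                   nrow = n_obs, ncol = n_stocks)

# Time the correlation step; elapsed time depends on machine and BLAS
timing <- system.time(C <- cor(returns))

# Hypothetical equal weights, standing in for the 1 x p vector
w <- rep(1 / n_stocks, n_stocks)

# w' C w as a plain number
q <- drop(crossprod(w, C %*% w))
```

Running `system.time()` on the real data is a direct way to answer the "how long is tremendous" question before looking at other packages.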
[R] One to One Matching multiple vectors
Dear R helpers

Suppose,

    x = c(0, 1, 2, 3)
    y = c("A", "B", "C", "D")
    z = c(1, 3)

For given values of z, I need the corresponding values of y. So I should get "B" and "D".

I tried doing y[x][z] but it gives

    > y[x][z]
    [1] "A" "C"

Kindly guide.

Regards

Vincy
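A minimal sketch of one way to do this lookup, assuming x holds the keys and y the values at the same positions: `match(z, x)` returns the position of each element of z within x, and those positions index y. The name `picked` is only for illustration.

```r
x <- c(0, 1, 2, 3)
y <- c("A", "B", "C", "D")
z <- c(1, 3)

# Positions of the z values inside x, then the y values at those positions
picked <- y[match(z, x)]
print(picked)   # "B" "D"
```

The attempt `y[x][z]` fails because `y[x]` treats x as positions and silently drops the 0, giving "A" "B" "C", from which positions 1 and 3 are "A" and "C".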
[R] Matching two vectors
Dear R helpers

Suppose I have a vector as

    vect_1 = c("AAA", "AA", "A", "BBB", "BB", "B", "CCC")
    vect_1_id = c(1:length(vect_1))

Through some process I obtain

    vect_2_id = c(2, 3, 7)

Then I need a new vector, say vect_2, which will give me

    vect_2 = c("AA", "A", "CCC")

i.e. I need the subset of vect_1 as per vect_2_id.

Thanking in advance

Regards

Vincy
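Since vect_2_id already holds positions in vect_1, plain indexing should be enough. A short sketch, using the vectors from the post:

```r
vect_1    <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC")
vect_2_id <- c(2, 3, 7)

# Index vect_1 directly by position
vect_2 <- vect_1[vect_2_id]
print(vect_2)   # "AA" "A" "CCC"
```

If the ids were arbitrary labels rather than positions, `vect_1[match(vect_2_id, vect_1_id)]` would be the more general form.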
[R] Identifying unique pairs
Dear R helpers

Suppose I have a data frame as given below

    mydat = data.frame(x = c(1, 1, 1, 2, 2, 2, 2, 2, 5, 5, 6),
                       y = c(10, 10, 10, 8, 8, 8, 7, 7, 2, 2, 4))

    > mydat
       x  y
    1  1 10
    2  1 10
    3  1 10
    4  2  8
    5  2  8
    6  2  8
    7  2  7
    8  2  7
    9  5  2
    10 5  2
    11 6  4

unique(mydat$x) will give me 1, 2, 5, 6, i.e. 4 values, and unique(mydat$y) will give me 10, 8, 7, 2, 4.

What I need is a data frame where I will get a vector (say) x_new as (1, 2, 2, 5, 6) and the corresponding y_new as (10, 8, 7, 2, 4). I need to use these two vectors, viz. x_new and y_new, separately for further processing. They may be under the same data frame, say mydat_new, but I should be able to access them as mydat_new$x_new and similarly for y.

I tried the following way.

    pp = paste(mydat$x, mydat$y)
    > pp
     [1] "1 10" "1 10" "1 10" "2 8"  "2 8"  "2 8"  "2 7"  "2 7"  "5 2"  "5 2"  "6 4"
    qq = unique(pp)
    > qq
    [1] "1 10" "2 8"  "2 7"  "5 2"  "6 4"

So I get the desired pairs, but I want each element of the pair in two separate columns, as

    x_new y_new
        1    10
        2     8
        2     7
        5     2
        6     4

Kindly guide

Vincy
Re: [R] Identifying unique pairs
Thanks sir for your reply. Unfortunately I couldn't figure out the solution.

Vincy

--- On Sat, 3/12/11, Dennis Murphy djmu...@gmail.com wrote:

From: Dennis Murphy djmu...@gmail.com
Subject: Re: [R] Identifying unique pairs
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Saturday, March 12, 2011, 11:45 AM

Hi:

This problem came up the other day - see
http://stats.stackexchange.com/questions/7884/fast-ways-in-r-to-get-the-first-row-of-a-data-frame-grouped-by-an-identifier/7985#7985

Dennis
Re: [R] Identifying unique pairs
Dear sir,

Thanks a lot for the solution. It was such a simple solution, but people like me close their minds and don't think of the data.frame as a whole and keep on thinking about vector elements only. I also almost got the solution when I tried

    qq = unique(pp)
    (qq <- sub(" .*", "", qq))

but this was giving me only the first element of each pair in qq. So I reversed the way I had defined the paste command, saved it under some other name, applied the above command again, and got the other element. But I know that is not how good programs are written.

Regards

Vincy

--- On Sat, 3/12/11, Petr Savicky savi...@praha1.ff.cuni.cz wrote:

From: Petr Savicky savi...@praha1.ff.cuni.cz
Subject: Re: [R] Identifying unique pairs
To: r-help@r-project.org
Received: Saturday, March 12, 2011, 2:10 PM

Hi.

If I understand you correctly, then the solution is easy, since the function unique() can also handle rows of a data frame. Is the following what you expect?

    > unique(mydat)
       x  y
    1  1 10
    4  2  8
    7  2  7
    9  5  2
    11 6  4

Petr Savicky.
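A short sketch of how the `unique()` result can be turned into the mydat_new layout asked for in the original post. Renaming the columns and resetting the row names are additions for illustration, not part of the reply above.

```r
mydat <- data.frame(x = c(1, 1, 1, 2, 2, 2, 2, 2, 5, 5, 6),
                    y = c(10, 10, 10, 8, 8, 8, 7, 7, 2, 2, 4))

# unique() on a data frame keeps the first occurrence of each distinct row
mydat_new <- unique(mydat)

# Rename the columns and drop the original row numbers
names(mydat_new)    <- c("x_new", "y_new")
rownames(mydat_new) <- NULL

mydat_new$x_new   # 1 2 2 5 6
mydat_new$y_new   # 10 8 7 2 4
```

This keeps the pairs together as rows, so there is no need to paste the values into strings and split them again.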
[R] Generation of random numbers in a function - (Return command)
Dear R helpers

I have the following data.frame, and for each product_name I have an associated mean and standard deviation. I need to generate 1000 random no.s for each of these products and find the respective mean and standard deviation. My R code is as follows.

    library(plyr)
    library(reshape2)

    filtered_new <- data.frame(product_name = c("P1", "P2", "P3", "P4", "P5"),
                               output_avg = c(22.71078, 22.16979, 21.34420, 20.17421, 19.83799),
                               output_stdev = c(23.59924, 21.21430, 22.01025, 18.88877, 18.80436))

    n <- 100

    myfunction_mc = function(product_name, output_avg, output_stdev)
    {
        product_usage_borrowing_room_mc = rnorm(n, output_avg, output_stdev)
        output_avg_mc = mean(product_usage_borrowing_room_mc)
        output_stdev_mc = sd(product_usage_borrowing_room_mc)
        return(output_avg_mc)
    }

    result <- dlply(.data = filtered_new, .variables = "product_name",
                    .fun = function(x) myfunction_mc(product_name = x$product_name,
                                                     output_avg = x$output_avg,
                                                     output_stdev = x$output_stdev))

    result1 <- data.frame(result)
    result2 <- melt(result1)
    result <- data.frame(product = filtered_new$product_name, Monte_Carlo_result = result2$value)

And it gives me the desired result.

# PROBLEM is as given below -

If in the return statement of myfunction_mc I try to add 'output_stdev_mc', i.e.

    myfunction_mc = function(product_name, output_avg, output_stdev)
    {
        product_usage_borrowing_room_mc = rnorm(n, output_avg, output_stdev)
        output_avg_mc = mean(product_usage_borrowing_room_mc)
        output_stdev_mc = sd(product_usage_borrowing_room_mc)
        return(output_avg_mc, output_stdev_mc)   # I have added output_stdev_mc
    }

    result <- dlply(.data = filtered_new, .variables = "product_name",
                    .fun = function(x) myfunction_mc(product_name = x$product_name,
                                                     output_avg = x$output_avg,
                                                     output_stdev = x$output_stdev))

I get the following error -

    Error in return(output_avg_mc, output_stdev_mc) :
      multi-argument returns are not permitted

Kindly guide.

Regards

Vincy
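`return()` takes a single object, so the usual way around this error is to pack both numbers into one vector or list. The sketch below is one possible rewrite, not the poster's code: it uses base R `mapply()` instead of plyr so that it runs without extra packages, drops the unused product_name argument, and the seed and the name `mc` are assumptions for illustration.

```r
set.seed(123)

filtered_new <- data.frame(
  product_name = c("P1", "P2", "P3", "P4", "P5"),
  output_avg   = c(22.71078, 22.16979, 21.34420, 20.17421, 19.83799),
  output_stdev = c(23.59924, 21.21430, 22.01025, 18.88877, 18.80436)
)
n <- 100

myfunction_mc <- function(output_avg, output_stdev) {
  draws <- rnorm(n, output_avg, output_stdev)
  # one return value: a named vector holding both statistics
  c(output_avg_mc = mean(draws), output_stdev_mc = sd(draws))
}

# mapply gives a 2 x 5 matrix (one column per product); transpose to 5 x 2
mc <- t(mapply(myfunction_mc, filtered_new$output_avg, filtered_new$output_stdev))

result <- data.frame(product = filtered_new$product_name, mc)
print(result)
```

With plyr, the same function body should also work if the call returns a one-row data.frame per product and `ddply()` is used in place of `dlply()`, though that variant is not shown here.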
[R] How to use conditional statement
Dear R helpers

Suppose

    val1 = c(10, 20, 35, 80, 12)
    val2 = c(3, 8, 11, 7)

I want to select either val1 or val2 depending on the value of a third quantity, val3. val3 assumes either of the values "Monthly" or "Yearly". If val3 = "Monthly", then val = val1, and if val3 = "Yearly", then val = val2.

I tried the ifelse statement as

    ifelse(val3 = "Monthly", val = val1, val2)

I get the following error

    > ifelse(val3 = "Monthly", val = val1, val2)
    Error in ifelse(val3 = "Monthly", val = val1, val2) :
      unused argument(s) (val3 = "Monthly", val = val1)
    > val
    Error: object 'val' not found

Kindly guide.

Regards

Vincy
Re: [R] How to use conditional statement
Thanks a lot for the wonderful guidance.

Regards

Vincy

--- On Thu, 3/10/11, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

From: Ivan Calandra ivan.calan...@uni-hamburg.de
Subject: Re: [R] How to use conditional statement
To: Duncan Murdoch murdoch.dun...@gmail.com
Cc: r-help@r-project.org
Received: Thursday, March 10, 2011, 12:32 PM

Thanks for the comment, I didn't think about this thoroughly.

On 3/10/2011 12:54, Duncan Murdoch wrote:
> On 11-03-10 5:54 AM, Ivan Calandra wrote:
> > Try with double == instead:
> >
> >     ifelse(val3 == "Monthly", val <- val1, val <- val2)
>
> That might work, but it is not how you should do it. (It should work if
> val3 has a single entry, but will do strange things if val3 is a vector:
>
>     > val3 <- c("Monthly", "Daily")
>     > ifelse(val3 == "Monthly", val <- 1, val <- 2)
>     [1] 1 2
>     > val
>     [1] 2
>
> )
>
> The ifelse() function does a vectorized test, and picks results from the
> two vector alternatives. Vincy wants a simple logical if, which can be
> computed in a few different ways:
>
>     val <- if (val3 == "Monthly") val1 else val2
>
> or
>
>     if (val3 == "Monthly") val <- val1 else val <- val2
>
> For a simple calculation like this I'd probably use the former; if the
> calculation got more complex I'd prefer the latter.
>
> Duncan Murdoch
>
> > Single = is for setting arguments within a function call. If you want
> > to test equality, then double == is required. See ?"=="
> >
> > HTH,
> > Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
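A runnable sketch of the scalar `if` form recommended in this thread, with the vectors from the original question. The `switch()` variant and the name `val_sw` are additions for illustration; they assume val3 is a single character string.

```r
val1 <- c(10, 20, 35, 80, 12)
val2 <- c(3, 8, 11, 7)
val3 <- "Yearly"

# Plain if/else returns a whole vector, whatever its length
val <- if (val3 == "Monthly") val1 else val2

# Alternative for more than two cases; the last argument is the fallback
val_sw <- switch(val3,
                 Monthly = val1,
                 Yearly  = val2,
                 stop("unexpected val3: ", val3))

print(val)   # 3 8 11 7
```

`ifelse()` would not be suitable here even with `==`, because its result has the length of the test (one element), so only the first element of val1 or val2 would come back.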
[R] Rearranging the data
Dear R helpers,

    xx = data.frame(country = c("USA", "UK", "Canada"),
                    x = c(10, 50, 20), y = c(40, 80, 35), z = c(70, 62, 10))

    > xx
      country  x  y  z
    1     USA 10 40 70
    2      UK 50 80 62
    3  Canada 20 35 10

I need to arrange this as a new data.frame as follows -

    country type values
    USA     x        10
    USA     y        40
    USA     z        70
    UK      x        50
    UK      y        80
    UK      z        62
    Canada  x        20
    Canada  y        35
    Canada  z        10

I did try the reshape package, but things are in a mess.

Please guide

Regards

Vincy
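A base R sketch of the wide-to-long rearrangement asked for above, written without the reshape package so it runs as is. The names `value_cols` and `long` are only for illustration. Transposing the value columns before flattening is what keeps the rows grouped by country in the order USA, UK, Canada.

```r
xx <- data.frame(country = c("USA", "UK", "Canada"),
                 x = c(10, 50, 20), y = c(40, 80, 35), z = c(70, 62, 10),
                 stringsAsFactors = FALSE)

value_cols <- c("x", "y", "z")

long <- data.frame(
  country = rep(xx$country, each = length(value_cols)),
  type    = rep(value_cols, times = nrow(xx)),
  # t() puts one country per column, so as.vector() reads x, y, z per country
  values  = as.vector(t(as.matrix(xx[, value_cols])))
)
print(long)
```

`reshape2::melt(xx, id.vars = "country")` gives the same information, but ordered by variable rather than by country, so it would need a sort afterwards to match the layout in the post.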
[R] How to sort using a predefined criterion
Dear R helpers,

Suppose I have the following data.frame.

    df <- data.frame(category = c("treat_A", "treat_A", "treat_A", "treat_A", "treat_A", "treat_A", "treat_A", "treat_A",
                                  "treat_B", "treat_B", "treat_B", "treat_B", "treat_B", "treat_B", "treat_B", "treat_B"),
                     type = c("AA", , "B", "AAA", "BB", , "BBB", "A", "B", "AAA", "BBB", "AA", , "BB", "A", ),
                     values = c(0.382000183, 0.100680563, 0.596484268, 0.899105808, 0.884609516, 0.958464309,
                                0.014496292, 0.407422102, 0.863246559, 0.138584552, 0.245033113, 0.045472579,
                                0.032380139, 0.164128544, 0.219611194, 0.017090365))

    > df
       category type     values
    1   treat_A   AA 0.38200018
    2   treat_A      0.10068056
    3   treat_A    B 0.59648427
    4   treat_A  AAA 0.89910581
    5   treat_A   BB 0.88460952
    6   treat_A      0.95846431
    7   treat_A  BBB 0.01449629
    8   treat_A    A 0.40742210
    9   treat_B    B 0.86324656
    10  treat_B  AAA 0.13858455
    11  treat_B  BBB 0.24503311
    12  treat_B   AA 0.04547258
    13  treat_B      0.03238014
    14  treat_B   BB 0.16412854
    15  treat_B    A 0.21961119
    16  treat_B      0.01709036

I need to sort the above dataframe for the categories treat_A and treat_B type-wise, i.e. in the order (, AAA, AA, A, , BBB, BB, B).

Thus I need

       category type     values
    1   treat_A      0.10068056
    2   treat_A  AAA 0.89910581
    3   treat_A   AA 0.38200018
    4   treat_A    A 0.40742210
    5   treat_A      0.95846431
    6   treat_A  BBB 0.01449629
    7   treat_A   BB 0.88460952
    8   treat_A    B 0.59648427
    9   treat_B      0.03238014
    10  treat_B  AAA 0.13858455
    11  treat_B   AA 0.04547258
    12  treat_B    A 0.21961119
    13  treat_B      0.01709036
    14  treat_B  BBB 0.24503311
    15  treat_B   BB 0.16412854
    16  treat_B    B 0.86324656

Kindly advise how this can be achieved. I referred to the ?sort and ?order literature, but couldn't find any example of this sort.

Thanking you in advance.

Vincy
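A sketch of sorting by a predefined order, assuming the desired order can be written down as a character vector. Two of the type labels are missing from the archived post and are not reconstructed here, so the example uses only the six visible labels, with values rounded from the post; `type_order` and `df_sorted` are names made up for illustration.

```r
df <- data.frame(
  category = rep(c("treat_A", "treat_B"), each = 6),
  type     = c("AA", "B", "AAA", "BB", "BBB", "A",
               "B", "AAA", "BBB", "AA", "BB", "A"),
  values   = c(0.382, 0.596, 0.899, 0.885, 0.014, 0.407,
               0.863, 0.139, 0.245, 0.045, 0.164, 0.220),
  stringsAsFactors = FALSE
)

# The custom order (only the labels that survive in the archive)
type_order <- c("AAA", "AA", "A", "BBB", "BB", "B")

# match() turns each type into its rank in type_order; sort by category, then rank
df_sorted <- df[order(df$category, match(df$type, type_order)), ]
rownames(df_sorted) <- NULL
print(df_sorted)
```

An equivalent route is `factor(df$type, levels = type_order)`, which stores the order in the column itself so that later sorts and plots respect it too.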
[R] Replacing an element in a vector
Dear R helpers

I seem to have a trivial problem but can't find a solution to it.

Suppose I have the following input.

    A = c(1, 3, 0, 5, 8)          # 3rd element is 0
    B = c(100, 30, 0, 25, 40)     # 3rd element is 0
    C = A/B

    > C
    [1] 0.01 0.10  NaN 0.20 0.20

Obviously, I can't divide 0/0 and hence the NaN. My problem is how to replace this NaN by, say, 0, so that I can have C as

    C = c(0.01, 0.10, 0, 0.20, 0.20)

I tried the replace command but can't get rid of the NaN.

Kindly guide.

Vincy
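A minimal sketch of one common way to do this: `is.nan()` gives a logical vector marking the NaN positions, which can be used for assignment. The `replace()` line shows the same thing with the function the poster tried; `C2` is a name added for illustration.

```r
A <- c(1, 3, 0, 5, 8)
B <- c(100, 30, 0, 25, 40)
C <- A / B

# Equivalent result with replace(), computed before C is modified
C2 <- replace(C, is.nan(C), 0)

# Overwrite only the positions where the result is NaN
C[is.nan(C)] <- 0
print(C)   # 0.01 0.10 0.00 0.20 0.20
```

Comparing with `C == NaN` does not work, because any comparison with NaN yields NA; `is.nan()` is the test to use.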
[R] Subtracting elements of data.frame
Dear R helpers

I have a dataframe as

    df = data.frame(x = c(1, 14, 3, 21, 11), y = c(102, 500, 40, 101, 189))

    > df
       x   y
    1  1 102
    2 14 500
    3  3  40
    4 21 101
    5 11 189

# Actually I have a dataframe with multiple columns. I am just giving an example.

I need to subtract the first row of df from all the rows of df, i.e. I need to subtract 1 from each element of column 'x'. Likewise, I need to subtract 102 from all elements of column 'y'. Thus I need an output like

    df_new

       x   y
    1  0   0
    2 13 398
    3  2 -62
    4 20  -1
    5 10  87

As I mentioned above, I have a number of columns in reality and thus I can't use a command like

    df_new = data.frame(x = df$x-df$x[1], y = df$y-df$y[1])

Kindly guide

Thanking you all in advance

Regards

Vincy
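A sketch of one way to do this for any number of columns, assuming every column is numeric: apply the same "subtract the first element" function to each column with `lapply()` and rebuild the data frame.

```r
df <- data.frame(x = c(1, 14, 3, 21, 11), y = c(102, 500, 40, 101, 189))

# Subtract each column's first value from the whole column
df_new <- as.data.frame(lapply(df, function(col) col - col[1]))
print(df_new)
```

Because the function is applied column by column, nothing needs to be typed per column, which addresses the concern about having many columns.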
[R] Sorting data.frame datewise in a descending order
Dear 'HTH' R friends

I have a small dataframe as given below. I need to sort this database based on date in descending order. I am not sure whether I have defined the date column in a proper format.

    mydat <- data.frame(date = (c("1/31/2010", "2/28/2010", "3/31/2010", "4/30/2010", "5/31/2010", "6/30/2010",
                                  "7/31/2010", "8/31/2010", "9/30/2010", "10/31/2010", "11/30/2010", "12/28/2010")),
                        total = c(429, 25, 239, 99, 100, 96, 18, 21, 10, 76, 101, 81),
                        newspapers = c(103, 4, 37, 109, 52, 87, 17, 13, 10, 56, 87, 14))

    > mydat
             date total newspapers
    1   1/31/2010   429        103
    2   2/28/2010    25          4
    3   3/31/2010   239         37
    4   4/30/2010    99        109
    5   5/31/2010   100         52
    6   6/30/2010    96         87
    7   7/31/2010    18         17
    8   8/31/2010    21         13
    9   9/30/2010    10         10
    10 10/31/2010    76         56
    11 11/30/2010   101         87
    12 12/28/2010    81         14

I need to sort this data in DESCENDING order based on date, i.e. I need to have

          date total newspapers
    12/28/2010    81         14
    11/30/2010   101         87
    10/31/2010    76         56
    .
    ..
     1/31/2010   429        103

When I tried

    > mydat.sort <- mydat[order(mydat$date)]
    Error in `[.data.frame`(mydat, order(mydat$date)) :
      undefined columns selected

Kindly guide

Vincy Pyne
Re: [R] Sorting data.frame datewise in a descending order
Dear sir,

Thanks a lot for your great guidance. It worked fantastically.

Regards

Vincy Pyne

--- On Thu, 12/30/10, Henrique Dallazuanna www...@gmail.com wrote:

From: Henrique Dallazuanna www...@gmail.com
Subject: Re: [R] Sorting data.frame datewise in a descending order
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Thursday, December 30, 2010, 11:31 AM

Try this:

    mydat[order(as.Date(mydat$date, "%m/%d/%Y"), decreasing = TRUE),]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
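A runnable sketch of the suggestion in this thread. Two things differ from the failing attempt in the question: the text dates are parsed with `as.Date()` so that they sort chronologically rather than alphabetically, and the index has a trailing comma so that rows (not columns) are selected. The helper column `date_parsed` is an addition for illustration.

```r
mydat <- data.frame(
  date = c("1/31/2010", "2/28/2010", "3/31/2010", "4/30/2010", "5/31/2010", "6/30/2010",
           "7/31/2010", "8/31/2010", "9/30/2010", "10/31/2010", "11/30/2010", "12/28/2010"),
  total      = c(429, 25, 239, 99, 100, 96, 18, 21, 10, 76, 101, 81),
  newspapers = c(103, 4, 37, 109, 52, 87, 17, 13, 10, 56, 87, 14),
  stringsAsFactors = FALSE
)

# Parse month/day/year text into Date objects
mydat$date_parsed <- as.Date(mydat$date, format = "%m/%d/%Y")

# Note the comma: order() supplies row positions
mydat_sorted <- mydat[order(mydat$date_parsed, decreasing = TRUE), ]
print(mydat_sorted[, c("date", "total", "newspapers")])
```

Sorting the raw strings would put "10/31/2010" before "2/28/2010", which is why the conversion matters even when the row selection is written correctly.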
[R] Changing column names
Dear R helpers

Wish you all a very Happy and Prosperous New Year 2011.

I have the following query.

    country = c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", "Canada")

Through some other R process, the result.csv file is generated as

    result.csv

      var1 var2 var3 var4 var5 var6 var7 var8
    1   25   45   29   92  108  105   65   56
    2   80  132   83   38   38   11   47   74
    3  135   11   74   56   74   74   74   29

I need the country names to be the column heads, i.e. I need an output like

    result_new

       US France UK NewZealand Germany Austria Italy Canada
    1  25     45 29         92     108     105    65     56
    2  80    132 83         38      38      11    47     74
    3 135     11 74         56      74      74    74     29

The number of countries, i.e. length(country), matches the total number of variables (i.e. the number of columns in 'result.csv').

One way of doing this is to use the country names as column names while writing the 'result.csv' file.

    write.csv(data.frame(US = ..., France = ...), 'result.csv', row.names = FALSE)

However, the problem is that I don't know in what order the country names will appear, and there could also be additions or deletions of some country names. Also, if there are say 150 country names, the above way (i.e. write.csv) of defining the column names is not practical.

Basically, I want to change the column heads after 'result.csv' is generated.

Kindly guide.

Regards

Vincy
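A sketch of renaming the columns after the file exists: read the csv back, assign the whole `country` vector to `names()`, and write it out again. The temporary file path and the matrix used to create it are stand-ins for the real 'result.csv', and the sketch assumes the columns are already in the same order as `country`.

```r
country <- c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", "Canada")

# Stand-in for the file produced by the other process
csv_path <- tempfile(fileext = ".csv")
vals <- matrix(c( 25,  45, 29, 92, 108, 105, 65, 56,
                  80, 132, 83, 38,  38,  11, 47, 74,
                 135,  11, 74, 56,  74,  74, 74, 29),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("var", 1:8)))
write.csv(vals, csv_path, row.names = FALSE)

# Read, check the lengths agree, rename all columns in one assignment
result_new <- read.csv(csv_path)
stopifnot(length(country) == ncol(result_new))
names(result_new) <- country

write.csv(result_new, csv_path, row.names = FALSE)
print(result_new)
```

Because the names come from a vector, the same two lines work for 8 or 150 countries; the length check guards against a country being added or dropped without the file changing.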
[R] Sequence generation in a table
Dear R helpers

I have the following input f = c(257, 520, 110). I need to generate a decreasing sequence (decreasing by 100) for each value, which will give me an output (in tabular form) like

    257, 157, 57
    520, 420, 320, 220, 120, 20
    110, 10

I tried the following R code

    f = c(257, 520, 110)
    yy = matrix(data = NA, nrow = 3, ncol = 6)
    for (i in 1:3) {
      value = NULL
      for (j in 1:6) {
        value = c(ans, seq(f[i], 1, by = -100))
      }
      yy[i,] = ans[i,j]
    }

I get the following message

    Error in ans[i, j] : incorrect number of dimensions

Also, I understand that the above logic will generate a result in (3 by 6) matrix format, while I need only 3 numbers for the first value, i.e. 257, 6 numbers beginning from 520, and only 2 numbers beginning from 110. I also tried tapply etc.

Please guide

Vincy
Re: [R] Sequence generation in a table
Dear Sir,

Sorry to bother you again. Sir, the R code provided by you gives me the following output.

    yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
    yy
    [[1]]
    [1] 257 157  57

    [[2]]
    [1] 520 420 320 220 120  20

    [[3]]
    [1] 110  10

The biggest constraint for me is that here, as an example, I have taken only three cases, i.e. c(257, 520, 110); in reality I will be dealing with a number of cases, and that number is unknown. But your code will certainly generate the required numbers. In the above case, for doing further calculations, I can define say

    yy1 = as.numeric(yy[[1]])
    yy2 = as.numeric(yy[[2]])
    yy3 = as.numeric(yy[[3]])

But when the number of cases is unknown, defining them individually is not practical. So is there any way that all the generated sequences can be accommodated in a single data frame?

I sincerely apologize for disturbing you Sir and hope I am able to put up my problem in a proper manner.

Regards
Vincy Pyne

--- On Thu, 12/9/10, Jan van der Laan rh...@eoos.dds.nl wrote:

From: Jan van der Laan rh...@eoos.dds.nl
Subject: Re: [R] Sequence generation in a table
To: r-help@r-project.org, vincy_p...@yahoo.ca
Received: Thursday, December 9, 2010, 10:57 AM

Vincy,

I suppose the following does what you want. yy is now a list, which allows for differing lengths of the vectors.

    yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
    yy[[1]]
    [1] 257 157  57
    yy[[2]]
    [1] 520 420 320 220 120  20

Regards,
Jan

On 9-12-2010 11:40, Vincy Pyne wrote:

c(257, 520, 110)
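For the question of holding all sequences in a single data frame when the number of cases is unknown, one possible layout is a "long" table with one row per generated value and a column recording which case it belongs to. This is only a sketch; the column names case and days and the object name long are assumptions made for illustration, not something proposed in the thread.

```r
f <- c(257, 520, 110)

# One decreasing sequence per input value, as in Jan's reply.
yy <- lapply(f, seq, to = 0, by = -100)

# Stack the list into a long data frame: 'case' is the position of the
# input value in f, 'days' is the generated value.
long <- data.frame(
  case = rep(seq_along(yy), times = lengths(yy)),
  days = unlist(yy)
)
print(long)
```

The long layout sidesteps the problem of unequal sequence lengths, and calculations can then be applied to the days column as a whole or grouped by case.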
Re: [R] Sequence generation in a table
Dear Sirs,

I understand these already are numeric values.

Sir, basically I am working on Value at Risk for a bond portfolio using historical simulation, and for this I need to find the Marked to Market (MTM) value using the present value of the coupon payments for each bond (here, as an example, I have taken only 3).

What I have done so far is, for a given bond, find the number of days left to maturity. E.g. in the 1st case there are 257 days left to maturity. The bond pays a coupon twice a year, and thus on the 257th day the bond will mature and I will receive the principal and the final coupon payment. Since the bond pays coupons every 6 months, going backward from the 257th day, my earlier coupon payment falls on day (257 - 180) = 77. (However, in the above example, I have taken 100 just for example purposes.) Thus, assuming 100 days, my coupons will be paid on days 257, 157 and 57.

I need to convert these days into years, and when I try to divide yy, defined as

    yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)

by 360, i.e. yy/360, I get the following error.

    Error in yy/360 : non-numeric argument to binary operator

On the other hand, yy[[1]]/360 fetches me

    [1] 0.7138889 0.4361111 0.1583333

Thus, I am trying to obtain the result of

    yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)

in such a form that I can carry out further analysis. What I was trying to say is that since I am taking only three bonds here, I can do it individually; however, if there are a large number of bonds (say 1000) in the portfolio, converting the days individually is not practical.

I am extremely sorry for the inconvenience caused. I tried to keep my problem short in order not to consume your valuable time.
Regards
Vince Pyne

--- On Thu, 12/9/10, Petr PIKAL petr.pi...@precheza.cz wrote:

From: Petr PIKAL petr.pi...@precheza.cz
Subject: Re: [R] Sequence generation in a table
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Thursday, December 9, 2010, 12:03 PM

Hi

r-help-boun...@r-project.org wrote on 09.12.2010 12:41:47:

Dear Sir, Sorry to bother you again. Sir, the R code provided by you gives me the following output.

    yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
    yy
    [[1]]
    [1] 257 157  57

    [[2]]
    [1] 520 420 320 220 120  20

    [[3]]
    [1] 110  10

The biggest constraint for me is that here, as an example, I have taken only three cases, i.e. c(257, 520, 110); in reality I will be dealing with a number of cases, and that number is unknown. But your code will certainly generate the required numbers. In the above case, for doing further calculations, I can define say

    yy1 = as.numeric(yy[[1]])
    yy2 = as.numeric(yy[[2]])
    yy3 = as.numeric(yy[[3]])

Why? Those values are already numeric.

    lapply(yy, is.numeric)
    [[1]]
    [1] TRUE

    [[2]]
    [1] TRUE

    [[3]]
    [1] TRUE

and you can use the same construction to perform almost any operation on a list.

    lapply(yy, max)
    lapply(yy, mean)
    lapply(yy, sd)
    lapply(yy, t.test)

Regards
Petr

But when the number of cases is unknown, defining them individually is not practical. So is there any way that all the generated sequences can be accommodated in a single data frame? I sincerely apologize for disturbing you Sir and hope I am able to put up my problem in a proper manner.

Regards
Vincy Pyne

--- On Thu, 12/9/10, Jan van der Laan rh...@eoos.dds.nl wrote:

From: Jan van der Laan rh...@eoos.dds.nl
Subject: Re: [R] Sequence generation in a table
To: r-help@r-project.org, vincy_p...@yahoo.ca
Received: Thursday, December 9, 2010, 10:57 AM

Vincy,

I suppose the following does what you want. yy is now a list, which allows for differing lengths of the vectors.

    yy <- lapply(c(257, 520, 110), seq, to=0, by=-100)
    yy[[1]]
    [1] 257 157  57
    yy[[2]]
    [1] 520 420 320 220 120  20

Regards,
Jan

On 9-12-2010 11:40, Vincy Pyne wrote:

c(257, 520, 110)
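The error from yy/360 arises because yy is a list, and arithmetic operators do not apply to a list directly. Following the construction Petr shows with lapply(), the division can be applied to each element in turn, however many bonds there are. A minimal sketch, in which the name years is an assumption made for illustration:

```r
# Coupon days for each bond, as generated in the thread.
yy <- lapply(c(257, 520, 110), seq, to = 0, by = -100)

# Divide every element of the list by 360; the result is again a list
# with one numeric vector per bond.
years <- lapply(yy, function(days) days / 360)
print(years)
```

The same pattern works unchanged for a portfolio of 1000 bonds, since lapply() iterates over whatever length the list has.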
[R] One silly question about tapply output
Dear R helpers

I have data which gives month-wise and rating-wise rates. So the input file is something like

    month      rating   rate
    January    AAA      9.04
    February   AAA      9.07
    ..
    ..
    December   AAA      8.97
    January    BBB     11.15
    February   BBB     11.13
    January    CCC     17.13
    .
    December   CCC     17.56

and so on.

My objective is to calculate the rating-wise mean rate, for which I have used

    rating_mean = tapply(rate, rating, mean)

and I am getting the following output

    tapply(rate, rating, mean)
           AAA        BBB        CCC
        9.1104 11.1361637 17.1606779

which is correct when compared with the Excel output. However, I wish to have my output as something like a data.frame (so that I can save it as a csv file with the respective headings and carry out further analysis)

    Rating       Mean
    AAA        9.1104
    BBB    11.1361637
    CCC    17.1606779

Please guide as to how I should achieve output like this.

Thanking in advance.

Regards
Vincy
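One way to get the two-column layout asked for here is to build a data frame from the names and values of the tapply() result. The sketch below uses only the seven rates visible in the post (the elided rows are not filled in), so its means differ from the figures quoted above; the names x and out are assumptions for illustration, and the file is written to a temporary directory.

```r
# The rows that are visible in the post; the real file has more months.
x <- data.frame(
  rating = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  rate = c(9.04, 9.07, 8.97, 11.15, 11.13, 17.13, 17.56),
  stringsAsFactors = FALSE
)

# tapply returns a named array: the names are the ratings.
rating_mean <- tapply(x$rate, x$rating, mean)

# Turn names and values into two ordinary columns with headings.
out <- data.frame(
  Rating = names(rating_mean),
  Mean = as.vector(rating_mean),
  stringsAsFactors = FALSE
)
print(out)

# Save with headings and without row numbers.
write.csv(out, file.path(tempdir(), "rating_mean.csv"), row.names = FALSE)
```

An alternative that returns a data frame directly is aggregate(rate ~ rating, data = x, FUN = mean), at the cost of keeping the original column names.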
Re: [R] One silly question about tapply output
Dear Sirs,

Thanks a lot for your great help. This is going to help me immensely in future, as many times I have found myself struggling with this problem. Thanks again for the great help.

Regards
Vincy

--- On Wed, 10/27/10, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

From: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com
Subject: Re: [R] One silly question about tapply output
To: Vincy Pyne vincy_p...@yahoo.ca
Received: Wednesday, October 27, 2010, 11:28 AM

Assign your result to an object and then write out the object as a csv file. For example:

    x <- data.frame(rating=rep(letters[1:3],2), rate=runif(1:6))   # example data frame
    rating.means <- tapply(x$rate, x$rating, mean)
    write.csv(rating.means, file="my.file.csv", row.names=T)

Dimitri

On Wed, Oct 27, 2010 at 6:39 AM, Vincy Pyne vincy_p...@yahoo.ca wrote:

Dear R helpers

I have data which gives month-wise and rating-wise rates. So the input file is something like

    month      rating   rate
    January    AAA      9.04
    February   AAA      9.07
    ..
    ..
    December   AAA      8.97
    January    BBB     11.15
    February   BBB     11.13
    January    CCC     17.13
    .
    December   CCC     17.56

and so on.

My objective is to calculate the rating-wise mean rate, for which I have used

    rating_mean = tapply(rate, rating, mean)

and I am getting the following output

    tapply(rate, rating, mean)
           AAA        BBB        CCC
        9.1104 11.1361637 17.1606779

which is correct when compared with the Excel output. However, I wish to have my output as something like a data.frame (so that I can save it as a csv file with the respective headings and carry out further analysis)

    Rating       Mean
    AAA        9.1104
    BBB    11.1361637
    CCC    17.1606779

Please guide as to how I should achieve output like this.

Thanking in advance.

Regards
Vincy
--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
Re: [R] Band-wise Conditional Sum - Actual problem
earlier mail. I could test the reply sent to me earlier by Winsemius Sir only today, as I was traveling over the weekend. Also, I have tried to go through earlier emails dealing with such conditional sums. Unfortunately, I couldn't understand them, as I have only recently started my venture with R.

Thanking you in advance, and I sincerely apologize for any mis-communication that may have occurred in my earlier mail.

Regards
Vincy

--- On Fri, 8/27/10, David Winsemius dwinsem...@comcast.net wrote:

From: David Winsemius dwinsem...@comcast.net
Subject: Re: [R] Band-wise Sum
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM

On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

Hi I have a large credit portfolio (exceeding 5 borrowers). For particular process I need to add up the exposures based on the bands. I am giving a small test data below.

I would think that cut() would be the accepted method for defining a factor variable based on specified cutpoints. If you then wanted to see what the cumsum() was across the range of possible levels, that too would be a fairly simple task.

    df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000, 10000000, 100000000))
    df
    with(df, tapply(ead.cat, rating, length))
    #   A  AA AAA   B  BB BBB
    #  10   8   2   1   4   7
    with(df, tapply(ead.cat, rating, table))
    # returns a list of table objects by bond rating
    lapply( with(df, tapply(ead.cat, rating, table)), cumsum)
    # returns the cumsum of those tables
    # sapply gives a more compact output of that result:
    sapply( with(df, tapply(ead.cat, rating, table)), cumsum)
                   A AA AAA B BB BBB
    (0,1e+05]      4  2   1 0  3   1
    (1e+05,5e+05]  8  2   1 1  3   1
    (5e+05,1e+06]  9  2   1 1  3   1
    (1e+06,2e+06]  9  4   2 1  4   3
    (2e+06,5e+06]  9  5   2 1  4   4
    (5e+06,1e+07] 10  5   2 1  4   7
    (1e+07,1e+08] 10  8   2 1  4   7

Loops, you say we need loops? We don't need no stinkin' loops.

--David.
David Winsemius, MD
West Hartford, CT
Re: [R] Band-wise Conditional Sum - Actual problem
Dear David and Dennis Sir,

Thanks a lot for your guidance. As guided by Mr Dennis Murphy Sir in his reply:

    Replace table in the tapply call with sum. While you're at it, typing ?tapply to find out what the function does wouldn't hurt...

I had really tried earlier to understand the apply, tapply, mapply and sapply commands before writing back to the R forum, but I was not able to figure out where the problem was. Mr Dennis Sir really inspired me, and when I revisited 'tapply', I realized that instead of using 'ead' for getting the sum, I was using 'ead.cat', and correcting that solved my problem. Then I had a new problem of how to get rid of the NAs. Again, instead of posting to the group, I went through the earlier R mails and in the end got the solution.

I sincerely thank both of you for taking so much effort in guiding me. I will certainly make the effort to understand R at the earliest.

Regards
Vincy

Replace table in the tapply call with sum. While you're at it, typing ?tapply to find out what the function does wouldn't hurt...

HTH,
Dennis

--- On Mon, 8/30/10, David Winsemius dwinsem...@comcast.net wrote:

From: David Winsemius dwinsem...@comcast.net
Subject: Re: [R] Band-wise Conditional Sum - Actual problem
Cc: r-help@r-project.org
Received: Monday, August 30, 2010, 2:43 PM

On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:

Dear R helpers,

Thanks a lot for your earlier guidance, esp. Mr David Winsemius Sir. However, there seems to have been a mis-communication from my end regarding my requirement.

As I had mentioned in my earlier mail, I am dealing with a very large database of borrowers, and I had given a part of it in my earlier mail as given below. For a given rating, say A, I needed the band-wise sums of eads (where the bands are constructed using the ead size itself), and not the number of borrowers falling in a particular band.

I am reproducing the data and the solution as provided by Winsemius Sir (which generates the number of band-wise borrowers for a given rating).
    rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA", "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB", "A", "BB", "A", "AA", "B", "A", "AA", "BBB", "A", "BBB")
    ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
    df <- data.frame(rating, ead)
    df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000, 10000000, 100000000))
    df
    df_sorted <- df[order(df$rating),]   # the output is as given below.

    df_sorted
       rating         ead       ead.cat
    1       A   169229.93 (1e+05,5e+05]
    3       A  5877794.25 (5e+06,1e+07]
    6       A    21000.00     (0,1e+05]
    12      A   150000.00 (1e+05,5e+05]
    13      A   167490.00 (1e+05,5e+05]
    18      A     9100.00     (0,1e+05]
    23      A     3000.00     (0,1e+05]
    25      A   453671.72 (1e+05,5e+05]
    28      A   940711.67 (5e+05,1e+06]
    31      A    39000.00     (0,1e+05]
    5      AA 75040962.06 (1e+07,1e+08]
    9      AA 17715000.00 (1e+07,1e+08]
    10     AA 14430325.24 (1e+07,1e+08]
    11     AA  1180946.57 (1e+06,2e+06]
    14     AA    81255.16     (0,1e+05]
    17     AA  1275702.94 (1e+06,2e+06]
    26     AA     7590.00     (0,1e+05]
    29     AA  2443000.00 (2e+06,5e+06]
    2     AAA      100.00     (0,1e+05]
    19    AAA  1763142.30 (1e+06,2e+06]
    27      B   106065.24 (1e+05,5e+05]
    7      BB  1028360.00 (1e+06,2e+06]
    15     BB    54812.50     (0,1e+05]
    22     BB    11800.00     (0,1e+05]
    24     BB    96894.02     (0,1e+05]
    4     BBB  9530148.63 (5e+06,1e+07]
    8     BBB  6000000.00 (5e+06,1e+07]
    16    BBB     3000.00     (0,1e+05]
    20    BBB  3283048.61 (2e+06,5e+06]
    21    BBB  1200000.00 (1e+06,2e+06]
    30    BBB  9500000.00 (5e+06,1e+07]
    32    BBB  1501939.67 (1e+06,2e+06]

    ## The following command fetches the number of borrowers by rating and ead size.
    ## Thus, for rating A, there are 4 borrowers in the ead range (0, 1e+05],
    ## 4 borrowers in the range (1e+05, 5e+05] and so on.

    with(df, tapply(ead.cat, rating, table))
    $A
        (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
                4             4             1             0             0             1             0

    $AA
        (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e
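Dennis's suggestion (sum the ead values instead of tabulating the bands) can be sketched with a single tapply() call over two grouping factors. This is an illustration under stated assumptions, not the code the posters actually ran: the test data is retyped with the trailing zeros that the archive dropped, as inferred from the band labels printed in the thread, and the name band_sums is hypothetical.

```r
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA", "AA",
            "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB",
            "A", "BB", "A", "AA", "B", "A", "AA", "BBB", "A", "BBB")
# Assumption: values such as 6000000, 150000, 1200000 and 9500000 are
# reconstructed from the bands shown in the printed df_sorted table.
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,
         6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16,
         54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000,
         11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67,
         2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead)

# Same bands as in David's reply.
df$ead.cat <- cut(df$ead, breaks = c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8))

# Sum the exposures (not the band labels) for every band/rating pair.
band_sums <- with(df, tapply(ead, list(ead.cat, rating), sum))

# Band/rating combinations with no borrowers come back as NA; show them as 0.
band_sums[is.na(band_sums)] <- 0
print(band_sums)
```

The result is a matrix with one row per band and one column per rating, which can be passed to as.data.frame() or write.csv() if a file is needed.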
Re: [R] Band-wise Sum
Dear David Sir,

Thanks a lot for your guidance. Your reply, besides helping, also taught me the importance of sharing knowledge. It also helped me understand where I stand. I am a starter in R, and I have started going through at least some mails every day whenever possible so that I can learn something from THE WISE like you. Thanks once again Sir. Your help was great, and it means a lot to me and to other freshers like me.

Regards
Vincy Pyne

--- On Fri, 8/27/10, David Winsemius dwinsem...@comcast.net wrote:

From: David Winsemius dwinsem...@comcast.net
Subject: Re: [R] Band-wise Sum
To: Vincy Pyne vincy_p...@yahoo.ca
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM

On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

Hi I have a large credit portfolio (exceeding 5 borrowers). For particular process I need to add up the exposures based on the bands. I am giving a small test data below.

I would think that cut() would be the accepted method for defining a factor variable based on specified cutpoints. If you then wanted to see what the cumsum() was across the range of possible levels, that too would be a fairly simple task.

    df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000, 10000000, 100000000))
    df
    with(df, tapply(ead.cat, rating, length))
    #   A  AA AAA   B  BB BBB
    #  10   8   2   1   4   7
    with(df, tapply(ead.cat, rating, table))
    # returns a list of table objects by bond rating
    lapply( with(df, tapply(ead.cat, rating, table)), cumsum)
    # returns the cumsum of those tables
    # sapply gives a more compact output of that result:
    sapply( with(df, tapply(ead.cat, rating, table)), cumsum)
                   A AA AAA B BB BBB
    (0,1e+05]      4  2   1 0  3   1
    (1e+05,5e+05]  8  2   1 1  3   1
    (5e+05,1e+06]  9  2   1 1  3   1
    (1e+06,2e+06]  9  4   2 1  4   3
    (2e+06,5e+06]  9  5   2 1  4   4
    (5e+06,1e+07] 10  5   2 1  4   7
    (1e+07,1e+08] 10  8   2 1  4   7

Loops, you say we need loops? We don't need no stinkin' loops.

--David.
David Winsemius, MD
West Hartford, CT
[R] Band-wise Sum
Hi

I have a large credit portfolio (exceeding 5 borrowers). For a particular process I need to add up the exposures based on bands. I am giving a small set of test data below.

    rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA", "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB", "A", "BB", "A", "AA", "B", "A", "AA", "BBB", "A", "BBB")
    ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)

    ## First I have sorted the data rating-wise as
    df <- data.frame(rating, ead)
    df_sorted <- df[order(df$rating),]
    df_sorted_AAA <- subset(df_sorted, rating=="AAA")
    df_sorted_AA <- subset(df_sorted, rating=="AA")
    df_sorted_A <- subset(df_sorted, rating=="A")
    df_sorted_BBB <- subset(df_sorted, rating=="BBB")
    df_sorted_BB <- subset(df_sorted, rating=="BB")
    df_sorted_B <- subset(df_sorted, rating=="B")
    df_sorted_CCC <- subset(df_sorted, rating=="CCC")

    ## we begin with BBB rating. The R output for df_sorted_BBB is as follows
    df_sorted_BBB
       rating     ead
    4     BBB 9530149
    8     BBB 6000000
    16    BBB    3000
    20    BBB 3283049
    21    BBB 1200000
    30    BBB 9500000
    32    BBB 1501940

My problem is that I need the totals of the eads falling in the respective bands. I am defining bands in millions as

    seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
    # The output is
    [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06

So for the sub-data pertaining to rating BBB, I want the corresponding ead totals, i.e. I want the ead total where ead < 1e+06, then the ead total where 1e+06 < ead < 2e+06, 2e+06 < ead < 3e+06 ... and so on.

I have tried the following code

    s_BBB <- NULL
    for (i in 1:length(s_BBB)) {
      s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i]))
    }

I was trying to find the totals of eads < 1e+06, ead < 2e+06, ead < 3e+06 and so on, but the result is

    s_BBB
    [1] 0

I apologize if I am not able to express my problem properly.
My only objective is first to sort the whole portfolio rating-wise and then, within each of these rating-wise sorted data sets, to find the total of eads based on various bands starting 100, 100 - 20, 200 - 300, 300 - 400 and so on.

Since the database contains more than 5 records, ead amounts ranging from a few 000's to billions are present.

Please guide

Thanking you all in advance

Vincy
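The band totals described in this post can be sketched without a loop by combining cut() with tapply(). This is an illustrative sketch, not the answer given on the list: it uses only the seven BBB exposures, retyped with the zeros restored as inferred from the printed seq_BBB output, treats each band as right-closed (an ead of exactly 2e+06 falls in the band ending at 2e+06), and the names ead_BBB, band and band_totals are assumptions.

```r
# The seven BBB exposures from the test data (assumed reconstruction).
ead_BBB <- c(9530148.63, 6000000, 3000, 3283048.61, 1200000, 9500000,
             1501939.67)

# Band edges one million apart, from 0 up to the first million at or
# above the largest exposure.
upper <- ceiling(max(ead_BBB) / 1e6) * 1e6
breaks <- seq(0, upper, by = 1e6)

# Assign every exposure to its band, then total the exposures per band.
band <- cut(ead_BBB, breaks = breaks, dig.lab = 8)
band_totals <- tapply(ead_BBB, band, sum)

# Bands that contain no exposure come back as NA; report them as 0.
band_totals[is.na(band_totals)] <- 0
print(band_totals)
```

For the full portfolio, the same two steps can be applied to all ratings at once by passing list(band, rating) as the grouping argument of tapply(), which avoids creating one subset per rating.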