[R] Random Forest, Giving More Importance to Some Data
Dear All,

I am using randomForest to predict the final selling price of some items. As often happens, I have a lot of (noisy) historical data, but the question is not so much about data cleaning. The dataset for which I need to carry out predictions consists of fairly recent sales, or even sales that will take place in the near future. As a consequence, historical data should somehow be weighted: the older they are, the less they should matter for the prediction. Any idea how this could be achieved? Please find below a snippet showing how I use the randomForest library (on a multi-core machine). Any suggestion is appreciated.

Cheers

Lorenzo

###
rf_model <- foreach(iteration = 1:cores, ntree = rep(50, 4), .combine = combine,
                    .packages = "randomForest") %dopar% {
  sink("log.txt", append = TRUE)
  cat(paste("Starting iteration", iteration, "\n"))
  randomForest(trainRF, prices_train, ## mtry=20,
               nodesize = 5, ## maxnodes=140,
               importance = FALSE, do.trace = 10, ntree = ntree)
}
###

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Parallelizing GBM
Dear All,

I am far from being a guru about parallel programming. Most of the time, I rely on randomForest for data mining large datasets. I would also like to give the gradient boosted methods in GBM a try, but I need parallelization. I normally rely on gbm.fit for speed reasons, and I usually call it this way:

gbm_model <- gbm.fit(trainRF, prices_train,
                     offset = NULL, misc = NULL, distribution = "multinomial",
                     w = NULL, var.monotone = NULL, n.trees = 50,
                     interaction.depth = 5, n.minobsinnode = 10, shrinkage = 0.001,
                     bag.fraction = 0.5, nTrain = (n_train/2), keep.data = FALSE,
                     verbose = TRUE, var.names = NULL, response.name = NULL)

Does anybody know an easy way to parallelize the model (in this case it simply means having 4 cores on the same machine working on the problem)? Any suggestion is welcome.

Cheers

Lorenzo
Re: [R] Parallelizing GBM
See this: https://code.google.com/p/gradientboostedmodels/issues/detail?id=3 and this: https://code.google.com/p/gradientboostedmodels/source/browse/?name=parallel

Max
Re: [R] boxplot
Unless you have a really large number of wells, I'd just use the brute force approach of reading in each data set with a simple read.table or read.csv statement like

well1 <- read.csv("well1.csv")

and repeat for each well. Here is a simple example that may give you an idea of how to do the boxplots. I have done them two ways, one using base graphics and the other using ggplot2. You will probably have to install the ggplot2 package; just issue the command install.packages("ggplot2"). The base approach is initially a lot simpler, but in the longer term, if you expect to do a lot of graphing work in R, the grid packages like ggplot2 or lattice seem to offer a lot more control for less actual typing, especially if you need publication/report quality graphics.

##===start code=
set.seed(345)  # reproducible sample
# create three sample data sets
well_1 <- data.frame(arsenic = rnorm(12))
well_2 <- data.frame(arsenic = rnorm(10))
well_3 <- data.frame(arsenic = rnorm(15))
wells <- rbind(well_1, well_2, well_3)  # create single data.frame
# create an id value for each well
well_id <- c(rep(1, nrow(well_1)), rep(2, nrow(well_2)), rep(3, nrow(well_3)))
# add the well identifier
wells <- cbind(wells, well_id)
str(wells)  # check to see what we have
boxplot(arsenic ~ well_id, data = wells)  # plot vertical boxplot
boxplot(arsenic ~ well_id, data = wells, horizontal = TRUE,
        col = c("red", "green", "blue"))  # horizontal box plot
# vertical boxplot using ggplot2
library(ggplot2)
p <- ggplot(wells, aes(as.factor(well_id), arsenic)) + geom_boxplot()
p
# horizontal boxplot
p1 <- p + coord_flip()
p1
p2 <- ggplot(wells, aes(as.factor(well_id), arsenic, fill = as.factor(well_id))) +
  geom_boxplot() + coord_flip() + scale_fill_discrete(guide = "none")
##===end code==

John Kane Kingston ON Canada

-Original Message- From: annij...@gmail.com Sent: Sat, 23 Mar 2013 10:22:02 -0400 To: jrkrid...@inbox.com Subject: Re: [R] boxplot

Hello John, I apologize for the delayed response.
Yes, I am referring to the same type of data in the data sets. For example, the arsenic concentrations in individual groundwater monitoring wells at a groundwater contaminated site, where one well may have 12 concentration measurements, another well has 10, etc. Thanks Janh

On Fri, Mar 22, 2013 at 5:31 PM, John Kane jrkrid...@inbox.com wrote:

Hi Janh, When you say that you have multiple data sets of unequal sample sizes, are you speaking of the same kind of data? For example, are you speaking of data from a set of experiments where the variables measured are all the same and where, when you graph them, you expect the same x and y scales? Or are you talking about essentially independent data sets that it makes sense to graph in a grid?

John Kane Kingston ON Canada

-Original Message- From: annij...@gmail.com Sent: Fri, 22 Mar 2013 10:46:21 -0400 To: dcarl...@tamu.edu Subject: Re: [R] boxplot

Hello All, On the subject of boxplots, I have multiple data sets of unequal sample sizes and was wondering what would be the most efficient way to read in the data and plot side-by-side boxplots, with options for controlling the orientation of the plots (i.e. vertical or horizontal) and the spacing? Your assistance is greatly appreciated, but please try to be explicit as I am no R expert. Thanks Janh

On Thu, Mar 21, 2013 at 9:19 AM, David L Carlson dcarl...@tamu.edu wrote:

Your variable loc_type combines information from two variables (loc and type). Since you are subsetting on loc, why not just plot by type?
boxplot(var1 ~ type, data[data$loc == "nice", ])

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Jim Lemon Sent: Thursday, March 21, 2013 4:05 AM To: carol white Cc: r-h...@stat.math.ethz.ch Subject: Re: [R] boxplot

On 03/21/2013 07:40 PM, carol white wrote:

Hi, It must be an easy question, but how do I boxplot a subset of data:

data = read.table("my_data.txt", header = T)
boxplot(data$var1[data$loc == "nice"] ~ data$loc_type[data$loc == "nice"])
# in this case, I want to display only the boxplot for loc == "nice"
# it doesn't display the boxplot of only loc == "nice"; it also displays loc == "mice"

Hi Carol, It's them old factors sneakin' up on you.
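A minimal sketch of the factor behaviour referred to here, assuming loc and loc_type are stored as factors. The data frame d and its values are made up for illustration and are not from the thread. Subsetting rows does not discard factor levels, so the empty "mice" groups still get a slot on the axis until the unused levels are dropped.

```r
# Hypothetical data: 'loc' and 'loc_type' stored as factors
d <- data.frame(
  var1     = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  loc      = factor(c("nice", "nice", "nice", "mice", "mice", "mice")),
  loc_type = factor(c("nice_a", "nice_b", "nice_a", "mice_a", "mice_b", "mice_a"))
)

# Subsetting rows keeps every factor level, including the now-empty "mice" ones
nice_only <- d[d$loc == "nice", ]
levels_before <- levels(nice_only$loc_type)

# droplevels() removes the unused levels, so only the "nice" groups are drawn
nice_only <- droplevels(nice_only)
levels_after <- levels(nice_only$loc_type)

boxplot(var1 ~ loc_type, data = nice_only)
```

Plotting by a variable that does not encode loc at all, as David Carlson suggests, sidesteps the problem in a different way.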
Re: [R] Parallelizing GBM
Thanks a lot for the quick answer. However, from what I see, the parallelization affects only the cross-validation part in the gbm interface (but it changes nothing when you call gbm.fit). Am I missing anything here? Is there any fundamental reason why gbm.fit cannot be parallelized?

Lorenzo
Re: [R] Parallelizing GBM
Yes, I think the second link is a test build of a parallelized cv loop within gbm().
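For what it is worth, boosting fits its trees one after another, each depending on the ensemble built so far, so the independent units of work are things like cross-validation folds rather than the trees of a single fit. The following is only a sketch of a parallel CV loop with the base parallel package. It is not the code in the linked test build: lm() on mtcars stands in for a gbm.fit() call, and the fold count and core count are assumptions.

```r
library(parallel)

set.seed(1)
k <- 4                                   # number of folds (assumed)
fold <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Fit on all folds but one and return the held-out mean squared error.
# lm() is a stand-in; a gbm.fit() call would go here instead.
cv_fold <- function(i) {
  fit  <- lm(mpg ~ wt + hp, data = mtcars[fold != i, ])
  pred <- predict(fit, newdata = mtcars[fold == i, ])
  mean((mtcars$mpg[fold == i] - pred)^2)
}

# mclapply() forks worker processes; forking is unavailable on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
cv_mse <- unlist(mclapply(seq_len(k), cv_fold, mc.cores = cores))
mean(cv_mse)
```

Each fold is fitted in its own process, so the speed-up is bounded by the number of folds, not by the number of trees.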
[R] Creating a boxplot from a data summary
Hi, I'm trying to create a boxplot from the summary of a large data set and I'm having trouble finding any way to do this. I'm familiar with, but by no means good at, using R, so the only two websites I've found pertaining to this issue have been way over my head. I was hoping for a simple set of instructions that I could follow to produce a boxplot, in R, for three groups of data, with 8 weighted responses. For example, the groups are three different professions, each asked to fill out and rank 8 statements in order from 1-8. Ideally these would be on one graphic output, but if that can't be done, one output per group would be suitable.
Re: [R] LOOCV over SVM,KNN
Thank you very much! Your help has been very useful! Regards!

2013/3/23 mxkuhn mxk...@gmail.com:

train() in caret. See http://caret.r-forge.r-project.org/ Also, the C5.0 function in the C50 package is much more effective than J48. Max

On Mar 23, 2013, at 2:57 PM, Nicolás Sánchez eni...@gmail.com wrote:

Good afternoon. I would like to know if there is any function in R to do LOOCV with these classifiers: 1) SVM 2) Neural Networks 3) C4.5 (J48) 4) KNN. Thanks a lot!
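To make the idea concrete without depending on caret, here is a hand-rolled leave-one-out loop for KNN. It is a sketch only: the iris data, k = 5 and the object names are assumptions, and class::knn() is used because the class package ships with R. With caret, the equivalent request is made through trainControl(method = "LOOCV") passed to train().

```r
library(class)   # recommended package shipped with R; provides knn()

set.seed(1)      # knn() breaks distance ties at random
x <- iris[, 1:4]
y <- iris$Species

# Leave-one-out: hold out each row in turn, train on the rest, predict the held-out row
loo_pred <- vapply(seq_len(nrow(x)), function(i) {
  as.character(knn(train = x[-i, ], test = x[i, , drop = FALSE], cl = y[-i], k = 5))
}, character(1))

loo_accuracy <- mean(loo_pred == as.character(y))
loo_accuracy
```

The same loop works for any classifier with a fit and a predict step; only the body of the function changes.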
Re: [R] Creating a boxplot from a data summary
Look at the help for bxp:

?bxp
# The following gives you insight into a boxplot structure
bp = boxplot(list(a = rnorm(10), b = rnorm(10), c = rnorm(10)))
bp

-- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences Kirksville, MO 63501 USA
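Building on the bxp() pointer, a sketch of drawing boxes directly from summary numbers. Everything in it is hypothetical: the five-number summaries, the group sizes and the profession labels are invented, and it covers one ranked statement for three groups. The list mirrors the components that boxplot() returns (stats, n, out, group, names).

```r
# Hypothetical five-number summaries (lower whisker, lower hinge, median,
# upper hinge, upper whisker), one column per profession
stats <- cbind(
  c(1, 2, 4, 6, 8),
  c(1, 3, 5, 6, 8),
  c(2, 3, 4, 5, 7)
)

# bxp() draws from the same list structure that boxplot() returns
z <- list(
  stats = stats,
  n     = c(40, 35, 50),          # group sizes (made up)
  out   = numeric(0),             # no outliers to draw
  group = numeric(0),
  names = c("Profession A", "Profession B", "Profession C")
)

bxp(z, ylab = "Rank (1-8)")
```

Repeating this for each of the 8 statements, or binding all 24 columns into one stats matrix, would give the single-graphic version asked for.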
[R] Rscript does not load/capture all shell arguments
Hi,

I am working on a GRASS script (bash script), which should run an R script. I am working on Ubuntu 12.10, with R 2.15.3 and GRASS GIS 7.0 (the latter is probably not relevant, as the GRASS script is just a bash script). The R script is invoked with a call to Rscript ($RGRASSSCRIPT is a shell variable with the file name of the R script; the rest are variables I want to read into R):

Rscript --no-save --no-restore $RGRASSSCRIPT $GIS_OPT_INMAP $GIS_OPT_PRES $GIS_OPT_ENV $GIS_OPT_OUTMAP_GLM $GIS_OPT_FSTATS $GIS_FLAG_M $GIS_OPT_SPP $GIS_OPT_SPA $GIS_OPT_FAM $GIS_OPT_TERMS $GIS_FLAG_Q $GIS_OPT_OUTMAP_PROJ $GIS_OPT_ENV_PROJ $GIS_OPT_KFOLD $GIS_OPT_SFOLD

In the R script, I use the commandArgs function to capture the arguments supplied to Rscript:

args <- commandArgs(trailingOnly=TRUE)

The problem is that args only contains the first 12 arguments (up to and including $GIS_OPT_OUTMAP_PROJ). I also tried using a call to R instead of Rscript:

R --no-save --no-restore --no-site-file --no-init-file --args ${GIS_OPT_INMAP} ${GIS_OPT_PRES} ${GIS_OPT_ENV} ${GIS_OPT_OUTMAP_GLM} ${GIS_OPT_FSTATS} ${GIS_FLAG_M} ${GIS_OPT_SPP} ${GIS_OPT_SPA} ${GIS_OPT_FAM} ${GIS_OPT_TERMS} ${GIS_FLAG_Q} ${GIS_OPT_OUTMAP_PROJ} ${GIS_OPT_ENV_PROJ} ${GIS_OPT_KFOLD} ${GIS_OPT_SFOLD} < $RGRASSSCRIPT > $LOGFILE 2>&1

This gives me all 15 arguments when using commandArgs. (The reason I am not using that option is that invoking the script that way gives me another problem: R jumps back to the top at a seemingly random place somewhere halfway through the script and starts to run the first lines of code again. But that is probably better left for another email.)

So my question is, I guess: are there limits on the number of arguments that can be supplied to Rscript? Or, perhaps more likely, am I doing something wrong?
Best wishes, Paulo
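The thread does not establish the cause. As a debugging aid only, here is a sketch of a small helper that names the positional arguments and stops with a readable message when the count is off, so a shortfall is reported at the top of the script together with what was actually received. The helper name_args() and the lower-case names are hypothetical; only commandArgs() itself comes from the post.

```r
# Hypothetical helper: attach names to positional arguments and fail early on a miscount
name_args <- function(args, expected) {
  if (length(args) != length(expected)) {
    stop(sprintf("expected %d arguments but received %d: %s",
                 length(expected), length(args),
                 paste(shQuote(args), collapse = " ")))
  }
  stats::setNames(as.list(args), expected)
}

# Names mirror the 15 shell variables in the post, in the same order
expected <- c("inmap", "pres", "env", "outmap_glm", "fstats", "flag_m", "spp", "spa",
              "fam", "terms", "flag_q", "outmap_proj", "env_proj", "kfold", "sfold")

# In the real script: opts <- name_args(commandArgs(trailingOnly = TRUE), expected)
# Here a dummy vector "1".."15" stands in for the command line.
demo <- name_args(as.character(seq_along(expected)), expected)
demo$kfold
```

Printing the quoted arguments in the error message makes it easy to see whether an argument was dropped, merged with its neighbour, or arrived empty.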
Re: [R] Random Forest, Giving More Importance to Some Data
Your question doesn't seem to be specifically related to either R or random forest. Instead, it is about how to assign weights to training observations.

-- WenSui Liu Credit Risk Manager, 53 Bancorp wensui@53.com 513-295-4370
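One generic way to turn "older data should matter less" into weights, sketched under stated assumptions: exponential decay in the age of the sale, with a half-life that is a made-up tuning parameter, followed by weighted resampling of the training rows. The ages below are invented; nothing here is specific to randomForest.

```r
set.seed(42)

# Hypothetical sale ages in days (0 = today); in practice derived from the sale dates
age_days <- c(5, 30, 90, 180, 365, 730, 1095)

# Exponential decay: a sale 'half_life' days old counts half as much as one from today
half_life <- 180                      # assumed tuning parameter
w <- 0.5^(age_days / half_life)

# Weighted resampling: recent rows are drawn more often, so a model trained on
# rows 'idx' sees mostly recent sales
idx <- sample(seq_along(age_days), size = 1000, replace = TRUE, prob = w)
round(table(idx) / length(idx), 2)
```

The vector w could equally be handed to any fitting function that accepts observation weights (gbm.fit, shown elsewhere in this digest, has a w argument); the half-life would then be chosen by validating on the most recent sales.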
Re: [R] boxplot
Hello John, Thank you so much for your kind assistance and the detailed descriptions. I will play with the scripts and see which one is the easiest that serves the purpose. Best regards, Janh
Re: [R] Ordering a matrix by row value in R2.15
fitz_ra wrote

I know this is posted a lot. I've been through about 40 messages reading how to do this, so let me apologize in advance, because I can't get this operation to work, unlike the many examples shown. I have a 2 row matrix:

temp
       [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
[1,] 17.000 9.000000 26.00000  5.00000 23.00000 21.00000 19.00000 17.00000 10.00000  63.0000
[2,] 15.554 7.793718 33.29079 15.53094 20.44825 14.34443 11.83552 11.62997 10.16019 115.2602

I want to order the matrix using the second row in ascending order. From the many examples (usually applied to columns) the typical solution appears to be:

temp[order(temp[2,]),]
Error: subscript out of bounds

However, as you can see, I get an error here. When I run this one line command:

sort(temp[2,])
 [1]   7.793718  10.160190  11.629973  11.835520  14.344426  15.530939  15.553999  20.448249  33.290789
[10] 115.260192

This works, but I want the matrix to update and the corresponding values of row 1 to switch with the sort.

Maybe consider the order function:

orig <- matrix(c(10,20,30,3,1,2), nrow=2, byrow=TRUE)
new <- t(apply(orig, 1, function(x) x[order(orig[2,])]))

orig
     [,1] [,2] [,3]
[1,]   10   20   30
[2,]    3    1    2

new
     [,1] [,2] [,3]
[1,]   20   30   10
[2,]    1    2    3

HTH Pete

-- View this message in context: http://r.789695.n4.nabble.com/Ordering-a-matrix-by-row-value-in-R2-15-tp4662337p4662340.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Integrate with vectors and varying upper limit
sunny0 wrote

I'd like to integrate vectors 't' and 'w' for log(w)/(1-t)^2, where I can vary the upper limit of the integral to change with each value of 't' and 'w', and then put the output into another vector. So, something like this...

w=c(.33,.34,.56)
t=c(.2,.5,.1)
k<-c(.3,.4,.5)
integrand <- function(t) {log(w)/(1-t)^2}
integrate(integrand, lower = 0, upper = k)

or maybe...

integrand <- function(tt) {tt}
integrate(integrand, lower = 0, upper = k)

... with sapply or something similar to create the output vector. How can this be done?

Something like this?

w=c(.33,.34,.56)
t=c(.2,.5,.1)
k=c(.3,.4,.5)
integrand <- function(t) {log(w)/(1-t)^2}
out <- sapply(k, function(x) integrate(integrand, lower = 0, upper = x, subdivisions = 1000))

# Output
             [,1]         [,2]        [,3]
value        -0.4017303   -0.6249136  -0.9373696
abs.error    2.798235e-05 9.17413e-05 9.209191e-05
subdivisions 32           208         91
message      "OK"         "OK"        "OK"
call         Expression   Expression  Expression

HTH Pete

-- View this message in context: http://r.789695.n4.nabble.com/Integrate-with-vectors-and-varying-upper-limit-tp4662338p4662341.html Sent from the R help mailing list archive at Nabble.com.
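A possible variation, in case the intent is one integral per pair (w[i], k[i]): in the code above the whole vector w is recycled across the points at which integrate() evaluates the integrand, so each result mixes all three values of w. The sketch below passes a single w value per call instead. The argument name wi and the closed-form check are additions for illustration, not part of the thread.

```r
w <- c(.33, .34, .56)
k <- c(.3, .4, .5)

# 'wi' is forwarded through integrate()'s '...' to the integrand,
# so each integral uses exactly one value of w
integrand <- function(t, wi) log(wi) / (1 - t)^2

out <- mapply(function(wi, ki) {
  integrate(integrand, lower = 0, upper = ki, wi = wi)$value
}, w, k)

# Closed form for comparison: log(w) * k / (1 - k)
closed <- log(w) * k / (1 - k)
all.equal(out, closed)
```

Since log(w) is constant in t, the integral reduces to log(w) times the integral of 1/(1-t)^2 from 0 to k, which is k/(1-k); the numerical and closed-form results should agree to integrate()'s tolerance.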
Re: [R] Ordering a matrix by row value in R2.15
or this with Pete's example

orig[,order(orig[2,])]
Re: [R] Ordering a matrix by row value in R2.15
fitz_ra wrote:

I want to order the matrix using the second row in ascending order. From the many examples (usually applied to columns) the typical solution appears to be:

temp[order(temp[2,]),]
Error: subscript out of bounds

That tries to reorder the rows of temp according to the values in its second row, which causes the error (unless you are unlucky and have more rows than columns, in which case you silently get a wrong answer). You want to reorder the columns of temp, so make the output of order() the column (second) argument to [,]:

temp[, order(temp[2,])]
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]   [,7]     [,8]
[1,] 9.000000 10.00000 17.00000 19.00000 21.00000  5.00000 17.000 23.00000
[2,] 7.793718 10.16019 11.62997 11.83552 14.34443 15.53094 15.554 20.44825
         [,9]    [,10]
[1,] 26.00000  63.0000
[2,] 33.29079 115.2602

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
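The row-slot versus column-slot distinction can be checked on a small matrix. This is a sketch with a made-up 2 x 4 matrix m (rounded values loosely based on the post), not the poster's data.

```r
m <- matrix(c(17,   9,   26,   5,
              15.6, 7.8, 33.3, 15.5), nrow = 2, byrow = TRUE)

ord <- order(m[2, ])      # column positions that sort row 2 ascending

# Index in the column slot so that whole columns move together
sorted <- m[, ord]
sorted

# Using 'ord' in the row slot asks for rows 2, 4, 1, 3 of a 2-row matrix
row_attempt <- try(m[ord, ], silent = TRUE)
inherits(row_attempt, "try-error")
```

With a matrix that has at least as many rows as columns, the row-slot version would not fail; it would quietly return rows in an order computed from the wrong dimension.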
[R] a contrast question
Dear R People:

I have the following in a file:

    resp factA factB
    39.5 low   B-
    38.6 high  B-
    27.2 low   B+
    24.6 high  B+
    43.1 low   B-
    39.5 high  B-
    23.2 low   B+
    24.2 high  B+
    45.2 low   B-
    33.0 high  B-
    24.8 low   B+
    22.2 high  B+

and I construct the data frame:

    collard.df <- read.table("collard.txt", header=TRUE)
    collard.aov <- aov(resp~factA*factB, data=collard.df)
    summary(collard.aov)
                Df Sum Sq Mean Sq F value   Pr(>F)
    factA        1   36.4    36.4   5.511   0.0469 *
    factB        1  716.1   716.1 108.419 6.27e-06 ***
    factA:factB  1   13.0    13.0   1.971   0.1979
    Residuals    8   52.8     6.6
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    tapply(collard.df$resp, list(collard.df$factA, collard.df$factB), mean)
               B-       B+
    high 37.03333 23.66667
    low  42.60000 25.06667

Fair enough. Let's pretend for a second that interaction existed. Then I want to set up contrasts such that mean(high) - mean(low) = 0 and mean(B+) - mean(B-) = 0.

I know that this is really simple, but I've tried all kinds of things with glht and am not sure that I'm on the right track.

Sorry for the trouble for the simple question.

Thanks,
erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
Re: [R] a contrast question
I found the solution:

http://stats.stackexchange.com/questions/12993/how-to-setup-and-interpret-anova-contrasts-with-the-car-package-in-r

Sorry for the trouble.

On Sun, Mar 24, 2013 at 8:58 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> Let's pretend for a second that interaction existed. Then I want to set
> up contrasts such that mean(high) - mean(low) = 0 and
> mean(B+) - mean(B-) = 0.
>
> I know that this is really simple, but I've tried all kinds of things
> with glht and am not sure that I'm on the right track.

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
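For readers of the archive, here is one way the two contrasts in the question could be tested with base R only. This is a sketch under stated assumptions, not the approach from the linked answer and not a glht call: it refits the data as a cell-means model and applies the contrast weights by hand. The helper name `test_contrast` and the weight vectors `k_A` and `k_B` are hypothetical, and the data frame is typed in from the table in the question instead of being read from collard.txt.

```r
# Data typed in from the question (factor levels fixed explicitly)
collard.df <- data.frame(
  resp  = c(39.5, 38.6, 27.2, 24.6, 43.1, 39.5,
            23.2, 24.2, 45.2, 33.0, 24.8, 22.2),
  factA = factor(rep(c("low", "high"), times = 6),
                 levels = c("high", "low")),
  factB = factor(rep(c("B-", "B-", "B+", "B+"), times = 3),
                 levels = c("B-", "B+"))
)

# Cell-means model: one coefficient per factA x factB cell, no intercept
collard.df$cell <- interaction(collard.df$factA, collard.df$factB)
fit <- lm(resp ~ 0 + cell, data = collard.df)

# Hypothetical helper: estimate and t-test for a linear contrast k'beta,
# where k is a named vector of weights matching names(coef(fit))
test_contrast <- function(fit, k) {
  b    <- coef(fit)[names(k)]
  V    <- vcov(fit)[names(k), names(k)]
  est  <- sum(k * b)
  se   <- sqrt(drop(t(k) %*% V %*% k))
  tval <- est / se
  c(estimate = est, se = se, t = tval,
    p = 2 * pt(-abs(tval), df.residual(fit)))
}

# mean(high) - mean(low), averaged over the two levels of factB
k_A <- c("cellhigh.B-" = 0.5, "celllow.B-" = -0.5,
         "cellhigh.B+" = 0.5, "celllow.B+" = -0.5)

# mean(B+) - mean(B-), averaged over the two levels of factA
k_B <- c("cellhigh.B-" = -0.5, "celllow.B-" = -0.5,
         "cellhigh.B+" =  0.5, "celllow.B+" =  0.5)

res_A <- test_contrast(fit, k_A)
res_B <- test_contrast(fit, k_B)
print(rbind(factA = res_A, factB = res_B))
```

Because the design is balanced, the squared t statistics should agree with the F values for factA and factB in the ANOVA table from the question.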
[R] Error with paired t-test
This error keeps appearing when I perform a paired t-test in R:

    Error in t.test.default(payoff, paired = T) :
      'y' is missing for paired test

This is the method I have used:

    read.table("MeanPayoff.txt", header=T)
           Open  Closed
    1  47.5     42.3750
    2  49.25000 50.3
    3  50.0     49.8000
    4  33.5     20.5
    5  34.75000 33.8800
    6  35.5     20.5000
    7  33.35000 12.8750
    8  50.0     22.5000
    9  47.15625 34.9375
    10 44.38000 43.2500
    11 50.0     47.5000
    12 42.12500 26.7500
    13 27.35000 26.6250
    14 31.75000 36.5000

    attach(payoff)
    names(payoff)
    t.test(payoff, paired=T)

Then the error keeps coming up. Please help.

Charlotte
Re: [R] Clip a contour with shapefile while using contourplot
Hi

Below is some code that does what I think you want by drawing a path based on the map data. This does some grubby low-level work with the 'sp' objects that someone else may be able to tidy up.

    # The 21st polygon in 'hello' is the big outer boundary
    # PLUS the 20 other inner holes
    map <- as(hello, "SpatialPolygons")[21]

    # Convert map outline to path information
    polygonsToPath <- function(ps) {
        # Turn the list of polygons into a single set of x/y
        x <- do.call(c, sapply(ps, function(p) { p@coords[,1] }))
        y <- do.call(c, sapply(ps, function(p) { p@coords[,2] }))
        id.lengths <- sapply(ps, function(p) { nrow(p@coords) })  # Generate vertex set lengths
        list(x=x, y=y, id.lengths=id.lengths)
    }
    path <- polygonsToPath(map@polygons[[1]]@Polygons)

    # Generate rect surrounding the path
    xrange <- range(path$x)
    yrange <- range(path$y)
    xbox <- xrange + c(-5, 5)
    ybox <- yrange + c(-5, 5)

    # Invert the path
    invertpath <- list(x=c(xbox, rev(xbox), path$x),
                       y=c(rep(ybox, each=2), path$y),
                       id.lengths=c(4, path$id.lengths))

    # Draw inverted map over contourplot
    contourplot(Salinity ~ Eastings+Northings | Time, mydata,
                cuts=30, pretty=TRUE,
                panel=function(...) {
                    panel.contourplot(...)
                    grid.path(invertpath$x, invertpath$y,
                              id.lengths=invertpath$id.lengths,
                              default="native",
                              gp=gpar(col="green", fill="white"))
                })

The final result is far from perfect, but I hope it might be of some help. One issue is that most of the contour labels are obscured, though that might be ameliorated by filling the inverted map with a semi-transparent colour like rgb(1,1,1,.8).

Paul

On 15/02/13 08:58, Janesh Devkota wrote:
> Hi,
>
> I have done the interpolation for my data and I was able to create the
> contours in multipanel with the help of Pascal. Now, I want to clip the
> contour with the shapefile. I want only the portion of contour to be
> displayed which falls inside the boundary of the shapefile.
> The data mydata.csv can be found on
> https://www.dropbox.com/s/khi7nv0160hi68p/mydata.csv
>
> The data for shapefile can be found on
> https://www.dropbox.com/sh/ztvmibsslr9ocmc/YOtiwB8p9p
>
> The code I have used so far is as follows:
>
>     # Load Libraries
>     library(latticeExtra)
>     library(sp)
>     library(rgdal)
>     library(lattice)
>     library(gridExtra)
>
>     # Read Shapefile
>     hello <- readOGR("shape", layer="Export_Output_4")
>
>     ## Project the shapefile to the UTM 16 zone
>     proj4string(hello) <- CRS("+proj=utm +zone=16 +ellps=WGS84")
>
>     ## Read Contour data
>     mydata <- read.csv("mydata.csv")
>     head(mydata)
>     summary(mydata)
>
>     # Create a contourplot
>     contourplot(Salinity ~ Eastings+Northings | Time, mydata,
>                 cuts=30, pretty=TRUE)
>
> Thank you so much. I would welcome any other ways to do this aside from
> contourplot and lattice.
>
> Best Regards,
> Janesh

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
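The "inverted path" idea in Paul's reply can be tried without the shapefile. The sketch below is only an illustration under assumptions: the triangle `region` is a made-up stand-in for the map outline, and `rule = "evenodd"` is a choice made here (Paul's code uses the default fill rule) so that the hole does not depend on the direction in which the vertices are listed. The fill uses the semi-transparent white that Paul mentions.

```r
library(grid)

# Hypothetical region to keep visible (stands in for the map outline)
region <- list(x = c(2, 8, 5), y = c(2, 2, 8))

# Rectangle extending 5 units beyond the region on every side
xbox <- range(region$x) + c(-5, 5)
ybox <- range(region$y) + c(-5, 5)

# One path made of two sub-paths: the rectangle (4 vertices), then the region
inverted <- list(
  x = c(xbox, rev(xbox), region$x),
  y = c(rep(ybox, each = 2), region$y),
  id.lengths = c(4, length(region$x))
)

# With the even-odd rule the region becomes a hole in the filled rectangle,
# so everything outside the region is masked
mask <- pathGrob(inverted$x, inverted$y,
                 id.lengths = inverted$id.lengths,
                 rule = "evenodd",
                 default.units = "native",
                 gp = gpar(col = "green", fill = rgb(1, 1, 1, 0.8)))

# To see the effect on a graphics device:
# grid.newpage()
# pushViewport(viewport(xscale = xbox, yscale = ybox))
# grid.rect(gp = gpar(fill = "steelblue"))  # stands in for the contour plot
# grid.draw(mask)
```

Inside a lattice panel function the same grob could be drawn after `panel.contourplot(...)`, in the way the reply calls `grid.path()`.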
Re: [R] Error with paired t-test
Hi,

The error message is explicit enough. You need 'y' for the paired test. Note also that the result of read.table() has to be assigned to 'payoff', and that the columns are 'Open' and 'Closed' (the "1" after "Closed" in the post is the label of the first row).

    payoff <- read.table("MeanPayoff.txt", header=TRUE)
    with(payoff, t.test(Open, Closed, paired=TRUE))

HTH,
Pascal

On 25/03/13 07:42, Charlotte Rayner wrote:
> This error keeps appearing when I perform a paired t-test in R:
>
>     Error in t.test.default(payoff, paired = T) :
>       'y' is missing for paired test
>
> This is the method I have used:
>
>     read.table("MeanPayoff.txt", header=T)
>     attach(payoff)
>     names(payoff)
>     t.test(payoff, paired=T)
>
> Then the error keeps coming up. Please help.
>
> Charlotte
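A self-contained sketch of the paired test follows. Two things are assumptions: the column names `Open` and `Closed`, and the values themselves, which are copied as they appear in the archived post and may have lost trailing digits. The data are read from an inline string so the example runs without MeanPayoff.txt.

```r
# Values as they appear in the archived post (row labels omitted)
payoff <- read.table(header = TRUE, text = "
Open     Closed
47.5     42.3750
49.25000 50.3
50.0     49.8000
33.5     20.5
34.75000 33.8800
35.5     20.5000
33.35000 12.8750
50.0     22.5000
47.15625 34.9375
44.38000 43.2500
50.0     47.5000
42.12500 26.7500
27.35000 26.6250
31.75000 36.5000
")

# A paired test needs both samples: x and y
res <- t.test(payoff$Open, payoff$Closed, paired = TRUE)
print(res)

# Equivalent one-sample test on the within-pair differences
res_diff <- t.test(payoff$Open - payoff$Closed)
```

Passing the whole data frame as the only argument is what triggers "'y' is missing for paired test"; naming the two columns, or testing their difference, avoids it.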