[R] Neural Network models
I apologize for writing my question in Spanish. I thought that I was writing to the Spanish list. Agustín Alonso __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] neural network, random forest with survey data
Hi, everyone: Does anyone know whether any statistical packages (such as R) can accommodate neural networks or random forests with survey data? With survey data, we have to incorporate sampling weights and possibly design effects. I would appreciate any help. Grace
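Neither nnet nor randomForest takes survey weights directly, but some implementations accept per-observation case weights, which can carry the sampling weights (this addresses the weighting only, not design-based variance estimation). A sketch using ranger's `case.weights` argument; the variable names and the simulated weights are hypothetical:

```r
library(ranger)
set.seed(9)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- dat$x1 - dat$x2 + rnorm(n)
w <- runif(n, 0.5, 2)          # stand-in for sampling weights from the design
# Trees are grown on bootstrap samples drawn with probability proportional to w
fit <- ranger(y ~ ., data = dat, case.weights = w, num.trees = 200)
```

nnet's fitting functions also accept a `weights` argument for case weights, so a similar trick works there; a full design-based analysis (design effects, variance) is a separate question.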
Re: [R] Neural Network
Javad, You misunderstand what is meant by 'dependent' and 'independent' variables. What you are describing is statistical independence. Please review these basic statistical concepts: http://en.wikipedia.org/wiki/Dependent_and_independent_variables. Perhaps the terms 'explanatory' (e.g. your phosphorus, nitrogen, etc.) and 'response' (e.g. eutrophication) variables are more approachable. Now, as I was saying in my first response, you don't appear to have a dependent/response variable (i.e. eutrophication). Nowhere in your data do you say that eutrophication was measured or is represented in any way. Now, I assume you have 'a priori' knowledge that those variables are involved in eutrophication. You are now asking if you can predict eutrophication from these variables. Well, without something for a statistical model to evaluate against, there is no means to do so, hence the exploratory, unsupervised analysis I recommended. With respect to your other question, "How can I predict these variables by NN?", well, you need something to test against. For example, let's say I want to predict how much ice cream will be sold today and I have a bunch of data with amounts of ice cream sold but no other data. No matter how you approach this problem, you cannot get much out of a list of numbers with nothing to test against. Now, if my ice cream data has the amounts of ice cream and the temperature of each day associated with the respective sold amount, I can do something. I can do a basic linear regression to help predict how much ice cream will be sold given today's temperature. The same appears to be true of your data. You have your variables, you have all of your response variables (assuming you are trying to predict nitrogen, chlorophyll, etc.), but nothing to test against. The best you may have is your time data, which I can only assume are actual dates? If so, you could do some form of prediction based on the date.
If your data is just every two weeks (no date, just repeated measures) you could analyze it temporally to see whether the various nutrients are changing over time and potentially extrapolate (with caution) where the levels may ultimately reach. This may be of interest to you. As a last point, seeing as this is environmental analysis, you could also try the R-sig-ecology mailing list. I am admittedly not an ecologist, and there may be other approaches or methods that could be used. Feel free to sign up for that list here: https://stat.ethz.ch/mailman/listinfo/r-sig-ecology I hope this explanation helps you get a better grasp of what you are trying to accomplish. Regards,
Re: [R] Neural Network
Dear All; Many thanks for your attention. What I want to know is: how can I predict eutrophication from these parameters in the future? These variables are the most important variables that control eutrophication in lakes. Let me break it into two parts. 1) How can I predict these variables by NN? 2) Is it possible to predict eutrophication from these variables? Many thanks for your help. Regards,
Re: [R] Neural Network
Dear Charles; I think my variables are dependent. For example, the concentrations of phosphorus, nitrogen, silica, etc. affect the presence of chlorophyll a, and the concentration of chlorophyll a, along with other algae, can cause eutrophication in the lake. So I think they are dependent variables. Regards.
Re: [R] Neural Network
Javad, First, please make sure to hit 'reply all' so that these messages go to the R help list so others (many far more skilled than I) may chime in. The problem here is that you appear to have no dependent variable (i.e. no eutrophication variable). Without it, there is no way to do a typical 'supervised' analysis. Given that this is likely a regression-type problem (I assume eutrophication would be continuous), I'm not quite sure 'supervised' is the correct description, but it furthers my point that you need a dependent variable for any neural net algorithm I am aware of. As such, if you don't have a dependent variable then you will need to look at unsupervised methods such as PCA. Other users may have other suggestions. Regards, Charles -- Dr. Charles Determan, PhD Integrated Biosciences
Re: [R] Neural Network
Javad, Your question is a little too broad to be answered definitively. Also, this is not a code-writing service. You should make a meaningful attempt, and we are here to help when you get stuck. 1. If you want to know whether you can do neural nets, the answer is yes. The three packages most commonly used (that I know of) are 'neuralnet', 'nnet' and 'RSNNS'. You should look into these packages' documentation for how to use them. There are also many examples online if you simply google them. 2. Your question is unclear: are you wanting to predict all the variables (e.g. phosphorus, Total N, etc.), or do you have some metric for eutrophication? What exactly is the model supposed to predict? 3. If you want to know whether a neural net is appropriate, that is more of a statistical question. It depends more on the question you want to answer. Given your temporal data, you may want to look into mixed-effects models (e.g. nlme, lme4) as another potential approach. Regards, -- Dr. Charles Determan, PhD Integrated Biosciences
[R] Neural Network
Dear all; I am a new user of R. I want to simulate or predict the eutrophication of a lake. I have weekly data (almost two years) for total phosphorus, total N, pH, chlorophyll a, alkalinity, and silica. Can I predict eutrophication with a neural network in R? How can I simulate eutrophication from these parameters? Please help me write the code. Many thanks.
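As the replies in this thread note, a supervised fit needs a measured response. Purely as an illustration of the mechanics (not the poster's actual data: the lake variables and the eutrophication index below are simulated, and the names are made up), a regression-style fit with nnet might look like:

```r
library(nnet)
set.seed(1)
n <- 100
# Fake weekly measurements standing in for the real lake data
dat <- data.frame(tp = runif(n), tn = runif(n), ph = runif(n), chla = runif(n))
# A response variable is required; here we fabricate one for demonstration
dat$eutro <- with(dat, 2 * tp + tn + 0.5 * chla + rnorm(n, sd = 0.1))
train <- dat[1:80, ]                 # fit on the first 80 weeks
test  <- dat[81:100, ]               # hold out the last 20 for checking
fit <- nnet(eutro ~ ., data = train, size = 5, linout = TRUE,
            decay = 1e-3, maxit = 500, trace = FALSE)
pred <- predict(fit, test)           # predictions on held-out weeks
```

Without a measured eutrophication variable in place of the fabricated `eutro`, there is nothing for the network to learn, which is the point Charles makes above.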
[R] neural network in R
Hello everybody, I am trying to fit a neural network to my data using the 'neuralnet' or 'nnet' package. I have tried several times but get an unexpected answer. This is my code (num.obs = 100):

library(nnet)
y <- data.frame(data$CU)    # y is Cu concentration
x <- data.frame(data$mrvbf, data$plcurvature, data$insol)
mod1 <- nnet(x, y, size = 3, linout = T)

When I look at mod1$fitted.values, it is the same for all 100 values of y, e.g.
1 832.77
2 832.77
3 832.77
.
.
.
100 832.77
I don't know where the problem is. Please help. Thanks a lot.
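A common cause of identical fitted values is unscaled inputs or targets saturating the sigmoid hidden units, so the net collapses to predicting the mean. A sketch of scaling before fitting, on simulated data (the column names mimic the post but the values are made up):

```r
library(nnet)
set.seed(42)
n <- 100
dat <- data.frame(mrvbf = runif(n, 0, 50), insol = runif(n, 0, 2000))
dat$CU <- 3 * dat$mrvbf + 0.01 * dat$insol + rnorm(n)
# Standardize every column to zero mean and unit variance before fitting
scaled <- as.data.frame(scale(dat))
fit <- nnet(CU ~ ., data = scaled, size = 3, linout = TRUE,
            maxit = 500, trace = FALSE)
# Count the distinct fitted values to check the net is no longer constant
length(unique(round(fit$fitted.values, 6)))
```

Predictions on the scaled response must be transformed back (multiply by the original SD and add the mean) before comparing to raw Cu concentrations.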
[R] Neural Network Problem
Hello Professionals, I am new to R and am planning to use R for an artificial neural network regression. I have 10 different scenarios for each observation (input). For each scenario there are 7 variables, which means 7 outputs. I have 1000 observations in total, and I have 1000 expected outputs. I want to use 800 observations for training and the rest for testing. Could anyone provide a sample for my case? I don't quite understand the instructions from the packages. Appreciated.
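A minimal split-and-fit sketch with nnet, following the post's dimensions (1000 observations, 800 for training); the 7-input structure and all names are illustrative, and the multi-scenario detail is simplified to a single response:

```r
library(nnet)
set.seed(2)
n <- 1000
dat <- data.frame(matrix(runif(n * 7), ncol = 7))   # 7 input variables
dat$y <- rowSums(dat[, 1:3]) + rnorm(n, sd = 0.1)   # fabricated response
idx <- sample(n, 800)                               # 800 rows for training
fit <- nnet(y ~ ., data = dat[idx, ], size = 5, linout = TRUE,
            maxit = 300, trace = FALSE)
pred <- predict(fit, dat[-idx, ])                   # score 200 held-out rows
```

For genuinely multivariate output (7 targets per observation), nnet's matrix interface accepts a target matrix y with 7 columns instead of a single response.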
[R] Neural network: Amore adaptative vs batch why the results are so different?
I am using the iris example that came with the nnet package to test AMORE. I can see the outcomes are similar to nnet with adaptive gradient descent. However, when I changed the method in newff to batch gradient descent, even with a very large number of epochs, I still found all the iris observations of expected class 2 being classified as class 3. In addition, all the records in the outcomes (y) take only three values: 0, 0.4677313, and 0.5111955. The script is below. Please help me understand this behavior.

library('AMORE')
ir <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])
targets <- matrix(c(rep(c(1,0,0),50), rep(c(0,1,0),50), rep(c(0,0,1),50)),
                  150, 3, byrow=TRUE)
samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))
net <- newff(n.neurons=c(4, 2, 3),        # number of units per layer
             learning.rate.global=1e-2,   # learning rate at which every neuron is trained
             momentum.global=5e-4,        # momentum for every neuron
             error.criterium="LMS",       # error criterium: least mean squares
             hidden.layer="sigmoid",      # activation function of the hidden layer neurons
             output.layer="sigmoid",      # activation function of the output layer neurons
             method="BATCHgdwm")          # training method: adaptive or batch
nnfit <- train(net,                       # network to train
               ir[samp,],                 # input training samples
               targets[samp,],            # output training samples
               error.criterium="LMS",     # error criterium
               report=TRUE,               # provide information during training
               n.show=10,                 # number of times to report
               show.step=4)
y <- sim(nnfit$net, ir[samp,])

Thanks, Xiaoyan
Re: [R] Neural Network
2010/7/18 Arnaud Trébaol : > Hi all, > > I am working for my master's thesis and I need to do a neural network to > forecast stock market price, with also external inputs like technical > indicators. > I would like to know which function and package of R are more suitable for > this study. > > Thanks a lot for your response, > Arnaud TREBAOL. See also the following article in the current issue of the R Journal: neuralnet: Training of neural networks http://journal.r-project.org/archive/2010-1/RJournal_2010-1_Guenther+Fritsch.pdf -Rainer
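For forecasting a series one step ahead, the usual trick is to turn the series into a supervised problem by using lagged values (and, in the poster's case, technical indicators) as inputs. A hedged sketch with nnet on a simulated series; the lag count and all names are illustrative:

```r
library(nnet)
set.seed(7)
price <- cumsum(rnorm(300))                # stand-in for a price series
lagged <- embed(price, 4)                  # columns: t, t-1, t-2, t-3
dat <- data.frame(y = lagged[, 1], lagged[, -1])   # predict t from the 3 lags
train <- dat[1:250, ]
test  <- dat[251:nrow(dat), ]              # most recent points held out
fit <- nnet(y ~ ., data = train, size = 4, linout = TRUE,
            decay = 1e-2, maxit = 500, trace = FALSE)
pred <- predict(fit, test)                 # one-step-ahead predictions
```

Extra inputs such as indicators become additional columns of `dat`; the neuralnet package referenced above works the same way with a formula interface.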
Re: [R] Neural Network
I'd start with the nnet library; type: ?nnet CS - Corey Sparks, PhD Assistant Professor Department of Demography and Organization Studies University of Texas at San Antonio 501 West Durango Blvd Monterey Building 2.270C San Antonio, TX 78207 210-458-3166 corey.sparks 'at' utsa.edu https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
[R] Neural Network
Hi all, I am working on my master's thesis and I need to build a neural network to forecast stock market prices, with external inputs such as technical indicators. I would like to know which function and package of R are most suitable for this study. Thanks a lot for your response, Arnaud TREBAOL. -- Arnaud Trébaol T.I.M.E. Student Ecole Centrale de Lille (09) Politecnico di Milano (10) Mail : arnaud.treb...@centraliens-lille.org Tel1 : +33 (0)6 76 46 42 92 Tel2 : +39 327 280 57 68
[R] Neural Network package AMORE and a weight decay
Hi, I want to use the neural network package AMORE, and I don't find the weight decay option in the documentation. Could someone tell me whether it is possible to add a regularization parameter (also known as weight decay) to the training method? Is it possible to alter the gradient descent rule for that? Thanks, Ron
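AMORE may indeed not expose a decay option, but nnet does, via its `decay` argument (L2 penalty on the weights during fitting). A minimal sketch on simulated data, as one alternative if the regularization matters more than the rest of AMORE's flexibility:

```r
library(nnet)
set.seed(3)
n <- 200
x <- runif(n, -1, 1)
dat <- data.frame(x = x, y = sin(pi * x) + rnorm(n, sd = 0.2))
# decay = 1e-2 shrinks the weights toward zero, damping overfitting
fit <- nnet(y ~ x, data = dat, size = 8, linout = TRUE,
            decay = 1e-2, maxit = 500, trace = FALSE)
```

Altering AMORE's gradient rule itself would mean editing its training functions, since the penalty term enters the weight-update step.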
[R] Neural Network
Hi, We are trying to implement an early-stopping rule with a validation set on a neural network. We're using the AMORE package (http://rwiki.sciviews.org/doku.php?id=packages:cran:amore) of R, and when you train the network you have to specify the following variables: Pval and Tval. What do we have to put here, or how do we have to specify these values? We are using simulated data from a sinc function. This is the code that we are using.

library(AMORE)
library(pracma)   # provides linspace()
# define a sinc function
sinc <- function(x) sin(pi*x)/(pi*x)
size_data = 200
# Generate data from the sinc function
ticks = linspace(-1, 1, size_data)
sin_data = sinc(ticks)
# Generate noise
std_dev = 0.5
noise_data <- runif(size_data, 0, std_dev)
# Impose noise on the sinc data
dat = sin_data + noise_data
# Normalise data
max_dat = max(dat)
norm_dat = dat/max(dat)
# Define a neural network
net.start <- newff(n.neurons=c(1,20,1), learning.rate.global=1e-3,
                   momentum.global=0.5, error.criterium="LMS", Stao=NA,
                   hidden.layer="tansig", output.layer="purelin",
                   method="ADAPTgd")
# Train the network
result <- train(net.start, ticks, norm_dat, Pval=NULL, Tval=NULL,
                error.criterium="LMS", report=FALSE, show.step=8000, n.shows=0)

Are there any tips you can give for a better neural network or a better training of this net? Thanks a lot, A desperate team in search of help.
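My reading of AMORE's train() is that Pval and Tval take the validation inputs and targets, in the same shape as the training P and T, so the reported error can be tracked on held-out data. A sketch (not a verified recipe) that mirrors the post's sinc setup with a 150/50 split:

```r
library(AMORE)
set.seed(11)
x <- seq(-1, 1, length.out = 200)
y <- sin(pi * x) / (pi * x)
y[is.nan(y)] <- 1                        # sinc(0) = 1 by convention
idx <- sample(200, 150)                  # 150 training, 50 validation points
net <- newff(n.neurons = c(1, 20, 1), learning.rate.global = 1e-3,
             momentum.global = 0.5, error.criterium = "LMS", Stao = NA,
             hidden.layer = "tansig", output.layer = "purelin",
             method = "ADAPTgd")
result <- train(net, P = as.matrix(x[idx]), T = as.matrix(y[idx]),
                Pval = as.matrix(x[-idx]), Tval = as.matrix(y[-idx]),
                error.criterium = "LMS", report = FALSE,
                show.step = 100, n.shows = 5)
```

Whether train() actually stops early on the validation error, or only reports it, is worth checking against the package documentation before relying on this.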
[R] neural network arguments
hi everyone! My inquiry about neural networks is rather basic. I am just learning neural networks, particularly the VR bundle. I read the documentation for the said bundle but am still struggling to understand some arguments: - x is the matrix or data frame of x values; for example, does this mean the data frame for training? - y is the matrix or data frame of target values; for example, does this mean the data frame for testing? - would fitting the single-hidden-layer NN train and test my data? What does "fitting" really do? I know these are very basic questions, but I just started exploring NN packages. Thanks in advance for your help!
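To answer the questions above concretely: in nnet's matrix interface, x holds the predictor (input) values and y holds the target values, and both belong to the *training* data. "Fitting" means estimating the network weights from those (x, y) pairs; testing is a separate step done afterwards with predict() on data the fit never saw. An illustrative sketch (all data simulated):

```r
library(nnet)
set.seed(5)
x <- matrix(runif(150), ncol = 3)                  # training inputs
y <- x %*% c(1, -2, 0.5) + rnorm(50, sd = 0.1)     # training targets
fit <- nnet(x, y, size = 4, linout = TRUE,         # fitting = weight estimation
            maxit = 500, trace = FALSE)
x_test <- matrix(runif(30), ncol = 3)              # new inputs, not used in fitting
pred <- predict(fit, x_test)                       # this step is the "testing"
```

So fitting alone does not test anything; the test set only enters through predict().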
Re: [R] Neural Network resource
Thanks to all those who have replied to my query. I have decided to do a thorough reading on this subject and try to seek out a proper solution. I will stick to the nnet package as mentioned by Jude and try to compare results with other neural network software if possible. Regards, Indrajit From: "jude.r...@ubs.com" Sent: Thursday, May 28, 2009 10:49:36 PM Subject: Re: [R] Neural Network resource The package AMORE appears to be more flexible, but I got very poor results using it when I tried to improve the predictive accuracy of a regression model. I don't understand all the options well enough to be able to fine-tune it to get better predictions. However, using the nnet() function in package VR gave me decent results, and it is pretty easy to use (see the Venables and Ripley book, Modern Applied Statistics with S, pages 243 to 249, for more details). I tried using package neuralnet as well, but the neural net failed to converge. I could not figure out how to set the threshold option (or other options) to get the neural net to converge. I explored package neural as well. Of all these 4 packages, the nnet() function in package VR worked the best for me. As another R user commented as well, you have too many hidden layers and too many neurons. In general you do not need more than 1 hidden layer. One hidden layer is sufficient for the "universal approximator" property of neural networks to hold true. As you keep adding neurons to the one hidden layer, the problem becomes more and more non-linear. If you add too many neurons you will overfit. In general, you do not need to add more than 10 neurons. The activation function in the hidden layer of Venables and Ripley's nnet() function is logistic, and you can specify the activation function in the output layer to be linear using linout = T in nnet(). Using one hidden layer, and starting with one hidden neuron and working up to 10 hidden neurons, I built several neural nets (4,000 records) and computed the training MSE.
I also computed the validation MSE on a holdout sample of over 1,000 records. I also started with 2 variables and worked up to 15 variables in a "for" loop, so in all, I built 140 neural nets using 2 "for" loops, and stored the results in lists. I arranged my variables in the data frame based on correlations and partial correlations so that I could easily add variables in a "for" loop. This was my "crude" attempt to simulate variable selection since, from what I have seen, neural networks do not have variable selection methods. In my particular case, neural networks gave me marginally better results than regression. It all depends on the problem. If the data has non-linear patterns, neural networks will be better than linear regression. My code is below. You can modify it to suit your needs if you find it useful. There are probably lines in the code that are redundant which can be deleted. HTH. Jude Ryan My code:

# set order in data frame train2 based on correlations and partial correlations
train2 <- train[, c(5,27,19,20,25,26,4,9,3,10,16,6,2,14,21,28)]
dim(train2)
names(train2)
library(nnet)
# skip = T
# train 10 neural networks in a loop and find the one with the minimum test and validation error
# create various lists to store the results of the neural network running in two for loops
# The Column List is for the outer for loop, which loops over variables
# The Row List is for the inner for loop, which loops over number of neurons in the hidden layer
col_nn <- list()    # stores the results of nnet() over variables - outer loop
row_nn <- list()    # stores the results of nnet() over neurons - inner loop
col_mse <- list()
# row_mse <- list() # not needed because nn.mse is a data frame with rows
col_sum <- list()
row_sum <- list()
col_vars <- list()
row_vars <- list()
col_wts <- list()
row_wts <- list()
df_dim <- dim(train2)
df_dim[2]        # number of variables
df_dim[2] - 1
num_of_neurons <- 10
# build data frame to store results of neural net for each run
nn.mse <- data.frame(Train_MSE=seq(1:num_of_neurons), Valid_MSE=seq(1:num_of_neurons))
# open log file and redirect output to log file
sink("D:\\XXX\\YYY\\ Programs\\Neural_Network_v8_VR_log.txt")
# outer loop - loop over variables
for (i in 3:df_dim[2]) {  # df_dim[2]
  # inner loop - loop over number of hidden neurons
  for (j in 1:num_of_neurons) {  # up to 10 neurons in the hidden layer
    # need to create a new data frame with just the predictor/input variables needed
    train3 <- train2[,c(1:i)]
    coreaff.nn <- nnet(dep_var ~ ., train3, size = j, decay = 1e-3, linout = T,
                       skip = T, maxit = 1000, Hess = T)
    # row_vars[[j]] <- coreaff.nn$call # not what we want
    # row_vars[[j]] <- names(train3)[c(2:i)] # not needed in inner loop - same number of variables f
Re: [R] Neural Network resource
The package AMORE appears to be more flexible, but I got very poor results using it when I tried to improve the predictive accuracy of a regression model. I don't understand all the options well enough to be able to fine tune it to get better predictions. However, using the nnet() function in package VR gave me decent results and is pretty easy to use (see the Venables and Ripley book, Modern Applied Statistics with S, pages 243 to 249, for more details). I tried using package neuralnet as well but the neural net failed to converge. I could not figure out how to set the threshold option (or other options) to get the neural net to converge. I explored package neural as well. Of all these 4 packages, the nnet() function in package VR worked the best for me. As another R user commented as well, you have too many hidden layers and too many neurons. In general you do not need more than 1 hidden layer. One hidden layer is sufficient for the "universal approximator" property of neural networks to hold true. As you keep adding neurons to the one hidden layer, the problem becomes more and more non-linear. If you add too many neurons you will overfit. In general, you do not need to add more than 10 neurons. The activation function in the hidden layer of Venables and Ripley's nnet() function is logistic, and you can specify the activation function in the output layer to be linear using linout = T in nnet(). Using one hidden layer, and starting with one hidden neuron and working up to 10 hidden neurons, I built several neural nets (4,000 records) and computed the training MSE. I also computed the validation MSE on a holdout sample of over 1,000 records. I also started with 2 variables and worked up to 15 variables in a "for" loop, so in all, I built 140 neural nets using 2 "for" loops, and stored the results in lists. I arranged my variables in the data frame based on correlations and partial correlations so that I could easily add variables in a "for" loop. 
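Jude's recipe (one hidden layer, a modest number of neurons, logistic hidden units, linear output via linout = T) can be illustrated with a small, self-contained sketch; the data below is invented purely for illustration:

```r
library(nnet)
set.seed(1)
# toy data: a noisy non-linear signal (invented for illustration only)
d <- data.frame(x = runif(200, -2, 2))
d$y <- sin(d$x) + rnorm(200, sd = 0.1)
# one hidden layer with 5 logistic units; linout = TRUE gives a linear output unit
fit <- nnet(y ~ x, data = d, size = 5, decay = 1e-3,
            linout = TRUE, maxit = 500, trace = FALSE)
mean((d$y - predict(fit, d))^2)  # training MSE
```

Increasing size well beyond what the data supports will keep driving the training MSE down while the validation MSE rises, which is the overfitting Jude describes.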
This was my "crude" attempt to simulate variable selection since, from what I have seen, neural networks do not have variable selection methods. In my particular case, neural networks gave me marginally better results than regression. It all depends on the problem. If the data has non-linear patterns, neural networks will be better than linear regression. My code is below. You can modify it to suit your needs if you find it useful. There are probably lines in the code that are redundant which can be deleted. HTH.

Jude Ryan

My code:

# set order in data frame train2 based on correlations and partial correlations
train2 <- train[, c(5,27,19,20,25,26,4,9,3,10,16,6,2,14,21,28)]
dim(train2)
names(train2)
library(nnet)
# skip = T
# train 10 neural networks in a loop and find the one with the minimum test and validation error
# create various lists to store the results of the neural network running in two for loops
# The Column List is for the outer for loop, which loops over variables
# The Row List is for the inner for loop, which loops over the number of neurons in the hidden layer
col_nn <- list()    # stores the results of nnet() over variables - outer loop
row_nn <- list()    # stores the results of nnet() over neurons - inner loop
col_mse <- list()
# row_mse <- list() # not needed because nn.mse is a data frame with rows
col_sum <- list()
row_sum <- list()
col_vars <- list()
row_vars <- list()
col_wts <- list()
row_wts <- list()
df_dim <- dim(train2)
df_dim[2]       # number of variables
df_dim[2] - 1
num_of_neurons <- 10
# build data frame to store results of neural net for each run
nn.mse <- data.frame(Train_MSE = seq(1:num_of_neurons), Valid_MSE = seq(1:num_of_neurons))
# open log file and redirect output to log file
sink("D:\\XXX\\YYY\\Programs\\Neural_Network_v8_VR_log.txt")
# outer loop - loop over variables
for (i in 3:df_dim[2]) {
  # inner loop - loop over number of hidden neurons
  for (j in 1:num_of_neurons) {  # up to 10 neurons in the hidden layer
    # need to create a new data frame with just the predictor/input variables needed
    train3 <- train2[, c(1:i)]
    coreaff.nn <- nnet(dep_var ~ ., train3, size = j, decay = 1e-3,
                       linout = T, skip = T, maxit = 1000, Hess = T)
    # row_vars[[j]] <- coreaff.nn$call         # not what we want
    # row_vars[[j]] <- names(train3)[c(2:i)]   # not needed in inner loop - same number of variables for all neurons
    row_sum[[j]] <- summary(coreaff.nn)
    row_wts[[j]] <- coreaff.nn$wts
    rownames(nn.mse)[j] <- paste("H", j, sep = "")
    nn.mse[j, "Train_MSE"] <- mean((train3$dep_var - predict(coreaff.nn))^2)
    nn.mse[j, "Valid_MSE"] <- mean((valid$dep_var - predict(coreaff.nn, valid))^2)
  }
  col_vars[[i-2]] <- names(train3)[c(2:i)]
  col_sum[[i-2]] <- row_sum
  col_wts[[i-2]] <- row_wts
  col_mse[[i-2]] <- nn.mse
}
# cbind(col_vars[1], col_vars[2])
col_vars
col_sum
col_wts
sink()
cbind(col_mse[[1]], col_mse[[2]], col_mse[[3]], col_mse[[4]], col_mse[[5]])
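Once the loops above have filled col_mse and col_vars, the best variable-count/neuron-count combination can be located by scanning the stored data frames; a sketch, assuming the lists are populated as in Jude's code:

```r
# index of the variable set whose best net has the lowest validation MSE
best.vars <- which.min(sapply(col_mse, function(m) min(m$Valid_MSE)))
# within that set, the best number of hidden neurons
best.h <- which.min(col_mse[[best.vars]]$Valid_MSE)
col_mse[[best.vars]][best.h, ]  # training and validation MSE of the winner
col_vars[[best.vars]]           # the variables that net used
```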
Re: [R] Neural Network resource
I haven't used the AMORE package before, but it sounds like you haven't set linear output units or something. Here's an example using the nnet package of what you're doing, I think:

### R START ###
> # set random seed to a cool number
> set.seed(42)
>
> # set up data
> x1 <- rnorm(100); x2 <- rnorm(100); x3 <- rnorm(100)
> x4 <- rnorm(100); x5 <- rnorm(100); x6 <- rnorm(100)
> b1 <- 1; b2 <- 2; b3 <- 3
> b4 <- 4; b5 <- 5; b6 <- 6
> y <- b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5 + b6*x6
> my.df <- data.frame(cbind(y, x1, x2, x3, x4, x5, x6))
>
> # 1. linear regression
> my.lm <- lm(y ~ ., data = my.df)
>
> # look at correlation
> my.lm.predictions <- predict(my.lm)
> cor(my.df["y"], my.lm.predictions)
  [,1]
y    1
>
> # 2. nnet
> library(nnet)
> my.nnet <- nnet(y ~ ., data = my.df, size = 3, linout = TRUE, skip = TRUE, trace = FALSE, maxit = 1000)
>
> my.nnet.predictions <- predict(my.nnet, my.df)
> # look at correlation
> cor(my.df["y"], my.nnet.predictions)
  [,1]
y    1
>
> # to look at the values side by side
> cbind(my.df["y"], my.nnet.predictions)
             y my.nnet.predictions
1  10.60102566         10.59958907
2   6.70939465          6.70956529
3   2.28934732          2.28928930
4  14.51012458         14.51043732
5 -12.85845371        -12.85849345
[..etc]
### R END ###

Hope that helps a wee bit mate,
Tony Breyal

On 27 May, 15:36, Indrajit Sengupta wrote:
> You are right, there is a pdf file which describes the function. But let me tell
> you where I am coming from.
>
> Just to test if a neural network will work better than an ordinary least
> squares regression, I created a dataset with one dependent variable and 6
> other independent variables. I had deliberately created the dataset in such a
> manner that we have an excellent regression model, e.g. Y = b0 + b1*x1 +
> b2*x2 + ... + b6*x6 + e, where e is a normal random variable. Naturally any
> statistical analysis system running regression would easily estimate the
> values of b1, b2, ..., b6 with around 30-40 observations.
>
> I fed this data into a neural network (3 hidden layers with 6 neurons in each
> layer) and trained the network. When I passed the input dataset and tried to
> get the predictions, all the predicted values were identical! This confused
> me a bit and I was wondering whether my understanding of the neural network
> was wrong.
>
> Have you ever faced anything like it?
>
> Regards,
> Indrajit

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Neural Network resource
Here is the code that I had used:

## Read in the raw data
fitness <- c(44,89.47,44.609,11.37,62,178,182,
             40,75.07,45.313,10.07,62,185,185,
             44,85.84,54.297,8.65,45,156,168,
             42,68.15,59.571,8.17,40,166,172,
             38,89.02,49.874,9.22,55,178,180,
             47,77.45,44.811,11.63,58,176,176,
             40,75.98,45.681,11.95,70,176,180,
             43,81.19,49.091,10.85,64,162,170,
             44,81.42,39.442,13.08,63,174,176,
             38,81.87,60.055,8.63,48,170,186,
             44,73.03,50.541,10.13,45,168,168,
             45,87.66,37.388,14.03,56,186,192,
             45,66.45,44.754,11.12,51,176,176,
             47,79.15,47.273,10.6,47,162,164,
             54,83.12,51.855,10.33,50,166,170,
             49,81.42,49.156,8.95,44,180,185,
             51,69.63,40.836,10.95,57,168,172,
             51,77.91,46.672,10,48,162,168,
             48,91.63,46.774,10.25,48,162,164,
             49,73.37,50.388,10.08,67,168,168,
             57,73.37,39.407,12.63,58,174,176,
             54,79.38,46.08,11.17,62,156,165,
             52,76.32,45.441,9.63,48,164,166,
             50,70.87,54.625,8.92,48,146,155,
             51,67.25,45.118,11.08,48,172,172,
             54,91.63,39.203,12.88,44,168,172,
             51,73.71,45.79,10.47,59,186,188,
             57,59.08,50.545,9.93,49,148,155,
             49,76.32,48.673,9.4,56,186,188,
             48,61.24,47.92,11.5,52,170,176,
             52,82.78,47.467,10.5,53,170,172)
fitness2 <- data.frame(matrix(fitness, nrow = 31, byrow = TRUE))
colnames(fitness2) <- c("Age","Weight","Oxygen","RunTime","RestPulse","RunPulse","MaxPulse")
attach(fitness2)

## Create the input dataset
indep <- fitness2[,-3]

## Create the neural network structure
net.start <- newff(n.neurons = c(6,6,6,1),
                   learning.rate.global = 1e-2,
                   momentum.global = 0.5,
                   error.criterium = "LMS", Stao = NA,
                   hidden.layer = "tansig", output.layer = "purelin",
                   method = "ADAPTgdwm")

## Train the net
result <- train(net.start, indep, Oxygen,
                error.criterium = "LMS", report = TRUE,
                show.step = 100, n.shows = 5)

## Predict
pred <- sim(result$net, indep)
pred

Here I am trying to predict Oxygen levels using the 6 independent variables. But whenever I try to run a prediction, I get constant values throughout (in the example above, the values of pred).
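One likely cause of constant predictions here is that the raw inputs (pulse rates near 180, weights near 80) saturate the tansig hidden units. A hedged sketch, reusing the same newff/train/sim calls as above, standardizes the data first and back-transforms the predictions; the helper variable names are mine:

```r
# standardize inputs and target so the tansig units are not saturated
indep.s  <- scale(indep)                 # inputs as z-scores
oxygen.s <- scale(fitness2$Oxygen)       # target as z-scores
result <- train(net.start, indep.s, oxygen.s,
                error.criterium = "LMS", report = TRUE,
                show.step = 100, n.shows = 5)
# predictions, mapped back to the original Oxygen scale
pred <- sim(result$net, indep.s) * attr(oxygen.s, "scaled:scale") +
        attr(oxygen.s, "scaled:center")
```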
Thanks & Regards,
Indrajit

- Original Message -
From: Max Kuhn
To: Indrajit Sengupta
Cc: markle...@verizon.net; R Help
Sent: Wednesday, May 27, 2009 9:19:47 PM
Subject: Re: [R] Neural Network resource

> I fed this data into a Neural network (3 hidden layers with 6 neurons in each
> layer) and trained the network. When I passed the input dataset and tried to
> get the predictions, all the predicted values were identical! This confused
> me a bit and was wondering whether my understanding of the Neural Network was
> wrong.
>
> Have you ever faced anything like it?

You should really provide code for us to help. I would initially suspect that you didn't use a linear function between your hidden units and the outcomes. Also, using 3 hidden layers and 6 units per layer is a bit much for your data set (30-40 samples). You will probably end up overfitting.

--
Max
Re: [R] Neural Network resource
> I fed this data into a Neural network (3 hidden layers with 6 neurons in each
> layer) and trained the network. When I passed the input dataset and tried to
> get the predictions, all the predicted values were identical! This confused
> me a bit and was wondering whether my understanding of the Neural Network was
> wrong.
>
> Have you ever faced anything like it?

You should really provide code for us to help. I would initially suspect that you didn't use a linear function between your hidden units and the outcomes. Also, using 3 hidden layers and 6 units per layer is a bit much for your data set (30-40 samples). You will probably end up overfitting.

--
Max
Re: [R] Neural Network resource
You are right, there is a pdf file which describes the function. But let me tell you where I am coming from.

Just to test if a neural network will work better than an ordinary least squares regression, I created a dataset with one dependent variable and 6 other independent variables. I had deliberately created the dataset in such a manner that we have an excellent regression model, e.g. Y = b0 + b1*x1 + b2*x2 + ... + b6*x6 + e, where e is a normal random variable. Naturally any statistical analysis system running regression would easily estimate the values of b1, b2, ..., b6 with around 30-40 observations.

I fed this data into a neural network (3 hidden layers with 6 neurons in each layer) and trained the network. When I passed the input dataset and tried to get the predictions, all the predicted values were identical! This confused me a bit and I was wondering whether my understanding of the neural network was wrong.

Have you ever faced anything like it?

Regards,
Indrajit

From: "markle...@verizon.net"
Sent: Wednesday, May 27, 2009 7:54:59 PM
Subject: Re: [R] Neural Network resource

Hi: I've never used that package but most likely there is an AMORE vignette that shows examples and describes the functions. It should be on the same CRAN web page where the package resides, in pdf form.

Hi All,

I am trying to learn Neural Networks. I found that R has packages which can help build Neural Nets - the popular one being the AMORE package. Is there any book / resource available which guides us in this subject using the AMORE package?

Any help will be much appreciated.

Thanks,
Indrajit
Re: [R] Neural Network resource
There's a link on the CRAN page for the AMORE package which appears to have some cool information: http://wiki.r-project.org/rwiki/doku.php?id=packages:cran:amore

Seems like an interesting package; I hadn't actually heard of it before your post.

HTH,
Tony

On 27 May, 09:13, Indrajit Sengupta wrote:
> Hi All,
>
> I am trying to learn Neural Networks. I found that R has packages which can
> help build Neural Nets - the popular one being the AMORE package. Is there any
> book / resource available which guides us in this subject using the AMORE
> package?
>
> Any help will be much appreciated.
>
> Thanks,
> Indrajit
[R] Neural Network resource
Hi All,

I am trying to learn Neural Networks. I found that R has packages which can help build Neural Nets - the popular one being the AMORE package. Is there any book / resource available which guides us in this subject using the AMORE package?

Any help will be much appreciated.

Thanks,
Indrajit
[R] neural network not using all observations
I am exploring neural networks (adding non-linearities) to see if I can get more predictive power than a linear regression model I built. I am using the function nnet and following the example of Venables and Ripley, in Modern Applied Statistics with S, on pages 246 to 249. I have standardized variables (z-scores) such as assets, age and tenure. I have other variables that are binary (0 or 1). In max_acc_ownr_nwrth_n_med, for example, the variable has a value of 1 if the client's net worth is above the median net worth and a value of 0 otherwise. These are derived variables I created and variables that the regression algorithm has found to be predictive. A regression on the same variables shown below gives me an R-Square of about 0.12. I am trying to increase the predictive power of this regression model with a neural network, being careful to avoid overfitting. Similar to Venables and Ripley, I used the following code:

> library(nnet)
> dim(coreaff.trn.nn)
[1] 5088    8
> head(coreaff.trn.nn)
  hh.iast.y WC_Total_Assets all_assets_per_hh         age      tenure max_acc_ownr_liq_asts_n_med max_acc_ownr_nwrth_n_med max_acc_ownr_ann_incm_n_med
1   3059448      -0.4692186        -0.4173532 -0.06599001 -1.04747935                           0                        1                           0
2   4899746       3.4854334         4.0640000 -0.06599001 -0.72540200                           1                        1                           1
3    727333      -0.2677357        -0.4177944 -0.30136473 -0.40332465                           1                        1                           1
4    443138      -0.5295170        -0.6999646 -0.18250000 -1.04747935                           0                        0                           0
5    484253      -0.6112205        -0.7306664  0.64013414  0.07979137                           1                        0                           0
6    799054       0.6580506         1.1763114  0.24784295  0.07979137                           0                        1                           1
> coreaff.nn1 <- nnet(hh.iast.y ~ WC_Total_Assets + all_assets_per_hh + age + tenure +
+                     max_acc_ownr_liq_asts_n_med + max_acc_ownr_nwrth_n_med +
+                     max_acc_ownr_ann_incm_n_med,
+                     coreaff.trn.nn, size = 2, decay = 1e-3,
+                     linout = T, skip = T, maxit = 1000, Hess = T)
# weights: 26
initial value 12893652845419998.00
iter 10 value 6352515847944854.00
final value 6287104424549762.00
converged
> summary(coreaff.nn1)
a 7-2-1 network with 26 weights
options were - skip-layer connections linear output units decay=0.001
     b->h1     i1->h1     i2->h1     i3->h1     i4->h1     i5->h1     i6->h1     i7->h1
 -21604.84   -2675.80   -5001.90   -1240.16    -335.44  -12462.51  -13293.80   -9032.34
     b->h2     i1->h2     i2->h2     i3->h2     i4->h2     i5->h2     i6->h2     i7->h2
 210841.52   47296.92   58100.43  -13819.10   -9195.80  117088.99  131939.57  106994.47
      b->o      h1->o      h2->o      i1->o      i2->o      i3->o      i4->o      i5->o      i6->o      i7->o
1115190.67  894123.33 -417269.57   89621.84  170268.12   44833.63   59585.05  112405.30  437581.05  244201.69
> sum((hh.iast.y - predict(coreaff.nn1))^2)
Error: object "hh.iast.y" not found

So I try:

> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)
Error: dims [product 5053] do not match the length of object [5088]
In addition: Warning message:
In coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1) :
  longer object length is not a multiple of shorter object length

Doing a little debugging:

> pred <- predict(coreaff.nn1)
> dim(pred)
[1] 5053    1
> dim(coreaff.trn.nn)
[1] 5088    8

So it looks like the number of records/cases in the vector pred is 5,053 while the number of records in the input dataset is 5,088. It looks like the neural network is dropping 35 records. Does anyone have any idea why it would do this? It is most probably because those 35 records are "bad" data, a pretty common occurrence in the real world. Does anyone know how I can identify the dropped records? If I can do this I can get the dimensions of the input dataset down to 5,053, and then:

> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)

would work. A summary of my dataset is:

> summary(coreaff.trn.nn)
   hh.iast.y        WC_Total_Assets      all_assets_per_hh         age                 tenure           max_acc_ownr_liq_asts_n_med
 Min.   :      0    Min.   :-6.970e-01   Min.   :-8.918e-01   Min.   :-4.617e+00   Min.   :-1.209e+00   Min.   :0.0000
 1st Qu.: 565520    1st Qu.:-5.387e-01   1st Qu.:-6.147e-01   1st Qu.:-4.583e-01   1st Qu.:-7.254e-01   1st Qu.:0.0000
 Median : 834164    Median :-3.160e-01   Median :-3.718e-01   Median : 9.093e-02   Median :-2.423e-01   Median :0.0000
 Mean   : 1060244   Mean   : 2.948e-13   Mean   : 3.204e-12   Mean   :-1.884e-11   Mean   :-3.302e-12   Mean   :0.4951
 3rd Qu.: 1207181   3rd Qu.: 1.127e-01   3rd Qu.: 1.891e-01   3rd Qu.: 5.617e-01   3rd Qu.: 5.629e-01   3rd Qu.:1.0000
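If the 35 missing records are rows with NAs (nnet's model-frame handling drops incomplete cases by default via na.omit), they can be identified with complete.cases(); a sketch, assuming that is indeed the cause:

```r
# flag rows with any missing value - these are what the model frame would drop
bad <- !complete.cases(coreaff.trn.nn)
sum(bad)  # if this equals 35, missing data explains the size mismatch
coreaff.trn.cc <- coreaff.trn.nn[!bad, ]  # complete rows only
sum((coreaff.trn.cc$hh.iast.y - predict(coreaff.nn1))^2)  # now conformable
```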