Hi Nicolas, I don't get it.
The coefficients are estimated through OLS. Essentially, you are just
computing a matrix pseudo-inverse:

    beta = (X^T X)^(-1) X^T y

(there is a quick numpy sketch of this in the P.S. below). Splitting the
data does not improve the model itself; it only helps in something like
LASSO, where you have a tuning parameter to select. Holding out some data
will make the regression estimates worse.

Hope to hear from you, thanks!

On Sat, Jun 1, 2019 at 10:04 AM Nicolas Hug <nio...@gmail.com> wrote:

> Splitting the data into train and test data is needed with any machine
> learning model (not just linear regression with or without least squares).
>
> The idea is that you want to evaluate the performance of your model
> (prediction + scoring) on a portion of the data that you did not use for
> training.
>
> You'll find more details in the user guide:
> https://scikit-learn.org/stable/modules/cross_validation.html
>
> Nicolas
>
> On 5/31/19 8:54 PM, C W wrote:
>
> Hello everyone,
>
> I'm new to scikit-learn. I see that many tutorials in scikit-learn follow
> a workflow along the lines of:
> 1) transform the data
> 2) split the data: train, test
> 3) instantiate the sklearn object and fit
> 4) predict and tune parameters
>
> But linear regression is done with least squares, so I don't think a
> train/test split is necessary. So, I guess I can just use the entire
> dataset?
>
> Thanks in advance!
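P.S. Here is a minimal numpy sketch of the closed-form solution above,
checked against scikit-learn. The data is made up for illustration; it
assumes X has full column rank, and solves the normal equations with
np.linalg.solve rather than forming the inverse explicitly:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))         # illustrative design matrix, full column rank
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Normal equations: beta = (X^T X)^(-1) X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimates as sklearn's OLS (no intercept, since X has no constant column)
ols = LinearRegression(fit_intercept=False).fit(X, y)
print(np.allclose(beta, ols.coef_))   # True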
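And for comparison, the train/test workflow from the tutorials, as I
understand it (again just a sketch on synthetic data; the point of the
split is to score the model on data it was not fit on):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)

# split, then transform (fitting the scaler on the training data only),
# then fit, then predict/score on the held-out data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)
print(model.score(scaler.transform(X_test), y_test))   # R^2 on the test set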
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn