Hi Nicholas,

I don't get it.

The coefficients are estimated through OLS. Essentially, you are just
computing a matrix pseudoinverse, where
beta = (X^T * X)^(-1) * X^T * y

Splitting the data does not improve the model. It only helps in something
like LASSO, where you have a tuning parameter to select.

Holding out some data will only make the regression estimates worse.
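For concreteness, here is a minimal sketch of that closed-form solution in
NumPy (the toy data, coefficients, and variable names are all illustrative,
not from any real dataset); the pseudoinverse route is the numerically safer
equivalent of inverting X^T * X directly:

```python
import numpy as np

# Illustrative toy data: 50 samples, 2 features, known coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.standard_normal(50)

# Add an intercept column, then solve the normal equations:
#   beta = (X^T X)^(-1) X^T y
Xb = np.hstack([np.ones((50, 1)), X])
beta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y

# Equivalent, but better conditioned: use the Moore-Penrose pseudoinverse.
beta_pinv = np.linalg.pinv(Xb) @ y
assert np.allclose(beta, beta_pinv)
```

In practice you would let np.linalg.lstsq or sklearn's LinearRegression do
this for you rather than forming the inverse explicitly.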

Hope to hear from you, thanks!



On Sat, Jun 1, 2019 at 10:04 AM Nicolas Hug <nio...@gmail.com> wrote:

> Splitting the data into train and test data is needed with any machine
> learning model (not just linear regression with or without least squares).
>
> The idea is that you want to evaluate the performance of your model
> (prediction + scoring) on a portion of the data that you did not use for
> training.
>
> You'll find more details in the user guide
> https://scikit-learn.org/stable/modules/cross_validation.html
>
> Nicolas
>
>
> On 5/31/19 8:54 PM, C W wrote:
>
> Hello everyone,
>
> I'm new to scikit-learn. I see that many tutorials in scikit-learn follow
> a workflow along the lines of
> 1) transform the data
> 2) split the data: train, test
> 3) instantiate the sklearn object and fit
> 4) predict and tune parameters
>
> But linear regression is fit by least squares, so I don't think a train
> test split is necessary. So, I guess I can just use the entire dataset?
>
> Thanks in advance!
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
