Hello R folks,

Today I noticed that using the subset argument in lm() with a polynomial term gives a different result than fitting the same polynomial to data that has already been subsetted. This was not at all intuitive to me. You can see an example here: https://stackoverflow.com/questions/70490599/why-does-lm-with-the-subset-argument-give-a-different-answer-than-subsetting-i
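
A minimal sketch of the behavior, using made-up data rather than the data from the linked question (the data frame `d`, the mask `keep`, and the object names are illustrative assumptions). It fits the same quadratic both ways and compares the results. With the subset argument, poly() appears to be evaluated on the full x before the rows are dropped, so the basis columns, and therefore the coefficients, differ:

```r
set.seed(1)

# Illustrative data: 20 rows, of which only the first 10 are used for fitting.
d <- data.frame(x = 1:20, y = rnorm(20))
keep <- d$x <= 10

# Route 1: let lm() do the subsetting. The formula terms, including
# poly(x, 2), are evaluated on all 20 rows, then rows are dropped.
fit_subset_arg <- lm(y ~ poly(x, 2), data = d, subset = keep)

# Route 2: subset first. poly(x, 2) only ever sees the 10 kept rows.
fit_presubset <- lm(y ~ poly(x, 2), data = d[keep, ])

# The coefficients differ because the orthogonal basis differs ...
print(coef(fit_subset_arg))
print(coef(fit_presubset))

# ... but in this sketch both bases span the same quadratic space on the
# kept rows, so the fitted values can be compared directly.
print(all.equal(unname(fitted(fit_subset_arg)),
                unname(fitted(fit_presubset))))
```

Comparing fitted values as well as coefficients helps separate a change of basis from a change in the fitted model.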

If this is a design feature that you don’t think should be fixed, could you please include it in the documentation and explain why it makes sense to compute the orthogonal polynomials on the entire dataset? This feels like a serious leak of information when evaluating train and test datasets in a statistical learning framework.

Ray

Raymond R. Balise, PhD
Assistant Professor
Department of Public Health Sciences, Biostatistics
University of Miami, Miller School of Medicine
1120 N.W. 14th Street
Don Soffer Clinical Research Center - Room 1061
Miami, Florida 33136

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel