Thanks, everyone, for your responses.

My data is organized in a data.table.  My goal is to run analyses by group.  Each analysis returns an object.  If these objects could be stored as elements of a data.table, summarizing the results downstream would be easier.

Let me try another example.

library(data.table)

carsdt <- setDT(copy(mtcars))

carsdt[, unique(cyl) |> length()]
#[1] 3

carsreg <- carsdt[, .(fit = lm(mpg ~ disp + hp + wt)), by = .(cyl)]

#I would like a data.table with three rows, one "lm" object for each cyl value

carsreg[, .N]
#[1] 36

#Here each component of "lm" object is stored in a separate row.

carsreg[1]
#     cyl                                             fit
#   <num> <lm>
#1:     6 30.27790680, 0.01610061,-0.01097072,-3.89618307

lm(mpg ~ disp + hp + wt, data = mtcars, subset = (cyl == 6)) |> coef()
#(Intercept)        disp          hp          wt
#30.27790680  0.01610061 -0.01097072 -3.89618307
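
The count of 36 seems to follow from how an "lm" object is built: it is a list of components, and a list returned for a column in j occupies one row per element. With 12 components per fit and 3 groups, that gives 36 rows. A small base-R check of the component count (the name fit6 is only illustrative):

```r
# Fit the same model for one group, using base R only
fit6 <- lm(mpg ~ disp + hp + wt, data = mtcars, subset = (cyl == 6))

# An "lm" object is a list; count and name its components
length(fit6)
names(fit6)

# Number of cyl groups times the number of components per fit
length(unique(mtcars$cyl)) * length(fit6)
```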

A less satisfactory solution is to extract the desired components and store them in a data.table.  But this requires multiple calls to lm() for each group.

carsreg2 <- carsdt[, .(coef = list(coef(lm(mpg ~ disp + hp + wt))), rsq = summary(lm(mpg ~ disp + hp + wt))$r.squared), by = .(cyl)]

Now if I also want to include the F-statistic, it would require an additional call to lm() and adding a column to the data.table above.  Is there a way to avoid this?
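
One possible way to avoid the repeated calls, offered as a sketch rather than a confirmed answer: wrap the lm() call in list() so that each group contributes a single list element, then derive every summary from the stored fits. The names carsfit, carssum, s, and fstat are illustrative. Passing data = copy(.SD) is a cautious assumption, made so that the stored model frames do not share memory with the buffer data.table reuses across groups.

```r
library(data.table)

carsdt <- setDT(copy(mtcars))

# list() keeps each fitted model as one element of a list column,
# so the result has one row per cyl value
carsfit <- carsdt[, .(fit = list(lm(mpg ~ disp + hp + wt, data = copy(.SD)))),
                  by = .(cyl)]
carsfit[, .N]

# Summaries are computed from the stored fits; lm() is not called again
carssum <- carsfit[, {
    s <- lapply(fit, summary)
    .(cyl   = cyl,
      coef  = lapply(fit, coef),
      rsq   = sapply(s, \(x) x$r.squared),
      fstat = sapply(s, \(x) x$fstatistic[["value"]]))
}]
carssum
```

Adding another statistic then only means adding another column computed from s or fit.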

Naresh

On 9/22/24 2:00 AM, Bert Gunter wrote:
Well, you may have good reasons to do things this way -- and you
certainly do not have to explain them here.

But you might wish to consider using R's poly() function and a basic
nested list structure to do something quite similar that seems much
simpler to me, anyway:

x <- rnorm(20)
df <- data.frame(x = x, y = x + .1*x^2 + rnorm(20, sd = .2))
result <-
    with(df,
         lapply(1:2, \(i)
             list(degree = i, reg = lm(y ~ poly(x, i, raw = TRUE)))
         )
    )

As you can see, 'result' is a list, each component of which is a list
of two with names "degree" and "reg" giving the same info as each row
of your 'mydt'. You can use lapply() and friends to access these
results and fiddle with them as you like, such as: "extract the
coefficients from the second degree fits only", and so forth. Also
note that individual components of nested lists can be extracted by
giving a vector to [[ instead of repeated [['s. For example:
result[[2]][[2]]  ## the reg component of the degree 2 polynomial
## is the same as
result[[c(2,2)]] ## this is a bit easier for me to grok.
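
The access patterns described above could look like the following sketch. It rebuilds 'result' along the lines of the code above, passing data = df to lm() instead of using with(); set.seed(1) and the names coefs, quad_coef, and r2 are additions for illustration only.

```r
set.seed(1)  # added so the sketch is reproducible
x <- rnorm(20)
df <- data.frame(x = x, y = x + .1 * x^2 + rnorm(20, sd = .2))
result <- lapply(1:2, \(i)
    list(degree = i, reg = lm(y ~ poly(x, i, raw = TRUE), data = df)))

# Coefficients of every fit, as a list with one element per degree
coefs <- lapply(result, \(r) coef(r$reg))

# Coefficients of the second-degree fit only
quad_coef <- coef(Filter(\(r) r$degree == 2, result)[[1]]$reg)

# A vector index walks the nesting: element 2, then its component 2
identical(result[[c(2, 2)]], result[[2]][[2]])

# One number per fit, here R-squared
r2 <- sapply(result, \(r) summary(r$reg)$r.squared)
```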

Again, feel free to ignore without replying if my gratuitous remarks
are unhelpful.

Cheers,
Bert


On Sat, Sep 21, 2024 at 2:25 PM Naresh Gurbuxani
<naresh_gurbux...@hotmail.com> wrote:
I am trying to store regression objects in a data.table

df <- data.frame(x = rnorm(20))
df[, "y"] <- with(df, x + 0.1 * x^2 + 0.2 * rnorm(20))

mydt <- data.table(mypower = c(1, 2),
                   myreg = list(lm(y ~ x, data = df),
                                lm(y ~ x + I(x^2), data = df)))

mydt
#   mypower    myreg
#     <num>   <list>
#1:       1 <lm[12]>
#2:       2 <lm[12]>

But mydt[1, 2] has only the coefficients of the first regression. mydt[2,
2] has the residuals of the first regression.  These are the first two
components of the "lm" object.

mydt[1, myreg[[1]]]
#(Intercept)           x
#   0.107245    1.034110

Is there a way to put full "lm" object in each row?
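
Judging from the <lm[12]> entries in the printout above, each cell may already hold a complete "lm" object, and the difficulty may lie in how the cell is extracted. A sketch under that assumption pulls elements out of the list column with [[ (set.seed(1) and the names m2 and rsq are illustrative additions):

```r
library(data.table)

set.seed(1)  # added for reproducibility
df <- data.frame(x = rnorm(20))
df[, "y"] <- with(df, x + 0.1 * x^2 + 0.2 * rnorm(20))

mydt <- data.table(mypower = c(1, 2),
                   myreg = list(lm(y ~ x, data = df),
                                lm(y ~ x + I(x^2), data = df)))

# Take the list column as a plain list, then pick one element
m2 <- mydt[["myreg"]][[2]]
class(m2)
coef(m2)

# Apply a function to every stored fit to add a summary column
mydt[, rsq := sapply(myreg, \(m) summary(m)$r.squared)]
mydt
```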

Thanks,
Naresh

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
