> 2008/6/7 John Darrington <[EMAIL PROTECTED]>:
> > On Thu, Jun 05, 2008 at 05:05:38PM -0400, Jason Stover wrote:
> >
> >      The big problem there is the "accounting" problem of mapping
> >      values of a qualitative variable back and forth to vectors
> >      with binary entries.
> >
> > I don't understand why this is a "big" problem, but perhaps I'm being
> > naive.  Would it be possible to have a brief specification for the
> > problem.

On Sat, Jun 07, 2008 at 09:49:27AM +0100, Ed wrote:
> I read this code last night, and the existing implementation is
> straightforward, but doesn't handle some of the more complicated
> things:
>
> sigma restricted encoding (this seems tough - might be worth leaving
> as a later enhancement)
> interactions
> - which lead to:
> nested designs
> [partial] factorial designs
> mixture surface models (i think they're called - regression with
> interactions)
>
> I'm not sure what the ideal spec for a routine building a design
> matrix is.  The existing code does everything you need at a basic
> level, provided you have all your independent variables, but it
> doesn't introduce terms to handle interactions.  Something on top
> perhaps needs to take a model spec like A(B) C C*D or whatever and
> turn that into a set of independent variables for the design matrix
> routine to handle.

This is what makes the design matrix routine a "big" problem. I'm not
sure how big, but it does need to know which columns in the matrix
belong to which variables (that's already done), which columns
correspond to interactions, which to nested effects, and which to
random effects. Mapping interactions to columns might not be easy.
Also, the coefficient portion of the model struct will need a way to
match coefficients with columns (or maybe variables).

The GLM procedure code would have to call the design matrix code, hand
it a model with any conceivable combination of these kinds of effects,
and get a design matrix back, along with a way to match any variables
(or combinations thereof) with the corresponding columns in the design
matrix. This was the hardest part of writing the REGRESSION procedure,
so I think it will be the hardest part of writing a GLM procedure.

Once the design matrix is in place, estimation can proceed according
to one of the many algorithms out there in the literature. Even if we
picked the wrong one, it wouldn't be hard to change purely linear
algebraic code later. The problem is going to be getting the data to
the algorithm, and sorting through the results afterward.

> I haven't really thought this through yet, but I am hoping to work on it.

I'm not sure of the best way to do it, either. It might be worth
taking a look at similar code in R.

-Jason
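
To make the bookkeeping concrete, here is a rough sketch in Python. It
is not PSPP code, and every name in it (dummy_code, build_design,
columns_of) is made up for illustration. It assumes treatment (dummy)
coding with the first sorted level as the reference, builds interaction
columns as products of the main-effect columns, and records which
columns belong to which term. Nested and random effects are left out.

```python
from itertools import product

def dummy_code(values):
    """Treatment coding: the first level (in sorted order) is the reference."""
    levels = sorted(set(values))
    names = [str(lv) for lv in levels[1:]]
    rows = [[1.0 if v == lv else 0.0 for lv in levels[1:]] for v in values]
    return names, rows

def build_design(data, terms):
    """Build a design matrix and a term -> column-index map.

    data:  dict mapping a variable name to its list of categorical values.
    terms: list of tuples of variable names; ("A",) is a main effect,
           ("A", "B") is the A-by-B interaction.
    """
    n = len(next(iter(data.values())))
    coded = {name: dummy_code(vals) for name, vals in data.items()}
    matrix = [[1.0] for _ in range(n)]  # column 0 is the intercept
    labels = ["(Intercept)"]
    columns_of = {}
    for term in terms:
        start = len(labels)
        # One column per combination of non-reference levels in the term.
        index_sets = [range(len(coded[v][0])) for v in term]
        for combo in product(*index_sets):
            labels.append(":".join(f"{v}={coded[v][0][j]}"
                                   for v, j in zip(term, combo)))
            for i in range(n):
                x = 1.0
                for v, j in zip(term, combo):
                    x *= coded[v][1][i][j]  # product of indicator entries
                matrix[i].append(x)
        columns_of[term] = list(range(start, len(labels)))
    return matrix, labels, columns_of

data = {"A": ["a1", "a2", "a3", "a1"], "B": ["b1", "b2", "b2", "b1"]}
terms = [("A",), ("B",), ("A", "B")]
X, labels, columns_of = build_design(data, terms)
```

With a map like columns_of, a coefficient vector returned by the
estimation code can be sliced per term, which is the "match
coefficients with columns" step described above.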
