On Tue, Apr 17, 2007 at 08:28:18AM +0800, John Darrington wrote:
> On Mon, Apr 16, 2007 at 01:53:54PM -0400, Jason Stover wrote:
> > On Mon, Apr 16, 2007 at 11:10:43AM +0800, John Darrington wrote:
> > > If we were to follow approach 2, am I right in thinking that the
> > > 'interaction' data structure could be as large as the number of
> > > cases in the casefile?
> >
> > No. It would have either a hash of possible values (all unique), or a
> > small function to get back and forth between a union value and a
> > binary vector.
>
> So, given an interaction involving N variables, from a datafile with M
> observations, what is the upper bound on the size of this hash?
That depends on the number of distinct values of the variables. If you
have 2 categorical variables, one with n possible values and the other
with m, the hash would need n*m-1 entries. With k variables and
n1, n2,...,nk distinct possible values, the number of entries would
be n1*n2*...*nk - 1.

Only in unusual circumstances would k be larger than 3, and almost never
larger than 4, but that is how people "should" use interactions. Some
users could make a lot more interactions, making that hash very large.

If the variables are numeric, then the interaction is just their product.
If one is numeric and one categorical, then the interaction is the
scalar product.

-Jason
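
For illustration only, here is a rough sketch of the counts and products
described above. It is not PSPP code: the function names are hypothetical,
and treating the numeric/categorical case as the numeric value times the
categorical variable's binary vector is an assumption about what "scalar
product" means here.

```python
from math import prod


def interaction_entries(level_counts):
    """Hash entries for an interaction of categorical variables,
    using the n1*n2*...*nk - 1 count stated in the message."""
    return prod(level_counts) - 1


def numeric_interaction(values):
    """Interaction of purely numeric variables: their product."""
    return prod(values)


def mixed_interaction(x, indicator):
    """Assumed reading of the mixed case: scale the categorical
    variable's binary vector by the numeric value x."""
    return [x * b for b in indicator]


# Two categorical variables with 3 and 4 distinct values.
print(interaction_entries([3, 4]))      # 11
# Adding a third variable with 5 values multiplies the count again.
print(interaction_entries([3, 4, 5]))   # 59
```

The count grows multiplicatively with each added variable, which is why
interactions of many variables could make the hash very large.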
