On Tue, Oct 27, 2009 at 06:25:32PM +0000, John Darrington wrote:
> Will that be enough to allow a subset of GLM to be implemented?
Yes, except for the interactions.

> J'
>
> On Tue, Oct 27, 2009 at 11:47:23AM -0400, Jason Stover wrote:
> > On Tue, Oct 27, 2009 at 06:38:19AM +0000, John Darrington wrote:
> > > Just to make sure I understand things correctly, consider the
> > > following example, where x and y are numeric variables and A and
> > > B are categorical ones:
> > >
> > >    x y A B
> > >    =======
> > >    3 4 x v
> > >    5 6 y v
> > >    7 8 z w
> > >
> > > We replace the categorical variables with bit vectors:
> > >
> > >    x y A_0 A_1 A_2 B_0 B_1
> > >    ========================
> > >    3 4  1   0   0   1   0
> > >    5 6  0   1   0   1   0
> > >    7 8  0   0   1   0   1
> > >
> > > and arbitrarily drop one of the subscripts (say the zeroth):
> > >
> > >    x y A_1 A_2 B_1
> > >    ================
> > >    3 4  0   0   0
> > >    5 6  1   0   0
> > >    7 8  0   1   1
> > >
> > > That will produce a 5x5 matrix.  5 is calculated from n + m - p,
> > > where n is the number of numeric variables, m is the total number
> > > of categories, and p is the number of categorical variables.
> >
> > This is correct.
> >
> > > However, I don't see how such a matrix can be very useful.  A
> > > better one would involve the products of the categorical and
> > > numeric variables:
> > >
> > >    x y x*A_1 x*A_2 y*A_1 y*A_2 x*B_1 y*B_1
> > >    ========================================
> > >    3 4   0     0     0     0     0     0
> > >    5 6   5     0     6     0     0     0
> > >    7 8   0     7     0     8     7     8
> > >
> > > This makes an 8x8 matrix, where 8 is calculated from
> > > n + n * (m - p), which happens to be identical to n * (1 + m - p).
> > > But this involves a whole lot more calculations.
> >
> > This second choice would give you the covariance of x and y, and the
> > covariances of the *interactions* between x and A, x and B, y and A,
> > and y and B, but not the covariance between (say) x and A.  The
> > covariance between x and A would be stored in the first matrix you
> > mentioned, in elements (0,2), (0,3), (2,0) and (3,0), assuming we
> > kept both upper and lower triangles.
> >
> > You mention that matrix not being very useful, and in a sense it
> > isn't: no human would care about the covariance between x and the
> > column corresponding to the first bit vector of A.  But in another
> > sense, that matrix is absolutely necessary: it is used to solve the
> > least squares problem, whose solution tells us whether A and our
> > dependent variable are related.  That relation is shown via analysis
> > of variance, whose p-value is many computations away from the
> > covariance matrix, but depends on it nevertheless.
> >
> > This matrix is unnecessary for a one-way ANOVA, whose computations
> > can be reduced from the matrix above to the simple sums used in
> > oneway.q.  But for a bigger model, with many factors, interactions,
> > and covariates, we need that first matrix because we can't reduce
> > the problem to a few easy-to-read summations.
> >
> > _______________________________________________
> > pspp-dev mailing list
> > [email protected]
> > http://lists.gnu.org/mailman/listinfo/pspp-dev
>
> --
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
> See http://pgp.mit.edu or any PGP keyserver for public key.
