Re: [R] mixtures as outcome variables

Kjetil Brinchmann Halvorsen Wed, 23 Mar 2005 09:10:19 -0800

Jason W. Martinez wrote:

Dear R-users,

I have an outcome variable and I'm unsure about how to treat it. Any
advice?

I have spending data for each county in the state of California (N=58).
Each county has been allocated money to spend on any one of the
following four categories: A, B, C, and D.

Each county may spend the money in any way they see fit. This also means
that the county need not spend all the money that was allocated to them.
The data structure looks something like the one below:

COUNTY    A        B       C       D        Total
----------------------------------------------------
alameda  2534221  1555592 2835475  3063249  9988537
alpine   3174     8500    0        45558    55232
amador    0       0        0        0       0
....


The goal is to explain variation in spending patterns, which are
presumably the result of characteristics for each county.

I may treat the problem like a simple linear regression problem for each
category, but by definition, money spent in one category will take away
the amount of money that can be spent in any other category---and each
county is not allocated the same amount of money to spend.

I have constructed the proportion of the amount spent on each category and
have run a quasibinomial regression on each outcome separately, but that
does not seem very convincing to me.

Would anyone have any advice about how to treat an outcome variable of
this sort?

Thanks for any hints!

Jason

If you concentrate only on the relative proportions, these are called compositional data. If your data are in mydata (n x 4), you obtain the compositions with sweep(mydata, 1, apply(mydata, 1, sum), "/")
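
As a minimal sketch of that step, the three rows shown in the question can be typed in by hand and turned into compositions. The object names (totals, spent) and the choice to drop counties with zero total spending are assumptions made for illustration, not something stated in the thread; a row of zeros gives 0/0 = NaN and so has no defined composition.

```r
# Spending matrix built from the three rows shown in the question
# (rows = counties, columns = categories A-D).
mydata <- matrix(c(2534221, 1555592, 2835475, 3063249,
                   3174,    8500,    0,       45558,
                   0,       0,       0,       0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("alameda", "alpine", "amador"),
                                 c("A", "B", "C", "D")))

totals <- rowSums(mydata)

# Assumption: counties that spent nothing are set aside, because
# dividing a row of zeros by its zero total yields NaN.
spent <- totals > 0
comp <- sweep(mydata[spent, , drop = FALSE], 1, totals[spent], "/")

# prop.table() with margin = 1 computes the same row proportions.
stopifnot(isTRUE(all.equal(comp,
                           prop.table(mydata[spent, , drop = FALSE], 1))))

rowSums(comp)  # each remaining row sums to 1
```

Note that alpine still has a zero in category C; that is a valid proportion, but log(0) is -Inf, so zeros of this kind need separate treatment before any logratio transform.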

There are no specific functions/packages for R for compositional data (AFAIK), but you can try googling. Aitchison has a monograph (Chapman & Hall) and a paper in JRSS B.

One way to start might be lm or anova on the symmetric logratio transform of the compositions. The R function lm can take a multivariate response, but some extra programming will be needed for interpretation. With simulated data:

> slr
function(y) { # y should sum to 1
         v <- log(y)
         return( v - mean(v) ) }
> testdata <- matrix( rgamma(120, 2,3), 30, 4)
> str(testdata)
num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
> comp <- sweep(testdata, 1, apply(testdata,1,sum), "/")
> # To get the symmetric logratio transform:
> comp <- t(apply(comp, 1, slr))
> # Observe:
> apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
> lm( comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
             [,1]      [,2]      [,3]      [,4]
(Intercept)   0.17606   0.06165  -0.03783  -0.19988

> summary(lm( comp ~ 1))
Response Y1 :

Call:
lm(formula = Y1 ~ 1)

Residuals:
    Min       1Q   Median       3Q      Max
-1.29004 -0.46725 -0.07657  0.55834  1.20551

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
[1,]   0.1761     0.1265   1.391    0.175

Residual standard error: 0.6931 on 29 degrees of freedom


Response Y2 :

Call:
lm(formula = Y2 ~ 1)

Residuals:
   Min      1Q  Median      3Q     Max
-1.2982 -0.5711 -0.1355  0.5424  1.6598

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
[1,]  0.06165    0.15049    0.41    0.685

Residual standard error: 0.8242 on 29 degrees of freedom


Response Y3 :

Call:
lm(formula = Y3 ~ 1)

Residuals:
    Min       1Q   Median       3Q      Max
-1.97529 -0.41115  0.03666  0.42785  0.88567

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783    0.11623  -0.325    0.747

Residual standard error: 0.6366 on 29 degrees of freedom


Response Y4 :

Call:
lm(formula = Y4 ~ 1)

Residuals:
   Min      1Q  Median      3Q     Max
-2.8513 -0.3955  0.2815  0.5939  1.2475

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
[1,]  -0.1999     0.1620  -1.234    0.227

Residual standard error: 0.8872 on 29 degrees of freedom
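
The intercept-only fit above can be extended to a model with a predictor, and the fitted values can be mapped back to proportions for interpretation. The following is only a sketch under stated assumptions: the covariate x, the helper slr_inv, and the seed are hypothetical and not part of the original session, and the simulated amounts are strictly positive, so the problem of log(0) does not arise here.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Symmetric (centered) logratio transform of one composition
slr <- function(y) {
  v <- log(y)
  v - mean(v)
}

# Hypothetical inverse: exponentiate and renormalize so parts sum to 1
slr_inv <- function(z) exp(z) / sum(exp(z))

n <- 30
x <- rnorm(n)                              # hypothetical county covariate
raw <- matrix(rgamma(n * 4, 2, 3), n, 4)   # simulated positive amounts
comp <- sweep(raw, 1, rowSums(raw), "/")
Y <- t(apply(comp, 1, slr))

fit <- lm(Y ~ x)   # one least-squares regression per transformed part
B <- coef(fit)     # 2 x 4 matrix: intercept row and slope row
rowSums(B)         # both rows are zero up to rounding error

# Predicted compositions at x = 0 and x = 1, back on the proportion scale
newx <- data.frame(x = c(0, 1))
pred <- t(apply(predict(fit, newx), 1, slr_inv))
rowSums(pred)      # each predicted composition sums to 1
```

Because every transformed row sums to zero, the coefficients across the four responses also sum to zero, so a slope is best read relative to the other parts rather than on its own. Comparing the back-transformed rows of pred shows how the predicted shares shift as x changes.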


Sorry for not being of more help!

Kjetil

--

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
              --  Mahdi Elmandjra


______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
