Jason W. Martinez wrote:
Dear R-users,
I have an outcome variable and I'm unsure about how to treat it. Any advice?
I have spending data for each county in the state of California (N=58). Each county has been allocated money to spend on any one of the following four categories: A, B, C, and D.
Each county may spend the money in any way it sees fit. This also means that a county need not spend all the money that was allocated to it. The data structure looks something like the one below:
COUNTY          A        B        C        D    Total
----------------------------------------------------
alameda   2534221  1555592  2835475  3063249  9988537
alpine       3174     8500        0    45558    55232
amador          0        0        0        0        0
....
The goal is to explain variation in spending patterns, which are presumably the result of characteristics of each county.
I could treat the problem as a simple linear regression for each category, but by definition, money spent in one category reduces the amount that can be spent in any other category, and each county is not allocated the same amount of money to spend.
I have constructed proportions of the amount spent on each category and have conducted quasibinomial regression on each dependent outcome, but that does not seem very convincing to me.
Would anyone have any advice about how to treat an outcome variable of this sort?
Thanks for any hints!
Jason

If you concentrate only on the relative proportions, these are called compositional data. If your data are in
mydata (n x 4), you obtain compositions by
sweep(mydata, 1, apply(mydata, 1, sum), "/")
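To make the sweep() call concrete, here is a small sketch using the three rows shown in the question. The object names (totals, undefined, comp_ok) are only illustrative, and dropping the all-zero county is an assumption on my part, not something the question settles: a county that spent nothing has no defined composition, since 0/0 gives NaN.

```r
# Rows copied from the example table in the question (categories A-D only).
mydata <- rbind(
  alameda = c(A = 2534221, B = 1555592, C = 2835475, D = 3063249),
  alpine  = c(A = 3174,    B = 8500,    C = 0,       D = 45558),
  amador  = c(A = 0,       B = 0,       C = 0,       D = 0)
)

# Divide every row by its own total to get proportions.
totals <- rowSums(mydata)
comp <- sweep(mydata, 1, totals, "/")

# Counties with a zero total give 0/0 = NaN; here they are set aside
# (an assumption for illustration, other treatments are possible).
undefined <- totals == 0
comp_ok <- comp[!undefined, , drop = FALSE]

rowSums(comp_ok)  # each remaining row sums to 1
```

Note also that alpine has a zero share in category C, so any log-based transform of that row will produce -Inf unless zeros are handled first.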
There are no R functions or packages specifically for compositional data, AFAIK, but you
can try googling. Aitchison has a monograph (Chapman & Hall) and a paper in JRSS B.
One way to start might be lm or anova on the symmetric logratio transform of the
compositions. The R function lm can take a multivariate response, but some extra programming will be needed
for interpretation. With simulated data:
> slr
function(y) { # y should sum to 1
    v <- log(y)
    return( v - mean(v) )
}
> testdata <- matrix( rgamma(120, 2,3), 30, 4)
> str(testdata)
 num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
> comp <- sweep(testdata, 1, apply(testdata,1,sum), "/")
# To get the symmetric logratio transform:
> comp <- t(apply(comp, 1, slr))
# Observe:
> apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
> lm( comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
             [,1]      [,2]      [,3]      [,4]
(Intercept)   0.17606   0.06165  -0.03783  -0.19988

> summary(lm( comp ~ 1))
Response Y1 :

Call:
lm(formula = Y1 ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.29004 -0.46725 -0.07657  0.55834  1.20551

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]   0.1761     0.1265   1.391    0.175

Residual standard error: 0.6931 on 29 degrees of freedom

Response Y2 :

Call:
lm(formula = Y2 ~ 1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2982 -0.5711 -0.1355  0.5424  1.6598

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  0.06165    0.15049    0.41    0.685

Residual standard error: 0.8242 on 29 degrees of freedom

Response Y3 :

Call:
lm(formula = Y3 ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.97529 -0.41115  0.03666  0.42785  0.88567

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783    0.11623  -0.325    0.747

Residual standard error: 0.6366 on 29 degrees of freedom

Response Y4 :

Call:
lm(formula = Y4 ~ 1)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8513 -0.3955  0.2815  0.5939  1.2475

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  -0.1999     0.1620  -1.234    0.227

Residual standard error: 0.8872 on 29 degrees of freedom
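
The intercept-only fit above carries over to a model with covariates. The following is only a sketch on simulated data: x is a hypothetical county characteristic, the effect size 0.5 is invented for the simulation, and clr and clr_inv are names I made up (clr is the same transform as slr above, often called the centered log-ratio; clr_inv maps a fitted vector back to proportions). One bit of the "extra programming" for interpretation is this back-transformation of predictions to the composition scale.

```r
set.seed(1)

# Same transform as slr() above, and its inverse (back to proportions).
clr     <- function(y) { v <- log(y); v - mean(v) }
clr_inv <- function(z) exp(z) / sum(exp(z))

n   <- 30
x   <- rnorm(n)                                # hypothetical covariate
raw <- matrix(rgamma(n * 4, shape = 2, rate = 3), n, 4)
raw[, 1] <- raw[, 1] * exp(0.5 * x)            # category 1 depends on x (simulation only)

comp <- sweep(raw, 1, rowSums(raw), "/")       # compositions
z    <- t(apply(comp, 1, clr))                 # logratio-transformed responses

fit <- lm(z ~ x)                               # multivariate response
coef(fit)                                      # 2 x 4; each row sums to ~0

# Predict on the logratio scale, then map back to proportions.
newz <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)))
pred <- t(apply(newz, 1, clr_inv))
rowSums(pred)                                  # each predicted composition sums to 1
```

Because each transformed row sums to zero, the coefficients across the four responses sum to zero as well, so they are best read relative to each other rather than one response at a time.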
Sorry for not being of more help!
Kjetil
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra