[R] Questions about Probit Analysis

Lorenzo Isella Sun, 31 Oct 2010 11:15:05 -0700

Dear All,
I have some questions about probit regressions.
I saw a nice introduction at

http://bit.ly/bU9xL5

and I mainly have two questions.

(1) The first is mostly about data manipulation. Consider the following snippet

##################################################

mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/binary.csv"))
names(mydata) <- c("outcome","x1","x2","x3")

# Probit regression of the 0/1 outcome on x1, x2 and x3 (x3 treated as categorical)
myprobit <- glm(outcome ~ x1 + x2 + as.factor(x3),
                family = binomial(link = "probit"), data = mydata)


print(summary(myprobit))


# Now regress the outcome on x1 only

myprobit2 <- glm(outcome ~ x1, family = binomial(link = "probit"), data = mydata)

print(summary(myprobit2))

# Express the data as counts: rows are x1 values, columns are outcomes 0 and 1
md <- t(table(mydata$outcome, mydata$x1))

# Create a new data frame with one row per distinct x1 value
mydatanew <- data.frame(x1        = as.numeric(rownames(md)),
                        successes = as.numeric(md[, "1"]),  # outcome == 1
                        failures  = as.numeric(md[, "0"]))  # outcome == 0


########################################################################

where I first carry out a probit regression of the binary outcome (i.e. taking only 0/1 as values) on 3 regressors, and then I simply regress the outcome on the x1 variable.


Finally, I generate the data frame mydatanew (see some of its entries below)

> mydatanew
    x1 successes failures
1  220         0        1
2  300         1        2
3  340         1        3
4  360         0        4
5  380         0        8
[...................]

where for every value of x1 I count the number of 0 and 1 outcomes (namely the number of failures and the number of successes). This is equivalent to having the full list of x1 values, each with its associated 0/1 outcome (I have simply counted them), hence it is all the information I need to perform the probit regression of the binary outcome on x1 again; only the data format is different. How can I actually feed mydatanew to R to perform the probit regression on x1 only?

(2) This is a bit more conceptual. Let us say that you have a set of products A, B, C, D, E, F. Each product has a list of features: x_A for product A, x_B for B, etc. Each customer has their own set of parameters (age, sex, income, etc.), which I call x_cust. Finally, the customer is confronted with two products (e.g. A and D; combinations may vary, and I call each combination of two products a scenario) and asked which one he would like to buy. Bottom line: your data are in the format

1 x_A x_cust
0 x_D x_cust

meaning that a certain customer chose product A over product D; similarly,

1 x_C x_cust
0 x_B x_cust

would mean that the customer choosing between C and B finally selected C. Every customer needs to choose a product in a variety of different scenarios. How would you analyze this kind of data? Is there any way I can express, in my probit analysis, the fact that my binary outcome (buy this product or not) always arises from the comparison of two products only (customers are never given a choice between more than two products in a given scenario)? Or should I simply run my probit regression on my 0/1 outcome without any extra worry (like in the snippet above)?
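For question (1), a minimal sketch, assuming the standard behaviour of glm() with a binomial family: when the response is a two-column matrix cbind(successes, failures), glm() maximises the grouped binomial likelihood, which differs from the 0/1 likelihood only by a constant, so the estimates should match the fit on the individual records (as in myprobit2). Because the binary.csv URL may no longer be reachable, the block simulates a hypothetical stand-in data set; the coefficients -2 and 0.003 and the x1 grid are made up.

```r
set.seed(1)

# Hypothetical stand-in for binary.csv: a 0/1 outcome and one regressor x1
n  <- 400
x1 <- sample(seq(220, 800, by = 20), n, replace = TRUE)
outcome <- rbinom(n, size = 1, prob = pnorm(-2 + 0.003 * x1))
mydata <- data.frame(outcome = outcome, x1 = x1)

# Probit fit on the individual 0/1 records (as in myprobit2)
fit_binary <- glm(outcome ~ x1, family = binomial(link = "probit"),
                  data = mydata)

# Collapse to one row per distinct x1 value with success/failure counts
md <- table(mydata$x1, factor(mydata$outcome, levels = c(0, 1)))
mydatanew <- data.frame(x1        = as.numeric(rownames(md)),
                        successes = as.numeric(md[, "1"]),
                        failures  = as.numeric(md[, "0"]))

# Grouped probit fit: the response is the matrix cbind(successes, failures)
fit_grouped <- glm(cbind(successes, failures) ~ x1,
                   family = binomial(link = "probit"), data = mydatanew)

# The two sets of estimates should agree up to convergence tolerance
print(cbind(binary = coef(fit_binary), grouped = coef(fit_grouped)))
```

An equivalent formulation uses the observed proportion as the response with prior weights, glm(successes / (successes + failures) ~ x1, weights = successes + failures, family = binomial(link = "probit"), data = mydatanew). The coefficients and standard errors are the same as in the 0/1 fit; only the residual deviance and its degrees of freedom change, because the saturated model is different for grouped data.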
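For question (2), one common way to encode the pairwise structure (a sketch, not necessarily the right model for this study) is a random-utility, Thurstone-style paired-comparison probit. If a customer's utility for product j is x_j'beta plus an independent normal error, then P(first product chosen) = Phi((x_first - x_second)'beta), with the error scale absorbed into beta. Each scenario becomes one row whose regressors are feature differences; the main effects of x_cust cancel in the comparison, so customer attributes can only enter through interactions with those differences. Everything in the block is a hypothetical simulation: six products with two made-up features f1 and f2, 200 customers with 5 scenarios each, the "true" weights, and an age variable.

```r
set.seed(42)

# Hypothetical products A-F, each described by two numeric features
feat <- rbind(A = c(f1 = 1.0, f2 = 0.3), B = c(f1 = 0.5, f2 = 1.1),
              C = c(f1 = 2.0, f2 = 0.4), D = c(f1 = 1.5, f2 = 0.9),
              E = c(f1 = 0.2, f2 = 1.5), F = c(f1 = 1.2, f2 = 0.6))
beta_true <- c(f1 = 0.8, f2 = -0.5)  # assumed taste weights for the simulation

# 200 hypothetical customers, each facing 5 random two-product scenarios
n_cust <- 200
n_scen <- 5
scen <- do.call(rbind, lapply(seq_len(n_cust), function(i) {
  pairs <- t(replicate(n_scen, sample(rownames(feat), 2)))
  data.frame(cust = i, first = pairs[, 1], second = pairs[, 2],
             stringsAsFactors = FALSE)
}))
age <- round(runif(n_cust, 20, 70))              # one customer attribute
scen$age_c <- rep(age - mean(age), each = n_scen) # centred, repeated per scenario

# Feature differences: first-listed product minus second-listed product
dx <- feat[scen$first, ] - feat[scen$second, ]
scen$d_f1 <- dx[, "f1"]
scen$d_f2 <- dx[, "f2"]

# Simulated choice: the first product wins when its utility is higher,
# i.e. when (x_first - x_second)'beta plus a N(0, 1) error is positive
scen$y <- as.integer(drop(dx %*% beta_true) + rnorm(nrow(scen)) > 0)

# One row per scenario; no intercept because presentation order is assumed
# irrelevant. Age can only act through an interaction with a difference.
fit_pair <- glm(y ~ 0 + d_f1 + d_f2 + d_f1:age_c,
                family = binomial(link = "probit"), data = scen)
print(summary(fit_pair))
```

If the records only say which product was chosen (the "1 x_A / 0 x_D" layout), the order within each pair should be taken from how the pair was presented, or randomised, before differencing; always putting the chosen product first would make y identically 1. Repeated choices by the same customer are treated as independent here; a random-effects model or standard errors clustered on cust would be a natural refinement.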

Many thanks

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
