Dear All,
I have some questions about probit regressions.
I saw a nice introduction at

http://bit.ly/bU9xL5

and I mainly have two questions.

(1) The first is mostly about data manipulation. Consider the following snippet:

##################################################

mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/binary.csv"))
names(mydata) <- c("outcome","x1","x2","x3")

myprobit <- glm(mydata$outcome~mydata$x1+mydata$x2+as.factor(mydata$x3), family=binomial(link="probit"))

print(summary(myprobit))


#Now assume I can make a regression only on x1


myprobit2 <- glm(mydata$outcome~mydata$x1, family=binomial(link="probit"))

print(summary(myprobit2))

#express in terms of counts

md <- t(table(mydata$outcome, mydata$x1))

# create new dataframe


mydatanew <- data.frame(as.numeric(row.names(md)))

names(mydatanew) <- c("x1")

mydatanew$successes <-as.numeric(md[ ,2])

mydatanew$failures <-as.numeric(md[ ,1])


########################################################################

where I first run a probit regression of the binary outcome (taking only the values 0/1) on three regressors, and then simply regress the outcome on the x1 variable alone.

Finally, I generate the data frame mydatanew (see some of its entries below)

> mydatanew
    x1 successes failures
1  220         0        1
2  300         1        2
3  340         1        3
4  360         0        4
5  380         0        8
[...................]

where for every value of x1 I count the number of 0 and 1 outcomes (namely the number of failures and the number of successes). This is equivalent to having the full list of x1 values with an associated 0/1 outcome (I have simply counted them), so it is all the information I need to run the probit regression of the binary outcome on x1 again, but the data format is now different. How can I actually feed mydatanew to R to run the regression on x1 only again?

(2) This is a bit more conceptual. Let us say that you have a set of products A, B, C, D, E, F. Each product has a list of features: x_A for product A, x_B for B, etc. Each customer has his own set of parameters (age, sex, income, etc.), which I call x_cust. Finally, the customer is presented with two products (e.g. A and D; combinations may vary, and I call each combination of two products a scenario) and asked which one he would like to buy. Bottom line: your data are in the format


1 x_A x_cust
0 x_D x_cust

meaning that a certain customer chose product A against product D; similarly

1 x_C x_cust
0 x_B x_cust

would mean that the customer choosing between C and B finally selected C. Every customer needs to choose a product in a variety of different scenarios.

How would you analyze this kind of data? Is there any way I can express, in my probit analysis, the fact that my binary outcome (buy this product or not) always arises from the comparison of two products only (customers are never given a choice between more than two products in a given scenario)? Or should I simply run my probit regression on the 0/1 outcome without any extra worry (as in the snippet above)?
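
For question (1), one candidate is the two-column response form of glm, where the response is given as cbind(successes, failures). The sketch below is only an illustration under that assumption: the data are simulated (the values of x1 and the coefficients -2 and 0.003 are made up, not taken from binary.csv), and the names fit_binary and fit_counts are hypothetical.

```r
# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
x1 <- sample(seq(300, 800, by = 100), 200, replace = TRUE)
outcome <- rbinom(200, size = 1, prob = pnorm(-2 + 0.003 * x1))
mydata <- data.frame(outcome, x1)

# Probit fit on the one-row-per-observation 0/1 data
fit_binary <- glm(outcome ~ x1, family = binomial(link = "probit"),
                  data = mydata)

# Collapse to one row per distinct x1 with counts of 1s and 0s
md <- table(mydata$x1, mydata$outcome)
mydatanew <- data.frame(x1        = as.numeric(rownames(md)),
                        successes = as.numeric(md[, "1"]),
                        failures  = as.numeric(md[, "0"]))

# Probit fit on the counts: the response is a two-column matrix
# (number of successes, number of failures)
fit_counts <- glm(cbind(successes, failures) ~ x1,
                  family = binomial(link = "probit"), data = mydatanew)

# Show the two sets of coefficients side by side
print(cbind(binary = coef(fit_binary), counts = coef(fit_counts)))
```

The two sets of coefficients should agree up to the convergence tolerance of the fitting algorithm. The residual deviances are expected to differ, because the grouped fit compares against a different saturated model.
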
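
For question (2), one possible formulation (an assumption, not the only way to treat such data) is to keep one row per scenario instead of two, and to regress "was the first product chosen" on the difference between the features of the two products, without an intercept. The sketch below uses a single made-up feature and simulated choices; the names x_first, x_second, dx, scenarios and the value of beta are all hypothetical.

```r
# Simulated paired-choice data (hypothetical values)
set.seed(2)
n        <- 500          # number of scenarios
x_first  <- rnorm(n)     # feature of the first product shown
x_second <- rnorm(n)     # same feature of the second product shown
beta     <- 1.2          # made-up weight of the feature

# Each product gets a latent utility with independent N(0, 1) noise;
# the customer picks the product with the larger utility
u_first     <- beta * x_first  + rnorm(n)
u_second    <- beta * x_second + rnorm(n)
chose_first <- as.integer(u_first > u_second)

# One row per scenario: outcome plus the feature difference
scenarios <- data.frame(chose_first, dx = x_first - x_second)

# Probit on the difference, no intercept (swapping the two products
# should not change the model). Customer covariates such as x_cust are
# the same for both products, so they cancel in the difference and can
# enter only through interactions with dx.
fit_pair <- glm(chose_first ~ 0 + dx,
                family = binomial(link = "probit"), data = scenarios)
print(coef(fit_pair))
```

Under this simulation the difference of the two noise terms has variance 2, so the fitted coefficient estimates beta / sqrt(2) rather than beta itself. The sketch treats all scenarios as independent and does nothing about repeated choices made by the same customer.
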
Many thanks

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
