On 6-May-2004, Richard Ulrich <[EMAIL PROTECTED]> wrote:

> Phil, are you asserting, implicitly, that your decision-tree analysis
> has a built-in facility for handling Survival analysis in a life-table
> manner?

No, I am not. Please reread the problem statement that was posted:

> > > Basically, we have 8 years data and thousands of rows regarding a
> > > subscription service. Three raw variables are as follows.
> > >
> > > a) Starting Date of subscription
> > > b) Cancellation Date of subscription

Every entry has a starting date and a cancellation date; there is no
truncated survival period. So why do you think survival analysis is
required for this? Just because the dependent variable happens to relate
to time periods doesn't immediately mean that survival analysis is called
for. They could be trying to predict the amount of money the customer
spent during the subscription period and the same type of analysis would
work.

> Since there is only one predictor variable, with 66 levels, I don't
> see why the analysis should take more than 3 seconds....

Why do you think there is only one predictor variable with 66 levels?
Here is the statement:

> > > c) Demographic Segments that a customer belongs to. We have 66
> > > categorical values such as 01, 02..etc. These segments are given to
> > > us by an outside firm that basically appends a segment to a customer
> > > data based on variables such as what kind of car a customer drives,
> > > how much she is educated, or how much she earns etc.

Note: "_variables_ such as..." (1) kind of car, (2) education, (3)
income... There are 66 variables with multiple levels. It is very possible
(even likely) that they will want to use the zip code and/or state of
residence as predictors. Are you going to recast all of the zip code
classes as separate binary variables? Even the type of car may have dozens
of classes. I recently developed a decision tree model for an application
that used zip code as one of the predictors, and there were over 5000
categories.

> After Survival analysis, there are 66 groups, each of which
> is distinguished by a survival percentage, estimated by a
> life table.
> If the error terms differ, there also could be an
> estimate of variance, for 'weighting' a further analysis.
> That could be rather straightforward as a regression,
> if those category-terms are known and scorable.

I believe there may be hundreds of thousands of "groups" defined by
intersections of the various classes on the 66 predictors. Also, who
knows what type of interactions you're going to run into. I think it
would be a nightmare to try to fit a regression to this.

--
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com (decision tree modeling)
http://www.nlreg.com (nonlinear regression)
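
Two points in the reply above can be made concrete with a small sketch.
First, when every row has both a starting date and a cancellation date,
the subscription length is an ordinary, fully observed number. Second,
recasting a categorical predictor as binary variables produces one column
per distinct category. The rows, the zip codes, and the helper names
(duration_days, one_hot) are hypothetical and chosen only for
illustration; they do not come from the data under discussion.

```python
from datetime import date

# Hypothetical rows: (start date, cancellation date, segment code, zip code).
rows = [
    (date(1997, 3, 1), date(1999, 3, 1), "01", "37027"),
    (date(2000, 6, 15), date(2001, 6, 15), "02", "10001"),
    (date(1998, 1, 10), date(1998, 7, 10), "01", "60614"),
]


def duration_days(start, cancel):
    """Length in days of a subscription whose end date is known."""
    return (cancel - start).days


def one_hot(values):
    """Return (sorted distinct levels, one 0/1 indicator row per value)."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    encoded = []
    for value in values:
        indicator = [0] * len(levels)
        indicator[index[value]] = 1
        encoded.append(indicator)
    return levels, encoded


# Every row has both dates, so the dependent variable is a plain number.
durations = [duration_days(start, cancel) for start, cancel, _, _ in rows]

# One binary column per distinct zip code seen in the data.
zip_levels, zip_matrix = one_hot([zip_code for _, _, _, zip_code in rows])

print(durations)
print(len(zip_levels), "binary columns for the zip code predictor")
```

The width of the indicator matrix equals the number of distinct
categories, so a predictor with over 5000 zip codes would contribute over
5000 binary columns under this encoding.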
