On Thu, 6 May 2004 14:19:45 GMT, "Phil Sherrod" <[EMAIL PROTECTED]> wrote:
> On 5-May-2004, [EMAIL PROTECTED] (AJ) wrote:
>
> > I am trying to forecast customer life span for a set of data.
> >
> > Basically, we have 8 years of data and thousands of rows regarding a
> > subscription service. The three raw variables are as follows.
> >
> > a) Starting date of the subscription
> > b) Cancellation date of the subscription
> > c) Demographic segment that a customer belongs to. We have 66
> > categorical values such as 01, 02, etc. These segments are given to
> > us by an outside firm that appends a segment to a customer's record
> > based on variables such as what kind of car the customer drives,
> > how educated she is, or how much she earns, etc.
> >
> > I am interested in predicting the number of months a customer would
> > stay with the product. I was thinking I could use the following
> > variables in my regression model.
>
> This is a good example of a data mining problem that could be handled
> well by a decision tree (regression tree). Unlike classical (numeric
> function) regression, where your categorical variables have to be
> recast as multiple binary (0/1) variables, decision trees handle
> categorical variables in a natural way. I would just dump all of the
> data with all of the variables

Phil, are you asserting, implicitly, that your decision-tree analysis
has a built-in facility for handling survival analysis in a life-table
manner?

> into the analysis and let it pick out which variables are significant
> and look for interactions. Unless there is something unusual about
> your data, I believe the entire setup and analysis run could be done
> in half an hour.

Since there is only one predictor variable, with 66 levels, I don't see
why the analysis should take more than 3 seconds....

> I recommend first developing a single-tree model, which is excellent
> for getting a visual picture of the model and looking for significant
> variables and interactions. Then, for significantly increased
> accuracy, I would build a TreeBoost model consisting of a series of
> boosted trees. TreeBoost typically has accuracy comparable to neural
> networks.

I do think that there is potential for a model based on the
characteristics coded into the 66 groups, but I did not read the
question as asking that, and it is not a sure thing that the
information behind the 66 groups is even coded.

After survival analysis, there are 66 groups, each of which is
distinguished by a survival percentage estimated from a life table. If
the error terms differ, there could also be an estimate of variance,
for 'weighting' a further analysis. That could be rather
straightforward as a regression, if those category terms are known and
scorable.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
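
For concreteness, a minimal sketch of the tree-based setup discussed
above, assuming Python with pandas and scikit-learn. The column names
(start_date, cancel_date, segment), the toy records, and the 2004-05-01
cutoff are invented for illustration, and scikit-learn's
HistGradientBoostingRegressor is used only as a generic boosted-tree
stand-in for TreeBoost, which is a different product. Note that
regressing the observed months directly ignores the right-censoring of
customers who have not yet cancelled, which is the gap Rich points to.

import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

# Toy stand-in for the 8 years of subscription records (invented data).
df = pd.DataFrame({
    "start_date":  pd.to_datetime(["1996-03-01", "1997-07-15",
                                   "2001-01-10", "2002-05-20"]),
    "cancel_date": pd.to_datetime(["1998-03-01", None,
                                   "2003-06-10", None]),
    "segment":     ["01", "17", "01", "42"],  # 66 codes in the real data
})

cutoff = pd.Timestamp("2004-05-01")  # assumed end of the observation window

# Tenure in months; customers who have not cancelled are censored at the cutoff.
end = df["cancel_date"].fillna(cutoff)
df["months"] = (end - df["start_date"]).dt.days / 30.44
df["cancelled"] = df["cancel_date"].notna()   # False = still active (censored)

# Boosted regression trees on the segment, standing in for a TreeBoost model.
# Ordinal-encode the 66 codes; categorical_features=[0] makes the booster
# split on the column as a categorical rather than an ordered number.
df["segment_code"] = df["segment"].astype("category").cat.codes
model = HistGradientBoostingRegressor(categorical_features=[0])
model.fit(df[["segment_code"]], df["months"])
print(model.predict(df[["segment_code"]]))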
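
And a sketch of the life-table idea: one survival curve per segment,
here a Kaplan-Meier (product-limit) estimate used as a stand-in for a
grouped life table. It assumes the lifelines package is available and
continues from the df, months, and cancelled columns built in the
previous sketch.

from lifelines import KaplanMeierFitter

km = KaplanMeierFitter()
for seg, grp in df.groupby("segment"):
    # Cancellations are events; still-active customers enter as censored.
    km.fit(grp["months"], event_observed=grp["cancelled"],
           label=f"segment {seg}")
    # One summary number per segment, e.g. the median time to cancellation,
    # which could feed a further regression on segment characteristics.
    print(seg, km.median_survival_time_)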
