On Thu, 6 May 2004 14:19:45 GMT, "Phil Sherrod"
<[EMAIL PROTECTED]> wrote:

> 
> On  5-May-2004, [EMAIL PROTECTED] (AJ) wrote:
> 
> > I am trying to forecast customer life span for a set of data.
> >
> > Basically, we have 8 years data and thousands of rows regarding a
> > subscription service. Three raw variables are as follows.
> >
> > a) Starting Date of subscription
> > b) Cancellation Date of subscription
> > c) Demographic Segments that a customer belongs to.  We have 66
> > categorical values such as 01, 02, etc.  These segments are given to
> > us by an outside firm that assigns a segment to each customer
> > based on variables such as what kind of car the customer drives,
> > her level of education, or her income.
> >
> > I am interested in predicting the number of months a customer would
> > stay with the product. I was thinking I could use the following
> > variables in my regression model.
> 
> This is a good example of a data mining problem that could be handled well
> by a decision tree (regression tree).  Unlike classical (numeric-function)
> regression, where your categorical variables have to be recast as multiple
> binary (0/1) dummy variables, decision trees handle categorical variables
> in a natural way.  I would just dump all of the data with all of the variables

Phil, are you asserting, implicitly, that your decision-tree analysis
has a built-in facility for handling Survival analysis in a life-table
manner?

> into the analysis and let it pick out which variables are significant and
> look for interactions.  Unless there is something unusual about your data, I
> believe the entire setup and analysis run could be done in a half hour.

Since there is only one predictor variable, with 66 levels, I don't
see why the analysis should take more than 3 seconds....
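
For a rough sense of scale, here is a minimal sketch of that
one-predictor tree fit in Python (hypothetical file and column
names; scikit-learn trees still want the 66-level segment
dummy-coded, and this treats tenure as fully observed, ignoring
the censoring issue raised above):

    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor

    # hypothetical layout: one row per customer, 'segment' holds the
    # 66-level code, 'months' the observed tenure in months
    df = pd.read_csv("subscriptions.csv")

    # recast the single categorical predictor as 0/1 dummy columns
    X = pd.get_dummies(df["segment"].astype(str), prefix="seg")
    y = df["months"]

    tree = DecisionTreeRegressor(min_samples_leaf=100)
    tree.fit(X, y)   # effectively groups segments by mean tenure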

> 
> I recommend first developing a single-tree model, which is excellent for
> getting a visual picture of the model and looking for significant variables
> and interactions.  Then, for significantly increased accuracy, I would
> build a TreeBoost model consisting of a series of boosted trees.  TreeBoost
> typically has accuracy comparable to that of neural networks.
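
For readers without Phil's software, a rough open-source analogue
of that boosted-tree idea is stochastic gradient boosting, which
fits a long series of shallow trees, each one to the residuals of
the ensemble built so far.  A sketch, reusing the hypothetical X
(dummy-coded segments) and y (observed tenure) from above:

    from sklearn.ensemble import GradientBoostingRegressor

    # a series of shallow trees, each fit to the residuals of the
    # ensemble so far; subsample < 1 makes the boosting "stochastic"
    boost = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                      max_depth=3, subsample=0.8)
    boost.fit(X, y)
    print(boost.predict(X.iloc[:5]))   # predicted tenure, first five customers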

I do think that there is potential for a model based on the
characteristics coded into the 66 groups, but I did not read
the question as asking for that.  And it is not a sure thing
that the information behind the 66 groups is even available
in coded form.

After the Survival analysis, there are 66 groups, each of which
is characterized by a survival percentage estimated from a
life table.  If the error terms differ across groups, there is
also an estimate of the variance, for 'weighting' a further
analysis.  That second stage could be rather straightforward
as a regression, if the characteristics behind the categories
are known and scorable.
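
A minimal sketch of that two-stage idea, assuming the Python
lifelines package and the same hypothetical column names: fit a
Kaplan-Meier (life-table) curve within each segment and pull out
a survival percentage at some fixed horizon; the 66 resulting
numbers, weighted by their Greenwood variances, could then go
into an ordinary regression on whatever is known about the
segments.

    import pandas as pd
    from lifelines import KaplanMeierFitter

    # hypothetical file and column names
    df = pd.read_csv("subscriptions.csv",
                     parse_dates=["start_date", "cancel_date"])
    study_end = pd.Timestamp("2004-05-01")   # assumed end of observation window

    # tenure in months; customers with no cancellation date are right-censored
    df["duration"] = (df["cancel_date"].fillna(study_end)
                      - df["start_date"]).dt.days / 30.44
    df["cancelled"] = df["cancel_date"].notna().astype(int)

    surv_24 = {}
    for seg, grp in df.groupby("segment"):
        km = KaplanMeierFitter()
        km.fit(grp["duration"], event_observed=grp["cancelled"])
        surv_24[seg] = float(km.predict(24.0))   # P(still subscribed at 24 months)

The 24-month horizon is arbitrary; the per-segment median
survival time would serve the same purpose.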

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html