Re: Forecasting customer life span

Phil Sherrod Fri, 07 May 2004 05:06:20 -0700

On  6-May-2004, Richard Ulrich <[EMAIL PROTECTED]> wrote:

> Phil, are you asserting, implicitly, that your decision-tree analysis
> has a built-in facility for handling Survival analysis in a life-table
> manner?


No, I am not.  Please reread the problem statement that was posted:

> > > Basically, we have 8 years data and thousands of rows regarding a
> > > subscription service. Three raw variables are as follows.
> > >
> > > a) Starting Date of subscription
> > > b) Cancellation Date of subscription

Every entry has a starting date and a cancellation date; there is no truncated
survival period.  So why do you think survival analysis is required for this?
Just because the dependent variable happens to relate to time periods doesn't
immediately mean that survival analysis is called for.  They could be trying to
predict the amount of money the customer spent during the subscription period
and the same type of analysis would work.
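
A minimal sketch of that point, assuming every record really does carry both
dates: the subscription length can be computed directly and used as an ordinary
continuous target.  The record layout and the sample values are hypothetical,
not taken from the poster's data.

```python
from datetime import date


def subscription_days(start: date, cancel: date) -> int:
    """Length of a completed subscription in days."""
    if cancel < start:
        raise ValueError("cancellation date precedes start date")
    return (cancel - start).days


# Hypothetical records: (start date, cancellation date, segment code)
records = [
    (date(1997, 3, 1), date(1999, 3, 1), "01"),
    (date(2000, 1, 15), date(2000, 7, 15), "02"),
]

# One fully observed duration per customer; no censoring flag is needed
# because every subscription in this sketch has already ended.
durations = [subscription_days(start, cancel) for start, cancel, _ in records]
```

If some subscriptions were still active at the end of the 8 years, this
shortcut would no longer apply to those rows.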

> Since there is only one predictor variable, with 66 levels, I don't
> see why the analysis should take more than 3 seconds....

Why do you think there is only one predictor variable with 66 levels?  Here is
the statement:

> > > c) Demographic Segments that a customer belongs to. We have 66
> > > categorical values such as 01, 02..etc. These segments are given to
> > > us by an outside firm that basically appends a segment to a customer
> > > data based on variables such as what kind of car a customer drives,
> > > how much she is educated, or how much she earns etc.

Note: "_variables_ such as..." (1) kind of car, (2) education, (3) income...

There are 66 variables with multiple levels.  It is very possible (even likely)
that they will want to use the zip code and/or state of residence as
predictors.  Are you going to recast all of the zip code classes as separate
binary variables?  Even the type of car may have dozens of classes. I recently
developed a decision tree model for an application that used zip code as one of
the predictors, and there were over 5000 categories.
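
To put a rough number on the recoding question, here is a small sketch that
counts how many binary indicator columns a set of categorical predictors would
need.  The function name and the level counts other than the 5000 zip codes
are assumptions made up for illustration.

```python
def dummy_column_count(levels_per_predictor, drop_reference=True):
    """Count the 0/1 columns needed to recode categorical predictors.

    With drop_reference=True, one level per predictor is kept as the
    baseline, so a predictor with k levels needs k - 1 columns.
    """
    if any(k < 1 for k in levels_per_predictor):
        raise ValueError("every predictor needs at least one level")
    return sum(k - 1 if drop_reference else k for k in levels_per_predictor)


# Hypothetical level counts: zip code, type of car, state of residence
levels = [5000, 40, 50]
n_columns = dummy_column_count(levels)
```

Under these assumed counts the zip code alone accounts for nearly all of the
columns, which is the practical problem with the binary recoding.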

> After Survival analysis, there are 66 groups, each of which
> is distinguished by a survival percentage, estimated by a
> life table.  If the error terms differ, there also could be an
> estimate of variance, for 'weighting' a further analysis.
> That could be rather straightforward as a regression,
> if those category-terms are known and scorable.

I believe there may be hundreds of thousands of "groups" defined by
intersections of the various classes on the 66 predictors.  Also, who knows
what type of interactions you're going to run into.  I think it would be a
nightmare to try to fit a regression to this.
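
As a back-of-the-envelope check on the number of groups, the count of possible
intersections is the product of the level counts.  The six level counts below
are invented for illustration; the real predictors and their levels are not
known from the problem statement.

```python
from math import prod


def cell_count(levels_per_predictor):
    """Number of distinct combinations of levels across all predictors."""
    return prod(levels_per_predictor)


# Hypothetical level counts for just six categorical predictors
levels = [12, 10, 8, 6, 5, 4]
n_cells = cell_count(levels)
```

Even six modest predictors give over a hundred thousand possible cells in this
sketch, and many of those cells would hold few or no customers.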

-- 
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com  (decision tree modeling)
http://www.nlreg.com  (nonlinear regression)
