Phil,  I think you have misunderstood the problem.  See below.  -- Don.

On Fri, 7 May 2004, Phil Sherrod wrote in part:

> On  6-May-2004, Richard Ulrich <[EMAIL PROTECTED]> wrote:
>
> > Phil, are you asserting, implicitly, that your decision-tree analysis
> > has a built-in facility for handling Survival analysis in a life-table
> > manner?
>
> No I am not.  Please reread the problem statement that was posted:
>
> > > > Basically, we have 8 years data and thousands of rows regarding a
> > > > subscription service. Three raw variables are as follows.
> > > >
> > > > a) Starting Date of subscription
> > > > b) Cancellation Date of subscription
>
> Every entry has a starting date and a cancellation date; there is no
> truncated survival period.  So why do you think survival analysis is
> required for this?

Mostly because AJ (the OP) said so, explicitly:

>>  Dependent Variables: NumberOfMonths (derived from taking the
>> difference between the starting and ending date of subscription for
>> both cancelled customers and customers who are still with us)
>>  Independent Variables
>> a) Status (whether a customer has cancelled (0) or still with us (1))
>> b) Demographic Segment

That refers both to "cancelled customers" and to "customers who are
still with us".  For the latter one may have an ending date for the
current subscription;  but until the customer decides to renew (or to
cancel) one does not know whether the subscription will in fact end on
that date.  Those durations are censored, not complete.  Sounds like
survival analysis to me.
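
To make the point concrete, here is a minimal sketch of a life-table
style (Kaplan-Meier) estimate that uses both kinds of customer.  The
data are invented for illustration, and the flag is coded the other way
round from AJ's Status variable: here True means the customer cancelled,
False means the customer is still subscribed (censored).

```python
from collections import Counter

def kaplan_meier(durations, cancelled):
    """Return [(t, S(t))] at each time with at least one cancellation.

    durations: months observed for each customer
    cancelled: True if the subscription ended at that duration,
               False if the customer is still subscribed (censored)
    """
    events = Counter(t for t, c in zip(durations, cancelled) if c)
    exits = Counter(durations)   # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk customers who did not cancel at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]      # cancelled and censored customers both drop out
    return curve

# Hypothetical data: months subscribed, and whether the customer cancelled.
months = [3, 5, 5, 8, 12, 12]
cancelled = [True, True, False, True, False, False]
curve = kaplan_meier(months, cancelled)
```

With these six made-up customers the estimated proportion still
subscribed drops to 5/6 at 3 months, 2/3 at 5 months and 4/9 at 8
months.  The censored customers never count as cancellations, but they
do count in the number at risk up to the time they were last observed.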

By way of confirmation:  in the next paragraph, AJ asked:

>> Questions:
>>  Q1) Is it ok to calculate "NumberOfMonths" variable from starting
>> and ending date of subscription? The reason I ask this is that for
>> customers who have not cancelled subscription yet, it will only
>> result in a number that will be the same whether they are still with
>> us [or not -- DFB]. Of course this information (cancellation of
>> subscription) will simultaneously be captured in the "status"
>> independent variable (0 or 1).
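
One way to derive the duration together with the censoring flag is
sketched below.  It assumes (my assumption, not AJ's description) that a
current customer has no cancellation date on file and is measured up to
a data-extraction cutoff date;  the function names and dates are
invented for illustration.

```python
from datetime import date

def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1              # the last month is not yet complete
    return months

def duration_and_status(start, cancel, cutoff):
    """Return (NumberOfMonths, cancelled).

    cancel is None for a customer who has not cancelled;  cutoff is the
    (hypothetical) date on which the data were extracted.
    """
    if cancel is not None and cancel <= cutoff:
        return months_between(start, cancel), True
    # Still subscribed at the cutoff: the duration is only a lower bound.
    return months_between(start, cutoff), False

cutoff = date(2004, 5, 1)
a = duration_and_status(date(2003, 1, 15), date(2003, 7, 20), cutoff)
b = duration_and_status(date(2003, 1, 15), None, cutoff)
```

Here a is (6, True) and b is (15, False).  The flag belongs with the
duration as part of the outcome;  it marks which values are complete and
which are only lower bounds.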

  <snip>

[Rich Ulrich:]
> > Since there is only one predictor variable, with 66 levels, I don't
> > see why the analysis should take more than 3 seconds....
>
> Why do you think there is only one predictor variable with 66 levels?
> Here is the statement:
>
>> c) Demographic Segments that a customer belongs to. We have 66
>> categorical values such as 01, 02..etc. These segments are given to
>> us by an outside firm that basically appends a segment to a customer
>> data based on variables such as what kind of car a customer drives,
>> how much she is educated, or how much she earns etc.
>
> Note: "_variables_ such as..." (1) kind of car, (2) education, (3)
> income...  There are 66 variables with multiple levels.

No.  AJ explicitly writes "66 categorical VALUES" [emphasis added].
These segments (which I take to mean the applicable one of the 66 values
{1,2,3,...,66} for each customer) are appended to the customer's data,
and are BASED ON an unspecified number of variables (of which three
exemplars are named).  They do not COMPRISE those variables.
 [And (not that it matters) we have no idea whether the number of those
variables is 66, or more, or (more likely, in my opinion) fewer.
(There are probably more of them than the three named examples.)]
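
The distinction can be sketched in a few lines.  Assuming the segment
codes really are "01" through "66" (the full set is my guess from AJ's
"01, 02..etc."), one categorical predictor expands into 65 indicator
columns when coded for a regression-type model, yet it remains a single
variable.  The choice of "01" as reference level is arbitrary.

```python
# Hypothetical segment codes "01".."66", one code per customer.
levels = [f"{i:02d}" for i in range(1, 67)]

def dummy_code(segment, levels, reference="01"):
    """Indicator coding of ONE categorical predictor, dropping a reference level."""
    return [1 if segment == lvl else 0 for lvl in levels if lvl != reference]

row = dummy_code("17", levels)        # a customer in segment 17
baseline = dummy_code("01", levels)   # a customer in the reference segment
```

Each row has 65 entries, at most one of which is 1.  The car, education
and income variables that the outside firm used to build the segments do
not appear anywhere in it.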

 <snip, the rest>

 ------------------------------------------------------------
 Donald F. Burrill                              [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110      (603) 626-0816