On Fri, 7 May 2004 01:22:59 GMT, "Phil Sherrod"
<[EMAIL PROTECTED]> wrote:

> On  6-May-2004, Richard Ulrich <[EMAIL PROTECTED]> wrote:
> 
> > Phil, are you asserting, implicitly, that your decision-tree analysis
> > has a built-in facility for handling Survival analysis in a life-table
> > manner?
> 
> No I am not.  Please reread the problem statement that was posted:
> 
> > > > Basically, we have 8 years data and thousands of rows regarding a
> > > > subscription service. Three raw variables are as follows.
> > > >
> > > > a) Starting Date of subscription
> > > > b) Cancellation Date of subscription
> 
> Every entry has a starting date and a cancellation date; there is no truncated
> survival period.  So why do you think survival analysis is required for this?

I suppose it is the user's "Question 1" that makes it necessary,
and it was the earlier post by someone else that made it seem
obvious:

======== from the initial post in sci.stat.consult
Questions:
Q1) Is it ok to calculate "NumberOfMonths" variable from starting and
ending date of subscription? The reason I ask this is that for
customers who have not cancelled subscription yet, it will only result
in a number that will be the same whether they are still with us. Of
course this information (cancellation of subscription) will
simultaneously be captured in the "status" independent variable (0 or 1).
=========
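
To make the concern in Q1 concrete, here is a minimal sketch of how
"NumberOfMonths" and "status" might be derived when some subscriptions
have not been cancelled. The cutoff date, the sample rows, the function
names, and the coding of status (1 = cancelled, 0 = still active) are
all assumptions for illustration; the original post does not specify them.

```python
from datetime import date

def months_between(start, end):
    """Whole months elapsed from start to end (assumes end >= start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months

def duration_and_status(start, cancel, cutoff):
    """Return (NumberOfMonths, status).

    status 1 = cancelled on or before the cutoff (duration fully observed),
    status 0 = still active at the cutoff (duration is right-censored).
    This coding is an assumption, not taken from the original post.
    """
    if cancel is not None and cancel <= cutoff:
        return months_between(start, cancel), 1
    return months_between(start, cutoff), 0

CUTOFF = date(2004, 5, 1)  # hypothetical end of the observation window

rows = [
    (date(2000, 1, 15), date(2002, 1, 15)),  # cancelled subscription
    (date(2002, 5, 1), None),                # still subscribed at cutoff
]

results = [duration_and_status(s, c, CUTOFF) for s, c in rows]
print(results)
```

Both rows yield the same number of months, and only the status flag
separates a completed subscription from one that is still running.
That is the situation a life-table or survival method is designed for.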


> Just because the dependent variable happens to relate to time periods doesn't
> immediately mean that survival analysis is called for.  They could be trying to
> predict the amount of money the customer spent during the subscription period
> and the same type of analysis would work.
> 
> > Since there is only one predictor variable, with 66 levels, I don't
> > see why the analysis should take more than 3 seconds....
> 
> Why do you think there is only one predictor variable with 66 levels?  Here is
> the statement:
> 
> > > > c) Demographic Segments that a customer belongs to. We have 66
> > > > categorical values such as 01, 02..etc. These segments are given to
> > > > us by an outside firm that basically appends a segment to a customer
> > > > data based on variables such as what kind of car a customer drives,
> > > > how much she is educated, or how much she earns etc.
> 
> Note: "_variables_ such as..." (1) kind of car, (2) education, (3) income...
> 
> There are 66 variables with multiple levels.  It is very possible (even likely)
> that they will want to use the zip code and/or state of residence as

I will confess that my reading is not a sure one.  The language, 
I think, is inconsistent;  I still find my interpretation to be the
likely one, but I won't be surprised if I'm wrong.

What it says is, "66 categorical values,"  not "variables".
It also says, "demographic segments that a customer belongs to" -
which seems to imply more than one segment per customer; then
it says, "appends *a* segment ... *based on*  variables such as 
what kind of car [et cetera]."  I read that as a single categorical
variable coded 1-66, a reading reinforced by the suggested alternative
of using 65 dummy variables.  If there were a dozen variables with a
total of 66 values, there would be 66 minus a dozen, or 54, dummy
variables.
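
The dummy-variable arithmetic can be checked with a short sketch. It
assumes the usual coding in which a categorical variable with k levels
needs k - 1 dummies; the split of 66 values across twelve variables is
made up purely for illustration.

```python
def dummy_count(levels_per_variable):
    """Total dummies needed when each k-level variable uses k - 1 dummies."""
    return sum(k - 1 for k in levels_per_variable)

# Reading 1: a single segment variable with 66 categories.
one_variable = [66]

# Reading 2: a dozen variables whose levels total 66
# (hypothetical split: six with 6 levels, six with 5 levels).
twelve_variables = [6] * 6 + [5] * 6

print(dummy_count(one_variable))      # 65
print(dummy_count(twelve_variables))  # 54
```

Only the single-variable reading gives the 65 dummies that were
suggested as the alternative.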

[snip, more about "66 variables"]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html