Imran Younus created SYSTEMML-1401:
--------------------------------------

             Summary: Data mismatch problem with Cox Predict script
                 Key: SYSTEMML-1401
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1401
             Project: SystemML
          Issue Type: Bug
          Components: Algorithms
         Environment: 

            Reporter: Imran Younus


The Cox predict script internally sorts the input (test) data set by time. 
This sorting is necessary to calculate the cumulative hazard function, but it 
creates a serious problem for the user: all results returned from the predict 
script are sorted by time while the input data is not, so the user has no way 
of matching the input data with the predictions.

There are two possible solutions to this problem:

1) We should restore the original order inside the predict script before 
returning the final results, so that the order of the predictions matches the 
order of the input data exactly.

2) We can add the sorted time column to the final output to let the user know 
which prediction corresponds to which time value. This may be easier to 
implement, but I think it is not an ideal solution because, when time values 
are tied, the user will still have a problem matching the input with the 
predictions.
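
The idea behind option 1 can be sketched in a few lines of Python. This is 
only an illustration, not the actual DML of the Cox predict script: the 
function names, the toy data, and the placeholder "prediction" (feature + 1) 
are all assumptions made for the example. The point is that keeping the sort 
permutation lets the script write each result back to its original row, and 
that this works even when time values are tied.

```python
from collections import defaultdict


def predict_sorted(times, features):
    """Hypothetical stand-in for the predict step.

    Sorts rows by time (as the real script must do for the cumulative
    hazard) and returns the sort permutation plus results in sorted order.
    """
    # order[k] is the original row index of the k-th row after sorting.
    # sorted() is stable, so tied times keep their input order.
    order = sorted(range(len(times)), key=lambda i: times[i])
    # Placeholder prediction, NOT the Cox model: just feature + 1.
    sorted_results = [features[i] + 1 for i in order]
    return order, sorted_results


def restore_input_order(order, sorted_results):
    """Option 1: scatter the sorted results back to the original rows."""
    restored = [None] * len(order)
    for sorted_pos, original_row in enumerate(order):
        restored[original_row] = sorted_results[sorted_pos]
    return restored


def tied_rows(times):
    """Time values shared by more than one input row (the option 2 problem)."""
    rows_by_time = defaultdict(list)
    for row, t in enumerate(times):
        rows_by_time[t].append(row)
    return {t: rows for t, rows in rows_by_time.items() if len(rows) > 1}


# Toy input: rows 1 and 3 share the same time value.
times = [5.0, 2.0, 9.0, 2.0]
features = [10, 20, 30, 40]

order, sorted_results = predict_sorted(times, features)
restored = restore_input_order(order, sorted_results)
ties = tied_rows(times)
```

In this toy data, `restored` lines up row by row with `features`, whereas a 
time column alone could not distinguish rows 1 and 3, which both have time 
2.0. That is the ambiguity described under option 2.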



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
