[ https://issues.apache.org/jira/browse/SYSTEMML-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924804#comment-15924804 ]
Imran Younus commented on SYSTEMML-1401:
----------------------------------------

[~prithvianight] [~mboehm7] [~ae2015] [~a1singh] Please have a look at this. This is important for R4ML.

> Data mismatch problem with Cox Predict script
> ---------------------------------------------
>
>                 Key: SYSTEMML-1401
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1401
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms
>         Environment:
>            Reporter: Imran Younus
>
> The Cox predict script internally sorts the input/test data set w.r.t. time.
> This is necessary to calculate the cumulative hazard function, but it creates
> a serious problem for the user: all the results returned from the predict
> script are sorted by time but the input data is not, so the user has no way
> of matching the input data with the predictions.
> There are two possible solutions to this problem:
> 1) We should restore the original order inside the predict script before
> returning the final results, so that the order of the predictions matches
> the order of the input data exactly.
> 2) We can add the sorted time column to the final output to let the user know
> which prediction corresponds to which time value. This may be easier to
> implement, but I think it is not an ideal solution because, in case of ties
> in time values, the user will still have a problem matching the input with
> the predictions.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
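
Option 1 in the issue description amounts to remembering the sort permutation and inverting it before returning results. Below is a minimal sketch of that idea in plain Python. It is not DML and not the actual Cox predict script: `predict_in_input_order`, `predict_sorted`, and `toy_cumulative` are hypothetical names, and the running sum only stands in for a quantity that has to be computed over time-sorted rows.

```python
def predict_in_input_order(times, predict_sorted):
    """Run a predictor that needs time-sorted input, then undo the sort.

    `predict_sorted` is a hypothetical stand-in for the time-ordered part of
    a predict script: it receives the times in ascending order and returns
    one prediction per row, in that same sorted order.
    """
    # order[k] is the original row index of the k-th smallest time.
    # sorted() is stable, so rows with tied times keep their input order.
    order = sorted(range(len(times)), key=lambda i: times[i])
    sorted_preds = predict_sorted([times[i] for i in order])

    # Scatter each prediction back to the row it came from.
    restored = [None] * len(times)
    for k, i in enumerate(order):
        restored[i] = sorted_preds[k]
    return restored


def toy_cumulative(sorted_times):
    # Toy stand-in for a cumulative quantity: running sum over sorted rows.
    total, out = 0.0, []
    for t in sorted_times:
        total += t
        out.append(total)
    return out


times = [5.0, 2.0, 5.0, 1.0]  # rows 0 and 2 are tied on time
preds = predict_in_input_order(times, toy_cumulative)
```

In this toy input, rows 0 and 2 share the same time value but receive different predictions, so a returned time column alone (option 2) could not tell the caller which prediction belongs to which row. Carrying the permutation, or equivalently a row-index column, avoids that ambiguity.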