Hi Sourav,

For some reason, I didn't get your email sent on "Tue, 08 Dec 2015 12:56:38 -0800" (which I noticed in the archive).
>> Not sure how exactly I can modify the GLM-predict.dml to get some prediction to start with.

There are two options here:
1. Modify GLM-predict.dml as suggested by Shirish (the better approach with respect to the SystemML optimizer), or
2. Run a new script on the output of GLM-predict. Please see: https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java#L163

If you choose to go with option 2, you might also want to read the documentation of the following two built-in functions:
a. rowIndexMax (see http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-andor-scalar-comparison-built-in-functions )
b. ppred

>> Can you give me some idea how from here I can calculate the predicted value of the label using some value of probability threshold ?

A very simple way to predict the label given the probability matrix:

Prediction = rowIndexMax(Prob) # predicts the label with the highest probability

This assumes one-based labels.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Shirish Tatikonda <shirish.tatiko...@gmail.com>
To: dev@systemml.incubator.apache.org
Date: 12/08/2015 12:49 PM
Subject: Re: Using GLM-predict

Hi Sourav,

Yes, GLM-predict.dml gives out only the probabilities. You can put a threshold on the resulting probabilities to get the actual class labels -- for example, prob > 0.5 is positive and prob <= 0.5 is negative.

The exact value of the threshold typically depends on the data and the application. Different thresholds yield different classifiers with different performance (precision, recall, etc.). You can find the best threshold for a given data set by finding a value that gives the desired classifier performance (for example, a threshold that gives roughly equal precision and recall). Such an optimization is done during the training phase using a held-out test set.
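The thresholding described above can be sketched in a few lines. This is a plain-Python illustration rather than DML; the function names (`labels_from_threshold`, `precision_recall`, `pick_balanced_threshold`), the candidate thresholds, and the sample data are all hypothetical, and the probabilities simply stand in for the positive-class column of the GLM-predict output.

```python
def labels_from_threshold(probs, threshold=0.5):
    """Map positive-class probabilities to 1 (positive) or 0 (negative)."""
    # Strictly greater than the threshold is positive; <= threshold is negative.
    return [1 if p > threshold else 0 for p in probs]


def precision_recall(y_true, y_pred):
    """Compute (precision, recall) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_balanced_threshold(y_true, probs, candidates):
    """Return the candidate whose precision and recall are closest together."""
    def gap(threshold):
        p, r = precision_recall(y_true, labels_from_threshold(probs, threshold))
        if p == 0.0 and r == 0.0:
            # A threshold that finds no true positives is never a useful choice.
            return float("inf")
        return abs(p - r)
    return min(candidates, key=gap)


# Hypothetical held-out data: predicted probabilities and true labels.
held_out_probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
held_out_labels = [1, 1, 0, 1, 0, 0]
candidates = [0.2, 0.35, 0.5, 0.7]

best = pick_balanced_threshold(held_out_labels, held_out_probs, candidates)
predicted = labels_from_threshold(held_out_probs, best)
```

"Roughly equal precision and recall" is only one possible selection criterion; an application that cares more about one of the two would swap a different scoring rule into `gap`.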
If you wish, you can also modify the DML script to perform this entire process.

Shirish

On Tue, Dec 8, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> Hi,
>
> I have used GLM.dml to create a model using some sample data. It returns to
> me the matrix of Beta, B.
>
> Now I want to use this matrix of Beta on a new set of data points and
> generate predicted value of the dependent variable/observation.
>
> When I checked GLM-predict, I could see that one can pass feature vector
> for the new data set and also the matrix of beta.
>
> But I could not see any way to get the predicted value of the dependent
> variable/observation. The output parameter only supports matrix of
> predicted means/probabilities.
>
> Is there a way one can get the predicted value of the dependent
> variable/observation from GLM-predict ?
>
> Regards,
> Sourav
>
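For the multi-class case, the rowIndexMax suggestion from the top of this thread amounts to a per-row argmax over the probability matrix. A minimal plain-Python sketch, assuming each row holds one probability per class and labels are one-based; `row_index_max` and the sample matrix are hypothetical stand-ins, not SystemML code:

```python
def row_index_max(prob):
    """For each row, return the 1-based column index of the largest value."""
    # Ties go to the first maximal column in this sketch; the DML built-in
    # may behave differently.
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in prob]


# Hypothetical probability matrix: one row per data point, one column per class.
prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
prediction = row_index_max(prob)
```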