Hi Sourav,

> 1. In the GLM-predict.dml I could see 'means' is the output variable. In my
> understanding it is the same as the probability matrix you mentioned in
> your mail (to be used to compute the prediction). Am I right?

Yes, that's correct.

> 2. From GLM.dml I get the 'betas' as output using
> outputs.getBinaryBlockedRDD("beta_out"). The same I pass to GLM-predict.dml
> as B.

Can you try this?

    // Get output from GLM
    val beta = outputs.getBinaryBlockedRDD("beta_out")
    val betaMC = outputs.getMatrixCharacteristics("beta_out") // This way you don't have to worry about dimensions.

    // -----------------------------------------
    val Xin = ... // DataFrame/RDD of values (or even a text/csv file) you want to predict
    // -----------------------------------------

    // Execute GLM-predict
    ml.reset()
    // Please read https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/GLM.dml
    // dfam Int 1 Distribution family code: 1 = Power, 2 = Binomial
    val cmdLineParamsPredict = Map("X" -> " ", "B" -> " ", "dfam" -> "...") // family of distribution ?
    ml.registerInput("X", Xin)
    ml.registerInput("B_full", beta, betaMC)
    ml.registerOutput("means")
    val outputsPredict = ml.execute("/home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml", cmdLineParamsPredict)
    val prob = outputsPredict.getBinaryBlockedRDD("means")
    val probMC = outputsPredict.getMatrixCharacteristics("means")

    // -----------------------------------------
    // Get predicted label
    ml.reset()
    ml.registerInput("Prob", prob, probMC)
    ml.registerOutput("Prediction")
    val outputsLabels = ml.executeScript("Prob = read(\"temp1\"); " +
      "Prediction = rowIndexMax(Prob); " +
      "write(Prediction, \"tempOut\", \"csv\")")
    val pred = outputsLabels.getDF(sqlContext, "Prediction").withColumnRenamed("C1", "prediction")
    // -----------------------------------------

> 3. Say I get back the prediction matrix as an output (from predictions =
> rowIndexMax(means);). Now can I add that as a column to my original data
> frame (the one from which I created the feature vector for the original
> model)? My concern is whether adding it back will ensure the right order,
> so that the key for the feature vector and the predicted value remain the
> same. If not, how to achieve the same?

In the above example, 'pred' is a DataFrame with a column 'ID' which provides the row ID.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Sourav Mazumder <sourav.mazumde...@gmail.com>
To: dev@systemml.incubator.apache.org, Niketan Pansare/Almaden/IBM@IBMUS
Date: 12/08/2015 10:53 PM
Subject: Re: Using GLM-predict

Hi Niketan,

Thanks again for the detailed inputs. Some more follow-up questions:

1. In the GLM-predict.dml I could see 'means' is the output variable. In my
understanding it is the same as the probability matrix you mentioned in your
mail (to be used to compute the prediction). Am I right?

2. From GLM.dml I get the 'betas' as output using
outputs.getBinaryBlockedRDD("beta_out"). The same I pass to GLM-predict.dml
as B.

For registering B the following statements are used:

    val beta = outputs.getBinaryBlockedRDD("beta_out")
    ml.registerInput("B", beta, 1, 4) // I have four feature vectors so I get 4 coefficients

However, when I execute GLM-predict.dml I get the following error:

    val outputs = ml.execute("/home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml", cmdLineParams)

    15/12/09 05:32:47 WARN Expression: Metadata file: .mtd not provided
    15/12/09 05:32:47 ERROR Expression: ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement: .mtd
    com.ibm.bi.dml.parser.LanguageException: Invalid Parameters : ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement: .mtd

In line 117 we have the following statement:

    X = read (fileX);

3. Say I get back the prediction matrix as an output (from predictions =
rowIndexMax(means);). Now can I add that as a column to my original data
frame (the one from which I created the feature vector for the original
model)?
My concern is whether adding it back will ensure the right order, so that the
key for the feature vector and the predicted value remain the same. If not,
how to achieve the same?

Regards,
Sourav

On Tue, Dec 8, 2015 at 2:08 PM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Sourav,
>
> For some reason, I didn't get your email on "Tue, 08 Dec 2015 12:56:38 -0800"
> <https://www.mail-archive.com/search?l=dev@systemml.incubator.apache.org&q=date:20151208>
> (which I noticed in the archive).
>
> >> Not sure how exactly I can modify the GLM-predict.dml to get some
> >> prediction to start with.
>
> There are two options here:
> 1. Modify GLM-predict.dml as suggested by Shirish (better approach with
>    respect to the SystemML optimizer), or
> 2. Run a new script on the output of GLM-predict. Please see:
>    https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java#L163
>
> If you choose to go with option 2, you might also want to read the
> documentation of the following two built-in functions:
> a. rowIndexMax (see
>    http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-andor-scalar-comparison-built-in-functions)
> b. ppred
>
> >> Can you give me some idea how from here I can calculate the predicted
> >> value of the label using some value of probability threshold ?
>
> A very simple way to predict the label given the probability matrix:
>
>     Prediction = rowIndexMax(Prob)  # predicts the label with highest probability
>
> This assumes one-based labels.
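
To make the rowIndexMax step above concrete: for each row of the probability matrix, it picks the column holding the largest value and reports that column's one-based index as the label. Below is a minimal plain-Scala sketch of that behaviour on a small in-memory matrix, without Spark or SystemML. The object and function names are hypothetical (they are not part of SystemML), and the tie-breaking rule is an assumption of this sketch rather than something confirmed for SystemML.

```scala
// Hypothetical helper, not part of SystemML: mimics what
// "Prediction = rowIndexMax(Prob)" computes, on a plain in-memory matrix.
object RowIndexMaxSketch {
  // For each row, return the one-based index of the largest entry.
  // Assumption: ties go to the first maximum; SystemML may break ties differently.
  def rowIndexMax(prob: Array[Array[Double]]): Array[Int] =
    prob.map { row =>
      require(row.nonEmpty, "each row needs at least one column")
      // Keep the earliest column index whose value is strictly the largest so far.
      val best = row.indices.reduceLeft((b, j) => if (row(j) > row(b)) j else b)
      best + 1 // convert zero-based column index to a one-based label
    }

  def main(args: Array[String]): Unit = {
    val prob = Array(
      Array(0.7, 0.2, 0.1), // most likely label: 1
      Array(0.1, 0.3, 0.6)  // most likely label: 3
    )
    println(rowIndexMax(prob).mkString(", ")) // prints "1, 3"
  }
}
```

Because the result is one-based, it lines up with labels coded 1..k; labels coded from 0 would need an offset.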
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Shirish Tatikonda <shirish.tatiko...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 12/08/2015 12:49 PM
> Subject: Re: Using GLM-predict
> ------------------------------
>
> Hi Sourav,
>
> Yes, GLM-predict.dml gives out only the probabilities. You can put a
> threshold on the resulting probabilities to get the actual class labels --
> for example, prob > 0.5 as positive and prob <= 0.5 as negative.
>
> The exact value of the threshold typically depends on the data and the
> application. Different thresholds yield different classifiers with
> different performance (precision, recall, etc.). You can find the best
> threshold for the given data set by finding a value that gives the desired
> classifier performance (for example, a threshold that gives roughly equal
> precision and recall). Such an optimization is obviously done during the
> training phase using a held-out test set.
>
> If you wish, you can also modify the DML script to perform this entire
> process.
>
> Shirish
>
> On Tue, Dec 8, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>
> > Hi,
> >
> > I have used GLM.dml to create a model using some sample data. It returns
> > to me the matrix of Beta, B.
> >
> > Now I want to use this matrix of Beta on a new set of data points and
> > generate predicted value of the dependent variable/observation.
> >
> > When I checked GLM-predict, I could see that one can pass feature vector
> > for the new data set and also the matrix of beta.
> >
> > But I could not see any way to get the predicted value of the dependent
> > variable/observation. The output parameter only supports matrix of
> > predicted means/probabilities.
> >
> > Is there a way one can get the predicted value of the dependent
> > variable/observation from GLM-predict?
> >
> > Regards,
> > Sourav
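
For the two-class case discussed in this thread, turning predicted probabilities into labels is a single comparison against a threshold. The following is a minimal plain-Scala sketch of that idea, independent of Spark and SystemML. The names are hypothetical, the 0/1 label coding is an assumption of the sketch, and the 0.5 default is only the example value mentioned in the thread, not a recommended setting.

```scala
// Hypothetical helper, not part of SystemML: applies a probability
// threshold to produce class labels for a binary classifier.
object ThresholdSketch {
  // Assumption: 1 = positive, 0 = negative.
  // A probability strictly above the threshold is labelled positive.
  def toLabels(probs: Seq[Double], threshold: Double = 0.5): Seq[Int] =
    probs.map(p => if (p > threshold) 1 else 0)

  def main(args: Array[String]): Unit = {
    val probs = Seq(0.91, 0.50, 0.12)
    println(toLabels(probs))      // prints "List(1, 0, 0)"
    println(toLabels(probs, 0.1)) // a lower threshold marks more rows positive
  }
}
```

Trying a few threshold values on held-out data and comparing precision and recall is one way to choose the cut-off, as described earlier in the thread.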