Hi Sourav,

1. In the GLM-predict.dml I could see 'means' is the output variable. In my
understanding it is the same as the probability matrix you mentioned in your
mail (to be used to compute the prediction). Am I right?
Yes, that's correct.

2. From GLM.dml I get the 'betas' as output using
outputs.getBinaryBlockedRDD("beta_out"). The same I pass to GLM-predict.dml
as B.

Can you try this ?
// Get output from GLM
val beta = outputs.getBinaryBlockedRDD("beta_out")
// This way you don't have to worry about dimensions.
val betaMC = outputs.getMatrixCharacteristics("beta_out")
// -----------------------------------------
// Xin: the DataFrame/RDD of values (or even a text/csv file) you want to
// predict on
val Xin = ...
// -----------------------------------------
// Execute GLM-predict
ml.reset()
// Please read
// https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/GLM.dml
// dfam Int 1 Distribution family code: 1 = Power, 2 = Binomial
// Which family of distribution?
val cmdLineParamsPredict = Map("X" -> " ", "B" -> " ", "dfam" -> "...")
ml.registerInput("X", Xin)
ml.registerInput("B_full", beta, betaMC)
ml.registerOutput("means")
val outputsPredict = ml.execute(
  "/home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml",
  cmdLineParamsPredict)
val prob = outputsPredict.getBinaryBlockedRDD("means")
val probMC = outputsPredict.getMatrixCharacteristics("means")
// -----------------------------------------
// Get predicted label
ml.reset()
ml.registerInput("Prob",prob, probMC)
ml.registerOutput("Prediction")
val outputsLabels = ml.executeScript("Prob = read(\"temp1\"); "
+ "Prediction = rowIndexMax(Prob); "
+ "write(Prediction, \"tempOut\", \"csv\")")
val pred = outputsLabels.getDF(sqlContext, "Prediction")
  .withColumnRenamed("C1", "prediction")
// -----------------------------------------
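To make the last step concrete, here is a small plain-Scala sketch of what the label computation does. It does not use SystemML at all; the object and method names are hypothetical and only for illustration. It assumes one-based labels, and the 0.5 threshold is just an example value for the binary case. How ties are resolved here (first maximum wins) is a choice made in this sketch, not a statement about rowIndexMax.

```scala
// Plain-Scala illustration only; LabelSketch and its methods are
// hypothetical names and are not part of the SystemML API.
object LabelSketch {
  // One-based index of the largest value in each row, i.e. the label
  // with the highest probability. Assumes every row is non-empty.
  def rowIndexMax(prob: Seq[Seq[Double]]): Seq[Int] =
    prob.map { row =>
      row.indices.reduce((best, j) => if (row(j) > row(best)) j else best) + 1
    }

  // Binary alternative: a row is positive when its probability is
  // strictly greater than the threshold (0.5 is only an example).
  def aboveThreshold(p: Seq[Double], threshold: Double = 0.5): Seq[Boolean] =
    p.map(_ > threshold)

  def main(args: Array[String]): Unit = {
    val prob = Seq(Seq(0.1, 0.7, 0.2), Seq(0.6, 0.3, 0.1))
    println(rowIndexMax(prob)) // prints List(2, 1)
  }
}
```

The threshold variant is useful when you only have one probability per row and want to tune the cut-off on held-out data.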


3. Say I get back prediction matrix as an output (from predictions =
rowIndexMax(means);). Now can I add that as a column to my original
data frame (the one from which I created the feature vector for the
original model)? My concern is whether adding back will ensure the right
order so that the key for the feature vector and the predicted value remain
same? If not how to achieve the same?
In the above example, 'pred' is a DataFrame with a column 'ID' that provides
the row ID, so you can match predictions to rows by ID instead of relying on
order.
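A minimal plain-Scala sketch of that idea follows. The record types and the attach helper are hypothetical, and it assumes your original data carries (or can be given) the same row ID that the prediction output uses. With Spark DataFrames the analogous operation would be a join on the ID column, again assuming the original DataFrame has one.

```scala
// Plain-Scala illustration only; the types and names below are hypothetical.
object JoinByIdSketch {
  case class Original(id: Long, name: String)
  case class Predicted(id: Long, label: Int)

  // Pair each original row with its prediction by matching IDs, so the
  // result does not depend on the order in which predictions come back.
  // Rows without a matching prediction are dropped.
  def attach(rows: Seq[Original], preds: Seq[Predicted]): Seq[(Original, Int)] = {
    val labelById = preds.map(p => p.id -> p.label).toMap
    rows.flatMap(r => labelById.get(r.id).map(label => (r, label)))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Original(1, "a"), Original(2, "b"), Original(3, "c"))
    // Predictions deliberately listed out of order.
    val preds = Seq(Predicted(3, 1), Predicted(1, 2), Predicted(2, 2))
    attach(rows, preds).foreach(println)
  }
}
```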

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Sourav Mazumder <sourav.mazumde...@gmail.com>
To:     dev@systemml.incubator.apache.org, Niketan
            Pansare/Almaden/IBM@IBMUS
Date:   12/08/2015 10:53 PM
Subject:        Re: Using GLM-predict



Hi Niketan,

Thanks again for the detailed inputs.

Some more follow up Qs -

1. In the GLM-predict.dml I could see 'means' is the output variable. In my
understanding it is the same as the probability matrix you mentioned in your
mail (to be used to compute the prediction). Am I right?

2. From GLM.dml I get the 'betas' as output using
outputs.getBinaryBlockedRDD("beta_out"). The same I pass to GLM-predict.dml
as B. For registering B following statements are used
val beta = outputs.getBinaryBlockedRDD("beta_out")
// I have four feature vectors so I get 4 coefficients
ml.registerInput("B", beta, 1, 4)

However, when I execute GLM-predict.dml I get following error.

val outputs =
ml.execute("/home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml",
cmdLineParams)

15/12/09 05:32:47 WARN Expression: Metadata file:  .mtd not provided
15/12/09 05:32:47 ERROR Expression: ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement:  .mtd
com.ibm.bi.dml.parser.LanguageException: Invalid Parameters : ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement:  .mtd

In line 117 we have following statement : X = read (fileX);

3. Say I get back prediction matrix as an output (from predictions =
rowIndexMax(means);). Now can I add that as a column to my original
data frame (the one from which I created the feature vector for the
original model)? My concern is whether adding back will ensure the right
order so that the key for the feature vector and the predicted value remain
same? If not how to achieve the same?

Regards,
Sourav





On Tue, Dec 8, 2015 at 2:08 PM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Sourav,
>
> For some reason, I didn't get your email of "Tue, 08 Dec 2015 12:56:38
> -0800" (which I noticed in the archive):
> https://www.mail-archive.com/search?l=dev@systemml.incubator.apache.org&q=date:20151208
>
> >> Not sure how exactly I can modify the GLM-predict.dml to get some
> prediction to start with.
> There are two options here:
> 1. Modify GLM-predict.dml as suggested by Shirish (better approach with
> respect to the SystemML optimizer) or
>
> 2. Run a new script on the output of GLM-predict. Please see:
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java#L163
>
> If you choose to go with option 2, you might also want to read the
> documentation of the following two built-in functions:
> a. rowIndexMax (see
> http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-andor-scalar-comparison-built-in-functions
> )
> b. ppred
>
> >> Can you give me some idea how from here I can calculate the predicted
> value of the label using some value of probability threshold ?
> Very simple way to predict the label given probability matrix:
> Prediction = rowIndexMax(Prob) # predicts the label with highest
> probability. This assumes one-based labels.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> From: Shirish Tatikonda <shirish.tatiko...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 12/08/2015 12:49 PM
> Subject: Re: Using GLM-predict
> ------------------------------
>
>
>
> Hi Sourav,
>
> Yes, GLM-predict.dml gives out only the probabilities. You can put a
> threshold on the resulting probabilities to get the actual class labels --
> for example, prob > 0.5 is positive and <=0.5 as negative.
>
> The exact value of threshold typically depends on the data and the
> application. Different thresholds yield different classifiers with
> different performance (precision, recall, etc.). You can find the best
> threshold for the given data set by finding a value that gives the desired
> classifier performance (for example, a threshold that gives roughly equal
> precision and recall). Such an optimization is obviously done during the
> training phase using a held out test set.
>
> If you wish, you can also modify the DML script to perform this entire
> process.
>
> Shirish
>
>
> On Tue, Dec 8, 2015 at 12:23 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
> > Hi,
> >
> > I have used GLM.dml to create a model using some sample data. It returns
> > to me the matrix of Beta, B.
> >
> > Now I want to use this matrix of Beta on a new set of data points and
> > generate predicted value of the dependent variable/observation.
> >
> > When I checked GLM-predict, I could see that one can pass feature vector
> > for the new data set and also the matrix of beta.
> >
> > But I could not see any way to get the predicted value of the dependent
> > variable/observation. The output parameter only supports matrix of
> > predicted means/probabilities.
> >
> > Is there a way one can get the predicted value of the dependent
> > variable/observation from GLM-predict ?
> >
> > Regards,
> > Sourav
> >
>
>
>
