Hi Ashic,

Unfortunately I don't know how to work around that - I suggested this line as it looked promising (I had considered it once before deciding to use a different algorithm), but I never actually tried it.
Regards,
James

On 13 April 2016 at 02:29, Ashic Mahtab <as...@live.com> wrote:
> It looks like the issue is around impurity stats. After converting an rf
> model to old, and back to new (without disk storage or anything), and
> specifying the same num of features, same categorical features map, etc.,
> DecisionTreeClassifier::predictRaw throws a null pointer exception here:
>
>   override protected def predictRaw(features: Vector): Vector = {
>     Vectors.dense(rootNode.predictImpl(features).impurityStats.stats.clone())
>   }
>
> It appears impurityStats is always null (even though impurity does have a
> value). Any known workarounds? It's looking like I'll have to revert to
> using mllib instead :(
>
> -Ashic.
>
> ------------------------------
> From: as...@live.com
> To: ja...@gluru.co
> CC: user@spark.apache.org
> Subject: RE: ML Random Forest Classifier
> Date: Wed, 13 Apr 2016 02:20:53 +0100
>
> I managed to get to the map using MetadataUtils (it's a private ml
> package). There are still some issues around feature names, etc. Trying to
> pin them down.
>
> ------------------------------
> From: as...@live.com
> To: ja...@gluru.co
> CC: user@spark.apache.org
> Subject: RE: ML Random Forest Classifier
> Date: Wed, 13 Apr 2016 00:50:31 +0100
>
> Hi James,
> Following on from the previous email, is there a way to get the
> categoricalFeatures of a Spark ML Random Forest? Essentially something I
> can pass to
>
>   RandomForestClassificationModel.fromOld(oldModel, parent,
>     categoricalFeatures, numClasses, numFeatures)
>
> I could construct it by hand, but I was hoping for a more automated way of
> getting the map. Since the trained model already knows about the value,
> perhaps it's possible to grab it for storage?
>
> Thanks,
> Ashic.
>
> ------------------------------
> From: as...@live.com
> To: ja...@gluru.co
> CC: user@spark.apache.org
> Subject: RE: ML Random Forest Classifier
> Date: Mon, 11 Apr 2016 23:21:53 +0100
>
> Thanks, James.
> That looks promising.
>
> ------------------------------
> Date: Mon, 11 Apr 2016 10:41:07 +0100
> Subject: Re: ML Random Forest Classifier
> From: ja...@gluru.co
> To: as...@live.com
> CC: user@spark.apache.org
>
> To add a bit more detail, perhaps something like this might work:
>
>   package org.apache.spark.ml
>
>   import org.apache.spark.ml.classification.RandomForestClassificationModel
>   import org.apache.spark.ml.classification.RandomForestClassifier
>   import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel}
>
>   object RandomForestModelConverter {
>
>     def fromOld(oldModel: OldRandomForestModel,
>                 parent: RandomForestClassifier = null,
>                 categoricalFeatures: Map[Int, Int],
>                 numClasses: Int,
>                 numFeatures: Int = -1): RandomForestClassificationModel = {
>       RandomForestClassificationModel.fromOld(oldModel, parent,
>         categoricalFeatures, numClasses, numFeatures)
>     }
>
>     def toOld(newModel: RandomForestClassificationModel): OldRandomForestModel = {
>       newModel.toOld
>     }
>   }
>
> Regards,
>
> James
>
> On 11 April 2016 at 10:36, James Hammerton <ja...@gluru.co> wrote:
>
> There are methods for converting the dataframe-based random forest models
> to the old RDD-based models and vice versa. Perhaps using these will help,
> given that the old models can be saved and loaded?
>
> In order to use them, however, you will need to write code in the
> org.apache.spark.ml package.
>
> I've not actually tried doing this myself, but it looks as if it might work.
>
> Regards,
>
> James
>
> On 11 April 2016 at 10:29, Ashic Mahtab <as...@live.com> wrote:
>
> Hello,
> I'm trying to save a pipeline with a random forest classifier. If I try to
> save the pipeline, it complains that the classifier is not Writable, and
> indeed the classifier itself doesn't have a write function.
> There's a pull request that's been merged that enables this for Spark 2.0
> (any dates around when that'll release?). I am, however, using the Spark
> Cassandra Connector, which doesn't seem to be able to create a CqlContext
> with Spark 2.0 snapshot builds. Seeing that MLlib's random forest
> classifier supports storing and loading models, is there a way to create a
> Spark ML pipeline in Spark 1.6 with a random forest classifier that'll
> allow me to store and load the model? The model takes a significant amount
> of time to train, and I really don't want to have to train it every time
> my application launches.
>
> Thanks,
> Ashic.
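A note on the categoricalFeatures argument discussed above: in both the old mllib API (categoricalFeaturesInfo) and RandomForestClassificationModel.fromOld, it is a Map[Int, Int] from the index of each categorical feature to its arity (number of distinct categories); continuous features are simply absent from the map. A minimal, Spark-free sketch of constructing one by hand (the feature indices and arities here are made-up for illustration):

```scala
// Sketch only: the shape of the categoricalFeatures map expected by
// RandomForestClassificationModel.fromOld. Keys are indices of
// categorical features, values are their arities; continuous features
// are left out entirely.
object CategoricalFeaturesSketch {
  // Hypothetical schema: feature 0 is binary, feature 3 has five
  // categories, and features 1 and 2 are continuous.
  val categoricalFeatures: Map[Int, Int] = Map(0 -> 2, 3 -> 5)

  // A feature is treated as categorical iff it appears in the map.
  def isCategorical(featureIndex: Int): Boolean =
    categoricalFeatures.contains(featureIndex)

  def main(args: Array[String]): Unit = {
    println(isCategorical(0))          // prints true
    println(isCategorical(1))          // prints false
    println(categoricalFeatures(3))    // prints 5
  }
}
```

Constructing the map by hand like this is what the thread is trying to avoid; MetadataUtils (a private org.apache.spark.ml utility, as mentioned above) derives the same map from the features column's metadata.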