Hi Ashic,

Unfortunately I don't know how to work around that - I suggested this
approach as it looked promising (I had considered it once before deciding to
use a different algorithm), but I never actually tried it.

Regards,

James

On 13 April 2016 at 02:29, Ashic Mahtab <as...@live.com> wrote:

> It looks like the issue is around impurity stats. After converting an RF
> model to old, and back to new (without disk storage or anything), and
> specifying the same number of features, same categorical features map, etc.,
> DecisionTreeClassificationModel.predictRaw throws a null pointer exception here:
>
>   override protected def predictRaw(features: Vector): Vector = {
>     Vectors.dense(rootNode.predictImpl(features).impurityStats.stats.clone())
>   }
>
> It appears impurityStats is always null (even though impurity does have a
> value). Any known workarounds? It's looking like I'll have to revert to
> using mllib instead :(
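> A minimal sketch of that mllib fallback, assuming the converted old model
> is still at hand (untested; `predict` is the RDD-based API's method):
>
> ```scala
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel}
>
> // Fallback: predict through the RDD-based model directly, sidestepping
> // DecisionTreeClassificationModel.predictRaw and its null impurityStats.
> def predictWithOld(oldModel: OldRandomForestModel, features: Vector): Double =
>   oldModel.predict(features)
> ```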
>
> -Ashic.
>
> ------------------------------
> From: as...@live.com
> To: ja...@gluru.co
> CC: user@spark.apache.org
> Subject: RE: ML Random Forest Classifier
> Date: Wed, 13 Apr 2016 02:20:53 +0100
>
>
> I managed to get to the map using MetadataUtils (it's in a private ml
> package). There are still some issues around feature names, etc. Trying to
> pin them down.
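> For reference, a rough sketch of what that looks like. MetadataUtils is
> private to the ml package, so this has to live under org.apache.spark.ml;
> the method name is assumed from the 1.6 source, and this is untested:
>
> ```scala
> package org.apache.spark.ml
>
> import org.apache.spark.ml.util.MetadataUtils
> import org.apache.spark.sql.DataFrame
>
> object CategoricalFeaturesExtractor {
>   // Reads the categorical-features map (feature index -> arity) from the
>   // ML attribute metadata attached to the features column.
>   def extract(df: DataFrame, featuresCol: String = "features"): Map[Int, Int] =
>     MetadataUtils.getCategoricalFeatures(df.schema(featuresCol))
> }
> ```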
>
> ------------------------------
> From: as...@live.com
> To: ja...@gluru.co
> CC: user@spark.apache.org
> Subject: RE: ML Random Forest Classifier
> Date: Wed, 13 Apr 2016 00:50:31 +0100
>
> Hi James,
> Following on from the previous email, is there a way to get the
> categoricalFeatures of a Spark ML Random Forest? Essentially something I
> can pass to
>
> RandomForestClassificationModel.fromOld(oldModel, parent,
> *categoricalFeatures*, numClasses, numFeatures)
>
> I could construct it by hand, but I was hoping for a more automated way of
> getting the map. Since the trained model already knows these values,
> perhaps it's possible to grab them for storage?
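> For what it's worth, constructing the map by hand is just feature index ->
> number of categories, with continuous features simply omitted. The indices
> and arities below are hypothetical:
>
> ```scala
> // Hypothetical hand-built map: feature 0 is binary, feature 3 has five
> // categories; every feature not listed is treated as continuous.
> val categoricalFeatures: Map[Int, Int] = Map(0 -> 2, 3 -> 5)
> ```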
>
> Thanks,
> Ashic.
>
> ------------------------------
> From: as...@live.com
> To: ja...@gluru.co
> CC: user@spark.apache.org
> Subject: RE: ML Random Forest Classifier
> Date: Mon, 11 Apr 2016 23:21:53 +0100
>
> Thanks, James. That looks promising.
>
> ------------------------------
> Date: Mon, 11 Apr 2016 10:41:07 +0100
> Subject: Re: ML Random Forest Classifier
> From: ja...@gluru.co
> To: as...@live.com
> CC: user@spark.apache.org
>
> To add a bit more detail, perhaps something like this might work:
>
> package org.apache.spark.ml
>
> import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
> import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel}
>
> object RandomForestModelConverter {
>
>   def fromOld(oldModel: OldRandomForestModel,
>       parent: RandomForestClassifier = null,
>       categoricalFeatures: Map[Int, Int],
>       numClasses: Int,
>       numFeatures: Int = -1): RandomForestClassificationModel = {
>     RandomForestClassificationModel.fromOld(oldModel, parent,
>       categoricalFeatures, numClasses, numFeatures)
>   }
>
>   def toOld(newModel: RandomForestClassificationModel): OldRandomForestModel = {
>     newModel.toOld
>   }
> }
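> A sketch of how the converter might be used to persist and restore the
> model through the old API's save/load (the path and categorical-features
> map here are hypothetical, and this is untested):
>
> ```scala
> // Save: convert the DataFrame-based model to the RDD-based one and persist it.
> val oldModel = RandomForestModelConverter.toOld(newModel)
> oldModel.save(sc, "/models/rf")
>
> // Load: read it back and convert to the DataFrame-based API again.
> val restored = RandomForestModelConverter.fromOld(
>   OldRandomForestModel.load(sc, "/models/rf"),
>   categoricalFeatures = Map(0 -> 2), numClasses = 2)
> ```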
>
>
> Regards,
>
> James
>
>
> On 11 April 2016 at 10:36, James Hammerton <ja...@gluru.co> wrote:
>
> There are methods for converting the DataFrame-based random forest models
> to the old RDD-based models and vice versa. Perhaps using these will help,
> given that the old models can be saved and loaded?
>
> In order to use them however you will need to write code in the
> org.apache.spark.ml package.
>
> I've not actually tried doing this myself but it looks as if it might work.
>
> Regards,
>
> James
>
> On 11 April 2016 at 10:29, Ashic Mahtab <as...@live.com> wrote:
>
> Hello,
> I'm trying to save a pipeline with a random forest classifier. If I try to
> save the pipeline, it complains that the classifier is not Writable, and
> indeed the classifier itself doesn't have a write function. There's a pull
> request that's been merged that enables this for Spark 2.0 (any dates
> around when that'll be released?). I am, however, using the Spark Cassandra
> Connector, which doesn't seem to be able to create a CqlContext with Spark
> 2.0 snapshot builds. Seeing that MLlib's random forest classifier supports
> saving and loading models, is there a way to create a Spark ML pipeline in
> Spark 1.6 with a random forest classifier that'll allow me to store and
> load the model? The model takes a significant amount of time to train, and
> I really don't want to retrain it every time my application launches.
>
> Thanks,
> Ashic.
>
