RE: Random Forest FeatureImportance throwing NullPointerException

2016-01-14 Thread Rachana Srivastava
Tried using the Spark 1.6 API that takes numberOfFeatures as a fifth argument, but featureImportances is still null.

RandomForestClassifier rfc = getRandomForestClassifier(numTrees, maxBinSize, maxTreeDepth, seed, impurity);
RandomForestClassificationModel rfm = RandomForestClassificationModel.fromOld(model, rfc, categoricalFeatures, numberOfClasses, numberOfFeatures);
System.out.println(rfm.featureImportances());

Stack Trace:
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.spark.ml.tree.impl.RandomForest$.computeFeatureImportance(RandomForest.scala:1152)
at 
org.apache.spark.ml.tree.impl.RandomForest$$anonfun$featureImportances$1.apply(RandomForest.scala:)
at 
org.apache.spark.ml.tree.impl.RandomForest$$anonfun$featureImportances$1.apply(RandomForest.scala:1108)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at 
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.ml.tree.impl.RandomForest$.featureImportances(RandomForest.scala:1108)
at 
org.apache.spark.ml.classification.RandomForestClassificationModel.featureImportances$lzycompute(RandomForestClassifier.scala:237)
at 
org.apache.spark.ml.classification.RandomForestClassificationModel.featureImportances(RandomForestClassifier.scala:237)
at 
com.markmonitor.antifraud.ce.ml.CheckFeatureImportance.main(CheckFeatureImportance.java:49)

From: Rachana Srivastava
Sent: Wednesday, January 13, 2016 3:30 PM
To: 'user@spark.apache.org'; 'd...@spark.apache.org'
Subject: Random Forest FeatureImportance throwing NullPointerException

I have a random forest model for which I am trying to get the featureImportances vector.

Map<Object,Object> categoricalFeaturesParam = new HashMap<>();
scala.collection.immutable.Map<Object,Object> categoricalFeatures =
    (scala.collection.immutable.Map<Object,Object>) scala.collection.immutable.Map$.MODULE$.apply(
        JavaConversions.mapAsScalaMap(categoricalFeaturesParam).toSeq());
int numberOfClasses = 2;
RandomForestClassifier rfc = new RandomForestClassifier();
RandomForestClassificationModel rfm = 
RandomForestClassificationModel.fromOld(model, rfc, categoricalFeatures, 
numberOfClasses);
System.out.println(rfm.featureImportances());

When I run the above code, featureImportances comes back null.  Do I need to set anything specific to get the feature importances for the random forest model?

Thanks,

Rachana


Re: Random Forest FeatureImportance throwing NullPointerException

2016-01-14 Thread Bryan Cutler
If you are able to just train the RandomForestClassificationModel from ML
directly instead of training the old model and converting, then that would
be the way to go.
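
A rough sketch of that approach (only a sketch: it assumes a DataFrame named trainingData with "label" and "features" columns already exists, and the class name and hyperparameter values are illustrative, not taken from the thread):

import org.apache.spark.ml.classification.RandomForestClassificationModel;
import org.apache.spark.ml.classification.RandomForestClassifier;
import org.apache.spark.sql.DataFrame;

public class TrainDirectlyExample {
  // Train the spark.ml RandomForestClassifier directly on a DataFrame of
  // (label, features) rows; the resulting model keeps impurity stats, so
  // featureImportances() is populated instead of failing.
  public static RandomForestClassificationModel train(DataFrame trainingData) {
    RandomForestClassifier rfc = new RandomForestClassifier()
        .setLabelCol("label")        // column holding the class label
        .setFeaturesCol("features")  // column holding the feature Vector
        .setNumTrees(100)            // illustrative hyperparameters only
        .setMaxDepth(5)
        .setImpurity("gini")
        .setSeed(12345L);
    RandomForestClassificationModel model = rfc.fit(trainingData);
    System.out.println(model.featureImportances());
    return model;
  }
}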

On Thu, Jan 14, 2016 at 2:21 PM, <rachana.srivast...@thomsonreuters.com>
wrote:

> Thanks so much Bryan for your response.  Is there any workaround?


Re: Random Forest FeatureImportance throwing NullPointerException

2016-01-14 Thread Bryan Cutler
Hi Rachana,

I got the same exception.  It is because computing the feature importance
depends on impurity stats, which is not calculated with the old
RandomForestModel in MLlib.  Feel free to create a JIRA for this if you
think it is necessary, otherwise I believe this problem will be eventually
solved as part of this JIRA
https://issues.apache.org/jira/browse/SPARK-12183

Bryan
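
For context, a simplified illustration of why the impurity stats matter (this is not Spark's actual implementation, and the Node class below is hypothetical): gain-based importance is just the impurity gain of each split, weighted by node counts, summed per feature and normalized, so a model converted without impurity stats has nothing to aggregate.

import java.util.HashMap;
import java.util.Map;

public class ImportanceSketch {
  // Hypothetical stand-in for an internal tree node and the impurity
  // statistics recorded for it at training time.
  static class Node {
    int featureIndex;      // feature used by this node's split
    double impurity;       // impurity at this node
    double leftImpurity;   // impurity of the left child
    double rightImpurity;  // impurity of the right child
    double count;          // training rows reaching this node
    double leftCount;      // rows routed to the left child
    double rightCount;     // rows routed to the right child
  }

  // Sum the count-weighted impurity gain of every split per feature,
  // then normalize so the importances add up to 1.
  static Map<Integer, Double> importances(Iterable<Node> internalNodes) {
    Map<Integer, Double> imp = new HashMap<>();
    double total = 0.0;
    for (Node n : internalNodes) {
      double gain = n.count * n.impurity
          - n.leftCount * n.leftImpurity
          - n.rightCount * n.rightImpurity;
      imp.merge(n.featureIndex, gain, Double::sum);
      total += gain;
    }
    if (total > 0) {
      for (Map.Entry<Integer, Double> e : imp.entrySet()) {
        e.setValue(e.getValue() / total);
      }
    }
    return imp;
  }
}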


Random Forest FeatureImportance throwing NullPointerException

2016-01-13 Thread Rachana Srivastava
I have a random forest model for which I am trying to get the featureImportances vector.

Map<Object,Object> categoricalFeaturesParam = new HashMap<>();
scala.collection.immutable.Map<Object,Object> categoricalFeatures =
    (scala.collection.immutable.Map<Object,Object>) scala.collection.immutable.Map$.MODULE$.apply(
        JavaConversions.mapAsScalaMap(categoricalFeaturesParam).toSeq());
int numberOfClasses = 2;
RandomForestClassifier rfc = new RandomForestClassifier();
RandomForestClassificationModel rfm = 
RandomForestClassificationModel.fromOld(model, rfc, categoricalFeatures, 
numberOfClasses);
System.out.println(rfm.featureImportances());

When I run the above code, featureImportances comes back null.  Do I need to set anything specific to get the feature importances for the random forest model?

Thanks,

Rachana