subject:"Random Forest Classification"

Re: Random Forest Classification

2016-08-31 Thread Bryan Cutler

I see. You might try this: create a pipeline of just your feature transformers, then call fit() on the complete dataset to get a model. Finally, make a second pipeline and add this model and the decision tree as stages. On Aug 30, 2016 8:19 PM, "Bahubali Jain" wrote: > Hi
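
The two-pipeline approach suggested above might look roughly like the following. This is only a sketch under assumptions, not code from the thread: it assumes Spark 2.x, and the object name, the `color` and `size` columns, and the toy rows are all hypothetical. The point it illustrates is that the fitted feature pipeline is itself a transformer, so it can be reused as a stage of a second pipeline that is trained on the training split only.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

object TwoPipelineSketch {
  // Hypothetical toy rows: (label, color, size). "green" occurs only in the test split.
  val trainRows = Seq((0.0, "red", 1.0), (1.0, "blue", 2.0), (0.0, "red", 3.0), (1.0, "blue", 4.0))
  val testRows = Seq((1.0, "green", 5.0), (0.0, "red", 1.5))

  def run(spark: SparkSession): DataFrame = {
    val trainingData = spark.createDataFrame(trainRows).toDF("label", "color", "size")
    val testData = spark.createDataFrame(testRows).toDF("label", "color", "size")
    val completeDataset = trainingData.union(testData)

    // Pipeline 1: feature transformers only, fitted on the complete dataset
    // so that every category value (including "green") gets an index.
    val colorIndexer = new StringIndexer().setInputCol("color").setOutputCol("colorIndex")
    val assembler = new VectorAssembler()
      .setInputCols(Array("colorIndex", "size"))
      .setOutputCol("feature")
    val featureModel: PipelineModel = new Pipeline()
      .setStages(Array[PipelineStage](colorIndexer, assembler))
      .fit(completeDataset)

    // Pipeline 2: the already fitted feature model (transform only) plus the
    // classifier. Only the classifier is trained here, on the training split.
    val tree = new DecisionTreeClassifier().setLabelCol("label").setFeaturesCol("feature")
    val model = new Pipeline()
      .setStages(Array[PipelineStage](featureModel, tree))
      .fit(trainingData)

    // Scoring the test split no longer hits an unseen category.
    model.transform(testData)
  }
}
```

To try it, pass in a local session, for example `TwoPipelineSketch.run(SparkSession.builder().master("local[*]").getOrCreate())`, and inspect the `prediction` column of the result.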

Re: Random Forest Classification

2016-08-30 Thread Bahubali Jain

Hi Bryan, Thanks for the reply. I am indexing 5 columns, then using these indexed columns to generate the "feature" column through the vector assembler. This essentially means that I cannot use *fit()* directly on the "completeDataset" dataframe, since it will have neither the "feature" column nor the 5

Re: Random Forest Classification

2016-08-30 Thread Bryan Cutler

You need to first fit just the VectorIndexer, which returns the model, then add the model to the pipeline where it will only transform.

    val featureVectorIndexer = new VectorIndexer()
      .setInputCol("feature")
      .setOutputCol("indexedfeature")
      .setMaxCategories(180)
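
A fuller version of this idea could be sketched as follows. It is an illustration under assumptions rather than the code from the original message: Spark 2.x is assumed, the object name and toy vectors are hypothetical, and `setMaxCategories(3)` replaces the thread's 180 only because the toy data is tiny. The VectorIndexer is fitted on the complete dataset, and the resulting model is then placed in the pipeline, where it only transforms.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorIndexer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object FitVectorIndexerFirst {
  def run(spark: SparkSession): DataFrame = {
    // Hypothetical toy data: (label, feature). In feature slot 0, the
    // value 2.0 occurs only in the test split.
    val trainingData = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 10.0)),
      (1.0, Vectors.dense(1.0, 20.0)),
      (0.0, Vectors.dense(0.0, 30.0)),
      (1.0, Vectors.dense(1.0, 40.0))
    )).toDF("label", "feature")
    val testData = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(2.0, 20.0))
    )).toDF("label", "feature")
    val completeDataset = trainingData.union(testData)

    // Fit the indexer on the complete dataset so it knows every category value.
    // The thread uses setMaxCategories(180); 3 suits this toy data.
    val featureVectorIndexerModel = new VectorIndexer()
      .setInputCol("feature")
      .setOutputCol("indexedfeature")
      .setMaxCategories(3)
      .fit(completeDataset)

    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("indexedfeature")
      .setNumTrees(5)

    // The fitted model is a transformer: inside the pipeline it is not refitted.
    val model = new Pipeline()
      .setStages(Array[PipelineStage](featureVectorIndexerModel, rf))
      .fit(trainingData)

    model.transform(testData)
  }
}
```

If the indexer were instead left unfitted inside the pipeline, it would be fitted on `trainingData` alone and would never see the value 2.0 that appears in the test row.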

Re: Random Forest Classification

2016-08-30 Thread Bahubali Jain

Hi, I had run into a similar exception, "java.util.NoSuchElementException: key not found: ". After further investigation I realized it is happening due to the VectorIndexer being executed on the training dataset and not on the entire dataset. In the dataframe I have 5 categories , each of these have to go

Re: Random Forest Classification

2016-07-08 Thread Bryan Cutler

Hi Rich, I looked at the notebook and it seems like you are fitting the StringIndexer and VectorIndexer to only the training data, and it should be the entire data set. So if the training data does not include all of the labels and an unknown label appears in the test data during evaluation,
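
The failure mode described here can be imitated in plain Scala, without Spark. The sketch below is only an analogy under assumptions: `fitIndex` is a hypothetical stand-in for an indexer (it is not Spark's implementation, and Spark's StringIndexer orders labels by frequency rather than alphabetically). It shows that a lookup table built from the training labels alone has no entry for a label that first appears in the test data.

```scala
object UnseenLabelDemo {
  // Toy stand-in for fitting an indexer: give each distinct label a position.
  // Sorting keeps the assignment deterministic for this example.
  def fitIndex(labels: Seq[String]): Map[String, Int] =
    labels.distinct.sorted.zipWithIndex.toMap

  def main(args: Array[String]): Unit = {
    val training = Seq("cat", "dog", "cat")
    val test = Seq("dog", "bird") // "bird" never appears in the training labels

    val trainingOnly = fitIndex(training)     // knows cat and dog only
    val complete = fitIndex(training ++ test) // knows cat, dog and bird

    // Every test label resolves when the index was built from all the data.
    println(test.map(complete))

    // Map.apply on a missing key throws java.util.NoSuchElementException,
    // so the lookup for "bird" fails and Try captures the failure.
    println(scala.util.Try(test.map(trainingOnly)))
  }
}
```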

Re: Random Forest Classification

2016-07-01 Thread Rich Tarro

Hi Bryan. Thanks for your continued help. Here is the code shown in a Jupyter notebook. I figured this was easier than cutting and pasting the code into an email. If you would like me to send you the code in a different format, let me know. The necessary data is all downloaded within the

Re: Random Forest Classification

2016-06-28 Thread Bryan Cutler

Are you fitting the VectorIndexer to the entire data set and not just training or test data? If you are able to post your code and some data to reproduce, that would help in troubleshooting. On Tue, Jun 28, 2016 at 4:40 PM, Rich Tarro wrote: > Thanks for the response, but

Re: Random Forest Classification

2016-06-28 Thread Rich Tarro

Thanks for the response, but in my case I reversed the meaning of "prediction" and "predictedLabel". It seemed to make more sense to me that way, but in retrospect, it probably only causes confusion to anyone else looking at this. I reran the code with all the pipeline stage inputs and outputs

Re: Random Forest Classification

2016-06-28 Thread Bryan Cutler

The problem might be that you are evaluating with "predictionLabel" instead of "prediction", where predictionLabel is the prediction index mapped to the original label strings - at least according to the RandomForestClassifierExample, not sure if your code is exactly the same. On Tue, Jun 28,
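
As a rough illustration of this point, the sketch below evaluates on the numeric prediction index rather than on the string column. It rests on assumptions: Spark 2.x (where the `accuracy` metric name is available), a hypothetical object name, a hand-made stand-in for pipeline output, and the column names used in Spark's RandomForestClassifierExample (`indexedLabel`, `prediction`, `predictedLabel`), which may differ from the original poster's code.

```scala
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object EvaluateOnPredictionIndex {
  def run(spark: SparkSession): Double = {
    // Hypothetical pipeline output: indexedLabel and prediction hold numeric
    // indices, predictedLabel holds the index mapped back to the label string.
    val predictions = spark.createDataFrame(Seq(
      (0.0, 0.0, "no"),
      (1.0, 1.0, "yes"),
      (1.0, 0.0, "no"),
      (0.0, 0.0, "no")
    )).toDF("indexedLabel", "prediction", "predictedLabel")

    // Compare index against index; the string column is for display only.
    new MulticlassClassificationEvaluator()
      .setLabelCol("indexedLabel")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
      .evaluate(predictions)
  }
}
```

Three of the four toy rows have matching label and prediction indices, so the evaluator should report an accuracy of 0.75 for this hand-made input.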

Random Forest Classification

2016-06-28 Thread Rich Tarro

I created an ML pipeline using the Random Forest Classifier - similar to what is described here except in my case the source data is in csv format rather than libsvm. https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier I am able to successfully train

10 matches
