Re: Ignite ML random forest questions

2020-08-11 Thread akorensh
Hi,
  The model learns a correlation between the label and the features.
  In the Random Forest Classification example, the label represents the
  class that a wine belongs to, given its set of features.
  see: 

  The labeled feature is defined here:
    Vectorizer<Integer, Vector, Integer, Double> vectorizer =
        new DummyVectorizer<Integer>()
            .labeled(Vectorizer.LabelCoordinate.FIRST);

    ModelsComposition randomForestMdl =
        classifier.fit(ignite, dataCache, vectorizer);
   
  After the model has learned the associations between the features and
  the class labels, it is tested here:
    double groundTruth = val.get(0);
    double prediction = randomForestMdl.predict(inputs);

    totalAmount++;
    if (!Precision.equals(groundTruth, prediction, Precision.EPSILON))
        amountOfErrors++;


  If you put breakpoints on these lines, groundTruth will be one of the 3
  available classes, and the model's prediction will try to match that
  class based on the available inputs.
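
  The check above can be sketched in plain Java without any Ignite classes.
  This is only an illustration under assumptions: Predictor, countErrors and
  EPS are made-up names, and the threshold rule in main stands in for a
  trained model. Only the label-first row layout and the tolerance-based
  comparison mirror the example.

```java
/** Plain-Java sketch of the evaluation loop; no Ignite classes are used. */
final class ErrorRateSketch {
    /** Hypothetical stand-in for a trained model: features in, predicted class out. */
    interface Predictor {
        double predict(double[] features);
    }

    /** Tolerance for comparing class labels stored as doubles (illustrative value). */
    static final double EPS = 1e-9;

    /**
     * Counts misclassified rows. Each row keeps the label at index 0
     * (the role played by LabelCoordinate.FIRST) followed by the features.
     */
    static int countErrors(double[][] rows, Predictor mdl) {
        int errors = 0;
        for (double[] row : rows) {
            double groundTruth = row[0];

            // Everything after the label is a model input.
            double[] inputs = new double[row.length - 1];
            System.arraycopy(row, 1, inputs, 0, inputs.length);

            double prediction = mdl.predict(inputs);
            if (Math.abs(groundTruth - prediction) > EPS)
                errors++;
        }
        return errors;
    }

    public static void main(String[] args) {
        // Toy data: label first, then a single feature.
        double[][] rows = {{1, 0.5}, {2, 1.5}, {3, 2.5}, {1, 2.7}};

        // Toy rule standing in for a trained random forest.
        Predictor toyMdl = f -> f[0] < 1.0 ? 1.0 : (f[0] < 2.0 ? 2.0 : 3.0);

        System.out.println("errors: " + countErrors(rows, toyMdl) + " of " + rows.length);
    }
}
```

  Dividing the error count by the number of rows gives the error rate;
  one minus that value is the accuracy.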


see: https://apacheignite.readme.io/docs/random-forest
In that document you will find more references on working with random forest
models.

If you are new to ML, simple Linear Regression might be the most accessible
model to learn.
https://apacheignite.readme.io/docs/ols-multiple-linear-regression

  
> Is there a way to parallelize the training across available cores while
> still limiting the operation to a single JVM process?

Apache Ignite machine learning was designed from the ground up to train a
model quickly by spreading the load across all nodes of a cluster.

see: https://apacheignite.readme.io/docs/ml-partition-based-dataset

If you want to limit training to a single JVM process, create a cluster
of one node.
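
As a rough illustration of the partition-based idea in general terms, the
sketch below splits an array into partitions, computes a partial result for
each partition concurrently, and then combines the partial results. It is
plain Java and not Ignite code: PartitionedSumSketch and partitionedSum are
made-up names, and a simple sum stands in for whatever a trainer would
compute on each partition.

```java
import java.util.stream.IntStream;

/** Conceptual sketch of partition-wise map/reduce inside one JVM; not Ignite code. */
final class PartitionedSumSketch {
    /** Splits data into contiguous partitions, sums each one, then combines the results. */
    static double partitionedSum(double[] data, int parts) {
        return IntStream.range(0, parts)
            .parallel() // partitions are processed concurrently on the common fork/join pool
            .mapToDouble(p -> {
                // Boundaries of partition p; long arithmetic avoids int overflow.
                int from = (int) ((long) data.length * p / parts);
                int to = (int) ((long) data.length * (p + 1) / parts);

                double partial = 0;
                for (int i = from; i < to; i++)
                    partial += data[i];

                return partial; // "map" step: local result for this partition
            })
            .sum(); // "reduce" step: combine the partition results
    }

    public static void main(String[] args) {
        double[] data = new double[100];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;

        System.out.println(partitionedSum(data, 4));
    }
}
```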



For pointers on feature selection, take a look at the examples here:

https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/selection
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/hyperparametertuning

Thanks, Alex





Ignite ML random forest questions

2020-08-08 Thread Thilo-Alexander Ginkel
Hello everyone,

I am currently experimenting with Ignite machine learning (random
forest regression / classifier) and have come up with a couple of
questions that I can't seem to answer using docs or sample code. I am
rather new to ML as well as Ignite, so I hope that answers aren't too
obvious. ;-)

Is my assumption correct that the label is the coordinate that is
supposed to be learned (possibly depending on all other features) and
later predicted by the model?

At the moment, I am training my model from a local cache
(CacheMode.LOCAL) that I populate through a CacheStoreAdapter from
ElasticSearch as I can fit all data into RAM of a single node.
Training seems to be single-threaded, though. Is there a way to
parallelize the training across available cores while still limiting
the operation to a single JVM process?

After training a model I'd like to figure out the importance of the
different features. Is there a way to obtain the feature importance
from the model?

Thanks,
Thilo