ML RandomForest - creating Mdl.predict(Vector) from One_hot dataset

2020-04-23 Thread kencottrell
Hello all,

I've searched through the examples, and so far I have only seen how to use the
one-hot encoder for model fitting or for the evaluator; I can't figure out how
to do this for the predict call. For example, we see one-hot used as input to:

1. RF_MODEL = trainer.fit(
,
,  // this has category column before one-hot
split.getTrainFilter(),
  // this does one-hot inside the model - how do I get the cache with additional columns?
);

OR ALSO here:

2. RegressionMetricValues regMetrics = Evaluator.evaluateRegression(
,
split.getTestFilter(),


);


But rfmodel.predict(Vector features) requires that the original Vector,
including its categorical columns, already be converted into all doubles. What
is the best way to do this intermediate step?
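
To make the intermediate step concrete, here is a minimal sketch in plain Java of encoding one raw row by hand. OneHotRowEncoder is a hypothetical helper, not part of Ignite ML, and its column layout (each categorical column expanded in place) is an assumption that may not match the layout the Ignite encoder preprocessor produced during training, so the order would have to be checked against the fitted preprocessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not an Ignite class): one-hot encodes a mixed row into doubles. */
public class OneHotRowEncoder {
    /** Categorical column index -> (category value -> position inside that column's 0/1 block). */
    private final Map<Integer, Map<String, Integer>> categories = new LinkedHashMap<>();

    /** Learns the category dictionaries from training rows; catCols lists the categorical columns. */
    public OneHotRowEncoder(List<Object[]> trainingRows, int... catCols) {
        for (int col : catCols) {
            Map<String, Integer> dict = new LinkedHashMap<>();
            for (Object[] row : trainingRows)
                dict.putIfAbsent(String.valueOf(row[col]), dict.size());
            categories.put(col, dict);
        }
    }

    /** Expands one raw row: numeric columns are copied, categorical ones become 0/1 blocks. */
    public double[] encode(Object[] row) {
        List<Double> out = new ArrayList<>();
        for (int col = 0; col < row.length; col++) {
            Map<String, Integer> dict = categories.get(col);
            if (dict == null) {
                // Not registered as categorical, so it is assumed to be numeric.
                out.add(((Number) row[col]).doubleValue());
                continue;
            }
            Integer hot = dict.get(String.valueOf(row[col]));
            if (hot == null)
                throw new IllegalArgumentException("Unseen category in column " + col + ": " + row[col]);
            for (int i = 0; i < dict.size(); i++)
                out.add(i == hot ? 1.0 : 0.0);
        }
        return out.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<Object[]> train = Arrays.asList(
            new Object[] {1.5, "red", 10},
            new Object[] {2.0, "green", 20},
            new Object[] {0.5, "red", 30});
        OneHotRowEncoder enc = new OneHotRowEncoder(train, 1);
        double[] features = enc.encode(new Object[] {3.0, "green", 7});
        System.out.println(Arrays.toString(features)); // prints [3.0, 0.0, 1.0, 7.0]
    }
}
```

The resulting double[] could then be wrapped in an Ignite vector (for example with VectorUtils.of(features), assuming that helper is available in the version in use) and handed to predict.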




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Apache Ignite ML & Python

2020-03-04 Thread kencottrell
Andrei, 

I am also working with Apache Ignite ML and am interested in providing
wrappers for the Ignite ML API. Instead of simply recreating the low-level Java
API for ML inside Python, how about creating some higher-level "Auto ML"
workflow services? For example:

1. Here is a raw dataset, already inside this cluster in cache "myName", with
label column "MyLabel". Take N samples and tell me which columns appear to be
numeric, unique IDs, or categorical values.
2. Based on N samples, please run some analysis and tell me the top 5
feature columns in terms of predictive value using algorithm = RandomForest.
3. Do a batch run, sample size = N, using this list of preprocessing steps
{impute, scale, etc.} and algorithms {kNN, Decision Tree, etc.}, and give me a
report of the accuracies obtained with each.

In other words, we have a simple sample in the Tutorial demo where these 
all run and then we compare outputs - why not automate these with a Python
Notebook UI of some sort? 
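
As a rough illustration of what step 1 could look like, here is a small sketch in plain Java that guesses a column's kind from N sampled values. ColumnProfiler and its rules are hypothetical and are not an existing Ignite ML API; the heuristic (all values distinct means ID, everything parseable as a number means numeric, otherwise categorical) is only an assumption and will misjudge some columns on small samples.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical sketch (not an Ignite class): guesses a column's kind from sampled values. */
public class ColumnProfiler {
    public enum Kind { NUMERIC, UNIQUE_ID, CATEGORICAL }

    /** Classifies one column given its N sampled values as non-null strings. */
    public static Kind classify(List<String> sample) {
        if (sample.isEmpty())
            throw new IllegalArgumentException("Need at least one sampled value");
        boolean allDistinct = new HashSet<>(sample).size() == sample.size();
        boolean allNumeric = sample.stream().allMatch(ColumnProfiler::isNumeric);
        boolean allIntegers = sample.stream().allMatch(ColumnProfiler::isInteger);
        // Distinct non-numeric values, or distinct whole numbers, look like identifiers.
        if (allDistinct && (!allNumeric || allIntegers))
            return Kind.UNIQUE_ID;
        if (allNumeric)
            return Kind.NUMERIC;
        return Kind.CATEGORICAL;
    }

    private static boolean isNumeric(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    private static boolean isInteger(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("a17", "b42", "c03")));   // prints UNIQUE_ID
        System.out.println(classify(Arrays.asList("1.5", "2.25", "1.5")));  // prints NUMERIC
        System.out.println(classify(Arrays.asList("red", "green", "red"))); // prints CATEGORICAL
    }
}
```

A Python wrapper or notebook could call a service like this once per column and present the result as a table for the user to confirm or correct.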






Random Forest, trying to evaluate accuracy but getting exception

2020-02-05 Thread kencottrell
I have attempted to add this call to a RandomForestModel in order to obtain
accuracy:

  double accuracy = Evaluator.evaluate(
      dataCache,
      randomForestMdl,
      vectorizer,
      new Accuracy<>()
  );

  System.out.println("\n>>> Accuracy " + accuracy);
  System.out.println("\n>>> Test Error " + (1 - accuracy));


But I get this exception (edited to remove extra detail). Am I using the
wrong parameters?

Exception in thread "main" java.lang.RuntimeException: class org.apache.ignite.ml.math.exceptions.IndexException: Invalid (out of bound) index: 16
    at org.apache.ignite.ml.selection.scoring.evaluator.Evaluator.calculateMetric(Evaluator.java:330)
    at org.apache.ignite.ml.selection.scoring.evaluator.Evaluator.evaluate(Evaluator.java:57)
    at com.gg.TrainRandomForest2.main(TrainRandomForest2.java:111)
Caused by: class org.apache.ignite.ml.math.exceptions.IndexException: Invalid (out of bound) index: 16
    at org.apache.ignite.ml.math.primitives.vector.AbstractVector.checkIndex(AbstractVector.java:174)
    at org.apache.ignite.ml.math.primitives.vector.AbstractVector.get(AbstractVector.java:179)
    at org.apache.ignite.ml.tree.randomforest.data.TreeNode.predict(TreeNode.java:91)
    at org.apache.ignite.ml.tree.randomforest.data.TreeNode.predict(TreeNode.java:92)
    at org.apache.ignite.ml.tree.randomforest.data.TreeRoot.predict(TreeRoot.java:52)
    at org.apache.ignite.ml.tree.randomforest.data.TreeRoot.predict(TreeRoot.java:29)
    at org.apache.ignite.ml.composition.ModelsComposition.predict(ModelsComposition.java:64)
    at org.apache.ignite.ml.composition.ModelsComposition.predict(ModelsComposition.java:32)
    at org.apache.ignite.ml.selection.scoring.cursor.CacheBasedLabelPairCursor$TruthWithPredictionIterator.next(CacheBasedLabelPairCursor.java:145)
    at org.apache.ignite.ml.selection.scoring.cursor.CacheBasedLabelPairCursor$TruthWithPredictionIterator.next(CacheBasedLabelPairCursor.java:121)
    at org.apache.ignite.ml.selection.scoring.metric.classification.Accuracy.score(Accuracy.java:36)
    at org.apache.ignite.ml.selection.scoring.evaluator.Evaluator.calculateMetric(Evaluator.java:328)
    ... 2 more
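
Reading the trace, a tree node asked the feature vector for index 16 and the vector rejected it. One way this could happen (an assumption, not something the trace confirms) is that the vectors seen at evaluation time are narrower than the ones the forest was trained on. The sketch below reproduces that situation in plain Java; FeatureWidthCheck is a hypothetical stand-in, not Ignite's TreeNode, and the widths 17 and 16 are made up for illustration.

```java
/** Hypothetical stand-in for a tree node (not Ignite's TreeNode) that splits on one feature. */
public class FeatureWidthCheck {
    /** Returns 1.0 if the chosen feature exceeds the threshold, else 0.0. */
    static double predict(double[] features, int splitFeature, double threshold) {
        if (splitFeature < 0 || splitFeature >= features.length)
            throw new IndexOutOfBoundsException("Invalid (out of bound) index: " + splitFeature);
        return features[splitFeature] > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Assumed widths: training vectors had 17 features (indices 0..16).
        double[] trainedOn = new double[17];
        trainedOn[16] = 2.0;
        System.out.println(predict(trainedOn, 16, 0.5)); // prints 1.0

        // Evaluation vectors are one feature short, so index 16 no longer exists.
        double[] evaluatedOn = new double[16];
        try {
            predict(evaluatedOn, 16, 0.5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // prints Invalid (out of bound) index: 16
        }
    }
}
```

If that is what is going on, comparing the feature count produced by the vectorizer passed to Evaluator.evaluate with the one used in trainer.fit would be a first thing to check.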





Ignite to H2O integration

2020-01-17 Thread kencottrell
Have any of you performed an H2O integration with Ignite to import an
extracted feature data set directly as input into the Ignite training engine?





ML Model persist and reuse

2020-01-17 Thread kencottrell
Hello all Apache Ignite ML developers:

I understand that Ignite currently can't save a model after training in such a
way that the model can be re-imported by another Ignite cluster. Please correct
me if a model can in fact be saved and reloaded, but I don't think it can.

Anyway, I'd like to know if you have recommendations on how to do either one
of the following:
1. Convert the Ignite model into an interchangeable format. For example,
there are some emerging standards (such as https://onnx.ai/ for one) and
others; have any of you worked with such formats?

2. If not transforming the Ignite model into some standard format, how about
saving the model into Native persistence in a binary serialized format, creating
some kind of handle that can be shared with other clusters, and then using this
to reload the model into a new Ignite session?
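
To illustrate the second option, here is a minimal sketch of a save and reload round trip using standard Java serialization to a file. ThresholdModel is a hypothetical stand-in for a trained model, and the sketch assumes the real model class is java.io.Serializable and that the same class version is on the classpath of the cluster that reloads it; it is not an interchange format in the ONNX sense.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of saving and reloading a model object with plain Java serialization. */
public class ModelFileRoundTrip {
    /** Hypothetical stand-in for a trained model; any Serializable model would be handled alike. */
    static class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int featureIdx;
        private final double threshold;

        ThresholdModel(int featureIdx, double threshold) {
            this.featureIdx = featureIdx;
            this.threshold = threshold;
        }

        double predict(double[] features) {
            return features[featureIdx] > threshold ? 1.0 : 0.0;
        }
    }

    /** Writes the model to a file that can be copied to another environment. */
    static void save(Serializable mdl, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(mdl);
        }
    }

    /** Reads a model back; the caller decides which type it expects. */
    @SuppressWarnings("unchecked")
    static <T> T load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("model", ".bin");
        save(new ThresholdModel(0, 0.5), file);
        ThresholdModel restored = load(file);
        System.out.println(restored.predict(new double[] {0.9})); // prints 1.0
        Files.delete(file);
    }
}
```

The file path (or a cache key, if the bytes were stored in a persistent cache instead) would play the role of the shareable handle mentioned above.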


This question has been asked of me recently, and this would be a good way to
let Apache Ignite ML/DL models participate in a broader enterprise model
deployment process. 



