Re: Apache Ignite ML & Python

2020-03-05 Thread Alexey Zinoviev
Ken, thanks for the feedback. Some of these ideas look like good
candidates for the next release of the Java API.

We should understand that the Python API can only wrap the Java API.

The approach of wrapping via the Py4J library, as in the mentioned
repository, could be used; it is a common approach and is used in PySpark,
for example.
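
To make the wrapping idea concrete, here is a minimal sketch of how a thin
Python layer could forward calls to Java objects exposed through Py4J. The
names snake_to_camel, JavaWrapper and connect_jvm are hypothetical and are
not part of Ignite or pyignite; only JavaGateway and its jvm attribute come
from Py4J itself, and connect_jvm assumes a Py4J GatewayServer is already
running on the Java side.

```python
def snake_to_camel(name):
    """Convert a Python-style name such as 'with_k' to Java-style 'withK'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


class JavaWrapper:
    """Hypothetical facade: forwards snake_case access to a camelCase Java object."""

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def __getattr__(self, name):
        # Called only when normal lookup fails. Private and dunder names are
        # refused so that copying or pickling the wrapper cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._java_obj, snake_to_camel(name))


def connect_jvm():
    """Return the JVM view of a running Py4J GatewayServer (default port)."""
    # Third-party dependency (pip install py4j), imported lazily so the
    # wrapper above can be used and tested without a JVM.
    from py4j.java_gateway import JavaGateway
    return JavaGateway().jvm
```

With a gateway running, JavaWrapper(connect_jvm().java.util.ArrayList())
would let Python code call is_empty() and have it dispatched to isEmpty() on
the Java side. A real wrapper for the ML API would need much more than name
translation (type conversion, error mapping, lifecycle), which is part of
why it is a large piece of work.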

Currently I'm not ready to make a Python wrapper a part of Ignite, for
several reasons: part of the ML API is to be released first, this is a big
piece of work for many committers, and we couldn't guarantee the release
cycle for such a component.




Re: Apache Ignite ML & Python

2020-03-05 Thread Denis Magda
Folks,

Does it make sense to take the approach of the Python ML implementation
that is available for GridGain in beta mode (where the Python APIs wrap the
Java ML library)?
https://www.gridgain.com/docs/latest/developers-guide/python-ml/using-python-ml

-
Denis




Re: Apache Ignite ML & Python

2020-03-05 Thread Ken Cottrell
Alexey, Andrei,

Here are some thoughts on what would be good to have in a Python Ignite ML
notebook:

   - some way to pick an optional sample size (out of a very large cache)
   that gets communicated to, and set aside on, all partitions
   - some way to count the number of unique values for a category (for
   example, to decide between one-hot and string encoding) - this might
   need the entire dataset if you want to do one-hot encoding
   - some way to do a quick assessment and simple listing (similar to the
   sklearn pretty bar chart) of the contribution of each feature to the
   label
   - some way to allow a vector to choose its own indexes based on
   predictive weight and data type (for example, automatically encoding
   categories)
   - some way to report simple cluster metrics in a Python notebook, with
   the focus on ML information such as raw cache, sample cache, vector
   info, and preprocessing / training stats
   - some way to input lists of things to do on a data set in parallel (a
   list of algorithms, for example) and then let Ignite ML run them all and
   report comparisons back
   - some way to explain the steps that were run by ML in the background,
   with a report on all the steps
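
As an illustration of the unique-value item in the list above, here is a
small plain-Python sketch (no Ignite involved) that counts the distinct
values of one column and uses the count to choose an encoding. The function
names, the "string-index" label and the threshold of 10 categories are all
assumptions made for the example. As noted in the list, a count taken from a
sample may miss rare categories, so one-hot encoding may need the full
dataset.

```python
from collections import Counter


def choose_encoding(values, max_one_hot=10):
    """Pick an encoding for one categorical column from its distinct count."""
    # Missing values (None) are ignored when counting categories.
    counts = Counter(v for v in values if v is not None)
    encoding = "one-hot" if len(counts) <= max_one_hot else "string-index"
    return encoding, counts


def one_hot(values, categories):
    """Encode each value as a 0/1 vector over the given category order."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Illustrative column with three distinct values and one missing entry.
embarked = ["S", "C", "S", None, "Q"]
encoding, counts = choose_encoding(embarked)
vectors = one_hot(["S", "Q"], sorted(counts))
```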

We might even just create some sort of demo Python wrapper that sits in
front of the org.apache.ignite.examples.ml.tutorial code, but passes in a
different cache handle instead of Titanic and also runs all of the Java
classes (DT, impute, categorical encoding, scaling, etc.) in parallel
instead of serially.
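
The "run them all in parallel and report comparisons back" idea could look
roughly like the sketch below. It is plain Python with toy models, not
Ignite code: compare_in_parallel, accuracy and the two lambda "models" are
hypothetical stand-ins, and Python threads here only dispatch the jobs
concurrently. In the real case the heavy work would run inside the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def accuracy(predict, rows):
    """Fraction of (features, label) rows where predict(features) == label."""
    hits = sum(1 for features, label in rows if predict(features) == label)
    return hits / len(rows)


def compare_in_parallel(models, rows, max_workers=4):
    """Score every named model on the same rows, submitting all jobs at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(accuracy, predict, rows)
                   for name, predict in models.items()}
        return {name: future.result() for name, future in futures.items()}


# Toy data: one feature, binary label.
rows = [((1.0,), 0), ((2.0,), 0), ((8.0,), 1), ((9.0,), 1)]
models = {
    "always_zero": lambda features: 0,
    "threshold_5": lambda features: int(features[0] > 5.0),
}
report = compare_in_parallel(models, rows)
# report == {"always_zero": 0.5, "threshold_5": 1.0}
```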




Ken Cottrell






Re: Apache Ignite ML & Python

2020-03-05 Thread Alexey Zinoviev
I agree with the simple case. I think we could start with a simple PoC of
Python for ML in the next release.



Re: Apache Ignite ML & Python

2020-03-05 Thread AG


Thanks for the reply!

It looks like a high-level API similar to sklearn pipelines.
In my opinion, as a first step it is easier to add simple access, to gain
the ability to run a simple model or a simple preprocessor from Python.

According to your example:
Here is a raw dataset, already inside this cluster cache "myName", with the
label column "MyLable".

I want to run an imputer and kNN from a notebook UI using the Python API,
and export the results to file storage, as an example.

In my opinion, the ability to create such a simple workflow should be our
first goal.
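
To show the shape of that workflow (impute, run kNN, export), here is a
self-contained plain-Python sketch. It does not use Ignite or pyignite; the
functions impute_mean, knn_predict and export_csv are hypothetical stand-ins
for what a Python ML API would delegate to the cluster, and the data is
made up. It assumes every feature column has at least one non-missing value.

```python
import csv
import io
from collections import Counter


def impute_mean(rows):
    """Replace None in each feature column with that column's mean."""
    means = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present))
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training rows nearest to query."""
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))

    nearest = sorted(range(len(train_x)), key=squared_distance)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def export_csv(rows, predictions, out):
    """Write each feature row followed by its predicted label."""
    writer = csv.writer(out)
    for row, prediction in zip(rows, predictions):
        writer.writerow(list(row) + [prediction])


features = [[1.0, 2.0], [None, 2.5], [8.0, 9.0], [9.0, None]]
labels = [0, 0, 1, 1]

clean = impute_mean(features)
predictions = [knn_predict(clean, labels, row, k=1) for row in clean]

buffer = io.StringIO()  # stands in for a file in file storage
export_csv(clean, predictions, buffer)
```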

Thank You!

Best regards,
Andrei Gavrilov.





Re: Apache Ignite ML & Python

2020-03-04 Thread kencottrell
Andrei, 

I am also working with Apache Ignite ML and am interested in providing
wrappers for the Ignite ML API. But I am wondering: instead of simply
recreating the low-level Java API for ML inside Python, how about creating
some higher-level "Auto ML" workflow services? For example:

1. Here is a raw dataset, already inside this cluster cache "myName", with
label column "MyLable". Take N samples and tell me which columns appear to
be numeric, unique ids, or categorical values.
2. Based on N samples, please run some analysis and tell me the top 5
feature columns in terms of predictive value, using algorithm = RandomForest.
3. Do a batch run, sample size = N, using this list of preprocessing steps
{impute, scale, etc.} and algorithms {kNN, Decision Tree, etc.}, and give me
a report of the accuracies obtained with each.
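
Step 1 above could be approximated with a heuristic like the following
plain-Python sketch. The rules, the function names and the threshold of 10
categories are assumptions chosen for illustration, and the sample rows only
imitate Titanic-style columns. A real service would sample from the cache
rather than from a Python list.

```python
def infer_column_kind(values, max_categories=10):
    """Guess 'numeric', 'unique_id' or 'categorical' from sampled values."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    distinct = len(set(present))
    # Any float in the sample marks the column as a continuous measurement.
    if any(isinstance(v, float) for v in present):
        return "numeric"
    # Integers or strings that never repeat look like identifiers.
    if distinct == len(present) and len(present) > 1:
        return "unique_id"
    # Repeating integers with many distinct values are treated as numeric.
    if all(isinstance(v, int) for v in present) and distinct > max_categories:
        return "numeric"
    return "categorical"


def profile_sample(rows, column_names):
    """Map each column name to its guessed kind, based on the sampled rows."""
    return {name: infer_column_kind(column)
            for name, column in zip(column_names, zip(*rows))}


sample = [
    (1, 3, "male", 7.25),
    (2, 1, "female", 71.28),
    (3, 3, "female", 7.92),
    (4, 1, "female", 53.1),
    (5, 3, "male", None),
]
kinds = profile_sample(sample, ["PassengerId", "Pclass", "Sex", "Fare"])
```

Small samples make the unique-id rule fragile (any integer column whose
sampled values happen not to repeat would be flagged), so N matters here.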

In other words, we have a simple sample in the Tutorial demo where these
all run and then we compare outputs. Why not automate these with a Python
Notebook UI of some sort?




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Apache Ignite ML & Python

2020-03-04 Thread ag239
Dear Community,

I was very inspired by Ignite ML and I wanted to try it with Python.
In particular, I was interested in comparing Ignite ML with Spark ML.

However, I came across the fact that the pyignite component only allows
basic cache operations through the API and has nothing to do with
Ignite ML.

I have discussed this issue with Alexey Zinoviev, and he suggested
describing here all the required features that are not present in Ignite
now.

Therefore, here is the list of required features:

* Ignite ML and pyignite integration.
Ignite ML is a fairly versatile ML library that internally runs on Ignite
primitives, so making Ignite ML and pyignite compatible requires a lot of
Java code, using the py4j library to wrap Ignite ML with Python. Also, I'm
sure lots of Python developers would appreciate the opportunity to test
this solution on their tasks.

* Ignite ML and PySpark integration.
A really interesting case is using pyignite ML with data preprocessed via
PySpark. As far as I know, the current version of Ignite supports
integration only with Spark (not PySpark).

I hope I wrote this letter in accordance with the rules of the community.

Also, I hope these cases will be of interest to the dev community.

BG
Andrei Gavrilov. 
Software Engineer. EPAM Systems


