________________________________
From: Niketan Pansare <npan...@us.ibm.com>
Sent: Friday, September 22, 2017 3:10 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
>> a) Does this mean you are proposing splitting R4ML into two parts, an R wrapper and R4ML?
I was only suggesting how you ought to stage the PRs into SystemML once the
vote passes :)
Alok: I wonder, if we do this and have to push to CRAN, whether CRAN's
requirement that all dependencies be on CRAN might cause issues.
>> So I was wondering: is it absolutely necessary to keep the APIs in sync?
Soft-yes, we should try our best to do so.
Thanks,
Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
Niketan Pansare is a Senior Software Engineer at IBM Research Almaden, where he
works on advanced information management systems that include analytics,
distributed ...
From: alok singh <singh_a...@hotmail.com>
To: "dev@systemml.apache.org" <dev@systemml.apache.org>
Date: 09/22/2017 02:40 PM
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
________________________________
See my comments marked "Alok:".
From: Niketan Pansare <npan...@us.ibm.com>
Sent: Friday, September 22, 2017 2:11 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
>>> As pointed out earlier, R4ML is not just an R interface; it is based on
>>> IBM's earlier R product and has many product features.
>>> Also note that pure MLContext and the DML command-line options do not allow
>>> everything users want to do in their ML code.
>>> The solution could be to create wrappers to make users happy, and we have
>>> created those wrappers, but they are in R, so from the user's point of view
>>> it feels like just writing R code.
>>> If the ultimate goal is to have just an MLContext-based R interface, then I
>>> think it undermines R4ML's value proposition.
>>> (We can definitely just expose the MLContext API. However, calling Logistic
>>> Regression just for the purpose of MLContext won't be best.) R4ML.mlogit has
>>> better APIs.
Maybe we are not on the same page.
(a) MLContext is not the only API, but an important one that needs to be
supported.
(b) Like R4ML, our mllearn wrappers aim to simplify usage for Python users.
These wrappers were designed so that, if someone has written a Python script
that uses scikit-learn or MLlib, a simple change from `from sklearn.linear_model
import LogisticRegression` to `from systemml.mllearn import
LogisticRegression` should in principle allow SystemML to be incorporated in
their workflow.
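To make the drop-in idea concrete, here is a minimal, self-contained Python sketch under stated assumptions. The classes `LocalLogisticRegression` and `ScalableLogisticRegression` and the `workflow` helper are hypothetical stand-ins, not the real scikit-learn or `systemml.mllearn` classes; the real wrappers run on Spark, which this sketch does not model. It only shows that workflow code written against a shared `fit`/`predict` contract keeps working when the estimator class behind it is swapped.

```python
import math


class LocalLogisticRegression:
    """Hypothetical single-node estimator (stand-in for a scikit-learn-style class).

    Trains binary logistic regression with plain batch gradient descent.
    """

    def __init__(self, lr=0.5, max_iter=500):
        self.lr = lr
        self.max_iter = max_iter
        self.w = []
        self.b = 0.0

    def _prob(self, xi):
        # Sigmoid of the linear score w.x + b.
        z = sum(wj * xj for wj, xj in zip(self.w, xi)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.max_iter):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self._prob(xi) - yi  # d(log-loss)/d(score)
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Update all parameters after the full pass (batch gradient descent).
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, X):
        return [1 if self._prob(xi) >= 0.5 else 0 for xi in X]


class ScalableLogisticRegression(LocalLogisticRegression):
    """Hypothetical 'scalable backend' exposing the identical fit/predict contract.

    A real backend would run training on a cluster; this sketch reuses the
    local solver so the example stays self-contained.
    """


def workflow(estimator_cls, X, y):
    # User code depends only on fit/predict, never on which backend is behind them.
    model = estimator_cls().fit(X, y)
    return model.predict(X)


if __name__ == "__main__":
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [0, 0, 1, 1]
    # Swapping the estimator class is the only change the workflow sees.
    print(workflow(LocalLogisticRegression, X, y))     # [0, 0, 1, 1]
    print(workflow(ScalableLogisticRegression, X, y))  # [0, 0, 1, 1]
```

In the real mllearn case the swap is the import line rather than a class argument, but the design choice is the same: keep the estimator interface identical so the user's surrounding code does not change.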
Alok:
a) Does this mean you are proposing splitting R4ML into two parts, a pure R
wrapper and R4ML? I think that is an idea worth looking into, and I second it.
That way one can have a pure R wrapper plus an mllearn-like R4ML.
b) We can certainly expose MLContext from R as a public API, but using it
directly involves many convolutions; R4ML's code handles these to make life
easier for R users. For example, see the functions *execute*, *output*, and
*getDF* in
https://github.com/aloknsingh/r4ml/blob/0d79b3c7975be55989466869fe99ccfd47dd6dc3/R4ML/R/sysml.bridge.R
>> 1) I think it will require a lot of work for the Scala and Python APIs to be
>> in sync with the R4ML API.
>> Also, I feel that if the goal is to have just Python and Scala, then we have
>> to do the coding in R4ML. But I think the goal was to merge this project.
I guess the goal is to make SystemML better and more user-friendly. To do that,
we have to try our best to keep our APIs consistent across languages. I
understand it might require a lot of work for the Scala and Python APIs to be in
sync with the R4ML API, but it has to be done.
Since R4ML was designed in isolation from the SystemML project, I am
recommending a gradual merge of (1) the additional features and (2) the
features that diverge from SystemML APIs so as to be R-friendly, thus allowing
the SystemML community to comment on them before merging. This also allows the
R4ML features that match one-to-one with the Python and Scala APIs to be merged
quickly, rather than waiting in a PR until we agree on every feature in (1) and
(2) :)
Alok: See my previous comments; I think we should explore the idea of splitting
it the way you split mllearn. As I see it, more discussion is still needed. At
this stage, those changes would require a complete rework of R4ML.
Another option would be for R4ML to be an independent package that is
eventually pushed to CRAN.
Note that in the Spark dev repo, Spark core is there, and SparkR and Python
each live in a separate directory.
Initially, Spark Scala, SparkR, and PySpark tried to stay in sync, but I think
many features have since been added that are not in sync between SparkR and
PySpark, and similarly between Spark Scala, SparkR, and PySpark.
So I was wondering: is it absolutely necessary to keep the APIs in sync, since
each of them caters to different users?
These are just ideas.
Thanks,
Niketan Pansare
From: alok singh <singh_a...@hotmail.com>
To: "dev@systemml.apache.org" <dev@systemml.apache.org>, "de...@apache.org"
<de...@apache.org>
Date: 09/22/2017 12:30 PM
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
Here are Niketan's questions:
Thanks for taking the time to answer our questions and for offering to help the
SystemML community. I have a couple more questions:
Niketan: 1.
If there is an inconsistency, would you (as R4ML developers) be comfortable
changing the R4ML interface to be compatible with our other APIs? Maybe you can
go over the two links below and imagine adding a corresponding R tab:
- MLContext Programming guide:
http://apache.github.io/systemml/spark-mlcontext-programming-guide
- Algorithm wrappers:
http://apache.github.io/systemml/algorithms-classification.html#multinomial-logistic-regression
ALOK: Hi Niketan,
As pointed out earlier, R4ML is not just an R interface; it is based on IBM's
earlier R product and has many product features.
Also note that pure MLContext and the DML command-line options do not allow
everything users want to do in their ML code.
The solution could be to create wrappers to make users happy, and we have
created those wrappers, but they are in R, so from the user's point of view it
feels like just writing R code.
See some of the examples at:
https://github.com/SparkTC/r4ml/tree/master/R4ML/inst/examples
https://github.com/SparkTC/r4ml/blob/master/R4ML/inst/examples/r4ml.demo.mlogit.R
NOTE: R4ML uses a combination of SparkR, DML, and R to provide the best user
experience.
If the ultimate goal is to have just an MLContext-based R interface, then I
think it undermines R4ML's value proposition.
(We can definitely just expose the MLContext API. However, calling Logistic
Regression just for the purpose of MLContext won't be best.) R4ML.mlogit has
better APIs.
Niketan: 2. Other than providing an R interface to SystemML along the lines of
the APIs above, what additional features/code does R4ML plan to add to
SystemML? Just as we want the R API to be functionally complete with respect to
our Python and Scala APIs, we want the Python and Scala APIs to be functionally
complete with respect to the R API. So a discussion on supporting the
additional features in the Python and Scala APIs is required :)
ALOK: As discussed in point 1), I think it will require a lot of work for the
Scala and Python APIs to be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have to
do the coding in R4ML.
But I think the goal was to merge this project.
It would be nice if @Fred could also comment.
Thanks
Alok
From: alok singh <singh_a...@hotmail.com>
Sent: Thursday, September 21, 2017 7:32 PM
To: dev@systemml.apache.org; de...@apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
Hi
We (Brendan and I) have been focusing on other things, like journeys, apart
from the new MLContext changes. You can also review the R4ML commits and PRs.
I think the code will definitely be maintained.
Alok
From: Deron Eriksson <deroneriks...@gmail.com>
Sent: Thursday, September 21, 2017 6:03 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
>
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
>
> ALOK: We will be doing development on it. There are open PRs already.
>
>
No commits since Jul 20 does raise warning flags, as Matthias pointed out.
For some perspective, SystemML has 1013 commits in the last year (~2.78 per
day). No R4ML commits in 2 months is concerning for obvious reasons. It
implies no real work has been done on the project for months.
> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows one to execute
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> ALOK: Currently there is no out-of-the-box MLContext.
>
>
I believe this also raises some warning flags. Looking over the code at
https://github.com/SparkTC/r4ml/blob/master/R4ML/R/sysml.bridge.R, it looks
like the code in the R4ML master branch utilizes an old API that does not
currently exist in SystemML. As Matthias pointed out, a key value
proposition of SystemML is customizable machine learning, which would
require using an API that currently exists in the project.
That said, I believe an R interface to SystemML is extremely valuable,
I think the whole SystemML community would benefit from the R API, and
I hope you will pursue the issue further. It looks like it has been in
development since June ("[WIP][I-50][R4ML-123] new MLContext API", Pull
Request #50: https://github.com/SparkTC/r4ml/pull/50).
Deron