Re: [scikit-learn] Scikit-learn porting strategy

Avi Gross Tue, 05 Feb 2019 21:24:32 -0800

I haven’t looked at Ruby in a long time. I do wonder what people mean by 
PORTING to another language or environment that already has its own way of 
doing things.


 

I did most of my recent work in native R enhanced by packages and have been 
learning how to do similar things in modules on top of modules … on top of 
native Python.

 

R chose lots of built-in functionality up-front that Python did not, and vice 
versa. If someone wanted to port some machine learning tools from Python to R, 
there would not necessarily be much point in porting numpy or pandas as a 
whole. If you did, there would be even more duplication than there is now. On 
the other hand, I have seen people port things to R like a dict datatype, which 
is not quite the same as the environment objects R uses.

 

So if Ruby already has much of what is needed available, it could make sense to 
rewrite algorithms around that and only add what is missing. For efficiency, 
sure, you might want to link in C/C++/Fortran libraries.

 

As mentioned, there are already ways to run some languages within/from others. 
R and Python can be run with either one being the initiator. If you want Ruby 
to completely have the new functionality, do you want to slavishly copy entire 
packages or design your own new one eclectically? There are many ways to do 
these things, and each time I compare a few, I see differences that make some 
easier or more intuitive than others, and other times the reverse.

 

And how far do you expect to port? What does Ruby provide for graphics, for 
example? R had base graphics and added lattice and then ggplot. I use them all, 
depending on the task and how much detail I want to tweak. They are quite 
different from each other, as is the matplotlib that seems to be used quite a 
bit in Python. Making plots is definitely a part of the process, but if a 
plotting function expects certain data structures, would your versions of the 
numpy and pandas data structures interface well with it?

 

As Andreas says (and I am coincidentally in the middle of the book he wrote 
with a Guido, albeit that is her last name, unlike the Python founder), you may 
find that part of what you would do is create wrappers that accept one function 
interface and massage things to call a different interface. Calling a graphics 
program that expects a list using an array won’t work unless you quietly 
convert first …
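
To make the wrapper idea concrete, here is a minimal Python sketch. The names 
plot_points and accept_any_sequence are hypothetical stand-ins of my own, not 
part of any real plotting library. The sketch only assumes a routine that 
insists on a plain list and shows a wrapper that quietly converts first.

```python
from array import array
from functools import wraps


def plot_points(points):
    """Hypothetical stand-in for a graphics routine that only accepts a list."""
    if not isinstance(points, list):
        raise TypeError("plot_points expects a list")
    # A real routine would draw here; this one just summarizes its input.
    return f"plotted {len(points)} points"


def accept_any_sequence(func):
    """Wrap func so callers may pass any iterable; it is converted to a list."""
    @wraps(func)
    def wrapper(points, *args, **kwargs):
        return func(list(points), *args, **kwargs)
    return wrapper


# The wrapped version presents a more forgiving interface to callers.
flexible_plot = accept_any_sequence(plot_points)

data = array("d", [1.0, 2.5, 4.0])  # a typed array, not a list
result = flexible_plot(data)        # works; plot_points(data) would raise
```

A port would likely need many small adapters of this kind wherever the data 
structures on the two sides differ.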

From: scikit-learn <scikit-learn-bounces+avigross=verizon....@python.org> On 
Behalf Of Andreas Mueller
Sent: Tuesday, February 5, 2019 11:40 AM
To: scikit-learn@python.org
Subject: Re: [scikit-learn] Scikit-learn porting strategy

 

There's some stuff already:
https://github.com/SciRuby/

And in terms of strategy:
No, you can go estimator by estimator and at some point implement 
cross-validation and grid-search and pipelines and metrics pretty independently.
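
As a rough illustration of why these pieces can be built independently, the 
sketch below assumes only the fit/predict convention. MajorityClassifier, 
accuracy, and cross_validate are hypothetical toy names for this example, not 
scikit-learn's implementations, and the fold assignment is a simple 
interleaved split chosen for brevity.

```python
class MajorityClassifier:
    """Toy estimator: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Metric: fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(make_estimator, X, y, k=3, metric=accuracy):
    """Plain k-fold cross-validation that relies only on fit/predict."""
    scores = []
    for fold in range(k):
        # Every k-th sample (offset by fold) is held out for testing.
        test_idx = set(range(fold, len(X), k))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        X_test = [x for i, x in enumerate(X) if i in test_idx]
        y_test = [t for i, t in enumerate(y) if i in test_idx]
        model = make_estimator().fit(X_train, y_train)
        scores.append(metric(y_test, model.predict(X_test)))
    return scores


X = [[0], [1], [2], [3], [4], [5]]
y = ["a", "a", "a", "a", "b", "a"]
scores = cross_validate(MajorityClassifier, X, y, k=3)
```

Because cross_validate never looks inside the estimator, any later estimator 
that follows the same two-method convention can reuse it unchanged.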

It looks like daru is written in Ruby, which I expect to be too slow.
nmatrix is written in C++, so I guess you'd have to write many of the 
algorithms in C++.

At that point it might be easier to wrap an existing C++ library like mlpack or 
shogun.

On 2/5/19 6:12 AM, Joel Nothman wrote:

If you count things in Scipy and NumPy (and Joblib and Cython?) that 
Scikit-learn depends on and which may be lacking or hard to find in SciRuby, 
it's much much more than 39 years. PyCall, and potentially some 
Scikit-learn-specific wrappers around it, seems a much more sensible approach.





_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
