I agree. I personally think the ML efforts should follow the StatsBase
and Optim conventions where it makes sense.
The notational differences are inconvenient, but they are manageable. I
think readability should be the goal there. For example, if you implement
some algorithm, you should use the notation from the referenced paper. A
package tailored towards use in a statistical context, such as GLMs,
should probably follow the conventions used in statistics (e.g. beta for
the coefficients). A package for SVMs should follow the conventions for
SVMs (e.g. w for the coefficients), and so forth. It's nice to streamline
things, but let's not get carried away with this kind of micromanagement.
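To make that concrete, here is a rough sketch of how two domain packages could each keep their field's notation internally while still agreeing on a shared StatsBase-style accessor. The type and field names below are hypothetical; only the idea of a common `coef` verb comes from StatsBase:

```julia
# Hypothetical sketch: each package keeps its domain's notation internally,
# but both expose the same StatsBase-style `coef` accessor.
abstract type RegressionModel end

struct GLMModel <: RegressionModel
    beta::Vector{Float64}   # statistics convention: coefficients are β
end

struct SVMModel <: RegressionModel
    w::Vector{Float64}      # SVM convention: weight vector w
    b::Float64              # bias term
end

# One shared verb, regardless of internal naming:
coef(m::GLMModel) = m.beta
coef(m::SVMModel) = m.w
```

Downstream code can then call `coef(model)` without caring which notation the implementing package used internally.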
On 2015-11-11 16:01, Tom Breloff wrote:
One of the tricky things to figure out is how to separate statistics
from machine learning, as they overlap heavily (completely?) but with
different terminology and goals. I think it's really important that
JuliaStats and JuliaML/JuliaLearn play nicely together, and this
probably means that any ML interface uses StatsBase verbs whenever
possible. There has been a little tension (from my perspective) and a
slight turf war when it comes to statistical processes and
terminology... is it possible to avoid that?
On Wed, Nov 11, 2015 at 9:49 AM, Stefan Karpinski
<ste...@karpinski.org <mailto:ste...@karpinski.org>> wrote:
This is definitely already in progress, but we've a ways to go
before it's as easy as scikit-learn. I suspect that having an
organization will be more effective at coordinating the various
efforts than people might expect.
On Wed, Nov 11, 2015 at 9:46 AM, Tom Breloff <t...@breloff.com
<mailto:t...@breloff.com>> wrote:
Randy, see LearnBase.jl, MachineLearning.jl, Learn.jl (just a
readme for now), Orchestra.jl, and many others. Many people
have the same goal, and wrapping TensorFlow isn't going to
change the need for a high level interface. I do agree that a
good high level interface is higher on the priority list, though.
On Wed, Nov 11, 2015 at 9:29 AM, Randy Zwitch
<randy.zwi...@fuqua.duke.edu
<mailto:randy.zwi...@fuqua.duke.edu>> wrote:
Sure. I'm not against anyone doing anything, just that it
seems like Julia suffers from an "expert/edge case"
problem right now. For me, it'd be awesome if there were
just a scikit-learn (Python) or caret (R) style
mega-interface that ties together the packages that are
already written. From my cursory reading, it seems
like TensorFlow is more of a low-level toolkit for
expressing/solving equations, whereas I see Julia lacking an
easy way to evaluate 3-5 different algorithms on the
same dataset quickly.
A tweet I just saw sums it up pretty succinctly:
"TensorFlow already has more stars than scikit-learn, and
probably more stars than people actually doing deep learning"
On Tuesday, November 10, 2015 at 11:28:32 PM UTC-5,
Alireza Nejati wrote:
Randy: To answer your question, I'd reckon that the
two major gaps in Julia that TensorFlow could fill are:
1. Lack of automatic differentiation on arbitrary
graph structures.
2. Lack of the ability to map computations across CPUs and
clusters.
Funnily enough, I was thinking about (1) for the past
few weeks, and I think I have an idea about how to
accomplish it using the existing JuliaDiff libraries.
About (2), I have no idea, and that's probably going
to be the most important aspect of TensorFlow moving
forward (and probably also the hardest to implement).
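For a flavor of point (1): forward-mode automatic differentiation via dual numbers is the basic idea behind libraries like ForwardDiff.jl. The sketch below is purely illustrative and not any package's actual API:

```julia
# Minimal sketch of forward-mode AD via dual numbers. Each Dual carries a
# value and a derivative; overloaded arithmetic propagates both at once.
struct Dual
    val::Float64   # value
    der::Float64   # derivative
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

# Differentiate f at x by seeding the derivative slot with 1.
derivative(f, x) = f(Dual(x, 1.0)).der

# d/dx (x*x + x) = 2x + 1, so derivative(x -> x*x + x, 3.0) gives 7.0
```

Extending this from scalar functions to arbitrary graph structures (and to reverse mode) is exactly the hard part the JuliaDiff effort is tackling.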
So for the time being, I think it's definitely
worthwhile to just have an interface to TensorFlow.
There are a few ways this could be done. Some ways
that I can think of:
1. Just tell people to use PyCall directly. Not an
elegant solution.
2. A more Julia-integrated interface, à la SymPy.
3. Using TensorFlow as the 'backend' of a novel
Julia-based machine learning library. In this
scenario, everything would be in Julia, and TensorFlow
would only be used to map computations to hardware.
I think 3 is the most attractive option, but also
probably the hardest to do.
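A rough sketch of what option 3 could look like: the model is described once in Julia, and a pluggable backend decides where it runs. Everything here (the `Backend` types, the `execute` function) is made up for illustration; a real design would translate the Julia description into TensorFlow ops:

```julia
# Hypothetical sketch of option 3: Julia-level model description with a
# pluggable execution backend.
abstract type Backend end
struct NativeBackend <: Backend end        # pure-Julia execution
struct TensorFlowBackend <: Backend end    # would delegate to TensorFlow

# The computation is described once, in Julia...
matmul_add(W, x, b) = W * x .+ b

# ...and the backend decides how it actually runs.
execute(::NativeBackend, f, args...) = f(args...)
execute(::TensorFlowBackend, f, args...) =
    error("sketch only: would build and run a TensorFlow graph here")
```

The appeal is that users never leave Julia; the cost, as noted above, is that someone has to write and maintain the mapping from Julia computations onto TensorFlow's graph model.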