I agree. I personally think the ML efforts should follow the StatsBase and Optim conventions where it makes sense.

The notational differences are inconvenient, but they are manageable. I think readability should be the goal there. For example, if you implement an algorithm, you should use the notation from the referenced paper. A package tailored towards use in a statistical context, such as GLMs, should probably follow the conventions used in statistics (e.g. beta for the coefficients), while a package for SVMs should follow the conventions for SVMs (e.g. w for the coefficients), and so forth. It's nice to streamline things, but let's not get carried away with this kind of micromanagement.
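As a rough sketch of what I mean (all type and function names here are made up for illustration, not an existing API): each package keeps the field names from its own literature, while both expose the same generic verbs, so users get a uniform interface without anyone being forced to rename their math.

```julia
# Hypothetical sketch: shared generic verbs, package-local notation.
abstract type RegressionModel end   # stand-in for a StatsBase-style abstract type

struct GLMModel <: RegressionModel
    beta::Vector{Float64}   # statistics convention: coefficients are "beta"
end

struct SVMModel <: RegressionModel
    w::Vector{Float64}      # SVM convention: the weight vector is "w"
    b::Float64              # bias term
end

# Both answer the same generic question, whatever the internals call it.
coefficients(m::GLMModel) = m.beta
coefficients(m::SVMModel) = m.w

predict(m::GLMModel, X) = X * m.beta
predict(m::SVMModel, X) = X * m.w .+ m.b
```

The point is that multiple dispatch makes this cheap: the shared vocabulary lives in one lightweight base package, and each model package dispatches on its own types.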

On 2015-11-11 16:01, Tom Breloff wrote:
One of the tricky things to figure out is how to separate statistics from machine learning, as they overlap heavily (completely?) but with different terminology and goals. I think it's really important that JuliaStats and JuliaML/JuliaLearn play nicely together, and this probably means that any ML interface uses StatsBase verbs whenever possible. There has been a little tension (from my perspective) and a slight turf war when it comes to statistical processes and terminology... is it possible to avoid?

On Wed, Nov 11, 2015 at 9:49 AM, Stefan Karpinski <ste...@karpinski.org> wrote:

    This is definitely already in progress, but we've a ways to go
    before it's as easy as scikit-learn. I suspect that having an
    organization will be more effective at coordinating the various
    efforts than people might expect.

    On Wed, Nov 11, 2015 at 9:46 AM, Tom Breloff <t...@breloff.com> wrote:

        Randy, see LearnBase.jl, MachineLearning.jl, Learn.jl (just a
        readme for now), Orchestra.jl, and many others.  Many people
        have the same goal, and wrapping TensorFlow isn't going to
        change the need for a high level interface.  I do agree that a
        good high level interface is higher on the priority list, though.

        On Wed, Nov 11, 2015 at 9:29 AM, Randy Zwitch <randy.zwi...@fuqua.duke.edu> wrote:

            Sure. I'm not against anyone doing anything; it just
            seems like Julia suffers from an "expert/edge case"
            problem right now. For me, it'd be awesome if there were
            a scikit-learn (Python) or caret (R) style
            mega-interface that ties together the packages that have
            already been written. From my cursory reading, TensorFlow
            seems more like a low-level toolkit for
            expressing/solving equations, whereas what I see Julia
            lacking is an easy way to evaluate 3-5 different
            algorithms on the same dataset quickly.

            A tweet I just saw sums it up pretty succinctly:
            "TensorFlow already has more stars than scikit-learn, and
            probably more stars than people actually doing deep learning"



            On Tuesday, November 10, 2015 at 11:28:32 PM UTC-5,
            Alireza Nejati wrote:

                Randy: To answer your question, I'd reckon that the
                two major gaps in julia that TensorFlow could fill are:

                1. Lack of automatic differentiation on arbitrary
                graph structures.
                2. Lack of ability to map computations across cpus and
                clusters.

                Funny enough, I was thinking about (1) for the past
                few weeks and I think I have an idea about how to
                accomplish it using existing JuliaDiff libraries.
                About (2), I have no idea, and that's probably going
                to be the most important aspect of TensorFlow moving
                forward (and also probably the hardest to implement).
                So for the time being, I think it's definitely
                worthwhile to just have an interface to TensorFlow.
                There are a few ways this could be done. Some ways
                that I can think of:

                1. Just tell people to use PyCall directly. Not an
                elegant solution.
                2. A more julia-integrated interface à la SymPy.
                3. Using TensorFlow as the 'backend' of a novel
                julia-based machine learning library. In this
                scenario, everything would be in julia, and TensorFlow
                would only be used to map computations to hardware.

                I think 3 is the most attractive option, but also
                probably the hardest to do.
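For what option 3 could look like in practice, here is a minimal sketch (every name below is hypothetical, not an existing package API): the model definition stays pure Julia, and a backend abstraction decides where the computation actually runs, so TensorFlow would only ever be an execution target.

```julia
# Hypothetical sketch of option 3: a Julia-first API with pluggable backends.
abstract type ComputeBackend end

struct JuliaBackend <: ComputeBackend end       # runs ops natively in Julia
struct TensorFlowBackend <: ComputeBackend end  # would hand ops to TensorFlow

# The model definition is plain Julia, independent of any backend.
struct LinearModel
    w::Vector{Float64}
end

# Only the execution layer dispatches on the backend type.
execute(::JuliaBackend, m::LinearModel, X) = X * m.w
execute(::TensorFlowBackend, m::LinearModel, X) =
    error("would build and run the corresponding TensorFlow graph here")
```

The attraction is that users never touch the backend directly; swapping `JuliaBackend()` for `TensorFlowBackend()` is the whole migration story, which is roughly how scenario 3 would keep "everything in Julia" while still mapping computations to hardware.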




