Have you seen Simon D's viz stuff at JuliaCon? I believe much of it will soon be ready for wider use. I'm sure he will chime in further on this thread.
-viral

On 27 Dec 2015 11:59 pm, "Lampkld" <lampk...@gmail.com> wrote:
>
> Viral and Simon,
>
> Since you asked, I will write out some rough and probably excessively abstract ideas that have been floating around in my head. I don't have time to polish them formally, so please forgive the inchoate nature of these thoughts:
>
> Yes, composability and generality are the names of the game! I would also add expressiveness, scalability, and fostering innovation.
>
> Part of #1 is at least reaching parity with R in terms of data-cleaning and manipulation syntax. Part of R's popularity, and of its stubborn growth in the face of Python's recent maturation, is its advantage in ease of expressing data manipulation. If Julia is to compete, the ecosystem should at the very least leverage macros to emulate R's non-standard evaluation (NSE), in a more measured manner (similar to DF meta).
>
> However, I think we should be greedy and ask: how can we do better? How can we shorten the overhead and feedback loop in exploring and experimenting with ideas, data, and models? I don't have many concrete suggestions here, but I suspect the solution would involve something dplyr-like with conservative, targeted use of interactive JavaScript and WebGL. Can we do transforms on the data with the mouse? Fly through it with 3D glasses? I think we should think somewhat wild here. Hadley has discussed his dreams regarding a "grammar of modeling". Is this probabilistic programming or something else?
>
> What about plotting specifically? I think Topological Data Analysis makes an excellent general exploration framework.
>
> Finally, I would look at maximizing diffuse innovation while maintaining uniformity, the respective strengths of R's and Python's ecosystems.
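To make the NSE point concrete, here is a minimal sketch of what a dplyr-style macro could look like in Julia. Everything here is illustrative: `@where` and `subcols` are hypothetical names, and the "dataframe" is just a `Dict` of column vectors, not any existing package's type.

```julia
# Sketch of dplyr-style NSE via a macro: bare column names in the
# condition are rewritten into column lookups at macro-expansion time.

function subcols(ex, d)
    ex isa Symbol && return :($d[$(QuoteNode(ex))])
    ex isa Expr || return ex
    if ex.head == :call
        # keep the function being called (args[1]), rewrite its arguments
        return Expr(:call, ex.args[1], map(a -> subcols(a, d), ex.args[2:end])...)
    end
    return Expr(ex.head, map(a -> subcols(a, d), ex.args)...)
end

macro where(df, cond)
    d = esc(df)
    mask_ex = subcols(cond, d)
    quote
        local mask = $mask_ex
        Dict(k => v[mask] for (k, v) in $d)
    end
end

df = Dict(:x => [1, 2, 3, 4], :y => [10, 20, 30, 40])
sub = @where(df, x .> 2)    # like dplyr's filter(df, x > 2)
```

The point is that, as in R, the user writes `x .> 2` with bare column names, but unlike R the rewriting happens explicitly at macro-expansion time rather than through lazy evaluation.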
> My amateur read of the complex-systems research is that a system's ability to produce new ideas and to process information robustly and quickly is correlated with a balance between looseness and diversity on one hand and strength of connection between nodes plus some hierarchy on the other. How can we design the Julia ecosystem to leverage this insight while keeping uniformity of interface? I'm thinking of an abstract interface with generic functions and types (similar to Distributions.jl) that researchers can easily compose to create new models, but that can be plugged back into an API and tooling so end users can easily leverage it. Further, making experimentation easy and fun (a trait that has already received much acclaim from researchers) will encourage grad students to pick up Julia, and the abstract interface will encourage use of these packages, further increasing the incentives to produce.
>
> I know this is all very vague, but I just wanted to get my general vision out there. Things like passing in types instead of symbols for choosing methods, using multiple-inheritance-style traits to tag new models and solvers, and using functions defined on abstract types to get tests and optimizers for free are some potential specifics.
>
> Specifically regarding a PPL, I would say that with the recent Lora.jl progress and Distributions.jl, and Julia's much more concise and expressive nature vs. C++, I don't think it would take anywhere near the work of Stan to get something decent. PyMC3 is pretty darn close, and exceeds Stan in some areas, with much less labor and code volume... and this is just in Python (though it also leverages Theano).
>
> What does everyone think?
>
> On Sunday, December 27, 2015 at 12:41:18 PM UTC-5, Simon Byrne wrote:
>>
>> Thanks for the suggestions; these are certainly the main areas we're looking to address as part of this work.
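The "functions defined on abstract types to get things for free" idea can be sketched in a few lines. This is a toy, assumed design, not any existing JuliaStats API: `AbstractModel`, `holdout_score`, and `OLS` are all hypothetical names; only the general pattern (generic fallbacks on an abstract type, concrete types supplying two methods) is the point.

```julia
# Sketch: an abstract model interface that new packages can plug into.
# A researcher defines a concrete type plus fit/predict; generic tooling
# written against the abstract type then works for free.

abstract type AbstractModel end

# Generic fallback: any model gets a naive train/test evaluator as long
# as it implements fit(model, X, y) and predict(model, X).
function holdout_score(m::AbstractModel, X, y; frac = 0.8)
    n = length(y)
    k = floor(Int, frac * n)
    fitted = fit(m, X[1:k, :], y[1:k])
    yhat = predict(fitted, X[k+1:end, :])
    return sum(abs2, yhat .- y[k+1:end]) / (n - k)   # mean squared error
end

# One concrete model: ordinary least squares.
struct OLS <: AbstractModel
    beta::Vector{Float64}
end
OLS() = OLS(Float64[])

fit(m::OLS, X, y) = OLS(X \ y)          # least-squares solve
predict(m::OLS, X) = X * m.beta

# Usage: noiseless linear data, so the held-out error is ~0.
X = hcat(ones(10), collect(1.0:10.0))
y = 2 .+ 3 .* collect(1.0:10.0)
mse = holdout_score(OLS(), X, y)
```

Any new model type that implements the two methods inherits `holdout_score` (and anything else written against `AbstractModel`) without further work, which is exactly the composability-with-uniform-interface trade described above.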
>>
>> I'd be interested to hear if you have more thoughts about the model specification/probabilistic programming language. A few other people have requested things like this, and it would certainly play to Julia's strengths (as shown by JuMP.jl). That said, a full-scale probabilistic programming language might be a bit too much to ask as part of this work (keep in mind that Stan has been a 3+ year project with 2-3 full-time devs plus volunteers), but there might be some low-hanging fruit here we can pick.
>>
>> -simon
>>
>> On Sunday, 27 December 2015 02:32:43 UTC, Lampkld wrote:
>>>
>>> Thanks for the response.
>>>
>>> Since you kindly asked, the following are the two main areas in our assessment of the general arc of the Julia ecosystem:
>>>
>>> 1. Will the roadmap obviate some of the bottlenecks in the day-to-day exploratory workflow? These are minimal things that R and Python have and whose lack hampers any use of Julia for regular analysis: things like a robust dataframe with data I/O to different formats, web scraping, worked-out nullable semantics and their integration with the ecosystem, robust data cleaning and tidy data, modeling with basic diagnostic tests, etc.
>>>
>>> 2. Will the roadmap leapfrog into areas and capabilities that are currently not covered by other stats and data-science ecosystems?
>>>
>>> There are many here, but we are specifically looking at the ability to do modeling on medium-sized out-of-core databases. This would include an abstract, dataframe-like interface to such databases (MySQL and SQLite), and some sort of modeling capability on top of it. My dream would be the separation of model specification, as a DAG/probabilistic programming framework, from fitting the model. Thus the same model could be fit with different sorts of data and optimizers. Streaming black-box variational inference could be a means to extend this to out-of-core work.
>>>
>>> I realize Julia won't for a while have all the statistical tests and assorted models of Python, much less R.
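The "specification separate from fitting" dream can be illustrated in miniature: a model spec is just data (a log-density plus a parameter dimension), and any fitter that consumes that data can be swapped in. `ModelSpec` and `fit_map` are invented names for this sketch, and the random-search fitter is deliberately crude; nothing here is an existing PPL.

```julia
# Sketch: a model specification is just data -- a log-density function
# and a parameter dimension -- and "fitting" is any procedure that
# consumes it. The same spec could be handed to MCMC, VI, or an optimizer.

struct ModelSpec
    logdensity::Function   # theta -> log p(data, theta), up to a constant
    dim::Int
end

# One possible fitter: crude random-search MAP estimation.
function fit_map(spec::ModelSpec; iters = 10_000)
    best = zeros(spec.dim)
    bestlp = spec.logdensity(best)
    for _ in 1:iters
        cand = best .+ 0.1 .* randn(spec.dim)   # propose a nearby point
        lp = spec.logdensity(cand)
        if lp > bestlp                          # keep it if it improves
            best, bestlp = cand, lp
        end
    end
    return best
end

# A Gaussian-mean model: y_i ~ Normal(mu, 1), flat prior on mu.
y = [1.8, 2.2, 2.0, 1.9, 2.1]
spec = ModelSpec(theta -> -sum(abs2, y .- theta[1]) / 2, 1)
muhat = fit_map(spec)    # should land near mean(y) = 2.0
```

Because the spec carries no fitting logic, replacing `fit_map` with a sampler or a streaming variational-inference routine would require no change to the model itself.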
>>> However, a general yet powerful and scalable data-querying and probabilistic programming framework could arguably suffice for most Python and R use cases in data science while providing a comparative advantage over other frameworks where it counts. To my knowledge, SAS and Stata are right now the only packages that offer general modeling on on-disk data sets, but the sort of capability I outlined would seem to be in excess of what they offer.
>>>
>>> A bonus would be filling out Gadfly toward ggplot2 and ggvis capability.
>>>
>>> On Thursday, December 24, 2015 at 11:50:42 AM UTC-5, Viral Shah wrote:
>>>>
>>>> What would be helpful is to know what kind of decisions you are thinking of and what the factors are.
>>>>
>>>> I suspect within 2 weeks for sure - but it's really for the Julia stats folks to say. The idea is to get feedback and chart a course.
>>>>
>>>> -viral
>>>>
>>>> On 24 Dec 2015 10:07 p.m., "Lampkld" <lamp...@gmail.com> wrote:
>>>>>
>>>>> Sorry to bug you, but can we expect something this week or next? It would be helpful to know how long to push some things off.
>>>>>
>>>>> On Thursday, December 17, 2015 at 6:20:45 PM UTC-5, Viral Shah wrote:
>>>>>>
>>>>>> The JuliaStats team will be publishing a general plan on stats + dataframes in a few days. I doubt we will have settled all the dataframe issues by then, but at least there will be something to start with.
>>>>>>
>>>>>> -viral
>>>>>>
>>>>>> > On 17-Dec-2015, at 10:15 PM, Lampkld <lamp...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi Viral,
>>>>>> >
>>>>>> > Any update on this (stats + df) by chance, or an idea of when we can get one? Even a roadmap or some sort of vision or other details would help with decision making regarding infrastructure.
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> > On Wednesday, November 11, 2015 at 3:00:50 AM UTC-5, Viral Shah wrote:
>>>>>> > Yes, we are really excited. This grant is to focus on core Julia compiler infrastructure and key math libraries.
>>>>>> > Much of the library focus will be on statistical computing.
>>>>>> >
>>>>>> > -viral
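The abstract, dataframe-like interface to on-disk databases raised earlier in the thread can also be sketched: the same query description is interpreted eagerly by an in-memory backend or compiled to SQL by an on-disk one. All names here (`AbstractTable`, `Query`, `runquery`, and friends) are invented for illustration and do not correspond to any existing Julia package.

```julia
# Sketch: one query description, two backends. A query is plain data
# (column, operator, value); each backend interprets it its own way.

abstract type AbstractTable end

struct Query
    col::Symbol
    op::Symbol       # e.g. :(>)
    val::Float64
end

# In-memory backend: columns stored as a Dict of vectors.
struct MemTable <: AbstractTable
    cols::Dict{Symbol, Vector{Float64}}
end

function runquery(t::MemTable, q::Query)
    # look up the comparison function named by the symbol and broadcast it
    mask = broadcast(getfield(Base, q.op), t.cols[q.col], q.val)
    return MemTable(Dict(k => v[mask] for (k, v) in t.cols))
end

# On-disk backend: the same query compiles to SQL instead of running
# eagerly, so the filtering happens inside the database.
struct SQLTable <: AbstractTable
    name::String
end

runquery(t::SQLTable, q::Query) =
    "SELECT * FROM $(t.name) WHERE $(q.col) $(q.op) $(q.val)"

t = MemTable(Dict(:x => [1.0, 2.0, 3.0]))
filtered = runquery(t, Query(:x, :(>), 1.5))
sql = runquery(SQLTable("trades"), Query(:price, :(>), 1.5))
```

Because the query is data rather than executed code, user-facing tooling (or an NSE macro layer on top) need not know whether the rows live in memory or in MySQL/SQLite, which is the property the out-of-core modeling idea depends on.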