Have you seen Simon D's visualization work at JuliaCon? I believe a lot of
it will soon be ready for wider use. I am sure he will chime in further on
this thread.

-viral

On 27 Dec 2015 11:59 pm, "Lampkld" <lampk...@gmail.com> wrote:
>
> Viral and Symon,
>
> Since you asked, I will write out some rough and probably excessively
abstract ideas that have been floating around in my head. I don't have time
to polish them formally, so please forgive the inchoate nature of these
thoughts:
>
> Yes, composability and generality are the names of the game! I would also
add expressiveness, scalability and fostering innovation.
>
> Part of #1 is at least reaching parity with R in terms of data cleaning
and manipulation syntax. Part of R's popularity, and its stubborn growth in
the face of Python's recent maturation, is its advantage in the ease of
expressing data manipulation. If Julia is to compete, at the very least the
ecosystem should leverage macros to emulate R's non-standard evaluation
(NSE) in a more measured manner (similar to DataFramesMeta); a rough sketch
of the macro trick is below.
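>
> To make the NSE point concrete, here is a minimal, self-contained toy of
the macro trick (invented names only, not DataFramesMeta's actual API): the
macro receives the filter expression unevaluated and rewrites each quoted
column symbol into a table lookup before anything is run.
>
    # toy "table": a Dict of column vectors, so the sketch has no dependencies
    table_lookup(t, s::Symbol) = t[s]

    # walk the AST and replace each quoted symbol (e.g. :age) with a column lookup
    function rewrite_cols(ex, t)
        if ex isa QuoteNode
            return :(table_lookup($t, $ex))
        elseif ex isa Expr
            return Expr(ex.head, map(a -> rewrite_cols(a, t), ex.args)...)
        else
            return ex
        end
    end

    macro with(t, ex)
        esc(rewrite_cols(ex, t))
    end

    df = Dict(:age => [25, 40, 31], :name => ["ann", "bob", "cho"])
    mask = @with(df, :age .> 30)      # expands to table_lookup(df, :age) .> 30
    df[:name][mask]                   # -> ["bob", "cho"]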
>
> However, I think we should be greedy and ask: how can we do better? How
can we shorten the overhead and feedback loop in exploring and
experimenting with ideas, data, and models? I don't have many concrete
suggestions here, but I suspect the solution would involve something
dplyr-like with conservative and targeted use of interactive JavaScript and
WebGL. Can we do transforms on the data with the mouse? Fly through it with
3D glasses? I think we should think kinda wild here. Hadley has discussed
his dreams regarding a "grammar of modeling". Is this probabilistic
programming or something else?
>
> What about plotting specifically? I think Topological Data Analysis would
make an excellent general exploration framework.
>
> Finally, I would look at maximizing diffuse innovation while maintaining
uniformity, the respective strengths of R's and Python's ecosystems. My
amateur read of the complex-systems research is that the ability of a
system to produce new ideas and process information robustly and quickly is
correlated with balancing looseness and diversity on one hand against
strong connections between nodes and some hierarchy on the other. How can
we design the Julia ecosystem to leverage this insight while keeping
uniformity of interface? I'm thinking of an abstract interface with generic
functions and types (similar to Distributions.jl) that researchers can
easily compose to create new models, but that can be plugged back into an
API and tooling to be easily leveraged by end users (a rough sketch is
below). Making experimentation easy and fun (a trait that has already
received much acclaim from researchers) will encourage grad students to
pick up Julia, and the abstract interface will encourage use of these
packages, further increasing incentives to produce.
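>
> As a hypothetical illustration of that abstract-interface idea (all names
below are invented, not an existing package's API): a researcher implements
only a couple of generic functions for a new model type, and shared tooling
written against the abstract type works with it unchanged.
>
    # the abstract type plus the minimal generic-function surface a model must provide
    abstract type StatModel end
    nparams(m::StatModel) = error("nparams not implemented for $(typeof(m))")
    loglik(m::StatModel, θ, data) = error("loglik not implemented for $(typeof(m))")

    # shared tooling, written once against the abstract type: a crude grid-search
    # "fitter" that any one-parameter StatModel gets for free
    function crude_fit(m::StatModel, data; grid = -10.0:0.01:10.0)
        @assert nparams(m) == 1 "toy fitter handles a single parameter only"
        scores = [loglik(m, θ, data) for θ in grid]
        return grid[argmax(scores)]
    end

    # a researcher's new model: normal location with known unit variance
    struct NormalLocation <: StatModel end
    nparams(::NormalLocation) = 1
    loglik(::NormalLocation, θ, data) = -0.5 * sum((x - θ)^2 for x in data)

    crude_fit(NormalLocation(), [1.9, 2.1, 2.4])   # ≈ 2.13, the sample mean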
>
>  I know this is all very vague, but I just wanted to get my general
vision out there. Some potential specifics: passing in types instead of
symbols for choosing methods, using multiple-inheritance-style traits to
tag new models and solvers, and using functions defined on abstract types
to get tests and optimizers for free (illustrated briefly below).
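>
> Here is a hedged before/after of the "types instead of symbols" point,
with a trait-style tag thrown in (all names hypothetical): dispatching on a
solver type lets third-party packages add methods without touching existing
code, which a hard-coded symbol check cannot do.
>
    # symbol version: every new solver means editing this chain of checks
    fit_sym(model, data; solver = :newton) =
        solver == :newton   ? "newton fit"   :
        solver == :gradient ? "gradient fit" :
        error("unknown solver $solver")

    # type version: new packages just subtype Solver and add a method
    abstract type Solver end
    struct Newton <: Solver end
    struct Gradient <: Solver end

    fit(model, data, ::Newton)   = "newton fit"
    fit(model, data, ::Gradient) = "gradient fit"

    # a trait-style tag, so generic tooling can query capabilities
    # without forcing everything into one type hierarchy
    requires_hessian(::Solver) = false
    requires_hessian(::Newton) = true

    fit("mymodel", [1, 2, 3], Newton())   # -> "newton fit", chosen by dispatch
    requires_hessian(Gradient())          # -> false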
>
> Specifically regarding a PPL, I would say that with recent Lora.jl
progress and Distributions.jl, and Julia's much more concise and expressive
nature vs. C++, I don't think it would take anywhere near the work of Stan
to get something decent. PyMC3 is pretty darn close and exceeds Stan in
some areas with much less labor and code volume... and this is just in
Python (though it also leverages Theano).
>
> What does everyone think?
>
> On Sunday, December 27, 2015 at 12:41:18 PM UTC-5, Simon Byrne wrote:
>>
>> Thanks for the suggestions; these are certainly the main areas we're
looking to address as part of this work.
>>
>> I'd be interested to hear if you have more thoughts about the model
specification / probabilistic programming language. A few other people have
requested things like this, and it would certainly play to Julia's
strengths (as shown by JuMP.jl). That said, a full-scale probabilistic
programming language might be a bit too much to ask as part of this work
(keep in mind that Stan has been a 3+ year project with 2-3 full-time devs
plus volunteers), but there might be some low-hanging fruit here we can
pick.
>>
>> -simon
>>
>> On Sunday, 27 December 2015 02:32:43 UTC, Lampkld wrote:
>>>
>>> Thanks for the response.
>>>
>>> Since you kindly asked, the following are the two main areas in our
assessment of the general arc of the Julia ecosystem:
>>>
>>> 1. Will the roadmap obviate some of the bottlenecks in the day-to-day
exploratory workflow? These are minimal things that R and Python have and
whose absence hampers any use of Julia for regular analysis: a robust
dataframe with data I/O to and from different formats, web scraping,
worked-out nullable semantics and their integration with the ecosystem,
robust data cleaning and tidy data, modeling with basic diagnostic tests,
etc.
>>>
>>> 2. Will the roadmap leapfrog into areas and capabilities that are
currently not covered by other stats and data science ecosystems?
>>>
>>>  There are many here, but we are specifically looking at the ability to
do modeling on medium-sized, out-of-core databases. This would include an
abstract dataframe-like interface to databases such as MySQL and SQLite,
and some sort of modeling capability on top of it. My dream would be the
separation of model specification, as a DAG / probabilistic-programming
framework, from fitting the model, so that the same model can be fit with
different sorts of data and optimizers (a rough sketch of what that could
look like is below). Streaming black-box variational inference could be a
means of extending this to out-of-core work.
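>>>
>>> As a purely hypothetical sketch of that separation (every name below is
invented): the model is just a data structure describing a DAG of nodes,
and different backends (MCMC, variational, streaming/out-of-core) would
walk the same structure later.
>>>
    # a model node: a name, a symbolic distribution, and its parents in the DAG
    struct Node
        name::Symbol
        dist::Symbol              # e.g. :Beta, kept symbolic rather than evaluated here
        parents::Vector{Symbol}
    end

    # the specification is pure data; fitting backends are a separate concern
    struct ModelSpec
        nodes::Vector{Node}
    end

    describe(spec::ModelSpec) =
        join(["$(n.name) ~ $(n.dist)($(join(n.parents, ", ")))" for n in spec.nodes], "\n")

    coin = ModelSpec([
        Node(:p, :Beta, Symbol[]),
        Node(:y, :Bernoulli, [:p]),
    ])

    println(describe(coin))
    # p ~ Beta()
    # y ~ Bernoulli(p)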
>>>
>>> I realize Julia won't for a while have all the statistical tests and
assorted models of Python, much less R. However, a general yet powerful and
scalable data-querying and probabilistic-programming framework could
arguably suffice for most Python and R use cases in data science while
providing a comparative advantage over other frameworks where it counts. To
my knowledge, SAS and Stata are currently the only packages that offer
general modeling on on-disk data sets, but the sort of capability I
outlined would go beyond what they offer.
>>>
>>> A bonus would be filling out Gadfly towards ggplot2- and ggvis-level
capability.
>>>
>>>
>>>
>>> On Thursday, December 24, 2015 at 11:50:42 AM UTC-5, Viral Shah wrote:
>>>>
>>>> What would be helpful is to know what kinds of decisions you are
thinking of and what the factors are.
>>>>
>>>> I suspect within two weeks for sure, but it's really for the JuliaStats
folks to say. The idea is to get feedback and chart a course.
>>>>
>>>> -viral
>>>>
>>>> On 24 Dec 2015 10:07 p.m., "Lampkld" <lamp...@gmail.com> wrote:
>>>>>
>>>>> Sorry to bug you, but can we expect something this week or next? It
would be helpful to know how long to push some things off.
>>>>>
>>>>> On Thursday, December 17, 2015 at 6:20:45 PM UTC-5, Viral Shah wrote:
>>>>>>
>>>>>>
>>>>>> The JuliaStats team will be publishing a general plan on stats+df in
a few days. I doubt we will have settled on all the df issues by then, but
at least there will be something to start with.
>>>>>>
>>>>>>
>>>>>> -viral
>>>>>>
>>>>>>
>>>>>>
>>>>>> > On 17-Dec-2015, at 10:15 PM, Lampkld <lamp...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi Viral,
>>>>>> >
>>>>>> > Any update on this (stats + df) by chance, or an idea of when we
can get one? Even a roadmap, some sort of vision, or other details would
help with decision making regarding infrastructure.
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> > On Wednesday, November 11, 2015 at 3:00:50 AM UTC-5, Viral Shah
wrote:
>>>>>> > Yes, we are really excited. This grant is to focus on core Julia
compiler infrastructure and key math libraries. Much of the library focus
will be on statistical computing.
>>>>>> > -viral
>>>>>> >
>>>>>>
