Briefly,

1. Robust dataframes are a key thrust area for this work. At this point the work 
is exploratory, but we all expect this to be one of the first areas to see 
rapid progress. Julia’s db support has improved a lot, independently, 
and will keep getting better. As soon as there is consensus on this direction, 
the rest of the stats work should accelerate greatly. Web scraping is unlikely 
to be part of this, but the rest of what you mention is fair 
game.

2. This is really the question, and it comes up repeatedly. As a goal, we certainly 
don’t want to just clone what other tools are doing, but to do better. It is hard 
to outline exactly how that will happen, but one key thing that I am personally 
focussed on is being able to work with larger volumes of data in a composable, 
general way. What are the other capabilities and areas where Julia could 
potentially leapfrog?

-viral

> On 27-Dec-2015, at 8:02 AM, Lampkld <lampk...@gmail.com> wrote:
> 
> Thanks for the response.
> 
> Since you kindly asked, the following are two main areas in our assessment of 
> the general arc of the Julia ecosystem:
> 
> 1. Will the roadmap obviate some of the bottlenecks for a normal day-to-day 
> exploratory workflow? These are minimal things that R and Python have and 
> whose lack hampers any use of Julia for regular analysis: things like a robust 
> dataframe with data I/O into different formats, web scraping, worked-out 
> nullable semantics and integration with the ecosystem, robust data cleaning and 
> tidy data, modeling with basic diagnostic tests, etc.
> 
> 2. Will the roadmap leapfrog into areas and capabilities that are 
> currently not covered by other stats and data science ecosystems?
> 
> There are many here, but we are specifically looking at the ability to do 
> modeling on medium-sized out-of-core databases. This would include an 
> abstract dataframe-like interface to said databases (MySQL and SQLite), and 
> some sort of modeling capability on the same. My dream would be separation of 
> model specification, as a DAG/probabilistic programming framework, from 
> fitting the model. Thus the same model can be fit with different sorts of data 
> and optimizers. Streaming black-box variational inference can be a means to 
> extend this to OOC work. 
> 
> I realize Julia won't for a while have all the statistical tests and random 
> models of Python, much less R. However, a general yet powerful and scalable 
> data querying and prob programming framework could arguably suffice for most 
> Python and R use cases in data science while providing a comparative advantage 
> over other frameworks where it counts. To my knowledge, right now SAS and 
> Stata are the only packages that offer general modeling with on-disk data 
> sets, but the sort of capability I outlined would seem to be in excess of 
> what they offer. 
> 
> A bonus would be filling out Gadfly towards ggplot and ggvis capability. 
>  
> 
> 
> On Thursday, December 24, 2015 at 11:50:42 AM UTC-5, Viral Shah wrote:
> What would be helpful is to know what kind of decisions you are thinking of 
> and what are the factors.
> 
> I suspect within 2 weeks for sure - but it's really for the Julia stats folks 
> to say. The idea is to get feedback and chart a course.
> 
> -viral
> 
> On 24 Dec 2015 10:07 p.m., "Lampkld" <lamp...@gmail.com> wrote:
> Sorry to bug you, but can we expect something this week or next? It would be 
> helpful in knowing until when to push some stuff off. 
> 
> On Thursday, December 17, 2015 at 6:20:45 PM UTC-5, Viral Shah wrote:
> 
> The JuliaStats team will be publishing a general plan on stats+df in a few 
> days. I doubt we will have settled on all the df issues by then, but at least 
> there will be something to start with. 
> 
> 
> -viral 
> 
> 
> 
> > On 17-Dec-2015, at 10:15 PM, Lampkld <lamp...@gmail.com> wrote: 
> > 
> > Hi Viral, 
> > 
> > Any update on this (stats + df) by chance, or any idea when we can get one? Even 
> > a roadmap or some sort of vision or other details would help with 
> > decision making regarding infrastructure. 
> > 
> > Thanks! 
> > 
> > On Wednesday, November 11, 2015 at 3:00:50 AM UTC-5, Viral Shah wrote: 
> > Yes, we are really excited. This grant is to focus on core Julia compiler 
> > infrastructure and key math libraries. Much of the library focus will be 
> > on statistical computing. 
> > -viral 
> > 
> 


-viral


