If the goal is to help Haskell become a more acceptable choice for general statistical analysis tasks, then hmatrix, statistics, and the various GSL wrappers already provide the majority of the functionality needed. I think the bigger problems are that there is no guidance on which libraries are industrial-strength, there's no glue layer making the APIs you'd want easier to use, and GHCi isn't always ideal as a REPL for this workflow.
If you're interested in UI work, ideally we'd have something similar to RStudio as an environment: a simple set of windows encapsulating an editor, a REPL, a plotting panel, and help/history. This sounds superficial, but it really has an impact when you're exploring a data set and trying things out.

However, it would be a bigger contribution to get us to the point where we can just "import Quant.Prelude" to bring into scope all the standard functionality assumed in an environment like R or Matlab. In my experience, most of this can come from re-exporting existing libraries, occasionally wrapping functions to simplify the interfaces and make them more consistent. For example, a quant doesn't particularly need to know why Statistics.Sample.KernelDensity.kde uses unboxed vectors when the rest of that library uses Generic, and they certainly won't want to spend their time remembering that they need to convert before calling that function.

As an exercise in GHCi, try loading a few arbitrary CSV files of tables including floating-point columns, do a linear regression of one such column on another, and then display a scatterplot with the regression line; maybe throw in a check for the normality of the residuals. Assume you'll need to handle large data sets, so you'll want bytestring, attoparsec, etc. Beware that there's a known bug that will cause a segfault/bus error if you use some hmatrix/GSL functions from GHCi on x86_64, which is kind of a blocker in itself. Maybe I missed something obvious, but it took me a long time to figure out which containers, persistence and parsing, stats, and plotting packages I should choose.

I really disagree that we need a data-frame-type structure: they're an abomination in R, trying to accommodate both event records and time series, and doing neither well.
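To make the kde point concrete, here is a minimal sketch of the kind of wrapper a hypothetical Quant.Prelude could re-export, assuming the statistics package's kde :: Int -> U.Vector Double -> (U.Vector Double, U.Vector Double) signature (the name kdeG is mine, not from any existing library):

```haskell
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U
import Statistics.Sample.KernelDensity (kde)

-- Accept any Generic vector of samples and convert to the unboxed
-- vector that kde requires, hiding the representation detail from
-- the caller. Returns (mesh points, density estimates).
kdeG :: G.Vector v Double => Int -> v Double -> (U.Vector Double, U.Vector Double)
kdeG nPoints = kde nPoints . G.convert
```

A few dozen small shims like this, plus consistent re-exports, would go a long way toward an R-like out-of-the-box experience.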
Haskell records are fine for inhomogeneous event series, and for homogeneous time series parallel Vectors or Matrices are better, as they can be passed to BLAS and LAPACK with consequent performance and clarity advantages. Column-oriented storage rocks, and Haskell is already a good fit.

Having used C++, Matlab, and R (the latter for quite a while), I now use Haskell for all of my statistical analysis work. Despite the many shortcomings, it's definitely worth it for the code clarity and type checking, to say nothing of the pre-optimization performance and robustness.

Best of luck; happy to share some preliminary code with you directly if you're interested!

Tom

On 21 March 2012 17:24, Ben Jones <ben.jamin.pw...@gmail.com> wrote:
> I am a student currently interested in participating in Google Summer of
> Code. I have a strong interest in Haskell, and a semester's worth of coding
> experience in the language. I am a mathematics and CS double major with only
> a semester left, and I am looking for information regarding what the
> community is lacking as far as mathematics and statistics libraries are
> concerned. If there is enough interest I would like to put together a
> project with this. I understand that such libraries are probably low
> priority, but if anyone has anything I would love to hear it.
>
> Thanks for reading,
> -Benjamin
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe