I agree with Pete. And while Python doesn't have built-in statistics
functions, adding packages (numpy and scipy in this case) is very simple.
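For example, a minimal sketch with numpy (the filename and column index
here are just placeholders):

    import numpy as np

    # Load a tab-delimited table of numbers (hypothetical filename).
    data = np.loadtxt("table.tsv", delimiter="\t")

    col = data[:, 2]                # pick one column (index is arbitrary)
    print(col.mean(), col.std())    # average and sigma, one call each

    # Sort rows by that column, then reject rows more than 3 sigma out.
    data = data[data[:, 2].argsort()]
    kept = data[np.abs(data[:, 2] - col.mean()) < 3.0 * col.std()]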
Quentin
On 12/09/2012 17:11, Pete Meyer wrote:
One thing to keep in mind is that there's usually a trade-off between
setup (writing and testing) and execution time. For one-off data
processing, I'd focus on implementation speed rather than execution
speed (in other words, FORTRAN might not be ideal unless you're
already fluent in it).
That said, I'd take a look at Python, Octave or R. Python's
relatively easy to learn, and more flexible than Octave/R; but it
doesn't have the built-in statistics functions that Octave and R do.
One other tip which you've probably already thought of - depending on
your runtimes (I don't think 100s of MB of data is usually considered
an enormous amount, but it'll depend on what you're doing), it may be
worth getting things working on a small subset of the data first.
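Something like this, say (guessing at your format; itertools.islice
pulls the first N lines without reading the whole file):

    import itertools

    # Prototype on the first 10,000 rows before running on the full file.
    with open("table.tsv") as f:            # hypothetical filename
        subset = [line.rstrip("\n").split("\t")
                  for line in itertools.islice(f, 10000)]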
Pete
Jacob Keller wrote:
Dear List,
Since this probably comes up a lot in manipulation of PDB/reflection
files and so on, I was curious what people thought would be the best
language for the following: I have some huge (100s of MB) tables of
tab-delimited data on which I would like to do some math (averaging,
sigmas, simple arithmetic, etc.) as well as some sorting and rejecting.
It can be done in Excel, but this is exceedingly slow even in 64-bit,
so I am looking to do it through some scripting. Just as an example, a
"sort" which takes >10 min in Excel takes ~10 sec max with the unix
sort command (seems crazy, no?). Any suggestions?
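For concreteness, I imagine even a naive script shaped like this
(Python here purely as a strawman; the column number is invented)
would beat Excel:

    import csv

    # Sort the rows of a tab-delimited file by one numeric column.
    with open("table.tsv") as f:            # hypothetical filename
        rows = list(csv.reader(f, delimiter="\t"))
    rows.sort(key=lambda r: float(r[3]))    # column index made up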
Thanks, and sorry for being off-topic,
Jacob