I've been using (and recommend) Pandas (http://pandas.pydata.org/) along with
this book:
http://shop.oreilly.com/product/0636920023784.do
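
Independent of which library you use, one common way to keep memory bounded
is to fix the bin edges up front and accumulate counts one chunk at a time.
Below is a minimal sketch of that idea using only numpy. The function name
chunked_histogram and its parameters are made up for illustration, and it
assumes the data source supports len() and slicing that returns array-like
data (a PyTables Column should, but I have not timed this against your
tables, so treat it as a starting point rather than a benchmark).

```python
import numpy as np


def chunked_histogram(column, bins, value_range, chunk_size=1_000_000):
    """Histogram a sliceable 1-D data source one chunk at a time.

    Hypothetical helper: `column` is anything that supports len() and
    slicing (a numpy array here; assumed to work for an on-disk column).
    """
    # Fix the bin edges first so every chunk is counted into the same bins.
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    counts = np.zeros(bins, dtype=np.int64)

    for start in range(0, len(column), chunk_size):
        # Only this slice is held in memory at any one time.
        chunk = np.asarray(column[start:start + chunk_size])
        chunk_counts, _ = np.histogram(chunk, bins=edges)
        counts += chunk_counts

    return counts, edges


if __name__ == "__main__":
    # Stand-in data; in practice `column` would be the on-disk column.
    data = np.random.default_rng(0).normal(size=100_000)
    counts, edges = chunked_histogram(data, bins=50, value_range=(-5.0, 5.0),
                                      chunk_size=10_000)
    print(counts.sum(), len(edges))
```

The chunk size trades memory for the number of reads, so it is worth tuning
for your data. Values outside value_range are dropped, the same as
np.histogram does with explicit bin edges.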


Good luck,
Dave

On Fri, Nov 16, 2012 at 11:02 AM, Jon Wilson <j...@fnal.gov> wrote:

> Hi all,
> I am trying to find the best way to make histograms from large data
> sets.  Up to now, I've been just loading entire columns into in-memory
> numpy arrays and making histograms from those.  However, I'm currently
> working on a handful of datasets where this is prohibitively memory
> intensive (causing an out-of-memory kernel panic on a shared machine
> that you have to open a ticket to have rebooted makes you a little
> gun-shy), so I am now exploring other options.
>
> I know that the Column object is rather nicely set up to act, in some
> circumstances, like a numpy ndarray.  So my first thought is to try just
> creating the histogram out of the Column object directly. This is,
> however, 1000x slower than loading it into memory and creating the
> histogram from the in-memory array.  Please see my test notebook at:
> http://www-cdf.fnal.gov/~jsw/pytables%20test%20stuff.html
>
> For such a small table, loading into memory is not an issue.  For larger
> tables, though, it is a problem, and I had hoped that pytables was
> optimized so that histogramming directly from disk would proceed no
> slower than loading into memory and histogramming. Is there some other
> way of accessing the column (or Array or CArray) data that will make
> faster histograms?
> Regards,
> Jon
>
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>



-- 
David C. Wilson
(612) 460-1329
david.craig.wil...@gmail.com
http://www.linkedin.com/in/davidcwilson