On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque <jayv...@gmail.com> wrote:
> 1. Loading text files using loadtxt/genfromtxt need a significant
> performance boost (I think at least an order of magnitude increase in
> performance is very doable based on what I've seen with Erin's recfile code)
> 2. Improved memory usage. Memory used for reading in a text file shouldn’t
> be more than the file itself, and less if only reading a subset of file.
> 3. Keep existing interfaces for reading text files (loadtxt, genfromtxt,
> etc). No new ones.
> 4. Underlying code should keep IO iteration and transformation of data
> separate (awaiting more thoughts from Travis on this).
> 5. Be able to plug in different transformations of data at low level (also
> awaiting more thoughts from Travis).
> 6. memory mapping of text files?
> 7. Eventually reduce memory usage even more by using same object for
> duplicate values in array (depends on implementing enum dtype?)
> Anything else?

Yes -- I'd like to see the solution be able to do high-performance
reads of a portion of a file -- not always the whole thing. I seem to
have a number of custom text files that I need to read that are laid
out in chunks: a bit of a header, then a block of numbers, another
header, another block. I'm happy to read and parse the header sections
with pure Python, but would love a way to read the blocks of numbers
into a numpy array fast. This will probably come out of the box with
any of the proposed solutions, as long as they start at the current
position of a passed-in file object, can be told how much to read,
and then leave the file pointer in the correct position.

Great to see this moving forward.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
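
For illustration, here is a minimal sketch of the chunked-read pattern described in the reply above: headers parsed in pure Python, numeric blocks read from the current position of a passed-in file object, and the file pointer left just past each block. The header layout ("block NAME rows=N") and the helper name read_block are hypothetical, made up for this example. It leans on np.loadtxt accepting a list of strings, so it shows the file-position contract only, not the fast path being asked for.

```python
import io

import numpy as np


def read_block(fobj, n_rows):
    """Read n_rows lines starting at fobj's current position.

    Hypothetical helper: consumes exactly n_rows lines, so the file
    pointer ends up just past the block, and returns a 2-D float array.
    """
    lines = [fobj.readline() for _ in range(n_rows)]
    return np.loadtxt(lines, ndmin=2)


# Stand-in for a chunked text file; the header format is an assumption.
sample = io.StringIO(
    "block A rows=2\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
    "block B rows=1\n"
    "7.0 8.0 9.0\n"
)

blocks = {}
while True:
    header = sample.readline()      # header parsed in pure Python
    if not header:                  # empty string means end of file
        break
    _, name, rows = header.split()
    n_rows = int(rows.split("=")[1])
    blocks[name] = read_block(sample, n_rows)
```

Because read_block consumes only the lines it is told to read, the next readline() in the loop lands on the following header, which is the behaviour the reply asks any faster reader to preserve.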