Does the reshape/transpose really take any appreciable time (compared to the 
I/O)?

--Tim
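
For reference, the approach discussed in the quoted messages below (grow a 1-d vector with `append!`, then reshape and transpose once at the end) can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `read_rows` is hypothetical, the input is assumed to be purely numeric with comma separators and no quoting, and none of this is taken from CSVReaders.jl. It is written for current Julia, where `transpose` returns a lazy wrapper, so `permutedims` is used to materialize the row-major-to-column-major copy (in the 0.4-era Julia of this thread, `transpose` itself made that copy).

```julia
# Hypothetical helper (not part of CSVReaders.jl): read a purely numeric,
# comma-separated table from an IO stream into a Matrix{Float64}.
function read_rows(io::IO)
    buf = Float64[]   # flat 1-d buffer; cheap to grow with append!
    ncols = 0
    nrows = 0
    for line in eachline(io)
        isempty(strip(line)) && continue      # skip blank lines
        fields = split(line, ',')
        if ncols == 0
            ncols = length(fields)            # first row fixes the width
        elseif length(fields) != ncols
            error("row $(nrows + 1) has $(length(fields)) fields, expected $ncols")
        end
        append!(buf, parse.(Float64, fields)) # one CSV row appended at a time
        nrows += 1
    end
    # buf is laid out row by row, so in the reshaped ncols-by-nrows matrix
    # each *column* holds one CSV row. reshape shares memory with buf.
    colmajor = reshape(buf, ncols, nrows)
    # permutedims allocates the final nrows-by-ncols matrix; this copy is the
    # extra memory cost mentioned in the thread.
    return permutedims(colmajor)
end

A = read_rows(IOBuffer("1,2,3\n4,5,6\n"))
```

Timing the final `permutedims` call separately from the parsing loop (for example with `@time`) is one way to check how the reshape/transpose step compares to the I/O.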

On Monday, December 08, 2014 09:14:35 AM John Myles White wrote:
> Yes, this is how I've been doing things so far.
> 
>  -- John
> 
> On Dec 8, 2014, at 9:12 AM, Tim Holy <tim.h...@gmail.com> wrote:
> > My suspicion is you should read into a 1d vector (and use `append!`), then
> > at the end do a reshape and finally a transpose. I bet that will be many
> > times faster than any other alternative, because we have a really fast
> > transpose now.
> > 
> > The only disadvantage I see is taking twice as much memory as would be
> > minimally needed. (This can be fixed once we have row-major arrays.)
> > 
> > --Tim
> > 
> > On Monday, December 08, 2014 08:38:06 AM John Myles White wrote:
> >> I believe/hope the proposed solution will work for most cases, although
> >> there's still a bunch of performance work left to be done. I think the
> >> decoupling problem isn't as hard as it might seem since there are very
> >> clearly distinct stages in parsing a CSV file. But we'll find out if the
> >> indirection I've introduced causes performance problems when things can't
> >> be inlined.
> >> 
> >> While writing this package, I found the two most challenging problems to
> >> be:
> >> 
> >> (A) The disconnect between CSV files providing one row at a time and
> >> Julia's usage of column-major arrays, which encourages reading one column
> >> at a time.
> >> 
> >> (B) The inability to easily resize! a matrix.
> >> 
> >> -- John
> >> 
> >> On Dec 8, 2014, at 5:16 AM, Stefan Karpinski <ste...@karpinski.org> wrote:
> >>> Doh. Obfuscate the code quick, before anyone uses it! This is very nice
> >>> and something I've always felt like we need for data formats like CSV –
> >>> a way of decoupling the parsing of the format from the populating of a
> >>> data structure with that data. It's a tough problem.
> >>> 
> >>> On Mon, Dec 8, 2014 at 8:08 AM, Tom Short <tshort.rli...@gmail.com> wrote:
> >>> Exciting, John! Although your documentation may be "very sparse", the
> >>> code is nicely documented.
> >>> 
> >>> On Mon, Dec 8, 2014 at 12:35 AM, John Myles White
> >>> <johnmyleswh...@gmail.com> wrote:
> >>> Over the last month or so, I've been slowly working on a new library
> >>> that defines an abstract toolkit for writing CSV parsers. The goal is to
> >>> provide an abstract interface that users can implement in order to
> >>> provide functions for reading data into their preferred data structures
> >>> from CSV files. In principle, this approach should allow us to unify the
> >>> code behind Base's readcsv and DataFrames's readtable functions.
> >>> 
> >>> The library is still very much a work-in-progress, but I wanted to let
> >>> others see what I've done so that I can start getting feedback on the
> >>> design.
> >>> 
> >>> Because the library makes heavy use of Nullables, you can only try out
> >>> the library on Julia 0.4. If you're interested, it's available at
> >>> https://github.com/johnmyleswhite/CSVReaders.jl
> >>> 
> >>> For now, I've intentionally given very sparse documentation to
> >>> discourage people from seriously using the library before it's
> >>> officially released. But there are some examples in the README that
> >>> should make clear how the library is intended to be used.
> >>> 
> >>> -- John
