On Fri, Jan 22, 2016 at 10:04:58PM +0000, data pulverizer via Digitalmars-d-learn wrote: [...] > I guess the next step is allowing Tuple rows with mixed types.
Alright. I threw together a new CSV parsing function that loads CSV data into an array of structs. Currently, the implementation is not quite polished yet (it blindly assumes the first row is a header row, which it discards), but it does work, and outperforms std.csv by about an order of magnitude. The initial implementation was very slow (albeit still somewhat fast than std.csv by about 10% or so) when given a struct with string fields. However, structs with POD fields are lightning fast (not significantly different from before, in spite of all the calls to std.conv.to!). This suggested that the slowdown was caused by excessive allocations of small strings, causing a heavy GC load. This suspicion was confirmed when I ran the same input data with a struct where all string fields were replaced with const(char)[] (so that std.conv.to simply returned slices to the data) -- the performance shot back up to about 1700 msecs, a little slower than the original version of reading into an array of array of const(char)[] slices, but about 58 times(!) the performance of std.csv. So I tried a simple optimization: instead of allocating a string per field, allocate 64KB string buffers and copy string field values into it, then taking slices from the buffer to assign to the struct's string fields. With this optimization, running times came down to about the 1900 msec range, which is only marginally slower than the const(char)[] case, about 51 times faster than std.csv. Here are the actual benchmark values: 1) std.csv: 2126883 records, 102136 msecs 2) fastcsv (struct with string fields): 2126883 records, 1978 msecs 3) fastcsv (struct with const(char)[] fields): 2126883 records, 1743 msecs The latest code is available on github: https://github.com/quickfur/fastcsv The benchmark driver now has 3 new targets: stdstruct - std.csv parsing of CSV into structs faststruct - fastcsv parsing of CSV into struct (string fields) faststruct2 - fastcsv parsing of CSV into struct (const(char)[] fields) Note that the structs are hard-coded into the code, so they will only work with the census.gov test file. Things still left to do: - Fix header parsing to have a consistent interface with std.csv, or at least allow the user to configure whether or not the first row should be discarded. - Support transcription to Tuples? - Refactor the code to have less copy-pasta. - Ummm... make it ready for integration with std.csv maybe? ;-) T -- Fact is stranger than fiction.