On Friday, October 28, 2011 09:18:27 dsimcha wrote: > Formal review of Jesse Phillips's CSV parser module begins today and > runs through November . Following that, a vote will take place for one > week. Please post any comments about this library in this thread. > > Docs: > http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html > > Code: > https://github.com/he-the-great/phobos/tree/csv
- Nitpick: You should probably proofread the module documentation again. It has several mispellings and grammar errors. - The functions and types mentioned in the documentation should use LREF so that they're links to their documentation. - Shouldn't the term "header" be used rather than "heading?" So, you'd have HeaderMismatchException instead of HeadingMismatchException. Usually, the term heading is only used for stuff like the heading of an article. Usually, the term header is used for stuff like files. RFC 4180 uses the term header but never heading. So, unless I'm missing something, I really think that all of the instances of heading (both in the documentation and in the API) should be changed to header. - I'd argue against Separator in csvReader being char by default. Individual characters should generally be dchar, not char. Using char by itself is almost always bad practice IMHO. - I'd say that csvReader should say that it returns a Records struct which is an input range rather than saying that it provides std.range.isInputRange. We just don't talk that way about ranges normally. We say that something is a particular range type, not that it provides the API for a particular range type. If you want to make "is an input range" a link to std.range.isInputRange, that's fine, but what's currently there is not how ranges are typically referred to. I'd also suggest that you on the range functions where you say that they're part of the std.range.isInputRange interface, you should use the term API instead of interface, since interface has a specific meaning in the language which has nothing to do with this context. - Why does csvNextToken take an Appender!(char[])? That seems overly restrictive. I'd argue that it should take an output range of dchar (which Appender!(char[]) is). Then it's not restricted to Appender, and it's not restricted to char[]. - Also, the fact that you have isSomeChar!(ElementType!Range) concerns me somewhat. Normally, I'd expect something more like is(ElementType!Range == dchar). As it is, you could have a range of char or wchar, which could do funny things. Phobos doesn't normally support that, and I'd be concerned about what would happen if someone tried. ElementType!Range will be dchar for all string types, so they'll work fine either way, but programmers who are overly adventurous or ignorant may try and use a char or wchar range, and I'm willing to be that there would be problems with that. Just all around, I'd advise checking that the ranges are ranges of dchar and using dchar for the separators rather than char. - You should have unit tests testing ranges _other_ than string. The fact that neither wstring nor dstring is tested is worrisome, and ideally, you'd do stuff like call filter!"true" on a string so that you have an input range which isn't a string. Without at least some of that, I'd be very worried that your code can't actually handle generic ranges of dchar and really only works with string. - You should have some unit tests with unicode values - preferably some which are multiple code units for both char and wchar. For instance, std.string uses \U00010143 in some of its tests, because it's 4 code units in UTF-8 and 2 in UTF-16. The complete lack of testing with unicode is worrisome in the same way that only testing with string is worrisome. It's too easy to screw something up such that it happens to work with string and ASCII but not in general. So, please add tests which involve unicode - not only in the data but in the separators. - I find it somewhat odd that you would use a pointer to the range in Record rather than using a slice, but I'd have to examine it in more detail to see whether that's actually necessary or not. But in general, I'd argue that a slice should be used rather than a pointer if it can be. - Jonathan M Davis P.S. Just as a note, since you seem to like not using braces if you don't need to for if statements and the like, I though that I'd point out that try and catch don't require braces in D if htey contain only one statement. You have several one line catches which use braces when they don't need to. I'm not arguing that you should or shouldn't change it, but I thought that I would point out that the braces aren't necessary in case you didn't know (since they _are_ necessary in other languages).