Thank you, lots of stuff to ponder and comment. On Thu, 10 Nov 2011 02:10:28 -0800, Jonathan M Davis wrote:
> On Friday, October 28, 2011 09:18:27 dsimcha wrote: >> Formal review of Jesse Phillips's CSV parser module begins today and >> runs through November . Following that, a vote will take place for one >> week. Please post any comments about this library in this thread. >> >> Docs: >> http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html >> >> Code: >> https://github.com/he-the-great/phobos/tree/csv > > - Nitpick: You should probably proofread the module documentation again. > It has several mispellings and grammar errors. Err, I'll try, use some of the great tips you gave me :) > - The functions and types mentioned in the documentation should use LREF > so that they're links to their documentation. I'll see if I can find them all while proofing. > - Shouldn't the term "header" be used rather than "heading?" So, you'd > have HeaderMismatchException instead of HeadingMismatchException. > Usually, the term heading is only used for stuff like the heading of an > article. Usually, the term header is used for stuff like files. RFC 4180 > uses the term header but never heading. So, unless I'm missing > something, I really think that all of the instances of heading (both in > the documentation and in the API) should be changed to header. Used both in development, then chose one, guess I got it wrong. Changing. > - I'd argue against Separator in csvReader being char by default. > Individual characters should generally be dchar, not char. Using char by > itself is almost always bad practice IMHO. The only reason I put defaults on it was because it wouldn't choose the value for it from my default values. Otherwise it works with what you give it, be it char, wchar, dchar. > - I'd say that csvReader should say that it returns a Records struct > which is an input range rather than saying that it provides > std.range.isInputRange. We just don't talk that way about ranges > normally. We say that something is a particular range type, not that it > provides the API for a particular range type. If you want to make "is an > input range" a link to std.range.isInputRange, that's fine, but what's > currently there is not how ranges are typically referred to. I'd also > suggest that you on the range functions where you say that they're part > of the std.range.isInputRange interface, you should use the term API > instead of interface, since interface has a specific meaning in the > language which has nothing to do with this context. Thanks. > - Why does csvNextToken take an Appender!(char[])? That seems overly > restrictive. I'd argue that it should take an output range of dchar > (which Appender!(char[]) is). Then it's not restricted to Appender, and > it's not restricted to char[]. Appender is used so that the exception thrown can contain the consumed data. Thinking again on this, I suppose I can catch the exception add the data and rethrow. Sounds pretty good actually. > - Also, the fact that you have isSomeChar!(ElementType!Range) concerns > me somewhat. Normally, I'd expect something more like > is(ElementType!Range == dchar). As it is, you could have a range of char > or wchar, which could do funny things. Phobos doesn't normally support > that, and I'd be concerned about what would happen if someone tried. > ElementType!Range will be dchar for all string types, so they'll work > fine either way, but programmers who are overly adventurous or ignorant > may try and use a char or wchar range, and I'm willing to be that there > would be problems with that. Just all around, I'd advise checking that > the ranges are ranges of dchar and using dchar for the separators rather > than char. I was of a different mindset of the time. You are probably right I should prevent someone from eating their leg off. > - You should have unit tests testing ranges _other_ than string. The > fact that neither wstring nor dstring is tested is worrisome, and > ideally, you'd do stuff like call filter!"true" on a string so that you > have an input range which isn't a string. Without at least some of that, > I'd be very worried that your code can't actually handle generic ranges > of dchar and really only works with string. Did add a wchar input range test, but now it should just be an dchar input range test since wchar ranges won't be allowed. > - You should have some unit tests with unicode values - preferably some > which are multiple code units for both char and wchar. For instance, > std.string uses \U00010143 in some of its tests, because it's 4 code > units in UTF-8 and 2 in UTF-16. The complete lack of testing with > unicode is worrisome in the same way that only testing with string is > worrisome. It's too easy to screw something up such that it happens to > work with string and ASCII but not in general. So, please add tests > which involve unicode - not only in the data but in the separators. Didn't create any new tests, just added unicode in to input and as separators into old tests. > - I find it somewhat odd that you would use a pointer to the range in > Record rather than using a slice, but I'd have to examine it in more > detail to see whether that's actually necessary or not. But in general, > I'd argue that a slice should be used rather than a pointer if it can > be. Record requires modifying the range, for a true input range this is not a problem, but strings and some other struct based ranges are implicit forward ranges and save upon passing. Slicing doesn't exist on input ranges, though I'd almost argue [] should take the place of save() in a forward range. > - Jonathan M Davis > > > P.S. Just as a note, since you seem to like not using braces if you > don't need to for if statements and the like, I though that I'd point > out that try and catch don't require braces in D if htey contain only > one statement. You have several one line catches which use braces when > they don't need to. I'm not arguing that you should or shouldn't change > it, but I thought that I would point out that the braces aren't > necessary in case you didn't know (since they _are_ necessary in other > languages). Thanks, hopefully I'm consistent in that regard and I'll just leave it. I knew of try, but not catch.