[ https://issues.apache.org/jira/browse/CSV-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691947#comment-13691947 ]
Gary Gregory commented on CSV-42: --------------------------------- Just a note on: {quote} Added a close method. The try-with-resources statement in Java 7 makes resources management much easier, there is no need to add a close() method to the parser. {quote} The try-with-resources in Java 7 requires the a class implements Closeable, which means implementing close(), but the parser does not create closeable resources, it takes a {{Reader}}, which itself can be used with try-with-resources in Java 7. > Lots of possible changes > ------------------------ > > Key: CSV-42 > URL: https://issues.apache.org/jira/browse/CSV-42 > Project: Commons CSV > Issue Type: Improvement > Affects Versions: 1.0 > Reporter: Bob Smith > Priority: Minor > Fix For: 1.0 > > Attachments: src.zip > > > I made a lot of changes to pretty much all of the classes in the csv package. > I thought it would be better to put all of the the changes here in one > issue, but feel free to only take the parts you like (if any). Hopefully if > nothing else the test cases will be useful to you. > I'll try to list most of the changes here, but I'm sure I'm forgetting some. > This should include all of the big changes at least. I focused mostly on the > parser, but I also made a few changes to the printer classes (although I > don't think I added any new test cases there). > h3. General Changes > - Changed all class names with "CSV" in them to use "Csv". This is how it > appears in the commons-lang "escapeCsv" methods and I think it's easier to > read the class name when acronyms are not in all upper case. (x) {color:red}3 > letter acronyms are usually kept in uppercase (for example URLConnection or > SAXParser in the JDK, but there are some exceptions){color} > - Formatted the code. I used Eclipse with a version of the Java formatting > style that uses spaces instead of tabs and with a few other small changes to > try to make it more similar to the style of this code. The formatting was > inconsistent before (sometimes 2 space indent, sometimes 4) which made it > really hard to work on. (/) {color:green}DONE{color} > - Removed all deprecated methods/constructors (/) {color:green}DONE{color} > - Made all public classes final. If there is ever a need to create subclasses > of them then this could be changed, but I think it would be better to at > least start them as final (since once they are released as non-final it's > hard to go back). > - A few bug fixes (and test cases for them) > h3. CsvParser > - There were a few bugs for special cases, so I made as small of changes as I > could to the parser code to fix these. > - Added a lot of test cases. I created a test case for all bugs that I found, > so even if you don't use my changes to this class you should be able to use > the test cases to find all of the same bugs. > - Added a close method. (x) {color:red} The try-with-resources statement in > Java 7 makes resources management much easier, there is no need to add a > close() method to the parser.{color} > - Renamed the nextValue method to getValue (so it is more consistent with the > getAll and getLine method names). I think I would prefer to use a different > method name prefix for all three of these (like "readAll") since I wouldn't > normally expect a "get" method to have side effects, but I didn't want to > just change the names of the most used methods. (x) {color:red}This method > has been removed, the parser now works line by line.{color} > - Changed the getLineNumber method to return the correct line number when > there are multi-line values. (x) {color:red}The suggested code counts the > number of records instead of the number of lines. For debugging it's better > to return the actual line number.{color} > - Moved all of the lexer methods into an inner CsvLexer class that is > completely independent of the CsvParser class. The methods were already > separated out, so it wasn't a very big change. I also moved the lexer test > cases into a new CsvLexerTest class. (/) {color:green}DONE{color} > - Got rid of the interpreting unicode escape options. This doesn't really > have anything to do with parsing a CSV file so I think it should be left up > to the user of the class to implement this if needed. As an example, I made a > CsvParserUnicodeEscapeTest class that uses the code from the lexer in a > Reader subclass. One nice thing is that with this implementation, the > interpreted values can be used as the delimiter, encapsulator, etc. (/) > {color:green}DONE - The unicode unescaping is now handled by a class > implementing java.io.Reader (to be contributed to Commons IO).{color} > - Got rid of the "escape" option for the same reason as the unicode escape > option. I replaced it with an encapsulator escape option that is only used as > an escape operator on the encapsulator character. > h3. ExtendedBufferedReader > - Greatly simplified this class. I removed all the methods that weren't being > used (including keeping track of the line number) and changed the lookahead > option to use the BufferedReader mark and reset methods. (/) > {color:green}DONE - ExtendedBufferedReader is still counting the lines, but > the mark/reset lookahead improved the performance by 30%.{color} > h3. CsvStrategy > - I split this class into three classes: an abstract base class > (CsvStrategy), a parser-specific version (CsvParseStrategy) and a > printer-specific version (CsvPrintStrategy). I didn't like that the strategy > was used for both parsing and printing even though some of the values only > applied to parsing (and there could be values that apply only to printing as > well). (x) {color:red}There aren't enough properties in CSVFormat to split > the class{color} > - Made this class immutable (as described in SANDBOX-279) (/) > {color:green}DONE{color} > - Changed the whitespace options to not ignore whitespace by default. This is > what the document at http://www.rfc-editor.org/rfc/rfc4180.txt recommends for > the CSV format, so I think it should be like that by default. I added an > "IGNORE_WHITESPACE_STRATEGY" field that works the same as the old defaults. > (/) {color:green}DONE{color} > - Removed the interpretUnicodeEscape option and replaced the escape field > with an encapsulatorEscape field (as described in the CsvParser change > details). (/) {color:green}DONE - Unicode unescaping has been removed from > the format parameters as part of CSV-67{color} > - Added an ignoreEncapsulationTrailingCharacters field. This is used to > either ignore or append characters that are after an encapsulated value. > Previously an IOException was being thrown here, which I don't think is ever > a good idea. > - Added some restrictions to prevent the values from being things that would > break the parser. This includes using a line break for anything or having > equal two values (other than the encapsulator and encapsulator escape). (/) > {color:green}DONE{color} > h3. CsvPrinter > - I changed this to use a modified version of the commons-lang escapeCsv > method (I hope it is ok to copy a small amount of code from one commons > project to another?). The escaping is a little different (and simpler) that > the old version, but I think the commons-lang version seems to be the best > way to do it. (x) {color:red}Good point on the code reuse, but escapeCsv > doesn't handle properly a format using escaped characters instead of quoted > values, such as the default MySQL format.{color} > - I added an option to the constructor to allow disabling auto-flushing of > the output stream (similar to what is in the PrintStream class). I also > reduced the number of times the output is flushed when using the print method > that take array input. (x) {color:red}CSVPrinter now outputs to an Appendable > instead of a Writer, which means you can use a PrintStream and control > exactly the behavior of the flushing.{color} > h3. CharBuffer > - I didn't really make any changes other than to make it a non-public class. > (/) {color:green}DONE{color} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira