[jira] [Commented] (CSV-42) Lots of possible changes

Gary Gregory (JIRA) Mon, 24 Jun 2013 06:09:22 -0700

    [ 
https://issues.apache.org/jira/browse/CSV-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691947#comment-13691947
 ]


Gary Gregory commented on CSV-42:
---------------------------------

Just a note on:

{quote}
Added a close method. The try-with-resources statement in Java 7 makes 
resources management much easier, there is no need to add a close() method to 
the parser.
{quote}

The try-with-resources in Java 7 requires the a class implements Closeable, 
which means implementing close(), but the parser does not create closeable 
resources, it takes a {{Reader}}, which itself can be used with 
try-with-resources in Java 7.
                
> Lots of possible changes
> ------------------------
>
>                 Key: CSV-42
>                 URL: https://issues.apache.org/jira/browse/CSV-42
>             Project: Commons CSV
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Bob Smith
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: src.zip
>
>
> I made a lot of changes to pretty much all of the classes in the csv package. 
>  I thought it would be better to put all of the the changes here in one 
> issue, but feel free to only take the parts you like (if any).  Hopefully if 
> nothing else the test cases will be useful to you.
> I'll try to list most of the changes here, but I'm sure I'm forgetting some. 
> This should include all of the big changes at least. I focused mostly on the 
> parser, but I also made a few changes to the printer classes (although I 
> don't think I added any new test cases there).
> h3. General Changes
> - Changed all class names with "CSV" in them to use "Csv". This is how it 
> appears in the commons-lang "escapeCsv" methods and I think it's easier to 
> read the class name when acronyms are not in all upper case. (x) {color:red}3 
> letter acronyms are usually kept in uppercase (for example URLConnection or 
> SAXParser in the JDK, but there are some exceptions){color}
> - Formatted the code. I used Eclipse with a version of the Java formatting 
> style that uses spaces instead of tabs and with a few other small changes to 
> try to make it more similar to the style of this code. The formatting was 
> inconsistent before (sometimes 2 space indent, sometimes 4) which made it 
> really hard to work on. (/) {color:green}DONE{color}
> - Removed all deprecated methods/constructors (/) {color:green}DONE{color}
> - Made all public classes final. If there is ever a need to create subclasses 
> of them then this could be changed, but I think it would be better to at 
> least start them as final (since once they are released as non-final it's 
> hard to go back).
> - A few bug fixes (and test cases for them)
> h3. CsvParser
> - There were a few bugs for special cases, so I made as small of changes as I 
> could to the parser code to fix these.
> - Added a lot of test cases. I created a test case for all bugs that I found, 
> so even if you don't use my changes to this class you should be able to use 
> the test cases to find all of the same bugs.
> - Added a close method. (x) {color:red} The try-with-resources statement in 
> Java 7 makes resources management much easier, there is no need to add a 
> close() method to the parser.{color}
> - Renamed the nextValue method to getValue (so it is more consistent with the 
> getAll and getLine method names). I think I would prefer to use a different 
> method name prefix for all three of these (like "readAll") since I wouldn't 
> normally expect a "get" method to have side effects, but I didn't want to 
> just change the names of the most used methods. (x) {color:red}This method 
> has been removed, the parser now works line by line.{color}
> - Changed the getLineNumber method to return the correct line number when 
> there are multi-line values. (x) {color:red}The suggested code counts the 
> number of records instead of the number of lines. For debugging it's better 
> to return the actual line number.{color}
> - Moved all of the lexer methods into an inner CsvLexer class that is 
> completely independent of the CsvParser class. The methods were already 
> separated out, so it wasn't a very big change. I also moved the lexer test 
> cases into a new CsvLexerTest class. (/) {color:green}DONE{color}
> - Got rid of the interpreting unicode escape options. This doesn't really 
> have anything to do with parsing a CSV file so I think it should be left up 
> to the user of the class to implement this if needed. As an example, I made a 
> CsvParserUnicodeEscapeTest class that uses the code from the lexer in a 
> Reader subclass. One nice thing is that with this implementation, the 
> interpreted values can be used as the delimiter, encapsulator, etc. (/) 
> {color:green}DONE - The unicode unescaping is now handled by a class 
> implementing java.io.Reader (to be contributed to Commons IO).{color}
> - Got rid of the "escape" option for the same reason as the unicode escape 
> option. I replaced it with an encapsulator escape option that is only used as 
> an escape operator on the encapsulator character.
> h3. ExtendedBufferedReader
> - Greatly simplified this class. I removed all the methods that weren't being 
> used (including keeping track of the line number) and changed the lookahead 
> option to use the BufferedReader mark and reset methods. (/) 
> {color:green}DONE - ExtendedBufferedReader is still counting the lines, but 
> the mark/reset lookahead improved the performance by 30%.{color}
> h3. CsvStrategy
> - I split this class into three classes: an abstract base class 
> (CsvStrategy), a parser-specific version (CsvParseStrategy) and a 
> printer-specific version (CsvPrintStrategy). I didn't like that the strategy 
> was used for both parsing and printing even though some of the values only 
> applied to parsing (and there could be values that apply only to printing as 
> well). (x) {color:red}There aren't enough properties in CSVFormat to split 
> the class{color}
> - Made this class immutable (as described in SANDBOX-279) (/) 
> {color:green}DONE{color}
> - Changed the whitespace options to not ignore whitespace by default. This is 
> what the document at http://www.rfc-editor.org/rfc/rfc4180.txt recommends for 
> the CSV format, so I think it should be like that by default. I added an 
> "IGNORE_WHITESPACE_STRATEGY" field that works the same as the old defaults. 
> (/) {color:green}DONE{color}
> - Removed the interpretUnicodeEscape option and replaced the escape field 
> with an encapsulatorEscape field (as described in the CsvParser change 
> details). (/) {color:green}DONE - Unicode unescaping has been removed from 
> the format parameters as part of CSV-67{color}
> - Added an ignoreEncapsulationTrailingCharacters field. This is used to 
> either ignore or append characters that are after an encapsulated value. 
> Previously an IOException was being thrown here, which I don't think is ever 
> a good idea.
> - Added some restrictions to prevent the values from being things that would 
> break the parser. This includes using a line break for anything or having 
> equal two values (other than the encapsulator and encapsulator escape). (/) 
> {color:green}DONE{color}
> h3. CsvPrinter
> - I changed this to use a modified version of the commons-lang escapeCsv 
> method (I hope it is ok to copy a small amount of code from one commons 
> project to another?). The escaping is a little different (and simpler) that 
> the old version, but I think the commons-lang version seems to be the best 
> way to do it. (x) {color:red}Good point on the code reuse, but escapeCsv 
> doesn't handle properly a format using escaped characters instead of quoted 
> values, such as the default MySQL format.{color}
> - I added an option to the constructor to allow disabling auto-flushing of 
> the output stream (similar to what is in the PrintStream class). I also 
> reduced the number of times the output is flushed when using the print method 
> that take array input. (x) {color:red}CSVPrinter now outputs to an Appendable 
> instead of a Writer, which means you can use a PrintStream and control 
> exactly the behavior of the flushing.{color}
> h3. CharBuffer
> - I didn't really make any changes other than to make it a non-public class. 
> (/) {color:green}DONE{color}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CSV-42) Lots of possible changes

Reply via email to