[ 
https://issues.apache.org/jira/browse/SANDBOX-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224219#comment-13224219
 ] 

Emmanuel Bourg commented on SANDBOX-291:
----------------------------------------

There are a lot of good suggestions, I moved the details in the description of 
the issue so we can update it as we make progress integrating the changes.
                
> Lots of possible changes
> ------------------------
>
>                 Key: SANDBOX-291
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-291
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: CSV
>    Affects Versions: Nightly Builds
>            Reporter: Bob Smith
>            Priority: Minor
>         Attachments: src.zip
>
>
> I made a lot of changes to pretty much all of the classes in the csv package. 
>  I thought it would be better to put all of the the changes here in one 
> issue, but feel free to only take the parts you like (if any).  Hopefully if 
> nothing else the test cases will be useful to you.
> I'll try to list most of the changes here, but I'm sure I'm forgetting some. 
> This should include all of the big changes at least. I focused mostly on the 
> parser, but I also made a few changes to the printer classes (although I 
> don't think I added any new test cases there).
> h3. General Changes
> - Changed all class names with "CSV" in them to use "Csv". This is how it 
> appears in the commons-lang "escapeCsv" methods and I think it's easier to 
> read the class name when acronyms are not in all upper case.
> - Formatted the code. I used Eclipse with a version of the Java formatting 
> style that uses spaces instead of tabs and with a few other small changes to 
> try to make it more similar to the style of this code. The formatting was 
> inconsistent before (sometimes 2 space indent, sometimes 4) which made it 
> really hard to work on.
> - Removed all deprecated methods/constructors
> - Made all public classes final. If there is ever a need to create subclasses 
> of them then this could be changed, but I think it would be better to at 
> least start them as final (since once they are released as non-final it's 
> hard to go back).
> - A few bug fixes (and test cases for them)
> h3. CsvParser
> - There were a few bugs for special cases, so I made as small of changes as I 
> could to the parser code to fix these.
> - Added a lot of test cases. I created a test case for all bugs that I found, 
> so even if you don't use my changes to this class you should be able to use 
> the test cases to find all of the same bugs.
> - Added a close method.
> - Renamed the nextValue method to getValue (so it is more consistent with the 
> getAll and getLine method names). I think I would prefer to use a different 
> method name prefix for all three of these (like "readAll") since I wouldn't 
> normally expect a "get" method to have side effects, but I didn't want to 
> just change the names of the most used methods.
> - Changed the getLineNumber method to return the correct line number when 
> there are multi-line values.
> - Moved all of the lexer methods into an inner CsvLexer class that is 
> completely independent of the CsvParser class. The methods were already 
> separated out, so it wasn't a very big change. I also moved the lexer test 
> cases into a new CsvLexerTest class.
> - Got rid of the interpreting unicode escape options. This doesn't really 
> have anything to do with parsing a CSV file so I think it should be left up 
> to the user of the class to implement this if needed. As an example, I made a 
> CsvParserUnicodeEscapeTest class that uses the code from the lexer in a 
> Reader subclass. One nice thing is that with this implementation, the 
> interpretted values can be used as the delimiter, encapsulator, etc.
> - Got rid of the "escape" option for the same reason as the unicode escape 
> option. I replaced it with an encapsulator escape option that is only used as 
> an escape operator on the encapsulator character.
> h3. ExtendedBufferedReader
> - Greatly simplified this class. I removed all the methods that weren't being 
> used (including keeping track of the line number) and changed the lookahead 
> option to use the BufferedReader mark and reset methods.
> h3. CsvStrategy
> - I split this class into three classes: an abstract base class 
> (CsvStrategy), a parser-specific version (CsvParseStrategy) and a 
> printer-specific version (CsvPrintStrategy). I didn't like that the strategy 
> was used for both parsing and printing even though some of the values only 
> applied to parsing (and there could be values that apply only to printing as 
> well).
> - Made this class immutable (as described in SANDBOX-279)
> - Changed the whitespace options to not ignore whitespace by default. This is 
> what the document at http://www.rfc-editor.org/rfc/rfc4180.txt recommends for 
> the CSV format, so I think it should be like that by default. I added an 
> "IGNORE_WHITESPACE_STRATEGY" field that works the same as the old defaults.
> - Removed the interpretUnicodeEscape option and replaced the escape field 
> with an encapsuatorEscape field (as described in the CsvParser change 
> details).
> - Added an ignoreEncapsulationTrailingCharacters field. This is used to 
> either ignore or append characters that are after an encapsulated value. 
> Previously an IOException was being thrown here, which I don't think is ever 
> a good idea.
> - Added some restrictions to prevent the values from being things that would 
> break the parser. This includes using a line break for anything or having 
> equal two values (other than the encapsulator and encapsulator escape).
> h3. CsvPrinter
> - I changed this to use a modified version of the commons-lang escapeCsv 
> method (I hope it is ok to copy a small amount of code from one commons 
> project to another?). The escaping is a little different (and simpler) that 
> the old version, but I think the commons-lang version seems to be the best 
> way to do it.
> - I added an option to the constructor to allow disabling auto-flushing of 
> the output stream (similar to what is in the PrintStream class). I also 
> reduced the number of times the output is flushed when using the print method 
> that take array input.
> h3. CharBuffer
> - I didn't really make any changes other than to make it a non-public class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to