Hello,

I’m using Commons CSV for a component in our application, and I ran into a 
weird edge case. In our application, we take in CSV files from the file system 
without knowing the format beforehand. So, I’m writing a method that guesses 
the CSV format based on column consistency, the presence of trailing data, and a 
few other things. While writing tests, I found that Commons CSV does not treat 
a full CRLF as escaped by a single escape character. For example, if \ is the 
escape character, row,\\r\ntest (that is, the escape character followed by CR 
and LF) will be parsed as:

row\r
test

Instead of:

row,\r\ntest
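
To make the two readings concrete, here is a minimal, stdlib-only sketch. It is not Commons CSV's lexer; the class EscapeCrlfSketch, the method splitRecords, and the flag escapeWholeCrlf are hypothetical names I made up for illustration. It only splits records (no delimiters or quoting) and shows how one escape character followed by CR LF yields either two records or one, depending on whether the escape covers just the CR or the whole CRLF.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative toy only: NOT the Commons CSV implementation.
class EscapeCrlfSketch {

    /**
     * Splits input into records on unescaped CR, LF, or CRLF.
     * The escape character makes the next character literal. When
     * escapeWholeCrlf is true, an escaped CR also protects an LF that
     * immediately follows it (the behavior my fix introduced).
     */
    static List<String> splitRecords(String input, char escape, boolean escapeWholeCrlf) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                current.append(next);          // escaped character is kept literally
                i += 2;
                if (escapeWholeCrlf && next == '\r'
                        && i < input.length() && input.charAt(i) == '\n') {
                    current.append('\n');      // treat the LF as escaped too
                    i++;
                }
            } else if (c == '\r' || c == '\n') {
                records.add(current.toString()); // unescaped line break ends the record
                current.setLength(0);
                i++;
                if (c == '\r' && i < input.length() && input.charAt(i) == '\n') {
                    i++;                       // consume the LF of an unescaped CRLF
                }
            } else {
                current.append(c);
                i++;
            }
        }
        records.add(current.toString());       // last record (may be empty in this toy)
        return records;
    }

    public static void main(String[] args) {
        // row, + escape character + CR + LF + test
        String input = "row,\\\r\ntest";
        System.out.println(splitRecords(input, '\\', false).size()); // 2
        System.out.println(splitRecords(input, '\\', true).size());  // 1
    }
}
```

With escapeWholeCrlf set to false, the LF that follows the escaped CR still ends the record. With it set to true, the same input stays a single record, which is also why a generated value ending in an escaped CR can swallow a following \n record separator.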

Initially this felt like the wrong decision to me, so I created a fix for it. 
During the regression tests, the testRandomMySql test failed because 
occasionally a \\r was generated as the last part of a row, which 
also escaped the \n record separator, causing an incorrect number of rows to be 
read. This made me question whether it’s a good idea at all to escape both the 
CR and the LF if they’re together, since maybe it’s best to assume that they 
would be escaped separately like so: \\r\\n. Though, if someone 
were writing a CSV manually on a Windows machine and decided to escape a 
newline, I could see them simply typing \ and then hitting Enter, which would 
give: \\r\n.

I would be interested to hear other people’s thoughts on this. If it’s still 
something we consider an issue, I can modify the MySQL test and open a PR.

Thank you,
Josh
