Hello,

I’m using Commons CSV for a component in our application, and I ran into a weird edge case. In our application, we take in CSV files from the file system without knowing the format beforehand, so I’m writing a method that guesses the CSV format based on column consistency, whether trailing data is encountered, and a few other things.

While writing tests, I found that Commons CSV does not escape the full CRLF with a single escape character. For example, if \ is the escape character, row,\\r\ntest will be parsed as:
row\r test

Instead of:

row,\r\ntest

Initially this felt like the wrong decision to me, so I created a fix for it. During the regression tests, the testRandomMySql test failed because occasionally a \\r was generated as the last part of a row. With my fix, that also escaped the \n record separator, causing an incorrect number of rows to be read.

This made me question whether it’s a good idea at all to escape both the CR and the LF when they appear together, since it may be best to assume that they would be escaped separately, like so: \\r\\n. On the other hand, if someone were writing a CSV manually on a Windows machine and decided to escape a newline, I could see them simply typing \ and then hitting Enter, which would give: \\r\n.

I would be interested to hear other people’s thoughts on this. If it’s still something we deem an issue, I can modify the MySQL test and make a PR.

Thank you,
Josh
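
P.S. To make the two behaviours concrete, here is a minimal, self-contained Java sketch. It is not Commons CSV's lexer and makes no claim about how the library is implemented. The class name EscapedCrlfSketch and the method splitRecords are hypothetical names made up for illustration, and the sketch ignores delimiters and quoting entirely. It only splits raw text into records and shows how a single escape character in front of CR LF is treated under each policy.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy record splitter, NOT the Commons CSV lexer.
class EscapedCrlfSketch {

    /**
     * Splits input into records on unescaped CR, LF, or CRLF.
     * An escape character makes the following character literal.
     * If escapeWholeCrlf is true, an escape in front of CR LF keeps both
     * characters; otherwise only the CR is kept and the LF ends the record.
     */
    static List<String> splitRecords(String input, char escape, boolean escapeWholeCrlf) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                current.append(next); // the escaped character is taken literally
                i += 2;
                // Optional policy: let one escape cover the LF of a CRLF pair too.
                if (escapeWholeCrlf && next == '\r'
                        && i < input.length() && input.charAt(i) == '\n') {
                    current.append('\n');
                    i++;
                }
            } else if (c == '\r' || c == '\n') {
                // Unescaped line terminator: treat CRLF as a single separator.
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // "row," + escape + CR + LF + "test"
        String input = "row,\\\r\ntest";
        System.out.println(splitRecords(input, '\\', false).size()); // 2
        System.out.println(splitRecords(input, '\\', true).size());  // 1
    }
}
```

The same input is both cases at once. With the per-character policy, a hand-typed "\ then Enter" on Windows splits into two records. With the whole-CRLF policy, a row that genuinely ends in an escaped CR followed by an LF record separator is merged with the next row, which is the kind of row-count mismatch the testRandomMySql failure showed.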