Hello Josh,

Is there a mismatch in expectations of what escaping means?
Escaping works one character at a time: the escape character escapes the next single character. There are no escape-start and escape-end sequences. Am I missing something?

Gary

On Wed, Jul 17, 2024 at 5:38 PM Josh Bultman <josh.bult...@pkware.com.invalid> wrote:
>
> Hello,
>
> I’m using Commons CSV for a component in our application, and I ran into a
> weird edge case. In our application, we take in CSV files from the file
> system without knowing the format beforehand. So, I’m writing a method that
> guesses the CSV format based on column consistency, encountering trailing
> data, and a few other things. While writing tests, I encountered the fact
> that Commons CSV does not escape the full CRLF with a single escape
> character. For example, if \ is the escape character,
> row,\\r\ntest will be parsed as:
>
> row\r
> test
>
> Instead of:
>
> row,\r\ntest
>
> Initially this felt like the wrong decision to me, so I created a fix for it.
> During the regression tests, the testRandomMySql test failed because
> occasionally a \\r was generated as the last part of a row, which
> also escaped the \n record separator, causing an incorrect number of rows to
> be read. This made me question whether it’s a good idea at all to escape both
> the CR and the LF if they’re together, since maybe it’s best to assume that
> they would be escaped separately like so: \\r\\n. Though, if
> someone were writing a CSV manually on a Windows machine and decided to
> escape a newline, I could see them simply typing \ and then hitting enter,
> which would give: \\r\n.
>
> I would be interested to hear other people’s thoughts on this. If it’s still
> something we deem an issue, I can modify the MySQL test and make a PR.
>
> Thank you,
> Josh
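
The one-character-at-a-time rule from Gary's reply can be illustrated with a small standalone sketch. This is not the Commons CSV lexer: EscapeSketch and splitRecords are hypothetical names, and the code only splits records, not fields. It assumes that an unescaped CR, LF, or CRLF ends a record and that the escape character protects exactly the next character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Commons CSV implementation.
final class EscapeSketch {

    /** Splits input into records; the escape character protects only the next single character. */
    static List<String> splitRecords(String input, char escape) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                // Escape: take the next character literally, whatever it is.
                current.append(input.charAt(i + 1));
                i += 2;
            } else if (c == '\r' || c == '\n') {
                // Unescaped line break ends the record; an unescaped CRLF counts as one break.
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        // Whatever remains after the last line break is the final record.
        records.add(current.toString());
        return records;
    }

    public static void main(String[] args) {
        // Escape character, then CR, then LF: only the CR is escaped,
        // so the LF still acts as a record separator.
        System.out.println(splitRecords("row,\\\r\ntest", '\\').size());   // prints 2

        // CR and LF each escaped separately: both stay inside one record.
        System.out.println(splitRecords("row,\\\r\\\ntest", '\\').size()); // prints 1
    }
}
```

Under these assumptions, a backslash typed just before pressing Enter on Windows escapes only the CR, which matches the two-record result Josh describes.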