n3world commented on pull request #10202: URL: https://github.com/apache/arrow/pull/10202#issuecomment-853877596
> > Does this use case make sense? Does it seem like something you want to support?
>
> It makes sense, but I'm not sure we want to support it. Basically, I would like to see if it can be implemented with minimal complication at the heart of the CSV parser internals. In particular, I don't want the handler to be able to modify any data. It should only be allowed to return a `Status` to say whether we should go on or not.

By "modify", are you referring specifically to the ability to remove columns from the row? Or do you mean that, instead of passing the RowModifier into the callback, the callback would return an enum indicating `error`, `skip` or `fix`, and the CSV parser would modify the row itself for `fix`?

Does this mean you no longer mind the idea of a callback, and just want to limit its abilities?

> Sidenote: if the CSV parser loses sync (for example because of a misquoted CSV cell), you may also have many "invalid" rows.

Yes, that could be a problem, but it could be mitigated by capping the number of errors reported, or by a threshold of bad rows after which an error is generated and parsing stops. If there is a callback, that policy would be up to the callback implementer.
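A minimal sketch of the restricted-callback idea discussed above: the handler cannot touch row data and only returns an enum, and the cap on bad rows lives in the callback itself. Every name here (`InvalidRowAction`, `InvalidRow`, `InvalidRowHandler`, `MakeCappedHandler`, `ParseRows`) is hypothetical and is not part of the Arrow API. The toy driver only counts commas and stands in for the real parser. The `fix` action is left out to keep the sketch small.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not Arrow's actual API.
enum class InvalidRowAction { kError, kSkip };

// Read-only description of a bad row; the handler cannot modify the data.
struct InvalidRow {
  int expected_columns;
  int actual_columns;
  std::string text;
};

using InvalidRowHandler = std::function<InvalidRowAction(const InvalidRow&)>;

// Builds a handler that skips up to `max_skipped` invalid rows and asks
// the parser to stop with an error on the next one.
InvalidRowHandler MakeCappedHandler(std::size_t max_skipped) {
  std::size_t seen = 0;
  return [seen, max_skipped](const InvalidRow&) mutable {
    if (seen >= max_skipped) return InvalidRowAction::kError;
    ++seen;
    return InvalidRowAction::kSkip;
  };
}

// Toy stand-in for the parser: a row's column count is 1 + number of commas
// (no quoting support). Valid rows are appended to `kept`. Returns false if
// the handler asked to stop.
bool ParseRows(const std::vector<std::string>& rows, int expected_columns,
               const InvalidRowHandler& handler,
               std::vector<std::string>* kept) {
  for (const std::string& row : rows) {
    const int actual =
        1 + static_cast<int>(std::count(row.begin(), row.end(), ','));
    if (actual == expected_columns) {
      kept->push_back(row);
      continue;
    }
    const InvalidRow info{expected_columns, actual, row};
    if (handler(info) == InvalidRowAction::kError) return false;
  }
  return true;
}
```

With `MakeCappedHandler(1)`, the first malformed row is dropped silently and the second one stops parsing. That puts the threshold policy in the callback implementer's hands, not in the parser.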