n3world commented on pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#issuecomment-853877596


   > > Does this use case make sense? Does it seem like something you want to
   > > support?
   > 
   > It makes sense, but I'm not sure we want to support it. Basically, I would
   > like to see if it can be implemented with minimal complication at the heart
   > of the CSV parser internals. In particular, I don't want the handler to be
   > able to modify any data. It should only be allowed to return a `Status` to
   > say whether we should go on or not.
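A minimal C++ sketch of the restriction described above. It assumes the handler receives a read-only description of the bad row and may only return a status: OK means "skip this row and go on", and anything else aborts parsing with that status. `Status` here is a tiny stand-in rather than `arrow::Status`. `InvalidRow`, `InvalidRowHandler` and `CollectRows` are hypothetical names chosen for illustration and are not the Arrow CSV API. Rows are assumed to be already split into fields.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for arrow::Status (hypothetical, not Arrow's class).
struct Status {
  bool ok_ = true;
  std::string message;
  static Status OK() { return Status{}; }
  static Status Invalid(std::string msg) { return Status{false, std::move(msg)}; }
  bool ok() const { return ok_; }
};

// Read-only description of a row with the wrong column count (hypothetical).
struct InvalidRow {
  int32_t expected_columns;
  int32_t actual_columns;
  int64_t number;  // 1-based row number in the input
};

// The handler can only answer "go on" (OK) or "stop" (non-OK).
using InvalidRowHandler = std::function<Status(const InvalidRow&)>;

// Toy row loop: an invalid row is dropped if the handler returns OK;
// otherwise parsing stops and the handler's Status is returned.
Status CollectRows(const std::vector<std::vector<std::string>>& rows,
                   int32_t expected_columns, const InvalidRowHandler& handler,
                   std::vector<std::vector<std::string>>* out) {
  int64_t number = 0;
  for (const auto& row : rows) {
    ++number;
    auto actual = static_cast<int32_t>(row.size());
    if (actual != expected_columns) {
      InvalidRow info{expected_columns, actual, number};
      // With no handler installed, any invalid row is an error.
      Status st = handler ? handler(info) : Status::Invalid("invalid row");
      if (!st.ok()) return st;
      continue;  // handler said go on: skip the row, data left untouched
    }
    out->push_back(row);
  }
  return Status::OK();
}
```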
   
   By "modify the row", are you specifically referring to the ability to remove
   columns from the row? Or do you mean that, instead of passing the
   `RowModifier` into the callback, the callback would return an enum indicating
   `error`, `skip` or `fix`, and the CSV parser would modify the row itself for
   `fix`?
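For comparison, a sketch of the enum-returning alternative raised in the question: the callback only decides (`error`, `skip` or `fix`) and the parser applies any change. `RowAction`, `InvalidRowInfo`, `RowActionHandler`, `FixRow` and `ApplyHandler` are hypothetical names. The meaning of `fix` used here (pad missing columns with empty strings, drop extra trailing columns) is an assumption made for this example; the thread does not define it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decision returned by the handler; the parser, not the
// handler, performs any modification.
enum class RowAction { kError, kSkip, kFix };

struct InvalidRowInfo {
  int32_t expected_columns;
  int32_t actual_columns;
  int64_t number;  // 1-based row number
};

using RowActionHandler = std::function<RowAction(const InvalidRowInfo&)>;

// Assumed meaning of kFix: resize() pads with empty strings or truncates.
void FixRow(std::vector<std::string>* row, std::size_t expected) {
  row->resize(expected);
}

// Returns false (stop parsing) as soon as the handler asks for an error.
bool ApplyHandler(std::vector<std::vector<std::string>> rows,
                  std::size_t expected, const RowActionHandler& handler,
                  std::vector<std::vector<std::string>>* out) {
  int64_t number = 0;
  for (auto& row : rows) {
    ++number;
    if (row.size() == expected) {
      out->push_back(std::move(row));
      continue;
    }
    InvalidRowInfo info{static_cast<int32_t>(expected),
                        static_cast<int32_t>(row.size()), number};
    switch (handler(info)) {
      case RowAction::kError:
        return false;
      case RowAction::kSkip:
        break;  // drop the row and keep going
      case RowAction::kFix:
        FixRow(&row, expected);
        out->push_back(std::move(row));
        break;
    }
  }
  return true;
}
```

Because the handler never sees a mutable row, the parser keeps full control over its buffers, which seems to be the property the reviewer is after. The trade-off is that every supported kind of repair has to be built into the parser itself.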
   
   Does this mean you no longer object to the idea of a callback, and just want
   to limit its abilities?
   
   > Sidenote: if the CSV parser loses sync (for example because of a misquoted
   > CSV cell), you may also have many "invalid" rows.
   
   Yes, that could be a problem, but it could be mitigated by capping the number
   of errors reported, or by a threshold of bad rows after which an error is
   generated and parsing stops. If there is a callback, handling this would be
   up to the callback implementer.
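If the cap is left to the callback implementer, the handler could count bad rows itself. The sketch below assumes a hypothetical two-valued decision (`Decision::kSkip` / `Decision::kError`) and a hypothetical `MakeCappedHandler` helper: it skips up to `max_bad_rows` invalid rows and then asks the parser to stop, for example because the parser has probably lost sync on a misquoted cell.

```cpp
#include <cstdint>
#include <functional>
#include <memory>

// Hypothetical decision type for this sketch.
enum class Decision { kSkip, kError };

// Hypothetical description of an invalid row.
struct BadRow {
  int64_t number;  // 1-based row number
};

// Returns a handler that skips the first `max_bad_rows` invalid rows and
// returns kError for every invalid row after that.
std::function<Decision(const BadRow&)> MakeCappedHandler(int64_t max_bad_rows) {
  // Shared so that copies of the std::function keep a single count.
  auto seen = std::make_shared<int64_t>(0);
  return [seen, max_bad_rows](const BadRow&) {
    ++*seen;
    return *seen <= max_bad_rows ? Decision::kSkip : Decision::kError;
  };
}
```

The counter is not synchronized; if the parser might call the handler from several threads, it would need to be a `std::atomic<int64_t>` instead.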
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

