Dandandan commented on pull request #9084:
URL: https://github.com/apache/arrow/pull/9084#issuecomment-763148387


   Do you still want to move this in, @jorgecarleitao?
   I had a few thoughts that followed from this PR:
   
   * Making `StringRecord` public now would mean that we introduce a breaking 
change if we later change the implementation of the CSV parser.
   * We can spend some more time optimizing the parsers themselves. For CSV 
parsing, for example, we could remove the overhead of the `csv` crate (which 
seems to account for a big chunk of CSV reading time according to profiling 
results), avoid copies, skip UTF-8 validation in some cases, etc.
   * The parsers could use a `cast` kernel; this requires some more changes to 
`cast`. From there, additional parallelization of parsing might be easier, 
more generic, and reusable, as we would already have a number of `Array`s. This 
might even enable more advanced things like using SIMD for parsing some data types.
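
To illustrate the last point, here is a rough sketch of what "parsing as a cast" could look like: read the whole column as strings first, then convert it to a typed column in one pass. This uses plain `Vec`s rather than Arrow `Array`s, and `cast_utf8_to_i64` is a hypothetical name, not an existing function in the `cast` kernel. Mapping unparsable cells to null is also just an assumption for the example.

```rust
/// Hypothetical column-wise "cast" from strings to i64.
/// Each cell is optional; `None` stands in for a null value.
/// Cells that are null or fail to parse become `None` (an assumption
/// made for this sketch, not the behaviour of any existing kernel).
fn cast_utf8_to_i64(column: &[Option<&str>]) -> Vec<Option<i64>> {
    column
        .iter()
        .map(|&cell| cell.and_then(|s| s.trim().parse::<i64>().ok()))
        .collect()
}

fn main() {
    // A string column as it might come out of the CSV reader.
    let column = vec![Some("1"), Some(" 42 "), Some("abc"), None];
    let parsed = cast_utf8_to_i64(&column);
    println!("{:?}", parsed);
}
```

Because the conversion operates on a whole column at once instead of on one record at a time, it could in principle be chunked across threads or specialized per data type without touching the reader itself.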
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

