Dandandan commented on pull request #9084: URL: https://github.com/apache/arrow/pull/9084#issuecomment-763148387
Do you still want to move this in, @jorgecarleitao? I had a few thoughts that followed from this PR:

* Making `StringRecord` public now would mean introducing a breaking change whenever we change the implementation of the csv parser.
* We can spend some more time optimizing the parsers themselves. For csv parsing, for example, we could remove the overhead of the `csv` crate (which seems to be a big chunk of csv reading time according to profiling results), avoid copies, skip utf-8 validation in some cases, etc.
* The parsers could use a `cast` kernel; this requires some more changes to `cast`. From there, additional parallelization of parsing might be easier, more generic, and more reusable, as we already have a number of `Array`s. This might even enable more advanced things like using SIMD for parsing some data types.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
