jorgecarleitao commented on issue #1061: URL: https://github.com/apache/arrow-datafusion/issues/1061#issuecomment-935443808
The approach I took in `arrow2` was to not use `AvroValue` but to work directly with the byte stream. The reason is that `AvroValue` takes bytes by value, not by reference, which means that when we perform the conversion `File -> AvroValue -> Arrow` for `Utf8` or `Binary`, we end up with the transformation `bytes -> AvroValue::String(String) -> Utf8Array`, incurring an extra allocation _per item_. Since the Avro format is relatively simple to read, I just implemented a reader from bytes directly to Arrow. I [did not implement it for all types](https://github.com/jorgecarleitao/arrow2/blob/main/src/io/avro/read/deserialize.rs#L131), only the basic ones, but the idea stands (a rough sketch of the idea is included below).

So, if I understood correctly, the goal is to generalize the parser to more types. Which ones are needed here?
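For concreteness, here is a minimal sketch of the direct-bytes-to-Arrow idea for strings (this is not arrow2's actual implementation, see the linked `deserialize.rs` for that). Avro encodes a `string` as a zig-zag varint length followed by UTF-8 bytes, so those bytes can be copied straight into a `Utf8`-style offsets/values pair without materializing a `String` per row. The function names (`read_zigzag_long`, `extend_utf8`) and the plain `Vec` buffers are illustrative stand-ins for the real builders.

```rust
/// Decode an Avro zig-zag varint (`long`), as defined by the Avro spec.
fn read_zigzag_long(data: &[u8], pos: &mut usize) -> i64 {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = data[*pos];
        *pos += 1;
        value |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // zig-zag -> signed
    ((value >> 1) as i64) ^ -((value & 1) as i64)
}

/// Append `num_rows` Avro-encoded strings from `data` into Utf8-like buffers:
/// a growing `values` byte buffer plus an `offsets` vector. No per-item String.
fn extend_utf8(data: &[u8], num_rows: usize, offsets: &mut Vec<i32>, values: &mut Vec<u8>) {
    let mut pos = 0usize;
    for _ in 0..num_rows {
        let len = read_zigzag_long(data, &mut pos) as usize;
        // copy the UTF-8 bytes straight into the values buffer
        values.extend_from_slice(&data[pos..pos + len]);
        pos += len;
        offsets.push(values.len() as i32);
    }
}

fn main() {
    // two Avro-encoded strings: "ab" (len 2 -> zig-zag 4) and "xyz" (len 3 -> zig-zag 6)
    let data = [4u8, b'a', b'b', 6, b'x', b'y', b'z'];
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    extend_utf8(&data, 2, &mut offsets, &mut values);
    assert_eq!(offsets, vec![0, 2, 5]);
    assert_eq!(std::str::from_utf8(&values).unwrap(), "abxyz");
}
```

Going through `AvroValue` would instead allocate a `String` for each row and then copy it again into the array, which is the extra per-item allocation described above.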
