[ https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184086#comment-17184086 ]
Andrew Lamb commented on ARROW-9275:
------------------------------------

In general, I think the notion of implementing async Parquet and Arrow APIs
that don't rely on tokio or other executors is a good idea.

I think in order to make the crate as widely useful as possible, it should
also retain a synchronous API for use with the Rust standard library.

One pattern I have seen is using an `async` crate feature that enables the
appropriate async APIs (and possibly additional dependencies). For example,
https://docs.rs/bzip2/0.4.1/bzip2/#async-io

> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> ------------------------------------------------
>
>                 Key: ARROW-9275
>                 URL: https://issues.apache.org/jira/browse/ARROW-9275
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Mahmut Bulut
>            Assignee: Mahmut Bulut
>            Priority: Major
>
> This issue can be considered epic-level, spanning other Arrow projects.
>
> *Drill down*
> Currently, traits like `ParquetReader` only allow a synchronous interface,
> which uses a `BufReader` with a constant 8KB buffer. Over the network, this
> becomes a problem. This is easily solvable with differential buffers. In
> addition to this shortcoming, an executor engine is needed to schedule from
> async trait methods to sync trait methods; it has to sit somewhere in
> between to make requests to external IO asynchronous. On-disk IO is
> acceptable with the approach we currently have, since no reliable evented
> IO exists for on-disk IO on major platforms.
> All these considered, abstractions that expose asynchronous IO without any
> involvement from executors need to be provided.
>
> *Design Suggestions & Considerations*
> The design should apply and consider:
> * Sans IO (for more information about the Sans IO approach, please see
> [https://sans-io.readthedocs.io/]).
> * Not including any executor-specific data at all.
> * Tests should work with any executor with little to no modification.
> * Buffers are adjusted accordingly and use differential buffers to optimize
> network trips.
> * Sync IO shouldn't be touched. At all costs. If we try to unify Sync IO
> traits or write overlapping implementations, that will make our life
> harder in the future. Sans IO should be compartmentalized.
>
> *Notes*
> If the Sans IO approach is not taken, the project will:
> * use an extreme number of dependencies.
> * not be compatible with other Rust code at all.
> * break currently working code that uses array ingestion.
> * make integration tests harder.
> * be really hard to adapt (in user projects) once completion-based APIs
> stabilize in the future.
>
> Also, this suggestion is not about the Flight format or any Flight-related
> information at the moment. This is purely about making on-disk, remote IO
> (provider backends like AWS etc.) async.
>
> *Open points*
> A couple of open points:
> * Identifying traits that are going to be asyncized.
> * Designing internal routines.
> * Choosing the package name to expose.
> * Gathering traits into the designated packages in all file formats.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
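
To make the Sans IO suggestion in the quoted issue more concrete, here is a minimal Rust sketch. It is not taken from the Arrow or Parquet crates: `FooterLocator`, `Step`, and `Error` are hypothetical names invented for illustration. The only format fact it relies on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The state machine never performs IO; it only tells the caller which byte range it needs, so the same core could sit behind a synchronous reader or any async executor.

```rust
/// What the caller should do next. The state machine itself never reads anything.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// Read `len` bytes at `offset` (by any means, sync or async) and pass them to `feed`.
    NeedBytes { offset: u64, len: usize },
    /// The footer metadata occupies this byte range of the file.
    Done { metadata_offset: u64, metadata_len: usize },
}

#[derive(Debug, PartialEq)]
pub enum Error {
    FileTooSmall,
    BadMagic,
    BadLength,
}

/// Size of the file tail: 4-byte little-endian metadata length + b"PAR1".
const TAIL_LEN: usize = 8;

/// Hypothetical sans-IO helper (an assumption for illustration, not an existing API).
pub struct FooterLocator {
    file_len: u64,
}

impl FooterLocator {
    pub fn new(file_len: u64) -> Result<Self, Error> {
        // Simplified check: only requires room for the 8-byte tail.
        if file_len < TAIL_LEN as u64 {
            return Err(Error::FileTooSmall);
        }
        Ok(Self { file_len })
    }

    /// First request: the last 8 bytes of the file.
    pub fn start(&self) -> Step {
        Step::NeedBytes {
            offset: self.file_len - TAIL_LEN as u64,
            len: TAIL_LEN,
        }
    }

    /// Consume the bytes the caller fetched and report where the metadata lives.
    pub fn feed(&self, tail: &[u8]) -> Result<Step, Error> {
        if tail.len() != TAIL_LEN {
            return Err(Error::BadLength);
        }
        if &tail[4..] != b"PAR1" {
            return Err(Error::BadMagic);
        }
        let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as usize;
        let end = self.file_len - TAIL_LEN as u64;
        if metadata_len as u64 > end {
            return Err(Error::BadLength);
        }
        Ok(Step::Done {
            metadata_offset: end - metadata_len as u64,
            metadata_len,
        })
    }
}
```

A synchronous caller could satisfy `NeedBytes` with `std::io::Seek` and `Read`, while an async caller could issue a ranged request against remote storage. In both cases the parsing logic and its tests stay identical and executor-free, which is the compartmentalization the issue asks for.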