[ https://issues.apache.org/jira/browse/ARROW-18113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634647#comment-17634647 ]
Percy Camilo Triveño Aucahuasi edited comment on ARROW-18113 at 11/16/22 5:12 AM:
----------------------------------------------------------------------------------

{quote}Sounds reasonable to me, at first glance. Maybe leave behind a using CacheOptions = CoalesceOptions for compatibility (can you deprecate a using declaration?)
{quote}
David, I think we can deprecate it (in case we choose to rename and move it).

{quote}So if we add a ReadManyAsync then I think it should not have a cache options parameter. Instead that should be a property of the filesystem if it needs to be configurable.
{quote}
Weston, yes, initially I thought that _CoalesceOptions_ would be part of _arrow::io::IOContext_ (as an attribute) and that _ReadManyAsync_ could pass the _CoalesceOptions_ to the filesystem. But it makes sense to let the filesystem handle all of that, so in that case:
# we may still choose to rename _arrow::io::CacheOptions_ to _arrow::io::CoalesceOptions_ and move it into _interfaces.h_, so each filesystem's ctor will require _arrow::io::CoalesceOptions_;
# or we can just include _caching.h_ in every filesystem declaration without renaming _arrow::io::CacheOptions_ (so each filesystem's ctor will require _arrow::io::CacheOptions_).

Let me know which one sounds better to you, thanks.

> Implement a read range process without caching
> ----------------------------------------------
>
>                 Key: ARROW-18113
>                 URL: https://issues.apache.org/jira/browse/ARROW-18113
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Percy Camilo Triveño Aucahuasi
>            Assignee: Percy Camilo Triveño Aucahuasi
>            Priority: Major
>
> The current
> [ReadRangeCache|https://github.com/apache/arrow/blob/e06e98db356e602212019cfbae83fd3d5347292d/cpp/src/arrow/io/caching.h#L100]
> mixes caching with coalescing, which makes it difficult to implement readers
> capable of truly concurrent reads on coalesced data (see this
> [github comment|https://github.com/apache/arrow/pull/14226#discussion_r999334979]
> for additional context); for instance, right now the prebuffering feature of
> those readers cannot handle concurrent invocations.
> The goal of this ticket is to implement a component similar to
> ReadRangeCache that performs non-cached reads (doing only the coalescing part).
> Once we have that new capability, we can port the parquet and IPC readers to
> this new component and keep improving the reading process (that would be part
> of a separate set of follow-up tickets).
> Similar ideas were mentioned here: https://issues.apache.org/jira/browse/ARROW-17599
> Maybe a good place to implement this new capability is inside the file system
> abstraction (as part of a dedicated method to read coalesced data), where
> the abstract file system can provide a default implementation.
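
To make the discussion above more concrete, here is a minimal C++ sketch of coalescing read ranges without any caching, together with the deprecated compatibility alias from option 1. Everything in the {{sketch}} namespace is hypothetical and only for illustration: the struct definitions, the default values, and the {{CoalesceRanges}} helper are assumptions, not Arrow's actual API. The two field names are chosen to resemble those of _arrow::io::CacheOptions_. The attribute placement on the alias is standard since C++14, so a {{using}} declaration can indeed be deprecated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the renamed options struct (not Arrow's definition).
struct CoalesceOptions {
  int64_t hole_size_limit = 8192;       // merge ranges at most this many bytes apart
  int64_t range_size_limit = 32 << 20;  // upper bound on the size of a merged range
};

// Compatibility alias (option 1 in the comment). An alias-declaration
// accepts the [[deprecated]] attribute since C++14.
using CacheOptions [[deprecated("use CoalesceOptions instead")]] = CoalesceOptions;

// A byte range to read from a file.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge nearby ranges into fewer, larger reads. Nothing is cached here:
// the function only decides which reads to issue.
inline std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                             const CoalesceOptions& options) {
  assert(options.hole_size_limit >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) { return a.offset < b.offset; });

  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (r.length <= 0) continue;  // skip empty ranges
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      const int64_t new_end = std::max(last_end, r.offset + r.length);
      const bool close_enough = r.offset - last_end <= options.hole_size_limit;
      const bool small_enough = new_end - last.offset <= options.range_size_limit;
      if (close_enough && small_enough) {
        last.length = new_end - last.offset;  // extend the previous range
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}

}  // namespace sketch
```

Under either option, a filesystem would hold such an options object (passed to its ctor) and apply a step like {{CoalesceRanges}} inside its own read-many method, so callers would not need to pass coalescing options per call.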