[ https://issues.apache.org/jira/browse/ORC-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577592#comment-14577592 ]
Aliaksei Sandryhaila commented on ORC-5:
----------------------------------------
The current implementation of positioned read is
Buffer* FileInputStream::read(uint64_t offset, uint64_t length, Buffer* buffer).
It takes a pointer to a fixed-size Buffer and returns it; if no buffer is
supplied, it allocates a new one and returns that instead. This approach has
several disadvantages:
a) it does not follow POSIX read semantics, because it can allocate a new buffer
b) it makes it difficult to use a custom memory pool for buffer allocation
c) it is currently incompatible with the libhdfs++ method
Status InputStreamImpl::PositionRead(void *buf, size_t nbyte, size_t offset,
size_t *read_bytes), which takes a pre-allocated buffer, a length, and an
offset. One of our goals is to align the ORC library with the libhdfs++
interface.
As an initial spec for buffers and positioned reads from input streams, I
propose the following:
1) InputStreams implement a POSIX-like method
ssize_t read(void *buf, size_t count, off_t offset)
that reads up to 'count' bytes starting at position 'offset' into a
pre-allocated, sufficiently large buffer 'buf'; the return value is the number
of bytes actually read. The method must not create and return its own buffer.
2) Since DataBuffer<char> has all the functionality of HeapBuffer, use the
former instead of the latter.
3) Remove Buffer and its children from the library, as they will no longer be
needed.
4) If we want to keep the input stream for memory-mapped files, then derive
MMapBuffer from DataBuffer<char>.
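
To make item 1 concrete, here is a rough C++ sketch of what such an interface
could look like for a local file. It is only an illustration under several
assumptions: the names PositionedInputStream, LocalFileStream, readFully and
demoPositionedRead are hypothetical and not part of the ORC code base, the
read is simply forwarded to POSIX pread(), and a std::vector<char> stands in
for a caller-owned DataBuffer<char>.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical interface for item 1: the caller always owns the buffer.
class PositionedInputStream {
 public:
  virtual ~PositionedInputStream() = default;
  // Reads up to 'count' bytes starting at 'offset' into 'buf'.
  // Returns the number of bytes actually read, 0 at EOF, or -1 on error.
  virtual ssize_t read(void* buf, size_t count, off_t offset) = 0;
};

// Hypothetical local-file implementation that forwards to POSIX pread().
class LocalFileStream : public PositionedInputStream {
 public:
  explicit LocalFileStream(const std::string& path)
      : fd_(::open(path.c_str(), O_RDONLY)) {
    if (fd_ < 0) throw std::runtime_error("cannot open " + path);
  }
  ~LocalFileStream() override { ::close(fd_); }
  LocalFileStream(const LocalFileStream&) = delete;
  LocalFileStream& operator=(const LocalFileStream&) = delete;

  ssize_t read(void* buf, size_t count, off_t offset) override {
    return ::pread(fd_, buf, count, offset);
  }

 private:
  int fd_;
};

// A positioned read may return fewer bytes than requested, so callers that
// need exactly 'count' bytes loop until the request is satisfied or EOF.
ssize_t readFully(PositionedInputStream& in, void* buf, size_t count,
                  off_t offset) {
  assert(buf != nullptr || count == 0);
  char* out = static_cast<char*>(buf);
  size_t total = 0;
  while (total < count) {
    ssize_t n = in.read(out + total, count - total,
                        offset + static_cast<off_t>(total));
    if (n < 0) return -1;  // propagate the error to the caller
    if (n == 0) break;     // end of file reached
    total += static_cast<size_t>(n);
  }
  return static_cast<ssize_t>(total);
}

// Usage: write ten bytes to a temporary file, then read four bytes at
// offset 3 into a buffer allocated by the caller.
std::string demoPositionedRead() {
  char path[] = "/tmp/orc_read_demoXXXXXX";
  int fd = ::mkstemp(path);
  if (fd < 0) throw std::runtime_error("mkstemp failed");
  const char data[] = "0123456789";
  ssize_t written = ::write(fd, data, 10);
  ::close(fd);
  if (written != 10) {
    ::unlink(path);
    throw std::runtime_error("write failed");
  }

  std::vector<char> buf(4);  // stand-in for a caller-owned DataBuffer<char>
  ssize_t n = -1;
  {
    LocalFileStream in(path);
    n = readFully(in, buf.data(), buf.size(), 3);
  }
  ::unlink(path);
  return n < 0 ? std::string()
               : std::string(buf.data(), static_cast<size_t>(n));
}
```

Because the stream never allocates, the caller is free to obtain the buffer
from any allocator, including a custom memory pool, which is the point of
disadvantage b) above.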
> Converge on buffer design
> -------------------------
>
> Key: ORC-5
> URL: https://issues.apache.org/jira/browse/ORC-5
> Project: Orc
> Issue Type: Improvement
> Reporter: Aliaksei Sandryhaila
>
> Current implementation uses two kinds of buffers: DataBuffer<T> and children
> of Buffer class. The former can use a custom memory pool for allocation,
> while the latter is more similar to asio::buffer. We need to converge to a
> single buffer design that has both of these traits.