cyb70289 commented on code in PR #13640: URL: https://github.com/apache/arrow/pull/13640#discussion_r924061697
##########
cpp/src/arrow/io/file.cc:
##########
@@ -378,6 +378,77 @@ Status FileOutputStream::Write(const void* data, int64_t length) {
 int FileOutputStream::file_descriptor() const { return impl_->fd(); }
+// ----------------------------------------------------------------------
+// DirectFileOutputStream, change the Open, Write and Close methods from FileOutputStream
+// Uses DirectIO for writes. Will only write out things in 4096 byte blocks. Buffers leftover bytes
+// in an internal data structure, which will be padded to 4096 bytes and flushed upon call to close.
+
+class DirectFileOutputStream::DirectFileOutputStreamImpl : public OSFile {
+ public:
+  Status Open(const std::string& path, bool append) {
+    const bool truncate = !append;
+    return OpenWritable(path, truncate, append, true /* write_only */, true);
+  }
+  Status Open(int fd) { return OpenWritable(fd); }
+};
+
+DirectFileOutputStream::DirectFileOutputStream() {
+  uintptr_t mask = (uintptr_t)(4095);
+  uint8_t *mem = static_cast<uint8_t *>(malloc(4096 + 4095));
+  cached_data = reinterpret_cast<uint8_t *>( reinterpret_cast<uintptr_t>(mem+4095) & ~(mask));

Review Comment:
   Also, the biggest advantage of direct IO is that data is transferred directly between the user-mode buffer and the disk controller. IIUC, this additional buffering may negate that advantage.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
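Illustrative note: the alignment arithmetic in the quoted constructor, and the buffering concern raised in the comment, can be sketched as follows. This is a minimal sketch and not the PR's implementation. The names `kBlockSize`, `AlignUp`, `IsAligned`, `AlignedBlock`, `AllocateAlignedBlock`, `FreeAlignedBlock`, and `WholeBlockBytes` are hypothetical and do not appear in the PR. The sketch keeps the pointer returned by `malloc` alongside the aligned one so that the allocation can later be freed; the quoted hunk does not show how that is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical constant; 4096 matches the block size used in the quoted hunk.
constexpr std::size_t kBlockSize = 4096;
static_assert((kBlockSize & (kBlockSize - 1)) == 0,
              "the mask arithmetic below requires a power-of-two block size");

// Round a pointer up to the next multiple of kBlockSize.
inline std::uint8_t* AlignUp(std::uint8_t* p) {
  const std::uintptr_t mask = kBlockSize - 1;
  return reinterpret_cast<std::uint8_t*>(
      (reinterpret_cast<std::uintptr_t>(p) + mask) & ~mask);
}

// True if the pointer is a multiple of kBlockSize.
inline bool IsAligned(const void* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & (kBlockSize - 1)) == 0;
}

// Hypothetical helper: `raw` is what malloc returned (and what must be passed
// to free); `data` is the block-aligned pointer inside that allocation.
struct AlignedBlock {
  std::uint8_t* raw;
  std::uint8_t* data;
};

// Over-allocate by kBlockSize - 1 bytes so that a full aligned block always
// fits somewhere inside the allocation.
inline AlignedBlock AllocateAlignedBlock() {
  auto* raw =
      static_cast<std::uint8_t*>(std::malloc(kBlockSize + kBlockSize - 1));
  if (raw == nullptr) return AlignedBlock{nullptr, nullptr};
  std::uint8_t* data = AlignUp(raw);
  assert(IsAligned(data));
  return AlignedBlock{raw, data};
}

// Free through the original pointer, never through the aligned one.
inline void FreeAlignedBlock(AlignedBlock& block) {
  std::free(block.raw);
  block.raw = nullptr;
  block.data = nullptr;
}

// Number of leading bytes of a write that form whole blocks. Only the
// remaining tail (length - WholeBlockBytes(length)) needs a staging buffer.
inline std::size_t WholeBlockBytes(std::size_t length) {
  return length & ~(kBlockSize - 1);
}
```

Under the assumption that the caller's buffer is itself block-aligned, one possible way to limit the extra copy the comment worries about would be to write the first `WholeBlockBytes(length)` bytes straight from the caller's buffer and stage only the tail in the aligned block. Whether that fits the PR's design is not something the quoted hunk settles.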