cyb70289 commented on code in PR #13640:
URL: https://github.com/apache/arrow/pull/13640#discussion_r924058255


##########
cpp/src/arrow/io/file.cc:
##########
@@ -378,6 +378,77 @@ Status FileOutputStream::Write(const void* data, int64_t 
length) {
 
 int FileOutputStream::file_descriptor() const { return impl_->fd(); }
 
+// ----------------------------------------------------------------------
+// DirectFileOutputStream, change the Open, Write and Close methods from 
FileOutputStream
+// Uses DirectIO for writes. Will only write out things in 4096 byte blocks. 
Buffers leftover bytes
+// in an internal data structure, which will be padded to 4096 bytes and 
flushed upon call to close.
+
+class DirectFileOutputStream::DirectFileOutputStreamImpl : public OSFile {
+ public:
+  Status Open(const std::string& path, bool append) {
+    const bool truncate = !append;
+    return OpenWritable(path, truncate, append, true /* write_only */, true);
+  }
+  Status Open(int fd) { return OpenWritable(fd); }
+};
+
+DirectFileOutputStream::DirectFileOutputStream() { 
+  uintptr_t mask = (uintptr_t)(4095);
+  uint8_t *mem = static_cast<uint8_t *>(malloc(4096 + 4095));
+  cached_data = reinterpret_cast<uint8_t *>( 
reinterpret_cast<uintptr_t>(mem+4095) & ~(mask));

Review Comment:
   `posix_memalign`?
   
   Actually, I'm not sure if an internal buffer is beneficial. AFAIK, direct IO 
is only for advanced apps like DMBS which implements their own buffer 
management and dislikes additional buffering (e.g., OS page cache). Looks to me 
Arrow as a library should not change this behaviour. Another issue is that we'd 
better leave IO alignment to the apps, as it depends heavily on OS and 
hardware, hardcode to 4096 implicitly is not good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to