I don’t use the output stream objects directly though, right? Just to take a step back a bit, what I’m trying to do is stream rows to a table in real time (with the ability to control how many rows to batch up before writing out a record batch).
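To make the batching requirement above concrete, here is a minimal sketch that is independent of Arrow. `Row`, `RowBatcher` and the sink callback are hypothetical stand-ins invented for illustration, not Arrow types; in a real program the sink would be the place where a record batch is built and handed to the stream writer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one table row; a real table has typed columns.
struct Row {
    long id;
    double value;
};

// Buffers rows and hands them to a sink in groups of `batch_size`.
// The sink plays the role of "build a record batch and write it out".
class RowBatcher {
public:
    using Sink = std::function<void(const std::vector<Row>&)>;

    RowBatcher(std::size_t batch_size, Sink sink)
        : batch_size_(batch_size), sink_(std::move(sink)) {
        assert(batch_size_ > 0);
        pending_.reserve(batch_size_);
    }

    // Append one row; flush automatically once the batch is full.
    void Append(const Row& row) {
        pending_.push_back(row);
        if (pending_.size() >= batch_size_) Flush();
    }

    // Emit whatever is buffered (possibly a short batch). No-op when empty.
    void Flush() {
        if (pending_.empty()) return;
        sink_(pending_);
        pending_.clear();
    }

    // Number of rows buffered but not yet passed to the sink.
    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t batch_size_;
    Sink sink_;
    std::vector<Row> pending_;
};
```

Calling `Flush()` once more at shutdown emits the final partial batch, so the batch size only controls how often the sink runs, not whether trailing rows get written.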
My understanding is that to properly stream table data I need to:
a) create an output stream instance
b) create a RecordBatchStreamWriter, binding my stream object to it
c) create a RecordBatchBuilder
As rows arrive, I add them to the RecordBatchBuilder. When we're ready to flush, I call Flush on the builder to create a record batch and pass the batch to the RecordBatchStreamWriter.
I was hoping to use MemoryMappedFile for a), but since it doesn’t support dynamically growing the mapped file I'll have to write my own implementation.

-----Original Message-----
From: Antoine Pitrou [mailto:[email protected]]
Sent: Wednesday, May 09, 2018 11:42 AM
To: [email protected]
Subject: Re: Question about streaming to memorymapped files

As for buffering data before making a call to write(): in Arrow 0.10.0
you'll be able to use BufferedOutputStream for this:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/buffered.h

Regards

Antoine.


On 09/05/2018 at 17:39, Ambalu, Robert wrote:
> I don’t have any offhand, no, but I would imagine that direct file writes
> will at some point need to make a system call, which is expensive (fwrite
> might buffer before eventually making the sys call; it looks like
> FileOutputStream uses the raw system write for every write call).
> The current MMap io interface isn’t usable as a streaming output
> unfortunately, though I suppose I could just implement my own
>
> -----Original Message-----
> From: Antoine Pitrou [mailto:[email protected]]
> Sent: Wednesday, May 09, 2018 11:11 AM
> To: [email protected]
> Subject: Re: Question about streaming to memorymapped files
>
>
> Do you know of any benchmark numbers / performance studies about this?
> While it's true that a memory-mapped file avoids explicit system calls,
> I've heard file I/O is quite well optimized, at least on Linux,
> nowadays.
>
> Regards
>
> Antoine.
>
>
> On Wed, 9 May 2018 14:47:53 +0000
> "Ambalu, Robert" <[email protected]> wrote:
>> Antoine, thanks for the quick reply.
>> You can actually grow memory-mapped files with a mremap call (and I think a
>> seek/write on the file). I do this in my applications and it works fine.
>> I want the efficiency of writing via memory maps, so would prefer to avoid
>> FileOutputStream
>>
>> -----Original Message-----
>> From: Antoine Pitrou [mailto:[email protected]]
>> Sent: Wednesday, May 09, 2018 10:37 AM
>> To: [email protected]
>> Subject: Re: Question about streaming to memorymapped files
>>
>>
>> Hi,
>>
>> If you don't know the output size upfront then you should probably use a
>> FileOutputStream instead. By definition, memory mapped files must have
>> a fixed size (since they are mapped to a fixed area in virtual memory).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 09/05/2018 at 16:31, Ambalu, Robert wrote:
>>> Hey, I'm looking into streaming table updates into a memory mapped file
>>> (C++).
>>> I think I have everything I need (MemoryMappedFile output streamer,
>>> RecordBatchStreamWriter) but I don't understand how to properly create the
>>> memmap file. It looks like it requires you to preset a size for the file
>>> when you create it, but since I'll be streaming I don't actually know how
>>> big a file I'm going to need...
>>> Am I missing some other API point here? Any reason why size is required up
>>> front and the memmap doesn't auto-grow as needed?
>>>
>>> Thanks in advance
>>> - Rob
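
For reference, the mremap-based growth mentioned in the thread can be sketched as below. This is only a sketch under stated assumptions: it is Linux-only (mremap is a Linux extension, and the declaration relies on `_GNU_SOURCE`, which g++ defines by default), it extends the file with ftruncate rather than the seek/write mentioned in the thread, and `GrowableMap`, `OpenMap`, `GrowMap` and `CloseMap` are hypothetical names made up for illustration, not part of Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: a writable shared file mapping that can be enlarged.
struct GrowableMap {
    int fd = -1;
    char* data = nullptr;
    std::size_t size = 0;
};

// Create (or truncate) `path` at `initial_size` bytes and map it read/write.
inline bool OpenMap(GrowableMap& m, const char* path, std::size_t initial_size) {
    m.fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (m.fd < 0) return false;
    void* p = MAP_FAILED;
    if (::ftruncate(m.fd, static_cast<off_t>(initial_size)) == 0) {
        p = ::mmap(nullptr, initial_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, m.fd, 0);
    }
    if (p == MAP_FAILED) {
        ::close(m.fd);
        m.fd = -1;
        return false;
    }
    m.data = static_cast<char*>(p);
    m.size = initial_size;
    return true;
}

// Extend the file first, then enlarge the mapping. With MREMAP_MAYMOVE the
// kernel may relocate the mapping, so callers must re-read m.data afterwards.
// If mremap fails, the old mapping is left untouched and still valid.
inline bool GrowMap(GrowableMap& m, std::size_t new_size) {
    assert(new_size > m.size);
    if (::ftruncate(m.fd, static_cast<off_t>(new_size)) != 0) return false;
    void* p = ::mremap(m.data, m.size, new_size, MREMAP_MAYMOVE);
    if (p == MAP_FAILED) return false;
    m.data = static_cast<char*>(p);
    m.size = new_size;
    return true;
}

// Unmap and close; resets the struct to its empty state.
inline void CloseMap(GrowableMap& m) {
    if (m.data != nullptr) ::munmap(m.data, m.size);
    if (m.fd >= 0) ::close(m.fd);
    m = GrowableMap{};
}
```

Because the base address can change on every growth, a stream built on this would need to track a write offset rather than hold raw pointers into the mapping across a call to `GrowMap`.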
