I don't use the output stream objects directly though, right? To take a 
step back a bit, what I'm trying to do is stream rows to a table in real 
time (with the ability to control how many rows to batch up before 
writing out a record batch).

My understanding is that to properly stream table data I need to:
a) create an OutputStream instance
b) create a RecordBatchStreamWriter, binding my stream object to it
c) create a RecordBatchBuilder.  As rows arrive, add them to the record batch 
builder.  When we're ready to flush, call Flush on the batch builder to create 
a record batch and pass the batch to the RecordBatchStreamWriter.
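
The batching part of steps (a) to (c) can be sketched without Arrow at all. 
The block below is only a stand-in, assuming a hypothetical RowBatcher class 
and a sink callback in place of the real RecordBatchBuilder and 
RecordBatchStreamWriter; none of these names come from the Arrow API. It 
shows rows accumulating until a configurable batch size is reached, at which 
point the whole batch is handed to the sink in one call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the builder + stream writer pair (not Arrow API):
// rows accumulate until `batch_size` is reached, then the whole batch is
// passed to `sink` in a single call.
template <typename Row>
class RowBatcher {
 public:
  using Sink = std::function<void(const std::vector<Row>&)>;

  RowBatcher(std::size_t batch_size, Sink sink)
      : batch_size_(batch_size), sink_(std::move(sink)) {
    assert(batch_size_ > 0);
    pending_.reserve(batch_size_);
  }

  // Append one row; flush automatically once the batch is full.
  void Append(Row row) {
    pending_.push_back(std::move(row));
    if (pending_.size() >= batch_size_) Flush();
  }

  // Emit whatever is pending (possibly a short batch) and reset.
  void Flush() {
    if (pending_.empty()) return;
    sink_(pending_);
    pending_.clear();
  }

  // Number of rows appended since the last flush.
  std::size_t pending() const { return pending_.size(); }

 private:
  std::size_t batch_size_;
  Sink sink_;
  std::vector<Row> pending_;
};
```

In a real pipeline the sink would be where the record batch is built and 
written to the stream; an explicit Flush() at shutdown emits the final 
partial batch.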

I was hoping to use MemoryMappedFile for (a), but since it doesn't support 
dynamically growing the mmap file I'll have to write my own implementation.
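
For what it's worth, here is a rough sketch of what such an implementation 
might look like. Everything in it is an assumption for illustration: the 
GrowableMmapSink name, the doubling growth policy, and the choice to grow 
with ftruncate followed by munmap/mmap (portable POSIX) rather than the 
Linux-only mremap. It is not wired into Arrow's OutputStream interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical append-only sink over a memory-mapped file that grows on demand.
class GrowableMmapSink {
 public:
  GrowableMmapSink(const std::string& path, std::size_t initial_capacity) {
    assert(initial_capacity > 0);
    fd_ = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd_ < 0) throw std::runtime_error("open failed");
    try {
      Remap(initial_capacity);
    } catch (...) {
      ::close(fd_);
      fd_ = -1;
      throw;
    }
  }

  ~GrowableMmapSink() { Close(); }

  GrowableMmapSink(const GrowableMmapSink&) = delete;
  GrowableMmapSink& operator=(const GrowableMmapSink&) = delete;

  // Append bytes, doubling the file and the mapping until they fit.
  void Write(const void* data, std::size_t nbytes) {
    if (fd_ < 0) throw std::runtime_error("sink is closed");
    if (size_ + nbytes > capacity_) {
      std::size_t new_capacity = capacity_ > 0 ? capacity_ : 4096;
      while (size_ + nbytes > new_capacity) new_capacity *= 2;
      Remap(new_capacity);
    }
    std::memcpy(base_ + size_, data, nbytes);
    size_ += nbytes;
  }

  // Unmap, trim the file to the bytes actually written, and close it.
  bool Close() {
    if (fd_ < 0) return true;
    bool ok = true;
    if (base_ != nullptr && ::munmap(base_, capacity_) != 0) ok = false;
    if (::ftruncate(fd_, static_cast<off_t>(size_)) != 0) ok = false;
    if (::close(fd_) != 0) ok = false;
    base_ = nullptr;
    capacity_ = 0;
    fd_ = -1;
    return ok;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }

 private:
  // Resize the file first, then replace the old mapping with a larger one.
  // The base address may change, so no pointer into the mapping is kept.
  void Remap(std::size_t new_capacity) {
    if (::ftruncate(fd_, static_cast<off_t>(new_capacity)) != 0)
      throw std::runtime_error("ftruncate failed");
    if (base_ != nullptr) ::munmap(base_, capacity_);
    void* p = ::mmap(nullptr, new_capacity, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd_, 0);
    if (p == MAP_FAILED) {
      base_ = nullptr;
      capacity_ = 0;
      throw std::runtime_error("mmap failed");
    }
    base_ = static_cast<char*>(p);
    capacity_ = new_capacity;
  }

  int fd_ = -1;
  char* base_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};
```

Doubling keeps the number of remaps logarithmic in the final file size, and 
trimming on Close() means readers never see the unused tail of the mapping.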

-----Original Message-----
From: Antoine Pitrou [mailto:[email protected]] 
Sent: Wednesday, May 09, 2018 11:42 AM
To: [email protected]
Subject: Re: Question about streaming to memorymapped files


As for buffering data before making a call to write(): in Arrow 0.10.0
you'll be able to use BufferedOutputStream for this:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/buffered.h
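
To illustrate the general idea of buffering before the underlying write 
call, here is a minimal sketch. It is not the Arrow BufferedOutputStream 
API: the BufferedWriter class and its raw_write callback are hypothetical 
names, and the callback stands in for whatever issues the real system call. 
Small writes are coalesced into a fixed-size buffer, and the underlying 
write only happens when the buffer fills up or is flushed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffering layer (not the Arrow class): coalesces small writes
// so that the expensive underlying write is called far less often.
class BufferedWriter {
 public:
  using RawWrite = std::function<void(const char*, std::size_t)>;

  BufferedWriter(std::size_t buffer_size, RawWrite raw_write)
      : buffer_(buffer_size), raw_write_(std::move(raw_write)) {
    assert(buffer_size > 0);
  }

  void Write(const char* data, std::size_t nbytes) {
    // Not enough room left: push out what is buffered first, keeping order.
    if (nbytes > buffer_.size() - used_) Flush();
    // Payloads as large as the whole buffer bypass it entirely.
    if (nbytes >= buffer_.size()) {
      raw_write_(data, nbytes);
      return;
    }
    std::memcpy(buffer_.data() + used_, data, nbytes);
    used_ += nbytes;
  }

  // Convenience overload for string payloads.
  void Write(const std::string& s) { Write(s.data(), s.size()); }

  // Hand any buffered bytes to the underlying writer.
  void Flush() {
    if (used_ == 0) return;
    raw_write_(buffer_.data(), used_);
    used_ = 0;
  }

 private:
  std::vector<char> buffer_;
  RawWrite raw_write_;
  std::size_t used_ = 0;
};
```

With an 8-byte buffer, ten 2-byte writes followed by a final Flush() reach 
the underlying writer as three calls instead of ten.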

Regards

Antoine.


On 09/05/2018 at 17:39, Ambalu, Robert wrote:
> I don't have any offhand, no, but I would imagine that direct file writes 
> will at some point need to make a system call, which is expensive (fwrite 
> might buffer before eventually making the system call; it looks like 
> FileOutputStream uses the raw system write for every write call).
> The current MMap io interface isn't usable as a streaming output 
> unfortunately, though I suppose I could just implement my own.
> 
> -----Original Message-----
> From: Antoine Pitrou [mailto:[email protected]] 
> Sent: Wednesday, May 09, 2018 11:11 AM
> To: [email protected]
> Subject: Re: Question about streaming to memorymapped files
> 
> 
> Do you know of any benchmark numbers / performance studies about this?
> While it's true that a memory-mapped file avoids explicit system calls,
> I've heard file I/O is quite well optimized, at least on Linux,
> nowadays.
> 
> Regards
> 
> Antoine.
> 
> 
> On Wed, 9 May 2018 14:47:53 +0000
> "Ambalu, Robert" <[email protected]> wrote:
>> Antoine, thanks for the quick reply.
>> You can actually grow memory-mapped files with an mremap call (and, I 
>> think, a seek/write on the file); I do this in my applications and it 
>> works fine.
>> I want the efficiency of writing via memory maps, so I would prefer to 
>> avoid FileOutputStream.
>>
>> -----Original Message-----
>> From: Antoine Pitrou [mailto:[email protected]] 
>> Sent: Wednesday, May 09, 2018 10:37 AM
>> To: [email protected]
>> Subject: Re: Question about streaming to memorymapped files
>>
>>
>> Hi,
>>
>> If you don't know the output size upfront then you should probably use a
>> FileOutputStream instead.  By definition, memory mapped files must have
>> a fixed size (since they are mapped to a fixed area in virtual memory).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 09/05/2018 at 16:31, Ambalu, Robert wrote:
>>> Hey, I'm looking into streaming table updates into a memory-mapped file 
>>> (C++).
>>> I think I have everything I need (MemoryMappedFile output streamer, 
>>> RecordBatchStreamWriter) but I don't understand how to properly create the 
>>> memmap file.  It looks like it requires you to preset a size for the file 
>>> when you create it, but since I'll be streaming I don't actually know how 
>>> big a file I'm going to need...
>>> Am I missing some other API point here?  Any reason why the size is 
>>> required up front and the memmap doesn't auto-grow as needed?
>>>
>>> Thanks in advance
>>> - Rob
>>>
> 
