u3Izx9ql7vW4 commented on issue #43929: URL: https://github.com/apache/arrow/issues/43929#issuecomment-2325536409
That's correct, I'm writing to IPC with memory-mapped files. I went over the page you linked a few times but couldn't figure out the difference between `new_file` and `new_stream`, since they both accept a `NativeFile`. I needed to have multiple IPC streams running, so I opted for memory-mapped files, which allowed me to designate a file path for each producer. Could this be done with Arrow's memory buffer somehow?

> If you just want to write to disk and keep the file memory mapped it's likely easier (and faster) to just write an arrow file to disk and mmap it after.

I don't quite follow this part. Isn't this what I'm already doing? Perhaps you're suggesting that I check whether the memory-mapped file has already been created before creating a new one, like below?

```python
import os

import pyarrow as pa


def save_data():
    size = table.get_total_buffer_size()
    file_path = os.path.join(prefix_stream, sink)
    # Only create the memory map if it doesn't already exist.
    if not os.path.exists(file_path):
        pa.create_memory_map(file_path, size)
    with pa.memory_map(file_path, 'wb') as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table, max_chunksize=1000)
```

> edit: actually the offset is likely metadata, maybe?

That's what I thought as well, but I would have expected `get_total_buffer_size` to include the metadata. Though I may have dictionary-encoded columns, so maybe that's inflating the metadata. Do you know if there's a way to find out?