assignUser commented on issue #43929: URL: https://github.com/apache/arrow/issues/43929#issuecomment-2325604847
`new_file` and `new_stream` differ in the IPC format they write; which one you want depends on your use case:

> - Streaming format: for sending an arbitrary-length sequence of record batches. The format must be processed from start to end, and does not support random access.
>
> - File or Random Access format: for serializing a fixed number of record batches. Supports random access, and thus is very useful when used with memory maps.

So it depends on what exactly you want to do: if you want to write a stream of an unknown number of record batches, use the stream APIs. If you just want to save a fixed number of batches (like the table you are using in your code) in one go and make it available with minimal allocation for a consumer, use the IPC file format, which can be efficiently mmap'ed on the consumer side (for writing, Arrow handles that internally).

Could you try this?

```python
import os

import pyarrow as pa

def save_data():
    # `prefix_stream`, `sink`, and `table` come from your existing code
    file_path = os.path.join(prefix_stream, sink)
    # File (random access) format: fixed number of batches, mmap-friendly
    with pa.ipc.new_file(file_path, table.schema) as writer:
        writer.write_table(table, max_chunksize=1000)
```
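For the consumer side, here is a minimal sketch of reading the file back with a memory map, which is where the file format shines (the function name `load_data` is just illustrative; `pa.memory_map` and `pa.ipc.open_file` are the relevant pyarrow calls):

```python
import pyarrow as pa

def load_data(file_path):
    # Memory-map the file so batches are read with minimal allocation/copying
    with pa.memory_map(file_path, "r") as source:
        reader = pa.ipc.open_file(source)
        # Random access is available via reader.get_batch(i);
        # read_all() materializes everything as a Table
        return reader.read_all()
```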
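And for completeness, if you do end up needing the streaming format (an unknown number of batches, processed start to end), a sketch along the same lines (again, the function name and parameters here are illustrative):

```python
import pyarrow as pa

def stream_data(file_path, batches, schema):
    # Streaming format: arbitrary-length sequence, no random access
    with pa.ipc.new_stream(file_path, schema) as writer:
        for batch in batches:
            writer.write_batch(batch)
```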