alamb commented on issue #3142: URL: https://github.com/apache/arrow-rs/issues/3142#issuecomment-1322692312
> Fortunately however, append-only and immutable are not necessarily at odds, as long as one takes immutable RecordBatch snapshots at a point-in-time of the append-only version.

I think the general pattern of

* Append data ....
* Snapshot (by copying into a read only `RecordBatch`)
* keep appending data

makes sense with arrow. In fact this is what we do in IOx on the write path.

This pattern can be done with the various `*Builder`s, such as https://docs.rs/arrow/27.0.0/arrow/array/type.Int64Builder.html

So something like:

```rust
let mut builder = Int64Builder::new();
builder.append_value(1);
// get an array to use in datafusion, etc
let array = builder.finish();

// make a new builder and start building the second batch
builder = Int64Builder::new();
builder.append_value(2);
builder.append_value(3);
```

I think what @tustvold is suggesting with "non-consuming finish method to the builders" means you could do something like:

```rust
builder.append_value(1);
// get an array to use in datafusion, etc (by copying the underlying values)
let array = builder.snapshot();

// existing builder can be used to append
builder.append_value(2);
builder.append_value(3);
```

I don't fully follow the proposal in https://gist.github.com/avantgardnerio/48d977ea6bd28c790cfb6df09250336d

As soon as you have this function:

```rust
pub fn as_slice(&self) -> RecordBatch {
    self.record_batch.slice(0, self.len())
}
```

multiple locations may be able to read that data, so trying to update it as well will likely be an exercise in frustration as you fight the Rust compiler to teach it that you have somehow guaranteed that the memory remains valid and that concurrent reads and modification are OK.

Maybe we could start with the "make a snapshot based copy" approach and, if the copying is too much, figure out a different approach.
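To make the append / snapshot / keep-appending pattern concrete without depending on any particular arrow version, here is a minimal sketch using only the standard library. `SnapshotBuilder`, its `snapshot` method, and `snapshot_demo` are hypothetical names invented for illustration; they are not part of arrow-rs, and a real builder would also need to handle null buffers and other array types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an arrow builder (NOT an arrow-rs type).
#[derive(Default)]
pub struct SnapshotBuilder {
    values: Vec<i64>,
}

impl SnapshotBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a single value to the mutable, append-only buffer.
    pub fn append_value(&mut self, v: i64) {
        self.values.push(v);
    }

    /// Copy the current contents into an immutable, shareable buffer.
    /// Takes `&self`, so the builder stays usable afterwards.
    pub fn snapshot(&self) -> Arc<[i64]> {
        Arc::from(self.values.as_slice())
    }
}

/// Append, snapshot, keep appending, snapshot again.
pub fn snapshot_demo() -> (Arc<[i64]>, Arc<[i64]>) {
    let mut builder = SnapshotBuilder::new();
    builder.append_value(1);

    // Readers get their own copy; later appends cannot affect it.
    let first = builder.snapshot();

    builder.append_value(2);
    builder.append_value(3);
    let second = builder.snapshot();

    (first, second)
}
```

Because each snapshot owns a copy, the borrow checker has nothing to object to: readers hold an `Arc<[i64]>` while the builder keeps its own `Vec`. The cost is one copy per snapshot, which is the trade-off to measure before trying anything cleverer.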
