blackmwk commented on code in PR #2247:
URL: https://github.com/apache/iceberg-rust/pull/2247#discussion_r3008107675


##########
crates/iceberg/src/io/storage/mod.rs:
##########
@@ -133,10 +137,16 @@ pub trait StorageFactory: Debug + Send + Sync {
     /// # Arguments
     ///
     /// * `config` - The storage configuration containing scheme and properties
+    /// * `metadata` - Optional table metadata that storage backends can use
+    ///   for table-level configuration (e.g., table properties).
     ///
     /// # Returns
     ///
     /// A `Result` containing an `Arc<dyn Storage>` on success, or an error
     /// if the storage could not be created.
-    fn build(&self, config: &StorageConfig) -> Result<Arc<dyn Storage>>;
+    fn build(
+        &self,
+        config: &StorageConfig,
+        metadata: Option<&TableMetadata>,

Review Comment:
   > Maybe it's far fetched, but I do have some use cases in mind for adding 
table information to the storage. For example, if we want to implement a 
refreshable storage, that refreshes vended storage credentials when they 
expire, it has to talk to the catalog to vend credentials again. But by
default, it has no information about which table to vend credentials for. So 
passing table information (name in this case) is needed. Similarly, passing 
table_location can be useful for optimizing storage resolution (s3 vs azure 
etc.).
   
   I don't think this is a good use case. A `Storage` instance will typically be
serialized and distributed to workers in a distributed compute engine, and
refreshing credentials from it would put a lot of extra load on the REST
catalog server.
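   A toy simulation of the fan-out concern follows. `MockCatalog` and `RefreshingStorage` are hypothetical stand-ins, not iceberg-rust types. Each thread plays one worker holding its own copy of the storage, and a shared counter stands in for the REST catalog server. The point is only that refresh traffic scales with the number of workers rather than with the number of tables.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a REST catalog's credential-vending endpoint.
/// It only counts how many times it is called.
#[derive(Default)]
struct MockCatalog {
    vend_calls: AtomicUsize,
}

impl MockCatalog {
    fn vend_credentials(&self) -> String {
        self.vend_calls.fetch_add(1, Ordering::SeqCst);
        "temporary-token".to_string()
    }
}

/// Hypothetical storage that refreshes its own credentials when they expire.
#[derive(Clone)]
struct RefreshingStorage {
    catalog: Arc<MockCatalog>,
}

impl RefreshingStorage {
    fn refresh(&self) -> String {
        self.catalog.vend_credentials()
    }
}

/// Simulates `workers` copies of the storage (as if deserialized on separate
/// workers), each refreshing once per expiry window for `expiries` windows.
/// Returns the total number of calls the catalog received.
fn simulate(workers: usize, expiries: usize) -> usize {
    let catalog = Arc::new(MockCatalog::default());
    let storage = RefreshingStorage { catalog: Arc::clone(&catalog) };
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let s = storage.clone();
            thread::spawn(move || {
                for _ in 0..expiries {
                    let _token = s.refresh();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    catalog.vend_calls.load(Ordering::SeqCst)
}

fn main() {
    // Same table, same expiry schedule; only the worker count differs.
    println!("1 worker:    {} catalog calls", simulate(1, 10));
    println!("100 workers: {} catalog calls", simulate(100, 10));
}
```

With 10 expiry windows, one worker issues 10 vend requests while 100 workers issue 1000 for the same table.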
   
   >I don't know how big the perf difference is, but I feel like we could
do thousands of IOPS, and resolving storage on every IO could be an 
anti-pattern when we could do it just once. That said, if the perf impact is 
not big, it may not be required just for that. I'd let you decide whether the 
above pattern in combination with this one is enough to warrant the ability to 
attach some additional table information to the storage.
   
   I would suggest doing a microbenchmark to measure the impact of the URL
comparison operation.
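   A minimal sketch of such a microbenchmark, using only the standard library. It assumes per-IO resolution amounts to extracting the scheme before `://` and matching it to a backend. `Backend`, `resolve_backend` and the scheme list are hypothetical and illustrative, not iceberg-rust's actual resolution code. `std::hint::black_box` keeps the optimizer from hoisting the work out of the loop.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical backend enum; the real crate's types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    S3,
    Azure,
    Gcs,
    Local,
}

/// Resolves a backend from the scheme prefix of a path: the kind of
/// per-IO comparison under discussion. Paths without a scheme yield None.
fn resolve_backend(path: &str) -> Option<Backend> {
    let scheme = path.split_once("://").map(|(s, _)| s)?;
    match scheme {
        "s3" | "s3a" => Some(Backend::S3),
        "abfs" | "abfss" | "wasb" | "wasbs" => Some(Backend::Azure),
        "gs" => Some(Backend::Gcs),
        "file" => Some(Backend::Local),
        _ => None,
    }
}

/// Times `iters` resolutions of `path` and returns the average cost per call.
/// `iters` must be non-zero (Duration division by zero panics).
fn bench_per_call(path: &str, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(resolve_backend(black_box(path)));
    }
    start.elapsed() / iters
}

fn main() {
    let path = "s3://bucket/warehouse/db/tbl/data/00000-0-abc.parquet";
    // Warm-up pass so the measured run is not dominated by cold caches.
    bench_per_call(path, 10_000);
    let per_call = bench_per_call(path, 1_000_000);
    println!("average resolution cost: {per_call:?} per call");
}
```

The per-call figure can then be compared against the latency of a single object-store request. For a more rigorous measurement (outlier handling, statistical comparison between variants) the `criterion` crate is a common choice.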



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.