blackmwk commented on code in PR #1876:
URL: https://github.com/apache/iceberg-rust/pull/1876#discussion_r2752240588


##########
crates/iceberg/src/catalog/metadata_location.rs:
##########
@@ -15,38 +15,100 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use std::collections::HashMap;
 use std::fmt::Display;
 use std::str::FromStr;
 
 use uuid::Uuid;
 
+use crate::compression::CompressionCodec;
+use crate::spec::{TableMetadata, parse_metadata_file_compression};
 use crate::{Error, ErrorKind, Result};
 
 /// Helper for parsing a location of the format: `<location>/metadata/<version>-<uuid>.metadata.json`
+/// or with compression: `<location>/metadata/<version>-<uuid>.gz.metadata.json`
 #[derive(Clone, Debug, PartialEq)]
 pub struct MetadataLocation {
     table_location: String,
     version: i32,
     id: Uuid,
+    compression_suffix: Option<String>,

Review Comment:
   I think we should store the compression codec rather than a suffix.
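
   A minimal sketch of what that could look like, not the PR's actual code. `Codec` and `MetadataLocationSketch` are hypothetical stand-ins for `CompressionCodec` and `MetadataLocation`, the id is a plain `String` instead of a `Uuid` so the snippet needs no external crates, and the five-digit zero padding of the version is an assumption. The suffix is derived only when the location is formatted:

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's `CompressionCodec`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Codec {
    None,
    Gzip,
}

impl Codec {
    /// File-name infix for this codec; `None` means no infix is written.
    fn suffix(self) -> Option<&'static str> {
        match self {
            Codec::None => None,
            Codec::Gzip => Some("gz"),
        }
    }
}

/// Hypothetical stand-in for `MetadataLocation` that stores the codec itself.
#[derive(Clone, Debug, PartialEq)]
struct MetadataLocationSketch {
    table_location: String,
    version: i32,
    id: String, // assumption: the real struct keeps a `uuid::Uuid` here
    codec: Codec,
}

impl fmt::Display for MetadataLocationSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: versions are zero-padded to five digits.
        write!(f, "{}/metadata/{:05}-{}", self.table_location, self.version, self.id)?;
        // The suffix is computed from the codec, never stored as a string.
        if let Some(suffix) = self.codec.suffix() {
            write!(f, ".{suffix}")?;
        }
        write!(f, ".metadata.json")
    }
}
```

   With this shape, an invalid suffix string cannot be represented in the struct, so there is nothing to validate after construction.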



##########
crates/iceberg/src/spec/table_metadata.rs:
##########
@@ -468,9 +483,44 @@ impl TableMetadata {
         file_io: &FileIO,
         metadata_location: impl AsRef<str>,

Review Comment:
   I think we should change the type of `metadata_location` from 
`impl AsRef<str>` to the struct `MetadataLocation`. Also, we should not infer 
the compression codec from the suffix; we should infer it from `TableMetadata` 
itself. With this change, we could avoid a lot of validation in the method body.
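
   As a rough illustration of inferring the codec from the table's own properties rather than from a file name, here is a sketch under stated assumptions. `Codec` and `codec_from_properties` are hypothetical names, the error type is a plain `String`, and the property key `write.metadata.compression-codec` with values `none` and `gzip` is my assumption about the relevant Iceberg table property, not something taken from this PR:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `CompressionCodec`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Codec {
    None,
    Gzip,
}

/// Assumed table property key for metadata file compression.
const METADATA_COMPRESSION: &str = "write.metadata.compression-codec";

/// Reads the codec from table properties; a missing key means no compression.
fn codec_from_properties(props: &HashMap<String, String>) -> Result<Codec, String> {
    // Normalize case so that "GZIP" and "gzip" are treated alike.
    let value = props.get(METADATA_COMPRESSION).map(|v| v.to_ascii_lowercase());
    match value.as_deref() {
        None | Some("none") => Ok(Codec::None),
        Some("gzip") => Ok(Codec::Gzip),
        Some(other) => Err(format!("unsupported metadata compression codec: {other}")),
    }
}
```

   A writer that receives a typed location plus the metadata can call a helper like this once, instead of re-validating a string suffix in the method body.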



##########
crates/iceberg/src/catalog/metadata_location.rs:
##########
@@ -15,38 +15,100 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use std::collections::HashMap;
 use std::fmt::Display;
 use std::str::FromStr;
 
 use uuid::Uuid;
 
+use crate::compression::CompressionCodec;
+use crate::spec::{TableMetadata, parse_metadata_file_compression};
 use crate::{Error, ErrorKind, Result};
 
 /// Helper for parsing a location of the format: `<location>/metadata/<version>-<uuid>.metadata.json`
+/// or with compression: `<location>/metadata/<version>-<uuid>.gz.metadata.json`
 #[derive(Clone, Debug, PartialEq)]
 pub struct MetadataLocation {
     table_location: String,
     version: i32,
     id: Uuid,
+    compression_suffix: Option<String>,
 }
 
 impl MetadataLocation {
+    /// Determines the compression suffix from table properties.
+    fn compression_suffix_from_properties(
+        properties: &HashMap<String, String>,
+    ) -> Result<Option<String>> {
+        let codec = parse_metadata_file_compression(properties)?;
+
+        Ok(if codec.is_none() {
+            None
+        } else {
+            Some(codec.suffix()?.to_string())
+        })
+    }
+
     /// Creates a completely new metadata location starting at version 0.
-    /// Only used for creating a new table. For updates, see `with_next_version`.
+    /// Only used for creating a new table. For updates, see `next_version`.
+    #[deprecated(
+        since = "0.8.0",
+        note = "Use new_with_metadata instead to properly handle compression settings"
+    )]
     pub fn new_with_table_location(table_location: impl ToString) -> Self {
         Self {
             table_location: table_location.to_string(),
             version: 0,
             id: Uuid::new_v4(),
+            compression_suffix: None,
+        }
+    }
+
+    /// Creates a completely new metadata location starting at version 0,
+    /// with compression settings from the table metadata.
+    /// Only used for creating a new table. For updates, see `next_version`.
+    pub fn new_with_metadata(table_location: impl ToString, metadata: &TableMetadata) -> Self {
+        Self {
+            table_location: table_location.to_string(),
+            version: 0,
+            id: Uuid::new_v4(),
+            // This will go away https://github.com/apache/iceberg-rust/issues/2028 is resolved, so for now
+            // we use a default value.
+            compression_suffix: Self::compression_suffix_from_properties(metadata.properties())
+                .unwrap_or(None),
         }
     }
 
     /// Creates a new metadata location for an updated metadata file.
+    /// Uses compression settings from the new metadata.
+    pub fn next_version(

Review Comment:
   I'm still unclear on why this method is better than `self.with_next_version`. 
`self.with_next_version` looks more reasonable to me, since we need the 
original version, which is contained in `self`.
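
   A small sketch of the `self`-based shape, for comparison. `VersionedLocation` is a hypothetical stand-in for `MetadataLocation`, and the new id is passed in as a `String` only to keep the snippet free of external crates; real code would presumably generate a fresh `Uuid` internally:

```rust
/// Hypothetical stand-in for `MetadataLocation`.
#[derive(Clone, Debug, PartialEq)]
struct VersionedLocation {
    table_location: String,
    version: i32,
    id: String, // assumption: the real struct keeps a `uuid::Uuid` here
}

impl VersionedLocation {
    /// Builds the location for the next metadata file from the current one.
    /// The previous version comes from `self`, so callers cannot pass a
    /// mismatched version by mistake.
    fn with_next_version(&self, new_id: impl Into<String>) -> Self {
        Self {
            table_location: self.table_location.clone(),
            version: self.version + 1,
            id: new_id.into(),
        }
    }
}
```

   If compression settings also need to change between versions, they could be passed as an extra argument while the version still comes from `self`.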



##########
crates/catalog/s3tables/src/catalog.rs:
##########
@@ -467,8 +467,7 @@ impl Catalog for S3TablesCatalog {
                     .send()
                     .await
                     .map_err(from_aws_sdk_error)?;
-                let warehouse_location = get_resp.warehouse_location().to_string();
-                MetadataLocation::new_with_table_location(warehouse_location).to_string()
+                get_resp.warehouse_location().to_string()

Review Comment:
   While I think this fix is correct, I don't think it's closely related to this 
PR. We should move it to a standalone PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

