etseidl commented on code in PR #8894:
URL: https://github.com/apache/arrow-rs/pull/8894#discussion_r2554166195


##########
parquet/src/bin/parquet-fromcsv.rs:
##########
@@ -445,6 +445,9 @@ mod tests {
         let mut actual = String::from_utf8(buffer_vec).unwrap();
         let pos = actual.find('\n').unwrap() + 1;
         actual = actual[pos..].to_string();
+        // Normalize line endings for cross-platform compatibility
+        let expected = expected.replace("\r\n", "\n").trim_start().to_string();

Review Comment:
   Also not relevant



##########
parquet/src/arrow/schema/extension.rs:
##########
@@ -36,7 +36,7 @@ use arrow_schema::extension::ExtensionType;
 /// Arrow DataType, and instead are represented by an Arrow ExtensionType.
 /// Extension types are attached to Arrow Fields via metadata.
 pub(crate) fn try_add_extension_type(
-    mut arrow_field: Field,
+    #[allow(unused_mut)] mut arrow_field: Field,

Review Comment:
   Nor do the changes here seem necessary



##########
parquet/src/arrow/decoder/mod.rs:
##########
@@ -20,5 +20,7 @@
 mod delta_byte_array;
 mod dictionary_index;
 
+#[allow(unused_imports)]
 pub use delta_byte_array::DeltaByteArrayDecoder;
+#[allow(unused_imports)]

Review Comment:
   Nor are these



##########
parquet/src/arrow/array_reader/mod.rs:
##########
@@ -51,7 +51,7 @@ mod test_util;
 
 // Note that this crate is public under the `experimental` feature flag.
 use crate::file::metadata::RowGroupMetaData;
-pub use builder::{ArrayReaderBuilder, CacheOptions, CacheOptionsBuilder};
+pub use builder::{ArrayReaderBuilder, CacheOptionsBuilder};

Review Comment:
   This change is not relevant to the issue



##########
parquet/src/file/metadata/writer.rs:
##########
@@ -181,16 +170,8 @@ impl<'a, W: Write> ThriftMetadataWriter<'a, W> {
             .as_ref()
             .is_some_and(|oi| oi.iter().all(|oii| oii.iter().all(|idx| 
idx.is_none())));
 
-        let offset_indexes: Option<ParquetOffsetIndex> = if all_none {
-            None
-        } else {
-            // FIXME(ets): this will panic if there's a missing index.
-            offset_indexes.map(|ovvi| {
-                ovvi.into_iter()
-                    .map(|vi| vi.into_iter().map(|oi| oi.unwrap()).collect())
-                    .collect()
-            })
-        };
+        let offset_indexes: Option<ParquetOffsetIndex> =
+            if all_none { None } else { offset_indexes };

Review Comment:
   Same comment here



##########
parquet/src/file/metadata/parser.rs:
##########
@@ -269,14 +269,26 @@ pub(crate) fn parse_column_index(
                             rg_idx,
                             col_idx,
                         )
+                        .map(Some)
                     }
-                    None => Ok(ColumnIndexMetaData::NONE),
+                    None => Ok(None),
                 })
                 .collect::<crate::errors::Result<Vec<_>>>()
         })
         .collect::<crate::errors::Result<Vec<_>>>()?;
 
     metadata.set_column_index(Some(index));
+
+    if column_index_policy == PageIndexPolicy::Required {

Review Comment:
   I don't think we should add this check...we didn't enforce `Required` for 
the column index, which can be legitimately missing for some columns. It's only 
the offset index that needs this treatment.



##########
parquet/src/file/metadata/writer.rs:
##########
@@ -487,10 +468,7 @@ impl<'a, W: Write> ParquetMetaDataWriter<'a, W> {
                 (0..self.metadata.row_groups().len())
                     .map(|rg_idx| {
                         let column_indexes = &row_group_column_indexes[rg_idx];
-                        column_indexes
-                            .iter()
-                            .map(|column_index| Some(column_index.clone()))
-                            .collect()
+                        column_indexes.to_vec()

Review Comment:
   This whole function is no longer needed. The calls to 
`convert_column_index()` and `convert_offset_index()` above can instead be 
   ```rust
         let column_indexes = self.metadata.column_index();
         let offset_indexes = self.metadata.offset_index();
   ```



##########
parquet/src/file/metadata/reader.rs:
##########
@@ -96,7 +96,7 @@ pub enum PageIndexPolicy {
 impl From<bool> for PageIndexPolicy {
     fn from(value: bool) -> Self {
         match value {
-            true => Self::Required,
+            true => Self::Optional,

Review Comment:
   This is probably preferable, but I think warrants some discussion. This 
should likely be in a different PR with its own issue.



##########
parquet/src/file/metadata/writer.rs:
##########
@@ -149,20 +149,9 @@ impl<'a, W: Write> ThriftMetadataWriter<'a, W> {
             .is_some_and(|ci| ci.iter().all(|cii| cii.iter().all(|idx| 
idx.is_none())));
 
         // transform from Option<Vec<Vec<Option<ColumnIndexMetaData>>>> to
-        // Option<Vec<Vec<ColumnIndexMetaData>>>
-        let column_indexes: Option<ParquetColumnIndex> = if all_none {
-            None
-        } else {
-            column_indexes.map(|ovvi| {
-                ovvi.into_iter()
-                    .map(|vi| {
-                        vi.into_iter()
-                            .map(|ci| ci.unwrap_or(ColumnIndexMetaData::NONE))
-                            .collect()
-                    })
-                    .collect()
-            })
-        };
+        // Option<Vec<Vec<Option<ColumnIndexMetaData>>>>
+        let column_indexes: Option<ParquetColumnIndex> =
+            if all_none { None } else { column_indexes };

Review Comment:
   There's no transformation to do any longer, which is the whole point of the 
underlying issue for this PR. This should simply be
   ```rust
       Ok(if all_none { None } else { column_indexes })
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to