aokolnychyi commented on a change in pull request #2240:
URL: https://github.com/apache/iceberg/pull/2240#discussion_r684557800



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -77,6 +80,14 @@ public static MetricsConfig getDefault() {
   }
 
   public static MetricsConfig fromProperties(Map<String, String> props) {

Review comment:
       I've been thinking about this change while doing recent work around 
writer factories and deletes.
   
   Naming-wise, I'd probably prefer calling it `forTable(table)`. If we decide 
to store [metrics for rows in position 
deletes](https://github.com/apache/iceberg/pull/2942#discussion_r683715471), we 
then can provide something like `forPositionDelete(table)` to promote the path 
and pos columns and apply the regular config for all other columns in optional 
`row`.

##########
File path: core/src/main/java/org/apache/iceberg/util/SortOrderUtil.java
##########
@@ -64,4 +68,16 @@ static SortOrder buildSortOrder(Schema schema, PartitionSpec 
spec, SortOrder sor
 
     return builder.build();
   }
+
+  public static Set<String> getSortedColumns(SortOrder sortOrder) {

Review comment:
       nit: `getSortedColumns` -> `sortColumns`. We usually try to avoid `get` 
in method names. It is a style preference we follow in Iceberg. Such prefixes 
usually have no meaning and just make the names longer.

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -95,16 +118,32 @@ public static MetricsConfig fromProperties(Map<String, 
String> props) {
           try {
             mode = MetricsModes.fromString(props.get(key));
           } catch (IllegalArgumentException err) {
-            // Mode was invalid, log the error and use the default
+            // Mode was invalid, log the error and use the default (or default 
for sorted columns)
             LOG.warn("Ignoring invalid metrics mode for column {}: {}", 
columnAlias, props.get(key), err);
-            mode = spec.defaultMode;
+            if (sortedCols.contains(columnAlias)) {

Review comment:
       If you remove the if condition above, you will be able to avoid this 
change too.
   I think something as below would work.
   
   ```
   if (order != null) {
     MetricsMode sortColumnDefaultMode = sortColumnDefaultMode(defaultMode);
     Set<String> sortColumns = SortOrderUtil.sortColumns(order);
     for (String sortColumn : sortColumns) {
       spec.columnModes.put(sortColumn, sortColumnDefaultMode);
     }
   }
   ```

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -87,6 +98,18 @@ public static MetricsConfig fromProperties(Map<String, 
String> props) {
       spec.defaultMode = 
MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
     }
 
+    // First set sorted column with sorted column default (can be overridden 
by user)
+    final Set<String> sortedCols = new HashSet<>();

Review comment:
       You could probably init the set to the method call result instead of 
calling calling `addAll` later.
   
   ```
   Set<String> sortColumns = order != null ? SortOrderUtil.sortColumns(order) : 
ImmutableSet.of();
   ...
   ```

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -87,6 +98,18 @@ public static MetricsConfig fromProperties(Map<String, 
String> props) {
       spec.defaultMode = 
MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
     }
 
+    // First set sorted column with sorted column default (can be overridden 
by user)
+    final Set<String> sortedCols = new HashSet<>();

Review comment:
       nit: we usually don't mark temp vars as final in Iceberg. It is a style 
preference. I know there were some performance optimizations in old JVMs for 
final vars but I think modern JVM are smart enough these days.

##########
File path: core/src/main/java/org/apache/iceberg/util/SortOrderUtil.java
##########
@@ -64,4 +68,16 @@ static SortOrder buildSortOrder(Schema schema, PartitionSpec 
spec, SortOrder sor
 
     return builder.build();
   }
+
+  public static Set<String> getSortedColumns(SortOrder sortOrder) {
+    if (sortOrder == null) {

Review comment:
       Or keep the check here but remove in the caller. Up to you.

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -95,16 +118,32 @@ public static MetricsConfig fromProperties(Map<String, 
String> props) {
           try {
             mode = MetricsModes.fromString(props.get(key));
           } catch (IllegalArgumentException err) {
-            // Mode was invalid, log the error and use the default
+            // Mode was invalid, log the error and use the default (or default 
for sorted columns)
             LOG.warn("Ignoring invalid metrics mode for column {}: {}", 
columnAlias, props.get(key), err);
-            mode = spec.defaultMode;
+            if (sortedCols.contains(columnAlias)) {
+              mode = sortedColDefaultMode;
+            } else {
+              mode = spec.defaultMode;
+            }
           }
           spec.columnModes.put(columnAlias, mode);
         });
-
     return spec;
   }
 
+  /**
+   * Auto promote sorted columns to truncate(16) if default is set at Counts 
or None.
+   * @param defaultMode default mode
+   * @return mode to use
+   */
+  private static MetricsMode promoteSortedColumnDefault(MetricsMode 
defaultMode) {

Review comment:
       nit: `promoteSortedColumnDefault` as just `sortColumnDefaultMode`?

##########
File path: 
data/src/main/java/org/apache/iceberg/data/GenericAppenderFactory.java
##########
@@ -47,27 +47,31 @@
  */
 public class GenericAppenderFactory implements FileAppenderFactory<Record> {

Review comment:
       We have just recently introduced `WriterFactory`. We will be moving to 
that API instead of appenders.
   I think it will be sufficient to just cover the new API. Plus, it should be 
super simple as I thought about this change while implementing 
`BaseWriterFactory`. It already has a reference to `Table`. This will help us 
to avoid changing so many places around `AppenderFactory` that we will remove 
in the nearest future.

##########
File path: core/src/test/java/org/apache/iceberg/TestMetricUtil.java
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Collections;
+import java.util.Map;
+
+/**
+ * Utils for testing Metrics.
+ */
+public class TestMetricUtil {
+
+  private TestMetricUtil() {
+  }
+
+  /**
+   * @param properties table properties
+   * @param sortOrder table sort order
+   * @param schema table schema
+   * @return table
+   */
+  public static Table createTestTable(Map<String, String> properties, 
SortOrder sortOrder,

Review comment:
       What about calling `table.replaceSortOrder()` within tests and using the 
existing `createTable`?

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -87,6 +98,18 @@ public static MetricsConfig fromProperties(Map<String, 
String> props) {
       spec.defaultMode = 
MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
     }
 
+    // First set sorted column with sorted column default (can be overridden 
by user)
+    final Set<String> sortedCols = new HashSet<>();
+    MetricsMode sortedColDefaultMode = 
promoteSortedColumnDefault(spec.defaultMode);
+    if (order != null) {
+      sortedCols.addAll(SortOrderUtil.getSortedColumns(order));
+      sortedCols.stream().forEach(sc -> {
+        if (!props.containsKey(METRICS_MODE_COLUMN_CONF_PREFIX + sc)) {

Review comment:
       Do we need this if condition? I think it is fine to default the mode 
here. If the user sets an explicit value, it will override this one.

##########
File path: core/src/main/java/org/apache/iceberg/util/SortOrderUtil.java
##########
@@ -64,4 +68,16 @@ static SortOrder buildSortOrder(Schema schema, PartitionSpec 
spec, SortOrder sor
 
     return builder.build();
   }
+
+  public static Set<String> getSortedColumns(SortOrder sortOrder) {
+    if (sortOrder == null) {

Review comment:
       Do we need this extra check for null here? Looks like we validate the 
order is not null before calling. I'd probably simplify  and assume it is not 
null. It is a pretty internal utility class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to