aokolnychyi commented on a change in pull request #2240:
URL: https://github.com/apache/iceberg/pull/2240#discussion_r684557800
##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -77,6 +80,14 @@ public static MetricsConfig getDefault() {
}
public static MetricsConfig fromProperties(Map<String, String> props) {
Review comment:
I've been thinking about this change while doing recent work around
writer factories and deletes.
Naming-wise, I'd probably prefer calling it `forTable(table)`. If we decide
to store [metrics for rows in position
deletes](https://github.com/apache/iceberg/pull/2942#discussion_r683715471), we
then can provide something like `forPositionDelete(table)` to promote the path
and pos columns and apply the regular config for all other columns in optional
`row`.
##########
File path: core/src/main/java/org/apache/iceberg/util/SortOrderUtil.java
##########
@@ -64,4 +68,16 @@ static SortOrder buildSortOrder(Schema schema, PartitionSpec
spec, SortOrder sor
return builder.build();
}
+
+ public static Set<String> getSortedColumns(SortOrder sortOrder) {
Review comment:
nit: `getSortedColumns` -> `sortColumns`. We usually try to avoid `get`
in method names. It is a style preference we follow in Iceberg. Such prefixes
usually have no meaning and just make the names longer.
##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -95,16 +118,32 @@ public static MetricsConfig fromProperties(Map<String,
String> props) {
try {
mode = MetricsModes.fromString(props.get(key));
} catch (IllegalArgumentException err) {
- // Mode was invalid, log the error and use the default
+ // Mode was invalid, log the error and use the default (or default
for sorted columns)
LOG.warn("Ignoring invalid metrics mode for column {}: {}",
columnAlias, props.get(key), err);
- mode = spec.defaultMode;
+ if (sortedCols.contains(columnAlias)) {
Review comment:
If you remove the if condition above, you will be able to avoid this
change too.
I think something as below would work.
```
if (order != null) {
MetricsMode sortColumnDefaultMode = sortColumnDefaultMode(defaultMode);
Set<String> sortColumns = SortOrderUtil.sortColumns(order);
for (String sortColumn : sortColumns) {
spec.columnModes.put(sortColumn, sortColumnDefaultMode);
}
}
```
##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -87,6 +98,18 @@ public static MetricsConfig fromProperties(Map<String,
String> props) {
spec.defaultMode =
MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
}
+ // First set sorted column with sorted column default (can be overridden
by user)
+ final Set<String> sortedCols = new HashSet<>();
Review comment:
You could probably init the set to the method call result instead of
calling calling `addAll` later.
```
Set<String> sortColumns = order != null ? SortOrderUtil.sortColumns(order) :
ImmutableSet.of();
...
```
##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -87,6 +98,18 @@ public static MetricsConfig fromProperties(Map<String,
String> props) {
spec.defaultMode =
MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
}
+ // First set sorted column with sorted column default (can be overridden
by user)
+ final Set<String> sortedCols = new HashSet<>();
Review comment:
nit: we usually don't mark temp vars as final in Iceberg. It is a style
preference. I know there were some performance optimizations in old JVMs for
final vars but I think modern JVM are smart enough these days.
##########
File path: core/src/main/java/org/apache/iceberg/util/SortOrderUtil.java
##########
@@ -64,4 +68,16 @@ static SortOrder buildSortOrder(Schema schema, PartitionSpec
spec, SortOrder sor
return builder.build();
}
+
+ public static Set<String> getSortedColumns(SortOrder sortOrder) {
+ if (sortOrder == null) {
Review comment:
Or keep the check here but remove in the caller. Up to you.
##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -95,16 +118,32 @@ public static MetricsConfig fromProperties(Map<String,
String> props) {
try {
mode = MetricsModes.fromString(props.get(key));
} catch (IllegalArgumentException err) {
- // Mode was invalid, log the error and use the default
+ // Mode was invalid, log the error and use the default (or default
for sorted columns)
LOG.warn("Ignoring invalid metrics mode for column {}: {}",
columnAlias, props.get(key), err);
- mode = spec.defaultMode;
+ if (sortedCols.contains(columnAlias)) {
+ mode = sortedColDefaultMode;
+ } else {
+ mode = spec.defaultMode;
+ }
}
spec.columnModes.put(columnAlias, mode);
});
-
return spec;
}
+ /**
+ * Auto promote sorted columns to truncate(16) if default is set at Counts
or None.
+ * @param defaultMode default mode
+ * @return mode to use
+ */
+ private static MetricsMode promoteSortedColumnDefault(MetricsMode
defaultMode) {
Review comment:
nit: `promoteSortedColumnDefault` as just `sortColumnDefaultMode`?
##########
File path:
data/src/main/java/org/apache/iceberg/data/GenericAppenderFactory.java
##########
@@ -47,27 +47,31 @@
*/
public class GenericAppenderFactory implements FileAppenderFactory<Record> {
Review comment:
We have just recently introduced `WriterFactory`. We will be moving to
that API instead of appenders.
I think it will be sufficient to just cover the new API. Plus, it should be
super simple as I thought about this change while implementing
`BaseWriterFactory`. It already has a reference to `Table`. This will help us
to avoid changing so many places around `AppenderFactory` that we will remove
in the nearest future.
##########
File path: core/src/test/java/org/apache/iceberg/TestMetricUtil.java
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Collections;
+import java.util.Map;
+
+/**
+ * Utils for testing Metrics.
+ */
+public class TestMetricUtil {
+
+ private TestMetricUtil() {
+ }
+
+ /**
+ * @param properties table properties
+ * @param sortOrder table sort order
+ * @param schema table schema
+ * @return table
+ */
+ public static Table createTestTable(Map<String, String> properties,
SortOrder sortOrder,
Review comment:
What about calling `table.replaceSortOrder()` within tests and using the
existing `createTable`?
##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -87,6 +98,18 @@ public static MetricsConfig fromProperties(Map<String,
String> props) {
spec.defaultMode =
MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
}
+ // First set sorted column with sorted column default (can be overridden
by user)
+ final Set<String> sortedCols = new HashSet<>();
+ MetricsMode sortedColDefaultMode =
promoteSortedColumnDefault(spec.defaultMode);
+ if (order != null) {
+ sortedCols.addAll(SortOrderUtil.getSortedColumns(order));
+ sortedCols.stream().forEach(sc -> {
+ if (!props.containsKey(METRICS_MODE_COLUMN_CONF_PREFIX + sc)) {
Review comment:
Do we need this if condition? I think it is fine to default the mode
here. If the user sets an explicit value, it will override this one.
##########
File path: core/src/main/java/org/apache/iceberg/util/SortOrderUtil.java
##########
@@ -64,4 +68,16 @@ static SortOrder buildSortOrder(Schema schema, PartitionSpec
spec, SortOrder sor
return builder.build();
}
+
+ public static Set<String> getSortedColumns(SortOrder sortOrder) {
+ if (sortOrder == null) {
Review comment:
Do we need this extra check for null here? Looks like we validate the
order is not null before calling. I'd probably simplify and assume it is not
null. It is a pretty internal utility class.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]