singhpk234 commented on code in PR #15939:
URL: https://github.com/apache/iceberg/pull/15939#discussion_r3074141487


##########
api/src/main/java/org/apache/iceberg/stats/FieldStatistic.java:
##########
@@ -125,13 +125,13 @@ public static Types.StructType 
fieldStatsFor(Types.NestedField field, int baseFi
               baseFieldId + AVG_VALUE_SIZE.offset(),
               AVG_VALUE_SIZE.fieldName(),
               Types.IntegerType.get(),
-              "Avg value size of variable-length types (String, Binary)"));
+              "Avg value size in bytes of variable-length types (String, 
Binary)"));
       fields.add(
           optional(
               baseFieldId + MAX_VALUE_SIZE.offset(),
               MAX_VALUE_SIZE.fieldName(),
               Types.IntegerType.get(),
-              "Max value size of variable-length types (String, Binary)"));
+              "Max value size in bytes of variable-length types (String, 
Binary)"));

Review Comment:
   Thanks for taking a look into @nastra, I believe it will be a bit tricky if 
based on table versions we store different sizes (compressed encoded for v1 - 
v3 and then v4 uncompressed and un encoded), think of tables migrated from v3 
to v4, i think there have been proposals to just have a dedicated metric for 
this as well like in the doc https://github.com/apache/iceberg/issues/10703, 
which would be a bit easier to reason about specially for migrated table 
   
   For CBO integration if you see engines such as Trino are already having a 
constant multiplier added to compensate for this (based on TPC-DS) which is 
just a type check on the column Handle 
   
https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java#L237-L250
   
   Spark CBO by default is off, but i think where injection point is considered 
we have estimateStats in the connnector which would also be suff easy to 
reason. 
   
   I believe from integ pov to have a dedicated metric would be very nice, 
specially for migrated tables v3 -> v4
   
   Please let me know wdyt ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to