anoopj commented on code in PR #15634:
URL: https://github.com/apache/iceberg/pull/15634#discussion_r3003733441
##########
core/src/main/java/org/apache/iceberg/ManifestReader.java:
##########
@@ -281,8 +294,22 @@ private CloseableIterable<ManifestEntry<F>> open(Schema
projection) {
Preconditions.checkArgument(
format != null, "Unable to determine format of manifest: %s",
file.location());
+ boolean unpartitioned = spec.rawPartitionType().fields().isEmpty();
+
+ // V4+ manifests omit the partition field when unpartitioned (Parquet
cannot represent
+ // empty structs, and the field is meaningless regardless of format). Mark
it optional so
+ // the reader returns null for the missing field instead of throwing. The
field must stay
+ // in the projection to preserve positional access for callers like
StructProjection.
+ // For older versions where the empty struct is present, making it
optional is harmless.
List<Types.NestedField> fields = Lists.newArrayList();
- fields.addAll(projection.asStruct().fields());
+ for (Types.NestedField field : projection.asStruct().fields()) {
Review Comment:
As mentioned above, we can clean this up after v4 data structures get added,
since we are folding partition into content_stats.
##########
core/src/main/java/org/apache/iceberg/V4Metadata.java:
##########
@@ -278,28 +279,35 @@ static Schema wrapFileSchema(Types.StructType fileSchema)
{
}
static Types.StructType fileType(Types.StructType partitionType) {
- return Types.StructType.of(
- DataFile.CONTENT.asRequired(),
- DataFile.FILE_PATH,
- DataFile.FILE_FORMAT,
- required(
- DataFile.PARTITION_ID, DataFile.PARTITION_NAME, partitionType,
DataFile.PARTITION_DOC),
- DataFile.RECORD_COUNT,
- DataFile.FILE_SIZE,
- DataFile.COLUMN_SIZES,
- DataFile.VALUE_COUNTS,
- DataFile.NULL_VALUE_COUNTS,
- DataFile.NAN_VALUE_COUNTS,
- DataFile.LOWER_BOUNDS,
- DataFile.UPPER_BOUNDS,
- DataFile.KEY_METADATA,
- DataFile.SPLIT_OFFSETS,
- DataFile.EQUALITY_IDS,
- DataFile.SORT_ORDER_ID,
- DataFile.FIRST_ROW_ID,
- DataFile.REFERENCED_DATA_FILE,
- DataFile.CONTENT_OFFSET,
- DataFile.CONTENT_SIZE);
+ List<Types.NestedField> fields = Lists.newArrayList();
+ fields.add(DataFile.CONTENT.asRequired());
+ fields.add(DataFile.FILE_PATH);
+ fields.add(DataFile.FILE_FORMAT);
+ if (!partitionType.fields().isEmpty()) {
Review Comment:
In v4, we are folding partition into `content_stats`, so this problem goes
away when we merge the v4 `TrackedFile` code in.
##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -65,7 +67,8 @@ private ManifestWriter(
Long snapshotId,
Long firstRowId,
Map<String, String> writerProperties) {
- this.file = file.encryptingOutputFile();
+ this.format =
FileFormat.fromFileName(file.encryptingOutputFile().location());
Review Comment:
V4 will always have Parquet right? so the decision to use the right
extension is made somewhere upstream?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]