rdblue commented on code in PR #9366:
URL: https://github.com/apache/iceberg/pull/9366#discussion_r1439065448
##########
core/src/main/java/org/apache/iceberg/avro/GenericAvroReader.java:
##########
@@ -69,62 +99,107 @@ public void setRowPositionSupplier(Supplier<Long> posSupplier) {
   @Override
   public T read(T reuse, Decoder decoder) throws IOException {
-    return DecoderResolver.resolveAndRead(decoder, readSchema, fileSchema, reader, reuse);
+    return reader.read(decoder, reuse);
   }
-  private static class ReadBuilder extends AvroSchemaVisitor<ValueReader<?>> {
-    private final ClassLoader loader;
+  private class ResolvingReadBuilder extends AvroWithPartnerVisitor<Type, ValueReader<?>> {
+    private final Map<Type, Schema> avroSchemas;
-    private ReadBuilder(ClassLoader loader) {
-      this.loader = loader;
+    private ResolvingReadBuilder(Types.StructType expectedType, String rootName) {
+      this.avroSchemas = AvroSchemaUtil.convertTypes(expectedType, rootName);
     }
     @Override
-    @SuppressWarnings("unchecked")
-    public ValueReader<?> record(Schema record, List<String> names, List<ValueReader<?>> fields) {
-      try {
-        Class<?> recordClass =
-            DynClasses.builder().loader(loader).impl(record.getFullName()).buildChecked();
-        if (IndexedRecord.class.isAssignableFrom(recordClass)) {
-          return ValueReaders.record(fields, (Class<? extends IndexedRecord>) recordClass, record);
+    public ValueReader<?> record(Type partner, Schema record, List<ValueReader<?>> fieldResults) {
+      Types.StructType expected = partner != null ? partner.asStructType() : null;
+      Map<Integer, Integer> idToPos = idToPos(expected);
+
+      List<Pair<Integer, ValueReader<?>>> readPlan = Lists.newArrayList();
+      List<Schema.Field> fileFields = record.getFields();
+      for (int pos = 0; pos < fileFields.size(); pos += 1) {
+        Schema.Field field = fileFields.get(pos);
+        ValueReader<?> fieldReader = fieldResults.get(pos);
+        Integer fieldId = AvroSchemaUtil.fieldId(field);
+        Integer projectionPos = idToPos.remove(fieldId);
+
+        Object constant = idToConstant.get(fieldId);
+        if (projectionPos != null && constant != null) {
+          readPlan.add(Pair.of(projectionPos, ValueReaders.replaceWithConstant(fieldReader, constant)));
+        } else {
+          readPlan.add(Pair.of(projectionPos, fieldReader));
+        }
+      }
+
+      // handle any expected columns that are not in the data file
+      for (Map.Entry<Integer, Integer> idAndPos : idToPos.entrySet()) {
+        int fieldId = idAndPos.getKey();
+        int pos = idAndPos.getValue();
+
+        Object constant = idToConstant.get(fieldId);
+        Types.NestedField field = expected.field(fieldId);
+        if (constant != null) {
+          readPlan.add(Pair.of(pos, ValueReaders.constant(constant)));
+        } else if (fieldId == MetadataColumns.IS_DELETED.fieldId()) {

Review Comment:
Yes, I think this is better than how we did it before. Previously, we would [inject these fields in the `StructReader`](https://github.com/apache/iceberg/pull/9366/files#diff-08597bac062ea934a49ebeb66da72f459dea1da642e1df2cdd2c8ca6884fed4dL578-L586), but there are improvements that we can make to that approach:
1. It isn't the struct reader's responsibility to skip or change the readers that are passed to it. Read "planning" should be done here, where the read and file schemas are both present.
2. It wasn't clear what readers should be passed in or produced here, given that readers might be replaced.
3. This unifies how new optional fields are handled with how row position and metadata fields are handled. It also sets up future default value handling using a constant reader, which is one of the reasons for making these changes.
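To make the planning idea concrete, here is a minimal sketch of a struct reader that consumes a read plan of (projection position, reader) pairs like the one built above. It is illustrative only, not the PR's actual implementation: the class name `PlannedStructReader` and the `reuseOrCreate`/`set` hooks are assumptions introduced for the example, while `ValueReader`, `Pair`, and `Decoder` are the existing Iceberg and Avro types used in the diff.

```java
// Illustrative sketch, not the PR implementation. Assumes Iceberg's
// ValueReader<T> (read(Decoder, Object)) and Pair (first()/second());
// PlannedStructReader, reuseOrCreate, and set are hypothetical names.
import java.io.IOException;
import java.util.List;

import org.apache.avro.io.Decoder;
import org.apache.iceberg.avro.ValueReader;
import org.apache.iceberg.util.Pair;

abstract class PlannedStructReader<S> implements ValueReader<S> {
  // each plan entry maps a file column to its projected position; a null
  // position means the column is decoded to keep the stream aligned but
  // its value is discarded
  private final List<Pair<Integer, ValueReader<?>>> readPlan;

  protected PlannedStructReader(List<Pair<Integer, ValueReader<?>>> readPlan) {
    this.readPlan = readPlan;
  }

  @Override
  public S read(Decoder decoder, Object reuse) throws IOException {
    S struct = reuseOrCreate(reuse);
    for (Pair<Integer, ValueReader<?>> planned : readPlan) {
      Object value = planned.second().read(decoder, null);
      if (planned.first() != null) {
        // projected field: constants, metadata columns, and missing optional
        // fields all flow through this same path because planning already
        // substituted the appropriate readers
        set(struct, planned.first(), value);
      }
    }
    return struct;
  }

  protected abstract S reuseOrCreate(Object reuse);

  protected abstract void set(S struct, int pos, Object value);
}
```

Framed this way, the struct reader only walks the plan; deciding which readers to replace with constants (row position, the `IS_DELETED` metadata column, or future default values) stays in the planning step, where both the file schema and the expected schema are available.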