pvary commented on code in PR #9173:
URL: https://github.com/apache/iceberg/pull/9173#discussion_r1413759640
##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##########
@@ -197,17 +203,21 @@ private SplitEnumerator<IcebergSourceSplit, IcebergEnumeratorState> createEnumer
LOG.info(
"Iceberg source restored {} splits from state for table {}",
enumState.pendingSplits().size(),
- lazyTable().name());
+ tableName());
assigner = assignerFactory.createAssigner(enumState.pendingSplits());
}
+ // Create a copy of the table loader to avoid lifecycle management conflicts with the user
+ // provided table loader. This copy is only required for split planning, which uses the
+ // underlying io, and should be closed after split planning is complete
Review Comment:
I usually prefer that objects own the whole lifecycle of their child objects.
So ideally:
- `new ContinuousSplitPlannerImpl` should clone the loader itself, keep it for
as long as it needs it, and close it at the end of the reading
- `planSplitsForBatch` should clone the loader itself, keep it for as long
as it needs it, and close it at the end of the reading
- For `tableName`, I am a bit confused. In `planSplitsForBatch` we need to
clone the loader to do the planning, but we use the old loader to get the
`tableName` for logging? Why not set the `tableName` value in the constructor
and forget the whole `lazyTable` thing?
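
A rough sketch of the ownership pattern suggested above, in case it helps the discussion. Everything here is a hypothetical stand-in: `Loader`, `InMemoryLoader`, `ContinuousPlanner`, and `BatchPlanning` are illustrative names, not the real Iceberg `TableLoader`, `ContinuousSplitPlannerImpl`, or `planSplitsForBatch` signatures. The idea is only that each consumer copies the user-provided loader, closes its own copy, and that the table name is captured once at construction time.

```java
import java.util.List;

// Hypothetical stand-in for a table loader; NOT the real Iceberg TableLoader API.
interface Loader extends AutoCloseable {
  void open();

  String tableName();

  Loader copy(); // returns an independent copy with its own lifecycle

  boolean isOpen();

  @Override
  void close(); // narrowed: no checked exception
}

// Trivial in-memory implementation, used only to make the sketch runnable.
final class InMemoryLoader implements Loader {
  private final String name;
  private boolean open;

  InMemoryLoader(String name) {
    this.name = name;
  }

  @Override
  public void open() {
    open = true;
  }

  @Override
  public String tableName() {
    return name;
  }

  @Override
  public Loader copy() {
    return new InMemoryLoader(name);
  }

  @Override
  public boolean isOpen() {
    return open;
  }

  @Override
  public void close() {
    open = false;
  }
}

// Long-lived planner: copies the loader in its constructor and closes the copy in close().
final class ContinuousPlanner implements AutoCloseable {
  private final Loader ownedLoader;
  private final String tableName; // captured once, so logging never needs the loader

  ContinuousPlanner(Loader userLoader) {
    this.ownedLoader = userLoader.copy();
    this.ownedLoader.open();
    this.tableName = ownedLoader.tableName();
  }

  String tableName() {
    return tableName;
  }

  List<String> planSplits() {
    if (!ownedLoader.isOpen()) {
      throw new IllegalStateException("planner already closed");
    }
    return List.of(tableName + "-split-0"); // placeholder for real split planning
  }

  @Override
  public void close() {
    ownedLoader.close(); // only the planner's own copy is closed
  }
}

// Batch planning: the copy lives exactly as long as the planning call.
final class BatchPlanning {
  static List<String> planSplitsForBatch(Loader userLoader) {
    try (Loader copy = userLoader.copy()) {
      copy.open();
      return List.of(copy.tableName() + "-batch-split-0"); // placeholder
    }
  }
}

final class LifecycleSketchDemo {
  public static void main(String[] args) {
    Loader userLoader = new InMemoryLoader("db.tbl");
    userLoader.open();
    try (ContinuousPlanner planner = new ContinuousPlanner(userLoader)) {
      System.out.println(planner.tableName() + " " + planner.planSplits());
    }
    System.out.println(BatchPlanning.planSplitsForBatch(userLoader));
    System.out.println("user loader still open: " + userLoader.isOpen());
  }
}
```

With this shape the caller never has to know when planning finishes, and the user-provided loader is left untouched by both code paths.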
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]