Re: [PR] Flink: Fix IcebergSource tableloader lifecycle management in batch mode [iceberg]

via GitHub Mon, 04 Dec 2023 03:47:09 -0800


pvary commented on code in PR #9173:
URL: https://github.com/apache/iceberg/pull/9173#discussion_r1413759640



##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##########
@@ -197,17 +203,21 @@ private SplitEnumerator<IcebergSourceSplit, IcebergEnumeratorState> createEnumer
       LOG.info(
           "Iceberg source restored {} splits from state for table {}",
           enumState.pendingSplits().size(),
-          lazyTable().name());
+          tableName());
       assigner = assignerFactory.createAssigner(enumState.pendingSplits());
     }
 
+    // Create a copy of the table loader to avoid lifecycle management conflicts with the user
+    // provided table loader. This copy is only required for split planning, which uses the
+    // underlying io, and should be closed after split planning is complete

Review Comment:
   I usually prefer that objects own the whole lifecycle of their child objects.
   
   So ideally:
   - `new ContinuousSplitPlannerImpl` should clone the loader itself, keep it for as long as it needs it, and close it at the end of the reading
   - `planSplitsForBatch` should clone the loader itself, keep it for as long as it needs it, and close it at the end of the reading
   - For `tableName`, I am a bit confused. In `planSplitsForBatch` we need to clone the loader to do the planning, but we use the old loader to get the `tableName` for logging? Why not set the `tableName` value in the constructor, and forget the whole `lazyTable` thing?
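
To make the ownership idea concrete, here is a minimal, self-contained sketch. It does not use the real Iceberg classes: `Loader`, `InMemoryLoader`, `ContinuousPlanner`, and the `copy()` method are hypothetical stand-ins invented for illustration, and only the names `planSplitsForBatch` and `tableName` are borrowed from the PR. The assumption is that whoever clones a loader also opens and closes that clone, and that the table name is captured once in a constructor rather than looked up lazily.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LoaderLifecycleSketch {

  /** Hypothetical stand-in for a table loader; NOT the real Iceberg interface. */
  interface Loader extends Closeable {
    void open();
    boolean isOpen();
    String tableName();
    List<String> planSplits();
    Loader copy();
    @Override
    void close(); // narrowed for this sketch: no checked exception
  }

  /** In-memory loader; copies share a counter so open instances can be observed. */
  static final class InMemoryLoader implements Loader {
    private final String name;
    private final AtomicInteger openCount;
    private boolean open;

    InMemoryLoader(String name) {
      this(name, new AtomicInteger());
    }

    private InMemoryLoader(String name, AtomicInteger openCount) {
      this.name = name;
      this.openCount = openCount;
    }

    @Override public void open() {
      if (!open) { open = true; openCount.incrementAndGet(); }
    }
    @Override public boolean isOpen() { return open; }
    @Override public String tableName() { requireOpen(); return name; }
    @Override public List<String> planSplits() {
      requireOpen();
      return Arrays.asList(name + "-split-0", name + "-split-1");
    }
    @Override public Loader copy() { return new InMemoryLoader(name, openCount); }
    @Override public void close() {
      if (open) { open = false; openCount.decrementAndGet(); }
    }
    int openInstances() { return openCount.get(); }
    private void requireOpen() {
      if (!open) { throw new IllegalStateException("loader is not open"); }
    }
  }

  /** Batch planning: the method clones, opens, uses, and closes its own loader. */
  static List<String> planSplitsForBatch(Loader userLoader) {
    try (Loader own = userLoader.copy()) {
      own.open();
      return own.planSplits();
    }
  }

  /** Continuous planning: the planner owns its clone until the planner is closed. */
  static final class ContinuousPlanner implements Closeable {
    private final Loader own;
    private final String tableName;

    ContinuousPlanner(Loader userLoader) {
      this.own = userLoader.copy();
      this.own.open();
      // Captured once in the constructor, so logging never needs a loader later.
      this.tableName = own.tableName();
    }

    String tableName() { return tableName; }
    List<String> planSplits() { return own.planSplits(); }
    @Override public void close() { own.close(); }
  }

  public static void main(String[] args) {
    InMemoryLoader user = new InMemoryLoader("db.tbl");
    System.out.println(planSplitsForBatch(user));
    try (ContinuousPlanner planner = new ContinuousPlanner(user)) {
      System.out.println(planner.tableName() + " " + planner.planSplits());
    }
    System.out.println("open instances: " + user.openInstances());
  }
}
```

In this sketch the user-provided loader is never opened or closed by the planning code, so the caller can manage it independently. Whether the real `TableLoader` can be cloned and closed this cheaply is something the PR itself would need to confirm.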



