kbendick commented on a change in pull request #2916:
URL: https://github.com/apache/iceberg/pull/2916#discussion_r685765465
##########
File path: core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java
##########
@@ -105,7 +112,11 @@ protected BaseEqualityDeltaWriter(StructLike partition,
Schema schema, Schema de
this.dataWriter = new RollingFileWriter(partition);
this.eqDeleteWriter = new RollingEqDeleteWriter(partition);
- this.posDeleteWriter = new SortedPosDeleteWriter<>(appenderFactory, fileFactory, format, partition);
+ this.posDeleteWriter = new SortedPosDeleteWriter<>(appenderFactory,
+ fileFactory,
+ format,
+ partition,
+ getRecordsNumThreshold(properties));
Review comment:
Nit: The function name getRecordsNumThreshold is confusing to me.
Also, we're now passing a potentially large table properties map through a
large number of frameworks where previously we only passed Iceberg-specific
classes. I'm not so worried about the size (though it does seem wasteful to
pass the whole table properties map around when only one field is currently
needed). I'm more worried about serializability. Some map implementations,
such as those from Guava (ImmutableMap comes to mind), do not always serialize
cleanly, particularly with Kryo (which is not Spark's default serializer but
is so widely used that it is the de facto default).
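
To illustrate the alternative I have in mind, here is a minimal sketch that resolves the threshold once on the driver side and passes only a primitive long to the writer, so the map itself never needs to be serialized. The class name, the property key, and the default value are all hypothetical placeholders for illustration; they are not taken from this PR or from Iceberg.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: reads one setting out of the table properties so that
// only a long (trivially serializable) has to travel to the executors.
final class PosDeleteThresholdExample {
  // Placeholder key and default, NOT real Iceberg property names or values.
  static final String EXAMPLE_KEY = "example.pos-delete.records-threshold";
  static final long EXAMPLE_DEFAULT = 100_000L;

  private PosDeleteThresholdExample() {
  }

  // Returns the configured threshold, or the default when the key is absent.
  static long resolve(Map<String, String> properties) {
    String value = properties.get(EXAMPLE_KEY);
    return value == null ? EXAMPLE_DEFAULT : Long.parseLong(value);
  }

  public static void main(String[] args) {
    Map<String, String> properties = new HashMap<>();
    properties.put(EXAMPLE_KEY, "5000");
    // The writer would then receive this long instead of the whole map.
    long threshold = resolve(properties);
    System.out.println(threshold);
  }
}
```

With something like this, the constructor would take the resolved long rather than the properties map, which sidesteps the serialization question for this field.
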
Do we have any tests to ensure these new changes can be serialized? Or, at
the least, if you're going to pass the whole table properties map around like
this, can you check that serialization doesn't break when using Kryo? I think
there are some unit tests that check Kryo and Java serialization with Spark.
Essentially, just as in Flink, the driver needs to be able to serialize all
of this code and send it to the executors (similar to how the job manager sends
generated code to the task managers responsible for individual subtasks).
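
For reference, this is roughly the kind of round-trip check I mean, sketched with plain Java serialization only. The class and method names are hypothetical, and this does not exercise Kryo; a Kryo check would need the Kryo library (or the existing Spark test helpers) and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical test helper: writes an object with Java serialization and
// reads it back, failing loudly if either direction breaks.
final class JavaSerializationRoundTripExample {
  private JavaSerializationRoundTripExample() {
  }

  static <T extends Serializable> T roundTrip(T original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(original);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Java serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    // Stand-in for the table properties map; the key is a placeholder.
    HashMap<String, String> properties = new HashMap<>();
    properties.put("example.key", "42");
    HashMap<String, String> copy = roundTrip(properties);
    if (!properties.equals(copy)) {
      throw new AssertionError("round-tripped map differs from the original");
    }
    System.out.println("round trip ok");
  }
}
```

The same shape of test, run against whatever map type actually ends up in the writer, would catch a non-serializable implementation before it reaches a cluster.
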
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]