hudi-agent commented on code in PR #18862:
URL: https://github.com/apache/hudi/pull/18862#discussion_r3314902817
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieRowDataLanceReader.java:
##########
@@ -119,7 +120,23 @@ public BloomFilter readBloomFilter() {
@Override
public Set<Pair<String, Long>> filterRowKeys(Set<String> candidateRowKeys) {
- throw new HoodieException("Filtering row keys from Lance files is not
supported for Flink append-only tables without primary keys: " + path);
+ Set<Pair<String, Long>> result = new HashSet<>();
+ long position = 0;
+ boolean includeAllKeys = candidateRowKeys == null ||
candidateRowKeys.isEmpty();
Review Comment:
🤖 nit: treating an empty `candidateRowKeys` as "include all" is surprising —
most `filterRowKeys` implementations would return an empty set for an empty
input. Could you document this behavior in a Javadoc, or restrict the "include
all" branch to `null` only?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -216,29 +216,22 @@ private void checkIndexType(Configuration conf) {
}
/**
- * Validate the base file format. Flink Lance support is scoped to
append-only COW tables.
+ * Validate the base file format. Flink Lance support is scoped to COW
tables.
*/
- private void checkBaseFileFormatForRead(Configuration conf, ResolvedSchema
schema) {
- checkLanceBaseFileFormat(conf, schema);
+ private void checkBaseFileFormatForRead(Configuration conf) {
+ checkLanceBaseFileFormat(conf);
Review Comment:
🤖 nit: now that the primary-key and append-mode checks are gone,
`checkBaseFileFormatForRead` and `checkBaseFileFormatForWrite` are identical
one-line wrappers around `checkLanceBaseFileFormat`. Could you collapse them
into a single `checkBaseFileFormat(conf)` call at the two call sites?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]