voonhous commented on code in PR #18837:
URL: https://github.com/apache/hudi/pull/18837#discussion_r3347310213


##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiPageSource.java:
##########
@@ -115,8 +132,14 @@ public long getMemoryUsage()
     public void close()
             throws IOException
     {
-        fileGroupReader.close();
-        pageSource.close();
+        // Closer attempts every close and aggregates failures via 
addSuppressed, rethrowing the
+        // first. Registration is LIFO, so registering in reverse gives the 
original close order:
+        // recordIterator (wraps the readers) first, then fileGroupReader, 
then pageSource.
+        try (Closer closer = Closer.create()) {
+            closer.register(pageSource::close);
+            closer.register(fileGroupReader::close);
+            closer.register(recordIterator::close);

Review Comment:
   Good catch. Switched `close()` to register only `recordIterator::close` in 
the `Closer`. It's the outermost wrapper (`CloseableMappingIterator` -> 
`HoodieFileGroupReaderIterator` -> `fileGroupReader.close()` -> 
`baseFileIterator.close()`, which closes the Trino `pageSource`, plus 
`recordBuffer.close()`), so every underlying resource is now closed exactly 
once instead of 2-3x. Addressed in cce7bef.



##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -78,10 +92,14 @@
 import static io.trino.plugin.hudi.HudiErrorCode.HUDI_SCHEMA_ERROR;
 import static io.trino.plugin.hudi.HudiErrorCode.HUDI_UNSUPPORTED_FILE_FORMAT;
 import static java.lang.Math.toIntExact;
-import static org.apache.hudi.common.model.HoodieRecord.HOODIE_META_COLUMNS;
+import static 
org.apache.hudi.common.model.HoodieRecord.PARTITION_PATH_METADATA_FIELD;
+import static 
org.apache.hudi.common.model.HoodieRecord.RECORD_KEY_METADATA_FIELD;
 
 public final class HudiUtil

Review Comment:
   Renamed `HOODIE_META_COLUMNS` -> `HUDI_REQUIRED_META_COLUMNS` to make 
explicit it's the minimal merge-required subset (record-key + partition-path), 
distinct from the upstream 5-field `HoodieRecord.HOODIE_META_COLUMNS`. Updated 
both usages in `HudiUtil`; the `HudiTestUtils` static import is the upstream 
5-field constant, so it's left as-is. Addressed in 557bc9e.



##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -397,4 +410,53 @@ public static Schema 
getLatestTableSchema(HoodieTableMetaClient metaClient, Stri
             throw new TrinoException(HUDI_FILESYSTEM_ERROR, e);
         }
     }
+
+    public static List<HiveColumnHandle> getOrderingColumnHandles(Table table, 
TypeManager typeManager, Lazy<HoodieTableMetaClient> lazyMetaClient, 
HiveTimestampPrecision timestampPrecision)
+    {
+        RecordMergeMode recordMergeMode = 
lazyMetaClient.get().getTableConfig().getRecordMergeMode();
+        if (recordMergeMode == null || recordMergeMode == 
RecordMergeMode.COMMIT_TIME_ORDERING) {
+            // if commit time ordering is enabled, return empty list
+            return Collections.emptyList();
+        }
+
+        ImmutableList.Builder<HiveColumnHandle> columns = 
ImmutableList.builder();
+        List<String> orderingColumnNames = 
lazyMetaClient.get().getTableConfig().getOrderingFields();
+
+        int hiveColumnIndex = 0;
+        for (Column field : table.getDataColumns()) {
+            // ignore unsupported types rather than failing
+            if (orderingColumnNames.contains(field.getName())) {
+                HiveType hiveType = field.getType();
+                if (typeSupported(hiveType.getTypeInfo(), 
table.getStorage().getStorageFormat())) {
+                    columns.add(createBaseColumn(field.getName(), 
hiveColumnIndex, hiveType, getType(hiveType, typeManager, timestampPrecision), 
REGULAR, field.getComment()));
+                }
+            }
+            hiveColumnIndex++;
+        }
+
+        return columns.build();
+    }
+
+    /**
+     * Converts the given {@link HoodiePairData} into a {@link Map}.
+     * <p>
+     * Special handling is applied for null keys:
+     * <ul>
+     *   <li>If a key is null, it is stored in the map as a {@code null} 
entry.</li>
+     *   <li>If multiple entries share the same key (including null), the 
latest value overwrites the previous one.</li>
+     * </ul>
+     *
+     * @param pairData the HoodiePairData containing key-value pairs
+     * @param <K> the type of keys maintained by the resulting map
+     * @param <V> the type of mapped values
+     * @return a {@link Map} containing all key-value pairs from the input data
+     */
+    public static <K, V> Map<K, V> collectAsMap(HoodiePairData<K, V> pairData)
+    {
+        // HashMap allows null keys, so collect directly; on duplicate keys 
the later entry wins.
+        return pairData.collectAsList().stream()

Review Comment:
   Switched the three-arg `Stream.collect` to a plain `HashMap` + `forEach` as 
suggested - clearer for a put-each-pair loop, same semantics (null keys 
allowed, last write wins). Addressed in 9336b06.



##########
pom.xml:
##########
@@ -130,6 +129,12 @@
     <hive.avro.version>1.11.4</hive.avro.version>
     <presto.version>0.273</presto.version>
     <trino.version>390</trino.version>
+    <!-- Trino SPI version that hudi-trino-plugin main code compiles against. 
-->
+    <trino.connector.version>482-SNAPSHOT</trino.connector.version>
+    <!-- Trino version used only for *-tests.jar classifier deps. Trino does 
not publish
+         test-jars for tagged releases, so this tracks a snapshot. Built 
locally via the

Review Comment:
   Resolved. Both `trino.connector.version` and `trino.connector.test.version` 
are pinned to the released `481` now - no more 480/482 SNAPSHOTs and no 
main-vs-test divergence. The main Trino artifacts (trino-spi, etc.) at 481 are 
on Maven Central; the test-jars (trino-spi/-filesystem/-hive/-main test 
classifiers) aren't published anywhere, so the Hudi Trino CI checks out the 
`trinodb/trino` `481` tag and installs just those test-jars into the local m2 
before building the connector (`.github/workflows/hudi_trino_ci.yml`). 
Downstream contributors running the plugin tests do the same via the documented 
build steps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to