iemejia commented on code in PR #12390:
URL: https://github.com/apache/gluten/pull/12390#discussion_r3491496850
##########
gluten-substrait/src/main/java/org/apache/gluten/substrait/rel/LocalFilesNode.java:
##########
@@ -96,6 +96,9 @@ public enum ReadFileFormat {
/**
* Copies an existing node, replacing its per-file extra metadata. Lets
data-lake subclasses
* decorate a generically built node without re-deriving the file listing.
+ *
+ * <p>Note: uses direct list reference transfer (not deep copy) for
efficiency, since the original
+ * node is typically discarded immediately after this constructor returns.
Review Comment:
Fixed. Updated the comment to accurately say "shallow list copy (element
references are shared, not deep-copied)". A deep copy is unnecessary here since
callers supply freshly built maps and the original node is discarded
immediately after construction.
##########
gluten-delta/src-delta40/main/scala/org/apache/gluten/delta/DeltaDeletionVectorScanInfo.scala:
##########
@@ -62,10 +64,20 @@ object DeltaDeletionVectorScanInfo {
* Materializes per-file Delta DV read options for a split, alongside each
file's metadata with
* the DV bookkeeping keys stripped. Returns None when no file in the split
carries a deletion
* vector, so callers can keep the generic split representation.
+ *
+ * Performance: resolves the table path once (using the first file) and
reuses a single Hadoop
+ * Configuration instance across all files in the partition to avoid
redundant filesystem I/O and
+ * object allocation.
*/
def normalize(partitionColumnCount: Int, partitionFiles:
Seq[PartitionedFile])
: Option[(Seq[JMap[String, Object]], Seq[DeltaFileReadOptions])] = {
- val scanInfos = extractAll(activeSparkSession, partitionColumnCount,
partitionFiles)
+ val spark = activeSparkSession
+ val hadoopConf = spark.sessionState.newHadoopConf()
+ val cachedTablePath = resolveTablePath(hadoopConf, partitionColumnCount,
partitionFiles.head)
Review Comment:
Fixed. Added an early `if (partitionFiles.isEmpty) return None` guard before
accessing `.head`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]