wypb commented on code in PR #9388:
URL: https://github.com/apache/incubator-gluten/pull/9388#discussion_r2057671164
##########
gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala:
##########
@@ -416,90 +429,60 @@ case class WholeStageTransformer(child: SparkPlan,
materializeInput: Boolean = f
)
)
- SoftAffinity.updateFilePartitionLocations(Seq(allInputPartitions), rdd.id)
+ SoftAffinity.updateFilePartitionLocations(allInputPartitions, rdd.id)
rdd
}
- private def getSplitInfosFromPartitionSeqs(
- leafTransformers: Seq[BatchScanExecTransformerBase]): Seq[Seq[SplitInfo]] = {
- // If these are two batch scan transformer with keyGroupPartitioning,
- // they have same partitionValues,
- // but some partitions maybe empty for those partition values that are not present,
- // otherwise, exchange will be inserted. We should combine the two leaf
- // transformers' partitions with same index, and set them together in
- // the substraitContext. We use transpose to do that, You can refer to
- // the diagram below.
- // leaf1 Seq(p11) Seq(p12, p13) Seq(p14) ... Seq(p1n)
- // leaf2 Seq(p21) Seq(p22) Seq() ... Seq(p2n)
- // transpose =>
- // leaf1 | leaf2
- // Seq(p11) | Seq(p21) => substraitContext.setSplitInfo([Seq(p11), Seq(p21)])
- // Seq(p12, p13) | Seq(p22) => substraitContext.setSplitInfo([Seq(p12, p13), Seq(p22)])
- // Seq(p14) | Seq() ...
- // ...
- // Seq(p1n) | Seq(p2n) => substraitContext.setSplitInfo([Seq(p1n), Seq(p2n)])
-
- val allSplitInfos = leafTransformers.map(_.getSplitInfosWithIndex)
+ private def getSplitInfosFromPartitions(
+ isKeyGroupPartition: Boolean,
+ leafTransformers: Seq[LeafTransformSupport]): Seq[Seq[SplitInfo]] = {
Review Comment:
The new `getSplitInfosFromPartitions` method merges the previous
`getSplitInfosFromPartitions` and `getSplitInfosFromPartitionSeqs`, so the
`leafTransformers` parameter type needs to be `Seq[LeafTransformSupport]`. If
`isKeyGroupPartition` is true, line 457 converts the leaf transformers to
`BatchScanExecTransformerBase`.
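
To illustrate the idea, here is a rough sketch, not the code in this PR. The
traits below are simplified stand-ins defined only for this example, and
`splits` and `splitsByPartition` are hypothetical member names rather than
the real Gluten API. It shows one method taking the general
`Seq[LeafTransformSupport]` and narrowing to `BatchScanExecTransformerBase`
only on the key-group path, then combining partitions by index with
`transpose`. Flattening each transposed row is an assumption made to keep the
example short.

```scala
// Simplified stand-ins for this sketch only; NOT the real Gluten classes.
final case class SplitInfo(id: String)

trait LeafTransformSupport {
  // Hypothetical member: one split per partition.
  def splits: Seq[SplitInfo]
}

trait BatchScanExecTransformerBase extends LeafTransformSupport {
  // Hypothetical member: splits grouped by partition index (a group may be empty).
  def splitsByPartition: Seq[Seq[SplitInfo]]
  def splits: Seq[SplitInfo] = splitsByPartition.flatten
}

def getSplitInfosFromPartitions(
    isKeyGroupPartition: Boolean,
    leafTransformers: Seq[LeafTransformSupport]): Seq[Seq[SplitInfo]] = {
  if (isKeyGroupPartition) {
    // Narrow each leaf to the batch-scan type; fail loudly on anything else.
    val perLeaf: Seq[Seq[Seq[SplitInfo]]] = leafTransformers.map {
      case scan: BatchScanExecTransformerBase => scan.splitsByPartition
      case other =>
        throw new IllegalArgumentException(s"Unexpected leaf transformer: $other")
    }
    // Row i of the transpose holds partition i of every leaf.
    // Assumption of this sketch: merge each row into one flat Seq.
    perLeaf.transpose.map(_.flatten)
  } else {
    // General path: pair up the i-th split of every leaf.
    leafTransformers.map(_.splits).transpose
  }
}

// Example input mirroring the removed diagram: leaf2 has an empty third partition.
val leaf1 = new BatchScanExecTransformerBase {
  val splitsByPartition =
    Seq(Seq(SplitInfo("p11")), Seq(SplitInfo("p12"), SplitInfo("p13")), Seq(SplitInfo("p14")))
}
val leaf2 = new BatchScanExecTransformerBase {
  val splitsByPartition =
    Seq(Seq(SplitInfo("p21")), Seq(SplitInfo("p22")), Seq.empty[SplitInfo])
}
val keyGrouped = getSplitInfosFromPartitions(isKeyGroupPartition = true, Seq(leaf1, leaf2))
```

Note that `transpose` throws if the leaves do not have the same number of
partitions, which matches the expectation that key-grouped scans share the
same partition values.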
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]