mbutrovich commented on code in PR #2258:
URL: https://github.com/apache/datafusion-comet/pull/2258#discussion_r2356081738
##########
spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala:
##########
@@ -779,6 +779,15 @@ case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
     false
   }
+  def supportedRangePartitioningDataType(dt: DataType): Boolean = dt match {
+    case _: BooleanType | _: ByteType | _: ShortType | _: IntegerType | _: LongType |
+        _: FloatType | _: DoubleType | _: TimestampType | _: TimestampNTZType | _: DecimalType |
Review Comment:
I'll add a TODO after the first round of PR feedback (I don't want to push a
commit with just a comment) explaining why we don't support `StringType` or
`BinaryType` yet.
It boils down to the `Row` API we use from Arrow: the rows that represent our
partition boundaries, which we compare against incoming batches to determine
partitions, don't support comparing dictionary-encoded varlen values against
non-dictionary-encoded ones. We'd either need to unpack the dictionaries, or
I'd have to extend comparator support in Arrow. I lean towards the latter, but
I haven't scoped that work and it's beyond the scope of this PR. I can also
open an issue and reference it in the code.
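For illustration, here is a minimal sketch of the shape the predicate plus the planned TODO could take. The case objects below are simplified stand-ins for Spark's `DataType` hierarchy (the real code in `CometExecRule` matches on classes from `org.apache.spark.sql.types`), so this is a self-contained approximation, not the actual implementation:

```scala
// Simplified stand-ins for a few of Spark's DataType classes (illustration only;
// the real predicate matches on org.apache.spark.sql.types).
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object DecimalType extends DataType
case object StringType extends DataType
case object BinaryType extends DataType

// Mirrors the shape of supportedRangePartitioningDataType: only fixed-width
// types are accepted for now.
// TODO: support StringType/BinaryType once the Arrow Row comparator can
// compare dictionary-encoded varlen values against non-dictionary-encoded
// ones (or once dictionaries are unpacked before comparison).
def supportedRangePartitioningDataType(dt: DataType): Boolean = dt match {
  case IntegerType | LongType | DecimalType => true
  case StringType | BinaryType              => false // see TODO above
  case _                                    => false
}
```

The fall-through `case _ => false` keeps the predicate conservative: any type not explicitly allowed falls back to Spark's own range partitioning.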
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]