mbutrovich commented on code in PR #2258:
URL: https://github.com/apache/datafusion-comet/pull/2258#discussion_r2356081738
##########
spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala:
##########
@@ -779,6 +779,15 @@ case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
     false
   }
+  def supportedRangePartitioningDataType(dt: DataType): Boolean = dt match {
+    case _: BooleanType | _: ByteType | _: ShortType | _: IntegerType | _: LongType |
+        _: FloatType | _: DoubleType | _: TimestampType | _: TimestampNTZType | _: DecimalType |
Review Comment:
I'll add a TODO after the first round of PR feedback (I don't want to push a
commit with just a comment) explaining why we don't support `StringType` or
`BinaryType` yet.
It boils down to the `Row` API we use from Arrow: the rows that represent our
partition boundaries, which we compare against incoming batches to determine
partitions, don't support comparing dictionary-encoded varlen values against
non-dictionary-encoded ones. We'd either need to unpack the dictionaries, or
I'd have to extend comparator support in Arrow. I lean towards the latter, but
I haven't scoped that work and it's beyond the scope of this PR. I can also
open an issue and reference it in the code.
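For illustration, here is a minimal sketch of the shape the predicate plus the planned TODO could take. The case objects below are simplified stand-ins for Spark's `DataType` hierarchy (the real code in `CometExecRule` matches on classes from `org.apache.spark.sql.types`), so this is a self-contained approximation, not the actual implementation:

```scala
// Simplified stand-ins for a few of Spark's DataType classes (illustration only;
// the real predicate matches on org.apache.spark.sql.types).
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object DecimalType extends DataType
case object StringType extends DataType
case object BinaryType extends DataType

// Mirrors the shape of supportedRangePartitioningDataType: only fixed-width
// types are accepted for now.
// TODO: support StringType/BinaryType once the Arrow Row comparator can
// compare dictionary-encoded varlen values against non-dictionary-encoded
// ones (or once dictionaries are unpacked before comparison).
def supportedRangePartitioningDataType(dt: DataType): Boolean = dt match {
  case IntegerType | LongType | DecimalType => true
  case StringType | BinaryType              => false // see TODO above
  case _                                    => false
}
```

The fall-through `case _ => false` keeps the predicate conservative: any type not explicitly allowed falls back to Spark's own range partitioning.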
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]