AnishMahto commented on code in PR #50875:
URL: https://github.com/apache/spark/pull/50875#discussion_r2096465030
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2105,6 +2105,12 @@ class Analyzer(override val catalogManager:
CatalogManager) extends RuleExecutor
catalog, "table-valued functions")
}
}
+ if (u.isStreaming && !resolvedFunc.isStreaming) {
Review Comment:
These are great questions.
First of all, the intention of the STREAM keyword is to convert a batch
relation into a streaming relation, with the primary use case being that it can
be used to populate downstream streaming tables in pipelines using a batch
source. If a relation is _already_ streaming, the STREAM keyword should just be
no-op.
Second, I _think_ the TVF is responsible for defining whether it's streaming
or not on its own. The `TableFunctionRegistry` trait at least is a registry of
function names -> `LogicalPlan`, and the `LogicalPlan` backing the TVF can
define whether its streaming or not (ex. `Range` is [not
streaming](https://github.com/apache/spark/blob/6ad0bd448b6e9d65cdc3bea9145e8c0bea8a4454/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L998)).
And it looks like the resolution of a TVF is simply replacing it with the
corresponding LogicalPlan mapped to that TVF in the registry, which then gets
analyzed.
With my changes as-is, for TVFs I'm not converting the LogicalPlan (that
backs the TVF, as per the function registry) to be streaming - I am just
validating that when STREAM is used on a TVF, the resolved LogicalPlan for the
TVF is indeed streaming - but I don't even know if any default/builtin
functions in the registry are defined by a streaming logical plan anyway.
When I think about it, I don't actually think the pipelines module will need
the ability to convert TVFs to streaming in the first place. In theory users
can always define a table that is populated by a batch TVF, and then STREAM
from that table.
Hence I decided to just remove `STREAM <tvf>` support for now, we can
revisit adding it back later if a concrete use-case comes up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]