xuanyuanking opened a new pull request #29256:
URL: https://github.com/apache/spark/pull/29256


   ### What changes were proposed in this pull request?
   Treat `Distinct` nodes as `Aggregate` nodes in `UnsupportedOperationChecker` when checking streaming queries.
   
   ### Why are the changes needed?
   Because the SQL `UNION` clause requires deduplication, the parser generates `Distinct(Union)`, and the optimizer rule `ReplaceDistinctWithAggregate` rewrites it to `Aggregate(Union)`. This applies to both batch and streaming queries. In streaming, however, the aggregation is wrapped by state store operations, so it needs the same up-front checks as an explicit streaming aggregation.
   
   Before this change, Structured Streaming (SS) union queries in Append mode failed with the following confusing error when no watermark was defined:
   ```
   java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:529)
        at scala.None$.get(Option.scala:527)
        at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:346)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:561)
        at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:112)
   ...
   ```
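
   The idea can be sketched with a toy plan model. This is only an illustration, not the code in this PR: `Plan`, `StreamingSource`, `ToyChecker`, `checkAppendMode`, and the error text are hypothetical names that mimic Spark's Catalyst classes but are not them, and the watermark rule is simplified to "some source under the node declares a watermark".

   ```scala
   // Toy model only: these types imitate, but are not, Spark's Catalyst plan classes.
   sealed trait Plan { def children: Seq[Plan] }

   case class StreamingSource(name: String, hasWatermark: Boolean) extends Plan {
     val children: Seq[Plan] = Nil
   }
   case class Union(children: Seq[Plan]) extends Plan
   case class Distinct(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
   case class Aggregate(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }

   object ToyChecker {
     // Flatten the tree into a sequence of nodes (pre-order).
     private def collect(plan: Plan): Seq[Plan] =
       plan +: plan.children.flatMap(collect)

     // Simplified rule: a subtree "has a watermark" if any source under it declares one.
     private def hasWatermark(plan: Plan): Boolean = plan match {
       case StreamingSource(_, wm) => wm
       case other                  => other.children.exists(hasWatermark)
     }

     // Left(message) if the plan is rejected for Append mode, Right(()) otherwise.
     def checkAppendMode(plan: Plan): Either[String, Unit] = {
       val aggregations = collect(plan).filter {
         // Distinct is checked as if it were already an Aggregate, because the
         // optimizer will rewrite it into one later.
         case _: Aggregate | _: Distinct => true
         case _                          => false
       }
       aggregations.find(node => !hasWatermark(node)) match {
         case Some(_) =>
           Left("Append mode is not supported for streaming aggregations without a watermark")
         case None => Right(())
       }
     }
   }
   ```

   With this model, `Distinct(Union(...))` over sources without a watermark is rejected by the checker with a readable message, instead of failing later inside a stateful operator.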
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Affected queries now fail with a clearer error message.
   
   
   ### How was this patch tested?
   Added a new unit test.
   

