sahnib commented on code in PR #44323:
URL: https://github.com/apache/spark/pull/44323#discussion_r1600366874


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala:
##########
@@ -219,10 +222,41 @@ object StreamingSymmetricHashJoinHelper extends Logging {
           attributesWithEventWatermark = AttributeSet(otherSideInputAttributes),
           condition,
           eventTimeWatermarkForEviction)
-        val inputAttributeWithWatermark = oneSideInputAttributes.find(_.metadata.contains(delayKey))
-        val expr = watermarkExpression(inputAttributeWithWatermark, stateValueWatermark)
-        expr.map(JoinStateValueWatermarkPredicate.apply _)
 
+        // If the condition itself is empty (for example, left_time < left_time + INTERVAL ...),
+        // then we will not have generated a stateValueWatermark.
+        if (stateValueWatermark.isEmpty) {
+          None
+        } else {
+          // For example, if the condition is of the form:
+          //    left_time > right_time + INTERVAL 30 MINUTES
+          // Then this extracts left_time and right_time.
+          val attributesInCondition = AttributeSet(
+            condition.get.collect { case a: AttributeReference => a }
+          )
+
+          // Construct an AttributeSet so that we can perform equality between attributes,
+          // which we do in the filter below.
+          val oneSideInputAttributeSet = AttributeSet(oneSideInputAttributes)
+
+          // oneSideInputAttributes could be [left_value, left_time], and we just
+          // want the attribute _in_ the time-interval condition.
+          val oneSideStateWatermarkAttributes = attributesInCondition.filter { a =>
+            oneSideInputAttributeSet.contains(a)
+          }
+
+          // There should be a single attribute per side in the time-interval condition, so,
+          // filtering for oneSideInputAttributes as done above should lead us with 1 attribute.
+          if (oneSideStateWatermarkAttributes.size == 1) {
+            val expr =

Review Comment:
   As discussed offline, this assumption does not seem to be correct. We
actually need to find the partial join condition in which the other side has
an eventTime attribute, and use that attribute to calculate the watermark
predicate.
   
   As an aside, it might be beneficial to combine this function with 
`getStateWatermark`, since both have similar logic. 
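
   To make the suggestion concrete, here is a minimal sketch of one possible
reading of it: split the join condition into its conjuncts, keep only the
conjuncts that reference a watermarked attribute from the other side, and take
the one-side attribute from those. Everything in it (`Attr`, `AddInterval`,
`splitConjuncts`, `stateWatermarkAttribute`, the `side` and `hasWatermark`
fields) is a hypothetical toy stand-in, not Spark's Catalyst API and not the
code in this PR; `hasWatermark` only imitates the `delayKey` metadata check.

```scala
// Toy model only: none of these types are Spark classes.
object JoinConditionSketch {
  sealed trait Expr
  // `hasWatermark` imitates "metadata.contains(delayKey)" on a real attribute.
  final case class Attr(name: String, side: String, hasWatermark: Boolean) extends Expr
  final case class AddInterval(child: Expr, minutes: Long) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Flatten nested ANDs into the individual partial join conditions.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Collect every attribute referenced anywhere inside an expression.
  def references(e: Expr): Set[Attr] = e match {
    case a: Attr           => Set(a)
    case AddInterval(c, _) => references(c)
    case GreaterThan(l, r) => references(l) ++ references(r)
    case EqualTo(l, r)     => references(l) ++ references(r)
    case And(l, r)         => references(l) ++ references(r)
  }

  // For the state kept on `oneSide`: look only at conjuncts in which the other
  // side contributes a watermarked attribute, and return the one-side attribute
  // found there. Returns None unless exactly one such attribute exists.
  def stateWatermarkAttribute(condition: Expr, oneSide: String): Option[Attr] = {
    val candidates = splitConjuncts(condition).flatMap { conjunct =>
      val refs = references(conjunct)
      val otherSideHasEventTime = refs.exists(a => a.side != oneSide && a.hasWatermark)
      if (otherSideHasEventTime) refs.filter(_.side == oneSide).toSeq else Seq.empty
    }.distinct
    if (candidates.size == 1) candidates.headOption else None
  }
}
```

   In this toy model, for `left_id = right_id AND left_time > right_time +
INTERVAL 30 MINUTES` the sketch selects `left_time` for the left side, whereas
collecting every left-side attribute from the whole condition would yield both
`left_id` and `left_time` and fail the `size == 1` check. For a condition such
as `left_time > right_id`, where `right_id` carries no watermark, it returns
`None` for the left side.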



##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala:
##########
@@ -257,6 +257,75 @@ class StreamingInnerJoinSuite extends StreamingJoinSuite {
     )
   }
 
+

Review Comment:
   Let's also add a test case for a join condition where we compare eventTime
with some other attribute (for example, id).
   
   See https://github.com/apache/spark/pull/44323/files#r1582300122 for 
context. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org