HeartSaVioR edited a comment on issue #23634: [SPARK-26154][SS] Streaming 
left/right outer join should not return outer nulls for already matched rows
URL: https://github.com/apache/spark/pull/23634#issuecomment-464907821
 
 
   @tdas 
   Now I'm also thinking about how to add matched flag to 
KeyWithIndexToValueStateStore, but fail to think how it is compatible with 
existing state, specifically, regarding optional matched flag.
   
   Does it mean you're suggesting not to version state format for streaming 
join? Situation would be pretty different which we choose.
   
   1) Keep state format as 1
   
   I'm not sure this is safe to guarantee accessibility of `matched` flag, 
given that state will be mixed up with existing rows and new rows which some 
don't have `matched` field (numFields also doesn't count `matched` field) and 
some have `matched` field. Even if this can be possible via specific trick 
(knowledge of internal), I'd like to address this without breaking any 
interface/expectation cause it could be a blocker when we want to change 
InternalRow. (Like allowing different schema of actual rows - this could occur 
crash on runtime.)
   
   Please shed a light (sketched idea) on how to do it if your would want to 
guide me here.
   
   2) Have two state formats
   
   I think refactoring is still necessary if we version state format (we could 
flatten some interface and implementations into one though), cause we will end 
up having two StreamingJoinStateManagers which most of things are duplicated 
but implementation of KeyWithIndexToValueStore will be a bit different. If we 
don't add interface but only add boolean flag to classify state formats we 
would end up having code regarding state format being mixed up with logic, 
complicated and hard to debug.
   
   In overall I totally understand the importance about reducing the code diff, 
but I'd hope we'd not too concerned about large code diff if we find its worth 
on introducing more codes. (I meant I'd hope the amount of code change would 
not be the first thing to consider.) To evaluate the worth we might be ideal to 
put aside of length of code diff when deciding general direction.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to