GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/16186

    [SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery 
should be sent only to the listeners in the same session as the query

    ## What changes were proposed in this pull request?
    
    Listeners added with `sparkSession.streams.addListener(l)` are added to a 
SparkSession. So events only from queries in the same session as a listener 
should be posted to the listener. Currently, all the events gets rerouted 
through the Spark's main listener bus, that is, 
    - StreamingQuery posts event to StreamingQueryListenerBus. Only the queries 
associated with the same session as the bus posts events to it.
    - StreamingQueryListenerBus posts event to Spark's main LiveListenerBus as 
a SparkEvent.
    - StreamingQueryListenerBus also subscribes to LiveListenerBus events thus 
getting back the posted event in a different thread. 
    - The received is posted to the registered listeners.
    
    The problem is that *all StreamingQueryListenerBuses in all sessions* gets 
the events and posts them to their listeners. This is wrong.
    
    In this PR, I solve it by making StreamingQueryListenerBus track active 
queries (by their runIds) when a query posts the QueryStarted event to the bus. 
This allows the rerouted events to be filtered using the tracked queries.
    
    Note that this list needs to be maintained separately
    from the `StreamingQueryManager.activeQueries` because a terminated query 
is cleared from
    `StreamingQueryManager.activeQueries` as soon as it is stopped, but the 
this ListenerBus must
    clear a query only after the termination event of that query has been 
posted lazily, much after the query has been terminated.
    
    Credit goes to @zsxwing for coming up with the initial idea.
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-18758

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16186
    
----
commit 2c73f1e9965ea4ad28c9f7f0574540d614fa5de7
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2016-12-07T03:19:01Z

    Fixed bug

commit d3057d5cee64a0b0d308452511730770f4866bd7
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2016-12-07T03:42:11Z

    Simpler fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to