[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 You commented on my code, not on the idea. My code was hacked together quickly, it can be cleaned up a lot. Your comments don't disprove that separating the refactoring of the listener bus hierarchy from the introduction of queues is impossible or undesirable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @vanzin I pushed some comments on your code. I think that trying to keep the exact same class hierarchy leads to a very complex code, with many drawbacks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 @bOOm-X I pushed some code to my repo: https://github.com/vanzin/spark/tree/SPARK-18838 Which is an attempt to do things the way I've been trying to explain. It tries to keep changes as local as possible to `LiveListenerBus`, creating a few types that implement the grouping and asynchronous behavior. You could do filtering by extending the new `AsyncListener`, for example, and adding it to the live listener bus. It's just a p.o.c. so I cut a few corners (like metrics), and I only ran `SparkListenerSuite`, but I'm just trying to show a different approach that leaves the `ListenerBus` hierarchy mostly the same as now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 I really dislike `WithListenerBus` - both as a name and as a concept. There's already a `ListenerBus` trait; if it's not enough or is broken in some way, it should be fixed, instead of being patched by introducing yet more complexity in the hierarchy. I think part of the confusion here is that the current code is trying to both refactor the `ListenerBus` hierarchy and add the concept of queues are the same time. From my point of view they're different things and could be done separately. For example you could add queues to `LiveListenerBus` only, which is really the only place that matters in the end. Maybe it won't be optimal in its first iteration, but it would be a much easier change to review. I don't doubt that there's benefit in taking a holistic look into this part of the class hierarchy; but it would be good to do that separately, both so that we can clearly see that the proposed hierarchy makes sense, and so that it's easier to review things. It's easier to wrap your head around the code if it's focused on one problem instead of two. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @vanzin `GroupOfListener` is just a set of listeners, which handle all the metric stuff. it allows to decouple this metric aspect from the `ListernerBus` and the `LiveListenerBus`. And It allows to enforce the dependency model between listeners (one message processed sequentially). I can change its name if you want. The `BusQueue` just handle the queuing / dequeueing generic processing, which includes the consumer thread start an stop and the queuing strategy (drop if full). `WithListenerBus` is indeed just the declaration of the method implemented in `ListenerBus`. This declaration is shared between the `LiveListenerBus` and the `ReplayListenerBus`, and the UI listeners are added to them through it. The `ListenerBus` still contains a simple synchronous implementation of them used by the `ReplayListenerBus`. The `LiveListenerBus` has its own based on multiple queues for some kind of set of listeners. I can rename this interface if you want. I will do a pass to try to fix the code style issues. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @cloud-fan In my opinion, having the queues indexed by a string and reflecting that in the API is a bit too error prone in this case. What you want is to be in an isolated queue. With this kind of API it is easy to have an other listener added to the same queue because of a conflicting label. In the contrary, it is quite easy to not had 2 depending listeners not in the same queue because of case, ..., of the label. With a map of queues, it enforces the mutability of the queue, which is not really necessary here. And I think that it is a good idea to enforce the fact that depending listeners should be put in the same queue, it increases the readability and the clarity of the API. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 Agree that it still seems like there's too many moving parts here. I don't see a whole lot of difference between `BusQueue` and `GroupOfListener` (which is a weird name, btw). A queue can have a bunch of listeners and a `LiveListenerBus` can have a bunch of queues. I don't see the need for anything more complicated than that. `WithListenerBus` is another thing I don't understand; aside from the really weird name, it seems to be mostly the same thing as `ListenerBus`. Also to reinforce a previous comment, your code has a ton of style issues. It doesn't matter that checkstyle doesn't complain about them; you still have to follow the code convention of the project or we'll be forever pointing out style issues in your code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18253 I think `BusQueue` is a good abstraction, but we can still simplify other parts. My proposal: ``` // Do we need a better name like ListenesGroup? it's very similar to the previous LiveListenerBus. class BusQueue extends SparkListenerBus { val eventQueue = ... } class LiveListenerBus { val listenersGroups: Map[String, BusQueue] = ... def addListener(listener, groupName = "default") { val group = listenerGroups.getOrUpdate(groupName, new BusQueue ...) group.addListener(listener) } def post(event) { listenersGroups.foreach { case (_, group) => group.post(event) } } } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @cloud-fan I will update the PR description with some details on the implementation --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18253 The PR description is good for explaining the new behavior, but can you say more about the implementation? IMO, we just need to duplicate the event queue for each important listener like event logging listener, and non-important listeners can share one event queue like the current behavior. Then each event queue is processed by an individual thread. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @vanzin I simplify a lot the code. There are now only one implementation for the queue and for the group of listeners. I removed the extra trait in the listener hierarchy too. Can you take a look ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @cloud-fan There are now multiple queues. Many threads send events to a dispatcher which copy the event in each queue. One thread per queue poll the events from its dedicated queue and call its attached listener(s). For listener bus there is now two pure traits. the first one WithListenerBus allow to add / remove listener (current behavior). By default all the current implementation implement this trait . Another trait WithMultipleListenerBus inherit from the previous one and add a method to add isolated listener. Only the LiveListenerBus and the ReplayListenerBus implement this trait. For the second one, it is just a synonym of the simple addListener method. All the UI listeners are added to both through this method as a group of listeners (they are inter-dependent). The StreamingListenerBus, the EventLoggingListener, the ExecutorAllocationManager are added to the LiveListenerBus (with this method) to a dedicated queue. The "extralisteners" are added to it as a group of listeners in a dedicated queue too. All other listeners are added to the "default" queue with the old method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @vanzin I put an ArrayBlockingQueue as you wanted --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79524/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79524/testReport)** for PR 18253 at commit [`aca4dc5`](https://github.com/apache/spark/commit/aca4dc5b440576d8a35e4b2c083eb217dbeaaaf2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79524/testReport)** for PR 18253 at commit [`aca4dc5`](https://github.com/apache/spark/commit/aca4dc5b440576d8a35e4b2c083eb217dbeaaaf2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79517/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79517/testReport)** for PR 18253 at commit [`4edcee7`](https://github.com/apache/spark/commit/4edcee7f68b95dd78a6a487aa054cbd9bafcae66). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79517/testReport)** for PR 18253 at commit [`4edcee7`](https://github.com/apache/spark/commit/4edcee7f68b95dd78a6a487aa054cbd9bafcae66). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 I was out for the last couple of weeks. But, taking a quick look, I really would prefer for you to stop trying to implement your own blocking queue without numbers that justify the need for it. Putting event listeners in separate queues does not require you to write a custom blocking queue implementation; if there's benefit in that, you can do it as a separate PR, when you actually provide evidence that backs up the necessity for it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18253 What's the processing model now? Previously it's very simple, we have an event queue, and many threads send events to this queue, and one thread poll events from this and call listeners. What is it now? Can you give a summary about the implementation? e.g. How you change the API in `ListenerBus` and what's the new class hierarchy for listener buses? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79296/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79296/testReport)** for PR 18253 at commit [`19ee415`](https://github.com/apache/spark/commit/19ee415f3a074e97d6f550fed46a8556ef3ce62b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79296/testReport)** for PR 18253 at commit [`19ee415`](https://github.com/apache/spark/commit/19ee415f3a074e97d6f550fed46a8556ef3ce62b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79271/testReport)** for PR 18253 at commit [`a3b193a`](https://github.com/apache/spark/commit/a3b193a5511be12282db020cd48d5518d86da696). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79271/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #79271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79271/testReport)** for PR 18253 at commit [`a3b193a`](https://github.com/apache/spark/commit/a3b193a5511be12282db020cd48d5518d86da696). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18253 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @cloud-fan Yes it is ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18253 is this a successor of https://github.com/apache/spark/pull/16291 ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @vanzin : Any news ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 `PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 11312 a.prang 20 0 17.661g 1.699g 66924 S 8.3 0.7 1:53.41 java` Ok it is good now. Do you have an idea why the PySpark packaging test are failling ? I tried to rebase but no impact --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78624/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #78624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78624/testReport)** for PR 18253 at commit [`a3b193a`](https://github.com/apache/spark/commit/a3b193a5511be12282db020cd48d5518d86da696). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #78624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78624/testReport)** for PR 18253 at commit [`a3b193a`](https://github.com/apache/spark/commit/a3b193a5511be12282db020cd48d5518d86da696). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 please test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78569/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #78569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78569/testReport)** for PR 18253 at commit [`f423363`](https://github.com/apache/spark/commit/f423363d2712cfec7fd93f5ff2ef1a078408ce9f). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #78569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78569/testReport)** for PR 18253 at commit [`f423363`](https://github.com/apache/spark/commit/f423363d2712cfec7fd93f5ff2ef1a078408ce9f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 And idle spark-shell should be using 0% of the CPU. ``` PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 1304 vanzin20 0 6773996 743104 31468 S 0.3 2.3 0:37.28 java ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 With a `LockSupport.parkNanos(1000)` instead I get: ``` PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 43448 a.prang 20 0 17.663g 2.378g 8 S 35.2 0.9 3:41.04 java ``` Is it ok ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 To illustrate what I'm trying to say, here's what `top` says for an *idle* spark-shell with your code: ``` PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 24247 vanzin20 0 6778304 766336 31836 S 201.0 2.3 1:34.06 java ``` I hope you see why that's an issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 > I change the Thread.sleep to a Thread.yield to be less agressive for the thread unscheduling How does that help? Now you have a thread that never sleeps. It gives up its scheduler slot but is still active and will be scheduled again, even if it doesn't need to. Can you explain in numbers why exactly you're against any of the other solutions that actually prevent these threads from being scheduled when not needed? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78412/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18253 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #78412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78412/testReport)** for PR 18253 at commit [`7c4c040`](https://github.com/apache/spark/commit/7c4c04022aabda2f1b056b4ff1e3bc8f0f1aa5cf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 > Well, you could use an ArrayBlockingQueue. Then no extra object is allocated either. Yes I agree. But you get the synchronization too. I am still agree that it should not have a big impact yet. But using an ArrayBlockingQueue does not simplify the code a lot. The current implementation is not complicated, not too verbose, and base on a simple pure scala array. I do not think that it has a huge complexity cost compared to the java ArrayBlockingQueue. I change the Thread.sleep to a Thread.yield to be less agressive for the thread unscheduling. Even with very few messages it should not consume too much CPU, and it will be much more reactive when messages are bursting. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18253 **[Test build #78412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78412/testReport)** for PR 18253 at commit [`7c4c040`](https://github.com/apache/spark/commit/7c4c04022aabda2f1b056b4ff1e3bc8f0f1aa5cf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 > This implementation had 2 advantages... it use much less synchronization. Yes, but have you quantified how much you win with that? If the blocking queue approach has enough throughput for the listener bus, it's safer to use it. > is that no object is created for each message added to the queue Well, you could use an `ArrayBlockingQueue`. Then no extra object is allocated either. Here's a link with numbers for `ArrayBlockingQueue`: https://github.com/LMAX-Exchange/disruptor/wiki/Performance-Results 4M ops per sec in the 1P-1C case looks plenty fast for Spark's need. > I change it to 1 ms instead of 20ms. I think if you really insist on going this route, you should use `LockSupport.park/unpark` instead of fighting for CPU time like this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user bOOm-X commented on the issue: https://github.com/apache/spark/pull/18253 @vanzin Ok it is ready now. > why SynchronousListenerBus? It doesn't seem to add anything interesting over the existing interface. I removed it. > the whole "group of listeners" abstraction seems unnecessary. As far as I see it can be folded into the "listener queue" concept - a queue feeds events to a collection of listeners. I simplified it and add usage in the other commits. It is basically usefull to hold the metrics, and I need a common way to add a group of dependent listeners to the LiveListenerBus and the ReplayBus. > Changing "post" to "postToAll" as part of this change also is adding a lot of unnecessary noise. I'm not a fan of the current class hierarchy of the listener bus and I think that change makes sense, but at the same time it should be done separately since it's distracting here 100% agree. I removed it > I'd also like to see better justification for your custom queue implementation. Have you identified the use of BlockingQueue as a hotspot in the current code? This implementation had 2 advantages: it is a 1 producer - 1 consumer queue whereas the BlockingQueue is a n producers - m consumers. So it use much less synchronization. The other advantage (the main one) is that no object is created for each message added to the queue. So it produces a lot less garbage. More independent queues we have, more it is significant. > My main worry is the sleep; if you're unlucky, your queue will always be 20ms behind in processing events and may suffer if there's a sudden burst while it's sleeping. I change it to 1 ms instead of 20ms. This time is much less than the average processing time of the fastest listener (around 5 ms for the HeartbeatListener). It is just to force the consumer thread to escape in case of empty queue to give more chance to the producer thread to be scheduled. I can remove it if you want. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org