[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

MLnick Fri, 09 Jun 2017 00:48:34 -0700

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/16774
  
    @BryanCutler getting back to this after a long delay now that `2.2` is 
about ready. Sorry about that!
    
    You mentioned the approach above for creating the `ExecutorService` being 
useful because it is instantiated at the start of `fit` (which is good) and 
then cleaned up. However, I don't see any explicit cleanup of it happening and 
if we create a daemon thread pool won't that just hang around?
    
    Overall, I prefer the following semantics:
    1. If no custom context is set, we create the default one within `fit` and 
clean it up at the end of `fit`.
    2. For most use cases, even on shared resource cluster / notebook setups, 
this should suffice as we're creating a dedicated threadpool that only lives 
while `fit` is called. For custom contexts, that is really an expert setting. I 
would prefer to keep the API very simple as I mentioned above: `def 
setExecutorService(executorService: ExecutorService): Unit`, rather than 
introducing a new trait for this purpose. I think the overhead of creating a 
(typically relatively small) threadpool non-lazily before instantiating the 
validator class is not significant. Experts can then decide exactly what they 
want to do, and when / how to clean up their threadpool (e.g. they may want to 
keep it around for some reason).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

Reply via email to