Github user feynmanliang commented on the pull request:

    https://github.com/apache/spark/pull/8241#issuecomment-134831829
  
    OK, I understand why the ClassTag is necessary, thanks.
    
    @jkbradley @sabhyankar I have some concerns with this PR:
    
    1. In the other PRs, `lclBcModel` is stored as a local stack variable so 
closure capture will only serialize that variable along with the task. Here, 
since it's getting mixed in via a trait, it's no longer a local stack variable 
but rather an attribute of the class instance. My concern is that closure 
serialization will now serialize the entire model instance along with the task, 
nullifying any performance benefits this may bring.
    
    Does that make sense? Is there a way for us to check that isn't happening?
    
    2. I'm a little uncomfortable that we don't restrict the types this trait 
can be mixed in to; this might lead to people using this in places we don't 
expect. This isn't a problem for ML models but since there's upper bound for 
mllib models in general, we have to decide what's better:
     a. duplicate the broadcast code in each mllib model to prevent unintended 
use of this trait
     b. leave it to each developer's best judgement
    Personally I'm indifferent, leaning slightly towards (a).
    
    3. We should make the filename consistent with the attribute name


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to