Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/8395#issuecomment-134327407

@srowen I think intermediate data is cached because it significantly improves performance when the number of features is high (since the `appendBias` method performs a `System.arraycopy`). On the other hand, the `LogisticRegressionWithLBFGS` class always performs feature scaling, so redundant re-evaluation of the input RDD could take significant time. Thinking about it again, I'd prefer to restore the removed warning and limit the fix to something like this:

    if (data.getStorageLevel != StorageLevel.NONE) {
      data.unpersist(false)
    }

What do you think about it?
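For context, the guard being discussed follows a pattern Spark uses elsewhere: persist an intermediate RDD only when the caller has not already done so, and unpersist it once training finishes. The sketch below illustrates that pattern; the method and variable names are assumptions for illustration, not code from the PR, and it requires a Spark runtime to actually execute:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.mllib.regression.LabeledPoint

    // Sketch (hypothetical helper): cache the intermediate bias-appended,
    // scaled RDD only if the input is not already persisted, and release
    // the temporary copy afterwards so it does not linger in the cache.
    def trainWithScaledData(data: RDD[LabeledPoint]): Unit = {
      val handlePersistence = data.getStorageLevel == StorageLevel.NONE
      if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
      try {
        // ... run the L-BFGS optimization over `data` ...
      } finally {
        // blocking = false: request asynchronous removal of cached blocks
        if (handlePersistence) data.unpersist(blocking = false)
      }
    }

Checking `getStorageLevel` first avoids double-caching when the caller has already persisted the input, which is the concern the snippet in the comment addresses from the other direction (unpersisting an intermediate RDD that the library itself cached).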