[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000188#comment-16000188 ]

Apache Spark commented on SPARK-17134:
--------------------------------------

User 'VinceShieh' has created a pull request for this issue:
https://github.com/apache/spark/pull/17894

> Use level 2 BLAS operations in LogisticAggregator
> -------------------------------------------------
>
>                 Key: SPARK-17134
>                 URL: https://issues.apache.org/jira/browse/SPARK-17134
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>            Assignee: Seth Hendrickson
>
> Multinomial logistic regression uses LogisticAggregator class for gradient
> updates. We should look into refactoring MLOR to use level 2 BLAS operations
> for the updates. Performance testing should be done to show improvements.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000176#comment-16000176 ]

Vincent commented on SPARK-17134:
---------------------------------

I will submit a PR for this issue soon.
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515529#comment-15515529 ]

Seth Hendrickson commented on SPARK-17134:
------------------------------------------

This makes sense. In my initial testing I found that standardizing the features in every iteration takes a non-trivial amount of time. Still, you mentioned the desire not to cache the standardized dataset, since it can create unnecessary memory overhead.

One solution is to let users specify that their data has already been standardized, so that we don't have to perform the extra divisions in the update method. Alternatively, we could do as you suggest above, but store the coefficients in column-major order so that we still maximize cache hits. We'll need to test both cases to truly understand this.
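To make the column-major suggestion concrete, here is a minimal sketch in plain Scala. It is an illustration only, not code from Spark: the object `LayoutSketch` and both helper names are hypothetical, and the input vector is assumed to be already standardized. With the coefficient for class i and feature j stored at `j * numClasses + i`, the inner loop over classes reads consecutive array slots.

```scala
object LayoutSketch {
  /** Copy a row-major (numClasses x numFeatures) array into column-major order. */
  def toColMajor(rowMajor: Array[Double], numClasses: Int, numFeatures: Int): Array[Double] = {
    val colMajor = Array.ofDim[Double](numClasses * numFeatures)
    var i = 0
    while (i < numClasses) {
      var j = 0
      while (j < numFeatures) {
        // (class i, feature j): row-major slot i * numFeatures + j,
        // column-major slot j * numClasses + i.
        colMajor(j * numClasses + i) = rowMajor(i * numFeatures + j)
        j += 1
      }
      i += 1
    }
    colMajor
  }

  /** Margins for one instance; x is assumed to be standardized already. */
  def marginsColMajor(colMajor: Array[Double], x: Array[Double], numClasses: Int): Array[Double] = {
    val margins = Array.ofDim[Double](numClasses)
    var j = 0
    while (j < x.length) {
      val xj = x(j)
      if (xj != 0.0) {
        val offset = j * numClasses
        var i = 0
        while (i < numClasses) {
          // Unit-stride access: consecutive classes are adjacent in memory.
          margins(i) += colMajor(offset + i) * xj
          i += 1
        }
      }
      j += 1
    }
    margins
  }
}
```

Whether this layout actually beats the row-major one depends on the number of classes, the sparsity of the features, and the cache sizes involved, which is why both cases would need to be measured.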
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515491#comment-15515491 ]

DB Tsai commented on SPARK-17134:
---------------------------------

I ran the benchmark again. The old implementation takes 1.3 hours per iteration, and the new implementation takes 3.5 hours per iteration. I ran both experiments in the same Spark job for fairness, so that they get the same number of executors. I suspect the old implementation performs better because it caches the standardized dataset.
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510672#comment-15510672 ]

DB Tsai commented on SPARK-17134:
---------------------------------

I'll try the old MLOR in the RDD API tonight when the cluster is not busy. This is a very large training dataset, around 160GB in memory. Since there are 22533 classes and 100 features, there are about 2.2M parameters in total. I expect that level 2 BLAS will help significantly in this case.
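The parameter count quoted above is simply classes times features. As a quick check, the arithmetic can be sketched as follows; the object and method names are hypothetical, and the 8-byte figure assumes the coefficients are held as `Double` values.

```scala
object ParamCountSketch {
  /** Number of coefficients: one row per class, plus one intercept per class if fitted. */
  def numCoefficients(numClasses: Int, numFeatures: Int, fitIntercept: Boolean): Long = {
    val perClass = numFeatures + (if (fitIntercept) 1 else 0)
    numClasses.toLong * perClass
  }

  /** Size in bytes of a dense Double array holding that many coefficients. */
  def denseBytes(numCoefficients: Long): Long = numCoefficients * 8L
}
```

For 22533 classes and 100 features this gives 2,253,300 coefficients without intercepts (2,275,833 with them), so a single dense copy of the coefficient array is roughly 18 MB.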
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510198#comment-15510198 ]

Seth Hendrickson commented on SPARK-17134:
------------------------------------------

Hmm, it would be nice to compare this against the old MLOR in the RDD API, just as a sanity check. I did conduct performance testing against mllib initially, though, so there shouldn't be any regressions.
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509053#comment-15509053 ]

DB Tsai commented on SPARK-17134:
---------------------------------

I'm benchmarking MLOR with 22533 classes and 100 dense features. There are 200M instances. On a cluster with 1k executors, one iteration takes 2.5 hours. It would be great if we could do some performance investigation to see whether we can push the performance further. Thanks.
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429941#comment-15429941 ]

Qian Huang commented on SPARK-17134:
------------------------------------

Thank you. I will do it.
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429196#comment-15429196 ]

Yanbo Liang commented on SPARK-17134:
-------------------------------------

[~qhuang] Please feel free to take this task and do the performance investigation. Thanks!
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428848#comment-15428848 ]

DB Tsai commented on SPARK-17134:
---------------------------------

{code:borderStyle=solid}
// Accumulate the margin of every class for one instance.
val margins = Array.ofDim[Double](numClasses)
features.foreachActive { (index, value) =>
  if (featuresStd(index) != 0.0 && value != 0.0) {
    var i = 0
    // Standardize the feature once, then reuse it for all classes.
    val temp = value / featuresStd(index)
    while (i < numClasses) {
      margins(i) += coefficients(i * numFeaturesPlusIntercept + index) * temp
      i += 1
    }
  }
}
if (fitIntercept) {
  var i = 0
  // The intercept of class i sits in the last slot of its row.
  val length = features.size
  while (i < numClasses) {
    margins(i) += coefficients(i * numFeaturesPlusIntercept + length)
    i += 1
  }
}
val maxMargin = margins.max
val marginOfLabel = margins(label.toInt)
{code}
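For comparison, the same margins can be written as one matrix-vector product, which is the shape of a level 2 BLAS `gemv` call. The following is a minimal sketch in plain Scala, not code from Spark or from the pull request: `MarginSketch`, `marginsLoop`, `gemvRowMajor` and `marginsGemv` are hypothetical names, the hand-written `gemvRowMajor` only stands in for a real BLAS routine, and the sketch assumes dense features with an intercept always fitted.

```scala
object MarginSketch {
  /** Loop form, mirroring the snippet above for a dense feature array. */
  def marginsLoop(
      coefficients: Array[Double], // row-major: class i, slot j at i * (numFeatures + 1) + j
      features: Array[Double],
      featuresStd: Array[Double],
      numClasses: Int): Array[Double] = {
    val numFeatures = features.length
    val stride = numFeatures + 1 // the last slot of each row holds the intercept
    val margins = Array.ofDim[Double](numClasses)
    var j = 0
    while (j < numFeatures) {
      if (featuresStd(j) != 0.0 && features(j) != 0.0) {
        val scaled = features(j) / featuresStd(j)
        var i = 0
        while (i < numClasses) {
          margins(i) += coefficients(i * stride + j) * scaled
          i += 1
        }
      }
      j += 1
    }
    var i = 0
    while (i < numClasses) {
      margins(i) += coefficients(i * stride + numFeatures)
      i += 1
    }
    margins
  }

  /** y = A * x for a row-major numRows x numCols matrix (stand-in for BLAS gemv). */
  def gemvRowMajor(a: Array[Double], numRows: Int, numCols: Int, x: Array[Double]): Array[Double] = {
    val y = Array.ofDim[Double](numRows)
    var i = 0
    while (i < numRows) {
      var sum = 0.0
      var j = 0
      while (j < numCols) {
        sum += a(i * numCols + j) * x(j)
        j += 1
      }
      y(i) = sum
      i += 1
    }
    y
  }

  /** Matrix-vector form: standardize once, append 1.0 for the intercept, then one gemv. */
  def marginsGemv(
      coefficients: Array[Double],
      features: Array[Double],
      featuresStd: Array[Double],
      numClasses: Int): Array[Double] = {
    val numFeatures = features.length
    val x = Array.ofDim[Double](numFeatures + 1)
    var j = 0
    while (j < numFeatures) {
      // Features with zero standard deviation contribute nothing, as in the loop form.
      if (featuresStd(j) != 0.0) x(j) = features(j) / featuresStd(j)
      j += 1
    }
    x(numFeatures) = 1.0
    gemvRowMajor(coefficients, numClasses, numFeatures + 1, x)
  }
}
```

Both forms compute the same margins; the matrix-vector form just exposes the whole update as a single call that an optimized BLAS could take over. Whether that is faster in practice is exactly what the performance testing requested in this issue is meant to establish.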
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428228#comment-15428228 ]

Qian Huang commented on SPARK-17134:
------------------------------------

I could be your backup if you are not available. This task is similar to SPARK-6685, which I worked on.
[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427529#comment-15427529 ]

Yanbo Liang commented on SPARK-17134:
-------------------------------------

This is interesting. We are also trying to use BLAS to accelerate linear algebra operations in other algorithms such as {{KMeans/ALS}}, and I have some basic performance test results. I would like to contribute to this task. Thanks!