[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-77221980

That's correct: element i should have the error/loss for the ensemble containing trees {0, 1, ..., i}.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
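The semantics above (element i is the error of the ensemble made of trees 0 through i) can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the code from this PR: a local `IndexedSeq` stands in for `RDD[LabeledPoint]`, each tree is modeled as a plain function of the feature vector, and `EachIterationSketch`, its local `LabeledPoint` stand-in, and `squaredError` are hypothetical names. Keeping a running cumulative prediction means each iteration adds only one tree's contribution instead of re-summing all earlier trees.

```scala
// Minimal sketch, assuming a local IndexedSeq stands in for RDD[LabeledPoint]
// and each tree is modeled as a function of the feature vector.
// All names here are hypothetical, not MLlib's API.
object EachIterationSketch {
  // Local stand-in for MLlib's LabeledPoint.
  final case class LabeledPoint(label: Double, features: Array[Double])

  /** Returns an array whose element i is the mean loss of the ensemble
    * built from trees 0..i (inclusive). */
  def evaluateEachIteration(
      data: IndexedSeq[LabeledPoint],
      trees: IndexedSeq[Array[Double] => Double],
      treeWeights: IndexedSeq[Double],
      loss: (Double, Double) => Double): Array[Double] = {
    require(trees.length == treeWeights.length, "one weight per tree")
    require(data.nonEmpty, "data must be non-empty")
    // Cumulative prediction of the ensemble so far, one entry per point.
    val cumulative = Array.fill(data.length)(0.0)
    trees.indices.map { i =>
      var total = 0.0
      data.indices.foreach { j =>
        // Add only tree i's weighted contribution instead of re-summing trees 0..i.
        cumulative(j) += treeWeights(i) * trees(i)(data(j).features)
        total += loss(cumulative(j), data(j).label)
      }
      total / data.length
    }.toArray
  }

  /** Squared error between a prediction and a label. */
  def squaredError(prediction: Double, label: Double): Double = {
    val diff = prediction - label
    diff * diff
  }
}
```

For example, with two unit-weight trees that predict `1.0` and `2.0 * x(0)` on the points (label 1.0, x = 0.0) and (label 3.0, x = 1.0), the result is `Array(2.0, 0.0)`: the first tree alone has a mean squared error of 2.0, and adding the second tree fits both points exactly.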
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-77105855

Yes, but each element of the array corresponds to the error/loss at each iteration, right?
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-77099027

@MechCoder No problem; sorry I didn't make the JIRAs clearer! Calling it `errorPerIteration` sounds OK unless we allow users to pass in evaluators, in which case the evaluator might be something new like Accuracy, which isn't an "error" metric. I'd still vote for `evaluateEachIteration` in case we allow this later on.
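To illustrate why a metric-neutral name is the safer choice, here is a hypothetical sketch in which the per-point metric is a parameter. `PointMetric`, `SquaredErrorMetric`, and `AccuracyMetric` are illustrative names, not MLlib classes; the accuracy metric assumes labels in {0, 1} with the ensemble's raw score thresholded at 0, and each tree's weighted predictions are assumed to be precomputed as a `[tree][point]` array.

```scala
// Hypothetical sketch: the metric is a parameter, so it need not be an "error".
// PointMetric, SquaredErrorMetric and AccuracyMetric are illustrative names,
// not MLlib classes.
object PluggableEvaluatorSketch {
  /** A per-point metric; evaluateEachIteration averages it over all points. */
  trait PointMetric {
    def apply(prediction: Double, label: Double): Double
  }

  /** Squared error: lower is better. */
  object SquaredErrorMetric extends PointMetric {
    def apply(prediction: Double, label: Double): Double = {
      val diff = prediction - label
      diff * diff
    }
  }

  /** 0/1 accuracy for labels in {0, 1}, thresholding the raw score at 0:
    * higher is better, so it is not an "error" metric. */
  object AccuracyMetric extends PointMetric {
    def apply(prediction: Double, label: Double): Double = {
      val predictedLabel = if (prediction > 0.0) 1.0 else 0.0
      if (predictedLabel == label) 1.0 else 0.0
    }
  }

  /** Element i is the averaged metric for the ensemble of trees 0..i.
    * weightedTreePredictions(i)(j) is tree i's weighted output on point j. */
  def evaluateEachIteration(
      labels: Array[Double],
      weightedTreePredictions: Array[Array[Double]],
      metric: PointMetric): Array[Double] = {
    require(labels.nonEmpty, "labels must be non-empty")
    val cumulative = new Array[Double](labels.length)
    weightedTreePredictions.map { treePreds =>
      require(treePreds.length == labels.length, "one prediction per point")
      var total = 0.0
      for (j <- labels.indices) {
        cumulative(j) += treePreds(j)
        total += metric(cumulative(j), labels(j))
      }
      total / labels.length
    }
  }
}
```

With labels `Array(1.0, 0.0)` and per-tree predictions `Array(Array(0.5, 0.5), Array(0.0, -1.0))`, accuracy rises from 0.5 to 1.0 when the second tree is added while the squared error stays at 0.25, a result that a name containing "error" would describe poorly.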
Github user MechCoder closed the pull request at: https://github.com/apache/spark/pull/4819
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-77003845

@jkbradley Just one quick clarification, please. When you say `evaluateEachIteration` should return an Array of Doubles, do you mean that each element corresponds to the cumulative error per iteration (i.e., per tree)? In that case, how does the name `errorPerIteration` sound?
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76990366

Ouch. I just realised what you meant. Scratch my previous couple of comments. :/
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76779548

@MechCoder I had intended to use this internally and to expose a public method. (The `evaluateEachIteration` method was the public one, but feel free to think of a better name.) Yes, the evaluator was the loss metric, which should probably be an optional parameter (defaulting to the training metric).

* [https://issues.apache.org/jira/browse/SPARK-6025]: This is the JIRA for the public method.
* [https://issues.apache.org/jira/browse/SPARK-5972]: This is the JIRA for the internal optimization.

I'm OK with combining the two JIRAs in one PR since they are closely related. For the internal optimization, the "residual" to store is not really the residual but rather the cumulative prediction of the ensemble; that in turn can be used to compute both the gradient and the error. (Note it will be important to use the cached residual for computing the gradient, not just the objective.) That may require adding some internal API to ensembles to permit prediction from a pre-computed sum of trees' predictions.
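Assuming squared-error loss purely for illustration, the point about caching the cumulative prediction rather than the residual can be written out as below. The notation is ours, not from the PR: h_m is tree m, w_m its weight, F_i the cached cumulative prediction after tree i, e_i the value in element i of the per-iteration array, and r_j^{(i+1)} the pseudo-residual that the next tree would be fit to.

```latex
\begin{aligned}
F_i(x) &= \sum_{m=0}^{i} w_m\, h_m(x) = F_{i-1}(x) + w_i\, h_i(x), \qquad F_{-1}(x) \equiv 0,\\
L(y, F) &= (y - F)^2, \qquad \frac{\partial L}{\partial F} = -2\,(y - F),\\
e_i &= \frac{1}{n} \sum_{j=1}^{n} L\big(y_j, F_i(x_j)\big), \qquad
r_j^{(i+1)} = -\left.\frac{\partial L}{\partial F}\right|_{F = F_i(x_j)} = 2\,\big(y_j - F_i(x_j)\big).
\end{aligned}
```

Both e_i and r_j^{(i+1)} depend on each point only through y_j and F_i(x_j), so one cached value per point suffices for the error and for the gradient.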
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76527936

Also, the present code is unoptimized since it makes two passes over the data RDD: one to update the residual and one to calculate the error. But that can be taken care of after we discuss the design.
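The two passes can in principle be fused into one. A minimal sketch, assuming squared-error loss and plain arrays standing in for the cached predictions and labels; `FusedPassSketch`, `updateAndEvaluate`, and `PassResult` are hypothetical names, not code from this PR. Each point is visited once: its cached cumulative prediction is updated with the new tree's weighted output, and both the loss and the negative gradient for the next tree are derived from that single updated value.

```scala
// Minimal sketch, assuming squared-error loss L = (y - F)^2 and local arrays
// standing in for RDDs. All names are hypothetical.
object FusedPassSketch {
  /** Result of one pass: updated cumulative predictions, the negative
    * gradients the next tree would be fit to, and the mean loss. */
  final case class PassResult(
      cumulative: Array[Double],
      negGradients: Array[Double],
      meanLoss: Double)

  def updateAndEvaluate(
      labels: Array[Double],
      cumulative: Array[Double],   // cached F_{i-1}(x_j)
      newTreePreds: Array[Double], // h_i(x_j)
      treeWeight: Double): PassResult = {
    require(
      labels.length == cumulative.length && labels.length == newTreePreds.length,
      "arrays must have one entry per point")
    require(labels.nonEmpty, "labels must be non-empty")
    val n = labels.length
    val updated = new Array[Double](n)
    val negGrad = new Array[Double](n)
    var lossSum = 0.0
    var j = 0
    while (j < n) {
      // One visit per point: update F, then derive loss and gradient from it.
      updated(j) = cumulative(j) + treeWeight * newTreePreds(j)
      val residual = labels(j) - updated(j)
      lossSum += residual * residual
      negGrad(j) = 2.0 * residual // -dL/dF for L = (y - F)^2
      j += 1
    }
    PassResult(updated, negGrad, lossSum / n)
  }
}
```

On an actual RDD, the same per-point body would likely go inside a single `map` over (label, cached prediction) pairs, so the data is traversed once per iteration instead of twice.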
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76527811

@jkbradley I am assuming that this is what you intended. It works, but I'm not sure about the present design, which differs from the design that you had posted in the JIRA:

`def evaluateEachIteration(data: RDD[LabeledPoint], evaluator): Array[Double]`

I am not sure how this would work if the existing residual is not passed. Could you also say what the `Array[Double]` is supposed to be?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76511909

Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28110/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76511907

[Test build #28110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28110/consoleFull) for PR 4819 at commit [`7d4ed48`](https://github.com/apache/spark/commit/7d4ed483e0a0c58669ab00421d00eecda832cfba).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76509618

[Test build #28110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28110/consoleFull) for PR 4819 at commit [`7d4ed48`](https://github.com/apache/spark/commit/7d4ed483e0a0c58669ab00421d00eecda832cfba).

* This patch merges cleanly.