[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/8013#issuecomment-130106745 This class was not added by me. I didn't touch PySpark. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/8013#issuecomment-128907692 @dbtsai Just added the objective function, and used Params to switch between different objective functions. Thanks!
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/8013#issuecomment-128553895 @mengxr @dbtsai @srowen I have put RobustRegression in the same LinearRegression codebase as requested, and included the unit tests.
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/8013#issuecomment-128553707 @dbtsai @srowen I have put RobustRegression in the same LinearRegression codebase as requested, and included the unit tests.
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/8013 [SPARK-3181][MLLIB]: Add Robust Regression Algorithm with Huber Estimator Huber Robust Regression under spark/ml/regression Unit Tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/fjiang6/spark Huawei-Robust Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8013.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8013 commit 2f67e63c89292ffdff4e498433fd786fe830d627 Author: Fan Jiang Date: 2015-08-07T00:10:41Z add RobustRegression and Unit Tests
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/7722#issuecomment-128165505 @AmplabJenkins Need your help. I can build with this command: sbt publish-local -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.2 and I can run all the tests. Please help me understand the errors: not enough arguments for constructor LinearRegressionTrainingSummary: (predictions: org.apache.spark.sql.DataFrame, predictionCol: String, labelCol: String, featuresCol: String, objectiveHistory: Array[Double])org.apache.spark.ml.regression.LinearRegressionTrainingSummary. [error] Unspecified value parameter objectiveHistory. [error] val trainingSummary = new LinearRegressionTrainingSummary(
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/7722#issuecomment-127804144 @AmplabJenkins I can build. Can you re-test please?
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/7722#issuecomment-126402876 @dbtsai @srowen Need your input to decide whether we want to add costFunc: Param[String] to LinearRegression or create a new class RobustRegression (or RobustLinearRegression).
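Editor's illustration of the first option in the question above (a string parameter that selects the objective). This is a minimal sketch in plain Python, not the code from the pull request: the names `squared_loss`, `huber_loss`, `COST_FUNCTIONS`, and `total_cost` are hypothetical, and the default threshold `delta = 1.345` is the conventional Huber tuning constant, assumed here rather than taken from the PR.

```python
from typing import Callable, Dict, Iterable


def squared_loss(residual: float) -> float:
    """Ordinary least-squares loss for one residual."""
    return 0.5 * residual * residual


def huber_loss(residual: float, delta: float = 1.345) -> float:
    """Huber loss: quadratic near zero, linear beyond +/- delta.

    delta = 1.345 is a conventional default, assumed for illustration.
    """
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual * residual
    return delta * (magnitude - 0.5 * delta)


# Hypothetical registry playing the role of a string-valued parameter.
COST_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "squared": squared_loss,
    "huber": huber_loss,
}


def total_cost(residuals: Iterable[float], cost_func: str = "squared") -> float:
    """Sum the per-residual loss chosen by the cost_func name."""
    if cost_func not in COST_FUNCTIONS:
        raise ValueError(f"unknown cost_func: {cost_func!r}")
    loss = COST_FUNCTIONS[cost_func]
    return sum(loss(r) for r in residuals)
```

With this shape, an outlier residual of 10 contributes 50 under the squared loss but far less under the Huber loss, which is the behavior a robust objective is meant to provide. The alternative in the question, a separate class, would hard-wire one loss per estimator instead of selecting it by name.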
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/7722 [SPARK-3181][MLLIB]: Add Robust Regression Algorithm with Huber Estimator Huber Robust Regression under spark/ml/regression You can merge this pull request into a Git repository by running: $ git pull https://github.com/Huawei-Spark/spark Robust Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7722.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7722 commit a4a8aa3250187653a8fb1f352b7e672d515fc5e8 Author: xinyunh Date: 2014-08-14T23:10:28Z fix the bug in 'Last' component commit b5e5d8d4ad07a649b8bcbfe71baabc60b15c7224 Author: xinyunh Date: 2014-08-14T20:13:02Z add 'Last' component commit ce65eb68ecef9880643da879abdc34dd4813a92d Author: xinyunh Date: 2014-08-14T23:10:28Z fix the bug in 'Last' component commit af4ff87851261d80aee92f407943d249909e42e8 Author: Fan Jiang Date: 2015-07-28T06:31:39Z update commit dcd757b72a1ba93f4edbaab5482ff434555a0ca4 Author: Fan Jiang Date: 2015-07-28T06:56:06Z add RobustRegression.scala
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-107773045 Failed Tests: org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream org.apache.spark.streaming.kafka.KafkaStreamSuite.Kafka input stream org.apache.spark.sql.hive.thriftserver.CliSuite.simple commands
[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/4254 [SPARK-4259][MLlib]: Add Power Iteration Clustering Algorithm with Gaussian Similarity Function Add single pseudo-eigenvector PIC Including documentations, one property file and updated pom.xml with the following codes: mllib/src/main/scala/org/apache/spark/mllib/clustering/PIClustering.scala mllib/src/test/scala/org/apache/spark/mllib/clustering/PIClusteringSuite.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/Huawei-Spark/spark PIC Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4254.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4254 commit a3c5fbe3451b665968d503fa4ee52f1f6118252a Author: Jiang Fan Date: 2015-01-22T21:52:52Z Adding Power Iteration Clustering commit d5aae2032c08d097ed3c6cd61ed2612a55a619df Author: Jiang Fan Date: 2015-01-22T21:57:35Z Adding Power Iteration Clustering and Suite test commit 3fd5bc895f1594c57a182c31e010966affb47325 Author: sboeschhuawei Date: 2015-01-23T00:17:57Z PIClustering is running in new branch (up to the pseudo-eigenvector convergence step) commit 0ef163f89ed82ed72967b51330e16ac3cf5759be Author: sboeschhuawei Date: 2015-01-23T04:20:47Z Added ConcentricCircles data generation and KMeans clustering commit 32a90dc5570ea02ee25b80c4440293581416209c Author: sboeschhuawei Date: 2015-01-23T16:48:00Z Update circles test data values commit 0700335d7b4fe9132046f034a67eb3405cd20953 Author: sboeschhuawei Date: 2015-01-23T22:30:53Z First end to end working version: but has bad performance issue commit e5df2b88c3668ecc4bc0cd25cde10dd033b9f72f Author: sboeschhuawei Date: 2015-01-24T04:20:32Z First end to end working PIC commit 929426339d9934d61878880b2182bc5e18acee6c Author: sboeschhuawei Date: 2015-01-25T11:00:07Z Added visualization/plotting of 
input/output data commit a2b1e5720266393a1813f0abe43c3709ebf46268 Author: sboeschhuawei Date: 2015-01-25T11:21:43Z Revert inadvertent update to KMeans commit b7dbcbe56767a8609314a20f24e907c426e827af Author: sboeschhuawei Date: 2015-01-26T00:03:46Z Added axes and combined into single plot for matplotlib commit f656c349b059a7df1c6415e69c2010873ba4d2d4 Author: sboeschhuawei Date: 2015-01-26T00:04:10Z Added iris dataset commit a112f38d0476cee2bb5aa49311ce98b800141f8e Author: sboeschhuawei Date: 2015-01-26T08:42:05Z Added graphx main and test jars as dependencies to mllib/pom.xml commit ace9749338c7454d17839dcf98ed75b131a21537 Author: Fan Jiang Date: 2015-01-26T18:27:50Z Update PIClustering.scala commit b29c0dbf081d8baa30a3a83b57492bf92b2f4b6a Author: Fan Jiang Date: 2015-01-26T18:57:04Z Update PIClustering.scala commit bea48eaa0cca25695c283616d86235227357980c Author: sboeschhuawei Date: 2015-01-27T00:58:57Z Converted custom Linear Algebra datatypes/routines to use Breeze. commit 90e7fa4b58b6d12f6b04dab3bf5f0a9d50f8d330 Author: sboeschhuawei Date: 2015-01-28T02:04:05Z Converted from custom Linalg routines to Breeze: added JavaDoc comments; added Markdown documentation commit be659e31f5d9b1d35561ee43620f36d26732a950 Author: sboeschhuawei Date: 2015-01-28T02:06:53Z Added mllib specific log4j commit 060e6bf8d45a211a6b71e2cba8e4bf2b14b9e72a Author: sboeschhuawei Date: 2015-01-28T06:49:12Z Added link to PIC doc from the main clustering md doc commit 24f438e9c72fcc77691fe5d70f01c1bb577ee874 Author: sboeschhuawei Date: 2015-01-28T06:50:29Z fixed incorrect markdown in clustering doc commit 88aacc8fa8aa955be2ec81caf001897b2bc91625 Author: sboeschhuawei Date: 2015-01-28T19:48:51Z Add assert to testcase on cluster sizes commit 43ab10be1c634f88d08f666df71ff15427e8a3d2 Author: sboeschhuawei Date: 2015-01-28T19:55:09Z Change last two println's to log4j logger commit 218a49d4e74b24bebf94033440904ca7411a28f0 Author: sboeschhuawei Date: 2015-01-28T20:38:04Z Applied Xiangrui's comments - 
especially removing RDD/PICLinalg classes and making noncritical methods private commit 1c3a62ea8d45609e22bf2394a73930b1a334422d Author: sboeschhuawei Date: 2015-01-28T21:23:52Z removed matplot.py and reordered all private methods to bottom of PIC commit 121e4d5fc0a0ab61a211fc71fea7a74775feb763 Author: sboeschhuawei Date: 2015-01-28T21:33:29Z Remove unused testing data files
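Editor's illustration of the idea named in the pull request title above: a Gaussian similarity function feeding a power iteration that yields a single pseudo-eigenvector. This is a small single-machine sketch in plain Python under stated assumptions, not the PIClustering.scala implementation. The function names, the `sigma` bandwidth, and the fixed iteration count are all hypothetical, and the sketch assumes every point has non-zero total similarity to the others.

```python
import math
from typing import List, Sequence


def gaussian_similarity(x: Sequence[float], y: Sequence[float],
                        sigma: float = 1.0) -> float:
    """exp(-||x - y||^2 / (2 * sigma^2)); sigma is an assumed bandwidth."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma * sigma))


def power_iteration_embedding(points: List[Sequence[float]],
                              sigma: float = 1.0,
                              iterations: int = 10) -> List[float]:
    """Return a one-dimensional embedding (pseudo-eigenvector) of the points."""
    n = len(points)
    # Affinity matrix with a zero diagonal.
    affinity = [
        [0.0 if i == j else gaussian_similarity(points[i], points[j], sigma)
         for j in range(n)]
        for i in range(n)
    ]
    degrees = [sum(row) for row in affinity]
    # Start from the degree vector, normalized to unit L1 norm.
    total = sum(degrees)
    v = [d / total for d in degrees]
    for _ in range(iterations):
        # Multiply by the row-normalized affinity matrix D^-1 A.
        nxt = [
            sum(affinity[i][j] * v[j] for j in range(n)) / degrees[i]
            for i in range(n)
        ]
        norm = sum(abs(value) for value in nxt)
        v = [value / norm for value in nxt]
    return v
```

In the usual formulation the iteration is stopped early, before the vector becomes uniform, and the entries are then clustered with k-means. The commit log above mentions KMeans clustering on generated data, which is consistent with that final step, though the details of the PR's version are not shown here.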
[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/3382 [SPARK-4510][MLlib]: Add k-medoids Partitioning Around Medoids (PAM) algorithm PAM (k-medoids) including the test case and an example. Passed the style checks. Tested and compared with K-Means in MLlib, showing more stable performance. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Huawei-Spark/spark PAM Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3382.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3382 commit 95cd43e21a5e4499fd63125dde6973a4271a0de2 Author: Jiang Fan Date: 2014-11-20T02:59:18Z add PAM algorithm with an example commit 8721fc2d0fead9e72427909d5dab455e7dcd67f9 Author: Jiang Fan Date: 2014-11-20T03:05:43Z add newline at end of file commit 9b4131a3fee5e9cd5a7ac58c7718b78236412f7e Author: Jiang Fan Date: 2014-11-20T05:05:06Z add the PAMSuite.scala
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-54686931 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-54577616 ERROR: Timeout after 10 minutes FATAL: Failed to fetch from https://github.com/apache/spark.git Can you please retest?
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-54517693 Can you please retest this? Thanks! Sent from my iPhone > On Sep 3, 2014, at 11:22 PM, Apache Spark QA wrote: > > QA tests have finished for PR 2096 at commit ab0f539. > > This patch fails unit tests. > This patch merges cleanly. > This patch adds the following public classes (experimental): > case class Params( > class HuberRobustGradient extends Gradient > class HuberRobustRegressionModel ( > > Reply to this email directly or view it on GitHub.
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-54508101 Can you please retest this? Thanks!
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-54422920 The failed test is "org.apache.spark.graphx.lib.TriangleCountSuite.Count two triangles", which is a part I never touched. What could be the possible reason for this failure?
[GitHub] spark pull request: [SPARK-3188][MLLIB]: Add Robust Regression Alg...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/2110 [SPARK-3188][MLLIB]: Add Robust Regression Algorithm with Turkey bisquare (Biweight) function Biweight Robust Regression including the test case and an example. Passed the style checks You can merge this pull request into a Git repository by running: $ git pull https://github.com/Huawei-Spark/spark fanjiang-robustregression Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2110.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2110 commit 6ca4a44a9e8bd3b9a9dbc3d15ba0b76d77c75faa Author: yzhou2001 Date: 2014-08-14T23:26:36Z Merge pull request #1 from apache/master A merge of latest changes since creation commit 41d65b749ca583a2452864578005a1b960d8b117 Author: Bo Meng Date: 2014-08-22T02:54:19Z Merge branch 'master' of https://github.com/Huawei-Spark/spark cessary, commit f0571890522eb09d092f01e8b5a208d4849e56df Author: fjiang6 Date: 2014-08-21T22:34:15Z add example HuberRobustRegression.scala commit 38c1b3c0f7dd2a6f62eae2fecb006495ffc8a064 Author: fanjiang Date: 2014-08-22T09:49:41Z add new class HuberRobustGradient and RobustRegression.scala commit ba49567719299eb90b8f7a3a3db7561a912a210f Author: fjiang6 Date: 2014-08-24T01:34:14Z add Robust Regression Algorithm with Turkey bisquare weight function (Biweight Estimates) commit f4a84a3cb938b9917ca1faa5f173152eb9b01998 Author: fjiang6 Date: 2014-08-24T17:17:39Z adjust Gradient commit 5f9e03fd393992074edff0ccea48f4887f556578 Author: fjiang6 Date: 2014-08-24T21:15:48Z adjust Robust Regression Algorithm with Turkey bisquare weight function … commit c9cafb31b234791d9728d1c5e0d2084214b1c3b5 Author: fjiang6 Date: 2014-08-24T21:31:19Z adjust Tukey bisquare (Biweight) function for Robust Regression commit 07e0e78d28d198ef36910933d0e199d0d4fa91ec Author: fjiang6 Date: 2014-08-24T21:42:32Z to validate the code style
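Editor's illustration of the Tukey bisquare (biweight) weight function that the pull request above refers to. This is a minimal sketch in plain Python, not the PR's Scala gradient: the function name is hypothetical, and the cutoff `c = 4.685` is the conventional tuning constant for standardized residuals, assumed here rather than taken from the PR.

```python
def tukey_biweight_weight(residual: float, c: float = 4.685) -> float:
    """Weight given to one (standardized) residual.

    Residuals with |residual| >= c get weight 0, so gross outliers are
    ignored entirely; small residuals get a weight close to 1.
    """
    u = residual / c
    if abs(u) >= 1.0:
        return 0.0
    return (1.0 - u * u) ** 2
```

Unlike the Huber estimator, which only reduces the influence of large residuals, this weight falls to exactly zero past the cutoff. In an iteratively reweighted least-squares loop these weights would be recomputed from the residuals at each pass.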
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/2096 [SPARK-3181][MLLIB]: Add Robust Regression Algorithm with Huber Estimator Huber Robust Regression including the test case and an example. Passed the style checks You can merge this pull request into a Git repository by running: $ git pull https://github.com/Huawei-Spark/spark fanjiang-huber Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2096.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2096 commit 6ca4a44a9e8bd3b9a9dbc3d15ba0b76d77c75faa Author: yzhou2001 Date: 2014-08-14T23:26:36Z Merge pull request #1 from apache/master A merge of latest changes since creation commit 41d65b749ca583a2452864578005a1b960d8b117 Author: Bo Meng Date: 2014-08-22T02:54:19Z Merge branch 'master' of https://github.com/Huawei-Spark/spark cessary, commit f0571890522eb09d092f01e8b5a208d4849e56df Author: fjiang6 Date: 2014-08-21T22:34:15Z add example HuberRobustRegression.scala commit 38c1b3c0f7dd2a6f62eae2fecb006495ffc8a064 Author: fanjiang Date: 2014-08-22T09:49:41Z add new class HuberRobustGradient and RobustRegression.scala