[GitHub] incubator-spark pull request: [WIP] SPARK-1058, Fix Style Errors a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/557#issuecomment-34543052 Merged build finished.
Re: [GitHub] incubator-spark pull request:
I think we could do it the way YARN does: any JIRA creation is forwarded to the dev@ list, which also carries discussion of new features and bugs and all Jenkins pre-commit build messages. Any update to a specific JIRA, such as an assignee change or a comment, is forwarded to the issues@ list. YARN also has a commit@ list for svn commits. Maybe we can adopt some of this; it's just a suggestion ^_^

--
From: Xuefeng Wu ben...@gmail.com
Sent: Saturday, February 8, 2014, 14:34
To: dev@spark.incubator.apache.org
Subject: Re: [GitHub] incubator-spark pull request:

GitHub has this feature, but these mails come from g...@git.apache.org, so I think some of the GitHub information is filtered out. https://github.com/blog/811-reply-to-comments-from-email

On Sat, Feb 8, 2014 at 2:21 PM, Reynold Xin r...@databricks.com wrote:

I don't think it does.

On Fri, Feb 7, 2014 at 8:58 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

If we reply to these emails, will the reply be posted on the pull request discussion board automatically? If yes, that would be very nice. -- Nan Zhu

On Friday, February 7, 2014 at 9:23 PM, Henry Saputra wrote:

I am with Chris on this one. These GitHub notifications are similar to the JIRA updates that most ASF projects send to the dev@ list, and they are valid messages that contributors to the project should care about. The PPMC members in particular (who will hopefully be PMC members soon) need to know about them, and they become the audit trail/archive of development discussions for the ASF. We already have the user@ list, which is targeted at people with questions about using Spark and is the proper list for people interested in using Spark. As Matei said, you can easily filter these GitHub notification emails. Thanks, - Henry

On Fri, Feb 7, 2014 at 6:02 PM, Chris Mattmann mattm...@apache.org wrote:

Guys, this GitHub discussion seems like dev discussion, in which case it must be on the dev list and not moved. The whole point of this is that development, including the conversations related to it, which are the lifeblood of the project, should occur on the ASF mailing lists. Refactoring the lists is one thing for the more automated messages, but the comments below look like Kay commenting on some relevant stuff, in which case I would argue against (paraphrased) moving it to some ASF list that those who care can subscribe to. Those who care in this case should be people who care about Kay's comments (which aren't automated commit messages from some bot; they are relevant dev comments), in which case those who care should be the PMC. My suggestion is that if a notifications list is set up, it can be for automated stuff, but *NOT* for dev discussion; that needs to happen on the dev list. If it's on another list, then I would expect to see it periodically (frequently; with enough diligence to VOTE on, discuss, and contribute to) flushed or summarized on the dev list. Cheers, Chris

-----Original Message-----
From: Andrew Ash and...@andrewash.com
Reply-To: dev@spark.incubator.apache.org
Date: Friday, February 7, 2014 5:43 PM
To: dev@spark.incubator.apache.org
Subject: Re: [GitHub] incubator-spark pull request:

+1 on moving this stuff to a separate mailing list. It's Apache policy that discussion is archived, but it's not policy that it must be interleaved with other dev discussion. Let's move it to a spark-github-discuss list (or a different name) and people who care to see it can subscribe.

On Fri, Feb 7, 2014 at 5:19 PM, Reynold Xin r...@databricks.com wrote:

I concur wholeheartedly ...

On Fri, Feb 7, 2014 at 4:55 PM, Dean Wampler deanwamp...@gmail.com wrote:

This SPAM is not doing anyone any good. How about another mailing list for people who want to see this? Sent from my rotary phone.

On Feb 7, 2014, at 10:33 AM, mridulm g...@git.apache.org wrote:

Github user mridulm commented on the pull request: https://github.com/apache/incubator-spark/pull/517#issuecomment-34484468 I am hoping that the PR Prashant Sharma submitted would also include the ability to check these things once committed! Thanks
[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/263#issuecomment-34546763 Merged build started.
[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/468#issuecomment-34546755 Merged build triggered.
[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/263#issuecomment-34546762 Merged build triggered.
[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/263#issuecomment-34546802 Merged build finished.
[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/263#issuecomment-34546803 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12630/
[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/468#issuecomment-34547492 Merged build finished.
[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/468#issuecomment-34547493 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12629/
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
GitHub user martinjaggi opened a pull request: https://github.com/apache/incubator-spark/pull/563

new MLlib documentation for optimization, regression and classification

new documentation with tex formulas, hopefully improving usability and reproducibility of the offered MLlib methods. also did some minor changes in the code for consistency. scala tests pass.

for easier merging, we could maybe rebase these changes (only feb 7 is relevant) after https://github.com/apache/incubator-spark/pull/552 is merged?

jira: https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark polishing-opt-MLlib

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-spark/pull/563.patch

commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T15:57:23Z

    minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T15:59:43Z

    enable mathjax formula in the .md documentation files
    code by @shivaram

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T16:31:29Z

    split MLlib documentation by techniques and linked from the main mllib-guide.md site

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T17:04:26Z

    enabling inline latex formulas with $.$
    same mathjax configuration as used in math.stackexchange.com
    sample usage in the linear algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T02:19:38Z

    minor polishing, as suggested by @pwendell

commit 93d74988c33a9e4ef0d15e39c8b8fc9e6c36bb28
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T16:33:24Z

    renaming LeastSquaresGradient not to confuse with squared regularizer or a squared gradient.
    added some more comments as what the loss functions are good for

commit e4cbe99bbcf7f53ebb8f1a0d2e0b869a4922bca4
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T16:34:45Z

    use d for the number of features
    try to be consistent, that n is the number of data examples in the RDD, and each of them has d entries (also in documentation)

commit 79768fd3429df5c6d56f05ac93bdd8cf4355d946
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:13:17Z

    correct scaling for MSE loss to be consistent with the documentation

commit 1e228062b01ac806c4bd032eb0975a8b92431fd9
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:15:44Z

    new classification and regression documentation with complete mathematical formulations. trying to be general for adding future ML methods as well. table of all subgradients used for reference. this change also required a small addition to the mathjax configuration, to allow equation numbers.

commit 89e472f4121debb175b625ab0c138e24c4e60de8
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:16:51Z

    new optimization documentation explaining GD and SGD and the distributed versions that MLlib implements.

commit a33be78a47bad1745a03a6e0ee1a4ea1a7893805
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:38:57Z

    better comments in SGD code for regression

commit 73f5e71e3d9a253ff378907fca202b8d6aae1268
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T22:41:42Z

    lambda R() in documentation

commit eec58c9c860def9b3b7604c990ec1697812bcbbf
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T17:31:05Z

    telling what updater actually does
    also use proper scaling for the L2 regularization (using 1/2 as in the documentation)

commit 2c1cf8d35145081a61865f55f4e48fcfbafddbbe
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T17:56:01Z

    remove broken url

commit ecbac73a7450fc90ef1509d9a410c9b627617130
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T17:57:12Z

    better description of GradientDescent
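The conventions named in the commit messages above (n data examples, d features, a regularizer R weighted by lambda, and the 1/2 factor on the L2 term) can be read together roughly as follows. This is a sketch inferred from those commit messages, not text taken from the pull request's documentation; the symbols w (weight vector), L (per-example loss) and gamma (step size) are assumptions introduced here.

```latex
% Regularized objective over n examples (x_i, y_i), each x_i in R^d
f(w) \;=\; \lambda\, R(w) \;+\; \frac{1}{n} \sum_{i=1}^{n} L(w;\, x_i, y_i),
\qquad w \in \mathbb{R}^{d}

% L2 regularizer with the 1/2 scaling, so that its gradient is simply w
R(w) \;=\; \tfrac{1}{2}\, \lVert w \rVert_2^{2},
\qquad \nabla R(w) \;=\; w

% One stochastic (sub)gradient step on a sampled example i, step size gamma
w \;\leftarrow\; w \;-\; \gamma \left( \lambda\, w \;+\; \nabla_{w} L(w;\, x_i, y_i) \right)
```

With the 1/2 factor the regularizer contributes exactly lambda times w to each step, which is presumably why the commit aligns the code's scaling with the documentation.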
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34553608 Jenkins add to whitelist.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34553612 Jenkins, test this please.
[GitHub] incubator-spark pull request: tex formulas in the documentation
Github user martinjaggi commented on the pull request: https://github.com/apache/incubator-spark/pull/552#issuecomment-34553631 ok thanks!
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34553685 Merged build started.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34553684 Merged build triggered.
[GitHub] incubator-spark pull request: [WIP] SPARK-1058, Fix Style Errors a...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/557#issuecomment-34553870 @ScrapCodes Words cannot express my elation at having this patch. I noticed there are still style errors. Did you want me to merge this as-is and then you will add future pull requests (to avoid conflicts)?
[GitHub] incubator-spark pull request: [WIP] SPARK-1058, Fix Style Errors a...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/557#issuecomment-34554122 Hey @ScrapCodes, I noticed the indent size is inconsistent. The rule is to always use 2 spaces. If you are breaking the initialization of a code block (e.g. a function signature) across lines, then it's okay to use 4 spaces for the continuation lines to distinguish them from the body. I think Scala is silent on this exception, but it's the convention we usually use. If you could go through and address those, I'm happy to merge an intermediate clean-up to avoid conflicts.
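The indentation rule described above could be checked mechanically along these lines. This is only a rough sketch, not the style checker from the pull request: the function name `indentation_problems` is hypothetical, and the check assumes that "2 spaces, or 4 for a continuation" can be approximated as "spaces only, in multiples of 2".

```python
def indentation_problems(source):
    """Return a list of (line_number, message) for lines that break the
    assumed rule: leading whitespace is spaces only, in multiples of 2."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped:
            continue  # blank lines carry no indentation to check
        if stripped.startswith("*"):
            continue  # doc-comment continuation lines sit at an odd offset
        indent = line[:len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % 2 != 0:
            problems.append(
                (number, "indent of %d spaces is not a multiple of 2" % len(indent)))
    return problems


# A signature broken across lines uses 4 spaces; the body uses 2.
good = "def add(\n    a: Int,\n    b: Int): Int = {\n  a + b\n}\n"
bad = "def one(): Int = {\n   1\n}\n"
print(indentation_problems(good))
print(indentation_problems(bad))
```

A check this simple cannot tell a legitimate 4-space continuation from a body that is wrongly indented by 4, so it would only catch the grossest inconsistencies.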
[GitHub] incubator-spark pull request: tex formulas in the documentation
Github user martinjaggi closed the pull request at: https://github.com/apache/incubator-spark/pull/552
[GitHub] incubator-spark pull request: tex formulas in the documentation
GitHub user martinjaggi reopened a pull request: https://github.com/apache/incubator-spark/pull/552

tex formulas in the documentation

using mathjax. and splitting the MLlib documentation by techniques

see jira https://spark-project.atlassian.net/browse/MLLIB-19 and https://github.com/shivaram/spark/compare/mathjax

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-spark/pull/552.patch

commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T15:57:23Z

    minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T15:59:43Z

    enable mathjax formula in the .md documentation files
    code by @shivaram

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T16:31:29Z

    split MLlib documentation by techniques and linked from the main mllib-guide.md site

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T17:04:26Z

    enabling inline latex formulas with $.$
    same mathjax configuration as used in math.stackexchange.com
    sample usage in the linear algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T02:19:38Z

    minor polishing, as suggested by @pwendell
[GitHub] incubator-spark pull request: tex formulas in the documentation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/552#issuecomment-34554284 Merged build triggered.
[GitHub] incubator-spark pull request: tex formulas in the documentation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/552#issuecomment-34554285 Merged build started.
[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/561#issuecomment-34554453 Merged build started.
[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/561#issuecomment-34554452 Merged build triggered.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34554535 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12631/
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34554533 Merged build finished.
[GitHub] incubator-spark pull request: ROC AUC and Average precision metric...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/550#issuecomment-34554613 @schmit Mind adding a JIRA for this?
[GitHub] incubator-spark pull request: Make sbt download an atomic operatio...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/454#issuecomment-34554940 Seems reasonable to me, I'll merge this.
[GitHub] incubator-spark pull request: Make sbt download an atomic operatio...
Github user jey closed the pull request at: https://github.com/apache/incubator-spark/pull/454
[GitHub] incubator-spark pull request: tex formulas in the documentation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/552#issuecomment-34555134 Merged build finished.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34555222 Build started.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34555221 Build triggered.
[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/561#issuecomment-34555277 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12633/
[SUMMARY] Proposal for Spark Release Strategy
Hey All,

Thanks to everyone who participated in this thread. I've distilled feedback based on the discussion and wanted to summarize the conclusions:

- People seem universally +1 on semantic versioning in general.
- People seem universally +1 on having public merge windows for releases.
- People seem universally +1 on a policy of having associated JIRAs with features.
- Everyone believes link-level compatibility should be the goal. Some people think we should outright promise it now. Others think we should either not promise it or promise it later.
-- Compromise: let's do one minor release 1.0-1.1 to convince ourselves this is possible (some issues with Scala traits will make this tricky). Then we can codify it in writing. I've created SPARK-1069 [1] to clearly establish that this is the goal for the 1.X family of releases.
- Some people think we should add particular features before having 1.0.
-- Version 1.X indicates API stability rather than a feature set; this was clarified.
-- That said, people still have several months to work on features if they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of the release guidelines to the wiki.

With all this said, I would like to move the master version to 1.0.0-SNAPSHOT as the main concerns with this have been addressed and clarified. This merely represents a tentative consensus and the release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
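To make the versioning point above concrete, here is a minimal sketch of what "compatible within the 1.X family" would mean for version numbers under semantic versioning. The function names `parse_version` and `is_compatible_upgrade` are hypothetical and not part of any Spark tooling, and a version comparison like this only expresses the promise; it does not verify link-level compatibility itself.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH[-QUALIFIER]' into a tuple of three integers.
    A qualifier such as '-SNAPSHOT' is ignored in this sketch."""
    core = text.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_compatible_upgrade(built_against, running):
    """True when `running` keeps the major version of `built_against`
    and is not an older release."""
    old = parse_version(built_against)
    new = parse_version(running)
    return new[0] == old[0] and new >= old


print(is_compatible_upgrade("1.0.0", "1.1.0"))  # same major, newer minor
print(is_compatible_upgrade("1.2.0", "2.0.0"))  # major version changed
```

Under this reading, an application built against 1.0.0 would be expected to keep working on 1.1.0, which is the property the proposed 1.0-1.1 trial release is meant to test.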
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34555965 Build finished.
[GitHub] incubator-spark pull request: Principal Component Analysis
GitHub user rezazadeh opened a pull request: https://github.com/apache/incubator-spark/pull/564

Principal Component Analysis

# Principal Component Analysis

Computes the top k principal component coefficients for the m-by-n data matrix X. Rows of X correspond to observations and columns correspond to variables. The coefficient matrix is n-by-k. Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance. This function centers the data and uses the singular value decomposition (SVD) algorithm.

# Testing

Tests included:

* All principal components
* Only top k principal components

# Documentation

# Example Usage

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.PCA
    import org.apache.spark.mllib.linalg.SparseMatrix
    import org.apache.spark.mllib.linalg.MatrixEntry

    // sc is assumed to be an existing SparkContext (e.g. from spark-shell)
    // Load and parse the data file: one "row,column,value" entry per line
    val data = sc.textFile("mllib/data/als/test.data").map { line =>
      val parts = line.split(',')
      MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
    }
    val m = 4
    val n = 4
    val k = 1

    // recover top principal component
    val coeffs = PCA.computePCA(SparseMatrix(data, m, n), k)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark pca

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-spark/pull/564.patch

commit 0642afb2ec1ca6896ffd1a4d3b12eca3f4db52b3
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-02T05:53:33Z

    Initial files

commit 371f40ae288d45986c364adcfe4b584a9b00aa3d
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T01:50:59Z

    new interfaces

commit 173148288dffe6cfa1d6671fa8dd9c57499fd0e8
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T04:04:46Z

    add option to compute U

commit fb022fcc857bc3793882587480671b3e0b23
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T08:48:24Z

    new tests, SVD interface

commit f756aff7b322504f09236f3ad4e05d4b75e8cc42
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T08:49:47Z

    fix tests

commit 2d831f8f734ddf207707b721aa9718ebd7e65ca9
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T09:04:48Z

    Documentation, yo

commit 31a5ecf977e6e4e6cd4d038aaa9f3d1ad1b3de49
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T09:15:23Z

    added mllib guide docs

commit 57fe6d4ed9e214a504dbb2c5c66205045d5846b5
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T09:18:07Z

    SparkPCA example

commit 07657476d3be2bd177090aaa37f6a4357329a188
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T09:22:15Z

    fix typo

commit b45c1e88cb36ce2e5c78f493b05455f87ecfc662
Author: Reza Zadeh riz...@gmail.com
Date: 2014-02-08T09:23:15Z

    fix example
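For readers unfamiliar with the computation the description above refers to, the sketch below shows the idea on a small dense matrix in plain Python: center each column, then find the direction of greatest variance. Note the swap: the pull request uses SVD on a distributed sparse matrix, whereas this illustration uses power iteration on X^T X, which should converge to the same first component when one direction clearly dominates. The function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every variable has zero mean."""
    m, n = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / m for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in rows]


def top_component(rows, iterations=200):
    """Approximate the first principal component (a unit vector of length n)
    by power iteration on X^T X of the centered data."""
    x = center_columns(rows)
    n = len(x[0])
    # Gram matrix X^T X (n-by-n); proportional to the sample covariance.
    gram = [[sum(r[i] * r[j] for r in x) for j in range(n)] for i in range(n)]
    # Deterministic, asymmetric start vector. In rare cases it could be
    # orthogonal to the answer; a random start would avoid that.
    v = [1.0 / (j + 1) for j in range(n)]
    scale = math.sqrt(sum(c * c for c in v))
    v = [c / scale for c in v]
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance along the current direction; stop early
        v = [c / norm for c in w]
    return v


# Four observations of two perfectly correlated variables.
print(top_component([[1, 1], [2, 2], [3, 3], [4, 4]]))
```

The sign of the returned vector is arbitrary, as it is for any principal component; only the direction is meaningful.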
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34556062 Build started.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34556061 Build triggered.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34556205 Build triggered.
[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...
Github user Qiuzhuang closed the pull request at: https://github.com/apache/incubator-spark/pull/561
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34556909 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12635/
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34557013 Build finished.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34557014 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12636/
Re: [SUMMARY] Proposal for Spark Release Strategy
Thanks for the summary, Patrick. I'm glad that we discussed the options before pulling the trigger on a version number update (my -1 had only been about committing a major version update without thorough discussion). IMO that's been addressed and, given the discussion, I'm changing to a +1 for 1.0.0.

On Feb 8, 2014 12:56 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

Thanks to everyone who participated in this thread. I've distilled feedback based on the discussion and wanted to summarize the conclusions:

- People seem universally +1 on semantic versioning in general.
- People seem universally +1 on having public merge windows for releases.
- People seem universally +1 on a policy of having associated JIRAs with features.
- Everyone believes link-level compatibility should be the goal. Some people think we should outright promise it now. Others think we should either not promise it or promise it later.
-- Compromise: let's do one minor release 1.0-1.1 to convince ourselves this is possible (some issues with Scala traits will make this tricky). Then we can codify it in writing. I've created SPARK-1069 [1] to clearly establish that this is the goal for the 1.X family of releases.
- Some people think we should add particular features before having 1.0.
-- Version 1.X indicates API stability rather than a feature set; this was clarified.
-- That said, people still have several months to work on features if they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of the release guidelines to the wiki.

With all this said, I would like to move the master version to 1.0.0-SNAPSHOT as the main concerns with this have been addressed and clarified. This merely represents a tentative consensus and the release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
Re: [SUMMARY] Proposal for Spark Release Strategy
Patrick, do you know if there is a way to check whether a GitHub PR's subject/title contains a JIRA number, and have Jenkins raise a warning when it does not? - Henry

On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

Thanks to everyone who participated in this thread. I've distilled feedback based on the discussion and wanted to summarize the conclusions:

- People seem universally +1 on semantic versioning in general.
- People seem universally +1 on having public merge windows for releases.
- People seem universally +1 on a policy of having associated JIRAs with features.
- Everyone believes link-level compatibility should be the goal. Some people think we should outright promise it now. Others think we should either not promise it or promise it later.
-- Compromise: let's do one minor release 1.0-1.1 to convince ourselves this is possible (some issues with Scala traits will make this tricky). Then we can codify it in writing. I've created SPARK-1069 [1] to clearly establish that this is the goal for the 1.X family of releases.
- Some people think we should add particular features before having 1.0.
-- Version 1.X indicates API stability rather than a feature set; this was clarified.
-- That said, people still have several months to work on features if they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of the release guidelines to the wiki.

With all this said, I would like to move the master version to 1.0.0-SNAPSHOT as the main concerns with this have been addressed and clarified. This merely represents a tentative consensus and the release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34559476 @martinjaggi can you rebase this now?
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user martinjaggi commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34559530 can i do this on the github website or only in command line?

On Sun, Feb 9, 2014 at 12:09 AM, Patrick Wendell notificati...@github.com wrote: @martinjaggi can you rebase this now?
Re: [SUMMARY] Proposal for Spark Release Strategy
I know that it can be done -- which is different from saying that I know how to set it up.

On Feb 8, 2014, at 2:57 PM, Henry Saputra henry.sapu...@gmail.com wrote: Patrick, do you know if there is a way to check if a Github PR's subject/ title contains JIRA number and will raise warning by the Jenkins? - Henry
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user martinjaggi commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34559872 i'm scared of the wrath of the git gods ;) https://help.github.com/articles/interactive-rebase (the rebase succeeded locally on my machine, but nothing has happened on github yet)

On Sun, Feb 9, 2014 at 12:11 AM, Martin Jaggi m.ja...@gmail.com wrote: can i do this on the github website or only in command line?
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user rxin commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34559900 (And then submit a new PR and close this one)
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user rxin commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34559894 To play it safe, you can always create a new branch and do the rebase there so it doesn't change your current branch.
Re: [SUMMARY] Proposal for Spark Release Strategy
:P - I'm pretty sure this can be done but it will require some work - we already use the github API in our merge script and we could hook something like that up with the jenkins tests. Henry maybe you could create a JIRA for this for Spark 1.0?

- Patrick

On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra m...@clearstorydata.com wrote:

I know that it can be done -- which is different from saying that I know how to set it up.
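As a rough illustration of the check discussed in this thread, the sketch below tests whether a pull request title mentions a JIRA key. It is only a sketch under stated assumptions: the function names and the regular expression are hypothetical, the accepted project keys (SPARK, MLLIB) are simply the ones that appear in this archive, and fetching the title through the GitHub API or wiring the result into the Jenkins tests is deliberately left out. The title is passed in as a plain string.

```python
import re
from typing import List

# Assumption: a JIRA key is a project name, a dash, and an issue number.
# SPARK and MLLIB are the only project keys seen in this archive.
JIRA_KEY = re.compile(r"\b(?:SPARK|MLLIB)-\d+\b")


def find_jira_keys(title: str) -> List[str]:
    """Return every JIRA key mentioned in a pull request title."""
    return JIRA_KEY.findall(title)


def check_title(title: str) -> str:
    """Return a one-line verdict that a build job could print (hypothetical helper)."""
    keys = find_jira_keys(title)
    if keys:
        return "OK: references " + ", ".join(keys)
    return "WARNING: PR title does not reference a JIRA issue"


if __name__ == "__main__":
    # Titles taken from pull requests mentioned in this archive.
    for title in (
        "SPARK-1066: Add developer scripts to repository.",
        "[WIP] SPARK-1058, Fix Style Errors",
        "Principal Component Analysis",
    ):
        print(title, "->", check_title(title))
```

Returning a message instead of failing the build matches the request for a warning; whether a missing key should also fail the build is a policy choice the thread does not settle.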
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34560617 @rezazadeh Mind adding a JIRA for this?
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user martinjaggi closed the pull request at: https://github.com/apache/incubator-spark/pull/563
[GitHub] incubator-spark pull request: Version number to 1.0.0-SNAPSHOT
Github user markhamstra closed the pull request at: https://github.com/apache/incubator-spark/pull/542
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
GitHub user martinjaggi reopened a pull request:

    https://github.com/apache/incubator-spark/pull/563

    new MLlib documentation for optimization, regression and classification

    new documentation with tex formulas, hopefully improving usability and reproducibility of the offered MLlib methods. also did some minor changes in the code for consistency. scala tests pass.

    for easier merging, we could maybe rebase these changes (only feb 7 is relevant) after https://github.com/apache/incubator-spark/pull/552 is merged?

    jira: https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark polishing-opt-MLlib

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/563.patch

commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T15:57:23Z

    minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T15:59:43Z

    enable mathjax formula in the .md documentation files code by @shivaram

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T16:31:29Z

    split MLlib documentation by techniques and linked from the main mllib-guide.md site

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-06T17:04:26Z

    enabling inline latex formulas with $.$ same mathjax configuration as used in math.stackexchange.com sample usage in the linear algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T02:19:38Z

    minor polishing, as suggested by @pwendell

commit 93d74988c33a9e4ef0d15e39c8b8fc9e6c36bb28
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T16:33:24Z

    renaming LeastSquaresGradient not to confuse with squared regularizer or a squared gradient. added some more comments as what the loss functions are good for

commit e4cbe99bbcf7f53ebb8f1a0d2e0b869a4922bca4
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T16:34:45Z

    use d for the number of features try to be consistent, that n is the number of data examples in the RDD, and each of them has d entries (also in documentation)

commit 79768fd3429df5c6d56f05ac93bdd8cf4355d946
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:13:17Z

    correct scaling for MSE loss to be consistent with the documentation

commit 1e228062b01ac806c4bd032eb0975a8b92431fd9
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:15:44Z

    new classification and regression documentation with complete mathematical formulations. trying to be general for adding future ML methods as well. table of all subgradients used for reference. this change also required a small addition to the mathjax configuration, to allow equation numbers.

commit 89e472f4121debb175b625ab0c138e24c4e60de8
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:16:51Z

    new optimization documentation explaining GD and SGD and the distributed versions that MLlib implements.

commit a33be78a47bad1745a03a6e0ee1a4ea1a7893805
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T17:38:57Z

    better comments in SGD code for regression

commit 73f5e71e3d9a253ff378907fca202b8d6aae1268
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-07T22:41:42Z

    lambda R() in documentation

commit eec58c9c860def9b3b7604c990ec1697812bcbbf
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T17:31:05Z

    telling what updater actually does also use proper scaling for the L2 regularization (using 1/2 as in the documentation)

commit 2c1cf8d35145081a61865f55f4e48fcfbafddbbe
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T17:56:01Z

    remove broken url

commit ecbac73a7450fc90ef1509d9a410c9b627617130
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T17:57:12Z

    better description of GradientDescent

commit eae3dce25a4b68bf32ece1ca7783f9b2ffd56dff
Author: Martin Jaggi m.ja...@gmail.com
Date: 2014-02-08T20:30:35Z

    line wrap at 100 chars
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34561878 Build triggered.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34561880 Build started.
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34562386 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12637/
[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/563#issuecomment-34562385 Build finished.
Re: [SUMMARY] Proposal for Spark Release Strategy
:) Sure thing. I will create JIRA ticket for this.

Thx guys,
Henry

On Saturday, February 8, 2014, Patrick Wendell pwend...@gmail.com wrote:

:P - I'm pretty sure this can be done but it will require some work - we already use the github API in our merge script and we could hook something like that up with the jenkins tests. Henry maybe you could create a JIRA for this for Spark 1.0?

- Patrick
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user mateiz commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34562749 Made a few comments on the style.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user rezazadeh commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34563228 @pwendell Not sure why you want this, but here you go: https://spark-project.atlassian.net/browse/MLLIB-21
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34563275 @rezazadeh We need to track all features with JIRAs; it's an Apache requirement.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user rezazadeh commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34563699 @mateiz All those style changes made.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34563716 Build triggered.
[GitHub] incubator-spark pull request: Principal Component Analysis
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34564093 Build finished.
Re: [TODO] Document the release process for Apache Spark
I ported the release docs to the wiki today. Thanks for reminding me about this, Henry: https://cwiki.apache.org/confluence/display/SPARK/Preparing+Spark+Releases

- Patrick

On Fri, Feb 7, 2014 at 11:51 AM, Henry Saputra henry.sapu...@gmail.com wrote:

Cool, thanks Patrick! Really appreciate it =)

- Henry

On Fri, Feb 7, 2014 at 11:46 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey Henry,

Let me document this on the wiki. I already keep pretty thorough docs on this; I just need to migrate them to the wiki. I've created a JIRA here: https://spark-project.atlassian.net/browse/SPARK-1066

- Patrick

On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com wrote:

Hi Patrick,

As part of the unofficial checklist for graduation, we need to have documented steps to make a release. As you are the first and so far the only RE for Apache Spark, I would like to ask for your help in documenting the release steps. This will help other members do the release and take turns, so that all future PMC members and committers know how to do an Apache Spark release. Most of the steps are probably similar to other projects, but it is always useful for each podling to have its own documentation for releasing artifacts.

Really appreciate your help.

Thanks,
- Henry
[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...
GitHub user pwendell opened a pull request:

    https://github.com/apache/incubator-spark/pull/565

    SPARK-1066: Add developer scripts to repository.

    These are some developer scripts I've been maintaining in a separate public repo. This patch adds them to the Spark repository so they can evolve here and are clearly accessible to all committers. I may do some small additional clean-up in this PR, but wanted to put them here in case others want to review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark dev-scripts

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/565.patch

commit 5d5d331d01f6fd59c2eb830f652955119b012173
Author: Patrick Wendell pwend...@gmail.com
Date: 2014-02-09T06:11:47Z

    SPARK-1066: Add developer scripts to repository.
[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/565#issuecomment-34566956 Merged build started.
[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/565#issuecomment-34566955 Merged build triggered.
[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/565#issuecomment-34567293 Merged build finished.
[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/565#issuecomment-34567294 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12639/
[GitHub] incubator-spark pull request: Added example Python code for sort
Github user rxin commented on the pull request: https://github.com/apache/incubator-spark/pull/562#issuecomment-34567786 Thanks. Merged this into master and branch-0.9.
[GitHub] incubator-spark pull request: [WIP] SPARK-1067: Default log4j init...
Github user rxin commented on the pull request: https://github.com/apache/incubator-spark/pull/560#issuecomment-34567839 Oops, I didn't notice the WIP in the title. Feel free to revert if necessary.
[GitHub] incubator-spark pull request: [SPARK-1038] Add more fields in Json...
Github user qqsun8819 commented on the pull request: https://github.com/apache/incubator-spark/pull/551#issuecomment-34567873 I updated the diff, using a hard-coded JSON string for JSON data verification. @pwendell, @rxin and @aarondav, please review it again. Thanks very much!
[GitHub] incubator-spark pull request: [SPARK-1038] Add more fields in Json...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/551#issuecomment-34567877 Merged build started.
Re: [SUMMARY] Proposal for Spark Release Strategy
Ok, JIRA ticket filed [1] for this one.

- Henry

[1] https://spark-project.atlassian.net/browse/SPARK-1070

On Sat, Feb 8, 2014 at 3:39 PM, Patrick Wendell pwend...@gmail.com wrote:

:P - I'm pretty sure this can be done but it will require some work - we already use the github API in our merge script and we could hook something like that up with the jenkins tests. Henry maybe you could create a JIRA for this for Spark 1.0?

- Patrick
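To make the "Version 1.X indicates API stability rather than a feature set" point from the release-strategy summary concrete, here is a minimal sketch of how a semantic-versioning rule can be expressed in code. The names parse_version and expects_api_compatibility are hypothetical, and the rule encoded is the general semantic-versioning convention (same major version, and major version at least 1), not a policy the project has voted on.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    qualifier: Optional[str]  # e.g. "SNAPSHOT", or None for a release


# Assumption: versions look like MAJOR.MINOR.PATCH with an optional -QUALIFIER.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_version(text: str) -> Version:
    """Parse strings such as '0.9.0' or '1.0.0-SNAPSHOT'."""
    match = _VERSION.match(text)
    if match is None:
        raise ValueError("not a MAJOR.MINOR.PATCH version: %r" % text)
    major, minor, patch, qualifier = match.groups()
    return Version(int(major), int(minor), int(patch), qualifier)


def expects_api_compatibility(old: str, new: str) -> bool:
    """True if code written against `old` should keep working on `new`.

    Under semantic versioning this holds only within one major version,
    and only once the major version is 1 or higher.
    """
    a, b = parse_version(old), parse_version(new)
    same_family = a.major == b.major and a.major >= 1
    not_a_downgrade = (b.minor, b.patch) >= (a.minor, a.patch)
    return same_family and not_a_downgrade


if __name__ == "__main__":
    print(parse_version("1.0.0-SNAPSHOT"))
    print(expects_api_compatibility("1.0.0", "1.1.0"))
    print(expects_api_compatibility("0.9.0", "1.0.0"))
```

The 1.0 to 1.1 case is the one the compromise in the summary proposes to validate in practice before the guarantee is written down.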
[GitHub] incubator-spark pull request: [SPARK-1038] Add more fields in Json...
Github user rxin commented on the pull request: https://github.com/apache/incubator-spark/pull/551#issuecomment-34567984 Thanks. I left some comments to improve readability of the code.