[GitHub] incubator-spark pull request: [WIP] SPARK-1058, Fix Style Errors a...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/557#issuecomment-34543052
  
Merged build finished.



Re: [GitHub] incubator-spark pull request:

2014-02-08 Thread 欧阳晋(欧阳晋)
I think maybe we could do it like YARN: any JIRA creation is forwarded to the
dev@ list, which also carries discussion of new features and bugs and all
Jenkins pre-commit build messages. Any update to a specific JIRA, such as an
assignee change or a comment, is forwarded to the issues@ list. YARN also has
a commit@ list for any svn ci that happens. Maybe we can use some of this; it's
just a suggestion ^_^

--
From: Xuefeng Wu ben...@gmail.com
Sent: Saturday, February 8, 2014, 14:34
To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
Subject: Re: [GitHub] incubator-spark pull request:

GitHub has this feature, but these mails come from g...@git.apache.org. I
think some of the GitHub information is filtered out.

https://github.com/blog/811-reply-to-comments-from-email


On Sat, Feb 8, 2014 at 2:21 PM, Reynold Xin r...@databricks.com wrote:

 I don't think it does.


 On Fri, Feb 7, 2014 at 8:58 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

  If we reply to these emails, will the reply be posted to the pull request
  discussion board automatically?
 
  If yes, that would be very nice.
 
  --
  Nan Zhu
 
 
 
  On Friday, February 7, 2014 at 9:23 PM, Henry Saputra wrote:
 
   I am with Chris on this one.
  
   These GitHub notifications are similar to the JIRA updates that most
   ASF projects send to the dev@ list, and they are valid messages that
   contributors to the project should care about.
  
   The PPMC members especially (who will hopefully be PMC members soon)
   need to know about them, and the list becomes the audit trail/archive
   of development discussions for the ASF.
  
   We already have the user@ list, which is targeted at people who want to
   ask questions about using Spark, and it should be the proper list for
   people interested in using Spark.
  
   As Matei has said, you can easily filter these GitHub notification
   emails.
  
   Thanks,
  
  
   - Henry
  
  
   On Fri, Feb 7, 2014 at 6:02 PM, Chris Mattmann mattm...@apache.org wrote:
Guys this Github discussion seems like dev discussion in which case it
must be on dev list and not moved - the whole point of this is that
development, including conversations related to it, which are the
lifeblood of the project should occur on the ASF mailing lists.

Refactoring the lists is one thing for the more automated messages, but
the comments below look like Kay commenting on some relevant stuff in
which case I would argue against (paraphrased) moving it to some ASF list
that those who care can subscribe to. Those who care in this case should
be people who care about Kay's comments (which aren't automated commit
messages from some bot; they are relevant dev comments) in which case
those who care should be the PMC.

My suggestion is if there is a notifications list set up, it can be like
for automated stuff - but *NOT* for dev discussion -- that needs to happen
on the dev lists. If it's on another list, then I would expect
periodically (frequently; with enough diligence to VOTE on and discuss and
contribute to) to see that flushed or summarized on the dev list.
   
Cheers,
Chris
   
   
   
   
-Original Message-
From: Andrew Ash and...@andrewash.com
Reply-To: dev@spark.incubator.apache.org
Date: Friday, February 7, 2014 5:43 PM
To: dev@spark.incubator.apache.org
Subject: Re: [GitHub] incubator-spark pull request:

 +1 on moving this stuff to a separate mailing list. It's Apache policy
 that discussion is archived, but it's not policy that it must be
 interleaved with other dev discussion. Let's move it to a
 spark-github-discuss list (or a different name) and people who care to
 see it can subscribe.


 On Fri, Feb 7, 2014 at 5:19 PM, Reynold Xin r...@databricks.com wrote:

  I concur wholeheartedly ...
 
 
  On Fri, Feb 7, 2014 at 4:55 PM, Dean Wampler deanwamp...@gmail.com wrote:
 
   This SPAM is not doing anyone any good. How about another mailing list
   for people who want to see this?
  
   Sent from my rotary phone.
  
  
On Feb 7, 2014, at 10:33 AM, mridulm g...@git.apache.org wrote:
   
Github user mridulm commented on the pull request:
 
  https://github.com/apache/incubator-spark/pull/517#issuecomment-34484468
   
I am hoping that the PR Prashant Sharma submitted would also include the
ability to check these things once committed!
Thanks 

[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/263#issuecomment-34546763
  
Merged build started.



[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/468#issuecomment-34546755
  
 Merged build triggered.



[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/263#issuecomment-34546762
  
 Merged build triggered.



[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/263#issuecomment-34546802
  
Merged build finished.



[GitHub] incubator-spark pull request: [PySpark] Adding support for Sequenc...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/263#issuecomment-34546803
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12630/



[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/468#issuecomment-34547492
  
Merged build finished.



[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/468#issuecomment-34547493
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12629/



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread martinjaggi
GitHub user martinjaggi opened a pull request:

https://github.com/apache/incubator-spark/pull/563

new MLlib documentation for optimization, regression and classification

new documentation with tex formulas, hopefully improving usability and 
reproducibility of the offered MLlib methods.
also did some minor changes in the code for consistency. scala tests pass.

for easier merging, we could maybe rebase these changes (only  feb 7 is 
relevant) after 
https://github.com/apache/incubator-spark/pull/552
is merged?

jira:
https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-spark polishing-opt-MLlib

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-spark/pull/563.patch


commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T15:57:23Z

minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T15:59:43Z

enable mathjax formula in the .md documentation files

code by @shivaram

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T16:31:29Z

split MLlib documentation by techniques

and linked from the main mllib-guide.md site

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T17:04:26Z

enabling inline latex formulas with $.$

same mathjax configuration as used in math.stackexchange.com

sample usage in the linear algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T02:19:38Z

minor polishing, as suggested by @pwendell

commit 93d74988c33a9e4ef0d15e39c8b8fc9e6c36bb28
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T16:33:24Z

renaming LeastSquaresGradient

not to confuse with squared regularizer or a squared gradient. added
some more comments as what the loss functions are good for

commit e4cbe99bbcf7f53ebb8f1a0d2e0b869a4922bca4
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T16:34:45Z

use d for the number of features

try to be consistent, that n is the number of data examples in the RDD,
and each of them has d entries (also in documentation)

commit 79768fd3429df5c6d56f05ac93bdd8cf4355d946
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:13:17Z

correct scaling for MSE loss

to be consistent with the documentation

commit 1e228062b01ac806c4bd032eb0975a8b92431fd9
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:15:44Z

new classification and regression documentation

with complete mathematical formulations. trying to be general for
adding future ML methods as well. table of all subgradients used for
reference.
this change also required a small addition to the mathjax
configuration, to allow equation numbers.

commit 89e472f4121debb175b625ab0c138e24c4e60de8
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:16:51Z

new optimization documentation

explaining GD and SGD and the distributed versions that MLlib
implements.

commit a33be78a47bad1745a03a6e0ee1a4ea1a7893805
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:38:57Z

better comments in SGD code for regression

commit 73f5e71e3d9a253ff378907fca202b8d6aae1268
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T22:41:42Z

lambda R() in documentation

commit eec58c9c860def9b3b7604c990ec1697812bcbbf
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T17:31:05Z

telling what updater actually does

also use proper scaling for the L2 regularization (using 1/2 as in the
documentation)

commit 2c1cf8d35145081a61865f55f4e48fcfbafddbbe
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T17:56:01Z

remove broken url

commit ecbac73a7450fc90ef1509d9a410c9b627617130
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T17:57:12Z

better description of GradientDescent





[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34553608
  
Jenkins add to whitelist.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34553612
  
Jenkins, test this please.



[GitHub] incubator-spark pull request: tex formulas in the documentation

2014-02-08 Thread martinjaggi
Github user martinjaggi commented on the pull request:

https://github.com/apache/incubator-spark/pull/552#issuecomment-34553631
  
ok thanks!



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34553685
  
Merged build started.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34553684
  
 Merged build triggered.



[GitHub] incubator-spark pull request: [WIP] SPARK-1058, Fix Style Errors a...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/557#issuecomment-34553870
  
@ScrapCodes Words cannot express my elation at having this patch. I noticed 
there are still style errors. Did you want me to merge this as-is and then you 
will add future pull requests (to avoid conflicts)?



[GitHub] incubator-spark pull request: [WIP] SPARK-1058, Fix Style Errors a...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/557#issuecomment-34554122
  
Hey @ScrapCodes I noticed the size of indent is inconsistent. The rule is 
to always use 2 spaces. If you are breaking initialization of a code block 
(e.g. a function signature) then it's okay to use 4 spaces to distinguish it 
from the body. I think scala is silent on this exception but it's the 
convention we usually use.

If you could go through and address those I'm happy to merge an 
intermediate clean-up to avoid conflicts.
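
The indentation convention described above can be illustrated with a small sketch. This is only an example of the rule as stated in the comment (2 spaces for bodies, 4 spaces for a signature broken across lines); the object and method names are hypothetical and do not come from the patch under review.

```scala
object IndentExample {
  // Ordinary case: the body is indented by 2 spaces.
  def scale(values: Seq[Double], factor: Double): Seq[Double] = {
    values.map(_ * factor)
  }

  // A signature broken across lines: the parameters are indented by
  // 4 spaces so that they stand apart from the 2-space body below.
  def weightedSum(
      values: Seq[Double],
      weights: Seq[Double],
      bias: Double): Double = {
    values.zip(weights).map { case (v, w) => v * w }.sum + bias
  }
}
```

The extra indentation on the parameter list makes it easy to see where the signature ends and the body begins.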



[GitHub] incubator-spark pull request: tex formulas in the documentation

2014-02-08 Thread martinjaggi
Github user martinjaggi closed the pull request at:

https://github.com/apache/incubator-spark/pull/552



[GitHub] incubator-spark pull request: tex formulas in the documentation

2014-02-08 Thread martinjaggi
GitHub user martinjaggi reopened a pull request:

https://github.com/apache/incubator-spark/pull/552

tex formulas in the documentation

using mathjax.
and splitting the MLlib documentation by techniques

see jira
https://spark-project.atlassian.net/browse/MLLIB-19
and
https://github.com/shivaram/spark/compare/mathjax

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-spark/pull/552.patch


commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T15:57:23Z

minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T15:59:43Z

enable mathjax formula in the .md documentation files

code by @shivaram

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T16:31:29Z

split MLlib documentation by techniques

and linked from the main mllib-guide.md site

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T17:04:26Z

enabling inline latex formulas with $.$

same mathjax configuration as used in math.stackexchange.com

sample usage in the linear algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T02:19:38Z

minor polishing, as suggested by @pwendell





[GitHub] incubator-spark pull request: tex formulas in the documentation

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/552#issuecomment-34554284
  
 Merged build triggered.



[GitHub] incubator-spark pull request: tex formulas in the documentation

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/552#issuecomment-34554285
  
Merged build started.



[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/561#issuecomment-34554453
  
Merged build started.



[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/561#issuecomment-34554452
  
 Merged build triggered.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34554535
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12631/



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34554533
  
Merged build finished.



[GitHub] incubator-spark pull request: ROC AUC and Average precision metric...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/550#issuecomment-34554613
  
@schmit Mind adding a JIRA for this?



[GitHub] incubator-spark pull request: Make sbt download an atomic operatio...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/454#issuecomment-34554940
  
Seems reasonable to me, I'll merge this.



[GitHub] incubator-spark pull request: Make sbt download an atomic operatio...

2014-02-08 Thread jey
Github user jey closed the pull request at:

https://github.com/apache/incubator-spark/pull/454



[GitHub] incubator-spark pull request: tex formulas in the documentation

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/552#issuecomment-34555134
  
Merged build finished.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34555222
  
Build started.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34555221
  
 Build triggered.



[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/561#issuecomment-34555277
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12633/



[SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Patrick Wendell
Hey All,

Thanks to everyone who participated in this thread. I've distilled
feedback based on the discussion and wanted to summarize the
conclusions:

- People seem universally +1 on semantic versioning in general.

- People seem universally +1 on having public merge windows for releases.

- People seem universally +1 on a policy of having associated JIRAs
with features.

- Everyone believes link-level compatibility should be the goal. Some
people think we should outright promise it now. Others think we should
either not promise it or promise it later.
-- Compromise: let's do one minor release 1.0->1.1 to convince
ourselves this is possible (some issues with Scala traits will make
this tricky). Then we can codify it in writing. I've created
SPARK-1069 [1] to clearly establish that this is the goal for the 1.X
family of releases.

- Some people think we should add particular features before having 1.0.
-- Version 1.X indicates API stability rather than a feature set;
this was clarified.
-- That said, people still have several months to work on features if
they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of
the release guidelines to the wiki.

With all this said, I would like to move the master version to
1.0.0-SNAPSHOT as the main concerns with this have been addressed and
clarified. This merely represents a tentative consensus and the
release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
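
As a rough illustration of the semantic versioning idea in the summary above, the sketch below checks whether one release is a compatible upgrade of another within the same major version. It assumes the usual major.minor.patch reading of semantic versioning; the names SemVerSketch, Version, parse and isCompatibleUpgrade are hypothetical and are not part of Spark, and the summary itself only states compatibility within 1.X as a goal, not a promise.

```scala
object SemVerSketch {
  final case class Version(major: Int, minor: Int, patch: Int)

  // Parses "1.0.0" or "1.0.0-SNAPSHOT". Anything after the first '-'
  // (a qualifier such as SNAPSHOT) is ignored in this sketch.
  def parse(s: String): Option[Version] = {
    val core = s.takeWhile(_ != '-')
    core.split('.') match {
      case Array(a, b, c)
          if Seq(a, b, c).forall(p => p.nonEmpty && p.forall(_.isDigit)) =>
        Some(Version(a.toInt, b.toInt, c.toInt))
      case _ => None
    }
  }

  // Assumption: code built against `built` should keep working on
  // `running` when the major version matches and `running` is not older.
  def isCompatibleUpgrade(built: Version, running: Version): Boolean =
    built.major == running.major &&
      (running.minor > built.minor ||
        (running.minor == built.minor && running.patch >= built.patch))
}
```

Under this reading, moving from 1.0.0 to 1.1.0 counts as compatible, while moving from 0.9.0 to 1.0.0 does not, which matches the point that 1.X signals API stability rather than a feature set.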


[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34555965
  
Build finished.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread rezazadeh
GitHub user rezazadeh opened a pull request:

https://github.com/apache/incubator-spark/pull/564

Principal Component Analysis

# Principal Component Analysis

Computes the top k principal component coefficients for the m-by-n data 
matrix X. Rows of X correspond to observations and columns correspond to 
variables. The coefficient matrix is n-by-k. Each column of coeff contains 
coefficients for one principal component, and the columns are in descending
order of component variance. This function centers the data and uses the 
singular value decomposition (SVD) algorithm.

# Testing
Tests included:
 * All principal components
 * Only top k principal components

# Documentation

# Example Usage
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.PCA
import org.apache.spark.mllib.linalg.SparseMatrix
import org.apache.spark.mllib.linalg.MatrixEntry

// Load and parse the data file: each line is "row,column,value"
val data = sc.textFile("mllib/data/als/test.data").map { line =>
  val parts = line.split(',')
  MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
}
val m = 4
val n = 4
val k = 1

// recover top principal component
val coeffs = PCA.computePCA(SparseMatrix(data, m, n), k)

{% endhighlight %}
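
To make the description above more concrete, here is a minimal local sketch of the two steps it names: centering the data and extracting the top principal component. It is not the code in this pull request. It runs on plain arrays without Spark, and it swaps in power iteration on the covariance matrix in place of the SVD mentioned above. The names PcaSketch, center, covariance and topComponent are hypothetical.

```scala
object PcaSketch {
  type Matrix = Array[Array[Double]]

  // Subtract each column's mean so that every variable has mean zero.
  def center(x: Matrix): Matrix = {
    val m = x.length
    val n = x(0).length
    val means = Array.tabulate(n)(j => x.map(_(j)).sum / m)
    x.map(row => Array.tabulate(n)(j => row(j) - means(j)))
  }

  // n-by-n sample covariance matrix of already centered data
  // (assumes at least two observations).
  def covariance(xc: Matrix): Matrix = {
    val m = xc.length
    val n = xc(0).length
    Array.tabulate(n, n)((i, j) => xc.map(r => r(i) * r(j)).sum / (m - 1))
  }

  // Top principal component by power iteration, used here as a simple
  // stand-in for the SVD. The result has unit length; its sign is arbitrary.
  def topComponent(x: Matrix, iterations: Int = 100): Array[Double] = {
    val c = covariance(center(x))
    val n = c.length
    // Non-uniform start vector, so it is unlikely to be orthogonal
    // to the leading eigenvector.
    var v = Array.tabulate(n)(j => 1.0 / (j + 1))
    for (_ <- 0 until iterations) {
      val w = c.map(row => row.zip(v).map { case (a, b) => a * b }.sum)
      val norm = math.sqrt(w.map(a => a * a).sum)
      if (norm > 0) v = w.map(_ / norm)
    }
    v
  }
}
```

For observations that all lie on the line y = x, the returned direction has two equal entries, as expected. Power iteration only recovers the first component and can converge slowly when the two largest variances are close, which is one reason an SVD is the more common choice.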


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-spark pca

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-spark/pull/564.patch


commit 0642afb2ec1ca6896ffd1a4d3b12eca3f4db52b3
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-02T05:53:33Z

Initial files

commit 371f40ae288d45986c364adcfe4b584a9b00aa3d
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T01:50:59Z

new interfaces

commit 173148288dffe6cfa1d6671fa8dd9c57499fd0e8
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T04:04:46Z

add option to compute U

commit fb022fcc857bc3793882587480671b3e0b23
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T08:48:24Z

new tests, SVD interface

commit f756aff7b322504f09236f3ad4e05d4b75e8cc42
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T08:49:47Z

fix tests

commit 2d831f8f734ddf207707b721aa9718ebd7e65ca9
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T09:04:48Z

Documentation, yo

commit 31a5ecf977e6e4e6cd4d038aaa9f3d1ad1b3de49
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T09:15:23Z

added mllib guide docs

commit 57fe6d4ed9e214a504dbb2c5c66205045d5846b5
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T09:18:07Z

SparkPCA example

commit 07657476d3be2bd177090aaa37f6a4357329a188
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T09:22:15Z

fix typo

commit b45c1e88cb36ce2e5c78f493b05455f87ecfc662
Author: Reza Zadeh riz...@gmail.com
Date:   2014-02-08T09:23:15Z

fix example





[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34556062
  
Build started.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34556061
  
 Build triggered.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34556205
  
 Build triggered.



[GitHub] incubator-spark pull request: Kill drivers in postStop() for Worke...

2014-02-08 Thread Qiuzhuang
Github user Qiuzhuang closed the pull request at:

https://github.com/apache/incubator-spark/pull/561



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34556909
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12635/



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34557013
  
Build finished.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34557014
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12636/



Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Andy Konwinski
Thanks for the summary Patrick. I'm glad that we discussed the options
before pulling the trigger on a version number update (my -1 had only been
about committing a major version update without thorough discussion).
IMO that's been addressed and, given the discussion, I'm changing to a +1
for 1.0.0.



Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Henry Saputra
Patrick, do you know if there is a way to have Jenkins check whether a
GitHub PR's subject/title contains a JIRA number and raise a warning if
it does not?

- Henry



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34559476
  
@martinjaggi can you rebase this now?



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread martinjaggi
Github user martinjaggi commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34559530
  
can i do this on the github website or only in command line?






Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Mark Hamstra
I know that it can be done -- which is different from saying that I know how to 
set it up.


 On Feb 8, 2014, at 2:57 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 
 Patrick, do you know if there is a way to have Jenkins check whether a
 GitHub PR's subject/title contains a JIRA number and raise a warning if
 it does not?
 
 - Henry
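
The check asked about above could be sketched roughly as follows. This only shows the title-matching part. How it would be wired into Jenkins or the pull request builder is not covered in this thread and is left open. The key format (upper-case project key, dash, number, as in SPARK-1058 or MLLIB-19) is an assumption, and the names JiraTitleCheck, findJiraKey and warningFor are hypothetical.

```scala
object JiraTitleCheck {
  // Assumed key format: upper-case project key, a dash, then digits,
  // for example SPARK-1058 or MLLIB-19.
  private val JiraKey = """\b[A-Z][A-Z0-9]+-\d+\b""".r

  // Returns the first JIRA-looking key in the title, if any.
  def findJiraKey(title: String): Option[String] =
    JiraKey.findFirstIn(title)

  // Returns a warning message when no key is found, None otherwise.
  def warningFor(title: String): Option[String] =
    if (findJiraKey(title).isDefined) None
    else Some(s"PR title '$title' does not reference a JIRA issue")
}
```

A pattern this loose also accepts strings such as UTF-8, so a real check would probably restrict the project key to a known list.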
 
 On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey All,
 
 Thanks for everyone who participated in this thread. I've distilled
 feedback based on the discussion and wanted to summarize the
 conclusions:
 
 - People seem universally +1 on semantic versioning in general.
 
 - People seem universally +1 on having a public merge windows for releases.
 
 - People seem universally +1 on a policy of having associated JIRA's
 with features.
 
 - Everyone believes link-level compatiblity should be the goal. Some
 people think we should outright promise it now. Others thing we should
 either not promise it or promise it later.
 -- Compromise: let's do one minor release 1.0-1.1 to convince
 ourselves this is possible (some issues with Scala traits will make
 this tricky). Then we can codify it in writing. I've created
 SPARK-1069 [1] to clearly establish that this is the goal for the 1.X
 family of releases.
 
 - Some people think we should add particular features before having 1.0.
 -- Version 1.X indicates API stability rather than a feature set;
 this was clarified.
 -- That said, people still have several months to work on features if
 they really want to get them in for this release.
 
 I'm going to integrate this feedback and post a tentative version of
 the release guidelines to the wiki.
 
 With all this said, I would like to move the master version to
 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
 clarified. This merely represents a tentative consensus and the
 release is still subject to a formal vote amongst PMC members.
 
 [1] https://spark-project.atlassian.net/browse/SPARK-1069
 
 - Patrick


[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread martinjaggi
Github user martinjaggi commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34559872
  
I'm scared of the wrath of the git gods ;)
https://help.github.com/articles/interactive-rebase

(the rebase succeeded locally on my machine, but nothing has happened on
github yet)


On Sun, Feb 9, 2014 at 12:11 AM, Martin Jaggi m.ja...@gmail.com wrote:

 can i do this on the github website or only in command line?


 On Sun, Feb 9, 2014 at 12:09 AM, Patrick Wendell notificati...@github.com
  wrote:

 @martinjaggi https://github.com/martinjaggi can you rebase this now?







[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34559900
  
(And then submit a new PR and close this one)



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34559894
  
To play it safe, you can always create a new branch and do the rebase there 
so it doesn't change your current branch.
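
A minimal sketch of the "rebase on a new branch" suggestion, for readers following along. The function name `rebase_on_copy` and the `-rebased` suffix are hypothetical; only standard `git checkout -b` and `git rebase` are used.

```shell
# Hypothetical helper (not part of Spark's tooling): rebase a copy of
# BRANCH onto BASE so that BRANCH itself is left untouched.
rebase_on_copy() {
  branch="$1"
  base="$2"
  copy="${branch}-rebased"   # assumed naming convention for the copy
  # Create the copy at the same commit as BRANCH and switch to it.
  git checkout -q -b "$copy" "$branch" || return 1
  # Replay the copy's commits on top of BASE; BRANCH keeps its old history.
  git rebase -q "$base"
}
```

Usage might look like `rebase_on_copy polishing-opt-MLlib upstream/master`, assuming a remote named `upstream` that tracks the main repository. If the rebase stops on a conflict, `git rebase --abort` returns the copy to its starting point. A rebase only changes the local repository, so the result appears on GitHub only after the rebased branch is pushed.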



Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Patrick Wendell
:P - I'm pretty sure this can be done but it will require some work -
we already use the github API in our merge script and we could hook
something like that up with the jenkins tests. Henry maybe you could
create a JIRA for this for Spark 1.0?

- Patrick

On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra m...@clearstorydata.com wrote:
 I know that it can be done -- which is different from saying that I know how 
 to set it up.




[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34560617
  
@rezazadeh Mind adding a JIRA for this?



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread martinjaggi
Github user martinjaggi closed the pull request at:

https://github.com/apache/incubator-spark/pull/563



[GitHub] incubator-spark pull request: Version number to 1.0.0-SNAPSHOT

2014-02-08 Thread markhamstra
Github user markhamstra closed the pull request at:

https://github.com/apache/incubator-spark/pull/542



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread martinjaggi
GitHub user martinjaggi reopened a pull request:

https://github.com/apache/incubator-spark/pull/563

new MLlib documentation for optimization, regression and classification

new documentation with tex formulas, hopefully improving usability and 
reproducibility of the offered MLlib methods.
also did some minor changes in the code for consistency. scala tests pass.

for easier merging, we could maybe rebase these changes (only  feb 7 is 
relevant) after 
https://github.com/apache/incubator-spark/pull/552
is merged?

jira:
https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-spark polishing-opt-MLlib

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-spark/pull/563.patch


commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T15:57:23Z

minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T15:59:43Z

enable mathjax formula in the .md documentation files

code by @shivaram

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T16:31:29Z

split MLlib documentation by techniques

and linked from the main mllib-guide.md site

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-06T17:04:26Z

enabling inline latex formulas with $.$

same mathjax configuration as used in math.stackexchange.com

sample usage in the linear algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T02:19:38Z

minor polishing, as suggested by @pwendell

commit 93d74988c33a9e4ef0d15e39c8b8fc9e6c36bb28
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T16:33:24Z

renaming LeastSquaresGradient

not to confuse with squared regularizer or a squared gradient. added
some more comments as what the loss functions are good for

commit e4cbe99bbcf7f53ebb8f1a0d2e0b869a4922bca4
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T16:34:45Z

use d for the number of features

try to be consistent, that n is the number of data examples in the RDD,
and each of them has d entries (also in documentation)

commit 79768fd3429df5c6d56f05ac93bdd8cf4355d946
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:13:17Z

correct scaling for MSE loss

to be consistent with the documentation

commit 1e228062b01ac806c4bd032eb0975a8b92431fd9
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:15:44Z

new classification and regression documentation

with complete mathematical formulations. trying to be general for
adding future ML methods as well. table of all subgradients used for
reference.
this change also required a small addition to the mathjax
configuration, to allow equation numbers.

commit 89e472f4121debb175b625ab0c138e24c4e60de8
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:16:51Z

new optimization documentation

explaining GD and SGD and the distributed versions that MLlib
implements.

commit a33be78a47bad1745a03a6e0ee1a4ea1a7893805
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T17:38:57Z

better comments in SGD code for regression

commit 73f5e71e3d9a253ff378907fca202b8d6aae1268
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-07T22:41:42Z

lambda R() in documentation

commit eec58c9c860def9b3b7604c990ec1697812bcbbf
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T17:31:05Z

telling what updater actually does

also use proper scaling for the L2 regularization (using 1/2 as in the
documentation)

commit 2c1cf8d35145081a61865f55f4e48fcfbafddbbe
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T17:56:01Z

remove broken url

commit ecbac73a7450fc90ef1509d9a410c9b627617130
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T17:57:12Z

better description of GradientDescent

commit eae3dce25a4b68bf32ece1ca7783f9b2ffd56dff
Author: Martin Jaggi m.ja...@gmail.com
Date:   2014-02-08T20:30:35Z

line wrap at 100 chars





[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34561878
  
 Build triggered.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34561880
  
Build started.



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34562386
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12637/



[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/563#issuecomment-34562385
  
Build finished.



Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Henry Saputra
:)

Sure thing. I will create a JIRA ticket for this.

Thx guys,

Henry

On Saturday, February 8, 2014, Patrick Wendell pwend...@gmail.com wrote:

 :P - I'm pretty sure this can be done but it will require some work -
 we already use the github API in our merge script and we could hook
 something like that up with the jenkins tests. Henry maybe you could
 create a JIRA for this for Spark 1.0?

 - Patrick

 On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra m...@clearstorydata.com wrote:
  I know that it can be done -- which is different from saying that I know
  how to set it up.
 
 



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34562749
  
Made a few comments on the style.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread rezazadeh
Github user rezazadeh commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34563228
  
@pwendell Not sure why you want this, but here you go:
https://spark-project.atlassian.net/browse/MLLIB-21



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34563275
  
@rezazadeh We need to track all features with JIRAs; it's an Apache 
requirement.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread rezazadeh
Github user rezazadeh commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34563699
  
@mateiz All those style changes made.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34563716
  
 Build triggered.



[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/564#issuecomment-34564093
  
Build finished.



Re: [TODO] Document the release process for Apache Spark

2014-02-08 Thread Patrick Wendell
I ported the release docs to the wiki today. Thanks for reminding me
about this Henry:

https://cwiki.apache.org/confluence/display/SPARK/Preparing+Spark+Releases

- Patrick

On Fri, Feb 7, 2014 at 11:51 AM, Henry Saputra henry.sapu...@gmail.com wrote:
 Cool, Thanks Patrick! Really appreciate it =)

 - Henry

 On Fri, Feb 7, 2014 at 11:46 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Henry,

 Let me document this on the wiki. I already keep pretty thorough
 docs on this; I just need to migrate them to the wiki. I've created a
 JIRA here:

 https://spark-project.atlassian.net/browse/SPARK-1066

 - Patrick

 On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com 
 wrote:
 Hi Patrick,

 As part of the unofficial checklist for graduation, we need to have
 documented steps for making a release.

 As the first and so far the only RE for Apache Spark, I would like to
 ask for your help to document the steps to release. This will help
 other members do the release and take turns, and make sure all future
 PMCs and committers know how to do an Apache Spark release.

 Most of the steps are probably similar to other projects but it is
 always useful for each podling to have its own documentation to
 release artifacts.

 Really appreciate your help.


 Thanks,

 - Henry


[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...

2014-02-08 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/incubator-spark/pull/565

SPARK-1066: Add developer scripts to repository.

These are some developer scripts I've been maintaining in a separate public 
repo. This patch adds them to the Spark repository so they can evolve here and 
are clearly accessible to all committers.

I may do some small additional clean-up in this PR, but wanted to put them 
here in case others want to review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-spark dev-scripts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-spark/pull/565.patch


commit 5d5d331d01f6fd59c2eb830f652955119b012173
Author: Patrick Wendell pwend...@gmail.com
Date:   2014-02-09T06:11:47Z

SPARK-1066: Add developer scripts to repository.





[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/565#issuecomment-34566956
  
Merged build started.



[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/565#issuecomment-34566955
  
 Merged build triggered.



[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/565#issuecomment-34567293
  
Merged build finished.



[GitHub] incubator-spark pull request: SPARK-1066: Add developer scripts to...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/565#issuecomment-34567294
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12639/



[GitHub] incubator-spark pull request: Added example Python code for sort

2014-02-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/incubator-spark/pull/562#issuecomment-34567786
  
Thanks. Merged this into master and branch-0.9.



[GitHub] incubator-spark pull request: [WIP] SPARK-1067: Default log4j init...

2014-02-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/incubator-spark/pull/560#issuecomment-34567839
  
Oops, I didn't notice the WIP in the title. Feel free to revert if necessary.



[GitHub] incubator-spark pull request: [SPARK-1038] Add more fields in Json...

2014-02-08 Thread qqsun8819
Github user qqsun8819 commented on the pull request:

https://github.com/apache/incubator-spark/pull/551#issuecomment-34567873
  
I updated the diff to use a hard-coded JSON string for JSON data 
verification. @pwendell, @rxin and @aarondav, please review it again. Thanks 
very much!



[GitHub] incubator-spark pull request: [SPARK-1038] Add more fields in Json...

2014-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/incubator-spark/pull/551#issuecomment-34567877
  
Merged build started.



Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Henry Saputra
Ok, JIRA ticket filed [1] for this one.

- Henry

[1] https://spark-project.atlassian.net/browse/SPARK-1070
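
The title check discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `has_jira_key` is hypothetical, the project keys SPARK and MLLIB are taken from JIRA links that appear in this archive, and fetching the PR title (for example through the GitHub API) and wiring the result into Jenkins are left out.

```shell
# Hypothetical check: succeed only if the given PR title mentions a JIRA
# key such as SPARK-1070 or MLLIB-19 (key list assumed from this archive).
has_jira_key() {
  # The key must start the title or follow a non-alphanumeric character,
  # so that "[SPARK-1038]" matches but "XSPARK-1" does not.
  printf '%s\n' "$1" | grep -Eq '(^|[^A-Za-z0-9])(SPARK|MLLIB)-[0-9]+'
}
```

A Jenkins step could then print a warning when `has_jira_key "$title"` fails, where `$title` holds the PR title obtained by whatever means the build already uses.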

On Sat, Feb 8, 2014 at 3:39 PM, Patrick Wendell pwend...@gmail.com wrote:
 :P - I'm pretty sure this can be done but it will require some work -
 we already use the github API in our merge script and we could hook
 something like that up with the jenkins tests. Henry maybe you could
 create a JIRA for this for Spark 1.0?

 - Patrick

 On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra m...@clearstorydata.com wrote:
 I know that it can be done -- which is different from saying that I know how 
 to set it up.




[GitHub] incubator-spark pull request: [SPARK-1038] Add more fields in Json...

2014-02-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/incubator-spark/pull/551#issuecomment-34567984
  
Thanks. I left some comments to improve readability of the code.