Re: Automated close of PRs?

2015-12-30 Thread Mridul Muralidharan
I am not sure about others, but I had a PR closed from under me where the ongoing discussion was as recent as 2 weeks back. Given this, I assumed it was an automated close and not a manual one! When the change was opened is not a good metric for the viability of the change (particularly when it touches code which is

Automated close of PRs?

2015-12-30 Thread Mridul Muralidharan
Is there a script running to close "old" PRs? I was not aware of any discussion about this on the dev list. - Mridul

Re: problem with reading source code-pull out nondeterministic expresssions

2015-12-30 Thread Michael Armbrust
The goal here is to ensure that the non-deterministic value is evaluated only once, so the result won't change for a given row (i.e. when sorting). On Tue, Dec 29, 2015 at 10:57 PM, 汪洋 wrote: > Hi fellas, > I am new to Spark and I have a newbie question. I am currently
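To make the point concrete, here is a minimal sketch (not code from the thread; the DataFrame and column names are invented, and it uses the DataFrame API rather than the internal analyzer rule) showing why a nondeterministic expression such as rand() should be evaluated exactly once per row before it is used for sorting:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.rand

    object NondeterministicSortSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("rand-sort-sketch").getOrCreate()
        import spark.implicits._

        val df = Seq("a", "b", "c", "d").toDF("key")

        // Materialize the random value once per row, then sort on that column.
        // Each row keeps a single, stable random value for the whole sort.
        val shuffled = df.withColumn("r", rand()).orderBy($"r").drop("r")
        shuffled.show()

        // Sorting by rand() directly relies on the analyzer pulling the
        // nondeterministic expression into a projection below the Sort,
        // which can be seen in the analyzed plan.
        df.orderBy(rand()).explain(true)

        spark.stop()
      }
    }

Materializing the value in its own column first makes the "evaluate once per row" behavior explicit; sorting by rand() directly leaves that job to the analyzer.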

New processes / tools for changing dependencies in Spark

2015-12-30 Thread Josh Rosen
I just merged https://github.com/apache/spark/pull/10461, a PR that adds new automated tooling to help us reason about dependency changes in Spark. Here's a summary of the changes: - The dev/run-tests script (used in the SBT Jenkins builds and for testing Spark pull requests) now generates
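For those who have not seen this kind of check before, the underlying idea is roughly the following (this is only an illustrative sketch; the file paths and program name are made up and are not the actual dev/run-tests tooling): generate the full list of resolved dependencies for a build profile, diff it against a manifest checked into the repository, and fail the build when they disagree.

    import scala.io.Source

    // Illustrative only: compare a freshly generated dependency list against
    // a checked-in manifest and report any drift. Paths are hypothetical.
    object DependencyManifestDiff {
      def load(path: String): Set[String] =
        Source.fromFile(path).getLines().map(_.trim).filter(_.nonEmpty).toSet

      def main(args: Array[String]): Unit = {
        val expected = load(args(0)) // e.g. a manifest committed to the repo
        val actual   = load(args(1)) // e.g. the list produced by the current build

        (actual -- expected).toSeq.sorted.foreach(d => println(s"ADDED:   $d"))
        (expected -- actual).toSeq.sorted.foreach(d => println(s"REMOVED: $d"))

        if (actual != expected) sys.exit(1)
      }
    }

Committing the manifest means any dependency change shows up as an explicit, reviewable diff in the pull request itself.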

Re: IndentationCheck of checkstyle

2015-12-30 Thread Ted Yu
Right. Pardon my carelessness. > On Dec 29, 2015, at 9:58 PM, Reynold Xin wrote: > > OK to close the loop - this thread has nothing to do with Spark? > > >> On Tue, Dec 29, 2015 at 9:55 PM, Ted Yu wrote: >> Oops, wrong list :-) >> >>> On Dec 29,

Re: running lda in spark throws exception

2015-12-30 Thread Li Li
I used a small data set and reproduced the problem. But I don't know whether my code is correct or not because I am not familiar with Spark. So I will first post my code here. If it's correct, then I will post the data. One line of my data looks like: { "time":"08-09-17","cmtUrl":"2094361"
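Since the original code and data are truncated here, the following is only a minimal, self-contained sketch (the vocabulary size and counts are invented, not the poster's data) of the input shape MLlib's LDA expects: an RDD of (documentId, termCountVector) pairs.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.LDA
    import org.apache.spark.mllib.linalg.Vectors

    // Tiny in-memory corpus: each document is a vector of term counts
    // over a fixed vocabulary, keyed by a unique document id.
    object LdaSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("lda-sketch"))

        val corpus = sc.parallelize(Seq(
          Vectors.dense(1.0, 0.0, 3.0, 0.0),
          Vectors.dense(0.0, 2.0, 0.0, 1.0),
          Vectors.dense(4.0, 0.0, 1.0, 0.0)
        )).zipWithIndex.map { case (v, id) => (id, v) }.cache()

        val model = new LDA().setK(2).setMaxIterations(20).run(corpus)
        println(s"Learned topics (word distributions):\n${model.topicsMatrix}")

        sc.stop()
      }
    }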

Re: Data and Model Parallelism in MLPC

2015-12-30 Thread Disha Shrivastava
Hi, I went through the code for the implementation of MLPC and couldn't understand why stacking/unstacking of the input data is done. The description says "Block size for stacking input data in matrices to speed up the computation. Data is stacked within partitions. If block size is more than
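As background for the question: the quoted parameter is blockSize on the ml MultilayerPerceptronClassifier, and stacking rows into per-partition matrices presumably lets the forward and backward passes use batched matrix operations rather than per-row vector math. A minimal sketch of where the parameter is set (the tiny dataset below is invented; only the parameter wiring is the point):

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.sql.SparkSession

    object MlpcBlockSizeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("mlpc-sketch").getOrCreate()
        import spark.implicits._

        // Toy XOR-style training data with a "label" and a "features" column.
        val train = Seq(
          (0.0, Vectors.dense(0.0, 0.0)),
          (1.0, Vectors.dense(0.0, 1.0)),
          (1.0, Vectors.dense(1.0, 0.0)),
          (0.0, Vectors.dense(1.0, 1.0))
        ).toDF("label", "features")

        val mlpc = new MultilayerPerceptronClassifier()
          .setLayers(Array(2, 5, 2))   // input, hidden, and output layer sizes
          .setBlockSize(128)           // rows stacked into one matrix per partition
          .setMaxIter(100)

        val model = mlpc.fit(train)
        model.transform(train).select("features", "prediction").show()

        spark.stop()
      }
    }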