Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Krishna Sankar
I do run both Python and Scala. But via iPython/Python2 with my own test code. Not running the tests from the distribution. Cheers On Mon, Sep 26, 2016 at 11:59 AM, Holden Karau wrote: > I'm seeing some test failures with Python 3 that could definitely be > environmental

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min mvn clean package -Pyarn -Phadoop-2.7 -DskipTests 2. Tested pyspark, mllib (iPython 4.0) 2.0 Spark version is 2.0.0 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-15 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 26:27 min mvn clean package -Pyarn -Phadoop-2.7 -DskipTests 2. Tested pyspark, mllib (iPython 4.0) 2.0 Spark version is 2.0.0 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-15 Thread Krishna Sankar
Can't find the "spark-assembly-2.0.0-hadoop2.7.0.jar" after compilation. Usually it is in the assembly/target/scala-2.11 Has the packaging changed for 2.0.0 ? Cheers On Thu, Jul 14, 2016 at 11:59 AM, Reynold Xin wrote: > Please vote on releasing the following candidate as

Thanks For a Job Well Done !!!

2016-06-18 Thread Krishna Sankar
Hi all, Just wanted to thank all for the dataset API - most of the time we see only bugs in these lists ;o). - Putting some context, this weekend I was updating the SQL chapters of my book - it had all the ugliness of SchemaRDD, registerTempTable, take(10).foreach(println) and

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-18 Thread Krishna Sankar
+1. Looks Good. The mllib results are in line with 1.6.1. Deprecation messages. I will convert to ml and test later in the day. Also will try GraphX exercises for our Strata London Tutorial Quick Notes: 1. pyspark env variables need to be changed - IPYTHON and IPYTHON_OPTS are removed

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Krishna Sankar
Hi, 1. Yep, GraphX is stable and would be a good choice for you to implement algorithms. For a quick intro you can refer to our Strata MLlib tutorial GraphX slides http://goo.gl/Ffq2Az 2. GraphX has implemented algorithms like PageRank & ConnectedComponents[1] 3. It also has

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 29:25 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib (iPython 4.0) 2.0 Spark version is 1.6.0 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-17 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 29:32 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib (iPython 4.0) 2.0 Spark version is 1.6.0 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Krishna Sankar
Guys, The sc.version gives 1.6.0-SNAPSHOT. Need to change to 1.6.0. Can you pl verify ? Cheers On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.6.0! > > The vote is open until
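The check described in this message could be scripted roughly as follows. This is only a sketch: `reported` stands in for whatever string `sc.version` returns, and the helper name `is_final_release` is hypothetical, not part of Spark.

```python
def is_final_release(reported: str, expected: str) -> bool:
    """Return True only if the reported version is exactly the expected
    release string and carries no SNAPSHOT suffix."""
    return reported == expected and "SNAPSHOT" not in reported


# The situation from the message: the candidate still reports a snapshot version.
print(is_final_release("1.6.0-SNAPSHOT", "1.6.0"))
print(is_final_release("1.6.0", "1.6.0"))
```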

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Krishna Sankar
In addition to the wrong entry point, I suspect there is a cache problem as well. I have seen strange errors that disappear completely once the ivy cache is deleted. Cheers On Sun, Nov 8, 2015 at 7:54 PM, Ted Yu wrote: > Why did you directly jump to spark-streaming-mqtt

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Krishna Sankar
+1 (non-binding, of course) (Hope I made it in time. ~T-20 !) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 25:52 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib (iPython 4.0, FYI, notebook install is separate “conda install ipython” and then “conda install

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-26 Thread Krishna Sankar
Guys, The sc.version returns 1.5.1 in python and scala. Is anyone getting the same results ? Probably I am doing something wrong. Cheers On Sun, Oct 25, 2015 at 12:07 AM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark > version

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-12 Thread Krishna Sankar
I think the key is to vote a specific set of source tarballs without any binary artifacts. The specific binaries are useful but shouldn't be part of the voting process. Makes sense, we really cannot prove (and no need to) that the binaries do not contain malware, but the source can be proven to

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Krishna Sankar
/jira/browse/SPARK-9550 > I assume it was decided to be ok and its going to be in the release notes > but Reynold or Josh can probably speak to it more. > > Tom > > > > On Thursday, September 3, 2015 10:21 PM, Krishna Sankar < > ksanka...@gmail.com> wrote: > >

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Krishna Sankar
Yin > > On Fri, Sep 4, 2015 at 8:00 AM, Krishna Sankar <ksanka...@gmail.com> > wrote: > >> Thanks Tom. Interestingly it happened between RC2 and RC3. >> Now my vote is +1/2 unless the memory error is known and has a workaround. >> >> Cheers >>

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-03 Thread Krishna Sankar
+? 1. Compiled OSX 10.10 (Yosemite) OK Total time: 26:09 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMeans OK Center And

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 42:36 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4.

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-09 Thread Krishna Sankar
+1 1. Compiled OSX 10.10 (Yosemite) OK Total time: 38:11 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMeans OK Center And

Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Krishna Sankar
). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well

Re: Can not build master

2015-07-03 Thread Krishna Sankar
Patrick, I assume an RC3 will be out for folks like me to test the distribution. As usual, I will run the tests when you have a new distribution. Cheers k/ On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell pwend...@gmail.com wrote: Patch that added test-jar dependencies:

Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Krishna Sankar
Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is tail of

Re: except vs subtract

2015-07-03 Thread Krishna Sankar
Thanks. Forgot about that ;o( On Thu, Jul 2, 2015 at 11:57 PM, Reynold Xin r...@databricks.com wrote: except is a keyword in Python unfortunately. On Thu, Jul 2, 2015 at 11:54 PM, Krishna Sankar ksanka...@gmail.com wrote: Guys, Scala says except while python has subtract. (I verified

except vs subtract

2015-07-03 Thread Krishna Sankar
Guys, Scala says except while python has subtract. (I verified that except doesn't exist in python) Why the difference in syntax for the same functionality ? Cheers k/
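The reply listed above notes that except is a keyword in Python. A minimal standard-library sketch (no Spark needed; the helper name `usable_as_method_name` is made up for illustration) shows why that rules it out as a method name while subtract is fine:

```python
import keyword


def usable_as_method_name(name: str) -> bool:
    """Return True if `name` is a legal identifier that is not reserved
    by Python's grammar, so it could be written as `obj.<name>(...)`."""
    return name.isidentifier() and not keyword.iskeyword(name)


# `except` is reserved (it belongs to try/except), `subtract` is not.
print(usable_as_method_name("except"))
print(usable_as_method_name("subtract"))
```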

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:26 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4.

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-28 Thread Krishna Sankar
Patrick, Haven't seen any replies on test results. I will byte ;o) - Should I test this version or is another one in the wings ? Cheers k/ On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 25:42 min (My brand new shiny MacBookPro12,1 : 16GB. Inaugurated the machine with compile test 1.4.0-RC4 !) mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests 2. Tested

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-30 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:07 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests 2. Tested pyspark, mllib - running as well as compare results with 1.3.1 2.1. statistics

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Krishna Sankar
Quick tests from my side - looks OK. The results are same or very similar to 1.3.1. Will add dataframes et al in future tests. +1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:42 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4

Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 14:16 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, mllib - running as well as compare results with 1.3.0 pyspark works

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-06 Thread Krishna Sankar
+1 On Sun, Apr 5, 2015 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.2! The tag to be voted on is v1.2.2-rc1 (commit 7531b50):

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-04 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 15:04 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, mllib - running as well as compare results with 1.3.0 pyspark works

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Krishna Sankar
Excellent, Thanks Xiangrui. The mystery is solved. Cheers k/ On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng men...@gmail.com wrote: Krishna, I tested your linear regression example. For linear regression, we changed its objective function from 1/n * \|A x - b\|_2^2 to 1/(2n) * \|Ax - b\|_2^2
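To see why rescaling the objective from 1/n to 1/(2n) can change results, here is a small one-variable sketch. It assumes a plain L2 penalty of the form `reg * x**2` added to the loss; that penalty form, the data, and the helper name `ridge_1d` are illustrative assumptions, not MLlib's actual parameterization. With no penalty both scalings give the same minimizer, but with a fixed penalty the 1/(2n) loss behaves as if the regularization were twice as strong.

```python
def ridge_1d(a, b, loss_scale, reg):
    """Closed-form minimizer over scalar x of
    loss_scale/n * sum((a_i*x - b_i)**2) + reg * x**2."""
    n = len(a)
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    # Setting the derivative to zero gives
    # x * (loss_scale/n * saa + reg) = loss_scale/n * sab.
    return (loss_scale / n * sab) / (loss_scale / n * saa + reg)


a = [1.0, 2.0, 3.0]   # made-up feature values
b = [2.0, 4.1, 5.9]   # made-up targets
lam = 0.5             # made-up regularization strength

old = ridge_1d(a, b, loss_scale=1.0, reg=lam)            # 1/n objective
new_same_reg = ridge_1d(a, b, loss_scale=0.5, reg=lam)   # 1/(2n), same reg
new_half_reg = ridge_1d(a, b, loss_scale=0.5, reg=lam / 2)

print(old, new_same_reg, new_half_reg)
```

Under these assumptions, halving the penalty along with the loss reproduces the old solution, while keeping the penalty unchanged does not.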

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Krishna Sankar
Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop Distributions X ... Maybe one option is to have a minimum basic set (which I know is what we are discussing) and move the rest to spark-packages.org. There the vendors can add the latest downloads - for example when 1.4 is

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:55 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, mllib - running as well as compare results with 1.1.x 1.2.x pyspark

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Krishna Sankar
: On Tue, Mar 3, 2015 at 11:15 PM, Krishna Sankar ksanka...@gmail.com wrote: +1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:53 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:53 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, mllib - running as well as compare results with 1.1.x 1.2.x 2.1.

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Krishna Sankar
Excellent. Explicit toDF() works. a) employees.toDF().registerTempTable("Employees") - works b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile Adding to my earlier tests: 4.0 SQL from Scala and Python 4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK 4.2

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 14:50 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, mllib - running as well as compare results with 1.1.x 1.2.x 2.1.

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-02 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 11:13 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, mllib - running as well as compare results with 1.1.x 1.2.0 2.1.

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:22 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests 2. Tested pyspark, mllib - running as well as compare results with 1.1.x 1.2.0 2.1. statistics

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Krishna Sankar
+1 1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests 2. Tested pyspark, mllib - running as well as compare results with 1.1.x 1.2.0 2.1. statistics OK 2.2. Linear/Ridge/Lasso Regression

Fwd: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-17 Thread Krishna Sankar
Forgot Reply To All ;o( -- Forwarded message -- From: Krishna Sankar ksanka...@gmail.com Date: Wed, Dec 10, 2014 at 9:16 PM Subject: Re: [VOTE] Release Apache Spark 1.2.0 (RC2) To: Matei Zaharia matei.zaha...@gmail.com +1 Works same as RC1 1. Compiled OSX 10.10 (Yosemite) mvn

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-04 Thread Krishna Sankar
, Nov 30, 2014 at 6:49 AM, Krishna Sankar ksanka...@gmail.com wrote: +1 1. Compiled OSX 10.10 (Yosemite) mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package 16:46 min (slightly slower connection) 2. Tested pyspark, mllib - running as well as compare results with 1.1.x

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-19 Thread Krishna Sankar
+1 1. Compiled OSX 10.10 (Yosemite) mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package 10:49 min 2. Tested pyspark, mllib 2.1. statistics OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMeans OK 2.5. rdd operations OK 2.6. recommendation OK

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Krishna Sankar
Well done guys. MapReduce sort at that time was a good feat and Spark now has raised the bar with the ability to sort a PB. Like some of the folks in the list, a summary of what worked (and didn't) as well as the monitoring practices would be good. Cheers k/ P.S: What are you folks planning next ?

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-05 Thread Krishna Sankar
+1 - Compiled rc2 w/ CentOS 6.5, Yarn,Hadoop 2.2.0 - successful - Smoke Test (scala,python) (distributed cluster) - successful - We had run Java/SparkSQL (count, distinct et al) ~250M records RDD over HBase 0.98.3 over last build (rc1) - successful - Stand alone multi-node cluster

Re: [VOTE] Release Apache Spark 1.0.1 (RC1)

2014-06-27 Thread Krishna Sankar
+1 Compiled for CentOS 6.5, deployed in our 4 node cluster (Hadoop 2.2, YARN) Smoke Tests (sparkPi,spark-shell, web UI) successful Cheers k/ On Thu, Jun 26, 2014 at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version

Re: Contributing Spark Infrastructure Configuration Docs

2014-06-05 Thread Krishna Sankar
Stephen, We are working thru Dell configurations; would be happy to review your diagrams and offer feedback from our experience. Let me know the URLs. Cheers k/ On Thu, Jun 5, 2014 at 2:51 PM, Stephen Watt sw...@redhat.com wrote: Hi Folks My name is Steve Watt and I work in the CTO

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-28 Thread Krishna Sankar
+1 Pulled built on MacOS X, EC2 Amazon Linux Ran test programs on OS X, 5 node c3.4xlarge cluster Cheers k/ On Wed, May 28, 2014 at 7:36 PM, Andy Konwinski andykonwin...@gmail.com wrote: +1 On May 28, 2014 7:05 PM, Xiangrui Meng men...@gmail.com wrote: +1 Tested apps with standalone