Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Thanks
Proust

From: Sean Owen <so...@cloudera.com>
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user <user@spark.apache.org>
Date: 07/28/2015 02:20 PM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

It wasn't removed, but rewritten. Cygwin is just a distribution …

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
    […]
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
    at org.apache.spark.launcher.Main.main(Main.java:74)

Thanks
Proust

From: Sean Owen <so...@cloudera.com>
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user <user@spark.apache.org>
Date: 07/28/2015 06:54 PM

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Thanks Vanzin, spark-submit.cmd works.

Thanks
Proust

From: Marcelo Vanzin <van...@cloudera.com>
To: Proust GZ Feng/China/IBM@IBMCN
Cc: Sean Owen <so...@cloudera.com>, user <user@spark.apache.org>
Date: 07/29/2015 10:35 AM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
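For anyone who still wants to drive jobs from a Cygwin shell, the catch is that spark-submit.cmd and the Java launcher expect Windows-style paths, not /cygdrive ones. A hypothetical Scala helper (not part of Spark) sketching that translation:

    // Hypothetical helper, not part of Spark: turn a Cygwin path such as
    // /cygdrive/c/jobs/app.jar into the C:\jobs\app.jar form that
    // spark-submit.cmd expects.
    def toWindowsPath(path: String): String = {
      val CygDrive = "/cygdrive/([a-zA-Z])(/.*)?".r
      path match {
        case CygDrive(drive, rest) =>
          drive.toUpperCase + ":" + Option(rest).getOrElse("/").replace('/', '\\')
        case other => other.replace('/', '\\')
      }
    }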

NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-27 Thread Proust GZ Feng
Hi, Spark users,

It looks like Spark 1.4.0 cannot work with Cygwin because Cygwin support was removed from bin/spark-class. The changeset is https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3. The changeset says "Add a library for launching Spark jobs programmatically" …
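The library that commit adds is org.apache.spark.launcher, which moves command building from bash into Java; that is also why the Cygwin-specific shell logic disappeared. A minimal sketch of the programmatic route it opens up (the Spark home, jar, class, and master below are placeholders):

    import org.apache.spark.launcher.SparkLauncher

    // Build and start a spark-submit invocation without going through
    // bin/spark-class at all; all values here are placeholders.
    val job = new SparkLauncher()
      .setSparkHome("C:\\spark-1.4.0")
      .setAppResource("C:\\jobs\\my-app.jar")
      .setMainClass("com.example.MyApp")
      .setMaster("local[*]")
      .launch()

    job.waitFor()  // blocks until the child spark-submit process exits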

Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows

2015-06-15 Thread Proust GZ Feng
Are there any additional ideas? Thanks a lot.

Proust

From: Akhil Das <ak...@sigmoidanalytics.com>
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user@spark.apache.org <user@spark.apache.org>
Date: 06/15/2015 03:02 PM
Subject: Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows

Have …

Spark DataFrame Reduce Job Took 40s for 6000 Rows

2015-06-14 Thread Proust GZ Feng
Hi, Spark experts,

I have been playing with Spark for several weeks. After some testing, a reduce operation on a DataFrame takes 40s on a cluster with 5 datanode executors, and the backing data is only about 6,000 rows. Is this a normal case? The performance looks far too slow, since in Java a loop over 6,000 rows …
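For reference, 40s on 6,000 rows usually points at per-job overhead (scheduling, re-reading the backing store on every action) rather than the row count itself. A hedged sketch of two common first steps, caching the DataFrame and pushing the reduction into a built-in aggregate, written against the 1.4-era API and assuming an existing sqlContext; the path and column name are placeholders:

    import org.apache.spark.sql.functions.sum

    // Cache so repeated actions don't re-read the 6,000 rows from the backend.
    val df = sqlContext.read.json("hdfs:///path/to/rows.json").cache()

    // A built-in aggregate runs inside the SQL engine and avoids the extra
    // serialization of a hand-rolled reduce.
    val total = df.agg(sum("amount")).first().getDouble(0)

    // The equivalent reduce on the underlying RDD, for comparison.
    val totalViaRdd = df.select("amount").rdd.map(_.getDouble(0)).reduce(_ + _)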