Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
Thanks Vanzin, spark-submit.cmd works.

Thanks
Proust

From: Marcelo Vanzin
To: Proust GZ Feng/China/IBM@IBMCN
Cc: Sean Owen, user
Date: 07/29/2015 10:35 AM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

Can you run the windows batch files (e.g. spark-submit.cmd) from the cygwin shell?

On Tue, Jul 28, 2015 at 7:26 PM, Proust GZ Feng wrote:

Hi, Owen

Adding back the cygwin classpath detection gets past the issue mentioned before, but there seems to be a lack of further support in the launcher library; see the stacktrace below.

LAUNCH_CLASSPATH: C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar

java -cp C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --driver-class-path ../thirdparty/lib/db2-jdbc4-95fp6a/db2jcc4.jar --properties-file conf/spark.properties target/scala-2.10/price-scala-assembly-15.4.0-SNAPSHOT.jar

Exception in thread "main" java.lang.IllegalStateException: Library directory 'C:\c\spark-1.4.0-bin-hadoop2.3\lib_managed\jars' does not exist.
        at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:229)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:215)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:115)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:192)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
        at org.apache.spark.launcher.Main.main(Main.java:74)

Thanks
Proust

From: Sean Owen
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user
Date: 07/28/2015 06:54 PM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

Does adding back the cygwin detection and this clause make it work?

if $cygwin; then
  CLASSPATH="`cygpath -wp "$CLASSPATH"`"
fi

If so I imagine that's fine to bring back, if that's still needed.
On Tue, Jul 28, 2015 at 9:49 AM, Proust GZ Feng wrote:
> Thanks Owen, the problem under Cygwin is that running spark-submit under 1.4.0
> simply reports
>
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> This is because under Cygwin spark-class sets LAUNCH_CLASSPATH to
> "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar".
> But java on Windows cannot recognize that classpath, so the command below
> simply errors out:
>
> java -cp
> /c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar
> org.apache.spark.launcher.Main
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> Thanks
> Proust
>
> From: Sean Owen
> To: Proust GZ Feng/China/IBM@IBMCN
> Cc: user
> Date: 07/28/2015 02:20 PM
> Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
>
> It wasn't removed, but rewritten. Cygwin is just a distribution of
> POSIX-related utilities so you should be able to use the normal .sh
> scripts. In any event, you didn't say what the problem is?
>
> On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote:
>> Hi, Spark Users
>>
>> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
>> Cygwin support in bin/spark-class
>>
>> The changeset is
>> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>>
>> The changeset said "Add a library for launching Spark jobs
>> programmatically", but how to use it in Cygwin?
>> I'm wondering whether any solutions are available to make it work on Windows?
>>
>> Thanks
>> Proust
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
Marcelo
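For readers without a Cygwin install, the path translation that the `cygpath -wp "$CLASSPATH"` clause in this thread performs can be sketched in plain POSIX shell. This sed-based emulation is only an illustrative approximation of the real Cygwin tool, and it assumes the `/c/` drive prefix seen in the thread:

```shell
#!/bin/sh
# Illustrative emulation of `cygpath -wp` for a classpath (NOT the real tool):
# ':' separators become ';', a '/c/' drive prefix becomes 'C:/', and the
# remaining '/' become '\'. cygpath itself handles many more cases.
to_windows_classpath() {
  printf '%s\n' "$1" | sed \
    -e 's/:/;/g' \
    -e 's|^/c/|C:/|' \
    -e 's|;/c/|;C:/|g' \
    -e 's|/|\\|g'
}

# The POSIX-style path that java.exe cannot read...
to_windows_classpath "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar"
# ...becomes C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar
```

This makes it clear why the Windows JVM rejects the untranslated classpath: java.exe expects `C:\...` entries separated by `;`, not `/c/...` entries separated by `:`.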
Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
Although I'm not sure how valuable Cygwin support is, at least the release notes need to mention that Cygwin is not supported by design as of 1.4.0. From the description of the changeset, it looks like removing the support was not intended by the author.

Thanks
Proust

From: Sachin Naik
To: Sean Owen
Cc: Steve Loughran, Proust GZ Feng/China/IBM@IBMCN, "user@spark.apache.org"
Date: 07/29/2015 05:05 AM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

I agree with Sean - using virtual box on windows and using a linux vm is a lot easier than trying to circumvent the cygwin oddities. A lot of functionality might not work in cygwin and you will end up trying to do back patches. Unless there is a compelling reason - cygwin support seems not required.

@sachinnaik from iphone

On Jul 28, 2015, at 1:25 PM, Sean Owen wrote:
> That's for the Windows interpreter rather than bash-running Cygwin. I
> don't know it's worth doing a lot of legwork for Cygwin, but, if it's
> really just a few lines of classpath translation in one script, seems
> reasonable.
>
> On Tue, Jul 28, 2015 at 9:13 PM, Steve Loughran wrote:
>>
>> there's a spark-submit.cmd file for windows. Does that work?
>>
>> On 27 Jul 2015, at 21:19, Proust GZ Feng wrote:
>>
>> Hi, Spark Users
>>
>> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of Cygwin
>> support in bin/spark-class
>>
>> The changeset is
>> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>>
>> The changeset said "Add a library for launching Spark jobs
>> programmatically", but how to use it in Cygwin?
>> I'm wondering whether any solutions are available to make it work on Windows?
>>
>> Thanks
>> Proust
Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
Hi, Owen

Adding back the cygwin classpath detection gets past the issue mentioned before, but there seems to be a lack of further support in the launcher library; see the stacktrace below.

LAUNCH_CLASSPATH: C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar

java -cp C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --driver-class-path ../thirdparty/lib/db2-jdbc4-95fp6a/db2jcc4.jar --properties-file conf/spark.properties target/scala-2.10/price-scala-assembly-15.4.0-SNAPSHOT.jar

Exception in thread "main" java.lang.IllegalStateException: Library directory 'C:\c\spark-1.4.0-bin-hadoop2.3\lib_managed\jars' does not exist.
        at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:229)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:215)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:115)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:192)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
        at org.apache.spark.launcher.Main.main(Main.java:74)

Thanks
Proust

From: Sean Owen
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user
Date: 07/28/2015 06:54 PM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

Does adding back the cygwin detection and this clause make it work?

if $cygwin; then
  CLASSPATH="`cygpath -wp "$CLASSPATH"`"
fi

If so I imagine that's fine to bring back, if that's still needed.
On Tue, Jul 28, 2015 at 9:49 AM, Proust GZ Feng wrote:
> Thanks Owen, the problem under Cygwin is that running spark-submit under 1.4.0
> simply reports
>
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> This is because under Cygwin spark-class sets LAUNCH_CLASSPATH to
> "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar".
> But java on Windows cannot recognize that classpath, so the command below
> simply errors out:
>
> java -cp
> /c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar
> org.apache.spark.launcher.Main
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> Thanks
> Proust
>
> From: Sean Owen
> To: Proust GZ Feng/China/IBM@IBMCN
> Cc: user
> Date: 07/28/2015 02:20 PM
> Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
>
> It wasn't removed, but rewritten. Cygwin is just a distribution of
> POSIX-related utilities so you should be able to use the normal .sh
> scripts. In any event, you didn't say what the problem is?
>
> On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote:
>> Hi, Spark Users
>>
>> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
>> Cygwin support in bin/spark-class
>>
>> The changeset is
>> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>>
>> The changeset said "Add a library for launching Spark jobs
>> programmatically", but how to use it in Cygwin?
>> I'm wondering whether any solutions are available to make it work on Windows?
>>
>> Thanks
>> Proust
Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
Thanks Owen, the problem under Cygwin is that running spark-submit under 1.4.0 simply reports

Error: Could not find or load main class org.apache.spark.launcher.Main

This is because under Cygwin spark-class sets LAUNCH_CLASSPATH to "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar". But java on Windows cannot recognize that classpath, so the command below simply errors out:

java -cp /c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar org.apache.spark.launcher.Main
Error: Could not find or load main class org.apache.spark.launcher.Main

Thanks
Proust

From: Sean Owen
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user
Date: 07/28/2015 02:20 PM
Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

It wasn't removed, but rewritten. Cygwin is just a distribution of POSIX-related utilities so you should be able to use the normal .sh scripts. In any event, you didn't say what the problem is?

On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote:
> Hi, Spark Users
>
> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of Cygwin
> support in bin/spark-class
>
> The changeset is
> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>
> The changeset said "Add a library for launching Spark jobs
> programmatically", but how to use it in Cygwin?
> I'm wondering whether any solutions are available to make it work on Windows?
>
> Thanks
> Proust
NO Cygwin Support in bin/spark-class in Spark 1.4.0
Hi, Spark Users

Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of Cygwin support in bin/spark-class.

The changeset is
https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3

The changeset said "Add a library for launching Spark jobs programmatically", but how to use it in Cygwin?
I'm wondering whether any solutions are available to make it work on Windows?

Thanks
Proust
Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows
Thanks a lot Akhil. After trying some suggestions in the tuning guide, there is no improvement at all. Below is the job detail when running locally (8 cores), which took 3 min to complete the job; we can see it is the map operation that took most of the time - it looks like mapPartitions took too long.

Is there any additional idea? Thanks a lot.

Proust

From: Akhil Das
To: Proust GZ Feng/China/IBM@IBMCN
Cc: "user@spark.apache.org"
Date: 06/15/2015 03:02 PM
Subject: Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows

Have a look here: https://spark.apache.org/docs/latest/tuning.html

Thanks
Best Regards

On Mon, Jun 15, 2015 at 11:27 AM, Proust GZ Feng wrote:

Hi, Spark Experts

I have played with Spark for several weeks. After some testing, a reduce operation on a DataFrame costs 40s on a cluster with 5 datanode executors, and the backing data is about 6,000 rows; is this a normal case? Such performance looks too bad, because in Java a loop over 6,000 rows takes just several seconds.

I'm wondering what documentation I should read to make the job much faster?

Thanks in advance
Proust
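For context, the tuning guide Akhil links covers settings that would go in the conf/spark.properties file passed via --properties-file earlier in this thread. A minimal, assumed fragment is sketched below; the property names are from the Spark configuration documentation, but the values are placeholders to tune for the actual cluster, and none of this is known to fix the slowdown reported above:

```
# Assumed illustrative values - tune for your cluster
spark.serializer           org.apache.spark.serializer.KryoSerializer
spark.default.parallelism  16
spark.executor.memory      2g
```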
Spark DataFrame Reduce Job Took 40s for 6000 Rows
Hi, Spark Experts

I have played with Spark for several weeks. After some testing, a reduce operation on a DataFrame costs 40s on a cluster with 5 datanode executors, and the backing data is about 6,000 rows; is this a normal case? Such performance looks too bad, because in Java a loop over 6,000 rows takes just several seconds.

I'm wondering what documentation I should read to make the job much faster?

Thanks in advance
Proust