Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Thanks Vanzin, spark-submit.cmd works 

Thanks
Proust




From:   Marcelo Vanzin 
To: Proust GZ Feng/China/IBM@IBMCN
Cc: Sean Owen , user 
Date:   07/29/2015 10:35 AM
Subject:Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0



Can you run the Windows batch files (e.g. spark-submit.cmd) from the Cygwin shell?
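For reference: Cygwin's bash hands .cmd scripts to cmd.exe, so the Windows
batch files can typically be run straight from a Cygwin prompt. A minimal
sketch, with a hypothetical application jar:

    # run the Windows launcher from Cygwin; bash delegates the .cmd to cmd.exe
    cd /c/spark-1.4.0-bin-hadoop2.3
    ./bin/spark-submit.cmd --properties-file conf/spark.properties app.jar

Proust's reply at the top of this thread confirms that this route works.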

On Tue, Jul 28, 2015 at 7:26 PM, Proust GZ Feng  wrote:
Hi, Owen 

Adding back the Cygwin classpath detection gets past the issue mentioned 
before, but there seems to be a lack of further support in the launcher 
library; see the stack trace below. (Note the mangled 'C:\c\' prefix in it: 
the launcher appears to resolve the Cygwin-style SPARK_HOME 
'/c/spark-1.4.0-bin-hadoop2.3' as a Windows path relative to the current 
drive, so more than the classpath needs translating.)

LAUNCH_CLASSPATH: C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar
java -cp C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --driver-class-path ../thirdparty/lib/db2-jdbc4-95fp6a/db2jcc4.jar --properties-file conf/spark.properties target/scala-2.10/price-scala-assembly-15.4.0-SNAPSHOT.jar
Exception in thread "main" java.lang.IllegalStateException: Library directory 'C:\c\spark-1.4.0-bin-hadoop2.3\lib_managed\jars' does not exist.
        at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:229)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:215)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:115)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:192)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
        at org.apache.spark.launcher.Main.main(Main.java:74)

Thanks 
Proust 




From:   Sean Owen
To:     Proust GZ Feng/China/IBM@IBMCN
Cc:     user
Date:   07/28/2015 06:54 PM
Subject:    Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0



Does adding back the cygwin detection and this clause make it work?

if $cygwin; then
 CLASSPATH="`cygpath -wp "$CLASSPATH"`"
fi

If so I imagine that's fine to bring back, if that's still needed.
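For context, that clause assumes a $cygwin flag set earlier in the script.
The pre-1.4 spark-class derived it from uname, roughly as follows (a
reconstruction, not a verbatim quote of the old script):

    # detect a Cygwin environment so the classpath can be translated for the JVM
    cygwin=false
    case "`uname`" in
      CYGWIN*) cygwin=true;;
    esac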

On Tue, Jul 28, 2015 at 9:49 AM, Proust GZ Feng  wrote:
> Thanks Owen, the problem under Cygwin is that when running spark-submit
> under 1.4.0, it simply reports
>
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> This is because under Cygwin spark-class sets LAUNCH_CLASSPATH to
> "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar",
> but java on Windows cannot recognize that classpath, so the command below
> simply errors out
>
>  java -cp
> /c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar
> org.apache.spark.launcher.Main
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> Thanks
> Proust
>
>
>
> From:   Sean Owen
> To:     Proust GZ Feng/China/IBM@IBMCN
> Cc:     user
> Date:   07/28/2015 02:20 PM
> Subject:    Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
> 
>
>
>
> It wasn't removed, but rewritten. Cygwin is just a distribution of
> POSIX-related utilities so you should be able to use the normal .sh
> scripts. In any event, you didn't say what the problem is?
>
> On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote:
>> Hi, Spark Users
>>
>> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
>> Cygwin support in bin/spark-class
>>
>> The changeset is
>> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>>
>> The changeset says "Add a library for launching Spark jobs
>> programmatically", but how can it be used from Cygwin?
>> I'm wondering whether any solutions are available to make it work on Windows.
>>
>>
>> Thanks
>> Proust
>





-- 
Marcelo


Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Although I'm not sure how valuable Cygwin support is, at least the 
release notes should mention that Cygwin is not supported by design as of 
1.4.0

From the description of the changeset, it looks like removing the support 
was not intended by the author

Thanks
Proust




From:   Sachin Naik
To:     Sean Owen
Cc:     Steve Loughran, Proust GZ Feng/China/IBM@IBMCN, "user@spark.apache.org"
Date:   07/29/2015 05:05 AM
Subject:    Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0



I agree with Sean: using VirtualBox on Windows with a Linux VM is a lot 
easier than trying to circumvent the Cygwin oddities. A lot of 
functionality might not work in Cygwin, and you will end up trying to patch 
things back in. Unless there is a compelling reason, Cygwin support seems 
unnecessary.


@sachinnaik from iphone


On Jul 28, 2015, at 1:25 PM, Sean Owen  wrote:

> That's for the Windows interpreter rather than bash-running Cygwin. I
> don't know that it's worth doing a lot of legwork for Cygwin, but if it's
> really just a few lines of classpath translation in one script, that seems
> reasonable.
> 
> On Tue, Jul 28, 2015 at 9:13 PM, Steve Loughran wrote:
>> 
>> there's a spark-submit.cmd file for windows. Does that work?
>> 
>> On 27 Jul 2015, at 21:19, Proust GZ Feng  wrote:
>> 
>> Hi, Spark Users
>> 
>> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
>> Cygwin support in bin/spark-class
>> 
>> The changeset is
>> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>> 
>> The changeset says "Add a library for launching Spark jobs
>> programmatically", but how can it be used from Cygwin?
>> I'm wondering whether any solutions are available to make it work on Windows.
>> 
>> 
>> Thanks
>> Proust
> 




Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Hi, Owen

Adding back the Cygwin classpath detection gets past the issue mentioned 
before, but there seems to be a lack of further support in the launcher 
library; see the stack trace below

LAUNCH_CLASSPATH: C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar
java -cp C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --driver-class-path ../thirdparty/lib/db2-jdbc4-95fp6a/db2jcc4.jar --properties-file conf/spark.properties target/scala-2.10/price-scala-assembly-15.4.0-SNAPSHOT.jar
Exception in thread "main" java.lang.IllegalStateException: Library directory 'C:\c\spark-1.4.0-bin-hadoop2.3\lib_managed\jars' does not exist.
        at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:229)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:215)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:115)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:192)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
        at org.apache.spark.launcher.Main.main(Main.java:74)

Thanks
Proust




From:   Sean Owen 
To: Proust GZ Feng/China/IBM@IBMCN
Cc: user 
Date:   07/28/2015 06:54 PM
Subject:Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0



Does adding back the cygwin detection and this clause make it work?

if $cygwin; then
  CLASSPATH="`cygpath -wp "$CLASSPATH"`"
fi

If so I imagine that's fine to bring back, if that's still needed.

On Tue, Jul 28, 2015 at 9:49 AM, Proust GZ Feng  wrote:
> Thanks Owen, the problem under Cygwin is that when running spark-submit
> under 1.4.0, it simply reports
>
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> This is because under Cygwin spark-class sets LAUNCH_CLASSPATH to
> "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar",
> but java on Windows cannot recognize that classpath, so the command below
> simply errors out
>
>  java -cp
> /c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar
> org.apache.spark.launcher.Main
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> Thanks
> Proust
>
>
>
> From:   Sean Owen
> To:     Proust GZ Feng/China/IBM@IBMCN
> Cc:     user
> Date:   07/28/2015 02:20 PM
> Subject:    Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
> 
>
>
>
> It wasn't removed, but rewritten. Cygwin is just a distribution of
> POSIX-related utilities so you should be able to use the normal .sh
> scripts. In any event, you didn't say what the problem is?
>
> On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote:
>> Hi, Spark Users
>>
>> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
>> Cygwin support in bin/spark-class
>>
>> The changeset is
>> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>>
>> The changeset says "Add a library for launching Spark jobs
>> programmatically", but how can it be used from Cygwin?
>> I'm wondering whether any solutions are available to make it work on Windows.
>>
>>
>> Thanks
>> Proust
>




Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Thanks Owen, the problem under Cygwin is that when running spark-submit 
under 1.4.0, it simply reports

Error: Could not find or load main class org.apache.spark.launcher.Main

This is because under Cygwin spark-class sets LAUNCH_CLASSPATH to 
"/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar", 
but java on Windows cannot recognize that classpath, so the command below 
simply errors out

java -cp /c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar org.apache.spark.launcher.Main
Error: Could not find or load main class org.apache.spark.launcher.Main
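For reference, the translation the removed clause performed is exactly what
cygpath does: it rewrites the POSIX-style path into the Windows form the JVM
expects. A sketch:

    $ cygpath -wp "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar"
    C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar

Passing the converted form to java -cp is what let the LAUNCH_CLASSPATH
experiment later in this thread get past this error.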

Thanks
Proust



From:   Sean Owen 
To:     Proust GZ Feng/China/IBM@IBMCN
Cc: user 
Date:   07/28/2015 02:20 PM
Subject:Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0



It wasn't removed, but rewritten. Cygwin is just a distribution of
POSIX-related utilities so you should be able to use the normal .sh
scripts. In any event, you didn't say what the problem is?

On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng  wrote:
> Hi, Spark Users
>
> Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
> Cygwin support in bin/spark-class
>
> The changeset is
> https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
>
> The changeset says "Add a library for launching Spark jobs
> programmatically", but how can it be used from Cygwin?
> I'm wondering whether any solutions are available to make it work on Windows.
>
>
> Thanks
> Proust





NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-27 Thread Proust GZ Feng
Hi, Spark Users

Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of 
Cygwin support in bin/spark-class

The changeset is 
https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3

The changeset says "Add a library for launching Spark jobs 
programmatically", but how can it be used from Cygwin?
I'm wondering whether any solutions are available to make it work on Windows.


Thanks
Proust
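For background on where this bites: in 1.4.0, bin/spark-class shells out to
the new launcher library to build the final java command and then executes
whatever it prints. Roughly (a simplification rather than the verbatim
script; RUNNER and LAUNCH_CLASSPATH are set earlier in it):

    # the launcher emits the full command NUL-separated; the script execs it
    CMD=()
    while IFS= read -d '' -r ARG; do
      CMD+=("$ARG")
    done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
    exec "${CMD[@]}"

Under Cygwin, LAUNCH_CLASSPATH ends up as a POSIX-style path that the
Windows JVM cannot resolve, which is where the failures in this thread start.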


Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows

2015-06-15 Thread Proust GZ Feng
Thanks a lot Akhil; after trying some suggestions in the tuning guide, there 
seems to be no improvement at all.

Below is the job detail when running locally (8 cores), which took 3 minutes 
to complete the job; we can see the map operation took most of the time, and 
it looks like the mapPartitions stage is the bottleneck.

Are there any additional ideas? Thanks a lot.

Proust
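One knob worth ruling out, as an assumption rather than a diagnosis from the
job detail: DataFrame shuffles default to 200 tasks via
spark.sql.shuffle.partitions, which is far more than ~6,000 rows need, and
per-task scheduling overhead can then dominate the run time. A hypothetical
invocation, with the application jar as a placeholder:

    spark-submit --master local[8] \
      --conf spark.sql.shuffle.partitions=8 \
      your-assembly.jar

Matching the shuffle parallelism to the data size (or the core count) keeps
the overhead proportional to the real work.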




From:   Akhil Das 
To: Proust GZ Feng/China/IBM@IBMCN
Cc: "user@spark.apache.org" 
Date:   06/15/2015 03:02 PM
Subject:Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows



Have a look here https://spark.apache.org/docs/latest/tuning.html

Thanks
Best Regards

On Mon, Jun 15, 2015 at 11:27 AM, Proust GZ Feng  wrote:
Hi, Spark Experts 

I have played with Spark for several weeks. After some testing, a reduce 
operation on a DataFrame costs 40s on a cluster with 5 datanode executors, 
and the backing data is only about 6,000 rows. Is this a normal case? Such 
performance looks too bad, because in Java a loop over 6,000 rows takes just 
several seconds.

I'm wondering what documents I should read to make the job much faster?




Thanks in advance 
Proust 



Spark DataFrame Reduce Job Took 40s for 6000 Rows

2015-06-14 Thread Proust GZ Feng
Hi, Spark Experts

I have played with Spark for several weeks. After some testing, a reduce 
operation on a DataFrame costs 40s on a cluster with 5 datanode executors, 
and the backing data is only about 6,000 rows. Is this a normal case? Such 
performance looks too bad, because in Java a loop over 6,000 rows takes just 
several seconds.

I'm wondering what documents I should read to make the job much faster?




Thanks in advance
Proust