RE: SparkLauncher not notified about finished job - hangs infinitely.

2015-08-03 Thread Tomasz Guziałek
Reading from the input stream and the error stream (in separate threads) indeed 
unblocked the launcher and it exited properly. Thanks for your responses!
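
For reference, a minimal sketch of the approach that worked (the class and the
drain helper are illustrative, not our exact code):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.spark.launcher.SparkLauncher;

public class LauncherExample {

    // Drain a subprocess stream on its own thread so the subprocess
    // cannot block on a full stdout/stderr pipe buffer.
    private static void drain(final InputStream in, final String tag) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try (BufferedReader reader =
                             new BufferedReader(new InputStreamReader(in))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(tag + line);
                    }
                } catch (Exception e) {
                    // The stream closes when the subprocess exits.
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                .setAppResource("C:\\spark-1.4.1-bin-hadoop2.6\\lib\\spark-examples-1.4.1-hadoop2.6.0.jar")
                .setMainClass("org.apache.spark.examples.SparkPi")
                .setMaster("yarn-cluster")
                .launch();

        // Read stdout and stderr in separate threads BEFORE blocking in waitFor().
        drain(spark.getInputStream(), "[stdout] ");
        drain(spark.getErrorStream(), "[stderr] ");

        int exitCode = spark.waitFor();
        System.out.println("Finished! Exit code: " + exitCode);
    }
}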

Best regards,
Tomasz




Re: SparkLauncher not notified about finished job - hangs infinitely.

2015-07-31 Thread Ted Yu
Tomasz:
Please take a look at the Redirector class inside:
./launcher/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java

FYI



Re: SparkLauncher not notified about finished job - hangs infinitely.

2015-07-31 Thread Elkhan Dadashov
Nope, the output stream of that subprocess should be spark.getInputStream().

According to the Oracle Doc
<https://docs.oracle.com/javase/8/docs/api/java/lang/Process.html>:

"public abstract InputStream

 getInputStream()
Returns the input stream connected to the normal output of the subprocess.
The stream obtains data piped from the standard output of the process
represented by this Process object."
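
In other words, from the parent's point of view (an illustrative fragment,
with variable names of my own choosing):

import java.io.InputStream;
import java.io.OutputStream;

class ProcessStreamDirections {
    // "spark" is the Process returned by SparkLauncher.launch().
    static void demo(Process spark) {
        InputStream childStdout = spark.getInputStream();   // piped from the subprocess's stdout
        InputStream childStderr = spark.getErrorStream();   // piped from the subprocess's stderr
        OutputStream childStdin = spark.getOutputStream();  // feeds the subprocess's stdin
    }
}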

On Fri, Jul 31, 2015 at 10:10 AM, Ted Yu wrote:

> minor typo:
>
> bq. output (spark.getInputStream())
>
> Should be spark.getOutputStream()
>
> Cheers
>


-- 

Best regards,
Elkhan Dadashov


Re: SparkLauncher not notified about finished job - hangs infinitely.

2015-07-31 Thread Ted Yu
minor typo:

bq. output (spark.getInputStream())

Should be spark.getOutputStream()

Cheers



Re: SparkLauncher not notified about finished job - hangs infinitely.

2015-07-31 Thread Elkhan Dadashov
Hi Tomasz,

*Answer to your 1st question*:

Read (and thereby clear) the error stream (spark.getErrorStream()) and the
output stream (spark.getInputStream()) before you call spark.waitFor(); it is
best to read them from two different threads. Then it should work fine.

The Spark job is launched as a subprocess, and according to the Oracle
documentation <https://docs.oracle.com/javase/8/docs/api/java/lang/Process.html>:

"By default, the created subprocess does not have its own terminal or
console. All its standard I/O (i.e. stdin, stdout, stderr) operations will
be redirected to the parent process, where they can be accessed via the
streams obtained using the methodsgetOutputStream(), getInputStream(), and
getErrorStream(). The parent process uses these streams to feed input to
and get output from the subprocess. Because some native platforms only
provide limited buffer size for standard input and output streams, failure
to promptly write the input stream or read the output stream of the
subprocess may cause the subprocess to block, or even deadlock.
"



On Fri, Jul 31, 2015 at 2:45 AM, Tomasz Guziałek
<tomasz.guzia...@humaninference.com> wrote:

> I am trying to submit a JAR with a Spark job to the YARN cluster from Java
> code. I am using SparkLauncher to submit the SparkPi example:
>
> Process spark = new SparkLauncher()
>         .setAppResource("C:\\spark-1.4.1-bin-hadoop2.6\\lib\\spark-examples-1.4.1-hadoop2.6.0.jar")
>         .setMainClass("org.apache.spark.examples.SparkPi")
>         .setMaster("yarn-cluster")
>         .launch();
> System.out.println("Waiting for finish...");
> int exitCode = spark.waitFor();  // never returns in yarn-cluster mode
> System.out.println("Finished! Exit code:" + exitCode);
>
> There are two problems:
>
> 1. When submitting in "yarn-cluster" mode, the application is successfully
> submitted to YARN and executes successfully (it is visible in the YARN UI,
> reported as SUCCESS, and the PI value is printed in the output). However,
> the submitting application is never notified that processing has finished -
> it hangs indefinitely after printing "Waiting for finish..." The log of the
> container can be found here: http://pastebin.com/LscBjHQc
> 2. When submitting in "yarn-client" mode, the application does not appear
> in the YARN UI and the submitting application hangs at "Waiting for
> finish..." When the hanging process is killed, the application shows up in
> the YARN UI and is reported as SUCCESS, but the output is empty (the PI
> value is not printed). The log of the container can be found here:
> http://pastebin.com/9KHi81r4
>
> I tried to execute the submitting application with both Oracle Java 8 and
> Java 7.
>
>
>
> Any hints on what might be wrong?
>
>
>
> Best regards,
>
> Tomasz
>



-- 

Best regards,
Elkhan Dadashov