Thanks very much for your help.

I finally understood the deploy modes, thanks to your explanation, after trying
different approaches in my development environment.

Thanks again.

________________________________
From: Yu Wei <yu20...@hotmail.com>
Sent: Saturday, July 9, 2016 3:04:40 PM
To: Rabin Banerjee
Cc: Mich Talebzadeh; Deng Ching-Mallete; user
Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?


I tried flushing the information to an external system in cluster mode. It
works well.

I suspect that in yarn cluster mode, stdout is closed.



________________________________
From: Rabin Banerjee <dev.rabin.baner...@gmail.com>
Sent: Saturday, July 9, 2016 4:22:10 AM
To: Yu Wei
Cc: Mich Talebzadeh; Deng Ching-Mallete; user
Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?


Yes, I mean dump it to a file in HDFS, via yarn cluster mode.
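
For example, a minimal sketch using foreachRDD (not from the original program;
the HDFS output path is a placeholder, and inputDS is the stream from the code
quoted below in this thread):

        inputDS.foreachRDD(rdd -> {
            if (!rdd.isEmpty()) {
                // write one HDFS directory per batch so each run is easy to inspect
                rdd.saveAsTextFile("/user/jared/mqtt/batch-" + System.currentTimeMillis());
            }
        });

Each batch then lands in HDFS regardless of where the driver's stdout goes.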

On Jul 8, 2016 3:10 PM, "Yu Wei" <yu20...@hotmail.com> wrote:

How could I dump the data into a text file? By writing to HDFS, or some other approach?


Thanks,

Jared

________________________________
From: Rabin Banerjee <dev.rabin.baner...@gmail.com>
Sent: Thursday, July 7, 2016 7:04:29 PM
To: Yu Wei
Cc: Mich Talebzadeh; user; Deng Ching-Mallete
Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?


In that case, I suspect that MQTT is not receiving data when you submit in
yarn cluster mode.

Can you please try dumping the data to a text file instead of printing it
when submitting in yarn cluster mode?

On Jul 7, 2016 12:46 PM, "Yu Wei" <yu20...@hotmail.com> wrote:

Yes. Thanks for your clarification.

The problem I encountered is that in yarn cluster mode, there is no output
from "DStream.print()" in the yarn logs.


In the Spark implementation, org/apache/spark/streaming/dstream/DStream.scala,
the log lines related to "Time" were printed out. However, the output from
firstNum.take(num).foreach(println) was not printed in the logs.

What is the root cause of this behavior difference?


  /**
   * Print the first ten elements of each RDD generated in this DStream. This is an output
   * operator, so this DStream will be registered as an output stream and there materialized.
   */
  def print(): Unit = ssc.withScope {
    print(10)
  }

  /**
   * Print the first num elements of each RDD generated in this DStream. This is an output
   * operator, so this DStream will be registered as an output stream and there materialized.
   */
  def print(num: Int): Unit = ssc.withScope {
    def foreachFunc: (RDD[T], Time) => Unit = {
      (rdd: RDD[T], time: Time) => {
        val firstNum = rdd.take(num + 1)
        // scalastyle:off println
        println("-------------------------------------------")
        println("Time: " + time)
        println("-------------------------------------------")
        firstNum.take(num).foreach(println)
        if (firstNum.length > num) println("...")
        println()
        // scalastyle:on println
      }
    }
    // print() is implemented as a foreachRDD output operation, so the println
    // calls above execute in the driver process
    foreachRDD(context.sparkContext.clean(foreachFunc), displayInnerRDDOps = false)
  }


Thanks,

Jared


________________________________
From: Rabin Banerjee <dev.rabin.baner...@gmail.com>
Sent: Thursday, July 7, 2016 1:04 PM
To: Yu Wei
Cc: Mich Talebzadeh; Deng Ching-Mallete; user@spark.apache.org
Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?

In yarn cluster mode, the driver runs in the AM, so you can find the output in
the AM log. Open the ResourceManager UI and check the job and its logs, or run:

        yarn logs -applicationId <appId>

In yarn client mode, the driver is the same JVM from which you launched the
job, so you get the output in the local log.

On Thu, Jul 7, 2016 at 7:56 AM, Yu Wei <yu20...@hotmail.com> wrote:

Launching via client deploy mode, it works again.

I'm still a little confused about the behavior difference between cluster and
client mode on a single machine.


Thanks,

Jared

________________________________
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: Wednesday, July 6, 2016 9:46:11 PM
To: Yu Wei
Cc: Deng Ching-Mallete; user@spark.apache.org

Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?

I don't think deploy-mode cluster will work.

Try --master yarn --deploy-mode client

FYI


  *   Spark Local – Spark runs on the local host. This is the simplest set-up,
best suited for learners who want to understand the different concepts of
Spark and for those performing unit testing.

  *   Spark Standalone – a simple cluster manager included with Spark that
makes it easy to set up a cluster.

  *   YARN Cluster Mode – the Spark driver runs inside an application master
process which is managed by YARN on the cluster, and the client can go away
after initiating the application. This is invoked with --master yarn and
--deploy-mode cluster.

  *   YARN Client Mode – the driver runs in the client process, and the
application master is only used for requesting resources from YARN. Unlike
Spark standalone mode, in which the master’s address is specified in the
--master parameter, in YARN mode the ResourceManager’s address is picked up
from the Hadoop configuration, so the --master parameter is simply yarn. This
is invoked with --deploy-mode client, as in the example below.
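
For illustration, a client-mode submission of the jar used later in this
thread would look like this (same memory flags as the cluster-mode command):

        spark-submit --master yarn --deploy-mode client --driver-memory 4g \
            --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar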

HTH


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 6 July 2016 at 12:31, Yu Wei <yu20...@hotmail.com> wrote:

Hi Deng,

I tried the same code again.

It seemed that when launching the application via yarn on a single node,
JavaDStream.print() did not work; occasionally, however, it did.

If I launched the same application in local mode, it always worked.


The code is as below:

        SparkConf conf = new SparkConf().setAppName("Monitor&Control");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        JavaReceiverInputDStream<String> inputDS =
                MQTTUtils.createStream(jssc, "tcp://114.55.145.185:1883", "Control");
        inputDS.print();
        jssc.start();
        jssc.awaitTermination();


Command for launching via yarn (did not work):

        spark-submit --master yarn --deploy-mode cluster --driver-memory 4g \
            --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar

Command for launching via local mode (works):

        spark-submit --master local[4] --driver-memory 4g --executor-memory 2g \
            --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar



Any advice?


Thanks,

Jared


________________________________
From: Yu Wei <yu20...@hotmail.com>
Sent: Tuesday, July 5, 2016 4:41 PM
To: Deng Ching-Mallete
Cc: user@spark.apache.org
Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?


Hi Deng,


Thanks for the help. Actually, I need to pay more attention to memory usage.

I found the root cause of my problem. It seemed to be in the Spark Streaming
MQTTUtils module.

When I used "localhost" in the brokerUrl, it didn't work.

After changing it to "127.0.0.1", it works now.
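
In other words, a minimal sketch (broker port and topic taken from the code
elsewhere in this thread):

        // did not receive data in this setup:
        // MQTTUtils.createStream(jssc, "tcp://localhost:1883", "Control")
        // works:
        JavaReceiverInputDStream<String> inputDS =
                MQTTUtils.createStream(jssc, "tcp://127.0.0.1:1883", "Control");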


Thanks again,

Jared



________________________________
From: odeach...@gmail.com on behalf of Deng Ching-Mallete <och...@apache.org>
Sent: Tuesday, July 5, 2016 4:03:28 PM
To: Yu Wei
Cc: user@spark.apache.org
Subject: Re: Is that possible to launch spark streaming application on yarn 
with only one machine?

Hi Jared,

You can launch a Spark application even with just a single node in YARN, 
provided that the node has enough resources to run the job.

It might also be good to note that when YARN calculates the memory allocation
for the driver and the executors, an additional memory overhead is added for
each container, and the result is then rounded up to the nearest GB, IIRC. So
the 4G driver memory + 4x2G executor memory does not necessarily translate to
a total of 12G of allocated memory. It would be more than that, so the node
would need more than 12G of memory for the job to execute in YARN. If it does
not have enough, you should see something like "No resources available in
cluster.." in the application master logs in YARN.
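
For illustration, assuming Spark's default overhead of max(384MB, 10% of
container memory) and YARN's default 1GB minimum allocation: each 2G executor
requests about 2.4G, which rounds up to 3G, and the 4G driver requests about
4.4G, which rounds up to 5G, so 4 executors plus the driver need roughly
4 x 3G + 5G = 17G rather than 12G.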

HTH,
Deng

On Tue, Jul 5, 2016 at 4:31 PM, Yu Wei <yu20...@hotmail.com> wrote:

Hi guys,

I set up a pseudo-distributed hadoop/yarn cluster on my laptop.

I wrote a simple spark streaming program as below to receive messages with 
MQTTUtils.

        SparkConf conf = new SparkConf().setAppName("Monitor&Control");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        JavaReceiverInputDStream<String> inputDS =
                MQTTUtils.createStream(jssc, brokerUrl, topic);

        inputDS.print();
        jssc.start();
        jssc.awaitTermination();


If I submit the app in local mode, it works well:

        spark-submit --master local[4] --driver-memory 4g --executor-memory 2g \
            --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar

If I submit with "--master yarn", there is no output from "inputDS.print()":

        spark-submit --master yarn --deploy-mode cluster --driver-memory 4g \
            --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar

Is it possible to launch a spark application on yarn with only a single node?


Thanks for your advice.


Jared



