Thank you very much for your reply. I have a cluster of 8 nodes: m1, m2, m3, ... m8. m1 is configured as the Spark master node, and the rest of the nodes are all worker nodes. I also configured m3 as the History Server, but the history server fails to start. I ran FlumeEventCount on m1 using the correct hostname and a port that is not used by any application. Here is the script I used to run FlumeEventCount:
    #!/bin/bash
    spark-submit --verbose \
        --class org.apache.spark.examples.streaming.FlumeEventCount \
        --deploy-mode client \
        --master yarn-client \
        --jars lib/spark-streaming-flume_2.10-1.1.0-cdh5.2.2-20141112.193826-1.jar \
        /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/examples/lib/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar \
        m1.ptang.aerohive.com 1234

The same issue is observed after adding "spark.ui.port=4321" to /etc/spark/conf/spark-defaults.conf. The following exceptions are from the job run:

    14/12/01 11:51:45 INFO JobScheduler: Finished job streaming job 1417463504000 ms.0 from job set of time 1417463504000 ms
    14/12/01 11:51:45 INFO JobScheduler: Total delay: 1.465 s for time 1417463504000 ms (execution: 1.415 s)
    14/12/01 11:51:45 ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - org.jboss.netty.channel.ChannelException: Failed to bind to: m1.ptang.aerohive.com/192.168.10.22:1234
            at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
            at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
            at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
            at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
            at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
            at org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:164)
            at org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:171)
            at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
            at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
            at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
            at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
            at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
            at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
            at org.apache.spark.scheduler.Task.run(Task.scala:54)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.BindException: Cannot assign requested address
            at sun.nio.ch.Net.bind0(Native Method)
            at sun.nio.ch.Net.bind(Net.java:444)
            at sun.nio.ch.Net.bind(Net.java:436)
            at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
            at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
            at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
            at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
            at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
            at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
            ... 3 more
    14/12/01 11:51:45 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 72, m8.ptang.aerohive.com): org.jboss.netty.channel.ChannelException: Failed to bind to: m1.ptang.aerohive.com/192.168.10.22:1234
            org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
            org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
            org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
            org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
            org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
            org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:164)
            org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:171)
            org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
            org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
            org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
            org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
            org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
            org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
            org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
            org.apache.spark.scheduler.Task.run(Task.scala:54)
            org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            java.lang.Thread.run(Thread.java:745)

Thanks,
Ping

From: Prannoy <pran...@sigmoidanalytics.com>
Date: Friday, November 28, 2014 at 12:56 AM
To: "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
Subject: Re: How to use FlumeInputDStream in spark
cluster?

Hi,

A BindException occurs when two processes are using the same port. In your Spark configuration, just set ("spark.ui.port", "xxxxx") to some other port; xxxxx can be any number, say 12345. The BindException will not break your job in either case; to fix it, just change the port number.

Thanks.

On Fri, Nov 28, 2014 at 1:30 PM, pamtang [via Apache Spark User List] wrote:

I'm seeing the same issue on CDH 5.2 with Spark 1.1. FlumeEventCount works fine on a standalone cluster but throws a BindException in YARN mode. Is there a solution to this problem, or will FlumeInputDStream not work in a cluster environment?
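[Editor's note] The traces above suggest this is not a UI-port clash: the WARN line shows the receiver task ran on executor m8.ptang.aerohive.com while trying to bind m1's address (192.168.10.22), and a process can only bind addresses configured on its own machine, hence "Cannot assign requested address". The push-based Flume receiver opens its Avro server socket on whichever executor it is scheduled on, so the host passed to FlumeEventCount must be local to that executor, not the driver node. The sketch below reproduces the bind behavior with only the JDK; 192.0.2.1 is a documentation-only (TEST-NET-1) address assumed not to be configured on the machine running it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindCheck {
    // Attempt to bind a server socket to host:port, the same operation the
    // Avro NettyServer performs inside FlumeReceiver.initServer. The bind
    // succeeds only when the address belongs to a network interface of the
    // machine running this code.
    static boolean canBind(String host, int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(host, port));
            return true;
        } catch (IOException e) {
            // java.net.BindException: Cannot assign requested address
            return false;
        }
    }

    public static void main(String[] args) {
        // Loopback is local on every host; port 0 asks for any free port.
        System.out.println(canBind("127.0.0.1", 0));
        // A non-local address fails exactly like m1's IP does on executor m8.
        System.out.println(canBind("192.0.2.1", 0));
    }
}
```

Run on any of the worker nodes with m1's hostname as the argument to canBind, this would print false, matching the executor-side failure in the trace; pinning the receiver to a known worker (or pointing Flume at that worker's address) is the usual workaround.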