Re: Re: About FlumeUtils.createStream

2015-02-23 Thread bit1...@163.com
Thanks to both of you guys for this!



bit1...@163.com
 
From: Akhil Das
Date: 2015-02-24 12:58
To: Tathagata Das
CC: user; bit1129
Subject: Re: About FlumeUtils.createStream
I see, thanks for the clarification TD.
On 24 Feb 2015 09:56, Tathagata Das t...@databricks.com wrote:
Akhil, that is incorrect. 

Spark will listen on the given port for Flume to push data into it. 
When in local mode, it will listen on that port on localhost.
When in some kind of cluster, instead of localhost you will have to give the 
hostname of the cluster node where you want Flume to forward the data. Spark 
will launch the Flume receiver on that node (assuming the hostname matching is 
correct) and listen on that port for receiving data from Flume. So only the 
configured machine will listen on that port. 

I suggest trying the other stream, FlumeUtils.createPollingStream. More details 
here: 
http://spark.apache.org/docs/latest/streaming-flume-integration.html
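For reference, a rough Scala sketch of the two receiver styles (the hostname and port values below are placeholders I am assuming, not values from this thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("FlumeStreamSketch"), Seconds(10))

    // Push-based: Spark opens a listener on "worker-host" at the given port (placeholder values);
    // only the worker whose hostname matches "worker-host" runs the receiver and opens the port.
    val pushStream = FlumeUtils.createStream(ssc, "worker-host", 9988)

    // Poll-based alternative: Flume runs the Spark Sink and buffers events; Spark pulls from it.
    // val pollStream = FlumeUtils.createPollingStream(ssc, "flume-agent-host", 9989)

    pushStream.map(e => new String(e.event.getBody.array())).print()

    ssc.start()
    ssc.awaitTermination()
  }
}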



On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
Spark won't listen on that port, mate. It basically means you have a Flume source 
running at that port on your localhost. And when you submit your application in 
standalone mode, workers will consume data from that port.

Thanks
Best Regards

On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com bit1...@163.com wrote:

Hi,
In the Spark Streaming application, I write the code 
FlumeUtils.createStream(ssc, "localhost", <port>), which means Spark will listen on 
that port and wait for the Flume sink to write to it.
My question is: when I submit the application to the Spark Standalone cluster, 
will the port be opened only on the driver machine, or will all the workers also 
open the port and wait for the Flume data? 








Re: Re: About FlumeUtils.createStream

2015-02-23 Thread bit1...@163.com
The behavior is exactly what I expected. Thanks Akhil and Tathagata!



bit1...@163.com
 
From: Akhil Das
Date: 2015-02-24 13:32
To: bit1129
CC: Tathagata Das; user
Subject: Re: Re: About FlumeUtils.createStream
That depends on how many machines you have in your cluster. Say you have 6 
workers; the receivers will most likely be distributed across all of them (assuming 
your topic has 6 partitions). When you have more than 6 partitions, say 12, each of 
these 6 receivers will consume from 2 partitions at a time. And when you have fewer 
partitions, say 3, then 3 of the receivers will be idle.
On 24 Feb 2015 10:16, bit1...@163.com bit1...@163.com wrote:
Hi, Akhil, Tathagata,

This leads me to another question. For the Spark Streaming and Kafka 
integration, if there is more than one receiver in the cluster, such as 
  val streams = (1 to 6).map(_ => KafkaUtils.createStream(ssc, zkQuorum, 
group, topicMap).map(_._2)), 
will these receivers stay on one cluster node, or will they be distributed 
among the cluster nodes?



bit1...@163.com
 
From: Akhil Das
Date: 2015-02-24 12:58
To: Tathagata Das
CC: user; bit1129
Subject: Re: About FlumeUtils.createStream
I see, thanks for the clarification TD.
On 24 Feb 2015 09:56, Tathagata Das t...@databricks.com wrote:
Akhil, that is incorrect. 

Spark will listen on the given port for Flume to push data into it. 
When in local mode, it will listen on that port on localhost.
When in some kind of cluster, instead of localhost you will have to give the 
hostname of the cluster node where you want Flume to forward the data. Spark 
will launch the Flume receiver on that node (assuming the hostname matching is 
correct) and listen on that port for receiving data from Flume. So only the 
configured machine will listen on that port. 

I suggest trying the other stream, FlumeUtils.createPollingStream. More details 
here: 
http://spark.apache.org/docs/latest/streaming-flume-integration.html



On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
Spark won't listen on that port, mate. It basically means you have a Flume source 
running at that port on your localhost. And when you submit your application in 
standalone mode, workers will consume data from that port.

Thanks
Best Regards

On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com bit1...@163.com wrote:

Hi,
In the Spark Streaming application, I write the code 
FlumeUtils.createStream(ssc, "localhost", <port>), which means Spark will listen on 
that port and wait for the Flume sink to write to it.
My question is: when I submit the application to the Spark Standalone cluster, 
will the port be opened only on the driver machine, or will all the workers also 
open the port and wait for the Flume data? 








Re: Re: About FlumeUtils.createStream

2015-02-23 Thread bit1...@163.com
Hi, Akhil, Tathagata,

This leads me to another question. For the Spark Streaming and Kafka 
integration, if there is more than one receiver in the cluster, such as 
  val streams = (1 to 6).map(_ => KafkaUtils.createStream(ssc, zkQuorum, 
group, topicMap).map(_._2)), 
will these receivers stay on one cluster node, or will they be distributed 
among the cluster nodes?



bit1...@163.com
 
From: Akhil Das
Date: 2015-02-24 12:58
To: Tathagata Das
CC: user; bit1129
Subject: Re: About FlumeUtils.createStream
I see, thanks for the clarification TD.
On 24 Feb 2015 09:56, Tathagata Das t...@databricks.com wrote:
Akhil, that is incorrect. 

Spark will listen on the given port for Flume to push data into it. 
When in local mode, it will listen on that port on localhost.
When in some kind of cluster, instead of localhost you will have to give the 
hostname of the cluster node where you want Flume to forward the data. Spark 
will launch the Flume receiver on that node (assuming the hostname matching is 
correct) and listen on that port for receiving data from Flume. So only the 
configured machine will listen on that port. 

I suggest trying the other stream, FlumeUtils.createPollingStream. More details 
here: 
http://spark.apache.org/docs/latest/streaming-flume-integration.html



On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
Spark won't listen on that port, mate. It basically means you have a Flume source 
running at that port on your localhost. And when you submit your application in 
standalone mode, workers will consume data from that port.

Thanks
Best Regards

On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com bit1...@163.com wrote:

Hi,
In the Spark Streaming application, I write the code 
FlumeUtils.createStream(ssc, "localhost", <port>), which means Spark will listen on 
that port and wait for the Flume sink to write to it.
My question is: when I submit the application to the Spark Standalone cluster, 
will the port be opened only on the driver machine, or will all the workers also 
open the port and wait for the Flume data? 








Re: Re: About FlumeUtils.createStream

2015-02-23 Thread Tathagata Das
Distributed among cluster nodes.

On Mon, Feb 23, 2015 at 8:45 PM, bit1...@163.com bit1...@163.com wrote:

 Hi, Akhil, Tathagata,

 This leads me to another question. For the Spark Streaming and Kafka
 integration, if there is more than one receiver in the cluster, such as
   val streams = (1 to 6).map(_ => KafkaUtils.createStream(ssc,
 zkQuorum, group, topicMap).map(_._2)),
 will these receivers stay on one cluster node, or will they be
 distributed among the cluster nodes?

 --
 bit1...@163.com


 *From:* Akhil Das ak...@sigmoidanalytics.com
 *Date:* 2015-02-24 12:58
 *To:* Tathagata Das t...@databricks.com
 *CC:* user user@spark.apache.org; bit1129 bit1...@163.com
 *Subject:* Re: About FlumeUtils.createStream

 I see, thanks for the clarification TD.
 On 24 Feb 2015 09:56, Tathagata Das t...@databricks.com wrote:

 Akhil, that is incorrect.

 Spark will listen on the given port for Flume to push data into it.
 When in local mode, it will listen on that port on localhost.
 When in some kind of cluster, instead of localhost you will have to give
 the hostname of the cluster node where you want Flume to forward the data.
 Spark will launch the Flume receiver on that node (assuming the hostname
 matching is correct) and listen on that port for receiving data from Flume.
 So only the configured machine will listen on that port.

 I suggest trying the other stream, FlumeUtils.createPollingStream. More
 details here:
 http://spark.apache.org/docs/latest/streaming-flume-integration.html



 On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Spark won't listen on that port, mate. It basically means you have a Flume
 source running at that port on your localhost. And when you submit your
 application in standalone mode, workers will consume data from that port.

 Thanks
 Best Regards

 On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com bit1...@163.com
 wrote:


 Hi,
 In the Spark Streaming application, I write the code 
 FlumeUtils.createStream(ssc, "localhost", <port>), which
 means Spark will listen on that port and wait for the Flume sink to write
 to it.
 My question is: when I submit the application to the Spark Standalone
 cluster, will the port be opened only on the driver machine, or will all the
 workers also open the port and wait for the Flume data?

 --






Re: About FlumeUtils.createStream

2015-02-23 Thread Akhil Das
I see, thanks for the clarification TD.
On 24 Feb 2015 09:56, Tathagata Das t...@databricks.com wrote:

 Akhil, that is incorrect.

 Spark will listen on the given port for Flume to push data into it.
 When in local mode, it will listen on that port on localhost.
 When in some kind of cluster, instead of localhost you will have to give
 the hostname of the cluster node where you want Flume to forward the data.
 Spark will launch the Flume receiver on that node (assuming the hostname
 matching is correct) and listen on that port for receiving data from Flume.
 So only the configured machine will listen on that port.

 I suggest trying the other stream, FlumeUtils.createPollingStream. More
 details here:
 http://spark.apache.org/docs/latest/streaming-flume-integration.html



 On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Spark won't listen on that port, mate. It basically means you have a Flume
 source running at that port on your localhost. And when you submit your
 application in standalone mode, workers will consume data from that port.

 Thanks
 Best Regards

 On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com bit1...@163.com wrote:


 Hi,
 In the Spark Streaming application, I write the code 
 FlumeUtils.createStream(ssc, "localhost", <port>), which
 means Spark will listen on that port and wait for the Flume sink to write
 to it.
 My question is: when I submit the application to the Spark Standalone
 cluster, will the port be opened only on the driver machine, or will all the
 workers also open the port and wait for the Flume data?

 --






Re: About FlumeUtils.createStream

2015-02-21 Thread Akhil Das
Spark won't listen on that port, mate. It basically means you have a Flume source
running at that port on your localhost. And when you submit your
application in standalone mode, workers will consume data from that port.

Thanks
Best Regards

On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com bit1...@163.com wrote:


 Hi,
 In the Spark Streaming application, I write the code 
 FlumeUtils.createStream(ssc, "localhost", <port>), which
 means Spark will listen on that port and wait for the Flume sink to write
 to it.
 My question is: when I submit the application to the Spark Standalone
 cluster, will the port be opened only on the driver machine, or will all the
 workers also open the port and wait for the Flume data?

 --