Re: Spark Streaming - stream between 2 applications
Instead of sending the results of one Spark app directly to the other, you could write them to a Kafka topic that is consumed by your other Spark application.

On Fri, Nov 20, 2015 at 12:07 PM Saiph Kappa wrote:
> I think my problem persists whether I use Kafka or sockets. Or am I wrong?
> How would you use Kafka here?
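A minimal sketch of the producer side, assuming a broker at localhost:9092, a topic named "words", and the kafka-clients producer API (the broker address and topic name are illustrative, not from this thread):

«
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

union.flatMap(_.split(' ')).foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Build the producer on the executor, once per partition, so the
    // non-serializable KafkaProducer never has to be shipped from the driver.
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    partition.foreach(word => producer.send(new ProducerRecord[String, String]("words", word)))
    producer.close()
  }
}
»

The second application would then read the topic with a Kafka input stream (e.g. KafkaUtils.createDirectStream) instead of rawSocketStream, and Kafka gives you buffering and replay between the two apps for free.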
Re: Spark Streaming - stream between 2 applications
You're confused about which parts of your code run on the driver versus on the executors, which is why you're getting serialization errors. Read
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd

On Fri, Nov 20, 2015 at 1:07 PM, Saiph Kappa wrote:
> I think my problem persists whether I use Kafka or sockets. Or am I wrong?
> How would you use Kafka here?
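Roughly the pattern that section describes: the body of the foreachRDD closure runs on the driver, so any connection should instead be created inside foreachPartition, where it lives on an executor and never needs to be serialized. A sketch, where createNewConnection is a placeholder for whatever client you actually use:

«
dstream.foreachRDD { rdd =>
  // This part of the closure runs on the driver, once per batch.
  rdd.foreachPartition { partitionOfRecords =>
    // This inner closure runs on an executor, so the connection is
    // created there and is never captured by a serialized task.
    val connection = createNewConnection() // placeholder, not a real API
    partitionOfRecords.foreach(record => connection.send(record))
    connection.close()
  }
}
»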
Re: Spark Streaming - stream between 2 applications
I think my problem persists whether I use Kafka or sockets. Or am I wrong? How would you use Kafka here?

On Fri, Nov 20, 2015 at 7:12 PM, Christian wrote:
> Have you considered using Kafka?
Re: Spark Streaming - stream between 2 applications
Have you considered using Kafka?

On Fri, Nov 20, 2015 at 6:48 AM Saiph Kappa wrote:
> My question is: what is the best and fastest way to send the resulting
> RDDs as input to be consumed by another Spark Streaming application?
Spark Streaming - stream between 2 applications
Hi,

I have a basic Spark Streaming application like this:

«
...
val ssc = new StreamingContext(sparkConf, Duration(batchMillis))
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
val union = ssc.union(rawStreams)

union.flatMap(line => line.split(' ')).foreachRDD(rdd => {
  // TODO
})
...
»

My question is: what is the best and fastest way to send the resulting RDDs as input to be consumed by another Spark Streaming application?

I tried to add this code in place of the "TODO" comment:

«
val serverSocket = new ServerSocket(9998)
while (true) {
  val socket = serverSocket.accept()
  @transient val out = new PrintWriter(socket.getOutputStream)
  try {
    rdd.foreach(out.write)
  } catch {
    case e: IOException =>
      socket.close()
  }
}
»

I also tried to create a thread in the driver application code to launch the socket server and then share state (the PrintWriter object) between the driver program and the tasks. But I got an exception saying the task is not serializable - PrintWriter is not serializable (despite the @transient annotation). I know this is not a very elegant solution, but what other directions should I explore?

Thanks.