Dear Andy,

As far as I understand, transformations are applied lazily to the RDDs, not
to the data itself, and I need to send the actual data to Kafka. So I think I
have to perform at least one action to make Spark actually load and process
the data.
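
For example, I am thinking of something along these lines (a rough, untested
sketch; the broker address, input path, and topic name are placeholders, and
it assumes the kafka-python package):

from kafka import KafkaProducer
from pyspark import SparkContext

sc = SparkContext(appName="PublishToKafka")
rdd = sc.textFile("hdfs:///path/to/input")  # placeholder input

def publish_partition(records):
    # create one producer per partition, on the executor side
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for record in records:
        producer.send("events", record.encode("utf-8"))
    producer.flush()

# foreachPartition is an action, so it forces Spark to load the data
rdd.foreachPartition(publish_partition)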

Kindly correct me if I have not understood this correctly.

Best regards.

On 29 March 2016 at 19:40, Andy Davidson <a...@santacruzintegration.com>
wrote:

> Hi Fanoos
>
> I would be careful about using collect(). You need to make sure your local
> computer has enough memory to hold your entire data set.
>
> Eventually I will need to do something similar. I have not written the code
> yet. My plan is to load the data into a data frame and then write a UDF
> that actually publishes to Kafka.
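>
> Something along these lines is what I have in mind (an untested sketch;
> it assumes the kafka-python package, and the broker address, topic name,
> and sample data are placeholders):
>
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
> from pyspark.sql.functions import udf
> from pyspark.sql.types import BooleanType
> from kafka import KafkaProducer
>
> sc = SparkContext(appName="KafkaUDF")
> sqlContext = SQLContext(sc)
> df = sqlContext.createDataFrame([("event-1",), ("event-2",)], ["value"])
>
> producer = None
>
> def send_to_kafka(value):
>     # lazily create one producer per Python worker process
>     global producer
>     if producer is None:
>         producer = KafkaProducer(bootstrap_servers="localhost:9092")
>     producer.send("events", value.encode("utf-8"))
>     return True
>
> kafka_udf = udf(send_to_kafka, BooleanType())
> # the UDF only runs once an action forces evaluation
> df.withColumn("sent", kafka_udf(df["value"])).count()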
>
> If you are using RDDs, you could use map() or some other transform to
> cause the data to be published.
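>
> For example (again just an untested sketch, reusing the send_to_kafka
> function above):
>
> rdd = sc.parallelize(["event-1", "event-2"])
> # map() is lazy; an action such as count() is still needed to trigger it
> rdd.map(send_to_kafka).count()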
>
> Andy
>
> From: fanooos <dev.fano...@gmail.com>
> Date: Tuesday, March 29, 2016 at 4:26 AM
> To: "user @spark" <user@spark.apache.org>
> Subject: Re: Sending events to Kafka from spark job
>
> I think I found a solution, but I have no idea how it affects the execution
> of the application.
>
> At the end of the script I added a sleep statement.
>
> import time
> time.sleep(1)  # wait one second before the script exits
>
>
> This solved the problem.
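>
> My guess is that the Kafka producer sends messages asynchronously, and the
> sleep gives its background thread time to deliver the buffered records
> before the process exits. If so, flushing explicitly before the script ends
> should be a more reliable fix (assuming a kafka-python producer):
>
> producer.flush()  # block until all buffered records are delivered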


-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
