Re: Apache Spark data locality when integrating with Kafka
We are using Spark in two ways:

1. YARN with Spark support, with Kafka running alongside the data nodes.
2. Spark master and workers running alongside some of the Kafka brokers.

Data locality is important.

Regards,
Diwakar

In reply to: أنس الليثي, 08/02/2016 02:07 (GMT+05:30)
Re: Apache Spark data locality when integrating with Kafka
Diwakar,

We have our own servers. We will not use any cloud service such as Amazon's.

In reply to: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>, 7 February 2016 at 18:24

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
Re: Apache Spark data locality when integrating with Kafka
Fanoos,

Where do you want the solution to be deployed: on premises or in the cloud?

Regards,
Diwakar

In reply to: "Yuval.Itzchakov", 07/02/2016 19:38 (GMT+05:30)
Re: Apache Spark data locality when integrating with Kafka
I would definitely try to avoid hosting Kafka and Spark on the same servers.

Kafka and Spark will be doing a lot of I/O between them, so you'll want to make the most of those resources rather than share them on the same server. You'll want each Kafka broker on a dedicated server, and likewise your Spark master and workers. If you're hosting them on Amazon EC2 instances, you'll want them in the same availability zone so you benefit from the low latency within that zone. If you're on dedicated servers, perhaps you'll want to create a VPC between the two clusters so you can, again, benefit from low I/O latency and high throughput.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-data-locality-when-integrating-with-Kafka-tp26165p26170.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Apache Spark data locality when integrating with Kafka
Spark can benefit from data locality and will try to launch tasks on the node where the Kafka partition resides.

However, I think that in production many organizations run a dedicated Kafka cluster.

In reply to: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>, Sat, Feb 6, 2016 at 11:27 PM
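To make the locality preference described in this message concrete, here is a minimal sketch in plain Python. It is a toy model under stated assumptions, not Spark's scheduler: the function `assign_tasks`, the host names, and the least-loaded fallback rule are all hypothetical. Only the labels `NODE_LOCAL` and `ANY` are borrowed from Spark's locality levels. The idea is that a task reading a Kafka partition is placed on the host of that partition's leader broker when an executor runs there, and otherwise falls back to any executor, which means the data crosses the network.

```python
def assign_tasks(partition_leaders, executor_hosts):
    """Toy placement: prefer the executor on the partition leader's host.

    partition_leaders: dict of partition id -> host of the leading broker
    executor_hosts:    iterable of hosts that run an executor
    Returns dict of partition id -> (chosen host, locality label).
    """
    hosts = sorted(set(executor_hosts))
    if not hosts:
        raise ValueError("at least one executor host is required")
    load = {host: 0 for host in hosts}  # tasks placed per host so far
    placement = {}
    for partition in sorted(partition_leaders):
        leader = partition_leaders[partition]
        if leader in load:
            # An executor shares a host with the broker: read locally.
            host, locality = leader, "NODE_LOCAL"
        else:
            # No co-located executor: pick the least-loaded host
            # (hypothetical fallback rule), data travels over the network.
            host = min(hosts, key=lambda h: (load[h], h))
            locality = "ANY"
        load[host] += 1
        placement[partition] = (host, locality)
    return placement


# Hypothetical three-broker topic, one partition led by each broker.
leaders = {0: "node1", 1: "node2", 2: "node3"}

# Co-located deployment: executors run on the broker hosts.
colocated = assign_tasks(leaders, ["node1", "node2", "node3"])

# Dedicated clusters: executors run on separate hosts.
dedicated = assign_tasks(leaders, ["spark1", "spark2"])

print(colocated)
print(dedicated)
```

In the co-located case every task can be placed next to its partition. With dedicated clusters every read is remote, which is the trade-off this thread weighs against resource contention on shared servers.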
RE: Apache Spark data locality when integrating with Kafka
Yes, to reduce network latency.

In reply to: fanooos, 07/02/2016 09:24 (GMT+05:30)
Apache Spark data locality when integrating with Kafka
Dear all,

If I use Kafka as a streaming source for some Spark jobs, is it advisable to install Spark on the same nodes as the Kafka cluster?

What are the benefits and drawbacks of such a decision?

Regards

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-data-locality-when-integrating-with-Kafka-tp26165.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.