On the other hand, increasing parallelism via Kafka partitions avoids the shuffle that Spark's repartition requires.
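The shuffle cost behind repartition can be sketched with a toy model (plain Python, no Spark; the 8 -> 100 partition counts come from the thread, everything else, including the hash-partitioning helper, is made up for illustration):

```python
# Toy model of why repartition(100) is expensive: every record is re-hashed
# to a new partition, so nearly all of them must move between tasks (a full
# shuffle). With 100 Kafka partitions instead, each Spark task reads one
# partition directly and no records move.

def assign(record_id: int, num_partitions: int) -> int:
    """Hash-partition a record, loosely like Spark's HashPartitioner."""
    return hash(record_id) % num_partitions

records = list(range(10_000))

# Layout as read from the 8 Kafka partitions (one Spark task each).
before = {r: assign(r, 8) for r in records}
# Layout after repartition(100): every record is re-hashed.
after = {r: assign(r, 100) for r in records}

moved = sum(1 for r in records if before[r] != after[r])
print(f"{moved}/{len(records)} records change partition")
```

In this toy run the vast majority of records land in a different partition, which is why the repartition step burns CPU and network even though the transformation itself is unchanged.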
On Wed, Nov 7, 2018 at 9:51 AM Michael Shtelma <mshte...@gmail.com> wrote:
> If you configure too many Kafka partitions, you can run into memory issues.
> This will increase the memory requirements of the Spark job a lot.
>
> Best,
> Michael
>
>
> On Wed, Nov 7, 2018 at 8:28 AM JF Chen <darou...@gmail.com> wrote:
>> I have a Spark Streaming application which reads data from Kafka and saves
>> the transformation result to HDFS.
>> The original partition number of my Kafka topic is 8, and I repartition the
>> data to 100 to increase the parallelism of the Spark job.
>> Now I am wondering: if I increase the Kafka partition number to 100
>> instead of repartitioning to 100, will performance improve? (I know the
>> repartition action costs a lot of CPU resources.)
>> If I set the Kafka partition number to 100, does it have any negative
>> effects?
>> I have only one production environment, so it's not convenient for me to
>> test....
>>
>> Thanks!
>>
>> Regards,
>> Junfeng Chen