The ingestion rate below is with a batch size of 10 MB / 100,000 records. I have tried 20-50 partitions; higher partition counts give bulk queue exceptions.
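For reference, this is roughly how those values are applied through the es-spark connector (a sketch only: the index name, input path, and partition count are placeholders, and it assumes the elasticsearch-hadoop jar is on the classpath with `es.nodes` already set in the Spark config):

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

val spark = SparkSession.builder().appName("es-ingest").getOrCreate()
val df = spark.read.parquet("hdfs:///data/input")  // placeholder input path

df.repartition(30)  // somewhere in the 20-50 range mentioned above
  .saveToEs("myindex/mytype", Map(          // placeholder index name
    "es.batch.size.bytes"        -> "10mb",   // bulk request size cap
    "es.batch.size.entries"      -> "100000", // records per bulk request
    "es.batch.write.retry.count" -> "3"       // retry instead of failing on bulk-queue rejections
  ))
```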
Anyway, thanks for the suggestion. I would appreciate more inputs, specifically on cluster design.

Rohit

> On Dec 22, 2016, at 11:31 PM, genia...@gmail.com <genia...@gmail.com> wrote:
>
> One thing I would look at is how many partitions your dataset has before writing to ES using Spark, as that may be the limiting factor for your parallel writes.
>
> You can also tune the batch size on ES writes...
>
> One more thing, make sure you have enough network bandwidth...
>
> Regards,
>
> Yang
>
> Sent from my iPhone
>
>> On Dec 22, 2016, at 12:35 PM, Rohit Verma <rohit.ve...@rokittech.com> wrote:
>>
>> I am setting up a Spark cluster. I have HDFS data nodes and Spark master nodes on the same instances. To add Elasticsearch to this cluster, should I spawn ES on different machines or on the same machines? I have only 12 machines:
>> 1 master (Spark and HDFS)
>> 8 Spark workers and HDFS data nodes
>> I can use 3 nodes dedicated to ES, or run all three systems on 11 nodes.
>>
>> All instances are the same, 16 GB dual core (unfortunately).
>>
>> I am also trying the es-hadoop / es-spark project, but ingestion feels very slow with 3 dedicated nodes, around 0.6 million records/minute. If anyone has experience with that project, can you please share your thoughts on tuning?
>>
>> Regards
>> Rohit
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
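Yang's first suggestion (checking the partition count before the write) can be sketched as follows; `df` stands in for the dataset being ingested and the cap of 50 is an assumption based on the numbers above:

```scala
// Each Spark partition becomes one concurrent bulk writer against ES,
// so the partition count effectively caps write parallelism.
val parts = df.rdd.getNumPartitions
println(s"writing to ES with $parts parallel tasks")

// If the count overwhelms the ES bulk thread-pool queue, coalesce
// before writing; if it is too low, repartition upward instead.
val tuned = if (parts > 50) df.coalesce(50) else df
```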