One thing I would look at is how many partitions your dataset has before writing 
to ES from Spark, as that may be the limiting factor for your parallel writes. 
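
For example (a rough sketch only, assuming a DataFrame called "events" read from a 
hypothetical HDFS path and a placeholder index "myindex/docs"):

  import org.apache.spark.sql.SparkSession
  import org.elasticsearch.spark.sql._   // adds saveToEs to DataFrames

  val spark = SparkSession.builder().appName("es-write").getOrCreate()
  // "events" and the HDFS path are placeholders for your own dataset.
  val events = spark.read.parquet("hdfs:///data/events")

  // Each partition becomes one task doing bulk writes to ES, so the partition
  // count caps write parallelism. With 8 workers x 2 cores you get at most 16
  // concurrent tasks, so a partition count around that is a reasonable start.
  events.repartition(16).saveToEs("myindex/docs")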

You can also tune the batch size for ES bulk writes...
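
Continuing the sketch above, the bulk sizing settings can be passed per write. The 
values below are just starting points to experiment with, not recommendations:

  // es-hadoop bulk settings; the values here are assumptions to tune from.
  val esConf = Map(
    "es.nodes"               -> "es-node-1:9200",  // hypothetical ES host
    "es.batch.size.entries"  -> "5000",            // docs per bulk request (default 1000)
    "es.batch.size.bytes"    -> "5mb",             // bytes per bulk request (default 1mb)
    "es.batch.write.refresh" -> "false"            // skip index refresh after each bulk
  )
  events.saveToEs("myindex/docs", esConf)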

One more thing: make sure you have enough network bandwidth between the Spark workers and the ES nodes...

Regards,

Yang

Sent from my iPhone

> On Dec 22, 2016, at 12:35 PM, Rohit Verma <rohit.ve...@rokittech.com> wrote:
> 
> I am setting up a Spark cluster. I have HDFS data nodes and Spark nodes 
> co-located on the same instances. To add Elasticsearch to this cluster, should I 
> run ES on separate machines or on the same machines? I have only 12 machines: 
> 1 master (Spark and HDFS)
> 8 Spark workers and HDFS data nodes
> I can use the remaining 3 nodes as dedicated ES nodes, or run all three services on 11 nodes.
> 
> All instances are the same: 16 GB RAM, dual core (unfortunately). 
> 
> Also, I am trying the es-hadoop / es-spark project, but ingestion feels very slow 
> with 3 dedicated ES nodes; it's around 0.6 million records/minute. 
> If anyone has experience with that project, could you please share your 
> thoughts on tuning?
> 
> Regards
> Rohit

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
