The ingestion rate below is with a batch size of 10 MB / 100,000 records. I have tried 20-50 partitions; higher partition counts give bulk queue exceptions.
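For reference, this is roughly how those values are applied through the es-spark connector (a sketch only: the index name, input path, and partition count are placeholders, and it assumes the elasticsearch-hadoop jar is on the classpath with `es.nodes` already set in the Spark config):

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

val spark = SparkSession.builder().appName("es-ingest").getOrCreate()
val df = spark.read.parquet("hdfs:///data/input")  // placeholder input path

df.repartition(30)  // somewhere in the 20-50 range mentioned above
  .saveToEs("myindex/mytype", Map(          // placeholder index name
    "es.batch.size.bytes"        -> "10mb",   // bulk request size cap
    "es.batch.size.entries"      -> "100000", // records per bulk request
    "es.batch.write.retry.count" -> "3"       // retry instead of failing on bulk-queue rejections
  ))
```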
Anyway, thanks for the suggestion. I would appreciate more inputs, specifically on cluster design.

Rohit

> On Dec 22, 2016, at 11:31 PM, genia...@gmail.com <genia...@gmail.com> wrote:
>
> One thing I would look at is how many partitions your dataset has before writing to ES using Spark, as that may be the limiting factor for your parallel writes.
>
> You can also tune the batch size on ES writes...
>
> One more thing, make sure you have enough network bandwidth...
>
> Regards,
>
> Yang
>
> Sent from my iPhone
>
>> On Dec 22, 2016, at 12:35 PM, Rohit Verma <rohit.ve...@rokittech.com> wrote:
>>
>> I am setting up a Spark cluster. I have HDFS data nodes and Spark master nodes on the same instances. To add Elasticsearch to this cluster, should I spawn ES on different machines or on the same machines? I have only 12 machines:
>> 1 master (Spark and HDFS)
>> 8 Spark workers and HDFS data nodes
>> I can use 3 nodes dedicated to ES, or run all three systems on 11 nodes.
>>
>> All instances are the same, 16 GB dual core (unfortunately).
>>
>> I am also trying the es-hadoop / es-spark project, but ingestion feels very slow with 3 dedicated nodes, around 0.6 million records/minute. If anyone has experience with that project, can you please share your thoughts on tuning?
>>
>> Regards
>> Rohit
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
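Yang's first suggestion (checking the partition count before the write) can be sketched as follows; `df` stands in for the dataset being ingested and the cap of 50 is an assumption based on the numbers above:

```scala
// Each Spark partition becomes one concurrent bulk writer against ES,
// so the partition count effectively caps write parallelism.
val parts = df.rdd.getNumPartitions
println(s"writing to ES with $parts parallel tasks")

// If the count overwhelms the ES bulk thread-pool queue, coalesce
// before writing; if it is too low, repartition upward instead.
val tuned = if (parts > 50) df.coalesce(50) else df
```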