Hi Rohit,

Since your instances are only 16 GB dual-core, I would suggest using
dedicated nodes for Elasticsearch, with 8 GB allocated to the Elasticsearch
heap. This way you won't have any interference between the Spark executors
and Elasticsearch.
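For example, an 8 GB heap can be set either in the JVM options file or via
the environment before starting Elasticsearch (a minimal sketch; the exact
file location and variable depend on your Elasticsearch version and install
layout):

```
# config/jvm.options (Elasticsearch 5.x) -- set min and max heap equal
-Xms8g
-Xmx8g

# or, on older 1.x/2.x tarball installs, via the environment:
export ES_HEAP_SIZE=8g
```

Keeping min and max heap equal avoids resize pauses, and 8 GB leaves the
other half of the 16 GB machine for the OS filesystem cache, which Lucene
relies on heavily.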

Also, if possible, you could try using SSD disks on these 3 machines for
storing the Elasticsearch indices; this will boost your Elasticsearch
cluster's performance.
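Pointing the data path at the SSD is then a one-line change in
`elasticsearch.yml` (the mount point below is hypothetical; adjust it to
your own layout):

```
# config/elasticsearch.yml -- store indices on the SSD
# /mnt/ssd is a hypothetical mount point
path.data: /mnt/ssd/elasticsearch
```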

Best,
Anastasios

On Thu, Dec 22, 2016 at 6:35 PM, Rohit Verma <rohit.ve...@rokittech.com>
wrote:

> I am setting up a Spark cluster. I have HDFS data nodes and Spark master
> nodes on the same instances. To add Elasticsearch to this cluster, should
> I spawn ES on different machines or on the same machines? I have only 12
> machines:
> 1 master (Spark and HDFS)
> 8 Spark workers and HDFS data nodes
> I can dedicate 3 nodes to ES, or run all three services on 11 nodes.
>
> All instances are the same: 16 GB, dual core (unfortunately).
>
> Also, I am trying the es-hadoop / es-spark project, but ingestion seems
> very slow with 3 dedicated nodes: around 0.6 million records/minute. If
> anyone has experience with that project, could you please share your
> thoughts on tuning?
>
> Regards
> Rohit
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 
-- Anastasios Zouzias
<a...@zurich.ibm.com>
