Re: Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-23 Thread Anastasios Zouzias
Hi Rohit, Since your instances have only 16 GB of RAM and two cores, I would suggest using dedicated nodes for Elasticsearch, with 8 GB allocated to the Elasticsearch heap. This way you won't have any interference between Spark executors and Elasticsearch. Also, if possible, you could try to use SSD disks on these 3 machines for

Re: Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-22 Thread Rohit Verma
The ingestion rate below is what I get when using a batch size of 10 MB, 10 records. I have tried 20-50 partitions; higher partition counts give bulk queue exceptions. Anyway, thanks for the suggestion. I would appreciate more input, specifically on cluster design. Rohit > On Dec 22, 2016, at

Re: Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-22 Thread genia...@gmail.com
One thing I would look at is how many partitions your dataset has before writing to ES using Spark, as it may be the limiting factor for your parallel writes. You can also tune the batch size on ES writes... One more thing, make sure you have enough network bandwidth... Regards, Yang Sent
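The two knobs mentioned above (partition count and bulk batch size) can be sketched roughly as follows. This is a minimal illustration, not the poster's configuration: it assumes the elasticsearch-hadoop connector and its `es.batch.*` settings, and the function names, host names, and default values are hypothetical.

```python
def es_write_options(nodes, batch_bytes="1mb", batch_entries=1000,
                     retry_count=3, retry_wait="10s"):
    """Build an options dict for the elasticsearch-hadoop connector.

    The keys are connector settings; the values are illustrative
    assumptions and should be tuned against the actual cluster.
    """
    return {
        "es.nodes": ",".join(nodes),
        # A bulk request is flushed when either limit is reached.
        "es.batch.size.bytes": batch_bytes,
        "es.batch.size.entries": str(batch_entries),
        # Retries help when Elasticsearch rejects bulk requests under load.
        "es.batch.write.retry.count": str(retry_count),
        "es.batch.write.retry.wait": retry_wait,
    }


def max_inflight_docs(num_partitions, batch_entries):
    """Rough upper bound on buffered documents, assuming every partition
    is written by a concurrently running task with its own bulk buffer."""
    return num_partitions * batch_entries


# Hypothetical host names for three dedicated Elasticsearch nodes.
opts = es_write_options(["es1:9200", "es2:9200", "es3:9200"])
bound = max_inflight_docs(num_partitions=20, batch_entries=1000)
```

With PySpark and the connector on the classpath, such a dict would typically be passed through `df.write.format("org.elasticsearch.spark.sql").options(**opts)`, after repartitioning the DataFrame to the chosen number of partitions. Fewer partitions or smaller batches reduce pressure on the Elasticsearch bulk queue at the cost of write parallelism.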

Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-22 Thread Rohit Verma
I am setting up a Spark cluster. I have HDFS data nodes and Spark master nodes on the same instances. To add Elasticsearch to this cluster, should I run ES on different machines or on the same machines? I have only 12 machines: 1 master (Spark and HDFS), 8 Spark workers and HDFS data nodes. I can use 3

Re: spark cluster setup

2015-08-03 Thread Akhil Das
Are you sitting behind a firewall and accessing a remote master machine? In that case, have a look at http://spark.apache.org/docs/latest/configuration.html#networking; you might want to set a few properties like spark.driver.host, spark.driver.port etc. Thanks Best Regards On Mon, Aug 3,
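As a rough sketch of the networking advice above, the driver's advertised address and port can be pinned so that a firewall rule can allow them. The helper names and the address and port values are hypothetical; `spark.driver.host` and `spark.driver.port` are the properties assumed here from the Spark networking configuration page.

```python
def driver_network_conf(driver_host, driver_port):
    """Return Spark properties that fix the driver's address and port."""
    return {
        "spark.driver.host": driver_host,
        "spark.driver.port": str(driver_port),
    }


def to_submit_args(conf):
    """Turn a properties dict into repeated --conf arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical address reachable from the master, and a fixed port
# that has been opened in the firewall.
submit_args = to_submit_args(driver_network_conf("10.0.0.5", 51000))
```

The resulting list can be appended to a `spark-submit` command line. Fixing the port matters because the driver otherwise picks a random one, which a firewall is likely to block.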

Re: spark cluster setup

2015-08-03 Thread Sonal Goyal
Your master log files will be in the logs folder under the Spark home directory on the master machine. Do they show an error? Best Regards, Sonal Founder, Nube Technologies http://www.nubetech.co Check out Reifier at Spark Summit 2015

Re: spark cluster setup

2015-08-02 Thread Sonal Goyal
What do the master logs show? Best Regards, Sonal Founder, Nube Technologies http://www.nubetech.co Check out

Re: Spark Cluster Setup

2015-04-27 Thread Denny Lee
Similar to what Dean called out, we built Puppet manifests so we could do the automation. It's a bit of work to set up, but well worth the effort. On Fri, Apr 24, 2015 at 11:27 AM Dean Wampler deanwamp...@gmail.com wrote: It's mostly manual. You could try automating with something like Chef, of

Spark Cluster Setup

2015-04-24 Thread James King
I'm trying to find out how to set up a resilient Spark cluster. Things I'm thinking about include: - How to start multiple masters on different hosts? - There isn't a conf/masters file from what I can see. Thank you.

Re: Spark Cluster Setup

2015-04-24 Thread Dean Wampler
It's mostly manual. You could try automating with something like Chef, of course, but there's nothing already available in terms of automation. dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com

Re: Spark Cluster Setup

2015-04-24 Thread James King
Thanks Dean. Sure, I have that set up locally and am testing it with ZK. But to start my multiple masters, do I need to go to each host and start one there, or is there a better way to do this? Regards jk On Fri, Apr 24, 2015 at 5:23 PM, Dean Wampler deanwamp...@gmail.com wrote: The convention for
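One way to avoid logging in to each host by hand is to generate the per-host start commands from a list. This is only a sketch under assumptions: it presumes passwordless SSH, the same Spark install path on every master host, and Spark's `sbin/start-master.sh` script; the host names, the path, and the helper name are hypothetical.

```python
def start_master_commands(hosts, spark_home="/opt/spark"):
    """Build one ssh command per host that launches a standalone master there."""
    return [["ssh", host, f"{spark_home}/sbin/start-master.sh"] for host in hosts]


# Hypothetical master hosts.
commands = start_master_commands(["master1", "master2"])
```

Each command list could be handed to `subprocess.run`, or the same loop could be expressed in a Chef or Puppet recipe, as suggested elsewhere in this thread.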

Re: Spark Cluster Setup

2015-04-24 Thread Dean Wampler
The convention for a standalone cluster is to use ZooKeeper to manage master failover. http://spark.apache.org/docs/latest/spark-standalone.html Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com
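The ZooKeeper-based failover described above comes down to a few `spark.deploy.*` properties on each master, plus a master URL that lists every master. The sketch below assembles both; the host names and helper names are hypothetical, and the property names are taken from the standalone-mode documentation linked above as I understand it.

```python
def zk_recovery_opts(zk_hosts, zk_dir="/spark"):
    """Build a SPARK_DAEMON_JAVA_OPTS value enabling ZooKeeper recovery."""
    props = {
        "spark.deploy.recoveryMode": "ZOOKEEPER",
        "spark.deploy.zookeeper.url": ",".join(zk_hosts),
        "spark.deploy.zookeeper.dir": zk_dir,
    }
    return " ".join(f"-D{key}={value}" for key, value in props.items())


def master_url(master_hosts, port=7077):
    """Build the multi-master URL that applications and workers connect to."""
    return "spark://" + ",".join(f"{host}:{port}" for host in master_hosts)


# Hypothetical ZooKeeper ensemble and master hosts.
daemon_opts = zk_recovery_opts(["zk1:2181", "zk2:2181", "zk3:2181"])
url = master_url(["master1", "master2"])
```

The first string would typically be exported as `SPARK_DAEMON_JAVA_OPTS` in `conf/spark-env.sh` on every master host. Applications that connect with the combined URL can then fail over to the standby master if the active one goes down.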