Hi Rohit,
Since your instances are dual-core with only 16 GB, I would suggest using
dedicated nodes for Elasticsearch with an 8 GB Elasticsearch heap. This way you
won't have any interference between the Spark executors and Elasticsearch.
Also, if possible, you could try using SSD disks on these 3 machines for the
Elasticsearch data.
The ingestion rate below is actually when I am using a batch size of 10 MB and
10 records. I have tried with 20-50 partitions; higher partition counts give
bulk queue exceptions.
Anyway, thanks for the suggestions. I would appreciate more input, specifically
on the cluster design.
Rohit
> On Dec 22, 2016, at
One thing I would look at is how many partitions your dataset has before
writing to ES using Spark, as that may be the limiting factor for your
parallel writes.
You can also tune the batch size on ES writes...
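For reference, here is a rough sketch of both knobs with the
elasticsearch-hadoop Scala API. The node address, index name, input path, and
partition count are made-up placeholders, not values from this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Placeholder names throughout; adjust to your cluster.
val conf = new SparkConf()
  .setAppName("es-bulk-write")
  .set("es.nodes", "es-node-1:9200")
val sc = new SparkContext(conf)

sc.textFile("hdfs:///data/events")
  .map(line => Map("line" -> line))
  .repartition(24) // number of tasks writing to ES in parallel
  .saveToEs("events/doc", Map(
    "es.batch.size.bytes"   -> "10mb", // flush a bulk request at 10 MB...
    "es.batch.size.entries" -> "1000"  // ...or at 1000 docs, whichever comes first
  ))

Fewer partitions means fewer concurrent bulk requests, which is usually the
first thing to dial down when you see bulk queue rejections.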
One more thing, make sure you have enough network bandwidth...
Regards,
Yang
I am setting up a Spark cluster. I have HDFS data nodes and Spark nodes on the
same instances. To add Elasticsearch to this cluster, should I spawn ES on
different machines or on the same machines? I have only 12 machines:
1 - master (Spark and HDFS)
8 - Spark workers and HDFS data nodes
I can use the remaining 3 machines for Elasticsearch.
Are you sitting behind a firewall and accessing a remote master machine? In
that case, have a look at
http://spark.apache.org/docs/latest/configuration.html#networking; you
might want to fix a few properties like spark.driver.host, spark.driver.port,
etc.
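If it helps, a minimal sketch of pinning those properties in code (the host
name and port are placeholders, not values from your setup):

import org.apache.spark.{SparkConf, SparkContext}

// Fixing spark.driver.port to a known value means only that one port
// has to be opened in the firewall, instead of a random ephemeral one.
val conf = new SparkConf()
  .setAppName("behind-firewall")
  .setMaster("spark://remote-master:7077")
  .set("spark.driver.host", "driver-host.example.com") // must be reachable from the cluster nodes
  .set("spark.driver.port", "51000")
val sc = new SparkContext(conf)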
Thanks
Best Regards
On Mon, Aug 3,
Your master log files will be in the logs/ directory under the Spark home
folder on the master machine. Do they show an error?
Best Regards,
Sonal
Founder, Nube Technologies http://www.nubetech.co
Check out Reifier at Spark Summit 2015
What do the master logs show?
Best Regards,
Sonal
Founder, Nube Technologies
http://www.nubetech.co
Check out Reifier at Spark Summit 2015
Similar to what Dean called out, we built Puppet manifests so we could do
the automation. It's a bit of work to set up, but well worth the effort.
On Fri, Apr 24, 2015 at 11:27 AM Dean Wampler deanwamp...@gmail.com wrote:
It's mostly manual. You could try automating with something like Chef, of
course, but there's nothing already available in terms of automation.
I'm trying to find out how to set up a resilient Spark cluster.
Things I'm thinking about include:
- How to start multiple masters on different hosts?
- There isn't a conf/masters file from what I can see.
Thank you.
It's mostly manual. You could try automating with something like Chef, of
course, but there's nothing already available in terms of automation.
dean
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
Thanks Dean,
Sure, I have that set up locally and am testing it with ZK.
But to start my multiple masters, do I need to go to each host and start one
there, or is there a better way to do this?
Regards
jk
On Fri, Apr 24, 2015 at 5:23 PM, Dean Wampler deanwamp...@gmail.com wrote:
The convention for a standalone cluster is to use ZooKeeper to manage master
failover.
http://spark.apache.org/docs/latest/spark-standalone.html
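Once the masters are started with ZooKeeper recovery enabled
(spark.deploy.recoveryMode=ZOOKEEPER and spark.deploy.zookeeper.url, per the
page above), applications just list all masters in the master URL. A minimal
sketch, with placeholder host names:

import org.apache.spark.{SparkConf, SparkContext}

// The app registers with whichever master ZooKeeper elected leader
// and fails over to the other one if the leader dies.
val conf = new SparkConf()
  .setAppName("ha-app")
  .setMaster("spark://master1:7077,master2:7077")
val sc = new SparkContext(conf)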
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com