Is this likely to cause any problems?

2016-02-20 Thread Teng Qiu
@Daniel, there are at least 3 things that EMR cannot solve yet: (1) HA support; (2) AWS provides an auto scaling feature, but scaling EMR up or down needs manual operations; (3) security concerns in a public VPC. EMR is basically designed for short-running use cases with some pre-defined bootstrap actions

Re: Is this likely to cause any problems?

2016-02-19 Thread Sabarish Sasidharan
EMR does cost more than vanilla EC2. Using spark-ec2 can result in savings with large clusters, though that is not everybody's cup of tea. Regards Sab On 19-Feb-2016 7:55 pm, "Daniel Siegmann" wrote: > With EMR supporting Spark, I don't see much reason to use the

Re: Is this likely to cause any problems?

2016-02-19 Thread Nicholas Chammas
The docs mention spark-ec2 because it is part of the Spark project. There are many, many alternatives to spark-ec2 out there like EMR, but it's probably not the place of the official docs to promote any one of those third-party solutions. On Fri, Feb 19, 2016 at 11:05 AM James Hammerton

Re: Is this likely to cause any problems?

2016-02-19 Thread James Hammerton
Hi, Having looked at how easy it is to use EMR, I reckon you may be right, especially if using Java 8 is no more difficult with that than with spark-ec2 (where I had to install it on the master and slaves and edit the spark-env.sh). I'm now curious as to why the Spark documentation (

Re: Is this likely to cause any problems?

2016-02-19 Thread Daniel Siegmann
With EMR supporting Spark, I don't see much reason to use the spark-ec2 script unless it is important for you to be able to launch clusters using the bleeding edge version of Spark. EMR does seem to do a pretty decent job of keeping up to date - the latest version (4.3.0) supports the latest Spark

Re: Is this likely to cause any problems?

2016-02-18 Thread James Hammerton
I have now... So far I think the issues I've had are not related to this, but I wanted to be sure in case it should be something that needs to be patched. I've had some jobs run successfully but this warning appears in the logs. Regards, James On 18 February 2016 at 12:23, Ted Yu

Re: Is this likely to cause any problems?

2016-02-18 Thread James Hammerton
I'm fairly new to Spark. The documentation suggests using the spark-ec2 script to launch clusters in AWS, hence I used it. Would EMR offer any advantage? Regards, James On 18 February 2016 at 14:04, Gourav Sengupta wrote: > Hi, > > Just out of sheer curiosity why

Re: Is this likely to cause any problems?

2016-02-18 Thread Gourav Sengupta
Hi Ted/Teng, I just read the content of the email, which is very different from what the facts are: Just want to add another point, spark-ec2 is nice to keep and improve because it allows users to use any version of Spark (nightly-build for example). EMR does not allow you to do that without manual

Re: Is this likely to cause any problems?

2016-02-18 Thread Gourav Sengupta
Hi Teng, Are you using VPC in EMR? It seems quite curious, though, that you can lock down traffic at the gateway, subnet, and security group (using a private setting with NAT) and still feel insecure. I would be really interested to know what your feelings are based on. I bet Amazon guys will also find it very

Re: Is this likely to cause any problems?

2016-02-18 Thread Ted Yu
Please see the last 3 posts on this thread: http://search-hadoop.com/m/q3RTtTorTf2o3UGK1=Re+spark+ec2+vs+EMR FYI On Thu, Feb 18, 2016 at 6:25 AM, Teng Qiu wrote: > EMR is great, but I'm curious how you are dealing with security settings > with EMR, only whitelisting some

Re: Is this likely to cause any problems?

2016-02-18 Thread Teng Qiu
EMR is great, but I'm curious how you are dealing with security settings with EMR. Only whitelisting some IP range with a security group setting is really too weak. Are there really many production systems using EMR? For me, I feel using EMR means everyone in my IP range (for some ISP it may

Re: Is this likely to cause any problems?

2016-02-18 Thread Gourav Sengupta
Hi, Just out of sheer curiosity why are you not using EMR to start your Spark cluster? Regards, Gourav On Thu, Feb 18, 2016 at 12:23 PM, Ted Yu wrote: > Have you seen this? > > HADOOP-10988 > > Cheers > > On Thu, Feb 18, 2016 at 3:39 AM, James Hammerton

Re: Is this likely to cause any problems?

2016-02-18 Thread Ted Yu
Have you seen this? HADOOP-10988 Cheers On Thu, Feb 18, 2016 at 3:39 AM, James Hammerton wrote: > Hi, > > I am seeing warnings like this in the logs when I run Spark jobs: > > OpenJDK 64-Bit Server VM warning: You have loaded library >

Is this likely to cause any problems?

2016-02-18 Thread James Hammerton
Hi, I am seeing warnings like this in the logs when I run Spark jobs: OpenJDK 64-Bit Server VM warning: You have loaded library /root/ephemeral-hdfs/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you