Is this likely to cause any problems?
@Daniel, there are at least 3 things that EMR cannot solve yet:

- HA support
- AWS provides an auto scaling feature, but scaling EMR up/down needs manual operations
- security concerns in a public VPC

EMR is basically designed for short-running use cases with some predefined bootstrap actions and steps. So it is mainly suited to scheduled querying processes, not to a permanently running cluster for ad hoc queries and analytical work.

Therefore, in our organization (an e-commerce company in Europe; most of you may never have heard of it :p but we have more than 1000 techies and 10k employees now...), we built a solution for this: https://github.com/zalando/spark-appliance

It enables HA with ZooKeeper, runs the nodes in an Auto Scaling group in private subnets, provides a REST API secured with OAuth, and is even integrated with Jupyter Notebook :)

On Saturday, 20 February 2016, Sabarish Sasidharan wrote:
> EMR does cost more than vanilla EC2. Using spark-ec2 can result in savings with large clusters, though that is not everybody's cup of tea.
>
> Regards
> Sab
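The first gap Teng lists is HA. For reference, plain Spark standalone mode can run standby masters coordinated through ZooKeeper. Below is a minimal sketch, not taken from spark-appliance, that builds two strings:

- the `SPARK_DAEMON_JAVA_OPTS` value that each master would set in spark-env.sh;
- the multi-master `spark://` URL that clients use to find the current leader.

The host names are hypothetical. The port 7077 and the `/spark` znode are assumed to be Spark's defaults.

```python
from typing import List


def daemon_java_opts(zk_quorum: List[str], zk_dir: str = "/spark") -> str:
    """Value for SPARK_DAEMON_JAVA_OPTS on each standalone master, enabling
    ZooKeeper-based leader election and state recovery."""
    return " ".join([
        "-Dspark.deploy.recoveryMode=ZOOKEEPER",
        "-Dspark.deploy.zookeeper.url=" + ",".join(zk_quorum),
        "-Dspark.deploy.zookeeper.dir=" + zk_dir,
    ])


def master_url(masters: List[str], port: int = 7077) -> str:
    """Client-side master URL listing every master, so a SparkContext can
    register with whichever one is currently the leader."""
    return "spark://" + ",".join(f"{host}:{port}" for host in masters)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    opts = daemon_java_opts(["zk1:2181", "zk2:2181", "zk3:2181"])
    print(f'SPARK_DAEMON_JAVA_OPTS="{opts}"')
    print(master_url(["master-a", "master-b"]))
```

This only covers master failover. The auto scaling and private-subnet parts of Teng's list are infrastructure concerns outside Spark itself.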
Re: Is this likely to cause any problems?
EMR does cost more than vanilla EC2. Using spark-ec2 can result in savings with large clusters, though that is not everybody's cup of tea.

Regards
Sab

On 19-Feb-2016 7:55 pm, "Daniel Siegmann" wrote:
> With EMR supporting Spark, I don't see much reason to use the spark-ec2 script unless it is important for you to be able to launch clusters using the bleeding-edge version of Spark. EMR does seem to do a pretty decent job of keeping up to date: the latest version (4.3.0) supports the latest Spark version (1.6.0).
>
> So I'd flip the question around and ask: is there any reason to continue using the spark-ec2 script rather than EMR?
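Sabarish's cost point comes down to simple arithmetic. EMR charges a per-instance-hour fee on top of the underlying EC2 price, so the extra spend grows linearly with node count and running time. This is a minimal sketch under a few assumptions:

- The rates are hypothetical placeholders, not real AWS prices.
- One instance type is used for the whole cluster, for simplicity.
- `Decimal` is used so the money arithmetic stays exact.

```python
from decimal import Decimal


def cluster_cost(nodes: int, hours: int, ec2_rate: Decimal,
                 emr_fee: Decimal = Decimal("0")) -> Decimal:
    """Total cost when every node pays the EC2 rate plus an optional
    per-instance-hour EMR fee."""
    return nodes * hours * (ec2_rate + emr_fee)


def emr_premium(nodes: int, hours: int, emr_fee: Decimal) -> Decimal:
    """Extra cost of running the same cluster through EMR instead of plain EC2."""
    return nodes * hours * emr_fee


if __name__ == "__main__":
    # Hypothetical rates in USD per instance-hour, for illustration only.
    ec2, fee = Decimal("0.50"), Decimal("0.10")
    for nodes in (7, 70):
        print(f"{nodes} nodes x 720 h: EC2 {cluster_cost(nodes, 720, ec2)}, "
              f"EMR {cluster_cost(nodes, 720, ec2, fee)}, "
              f"premium {emr_premium(nodes, 720, fee)}")
```

Ten times the nodes means ten times the premium. Whether that justifies the extra operational work of spark-ec2 is the trade-off Sab mentions.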
Re: Is this likely to cause any problems?
The docs mention spark-ec2 because it is part of the Spark project. There are many, many alternatives to spark-ec2 out there, like EMR, but it's probably not the place of the official docs to promote any one of those third-party solutions.

On Fri, Feb 19, 2016 at 11:05 AM James Hammerton wrote:
> Hi,
>
> Having looked at how easy it is to use EMR, I reckon you may be right, especially if using Java 8 is no more difficult with EMR than with spark-ec2 (where I had to install it on the master and slaves and edit spark-env.sh).
>
> I'm now curious as to why the Spark documentation (http://spark.apache.org/docs/latest/index.html) mentions EC2 but not EMR.
>
> Regards,
>
> James
Re: Is this likely to cause any problems?
Hi,

Having looked at how easy it is to use EMR, I reckon you may be right, especially if using Java 8 is no more difficult with EMR than with spark-ec2 (where I had to install it on the master and slaves and edit spark-env.sh).

I'm now curious as to why the Spark documentation (http://spark.apache.org/docs/latest/index.html) mentions EC2 but not EMR.

Regards,
James

On 19 February 2016 at 14:25, Daniel Siegmann wrote:
> With EMR supporting Spark, I don't see much reason to use the spark-ec2 script unless it is important for you to be able to launch clusters using the bleeding-edge version of Spark. EMR does seem to do a pretty decent job of keeping up to date: the latest version (4.3.0) supports the latest Spark version (1.6.0).
>
> So I'd flip the question around and ask: is there any reason to continue using the spark-ec2 script rather than EMR?
Re: Is this likely to cause any problems?
With EMR supporting Spark, I don't see much reason to use the spark-ec2 script unless it is important for you to be able to launch clusters using the bleeding-edge version of Spark. EMR does seem to do a pretty decent job of keeping up to date: the latest version (4.3.0) supports the latest Spark version (1.6.0).

So I'd flip the question around and ask: is there any reason to continue using the spark-ec2 script rather than EMR?

On Thu, Feb 18, 2016 at 11:39 AM, James Hammerton wrote:
> I have now... So far I think the issues I've had are not related to this, but I wanted to be sure in case it is something that needs to be patched. I've had some jobs run successfully, but this warning appears in the logs.
>
> Regards,
>
> James
Re: Is this likely to cause any problems?
I have now... So far I think the issues I've had are not related to this, but I wanted to be sure in case it is something that needs to be patched. I've had some jobs run successfully, but this warning appears in the logs.

Regards,
James

On 18 February 2016 at 12:23, Ted Yu wrote:
> Have you seen this?
>
> HADOOP-10988
>
> Cheers
Re: Is this likely to cause any problems?
I'm fairly new to Spark. The documentation suggests using the spark-ec2 script to launch clusters in AWS, hence I used it. Would EMR offer any advantage?

Regards,
James

On 18 February 2016 at 14:04, Gourav Sengupta wrote:
> Hi,
>
> Just out of sheer curiosity, why are you not using EMR to start your Spark cluster?
>
> Regards,
> Gourav
Re: Is this likely to cause any problems?
Hi Ted / Teng,

I just read the following content in the email, which is very different from the facts:

"Just want to add another point: spark-ec2 is nice to keep and improve because it allows users to use any version of Spark (a nightly build, for example). EMR does not allow you to do that without a manual process."

EMR does provide different versions of Spark to run; currently Spark versions 1.4.1, 1.5.0, 1.5.2 and 1.6 are all available. Spark 1.6 was released on Jan 4, 2016, and EMR provided Spark 1.6 within another 20 days: production-ready, scalable, and integrated into the AWS world.

Regards,
Gourav Sengupta

On Thu, Feb 18, 2016 at 2:30 PM, Ted Yu wrote:
> Please see the last 3 posts on this thread:
>
> http://search-hadoop.com/m/q3RTtTorTf2o3UGK1&subj=Re+spark+ec2+vs+EMR
>
> FYI
Re: Is this likely to cause any problems?
Hi Teng,

Are you using a VPC with EMR? It seems quite curious that you can lock down traffic at the gateway, subnet and security group level (using a private setup with NAT) and still feel insecure. I would be really interested to know what your feelings are based on; I bet the Amazon guys will also find it very interesting. And I am almost sure that none of the EMR-hosted services (Hadoop, Spark, Zeppelin, etc.) are exposed to external IP addresses, even if you are using the classic setting.

Regards,
Gourav Sengupta

On Thu, Feb 18, 2016 at 2:25 PM, Teng Qiu wrote:
> EMR is great, but I'm curious how you are dealing with security settings with EMR; only whitelisting some IP range with a security group setting is really too weak.
>
> Are there really many production systems using EMR? For me, I feel using EMR means everyone in my IP range (for some ISPs it may be the whole town...) is able to see my Spark web UI or use my running Zeppelin notebook if they do some port scanning...
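Gourav's argument relies on the usual VPC convention for subnets:

- A subnet is "public" if its route table sends the default route (0.0.0.0/0) to an internet gateway (`igw-*`).
- It is "private" if the default route goes to a NAT gateway (`nat-*`), which allows outbound traffic only.

Below is a minimal, simplified sketch of that classification. The route-table dictionaries and IDs are hypothetical and are not real AWS API output. IPv6 routes, NAT instances and peering are deliberately ignored.

```python
from typing import Dict, List


def classify_subnet(routes: List[Dict[str, str]]) -> str:
    """Classify a subnet from its route table entries.

    Each route is a dict such as {"destination": "0.0.0.0/0", "target": "igw-..."}.
    Only the IPv4 default route decides the outcome in this simplified model.
    """
    default_targets = [r["target"] for r in routes if r["destination"] == "0.0.0.0/0"]
    if any(t.startswith("igw-") for t in default_targets):
        return "public"
    if any(t.startswith("nat-") for t in default_targets):
        return "private (outbound via NAT)"
    return "isolated (no internet route)"


if __name__ == "__main__":
    # Hypothetical route table for a cluster subnet behind a NAT gateway.
    cluster_subnet = [
        {"destination": "10.0.0.0/16", "target": "local"},
        {"destination": "0.0.0.0/0", "target": "nat-0abc"},
    ]
    print(classify_subnet(cluster_subnet))
```

Nodes in a subnet classified as private here have no inbound path from the internet at the routing level, whatever their security groups say. That is the layer Gourav refers to in addition to security groups.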
Re: Is this likely to cause any problems?
Please see the last 3 posts on this thread:

http://search-hadoop.com/m/q3RTtTorTf2o3UGK1&subj=Re+spark+ec2+vs+EMR

FYI

On Thu, Feb 18, 2016 at 6:25 AM, Teng Qiu wrote:
> EMR is great, but I'm curious how you are dealing with security settings with EMR; only whitelisting some IP range with a security group setting is really too weak.
>
> Are there really many production systems using EMR? For me, I feel using EMR means everyone in my IP range (for some ISPs it may be the whole town...) is able to see my Spark web UI or use my running Zeppelin notebook if they do some port scanning...
Re: Is this likely to cause any problems?
EMR is great, but I'm curious how you are dealing with security settings with EMR; only whitelisting some IP range with a security group setting is really too weak.

Are there really many production systems using EMR? For me, I feel using EMR means everyone in my IP range (for some ISPs it may be the whole town...) is able to see my Spark web UI or use my running Zeppelin notebook if they do some port scanning...

2016-02-18 15:04 GMT+01:00 Gourav Sengupta:
> Hi,
>
> Just out of sheer curiosity, why are you not using EMR to start your Spark cluster?
>
> Regards,
> Gourav
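Teng's concern is that a security group rule whitelisting a CIDR range admits every host in that range, not just your own machine. Below is a minimal sketch using Python's standard `ipaddress` module that counts how many addresses such a rule lets in. The example ranges are RFC 5737 documentation addresses used as hypothetical stand-ins. The "more than one address is too broad" threshold is an arbitrary assumption.

```python
import ipaddress


def addresses_admitted(cidr: str) -> int:
    """Number of addresses an allow rule for this CIDR block admits."""
    return ipaddress.ip_network(cidr).num_addresses


def too_broad(cidr: str, max_addresses: int = 1) -> bool:
    """Flag a whitelist entry that admits more than max_addresses hosts."""
    return addresses_admitted(cidr) > max_addresses


if __name__ == "__main__":
    # Hypothetical rules: a single office IP, an ISP-sized block, and "anywhere".
    for rule in ("198.51.100.7/32", "203.0.113.0/24", "0.0.0.0/0"):
        verdict = "too broad" if too_broad(rule) else "ok"
        print(f"{rule}: {addresses_admitted(rule)} addresses, {verdict}")
```

A /24 already admits 256 hosts. A block covering a whole ISP customer pool is what makes the "whole town" scenario possible.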
Re: Is this likely to cause any problems?
Hi,

Just out of sheer curiosity, why are you not using EMR to start your Spark cluster?

Regards,
Gourav

On Thu, Feb 18, 2016 at 12:23 PM, Ted Yu wrote:
> Have you seen this?
>
> HADOOP-10988
>
> Cheers
Re: Is this likely to cause any problems?
Have you seen this?

HADOOP-10988

Cheers

On Thu, Feb 18, 2016 at 3:39 AM, James Hammerton wrote:
> Hi,
>
> I am seeing warnings like this in the logs when I run Spark jobs:
>
> OpenJDK 64-Bit Server VM warning: You have loaded library /root/ephemeral-hdfs/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
> It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
>
> I used spark-ec2 to launch the cluster with the default AMI, Spark 1.5.2 and Hadoop major version 2.4. I switched the JDK to OpenJDK 8 because I'd written some jobs in Java 8. The 6 worker nodes are m4.2xlarge and the master is m4.large.
>
> Could this contribute to any problems running the jobs?
>
> Regards,
>
> James
Is this likely to cause any problems?
Hi,

I am seeing warnings like this in the logs when I run Spark jobs:

OpenJDK 64-Bit Server VM warning: You have loaded library /root/ephemeral-hdfs/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.

I used spark-ec2 to launch the cluster with the default AMI, Spark 1.5.2 and Hadoop major version 2.4. I switched the JDK to OpenJDK 8 because I'd written some jobs in Java 8. The 6 worker nodes are m4.2xlarge and the master is m4.large.

Could this contribute to any problems running the jobs?

Regards,
James
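Some background on the warning, which is what HADOOP-10988 is about. HotSpot prints it when a loaded native library does not explicitly mark its stack as non-executable. That happens in two cases:

- its ELF `PT_GNU_STACK` program header is missing;
- the header carries the executable flag.

`execstack -c` clears that flag, and linking with `-z noexecstack` avoids setting it in the first place. The sketch below reads the flag directly with Python's `struct` module, in the spirit of `execstack -q`. It assumes a 64-bit ELF file (as libhadoop.so is on an x86_64 instance) and is not a replacement for the real tool.

```python
import struct
import sys
from typing import Optional

PT_GNU_STACK = 0x6474E551  # program header type that records stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def executable_stack(elf: bytes) -> Optional[bool]:
    """True/False from the PT_GNU_STACK header of a 64-bit ELF image,
    or None if the header is absent (also not marked noexecstack)."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2:  # EI_CLASS: 2 means 64-bit
        raise ValueError("only 64-bit ELF is handled in this sketch")
    order = "<" if elf[5] == 1 else ">"  # EI_DATA: 1 little-endian, 2 big-endian
    (phoff,) = struct.unpack_from(order + "Q", elf, 32)           # e_phoff
    phentsize, phnum = struct.unpack_from(order + "HH", elf, 54)  # e_phentsize, e_phnum
    for i in range(phnum):
        # In Elf64_Phdr, p_type and p_flags are the first two 32-bit fields.
        p_type, p_flags = struct.unpack_from(order + "II", elf, phoff + i * phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        result = executable_stack(f.read())
    print({True: "executable stack requested",
           False: "non-executable stack (noexecstack)",
           None: "no PT_GNU_STACK header"}[result])
```

Running it against the library named in the warning should indicate whether `execstack -c` has anything to clear.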