Re: spark-ec2 vs. EMR

2015-12-04 Thread Jonathan Kelly


Re: spark-ec2 vs. EMR

2015-12-02 Thread Jonathan Kelly
EMR is currently running a private preview of an upcoming feature allowing
EMR clusters to be launched in VPC private subnets. This will allow you to
launch a cluster in a subnet without an Internet Gateway attached. Please
contact jonfr...@amazon.com if you would like more information.

~ Jonathan

Note: jonfr...@amazon.com is not me. I'm a different Jonathan. :)


Re: spark-ec2 vs. EMR

2015-12-02 Thread Jerry Lam
Hi Dana,

Yes, we got VPC + EMR working, but I'm not the person who deployed it. It is
related to the subnet, as Alex points out.

Just want to add another point: spark-ec2 is worth keeping and improving
because it lets users use any version of Spark (a nightly build, for
example). EMR does not let you do that without a manual process.

Best Regards,

Jerry



Re: spark-ec2 vs. EMR

2015-12-02 Thread Alexander Pivovarov
Do you think it's a security issue if EMR is started in a VPC with a subnet
that has Auto-assign Public IP set to Yes?

You can remove all inbound rules with a 0.0.0.0/0 source from the master and
slave security groups. Then the master and slave boxes will be accessible
only to users who are on the VPN.
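
The rule cleanup suggested above could be sketched in Python roughly as
follows. This is only an illustration under assumptions: the rules are plain
dicts shaped like the IpPermissions entries the EC2 API returns for a
security group, and world_open_rules is a hypothetical helper name, not part
of any AWS tool. It only lists the inbound rules whose source is 0.0.0.0/0;
revoking them is left out.

```python
ANYWHERE = "0.0.0.0/0"


def world_open_rules(ip_permissions):
    """Return the inbound rules that allow traffic from any IPv4 address."""
    open_rules = []
    for rule in ip_permissions:
        # Each rule may list several source ranges; one open range is enough.
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        if ANYWHERE in cidrs:
            open_rules.append(rule)
    return open_rules


# Hypothetical inbound rules for a master security group (assumed values).
master_rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 8890, "ToPort": 8890,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
]

for rule in world_open_rules(master_rules):
    print("open to the world:", rule["IpProtocol"], rule["FromPort"])
```

Rules restricted to the VPC range are left alone, so hosts reached through
the VPN keep their access.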






Re: spark-ec2 vs. EMR

2015-12-02 Thread Dana Powers
EMR was a pain to configure on a private VPC the last time I tried. Has
anyone had success with that? I found spark-ec2 easier to use with private
networking, but also agree that I would use for prod.

-Dana


Re: spark-ec2 vs. EMR

2015-12-01 Thread Alexander Pivovarov
1. EMR 4.2.0 has Zeppelin as an alternative to Databricks notebooks.

2. EMR has Ganglia 3.6.0.

3. EMR has Hadoop fs settings that make S3 work fast (direct.EmrFileSystem).

4. EMR has S3 keys in the Hadoop configs.

5. EMR allows resizing the cluster on the fly.

6. EMR has the AWS SDK on the Spark classpath, which helps reduce the app
assembly jar size.

7. The ec2 script installs everything in /root; EMR has dedicated users:
hadoop, zeppelin, etc. EMR is similar to Cloudera or Hortonworks.

8. There are at least 3 spark-ec2 projects (in apache/spark, in mesos, in
amplab). The master branch in spark has an outdated ec2 script. The other
projects have broken links in their readmes. WHAT A MESS!

9. The ec2 script has bad documentation and uninformative error messages.
E.g., the readme does not say anything about the --private-ips option. If you
do not add the flag, it will connect to an empty-string host (localhost)
instead of the master. This was fixed only last week. Not sure if it is fixed
in all branches.

10. I think Amazon will add spark-jobserver to EMR soon.

11. You do not need to be an AWS expert to start an EMR cluster. Users can
use the EMR web UI to start a cluster to run some jobs or work in Zeppelin
during the day.

12. An EMR cluster starts in about 8 min. The ec2 script takes longer and you
need to be online.
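
Point 9 can be illustrated with a small Python sketch. It is not the
spark-ec2 code: Instance, resolve_host, and the sample address are
hypothetical, and the sketch assumes that an instance in a private subnet
reports an empty public DNS name. It shows why the host ends up empty without
--private-ips and what a more informative error could look like.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for an EC2 instance record."""
    public_dns_name: str
    private_ip_address: str


def resolve_host(instance, private_ips=False):
    """Pick the address used to reach the instance, failing loudly if empty."""
    if private_ips:
        host = instance.private_ip_address
    else:
        host = instance.public_dns_name
    if not host:
        # Without this check an empty string would be passed on as the host.
        raise ValueError(
            "No hostname for instance; if it has no public address, "
            "pass --private-ips"
        )
    return host


# Assumed example: a master in a private subnet with no public DNS name.
master = Instance(public_dns_name="", private_ip_address="10.0.0.12")
print(resolve_host(master, private_ips=True))
```

Calling resolve_host(master) without the flag raises the error instead of
silently using an empty host.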


Re: spark-ec2 vs. EMR

2015-12-01 Thread Jerry Lam
Simply put:

EMR = Hadoop Ecosystem (Yarn, HDFS, etc) + Spark + EMRFS + Amazon EMR API +
Selected Instance Types + Amazon EC2 Friendly (bootstrapping)

spark-ec2 = HDFS + Yarn (Optional) + Spark (Standalone Default) + Any
Instance Type

I use spark-ec2 for prototyping and I have never used it for production.

just my $0.02






Re: spark-ec2 vs. EMR

2015-12-01 Thread Nick Chammas
Pinging this thread in case anyone has thoughts on the matter they want to
share.

On Sat, Nov 21, 2015 at 11:32 AM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Spark has come bundled with spark-ec2
> <http://spark.apache.org/docs/latest/ec2-scripts.html> for many years. At
> the same time, EMR has been capable of running Spark for a while, and
> earlier this year it added "official" support
> <https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/>.
>
> If you're looking for a way to provision Spark clusters, there are some
> clear differences between these 2 options. I think the biggest one would be
> that EMR is a "production" solution backed by a company, whereas spark-ec2
> is not really intended for production use (as far as I know).
>
> That particular difference in intended use may or may not matter to you,
> but I'm curious:
>
> What are some of the other differences between the 2 that do matter to
> you? If you were considering these 2 solutions for your use case at one
> point recently, why did you choose one over the other?
>
> I'd be especially interested in hearing about why people might choose
> spark-ec2 over EMR, since the latter option seems to have shaped up nicely
> this year.
>
> Nick
>
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-spark-ec2-vs-EMR-tp25538.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.