Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Eran Witkon
Can you name the features that make Databricks better than Zeppelin?
Eran
On Fri, 29 Jan 2016 at 01:37 Michal Klos  wrote:

> We use both Databricks and EMR. We use Databricks for our exploratory /
> ad hoc use cases because their notebook is pretty badass and better than
> Zeppelin IMHO.
>
> We use EMR for our production machine learning and ETL tasks. The nice
> thing about EMR is you can use applications other than Spark. From a "tools
> in the toolbox" perspective this is very important.
>
> M


Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Rakesh Soni
>
> At its core, EMR just launches Spark applications, whereas Databricks is a
> higher-level platform that also includes multi-user support, an interactive
> UI, security, and job scheduling.
>
> Specifically, Databricks runs standard Spark applications inside a user’s
> AWS account, similar to EMR, but it adds a variety of features to create an
> end-to-end environment for working with Spark. These include:
>
> - Interactive UI (includes a workspace with notebooks, dashboards, a job
>   scheduler, point-and-click cluster management)
> - Cluster sharing (multiple users can connect to the same cluster,
>   saving cost)
> - Security features (access controls to the whole workspace)
> - Collaboration (multi-user access to the same notebook, revision
>   control, and IDE and GitHub integration)
> - Data management (support for connecting different data sources to
>   Spark, caching service to speed up queries)
>
> The idea is that many Spark deployments soon need to support multiple
> users, different types of jobs, etc., and we want to have these built in.
> But if you just want to connect to existing data and run jobs, that also
> works.
>
> The cluster manager in Databricks is based on Standalone mode, not YARN,
> but Databricks adds several features, such as allowing multiple users to
> run commands on the same cluster and running multiple versions of Spark.
> Because Databricks was founded by the team that initially built Spark, the
> service is very up to date and integrated with the newest Spark features:
> e.g., you can run previews of the next release, any data in Spark can be
> displayed visually, etc.
>
> From: Alex Nastetsky
> Subject: Databricks Cloud vs AWS EMR
> Date: January 26, 2016 at 11:55:41 AM PST
> To: user
>
> As a user of AWS EMR (running Spark and MapReduce), I am interested in
> potential benefits that I may gain from Databricks Cloud. I was wondering
> if anyone has used both and compared and contrasted the two services.
>
> In general, which resource manager(s) does Databricks Cloud use for Spark?
> If it's YARN, can you also run MapReduce jobs in Databricks Cloud?
>
> Thanks.
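In open-source Spark, the Standalone vs YARN distinction discussed above shows up as the `--master` URL given to `spark-submit`: `spark://HOST:PORT` for a Standalone master, plain `yarn` for YARN (as on EMR). Below is a minimal Python sketch, assuming a plain Apache Spark install; it says nothing about how Databricks drives its own Standalone-based manager, and the function name, host name and job file names are hypothetical.

```python
from typing import List, Optional


def build_spark_submit(
    app: str,
    cluster_manager: str,
    master_host: Optional[str] = None,
    master_port: int = 7077,
    deploy_mode: str = "client",
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Build a spark-submit argv for a Standalone master or for YARN (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")

    if cluster_manager == "standalone":
        if not master_host:
            raise ValueError("Standalone mode needs the master's host name")
        # Standalone masters are addressed as spark://HOST:PORT; 7077 is the default port.
        master = f"spark://{master_host}:{master_port}"
        # Standalone mode does not support cluster deploy mode for Python applications.
        if deploy_mode == "cluster" and app.endswith(".py"):
            raise ValueError("Standalone cluster deploy mode does not support Python apps")
    elif cluster_manager == "yarn":
        # Spark locates the YARN ResourceManager through HADOOP_CONF_DIR / YARN_CONF_DIR,
        # so the master URL is just "yarn".
        master = "yarn"
    else:
        raise ValueError(f"unsupported cluster manager: {cluster_manager!r}")

    # spark-submit options must come before the application; anything after it
    # is passed through to the application itself.
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app, *(app_args or [])]


if __name__ == "__main__":
    # Hypothetical host and file names, for illustration only.
    print(" ".join(build_spark_submit("etl_job.py", "standalone", master_host="spark-master.example.internal")))
    print(" ".join(build_spark_submit("etl_job.py", "yarn", deploy_mode="cluster")))
```

Only the master URL changes between the two cluster managers; the application itself stays the same, which is why the same Spark job can run on EMR or on a Standalone-based service.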


Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Sourav Mazumder
You can also try out IBM's Spark as a Service on IBM Bluemix. There you'll
get all the required features: security, multitenancy, notebooks, and
integration with other big data services. You can try it out for free too.

Regards,
Sourav

On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni  wrote:

> [...]


Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Michal Klos
We use both Databricks and EMR. We use Databricks for our exploratory / ad hoc 
use cases because their notebook is pretty badass and better than Zeppelin IMHO.

We use EMR for our production machine learning and ETL tasks. The nice thing 
about EMR is you can use applications other than Spark. From a "tools in the 
toolbox" perspective this is very important.

M
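Michal's point about running applications other than Spark maps onto the `Applications` list of the EMR `RunJobFlow` API: installing Hadoop alongside Spark brings YARN and MapReduce, which also covers Alex's question about running MapReduce jobs next to Spark. A minimal Python sketch that only builds the request dict; the helper name, cluster name, release label and instance type are hypothetical placeholders, and nothing here contacts AWS.

```python
from typing import Any, Dict, List


def emr_cluster_request(
    name: str,
    release_label: str,
    applications: List[str],
    instance_type: str,
    instance_count: int,
) -> Dict[str, Any]:
    """Build keyword arguments for EMR RunJobFlow (hypothetical helper)."""
    if not applications:
        raise ValueError("list at least one application, e.g. 'Spark'")
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Each installed application is a {"Name": ...} entry; "Hadoop" provides
        # YARN and MapReduce, so Spark and MapReduce jobs can share one cluster.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default roles, as created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    # Placeholder values; check the EMR release guide for real labels and types.
    request = emr_cluster_request("spark-and-mapreduce", "emr-4.3.0", ["Hadoop", "Spark", "Hive"], "m3.xlarge", 3)
    print([app["Name"] for app in request["Applications"]])
```

With boto3 installed and AWS credentials configured, the dict could be passed as `boto3.client("emr").run_job_flow(**request)`; the sketch stops short of that call.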

> On Jan 28, 2016, at 6:05 PM, Sourav Mazumder  
> wrote:
> 
> You can also try out IBM's spark as a service in IBM Bluemix. You'll get 
> there all required features for security, multitenancy, notebook, integration 
> with other big data services. You can try that out for free too.
> 
> Regards,
> Sourav