Re: Databricks Cloud vs AWS EMR
Can you name the features that make Databricks better than Zeppelin?

Eran

On Fri, 29 Jan 2016 at 01:37 Michal Klos wrote:

> We use both Databricks and EMR. We use Databricks for our exploratory /
> ad hoc use cases because their notebook is pretty badass and better than
> Zeppelin IMHO.
>
> We use EMR for our production machine learning and ETL tasks. The nice
> thing about EMR is you can use applications other than Spark. From a "tools
> in the toolbox" perspective this is very important.
>
> M
Re: Databricks Cloud vs AWS EMR
> At its core, EMR just launches Spark applications, whereas Databricks is a
> higher-level platform that also includes multi-user support, an interactive
> UI, security, and job scheduling.
>
> Specifically, Databricks runs standard Spark applications inside a user’s
> AWS account, similar to EMR, but it adds a variety of features to create an
> end-to-end environment for working with Spark. These include:
>
> - Interactive UI (includes a workspace with notebooks, dashboards, a job
>   scheduler, point-and-click cluster management)
> - Cluster sharing (multiple users can connect to the same cluster,
>   saving cost)
> - Security features (access controls to the whole workspace)
> - Collaboration (multi-user access to the same notebook, revision
>   control, and IDE and GitHub integration)
> - Data management (support for connecting different data sources to
>   Spark, caching service to speed up queries)
>
> The idea is that a lot of Spark deployments soon need to bring in multiple
> users, different types of jobs, etc, and we want to have these built-in.
> But if you just want to connect to existing data and run jobs, that also
> works.
>
> The cluster manager in Databricks is based on Standalone mode, not YARN,
> but Databricks adds several features, such as allowing multiple users to
> run commands on the same cluster and running multiple versions of Spark.
> Because Databricks is also the team that initially built Spark, the service
> is very up to date and integrated with the newest Spark features -- e.g.
> you can run previews of the next release, any data in Spark can be
> displayed visually, etc.
>
> From: Alex Nastetsky
> Subject: Databricks Cloud vs AWS EMR
> Date: January 26, 2016 at 11:55:41 AM PST
> To: user
>
> As a user of AWS EMR (running Spark and MapReduce), I am interested in
> potential benefits that I may gain from Databricks Cloud. I was wondering
> if anyone has used both and done comparison / contrast between the two
> services.
>
> In general, which resource manager(s) does Databricks Cloud use for Spark?
> If it's YARN, can you also run MapReduce jobs in Databricks Cloud?
>
> Thanks.
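A note on the resource-manager question above: a standard Spark application selects its cluster manager through the master URL it is started with, which is why the Standalone vs. YARN distinction matters for what else can run on the cluster. The sketch below is only an illustration, not Databricks or EMR code. The helper name `cluster_manager` and the host name are made up for this example; the URL forms themselves (`spark://host:port` for Standalone, `yarn` and the older `yarn-client` / `yarn-cluster` for YARN, `local[N]` for local mode, `mesos://host:port` for Mesos) are the ones described in the Spark documentation.

```python
def cluster_manager(master: str) -> str:
    """Classify a Spark master URL by the cluster manager it selects.

    Hypothetical helper for illustration only; it does not talk to Spark.
    """
    if master == "local" or master.startswith("local["):
        return "local"
    if master.startswith("spark://"):
        return "standalone"
    # "yarn-client" and "yarn-cluster" were the Spark 1.x spellings.
    if master in ("yarn", "yarn-client", "yarn-cluster"):
        return "yarn"
    if master.startswith("mesos://"):
        return "mesos"
    return "unknown"


if __name__ == "__main__":
    # "master-host" is a placeholder host name, not a real endpoint.
    for url in ["spark://master-host:7077", "yarn", "local[*]"]:
        print(url, "->", cluster_manager(url))
```

On a YARN cluster such as EMR the same resource manager can also schedule MapReduce jobs; a Standalone master only schedules Spark applications.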
Re: Databricks Cloud vs AWS EMR
You can also try out IBM's Spark as a service in IBM Bluemix. There you'll
get all the required features for security, multitenancy, notebooks, and
integration with other big data services. You can try that out for free too.

Regards,
Sourav

On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni wrote:

> At its core, EMR just launches Spark applications, whereas Databricks is a
> higher-level platform that also includes multi-user support, an interactive
> UI, security, and job scheduling.
Re: Databricks Cloud vs AWS EMR
We use both Databricks and EMR. We use Databricks for our exploratory /
ad hoc use cases because their notebook is pretty badass and better than
Zeppelin IMHO.

We use EMR for our production machine learning and ETL tasks. The nice
thing about EMR is you can use applications other than Spark. From a "tools
in the toolbox" perspective this is very important.

M

> On Jan 28, 2016, at 6:05 PM, Sourav Mazumder wrote:
>
> You can also try out IBM's spark as a service in IBM Bluemix. You'll get
> there all required features for security, multitenancy, notebook,
> integration with other big data services. You can try that out for free too.
>
> Regards,
> Sourav