Re: Can Spark stack scale to petabyte scale without performance degradation?
Thanks, Matei.

On Tue, Jul 15, 2014 at 11:47 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Yup, as mentioned in the FAQ, we are aware of multiple deployments running jobs on over 1000 nodes. Some of our proofs of concept involved people running a 2000-node job on EC2. I wouldn't confuse buzz with FUD :).
>
> Matei

On Jul 15, 2014, at 9:17 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Hi Rohit,
>
> I think the 3rd question on the FAQ may help you: https://spark.apache.org/faq.html
>
> Some other links that talk about building bigger clusters and processing more data:
> http://spark-summit.org/wp-content/uploads/2014/07/Building-1000-node-Spark-Cluster-on-EMR.pdf
> http://apache-spark-user-list.1001560.n3.nabble.com/Largest-Spark-Cluster-td3782.html
>
> Best Regards,
> Sonal
> Nube Technologies
> http://www.nubetech.co/
> http://in.linkedin.com/in/sonalgoyal

On Wed, Jul 16, 2014 at 9:17 AM, Rohit Pujari <rpuj...@hortonworks.com> wrote:

> Hello folks:
>
> There is a lot of buzz in the Hadoop community about Spark's supposed inability to scale beyond roughly 1 TB datasets (or 10-20 nodes). It is regarded as great tech for CPU-intensive workloads on smaller data (less than a TB), but said to fail to scale and perform effectively on larger datasets. How true is this? Are there any customers running petabyte-scale workloads on Spark in production? Are there any benchmarks performed by Databricks or other companies to clear up this perception?
>
> I'm a big fan of Spark. Knowing Spark is in its early stages, I'd like to better understand the boundaries of the tech and recommend the right solution for the right problem.
>
> Thanks,
> Rohit Pujari
> Solutions Engineer, Hortonworks
> rpuj...@hortonworks.com
> 716-430-6899
>
> CONFIDENTIALITY NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank you.