Hi Ben,

> My company uses Lambda to do simple data moving and processing using Python
> scripts. I can see using Spark instead for the data processing would make it
> into a real production level platform.
That may be true. Spark has first-class support for Python, which should make your life easier if you do go this route. Once you've fleshed out your ideas, I'm sure folks on this mailing list can provide helpful guidance based on their real-world experience with Spark.

> Does this pave the way into replacing
> the need of a pre-instantiated cluster in AWS or bought hardware in a
> datacenter?

In a word, no. SAMBA is designed to extend, not replace, the traditional Spark computation and deployment model. At its most basic, the traditional Spark computation model distributes data and computations across worker nodes in the cluster. SAMBA simply allows some of those computations to be performed by AWS Lambda rather than locally on your worker nodes.

I believe there are a number of potential benefits to using SAMBA in some circumstances:

1. It can help reduce some of the workload on your Spark cluster by moving that workload onto AWS Lambda, an on-demand compute service.
2. It allows Spark applications written in Java or Scala to make use of libraries and features offered by Python and JavaScript (Node.js) today, and potentially more libraries and features offered by additional languages in the future as AWS Lambda language support evolves.
3. It provides a simple, clean API for integration with REST APIs, which may benefit Spark applications that form part of a broader data pipeline or solution.

> If so, then this would be a great efficiency and make an easier
> entry point for Spark usage. I hope the vision is to get rid of all cluster
> management when using Spark.

You might find that a hosted Spark platform that handles cluster management for you, such as Databricks or Amazon EMR, is a good place to start. In my experience, at least, they got me up and running without difficulty.
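To make the extend-not-replace model a little more concrete, here is a toy, stdlib-only Python sketch. It is not SAMBA's API: `process_partitions`, `offload`, and `fake_lambda_invoke` are hypothetical names, and the "remote" function just runs locally as a stand-in for AWS Lambda. The point it illustrates is that the cluster still owns the partitioned data, and a routing decision picks, per partition, whether the work runs on the worker or is serialized and handed off to a remote function.

```python
import json
from typing import Callable, Iterable, List


def fake_lambda_invoke(payload: str) -> str:
    """Hypothetical stand-in for a remote AWS Lambda call.

    In a real deployment this might wrap the AWS SDK's Lambda Invoke call;
    here it runs in-process. The JSON round trip mimics payload serialization.
    """
    records = json.loads(payload)
    return json.dumps([r * r for r in records])


def square_locally(records: List[int]) -> List[int]:
    """The same computation, performed 'on the worker node'."""
    return [r * r for r in records]


def process_partitions(
    partitions: Iterable[List[int]],
    offload: Callable[[int], bool],
    remote: Callable[[str], str] = fake_lambda_invoke,
) -> List[List[int]]:
    """Compute each partition locally or remotely, preserving partition order.

    `offload(index)` decides which partitions are handed to `remote`;
    everything else is computed locally, so the cluster is extended,
    not replaced.
    """
    results: List[List[int]] = []
    for index, records in enumerate(partitions):
        if offload(index):
            # Serialize, ship to the remote function, deserialize the reply.
            results.append(json.loads(remote(json.dumps(records))))
        else:
            results.append(square_locally(records))
    return results
```

For example, `process_partitions([[1, 2], [3], [4, 5]], offload=lambda i: i % 2 == 1)` sends only the middle partition to the stand-in "Lambda" and returns `[[1, 4], [9], [16, 25]]`. Because both paths apply the same computation, the result is the same whichever partitions are offloaded; only where the work runs changes.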
David